This is the fourth in our series on the Gaming Internet. You can read the previous post here.
In our previous post we made the case that game developers suffer real financial impact from poor quality networking. Here we want to explore the reasons for those networking problems.
First, there is the simple fact that building an online game is a serious technical challenge. Games can have millions of concurrent users, and a single session can involve thousands of players. For instance, EVE Online once hosted a battle with 7,548 players all logged onto the same “server” at once. (The event merited its own Wikipedia entry.) The game has to track all the action, render graphical elements for every game object, and, most crucially, keep all the players synchronized. In the EVE example above, the company actually throttles the simulation clock in areas of massive battles to keep its servers from getting swamped.
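EVE's clock-throttling approach is publicly known as "time dilation": when a region of the game world is overloaded, the simulation there runs slower than real time rather than dropping players. A minimal sketch of the idea follows; the load model, the 10% floor, and the function names are our own illustrative assumptions, not the developer's actual implementation.

```python
# Sketch of "time dilation": when server load exceeds capacity, slow the
# simulation clock so each real-time second advances less game time.
# The clamp floor (minimum 10% speed) is an illustrative assumption.

def time_dilation_factor(load: float, capacity: float, floor: float = 0.1) -> float:
    """Return the fraction of real time at which the simulation advances."""
    if load <= capacity:
        return 1.0                      # server keeping up: run in real time
    return max(floor, capacity / load)  # overloaded: slow the clock, but never below the floor

def advance_simulation(sim_time: float, real_dt: float, load: float, capacity: float) -> float:
    """Advance the game clock by a dilated slice of real elapsed time."""
    return sim_time + real_dt * time_dilation_factor(load, capacity)
```

The design trade-off is that everyone in the overloaded region experiences the same slow-motion fight, which is far less unfair than some players lagging while others do not.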
And players notice these things. Unsynchronized play is awful: someone gets a big advantage, such as being able to shoot an opponent before the opponent even sees the shooter, to name the most basic example. We used to follow a mobile game called Vainglory closely. The company had a perennial problem with its servers in the “Southeast Asia” region; players there always seemed to suffer significant timing issues and gameplay lag. As we noted in our previous post, this had a serious negative marketing effect. Whenever service issues flared in SEA, customers in other regions started to complain about lag too. Even when users in those regions were likely not actually experiencing lag, the community chatter convinced them there was a problem. Vainglory got knocked off its growth trajectory for a variety of reasons. Lag was not the explicit cause of that stall, but it unquestionably colored player perception, which compounded the other issues.
Vainglory was aware of this problem and worked very hard to solve it. Their engineering team published a post on Reddit addressing the issue, and even five years later we find it informative. They note that the areas where they had the most networking trouble were “Southeast Asia” and “Europe”. These two share massive geographies with dozens of national borders to cross. The data center hosting the server was physically located in Singapore (which is in Southeast Asia, hence the name) but served all users from India to China to Australia. That meant crossing multiple Internet exchange points, undersea fiber strands and a host of local data rules. Our back-of-the-envelope math suggested that the lag budget for some of these connections exceeded the rough 50ms rule of thumb for comfortable gameplay. So before we even think about Wi-Fi or cellular connections, just getting close to the user already blew the budget for many gamers. And as we noted, this had real financial impact on the company.
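The back-of-the-envelope math is easy to reproduce. Light in optical fiber travels at roughly two-thirds the speed of light in vacuum, about 200,000 km/s, so distance alone sets a hard floor on latency before any routing, switching, or last-mile delay. The city distances below are rough great-circle figures we chose for illustration:

```python
# Propagation-delay lower bound: light in fiber covers ~200 km per millisecond.
# Real paths are longer than great-circle distance and add queueing/switching
# delay, so actual latency is strictly worse than these numbers.

FIBER_KM_PER_MS = 200.0  # ~200,000 km/s in fiber

def one_way_delay_ms(distance_km: float) -> float:
    return distance_km / FIBER_KM_PER_MS

def round_trip_ms(distance_km: float) -> float:
    return 2 * one_way_delay_ms(distance_km)

# Approximate distances from a Singapore data center (illustrative figures):
for city, km in [("Mumbai", 3900), ("Tokyo", 5300), ("Sydney", 6300)]:
    print(f"{city}: ~{round_trip_ms(km):.0f} ms RTT minimum, before Wi-Fi or cellular")
```

Even under these best-case assumptions, a player in Sydney is looking at over 60ms of round-trip time to Singapore, already past the 50ms comfort threshold before the packet has touched a home router.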
Game companies running networked games have to contend with these sorts of basic connectivity issues, which are as much geographic as technical. But there are even more fundamental issues that even advanced networks have not fully come to grips with. We recently came across this paper by DiDomenico, Perna, Vassio and Giordano of the computer science department at Politecnico di Torino in Italy. They analyzed the network usage of three streaming game services from Google (Stadia), Nvidia (GeForce Now) and Sony (PS Now). They found that the three services had a wide range of bandwidth consumption, with Nvidia and Google using roughly double what Sony required. This divergence seems to stem largely from the choice of networking protocols: the laggards used RTP while Sony used a custom implementation built directly on UDP. That is a massive difference, all stemming from what was likely a simple engineering choice made at the very start of each project; only in hindsight did anyone realize the drawback of that choice. The engineers may even have recognized the problem before launch, but we are guessing that by the time they understood it, everything else had already been built on top of that choice. It is also worth noting that the two hungrier services are both newer, while Sony has been toiling away at this problem for much longer.
Our point here is just that the issue of lag, while of great importance, has multiple causes, and these are buried under many layers of follow-on engineering work.
In our next piece, we will turn the corner and start to look at solutions to these problems.