Why China’s Train Ticket Site Struggled: A Practical Look at High-Traffic Web Performance
The real problem is not just traffic
When a national ticketing site collapses under demand, the public reaction is obvious: people complain, mock the technology, and compare it to other large internet systems that seem to work fine. But ticketing at this scale is not the same as instant messaging, online games, or even ordinary promotional flash sales. If you want to talk seriously about performance, the first thing to examine is the business model behind the system.
This discussion focuses only on performance. It does not cover UI, general usability, or product decisions such as whether payment and ticket ordering should be split into separate steps.
Why this business is unusually difficult
It is not like QQ or online games
A common comparison is to messaging platforms or massively multiplayer games. That comparison is misleading. Those systems mostly read and update each user’s own data. A train ticketing system is centered on a shared pool of inventory. Everyone is competing for the same ticket stock, and the system has to coordinate access to that central data under extreme concurrency.
That is a very different backend problem. Systems like games or messaging may have huge user counts, but their load characteristics are often simpler than a transactional commerce system.
It resembles a flash sale, but only on the surface
At first glance, train ticket booking during peak travel seasons looks like a giant limited-time sale. The resemblance is superficial.
A typical flash sale can often be simplified aggressively:
- accept only the first N incoming requests,
- avoid touching the database during the rush,
- record order intent in logs,
- keep the available quantity in memory,
- spread inventory across machines,
- stop accepting requests once the quota is reached,
- write results back to the database later in batches.
That works because the item set is usually small and the process can be heavily pre-filtered.
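The simplifications listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the class name `FlashSaleGate`, the in-memory list standing in for a log file, and the user IDs are all assumptions made for the example.

```python
import threading

class FlashSaleGate:
    """Admit only the first `quota` requests; everything is kept in memory."""

    def __init__(self, quota):
        self.remaining = quota
        self.intent_log = []          # stands in for an append-only log file
        self._lock = threading.Lock()

    def try_admit(self, user_id):
        with self._lock:
            if self.remaining <= 0:
                return False          # quota reached: reject without touching a database
            self.remaining -= 1
            self.intent_log.append(user_id)
            return True

    def drain_for_batch_write(self):
        """Hand the logged intents to a later batch job and clear the log."""
        with self._lock:
            batch, self.intent_log = self.intent_log, []
            return batch

gate = FlashSaleGate(quota=3)
results = [gate.try_admit(f"user-{i}") for i in range(5)]
```

Here the first three requests are admitted and the rest are turned away immediately; a separate job would later call `drain_for_batch_write()` and persist the winners in one pass.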
Train tickets are much harder. The system has to handle an enormous volume of searches before ordering even begins. Then, when people actually place orders, consistency becomes painful. A ticket is not just one isolated item; seat availability can involve consistency across multiple route segments between departure and arrival points. Add transfers, multiple trains, different times, and constant changes in user choices, and the complexity goes far beyond a simple “buy now” event.
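To make the segment problem concrete, here is a simplified sketch. It assumes one counter of free seats per segment between consecutive stops, which ignores the question of keeping a passenger in the same physical seat, and it is not thread-safe. The class name `TrainSeats` and the station list are illustrative only.

```python
class TrainSeats:
    """Seats left per segment between consecutive stops of a single train."""

    def __init__(self, stops, seats):
        self.stops = stops
        # One counter per segment: stops[i] -> stops[i + 1].
        self.free = [seats] * (len(stops) - 1)

    def _segments(self, origin, destination):
        start, end = self.stops.index(origin), self.stops.index(destination)
        if start >= end:
            raise ValueError("destination must come after origin")
        return range(start, end)

    def available(self, origin, destination):
        # A trip is only as available as its scarcest segment.
        return min(self.free[i] for i in self._segments(origin, destination))

    def book(self, origin, destination):
        segments = self._segments(origin, destination)
        if any(self.free[i] == 0 for i in segments):
            return False
        for i in segments:
            self.free[i] -= 1
        return True

train = TrainSeats(["Beijing", "Jinan", "Nanjing", "Shanghai"], seats=1)
first = train.book("Jinan", "Nanjing")        # takes the middle segment
second = train.book("Beijing", "Shanghai")    # needs all three segments
third = train.book("Beijing", "Jinan")        # first segment is still free
```

A single short booking in the middle of the route is enough to block the end-to-end trip, which is why one order can require a consistent update across several inventory records at once.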
In peak holiday periods, almost every ticket is effectively a hot ticket. The user base is nationwide, demand arrives all at once, and inventory updates may span multiple lines and segments. That is vastly more complicated than an ordinary promotional event.
There is also an important practical point about large e-commerce events: the hardest systems do not simply let all users hit the database directly. A large-scale sale can survive because users are filtered heavily before they ever reach transactional systems. For example, CAPTCHA and CDN-level filtering can reduce ten million users to a few tens of thousands of serious requests, making the database load manageable.
It is not like the Olympic ticketing model either
Another misleading comparison is Olympic ticketing. Even when those systems had public failures, the business rules were still different. If allocation is lottery-based rather than pure first-come, first-served, then the front-end phase mainly collects user information. Nobody is competing in real time for a limited inventory, which means less locking and easier horizontal scaling.
It is closest to an e-commerce order system
The most meaningful comparison is a retail order system with inventory control. The pattern is familiar:
- reserve inventory,
- collect payment, when required,
- deduct inventory.
This is exactly where consistency checks matter. Under concurrency, the system has to lock data or enforce equivalent guarantees. That is the real bottleneck.
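The reserve, pay, deduct pattern can be sketched as follows. This is an in-process illustration that uses a single lock as the "equivalent guarantee"; a real system would rely on database transactions or similar mechanisms. The names `Inventory`, `reserve`, `confirm`, and `release` are chosen for this example.

```python
import threading

class Inventory:
    def __init__(self, stock):
        self.stock = stock            # units not yet sold
        self.reserved = {}            # order_id -> quantity held
        self._lock = threading.Lock()

    def reserve(self, order_id, qty):
        with self._lock:
            held = sum(self.reserved.values())
            if order_id in self.reserved or self.stock - held < qty:
                return False
            self.reserved[order_id] = qty
            return True

    def confirm(self, order_id):
        """Payment succeeded: turn the hold into a real deduction."""
        with self._lock:
            qty = self.reserved.pop(order_id, None)
            if qty is None:
                return False
            self.stock -= qty
            return True

    def release(self, order_id):
        """Payment failed or timed out: give the held units back."""
        with self._lock:
            return self.reserved.pop(order_id, None) is not None

inv = Inventory(stock=2)
a = inv.reserve("order-a", 2)
b = inv.reserve("order-b", 1)   # rejected: everything is on hold
inv.release("order-a")
c = inv.reserve("order-b", 1)
inv.confirm("order-b")
```

Every call serializes on the same lock, which is exactly the cost the text describes: correctness is easy to state, but all concurrent buyers of the same stock end up waiting in line.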
Most B2C systems do not fully process every order immediately. They commonly accept the order first and process it asynchronously. That is why people sometimes receive follow-up emails telling them an order could not actually be fulfilled. The system delays final confirmation because concurrent consistency is expensive.
The release pattern makes everything worse
Rail ticket sales are made even harder by the business pattern itself. Tickets are often released at specific times in sudden bursts, while demand far exceeds supply. That creates the familiar scramble. The moment tickets appear, millions of people rush in to search and submit orders.
A site taking tens of millions of visits within a short window is already under pressure. Reports at the time described peak traffic around 1 billion page views, concentrated between 8 a.m. and 10 a.m., with peak per-second page views reaching into the tens of millions.
That business pattern alone guarantees stress.
Inventory consistency is the real performance ceiling
Inventory is one of the hardest problems in B2C systems. Anyone who has worked in retail or traditional commerce knows stock management is complicated even before internet-scale concurrency enters the picture.
For websites, rendering pages is the easy part. Search is harder but often manageable with caching. Ordering is the truly difficult operation because it touches inventory and demands correctness.
Order throughput is usually much lower than outsiders expect. Public-facing sites can serve huge numbers of page requests, but once transactions require strong consistency, the numbers collapse. People often mention that nginx can handle 100,000 static requests per second. That may be true for static traffic under ideal conditions, with enough bandwidth, I/O capacity, CPU, and concurrent TCP handling. But that number becomes almost meaningless once consistency, locking, and inventory updates enter the picture.
This is the key point: data consistency, not raw HTTP serving, is the true bottleneck.
Front-end optimizations that could make a major difference
If the goal is better performance, there are a number of standard techniques that can dramatically reduce front-end pressure before the backend is overwhelmed.
1. Front-end load balancing
DNS-based balancing can distribute users across multiple web servers. Because HTTP requests are short-lived, simple load-balancing approaches can already reduce pressure significantly.
A CDN is even better. It routes users to nearby servers and usually comes with distributed storage, which helps both latency and scalability.
2. Reduce the number of front-end connections
One of the easiest ways to hurt yourself under heavy traffic is to make each page load too many separate resources.
At the time, the homepage required more than 60 HTTP connections, and the ticket booking page more than 70. Browsers request these concurrently. A single page has a concurrency cap, but users open multiple tabs, and backend TCP resources do not disappear instantly when the front end disconnects.
That means one million users could theoretically create around 60 million connections on first load. Browser caching reduces the number after the initial visit, but even if only 20% remain, that still leaves roughly 12 million connections.
The fix is straightforward:
- combine JavaScript into one file,
- combine CSS into one file,
- merge icons into a sprite,
- use CSS to display portions of the sprite,
- reduce resource counts as much as possible.
For a login and search page, minimalism matters.
3. Shrink pages and save bandwidth
Images and large assets consume bandwidth quickly. A page around 900 KB may not look excessive under normal broadband conditions, and repeat visits may only require around 10 KB thanks to browser cache. But the first-visit scenario is what kills you.
Consider one million simultaneous first-time users, each downloading 1 MB. If the response has to complete within 120 seconds, the bandwidth required is:
1,000,000 users × 1 MB × 8 bits per byte ÷ 120 s ≈ 66.7 Gbps
That is enormous.
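The estimate can be checked with a few lines of Python. The helper name `required_gbps` is made up for this example, and it assumes decimal units (1 MB = 1,000,000 bytes, 1 Gbps = 10^9 bits per second).

```python
def required_gbps(users, bytes_per_user, seconds):
    """Bandwidth needed to deliver bytes_per_user to every user within the deadline."""
    total_bits = users * bytes_per_user * 8
    return total_bits / seconds / 1e9

first_visit = required_gbps(1_000_000, 1_000_000, 120)   # 1 MB per user
repeat_visit = required_gbps(1_000_000, 10_000, 120)     # about 10 KB with a warm cache
```

With these inputs the first-visit case needs about 66.7 Gbps, while the cached case needs under 1 Gbps, which shows how sharply the load drops once browser caches are warm.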
This suggests that, during the first surge, network bandwidth may become the first bottleneck. Users see timeouts or no response at all. Later, as browser caching reduces bandwidth pressure, the bottleneck shifts to backend processing. At that point, HTTP 500 errors start appearing, which usually indicates backend failure rather than network saturation.
4. Serve more static content
Pages and data that do not change frequently should be pre-generated and compressed with gzip. Static delivery is far cheaper than dynamic generation.
Enabling sendfile in nginx lets the kernel copy file data directly to the socket, which avoids extra copies through user space and reduces CPU and context-switch overhead.
5. Optimize repeated searches
A large percentage of users query the same routes, dates, and trains. There is no reason to let identical searches hammer the database independently.
Reverse proxies and query-result caches can collapse identical concurrent requests:
- the first request hits the database,
- the result is cached,
- subsequent identical requests are served directly from cache.
A hash of the query can be used as the cache key, and NoSQL systems are well suited to this sort of workload.
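A rough sketch of this idea in Python follows. It assumes a hypothetical `load_from_db` function for the real query and keeps results in a plain dictionary; in practice the store would more likely be an external cache. The per-key lock is one possible way to collapse identical concurrent requests, not the only one.

```python
import hashlib
import json
import threading

class QueryCache:
    def __init__(self, load_from_db):
        self._load = load_from_db     # hypothetical function that runs the real query
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()

    @staticmethod
    def key_for(query):
        # Sort keys so that equivalent queries hash to the same cache key.
        canonical = json.dumps(query, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, query):
        key = self.key_for(query)
        with self._guard:
            if key in self._values:
                return self._values[key]
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:
            # Only the first caller for this key reaches the database;
            # callers that waited on the lock find the value already cached.
            with self._guard:
                if key in self._values:
                    return self._values[key]
            value = self._load(query)
            with self._guard:
                self._values[key] = value
            return value

db_calls = []

def fake_db(query):
    db_calls.append(query)
    return {"available": True}

cache = QueryCache(fake_db)
q = {"from": "Beijing", "to": "Shanghai", "date": "2012-01-20"}
threads = [threading.Thread(target=cache.get, args=(q,)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Twenty concurrent identical searches result in a single call to `fake_db`; the other nineteen are answered from the cache.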
For ticket availability, showing a simple “available” or “not available” instead of exact counts could also reduce complexity and relieve pressure. Search load should be pushed away from the database so the database can focus on actual ordering.
6. Caching is powerful, but not free
Caching can accelerate dynamic pages and search results, but it introduces its own problems.
Cache updates:
The cache must stay reasonably synchronized with the database. Two common approaches are:
- expiration-based invalidation, where stale entries time out and are reloaded,
- explicit update notifications from the backend whenever data changes.
The first is simpler but less real-time. The second is more accurate but more complex.
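Both approaches fit in one small sketch. The class `TtlCache` and the injected `clock` parameter are assumptions made so the example can be run deterministically; the `database` dictionary stands in for the real data source.

```python
import time

class TtlCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}            # key -> (value, expiry time)

    def get(self, key, loader):
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]           # still fresh
        value = loader(key)           # expired or missing: reload from the source
        self._entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Explicit notification from the backend that `key` changed."""
        self._entries.pop(key, None)

now = [0.0]
database = {"G1": 5}
cache = TtlCache(ttl_seconds=30, clock=lambda: now[0])

first = cache.get("G1", database.get)     # loads 5
database["G1"] = 4
stale = cache.get("G1", database.get)     # still 5 until the entry expires
now[0] = 31.0
expired = cache.get("G1", database.get)   # reloaded: 4
database["G1"] = 3
cache.invalidate("G1")
notified = cache.get("G1", database.get)  # reloaded immediately: 3
```

The `stale` read shows the cost of pure expiration: the cache keeps serving the old count until the timer runs out, whereas the explicit `invalidate` call makes the change visible on the next read.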
Cache eviction:
Memory is limited, so inactive data must be removed. This mirrors operating system paging behavior. Classic strategies include FIFO, LRU, and LFU.
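As an illustration of one of these strategies, a least-recently-used cache can be written with the standard library's `OrderedDict`. The class name `LruCache` and the route strings are placeholders.

```python
from collections import OrderedDict

class LruCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used entry

    def keys(self):
        return list(self._items)

lru = LruCache(capacity=2)
lru.put("Beijing-Shanghai", "available")
lru.put("Beijing-Guangzhou", "sold out")
lru.get("Beijing-Shanghai")                   # touch: now the most recent
lru.put("Chengdu-Xi'an", "available")         # evicts Beijing-Guangzhou
```

Because the Beijing-Shanghai entry was read just before the third insert, it survives and the untouched entry is evicted instead.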
Cache rebuild and persistence:
In-memory cache can be lost during failures or maintenance. If the cache is large, rebuilding it can be slow enough to hurt production traffic. That means persistence strategies matter too.
Modern NoSQL systems generally provide strong support for these caching concerns.
Backend optimizations: where the real battle begins
If front-end improvements succeed, the bottleneck simply moves deeper into the system. Then the hard problems become unavoidable.
1. Data redundancy
One way to improve performance is to denormalize data and reduce expensive table joins. That can make queries faster, but it weakens consistency and raises risk.
This is one reason NoSQL can appear fast: duplication reduces relational overhead. But in exchange, consistency becomes harder to guarantee. Whether that trade-off is acceptable depends entirely on the business.
An important practical note: moving from a relational model toward NoSQL is often easier than going back the other way.
2. Data mirroring and replication
Mainstream databases support replication, which provides two major benefits:
- better load distribution,
- higher availability.
Traffic can be spread across multiple replicas, and if one node fails, another can continue serving.
Consistency across replicas can itself be complicated, so another technique is to partition a single hot inventory item across machines. For example, if one product has 10,000 units, you might distribute it across 10 servers with 1,000 units each. That resembles warehouse-level partitioning in retail.
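The 10,000-unit example can be sketched like this. Each list entry stands in for a counter that would live on a separate server, and the fall-over to other shards is an assumption about how a buyer is handled when the first shard is empty; the source does not prescribe one.

```python
import random

class ShardedStock:
    def __init__(self, total_units, shard_count):
        base, extra = divmod(total_units, shard_count)
        # Each entry stands in for the counter held by one server.
        self.shards = [base + (1 if i < extra else 0) for i in range(shard_count)]

    def take_one(self, rng=random):
        start = rng.randrange(len(self.shards))
        # Try the chosen shard first, then fall over to the others.
        for offset in range(len(self.shards)):
            i = (start + offset) % len(self.shards)
            if self.shards[i] > 0:
                self.shards[i] -= 1
                return i
        return None                   # every shard is empty

    def remaining(self):
        return sum(self.shards)

stock = ShardedStock(total_units=10_000, shard_count=10)
sold = sum(1 for _ in range(10_050) if stock.take_one() is not None)
```

Contention on any one counter drops to roughly a tenth, at the price of extra work near sell-out, when a request may have to probe several shards before it finds the last units.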
3. Data partitioning
Replication does not solve the problem of oversized tables and slow operations on huge datasets. Partitioning is still necessary.
Several common strategies exist.
Logical partitioning
Split data by business logic. For a rail booking system, partitioning could be based on railway bureaus, train types, departure stations, destinations, and so on. The original large table becomes multiple tables with the same fields but different categories, and those tables can live on different machines.
This is often the most practical and useful strategy.
Vertical partitioning
Split fields rather than rows. Data that rarely changes can live in one table, while frequently updated fields go elsewhere. This reduces the width of heavily accessed rows and can improve performance, though it also makes the application logic more complex.
Very wide rows may spill across multiple storage pages, which hurts both reads and writes.
Range-based sharding
If logical categories are not evenly distributed, some partitions may still be much hotter than others. In that case, data can also be split more evenly by primary key range.
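A range lookup of this kind is small enough to show in full. The boundaries and table names below are hypothetical and only illustrate the mapping from a primary key to a shard.

```python
import bisect

# Hypothetical upper bounds (exclusive) of the ID range held by each shard.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARD_NAMES = ["orders_0", "orders_1", "orders_2", "orders_3"]   # last one takes the overflow

def shard_for(order_id):
    """Return the name of the shard whose ID range contains order_id."""
    return SHARD_NAMES[bisect.bisect_right(RANGE_UPPER_BOUNDS, order_id)]
```

For example, `shard_for(999_999)` resolves to `orders_0` and `shard_for(1_000_000)` to `orders_1`. Rebalancing then means changing the boundary list and moving the affected rows.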
Partitioning a single hot item
This is the earlier idea of dividing one inventory value across several servers, such as 10,000 units spread across 10 nodes.
All of these methods have trade-offs. But for train ticket data, partitioning by geography and placing data closer to provinces or cities could produce a major qualitative improvement in performance.
4. Backend load balancing
Partitioning reduces average pressure, but it does not solve hotspots. Some lines between major cities will always attract concentrated demand.
That is why replication and backend load balancing are still required. However, backend balancing is harder than front-end balancing. A router can distribute traffic volume, but traffic volume does not necessarily reflect server stress.
What is needed is a task distribution system that also monitors backend health.
That system has to answer difficult questions:
- What counts as “busy”? High CPU? High disk I/O? High memory use? High concurrency? Heavy paging?
- How are those metrics reported to the dispatcher?
- How does the dispatcher choose the least loaded machine?
The dispatcher itself also introduces problems:
- task queues must not lose work,
- queues need persistence,
- batching task assignments can improve efficiency,
- the dispatcher must be highly available,
- if a dispatcher fails, persisted queues must move safely to another machine.
Static assignment methods such as simple hashing or round-robin are common, but they are often inadequate. They do not adapt well to uneven load, failed servers, or newly added machines. Consistent hashing helps somewhat, but does not solve everything.
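For reference, a minimal consistent-hashing ring might look like the sketch below. The node names, the use of SHA-256, and the 100 virtual nodes per server are all assumptions for the example.

```python
import bisect
import hashlib

def _point(text):
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas      # virtual nodes per server, to even out the ring
        self._points = []             # sorted hash points
        self._owner = {}              # hash point -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for r in range(self.replicas):
            p = _point(f"{node}#{r}")
            self._owner[p] = node
            bisect.insort(self._points, p)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping around the ring.
        i = bisect.bisect(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[i]]

ring = HashRing(["db-a", "db-b", "db-c"])
keys = [f"order-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("db-c")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

When `db-c` is removed, only the keys it owned are reassigned; everything else stays put. That limits reshuffling when servers come and go, but the ring still knows nothing about how busy each server actually is, which is the limitation noted above.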
A more flexible model is pull-based or preemptive balancing, where downstream workers fetch tasks from a task server when they are ready. That reduces central complexity and makes it easier to scale machines up or down dynamically. The drawback is that special tasks tied to special servers become harder to handle. Even so, this approach is often better overall.
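The pull model can be sketched with threads and an in-process queue. Here `queue.Queue` stands in for the task server, which in a real deployment would be a separate, persistent service; the function and worker names are illustrative.

```python
import queue
import threading

def run_pull_workers(tasks, worker_count):
    """Each worker pulls the next task only when it is free."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    done = []                         # (worker name, task) pairs
    done_lock = threading.Lock()

    def worker(name):
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return                # nothing left to pull
            with done_lock:
                done.append((name, task))
            pending.task_done()

    threads = [threading.Thread(target=worker, args=(f"worker-{i}",))
               for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

completed = run_pull_workers(range(100), worker_count=4)
```

No dispatcher decides who is least loaded: a slow worker simply pulls less often, and adding a worker requires no change to the queue.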
5. Asynchronous processing, throttling, and batching
These three ideas belong together because all of them rely on queueing.
Asynchronous processing
From a business perspective, async means collecting requests now and processing them later. From a technical perspective, it allows components to run in parallel and scale horizontally.
But async brings complications:
- returning results from called services may involve inter-process or inter-thread communication,
- rollback becomes harder,
- concurrency control grows more complex,
- message-based systems must handle loss and out-of-order delivery.
Throttling
Throttling does not increase throughput by itself. Its purpose is protection. It prevents the system from being crushed by traffic beyond its capacity.
This is especially important when interacting with systems you do not control, such as banking interfaces.
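One common way to implement throttling is a token bucket. The source does not name a specific algorithm, so this is offered only as an example; the `clock` parameter is injected so the behavior can be shown without real waiting.

```python
import time

class TokenBucket:
    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                  # caller should reject, queue, or retry later

now = [0.0]
bucket = TokenBucket(rate_per_second=2, capacity=2, clock=lambda: now[0])
burst = [bucket.allow() for _ in range(3)]      # two pass, the third is throttled
now[0] = 1.0                                    # one second later: two tokens refilled
later = [bucket.allow() for _ in range(3)]
```

The bucket lets through a short burst up to its capacity and then holds callers to the configured rate, which is the kind of protection a slow external interface needs.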
Batch processing
When many similar requests arrive together, they can often be handled in groups. If many users are buying the same item, you do not necessarily need one database write per buyer at the exact moment of arrival.
Batching also saves network bandwidth. Ethernet MTU is 1500 bytes, and fiber can be much larger. Sending undersized packets wastes capacity because every packet carries fixed header and per-packet processing overhead, so fuller packets use the link and the network stack more efficiently. Waiting briefly to collect enough data before performing network I/O is therefore also a form of batching.
The enemy of batching is low traffic. For that reason, batch systems usually use two thresholds:
- a work-size threshold,
- a timeout threshold.
As soon as either one is met, processing begins.
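The two-threshold rule can be sketched as follows. The `Batcher` class, its `flush` callback, and the injected `clock` are assumptions for the example; a real implementation would also need a timer or background thread to call `poll()` while no new items arrive.

```python
import time

class Batcher:
    def __init__(self, max_items, max_wait_seconds, flush, clock=time.monotonic):
        self.max_items = max_items
        self.max_wait = max_wait_seconds
        self.flush = flush            # callback that processes one batch
        self.clock = clock
        self._items = []
        self._first_at = None

    def add(self, item):
        if not self._items:
            self._first_at = self.clock()
        self._items.append(item)
        self.poll()

    def poll(self):
        """Flush if either threshold is met; call periodically even when idle."""
        if not self._items:
            return
        full = len(self._items) >= self.max_items
        timed_out = self.clock() - self._first_at >= self.max_wait
        if full or timed_out:
            batch, self._items = self._items, []
            self.flush(batch)

now = [0.0]
batches = []
batcher = Batcher(max_items=3, max_wait_seconds=0.5, flush=batches.append,
                  clock=lambda: now[0])
for order in ["a", "b", "c"]:
    batcher.add(order)                # third item hits the size threshold
batcher.add("d")
now[0] = 1.0
batcher.poll()                        # "d" is flushed by the timeout threshold
```

The first batch goes out because it is full, and the lone trailing item goes out because it has waited long enough, so low traffic never leaves a request stuck.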
In practice, once a system becomes asynchronous, it almost always also gains throttling, queues, persistence, and some form of batch execution.
The queueing-system idea: useful, but not a complete answer
A queue-based ordering system is often proposed for overloaded ticketing sites. The basic idea is simple: the system receives your booking request but does not process it immediately. Instead, it limits incoming flow according to its own capacity and handles requests gradually. After processing, it informs users by email or SMS whether they can proceed.
Technically, this sounds attractive. But several business and user-level questions remain.
1. Queue-based denial-of-service and scalper abuse
Is the queue just first-come, first-served? If so, that alone does not stop scalpers.
A queue identifier can be abused. An attacker can create a large number of queue entries, enter the purchase process, and then simply not complete the purchase, occupying slots for long periods. If the hold time is half an hour, a malicious actor can consume capacity cheaply and prevent normal users from buying.
Requiring ID-based accounts helps, but it does not eliminate abuse. Scalpers can still create many accounts, enter the queue, and abandon the process. Their real goal at that point may simply be to make the official site unusable so customers are forced to buy through intermediaries.
2. The queue itself becomes a consistency bottleneck
How is the queue managed? If assigning queue positions requires locking, then the queue becomes another contention point. With one million users simultaneously asking for positions, that queue can easily become a performance bottleneck in its own right.
It is hard to believe a custom queue system will outperform a mature database at this kind of contention. Fighting over a queue is fundamentally not so different from fighting over a database.
3. Wait-time design is full of trade-offs
How much time should a user get once admitted? Is 30 minutes too much or too little? If it is too short, users complain that they did not have enough time. If it is too long, everyone behind them complains.
And the arithmetic quickly becomes brutal.
Assume:
- 10 million users,
- only 10,000 can be admitted at a time,
- each group gets 15 minutes.
Then the full cycle is:
- 10,000,000 / 10,000 = 1,000 batches
- 1,000 * 15 minutes = 15,000 minutes
- 15,000 minutes / 60 = 250 hours
- 250 hours / 24 ≈ 10.4 days, i.e. more than 10 days
At that pace, the train would leave before many users finished the queue. Even under reduced load, queueing may still fail the business requirement.
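The same arithmetic can be expressed as a small function, which makes it easy to try other admission sizes or hold times. The function name is made up for this example.

```python
import math

def queue_drain_time(users, admitted_per_batch, minutes_per_batch):
    """Return (batches, total minutes, total hours, total days) to drain the queue."""
    batches = math.ceil(users / admitted_per_batch)
    total_minutes = batches * minutes_per_batch
    return batches, total_minutes, total_minutes / 60, total_minutes / 60 / 24

batches, minutes, hours, days = queue_drain_time(10_000_000, 10_000, 15)
```

With the figures above this gives 1,000 batches, 15,000 minutes, 250 hours, and about 10.4 days. Even admitting ten times as many users per batch would still leave a wait of roughly a day.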
4. A single queue is not enough
Even if the queue mechanism works, one universal queue is still a poor fit. If the users admitted at the same time all want the same train and seat type, contention simply reappears on the same backend server.
A better design would queue by user demand, such as departure and destination. Multiple queues can then scale horizontally.
That improves performance, but it still does not solve the unpleasant reality of long waits.
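A per-route queue structure is simple to sketch. The class `RouteQueues` and the choice of (origin, destination) as the key are assumptions; a real system might also key on date or train.

```python
from collections import defaultdict, deque

class RouteQueues:
    def __init__(self):
        self._queues = defaultdict(deque)     # (origin, destination) -> waiting users

    def join(self, user_id, origin, destination):
        q = self._queues[(origin, destination)]
        q.append(user_id)
        return len(q)                         # position within this route only

    def admit(self, origin, destination, count):
        """Let up to `count` users from the front of one route's queue proceed."""
        q = self._queues[(origin, destination)]
        return [q.popleft() for _ in range(min(count, len(q)))]

queues = RouteQueues()
p1 = queues.join("u1", "Beijing", "Shanghai")
p2 = queues.join("u2", "Beijing", "Shanghai")
p3 = queues.join("u3", "Chengdu", "Xi'an")
admitted = queues.admit("Beijing", "Shanghai", 1)
```

Because each route has its own queue, the queues can live on different machines and be drained at rates matched to the capacity of the backend that serves that route.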
A more practical booking model
A stronger approach would be to learn from e-commerce ordering.
Instead of asking the user to remain online for a long interactive purchase window, the system could collect everything up front:
- passenger information,
- desired routes,
- fallback options,
- purchase priority rules,
- preloaded funds.
For example, a user could specify: if sleeper on Train A is unavailable, try sleeper on Train B; if that fails, try a hard seat, and so on.
After that, the system could process orders asynchronously and automatically. Users would simply receive a message indicating success or failure.
This approach has several advantages:
- it removes long interactive wait windows,
- automation speeds up fulfillment,
- identical requests can be merged for batch processing,
- database writes can be reduced,
- user demand becomes explicit and measurable,
- queue assignment can be optimized based on actual intent,
- planners could potentially use aggregate demand to adjust scheduling and capacity more intelligently.
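The fallback rules described earlier (Train A sleeper, then Train B sleeper, then a hard seat) could be processed offline along these lines. This is a sketch under simple assumptions: orders are handled strictly in arrival order, stock is a plain dictionary, and the user names and train labels are placeholders.

```python
def fulfil(orders, stock):
    """Process orders in arrival order, trying each user's choices by priority."""
    results = {}
    for user, choices in orders:
        results[user] = None                  # None means no choice could be met
        for choice in choices:
            if stock.get(choice, 0) > 0:
                stock[choice] -= 1
                results[user] = choice
                break
    return results

stock = {("Train A", "sleeper"): 1, ("Train B", "sleeper"): 0, ("Train B", "hard seat"): 1}
orders = [
    ("alice", [("Train A", "sleeper"), ("Train B", "sleeper")]),
    ("bob",   [("Train A", "sleeper"), ("Train B", "sleeper"), ("Train B", "hard seat")]),
    ("carol", [("Train A", "sleeper")]),
]
results = fulfil(orders, stock)
```

Since the whole run happens without users waiting online, the processor is free to group orders by train, batch the inventory updates, and notify everyone afterwards.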
Of course, the queue or order system itself must still be persisted in a database or other durable storage. Keeping it only in memory would be reckless, because a machine failure would erase user requests instantly.
Structural lessons
Several broader conclusions follow from all of this.
Horizontal scalability is non-negotiable
No matter how the system is designed, every stage in the data flow should be able to scale horizontally. If one stage cannot, it will become the choke point, and adding more servers elsewhere will not help much.
These capabilities are built over time
None of these techniques can be assembled overnight. Every optimization introduces complexity and trade-offs. High-performance transactional systems are the result of long engineering accumulation, not a last-minute patch.
Centralized nationwide selling is inherently hard
A fully centralized ticketing architecture is difficult to make work under this business pattern. The techniques described above can improve performance by orders of magnitude, but separating sales across regional or provincial systems would likely produce the most dramatic practical gains.
The business model is itself extreme
A demand pattern where supply is released suddenly, supply is far below demand, and tens of millions of people try to log in and buy at the same time in the same morning is an extreme business form. The more abnormal the demand pattern, the more inevitable the criticism becomes, regardless of technical effort.
Building for a few peak weeks is expensive
There is also an economic reality: constructing an enormous system for one or two weeks of peak demand each year means much of that capacity sits idle the rest of the time.
Traffic snapshot
An Alexa chart from the time illustrated the site's page-view trend. Note that Alexa counted multiple clicks by one user on the same page within a single day as just one page view.
