Vol. 2 — The Algorithm of Two Empires

Chapter 9: 1.4 Billion Data Points — Why China Dominates AI Application


Opening

Shenzhen, January 2025. 7:12 a.m.

The alarm goes off for Xiao Li, a twenty-eight-year-old office worker. He picks up his smartphone before his eyes are fully open. His hand moves first. He opens Douyin.

The moment the screen lights up, last night's data has already been processed. The videos he lingered on for more than three seconds. The ones where he turned on the sound. The ones he replayed. The one where his finger hovered over the save button for 0.3 seconds before pulling away. Every one of these actions was logged on ByteDance's servers. This morning's feed was designed by last night's Xiao Li. He does not know this.

7:35 a.m. A street stall outside his apartment complex. A bowl of rice porridge, 5.5 yuan. One youtiao, 1 yuan. Xiao Li does not reach for cash. He scans with WeChat Pay. 0.3 seconds. The transaction is recorded. Time, location, amount, merchant name. This 6.5-yuan breakfast is transmitted to Tencent's servers.

8:04 a.m. A transfer station on Metro Line 3. The camera above the turnstile reads his face. 0.1 seconds. The gate opens. He does not break stride. The camera was manufactured by Hikvision. Tens of thousands of faces passing through this station today are being recorded.

9:15 a.m. The office entrance. Once more. 12:22 p.m. A convenience store self-checkout. Once more. 4:00 p.m. A hospital waiting room. Once more.

On his commute home, an ad for the running shoes he placed in his Taobao cart appears in his Douyin feed. Xiao Li pauses. 3.2 seconds. That, too, is recorded.

The day ends. Xiao Li's location trajectory, spending patterns, gaze direction, product interests, movement speed, and emotional responses have flowed into the training data of dozens of AI systems. He did not sell his data. He simply lived.

"China has a lot of data."

This is true. But it is insufficient. Having 1.4 billion people does not, by itself, produce usable data. India also has 1.4 billion people. Nigeria has a large population too. Volume alone does not create an AI advantage.

The question in this chapter is more specific.

What is it about Chinese data? Through what structures is it converted into AI capability? And under what conditions does that capability become a genuine competitive edge, rather than an overrated myth?

We dissect Xiao Li's day across three layers. Payment data, movement data, behavioral data. How these three layers overlap to produce something approaching a "digital replica of human behavior." How that replica is amplified inside a symbiotic loop of AI and surveillance. And finally, why China's chosen strategy — "not building the world's best AI, but deploying good-enough AI to one billion people" — can, under specific conditions, generate greater economic impact than "selling frontier AI to one million people."

In the language of Volume 1, this is the national-scale divergence between the execution layer and the design layer.

Section A: Three Layers of the Data Lake — Payment, Movement, Behavior

Layer 1: The Completeness of Payment Data

Hangzhou. A fifteen-minute walk from Alibaba headquarters. A traditional market alley popular with tourists.

One flatbread. 2.5 yuan. Two skewers. 8 yuan. A bottle of water. 3 yuan.

The vendor has taped a palm-sized laminated card to the counter. An Alipay QR code. A WeChat QR code. A tourist pulls a 100-yuan bill from his wallet. The vendor looks up.

"Got QR?"

For this vendor, cash is a nuisance. He has to prepare change, deposit it at the bank, complicate his bookkeeping. QR settles instantly and leaves a record. Tax filing becomes easier. Right now, every transaction in this alley is being transmitted in real time to Alipay and WeChat Pay servers.

This is the first layer of China's data advantage. Completeness.

In the United States, credit card transactions are tracked. But cash transactions are not. Cash still accounts for 15 to 20 percent of the American payment market. Small businesses, tips, flea markets, street food. These transactions do not exist as data.

China is different. Virtually every transaction is digitally recorded. Street vendors, temple donation boxes, even roadside beggars have QR codes. With Alipay and WeChat Pay together commanding over 90 percent of the mobile payments market, China's consumption data now constitutes a near-complete record of the entire population.

In AI training, complete data is far more valuable than biased samples. A recommendation algorithm trained only on credit card users' spending patterns is a fundamentally different model from one trained on total consumption patterns that include cash street vendors.

There is, however, a trap.

China's payment data is complete, but homogeneous. Only transactions occurring within the WeChat Pay and Alipay ecosystems are recorded. Chinese-language users, domestic consumption, Chinese consumption patterns. This data lacks global diversity. If the AI serves Chinese consumers, this limitation is not a weakness. If the AI is meant for global services, the story changes.

America's data strength lies elsewhere. The open global internet (Wikipedia, Reddit, Common Crawl) is anchored in English but contains text and knowledge from hundreds of languages and thousands of cultures. A combination of diversity and depth. The Chinese-language internet, censored behind the Great Firewall, does not possess this diversity. Data on politically sensitive topics such as Tiananmen, Xinjiang, and Taiwan is structurally biased.

This contrast is not a simple ranking. It is a difference in purpose. Chinese AI is optimized for predicting the behavior of Chinese people. American AI holds the advantage in global language comprehension.

Layer 2: The Density of Movement Data

Comparitech estimates that China had 700 million surveillance cameras as of 2025.

With a population of 1.4 billion, that is one camera for every two people. What these cameras record around the clock extends beyond faces. License plates, pedestrian patterns, crowd density, direction of movement. This data merges with real-time ride records from Didi Chuxing and GPS trajectory data from Baidu Maps.

The result is a near-complete digital replica of movement patterns across entire cities. Which districts are crowded at which hours. Who moves from where to where. How those movement patterns translate into consumption.

Apply this data to traffic optimization, and traffic lights adjust in real time. Apply it to crime prediction, and public security deploys officers before crowds gather. Apply it to commercial analysis, and a convenience store chain decides the location of its next outlet by data.

This is the second layer. Density.

The United States also has smartphone GPS data, card-based location records, and social media check-in data. But it is scattered. Some at Google, some at Apple, some at each credit card company. China's movement data is concentrated in a single ecosystem to which the state holds unified access.

Layer 3: The Precision of Behavioral Data

Douyin — the Chinese version of TikTok. 800 million users in China. Average time spent per day: over one hour.

Douyin's data records something different from payments or movement. Emotional response. Attention span. The direction of desire.

Which video makes a face brighten. Which music makes a finger stop. Which product, when it appears, earns a second look. The recommendation algorithm built on this precise behavioral data is why Douyin is often described as one of the most addictive apps in the world.

When these three layers — completeness of payment, density of movement, precision of behavior — overlap, the result is not a simple sum of data. It is a complete map of daily life, connecting what a person eats in the morning, where they travel, what they watch in the evening, and what they buy.

This is the real data advantage of Chinese AI. Not volume, but connectivity.
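The difference between volume and connectivity can be made concrete with a toy sketch. Everything in it is hypothetical: the records are synthetic, and the field names (`person_id`, `hour`) and the `link_layers` helper are invented for illustration, not taken from any real system. It shows only the general idea that three separate logs become a single timeline of a day once they share an identifier.

```python
from collections import defaultdict

# Synthetic records for one made-up person ("p1"); hours are illustrative.
payments = [{"person_id": "p1", "hour": 7, "event": "paid 6.5 yuan at a breakfast stall"}]
movements = [{"person_id": "p1", "hour": 8, "event": "passed a metro turnstile"}]
behaviors = [{"person_id": "p1", "hour": 19, "event": "paused 3.2 s on a shoe ad"}]


def link_layers(*layers):
    """Merge separate logs into one time-ordered timeline per person.

    Each argument is a (layer_name, records) pair. This is a plain
    in-memory join on person_id, not any real platform's pipeline.
    """
    timelines = defaultdict(list)
    for layer_name, records in layers:
        for rec in records:
            timelines[rec["person_id"]].append((rec["hour"], layer_name, rec["event"]))
    # Sorting the tuples orders each person's events by hour.
    return {pid: sorted(events) for pid, events in timelines.items()}


timeline = link_layers(
    ("payment", payments),
    ("movement", movements),
    ("behavior", behaviors),
)
for hour, layer, event in timeline["p1"]:
    print(f"{hour:02d}:00 [{layer}] {event}")
```

Each log on its own answers one narrow question. The joined timeline is what allows a morning purchase to be related to an evening pause on an ad, which is the sense of "connectivity" used here.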

And for this connectivity to be constructed, one precondition is required. A structure in which data is collected whether or not people consent. That structure is the subject of the next section.

Section B: The AI-Surveillance Symbiosis — The Cycle of Data, Technology, and Control

Beyond the Turnstile

Shenzhen. One of the stations on Metro Line 1. Rush hour.

Turnstiles stand in rows. No card required. You simply walk. The camera reads your face, and 0.1 seconds later the gate opens. The fare is automatically deducted from WeChat Pay.

The name of the company that built this system is printed on a small metal plate beside the turnstile. Hikvision. The world's largest surveillance camera manufacturer, headquartered in Hangzhou.

The thousands of commuters passing through do not notice the name. While they walk, the cameras record. Faces. Speed. Expressions. Companions. No one knows where this data goes.

The Structure of Symbiosis

Hikvision's growth trajectory compresses the distinctive dynamics of China's AI industry.

Founded in 2001. Its early products were simple DVRs (digital video recorders). The turning point came after 2008, with the Ministry of Public Security's massive surveillance camera installation program, known colloquially as the "Sharp Eyes" project (Xueliang Gongcheng). The Ministry ordered hundreds of thousands of cameras, and Hikvision won contracts at scale.

The payment for those contracts was not only money. It was data access.

By building cameras connected to the public security network, Hikvision gained access to real-world facial recognition data from hundreds of millions of people. Faces in backlight, sidelight, crowds, and masks. AI trained on this data was far more accurate than competitors' models trained on laboratory datasets. This was data that money could not buy on the open market.

Hikvision then commercialized that AI into civilian products. Corporate access control systems, school attendance checks, supermarket VIP recognition. AI trained through public security projects flowed into the private market.

The concept of "Harmonious AI" (hexie rengong zhineng), proposed by Professor Zeng Yi of the Chinese Academy of Sciences, encapsulates this difference. Where Western AI safety discourse revolves around "alignment" and "control," the Chinese discourse centers on "harmony" and "symbiosis." Same technology, different philosophical frame. The divergence in worldview — around the purpose for which the state collects data and the manner in which AI processes it — determines the direction of technology governance.

This is the structure of the cycle.

    Public security contracts -> AI firms build systems -> Access to real-world data
                ^                                                    |
    Civilian commercialization <- More sophisticated AI <- Data training

As long as this cycle operates, Chinese AI companies possess a strategic resource unobtainable through market competition alone. The relationship with the state is data access, and data access is technological competitiveness.

Dahua Technology followed a parallel path to Hikvision. The two companies now export surveillance cameras to more than 150 countries. According to the Carnegie Endowment for International Peace, over 75 countries have adopted Chinese AI surveillance technology. Africa, Southeast Asia, the Middle East, and even some European cities.

Price makes it possible. 30 to 50 percent cheaper than American competitors. This price competitiveness is not accidental. The domestic base of public security contracts created economies of scale, and that scale underwrites global price competitiveness.

The Cost of Symbiosis

But this cycle carries two kinds of cost.

The first is geopolitical. In 2019, the U.S. Department of Commerce placed Hikvision and Dahua on the Entity List. The stated reason: involvement in building surveillance systems in Xinjiang. Both companies' products were subsequently banned from U.S. federal procurement. American allies began gradually removing their equipment from government facilities.

Part of the export market closed. But the Global South market did not. American sanctions were effective among Western allies, but Hikvision's market share in Africa and Southeast Asia actually grew. The stigma of "products from a company with human rights issues" weakened when set against a 30 to 50 percent price discount.

The second cost is more fundamental. The privacy of citizens.

The thousands of people passing through the Shenzhen metro turnstile do not know where their faces are stored, how the data is used, or who has access. They did not consent. They may be aware that data is being collected, but they have no way to refuse. If they do not take this metro, they cannot get to work.

This is the hidden condition of China's data advantage. The completeness and density of data presuppose collection without consent. This presupposition arises from the nature of the Chinese system and is difficult to replicate in a democracy.

Views within China on data governance are not monolithic, either. Professor Xue Lan of Tsinghua University, in an April 2024 address, pointed to an uncomfortable reality in Chinese AI. Over 130 large language models have proliferated, but a significant number are assembly products — open-source models "encapsulated" with a superficial wrapper. "The originality of LLMs built this way is limited. Moreover, our computing power is being choked." Data is abundant, but the hardware to process it and the original models to build from it are lacking — an internal diagnosis.

To frame this simply as "bad China" is to flatten the analysis. American Big Tech also buries user consent inside labyrinthine terms of service. The difference between China and the United States is not whether surveillance exists, but who conducts it and to what end. In America, corporations collect data for advertising. In China, the state collects data for control.

That difference in purpose determines what kind of AI gets built.

Section C: The "Good Enough AI" Strategy — The Economics of Scale Deployment

The Empty Floor

JD.com's customer service center. On the outskirts of Beijing. Until 2023, this floor held thousands of agents.

Now it is empty.

The desks remain. The chairs remain. Only the computer monitors have been removed. The fluorescent lights stay on because the lease has not expired. The people who filled this space (mostly women in their twenties, from provincial cities, earning 4,000 to 6,000 yuan per month) — where did they go?

JD.com's AI customer service system now handles over 80 percent of simple inquiries. Order confirmation, shipment tracking, exchange and return processing, basic complaint resolution. Response speed is dozens of times faster than a human agent. Operating costs are a fraction of what they were.

This became possible because the performance gap narrowed.

The Gap Narrows to "Months"

In 2023, many analyses pegged the U.S.-China AI performance gap at "two to three years." That assessment became obsolete fast.

According to Epoch AI's data, the average performance gap between China's top models and American frontier models is seven months (a minimum of four months, a maximum of fourteen). Demis Hassabis, CEO of Google DeepMind, described the gap in a 2025 public appearance as "a matter of months."

The global market share of Chinese open-source AI surged from 1.2 percent at the end of 2024 to 30 percent by August 2025. The leading examples are Alibaba's Qwen 3.5, ByteDance's Doubao 2.0, and DeepSeek V4. As of early 2026, these models are rated on par with Western frontier models.

This narrowing of the gap explains the empty floor at JD.com.

AI does not need to be perfect. If it is ten times faster than a human agent, costs one-tenth as much, and succeeds 80 percent of the time, that is sufficient. The remaining 20 percent is still handled by humans. But the number of those humans shrinks from thousands to hundreds.
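The arithmetic behind this claim can be sketched in a few lines. The 80 percent automation share and the one-tenth cost ratio come from the paragraph above. The starting headcount of 3,000 agents and the assumption that human workload scales linearly with the share of inquiries left to humans are illustrative assumptions, not JD.com figures.

```python
def remaining_agents(agents_before: int, automated_share: float) -> int:
    """Agents still needed, assuming workload scales linearly with
    the share of inquiries that humans continue to handle."""
    return round(agents_before * (1 - automated_share))


def relative_cost(automated_share: float, ai_cost_ratio: float) -> float:
    """Total cost relative to an all-human baseline of 1.0.

    ai_cost_ratio is the AI's per-inquiry cost as a fraction of a human's.
    """
    return automated_share * ai_cost_ratio + (1 - automated_share)


agents_left = remaining_agents(3000, 0.80)  # 3000 is a hypothetical headcount
cost = relative_cost(0.80, 0.10)            # AI costs one-tenth per inquiry
print(agents_left, round(cost, 2))          # prints: 600 0.28
```

Under these assumptions, headcount falls from 3,000 to 600 and total cost to about 28 percent of the all-human baseline. The AI never has to match a human on the hardest 20 percent of cases for the economics to work.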

This cost calculation also factors in infrastructure expenses. China's data center electricity rates are less than half of America's. The net power capacity China added in 2024 alone was 430 GW — over fourteen times the 30 GW added in the United States that year (National Energy Administration / Jefferies). The cost of training and running inference on AI models is structurally lower. This means "good enough AI" can be deployed at lower cost. The energy cost advantage is a product of policy in the short term, but in the long term it becomes a structural foundation for AI competitiveness.

This is the strategic meaning of "Good Enough AI." The phrase does not denote inferior performance. It describes a strategic choice that prioritizes cost-performance ratio and deployment speed over absolute capability.

Kai-Fu Lee, chairman of Sinovation Ventures and former head of Google China, identified the turning point in an April 2025 interview with DigiTimes. "Nine months ago, I said China was still waiting for its 'ChatGPT moment.' Today, that moment has arrived. It is the 'DeepSeek moment.'" He designated 2025 as "the year Chinese AI applications reach world-class level." His strategic message fit into four words: "Make AI work." — Kai-Fu Lee, DigiTimes interview, April 2025

On the structural division of labor in U.S.-China AI competition, Lee maintained a consistent diagnosis. "The U.S. is strong in research, China is strong in execution. Chinese companies have always been exceptional at application. WeChat is far better than WhatsApp. China's ride-hailing and grocery delivery services are better than America's." — Kai-Fu Lee, National Committee on U.S.-China Relations (NCUSCR) podcast, 2025

The Execution-to-Design Trajectory

The execution-to-design trajectory, traced in Chapter 8, applies directly here. Germany began by "executing" British technology and eventually overtook Britain in organic chemistry and the electrical industry. The United States created the Transformer architecture and GPT-4; China maximizes applications on top of those principles — payment systems, surveillance networks, recommendation algorithms, customer service. Whether China's scale of execution can generate a similar leap into the design layer is the conditional question.

If China breaks through chip sanctions and acquires the capability to train frontier models, the application experience accumulated through execution can convert into design capability. If sanctions persist and the bottleneck in frontier development holds, China may become locked into a position that is strong in application but structurally lagging in design. Which direction prevails depends on the semiconductor supply chain and the pace of algorithmic innovation.

Temu and TikTok: The Globalization of Execution

Two numbers demonstrate the deployment scale of "good enough AI."

Temu's global e-commerce market share: 24 percent. Tied with Amazon. Three years after entering the U.S. market in 2022.

TikTok is the number-two app on the U.S. App Store in 2025.

Neither Temu nor TikTok calls itself an AI company. But the principle on which these platforms operate is AI. Temu's pricing algorithm, product recommendation system, supplier matching engine. TikTok's content recommendation, ad targeting, retention optimization. All powered by AI.

Are these AIs "GPT-5 level"? No. But for their purpose — keeping consumers on the platform, converting attention into purchases, and bringing them back — they are "good enough." And that "good enough" captured 24 percent of the global e-commerce market and the number-two spot on the U.S. App Store.

The paradox of Chinese AI application services gaining share in the United States reveals a structure in which America maintains its lead at the design layer (chips, model architecture) while China encroaches at the execution layer (user data, consumer behavior capture).

This is not a simple app competition. It is the process by which Chinese algorithms are redesigning the attention and wallets of American consumers.

Section D: Global Expansion — Chinese AI Spreading Despite Sanctions

Only the Algorithm Remained

In April 2024, the U.S. Congress passed the final version of the TikTok "divest or ban" bill, and it was signed into law. If ByteDance did not sell its U.S. operations within 270 days, a deadline that fell in January 2025, the app would be removed from app stores.

ByteDance sat down at the negotiating table. It discussed a joint venture structure with American investors. Americans would control the board. Servers would be in the United States. User data would be managed by a U.S. entity.

But ownership of the algorithm was never separated. The core code of the content recommendation engine, the user behavior analysis models, the retention optimization systems: all of these remained classified as "Chinese-side intellectual property" during negotiations. American users' data would be stored on American servers, but the brain analyzing that data would still belong to ByteDance.

The U.S. government attempted to "expel the Chinese algorithm." What was actually expelled was the surface.

This structure is possible because of the nature of software. Move a factory, and the machinery follows. Move the servers, and the algorithm is still the algorithm. As long as intellectual property rights are separated, physical relocation produces only the illusion of control.

A $33 Billion Digital Trade Surplus

In 2025, China's digital services trade surplus reached $33 billion, an all-time high. The number is not a simple economic indicator. It is the structural paradox of a sanctioned nation running a surplus in digital space.

Hardware sanctions work. NVIDIA H100s are not going to China. ASML's EUV equipment is not going to China.

Software is different. TikTok's recommendation algorithm crosses borders. Temu's AI pricing engine crosses borders. SHEIN's real-time fashion trend analysis system crosses borders. While these services run on the screens of consumers in America, Europe, and Southeast Asia, those consumers' behavioral data flows back in the opposite direction.

Temu sells cheap goods to American consumers. Simultaneously, it collects their consumption pattern data. The algorithm of its parent, PDD Holdings (the company behind Pinduoduo), improves with that data. The improved algorithm produces more accurate product recommendations. Consumers buy more. More data accumulates.

If the goal of sanctions is to limit the capability of Chinese AI, at least at the software layer, that goal is not being achieved.

Two Kinds of Export

China's global AI expansion follows two paths.

The first is B2C services. TikTok, Temu, SHEIN. They engage consumers directly and collect consumer data. To block this path, you must ban the service itself. And bans generate consumer resistance. The evidence: just before the ban, American users mass-installed TikTok and staged protest migrations to other platforms. Politically prohibiting a service with 170 million American users invites legal conflict and public backlash simultaneously.

The second is B2G infrastructure export. Hikvision and Dahua's surveillance systems, Huawei's 5G equipment, Alibaba's smart city platforms. This path provides developing-country governments with cheap infrastructure while embedding Chinese technology standards into their digital backbone.

The second path carries deeper long-term implications. Once infrastructure is laid, it is difficult to replace. A telecommunications network running on Huawei 5G continues to require Huawei-compatible equipment. A city that adopts Alibaba's smart city platform remains within the Alibaba ecosystem. This is lock-in.

China's digital infrastructure export strategy engineers this lock-in by design. The first contract comes at a low price; revenue follows from upgrades and maintenance. This structure is identical in form to American Big Tech's cloud strategy. Lay the platform first, create dependency, then capture value.

Data advantage self-reinforces through infrastructure export. The more countries where Chinese companies install surveillance cameras, the more real-world facial recognition data accumulates. The more data accumulates, the more sophisticated the AI models become. More sophisticated AI delivers higher performance at lower cost. Lower cost drives adoption in more countries. This cycle operates at the national level.

But even this cycle encounters friction. As of 2025, some of the more than 75 countries that have adopted Chinese AI surveillance systems face simultaneous pressure from human rights organizations, domestic citizen backlash, and American diplomatic leverage. Lock-in is powerful, but it carries political cost. When a regime changes after infrastructure is already installed, a different story begins. The partial removal of Chinese surveillance systems in Malaysia and several Brazilian cities sets that precedent.

Volume 1 Connection: The Execution Layer vs. the Design Layer at National Scale

In Volume 1, we dissected one factory into two floors: the design layer, where Arkwright's machines were conceived and systems built, and the execution layer, where handloom weavers operated them. The design layer captured an ever-greater share of value; the execution layer was either replaced or saw its wages decline.

We now extend this dichotomy to the national scale.

The United States currently dominates the design layer of AI. The Transformer architecture, the CUDA ecosystem, the GPT family of models, cloud infrastructure. America is the side that creates the operating principles and rules of AI. NVIDIA, the maker of GPUs, is an American company. The frameworks — PyTorch, TensorFlow — originated in the United States. Control of this design layer is what makes export restrictions possible. The side that designs gets to decide what flows out and what is blocked.

China is strong at the execution layer of AI. It deploys AI into applications the fastest, the widest, and the cheapest. Approximately 700 million cameras, 800 million Douyin users, payment data from 1.4 billion people. The experience accumulated from real-world deployment at this scale cannot be replicated by laboratory research.

Volume 1 noted that this dichotomy is a simplification.

Germany started by "executing" British technology. By 1913, Germany's share of global manufacturing reached 14.8 percent, surpassing Britain's 13.6 percent. Germany took 90 percent of the global organic chemistry market. It began with execution, but once sufficient scale and time accumulated, design capability emerged.

The signals are visible in China today. DeepSeek R1 achieved frontier-level performance at 40 percent lower computational cost. This is not mere application. It is the design of an efficient training methodology. Training cost innovation is not a product of the execution layer. It is entry into the design layer.

"Those who use it most eventually come to understand it best." Whether this proposition applies to AI remains an open question. Pretending to hold a closed answer is not the analyst's role. Specifying the conditions precisely is.

Condition 1: If China secures the computing power required to train frontier models, there is a possibility that execution experience converts into design capability.

Condition 2: If semiconductor sanctions persist and the computing gap holds, China may become locked into a position that is strong in application but structurally constrained in architecture design.

Condition 3: If China sources computing through its own AI accelerators (Huawei's Ascend series or Cambricon/Hanwuji), the effect of sanctions is partially offset.

How these three conditions resolve determines whether "mass deployment of good enough AI" leads to a leap into the design layer or not.

In Volume 1, the handloom weavers did not foresee that machines would replace their skills. Today's question runs in the opposite direction. The United States knows full well that China's scale of execution may, at some point, generate design capability. That is why export controls exist.

How effective those controls are is the subject of Chapter 13.

Transition: Where the Displaced Stand

Return to the empty floor of the JD.com customer service center.

Behind the statistic that AI chatbots handle 80 percent of simple inquiries lies another number. The thousands who worked on that floor: where did they find new jobs, and at what wages? While the data of 1.4 billion people trains AI, some of those 1.4 billion are being replaced by the very AI they trained.

Behind "the country that is strong in AI application" stands "the country of people displaced by AI."

Volume 1's core formula operates here as well. Technological innovation leads to capital concentration, which leads to social instability, which leads to institutional redesign. In Industrial Revolution-era Britain, the interval during which more than two-thirds of productivity gains accrued to capital is known as Engels' Pause. Today, platform GMV explodes while the incomes of the people who once powered those platforms stagnate or fall. A Chinese edition of Engels' Pause is unfolding.

In the world's factories, in customer service centers, in job postings: the people pushed out. Those carrying master's degrees and climbing onto delivery scooters.

These are China's displaced.

Next chapter — Ch. 10: China's Displaced — 996, Tang Ping, and the Age-35 Crisis

Citations and references for this chapter can be found in the Vol. 2 bibliography.