Case study: LLM tokenization | Capital for Tech Workers

Case study: LLM tokenization and commoditization

The term commoditization is increasingly used in the discourse around LLMs.¹ When people say that these models are becoming commoditized, they usually mean that LLMs are becoming interchangeable and undifferentiated in the marketplace, so that prices fall as competition between providers intensifies. But we should not confuse this kind of commoditization with Marx’s concept of commodification. The two are related, but they are not the same.

To illustrate this, let’s consider the token, the unit in which LLMs process their input and produce their output. Tokenization chops text into quantifiable pieces that an LLM can process, and these pieces don’t necessarily correspond to words in the language being represented. For instance, the word ‘token’ might become a single token, whereas the word ‘commoditization’ might be chopped into two (‘commod’ + ‘itization’). In contemporary services that make LLMs available, such as Anthropic’s Claude Code or Google’s Gemini, token count is used as the unit of measure that determines what the service costs for a period of use. The more tokens processed and generated in a month, the more dollars you pay.
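
To make the ‘chopping’ concrete, here is a minimal sketch of one simple way a tokenizer can split words: greedy longest-match against a fixed vocabulary. The vocabulary (TOY_VOCAB) and the function (tokenize) are hypothetical, hand-picked so that the splits match the examples above. This is a simplification, not the algorithm of any particular provider; real LLM tokenizers learn their vocabularies from large amounts of text and may split these same words differently.

```python
# Toy greedy longest-match subword tokenizer.
# HYPOTHETICAL: TOY_VOCAB is hand-picked for illustration. Real LLM
# tokenizers learn their vocabularies from large corpora and may split
# these same words differently.

# Whole-word and subword pieces, plus every single letter as a fallback
# so that any lowercase word can always be tokenized.
TOY_VOCAB = {"token", "commod", "itization", "ity"} | set("abcdefghijklmnopqrstuvwxyz")


def tokenize(word: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Split a lowercase word into vocabulary pieces, longest match first."""
    pieces: list[str] = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first and shrink it
        # one character at a time until it matches a vocabulary entry.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Only reachable for characters outside the vocabulary.
            raise ValueError(f"no vocabulary entry matches {word[i:]!r}")
    return pieces


for w in ["token", "commoditization", "commodity"]:
    pieces = tokenize(w)
    # The billable quantity is the number of pieces, not the number of words.
    print(w, "->", pieces, "| tokens:", len(pieces))
```

Whatever the vocabulary, the quantity that a per-token price multiplies is the length of the returned list, not the number of words or anything about what they mean.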

This token count doesn’t necessarily bear any meaningful relation to how useful the output actually is to the user. A wrong answer could cost as much as (or even more than) a right one, depending on the token count. In Marx’s terms, the token is the commodity being exchanged in the use of LLMs. The generation of language that LLMs make technically possible is made conceptually exchangeable by means of a specific quantity (the token count, typically priced per million tokens) that is taken to express its social value. The commoditization of LLMs in the first sense, i.e. that prices for these services are being driven down, is a downstream consequence of the token’s originary commodification. When multiple producers (OpenAI, Anthropic, Google) are selling what consumers collectively view as models of very similar quality, the price per token becomes the vector of competition on the open market.
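
The arithmetic behind ‘priced per million’ is simple, and its simplicity is the point: usefulness appears nowhere in it. A minimal sketch, assuming a hypothetical flat rate of $15 per million output tokens; PRICE_PER_MILLION_OUTPUT_TOKENS and cost_usd are illustrative names, not any provider’s API. Actual rates vary by model and provider, and input tokens are typically billed at a separate rate.

```python
# Per-token billing arithmetic.
# HYPOTHETICAL: the rate below is an illustrative round number, not any
# provider's actual price, and input tokens are ignored for simplicity.

PRICE_PER_MILLION_OUTPUT_TOKENS = 15.00  # hypothetical, in US dollars


def cost_usd(output_tokens: int,
             price_per_million: float = PRICE_PER_MILLION_OUTPUT_TOKENS) -> float:
    """Cost of a response billed purely by its output token count."""
    return output_tokens * price_per_million / 1_000_000


# Two hypothetical answers to the same question. Nothing in the formula
# knows which one is correct: only the token counts enter the price.
correct_but_terse = cost_usd(200)
wrong_but_verbose = cost_usd(1_500)

print(f"correct answer: ${correct_but_terse:.4f}")
print(f"wrong answer:   ${wrong_but_verbose:.4f}")
```

Under these assumed numbers the wrong but verbose answer costs more than seven times as much as the correct but terse one, because the function’s only inputs are a count and a rate.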

What makes the token a striking case study of a commodity is that it is a constructed proxy for the value that LLMs produce rather than a natural unit of language (which is what LLMs are usually thought to produce on demand). A token is not a word, a sentence, or any other inherently meaningful linguistic unit. It is an artifact of how LLMs internally represent text, determined by each model’s tokenizer algorithm. What this shows about the commodity form is that it is imposed on the things it claims to represent rather than arising from them, which is another way of saying that it is ‘abstract’ (as in real abstraction). The quantity assigned to a thing doesn’t have to (and arguably never does) naturally belong to it. But this quantity, which brings the thing into the marketplace through the marker of its price, is often collectively treated as if it does, as if it were ‘naturally’ a part of the thing for sale. The token is a unit of computation and a unit of exchange-value, but that does not make it a unit of meaning or of value more generally.

The recent phenomenon of tokenmaxxing exposes the gap between the token as a linguistic/economic proxy and the originary value it is presumed to measure. Tokenmaxxing, now common practice at tech firms such as Meta and OpenAI, centers on company-wide internal leaderboards that track token consumption per employee. An employee’s place on this leaderboard is then treated as a proxy for their contribution to the company, not unlike the way that LOC (lines of code) is sometimes misused as a measure of how much work someone has put in.² While the impact of an employee’s work is what is supposed to ultimately matter, it is much easier to count the tokens or lines of code they produce than to assess the quality and impact those counts are taken to represent. This gap also opens up what we could call token arbitrage, where employees optimize their workflows to produce more tokens at the expense of other forms of value, both to the company and in their own work. (What these other forms of value might be, and how the commodity form is structurally indifferent to them, is a question we will take up later in this textbook, in a chapter on value.) As one industry observer notes, it has become “a career risk to not use A.I. at an accelerated pace, regardless of output quality” [2]. Generous token budgets have even emerged as a workplace perk alongside free meals and gym memberships, with one difference: they are explicitly tied to ‘employee output’, even though how that ‘output’ is measured remains unclear. The LLM token’s commodification (in the Marxian sense), then, doesn’t just describe a suboptimal, abstract form of exchange when it comes to producing value. The token’s form as a commodity produces and reinforces real patterns of thought and behavior. Under capital, workers optimize for what is measurable and exchangeable, because that is what the system rewards (in and as dollars).

Tokenmaxxing, and the commodification of tokens that underlies it, is an emerging instance of that dynamic, and an exemplary case of how Marx’s analysis can (still) help us understand the economic and labor goings-on of the tech world.

Bibliography

[1] E. SooHoo, “Did Elon Musk Really Fire People Using Lines Of Code As His Metric?,” Medium, Nov. 2022, [Online]. Available: https://evan-soohoo.medium.com/did-elon-musk-really-fire-people-using-lines-of-code-as-his-metric-15c17254ed33
[2] K. Roose, “More! More! More! Tech Workers Max Out Their A.I. Use.,” The New York Times, Mar. 2026, [Online]. Available: https://www.nytimes.com/2026/03/20/technology/tokenmaxxing-ai-agents.html
