[This time around, the article itself is the prompt - an experiment where both human readers and AI models are invited to critique and expand on this analogy. Please leave any reactions below, or copy-paste away to your chatbot of choice: GPT, Gemini, Claude.]
Why the heck am I, a non-US citizen, sitting in Barcelona, writing about US elections on a blog about AI prompts? Aren't I far enough from all this not to care about the outcome? And, as the great John Mearsheimer says, aren't these two parties Tweedledee and Tweedledum when it comes to their impact on the rest of us plebs living outside the greatest country in the world?
Well, I had to write not about the US elections themselves, but about an analogy they perfectly crystallized: what if US democracy is just a giant transformer architecture, processing the messy embeddings of societal discourse through hidden layers until it all collapses into a single output token - the president?

Fair warning: I'm about to venture into some seriously ambitious pseudo-intellectualism here. But before you dismiss me as just another tech person seeing AI patterns everywhere, let me break down this parallel that keeps tugging at my neural networks:
Initial Embeddings :: Political Issues: Take the immigration discourse. It begins as a complex tangle of border statistics, economic impacts, humanitarian concerns, and that video of migrants crossing the Rio Grande that your uncle won't stop sharing. Raw, multidimensional, messy.
Nodes :: Voters: Each voter receives these signals, processes them through their own context (that Thanksgiving dinner argument about "invasion" versus "humanitarian crisis"), and passes on transformed versions of these embeddings.
Attention Mechanisms :: Communication Channels: Fox News, MSNBC, Twitter echo chambers, Trump rallies, Democratic fundraising emails - each one an attention layer that selectively amplifies certain dimensions of the embedding space while dampening others. Your uncle's Rio Grande video gets more weight in some channels, the humanitarian crisis narrative in others. And just as in transformer architectures, some embeddings get progressively weakened through the layers until they effectively disappear from the final output.
Final Output Token :: Electoral Choice: All these transformed embeddings, processed through multiple layers of attention, ultimately have to collapse into a single binary choice. It's like trying to compress the entire works of Dostoevsky into a single emoji - you're going to lose some nuance along the way.
The fascinating (or terrifying) part? Just as language models can generate surprisingly coherent text from this process of dimensional reduction and transformation, democracy somehow manages to produce functional governance from this massive compression of political complexity into a single token: 🐘 or 🫏.
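To make the plumbing behind all this a bit more concrete, here's a minimal toy sketch in Python - random numbers, nothing trained, purely illustrative - of a few "issue" embeddings passing through a single attention layer and then being forced through a two-token output head:

```python
# Toy sketch of the analogy, not a real transformer.
# Four "issues" as random embeddings, one self-attention layer as the
# "channels", then a forced collapse into exactly two output tokens.
# All numbers are made up; nothing here is trained on anything.
import numpy as np

rng = np.random.default_rng(0)
issues = ["border stats", "economy", "humanitarian", "uncle's video"]

X = rng.normal(size=(4, 8))                   # raw issue embeddings (4 issues x 8 dims)

# One self-attention head: queries, keys, values from the same embeddings.
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(8)                 # how much each issue attends to the others
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
mixed = attn @ V                              # issues re-weighted by the attention layer

for name, row in zip(issues, attn.round(2)):  # some issues dominate, others fade
    print(f"{name:>14}: {row}")

# Pool everything into one context vector and project onto two tokens.
context = mixed.mean(axis=0)
W_out = rng.normal(size=(8, 2))               # the output head: 🐘 vs 🫏
logits = context @ W_out
probs = np.exp(logits) / np.exp(logits).sum()
print(dict(zip(["🐘", "🫏"], probs.round(2))))
```

The numbers are noise; the shape of the pipeline is the point. Whatever richness lives in those four embeddings has to survive the attention weighting and then squeeze through a two-token bottleneck.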
Like language models that don't create bright and shiny new information but rather recombine existing patterns¹, democracy isn't meant to produce utopia² - it's a massive information-processing system for preventing dystopia. Its “genius” lies not in transforming our world, but in keeping us from destroying the one we have while ideas naturally evolve through generational change.
The real transformation happens outside the democratic transformer architecture: in university halls where young minds collide, in garage startups where old paradigms get disrupted, in protest movements where new social contracts get drafted. Democracy just tries its best to help us all survive long enough for these external innovations to reshape the system itself.

And if you think I'm just playing with abstract metaphors here, even a cursory look at how things turned out for the Democrats grounds the metaphor firmly:
Their attention mechanisms (mainstream media, academic discourse, urban social networks…) became increasingly self-reinforcing, effectively creating an embedding echo chamber.
When they tried to propagate messages about their priorities (climate change, social justice, institutional threats…), these embeddings kept getting weakened or lost as they passed through layers of voter attention that were more strongly tuned to other social and cultural concerns.
Ultimately, it looked like their attention mechanisms had been trained on a different dataset than the one they were trying to process³.
The parallels don't end there either - both training and prompting lend themselves eerily well to expanding this analogy.
Training: Think of how fine-tuning maps onto historical political realignments. In the 60s, the American Civil Rights movement essentially retrained the entire political model: phrases like "states' rights" and "law and order" got entirely new embeddings. The old training data of segregation was effectively deprecated, and the model's weights were fundamentally rewritten to process race issues in an entirely new way. It was, in effect, a massive fine-tuning operation that permanently transformed how the democratic transformer processes race, rights, and political identity.
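As a toy illustration of what that retraining might look like mechanically - with made-up vectors and a deliberately crude update rule - imagine the phrase's embedding being nudged, example after example, toward the new contexts it now appears in:

```python
# Toy sketch of "fine-tuning" one phrase embedding: repeatedly nudge the
# vector toward the new contexts the phrase now appears in. The vectors,
# dimensions, and update rule are all made up for illustration.
import numpy as np

rng = np.random.default_rng(42)
phrase = rng.normal(size=8)                   # pre-realignment embedding of "states' rights"
print("before:", phrase.round(2))

# New "training data": contexts the phrase shows up in after the shift.
new_contexts = rng.normal(size=(200, 8)) + np.array([2.0, 0, -1.0, 0, 0, 1.0, 0, 0])

lr = 0.05
for ctx in new_contexts:                      # crude SGD on the squared distance to each context
    phrase -= lr * 2 * (phrase - ctx)

print("after: ", phrase.round(2))             # the phrase now lives in a different region
```

Same phrase, same vocabulary, different weights - which is the whole point of the civil rights example.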
Prompting: Then there's the dark art of prompt engineering in politics. Just as AI researchers discover that slight variations in prompt structure can dramatically alter model outputs, political operatives have long known that framing determines response. Take Brexit's masterful prompt engineering: "Take back control" filled the context window of the debate with an entirely different set of embeddings than what the decision was actually about. Suddenly, complex questions about trade agreements, freedom of movement, and economic interdependence collapsed into a binary choice between "control" and "submission".
So it's not just rhetoric; it's prompt optimization that fundamentally alters how political meaning propagates through society's attention layers.
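If you want the framing effect as a toy calculation rather than a rhetorical claim, here's a deterministic sketch with invented scores: the "facts" never change, only the attention each framing gives them, and the collapsed answer flips:

```python
# Toy sketch of framing as prompt engineering: identical "facts", two
# framings that weight those facts differently, two different collapsed
# answers. Every number below is invented for illustration.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

#                          trade, movement, sovereignty, economy
leave_scores  = np.array([-0.5,   0.3,      2.0,        -1.0])  # how much each fact favours Leave
remain_scores = np.array([ 1.0,   0.8,     -0.5,         1.5])  # how much each fact favours Remain

def verdict(attention_over_facts):
    """Weight the facts by the attention a framing gives them, then collapse."""
    logits = np.array([attention_over_facts @ leave_scores,
                       attention_over_facts @ remain_scores])
    p = softmax(logits)
    return {"leave": round(float(p[0]), 2), "remain": round(float(p[1]), 2)}

# "Take back control": almost all attention on sovereignty.
print(verdict(np.array([0.05, 0.10, 0.80, 0.05])))
# A framing centred on trade and the economy.
print(verdict(np.array([0.40, 0.10, 0.05, 0.45])))
```

With these invented numbers, the first framing lands at roughly 85/15 for "leave" and the second flips it to roughly 16/84 - same facts, different context window.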
If you've made it this far into my pseudo-intellectual musings, you're probably either nodding along or itching to point out where this analogy falls apart. Even better - if you think there's something to this parallel, pick your favorite political realignment moment and map its components to transformer mechanics. Paint the full picture: from initial embeddings through attention flows to final clustering.
¹ Yes, I am with François Chollet and Yann LeCun on this one: you cannot produce brand-new information from linear transformations of the old. Prior to a Swiss patent clerk's incredible intuition about the nature of our reality, the idea of spacetime existed nowhere, and no linear combination of the Newtonian universe was going to end up with gravity bending it. This talk with F. Chollet would be an effective way of “productive” procrastination if this topic is of interest to you.
² Mandatory footnote on the etymology of “utopia”: mid-16th-century wordplay based on Greek: (1) ou ‘not’ + topos ‘place’ - in short, no-place; (2) eu ‘good’ + topos ‘place’ - in short, good place.
³ David Brooks does an amazing job capturing this with words in this (paywalled) article. This data from Patrick Flyn gives a good data-driven angle to Brooks's analysis.



