Designing an Ontology for Formula 1

Designing an Ontology for Formula 1: why LLMs struggle with deep, structured questions, and how a domain‑specific ontology can model teams, drivers, cars, results, and evolving regulations to answer nuanced, historically grounded questions that general‑purpose tools cannot.

Photo by Steffen Prößdorf

In recent years, the rapid rise of Large Language Models (LLMs) has transformed how people search for, generate, and interpret information. Their capacity to synthesize knowledge across vast domains has brought new levels of accessibility and efficiency to everyday research and professional work. From summarizing academic articles to generating natural-sounding answers in seconds, LLMs represent a remarkable step forward in human–machine interaction.

However, while LLMs excel at breadth, they often stumble at depth. The same generative fluidity that makes them useful for exploring broad topics becomes a liability when precision truly matters. In domain-specific or data-sensitive contexts, such as scientific research, technical analytics, or historical documentation, the accuracy of an answer depends not just on linguistic fluency but on the system’s understanding of structured relationships between entities. A question as simple as “Who was the team principal with the most Formula 1 World Championship titles?” illustrates this tension: without formalised definitions of teams, seasons, and roles, even powerful models can produce inconsistent results.

This is where ontologies and knowledge graphs still retain their importance. Unlike LLMs, which reason over distributed statistical patterns, ontologies offer an explicit, structured representation of knowledge—a way to define and relate concepts systematically across data sources. During my master’s thesis, I worked on Knowledge Graphs and Multi-Source Ontologies in the domain of cooking, exploring how structured semantic relationships could clarify ambiguities and improve automated reasoning. That experience revealed that ontologies are not simply technical artifacts; they are epistemic frameworks that help preserve meaning, trace data provenance, and ensure interpretability over time.

Even as LLMs dominate discussions about knowledge systems, ontologies continue to serve a crucial complementary role. They provide the semantic structure that models, researchers, and information systems can rely on for consistency, interoperability, and long-term integrity. In a future filled with generative AI tools, ontologies remain the ground truth—the quiet infrastructure ensuring that information, in all its complexity, stays connected to the real structures of the world. This is the context in which the central questions of this piece arise: what should an ontology for Formula 1 look like, and why design one in the first place?

A Motorsport Historian’s Motivation

Earlier this year, I began discussing with a few friends the idea of creating a motorsport‑dedicated search engine, powered not by broad natural-language generation but by structured, interconnected data. Every relevant entity—Drivers, Teams, Manufacturers, Engine Suppliers, Team Principals, and more—would be formally modelled, along with the intricate relationships linking them. Reactions ranged from cautiously intrigued to mildly confused, and nearly everyone asked the same question: “Why not just use Wikipedia?”

Wikipedia is a cornerstone of modern open knowledge. Its greatest strengths are its vast coverage and the taxonomic relationships that can be abstracted about specific entities: millions of articles supported by extensive cross-linking and citations, often backed by references to official records and trusted authorities. Yet this scale also introduces structural fragility. Maintaining accuracy across such an enormous corpus requires constant contributor effort, ongoing validation, and continual updates.

By contrast, official data sources such as Formula 1’s own databases typically deliver high reliability and precision, but lack the flexibility and cross‑referential richness that open knowledge graphs provide. Between these two extremes lies the opportunity that motivates a dedicated F1 ontology: a domain-specific schema capable of combining the rigor of official records with the relational depth of a semantic knowledge graph. Such a system could model not only historic and technical data but also the contextual and temporal relationships that define the sport’s evolution—how teams, drivers, regulations, and organisations have interacted over decades. This is the why behind an ontology for Formula 1: to support questions that general-purpose tools and ad‑hoc databases are structurally ill‑equipped to answer.

What Should an F1 Ontology Answer?

Before deciding how to model Formula 1, it is necessary to clarify what kinds of questions the ontology should be able to answer and, at a more metaphysical level, what kind of “thing” this ontology aspires to be. Ideally, it should support questions that almost nobody has explicitly documented before. Examples include computing the total number of wins for “Team Enstone” across its various incarnations (from Toleman in the early 1980s through Benetton and Renault to the Alpine era), or identifying which driver has the most points finishes or the most top‑ten finishes over a given period.

These considerations lead naturally to a pair of guiding questions:

  1. What kinds of questions and answers should the ontology support? 
  2. How specific should the data be, and to what extent should we allow for “information leakage” across eras, entities, and naming changes (for example, treating Enstone as a continuous identity despite commercial rebrandings)? 

The answers will determine the ontology’s scope, its granularity, and its philosophical stance toward identity and continuity in motorsport history.

Scope, Sources, and “Good Enough” Data

Among current Formula 1 teams, Ferrari stands out for the way it presents its heritage. Its official site dedicates entire sections to its drivers and cars, with rich historical and technical detail that reflects the team’s long‑standing status and sense of identity in the sport. Dedicated pages for individual cars and champions provide a useful benchmark for the kind of information available in a best‑case scenario.

Consider Ferrari’s 2024 car, the SF‑24. Technical details such as the maximum MGU‑K power of the ERS or the maximum fuel flow rate illustrate what “ideal” data looks like: a primary source that lays out key specifications in a structured, human‑readable format. For ontology design, this suggests how deeply one could model a car’s technical state when authoritative, granular information is available.

However, most teams do not maintain such neatly organised, historically consistent technical archives on their public websites. For many cars and seasons, information must be pieced together from secondary and tertiary sources—media articles, fan databases, technical analyses, or partial documentation. This creates a tension: should a motorsport ontology aim for maximal completeness by incorporating “good enough” secondary data, or should it restrict itself to only the most authoritative—and therefore sparser—sources? Framed differently: is it better to have a richly detailed but occasionally uncertain model, or a minimal but rock‑solid one? The answer will shape not only the ontology’s structure and data ingestion pipeline, but also the kinds of claims that users can reasonably trust it to support.
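One way to avoid choosing between the two extremes is to store every fact together with the tier of its source, and let each query decide how strict to be. The sketch below is a minimal illustration under that assumption. The names `SourceTier`, `Claim`, and `trusted_claims` are hypothetical, and the records are placeholders rather than real specifications.

```python
from dataclasses import dataclass
from enum import IntEnum


class SourceTier(IntEnum):
    """Lower number = more authoritative (hypothetical three-level scheme)."""
    PRIMARY = 1    # e.g. a team's own publication
    SECONDARY = 2  # e.g. media articles, technical analyses
    TERTIARY = 3   # e.g. fan-maintained databases


@dataclass(frozen=True)
class Claim:
    """A single attribute value plus where it came from."""
    subject: str
    attribute: str
    value: str
    tier: SourceTier
    source: str


def trusted_claims(claims, max_tier):
    """Keep only claims whose source is at least as authoritative as max_tier."""
    return [c for c in claims if c.tier <= max_tier]


# Placeholder records: subjects and values are illustrative, not real data.
CLAIMS = [
    Claim("car-A", "max_fuel_flow", "<value>", SourceTier.PRIMARY, "team website"),
    Claim("car-B", "max_fuel_flow", "<value>", SourceTier.SECONDARY, "magazine article"),
    Claim("car-B", "wheelbase", "<value>", SourceTier.TERTIARY, "fan database"),
]

strict_view = trusted_claims(CLAIMS, SourceTier.PRIMARY)       # sparse but solid
permissive_view = trusted_claims(CLAIMS, SourceTier.TERTIARY)  # rich but less certain
```

With provenance kept on every claim, the "minimal but rock‑solid" and "richly detailed" models become two views over the same data rather than two separate designs.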

Identity, Evolution, and Regulations

The problem of identity over time appears quickly. Asking “How many wins did BMW have as an engine supplier?” immediately raises another question: should that count implicitly include wins where BMW was also the constructor, or should the ontology distinguish strictly between BMW‑as‑engine‑supplier and BMW‑as‑team/manufacturer and require the user to specify which they mean?
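The ambiguity in that question can be made concrete by recording the role an organisation played in each race entry. The following is a rough sketch, assuming a hypothetical `Participation` record and a `Role` enumeration. The organisations and races are placeholders, not historical results.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    CONSTRUCTOR = "constructor"
    ENGINE_SUPPLIER = "engine_supplier"


@dataclass(frozen=True)
class Participation:
    """One organisation acting in one role for one race entry."""
    race_id: str
    organisation: str
    role: Role
    won: bool


def wins_in_role(participations, organisation, role):
    """Return the set of race ids won by `organisation` while acting in `role`."""
    return {
        p.race_id
        for p in participations
        if p.won and p.organisation == organisation and p.role is role
    }


# Placeholder records, not historical results.
SAMPLE = [
    Participation("race-1", "OrgA", Role.ENGINE_SUPPLIER, True),
    Participation("race-1", "OrgB", Role.CONSTRUCTOR, True),
    Participation("race-2", "OrgA", Role.ENGINE_SUPPLIER, True),
    Participation("race-2", "OrgA", Role.CONSTRUCTOR, True),
    Participation("race-3", "OrgA", Role.CONSTRUCTOR, False),
]

as_supplier = wins_in_role(SAMPLE, "OrgA", Role.ENGINE_SUPPLIER)
as_constructor = wins_in_role(SAMPLE, "OrgA", Role.CONSTRUCTOR)

# Reading 1: every win with an OrgA engine, including OrgA's own cars.
inclusive_supplier_wins = len(as_supplier)
# Reading 2: only wins where OrgA supplied an engine to another constructor.
customer_only_wins = len(as_supplier - as_constructor)
```

Both readings fall out of the same records, so the ontology does not have to pick one. It only has to make the role explicit and let the query state which reading it wants.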

Similarly, how should the ontology represent evolution over time? The oft‑cited example is “Team Enstone”: the same physical operation and core staff evolving through different commercial identities (Toleman, Benetton, Renault, Lotus, Alpine) across decades. Should these be modelled as a single enduring entity with changing names and ownership attributes, or as distinct teams linked by a continuity relation such as successorOf or inheritsInfrastructureFrom? The choice determines whether a query like “How many wins does Team Enstone have?” is even well‑posed, or whether it must instead be decomposed into wins per legal entity and then recombined according to an explicit notion of historical continuity.
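The second option, distinct identities joined by a continuity relation, can be sketched in a few lines. This is an illustration only. `TeamIdentity`, `lineage`, and `lineage_wins` are hypothetical names, the chain simply follows the order of identities listed above (a fuller model would key identities by period rather than by name), and the win counts are placeholders, not historical figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TeamIdentity:
    """One commercial identity; successor_of names its direct predecessor."""
    name: str
    successor_of: Optional[str] = None


def lineage(teams, latest):
    """Walk successor_of links from `latest` back to the earliest identity."""
    by_name = {t.name: t for t in teams}
    chain, seen = [], set()
    current = latest
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in successor_of links at {current!r}")
        seen.add(current)
        chain.append(current)
        current = by_name[current].successor_of
    return list(reversed(chain))  # oldest identity first


def lineage_wins(teams, wins_by_identity, latest):
    """Sum per-identity wins over the whole continuity chain."""
    return sum(wins_by_identity.get(name, 0) for name in lineage(teams, latest))


ENSTONE = [
    TeamIdentity("Toleman"),
    TeamIdentity("Benetton", successor_of="Toleman"),
    TeamIdentity("Renault", successor_of="Benetton"),
    TeamIdentity("Lotus", successor_of="Renault"),
    TeamIdentity("Alpine", successor_of="Lotus"),
]

# PLACEHOLDER counts for illustration only; these are NOT historical win totals.
PLACEHOLDER_WINS = {"Toleman": 1, "Benetton": 2, "Renault": 3, "Lotus": 4, "Alpine": 5}

enstone_chain = lineage(ENSTONE, "Alpine")
enstone_total = lineage_wins(ENSTONE, PLACEHOLDER_WINS, "Alpine")
```

Keeping wins attached to each identity and aggregating only on request means the "Team Enstone" total is always traceable to an explicit continuity relation. The per-entity answer stays available for anyone who rejects that notion of continuity.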

A parallel challenge arises with technical and points regulations. Rule sets define what counts as a valid car, a legal result, or a points finish in a given era, and those definitions shift over time: turbo bans, hybrid introductions, changing points systems, sprint races, and so on. Should we model these regulations at all and, if so, how?

Deciding whether to encode regulations as first‑class entities, attach them as temporal qualifiers to seasons and events, or treat them as implicit background assumptions will determine how precisely the ontology can answer questions like “Who scored the most points under the pre‑2010 system?” or “Which constructors won titles across multiple technical eras?”
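As a rough sketch of the first option, a points system can be modelled as an entity with its own validity period, so that a finishing position is always scored under the rules of its season. The class and function names here are hypothetical. The two point tables are the standard race scales used from 2003 to 2009 and from 2010 onward, and the sketch deliberately ignores sprint races, fastest‑lap points, and half‑points races.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PointsRegulation:
    """A points system treated as a first-class entity with a validity period."""
    label: str
    first_season: int
    last_season: Optional[int]  # None means still in force
    points_by_position: Tuple[int, ...]

    def applies_to(self, season):
        ends_ok = self.last_season is None or season <= self.last_season
        return self.first_season <= season and ends_ok

    def points_for(self, position):
        """Points for a classified finishing position (1 = winner)."""
        if 1 <= position <= len(self.points_by_position):
            return self.points_by_position[position - 1]
        return 0


# Only two eras are listed; earlier systems are omitted from this sketch.
REGULATIONS = [
    PointsRegulation("top-8 scale", 2003, 2009, (10, 8, 6, 5, 4, 3, 2, 1)),
    PointsRegulation("top-10 scale", 2010, None, (25, 18, 15, 12, 10, 8, 6, 4, 2, 1)),
]


def regulation_for(season, regulations=REGULATIONS):
    """Return the single regulation in force for `season`, or raise LookupError."""
    matches = [r for r in regulations if r.applies_to(season)]
    if len(matches) != 1:
        raise LookupError(f"expected one points regulation for {season}, found {len(matches)}")
    return matches[0]


def race_points(season, position):
    """Score a finishing position under the rules of its own season."""
    return regulation_for(season).points_for(position)


win_2009 = race_points(2009, 1)     # scored under the top-8 scale
win_2010 = race_points(2010, 1)     # scored under the top-10 scale
ninth_2009 = race_points(2009, 9)   # outside the points before 2010
```

Because each regulation carries its own season range, a question such as "most points under the pre‑2010 system" becomes a filter on the regulation entity rather than a hard‑coded year check scattered through the queries.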

These are, in essence, the central questions behind Designing an Ontology for Formula 1. The “what” is a carefully scoped, semantically precise model of teams, drivers, cars, results, and rules over time; the “why” is the belief that only such a structure can truly support the kinds of nuanced, historically aware questions that make Formula 1 such a rich domain to study.

These are questions that will not be settled in a single pass, but explored iteratively. The long‑term ambition is to build a living ontology for Formula 1, and eventually for motorsport more broadly, that can grow, refine itself, and remain faithful to the sport’s evolving history.