How TOON (Token-Oriented Object Notation) works, the compact data format that reduces AI prompt tokens by up to 60% compared to JSON.

https://openapi.com/blog/what-the-toon-format-is-token-oriented-object-notation

How TOON Can Change the Way AI “Sees” Data

In the world of modern artificial intelligence—particularly when working with Large Language Models (LLMs)—every token matters. API cost, as well as latency and prompt efficiency, depend directly on how many tokens are sent to the model. In this context, TOON (Token-Oriented Object Notation) emerges as a compact, readable serialization format designed specifically to minimize the number of tokens required to represent structured data.

Instead of using the classic JSON format (or others), TOON “packages” the same information in a leaner, more optimized way while remaining easily readable by humans.

Let’s take a closer look at what TOON is, why it’s useful, how it works, its limitations, and its current status.

Why TOON Matters Now

The Problem with JSON in AI Pipelines

JSON is widely used to represent structured data. However, it has a “verbose” grammar: curly braces { }, brackets [ ], quotes ", commas, indentation… all of this generates extra tokens when sending payloads to an LLM.

When dealing with many uniform objects (for example, a list of users, products, or records), keys are repeated continuously, further increasing token consumption.

The Solution? TOON!

TOON was created precisely to solve this problem: it is a format designed to maximize token efficiency while maintaining human readability and the full semantic structure of JSON. According to several benchmarks, TOON can reduce token usage by 30–60% compared to traditional JSON. This reduction translates into lower API costs, more available space within the model’s context window, and—in some cases—greater accuracy in the model’s understanding of structured data.

At Openapi we are working in the same direction, and soon (in the coming weeks) our APIs will also support the new format.

What TOON Is: Definition and Principles

TOON stands for Token-Oriented Object Notation. It is a text-based serialization format for structured data, designed specifically to be sent to LLMs as input. TOON is:

  • Compact and token-efficient: it eliminates many redundant syntactic elements found in JSON.
  • Schema-aware: it uses explicit declarations of array lengths and fields to make the structure more formal.
  • Human-readable: it remains developer-friendly thanks to YAML-like indentation.
  • Tabular for uniform arrays: for arrays where all objects share the same fields, TOON uses a table-like structure (header + rows) similar to CSV.

According to the official GitHub repository, TOON is lossless with respect to JSON: you can convert JSON → TOON → JSON without any loss of information.

How the TOON Format Works

Basic Syntax

1. Simple Objects

TOON removes curly braces and uses indentation for nesting. 

For example:

id: 123
name: Ada
active: true

This represents the same JSON object:

{ "id": 123, "name": "Ada", "active": true }

2. Nested Objects

Using YAML-like indentation:

user:
  id: 123
  name: Ada

The equivalent JSON representation:

{ "user": { "id": 123, "name": "Ada" } }

3. Arrays of Primitives

TOON declares the length and values inline:

tags[3]: foo,bar,baz 

This is equivalent to

["foo", "bar", "baz"] 

in JSON.

4. Uniform Object Arrays (Tabular Arrays)

This is where the major token savings occur:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

  • users[2]: indicates an array of 2 elements
  • {id,name,role}: the fields that each object contains

The lines below contain comma-separated values, one per object.
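
For comparison, the same two records in standard JSON repeat every key name and add braces and quotes around each object:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

The field names appear once in the TOON header but once per record in JSON, so the gap widens as the number of rows grows.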

5. Delimiter Options

To separate row values, TOON supports multiple delimiters: comma (,), tab (\t), or pipe (|). Using tabs or pipes can offer additional token savings because it reduces the need for quoting or escaping.

6. Key Folding

The TOON specification includes a “key folding” option: if a structure contains chains of single-level “wrapper” keys, they may be represented using dotted paths to save tokens.
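
As an illustrative sketch (not copied verbatim from the spec), a chain of single-key wrappers such as:

data:
  items:
    tags[2]: a,b

could be folded into a single dotted path:

data.items.tags[2]: a,b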

7. String Quoting

Strings in TOON are quoted only when necessary: for example, if they contain the active delimiter, colons :, leading/trailing spaces, control characters, etc.
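
An illustrative sketch, with comma as the active delimiter (the exact rules are defined in the spec):

name: Ada Lovelace
bio: "Engineer, mathematician"
label: "key: value"
padded: " keep these spaces "

The first value needs no quotes; the others are quoted because they contain the delimiter, a colon, or leading/trailing spaces.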

8. Special Type Conversion
  • Numbers: formatted in plain decimal form (i.e., not scientific notation).
  • NaN and ± Infinity become null.
  • BigInt: if within the safe integer range, converted to a number; otherwise represented as a quoted decimal string.
  • Dates: converted to quoted ISO strings.
  • Non-serializable values typically become null (see the sketch below).
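
For example, encoding the JavaScript value { score: NaN, createdAt: new Date("2025-01-01") } would produce roughly the following (a sketch based on the rules above; the exact output depends on the encoder):

score: null
createdAt: "2025-01-01T00:00:00.000Z"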

How TOON Works (For Non-Developers)

For those who are not developers, think of TOON as a leaner “language” for describing data.

When sending a group of similar elements (e.g., a list of users, all with name, age, and role), TOON allows you to declare the fields once (name, age, role) and then provide each row of data. This way, you don’t have to rewrite “name”, “age”, “role” for every user as you would in standard JSON, saving valuable “fuel” (tokens).

If the data is simple (like just a list of tags or words), TOON represents everything compactly: fewer symbols, less “unnecessary punctuation”.

For nested data (e.g., an object inside another), TOON uses indentation (spaces) to show hierarchy, similar to a well-structured document but without heavy brackets.

In short: TOON preserves the logical structure of data while reducing “empty words.”

What Are the Advantages of TOON?

Below are the advantages of using the TOON format:

1. Token Efficiency

Thanks to its minimal syntax, single declaration of keys, and tabular structure, TOON enables an average 30–60% token reduction compared with JSON. These savings are especially significant in LLM workloads involving repetitive tabular data.

2. Higher Accuracy (Schema-Aware Guardrails)

TOON is not merely compact: it is schema-aware. Declaring array length ([N]) and fields ({…}) helps LLMs validate structure more effectively, reducing errors, omissions, and hallucinations when the model must answer questions or reason about structured data. In the official repository, benchmarks show that TOON can achieve higher retrieval accuracy compared to compact JSON.

3. Human Readability

Despite reducing symbols, TOON remains readable for developers thanks to indentation, tabular organization, and very clear syntax. This facilitates debugging, manual prompt writing, and interpretation by prompt engineers.

4. Compatibility and Zero Information Loss

TOON is lossless relative to JSON: any JSON structure can be expressed in TOON and converted back without data loss. Libraries and SDKs exist for encoding (encode) and decoding (decode) between JSON and TOON in various languages (e.g., TypeScript, Elixir, PHP).
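
A minimal sketch of such a round trip, assuming the encode/decode exports of the TypeScript package mentioned below (the exact import path and signatures may differ between versions):

```typescript
// Sketch only: assumes @toon-format/toon exports `encode` and `decode`
// as described in this article; check the official README for the current API.
import { encode, decode } from "@toon-format/toon";

const original = {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" },
  ],
};

const toon = encode(original);      // JSON-compatible object -> TOON text
const roundTripped = decode(toon);  // TOON text -> plain object again

// Losslessness means the decoded object matches the original values and structure.
console.log(JSON.stringify(roundTripped) === JSON.stringify(original));
```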

5. Cross-Language Implementations

TOON is not just a conceptual idea: there are concrete implementations in many languages:

  • TypeScript / JavaScript: the official repository includes a TypeScript SDK 
  • Elixir: TOON implementation for Elixir
  • PHP: TOON port for PHP
  • R: CRAN package to serialize R objects into TOON

More languages continue to emerge thanks to the open-source nature of the specification.

When TOON Is Particularly Useful

TOON provides maximum benefit in certain common scenarios:

  • Lists of similar elements: When many objects share the same fields (e.g., lists of users, orders, products); this is where token savings are most evident.
  • AI prompting: When interacting with an AI model and structured data must be passed, TOON helps the model interpret it better with less “overhead.”
  • Token-sensitive applications: If tokens are billed (as in many AI model APIs), using TOON can significantly reduce costs.
  • Readable data debugging/modification: For those building or refining data, TOON’s readability helps compared to extremely minified JSON.

When Not to Use TOON

It is important to note that TOON is not always the ideal choice. There are cases where other formats may be more efficient:

  • Highly nested or non-uniform structures: If the data contains many levels of nested objects or arrays with variable fields, TOON’s token savings may decrease or even be worse than compact JSON.
  • Simple, purely tabular data: If the data is flat (a simple table) with no nesting, CSV may be more compact than TOON because TOON adds metadata like length declarations and field names.
  • Latency-critical applications: In scenarios where serialization/deserialization time matters more than tokens, compact JSON may be faster even if it uses more tokens.

How to Use TOON in LLM Prompts

Encoding Before Sending

You can use official libraries to convert JSON objects into TOON, for example with the official TypeScript package (@toon-format/toon).

When building the prompt, include the serialized TOON in a code block, for example:

```toon
users[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user
```

This helps the model identify the structure and respond consistently.
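
A hedged sketch of this flow in TypeScript, again assuming the encode export of @toon-format/toon:

```typescript
// Sketch: serialize data to TOON and embed it in a fenced block inside the prompt.
import { encode } from "@toon-format/toon";

const users = [
  { id: 1, name: "Alice", role: "admin" },
  { id: 2, name: "Bob", role: "user" },
  { id: 3, name: "Charlie", role: "user" },
];

const prompt = [
  "Here is the user list in TOON format:",
  "```toon",
  encode({ users }),
  "```",
  'List the names of all users whose role is "user".',
].join("\n");

// `prompt` can now be sent as the user message of a chat-completion request.
```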

Generating TOON from the LLM

If we want the model to generate TOON data, we can:

  • Show a sample TOON header (such as users[N]{…}:)
  • Specify that the model must produce matching rows and the correct [N] value
  • Require that the answer be returned only in the code block and in TOON format

This pattern is helpful because the model doesn’t need to “guess” key names every time—they are already declared.
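
As a hedged illustration (the exact wording is up to you), such an instruction could look like this:

"Answer only with a TOON code block. Use the header users[N]{id,name,role}:, set N to the number of rows you return, and output one comma-separated row per user. Do not write anything outside the code block."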

Simple Tools for Using TOON

You don’t need to be a programming expert to experiment with TOON. There are tools designed for people who have data but don’t want to write complex code:

  • Web converters: There are websites where you can paste data (e.g., JSON or CSV) and obtain an “optimized” TOON version automatically, all in the browser.
  • Interactive playgrounds: Some tools allow you to see in real time how many tokens the format uses, helping you evaluate savings.
  • Secure tools: There are client-side apps (running in the browser) that perform the conversion without sending data to external servers, ensuring greater privacy.

Current Status of the TOON Project and Roadmap

The TOON project is active and open-source. The official GitHub repository includes the specification, code, and benchmarks, and the specification is currently at version 2.0 (working draft). Implementations already exist in many languages, as noted above (TypeScript, Elixir, PHP, R, etc.), and more may arrive. There are also tools and web converters (e.g., ToonParse) to convert JSON → TOON and vice versa on the client side, without sending data to external servers. Benchmarks indicate that TOON not only reduces tokens but may also improve retrieval accuracy when using LLMs with structured data.

Caveats and Caution: What Is Not Yet Confirmed

As highlighted by community comments, many LLMs have not been explicitly “trained” on TOON: their training data likely consisted almost entirely of JSON or other formats. This means that using TOON may require adaptation, and in some cases, the model may respond less optimally if it is not “familiar” with the structure. Although open-source and rapidly evolving, TOON is still relatively new. The specification is in flux, so some implementations may not be fully compatible with each other unless the correct spec version is followed.

In Summary

TOON (Token-Oriented Object Notation) represents a significant innovation in data serialization for LLM applications. Thanks to its compact, human-readable, and schema-aware structure, it can drastically reduce the number of tokens compared to traditional JSON (often by as much as 60%) while preserving full semantic meaning and data structure.

However, it’s important to stay realistic: not all scenarios are suited to TOON yet. At present, TOON is a complement to JSON—useful when token savings provide economic or technical value, but not yet a universal replacement.

If you are a developer working with LLMs, TOON is worth exploring: you can use it to build more efficient prompts, optimize API costs, and increase the usable space within the model’s context window. But keep a pragmatic approach: always measure the benefits for your specific use case and test JSON ↔ TOON conversion reliability in your pipeline.

Openapi is about to support TOON in its APIs, and this is a very important update: it means applications relying on these APIs can become more efficient and less expensive by leveraging a modern, optimized format.

As a closing worked example, here is a sample JSON payload, followed by a hedged sketch of its TOON equivalent built by applying the rules described above.

{
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    {
      "id": 1,
      "name": "Blue Lake Trail",
      "distanceKm": 7.5,
      "elevationGain": 320,
      "companion": "ana",
      "wasSunny": true
    },
    {
      "id": 2,
      "name": "Ridge Overlook",
      "distanceKm": 9.2,
      "elevationGain": 540,
      "companion": "luis",
      "wasSunny": false
    },
    {
      "id": 3,
      "name": "Wildflower Loop",
      "distanceKm": 5.1,
      "elevationGain": 180,
      "companion": "sam",
      "wasSunny": true
    }
  ]
}
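
Applying the rules described earlier, a possible TOON rendering of this payload is the following sketch (quoting and formatting details may differ slightly from a real encoder's output):

context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true

The information is identical, but the six field names of the hikes array appear only once in the header instead of once per hike.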

https://github.com/toon-format/toon

https://toonformat.dev/
