AI Foundations

What are Tokens? Understanding How AI Counts (and Charges) You

Tokens determine your AI costs. Learn what they are, how to calculate them, and how to optimize your AI spending.

January 29, 2025
8 min read
Tokens · AI Costs · Optimization · LLM Basics

If you're using ChatGPT, Claude, or any LLM API, you're paying for tokens. But what exactly is a token? And why does understanding them matter for your business?

Why Tokens Matter

Tokens are how LLMs measure and charge for usage. Understanding tokens helps you:

  • Predict and control AI costs
  • Optimize prompts for efficiency
  • Choose the right model for your budget
  • Avoid unexpected bills

What is a Token?

Simple Definition:

A token is a piece of text that an LLM processes. It can be a word, part of a word, or even a single character. On average, one token equals about 4 characters or 0.75 words in English.

The Simple Rule of Thumb

Quick Token Estimation:

  • 100 tokens ≈ 75 words ≈ 1 paragraph
  • 1,000 tokens ≈ 750 words ≈ 1-2 pages
  • 10,000 tokens ≈ 7,500 words ≈ 15 pages
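The rules of thumb above translate directly into code. A minimal sketch, using the ~0.75 words/token and ~4 characters/token averages quoted above (these are estimates, not exact counts):

```python
def tokens_from_words(word_count: int) -> int:
    """Estimate tokens from an English word count (~0.75 words per token)."""
    return round(word_count / 0.75)

def tokens_from_chars(char_count: int) -> int:
    """Estimate tokens from a character count (~4 characters per token)."""
    return round(char_count / 4)

# A 750-word blog post is roughly 1,000 tokens.
print(tokens_from_words(750))  # 1000
```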

How Tokenization Works

LLMs break text into tokens using a process called tokenization. Here's how different text gets tokenized:

Example 1: Simple Words

Text: "Hello world"

Tokens: ["Hello", " world"] = 2 tokens

Example 2: Longer Words

Text: "Understanding"

Tokens: ["Under", "standing"] = 2 tokens

Example 3: Numbers and Symbols

Text: "Price: $1,234.56"

Tokens: ["Price", ":", " $", "1", ",", "234", ".", "56"] = 8 tokens

Example 4: Code

function hello() { return "Hi"; }

Tokens: ["function", " hello", "()", " {", " return", ' "', "Hi", '";', " }"] = 9 tokens

Key Insight

Common words are usually 1 token. Uncommon words, technical terms, and non-English text often use more tokens. Numbers and special characters can be very token-expensive!

Why Tokens Matter for Your Business

Input Tokens

Everything you send to the AI (your prompt, context, examples)

What Counts as Input:

  • Your question or instruction
  • System prompts and context
  • Previous conversation history
  • Documents you upload for analysis

Generally less expensive than output tokens

Output Tokens

Everything the AI generates in response

What Counts as Output:

  • The AI's response text
  • Generated code or content
  • Explanations and reasoning
  • Any text the model produces

Typically 3-5x more expensive than input tokens, depending on the provider

Context Window

The maximum number of tokens (input + output) the model can handle at once

Common Context Windows:

  • GPT-4 Turbo / GPT-4o: 128,000 tokens (~96,000 words)
  • Claude 3.5: 200,000 tokens (~150,000 words)
  • Gemini 1.5 Pro: 1,000,000 tokens (~750,000 words)

Larger context = can process longer documents
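Before sending a long document, it is worth checking whether it can fit at all. A rough pre-flight check, using the 4-characters-per-token average and the window sizes listed above (the model keys are illustrative labels, not exact API identifiers):

```python
# Approximate context windows (input + output tokens), per the list above.
CONTEXT_WINDOWS = {
    "gpt-4-turbo": 128_000,
    "claude-3.5": 200_000,
    "gemini-1.5-pro": 1_000_000,
}

def fits_in_context(text: str, model: str, reserved_output_tokens: int = 1_000) -> bool:
    """Estimate whether prompt plus expected output fit the model's window."""
    estimated_input = len(text) // 4  # ~4 characters per token
    return estimated_input + reserved_output_tokens <= CONTEXT_WINDOWS[model]
```

If the check fails, fall back to chunking or summarization rather than truncating blindly.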

How to Reduce Token Usage (and Costs)

1. Optimize Your Prompts

❌ Inefficient Prompt (~70 tokens):

"I need you to please help me write a professional email to my client explaining that we're going to be a little bit late with the project delivery. Make sure it's polite and professional and apologizes for the inconvenience. The client's name is John Smith and the project is the website redesign."

✅ Efficient Prompt (~20 tokens):

"Write a professional apology email to John Smith about the delayed website redesign project."

Savings: roughly 70% fewer tokens, same result

2. Use Prompt Caching

If you're sending the same context repeatedly (like system prompts or documentation), use prompt caching:

  • Cache system prompts and instructions
  • Cache product documentation or knowledge bases
  • Cache conversation history in chatbots

Potential savings: 50-90% reduction in input token costs
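As a concrete example, Anthropic's Messages API lets you mark a prompt block for caching with a `cache_control` field. The payload below is a sketch of that mechanism (the model name is illustrative; some other providers cache repeated prompt prefixes automatically with no payload change):

```python
def build_cached_request(system_prompt: str, user_message: str) -> dict:
    """Build a Messages-API-style payload with the system prompt marked for caching."""
    return {
        "model": "claude-3-5-sonnet-latest",
        "max_tokens": 500,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks this block for reuse, so on later calls its input
                # tokens are billed at the cheaper cache-read rate.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

Only the stable, repeated part (the system prompt) is cached; the per-user message stays dynamic.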

3. Choose the Right Model

Not every task needs the most powerful (and expensive) model:

  • Simple tasks: Use GPT-3.5 or Claude Haiku (much cheaper)
  • Complex reasoning: Use GPT-4 or Claude Sonnet
  • Long documents: Use Claude or Gemini (better value for high token counts)

4. Limit Output Length

Control how much the AI generates:

  • Set max_tokens parameter in API calls
  • Be specific: "Write a 100-word summary" vs "Summarize this"
  • Use bullet points instead of paragraphs when appropriate

5. Implement Smart Chunking

For long documents, don't send everything at once:

  • Extract only relevant sections using keyword search first
  • Use RAG (Retrieval Augmented Generation) to send only relevant chunks
  • Summarize long documents in stages rather than all at once
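A minimal sketch of the keyword-first approach (chunk sizes use the ~4 chars/token average; a production system would split on sentence or section boundaries and use embeddings rather than substring matching):

```python
def chunk_text(text: str, chunk_tokens: int = 500) -> list[str]:
    """Split text into chunks of roughly chunk_tokens each (~4 chars per token)."""
    size = chunk_tokens * 4
    return [text[i:i + size] for i in range(0, len(text), size)]

def relevant_chunks(chunks: list[str], keywords: list[str]) -> list[str]:
    """Keep only chunks that mention at least one keyword."""
    return [c for c in chunks if any(k.lower() in c.lower() for k in keywords)]
```

Usage: chunk a 50-page document once, filter for the query's keywords, and send only the matching chunks to the model.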

Token Efficiency Across Different Content Types

Content Type    Words per Token   Efficiency       Value   Relative Cost   Use Cases
Plain Text      ~0.75             Very efficient   Best    Standard        Emails, articles
Code            ~0.5              Moderate         Good    ~1.5x more      Programming
Data/Numbers    ~0.3              Inefficient      Poor    2-3x more       CSV, JSON, numbers

Common Token Mistakes to Avoid

Mistake #1: Sending Entire Documents When You Need Specific Info

Problem: Sending a 50-page document (40,000 tokens) to extract one piece of information.

Solution: Use keyword search or embeddings to find relevant sections first, then send only those sections.

Mistake #2: Not Monitoring Token Usage

Problem: Running up unexpected bills because you didn't track usage.

Solution: Set up usage alerts, implement token counting in your app, review usage weekly.

Mistake #3: Using Expensive Models for Simple Tasks

Problem: Using GPT-4 for basic text classification that GPT-3.5 could handle.

Solution: Test with cheaper models first. Upgrade only when necessary for quality.

Mistake #4: Ignoring Conversation History Buildup

Problem: Chatbots that resend the entire conversation history with every request, so input token usage grows with every turn.

Solution: Implement conversation summarization or sliding window (keep only last N messages).
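A sliding window can be as simple as keeping the system message plus the last N messages (a sketch, assuming the common role/content chat message format):

```python
def trim_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the system message (if any) plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```

For longer-running chats, replace the dropped messages with a one-paragraph summary instead of discarding them outright.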

Mistake #5: Verbose Prompts

Problem: Using 500 tokens to say what could be said in 100 tokens.

Solution: Be concise. Remove filler words. Test if shorter prompts give same results.

Practical Token Management Tips

For Developers

  • Use tiktoken library (Python) or gpt-tokenizer (JavaScript) to count tokens before sending
  • Implement token budgets per user or per request
  • Log token usage for every API call
  • Set up alerts when usage exceeds thresholds
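A per-request budget guard can be sketched in a few lines. The chars-divided-by-4 counter here is a crude stand-in; in practice you would swap in tiktoken for exact counts:

```python
import logging

def estimate_tokens(text: str) -> int:
    """Crude token estimate (~4 characters per token); use tiktoken for exact counts."""
    return max(1, len(text) // 4)

class TokenBudget:
    """Tracks estimated token spend and refuses requests past a hard limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def charge(self, text: str) -> None:
        cost = estimate_tokens(text)
        if self.used + cost > self.limit:
            raise RuntimeError(f"Token budget exceeded: {self.used + cost}/{self.limit}")
        self.used += cost
        logging.info("Charged %d tokens (%d/%d used)", cost, self.used, self.limit)
```

The same pattern works per user, per endpoint, or per day; the logging calls double as the usage records the tips above recommend keeping.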

For Business Leaders

  • Review token usage reports monthly
  • Calculate cost per use case (e.g., cost per customer support ticket)
  • Compare token efficiency across different models
  • Set usage budgets per department or project

For Everyone

  • Test prompts with token counters before deploying
  • Start with smaller context windows and expand only if needed
  • Use streaming responses to stop generation early if needed
  • Regularly audit and optimize your highest-usage prompts

Token Quick Reference Guide

Token Estimation Formulas

  • English Text:

    Words ÷ 0.75 = Tokens

  • Characters:

    Characters ÷ 4 = Tokens

  • Code:

    Words ÷ 0.5 = Tokens

Common Document Sizes

  • Email:

    100-300 tokens

  • Blog Post:

    1,000-3,000 tokens

  • Report (10 pages):

    5,000-7,000 tokens

  • Book Chapter:

    10,000-15,000 tokens

Key Takeaways

Remember:

  • ✓ 1 token ≈ 0.75 words ≈ 4 characters
  • ✓ Input and output tokens are priced differently
  • ✓ Code and numbers use more tokens than plain text
  • ✓ Context window = max tokens per request

Action Steps:

  • → Count tokens before sending to API
  • → Optimize prompts for conciseness
  • → Use prompt caching for repeated content
  • → Monitor usage and set budgets

Need Help Optimizing Your AI Costs?

Let's discuss strategies to reduce your token usage and maximize ROI.
