Reference

Working Efficiently with AI Tools

Practical guidance on token usage, conversation structure, and workflow habits. Not tips. Principles — with the reasoning behind them.

01 / Tokens

What tokens are and why they compound

Every message you send — and every response you receive — costs tokens. A token is roughly three-quarters of a word. A typical paragraph is 100–150 tokens. A full page of text is 500–700.
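The three-quarters rule above can be turned into a quick estimator. This is a minimal sketch, not a real tokenizer: `estimate_tokens` and `WORDS_PER_TOKEN` are illustrative names, and the result is only a rough planning figure based on a whitespace word count.

```python
import math

# Rule of thumb from the text: one token is roughly 3/4 of a word.
WORDS_PER_TOKEN = 0.75

def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (not a real tokenizer)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)

paragraph = "word " * 90  # stand-in for a 90-word paragraph
print(estimate_tokens(paragraph))  # 120
```

A 90-word paragraph lands at about 120 tokens, inside the 100–150 range quoted above. Actual counts depend on the model's tokenizer and the language of the text.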

What most people don't realise: every follow-up message in a conversation re-sends the entire history. If each turn adds about 100 tokens, turn one costs 100 tokens, turn five costs about five times that, and turn ten about ten times that, all for the same amount of new information per turn.

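This is the compounding problem. Long conversations aren't just inconvenient: the cost of each turn grows with the length of the history, so the total cost of a conversation grows roughly with the square of the number of turns.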

The Snowball Effect
Turn 1:   ~100 tokens
Turn 5:   ~500 tokens
Turn 10:  ~1,000 tokens

10 tasks in one chat ≈ 5.5× the token cost of 10 separate single-turn chats (about 5,500 tokens versus 1,000).
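The 5.5× figure follows from simple arithmetic. The sketch below assumes the simplified model used in this section: every turn adds about 100 tokens and the full history is re-sent each time. The function names are illustrative, and real conversations will vary because responses and uploads differ in size.

```python
def turn_cost(turn: int, tokens_per_turn: int = 100) -> int:
    """Tokens processed on a given turn when the full history is re-sent."""
    return turn * tokens_per_turn

def conversation_cost(turns: int, tokens_per_turn: int = 100) -> int:
    """Cumulative tokens across all turns of one conversation."""
    return sum(turn_cost(t, tokens_per_turn) for t in range(1, turns + 1))

one_chat = conversation_cost(10)      # 100 + 200 + ... + 1000
separate = 10 * conversation_cost(1)  # ten single-turn chats
print(one_chat, separate, one_chat / separate)  # 5500 1000 5.5
```

Under this model the cumulative cost of n turns is 100 × n(n+1)/2, which grows with the square of the turn count.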

Output tokens cost more than input tokens. Files and PDFs add significant token load on upload.
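To see how the input/output price gap plays out, here is a small sketch. The prices are placeholders chosen only to illustrate the asymmetry, not real rates for any provider, and `request_cost` is a hypothetical helper.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost of one request; prices are per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Placeholder prices (assumption): output priced at 5x input.
INPUT_PRICE, OUTPUT_PRICE = 1.0, 5.0

long_answer = request_cost(500, 2000, INPUT_PRICE, OUTPUT_PRICE)
short_answer = request_cost(500, 400, INPUT_PRICE, OUTPUT_PRICE)
print(long_answer > short_answer)  # True
```

With the same prompt, asking for a shorter, more tightly scoped answer is the cheaper request, which is one reason to state the format and length you need.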

This section maps to Efficiency (ef1–ef4)

02 / Prompts

Structure your prompts to eliminate correction rounds

SPECIFICITY

Cut preamble. State the task.

Vague prompts force the model to guess at scope and produce hedged, over-long responses. Polite preamble ("Can you help me with...", "I was wondering if...") adds tokens without adding value. State what you need. State the format. State the constraint. The model responds to clarity, not manners.

DEFINITION OF DONE

Know before you send.

A prompt without a testable output definition is a guess. If you can't state what acceptable output looks like before sending, the model can't reliably produce it. Write the Definition of Done before you write the prompt body. This single habit eliminates the majority of correction rounds.
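One way to make this habit mechanical is a template that refuses to render until the task, format, constraint, and Definition of Done are all filled in. This is a sketch under assumptions: `PromptSpec` and its field names are hypothetical, and the rendered layout is just one reasonable choice.

```python
from dataclasses import dataclass

@dataclass
class PromptSpec:
    task: str
    output_format: str
    constraint: str
    definition_of_done: str

    def render(self) -> str:
        """Build the prompt text, refusing to render if any field is blank."""
        for name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"{name} must be stated before sending")
        return (
            f"Task: {self.task}\n"
            f"Format: {self.output_format}\n"
            f"Constraint: {self.constraint}\n"
            f"Done when: {self.definition_of_done}"
        )

spec = PromptSpec(
    task="Summarise the attached release notes",
    output_format="Five bullet points",
    constraint="No more than 15 words per bullet",
    definition_of_done="Every breaking change is mentioned",
)
print(spec.render())
```

Leaving `definition_of_done` empty raises a `ValueError`, which is the point: the check happens before the prompt is sent rather than after a correction round.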

BATCHING

One structured ask beats five sequential ones.

Each follow-up message re-sends the full history. Five separate prompts for one compound task pay that context overhead five times, on a history that grows with each turn. Group related changes into a single structured prompt with ordered steps. The model handles complexity better than the conversation history handles volume.
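A batched ask can be as simple as one goal plus a numbered list of steps. The helper below is a hypothetical sketch of that structure; the wording of the joining line is an assumption, not a required format.

```python
def batch_prompt(goal: str, steps: list[str]) -> str:
    """Combine related changes into one prompt with ordered steps."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"{goal}\nApply these changes in order:\n{numbered}"

print(batch_prompt(
    "Revise the onboarding email.",
    ["Shorten the intro to two sentences",
     "Move the login link above the fold",
     "Replace the sign-off with the team name"],
))
```

Three edits that would otherwise be three turns go out as one message, so the history is sent once instead of three times.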

This section maps to Prompt Quality (pq1–pq4)

03 / Conversations

Manage context deliberately — it doesn't manage itself

Start a new chat when the topic changes.

Prior conversation context rides along with every message. When you switch topics in an existing chat, all of that irrelevant history consumes tokens on every subsequent turn. A clean prompt in a fresh conversation almost always outperforms a continuation in a bloated thread.

Load recurring context once, not every time.

If you repeatedly re-explain the same background — your team, your goals, your data model, your constraints — put it in a project briefing. That context loads once per session and doesn't get re-sent with every message the way chat history does.

Know when to stop prompting.

Context degrades as a session fills. The model gradually loses awareness of earlier decisions and constraints. Symptoms: contradictions, repeated mistakes, ignored instructions. When you see this, stop sending correction prompts. The cost of starting a new session is usually lower than the cost of drift that compounds across twenty more prompts.

Bring context forward deliberately.

Into a new session: bring your updated briefing, current state, and the specific task at hand. Do not bring the full conversation history. The briefing is designed to carry context efficiently; conversation history is not.
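The handoff described above can be sketched as a function that accepts only the three things worth carrying forward. `handoff_message` and its section titles are illustrative assumptions; note that it deliberately has no parameter for conversation history.

```python
def handoff_message(briefing: str, current_state: str, task: str) -> str:
    """Opening message for a fresh session: briefing, state, task, nothing else."""
    sections = [
        ("Briefing", briefing),
        ("Current state", current_state),
        ("Task", task),
    ]
    return "\n\n".join(f"{title}:\n{body.strip()}" for title, body in sections)

print(handoff_message(
    briefing="Internal analytics dashboard; audience is the sales team.",
    current_state="Schema agreed; two of five charts built.",
    task="Draft the query for the monthly revenue chart.",
))
```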

This section maps to Session Discipline (sd1–sd4) and On-Demand Context (oc1–oc4)

04 / Models

Match the model to the task — not every question needs the largest one

LIGHTWEIGHT

Fast, cheap, narrow tasks

Simple lookups. Yes/no questions. Formatting tasks. Syntax help. Summarising short content. Use when the task is bounded and the output is straightforward.

BALANCED

Default for most work

Summaries, drafts, data pulls, general Q&A, most writing. Research tasks. Standard analysis. Use for 80% of tasks. Start here. Move up only when the task genuinely requires deeper reasoning.

CAPABLE

Complex reasoning, reserve deliberately

Multi-step analysis. Strategic documents. Architecture decisions. Code with complex interdependencies. Anything requiring sustained chain-of-thought reasoning. Use only when the task demands it. Costs significantly more.

Rule of Thumb

Start with the balanced tier. Upgrade only when output quality on the balanced tier is genuinely insufficient for the task — not because a harder task feels like it deserves a bigger model.
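The rule of thumb amounts to a small escalation policy: stay where you are unless the output was insufficient, then move up one tier. A minimal sketch, using the three tier names from this section; `next_tier` is a hypothetical helper, and judging whether output is sufficient remains a human call.

```python
TIERS = ["lightweight", "balanced", "capable"]

def next_tier(current: str, output_sufficient: bool) -> str:
    """Stay on the current tier unless output was insufficient; then move up one."""
    if output_sufficient:
        return current
    index = TIERS.index(current)
    return TIERS[min(index + 1, len(TIERS) - 1)]

tier = "balanced"  # default starting point
tier = next_tier(tier, output_sufficient=False)
print(tier)  # capable
```

The input is observed output quality, not how hard the task feels, which matches the rule above.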

05 / Context

What you load into context shapes everything downstream

Every file you upload, every document in project knowledge, every connector you enable — all of it adds to the token load on every session. The model searches all of it on every query.

Files that were useful two weeks ago and haven't been removed are now noise. They compete with relevant content and degrade the quality of what surfaces. There's no archive that sits outside search scope — unused files cost search quality, not just tokens.

The same applies to connectors. Every enabled integration adds tool definitions to the context. Only enable what you actively use.

Load this
  • ✓ Active reference material (architecture docs, data models, current specs)
  • ✓ Constraints and rules that must apply across sessions
  • ✓ Skill files for recurring workflows
  • ✓ Current-state briefing (short, accurate, updated)
Not this
  • ✗ Output artifacts and deliverables (these belong in Drive/local storage)
  • ✗ Drafts and exports from previous project phases
  • ✗ Binary assets unless directly required for the task
  • ✗ Connectors you enabled once and haven't used since
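A periodic audit of loaded files and connectors can follow the two-week rule of thumb mentioned above. The sketch below assumes you can obtain a last-used date for each item; the item names, the dates, and the 14-day default are made up for illustration.

```python
from datetime import date, timedelta

def stale_items(last_used: dict[str, date], today: date,
                max_age_days: int = 14) -> list[str]:
    """Names of files or connectors not used within the last max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, used in last_used.items() if used < cutoff)

# Hypothetical inventory of project files and connectors.
inventory = {
    "data_model.md": date(2024, 6, 28),
    "phase1_export.csv": date(2024, 5, 1),
    "crm_connector": date(2024, 6, 1),
}
print(stale_items(inventory, today=date(2024, 6, 30)))
# ['crm_connector', 'phase1_export.csv']
```

Anything the audit flags is a candidate for removal or disabling, not an automatic deletion: a rarely used constraint file may still belong in the "Load this" column.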

This section maps to Knowledge Quality (kq1–kq4) and Project Setup (ps1–ps4)

The Efficiency Principle

Token efficiency is workflow efficiency.

The prompts you don't send because your briefing carries the context. The correction rounds you avoid because you defined done before sending. The sessions you start fresh because you recognised degradation early.

These aren't optimisation tricks. They're the structural habits that separate practitioners who improve from practitioners who stay stuck.