ByteThirst Calculation Methodology
How we estimate the environmental impact of your AI usage
Summary
ByteThirst estimates the water consumption, energy usage, and carbon emissions of your AI interactions across ChatGPT, Claude, Gemini, Copilot, Perplexity, Poe, You.com, and Bing Chat. Every estimate is presented as a range (low / mid / high) to communicate the significant uncertainty inherent in these calculations. We anchor our model to the best available public measurements and apply scaling factors for query complexity and model size.
Calculation Pipeline
Step 1: Token Estimation
ByteThirst does not have direct access to the internal tokenizers used by each AI platform. Instead, we estimate token counts by dividing the character count of your input and the model's output by a platform-specific characters-per-token ratio. These ratios are calibrated against each platform's publicly available tokenizer tools and documentation.
| Platform | Characters per Token | Rationale |
|---|---|---|
| ChatGPT (GPT-4, GPT-4o, o-series) | 4.0 | Well-documented BPE tokenizer. Confirmed by OpenAI's tiktoken library for English text. |
| Claude (Sonnet, Opus, Haiku) | 3.8 | Slightly more granular tokenizer. Conservative estimate based on Anthropic API benchmarks. |
| Gemini (Pro, Flash, Ultra) | 4.2 | Google's SentencePiece tokenizer runs coarser on average. |
These ratios are calibrated for English text. Other languages—particularly CJK languages, Arabic, and Hindi—may have significantly different characters-per-token ratios. We plan to add language-specific adjustments in a future update.
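The character-to-token estimation above can be sketched in a few lines. This is a minimal illustration, not ByteThirst's actual extension code; the ratios come from the table above, while the function and dictionary names are our own.

```python
# Characters-per-token ratios from the calibration table (English text).
CHARS_PER_TOKEN = {
    "chatgpt": 4.0,
    "claude": 3.8,
    "gemini": 4.2,
}

def estimate_tokens(text: str, platform: str) -> int:
    """Approximate a token count from character length.

    Floors at 1 token, since any non-empty exchange consumes at least one.
    """
    ratio = CHARS_PER_TOKEN[platform]
    return max(1, round(len(text) / ratio))
```

For example, a 400-character English prompt on ChatGPT is estimated at 100 tokens, while the same length on Claude comes out slightly higher because of the finer-grained ratio.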
Step 2: Energy Estimation
Energy consumption per query is the most studied and most variable component of our pipeline. We surveyed every major public source available as of early 2026 to anchor our baseline estimate.
| Source | Model | Energy per Query | Date | Notes |
|---|---|---|---|---|
| Google (official) | Gemini median text prompt | 0.24 Wh | Aug 2025 | Most transparent industry disclosure. Includes idle capacity and cooling overhead. |
| OpenAI (Altman) | ChatGPT average query | 0.34 Wh | Aug 2025 | Self-reported, less methodological detail. |
| Epoch AI (independent) | GPT-4o, 500 output tokens | 0.30 Wh | Feb 2025 | Based on H100 GPU compute analysis. Short query baseline. |
| Epoch AI (independent) | GPT-4o, ~7,500 input words | 2.5 Wh | Feb 2025 | Long context query. Demonstrates 8x range based on input length. |
| Jegham et al. (arXiv) | GPT-4o short query | 0.42 Wh ± 0.13 | May 2025 | Academic benchmark with uncertainty bounds. |
Our baseline
We use 0.30 Wh as the baseline energy cost for a standard query of approximately 100 input tokens + 500 output tokens = 600 total tokens. This is anchored to the Epoch AI independent estimate for GPT-4o, which falls in the middle of the industry self-reports (Google's 0.24 Wh and OpenAI's 0.34 Wh).
Scaling by query size
Not all tokens are created equal. Output tokens require significantly more compute than input tokens because each output token requires a full forward pass through the model, while input tokens can be processed in parallel during the prefill stage. Based on published inference cost analyses, output tokens cost roughly 15× the compute of input tokens.
We convert raw token counts into "effective tokens" to normalize compute cost:

effective_tokens = input_tokens + (15 × output_tokens)

For a standard query (100 input + 500 output tokens):

effective_tokens = 100 + (15 × 500) = 7,600

Energy is then scaled linearly relative to the standard query's effective token count. A query with twice the effective tokens uses approximately twice the energy.
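The scaling step can be sketched as follows. The 15× output-token weight and the 0.30 Wh standard-query baseline are taken from this section; the constant and function names are illustrative, not ByteThirst's actual code.

```python
OUTPUT_TOKEN_WEIGHT = 15   # output tokens cost ~15x the compute of input tokens
BASELINE_ENERGY_WH = 0.30  # Epoch AI anchor for a 100-input / 500-output query
BASELINE_EFFECTIVE_TOKENS = 100 + OUTPUT_TOKEN_WEIGHT * 500  # 7,600

def effective_tokens(input_tokens: int, output_tokens: int) -> int:
    """Weight output tokens to normalize compute cost across queries."""
    return input_tokens + OUTPUT_TOKEN_WEIGHT * output_tokens

def base_energy_wh(input_tokens: int, output_tokens: int) -> float:
    """Scale the baseline energy linearly by effective-token count."""
    scale = effective_tokens(input_tokens, output_tokens) / BASELINE_EFFECTIVE_TOKENS
    return BASELINE_ENERGY_WH * scale
```

A query with 200 input and 1,000 output tokens has exactly twice the effective tokens of the standard query, and so is estimated at twice the energy (0.60 Wh).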
Model tier multipliers
Different models within each platform vary dramatically in compute requirements. We classify models into five tiers:
| Tier | Example Models | Multiplier | Rationale |
|---|---|---|---|
| Small | GPT-4o-mini, Claude Haiku, Gemini Flash | 0.3× | Smaller parameter count, significantly less compute. Pricing 3-10× cheaper than standard. |
| Standard | GPT-4o, Claude Sonnet, Gemini Pro | 1.0× | Baseline. Most common consumer-facing models. |
| Large | GPT-4.1, Claude Opus, Gemini Ultra | 2.5× | Largest models with highest compute requirements. Pricing 3-5× higher than standard. |
| Reasoning | o1, o3, o4-mini, Claude extended thinking | 5.0× | These models generate extensive internal chain-of-thought tokens (often 10-50× the visible output) before producing a response. We apply a conservative 5× multiplier. |
| Image generation | DALL-E, Gemini image gen | 10.0× | Image generation uses ~2.9 kWh per 1,000 images (Luccioni et al., 2023), or ~2.9 Wh per image—roughly 10× a text query. |
Energy range
To communicate uncertainty, we present three estimates for every query:
- Low: base × 0.6 (optimistic—assumes best-case hardware utilization, latest-generation chips, and efficient batching)
- Mid: base × 1.0 (baseline—our best single-point estimate)
- High: base × 1.8 (conservative—accounts for older hardware, low utilization, and additional overhead)
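Combining the tier multipliers with the low/mid/high factors gives a complete energy range for a query. This is a hedged sketch under the constants stated in this section; the dictionary keys and function name are our own.

```python
# Tier multipliers and range factors from the tables and list above.
TIER_MULTIPLIER = {
    "small": 0.3,
    "standard": 1.0,
    "large": 2.5,
    "reasoning": 5.0,
    "image": 10.0,
}
RANGE_FACTORS = {"low": 0.6, "mid": 1.0, "high": 1.8}

def energy_range_wh(base_wh: float, tier: str) -> dict:
    """Apply the model-tier multiplier, then spread into a low/mid/high range."""
    tiered = base_wh * TIER_MULTIPLIER[tier]
    return {label: round(tiered * f, 3) for label, f in RANGE_FACTORS.items()}
```

For the 0.30 Wh standard query, this yields 0.18 / 0.30 / 0.54 Wh; the same query answered by a reasoning-tier model is estimated at five times those figures.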
Step 3: Water Estimation
Data centers consume water primarily for cooling. We convert energy estimates to water estimates using a water-intensity ratio (milliliters of water per watt-hour of energy consumed).
| Source | Ratio (mL/Wh) | Notes |
|---|---|---|
| Google (official) | 1.08 mL/Wh | Derived: 0.26 mL water per 0.24 Wh query. Comprehensive overhead included. |
| OpenAI (Altman, implied) | 0.94 mL/Wh | Derived: 0.32 mL water per 0.34 Wh query. |
Our range:
- Low: 0.50 mL/Wh (dry climate with air cooling)
- Mid: 0.94 mL/Wh (industry average derived from OpenAI disclosure)
- High: 1.20 mL/Wh (evaporative cooling in warm climates with older infrastructure)
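The energy-to-water conversion can be sketched as below. We pair each energy bound with the matching water-intensity bound (low with low, high with high), which is one reasonable convention among several; the names are illustrative.

```python
# Water-intensity ratios from this section, in mL of water per Wh of energy.
WATER_ML_PER_WH = {"low": 0.50, "mid": 0.94, "high": 1.20}

def water_range_ml(energy_wh: dict) -> dict:
    """Convert an energy range (Wh) into a water range (mL), bound by bound."""
    return {
        label: round(energy_wh[label] * WATER_ML_PER_WH[label], 2)
        for label in ("low", "mid", "high")
    }
```

Applied to the standard query's 0.18 / 0.30 / 0.54 Wh energy range, this gives 0.09 / 0.28 / 0.65 mL.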
A note on viral claims
The widely cited UC Riverside study (Li et al., 2023) estimated that ChatGPT consumes approximately 519 mL of water per 100 words of output—roughly 52 mL per short query. The per-query figure is roughly 200× higher than the industry self-reports from Google and OpenAI. The discrepancy arises because the UC Riverside methodology includes the full lifecycle water footprint of electricity generation (so-called "off-site" or "upstream" water), including water consumed at power plants, in fuel extraction, and in the broader energy supply chain. By contrast, Google's and OpenAI's figures report only the direct ("on-site") water consumed at the data center for cooling. Both approaches are valid for different purposes, but they measure fundamentally different things. ByteThirst uses the direct water consumption methodology because it represents the water physically used at data centers and is the figure most comparable across providers.
Step 4: CO₂ Estimation
We estimate carbon emissions by multiplying energy consumption by a grid carbon intensity factor (grams of CO₂ emitted per watt-hour of electricity consumed).
| Source | Intensity | Notes |
|---|---|---|
| EPA eGRID (2023) | 0.39 kg CO₂/kWh | US national average, location-based. |
| Google (location-based) | 0.09 gCO₂e per Gemini query | Based on actual grid mix at data center locations. |
| Google (market-based) | 0.03 gCO₂e per Gemini query | Includes renewable energy certificate purchases. |
Our range:
- Low: 0.20 g/Wh (reflects grids with significant renewable penetration)
- Mid: 0.39 g/Wh (US national average from EPA eGRID)
- High: 0.60 g/Wh (coal-heavy grids or regions with older infrastructure)
We use location-based emissions rather than market-based emissions. While companies like Google and Microsoft purchase renewable energy certificates (RECs) to offset their electricity usage, location-based accounting reflects the actual carbon intensity of the grid where the data center operates. This is more representative of the real-world emissions impact, since RECs do not necessarily reduce the physical carbon intensity of the electricity consumed at the point of use.
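The CO₂ step mirrors the water step: multiply each energy bound by the matching grid-intensity bound. A minimal sketch, assuming the intensities listed above; names are illustrative.

```python
# Grid carbon intensities from this section, in grams of CO2 per Wh.
CO2_G_PER_WH = {"low": 0.20, "mid": 0.39, "high": 0.60}

def co2_range_g(energy_wh: dict) -> dict:
    """Convert an energy range (Wh) into a CO2 range (g), bound by bound."""
    return {
        label: round(energy_wh[label] * CO2_G_PER_WH[label], 3)
        for label in ("low", "mid", "high")
    }
```

For the standard query's 0.18 / 0.30 / 0.54 Wh range, this yields roughly 0.036 / 0.117 / 0.324 g of CO₂.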
Known Limitations & Uncertainty
- Token estimation is approximate. Our character-to-token ratios are averages for English text. Actual tokenization varies by language, content type (code vs. prose), and specific model version. Errors of 10–20% in token estimation are possible.
- Model tier detection is heuristic. ByteThirst infers the active model from DOM elements on each AI platform's interface. If a platform changes its UI, model detection may temporarily misclassify the model tier until we update the extension.
- Energy-per-token varies widely. The energy cost of inference depends on GPU type (H100 vs. A100 vs. TPUv5), batch size, quantization level, and server utilization. Our baseline assumes mid-range conditions, but actual energy consumption for any single query could be 2–3× higher or lower.
- Water consumption depends on local climate and cooling technology. A data center in Iowa using evaporative cooling will consume significantly more water per watt-hour than a data center in Finland using free air cooling. We cannot determine which data center serves any individual query, so we use an industry-average ratio.
- Cached and short-circuited responses are not detected. Some queries may be served from cache or routed to smaller models, consuming far less energy than our estimates suggest. We have no way to detect this from the client side.
- Reasoning model uncertainty is high. Models like o1, o3, and o4-mini generate internal chain-of-thought tokens that are not visible to the user. The number of internal tokens can vary from 2× to 50× the visible output length. Our 5× multiplier is a conservative midpoint, but individual queries may vary significantly.
- All constants are point-in-time. The energy efficiency of AI inference is improving rapidly. Our constants are based on data available as of early 2026 and will be updated as new measurements are published.
We believe the most honest approach is to communicate this uncertainty directly to our users through range-based estimates rather than false-precision single numbers. If you see a ByteThirst estimate of "0.28 mL (low: 0.10 / high: 0.60)," that range is the message: this is our best guess, but the true value could reasonably fall anywhere within it.
Comparison with Other Estimates
To validate our model, we compare our mid-range estimate for a standard text query against published per-query figures from other sources:
| Source | Per-query water estimate | Our mid estimate | Ratio |
|---|---|---|---|
| Google (official, Gemini) | 0.26 mL | 0.28 mL | ~1.1× |
| OpenAI (Altman, ChatGPT) | 0.32 mL | 0.28 mL | ~0.9× |
| UC Riverside (Li et al.) | ~52 mL | 0.28 mL | ~0.005× |
Our mid estimate aligns closely with the industry self-reports from Google and OpenAI, falling within 10% of both figures. The UC Riverside figure is not directly comparable due to the inclusion of upstream lifecycle water, as discussed in Step 3 above.
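The 0.28 mL mid estimate in the table is the product of the two mid-range constants from Steps 2 and 3, which can be checked directly:

```python
# Reproducing the mid water estimate for a standard text query:
# 0.30 Wh (baseline energy) x 0.94 mL/Wh (mid water intensity).
BASELINE_ENERGY_WH = 0.30
MID_WATER_ML_PER_WH = 0.94

mid_water_ml = round(BASELINE_ENERGY_WH * MID_WATER_ML_PER_WH, 2)
print(mid_water_ml)  # 0.28
```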
Source Citations
- Google, "Environmental Report: AI and Energy Use" (August 2025)
- Altman, S., "AI and Energy" blog post, OpenAI (August 2025)
- Epoch AI, "Estimating the energy consumption of LLM inference" (February 2025)
- Jegham, N. et al., "Energy Consumption of Large Language Models: A Systematic Benchmark" arXiv (May 2025)
- Luccioni, A. et al., "Power Hungry Processing: Watts Driving the Cost of AI Deployment?" FAccT (2023)
- US EPA, "eGRID Summary Tables" (2023 data)
- Li, P. et al., "Making AI Less Thirsty" UC Riverside (2023)
- SemiAnalysis, "Inference Cost Analysis" (2024)
Invitation for Peer Review
We welcome corrections, updated data, and methodological improvements from researchers, engineers, and anyone with domain expertise. If you spot an error, have access to better measurements, or can suggest a more rigorous approach to any step in our pipeline, please reach out at hello@bytethirst.com. We will credit all contributors who help improve the accuracy of ByteThirst's estimates.
Changelog
| Date | Change |
|---|---|
| February 15, 2026 | v1.0 — Initial methodology published |