How We Cut Our API Response Time by 80% Without Rewriting a Single Line

This is a case study from a project I worked on last year. Our main API endpoint was taking 4-8 seconds to respond during peak hours, causing timeouts and user complaints. We got it down to under 400ms. Here’s exactly what we did.

The Situation

We had a product listing API that aggregated data from five different tables: products, inventory, pricing, reviews, and category hierarchy. Under normal load it was fine. During sales events it fell apart completely.

First instinct from the team? “We need to rewrite it in Go” and “We should add more servers.” Both turned out to be unnecessary.

Step 1: Measure Before You Optimize

We instrumented the endpoint with timing logs at each step:

import logging
import time

from flask import jsonify, request

@app.route('/api/products')
def get_products():
    filters = request.args  # query-string filters, passed through to the DB layer
    t0 = time.time()
    products = db.get_products(filters)
    product_ids = [p['id'] for p in products]
    t1 = time.time()

    inventory = inventory_service.get_stock(product_ids)
    t2 = time.time()

    prices = pricing_service.get_prices(product_ids)
    t3 = time.time()

    logging.info(f"DB query: {t1-t0:.3f}s, Inventory: {t2-t1:.3f}s, Pricing: {t3-t2:.3f}s")
    return jsonify(combine(products, inventory, prices))

Results: DB query was 0.1s, inventory was 2.3s, pricing was 1.8s. The DB wasn’t the problem.
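Ad-hoc t0/t1 variables work, but the same measurement can be factored into a small decorator so every service call gets timed consistently. This is a sketch; `timed` and the stubbed `get_stock` are illustrative helpers, not from our codebase:

```python
import functools
import logging
import time

def timed(label):
    """Log the wall-clock duration of the wrapped function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                return fn(*args, **kwargs)
            finally:
                logging.info("%s took %.3fs", label, time.time() - start)
        return wrapper
    return decorator

@timed("inventory")
def get_stock(product_ids):
    time.sleep(0.05)  # stand-in for the real HTTP call
    return {pid: 0 for pid in product_ids}
```

Once each call site is labeled like this, the slow step shows up in the logs without any manual stopwatch code in the route.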

Step 2: The N+1 Query Problem

Looking at the inventory service code, we found this:

# THE PROBLEM - calls the inventory API once per product
inventory = {}
for product_id in product_ids:
    stock = inventory_api.get(f'/stock/{product_id}')
    inventory[product_id] = stock

With 50 products on a page, that was 50 HTTP requests. At 40ms each, that’s 2 seconds just in network overhead.

Fix: The inventory API had a bulk endpoint we weren’t using:

# FIXED - one HTTP request for all products
stock_data = inventory_api.post('/stock/bulk', {'ids': product_ids})
inventory = {item['id']: item['stock'] for item in stock_data}
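One practical wrinkle: many bulk endpoints cap how many ids they accept per call. A chunked variant handles that (a sketch; `api` stands for the inventory client above, and the 100-id cap is an assumption, not a documented limit):

```python
def fetch_stock_bulk(api, product_ids, chunk_size=100):
    """Fetch stock levels, splitting the id list into capped chunks."""
    inventory = {}
    for i in range(0, len(product_ids), chunk_size):
        chunk = product_ids[i:i + chunk_size]
        # One POST per chunk instead of one per product
        stock_data = api.post('/stock/bulk', {'ids': chunk})
        inventory.update({item['id']: item['stock'] for item in stock_data})
    return inventory
```

For a 50-product page this is still a single request; it only splits when the id list outgrows the cap.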

Step 3: Caching Hot Data

Prices don’t change in real-time – they update every 15 minutes via a batch job. We were recalculating prices on every request. We added Redis caching:

import json

def get_prices(product_ids):
    cache_key = f"prices:{':'.join(sorted(str(id) for id in product_ids))}"
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)

    prices = pricing_service.calculate(product_ids)
    redis.setex(cache_key, 300, json.dumps(prices))  # 5 min cache
    return prices
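One trade-off in this scheme: the key covers the whole id set, so two pages that share 49 of 50 products still miss the cache. An alternative is one entry per product, fetched with MGET. This is a sketch, not what we shipped; `cache` is any Redis-like client and `calculate` is assumed to return a `{product_id: price}` dict:

```python
import json

def get_prices_per_product(product_ids, cache, calculate, ttl=300):
    """Per-product price cache: overlapping pages share cached entries."""
    keys = [f"price:{pid}" for pid in product_ids]
    cached = cache.mget(keys)  # one value per key, None on a miss
    prices, misses = {}, []
    for pid, raw in zip(product_ids, cached):
        if raw is not None:
            prices[pid] = json.loads(raw)
        else:
            misses.append(pid)

    if misses:
        # Recalculate only the products that missed, then backfill the cache
        fresh = calculate(misses)
        for pid, price in fresh.items():
            cache.setex(f"price:{pid}", ttl, json.dumps(price))
        prices.update(fresh)
    return prices
```

The whole-set key is simpler and was good enough for us; the per-product version wins when pages overlap heavily.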

Step 4: Parallelizing Independent Requests

Even after fixing the N+1, inventory and pricing were still sequential. They don’t depend on each other, so we ran them in parallel:

import asyncio

async def get_product_data(product_ids):
    # get_inventory_bulk and get_prices are plain sync functions, so run
    # each in a worker thread and await both concurrently.
    inventory, prices = await asyncio.gather(
        asyncio.to_thread(get_inventory_bulk, product_ids),
        asyncio.to_thread(get_prices, product_ids),
    )
    return inventory, prices
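To see the overlap concretely, here is a self-contained sketch with the two services stubbed out (the sleeps are stand-ins for network latency; the real functions are the ones above). A synchronous Flask view can drive the coroutine with `asyncio.run`:

```python
import asyncio
import time

def get_inventory_bulk(product_ids):
    time.sleep(0.3)  # simulated network latency
    return {pid: 5 for pid in product_ids}

def get_prices(product_ids):
    time.sleep(0.3)  # simulated network latency
    return {pid: 9.99 for pid in product_ids}

async def get_product_data(product_ids):
    # Run both sync calls in worker threads and await them together
    inventory, prices = await asyncio.gather(
        asyncio.to_thread(get_inventory_bulk, product_ids),
        asyncio.to_thread(get_prices, product_ids),
    )
    return inventory, prices

# Inside the sync Flask view:
start = time.time()
inventory, prices = asyncio.run(get_product_data([1, 2, 3]))
elapsed = time.time() - start  # roughly 0.3s, not 0.6s: the calls overlapped
```

With real services the saving is whichever call is slower, rather than the sum of both.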

Results

  • Before: 4-8 seconds average, frequent timeouts
  • After N+1 fix: 1.2 seconds average
  • After caching: 0.6 seconds average for cached requests
  • After parallelization: 0.35 seconds average

An improvement of more than 80%. No new servers, no language switch, no architectural rewrite.

Lessons Learned

  1. Measure first, always. Our instinct about the database being slow was wrong.
  2. N+1 queries kill performance. Check for loops making individual API or DB calls.
  3. Cache aggressively. Most data doesn’t need to be fresh on every single request.
  4. Parallelize what you can. If two operations don’t depend on each other, run them simultaneously.

The “boring” optimization work almost always pays off better than rewrites. Profile first, then optimize specifically.
