How we built a sub-50ms query system handling 20M+ merchants across 30K+ pincodes using Apache HBase and smart caching strategies. A deep dive into scaling sparse matrix operations.
Picture this: You're building a system where 20 million merchants need to define which of India's 30,000+ pincodes they can serve. That's potentially 600 billion data points in a sparse matrix where 99% of the values are empty.
Oh, and it needs to respond in under 50ms while handling 10,000+ requests per minute.
Welcome to the ONDC Pincode Serviceability Challenge - and how we built a system that earned us TOP 7 National Finalist status at Build for Bharat 2024.
ONDC (Open Network for Digital Commerce) separates serviceability definition from verification. Merchants define their serviceable pincodes, while buyer apps verify if a merchant can serve a specific location. Simple concept, massive scale problem.
The numbers that kept us awake: 20 million merchants, 30,000+ pincodes, roughly 600 billion potential merchant-pincode combinations, all under a 50ms budget at 10,000+ requests per minute.
Traditional databases crumble under this scale. We needed storage built for sparse, wide data - which led us to Apache HBase.
// HBase table design for optimal sparse storage
// Row Key: merchant_id
// Column Family: 'pincodes'
// Column Qualifier: pincode
// Value: serviceability_metadata
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.List;

public class ServiceabilityTable {
    private static final String TABLE_NAME = "merchant_serviceability";
    private static final String CF_PINCODES = "pincodes";

    private final Table table; // HBase Table handle obtained from a Connection

    public ServiceabilityTable(Table table) {
        this.table = table;
    }

    public void addServiceability(String merchantId, List<String> pincodes) throws IOException {
        // One row per merchant; each serviceable pincode becomes a column qualifier
        Put put = new Put(Bytes.toBytes(merchantId));
        for (String pincode : pincodes) {
            put.addColumn(
                Bytes.toBytes(CF_PINCODES),
                Bytes.toBytes(pincode),
                Bytes.toBytes("ACTIVE")
            );
        }
        table.put(put);
    }
}
Raw HBase performance wasn't enough. We implemented a multi-tier caching strategy:
@Service
public class ServiceabilityService {

    // L1: in-process cache (Caffeine); size/TTL limits omitted here
    private final Cache<String, List<String>> localCache = Caffeine.newBuilder().build();

    // L2: Redis cluster shared across service instances
    @Autowired
    private RedisTemplate<String, List<String>> redisTemplate;

    public List<String> getMerchantsByPincode(String pincode) {
        // L1: Local cache (Caffeine) - 1ms lookup
        List<String> cached = localCache.getIfPresent(pincode);
        if (cached != null) return cached;

        // L2: Redis cluster - 5ms lookup
        cached = redisTemplate.opsForValue().get("merchants:" + pincode);
        if (cached != null) {
            localCache.put(pincode, cached);
            return cached;
        }

        // L3: HBase - 20ms lookup; warm both cache tiers on the way back
        List<String> merchants = fetchFromHBase(pincode);
        redisTemplate.opsForValue().set("merchants:" + pincode, merchants);
        localCache.put(pincode, merchants);
        return merchants;
    }
}
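The fetchFromHBase call above is where the data model matters most: the primary table is keyed by merchant_id, so answering "which merchants serve pincode X?" efficiently needs a reverse index keyed by pincode. Here is a minimal sketch of that lookup, assuming a hypothetical pincode_merchants table (accessed through a reverseIndexTable handle) where the row key is the pincode and each serving merchant is a column qualifier; names and layout are illustrative, not the exact schema we shipped:

// Hypothetical reverse-index lookup: row key = pincode, qualifiers = merchant ids.
// Mirrors the merchant_serviceability layout, inverted for buyer-side queries.
private List<String> fetchFromHBase(String pincode) {
    List<String> merchants = new ArrayList<>();
    try {
        Get get = new Get(Bytes.toBytes(pincode));
        get.addFamily(Bytes.toBytes("merchants"));
        Result result = reverseIndexTable.get(get); // Table handle for "pincode_merchants"
        NavigableMap<byte[], byte[]> cells = result.getFamilyMap(Bytes.toBytes("merchants"));
        if (cells != null) {
            for (byte[] qualifier : cells.keySet()) {
                merchants.add(Bytes.toString(qualifier));
            }
        }
    } catch (IOException e) {
        throw new IllegalStateException("HBase lookup failed for pincode " + pincode, e);
    }
    return merchants;
}

In a layout like this, every serviceability write would touch both tables - the price you pay for fast pincode-first reads.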
This approach reduced database queries by 70% and achieved our sub-50ms target.
Merchants don't add pincodes one by one - they upload CSVs with thousands of entries. We optimized for this:
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BulkUploadProcessor {
    private static final int BATCH_SIZE = 1000;

    private final Table table; // handle to the merchant_serviceability table

    public BulkUploadProcessor(Table table) {
        this.table = table;
    }

    public void processBulkUpload(String merchantId, List<String> pincodes) throws IOException {
        List<Put> puts = new ArrayList<>();
        for (String pincode : pincodes) {
            Put put = createPut(merchantId, pincode);
            puts.add(put);
            if (puts.size() >= BATCH_SIZE) {
                table.put(puts); // flush a full batch in one round trip
                puts.clear();
            }
        }
        // Flush the final partial batch
        if (!puts.isEmpty()) {
            table.put(puts);
        }
    }
}
Result: 40-50 rows/second processing speed for bulk uploads.
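The explicit List<Put> batching above is what we ran with; the HBase client also ships BufferedMutator, which buffers mutations and flushes them for you. A sketch of the same upload written against it (the table name matches our schema above, everything else is illustrative):

// Alternative batching via BufferedMutator: the client buffers Puts and
// flushes them once its write buffer fills, plus an explicit flush at the end.
public void processBulkUpload(Connection connection, String merchantId,
                              List<String> pincodes) throws IOException {
    TableName tableName = TableName.valueOf("merchant_serviceability");
    try (BufferedMutator mutator = connection.getBufferedMutator(tableName)) {
        for (String pincode : pincodes) {
            Put put = new Put(Bytes.toBytes(merchantId));
            put.addColumn(Bytes.toBytes("pincodes"), Bytes.toBytes(pincode),
                          Bytes.toBytes("ACTIVE"));
            mutator.mutate(put);
        }
        mutator.flush(); // push any remaining buffered mutations before returning
    }
}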
HBase performance depends heavily on row key design. Our strategy:
// Bad: Sequential keys cause hotspotting
String badRowKey = merchantId;
// Good: Hash prefix prevents hotspots
String goodRowKey = DigestUtils.md5Hex(merchantId).substring(0, 4) + "_" + merchantId;
This distributed load evenly across HBase regions, preventing bottlenecks.
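One consequence of salting worth spelling out: because the prefix is derived from the merchant_id itself, point reads simply recompute it rather than scanning every prefix. A small sketch of the corresponding read path, assuming the same Table handle as above (helper names are ours, not from the original code):

// The salt is a pure function of merchant_id, so reads recompute it
// instead of fanning out across all possible prefixes.
private String toRowKey(String merchantId) {
    return DigestUtils.md5Hex(merchantId).substring(0, 4) + "_" + merchantId;
}

public Result getMerchantRow(String merchantId) throws IOException {
    Get get = new Get(Bytes.toBytes(toRowKey(merchantId)));
    return table.get(get);
}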
For multi-pincode queries, we process them concurrently:
public Map<String, List<String>> getMerchantsForPincodes(List<String> pincodes) {
    Map<String, List<String>> results = new ConcurrentHashMap<>();
    pincodes.parallelStream().forEach(pincode -> {
        List<String> merchants = getMerchantsByPincode(pincode);
        results.put(pincode, merchants);
    });
    return results;
}
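parallelStream() runs on the JVM's common ForkJoinPool, which these I/O-bound cache and HBase lookups then share with everything else. A variant we could have used instead, with a dedicated executor so the fan-out is bounded explicitly (the pool size here is purely illustrative):

// Variant with an explicit executor: lookup concurrency is controlled by this pool
// rather than by the shared ForkJoinPool.
private final ExecutorService lookupPool = Executors.newFixedThreadPool(32);

public Map<String, List<String>> getMerchantsForPincodesAsync(List<String> pincodes) {
    Map<String, CompletableFuture<List<String>>> futures = new HashMap<>();
    for (String pincode : pincodes) {
        futures.put(pincode,
            CompletableFuture.supplyAsync(() -> getMerchantsByPincode(pincode), lookupPool));
    }
    Map<String, List<String>> results = new HashMap<>();
    futures.forEach((pincode, future) -> results.put(pincode, future.join()));
    return results;
}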
After these optimizations, the system hit its targets: sub-50ms lookups, a roughly 70% reduction in database queries, and bulk ingestion at 40-50 rows per second.
We built separate services for different concerns:
@RestController
@RequestMapping("/seller")
public class SellerController {

    @Autowired
    private BulkUploadService bulkUploadService;

    @PostMapping("/upload/csv")
    public ResponseEntity<UploadResponse> uploadCsv(
            @RequestParam("file") MultipartFile file) {
        String processingId = UUID.randomUUID().toString();
        // Hand the CSV off for async processing; return the tracking id immediately
        CompletableFuture.runAsync(() ->
            bulkUploadService.processCsv(processingId, file)
        );
        return ResponseEntity.ok(new UploadResponse(processingId));
    }
}
@RestController
@RequestMapping("/buyer")
public class BuyerController {

    @Autowired
    private ServiceabilityService serviceabilityService;

    @GetMapping("/merchants")
    public ResponseEntity<Map<String, List<String>>> getMerchants(
            @RequestParam String pincodes,
            @RequestParam(defaultValue = "pincodes") String mode) {
        List<String> pincodeList = Arrays.asList(pincodes.split(","));
        // For each requested pincode, return the merchants that can serve it
        Map<String, List<String>> merchants = serviceabilityService
                .getMerchantsForPincodes(pincodeList);
        return ResponseEntity.ok(merchants);
    }
}
Traditional RDBMS couldn't handle our sparse matrix efficiently. HBase's column-family storage was perfect for our use case.
Our multi-tier caching reduced database load by 70%, but we only cached frequently accessed data to avoid memory bloat.
Understanding that 99% of merchant-pincode combinations would be empty guided our entire architecture.
Real users don't make single requests - they upload thousands of records at once. Design for this reality.
We tracked P95 response times, not just averages. That 95th percentile tells you about real user experience.
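A small illustration of what that measurement can look like in code. This isn't our exact monitoring stack; it assumes Micrometer, which supports publishing percentiles from the client side:

// Hypothetical Micrometer wiring: publish p95/p99 for each lookup, not just the mean.
private final MeterRegistry registry = new SimpleMeterRegistry();
private final Timer lookupTimer = Timer.builder("serviceability.lookup")
        .publishPercentiles(0.95, 0.99)
        .register(registry);

public List<String> timedLookup(String pincode) {
    // record() returns the wrapped call's result while tracking its latency
    return lookupTimer.record(() -> getMerchantsByPincode(pincode));
}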
While our hackathon solution proved the concept, a production deployment would need further hardening beyond what we could build in the time available.
This project demonstrated that with the right architecture, you can build systems that handle massive scale while maintaining blazing-fast performance. The ONDC network can now efficiently verify serviceability for millions of merchants across thousands of pincodes in real-time.