AWS RTB Fabric for SSPs: Designing Sub-100ms, Cloud‑Native Auctions at Global Scale
Designing a real-time bidding fabric that consistently executes auctions in under 100 milliseconds is both an engineering and an operating discipline. For SSPs, that discipline is the difference between winning demand and losing it, between publisher trust and churn, and between runaway infrastructure cost and durable margin. This thought piece lays out a practical blueprint for SSPs to build a sub-100ms, cloud-native auction fabric on AWS. It blends hard-learned latency truths, a separation-of-concerns architecture, and modern AWS primitives that let you run active-active across regions with predictable tail latency.
We write from the vantage point of the supply side. Red Volcano serves SSPs and publisher platforms with discovery, technology stack intelligence, and signals such as ads.txt, sellers.json, SDK presence, and CTV inventory metadata. These data assets are most valuable when they are dependable in the control plane and cheap to consult in the data plane.
The goal: a fabric that is fast, observable, privacy-forward, and cost-disciplined, while remaining simple enough to operate at scale.
The 100ms Problem, Stated Plainly
RTB is a deadline business. Publishers and SDKs set timeouts for their callouts to SSPs, and SSPs typically allow bidders 50 to 120ms to respond. Every millisecond spent on network hops, TLS, deserialization, cache misses, budget checks, price floors, and deal matching chips away at the auction time budget. You are not optimizing for average latency. You are optimizing for p99 and worst-case behavior, because the long tail is what blows budgets and drops win rates. A typical end-to-end budget for an SSP-side server-to-server auction might look like this:
| Stage | Typical budget |
|---|---|
| Edge accept + auth + routing | 4 to 8ms |
| Request parse + validation | 2 to 5ms |
| Identity, consent, geo lookup (cached) | 2 to 6ms |
| Deal matching + floors + brand safety policy fetch | 4 to 10ms |
| Fanout to demand-side adapters (parallel) | 35 to 55ms |
| Bid scoring + tie-breaks | 2 to 6ms |
| Win notification scheduling + response build | 3 to 6ms |
| Safety margin | 5 to 10ms |
Sum the midpoints of that table and you land around 80ms; sum the upper bounds and you are already past 100ms. If the demand fanout needs 40 to 55ms, everything else must fit into a few tens of milliseconds, and you are operating on a knife's edge. The architecture must assume cache hits, local memory paths, and predictable network performance at the 95th and 99th percentiles.
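As a concrete illustration, here is a minimal Go sketch of carving a fanout deadline out of the total budget with the context package; the specific numbers are assumptions, not a prescription.
package main

import (
    "context"
    "fmt"
    "time"
)

func main() {
    const (
        totalBudget   = 100 * time.Millisecond // publisher-facing SLA
        fixedOverhead = 25 * time.Millisecond  // ingress, parse, policy, scoring, response build
        safetyMargin  = 8 * time.Millisecond   // headroom for jitter and GC pauses
    )

    // One deadline governs the whole auction; the fanout inherits what is left.
    auctionCtx, cancel := context.WithTimeout(context.Background(), totalBudget)
    defer cancel()

    fanoutBudget := totalBudget - fixedOverhead - safetyMargin
    fanoutCtx, cancelFanout := context.WithTimeout(auctionCtx, fanoutBudget)
    defer cancelFanout()

    if d, ok := fanoutCtx.Deadline(); ok {
        fmt.Printf("fanout budget %v, deadline in %v\n", fanoutBudget, time.Until(d).Round(time.Millisecond))
    }
}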
Core Principles for a Sub-100ms Fabric
- Separate data plane and control plane: The auction path must be thin, predictable, and non-blocking. Anything that is not absolutely required to return a bid should run off-path.
- Cache-first, memory-first: Hot decision data must live in memory or on local Redis with deterministic time-to-first-byte. Disk, cross-AZ, and cross-region calls are last resorts.
- Parallel everywhere, bounded everywhere: Fanout to adapters in parallel, enforce per-adapter deadlines, and return partial results rather than waiting for stragglers.
- Region-local, active-active: Keep auctions regional. Use global routing to the closest healthy region and avoid cross-region state during the bidding window.
- Deterministic failure: Time out before you blow the SLA. Return no-bid predictably when prerequisites cannot be fetched inside budget.
- Privacy by design: Minimize personal data in the data plane. Enforce consent and jurisdiction logic early. Persist only what you must.
- Measure tail latencies: Engineer to p95 and p99, not p50. Allocate headroom explicitly.
AWS Building Blocks That Matter
AWS gives you a menu of primitives. The challenge is choosing the simplest options that meet your latency and reliability needs without accidental complexity.
- Global entry: Route 53 latency-based routing or AWS Global Accelerator for anycast and fast failover.
- Layer 4 load balancing: Network Load Balancer for gRPC or HTTP/2 with low jitter and TLS pass-through or termination.
- Compute: EC2 with Graviton for data plane services that need consistent performance; EKS for orchestrating microservices; Fargate for control plane tasks.
- Low-latency caches: ElastiCache for Redis with cluster mode; node-local in-memory caches using lock-free structures.
- Fast configurations: DynamoDB for hot configuration and budget counters; DynamoDB Streams for reactive updates.
- Streaming: Amazon Kinesis Data Streams or Amazon MSK for paced event ingestion, auction logs, and downstream analytics.
- Observability: CloudWatch metrics and alarms, AWS Distro for OpenTelemetry, distributed tracing with X-Ray or OTLP backends.
- Edge code: CloudFront Functions or Lambda@Edge for lightweight request conditioning or maintenance mode signaling.
- Security: AWS WAF, Shield Advanced, KMS for encryption, Secrets Manager and IAM for least privilege.
A note on compute selection: for the hot path, raw performance and noisy neighbor isolation matter more than convenience. EC2 with Nitro hypervisor and ENA networking typically offers lower jitter than multi-tenant serverless. Reserve Fargate and Lambda for control plane, backfills, and asynchronous pipelines where latency budgets are looser.
Reference Architecture Overview
At a high level, the RTB fabric decomposes into four planes:
- Ingress plane: Global routing to the closest healthy region, L4 termination, admission control, basic auth, and coarse feature flags.
- Data plane: Auction service, adapter fanout, scoring, and response assembly. This plane runs entirely within region and never blocks on external IO.
- Control plane: Configuration management, deal catalogs, floors, policies, identity mappings, and model parameters pushed into caches.
- Analytics plane: Auction logs, win/loss notifications, budget and pacing metrics, fraud investigations, and BI.
Ingress and Global Routing
Use AWS Global Accelerator to present a single anycast IP to SDKs and publisher servers. This shortens network paths and provides fast regional failover. Behind Global Accelerator, point to Network Load Balancers in two or more regions. NLB gives you low-latency L4 load balancing, TLS termination where appropriate, and stable performance under load. Keep admission control at ingress minimal: authentication via shared secrets or mTLS, rejection of malformed requests, and rate limits if needed. Anything more should be done inside the region by the auction service to preserve context.
Data Plane: Auction Microservices
Design the data plane as a small set of CPU-pinned services:
- Request normalizer: Parses OpenRTB, validates fields, and attaches context like consent and geo from local caches (sketched below).
- Auction coordinator: Orchestrates adapter fanout, enforces per-adapter timeouts, combines bids, applies floors and deals, selects the winner.
- Adapter workers: Demand connectors that translate internal request format to DSP-specific nuances or forward OpenRTB to server-to-server partners.
- Pricing and policy engine: Applies floors, brand safety, risk rules, and advertiser blocks using precomputed policy snapshots.
All of these services should run on EKS or EC2 Auto Scaling groups with:
- Graviton-backed instances where possible for price-performance.
- Host networking or high-performance CNI for minimized network overhead.
- Memory-resident caches and a dedicated local Redis cluster for shared hot keys.
- gRPC with protobuf for service-to-service calls to reduce overhead compared to JSON.
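To make the request normalizer concrete, here is a minimal Go sketch; GeoCache, ConsentCache, and NormalizedRequest are illustrative types rather than part of any specific SDK. The point is that every lookup it performs is a local memory access.
// Minimal normalizer sketch: everything it touches is in process memory.
// GeoCache, ConsentCache, and NormalizedRequest are illustrative types.
package normalizer

import (
    "encoding/json"
    "errors"
)

type GeoCache interface{ Lookup(ip string) uint16 }    // returns a compact geo code
type ConsentCache interface{ Mask(gpp string) uint32 } // returns a policy bitmask

type NormalizedRequest struct {
    OpenRTB     json.RawMessage // validated OpenRTB body, passed through untouched
    Geo         uint16
    ConsentMask uint32
}

func Normalize(body []byte, ip, gpp string, geo GeoCache, consent ConsentCache) (*NormalizedRequest, error) {
    if !json.Valid(body) {
        return nil, errors.New("malformed OpenRTB payload")
    }
    return &NormalizedRequest{
        OpenRTB:     json.RawMessage(body),
        Geo:         geo.Lookup(ip),    // memory-mapped geo DB, no network call
        ConsentMask: consent.Mask(gpp), // precomputed consent flags
    }, nil
}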
Control Plane: Consistency Without Blocking
Control plane systems own the source of truth but never sit on the critical request path. They publish snapshots and deltas to the data plane:
- Config registry in DynamoDB with typed schemas for deals, floors, seller policies, and bidder settings.
- Publisher inventory and tech stack data from platforms like Red Volcano, synced into normalized tables and annotated per domain, app bundle, and channel.
- Consent and privacy rules updated as regulations or IAB frameworks evolve.
- Budget and pacing counters rolled up from event streams into per-demand and per-deal tokens that adapters consult locally.
Push, not pull, is the mantra: data plane processes subscribe to changes via DynamoDB Streams, Kinesis, or an internal pub-sub, and apply deltas to in-memory structures without blocking calls.
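One way to implement that push model is sketched below: the stream consumer builds a new snapshot and atomically swaps a pointer, so auction goroutines never take a lock on the hot path. Snapshot and Delta are illustrative types, and the stream source could be DynamoDB Streams, Kinesis, or an internal pub-sub.
// Sketch: lock-free reads of control-plane state via atomic pointer swap.
// Snapshot and Delta are illustrative types.
package config

import "sync/atomic"

type Snapshot struct {
    Floors map[string]float64
    Deals  map[string]string
}

type Delta struct {
    FloorUpdates map[string]float64
}

type Store struct {
    current atomic.Pointer[Snapshot]
}

// Current is called on the auction path: a single atomic load, no locks.
func (s *Store) Current() *Snapshot { return s.current.Load() }

// Apply is called off-path by the stream consumer: copy, mutate, swap.
func (s *Store) Apply(d Delta) {
    old := s.current.Load()
    if old == nil {
        old = &Snapshot{}
    }
    next := &Snapshot{Floors: map[string]float64{}, Deals: old.Deals}
    for k, v := range old.Floors {
        next.Floors[k] = v
    }
    for k, v := range d.FloorUpdates {
        next.Floors[k] = v
    }
    s.current.Store(next)
}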
Analytics Plane: Everything Off-Path
The analytics plane swallows high-volume logs and provides replayable truths:
- Raw auction logs to Kinesis or MSK, then S3 as Parquet for long-term retention.
- Real-time metrics for pacing and spend, computed in Apache Flink (Amazon Managed Service for Apache Flink, formerly Kinesis Data Analytics) with 1 to 5 second windows.
- Feature stores built from offline joins, with snapshot exports pushed back to the data plane.
None of this should ever block or slow the auction. If the analytics plane is slow or down, the auction continues.
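A minimal sketch of that rule in Go: auction log events go into a bounded channel and are dropped (and counted) when the analytics pipeline falls behind. The actual Kinesis or MSK producer would sit behind the drain goroutine and is omitted here.
// Sketch: fire-and-forget auction logging that can never stall the hot path.
package analytics

import "sync/atomic"

type Event struct {
    AuctionID string
    Winner    string
    PriceCPM  float64
}

type Emitter struct {
    ch      chan Event
    dropped atomic.Int64
}

func NewEmitter(buffer int) *Emitter {
    e := &Emitter{ch: make(chan Event, buffer)}
    go e.drain()
    return e
}

// Emit never blocks: if the buffer is full, the event is dropped and counted.
func (e *Emitter) Emit(ev Event) {
    select {
    case e.ch <- ev:
    default:
        e.dropped.Add(1) // export as a metric; sustained drops mean an undersized analytics path
    }
}

func (e *Emitter) drain() {
    for ev := range e.ch {
        _ = ev // batch and ship to the stream here
    }
}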
Identity, Consent, and Privacy in the Hot Path
Identity signals vary by channel. Web may include first-party IDs or TCF strings. Mobile brings IDFA or GAID subject to ATT and user choice. CTV varies by platform and often relies on device IDs or household proxies. Across all, consent governs processing. In the hot path:
- Parse and validate IAB GPP strings, including TCF and US state sections when present.
- Geo-resolve from IP using a memory-mapped database with subnet granularity. Avoid cross-network calls.
- Apply data minimization: drop or truncate IP after geo, hash user identifiers only where lawful, and avoid persisting request bodies.
- Tag the request with a policy bitmask computed locally, for quick checks in adapters and pricing logic.
In the control plane, maintain policy rules and mappings and push compact lookup tables to the data plane. Treat privacy logic as a first-class feature, not an afterthought.
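One way to realize the policy bitmask is shown below: a handful of bits computed once at normalization time and checked with a single AND in adapters and pricing logic. The specific flags are illustrative, not a standard encoding.
// Sketch: compact, locally computed privacy flags checked with bit tests.
// The flag layout is illustrative; align it with your own consent parsing.
package policy

const (
    FlagGDPRApplies   uint32 = 1 << iota // request is in GDPR scope
    FlagConsentAds                       // user consented to ad processing
    FlagUSStateOptOut                    // a US state opt-out signal is present
    FlagCOPPA                            // child-directed treatment required
)

// Allowed reports whether an adapter may receive user-level identifiers.
func Allowed(mask uint32) bool {
    if mask&FlagCOPPA != 0 || mask&FlagUSStateOptOut != 0 {
        return false
    }
    if mask&FlagGDPRApplies != 0 && mask&FlagConsentAds == 0 {
        return false
    }
    return true
}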
Ads.txt, Sellers.json, and Supply Path Integrity
Supply integrity checks are best done off-path. However, the data plane needs a confident yes or no quickly when it encounters a seller domain or app bundle. Design pattern:
- Continuously crawl and parse ads.txt and app-ads.txt for relevant properties.
- Normalize and attach SSP and exchange identifiers, pub account IDs, and authorized sellers.
- Push a compact snapshot of authorized tuples into Redis and in-memory tries keyed by domain and bundle.
- Consult the snapshot in the auction with a single memory lookup.
Red Volcano maintains this data at scale and can be a source to seed your control plane, reducing engineering overhead for crawling and normalization.
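A minimal sketch of the single-lookup check, assuming the control plane has already flattened ads.txt and sellers.json data into (property, seller account, relationship) tuples; the key format is an illustrative choice.
// Sketch: O(1) authorized-seller check against a control-plane snapshot.
// The snapshot is rebuilt off-path from crawled ads.txt / sellers.json data.
package supply

type Relationship uint8

const (
    Direct Relationship = iota + 1
    Reseller
)

type AuthorizedSellers struct {
    // key: property (domain or app bundle) + "|" + seller account ID
    entries map[string]Relationship
}

func (a *AuthorizedSellers) Check(property, sellerAccount string) (Relationship, bool) {
    rel, ok := a.entries[property+"|"+sellerAccount]
    return rel, ok // single in-memory map lookup on the auction path
}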
Demand Fanout and Adapter Discipline
The largest share of latency sits in the fanout to demand. Treat adapters as external dependencies that must be bounded and observed.
- Hard per-adapter deadlines less than your aggregate budget, for example 35 to 45ms.
- Partial results philosophy: if adapter B is slow, do not wait at the expense of adapters A and C that already bid.
- Connection reuse via HTTP/2 or gRPC with pooled connections and circuit breakers.
- Backoff and trip problematic adapters automatically and degrade gracefully.
Example: Go Adapter with Context Budget
// Enforce a hard per-adapter deadline inside a global auction budget.
func callAdapter(ctx context.Context, req *AdapterRequest, timeout time.Duration) (*Bid, error) {
// Derive a child context with its own deadline
cctx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
ch := make(chan *Bid, 1)
errCh := make(chan error, 1)
go func() {
bid, err := doHTTP2Call(cctx, req) // non-blocking, with connection pooling
if err != nil {
errCh <- err
return
}
ch <- bid
}()
select {
case bid := <-ch:
return bid, nil
case err := <-errCh:
return nil, err
case <-cctx.Done():
// Deadline exceeded - deterministic no-bid
return nil, context.DeadlineExceeded
}
}
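Building on the adapter call above, here is a minimal coordinator sketch that fans out in parallel and keeps whatever bids arrive before the aggregate deadline. It reuses the AdapterRequest and Bid types from the previous example, and each entry in adapters has the same signature as callAdapter.
// Sketch: parallel fanout with an aggregate deadline and partial results.
// Stragglers are abandoned; bids that arrived in time still compete.
func fanout(ctx context.Context, req *AdapterRequest,
    adapters []func(context.Context, *AdapterRequest, time.Duration) (*Bid, error),
    perAdapter time.Duration) []*Bid {

    results := make(chan *Bid, len(adapters))
    for _, call := range adapters {
        call := call
        go func() {
            if bid, err := call(ctx, req, perAdapter); err == nil && bid != nil {
                results <- bid
            } else {
                results <- nil // record errors and timeouts via metrics, off-path
            }
        }()
    }

    var bids []*Bid
    for i := 0; i < len(adapters); i++ {
        select {
        case bid := <-results:
            if bid != nil {
                bids = append(bids, bid)
            }
        case <-ctx.Done():
            return bids // aggregate budget exhausted: run the auction with what we have
        }
    }
    return bids
}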
Floors, Deals, and Pricing Logic
Pricing is where policy meets revenue. Keep logic deterministic and local.
- Static and dynamic floors should be precomputed as a small in-memory table keyed by publisher, placement, and geography.
- Deal catalogs should be compiled to a selector structure for O(1) or O(log N) matching at runtime.
- Bid adjustments for currency, advertiser blocks, and supply-path requirements should be pure functions on a compact request context.
- Tie-breaking should be simple and documented, for example highest price then earliest valid response timestamp.
Example: Compact Pricing Policy Cache
// Rust sketch for deterministic, low-latency floors and deal checks.
// DealSelector, RequestCtx, and DealRef are application-defined types.
use std::collections::HashMap;

// HashMap keys need Hash + Eq; derive them on the compact key.
#[derive(Hash, Eq, PartialEq)]
struct FloorKey {
    pub_id: u64,
    geo: u16,
    size: u16,
}

struct PricingPolicy {
    floors: HashMap<FloorKey, f32>,
    deals: DealSelector, // precompiled trie-like structure
}

impl PricingPolicy {
    fn floor_for(&self, key: &FloorKey) -> f32 {
        *self.floors.get(key).unwrap_or(&0.0)
    }
    fn match_deals(&self, ctx: &RequestCtx) -> Vec<DealRef> {
        self.deals.match_ctx(ctx)
    }
}
Budget, Pacing, and Token Buckets
Demand partners care about pacing and budgets. Do not put distributed counters in the hot path. Instead, use token buckets that are replenished asynchronously. Pattern:
- Maintain canonical spend and delivery in streaming jobs that aggregate auction and win logs.
- Emit per-partner, per-deal tokens at 1 to 5 second intervals into DynamoDB or Redis.
- Adapters consult local tokens and decrement atomically. If empty, return no-bid quickly.
- Reconcile periodically with the canonical stream to correct drift.
Example: Redis Token Bucket Decrement
-- Lua script executed atomically in Redis: decrement one token if available.
-- Returns 1 when a token was consumed, 0 when the bucket is empty.
local key = KEYS[1]
local tokens = tonumber(redis.call('GET', key) or '0')
if tokens > 0 then
  redis.call('DECR', key)
  return 1
else
  return 0
end
Protocol Choices: gRPC, HTTP/2, and Serialization
Use gRPC with protobuf internally for lower CPU overhead and better framing under load. For external S2S partners, support OpenRTB over HTTP/2 where possible and maintain hardened HTTP/1.1 fallback. Keep serializers fast and stable:
- Protobuf for internal service messages.
- JSON with preallocated buffers for OpenRTB where necessary.
- Compression such as Brotli or gzip should be used carefully; at sub-100ms scales the CPU cost of compressing small payloads can exceed the bandwidth savings.
Example: Auction Service gRPC Proto
syntax = "proto3";
package auction.v1;
message AuctionRequest {
bytes openrtb_json = 1; // raw OpenRTB JSON bytes, already validated
string region = 2;
string consent_mask = 3; // compact privacy flags
uint64 deadline_ms = 4;
}
message Bid {
string adapter = 1;
double price = 2;
string creative_id = 3;
string deal_id = 4;
bytes ext = 5;
}
message AuctionResponse {
repeated Bid bids = 1;
string winner = 2;
double clearing_price = 3;
}
service Auction {
rpc Run(AuctionRequest) returns (AuctionResponse);
}
Edge Conditioning and Maintenance Modes
Do not serve heavy logic at the edge, but do use CloudFront Functions or Lambda@Edge for lightweight concerns:
- Blocklists of abusive IPs or impossible user agents.
- Maintenance headers to signal temporary no-bid modes during controlled regional failovers.
- Early rejection of oversized payloads or unsupported content types.
Example: CloudFront Function to Enforce Size
function handler(event) {
var request = event.request;
var headers = request.headers;
var contentLength = headers['content-length'] ? parseInt(headers['content-length'].value) : 0;
if (contentLength > 65536) {
return {
statusCode: 413,
statusDescription: 'Payload Too Large'
};
}
return request;
}
Multi-Region, Active-Active
Your auction fabric should be multi-region active-active. Use Route 53 latency-based routing or Global Accelerator to steer traffic to the closest healthy region. Keep state regional during auctions. Key patterns:
- Health-based failover with fast detection and promotion.
- Asynchronous state replication via DynamoDB global tables for configs, and streams for analytics.
- No cross-region calls in the data plane. If a region is degraded, shed load quickly rather than stretching latency.
- Gray deployments with per-tenant or per-publisher routing to canary new versions safely.
CDK Snippet: Global Accelerator to NLB Targets
import * as ga from 'aws-cdk-lib/aws-globalaccelerator';
import * as ga_endpoints from 'aws-cdk-lib/aws-globalaccelerator-endpoints';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';

const accelerator = new ga.Accelerator(this, 'RtbAccelerator', {
  acceleratorName: 'rtb-global',
});

const listener = accelerator.addListener('Listener', {
  portRanges: [{ fromPort: 443 }],
});

// Import the existing regional NLBs by ARN and register them as endpoint groups.
const arnNlbUse1 = 'arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/rtb-use1/abc';
const arnNlbEuw1 = 'arn:aws:elasticloadbalancing:eu-west-1:123456789012:loadbalancer/net/rtb-euw1/def';

listener.addEndpointGroup('USE1', {
  region: 'us-east-1',
  endpoints: [new ga_endpoints.NetworkLoadBalancerEndpoint(
    elbv2.NetworkLoadBalancer.fromNetworkLoadBalancerAttributes(this, 'NlbUSE1', { loadBalancerArn: arnNlbUse1 })
  )],
});

listener.addEndpointGroup('EUW1', {
  region: 'eu-west-1',
  endpoints: [new ga_endpoints.NetworkLoadBalancerEndpoint(
    elbv2.NetworkLoadBalancer.fromNetworkLoadBalancerAttributes(this, 'NlbEUW1', { loadBalancerArn: arnNlbEuw1 })
  )],
});
Observability for Tail Latency
You cannot optimize what you cannot see. Observability must be built in from day one.
- High-cardinality metrics at adapter and publisher granularity for p50, p95, p99 latency and error rates.
- Tracing with baggage that carries auction deadlines through the call graph.
- Blackbox canaries that submit synthetic auctions from edge locations to measure external latency, not just service timings.
- Event logs sampled with headers that allow correlation across planes.
OpenTelemetry Collector Config Skeleton
receivers:
  otlp:
    protocols:
      http:
      grpc:

exporters:
  awsprometheusremotewrite:
    endpoint: 'https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-abc/api/v1/remote_write'
  awsxray: {}

processors:
  batch: {}
  probabilistic_sampler:
    hash_seed: 1234
    sampling_percentage: 5.0

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [awsprometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler, batch]
      exporters: [awsxray]
Cost Discipline Without Compromising Latency
Cloud cost is a design constraint, not a post hoc surprise. Sub-100ms does not require unbounded spend.
- Right-size instances and pin CPU for data plane workers. Prefer Graviton for price-performance.
- Reserve where stable: Savings Plans for a baseline, On-Demand for burst, Spot only for non-critical analytics.
- Minimize cross-AZ traffic by placing cache and workers together when viable, balanced against AZ fault tolerance.
- Control serialization cost with protobuf internally and JSON only at the edges.
- Keep payloads small: prune OpenRTB request fields you do not use, compress only when net win.
Channel Nuances: Web, App, and CTV
A single fabric can serve three channels, but each benefits from tailored logic.
- Web: TCF consent parsing, cookies or first-party IDs, SSP integration with Prebid Server and deal priority. Latency budget is typically 80 to 120ms for SSPs in header bidding chains.
- Mobile App: ATT status and limited identifiers, SDK networking variability on cellular networks, and more aggressive retry behavior by SDKs. Reduce payload sizes and consider shorter adapter timeouts.
- CTV: SSAI and longer creative fetch paths, VAST validation, and in some cases more generous timeouts. However, live events present bursty demand that requires predictable scaling and pacing.
Use Red Volcano’s CTV inventory intelligence and SDK presence data in the control plane to pre-segment supply, customize floors and quality controls, and align deals to engaged content types. Keep the data-plane lookup constant time.
Data Models and Schemas That Avoid Surprises
Define schemas explicitly to avoid serialization tax and query ambiguity.
- OpenRTB request subsets you support, with defaults and coercions documented.
- Internal structs that collapse request context into fixed-width types for fast hashing and matching.
- Config tables in DynamoDB with strict types and TTLs where appropriate for ephemeral entries.
CDK: DynamoDB Tables for Config and Tokens
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
const configTable = new dynamodb.Table(this, 'Config', {
partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
sortKey: { name: 'sk', type: dynamodb.AttributeType.STRING },
billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
replicationRegions: ['eu-west-1'],
pointInTimeRecovery: true,
});
const tokenTable = new dynamodb.Table(this, 'Tokens', {
partitionKey: { name: 'tokenKey', type: dynamodb.AttributeType.STRING },
timeToLiveAttribute: 'ttl',
billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
});
Failure Modes and How to Handle Them
The question is not whether you will see failures but how you fail.
- Cache cold start: Boot with read-only snapshots baked into container images. Layer with streaming updates.
- Adapter brownouts: Trip circuits, reduce per-adapter deadlines, and gradually reintroduce traffic (a minimal breaker sketch follows this list).
- Regional impairment: Return no-bid quickly rather than timing out. Route new traffic via Global Accelerator to healthy regions.
- Control plane lag: Continue with last known good policy and expire aggressively only when absolutely required.
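The sketch below shows one minimal shape for that breaker: consecutive failures trip it, and a cooldown lets traffic back in. Thresholds are illustrative, and production breakers usually add half-open probing and per-adapter metrics.
// Sketch: minimal consecutive-failure circuit breaker per adapter.
// Thresholds and cooldowns are illustrative assumptions.
package breaker

import (
    "sync"
    "time"
)

type Breaker struct {
    mu        sync.Mutex
    failures  int
    openUntil time.Time

    Threshold int           // consecutive failures before tripping
    Cooldown  time.Duration // how long to shed traffic once tripped
}

// Allow reports whether the adapter should receive this request.
func (b *Breaker) Allow(now time.Time) bool {
    b.mu.Lock()
    defer b.mu.Unlock()
    return now.After(b.openUntil)
}

// Record updates state after a call; a success resets the failure streak.
func (b *Breaker) Record(ok bool, now time.Time) {
    b.mu.Lock()
    defer b.mu.Unlock()
    if ok {
        b.failures = 0
        return
    }
    b.failures++
    if b.failures >= b.Threshold {
        b.openUntil = now.Add(b.Cooldown)
        b.failures = 0
    }
}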
Governance, Compliance, and Auditability
You need credible answers for publishers, auditors, and regulators.
- Deterministic policy application: Be able to demonstrate, with logs and config hashes, what rules were applied to any auction (see the sketch after this list).
- Consent enforcement: Log consent bitmasks and the paths taken, without storing personal data.
- Change management: Version configs and support rollback to last known good states.
- Data retention: Separate audit logs from feature logs; minimize retention windows where feasible.
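One lightweight way to make policy application demonstrable is to hash the active policy snapshot and attach the hash to every auction log line, as in the sketch below; the JSON encoding here is an illustrative choice, not a prescribed canonical form.
// Sketch: fingerprint the active policy snapshot so every auction log line
// can reference exactly which configuration was in force.
package audit

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
)

// ConfigHash returns a stable hex fingerprint of a policy snapshot.
// Use a canonical encoding (sorted keys, fixed field order) in production.
func ConfigHash(snapshot any) (string, error) {
    b, err := json.Marshal(snapshot)
    if err != nil {
        return "", err
    }
    sum := sha256.Sum256(b)
    return hex.EncodeToString(sum[:]), nil
}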
Experimentation Without Latency Regressions
Experimentation is vital, but it must not degrade tail latency.
- Shadow experiments that compute alternative decisions off-path and compare outcomes offline.
- Sticky assignment of traffic to variants using fast, stateless hashing of stable identifiers.
- Guardrails that abort experiments automatically when p95 or p99 latency breaches thresholds.
Example: Sticky Assignment Function
func AssignVariant(key string, variants []string) string {
h := fnv.New32a()
_, _ = h.Write([]byte(key))
idx := int(h.Sum32() % uint32(len(variants))) // unsigned modulo avoids a negative index where int is 32-bit
return variants[idx]
}
Migration Path: From Monolith to Fabric
If you operate a monolithic auction service today, you can evolve.
- Step 1: Carve out the adapter fanout into a separate service with strict deadlines.
- Step 2: Introduce Redis for floors and deals, then move config authoring to DynamoDB with streaming updates.
- Step 3: Front with Global Accelerator and add a second region with read-only config replication.
- Step 4: Separate analytics streams from the hot path and migrate reports to S3-backed queries.
Measure at each step. Do not bundle changes that make attribution of improvements difficult.
What Not to Do
A short anti-patterns list can save months:
- Do not put databases in the auction path for lookups that can be cached.
- Do not depend on cross-region reads during bidding.
- Do not rely on best-effort retries to meet deadlines; they increase tail latency.
- Do not accumulate unbounded ext fields in OpenRTB; keep payloads lean.
- Do not ignore adapter variability; enforce deadlines universally.
Bringing Red Volcano Data to the Party
Control-plane data quality is an edge in the auction. Red Volcano’s assets can reduce misspend and improve match rates:
- Publisher tech stack detection: Identify wrappers, SSP tags, and SDKs, then auto-tune integration strategies per property.
- Ads.txt and sellers.json monitoring: Feed authorized seller snapshots into the policy cache to reject invalid supply quickly.
- CTV dataset: Classify inventory types, content packages, and SSAI vendors to improve deal targeting and pricing rules.
- Mobile SDK intelligence: Detect SDK combinations correlated with networking or fraud risks and adjust adapter budgets accordingly.
Integrate these as feeds into your control plane, transformed into compact policy snapshots for the auction services.
Putting It All Together: A Day-2-Ready Checklist
- Ingress: Global Accelerator to NLB; TLS 1.3; mTLS for partners where feasible.
- Compute: EC2 Graviton for data plane; EKS for orchestration; node-local caches; Redis cluster mode.
- State: DynamoDB for config and tokens; S3 for logs; Kinesis for streams.
- Protocols: gRPC internal; HTTP/2 external; protobuf; bounded JSON.
- Privacy: GPP parsing; geo in memory; consent bitmasking; data minimization.
- Observability: p50, p95, p99 per adapter; traces with deadlines; blackbox canaries.
- Resilience: Active-active regions; no cross-region calls in data plane; deterministic timeouts.
- Cost: Savings Plans baseline; right-sized instances; compression only when net positive.
Conclusion: Sub-100ms as an Operating System
Sub-100ms is not a feature. It is an operating system for how you build, measure, and deploy. On AWS, the combination of Global Accelerator, NLB, EC2 or EKS with Graviton, Redis, DynamoDB, and Kinesis gives you the primitives for a predictable, regional, and cost-disciplined fabric. The architectural through-line is clear: keep the data plane thin and local, push state from the control plane, observe tail latency relentlessly, and favor determinism over cleverness.
For SSPs and publisher platforms, the payoff is durable: higher response and win rates from demand, fewer timeouts for publishers, more transparent policy enforcement, and a platform that can evolve with privacy and channel shifts rather than be broken by them.
Red Volcano's role is to keep your control plane sharp with reliable publisher and supply-chain intelligence. The design presented here ensures that intelligence shows up where it matters most: in memory, on time, under 100 milliseconds.
References and Further Reading
- IAB Tech Lab OpenRTB 2.6 Specification: Defines bid request and response semantics widely used in RTB.
- IAB Tech Lab Global Privacy Platform: Framework for passing multi-regional consent and privacy signals.
- Prebid Server Documentation: Open source reference for server-side header bidding architectures.
- AWS Global Accelerator: Anycast front door with regional failover and performance benefits.
- AWS Nitro System and Graviton Processors: Compute foundations for predictable performance and efficiency.
- Amazon ElastiCache for Redis: Low-latency shared caching with cluster mode for horizontal scale.