AWS Deep Dive Part 3: Application Integration — API Gateway, SQS, SNS, EventBridge, Step Functions

The first two parts of this series covered compute & networking and storage & databases — the runtime and the state. This third part covers the connective tissue: application integration. Modern AWS architectures are built from many small services that need to talk to each other reliably, often asynchronously, often across team boundaries. The five services in this post — API Gateway, SQS, SNS, EventBridge, and Step Functions — are the dominant patterns for doing that on AWS.

The Integration Landscape

Before diving into each service it helps to see how they relate. Roughly speaking: API Gateway is how clients talk to your system; SQS moves work between producers and consumers with at-least-once delivery; SNS fans a single message out to many subscribers via pub/sub; EventBridge routes structured events from many sources to many targets with content-based filtering; and Step Functions orchestrates multi-step workflows where you need to coordinate, branch, retry, and wait.

Figure: AWS application integration landscape

1. Amazon API Gateway — The Front Door

Amazon API Gateway is the managed service for accepting HTTP and WebSocket traffic and forwarding it to backends such as Lambda, EC2/ECS behind an ALB, or any HTTP endpoint. It handles TLS termination, authentication, throttling, request validation, request/response transformation, caching, and usage plans — a long list of things you would otherwise write into a reverse proxy yourself.

API Gateway has three flavors. REST APIs are the original, feature-rich variant: AWS_IAM, Cognito and Lambda authorizers, mapping templates, API keys with usage plans, edge-optimized endpoints with CloudFront, and request validation with JSON Schema. HTTP APIs are the newer, simpler, and significantly cheaper option (about 70% less per million calls); they cover the majority of needs and integrate natively with JWT authorizers. WebSocket APIs maintain persistent connections for real-time use cases like chat or live dashboards.

Figure: API Gateway request pipeline

For new projects with simple authn/authz needs, start with HTTP APIs. Reach for REST APIs only when you actually need their extra features: API keys with usage plans, response caching, request validation, or mapping templates. AWS_IAM authorization is not one of those reasons, since HTTP APIs support it on routes as well. For WebSockets, the API Gateway flavor is the default choice unless you are prepared to maintain a persistent connection tier yourself on ECS or EC2.
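To make the HTTP API option concrete, here is a minimal sketch of a Lambda handler behind an HTTP API route, assuming the payload format version 2.0 that HTTP APIs send to Lambda proxy integrations. The `/orders` route and the `orderId` field are hypothetical names chosen for illustration.

```python
import base64
import json


def handler(event, context):
    """Lambda proxy handler for an HTTP API route (payload format 2.0)."""
    http = event["requestContext"]["http"]
    method, path = http["method"], event["rawPath"]

    # API Gateway base64-encodes binary bodies and flags them on the event.
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")

    if method == "POST" and path == "/orders":  # hypothetical route
        order = json.loads(body or "{}")
        return {
            "statusCode": 201,
            "headers": {"content-type": "application/json"},
            "body": json.dumps({"received": order.get("orderId")}),
        }
    return {"statusCode": 404, "body": json.dumps({"message": "not found"})}


# Local invocation with a hand-built event; no AWS account is needed.
sample_event = {
    "rawPath": "/orders",
    "requestContext": {"http": {"method": "POST"}},
    "body": json.dumps({"orderId": "o-123"}),
    "isBase64Encoded": False,
}
response = handler(sample_event, None)
```

Because the handler only reads a dictionary and returns one, it can be exercised locally like this before any route is wired up.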

2. Amazon SQS — Simple Queue Service

Amazon SQS is the workhorse asynchronous queue on AWS. A producer sends a message; a consumer (or many consumers in parallel) polls the queue and processes the message. The queue persists messages for up to 14 days (the default retention is 4 days) and survives outages of either side. Two flavors are available: Standard queues offer best-effort ordering, at-least-once delivery, and effectively unlimited throughput; FIFO queues offer strict ordering within each message group ID, exactly-once processing through deduplication (within a 5-minute window), and by default up to 300 API calls per second per action, which is 3,000 messages per second with batches of 10. High-throughput mode raises that limit considerably.
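As an illustration of the FIFO-specific fields, the sketch below builds the parameters a producer might pass to the SQS SendMessage call. `MessageGroupId` and `MessageDeduplicationId` are the real parameter names; the helper name, the queue URL, and the choice of a SHA-256 hash of the body as the deduplication ID are assumptions made for this example.

```python
import hashlib
import json


def fifo_send_params(queue_url, entity_id, payload):
    """Build SendMessage parameters for a FIFO queue (hypothetical helper)."""
    body = json.dumps(payload, sort_keys=True)
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Messages sharing a group ID are delivered in order relative to each other.
        "MessageGroupId": entity_id,
        # Identical bodies hash to the same ID, so a retried send is deduplicated.
        "MessageDeduplicationId": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"  # hypothetical
first = fifo_send_params(queue_url, "customer-42", {"orderId": "o-1", "total": 10})
retry = fifo_send_params(queue_url, "customer-42", {"total": 10, "orderId": "o-1"})
```

Using the entity (here a customer) as the group ID keeps ordering per entity while letting different entities be processed in parallel.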

The key abstractions to understand are visibility timeout (how long a message is hidden from other consumers while one consumer processes it), the dead-letter queue (where messages go after exceeding a maxReceiveCount, so they can be inspected without blocking the main queue), long polling (a Receive call that waits up to 20 seconds for a message, dramatically reducing empty-receive cost and latency), and message attributes (key/value metadata that travels with the message, so consumers and SNS subscription filters can act on it without parsing the body).
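The interaction between visibility timeout and the dead-letter queue is easier to see in a toy model. The class below is an in-memory simulation written for this post, not the SQS API; `ToyQueue` and its method names are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
class ToyQueue:
    """In-memory model of visibility timeout and DLQ redrive; not the SQS API."""

    def __init__(self, visibility_timeout, max_receive_count):
        self.visibility_timeout = visibility_timeout
        self.max_receive_count = max_receive_count
        self.messages = []  # each: {"body", "receive_count", "visible_at"}
        self.dead_letters = []

    def send(self, body):
        self.messages.append({"body": body, "receive_count": 0, "visible_at": 0})

    def receive(self, now):
        for msg in list(self.messages):
            if msg["visible_at"] > now:
                continue  # still in flight with another consumer
            if msg["receive_count"] >= self.max_receive_count:
                # Already received the maximum number of times: move it aside.
                self.messages.remove(msg)
                self.dead_letters.append(msg["body"])
                continue
            msg["receive_count"] += 1
            msg["visible_at"] = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg):
        # A consumer deletes the message once processing succeeded.
        self.messages.remove(msg)


q = ToyQueue(visibility_timeout=30, max_receive_count=2)
q.send("job-1")
first = q.receive(now=0)    # delivered, hidden until t=30
hidden = q.receive(now=10)  # nothing visible yet
second = q.receive(now=30)  # consumer never deleted it, so it reappears
third = q.receive(now=60)   # receive count exhausted, moved to the DLQ
```

A consumer that finishes in time calls `delete` before the timeout expires, and the message never reappears.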

Figure: SQS message lifecycle

SQS pairs especially well with Lambda. AWS automatically polls the queue, batches up to 10 messages by default (up to 10,000 for standard queues when a batching window is configured), and invokes your function. If the function throws, the whole batch becomes visible again after the visibility timeout and is retried; with partial batch responses enabled, only the failed items return to the queue. Messages that keep failing land in the DLQ once they exceed maxReceiveCount, where they can be inspected offline.
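A handler that uses partial batch reporting might look like the sketch below. It assumes the event source mapping has `ReportBatchItemFailures` enabled; the `process` function and the `orderId` field are hypothetical stand-ins for real business logic.

```python
import json


def process(order):
    """Hypothetical business logic; raises on input it cannot handle."""
    if "orderId" not in order:
        raise ValueError("missing orderId")


def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message; the rest of the batch is treated as done.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


sample_event = {
    "Records": [
        {"messageId": "m-1", "body": json.dumps({"orderId": "o-1"})},
        {"messageId": "m-2", "body": json.dumps({"note": "no id"})},
    ]
}
result = handler(sample_event, None)
```

Returning an empty `batchItemFailures` list tells Lambda the whole batch succeeded, so the handler should not swallow errors without recording the message ID.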

3. Amazon SNS — Simple Notification Service

Amazon SNS is the pub/sub counterpart to SQS. Publishers send a message to a topic and SNS delivers it to every subscriber attached to that topic. Subscribers can be SQS queues, Lambda functions, HTTPS endpoints, email addresses, SMS phone numbers, mobile push, or Kinesis Data Firehose.

The two patterns SNS solves are fanout (one event needs to land in many places — update a cache, write to an audit log, send an email, kick off a billing run) and cross-account or cross-region notification. The most common SNS topology in production is the SNS → SQS fanout: SNS holds the topic, SQS holds per-subscriber durable buffers, and each consumer reads from its own queue at its own pace. This decouples producers from the pace of slow consumers and prevents one consumer's outage from affecting the others.
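One detail of the SNS → SQS fanout that is easy to miss is that each queue needs a resource policy allowing the topic to send to it. The sketch below builds such a policy document as a Python dictionary; the helper name and the ARNs are hypothetical, and the statement shown is a common minimal form rather than the only valid one.

```python
import json


def sns_to_sqs_policy(queue_arn, topic_arn):
    """Queue policy letting one SNS topic deliver into one SQS queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Restrict delivery to this topic rather than any topic.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


topic_arn = "arn:aws:sns:us-east-1:123456789012:order-events"  # hypothetical
policies = {
    name: sns_to_sqs_policy(f"arn:aws:sqs:us-east-1:123456789012:{name}", topic_arn)
    for name in ("billing", "audit", "email")  # one queue per consumer
}
policy_json = json.dumps(policies["billing"], indent=2)
```

The resulting JSON string is what would be set as the queue's `Policy` attribute, one per subscriber queue.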

Figure: SNS to SQS fanout pattern

Two SNS features that are easy to overlook but pay off: message filtering lets each subscription evaluate a JSON policy against message attributes, so a single topic can serve different audiences without each subscriber doing post-receive filtering. And SNS FIFO topics paired with SQS FIFO queues give you ordered fanout when you need it (with the trade-off of lower throughput).
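To show how message filtering lets one topic serve several audiences, here is a deliberately simplified matcher. It handles only exact string matches on message attributes (all keys must match, any listed value is accepted); real SNS filter policies support more operators. The function name, policies, and attribute names are hypothetical.

```python
def matches(filter_policy, attributes):
    """Simplified SNS-style filter: exact string matches only."""
    return all(
        attributes.get(key) in allowed
        for key, allowed in filter_policy.items()
    )


# Hypothetical subscriptions on a single topic.
billing_policy = {"event_type": ["order_placed", "order_refunded"]}
eu_audit_policy = {"event_type": ["order_placed"], "region": ["eu"]}

message_attributes = {"event_type": "order_refunded", "region": "eu"}

to_billing = matches(billing_policy, message_attributes)
to_eu_audit = matches(eu_audit_policy, message_attributes)
```

Here the refund reaches the billing subscriber but not the EU audit subscriber, without either consumer inspecting the message after delivery.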

4. Amazon EventBridge — The Serverless Event Bus

Amazon EventBridge is the more modern, richer alternative to SNS for many event-driven use cases. Where SNS is a generic pub/sub broker, EventBridge is a content-aware event router that understands a structured envelope (source, detail-type, account, region, time, detail) and lets you define rules that pattern-match against any of those fields.
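The envelope and rule matching can be sketched in a few lines. The event below uses the envelope fields named above with hypothetical values, and `matches_pattern` is a simplified stand-in for EventBridge's matcher: it supports nested objects and lists of exact values only, not prefix, numeric, or other advanced operators.

```python
def matches_pattern(pattern, event):
    """Simplified EventBridge-style matching: exact values, nested objects."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Recurse into nested fields such as "detail".
            if not isinstance(actual, dict) or not matches_pattern(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


event = {
    "source": "com.example.orders",  # hypothetical custom source
    "detail-type": "OrderPlaced",
    "account": "123456789012",
    "region": "eu-west-1",
    "time": "2024-01-01T00:00:00Z",
    "detail": {"orderId": "o-123", "tier": "premium"},
}

premium_orders_rule = {
    "source": ["com.example.orders"],
    "detail-type": ["OrderPlaced"],
    "detail": {"tier": ["premium"]},
}

is_match = matches_pattern(premium_orders_rule, event)
```

Fields that the pattern does not mention are ignored, which is why producers can add new fields to `detail` without breaking existing rules.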

The other reason EventBridge matters is its three native sources. The default bus in every account receives events from many AWS services automatically (EC2 state changes, CloudTrail API calls, S3 events when configured, Auto Scaling lifecycle hooks, Health Dashboard notifications, and so on). Partner event sources deliver events from third parties like Datadog, PagerDuty, Stripe, Shopify, GitHub, MongoDB, and Auth0 without you writing any glue. Custom buses are where your own application events land, isolated per domain or per tenant.

Figure: EventBridge sources, bus, rules and targets

The newer features built on EventBridge are worth knowing too. Schemas auto-discover event shapes and let you generate strongly-typed code bindings. Archive & Replay stores all events on a bus and lets you replay them through rules to back-fill a new consumer or recover from a downstream outage. Pipes connects a source (DynamoDB Streams, Kinesis, SQS, MQ, MSK) directly to a target with optional filter and enrich steps, replacing a lot of glue Lambda. Scheduler is a separate but related service that gives you a one-time or recurring trigger with timezone support and ~10 million schedules per account.

When should you reach for EventBridge instead of SNS? Use EventBridge when you have a structured event model, want content-based routing, expect cross-domain or third-party events, or need replay. Use SNS when you have a high-volume firehose to fan out, need SMS or mobile push delivery, or want the lower per-message price for very high throughput.

5. AWS Step Functions — Workflow Orchestration

AWS Step Functions is the orchestrator for multi-step workflows. You define a state machine in Amazon States Language (a JSON DSL), and the service drives execution through your states — invoking Lambdas, calling AWS APIs, branching on conditions, waiting on timers or callbacks, executing tasks in parallel, retrying with backoff, and catching errors. It is durable: the service holds state across invocations that may span minutes, days, or up to a year.
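For a feel of what Amazon States Language looks like, the sketch below builds a small definition as a Python dictionary and serializes it to JSON. The state names, the Lambda ARN, and the `$.tier` field are hypothetical; the `referenced_states` helper is a sanity check written for this example, not part of any AWS SDK.

```python
import json

state_machine = {
    "Comment": "Hypothetical order workflow",
    "StartAt": "ChargeCard",
    "States": {
        "ChargeCard": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
            # Retry with exponential backoff: waits of 2s, 4s, 8s.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": "IsPremium",
        },
        "IsPremium": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.tier", "StringEquals": "premium", "Next": "ExpressShip"}
            ],
            "Default": "StandardShip",
        },
        "ExpressShip": {"Type": "Pass", "End": True},
        "StandardShip": {"Type": "Pass", "End": True},
        "NotifyFailure": {"Type": "Fail", "Error": "PaymentFailed"},
    },
}


def referenced_states(machine):
    """Collect every state name the definition transitions to."""
    targets = {machine["StartAt"]}
    for state in machine["States"].values():
        targets.update(state[k] for k in ("Next", "Default") if k in state)
        for rule in state.get("Choices", []) + state.get("Catch", []):
            targets.add(rule["Next"])
    return targets


definition = json.dumps(state_machine, indent=2)
dangling = referenced_states(state_machine) - set(state_machine["States"])
```

The `definition` string is what would be supplied when creating the state machine; checking for dangling transitions locally catches typos before deployment.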

Two execution modes are available. Standard workflows are durable, exactly-once, can run up to 365 days, and are billed per state transition (~$25 per million transitions); they are right for business processes, orchestrated saga patterns, and long-running approvals. Express workflows are designed for high-volume, short-lived executions (up to 5 minutes), offer at-least-once semantics, and are billed per invocation plus duration; they are typically far cheaper at high throughput and work well as the orchestration layer behind an API Gateway endpoint.
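The per-transition billing of Standard workflows is worth a quick back-of-the-envelope calculation. The sketch below uses the ~$25 per million transitions figure quoted above and ignores any free tier; the workload numbers are hypothetical.

```python
STANDARD_PRICE_PER_MILLION_TRANSITIONS = 25.0  # USD, approximate figure from the text


def standard_monthly_cost(executions_per_month, transitions_per_execution):
    """Estimate the monthly Standard workflow bill from state transitions alone."""
    transitions = executions_per_month * transitions_per_execution
    return transitions / 1_000_000 * STANDARD_PRICE_PER_MILLION_TRANSITIONS


# Hypothetical workload: 2 million executions, 6 state transitions each.
cost = standard_monthly_cost(2_000_000, 6)  # 12 million transitions
print(f"${cost:.2f} per month")  # prints "$300.00 per month"
```

Because the cost scales with transitions rather than duration, a high-volume, short-lived workflow is usually where Express is worth evaluating.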

Figure: Step Functions order workflow

Step Functions has direct integrations with more than 220 AWS services through optimized integrations and AWS SDK integrations: you can call DynamoDB.GetItem, SQS.SendMessage, ECS.RunTask, or Bedrock.InvokeModel directly from a state without any Lambda glue. Combined with the Wait for callback pattern (a state pauses until an external system calls SendTaskSuccess with the task token), this turns Step Functions into a powerful integration layer that can replace many orchestration Lambdas.
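The callback pattern can be sketched as a single Task state plus the arguments the external system sends back. The `.waitForTaskToken` resource suffix and the `$$.Task.Token` context path are the documented mechanism; the queue URL, state names, `orderId` field, and the `approval_result` helper are hypothetical.

```python
import json

# A Task state that sends a message to SQS and then pauses until called back.
wait_for_approval = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
    "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/approvals",
        "MessageBody": {
            "orderId.$": "$.orderId",
            "taskToken.$": "$$.Task.Token",  # token injected from the context object
        },
    },
    "TimeoutSeconds": 86400,  # give up after a day if nobody answers
    "Next": "Fulfil",
}


def approval_result(message_body, approved):
    """Build the arguments an external worker would pass to SendTaskSuccess."""
    request = json.loads(message_body)
    return {
        "taskToken": request["taskToken"],
        "output": json.dumps({"orderId": request["orderId"], "approved": approved}),
    }


# What the worker would receive from the queue, with a placeholder token.
received = json.dumps({"orderId": "o-123", "taskToken": "example-token"})
callback_args = approval_result(received, approved=True)
```

The worker would pass `callback_args` to the SendTaskSuccess API, at which point the paused state resumes with `output` as its result.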

How to Choose Between Them

The decision boils down to shape of communication. If a client needs a synchronous request/response, use API Gateway. If one producer wants to hand work to one or many independent consumers, use SQS (and add SNS in front for fanout). If you are routing events with structure across teams or third parties, use EventBridge. If you are coordinating a multi-step process with branching, parallelism, retries, or human-in-the-loop, use Step Functions.

In a typical application you will use all five together. API Gateway accepts an order; a Lambda enqueues a job to SQS for asynchronous processing; another Lambda publishes an OrderPlaced event to EventBridge; an EventBridge rule matching that event starts a Step Functions state machine that runs the fulfillment workflow; and SNS notifies the customer at each step via email or SMS. Each piece is independently scalable, deployable, and observable.

Common Patterns and Anti-Patterns

A few patterns appear over and over. Fanout with filtering uses an SNS topic or EventBridge bus where each subscriber declares the subset of messages it cares about; producers never need to know the consumer list. Saga via Step Functions implements long-running business processes with explicit compensation paths instead of distributed transactions. Producer-consumer with DLQ uses SQS plus a dead-letter queue and a redrive policy so transient failures are absorbed without losing messages. Webhook ingestion uses API Gateway → SQS → Lambda so the webhook is acknowledged as soon as it is queued, and a slow downstream cannot cause the third party to retry and back off.

Anti-patterns to avoid: using SQS for ordered per-entity processing without FIFO and a sensible group ID; treating SNS like a queue (it does not store messages for consumers to pull later, so without a backing SQS queue a subscriber that stays down can miss them); building a custom orchestrator on Lambda + DynamoDB when Step Functions Express would do the same job with far less code; and using API Gateway REST when HTTP APIs cover the use case at roughly one-third of the cost.

What is Next

Compute, storage, and integration give you a working architecture. The last piece is the operational layer that keeps it running safely and observably. Part 4 wraps up this series with the four most-used security, observability, and DevOps services on AWS: IAM for identity and access, CloudWatch for metrics/logs/alarms, CloudFormation (and the CDK) for infrastructure as code, and Secrets Manager for handling credentials properly.
