AWS API Gateway supports websockets.
Unfortunately, the service does not persist connection data or reconnect clients on flaky internet connections, and I could not find any example projects with those features. In this article, I will explore a potential solution using Lambda and DynamoDB.
Firstly, how do API Gateway websockets work?
AWS uses routes to execute different actions. Consider a chat application workflow, assuming Lambda is used for compute:
- Client connects to the websocket, firing the $connect route and its associated Lambda.
- Client sends the JSON payload {"action": "sendmessage"}, firing the sendmessage route.
- Server can send data to the client by specifying a socketUrl with ConnectionId.
- If the client sends a JSON payload without an action, the $default route is fired.
- Client disconnects, firing the $disconnect route.
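The workflow above can be sketched in Python. This is a rough illustration rather than API Gateway's actual implementation: it assumes the API's route selection expression is $request.body.action, and select_route, send_to_client and the custom_routes parameter are hypothetical names. The client argument stands in for a boto3 "apigatewaymanagementapi" client created with the socketUrl as its endpoint_url.

```python
import json


def select_route(event_type, body=None, custom_routes=("sendmessage",)):
    """Approximate which route fires for a websocket event.

    Assumes the route selection expression is $request.body.action.
    """
    if event_type == "CONNECT":
        return "$connect"
    if event_type == "DISCONNECT":
        return "$disconnect"
    try:
        payload = json.loads(body or "")
    except ValueError:
        return "$default"  # messages that are not JSON fall through
    action = payload.get("action") if isinstance(payload, dict) else None
    # A missing or unrecognised action also falls through to $default.
    return action if action in custom_routes else "$default"


def send_to_client(client, connection_id, message):
    """Push a JSON message to one connected client.

    `client` is assumed to expose post_to_connection, as the boto3
    "apigatewaymanagementapi" client does.
    """
    data = json.dumps(message).encode("utf-8")
    client.post_to_connection(ConnectionId=connection_id, Data=data)
```

Passing the client in as an argument keeps the sketch runnable without AWS credentials; in a Lambda it would be built once per socketUrl and reused.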
Limitations with API Gateway websockets:
- Every time the $connect route is called, a new ConnectionId is created. To persist a connection on flaky internet, the client must store an ID.
- Lambda is ephemeral, so a database like DynamoDB is required to persist connection data; connection data shouldn’t be stored in memory in case of failure.
- The 10 minute idle connection timeout can be avoided with a ping/pong request.
- The maximum websocket duration is 2 hours, after which a new connection session is required.
A solution to reconnecting websockets on AWS
This solution uses custom actions instead of $connect and $disconnect to manage connections, which shifts connection management from AWS to the client. DynamoDB will persist connection data and provide an event-driven architecture for returning messages as well as interfacing with external services.
Case 1: Ideal connection
- After opening a websocket connection, the client sends an empty socketId to the open route, which generates a new socketId for the client. The socketUrl, ID and currentId are stored in DynamoDB.
- Client calls external service “ping”, providing socketUrl and socketId. These details are used to update the DynamoDB table.
- Updating DynamoDB will trigger the message Lambda to post any stored messages in the updated row to the client.
- Client is responsible for triggering the close route, which will delete the associated row in DynamoDB.
- If the client fails to close the connection, a TTL (time to live) data deletion timer can be specified for DynamoDB.
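The open and close routes could look roughly like the sketch below. It uses a plain dict in place of the DynamoDB table so it runs locally. The attribute names (socketId, socketUrl, connectionId, messages, ttl), the two hour TTL window and the handler names are all assumptions made for illustration. In a Lambda, the dict operations would map to DynamoDB reads, writes and deletes.

```python
import time
import uuid

# Assumed clean-up window. DynamoDB TTL reads an epoch-seconds attribute.
TTL_SECONDS = 2 * 60 * 60


def handle_open(table, socket_url, connection_id, socket_id="", now=None):
    """Register a new client, or re-attach a returning one to its row."""
    now = int(time.time()) if now is None else int(now)
    row = table.get(socket_id) if socket_id else None
    if row is None:
        # Empty or unknown socketId: issue a fresh one with no messages.
        socket_id = uuid.uuid4().hex
        row = {"socketId": socket_id, "messages": []}
    # Point the row at the current connection and push the expiry forward.
    row.update(
        socketUrl=socket_url,
        connectionId=connection_id,
        ttl=now + TTL_SECONDS,
    )
    table[socket_id] = row
    return row


def handle_close(table, socket_id):
    """Delete the client's row; unknown ids are ignored."""
    table.pop(socket_id, None)
```

Refreshing ttl on every open means a client that never calls close is still cleaned up once it stops reconnecting.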
Case 2: Flaky connection
- If the internet connection is poor, the websocket connection may close on the client side.
- When the client is offline, new messages will be stored in DynamoDB.
- A “back online” event listener on the client will create a new websocket connection.
- Client will send an existing socketId to the open route, updating DynamoDB with the new ConnectionId. This triggers an event to send the last message back to the client.
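When the open route updates the row, the change reaches the message Lambda as a DynamoDB stream record. The sketch below shows one way that Lambda might replay stored messages. It assumes the stream is configured to include the new image, that each row has a string connectionId and a list of string messages (hypothetical attribute names), and that client behaves like a boto3 "apigatewaymanagementapi" client. It posts every stored message rather than only the last one, and it neither clears messages after sending nor handles the 410 Gone error a stale ConnectionId would produce.

```python
def handle_stream(event, client):
    """Post stored messages for each inserted or modified row.

    Returns the number of messages sent.
    """
    sent = 0
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # REMOVE records carry no NewImage
        # Stream images use DynamoDB's typed format, e.g. {"S": "text"}.
        image = record.get("dynamodb", {}).get("NewImage", {})
        connection_id = image.get("connectionId", {}).get("S")
        if not connection_id:
            continue
        stored = image.get("messages", {}).get("L", [])
        for item in stored:
            client.post_to_connection(
                ConnectionId=connection_id,
                Data=item["S"].encode("utf-8"),
            )
            sent += 1
    return sent
```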
Case 3: Long connection
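Having the client disconnect and reconnect at an interval of under 10 minutes solves this issue, since no single connection reaches the 10 minute idle timeout or the 2 hour maximum duration.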
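The timing check behind such an interval is small enough to sketch. The 9 minute default and the function name reconnect_due are assumptions; any value under the idle timeout would do. Timestamps are in seconds.

```python
IDLE_TIMEOUT_S = 10 * 60  # API Gateway idle connection timeout


def reconnect_due(now, connected_at, interval=9 * 60):
    """Return True once the current connection is older than `interval`."""
    if not 0 < interval < IDLE_TIMEOUT_S:
        raise ValueError("interval must be positive and under the idle timeout")
    return now - connected_at >= interval
```

A client could run this check on a timer and, when it returns True, close the socket and open a new one with its existing socketId.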
Limitations to consider
- DynamoDB stream adds latency for message updates to client.
- Connection data will not persist if client is refreshed. Also, storing socketId in local storage is not ideal when multiple tabs are used.