By Anubhav Gupta
GoFood, our food delivery product, sees over 50 million transactions per month. The inner workings of GoFood are a maze that involves relentless effort from our GoTroops, driver partners, and merchant partners. On the outside, a typical GoFood order journey looks pretty straightforward:
A customer orders food 🍲
The order goes to our booking system 📱
This order is then sent to our merchant partners 👨🍳
The merchant partner gets 3 minutes to accept the order before it times out ⌛
The order is fulfilled and our customers are happy 💚
Some time ago, our merchant app received orders through a combination of polling, push notifications, and SMS. This approach had issues:
- Polling is a data and battery-intensive operation
- Frequent polling in the background causes the OS to kill the app to save battery
- Polling is non-deterministic. For example, if the CPU is in sleep or Doze mode, the poll will not be triggered
- Push notifications are unreliable as there is no guarantee of delivery
- SMS is expensive, and since it sits outside the Gojek ecosystem, we can’t control its behavior
Now, if you’re thinking we should’ve just reduced the polling frequency to save battery and data… We tried that 😅
But it resulted in bad merchant and customer experience due to the following reasons:
- Since merchants have only a few minutes to accept the order, a delayed poll results in merchants losing orders
- It also increased customer anxiety: customers were left waiting and wondering why the merchant wasn’t accepting the order, which pushed them to cancel
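The trade-off above can be put into rough numbers. The sketch below is a back-of-the-envelope estimate only: the 3-minute acceptance window comes from the order journey described earlier, while the polling intervals are illustrative assumptions, not the values the merchant app actually used. It also assumes every poll fires on time, which sleep and Doze mode do not guarantee.

```python
ACCEPT_WINDOW_S = 3 * 60  # merchants get 3 minutes to accept an order


def polling_cost(interval_s: int) -> dict:
    """Estimate request volume and delivery delay for one polling interval."""
    return {
        "interval_s": interval_s,
        "requests_per_day": 24 * 60 * 60 // interval_s,
        # An order that arrives at a random moment waits half an interval on average.
        "avg_delay_s": interval_s / 2,
        # Worst case: the order arrives right after a poll has completed.
        "worst_delay_s": interval_s,
        "worst_case_time_left_s": max(ACCEPT_WINDOW_S - interval_s, 0),
    }


# Hypothetical intervals, chosen only for illustration.
for interval in (5, 30, 120):
    print(polling_cost(interval))
```

Short intervals cost thousands of requests per device per day, while long intervals eat directly into the time the merchant has left to accept.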
It almost seemed like we’d reached a dead end. But we remembered who we are, and at Gojek, there is always a way. 🖖
Our team brainstormed how to fulfill an order without polling and without relying on push notifications. We began exploring ways to deliver the order through a long-running connection.
But does a long-running connection solve the problems described above? YES! But How?
- A long-running persistent connection is a bi-directional channel where the server can push data at any time
- This eliminates the need for polling on the app.
- This results in very little data and battery consumption
- Also, we can serve the same number of orders with much less server capacity.
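A minimal sketch of the idea, using a local socket pair as a stand-in for the long-lived connection. This is an assumption for illustration, not how Courier is implemented, and the order IDs are made up. The point is that the app side simply blocks on the open connection and wakes up only when the server writes to it.

```python
import socket
import threading


def merchant_app(conn: socket.socket, received: list) -> None:
    """Block on the open connection; wake up only when the server pushes data."""
    with conn, conn.makefile("r", encoding="utf-8") as stream:
        for line in stream:  # no timer and no polling loop
            received.append(line.strip())


def run_demo() -> list:
    # socketpair() stands in for a long-lived TCP connection between app and server.
    server_end, app_end = socket.socketpair()
    received: list = []
    app = threading.Thread(target=merchant_app, args=(app_end, received))
    app.start()

    for order_id in ("order-1", "order-2"):  # hypothetical order IDs
        server_end.sendall(f"{order_id}\n".encode("utf-8"))

    server_end.close()  # closing the connection ends the app's read loop
    app.join()
    return received


orders = run_demo()
print(orders)
```

The app never asks "is there anything new?"; the only traffic on the connection is the orders themselves.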
Now we know that a long-running connection solves the problem. So, what technology should we use?
There are a lot of options available to achieve this. We explored gRPC, WebSocket, MQTT over WebSocket, and MQTT over TCP.
Here’s our comparative analysis:
Challenges on Android
There are some unique challenges on Android which make running a long-running connection quite tricky. Android OS has a few mechanisms to save battery.
- Sleep Mode: The CPU sleeps and does not respond to anything except the RIL (Radio Interface Layer) and alarms.
- Doze Mode: Android introduced Doze mode in Android 6.0 (API level 23). It reduces battery consumption by deferring background CPU and network activity.
- App Standby: An app that goes into standby loses all network access and all its background sync jobs are suspended.
So, why MQTT?
Let's first understand what the MQTT protocol is.
MQTT is an OASIS standard messaging protocol for Internet of Things (IoT). It is designed as an extremely lightweight publish/subscribe messaging transport that is ideal for connecting remote devices with a small code footprint and minimal network bandwidth. MQTT today is used in a wide variety of industries such as automotive, manufacturing, telecommunications, oil and gas, etc.
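To give a sense of how lightweight the wire format is, here is a hand-rolled sketch of an MQTT 3.1.1 PUBLISH packet: one flags byte, a variable-length "remaining length" field, a length-prefixed topic, an optional 2-byte packet identifier (only for QoS 1 and 2), and the payload. The topic name and payload are hypothetical, and a real app would use an MQTT client library rather than encode packets itself.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 bits per byte, high bit means 'more follows'."""
    out = bytearray()
    while True:
        n, digit = divmod(n, 128)
        out.append(digit | (0x80 if n else 0))
        if not n:
            return bytes(out)


def encode_publish(topic: str, payload: bytes, qos: int = 0, packet_id: int = 1) -> bytes:
    """Build a PUBLISH packet with DUP and RETAIN flags left at 0."""
    topic_bytes = topic.encode("utf-8")
    body = struct.pack("!H", len(topic_bytes)) + topic_bytes
    if qos > 0:
        body += struct.pack("!H", packet_id)  # packet identifier only exists for QoS 1/2
    body += payload
    first_byte = (3 << 4) | (qos << 1)  # packet type 3 = PUBLISH
    return bytes([first_byte]) + encode_remaining_length(len(body)) + body


# Hypothetical topic and payload, for illustration only.
topic, payload = "orders/m1", b'{"id":42}'
packet = encode_publish(topic, payload, qos=1)
overhead = len(packet) - len(topic) - len(payload)
print(len(packet), overhead)
```

For this small message, the protocol adds only a handful of bytes on top of the topic and payload.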
Here’s how MQTT helps us with the scenarios listed above.
- Very small Network Footprint: The MQTT protocol has a very small network footprint during CONNECT and PUBLISH, which helps both in poor network conditions and with high-throughput messaging.
- QoS Guarantees: Since merchants often have flaky network connections, it’s important to have QoS 1 (at least once) and QoS 2 (exactly once) delivery guarantees.
- Built-in Acknowledgment: The backend service can listen for publish acknowledgments from the Android app. If it doesn’t receive an acknowledgment within some threshold, it can send a high-priority FCM push notification. On receiving the push, the mobile client can reconnect. This helps in Doze, App Standby, and TCP half-open scenarios.
- Alarm Manager Pings: The Android app schedules pings using RTC_WAKEUP alarms set with the setExactAndAllowWhileIdle() method. This increases the reliability of sending pings to maintain the long-running connection in sleep mode and light Doze mode. Though this is not protocol-specific, the MQTT Android library supports it out of the box.
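The acknowledgment-with-fallback flow described in the list above can be sketched roughly as follows. Everything here is a simulation under stated assumptions: `publish` and `send_fcm_push` are hypothetical callables standing in for a QoS 1 publish that completes when the acknowledgment arrives and for a call to the FCM server API, and the timeout value is arbitrary.

```python
import asyncio


async def deliver_order(publish, send_fcm_push, order_id: str, ack_timeout_s: float) -> str:
    """Publish an order; fall back to a push notification if no ack arrives in time."""
    try:
        # `publish` is assumed to return only once the app has acknowledged the message.
        await asyncio.wait_for(publish(order_id), timeout=ack_timeout_s)
        return "acked"
    except asyncio.TimeoutError:
        await send_fcm_push(order_id)  # nudge the app so it wakes up and reconnects
        return "fallback_push"


# --- Simulated collaborators (hypothetical, for illustration only) ---
pushed = []


async def healthy_publish(order_id: str) -> None:
    await asyncio.sleep(0.01)  # ack comes back quickly


async def dozing_publish(order_id: str) -> None:
    await asyncio.sleep(10)  # device is dozing; ack does not arrive in time


async def fake_fcm_push(order_id: str) -> None:
    pushed.append(order_id)


async def main() -> tuple:
    first = await deliver_order(healthy_publish, fake_fcm_push, "order-1", 0.2)
    second = await deliver_order(dozing_publish, fake_fcm_push, "order-2", 0.2)
    return first, second


results = asyncio.run(main())
print(results, pushed)
```

The push notification is no longer the delivery channel itself; it only acts as a wake-up signal when the primary channel goes quiet.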
Courier is a scalable, low-latency, and persistent network transmission mechanism for mobile<>server communication, built over the MQTT protocol.
Merchant App Order Delivery Use-case
The first use-case we migrated to Courier was the order delivery flow.
Instead of polling, we now have an MQTT long-running connection established between the app and backend. Whenever we have a new order for a merchant, we deliver the order through this connection.
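As a toy illustration of the publish/subscribe routing involved, the sketch below uses an in-memory stand-in for a broker with one topic per merchant. The topic scheme (`merchant/<id>/orders`), the class, and the IDs are assumptions made for this example; the actual Courier topic design is not described here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryBroker:
    """Toy stand-in for an MQTT broker: routes messages by exact topic match."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, on_message: Callable[[str], None]) -> None:
        self._subscribers[topic].append(on_message)

    def publish(self, topic: str, message: str) -> int:
        """Deliver to every subscriber of the topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = InMemoryBroker()
inbox_a: List[str] = []
inbox_b: List[str] = []

# Each merchant app subscribes to its own (hypothetical) topic on connect.
broker.subscribe("merchant/a/orders", inbox_a.append)
broker.subscribe("merchant/b/orders", inbox_b.append)

# The backend publishes a new order to the topic of the merchant it belongs to.
delivered = broker.publish("merchant/a/orders", "order-101")
print(delivered, inbox_a, inbox_b)
```

Only the merchant subscribed to the matching topic receives the order; the backend does not need to know which connection that merchant is on.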
We deployed this solution to our merchant partners a few months back and the results were excellent. 🙌
Acceptance Rate: We saw a very significant increase in acceptance rates for merchants with a relatively low acceptance rate (AR).
Network Consumption: There was about a 50% reduction in network consumption by the app.
Future use cases
We are now planning to expand Courier to multiple use-cases:
- Replacing polling for food order states and trip status in the customer app on Android and iOS
- Sending high-frequency, real-time analytics
- Chat Messaging
The list is endless and we are just getting started. 💪
In Part-2 of this blog, we shall discuss how we went about choosing the MQTT broker to power our setup and which one we chose.