How is Dropbox's newly developed 'asynchronous processing framework' built?



In a synchronous system, where operations run one after another, a single slow operation delays everything queued behind it. Asynchronous processing avoids this by letting the next operation start without waiting for the previous one to finish. Because heavy operations no longer become bottlenecks, it is useful for improving the response speed of websites.
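As a rough illustration of the idea (not Dropbox's code), the sketch below hands heavy work to a background thread so that the caller gets an answer immediately. The class name `BackgroundWorker` and its methods are hypothetical and exist only for this example.

```python
import queue
import threading


class BackgroundWorker:
    """Runs submitted jobs on a separate thread so callers do not block."""

    def __init__(self):
        self._jobs = queue.Queue()
        self.results = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            try:
                if job is None:  # sentinel value: stop the worker
                    return
                self.results.append(job())
            finally:
                self._jobs.task_done()

    def submit(self, job):
        """Queue the job and return at once, without waiting for it to run."""
        self._jobs.put(job)
        return "accepted"

    def close(self):
        """Ask the worker to stop and wait until every queued job is done."""
        self._jobs.put(None)
        self._jobs.join()
```

A web handler would call `submit()` and respond right away, while the slow job finishes later on the worker thread.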

Arun Sai Krishnan, an engineer at the cloud storage service Dropbox, explains the large-scale asynchronous processing framework the company has been developing.

How we designed Dropbox's ATF - an async task framework - Dropbox
https://dropbox.tech/infrastructure/asynchronous-task-scheduling-at-dropbox

Until now, each team at Dropbox had apparently implemented its own asynchronous processing. To improve development efficiency, make the systems more compatible with one another, and reduce dependence on old software, a project was underway to replace each team's implementation with a single asynchronous task framework (ATF). Dropbox's proprietary ATF is a powerful system that processes 10,000 asynchronous tasks per second and can be shared by teams with about 30 different code bases, and there is room for further scaling.

The overall picture of ATF looks like this. It is hard to tell what is going on at first glance, but Krishnan explains the components one by one.



'Frontend' is the part that receives RPCs from ATF users and creates tasks in cooperation with the Task Store described below. A task is the unit of execution for a callback function received from a user, and ATF manages processing task by task.



'Task Store' is the part that stores tasks, links related task records together, and handles reads and writes of them. It is said to use Edgestore, the metadata store developed in-house by Dropbox.
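To make the relationship between the Frontend and the Task Store more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the field names on `Task`, the `InMemoryTaskStore` class (a plain dictionary standing in for Dropbox's in-house metadata store), and the `schedule()` method are not taken from Dropbox's actual API.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    callback_name: str  # hypothetical: name of the user's registered callback
    payload: dict       # hypothetical: arguments passed to the callback
    priority: int = 0
    state: str = "NEW"  # a freshly created task starts in the NEW state


class InMemoryTaskStore:
    """Dictionary stand-in for the real, persistent Task Store."""

    def __init__(self):
        self._tasks = {}

    def put(self, task):
        self._tasks[task.task_id] = task

    def get(self, task_id):
        return self._tasks[task_id]


class Frontend:
    """Accepts scheduling requests and records them as tasks in the store."""

    def __init__(self, store):
        self._store = store
        self._ids = itertools.count(1)

    def schedule(self, callback_name, payload, priority=0):
        task = Task(next(self._ids), callback_name, payload, priority)
        self._store.put(task)
        return task.task_id
```

In a real deployment `schedule()` would be an RPC endpoint and the store would be durable, but the division of labor would look similar.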



'Store Consumer' is the part that polls the Task Store; when it finds a task that is ready to run, it retrieves the task from the Task Store and pushes it to the queue. Unlike the in-house Task Store, the queue uses Amazon Simple Queue Service (SQS).
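The polling step might look roughly like the sketch below. It assumes that "ready to run" simply means the task is in the NEW state, and it uses a `collections.deque` as a stand-in for SQS rather than calling any AWS API. The function name `poll_once` and the dictionary layout of the store are hypothetical.

```python
from collections import deque


def poll_once(task_store, out_queue):
    """Move every runnable task from the store to the queue.

    task_store: dict mapping task id -> {"id": ..., "state": ...}
    out_queue:  deque standing in for the external queue (SQS in the article)
    Returns the number of tasks that were enqueued.
    """
    moved = 0
    for task in task_store.values():
        if task["state"] == "NEW":       # assumption: NEW means ready to run
            task["state"] = "ENQUEUED"   # record that the task has been queued
            out_queue.append(task["id"])
            moved += 1
    return moved
```

A real consumer would call something like this in a loop with a delay between polls.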



'Controller' and 'Executor' are the parts responsible for running tasks; one Controller and multiple Executors run on a single physical machine. When an Executor requests its next task from the Controller, the Controller gets the task from SQS and puts it in a local queue. The Executor then takes the task from the local queue and runs it. Because the Controller keeps a separate local queue for each priority level, it can control the order in which tasks are executed.
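The per-priority local queues can be sketched as follows. This is only an illustration under stated assumptions: the remote queue is a `deque` of `(priority, task_id)` pairs standing in for SQS, lower numbers mean higher priority, and the method names `next_task` and `_refill` are hypothetical.

```python
from collections import deque


class Controller:
    """Pulls tasks from a remote queue into one local queue per priority."""

    def __init__(self, remote_queue, priorities=(0, 1, 2)):
        self._remote = remote_queue  # stand-in for SQS
        # Assumption: a lower number means a higher priority.
        self._local = {p: deque() for p in priorities}

    def _refill(self, batch_size=10):
        """Move up to batch_size tasks from the remote queue to local queues."""
        for _ in range(min(batch_size, len(self._remote))):
            priority, task_id = self._remote.popleft()
            self._local[priority].append(task_id)

    def next_task(self):
        """Called by an Executor; returns a task id, or None if idle."""
        self._refill()
        for priority in sorted(self._local):
            if self._local[priority]:
                return self._local[priority].popleft()
        return None
```

Because `next_task()` always scans the local queues from highest to lowest priority, an urgent task overtakes less urgent ones that arrived earlier.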



'Heartbeat and Status Controller (HSC)' is the part that monitors task execution and updates task status in the Task Store. The Controller sends the HSC an RPC reporting that it has received a next-task request from an Executor, and the Executor sends the HSC RPCs carrying liveness (heartbeat) information about its execution environment and the result of the task. The HSC then updates the status of tasks in the Task Store based on the information from the Controller and the Executor.
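One way to picture the HSC is the sketch below, where ordinary method calls stand in for the RPCs. The class and method names are hypothetical, the Task Store is reduced to a dictionary of task id to state, and the rule that a task whose heartbeats stop is marked RETRIABLE_FAILURE after a timeout is an assumption, not something the article spells out.

```python
import time


class HeartbeatStatusController:
    """Tracks heartbeats and writes task states into a dict-based store."""

    def __init__(self, task_store, timeout_seconds=30.0):
        self._store = task_store      # dict: task id -> state string
        self._timeout = timeout_seconds
        self._last_beat = {}          # task id -> time of last heartbeat

    def task_claimed(self, task_id):
        """Reported by the Controller when an Executor asks for this task."""
        self._store[task_id] = "CLAIMED"

    def heartbeat(self, task_id, now=None):
        """Reported periodically by the Executor while the task is running."""
        self._last_beat[task_id] = time.monotonic() if now is None else now
        self._store[task_id] = "PROCESSING"

    def task_finished(self, task_id, ok):
        """Reported by the Executor with the result of the task."""
        self._store[task_id] = "SUCCESS" if ok else "FATAL_FAILURE"
        self._last_beat.pop(task_id, None)

    def expire_stale(self, now=None):
        """Assumed policy: tasks with no recent heartbeat become retriable."""
        now = time.monotonic() if now is None else now
        stale = [t for t, beat in self._last_beat.items()
                 if now - beat > self._timeout]
        for task_id in stale:
            self._store[task_id] = "RETRIABLE_FAILURE"
            del self._last_beat[task_id]
        return stale
```

The optional `now` argument exists only so the timeout logic can be exercised without actually waiting.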



Tasks move through the following states. A new task starts in the 'NEW' state and changes to 'ENQUEUED' once it has been queued. A task that the Controller has fetched then moves to the 'CLAIMED' state, a task being run by an Executor moves to the 'PROCESSING' state, and when processing ends it moves to either the 'SUCCESS' or the 'FATAL_FAILURE' state. In addition, a task that hits an error in an intermediate state such as 'CLAIMED' or 'PROCESSING' is moved to the 'RETRIABLE_FAILURE' state and becomes eligible for reprocessing, which is said to guarantee that each task is executed at least once.
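The transitions described above can be written down as a small table. The state names come from the article; the rule that a task in RETRIABLE_FAILURE goes back to ENQUEUED is an assumption about how reprocessing works, and the `transition` helper is hypothetical.

```python
from enum import Enum


class State(Enum):
    NEW = "NEW"
    ENQUEUED = "ENQUEUED"
    CLAIMED = "CLAIMED"
    PROCESSING = "PROCESSING"
    SUCCESS = "SUCCESS"
    FATAL_FAILURE = "FATAL_FAILURE"
    RETRIABLE_FAILURE = "RETRIABLE_FAILURE"


# Which states each state may move to.
ALLOWED = {
    State.NEW: {State.ENQUEUED},
    State.ENQUEUED: {State.CLAIMED},
    State.CLAIMED: {State.PROCESSING, State.RETRIABLE_FAILURE},
    State.PROCESSING: {State.SUCCESS, State.FATAL_FAILURE,
                       State.RETRIABLE_FAILURE},
    # Assumption: a retriable task is put back on the queue.
    State.RETRIABLE_FAILURE: {State.ENQUEUED},
    State.SUCCESS: set(),        # terminal
    State.FATAL_FAILURE: set(),  # terminal
}


def transition(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Because only 'SUCCESS' and 'FATAL_FAILURE' are terminal, a task that fails partway through always has a path back to the queue, which is what makes the at-least-once guarantee possible.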



That is the ATF architecture developed by Dropbox. 'We hope this post will help other engineers develop a good asynchronous task framework,' said Krishnan.

in Software, Web Service, Posted by darkhorse_log