How the Bluesky team reduced timeline latency by 96%



The number of Bluesky users exceeded 30 million on January 30, 2025. As the user base continues to grow, Jaz of the Bluesky development team reports that by changing how timelines are managed, the team cut latency by 96% without affecting most users.

When Imperfect Systems are Good, Actually: Bluesky's Lossy Timelines · Jaz's Blog
https://jazco.dev/2025/02/19/imperfection/



Bluesky indexes user posts and stores them in a database. It also builds each user's timeline on write: when someone posts, Bluesky looks up all of that user's followers and adds a new row referencing the post to each follower's timeline table.
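The write path described above is commonly called fan-out on write. The sketch below illustrates the idea with in-memory dictionaries standing in for the follower graph and the timeline table; the names `followers`, `timelines`, `follow`, and `publish` are hypothetical and do not come from Bluesky's codebase.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the follower graph and the timeline table.
followers: dict[str, set[str]] = defaultdict(set)    # author -> accounts following them
timelines: dict[str, list[str]] = defaultdict(list)  # follower -> post references, oldest first


def follow(follower: str, author: str) -> None:
    """Record that `follower` follows `author`."""
    followers[author].add(follower)


def publish(author: str, post_ref: str) -> int:
    """Fan out on write: add a reference to the post to every follower's timeline.

    Returns the number of timeline rows written, i.e. the follower count.
    """
    written = 0
    for follower in followers[author]:
        timelines[follower].append(post_ref)  # one new "row" per follower
        written += 1
    return written
```

The cost of `publish` grows linearly with the author's follower count, which is why the rest of the article focuses on how long these per-follower writes take.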



The timeline table is split into hundreds of shards, and the write load on each individual timeline is small enough that it causes no problems as long as users behave normally.



However, at the time of writing, Bluesky has more than 31.88 million users, and some of them behave abnormally, like bots, for example by following hundreds of thousands of accounts. Because every post from every account they follow is written into their timeline, such users' timelines receive an enormous number of writes, which increases system load and latency. And because each shard holds many users' timelines, the slowdown also affects other users whose timelines live on the same shard.
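To see why one heavy follower hurts its neighbors, here is a minimal sketch of hash-based sharding plus a per-shard write count. The hashing scheme, the shard count of 256, and the function names are assumptions for illustration; the article only says the table is split into "hundreds of shards". It also assumes each followed account posts once, so a user's timeline receives as many writes as the number of accounts they follow.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 256  # assumption: the article only says "hundreds of shards"


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user's timeline to a shard with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_write_load(following: dict[str, int], num_shards: int = NUM_SHARDS) -> Counter:
    """Estimate timeline writes per shard.

    `following` maps each user to the number of accounts they follow. Assuming
    each followed account posts once, that user's timeline receives that many writes,
    all of which land on the single shard holding the timeline.
    """
    load: Counter = Counter()
    for user, follow_count in following.items():
        load[shard_for(user, num_shards)] += follow_count
    return load
```

With 1,000 ordinary users following 200 accounts each and one bot following 300,000, the bot's shard alone absorbs more writes than all the other users combined, and every ordinary user who happens to share that shard waits behind it.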



Jaz explains where the latency comes from using the example of a single post that must be written to the timelines of 2 million followers.

Writing a post into one timeline takes 600 microseconds on average, so writing to 2 million timelines one after another would take 1,200 seconds, or 20 minutes in total. Processing 1,000 timelines in parallel cuts the total to 1.2 seconds.
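The arithmetic behind those two figures can be checked with a small helper. This is an idealized model (every write takes exactly the average latency, and there is no overhead besides the writes themselves); `fanout_seconds` is a hypothetical name.

```python
def fanout_seconds(num_timelines: int, avg_write_us: float, parallelism: int = 1) -> float:
    """Idealized fanout time: every write takes exactly the average latency and
    `parallelism` writes run at once with no other overhead."""
    rounds = -(-num_timelines // parallelism)  # ceiling division: number of parallel rounds
    return rounds * avg_write_us / 1_000_000   # microseconds -> seconds


sequential = fanout_seconds(2_000_000, 600)       # 1,200 s
parallel = fanout_seconds(2_000_000, 600, 1000)   # 2,000 rounds of 600 us
print(sequential / 60, parallel)  # 20.0 1.2
```

The model assumes every write is average; the next paragraph explains why real fanout is slower.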

However, 600 microseconds is only the average; individual writes can take more than 15 milliseconds. Fanout is also paginated in pages of 10,000 followers, so once an account has more than 10,000 followers, the system must repeatedly fetch the next page before it can continue. When slow writes and pagination compound, fanning out a post from a celebrity with 200,000 followers can take as long as 5 minutes.
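A simplified model shows how a few slow writes dominate. The sketch assumes (these are illustrative assumptions, not details from the article) that each page of followers is written in rounds of `concurrency` parallel writes, that a round finishes only when its slowest write does, and that fetching each page has a fixed cost. `paginated_fanout_ms` is a hypothetical helper, and the model is not meant to reproduce the 5-minute figure exactly.

```python
from typing import Callable


def paginated_fanout_ms(
    num_followers: int,
    page_size: int,
    concurrency: int,
    write_latency_ms: Callable[[int], float],
    page_fetch_ms: float = 0.0,
) -> float:
    """Total fanout time under a simplified model: followers are fetched one page
    at a time, each page is written in rounds of `concurrency` parallel writes,
    and each round waits for its slowest write before the next round starts."""
    total = 0.0
    for page_start in range(0, num_followers, page_size):
        total += page_fetch_ms  # cost of loading the next page of followers
        page_end = min(page_start + page_size, num_followers)
        for round_start in range(page_start, page_end, concurrency):
            round_end = min(round_start + concurrency, page_end)
            # The whole round is gated by its slowest write (tail latency).
            total += max(write_latency_ms(i) for i in range(round_start, round_end))
    return total


def steady(i: int) -> float:
    return 0.6  # every write takes the 600-microsecond average


def spiky(i: int) -> float:
    return 15.0 if i % 1000 == 0 else 0.6  # one 15 ms straggler per 1,000 writes
```

For 200,000 followers with 10,000-follower pages and 1,000 concurrent writes, `steady` gives 120 ms, while `spiky` gives 3,000 ms: a single straggler in each round makes the whole fanout 25 times slower, even though 99.9% of writes are still fast.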



Jaz points out that users who follow more than 100,000 accounts never actually read their entire timeline. For users who follow that many accounts, the load on each shard can be reduced with a mechanism that keeps the timeline in chronological order but deliberately drops some of the posts written to it. Jaz calls this mechanism 'Lossy Timelines.'
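One way to build a lossy timeline is to skip each write into a heavy follower's timeline with some probability. The sketch below keeps a write with probability `limit / following_count` once a user follows more than `limit` accounts. The threshold `FOLLOW_LIMIT = 2_000` and the proportional formula are assumptions for illustration, not values confirmed by the article.

```python
import random

FOLLOW_LIMIT = 2_000  # hypothetical threshold; the article does not state Bluesky's value


def keep_probability(following_count: int, limit: int = FOLLOW_LIMIT) -> float:
    """Timelines of users following at most `limit` accounts stay lossless (1.0);
    above that, the chance of keeping each write shrinks in proportion."""
    if following_count <= limit:
        return 1.0
    return limit / following_count


def should_write(following_count: int, rng: random.Random, limit: int = FOLLOW_LIMIT) -> bool:
    """Decide whether one post is written into one follower's timeline."""
    return rng.random() < keep_probability(following_count, limit)
```

Because the writes that survive are still appended in posting order, the timeline remains chronological; it simply has gaps. Under these assumptions, a user following 200,000 accounts keeps about 1% of incoming posts, so that user's shard receives roughly the same write volume as one from a user following 2,000 accounts, while users under the threshold see no change at all.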

Implementing Lossy Timelines significantly reduced the latency of Bluesky's timeline writes. The graph below shows P90 latency (the time within which 90% of requests complete) on the top and P99 latency (the time within which 99% of requests complete) on the bottom. After Lossy Timelines were introduced, P99 latency fell by more than 90%.
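P90 and P99 are percentiles of the latency distribution. The sketch below uses the nearest-rank definition, which is one of several common percentile definitions; monitoring systems often interpolate instead, so their numbers can differ slightly. `percentile` is a hypothetical helper.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# 100 requests: 95 fast, 4 slow, 1 very slow (milliseconds)
latencies = [1.0] * 95 + [50.0] * 4 + [900.0]
print(percentile(latencies, 90), percentile(latencies, 99))  # 1.0 50.0
```

The example shows why P99 is the number to watch for fanout problems: the median and even P90 can look healthy while the slowest 1% of requests, such as fanouts to heavy timelines, are far worse.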



The diagram below shows the P99 latency of the whole timeline write process before (top) and after (bottom) the introduction of Lossy Timelines. End-to-end P99 latency fell by 96%, and operations that previously took 5 to 10 minutes reportedly now complete in under 10 seconds.



Jaz also points anyone interested in working on systems like these to Bluesky's jobs page.

Jobs - Bluesky
https://bsky.social/about/join



Posted by log1o_hf in Web Service