Jun 09, 2024 22:05:00

Here's an animated visualization of the various rate limiting algorithms:

There are many algorithms that perform rate limiting to prevent spam, etc. ChatGPT-related service smudge.ai has published a blog post on its development blog that uses animation to clearly summarize the characteristics of various algorithms.

rate limiter – smudge.ai blog

https://smudge.ai/blog/ratelimit-algorithms

In the case of spam appearing in a stream's chat, if there is no rate limiting, a spammer can flood the chat by posting multiple times in a short period of time.

Check 'Enable rate limiting' in the upper left to see how rate limiting works. By adding rate limiting, we were able to block most of the spammers' posts and reduce their impact on the chat section. By selecting the appropriate algorithm depending on the situation, we can effectively block spam while reducing the impact on general users' chats.

A commonly used rate limiting algorithm is 'Fixed windows,' which limits the number of hits allowed within a certain period of time. The blog animation shows a timeline, and you can simulate hits by manually clicking 'Hit.' The example has a limit of '6 hits per day,' meaning that after six successful hits immediately after the date change, no more hits are allowed for the rest of the day.

The fixed window method has the advantages of being simple, easy to implement, and easy for users to understand, but it can sometimes result in more traffic than expected concentrating near the end of a period.

To prevent consecutive accesses near the end of the period, you can start the period from the user's first access. When using this method, it is important to show the user when the rate limit will be restored.

Apart from fixed windows, one of the commonly used methods is the 'sliding window method.' With the sliding window method, a rate limiting period occurs each time access occurs, and the rate limit is restored by 1 after a certain period of time. The sliding window method distributes traffic smoothly, making it suitable for use in high-load environments, but it is said to be complex to implement and tends to increase the load.

When limiting the rate to relieve load, it would be counterproductive if the rate limiting algorithm increased the load. For this reason, many services use an approximation method called the floating window method for calculation. The floating window method is based on the fixed window method, and the number of accesses that occurred in the previous window is weighted by the proportion of the previous window between the current time and a certain time ago, and is included as a target for rate limiting.

For example, in the figure below, five accesses occurred in the previous window, and the proportion of the previous window from the 'current time' to a certain time ago is 57%. Since 5 x 0.57 = 2.85 is added to the rate limit, the rate limit occurs and the request fails even though only three requests have occurred in the current window.

Strictly speaking, the timing of access differs between the sliding window method and the floating window method, but the number of accesses is the same on average, and the floating window method has the advantage that it is much easier to implement and has a lighter load.

Another rate limiting method is the 'token bucket method.' Instead of tracking access over a period of time, the token bucket method uses a 'bucket in which tokens accumulate at regular intervals.' If there are tokens in the bucket, access is allowed, and once the tokens are used up, rate limiting occurs until they are replenished.

Although the token bucket method still allows for the possibility of sudden increases in traffic, it has the advantage that it can clearly define the upper limit of the number of consecutive accesses without tracking access over a certain period of time, and the long-term allowable average access rate is within the token replenishment interval.

In addition, the token bucket method is highly flexible and can be used to mimic other methods. For example, by setting the tokens to be replenished every time by the same number as the upper limit, it is possible to achieve the same rate limit as a fixed window method that starts from the time the user accesses the service.

Regardless of which method is used, if a user's access is denied due to rate limiting, the user should be notified via

a 429 status code and/or HTTP headers such as 'RateLimit-Remaining' or 'RateLimit-Reset.'

At the end of the original blog post there is an interactive animation that compares the rate limiting methods discussed in this article, so be sure to check it out if you're interested.

Related Posts:

Jun 09, 2024 22:05:00 in Software, Web Application, Posted by log1d_ts