Google upgrades Gmail's spam filter to automatically detect "spam that can only be read by humans"



Spam emails sent indiscriminately and in bulk are used not only to promote products and services but also, at times, to lure recipients to websites that distribute malware or steal personal information. Google has announced an upgrade to Gmail's spam filter that it says significantly improves its ability to detect spam, a problem that affects many people.

Google Online Security Blog: Improving Text Classification Resilience and Efficiency with RETVec
https://security.googleblog.com/2023/11/improving-text-classification.html



Gmail's AI-powered spam detection is its biggest security upgrade in years | Ars Technica
https://arstechnica.com/gadgets/2023/12/gmails-ai-powered-spam-detection-is-its-biggest-security-upgrade-in-years/

Gmail's spam detection has received its 'largest defense upgrades'
https://9to5google.com/2023/12/04/gmail-spam-detection-retvec/

Google uses text classification models, which read the text of a piece of content, to identify phishing attacks, harmful comments, scams, and other abuse across services such as Gmail, YouTube, and Google Play.

Attackers, for their part, try to slip past these filters by mixing special characters, emojis, deliberate typos, and similar tricks into their messages, producing text that reads normally to humans but that computers fail to interpret correctly.

Below is an example of a message created using a technique called 'adversarial text manipulation.' At first glance it appears to read 'Congratulations! A balance of $1,000 is available for your jackpot account,' but the digit '0' (zero) is mixed in with the letter 'O,' and mathematical symbols that look like letters to the human eye are used to evade computer-based spam detection.
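To illustrate why such tricks work, here is a minimal Python sketch, assuming a hypothetical filter that flags messages containing blocklisted words. The blocklist and the helper names `naive_flag` and `nfkc_flag` are invented for illustration; this is not how Gmail's filter works. It also shows that standard Unicode NFKC normalization undoes the mathematical-letter trick but not the 0-for-O swap.

```python
import re
import unicodedata

# Hypothetical blocklist, for illustration only.
BLOCKLIST = {"jackpot", "congratulations"}

def naive_flag(text: str) -> bool:
    """Flag text if any lowercased word exactly matches a blocklisted word."""
    return any(word in BLOCKLIST for word in re.findall(r"\w+", text.lower()))

def nfkc_flag(text: str) -> bool:
    """Same check, but apply Unicode NFKC normalization first."""
    return naive_flag(unicodedata.normalize("NFKC", text))

plain = "Congratulations! Your jackpot account has a $1,000 balance"

# Trick 1: swap the letter "o" for the digit "0".
zero_for_o = plain.replace("o", "0")

# Trick 2: swap lowercase a-z for MATHEMATICAL BOLD SMALL letters (U+1D41A onward).
math_bold = "".join(
    chr(0x1D41A + ord(c) - ord("a")) if "a" <= c <= "z" else c for c in plain
)

for name, text in [("plain", plain), ("zero_for_o", zero_for_o), ("math_bold", math_bold)]:
    print(name, "naive:", naive_flag(text), "nfkc:", nfkc_flag(text))
```

Running it, the plain message is flagged by both checks, the 0-for-O version slips past both, and the mathematical-bold version is caught only after NFKC normalization. Hand-written clean-up rules like this only cover the tricks someone anticipated, which is one motivation for a learned approach.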



In a post on its security blog on November 29, 2023, Google announced that it had developed a new multilingual text vectorizer called 'Resilient & Efficient Text Vectorizer (RETVec)' to make the text classification models that detect spam emails more robust and efficient.
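A text vectorizer turns text into numbers that a classification model can process. One common conventional design, shown here only as a hypothetical sketch and not as Gmail's previous vectorizer, assigns each known word an ID from a fixed vocabulary and maps every other word to a single "unknown" ID. The toy vocabulary `VOCAB` and the function `vectorize` are invented for illustration.

```python
# Hypothetical toy vocabulary; real vocabularies typically hold many thousands of words.
VOCAB = {"[UNK]": 0, "congratulations": 1, "your": 2, "jackpot": 3, "account": 4}

def vectorize(text: str) -> list[int]:
    """Map each lowercased, whitespace-separated word to its vocabulary ID.

    Any word not in VOCAB collapses to the single [UNK] ID (0).
    """
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]

print(vectorize("Congratulations your jackpot account"))  # [1, 2, 3, 4]
print(vectorize("C0ngratulati0ns y0ur jackp0t acc0unt"))  # [0, 0, 0, 0]
```

To the classifier, the manipulated sentence becomes four identical "unknown" tokens, so whatever it learned about the word "jackpot" no longer applies.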

RETVec is effective against adversarial text manipulation because it uses machine learning to recognize characters by their visual similarity, much as human vision does, rather than treating each character purely as a digital code. It also includes a very lightweight word embedding model with fewer than 200,000 parameters, which significantly reduces computational cost and latency and allows it to run on local devices.
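The general idea of a vocabulary-free character encoding can be sketched as follows. This is a simplified illustration, not the retvec library's API: it assumes each character is encoded as the binary form of its Unicode code point (24 bits here, an assumed width) and that each word is truncated or padded to 16 characters (also an assumed value). The names `encode_char` and `encode_word` are hypothetical, and the learned model that RETVec runs on top of its encoding is not shown.

```python
BITS_PER_CHAR = 24   # assumed width; every Unicode code point (max 0x10FFFF) fits in 21 bits
CHARS_PER_WORD = 16  # assumed word length; longer words are truncated, shorter ones padded

def encode_char(ch: str) -> list[int]:
    """Return the Unicode code point of ch as BITS_PER_CHAR bits, most significant first."""
    code_point = ord(ch)
    return [(code_point >> shift) & 1 for shift in range(BITS_PER_CHAR - 1, -1, -1)]

def encode_word(word: str) -> list[list[int]]:
    """Encode a word as a fixed CHARS_PER_WORD x BITS_PER_CHAR matrix of bits."""
    rows = [encode_char(ch) for ch in word[:CHARS_PER_WORD]]
    # Pad with all-zero rows so every word has the same shape.
    rows += [[0] * BITS_PER_CHAR for _ in range(CHARS_PER_WORD - len(rows))]
    return rows

for word in ["jackpot", "jackp0t", "𝐣𝐚𝐜𝐤𝐩𝐨𝐭", "ジャックポット"]:
    matrix = encode_word(word)
    print(word, len(matrix), "x", len(matrix[0]))  # each prints 16 x 24
```

Because every word, in any script, maps to the same fixed-size 16 x 24 grid of bits, no vocabulary table is needed and no input falls back to an "unknown" token. Any notion of similarity between "jackpot" and "jackp0t" has to come from the trained model that consumes these grids.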

Over the past year, Google has extensively tested RETVec and confirmed that it is highly effective in security and anti-abuse applications. In particular, the company reports that replacing the previous text vectorizer in Gmail's spam classifier with RETVec improved the spam detection rate by 38% and reduced the false positive rate by 19.4%. "In addition, using RETVec reduced the model's Tensor Processing Unit (TPU) usage by 83%, making the RETVec rollout one of the largest security upgrades in recent years," Google said, highlighting the gains in spam detection accuracy.
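To make the reported percentages concrete, the short calculation below applies them to baseline values. It assumes the figures are relative changes, which the announcement's wording does not spell out, and the baseline detection and false positive rates are invented for illustration; only the 38%, 19.4%, and 83% figures come from Google. The helper `apply_relative_change` is hypothetical.

```python
def apply_relative_change(baseline: float, pct: float) -> float:
    """Scale baseline by a relative change of pct percent (negative means a decrease)."""
    return baseline * (1 + pct / 100)

# Hypothetical baseline rates, for illustration only (not published by Google).
baseline_detection = 0.50
baseline_fpr = 0.010

new_detection = apply_relative_change(baseline_detection, 38)  # about 0.69
new_fpr = apply_relative_change(baseline_fpr, -19.4)           # about 0.00806
tpu_remaining = apply_relative_change(1.0, -83)                # about 0.17 of former usage

print(f"detection {new_detection:.2f}, FPR {new_fpr:.5f}, "
      f"TPU {tpu_remaining:.2f} of before ({1 / tpu_remaining:.1f}x less)")
```

Under that reading, an 83% cut in TPU usage means the model needs about 17% of its former TPU time, roughly 5.9 times less.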

In addition, Google has released the RETVec source code at the link below.

GitHub - google-research/retvec: RETVec is an efficient, multilingual, and adversarially-robust text vectorizer.
https://github.com/google-research/retvec



in Software,   Web Service,   Security, Posted by log1h_ik