Email address obfuscation techniques such as 'replace ☆ with @' can easily be defeated with ChatGPT



If you post your email address as-is in the profile section of a social networking service, scrapers may collect it and send spam to it, so obfuscation such as writing ☆ in place of @ and asking readers to change it back is often used. However, an AI tool developer has pointed out that this technique can easily be circumvented with ChatGPT.

Email Obfuscation Rendered (almost) Ineffective Against ChatGPT

https://bulkninja.notion.site/Email-Obfuscation-Rendered-almost-Ineffective-Against-ChatGPT-728fba1b948d42c6b8dfa73cb64984e4



Arnaud Norman, who develops the AI tool 'BulkNinja', was using ChatGPT on a project to organize the 'Ask HN: Who is hiring?' threads on the social news site Hacker News when he realized that it could make email address obfuscation pointless.

On 'Ask HN: Who is hiring?', companies and startups post job advertisements and, conversely, job seekers promote themselves. At the time of writing, a total of 48,934 posts had been collected, but their format is inconsistent, so organizing this vast amount of information is a painstaking task.

Mr. Norman, who was trying to compile this data into Google Sheets, expected that extracting obfuscated contact details would be difficult, but with ChatGPT he was able to collect the contact information without any problem, even when the @ had been replaced with other text.
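To see why plain character substitution offers little protection, consider reversing it with fixed rules. The Python sketch below is not Norman's code (he used ChatGPT); the `SUBSTITUTIONS` table, the `deobfuscate` function, and the sample strings are hypothetical and cover only a few common variants such as ☆, [at], and [dot].

```python
import re

# Hypothetical substitution table: each pattern maps an obfuscated token
# (plus surrounding spaces) back to the character it stands for.
SUBSTITUTIONS = [
    (re.compile(r"\s*(?:☆|★|\[at\]|\(at\)|\{at\}|\s+at\s+)\s*", re.IGNORECASE), "@"),
    (re.compile(r"\s*(?:\[dot\]|\(dot\)|\{dot\}|\s+dot\s+)\s*", re.IGNORECASE), "."),
]

# A simple (not RFC-complete) pattern for anything email-shaped.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def deobfuscate(text: str) -> list[str]:
    """Undo the substitutions in SUBSTITUTIONS, then extract email-shaped strings."""
    for pattern, replacement in SUBSTITUTIONS:
        text = pattern.sub(replacement, text)
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    print(deobfuscate("Contact: john ☆ example.com"))         # ['john@example.com']
    print(deobfuscate("jane [at] startup [dot] io"))         # ['jane@startup.io']
    print(deobfuscate("The email address is in my profile"))  # []
```

A fixed table like this only recognizes the patterns someone thought to list, and the bare " at " rule can misfire on ordinary prose. A language model can generalize to variants nobody listed in advance, which is the gap Norman's experience points to.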



Besides the 'replacement method,' Mr. Norman came across three obfuscation techniques during the project that impressed him:

◆1: Information division
This technique writes the email address as something like 'john@(company name's domain)', so the full address cannot be worked out unless it is combined with the company name given elsewhere in the post. Although this method was quite effective, it was easily defeated by adding 'think step by step' to the prompt.

◆2: Indirect publication
Instead of writing the email address directly, the post includes a line such as 'For inquiries, please use the email address on the job information page,' so the address cannot be obtained without visiting that page. Because Mr. Norman's code had no browsing capability, this method still worked.

◆3: Indirect publication part 2
This is a variation of the method above: the post says 'The email address is in my profile' and refers readers to the author's Hacker News profile. This method was also effective, for the same reason.
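The information division technique (◆1) is defeated by a combining step, which a 'think step by step' prompt draws out of the model. As a hedged illustration, the Python sketch below performs that step with explicit rules, assuming (hypothetically) that the company's website URL appears somewhere in the same post; the function `combine_split_address` and the sample post are made up for this example.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# A local part written as "name@" that is NOT followed by a real domain,
# e.g. "john@company name domain" (hypothetical pattern).
PARTIAL_RE = re.compile(r"\b([A-Za-z0-9._%+-]+)@(?![A-Za-z0-9-]+\.[A-Za-z]{2,})")
# Any http(s) URL in the post, assumed here to be the company's own site.
URL_RE = re.compile(r"https?://[^\s)]+")


def combine_split_address(post: str) -> Optional[str]:
    """Join a partial address with the domain of the first URL in the post."""
    partial = PARTIAL_RE.search(post)
    url = URL_RE.search(post)
    if partial is None or url is None:
        return None
    # Take the hostname, drop a trailing dot and a leading "www.".
    host = (urlparse(url.group(0)).hostname or "").rstrip(".")
    host = host.removeprefix("www.")
    if not host:
        return None
    return f"{partial.group(1)}@{host}"


if __name__ == "__main__":
    post = "We're hiring! Email john@company name domain. Site: https://www.acme.io/jobs"
    print(combine_split_address(post))                                 # john@acme.io
    print(combine_split_address("The email address is in my profile"))  # None
```

When a post names only the company and gives no URL, a rule like this has nothing to work with, whereas a language model may be able to supply the domain from the company name. The indirect publication techniques (◆2 and ◆3) defeat both approaches, since the address is not in the post at all.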

Mr. Norman succeeded in organizing email addresses into Google Sheets using generative AI, but ultimately decided to exclude the obfuscated addresses from the database, because going out of one's way to obfuscate an address clearly signals that the person does not want it collected.



Commenting on this experience, Norman said: 'In summary, traditional email obfuscation techniques like character substitution are completely ineffective in the face of advanced language models like ChatGPT. These models are remarkably good at deciphering such techniques, so the fight to protect email addresses from automated collection is heating up. If you really want to protect your email address, you may be able to protect it better by using multiple layers of obfuscation or by scattering it across multiple sources.'

In the Hacker News thread that featured Norman's article, some posts pointed out that 'extracting email addresses with ChatGPT costs more than the revenue generated by scraping emails, so this will have no impact,' while others countered that operating costs can be kept low because there are open-source models that run on local machines.

in Software, Posted by log1l_ks