AI 'DarkBERT' trained on data from the dark web, where hackers and criminals gather
Chat AIs such as ChatGPT, Microsoft Bing, and Google Bard are trained on data from the internet. Such models can be specialized for a particular field, for example finance or the military, by narrowing the kind of data they are trained on. A research team in South Korea has revealed that it has developed 'DarkBERT', a model specialized for the dark web and trained only on dark web data.
DarkBERT: A Language Model for the Dark Side of the Internet
New DarkBert AI was trained using dark web data from hackers and cybercriminals | Tom's Guide
Dark Web ChatGPT Unleashed: Meet DarkBERT | Tom's Hardware
Youngjin Jin of the Korea Advanced Institute of Science and Technology (KAIST) and colleagues crawled the dark web for 16 days via the Tor network, which is commonly used to access it, and built a database of dark web text. They then trained 'RoBERTa', a natural language processing architecture from Meta, on this data to produce 'DarkBERT', an AI specialized for the dark web.
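As a rough illustration of what training a RoBERTa-style model on a text corpus involves, the sketch below shows the masked-token objective in plain Python: some tokens are hidden, and the model is trained to recover the originals. The function name `mask_tokens`, the whitespace tokenizer, the 15% default masking rate, and the sample sentence are assumptions made for illustration and are not details taken from the DarkBERT research. The sketch also simplifies by always substituting the mask token, and it omits the neural network itself.

```python
import random

MASK = "<mask>"  # mask token string used by RoBERTa tokenizers


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Hide a random subset of tokens for masked language modeling.

    Returns (masked_tokens, labels). labels[i] holds the original token
    where a mask was applied and None everywhere else, so a model would
    only be scored on the positions it has to reconstruct.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append(MASK)   # hide the token from the model
            labels.append(tok)    # remember what should be predicted
        else:
            masked.append(tok)    # leave the token visible
            labels.append(None)   # nothing to predict at this position
    return masked, labels


if __name__ == "__main__":
    # Hypothetical sample sentence; a real corpus would be the crawled text.
    sentence = "the forum post lists prices and contact details"
    masked, labels = mask_tokens(sentence.split(), mask_rate=0.3, seed=0)
    print(" ".join(masked))
    print(labels)
```

Because the masks are drawn at random each time the function is called without a seed, the same sentence can yield different training examples on each pass over the corpus.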
Because it was trained on dark web data, DarkBERT can analyze the distinctive terminology and heavily obfuscated messages used on the dark web and extract useful information from them. Jin and colleagues do not plan to release DarkBERT to the public, but they are accepting requests to use it for research purposes.
Despite being trained on limited data, DarkBERT is said to perform as well as other large language models. DarkBERT is a new AI model, but it is based on 'RoBERTa', which was developed by Facebook researchers in 2019. RoBERTa was itself built on an earlier natural language processing model.
However, Jin and colleagues, who built on RoBERTa, pointed out that RoBERTa was insufficiently trained when it was first released. Their research shows that 'RoBERTa can do more'.
Technology news site Tom's Guide said, 'DarkBERT may represent the future of AI models that are trained on a specific field and become more specialized. It wouldn't be surprising if similar AI models developed in a similar way appeared.'
in Software, Posted by log1p_kr