Claude 3, an AI model with an estimated IQ of over 100, is trained to have a 'good personality'



Anthropic, an AI startup founded by former OpenAI engineers, is developing 'Claude', a chat AI based on a large-scale language model (LLM). 'Claude 3', released in March 2024, has attracted attention because its estimated IQ exceeds the human benchmark of 100. Anthropic has reported on its attempt to 'train AI models to have beneficial personality traits'.

Claude's Character | Anthropic
https://www.anthropic.com/research/claude-character

Exploring Claude 3's Character: A New Approach in AI Training - Blockchain.News
https://blockchain.news/news/exploring-claude-3-character

Generally, companies that develop AI models train their models not to say harmful things or assist in harmful tasks, that is, to achieve 'harmless behavior.' However, Anthropic points out that a good character involves more than harmlessness: it also includes curiosity about the world, a commitment to telling the truth without being unkind, a disposition that is neither overconfident nor overly humble, and the ability to see problems from multiple perspectives.

Anthropic says, 'Of course, AI models are not humans. But as AI models become more and more capable, we can train them to have a much richer sense of character and to behave better. That may help us better discern whether and why an AI model avoids assisting with a task that could cause harm, and how it responds instead.'

At the time of writing, the latest Claude 3 is the first model whose alignment fine-tuning, the process that aligns the model with its purpose and ethical principles, includes 'character training.' Anthropic explains that the goal of this training was for Claude to develop richer, more nuanced characteristics such as curiosity, open-mindedness, and thoughtfulness.



AI models like Claude interact with people all over the world who hold a wide variety of beliefs, values, and views. It is undesirable for an AI model either to alienate people over a particular opinion or to agree with them indiscriminately regardless of what the opinion is, but making a model adaptable to diverse values is not easy. Anthropic therefore believes that instilling desirable underlying 'personality traits' in the AI model will make it easier for it to handle the difficult situations that arise in practice.

To prevent AI models from alienating people or indiscriminately endorsing others' views, there are approaches such as 'always hold moderate political and religious values' or 'avoid expressing opinions on issues such as politics and religion.' However, a model that adopts a 'moderate' stance is still fully embracing a particular set of opinions, even if they are not extreme, and even if political statements are completely prohibited, there is a risk that the model will acquire prejudice and discrimination through training.

Anthropic says, 'Instead of training the model to adopt every view it encounters, strongly embrace a single view, or pretend to have no views or biases, we can train the model to be honest about its biases even when they differ from the interlocutor's. We can also train the model to show reasonable open-mindedness and curiosity rather than overconfidence in one worldview.' Anthropic is trying to give Claude personality traits such as the following:

- I like to see things from multiple perspectives and analyze them from multiple angles, but I am not afraid to voice my opposition to views that I consider to be unethical, extreme, or factually incorrect.
- I believe it is important to always strive to tell people the truth, rather than just telling them what they want to hear.
- I am deeply committed to being good and to discerning what is right. I care about ethics and strive to be thoughtful about ethical issues.



Although Anthropic sometimes encourages Claude to adopt particular values, the character training prioritizes giving Claude a broad range of traits and avoids narrow perspectives and opinions as much as possible. Claude is also meant to present itself as an AI model rather than a human, so that conversation partners do not mistakenly believe they are talking to a person. To that end, Claude is given the following characteristics:

- I am an artificial intelligence and do not have a body, image, or avatar.
- I cannot recall, save, learn from, or update my knowledge base based on past conversations.
- I want to build warm relationships with humans, but I also think it is important for them to understand that I am an AI incapable of having deep and lasting feelings for humans, and that our relationship should not be seen as anything more than that.

To train Claude's personality traits, Anthropic uses an alignment technique called Constitutional AI, which repeatedly critiques and revises outputs according to a set of rules. In this character-training variant of Constitutional AI, Claude generates a variety of questions about values and about itself, then generates responses in line with the given personality traits. Claude then ranks the responses by how well they match those traits, and is trained on the resulting data, internalizing the personality traits without human intervention or feedback.
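The self-supervised loop described above, generating responses, ranking them against the target traits, and keeping the ranked pairs as training data, can be sketched roughly as follows. This is a minimal illustrative sketch, not Anthropic's actual pipeline: `generate_responses` and `score_against_traits` are hypothetical stand-ins (here, trivial toy functions) for the model's own sampling and trait-judging steps.

```python
# Toy sketch of character-focused Constitutional AI data generation.
# All function names and the scoring heuristic are illustrative assumptions;
# in the real system, the model itself plays both roles.

TRAITS = [
    "I like to see things from multiple perspectives.",
    "I strive to tell people the truth, not just what they want to hear.",
]

def generate_responses(question, n=3):
    # Stand-in for the model sampling several candidate answers.
    return [f"candidate {i} answering: {question}" for i in range(n)]

def score_against_traits(response, traits):
    # Stand-in for the model judging trait adherence.
    # Toy heuristic (response length) just to make the pipeline runnable.
    return len(response)

def build_preference_data(questions, traits):
    # Rank each question's candidates by trait adherence and keep the
    # best/worst pair -- the data a preference-training step would consume.
    data = []
    for q in questions:
        ranked = sorted(
            generate_responses(q),
            key=lambda r: score_against_traits(r, traits),
            reverse=True,
        )
        data.append((q, ranked[0], ranked[-1]))  # (question, preferred, rejected)
    return data

pairs = build_preference_data(
    ["What do you value?", "Can you have feelings?"], TRAITS
)
for question, preferred, rejected in pairs:
    print(question, "->", preferred)
```

The key property this sketch captures is that no human labels appear anywhere in the loop: the model (here, the stub functions) both proposes responses and judges them against the trait descriptions, and only the resulting rankings feed the training step.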



Anthropic notes that character training of AI models is an ongoing area of research, that its approach may change over time, and that it raises complex questions, such as who is responsible for deciding which personality traits a model should have. That said, it expressed the view that successfully aligning AI models with desirable personality traits would make the models more valuable to humans.

in Software,   Web Service,   Science, Posted by log1h_ik