How does AI identify 'cats'?

When humans see a photo of a cat, they can immediately tell that it's a cat. For a computer, however, it is not easy to recognize the essential characteristics of a cat photographed against a different background or from a different angle and conclude that it is a cat. Today's computers use neural networks for this task.
How Can AI ID a Cat? An Illustrated Guide. | Quanta Magazine
https://www.quantamagazine.org/how-can-ai-id-a-cat-an-illustrated-guide-20250430/
Detecting cats with a neural network is an example of what researchers call a 'classification task,' in which the network is given an object (in this case, a photo) and must assign it to the appropriate category (in this case, 'cat' or 'not cat').
Quanta Magazine explains how a classification task works using two fictional regions, 'Triangle Territory' and 'Square State,' as an example.

Suppose you want a neural network that, given a point's latitude and longitude, determines whether the point lies in 'Triangle Territory' or 'Square State.' However, there is no map showing the border; the network is given only a set of points already known to lie in one region or the other.

To build a 'classifier' that can label unknown points, we first need to draw the boundary between the two regions.
Artificial neurons, the building blocks of neural networks, are mathematical functions that take multiple inputs and output a single number.

The output is always a number between 0 and 1, usually close to one end or the other, and it is determined by the input values together with other numbers called parameters. For example, an artificial neuron with two inputs has a 'weight' parameter for each input, which sets how strongly that input affects the output, and a 'bias' parameter, which shifts the output toward 0 or 1 regardless of the inputs. With two inputs, there are two weights and one bias, for a total of three parameters.
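The neuron described above can be sketched in a few lines of Python. This is a minimal illustration, not Quanta Magazine's own code: it assumes the common choice of a sigmoid function to squash the weighted sum into the range 0 to 1, and the parameter values are arbitrary.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x1: float, x2: float, w1: float, w2: float, b: float) -> float:
    """A two-input artificial neuron (assumed sigmoid): weighted sum of inputs plus bias."""
    return sigmoid(w1 * x1 + w2 * x2 + b)

# Three parameters: two weights (w1, w2) and one bias (b), chosen arbitrarily.
print(neuron(1.0, 2.0, w1=0.5, w2=-1.0, b=1.0))  # about 0.3775
```

A larger positive bias pushes every output toward 1, and a larger negative bias pushes it toward 0, which is the "shift" the bias provides.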

The following three figures show how artificial neurons with different parameters draw boundaries. In these figures, the boundaries are all straight lines, and the parameters determine the position and angle of the lines.
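Why is the boundary a straight line? Assuming a sigmoid neuron and taking the boundary to be where the output is exactly 0.5, the boundary is the set of points where the weighted sum is zero: w1*x1 + w2*x2 + b = 0. The sketch below (a hypothetical helper, not from the article) turns the three parameters into the slope (angle) and intercept (position) of that line.

```python
def boundary_line(w1: float, w2: float, b: float) -> tuple[float, float]:
    """Return (slope, intercept) of the line w1*x1 + w2*x2 + b = 0, written as x2 = slope*x1 + intercept.

    On this line the weighted sum is 0, so a sigmoid neuron outputs exactly 0.5.
    Assumes w2 != 0; otherwise the line is vertical (x1 = -b / w1).
    """
    if w2 == 0:
        raise ValueError("vertical boundary: x1 = -b / w1")
    return -w1 / w2, -b / w2

# Three arbitrary parameter sets, each giving a differently placed and angled line.
for params in [(1.0, 1.0, -1.0), (2.0, -1.0, 1.0), (0.5, 1.0, -3.0)]:
    slope, intercept = boundary_line(*params)
    print(params, "->", f"x2 = {slope:.2f} * x1 + {intercept:.2f}")
```

Changing the weights rotates the line; changing only the bias slides it without changing its angle.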

To create a classifier that can decide whether a new point belongs to 'Triangle Territory' or 'Square State,' we need to adjust this boundary so that it matches the actual border between the two regions, and then apply a rule such as: if the output is close to 0, classify the point as 'Square State'; if it is close to 1, classify it as 'Triangle Territory.'
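The decision rule can be written as a threshold on the neuron's output. In this sketch the cutoff of 0.5 and the parameter values are assumptions chosen for illustration; with w1 = -5, w2 = 5 and b = 0 the boundary is the line x2 = x1.

```python
import math

def neuron_output(x1, x2, w1, w2, b):
    """Assumed sigmoid neuron: returns a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))

def classify(x1, x2, w1, w2, b, threshold=0.5):
    """Rule from the text: output near 1 -> Triangle Territory, near 0 -> Square State."""
    if neuron_output(x1, x2, w1, w2, b) >= threshold:
        return "Triangle Territory"
    return "Square State"

# Hypothetical parameters: boundary is the line x2 = x1; the factor 5 pushes outputs closer to 0 or 1.
params = dict(w1=-5.0, w2=5.0, b=0.0)
print(classify(1.0, 3.0, **params))  # point above the line
print(classify(3.0, 1.0, **params))  # point below the line
```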

To move the boundary, the neuron's parameters must be tuned through a process called training. First, the parameters are set to random values; training then compares the neuron's output with the correct answer for each known point and adjusts the parameters to reduce the error.

Every time the artificial neuron gives a wrong answer, an automated algorithm adjusts the weight and bias parameters, moving the boundary line.

Ultimately, we obtain parameters that approximate the correct boundary line.
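The training loop described above can be sketched as follows. This is an illustration under stated assumptions, not the article's method: the "true" border (x2 = 0.5*x1 + 1) and the labeled points are made up, and the update rule is gradient descent on the cross-entropy loss (logistic regression). Unlike a rule that updates only on wrong answers, each step here is scaled by the error, so confidently correct points barely move the line.

```python
import math
import random

def sigmoid(z):
    """Numerically stable sigmoid (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def make_points(n, seed=0):
    """Hypothetical labeled points: 1 (Triangle Territory) if x2 > 0.5*x1 + 1, else 0 (Square State)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x1, x2 = rng.uniform(-5, 5), rng.uniform(-5, 5)
        points.append((x1, x2, 1 if x2 > 0.5 * x1 + 1 else 0))
    return points

def train(points, epochs=100, lr=0.05, seed=1):
    rng = random.Random(seed)
    # Step 1: start from random parameters.
    w1, w2, b = (rng.uniform(-1, 1) for _ in range(3))
    for _ in range(epochs):
        for x1, x2, y in points:
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y  # output minus true value
            # Gradient step for cross-entropy loss: nudge each parameter to shrink the error.
            w1 -= lr * err * x1
            w2 -= lr * err * x2
            b -= lr * err
    return w1, w2, b

def accuracy(points, w1, w2, b):
    """Fraction of points on the correct side of the learned boundary."""
    correct = sum((sigmoid(w1 * x1 + w2 * x2 + b) >= 0.5) == (y == 1) for x1, x2, y in points)
    return correct / len(points)

data = make_points(200)
w1, w2, b = train(data)
print(f"learned boundary: {w1:.2f}*x1 + {w2:.2f}*x2 + {b:.2f} = 0")
print("training accuracy:", accuracy(data, w1, w2, b))
```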

In simple cases where the border is close to a straight line, such as the one between 'Triangle Territory' and 'Square State,' a single artificial neuron works well. For more complex tasks, such as deciding whether an image shows a cat, a single neuron is not enough, so a 'neural network' made of many connected artificial neurons is used instead. Like an individual artificial neuron, a neural network is a mathematical function that takes numbers as input and outputs a new number.
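A classic toy case (swapped in here; it is not from the article) shows why one neuron is not always enough: label the points (0, 1) and (1, 0) as one class and (0, 0) and (1, 1) as the other (the "XOR" pattern). No single straight line separates them, but three connected neurons can. The weights below are hand-picked, hypothetical values, and a sigmoid is assumed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus bias, squashed by a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def tiny_network(x1, x2):
    """Two neurons feed one output neuron (hand-picked, hypothetical weights)."""
    h1 = neuron([x1, x2], [20, 20], -10)    # boundary x1 + x2 = 0.5: high if either input is 1
    h2 = neuron([x1, x2], [-20, -20], 30)   # boundary x1 + x2 = 1.5: high unless both inputs are 1
    return neuron([h1, h2], [20, 20], -30)  # high only when both h1 and h2 are high

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), round(tiny_network(x1, x2)))
```

Each of the first two neurons draws one straight line; the output neuron combines them, so the network as a whole carves out a region no single line could.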

Artificial neurons in a neural network are arranged in groups called layers, where each layer can contain any number of artificial neurons, and the output of each layer serves as the input to the next layer.
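The layered arrangement can be sketched as a loop that feeds each layer's outputs into the next. This assumes fully connected layers (every neuron receives every output of the previous layer) and sigmoid neurons; the layer sizes are hypothetical and the weights are random, so the network is untrained.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_inputs, n_neurons, rng):
    """A layer is a list of neurons; each neuron has one weight per input and one bias."""
    return [([rng.uniform(-1, 1) for _ in range(n_inputs)], rng.uniform(-1, 1))
            for _ in range(n_neurons)]

def forward(layers, inputs):
    """Feed the inputs through each layer in turn; one layer's outputs are the next layer's inputs."""
    values = inputs
    for layer in layers:
        values = [sigmoid(sum(w * v for w, v in zip(weights, values)) + bias)
                  for weights, bias in layer]
    return values

rng = random.Random(0)
# Hypothetical sizes: 2 inputs -> 4 neurons -> 3 neurons -> 1 output neuron.
sizes = [2, 4, 3, 1]
network = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
print(forward(network, [0.3, -1.2]))  # one number between 0 and 1 (untrained, so not meaningful yet)
```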

Larger neural networks have more parameters, which allows them to draw more complex boundaries.
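How quickly the parameter count grows can be checked with a little arithmetic. The sketch assumes fully connected layers, where each neuron has one weight per input it receives plus one bias; the layer sizes are arbitrary examples.

```python
def count_parameters(sizes):
    """Count weights and biases in a fully connected network.

    sizes[0] is the number of inputs; each later entry is the number of neurons in a layer.
    Every neuron has one weight per input it receives plus one bias.
    """
    return sum(n_out * (n_in + 1) for n_in, n_out in zip(sizes, sizes[1:]))

print(count_parameters([2, 1]))         # 3: a single two-input neuron, as in the text
print(count_parameters([2, 4, 3, 1]))   # 31
print(count_parameters([2, 64, 64, 1])) # 4417
```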

Also, while the networks so far have had two inputs, there is no limit to the number of inputs a neural network can have. In the previous example the inputs were latitude and longitude, but they could instead represent things like the grayscale value of a pixel or coordinates in 3D space.

For example, 2,500 input values can represent the grayscale values of every pixel in a 50x50 pixel grid.
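Turning an image into network inputs amounts to laying its pixels out as one long list. The sketch below assumes grayscale values between 0.0 (black) and 1.0 (white) and row-by-row ordering; the image itself is a made-up gradient rather than a real photo.

```python
def flatten(image):
    """Turn a 2-D grid of grayscale values (rows of pixels) into one flat list of inputs, row by row."""
    return [pixel for row in image for pixel in row]

# Hypothetical 50x50 grayscale image: a simple diagonal gradient from black to white.
size = 50
image = [[(r + c) / (2 * (size - 1)) for c in range(size)] for r in range(size)]
inputs = flatten(image)
print(len(inputs))  # 2500 inputs, one per pixel
```

A fully connected first layer reading these inputs would need 2,500 weights per neuron, which is one reason image classifiers end up with so many parameters.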

With enough data points, it's possible to build a large neural network to distinguish between cats and non-cats, Quanta Magazine explained.

in Software, Posted by log1h_ik