NVIDIA unveils 'Cosmos 3,' a suite of physical AI foundational models including high-performance image and video generation models.

NVIDIA announced 'Cosmos 3,' a suite of foundational models for physical AI, on June 1, 2026 (Japan time). Cosmos 3 is intended for use in robotics and autonomous driving, and its image generation model 'Cosmos3-Super-Text2Image' and video generation model 'Cosmos3-Super-Image2Video' have achieved the highest performance among open models.
Cosmos 3 — Cosmos Lab
NVIDIA Launches Cosmos 3, the Open Frontier Foundation Model for Physical AI | NVIDIA Newsroom
https://nvidianews.nvidia.com/news/nvidia-launches-cosmos-3-the-open-frontier-foundation-model-for-physical-ai
How Cosmos 3 Helps Physical AI Think Before It Acts | NVIDIA Blog
https://blogs.nvidia.com/blog/cosmos-3-physical-ai-open-world-foundation-model/
As of the time of writing, the following five Cosmos 3 models are publicly available as open models.
Cosmos3-Nano: A multimodal model with 16 billion parameters. Supports input and output of text, images, videos, audio, and motion data.
Cosmos3-Super: A multimodal model with 65 billion parameters. Supports input and output of text, images, videos, audio, and motion data.
Cosmos3-Nano-Policy-DROID: A multimodal model with 16 billion parameters. Enables robot motion control.
Cosmos3-Super-Text2Image: An image generation model with 65 billion parameters. It generates images from text.
Cosmos3-Super-Image2Video: A video generation model with 65 billion parameters. It generates videos from images.
Researchers can use the Cosmos 3 series models to develop robots, self-driving cars, and other devices that operate in the real world.
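To give a rough sense of what the 16 billion and 65 billion parameter counts mean in practice, the sketch below estimates the memory needed just to hold the weights at a few common numeric precisions. The parameter counts come from the list above. The bytes-per-parameter figures are general properties of those formats, and the article does not say which precision the Cosmos 3 checkpoints are distributed in, so treat the results as back-of-the-envelope figures that exclude activations and other runtime overhead.

```python
# Parameter counts as stated in the article.
PARAM_COUNTS = {
    "Cosmos3-Nano": 16_000_000_000,
    "Cosmos3-Super": 65_000_000_000,
}

# Storage cost per parameter for common formats. These are generic values,
# not confirmed details of how the Cosmos 3 checkpoints are published.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, num_params in PARAM_COUNTS.items():
    for dtype in BYTES_PER_PARAM:
        size = weight_memory_gb(num_params, dtype)
        print(f"{name} at {dtype}: about {size:.1f} GB for weights")
```

Under these assumptions, a 16-billion-parameter model needs about 32 GB for weights at 16-bit precision, while a 65-billion-parameter model needs about 130 GB, which is more than a single typical GPU can hold without quantization or splitting the model across devices.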

An example of an image generated with Cosmos3-Super-Text2Image is shown below.

In tests conducted by the third-party organization Artificial Analysis, Cosmos3-Super-Text2Image was rated as the highest-performing open model as of May 28, 2026. Artificial Analysis's tests were conducted in a format where 'humans evaluated the quality of the generated images without knowing the name of the AI,' indicating that Cosmos3-Super-Text2Image was 'evaluated as high-quality by human aesthetic judgment, not by mechanical benchmark tests.'

Even in the ranking that includes closed models, it placed 4th, ahead of Nano Banana Pro.

The Cosmos3-Super-Image2Video video generation model is also considered to have the best performance among open models.

In the ranking that included closed models, it came in 22nd place.

The five models, 'Cosmos3-Nano,' 'Cosmos3-Super,' 'Cosmos3-Nano-Policy-DROID,' 'Cosmos3-Super-Text2Image,' and 'Cosmos3-Super-Image2Video,' are available for download at the following links. Additionally, 'Cosmos3-Edge,' which prioritizes real-time processing, is scheduled to be released soon.
nvidia/Cosmos3-Nano · Hugging Face
https://huggingface.co/nvidia/Cosmos3-Nano
nvidia/Cosmos3-Super · Hugging Face
https://huggingface.co/nvidia/Cosmos3-Super
nvidia/Cosmos3-Nano-Policy-DROID · Hugging Face
https://huggingface.co/nvidia/Cosmos3-Nano-Policy-DROID
nvidia/Cosmos3-Super-Text2Image · Hugging Face
https://huggingface.co/nvidia/Cosmos3-Super-Text2Image
nvidia/Cosmos3-Super-Image2Video · Hugging Face
https://huggingface.co/nvidia/Cosmos3-Super-Image2Video
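For readers who prefer to fetch the weights from a script rather than the browser, here is a minimal sketch using `snapshot_download` from the third-party `huggingface_hub` package. The repository IDs are taken from the links above. The helper names `repo_id`, `repo_url`, and `download_model` are hypothetical, and the sketch assumes the repositories can be downloaded without extra steps. The article does not say whether a login or license acceptance is required.

```python
from pathlib import Path

# Model names as listed in the article; all are hosted under the "nvidia" organization.
MODEL_NAMES = [
    "Cosmos3-Nano",
    "Cosmos3-Super",
    "Cosmos3-Nano-Policy-DROID",
    "Cosmos3-Super-Text2Image",
    "Cosmos3-Super-Image2Video",
]


def repo_id(model_name: str) -> str:
    """Hugging Face repository ID for a Cosmos 3 model (hypothetical helper)."""
    return f"nvidia/{model_name}"


def repo_url(model_name: str) -> str:
    """Browser URL of the model page, matching the links in the article."""
    return f"https://huggingface.co/{repo_id(model_name)}"


def download_model(model_name: str, target_root: str = "models") -> str:
    """Download every file in the repository into target_root/model_name.

    Requires `pip install huggingface_hub`. The import is kept local so the
    helpers above remain usable without the package installed.
    """
    from huggingface_hub import snapshot_download

    local_dir = Path(target_root) / model_name
    return snapshot_download(repo_id=repo_id(model_name), local_dir=str(local_dir))


if __name__ == "__main__":
    for name in MODEL_NAMES:
        print(repo_url(name))
    # Uncomment to start a download; multi-billion-parameter checkpoints are
    # large, so check available disk space first.
    # download_model("Cosmos3-Nano")
```

The download call is left commented out on purpose, since even the smallest model listed has 16 billion parameters.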
While NVIDIA's Cosmos series is developed for applications such as robotics and autonomous vehicles, it is also being used in fields other than physical AI. For example, 'Anima,' which can generate high-quality illustrations, is based on 'Cosmos-Predict2-2B-Text2Image.'
The official version of 'Anima,' an AI image generation tool strong in anime and illustrations, has finally been released. It supports both tags and natural language, and can be easily run locally on any PC that can handle SDXL or Illustrious-type models - GIGAZINE

in AI, Posted by log1o_hf







