
Meta's Chameleon AI Outperforms GPT-4 and Gemini in Mixed-Modal Tasks

03 Jul 2024
5 min read

News Synopsis

The AI boom is in full swing, transforming how we handle monotonous tasks. AI now lets anyone generate customized text, images, audio, and video.

But what if a single AI could move freely between text and images in one response? Meta, Mark Zuckerberg's company, has achieved this with its new multi-modal LLM, Chameleon, which has outperformed competitors like GPT-4 and Gemini on specific tasks. Let's explore this AI in detail.

What is Meta Chameleon?

The Fundamental AI Research (FAIR) team at Meta recently released five new AI models, including a new mixed-modal family called Chameleon (building on its earlier CM3leon model, pronounced "chameleon"). Chameleon can both understand and generate text and images.

This capability sets it apart from other large language models (LLMs), which typically produce a single type of output, such as converting text into speech.

Key Capabilities of Chameleon AI

Unlike traditional LLMs, which predict the next word one at a time using the preceding text as context, Chameleon uses an approach called multi-token prediction, which lets it predict several future tokens simultaneously.
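The difference can be illustrated with a toy sketch (this is not Meta's implementation; the "models" here are stand-in functions). A conventional LM needs one decoding step per token, while a multi-token model emits several tokens from the same context in a single step:

```python
# Toy illustration of multi-token prediction vs. one-token-at-a-time decoding.
# The "models" are stand-in functions, not real neural networks.

def next_token_lm(context, model, steps):
    """Standard decoding: one predicted token per step."""
    out = list(context)
    for _ in range(steps):
        out.append(model(out))  # one prediction per forward pass
    return out

def multi_token_lm(context, model_k, steps, k=4):
    """Multi-token decoding: k predicted tokens per step."""
    out = list(context)
    for _ in range(steps // k):
        out.extend(model_k(out))  # k predictions per forward pass
    return out

# Stand-in predictors: continue an increasing integer sequence.
single = lambda seq: seq[-1] + 1
multi4 = lambda seq: [seq[-1] + i for i in range(1, 5)]

print(next_token_lm([0], single, 8))       # 8 decoding steps
print(multi_token_lm([0], multi4, 8))      # only 2 decoding steps
```

Both calls produce the same 8 new tokens, but the multi-token version does so in a quarter of the decoding steps, which is the efficiency argument behind the technique.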

Furthermore, Meta's AI employs a single token-based representation for both text and images, allowing it to produce mixed-media outputs as well as text-only responses.

Training and Sizes

Meta's Chameleon AI comes in two sizes: Chameleon-7B and Chameleon-34B, with 7 billion and 34 billion parameters, respectively. Both models were pre-trained on more than 4 trillion tokens of mixed text and image data. After pre-training, they were fine-tuned on smaller datasets to ensure proper alignment and safety.
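For scale, a quick back-of-envelope calculation on the figures quoted above (treating the 4 trillion tokens as the budget for each size) gives the ratio of training tokens to parameters:

```python
# Rough tokens-per-parameter ratio for the two published Chameleon sizes,
# using the figures quoted in the article (4 trillion tokens, 7B/34B params).
tokens = 4e12
for name, params in [("Chameleon-7B", 7e9), ("Chameleon-34B", 34e9)]:
    print(f"{name}: ~{tokens / params:.0f} tokens per parameter")
```

Even the larger model sees on the order of a hundred tokens per parameter, well above the roughly 20:1 ratio often cited as compute-optimal, which is typical for models intended to be cheap to run at inference time.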

Performance and Benchmarks

Meta's research paper reveals that Chameleon has shown exceptional performance on several benchmarks, including visual question answering and image captioning tasks.

It outperforms models like Flamingo, Llava-1.5, and IDEFICS in these areas. Additionally, Chameleon is on par with Mixtral 8x7B and Gemini-Pro in common-sense reasoning and reading comprehension.

Human Judgement Tests

Meta also tested Chameleon's output with human judges, comparing it against baselines including GPT-4 and Gemini. In these pairwise comparisons, Chameleon-34B was preferred 60.4% of the time against Gemini-Pro and 51.6% of the time against GPT-4.
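A preference rate like 60.4% comes from aggregating many pairwise human judgments. A minimal sketch, using toy judgments (not Meta's data) and one common convention of splitting ties evenly:

```python
# How a pairwise preference rate is computed from human judgments.
# Toy data, not Meta's; ties are split 50/50, one common convention.
def win_rate(judgments, model="chameleon"):
    score = sum(1.0 if j == model else 0.5 if j == "tie" else 0.0
                for j in judgments)
    return 100 * score / len(judgments)

judgments = ["chameleon"] * 11 + ["tie"] * 4 + ["baseline"] * 5
print(f"{win_rate(judgments):.1f}%")  # (11 + 0.5*4) / 20 = 65.0%
```

A rate above 50% means judges preferred the model's responses more often than the baseline's, which is the sense in which Chameleon-34B "outperformed" the comparison models.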

Availability

Currently, Meta has not officially launched Chameleon AI to the public due to safety concerns. However, a modified version is available on request under a research-only license.

Upcoming AI Events

Notably, India is hosting the Global India AI Summit on July 3 and 4, aimed at promoting ethical and inclusive AI. This event underscores the global interest and investment in advancing AI technologies.

Final Thoughts

Meta's Chameleon AI represents a significant leap in AI technology, demonstrating superior performance in mixed-media tasks. As AI continues to evolve, innovations like Chameleon will play a crucial role in shaping the future of this dynamic field.

Conclusion

The AI industry is fiercely competitive, with startups and tech giants vying to create the best models. Meta's release of five multi-modal LLMs, including Chameleon, marks a significant milestone. Chameleon's unique ability to generate text and images simultaneously sets it apart, making it a pioneering model in the AI landscape.