Meta Releases Llama 3.2 Models With Vision Capability For The First Time

At the Meta Connect 2024 event, Mark Zuckerberg announced the new Llama 3.2 family of models to take on OpenAI's o1 and o1-mini models. Moreover, for the first time, the Llama 3.2 models come with multimodal image support.

Llama 3.2 Models Are Optimized for On-Device Tasks

First of all, the Llama 3.2 family includes two smaller models, Llama 3.2 1B and 3B, for on-device tasks. Meta says these small models are optimized to work on mobile devices and laptops.

Llama 3.2 1B and 3B models are best suited for on-device summarization, instruction following, rewriting, and even function calling to create an action intent locally. Meta also claims that its latest Llama models outperform Google’s Gemma 2 2.6B and Microsoft’s Phi-3.5-mini.
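If you want to try one of the small models locally, here is a minimal sketch of a summarization call using Hugging Face transformers. The model ID, the chat-style pipeline input, and the output format are assumptions based on Meta's Hugging Face releases, not details confirmed in this article.

```python
# Hypothetical sketch: on-device-style summarization with Llama 3.2 1B via
# Hugging Face transformers. The model repo is assumed to be gated and to
# require access approval; adjust the ID if your copy differs.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # assumed model ID
    device_map="auto",
)

article = "Meta announced the Llama 3.2 family at Connect 2024, adding vision models..."
messages = [
    {"role": "system", "content": "Summarize the user's text in one sentence."},
    {"role": "user", "content": article},
]

# Recent transformers versions accept chat messages directly; the pipeline
# returns the full conversation, with the assistant reply as the last message.
result = generator(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```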

(Image: Llama 3.2 stack distribution)

Basically, developers can deploy these models on Qualcomm and MediaTek platforms to power many AI use cases. Meta further says Llama 3.2 1B and 3B models are pruned and distilled from the larger Llama 3.1 8B and 70B models.

Llama 3.2 Models Give Vision to Meta AI

Now, coming to the exciting vision models: they come in larger sizes, Llama 3.2 11B and Llama 3.2 90B. These replace the older text-only Llama 3.1 8B and 70B models. Meta goes on to say that the Llama 3.2 11B and 90B models rival closed models like Anthropic's Claude 3 Haiku and OpenAI's GPT-4o mini in visual reasoning.

These new Llama 3.2 11B and 90B vision models will be available through the Meta AI chatbot on the web, WhatsApp, Instagram, Facebook, and Messenger. Since these are vision models, you can upload images and ask questions about them. For example, you can upload an image of a recipe, and Meta AI can analyze it and give you instructions on how to make it. You can also have Meta AI capture your face and reimagine you in tons of different scenarios and portraits.

The vision models also come in handy when interpreting charts and graphs. On social media apps like Instagram and WhatsApp, the vision models can also generate captions for you.
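For developers who would rather query the open weights directly than go through Meta AI, here is a minimal sketch of asking the 11B vision model a question about a chart image. The model ID, the MllamaForConditionalGeneration class, and the processor usage are assumptions based on the Hugging Face release, not details from this article.

```python
# Hypothetical sketch: image question answering with Llama 3.2 11B Vision via
# Hugging Face transformers. Model ID and class names are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed gated repo
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("chart.png")  # any local image, e.g. a chart or graph
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What trend does this chart show?"},
    ]}
]

# Build the multimodal prompt, pair it with the image, and generate an answer.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```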

Overall, Meta has released multimodal models for the first time under an open-source license. It is going to be pretty exciting to test the vision models against the competition.
