- SAM 3 introduces image and video segmentation guided by text and visual examples, with a vocabulary of millions of concepts.
- SAM 3D allows you to reconstruct objects, scenes, and human bodies in 3D from a single image, using open models.
- Models can be tested without technical knowledge in the Segment Anything Playground, which offers practical and creative templates.
- Meta releases weights, checkpoints, and new benchmarks so that developers and researchers in Europe and the rest of the world can integrate these capabilities into their projects.
Meta has taken another step in its commitment to artificial intelligence applied to computer vision with the launch of SAM 3 and SAM 3D, two models that expand the Segment Anything family and aim to change the way we work with photos and videos. Far from remaining a laboratory experiment, the company wants these tools to be used by professionals and by users without a technical background alike.
With this new generation, Meta is focusing on improving object detection and segmentation and on bringing three-dimensional reconstruction to a much wider audience. From video editing to product visualization for e-commerce in Spain and the rest of Europe, the company envisions a scenario in which simply describing what you want in words is enough for the AI to do most of the heavy lifting.
What does SAM 3 offer compared to previous versions?
SAM 3 is positioned as the direct evolution of the segmentation models Meta presented in 2023 and 2024, known as SAM 1 and SAM 2. Those early versions focused on identifying which pixels belonged to each object, mainly using visual cues such as points, boxes, or masks, and, in the case of SAM 2, on following objects throughout a video almost in real time.
The key new development is that SAM 3 understands rich and precise text prompts, not just generic labels. Where before only simple terms like "car" or "bus" worked, the new model can respond to much more specific descriptions, such as "yellow school bus" or "red car that is double-parked".
In practice, this means that writing something like "red baseball cap" is enough for the system to locate and separate every element matching that description in an image or video. This ability to refine selections with words is especially useful in professional editing, advertising, or content analysis, where you often need to isolate very specific details.
Furthermore, SAM 3 has been designed to integrate with large multimodal language models. This makes it possible to go beyond simple phrases and use complex instructions such as "people sitting down but not wearing a red cap" or "pedestrians looking at the camera but without a backpack". Instructions of this kind combine conditions and exclusions that until recently were difficult to translate into a computer vision tool.
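As an illustration of what this kind of text-driven workflow could look like in code, here is a minimal sketch. The `Sam3Predictor` class, its `predict` method, and the checkpoint name are assumptions made for the example; the officially released API may differ.

```python
# Hypothetical sketch of text-prompted segmentation; the published SAM 3 API may differ.
# "Sam3Predictor", its methods, and the checkpoint name are illustrative assumptions.
from PIL import Image

def segment_by_phrase(predictor, image_path: str, phrase: str):
    """Return (mask, score) pairs for every object matching a natural-language phrase."""
    image = Image.open(image_path).convert("RGB")
    # A concept prompt such as "red baseball cap" replaces manual points or boxes.
    results = predictor.predict(image=image, text_prompt=phrase)
    return [(r.mask, r.score) for r in results]

# Usage (assuming a predictor built from the released weights):
# predictor = Sam3Predictor.from_checkpoint("sam3_checkpoint.pt")
# masks = segment_by_phrase(predictor, "street.jpg", "yellow school bus")
```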
Performance and scale of the SAM 3 model

Meta also wanted to highlight the less visible but crucial part: the model's technical performance and the scale of its knowledge. According to the company's figures, SAM 3 can process a single image containing more than one hundred detected objects in around 30 milliseconds on an H200 GPU, a speed very close to what demanding workflows require.
For video, the company says the system maintains near real-time performance when tracking around five objects simultaneously, making it viable for tracking and segmenting moving content, from short social media clips to more ambitious production projects.
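For readers who want to check figures like these on their own hardware, the snippet below sketches a simple way to measure average per-image latency. The `predict_fn` callable is a placeholder for whatever inference function is being benchmarked; nothing here depends on the actual SAM 3 code.

```python
# Simple latency measurement for any per-image inference function.
# "predict_fn" is a placeholder for the model call being benchmarked.
import time

def average_latency_ms(predict_fn, images, warmup: int = 3) -> float:
    """Average per-image latency in milliseconds, skipping a few warm-up calls."""
    for img in images[:warmup]:
        predict_fn(img)                      # warm up caches / GPU kernels
    start = time.perf_counter()
    for img in images[warmup:]:
        predict_fn(img)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / max(1, len(images) - warmup)
```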
To achieve this behavior, Meta built a training base covering more than 4 million unique concepts, combining human annotators with AI models that help label large volumes of data. This blend of manual and automated oversight aims to balance accuracy and scale, which is key to ensuring the model responds well to diverse inputs in European, Latin American, and other markets.
The company frames SAM 3 within what it calls the Segment Anything Collection, a family of models, benchmarks, and resources designed to expand AI's visual understanding. The launch is accompanied by a new benchmark for open-vocabulary segmentation, focused on measuring how well the system can understand almost any concept expressed in natural language.
Integration with Edits, Vibes, and other Meta tools

Beyond the technical component, Meta has already begun to integrate SAM 3 into products intended for everyday use. One of the first destinations will be Edits, its video creation and editing application, where the idea is that users can select specific people or objects with a simple text description and apply effects, filters, or changes only to those parts of the footage.
Another avenue for integration is Vibes, within the Meta AI app and the meta.ai platform. In this environment, text-based segmentation will be combined with generative tools to create new editing and creative experiences, such as custom backgrounds, motion effects, or selective photo modifications designed for the social networks that are very popular in Spain and the rest of Europe.
Meta's intention is that these capabilities not be restricted to professional studios but also reach independent creators, small agencies, and advanced users who work with visual content daily. The ability to segment scenes by writing natural-language descriptions lowers the learning curve compared with traditional tools based on manual masks and layers.
At the same time, Meta is keeping an open approach toward external developers, suggesting that third-party applications, from editing tools to video analytics solutions for retail or security, can build on SAM 3 as long as the company's usage policies are respected.
SAM 3D: Three-dimensional reconstruction from a single image

The other big announcement is SAM 3D, a system designed to perform three-dimensional reconstruction from 2D images. Instead of needing multiple captures from different angles, the model aims to generate a reliable 3D representation from a single photo, something especially interesting for anyone without specialized scanning equipment or workflows.
SAM 3D consists of two open-source models with distinct functions: SAM 3D Objects, focused on reconstructing objects and scenes, and SAM 3D Body, geared toward estimating human body shape and pose. This separation allows the system to be adapted to very different use cases, from product catalogs to health or sports applications.
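To give a sense of how a single-image reconstruction pipeline might be wired up, here is a rough sketch. The `load_sam3d_objects` loader and the `reconstruct` call are assumptions for illustration only; the mesh export uses the real `trimesh` library.

```python
# Illustrative single-image 3D reconstruction flow. "load_sam3d_objects" and the
# structure of "result" are assumptions; only the trimesh export is a real API call.
from PIL import Image
import trimesh

def image_to_mesh(model, image_path: str, out_path: str = "object.glb") -> str:
    """Reconstruct a mesh from one photo and save it to disk."""
    image = Image.open(image_path).convert("RGB")
    result = model.reconstruct(image)          # assumed to return vertices and faces
    mesh = trimesh.Trimesh(vertices=result.vertices, faces=result.faces)
    mesh.export(out_path)                      # trimesh handles the .glb export
    return out_path

# Usage (hypothetical loader for the released checkpoints):
# model = load_sam3d_objects("sam3d_objects_checkpoint.pt")
# image_to_mesh(model, "chair.jpg")
```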
According to Meta, SAM 3D Objects sets a new performance benchmark in AI-guided 3D reconstruction, clearly surpassing previous methods on key quality metrics. To evaluate the results more rigorously, the company worked with artists to create SAM 3D Artist Objects, a dataset specifically designed to assess the fidelity and detail of reconstructions across a wide variety of images and objects.
This advance opens the door to practical applications in areas such as robotics, science, sports medicine, and digital creativity. In robotics, for example, it can help systems better understand the volume of the objects they interact with; in medical or sports research, it could help analyze body posture and movement; and in creative design, it serves as a basis for generating 3D models for animation, video games, or immersive experiences.
One of the first visible commercial applications is the "View in Room" feature on Facebook Marketplace, which lets you visualize how a piece of furniture or decorative object would look in a real room before buying it. With SAM 3D, Meta aims to refine these kinds of experiences, which are highly relevant for European e-commerce, where returns caused by unmet expectations represent a growing cost.
Segment Anything Playground: an environment for experimenting

To let the public test these capabilities without installing anything, Meta has launched the Segment Anything Playground, a web platform for uploading images or videos and experimenting with SAM 3 and SAM 3D directly from the browser. The idea is that anyone curious about visual AI can explore what is possible without any programming knowledge.
With SAM 3, the Playground lets you segment objects using short phrases or detailed instructions, combining text and, if desired, visual examples. This simplifies common tasks such as selecting people, cars, animals, or specific elements of a scene and applying targeted actions to them, from aesthetic effects to blurring or background replacement.
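Outside the Playground, the same idea can be reproduced with a few lines of standard image code once a mask is available. The sketch below assumes the binary mask came from a text prompt such as "person in the foreground"; the OpenCV and NumPy calls themselves are real.

```python
# Blur everything except the segmented object, given a binary mask (e.g. from a
# text prompt like "person in the foreground"). Only standard OpenCV/NumPy calls.
import cv2
import numpy as np

def blur_background(image_path: str, mask: np.ndarray, out_path: str = "result.jpg") -> str:
    """Keep the masked region sharp and blur the rest of the frame."""
    image = cv2.imread(image_path)
    blurred = cv2.GaussianBlur(image, (51, 51), 0)                # kernel size must be odd
    mask3 = np.repeat(mask.astype(bool)[:, :, None], 3, axis=2)   # HxW -> HxWx3
    composite = np.where(mask3, image, blurred)
    cv2.imwrite(out_path, composite)
    return out_path
```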
When working with SAM 3D, the platform makes it possible to explore scenes from new perspectives, rearrange objects, apply three-dimensional effects, or generate alternative views. For those working in design, advertising, or 3D content, it offers a quick way to prototype ideas without having to rely on complex technical tools from the outset.
The Playground also includes a series of ready-to-use templates geared toward very specific tasks. They include practical options such as pixelating faces or license plates for privacy reasons, and visual effects like motion trails, selective highlights, or spotlights on areas of interest in a video. These functions can be a particularly good fit for the workflows of digital media and content creators in Spain, where the production of short videos and social media content is constant.
Open resources for developers and researchers

In line with the strategy Meta has followed in other AI releases, the company has decided to release a significant portion of the technical resources associated with SAM 3 and SAM 3D. For the former, the model weights, a new benchmark focused on open-vocabulary segmentation, and a technical report detailing its development have been made public.
For SAM 3D, model checkpoints, inference code, and a new-generation evaluation dataset are available. The dataset includes a considerable variety of images and objects and aims to go beyond traditional 3D benchmarks, offering greater realism and complexity, something that can be very useful for European research groups working on computer vision and graphics.
Meta has also announced collaborations with annotation platforms such as Roboflow, with the goal of enabling developers and companies to bring in their own data and fine-tune SAM 3 for specific needs. This opens the door to sector-specific solutions, from industrial inspection to urban traffic analysis, including cultural heritage projects where it is important to accurately segment architectural or artistic elements, as sketched below.
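What that adaptation step might look like in practice is sketched here as a generic PyTorch training loop over image, prompt, and mask batches. The model's forward signature and the data loader are placeholders; this is not the actual SAM 3 fine-tuning interface, only an illustration of the idea.

```python
# Generic PyTorch fine-tuning loop over (image, text prompt, mask) batches.
# "model" and "train_loader" are placeholders; this is not the actual SAM 3
# fine-tuning interface, only a sketch of the adaptation idea.
import torch
import torch.nn.functional as F

def finetune(model, train_loader, epochs: int = 5, lr: float = 1e-5, device: str = "cuda"):
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for images, prompts, target_masks in train_loader:
            images, target_masks = images.to(device), target_masks.to(device)
            pred_masks = model(images, prompts)             # assumed forward signature
            loss = F.binary_cross_entropy_with_logits(pred_masks, target_masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```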
By opting for a relatively open approach, the company wants the developer ecosystem, universities, and startups, including those operating in Spain and the rest of Europe, to be able to experiment with these technologies, integrate them into their own products and, ultimately, contribute use cases beyond those Meta can develop internally.
With SAM 3 and SAM 3D, Meta aims to consolidate a more flexible and accessible visual AI platform, where text-guided segmentation and 3D reconstruction from a single image are no longer capabilities reserved for highly specialized teams. The potential impact extends from everyday video editing to advanced applications in science, industry, and e-commerce, at a time when the combination of language, computer vision, and creativity is becoming a standard working tool rather than just a technological promise.