
Meta Unveils SAM 2.1, Spirit LM, and New AI Tools for Open Science

[Image: Illustration of Meta's AI research releases: SAM 2.1 segmenting objects in images and video, Spirit LM pairing a speech waveform with text, a neural network with skipped layers for Layer Skip, digital locks for cryptographic security, and molecular structures for Meta Open Materials 2024. Image Source: ChatGPT-4o]


Meta's Fundamental AI Research (FAIR) team continues to lead the way in AI innovation, with a strong focus on achieving Advanced Machine Intelligence (AMI). The latest release of research artifacts aims to advance machine learning while supporting open science and reproducibility. Key highlights include new tools for image segmentation, language modeling, and cryptographic security, designed to accelerate progress in fields like AI development, materials discovery, and large language model (LLM) optimization.

Mark Zuckerberg recently emphasized the role of open-source AI in driving progress, saying it "has more potential than any other modern technology to increase human productivity, creativity, and quality of life."

Key Releases: SAM 2.1 and Spirit LM

Meta Segment Anything Model 2.1 (SAM 2.1)

SAM 2.1, an updated version of the widely adopted Segment Anything Model 2, offers stronger performance and improved handling of visually similar and small objects. Meta reports widespread adoption of SAM 2 across fields from medical image analysis to meteorology and beyond. Enhancements include new data augmentation techniques and refined object pointer memory, which have boosted the model's accuracy in segmenting objects in both images and videos. Meta has also introduced a SAM 2 Developer Suite, providing open-source code for model training and for the web demo, so researchers can fine-tune SAM 2 on their own data.
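For a sense of how that looks in practice, here is a minimal sketch of point-prompted image segmentation using the predictor class documented in the facebookresearch/sam2 repository; the checkpoint ID and exact call signatures are assumptions that may differ across releases.

```python
# Minimal sketch: point-prompted segmentation with SAM 2.
# Based on usage shown in the facebookresearch/sam2 README; the
# checkpoint ID and call signatures are assumptions and may change.
import numpy as np
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2.1-hiera-large")
image = np.array(Image.open("photo.jpg").convert("RGB"))

with torch.inference_mode():
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),  # one (x, y) foreground click
        point_labels=np.array([1]),           # 1 = foreground, 0 = background
        multimask_output=True,                # return several candidate masks
    )

best_mask = masks[np.argmax(scores)]  # keep the highest-scoring candidate
```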

Meta Spirit LM

Meta's first open-source multimodal language model, Spirit LM, integrates text and speech for seamless cross-modality generation. It uses a word-level interleaving method to deliver expressive, natural-sounding speech. Two versions, Spirit LM Base and Spirit LM Expressive, cater to different speech generation needs.

Spirit LM Base models speech using phonetic tokens that represent basic sound structures. Spirit LM Expressive goes further, adding pitch and style tokens that capture nuances of tone, such as excitement, anger, or surprise. This lets the model generate speech that conveys not just the words but also the intended emotional expression. Beyond more natural-sounding generation, Spirit LM has the flexibility to learn new tasks across modalities, including automatic speech recognition, text-to-speech conversion, and speech classification.
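The word-level interleaving idea can be pictured with a short, purely illustrative sketch; the tokenizer function and special tokens below are hypothetical stand-ins, not Spirit LM's actual interface.

```python
# Purely illustrative sketch of word-level speech/text interleaving.
# The special tokens and tokenizer function here are hypothetical;
# they are not Spirit LM's actual interface.
from typing import List, Set

def hypothetical_speech_tokens(word: str) -> List[str]:
    # Stand-in for a real speech tokenizer (phonetic units for Spirit LM
    # Base; pitch/style units would be added for Spirit LM Expressive).
    return [f"<unit:{ord(c) % 100}>" for c in word]

def interleave(words: List[str], speech_spans: Set[int]) -> List[str]:
    """Build one training sequence that alternates between text tokens
    and speech tokens at word boundaries."""
    sequence: List[str] = []
    for i, word in enumerate(words):
        if i in speech_spans:
            sequence.append("[SPEECH]")
            sequence.extend(hypothetical_speech_tokens(word))
        else:
            sequence.append("[TEXT]")
            sequence.append(word)
    return sequence

# Word 1 ("cat") is rendered as speech units, the rest as text.
print(interleave(["the", "cat", "sat"], speech_spans={1}))
```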

Layer Skip: Enhancing Large Language Model Performance

Layer Skip accelerates LLM generation by exiting early at intermediate layers to draft tokens, then verifying those drafts with the remaining layers. This reduces the computational burden, enabling faster decoding without specialized hardware. Meta has released checkpoints of several models optimized with this technique, including Llama 3, Llama 2, and Code Llama, reporting performance improvements of up to 1.7x. These releases open up new possibilities for research in model optimization and interpretability.
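A conceptual sketch of the draft-then-verify loop follows, assuming a decoder whose layers can be run in segments; the model interface is hypothetical, not Meta's released implementation.

```python
# Conceptual sketch of Layer Skip-style self-speculative decoding.
# `model.run_layers`, `model.lm_head`, and `model.num_layers` are
# hypothetical interfaces; batch size 1 is assumed throughout.
import torch

def generate_step(model, tokens, exit_layer=8, n_draft=4):
    """Draft tokens cheaply with the first `exit_layer` layers, then
    verify all drafts at once with the full layer stack."""
    draft = tokens
    for _ in range(n_draft):
        # Early exit: only the first `exit_layer` layers run.
        hidden = model.run_layers(draft, start=0, end=exit_layer)
        next_tok = model.lm_head(hidden[:, -1]).argmax(-1, keepdim=True)
        draft = torch.cat([draft, next_tok], dim=-1)

    # Verification: one full forward pass scores every drafted position.
    full_hidden = model.run_layers(draft, start=0, end=model.num_layers)
    full_pred = model.lm_head(full_hidden).argmax(-1)

    # Greedy acceptance: keep drafted tokens until the first one the
    # full model disagrees with, then substitute the full model's token.
    out = tokens
    for pos in range(tokens.size(1), draft.size(1)):
        if draft[0, pos] == full_pred[0, pos - 1]:
            out = torch.cat([out, draft[:, pos:pos + 1]], dim=-1)
        else:
            out = torch.cat([out, full_pred[:, pos - 1:pos]], dim=-1)
            break
    return out
```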

SALSA: Strengthening Post-Quantum Cryptography Security

As cryptographic standards evolve, staying ahead of potential security threats is crucial, especially with the rise of quantum computing. Meta’s SALSA (Sparse Learning with AI for Secure Algorithms) is an advanced tool designed to validate the security of post-quantum cryptography (PQC) systems, which are increasingly critical in securing data against quantum attacks.

One of the most widely adopted approaches to post-quantum cryptography is lattice-based cryptography, specifically constructions built on the Learning with Errors (LWE) problem, which is foundational to several cryptosystems approved by the National Institute of Standards and Technology (NIST). LWE's security rests on the assumed hardness of recovering a secret vector from noisy linear equations. The NIST-approved Kyber cryptosystem, which SALSA targets, is one such lattice-based scheme designed to withstand quantum-level attacks.
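To make this concrete, an LWE instance publishes a random matrix A and a vector b computed as

b = A·s + e (mod q)

where s is the secret vector and e is a small random error term. Recovering s from (A, b) is believed to be hard even for quantum computers; a "sparse" secret is an s with only a few nonzero entries.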

Meta’s SALSA method introduces the first AI-based attacks on this type of cryptography, focusing on cracking sparse secrets within the Kyber system. SALSA's key innovation lies in its ability to attack sparse secrets more efficiently than traditional methods, making it a vital tool in assessing the robustness of cryptographic standards like Kyber. Although it currently targets sparse secrets, SALSA is continually evolving, and future iterations could extend to attacking more general secrets within lattice-based encryption schemes.

Meta Lingua and Open Materials: Accelerating Research

Meta Lingua is a modular, efficient codebase created to streamline the training of large language models. With a focus on ease of use and flexibility, Lingua enables quick experimentation and reproducible research, letting AI researchers concentrate on innovation rather than complex setup. Meta says it made intentional design choices to keep the code modular, self-contained, and highly efficient, leveraging key PyTorch features to balance flexibility and performance while keeping installation and maintenance straightforward.
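The flavor of such a codebase can be suggested with a tiny, self-contained training step in plain PyTorch; this is illustrative of the style, not Meta Lingua's actual code.

```python
# Illustrative of the self-contained, plain-PyTorch style such a
# codebase favors; this is NOT Meta Lingua's actual code.
import torch
from torch import nn

model = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def train_step(batch: torch.Tensor, target: torch.Tensor) -> float:
    """One gradient step: forward pass, loss, backward pass, update."""
    optimizer.zero_grad(set_to_none=True)
    out = model(batch)
    loss = nn.functional.mse_loss(out, target)
    loss.backward()
    optimizer.step()
    return loss.item()

x = torch.randn(8, 16, 256)  # (batch, sequence, dim) dummy data
print(train_step(x, torch.randn_like(x)))
```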

Meta Open Materials 2024

Discovering new materials is essential for advancements in technology, but the process can take decades. Meta is aiming to revolutionize this field with Meta Open Materials 2024, a massive open-source dataset and model release that could greatly accelerate the timeline for materials discovery.

Meta’s Open Materials 2024 release is a significant step forward for inorganic materials discovery. The dataset and accompanying models are designed to accelerate breakthroughs in materials science by providing open-source tools for researchers. Models trained on the dataset's 100 million training examples rank at the top of the Matbench-Discovery leaderboard, making the release one of the most competitive open-source options in the field.

Meta Open Materials 2024 addresses the challenge of closed, proprietary models that often dominate materials discovery. By offering an open-source alternative, Meta aims to foster collaboration within the AI and materials science communities. These resources empower researchers to conduct open, reproducible research, helping bridge the gap between proprietary and open-source models.

MEXMA: Improving Sentence Representations

Meta’s latest release, MEXMA, introduces a novel approach to improving cross-lingual sentence encoding. MEXMA stands out from previous methods by incorporating both token-level and sentence-level objectives during its training process, significantly enhancing the performance of sentence encoders across languages.

Previous sentence encoders primarily relied on sentence-level objectives, which limited their ability to fully capture the nuances of multilingual text. MEXMA overcomes this limitation by updating its encoder through token-level objectives as well, allowing for more precise and aligned sentence representations across different languages. The inclusion of token-level learning improves the overall accuracy of sentence encodings, resulting in better performance on multilingual tasks.
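One way to picture combining the two objectives is as a weighted sum of a sentence-alignment loss and a token-prediction loss; the sketch below is illustrative of that idea, not MEXMA's exact formulation.

```python
# Illustrative sketch of combining sentence-level and token-level
# objectives in one training loss; not MEXMA's exact formulation.
import torch
import torch.nn.functional as F

def combined_loss(sent_a, sent_b, token_logits, token_targets, alpha=0.5):
    """sent_a/sent_b: sentence embeddings of a translation pair;
    token_logits/token_targets: per-token predictions (e.g., for masked
    tokens) that force the encoder to keep token-level information."""
    # Sentence-level: pull translation pairs together (cosine alignment).
    sent_loss = 1 - F.cosine_similarity(sent_a, sent_b, dim=-1).mean()
    # Token-level: standard cross-entropy over token predictions.
    tok_loss = F.cross_entropy(
        token_logits.flatten(0, 1), token_targets.flatten()
    )
    return alpha * sent_loss + (1 - alpha) * tok_loss

loss = combined_loss(
    torch.randn(4, 768), torch.randn(4, 768),       # sentence embeddings
    torch.randn(4, 12, 32000),                      # token logits
    torch.randint(0, 32000, (4, 12)),               # token targets
)
print(loss.item())
```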

Covering 80 languages, MEXMA ensures that its sentence representations are aligned across all supported languages. This alignment is crucial for cross-lingual tasks such as sentence classification and translation, where consistency and accuracy across different linguistic contexts are essential.

Meta is also releasing a codebase alongside MEXMA to support further research in cross-lingual models. Researchers can build on this work to explore new avenues in natural language processing (NLP), benefiting from MEXMA’s enhanced sentence representations for tasks like classification and translation. Meta hopes that MEXMA will advance research in NLP by enabling more accurate and scalable cross-lingual models.

Looking Ahead: Open Science for AI Progress

With these releases, Meta continues to expand the boundaries of what is possible in AI, demonstrating its commitment to fostering an open AI ecosystem. By making these tools and research openly available, Meta encourages collaboration and innovation across the global AI research community. The company hopes that the latest artifacts will inspire others to build upon these advancements, contributing to the development of responsible, advanced machine intelligence.

For more details on the research, or to download the code, please visit Meta's blog.