Meta, the parent company of Facebook, has introduced an AI model that could transform the way people communicate across languages. The technology, known as SeamlessM4T, aims to bridge language barriers by enabling translation and transcription of speech across a wide range of languages.
In a recent blog post, Meta said the SeamlessM4T model combines capabilities that were previously available only in separate models. It can translate between text and speech across 100 languages, and it offers full speech-to-speech translation for 35 languages.
Mark Zuckerberg, the CEO of Meta, welcomed the advances, saying they will enable users around the world to communicate effectively within the metaverse, the connected network of virtual worlds that is a strategic focus for the company's future.
The company’s decision to make the SeamlessM4T model available to the public for non-commercial purposes underscores its commitment to fostering innovation in language technology.
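For developers who want to experiment with the publicly released model, the sketch below shows one way it could be loaded and run. It is a minimal, illustrative example rather than Meta's reference code: it assumes the Hugging Face transformers integration of SeamlessM4T and the facebook/hf-seamless-m4t-medium checkpoint, while Meta's own release is distributed separately through its seamless_communication repository.

```python
# Minimal, illustrative sketch of running SeamlessM4T for text-to-text and
# text-to-speech translation. Assumes the Hugging Face `transformers`
# integration and the `facebook/hf-seamless-m4t-medium` checkpoint (both are
# assumptions; Meta's official release ships via seamless_communication).
from transformers import AutoProcessor, SeamlessM4TModel

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

# Prepare English text input.
inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")

# Text-to-text: generate French text tokens and decode them.
text_tokens = model.generate(**inputs, tgt_lang="fra", generate_speech=False)
print(processor.decode(text_tokens[0].tolist()[0], skip_special_tokens=True))

# Text-to-speech: generate a French speech waveform (a 16 kHz audio tensor).
waveform = model.generate(**inputs, tgt_lang="fra")[0]
```

Speech input follows the same pattern, with 16 kHz audio passed to the processor instead of text.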
Throughout this year, Meta has been releasing AI models to the public in an effort to broaden access to the technology. Among the most notable is the Llama language model, designed to compete with proprietary models from industry giants such as Google and Microsoft-backed OpenAI.
Zuckerberg has argued that an open AI ecosystem works in Meta's favor: the company stands to gain more from collaborative development of user-facing tools for its social platforms than from charging access fees for its models.
However, like its counterparts in the industry, Meta faces legal challenges over the training data used for its AI models. In a lawsuit filed earlier this year against both Meta and OpenAI, comedian Sarah Silverman and two other authors alleged that their copyrighted works were used as training data without permission.
Meta’s researchers revealed that the audio training data for the SeamlessM4T model was drawn from a repository of publicly available web data, encompassing approximately 4 million hours of raw audio. However, the specific source of this repository remains undisclosed.
A Meta spokesperson did not comment on where the audio data came from. The text data, meanwhile, was drawn from datasets compiled last year that included content from Wikipedia and related websites.