Microsoft Phi-4 Multimodal: AI that understands speech, images, and text

Last updated: 27/02/2025

  • Microsoft has launched Phi-4-multimodal, an AI model that processes speech, images, and text simultaneously.
  • With 5.6 billion parameters, it outperforms larger models in speech and vision understanding.
  • It is accompanied by Phi-4-mini, a version focused solely on text-processing tasks.
  • Available on Azure AI Foundry, Hugging Face, and NVIDIA, with a range of applications in business and education.

Microsoft has taken a step forward in language modeling with Phi-4-multimodal, its newest and most advanced AI model, capable of processing text, images, and speech simultaneously. Together with Phi-4-mini, it marks an evolution in the capabilities of small language models (SLMs), offering efficiency and accuracy without requiring a huge number of parameters.

The arrival of Phi-4-multimodal is not only a technical advance for Microsoft; it also competes directly with larger models such as those from Google and Anthropic. Its optimized architecture and improved reasoning capabilities make it an attractive option for many applications, from machine translation to image and speech recognition.

What is Phi-4-multimodal and how does it work?

Phi-4-multimodal is an AI model developed by Microsoft that can process text, images, and speech simultaneously. Unlike traditional models that work with a single modality, it integrates different input sources into a single representation space, thanks to the use of cross-modal learning techniques.

The model is built on a 5.6-billion-parameter architecture and uses a technique known as LoRAs (Low-Rank Adaptation adapters) to integrate the different types of data. This allows for greater accuracy in language processing and a deeper interpretation of context.
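
To make the low-rank idea concrete, here is a minimal sketch in plain Python. It is not Microsoft's implementation. It assumes the standard LoRA formulation, y = W x + (alpha / r) * B (A x), where W is a frozen base weight and A and B are small trainable matrices of rank r. The per-modality adapter dictionary, the dimensions, and all function names are hypothetical and chosen only for illustration.

```python
import random

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """Compute y = W x + (alpha / r) * B (A x), where r is the adapter rank."""
    r = len(A)                         # A has shape (r, d_in)
    base = matvec(W, x)                # frozen base projection
    update = matvec(B, matvec(A, x))   # low-rank correction; B has shape (d_out, r)
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def make_adapter(d_in, d_out, r, rng):
    """Hypothetical init: small random A, zero B, so the adapter starts as a no-op."""
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    return A, B

rng = random.Random(0)
d_in, d_out, r = 8, 6, 2
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]

# One adapter per modality (illustrative names, not taken from the model).
adapters = {m: make_adapter(d_in, d_out, r, rng) for m in ("vision", "speech")}

x = [1.0] * d_in
A, B = adapters["speech"]
y_base = matvec(W, x)
y_speech = lora_forward(W, A, B, x, alpha=4.0)

full_params = d_in * d_out            # 48 weights in the frozen layer
adapter_params = r * (d_in + d_out)   # 28 weights in one adapter
print(y_speech == y_base, full_params, adapter_params)
```

At this toy size the saving is small, but it grows quickly with layer width: for a 4096 x 4096 layer and rank 16, one adapter would hold 131,072 weights against 16,777,216 in the base layer. That is the general reason adapters are a cheap way to specialize a shared backbone for each input type.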

Key capabilities and advantages

Phi-4-multimodal is particularly effective at key tasks that demand a high level of artificial intelligence:

  • Speech recognition: It outperforms specialized models such as WhisperV3 in transcription and machine translation tests.
  • Image processing: It can interpret documents and charts and perform OCR with high accuracy.
  • Low latency: This allows it to run on mobile and low-power devices without sacrificing performance.
  • Seamless integration across modalities: The ability to understand text, speech, and images together improves its contextual reasoning.
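
As a rough check on the claim about low-power devices, the arithmetic below estimates how much memory the weights alone would occupy at several numeric precisions. Only the 5.6 billion parameter count comes from the article. The precision options are assumptions, since the article does not say how the model is stored or quantized, and the estimate ignores activations and any runtime overhead.

```python
# Parameter count stated in the article.
PARAMS = 5_600_000_000

# Assumed storage formats (not stated in the article) and their bytes per weight.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params, bytes_per_param):
    """Memory for the weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

estimates = {name: weight_memory_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}

for name, gb in estimates.items():
    print(f"{name}: {gb:.1f} GB")
```

Under these assumptions, 16-bit weights come to about 11.2 GB and 4-bit weights to about 2.8 GB, which gives a sense of why precision matters when targeting phones and other constrained hardware.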

Comparison with other models

In terms of performance, Phi-4-multimodal has proven to be on par with larger models. Compared with Gemini-2.0-Flash-lite and Claude-3.5-Sonnet, it achieves similar results on multimodal tasks while remaining efficient thanks to its compact design.

However, it shows some limitations in speech-based question answering, where models such as GPT-4o and Gemini-2.0-Flash have the edge. This is due to its smaller size, which affects how much factual knowledge it retains. Microsoft has indicated that it is working to improve this capability in future versions.

Phi-4-mini: the little brother of Phi-4-multimodal

Alongside Phi-4-multimodal, Microsoft also launched Phi-4-mini, a variant optimized for text-based tasks. It is designed to deliver high efficiency in natural language processing, making it well suited to chatbots, virtual assistants, and other applications that require accurate text understanding and generation.

Availability and applications

Microsoft has made Phi-4-multimodal and Phi-4-mini available to developers through Azure AI Foundry, Hugging Face, and the NVIDIA API Catalog. Any company or user with access to these platforms can start experimenting with the models and apply them in different scenarios.

Given its multimodal design, Phi-4 is aimed at areas such as:

  • Machine translation and real-time subtitling.
  • Document recognition and analysis for businesses.
  • Mobile applications with intelligent assistants.
  • Educational models to improve AI-based teaching.

Microsoft has taken an interesting turn with these models by focusing on efficiency and scalability. With competition growing in the field of small language models (SLMs), Phi-4-multimodal is presented as a viable alternative to larger models, offering a balance between performance and processing capacity that remains accessible even on less powerful devices.