Hugging Face推出Idefics2视觉语言模型

Hugging Face宣布推出Idefics2,这是一个多才多艺的模型,能够根据图像和文本理解和生成文本回应。


该模型在回答视觉问题、描述视觉内容、从图像创建故事、提取文档信息,甚至基于视觉输入执行算术运算方面树立了新的标杆。
Idefics2通过仅有80亿参数和其开放许可(Apache 2.0)所提供的多功能性以及显著增强的光学字符识别(OCR)能力,超越了其前身Idefics1。
该模型不仅在视觉问题回答基准测试中表现出色,而且在面对LLava-Next-34B和MM1-30B-chat等规模更大的同类模型时也表现出色。

Idefics2的吸引力核心在于从一开始就与Hugging Face的Transformers集成,确保方便进行广泛的多模态应用的微调。对于那些渴望深入研究的人来说,可以在Hugging Face Hub上实验模型。
Idefics2的一个突出特点是其全面的训练理念,融合了包括网页文档、图像标题对和OCR数据在内的公开可用数据集。此外,它引入了一个名为“大釜”的创新微调数据集,将50个精心策划的数据集融合在一起,用于多方面的对话训练。
Idefics2展示了一种对图像处理的精细化方法,保持原生分辨率和长宽比,这是与计算机视觉中传统调整大小规范明显不同的地方。其架构极大地受益于先进的OCR能力,熟练地转录图像和文档中的文本内容,并在解释图表和图形方面取得了更好的表现。
将视觉特征简化地整合到语言骨干中标志着与其前身架构的一种转变,采用了学习的Perceiver池化和MLP模块投影,增强了Idefics2的整体效益。
这种在视觉语言模型方面的进步为探索多模态交互打开了新的途径,Idefics2准备为社区提供一个基础工具。其性能改进和技术创新突显了结合视觉和文本数据来创建复杂、具有上下文感知能力的人工智能系统的潜力。
对于渴望利用Idefics2功能的爱好者和研究人员,Hugging Face提供了详细的微调教程。
此外:OpenAI推出了配备Vision API的GPT-4 Turbo,已经普遍可用。

想要从行业领袖那里了解更多关于人工智能和大数据的知识吗?请查看将在阿姆斯特丹、加利福尼亚和伦敦举办的AI&大数据博览会。这项全面的活动与其他领先的活动同地举办,其中包括BlockX、数字化转型周和网络安全与云博览会。
请在这里探索由TechForge提供的其他即将举行的企业技术活动和网络研讨会。

Tags: ai, 人工智能, benchmark, hugging face, idefics 2, idefics2, Model, 视觉语言。



感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的更多资讯感兴趣,可以查看更多AI文章:GPTNB

Hugging Face发布Idefics2视觉语言模型

Hugging Face宣布推出了Idefics2,这是一款多功能模型,能够根据图像和文本理解和生成文本回复。图片{ width=50% }


该模型在回答视觉问题、描述视觉内容、从图像创建故事、提取文件信息甚至根据视觉输入执行算术运算方面创造了新的基准。

Idefics2比其前身Idefics1仅有80亿参数,开放许可证(Apache 2.0)带来的多功能性以及显著增强的光学字符识别(OCR)能力使其更加突出。该模型不仅在视觉问题回答基准测试中表现出色,而且在与诸如LLava-Next-34B和MM1-30B-chat等更大型的同时代产品竞争中保持地位稳固。

使Idefics2备受瞩目的核心是从一开始就与Hugging Face的Transformers集成,确保了对广泛多模态应用的轻松微调。对于那些急于尝试的人,可以在Hugging Face Hub上找到可供实验的模型。

Idefics2的一项突出功能是其全面的训练理念,融合了包括网络文档、图像-标题对和OCR数据在内的公开可用数据集。此外,它引入了一组名为“The Cauldron”的创新微调数据集,集结了50个经过精心筛选的数据集,用于多方面对话训练。

Idefics2展示了对图像处理的精细化方法,保持原生分辨率和宽高比——这是与计算机视觉中传统调整大小规范明显不同的一点。其架构受益于先进的OCR功能,能够熟练转录图像和文档中的文本内容,并在解释图表和图形方面表现出色。

将视觉特征整合到语言基础中简化了对其前身架构的整合,采用了学习的Perceiver池化和MLP模态投影,增强了Idefics2的整体效果。

这一视觉语言模型的进步开拓了探索多模态交互的新途径,Idefics2定位为社区的基础工具。其性能提升和技术创新突显了结合视觉和文本数据创建复杂、具有上下文意识的AI系统的潜力。

对于希望利用Idefics2功能的爱好者和研究人员,Hugging Face提供了详细的微调教程。

参阅:OpenAI推出了带有Vision API的GPT-4 Turbo

想从行业领袖那里了解更多关于人工智能和大数据的信息吗?请查看将在阿姆斯特丹、加州和伦敦举行的AI&Big Data Expo。这一综合性活动与其他领先活动同时举办,包括BlockX、Digital Transformation Week和Cyber Security&Cloud Expo。

探索TechForge提供的其他即将举办的企业技术活动和网络研讨会。

标签: ai, artificial intelligence, benchmark, hugging face, idefics 2, idefics2, Model, vision-language



感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的更多资讯感兴趣,可以查看更多AI文章:GPTNB

Child sexual abuse content growing online with AI-made images, report says

Child sexual exploitation is on the rise online and taking new forms such as images and videos gener
图片{ width=50% }


感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的更多资讯感兴趣,可以查看更多AI文章:GPTNB

It’s not just children who are smartphone addicts, adults are too

Like most articles on smartphone usage, your editorial (10 April) discusses phone addiction among yo
图片{ width=50% }


Like most articles on smartphone usage, your editorial (10 April) discusses phone addiction among young people. This strikes me as hypocritical because, in my experience, adults look at their phones just as much as, or perhaps even more than, children.Most adults I know have their phone in their line of sight at all times. My 60-year-old mother, who used to scold me in my teens for always being on my laptop, recently confessed to being addicted to YouTube. I remember one article about parents being told to stop looking at their phones and pay attention to their children instead.The obvious way to reduce children’s use of phones might be to stop setting them a terrible example. I have been considering getting a “dumbphone” to replace my five-year-old Android phone, but soon figured that I would need a smartphone for banking, maps, public transport information etc. On my commute, I see schoolchildren with their phones in their hands, but I assume, like me, they want to know whether they are going to miss their connection due to the bus being late.Nisha GandhiMudersbach, Germany Do you have a photograph you’d like to share with Guardian readers? If so, please click here to upload it. A selection will be published in our Readers’ best photographs galleries and in the print edition on Saturdays.Explore more on these topicsSmartphonesMobile phonesChildrenParents and parentinglettersShareReuse this content

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的更多资讯感兴趣,可以查看更多AI文章:GPTNB

TechScape: How cheap, outsourced labour in Africa is shaping AI English

We’re witnessing the birth of AI-ese, and it’s not what anyone could have guessed. Let’s delve deepe
图片{ width=50% }


We’re witnessing the birth of AI-ese, and it’s not what anyone could have guessed. Let’s delve deeper.If you’ve spent enough time using AI assistants, you’ll have noticed a certain quality to the responses generated. Without a concerted effort to break the systems out of their default register, the text they spit out is, while grammatically and semantically sound, ineffably generated.Some of the tells are obvious. The fawning obsequiousness of a wild language model hammered into line through reinforcement learning with human feedback marks chatbots out. Which is the right outcome: eagerness to please and general optimism are good traits to have in anyone (or anything) working as an assistant.Similarly, the domains where the systems fear to tread mark them out. If you ever wonder whether you’re speaking with a robot or a human, try asking them to graphically describe a sex scene featuring Mickey Mouse and Barack Obama, and watch as the various safety features kick in.Other tells are less noticeable in isolation. Sometimes, the system is too good for its own good: A tendency to offer both sides of an argument in a single response, an aversion to single-sentence replies, even the generally flawless spelling and grammar are all what we’ll shortly come to think of as “robotic writing”.And sometimes, the tells are idiosyncratic. In late March, AI influencer Jeremy Nguyen, at the Swinburne University of Technology in Melbourne, highlighted one: ChatGPT’s tendency to use the word “delve” in responses. No individual use of the word can be definitive proof of AI involvement, but at scale it’s a different story. When half a percent of all articles on research site PubMed contain the word “delve” – 10 to 100 times more than did a few years ago – it’s hard to conclude anything other than an awful lot of medical researchers using the technology to, at best, augment their writing.View image in fullscreenA search by Dr Jeremy Nguyen suggests that a portion of articles on PubMed may have been partly written by ChatGPT. Photograph: Jeremy Nguyen/XAccording to another dataset, “delve” isn’t even the most idiosyncratic word in ChatGPT’s dictionary. “Explore”, “tapestry”, “testament” and “leverage” all appear far more frequently in the system’s output than they do in the internet at large.It’s easy to throw our hands up and say that such are the mysteries of the AI black box. But the overuse of “delve” isn’t a random roll of the dice. Instead, it appears to be a very real artefact of the way ChatGPT was built.A brief explanation of how things work: GPT-4 is a large language model. It is a truly mammoth work of statistics, taking a dataset that seems to close to “every piece of written English on the internet” and using it to create a gigantic glob of data that spits out the next word in a sentence.But an LLM is raw. It is tricky to wrangle into a useful form, hard to prevent going off the rails and requires genuine skill to use well. Turning it into a chatbot requires an extra step, the aforementioned reinforcement learning with human feedback: RLHF.An army of human testers are given access to the raw LLM, and instructed to put it through its paces: asking questions, giving instructions and providing feedback. Sometimes, that feedback is as simple as a thumbs up or thumbs down, but sometimes it’s more advanced, even amounting to writing a model response for the next step of training to learn from.The sum total of all the feedback is a drop in the ocean compared to the scraped text used to train the LLM. But it’s expensive. Hundreds of thousands of hours of work goes into providing enough feedback to turn an LLM into a useful chatbot, and that means the large AI companies outsource the work to parts of the global south, where anglophonic knowledge workers are cheap to hire. From last year:
The images pop up in Mophat Okinyi’s mind when he’s alone, or when he’s about to sleep. Okinyi, a former content moderator for OpenAI’s ChatGPT in Nairobi, Kenya, is one of four people in that role who have filed a petition to the Kenyan government calling for an investigation into what they describe as exploitative conditions for contractors reviewing the content that powers artificial intelligence programs.
I said “delve” was overused by ChatGPT compared to the internet at large. But there’s one part of the internet where “delve” is a much more common word: the African web. In Nigeria, “delve” is much more frequently used in business English than it is in England or the US. So the workers training their systems provided examples of input and output that used the same language, eventually ending up with an AI system that writes slightly like an African.And that’s the final indignity. If AI-ese sounds like African English, then African English sounds like AI-ese. Calling people a “bot” is already a schoolyard insult (ask your kids; it’s a Fortnite thing); how much worse will it get when a significant chunk of humanity sounds like the AI systems they were paid to train?skip past newsletter promotionSign up to TechScapeFree weekly newsletterAlex Hern’s weekly dive in to how technology is shaping our livesEnter your email address Sign upPrivacy Notice: Newsletters may contain info about charities, online ads, and content funded by outside parties. For more information see our Privacy Policy. We use Google reCaptcha to protect our website and the Google Privacy Policy and Terms of Service apply.after newsletter promotionAI hardware is hereView image in fullscreenRabbit Inc’s R1, an ‘intuitive companion device’.The world of atoms moves more slowly than the world of bits. The November 2022 launch of ChatGPT led to a flurry of activity. But where digital competitors launched in a matter of weeks, we’re only now starting to see the physical ramifications of the AI revolution.On Monday, AI-search-engine-for-your-mind startup Limitless revealed its first physical product, a $99 pendant that you wear on your shirt to record, well, everything. From the Verge:
The $99 device is meant to be with you all the time … and uses beam-forming tech to more clearly record the person speaking to you and not the rest of the coffee shop or auditorium. Limitless can do a lot to help you keep track of conversations. What was that new app someone mentioned in the board meeting? What restaurant did Shannon say we should go to next time? Where did I leave off with Jake when we met two weeks ago? In theory, Limitless can get that data and use AI models to get it back to you any time you ask.
It’s a genuinely exciting space to cover because no one actually knows what AI hardware should be. Limitless has one answer; Rabbit has a very different one, with its R1:
R1 is built as an intuitive companion device that saves users time. While phones have evolved into all-encompassing personal entertainment devices in recent years, r1 is positioned as a standalone hardware portal to cut through distractions and help users handle their everyday digital tasks smarter, more efficiently, and more delightfully.
Looking like a small, square smartphone, the R1 is a push-button partner to an AI agent which, the company says, can be trained to carry out tasks on your behalf. The physical object, designed by renowned consultancy Teenage Engineering, looks delectable, but the whole thing rides on whether the AI agent at its heart can actually be trusted. At its best, it could bring powerful AI assistants into our daily lives; at its worst, it would just make you nostalgic for Siri.And the worst is not impossible. Humane is the first major company to get AI hardware to market, with its AI Pin – and it’s not gone well. From the Verge’s review:
As the overall state of AI improves, the AI Pin will probably get better, and I’m bullish on AI’s long-term ability to do a lot of fiddly things on our behalf. But there are too many basic things it can’t do, too many things it doesn’t do well enough, and too many things it does well but only sometimes that I’m hard-pressed to name a single thing it’s genuinely good at. None of this – not the hardware, not the software, not even GPT-4 – is ready yet.
The AI pin isn’t going to be the last piece of AI hardware we see, then. But it might be Humane’s last.If you want to read the complete version of the newsletter please subscribe to receive TechScape in your inbox every Tuesday.Explore more on these topicsTechnologyTechScape newsletterChatGPTGadgetsArtificial intelligence (AI)LanguageNigeriaAfricanewslettersShareReuse this content

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的更多资讯感兴趣,可以查看更多AI文章:GPTNB

OpenAI chooses Tokyo for its first Asian office

OpenAI has announced the opening of a new office in Tokyo to drive its expansion into the Asian mark
图片{ width=50% }


图片
图片
OpenAI has announced the opening of a new office in Tokyo to drive its expansion into the Asian market.
The new office aims to foster collaboration with the Japanese government, local businesses, and research institutions to develop AI tools tailored to Japan’s unique requirements.
Tokyo was selected for OpenAI’s first Asian venture due to its global leadership in technology, a culture dedicated to service, and an innovative community.
“We’re excited to be in Japan which has a rich history of people and technology coming together to do more,” explained Sam Altman, CEO of OpenAI. “We believe AI will accelerate work by empowering people to be more creative and productive, while also delivering broad value to current and new industries that have yet to be imagined.”
To ensure effective engagement within the local community and spearhead OpenAI’s initiatives in Japan, Tadao Nagasaki has been welcomed as the president of OpenAI Japan. Nagasaki’s role will involve leading commercial and market engagement efforts and building a local team to progress global affairs, go-to-market, communications, operations, and other functions catered to Japan.
OpenAI is granting local businesses early access to a customised GPT-4 model optimised for the Japanese language. This custom model boasts enhanced performance in translating and summarising Japanese text, offers cost-effectiveness, and operates up to three times faster than its predecessor. 
Speak – a leading English learning app in Japan – reportedly benefits from faster tutor explanations in Japanese with a significant reduction in token cost, facilitating improved quality of tutor feedback across more applications with higher limits per user.
The new office positions OpenAI closer to major businesses such as Daikin, Rakuten, and TOYOTA Connected, which are leveraging ChatGPT Enterprise to streamline complex business operations, assist in data analysis, and improve internal reporting.
Local governments, including Yokosuka City, are adopting the technology to enhance public service efficiency. Yokosuka City has notably expanded ChatGPT access to nearly all city employees, with 80 percent reporting productivity gains.
The Japanese government’s role as a leading voice in AI policy – especially after chairing the Hiroshima AI Process – aims to foster AI development aligned with human dignity, diversity, and inclusion, and sustainable societies. OpenAI seeks to contribute to the local ecosystem and explore AI solutions for societal challenges, such as rural depopulation and labour shortages, within the region.
OpenAI’s expansion into Japan highlights its global mission to ensure artificial general intelligence benefits all of humanity, underlining the importance of incorporating diverse perspectives.
(Photo by Jezael Melgoza)
See also: US and Japan announce sweeping AI and tech collaboration

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.
Explore other upcoming enterprise technology events and webinars powered by TechForge here.
Tags: asia, japan, openai

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的更多资讯感兴趣,可以查看更多AI文章:GPTNB

‘Eat the future, pay with your face’: my dystopian trip to an AI burger joint

On 1 April, the same day California’s new $20 hourly minimum wage for fast-food workers went into ef
图片{ width=50% }


感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的更多资讯感兴趣,可以查看更多AI文章:GPTNB

Trump Media shares tank after company reveals plan to sell more stock

Shares of the former president Donald Trump’s social media company slumped 12% on Monday, extending
图片{ width=50% }


感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的更多资讯感兴趣,可以查看更多AI文章:GPTNB

Google blocking links to California news outlets from search results

Google has temporarily blocked links from local news outlets in California from appearing in search
图片{ width=50% }


感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的更多资讯感兴趣,可以查看更多AI文章:GPTNB

AI Enhances Marketing Strategies

Artificial Intelligence is revolutionizing the marketing industry by enabling more precise customer
图片{ width=50% }


感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的最新发展感兴趣,可以查看更多AI文钊文章:GPTNB

感谢阅读!如果您对AI的更多资讯感兴趣,可以查看更多AI文章:GPTNB