GPTNB · AI资讯与技术分享站

2024-04-17发表2025-03-21更新 ByteAILib 6 分钟读完 (大约932个字)

Hugging Face宣布推出Idefics2，这是一个多才多艺的模型，能够根据图像和文本理解和生成文本回应。

该模型在回答视觉问题、描述视觉内容、从图像创建故事、提取文档信息，甚至基于视觉输入执行算术运算方面树立了新的标杆。
Idefics2通过仅有80亿参数和其开放许可（Apache 2.0）所提供的多功能性以及显著增强的光学字符识别（OCR）能力，超越了其前身Idefics1。
该模型不仅在视觉问题回答基准测试中表现出色，而且在面对LLava-Next-34B和MM1-30B-chat等规模更大的同类模型时也表现出色。

Idefics2的吸引力核心在于从一开始就与Hugging Face的Transformers集成，确保方便进行广泛的多模态应用的微调。对于那些渴望深入研究的人来说，可以在Hugging Face Hub上实验模型。
Idefics2的一个突出特点是其全面的训练理念，融合了包括网页文档、图像标题对和OCR数据在内的公开可用数据集。此外，它引入了一个名为“大釜”的创新微调数据集，将50个精心策划的数据集融合在一起，用于多方面的对话训练。
Idefics2展示了一种对图像处理的精细化方法，保持原生分辨率和长宽比，这是与计算机视觉中传统调整大小规范明显不同的地方。其架构极大地受益于先进的OCR能力，熟练地转录图像和文档中的文本内容，并在解释图表和图形方面取得了更好的表现。
将视觉特征简化地整合到语言骨干中标志着与其前身架构的一种转变，采用了学习的Perceiver池化和MLP模块投影，增强了Idefics2的整体效益。
这种在视觉语言模型方面的进步为探索多模态交互打开了新的途径，Idefics2准备为社区提供一个基础工具。其性能改进和技术创新突显了结合视觉和文本数据来创建复杂、具有上下文感知能力的人工智能系统的潜力。
对于渴望利用Idefics2功能的爱好者和研究人员，Hugging Face提供了详细的微调教程。
此外：OpenAI推出了配备Vision API的GPT-4 Turbo，已经普遍可用。

想要从行业领袖那里了解更多关于人工智能和大数据的知识吗？请查看将在阿姆斯特丹、加利福尼亚和伦敦举办的AI&大数据博览会。这项全面的活动与其他领先的活动同地举办，其中包括BlockX、数字化转型周和网络安全与云博览会。
请在这里探索由TechForge提供的其他即将举行的企业技术活动和网络研讨会。

Tags: ai, 人工智能, benchmark, hugging face, idefics 2, idefics2, Model, 视觉语言。

感谢阅读！如果您对AI的最新发展感兴趣，可以查看更多AI文钊文章：GPTNB。

感谢阅读！如果您对AI的更多资讯感兴趣，可以查看更多AI文章：GPTNB。

2024-04-17发表2025-03-21更新 ByteAILib 6 分钟读完 (大约900个字)

Hugging Face发布Idefics2视觉语言模型

Hugging Face宣布推出了Idefics2，这是一款多功能模型，能够根据图像和文本理解和生成文本回复。{ width=50% }

该模型在回答视觉问题、描述视觉内容、从图像创建故事、提取文件信息甚至根据视觉输入执行算术运算方面创造了新的基准。

Idefics2比其前身Idefics1仅有80亿参数，开放许可证（Apache 2.0）带来的多功能性以及显著增强的光学字符识别（OCR）能力使其更加突出。该模型不仅在视觉问题回答基准测试中表现出色，而且在与诸如LLava-Next-34B和MM1-30B-chat等更大型的同时代产品竞争中保持地位稳固。

使Idefics2备受瞩目的核心是从一开始就与Hugging Face的Transformers集成，确保了对广泛多模态应用的轻松微调。对于那些急于尝试的人，可以在Hugging Face Hub上找到可供实验的模型。

Idefics2的一项突出功能是其全面的训练理念，融合了包括网络文档、图像-标题对和OCR数据在内的公开可用数据集。此外，它引入了一组名为“The Cauldron”的创新微调数据集，集结了50个经过精心筛选的数据集，用于多方面对话训练。

Idefics2展示了对图像处理的精细化方法，保持原生分辨率和宽高比——这是与计算机视觉中传统调整大小规范明显不同的一点。其架构受益于先进的OCR功能，能够熟练转录图像和文档中的文本内容，并在解释图表和图形方面表现出色。

将视觉特征整合到语言基础中简化了对其前身架构的整合，采用了学习的Perceiver池化和MLP模态投影，增强了Idefics2的整体效果。

这一视觉语言模型的进步开拓了探索多模态交互的新途径，Idefics2定位为社区的基础工具。其性能提升和技术创新突显了结合视觉和文本数据创建复杂、具有上下文意识的AI系统的潜力。

对于希望利用Idefics2功能的爱好者和研究人员，Hugging Face提供了详细的微调教程。

参阅：OpenAI推出了带有Vision API的GPT-4 Turbo

想从行业领袖那里了解更多关于人工智能和大数据的信息吗？请查看将在阿姆斯特丹、加州和伦敦举行的AI＆Big Data Expo。这一综合性活动与其他领先活动同时举办，包括BlockX、Digital Transformation Week和Cyber Security＆Cloud Expo。

探索TechForge提供的其他即将举办的企业技术活动和网络研讨会。

标签: ai, artificial intelligence, benchmark, hugging face, idefics 2, idefics2, Model, vision-language

感谢阅读！如果您对AI的最新发展感兴趣，可以查看更多AI文钊文章：GPTNB。

感谢阅读！如果您对AI的更多资讯感兴趣，可以查看更多AI文章：GPTNB。

2024-04-16发表2025-03-21更新 ByteAILib 6 分钟读完 (大约930个字)

Child sexual abuse content growing online with AI-made images, report says

Child sexual exploitation is on the rise online and taking new forms such as images and videos gener
{ width=50% }

Child sexual exploitation is on the rise online and taking new forms such as images and videos generated by artificial intelligence, according to an annual assessment released on Tuesday by the National Center for Missing & Exploited Children (NCMEC), a US-based clearinghouse for the reporting of child sexual abuse material.‘Pimps’ use Instagram to glorify sexual violence and abuse, investigation findsRead moreReports to the NCMEC of child abuse online rose by more than 12% in 2023 compared with the previous year, surpassing 36.2m reports, the organization said in its annual CyberTipline report. The majority of tips received were related to the circulation of child sexual abuse material (CSAM) such as photos and videos, but there was also an increase in reports of financial sexual extortion, when an online predator lures a child into sending nude images or videos and then demands money.Some children and families were extorted for financial gain by predators using AI-made CSAM, according to the NCMEC.The center received 4,700 reports of images or videos of the sexual exploitation of children made by generative AI, a category it only started tracking in 2023, a spokesperson said.“The NCMEC is deeply concerned about this quickly growing trend, as bad actors can use artificial intelligence to create deepfaked sexually explicit images or videos based on any photograph of a real child or generate CSAM depicting computer-generated children engaged in graphic sexual acts,” the NCMEC report states.“For the children seen in deepfakes and their families, it is devastating.”AI-generated child abuse content also impedes the identification of real child victims, according to the organization.Creating such material is illegal in the United States, as making any visual depictions of minors engaging in sexually explicit conduct is a federal crime, according to a Massachusetts-based prosecutor from the Department of Justice, who spoke on the condition of anonymity.In total in 2023, the CyberTipline received more than 35.9m reports that referred to incidents of suspected CSAM, more than 90% of it uploaded outside the US. Roughly 1.1m reports were referred to police in the US, and 63,892 reports were urgent or involved a child in imminent danger, according to Tuesday’s report.There were 186,000 reports regarding online enticement, up 300% from 2022; enticement is a form of exploitation involving an individual who communicates online with someone believed to be a child with the intent to commit a sexual offense or abduction.The platform that submitted the most cybertips was Facebook, with 17,838,422. Meta’s Instagram made 11,430,007 reports, and its WhatsApp messaging service made 1,389,618. Google sent NCMEC 1,470,958 tips, Snapchat sent 713,055, TikTok sent 590,376 and Twitter reported 597,087.skip past newsletter promotionSign up to First ThingFree daily newsletterOur US morning briefing breaks down the key stories of the day, telling you what’s happening and why it mattersEnter your email address Sign upPrivacy Notice: Newsletters may contain info about charities, online ads, and content funded by outside parties. For more information see our Privacy Policy. We use Google reCaptcha to protect our website and the Google Privacy Policy and Terms of Service apply.after newsletter promotionIn total, 245 companies submitted CyberTipline reports to the NCMEC out of 1,600 companies around the world who have registered their participation with the cybertip reporting program. US-based internet service providers, such as social media platforms, are legally mandated to report instances of CSAM to the CyberTipline when they become aware of them.According to the NCMEC, there is disconnect between the volumes of reporting and the quality of the reports submitted. The center and law enforcement cannot legally take action in response to some of the reports, including ones made by content moderation algorithms, without human input. This technicality can prevent police from seeing reports of potential child abuse.“The relatively low number of reporting companies and the poor quality of many reports marks the continued need for action from Congress and the global tech community,” the NCMEC report states. In the US, call or text the Childhelp abuse hotline on 800-422-4453 or visit their website for more resources and to report child abuse or DM for help. You can also report child sexual exploitation at NCMEC’s CyberTipline. For adult survivors of child abuse, help is available at ascasupport.org. In the UK, the NSPCC offers support to children on 0800 1111, and adults concerned about a child on 0808 800 5000. The National Association for People Abused in Childhood (Napac) offers support for adult survivors on 0808 801 0331. In Australia, children, young adults, parents and teachers can contact the Kids Helpline on 1800 55 1800, or Bravehearts on 1800 272 831, and adult survivors can contact Blue Knot Foundation on 1300 657 380. Other sources of help can be found at Child Helplines International.Explore more on these topicsTechnologySex traffickingChildrenMetaFacebookWhatsAppGooglenewsShareReuse this content

感谢阅读！如果您对AI的最新发展感兴趣，可以查看更多AI文钊文章：GPTNB。

感谢阅读！如果您对AI的更多资讯感兴趣，可以查看更多AI文章：GPTNB。

2024-04-16发表2025-03-21更新 ByteAILib 3 分钟读完 (大约386个字)

It’s not just children who are smartphone addicts, adults are too

Like most articles on smartphone usage, your editorial (10 April) discusses phone addiction among yo
{ width=50% }

感谢阅读！如果您对AI的最新发展感兴趣，可以查看更多AI文钊文章：GPTNB。

感谢阅读！如果您对AI的更多资讯感兴趣，可以查看更多AI文章：GPTNB。

2024-04-16发表2025-03-21更新 ByteAILib 11 分钟读完 (大约1719个字)

TechScape: How cheap, outsourced labour in Africa is shaping AI English

We’re witnessing the birth of AI-ese, and it’s not what anyone could have guessed. Let’s delve deepe
{ width=50% }

We’re witnessing the birth of AI-ese, and it’s not what anyone could have guessed. Let’s delve deeper.If you’ve spent enough time using AI assistants, you’ll have noticed a certain quality to the responses generated. Without a concerted effort to break the systems out of their default register, the text they spit out is, while grammatically and semantically sound, ineffably generated.Some of the tells are obvious. The fawning obsequiousness of a wild language model hammered into line through reinforcement learning with human feedback marks chatbots out. Which is the right outcome: eagerness to please and general optimism are good traits to have in anyone (or anything) working as an assistant.Similarly, the domains where the systems fear to tread mark them out. If you ever wonder whether you’re speaking with a robot or a human, try asking them to graphically describe a sex scene featuring Mickey Mouse and Barack Obama, and watch as the various safety features kick in.Other tells are less noticeable in isolation. Sometimes, the system is too good for its own good: A tendency to offer both sides of an argument in a single response, an aversion to single-sentence replies, even the generally flawless spelling and grammar are all what we’ll shortly come to think of as “robotic writing”.And sometimes, the tells are idiosyncratic. In late March, AI influencer Jeremy Nguyen, at the Swinburne University of Technology in Melbourne, highlighted one: ChatGPT’s tendency to use the word “delve” in responses. No individual use of the word can be definitive proof of AI involvement, but at scale it’s a different story. When half a percent of all articles on research site PubMed contain the word “delve” – 10 to 100 times more than did a few years ago – it’s hard to conclude anything other than an awful lot of medical researchers using the technology to, at best, augment their writing.View image in fullscreenA search by Dr Jeremy Nguyen suggests that a portion of articles on PubMed may have been partly written by ChatGPT. Photograph: Jeremy Nguyen/XAccording to another dataset, “delve” isn’t even the most idiosyncratic word in ChatGPT’s dictionary. “Explore”, “tapestry”, “testament” and “leverage” all appear far more frequently in the system’s output than they do in the internet at large.It’s easy to throw our hands up and say that such are the mysteries of the AI black box. But the overuse of “delve” isn’t a random roll of the dice. Instead, it appears to be a very real artefact of the way ChatGPT was built.A brief explanation of how things work: GPT-4 is a large language model. It is a truly mammoth work of statistics, taking a dataset that seems to close to “every piece of written English on the internet” and using it to create a gigantic glob of data that spits out the next word in a sentence.But an LLM is raw. It is tricky to wrangle into a useful form, hard to prevent going off the rails and requires genuine skill to use well. Turning it into a chatbot requires an extra step, the aforementioned reinforcement learning with human feedback: RLHF.An army of human testers are given access to the raw LLM, and instructed to put it through its paces: asking questions, giving instructions and providing feedback. Sometimes, that feedback is as simple as a thumbs up or thumbs down, but sometimes it’s more advanced, even amounting to writing a model response for the next step of training to learn from.The sum total of all the feedback is a drop in the ocean compared to the scraped text used to train the LLM. But it’s expensive. Hundreds of thousands of hours of work goes into providing enough feedback to turn an LLM into a useful chatbot, and that means the large AI companies outsource the work to parts of the global south, where anglophonic knowledge workers are cheap to hire. From last year:
The images pop up in Mophat Okinyi’s mind when he’s alone, or when he’s about to sleep. Okinyi, a former content moderator for OpenAI’s ChatGPT in Nairobi, Kenya, is one of four people in that role who have filed a petition to the Kenyan government calling for an investigation into what they describe as exploitative conditions for contractors reviewing the content that powers artificial intelligence programs.
I said “delve” was overused by ChatGPT compared to the internet at large. But there’s one part of the internet where “delve” is a much more common word: the African web. In Nigeria, “delve” is much more frequently used in business English than it is in England or the US. So the workers training their systems provided examples of input and output that used the same language, eventually ending up with an AI system that writes slightly like an African.And that’s the final indignity. If AI-ese sounds like African English, then African English sounds like AI-ese. Calling people a “bot” is already a schoolyard insult (ask your kids; it’s a Fortnite thing); how much worse will it get when a significant chunk of humanity sounds like the AI systems they were paid to train?skip past newsletter promotionSign up to TechScapeFree weekly newsletterAlex Hern’s weekly dive in to how technology is shaping our livesEnter your email address Sign upPrivacy Notice: Newsletters may contain info about charities, online ads, and content funded by outside parties. For more information see our Privacy Policy. We use Google reCaptcha to protect our website and the Google Privacy Policy and Terms of Service apply.after newsletter promotionAI hardware is hereView image in fullscreenRabbit Inc’s R1, an ‘intuitive companion device’.The world of atoms moves more slowly than the world of bits. The November 2022 launch of ChatGPT led to a flurry of activity. But where digital competitors launched in a matter of weeks, we’re only now starting to see the physical ramifications of the AI revolution.On Monday, AI-search-engine-for-your-mind startup Limitless revealed its first physical product, a $99 pendant that you wear on your shirt to record, well, everything. From the Verge:
The $99 device is meant to be with you all the time … and uses beam-forming tech to more clearly record the person speaking to you and not the rest of the coffee shop or auditorium. Limitless can do a lot to help you keep track of conversations. What was that new app someone mentioned in the board meeting? What restaurant did Shannon say we should go to next time? Where did I leave off with Jake when we met two weeks ago? In theory, Limitless can get that data and use AI models to get it back to you any time you ask.
It’s a genuinely exciting space to cover because no one actually knows what AI hardware should be. Limitless has one answer; Rabbit has a very different one, with its R1:
R1 is built as an intuitive companion device that saves users time. While phones have evolved into all-encompassing personal entertainment devices in recent years, r1 is positioned as a standalone hardware portal to cut through distractions and help users handle their everyday digital tasks smarter, more efficiently, and more delightfully.
Looking like a small, square smartphone, the R1 is a push-button partner to an AI agent which, the company says, can be trained to carry out tasks on your behalf. The physical object, designed by renowned consultancy Teenage Engineering, looks delectable, but the whole thing rides on whether the AI agent at its heart can actually be trusted. At its best, it could bring powerful AI assistants into our daily lives; at its worst, it would just make you nostalgic for Siri.And the worst is not impossible. Humane is the first major company to get AI hardware to market, with its AI Pin – and it’s not gone well. From the Verge’s review:
As the overall state of AI improves, the AI Pin will probably get better, and I’m bullish on AI’s long-term ability to do a lot of fiddly things on our behalf. But there are too many basic things it can’t do, too many things it doesn’t do well enough, and too many things it does well but only sometimes that I’m hard-pressed to name a single thing it’s genuinely good at. None of this – not the hardware, not the software, not even GPT-4 – is ready yet.
The AI pin isn’t going to be the last piece of AI hardware we see, then. But it might be Humane’s last.If you want to read the complete version of the newsletter please subscribe to receive TechScape in your inbox every Tuesday.Explore more on these topicsTechnologyTechScape newsletterChatGPTGadgetsArtificial intelligence (AI)LanguageNigeriaAfricanewslettersShareReuse this content

感谢阅读！如果您对AI的最新发展感兴趣，可以查看更多AI文钊文章：GPTNB。

感谢阅读！如果您对AI的更多资讯感兴趣，可以查看更多AI文章：GPTNB。

2024-04-16发表2025-03-21更新 ByteAILib 5 分钟读完 (大约687个字)

OpenAI chooses Tokyo for its first Asian office

OpenAI has announced the opening of a new office in Tokyo to drive its expansion into the Asian mark
{ width=50% }

OpenAI has announced the opening of a new office in Tokyo to drive its expansion into the Asian market.
The new office aims to foster collaboration with the Japanese government, local businesses, and research institutions to develop AI tools tailored to Japan’s unique requirements.
Tokyo was selected for OpenAI’s first Asian venture due to its global leadership in technology, a culture dedicated to service, and an innovative community.
“We’re excited to be in Japan which has a rich history of people and technology coming together to do more,” explained Sam Altman, CEO of OpenAI. “We believe AI will accelerate work by empowering people to be more creative and productive, while also delivering broad value to current and new industries that have yet to be imagined.”
To ensure effective engagement within the local community and spearhead OpenAI’s initiatives in Japan, Tadao Nagasaki has been welcomed as the president of OpenAI Japan. Nagasaki’s role will involve leading commercial and market engagement efforts and building a local team to progress global affairs, go-to-market, communications, operations, and other functions catered to Japan.
OpenAI is granting local businesses early access to a customised GPT-4 model optimised for the Japanese language. This custom model boasts enhanced performance in translating and summarising Japanese text, offers cost-effectiveness, and operates up to three times faster than its predecessor.
Speak – a leading English learning app in Japan – reportedly benefits from faster tutor explanations in Japanese with a significant reduction in token cost, facilitating improved quality of tutor feedback across more applications with higher limits per user.
The new office positions OpenAI closer to major businesses such as Daikin, Rakuten, and TOYOTA Connected, which are leveraging ChatGPT Enterprise to streamline complex business operations, assist in data analysis, and improve internal reporting.
Local governments, including Yokosuka City, are adopting the technology to enhance public service efficiency. Yokosuka City has notably expanded ChatGPT access to nearly all city employees, with 80 percent reporting productivity gains.
The Japanese government’s role as a leading voice in AI policy – especially after chairing the Hiroshima AI Process – aims to foster AI development aligned with human dignity, diversity, and inclusion, and sustainable societies. OpenAI seeks to contribute to the local ecosystem and explore AI solutions for societal challenges, such as rural depopulation and labour shortages, within the region.
OpenAI’s expansion into Japan highlights its global mission to ensure artificial general intelligence benefits all of humanity, underlining the importance of incorporating diverse perspectives.
(Photo by Jezael Melgoza)
See also: US and Japan announce sweeping AI and tech collaboration