AI News Daily 04-10
Today’s Digest
Coze 2.5 empowers Agents with cloud devices and long-term memory; Tencent Lobster Browser enables WeChat remote control of computers.
Tsinghua AutoSOTA breaks 105 SOTAs in a week; Zhiyuan GO-2 embodied model reaches 98.5% success.
OpenAI model conquers five Erdős math problems at once; Microsoft's Chain of Thought compression sees a 15-point accuracy jump.
Microsoft MarkItDown open-sources universal format conversion; Financial large model Kronos and HKU DeepTutor launched concurrently.

Product & Feature Updates
Coze 2.5: Kicking Off the Agent World Era. Coze 2.5 has officially launched, bringing the Agent World era! Following DeerFlow 2.0’s open-sourcing, ByteDance is making big waves again. Agents now get their own independent cloud devices, letting them control phones and computers. A 24/7 workstation is always on standby, long-term memory builds on DeerFlow’s tech, and agents even get a dedicated email address as a digital identity. Talk about a massive leap in collaborative experience! 🚀

Tencent Lobster QBotClaw Browser Agent is Here. Tencent Cloud has dropped the Lobster QBotClaw Browser Agent, which natively integrates AI Agent capabilities. You can remotely control your computer with a WeChat scan – even if it’s locked! Cross-page operations? Done with just a single command. The Mac version is live, with Windows coming soon. Get ready! ⚡️
Google Gemini Unlocks Lyria 3 for Music Creation. Google’s Gemini has unlocked the Lyria 3 model, offering 5 full songs for free every day! Each track can run up to 3 minutes, and the ecosystem has already generated over 100 million songs. This brings the barrier to music creation down to practically zero. How cool is that? 🎶
Mistral Open-Sources Its First Voice Model, Voxtral. Mistral just dropped Voxtral, its first open-source TTS large model! This 4B parameter powerhouse runs on mobile and can clone a voice from just 3 seconds of audio. With a hybrid AR and flow-matching architecture, it boasts first-packet latency as low as 90 milliseconds and supports cross-language transfer for 9 languages. Pretty wild, right? 🎤

Ocean Engine’s PinXingYun Reshapes AI Marketing. Ocean Engine has launched its PinXingYun system, totally reshaping AI marketing. AI can now automatically identify video highlights to seamlessly insert ads. It also generates viral mini-series spin-offs, where brands smoothly blend into the storyline. XiaoXing AI digs deep into viral logic, engaging users in co-creation, leading to absolutely explosive traffic growth. Talk about next-level marketing! 📈

Cutting-Edge Research
Tsinghua AutoSOTA: 105 SOTAs in One Week. Tsinghua has unveiled the AutoSOTA system, delivering an end-to-end closed loop for research. This research agent automates parameter tuning and experiments, smashing 105 top-conference SOTAs in just one week! The system also boosts average performance by 10%, and the paper has already been published. Researchers can finally kick back and let AI do the heavy lifting. 😎

Zhiyuan GO-2 Sets New Embodied AI Benchmark. Zhiyuan Robotics has unleashed its embodied large model GO-2, pioneering an action chain-of-thought mechanism. It rocks an asynchronous dual-system architecture that coordinates fast and slow processing, achieving a whopping 98.5% success rate in benchmark tests. Robots can finally follow instructions and get stuff done! Embodied intelligence is seriously speeding towards practical application. 💪
Tencent MoT Architecture’s 2B Model Sweeps Leaderboards. Tencent Hunyuan has dropped an embodied foundation model with an innovative MoT architecture that’s crushing it! Its 2B parameters snatched 16 best-in-class titles across 22 evaluations. This hybrid architecture obliterates catastrophic forgetting and boasts incredible spatial awareness. The project source code (with 1.2k stars) is now fully open. That’s a mic drop moment! 🎤

Microsoft’s Chain-of-Thought Compression Tech Amazes. Microsoft is stepping up its inference optimization game, following Google’s TurboQuant KV cache compression. Their models now learn to compress chain-of-thought during generation, retaining invisible information via KV cache. This leads to a whopping 15-point jump in accuracy and doubles throughput! Inference resource optimization is definitely the industry’s hot topic right now. 🔥

VGGT-SLAM++ Solves Visual Localization Drift. Scientists have unveiled a new visual localization solution, VGGT-SLAM++, that’s a game-changer. It integrates DINOv2 for semantic enhancement and geometric perception, letting robots finally kiss localization drift goodbye! Large-scale mapping accuracy gets a major boost, pushing visual SLAM into an exciting new phase. ✨
PlaneCycle: Zero-Training 2D to 3D Conversion. PlaneCycle is here, smashing dimensionality barriers! This solution needs zero retraining and no adapters, looping 2D foundation model features across three planes. The open-source code is out, and it’s seriously powerful, boosting performance without any loss. Mind-blowing stuff! 🤯
PRISM Physical Model Achieves SOTA in Dehazing. PRISM is laser-focused on real-world dehazing tasks, introducing the PSAR physical reconstruction framework. It specifically tackles the industry challenge of non-uniform fog, delivering results comparable to commercial-grade solutions. Talk about crystal-clear image restoration! 🌬️
SurFITR Dataset Fights Surveillance Image Forgery. The SurFITR dataset is here to fight back against AI forgery, which is increasingly threatening visual evidence security. This dataset boasts 130,000 manipulated images, generated with fine-tuned precision by multimodal LLMs. It significantly boosts detection and localization capabilities, taking security offense and defense to the next level. Game on! 🛡️
Industry Outlook & Social Impact
ByteDance Becomes an AI Talent Incubator. ByteDance is becoming an unexpected AI talent incubator as core talent keeps leaving in an intensifying wave of resignations. Former employees are even launching companies like AISharing Technology that are hot on ByteDance’s heels. In response, the company has rolled out Doubao stock incentives and is favoring recent graduates for promotions. Management is clearly trying to stem the bleeding with new blood. At the end of the day, young talent’s learning power is the real core asset. 💼
Top Open-Source Projects
Microsoft MarkItDown: The Universal Format Conversion Ninja. Microsoft has open-sourced MarkItDown, an all-in-one Markdown conversion tool that’s a total game-changer! It supports one-click conversion for PDFs, Word docs, audio files, and even YouTube links. Plus, it natively supports the MCP protocol and RAG workflows. You can supercharge its capabilities with rich external plugins. This is a developer’s dream – just one command to install! 🎉
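
For readers who want to try it, here is a minimal sketch of how MarkItDown might slot into a RAG workflow. It assumes the package is installed (`pip install markitdown`) and uses its documented `MarkItDown().convert(...)` call and the `text_content` field of the result. The `convert_to_markdown` and `chunk_markdown` helpers and the 500-character limit are our own illustrative assumptions, not part of the project.

```python
def convert_to_markdown(path):
    """Convert a local file (PDF, Word, etc.) to Markdown text."""
    # Imported lazily so the chunking helper below works without the package.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(path)
    return result.text_content


def chunk_markdown(markdown, max_chars=500):
    """Hypothetical helper: group paragraphs into chunks for a RAG index.

    Paragraphs are never split, so a single paragraph longer than
    max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for paragraph in markdown.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample = "# Title\n\nFirst paragraph.\n\nSecond paragraph."
sample_chunks = chunk_markdown(sample, max_chars=30)
```

In practice you would pass the output of `convert_to_markdown("report.pdf")` (a placeholder file name) to `chunk_markdown` and send the chunks to whatever embedding store you use.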

Kronos: Financial Large Model Makes Its Open-Source Debut. The Kronos financial market language model has officially launched, already grabbing over 2.8k stars in the community! It uses time-series encoding to process financial data, precisely identifying market patterns and optimizing quantitative trading. Get ready, the financial AI toolkit just got a powerful new member! 💰

HKU DeepTutor Crafts Personalized Smart Tutors. HKU has released DeepTutor, a personalized intelligent tutor that’s a game-changer for education. It uses a native Agent architecture to instantly pinpoint learning pain points, empowering a truly personalized educational experience. Finally, a fresh solution for tough educational challenges! 🎓

Superpowers Framework: Stacking Skills for Agents. The Superpowers agent development framework has officially launched! In this “year of the agent,” it provides a complete methodology for tackling complex logic, letting you rapidly stack skills for your Agents. Developer efficiency is set to double, accelerating the shift from writing code to orchestrating Agents. Get ready to build some super agents! 🦸
Karpathy’s Programming Secret Sauce Config Library Goes Live. Following the community buzz around Karpathy’s methodology, this programming experience configuration library has gone live! It precisely fixes the weaknesses of large models writing code, significantly boosting the code output quality for models like Claude. The project has already garnered 450 stars, and deployment is a breeze. Talk about a code-writing superpower! 💻

Social Media Shares
OpenClaw Releases memory-wiki Memory Plugin. OpenClaw has dropped its memory-wiki memory organization plugin, following up on Karpathy’s methodology and building on the previous “Light/Deep/REM” three-stage memory consolidation mechanism. It comes with confidence labeling and health checks built-in, transforming messy Markdown into a structured knowledge base in seconds. Outdated info? Automatically purged! This thing seriously boosts Agent retrieval accuracy. That’s smart! 🧠
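
To make the idea concrete, here is a rough sketch of what confidence labeling, a health check, and purging of outdated notes could look like. This is not OpenClaw’s code or API: the `MemoryEntry` fields, the 90-day age limit, and the 0.3 confidence floor are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class MemoryEntry:
    """One note in a hypothetical structured knowledge base."""
    topic: str
    text: str
    confidence: float  # assumed scale: 0.0 (guess) to 1.0 (verified)
    last_verified: date


def health_check(entries, today, max_age_days=90, min_confidence=0.3):
    """Split entries into (healthy, purged) by age and confidence."""
    cutoff = today - timedelta(days=max_age_days)
    healthy, purged = [], []
    for entry in entries:
        stale = entry.last_verified < cutoff
        weak = entry.confidence < min_confidence
        (purged if stale or weak else healthy).append(entry)
    return healthy, purged


def retrieve(entries, keyword):
    """Return entries mentioning the keyword, most confident first."""
    hits = [e for e in entries if keyword.lower() in e.text.lower()]
    return sorted(hits, key=lambda e: e.confidence, reverse=True)


notes = [
    MemoryEntry("deploy", "Deploy with the blue-green script", 0.9, date(2025, 4, 1)),
    MemoryEntry("deploy", "Deploy by copying files over FTP", 0.8, date(2024, 1, 15)),
    MemoryEntry("deploy", "Deploy may need a manual restart", 0.2, date(2025, 4, 5)),
]
healthy, purged = health_check(notes, today=date(2025, 4, 10))
```

Here the FTP note is purged for being stale and the restart note for low confidence, so a later `retrieve(healthy, "deploy")` only surfaces the entry that is both recent and trusted.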
OpenAI Model Conquers Five Erdős Math Problems in One Shot. An internal OpenAI model just crushed it, solving five Erdős math problems in one go! Mathematicians are raving about the incredibly elegant proofs, especially the stunning answer to Erdős 1091. This marks a monumental leap in logical reasoning, showing that AI’s math prowess is seriously not to be underestimated. Mind blown! 💫

AI News Daily - Multi-Channel
| 💬 WeChat Official Account | 📹 Douyin |
|---|---|
| Official Account: Hexi 2077 | Self-Media Account |

