They Built an “Unbannable Doubao Phone” and Have Already Secured Nearly Ten Million Yuan in Angel Investment

“Help me order a milk tea.”

“Help me buy a basketball on JD.com.”

“Help me purchase a movie ticket on Maoyan.”

The battle for control between Doubao Mobile and various apps is still unresolved. Qianwen’s food-delivery ordering is deeply integrated only within its own ecosystem. Even the viral “crayfish” (OpenClaw) has not solved the challenge of cross-device automation.

Recently, Zhang Zhiyong and Shan Wenbang, two engineers from a major hardware company, put their self-developed agent, ZeroFlow, to work on this problem. Running on a domestically developed large model with multimodal capabilities, they achieved perfect multimodal control on Android devices, Chrome browsers, and PC desktops. ZeroFlow can look at the screen, click, swipe, type, and complete a series of complex cross-device automation tasks just like a human.

Different Technical Approaches

In pursuing cross-device automation, Doubao Mobile and Zhipu’s AutoGLM have taken completely different technical routes.

Doubao Mobile, through cooperation with phone manufacturers, obtained very high permissions, allowing it to bypass user or app authorization steps. However, this also triggered resistance from app developers, leading to subsequent bans.

Zhipu’s open-source AutoGLM relies on ADB (Android Debug Bridge) permissions. Because this mode cannot run directly on a user’s phone, AutoGLM uses a remote virtual machine and operates the user’s phone via ADB from inside it. This approach carries a higher trust cost.
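
To make the ADB route concrete, here is a minimal sketch of the kind of commands such a controller issues. It only builds the argument lists and does not run them. The `adb shell input tap`, `adb shell input swipe`, and `adb exec-out screencap -p` commands are standard ADB usage; the helper names and the coordinates are illustrative assumptions, not AutoGLM’s actual code.

```python
import shlex
from typing import List, Optional


def adb_command(args: List[str], serial: Optional[str] = None) -> List[str]:
    """Build an adb argument list, optionally targeting one device by serial."""
    prefix = ["adb"] + (["-s", serial] if serial else [])
    return prefix + args


def tap(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """Simulate a tap at screen coordinates (x, y)."""
    return adb_command(["shell", "input", "tap", str(x), str(y)], serial)


def swipe(x1: int, y1: int, x2: int, y2: int,
          duration_ms: int = 300, serial: Optional[str] = None) -> List[str]:
    """Simulate a swipe from (x1, y1) to (x2, y2) over duration_ms."""
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return adb_command(["shell", "input", "swipe"] + coords, serial)


def screenshot(serial: Optional[str] = None) -> List[str]:
    """Capture the screen as PNG bytes written to adb's stdout."""
    return adb_command(["exec-out", "screencap", "-p"], serial)


# Print the commands instead of executing them (values are made up).
for cmd in (tap(540, 1200), swipe(540, 1600, 540, 400), screenshot("emulator-5554")):
    print(shlex.join(cmd))
```

Passing one of these lists to `subprocess.run` on a machine with a connected, authorized device would execute it, which is exactly why the machine holding the ADB connection has to be trusted.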

ZeroFlow’s core solution relies on Android’s Accessibility Service, originally designed as a system-level assistive feature for visually impaired users. Once granted this permission, the agent can read screen content and access all text, buttons, input fields, and their positions. It can also simulate human operations such as clicking, long-pressing, swiping, and typing. Because the agent operates the phone the way a person would, this approach depends heavily on the multimodal capabilities of the agent and its underlying model, and it is theoretically impossible for app developers to ban.
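
The idea of reading a screen as a tree of nodes and turning one node into a tap can be sketched as follows. This is a simplified Python model, not Android code and not ZeroFlow’s implementation: `UiNode` is a hypothetical stand-in for what Android exposes through `AccessibilityNodeInfo` (text, clickability, on-screen bounds), and the sample screen is invented.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class UiNode:
    """Hypothetical stand-in for one node of the accessibility tree."""
    text: str = ""
    clickable: bool = False
    bounds: Tuple[int, int, int, int] = (0, 0, 0, 0)  # left, top, right, bottom
    children: List["UiNode"] = field(default_factory=list)


def walk(node: UiNode) -> Iterator[UiNode]:
    """Depth-first traversal of the node tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_clickable(root: UiNode, label: str) -> Optional[UiNode]:
    """Return the first clickable node whose text contains the label."""
    wanted = label.lower()
    for node in walk(root):
        if node.clickable and wanted in node.text.lower():
            return node
    return None


def tap_point(node: UiNode) -> Tuple[int, int]:
    """Center of the node's bounds, where a simulated tap would land."""
    left, top, right, bottom = node.bounds
    return ((left + right) // 2, (top + bottom) // 2)


# Invented screen: a product label and an order button.
screen = UiNode(children=[
    UiNode(text="Milk tea, large", bounds=(0, 200, 1080, 400)),
    UiNode(text="Order now", clickable=True, bounds=(700, 420, 1040, 520)),
])

target = find_clickable(screen, "order now")
```

On a real device the service would then perform the click on the matching node or dispatch a gesture at that point; here `tap_point(target)` simply yields the coordinates.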

The principle sounds simple, but actual development is far more complex. Zhang Zhiyong told investors that one of the biggest difficulties is that many domestic web pages are built from the outset with verification steps and hidden engineering measures to prevent automation (essentially anti-scraping and anti-bot measures). For example, a button might appear in one place while its real underlying element is located somewhere else entirely. This makes understanding web pages from their code very difficult, whereas from a multimodal perspective it is much easier. It is also the fundamental reason some large models cannot read web links but can read webpage screenshots.

On the other hand, getting the agent to understand the correct intent from the fewest possible screenshots is also an engineering optimization challenge.

Shan Wenbang told investors that ads and automatic redirects on web pages can interfere with multimodal understanding. Using the most powerful multimodal models can give the most accurate answers, but the token cost might be unaffordable for ordinary users. How to use cheaper models, capture the fewest images, and achieve the best understanding results is a very challenging engineering problem.
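
One common way to frame this trade-off is a cascade: ask the cheapest model first and escalate only when it is not confident. The sketch below illustrates that general pattern under stated assumptions. The `Model` class, the prices, the confidence scores, and the 0.8 threshold are all hypothetical, and nothing here is claimed to be ZeroFlow’s actual strategy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    name: str
    cost_per_image: float  # hypothetical price units per screenshot
    answer: Callable[[bytes], Tuple[str, float]]  # returns (action, confidence)


def decide(screenshot: bytes, models: List[Model],
           threshold: float = 0.8) -> Tuple[str, float]:
    """Try models from cheapest to most expensive.

    Stops at the first answer whose confidence reaches the threshold.
    If no model is confident, the last (most expensive) answer is returned.
    """
    spent = 0.0
    action = ""
    for model in sorted(models, key=lambda m: m.cost_per_image):
        action, confidence = model.answer(screenshot)
        spent += model.cost_per_image
        if confidence >= threshold:
            break
    return action, spent


# Stub models: the small one is only confident on "simple" (short) inputs.
small = Model("small", 1.0, lambda img: ("tap:order", 0.9 if len(img) < 100 else 0.4))
large = Model("large", 10.0, lambda img: ("tap:order", 0.95))

simple_page = b"x" * 10
cluttered_page = b"x" * 500  # stands in for a page full of ads and redirects
```

With these stubs, the simple page is resolved by the small model alone, while the cluttered page pays for both models, which is the cost pressure described above.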

Balancing Security and Convenience

When asked whether they worry about big companies producing similar products, Zhang Zhiyong said he is not concerned. Large companies, due to their ecosystem isolation, cannot achieve truly cross-platform, cross-device automation even if they have the technology. Once one big company gets involved, others will target it. This is precisely the advantage of startup teams.

ZeroFlow draws on the open-source philosophy of OpenClaw, with deep architecture design and optimization focused on security, model adaptability, and convenience.

The core security risk of OpenClaw is that, as an “AI with tool-calling capabilities,” it can execute shell commands, read and write files, send messages, and access the internet. If its prompt is injected or manipulated, the result could be a takeover of the host or a leak of sensitive data.

ZeroFlow addresses this risk through sandbox isolation and a small-model desensitization (data-masking) mechanism. It isolates and hides sensitive user information, such as keys, within the workspace, making it difficult for the AI to find sensitive files. In addition, a small model monitors all user interactions with the large model and encrypts sensitive information when it is detected. As a result, sensitive files stored in the cloud are hard to locate and even harder to interpret. Under this dual mechanism, ZeroFlow lets ordinary users enjoy the convenience of agents while maximizing privacy protection.
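
The masking step can be illustrated with a deliberately simplified sketch. Two assumptions are made here: a regular expression stands in for the small detection model, and reversible placeholder substitution stands in for encryption. The pattern, the placeholder format, and the function names are hypothetical and are not ZeroFlow’s implementation.

```python
import re
from typing import Dict, Tuple

# Illustrative pattern only: matches strings such as "sk-abc12345XYZ".
# A real system would use a trained detector, not one regex.
SECRET_PATTERN = re.compile(r"\b(?:sk|key|token)-[A-Za-z0-9]{8,}\b")


def redact(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace detected secrets with placeholders before text leaves the device.

    Returns the masked text and a local vault mapping placeholders to secrets.
    """
    vault: Dict[str, str] = {}

    def replace(match: re.Match) -> str:
        placeholder = f"<SECRET_{len(vault) + 1}>"
        vault[placeholder] = match.group(0)
        return placeholder

    return SECRET_PATTERN.sub(replace, text), vault


def restore(text: str, vault: Dict[str, str]) -> str:
    """Put the original secrets back into text that stays on the local side."""
    for placeholder, secret in vault.items():
        text = text.replace(placeholder, secret)
    return text


prompt = "Deploy with sk-abc12345XYZ please"
safe_prompt, vault = redact(prompt)
```

Only `safe_prompt` would be sent to the large model; the vault never leaves the local side, so an injected prompt that coaxes the model into echoing its input would reveal only the placeholder.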

In terms of convenience, ZeroFlow has lowered the barrier to entry to a new minimum. Deployment feels like using any ordinary internet product and is almost imperceptible: open a browser, register an account on the website, and start using it in the chat box.

OpenClaw, which is built on OpenAI’s and Anthropic’s tool-calling standards, has compatibility issues with domestic models. ZeroFlow has been engineered and optimized for mainstream domestic large models (such as Kimi and DeepSeek), improving the tool-calling experience and reducing prompt length by nearly 40%, which significantly lowers token costs.
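
For readers unfamiliar with tool-calling formats, the sketch below shows the same tool described in the OpenAI-style shape (a `function` object with `parameters`) and converted to the Anthropic-style shape (`name`, `description`, `input_schema`). The `tap` tool itself is hypothetical, and the converter only illustrates the kind of adapter work that format differences require; it says nothing about how ZeroFlow adapts to any particular domestic model.

```python
from typing import Any, Dict

# A hypothetical "tap" tool in the OpenAI-style tool definition shape.
openai_tool: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "tap",
        "description": "Tap a point on the screen.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer"},
                "y": {"type": "integer"},
            },
            "required": ["x", "y"],
        },
    },
}


def to_anthropic(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an OpenAI-style tool definition to the Anthropic-style shape.

    Both formats carry a JSON Schema for the arguments; only the
    wrapper keys differ.
    """
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],
    }


anthropic_tool = to_anthropic(openai_tool)
```

Every tool definition is sent along with the prompt, so shorter descriptions and leaner schemas translate directly into fewer tokens per request.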

Zhang Zhiyong told investors that the token cost for ordinary users could be reduced by about 30% with ZeroFlow.

From Programming Agents to General-Purpose Agents

ZeroFlow is not a crude copycat riding the wave.

When the wave of large language models first emerged, Zhang Zhiyong and Shan Wenbang’s team was at the forefront. Instead of chasing a grand narrative, they focused on a very specific pain point: how to free engineers from complex coding details so that their intelligence could go into creation. Internally, they incubated a first generation of programming agents: “coding partners” that understand context, predict intent, and proactively complete logic.

This tool grew quietly within their engineering system. From naive prompt engineering in the GPT-3.5 era to multi-turn memory, tool invocation, and code-review loops, each iteration was driven by real needs. Over several years, the system multiplied their R&D efficiency several times over.

When OpenClaw broke out, Zhang Zhiyong recalls, they sat in a meeting room watching the demo videos and stayed silent for a long time. It was not shock. They recognized something familiar: the path they had walked was being retraced by a much broader world.

In that moment, they realized that what they had built over three years was not just a programming tool, but a methodology for “enabling agents to truly understand human intent and execute continuously.”

“If this methodology can double engineers’ efficiency, why can’t it free everyone in every industry?” That was the question that led to ZeroFlow’s creation.

“One person can go faster”

From left to right: Shan Wenbang, Zhang Zhiyong

“I believe agents can genuinely improve everyone’s quality of life, and everyone should be freed up to do higher-level things. But the biggest problem right now is that the cost for ordinary people to access this is still too high. It’s not just about owning a crayfish, but about enabling that crayfish to move freely across devices and automate real-world scenarios for its owner. So what we want to build is a universal agent with zero access cost: just open your browser and use it,” Zhang Zhiyong said.

“ZeroFlow is not just a replacement for programming assistants. It transfers the core paradigm of programming agents (understand intent → plan → call tools → execute continuously → feedback and iterate) to broader knowledge work scenarios. Financial analysis, operations, content creation, data insights… wherever there’s repetition, logic, and output, ZeroFlow can take root,” Shan Wenbang explained.
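
The paradigm Shan Wenbang describes (understand intent → plan → call tools → execute continuously → feedback and iterate) is, at its core, a loop. The sketch below shows the general shape of such a loop under stated assumptions: `next_step` is a hard-coded stand-in for the model’s planning, and the `fetch` and `summarize` tools are invented stubs. It is not ZeroFlow’s code.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]
Step = Tuple[str, str]  # (tool name, argument)


def next_step(goal: str, history: List[str]) -> Optional[Step]:
    """Hypothetical policy standing in for the model's planning.

    It looks at the goal and at earlier observations, then picks the
    next tool call, or returns None when the goal is considered done.
    """
    if not history:
        return ("fetch", goal)
    if len(history) == 1:
        return ("summarize", history[-1])
    return None


def run_agent(goal: str, tools: Dict[str, Tool], max_steps: int = 10) -> List[str]:
    """Plan, call a tool, feed the observation back, and repeat."""
    history: List[str] = []
    for _ in range(max_steps):  # hard cap so the loop always terminates
        step = next_step(goal, history)
        if step is None:
            break
        name, argument = step
        history.append(tools[name](argument))  # observation informs the next plan
    return history


# Invented stub tools for a knowledge-work task.
tools: Dict[str, Tool] = {
    "fetch": lambda query: f"raw data for {query}",
    "summarize": lambda text: f"summary of {text}",
}

result = run_agent("Q3 revenue", tools)
```

Swapping the stub tools for spreadsheet, browser, or publishing tools is what moving this loop from programming to financial analysis or content creation would amount to.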

When asked why they didn’t pursue their ideal within their previous company, Zhang Zhiyong and Shan Wenbang exchanged smiles: “I think a group of people can go further, but one person can go faster. In this era, speed might be more important.”

Currently, Yiling Technology has received nearly ten million yuan in angel investment from individual angels and Hansheng Capital. The funds will mainly be used for further product development and promotion.
