OpenClaw Open Source AI Agent Application Attack Surface and Security Risk System Analysis

February 12, 2026 | NSFOCUS

Background

In early 2026, OpenClaw (formerly known as Clawdbot and Moltbot), an open-source autonomous AI agent project, quickly attracted global attention. As an automated intelligent application running in the form of a chatbot, it allows users to input natural language commands through Web pages and IM tools (such as Telegram, Slack, Discord, etc.) to achieve high-authority tasks such as email reading and writing, calendar management, browser control, file operations and even Shell command execution. With its fully local deployment and strong autonomous execution capabilities, OpenClaw’s GitHub Star count soared to 183K in just a few weeks, becoming one of the fastest growing open source AI projects in recent years, and its influence has rapidly expanded from the developer community to the global technology industry.

Openclaw from Star Spike to Consecutive RCE Disclosures

However, there are serious security risks behind this explosive growth. The project exposed at least 3 high-risk RCE vulnerabilities in a very short period of time; at the same time, frequent name changes also caused supply chain risks, including domain name/package name registration, GitHub and X account impersonation, etc. Code security and unstable identification together pose long-term security risks.

This paper systematically analyzes the core attack surface of OpenClaw, focusing on its architectural design and case analysis of known high-risk vulnerabilities. It deeply dismantles the complete exploitation chain and actual damage path of typical vulnerabilities, and combines them with current real threat scenarios to help developers, enterprise users and security practitioners rationally examine the true costs hidden behind this AI agent application boom, avoiding ignoring the huge potential security risks in the pursuit of efficiency.

OpenClaw Architecture Analysis and Threat Insight

Architecture Analysis and Attack Surface Analysis

Security Risks in OpenClaw’s Architectural Design

OpenClaw uses a hierarchical architecture that deeply couples social IM software with automated agents. The command input of this architecture starts with the IM integrated gateway, which is responsible for connecting to third-party communication software such as Telegram and the Web console to introduce unstructured commands into the system. Then, the instructions enter the core agent system, where LLM (large language model) performs task orchestration and decision reasoning, and relies on context and memory management modules to maintain the continuity of complex tasks. To enable the agent to operate in the real world, OpenClaw opens up the agent’s ability to call command execution and file reading and writing of the underlying operating system. At the same time, with the help of Skills and MCP plug-in systems, the system can flexibly expand various tool capabilities, such as querying network information, calling APIs, operating software, etc. At the ecological base of the entire architecture, the ClawHub community market provides a dynamic distribution and loading mechanism for plug-ins.

The layered architecture not only gives OpenClaw flexibility and scalability, but also introduces multi-dimensional security risks and attack surfaces. Its main attack surface covers direct/indirect prompt word injection at the source of instructions, configuration errors in the authentication link, and permission abuse and supply chain poisoning attacks at the downstream execution end.

Entry layer: Instruction forgery and configuration defects: Due to the openness of chat scenarios (such as group conversations), attackers can use information noise to implement direct prompt word injection and bypass preset instructions. In addition, the authentication strength of the gateway API directly determines the security boundary of the backend. Incorrect permission configuration will cause the API gateway to become a springboard for remote code execution.
Decision-making layer: Logical manipulation and memory poisoning: The core threat at this level lies in prompt word injection against LLM logic. An attacker can induce the orchestrator to deviate from its intended goal through malicious dialogue. Even more insidious is memory poisoning, which causes the agent to produce persistent security biases in subsequent decisions by embedding malicious strategies in context or long-term memory.
Execution layer: High-privilege abuse and actual destruction: OpenClaw usually has high permissions (such as root permissions) when deployed (especially in server environments). Once exploited by malicious instructions, it will evolve into a catastrophic system control risk. At the same time, the introduction of Skills and MCP plug-ins has increased the frequency of attacks. Malicious tool calls may lead to sensitive data leakage or have unintended effects on the real physical environment.
Ecological layer: Supply chain poisoning and ecological pollution: The ClawHub community market has formed a typical supply chain risk. If there is a lack of strict code auditing and signature verification, attackers can achieve code poisoning by publishing Skills plug-ins containing malicious prompt words and codes. When users load such plug-ins with one click, attackers can gain persistent resident capabilities in the victim’s environment.

OpenClaw Asset Exposure in the Wild and Risk Analysis

Although the reverse proxy mapping public network helps a large number of OpenClaw instances to go online quickly, it also buries the core contradiction between convenience and security. Using reverse proxy tools such as Nginx and Caddy to directly map the OpenClaw web panel to the public network is the current mainstream convenient deployment method. It can not only avoid changing the native operating environment of OpenClaw, but also flexibly implement domain name binding, HTTPS activation, load balancing and access control. Therefore, it is widely used by individuals and small teams to quickly go online. However, this method bypasses the internal network isolation and unified entry control, allowing a large number of instances to be directly exposed to the Internet, forming significant external visibility, which brings challenges to subsequent in-the-wild asset discovery and security governance.

OpenClaw Global Asset Mapping Statistics

The mapping data of the entire network shows that OpenClaw’s assets in the wild have grown rapidly in the short term, and China has surpassed the United States to become the world’s largest deployment area. At the beginning of 2026, there were only a few examples in the world, and then it quickly entered the scale of 10,000. By mid-February, it had accumulated to tens of thousands. In terms of regional distribution, China’s asset volume has gradually caught up with and surpassed the United States. As of press time, it has surpassed the United States by about 14,000, ranking first in the world.

The rapid expansion of asset scale and specific security shortcomings have made OpenClaw’s deployment in the wild face high risks in multiple dimensions. First, sensitive industry assets are exposed. For example, instances of critical infrastructure such as finance are exposed to the public network and can easily become the focus of attacks; Second, historical vulnerabilities amplify the threat. The reverse proxy configuration errors that have occurred have caused unauthorized access vulnerabilities, which still exist in some public network assets and can lead to a rapid expansion of the attack impact area; Third, homogeneous assets are prone to cause batch risks. Rapid deployment of templates makes a large number of instance structures converge. Once the template is defective or has serious vulnerabilities, the risk will spread rapidly along the same asset chain; Fourth, the scope of harm is expanded by associating permissions. OpenClaw not only involves IM chat group data, but also associates host control permissions and sensitive business data. Once compromised, it can easily become a springboard for infiltrating the intranet and stealing core data.

In the face of this series of emerging and large-scale security exposures, it is urgent to establish discovery and governance capabilities for OpenClaw’s wild assets. Accurate identification and dynamic monitoring shall be carried out to realize early detection and disposal of risks, so as to prevent potential safety hazards from turning into actual attacks.

Case Analysis of High-risk Vulnerabilities in OpenClaw

The current focus of discussion in the security community and cybersecurity companies on OpenClaw’s security issues has shifted to the secondary risks introduced by its Skills ecosystem, such as abuse of plug-in capabilities, unclear permission boundaries or third-party component supply chain problems. But in fact, as a typical Vibe Coding project, OpenClaw itself has many security weaknesses at the code level, which also deserves high attention.

Vibe Coding emphasizes rapid generation and iteration. Under the development rhythm of efficiency first, it often weakens systematic security design, threat modeling and strict audit processes, and is prone to introduce basic security issues such as insufficient input verification, missing authentication, sensitive information exposure, and chaotic dependency management.

These problems do not stem from ecological expansion, but from the structural risks brought about by the development paradigm itself. Therefore, we should pay more attention to the security crisis brought about by OpenClaw’s own development method and code quality.

Unauthorized Vulnerabilities Caused by Incorrect Reverse Proxy Configuration of OpenClaw

January 25, 2026: Twitter user theonejvo discovered an unauthorized vulnerability in OpenClaw’s Nginx reverse proxy scenario. By design, OpenClaw automatically allows “local connections” by default, so that when it is deployed on reverse proxies such as Nginx/Caddy, all requests appear to come from 127.0.0.1 on the backend. As a result, it is regarded as a trusted local connection. At the same time, if trustedProxies are not correctly configured or mandatory authentication is enabled, any user will directly access the control interface, which has high permissions such as proxy configuration, credential storage, conversation history and command execution. Eventually, this vulnerability evolved into a complete takeover of the AI agent.

Unauthorized Direct Interaction with Chat, Bash Tool Directly Executes System Commands

Obtain Configuration Information Such as Telegram in the Configuration Column

OpenClaw 1click Vulnerability-Modify Gateway Address-CVE-2026-25253

January 26, 2026 DepthFirst discovered a vulnerability again. It arbitrarily modified the gateway address. App-settings.ts directly accepted the query parameters in gatewayUrl and saved them to storage. After setting the gateway, it immediately triggered the connection operation and sent authToken to the new gateway connection handshake. This will generate a complete attack chain. When the victim accidentally clicks on the phishing constructed by the attacker around this vulnerability, from stealing tokens to creating a WebSocket connection to local port 18789, and finally directly executing arbitrary commands.

OpenClaw Command Injection Vulnerability-CVE-2026-25157

There are two command injection issues in the processing of SSH remote connections. The SSH target string is parsed in ssh-tunnel.ts, but host names starting with a short dash (“–, -“) are not prohibited, which allows attackers to construct malicious SSH targets similar to -oProxyCommand=… The client will parse it as a command line option ssh, resulting in local command execution.

The OpenClaw project’s own handling of issues is also a cause for concern. Its GitHub repository has a short-term backlog of more than 6,700 issues, far exceeding traditional open source projects. A large number of issuesuses were not handled in a timely and adequate manner, exposing lags in maintenance responses, which may mean that there are deep-seated code security issues lurking in the project that have not been discovered or resolved.

Given that OpenClaw, as a representative product of the Vibe Coding era, is highly sought after but generally ignores security pre-positioning in rapid deployment and direct launch. In addition, its own maintenance lags behind and a large number of wild assets are exposed, resulting in significant accumulation of security risks. We urge users, especially individuals and small and medium-sized teams, to rely on the cloud service sandbox environment for comprehensive testing and evaluation during the trial phase. Do not connect it directly to the enterprise production environment or personal home network to prevent unknown vulnerabilities and large-scale attack risks.

OpenClaw Attack Surface Actual Combat Case Analysis

Risk Case 1: “Security for Convenience”, blind trust in AI agents amplifies the risk of malicious manipulation

Highly autonomous AI agents represented by OpenCode and OpenClaw have gradually achieved large-scale deployment in the user’s local environment, and deeply integrated operating systems, development tool chains and various third-party services through plug-in mechanisms such as Skills. Such agents usually hold user-configured context information, access credentials and execution permissions during operation to support the continuous execution of automated tasks. In actual application scenarios, agents often can directly execute system commands, read and write local files, initiate network requests, and can establish direct connections with high-value targets such as encrypted wallets, trading platforms and enterprise-level business systems. As the agent’s capability boundaries and authority scope continue to expand, its specific behavior path increasingly depends on the input content and the execution logic of the expansion mechanism (Skill/plug-in). The prompt word input and Skill call process gradually evolve into the core control surface that affects the agent’s behavior.

However, in actual deployment and use, if there is a lack of real-time verification and constraint mechanisms for the decision-making process and execution link of agents, over-trusting their autonomous decision-making capabilities can easily lead to systemic security risks. Taking a real case as an example, the attacker can implement indirect prompt word injection by constructing malicious emails (Indirect Prompt Injection), thereby interfering with OpenClaw’s normal email processing logic and inducing it to execute the attacker’s preset malicious instructions without explicit authorization. Analogous to the traditional security attack and defense model, this attack mode can be approximately regarded as a “0-click” email-triggered remote code execution attack path, that is, the attack link triggering and execution control can be completed without user interaction. Its attack effect is highly similar to the historically known “triangulation” attack model in terms of structural characteristics and control mechanism, reflecting the systematic risk characteristics of the new automated attack surface under the AI agent architecture. The more external information sources and tools are connected, the easier it is for malicious instructions to be uncontrollable. In the instruction data, not only emails, a malicious web page, but also files sent by chat boxes may become the initial vectors of attacks.

Indirect Prompt Injection via Email-Assisted OpenClaw [2]

In actual use, some users directly use highly autonomous agents such as OpenClaw in high-risk operation scenarios, exposing sensitive operations and credentials that should have been strictly isolated and protected by multiple verifications to the agent execution link. The traditional security system relies on access control, permission classification and security audit mechanisms built over many years, which are structurally bypassed in the automated execution mode of agents. For example, in actual cases, users directly authorize the agent to perform cloud server automated operation and maintenance tasks through AK/SK credentials, so that cloud resource control permissions are unconditionally embedded in the execution path of the agent. Once the agent’s behavior path is hijacked by input pollution, prompt word injection or skill extension logic, its impact will directly extend to the cloud infrastructure control layer, forming a rapid cross-layer propagation risk link from “agent out of control” to “cloud resource control out of control”.

OpenClaw is connected to the automatic operation and maintenance tool, with instance operation capabilities

Risk Case 2: Skills plug-in system supply chain risks superimposed on the lack of isolation mechanism, amplifying the threat of poisoning

NSFOCUS Tianyuan Lab conducted a systematic analysis of Skills security risks in the article “Analysis of the Attack Surface in the Agent Skills Architecture: Case Studies and Ecosystem Research“. After the popularity of OpenClaw, we observed that the security risks of Skills in terms of architectural design and lack of ecological supervision have spread to OpenClaw.

ClawHub is the official Skills plug-in distribution platform officially launched by OpenClaw, which hosts more than 3k open source Skills; it supports one-click plug-in installation and deployment through OpenClaw’s CLI client. And the threshold for uploading custom skills is extremely low, you only need to register a Github account that does not require real names.

Clawhub is the Official Skills Distribution Channel for OpenClaw

The platform did not set up an effective security audit and content verification mechanism in the early stage of its launch (the first month of operation), resulting in a large number of malicious plug-ins (Skills) quickly pouring into the ecosystem. Research and analysis show that among the more than 3,000 Clawhub Skill samples collected, a total of 336 malicious poisoning samples were identified, accounting for about 10.8%, showing significant large-scale penetration characteristics and systemic risk levels. The security situation cannot be ignored.

Further analysis found that the main attack mode of malicious Skills is “download-execute” as its core feature. Most samples achieve persistent control or subsequent attack chain deployment by remotely pulling scripts or binary payloads and executing them directly locally. In the typical example shown in the figure below, the attacker first encodes the malicious instructions into Base64 to achieve obfuscation and evasion detection, then decodes and restores the encoded content during the runtime phase, and finally downloads the attack script from the remote server through the curl command and executes it directly, thus completing the complete “obfuscation → decoding → downloading → execution” attack chain.

February 7, 2026, OpenClaw officially announced a partnership with VirusTotal to focus on security governance work for the Skills plug-in ecosystem in its ClawHub market. Through the VirusTotal Code Insight platform, automated security analysis has been performed on more than 3016 OpenClaw Skills samples, and malicious Skills identified have been removed from shelves and deleted. So far, the malicious Skills previously disclosed in the ClawHub market have basically been officially removed from the shelves. However, the study found that due to historical backup and synchronization mechanisms, OpenClaw’s official GitHub repository still retains a large number of copies of early Skills projects, which contain some sample code that has been judged to be malicious. Technically, the relevant warehouse content can still be directly accessed and downloaded by users, which poses a potential risk of secondary transmission. It is recommended that users remain highly cautious when using and integrating related resources, and verify them in combination with security audits and code traceability mechanisms.

Summary

As a phenomenal AI agent application, OpenClaw’s core innovation lies in the deep combination of IM chat software and highly autonomous agents, with group chat as the main interactive entrance, which greatly reduces the threshold for ordinary users to use and truly allows AI to move from professional tools to daily productivity.

However, this convenience also exposes an obvious security cost: the loose permission architecture design and the lack of a strictly controlled plug-in ecosystem bring multiple real risks. Project code security issues are frequent, and multiple high-risk RCE vulnerabilities have been exposed in a short period of time; the design concept of “safety for convenience” leads to blind trust in the capabilities of agents, which can easily amplify the consequences of malicious operations; the supply chain risks of the Skills plug-in system are superimposed on the lack of isolation mechanisms, further amplifying the possibility of poisoning and malicious code execution. These problems are no longer theoretical hidden dangers but have generated real threats in the wild.

The key to solving the common security risks of OpenClaw-like AI agents is to build a full life cycle, systematic security fence and security assessment capabilities. Strictly follow the principle of least privilege from the design stage, introduce sandbox isolation for all tools and plug-in execution at runtime, strengthen the filtering and isolation of external input content, and establish automatic discovery, continuous risk assessment and dynamic reinforcement capabilities for agent assets. Only by truly integrating “assessment + fencing” throughout the entire life cycle of an agent can we effectively control its potential risks while unleashing powerful autonomous capabilities, allowing AI agents to truly transform from “high-risk experimental products” into credible and controllable productivity infrastructure.

NSFOCUS AI Red Teaming

At a time when open source AI agent applications such as OpenClaw are growing explosively, agent security has been upgraded from “prompt word confrontation” to system-level vulnerability exploitation and full-link challenges of runtime risks. NSFOCUS AI Red Teaming security assessment helps enterprises quickly build a protection system with the closed-loop capability of “intelligent generation of attack surface → automatic identification at runtime”. Core advantages of the platform:

Full link coverage: automatic mapping from the attack surface to runtime behavior monitoring, built-in 70+ attack template matrices and intelligent generation engine
Targeted jailbreaking and exploitation: Rely on prompt variants coupled with dynamic business context + deep integration of mainstream frameworks such as Dify and n8n to accurately explore application layer risks such as command injection and sensitive file reading
Automated closed-loop assessment: The behavior analysis engine automatically constructs attack payloads, performs penetration tests and outputs executable reports to achieve a one-stop closed loop from risk discovery to remediation suggestions
Practical orientation: Deeply integrate the practical experience of the NSFOCUS Red Teaming to provide full-link capabilities for asset discovery, risk quantification and continuous verification for complex threats in the era of agents.

NSFOCUS AI-Guardrails

NSFOCUS AI-Guardrails is a professional security protection product for LLMs and agent applications. It provides real-time online detection and protection against the unique risks of multimodal content security, sensitive data leakage, malicious prompt word injection and computing power abuse in AI interaction. The product strictly complies with relevant national security regulations, has passed the authoritative government LLM security test, and is deeply adapted to domestic computing power chips such as Huawei Ascend and Hygon K100. It aims to provide safe, compliant and reliable AI application security solutions for customers in various industries such as government, finance, and energy, and ensure the stable and healthy development of intelligent business.

Five major security models linkage: Integrate five core models of content values, prompt word attacks, data security, computing power DDoS detection and secure answering, covering the full-link risks of AI interaction in one stop.
Model-based governance, deep intelligent protection: Adopting the core concept of “model-based governance”, we deeply understand the behavioral logic of LLMs through self-developed dedicated security models. It can not only intercept keywords, but also accurately identify illegal content, deep attack intentions and sensitive data leakage risks from the semantic level, realizing intelligent deep confrontation.
Meet strict compliance requirements and reduce regulatory risks: The product has passed the first batch of LLM security tests by the Ministry of Industry and Information Technology of China, and its detection and protection capabilities are strictly benchmarked against national standards, providing authoritative and reliable compliance guarantees for industry customers and significantly reducing the risk of AI application violations.
Prevent new AI risks and ensure business security: The product has built-in security models that have been fine-tuned and optimized by NSFOCUS, which can provide one-stop defense against core risks such as content violations, prompt word injection, data leakage, and computing power exhaustion, ensuring the stable and reliable operation of AI applications.
Customization of industry knowledge base to achieve customized protection: Customers can import customized sensitive keywords and negative knowledge base based on their own industry characteristics (such as finance, medical care, law) and internal compliance requirements to achieve “tailor-made” accurate detection and protection.
Lightweight and flexible deployment, finely configurable policies: The product adopts lightweight container deployment, low resource usage, and convenient integration. Supports rapid docking of various AI services through API, and allows the configuration of differentiated detection strategies and response actions for different business scenarios to achieve highly adaptable matching between security and business.

References

[1] https://depthfirst.com/post/1-click-rce-to-steal-your-moltbot-data-and-keys

[2] https://x.com/Mkukkk/status/2015951362270310879