Incident Review
In April 2026, researchers at Black Hat Asia 2026 disclosed a systematic class of security threat named Ghost Bits. It targets underlying encoding flaws in the Java ecosystem and can render mainstream WAF/IDS defenses completely ineffective.
The core of this risk is that the security detection chain and the application execution chain interpret the encoding of the same input inconsistently. Front-line protection may judge an input harmless while the backend restores it to high-risk semantics at execution time. The issue is therefore an “end-to-end semantic inconsistency” rather than a defect in any single component.
1. Ghost Bits Encoding Principle
Ghost Bits are the high-order bits that are silently discarded during a narrowing conversion from characters to bytes, yet whose loss changes the security semantics of the input.
Taking Java as an example:
- char is 16-bit (UTF-16 code unit)
- byte is 8-bit
- When code performs operations such as (byte) ch, ch & 0xFF, or OutputStream.write(int) (which writes only the low-order 8 bits), the upper 8 bits of the char are discarded
This means a Unicode character will “degrade” into another byte value in certain chains. Attackers can exploit this discrepancy to construct inputs with inconsistent front-end and back-end semantics: what the front-end detection sees as A is actually B when executed at the backend.
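The following minimal Java sketch shows the three narrowing patterns listed above collapsing one Unicode character into a single ASCII byte. It uses U+0131 (LATIN SMALL LETTER DOTLESS I) as the example character. Its high byte is 0x01 and its low byte is 0x31, the ASCII digit 1. The class and method names are illustrative only.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: three common narrowing patterns that drop the high byte of a char.
class NarrowingDemo {
    // U+0131: high byte 0x01, low byte 0x31 (ASCII '1')
    static final char GHOST = '\u0131';

    static byte viaCast(char ch) {
        return (byte) ch;              // keeps only the low 8 bits
    }

    static int viaMask(char ch) {
        return ch & 0xFF;              // explicitly masks away the high 8 bits
    }

    static byte[] viaWriteInt(char ch) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(ch);                // write(int) stores only the low-order 8 bits
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println((char) viaCast(GHOST));         // 1
        System.out.println((char) viaMask(GHOST));         // 1
        System.out.println((char) viaWriteInt(GHOST)[0]);  // 1
    }
}
```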
2. How Attackers Exploit Ghost Bits Encoding to Bypass Detection
Leveraging this silent high-bit truncation, attackers replace critical ASCII characters in an attack payload with carefully chosen Unicode characters whose lower 8 bits equal the original ASCII bytes. The WAF sees only harmless-looking Unicode characters. The backend Java server truncates the high-order bits during decoding and keeps only the low byte, which restores the original attack payload. The payload therefore bypasses WAF detection and reaches actual execution.
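The sketch below simulates the bypass end to end under stated assumptions. `naiveWafBlocks` is a hypothetical stand-in for a string-matching WAF rule. `backendView` stands in for a vulnerable backend that narrows each char to one byte. Neither comes from a real WAF or framework. Setting the high byte to 0x01 is just one choice: any non-zero high byte would pass the WAF check and be dropped the same way.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical simulation: string-matching "WAF" versus a byte-narrowing backend.
class GhostBitsBypassDemo {
    // Set the high byte to 0x01 while keeping the ASCII low byte (assumes ASCII input).
    static String ghostify(String asciiPayload) {
        StringBuilder sb = new StringBuilder(asciiPayload.length());
        for (char c : asciiPayload.toCharArray()) {
            sb.append((char) (0x0100 | c));
        }
        return sb.toString();
    }

    // Toy signature check that only inspects the characters as received.
    static boolean naiveWafBlocks(String input) {
        return input.toLowerCase(Locale.ROOT).contains(" or ");
    }

    // Simulated vulnerable backend: narrows each char to one byte, then
    // reads the bytes back as ISO-8859-1 (one byte per char).
    static String backendView(String input) {
        byte[] bytes = new byte[input.length()];
        for (int i = 0; i < input.length(); i++) {
            bytes[i] = (byte) input.charAt(i);   // high 8 bits silently discarded
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String ghost = ghostify("1 or 1=1");
        System.out.println(naiveWafBlocks(ghost));              // false: WAF lets it through
        System.out.println(backendView(ghost));                 // 1 or 1=1
        System.out.println(naiveWafBlocks(backendView(ghost))); // true: what actually executes
    }
}
```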
Risk Assessment: From WAF Bypass to Total Compromise
The danger of Ghost Bits attacks is “one flaw, multiple exploits”: the same underlying defect can trigger several high-risk attack chains.
| Attack Type | Affected Components/Scenarios | Severity |
|---|---|---|
| SQL Injection | Jackson charToHex truncation, payload embedded using steganography in Unicode | Critical |
| Deserialization RCE | BCEL ClassLoader, fastjson \u/\x escaping with Ghost Bits | Critical |
| File Upload Bypass | Tomcat RFC2231Utility truncates when processing filenames, .jsp can be disguised as harmless characters | Critical |
| Path Traversal/Authentication Bypass | URL decoding path flaws in Spring, Jetty, Undertow, Vert.x and other frameworks | Critical |
| Bypass of Known High-Risk CVEs | Direct bypass of existing WAF protections for GeoServer CVE-2024-36401 (CVSS 9.8), Spring4Shell (CVE-2022-22965), etc. | Critical |
| SMTP Injection | Angus Mail and other mail libraries can restore steganographic CRLF sequences to line breaks, reproduced on Jira and Confluence | Critical |
| HTTP Request Smuggling/XSS | Apache HttpClient (≤4.5.9), JDK native HttpServer affected by CRLF injection | High |
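The CRLF-related rows (SMTP injection, HTTP request smuggling) apply the same truncation to control characters. The hypothetical sketch below shows the idea. U+010D and U+010A carry the low bytes 0x0D (CR) and 0x0A (LF). A filter that looks for literal CR/LF lets them through, but a byte-narrowing writer emits a real line break. `narrowToBytes` is a simulation, not the actual code path in Angus Mail, HttpClient, or the JDK HttpServer.

```java
// Hypothetical illustration of "ghost" CRLF: U+010D and U+010A narrow to CR and LF.
class GhostCrlfDemo {
    // Naive header-injection filter: only looks for literal CR or LF chars.
    static boolean containsCrlf(String s) {
        return s.indexOf('\r') >= 0 || s.indexOf('\n') >= 0;
    }

    // Simulates a writer that emits each char as a single byte (low 8 bits).
    static byte[] narrowToBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String subject = "hello\u010D\u010ABcc: attacker@example.com";
        System.out.println(containsCrlf(subject));               // false: filter passes it
        byte[] wire = narrowToBytes(subject);
        System.out.println(wire[5] == '\r' && wire[6] == '\n');  // true: real CRLF on the wire
    }
}
```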
Remediation Priority: High. The vulnerability requires no privileges or user interaction and can be triggered under default configurations. Public PoC/exploit code is available, the exploitation barrier is low, and remediation complexity is medium.
NSFOCUS WAF: Semantic Detection at the Decoding Layer to Unveil “Ghosts”
Against Ghost Bits encoding bypasses, WAF rules that rely solely on string feature matching are inadequate. NSFOCUS WAF has already deployed targeted defenses, not as a post-incident remedy but as pre-incident immunity.
NSFOCUS WAF’s approach: perform semantic detection at the decoding layer to eliminate “end-to-end semantic inconsistency” at its root.
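The following minimal Java sketch makes the idea of decoding-layer detection concrete. It is an illustration under stated assumptions, not NSFOCUS WAF's implementation. Only %uXXXX escapes are decoded, and a single toy regex stands in for a real SQL injection rule set. The key step is evaluating the signature against two views: the full Unicode view and the low-byte view that a truncating backend would see.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only (hypothetical): decode, then match BOTH the Unicode and low-byte views.
class DecodingLayerDetector {
    private static final Pattern PERCENT_U = Pattern.compile("%u([0-9A-Fa-f]{4})");
    // Toy SQL-injection signature, e.g. matches "or 1=1"
    private static final Pattern SQLI = Pattern.compile("\\bor\\s+\\d+\\s*=\\s*\\d+");

    // Step 1: decode %uXXXX escapes into UTF-16 chars.
    static String decodePercentU(String raw) {
        Matcher m = PERCENT_U.matcher(raw);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Step 2: reproduce what a truncating backend would see (low byte only).
    static String lowByteView(String decoded) {
        StringBuilder sb = new StringBuilder(decoded.length());
        for (int i = 0; i < decoded.length(); i++) {
            sb.append((char) (decoded.charAt(i) & 0xFF));
        }
        return sb.toString();
    }

    // Step 3: alert if EITHER interpretation matches the signature.
    static boolean isAttack(String raw) {
        String decoded = decodePercentU(raw);
        return SQLI.matcher(decoded.toLowerCase(Locale.ROOT)).find()
            || SQLI.matcher(lowByteView(decoded).toLowerCase(Locale.ROOT)).find();
    }

    public static void main(String[] args) {
        System.out.println(isAttack("%u0131%u0120%u016F%u0172%u0120%u0131%u013D%u0131")); // true
        System.out.println(isAttack("%u0068%u0069"));                                      // false ("hi")
    }
}
```

With this approach, both the plain `%u0031...` form and the Ghost Bits `%u0131...` form of `1 or 1=1` trigger an alert, because the low-byte view of the latter is identical to the original payload. Applying the low-byte view to all input may raise false positives on legitimate non-ASCII text, so a real deployment would need to decide where that view is applied.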
1. Current Product Capabilities: Unicode Ghost Bits Detection Enabled by Default
NSFOCUS WAF currently supports detection of Unicode-type Ghost Bits encoding bypasses, enabled by default in all configurations:
- Version 6090: Unicode decoding is enabled by default in the Web decoding engine, which directly identifies and alerts on Ghost Bits-modified payloads.

- Versions 6081 & 6073: Equivalent protection is achieved by enabling Unicode decoding in the semantic analysis engine.

2. Detection Verification: Actual Interception with Verifiable Records
Taking SQL injection detection as an example, NSFOCUS WAF has completed targeted verification:
| Stage | Payload Example |
|---|---|
| Original Attack Payload | 1 or 1=1 |
| After Unicode Encoding | %u0031%u0020%u006F%u0072%u0020%u0031%u003D%u0031 |
| After Ghost Bits Encoding (Attack Modification) | %u0131%u0120%u016F%u0172%u0120%u0131%u013D%u0131 |
| NSFOCUS WAF Detection Result | Successfully alerted and intercepted in 6090/6081 |
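The two encoded rows of the table can be reproduced mechanically. In this sketch, each character of the ASCII payload becomes a %uXXXX escape. The Ghost Bits variant sets the high byte to 0x01 and leaves the low byte unchanged. The helper name `percentU` is illustrative.

```java
// Reproduces the "After Unicode Encoding" and "After Ghost Bits Encoding" rows.
class PayloadEncoder {
    // Encode each char as %uXXXX with the given high byte (assumes ASCII input).
    static String percentU(String s, int highByte) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            int unit = (highByte << 8) | (c & 0xFF);
            sb.append(String.format("%%u%04X", unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentU("1 or 1=1", 0x00));
        // %u0031%u0020%u006F%u0072%u0020%u0031%u003D%u0031
        System.out.println(percentU("1 or 1=1", 0x01));
        // %u0131%u0120%u016F%u0172%u0120%u0131%u013D%u0131
    }
}
```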

[Screenshot: alert on version 6090]
[Screenshot: alert on version 6081]

Both standard Unicode encoding and Ghost Bits-modified payloads can be semantically restored by NSFOCUS WAF at the decoding stage, revealing hidden “ghosts” before they reach the business system.
Building a Defense-in-Depth System
WAF is a defense line, but not the only one. For Ghost Bits attacks, a defense-in-depth system is recommended across five dimensions:
1. Unify Input Semantics: Fixed UTF-8 Across the Entire Chain
Use fixed UTF-8 encoding/decoding across the entire chain and prohibit “automatic encoding guessing”. Inconsistent encoding is the breeding ground for Ghost Bits attacks; a unified encoding standard greatly reduces the attack surface.
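The following is a minimal sketch of fail-closed decoding with one fixed charset, using the standard `java.nio.charset` API. `new String(bytes, charset)` silently substitutes U+FFFD for malformed sequences. A `CharsetDecoder` set to `CodingErrorAction.REPORT` throws instead, so malformed input is rejected rather than reinterpreted. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Decode request bytes with one fixed charset (UTF-8) and fail closed on bad input.
class StrictUtf8 {
    static String decode(byte[] raw) throws CharacterCodingException {
        // newDecoder() already defaults to REPORT; set explicitly for clarity.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer chars = decoder.decode(ByteBuffer.wrap(raw));
        return chars.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        System.out.println(decode("user=alice".getBytes(StandardCharsets.UTF_8)));
        try {
            decode(new byte[] {(byte) 0xC4});   // truncated two-byte sequence
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```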
2. Input Normalization + Whitelist Validation
- Perform Unicode normalization (NFC/NFKC) before business validation.
- Implement character set whitelists for high-risk fields (username, filename, SQL-related parameters, paths).
- Explicitly reject invisible control characters, abnormal obfuscated characters, and unexpected character sets.
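The steps above can be sketched as follows. The sketch assumes a hypothetical upload policy that accepts only short ASCII filenames with a `png`, `jpg`, or `pdf` extension. Normalization runs first, then the whitelist check. Because the whitelist admits only ASCII, any character with a non-zero high byte (a Ghost Bits candidate) is rejected at this point instead of being truncated later. The validator returns the normalized value, which is what downstream code should use.

```java
import java.text.Normalizer;
import java.util.Optional;
import java.util.regex.Pattern;

// Sketch: NFKC-normalize, then validate against a strict whitelist (hypothetical policy).
class InputValidator {
    private static final Pattern SAFE_FILENAME =
        Pattern.compile("[A-Za-z0-9_-]{1,64}\\.(png|jpg|pdf)");

    // Returns the normalized filename if it passes, otherwise empty.
    static Optional<String> validateFilename(String input) {
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);
        // ASCII-only whitelist: control chars and any char >= 0x80 fail here.
        return SAFE_FILENAME.matcher(normalized).matches()
            ? Optional.of(normalized)
            : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validateFilename("report.pdf"));      // Optional[report.pdf]
        System.out.println(validateFilename("shell\u012Ejsp"));  // Optional.empty (U+012E low byte is '.')
    }
}
```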
3. Database Execution Layer: Parameterized Query as a Safety Net
Parameterized queries are the last safety net against SQL injection. Even if the front-end WAF is bypassed, binding input as parameters at the database layer ensures the payload is treated as data rather than as executable SQL.
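The following is a minimal JDBC sketch of the parameterized pattern. The table and column names (`users`, `id`, `username`) are hypothetical. The SQL text is a constant, and the user-supplied value is passed through `PreparedStatement.setString`. A payload that Ghost Bits truncation restored upstream is therefore still handled as a literal string rather than as SQL syntax.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch of a parameterized lookup; table and column names are hypothetical.
class UserDao {
    // The SQL text is constant; user input never becomes part of it.
    static final String FIND_USER = "SELECT id FROM users WHERE username = ?";

    static boolean userExists(Connection conn, String username) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(FIND_USER)) {
            ps.setString(1, username);   // bound as data, never parsed as SQL
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```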
4. Code Audit: Identify High-Risk Coding Patterns
Focus on auditing typical Ghost Bits patterns in business code:
- (byte)ch
- ch & 0xFF
- baos.write(ch)
- DataOutputStream#writeBytes()
Replace with secure coding practices that explicitly specify encoding, such as String.getBytes(StandardCharsets.UTF_8).
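The difference between a risky pattern and its replacement is easy to see. `DataOutputStream#writeBytes` is documented to discard the high eight bits of each character, so U+0131 becomes the single byte 0x31. `getBytes(StandardCharsets.UTF_8)` encodes the same character as the two bytes 0xC4 0xB1, which preserves it. The class name is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative comparison of a risky narrowing API and an explicit-encoding fix.
class EncodingAudit {
    // RISKY: writeBytes discards the high 8 bits of each char.
    static byte[] risky(String s) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        out.writeBytes(s);
        out.flush();
        return baos.toByteArray();
    }

    // SAFE: explicit UTF-8 encoding preserves every code point.
    static byte[] safe(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String ghost = "\u0131";                            // U+0131, low byte 0x31
        System.out.println(Arrays.toString(risky(ghost)));  // [49]
        System.out.println(Arrays.toString(safe(ghost)));   // [-60, -79] (0xC4 0xB1)
    }
}
```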
5. Network Layer: Reduce Attack Surface
For Java application services exposed to the public network, restrict access sources. Before code repair is completed, reduce exposure via IP whitelists, VPNs, etc.
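If the restriction cannot be applied at a firewall, VPN, or reverse proxy, an application-level check is possible. The following is a minimal CIDR allowlist sketch that uses only `java.net.InetAddress`. The example ranges are hypothetical. The method expects IP literals, because passing a hostname would trigger a DNS lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Minimal CIDR allowlist sketch; example ranges are hypothetical.
class IpAllowlist {
    private final String[] cidrs;

    IpAllowlist(String... cidrs) {
        this.cidrs = cidrs;
    }

    // Expects an IP literal; a hostname would be resolved via DNS.
    boolean isAllowed(String ipLiteral) throws UnknownHostException {
        byte[] addr = InetAddress.getByName(ipLiteral).getAddress();
        for (String cidr : cidrs) {
            String[] parts = cidr.split("/");
            byte[] net = InetAddress.getByName(parts[0]).getAddress();
            int prefix = Integer.parseInt(parts[1]);
            // Only compare addresses of the same family (4 vs 16 bytes).
            if (net.length == addr.length && prefixMatches(addr, net, prefix)) {
                return true;
            }
        }
        return false;
    }

    // Compare the first `prefix` bits of two equal-length addresses.
    private static boolean prefixMatches(byte[] a, byte[] b, int prefix) {
        int fullBytes = prefix / 8;
        for (int i = 0; i < fullBytes; i++) {
            if (a[i] != b[i]) return false;
        }
        int remBits = prefix % 8;
        if (remBits == 0) return true;
        int mask = (0xFF << (8 - remBits)) & 0xFF;
        return (a[fullBytes] & mask) == (b[fullBytes] & mask);
    }

    public static void main(String[] args) throws UnknownHostException {
        IpAllowlist allowlist = new IpAllowlist("10.0.0.0/8", "192.168.1.0/24");
        System.out.println(allowlist.isAllowed("10.1.2.3"));     // true
        System.out.println(allowlist.isAllowed("203.0.113.5"));  // false
    }
}
```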
Conclusion
Ghost Bits attacks once again show that security protection cannot stop at “surface characters”; it must examine the underlying semantics. With decoding-layer semantic detection, NSFOCUS WAF had completed defense deployment before the Ghost Bits threat was publicly disclosed. Against increasingly sophisticated bypass techniques, “pre-incident immunity” is always more valuable than “post-incident remedy”.