Decoding the Double-Edged Sword: The Role of LLM in Cybersecurity

October 3, 2024 | NSFOCUS

Large Language Models (LLMs) are language models with a vast number of parameters that have undergone extensive training to understand and process human language. Because they are trained on a wide array of texts, they can assist in problem-solving across various domains. Security professionals are exploring how LLMs can aid in daily tasks such as code auditing, vulnerability mining, and malware analysis. However, LLMs have also become a powerful tool for threat actors, introducing new risks to cybersecurity.

The Help LLMs Provide

Code Security

Having learned from a large body of code samples and security best practices, LLMs can generate more standardized code that avoids common security vulnerabilities and is less likely to introduce new flaws. Many vulnerabilities are essentially caused by non-standard coding practices (such as improper memory usage or inconsistent serialization and deserialization), and LLMs’ adherence to secure coding standards can help prevent such situations.

Using LLMs to generate test cases is an important direction currently being explored in cybersecurity. Studies have shown that, compared to traditional methods, test cases generated by LLMs achieve higher coverage and can more effectively test for software supply chain attacks. Fuzzing, a widely used technique for generating test cases and mining vulnerabilities, can be performed more efficiently with the aid of LLMs. Their natural language understanding capabilities allow test cases to be generated and modified in a more targeted way, improving testing efficiency and coverage.
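To make the fuzzing idea concrete, here is a minimal sketch of a mutation-based fuzzer in Python. Everything in it is an assumption for illustration: `parse_record` is a hypothetical toy target with a planted length-check bug, and the random `mutate` step simply marks the place where an LLM could propose more targeted inputs. It is not taken from any of the studies mentioned above.

```python
import random

def parse_record(data: bytes) -> bytes:
    """Hypothetical toy target: first byte is a length, the rest is the payload."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:]
    if length > len(payload):
        # Stand-in for a memory-safety bug: a declared length that
        # exceeds the real payload size is treated as a crash.
        raise IndexError("declared length exceeds payload size")
    return payload[:length]

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte of the seed with a random value."""
    buf = bytearray(seed)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed: bytes, rounds: int, rng: random.Random) -> list[bytes]:
    """Return the mutated inputs that made the target raise IndexError."""
    crashes = []
    for _ in range(rounds):
        candidate = mutate(seed, rng)
        try:
            parse_record(candidate)
        except IndexError:
            crashes.append(candidate)
    return crashes

# A valid seed: length byte 3 followed by a 3-byte payload.
found = fuzz(b"\x03abc", rounds=1000, rng=random.Random(0))
```

Random mutation only finds this bug when it happens to hit the length byte. An LLM-assisted fuzzer would, in principle, use its understanding of the input format to aim mutations at fields like that one.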

Existing static code scanning tools largely rely on manually maintained rule sets. LLMs can assist in generating and modifying these rules, reducing the cost of writing and maintaining them by hand. Traditional tools have limited semantic understanding of code and can only find vulnerabilities through rules or pattern matching. In contrast, LLMs can interpret what code does, so applying them to scanning can detect more complex attack scenarios.
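The limitation of pattern matching can be shown with a small Python sketch. The rule names, the `scan` function, and both code snippets are hypothetical and exist only for illustration; real scanners are far more capable, but the same gap appears whenever a rule looks at text rather than meaning.

```python
import re

# Hypothetical rule set: each rule is a name plus a regular expression.
RULES = {
    "dangerous-eval": re.compile(r"\beval\s*\("),
    "shell-command": re.compile(r"\bos\.system\s*\("),
}

def scan(source: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

# The direct call matches the "shell-command" rule.
DIRECT = "import os\nos.system('rm ' + user_input)\n"

# The same behavior reached through getattr never contains the
# literal text "os.system(", so no rule matches it.
ALIASED = "import os\nrun = getattr(os, 'sys' + 'tem')\nrun('rm ' + user_input)\n"

direct_findings = scan(DIRECT)
aliased_findings = scan(ALIASED)
```

Here `direct_findings` contains one hit on line 2, while `aliased_findings` is empty even though both snippets run the same shell command. Closing that gap requires reasoning about what the code does, which is where semantic analysis comes in.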

Figure 1 illustrates how code security capabilities are distributed among different models. Current LLMs each have their own focus, and there is not yet an all-rounder model that covers the entire software lifecycle.

Figure 1 LLMs for code security and privacy

Malware Detection and Analysis

Malware is a significant threat in modern network security, and identifying it quickly and accurately helps protect systems and data privacy.

Traditional detection tools often rely on static signatures or specific rules. LLMs, by learning from a vast array of malware samples, can extract common malicious code patterns and behavioral characteristics, helping security personnel analyze new variants faster and more efficiently.
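A short Python sketch shows why static signatures struggle with new variants. It assumes the simplest possible signature, a SHA-256 hash of the whole file, and uses harmless placeholder bytes in place of a real sample; the names `KNOWN_BAD_HASHES` and `matches_signature` are made up for this example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

# Harmless placeholder bytes standing in for a known malicious file.
SAMPLE = b"placeholder bytes standing in for a known malicious file"

# Hypothetical signature database: hashes of previously seen samples.
KNOWN_BAD_HASHES = {sha256_hex(SAMPLE)}

def matches_signature(data: bytes) -> bool:
    """True only if the exact bytes have been seen before."""
    return sha256_hex(data) in KNOWN_BAD_HASHES

# A "variant" that differs from the sample by a single appended byte.
VARIANT = SAMPLE + b"\x00"

sample_detected = matches_signature(SAMPLE)
variant_detected = matches_signature(VARIANT)
```

The original sample is detected, but the one-byte variant is not, because any change to the bytes produces a different hash. Detection based on learned patterns and behavior aims to be less brittle than this exact-match approach.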

Code obfuscation is one of the main ways malware evades detection. Having learned numerous de-obfuscation methods, LLMs can be used to analyze obfuscated code, determine the software’s true intent, and help restore the original logic of the malware.

LLMs can integrate multi-dimensional data for comprehensive analysis. Traditional detection methods such as NIDS (Network-based Intrusion Detection System) and HIDS (Host-based Intrusion Detection System) operate independently. LLMs can process data from both, analyzing system events and network traffic together to identify malware runtime behavior and extract features more comprehensively.
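As a rough illustration of combining the two data sources, the Python sketch below pairs host events with network events from the same machine that occur close together in time. The `Event` fields, the five-second window, and the sample events are all assumptions made for this example. A plain time-window join like this is only the preprocessing step; the article's point is that an LLM could then interpret the combined record.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    source: str       # "hids" or "nids" (hypothetical labels)
    host: str         # machine the event relates to
    timestamp: float  # seconds since some reference point
    detail: str       # free-text description

def correlate(events: list[Event], window: float = 5.0) -> list[tuple[Event, Event]]:
    """Pair each host event with network events on the same host within `window` seconds."""
    hids = [e for e in events if e.source == "hids"]
    nids = [e for e in events if e.source == "nids"]
    pairs = []
    for h in hids:
        for n in nids:
            if h.host == n.host and abs(h.timestamp - n.timestamp) <= window:
                pairs.append((h, n))
    return pairs

# Made-up sample events for illustration.
EVENTS = [
    Event("hids", "web-01", 100.0, "new process started from /tmp"),
    Event("nids", "web-01", 102.5, "outbound connection to unfamiliar address"),
    Event("nids", "db-01", 101.0, "outbound connection to unfamiliar address"),
    Event("hids", "web-01", 500.0, "configuration file modified"),
]

pairs = correlate(EVENTS)
```

Only the first two events are paired: they share a host and are 2.5 seconds apart. Seen separately, each might look routine; seen together, a new process followed by an unusual outbound connection is a more meaningful signal.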

Personal Information Protection

Phishing is a common tactic in which attackers deceive victims into entering sensitive information through websites and emails that closely imitate legitimate ones, leading to account theft and other malicious acts. LLMs can effectively identify phishing websites and emails, protecting user privacy.

PII (Personally Identifiable Information) detection is an essential part of privacy leak detection. Conventional detection methods are mostly based on regular expression matching or rules, which require manual maintenance and are prone to omissions and false positives. LLMs, with their powerful contextual understanding capabilities, can better determine whether information is PII in context. Additionally, LLMs can perform cross-language detection without the need to configure different rules for different languages.
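The false positives and omissions of rule-based matching are easy to reproduce. The Python sketch below uses two simplified regular expressions chosen for this example (an email pattern and a `ddd-ddd-dddd` phone pattern); production rule sets are larger, but they fail in the same two ways.

```python
import re

# Simplified, illustrative patterns; not a complete PII rule set.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) for every email or phone pattern hit."""
    hits = [("email", m.group()) for m in EMAIL.finditer(text)]
    hits += [("phone", m.group()) for m in PHONE.finditer(text)]
    return hits

# Works as intended: both items are real contact details.
clear_case = find_pii("Contact alice@example.com or 555-010-1234")

# False positive: a part number happens to have the phone format.
false_positive = find_pii("Replacement part number 123-456-7890 has shipped")

# Omission: the address is spelled out, so no pattern matches.
omission = find_pii("Reach me at alice at example dot com")
```

The second call flags a part number as a phone number, and the third call returns nothing even though a human reader immediately sees an email address. Both errors come from matching on form alone, without the surrounding context that tells a reader what the text actually refers to.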

The Malicious Use of LLMs

Although LLMs bring many security enhancements, their formidable capabilities have also been harnessed for malicious activities, leading to new security threats. Figure 2 shows the stages of an attack in which LLMs can be involved, indicating that they can play a role across many types of attack.

Figure 2 Taxonomy of Cyberattacks. The colored boxes represent attacks that have been demonstrated to be executable using LLMs.

Assisting in Attacks: Although LLMs cannot directly access operating systems or hardware, they can help attackers carry out attacks by analyzing information gathered from operating systems. Research has shown that LLMs can help automate privilege escalation attacks, assisting attackers in discovering system vulnerabilities and executing malicious operations. Once attackers supply system information, an LLM can analyze the vulnerabilities present on the system and propose feasible attack plans. LLMs may also be used to attack network infrastructure, simulating and deploying complex phishing and man-in-the-middle attacks.
Writing Malware: LLMs possess powerful programming capabilities that can aid in the generation of malware. Direct requests to generate malware are often blocked by the model's built-in security measures. However, by breaking the software down into separate functions and using simple prompts to generate each part of the code, a complete piece of malware, such as ransomware or a computer worm, can be assembled. To evade detection, LLMs can also be used to rewrite malware code. Code modified by LLMs may no longer have its original binary characteristics, making it harder for traditional antivirus software to detect. As the code generation capabilities of LLMs continue to improve, the risk of such malicious applications may grow further.
Attacks Targeting Users: The ability of LLMs to generate realistic text and to reason can also be maliciously exploited. The most common applications are social engineering attacks, such as phishing and misinformation. Attackers can use LLMs to analyze known information, infer victims’ private information, and generate highly realistic fake emails or messages that trick victims into revealing personal information or clicking on malicious links. LLMs can also be used to generate fake news or false information, further expanding the scope of information manipulation.

Summary and Outlook

LLMs have tremendous potential in the field of security, but they also bring new challenges. We need to make use of their positive applications while staying vigilant about potential malicious ones and adopting effective defense measures. Through continuous research and innovation, we can better harness LLMs and contribute to building a safer digital world.

Reference:

Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Zhibo Sun, Yue Zhang. A survey on large language model (LLM) security and privacy: The Good, The Bad, and The Ugly. High-Confidence Computing, Volume 4, Issue 2, 2024, 100211. ISSN 2667-2952.