Security Knowledge Graph | Drawing Knowledge Graph of Software Supply Chain and Strengthening Risk Analysis

October 5, 2022 | Adeline Zhang

The security knowledge graph, a knowledge graph specific to the security domain, is the key to realizing cognitive intelligence in cyber security. It also lays an indispensable technological foundation for dealing with advanced, continuous and complex threats and risks in cyberspace. NSFOCUS has published a series of articles on applying the security knowledge graph in several scenarios. This article introduces the application of knowledge graph technology to software supply chain security.

Rise and Challenges of Software Supply Chain Security

As software technology and development practices advance, third-party software products and open source components are routinely used in software development and integration. The security and reliability of software across its supply chain have therefore become critical issues for the software industry. Security incidents in the software supply chain have become frequent in recent years, and they share several characteristics. Compared with attacking a software product directly, attacking its supply chain is significantly easier and cheaper, while the scope of impact is generally wider. Because an attack can be concealed as it passes through multiple links of the supply chain, it is difficult for the security measures on existing computer systems to identify and handle it. Examples include the malicious code contamination of unofficial Xcode versions, the backdoor in the remote terminal management tool Xshell, and the deserialization vulnerability in the open source component Fastjson. Such incidents mainly involve contaminated development tools, contaminated source code, reserved backdoors, bundled downloads and software vulnerabilities.

We have been concerned about software supply chain security for a long time. The APT attack on the SolarWinds supply chain in 2020 is a typical software supply chain attack. In February 2021, U.S. President Biden signed an executive order to build a more secure supply chain for commodities, including information technology, which brought software supply chain security to public attention. Regulations and standards for software supply chain security in China are also steadily improving. Information Security Technology – Security Criterion on Supplier Conduct of Information Technology Products (GB/T 32921-2016) sets security requirements for suppliers. Information Security Technology – Security Requirements of Software Supply Chain, led by the China Information Technology Security Evaluation Center, is in preparation and sets security requirements for both parties. These developments show the importance of software supply chain security.

Compared with traditional software security, the software supply chain faces growing security risks, and the network security incidents caused by its compromise are becoming more severe. First, the attack surface of the software supply chain expands from the vulnerabilities of the software product itself to the vulnerabilities of the software, components and services of upstream suppliers. Any vulnerability in an upstream stage affects all the downstream software, so the attack surface of the entire software supply chain keeps expanding. Second, the expanded attack surface significantly lowers the difficulty for attackers. Once any link in the software supply chain is attacked or tampered with, a chain reaction of security risks is triggered, resulting in serious security hazards. In addition, the use of open source software keeps growing. According to Forrester Research, 80% to 90% of the code of application software comes from open source components. The security of open source components is therefore directly related to the security of information system infrastructure and has become an important factor in the growth of software supply chain security issues.

The security governance of the software supply chain can be conducted from the following perspectives:

1. Clarifying supply chain relationships: The suppliers, software, information, tools, services, and upstream and downstream delivery links involved in the software supply chain shall be comprehensively sorted out so that nothing in the supply chain process is omitted.

2. Reference relationship between systems and components: Security issues in software source code and tools shall be identified, with emphasis on the reference relationships between systems and components, and between components.

3. Risk management of upstream and downstream software vulnerabilities in the supply chain: If an upstream software vulnerability is introduced into the current link, the downstream users who consume its output, as well as the information flow, shall be tracked so that the scope of impact is knowable, controllable and traceable when problems occur.

4. Formulating an effective security protection scheme: Decisions or schemes shall be formulated for known security risks. These risks can be resolved through replacement, upgrade or repair, and security protection and blocking can also be applied to avoid them.

In actual enterprise security operations, considering software supply chain security from the above aspects alone is far from enough. Finer-grained security must also be considered across the whole software life cycle, including the Security Development Lifecycle (SDL), DevSecOps, secure coding, software supplier evaluation and risk management, and a mature emergency response mechanism.

Application Scheme Based on Knowledge Graph

Traditional Software Supply Chain Security System

In the narrow sense, software supply chain security solutions generally take open source software as the entry point, and track the application of open source software in systems and projects as the main content of software supply chain security. In this case, checking and managing open source components, and identifying the known vulnerabilities of open source components have become the main technical means of software supply chain security.

However, software supply chain security is by no means limited to open source component and vulnerability management. A traditional software supply chain security system is generally built around the Security Development Lifecycle (SDL), and every SDL stage contains system contents and technical elements that belong to software supply chain security. Figure 1 shows a simplified version of SDL-based software supply chain technical practice.

Figure 1 SDL Software Supply Chain Security System

It can be seen that in the SDL process, every step from software supplier evaluation to product security detection and protection is an important link in the software supply chain security solution. At each stage, dedicated technologies, products or methods are applied to ensure the comprehensiveness and security of the process.

The traditional software supply chain security system has two shortcomings. First, it lacks unified measurement standards or norms for evaluating the risks of the supply-demand relationship and the software quality in the supply chain. In current security operations, the results of each stage can be reflected through a SOC (Security Operation Center) or a security management platform, but these results are scattered and are not connected to reflect the overall picture. Second, an SDL solution cannot discover the specific associated risks in supply chain security; possible risk points can only be preset in advance for further security assessment.

To address these deficiencies, we make use of the correlation characteristics of the software supply chain and analyze its security risks based on a knowledge graph.

Knowledge Graph Construction of Software Supply Chain

Taking the current organization as the entry point, we collect the upstream and downstream dependencies of the software supply chain. These dependencies can include component references, purchased products, services used, installed software and downloaded applications, together with attributes such as the geographical location, open source community and credibility of suppliers.

Figure 2 shows the dependency between objects of different software in the knowledge graph of a typical software supply chain. It can be seen that many of these nodes and processes have potential security risks and are vulnerable to attacks.

Figure 2 Knowledge Graph of Software Supply Chain
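One possible way to hold such dependencies in memory is a small typed graph of (source, relation, target) edges. The sketch below is only an illustration under our own assumptions: the class name SupplyChainGraph, the node types and the relation labels ("installs", "references", "supplied_by") are hypothetical and are not taken from any NSFOCUS product or from Figure 2.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    relation: str
    target: str


class SupplyChainGraph:
    """Minimal in-memory graph; node types and relation names are assumptions."""

    def __init__(self):
        self.node_types = {}                 # node name -> type label
        self.out_edges = defaultdict(list)   # node name -> outgoing edges
        self.in_edges = defaultdict(list)    # node name -> incoming edges

    def add_node(self, name, node_type):
        self.node_types[name] = node_type

    def add_edge(self, source, relation, target):
        edge = Edge(source, relation, target)
        self.out_edges[source].append(edge)
        self.in_edges[target].append(edge)

    def neighbors(self, name, relation=None):
        """Targets reachable in one step, optionally filtered by relation."""
        return [
            edge.target
            for edge in self.out_edges.get(name, [])
            if relation is None or edge.relation == relation
        ]


# Hypothetical example data loosely following the entities named in the text.
graph = SupplyChainGraph()
graph.add_node("User A", "user")
graph.add_node("APP-I", "application")
graph.add_node("Component A", "open_source_component")
graph.add_node("Component B", "open_source_component")
graph.add_node("Supplier X", "supplier")

graph.add_edge("User A", "installs", "APP-I")
graph.add_edge("APP-I", "references", "Component A")
graph.add_edge("APP-I", "references", "Component B")
graph.add_edge("APP-I", "supplied_by", "Supplier X")

print(graph.neighbors("APP-I", "references"))
```

Keeping both outgoing and incoming edge indexes is a design choice that makes downstream lookups ("what does this depend on") and upstream lookups ("who depends on this") equally cheap. A production system would more likely use a graph database.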

Risk Identification

Risk identification on the security knowledge graph of the software supply chain relies on detecting inconsistencies. Query patterns can be defined to find entities in the knowledge graph that should be the same but differ, and entities that should differ but are the same. As shown in Figure 3, open source components A and B are supposed to be different, yet they point to the same MD5 hash value, so A and B may be at risk of fraudulent use. Conversely, if open source component A has two different MD5 attributes in the graph, component A may have been tampered with or impersonated by malicious software.

Figure 3 Risk Identification of Open Source Component
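The two inconsistency patterns can be sketched as a simple grouping over (component, MD5) observations. This is a minimal illustration, not the query mechanism of any specific graph database: the function name find_hash_inconsistencies, the hash strings and the extra component C are placeholders we introduce for the example.

```python
from collections import defaultdict


def find_hash_inconsistencies(observations):
    """Return (shared_hashes, conflicting_components).

    shared_hashes: hash -> components that should differ but share the hash.
    conflicting_components: component -> distinct hashes recorded for it.
    """
    hashes_by_component = defaultdict(set)
    components_by_hash = defaultdict(set)
    for component, md5 in observations:
        hashes_by_component[component].add(md5)
        components_by_hash[md5].add(component)

    shared_hashes = {
        md5: sorted(components)
        for md5, components in components_by_hash.items()
        if len(components) > 1
    }
    conflicting_components = {
        component: sorted(hashes)
        for component, hashes in hashes_by_component.items()
        if len(hashes) > 1
    }
    return shared_hashes, conflicting_components


# Placeholder data: the hash values are not real MD5 digests.
observations = [
    ("Component A", "hash-1"),
    ("Component B", "hash-1"),  # different components, same hash
    ("Component C", "hash-2"),
    ("Component C", "hash-3"),  # same component, two hashes
]

shared_hashes, conflicting_components = find_hash_inconsistencies(observations)
print(shared_hashes)
print(conflicting_components)
```

In this sketch, different versions of the same component would need distinct node names (for example, name plus version), otherwise legitimate version updates would be reported as conflicts.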

High-risk Software Recommendation

The software supply chain is characterized by long chains of reference relations. If a software entity is reported or found to have security risks, a multi-step correlation search on the knowledge graph can quickly identify the affected users and supplier entities, so that risk management and emergency response can follow. As shown in Figure 4, user A has APP-I installed, while APP-I references open source component A and open source component B. If a remote code execution vulnerability caused by CVE-2020-8840 injection exists in open source component A (FastJSON 1.2.67), user A can be identified as indirectly affected. The edges between nodes in the knowledge graph represent the reference, existence and dependency relationships between software and components well, which makes the graph fully applicable to multi-level dependency analysis in software supply chain security. Through in-depth search, the corresponding features can be extracted as the input of a risk assessment model.

Figure 4 High-risk Software Recommendation
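The multi-step correlation search described above can be approximated with a breadth-first traversal over reversed dependency edges. The sketch below assumes a plain dictionary of "depends on" relations; the function name affected_entities and the entities User B and APP-II are hypothetical additions for illustration.

```python
from collections import deque


def affected_entities(depends_on, vulnerable):
    """Map every entity that transitively depends on `vulnerable`
    to its distance in hops from the vulnerable node."""
    # Invert the edges so we can walk from a component to its dependents.
    dependents = {}
    for source, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, set()).add(source)

    hops = {}
    seen = {vulnerable}
    queue = deque([(vulnerable, 0)])
    while queue:
        node, distance = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                hops[parent] = distance + 1
                queue.append((parent, distance + 1))
    return hops


# "X depends on Y" edges; User B and APP-II are made up for the example.
depends_on = {
    "User A": ["APP-I"],
    "APP-I": ["Component A", "Component B"],
    "User B": ["APP-II"],
    "APP-II": ["Component B"],
}

impacted = affected_entities(depends_on, "Component A")
print(impacted)
```

The hop count is one example of a feature that could be fed into a risk assessment model, since entities closer to the vulnerable component are usually easier to confirm as affected.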

Based on graph patterns, high-risk open source software nodes can be identified, and the risk value of each node can be calculated from its number of references or its update cycle. As shown in Figure 4, open source component B is referenced by multiple apps. Once a security vulnerability is found in open source component B, the scope of impact is correspondingly wide, so open source component B is designated as a high-risk node. From the perspective of the open source community, components in active communities are updated in a timely manner and tend to be of high quality, while components whose version iteration, update and maintenance lag behind carry relatively higher security risks.
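As a rough illustration of how a risk value might combine the number of references with the update cycle, the sketch below uses a weighted sum. The formula, the 0.6 weight, the 365-day staleness threshold and the sample data are all assumptions made for this example; the article does not prescribe a specific scoring function.

```python
def risk_score(reference_count, days_since_update, max_references,
               stale_after_days=365, reference_weight=0.6):
    """Weighted sum of reach (share of references) and staleness, in [0, 1].

    The weighting scheme is an illustrative assumption, not a standard.
    """
    reach = reference_count / max_references if max_references else 0.0
    staleness = min(days_since_update / stale_after_days, 1.0)
    return reference_weight * reach + (1 - reference_weight) * staleness


# Hypothetical component data: who references each component and how
# long ago it was last updated.
components = {
    "Component A": {"referenced_by": ["APP-I"], "days_since_update": 30},
    "Component B": {
        "referenced_by": ["APP-I", "APP-II", "APP-III"],
        "days_since_update": 400,
    },
}

max_refs = max(len(c["referenced_by"]) for c in components.values())
scores = {
    name: risk_score(len(c["referenced_by"]), c["days_since_update"], max_refs)
    for name, c in components.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With this data, component B ranks first because it is both widely referenced and stale. In practice the weights would need to be calibrated against real incident data.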

Analysis of Impact Scope

Based on all the historical information stored in the security knowledge graph, potential threats are modeled, including development tool pollution, reserved backdoors, source code pollution, bundled downloads and upgrade hijacking. The impact scope of an identifiable threat can be analyzed directly on the graph. As shown in Figure 5, the remote code execution vulnerability CVE-2017-11882 exists in Microsoft Office 2007, 2013 and 2016, and it was exploited by the SideWinder group in an APT attack. Through the reference relationships of the software in the knowledge graph, the affected terminal PCs in the environment can be identified.

Figure 5 Analysis of the Impact Scope of CVE Vulnerability Exploitation

Mitigation Measures Decision Making

Attacks and risks are combined into <risk value, influence> pairs, which are used to construct the risk graph of the software supply chain. Based on the risk graph and on the vulnerability solutions and protection strategies collected from the industry, corresponding security management measures can be formulated to mitigate the impact of attacks on the software supply chain. In practice, mitigation measures can be subject to constraints (for example, a specified cost or a specified duration), so that only measures within the constraints are implemented and measures that do not meet the constraints are excluded up front from further consideration.
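The constraint-based filtering of mitigation measures can be sketched as follows. The Mitigation fields, the units of cost and the idea of ordering feasible measures by an estimated risk reduction are assumptions for illustration only; the constraints here are applied to each measure individually rather than to a total budget.

```python
from dataclasses import dataclass


@dataclass
class Mitigation:
    name: str
    cost: float            # assumed unit: person-days
    duration_days: int
    risk_reduction: float  # assumed estimate in [0, 1]


def select_mitigations(candidates, max_cost, max_duration_days):
    """Drop measures that violate a constraint, then rank the rest
    by estimated risk reduction (highest first)."""
    feasible = [
        m for m in candidates
        if m.cost <= max_cost and m.duration_days <= max_duration_days
    ]
    return sorted(feasible, key=lambda m: m.risk_reduction, reverse=True)


# Hypothetical candidate measures for a vulnerable component.
candidates = [
    Mitigation("Upgrade component A", cost=2.0, duration_days=3,
               risk_reduction=0.9),
    Mitigation("Replace component A", cost=10.0, duration_days=30,
               risk_reduction=1.0),
    Mitigation("Block exploit traffic", cost=1.0, duration_days=1,
               risk_reduction=0.5),
]

plan = select_mitigations(candidates, max_cost=5.0, max_duration_days=7)
print([m.name for m in plan])
```

Here the replacement option is excluded before ranking because it exceeds both the cost and the duration limits, which mirrors the idea of not putting out-of-constraint measures forward at all.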

Conclusion

A knowledge graph represents the dependency and usage relationships in software supply chain security well, and its structure makes it possible to detect risks directly. In the complex setting of software supply chain security, which has both breadth and depth, adopting a knowledge graph is a forward-looking and creative approach. The key risk nodes in software supply chain security are the relevant entities of the supply chain, including suppliers, components, code, tools, communities and application development platforms. In current applications and research, collecting these key nodes comprehensively and expressing their semantics accurately remain important challenges that deserve further study.
