The security knowledge graph, a knowledge graph specific to the security domain, is key to realizing cognitive intelligence in cybersecurity, and it lays an indispensable technological foundation for dealing with advanced, persistent, and complex threats and risks in cyberspace. NSFOCUS will publish a series of articles about the application of the security knowledge graph in several scenarios. This article focuses on its application in cyberspace mapping.
Figure Out Assets and Follow the Trend
Cyberspace contains a seemingly endless variety of assets. Assets and services exposed on the Internet, including hardware equipment, network equipment, security equipment, industrial control equipment, software, operating systems, portals, and applications, have become prime targets for attackers. Targeted network protection can be implemented only after the basic information and current status of these assets are understood, so research on cyberspace mapping is becoming increasingly important. In recent years, cyberspace mapping has become an interdisciplinary frontier that integrates network communication technology, cyberspace security, and geography[1]. It models and expresses the attributes of numerous, complex cyberspace resources and the relationships among them, and draws a “holographic map” of global cyberspace information that reflects changes in the state of cyberspace resources, cyberattack behaviors, and so on.
Because the relationships between cyberspace resources are complex and dynamic, knowledge graph technology can use graph analysis to readily determine resource exposure, sensitive data leakage, threat intelligence, and similar conditions, and so track the attack process and attack results in real time. A cyberspace mapping knowledge graph makes it possible to discover threats and risks, formulate preventive strategies in advance, and take effective security measures, achieving “tailor-made” security protection. Cyberspace holds many kinds of resources, such as network infrastructure, cloud platforms, industrial control systems, and IoT resources. A comprehensive security ontology model therefore needs to be built on expert experience to fully express cyberspace resource information. On top of it, a complete knowledge graph for cyberspace mapping needs to be built. Using the semantic search, query, and reasoning features of the graph, it can support tasks such as intelligent Q&A and knowledge reasoning and keep track of cyberspace assets and their status. At present, the security knowledge graph is mainly used in cyberspace mapping for cyberspace asset risk analysis and group relationship analysis.
Know Yourself and the Enemy and Take Appropriate Measures
Cyberspace Asset Risk Analysis
Cyberspace security protection is more than building a more reliable firewall. The degree of control over assets, asset vulnerabilities, user information, and IT architecture information often determines the upper limit of cyberspace defense capabilities. At present, asset management solutions are far from mature. In particular, with the rapid growth of cloud computing, IoT, and the mobile Internet, assets have sharply increased in number and variety, and their vulnerabilities are increasingly exposed. “Knowing yourself” is more critical than “knowing the enemy”. Both assets exposed to the public network and unmanaged “black assets” inside the network boundary greatly increase the risk to cyberspace security protection.
Cyberspace asset risk analysis mainly assesses the risk situation of assets. Based on active fingerprint detection and passive information collection, it analyzes the status of known assets and discovers unknown assets, and it evaluates vulnerable high-risk hosts, hosts with malicious behaviors, closely watched IP addresses and domain names, domain name filing, and so on. The results are integrated into entities such as cyberspace assets, identities, and data, together with their feature information. These are then fed into the security knowledge graph to form an overall portrait of cyberspace assets and local portraits of individual entities, supporting comprehensive and in-depth analysis of cyberspace asset risks.
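As a rough illustration of how active detection and passive collection can be combined to surface unknown assets, the minimal sketch below merges both kinds of observations by IP address and compares the result with a managed inventory. The record layout, the function names, and the sample addresses are all assumptions made for this example, not part of any NSFOCUS product.

```python
from dataclasses import dataclass, field


@dataclass
class AssetRecord:
    """Hypothetical merged view of one observed asset."""
    ip: str
    sources: set = field(default_factory=set)   # collectors that saw the asset
    services: set = field(default_factory=set)  # observed services, e.g. "443/tcp"


def merge_observations(active, passive):
    """Merge active-probe and passive-collection results, keyed by IP address."""
    merged = {}
    for source_name, observations in (("active", active), ("passive", passive)):
        for ip, service in observations:
            record = merged.setdefault(ip, AssetRecord(ip))
            record.sources.add(source_name)
            record.services.add(service)
    return merged


def find_unmanaged(merged, inventory):
    """Return observed IPs that are missing from the managed inventory."""
    return sorted(ip for ip in merged if ip not in inventory)


# Illustrative data only.
active = [("10.0.0.5", "22/tcp"), ("10.0.0.7", "443/tcp")]
passive = [("10.0.0.7", "443/tcp"), ("10.0.0.9", "3306/tcp")]
inventory = {"10.0.0.5", "10.0.0.7"}

merged = merge_observations(active, passive)
unmanaged = find_unmanaged(merged, inventory)
print(unmanaged)
```

In this sample, the address seen only in passive traffic and absent from the inventory is the candidate “black asset” that would be added to the graph for further analysis.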
Cyberspace asset risk analysis based on the knowledge graph needs to consider the various entities in cyberspace, their attributes (basic information, vulnerability, compliance information, etc.), and the relationships among them. Constructing the asset data graph requires the support of tools and services such as asset management, vulnerability management, and risk assessment, as well as business data such as enterprise organizational information, IT system architecture information, and human resource information, which help enrich environmental entities and establish relationships. Cauldron[2] uses network security attributes to build network capture components, identifies the affected software and hardware through the vulnerability database, integrates data from vulnerability scanning tools, firewalls, and other sources, and analyzes the network connections of vulnerable host services, as shown in Figure 1.
Figure 1 Cauldron topology vulnerability analysis[2]
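To make the idea of entities, attributes, and relationships more concrete, here is a minimal sketch of an asset graph stored as (subject, relation, object) triples, with a breadth-first walk that reports the vulnerable hosts reachable from an exposed entry point. This is not how Cauldron is implemented. The relation names, host names, software names, and vulnerability identifiers are all hypothetical placeholders.

```python
from collections import defaultdict, deque

# (subject, relation, object) triples; every name here is illustrative.
TRIPLES = [
    ("web-01", "runs", "httpd-x"),
    ("db-01", "runs", "sqldb-y"),
    ("httpd-x", "affected_by", "VULN-A"),
    ("sqldb-y", "affected_by", "VULN-B"),
    ("internet", "connects_to", "web-01"),
    ("web-01", "connects_to", "db-01"),
    ("db-01", "connects_to", "backup-01"),
]


def index_by_relation(triples):
    """Build relation -> subject -> set of objects for fast lookups."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[relation][subject].add(obj)
    return index


def vulnerabilities_of(host, index):
    """Collect vulnerabilities of all software that a host runs."""
    found = set()
    for software in index["runs"].get(host, ()):
        found |= index["affected_by"].get(software, set())
    return found


def reachable_vulnerable_hosts(start, index):
    """Walk connects_to edges breadth-first and report vulnerable hosts reached."""
    seen, queue, result = {start}, deque([start]), {}
    while queue:
        node = queue.popleft()
        for neighbor in index["connects_to"].get(node, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            queue.append(neighbor)
            vulns = vulnerabilities_of(neighbor, index)
            if vulns:
                result[neighbor] = sorted(vulns)
    return result


index = index_by_relation(TRIPLES)
exposed = reachable_vulnerable_hosts("internet", index)
print(exposed)
```

A production graph would carry far richer attributes and would normally live in a graph database, but the same pattern of joining connectivity with vulnerability data applies.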
Cyberspace asset risk analysis based on the knowledge graph requires real-time monitoring and analysis of changes in asset risks. The key to the analysis is ensuring coverage of entity instances and accurate dynamic portraits. Recalling known types of entities involves matching feature fingerprints and behavior patterns and quickly recalling instances of the listed entity types. Classifying unknown types of entities requires methods such as unsupervised or semi-supervised clustering of features and behaviors, information flow or structural association analysis, and statistical frequent item mining to identify patterns in unknown entity data and to find similarities and associations with known types of entities. It is therefore necessary to carry out ontological modeling of assets, threat intelligence, relationships, vulnerabilities, knowledge bases, security behavior events, etc., and to fuse the data through association and matching to form a more complete asset-based security knowledge graph, which lays a foundation for downstream tasks and helps deal with ever-changing cyberspace problems.
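The two-stage idea above, recall by fingerprint first and similarity to known types second, can be sketched as follows. Note that this example swaps in a simple Jaccard similarity in place of the clustering and frequent item mining methods mentioned in the text. The fingerprint table, feature strings, and threshold are assumptions chosen only for illustration.

```python
# Hypothetical fingerprints: entity type -> set of characteristic features.
KNOWN_FINGERPRINTS = {
    "ip_camera": {"554/tcp", "80/tcp", "banner:rtsp"},
    "plc": {"502/tcp", "banner:modbus"},
    "web_server": {"80/tcp", "443/tcp", "banner:http"},
}


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(features, fingerprints=KNOWN_FINGERPRINTS, threshold=0.5):
    """Return (status, entity_type, score) for one observed feature set."""
    # Stage 1: recall a known type when its whole fingerprint is present.
    exact = sorted(name for name, fp in fingerprints.items() if fp <= features)
    if exact:
        return ("known", exact[0], jaccard(features, fingerprints[exact[0]]))
    # Stage 2: otherwise look for the most similar known type.
    best_name, best_score = max(
        ((name, jaccard(features, fp)) for name, fp in fingerprints.items()),
        key=lambda item: item[1],
    )
    if best_score >= threshold:
        return ("similar", best_name, best_score)
    return ("unknown", None, best_score)


plc_like = classify({"502/tcp", "banner:modbus", "102/tcp"})
camera_like = classify({"554/tcp", "banner:rtsp", "8000/tcp"})
unseen = classify({"9999/tcp"})
print(plc_like[:2], camera_like[:2], unseen[:2])
```

Entities that end up as "unknown" are the ones that would be passed on to clustering or association analysis, so that new entity types can be proposed and added to the ontology.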
Group Relationship Analysis
The development of the Internet and the emergence of various social platforms have brought human social activities from the physical world into the virtual cyber world. In cyberspace, users tend to carry out activities under nicknames or virtual identities, which makes their real identities difficult to determine. Criminals exploit this virtuality to commit crimes with forged identities, causing serious harm to society. Interest-driven attack groups and attack sources under relatively stable group control can carry out refined attacks on designated targets. Mining massive data through knowledge graph association analysis, uncovering implicit relationships in cyberspace data with AI algorithms, analyzing group behaviors effectively and promptly, and locating and identifying attack groups are therefore of great significance for maintaining cyberspace security.
The discovery of group behaviors is a typical application scenario of the knowledge graph in cyberspace mapping. Attack groups have often formed closely collaborative networks that seriously threaten cyberspace security. The key to discovering attack groups is the association graph generated from cyberspace data and the discovery of communities on that graph. Common community discovery techniques are based on modularity optimization, spectral analysis, information theory, label propagation, or deep learning. Data-driven attack group discovery is a technique for enhancing intelligence or behavioral data: it recalls suspected groups and describes their behavior patterns based on the structural and feature associations of dynamic intelligence data. It helps strengthen the evidence chain of attack events and raise intelligence confidence[3]. Currently, the main challenge in group relationship analysis is that traditional graph mining methods struggle to extract global features from graph models and therefore overlook potential association relationships.
With the continuous development of deep learning, embedding methods that can quickly process data at the 100-million scale have been proposed. Graph embedding technology is a bridge connecting the knowledge graph and deep learning, so effective graph embedding techniques[4][5] are needed to improve the performance of group analysis. Graph embedding allows global analysis of the entire graph model, in particular to discover potential anomalies that cannot be found from the local associations of groups. It provides a global perspective that gives clearer insight into the potential associations between different entities. As shown in Figure 2, a data graph of authentication and access event behaviors is constructed, and the Louvain community discovery algorithm is applied to the association graph to identify the cyber communities of entities such as users and service devices at the graph structure level[6]. The key access paths that cross group communities can then be identified and located to assist in dynamic strategy deployment.
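Modularity, the quantity that modularity-based community discovery tries to maximize, can be computed in a few lines. The sketch below scores candidate groupings of a small undirected association graph using the standard definition Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of node degrees in c. The graph and node names are invented for illustration, and the example only scores given partitions rather than searching for the best one.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)        # edges with both endpoints in the community
    degree_sum = defaultdict(int)   # sum of node degrees per community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            intra[community_of[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two tightly linked triangles joined by a single edge (illustrative data).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}
scattered = {"a": 0, "d": 0, "b": 1, "e": 1, "c": 2, "f": 2}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
q_scattered = modularity(edges, scattered)
print(round(q_two, 4), round(q_one, 4), round(q_scattered, 4))
```

The partition that keeps each triangle together scores highest, which matches the intuition that a group is a set of entities more densely connected to each other than to the rest of the graph.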
Figure 2 Community discovery of authentication and access behaviors[6]
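Once communities have been assigned, locating the access paths that cross community boundaries is a simple pass over the access events. The sketch below assumes the community labels are already available, for example as the output of a Louvain run, which is not implemented here. The event format, entity names, and labels are hypothetical.

```python
from collections import Counter


def cross_community_paths(events, community_of):
    """Count (user, device) accesses whose endpoints sit in different communities."""
    counts = Counter()
    for user, device in events:
        if community_of[user] != community_of[device]:
            counts[(user, device)] += 1
    # Sort by descending count, then by name, so the ranking is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative authentication and access events: (user, device).
events = [
    ("alice", "hr-db"), ("alice", "hr-db"),
    ("bob", "build-srv"),
    ("alice", "build-srv"), ("alice", "build-srv"), ("alice", "build-srv"),
    ("bob", "hr-db"),
]
# Assumed community labels produced by an earlier community discovery step.
community_of = {"alice": 0, "hr-db": 0, "bob": 1, "build-srv": 1}

ranked = cross_community_paths(events, community_of)
print(ranked)
```

The highest-ranked pairs are candidates for closer review or tighter access policy, since they connect groups that otherwise interact little.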
Conclusion
In the field of cyberspace mapping, the fragmentation of cyberspace data and the rapid change of resources and information make mapping recognition costly, which poses considerable challenges to the governance of cyberspace security. To deal with pervasive threats, it is necessary to discover the key entities and key relationships for cybersecurity protection and to comprehensively evaluate the potential scope and depth of a threat's impact, both before and after the threat event occurs, so that vulnerable surfaces are accurately identified. Cyberspace mapping based on the security knowledge graph needs to analyze whether there are vulnerable high-risk hosts, domain names that cannot be accessed normally, counterfeit malicious websites, or various types of sensitive information leaked to various libraries. In addition, historical logs from multiple sources such as terminals, web, vulnerabilities, and threat intelligence need to be integrated, based on dynamic behaviors and asset environments, and the attacker’s behavior data needs to be backtracked, refined, and reconstructed for aggregation, profiling, and multi-dimensional assessment. The exposure of numerous and complex information assets on the Internet, sensitive data leakage, asset filing information, threat intelligence, changes in common ports or services, and so on also need to be fully understood. Finally, the attack process and attack results need to be tracked in real time, so that possible risks can be predicted in advance and relevant protective measures can be taken.
References
[1] Chundong G, Qiquan G, Dong J, et al. Theoretical Basis and Technical Methods of Cyberspace Geography [J]. Geographical Research, 2019, 74(9): 5-18.
[2] Jajodia, S., et al., Cauldron mission-centric cyber situational awareness with defense in depth. IEEE, 2011.
[3] http://blog.nsfocus.net/wp-content/uploads/2020/12/AISecOps_White_Paper_NSFOCUS_20201218.pdf.
[4] Cavallari, S., et al. Learning Community Embedding with Community Detection and Node Embedding on Graphs. in the 2017 ACM. 2017.
[5] Rozemberczki, B., et al., GEMSEC: Graph Embedding with Self Clustering. 2018.
Posts about Security Knowledge Graph:
- Security Knowledge Graph | Technologies and Applications of the Security Knowledge Graph
- Security Knowledge Graph | Build an APT Group Graph to Avoid the Information Island Effect
- Security Knowledge Graph | APT Group Profiling and Attribution