Literally, AISecOps is composed of three core technologies, i.e. AIOps, AISec, and SecOps.
The fusion of these technologies, enabled by AISec, brings new expectations to the industry. Both the security of AI itself and AI-based security applications have become hot topics in academia and industry. AI has been applied successfully in a number of single-point security technologies and specific scenarios, such as malware classification, malicious traffic identification, and intrusion detection.
AIOps (intelligent IT operations) is a research focus across the Internet and intelligent computing fields[i]. It focuses on anomaly detection, root cause localization, alert analysis and diagnosis, and other critical technologies in complex IT system environments. Unlike SecOps, AIOps lacks systematic modeling of core risk factors, such as network threats, vulnerabilities, and assets. As a result, experience with AIOps technologies cannot be applied directly to SecOps scenarios. Serving as both an application scenario and an objective, SecOps mainly consists of three core elements: process, person, and technique. Here, we focus on the technique element. In traditional security operations, technical capabilities are provided by security experts, including alert classification and grading, threat hunting, sample analysis, and threat traceback. However, the operations capacity of security experts falls short of what is required to respond to rapidly expanding protection requirements. The severe talent shortage and the resulting bottleneck are increasingly apparent. Thus, it is pressing to explore the AISecOps solution.
This post summarizes the core connotations of AISecOps to make clear how this technique is implemented and how it evolves:
“Geared towards security operations goals and built on the integration of personnel, processes, techniques, and data, AISecOps serves security risk control and the key defense phases, including prevention, detection, response, prediction, recovery, and other critical links in cybersecurity risk control and attack-defense confrontation. By establishing a data-driven, highly automated, and trustworthy intelligent security technology stack, it provides perception, cognition, and action capabilities and can even replace manual security operations services in a dynamic environment.”
Unlike single-point integration of intelligent technologies and the security field during AISec practices, AISecOps, in alignment with core operations indicators, implements systematic, in-depth, and multidimensional intelligent technique solutions to adapt to different security operations phases and scenarios. This imposes new requirements for the robustness, credibility, and security of AI technologies.
As mentioned above, security operations goals guide the development of technical capabilities. Considering critical security operations requirements, here we present the AISecOps indicator hierarchy from top to bottom: vision, operational indicators, and technical indicators. Technical indicators can be further divided into data indicators and analysis indicators. Security operations goals give direction and guidance to cybersecurity operations capabilities, and the AISecOps indicator hierarchy is used to assess the effectiveness of technical implementations. At the top of the hierarchy is the security operations vision of the enterprise, organization, or country. Under the guidance of the vision, security operations indicators are developed; these sit in the middle of the hierarchy. At the bottom of the hierarchy, these indicators are broken down into data indicators and analysis indicators.
The vision refers to core goals for security, services, and business of enterprises, organizations, and countries, such as maintaining stable operation of IT infrastructure, protecting core data assets, and ensuring the security of the brand value. The vision is inseparable from the development goals of subjects.
Consistent with the security operations vision, operational indicators are developed to assess security operations capabilities. The data fusion capability and data analysis capability are evaluated to promote the iteration of technical capabilities.
In terms of data, we need to consider indicators such as coverage ratio, standardization, storage timeliness, diversity, and interaction. In terms of analysis, in addition to assessment indicators for the techniques themselves (such as machine learning), like prediction accuracy, recall rate, and ROC, we should focus on the scenario coverage ratio, top-N recall rate/false positive rate, overall/single-point false positive rate, model interpretability, and other indicators of operability and ease of operations, with a view to promoting deep integration between techniques, personnel, and processes.
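To make the analysis indicators above concrete, here is a minimal sketch of how recall, false positive rate, and a top-N recall could be computed for a binary alert classifier. The function names and the exact definition of top-N recall (share of all true threats that appear among the N highest-scored alerts) are assumptions for illustration; the post does not define these formulas.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 means 'malicious'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of real threats that the model flagged."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of benign samples that the model flagged by mistake."""
    _, fp, _, tn = confusion_counts(y_true, y_pred)
    return fp / (fp + tn) if (fp + tn) else 0.0


def top_n_recall(y_true: Sequence[int], scores: Sequence[float], n: int) -> float:
    """Assumed definition: share of all real threats found in the N highest-scored alerts."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    hits = sum(y_true[i] for i in ranked)
    total = sum(y_true)
    return hits / total if total else 0.0


# Toy data: 1 = real threat, 0 = benign.
labels = [1, 0, 1, 0, 0, 1]
predictions = [1, 1, 0, 0, 0, 1]
alert_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.7]

print(recall(labels, predictions))
print(false_positive_rate(labels, predictions))
print(top_n_recall(labels, alert_scores, n=3))
```

A top-N view matters operationally because analysts usually triage only the highest-ranked alerts, so a model with good overall recall can still perform poorly within the slice that people actually review.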
Currently, access to massive multidimensional security big data opens a new door to discovering and handling network security threats through data analysis. Given the limited resources available for storage and computing, it is especially important to identify security data sources and manage them in a unified way. With the aim of protecting assets and cracking down on threat actors in the given cyberspace, intelligent data analysis should focus on collecting data for and developing the following data graphs. This contrasts sharply with DIKW’s[ii] hierarchical data model and CyGraph’s[iii] cybersecurity/mission knowledge stack:
- Environmental data graph: presenting assets, vulnerabilities in assets, files, users, and the IT system architecture.
- Behavioral data graph: including network-side alerts, device-side alerts, file analysis logs, application logs, honeypot logs, and sandbox logs.
- Intelligence data graph: various types of external threat intelligence.
- Knowledge data graph: various types of knowledge bases (such as ATT&CK[iv], CAPEC[v], and CWE[vi]).
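As a rough illustration of how the four data graphs might be linked in practice, the sketch below stores nodes tagged with the graph they belong to and retrieves the cross-graph context of an alert. The class names, node identifiers, and sample records are hypothetical; only the four graph types come from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set


class GraphKind(Enum):
    ENVIRONMENTAL = "environmental"
    BEHAVIORAL = "behavioral"
    INTELLIGENCE = "intelligence"
    KNOWLEDGE = "knowledge"


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: GraphKind
    label: str


class SecurityGraph:
    """Toy in-memory graph; a real deployment would likely use a graph database."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: Dict[str, Set[str]] = defaultdict(set)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def link(self, a: str, b: str) -> None:
        # Undirected edge between two previously added nodes.
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both endpoints must be added before linking")
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors_by_kind(self, node_id: str) -> Dict[GraphKind, List[str]]:
        # Group direct neighbors by the data graph they come from.
        grouped: Dict[GraphKind, List[str]] = defaultdict(list)
        for other in sorted(self.edges[node_id]):
            grouped[self.nodes[other].kind].append(other)
        return dict(grouped)


graph = SecurityGraph()
graph.add_node(Node("alert-1", GraphKind.BEHAVIORAL, "network-side alert: repeated failed logins"))
graph.add_node(Node("host-a", GraphKind.ENVIRONMENTAL, "asset: internal server"))
graph.add_node(Node("intel-1", GraphKind.INTELLIGENCE, "external report on the source address"))
graph.add_node(Node("T1110", GraphKind.KNOWLEDGE, "ATT&CK technique: Brute Force"))

graph.link("alert-1", "host-a")
graph.link("alert-1", "intel-1")
graph.link("alert-1", "T1110")

context = graph.neighbors_by_kind("alert-1")
for kind, ids in context.items():
    print(kind.value, ids)
```

Keeping the graph kind on each node lets a single alert be enriched with environment, intelligence, and knowledge context in one lookup.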
With reference to the classical paradigm of AI (perception – cognition – decision-making – action) and the classical version of the OODA loop model (Observe – Orient – Decide – Act)[vii], this framework divides the security operations process into several stages, each of which involves different child tasks. The following details each stage and its related child tasks.
- Perception: data fusion and information tagging, including identification and detection child tasks. Identification child tasks categorize, deduplicate, and standardize entities (assets, signatures, vulnerabilities, etc.) and their behaviors in massive data to promote fusion of multisource heterogeneous data. Detection child tasks capture and tag anomalies, vulnerabilities, threat signatures, and other critical dynamic and static information from the massive data pool to provide critical clues for threat analysis, hunting, and risk analysis.
- Cognition: retrieval and building of clues and event context information through child tasks for association, traceback, and prediction. Association child tasks provide an exhaustive connected view of information by integrating various types of multidimensional information that spans a long time period. Traceback child tasks, through traceback and root cause analysis, identify and ascertain event sources and determine the causal relationships and dependencies between events. Based on the current information context, prediction child tasks rely on path prediction and trend analysis to predict potential attacks and high-risk vulnerabilities, so as to get ahead of attacks by rapidly identifying attack intentions and adopting appropriate protection methods.
- Decision-making: generation of assessment and creation child tasks through risk assessment in accordance with the predefined goals. In alignment with the core operations indicators, assessment child tasks, based on critical information such as behavior, environment, and knowledge, provide the ongoing overall situation and network risk level, informing optimal risk reports under a given operating cost. Based on dynamic environments and behaviors, creation child tasks adaptively choose and generate effective risk-informed action plans and policies to clarify specific action steps.
- Action: accomplishing action goals through collaboration of action units, in accordance with plans, policies, and steps. This phase involves child tasks for response and feedback. Response child tasks involve policy dispatch, device deployment, patch update, error tolerance, and other risk response actions by platforms, modules, or devices, or via instruction sets. Feedback child tasks continuously collect response action execution results to generate feedback reports that aggregate data for interaction of multiple operating elements (process, person, and technique), informing subsequent automated tasks.
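The four stages and their child tasks can be sketched as an ordered pipeline. The snippet below is only an illustration under simple assumptions: the handler functions, the event fields, and the failed-login threshold are hypothetical, and only the stage and child task names are taken from the list above.

```python
from typing import Callable, Dict, List, Tuple

# Stage order and child tasks as described in the framework.
STAGES: List[Tuple[str, List[str]]] = [
    ("perception", ["identification", "detection"]),
    ("cognition", ["association", "traceback", "prediction"]),
    ("decision-making", ["assessment", "creation"]),
    ("action", ["response", "feedback"]),
]

Handler = Callable[[dict], dict]


def run_cycle(handlers: Dict[str, Handler], context: dict) -> dict:
    """Run registered child task handlers in stage order and record what ran."""
    state = dict(context)
    trace: List[str] = []
    for stage, tasks in STAGES:
        for task in tasks:
            handler = handlers.get(task)
            if handler is None:
                continue  # child task not automated in this sketch
            state = handler(state)
            trace.append(f"{stage}/{task}")
    state["trace"] = trace
    return state


def detect(state: dict) -> dict:
    # Hypothetical rule: flag sources with five or more failed logins.
    flagged = [e for e in state["events"] if e["failed_logins"] >= 5]
    return {**state, "anomalies": flagged}


def assess(state: dict) -> dict:
    return {**state, "risk": "high" if state["anomalies"] else "low"}


def respond(state: dict) -> dict:
    actions = [f"block {e['src']}" for e in state["anomalies"]] if state["risk"] == "high" else []
    return {**state, "actions": actions}


result = run_cycle(
    {"response": respond, "detection": detect, "assessment": assess},
    {"events": [
        {"src": "10.0.0.5", "failed_logins": 7},
        {"src": "10.0.0.9", "failed_logins": 1},
    ]},
)
print(result["trace"])
print(result["actions"])
```

Because the order comes from `STAGES` rather than from the handler dictionary, child tasks can be automated one at a time without changing the flow of the loop.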
The preceding stages and their related child tasks are the critical capabilities that enable cybersecurity operations to evolve towards higher automation. Overall, the technical framework of AISecOps contains two major loops. One is the machine self-loop, shown in the area enclosed by solid lines in the figure, which is the ultimate goal pursued by AISecOps: automation of critical operations tasks. The other is the human-in-the-loop (HITL), shown in the area surrounded by dotted lines, which highlights human interaction during each key phase of operations automation and the manual acquisition of data feedback from machines. The key to high-level operations automation still lies in hierarchical analysis and mining of “data-information-knowledge” in response to dynamic cyberspace environments and highly interactive combat between attackers and defenders. Therefore, security task automation can be achieved only by strengthening the ability to process cyberspace data at each of these levels. The current intelligence level of threat identification, traceback, prediction, and other critical technical capabilities can hardly give full support to SOAR-based precise responses. Various technical bottlenecks, such as false positives, mistaken connection blocking, and black-box decision-making, make it hard to achieve more highly automated intelligence in high-risk security scenarios that involve critical decision-making. Thus, fully integrated human-machine intelligence is especially critical at the current stage.
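One simple way to picture the two loops is a gate that decides whether a proposed response stays in the machine self-loop or is handed to an analyst. This is a sketch under assumed rules: the `ProposedAction` fields, the 0.95 threshold, and the routing policy are hypothetical and not taken from the AISecOps framework itself.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str
    confidence: float  # model confidence in [0, 1] (assumed to be available)
    high_impact: bool  # e.g. blocking a connection used by production traffic


def route(action: ProposedAction, auto_threshold: float = 0.95) -> str:
    """Return 'auto_execute' for the machine self-loop, 'human_review' for HITL."""
    # High-impact or low-confidence actions go to an analyst, reflecting the
    # risk of false positives and mistaken connection blocking.
    if action.high_impact or action.confidence < auto_threshold:
        return "human_review"
    return "auto_execute"


print(route(ProposedAction("tag alert as duplicate", 0.99, high_impact=False)))
print(route(ProposedAction("block outbound connection", 0.99, high_impact=True)))
print(route(ProposedAction("tag alert as duplicate", 0.60, high_impact=False)))
```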
Technology Readiness Levels
Because traditional intelligent security practices hardly match the urgent needs of security operations, NSFOCUS proposes multi-stage AISecOps Technology Readiness Levels (TRLs), i.e. a method of establishing a matrix of automation capabilities. This method provides uniform semantics for locating, both horizontally and vertically, the development level, status quo, application scope, and application depth of relevant technologies.
With reference to the automation levels of self-driving[viii], we came up with a taxonomy of automation levels (from no automation to full automation) to measure the ability to automate key security operations tasks. The main parts of security operations are conceptually divided according to the classical AI paradigm “perception – cognition – decision-making – action”. The whole process corresponds to the OODA loop model consisting of four elements: Observe – Orient – Decide – Act. The perception layer features identification (such as entity identification and classification) and detection (such as threat detection) tasks; the cognition layer involves association (such as analysis of integrated multisource data), traceback (tracing back attack paths), and prediction (predicting attack behaviors) tasks; at the decision-making layer, assessment (such as comprehensive risk assessment) and creation (such as policy and scheme generation) tasks are performed; at the action layer, response (such as policy deployment) and feedback (such as active reporting) tasks are executed. Whether the tasks at each layer are effective depends on the maturity of the layer above it. The following briefly describes the different levels of AISecOps automation capabilities:
- L0 (no automation): All security operations tasks are completed manually. AI and other analysis technologies can provide a certain level of identification and detection capabilities, but these amount to high-level data collection capabilities and do not take part in any security operations tasks.
- L1 (operations auxiliary): In line with security operations indicators, the automated operations system participates in some child tasks for environmental perception, information processing cognition, and risk assessment. At this automation level, the system provides routine data analysis as an auxiliary means, instead of performing any child tasks of automated action.
- L2 (partial automation): In certain single environments, the automated operations system takes part in child tasks throughout the security operations process and continuously exchanges data and knowledge with operations personnel.
- L3 (conditional automation): Under all task scenarios, the automated operations system completes all child tasks, including those in the action stage. Manual responses and system takeover are required at critical stages.
- L4 (high automation): Under restricted complex scenarios, the automated operations system performs security operations in a fully automated way in accordance with predefined operational indicators, without manual intervention.
- L5 (full automation): Under any complex scenario, the automated operations system performs security operations in a fully automated way in accordance with predefined operational indicators, without manual intervention.
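The levels above can be captured in a small lookup structure, for example to label tools or playbooks in an inventory. This is a minimal sketch: the enum member names and helper function are hypothetical, while the scenario scopes and the point at which manual intervention is no longer needed follow the level descriptions above.

```python
from enum import IntEnum


class AutomationLevel(IntEnum):
    L0_NO_AUTOMATION = 0
    L1_OPERATIONS_AUXILIARY = 1
    L2_PARTIAL = 2
    L3_CONDITIONAL = 3
    L4_HIGH = 4
    L5_FULL = 5


# Scenario scope named in each level description (L0 and L1 name none).
SCOPE = {
    AutomationLevel.L2_PARTIAL: "certain single environments",
    AutomationLevel.L3_CONDITIONAL: "all task scenarios",
    AutomationLevel.L4_HIGH: "restricted complex scenarios",
    AutomationLevel.L5_FULL: "any complex scenario",
}


def needs_manual_intervention(level: AutomationLevel) -> bool:
    """L4 and L5 are described as running without manual intervention."""
    return level < AutomationLevel.L4_HIGH


for level in AutomationLevel:
    scope = SCOPE.get(level, "not specified")
    print(level.name, "| scope:", scope, "| manual intervention:", needs_manual_intervention(level))
```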
AISecOps technology readiness levels help technical practitioners see past the technology bubble. Currently, security operations intelligence is mainly at the L1 and L2 levels, with higher-level breakthroughs in several single-point technologies.
AISecOps is evolving at a rapid pace, with quick iterations in applied technical solutions. To explore the direction of future AISecOps development and identify bottlenecks in key capabilities, we draw a technical graph for cybersecurity operations scenarios that presents 16 fundamental frontier techniques for automated and intelligent security operations. Horizontally, the technical graph divides attack identification techniques into several types from the micro to the macro level: fingerprint and signature, technique and behavior, tactic and intention, group and organization, and campaign and situation. Vertically, the technical graph categorizes classic AISecOps techniques into fusion modeling at the data layer and risk perception, causal cognition, robust decision-making, and reliable action at the analysis layer. Meanwhile, vertical techniques are color-coded by the core data sources they draw on: environmental data, knowledge data, behavioral data, and multidimensional comprehensive data. A clear division of AISecOps into 16 types provides a solid basis for fine-grained abstraction and integration of technical schemes and for building the basic capabilities of the AISecOps platform.
Stay tuned for the next blog: AISecOps Development Trend
[i] Dang Y, Lin Q, Huang P. AIOps: real-world challenges and research innovations[C]// 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2019: 4-5.
[ii] Rowley, J. The wisdom hierarchy: representations of the DIKW hierarchy[J]. Journal of information science, 2007, 33(2): 163-180.
[iii] Noel S, Harley E, Tam K H, et al. CyGraph: graph-based analytics and visualization for cybersecurity[M]// Handbook of Statistics. Elsevier, 2016: 117-167.
[vii] Grant T. Unifying planning and control using an OODA-based architecture[C]. Proceedings of Annual Conference of the South African Institute of Computer Scientists and Information Technologists, 2005: 111-122.