Compliance has radically changed both the requirements and the driving forces of data security, and it has broadened the category of data objects that fall under protection. The application scenarios covered by data security are becoming more diverse, and security requirements now span every phase of the data lifecycle. To cope with these compliance challenges, enterprises need to move from traditional single-point development to systematic data security development.
This post focuses on the protection of private data. Typically, there are three scenarios: privacy protection in data collection, governance and visualization of personal information, and responses to user data rights requests.
Privacy Protection in Data Collection
Personal privacy data needs to be protected during data collection.
- GDPR Compliance Requirement
During data collection and processing, appropriate technical and organizational measures shall be implemented to ensure a level of security appropriate to the risk (Article 32).
- Security Challenge
How to strike a balance between data availability and privacy protection in data collection.
- Solution: Differential privacy
Governance and Visualization of Personal Information
While governing personal information, enterprises need to enhance data visibility via visualization and other tools.
- GDPR Compliance Requirement
GDPR grants users multiple rights regarding personal data, such as the right of access, the right to rectification, and the right to erasure. Accordingly, enterprises must respond to and fulfill user requests. For instance, when a user submits a data access request, the enterprise must present a complete personal data report for the data subject, including which user data has been collected and which has been shared with third-party enterprises (Articles 12 to 22).
- Problem and Challenge
When information and dimensions of the same data subject are distributed in multiple systems and applications of enterprises, how to identify entities and associate data becomes a technological challenge.
- Solution: Knowledge graph
User Data Rights Request Responses
Enterprises must respond to user data rights requests within a specified time.
- GDPR Compliance Requirement
Regarding the response time for data rights, GDPR requires that enterprises respond to and process all requests within one month and allows that period to be extended by two further months if the request is too complicated (Articles 12 to 22).
- Problem and Challenge
According to Gartner, most enterprises cannot rise to challenges posed by subject rights requests (SRRs), and about two-thirds of enterprises need more than two weeks to respond to a single SRR, which is usually completed manually and incurs an average cost of about US $1400[i]. It is challenging to work out how to improve operational efficiency and reduce violation risks of response timeouts.
- Solution: Process automation
The rest of this post introduces these three technologies and how they address the privacy data problems facing enterprises.
Differential privacy (DP) does not need to make assumptions about the background knowledge held by attackers, and its security can be proved mathematically. As a result, it has drawn extensive attention from both academia and industry in recent years.
It was proposed in 2006 by Cynthia Dwork, then a researcher at Microsoft[ii]. It ensures that inserting or deleting a single record in a database does not have a significant impact on query or statistical results. Its mathematical description is as follows:
Let D and D′ be adjacent datasets (they differ in only one record), and let f() be an operation or algorithm such as a query, an average, or a sum. For any set of possible outputs C, the probabilities that the two datasets produce a result in C must be close:

$$\Pr[f(D) \in C] \le e^{\varepsilon} \cdot \Pr[f(D') \in C]$$

In other words, if the ratio of the two probabilities is at most $e^{\varepsilon}$, then f is said to satisfy ε-differential privacy. To achieve this, noise such as Laplace noise is added to the query result, so that the result is distorted within a certain range and the output distributions on two adjacent datasets are almost the same.

The parameter ε is usually referred to as the privacy budget. The smaller ε is, the closer the results of the two queries (on the adjacent datasets D and D′) are, and the better privacy is protected. Generally, ε is set to a small number, such as 0.01 or 0.1. In practical applications, ε needs to be tuned to balance privacy against data availability.

In early DP application scenarios, data was stored in databases and made available to queriers through a query interface with a built-in DP function. This solution is usually called centralized differential privacy (CDP). As research and development progressed, another mode emerged: local differential privacy (LDP). In LDP mode, each user terminal runs a DP algorithm, adds noise to the data it collects, and then uploads the noisy data to the servers. Although the servers cannot obtain the exact private data of any individual user, which protects privacy, they can still discover the behavior trends and distribution of user groups through aggregation and conversion.
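The two modes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production mechanism: the function names (`private_count`, `randomize_bit`, `estimate_fraction`) and the parameter name `epsilon` for the privacy budget are hypothetical, the centralized part is limited to a counting query (whose result changes by at most 1 when one record is added or removed), and the local part uses classic randomized response on a single yes/no attribute.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Centralized DP sketch: noisy answer to a counting query.

    A count has sensitivity 1, so Laplace noise with scale 1/epsilon
    is enough for this query to satisfy epsilon-differential privacy.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def randomize_bit(bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Local DP sketch: each terminal reports the truth with probability
    e^epsilon / (e^epsilon + 1) and the flipped bit otherwise."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else not bit


def estimate_fraction(reports, epsilon: float) -> float:
    """Server side: debias the noisy reports to estimate the true share of 1s."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    users = [{"age": age} for age in range(18, 90)]
    print("noisy count of users aged 65+:",
          private_count(users, lambda u: u["age"] >= 65, epsilon=0.5, rng=rng))

    true_bits = [i % 10 < 3 for i in range(20000)]  # 30% of users have the attribute
    reports = [randomize_bit(b, epsilon=1.0, rng=rng) for b in true_bits]
    print("estimated fraction:", estimate_fraction(reports, epsilon=1.0))
```

In the local sketch the server never sees a trustworthy individual answer, yet the aggregate estimate converges on the true fraction as the number of reports grows, which is the trade-off between privacy and data availability that the privacy budget controls.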
The concept of the knowledge graph was first put forward by Google in 2012[iii]. It was originally used to optimize existing search engines and improve queries for complex information through information extraction and association. As the underlying theories and technologies matured, knowledge graphs came to be widely applied to data mining in social networks, finance, e-commerce, and other fields. A knowledge graph is essentially a semantic network, that is, a graph-based data structure composed of nodes and edges. Each node represents an “entity” that exists in the real world, and each edge represents a “relationship” between entities. The knowledge graph is one of the most effective ways to represent relationships. Put simply, it is a relational network obtained by connecting different types of information (heterogeneous information) together, and it makes it possible to analyze problems from the perspective of “relationships”.
Applying knowledge graphs to personal data can help enterprises know where customer data is stored, how the data is used, and what contracts, laws, and regulatory obligations are involved. It can also associate all the attributes of a personal data subject, such as name, date of birth, mobile phone number, and address, as shown in the preceding figure. In this way, when a user makes a personal data request, such as a deletion request, the enterprise can quickly retrieve all data dimensions, storage locations, and third-party sharing information for that user entity, process the request within a short time, and achieve compliance.
Process automation can help enterprise data security operations teams switch from repetitive manual “request-response” handling to automated processing. This not only lowers manual operating costs but also reduces the risk of violations caused by response timeouts.
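As a rough illustration of how such a graph can associate the records of one data subject across systems, here is a minimal in-memory sketch. Everything in it is an assumption made for the example: the class name `PersonalDataGraph`, the system names, the choice of `email` and `phone` as linking identifiers, and the exact-match rule after simple normalization. Real entity resolution usually needs fuzzier matching and a proper graph store.

```python
from collections import defaultdict


class PersonalDataGraph:
    """Tiny graph: record nodes are linked to the identifier values they contain."""

    def __init__(self):
        self.edges = defaultdict(set)  # node -> neighboring nodes
        self.records = {}              # record node -> stored fields

    def add_record(self, system, record_id, fields, identifiers=("email", "phone")):
        """Add one record from a source system and link it to its identifiers."""
        node = ("record", system, record_id)
        self.records[node] = dict(fields)
        for key in identifiers:
            value = fields.get(key)
            if value:
                attr = ("attr", key, value.strip().lower())
                self.edges[node].add(attr)
                self.edges[attr].add(node)
        return node

    def subject_records(self, key, value):
        """Return every record connected, directly or indirectly, to an identifier."""
        start = ("attr", key, value.strip().lower())
        if start not in self.edges:
            return []
        seen, stack = {start}, [start]
        while stack:  # depth-first walk over the connected component
            node = stack.pop()
            for neighbor in self.edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return sorted(node for node in seen if node[0] == "record")


if __name__ == "__main__":
    graph = PersonalDataGraph()
    graph.add_record("crm", "c-1", {"name": "Ana", "email": "ana@example.com", "phone": "555-0100"})
    graph.add_record("billing", "b-9", {"phone": "555-0100", "address": "1 Main St"})
    graph.add_record("marketing", "m-4", {"email": "ANA@example.com"})
    graph.add_record("crm", "c-2", {"name": "Bo", "email": "bo@example.com"})
    for record in graph.subject_records("email", "ana@example.com"):
        print(record, graph.records[record])
```

The billing record carries no email address, but it is still found because it shares a phone number with the CRM record. That indirect association is what lets a single erasure or access request reach data scattered across several systems.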
Process automation empowers two types of privacy compliance products: one is Subject Rights Request (SRR), and the other is Universal Consent and Preference Management (UCPM). SRR handles user requests to access, rectify, and delete personal data, while UCPM handles user requests to restrict or object to the processing of the personal data collected. SRR and UCPM can each be divided into two function layers:
- User-side functions: SRR and UCPM add clear and transparent request windows and buttons to the interfaces of mobile apps, applications, or web pages. These include buttons for viewing, rectifying, and deleting personal data, restricting processing objectives, objecting to sharing with third-party companies, or making other preference settings on the panel, similar to what is shown in the following figure.
- Enterprise-side functions: After receiving a request, an enterprise’s back-end system authenticates and confirms the user’s identity, analyzes the contents of the request, maps the associated entity data, responds to the request within the specified time, and sends the results back via email or web pages to the user who made the request.
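The enterprise-side steps above can be sketched as a small request handler. This is a simplified illustration, not a reference implementation: the names `SubjectRequest`, `handle_request`, `verified_users`, and `data_index` are hypothetical, identity verification is reduced to a set lookup, and the deadline follows the one-month rule with a two-month extension described earlier in this post, using plain calendar-month arithmetic that is an assumption rather than legal guidance.

```python
import calendar
from dataclasses import dataclass
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass
class SubjectRequest:
    user_id: str
    kind: str                  # "access", "rectify", or "erase"
    received: date
    complex_case: bool = False


def response_deadline(request: SubjectRequest) -> date:
    """One month by default; two further months if the request is complex."""
    return add_months(request.received, 3 if request.complex_case else 1)


def handle_request(request: SubjectRequest, verified_users, data_index) -> dict:
    """Authenticate the user, check the request type, and map the subject's data."""
    if request.user_id not in verified_users:
        return {"status": "rejected", "reason": "identity not verified"}
    if request.kind not in {"access", "rectify", "erase"}:
        return {"status": "rejected", "reason": "unsupported request type"}
    return {
        "status": "accepted",
        "kind": request.kind,
        "locations": list(data_index.get(request.user_id, [])),
        "due": response_deadline(request),
    }


if __name__ == "__main__":
    index = {"u-1": ["crm:c-1", "billing:b-9", "marketing:m-4"]}
    request = SubjectRequest("u-1", "erase", date(2024, 1, 31))
    print(handle_request(request, verified_users={"u-1"}, data_index=index))
```

In a fuller system, `data_index` would be answered by the personal data knowledge graph, and the `due` date would feed a queue or alerting system so that no request silently passes its deadline.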
In user privacy data security and compliance scenarios, enterprises need to meet various privacy compliance requirements while collecting user information or interacting with users. When collecting information about sensitive user behaviors, such as GPS trajectories, emoji input, and browsing behavior, enterprises can adopt local differential privacy to mine user behavior data in aggregate without revealing any individual's private information, which reduces compliance risks. To better satisfy and respond to users' various data rights requests, knowledge graphs can be used to manage and visualize personal information. Furthermore, process automation can streamline the “request-response” process for user data rights. On the one hand, it improves processing efficiency and thus lowers manual operating costs; on the other hand, it reduces the compliance risks caused by response timeouts.
Related post: Compliance-driven Data Security
[i] Gartner. Market Guide for Subject Rights Request Automation, 2020.
[ii] Dwork C. Differential privacy. In: Encyclopedia of Cryptography and Security, 2011, 338-340.
[iii] Singhal A. Introducing the Knowledge Graph: things, not strings. Official Google Blog, 2012.