
Modelling user notification scenarios in privacy policies

Abstract

The processing of personal data gives rise to many privacy concerns, one of which is ensuring the transparency of data processing to end users. Usually, this information is communicated to them through privacy policies. In this paper, the problem of user notification in case of data breaches and policy changes is addressed, and an ontology-based approach to modelling these scenarios is proposed. To specify the ontology concepts and properties, the requirements and recommendations of legislative regulations as well as existing privacy policies are evaluated. A set of SPARQL queries to validate the correctness and completeness of the proposed ontology is developed. The proposed approach is applied to evaluate the privacy policies designed by cloud computing providers and IoT device manufacturers. The results of the analysis show that the transparency of the user notification scenarios presented in privacy policies is still very low, and companies should reconsider their notification mechanisms and provide more detailed information in privacy policies.

Introduction

Computationally efficient AI-based applications are used in many subject domains, such as chemistry, genetics, healthcare, cybersecurity, unmanned vehicles, etc. Many such systems require processing high volumes of personal and sensitive data to support accurate decision making. Thus, the collection and usage of users’ personal data have become extremely common. Legislative regulations such as GDPR (2016), PDPA (2012), HIPAA (1996) and CCPA (2018) aim to protect data subjects’ privacy by establishing the principles of personal data processing and the requirements for it. These regulations also define a way to communicate this information to the end user by means of privacy policies. Different approaches have been proposed to analyze and formalize privacy policies in order to increase their transparency and clarity for end users. However, in most cases such approaches focus on activities that directly relate to personal data processing, while the data practices associated with the data processor’s obligations are not studied (Leicht and Heisel 2019).

This paper focuses on the problem of formalizing the data processor’s obligations in case of data breach and policy change. Although these aspects do not relate directly to personal data processing, they determine users’ awareness when a data breach occurs or a policy is updated. A detailed analysis of the related works showed that these aspects have received less research interest, resulting in low awareness of end users about the leakage of their personal data (Thomas 2021; Zou and Schaub 2019).

The authors suggest formalizing the notification scenarios using an ontological approach and propose novel concepts and object properties that characterize the key attributes of these scenarios. Such a representation allows structuring the information given in privacy policies and enables automatic reasoning about them with respect to these scenarios. It could also be used in further evaluation of privacy risks based on the analysis of the privacy policy. The proposed classes extend the ontology presented in Novikova et al. (2020). Thus, the novelty of the paper consists in the semantic modelling of the data breach and policy change notification scenarios, which enables automatic evaluation of privacy policies in terms of these scenarios.

This paper is an extended version of the conference paper presented at the 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP) (Kuznetsov et al. 2022). The initial version of the ontology was significantly refactored, specifically with respect to notification scenarios. In the proposed ontology, the notification activities are modelled as a set of related (sub-)activities that consider actions taken by a data subject and the first-party data processor; these activities “use” different communication mechanisms to inform the parties. These changes resulted in the introduction of a novel hierarchy of subclasses characterizing data processing activity, and of a hierarchy of communication mechanisms that are specific to the first party, specific to the data subject, or common to both. This solution allows representing the complex structure of the breach processing activity, which includes not only user notification but also breach investigation. There are also minor changes in the definition of the data control activities: the authors propose to distinguish two types of such activities, one relating directly to personal data privacy control, and the other relating to the management of marketing and advertising notifications. The experimental part of the research is extended by the automated analysis of the privacy policies contained in the OPP-115 data set (Wilson et al. 2016) and reasoning about them.

Thus, the authors’ contribution consists in:

  • the development of the ontology model for the end user notification scenarios in cases of data breach and policy change;

  • the implementation of experiments that involved constructing ontologies for the privacy policies from the OPP-115 data set (Wilson et al. 2016), the IoT data set (Kuznetsov et al. 2022), and the privacy policies of cloud service providers;

  • SPARQL queries for reasoning about notification scenarios presented in the privacy policies’ text.

The implemented experiments and analysis of the privacy policies enabled the authors to formulate the following key findings about the notification scenarios presented in the documents:

  • the privacy policies of cloud computing and big data analytics service providers include information about only one notification scenario, namely notification in case of policy change; this information is in most cases brief and rather vague, and end users are encouraged to monitor the privacy policy themselves, as the most frequent way to inform them is a general notice on the web page containing the privacy policy;

  • only 10% and 16% of the privacy policies in the IoT device data set contain statements relating to data breach and policy change notifications, respectively;

  • the most widely used mechanism in privacy policies for notifying users about policy changes is a message on the website;

  • the temporal aspects of the policy change activity, as well as the consequences of declining the policy, are not discussed in the privacy policies.

The paper is organized as follows. Section Related works reviews related works in the field of formal languages developed for the privacy policy specification and modelling. In section Formalization of the user notification scenarios the authors analyze the requirements to the notification procedures in case of data breach and policy change that are given in the legislative regulations and other recommendations, outline their key attributes, and describe proposed ontology modelling these data usage scenarios. Section The analysis of the notification scenarios in privacy policies presents the experiments conducted to validate the proposed ontology in terms of its ability to reason about the outlined notification scenario attributes, and to analyze how the studied data usage scenarios are presented in texts of privacy policies. The paper ends with a conclusion and definition of the future research directions.

Related works

Currently, there are numerous research papers devoted to the analysis of the different aspects of privacy, including perception of the risks by end users (Gonzalez-Granadillo et al. 2021; Harkous et al. 2018; Pardo and Le Métayer 2019; Gerl et al. 2018; Azraoui et al. 2015; Karjoth and Schunter 2002; DPV 2018; Tang and Meersman 2002; Gharib et al. 2021, 2020; Kost and Freytag 2012; Bawany and Shaikh 2017; Santoro et al. 2018). In the scope of this research the authors evaluated the papers devoted to the analysis of the privacy policies.

In the context of privacy policy analysis, it is possible to outline two main research tasks. The first one focuses on the analysis of existing privacy policies written in natural language (Harkous et al. 2018), while the second one relates to the creation of formal languages that could be used to define personal data processing rules or model data usage scenarios. To analyze privacy policies written in natural language, different machine learning techniques have been proposed (Wilson et al. 2018; Gopinath et al. 2018; Tesfay et al. 2018). For example, in (Wilson et al. 2018) the authors tested different classification models to detect usage scenarios such as first-party data collection, data retention, etc. in the texts of privacy policies, and demonstrated that the achieved detection accuracy is quite high. A similar problem is discussed in Tesfay et al. (2018). The authors present the PrivacyGuide tool, which is intended to simplify the user’s comprehension of privacy policies. They also apply machine learning techniques to extract different privacy aspects such as data collection, retention, third-party sharing, available privacy and user account controls, children’s protection, etc. The authors define indicators corresponding to these privacy aspects and use them to visualize the risk level detected for each aspect.

The application of formal languages to describe privacy policies and rules allows an analyst to reason about their completeness and consistency, and enables automatic enforcement of the privacy specifications. The usage of the ontological modelling adds formal semantics to the defined rules and policies.

The P3P (Platform for Privacy Preferences) framework (Karjoth and Schunter 2002) developed by the W3C consortium included the formal language E-P3P (Ashley et al. 2002) to define privacy rules for managing data collected via web browsers. It allowed defining the types of collected data, the purposes of their processing, and access rules. Currently, this framework is obsolete due to lack of adoption and demand.

In Pardo and Le Métayer (2019), the PILOT language is presented; it enables setting up rules that specify data collection, usage and transfer procedures. For example, it allows a user to define the conditions for data collection, i.e. the constraints and the context of this activity, to specify activity purposes, and to set its time limits. It was demonstrated that privacy risks can be evaluated based on the analysis of a policy specified in the PILOT language.

Another GDPR-compliant language—the Layered Privacy Language (LPL)—is described in Gerl et al. (2018). It also includes such components as retention time, data categories, data processing purpose and allows specification of data anonymization techniques to be applied. In Leicht and Heisel (2019) the existing privacy policy languages were reviewed, and it was shown that only a few languages allow describing data processor’s obligations such as notifications in case of a data breach. For example, the A-PPL language allows indicating who is the recipient of the notification in case of data breaches (Azraoui et al. 2015). However, the case of notification due to policy change is not considered at all.

One of the first ontologies devoted to privacy issues is presented in Tang and Meersman (2002). It serves as a formal basis for the DOGMA framework, and its primary goal is to link real cases with legislative directives and principles to support the legal argumentation process. Thus, it models such notions as jurisdictional principles, directives, facts, and cases in the field of confidentiality and privacy.

An ontology-based approach to verifying an information system in the context of privacy-by-design principles is proposed in Kost and Freytag (2012). The ontology specifies not only domain-dependent aspects but also privacy-aware aspects, including privacy protection mechanisms, thus enabling reasoning about privacy protection in information systems. In Bawany and Shaikh (2017), Santoro et al. (2018) and Cano-Benito et al. (2021), ontology-based approaches to modelling privacy control processes for different use cases are proposed. For example, Santoro et al. (2018) describe an ontology of the requirements for protecting such personal data as photos, videos and messages, as well as users’ reactions to certain social media content, stored in the cloud and/or on social networks’ servers. This ontology is used to generate data access rules using the Semantic Web Rule Language (SWRL).

The adoption of GDPR has led to a number of papers devoted to semantic modelling of GDPR principles and concepts. A comprehensive review of the GDPR-compliant ontologies and formal languages is presented in Esteves and Rodríguez-Doncel (2022). The developed ontologies focus on modelling of such notions as categories of personal data, purpose, consent, security measures (Palmirani et al. 2018; Elluri and Joshi 2018; DPV 2018; Pandit et al. 2019). For example, in Elluri and Joshi (2018) the preliminary version of a taxonomy of the cloud service provider’s obligations is proposed. It includes the data breach notification as an obligation mandatory for all data processors, but the discussion on its properties and relations with other concepts is not provided.

The W3C consortium is working on the Data Privacy Vocabulary (DPV), which includes terms describing key attributes of personal data processing (DPV 2018). It provides extensible concepts for describing types of personal data, purposes, legal bases and entities involved in data processing. However, the current version does not include a specification of user notification scenarios.

Another GDPR-compliant model is described in Torre et al. (2019). The authors provide a UML representation of the GDPR as a step of the automated GDPR compliance checking procedure. They propose to decompose the GDPR model into two tiers: a generic one, which describes the basic concepts, relations and restrictions of the GDPR, and a specific one, which considers specific situations such as national laws and domain and organizational regulations.

To formalize and analyse texts of privacy policies, the PrivOnto ontology was developed (Zimmeck et al. 2019; Oltramari 2018). Similar to Wilson et al. (2018) and Tesfay et al. (2018), it models 11 aspects of data processing (or data processing practices) and details their attributes. Although this ontology was developed before the adoption of GDPR, it is compliant with its requirements and principles (Poplavska et al. 2020); moreover, it could be considered universal and basic, as it is not tied to any specific local legislation. However, the primary goal of this ontology was to support the annotation of privacy policies written in natural language. That is why each data processing practice is represented by a single entity, and its attributes are represented via data attributes. As a result, concepts such as data types, processing purposes and legal basis are formally different for different usage scenarios. The PrivOnto ontology also does not cover all aspects relating to user notifications in case of data breach.

The P2Onto ontology (Novikova et al. 2020) is quite similar to the PrivOnto ontology, as similar data processing practices serve as the basis for its definition; however, the modelling approach is different. A data usage scenario is represented by a set of linked concepts and properties, thus enabling inference about objects in the context of several scenarios. It was shown that it could be used to assess risks based on the analysis of privacy policies (Novikova et al. 2020). However, the P2Onto ontology also lacks information about notification mechanisms in case of data breach and policy change.

Thus, it is possible to conclude that, although the related works focus on modelling privacy requirements or personal data processing practices, they either do not specify the data processor’s obligations or provide only a high-level formalisation of these activities that cannot be applied to the analysis of privacy policies or user agreements. In this paper, the authors propose a refinement of the P2Onto ontology that considers the user notification scenarios typically presented in these documents.

Formalization of the user notification scenarios

Before presenting a formal representation of the user notification scenarios, we define the following terms that are used in the rest of the paper:

  • a data practice is an activity that arises from the processing of personal data, including the collection, storage and sharing of personal data;

  • a data usage scenario is a description of a data practice present in a privacy policy, which explicitly defines the implementation aspects specific to that data practice.

When analyzing privacy policies, researchers traditionally outline 10 to 11 practices, depending on the level of detail, that characterize personal data processing (Novikova et al. 2020; Oltramari 2018). The main reason for highlighting these aspects is to determine how a data processor observes the data subject rights defined in the GDPR. The description of these practices, as well as of the data subject rights, is given in Table 1. Some data practices, such as first-party collection and third-party sharing, directly relate to the processing of personal data and include many aspects that characterise not only the types of personal data but also the legal basis and purpose of their processing. Other data practices characterize only specific attributes of data processing; for example, the Do Not Track scenario refers to the processing of a specific type of personal data that enables tracking. It focuses on the collection of such data as location data and unique identifiers of the mobile network operator, device, Wi-Fi access point, etc., and also includes the analysis of opt-in/opt-out options. Two scenarios, namely Policy Change and Data Breach Notification, stand apart. Although they do not directly relate to personal data processing, they are still important. The Policy Change practice defines how a user is notified when the privacy policy changes; as the changes may concern almost all aspects of data processing, it is essential for keeping the data subject informed. The Data Breach Notification practice deals with informing the data subject when personal data leak for various reasons.

Table 1 Personal data processing practices and their relation to end user’s rights defined in GDPR

Data Breach notification scenario

In general, the procedures in case of a data breach are clearly defined in existing legislative documents such as GDPR (2016), PDPA (2012), HIPAA (1996), and CCPA (2018). The data subject is notified when the data breach is likely to result in a severe risk to their security and privacy. The data processor has to inform them without undue delay, and the notification has to include information about the nature of the incident, the data affected by the breach, possible consequences, and the measures taken by the data processor. If the number of affected people is large and individual notification is not resource-efficient, the notification is made through public media so that all data subjects are informed in an equally effective manner. European legislation also requires informing data subjects about the data controller or data protection officer who is authorized to handle the incident and can provide more information. Thus, it is possible to outline the following key attributes that describe this scenario:

  • nature (or cause) of the data breach;

  • personal data affected;

  • consequences;

  • mitigation measures;

  • data controller (or data protection officer);

  • security and privacy risk level;

  • notification mechanisms.

Obviously, it is not possible to define all attributes of the data breach scenario in advance and specify them in the privacy policy. For example, the consequences, the security and privacy risk level, and the mitigation measures can be assessed and defined only after the breach occurs. Moreover, the legislative regulations require notifying a data subject if the expected impact of the data breach on the data subject’s privacy and security is high. As the goal of the research is to model the user notification scenarios present in privacy policies, these attributes can be omitted from consideration. However, the notification procedures that would be used to inform the data subject, the nature of the data breach, and the time period set for notifying users should be clarified in the privacy policy. Thus, these attributes are selected to describe the user notification scenario in case of a data breach, as provided by the privacy policy.

By outlining these attributes, the authors also enable the formulation of questions of interest, or competency questions as they are referred to in Pandit et al. (2018), to reason about the privacy policy. They are as follows:

  • What mechanisms are used to notify about privacy breaches?

  • What are the causes of the data breach?

  • What is the time period the data processor is obliged to notify about the data breach?
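To illustrate how these competency questions could be answered mechanically, the sketch below evaluates them over a tiny hand-written set of triples in plain Python, mimicking the shape of a SPARQL graph-pattern query. It is an illustration under stated assumptions, not the authors’ SPARQL queries: the classes NotificationActivity and DataBreachProcessing and the properties binded_to, has_mechanism, caused_by and lasts_for come from the P2Onto description in this paper, whereas the excerpt property, the individuals, the direction of binded_to and the excerpt texts are hypothetical.

```python
# Tiny in-memory triple store: (subject, predicate, object).
# P2Onto names: NotificationActivity, DataBreachProcessing, binded_to,
# has_mechanism, caused_by, lasts_for. Everything else (the "excerpt"
# property, the individuals, the excerpt texts) is hypothetical.
TRIPLES = [
    ("breach_notice", "rdf:type", "NotificationActivity"),
    ("breach_processing", "rdf:type", "DataBreachProcessing"),
    # Assumed direction: the notification is bound to the breach processing.
    ("breach_notice", "binded_to", "breach_processing"),
    ("breach_notice", "has_mechanism", "email_notice"),
    ("breach_notice", "has_mechanism", "website_notice"),
    ("breach_notice", "caused_by", "unauthorized_access"),
    ("email_notice", "excerpt", "we will notify you by email"),
    ("website_notice", "excerpt", "we will post a notice on our website"),
    ("unauthorized_access", "excerpt", "unauthorized access to your data"),
]


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is in the store."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def breach_notifications():
    """Notification activities bound to some DataBreachProcessing activity."""
    breaches = set(subjects("rdf:type", "DataBreachProcessing"))
    return [a for a in subjects("rdf:type", "NotificationActivity")
            if breaches & set(objects(a, "binded_to"))]


def answer(predicate):
    """Sorted excerpts for every filler of `predicate` on breach notifications."""
    return sorted(text
                  for activity in breach_notifications()
                  for filler in objects(activity, predicate)
                  for text in objects(filler, "excerpt"))


# Competency questions mapped to properties (the mapping is an assumption).
print("Mechanisms:", answer("has_mechanism"))
print("Causes:", answer("caused_by"))
print("Notification period:", answer("lasts_for"))  # empty: policy is silent
```

An empty answer, as for the third question here, signals that the policy says nothing about that attribute.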

Policy change notification scenario

The notification procedure in case of policy change is not defined clearly in the existing legislative documents (GDPR 2016; PDPA 2012; HIPAA 1996; CCPA 2018).

In order to extract the main notions associated with this scenario, the authors and a small group of experts, consisting of two lawyers with different levels of expertise in personal data protection and one invited specialist in security and privacy risk assessment, analyzed recommendations for writing GDPR-compliant privacy policies, the template provided by GDPR privacy notice template (2019), a set of privacy policies designed by large companies, and recommendations given by organizations that provide services for the automatic generation of privacy policies.

This allowed the authors to outline the following attributes that are used to characterize this scenario:

  • nature or cause of the policy change;

  • notification mechanism;

  • time period given to data subject to make a decision on policy update;

  • mechanisms to give/withdraw the consent;

  • consequences if the end user declines to accept the updated privacy policy.

It should be noted that the privacy policy update notice should contain the exact date when the changes come into effect and a description of how to accept or decline the policy update. As these attributes depend on the scope of the privacy policy update, they have to be detailed in the update notice, and it is not possible to specify them in advance in the privacy policy. However, information about the notification mechanism, and about how it depends on the scope of the privacy policy change, is expected to be present in the privacy policy.

As in the data breach notification scenario, it is also possible to formulate questions of interest to reason about the privacy policy in terms of the policy change scenario. They are as follows:

  • What mechanisms are used to notify about changes?

  • What is the scope or cause of policy change?

  • Is there any period given to the data subject to make a decision on the policy update?

  • What are the consequences of policy decline?
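The same competency questions can double as a completeness check. The sketch below assumes, hypothetically, that each question is answered by one of the Activity properties defined in Eq. 1 below (has_mechanism, caused_by, lasts_for, has_consequence); given excerpts already extracted from each policy, it reports which questions a policy leaves unanswered and the share of policies answering each question. The question-to-property mapping and the sample excerpts are assumptions for illustration only.

```python
# Each competency question is keyed by a short label and mapped to the Activity
# property of Eq. 1 assumed to answer it (the mapping is hypothetical).
COMPETENCY_QUESTIONS = {
    "mechanisms": "has_mechanism",              # mechanisms used to notify about changes
    "cause": "caused_by",                       # scope or cause of policy change
    "decision_period": "lasts_for",             # period given to decide on the update
    "decline_consequences": "has_consequence",  # consequences of policy decline
}


def coverage_report(extracted):
    """Map each question to True if the policy provides at least one excerpt for it.

    `extracted` maps property names to lists of text excerpts found in one policy.
    """
    return {q: bool(extracted.get(prop)) for q, prop in COMPETENCY_QUESTIONS.items()}


def unanswered(extracted):
    """Questions the policy leaves open, in the order they are listed above."""
    return [q for q, answered in coverage_report(extracted).items() if not answered]


def coverage_share(policies):
    """Fraction of policies answering each question (policies must be non-empty)."""
    reports = [coverage_report(p) for p in policies]
    return {q: sum(r[q] for r in reports) / len(reports) for q in COMPETENCY_QUESTIONS}


# Hypothetical extraction results for two policies.
policy_a = {"has_mechanism": ["We will post any changes on this page."]}
policy_b = {}
print(unanswered(policy_a))
print(coverage_share([policy_a, policy_b]))
```

Aggregating the per-policy reports over a whole data set gives percentages of the kind reported in the key findings, although the paper’s own figures were obtained with its SPARQL queries, not with this sketch.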

Ontology-based approach

To formalize the notification scenarios, the authors propose using an ontology-based approach. An ontology is a formal specification that represents a subject domain as a set of domain concepts (classes) with defined relations between them, organizing data into information and knowledge in a structured, machine-readable format and thus limiting its complexity. One of the important ontological relations is subsumption, i.e. the is-a-subclass-of relation, which allows forming a taxonomy of concepts. With respect to personal data processing, this relation is used to define taxonomies of personal data types, activity types, purposes, etc. Other types of relations between ontology concepts reflect semantically meaningful, domain-specific links between them. For example, they allow linking types of personal data with a specific processing activity and specifying the involved entities and parameters of this processing, including its temporal characteristics. At the stage of creating ontology instances (objects), concept attributes get specific values. In the case of privacy policy analysis, such values can be represented by excerpts from the text. These excerpts, on the one hand, serve as evidence that the corresponding concepts are present in the text of the policy and, on the other hand, provide a detailed description of these concepts.
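A minimal sketch of two ideas from this paragraph, subsumption and instances carrying text excerpts as evidence, is given below in plain Python rather than an OWL toolkit. The class names follow the P2Onto Data hierarchy described later in this section; the individuals and excerpts are hypothetical, and the sketch simplifies OWL by giving each class a single parent.

```python
# is-a (subsumption) edges, child -> parent. Class names follow the P2Onto Data
# hierarchy; restricting each class to one parent is a simplification of OWL.
SUBCLASS_OF = {
    "PersonalData": "Data",
    "NonPersonalData": "Data",
    "TrackingData": "PersonalData",
    "DeviceData": "PersonalData",
}

# Hypothetical individuals: name -> (asserted class, excerpt evidencing it).
INDIVIDUALS = {
    "location": ("TrackingData", "we collect your precise location"),
    "device_model": ("DeviceData", "the model of your device"),
    "usage_stats": ("NonPersonalData", "aggregated usage statistics"),
}


def ancestors(cls):
    """All superclasses of `cls`, nearest first (transitive closure of is-a)."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def instances_of(cls):
    """Individuals asserted to be `cls` or any of its subclasses (inferred via is-a)."""
    return sorted(name for name, (asserted, _) in INDIVIDUALS.items()
                  if asserted == cls or cls in ancestors(asserted))


def evidence(name):
    """The policy excerpt that justifies the individual."""
    return INDIVIDUALS[name][1]


print(ancestors("TrackingData"))
print(instances_of("PersonalData"))
print(evidence("location"))
```

Asking for instances of PersonalData returns individuals that were only asserted as TrackingData or DeviceData, which is the kind of inference subsumption provides; a real OWL reasoner additionally handles multiple parents and equivalences.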

As a basis for modelling, the authors use the P2Onto ontology, which was initially developed to model all data usage scenarios. In contrast to Oltramari (2018), this ontology is designed specifically to describe scenarios by defining concepts and the relations between them. In Novikova et al. (2020), the authors focused on detailing the first three data practices presented in Table 1 and showed that the introduced classes and properties also cover such practices as Do Not Track, International & Specific Audiences, and Data Aggregation. Although this ontology has classes that could be used to describe how data subjects can control their data and how a data processor informs them of any changes in data processing, these scenarios were not studied in detail. Section Description of the P2Onto ontology details the P2Onto ontology that serves as the basis for formalising notification scenarios, and section Proposed ontological representation of user notification scenarios presents the novel concepts and refinements made to the P2Onto ontology.

Description of the P2Onto ontology

The updated P2Onto ontology consists of several core classes, namely PrivacyPolicy, Data, Activity, Agent, etc. (see Fig. 1), which are described below in detail. These classes define key aspects and attributes of personal data processing, such as what data are being processed, how data are processed, what entities are involved in processing, and what the reason for and basis of this processing are.

Fig. 1 P2Onto core classes

The Data class serves as the root class used to define the hierarchy of data types provided by the user (or data subject) during service usage. Even if the first party collects data from third-party sources, including public sources, the data still relate to the user. This class has two high-level subclasses: the PersonalData class and the NonPersonalData class. The PersonalData class and the hierarchy of its subclasses are determined on the basis of the GDPR definition of personal data (GDPR 2016). Personal data are defined as “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”, Art. 4 (GDPR 2016). Thus, it is possible to outline the following subclasses of the PersonalData class:

  • SensitiveData class that represents the physical, physiological, genetic or mental state of the data subject, their ethnic origin, religious and philosophical beliefs, etc.;

  • ServiceData class that represents various service information, e.g. service or application logs;

  • TrackingData class that is used to consider location and other data types that enable application of tracking technologies;

  • ApplicationData class that is used to characterize application settings, version, etc.;

  • DeviceData class that is used to describe device properties such as device type and model, firmware, hardware, operating system if applicable, and manufacturer;

  • AccountData class that presents such information about the end user as name and given name or nickname, official identifier or identification document, profile photo, etc.;

  • FinancialData class that includes information about financial account identifiers, bank accounts, and payment cards.

The NonPersonalData class is used to describe all other non-personal data that are not included in classes listed above. The hierarchy of data types is given in Fig. 2.

Fig. 2 The Data class hierarchy

The Activity class is a basic class that is used to determine different types of activities relating directly or indirectly to personal data processing (Fig. 3).

Fig. 3 The Activity class hierarchy

This class has a recursive relation called ‘binded_to’, which expresses the fact that one activity may be represented by a set of interrelated activities.
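The recursive binded_to relation can be pictured as a graph of activities. The sketch below decomposes a breach processing activity into breach investigation and user notification sub-activities (the split mentioned in the Introduction) and collects every activity reachable from it. The individual names and the edge direction (composite activity to its parts) are assumptions; because a recursive relation admits cycles, the traversal tracks visited activities.

```python
# binded_to edges: composite activity -> its sub-activities. The individual
# names and the edge direction are hypothetical.
BINDED_TO = {
    "breach_processing": ["breach_investigation", "user_notification"],
    "user_notification": ["email_notice", "public_media_notice"],
}


def sub_activities(activity, seen=None):
    """Depth-first list of every activity reachable from `activity` via binded_to.

    `seen` records visited activities (including the start) so that cycles terminate.
    """
    if seen is None:
        seen = {activity}
    result = []
    for child in BINDED_TO.get(activity, []):
        if child not in seen:
            seen.add(child)
            result.append(child)
            result.extend(sub_activities(child, seen))
    return result


print(sub_activities("breach_processing"))
```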

The concepts presented below are defined using description logic notation as specified in Ashley et al. (2007). The definition of the Activity class in terms of its relations to the core classes is given in Eq. 1.

$$\begin{aligned} Activity&\sqsubseteq (Thing\nonumber \\&\sqcap {=1}\ considered\_by.PrivacyPolicy\nonumber \\&\sqcap {\le 1}\ initiated\_by.Agent\nonumber \\&\sqcap \forall has\_mechanism.Mechanism\nonumber \\&\sqcap \forall has\_mode.Mode\nonumber \\&\sqcap \forall caused\_by.Cause\nonumber \\&\sqcap \forall has\_consequence.Consequence\nonumber \\&\sqcap \forall has\_purpose.Purpose\nonumber \\&\sqcap \forall lasts\_for.TimePeriod\nonumber \\&\sqcap \forall has\_basis.Basis\nonumber \\&\sqcap \forall binded\_to.Activity). \end{aligned}$$
(1)

According to Eq. 1, the Activity concept is described as a Thing concept that is considered by exactly one PrivacyPolicy instance, is initiated by at most one Agent, and may be related to other entities; the universal restrictions require that any has_mechanism value is a Mechanism, any has_mode value is a Mode, any caused_by value is a Cause, and so on. Such notations are known as class restrictions in OWL terminology; they allow complex concepts to be defined with the focus on the concept itself rather than on relation types, which would make the ontology less flexible and harder to maintain.
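To make the semantics of Eq. 1 concrete, the sketch below encodes its qualified cardinality restrictions and universal restrictions as data and checks one Activity individual against them. Note the assumption: this is a closed-world validation, closer in spirit to SHACL-style shape checking, whereas an OWL reasoner works under the open-world assumption and would not, for instance, report a missing considered_by value as an inconsistency. The individual names and the way their types are supplied are hypothetical.

```python
from collections import defaultdict

# Qualified cardinality restrictions of Eq. 1: property -> (min, max, filler class).
CARDINALITY = {
    "considered_by": (1, 1, "PrivacyPolicy"),  # =1 considered_by.PrivacyPolicy
    "initiated_by": (0, 1, "Agent"),           # <=1 initiated_by.Agent
}
# Universal restrictions of Eq. 1: property -> filler class (forall property.Filler).
UNIVERSAL = {
    "has_mechanism": "Mechanism", "has_mode": "Mode", "caused_by": "Cause",
    "has_consequence": "Consequence", "has_purpose": "Purpose",
    "lasts_for": "TimePeriod", "has_basis": "Basis", "binded_to": "Activity",
}


def violations(facts, types):
    """Closed-world check of one Activity individual against Eq. 1.

    facts: (property, value) pairs asserted for the individual.
    types: value -> set of classes it belongs to, superclasses included.
    """
    values = defaultdict(list)
    for prop, value in facts:
        values[prop].append(value)
    problems = []
    for prop, (low, high, filler) in CARDINALITY.items():
        count = sum(filler in types.get(v, set()) for v in values[prop])
        if not low <= count <= high:
            problems.append(f"{prop}: expected {low}..{high} {filler}, found {count}")
    for prop, filler in UNIVERSAL.items():
        for v in values[prop]:
            if filler not in types.get(v, set()):
                problems.append(f"{prop}: {v} is not a {filler}")
    return problems


# Hypothetical individuals and their (already expanded) types. Under OWL's open
# world, policy_1 and policy_2 could be inferred identical; here they count as two.
TYPES = {
    "policy_1": {"PrivacyPolicy"}, "policy_2": {"PrivacyPolicy"},
    "acme": {"FirstParty", "Agent"}, "web_notice": {"Mechanism"},
}
ok = [("considered_by", "policy_1"), ("initiated_by", "acme"),
      ("has_mechanism", "web_notice")]
bad = [("considered_by", "policy_1"), ("considered_by", "policy_2"),
       ("has_mechanism", "acme")]
print(violations(ok, TYPES))
print(violations(bad, TYPES))
```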

On the next level of the hierarchy, there are two important subclasses: DataActivity and DataControl. The first one serves as the parent class for activities relating directly to personal data processing, such as data collection, usage, sharing with third parties, retention and protection. The DataActivity class has specific restrictions that specify who initiates data processing and what the purposes of such activity are. Both subclasses are more specific than Activity, so each has exactly one initiating agent: the FirstParty for DataActivity and the User for DataControl. The definition of the DataActivity class is given in Eq. 2:

$$\begin{aligned} DataActivity&\sqsubseteq (Activity\nonumber \\&\sqcap {=1}\ initiated\_by.FirstParty\nonumber \\&\sqcap \forall has\_purpose.DataActivityPurpose\nonumber \\&\sqcap \forall applied\_to.Data). \end{aligned}$$
(2)

The second subclass, DataControl, is a parent class that is used to determine activities related to control procedures such as giving consent to processing certain types of personal data (the OptinOptoutControl subclass) and setting up marketing mailings (the AdvertisingDataControl subclass). The definition of the DataControl class is given in Eq. 3.

$$\begin{aligned} DataControl&\sqsubseteq (Activity\nonumber \\&\sqcap {=1}\ initiated\_by.User\nonumber \\&\sqcap \forall applied\_to.Data). \end{aligned}$$
(3)
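Because Eqs. 2 and 3 state that each subclass is subsumed by Activity, a DataActivity or DataControl individual must satisfy the restrictions of Eq. 1 together with its own; in particular, ≤ 1 initiated_by.Agent combined with = 1 initiated_by.User leaves exactly one initiating User. The sketch below collects restrictions along the superclass chain and checks individuals under the same closed-world simplification as before. Only part of Eq. 1 is repeated, and the individuals and their type sets are hypothetical.

```python
# Restrictions declared directly on each class (Eqs. 1-3; Eq. 1 abridged).
# Each entry: (kind, property, filler, n) with kind in {"exactly", "max", "only"};
# n is unused (None) for "only".
DECLARED = {
    "Activity": [("exactly", "considered_by", "PrivacyPolicy", 1),
                 ("max", "initiated_by", "Agent", 1),
                 ("only", "has_purpose", "Purpose", None)],
    "DataActivity": [("exactly", "initiated_by", "FirstParty", 1),
                     ("only", "has_purpose", "DataActivityPurpose", None),
                     ("only", "applied_to", "Data", None)],
    "DataControl": [("exactly", "initiated_by", "User", 1),
                    ("only", "applied_to", "Data", None)],
}
PARENT = {"DataActivity": "Activity", "DataControl": "Activity"}


def effective_restrictions(cls):
    """Own restrictions plus all inherited ones: subsumption acts as a conjunction."""
    result = []
    while cls is not None:
        result.extend(DECLARED.get(cls, []))
        cls = PARENT.get(cls)
    return result


def check(cls, facts, types):
    """Closed-world check of an individual of `cls`; facts are (property, value) pairs."""
    problems = []
    for kind, prop, filler, n in effective_restrictions(cls):
        values = [v for p, v in facts if p == prop]
        matching = sum(filler in types.get(v, set()) for v in values)
        if kind == "exactly" and matching != n:
            problems.append(f"{cls}: needs exactly {n} {prop}.{filler}, found {matching}")
        elif kind == "max" and matching > n:
            problems.append(f"{cls}: needs at most {n} {prop}.{filler}, found {matching}")
        elif kind == "only" and matching != len(values):
            problems.append(f"{cls}: {prop} values must all be {filler}")
    return problems


# Hypothetical individuals; type sets already include superclasses.
TYPES = {
    "pp": {"PrivacyPolicy"}, "alice": {"User", "Agent"},
    "acme": {"FirstParty", "Agent"}, "email_data": {"AccountData", "PersonalData", "Data"},
}
opt_out = [("considered_by", "pp"), ("initiated_by", "alice"), ("applied_to", "email_data")]
wrong = [("considered_by", "pp"), ("initiated_by", "acme")]
print(check("DataControl", opt_out, TYPES))
print(check("DataControl", wrong, TYPES))   # a first party cannot initiate DataControl
print(check("DataActivity", wrong, TYPES))  # but it can initiate DataActivity
```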

The DataBreachProcessing and NotificationActivity classes are detailed in the next section dedicated to formalizing notification scenarios.

The Agent class is a basic class for defining entities that are involved in data processing. It is used to present data subject (the class User), organizations that perform data processing including third-parties. This class could be also used to describe entities that serve as additional sources of personal data such as public organization, state organisations, social networks etc. Currently, it has following three subclasses:

  • an end user of a product or service;

  • the first-party company, which provides the service to the user; and

  • a third party that can also interact with the first-party company to receive the user’s personal data for processing.

Another core concept is the Mechanism class. It serves as the basic class for describing the tools and interfaces that support different types of activities. For example, it is the parent class of the CollectionMechanism class, which determines the ways and procedures of data collection, and of the SecurityMechanism class, which defines the technical tools and organizational measures applied to protect the data. The Mechanism class is linked to the Activity class via the ‘has_mechanism’ property; thus, mechanisms and tools can be specified for any activity inherited from the Activity class.

The LegalBasis class (a subclass of the Basis class) serves as the basis for defining the legitimate grounds for personal data processing. Currently, this class is not detailed with subclasses in the P2Onto ontology; however, the authors propose to reuse the LegalBasis concept defined in the Data Privacy Vocabulary (DPV) (2018). It has subclasses such as consent, contract performance, legitimate interest, public interest, and vital interest of a data subject or another natural person. The LegalBasis class is linked to the DataActivity and DataControl concepts using the ‘has_basis’ property.

Another core P2Onto class is the DataActivityPurpose class. It is used to define the purpose of a data activity and is therefore linked to the DataActivity class via the ‘has_purpose’ property. Data processing purposes are numerous: the authors of DPV (2018) outline 74 different purposes. However, they can be grouped into coarser categories that are still detailed enough to characterize them in a meaningful manner. Thus, this class has the following subclasses:

  • LegalCompliance class that corresponds to purposes relating to legal requirements or regulatory obligations;

  • ServiceProvision class that is used to define purposes relating to the functionality provided by a product or a service;

  • Advertising&Marketing class that includes purposes relating to promoting, selling, and distributing a product or a service;

  • Analytics&Personalization class that is used to define purposes relating to personalized advertising, benefits, and the creation of personalized content and recommendations;

  • SecurityPurposes class that corresponds to the procedures and tasks implemented to secure data processing, such as identity verification, fraud detection, and fraud prevention;

  • Merger&Acquisition class that is used to characterize purposes that arise when the ownership of companies or their operating units is transferred to other entities.

The cases not covered by the concepts presented above are assigned to the root DataActivityPurpose class.

Proposed ontological representation of user notification scenarios

Since the P2Onto ontology is activity-centric, the modelling of the notification scenarios starts with the definition of the NotificationActivity class. Semantically, notification is an interaction between agents that employs appropriate mechanisms and has to be carried out within a certain period of time. Thus, the formal definition of the notification activity is given in Eq. 4.

$$\begin{aligned} NotificationActivity&\sqsubseteq (Activity \\&\sqcap {\le 1}\ initiated\_by.Agent \\&\sqcap \forall notifies.Agent \\&\sqcap \forall has\_mechanism.CommunicationMechanism \\&\sqcap \forall lasts\_for.NotificationTime). \end{aligned}$$
(4)

Considering that the notification process can be initiated either by a user (the data subject) or by the first party (the data processor), and that these two entities may use mechanisms available only to them, the authors propose to distinguish between the user notification activity (UserNotification class) and the first-party notification activity (FPNotification class). The formal definitions of these subclasses are given in Eq. 5. They take into account that the user and the first party can use both common and their own communication mechanisms:

$$\begin{aligned} UserNotification&\sqsubseteq (NotificationActivity \\&\sqcap {=1}\ initiated\_by.User \\&\sqcap \forall notifies.FirstParty \\&\sqcap \forall has\_mechanism.(UserSpecificCommunicationMechanism \\&\qquad \sqcup GeneralCommunicationMechanism)), \\ FPNotification&\sqsubseteq (NotificationActivity \\&\sqcap {=1}\ initiated\_by.FirstParty \\&\sqcap \forall notifies.User \\&\sqcap \forall has\_mechanism.(FPSpecificCommunicationMechanism \\&\qquad \sqcup GeneralCommunicationMechanism)). \end{aligned}$$
(5)
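A similar closed-world sketch can illustrate Eq. 5: the initiator, the notified agent, and the admissible mechanism classes all depend on which subclass the notification belongs to. The assignment of concrete mechanisms to the user-specific, first-party-specific, and general branches below is an assumption made for the example; only InPrivacyPolicy is named in the text, and the other class names are hypothetical.

```python
# Hypothetical mechanism classes for each branch of Eq. 5 (only InPrivacyPolicy appears in the text).
GENERAL = {"Email", "SMS", "PostalMail"}
FP_SPECIFIC = {"InPrivacyPolicy", "WebsiteNotice", "InAppNotification"}
USER_SPECIFIC = {"ContactForm"}

# Eq. 5: class -> (required initiator, notified agent, allowed mechanism classes)
RULES = {
    "UserNotification": ("User", "FirstParty", USER_SPECIFIC | GENERAL),
    "FPNotification": ("FirstParty", "User", FP_SPECIFIC | GENERAL),
}


def conforms(kind, initiator, notified, mechanisms):
    """Closed-world reading of Eq. 5 for one notification individual."""
    who, whom, allowed = RULES[kind]
    return (initiator == who
            and all(agent == whom for agent in notified)
            and set(mechanisms) <= allowed)


print(conforms("FPNotification", "FirstParty", ["User"], ["InPrivacyPolicy", "Email"]))  # True
print(conforms("UserNotification", "User", ["FirstParty"], ["InPrivacyPolicy"]))         # False
```

The second call fails because, under the assumed assignment, a notice in the privacy policy is a first-party-specific mechanism that the user cannot employ.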

Such a formalization of the notification activity is very generic and does not express any information about a data breach, policy change, or policy acceptance. Therefore, formal concepts corresponding to the breach and policy change activities have to be defined and linked to the notification activity using the ‘binded_to’ property of the Activity class (see Eq. 6).

$$\begin{aligned} DataBreachProcessing&\sqsubseteq (Activity \\&\sqcap {=1}\ initiated\_by.FirstParty), \\ PolicyChange&\sqsubseteq (DataActivity \\&\sqcap \forall has\_scope.PolicyChangeScope \\&\sqcap \forall lasts\_for.PolicyAcceptanceTime), \\ PrivacyControl&\sqsubseteq DataControl. \end{aligned}$$
(6)

Based on the analysis of the scenario attributes and questions of interest identified in sections Data Breach notification scenario and Policy change notification scenario, the authors propose the following key basic concepts common to both scenarios:

  • CommunicationMechanism class;

  • Cause class;

  • Consequence class;

  • TimePeriod class.

The parent class of the CommunicationMechanism class is the basic Mechanism class. The CommunicationMechanism class is used to specify the techniques for informing a data subject and other subjects involved in data processing when certain events occur; in the studied scenarios, these events are a policy change or a data breach. The way of notification has to be explicitly specified in the text of a privacy policy. The analysis of privacy policies designed by large companies, together with the recommendations given by organizations that provide automatic privacy policy generation services, enabled the authors to outline the following techniques for informing subjects about such events: a notice on the website, SMS, e-mail or postal mail, a notification in the application or service software, etc. Figure 4 shows the proposed hierarchy of subclasses for this class.

Fig. 4

The CommunicationMechanism class hierarchy

The Cause class is a basic class that is used to characterize the causes of events and the corresponding notifications. As the events in the two studied scenarios have different origins, two scenario-specific classes are proposed: DataBreach and PolicyChangeCause.

The DataBreach class defines the causes that trigger data breach processing. Three high-level causes were outlined: intentional breach, unintentional breach, and force majeure. Intentional causes include malicious attacks and actions that lead to data corruption or leakage. Unintentional causes include other technical, organizational, or physical reasons for data corruption or leakage, including unintentional human factors. Force majeure is defined as a separate subclass, as this case is unpredictable and unmanageable and is usually treated separately in legislative documents (GDPR 2016; PDPA 2012; HIPAA 1996; CCPA 2018).

The PolicyChangeCause class characterizes policy specific changes. The authors adopted the taxonomy of the policy change causes proposed in Wilson et al. (2016). Thus, the following subclasses of the PolicyChangeCause class are defined:

  • NonPrivacyRelatedCause class that is used to specify causes that do not significantly affect personal data privacy, such as an extended description of the service provided or novel features that do not require the collection of additional personal data;

  • PrivacyRelatedCause class that is used to differentiate changes that affect data subject privacy, such as changes in data collection, extension or reduction of the types of personal data being processed, and changes in the organizational or technical measures used to protect data security or privacy;

  • MergeAcquisitionCause class that is used to characterize changes resulting from a change in business ownership. Such changes can be quite significant.

The cases not covered by the concepts presented above are assigned to the root PolicyChangeCause class.

Figure 5 shows the proposed Cause class hierarchy with links to the corresponding activity classes.

Fig. 5

The Cause class hierarchy

The Consequence class is used to model the consequences of an entity’s activity. For example, it could be used to determine the consequences for an end user who opts out of providing some types of personal data. In the case of a policy change, this concept applies to both parties, the first party and the data subject, because the policy change activity in fact initiates another activity: giving consent. For the first party, the consequence of a policy change is the adoption of new rules and procedures for personal data processing. For the end user, the consequences specify the impact of declining the policy update. Withdrawal of consent after a privacy policy update commonly leads to full or partial service restriction. To characterize this case, the authors introduce the UserChoiceConsequence class that specifies the possible results of consent withdrawal. Figure 6 shows the proposed Consequence class hierarchy.

Fig. 6

The Consequence class hierarchy

Both scenarios have temporal aspects. In the policy change notification scenario, there is a period during which the end user needs to decide whether to accept or decline the updated version of the privacy policy. In the data breach notification scenario, there is a period given to the data processor to inform the data subject about a data breach. To support the description of these aspects, the authors added PolicyAcceptanceTime and BreachNotificationTime as subclasses of the TimePeriod class, which is used to determine the temporal aspects of data processing scenarios. The updated TimePeriod class hierarchy is shown in Fig. 7.

Fig. 7

The hierarchy of classes used to present temporal attributes of different data usage scenarios
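A BreachNotificationTime may be stated in business days; for instance, the 3plususa policy analyzed in the experiments promises notification “within seven (7) business days”. The sketch below computes such a deadline under the assumption that only weekends are non-business days (public holidays are ignored); the helper add_business_days is hypothetical and not part of P2Onto.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (weekends skipped, holidays ignored)."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days -= 1
    return current


# Hypothetical breach detected on Friday, 4 March 2022, with a seven-business-day period
deadline = add_business_days(date(2022, 3, 4), 7)
print(deadline)  # 2022-03-15
```

A calendar-day period would need no such helper; the distinction matters only when a policy explicitly counts business days.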

The object property ‘binded_to’ puts all discussed concepts together in one semantic net. In Fig. 8, the two activities represented by the DataBreachProcessing and FPNotification classes are linked together to reflect the sequence of activities initiated by a data breach event (DataBreach class). A detailed description of each class and subclass used to define this scenario is given in Table 2.

Fig. 8

The linked set of ontology classes used to characterize the data breach scenario

Table 2 Data breach scenario, and description of its attribute values

The policy change scenario is modelled in a similar manner. It is represented by four linked activities: the PolicyChangeActivity, initiated by the FirstParty; the FPNotification, by which the FirstParty notifies the User about the changes; the PrivacyControl activity, in which the User accepts or rejects the new version of the policy within the PolicyAcceptanceTime period; and the UserNotification, by which the User informs the FirstParty about the decision. Figure 9 illustrates all relations between the classes, and Table 3 provides a detailed description of the defined classes and subclasses.

Table 3 Policy change scenario, and description of its attribute values
Fig. 9

The set of the linked ontology classes used to characterize the policy change scenario
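The chain of activities in Fig. 9 can be recovered by following the ‘binded_to’ links. The sketch below assumes that ‘binded_to’ points from an activity to the activity it triggers; the individual names are hypothetical, and a plain dictionary stands in for the ontology.

```python
# Hypothetical individuals of the policy change scenario with their class and 'binded_to' links.
scenario = {
    "change1": {"type": "PolicyChangeActivity", "initiated_by": "FirstParty", "binded_to": ["fpNotice1"]},
    "fpNotice1": {"type": "FPNotification", "initiated_by": "FirstParty", "binded_to": ["control1"]},
    "control1": {"type": "PrivacyControl", "initiated_by": "User", "binded_to": ["userNotice1"]},
    "userNotice1": {"type": "UserNotification", "initiated_by": "User", "binded_to": []},
}


def activity_chain(start, graph):
    """Follow 'binded_to' links depth-first and return the activity classes in visiting order."""
    order, stack, seen = [], [start], set()
    while stack:
        node = stack.pop()
        if node in seen:  # guard against cycles
            continue
        seen.add(node)
        order.append(graph[node]["type"])
        stack.extend(reversed(graph[node]["binded_to"]))
    return order


print(activity_chain("change1", scenario))
# ['PolicyChangeActivity', 'FPNotification', 'PrivacyControl', 'UserNotification']
```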

Apart from adding new classes to the P2Onto ontology, the authors refactored its structure to enable reasoning about a set of privacy policies. This feature is useful for comparing several versions of one policy or for making inferences over a set of policies. To support it, a new root element, PrivacyPolicy, that represents a privacy policy was added. The core P2Onto concepts that define data usage scenarios, such as Data, Agent, and Activity, are linked to this class. The object property ‘newer_than’ is used to link different versions of one privacy policy.

Another important change in the ontology design relates to the representation of the data that confirm the presence of data usage scenario concepts in the text of the privacy policy. This evidence consists of excerpts from the privacy policy text and is included in the ontology as a service literal data property, ‘evidence’. If the concept is present in the privacy policy, the property ‘evidence’ stores the corresponding text; otherwise, it stores the string “Not defined”.
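The following sketch shows how the ‘evidence’ literal and the ‘newer_than’ link could be combined to find the concepts whose description changed between two versions of one policy. The policy identifiers, the excerpts, and the Email key are illustrative assumptions, and dictionaries stand in for the ontology individuals.

```python
NOT_DEFINED = "Not defined"  # literal stored when a concept has no supporting text

# Hypothetical versions of one policy: concept -> 'evidence' literal.
versions = {
    "policy_v1": {"InPrivacyPolicy": "We will post changes on this page."},
    "policy_v2": {"InPrivacyPolicy": "We will post changes on this page.",
                  "Email": "We may also notify you by email."},
}
newer_than = {"policy_v2": "policy_v1"}  # links a version to its predecessor


def evidence(policy, concept):
    """Return the 'evidence' literal, defaulting to "Not defined"."""
    return versions[policy].get(concept, NOT_DEFINED)


def changed_concepts(policy):
    """Concepts whose evidence differs from the version linked by 'newer_than'."""
    older = newer_than[policy]
    concepts = set(versions[policy]) | set(versions[older])
    return sorted(c for c in concepts if evidence(policy, c) != evidence(older, c))


print(changed_concepts("policy_v2"))  # ['Email']
```

Storing “Not defined” instead of omitting the property makes absent concepts explicit, so a comparison like this one needs no special case for missing values.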

The schematic representation of the P2Onto ontology is depicted in Fig. 20 in Appendix 1. It shows all classes used to describe different personal data scenarios; for clarity, semantic relations are drawn only for the lowest-level concepts.

The ontology was designed using the Draw.io tool (Draw.io 2022), which allows online collaboration, and was then prototyped using Protégé (2014) version 5.6.3 and the Owlready2 library (Lamy 2017) version 0.44. The formal consistency check was performed with the HermiT reasoner (Glimm et al. 2014) version 1.4.3.456 and showed no errors.

The analysis of the notification scenarios in privacy policies

The primary goal of the experiments is to demonstrate how the ontological representation of a privacy policy could be used to reason about the aspects formulated as questions of interest in sections Data Breach notification scenario and Policy change notification scenario.

Constructing the ontology for a privacy policy requires parsing its text, extracting the required data, and mapping them to the ontology concepts. Currently, this task is solved in two ways: manual analysis of privacy policies, and automated processing of privacy policy annotations labelled according to a schema that corresponds to the proposed ontology classes. Thus, the experiment was conducted in two phases. In the first phase, privacy policies developed by large cloud computing companies and IoT vendors were studied; these policies were analyzed manually, and ontologies were created for them by hand. In the second phase, the authors automatically processed the OPP-115 data set (Wilson et al. 2016), whose annotations label data practices.

The queries for reasoning about notification scenarios were written in SPARQL, a language designed for querying databases that contain semantic networks or knowledge graphs in the Resource Description Framework (RDF) format. RDF is an open standard for describing objects and resources in a machine-readable format. The ontology is described using the OWL 2 ontology description language, which builds upon RDF and allows establishing semantic relations between objects. The syntax of SPARQL queries resembles that of SQL queries; however, the logical conditions are specified using triple patterns, which together form basic graph patterns. A triple pattern specifies how one object (the subject) relates to another object (the object) or to a literal; the relation is defined by a predicate, which is itself represented by an object or resource. The SPARQL query language, the OWL ontology description language, and the RDF data format are all parts of the Semantic Web framework maintained and standardized by the W3C (1994) consortium.
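To illustrate how a basic graph pattern is evaluated, the following sketch matches a list of triple patterns against a small in-memory set of triples; terms starting with “?” are variables, as in SPARQL. It is a simplified stand-in for a SPARQL engine such as GraphDB, not its implementation, and the property has_activity linking a policy to its activities is a hypothetical name.

```python
def match(pattern, triple, binding):
    """Unify one triple pattern with one triple; terms starting with '?' are variables."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable already bound to a different value rejects the triple
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def bgp(patterns, triples):
    """Evaluate a basic graph pattern: all patterns must match under one shared binding."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            new
            for old in bindings
            for triple in triples
            if (new := match(pattern, triple, old)) is not None
        ]
    return bindings


data = [
    ("policyA", "has_activity", "change1"),
    ("change1", "rdf:type", "PolicyChange"),
    ("change1", "binded_to", "notice1"),
    ("notice1", "has_mechanism", "mech1"),
    ("mech1", "rdf:type", "InPrivacyPolicy"),
]
query = [("?p", "has_activity", "?a"), ("?a", "binded_to", "?n"),
         ("?n", "has_mechanism", "?m"), ("?m", "rdf:type", "?cls")]
print(bgp(query, data))
# [{'?p': 'policyA', '?a': 'change1', '?n': 'notice1', '?m': 'mech1', '?cls': 'InPrivacyPolicy'}]
```

Each pattern narrows the set of candidate bindings, which corresponds to the implicit join between the triple patterns of a SPARQL WHERE clause.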

The queries were executed in the GraphDB database (GraphDB by Ontotext 2021), because Protégé’s SPARQL Query tool was unable to handle queries over such a large number of individuals.

The authors performed manual ontological modelling of the privacy policies of four large companies providing cloud services and big data analytics (Amazon Web Services; Google Cloud; Hewlett Packard Enterprise privacy notice; Yandex privacy notice). They discovered that these privacy policies contain information about only one notification scenario, the policy change scenario. Let us consider them in detail.

Figure 10 shows the ontology for the policy change activity in the Amazon Web Services (AWS) privacy policy (Amazon Web Services 2022). Note that the ontology graph in Fig. 10 is organized so that the level of abstraction decreases from the top to the bottom of the plot; thus, the individuals of the scenario are shown in the bottom part of the figure. A similar layout is used in the other ontology figures (Figs. 11 to 14). The scenario is described only partially; however, the notification mechanism is specified: a notice on the privacy policy web page. This corresponds to the InPrivacyPolicy concept, which has the following evidence: “You should check our website frequently to see recent changes”. This privacy policy also gives end users some kind of choice by stating: “never materially change our policies and practices to make them less protective of personal information collected in the past without informing affected customers and giving them a choice”.

Fig. 10

Ontology for policy change activity described in Amazon Web Services privacy policy

The ontology of the privacy policy for the Google Cloud services (Google Cloud 2022) is given in Fig. 11. This policy does not specify any causes or acceptance time for the applied changes, but it states two notification mechanisms. The first one is not clearly defined and is presented as “sending you a direct communication”; thus, it was assigned to the generic CommunicationMechanism class. In contrast, the second mechanism is clearly defined: “posting a prominent notice on this page”.

The description of the policy change scenario in the Hewlett Packard Enterprise (HPE) and Yandex policies is extremely limited, and, as a result, the implemented policy change scenario is quite simple in both cases. The Hewlett Packard (2022) privacy policy states: “If we modify this Privacy Statement, we will publish a revised version with an updated revision date. The privacy link on the footer of every HPE web page will then point to that new version”. The ontology modelling these statements is shown in Fig. 12.

Fig. 11

Ontology for policy change activity in the Google Cloud computing privacy policy

Fig. 12

Ontology for policy change activity process in HPE privacy policy

The Yandex (Yandex privacy notice 2022) privacy policy states: “This Policy may be subject to amendments. Yandex shall be entitled to make such amendments at its own discretion, including, without limitation, to reflect changes in applicable legislation or amendments to the Sites and/or Services. Yandex undertakes not to introduce significant changes, imposing additional obligations or reducing your rights under this Policy without proper notice to you. You will be notified about such changes. If applicable, changes will be announced on the Site or Services (e.g. via a pop-up or web banner) prior to such changes taking effect, and you may also be notified via other channels (e.g. by email) if you have provided us with your contact details”. The constructed ontology for these statements is provided in Fig. 13.

The authors also performed a historical analysis of the selected privacy policies (Amazon Web Services; Google Cloud; Hewlett Packard Enterprise privacy notice; Yandex privacy notice) to see how they evolved over time with respect to the user notification scenario. Only one effective version of the HPE privacy policy (Hewlett Packard Enterprise privacy notice 2022) is publicly available. The last ten versions of the privacy policy are available for the Google Cloud services (Google 2022) and the AWS services (Amazon Web Services 2022); their oldest versions date back to 2020. These versions contain no changes in the statements on user notification in the case of a privacy policy change or update. The Yandex privacy policy has 16 versions, the oldest of which dates back to 2013. Since 2013, the statement regarding user notification in the case of a policy change has changed only once, in 2018. This change was motivated by the adoption of the GDPR in the EU and the company’s entry into the European market. The focus of the statement shifted from the company’s right to amend the privacy policy to the company’s obligation not to “introduce significant changes” without proper notice to users. The original version of this statement did not contain any information about the notification activity other than the URL where the updated, currently valid policy could be found.

Fig. 13

Ontology for policy change activity process in Yandex privacy policy

Thus, it is possible to conclude that the information provided in all these policies is quite vague and short. The main notification mechanism in them is an update notice in the text of the privacy policy, and, as this type of notification is passive, users are expected to monitor privacy updates themselves. Thus, the exercise and observance of the Right to be informed (GDPR, Art. 13) is delegated to the users themselves.

There is no obvious reason for omitting the data breach notification scenario from these privacy policies; however, the omission could be explained by the fact that legislative documents provide a well-defined procedure in case of a data breach, specifying the period within which affected persons must be notified.

The authors also manually modelled a set of privacy policies from the IoT data set (Kuznetsov et al. 2022), which contains policies published by IoT manufacturers after the adoption of the GDPR. Unfortunately, this data set does not have any annotations relating to data usage scenarios. Therefore, the authors conducted a simple keyword search to determine whether notification scenarios are present in the texts and to select a subset of documents. The following keywords were used: “data breach” and “policy change”. This rather rough analysis showed that only 94 of 592 documents contain statements about a policy change, and 58 of 592 policies explicitly mention data breach notifications. Moreover, 7 of these 58 policies discuss the responsibility of the data subject in case of a data breach caused by them. In the remaining cases, the information about data breach notification is vague, and the details are often missing. For example, this information could be presented as follows: “In order to be in line with Fair Information Practices we will take the following responsive action, should a data breach occur: we will notify you”, or “In the event of a security breach, we will promptly notify you and the proper authorities if required by law”. The most precise description of the data breach notification was found in 3plususa (2021); it even specifies the period within which a user will be notified about a data breach: “In order to be in line with Fair Information Practices, should a data breach occur, we will notify users via email within seven (7) business days.” The corresponding ontology is shown in Fig. 14.

Fig. 14

Ontology for breach activity process in 3plususa privacy policy
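The keyword filtering step can be sketched as follows. The text does not state the exact matching rules, so this sketch assumes case-insensitive substring matching; the toy corpus is invented. Note that such a search misses paraphrases such as “security breach”.

```python
KEYWORDS = ("data breach", "policy change")


def documents_with(keyword, documents):
    """Names of documents whose text contains the keyword (case-insensitive substring match)."""
    return sorted(name for name, text in documents.items() if keyword in text.lower())


# Toy corpus standing in for the IoT data set; the texts are invented.
corpus = {
    "vendorA": "In the event of a Data Breach, we will notify you.",
    "vendorB": "We may post a policy change notice on this page.",
    "vendorC": "We collect device identifiers.",
}
counts = {keyword: len(documents_with(keyword, corpus)) for keyword in KEYWORDS}
print(counts)  # {'data breach': 1, 'policy change': 1}
```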

In the second stage of the experiments, the authors analyzed the OPP-115 data set (Wilson et al. 2016) automatically using SPARQL queries. The OPP-115 data set is annotated, and it was recently shown that its labelling scheme conforms to the GDPR requirements, although it was developed before the adoption of the GDPR (Poplavska et al. 2020). To construct the ontology automatically, the authors mapped the OPP-115 labelling scheme to the developed ontology concepts and properties and implemented a corresponding Python script. The mapping scheme is given in Table 4 in Appendix 2. The annotations of the OPP-115 data set contain excerpts from the privacy policy text that prove the presence of the privacy concepts; these excerpts were automatically extracted and assigned to the ‘evidence’ property of the corresponding class.
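The mapping step can be sketched as a lookup from OPP-115 attribute values to ontology classes. The authors’ actual mapping is given in Table 4 and their script in Novikova et al. (2022); the excerpt below is an assumption based on the OPP-115 Policy Change attributes (Notification Type, Change Type), and WebsiteNotice and PersonalNotice are hypothetical class names. Values without a specific mapping fall back to the root class, following the rule stated earlier for cases not covered by the subclasses.

```python
# Assumed excerpt of a label-to-class mapping; the authors' actual mapping is in Table 4.
MAPPING = {
    ("Notification Type", "General notice in privacy policy"): "InPrivacyPolicy",
    ("Notification Type", "General notice on website"): "WebsiteNotice",      # hypothetical
    ("Notification Type", "Personal notice"): "PersonalNotice",              # hypothetical
    ("Change Type", "Privacy relevant change"): "PrivacyRelatedCause",
    ("Change Type", "Non-privacy relevant change"): "NonPrivacyRelatedCause",
    ("Change Type", "In case of merger or acquisition"): "MergeAcquisitionCause",
}
# Unmapped values (e.g. "Unspecified", "Other") go to the root class of each hierarchy.
FALLBACK = {"Notification Type": "CommunicationMechanism", "Change Type": "PolicyChangeCause"}


def to_individuals(policy_id, annotations):
    """Turn (attribute, value, selected_text) annotations into (policy, class, evidence) rows.

    Each annotation yields its own row; attributes outside the mapping are skipped.
    """
    rows = []
    for attribute, value, text in annotations:
        cls = MAPPING.get((attribute, value), FALLBACK.get(attribute))
        if cls is not None:
            rows.append((policy_id, cls, text))
    return rows


rows = to_individuals("policy42", [
    ("Notification Type", "General notice on website", "we will post a notice on our website"),
    ("Change Type", "Unspecified", "We may update this policy"),
    ("User Choice", "None", "no choice offered"),
])
print(rows)
# [('policy42', 'WebsiteNotice', 'we will post a notice on our website'),
#  ('policy42', 'PolicyChangeCause', 'We may update this policy')]
```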

A set of SPARQL queries was developed to answer the questions of interest identified in sections Data Breach notification scenario and Policy change notification scenario.

The analysis of the OPP-115 labelling scheme showed that it contains no information about data breach notification. Therefore, the analysis focused on the privacy policy change scenario. We also extended the questions of interest with queries for reasoning about a set of policies, e.g., assessing the number of policies with a notification mechanism of a given type, or the number of policies that describe the cause of a policy change. Examples of such questions are given below:

  • How many privacy policies include information about notification mechanisms used to inform end users about policy change?

  • What notification mechanisms are used to notify an end user about a policy change?

  • How many privacy policies use general notice as notification mechanism?

  • What evidence proves the presence of the notification scenario in the privacy policy?

  • What are the causes for policy change in the OPP-115 data set?

  • What are the possible causes for a policy change in a particular privacy policy?

The developed set of SPARQL queries allows extracting the attributes of policy change notifications that are present in the text and supported by evidence in the form of privacy policy excerpts. Examples of these SPARQL queries are given below. The code of the implemented ontology for the OPP-115 data set, as well as the SPARQL queries, can be found in Novikova et al. (2022).

The SPARQL query in Listing 1 counts how many privacy policies in the ontology include information about the notification mechanisms used to inform end users about a privacy policy change. The query returns 93 privacy policies; the remaining privacy policies do not contain any information about the policy change scenario.


Listing 1 The SPARQL query requesting the number of privacy policies with notification mechanisms in the case of policy change
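The counting logic behind a query such as Listing 1 can be mirrored in Python over assumed data: mechanisms are stored as (policy, mechanism class, evidence) rows, one row per annotation, and policies are counted as distinct identifiers, as COUNT(DISTINCT ...) would do in SPARQL. The rows and policy identifiers below are invented.

```python
# Invented (policy, mechanism class, evidence) rows, one per annotation.
mechanism_rows = [
    ("policyA", "InPrivacyPolicy", "changes will be posted on this page"),
    ("policyA", "InPrivacyPolicy", "please check this page for updates"),
    ("policyB", "Email", "we will notify you by e-mail"),
]
all_policies = {"policyA", "policyB", "policyC"}

# Analogue of COUNT(DISTINCT ?policy): policyA counts once despite two annotations.
with_mechanism = {policy for policy, _, _ in mechanism_rows}
without_mechanism = sorted(all_policies - with_mechanism)

print(len(with_mechanism))   # 2
print(without_mechanism)     # ['policyC']
```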

The SPARQL query in Listing 2 reveals which notification mechanisms are used in privacy policies to notify end users about a privacy policy change. Owing to the large amount of data, the result of the query is presented only partially in Fig. 15. The first column of Fig. 15 shows the website to which the privacy policy relates, the second shows the class of the notification mechanism used to notify end users, and the third represents the evidence of the CommunicationMechanism class related to a particular practice in the privacy policy. Note that the evidence consists of the annotations made for the OPP-115 data set.


Listing 2 The SPARQL query that requests notification mechanisms specified in privacy policies

Fig. 15

Notification mechanisms defined in privacy policies (result of the Listing 2 query)

The next SPARQL query, in Listing 3, counts the different notification mechanisms mentioned in the policy change statements. The result of the query is shown in Fig. 16. Note that the total number of mechanisms exceeds the number of privacy policies, since one privacy policy may specify multiple notification mechanisms. Another reason is that one mechanism may be mentioned several times in a privacy policy and, therefore, may have been labelled several times: when the OPP-115 annotations were parsed, each annotation was treated independently of the others.


Listing 3 The SPARQL query that requests the number of different communication mechanisms in the OPP-115 dataset

Fig. 16

The number of different communication mechanisms mentioned in the OPP-115 dataset (results of the Listing 3 query)
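The difference between counting annotations and counting policies, which explains why the totals in Fig. 16 exceed the number of policies, can be shown with collections.Counter on toy (policy, mechanism class, evidence) rows. The rows are invented for illustration.

```python
from collections import Counter

# Invented (policy, mechanism class, evidence) rows; policyA's notice was annotated twice.
rows = [
    ("policyA", "InPrivacyPolicy", "we will post changes on this page"),
    ("policyA", "InPrivacyPolicy", "the revised date appears at the top"),
    ("policyA", "Email", "we may notify you by e-mail"),
    ("policyB", "InPrivacyPolicy", "updates will be posted here"),
]

# Every annotation counted as an independent individual
per_annotation = Counter(cls for _, cls, _ in rows)
# Each (policy, mechanism) pair counted once
per_policy = Counter(cls for _, cls in {(pid, cls) for pid, cls, _ in rows})

print(sorted(per_annotation.items()))  # [('Email', 1), ('InPrivacyPolicy', 3)]
print(sorted(per_policy.items()))      # [('Email', 1), ('InPrivacyPolicy', 2)]
```

Even the per-policy count (3 mechanism mentions here) can exceed the number of policies (2), because one policy may name several mechanisms.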

The SPARQL query in Listing 4 returns the number of policies that use a general notice on a website as the notification mechanism in case of a policy change. For the OPP-115 data set, this number is 49, i.e. about 43 % of its 115 policies. A general notice on the website is an easy and cheap way to deliver privacy policy changes to end users.


Listing 4 The SPARQL query that returns the number of policies with a general notice on the website as a communication mechanism

The SPARQL query in Listing 5 asks whether any causes for a privacy policy change are mentioned in the OPP-115 data set. Owing to the large amount of data, the result is presented only partially in Fig. 17, where the first column shows the website to which the privacy policy belongs, the second column shows the type of the PolicyChangeCause, and the third column shows the evidence of the PolicyChangeCause instance.


Listing 5 The SPARQL query that requests the causes for privacy policy change mentioned in the OPP-115 policies

Fig. 17

The causes for privacy policy change mentioned in the OPP-115 policies (results of the Listing 5 query)

The next SPARQL query, in Listing 6, counts the different policy change causes mentioned in the corresponding statements. As in the case of notification mechanisms, the number of causes exceeds the number of documents (see Fig. 18), for the same reasons.


Listing 6 The SPARQL query that returns a number of different policy change causes specified in the OPP-115 policies

Fig. 18

Number of different policy change causes specified in the OPP-115 policies (results of the Listing 6 query)

The last SPARQL query, in Listing 7, answers the question of which possible reasons for changing the privacy policy are specified in a given privacy policy. In the experiments, the policy was chosen at random; its identifier is “cc785552975b4cf8a554544”. The results of the query are provided in Fig. 19.


Listing 7 The SPARQL query that retrieves possible policy change causes specified in a given policy

Fig. 19

Results of the Listing 7 query
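Restricting the lookup to a single policy, as Listing 7 does, amounts to filtering by the policy identifier and collecting the distinct cause classes. The sketch below uses invented rows and hypothetical identifiers rather than the OPP-115 data.

```python
# Invented (policy, cause class, evidence) rows; one row per annotation.
causes = [
    ("policyA", "PrivacyRelatedCause", "if we change how we use your information"),
    ("policyA", "PolicyChangeCause", "we may update this policy from time to time"),
    ("policyA", "PrivacyRelatedCause", "changes to the data we collect"),
    ("policyB", "MergeAcquisitionCause", "in the event of a merger"),
]


def causes_for(policy_id, rows):
    """Distinct cause classes recorded for one policy (cf. SELECT DISTINCT over one policy)."""
    return sorted({cls for pid, cls, _ in rows if pid == policy_id})


print(causes_for("policyA", causes))  # ['PolicyChangeCause', 'PrivacyRelatedCause']
```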

Finally, the experiments showed that neither the consequences of declining a privacy policy nor the time period for deciding on it are present in the texts of the OPP-115 policies. The absence of these notions may be explained by the date of the policies’ creation (before the GDPR adoption); however, a similar situation is observed for the latest privacy policies of IoT devices and cloud computing services.

Discussion and conclusions

Supercomputing centers are currently a widely adopted solution for analyzing large volumes of data collected by different devices, applications, and services. Such data often include sensitive data, such as health data, location data, etc. Existing legislative regulations (GDPR 2016; HIPAA 1996; CCPA 2018) establish strict requirements for the processing of protected data. One of these requirements is to make data processing transparent to data subjects and to ensure that they are aware of what types of personal data are processed, what the purposes of the processing are, and what options for controlling data privacy are available to them. This information has to be presented in the text of the privacy policy. Privacy policy analysis is currently a highly researched topic, and numerous solutions have been proposed to increase the transparency of privacy policies to end users. However, in most cases, researchers investigate aspects relating to data collection, retention, and third-party sharing. The scenarios relating to user notification in case of a data breach or policy change are not studied well enough, although the implemented analysis showed that they support the right to be informed defined in GDPR, Art. 13.

In this paper, the authors addressed the problem of creating formal models of notification scenarios in case of a data breach or policy change using an ontology-based approach. In order to define the key attributes, the authors analyzed the requirements of the GDPR, recommendations provided by companies specializing in privacy policy design, and existing privacy policies. This enabled the authors to describe the key concepts and define semantic links between them. On the basis of the identified scenario attributes, questions of interest (or competence questions) were formulated and used to define SPARQL queries to assess the correctness of the developed ontology. These queries were also used to analyze privacy policies developed both by large companies providing cloud computing and big data analytics services and by small IoT device manufacturers.

The experiments showed that the privacy policies of the selected companies providing cloud computing services do not contain information about notifications in case of data breaches, and only 10% of the privacy policies developed for IoT devices provide such information. One possible reason is that this scenario is described in detail in legislative documents, so there is no need to specify it in the privacy policy, as data subjects may refer to the legislative documents directly. However, this assumption contradicts the findings of Zou and Schaub (2019), who showed that users rarely consider the risks arising from data breaches and, thus, do not read such documents. Moreover, these risks depend on the types and amount of personal data being collected and are specific to each service and product; therefore, this information, along with the notification mechanism, has to be included in the privacy policy as part of the data breach notification scenario.
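The coverage figure above is a simple ratio: the number of policies in which at least one data breach notification scenario was found, divided by the number of policies analysed in a group. The sketch below computes it per group; the group labels and the sample counts are invented for illustration and are not the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyResult:
    policy_id: str
    group: str               # hypothetical labels, e.g. "cloud" or "iot"
    has_breach_notice: bool  # True if the breach-notification query returned any rows


def coverage_by_group(results: list[PolicyResult]) -> dict[str, float]:
    """Percentage of policies in each group that describe breach notification."""
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    for r in results:
        totals[r.group] = totals.get(r.group, 0) + 1
        hits[r.group] = hits.get(r.group, 0) + int(r.has_breach_notice)
    return {g: 100.0 * hits[g] / totals[g] for g in totals}


# Toy input: 8 IoT policies (2 with a breach notice) and 4 cloud policies (none).
sample = [PolicyResult(f"iot-{i}", "iot", i < 2) for i in range(8)]
sample += [PolicyResult(f"cloud-{i}", "cloud", False) for i in range(4)]
print(coverage_by_group(sample))
```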

Almost all privacy policies include information about policy updates. However, this information is usually very brief and vague. The most common way to notify the end user is an update notice posted on the web page of the privacy policy. Thus, the responsibility for monitoring updates and changes in these documents is delegated to the end user. This solution cannot be considered a way of increasing the data subjects’ awareness of personal data processing, because it has been shown that people tend to read privacy policies only if they are actively engaged in reading them (Karegar et al. 2020).
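The distinction drawn here between a passive notice (an update posted on the policy page, which the user has to look for) and active notification (for example, an e-mail or in-app message) can be encoded once the notification mechanisms of a policy-change scenario have been extracted. The sketch below flags policies that rely only on passive mechanisms; the mechanism vocabulary and its split into passive and active are assumptions made for this example.

```python
from enum import Enum


class Mechanism(Enum):
    # Hypothetical vocabulary of policy-change notification mechanisms.
    POSTED_ON_POLICY_PAGE = "posted_on_policy_page"
    EMAIL = "email"
    IN_APP_MESSAGE = "in_app_message"


# Assumption: only posting on the policy page requires the user to monitor the document.
PASSIVE = {Mechanism.POSTED_ON_POLICY_PAGE}


def passive_only(mechanisms: set[Mechanism]) -> bool:
    """True if the policy names at least one mechanism and all of them are passive."""
    return bool(mechanisms) and mechanisms <= PASSIVE


def flag_policies(policies: dict[str, set[Mechanism]]) -> list[str]:
    """Ids of policies that delegate update monitoring to the user."""
    return sorted(pid for pid, mech in policies.items() if passive_only(mech))


policies = {
    "a": {Mechanism.POSTED_ON_POLICY_PAGE},
    "b": {Mechanism.POSTED_ON_POLICY_PAGE, Mechanism.EMAIL},
    "c": set(),  # no mechanism stated at all
}
print(flag_policies(policies))
```

Policy `c`, which states no mechanism, is deliberately not flagged; a completeness check over scenario attributes is the better place to catch it.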

These findings lead the authors to conclude that the transparency of the user notification scenarios presented in privacy policies is still very low, and that companies should reconsider the notification mechanisms in both cases and provide more detailed information in their privacy policies. Implementing actionable notifications may significantly increase the end user's overall awareness of privacy risks arising from the processing of personal data and help maintain users’ trust in the processing entity in case of data breaches. The proposed ontology could be used to evaluate a privacy policy and reason about its transparency and completeness in the context of the identified questions of interest.
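One way to turn reasoning about completeness into a number, assumed here rather than taken from the paper, is to run every competence question against a policy and report the fraction that return a non-empty answer. In the sketch each question is a plain Python lookup over a dictionary of extracted facts, standing in for a SPARQL query; the question texts and fact keys are hypothetical.

```python
from typing import Callable

Facts = dict[str, list[str]]

# Hypothetical competence questions, each answered by looking up extracted facts.
QUESTIONS: dict[str, Callable[[Facts], list[str]]] = {
    "How is the user notified of a data breach?": lambda f: f.get("breach_notification", []),
    "Within what time frame is a breach reported?": lambda f: f.get("breach_deadline", []),
    "How is the user notified of a policy change?": lambda f: f.get("change_notification", []),
    "Why may the policy change?": lambda f: f.get("change_cause", []),
}


def completeness(facts: Facts) -> tuple[float, list[str]]:
    """Share of questions with a non-empty answer, plus the unanswered questions."""
    unanswered = [q for q, ask in QUESTIONS.items() if not ask(facts)]
    score = 1 - len(unanswered) / len(QUESTIONS)
    return score, unanswered


policy = {"change_notification": ["posted_on_policy_page"], "change_cause": ["new_features"]}
score, gaps = completeness(policy)
print(f"{score:.0%} answered; missing: {gaps}")
```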

The experiments also revealed certain limitations of the proposed approach. The first relates to automating the ontology construction. The authors believe that this task could be solved using machine learning, namely natural language processing techniques; however, this requires developing a labelling scheme that conforms to the proposed ontology classes and annotating a privacy policy corpus.
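A labelling scheme of this kind is essentially a mapping from annotation labels to ontology classes, so that each annotated text span can populate the ontology as an individual. The sketch below assumes annotations arrive as (label, text span) pairs, for instance from a human annotator or an NLP model; the label names, class names and generated individual identifiers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labelling scheme: annotation label -> ontology class.
LABEL_TO_CLASS = {
    "BREACH_NOTICE_CHANNEL": "BreachNotificationMechanism",
    "BREACH_NOTICE_DEADLINE": "BreachNotificationPeriod",
    "CHANGE_NOTICE_CHANNEL": "PolicyChangeNotificationMechanism",
    "CHANGE_CAUSE": "PolicyChangeCause",
}


@dataclass(frozen=True)
class Annotation:
    label: str
    text: str


def to_triples(policy_id: str, annotations: list[Annotation]) -> list[tuple[str, str, str]]:
    """Turn annotated spans into (subject, predicate, object) triples."""
    triples: list[tuple[str, str, str]] = []
    for i, ann in enumerate(annotations):
        cls = LABEL_TO_CLASS.get(ann.label)
        if cls is None:
            continue  # label lies outside the scheme, so it cannot populate the ontology
        individual = f"{policy_id}#ind{i}"
        triples.append((individual, "rdf:type", cls))
        triples.append((individual, "has_text", ann.text))
        triples.append((policy_id, "mentions", individual))
    return triples


anns = [
    Annotation("CHANGE_NOTICE_CHANNEL", "we will post the updated policy on this page"),
    Annotation("THIRD_PARTY_SHARING", "we share data with partners"),
]
for t in to_triples("p42", anns):
    print(t)
```

Labels outside the scheme are skipped rather than guessed, which is why the scheme has to be designed to match the ontology classes before annotation starts.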

Another limitation is that the modelled notification scenarios consider only the attributes that can be found in privacy policies. Although the introduced ‘binded_to’ relation can link a set of interrelated activities, it cannot establish the temporal sequence in which they are carried out. This problem arises mainly from the difficulty of accurately recognizing such concepts in real text, and addressing it is part of future work.
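The limitation can be made concrete. If `binded_to` is read as an undirected link between activities, the most that can be derived from it is which activities belong together; an execution order requires a directed relation. The sketch contrasts the two using the standard library; the activity names and the directed `precedes` relation are hypothetical and are not part of the proposed ontology.

```python
from graphlib import TopologicalSorter

# Hypothetical activities linked by 'binded_to' (read here as undirected).
BINDED_TO = [
    ("detect_breach", "notify_authority"),
    ("notify_authority", "notify_users"),
    ("update_policy", "notify_policy_change"),
]


def groups(pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Connected groups of activities; no order can be recovered from them."""
    adjacency: dict[str, set[str]] = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen: set[str] = set()
    result: list[set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        result.append(component)
    return result


# A hypothetical directed 'precedes' relation (activity -> its predecessors) yields an order.
PRECEDES = {"notify_authority": {"detect_breach"}, "notify_users": {"notify_authority"}}

print(groups(BINDED_TO))
print(list(TopologicalSorter(PRECEDES).static_order()))
```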

The identified limitations define directions for future work. The first direction relates to the automatic detection of ontology concepts in privacy policies; it includes the task of privacy policy annotation as well as the development of machine learning techniques to analyze the documents. Another direction consists in detailing and validating the data processor’s obligations in case of a data breach in order to model all the processes related to this activity.

References

  • 3plususa (2021) Available online https://3plususa.com. Accessed 20 Jan 2021

  • Amazon Web Services (2022) Available online https://aws.amazon.com/en/privacy/. Accessed 20 June 2022

  • Ashley P, Hada S, Karjoth G, Schunter M (2007) The description logic handbook: theory, implementation and applications. Cambridge University Press, Cambridge

  • Ashley P, Hada S, Karjoth G, Schunter M (2002) E-P3P privacy policies and privacy authorization. In: Proceedings of the 2002 ACM Workshop on Privacy in the Electronic Society. WPES ’02, pp 103–109. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/644527.644538

  • Azraoui M, Elkhiyaoui K, Önen M, Bernsmed K, De Oliveira AS, Sendor J (2015) A-PPL: an accountability policy language. In: Garcia-Alfaro J, et al (eds) Data privacy management, autonomous spontaneous security, and security assurance. DPM 2014, QASA 2014, SETOP 2014, Lecture Notes in Computer Science, vol. 8872, pp 319–326. Springer, Switzerland, Cham. https://doi.org/10.1007/978-3-319-17016-9_21

  • Bawany NZ, Shaikh ZA (2017) Data privacy ontology for ubiquitous computing. Int J Adv Comput Sci Appl 8(1). https://doi.org/10.14569/IJACSA.2017.080120

  • Blinded: Blinded for the review

  • California consumer privacy act home page (2018) Available online https://oag.ca.gov/privacy/ccpa. Accessed 20 Jan 2021

  • Cano-Benito J, Cimmino A, García-Castro R (2021) Toward the ontological modeling of smart contracts: a solidity use case. IEEE Access 9:140156–140172. https://doi.org/10.1109/ACCESS.2021.3115577

  • Data Privacy Vocabulary (DPV) (2018) Available online https://w3c.github.io/dpv/dpv/#sotd. Accessed 21 Oct 2021

  • Draw.io (2022) Available online https://app.diagrams.net. Accessed 20 Oct 2021

  • Elluri L, Joshi KP (2018) A knowledge representation of cloud data controls for EU GDPR compliance. In: 2018 IEEE World Congress on Services (SERVICES), pp 45–46. https://doi.org/10.1109/SERVICES.2018.00036

  • Esteves B, Rodríguez-Doncel V (2022) Analysis of ontologies and policy languages to represent information flows in GDPR. Semantic Web

  • GDPR privacy notice template (2019) Available online https://gdpr.eu/privacy-notice. Accessed 20 June 2022

  • General Data Protection Regulation (2016) Available online https://gdpr.eu. Accessed 20 Jan 2021

  • Gerl A, Bennani N, Kosch H, Brunie L (2018) LPL, towards a GDPR-compliant privacy language: formal definition and usage. Trans Large-Scale Data-Knowl-Centered Syst, vol. 37, pp 41–80. Springer, Switzerland, Cham

  • Gharib M, Giorgini P, Mylopoulos J (2020) COPri: a core ontology for privacy requirements engineering. In: Research challenges in information science. lecture notes in business information processing, vol. 385, pp 472–489. https://doi.org/10.1007/978-3-030-50316-1_28

  • Gharib M, Giorgini P, Mylopoulos J (2021) COPri v.2: a core ontology for privacy requirements. Data Knowl Eng 133:101888. https://doi.org/10.1016/j.datak.2021.101888

  • Glimm B, Horrocks I, Motik B, Stoilos G, Wang Z (2014) HermiT: an OWL 2 reasoner. J Autom Reason 53(3):245–269. https://doi.org/10.1007/s10817-014-9305-1

  • Gonzalez-Granadillo G, Menesidou SA, Papamartzivanos D, Romeu R, Navarro-Llobet D, Okoh C, Nifakos S, Xenakis C, Panaousis E (2021) Automated cyber and privacy risk management toolkit. Sensors 21(16):5493. https://doi.org/10.3390/s21165493

  • Google Cloud (2022) Available online https://cloud.google.com/terms/cloud-privacy-notice. Accessed 20 June 2022

  • Gopinath AAM, Wilson S, Sadeh NM (2018) Supervised and unsupervised methods for robust separation of section titles and prose text in web documents. In: EMNLP

  • GraphDB by Ontotext (2021) Available online https://www.ontotext.com/products/graphdb/. Accessed 26 June 2022

  • Harkous H, Fawaz K, Lebret R, Schaub F, Shin KG, Aberer K (2018) Polisis: automated analysis and presentation of privacy policies using deep learning. https://arxiv.org/abs/1802.02561

  • Health insurance portability and accountability act (1996) Available online https://www.hhs.gov/hipaa/for-individuals/index.html. Accessed 20 June 2022

  • Hewlett Packard Enterprise privacy notice (2022) Available online https://www.hpe.com/us/en/legal/privacy.html. Accessed 26 June 2022

  • Karegar F, Pettersson JS, Fischer-Hübner S (2020) The dilemma of user engagement in privacy notices: effects of interaction modes and habituation on user attention. ACM Trans Priv Secur 23(1):38. https://doi.org/10.1145/3372296

  • Karjoth G, Schunter M (2002) A privacy policy model for enterprises. In: Proceedings 15th IEEE computer security foundations workshop. CSFW-15, pp 271–281. https://doi.org/10.1109/CSFW.2002.1021821

  • Kost M, Freytag JC (2012) Privacy analysis using ontologies. In: Proceedings of the second ACM conference on data and application security and privacy. CODASPY ’12, pp 205–216. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/2133601.2133627

  • Kuznetsov M, Novikova E, Kotenko I (2022) An approach to formal description of the user notification scenarios in privacy policies. In: 2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), Valladolid, Spain, pp 275–282. https://doi.org/10.1109/PDP55904.2022.00049

  • Lamy J (2017) Owlready: ontology-oriented programming in python with automatic classification and high level constructs for biomedical ontologies. Artif Intell Med 80:11–28

  • Leicht J, Heisel M (2019) A survey on privacy policy languages: expressiveness concerning data protection regulations. In: 12th CMI Conference on Cybersecurity and Privacy (CMI), pp 1–6. https://doi.org/10.1109/CMI48017.2019.8962144

  • Thomas L (2021) Most victims of data breaches are unaware. Michigan Today. Available online https://michigantoday.umich.edu/2021/06/25/most-victims-of-data-breaches-remain-unaware. Accessed 21 Oct 2021

  • Novikova E, Doynikova E, Kotenko I (2020) P2onto: making privacy policies transparent. In: Katsikas S, et al (eds.) Computer Security. CyberICPS 2020, SECPRE 2020, ADIoT 2020, Lecture Notes in Computer Science, vol. 12501, pp 235–252. Springer, Switzerland, Cham. https://doi.org/10.1007/978-3-030-64330-0_15

  • Novikova E, Kuznetsov M, Kotenko I (2022) The enhanced P2Onto ontology. GitHub repository. Available online https://github.com/kuznetsovmd/privacy-ontology. Accessed 13 Oct 2023

  • Oltramari A et al (2018) PrivOnto: a semantic framework for the analysis of privacy policies. Semantic Web 9:185–203

  • Palmirani M, Martoni M, Rossi A, Bartolini C, Robaldo L (2018) Pronto: privacy ontology for legal reasoning. In: Kő A, Francesconi E (eds) Electronic Government and the information systems perspective. Springer, Cham, pp 139–152

  • Pandit HJ, O’Sullivan D, Lewis D (2018) An ontology design pattern for describing personal data in privacy policies. In: WOP@ISWC

  • Pandit HJ, Debruyne C, O’Sullivan D, Lewis D (2019) GConsent: a consent ontology based on the GDPR. In: Hitzler P, Fernández M, Janowicz K, Zaveri A, Gray AJG, Lopez V, Haller A, Hammar K (eds) The semantic web. Springer, Cham, pp 270–282

  • Pardo R, Le Métayer D (2019) Analysis of privacy policies to enhance informed consent. DBSec 2019, Lecture Notes in Computer Science, vol. 11559, pp. 177–198. Springer, Switzerland, Cham. https://doi.org/10.1007/978-3-030-22479-0_10

  • PDPA overview (2012) Available online https://www.pdpc.gov.sg/Overview-of-PDPA/The-Legislation/Personal-Data-Protection-Act. Accessed 20 Jan 2021

  • Poplavska E, Norton TB, Wilson S, Sadeh NM (2020) From prescription to description: mapping the GDPR to a privacy policy corpus annotation scheme. In: JURIX

  • Protégé is a free, open-source ontology editor and framework for building intelligent systems (2014) Available online https://protege.stanford.edu. Accessed 20 Oct 2021

  • Santoro F, Baião F, Rodrigues Teixeira B (2018) MyMemory: an ontology for privacy protection in external digital memories. In: Proceedings of the second AMCIS conference, available online https://aisel.aisnet.org/amcis2018/Philosophy/Presentations/2

  • Tang Y, Meersman R (2002) Judicial support systems: ideas for a privacy ontology-based case analyzer. Lecture Notes Comput Sci 3762:800–807. https://doi.org/10.1007/11575863_100

  • Tesfay WB, Hofmann P, Nakamura T, Kiyomoto S, Serna J (2018) Privacyguide: towards an implementation of the EU GDPR on internet privacy policy evaluation. In: Proceedings of the fourth ACM international workshop on security and privacy analytics. IWSPA ’18, pp 15–21. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3180445.3180447

  • The World Wide Web Consortium (W3C) (1994) Available online https://www.w3.org/. Accessed 10 Oct 2023

  • Torre D, Soltana G, Sabetzadeh M, Briand LC, Auffinger Y, Goes P (2019) Using models to enable compliance checking against the GDPR: an experience report. In: 2019 ACM/IEEE 22nd international conference on model driven engineering languages and systems (MODELS), pp 1–11. https://doi.org/10.1109/MODELS.2019.00-20

  • Wilson S, Schaub F, Dara AA, Liu F, Cherivirala S, Leon PG (2016) The creation and analysis of a website privacy policy corpus. In: Proceedings of the 54th annual meeting of the association for computational linguistics, pp 1330–1340

  • Wilson S, Schaub F, Liu F, Sathyendra KM, Smullen D, Zimmeck S, Ramanath R, Story P, Liu F, Sadeh N, Smith NA (2018) Analyzing privacy policies at scale: from crowdsourcing to automated annotations. ACM Trans. Web 13(1). https://doi.org/10.1145/3230665

  • Yandex privacy notice (2022) Available online https://yandex.ru/legal/confidential/. Accessed 26 June 2022

  • Zimmeck S, et al (2019) MAPS: scaling privacy compliance analysis to a million apps. Proc Priv Enhancing Technol 2019(2):66–86

  • Zou Y, Schaub F (2019) Beyond mandatory: making data breach notifications useful for consumers. IEEE Security Privacy 17(2):67–72. https://doi.org/10.1109/MSEC.2019.2897834

Author information

Contributions

EN: Conceptualization and Methodology, Formal analysis, Writing - Review & Editing; MK: Formal analysis, Software, Data curation, Writing - Original Draft, Visualization; IK: Validation, Writing - Review & Editing, Project supervision.

Corresponding author

Correspondence to Evgenia Novikova.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: The enhanced P2Onto ontology

See Fig. 20.

Fig. 20

Overview of the classes and properties of the enhanced P2Onto ontology

Appendix 2: The mapping of P2Onto for OPP-115 scenarios

See Table 4.

Table 4 Personal data processing scenarios and their mapping to P2Onto

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kuznetsov, M., Novikova, E. & Kotenko, I. Modelling user notification scenarios in privacy policies. Cybersecurity 7, 41 (2024). https://doi.org/10.1186/s42400-024-00234-8

  • DOI: https://doi.org/10.1186/s42400-024-00234-8

Keywords