PhD Theses

PhD theses from CMI

Systematic Analysis and Visualization of Privacy Policies of Online Services

Prashant Shantaram Dhotre


Due to the advancement in mobile and wireless communications in today’s digital world, Internet services like social networks, search engines, etc. have brought many benefits to the users. However, most of the services also collect excessive information about the users and their day-to-day activities online. Using “Big Data” technologies, the user information is collected and analysed by the service providers to improve their services, an approach giving rise to several privacy concerns.

For service providers, user information has become an important part of their business model and an economic asset. On the service provider side, the user information is collected, stored, processed, and analysed to get additional value from it, often without the users’ consent, which constitutes a major privacy risk. Once the information has been disclosed, the users have no control over it.

Although the business practices of the service providers are usually specified in the form of privacy policies (terms of use), these documents are time-consuming to read and complicated to understand, and users do not really know what happens to their data. Hence, increasing privacy awareness is an important means of strengthening the users' position in relation to the service providers.

Presently, several privacy awareness tools are available, e.g. website rating tools (based on users’ experience) and blocking tools (blocking hidden data trackers, advertisers, third parties, etc.). However, there is still a clear need to increase users’ privacy awareness and to assist them in understanding the content of privacy policies. This was confirmed by a comprehensive survey of Indian users, which was carried out during this project.

In the thesis, a new Privacy Policy Elucidator Tool (PPET) is proposed and implemented. It is capable of classifying, summarizing and visualizing the contents of privacy policies of service providers. Using a Naïve Bayes approach, the PPET classifies the contents of privacy policies into different sections, dealing with the collection, sharing, usage, protection, and management of user information. The tool extracts and summarizes the policy content, provides a graphic visualization of it, and thereby helps users learn about and understand the practices of service providers.
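To illustrate the classification step, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing. It is not the PPET implementation: the training sentences, the label names and the functions tokenize, train and classify are hypothetical and chosen only to mirror the five sections named above.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training data: (policy sentence, section label).
# These sentences are illustrative and are not taken from the thesis.
TRAINING_SAMPLES = [
    ("we collect your name email address and location", "collection"),
    ("we collect information about the device you use", "collection"),
    ("we share your information with third party advertisers", "sharing"),
    ("your data may be shared with our partners", "sharing"),
    ("we use your information to improve our services", "usage"),
    ("we use data to personalize content", "usage"),
    ("we protect your data with encryption and access controls", "protection"),
    ("security measures protect information from unauthorized access", "protection"),
    ("you can delete or update your account information", "management"),
    ("you may manage your privacy settings and delete data", "management"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Count labels and per-label word frequencies; return them with the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior under Naive Bayes."""
    label_counts, word_counts, vocabulary = model
    total_samples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Log prior of the section label.
        score = math.log(count / total_samples)
        # Denominator for add-one (Laplace) smoothing.
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_SAMPLES)
print(classify("we share data with third party partners", model))
print(classify("you can delete your account", model))
```

A real system would be trained on a much larger set of labelled policy segments; the sketch only shows how word frequencies per section drive the assignment of a new sentence to one of the sections.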

For testing and performance evaluation of the PPET, a number of training and testing records in the form of a matrix were used. The PPET achieved more than 95% accuracy in classifying privacy policies into the predefined sections. This result is also supported by the analysis of the user feedback on the PPET. According to the user feedback, the PPET motivated users to read the privacy policies and helped them enhance their privacy awareness.
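The abstract does not state how the accuracy figure was computed. As a hedged sketch, assuming accuracy means the fraction of test records whose predicted section matches the true section, it could be calculated as follows. The functions accuracy and confusion_counts and the example labels are hypothetical and do not reproduce the thesis data.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of records whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one record is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical labels for four test records (illustrative only).
true_sections = ["collection", "sharing", "usage", "sharing"]
predicted_sections = ["collection", "sharing", "sharing", "sharing"]

print(accuracy(true_sections, predicted_sections))
print(confusion_counts(true_sections, predicted_sections))
```

The pair counts show which sections are confused with each other, which a single accuracy number hides.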

Another important result of this work is the finding that current unstructured privacy policies do not comply with general privacy design guidelines and privacy regulations. Hence, this thesis also proposes a standardized, uniform template for privacy policies. The template is aligned with the upcoming EU General Data Protection Regulation (GDPR), which will be enforced from 2018.

Date of defence: 2017.12.18

