Understanding Worldwide Private Information Collection on Android
Data plays a vital role in running much of the online world and the digital ecosystem. As smart devices, especially smartphones, become more critical for users worldwide, we often do not realize that they are also reliable sources of rich information about us (e.g., where we go and what we do).
Most mobile apps require access to some of your information and request specific permissions on the device you are using. In most cases, your information is shared once you grant those permissions. After approval is given, however, you probably won’t remember which app collects what information, not to mention where the data is transmitted and who may further process, use, and control it.
Therefore, it is challenging to get a broad view of the information collected by mobile apps. For instance, you may ask: how many apps on my smartphone collect private information, what kind of data do these apps collect, and which companies process and store my confidential data? We conducted a study together with researchers from Boston University in which we investigated 22 categories of information that may affect the user’s privacy (listed in Table 1). Our goal is to address the above questions and understand worldwide private information collection on Android phones by analyzing the information flows (i.e., which app sends what information to which domain) generated by 2.1M different applications installed by 17.3M users over 21 months between 2018 and 2019.
Is Private Information Collection Pervasive in Mobile Apps?
It is now common practice for the apps installed on your smartphone to request information about you and the device (e.g., your name, email address, geographic location) before you can use them. Here we examine whether private information collection is pervasive in mobile apps.1
By analyzing our dataset, we discovered that, on average, a mobile app sends private information to 2 different domains. We also observed that over 57.6K apps (installed on 12.8M devices collectively) collect at least 5 unique categories of private information and send them to at least 5 unique domains. Our findings confirm that private information collection in mobile apps is both pervasive and diverse, which highlights the need for an additional security and privacy layer on the devices.
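The two statistics above can be illustrated with a minimal sketch. It assumes the flows are available as (app, category, domain) records; the record layout, the sample values, and the function name `summarize` are hypothetical and are not taken from the study's actual pipeline.

```python
from collections import defaultdict

# Hypothetical flow records: (app, information category, destination domain).
flows = [
    ("app.a", "location", "ads.example"),
    ("app.a", "device", "ads.example"),
    ("app.a", "device", "stats.example"),
    ("app.b", "sim card", "ads.example"),
]

def summarize(flows, min_categories=5, min_domains=5):
    """Return (average domains per app, apps meeting both thresholds)."""
    domains = defaultdict(set)
    categories = defaultdict(set)
    for app, category, domain in flows:
        domains[app].add(domain)
        categories[app].add(category)
    if not domains:
        return 0.0, []
    # Average number of distinct destination domains per app.
    avg_domains = sum(len(d) for d in domains.values()) / len(domains)
    # Apps that collect many categories and send them to many domains.
    heavy = sorted(
        app for app in domains
        if len(categories[app]) >= min_categories
        and len(domains[app]) >= min_domains
    )
    return avg_domains, heavy

# Thresholds are lowered to 2 here only because the sample data is tiny.
avg_domains, heavy_collectors = summarize(flows, min_categories=2, min_domains=2)
```

With the defaults of 5 and 5, the second return value corresponds to the kind of app counted in the 57.6K figure.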
Figure 1. Top 25 data controlling apps ranked by the fraction of devices they collect private information from. These 25 data controllers collect personal data from 13.9M devices in total, covering 80.2% of all devices used in this study.
Figure 2. Heatmap illustrating the global top 20 domains that collect the top 12 types of private information. Each row is normalized to [0, 1] by a PIC (private information collection) domain’s total device penetration rate. Darker red implies that more devices are sending their data to a PIC domain (i.e., a public domain).
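One plausible reading of the row normalization in Figure 2 is sketched below: each cell is the number of devices sending a given category to a domain, divided by the number of devices sending anything to that domain. This interpretation, the input layout, and the name `normalize_rows` are assumptions made for illustration; dividing device counts is equivalent to dividing penetration rates as long as both use the same device population.

```python
def normalize_rows(device_counts, domain_totals):
    """Scale each domain's row to [0, 1] by the domain's total device count.

    device_counts[domain][category] -> devices sending that category to domain
    domain_totals[domain]           -> devices sending anything to domain
    """
    heatmap = {}
    for domain, per_category in device_counts.items():
        total = domain_totals.get(domain, 0)
        heatmap[domain] = {
            category: (count / total if total else 0.0)
            for category, count in per_category.items()
        }
    return heatmap

# Hypothetical counts for a single domain.
counts = {"tracker.example": {"device": 80, "location": 20}}
totals = {"tracker.example": 100}
heatmap = normalize_rows(counts, totals)
```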
Who collects and processes private information?
We further analyzed who ultimately obtains and processes the information collected by the mobile apps. We used our authorized technology to reveal the ownership of the domains to which the private information was transmitted. These domains were then ranked by the fraction of devices they collect private information from. Figure 1 depicts the top 25 data processors and controllers. These data processors and controllers gather private information from 13.9M devices. Notably, 2 out of 3 devices would have their information collected by either Facebook or Alphabet. Figure 2 depicts the top 12 types of private data collected by the global top 20 domains. We observed that the companies behind these domains consistently collect four types of personal information from users: device, SIM card, location, and settings information. These kinds of data enable them to track users more systematically.
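The ranking step described above can be sketched as follows, assuming per-device observations and a domain-to-owner mapping are available. The owner names, domains, and the functions `rank_owners` and `combined_reach` are hypothetical; the ownership lookup used in the study is not described in enough detail to reproduce.

```python
from collections import defaultdict

# Hypothetical observations: (device id, destination domain).
observations = [
    ("d1", "graph.social.example"),
    ("d1", "ads.search.example"),
    ("d2", "ads.search.example"),
    ("d3", "cdn.other.example"),
]
# Hypothetical mapping from a domain to the company that owns it.
owner_of = {
    "graph.social.example": "SocialCo",
    "ads.search.example": "SearchCo",
    "cdn.other.example": "OtherCo",
}

def rank_owners(observations, owner_of):
    """Rank owners by the fraction of all devices they receive data from."""
    devices_by_owner = defaultdict(set)
    all_devices = set()
    for device, domain in observations:
        all_devices.add(device)
        # Fall back to the domain itself when the owner is unknown.
        devices_by_owner[owner_of.get(domain, domain)].add(device)
    ranking = sorted(
        ((owner, len(devs) / len(all_devices))
         for owner, devs in devices_by_owner.items()),
        key=lambda item: (-item[1], item[0]),
    )
    return ranking, devices_by_owner, all_devices

def combined_reach(devices_by_owner, all_devices, owners):
    """Fraction of devices reached by at least one of the given owners."""
    if not all_devices:
        return 0.0
    reached = set().union(*(devices_by_owner.get(o, set()) for o in owners))
    return len(reached) / len(all_devices)

ranking, devices_by_owner, all_devices = rank_owners(observations, owner_of)
reach = combined_reach(devices_by_owner, all_devices, ["SearchCo", "SocialCo"])
```

Using sets of device ids means a device that contacts both owners is counted once, which is what a statement such as "2 out of 3 devices" for two companies combined requires.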
Figure 3. Sankey diagrams illustrating private information flows between EU27 (European Union) and top 20 domain locations.
GDPR and its impact on private information flow
The European Union’s (EU) General Data Protection Regulation (GDPR) entered into effect on May 25th, 2018. However, the implementation of GDPR did not significantly change the flow of personal data from EU countries to countries outside the EU (see Figure 3). Our observations of these data flows show that little of the data stays confined within the EU. Germany and Ireland are the only two European countries that host a reasonable portion of the private information gathered from Europe. At the same time, the United States dominates personal information collection in the EU.
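To make the notion of confinement concrete, the sketch below computes the share of EU-originating flow volume that is hosted inside the EU27. The flow tuples and the function name `eu_confinement` are hypothetical, and the sample numbers are invented purely for illustration; they are not results from the study.

```python
from collections import Counter

# ISO 3166-1 alpha-2 codes of the EU27 member states.
EU27 = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
}

def eu_confinement(flows):
    """flows: iterable of (device country, hosting country, device count).

    Returns the share of EU-originating volume hosted inside the EU27 and
    the hosting countries ordered by the volume they receive from the EU.
    """
    total = 0
    inside = 0
    by_host = Counter()
    for origin, host, devices in flows:
        if origin not in EU27:
            continue  # only flows that start in the EU are considered
        total += devices
        by_host[host] += devices
        if host in EU27:
            inside += devices
    share = inside / total if total else 0.0
    return share, by_host.most_common()

# Invented sample flows.
sample = [("FR", "US", 70), ("FR", "DE", 20), ("IT", "IE", 10), ("US", "US", 50)]
share, hosts = eu_confinement(sample)
```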
PDPB and its impact on private information flow
The EU requires that any flow of personal data from one country to another comply with all data privacy laws and regulations that apply to both parties, the recipient and the party transferring the data. Similarly, the protection of data privacy must be confirmed before any such agreement takes place. Collection of data that does not comply with the protection provisions guaranteed by the GDPR will be suspended.2 This somewhat controls the private information flow in the EU.
In India, a recently approved regulation, the Personal Data Protection Bill (PDPB), covers sensitive personal data, such as passwords, financial data, official identifiers, and genetic data, along with other basic personal data like names, fingerprints, contact details, and more. The PDPB is, in fact, one of the most comprehensive laws regarding data privacy and seems to contain some strict prohibitions on data flow, similar to those of the GDPR. It requires all business bodies to reassess their safeguards, data processing practices, and policies. This will help regulate the flow of personal data and control what data has to be protected.
Why do you see intrusive ads?
Potentially harmful applications (PHAs) could put users, user data, or their devices at risk (e.g., trojan and spyware infection, etc.).3 We identified 1.2M PHAs installed on 3.8M devices. We uncovered that 116K PHAs (installed on 393K devices) collect operator information, and 63K PHAs (installed on 280K devices) also collect running app information on a global scale. As we can see in Figure 4, such aggressive private information collection behavior enables adversaries to profile users better and may lead to intrusive advertisements. For example, we also uncovered that 590K devices with PHAs installed are affected by notification bar ads (i.e., ads displayed as app notifications), and 317K devices suffer from shortcut ads (i.e., targeted ads placed on the home screen).
Figure 4. Heatmap illustration of private information collection by PHAs in different regions.
Implications to the research community and the policymakers
Our findings highlight several challenges faced by the research community when studying private information collection on Android. We showed that looking at device penetration is critical for observing the distribution of information collected online. We also hope that our study will encourage policymakers to think critically about what private information is collected, how it is used and shared among companies, and how accountability and customer choice can be truly guaranteed.
Implications to the consumers
Your privacy is critical, and protecting your personal data is key to remaining safe. Taking careful steps can help reduce the risk of cybercrimes such as identity theft and blackmail or, worse, of hackers selling your private information on the dark web. We have the following recommendations for users who want to take more control over their privacy on their mobile devices.
Read Privacy Policies
Turn off ad personalization
Every Android device has a unique Advertising ID. It allows app developers and the Google ad network to identify your Android device and then target you with ads. You can opt out of ad personalization so that your Advertising ID won't be used by the services to target ads to you.
Raise privacy awareness
You can take different steps to protect your personal information while using your mobile device for your daily tasks. First, it’s vital that you are aware of privacy risks and of ways to remain shielded against those threats. At the same time, you must make informed choices to minimize unwanted information disclosure. The latest tips on how to raise your privacy awareness can be found here to equip yourself with best practices for taking control of what you share online.