Online User Tracking in Extreme Right and Left-Leaning Websites

In the era of mass Web monitoring, users are tracked and their behavioral data is collected and used, typically for ad-targeting purposes. The tracking happens when users visit a publisher (the first party) to consume content such as news, video, or music, but it is performed by third-party entities such as online trackers, analytics services, and advertisers embedded in the visited publisher's pages.
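To make the first-party versus third-party distinction concrete, here is a minimal Python sketch that lists the third-party sites contacted while loading a page. It is only an illustration, not the method used in our studies: the function names and example URLs are hypothetical, and it assumes that the last two labels of a hostname identify a site, which is a simplification (a careful analysis would consult the Public Suffix List).

```python
from urllib.parse import urlparse


def site_of(url):
    """Return a naive 'site' for a URL: the last two labels of its hostname.

    Assumption for illustration only; this is wrong for suffixes such as
    'co.uk', where the Public Suffix List would be needed.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def third_party_sites(page_url, request_urls):
    """Return the sorted sites contacted by a page that differ from its own."""
    first_party = site_of(page_url)
    sites = {site_of(u) for u in request_urls}
    sites.discard("")           # drop URLs without a hostname
    sites.discard(first_party)  # same-site requests are first-party
    return sorted(sites)


# Hypothetical page load: one same-site request and two third-party requests.
example_third_parties = third_party_sites(
    "https://news.publisher.example/story",
    [
        "https://cdn.publisher.example/app.js",
        "https://ads.tracker.example/pixel",
        "https://stats.analytics.example/collect",
    ],
)
```

Here `example_third_parties` holds the two sites that do not belong to the publisher.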

Telefonica Research in action:

At Telefonica Research in Barcelona, we have been studying online user privacy and how personally identifiable information (PII) leaks to such third parties while users browse the Web. Our overall goal with this research is to increase transparency on the Web, as well as user awareness of these obscure practices of the advertising ecosystem. During this multi-year effort, we have reverse-engineered several advertising protocols, including Real-Time Bidding [1], Cookie Synchronization [2], Cross-Device Tracking [3], and Header Bidding [4]. We have shown how these protocols are used to track and target users, and published our findings in top peer-reviewed academic venues. We try to understand how third parties of the advertising ecosystem may be violating users’ privacy (and, in effect, GDPR [5]) while tracking and targeting users for ads.

What did we study?

To this end, we recently investigated [6] whether online users are tracked differently based on their demographics (e.g., gender and age), especially when they visit extreme left- and right-wing news websites (i.e., websites with a particular political leaning) that typically spread fake news and may facilitate politically targeted advertising. This study, which was also covered by journalists at Wired.com [7] and other media, analyzed the tracking and targeting detected in network traffic produced by synthetic users (personas) that exhibit specific characteristics (e.g., senior woman or young man). We focused on the USA Web ecosystem, in which extreme left- and right-leaning websites have been instrumental in skewing public opinion around political events such as presidential and parliamentary elections. The USA advertising ecosystem is also comparatively complex in the methods and intensity with which it tracks and targets users for ads.

What did we find?

We found that extreme right-leaning websites embed significantly more third parties, and store up to 25% more cookies in users' browsers, than extreme left-leaning sites. We also found that popular, highly ranked partisan websites track users more intensely than lower-ranked sites. In fact, right-leaning hyper-partisan websites allow third parties to track their users more intensely than left-leaning sites do, facilitating up to 50% more cookie synchronizations between online trackers. (Cookie synchronization is the protocol such companies use to synchronize their cookies with each other, in order to get more information about a targeted user.) Furthermore, extreme right-leaning websites deliver ads that cost up to 5x more than those on left-leaning sites, as a consequence of the more intense tracking and ad-targeting performed.
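To give a feel for what a cookie synchronization looks like in network traffic, the Python sketch below flags requests in which an identifier stored in one tracker's cookie appears as a URL parameter in a request to a different domain. This is a simplified heuristic under stated assumptions, not the detection pipeline of [2] or [6]: the function name, the input layout, the `min_id_len` threshold, and the example domains are all hypothetical, and real identifiers may be hashed, encoded, or embedded in the URL path, which this sketch does not handle.

```python
from urllib.parse import urlparse, parse_qsl


def find_cookie_syncs(cookies, request_urls, min_id_len=8):
    """Flag likely cookie synchronizations (illustrative heuristic only).

    cookies:      dict mapping a tracker domain to the set of ID values
                  stored in its cookies.
    request_urls: iterable of URLs requested during a browsing session.
    min_id_len:   ignore short values that could match by coincidence.

    Returns a sorted list of (owner_domain, receiver_domain, cookie_id).
    """
    syncs = set()
    for url in request_urls:
        parsed = urlparse(url)
        receiver = parsed.hostname or ""
        # Values of all query-string parameters in this request.
        param_values = {value for _, value in parse_qsl(parsed.query)}
        for owner, ids in cookies.items():
            if owner == receiver:
                continue  # a tracker reading its own ID is not a sync
            for cookie_id in ids:
                if len(cookie_id) >= min_id_len and cookie_id in param_values:
                    syncs.add((owner, receiver, cookie_id))
    return sorted(syncs)


# Hypothetical session: tracker-a's ID is passed to tracker-b in a URL.
example_syncs = find_cookie_syncs(
    {"tracker-a.example": {"abc123def456"}},
    [
        "https://tracker-b.example/sync?partner=a&uid=abc123def456",
        "https://tracker-a.example/pixel?uid=abc123def456",
        "https://tracker-c.example/img?x=1",
    ],
)
```

In this made-up session only the first request counts as a synchronization, since it hands tracker-a's identifier to tracker-b; the second request goes back to the identifier's owner, and the third carries no identifier.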

Regarding user demographics and how they relate to tracking, we found that synthetic users exhibiting realistic and representative demographic characteristics tend to receive up to 15% more tracking (cookies) from such extreme left and right websites than baseline personas (i.e., personas with no characteristics). Also, single-feature personas (e.g., Woman, Man, Youth) are highly tracked by default, no matter what political leaning they demonstrate through their visits.

Why do we do this research?

This line of research helps Telefonica understand what privacy problems exist on the Web, and investigate ways to better protect Telefonica users from illegal tracking that violates regulations such as GDPR [5], e-Privacy [8], and even the recent CCPA [9] in the USA. In essence, this research allows Telefonica to better understand how its online products and services can be improved, without violating, but instead boosting, users’ privacy and trust.

References:

[1] P. Papadopoulos, N. Kourtellis, P. Rodriguez Rodriguez, N. Laoutaris. If you are not paying for it, you are the product: How much do advertisers pay to reach you?
Conference: ACM Internet Measurements Conference (IMC), London, UK, 2017.
DOI: https://doi.org/10.1145/3131365.3131397
Preprint: https://arxiv.org/pdf/1701.07058

[2] P. Papadopoulos, N. Kourtellis, E. P. Markatos. Cookie Synchronization: Everything You Always Wanted to Know But Were Afraid to Ask.
Conference: ACM Web Conference (WWW), San Francisco, USA, 2019.
DOI: https://doi.org/10.1145/3308558.3313542
Preprint: https://arxiv.org/pdf/1805.10505

[3] K. Solomos, P. Ilia, S. Ioannidis, N. Kourtellis. Talon: An Automated Framework for Cross-Device Tracking Detection.
Conference: USENIX RAID, Beijing, China, 2019.
Preprint: https://www.usenix.org/system/files/raid2019-solomos.pdf

[4] M. Pachilakis, P. Papadopoulos, E. P. Markatos, N. Kourtellis. No More Chasing Waterfalls: A Measurement Study of the Header Bidding Ad-Ecosystem.
Conference: ACM Internet Measurements Conference (IMC), Amsterdam, Netherlands, 2019.
DOI: https://doi.org/10.1145/3355369.3355582
Preprint: https://arxiv.org/pdf/1907.12649

[5] General Data Protection Regulation (GDPR). https://gdpr.eu/

[6] P. Agarwal, S. Joglekar, P. Papadopoulos, N. Sastry, N. Kourtellis. Stop tracking me Bro! Differential Tracking of User Demographics on Hyper-Partisan Websites.
Conference: ACM Web Conference (WWW), Taipei, Taiwan, 2020.
DOI: https://doi.org/10.1145/3366423.3380221
Preprint: https://arxiv.org/pdf/2002.00934

[7] https://www.wired.com/story/right-left-news-site-ad-tracking/

[8] e-Privacy. https://ec.europa.eu/digital-single-market/en/proposal-eprivacy-regulation

[9] California Consumer Privacy Act (CCPA), 2020. https://oag.ca.gov/privacy/ccpa

Short Bio for the author:

Dr. Nicolas Kourtellis is a Research Scientist in the Telefonica R&D team in Barcelona. He holds a Ph.D. in Computer Science and Engineering from the University of South Florida (2012). He has published more than 60 papers, and has presented his work at top academic conferences and in journals. His primary interests are 1) user online privacy and personal data leakage detection, 2) characterization of online user behavior on social media, and 3) streaming mining. His work on online user privacy, personal data leakage detection and web transparency has been partially funded by the European Commission, through projects such as Types (653449), Protasis (690972), IbidaaS (780787), Concordia (830927), Pimcity (871370) and Accordion (871793).