WEBVTT

0
00:00.950 --> 00:03.410
In the connected world we live in

1
00:03.620 --> 00:06.890
it becomes harder and harder to remain anonymous.

2
00:07.550 --> 00:09.470
When we connect to the Internet,

3
00:09.530 --> 00:12.350
to browse a website, send an e-mail

4
00:12.440 --> 00:13.790
or use another service,

5
00:14.180 --> 00:21.710
our identity and online activity could be tracked by governments and private companies, mainly to serve

6
00:21.710 --> 00:23.390
us targeted ads.

7
00:24.140 --> 00:30.020
I think that everyone has searched on Google at least once for something ordinary like sports shoes,

8
00:30.410 --> 00:33.500
and was later bombarded by dozens of related

9
00:33.710 --> 00:36.110
ads on third-party websites.

10
00:37.080 --> 00:43.080
There is nothing wrong with being anonymous and controlling your own personal privacy

11
00:43.410 --> 00:50.060
if you are doing legitimate business. In this lecture, we'll discuss how online tracking really works.

12
00:51.130 --> 00:58.900
But what actually is web tracking? It is the practice by which websites identify and collect information

13
00:59.170 --> 01:07.840
about users. Without tracking, e-commerce and advertising businesses would treat every user as a stranger

14
01:08.170 --> 01:11.470
and would be unable to show them personalized content.

15
01:12.540 --> 01:20.400
Whenever you use the Internet, you leave a record of the websites you have visited, along with

16
01:20.460 --> 01:24.870
everything you've clicked. To track this information,

17
01:25.140 --> 01:33.480
many websites save a small piece of data, embed invisible objects or use your user accounts and

18
01:33.480 --> 01:37.110
hardware configuration to uniquely identify you.

19
01:38.990 --> 01:44.720
Say, for example, you visit a website and read a review about a pair of shoes you'd like to buy.

20
01:45.290 --> 01:49.280
The website you visit knows what you are interested in, 

21
01:49.490 --> 01:54.200
shoes in this case, and it is referred to as a first party.

22
01:54.830 --> 02:01.430
At this moment, you won't find anything strange as you don't actually care that the website knows

23
02:01.640 --> 02:07.730
you are interested in shoes because you are the person who willingly visited the website and clicked

24
02:07.820 --> 02:09.680
on the link of the review page.

25
02:10.340 --> 02:13.010
But what happens next is troublesome.

26
02:13.730 --> 02:21.660
The website you are visiting has embedded a third-party tracker like AdRoll.com

27
02:21.680 --> 02:27.320
or doubleclick.com that logs all of the information about you and your visit as well.

28
02:27.890 --> 02:36.490
So a third-party tracker is an entity, other than the website directly visited by the user, that tracks

29
02:36.500 --> 02:38.840
the user's visit to the website.

30
02:40.040 --> 02:46.730
Using sophisticated techniques and by collecting and correlating your data on different occasions in

31
02:46.730 --> 02:54.140
the past, the tracker could uniquely identify you and know everything about you, such as your

32
02:54.140 --> 03:01.640
age, your gender, your location or address, your income, your medical status, your habits, your

33
03:01.640 --> 03:03.770
favorite websites and so on.

34
03:04.160 --> 03:10.780
So just imagine if you are reading a review about a pair of shoes you want to buy on a random website

35
03:11.050 --> 03:17.270
and that website already knows everything about you, just like your best friend or your spouse.

36
03:18.180 --> 03:25.670
Maybe you think that web tracking is anonymous since only habits are collected, not your real identity,

37
03:25.670 --> 03:34.220
like your name, address or government ID. But when a website like Facebook acts as a third-party tracker,

38
03:34.580 --> 03:36.440
then your identity is exposed

39
03:36.530 --> 03:43.550
since you've created a Facebook account and you are logged in, or even if you aren't logged in. This

40
03:43.550 --> 03:44.410
is really creepy!

41
03:45.650 --> 03:48.340
Let's see what the tracking mechanisms are!

42
03:48.640 --> 03:56.740
The most commonly used ways to track a user are by using cookies, browser fingerprinting and invisible

43
03:56.740 --> 04:00.490
pixels or beacons. Let's take them one by one!

44
04:01.740 --> 04:09.250
A cookie is a small text file saved locally on your machine, in your browser storage, by a web server.

45
04:09.690 --> 04:16.170
When you visit a website for the first time, a cookie file with a unique identifier is stored on

46
04:16.170 --> 04:16.900
your computer.

47
04:18.140 --> 04:24.890
Subsequently, when you revisit the website, the web server will read the cookie to find your unique

48
04:24.890 --> 04:32.300
ID so that it can uniquely identify you, and then use it to retrieve all information about you from its

49
04:32.300 --> 04:33.110
database.
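
NOTE The cookie round trip described above could look roughly like this on the server side. This is a minimal Python sketch, not any real site's code: the cookie name visitor_id, the PROFILES dictionary (standing in for the site's database) and the handle_request function are all made up for illustration.
```python
import uuid
from http.cookies import SimpleCookie
PROFILES = {}  # hypothetical in-memory stand-in for the site's database
def handle_request(cookie_header):
    """Return (visitor_id, Set-Cookie value or None) for one request."""
    jar = SimpleCookie(cookie_header or "")
    if "visitor_id" in jar and jar["visitor_id"].value in PROFILES:
        visitor_id = jar["visitor_id"].value  # returning visitor: reuse the stored ID
        PROFILES[visitor_id]["visits"] += 1
        return visitor_id, None
    visitor_id = uuid.uuid4().hex  # first visit: mint a unique identifier
    PROFILES[visitor_id] = {"visits": 1}
    out = SimpleCookie()
    out["visitor_id"] = visitor_id
    return visitor_id, out["visitor_id"].OutputString()
```
On the first call the function returns a value to send back in a Set-Cookie header; when the browser later presents that cookie, the same profile is looked up again.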

50
04:34.060 --> 04:41.660
Modern websites use cookies for two main purposes: keeping you logged in and tracking your behavior.

51
04:42.110 --> 04:48.530
I'm sure you've noticed that websites like Google.com or Facebook.com do not require you to

52
04:48.530 --> 04:55.700
log in each time you visit the website and that's because your details will be remembered by the browser

53
04:55.920 --> 04:58.760
through a cookie stored during your first login.

54
04:59.270 --> 05:05.390
How you view the cookies saved in your browser depends on the browser you use.

55
05:05.680 --> 05:11.960
But on Chrome, go to the Chrome menu in the top right corner, select Settings,

56
05:13.920 --> 05:18.720
and under Privacy and security, click on Site settings.

57
05:20.140 --> 05:22.900
And then click on Cookies and site data.

58
05:23.900 --> 05:30.770
Here you can block all third-party cookies or view all cookies that were saved in your browser.

59
05:32.410 --> 05:34.840
Let's see all cookies and site data.

60
05:36.220 --> 05:39.950
These are all cookies saved in my browser.

61
05:41.320 --> 05:45.270
On this page, you can also remove one or all cookies.

62
05:50.390 --> 05:55.070
Another technique used for tracking is called browser fingerprinting.

63
05:55.550 --> 06:03.800
This is another incredibly accurate method of identifying users uniquely and tracking their online activity.

64
06:04.820 --> 06:11.270
When you connect to the Internet, your device will hand over a bunch of specific data to the receiving

65
06:11.270 --> 06:17.570
server, which includes the browser type and version, operating system and version, screen resolution,

66
06:17.600 --> 06:22.880
supported fonts, plugins, time zone, language and font preferences

67
06:23.180 --> 06:25.280
and even hardware configurations.

68
06:26.240 --> 06:33.170
These data points might seem generic at first and don't necessarily look tailored to identify one

69
06:33.170 --> 06:34.190
specific person.

70
06:34.900 --> 06:40.680
However, there's a very small chance for another user to have 100%

71
06:40.710 --> 06:42.560
matching browsing information.

72
06:45.020 --> 06:52.230
A service called Panopticlick found that only 1 in 286,000 other

73
06:52.230 --> 06:56.490
browsers will share the same fingerprint as another user.

74
06:57.490 --> 06:59.200
Take a look at this paper,

75
06:59.440 --> 07:01.720
if you want to go deeper into it.

76
07:03.120 --> 07:06.750
You'll also find it as an attachment to this lecture.

77
07:07.900 --> 07:15.190
The uniqueness of browser information is closely related to the investigation method of the police

78
07:15.430 --> 07:23.560
who identify suspects and criminals based on fingerprints at the crime scene. Browser fingerprinting

79
07:23.770 --> 07:24.700
works the same way.

80
07:25.220 --> 07:32.770
Websites collect a large set of data from visitors in order to later use it to match against the browser

81
07:32.770 --> 07:34.960
fingerprints of known users.

82
07:36.490 --> 07:43.210
Or in other words, they will try to uniquely identify you using this generic information your browser

83
07:43.270 --> 07:45.110
will hand over to the server.
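
NOTE One way to picture how generic attributes become an identifier is to hash them together. This is only an illustrative Python sketch: the attribute names and values are made up, and real fingerprinting scripts collect many more signals and may combine them differently.
```python
import hashlib
import json
def fingerprint(attributes):
    """Hash a dict of browser attributes into a stable identifier."""
    canonical = json.dumps(attributes, sort_keys=True)  # key order must not change the result
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
visitor = {
    "user_agent": "ExampleBrowser/1.0",  # made-up values for illustration
    "screen": "1920x1080",
    "timezone": "Europe/Bucharest",
    "language": "en-US",
    "fonts": ["Arial", "Calibri", "Verdana"],
}
print(fingerprint(visitor))
```
The same combination of attributes always yields the same hash, while changing a single attribute yields a different one, which is why a rare combination can act like an identifier without any cookie being stored.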

84
07:46.260 --> 07:53.520
You can test if your browser is safe against tracking at this address. Let's test

85
07:53.610 --> 07:56.760
the latest version of Chrome on Windows 10.

86
08:00.750 --> 08:02.310
It's testing my browser.

87
08:05.800 --> 08:06.990
And the results 

88
08:07.240 --> 08:08.350
don't look so good.

89
08:10.050 --> 08:14.190
We see that it doesn't protect us too much against tracking.

90
08:15.670 --> 08:23.440
From my tests with all known browsers, I've noticed that the Brave browser offers the best tracking

91
08:23.440 --> 08:26.110
protection using the default settings.

92
08:26.500 --> 08:29.620
This is the Brave browser; let's test it.

93
08:35.920 --> 08:39.550
And it looks much better compared to Chrome.

94
08:40.530 --> 08:42.900
This is Brave, and this is Chrome.

95
08:45.800 --> 08:52.220
And the last common way to track users is by using a so-called tracking pixel or web beacon.

96
08:52.760 --> 08:57.230
This is an invisible object embedded in a Web page or an e-mail,

97
08:57.590 --> 09:04.610
most of the time as a transparent image with the size of one pixel that is loaded by the browser

98
09:04.670 --> 09:10.580
when a web page is loaded or an e-mail is opened. This way,

99
09:10.820 --> 09:12.710
they know that their advertisement

100
09:13.160 --> 09:17.330
e-mail has just been opened or their web page visited.
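
NOTE The server side of a tracking pixel might be sketched like this in Python. Everything named here is an assumption for illustration: the serve_pixel function, the OPEN_LOG list, the campaign and rid query parameters and the tracker.example URL are hypothetical, and a real tracker would sit behind an actual web server.
```python
import base64
from urllib.parse import urlparse, parse_qs
# A commonly used 1x1 transparent GIF, base64-encoded.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")
OPEN_LOG = []  # hypothetical in-memory log of pixel requests
def serve_pixel(url, client_ip, user_agent):
    """Record who requested the pixel, then return the image bytes."""
    params = parse_qs(urlparse(url).query)
    OPEN_LOG.append({
        "campaign": params.get("campaign", ["unknown"])[0],
        "recipient": params.get("rid", ["unknown"])[0],
        "ip": client_ip,
        "user_agent": user_agent,
    })
    return PIXEL
```
The image itself carries no information; the tracking happens because merely requesting it reveals the recipient ID embedded in the URL together with the requester's IP address and user agent.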

101
09:19.130 --> 09:25.580
This is, by the way, another reason why you should not display images in the e-mails received from

102
09:25.580 --> 09:26.260
senders

103
09:26.330 --> 09:27.320
you do not trust.

104
09:28.240 --> 09:37.240
Because it is so small, the tracking pixel can hardly be seen by visitors of a Web site or e-mail recipients.

105
09:37.780 --> 09:41.380
When connecting to the server to download the tracking pixel,

106
09:41.650 --> 09:48.400
some JavaScript code is run on the client and tracking information like operating system, screen resolution,

107
09:48.700 --> 09:54.700
the IP address and activities on the website during the session is acquired.

108
09:56.200 --> 09:56.740
OK.

109
09:57.100 --> 10:02.950
These were the most common tracking mechanisms. In the next lecture,

110
10:03.220 --> 10:05.890
we'll see how to protect ourselves against tracking.