0
1
00:00:00,810 --> 00:00:06,450
So how does machine learning actually happen in practice?
1

2
00:00:06,450 --> 00:00:08,430
I mean it can't just be magic, right?
2

3
00:00:09,540 --> 00:00:17,460
In this lesson you and I are going to establish a useful framework for thinking about machine learning techniques.
3

4
00:00:17,520 --> 00:00:24,090
This is going to be our basis for thinking about the gradient descent algorithm. So at the most basic
4

5
00:00:24,090 --> 00:00:29,440
level what we're doing is feeding a whole bunch of data into a computer
5

6
00:00:29,520 --> 00:00:33,550
and it gives us back some solution, some answer.
6

7
00:00:33,720 --> 00:00:38,940
The thing that our computer is actually learning is the relationship in the data.
7

8
00:00:38,940 --> 00:00:44,820
So how is it that we can feed a whole bunch of data into our Python program and our program spits out
8

9
00:00:44,910 --> 00:00:49,160
a function that describes the relationship in this data?
9

10
00:00:49,290 --> 00:00:56,880
What are the steps involved in how our machine learns this mathematical function? In a very simple linear
10

11
00:00:56,880 --> 00:00:58,180
regression example,
11

12
00:00:58,200 --> 00:01:05,580
our machine learning program has had to learn the orange theta zero and the green theta one parameters
12

13
00:01:05,700 --> 00:01:07,410
in this equation.
13

14
00:01:07,410 --> 00:01:11,460
And that's going to be on the basis of the data points that it was given.
14

15
00:01:11,550 --> 00:01:16,800
So you can keep this example in mind in what I'm about to describe to you, but this framework that we're
15

16
00:01:16,800 --> 00:01:23,130
going to talk about goes well beyond regression and that's because many, many, many machine learning techniques
16

17
00:01:23,460 --> 00:01:29,040
follow pretty much the same three step process to arrive at their solution.
17

18
00:01:29,040 --> 00:01:38,320
And here it is - step one is to make a prediction, predict what exactly? Well, the coefficients in our function
18

19
00:01:38,320 --> 00:01:42,370
for example, the theta zero and theta one. Our machine is learning a function,
19

20
00:01:42,370 --> 00:01:47,900
so it has to start by predicting the coefficients in that function.
20

21
00:01:47,950 --> 00:01:56,080
Now, the very first time this happens the very first prediction is pretty much like a completely random
21

22
00:01:56,080 --> 00:01:57,060
guess.
22

23
00:01:57,160 --> 00:01:58,450
So let's move on to step two.
23

24
00:01:59,410 --> 00:02:01,270
After making the prediction,
24

25
00:02:01,270 --> 00:02:03,960
step two is calculating the error -
25

26
00:02:04,060 --> 00:02:09,580
in other words we need to measure how good the prediction was.
26

27
00:02:09,610 --> 00:02:17,670
We need to calculate how far off we were from the data and that's why we calculate the size of our error.
27

28
00:02:18,100 --> 00:02:21,640
And step three is the learning step.
28

29
00:02:21,640 --> 00:02:24,880
This is where we adjust our initial prediction.
29

30
00:02:24,880 --> 00:02:26,650
And this is the crucial part, right?
30

31
00:02:26,680 --> 00:02:28,150
First we made a prediction.
31

32
00:02:28,150 --> 00:02:34,490
Second, we compared our prediction to the data and now it's time to learn from our mistakes.
32

33
00:02:34,490 --> 00:02:34,980
Yeah.
33

34
00:02:35,110 --> 00:02:43,640
Having figured out how far off we were in the previous step, we can now make a change to the coefficients.
34

35
00:02:43,770 --> 00:02:44,870
But, we're not done just yet,
35

36
00:02:44,880 --> 00:02:45,380
right?
36

37
00:02:45,390 --> 00:02:48,930
This was only the first run through. At this point,
37

38
00:02:48,930 --> 00:02:53,280
we're going to go back to step one and make a new prediction.
38

39
00:02:53,580 --> 00:02:58,620
This new prediction is going to have our modified coefficients.
39

40
00:02:58,620 --> 00:03:05,610
So using this new prediction, we once again calculate how badly we did and calculate the error.
40

41
00:03:05,610 --> 00:03:09,630
Hopefully this time round the error is smaller than the first time round.
41

42
00:03:10,110 --> 00:03:17,010
So, having measured the error and how badly we did, we adjust our prediction once again and then rinse
42

43
00:03:17,010 --> 00:03:18,450
and repeat.
43

44
00:03:18,450 --> 00:03:26,450
So, in summary, there are three steps. Number one is predict or infer the theta values of the function.
44

45
00:03:26,470 --> 00:03:32,860
Number two is calculate the error and measure how far off we were in our prediction from the data.
45

46
00:03:32,860 --> 00:03:37,500
And step three is making an adjustment to have a smaller error
46

47
00:03:37,510 --> 00:03:44,910
the next time round, and slowly learn the best coefficients. And this is the learning process.
47

48
00:03:45,220 --> 00:03:50,380
When we're writing our Python code in this module, this is how we can think about training our machine
48

49
00:03:50,380 --> 00:03:51,760
learning model.
49

50
00:03:51,820 --> 00:03:56,890
Now there is actually a name for this kind of step by step approach that we just described.
50

51
00:03:56,890 --> 00:04:03,840
This is called an algorithm. An algorithm is a set of instructions for solving a problem.
51

52
00:04:03,910 --> 00:04:11,920
The Cambridge Dictionary defines an algorithm as a set of mathematical instructions or rules that, especially
52

53
00:04:11,920 --> 00:04:16,860
if given to a computer, will help calculate an answer to a problem.
53

54
00:04:17,840 --> 00:04:23,390
You know, the thing is you and I are probably more familiar with a different usage of this word, right?
54

55
00:04:23,390 --> 00:04:29,090
Having heard sentences like "My app uses an algorithm to predict if fans of one particular band will
55

56
00:04:29,090 --> 00:04:31,780
also like music from another band."
56

57
00:04:31,970 --> 00:04:39,890
So it's perfectly understandable that most people think that the word algorithm is actually a word used
57

58
00:04:39,890 --> 00:04:42,980
by programmers when they don't want explain what they did.
58

59
00:04:42,980 --> 00:04:49,070
So before moving on to the next lesson I'm going to leave you with a fun fact. The word algorithm actually
59

60
00:04:49,070 --> 00:04:51,580
gets his name from a guy, right?
60

61
00:04:51,620 --> 00:04:57,220
Mohammed Ibn Musa Al-Khwarizmi. Al-Khwarizmi, algorithm.
61

62
00:04:57,440 --> 00:05:02,620
Now I probably didn't pronounce that right but 825,
62

63
00:05:02,740 --> 00:05:09,290
yeah a thousand two hundred years ago this guy wrote a best selling book in mathematics and the Latin
63

64
00:05:09,290 --> 00:05:15,330
translators in the Middle Ages did an even worse job than I in pronouncing this guy's Persian name.
64

65
00:05:15,530 --> 00:05:18,970
So, that's how we get stuck with the word algorithm.
65

66
00:05:19,580 --> 00:05:23,540
So anyhow, on that bombshell, I'll see you in the next lesson.
66

67
00:05:23,540 --> 00:05:24,050
Take care.