1
00:00:00,100 --> 00:00:01,166
Hello, my friends, and

2
00:00:01,166 --> 00:00:05,100
welcome to this new part on association
rule learning.

3
00:00:05,233 --> 00:00:09,266
In this part, we're going to cover
a new way of learning relationships.

4
00:00:09,266 --> 00:00:10,500
You know, correlations.

5
00:00:10,500 --> 00:00:14,566
And this time, what we will
learn are association rules.

6
00:00:14,566 --> 00:00:19,033
You know, like that famous statement:
people who bought this also

7
00:00:19,033 --> 00:00:20,066
bought that. You know,

8
00:00:20,066 --> 00:00:23,666
that's what Amazon, for example,
does in its recommendation system.

9
00:00:23,833 --> 00:00:27,900
It predicts what customers will buy
based on what they bought before.

10
00:00:28,133 --> 00:00:28,866
And that's why you see

11
00:00:28,866 --> 00:00:32,333
all these suggestions of new products
when you buy a certain product.

12
00:00:32,433 --> 00:00:36,466
Well, in this new part,
I'm going to teach you how to make a model

13
00:00:36,600 --> 00:00:38,700
that can do this kind of thing.
All right.

14
00:00:38,700 --> 00:00:41,333
And that is called association
rule learning.

15
00:00:41,333 --> 00:00:44,200
So it's actually very different
from what we did before.

16
00:00:44,200 --> 00:00:45,633
You know, before we were either

17
00:00:45,633 --> 00:00:48,366
predicting a dependent variable
and we knew what to predict.

18
00:00:48,366 --> 00:00:52,166
We also did clustering
where we learned some patterns in the data

19
00:00:52,166 --> 00:00:55,166
so as to create a new dependent variable
a posteriori.

20
00:00:55,200 --> 00:00:57,000
And now what we're going to learn

21
00:00:57,000 --> 00:01:01,200
is association rules inside
an ensemble of transactions.

22
00:01:01,200 --> 00:01:01,833
All right.

23
00:01:01,833 --> 00:01:07,266
So it's very useful, especially for retail
businesses or any kind of e-commerce.

24
00:01:07,266 --> 00:01:11,066
So if you are a data scientist
working for an e-commerce company,

25
00:01:11,100 --> 00:01:12,466
you will definitely use it.

26
00:01:12,466 --> 00:01:14,166
And if you are yourself,
you know, a business

27
00:01:14,166 --> 00:01:17,266
owner of a retail company
or an e-commerce company,

28
00:01:17,400 --> 00:01:20,300
then you will definitely benefit
from it as well.

29
00:01:20,300 --> 00:01:20,866
All right.

30
00:01:20,866 --> 00:01:21,766
So let's do this.

31
00:01:21,766 --> 00:01:24,233
Let's start to learn this new technique.

32
00:01:24,233 --> 00:01:27,833
And before we start, let's just make sure
everyone here is on the same page.

33
00:01:28,066 --> 00:01:32,066
This is a folder containing all the code
and data sets of this course.

34
00:01:32,166 --> 00:01:34,833
And I give you the link to this folder
right before this tutorial.

35
00:01:34,833 --> 00:01:36,233
So make sure not to miss it.

36
00:01:36,233 --> 00:01:36,900
Click it.

37
00:01:36,900 --> 00:01:39,500
And now we should all be on the same page.

38
00:01:39,500 --> 00:01:40,800
All right so let's do this.

39
00:01:40,800 --> 00:01:44,533
Let's get into part
five: association rule learning.

40
00:01:44,866 --> 00:01:46,933
And we are going to cover two models.

41
00:01:46,933 --> 00:01:48,400
The first one is Apriori,

42
00:01:48,400 --> 00:01:51,366
which is actually the better
of the two, in my opinion.

43
00:01:51,366 --> 00:01:54,400
And of course,
we're going to start with Apriori

44
00:01:54,700 --> 00:01:57,700
and we're going to start here with Python.

45
00:01:57,866 --> 00:01:58,333
All right.

46
00:01:58,333 --> 00:02:02,633
So inside this Python folder
you will find, as usual, the implementation,

47
00:02:02,633 --> 00:02:04,666
apriori.ipynb,

48
00:02:04,666 --> 00:02:08,566
which you can open with either
Google Colaboratory or Jupyter Notebook.

49
00:02:08,800 --> 00:02:14,566
And of course you will find the data set
which is called market basket optimization

50
00:02:14,566 --> 00:02:18,100
because actually association
rule learning is used to do

51
00:02:18,166 --> 00:02:21,400
market basket analysis or optimization.

52
00:02:21,533 --> 00:02:24,333
That's why I named the data
set this way. All right.

53
00:02:24,333 --> 00:02:28,033
So, speaking of which, let's describe
what this data set is about.

54
00:02:28,833 --> 00:02:29,166
All right.

55
00:02:29,166 --> 00:02:33,833
So for starters let's imagine
the beautiful region of the south of France.

56
00:02:33,833 --> 00:02:38,033
You know with all these cute villages
with happy people walking down the streets

57
00:02:38,033 --> 00:02:41,466
and going to the grocery store every now
and then, or to the coffee shops.

58
00:02:41,700 --> 00:02:44,400
Well,
you know, imagine a very lively place.

59
00:02:44,400 --> 00:02:46,633
where people hang out a lot and love

60
00:02:46,633 --> 00:02:48,966
going to the different shops,
the different businesses

61
00:02:48,966 --> 00:02:52,000
not only to buy their favorite product,
but also to, you know,

62
00:02:52,000 --> 00:02:57,000
chill out in the beautiful, lovely town.
And imagine that you are the business

63
00:02:57,000 --> 00:03:00,666
owner of one of these stores, you know,
selling food and delicious stuff.

64
00:03:01,000 --> 00:03:04,200
And so, you know, you are the business
owner of this shop and you would like to,

65
00:03:04,233 --> 00:03:08,300
like any business owner, optimize
and boost sales, and you have an idea:

66
00:03:08,400 --> 00:03:12,033
to offer some great new deals
to your customers.

67
00:03:12,300 --> 00:03:16,200
And the deal that you have in
mind is to identify,

68
00:03:16,333 --> 00:03:19,766
you know, the best association rules
among the different products,

69
00:03:19,766 --> 00:03:23,733
bought by your customers in order to offer,
you know, this very famous deal,

70
00:03:23,833 --> 00:03:25,966
buy this and get that for free.

71
00:03:25,966 --> 00:03:29,366
You know, if you buy this product,
you're going to get that product for free.

72
00:03:29,700 --> 00:03:30,433
All right.

73
00:03:30,433 --> 00:03:33,600
And so now you see the idea:
we're going to use association

74
00:03:33,600 --> 00:03:36,833
rule learning to find the strongest rules,

75
00:03:36,833 --> 00:03:39,833
saying that if customers buy this product,

76
00:03:39,900 --> 00:03:43,233
then they will have a high chance
to buy that other product.

77
00:03:43,233 --> 00:03:47,333
And we will measure that chance
so that in fact, if they get this product,

78
00:03:47,333 --> 00:03:50,033
they will very likely
want this other product,

79
00:03:50,033 --> 00:03:51,300
and therefore they will very likely

80
00:03:51,300 --> 00:03:54,633
take that deal: buy this product
and get that one for free.
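The "chance" that a customer who buys one product also buys another is usually measured with the support and confidence of a rule. Below is a minimal Python sketch, assuming the standard definitions: support(X) is the fraction of transactions that contain every item of X, and confidence(A → B) = support(A and B) / support(A). The helper names and the toy baskets are hypothetical, for illustration only; they are not the course code or its data set.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and B) / support(A)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    # If nobody ever bought the antecedent, the rule has no evidence at all.
    return joint / base if base else 0.0


# Hypothetical toy baskets (not the course data set).
baskets = [
    ["burgers", "meatballs", "eggs"],
    ["burgers", "eggs"],
    ["burgers", "meatballs"],
    ["chutney"],
]
print(support(baskets, ["burgers"]))                # 0.75
print(confidence(baskets, ["burgers"], ["eggs"]))   # about 0.667
```

Read the second line as: two thirds of the baskets containing burgers also contain eggs, which is exactly the kind of "if they buy this, they will likely want that" statement a deal could be built on.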

81
00:03:54,633 --> 00:03:57,166
All right. So that's what
this owner would like to do.

82
00:03:57,166 --> 00:04:01,733
And this owner just hired the best
data scientist in the south of France,

83
00:04:01,733 --> 00:04:06,100
which means you, of course, to do the job,
you know, to find these association rules

84
00:04:06,500 --> 00:04:10,066
And this owner, because
he knows a little bit

85
00:04:10,066 --> 00:04:10,866
about data science,

86
00:04:10,866 --> 00:04:14,433
knows that he has to collect some data
to provide to the data scientist

87
00:04:14,433 --> 00:04:15,933
in order to learn these rules.

88
00:04:15,933 --> 00:04:20,400
And so each week, this owner collects
all the different transactions

89
00:04:20,400 --> 00:04:21,600
of the customers.

90
00:04:21,600 --> 00:04:25,466
And at the end of the week,
you're given this list of transactions

91
00:04:25,466 --> 00:04:28,500
which you have in front of you
so that you can learn the association

92
00:04:28,500 --> 00:04:32,733
rules and return the best deals that
this owner should offer to the clients.

93
00:04:32,733 --> 00:04:34,433
All right. So that's your mission.

94
00:04:34,433 --> 00:04:38,333
You have to identify the best deals
so as to maximize the chance

95
00:04:38,600 --> 00:04:41,500
that, you know,
the customers will take the deals:

96
00:04:41,500 --> 00:04:44,433
you know, buy this product
and get another one for free.

97
00:04:44,433 --> 00:04:46,166
And of course, the pricing of this

98
00:04:46,166 --> 00:04:49,900
"buy this, get one for free" deal
will be carefully calculated by the owner

99
00:04:49,900 --> 00:04:51,066
(but that's a different question),

100
00:04:51,066 --> 00:04:55,733
so that it can indeed optimize not only
the sales, but also the profit.

101
00:04:55,933 --> 00:04:56,733
All right.

102
00:04:56,733 --> 00:04:58,300
So that's the mission.

103
00:04:58,300 --> 00:05:01,300
And now let's make sure
everyone understands the data set.

104
00:05:01,466 --> 00:05:05,000
You know, each row of the data
set corresponds

105
00:05:05,000 --> 00:05:08,566
to a different transaction, you know,
from different customers actually.

106
00:05:08,700 --> 00:05:12,800
And for each of these transactions
you have all the different products

107
00:05:12,800 --> 00:05:15,833
that the customer
who did the transaction purchased.

108
00:05:15,833 --> 00:05:16,300
All right.

109
00:05:16,300 --> 00:05:18,666
So for example,
this first transaction corresponds

110
00:05:18,666 --> 00:05:22,800
to a certain customer
who bought in one basket some shrimp,

111
00:05:22,800 --> 00:05:26,700
some almonds, avocado,
vegetables mix, green grapes, etcetera.

112
00:05:26,700 --> 00:05:29,633
Up to, you know, olive oil. All right.

113
00:05:29,633 --> 00:05:33,933
Then this customer, or, you know,
this transaction corresponds to a customer

114
00:05:33,933 --> 00:05:38,100
who bought in the same basket,
some burgers, meatballs and eggs.

115
00:05:38,266 --> 00:05:38,966
All right.

116
00:05:38,966 --> 00:05:41,100
This customer just bought chutney.

117
00:05:41,100 --> 00:05:44,366
This customer bought turkey,
avocado, etc.

118
00:05:44,400 --> 00:05:45,100
Right.

119
00:05:45,100 --> 00:05:49,033
So all these transactions correspond
to different customers,

120
00:05:49,500 --> 00:05:51,133
and there are many of these transactions.

121
00:05:51,133 --> 00:05:54,900
Actually, there are
7500 of them.

122
00:05:54,900 --> 00:05:58,333
You know, if we scroll down to the end,
yes, 7500.

123
00:05:58,600 --> 00:06:03,700
And these 7500 transactions were collected
during the whole week.
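Because each row is one basket holding a variable number of products, a natural first preprocessing step is to turn the rows into a list of transactions, each one a list of product names. Here is a minimal sketch using only Python's standard `csv` module, assuming a header-less CSV in which shorter rows may be padded with empty cells; the inline sample only mimics the first few baskets described above and is not the real file, and `load_transactions` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample shaped like the data set described above:
# no header, one basket per row, shorter rows padded with empty cells.
SAMPLE_CSV = (
    "shrimp,almonds,avocado,vegetables mix,green grapes\n"
    "burgers,meatballs,eggs,,\n"
    "chutney,,,,\n"
    "turkey,avocado,,,\n"
)


def load_transactions(text):
    """Return one list of product names per non-empty row, dropping empty cells."""
    transactions = []
    for row in csv.reader(io.StringIO(text)):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:  # skip completely blank lines
            transactions.append(items)
    return transactions


transactions = load_transactions(SAMPLE_CSV)
print(len(transactions))  # 4
print(transactions[2])    # ['chutney']
```

With the real file you would pass its text content instead of the inline sample; dropping the empty cells matters because otherwise "" would be counted as a product that appears in most baskets.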

124
00:06:03,733 --> 00:06:06,433
You know, each week
the owner of the shop does this.

125
00:06:06,433 --> 00:06:09,800
He records all the transactions
and he gives them to you,

126
00:06:09,833 --> 00:06:13,133
data scientist, so that you can learn
the association rules.

127
00:06:13,133 --> 00:06:13,500
All right.

128
00:06:13,500 --> 00:06:17,033
And your mission is to return to
this owner, as fast as possible,

129
00:06:17,033 --> 00:06:20,666
the best association rules,
you know, of two elements

130
00:06:20,800 --> 00:06:23,800
so that this owner can find
the best deals for his clients.

131
00:06:24,033 --> 00:06:25,333
All right. Good.

132
00:06:25,333 --> 00:06:29,400
So now that we clearly understand
the data set, well,

133
00:06:29,400 --> 00:06:32,966
I suggest that we move on directly
to the implementation.

134
00:06:33,466 --> 00:06:35,400
So we're going to go back to our folder.

135
00:06:35,400 --> 00:06:38,833
And we're going to open this
apriori.ipynb file

136
00:06:39,100 --> 00:06:43,233
either with Google Colaboratory
or Jupyter Notebook as you want.

137
00:06:43,266 --> 00:06:45,366
Choose your favorite. Now it is loading.

138
00:06:45,366 --> 00:06:50,200
It is laying out the notebook
and in a second we'll have it open.

139
00:06:50,233 --> 00:06:51,200
All right. Perfect.

140
00:06:51,200 --> 00:06:55,700
So as usual,
this notebook is in read only mode.

141
00:06:55,700 --> 00:06:59,666
And therefore what we're all going to do
now is go to File here and create a copy

142
00:06:59,766 --> 00:07:02,900
by clicking here on Save a copy in Drive.

143
00:07:02,933 --> 00:07:06,966
This will create a copy inside
which we will be able to re-implement

144
00:07:07,200 --> 00:07:09,500
that file from scratch. All right.

145
00:07:09,500 --> 00:07:11,133
Because, as a reminder, this course

146
00:07:11,133 --> 00:07:15,000
is an action-based course:
I want all of you to learn by doing,

147
00:07:15,000 --> 00:07:18,766
because this is how you will maximize
your progress in machine learning.

148
00:07:18,933 --> 00:07:23,166
Okay, so now again, as usual,
let's remove all the code cells

149
00:07:23,400 --> 00:07:26,233
you know, only the code cells,
so that we can keep the

150
00:07:26,233 --> 00:07:28,433
well-highlighted structure
of this implementation.

151
00:07:28,433 --> 00:07:30,733
So don't remove the text cells here.

152
00:07:30,733 --> 00:07:31,266
All right.

153
00:07:31,266 --> 00:07:34,533
So there are a few code cells
but we should be done soon.

154
00:07:34,966 --> 00:07:36,433
This should be the last one.

155
00:07:36,433 --> 00:07:39,333
Oh no. One last one. And there we go.

156
00:07:39,333 --> 00:07:39,933
All right. So good.

157
00:07:39,933 --> 00:07:41,933
We see the whole structure in one page.

158
00:07:41,933 --> 00:07:43,133
So let's have a look.

159
00:07:43,133 --> 00:07:45,600
So first we import the libraries as usual.

160
00:07:45,600 --> 00:07:46,200
You notice

161
00:07:46,200 --> 00:07:50,100
that I kept my data preprocessing template
because we will actually use it a bit.

162
00:07:50,100 --> 00:07:51,966
At least, you know, the first cells.

163
00:07:51,966 --> 00:07:54,066
Then we have the data preprocessing phase.

164
00:07:54,066 --> 00:07:56,333
There you go. That's inevitable.

165
00:07:56,333 --> 00:07:59,566
Then we're going to train
the Apriori model on the data set.

166
00:08:00,066 --> 00:08:02,166
And then we're going
to visualize the results.

167
00:08:02,166 --> 00:08:05,166
And by visualizing the results here
I mean, you know, visualize

168
00:08:05,166 --> 00:08:09,266
all the different rules
sorted by relevance as you can see.

169
00:08:09,366 --> 00:08:12,666
Well, first we display
the results, meaning the rules unsorted,

170
00:08:12,800 --> 00:08:16,600
and then the results, meaning the rules,
sorted by descending lift.

171
00:08:16,600 --> 00:08:19,600
You know,
the lift is that metric measuring the relevance

172
00:08:19,600 --> 00:08:23,600
of an association rule, which you saw
in the intuition lectures with Kirill.

173
00:08:23,600 --> 00:08:24,733
So you see we're going to

174
00:08:24,733 --> 00:08:28,666
display the rules by descending lift
so that we can see the most relevant

175
00:08:28,666 --> 00:08:32,433
and therefore the ones having the
highest chance to convert the customers.
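To make "rules sorted by descending lift" concrete, here is a brute-force sketch that scores every two-product rule A → B, assuming the standard definitions support = count(A and B) / N, confidence = count(A and B) / count(A), and lift = confidence / support(B) = count(A and B) · N / (count(A) · count(B)). This is not the Apriori algorithm itself (Apriori avoids checking every combination by pruning itemsets below a minimum support), and the function name, the `min_support` parameter, and the toy baskets are hypothetical.

```python
from itertools import permutations


def two_item_rules_by_lift(transactions, min_support=0.0):
    """Return (A, B, support, confidence, lift) for every rule A -> B
    between two single products, sorted by descending lift."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))
    # How many baskets contain each single product.
    count = {item: sum(1 for b in baskets if item in b) for item in items}
    rules = []
    for a, b in permutations(items, 2):
        joint = sum(1 for basket in baskets if a in basket and b in basket)
        support = joint / n
        if joint == 0 or support < min_support:
            continue  # never bought together, or too rare to matter
        confidence = joint / count[a]
        lift = joint * n / (count[a] * count[b])
        rules.append((a, b, support, confidence, lift))
    # Highest lift first: pairs bought together most often
    # relative to what independent purchases would predict.
    rules.sort(key=lambda rule: rule[4], reverse=True)
    return rules


# Hypothetical toy baskets (not the course data set).
baskets = [
    ["burgers", "meatballs", "eggs"],
    ["burgers", "eggs"],
    ["burgers", "meatballs"],
    ["chutney"],
    ["chutney", "eggs"],
]
for a, b, sup, conf, lift in two_item_rules_by_lift(baskets)[:3]:
    print(f"{a} -> {b}: support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

Lift is symmetric (A → B and B → A get the same lift) while confidence is not, so both directions of a pair sit next to each other in the sorted list; a lift above 1 means the two products are bought together more often than chance alone would suggest.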

176
00:08:33,000 --> 00:08:34,600
All right. So that's the structure.

177
00:08:34,600 --> 00:08:37,500
And now, whenever you're ready, let's meet

178
00:08:37,500 --> 00:08:40,500
in the next tutorial
to start this implementation.

179
00:08:40,600 --> 00:08:42,433
It will be a very exciting new journey.

180
00:08:42,433 --> 00:08:44,300
I can't wait to implement this model
with you.

181
00:08:44,300 --> 00:08:46,133
And until then, enjoy machine learning.
