1 00:00:01,010 --> 00:00:02,270 Now, let's proceed further.
2 00:00:04,290 --> 00:00:10,220 First, let's create a new variable with the name data and store our data frame in it.
3 00:00:12,030 --> 00:00:19,020 And then let's create a dummy variable. A dummy variable is a numerical variable that represents categorical
4 00:00:19,020 --> 00:00:22,080 data such as gender, race, etc.
5 00:00:24,020 --> 00:00:31,900 Dummy variables are quantitative variables that can take only two quantitative values, such as one and
6 00:00:31,910 --> 00:00:32,300 zero.
7 00:00:33,320 --> 00:00:38,450 One means presence of that attribute and zero means absence of it.
8 00:00:40,890 --> 00:00:44,650 Here we are creating the dummy variables for the column named pay_schedule.
9 00:00:46,750 --> 00:00:52,290 After creating the dummy variables, we are removing the bi-weekly label from them.
10 00:00:53,460 --> 00:01:03,250 Let's run the cell and I will show you what it is. Let's print the values first.
11 00:01:05,710 --> 00:01:06,490 Let's create it.
12 00:01:07,760 --> 00:01:09,200 Press Shift+Enter to run the cell.
13 00:01:10,670 --> 00:01:11,900 Let's see this.
14 00:01:14,930 --> 00:01:16,750 Let's see this dummy variable.
15 00:01:23,680 --> 00:01:30,740 So pay_schedule was one column, and it converted that one column into four dummy variables. Bi-weekly
16 00:01:30,760 --> 00:01:31,510 one means
17 00:01:32,610 --> 00:01:36,400 presence of bi-weekly, with monthly, semi-monthly and weekly zero.
18 00:01:37,110 --> 00:01:38,340 So here in the second row,
19 00:01:38,370 --> 00:01:41,910 we have weekly one, which means it was a weekly payment
20 00:01:43,360 --> 00:01:46,550 schedule, and all the others are zero.
21 00:01:47,930 --> 00:01:51,080 Now, from this, let's drop bi-weekly.
22 00:01:52,600 --> 00:01:59,410 So dummy is equal to dummy dot drop, labels is equal to bi-weekly, and axis
23 00:01:59,860 --> 00:02:01,630 is equal to one. Press Shift+Enter to run the cell.
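The dummy-variable step above can be sketched as follows. This is a minimal, self-contained example, not the course notebook: the column name pay_schedule and its four category labels are assumed from the narration.

```python
import pandas as pd

# Small stand-in dataset; the course data has a "pay_schedule" column
# with four categories (names assumed from the narration).
data = pd.DataFrame({
    "pay_schedule": ["bi-weekly", "weekly", "monthly", "semi-monthly", "weekly"]
})

# One 0/1 column per category: 1 marks presence of the attribute, 0 absence.
dummy = pd.get_dummies(data["pay_schedule"])
print(dummy)

# Drop one dummy column to avoid redundancy: if the remaining three
# are all zero, the row must have been bi-weekly.
dummy = dummy.drop(labels="bi-weekly", axis=1)
print(dummy.columns.tolist())
```

Dropping one category like this is the usual guard against the "dummy variable trap", where the full set of dummies is perfectly collinear.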
24 00:02:02,920 --> 00:02:06,800 Now, let's drop this pay_schedule column from the data frame.
25 00:02:08,960 --> 00:02:09,190 Let's run it.
26 00:02:11,060 --> 00:02:19,430 So as we have removed the pay_schedule column, instead of it, let's add these dummy columns to our
27 00:02:19,550 --> 00:02:20,150 dataset.
28 00:02:21,880 --> 00:02:29,170 So data is equal to pd dot concat of data and dummy, and axis is equal to one.
29 00:02:30,580 --> 00:02:31,510 Let's run this cell.
30 00:02:33,200 --> 00:02:36,010 Now, let's check the shape of the data frame.
31 00:02:38,410 --> 00:02:39,370 So here we have
32 00:02:41,020 --> 00:02:45,880 seventeen thousand nine hundred eight rows and 23 columns.
33 00:02:47,160 --> 00:02:49,920 Now let's update our features and labels.
34 00:02:51,710 --> 00:02:58,080 So e_signed is our label, so let's save the e_signed column in response.
35 00:02:58,670 --> 00:03:01,670 So from the data frame we are taking
36 00:03:02,780 --> 00:03:06,380 the column named e_signed, and we are storing it in response.
37 00:03:07,840 --> 00:03:10,630 So response contains only our labels.
38 00:03:11,810 --> 00:03:13,640 So now we have to remove
39 00:03:15,050 --> 00:03:16,550 the e_signed column from the data frame.
40 00:03:19,360 --> 00:03:29,260 So data dot drop, columns is equal to e_signed comma entry_id, as we also need to remove this entry_id
41 00:03:29,720 --> 00:03:31,280 column from the dataset.
42 00:03:33,050 --> 00:03:36,380 Let's run this cell. Press Shift+Enter to run the cell.
43 00:03:37,920 --> 00:03:44,430 Now we need to do data transformation, that is, standard scaling. We use StandardScaler to transform
44 00:03:44,430 --> 00:03:52,560 our data onto the same scale. It standardizes features by removing the mean and scaling to unit variance.
45 00:03:54,120 --> 00:03:59,700 So the standard score of a sample x is calculated with the formula
46 00:04:01,530 --> 00:04:02,940 z is equal to
47 00:04:04,310 --> 00:04:08,780 x minus u, divided by s.
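The drop, concat, and label-extraction steps described above can be sketched on a toy frame. The column names pay_schedule, e_signed, and entry_id are assumptions reconstructed from the narration, and the three rows here are made up for illustration.

```python
import pandas as pd

# Toy frame standing in for the loan dataset (column names assumed:
# "pay_schedule" is categorical, "e_signed" the label, "entry_id" an id).
data = pd.DataFrame({
    "entry_id": [1, 2, 3],
    "pay_schedule": ["weekly", "monthly", "weekly"],
    "e_signed": [1, 0, 1],
})

dummy = pd.get_dummies(data["pay_schedule"])

# Remove the original categorical column, then attach the dummy
# columns side by side (axis=1 concatenates column-wise).
data = data.drop(columns=["pay_schedule"])
data = pd.concat([data, dummy], axis=1)

# Labels go into their own Series; features keep everything else.
response = data["e_signed"]
data = data.drop(columns=["e_signed", "entry_id"])
print(data.shape)
```

Note that data.drop returns a new frame, so the result must be assigned back (or inplace=True used) for the column to actually disappear.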
48 00:04:10,190 --> 00:04:12,800 So x here is our value,
49 00:04:14,210 --> 00:04:18,080 and u is the mean of the training samples,
50 00:04:20,850 --> 00:04:23,340 and s is the standard deviation.
51 00:04:28,010 --> 00:04:34,520 So x, our value, minus u, the mean, divided by s, the standard deviation. So this is how we
52 00:04:34,520 --> 00:04:37,520 calculate the value of the standard score.
53 00:04:40,280 --> 00:04:41,130 Let's do this.
54 00:04:43,900 --> 00:04:44,890 Let's run the cell.
55 00:04:48,400 --> 00:04:51,060 Now, let's split our data into training and test sets.
56 00:04:53,420 --> 00:04:54,730 So to split our data,
57 00:04:58,080 --> 00:05:05,250 we use the scikit-learn library: from sklearn dot model_selection, import train_test_split.
58 00:05:07,900 --> 00:05:16,450 So here X is our dataset and y is the response, which is the labels. Features is equal to dataset, labels is
59 00:05:16,450 --> 00:05:22,540 equal to response, and test_size is equal to zero point two, which means a 20 percent test set and
60 00:05:22,540 --> 00:05:26,860 an 80 percent training set, and random_state is equal to zero.
61 00:05:28,150 --> 00:05:31,910 So here we are creating X_train and X_test with train_test_split.
62 00:05:32,380 --> 00:05:38,790 So these are the training features and training labels, which are X_train and y_train, and X_test contains the
63 00:05:38,860 --> 00:05:40,590 testing features and y_test contains the
64 00:05:41,710 --> 00:05:42,670 testing labels.
65 00:05:43,820 --> 00:05:44,690 Let's run this cell.
66 00:05:46,760 --> 00:05:53,600 So here we have initialized our StandardScaler and stored it in sc_X. Now we need
67 00:05:53,600 --> 00:05:54,950 to fit the standard scaler.
68 00:05:56,770 --> 00:05:58,600 So X_train is equal to
69 00:06:00,450 --> 00:06:03,570 sc_X, which is the standard scaler,
70 00:06:04,590 --> 00:06:13,020 dot fit_transform of X_train. So we are transforming our X_train and X_test with the standard scaler.
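The standard-score formula described above, z = (x - u) / s, can be checked by hand with NumPy. The small matrix here is made up for illustration; it is not the course data.

```python
import numpy as np

# Verify the standard-score formula by hand: z = (x - u) / s,
# where u is the per-column mean and s the per-column standard
# deviation of the training samples.
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])

u = X_train.mean(axis=0)   # mean of each feature column
s = X_train.std(axis=0)    # standard deviation of each column
z = (X_train - u) / s

# After scaling, every column has mean 0 and unit variance.
print(z.mean(axis=0), z.std(axis=0))
```

This is exactly what StandardScaler does internally, with the means and deviations learned from the training set only.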
71 00:06:14,330 --> 00:06:15,260 Let's run the cell.
72 00:06:17,960 --> 00:06:19,510 Let's check the shape of the training set.
73 00:06:24,060 --> 00:06:28,710 So here we have fourteen thousand three hundred twenty-six rows and 21 columns in it.
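The split-and-scale sequence from the lesson can be sketched end to end with synthetic data. The shapes here are illustrative stand-ins (the course training set has 14,326 rows and 21 columns); the variable names dataset, response, and sc_X follow the narration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features and binary labels standing in for the course data.
rng = np.random.default_rng(0)
dataset = rng.normal(size=(100, 4))
response = rng.integers(0, 2, size=100)

# 80/20 split; random_state=0 makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    dataset, response, test_size=0.2, random_state=0
)

# Fit the scaler on the training features only, then reuse the same
# learned mean/std (z = (x - u) / s) on the test features.
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
print(X_train.shape, X_test.shape)
```

Calling fit_transform on the training set and plain transform on the test set keeps test information from leaking into the scaling statistics.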