1 00:00:01,120 --> 00:00:05,470 OK, guys, and welcome back to our course, the complete introduction to data science
2 00:00:05,470 --> 00:00:06,440 with Python.
3 00:00:07,360 --> 00:00:12,880 So in the past few classes, we talked about all the different algorithms that exist.
4 00:00:12,880 --> 00:00:17,870 And basically we talked about the theory behind all of those algorithms.
5 00:00:17,870 --> 00:00:21,930 So right now, you should have some basic knowledge about each of those algorithms.
6 00:00:22,900 --> 00:00:28,360 And right now we are going to practice writing them down in your text editor.
7 00:00:28,810 --> 00:00:32,860 So basically, we're going to write down some of those algorithms inside of the editor.
8 00:00:33,700 --> 00:00:34,960 So pretty simple today.
9 00:00:34,960 --> 00:00:37,360 We are going to start with linear regression.
10 00:00:37,390 --> 00:00:43,540 So basically, we're going to create a linear regression model with arrays.
11 00:00:43,540 --> 00:00:47,260 Basically, we're going to generate arrays and create a linear regression model.
12 00:00:47,290 --> 00:00:52,000 So for the purpose of this class, we're going to use NumPy to be able to make these calculations
13 00:00:52,390 --> 00:00:58,600 and work with arrays, as well as matplotlib to be able to visualize everything that we will create.
14 00:00:59,950 --> 00:01:00,830 So let's start.
15 00:01:01,180 --> 00:01:05,160 So, as always, the first thing that we need to do is import the tools that we are going to work with.
16 00:01:05,440 --> 00:01:08,110 And in this case, we are going to work with NumPy.
17 00:01:08,380 --> 00:01:12,170 So we are going to import it, and we are going to work with matplotlib.
18 00:01:14,230 --> 00:01:14,750 Here we go.
19 00:01:15,160 --> 00:01:18,740 So matplotlib.pyplot as plt.
20 00:01:18,970 --> 00:01:19,390 All right.
21 00:01:19,420 --> 00:01:23,620 So right now, we have imported everything that we need.
22 00:01:23,650 --> 00:01:25,680 So we have NumPy as well as pyplot.
23 00:01:26,020 --> 00:01:31,450 And as I said, the first thing that we need to do is create our arrays, because we will work with those
24 00:01:31,450 --> 00:01:31,880 numbers.
25 00:01:31,900 --> 00:01:38,770 So basically, this will be our dataset, to have our X's and our Y's inside of the graph that we are
26 00:01:38,770 --> 00:01:39,520 going to create.
27 00:01:40,540 --> 00:01:43,020 So let's start with our X right now.
28 00:01:43,540 --> 00:01:46,830 So the value of X, as I said, will be an array.
29 00:01:46,990 --> 00:01:48,430 So let's write it down.
30 00:01:49,480 --> 00:01:50,190 So pretty simple.
31 00:01:50,200 --> 00:01:53,510 What I'll ask you guys to do is to write down from 10 to 15 numbers.
32 00:01:53,520 --> 00:01:54,910 It can be random numbers.
33 00:01:57,040 --> 00:01:59,830 So you write down whatever you guys want.
34 00:02:01,910 --> 00:02:03,710 They don't have to be big numbers.
35 00:02:03,730 --> 00:02:05,070 They don't have to be small numbers.
36 00:02:05,080 --> 00:02:07,000 You guys decide what you want to write.
37 00:02:10,720 --> 00:02:11,120 All right.
38 00:02:12,340 --> 00:02:15,130 Next thing is writing down our Y values.
39 00:02:15,160 --> 00:02:19,090 Right now we have our X values, so let's write down our Y values.
40 00:02:25,140 --> 00:02:25,890 All right.
41 00:02:33,250 --> 00:02:36,090 So once again, you decide what numbers you want to write down.
42 00:02:59,730 --> 00:03:04,890 And what I want to do right now is simply print everything to be sure that our arrays work perfectly,
43 00:03:05,130 --> 00:03:06,930 that we didn't make any mistake.
44 00:03:09,280 --> 00:03:12,260 All right, so that's everything.
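The array setup described above can be sketched roughly as follows. The values are placeholders only, since the class lets everyone pick their own 10 to 15 numbers, and the names x and y are assumed from how the arrays are referred to in the class.

```python
import numpy as np

# Placeholder values: the class asks for any 10 to 15 numbers of your choice.
x = np.array([4, 9, 1, 7, 3, 8, 2, 6, 5, 10])
y = np.array([12, 3, 8, 15, 1, 9, 14, 2, 7, 5])

# Print both arrays to confirm they were created without mistakes.
print(x)
print(y)
```

Both arrays need the same length, because each X value is paired with one Y value on the graph.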
45 00:03:12,280 --> 00:03:16,780 So right now, we have our arrays right here, so you can see everything works perfectly, everything works
46 00:03:16,780 --> 00:03:17,070 fine.
47 00:03:17,080 --> 00:03:17,630 It's awesome.
48 00:03:18,280 --> 00:03:22,000 So before the next thing I want to do, if you want to keep your print, you can keep it.
49 00:03:22,000 --> 00:03:23,350 If you want to delete it, you can delete it.
50 00:03:23,780 --> 00:03:28,450 What I want to do right now is create a function that will allow us to generate,
51 00:03:28,450 --> 00:03:32,710 well, to make the calculation for our, well, for our plot.
52 00:03:32,930 --> 00:03:38,650 Well, not our plot, but our slope inside of our regression model.
53 00:03:39,550 --> 00:03:42,590 So we will do it by using the polyfit function.
54 00:03:42,880 --> 00:03:46,880 So basically it will return an array of coefficients that will minimize the squared error.
55 00:03:47,320 --> 00:03:49,650 So how exactly we write it down is pretty simple.
56 00:03:49,660 --> 00:03:53,140 So we'll create a variable that we'll call function.
57 00:03:53,140 --> 00:03:57,250 So function one, or you can name it whatever you guys want.
58 00:03:57,280 --> 00:04:04,120 So it's going to be np, because we are referring to NumPy, to a NumPy function, and it's
59 00:04:04,120 --> 00:04:05,580 going to be polyfit.
60 00:04:06,880 --> 00:04:09,530 So inside of our polyfit function we have three arguments.
61 00:04:09,530 --> 00:04:12,070 So we have the X and, in our case, the Y.
62 00:04:12,220 --> 00:04:17,650 So basically the dataset that we are going to use, and finally we have a number, the degree.
63 00:04:17,670 --> 00:04:19,320 So it's going to be one in our case.
64 00:04:19,750 --> 00:04:21,640 So if we run everything.
65 00:04:25,000 --> 00:04:27,330 I mean, that's just the function, I forgot to print it.
66 00:04:27,730 --> 00:04:31,810 So let's print our function one and run everything.
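A minimal sketch of the polyfit call described here. The data is hypothetical and chosen to lie exactly on y = 2x + 1 so the result is easy to check by eye; the variable name function_one follows the spoken "function one".

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([3, 5, 7, 9, 11, 13, 15, 17, 19, 21])

# Degree 1 asks for a straight line. np.polyfit returns the least-squares
# coefficients with the highest power first: [slope, intercept].
function_one = np.polyfit(x, y, 1)
print(function_one)

slope, intercept = function_one
```

With this data the printed coefficients should be very close to a slope of 2 and an intercept of 1, up to floating-point rounding.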
67 00:04:32,960 --> 00:04:37,120 As you can see, this is the answer of this function right here.
68 00:04:37,150 --> 00:04:39,130 So this is the answer that is generated.
69 00:04:39,740 --> 00:04:40,080 All right.
70 00:04:40,090 --> 00:04:42,660 So the next thing that I want to do is create our graph.
71 00:04:42,850 --> 00:04:49,130 So basically we are going to use a matplotlib function, which is plt.plot.
72 00:04:49,150 --> 00:04:51,060 So we want to plot all our points.
73 00:04:51,730 --> 00:04:57,270 So as always, we are using X and Y as data, and what type of points do we want?
74 00:04:57,280 --> 00:04:58,140 It's pretty simple.
75 00:04:58,150 --> 00:05:00,010 We want small circles.
76 00:05:00,490 --> 00:05:03,550 So to have small circles, we'll just write down a point as the marker.
77 00:05:03,880 --> 00:05:05,060 So we have a point.
78 00:05:05,530 --> 00:05:07,000 So if we run everything.
79 00:05:12,100 --> 00:05:16,510 I just need to write down plt.show.
80 00:05:21,190 --> 00:05:26,650 So we have our small points right here, so basically those points represent, well, all the points
81 00:05:26,650 --> 00:05:31,300 that we have right here, and what I want to do right now is pretty simple.
82 00:05:31,300 --> 00:05:35,030 We want to add a line right here to be able to have a linear regression.
83 00:05:35,050 --> 00:05:40,240 Once again, the points right here are not really representative of something, but we'll be able to
84 00:05:40,240 --> 00:05:42,670 have an average line in the middle right now.
85 00:05:43,840 --> 00:05:44,220 All right.
86 00:05:44,230 --> 00:05:45,500 So how exactly do we do this?
87 00:05:45,520 --> 00:05:46,210 It's pretty simple.
88 00:05:46,210 --> 00:05:48,220 We will simply add another function.
89 00:05:49,170 --> 00:05:51,820 So once again, we'll write down our plot function.
90 00:05:52,870 --> 00:06:01,660 And in this case, we will need to use another function, which is the polyval function, and basically this function will
91 00:06:01,660 --> 00:06:04,420 allow us to add the line that I'm talking about.
92 00:06:05,610 --> 00:06:07,330 So pretty simple, how do we do this?
93 00:06:07,350 --> 00:06:12,780 It's pretty simple, so we will need a few arguments to make it work.
94 00:06:12,780 --> 00:06:18,870 So we'll have our X, which is part of our dataset, then we are going to make a reference
95 00:06:18,870 --> 00:06:21,440 to NumPy, so np.polyval.
96 00:06:23,760 --> 00:06:27,350 And in this case we will have our function.
97 00:06:27,370 --> 00:06:30,060 So we are making reference to our function, which is function one.
98 00:06:31,600 --> 00:06:32,740 And X.
99 00:06:34,890 --> 00:06:42,550 And finally, the last little thing that we'll write down would be our minus, and then simply show everything.
100 00:06:42,750 --> 00:06:47,890 So basically we have all the arguments that we need right now to be able to write down our function.
101 00:06:48,510 --> 00:06:53,160 So if we run everything, as you can see, we have our line right here in the middle.
102 00:06:53,170 --> 00:07:00,090 So the main problem of all this, so basically what we can see with this model right now, is that there
103 00:07:00,090 --> 00:07:02,260 is no correlation between X and Y.
104 00:07:02,280 --> 00:07:06,600 So basically, there is no correlation between those two variables, because
105 00:07:06,600 --> 00:07:10,150 once again, the points are everywhere on the graph and the line is just right here.
106 00:07:10,440 --> 00:07:17,100 So basically, the line doesn't mean a lot because in our case, the points are just everywhere on the
107 00:07:17,100 --> 00:07:17,430 graph.
108 00:07:18,180 --> 00:07:22,800 So basically there is no correlation between X's and Y's in this example.
109 00:07:23,070 --> 00:07:26,220 If, for example, the points were, I don't know, in a straight line, or,
110 00:07:26,400 --> 00:07:29,700 so basically, let's say it's one, two, three, four, et cetera.
111 00:07:30,180 --> 00:07:31,480 Well, it would be different.
112 00:07:31,500 --> 00:07:35,430 So let me just show you another dataset.
113 00:07:36,860 --> 00:07:42,470 So let's say instead of all this, we have another dataset that looks something like this, so more
114 00:07:42,470 --> 00:07:42,900 simple.
115 00:07:43,160 --> 00:07:44,320 So one, two, three, four, five.
116 00:07:44,480 --> 00:07:46,160 And we go up.
117 00:07:49,390 --> 00:07:51,970 So as you can see, it's a bit more representative right now.
118 00:07:52,540 --> 00:07:57,640 The line is a bit more representative and it makes more sense, because the more our X grows,
119 00:07:57,640 --> 00:07:59,130 the more our Y grows.
120 00:07:59,210 --> 00:08:04,090 Basically, this is what we can understand from our graph right here.
121 00:08:04,510 --> 00:08:09,930 And it makes a little bit more sense than the graph where the points were everywhere.
122 00:08:10,750 --> 00:08:16,960 So this helps us find a correlation between X and Y, and in this case, we have a correlation.
123 00:08:16,970 --> 00:08:22,050 It's not a really strong correlation, but once again, we have a correlation there.
124 00:08:22,400 --> 00:08:27,430 Well, there are statistical ways to find the correlation between X and Y, between, well, between
125 00:08:27,430 --> 00:08:29,320 X and Y and the points on the graph.
126 00:08:29,470 --> 00:08:31,820 Once again, we're not going to talk about this right now.
127 00:08:31,840 --> 00:08:38,660 You now know how to create a basic linear regression model and how exactly you can evaluate this model.
128 00:08:38,890 --> 00:08:42,280 So that's it for this class, guys, and see you in our next class.
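Putting the whole class together, a sketch along these lines should reproduce the scatter plot with the fitted line. Several things here are assumptions rather than what was typed in the class: the data values are made up to rise roughly with X, the "." marker is one reading of "small circles", the Agg backend and savefig (with the hypothetical file name regression.png) stand in for plt.show so the script also runs without a display, and the np.corrcoef line goes slightly beyond the class, which only mentions that statistical measures of correlation exist.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, used here in place of plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Made-up data in which Y tends to grow as X grows, but not perfectly.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2, 4, 3, 6, 5, 8, 7, 9, 12, 10])

# Fit a straight line (degree 1): coefficients are [slope, intercept].
function_one = np.polyfit(x, y, 1)

# Evaluate the fitted line at every X value.
fitted = np.polyval(function_one, x)

plt.plot(x, y, ".")       # the data points
plt.plot(x, fitted, "-")  # the regression line
plt.savefig("regression.png")  # hypothetical output file name

# Pearson correlation coefficient between X and Y (not covered in the class).
r = np.corrcoef(x, y)[0, 1]
print(r)
```

When working interactively, replacing the savefig call with plt.show() and dropping the matplotlib.use line should open the plot in a window, as in the class.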