1 00:00:00,200 --> 00:00:01,133 Hello my friends. 2 00:00:01,133 --> 00:00:04,333 Welcome to part one, Data Preprocessing. 3 00:00:04,333 --> 00:00:08,033 And mostly welcome to the practical activities of this course. 4 00:00:08,333 --> 00:00:11,966 My name is Adam and I am more than happy 5 00:00:11,966 --> 00:00:13,800 to welcome you into this course. 6 00:00:13,800 --> 00:00:16,766 We're about to embark on a super huge journey, 7 00:00:16,766 --> 00:00:20,400 but yet a very exciting one because we're going to build together 8 00:00:20,400 --> 00:00:25,300 a series of machine learning models from all the branches of machine learning. 9 00:00:25,300 --> 00:00:29,400 And speaking of these branches, well, you have them in front of you. 10 00:00:29,666 --> 00:00:33,433 That's the whole folder of this Machine Learning A-Z course, 11 00:00:33,633 --> 00:00:37,433 structured into all the different branches of machine learning, 12 00:00:37,566 --> 00:00:41,433 from data preprocessing, regression, classification, clustering, 13 00:00:41,633 --> 00:00:45,300 association rule learning, reinforcement learning, NLP and deep learning. 14 00:00:45,633 --> 00:00:49,533 And now we're about to start part one, data preprocessing. 15 00:00:49,666 --> 00:00:52,800 And this is the first very important step of this journey. 16 00:00:52,966 --> 00:00:56,500 Because indeed any time you build a machine learning model, 17 00:00:56,600 --> 00:01:00,633 you always have a data preprocessing phase to work on, right? 18 00:01:00,633 --> 00:01:05,200 You have to preprocess the data in the right way, so that the machine 19 00:01:05,200 --> 00:01:09,400 learning model that you're going to build can be trained the right way on the data. 20 00:01:09,700 --> 00:01:11,833 So we have to start with this step. 
21 00:01:11,833 --> 00:01:15,600 This will not be the most exciting step, but I will make sure to 22 00:01:15,633 --> 00:01:19,033 do this very efficiently so that we can rapidly proceed 23 00:01:19,066 --> 00:01:23,166 to the branches of machine learning, where we're going to build indeed some models. 24 00:01:23,333 --> 00:01:26,733 But before that, we have to master data preprocessing. 25 00:01:26,966 --> 00:01:29,966 And that's exactly what we're going to do in this part one. 26 00:01:30,266 --> 00:01:30,766 All right. 27 00:01:30,766 --> 00:01:31,966 So first of all, 28 00:01:31,966 --> 00:01:35,733 make sure that you have the same as what I have on my computer right now. 29 00:01:36,000 --> 00:01:40,500 This is the whole Machine Learning A-Z codes and datasets folder containing 30 00:01:40,500 --> 00:01:44,433 all the Python codes and all the R codes, as well as all the data sets 31 00:01:44,766 --> 00:01:48,333 in each of the practical activities we're going to do together. 32 00:01:48,600 --> 00:01:52,400 The link to this folder was provided to you in the previous section. 33 00:01:52,400 --> 00:01:53,633 You know, section one. 34 00:01:53,633 --> 00:01:56,166 So if we are on the same page let's do this. 35 00:01:56,166 --> 00:02:00,000 Let's go inside this part one data preprocessing folder 36 00:02:00,266 --> 00:02:02,233 which is structured the following way. 37 00:02:02,233 --> 00:02:06,300 It has only one section, luckily for us, so that we can quickly get to the machine 38 00:02:06,300 --> 00:02:10,166 learning models and as any folder in this course, 39 00:02:10,166 --> 00:02:13,166 well, it is structured the following way in two folders. 40 00:02:13,166 --> 00:02:15,000 First Python and then R. 41 00:02:15,000 --> 00:02:18,833 The Python folder contains of course all the Python implementations 42 00:02:18,833 --> 00:02:22,700 and the data set, and the R folder contains all the R implementations 43 00:02:22,700 --> 00:02:24,100 with the same data set. 
44 00:02:24,100 --> 00:02:28,633 So now we're going to go into Python because each time in each section, 45 00:02:28,633 --> 00:02:33,000 anytime we build a model we're going to start first with Python and then R, okay? 46 00:02:33,000 --> 00:02:36,800 So that if you are more interested in Python, well, you take the first part. 47 00:02:36,933 --> 00:02:39,766 And if you're more interested in R you take the second part. 48 00:02:39,766 --> 00:02:42,533 And here I want to say something very important. 49 00:02:42,533 --> 00:02:46,133 This course is not meant for you to master both Python and R. 50 00:02:46,200 --> 00:02:50,066 We just cover the two tools so that anyone can learn machine 51 00:02:50,066 --> 00:02:52,000 learning on their preferred tool. 52 00:02:52,000 --> 00:02:54,433 So if you prefer Python, you can just do Python. 53 00:02:54,433 --> 00:02:56,266 If you prefer R, you can just do R. 54 00:02:56,266 --> 00:02:58,800 And if you want to learn both, well, you're welcome to learn both. 55 00:02:58,800 --> 00:03:01,766 But you don't have to. That's what I want to say, okay? 56 00:03:01,766 --> 00:03:04,100 So we're going to start with Python. Here 57 00:03:04,100 --> 00:03:07,300 I'm going to teach you all the tools of data preprocessing. 58 00:03:07,500 --> 00:03:08,233 And there you go. 59 00:03:08,233 --> 00:03:11,400 Those are the files you will find in this data preprocessing folder. 60 00:03:11,400 --> 00:03:15,333 In the Python folder you will first find this data preprocessing tools 61 00:03:15,333 --> 00:03:18,766 implementation, which contains all the different tools 62 00:03:19,033 --> 00:03:23,400 of data preprocessing that you might have to use on your data sets 63 00:03:23,400 --> 00:03:27,000 in order to preprocess them the right way for your machine learning model. 
64 00:03:27,400 --> 00:03:31,733 Then we have the data preprocessing template, which will be very useful 65 00:03:31,733 --> 00:03:35,933 for us to tackle any data preprocessing phase of our future 66 00:03:35,933 --> 00:03:38,933 machine learning model implementations in the following parts. 67 00:03:39,000 --> 00:03:40,166 So you will see. 68 00:03:40,166 --> 00:03:42,533 You will absolutely love this template. 69 00:03:42,533 --> 00:03:46,166 And we also have this data in a CSV file, 70 00:03:46,233 --> 00:03:50,000 data dot CSV, which is the data set on which 71 00:03:50,000 --> 00:03:53,700 I will show you how to implement all these data preprocessing tools. 72 00:03:53,900 --> 00:03:56,833 And just to give you some context, let's say that this data 73 00:03:56,833 --> 00:04:00,733 set belongs to a retail company that collected some data 74 00:04:00,733 --> 00:04:04,900 from their customers, including whether or not they purchased a certain product. 75 00:04:04,900 --> 00:04:08,566 So here each of the rows corresponds to a different customer. 76 00:04:08,566 --> 00:04:12,566 And for each of these customers, well, this company gathered their country, 77 00:04:12,666 --> 00:04:17,166 their age, their salary and whether or not they purchased the product. 78 00:04:17,166 --> 00:04:19,366 All right. So that's a simple data set. 79 00:04:19,366 --> 00:04:22,733 But I wanted to use a simple one so that we can really focus 80 00:04:22,966 --> 00:04:25,766 on all the tools we're going to learn in this section. 81 00:04:25,766 --> 00:04:31,566 And speaking of them, well, now I suggest that we start this data preprocessing 82 00:04:31,600 --> 00:04:36,000 tools implementation because indeed this course is really action based. 83 00:04:36,000 --> 00:04:39,000 You know, you're going to learn by doing in this course. 
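To make the data set described above concrete, here is a minimal sketch of how a file shaped like it (one row per customer, with country, age, salary and a purchase flag) might be read in Python. The column names, the `load_customers` helper and the sample rows are all hypothetical illustrations; the actual contents of the course's CSV file are not shown in this transcript, and the course itself may well use different tools to load it.

```python
import csv
import io

# Hypothetical sample rows for illustration only; these are NOT the
# contents of the course's CSV file, and the column names are assumed.
SAMPLE_CSV = """Country,Age,Salary,Purchased
France,40,60000,No
Spain,31,45000,Yes
Germany,52,81000,Yes
"""


def load_customers(source):
    """Read customer rows from a file-like object into typed dictionaries."""
    customers = []
    for row in csv.DictReader(source):
        customers.append(
            {
                "country": row["Country"],
                "age": float(row["Age"]),
                "salary": float(row["Salary"]),
                # Turn the Yes/No flag into a boolean.
                "purchased": row["Purchased"].strip().lower() == "yes",
            }
        )
    return customers


customers = load_customers(io.StringIO(SAMPLE_CSV))
print(len(customers), "customers loaded")
print(customers[0])
```

With a real file on disk, the same helper could be called as `load_customers(open(path, newline=""))`, where `path` points at your own copy of the data.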
84 00:04:39,000 --> 00:04:42,233 And therefore for this implementation and each future 85 00:04:42,233 --> 00:04:45,866 implementation, we will re-implement it from scratch. 86 00:04:45,966 --> 00:04:46,800 And so there you go. 87 00:04:46,800 --> 00:04:50,966 Our first implementation will be for all the data preprocessing tools. 88 00:04:50,966 --> 00:04:53,200 And now we're going to open this file. 89 00:04:53,200 --> 00:04:57,333 And as I explained in the first section of this course you can either open it 90 00:04:57,333 --> 00:05:02,500 in Google Colaboratory by just double clicking here or in Jupyter Notebook, 91 00:05:02,500 --> 00:05:04,500 if you don't like Google Colaboratory. 92 00:05:04,500 --> 00:05:07,500 So feel free to select your most comfortable environment. 93 00:05:07,533 --> 00:05:10,600 And for Google Colaboratory lovers, well, there we go. 94 00:05:10,600 --> 00:05:13,600 Let's open our data preprocessing tools. 95 00:05:13,866 --> 00:05:18,300 And first I'm going to show you exactly the tools that we're going to implement. 96 00:05:18,300 --> 00:05:21,500 And mostly that you're going to learn for your machine learning models.