1 00:00:00,330 --> 00:00:01,620 Welcome back. 2 00:00:01,620 --> 00:00:06,840 So while you were gone in between the last video and this one, I've kind of tidied up this data dictionary 3 00:00:06,840 --> 00:00:07,070 here. 4 00:00:07,080 --> 00:00:08,330 So it looks a bit better now. 5 00:00:08,340 --> 00:00:12,300 And don't worry, you'll have a copy of this in the resources section, so you're not going to be missing 6 00:00:12,300 --> 00:00:14,730 out on anything. But just where we left off: 7 00:00:14,730 --> 00:00:20,250 We talked about features, which is step four in the machine learning pipeline that we're using, or rather the 8 00:00:20,250 --> 00:00:23,460 machine learning modelling pipeline that we're using. 9 00:00:23,460 --> 00:00:27,570 And this is where you'll get different information about each of the features in your data. 10 00:00:27,570 --> 00:00:33,030 You can do this by doing your own research, such as looking 11 00:00:33,030 --> 00:00:33,840 at the links above. 12 00:00:34,290 --> 00:00:36,720 That's what I did here to kind of tidy this up. 13 00:00:36,720 --> 00:00:45,090 Reading between this information here, the attribute information, as well as the UCI dataset web page. 14 00:00:45,090 --> 00:00:45,900 If you go through here. 15 00:00:45,900 --> 00:00:54,570 Here are all 76 attributes, but realistically we only use 14, so only 14 attributes are used, and that's what 16 00:00:54,690 --> 00:00:56,270 this data dictionary is. 17 00:00:56,280 --> 00:01:01,380 So basically a data dictionary just tells you about the data that you're working on. 18 00:01:01,380 --> 00:01:02,360 Wonderful. 19 00:01:02,370 --> 00:01:03,830 And so now where are we up to? 20 00:01:03,960 --> 00:01:10,320 If we check back, we've worked through one, two, three, four, but have we really? 21 00:01:10,330 --> 00:01:12,400 So we've just kind of done it in thought.
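A data dictionary like the one described here can also live in code as a plain mapping from column name to description. The sketch below is only an illustration: the column names and descriptions are assumptions (they follow the naming commonly used for the UCI heart disease data, which the video does not spell out at this point), and `undocumented_columns` is a hypothetical helper, not part of any library.

```python
# Hypothetical data dictionary: column name -> plain-English description.
# These entries are assumptions for illustration; swap in the columns of
# your own dataset.
data_dictionary = {
    "age": "age in years",
    "sex": "1 = male, 0 = female",
    "cp": "chest pain type",
    "target": "1 = has heart disease, 0 = no heart disease",
}


def undocumented_columns(columns, dictionary):
    """Return the columns that have no entry in the data dictionary."""
    return [col for col in columns if col not in dictionary]


# Example column list, as you might get from list(df.columns) in pandas.
columns = ["age", "sex", "cp", "chol", "target"]
missing = undocumented_columns(columns, data_dictionary)
print(missing)  # ['chol']
```

A check like this is one way to notice features you have not researched yet before moving on to the data analysis.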
22 00:01:12,400 --> 00:01:19,150 So now it's time to apply the tools that we've been going through to this section, right, this data 23 00:01:19,150 --> 00:01:20,560 analysis section. 24 00:01:20,620 --> 00:01:22,750 So let's do that, let's get our tools ready. 25 00:01:23,140 --> 00:01:28,310 So create a heading here: preparing the tools. 26 00:01:28,870 --> 00:01:41,440 We're going to use pandas, Matplotlib and NumPy for data analysis and manipulation. 27 00:01:41,440 --> 00:01:41,950 Wonderful. 28 00:01:42,570 --> 00:01:47,560 And so at the start of every notebook what you might do is import. 29 00:01:47,560 --> 00:01:55,260 What you'll probably see out in the wild is a big chunk, or one big Jupyter cell, importing all 30 00:01:55,260 --> 00:01:57,130 different tools at once. 31 00:01:57,150 --> 00:02:02,040 You don't necessarily have to do it this way, but it's kind of industry standard to do it this way. 32 00:02:02,070 --> 00:02:03,750 So let's see what that would look like. 33 00:02:04,500 --> 00:02:08,150 Import all the tools we need. 34 00:02:08,150 --> 00:02:11,190 So what we might do is make little subheadings here. 35 00:02:11,640 --> 00:02:18,020 So that way when we're working through this step by step process we've got all the tools that we need. 36 00:02:18,090 --> 00:02:23,610 And now again, a project might not start out like this. I'm only doing it this way because this is what 37 00:02:23,610 --> 00:02:26,190 you'll see with this sort of finished project. 38 00:02:26,190 --> 00:02:30,540 What you'll more regularly do, kind of how we've been working in the past, is just import a 39 00:02:30,540 --> 00:02:32,280 tool as you need it.
40 00:02:32,310 --> 00:02:37,290 So when you see this big block of code that I'm about to type, with all the imports and all different 41 00:02:37,290 --> 00:02:42,480 tools and whatnot, you might look at it and go, holy goodness, I'm not sure what's going on with everything 42 00:02:42,480 --> 00:02:44,830 there, but that's okay. 43 00:02:44,970 --> 00:02:49,550 What you'll do is, because you've got the framework now, you've got this. 44 00:02:49,830 --> 00:02:53,700 And we've seen different ways that we can apply different tooling. 45 00:02:53,700 --> 00:02:57,590 You'll be pretty familiar with all the functionality that we're going through here. 46 00:02:57,600 --> 00:03:00,040 And if not, that's perfectly okay. 47 00:03:00,360 --> 00:03:03,030 You know where to look and you know where to get help. 48 00:03:03,240 --> 00:03:08,030 So let's do this, we'll go regular EDA and plotting libraries. 49 00:03:09,180 --> 00:03:17,230 Here's where we're going to import NumPy. EDA stands for exploratory data analysis, by the way. 50 00:03:17,250 --> 00:03:21,770 Got to learn not to talk in acronyms. We coders like to shorten everything. So import 51 00:03:21,780 --> 00:03:33,750 numpy as np, import pandas as pd, import matplotlib.pyplot as plt. plt. 52 00:03:33,750 --> 00:03:34,950 Come on Daniel. 53 00:03:35,070 --> 00:03:41,430 Import seaborn as sns. So seaborn, remember, is just that plotting library built on top of 54 00:03:41,430 --> 00:03:46,160 Matplotlib. Matplotlib is the base plotting library that we're gonna be using. 55 00:03:46,860 --> 00:03:57,580 So we're going to do %matplotlib inline because we want our plots to appear inside the notebook. 56 00:03:57,680 --> 00:04:01,600 So this is what this little magic function does, with that percentage sign there. 57 00:04:02,590 --> 00:04:09,100 And then we want to import models from scikit-learn. So, scikit-learn.
58 00:04:09,350 --> 00:04:12,890 Also known as sklearn, you're about to see that in a second. 59 00:04:12,890 --> 00:04:22,830 So from sklearn.linear_model import LogisticRegression. 60 00:04:22,850 --> 00:04:32,980 Now if you're wondering where these models came from, remember we've got the scikit-learn model map, choosing 61 00:04:32,980 --> 00:04:42,170 the right estimator. Wonderful. So we followed this along in a previous section. We're working on a classification 62 00:04:42,170 --> 00:04:42,470 problem. 63 00:04:42,470 --> 00:04:46,100 So we're only gonna be importing classification models. 64 00:04:46,100 --> 00:04:50,900 So from sklearn.neighbors we're also going to import another model here. 65 00:04:52,690 --> 00:04:54,210 So KNeighborsClassifier, 66 00:04:57,360 --> 00:05:01,800 and you'll remember this one, ensemble. You might not have seen the first two, but that's okay. 67 00:05:01,860 --> 00:05:07,230 We'll talk about them when it comes to it. RandomForestClassifier, we have seen that one. And 68 00:05:07,230 --> 00:05:16,210 then we're going to go model evaluations. We're going to go from sklearn.model_selection import 69 00:05:16,300 --> 00:05:18,190 train_test_split. 70 00:05:18,190 --> 00:05:21,440 This is gonna help us split our data into training and test sets. 71 00:05:21,490 --> 00:05:24,190 We'll also import cross_val_score from here. 72 00:05:24,190 --> 00:05:27,410 Now if you're wondering how I know all these, it's because I've had some practice. 73 00:05:27,430 --> 00:05:32,610 And if you're looking at this and going, holy goodness, there's a lot of things to remember to import, 74 00:05:32,610 --> 00:05:33,220 you're right. 75 00:05:33,760 --> 00:05:34,590 Well, that's what I'm saying. 76 00:05:34,600 --> 00:05:40,300 I'm agreeing with you that machine learning is a very broad field and there's a lot of different ways 77 00:05:40,300 --> 00:05:41,650 to do similar things.
78 00:05:41,980 --> 00:05:47,920 But after some hands-on practice, like what we're going through now, you'll start to get a bit more familiar. 79 00:05:47,920 --> 00:05:49,060 And again, it's no rush. 80 00:05:49,180 --> 00:05:55,230 I still have to look things up. So I've done these on separate lines. 81 00:05:55,230 --> 00:05:58,860 These two, I could have done them on the same line because they're from the same module. 82 00:05:58,860 --> 00:06:02,000 But just to keep it a little bit tidy, a little bit Pythonic. 83 00:06:02,010 --> 00:06:04,880 We want to go here, from sklearn.metrics 84 00:06:04,920 --> 00:06:08,080 import confusion_matrix, classification_report. 85 00:06:08,100 --> 00:06:13,630 Well, they may be familiar from the evaluation section of the scikit-learn section. 86 00:06:13,710 --> 00:06:14,430 We go here. 87 00:06:14,460 --> 00:06:15,000 Import. 88 00:06:15,000 --> 00:06:19,170 We want to get precision_score because we're thinking in our head, 89 00:06:19,380 --> 00:06:21,730 this is a classification problem. 90 00:06:21,780 --> 00:06:25,910 What metrics can we use to evaluate classification models? 91 00:06:26,020 --> 00:06:37,560 f1_score. From sklearn.metrics import plot_roc_curve, because we want a ROC curve. Well, you might 92 00:06:37,560 --> 00:06:40,340 be looking at that going, Daniel, that is a lot of stuff. 93 00:06:40,350 --> 00:06:45,120 And it truly is. This is twenty-one lines of imports, minus a few comments. 94 00:06:45,330 --> 00:06:48,020 But these are the tools that we're going to need. 95 00:06:48,050 --> 00:06:54,390 So if we come back here, we may still even need more, I might have missed some, but this is kind of what 96 00:06:54,390 --> 00:06:59,280 you'll see at the start of a notebook, because once you've set it up you want to approach these, and 97 00:06:59,280 --> 00:07:02,070 we want to minimize our time between experiments.
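Put together, the import cell narrated so far might look roughly like the sketch below. It covers only the imports named in the narration, so it is shorter than the full cell typed in the video. Two things here are assumptions rather than something said in the video: `RandomForestClassifier` is imported from `sklearn.ensemble` (its module in scikit-learn), and the `try`/`except` fallback is added because `plot_roc_curve` was removed in scikit-learn 1.2, where `RocCurveDisplay` replaces it.

```python
# Regular EDA (exploratory data analysis) and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# In a Jupyter notebook, also run the magic below so plots appear inline.
# Keep it on a line of its own: text after it is parsed as arguments.
# %matplotlib inline

# Models from scikit-learn
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Model evaluations
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import precision_score, f1_score

# Assumption: plot_roc_curve only exists in older scikit-learn releases
# (it was removed in 1.2); RocCurveDisplay is the replacement in newer ones.
try:
    from sklearn.metrics import plot_roc_curve
except ImportError:
    from sklearn.metrics import RocCurveDisplay
```

Grouping the imports under short comment subheadings mirrors the "little subheadings" idea from earlier in the video and makes it easier to spot what is missing later.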
98 00:07:02,100 --> 00:07:07,020 So that's why we're getting all our tooling ready right at the top, so that we can get hands-on as soon 99 00:07:07,020 --> 00:07:08,050 as possible. 100 00:07:08,100 --> 00:07:12,450 So we've got pandas, we've got Matplotlib, we've got NumPy, we're working within a Jupyter notebook 101 00:07:12,840 --> 00:07:18,630 in our conda environment, we've got scikit-learn, and now we're going to work through the data analysis 102 00:07:18,810 --> 00:07:19,310 section. 103 00:07:19,320 --> 00:07:26,010 So that means basically importing the data, having a look at it and trying to become ourselves a subject 104 00:07:26,010 --> 00:07:31,500 matter expert, which is just someone who knows about the data, like a fancy word for someone who knows 105 00:07:31,500 --> 00:07:32,180 about the data. 106 00:07:32,190 --> 00:07:34,860 That's what we're trying to do in these three steps here. 107 00:07:35,730 --> 00:07:40,740 So without further ado, let's hit Shift and Enter, import our tools. Fingers crossed. 108 00:07:40,740 --> 00:07:45,110 If we've set up our environment to work, they should all import correctly. 109 00:07:45,330 --> 00:07:46,320 What do we get in here? 110 00:07:46,320 --> 00:07:49,310 Something wrong. RuntimeWarning. 111 00:07:49,520 --> 00:07:58,010 numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from 112 00:07:58,010 --> 00:07:58,940 PyObject. 113 00:07:58,940 --> 00:08:00,710 I've actually never seen that error before. 114 00:08:00,980 --> 00:08:02,930 So this is what you come across, right? 115 00:08:02,990 --> 00:08:07,550 These type of things, even after you've had some experience, you'll come across these things. 116 00:08:07,550 --> 00:08:12,220 This warning. So we're just waiting on it to make sure that this cell runs fully. 117 00:08:12,290 --> 00:08:18,310 It's already given us a warning, or we're getting another warning here again.
118 00:08:18,320 --> 00:08:25,040 numpy.ufunc. Now we might let this cell run out, as in finish, because it's gonna take a little 119 00:08:25,040 --> 00:08:25,570 while. 120 00:08:27,390 --> 00:08:36,030 Unrecognised arguments. Maybe I can't put this... this needs to go above here. 121 00:08:36,140 --> 00:08:39,590 All right, will this import? 122 00:08:39,610 --> 00:08:40,330 Yes, it should. 123 00:08:40,330 --> 00:08:41,710 And now we're getting no warnings. 124 00:08:41,710 --> 00:08:43,570 Huh, that's interesting. 125 00:08:43,570 --> 00:08:48,580 Well, those are the surprises that you'll get from day to day, working with different machine learning 126 00:08:48,580 --> 00:08:50,410 code and different code in general. 127 00:08:50,480 --> 00:08:53,540 Sometimes it does things that you don't really expect. 128 00:08:53,800 --> 00:08:55,840 And so that warning has gone away. 129 00:08:55,840 --> 00:09:00,400 If it does come back in the future we'll probably deal with it then, but for now we'll assume that we've 130 00:09:00,400 --> 00:09:03,550 got our tools ready. Now we can start applying them.
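If you would rather not rely on crossed fingers, one option is to check that each tool can be found in the environment before running the big import cell. The sketch below uses only the standard library. `check_tools` is a hypothetical helper name, and the list of module names is an assumption based on the tools mentioned in this video. It only tells you whether a package is installed; it would not detect a binary incompatibility warning like the one seen above.

```python
import importlib.util


def check_tools(module_names):
    """Map each top-level module name to True if it can be found, without importing it."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


# Top-level import names for the tools used in this notebook (assumed list).
tools = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]

for name, available in check_tools(tools).items():
    print(f"{name}: {'found' if available else 'MISSING'}")
```

Anything reported as missing can then be installed into the conda environment before you start the data analysis.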