1
00:00:00,170 --> 00:00:01,770
Hoo hoo hoo hoo hoo.

2
00:00:01,980 --> 00:00:04,090
I'm rubbing my hands, if you can't hear them.

3
00:00:04,230 --> 00:00:04,980
You know why.

4
00:00:05,100 --> 00:00:06,240
Because I'm pumped.

5
00:00:06,270 --> 00:00:13,080
I'm ready to evaluate our machine learning models, to see how each of these models goes

6
00:00:13,170 --> 00:00:18,220
using the function we've just created in the last video: see how logistic regression,

7
00:00:18,260 --> 00:00:24,000
KNeighborsClassifier and RandomForestClassifier go at finding patterns in our training

8
00:00:24,000 --> 00:00:27,660
data, and then how those patterns get evaluated on our test data.

9
00:00:28,380 --> 00:00:36,440
So without any further ado, let's see how each of these models performs. We can call our function fit_and_score,

10
00:00:36,440 --> 00:00:40,230
and hopefully it works. And we go here.

11
00:00:40,530 --> 00:00:46,710
We see that it takes models. Actually, if we press Shift+Tab, this is where our docstring comes in handy,

12
00:00:46,710 --> 00:00:51,050
because it tells us what happens: fits and evaluates given machine learning models.

13
00:00:51,060 --> 00:00:51,970
This is beautiful.

14
00:00:51,990 --> 00:00:56,850
This is just like what we've seen with other functions, except we've created this one ourselves.

15
00:00:57,060 --> 00:00:58,800
That's the helpfulness of a docstring, right?

16
00:00:59,370 --> 00:01:00,290
So we're going to go here:

17
00:01:00,300 --> 00:01:09,200
models=models, which is our dictionary of machine learning models. We could just go X_train, X_test,

18
00:01:10,390 --> 00:01:17,650
but for completeness we're going to go X_train=X_train, and then we're going to go X_test=

19
00:01:17,680 --> 00:01:28,830
X_test, and then we're going to go y_train=y_train, and then finally y_test=y_test.
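The call described above can be sketched as follows. The `fit_and_score` helper and the `models` dictionary are assumptions reconstructed from the video's description (the real notebook's version may differ slightly), and the data here is synthetic stand-in data, not the course's heart disease dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fits and evaluates given machine learning models.
    models: a dict of different scikit-learn models.
    Returns a dict mapping model names to test-set scores."""
    np.random.seed(42)  # reproducible results
    model_scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)                       # find patterns in the training data
        model_scores[name] = model.score(X_test, y_test)  # evaluate those patterns on the test data
    return model_scores

# The three models compared in the video
models = {"Logistic Regression": LogisticRegression(max_iter=1000),
          "KNN": KNeighborsClassifier(),
          "Random Forest": RandomForestClassifier()}

# Synthetic binary-classification data standing in for the real dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model_scores = fit_and_score(models=models,
                             X_train=X_train, X_test=X_test,
                             y_train=y_train, y_test=y_test)
print(model_scores)
```

Passing the arguments by keyword, as in the video, makes it obvious which split goes where.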
20
00:01:29,080 --> 00:01:36,860
And if we've written that function correctly, what do you think it's going to return?

21
00:01:36,930 --> 00:01:40,390
I'll give you a few seconds. Okay.

22
00:01:40,520 --> 00:01:42,230
That's enough, because we want to see it.

23
00:01:42,230 --> 00:01:45,380
If in doubt, run the code. Let's see what our function returns.

24
00:01:48,100 --> 00:01:48,380
Okay.

25
00:01:48,410 --> 00:01:49,800
So we get some warnings here.

26
00:01:49,820 --> 00:01:52,140
What's it say? STOP:

27
00:01:52,140 --> 00:01:58,150
total number of iterations reached limit. Understanding that fully would require us to dive in.

28
00:01:58,170 --> 00:02:00,920
So this is where it's helpful:

29
00:02:00,960 --> 00:02:04,260
increase the number of iterations (max_iter) or scale the data.

30
00:02:04,260 --> 00:02:09,780
So this is saying that potentially our logistic regression model could be improved, and to figure out

31
00:02:09,780 --> 00:02:14,770
how you could do that would require going to the documentation for alternative solver options.

32
00:02:14,820 --> 00:02:17,510
But what we're going to do is just work with what we've got so far.

33
00:02:17,580 --> 00:02:23,000
So if we have a look at this, this is the score of each of our models without tuning.

34
00:02:23,160 --> 00:02:25,470
Oh, that's a spoiler for what's coming up.

35
00:02:25,470 --> 00:02:35,160
This is how each of our models, as a baseline, has performed at finding patterns in our test data.

36
00:02:35,190 --> 00:02:36,390
So it's getting the score here.

37
00:02:36,390 --> 00:02:37,850
This is what we're getting back.

38
00:02:37,950 --> 00:02:45,180
So if we look at this, which one is highest? Look at logistic regression coming in as the dark horse.

39
00:02:45,450 --> 00:02:48,440
Not even on the machine learning map.

40
00:02:48,600 --> 00:02:49,410
Right.

41
00:02:49,500 --> 00:02:51,710
And it's getting the highest score.

42
00:02:51,710 --> 00:02:52,690
Mm hmm.
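The ConvergenceWarning mentioned above suggests two fixes: raise `max_iter` or scale the data. A minimal sketch of both, on synthetic stand-in data (not the course dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Unscaled, wide-ranging features can slow the default lbfgs solver's convergence
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * 100
y = (X[:, 0] > 0).astype(int)

# Option 1: give the solver more iterations than the default max_iter=100
clf_more_iters = LogisticRegression(max_iter=1000).fit(X, y)

# Option 2: scale the data first, so the default iteration budget usually suffices
clf_scaled = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

print(clf_more_iters.score(X, y), clf_scaled.score(X, y))
```

Either way, the warning is about optimisation not finishing, not about the model being conceptually wrong, which is why the video can safely carry on with the scores it got.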
43
00:02:52,740 --> 00:02:59,310
Well, this might take a little bit of investigation, because remember where we're up to: we're at experimentation,

44
00:02:59,320 --> 00:03:00,790
and that's what we're doing here.

45
00:03:00,790 --> 00:03:05,650
An experiment is trying different models and comparing them to each other.

46
00:03:05,650 --> 00:03:07,810
Now, this is a comparison in a dictionary.

47
00:03:07,810 --> 00:03:12,750
But if we wanted to show this to someone, we might compare it visually.

48
00:03:12,800 --> 00:03:16,900
So let's do that nice and quickly.

49
00:03:16,900 --> 00:03:19,100
Actually, we might make a little heading here.

50
00:03:19,590 --> 00:03:28,560
So we go here: Model Comparison. Beautiful. model_compare equals... we might turn our dictionary into a data

51
00:03:28,560 --> 00:03:32,070
frame, because that's nice and simple: pd.DataFrame(model_scores).

52
00:03:32,100 --> 00:03:33,990
This is just taking this dictionary here,

53
00:03:34,020 --> 00:03:41,420
model_scores, and we're going to set the index. What does it have to be? It has to be a list called

54
00:03:41,450 --> 00:03:44,360
["accuracy"], because that's what this score function returns.

55
00:03:44,360 --> 00:03:48,520
Because our models are all classifiers, their default score metric is accuracy.

56
00:03:48,530 --> 00:03:50,420
We saw that in the scikit-learn section.

57
00:03:50,930 --> 00:03:59,720
So if we go here: model_compare, we need to transpose it, then .plot.bar(). Boom.

58
00:03:59,720 --> 00:04:04,580
Now I'll just show you why we need to transpose it, because if we didn't, it would look like that,

59
00:04:04,790 --> 00:04:07,880
and we actually want it to look good, like that.

60
00:04:08,690 --> 00:04:13,110
So this is quickly showing how accurate each of our different models is.
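The comparison plot described above can be sketched like this. The scores are example numbers standing in for the real notebook's output, and the variable names follow the video's conventions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import pandas as pd

# Example scores standing in for the dict returned by fit_and_score
model_scores = {"Logistic Regression": 0.88,
                "KNN": 0.69,
                "Random Forest": 0.83}

# Turn the dict into a one-row DataFrame; the index label names the metric
model_compare = pd.DataFrame(model_scores, index=["accuracy"])

# Transpose so each model becomes a row, which plots as one bar per model
ax = model_compare.T.plot.bar()
ax.set_ylabel("accuracy")
print(model_compare.T)
```

Without the transpose, pandas would plot one bar group per metric (a single cluttered cluster); transposing gives the one-bar-per-model chart the video is after.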
61
00:04:13,300 --> 00:04:20,060
And you can see logistic regression just tips out our random forest. And KNN, well, because that's

62
00:04:20,060 --> 00:04:22,850
nowhere near the accuracy of our logistic regression model,

63
00:04:22,850 --> 00:04:27,950
we're going to say goodbye to KNN. So once we've got this graph, this might be something that we can

64
00:04:27,950 --> 00:04:33,470
take to the boss or one of our colleagues and say, hey, look at this, we've built a machine learning model,

65
00:04:33,590 --> 00:04:36,920
a beautiful logistic regression model, and it's performed the best.

66
00:04:36,920 --> 00:04:40,630
So we're going to use the logistic regression model in practice.

67
00:04:40,640 --> 00:04:41,660
And so you might go:

68
00:04:41,690 --> 00:04:42,550
I found it.

69
00:04:42,890 --> 00:04:44,000
And your boss is like:

70
00:04:44,480 --> 00:04:45,140
Nice one.

71
00:04:45,140 --> 00:04:46,410
What did you find?

72
00:04:46,640 --> 00:04:51,990
And then you're like, well, the best algorithm for predicting heart disease is logistic regression. And

73
00:04:51,990 --> 00:04:53,360
then she might say something like:

74
00:04:54,180 --> 00:04:55,280
Excellent.

75
00:04:55,280 --> 00:04:57,970
I'm surprised the hyperparameter tuning isn't finished by now.

76
00:04:58,050 --> 00:05:01,930
And then you might wonder, what is hyperparameter tuning?

77
00:05:01,950 --> 00:05:02,800
Yeah, me too.

78
00:05:02,810 --> 00:05:04,230
That went pretty quick.

79
00:05:04,230 --> 00:05:09,300
So you're sort of covering yourself here. And then she might say, well, I'm very proud.

80
00:05:09,570 --> 00:05:14,190
How about you put together a classification report to show the team, and be sure to include a confusion

81
00:05:14,190 --> 00:05:18,490
matrix and the cross-validated precision, recall and F1 scores.

82
00:05:18,510 --> 00:05:21,150
I'd also be curious to see which features are most important.
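The evaluation the boss asks for above can be sketched with scikit-learn's built-in tools. This is a minimal preview of what the next video covers, on synthetic stand-in data rather than the heart disease dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data standing in for the real dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression()
clf.fit(X, y)
y_preds = clf.predict(X)

# Confusion matrix: rows are true labels, columns are predicted labels
cm = confusion_matrix(y, y_preds)
print(cm)

# Classification report: per-class precision, recall and F1
print(classification_report(y, y_preds))

# Cross-validated versions of each metric (5-fold), as the boss requested
cv_precision = cross_val_score(clf, X, y, cv=5, scoring="precision").mean()
cv_recall = cross_val_score(clf, X, y, cv=5, scoring="recall").mean()
cv_f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
print(cv_precision, cv_recall, cv_f1)
```

The cross-validated scores matter because a single train/test split can flatter (or punish) a model by luck; averaging over 5 folds is more robust.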
83
00:05:21,480 --> 00:05:27,350
Oh, and don't forget to include an ROC curve. And then you might be thinking or asking yourself, okay,

84
00:05:27,370 --> 00:05:29,870
this person said a lot of complex words there.

85
00:05:29,920 --> 00:05:30,310
Right.

86
00:05:30,940 --> 00:05:35,370
But then you actually say, of course, I'll have it to you by tomorrow.

87
00:05:35,820 --> 00:05:37,390
We're going to take care of all these things.

88
00:05:37,440 --> 00:05:41,640
We're going to take care of all these things in the next video, because we're still at a point where,

89
00:05:42,210 --> 00:05:45,690
even though we found a great machine learning model and we've probably shown that to the boss or one

90
00:05:45,690 --> 00:05:51,690
of our colleagues, and they've gone, wow, logistic regression performing at 88 percent accuracy, we're still

91
00:05:51,690 --> 00:05:59,230
not near our evaluation metric, which, if we come back up to the top, is: we said we kind of want at

92
00:05:59,230 --> 00:06:02,970
least 95 percent accuracy to continue with this experiment.

93
00:06:03,010 --> 00:06:08,230
So that's what we're going to be doing, as well as fulfilling all those requests that the boss asked

94
00:06:08,230 --> 00:06:13,630
of us, just to make sure our model is a little bit more robust than just getting the default score metric.

95
00:06:14,860 --> 00:06:18,020
So without any further ado, take a little break,

96
00:06:18,050 --> 00:06:23,770
have a review of what we've done so far, and we're going to review some of the things that we need to

97
00:06:23,770 --> 00:06:30,490
make sure we do to make sure our classification models are evaluated correctly and are improved as much

98
00:06:30,490 --> 00:06:31,060
as possible.
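The ROC curve the boss mentions can be sketched like this: get the model's predicted probabilities for the positive class, then feed them to `roc_curve` and `roc_auc_score`. Again, this uses synthetic stand-in data, not the course dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the real dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression().fit(X_train, y_train)

# ROC needs probabilities, not hard class predictions
y_probs = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# False positive rate vs. true positive rate at each probability threshold
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
auc = roc_auc_score(y_test, y_probs)
print(auc)
```

Plotting `fpr` against `tpr` gives the curve itself; the AUC summarises it as a single number, where 0.5 is no better than guessing and 1.0 is perfect ranking.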