1 00:00:00,450 --> 00:00:03,390 Okay everyone, I hope you're as excited as I am. 2 00:00:03,390 --> 00:00:04,350 Why? 3 00:00:04,350 --> 00:00:06,500 Well, because we're on the home stretch. 4 00:00:06,570 --> 00:00:10,640 We're up to number six: save and load a trained model. 5 00:00:10,950 --> 00:00:12,910 So we've gone through all these steps here. 6 00:00:12,990 --> 00:00:17,430 We've seen how to improve our models with hyperparameter tuning, and let's say we're at a point where 7 00:00:17,430 --> 00:00:21,780 we've improved our model to the point that we're happy to either share it with one of our colleagues 8 00:00:21,780 --> 00:00:27,300 or something like that, or export it and save it and put it in one of our applications so that our customers 9 00:00:27,300 --> 00:00:30,430 might want to use it, might want to access its predictions. 10 00:00:30,540 --> 00:00:34,750 Something along those lines. Essentially, we want to take our model that we've trained. 11 00:00:34,800 --> 00:00:36,210 We've got these results. 12 00:00:36,480 --> 00:00:42,360 Export it to file so that we can import it and use it somewhere else, rather than it just being within 13 00:00:42,360 --> 00:00:46,910 our Jupyter notebook. So let's create a heading. 14 00:00:46,910 --> 00:00:54,260 Six: saving and loading trained machine learning models. 15 00:00:55,520 --> 00:00:56,580 Beautiful. 16 00:00:56,840 --> 00:00:59,300 And there's two ways we can do this. 17 00:00:59,300 --> 00:01:05,930 So two ways to save and load machine learning models. 18 00:01:05,930 --> 00:01:18,740 The first I'm going to cover is with Python's pickle module, and the second one is with the joblib module. 
19 00:01:18,740 --> 00:01:27,610 So let's check out how we might do it with pickle. So the first thing is to import pickle, and 20 00:01:27,990 --> 00:01:33,270 if you're wondering what pickle is, you can go "Python pickle". As always, if we're not sure what something 21 00:01:33,270 --> 00:01:38,550 is, we can search it up. So: Python object serialization. 22 00:01:38,550 --> 00:01:44,730 So if you read this, the pickle module implements binary protocols for serializing and de-serializing a 23 00:01:44,730 --> 00:01:52,650 Python object structure. Well, if you're wondering what a Python object structure is, in our case 24 00:01:52,930 --> 00:01:55,860 a Python object is our model. 25 00:01:55,860 --> 00:02:03,060 So if we want to save, for example, gs_clf, let's say we're most happy with our grid search 26 00:02:03,060 --> 00:02:04,260 parameters model. 27 00:02:04,380 --> 00:02:07,500 So the random forest classifier we've trained with grid search. 28 00:02:07,680 --> 00:02:13,120 If we wanted to export that, this is our Python object, and that's what pickle can help us do. 29 00:02:13,200 --> 00:02:13,860 So let's do that. 30 00:02:13,860 --> 00:02:20,460 Let's see an example of how we might export our gs_clf, which is a Python object, to a pickle file. 31 00:02:20,640 --> 00:02:25,950 So save an existing model to file. 32 00:02:26,390 --> 00:02:28,880 Yes please. 33 00:02:28,880 --> 00:02:35,030 And to do so we're going to use pickle.dump. We're going to pass it the object we would like to save, 34 00:02:35,030 --> 00:02:41,660 in our case gs_clf, our grid searched classifier, and then you can name it something here. 35 00:02:41,660 --> 00:02:47,540 So we need to use the open function, so Python's open function for dealing with input and output of files, 36 00:02:48,290 --> 00:02:53,900 and in our case we're just going to go gs_random_forest_model_1. 37 00:02:53,900 --> 00:02:56,720 Now you can get as creative as you like with these names. 
38 00:02:56,720 --> 00:03:00,650 There might be some nomenclature that you decide upon with your team or something like that. 39 00:03:00,650 --> 00:03:02,110 That's what you use for your models. 40 00:03:02,240 --> 00:03:09,050 But in our case I've just gone with something nice and simple, and .pkl stands for pickle file, and we also 41 00:03:09,050 --> 00:03:17,240 pass, because we're writing a file, the parameter "wb" for write binary. So if we hit Shift and Enter, 42 00:03:18,430 --> 00:03:27,120 beautiful, this is going to save our grid search model with this file name to our current working directory. 43 00:03:27,120 --> 00:03:32,580 So if we go up here, if we go gs_random, oh, we've got two randoms in there. 44 00:03:32,760 --> 00:03:34,780 So gs_random_random_forest. 45 00:03:34,820 --> 00:03:40,170 So this is a random random forest, really random, 1.pkl, and you can see there's some other model 46 00:03:40,170 --> 00:03:40,970 files here. 47 00:03:41,040 --> 00:03:46,500 So .pkl, this is one I tried just before, and this is another one we created right at the start 48 00:03:46,500 --> 00:03:51,960 where we were looking at a scikit-learn workflow. And now if you've exported a fair few of these models, you 49 00:03:51,960 --> 00:03:56,600 might want to put them in like a models folder, like we've got our data in a data folder. 50 00:03:56,670 --> 00:04:01,680 So this is just one way to tidy up the directory, but you can see the one we've just saved here, that's 51 00:04:01,680 --> 00:04:05,810 seconds ago. Now if we come back, how would we load that in? 52 00:04:06,060 --> 00:04:11,750 Well, we can do it with pickle as well. So load a saved model. 53 00:04:11,880 --> 00:04:18,520 So we're going to go loaded_pickle_model equals pickle.load. 
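The save step described above can be sketched roughly as follows. In the notebook the object being saved is gs_clf; so that this runs without the notebook's data, the sketch uses a statistics.NormalDist from the standard library as a stand-in fitted object, and the file name and temporary directory are hypothetical choices, not the ones from the video.

```python
import pickle
import statistics
import tempfile
from pathlib import Path

# Stand-in for the trained model (assumption: in the notebook this is gs_clf).
# NormalDist.from_samples "fits" a mean and standard deviation to the samples.
model = statistics.NormalDist.from_samples([2.5, 3.1, 2.9, 3.4, 3.0])

# Hypothetical location and file name; the video saves a .pkl file
# straight into the current working directory instead.
model_dir = Path(tempfile.mkdtemp())
model_path = model_dir / "stand_in_model_1.pkl"

# "wb" = write binary. The with-block closes the file once the dump finishes.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

print(model_path.name, model_path.stat().st_size, "bytes")
```

With the real classifier, the same pattern would presumably pass gs_clf as the first argument to pickle.dump.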
54 00:04:18,540 --> 00:04:24,270 Now the advantage here is that because we've saved a model to a pickle file, and because it's stored 55 00:04:24,270 --> 00:04:31,380 in our directory here, someone who wanted to use our model won't have to go through all of the 56 00:04:31,380 --> 00:04:33,150 training steps that we've done above. 57 00:04:33,150 --> 00:04:39,010 They can just access the model and be able to make predictions by loading it in. We could copy this file 58 00:04:39,010 --> 00:04:39,480 name here. 59 00:04:39,490 --> 00:04:45,760 But we're just going to type it in, gs, Tab. We use tab autocomplete for that. And now because we're 60 00:04:45,760 --> 00:04:51,390 reading in a file, we pass in the parameter "rb" for read binary. 61 00:04:51,400 --> 00:04:52,010 Wonderful. 62 00:04:52,090 --> 00:04:54,080 So that's loaded in our saved model. 63 00:04:54,220 --> 00:05:00,580 Now the way to check it is we can make some predictions to see if this model actually worked. If 64 00:05:00,580 --> 00:05:01,330 we saved it 65 00:05:01,480 --> 00:05:05,090 and we can use it to make predictions, well, then we're in a good place. 66 00:05:05,120 --> 00:05:11,620 We want to do loaded_pickle_model.predict. Now similar to how we made predictions before with our 67 00:05:11,830 --> 00:05:13,060 gs_clf, 68 00:05:13,130 --> 00:05:21,010 we're just going to run this exact same line here, but make predictions with our pickled model. Predict 69 00:05:21,190 --> 00:05:22,910 on the test dataset. 70 00:05:23,050 --> 00:05:23,620 Wonderful. 71 00:05:23,650 --> 00:05:32,020 And then we'll call our evaluate_preds function and compare the test labels to the pickle_y_preds. 72 00:05:34,050 --> 00:05:38,310 What have we got here? Found input variables with inconsistent numbers of samples. 73 00:05:38,730 --> 00:05:39,170 Mm hmm. 74 00:05:40,970 --> 00:05:42,070 I know what we've done. 
75 00:05:42,110 --> 00:05:46,320 We've got a data mismatch here. What we'd have to do 76 00:05:49,190 --> 00:05:54,800 is, I believe, if we just comment out this line and recreate, we have to re-instantiate our train and 77 00:05:54,800 --> 00:05:56,690 test split. That's what we have to do. 78 00:05:56,780 --> 00:06:05,400 So maybe I'll just comment out these lines for the time being, run this cell, come back down here. We load 79 00:06:05,400 --> 00:06:09,360 it in. So the mistake here, I believe, and I may be wrong, hit Shift Enter. 80 00:06:09,690 --> 00:06:10,870 There we go. 81 00:06:11,160 --> 00:06:18,720 Is that our X_test had been instantiated somewhere above and we had to re-instantiate it so that it 82 00:06:18,720 --> 00:06:20,910 was reset back to the correct shape. 83 00:06:21,060 --> 00:06:26,530 And now the way we can know that this has gone through correctly is if we compare. 84 00:06:26,610 --> 00:06:30,330 So if we compare our original gs_clf predictions. 85 00:06:30,390 --> 00:06:36,240 So these values here, if we copy these, this is the original grid search classifier metrics. 86 00:06:38,960 --> 00:06:39,860 So let's have a look. 87 00:06:40,010 --> 00:06:44,550 So this is our loaded model's metrics: accuracy seventy eight point six nine. 88 00:06:44,600 --> 00:06:53,690 Same. Precision zero point seven four, same. Recall zero point eight two, same. And F1 score zero point seven 89 00:06:53,690 --> 00:06:53,900 eight. 90 00:06:53,930 --> 00:06:59,330 So the same. That means that saving our model and re-importing it, 91 00:06:59,330 --> 00:07:05,150 so loading it in, has worked correctly. And now again, these names here, you can change them to whatever 92 00:07:05,150 --> 00:07:05,890 you like. 
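The "inconsistent numbers of samples" error from earlier comes down to the test labels and the predictions having different lengths. One way to catch that before calling an evaluation function is a small length check like the one below. The helper check_same_length and the toy lists are hypothetical, made up for this sketch; they are not part of the notebook or of scikit-learn.

```python
def check_same_length(y_true, y_preds):
    """Raise early if labels and predictions cannot be compared row by row."""
    if len(y_true) != len(y_preds):
        raise ValueError(
            f"Inconsistent number of samples: "
            f"{len(y_true)} labels vs {len(y_preds)} predictions"
        )


# Toy data (hypothetical): four test labels, one stale and one fresh prediction list.
y_test = [0, 1, 1, 0]
stale_preds = [0, 1, 1]     # made from an X_test with a different shape
fresh_preds = [0, 1, 0, 0]  # made after re-creating the train/test split

check_same_length(y_test, fresh_preds)  # lengths match, so nothing is raised

try:
    check_same_length(y_test, stale_preds)
except ValueError as err:
    mismatch_message = str(err)
    print(mismatch_message)
```

If the check fails, re-running the cell that creates the train and test split, as done in the video, is the fix.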
93 00:07:05,900 --> 00:07:11,300 Same with the loaded variable names. But this is just a quick example of how you can use pickle to quickly 94 00:07:11,300 --> 00:07:17,810 save one of your models to file and then load it back in at a later date if need be, and then make predictions 95 00:07:17,810 --> 00:07:24,050 using the exact same loaded model that gets the exact same results as the model we originally saved. In 96 00:07:24,050 --> 00:07:28,550 the next video we'll see how we can do the same thing, except this time using joblib.
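To recap the whole round trip, here is a minimal sketch of saving, loading, and checking that the loaded object gives the same outputs as the original. As an assumption for runnability, a statistics.NormalDist stands in for gs_clf, its cdf method stands in for predict, and the file name is hypothetical.

```python
import pickle
import statistics
import tempfile
from pathlib import Path

# Stand-in for the trained model (assumption: in the notebook this is gs_clf).
model = statistics.NormalDist.from_samples([2.5, 3.1, 2.9, 3.4, 3.0])

# Stand-in for predicting on the test set: evaluate the fitted object on a few points.
test_points = [2.0, 3.0, 4.0]
original_preds = [model.cdf(x) for x in test_points]

# Save with "wb" (write binary); hypothetical file name in a temporary directory.
model_path = Path(tempfile.mkdtemp()) / "stand_in_model_1.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load with "rb" (read binary); no refitting is needed.
with open(model_path, "rb") as f:
    loaded_pickle_model = pickle.load(f)

# The loaded object should reproduce the original outputs exactly.
loaded_preds = [loaded_pickle_model.cdf(x) for x in test_points]
print("Same outputs after loading:", loaded_preds == original_preds)
```

In the notebook, the equivalent check is the one from the video: call predict on the loaded model with the test data and compare the evaluate_preds metrics against the original gs_clf metrics.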