1 00:00:00,720 --> 00:00:06,570 We've got a predictions array, but now we need to get it into the Kaggle submission format, so that's what 2 00:00:06,570 --> 00:00:08,150 we'll be working towards in this video. 3 00:00:08,940 --> 00:00:12,060 Let's go make ourselves a little heading. 4 00:00:12,060 --> 00:00:21,810 Preparing test dataset predictions for Kaggle. And if we look at the sample submission, we kind of know 5 00:00:21,810 --> 00:00:27,330 what we need. We know we need an ID column, which is the file ID from our test images. 6 00:00:27,330 --> 00:00:34,110 We know we need columns for each of the different dog breeds, and we know we need the prediction probabilities, 7 00:00:34,110 --> 00:00:37,170 which is what we've got at the moment: we've got an array with all of these numbers. 8 00:00:37,200 --> 00:00:43,670 We basically need to just set up our prediction probabilities array into a DataFrame like this, 9 00:00:43,820 --> 00:00:48,300 export it as a CSV and then upload that to Kaggle. 10 00:00:48,300 --> 00:00:49,850 So let's write that down. 11 00:00:50,040 --> 00:00:55,910 Actually, we'll put a little link in here to what Kaggle wants: Evaluation. 12 00:00:55,910 --> 00:00:56,480 Copy that. 13 00:00:56,510 --> 00:01:00,350 This is just what the sample submission has to look like. 14 00:01:00,350 --> 00:01:16,550 Looking at the Kaggle sample submission, we find that it wants our model's prediction probability outputs 15 00:01:17,270 --> 00:01:30,020 in a DataFrame with an ID column and a column for each different dog breed. 16 00:01:30,040 --> 00:01:35,270 Well, it's not really a DataFrame, it's a CSV, but we can be a little bit incorrect there in our wording. 17 00:01:35,930 --> 00:01:38,090 And how are we gonna do this? 18 00:01:38,420 --> 00:01:40,220 Let's write ourselves some guides.
19 00:01:40,370 --> 00:01:49,640 So to get the data in this format... well, this is a testament to our favorite little graphic here of preparing 20 00:01:49,670 --> 00:01:51,170 inputs and outputs. 21 00:01:51,170 --> 00:01:57,420 So now we've worked with the inputs, we've saved and reloaded our trained model. 22 00:01:57,520 --> 00:02:00,140 Now we're making a bunch of predictions, or rather, we have made predictions. 23 00:02:00,160 --> 00:02:05,290 We're preparing our outputs, more specifically preparing our outputs for Kaggle. 24 00:02:05,750 --> 00:02:20,570 We will create a pandas DataFrame with an ID column as well as a column for each dog breed, 25 00:02:21,920 --> 00:02:32,240 and then we're going to add data to the ID column by extracting the test image IDs from their file 26 00:02:32,240 --> 00:02:36,870 paths, and then we're going to add data, 27 00:02:37,180 --> 00:02:51,910 the prediction probabilities, to each of the dog breed columns, and then we want to export the DataFrame 28 00:02:52,840 --> 00:03:00,790 as a CSV to submit it to Kaggle. 29 00:03:00,820 --> 00:03:02,530 This is what we're after. 30 00:03:02,830 --> 00:03:11,640 And probabilities is spelled wrong, standard Daniel typo. Poking in some nice little full stops. All right. 31 00:03:11,700 --> 00:03:12,470 So the first step. 32 00:03:12,480 --> 00:03:17,780 Let's go through and create a pandas DataFrame (we've had experience of this way back in the pandas 33 00:03:17,790 --> 00:03:20,460 section) with empty columns.
34 00:03:20,490 --> 00:03:28,830 So we'll go preds_df equals pd.DataFrame, so just the standard nomenclature for prediction data 35 00:03:28,830 --> 00:03:36,770 frame: preds_df, preds DataFrame, or predictions DataFrame. Columns is going to be equal to "id" 36 00:03:37,820 --> 00:03:46,330 plus, we need a column for each different dog breed, so we'll do list(unique_breeds), and I just want 37 00:03:46,330 --> 00:03:50,270 to show you what list(unique_breeds) looks like. list(unique_breeds). 38 00:03:53,640 --> 00:03:54,300 So there we go. 39 00:03:54,840 --> 00:04:09,690 So if we added "id" to that, let's give that a go. "id"... trigger happy once again... plus, there we 40 00:04:09,700 --> 00:04:11,980 go, ["id"] plus list(unique_breeds). 41 00:04:14,970 --> 00:04:25,320 That is some column names for us. We'll comment out that and we will check out the head of our preds_df. 42 00:04:25,320 --> 00:04:32,340 It should be empty with just a bunch of column names. That's what we're after: "id" and all of our unique 43 00:04:32,340 --> 00:04:35,700 breeds in order. We've done this. 44 00:04:35,700 --> 00:04:37,910 So putting in some emojis. 45 00:04:38,070 --> 00:04:43,290 All right, add data to the ID column by extracting the test image IDs from their file paths. I believe 46 00:04:43,290 --> 00:04:51,900 we have already done that, but what we might do is just do it again. test_filenames... but the way we can 47 00:04:51,900 --> 00:05:00,270 get each ID is, we want to extract this out. So we could do a little bit of fancy regex on this, but 48 00:05:00,450 --> 00:05:09,450 I'll show you another way. Append test image IDs to predictions DataFrame. 49 00:05:09,870 --> 00:05:17,650 So we want to go test_path. Look, did we already create test_path? We did. 50 00:05:17,680 --> 00:05:18,640 Okay. 51 00:05:18,790 --> 00:05:19,720 Wonderful. 52 00:05:19,720 --> 00:05:24,380 So let's get test_ids.
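A minimal sketch of the empty DataFrame step described above. The three breed names are a hypothetical stand-in for the notebook's full unique_breeds array, and the variable names preds_df and unique_breeds are assumed from the narration.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's unique_breeds array
unique_breeds = ["affenpinscher", "afghan_hound", "african_hunting_dog"]

# One "id" column followed by one column per dog breed, no rows yet
preds_df = pd.DataFrame(columns=["id"] + list(unique_breeds))

# Shows only the column names, because the DataFrame is still empty
print(preds_df.head())
```

Building the column list first makes it easy to check the column order against the sample submission before any data goes in.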
53 00:05:24,430 --> 00:05:33,640 So this is just the path to our test files, and test_ids is going to be equal to a list comprehension 54 00:05:33,640 --> 00:05:35,320 of os.path. 55 00:05:35,320 --> 00:05:37,940 And then what is it? splitext. 56 00:05:37,990 --> 00:05:48,810 I'll show you what this does in a second. splitext(path) for path in os.listdir(test_path). 57 00:05:48,820 --> 00:05:52,910 That should be correct. How can I show you an example? 58 00:05:53,150 --> 00:05:56,030 os.path.splitext. 59 00:05:56,030 --> 00:06:01,330 We want to get test_filenames[0]. 60 00:06:01,650 --> 00:06:04,160 Will this work? 61 00:06:04,190 --> 00:06:04,670 There we go. 62 00:06:04,950 --> 00:06:05,230 OK. 63 00:06:05,270 --> 00:06:09,980 So it splits off the last little bit of the path, but we only want this. 64 00:06:10,100 --> 00:06:19,790 So that's why we have to do it for just the test path. So we want to just go through this test folder. 65 00:06:19,820 --> 00:06:22,500 Now it's not going to show up here, is it, because there's a lot of images in there. 66 00:06:22,500 --> 00:06:26,650 That'll take a long time to load in Colab. You're just going to have to trust me with what this does. 67 00:06:28,780 --> 00:06:30,620 Let's just see it in action anyway. 68 00:06:30,760 --> 00:06:32,150 test_ids. 69 00:06:32,560 --> 00:06:39,410 Will this work? If in doubt, run the code. 70 00:06:39,470 --> 00:06:48,660 All righty. Actually, we don't need the .jpg bit. 71 00:06:48,710 --> 00:06:53,720 So what we're going to have to do, we only want the first element of each of these. 72 00:06:54,320 --> 00:07:01,850 So we're going to have to put a little zero index there, and let's see what this does. 73 00:07:01,890 --> 00:07:03,120 That's what we're after. 74 00:07:03,120 --> 00:07:04,080 test_ids. 75 00:07:05,030 --> 00:07:06,390 I don't want a large printout there. 76 00:07:06,900 --> 00:07:10,100 So what I'm gonna do is just go preds_df.
77 00:07:10,200 --> 00:07:12,590 So I'm taking my DataFrame from up here. 78 00:07:12,710 --> 00:07:18,430 I'm going to make an "id" column with the test_ids. 79 00:07:18,900 --> 00:07:24,930 So just taking that and putting that there. Little bit of overkill running this cell three times, but 80 00:07:25,290 --> 00:07:26,430 that is all right. 81 00:07:26,460 --> 00:07:32,860 Let's check out the head of our preds_df, as we always do when we manipulate our data. Okay. 82 00:07:32,880 --> 00:07:34,230 So you've got some IDs in there. 83 00:07:34,230 --> 00:07:40,770 Now we need to just append all of our prediction probabilities for each dog breed. 84 00:07:40,770 --> 00:07:42,440 Now how might we do that? 85 00:07:42,440 --> 00:07:43,290 Mm hmm. 86 00:07:43,320 --> 00:07:47,700 I want you to have a think about this for a second. 87 00:07:47,830 --> 00:07:49,310 What do we have? 88 00:07:49,420 --> 00:07:56,390 We have test_predictions, which is an array of prediction probabilities. And what do we have that's a 89 00:07:56,390 --> 00:07:57,950 list of different dog breeds? 90 00:07:58,910 --> 00:08:05,330 We have unique_breeds, 'cause we do. So add the prediction probabilities. This is getting exciting, we're getting 91 00:08:05,330 --> 00:08:11,260 so close to being able to submit our first full-blown deep learning model to Kaggle. 92 00:08:11,450 --> 00:08:19,150 We've gone from head to toe, the whole process, the whole shebang. 93 00:08:19,430 --> 00:08:21,130 If you can't tell, I'm pumped. 94 00:08:21,150 --> 00:08:26,640 So this is going to access the preds_df columns for all of the unique breeds. 95 00:08:26,640 --> 00:08:34,770 So all of the breed columns, and then we want to just set that, easy as this, to test_predictions. 96 00:08:34,770 --> 00:08:37,200 preds_df.head() 97 00:08:40,830 --> 00:08:45,050 and we are done, ladies and gentlemen. Have a go at that. 98 00:08:45,150 --> 00:08:48,360 We would compare that to the sample submission.
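The two steps just described (pulling IDs out of file names, then filling the breed columns) can be sketched roughly as below. This is an illustration under assumptions: the file names and probabilities are made up, a plain list stands in for os.listdir(test_path), and the names test_filenames, test_ids, test_predictions, unique_breeds and preds_df are assumed from the narration.

```python
import os

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the notebook's real data
unique_breeds = ["affenpinscher", "afghan_hound", "african_hunting_dog"]
test_filenames = ["abc123.jpg", "def456.jpg"]  # in the video: os.listdir(test_path)
test_predictions = np.array([[0.7, 0.2, 0.1],
                             [0.1, 0.3, 0.6]])  # one row per image, one column per breed

preds_df = pd.DataFrame(columns=["id"] + list(unique_breeds))

# os.path.splitext returns a (root, extension) tuple; index 0 drops the ".jpg"
test_ids = [os.path.splitext(name)[0] for name in test_filenames]
preds_df["id"] = test_ids

# Assign the whole probability array to the breed columns in one step
preds_df[list(unique_breeds)] = test_predictions

print(preds_df.head())
```

This relies on the rows of test_predictions being in the same order as the file names the IDs were taken from, and on the columns following the order of unique_breeds.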
99 00:08:48,480 --> 00:08:54,860 Of course this goes on for a lot longer, but ours is using the model we have built. 100 00:08:54,870 --> 00:08:57,250 The numbers are going to be different, but that's okay. 101 00:08:57,480 --> 00:09:02,240 Now let's export it to a CSV so we can submit it to Kaggle. 102 00:09:02,310 --> 00:09:06,070 How do we export a pandas DataFrame to CSV? 103 00:09:06,090 --> 00:09:14,670 We use to_csv. We want to go drive, My Drive, I'm going to go to Dog Vision, and I'm just gonna 104 00:09:14,700 --> 00:09:27,930 save it as maybe full model predictions submission one CSV, or maybe we'll also tell ourselves MobileNet 105 00:09:29,160 --> 00:09:32,670 V2, dot csv. 106 00:09:32,860 --> 00:09:34,900 And we don't want the index. 107 00:09:35,230 --> 00:09:39,570 So index=False. We'll go here. 108 00:09:39,570 --> 00:09:51,390 We'll just write what we're doing: save our predictions DataFrame to CSV for submission to Kaggle. 109 00:09:52,900 --> 00:09:54,610 Is this gonna save? 110 00:09:54,710 --> 00:09:56,030 Fingers crossed. 111 00:09:56,240 --> 00:09:58,670 Yes, of course it will. 112 00:09:58,670 --> 00:09:59,810 Now let's have a look in here. 113 00:10:01,490 --> 00:10:11,400 In our files... oh, maybe we've glitched out. We've just saved a CSV there. Maybe Colab's lost the plot on 114 00:10:11,400 --> 00:10:12,570 us. 115 00:10:12,570 --> 00:10:21,700 Give me a second for my Colab window to reload and I'll be back once I have access to this folder. Okay. 116 00:10:21,770 --> 00:10:22,240 I'm back. 117 00:10:22,520 --> 00:10:24,860 So sorry about that little interruption in broadcast. 118 00:10:24,860 --> 00:10:27,770 Seems my Colab window decided to go bonkers. 119 00:10:27,770 --> 00:10:28,520 I see why. 120 00:10:28,520 --> 00:10:35,750 Because I decided to open the test folder, and I need to close that, and I might have to pause it again. 121 00:10:36,740 --> 00:10:38,120 One second. 122 00:10:38,120 --> 00:10:38,440 All right.
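A small sketch of the export step mentioned above, under assumptions: the DataFrame contents are made up, and a temporary directory with a hypothetical file name stands in for the Google Drive folder used in the video.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the filled-in predictions DataFrame
preds_df = pd.DataFrame({
    "id": ["abc123", "def456"],
    "affenpinscher": [0.7, 0.1],
    "afghan_hound": [0.3, 0.9],
})

# Hypothetical output location (the video saves into a Drive folder instead)
csv_path = os.path.join(tempfile.mkdtemp(), "submission_example.csv")

# index=False keeps the row index out of the file, so "id" is the first column
preds_df.to_csv(csv_path, index=False)

# Read the file back to check the columns match what was written
reloaded = pd.read_csv(csv_path)
print(reloaded.columns.tolist())
```

Without index=False, pandas writes the row index as an extra unnamed first column, which would not match the sample submission layout.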
123 00:10:38,450 --> 00:10:39,380 Third time's a charm. 124 00:10:39,380 --> 00:10:40,060 We are back. 125 00:10:40,070 --> 00:10:44,830 So be very careful with folders that you open in your Colab window. 126 00:10:44,990 --> 00:10:48,160 If they're full of data, a.k.a. 10,000-plus images, 127 00:10:48,260 --> 00:10:54,430 it will freeze up your Colab notebook. There we go, full model predictions. 128 00:10:54,430 --> 00:10:56,370 Well, we can't really see the whole thing there. 129 00:10:56,480 --> 00:10:57,130 Oh, there we go. 130 00:10:57,370 --> 00:10:59,200 And that's just our CSV file there. 131 00:10:59,800 --> 00:11:02,000 So if we wanted to, we can download that. 132 00:11:02,050 --> 00:11:03,530 We do want to. 133 00:11:03,550 --> 00:11:05,190 So we'll just wait for that to go through. 134 00:11:05,200 --> 00:11:09,650 I'll speed up my video until this is fully downloaded. All righty. 135 00:11:09,740 --> 00:11:11,710 So this is fully downloaded. 136 00:11:11,750 --> 00:11:15,980 All I did was just click download and then waited for it to go through, about a minute or so, not too long. 137 00:11:15,980 --> 00:11:17,900 This is about a 30 megabyte CSV. 138 00:11:18,160 --> 00:11:21,120 We come to my downloads folder. 139 00:11:21,120 --> 00:11:21,910 If we come here. 140 00:11:21,970 --> 00:11:26,580 Yeah, about a 30 megabyte CSV, so 10,000 predictions. 141 00:11:26,690 --> 00:11:27,590 Now we've got that. 142 00:11:27,620 --> 00:11:33,590 Let's go to Kaggle, and we can go late submission because this competition has already closed. You can see 143 00:11:33,590 --> 00:11:36,390 my past one there, five days ago. 144 00:11:36,620 --> 00:11:42,140 So let's upload. I want to show you that this works. Open. 145 00:11:42,590 --> 00:11:51,680 I'm just selecting the CSV we just downloaded. It's gonna go through. Briefly describe your submission. 146 00:11:51,910 --> 00:11:52,640 Let's go.
147 00:11:52,720 --> 00:11:55,180 Transfer learning, that's what we'll call it because that's what we used. 148 00:11:55,180 --> 00:12:05,590 Transfer learning, MobileNetV2, Dog Vision. 149 00:12:05,630 --> 00:12:12,110 Baby. And then we'll make submission and it's gonna score it. 150 00:12:12,220 --> 00:12:14,440 Now there's something up with this. 151 00:12:14,570 --> 00:12:19,710 Oh 20, so that would be our goal: to beat this one on the next one. 152 00:12:20,620 --> 00:12:28,140 And if we go back to the leaderboard to see where we appeared, we'll probably find public leaderboard score 153 00:12:28,140 --> 00:12:30,570 0. 154 00:12:31,210 --> 00:12:36,010 I'd be very skeptical of any kind of competition where people have a score of zero. 155 00:12:36,100 --> 00:12:45,040 So what you can actually find out is you can actually janky, or actually hack, the score here, so don't 156 00:12:45,040 --> 00:12:48,960 worry too much about submitting to here on a late submission. 157 00:12:48,970 --> 00:12:56,350 Your next goal is to reduce your own score, and if you want my example submission CSV, I'll just link 158 00:12:56,350 --> 00:13:00,550 this in the resource section. You'll probably get the same score as me. 159 00:13:00,660 --> 00:13:06,990 Our next step in this project, if we were to keep extending it and train a new model, would be to beat 160 00:13:07,080 --> 00:13:12,810 our original score. Rather than paying attention to the leaderboard straight away, focus, when you 161 00:13:12,810 --> 00:13:15,690 first begin, on beating your own score. 162 00:13:15,690 --> 00:13:23,720 So now we've seen how to make full-blown predictions for the Kaggle competition, but that's not our original 163 00:13:23,720 --> 00:13:24,440 focus. 164 00:13:24,440 --> 00:13:25,610 Well, it kind of was. 165 00:13:25,610 --> 00:13:27,760 I lied. Right back up the top,
166 00:13:27,800 --> 00:13:34,260 what's this notebook called? Dog Vision. So our problem is identifying the breed of a dog given an image 167 00:13:34,260 --> 00:13:34,730 of a dog. 168 00:13:34,890 --> 00:13:39,720 When I'm sitting at the cafe and I take a photo of a dog, I want to know what breed of dog it is. 169 00:13:39,840 --> 00:13:47,060 So it's likely you want to be able to make predictions on your own custom images. 170 00:13:47,070 --> 00:13:51,750 So rather than just images that Kaggle has given us in a test and training set, 171 00:13:51,780 --> 00:13:58,290 what if you had some images of your own dog, which I happen to have, or a dog that you saw at the cafe 172 00:13:58,290 --> 00:14:02,550 that looked really cute and pretty, and you wanted to know what breed it was? You took a photo and you 173 00:14:02,550 --> 00:14:07,530 came back to a machine learning model in this notebook and you're like, I want to see what breed of dog 174 00:14:07,530 --> 00:14:09,380 that was using my model. 175 00:14:09,390 --> 00:14:10,570 Well, let's have a look at how to do that. 176 00:14:10,570 --> 00:14:14,160 The next video: making predictions on custom images.