1 00:00:00,450 --> 00:00:11,620 OK, so last video we read in our labels.csv, which is now in the DataFrame labels_csv, and 2 00:00:11,620 --> 00:00:17,980 it contains image IDs, which are file names, and their breed labels. And then we checked out the 3 00:00:17,980 --> 00:00:25,010 distribution of our breed labels and we figured out there's about 80 or so images per class. 4 00:00:25,210 --> 00:00:32,350 Now it'd be good if we could view an image inside the notebook, because this is part of our exploratory 5 00:00:32,350 --> 00:00:35,110 data analysis: checking out our data. 6 00:00:35,140 --> 00:00:38,380 Let's do that, let's view an image. 7 00:00:38,890 --> 00:00:42,550 So one way to do that is using IPython. 8 00:00:42,550 --> 00:00:47,380 Remember, even though we're working in Google Colab, this is still, under the hood, 9 00:00:47,400 --> 00:00:50,980 a Jupyter notebook, so we can go from IPython 10 00:00:51,030 --> 00:00:56,080 .display import Image. 11 00:00:56,220 --> 00:01:06,160 Then we can write Image and then we just pass it a file name. And a little trick here in Colab is, 12 00:01:06,970 --> 00:01:11,950 before, we had to press Shift and Tab in a Jupyter notebook to find the docstring. In Colab, for me, it's 13 00:01:11,950 --> 00:01:16,320 Command Space Shift, and then Siri comes up. 14 00:01:16,320 --> 00:01:19,590 So Command Shift Space. 15 00:01:19,660 --> 00:01:23,910 There we go. On Windows it might be Control instead of Command. 16 00:01:24,880 --> 00:01:25,420 So there we go. 17 00:01:25,420 --> 00:01:27,330 This is our docstring here. 18 00:01:27,400 --> 00:01:32,560 We need to pass it some sort of data, and if we want to see it we need to pass it a file name. 19 00:01:32,650 --> 00:01:36,730 Let's grab one of these image IDs, so I'll just copy that. 20 00:01:36,730 --> 00:01:37,630 It should be...
21 00:01:37,630 --> 00:01:43,880 Actually, let's have a look at a dingo. See, I'm from Australia, and we're pretty famous for a story called 22 00:01:43,880 --> 00:01:46,380 "A dingo took my baby". 23 00:01:47,360 --> 00:01:50,110 I won't go into the details of that but you can look it up. 24 00:01:50,390 --> 00:01:52,390 And then if we go here, we go drive. 25 00:01:52,390 --> 00:01:53,790 We need to pass it a file name. 26 00:01:53,810 --> 00:01:57,190 My Drive. File path, sorry. 27 00:01:57,230 --> 00:01:58,440 And then we go now, 28 00:01:58,790 --> 00:02:03,580 Dog Vision folder, Dog Vision slash. 29 00:02:04,080 --> 00:02:06,610 It's going to be in the training folder. 30 00:02:06,890 --> 00:02:14,430 This, and we need to put .jpg on the end, because if we come up to Kaggle we can see that is actually 31 00:02:14,430 --> 00:02:16,040 what the file name is. 32 00:02:16,050 --> 00:02:20,630 That's the ID, but the full file name has the .jpg. 33 00:02:20,910 --> 00:02:28,500 Now let's close this bad boy and Shift+Enter. Oh, look at that cute little dingo. 34 00:02:28,920 --> 00:02:31,470 But apparently they can be savages at times. 35 00:02:31,470 --> 00:02:33,710 That's one way to view images in our notebook. 36 00:02:33,900 --> 00:02:39,540 So if we needed to explore what image we're working with while we're turning it into a tensor, we can 37 00:02:39,540 --> 00:02:46,560 use this little function here, or I think there's another one with matplotlib we can use, imshow, but 38 00:02:46,560 --> 00:02:47,700 we'll have a look at that one later. 39 00:02:48,660 --> 00:02:53,910 Now we can view an image and we've got the file IDs and their breeds. 40 00:02:53,910 --> 00:03:00,420 Let's get a list of all the file names so we don't have to type out this every single time, because that 41 00:03:00,420 --> 00:03:01,730 would get annoying. 42 00:03:01,740 --> 00:03:05,280 So what we might do is write a little markdown cell here.
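The image-viewing step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's exact cell: the spelling of the Drive path is inferred from what is spoken, `EXAMPLE_IMAGE_ID` is a hypothetical placeholder (the video copies a real ID from the `id` column), and the helper `image_path` is a name introduced here.

```python
# Hypothetical placeholder: swap in a real ID copied from the `id` column of labels_csv.
EXAMPLE_IMAGE_ID = "your_image_id_here"

# Assumed folder layout in Google Drive (spelling inferred from the spoken path).
TRAIN_DIR = "drive/My Drive/Dog Vision/train/"


def image_path(image_id, train_dir=TRAIN_DIR):
    """Build a full file path: folder + image ID + the .jpg extension."""
    return train_dir + image_id + ".jpg"


path = image_path(EXAMPLE_IMAGE_ID)
print(path)

# Guarded import so the sketch still runs where IPython is not installed.
try:
    from IPython.display import Image
except ImportError:
    Image = None

# In a Colab or Jupyter cell, making this the last expression renders the picture:
# Image(path)
```

The ID alone is not enough: as noted in the video, the file on disk carries the `.jpg` extension, so it has to be appended before the path is passed to `Image`.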
43 00:03:05,310 --> 00:03:15,930 So press Command M M, change that into markdown, and go here: "Getting images and their labels". 44 00:03:16,260 --> 00:03:24,440 "Let's get a list of all of our image file path names." 45 00:03:24,750 --> 00:03:25,910 Beautiful. 46 00:03:26,310 --> 00:03:27,660 And so how can we do this? 47 00:03:27,680 --> 00:03:38,310 So we might go here: "Create path names from image IDs". Now we've just done one here. 48 00:03:38,310 --> 00:03:39,150 So this is a path 49 00:03:39,150 --> 00:03:46,380 name of this particular image, and we did it by stealing this value here from labels, so we might bring 50 00:03:46,380 --> 00:03:56,800 labels_csv down here just so we can have a gaze at it while we work with this: labels_csv.head. 51 00:03:56,900 --> 00:03:57,500 There we go. 52 00:03:57,980 --> 00:04:05,600 So we might create a little variable to loop through this and create these long strings at the same 53 00:04:05,600 --> 00:04:06,440 time. 54 00:04:06,540 --> 00:04:12,410 So let's see if we can do that with a list comprehension. So filenames equals, let's just go fname for 55 00:04:12,440 --> 00:04:14,420 fname to start with. 56 00:04:14,420 --> 00:04:21,600 We'll start small: in labels_csv id. What does this return? 57 00:04:24,380 --> 00:04:29,870 Check the first 10 file names. 58 00:04:30,050 --> 00:04:31,310 Wonderful. 59 00:04:31,340 --> 00:04:33,910 So this is just a short little list comprehension. 60 00:04:33,920 --> 00:04:42,750 Basically it's saying, create a list of fname for fname, which is short for file name, in the labels_csv 61 00:04:43,160 --> 00:04:44,150 id column. 62 00:04:44,300 --> 00:04:50,900 So take this id column, go through each one of them, take the value and save it to the list. 63 00:04:50,900 --> 00:04:51,910 So that's what we've got there. 64 00:04:54,900 --> 00:05:01,600 Now we need to change it up to be a bit more like this, because these at the moment are just IDs.
65 00:05:01,620 --> 00:05:14,970 So what we'll do is drive, My Drive, Dog Vision, train. So we can go drive slash My Drive slash Dog Vision 66 00:05:15,840 --> 00:05:24,150 slash train, and then it's gonna be plus fname on the end of the string, plus fname. 67 00:05:24,150 --> 00:05:30,380 What does that return? I said check the first 10 here but I've lied. 68 00:05:30,380 --> 00:05:32,060 I didn't put the little slice. 69 00:05:36,780 --> 00:05:37,380 OK. 70 00:05:37,380 --> 00:05:44,170 And now we just need to add, of course, the file extension, because otherwise it wouldn't work. 71 00:05:44,180 --> 00:05:48,540 These are JPEGs: dot jpg. How does this look? 72 00:05:50,910 --> 00:05:52,210 Wonderful. 73 00:05:52,230 --> 00:05:52,930 OK. 74 00:05:53,250 --> 00:05:59,580 So now we've got a list of all the file names from the id column of labels_csv. What we should do is 75 00:05:59,580 --> 00:06:05,280 compare them to the number of files in our training data directory. 76 00:06:05,280 --> 00:06:06,450 Why would we do this? 77 00:06:06,450 --> 00:06:13,080 Well, this is to make sure that we've got the same amount of file names as we do actual files in our 78 00:06:13,080 --> 00:06:14,220 training folder. 79 00:06:14,250 --> 00:06:21,370 We downloaded this data from Kaggle, we uploaded it to Colab, or more specifically our Dog Vision folder, 80 00:06:22,210 --> 00:06:23,390 and then we unzipped it. 81 00:06:23,390 --> 00:06:28,110 So we want to make sure that we're working with the same amount of data. 82 00:06:28,190 --> 00:06:28,940 So let's do that. 83 00:06:28,970 --> 00:06:41,150 So we go here: "Check whether number of file names matches the number of actual image files", because otherwise, 84 00:06:41,300 --> 00:06:47,080 if we just kept going now and we worked out later on that, hey, we've got some issue with our data, 85 00:06:48,190 --> 00:06:51,220 then it's kind of going to undo a lot of the stuff that we've done. 86 00:06:51,400 --> 00:06:51,960 So that's it.
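The path-building list comprehension walked through above might look like the sketch below. To keep it runnable without pandas or the dataset, a plain Python list of made-up IDs stands in for the `id` column of `labels_csv`; the exact spelling of the Drive path is likewise an assumption based on the spoken version.

```python
# Stand-in for labels_csv["id"]; in the notebook this is a pandas DataFrame column.
# These IDs are made up for illustration.
image_ids = ["id_a", "id_b", "id_c"]

# Assumed folder layout in Google Drive (spelling inferred from the spoken path).
TRAIN_DIR = "drive/My Drive/Dog Vision/train/"

# Step 1: start small and just collect each ID.
filenames = [fname for fname in image_ids]

# Step 2: put the folder in front of each ID and the file extension after it.
filenames = [TRAIN_DIR + fname + ".jpg" for fname in image_ids]

# Check the first 10 (slicing is safe even when the list is shorter).
print(filenames[:10])
```

Iterating over a pandas column yields its values one by one, so swapping `image_ids` for the real `id` column should leave the comprehension unchanged.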
87 00:06:51,970 --> 00:06:59,620 That's a big point: a lot of machine learning is just basically massaging your data into a form so 88 00:06:59,620 --> 00:07:02,620 that it works with machine learning, if that makes sense. 89 00:07:02,620 --> 00:07:07,720 So that's what we're doing now: we're just verifying that the data we've got is the right format, the 90 00:07:07,720 --> 00:07:15,460 right amount, and then we know once we run our machine learning models they'll be all hunky-dory, or at 91 00:07:15,460 --> 00:07:17,520 least as hunky-dory as possible. 92 00:07:17,770 --> 00:07:25,280 So we're going to import os, and we can check the amount of files in a particular folder by using listdir. 93 00:07:25,450 --> 00:07:27,920 So len os.listdir. 94 00:07:28,480 --> 00:07:31,300 I'll just write the code and then we'll explain it. 95 00:07:34,720 --> 00:07:50,280 Train, is that equal to len of filenames, and if it is, print "Filenames match actual amount of files. 96 00:07:52,820 --> 00:07:54,120 Proceed." 97 00:07:54,310 --> 00:08:05,090 It's like we're giving ourselves a little instruction. Else print "Filenames do not match actual amount 98 00:08:05,090 --> 00:08:10,710 of files, check the target directory." 99 00:08:10,750 --> 00:08:11,420 All right. 100 00:08:11,420 --> 00:08:15,540 So what this is going to do, if we do os.listdir, let's see what this prints out. 101 00:08:18,280 --> 00:08:19,410 Code. 102 00:08:19,750 --> 00:08:25,900 Actually I might only do the first 10 again, otherwise it'll be just spam. Name 'os' is not defined. 103 00:08:25,900 --> 00:08:29,320 Oh, that would make sense, because the import os is there and not here. 104 00:08:30,340 --> 00:08:31,300 Let's do it up here. 105 00:08:34,680 --> 00:08:42,030 So what this is going to do is tell Colab to, hey, go into Dog Vision train. 106 00:08:42,050 --> 00:08:44,390 Oh, I've just called it data. 107 00:08:44,400 --> 00:08:45,860 There's no file there. 108 00:08:45,990 --> 00:08:48,680 Stop that.
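The file-count check dictated above can be sketched as follows. So that it runs anywhere, a temporary directory with a few empty files stands in for the real training folder; the made-up IDs and the helper name `filenames_match` are assumptions introduced for this illustration, while the two printed messages follow the ones read out in the video.

```python
import os
import tempfile

# Hypothetical stand-in for the training folder so the sketch runs anywhere;
# in the notebook this would be the Dog Vision train folder in Google Drive.
train_dir = tempfile.mkdtemp()
image_ids = ["id_a", "id_b", "id_c"]  # made-up IDs for illustration

# Create one empty .jpg file per ID to mimic the unzipped images.
for image_id in image_ids:
    with open(os.path.join(train_dir, image_id + ".jpg"), "w") as f:
        f.write("")

filenames = [os.path.join(train_dir, image_id + ".jpg") for image_id in image_ids]


def filenames_match(directory, names):
    """True when the directory holds as many entries as there are names."""
    return len(os.listdir(directory)) == len(names)


if filenames_match(train_dir, filenames):
    print("Filenames match actual amount of files! Proceed.")
else:
    print("Filenames do not match actual amount of files, check the target directory.")
```

Note that `os.listdir` returns every entry in the directory, subfolders included, so a stray folder or hidden file inside the training directory would make the counts differ even when all images are present.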
109 00:08:48,890 --> 00:08:52,750 By the way, if you see "the executing code is not responding to interrupt", 110 00:08:52,760 --> 00:08:57,080 so if this comes up, it's because this cell is unable to stop. 111 00:08:57,080 --> 00:08:59,700 So these are the things you run into. 112 00:08:59,910 --> 00:09:06,870 And the reason why this is going in sort of an infinite loop is because I actually do have a folder 113 00:09:06,870 --> 00:09:09,960 called data, which is what this used to be. 114 00:09:09,960 --> 00:09:16,630 And within there there is also a train folder, so it's looping through there. 115 00:09:16,700 --> 00:09:17,530 What I'm gonna do... 116 00:09:17,630 --> 00:09:21,460 Can I cancel that? Then cancel it. 117 00:09:21,460 --> 00:09:28,050 So let me just tell you what it does: os.listdir is just gonna go through a directory and then list 118 00:09:28,140 --> 00:09:29,580 all the files in that directory. 119 00:09:29,910 --> 00:09:36,180 So we're saying, if the length of all the files in this drive, My Drive, Dog Vision... 120 00:09:39,780 --> 00:09:43,580 Here's another warning that's popped up. This is beautiful timing. 121 00:09:43,580 --> 00:09:46,460 So these are the actual things that you're going to run into. 122 00:09:46,520 --> 00:09:47,690 Get back to our keynote. 123 00:09:47,690 --> 00:09:48,510 Here we go. 124 00:09:49,480 --> 00:09:55,360 So this is a little error that you might experience if there's something going on with your Google Drive 125 00:09:55,360 --> 00:09:58,080 and the way that Colab is interacting with it. 126 00:09:59,130 --> 00:10:02,310 So the good thing is it usually fixes itself. 127 00:10:02,310 --> 00:10:08,770 And if you click this "More info" tab you'll get this little paragraph of text. 128 00:10:08,770 --> 00:10:16,580 So basically it's saying, if you have a lot of files in Google Drive in the top folder, 129 00:10:16,840 --> 00:10:24,800 so if we come back to Dog Vision, so in this folder here, if you have a lot of files there,
130 00:10:24,820 --> 00:10:30,610 Google Drive can time out, as well as if you have a lot of files within a particular folder, 131 00:10:30,780 --> 00:10:32,430 it can also time out. 132 00:10:32,550 --> 00:10:37,230 So that's very handy that we saw that in real time, because I don't want you to come across these warnings 133 00:10:37,230 --> 00:10:38,520 and not be sure of what to do. 134 00:10:38,520 --> 00:10:42,860 But if in doubt, remember you can always just search this, ask a question. 135 00:10:42,870 --> 00:10:44,540 I'm sure someone will be able to help you out. 136 00:10:44,550 --> 00:10:46,100 If not, I'll help you out. 137 00:10:46,110 --> 00:10:47,730 Click the links. 138 00:10:47,730 --> 00:10:51,790 There's a bunch of things you can try out. Let's just run the code. 139 00:10:51,870 --> 00:10:53,060 We've spoken about it for too long. 140 00:10:54,560 --> 00:10:56,420 "Filenames match actual amount of files. 141 00:10:56,430 --> 00:10:57,390 Proceed." 142 00:10:57,390 --> 00:11:01,590 Now I will let you test this one out yourself. 143 00:11:01,590 --> 00:11:05,330 We're going to follow the orders of this code here and proceed. 144 00:11:05,390 --> 00:11:08,520 OK, so everything's working. 145 00:11:08,520 --> 00:11:16,160 We have all the file names in a list and we have the same amount of file names as we have actual images. 146 00:11:16,170 --> 00:11:21,390 Let's do one more check to make sure that we're visualizing directly from a file path, so we can go 147 00:11:21,390 --> 00:11:27,070 here: "One more check". 148 00:11:27,100 --> 00:11:30,370 Remember we want to make sure our data is in the right format. 149 00:11:30,820 --> 00:11:33,220 Let's go the nine thousand. 150 00:11:33,220 --> 00:11:39,690 Image, nine thousand index. 151 00:11:39,860 --> 00:11:42,740 Good Lord. 152 00:11:43,190 --> 00:11:45,480 I wonder what type of dog that is. 153 00:11:45,500 --> 00:11:47,360 What an absolute beast. We can find out. 154 00:11:47,360 --> 00:11:47,780 Really.
155 00:11:48,230 --> 00:11:49,550 Let's go here. 156 00:11:50,120 --> 00:11:56,000 labels_csv id, nine thousand. 157 00:11:59,130 --> 00:12:00,710 Oh no, we want breed. 158 00:12:04,510 --> 00:12:05,740 That thing looks savage. 159 00:12:05,740 --> 00:12:09,030 Look at the size of that chain. Tibetan mastiff. 160 00:12:09,250 --> 00:12:14,080 That even sounds like a tough dog. I want one of them. 161 00:12:14,090 --> 00:12:16,370 That's a beast, that's like a dog Hercules would have. 162 00:12:16,550 --> 00:12:17,790 OK. 163 00:12:18,010 --> 00:12:20,220 Now we've got all of our file names and they're correct. 164 00:12:20,230 --> 00:12:22,710 What we're going to have to do, so we've got the file names, 165 00:12:22,810 --> 00:12:25,170 now we need to work with our labels. 166 00:12:25,180 --> 00:12:29,040 We need to get our labels in a format that we can use. 167 00:12:29,050 --> 00:12:31,020 So let's do that in the next video.
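The "one more check" above relies on the file-name list and the breed labels sharing the same order, so one index retrieves both the image path and its breed. A small sketch of that idea, using made-up IDs and breed names in plain lists in place of the `id` and `breed` columns of `labels_csv` (all values here are hypothetical, and index 2 stands in for the 9000 used in the video):

```python
# Made-up stand-ins for the `id` and `breed` columns of labels_csv.
image_ids = ["id_a", "id_b", "id_c"]
breeds = ["breed_a", "breed_b", "breed_c"]

# Assumed folder layout in Google Drive (spelling inferred from the spoken path).
TRAIN_DIR = "drive/My Drive/Dog Vision/train/"
filenames = [TRAIN_DIR + fname + ".jpg" for fname in image_ids]

# The video uses index 9000; a small index is used here so the sketch runs.
index = 2

# Because both lists come from the same rows in the same order,
# position `index` in one corresponds to position `index` in the other.
print(filenames[index])  # the path that would be passed to Image(...) in the notebook
print(breeds[index])     # the breed label for that same image
```

If the file names were ever shuffled or filtered separately from the labels, this correspondence would break, which is why the list is built directly from the label table.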