1 00:00:00,210 --> 00:00:03,480 Now this is exactly what I was talking about. 2 00:00:03,540 --> 00:00:07,860 If we go back to the Keynote, things you might see in Colab: 3 00:00:07,860 --> 00:00:09,070 this is one of them. 4 00:00:09,090 --> 00:00:11,970 So what it means is that I took a little break just now. 5 00:00:11,970 --> 00:00:18,710 And if we come back to Colab, because of the limited amount of resources that Google is able to provide, 6 00:00:18,840 --> 00:00:25,470 if you leave your notebook in Colab for a long enough time it will eventually disconnect from its runtime. 7 00:00:25,710 --> 00:00:31,340 A runtime is, again, just a computer on Google's servers that powers your notebook. 8 00:00:31,710 --> 00:00:34,680 So this is what you do if you have to take a break. 9 00:00:34,740 --> 00:00:36,740 You should take a break every so often. 10 00:00:36,780 --> 00:00:39,270 So you're not spending too much time sitting down, go for a walk. 11 00:00:39,270 --> 00:00:40,400 Nature is beautiful, right? 12 00:00:40,850 --> 00:00:41,970 Well, that's what I just did. 13 00:00:41,970 --> 00:00:47,860 I just took my dog for a walk. So if you do get this little pop-up, you're going to have to reconnect. 14 00:00:47,860 --> 00:00:51,310 But you'll see up here the runtime is reconnecting. 15 00:00:51,310 --> 00:00:57,440 So it's initializing, mounting Google Drive, which is where our files are. Maybe it takes a little while 16 00:00:57,440 --> 00:00:58,820 to appear. 17 00:00:58,880 --> 00:00:59,480 There we go. 18 00:00:59,480 --> 00:01:00,970 Drive. 19 00:01:00,980 --> 00:01:03,030 We have Google Drive back. 20 00:01:03,050 --> 00:01:06,260 There's Dog Vision, our beautiful little project folder. 21 00:01:06,350 --> 00:01:06,770 Excellent. 22 00:01:06,770 --> 00:01:08,470 We can close this out, we don't need that. 23 00:01:08,570 --> 00:01:12,800 But what we're going to have to do is... oh, better tick off these, right? 24 00:01:12,800 --> 00:01:14,890 Pay attention to my own notes. 
25 00:01:15,320 --> 00:01:19,850 I'll put a little green tick emoji. We've imported TensorFlow and TensorFlow Hub, making sure we're using a 26 00:01:19,850 --> 00:01:21,990 GPU. Now, 27 00:01:22,010 --> 00:01:28,730 the beautiful thing is the notebook itself will save the fact that we are using a GPU. 28 00:01:28,890 --> 00:01:34,610 So when it reloads we won't have to change the hardware accelerator to GPU; it will automatically load 29 00:01:34,610 --> 00:01:36,050 it as a GPU. 30 00:01:36,560 --> 00:01:41,230 But we do have to rerun any cells that we may have run before. 31 00:01:42,140 --> 00:01:43,300 So there we go. 32 00:01:44,390 --> 00:01:46,430 Check the GPU, import TensorFlow. 33 00:01:46,460 --> 00:01:52,540 Make sure we're using TensorFlow 2.x, and I'll delete this one. 34 00:01:52,600 --> 00:01:54,040 Wonderful. 35 00:01:54,040 --> 00:01:54,640 Okay. 36 00:01:54,760 --> 00:02:00,430 So now we've got our workspace ready, and you might be thinking, Daniel, that took a fairly long time to 37 00:02:00,430 --> 00:02:01,020 get set up. 38 00:02:01,090 --> 00:02:05,620 But as always, a person is only as good as the tool they use. 39 00:02:05,620 --> 00:02:05,890 Right? 40 00:02:05,890 --> 00:02:07,620 Or as the tools they use. 41 00:02:07,660 --> 00:02:13,250 So that's why we spent a bit of time getting our Colab workspace ready, getting our new project folder 42 00:02:13,250 --> 00:02:15,420 together, getting our data there. 43 00:02:15,430 --> 00:02:19,230 So now we go back. What step are we up to? 44 00:02:19,240 --> 00:02:22,030 There should be a zero here for getting our workspace ready. 45 00:02:22,300 --> 00:02:28,030 But now we're up to step 1, which is getting our data ready, a.k.a. turning it into tensors. 46 00:02:28,030 --> 00:02:31,000 Now this is the premise of TensorFlow in itself. 47 00:02:31,000 --> 00:02:32,590 That's why it's called TensorFlow. 48 00:02:32,590 --> 00:02:37,930 The premise is to get your data into tensors, a.k.a. 
remember, all machine learning models like to deal 49 00:02:37,930 --> 00:02:38,910 with numbers. 50 00:02:38,950 --> 00:02:45,610 So we have to get our data into tensors, which is just a fancy word for a matrix, a.k.a. a NumPy array 51 00:02:45,820 --> 00:02:48,490 with some dimensions. But with a tensor, 52 00:02:48,610 --> 00:02:51,730 the key thing here is that it can be run on a GPU. 53 00:02:51,790 --> 00:02:58,330 So if we go back here, TensorFlow is: turn your data into tensors and then it flows through a workflow 54 00:02:58,360 --> 00:02:59,350 like this. 55 00:02:59,560 --> 00:03:01,000 You get it? Tensor flow. 56 00:03:03,860 --> 00:03:07,360 Doing a machine learning course and you get a comedy show for free. 57 00:03:07,360 --> 00:03:08,260 All right. 58 00:03:08,260 --> 00:03:10,590 So step one, we're going to access the data. 59 00:03:10,600 --> 00:03:16,820 So right now it's over here in Drive, in Dog Vision. 60 00:03:16,820 --> 00:03:21,690 So now we have to somehow get it from this folder into this code space. 61 00:03:21,770 --> 00:03:26,270 So let's do that. We'll get out of that and make a little heading. 62 00:03:26,420 --> 00:03:32,100 Actually, we can go with: getting our data ready, 63 00:03:32,570 --> 00:03:41,300 turning it into tensors. Remember, we're communicating with ourselves here. Here we go: with all machine 64 00:03:41,300 --> 00:03:56,430 learning models, our data has to be in numerical format, so that's what we'll be doing first: turning our 65 00:03:56,490 --> 00:03:59,820 images into tensors, 66 00:04:04,140 --> 00:04:04,980 numerical 67 00:04:08,150 --> 00:04:09,410 representations. 68 00:04:13,920 --> 00:04:20,520 And now this is a code cell, and in Colab a special way to transform something into markdown is 69 00:04:20,520 --> 00:04:27,600 to hold Command and double tap M. If you want to see a list of all shortcuts in Colab, hold Command, 70 00:04:27,720 --> 00:04:34,750 M, H. There we go, have a look. And far out, there is a lot there. 
71 00:04:34,780 --> 00:04:40,900 So this takes some practice again, because some of the commands we've been using in Jupyter are slightly 72 00:04:40,900 --> 00:04:41,770 different. 73 00:04:41,830 --> 00:04:44,710 If you do need to look one up, have a look in here. 74 00:04:44,890 --> 00:04:49,930 Otherwise I'll just kind of keep running with it. If I need to add a code cell I can just press this little 75 00:04:49,930 --> 00:04:55,440 plus button here or up here, and the main one we'll be using is Shift and Enter, right? 76 00:04:55,450 --> 00:04:59,800 Just to run our code. Let's look at the labels. What does that look like? 77 00:04:59,800 --> 00:05:00,640 So if we come back, 78 00:05:04,080 --> 00:05:12,600 let's start by accessing our data and checking out the labels. 79 00:05:13,850 --> 00:05:16,170 So to do that, it's in a CSV format. 80 00:05:16,230 --> 00:05:21,780 So if we want to check it out we might have to, or we will have to, import pandas because it's a lot easier 81 00:05:21,780 --> 00:05:28,770 to look at things that are in CSV format using pandas. So, check out the labels of our data. 82 00:05:28,770 --> 00:05:35,530 We're going to import pandas as pd, our favorite. labels_csv equals... 83 00:05:36,510 --> 00:05:42,840 So what we're going to do is just pass in... maybe we could set up a little path, but we'll type 84 00:05:42,840 --> 00:05:44,110 it out to begin with. 85 00:05:44,430 --> 00:05:50,210 We could just copy this, Copy path, but we're going to practice. 86 00:05:50,230 --> 00:05:51,140 What would that look like? 87 00:05:51,160 --> 00:05:51,820 Something like that. 88 00:05:51,820 --> 00:05:52,990 Let's practice typing it out. 89 00:05:52,990 --> 00:06:01,300 Remember, you don't need content: drive/My Drive/Dog Vision. 90 00:06:02,080 --> 00:06:04,010 Oh yeah. 91 00:06:04,180 --> 00:06:05,560 This is what projects need, right? 92 00:06:05,590 --> 00:06:09,500 They need cool names. labels.csv. 93 00:06:09,690 --> 00:06:10,060 All right. 
94 00:06:10,300 --> 00:06:12,520 And we're going to run that. 95 00:06:13,320 --> 00:06:23,610 And then what we might check out is, we'll get labels_csv, we want to describe it. Describe... would help 96 00:06:23,610 --> 00:06:31,970 if I could type. I'd be full time lucky. And we go and print. We'll also get the head. 97 00:06:32,040 --> 00:06:37,690 So the first five rows. Let's check it out. Again, let's make this more space. 98 00:06:37,820 --> 00:06:39,500 Wonderful. 99 00:06:39,500 --> 00:06:45,090 What do we have here? Count: 10,222 for ID. 100 00:06:45,170 --> 00:06:48,990 Breed: 10,222. Unique values. 101 00:06:49,010 --> 00:06:49,470 So. 102 00:06:49,540 --> 00:06:50,290 OK. 103 00:06:50,450 --> 00:06:53,530 So there's 10,222 unique IDs. 104 00:06:53,870 --> 00:07:00,010 And 120 unique breeds, and then this is what the ID column looks like. 105 00:07:01,280 --> 00:07:03,950 Again, it looks a bit different because we've printed it. 106 00:07:03,950 --> 00:07:07,390 So if we just had labels_csv.head(), 107 00:07:07,390 --> 00:07:12,810 this is what you might be used to looking at, something like that. Now we've got an ID column, 108 00:07:13,110 --> 00:07:17,590 which is the file names, I believe, of our images. 109 00:07:17,610 --> 00:07:23,260 So this is of train, and the breed associated with that ID. 110 00:07:23,290 --> 00:07:30,880 Now a little tidbit here: if you do have a lot of files in one of your folders in Google Drive, it actually 111 00:07:30,880 --> 00:07:33,710 does take a fairly long time to load here. 112 00:07:33,790 --> 00:07:38,060 So rather than wait for that, let's just go back to Kaggle. 113 00:07:38,110 --> 00:07:41,560 Treat this as our way of looking at things. 114 00:07:41,570 --> 00:07:42,200 There we go. 115 00:07:42,380 --> 00:07:49,430 So if we get labels.csv: 10,222 unique values. Yes, that's correct. 
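For readers following along, the label check described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the notebook's exact code: the small in-memory CSV is a made-up stand-in for `labels.csv` with the `id` and `breed` columns mentioned in the video, and in Colab you would instead pass the Drive path typed out above to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for labels.csv: an `id` column (image file name)
# and a `breed` column. These rows are invented for illustration.
sample_csv = io.StringIO(
    "id,breed\n"
    "a001,scottish_deerhound\n"
    "a002,maltese_dog\n"
    "a003,scottish_deerhound\n"
    "a004,entlebucher\n"
)

labels_csv = pd.read_csv(sample_csv)

# describe() on text columns reports count, unique, top and freq
print(labels_csv.describe())

# head() shows the first five rows (here, all four)
print(labels_csv.head())
```

Printing both gives the plain-text view seen in the video; leaving `labels_csv.head()` as the last expression in a notebook cell renders the familiar table instead.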
116 00:07:49,520 --> 00:07:50,370 Unique. 117 00:07:50,640 --> 00:07:53,560 And then train. 118 00:07:53,790 --> 00:08:01,010 We've got unique IDs, so does that look like what we've got in our data? 119 00:08:01,040 --> 00:08:02,210 Yes it does. 120 00:08:02,210 --> 00:08:03,380 Wonderful. 121 00:08:03,380 --> 00:08:03,800 Okay. 122 00:08:03,800 --> 00:08:05,270 Now what we might check out: 123 00:08:05,450 --> 00:08:07,020 we've got 120 different breeds. 124 00:08:07,020 --> 00:08:14,240 So that means that there's 10,000 or so images; hopefully there's about a hundred or so images 125 00:08:14,690 --> 00:08:15,590 per class. 126 00:08:16,340 --> 00:08:17,650 Well, actually, let's just figure that out. 127 00:08:17,660 --> 00:08:19,810 Let's figure out how many images there are 128 00:08:19,910 --> 00:08:25,080 per class first, and then we'll see why that's important. 129 00:08:25,130 --> 00:08:29,420 How many images are there of each breed? 130 00:08:29,540 --> 00:08:36,010 And we could probably do this pretty easily by going labels_csv, breed, so accessing the breed 131 00:08:36,020 --> 00:08:37,900 column, and then I'm going to go 132 00:08:37,900 --> 00:08:38,900 value_counts. 133 00:08:38,900 --> 00:08:44,220 So count the values in this column. Actually, let's just see what that does. 134 00:08:44,220 --> 00:08:49,280 These are the dog breeds: Scottish deerhound, Maltese dog, Entlebucher. 135 00:08:49,320 --> 00:08:50,540 Not even sure what that is. 136 00:08:50,550 --> 00:08:53,280 Shout out if your dog's on here. Asking my dog. 137 00:08:53,500 --> 00:08:55,540 Yeah, it's really cool. OK. 138 00:08:55,790 --> 00:09:03,010 And now let's visualize this. It's probably going to look a bit better if we visualize this. figsize equals 139 00:09:03,090 --> 00:09:03,990 (20, 10). 140 00:09:08,010 --> 00:09:15,310 Whoa, see, the beauty of Colab is that we didn't even have to import matplotlib to see that. 
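The per-breed count above can be sketched like this, again on a small made-up DataFrame rather than the real labels file. `value_counts` is the pandas call named in the video; the plotting line is left as a comment so the sketch runs without a plotting backend, and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Invented sample with the same two columns as the labels file
labels_csv = pd.DataFrame({
    "id": ["a001", "a002", "a003", "a004", "a005", "a006"],
    "breed": [
        "scottish_deerhound", "maltese_dog", "scottish_deerhound",
        "entlebucher", "scottish_deerhound", "maltese_dog",
    ],
})

# Count how many rows (images) each breed has, most common first
breed_counts = labels_csv["breed"].value_counts()
print(breed_counts)

# In a notebook, the same counts can be drawn as a bar chart:
# breed_counts.plot.bar(figsize=(20, 10))
```

Because `value_counts` sorts in descending order by default, the most heavily represented breeds appear first, which is why the bar chart in the video slopes down from left to right.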
141 00:09:15,360 --> 00:09:20,870 So my notebook's a bit zoomed in, but this is pretty cool. If we were to draw a line... 142 00:09:21,180 --> 00:09:25,680 So look, we've got dog breeds here, all 120 different dog breeds. 143 00:09:25,680 --> 00:09:28,830 And then this is the number of images that they have. 144 00:09:28,830 --> 00:09:37,160 So if we were to draw a line somewhere across here we'd say about maybe 75 on average. We can 145 00:09:37,160 --> 00:09:40,440 figure that out pretty easily anyway, so labels_csv... 146 00:09:40,460 --> 00:09:44,390 How do you find the mean of a column? 147 00:09:44,390 --> 00:09:53,930 Actually, we need to count them first, because otherwise they're just strings. Dot mean... average... maybe median, it's 148 00:09:53,930 --> 00:09:55,820 probably a little bit more robust than the mean. 149 00:09:59,580 --> 00:10:00,830 So the middle number: 82. 150 00:10:00,910 --> 00:10:01,130 OK. 151 00:10:01,160 --> 00:10:02,180 So we said 75. 152 00:10:02,210 --> 00:10:03,950 But that is a good thing. 153 00:10:04,190 --> 00:10:11,030 You can imagine, right, if you had, let's say, Scottish deerhound, if we had 500 images of that, but this 154 00:10:11,030 --> 00:10:11,920 bad boy over here, 155 00:10:11,930 --> 00:10:13,940 what is this one? Eskimo dog. 156 00:10:14,230 --> 00:10:20,750 If we only had three images of Eskimo dog, we could imagine our machine learning model might do very 157 00:10:20,750 --> 00:10:25,260 well at figuring out what a Scottish deerhound looks like because it has so many examples. 158 00:10:25,490 --> 00:10:30,440 But as for an Eskimo dog, if there's only three images, it's probably going to be a bit hard to figure 159 00:10:30,440 --> 00:10:36,910 out what it looks like. When you're asking yourself how many images should I have per class, Google recommends 160 00:10:37,030 --> 00:10:45,250 a bare minimum of ten. Google: minimum number of images per class. 161 00:10:48,710 --> 00:10:50,000 Preparing your training data. 
162 00:10:50,000 --> 00:10:50,450 Here we go. 163 00:10:50,450 --> 00:10:56,090 So this is Google's AutoML, which is their service for automatically building machine learning 164 00:10:56,090 --> 00:10:57,820 models. 165 00:10:57,920 --> 00:11:00,560 This is some extracurricular if you want to check this out. 166 00:11:00,560 --> 00:11:05,180 Pretty cool. Preparing your data. 167 00:11:05,180 --> 00:11:07,550 There we go: for model training purposes, 168 00:11:07,550 --> 00:11:12,140 it is recommended you use about 100 annotations per label. 169 00:11:12,140 --> 00:11:19,800 So in our case it would be great if we had 100 examples of each image class, with at least 10. 170 00:11:20,100 --> 00:11:24,320 So one hundred is great, but at least 10 is a good start. 171 00:11:24,710 --> 00:11:29,450 So if you wanted to build your own image classifier, if it's a binary classification you'd want to be 172 00:11:29,450 --> 00:11:32,780 looking at a minimum of 10 images per class. 173 00:11:32,780 --> 00:11:40,670 So our dataset, at about 75 to 85 images per class, is pretty well spread out. 174 00:11:40,670 --> 00:11:41,630 This is a good start. 175 00:11:42,140 --> 00:11:42,780 OK. 176 00:11:42,920 --> 00:11:45,800 So we've done a little bit of exploration of the labels. 177 00:11:45,800 --> 00:11:53,460 Now what we might do is start to get our image paths. So labels are pretty easy to deal with in a CSV, 178 00:11:53,480 --> 00:11:58,370 but all of our images, again we can't see them, are in this format here. 179 00:11:58,460 --> 00:12:01,790 So we need a way to get them into our Colab notebook. 180 00:12:01,790 --> 00:12:04,180 Let's have a look at how to do that in the next video.
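To make the class-balance check concrete, here is a minimal sketch that computes the median number of images per breed and flags any breed below the 10-image minimum or the 100-image recommendation quoted above. The counts are invented for illustration (they are not the dataset's real numbers), and `MIN_PER_CLASS` and `RECOMMENDED_PER_CLASS` are hypothetical names introduced only for this example.

```python
import pandas as pd

# Thresholds as quoted in the video (hypothetical constant names)
MIN_PER_CLASS = 10
RECOMMENDED_PER_CLASS = 100

# Invented per-breed image counts, shaped like the output of value_counts()
breed_counts = pd.Series(
    {"scottish_deerhound": 120, "maltese_dog": 110, "eskimo_dog": 60, "rare_breed": 3}
)

# The median is less sensitive than the mean to a few very large or small classes
median_count = breed_counts.median()

# Breeds with too few images to be a sensible starting point
below_minimum = breed_counts[breed_counts < MIN_PER_CLASS]

# Breeds that would benefit from more images
below_recommended = breed_counts[breed_counts < RECOMMENDED_PER_CLASS]

print(f"Median images per breed: {median_count}")
print(f"Below minimum: {list(below_minimum.index)}")
print(f"Below recommended: {list(below_recommended.index)}")
```

On the real labels file, the same three lines after the thresholds would be applied to `labels_csv["breed"].value_counts()` instead of the invented series.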