1 00:00:00,330 --> 00:00:05,650 In this lesson we're going to poke around a bit more and start exploring our data. 2 00:00:05,850 --> 00:00:08,370 So far we formulated our question. 3 00:00:08,460 --> 00:00:09,570 We know what we want to do. 4 00:00:09,600 --> 00:00:15,780 We set out to classify images according to ten different categories or classes. 5 00:00:15,780 --> 00:00:19,470 We've also gathered data and very very handily. 6 00:00:19,470 --> 00:00:22,950 They've already been split into training and testing data. 7 00:00:22,950 --> 00:00:30,210 Now we're getting to this data cleaning and processing and exploration stage back in our Jupiter notebook. 8 00:00:30,440 --> 00:00:38,110 Let's insert a markdown cell here explore the data and the first thing I want to do is I want to pull 9 00:00:38,110 --> 00:00:39,590 up some of these pictures. 10 00:00:39,670 --> 00:00:50,290 So when I go back up to my inputs and I want input to things I want to see from eye Python dot display 11 00:00:50,740 --> 00:01:05,720 import display and I want to see from Charisse dot pre processing dot image import array to image I 12 00:01:05,810 --> 00:01:12,910 AMG and you know what I'm actually going to input a third thing I'm going to input that plot lib so 13 00:01:12,910 --> 00:01:20,800 I'll add import matter plot lib dot pie plot as P.L. T. 14 00:01:21,010 --> 00:01:22,500 And just below a lot. 15 00:01:22,500 --> 00:01:23,800 Percent match. 16 00:01:23,950 --> 00:01:29,820 Plot lib space inline shift enter. 17 00:01:30,140 --> 00:01:36,760 And now let me come down here and let's throw some of these images up on screen and take a closer look 18 00:01:36,760 --> 00:01:37,060 at them. 19 00:01:37,180 --> 00:01:42,970 So the reason I imported I Python display and plot lib is because I wanted to show you two different 20 00:01:42,970 --> 00:01:47,570 ways that you can display an image in Jupiter notebook. 21 00:01:47,710 --> 00:01:51,660 So the first one is with a python display. 22 00:01:51,880 --> 00:01:58,020 And what I'm going to do is actually look at the very first item in our training dataset. 23 00:01:58,120 --> 00:02:03,010 So X train underscore all square brackets zero. 24 00:02:03,150 --> 00:02:10,230 It's the very first item and shift enter will actually show us what this image looks like at the moment. 25 00:02:10,360 --> 00:02:17,680 And if we wanted to see this whole thing as an actual image then we're going to make use of this array 26 00:02:17,710 --> 00:02:22,840 to image functionality from Chris's pre processing module. 27 00:02:22,870 --> 00:02:31,120 So what we can do is store that image in a variable called pick and that will be equal to array to image 28 00:02:31,390 --> 00:02:36,090 parentheses and then we can pick one of the images from our training dataset. 29 00:02:36,160 --> 00:02:37,480 See we picked the. 30 00:02:37,810 --> 00:02:41,410 I don't know seventh image using a python display. 31 00:02:41,410 --> 00:02:49,600 We can then throw that up on the screen with display parentheses and then supply our picture as an argument. 32 00:02:49,600 --> 00:02:53,660 So this is what the seventh picture will look like now. 33 00:02:53,680 --> 00:03:00,220 I hope you can tell it this is a very tiny picture of a horse and we should be able to verify that this 34 00:03:00,220 --> 00:03:01,670 is indeed a horse. 35 00:03:01,720 --> 00:03:05,770 By looking at the image labels which are found in the y values. 36 00:03:05,890 --> 00:03:10,300 So why honest quatrain on a scroll all don't shape. 37 00:03:10,360 --> 00:03:13,650 Well show us what are y values actually look like. 38 00:03:13,860 --> 00:03:21,790 All right we've got a number higher rate of 50000 labels and it's two dimensional. 39 00:03:21,790 --> 00:03:29,740 So if I wanted to pull up the y value at index 7 then I would write y on as Katrina and a square all 40 00:03:30,160 --> 00:03:35,260 square brackets 7 and then another square brackets 0. 41 00:03:35,500 --> 00:03:40,210 And what we get is the values 7 so it doesn't say horse. 42 00:03:40,210 --> 00:03:42,220 It just says 7. 43 00:03:42,220 --> 00:03:43,690 Now why is that. 44 00:03:43,690 --> 00:03:45,700 Well let's pull up Alex's website again. 45 00:03:46,180 --> 00:03:51,250 So if I look at his little breakdown here I think this is really handy. 46 00:03:51,250 --> 00:03:52,520 These are in order right. 47 00:03:52,570 --> 00:04:02,550 I can see that values 7 should correspond to 0 1 2 3 4 5 6 7 corresponds to a horse. 48 00:04:02,560 --> 00:04:07,450 So what we've got here are the names for these different labels. 49 00:04:07,570 --> 00:04:14,490 Label number 0 will be our plain label number 7 will be horse label number 9 will be truck. 50 00:04:14,530 --> 00:04:17,430 Now it's no good just having this on this website. 51 00:04:17,470 --> 00:04:20,850 I think what we should do is actually add these to our notebook. 52 00:04:21,010 --> 00:04:31,300 So up here I'm going to add another markdown cell and I'm going to add some constants here so constants. 53 00:04:31,810 --> 00:04:38,770 And among those constants will be this list of names because then we can easily refer to it later on. 54 00:04:38,770 --> 00:04:43,260 So I'm just type label underscore names. 55 00:04:43,600 --> 00:04:51,200 So here will create a list that contains all the label names in order plane car bird cat Dear Dog frog. 56 00:04:51,250 --> 00:04:56,670 Horse ship and truck shift enter on that. 57 00:04:57,350 --> 00:05:04,610 We can scroll back down and what we should be able to do is hit up our label names list open some square 58 00:05:04,610 --> 00:05:12,940 brackets and we can put the label at the index number seven in here and I'll paste that right in it 59 00:05:12,950 --> 00:05:15,970 shift into Astra get horse. 60 00:05:16,120 --> 00:05:16,450 OK. 61 00:05:16,460 --> 00:05:16,930 Brilliant. 62 00:05:17,210 --> 00:05:18,460 So that makes sense to me. 63 00:05:18,710 --> 00:05:24,730 But I think we should look at another image just to double check it works and look at this image. 64 00:05:24,770 --> 00:05:30,890 Let's use matte plot lib so we can actually use matte plot lib and through the image onto a plot because 65 00:05:30,890 --> 00:05:35,510 this image is actually just an array of pixels matte plot lib knows how to handle this. 66 00:05:35,720 --> 00:05:45,860 So with Piti dot I am show parentheses x underscore a train underscore all and then say square brackets 67 00:05:46,520 --> 00:05:54,940 and then let's pick the fifth image at index for close the parentheses and then on the next line will 68 00:05:54,940 --> 00:05:58,920 at Peel T dot show and hit shift enter. 69 00:05:59,210 --> 00:06:01,890 Let's throw this image up on screen. 70 00:06:02,030 --> 00:06:07,500 So what we've got here is a chart every five pixels has got a tick mark. 71 00:06:07,760 --> 00:06:13,740 And for our convenience this is a slightly enlarged but it's definitely a car right. 72 00:06:13,790 --> 00:06:16,580 We can verify this by checking our list of label names right. 73 00:06:16,820 --> 00:06:24,500 So with Peel T dot X label we can look towards our list of labels and then insert the label on the x 74 00:06:24,500 --> 00:06:25,590 axis. 75 00:06:25,640 --> 00:06:35,020 So peel teed up X label parentheses label names square brackets and then we'll use the same technique 76 00:06:35,230 --> 00:06:35,890 as above. 77 00:06:35,890 --> 00:06:36,510 Right. 78 00:06:36,610 --> 00:06:43,180 We're going to look to our y underscore train underscore all umpiring could put some square brackets 79 00:06:43,180 --> 00:06:46,010 there when you use the same index in x4. 80 00:06:46,270 --> 00:06:51,730 And we're going to use another pair square brackets and then zero because we're quite a two dimensional 81 00:06:52,000 --> 00:06:53,850 num pi array. 82 00:06:54,130 --> 00:07:00,640 So I know it's a lot of nesting here but we're pulling out a value from our y unescorted train underscore 83 00:07:00,640 --> 00:07:08,920 all and we're using that as the index in our list of label names and we're gonna display that on our 84 00:07:08,920 --> 00:07:10,350 x axis. 85 00:07:10,690 --> 00:07:15,730 And if this works we should see the word card written here. 86 00:07:16,450 --> 00:07:20,450 If this is too small can of course enlarge the font size. 87 00:07:20,590 --> 00:07:28,370 So an extra argument here and font size 15 will make this much much more legible. 88 00:07:28,420 --> 00:07:30,250 So this is pretty cool right. 89 00:07:30,250 --> 00:07:39,160 We've got our images that are stored as Num higher arrays in our data set and we've got two ways of 90 00:07:39,280 --> 00:07:42,800 displaying them in our Jupiter notebook. 91 00:07:42,910 --> 00:07:46,430 The first way is using a python display. 92 00:07:46,990 --> 00:07:51,590 And the second way is actually throwing them up onto a mat plot lib chart. 93 00:07:51,610 --> 00:07:57,490 Now I quite like Matt put lib actually one of the reasons is that's very easy to display these images 94 00:07:57,790 --> 00:08:02,550 side by side and just take a look at a few more from this dataset. 95 00:08:02,560 --> 00:08:06,190 So I might ask you to solve this as a challenge. 96 00:08:06,190 --> 00:08:13,910 Can you write a for loop to display the first 10 images from the X underscore train underscore all the 97 00:08:13,910 --> 00:08:15,310 data set. 98 00:08:15,340 --> 00:08:22,420 Remember this is a number primary and I'd like you to display all these images in a row and I'd like 99 00:08:22,420 --> 00:08:24,750 you to show the label below. 100 00:08:24,760 --> 00:08:31,450 Each image and if you can format the images in such a way that we're not seeing any of these tick marks 101 00:08:31,450 --> 00:08:34,310 here like we are in the cell above. 102 00:08:34,390 --> 00:08:37,900 I'll give you a few seconds to pause the video and give this a go 103 00:08:41,620 --> 00:08:42,340 ready. 104 00:08:42,340 --> 00:08:44,310 Here's the solution. 105 00:08:44,380 --> 00:08:49,510 The first thing I'm going to do is an asset the size of my chart or of my figure. 106 00:08:49,870 --> 00:09:00,810 So PDT don't figure parentheses fixed size equals parentheses 15 by 5. 107 00:09:00,820 --> 00:09:05,900 Well size my chart nicely and then I'm going to write this for loop. 108 00:09:05,950 --> 00:09:09,850 For i in range 10. 109 00:09:09,880 --> 00:09:16,490 This for loop will go through the numbers 0 through 9 and I can show you this if I print I here. 110 00:09:17,500 --> 00:09:24,270 But inside my loop what I want to do is actually create some subplots using my plot lib. 111 00:09:24,640 --> 00:09:35,140 This is what the challenge was about so I'll say BLT dot subplot and then my subplot will require three 112 00:09:35,140 --> 00:09:36,530 inputs right. 113 00:09:36,730 --> 00:09:46,180 One how many rows 10 how many columns and then because we're in a for loop we're going to have to figure 114 00:09:46,180 --> 00:09:49,650 out which subplot we want to create at each iteration. 115 00:09:49,660 --> 00:09:56,350 So in this case it's going to be I plus one because the first plot should be number one and the last 116 00:09:56,350 --> 00:10:00,440 subplot should be number 10 but I starts from zero. 117 00:10:00,460 --> 00:10:01,840 It goes up until 9. 118 00:10:01,960 --> 00:10:13,280 So this is why if I plus one and then we'll use PDT dot I am show parentheses x underscore train underscore 119 00:10:13,360 --> 00:10:16,440 all square brackets. 120 00:10:16,440 --> 00:10:27,100 I let's see what this gives us what we see here is 10 images in a row in a figure that assigns 15 by 121 00:10:27,160 --> 00:10:29,660 fives and number seven here. 122 00:10:29,680 --> 00:10:36,330 So of course our horse and the image at index number four is our call to remove those tick marks. 123 00:10:36,340 --> 00:10:47,830 We can do that with peeled T dot y ticks parentheses square brackets and peel T dot X takes parentheses 124 00:10:48,040 --> 00:10:51,320 square brackets so square brackets are empty. 125 00:10:51,580 --> 00:10:55,240 So this means that my tick marks will be blank. 126 00:10:55,240 --> 00:10:59,860 Now all I have to do is insert those labels below the images. 127 00:10:59,860 --> 00:11:05,830 So I want to do that with peeled T dot X label and then want to use the very same technique I've got 128 00:11:05,830 --> 00:11:06,510 above. 129 00:11:06,700 --> 00:11:14,450 I want to take my list of label names and in the square brackets I'm going to look towards my dataset 130 00:11:14,860 --> 00:11:23,500 of training labels y on a squat train underscore all square brackets and again I for the index and square 131 00:11:23,500 --> 00:11:27,460 brackets zero for that second dimension. 132 00:11:27,460 --> 00:11:30,120 Let me hit shift enter on this. 133 00:11:30,640 --> 00:11:31,810 And there we go. 134 00:11:31,810 --> 00:11:36,910 There are our tiny little labels underneath our tiny little images. 135 00:11:37,030 --> 00:11:39,530 And because that's very hard to read. 136 00:11:39,760 --> 00:11:44,470 I'll increase the font size once again to say maybe 14. 137 00:11:44,500 --> 00:11:47,050 Try that. 138 00:11:47,050 --> 00:11:47,500 There we go. 139 00:11:47,500 --> 00:11:49,450 That's a lot more legible. 140 00:11:49,450 --> 00:11:50,570 Brilliant. 141 00:11:50,590 --> 00:11:58,910 So what we see here is that our list of a label names matches up nicely with the numbers in the y underscore 142 00:11:58,910 --> 00:12:03,850 train underscore all of an umpire Ray seems to be indeed a bird. 143 00:12:03,910 --> 00:12:08,190 And this seems to be a ship and we've got a cat and a deer. 144 00:12:08,590 --> 00:12:13,220 I think this sort of evil looking red eyed frog is a definitely a keeper. 145 00:12:13,850 --> 00:12:20,800 But yeah what we're seeing here are indeed the first 10 images in our training dataset. 146 00:12:20,950 --> 00:12:28,770 So the next thing to understand about this data set is how these images are formatted so x underscore 147 00:12:28,780 --> 00:12:32,450 train underscore all square brackets zero. 148 00:12:32,560 --> 00:12:43,720 That frog is formatted like so but if we look at the shape of it we see that it is 32 by 32 by 3. 149 00:12:43,750 --> 00:12:56,040 Now this is interesting because 32 and 32 are the width and the height and three is the number of channels. 150 00:12:56,050 --> 00:12:59,770 So when I say channels I'm talking about colors. 151 00:12:59,770 --> 00:13:01,180 We've got a red channel. 152 00:13:01,210 --> 00:13:04,220 We've got a green channel and we've got a blue channel. 153 00:13:04,570 --> 00:13:08,440 So this is gonna be in our GP format. 154 00:13:08,440 --> 00:13:13,390 If we look at our number higher rate as a whole so X and it's Katrina on the square all that shape we 155 00:13:13,390 --> 00:13:16,330 get to see that this thing has actually four dimensions. 156 00:13:16,330 --> 00:13:16,960 Right. 157 00:13:16,990 --> 00:13:26,800 So we've got 50000 entries and each of these entries is 32 pixels by 32 pixels by three. 158 00:13:26,800 --> 00:13:28,150 Let's make this more explicit. 159 00:13:28,150 --> 00:13:28,450 Right. 160 00:13:28,480 --> 00:13:32,660 So let's see number of images comma X come out. 161 00:13:32,710 --> 00:13:33,620 Why comma. 162 00:13:33,670 --> 00:13:39,550 C is equal to X and a quatrain underscore all that shape. 163 00:13:39,550 --> 00:13:46,630 If we do this then we can add a nice little print statement here below with an F string so print open 164 00:13:46,630 --> 00:13:56,780 parentheses f single quotes and then we can write images is equal to curly braces. 165 00:13:56,970 --> 00:14:01,230 Number of images no underscore images. 166 00:14:01,310 --> 00:14:09,450 And then we can insert some formatting so backslash T for a tab and then a pipe maybe and then we'll 167 00:14:09,450 --> 00:14:15,280 see width is equal to curly braces. 168 00:14:15,390 --> 00:14:24,860 X backslash t pipe height is equal to curly braces. 169 00:14:24,980 --> 00:14:33,510 Y backslash t channels is equal to curly braces. 170 00:14:33,730 --> 00:14:40,480 See me at that pipe symbol here and hit shift center. 171 00:14:40,540 --> 00:14:43,600 I think this is a really nice thing to have in a notebook. 172 00:14:43,600 --> 00:14:49,720 If you're going to take a break and come back to it later on to be reminded of the structure of the 173 00:14:49,720 --> 00:14:55,000 data all in all we've got 50000 training images. 174 00:14:55,000 --> 00:14:58,750 We've got a width in pixels of 32. 175 00:14:58,750 --> 00:15:00,850 So this is our x value. 176 00:15:00,850 --> 00:15:04,240 We've got a height in pixels of 32. 177 00:15:04,240 --> 00:15:09,310 This is our y value and we've got three color channels per pixel. 178 00:15:09,340 --> 00:15:16,550 So we've got three values per pixel now considering we've got 50000 training images. 179 00:15:16,560 --> 00:15:23,840 How many testing images do we have on those for test that shape and shift into shows us that we've got 180 00:15:24,150 --> 00:15:33,590 10000 testing images now that we know what our dataset consists of and how our information is formatted. 181 00:15:33,650 --> 00:15:38,360 It's time to pre process our data for our neural network. 182 00:15:38,360 --> 00:15:44,420 Considering that we loaded this data directly from cameras there's actually not much need for cleaning 183 00:15:44,870 --> 00:15:50,210 but we will do some pre processing for our neural network to make our lives a little easier. 184 00:15:50,560 --> 00:15:54,730 And we're going to cover all that and more in the next lesson. 185 00:15:54,740 --> 00:15:55,520 I'll see you there.