In this lesson we're going to set up our notebook and get the data into your project folder. Open a new Python 3 notebook and name it "11 Neural Networks - TF Handwriting Recognition". Then insert some markdown cells: I'll call the first one Imports, the second one Constants, and the third one Get the Data. In terms of the imports, once again we're going to be working with a little bit of randomization, so if you want to have the same starting point as me, you can enter four lines of code at the very top: from numpy.random import seed, then seed(888), then from tensorflow import set_random_seed, and finally set_random_seed(404), another lucky number, as before. All right, these are the imports that we need. Now, you're going to get an error if you haven't installed TensorFlow locally, so check the previous tutorial for the installation instructions. Otherwise, we're going to import the same old friends as before: the operating system module with import os, NumPy with import numpy as np, and of course we're going to need TensorFlow, so import tensorflow as tf.
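A minimal sketch of the imports cell, assuming a current NumPy and either TensorFlow 1.x or 2.x. The from tensorflow import set_random_seed line used in the lesson only works in TensorFlow 1.x; in TensorFlow 2.x the equivalent is tf.random.set_seed, so this sketch picks whichever one is available. The seeds 888 and 404 are the lesson's own values.

```python
import os  # used later for the relative file paths
import numpy as np
from numpy.random import seed

# Seed NumPy's global random number generator with the lesson's value.
seed(888)

try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow isn't installed; see the installation instructions

if tf is not None:
    # TensorFlow 2.x exposes tf.random.set_seed; TensorFlow 1.x uses tf.set_random_seed.
    if hasattr(tf, 'random') and hasattr(tf.random, 'set_seed'):
        tf.random.set_seed(404)
    else:
        tf.set_random_seed(404)
```

Re-running seed(888) before drawing random numbers gives the same sequence each time, which is what makes your starting point match the lesson's.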
The next thing I'd like you to do is go to the course resources page and download the dataset for this module. Once your download has completed, you should find the zip file, MNIST_data_dump.zip, in your Downloads folder. Extract the contents of the zip file and you'll find a folder with five items in it: the features for testing, the features for training, the labels for testing, the labels for training, and a little test image, a tiny number two. What I'd like you to do is add these files to your project folder. In our project folder, we'll create a new folder and rename it to just MNIST, in all caps. Once we've created this folder, we can go inside of it, click Upload, select all the CSV files and the little PNG test image, and click Open. At this point Jupyter might give us a file size warning, because it's a large amount of data and it's all stored in CSV files. We're just going to confirm that we want to upload it and hit Upload for each file. This one's taking a little bit longer; this time it's giving me a percentage progress, which is nice. By the way, you don't have to go through Jupyter to do the upload.
It might actually be easier to drag and drop the files into the MNIST folder on your hard drive: just take all of them and copy them into the MNIST folder that you created under your project folder. Now that we've successfully added all the data to the project folder, we can go back into our Jupyter notebook, take the data that's currently in the CSVs, and create NumPy arrays from it. The first thing I'll do is create some constants for the relative paths to these files, so X_TRAIN_PATH is equal to, in single quotes, 'MNIST/digit_xtrain.csv'. MNIST is the folder in the same directory as my notebook, and inside MNIST I've got the digit_xtrain.csv file. So I'll take this line, copy it a couple more times, and just change the names of the constants and the relative paths as you see here: we've got X_TEST_PATH, Y_TRAIN_PATH and so on. The important thing is that the relative paths and the file names match up exactly. If you've got a typo anywhere here, you're going to have problems down the road, so just double-check that everything matches up. Now that I've got my paths all set, I'm going to hit Shift+Enter on the cell and then move down to my next subsection, where it says Get the Data.
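Because a single typo in these constants only shows up later as a load error, it can help to check the paths right away. This is a minimal sketch: only digit_xtrain.csv is named explicitly in the lesson, so the other three file names are assumptions that follow the same pattern (compare them against your extracted zip), and the missing_files helper is hypothetical.

```python
import os

# Relative paths, assuming the notebook sits next to the MNIST folder.
# ASSUMPTION: only digit_xtrain.csv is named in the lesson; the other
# three names follow the same pattern and may differ in your download.
X_TRAIN_PATH = 'MNIST/digit_xtrain.csv'
X_TEST_PATH = 'MNIST/digit_xtest.csv'
Y_TRAIN_PATH = 'MNIST/digit_ytrain.csv'
Y_TEST_PATH = 'MNIST/digit_ytest.csv'

DATA_PATHS = [X_TRAIN_PATH, X_TEST_PATH, Y_TRAIN_PATH, Y_TEST_PATH]


def missing_files(paths, base_dir='.'):
    """Return the paths (relative to base_dir) that don't point at an existing file."""
    return [p for p in paths if not os.path.isfile(os.path.join(base_dir, p))]
```

Running missing_files(DATA_PATHS) from the notebook's directory should return an empty list; anything it does return is a path worth double-checking.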
I'm going to load all of these using NumPy. Check it out: I'll add a little bit of micro-benchmarking code, %%time, and I'll load my labels first. y_train_all will hold on to all my training labels, and I'm going to get hold of them using NumPy's loadtxt() function, so np.loadtxt(Y_TRAIN_PATH), so that loadtxt() knows where to look for this CSV file. CSV, of course, stands for comma-separated values, so the delimiter in this case is going to be a comma, and we'll provide that as well: delimiter=','. And finally, all my training and all my testing data is actually made up of integers. We'll see this in a minute, but I want to add another argument here: dtype=int, for integer. So let's load this, and as you can see, the training labels load very, very quickly: y_train_all.shape shows a flat NumPy array with 60,000 values. Now let's do the same for y_test. Next, let's tackle our features. This is the big one.
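Outside a notebook the %%time cell magic isn't available, so here is a rough plain-Python equivalent that wraps the same np.loadtxt call from the lesson and times it with time.perf_counter. It's a minimal sketch; the helper name load_csv_ints is hypothetical.

```python
import time
import numpy as np


def load_csv_ints(path):
    """Load a comma-separated file of integers into a NumPy array and report the wall time."""
    start = time.perf_counter()
    data = np.loadtxt(path, delimiter=',', dtype=int)
    elapsed = time.perf_counter() - start
    print(f'Loaded {path}: shape {data.shape} in {elapsed:.2f} s')
    return data
```

In the notebook you'd call it as y_train_all = load_csv_ints(Y_TRAIN_PATH). By default np.loadtxt squeezes away length-1 dimensions, so a file with one label per line comes back as a flat 1-D array, which is why the training labels show up as 60,000 values in a single dimension.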
x_train_all is going to be equal to np.loadtxt(X_TRAIN_PATH, delimiter=',', dtype=int), and I'll add a little bit of micro-benchmarking code here once again, %%time, so you can see that this one is the big file and it's going to take a little longer to load. I'll hit Shift+Enter on the cell, and while it's doing that, I'm already going to get started on x_test. The computer's still working; it's a huge file I've given you here. There we go: it took a full minute on my machine to load. In comparison, our testing data should be a lot quicker, and indeed it only takes about 10 seconds. All right, so now we've loaded all our data into NumPy arrays, but we haven't really taken a look at it yet. We don't really know what we're dealing with, so let's explore the data in the next lesson.
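Since the features and the labels come from separate CSV files, one quick sanity check once everything is loaded is that there's exactly one label per feature row. This is a minimal sketch that only compares array shapes; the helper check_alignment is hypothetical, and in the notebook you'd call it as check_alignment(x_train_all, y_train_all) and check_alignment(x_test, y_test).

```python
import numpy as np


def check_alignment(features, labels):
    """Return (rows, columns) of the features, or raise ValueError if they don't pair up with the labels."""
    if features.ndim != 2:
        raise ValueError(f'expected a 2-D feature array, got shape {features.shape}')
    if labels.ndim != 1:
        raise ValueError(f'expected a 1-D label array, got shape {labels.shape}')
    if features.shape[0] != labels.shape[0]:
        raise ValueError(f'{features.shape[0]} feature rows but {labels.shape[0]} labels')
    return features.shape[0], features.shape[1]
```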