1
00:00:00,570 --> 00:00:07,690
In this lesson we're going to set up our notebook and get the data in your projects folder.

2
00:00:07,860 --> 00:00:13,140
Open a new Python 3 notebook and name this notebook.

3
00:00:13,140 --> 00:00:20,750
Eleven neural networks hyphen TAF handwriting recognition.

4
00:00:21,330 --> 00:00:23,250
Then insert some markdown cells.

5
00:00:23,310 --> 00:00:32,730
The first one I'll call imports and the second one I'll call Constance and the third one I'm going to

6
00:00:32,730 --> 00:00:37,650
call get the data in terms of the inputs.

7
00:00:37,650 --> 00:00:41,310
Once again we're gonna be working with a little bit of randomization.

8
00:00:41,440 --> 00:00:48,720
So if you want to have the same starting point as me you can enter four lines of code at the very top

9
00:00:49,110 --> 00:01:03,720
from num pint up random import seed and set the seed here to 8 8 8 and then from tensor flow import

10
00:01:04,140 --> 00:01:12,810
set random underscore seed then we're going to call set random seed and supply another lucky number

11
00:01:13,030 --> 00:01:14,160
as before.

12
00:01:14,510 --> 00:01:16,460
Try 4 0 4.

13
00:01:16,500 --> 00:01:17,670
All right.

14
00:01:17,670 --> 00:01:19,320
These are the inputs that we need.

15
00:01:19,410 --> 00:01:23,800
Now you're gonna get an error if you haven't installed tensor flow locally.

16
00:01:24,030 --> 00:01:27,340
So check the previous tutorial with these set of instructions for them.

17
00:01:27,880 --> 00:01:33,510
And otherwise we're gonna import our same old friends as before the operating system.

18
00:01:33,510 --> 00:01:39,110
Import os num py important num pi as P.

19
00:01:39,660 --> 00:01:41,400
And of course we're gonna need tensor flow right.

20
00:01:41,430 --> 00:01:47,500
So import tensor flow as T F.

21
00:01:48,380 --> 00:01:54,870
The next thing I'd like you to ask you to do is could the course resources page and download the data

22
00:01:54,870 --> 00:01:57,240
set for this module.

23
00:01:57,240 --> 00:02:03,780
Once your download has completed you should find this zip file here in your Downloads folder and honest

24
00:02:04,050 --> 00:02:13,200
on a score data dump zip extract the contents of the zip file and you'll find a folder with five items

25
00:02:13,200 --> 00:02:20,970
in it that data for testing are features for training our labels for testing our labels for training

26
00:02:21,390 --> 00:02:24,020
and a little test image here.

27
00:02:24,030 --> 00:02:26,440
Number two tiny number two.

28
00:02:26,460 --> 00:02:32,910
What I'd like you to do is to add these files to your projects folder in our projects folder.

29
00:02:32,910 --> 00:02:39,750
Would it create a new folder here and we're going to rename this folder here to read just amnesty in

30
00:02:39,750 --> 00:02:41,670
all caps.

31
00:02:42,060 --> 00:02:49,710
Once we've created this folder we can go inside of it and we're going to see upload and when I select

32
00:02:50,370 --> 00:02:57,320
all the CSP files and the little test on underscore score image stop PMG so click open.

33
00:02:58,140 --> 00:03:04,710
And at this point it might give us a file size warning because it's a large amount of data and it's

34
00:03:04,710 --> 00:03:06,810
all stored in C as V files.

35
00:03:06,810 --> 00:03:15,080
We're just going to confirm that we want to upload it now we'll just hit upload upload upload upload.

36
00:03:15,370 --> 00:03:17,380
This one's taking a little bit longer.

37
00:03:17,620 --> 00:03:23,560
Just upload this thing this time it's giving me a percentage progress which is nice by the way.

38
00:03:23,560 --> 00:03:25,000
You don't have to go through this.

39
00:03:25,030 --> 00:03:28,120
Do we hear from Jupiter to do the upload.

40
00:03:28,120 --> 00:03:34,070
It might actually just be easier to drag and drop this into the amnesty folder on your hard drive.

41
00:03:34,100 --> 00:03:40,840
Just take all of these and you can copy them into the endless folder that you created under your project

42
00:03:40,840 --> 00:03:41,780
folder.

43
00:03:41,800 --> 00:03:47,830
Now that we've successfully added all the data to the project folder we can go back into our Jupiter

44
00:03:47,830 --> 00:03:54,490
notebook and we can take the data that's currently in the C as V's and create a name pie array from

45
00:03:54,490 --> 00:03:55,720
them.

46
00:03:55,720 --> 00:04:03,640
The first thing I'll do is I'll create some constants for the relative paths to these files so see X

47
00:04:03,820 --> 00:04:14,710
on a school train on the score path is equal to single quotes and this forward slash digit underscore

48
00:04:15,350 --> 00:04:19,540
X train dot c s Ft.

49
00:04:19,780 --> 00:04:24,390
And this is the folder in the same directory as my notebook and then inside amnesty.

50
00:04:24,460 --> 00:04:32,370
I've got the digit on a school train don't CSP file so I'll just take this line.

51
00:04:32,590 --> 00:04:40,870
I'll copy it a couple more times and just change the names of the constants and the relative paths as

52
00:04:40,870 --> 00:04:42,140
you see here.

53
00:04:42,280 --> 00:04:49,190
So we've got X underscored test on the score path y underscore train underscore path and so on.

54
00:04:49,240 --> 00:04:54,870
The important thing is that the relative paths and the file names match up exactly.

55
00:04:54,970 --> 00:04:58,390
If you've got a typo anywhere here then you're going to have problems down the road.

56
00:04:58,420 --> 00:05:05,510
So just double check that this deed matches up now that I've got my path all set going to head shift

57
00:05:05,510 --> 00:05:11,720
and turn on the cell and then down here in my next subsection where it says get to data.

58
00:05:12,390 --> 00:05:19,440
I'm going to load all of these using num pi so check it out I'll add a little bit of micro benchmarking

59
00:05:19,440 --> 00:05:20,200
code.

60
00:05:20,220 --> 00:05:25,150
Percent percent time and I'll load my labels first.

61
00:05:25,150 --> 00:05:31,740
So why on a school train underscore all shall hold on to all my training labels and I'm going to get

62
00:05:31,740 --> 00:05:43,310
hold of all of these using num PIs load ti 60 functions so end p dot load t t y on a school train on

63
00:05:43,310 --> 00:05:44,220
a score path.

64
00:05:44,670 --> 00:05:51,690
So my T C function knows where to look for this year's v file CSB of course stands for comma separated

65
00:05:51,690 --> 00:05:59,010
values so that the limiter in this case is going to be a comma and so provide that as well the limiter

66
00:05:59,040 --> 00:06:06,510
equals single quotes comma and finally all my training and all my testing data is actually made up of

67
00:06:06,540 --> 00:06:10,810
integers so we'll see this in a minute but I want to add another argument here.

68
00:06:10,900 --> 00:06:20,640
D type is equal to int for integer so let's load this and as you can see the training data loads very

69
00:06:20,640 --> 00:06:29,310
very quickly so why on the school train and a score all that shape is a flat num pi array with sixty

70
00:06:29,310 --> 00:06:31,070
thousand values.

71
00:06:31,140 --> 00:06:34,590
Now let's do the same for y on a score test.

72
00:06:34,590 --> 00:06:38,460
So our testing data x plus tackle our features.

73
00:06:38,460 --> 00:06:39,180
This is the big one.

74
00:06:39,180 --> 00:06:40,050
Right.

75
00:06:40,050 --> 00:06:41,990
X on the score train on the score.

76
00:06:42,030 --> 00:06:50,900
All it's gonna be equal to end p dot load t x t parentheses x and it's got train on the score path the

77
00:06:50,890 --> 00:06:58,950
limiter Como t type equals int and I'll tell you what I mean and a little bit of micro benchmarking

78
00:06:58,950 --> 00:07:00,690
code him once again.

79
00:07:00,690 --> 00:07:07,080
Percent percent time said you can see that this one is the big file and it's gonna take a little longer

80
00:07:07,080 --> 00:07:13,860
to load slide shift enter on the cell hand while it's doing that I'm already gonna get started on X

81
00:07:13,920 --> 00:07:21,590
on a score test that computer's still working so huge file I've actually given you him.

82
00:07:21,600 --> 00:07:22,050
There we go.

83
00:07:22,050 --> 00:07:30,570
It took a full minute on my machine to load in comparison our testing data should be a lot quicker and

84
00:07:30,570 --> 00:07:33,830
indeed it only takes about 10 seconds.

85
00:07:33,890 --> 00:07:34,270
Alright.

86
00:07:34,290 --> 00:07:41,610
So now we've loaded all our data into num pi arrays but we haven't really taken a look at it yet.

87
00:07:41,730 --> 00:07:43,450
We don't really know what we're dealing with.

88
00:07:43,590 --> 00:07:46,920
So let's explore the data in the next lesson.