0 1 00:00:00,510 --> 00:00:00,930 All right. 1 2 00:00:00,990 --> 00:00:06,660 So now that you know where the original data comes from and how to work with archive files and how to 2 3 00:00:06,660 --> 00:00:14,340 work with files that don't really have a good file extension, let's focus on how I would like you to work 3 4 00:00:14,430 --> 00:00:18,000 with the lesson resources that I've already included. 4 5 00:00:18,000 --> 00:00:23,520 These resources don't just include the spam emails, but also some image files and some fonts, and I've 5 6 00:00:23,550 --> 00:00:28,180 organized everything very, very nicely for you in folders as well. 6 7 00:00:28,320 --> 00:00:30,470 So, you're welcome. 7 8 00:00:30,480 --> 00:00:35,780 Let's include the module's resources in Jupyter. 8 9 00:00:35,920 --> 00:00:36,240 All right. 9 10 00:00:36,270 --> 00:00:45,000 So on Windows, I've got my MLProject folder here in Jupyter and I have to pull up this folder here 10 11 00:00:45,540 --> 00:00:46,830 in Explorer. 11 12 00:00:47,490 --> 00:00:52,960 So this folder is located right here under "Users" and then my username, 12 13 00:00:53,130 --> 00:01:00,900 and if I open this up I can see that the same files that are listed here are here in Windows Explorer. 13 14 00:01:01,900 --> 00:01:04,250 So I've downloaded the lesson resources here. 14 15 00:01:04,330 --> 00:01:09,170 The "SpamData.zip" and if I double-click on it, 15 16 00:01:09,310 --> 00:01:12,100 I open the "SpamData. 16 17 00:01:12,100 --> 00:01:13,180 zip" file. 17 18 00:01:13,300 --> 00:01:15,790 I have not extracted it. 18 19 00:01:15,790 --> 00:01:22,030 What you really need to do is right-click on this thing and click "Extract All". 19 20 00:01:22,430 --> 00:01:23,150 OK. 20 21 00:01:23,200 --> 00:01:25,610 So now this is key. 21 22 00:01:26,050 --> 00:01:33,640 Let's see what happens if I leave the defaults and I just click "Extract".
We'll twiddle our thumbs a bit 22 23 00:01:34,210 --> 00:01:38,200 and wait for this to complete. 23 24 00:01:38,200 --> 00:01:45,430 One thing you'll notice is that this works a lot faster with solid state drives than with regular hard 24 25 00:01:45,430 --> 00:01:47,620 drives. All right, 25 26 00:01:47,640 --> 00:01:50,440 now let's see what we've got. 26 27 00:01:50,440 --> 00:01:53,220 We've got SpamData. 27 28 00:01:53,230 --> 00:02:02,080 This is the extracted folder from here. If I double-click on this, I have a nested folder here, 28 29 00:02:02,080 --> 00:02:09,730 "SpamData", and then I have this one here which we'll ignore or you can delete, but it's this folder here 29 30 00:02:09,730 --> 00:02:10,630 that you need. 30 31 00:02:10,630 --> 00:02:11,190 Right? 31 32 00:02:11,200 --> 00:02:15,740 So this is the one where you've got processing, training and testing. 32 33 00:02:15,820 --> 00:02:16,090 Right. 33 34 00:02:16,120 --> 00:02:25,990 So by default I've got a nested folder structure: "SpamData" is inside another folder called "SpamData". 34 35 00:02:25,990 --> 00:02:33,720 Now, take this one here, the one inside, and bring it over here. 35 36 00:02:33,730 --> 00:02:38,530 This thing here is just an artifact from me having zipped this file on a Mac. 36 37 00:02:38,620 --> 00:02:42,440 So you don't need to worry about this, you can just delete this file. 37 38 00:02:42,460 --> 00:02:44,370 It's not important. 38 39 00:02:44,370 --> 00:02:45,660 Now, here's the key. 39 40 00:02:45,700 --> 00:02:46,110 Right. 40 41 00:02:46,210 --> 00:02:52,470 Back in Jupyter notebook, we see the SpamData folder here come up and clicking on it, 41 42 00:02:52,600 --> 00:02:57,600 you see three folders. If you open this and you see another SpamData folder, 42 43 00:02:57,670 --> 00:03:03,010 then you've got a nested folder, which you don't want and which will cause problems later on.
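If you would rather script the extraction than click through "Extract All", the following is a minimal sketch of one way to do it in Python. It is not the method used in the lesson. The function name `extract_clean` is hypothetical, and the sketch assumes the Mac artifact mentioned above is the `__MACOSX` folder that macOS commonly adds to zip archives. Because `extractall` unpacks straight into the destination, no extra wrapper folder is created.

```python
import zipfile
from pathlib import Path


def extract_clean(zip_path, dest_dir):
    """Extract zip_path into dest_dir, skipping macOS '__MACOSX' entries.

    Returns the sorted names of the top-level items now in dest_dir.
    (Hypothetical helper for illustration only.)
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # Keep every archive member except the macOS metadata folder.
        members = [
            name for name in archive.namelist()
            if not name.startswith("__MACOSX/")
        ]
        archive.extractall(dest, members=members)
    return sorted(p.name for p in dest.iterdir())
```

Called as `extract_clean("SpamData.zip", ".")` from inside the project folder, this should leave a single "SpamData" folder next to the notebook, assuming the archive has "SpamData" as its top-level folder.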
43 44 00:03:03,010 --> 00:03:08,860 So this is the structure that we're really, really going for. The remainder of this module assumes that 44 45 00:03:08,860 --> 00:03:13,250 you've got this set up exactly the same way I do. 45 46 00:03:13,270 --> 00:03:15,220 You've got the same file names. 46 47 00:03:15,280 --> 00:03:16,690 You've got the same folder names. 47 48 00:03:16,780 --> 00:03:21,090 You've got the same folder structure and you've got the same location. 48 49 00:03:21,100 --> 00:03:23,260 This is how we're going to roll. 49 50 00:03:23,260 --> 00:03:31,150 So just double-check that you're mirroring my setup here and that there are no accidental nested directories 50 51 00:03:31,270 --> 00:03:32,400 anywhere. 51 52 00:03:32,450 --> 00:03:37,200 And with that out of the way we're finally good to go. 52 53 00:03:37,270 --> 00:03:38,890 I'll see you in the next lesson. 53 54 00:03:38,920 --> 00:03:39,420 Take care.
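To double-check for accidental nesting from a notebook cell, here is a small sketch, assuming Python 3 and only the standard library. The function name `find_nested_duplicates` is hypothetical. It reports any folder that sits directly inside a folder of the same name, which is the "SpamData" inside "SpamData" situation described in this lesson.

```python
from pathlib import Path


def find_nested_duplicates(root):
    """Return relative paths of folders whose parent has the same name.

    For example, 'SpamData/SpamData' is reported when a 'SpamData'
    folder contains another 'SpamData' folder.
    (Hypothetical helper for illustration only.)
    """
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_dir() and p.parent.name == p.name
    )
```

Running `find_nested_duplicates(".")` from the project folder should return an empty list when the layout matches the one shown in the video. Any path it does return points at an inner folder that needs to be moved up one level.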