1 00:00:00,240 --> 00:00:02,080 Now we're almost done. 2 00:00:02,100 --> 00:00:09,510 One of the things that we can do at the very end of this particular notebook is prepare our test data. 3 00:00:09,540 --> 00:00:16,650 We've written this nice little function up above called make_full_matrix, so why don't we call it and 4 00:00:16,650 --> 00:00:22,130 run our test data through it and then save that as a file for later use. 5 00:00:22,140 --> 00:00:29,730 So I'll add a markdown cell here that's going to read "Prepare test data". Then I'm going to come up here 6 00:00:29,840 --> 00:00:33,890 to my constants and I'll add two more file paths. 7 00:00:34,050 --> 00:00:36,280 This first one, I'm going to name it 8 00:00:36,420 --> 00:00:43,890 TEST_FEATURE_MATRIX, and we're going to store it in our testing directory, 9 00:00:44,280 --> 00:00:50,250 and we'll just call this text file test-features. 10 00:00:50,410 --> 00:00:59,860 The second constant we'll call TEST_TARGET_FILE, and we'll call the file test- 11 00:01:00,610 --> 00:01:04,170 target. Having set up the file paths, 12 00:01:04,170 --> 00:01:06,750 I'd like to pose a challenge to you. 13 00:01:06,870 --> 00:01:13,420 I'd like you to take the sparse test data and create a full matrix for the test data. 14 00:01:13,650 --> 00:01:18,420 Further up in the notebook we've already loaded our data into this variable, so there's no need to load 15 00:01:18,420 --> 00:01:24,640 it again. When making the function call to create the full matrix from the test data, 16 00:01:24,790 --> 00:01:29,030 add some micro-benchmarking code to time that function call. 17 00:01:29,110 --> 00:01:30,790 Figure out how long it takes. 18 00:01:31,060 --> 00:01:40,330 Afterwards, separate the features and the target values and save these as separate .txt files, preferably 19 00:01:40,600 --> 00:01:44,140 using the relative paths that we added earlier:
20 00:01:44,390 --> 00:01:51,130 TEST_TARGET_FILE and TEST_FEATURE_MATRIX. 21 00:01:51,130 --> 00:01:58,320 I'll give you a few seconds to pause the video and have a go at this. And here's the solution. 22 00:01:58,610 --> 00:02:06,590 So our sparse_test_data we've already loaded up, and this whole thing had the following 23 00:02:06,590 --> 00:02:11,660 shape: about one hundred and ten thousand rows and four columns. 24 00:02:11,660 --> 00:02:18,920 The micro-benchmarking we'll add with %%time, and I'll store the result of our function 25 00:02:18,920 --> 00:02:24,340 call in a variable called full_test_data. 26 00:02:24,440 --> 00:02:33,050 This will hold on to the result from make_full_matrix, which as an input will take 27 00:02:33,080 --> 00:02:43,760 our sparse test data, and for the number of words we'll use VOCAB_SIZE. Hitting Shift+Enter 28 00:02:43,760 --> 00:02:49,890 on this should make our computer run for between 5 and 10 seconds. After that, 29 00:02:50,060 --> 00:02:55,780 I'll create two further variables, starting with X_test. X_test 30 00:02:55,850 --> 00:03:03,470 I'll set equal to full_test_data.loc, so I'll create a subset: square brackets, 31 00:03:03,470 --> 00:03:15,630 colon, comma, and then I'll say full_test_data.columns not equal to, so exclamation mark, equals, single quotes, 32 00:03:15,860 --> 00:03:20,490 CATEGORY. For my target values, y_test, 33 00:03:20,510 --> 00:03:23,730 I'll set those equal to full_test_data 34 00:03:23,800 --> 00:03:27,020 .CATEGORY. 35 00:03:27,020 --> 00:03:28,320 There we go. 36 00:03:28,370 --> 00:03:39,560 Now all I'm gonna do is save both of these pieces of information to a file with np.savetxt, parentheses, 37 00:03:40,770 --> 00:03:47,030 TEST_TARGET_FILE, comma, y_test.
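As a rough sketch of the steps described so far, the timed function call and the feature/target split might look like the code below. The body of make_full_matrix here is a hypothetical stand-in (the notebook's real function is not shown in this video), the four sparse columns are assumed to be document id, word id, category and occurrence count, and VOCAB_SIZE is set to a tiny illustrative value. Outside a notebook, time.perf_counter plays the role of the %%time cell magic.

```python
import time

import numpy as np
import pandas as pd

VOCAB_SIZE = 5  # illustrative only; the notebook's real value is not given here


def make_full_matrix(sparse_matrix, nr_words):
    """Hypothetical stand-in for the notebook's function.

    Assumes each sparse row is (doc id, word id, category, occurrence count)
    and returns one row per document with a CATEGORY column plus one column
    per word id.
    """
    doc_ids = np.unique(sparse_matrix[:, 0])
    columns = ['CATEGORY'] + list(range(nr_words))
    full = pd.DataFrame(0, index=doc_ids, columns=columns)
    for doc_id, word_id, category, count in sparse_matrix:
        full.at[int(doc_id), 'CATEGORY'] = int(category)
        full.at[int(doc_id), int(word_id)] = int(count)
    return full


# Tiny made-up sparse data: two documents, four (doc, word) entries
sparse_test_data = np.array([
    [0, 1, 1, 2],
    [0, 3, 1, 1],
    [1, 0, 0, 4],
    [1, 3, 0, 1],
])

# Plain-Python equivalent of timing the cell with %%time
start = time.perf_counter()
full_test_data = make_full_matrix(sparse_test_data, VOCAB_SIZE)
elapsed = time.perf_counter() - start
print(f'make_full_matrix took {elapsed:.4f} seconds')

# Features: every column except CATEGORY. Target: the CATEGORY column.
X_test = full_test_data.loc[:, full_test_data.columns != 'CATEGORY']
y_test = full_test_data.CATEGORY
```

With the made-up data above, X_test has one row per document and one column per word id, and y_test holds one category label per document.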
38 00:03:47,100 --> 00:03:55,590 This is where our target values are gonna be saved. And with np.savetxt, parentheses, TEST_FEATURE_MATRIX, 39 00:03:55,590 --> 00:03:59,290 comma, we'll save X_test. 40 00:04:00,200 --> 00:04:06,820 And that's it. Before I hit Shift+Enter on this, I'll very quickly bring up my folder on my hard drive 41 00:04:07,120 --> 00:04:11,190 right here, and then we can see the files being created. 42 00:04:11,290 --> 00:04:12,990 Fantastic. 43 00:04:13,240 --> 00:04:21,250 Well, we've done quite a lot of legwork up till now and we've put ourselves in a great position to finally 44 00:04:21,460 --> 00:04:25,250 run our algorithm and make some predictions. 45 00:04:25,270 --> 00:04:27,010 We've trained our model. 46 00:04:27,010 --> 00:04:29,000 We've got all our data ready. 47 00:04:29,050 --> 00:04:36,910 We're now ready to put our Bayes classifier through its paces, and on that note I'll say: see you in the 48 00:04:36,910 --> 00:04:37,990 next lesson. 49 00:04:37,990 --> 00:04:38,990 Goodbye. 50 00:04:39,520 --> 00:04:43,720 As we say back home in Austria: Servus, tschüss.
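The saving step can be sketched as follows. The directory here is a temporary folder chosen purely for illustration (the video only says the files live in the testing directory), the .txt extension is assumed from "text file", and the small arrays are made-up stand-ins for X_test and y_test. The np.loadtxt calls at the end are an addition, not something shown in the video, to check that the files round-trip.

```python
import os
import tempfile

import numpy as np

# Hypothetical output location; the notebook uses its own relative paths
out_dir = tempfile.mkdtemp()
TEST_FEATURE_MATRIX = os.path.join(out_dir, 'test-features.txt')
TEST_TARGET_FILE = os.path.join(out_dir, 'test-target.txt')

# Made-up stand-ins for the real feature matrix and target values
X_test = np.array([[0, 2, 0, 1, 0],
                   [4, 0, 0, 1, 0]])
y_test = np.array([1, 0])

# np.savetxt writes plain text, one row per line, space-delimited by default
np.savetxt(TEST_TARGET_FILE, y_test)
np.savetxt(TEST_FEATURE_MATRIX, X_test)

# Reading the files back (values come back as floats)
X_loaded = np.loadtxt(TEST_FEATURE_MATRIX, delimiter=' ')
y_loaded = np.loadtxt(TEST_TARGET_FILE, delimiter=' ')
print(X_loaded.shape, y_loaded.shape)
```

Because np.savetxt stores numbers as text, the full matrix from the real test data will produce a fairly large features file, which is worth keeping in mind before writing it to disk.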