1
00:00:00,360 --> 00:00:04,370
So I'd say we're pretty much done with training our Bayes classifier.

2
00:00:04,710 --> 00:00:11,550
Let's put our code for testing it and evaluating our algorithm into a separate notebook.

3
00:00:11,560 --> 00:00:20,480
So I'll come to my projects folder and create a new notebook, and I will name this notebook

4
00:00:20,480 --> 00:00:33,420
"07 Bayes Classifier - Testing, Inference and Evaluation". At the top,

5
00:00:33,530 --> 00:00:37,520
of course, we'll add our notebook imports.

6
00:00:37,520 --> 00:00:39,900
These are gonna be our usual suspects.

7
00:00:39,980 --> 00:00:47,990
If you still have your training notebook open, you can copy NumPy and pandas, paste them in here, and then

8
00:00:47,990 --> 00:00:49,730
just add

9
00:00:49,750 --> 00:00:55,960
matplotlib.pyplot as plt.

10
00:00:56,090 --> 00:01:01,850
We're gonna be doing some graphing and visualization in this notebook, so we're gonna need matplotlib

11
00:01:02,270 --> 00:01:08,630
and seaborn as sns. Because we've got matplotlib in here,

12
00:01:08,690 --> 00:01:19,430
we'll add some Python notebook magic with %matplotlib inline. Just below, we'll add our constants

13
00:01:20,360 --> 00:01:26,110
here, and grab some of the same file paths that we used in our training file.

14
00:01:26,180 --> 00:01:31,430
I'll grab all our constants, copy them, and I'll paste them over.

15
00:01:31,490 --> 00:01:37,200
The only thing I'll do is I'll delete the TRAINING_DATA_FILE.

16
00:01:37,340 --> 00:01:44,990
So our training data and our test data. All we need in this notebook are our probabilities that we've

17
00:01:44,990 --> 00:01:51,860
worked out for our tokens, and our features and target for our test data set.

18
00:01:52,220 --> 00:01:57,710
And these two we've prepared in the previous notebook and we've got them right here.

19
00:01:57,710 --> 00:02:06,900
So let me hit Shift+Enter on the cell, and we're ready to load the data. "Load the Data".

20
00:02:06,950 --> 00:02:07,950
There we go.

21
00:02:08,150 --> 00:02:13,470
All our stuff is in text files so this should be fairly easy.

22
00:02:13,490 --> 00:02:23,810
Our features we'll store in a variable called X_test, and this will be equal to np.loadtxt, parentheses,

23
00:02:25,100 --> 00:02:32,100
TEST_FEATURE_MATRIX, comma, delimiter, single quotes with a space.

24
00:02:32,290 --> 00:02:40,390
Our target will be y_test, and that will be np.loadtxt.

25
00:02:40,820 --> 00:02:41,930
You guessed it.

26
00:02:42,260 --> 00:02:47,870
TEST_TARGET_FILE, delimiter, single quotes, space.
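
The two loading calls just described can be sketched roughly as follows. This is a minimal, self-contained illustration, not the lesson's actual notebook cell: the file names are hypothetical stand-ins for the constants copied over from the training notebook, and the tiny arrays are made-up data written out first so the example runs on its own.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the constants copied from the training notebook.
tmp_dir = tempfile.mkdtemp()
TEST_FEATURE_MATRIX = os.path.join(tmp_dir, 'test-features.txt')
TEST_TARGET_FILE = os.path.join(tmp_dir, 'test-target.txt')

# Made-up data: 3 emails x 4 tokens, plus one label per email (1 = spam).
features = np.array([[1, 0, 2, 0], [0, 3, 0, 1], [0, 0, 0, 5]])
targets = np.array([1, 0, 0])
np.savetxt(TEST_FEATURE_MATRIX, features, fmt='%d')
np.savetxt(TEST_TARGET_FILE, targets, fmt='%d')

# The two calls from the lesson: space-delimited text files into NumPy arrays.
X_test = np.loadtxt(TEST_FEATURE_MATRIX, delimiter=' ')
y_test = np.loadtxt(TEST_TARGET_FILE, delimiter=' ')

print(X_test.shape, y_test.shape)
```

Note that np.loadtxt returns floating-point arrays by default, even when the text file holds whole numbers.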

27
00:02:48,260 --> 00:02:49,380
Next one up.

28
00:02:49,700 --> 00:02:53,060
Call this one "Token

29
00:02:53,150 --> 00:02:55,190
Probabilities".

30
00:02:55,250 --> 00:03:00,770
We need the probability that a token is spam.

31
00:03:00,770 --> 00:03:11,270
So it'll be np.loadtxt, the token spam probability file, delimiter, a space.

32
00:03:11,630 --> 00:03:24,030
Copy this and add it two more times for the other two probability files: the ham and the

33
00:03:24,470 --> 00:03:26,790
prob_all_tokens.

34
00:03:29,500 --> 00:03:35,710
These two, of course, will point to the other file paths: the ham probability file

35
00:03:36,430 --> 00:03:40,910
and the token all probability file.
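
As a rough sketch of the three probability loads described above: the variable names prob_token_spam and prob_token_ham, the file names, and the probability values are all assumptions made for illustration (only prob_all_tokens is named in the narration). The files are written first so the example is runnable on its own.

```python
import os
import tempfile

import numpy as np

# Hypothetical file names; the lesson's real paths come from its constants cell.
tmp_dir = tempfile.mkdtemp()
spam_path = os.path.join(tmp_dir, 'prob-spam.txt')
ham_path = os.path.join(tmp_dir, 'prob-ham.txt')
all_path = os.path.join(tmp_dir, 'prob-all-tokens.txt')

# Made-up probabilities for a 4-token vocabulary, only so the files exist.
np.savetxt(spam_path, np.array([0.4, 0.3, 0.2, 0.1]))
np.savetxt(ham_path, np.array([0.1, 0.2, 0.3, 0.4]))
np.savetxt(all_path, np.array([0.25, 0.25, 0.25, 0.25]))

# Same call three times, each pointing at a different file path.
prob_token_spam = np.loadtxt(spam_path, delimiter=' ')  # assumed variable name
prob_token_ham = np.loadtxt(ham_path, delimiter=' ')    # assumed variable name
prob_all_tokens = np.loadtxt(all_path, delimiter=' ')

print(prob_token_spam.shape, prob_token_ham.shape, prob_all_tokens.shape)
```

Each file holds one value per line, so each load comes back as a one-dimensional array with one entry per token.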

36
00:03:40,990 --> 00:03:41,970
There we go.

37
00:03:41,980 --> 00:03:45,870
Just let me hit Shift+Enter, and that's it.

38
00:03:45,940 --> 00:03:48,680
We're all set up and ready to go.

39
00:03:48,700 --> 00:03:50,950
This is where the real work begins.

40
00:03:50,960 --> 00:03:52,300
I'll see you in the next lesson.

41
00:03:52,300 --> 00:03:52,770
Take care.