1 00:00:00,266 --> 00:00:01,433 All right, let's do this. 2 00:00:01,433 --> 00:00:04,100 Let's predict the test set results. 3 00:00:04,100 --> 00:00:08,266 So as I said in the previous tutorial, we're going to display two vectors. 4 00:00:08,433 --> 00:00:12,033 First one being the vector of the real profits in the test set, 5 00:00:12,300 --> 00:00:16,200 and the second one being the vector of the predicted profits in the same test set. 6 00:00:16,466 --> 00:00:18,300 And we're going to compare them next to each other 7 00:00:18,300 --> 00:00:21,633 to see if our predictions are close to the real results. 8 00:00:22,033 --> 00:00:23,100 All right, let's do this. 9 00:00:23,100 --> 00:00:26,100 Let's start by creating a new code cell. 10 00:00:26,133 --> 00:00:30,033 So the first thing I asked you to do was to indeed get that vector of predictions. 11 00:00:30,200 --> 00:00:34,033 So first, since we want to get a vector, I'm going to introduce a new variable, 12 00:00:34,200 --> 00:00:37,200 which I'm going to call, as usual, y_pred. 13 00:00:37,633 --> 00:00:41,600 So that's going to be the vector of the predicted profits in the test set. 14 00:00:42,233 --> 00:00:44,300 And now how do we get our predictions? 15 00:00:44,300 --> 00:00:45,600 Well, you know the solution. 16 00:00:45,600 --> 00:00:49,200 It's actually exactly the same as with simple linear regression. 17 00:00:49,466 --> 00:00:52,766 We first need to take our regressor object. 18 00:00:52,833 --> 00:00:56,566 You know, our multiple linear regression model, from which 19 00:00:56,566 --> 00:01:00,000 we're going to call that predict method. 20 00:01:00,566 --> 00:01:05,866 And of course in this method we have to input the features of the test set. 21 00:01:05,866 --> 00:01:09,866 You know, exactly these ones, not including the profit of course. 
22 00:01:10,100 --> 00:01:13,800 These are all the features with first the encoded variables 23 00:01:13,800 --> 00:01:17,700 for the state, then the R&D spend and the administration spend. 24 00:01:17,900 --> 00:01:19,666 And the marketing spend. 25 00:01:19,666 --> 00:01:20,000 All right. 26 00:01:20,000 --> 00:01:24,300 So these are all the features which we need to input in our predict 27 00:01:24,300 --> 00:01:27,733 method here in order to predict the profits. 28 00:01:27,766 --> 00:01:30,766 So let's do this: X_test. 29 00:01:31,000 --> 00:01:31,866 And there you go. 30 00:01:31,866 --> 00:01:34,866 Now we get our vector of predicted profits. 31 00:01:35,200 --> 00:01:37,233 Perfect. And now the next step. 32 00:01:37,233 --> 00:01:38,233 The thing that I'm going to do 33 00:01:38,233 --> 00:01:42,666 is I'm going to call NumPy, which has the shortcut np, from which I'm 34 00:01:42,666 --> 00:01:48,066 going to call one of its functions, which is set_printoptions. 35 00:01:48,633 --> 00:01:52,033 And now in the parentheses I'm just going to enter 36 00:01:52,033 --> 00:01:55,033 precision equals 2. 37 00:01:55,033 --> 00:01:55,433 All right. 38 00:01:55,433 --> 00:02:00,633 And this will display any numerical value with only two decimals after the decimal point. 39 00:02:00,833 --> 00:02:01,166 All right. 40 00:02:01,166 --> 00:02:02,766 So this will be much more beautiful 41 00:02:02,766 --> 00:02:06,000 to visualize and mostly much easier to visualize. 42 00:02:06,600 --> 00:02:07,700 Okay. Good. 43 00:02:07,700 --> 00:02:08,966 And now final step. 44 00:02:08,966 --> 00:02:12,233 This is a step where we're going to display the two vectors 45 00:02:12,233 --> 00:02:15,933 of the real profits and of the predicted profits together next to each other. 46 00:02:16,433 --> 00:02:20,366 And to do this, well, we're going to use, you know, a classic trick with NumPy. 47 00:02:20,400 --> 00:02:21,466 Concatenate. 
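As a minimal sketch of the two steps just described (calling predict on the test features, then setting the print precision), something like the following could be typed into the new code cell. It assumes the regressor is a scikit-learn LinearRegression, which the transcript does not state explicitly, and it uses a small hypothetical dataset rather than the course data. Only the names regressor, X_test and y_pred come from the tutorial; X_train, y_train and all the numbers are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression  # assumption: scikit-learn model

# Hypothetical toy data (not the course dataset): three feature columns and
# a "profit" that is an exact linear function of them, so the fit is exact.
X_train = np.array([[1.0, 2.0, 3.0],
                    [2.0, 1.0, 0.0],
                    [0.0, 1.0, 4.0],
                    [3.0, 3.0, 1.0],
                    [4.0, 0.0, 2.0]])
y_train = X_train @ np.array([2.0, 3.0, 1.0]) + 10.0
X_test = np.array([[1.0, 1.0, 1.0],
                   [2.0, 2.0, 2.0]])

# Train the multiple linear regression model.
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict from the test set features only (the profit column is not included).
y_pred = regressor.predict(X_test)

# Show printed NumPy values with two digits after the decimal point.
np.set_printoptions(precision=2)
print(y_pred)
```

Note that set_printoptions only changes how arrays are displayed; the values stored in y_pred keep their full precision.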
48 00:02:21,466 --> 00:02:25,533 Concatenate is a function of NumPy that allows you to concatenate 49 00:02:25,533 --> 00:02:29,300 either vertically or horizontally two vectors or even arrays. 50 00:02:29,600 --> 00:02:33,166 And so now we're going to use the concatenate function to concatenate 51 00:02:33,166 --> 00:02:37,133 vertically our two vectors of the real profits and the predicted profits. 52 00:02:37,433 --> 00:02:39,666 All right, so let's do this. Just follow me. 53 00:02:39,666 --> 00:02:42,133 I will explain everything as I'm coding. 54 00:02:42,133 --> 00:02:44,700 So first we're simply going to start with a print. 55 00:02:44,700 --> 00:02:48,300 You know, because we want to print that concatenation of the two vectors. 56 00:02:48,666 --> 00:02:49,700 Then here we go. 57 00:02:49,700 --> 00:02:54,133 We're going to call that concatenate function by NumPy, from which, there we go, 58 00:02:54,133 --> 00:02:58,533 we call the concatenate function. Perfect. Parentheses. 59 00:02:59,166 --> 00:03:02,166 And now be careful because this is always a bit confusing. 60 00:03:02,333 --> 00:03:07,200 The concatenate function by NumPy actually expects as a first argument, 61 00:03:07,266 --> 00:03:11,733 check the cell here, the couple of arrays you want to concatenate 62 00:03:11,733 --> 00:03:13,300 or, you know, vectors. 63 00:03:13,300 --> 00:03:15,366 So actually all the description is here. 64 00:03:15,366 --> 00:03:19,700 You know, a1, a2 is a sequence of arrays you want to concatenate. 65 00:03:19,700 --> 00:03:21,266 And that must have the same shape. 66 00:03:21,266 --> 00:03:25,066 Well that's perfect for us because of course our vector of predicted profits 67 00:03:25,066 --> 00:03:28,066 and the vector of real profits have exactly the same shape, meaning 68 00:03:28,133 --> 00:03:32,333 they are one-dimensional vectors containing the same number of profits. 69 00:03:32,433 --> 00:03:33,766 Okay, so that's good. 
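To make the point about the first argument concrete, here is a small sketch of how the sequence of arrays is passed to np.concatenate. The values are hypothetical stand-ins, not results from the course dataset, and the name joined is introduced only for this illustration.

```python
import numpy as np

# Hypothetical stand-in values, not results from the course dataset.
y_pred = np.array([100.0, 200.0, 300.0])
y_test = np.array([110.0, 190.0, 310.0])

# The arrays go inside their own pair of parentheses: together they form
# the single first argument. Writing np.concatenate(y_pred, y_test) would
# instead pass y_test in the position of the axis argument.
joined = np.concatenate((y_pred, y_test))

print(joined.shape)  # (6,)
```

With two one-dimensional vectors and no axis given, the result is one longer one-dimensional vector, which is why a reshape is needed before the vectors can be shown as two columns.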
70 00:03:33,766 --> 00:03:37,600 But these vectors that we want to concatenate have to be 71 00:03:37,866 --> 00:03:39,566 in, you know, some parentheses. 72 00:03:39,566 --> 00:03:42,000 And these are actually the first argument. 73 00:03:42,000 --> 00:03:45,866 You know, this couple of arrays or vectors is the first argument 74 00:03:45,866 --> 00:03:47,066 input in parentheses. 75 00:03:47,066 --> 00:03:49,733 So here I'm going to add new parentheses. 76 00:03:49,733 --> 00:03:52,166 All right. This is the first argument. 77 00:03:52,166 --> 00:03:54,066 And what is going to be inside these parentheses? 78 00:03:54,066 --> 00:03:56,033 Well, of course, our two vectors: 79 00:03:56,033 --> 00:03:59,033 the vector of predicted profits and the vector of real profits. 80 00:03:59,700 --> 00:04:00,000 All right. 81 00:04:00,000 --> 00:04:03,000 So let's first add the vector of predicted profits. 82 00:04:03,066 --> 00:04:06,066 This is of course y_pred. 83 00:04:06,233 --> 00:04:09,666 And now because we want to display them vertically 84 00:04:09,666 --> 00:04:11,533 and not horizontally, remember that. 85 00:04:11,533 --> 00:04:14,700 You know, actually I can show this to you in data preprocessing tools. 86 00:04:14,700 --> 00:04:17,700 We actually printed y at some point. 87 00:04:18,066 --> 00:04:18,700 There we go. 88 00:04:18,700 --> 00:04:20,933 You know, remember when we print the dependent 89 00:04:20,933 --> 00:04:24,200 variable vector, since it is a vector it is displayed horizontally. 90 00:04:24,600 --> 00:04:26,400 I actually prefer to display, 91 00:04:26,400 --> 00:04:29,966 you know, our two vectors of predictions and real profits vertically. 92 00:04:30,100 --> 00:04:34,200 So now I'm going to add another trick to put that vertically. 93 00:04:34,200 --> 00:04:36,766 You know, to convert that from being horizontal to vertical. 94 00:04:36,766 --> 00:04:40,733 And the trick to do that is just to add .reshape. 
95 00:04:41,066 --> 00:04:43,100 Reshape is an attribute function that allows you 96 00:04:43,100 --> 00:04:45,766 to, you know, reshape your vectors or arrays. 97 00:04:45,766 --> 00:04:49,100 And in order to reshape a vector from being horizontal to vertical, 98 00:04:49,233 --> 00:04:52,233 well, we just need to add as input of this function, 99 00:04:52,400 --> 00:04:55,400 first, the number of elements in y_pred. 100 00:04:55,466 --> 00:04:57,400 And that's actually so far the number of columns 101 00:04:57,400 --> 00:04:59,033 because, you know, it is horizontal. 102 00:04:59,033 --> 00:05:01,866 And to get that number we can simply use the 103 00:05:01,866 --> 00:05:05,200 len function, which returns the length of a vector. 104 00:05:05,500 --> 00:05:09,766 And therefore here I'm going to input y_pred. Okay. 105 00:05:10,200 --> 00:05:13,233 So this is the first element of the reshape function. 106 00:05:13,233 --> 00:05:15,933 And the second one is just one. 107 00:05:15,933 --> 00:05:17,033 All right. 108 00:05:17,033 --> 00:05:17,966 So what does it mean? 109 00:05:17,966 --> 00:05:22,133 That means that you want to reshape your y_pred vector into an array 110 00:05:22,166 --> 00:05:24,933 having len(y_pred) rows. 111 00:05:24,933 --> 00:05:28,300 Meaning, you know, that the number of rows will be equal to the number of profits. 112 00:05:28,633 --> 00:05:30,666 And then just one column. 113 00:05:30,666 --> 00:05:31,100 All right. 114 00:05:31,100 --> 00:05:32,000 That's what it means. 115 00:05:32,000 --> 00:05:34,500 So it's good that you know this reshape trick. 116 00:05:34,500 --> 00:05:36,133 Now you know how to reshape your vectors. 117 00:05:36,133 --> 00:05:38,500 And you will see that it is actually going to be much nicer.
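The reshape trick described above can be sketched as follows, again with hypothetical values. The last step, joining the two columns along axis 1 so that they sit next to each other, is an assumption about where the tutorial is heading; it is not shown in this part of the transcript. The names pred_column, test_column and side_by_side are introduced only for this illustration.

```python
import numpy as np

# Hypothetical stand-in values, not results from the course dataset.
y_pred = np.array([10.5, 20.25, 30.0])
y_test = np.array([11.0, 19.5, 31.0])

# reshape(number_of_rows, number_of_columns): len(y_pred) rows, 1 column.
pred_column = y_pred.reshape(len(y_pred), 1)
test_column = y_test.reshape(len(y_test), 1)

print(y_pred.shape)       # (3,)   one-dimensional, printed horizontally
print(pred_column.shape)  # (3, 1) one value per row, printed vertically

# Assumption: the two columns are then joined side by side along axis 1,
# giving predicted profits on the left and real profits on the right.
side_by_side = np.concatenate((pred_column, test_column), axis=1)
print(side_by_side)
```

Reshaping does not change the values, only how they are arranged, so each row of side_by_side pairs one prediction with the real profit at the same position in the test set.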