0 1 00:00:01,020 --> 00:00:07,440 The first big Python programming concept that we're going to talk about are variables. When you think 1 2 00:00:07,440 --> 00:00:10,200 about programming and software at a high level, 2 3 00:00:10,260 --> 00:00:12,630 it's all about manipulating data. 3 4 00:00:12,690 --> 00:00:18,960 For example, if you create a computer game, then you'll need to track the player score, the number of lives 4 5 00:00:18,960 --> 00:00:22,430 that the player has and what level the player is on. 5 6 00:00:22,470 --> 00:00:28,890 Of course, all this data changes over time as we play the video game and the program runs. 6 7 00:00:28,890 --> 00:00:36,400 So where do you as the programmer store this data and how do you track it? The Python programming language 7 8 00:00:36,400 --> 00:00:38,870 has a very convenient solution. 8 9 00:00:39,130 --> 00:00:44,830 You would store the game score in a variable. What is a variable? 9 10 00:00:45,140 --> 00:00:48,640 A variable is just a container for data. 10 11 00:00:48,640 --> 00:00:54,860 The important thing to understand about Python variables is that they both have a name and a value. 11 12 00:00:54,880 --> 00:01:02,920 For example, if I wanted to track my age inside a program I can simply create a variable called "myAge". 12 13 00:01:02,920 --> 00:01:12,520 "myAge" is the name of the variable, but because "myAge" is a container for data I can also give it a value, 13 14 00:01:12,550 --> 00:01:19,500 so I can stick the number 32 inside "myAge". In programming lingo 14 15 00:01:19,500 --> 00:01:27,460 this is called assigning a value to a variable. Now, as time passes and our program runs, 15 16 00:01:27,460 --> 00:01:34,810 we might need to update this variable, and we can do that simply by updating the value inside "myAge" 16 17 00:01:34,960 --> 00:01:44,980 with a new value. So we can replace the value 32 with the value 33. 
Now, the question is what would this 17 18 00:01:44,980 --> 00:01:52,230 look like in Python syntax? Inside the Python intro notebook that we created, 18 19 00:01:52,230 --> 00:01:57,650 we're going to create a variable called "myAge" and give it a value just like this: 19 20 00:01:57,750 --> 00:02:09,880 "myAge = 32". This name "myAge" is now how we can refer to the number 32. To display the value inside 20 21 00:02:09,930 --> 00:02:12,340 the "myAge" variable below the cell, 21 22 00:02:12,340 --> 00:02:17,050 I'm simply going to write "print(myAge)". 22 23 00:02:19,750 --> 00:02:25,090 Hitting Shift+Enter on the keyboard will evaluate the code in the cell and then display the value 23 24 00:02:25,330 --> 00:02:28,810 contained inside the "myAge" variable. 24 25 00:02:28,810 --> 00:02:33,130 Now suppose our program is running and we need to update the value inside this variable. 25 26 00:02:33,250 --> 00:02:42,400 We can change the value of "myAge" simply by assigning a new value to it, say with "myAge = 33". 26 27 00:02:42,460 --> 00:02:49,740 If I write "print(myAge)" now and hit Shift+Enter, we'll see the value 33 printed out. 27 28 00:02:49,800 --> 00:02:51,860 No surprises there. Now, 28 29 00:02:51,910 --> 00:02:55,610 variables are super handy because you can do calculations with them. 29 30 00:02:55,680 --> 00:02:59,340 For example, if we write "print(myAge/3)" 30 31 00:03:03,670 --> 00:03:09,940 and hit Shift+Enter, we'll see 11.0 printed out below the cell. 31 32 00:03:09,940 --> 00:03:15,100 Also, we can take these two concepts that we just covered and take them up a step in complexity. 32 33 00:03:15,130 --> 00:03:19,260 Let's say we want to update the value of myAge like this. 
33 34 00:03:19,390 --> 00:03:26,650 myAge = myAge + 1 34 35 00:03:26,710 --> 00:03:31,690 Now if it's your first time learning to program, this line of code is going to look very strange, but 35 36 00:03:31,690 --> 00:03:38,950 what's going on is that we're taking the previous value of myAge, namely the value 33 on the right hand 36 37 00:03:38,950 --> 00:03:44,490 side of the equals sign and then adding one to it. With that equals sign 37 38 00:03:44,500 --> 00:03:55,460 operator we are storing this new value - 34 (33 plus 1) inside of myAge on the left hand side. 38 39 00:03:55,480 --> 00:04:00,600 In other words, we are overwriting the previous value stored inside the 39 40 00:04:00,610 --> 00:04:13,320 myAge variable. If we write "print(myAge)" now and hit Shift+Enter, we'll see our value update to 34. 40 41 00:04:14,890 --> 00:04:22,260 So let's do a quick exercise on using and manipulating variables in Python. As a challenge, 41 42 00:04:22,480 --> 00:04:29,750 can you create a variable called "restaurantBill" and set its value equal to 36.17? 42 43 00:04:29,920 --> 00:04:37,690 Then create a variable called "serviceCharge" and set its value equal to 0.125. 12 and 43 44 00:04:37,690 --> 00:04:42,850 a half percent seems to be the standard rate that restaurants suggest as a tip in London these days. 44 45 00:04:44,020 --> 00:04:49,970 Finally, print out the amount of tip below the cell that you would need to add to the bill. 45 46 00:04:50,020 --> 00:04:52,120 I'll give you a few seconds to figure this out. 46 47 00:04:52,150 --> 00:04:55,900 So pause the video. Okay, 47 48 00:04:55,930 --> 00:04:59,170 so here's the solution. For part 1, 48 49 00:04:59,230 --> 00:05:06,650 we'll write "restaurantBill = 36.17". 
For part 2, 49 50 00:05:06,730 --> 00:05:08,870 we'll write 50 51 00:05:09,040 --> 00:05:15,680 "serviceCharge = 0.125"; and for part 3 we'll write 51 52 00:05:15,700 --> 00:05:18,500 "print( 52 53 00:05:18,970 --> 00:05:20,670 restaurantBill 53 54 00:05:20,680 --> 00:05:22,570 * serviceCharge)". 54 55 00:05:26,200 --> 00:05:27,880 When I hit Shift+Enter, 55 56 00:05:27,910 --> 00:05:33,000 I'll see that the value of the tip is about 4.52. 56 57 00:05:33,040 --> 00:05:38,470 Now all that's left to do is to ask the waiter if the restaurant is pocketing the money, or if the tip 57 58 00:05:38,470 --> 00:05:41,270 really does go to the staff. 58 59 00:05:41,270 --> 00:05:48,190 Now, one thing that could happen to you at this point is that you've made a typo. In Python, 59 60 00:05:48,200 --> 00:05:50,340 variable names are case sensitive. 60 61 00:05:50,540 --> 00:05:58,820 And if our variable names don't exactly match how they were defined, we're gonna get some unexpected 61 62 00:05:59,060 --> 00:06:00,590 errors. 62 63 00:06:00,620 --> 00:06:08,090 For example, if this capital B in my print statement instead was a lower case b and we had hit Shift+Enter 63 64 00:06:08,120 --> 00:06:15,010 we would have gotten the following error - "NameError: name 'restaurantbill' 64 65 00:06:15,070 --> 00:06:21,880 is not defined", and this is because this restaurantbill and this restaurantBill are considered to 65 66 00:06:21,880 --> 00:06:24,390 be completely different entities. 66 67 00:06:24,790 --> 00:06:30,760 So you want to make sure that you never have any typos in your variable names. 67 68 00:06:30,760 --> 00:06:31,930 The same of course is true 68 69 00:06:31,960 --> 00:06:33,770 if we miss out a letter. 69 70 00:06:33,790 --> 00:06:42,010 So for example, if we wrote "seviceCharge" instead of "serviceCharge", then we would get the same kind of 70 71 00:06:42,040 --> 00:06:45,700 error - our variable is not defined. 
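The exercise and the typo behaviour described above can be sketched as follows. This is only a minimal sketch: the two variable names and their values come from the exercise, while the extra names `tip` and `message` and the `try`/`except` wrapper are additions made here so that the example keeps running after the error instead of stopping the cell.

```python
restaurantBill = 36.17
serviceCharge = 0.125

# Part 3: the tip is the bill multiplied by the service charge rate.
# "tip" is a hypothetical helper name, not part of the exercise.
tip = restaurantBill * serviceCharge
print(tip)  # 4.52125, which rounds to 4.52

# Names are case sensitive: restaurantbill (lower case b) was never defined,
# so Python raises a NameError when it tries to look the name up.
try:
    print(restaurantbill * serviceCharge)
except NameError as error:
    message = str(error)
    print("NameError:", message)
```

Without the `try`/`except`, the notebook would show the same NameError as a red traceback below the cell.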
71 72 00:06:45,760 --> 00:06:51,400 In other words, Python can't find something that matches this name. 72 73 00:06:51,400 --> 00:06:54,220 So let's put that "r" back where it belongs. 73 74 00:06:54,470 --> 00:07:00,550 Now that we've learned a thing or two about Python programming and variables, let's revisit the code 74 75 00:07:00,550 --> 00:07:08,400 that we wrote when we were estimating our movie revenue. Since I saved my previous work in a Python notebook 75 76 00:07:08,490 --> 00:07:11,030 and added it to the MLProjects folder, 76 77 00:07:11,100 --> 00:07:12,480 I'm just going to open that now. 77 78 00:07:15,790 --> 00:07:16,110 Now, 78 79 00:07:16,120 --> 00:07:21,240 even though this code is a little bit more complex than what we have written just now, 79 80 00:07:21,400 --> 00:07:25,890 can you spot the variables in this piece of code? 80 81 00:07:25,990 --> 00:07:27,720 Now there are actually quite a few. 81 82 00:07:27,940 --> 00:07:33,990 And you'll notice that the variables are often on the left hand side of an equals sign. 82 83 00:07:34,000 --> 00:07:44,900 So for example data is a variable, the capital X is a variable, the lower case y is also a variable. 83 84 00:07:44,900 --> 00:07:52,370 All of these variables are holding on to data, but in contrast to the variables myAge and restaurantBill, 84 85 00:07:53,000 --> 00:07:59,750 these X and y variables are not holding on to a single value, they are holding on to lots and lots of 85 86 00:07:59,750 --> 00:08:01,950 values at the same time. 86 87 00:08:02,030 --> 00:08:06,950 And I'm going to show you guys how that works in a little bit. But before I do that, 87 88 00:08:06,960 --> 00:08:13,230 let's talk about another key concept that we need to understand when working with variables, namely data 88 89 00:08:13,230 --> 00:08:20,600 types. In machine learning, and in programming more generally, we'll be working with different kinds of data. 
89 90 00:08:20,750 --> 00:08:29,750 We'll be working with text, decimal numbers, tables of data, columns of data, images, sounds, video, all sorts. 90 91 00:08:31,480 --> 00:08:36,230 And a programming language like Python will categorize this data. 91 92 00:08:36,230 --> 00:08:42,650 In other words different kinds of data, like text or decimal numbers and whole numbers, will have a different 92 93 00:08:42,650 --> 00:08:43,840 category. 93 94 00:08:43,850 --> 00:08:48,320 In other words, they will have a different data type. 94 95 00:08:48,320 --> 00:08:53,600 Now you can think of data types like this children's toy where you have to fit the right shape into 95 96 00:08:53,600 --> 00:08:55,060 the hole. 96 97 00:08:55,070 --> 00:09:01,430 So for example when you try to put a decimal number somewhere where Python expects to have a string, 97 98 00:09:01,790 --> 00:09:04,350 you'll often find that you have a problem. 98 99 00:09:04,550 --> 00:09:07,840 And this makes sense when you think about it intuitively. 99 100 00:09:07,940 --> 00:09:10,980 So you have some Python code that makes a calculation. 100 101 00:09:11,090 --> 00:09:13,160 So you're adding two things together. 101 102 00:09:13,280 --> 00:09:18,600 If those two things are numbers then you're good; 5+10 is 15. 102 103 00:09:19,190 --> 00:09:26,290 But if one of those things is another kind of data, like, I don't know, a home address you have a problem. 103 104 00:09:26,360 --> 00:09:30,100 Your program will crash or it's going to do something very unexpected. 104 105 00:09:30,110 --> 00:09:35,690 Trying to evaluate 5 + 21 James Street. 105 106 00:09:35,690 --> 00:09:38,990 It doesn't make any sense. Now for the most part, 106 107 00:09:39,080 --> 00:09:45,470 Python will actually take care of the data types behind the scenes, so it's not something that's kind 107 108 00:09:45,470 --> 00:09:49,960 of at the forefront of the programming syntax of the Python code. 
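To make the address example concrete, here is a small sketch of what Python does when the two things being added have mismatched types. The numbers and the address text are taken from the narration; the names `total` and `result`, and the `try`/`except` around the failing line, are assumptions made here so the example runs to the end.

```python
# Two whole numbers add without any trouble.
total = 5 + 10
print(total)  # 15

# A whole number plus a piece of text is rejected with a TypeError,
# because Python does not know how to add an int and a str.
try:
    result = 5 + "21 James Street"
except TypeError as error:
    result = None
    print("TypeError:", error)
```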
108 109 00:09:49,970 --> 00:09:56,360 But let me show you how you can actually see the data type in a Jupyter notebook, because we can ask Python 109 110 00:09:56,540 --> 00:10:04,670 what category something belongs to by writing "type()" and then putting something between those 110 111 00:10:04,670 --> 00:10:13,690 two parentheses. So for example, if I write "type(33)" and hit Shift+Enter, the type of this whole number is 111 112 00:10:13,910 --> 00:10:20,080 int. Int stands for integer, which makes sense. 112 113 00:10:20,440 --> 00:10:24,940 But there are also quite a few other types with maybe less intuitive names. 113 114 00:10:24,940 --> 00:10:33,800 So let me introduce a couple of them; if we write "type(33.6)" and hit Shift+Enter, we get 114 115 00:10:33,800 --> 00:10:38,240 to see that decimal numbers are classified differently. 115 116 00:10:38,240 --> 00:10:47,570 Decimal numbers are classified as floats or floating point numbers. Floating point numbers are the type 116 117 00:10:47,780 --> 00:10:53,430 that you will usually be working with every time you're dealing with numbers that have a decimal point. 117 118 00:10:53,430 --> 00:11:01,960 Now let me show you what the type for text is called, so I can write "type()" and then between the parentheses 118 119 00:11:02,260 --> 00:11:09,350 I'm going to open the single quotes and add my name. When I hit Shift+Enter, 119 120 00:11:09,460 --> 00:11:19,800 I can see that the type for text is called "str", and "str" stands for string. The word string is just computer 120 121 00:11:19,800 --> 00:11:26,910 jargon for a piece of text or a sequence of characters. 
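The three type checks just described can be sketched in one cell. The values are the ones used in the narration; the names `whole`, `decimal` and `text` are hypothetical and only there to hold the results. Note that `print()` shows the type as `<class 'int'>`, whereas a bare `type(33)` on the last line of a notebook cell is displayed as `int`.

```python
# Ask Python which category each value belongs to.
whole = type(33)         # a whole number
decimal = type(33.6)     # a number with a decimal point
text = type('Philip')    # a piece of text in single quotes

print(whole)    # <class 'int'>
print(decimal)  # <class 'float'>
print(text)     # <class 'str'>
```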
And the way that you can tell that something 121 122 00:11:27,000 --> 00:11:33,090 is considered to be a string is by the fact that strings are always, always surrounded by either single 122 123 00:11:33,420 --> 00:11:40,710 or double quotes and Jupyter notebook is actually also very helpful with the syntax highlighting so 123 124 00:11:40,710 --> 00:11:45,790 that the Python code that is considered to be a string actually has a different color. 124 125 00:11:45,810 --> 00:11:49,840 So in this case the string is marked as red. 125 126 00:11:49,860 --> 00:11:55,370 Let's take another look at the Python code that we wrote previously and see if we can spot a string. 126 127 00:11:55,550 --> 00:12:02,900 Here we can see that when we were writing "pd.read_csv()", the value 127 128 00:12:02,930 --> 00:12:08,120 that we specified between these two parentheses was a string. 128 129 00:12:08,120 --> 00:12:15,890 It was a piece of text marked by the single quotes. And the same is true for the column names - "production_budget_usd"; 129 130 00:12:15,940 --> 00:12:22,940 and it's also true for the X label on our graph and all the other 130 131 00:12:22,940 --> 00:12:27,080 parts where the code is highlighted in red. 131 132 00:12:27,250 --> 00:12:29,490 Okay, so far so good. 132 133 00:12:29,590 --> 00:12:37,090 We've just introduced you to a whole bunch of new programming jargon - variables, data types, int for integers, 133 134 00:12:37,150 --> 00:12:42,530 float for decimal numbers, str for pieces of text or strings. 134 135 00:12:42,700 --> 00:12:45,880 But how do data types relate to variables? 135 136 00:12:45,880 --> 00:12:53,560 What's the connection between the two? Well, if the data has a certain type then so does the variable 136 137 00:12:53,620 --> 00:13:00,410 containing the data, because after all a variable is just a container. 
Now, for example, let's see what 137 138 00:13:00,410 --> 00:13:08,210 the data type is for the variable myAge. I can write "type()" and then between the parentheses I'm going 138 139 00:13:08,210 --> 00:13:16,010 to supply myAge and hit Shift+Enter. Here we can see that, because myAge holds on to a whole number, 139 140 00:13:16,310 --> 00:13:22,500 the data type of the variable is also integer or int. 140 141 00:13:22,520 --> 00:13:26,150 Now let's take a look at the data type for restaurantBill, 141 142 00:13:29,390 --> 00:13:35,320 and I'm sure it's no surprise that, because restaurantBill is holding on to the decimal number 142 143 00:13:35,330 --> 00:13:36,840 36.17, 143 144 00:13:36,840 --> 00:13:45,850 its type is float. If I create a new variable called "myName" and set it equal to the string 144 145 00:13:45,870 --> 00:13:56,690 'Philip', I can write "type(myName)", hit Shift+Enter and I can see that the myName variable is a string. 145 146 00:13:57,560 --> 00:14:03,890 Now let's take another look at the notebook where we ran our regression, and let me ask you this: what 146 147 00:14:03,890 --> 00:14:12,090 do you think is the data type of the variables data, capital X and lowercase y? 147 148 00:14:12,120 --> 00:14:18,350 This is the mystery that we're gonna be exploring over the next couple of lessons. I'll see you there.
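As a recap, the variable type checks walked through above might look like this in a single cell. The variable names and values come from the lesson (myAge is 34 after the earlier update). The final reassignment is an addition that goes slightly beyond the narration: it illustrates that `type()` reports the type of whatever value the variable currently holds.

```python
myAge = 34
restaurantBill = 36.17
myName = 'Philip'

print(type(myAge))           # <class 'int'>
print(type(restaurantBill))  # <class 'float'>
print(type(myName))          # <class 'str'>

# Assumption beyond the narration: the type follows the value, so
# storing a decimal number in myAge changes what type() reports.
myAge = 34.5
print(type(myAge))           # <class 'float'>
```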