0 1 00:00:01,080 --> 00:00:10,120 In this lesson we're going to talk about import statements, modules and packages. In the previous lessons, 1 2 00:00:10,140 --> 00:00:15,830 we've actually seen and used three different import statements already. 2 3 00:00:15,840 --> 00:00:26,310 For example, we had "import pandas as pd", we had "import matplotlib.pyplot as plt" and we also had 3 4 00:00:26,430 --> 00:00:32,470 "from sklearn.linear_model import LinearRegression". 4 5 00:00:32,490 --> 00:00:38,760 These are the three import statements that we're going to talk about in detail in this video. Each of 5 6 00:00:38,760 --> 00:00:46,670 these import statements imports a Python module that we can then use in our Jupyter notebook. If you've 6 7 00:00:46,670 --> 00:00:51,000 already got some experience in programming, perhaps in other languages, 7 8 00:00:51,020 --> 00:00:56,450 the easiest way to think about Python modules is as frameworks or as libraries. 8 9 00:00:56,450 --> 00:01:00,300 This is what they're going to be called in other languages. 9 10 00:01:00,370 --> 00:01:09,010 That said, the simplest way to think about a Python module is that every file that ends in ".py" is 10 11 00:01:09,040 --> 00:01:16,630 a Python module and by importing a module we get access to the variables and the functions inside the 11 12 00:01:16,630 --> 00:01:18,880 file that was imported. 12 13 00:01:18,880 --> 00:01:24,960 In other words, importing gives us access to a .py file's contents. 13 14 00:01:25,120 --> 00:01:30,240 Now, so far we actually haven't seen any files with a .py extension. 14 15 00:01:30,400 --> 00:01:34,380 And why should a file end in .py in the first place? 15 16 00:01:35,580 --> 00:01:44,280 Well, a .py file is a file that contains Python code and to see how this all works we're gonna create 16 17 00:01:44,370 --> 00:01:48,420 a file with a .py extension together. 17 18 00:01:48,420 --> 00:01:55,920 To do that we're gonna open a text editor.
In fact, every Windows machine comes with a default text editor 18 19 00:01:55,940 --> 00:02:02,590 pre-installed. That text editor's name is Notepad and it looks like this. 19 20 00:02:02,710 --> 00:02:05,970 Now you're probably not going to have the shortcut on your taskbar, 20 21 00:02:05,970 --> 00:02:14,670 so go to your Start menu and then go to find all your programs and search for "Notepad" and you should 21 22 00:02:14,670 --> 00:02:19,970 also find it there. And remember, not Microsoft Word - Notepad. 22 23 00:02:19,980 --> 00:02:21,340 That's what we're going with. 23 24 00:02:21,360 --> 00:02:25,380 If you have a Windows machine, then open Notepad now. 24 25 00:02:25,400 --> 00:02:31,260 Now on the other hand, if you're on Mac, the default text editor that you've got pre-installed is called 25 26 00:02:31,590 --> 00:02:39,030 TextEdit and you can find this program in your Applications folder. So open this program now, 26 27 00:02:40,150 --> 00:02:42,070 and create a new document. 27 28 00:02:42,400 --> 00:02:49,900 If your new document looks like this then go to "Format" and click on "Make Plain Text". This is what 28 29 00:02:49,900 --> 00:02:53,770 you want your text editor to look like for the next steps. 29 30 00:02:53,800 --> 00:02:55,920 Now of course there are lots of other, 30 31 00:02:55,950 --> 00:03:03,610 and some might say superior, text editors that are available, too. My current favorite is actually called Atom 31 32 00:03:03,880 --> 00:03:12,090 and this is made by the team behind the GitHub website. I've included the link to atom.io in the 32 33 00:03:12,090 --> 00:03:19,590 lesson resources if you want to download it. It's available for both Windows and for Mac and once you 33 34 00:03:19,590 --> 00:03:24,260 start using it you'll never go back to Notepad again, I promise.
34 35 00:03:25,330 --> 00:03:33,130 Now that you've got your text editor open, create a variable called "theAnswer" and set it equal to the 35 36 00:03:33,130 --> 00:03:41,470 number 42, then go to "File" and click "Save As". Now, navigate to your desktop 36 37 00:03:42,940 --> 00:03:51,310 and instead of saving this file with a ".txt" extension, what we're gonna do is we're gonna call it 37 38 00:03:51,850 --> 00:03:58,080 "life" and then append a ".py" extension. 38 39 00:03:58,120 --> 00:04:01,420 This is because we want to save a ".py" file. 39 40 00:04:01,540 --> 00:04:08,950 The important thing on Windows is that you have to change the Save As file type from text documents 40 41 00:04:09,190 --> 00:04:10,220 (.txt) to 41 42 00:04:10,250 --> 00:04:13,840 this other item in the dropdown. 42 43 00:04:13,900 --> 00:04:19,240 You want to change it to "All Files" and only then can you click "Save". 43 44 00:04:19,660 --> 00:04:25,300 After you've done that, check your desktop to make sure that the file that you saved really does read 44 45 00:04:25,600 --> 00:04:31,310 .py at the end and not something else like .py.txt, 45 46 00:04:31,430 --> 00:04:39,380 okay? This is what we're looking for: "life.py". Now, if you're on Mac and you're using TextEdit, we're 46 47 00:04:39,510 --> 00:04:42,200 going to do the very same thing as the Windows crew. 47 48 00:04:42,200 --> 00:04:53,460 We're gonna create a variable called theAnswer and set it equal to 42 and then we're gonna go to "Save". 48 49 00:04:53,600 --> 00:05:02,700 You're also gonna go to desktop and then you should be able to name your file "life" and give it the extension 49 50 00:05:02,740 --> 00:05:04,920 ".py". 50 51 00:05:05,200 --> 00:05:13,330 Now before you click "Save" just double check that it says "Plain Text Encoding: Unicode" down here. 51 52 00:05:13,620 --> 00:05:20,910 If it says something else like Rich Text, you haven't gone to "Format" and made the file plain text.
So 52 53 00:05:21,030 --> 00:05:22,500 after you've verified this, 53 54 00:05:22,500 --> 00:05:31,530 just click "Save" and then you can quit your text editor and verify that there is indeed a life.py 54 55 00:05:31,620 --> 00:05:34,960 file on your desktop. 55 56 00:05:34,960 --> 00:05:39,600 And just like that we've created our first Python script file. 56 57 00:05:39,600 --> 00:05:43,650 Now let's upload this file to our Jupyter notebook. 57 58 00:05:44,100 --> 00:05:54,170 Go to your MLProjects folder in Jupyter and click "Upload", then navigate to your desktop where you've 58 59 00:05:54,170 --> 00:06:01,540 saved your file, choose it and click "Open" and then afterwards confirm your upload here. 59 60 00:06:03,100 --> 00:06:09,010 If you've been successful, you should be able to click on this .py file and verify that it says 60 61 00:06:09,100 --> 00:06:14,920 "theAnswer = 42". Jupyter notebook will open this in a new tab for you. 61 62 00:06:15,700 --> 00:06:21,130 The exciting thing is that now that we've uploaded our Python script as a .py file to our project 62 63 00:06:21,130 --> 00:06:25,270 folder, we can start using it in our Jupyter notebook. 63 64 00:06:25,270 --> 00:06:30,130 Check this out. Back in our Python intro Jupyter notebook, 64 65 00:06:30,130 --> 00:06:34,670 we can write "import life" and hit Shift+Enter. 65 66 00:06:35,110 --> 00:06:39,530 If you see no error messages at this point, the import was successful. 66 67 00:06:39,700 --> 00:06:45,730 Now because I have an irrational love of data types, let's check out what the type of this thing called 67 68 00:06:45,730 --> 00:06:47,750 life actually is. 68 69 00:06:47,760 --> 00:06:54,100 I'm going to write "type", put "life" between the parentheses and hit Shift+Enter. Aha! 69 70 00:06:54,250 --> 00:07:02,530 So here we find out that our life.py file, when we import it, is called a module. 70 71 00:07:02,530 --> 00:07:04,870 Now here comes the really cool part.
71 72 00:07:04,870 --> 00:07:10,540 We're going to access the code inside our life module. Write 72 73 00:07:10,600 --> 00:07:16,250 "life.theAnswer" and hit Shift+Enter. 73 74 00:07:16,330 --> 00:07:23,840 Now remember, what we see now is the value inside our theAnswer variable, shown below the cell. 74 75 00:07:23,860 --> 00:07:25,410 How cool is that? 75 76 00:07:25,420 --> 00:07:27,340 I think this is super cool. 76 77 00:07:27,340 --> 00:07:30,280 It's like our Jupyter notebook has made a new friend. 77 78 00:07:30,640 --> 00:07:35,380 But let's take a second to think about what this line of code actually means. 78 79 00:07:35,380 --> 00:07:39,070 It literally means fetch the value of the name 79 80 00:07:39,190 --> 00:07:44,110 theAnswer that lives inside the module called life. 80 81 00:07:44,260 --> 00:07:46,940 The thing to note here is the Python syntax. 81 82 00:07:47,080 --> 00:07:54,580 We've accessed the variable inside that life.py file using the dot notation. We've accessed 82 83 00:07:54,580 --> 00:07:56,390 theAnswer by writing 83 84 00:07:56,470 --> 00:08:03,600 life.theAnswer and this notation will come up again and again as we progress throughout the course. 84 85 00:08:03,610 --> 00:08:10,420 Okay, so now that we've uploaded our own file and then imported it into our notebook, let's draw 85 86 00:08:10,420 --> 00:08:13,820 a link to the previous lessons. Now, 86 87 00:08:13,840 --> 00:08:15,250 you'll probably remember that to 87 88 00:08:15,250 --> 00:08:16,730 import pandas 88 89 00:08:16,850 --> 00:08:20,140 we didn't have to upload any .py files - we just wrote 89 90 00:08:20,320 --> 00:08:21,970 "import pandas". 90 91 00:08:21,970 --> 00:08:23,770 And then it just worked. 91 92 00:08:23,770 --> 00:08:26,800 How is it that we can actually do that? 92 93 00:08:26,800 --> 00:08:33,500 See, I used to think it was magic but then sadly I discovered that it was not in fact magic.
93 94 00:08:33,550 --> 00:08:41,660 It turns out that when we installed Anaconda, a whole bunch of modules were already included in our installation. 94 95 00:08:42,100 --> 00:08:48,690 And this is why there was no need to upload any pandas.py to our project's folder. 95 96 00:08:48,790 --> 00:08:57,000 The notebook already knew about pandas and we could just write "import pandas". But this should make you suspicious, 96 97 00:08:57,730 --> 00:08:58,040 right? 97 98 00:08:58,050 --> 00:09:02,160 It raises the question - what kind of other stuff does Jupyter already know about? 98 99 00:09:02,940 --> 00:09:09,150 So let's take a look together at what modules we have access to out of the box from installing 99 100 00:09:09,180 --> 00:09:14,010 Anaconda. We can get hold of this information in various places. 100 101 00:09:14,010 --> 00:09:17,930 The first place that we're going to check is the Anaconda Navigator. 101 102 00:09:18,840 --> 00:09:25,830 When you open the Anaconda Navigator, you'll notice that on the left you have this tab called "Environments". 102 103 00:09:25,830 --> 00:09:32,820 And when you click on it, under "root", you get a long, long list of all the modules that you have access 103 104 00:09:32,820 --> 00:09:42,830 to. We can even filter on installed versus not installed and we can also filter by name. 104 105 00:09:42,910 --> 00:09:48,450 So if I type in pandas I can see that there's a checkmark next to it. 105 106 00:09:49,010 --> 00:09:55,490 So pandas is one of the installed modules, which means that we can access it easily by writing an import 106 107 00:09:55,490 --> 00:09:56,750 statement. 107 108 00:09:56,750 --> 00:10:02,030 Now these filters in the Anaconda Navigator are actually super handy because we can quickly figure out 108 109 00:10:02,090 --> 00:10:09,500 if something was installed or not installed.
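The same installed-or-not check can also be made from inside a notebook cell. What follows is only a minimal sketch: the helper name is_installed and the deliberately fake module name are made up for illustration, while importlib.util.find_spec is part of the Python standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    # find_spec returns None when no module with that name can be found.
    return importlib.util.find_spec(module_name) is not None


# "surely_not_a_real_module_123" is a made-up name that should not exist.
for name in ["math", "pandas", "surely_not_a_real_module_123"]:
    status = "installed" if is_installed(name) else "not installed"
    print(name, "->", status)
```

Whether pandas shows up as installed depends on the machine the cell runs on, so no output is shown for it here.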
But if we look closely at this window, you'll notice that 109 110 00:10:09,590 --> 00:10:18,690 it's not referring to the items in this list as modules. Instead, it's referring to them as packages. 110 111 00:10:18,690 --> 00:10:25,740 And it says I've got 195 packages available and this brings me to my next 111 112 00:10:25,740 --> 00:10:26,730 point. 112 113 00:10:26,820 --> 00:10:31,710 When we were writing our import statements and we looked at the type of thing that we imported, it was 113 114 00:10:31,710 --> 00:10:33,240 called a module. 114 115 00:10:33,240 --> 00:10:38,550 However, when people talk about pandas or scikit-learn they'll use another word, they'll use this word 115 116 00:10:38,550 --> 00:10:45,240 package, just like the Anaconda Navigator. Even on the Anaconda website, they talk about Anaconda having 116 117 00:10:45,240 --> 00:10:53,150 1000+ data science packages. Since you and I both installed Anaconda on our own computers, 117 118 00:10:53,150 --> 00:11:01,310 we can actually see what these packages look like, because pandas has to live somewhere on our hard drive 118 119 00:11:01,340 --> 00:11:02,580 after all, right? 119 120 00:11:02,600 --> 00:11:10,820 I mean, don't tell the World Wildlife Fund, but you and I are now gonna go on a hunt for pandas. When we 120 121 00:11:10,820 --> 00:11:13,890 installed Anaconda on our own machine, 121 122 00:11:14,000 --> 00:11:17,230 it would have created a folder somewhere. On a Mac, 122 123 00:11:17,300 --> 00:11:25,750 that folder will be under your home folder, so we can go to "Go" > "Home" and then get taken to our home folder. 123 124 00:11:26,710 --> 00:11:33,550 There we'll have a folder called "anaconda" and then inside the anaconda folder, with all the Anaconda-related 124 125 00:11:33,550 --> 00:11:34,190 stuff, 125 126 00:11:34,300 --> 00:11:38,210 we have a folder called pkgs - pkgs, 126 127 00:11:38,230 --> 00:11:46,490 of course, stands for packages.
On Windows, that Anaconda folder will be hiding elsewhere. 127 128 00:11:46,680 --> 00:11:48,450 So you'll have to go to your local disk. 128 129 00:11:48,450 --> 00:11:55,350 This will probably be your C drive and then you'll have to go to Users then you'll have to go to your 129 130 00:11:55,350 --> 00:11:55,790 account. 130 131 00:11:56,550 --> 00:12:04,690 So mine's called "P.D." and inside there you'll have a folder called Anaconda3. Once inside your Anaconda 131 132 00:12:04,690 --> 00:12:05,340 folder, 132 133 00:12:05,440 --> 00:12:08,460 you're gonna have to venture deeper to find the pandas. 133 134 00:12:08,800 --> 00:12:15,730 You've got to open your lib folder and then you'll have to open the site-packages folder and there you're 134 135 00:12:15,730 --> 00:12:21,460 going to get the list of all the packages that came with Anaconda once you installed it. 135 136 00:12:21,460 --> 00:12:27,390 Now this is exactly the same list that you saw in the Anaconda Navigator. And looking at this list, you'll 136 137 00:12:27,390 --> 00:12:33,310 probably realise that even though the Anaconda team brags about 1000+ packages in Anaconda, we're 137 138 00:12:33,310 --> 00:12:39,680 actually super lucky that by default it doesn't install all of them on the machine from the get go. 138 139 00:12:39,700 --> 00:12:46,520 So now that we've successfully hunted down our pandas folder, let's return to the original question - why 139 140 00:12:46,520 --> 00:12:50,120 is pandas referred to as a package? 140 141 00:12:50,130 --> 00:12:57,610 Well, as you can see when we look inside the pandas directory, pandas is not a single .py file. 141 142 00:12:57,690 --> 00:13:05,320 Instead, pandas is actually a whole bunch of .py files that are grouped together into a single bundle. 142 143 00:13:05,340 --> 00:13:09,670 In other words, a package is just a collection of files. 
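You can also ask Python itself where a package lives, instead of clicking through folders. This is a sketch under one assumption: it uses the json package from the standard library as a stand-in, so that it runs even on a machine without pandas. If pandas is installed, swapping json for pandas should work the same way.

```python
import json
import os

# A package's __file__ points at its __init__.py, so the folder that
# contains it is the package directory on disk.
package_dir = os.path.dirname(json.__file__)

# Collect the .py files sitting directly inside that folder.
py_files = sorted(f for f in os.listdir(package_dir) if f.endswith(".py"))

print(package_dir)
print(py_files)
```

The printed list contains several .py files, which is the "bundle of files" idea in miniature.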
143 144 00:13:09,810 --> 00:13:15,810 And when we refer to pandas as a package, we are referring to this bundle of files that is living on 144 145 00:13:15,810 --> 00:13:23,130 our computer. Because, after all, the sad fact of life is that there just isn't any magic. 145 146 00:13:23,130 --> 00:13:30,220 And, to be honest, we don't actually deeply care about what all of these individual files do exactly. 146 147 00:13:30,390 --> 00:13:32,070 That's all under the hood. 147 148 00:13:32,070 --> 00:13:38,160 But I wanted to show it to you guys nonetheless. At the end of the day all we need is an easy way to 148 149 00:13:38,160 --> 00:13:42,390 refer to all of this code inside this pandas directory. 149 150 00:13:42,600 --> 00:13:49,740 And this brings us full circle to the Jupyter notebook, because our Python code with that one import 150 151 00:13:49,740 --> 00:13:57,750 statement allows us to gain access to all the files in that pandas directory with a single line of code. 151 152 00:13:58,620 --> 00:14:01,080 But this story doesn't end here. 152 153 00:14:01,110 --> 00:14:07,350 The packages that we downloaded as part of Anaconda aren't the only modules that we have access to actually. 153 154 00:14:08,130 --> 00:14:12,570 Some modules are built into the Python language itself. 154 155 00:14:12,570 --> 00:14:16,710 And these built in modules include everything on this list. 155 156 00:14:16,710 --> 00:14:19,600 This list is the so-called standard library. 156 157 00:14:19,830 --> 00:14:22,360 And there are over 200 modules here. 157 158 00:14:22,470 --> 00:14:26,200 So it's a pretty massive list. Scrolling around on here, 158 159 00:14:26,250 --> 00:14:30,340 we see that there's one module called math. 159 160 00:14:30,350 --> 00:14:37,440 Let's take a closer look at this math module. The math module is something that we can use to do all 160 161 00:14:37,440 --> 00:14:39,560 sorts of calculations. 
161 162 00:14:39,690 --> 00:14:47,980 And among other things, this module holds two constants, namely the value of pi and the number e. 162 163 00:14:48,810 --> 00:14:56,640 So, as a challenge, as an exercise, can you import the math module into your Jupyter notebook? 163 164 00:14:56,640 --> 00:15:03,510 And after you've imported the math module, can you write the Python code to print out the value of pi 164 165 00:15:03,900 --> 00:15:05,280 below the cell? 165 166 00:15:05,610 --> 00:15:11,480 And then, after that, write some more code to print out the value of e as well. 166 167 00:15:11,690 --> 00:15:19,130 I'll give you a few seconds to pause the video and figure this out. And here's the solution. 167 168 00:15:19,130 --> 00:15:27,260 Just like we've imported pandas, we simply write "import math". And to show the value of pi below the 168 169 00:15:27,260 --> 00:15:28,630 cell, 169 170 00:15:28,640 --> 00:15:36,830 I can write "math.pi", hit Shift+Enter and there it is. There's the value of pi as per the math module. 170 171 00:15:37,970 --> 00:15:45,230 And looking at the official Python documentation, we can see that we can access the value of e using 171 172 00:15:45,380 --> 00:15:49,320 this code right here. In our Jupyter notebook, 172 173 00:15:49,330 --> 00:15:55,030 this means that I can simply write math.e and hit Shift+Enter to complete the final part of the 173 174 00:15:55,030 --> 00:15:56,110 challenge. 174 175 00:15:56,260 --> 00:15:58,000 So that's pretty cool, right? 175 176 00:15:58,000 --> 00:16:05,320 Using the module and the dot notation we can access any of the variables inside the module through their 176 177 00:16:05,320 --> 00:16:06,490 name. 177 178 00:16:06,490 --> 00:16:13,810 This is exactly the same syntax that we used when we uploaded our own .py file and accessed 178 179 00:16:13,810 --> 00:16:16,450 theAnswer by writing 179 180 00:16:16,450 --> 00:16:23,800 "life.theAnswer".
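Written out as a single cell, the solution to the challenge looks roughly like this. The math.sqrt line is an extra that is not part of the challenge. It is only there to show that functions inside a module are reached with the same dot notation as variables.

```python
import math

print(math.pi)        # 3.141592653589793
print(math.e)         # 2.718281828459045

# Functions inside the module use the same dot notation as variables.
print(math.sqrt(16))  # 4.0
```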
And this goes to show that we can access a lot of code from other modules like the math module 180 181 00:16:24,100 --> 00:16:26,890 using the same dot notation. 181 182 00:16:26,890 --> 00:16:32,620 Now let's take another look at this pandas import statement. This import statement that we wrote 182 183 00:16:32,620 --> 00:16:40,190 with pandas was a little more complex than the one we wrote with "import math" or "import life". The pandas 183 184 00:16:40,220 --> 00:16:49,280 import statement had something a little extra: we wrote "import pandas as pd" and Jupyter is highlighting 184 185 00:16:49,400 --> 00:16:51,450 the "as" in green. 185 186 00:16:51,560 --> 00:16:58,750 So the syntax highlighting is telling us that the "as" in the line is a Python keyword. 186 187 00:16:58,850 --> 00:16:59,950 So what does this 187 188 00:16:59,960 --> 00:17:01,860 "as pd" part do? 188 189 00:17:02,090 --> 00:17:03,800 Why is it in there? 189 190 00:17:03,800 --> 00:17:09,020 If you remember, we actually saw this pattern again with the other import statement that we used further 190 191 00:17:09,020 --> 00:17:14,920 down: "import matplotlib.pyplot as plt". 191 192 00:17:15,140 --> 00:17:23,890 The answer is that all this "as pd", "as plt" code is doing is giving an alias to the import. 192 193 00:17:24,020 --> 00:17:25,880 Why do we need an alias? 193 194 00:17:25,880 --> 00:17:29,570 Well, when you think about it, the module math only has four letters. 194 195 00:17:29,570 --> 00:17:38,030 It's not a lot of typing, it's pretty short, but this module matplotlib.pyplot has 17 characters. 195 196 00:17:39,020 --> 00:17:45,980 And the thing you'll learn pretty quickly about programmers is that programmers are lazy, and a programmer 196 197 00:17:45,980 --> 00:17:50,520 doesn't want to type out 17 characters every time they're referring to their plot. 197 198 00:17:50,630 --> 00:17:55,950 So instead we wrote "import matplotlib.pyplot as plt".
198 199 00:17:56,150 --> 00:18:00,560 This created a variable called "plt" which referenced our 199 200 00:18:00,560 --> 00:18:09,470 matplotlib.pyplot and allowed us to use a short name instead; plt became the alias for 200 201 00:18:09,470 --> 00:18:12,620 matplotlib.pyplot. And, 201 202 00:18:12,710 --> 00:18:16,430 we didn't have to call it plt by the way. We could have called it anything we wanted to. 202 203 00:18:16,460 --> 00:18:17,360 We could have called it 203 204 00:18:17,430 --> 00:18:23,660 Jennifer. But you'll pretty quickly discover that there are certain conventions in Python programming 204 205 00:18:23,660 --> 00:18:26,750 and people like to stick to convention. 205 206 00:18:26,840 --> 00:18:29,880 For example pandas is usually given the alias pd, 206 207 00:18:29,900 --> 00:18:32,920 for some reason, don't ask me why. 207 208 00:18:33,020 --> 00:18:38,630 So now that we've talked a little bit about this as keyword to create an alias let's put it into practice 208 209 00:18:38,660 --> 00:18:41,000 so that it really sinks into memory. 209 210 00:18:41,090 --> 00:18:48,130 We're gonna use this as keyword with our life module because we can give that module an arbitrary alias 210 211 00:18:48,130 --> 00:18:54,850 too. Check this out if we write "import life as hitchhikersGuide" 211 212 00:19:00,480 --> 00:19:01,590 and hit Shift+Enter, 212 213 00:19:02,760 --> 00:19:05,340 then we also have to update the cell below. 213 214 00:19:05,970 --> 00:19:12,060 And the reason is that we can no longer refer to this module as life because, well, we've given it an 214 215 00:19:12,060 --> 00:19:19,470 alias. If we were to re-evaluate this entire notebook by going to "Kernel" and going to "Restart & Run 215 216 00:19:19,470 --> 00:19:28,700 All" and scrolling down we see that we can no longer refer to our module by the name of life. 216 217 00:19:28,820 --> 00:19:32,050 We've given life an alias, so we have to use that alias. 
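Here is a small sketch of that behaviour that runs outside the course notebook. It assumes a fresh Python session in which the name life has not been defined yet, and it writes a one-line life.py into a temporary folder instead of using the file uploaded to the project folder. The variable names folder and error_message are only there for the illustration.

```python
import os
import sys
import tempfile

# Create a throwaway folder holding a one-line life.py, like the lesson's file.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "life.py"), "w") as f:
    f.write("theAnswer = 42\n")
sys.path.insert(0, folder)  # let Python find modules in that folder

import life as hitchhikersGuide

print(hitchhikersGuide.theAnswer)  # 42

# Only the alias was created, so the name "life" itself is unknown here.
error_message = None
try:
    life.theAnswer
except NameError as err:
    error_message = str(err)
print("NameError:", error_message)
```

In a notebook where "import life" was run earlier, the old name lingers until the kernel is restarted, which is why the problem only shows up after "Restart & Run All".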
217 218 00:19:32,060 --> 00:19:41,620 We can no longer use the other name, meaning we have to replace this with hitchhikersGuide. 218 219 00:19:42,100 --> 00:19:45,260 And if I hit Shift+Enter now, this error disappears. 219 220 00:19:45,440 --> 00:19:47,450 The same is true with the cell below. 220 221 00:19:47,660 --> 00:19:52,300 Because our program crashed, these cells were actually never evaluated. 221 222 00:19:52,400 --> 00:19:59,750 If I hit Shift+Enter here again, we get this NameError and we have to replace life.theAnswer 222 223 00:19:59,750 --> 00:20:05,150 with hitchhikersGuide.theAnswer. 223 224 00:20:05,440 --> 00:20:11,880 Now our Python code will run without crashing. So I hope this exercise made it a bit more clear as to 224 225 00:20:11,880 --> 00:20:14,670 how this "as" keyword works in practice. 225 226 00:20:14,670 --> 00:20:21,240 Once we have an alias, we have to use the name of the alias to refer to the module. 226 227 00:20:21,240 --> 00:20:28,240 Now let's take another look at the final import statement from the previous lessons. This import statement 227 228 00:20:28,240 --> 00:20:32,220 read "from sklearn.linear_model 228 229 00:20:32,320 --> 00:20:40,930 import LinearRegression" and the syntax highlighting here is telling us that this "from" is also a Python 229 230 00:20:40,930 --> 00:20:42,020 keyword. 230 231 00:20:42,370 --> 00:20:51,640 So without me showing you this in the video right away, can you, as a challenge, write an import for our life.py 231 232 00:20:51,640 --> 00:20:56,840 module that imports theAnswer using the from keyword? 232 233 00:20:56,950 --> 00:20:58,810 I'll give you a few seconds to pause the video. 233 234 00:21:01,810 --> 00:21:03,330 And here's the solution. 234 235 00:21:03,370 --> 00:21:07,800 The pattern is exactly the same - we would use "from life 235 236 00:21:08,050 --> 00:21:17,770 import theAnswer" and hit Shift+Enter.
Now a very, very reasonable question that you will probably want 236 237 00:21:17,770 --> 00:21:23,350 to ask at this point is "How is this different from the other import statements?" 237 238 00:21:23,350 --> 00:21:26,440 I mean, why are there three ways of importing things? 238 239 00:21:26,440 --> 00:21:29,330 Can't we just keep things simple? 239 240 00:21:29,540 --> 00:21:32,410 And the answer to that is "Well yeah". 240 241 00:21:32,440 --> 00:21:36,460 So the three import statements are all doing very, very similar things. 241 242 00:21:36,640 --> 00:21:42,580 And the reason I'm showing you these three ways of doing things is because people out there in the wild 242 243 00:21:42,880 --> 00:21:48,830 are using all these three ways of writing their own Python code. But there's also a subtle difference 243 244 00:21:48,830 --> 00:21:54,830 with this from keyword and that difference to the other import statements that we've used so far is 244 245 00:21:54,830 --> 00:22:01,530 that in this case we're copying out theAnswer from our life.py file. 245 246 00:22:01,610 --> 00:22:08,300 theAnswer is now its own variable inside our Jupyter notebook, because we've just made a copy of it 246 247 00:22:08,810 --> 00:22:11,230 and this means we can do stuff with it. 247 248 00:22:11,270 --> 00:22:15,750 For example, we can create another variable and set it equal to theAnswer. 248 249 00:22:15,800 --> 00:22:17,390 So if I write "myFavouriteNumber 249 250 00:22:24,870 --> 00:22:30,770 = theAnswer" and then print out myFavouriteNumber 250 251 00:22:35,880 --> 00:22:42,440 and hit Shift+Enter, it prints out 42. But I can also manipulate theAnswer itself. 251 252 00:22:42,480 --> 00:22:51,810 So if I write "theAnswer = theAnswer+1", hit Shift+Enter and then print out the value, 252 253 00:22:58,530 --> 00:23:05,670 we can see that this variable is now updated to 43.
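The steps above can be sketched as one runnable script. As an assumption for the sake of a self-contained example, it writes a one-line life.py into a temporary folder instead of relying on the uploaded file, and it adds a plain "import life" at the end purely to peek at the value still stored in the module.

```python
import os
import sys
import tempfile

# Create a throwaway folder holding a one-line life.py, like the lesson's file.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "life.py"), "w") as f:
    f.write("theAnswer = 42\n")
sys.path.insert(0, folder)  # let Python find modules in that folder

from life import theAnswer  # theAnswer is now a name in this script

myFavouriteNumber = theAnswer
theAnswer = theAnswer + 1   # changes only the name in this script

import life                 # extra step, only to inspect the module itself

print(myFavouriteNumber)    # 42
print(theAnswer)            # 43
print(life.theAnswer)       # 42
```

Strictly speaking, Python does not duplicate the number. It creates a new name in the notebook that points at the same value, and "theAnswer = theAnswer + 1" then points that notebook name at a new value, so for a simple number like 42 the effect is the same as working on a copy.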
The value, by the way, inside the .py file is still 253 254 00:23:05,670 --> 00:23:15,490 unchanged, but the copy that we've made inside this notebook has updated to 43. So, in summary, the from keyword 254 255 00:23:15,760 --> 00:23:22,210 copies out theAnswer from our life module and makes it accessible as a variable inside our current 255 256 00:23:22,210 --> 00:23:23,180 notebook. 256 257 00:23:23,410 --> 00:23:29,020 And this is different from the other import statements where this doesn't happen, because if I scroll 257 258 00:23:29,020 --> 00:23:36,520 up and we look at the cell up here we have to access theAnswer by writing 258 259 00:23:36,520 --> 00:23:37,240 "hitchhikersGuide.theAnswer". 259 260 00:23:37,270 --> 00:23:42,740 So we have to use this dot notation to get hold of our variable. 260 261 00:23:42,790 --> 00:23:44,500 Now let's compare this "from life 261 262 00:23:44,500 --> 00:23:49,570 import theAnswer" and "myFavouriteNumber = theAnswer". 262 263 00:23:49,570 --> 00:23:52,570 Let's put this in context of the code that we've written previously. 263 264 00:23:53,620 --> 00:24:00,340 When we were estimating the movie revenues we had "from sklearn.linear_model import LinearRegression" and then we 264 265 00:24:00,340 --> 00:24:04,960 set regr to LinearRegression. 265 266 00:24:04,960 --> 00:24:11,080 In other words, what we did was we copied an object called LinearRegression over into this notebook 266 267 00:24:11,560 --> 00:24:18,850 and then, because it was too long to type every time, we stored linear regression in a variable with a 267 268 00:24:18,850 --> 00:24:22,300 short name, namely regr. 268 269 00:24:22,300 --> 00:24:25,280 Now I also want to come full circle. 269 270 00:24:25,360 --> 00:24:31,090 This is a really good point to connect the dots between the lesson on data types, 270 271 00:24:31,090 --> 00:24:35,320 the lesson on collections and this lesson on modules.
271 272 00:24:35,500 --> 00:24:41,140 Remember when we check the data type for our regression intercept and when we read the full name of 272 273 00:24:41,200 --> 00:24:42,850 the data frame. 273 274 00:24:42,850 --> 00:24:44,250 Well, guess what? 274 275 00:24:44,260 --> 00:24:50,920 Looking at the full name of the data types, you can actually see the module name where this data type 275 276 00:24:50,920 --> 00:24:52,610 came from. 276 277 00:24:52,620 --> 00:25:00,510 So for example, the data frame came from the pandas module and our ndarray came from a module called 277 278 00:25:00,630 --> 00:25:02,130 numpy. 278 279 00:25:02,130 --> 00:25:08,740 In other words numpy was the module, where this ndarray was defined. 279 280 00:25:08,880 --> 00:25:14,460 Okay, so we're slowly coming to the end of this lesson on modules, packages and imports and we've covered 280 281 00:25:14,490 --> 00:25:16,800 quite a few important concepts here. 281 282 00:25:17,040 --> 00:25:20,170 So let's review them before we move on. 282 283 00:25:20,170 --> 00:25:29,380 For starters we learned that Python scripts live inside files ending in .py and we also saw how 283 284 00:25:29,380 --> 00:25:35,710 we can import existing code as a Python module into our Jupyter notebook projects using these import 284 285 00:25:35,710 --> 00:25:36,680 statements. 285 286 00:25:36,730 --> 00:25:40,970 We saw that there's a couple of different ways we can import a module. 286 287 00:25:41,050 --> 00:25:45,140 The simplest way being just to write "import" and then the name of the module. 287 288 00:25:45,190 --> 00:25:52,470 For example, "import math". The next level up was using the "as" keyword in the import statement. 288 289 00:25:52,470 --> 00:25:56,400 So in this case we import the module but we create an alias - 289 290 00:25:56,400 --> 00:26:03,770 we create a new name to give the module, and we can use this name to refer to it later on in the code. 
290 291 00:26:03,780 --> 00:26:08,310 This is what we did when we wrote "import life as hitchhikersGuide". 291 292 00:26:08,700 --> 00:26:13,710 Every time we wanted to access the code inside one of these modules we had the module name, 292 293 00:26:14,370 --> 00:26:20,310 then the dot, and then whatever we wanted to access inside the module. 293 294 00:26:20,310 --> 00:26:27,820 In this case we had hitchhikersGuide.theAnswer and we also had math.pi. 294 295 00:26:28,710 --> 00:26:37,170 Finally, we saw how we can use the from keyword to copy out a specific piece of code from another module. 295 296 00:26:37,170 --> 00:26:39,810 And this is what we did with "from life 296 297 00:26:39,810 --> 00:26:49,140 import theAnswer" and this is also what we did previously with "from sklearn.linear_model import LinearRegression". 297 298 00:26:49,290 --> 00:26:53,950 So at this point you might be asking yourself - why does any of this matter? 298 299 00:26:53,970 --> 00:26:55,290 Why is this a big deal? 299 300 00:26:56,570 --> 00:27:01,940 And I suspect that you're probably already starting to see the power of these import statements and 300 301 00:27:01,940 --> 00:27:09,530 how they're a massive game changer. Because by importing a module we can use other people's code 301 302 00:27:09,650 --> 00:27:13,030 and this saves us the work of having to write the code ourselves. 302 303 00:27:14,160 --> 00:27:19,560 In the previous lessons we accessed the linear regression capability through a module called 303 304 00:27:19,590 --> 00:27:20,610 "scikit-learn". 304 305 00:27:20,910 --> 00:27:25,410 We didn't actually have to write the code that implements the specifics of linear regression. 305 306 00:27:25,440 --> 00:27:28,890 All we did was import the functionality and use it.
306 307 00:27:29,750 --> 00:27:35,720 The people who created scikit-learn essentially created a reusable software component - their linear 307 308 00:27:35,720 --> 00:27:41,090 regression code can be imported and used by everyone around the world for free. 308 309 00:27:41,150 --> 00:27:43,970 So hats off to the authors of scikit-learn. 309 310 00:27:44,090 --> 00:27:49,760 These guys and girls worked their butts off to create the code that we are using every time we import 310 311 00:27:49,840 --> 00:27:54,910 scikit-learn. When we imported our life.py file, 311 312 00:27:54,940 --> 00:27:57,590 well, we only had a single line of code in it. 312 313 00:27:57,910 --> 00:28:05,740 In contrast, check out the code that we imported when we wrote "from sklearn.linear_model import LinearRegression". 313 314 00:28:05,740 --> 00:28:12,370 This code is actually open source so you can look at it and read it on github.com or you can actually 314 315 00:28:12,430 --> 00:28:18,120 open the .py file itself from the scikit-learn package on your computer. 315 316 00:28:18,460 --> 00:28:25,300 Now this file alone has about 500 lines of blood, sweat and tears and that's just a single .py file 316 317 00:28:25,390 --> 00:28:28,270 inside the sklearn package. 317 318 00:28:28,270 --> 00:28:34,620 In other words, there are loads and loads more files just like this that are part of scikit-learn. I guess 318 319 00:28:34,620 --> 00:28:38,850 the reason that I've climbed onto this soapbox is that I want to get this idea across that we can do 319 320 00:28:38,910 --> 00:28:46,470 all this cool stuff so effortlessly because we're standing on the shoulders of giants. And on that note, 320 321 00:28:46,800 --> 00:28:53,070 let's set the stage for analyzing the effects of drugs on math test scores and let's import our plotting 321 322 00:28:53,070 --> 00:29:02,650 module matplotlib and our linear regression to draw our best fit line.
Let's write "import matplotlib 322 323 00:29:02,650 --> 00:29:19,330 .pyplot as plt" and "from sklearn.linear_model import LinearRegression" and 323 324 00:29:19,330 --> 00:29:21,190 then hit Shift+Enter. 324 325 00:29:21,370 --> 00:29:28,710 No need to import pandas by the way we've already imported it as part of a previous lesson. So I'll see 325 326 00:29:28,710 --> 00:29:31,760 you in the next tutorial. Upwards and onwards!