0 1 00:00:00,510 --> 00:00:04,400 In the next couple of lessons I want to show you two things. 1 2 00:00:04,440 --> 00:00:11,520 First I want to show you how to install third party modules and packages that don't come bundled with 2 3 00:00:11,520 --> 00:00:14,370 the Anaconda distribution that we're running. 3 4 00:00:14,610 --> 00:00:20,640 But more importantly, I want to show you a data visualization technique for giving prominence to words 4 5 00:00:20,730 --> 00:00:29,980 that appear more frequently in a piece of text, namely the word cloud. A word cloud or a tag cloud as 5 6 00:00:29,980 --> 00:00:33,420 it is also called looks like this. 6 7 00:00:33,520 --> 00:00:39,070 The idea behind a word cloud is that the words that are larger appear more frequently than those that 7 8 00:00:39,070 --> 00:00:40,740 are smaller. Here, 8 9 00:00:40,900 --> 00:00:46,240 I've used the Wikipedia article on machine learning to generate this particular word cloud. 9 10 00:00:46,240 --> 00:00:49,690 Notice how the larger tags catch your attention. 10 11 00:00:49,810 --> 00:00:56,530 You'll probably find that you're not actually reading all the words in this visualization at all. 11 12 00:00:56,530 --> 00:01:00,850 Instead people just tend to scan these word clouds. 12 13 00:01:00,850 --> 00:01:03,420 So, why use this technique? 13 14 00:01:03,460 --> 00:01:07,390 It's not scientific and it's not accurate. 14 15 00:01:07,540 --> 00:01:16,090 True, but more often than not, you need to grab people's attention and a word cloud can do this rather 15 16 00:01:16,090 --> 00:01:16,840 well. 16 17 00:01:17,080 --> 00:01:19,810 It's a little bit of a novel visualization. 17 18 00:01:19,810 --> 00:01:26,820 It's a bit creative and it communicates the relative importance of the words in the dataset very quickly. 18 19 00:01:27,220 --> 00:01:34,360 After all it's pretty easy to see that a word that's twice the size of another word is probably twice 19 20 00:01:34,360 --> 00:01:35,360 as important. 20 21 00:01:35,560 --> 00:01:39,610 And in our case it appears twice as often. 21 22 00:01:39,700 --> 00:01:45,670 Speaking of grabbing people's attention, you can get even more creative with the word cloud by applying 22 23 00:01:45,760 --> 00:01:49,110 a mask to the image. Here, 23 24 00:01:49,240 --> 00:01:54,360 I've used a circle as a mask, but you could use another image instead. 24 25 00:01:54,400 --> 00:02:01,540 For example, would you like to venture a guess as to what data I've used to create this word cloud in 25 26 00:02:01,540 --> 00:02:02,980 the shape of a whale? 26 27 00:02:03,340 --> 00:02:12,010 I actually used Herman Melville's novel, Moby Dick, and here I've used the data from our spam emails, and 27 28 00:02:12,010 --> 00:02:15,440 here I've used Shakespeare's play Hamlet. 28 29 00:02:15,460 --> 00:02:20,650 Coming up, I'll be showing you how to create all of these three visualizations. 29 30 00:02:20,650 --> 00:02:24,980 That way you can apply this technique to your future projects. 30 31 00:02:25,000 --> 00:02:31,360 I've seen a lot of people use word clouds on reports and presentations that say contain customer feedback 31 32 00:02:31,720 --> 00:02:38,620 or metadata like tags or SEO keywords that will help you rank on Google or really any situation where 32 33 00:02:38,620 --> 00:02:44,560 you quickly need to show how people feel about your product or your business. 33 34 00:02:44,560 --> 00:02:49,080 These are the kind of applications where word clouds really shine. 34 35 00:02:49,100 --> 00:02:49,500 All right. 35 36 00:02:49,510 --> 00:02:55,890 So how do we create a word cloud in Python? Because we're not gonna do it from scratch. 36 37 00:02:55,930 --> 00:03:00,230 We will use an existing Python package. 37 38 00:03:00,230 --> 00:03:06,790 Now the fantastic thing about the Anaconda distribution is that it comes bundled with lots and lots 38 39 00:03:06,790 --> 00:03:07,940 of different packages 39 40 00:03:07,960 --> 00:03:11,900 straight out of the box. These included numpy, pandas, 40 41 00:03:11,960 --> 00:03:15,310 matplotlib and many others that we've already used. 41 42 00:03:15,670 --> 00:03:22,450 However, the Python ecosystem is actually huge and you'll often find that you'll want to make use of 42 43 00:03:22,450 --> 00:03:26,390 a package that's not installed by default. 43 44 00:03:26,460 --> 00:03:34,140 In that case, you'll need to know how to download and install a custom package to add to your collection. 44 45 00:03:34,500 --> 00:03:41,310 For that, you'll be using the terminal if you're running a Mac or the Anaconda prompt on Windows. 45 46 00:03:41,310 --> 00:03:46,410 Both of these kind of look like this and this is where you can execute many different types of commands. 46 47 00:03:47,070 --> 00:03:52,760 The command that we'll be looking at in detail is called "conda install" 47 48 00:03:52,780 --> 00:04:00,480 and the easiest way to install a package is just to type "conda install" followed by the package name. 48 49 00:04:00,480 --> 00:04:08,810 However, this won't always work and it actually won't work in the case of our word cloud package. 49 50 00:04:08,970 --> 00:04:16,020 And the reason is is that we also have to specify something called a channel. A channel is the location 50 51 00:04:16,320 --> 00:04:19,630 where conda will look for the package. 51 52 00:04:20,040 --> 00:04:25,920 If you just write "conda install" and then the package name, then conda will search in a default location 52 53 00:04:26,190 --> 00:04:29,370 which may or may not contain the package that you're looking for. 53 54 00:04:29,730 --> 00:04:36,090 If you need to specify a particular channel or you know that the package exists, then all you have to 54 55 00:04:36,090 --> 00:04:45,260 do is follow "conda install" with "-c" and then the channel name. This syntax, the "-c" works just 55 56 00:04:45,260 --> 00:04:48,190 like you're supplying an argument to a Python function. 56 57 00:04:49,270 --> 00:04:57,610 In our case, the wordcloud package is located in a channel called "conda-forge" and therefore the code 57 58 00:04:57,820 --> 00:05:06,220 that we will be typing into our prompt will read "conda install -c conda-forge wordcloud". 58 59 00:05:06,250 --> 00:05:12,170 Let me quickly show you how to do this on Mac and then I'll show you what you'll see on a Windows machine. 59 60 00:05:12,960 --> 00:05:17,220 To see the existing packages that are already installed 60 61 00:05:17,220 --> 00:05:25,210 fire up your Anaconda navigator and then click on Environments. There you can search for the existing 61 62 00:05:25,210 --> 00:05:28,760 packages that are already installed on your computer. 62 63 00:05:28,900 --> 00:05:34,450 The package that we're looking for is called "wordcloud", all in one word. 63 64 00:05:34,450 --> 00:05:40,130 Now since I can't find it, I'm going to open my terminal. There 64 65 00:05:40,210 --> 00:05:48,460 I'm going to type "conda install wordcloud", but just as we've discussed earlier for the wordcloud package 65 66 00:05:48,640 --> 00:05:56,320 we actually need to specify a channel, we need to specify a location because it's not found in the default 66 67 00:05:56,320 --> 00:05:59,230 location where Conda is looking. 67 68 00:05:59,230 --> 00:06:07,750 Instead we need to type "conda install -c conda-forge wordcloud" and if you don't make 68 69 00:06:07,780 --> 00:06:13,990 any typos, then you should see your computer starting to do some work. Prior to going ahead with the installation 69 70 00:06:14,350 --> 00:06:21,190 you may have to confirm that you actually want to do this and then you have to type "y" for yes to proceed. 70 71 00:06:21,790 --> 00:06:24,540 At this point a download will start 71 72 00:06:24,790 --> 00:06:29,860 and once the download is finished, you'll see something like "Preparing transaction", "Verifying transaction", 72 73 00:06:30,250 --> 00:06:31,870 "Executing transaction". 73 74 00:06:31,870 --> 00:06:33,130 *Drum roll* 74 75 00:06:33,130 --> 00:06:37,230 And then you're good to go. Once you see these messages 75 76 00:06:37,230 --> 00:06:45,170 I encourage you to fire up Finder, go to the Anaconda folder, then go to the packages folder and then 76 77 00:06:45,170 --> 00:06:48,240 find the wordcloud subfolder there. 77 78 00:06:48,260 --> 00:06:53,720 This is the package that you've just installed. Back in the Anaconda Navigator, 78 79 00:06:53,810 --> 00:06:58,430 you can now type in "wordcloud" into the search bar and it should pop up. 79 80 00:06:58,490 --> 00:07:04,370 Mind you, I actually had to restart my Anaconda Navigator for it to pick this up. And that pretty much 80 81 00:07:04,370 --> 00:07:07,570 concludes the setup for Mac. On Windows, 81 82 00:07:07,580 --> 00:07:14,420 the story is pretty much the same. If you fire up the Anaconda Navigator and search for "wordcloud" it 82 83 00:07:14,420 --> 00:07:18,440 doesn't appear. To bring up the Anaconda prompt, 83 84 00:07:18,560 --> 00:07:24,240 all you need to do is click on that little place symbol and then say "Open Terminal". 84 85 00:07:24,340 --> 00:07:31,870 This will fire up the Anaconda prompt where you can type in "conda install -c conda-forge 85 86 00:07:32,950 --> 00:07:34,340 wordcloud". 86 87 00:07:34,530 --> 00:07:40,900 Now depending on the speed of your machine, this might take a little while, but you can't go away and grab 87 88 00:07:40,900 --> 00:07:48,280 a coffee just yet, because once again you need to type in "y" to confirm and proceed. After you've done 88 89 00:07:48,280 --> 00:07:50,230 that a download will start 89 90 00:07:50,460 --> 00:07:56,470 and I'm actually fast forwarding my video here because this took a little bit longer on this machine, 90 91 00:07:57,040 --> 00:08:02,740 but at the end you'll see the very same messages that we saw on Mac - "Preparing transaction", "Verifying 91 92 00:08:02,740 --> 00:08:05,290 transaction", "Executing transaction". 92 93 00:08:05,290 --> 00:08:06,060 *Drum roll* 93 94 00:08:06,700 --> 00:08:07,810 And that's it. 94 95 00:08:08,620 --> 00:08:15,160 If you'd like to see where these packages were actually installed on your computer, go to your user directory, 95 96 00:08:15,160 --> 00:08:17,240 so mine's called "P.D" 96 97 00:08:17,470 --> 00:08:20,880 under Users, but yours will have your own name of course. 97 98 00:08:21,040 --> 00:08:22,510 Then open Anaconda 98 99 00:08:22,510 --> 00:08:27,490 3, then go to the packages folder and there, 99 100 00:08:27,600 --> 00:08:32,850 if you scroll further down you should see the wordcloud package along with the others that start with 100 101 00:08:32,850 --> 00:08:36,000 "w". Back in the Anaconda Navigator, 101 102 00:08:36,000 --> 00:08:40,260 you won't necessarily find the wordcloud package listed already, 102 103 00:08:40,260 --> 00:08:46,290 you might have to actually quit and restart this thing, because only then does it seem to pick it up. 103 104 00:08:46,560 --> 00:08:51,390 Now you're pretty much all set to install any package you come across in the future. 104 105 00:08:51,690 --> 00:08:59,430 All you need to do after all is type "conda install", then the channel where the package is located and 105 106 00:08:59,430 --> 00:09:04,380 then the package name and that's it. In the next lesson, 106 107 00:09:04,380 --> 00:09:08,550 it's time to write some Python code and actually create these word clouds. 107 108 00:09:08,550 --> 00:09:11,180 So I hope you're looking forward to that as much as I am. 108 109 00:09:11,190 --> 00:09:12,320 I'll see you there.