1 00:00:04,790 --> 00:00:06,550 Beautiful Soup software. 2 00:00:06,850 --> 00:00:07,600 Hello, everyone. 3 00:00:07,630 --> 00:00:13,120 Today, we're going to talk about the Beautiful Soup package and what it means. 4 00:00:13,120 --> 00:00:18,210 So I will show you quite a lot of examples so you can practice with this specific package. 5 00:00:18,580 --> 00:00:21,850 But let's first talk about what this package is used for. 6 00:00:22,330 --> 00:00:30,940 So Beautiful Soup is a very easy and simplified way to search data within an HTML file. 7 00:00:31,480 --> 00:00:35,710 You can use many different types of criteria in order to do that. 8 00:00:35,860 --> 00:00:38,830 And as you can see, I have put some of them in front of you. 9 00:00:39,130 --> 00:00:47,920 So you can use the structure called DOM to search the XML file elements, you can search for 10 00:00:47,920 --> 00:00:50,680 selectors and you can search through tags. 11 00:00:50,830 --> 00:00:56,570 And actually, if you don't understand what these elements are, later on I will create different 12 00:00:56,570 --> 00:00:59,830 lectures for each of these elements and you will be able to practice them. 13 00:01:00,130 --> 00:01:02,800 And this is not everything about Beautiful Soup. 14 00:01:03,100 --> 00:01:09,380 With this tool, you're actually able to investigate a lot of files just by using Python. 15 00:01:09,400 --> 00:01:16,470 So obviously, since this is a Python course, Beautiful Soup is a Python tool. 16 00:01:16,900 --> 00:01:26,110 So this means you can search within XML, HTML and JSON files just by using Python from the terminal 17 00:01:26,110 --> 00:01:31,330 or by writing a script and implementing it in a Python file. 18 00:01:31,480 --> 00:01:33,730 So let me make something clear here. 19 00:01:34,130 --> 00:01:39,400 Beautiful Soup is not intended for web scraping.
20 00:01:39,580 --> 00:01:49,480 It is actually a tool that is specifically for providing an interface to access, in a very simple way, 21 00:01:49,480 --> 00:01:54,020 any web page in HTML or other files. 22 00:01:54,280 --> 00:02:02,170 So with this tool, you're basically preparing your file for the scraping, which happens after that. 23 00:02:02,170 --> 00:02:04,090 And we're going to talk about this as well. 24 00:02:04,480 --> 00:02:12,670 So the main features of Beautiful Soup are to extract information from HTML documents. 25 00:02:13,180 --> 00:02:18,220 It also enables you to parse HTML, but also XML files. 26 00:02:18,550 --> 00:02:25,710 And it has a very interesting feature which allows you to structure your pages or documents into a tree. 27 00:02:25,720 --> 00:02:29,540 And we're also going to look at this one as well. 28 00:02:30,010 --> 00:02:39,520 And finally, with the information that Beautiful Soup extracted, you can actually operate on it very easily 29 00:02:40,240 --> 00:02:46,100 through the advanced search for links and tags in that file using Beautiful Soup. 30 00:02:46,570 --> 00:02:53,560 So guys, usually in order to install specific features from Beautiful Soup, you can go 31 00:02:53,560 --> 00:02:55,480 to this web page here that you can see. 32 00:02:56,230 --> 00:03:00,590 However, when dealing with Python, obviously you can simply install it from the terminal. 33 00:03:01,000 --> 00:03:04,080 So for that reason, let me open a terminal. 34 00:03:04,270 --> 00:03:10,030 And as you can see, I have saved the terminal from the last lectures, which is quite nice. 35 00:03:10,250 --> 00:03:14,940 And here I'm in the folder called Section four, because now this is Section four. 36 00:03:15,250 --> 00:03:19,840 So here's how we can install Beautiful Soup, and I will type pip 37 00:03:20,820 --> 00:03:26,860 install beautifulsoup4.
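Note: one way to confirm that the install step above worked is to check, from Python, whether the bs4 module can be found. This is only a minimal sketch; the helper name is_installed is hypothetical and does not come from the lecture.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located (hypothetical helper)."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("bs4"):
    import bs4

    # bs4 exposes its version string as bs4.__version__
    print("Beautiful Soup version:", bs4.__version__)
else:
    print("bs4 not found; run: pip install beautifulsoup4")
```

The package is installed under the name beautifulsoup4 but imported under the name bs4, which is why the check looks for bs4.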
38 00:03:27,660 --> 00:03:32,010 OK, so once I do that, as you can see, it's already installed here. 39 00:03:32,010 --> 00:03:38,430 So it will not install again on my computer, but once you do it, it will actually start installing on your 40 00:03:38,430 --> 00:03:38,890 machine. 41 00:03:39,600 --> 00:03:41,420 So this is the way that you do it from the 42 00:03:41,430 --> 00:03:44,550 terminal, way simpler than the one you would get from the website. 43 00:03:44,820 --> 00:03:50,660 So in order to start operating with Beautiful Soup, let's go to Python mode. 44 00:03:50,920 --> 00:03:58,920 OK, and if you remember from the last video, we added some commands in order to create and operate 45 00:03:58,920 --> 00:04:01,850 with the HTML file of a website. 46 00:04:02,070 --> 00:04:04,950 So let's do the same thing here. 47 00:04:04,950 --> 00:04:07,850 And I will just copy and paste what we already have. 48 00:04:08,460 --> 00:04:17,130 So import resources, sorry, import requests, my bad, and then we'll get the request from this website 49 00:04:17,130 --> 00:04:20,100 here that you should already have on your computer. 50 00:04:20,580 --> 00:04:29,880 And then we'll again import the lxml library to import html and get the root of the HTML. 51 00:04:30,180 --> 00:04:37,730 So after you perform those commands, now we can actually import Beautiful Soup into our code. 52 00:04:37,890 --> 00:04:44,310 So I will do from bs4 import BeautifulSoup. 53 00:04:44,570 --> 00:04:49,010 OK, so here is how you import this package into your Python file. 54 00:04:49,350 --> 00:04:53,130 So when I hit Enter, you can see that there are no errors and we can continue. 55 00:04:53,400 --> 00:04:59,100 So after that, let's create a variable, and this variable, of course, will be of type 56 00:04:59,100 --> 00:04:59,480 BeautifulSoup. 57 00:04:59,490 --> 00:05:07,130 So I write BeautifulSoup, and in here you need to add the parameters of the website.
58 00:05:07,500 --> 00:05:16,290 So as you can see here, the request to that website is stored in the variable response. 59 00:05:16,590 --> 00:05:23,740 So inside the brackets, let's write response dot content. 60 00:05:24,500 --> 00:05:28,610 OK, so here's how we refer to the content of the website. 61 00:05:28,890 --> 00:05:37,520 And after that, let's write a comma, and the document will be opened with the parser lxml. 62 00:05:37,740 --> 00:05:42,350 So you can see how now we use lxml in Beautiful Soup. 63 00:05:42,360 --> 00:05:46,450 And this is what I want for this course, everything to be gradual. 64 00:05:46,550 --> 00:05:51,270 So if you don't know how lxml works, you will not be able to use Beautiful Soup. 65 00:05:51,510 --> 00:05:56,300 And this will continue in that way when we learn about scraping later on. 66 00:05:56,310 --> 00:06:00,770 So if I hit Enter here, OK, we can see that we get an error. 67 00:06:01,080 --> 00:06:03,860 So I think that was because of a spelling mistake. 68 00:06:04,110 --> 00:06:12,780 So let's write response dot content, and here, lxml, and this should be in parentheses, guys. 69 00:06:13,110 --> 00:06:20,700 Sorry for the quite a few errors, but I just came back from work, so my brain needs to change to the 70 00:06:20,700 --> 00:06:23,430 mode for making courses. Anyways. 71 00:06:24,240 --> 00:06:31,620 So here's how you create an instance of BeautifulSoup for the website that 72 00:06:31,620 --> 00:06:34,440 we requested, which is debian.org. 73 00:06:34,800 --> 00:06:39,060 And here's how we created an instance of the BeautifulSoup class. 74 00:06:39,600 --> 00:06:46,860 So now in this file we have all the information needed in order to navigate into our 75 00:06:46,860 --> 00:06:50,580 document and to access each level of the website. 76 00:06:51,000 --> 00:06:55,530 So here is how, guys, you can basically set up Beautiful Soup.
77 00:06:55,540 --> 00:06:58,140 So thanks for watching. 78 00:06:58,160 --> 00:07:05,350 In the next video, we'll continue with this project, with the BS document that I showed you. 79 00:07:05,550 --> 00:07:08,460 So today we learned how to obtain the document. 80 00:07:08,460 --> 00:07:14,490 But tomorrow we're going to start from here and I will show some pretty cool features of the 81 00:07:14,490 --> 00:07:15,930 Beautiful Soup package. 82 00:07:16,290 --> 00:07:18,450 So thank you very much for watching. 83 00:07:18,450 --> 00:07:20,130 And I'll see you in the next video.
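Note: the setup walked through in this lecture can be sketched roughly as follows. This is a minimal sketch under some assumptions, not the exact code typed in the video: it parses a small inline HTML sample (SAMPLE_HTML, invented here) instead of fetching a live page, so it runs offline, and it uses Python's built-in "html.parser" so that lxml is not required. The helper name make_soup is hypothetical. The commented lines show what the networked version with requests and the lxml parser would plausibly look like; the exact URL there is an assumption.

```python
from bs4 import BeautifulSoup

# Stand-in for response.content (bytes), so the sketch runs without a network call.
SAMPLE_HTML = (
    b"<html><head><title>Sample page</title></head>"
    b'<body><a href="https://www.debian.org/">Debian</a></body></html>'
)


def make_soup(markup, parser="html.parser"):
    """Build a BeautifulSoup object from markup using the named parser."""
    return BeautifulSoup(markup, parser)


soup = make_soup(SAMPLE_HTML)

# Networked variant (assumed URL; needs the requests and lxml packages):
#   import requests
#   response = requests.get("https://www.debian.org")
#   soup = BeautifulSoup(response.content, "lxml")

print(soup.title.string)  # text of the <title> tag
print(soup.a["href"])     # href attribute of the first <a> tag
```

The second argument names the parser and is passed as a string, which matches the spelling fix made in the lecture when the first attempt raised an error.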