1 00:00:05,020 --> 00:00:13,480 Searching with XPath. Hello, everyone. When dealing with XML, in order to be able to check 2 00:00:13,480 --> 00:00:22,930 every element of an XML file, we need a query tool called XPath, which searches every 3 00:00:22,930 --> 00:00:26,290 element inside the XML file that we load. 4 00:00:26,320 --> 00:00:30,910 So let me first open PyCharm, because I just closed it. 5 00:00:31,210 --> 00:00:37,370 But actually the first thing to do is to work a little bit in our terminal. 6 00:00:37,540 --> 00:00:44,800 So I will go back from the folder I am in and go to Section four. 7 00:00:45,190 --> 00:00:47,110 So let's type python here. 8 00:00:47,950 --> 00:00:53,850 So we go into Python mode, and let's do some operations with XPath. 9 00:00:54,020 --> 00:01:02,080 So first let's recall what we already did last time, so we have something to work with in this 10 00:01:02,080 --> 00:01:12,280 video. Let's now write import requests, then the response, of course, which I will actually copy because 11 00:01:12,280 --> 00:01:15,230 we already did that in the last lecture. 12 00:01:15,760 --> 00:01:20,320 So here I'm going to paste the response from this website here. 13 00:01:21,670 --> 00:01:27,490 So the information from the XML file is going to be stored in response. 14 00:01:27,850 --> 00:01:35,830 OK, so here the root variable will contain the XML tree like we did last time. 15 00:01:35,830 --> 00:01:44,370 And let me just paste these variables here so we have everything from the website set up for this project. 16 00:01:44,380 --> 00:01:50,200 If you completed the other video and you didn't exit from Python mode, you will actually be able 17 00:01:50,800 --> 00:01:53,470 to simply continue from where we finished. 18 00:01:54,230 --> 00:02:00,560 But since I had to exit this session overnight, I'm just retyping the commands.
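A minimal sketch of the setup recalled above, under stated assumptions. The lecture downloads a live page with requests and parses it with lxml; so that this runs offline, the sketch swaps in the standard library's xml.etree.ElementTree and a small hand-written page. SAMPLE_PAGE and its contents are hypothetical, not the real website, and ElementTree only accepts well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for response.content; the real page is not reproduced here.
SAMPLE_PAGE = """<html>
  <head><title>Sample page</title></head>
  <body>
    <div id="header"><h1>Sample heading</h1></div>
    <div id="content"><p>First paragraph</p></div>
  </body>
</html>"""

# In the lecture this step is done with lxml: root = HTML(response.content).
# Here ElementTree builds the same kind of element tree from the sample string.
root = ET.fromstring(SAMPLE_PAGE)

# The root of the tree is the html element; its direct children are head and body.
root_tag = root.tag
child_tags = [child.tag for child in root]
print(root_tag)
print(child_tags)
```

The later examples in this transcript reuse the same idea: a tiny page parsed into root, then queried.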
19 00:02:00,560 --> 00:02:01,390 Are we ready? 20 00:02:01,690 --> 00:02:07,060 OK, so we're ready here, and let me do now root dot xpath, 21 00:02:07,480 --> 00:02:16,060 and in the brackets a string, body. This is the body of the website, and you can see that we get 22 00:02:16,060 --> 00:02:19,770 Element body at, and some address. 23 00:02:20,080 --> 00:02:27,730 Usually this is how Python, or actually any language, shows the addresses of the elements of different 24 00:02:27,730 --> 00:02:28,310 websites. 25 00:02:28,690 --> 00:02:36,790 So this represents the simplest XPath expression, and it simply searches for the 26 00:02:36,880 --> 00:02:40,850 children of the current element with a specific tag name. 27 00:02:41,290 --> 00:02:48,090 So here the current element that we are calling it on is the root element of the website. 28 00:02:48,700 --> 00:02:58,540 And so because the root element is the top-level html element, for that reason, 29 00:02:59,410 --> 00:03:03,250 we're actually returning the body of the website. 30 00:03:03,770 --> 00:03:08,980 OK, so as you know, the XML file is a tree. 31 00:03:09,010 --> 00:03:13,990 So, for example, if I would like to see some of the nested elements, I can do the following. 32 00:03:14,200 --> 00:03:23,620 I can write root dot xpath and then body slash div. 33 00:03:25,710 --> 00:03:32,730 That's it, and you can see that we are getting a few elements this time, and this is because we are 34 00:03:32,730 --> 00:03:37,680 getting all the div children of the body. 35 00:03:37,910 --> 00:03:46,140 OK, so this shows that an XPath expression can actually match multiple elements, not only one, 36 00:03:46,470 --> 00:03:47,590 which is quite useful. 37 00:03:48,420 --> 00:03:59,130 Now let's try another thing. So I will write root dot xpath, and then I will write h1, for heading one. 38 00:03:59,310 --> 00:04:00,690 And actually, you know what?
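The two path expressions just described, body and body slash div, can be sketched as follows. This is an illustration under assumptions: the lecture calls lxml's root.xpath(...), while this sketch swaps in findall from the standard library's xml.etree.ElementTree, which accepts a limited XPath subset. PAGE is a hypothetical sample, not the real website.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page with two div children under body.
PAGE = (
    "<html><head><title>Sample</title></head>"
    "<body><div id='header'><h1>Heading</h1></div>"
    "<div id='content'><p>Text</p></div></body></html>"
)
root = ET.fromstring(PAGE)

# "body": children of the current element (the root) whose tag is body.
bodies = root.findall("body")

# "body/div": one step further down, every div that is a child of body.
divs = root.findall("body/div")
div_ids = [div.get("id") for div in divs]

# Printing an element shows its tag and a memory address, which is the
# "Element body at ..." output mentioned in the lecture.
print(bodies)
print(div_ids)
```

A path can match several elements at once, which is why the result is always a list.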
39 00:04:00,700 --> 00:04:05,760 Let me open the actual website so you can see what I'm talking about. 40 00:04:05,760 --> 00:04:08,500 Maybe it will be a little bit more intuitive for you. 41 00:04:08,940 --> 00:04:14,910 So here's the website, and I will open the inspector here so you can actually see the code in front of 42 00:04:14,910 --> 00:04:15,110 you. 43 00:04:15,720 --> 00:04:20,330 And let me minimize that window, because the code is what we are interested in. 44 00:04:20,640 --> 00:04:28,830 So until now we were at the body, at the top of this, and after that we can actually see the actual 45 00:04:28,830 --> 00:04:29,190 header. 46 00:04:29,190 --> 00:04:35,360 So if we double-click here, you can see that we have our header, and so on. 47 00:04:35,730 --> 00:04:41,160 So let's see now the header, and h1, which will be heading one. 48 00:04:41,340 --> 00:04:46,770 And you can see that the header also has a different tag, a different element that is referred to when 49 00:04:46,770 --> 00:04:47,820 we're accessing the web page. 50 00:04:48,150 --> 00:04:50,820 OK, so let me also show you one more thing. 51 00:04:51,120 --> 00:05:00,270 You can also apply additional conditions to the elements, and you can do this with root dot xpath. 52 00:05:00,570 --> 00:05:08,810 And then in the brackets you can write div, and then you can apply the condition with at id. 53 00:05:09,510 --> 00:05:12,110 OK, and you can see this right here. 54 00:05:12,180 --> 00:05:19,030 So at id equals, in double quotes, content. 55 00:05:19,440 --> 00:05:22,650 OK, and then let's close all the brackets. 56 00:05:23,960 --> 00:05:24,740 That's right. 57 00:05:25,930 --> 00:05:30,970 So if I run it, you can see another element, and this is specifically the element with the id content. 58 00:05:31,180 --> 00:05:33,390 OK, so this is its tag. 59 00:05:33,580 --> 00:05:41,380 And I know that seeing just this random-looking number is not very intuitive for any human.
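The condition on the id attribute described above can be sketched like this. It is an illustration, assuming a hypothetical sample page and using the standard library's xml.etree.ElementTree in place of the lxml xpath call from the lecture. ElementTree needs a leading dot for a search through all descendants.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; only the id values matter for this example.
PAGE = (
    "<html><head><title>Sample</title></head>"
    "<body><div id='header'><h1>Heading</h1></div>"
    "<div id='content'><p>Text</p></div></body></html>"
)
root = ET.fromstring(PAGE)

# [@id='content'] keeps only the div elements whose id attribute equals content.
# With lxml the same query would usually be written //div[@id="content"].
matches = root.findall(".//div[@id='content']")
match_ids = [element.get("id") for element in matches]

# A bare tag name after .// finds that tag at any depth, for example the heading.
headings = [h1.text for h1 in root.findall(".//h1")]

print(match_ids)
print(headings)
```

The printed element itself would only show a tag and an address; reading attributes or text is what makes the result readable.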
60 00:05:42,820 --> 00:05:50,410 But before jumping into extracting actual information, let's see a little bit more of the possibilities 61 00:05:50,530 --> 00:05:59,730 that you have with XPath to get the addresses of specific tags in the website. 62 00:06:00,160 --> 00:06:08,410 So I will do now root dot xpath, and here I will show that you can specify the tag name. 63 00:06:08,770 --> 00:06:17,530 So if I do this, h1, OK, you can see that we're specifying the h1 tag specifically for the 64 00:06:17,530 --> 00:06:18,190 first div. 65 00:06:18,530 --> 00:06:21,280 And you can also do another thing. 66 00:06:21,280 --> 00:06:31,690 For example, if you want to specify a particular div in the body, you can simply do body slash div, 67 00:06:32,640 --> 00:06:43,440 two. OK, here we got an error because I added two parentheses, so now we can actually see the element 68 00:06:43,440 --> 00:06:46,150 of the first div in the body. Actually, 69 00:06:46,170 --> 00:06:47,450 this is the second one. 70 00:06:47,680 --> 00:06:48,210 My bad. 71 00:06:48,660 --> 00:06:52,630 But, yeah, this is how you actually use these expressions here. 72 00:06:52,650 --> 00:06:58,610 And actually, I'm going to open up another terminal because I don't want to lose the progress here. 73 00:06:58,890 --> 00:07:02,580 So let me actually do one thing. 74 00:07:02,790 --> 00:07:06,270 I will press Command-D here. 75 00:07:06,780 --> 00:07:13,140 And you can see that here we are getting another terminal, or actually you can do New Window, 76 00:07:13,140 --> 00:07:14,520 because this actually doesn't work. 77 00:07:14,730 --> 00:07:16,380 I will simply do New Window. 78 00:07:16,680 --> 00:07:19,590 And you can see that we are getting a brand new window here. 79 00:07:19,950 --> 00:07:28,220 And from this window, I've actually located the folder that we created in PyCharm, 80 00:07:28,230 --> 00:07:29,540 which is Section four.
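Selecting one particular div by position, as in body slash div followed by an index of two, can be sketched as below. The sample page is hypothetical, and the standard library's xml.etree.ElementTree stands in for the lxml call used in the lecture. The point to notice is that XPath positions start at 1, while the Python list that comes back is indexed from 0.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page with two div children under body.
PAGE = (
    "<html><head><title>Sample</title></head>"
    "<body><div id='header'><h1>Heading</h1></div>"
    "<div id='content'><p>Text</p></div></body></html>"
)
root = ET.fromstring(PAGE)

# XPath positions are 1-based: div[1] is the first div child of body.
first_div = root.findall("body/div[1]")[0]

# div[2] is the second div child, which is what index two selects.
second_div = root.findall("body/div[2]")[0]

first_id = first_div.get("id")
second_id = second_div.get("id")
print(first_id, second_id)
```

The trailing [0] is ordinary Python indexing on the result list, not part of the XPath expression.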
81 00:07:29,550 --> 00:07:38,130 And then I will show you some additional files that you can create in order to work with the XPath toolkit. 82 00:07:38,550 --> 00:07:47,610 So let's go here, and now I will zoom in slightly so you can actually see, and I will run some commands 83 00:07:47,610 --> 00:07:49,410 in order to go to the right folder. 84 00:07:49,420 --> 00:07:50,940 You can also set your own folder. 85 00:07:51,420 --> 00:07:53,710 OK, so once you're in the right section, nice. 86 00:07:53,730 --> 00:07:59,370 Let's go back here to the code and let's create another file, which will be called get version. 87 00:07:59,400 --> 00:08:04,920 In this file, we're actually going to get the version from the website using the XPath toolkit. 88 00:08:05,220 --> 00:08:09,570 So Python file, and I will type get 89 00:08:10,580 --> 00:08:12,580 version, OK? 90 00:08:12,740 --> 00:08:31,400 And let's here now import re, and then let's import requests, and then let's write from lxml dot etree 91 00:08:33,080 --> 00:08:47,960 import HTML, OK, and then response will be equal to the response that we get. Here 92 00:08:47,960 --> 00:08:53,160 we're going to get the response of the website, and I will simply copy and paste it here. 93 00:08:53,480 --> 00:08:54,930 So this is our website. 94 00:08:54,950 --> 00:09:01,010 You can simply pause the video and just retype what we did already, but I wrote actually the same 95 00:09:01,010 --> 00:09:05,750 line, if you don't remember, in our terminal when we were in Python mode. 96 00:09:06,020 --> 00:09:13,460 So let's do now root equals HTML, and in the brackets let's write root. 97 00:09:14,750 --> 00:09:19,540 Sorry, let's actually write response dot content. 98 00:09:19,820 --> 00:09:23,540 OK, and then let's write title, 99 00:09:24,940 --> 00:09:30,310 underscore, text, which will be equal to root dot find.
100 00:09:31,600 --> 00:09:38,910 OK, and here we are going to look for head, because usually the information about the website is stored 101 00:09:38,920 --> 00:09:40,120 there, and then I will 102 00:09:40,120 --> 00:09:40,470 write 103 00:09:40,480 --> 00:09:41,050 find. 104 00:09:41,300 --> 00:09:44,310 So we are going to use the find function. 105 00:09:44,920 --> 00:09:45,880 So find, 106 00:09:47,250 --> 00:09:56,310 title, and just add dot text. OK, so this is how you write this in Python, and then let's write re 107 00:09:57,450 --> 00:10:01,830 dot search, and then let's write here 108 00:10:02,780 --> 00:10:11,510 u201c, dot star, u201d, comma, 109 00:10:12,890 --> 00:10:14,630 title, text. 110 00:10:14,870 --> 00:10:17,100 OK, so it is the same variable that we defined. 111 00:10:18,140 --> 00:10:27,160 OK, so in that case, let's do release equals re dot search. 112 00:10:27,560 --> 00:10:34,370 And here we can write again u201c, dot star. 113 00:10:34,790 --> 00:10:39,590 OK, slash u201d. 114 00:10:41,240 --> 00:10:43,940 OK, comma, 115 00:10:45,090 --> 00:10:52,320 title, underscore, text, dot group. 116 00:10:52,470 --> 00:10:59,050 OK, so when this matches, we just take the captured group, which gives us one element. 117 00:10:59,070 --> 00:11:01,530 So let's do now p text 118 00:11:03,630 --> 00:11:13,910 equals root dot xpath, and here is the first time we're going to use the xpath function in this file, and here 119 00:11:13,920 --> 00:11:15,790 I will do div. 120 00:11:17,900 --> 00:11:26,690 Actually, let's do double slash div, and here I will do the query for the content, so I will 121 00:11:26,690 --> 00:11:26,890 do 122 00:11:26,900 --> 00:11:29,420 at id, of course, equals 123 00:11:31,320 --> 00:11:43,270 content, then let's close the bracket, slash p one, that's it, then zero, dot text. 124 00:11:43,540 --> 00:11:45,720 OK, and then let's do the version.
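The title lookup and the regular expression dictated above can be sketched as follows. The codes u201c and u201d are the left and right curly double quotation marks. The sample title is hypothetical, and placing dot star inside parentheses is an assumption made so that group one has something to return. The standard library's xml.etree.ElementTree stands in for lxml here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page whose title wraps a code name in curly double quotes.
PAGE = (
    "<html><head><title>Sample \u201cbuster\u201d Release Information</title></head>"
    "<body></body></html>"
)
root = ET.fromstring(PAGE)

# find returns the first matching child, so this walks html -> head -> title.
title_text = root.find("head").find("title").text

# \u201c and \u201d are the opening and closing curly quotes.
# (.*) captures whatever sits between them.
match = re.search("\u201c(.*)\u201d", title_text)

# group(1) is the text captured by the first pair of parentheses.
release = match.group(1)
print(release)
```

If the title contained no curly quotes, re.search would return None, so a real script may want to check the match before calling group.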
125 00:11:45,720 --> 00:11:55,350 So version equals p underscore text dot split, one. 126 00:11:55,980 --> 00:12:03,420 And then let's write the print: code name, and then let's write backslash n and version. 127 00:12:03,870 --> 00:12:11,530 OK, and here the last thing I would do is simply to format it, so dot format, and then I will write 128 00:12:11,530 --> 00:12:15,720 release, comma, version. 129 00:12:17,460 --> 00:12:18,120 So. 130 00:12:19,640 --> 00:12:22,100 Here's how we're going to show the release and the version. 131 00:12:23,150 --> 00:12:31,280 After all, the first will be the code name, OK, and this will be the release, and then version, which 132 00:12:31,280 --> 00:12:33,270 will be the actual version. 133 00:12:33,740 --> 00:12:41,070 So just a space here, guys, because I found that I skipped it, and basically that was it. 134 00:12:41,540 --> 00:12:42,470 This is the code. 135 00:12:42,470 --> 00:12:48,260 So let's save it here, and let's go up there, and I will just write here 136 00:12:48,290 --> 00:12:57,110 hash, exclamation mark, slash usr slash bin slash env, 137 00:12:58,510 --> 00:13:03,220 python three, so it knows which interpreter to use. 138 00:13:03,790 --> 00:13:10,270 OK, so save that and let's go to our terminal, and here we are already in the folder. 139 00:13:10,270 --> 00:13:14,970 And if you type ls, you can see that you have the get version dot py file. 140 00:13:15,280 --> 00:13:23,790 So let's write python get version dot py, and OK, here we have an error. 141 00:13:23,800 --> 00:13:28,780 But this is not a big deal, because now we're professionals and we can fix it. 142 00:13:28,960 --> 00:13:30,460 So let's see what it is. 143 00:13:30,730 --> 00:13:40,450 So yes, here I made a mistake. So, you know, we imported the requests package, and instead I actually 144 00:13:40,450 --> 00:13:47,590 wrote here responses, and it should be requests, OK, requests dot get. 145 00:13:47,770 --> 00:13:48,590 That's correct.
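Putting the dictated steps together, the script might look roughly like the sketch below. This is a reconstruction under assumptions, not the lecturer's exact file. SAMPLE_PAGE and the helper name extract_release_info are hypothetical, and the standard library's xml.etree.ElementTree replaces the requests download and lxml parsing so that the example runs offline. In the lecture the page comes from requests.get on the website address followed by HTML(response.content).

```python
#!/usr/bin/env python3
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded page (response.content in the lecture).
SAMPLE_PAGE = (
    "<html><head><title>Sample \u201cbuster\u201d Release Information</title></head>"
    "<body><div id='content'><h1>Release notes</h1>"
    "<p>Sample 10 was released.</p><p>More text.</p></div></body></html>"
)


def extract_release_info(root):
    """Return (release, version) from a parsed page. Hypothetical helper."""
    # The code name sits between curly double quotes in the page title.
    title_text = root.find("head").find("title").text
    release = re.search("\u201c(.*)\u201d", title_text).group(1)

    # First paragraph inside the div with id content; its second word is the version.
    p_text = root.findall(".//div[@id='content']/p[1]")[0].text
    version = p_text.split()[1]
    return release, version


root = ET.fromstring(SAMPLE_PAGE)
release, version = extract_release_info(root)
print("Codename: {}\nVersion: {}".format(release, version))
```

Taking the second word of the paragraph only works while the page keeps that wording, so this kind of extraction is fragile by design.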
146 00:13:49,000 --> 00:13:57,580 So let's run the code now, and you can see that once we run it, the code name is 147 00:13:57,580 --> 00:14:00,080 Buster and the version is 10. 148 00:14:00,460 --> 00:14:06,060 So we extracted the correct information from the website, and this is the way that you can do it. 149 00:14:06,070 --> 00:14:10,220 Actually, I don't think the lines are too long for getting this information. 150 00:14:10,360 --> 00:14:18,460 So what we did here is use a regular expression and XPath in order to pull out the code name and the version and 151 00:14:18,460 --> 00:14:21,480 split them into different fields. 152 00:14:21,970 --> 00:14:29,380 So there are also other uses of XPath, and you can use it to get images and links from a 153 00:14:29,380 --> 00:14:30,010 web page. 154 00:14:30,340 --> 00:14:34,710 But about that we're going to talk tomorrow, because this video is already too long. 155 00:14:35,800 --> 00:14:37,560 So thanks very much for watching, guys. 156 00:14:37,570 --> 00:14:44,890 And join me in the next video, where we're going to use some advanced methods in order to get images 157 00:14:44,890 --> 00:14:48,250 and pages from the web using XPath. 158 00:14:48,590 --> 00:14:49,510 Thanks for watching.