[00:00:05] Extracting links and handling exceptions.

[00:00:09] Hi, guys. Today we're going to talk about how to extract links and deal with exceptions. We covered this topic last time, but today we're going to extend it, because I want to show you some very nice features.

[00:00:23] So let's close this. Here in Section 4, without losing any time, I'll create a new file: a Python file called extract_news.py. Let's start writing it, and I'll run it afterwards. First comes the shebang line, #!/usr/bin/env python3.

[00:00:53] Next, we're going to import the libraries that make it possible to extract the news from the Internet completely automatically: import requests, and then from bs4 import BeautifulSoup.

[00:01:24] Now let's create a function. This task will be done with functions, and the first function will be called get_front_page, so I write def get_front_page():. Inside it I write target = "https://news.ycombinator.com".
[00:02:05] This is the website we'll extract the news from. Then I write frontpage = requests.get(target). frontpage is a variable holding the response we get when we request the URL, our target. Let me fix this spelling mistake here.

[00:02:27] Now let's add an if statement: if not frontpage, raise RuntimeError("Can't access the news"). A requests response counts as false when the server answers with an error status code (400 or above), so this is where we stop if the page can't be fetched.

[00:02:58] Then let's write news_soup = BeautifulSoup(frontpage.text, "lxml"). This creates an object of the BeautifulSoup class from the text of the front page; "lxml" is the parser. Finally, the function returns news_soup.

[00:03:42] So this function doesn't take any parameters, but it returns a BeautifulSoup object, news_soup. In the next function we're going to use what this function returns, namely that BeautifulSoup object.

[00:04:02] Now I'll create another function, called find_links, which takes soup as its parameter. Inside it I write items = soup.find_all("td", {"class": "title"}). The second argument is a dictionary, so there's a colon between "class" and "title".

[00:04:53] Next I write links = [], an empty list, and then for i in items:.

[00:05:09] Here again we're going to handle exceptions, so inside a try block I write siblings = list(i.next_siblings). Then post_id = siblings[1].find("a")["id"]: first I look for the a tag and take its id. Then link = siblings[2].find("a")["href"]: this time I look for something different, so in the square brackets I write "href", because we want the href attribute of that element.

[00:06:18] For the title I write title = siblings[2].text, since that element holds the text. Then links.append({"link": link, "title": title, "post_id": post_id}), closing the parenthesis that I opened before the first curly bracket. So this is a list with the links.
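To see what find_links does without requesting the live site, here is a minimal sketch that runs the same sibling navigation on a small, hand-written HTML snippet. The markup (SAMPLE_ROW) is an assumption, loosely modelled on one Hacker News story row with a rank cell, a vote cell and a title cell; the real page may differ and changes over time. It uses Python's built-in "html.parser" instead of "lxml" so no extra parser is needed, but it still assumes bs4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for a single story row (an assumption, not
# the real page). The space between the first two cells becomes a text node,
# which is why the vote cell ends up at siblings[1] and the title cell at siblings[2].
SAMPLE_ROW = """
<table><tr class="athing" id="101">
<td class="title"><span class="rank">1.</span></td> <td class="votelinks"><a id="up_101" href="vote?id=101"></a></td><td class="title"><a href="https://example.com/story">Example story</a></td>
</tr></table>
"""


def find_links(soup):
    links = []
    # Matches both the rank cell and the title cell, since both have class="title".
    items = soup.find_all("td", {"class": "title"})
    for i in items:
        try:
            siblings = list(i.next_siblings)
            post_id = siblings[1].find("a")["id"]
            link = siblings[2].find("a")["href"]
            title = siblings[2].text
            links.append({"link": link, "title": title, "post_id": post_id})
        except (IndexError, KeyError, TypeError, AttributeError):
            # The title cell has no cells after it, so siblings[1] raises
            # IndexError and that cell is skipped; only the rank cell yields an entry.
            continue
    return links


soup = BeautifulSoup(SAMPLE_ROW, "html.parser")
print(find_links(soup))
```

In this sample the post ID comes out as "up_101", the id of the vote link in the assumed markup. The sketch also catches only the lookup errors this navigation can raise rather than every Exception, so a genuine bug such as a misspelled variable name (a NameError) is not silently swallowed.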
[00:07:13] As you saw earlier, we created this links list, and now, for i in items, we search through those items and append what we find to the links list: the link, the title, and the post ID. As you can see, the dictionary keys have exactly the same names as the variables.

[00:07:52] Then we close the try block with except Exception as e: pass, and after the loop we return links.

[00:08:08] So, guys, now that we've written the two functions, let's write the main part of the script, the last piece of code for today. Here I write if __name__ == "__main__":. This condition is true when the file is run directly rather than imported from another module. Inside it I write soup = get_front_page(); as you remember, that's the function we wrote earlier. And then results = find_links(soup): we pass in soup, which is the output of the previous function, as I told you.
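The if __name__ == "__main__" check is what keeps the scraping code from running when another script merely imports extract_news.py. A minimal standard-library sketch of that behaviour: a tiny stand-in source (SOURCE, hypothetical) is executed twice with exec, once with __name__ set to "__main__", as Python does for the script you run, and once with a module name, as happens on import.

```python
# A tiny stand-in for extract_news.py: the guarded block only records that it ran.
SOURCE = '''
ran_main = False

if __name__ == "__main__":
    ran_main = True
'''


def run_with_name(module_name):
    """Execute SOURCE with the given __name__ and report whether the guard fired."""
    namespace = {"__name__": module_name}
    exec(SOURCE, namespace)
    return namespace["ran_main"]


print(run_with_name("__main__"))      # True: as when running python extract_news.py
print(run_with_name("extract_news"))  # False: as when another file does import extract_news
```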
[00:09:07] So the BeautifulSoup object that comes out of the first function is passed on to the next function, and that gives us our results.

[00:09:23] Finally, because we'd like to print something on the screen, I write for r in results:, then if r is not None:, so r has to have a value, and then print(r["link"] + " " + r["title"]), with the whole expression inside the parentheses. And we're done with the main part, guys.

[00:10:07] Let's go to the terminal and run our code. First of all, let's save the file. Now we run python extract_news.py and see what happens.

[00:10:31] OK, here we have an error. Let's see what's wrong in find_links. Yes, here we need a colon. Let's save it and go back to the terminal.

[00:10:46] OK, now we don't get any errors, and you can see that we've got all the links here, with all the news, and quite a lot of them.
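Because find_links only ever appends dictionaries, the if r is not None check never filters anything out here; it is a harmless safeguard. A minimal sketch of the same printing step, using an f-string instead of string concatenation and run on hypothetical sample data shaped like the output of find_links (the helper format_results is illustrative, not part of the original script):

```python
def format_results(results):
    """Build one 'link title' line per result, skipping any None entries."""
    lines = []
    for r in results:
        if r is not None:
            # Same text as r["link"] + " " + r["title"], written as an f-string.
            lines.append(f"{r['link']} {r['title']}")
    return lines


# Hypothetical data shaped like the dictionaries find_links() returns.
sample = [
    {"link": "https://example.com/a", "title": "First story", "post_id": "up_1"},
    None,
    {"link": "https://example.com/b", "title": "Second story", "post_id": "up_2"},
]

for line in format_results(sample):
    print(line)
```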
[00:10:58] Let me copy the link of one of the news items, open it, and see what's there. What you get for each item is the link and also the title: you can see titles about online search, hardware accelerators, and so on. These are the titles of the pages.

[00:11:23] If I open one, you can see the article itself, and its title is the same one we printed.

[00:11:33] So that's how you can access a news page and collect all of its stories for your own purposes, just by using Python and Beautiful Soup.

[00:11:44] That said, guys, this is the end of this video, but continue with the section, because I have quite a few more interesting videos for you. We're going to write a lot of code and really enjoy using Python for networking on the web. That's it. Thank you very much for watching, and I'll see you in the next video.
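One caveat when copying the printed links: on Hacker News, text posts such as Ask HN point to a relative address of the form item?id=..., which won't open on its own. A minimal standard-library sketch that resolves each link against the front-page URL with urllib.parse.urljoin; the helper absolutize and the sample data are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://news.ycombinator.com"  # the target used in get_front_page()


def absolutize(results, base=BASE_URL):
    """Return copies of the result dicts with every link made absolute."""
    fixed = []
    for r in results:
        item = dict(r)  # copy so the caller's list is left unchanged
        # urljoin leaves absolute URLs alone and resolves relative ones against base.
        item["link"] = urljoin(base, item["link"])
        fixed.append(item)
    return fixed


# Hypothetical results: one external story and one relative self-post link.
sample = [
    {"link": "https://example.com/story", "title": "External story", "post_id": "up_1"},
    {"link": "item?id=123", "title": "Ask HN: hypothetical question", "post_id": "up_123"},
]

for r in absolutize(sample):
    print(r["link"])
# https://example.com/story
# https://news.ycombinator.com/item?id=123
```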