1 00:00:05,010 --> 00:00:08,780 Content types and extracting URLs. 2 00:00:09,960 --> 00:00:10,800 Hello, everyone. 3 00:00:11,310 --> 00:00:17,130 Today, we're going to talk about content types. Let's first explain what a content type is, 4 00:00:17,370 --> 00:00:25,470 and then we're going to run some tests so you can see the topic in practice. 5 00:00:25,920 --> 00:00:34,560 So, as HTTP can transport basically any type of data, we usually use the Content-Type 6 00:00:34,560 --> 00:00:43,410 header to carry information about the type of the file that we're obtaining 7 00:00:43,410 --> 00:00:44,160 from the Internet. 8 00:00:44,700 --> 00:00:51,390 Having that information about the data helps us to know how to deal with it. 9 00:00:51,720 --> 00:00:58,670 This is because you're not going to deal with a picture in the same way 10 00:00:58,680 --> 00:01:01,070 as you would deal with a text file, 11 00:01:01,080 --> 00:01:01,410 right? 12 00:01:01,860 --> 00:01:07,800 So for that reason, it is very important to have the content type and to know how to obtain 13 00:01:07,800 --> 00:01:08,000 it. 14 00:01:08,310 --> 00:01:13,290 Let's open the terminal and I will show you some examples of that. 15 00:01:14,490 --> 00:01:24,780 Now, if I type python and enter the Python interpreter, we can write response equals 16 00:01:24,780 --> 00:01:27,360 urlopen. 17 00:01:30,000 --> 00:01:37,950 And then I will write https, or http, YouTube. 18 00:01:41,550 --> 00:01:51,690 YouTube dot com. Let me change the parentheses here to just a single one, and let's actually import 19 00:01:51,870 --> 00:01:53,220 urllib. 20 00:01:55,440 --> 00:01:56,370 Actually, no. 21 00:01:58,570 --> 00:02:02,380 Let's do from urllib 22 00:02:05,890 --> 00:02:06,790 import 
23 00:02:09,000 --> 00:02:15,530 urlopen; actually, from urllib dot request. 24 00:02:17,420 --> 00:02:18,260 Request. 25 00:02:18,320 --> 00:02:18,870 That's it. 26 00:02:19,520 --> 00:02:21,740 And also, let's import 27 00:02:24,230 --> 00:02:27,170 urllib. 28 00:02:31,580 --> 00:02:32,130 That's it. 29 00:02:32,570 --> 00:02:35,810 So once you have both of those imports, let's repeat the... 30 00:02:38,730 --> 00:02:44,730 Let's repeat the response equals urlopen with YouTube dot com, and you can see that now the request 31 00:02:44,730 --> 00:02:45,400 is accepted. 32 00:02:45,720 --> 00:02:47,490 So let's do response. 33 00:02:51,130 --> 00:02:56,590 response dot getheader here, 34 00:02:59,120 --> 00:03:04,770 and here I would like to pass Content-Type. 35 00:03:05,540 --> 00:03:06,090 OK. 36 00:03:09,230 --> 00:03:13,670 So if I do it, you can see that the content type is text here. 37 00:03:14,780 --> 00:03:15,210 Right. 38 00:03:15,560 --> 00:03:23,190 So here I can see what the content type is of the page that I access at YouTube dot com. 39 00:03:23,510 --> 00:03:25,150 So we can see that it is text. 40 00:03:25,520 --> 00:03:26,720 Obviously you can 41 00:03:28,180 --> 00:03:35,530 examine your own files as well. So, since you already know how to do that, let's talk a little bit about links. 42 00:03:35,830 --> 00:03:43,300 You know that every website has many pages, which are referred to by links, and sometimes maybe you would like 43 00:03:43,300 --> 00:03:45,440 to see all those links, right? 44 00:03:46,060 --> 00:03:52,840 Well, in that case, we can use the urllib library in order to see or extract some of those links. 45 00:03:53,110 --> 00:03:56,380 And obviously, this is part of the standard Python packages. 46 00:03:56,620 --> 00:04:04,240 So you can get more information on this web page here, the HTML parser. 
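[Editor's sketch] The content-type check from the interactive session above can be sketched roughly as follows. The helper name content_type_of and the data: URL are assumptions made for illustration; the data: URL stands in for http://youtube.com so that the example runs without network access. The sketch reads the header through response.headers rather than the call typed in the video.

```python
from urllib.request import urlopen


def content_type_of(url):
    """Open the URL and return (raw Content-Type header, bare media type)."""
    with urlopen(url) as response:
        # response.headers behaves like an email.message.Message,
        # so header lookup is case-insensitive.
        headers = response.headers
        return headers["Content-Type"], headers.get_content_type()


# Assumption: a data: URL replaces the real site so no network is needed.
raw_header, media_type = content_type_of("data:text/plain;charset=utf-8,hello")
print(raw_header)
print(media_type)
```

Passing "http://youtube.com" instead, as in the video, requires network access, and the value returned depends on what the server sends.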
47 00:04:04,600 --> 00:04:11,350 So if you're interested, simply go to this page at docs dot python dot org and HTML parser. 48 00:04:11,590 --> 00:04:17,250 OK, because this is what we're going to use in order to extract links. 49 00:04:17,560 --> 00:04:24,000 And here you will see many functions, even functions that are not covered in these videos, 50 00:04:24,010 --> 00:04:25,460 if you feel the need to use them. 51 00:04:25,810 --> 00:04:30,240 But now let's go to the Python directory here. 52 00:04:32,150 --> 00:04:39,860 And let's create another file. The file will be called extract links, so I'll name the file 53 00:04:41,730 --> 00:04:42,330 extract 54 00:04:43,560 --> 00:04:45,540 underscore links. 55 00:04:45,930 --> 00:04:50,490 OK, so let's create a new Python file called extract links. I will save that. 56 00:04:51,660 --> 00:05:01,290 And let's write now. Actually, let's get a closer look. Then I will write usr, 57 00:05:02,350 --> 00:05:14,020 slash, bin, slash [inaudible] python3, and then from html 58 00:05:15,210 --> 00:05:24,110 dot parser. Here we're using the parser module and importing the HTMLParser class. That's 59 00:05:24,120 --> 00:05:24,320 it. 60 00:05:24,330 --> 00:05:26,930 And then I'll import, 61 00:05:27,780 --> 00:05:33,930 obviously, urllib dot request. 62 00:05:34,620 --> 00:05:39,180 And now let's create a class. The class will be called My 63 00:05:40,840 --> 00:05:45,610 Parser, and I pass HTMLParser. 64 00:05:45,640 --> 00:05:53,080 OK, so from the class HTMLParser, we're basically going to create the class MyParser. 65 00:05:53,440 --> 00:05:55,510 OK then. 66 00:05:55,510 --> 00:05:56,020 Let's write 67 00:05:56,500 --> 00:06:03,790 def handle underscore starttag. OK. 68 00:06:04,420 --> 00:06:07,630 And here you need to define a few parameters. 
69 00:06:07,870 --> 00:06:12,700 self, obviously, for the current instance passed to the class. 70 00:06:12,790 --> 00:06:13,820 A colon here. 71 00:06:14,740 --> 00:06:16,450 So self for the current instance 72 00:06:16,450 --> 00:06:19,120 passed to the class, then tag, and then 73 00:06:20,600 --> 00:06:22,810 attrs, for attributes. 74 00:06:23,330 --> 00:06:24,860 Then let's do if 75 00:06:27,140 --> 00:06:27,710 tag 76 00:06:29,340 --> 00:06:31,400 equals a. 77 00:06:34,080 --> 00:06:44,400 So in that case, what we would like to do is a for statement: for a in attrs, if 78 00:06:46,630 --> 00:06:56,650 a zero is equal to h, r, e, f, then let's do link 79 00:06:58,200 --> 00:07:00,510 equals a 80 00:07:01,990 --> 00:07:04,300 one, actually, here, OK? 81 00:07:04,760 --> 00:07:06,450 The item at index one of a. 82 00:07:09,070 --> 00:07:15,680 And then if link dot find 83 00:07:18,860 --> 00:07:20,150 http. 84 00:07:21,440 --> 00:07:26,120 So if find of http on the link is higher than or equal to zero, 85 00:07:30,380 --> 00:07:38,540 we would actually like to display it, right? So we would like to print the link. In that way, we're actually 86 00:07:38,540 --> 00:07:44,060 going to print the HTTP links that are 87 00:07:45,300 --> 00:07:48,270 related to the web page that we access. So let's do new 88 00:07:50,110 --> 00:07:50,710 Parser 89 00:07:54,420 --> 00:07:55,260 equals 90 00:07:56,900 --> 00:07:59,060 MyParser, OK? 91 00:08:00,650 --> 00:08:05,630 And then I will do newParser dot feed 92 00:08:07,300 --> 00:08:07,800 link. 93 00:08:08,050 --> 00:08:15,190 OK, so this is going to be a recursive function because we call the function inside itself until we get 94 00:08:15,190 --> 00:08:19,360 to the final result or until the links are finished. 95 00:08:19,570 --> 00:08:25,690 So here, url equals http, 96 00:08:29,070 --> 00:08:31,630 YouTube dot com. 
97 00:08:33,730 --> 00:08:35,980 OK, then request. 98 00:08:38,980 --> 00:08:44,590 We'll do request equals urllib dot request 99 00:08:46,440 --> 00:08:55,800 dot urlopen, and I will pass the URL that we defined in the program. 100 00:08:55,980 --> 00:08:59,430 OK, then let's do parser 101 00:09:01,900 --> 00:09:04,000 equals My 102 00:09:06,940 --> 00:09:07,370 Sorry. 103 00:09:07,400 --> 00:09:08,340 My, My, 104 00:09:08,480 --> 00:09:09,160 Parser. 105 00:09:11,720 --> 00:09:12,490 OK. 106 00:09:15,360 --> 00:09:21,930 And then parser dot feed. 107 00:09:23,700 --> 00:09:31,890 We pass the request. As you can see here, we created the parser object from the class MyParser, and then 108 00:09:31,890 --> 00:09:34,140 I will do request dot read, 109 00:09:37,510 --> 00:09:38,020 dot 110 00:09:41,050 --> 00:09:41,670 decode, 111 00:09:43,290 --> 00:09:46,020 and then u 112 00:09:47,380 --> 00:09:49,740 t f dash eight. 113 00:09:51,910 --> 00:09:52,840 And that's it. 114 00:09:53,290 --> 00:09:53,610 Done. 115 00:09:55,590 --> 00:10:05,540 So let's save that and go back to the terminal. So I will go to the terminal here and I will 116 00:10:05,550 --> 00:10:06,870 exit the Python interpreter. 117 00:10:07,200 --> 00:10:10,620 OK, so here are the same resources. 118 00:10:10,620 --> 00:10:16,050 If we write ls, you can see the file extract, actually misspelled, 119 00:10:16,050 --> 00:10:18,090 but anyways, extract links. 120 00:10:19,410 --> 00:10:20,970 So I will do python 121 00:10:24,720 --> 00:10:25,920 extract links. 122 00:10:26,850 --> 00:10:27,680 Let's run it. 123 00:10:28,880 --> 00:10:32,810 And here you can see all the different pages that are related to YouTube. 124 00:10:33,150 --> 00:10:43,460 So you can see here the page about, you see the page about slash press, about slash copyright, creators. 125 00:10:43,480 --> 00:10:48,930 And then we also have developers dot google dot com slash YouTube. 
126 00:10:49,320 --> 00:10:53,500 So you can see that this page is also related, since YouTube is owned by Google. 127 00:10:54,060 --> 00:11:00,110 Then you have the policies, Google, YouTube about policies, and so on. 128 00:11:00,120 --> 00:11:02,970 And finally, you have this weird-looking page. 129 00:11:03,190 --> 00:11:06,520 So let's put it in the browser and see what's inside there. 130 00:11:07,020 --> 00:11:08,610 So let's add the page. 131 00:11:10,210 --> 00:11:11,050 And 132 00:11:12,140 --> 00:11:15,470 this is what this weird-looking page looks like. 133 00:11:15,980 --> 00:11:22,910 Apparently, this is one of the main pages of YouTube that I have never seen before, but you can see that we 134 00:11:22,910 --> 00:11:29,470 can actually find and access every linked page that YouTube owns and displays to the users. 135 00:11:30,110 --> 00:11:34,610 So I hope, guys, this lecture was informative for you. 136 00:11:34,920 --> 00:11:39,500 And if that was the case, please continue watching the rest of the videos because they are also 137 00:11:39,500 --> 00:11:40,370 just as interesting. 138 00:11:40,970 --> 00:11:42,320 Thank you very much for watching. 139 00:11:42,320 --> 00:11:44,410 And I'll see you in our next video.
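[Editor's sketch] The link extractor dictated in this video can be sketched roughly as follows. It is a variation, not the exact file from the recording: the names LinkParser, extract_links and SAMPLE_HTML are assumptions for illustration, the links are collected in a list instead of being printed, the nested parser from the video is left out, and a small HTML string replaces the live YouTube page so the example runs offline.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect href values of <a> tags that contain 'http'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.find("http") >= 0:
                self.links.append(value)


def extract_links(html_text):
    """Feed the HTML text to a fresh parser and return the links it found."""
    parser = LinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Assumption: hypothetical sample markup instead of a downloaded page.
SAMPLE_HTML = (
    '<p><a href="https://example.com/about">About</a> '
    '<a href="/local">Local</a> <a name="x">No href</a></p>'
)

for link in extract_links(SAMPLE_HTML):
    print(link)
```

To run it against a live page as in the video, the text passed to extract_links would come from urllib.request.urlopen(url).read().decode("utf-8"), which needs network access.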