1 00:00:05,030 --> 00:00:07,400 Storing spider data. 2 00:00:07,850 --> 00:00:08,600 Hello, everyone. 3 00:00:08,630 --> 00:00:16,070 Today, we're going to talk about how to save or store the most important information that 4 00:00:16,070 --> 00:00:17,300 you got with the spiders. 5 00:00:17,810 --> 00:00:22,280 But first of all, I would like to automate what we did in the last lecture. 6 00:00:22,520 --> 00:00:24,400 So I will go to PyCharm. 7 00:00:24,500 --> 00:00:30,770 And actually here we wrote the previous spider, the quotes spider. 8 00:00:30,920 --> 00:00:37,790 I will delete everything and we're going to create a new spider, which will be short, but quite important. 9 00:00:37,790 --> 00:00:47,000 And the spider will actually export the text, the author and the tags of multiple quotes that you find 10 00:00:47,000 --> 00:00:47,530 online. 11 00:00:47,870 --> 00:00:56,570 So let's import the Scrapy toolkit here and then I'll create a class, OK? 12 00:00:56,930 --> 00:01:02,890 And the class will be called QuotesSpider. 13 00:01:03,120 --> 00:01:09,650 OK, and here I'll pass scrapy.Spider. 14 00:01:10,310 --> 00:01:19,280 OK, and inside it let's write name equals "quotes", OK. 15 00:01:19,910 --> 00:01:21,740 And after that let's write 16 00:01:23,170 --> 00:01:36,000 start_urls equals, let's open a bracket here and then let's write the first URL, which will 17 00:01:36,010 --> 00:01:50,550 be http://quotes.toscrape.com/page/1. 18 00:01:51,570 --> 00:01:52,350 That's it. 19 00:01:53,670 --> 00:02:06,180 And let's write the second one, which will be http://quotes.toscrape.com/page/2. 20 00:02:06,480 --> 00:02:09,270 OK, and this can be closed. 
21 00:02:10,300 --> 00:02:22,380 So here in this quotes spider, we're going to access the URLs and then in the next function that 22 00:02:22,380 --> 00:02:28,640 we'll create here within the spider, we're actually going to parse or extract information. 23 00:02:28,650 --> 00:02:38,490 So let's write def, def parse, and here self, comma, response. 24 00:02:39,120 --> 00:02:39,890 That's it. 25 00:02:40,560 --> 00:02:51,360 And here I will create the for loop, so for quote in response.css, 26 00:02:53,390 --> 00:02:55,970 div.quote. 27 00:02:57,630 --> 00:03:08,340 OK, so for each quote in the div.quote elements, where the quotes are stored, we will yield the following, so 28 00:03:08,340 --> 00:03:19,620 I will use text as the key actually, so text, and then quote.css and then 29 00:03:22,050 --> 00:03:25,860 span.text, 30 00:03:26,890 --> 00:03:30,980 and then let's write ::text here. 31 00:03:31,090 --> 00:03:38,380 OK, so remember here specifically, we're going to extract only the text from the quote 32 00:03:38,380 --> 00:03:42,880 and then extract_first. 33 00:03:43,090 --> 00:03:49,600 OK, so basically the same as what we've done in the terminal, but now we're automating the whole process. 34 00:03:49,840 --> 00:04:02,470 Then there's the author, author, and I write here quote.css and in the brackets we write small 35 00:04:02,680 --> 00:04:03,220 dot 36 00:04:04,240 --> 00:04:06,580 author and then ::text. 37 00:04:06,820 --> 00:04:13,420 So we are going to extract the author's text as well and then extract 38 00:04:14,400 --> 00:04:16,180 first, OK. 39 00:04:17,510 --> 00:04:27,040 Very good, and the final thing will be the tags, so tags, and I will do quote dot 40 00:04:28,940 --> 00:04:36,520 css, and here in the brackets I'll write div dot 41 00:04:37,620 --> 00:04:43,410 tags, a dot tag, and then 42 00:04:44,900 --> 00:04:48,980 ::text, OK, and here I'll write dot 
43 00:04:49,930 --> 00:04:51,040 extract. 44 00:04:52,390 --> 00:04:54,610 OK, comma. 45 00:04:56,050 --> 00:05:02,710 And we are ready. Here, guys, we're getting an error because I forgot to add the comma. 46 00:05:03,190 --> 00:05:04,080 So that's it. 47 00:05:04,600 --> 00:05:09,880 And let's go up because I forgot to add the slash here after one and two. 48 00:05:10,450 --> 00:05:14,240 Let's save that and let's go back to the terminal. 49 00:05:14,950 --> 00:05:20,790 So if I write ls here, you will see that we have the quotes spider. 50 00:05:21,100 --> 00:05:24,550 So let's actually run it, and to run it, 51 00:05:24,660 --> 00:05:26,800 if you remember, we used a command, 52 00:05:28,060 --> 00:05:30,640 scrapy, so scrapy 53 00:05:32,200 --> 00:05:42,460 crawl and then quotes. OK, so if I run this, you can see that many things happen, actually, in 54 00:05:42,460 --> 00:05:46,150 the entire terminal, quite a few actually. 55 00:05:46,180 --> 00:05:53,410 So if I go up, guys, you can see the actual quotes and here is actually the first one. 56 00:05:53,410 --> 00:05:54,490 So the text is: 57 00:05:55,360 --> 00:05:59,980 "This life is what you make it", and so on and so forth. 58 00:06:00,460 --> 00:06:02,610 You can see it's actually very, very long. 59 00:06:03,730 --> 00:06:09,460 Then you can see the author, which is Marilyn Monroe, and also the tags. 60 00:06:09,970 --> 00:06:15,670 After that, we move on and we can actually see the next quote here. 61 00:06:16,300 --> 00:06:21,880 And then you can see that the quote is from Bob Marley and the tag is love. 62 00:06:22,300 --> 00:06:29,740 So you can see how we can get so much information from that website, and how we can get specifically 63 00:06:29,740 --> 00:06:38,890 the information that we requested without the need to inspect the whole HTML file and read the whole 64 00:06:38,890 --> 00:06:42,060 thing, everything with tags and so on. 
65 00:06:42,940 --> 00:06:47,950 So instead of that, here we can use Scrapy simply to obtain only the information we need. 66 00:06:48,340 --> 00:06:54,670 And I will show you something very cool, which is how to save the information, because obviously here 67 00:06:54,670 --> 00:06:58,440 you just see it in front of you, but you don't have a saved file with it. 68 00:06:58,780 --> 00:06:59,920 It is very simple. 69 00:07:00,700 --> 00:07:04,900 You can again do scrapy crawl, 70 00:07:06,010 --> 00:07:15,500 OK, and then quotes, -o, and then quotes.json. 71 00:07:16,000 --> 00:07:19,260 OK, so here's how you can save it in a JSON file. 72 00:07:19,270 --> 00:07:25,000 And if I hit enter here, guys, you can see that everything actually printed again. 73 00:07:25,420 --> 00:07:31,160 But the difference now is that we saved this to a file, and the file will be in the same folder. 74 00:07:31,420 --> 00:07:37,600 So if I go back here, to the entire project, I can see that we've got the quotes.json file, 75 00:07:37,870 --> 00:07:42,760 and you can see that every quote is saved in a very nice JSON format. 76 00:07:43,060 --> 00:07:46,030 So you can see the first text. 77 00:07:46,180 --> 00:07:48,070 OK, the text. 78 00:07:48,070 --> 00:07:51,880 Then you can see the author of the text and then you can see the tags. 79 00:07:52,150 --> 00:07:58,300 So you can basically obtain absolutely all the information that you need from this website, which is 80 00:07:58,300 --> 00:07:59,500 absolutely amazing. 81 00:07:59,860 --> 00:08:00,740 So that's it. 82 00:08:00,820 --> 00:08:02,440 Guys, thank you very much for watching. 83 00:08:02,470 --> 00:08:07,300 This was everything I wanted to show you for today, because it's quite important to know how you can 84 00:08:07,300 --> 00:08:11,200 store your data once you obtain it 85 00:08:11,200 --> 00:08:12,130 with Scrapy. 86 00:08:12,400 --> 00:08:13,950 Thank you very much for watching. 
87 00:08:14,050 --> 00:08:16,510 And I'm going to see you in our next video.
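Once `scrapy crawl quotes -o quotes.json` has written the file, the stored quotes can be read back with plain Python. The sketch below assumes the file is a single JSON list of objects with `text`, `author` and `tags` keys, as described in the lecture. The helper names `load_quotes` and `quotes_per_author` and the placeholder records are made up for illustration; a temporary file stands in for the real `quotes.json` so the example runs without crawling anything.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def load_quotes(path):
    """Read a JSON feed file: one list of {"text", "author", "tags"} objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def quotes_per_author(quotes):
    """Count how many stored quotes each author has."""
    return Counter(q["author"] for q in quotes)


# Placeholder records in the same shape as the spider's output (not real data).
sample = [
    {"text": "First placeholder quote.", "author": "Author A", "tags": ["tag1", "tag2"]},
    {"text": "Second placeholder quote.", "author": "Author B", "tags": ["tag1"]},
    {"text": "Third placeholder quote.", "author": "Author A", "tags": []},
]

# Write the placeholders to a temporary quotes.json, then load them back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "quotes.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    quotes = load_quotes(path)

counts = quotes_per_author(quotes)
for author, n in counts.items():
    print(f"{author}: {n} quote(s)")
```

With a real crawl, pass the path of the generated `quotes.json` to `load_quotes` instead. One caveat: `-o` appends to an existing file, so running the command twice against the same `quotes.json` can leave invalid JSON behind; deleting the old file before re-running avoids that.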