1 00:00:04,880 --> 00:00:08,320 Reading an XML file. Hi, everyone. 2 00:00:08,330 --> 00:00:13,300 Today we're going to talk about a very important topic, which is XML files. 3 00:00:13,700 --> 00:00:22,220 So in this section you will learn how to work with XML files, how to use and parse them by using the 4 00:00:22,220 --> 00:00:26,050 ElementTree API with Python 3.7. 5 00:00:26,810 --> 00:00:38,120 So XML, in general, is a standardized data format, actually a structure or a file that is easily readable 6 00:00:38,120 --> 00:00:41,600 by the user, and it was created back in 1996. 7 00:00:42,200 --> 00:00:51,200 The XML document contains data structures defined by its developers, which are marked by different 8 00:00:51,200 --> 00:00:56,230 tags and are positioned in hierarchical levels. 9 00:00:56,370 --> 00:01:02,780 OK, so you would have the higher level on the outer layer, and then on the inner layers you have 10 00:01:02,990 --> 00:01:08,220 the lower-level content that is contained within the file. 11 00:01:08,690 --> 00:01:15,600 So XML is pretty much used to represent hierarchical data in a standard text format. 12 00:01:16,040 --> 00:01:24,110 So when we're working with XML files, we're usually going to be creating XML documents, sending 13 00:01:24,110 --> 00:01:34,860 them as bodies of HTTP requests and receiving XML files from the web, very similar to the 14 00:01:34,880 --> 00:01:35,840 JSON files. 15 00:01:35,900 --> 00:01:37,990 But this time the structure is slightly different. 16 00:01:38,510 --> 00:01:42,760 So when we're using XML, there are two main approaches to that. 17 00:01:43,400 --> 00:01:49,940 The first one is to read the whole document and create an object-based representation of the document 18 00:01:50,510 --> 00:01:55,530 and then hand it to an object-oriented API for manipulation.
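A minimal sketch of the two approaches with the standard library, assuming a tiny made-up document (`XML_TEXT` is illustrative, not the course file): the first loads everything into a tree before touching it, the second reacts to tags while the document is still being read.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample document, used only for illustration.
XML_TEXT = "<root><book><name>Book1</name></book></root>"

# Approach 1: build the whole tree in memory, then walk it.
root = ET.fromstring(XML_TEXT)
whole_doc_tags = [element.tag for element in root.iter()]

# Approach 2: stream the document and react to each event as it arrives.
streamed = [
    (event, element.tag)
    for event, element in ET.iterparse(
        io.BytesIO(XML_TEXT.encode("utf-8")), events=("start", "end")
    )
]

print(whole_doc_tags)
print(streamed)
```

The streaming form sees each closing tag before the document is finished, which is what makes it suitable for very large files.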
19 00:01:56,240 --> 00:02:04,640 The second way is processing the document from the start to the end point and performing actions 20 00:02:04,880 --> 00:02:07,410 once we see specific tags. 21 00:02:07,580 --> 00:02:13,250 For example, when we see a tag, we perform some programming action with that tag, even 22 00:02:13,250 --> 00:02:15,080 though the document is not done yet. 23 00:02:15,350 --> 00:02:23,450 So let's go to the IDE and I will show you there how we can process XML files with ElementTree. 24 00:02:23,820 --> 00:02:31,670 So here I am in the terminal, guys, and let's import the XML ElementTree into our session. 25 00:02:31,670 --> 00:02:48,880 So I write import xml.etree.ElementTree as ET. Actually, we need to do this in Python mode. 26 00:02:49,250 --> 00:02:56,180 So let me actually copy it and let's go to Python mode and then write the same. So we get an error here 27 00:02:56,180 --> 00:02:57,710 because there is a slight mistake. 28 00:02:57,710 --> 00:03:02,800 I mistyped the import statement, so let me correct it. OK. 29 00:03:03,680 --> 00:03:07,690 And now we have imported ElementTree as ET. 30 00:03:07,700 --> 00:03:19,100 So I write root equals ET.Element, and in the brackets I type 'root'. OK. 31 00:03:19,880 --> 00:03:26,580 And so in that way we can create the previously mentioned XML document by using ElementTree. 32 00:03:27,410 --> 00:03:34,580 So as we can see here, by creating root, we are starting by creating the root element, which 33 00:03:41,600 --> 00:03:46,350 will actually be the XML representation of the Python object. 34 00:03:46,670 --> 00:03:52,620 So if I write ET.dump(root), OK? 35 00:03:53,120 --> 00:04:02,450 This representation is an XML representation of the root Python object. 36 00:04:03,020 --> 00:04:09,080 OK, so here, from the root Python object, we actually created an XML.
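The steps dictated above can be sketched as follows; the `root_xml` variable is an addition for illustration, since `ET.dump` only writes to standard output and returns nothing.

```python
import xml.etree.ElementTree as ET

# Create the top-level element of a new document.
root = ET.Element("root")

# dump() writes the XML form of the element to standard output.
ET.dump(root)

# tostring() returns the same XML as a value instead of printing it.
root_xml = ET.tostring(root, encoding="unicode")
```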
37 00:04:09,770 --> 00:04:12,620 Now let's do another example and I'll write book 38 00:04:13,460 --> 00:04:17,750 equals ET.Element 39 00:04:19,650 --> 00:04:40,510 and then 'book', OK, and then I write root.append(book), OK, and then ET.dump(root). 40 00:04:41,220 --> 00:04:48,320 OK, so as you can see now, guys, we created the root, which is the outer layer, and then we created 41 00:04:48,330 --> 00:04:52,680 a book and we appended this book inside our root. 42 00:04:52,710 --> 00:04:59,850 OK, so we created a two-level XML structure, which is this hierarchical order that I was talking 43 00:04:59,850 --> 00:05:01,700 about earlier on. 44 00:05:02,040 --> 00:05:06,330 So let's say we'd like to add another element; we can do that too: name. 45 00:05:06,910 --> 00:05:08,220 OK, name. 46 00:05:09,520 --> 00:05:22,680 And then I will make this equal to ET.SubElement and then I will write book, comma, and then 47 00:05:22,680 --> 00:05:23,070 'name'. 48 00:05:23,480 --> 00:05:32,910 OK, let's close the brackets here, and once I write this, if I type after that name.text equals 49 00:05:34,170 --> 00:05:44,840 'Book1', then we can display what we've actually created so far with ET.dump(root). 50 00:05:45,570 --> 00:05:51,790 And here, as you can see, we have the outer layer, root, then in the inner layer we have a book, 51 00:05:51,790 --> 00:05:58,650 as you can see here, and then inside we have a name, and actually we wrote a value for this name, 52 00:05:59,070 --> 00:06:00,330 equal to Book1. 53 00:06:00,580 --> 00:06:04,220 OK, and we did this by writing name.text. 54 00:06:04,590 --> 00:06:13,560 So what we did here is we created the name layer, as you can see here, which is a SubElement type, 55 00:06:13,560 --> 00:06:14,030 OK? 56 00:06:14,280 --> 00:06:17,450 And we did this with ET.SubElement.
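Put together, the calls from this part look roughly like the sketch below. The text value "Book1" follows the spoken "book one" and its exact spelling is an assumption; `xml_text` is added only so the result can be inspected as a string.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")

# Create a standalone element, then attach it to the root.
book = ET.Element("book")
root.append(book)

# SubElement creates the child and attaches it to its parent in one step.
name = ET.SubElement(book, "name")
name.text = "Book1"  # assumed spelling of the spoken value

ET.dump(root)
xml_text = ET.tostring(root, encoding="unicode")
```

`ET.Element` followed by `append` and `ET.SubElement` end up with the same kind of object; the second form just saves a line.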
57 00:06:17,460 --> 00:06:22,530 So you can see that the previous ones were plain elements, so they didn't have an actual value. 58 00:06:22,540 --> 00:06:28,700 But here we created the subelement and then we attached it to the book inside the root. 59 00:06:28,890 --> 00:06:32,670 And this is how we've got the name Book1 inside. 60 00:06:33,090 --> 00:06:34,230 Quite simple, isn't it? 61 00:06:35,370 --> 00:06:38,400 So basically this is the way that you can create an XML. 62 00:06:39,180 --> 00:06:45,420 But let's actually also practice some other very nice features of XML files. 63 00:06:45,720 --> 00:06:48,100 So let me show you one thing. 64 00:06:48,450 --> 00:07:02,470 So here, if I write temp equals ET.SubElement and then root, comma, 'temp'. 65 00:07:02,990 --> 00:07:03,490 OK. 66 00:07:04,290 --> 00:07:11,330 And after that, if I write ET.dump(root). 67 00:07:11,910 --> 00:07:16,570 OK, so we added this in the root, as you can see here. 68 00:07:17,910 --> 00:07:21,990 So we have a root, a book, and then we have a name. 69 00:07:23,520 --> 00:07:27,680 And also here, on the same level as the book, we have temp. 70 00:07:28,430 --> 00:07:33,650 OK, so that's what we've produced. And if I write right now 71 00:07:33,660 --> 00:07:37,700 root.remove(temp), 72 00:07:38,370 --> 00:07:43,110 you see how we can remove the temp layer if we wouldn't like to have it. 73 00:07:43,560 --> 00:07:46,890 So if I press Enter, you get nothing. 74 00:07:46,890 --> 00:07:59,230 But if I write ET.dump(root), you can see that now our temp layer is completely removed. 75 00:07:59,490 --> 00:08:03,840 So this is the way that you can actually remove elements from the XML file. 76 00:08:04,170 --> 00:08:11,940 Now, if you want to display the XML file in a nice visual way, because obviously if you have a very 77 00:08:11,940 --> 00:08:17,220 long XML file, and trust me, I've been working with XML files that long: 78 00:08:17,370 --> 00:08:18,720 ten thousand lines.
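The temp element that was added and removed a moment ago can be sketched like this. The `before` and `after` strings are additions for illustration, and "Book1" is the assumed spelling of the spoken value.

```python
import xml.etree.ElementTree as ET

# Rebuild the small document from the earlier steps.
root = ET.Element("root")
book = ET.SubElement(root, "book")
ET.SubElement(book, "name").text = "Book1"

# Add a sibling of <book> directly under the root.
temp = ET.SubElement(root, "temp")
before = ET.tostring(root, encoding="unicode")

# remove() detaches a direct child from its parent and returns None.
root.remove(temp)
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)
```

Note that `remove` only works on direct children of the element it is called on, so `root.remove(name)` would raise a `ValueError` here.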
79 00:08:18,870 --> 00:08:22,040 So imagine ten thousand lines displayed like this. 80 00:08:22,470 --> 00:08:30,220 Basically the XML representation kind of loses its advantages over the other representations. 81 00:08:30,570 --> 00:08:34,410 So let me show you how it can be represented in quite a nice way. 82 00:08:34,980 --> 00:08:45,600 So I write import xml.dom.minidom as minidom. 83 00:08:47,640 --> 00:08:48,870 Let's print 84 00:08:51,310 --> 00:08:53,050 minidom dot 85 00:08:55,290 --> 00:09:00,170 parseString, and after that, 86 00:09:00,390 --> 00:09:04,180 right here in the brackets, ET dot 87 00:09:05,960 --> 00:09:13,370 tostring, OK, tostring and then root, so the root will be turned into a string, then dot 88 00:09:14,580 --> 00:09:19,350 toprettyxml: t-o, p-r-e-t-t-y, x, 89 00:09:20,630 --> 00:09:25,540 m, l. toprettyxml, that's it. 90 00:09:26,980 --> 00:09:27,690 Let's run it. 91 00:09:28,250 --> 00:09:34,220 And as you can see here, things are looking actually quite different from before. 92 00:09:34,520 --> 00:09:41,540 So you can see here the version of the XML, which is version 1.0, then you have 93 00:09:41,540 --> 00:09:42,200 the root. 94 00:09:42,360 --> 00:09:50,820 OK, then you have the inner layer, which is a subelement, the book, and then the actual value, the 95 00:09:51,390 --> 00:09:59,120 name layer, which has the value of the book, which is called Book1. This is the name; you can obviously 96 00:09:59,120 --> 00:09:59,510 have 97 00:09:59,510 --> 00:10:04,940 other values like type, author and so on. 98 00:10:05,180 --> 00:10:10,940 But basically this is what an XML file looks like, and this is one that you just created by yourself. 99 00:10:12,050 --> 00:10:13,290 So this is quite nice. 100 00:10:13,940 --> 00:10:22,820 So the next thing I will do, guys, is we're actually going to use the books XML file and I will show 101 00:10:22,820 --> 00:10:23,070 you.
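The pretty-printing call spelled out above, written as one runnable sketch. The small tree is rebuilt here so the snippet stands alone, and the `pretty` variable is an addition for illustration.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

# Rebuild the small document from the earlier steps.
root = ET.Element("root")
book = ET.SubElement(root, "book")
ET.SubElement(book, "name").text = "Book1"

# Serialize with ElementTree, re-parse with minidom, then indent the output.
pretty = minidom.parseString(ET.tostring(root)).toprettyxml()
print(pretty)
```

`toprettyxml` also accepts an `indent` argument (a tab by default), so something like `toprettyxml(indent="  ")` gives two-space indentation instead.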
102 00:10:23,930 --> 00:10:25,440 So let me open it for you. 103 00:10:25,760 --> 00:10:27,670 So just a [unintelligible]. 104 00:10:27,680 --> 00:10:31,930 And this is an XML file with different books, 105 00:10:31,970 --> 00:10:33,380 their names and so on. 106 00:10:34,190 --> 00:10:37,070 So let me copy it. 107 00:10:37,940 --> 00:10:46,700 And we can actually paste it in the same folder as your other files in Section 3. 108 00:10:47,300 --> 00:10:51,470 So if you select Section 3 once the file is copied, and then if we 109 00:10:52,740 --> 00:11:00,540 paste, I can simply place books.xml here, as you can see, and if you open it, you 110 00:11:00,540 --> 00:11:02,080 can see actually the same file here. 111 00:11:02,640 --> 00:11:04,920 So this is books.xml, guys. 112 00:11:05,580 --> 00:11:11,410 And once we have this file (I will obviously attach it to the final lecture of this section), 113 00:11:11,650 --> 00:11:17,850 now we can write a Python script in order to access this file and use the information and transfer it 114 00:11:17,850 --> 00:11:18,840 to a Python file. 115 00:11:19,200 --> 00:11:25,430 So if I go here into Python mode, so now I am in Python mode, guys. 116 00:11:25,710 --> 00:11:31,900 So let's access the file here and let's see what operations we can actually perform with it. 117 00:11:32,310 --> 00:11:49,110 So the first thing I would do is import xml.etree, OK, xml.etree.ElementTree as ET, 118 00:11:49,560 --> 00:11:50,090 OK. 119 00:11:50,460 --> 00:12:05,310 And then let's write books equals ET.parse and then write 'books.xml', and this is the file we 120 00:12:05,310 --> 00:12:09,260 just added into our resources Section 3 folder. 121 00:12:09,630 --> 00:12:12,270 So once that's ready, the command is accepted.
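A minimal sketch of loading a file with `ET.parse`. The real books.xml is not shown in the transcript, so `SAMPLE` below is a hypothetical stand-in written to a temporary folder; its tag and attribute names are assumptions.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the course's books.xml.
SAMPLE = """<root>
  <book id="b1"><title>Sample Title</title></book>
</root>"""

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "books.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)

    # parse() reads the whole file and returns an ElementTree object.
    books = ET.parse(path)

print(type(books).__name__)
```

With the real file in the working directory, the call reduces to `books = ET.parse("books.xml")`.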
122 00:12:12,450 --> 00:12:28,080 So if I write now root equals books.getroot, with the brackets, in that way you can access the 123 00:12:29,160 --> 00:12:35,400 root node here, and if I write print(root), 124 00:12:37,510 --> 00:12:46,330 you can see that we get the element root and its memory location. We can also access certain strings from 125 00:12:46,330 --> 00:12:50,080 the XML by identifying what type of data this is. 126 00:12:50,470 --> 00:12:59,190 So if I write print and then root.tag, we can see that the tag of the root is root. 127 00:12:59,770 --> 00:13:06,180 We can also see the different attributes of each book by iterating over the books. 128 00:13:06,190 --> 00:13:07,620 And this is also very simple. 129 00:13:08,260 --> 00:13:15,400 So you can simply do for child in root. 130 00:13:16,910 --> 00:13:22,820 OK, and now we are starting the for loop; by the way, you can also do this here in Python mode, 131 00:13:22,820 --> 00:13:27,710 you can print and then child 132 00:13:29,170 --> 00:13:36,070 dot tag, comma, child.attrib. 133 00:13:36,220 --> 00:13:43,750 OK, OK, let me just fix my print statement, so we're getting some indentation error when 134 00:13:43,750 --> 00:13:47,950 we use the for loop from Python mode here. 135 00:13:47,950 --> 00:13:52,990 Obviously, later on we're going to create a file where I will show you how it's done. 136 00:13:53,260 --> 00:13:57,670 But you can also test the attributes in the following way. 137 00:13:57,700 --> 00:14:02,440 For example, if I write root.attrib 138 00:14:02,440 --> 00:14:10,240 and I put it in a print statement, we're going to see that, for example, in the root, um, 139 00:14:10,660 --> 00:14:12,850 we don't have any attributes at the moment.
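The inspection steps above, sketched against a hypothetical two-book document; the `id` attributes and the contents of `SAMPLE` are assumptions, since the real books.xml is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml.
SAMPLE = """<root>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</root>"""

books = ET.ElementTree(ET.fromstring(SAMPLE))
root = books.getroot()

print(root)         # the Element object, with its tag and memory address
print(root.tag)     # name of the top-level tag
print(root.attrib)  # attributes of the root as a dict

# Iterating over an element yields its direct children.
children = [(child.tag, child.attrib) for child in root]
for tag, attrib in children:
    print(tag, attrib)
```

In the interactive prompt the loop body must be indented and followed by a blank line, which is the usual cause of the indentation error mentioned above.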
140 00:14:13,220 --> 00:14:19,870 So instead of showing you all this in Python mode, let's actually create a file where we're going 141 00:14:19,870 --> 00:14:25,570 to be able to define all the things we need from the books XML file. 142 00:14:25,930 --> 00:14:28,480 So I will create a new file in Section 3. 143 00:14:28,870 --> 00:14:31,690 I will select New File; actually, 144 00:14:31,690 --> 00:14:38,800 let's select a Python file, and here I will type iterate_books. 145 00:14:39,150 --> 00:14:41,560 OK, so this is what the file will be called. 146 00:14:41,560 --> 00:14:43,270 And let's write from 147 00:14:44,880 --> 00:14:50,370 xml.etree. 148 00:14:52,860 --> 00:15:03,090 ElementTree import iterparse, OK, you can select that from here, and let's create a function 149 00:15:03,090 --> 00:15:07,170 here, so def books, and in the brackets let's 150 00:15:07,170 --> 00:15:08,580 write file. 151 00:15:10,290 --> 00:15:19,920 OK, and here write for event, comma, element in iterparse, 152 00:15:20,400 --> 00:15:25,980 OK, and then file in here. 153 00:15:25,980 --> 00:15:26,300 Right. 154 00:15:26,310 --> 00:15:39,800 If event is equal to 'start' and element.tag is equal to 'root', 155 00:15:40,390 --> 00:16:01,080 OK, then what I would like to happen is to write book dot element, and then if event is equal to 'end' and element 156 00:16:01,380 --> 00:16:10,140 dot tag is equal to 'book', we would like to print in that case. 157 00:16:12,300 --> 00:16:31,440 OK, so we would like to print the {0} and then {1}, then {2}, then {3} and {4}, that's it. 158 00:16:31,470 --> 00:16:47,460 And here I add dot format: element.findtext and then 'title', comma, element dot find. 159 00:16:48,210 --> 00:16:53,810 OK, and actually let's place those on a different indentation level, 160 00:16:53,950 --> 00:17:02,190 so our code is more presentable, so let's put this here and then element. 161 00:17:02,190 --> 00:17:02,700 Yes.
162 00:17:02,730 --> 00:17:07,500 Dot findtext, and the text 163 00:17:07,500 --> 00:17:10,720 I would like to find is publisher. 164 00:17:11,100 --> 00:17:17,430 OK, so the next thing we'll find is the publisher, and after this write a comma and let's write 165 00:17:17,430 --> 00:17:31,350 element.findtext and here we will search for number of chapters. 166 00:17:31,590 --> 00:17:34,340 OK, that's it. 167 00:17:35,610 --> 00:17:46,990 And after that let's write element.findtext and search for page count. 168 00:17:47,280 --> 00:17:54,180 We're just trying to search for basically every possible element that we can find so we can practice 169 00:17:55,020 --> 00:17:57,470 our search in the XML file. 170 00:17:57,750 --> 00:18:01,590 So let's write element dot find 171 00:18:02,790 --> 00:18:15,750 text and then let's write author, OK, and let's close all the brackets. So let's create another 172 00:18:15,750 --> 00:18:18,010 if statement here in the outer layer. 173 00:18:18,360 --> 00:18:23,670 So I would do if event equals 174 00:18:24,810 --> 00:18:34,830 'end'. So in the previous ones we did 'start', and then we have two cases of 'end': the first was if the event 175 00:18:34,830 --> 00:18:37,440 equals 'end' and the tag is 'book'. 176 00:18:37,440 --> 00:18:44,360 And now, two, if the event is equal to 'end' and element.tag 177 00:18:46,210 --> 00:18:50,140 is equal to 'chapter'. 178 00:18:51,930 --> 00:19:03,360 All right, so if this is the case, we print, OK, we'll print {0} and then {1} and then {2}. 179 00:19:05,750 --> 00:19:16,040 OK, so let's write here a dot, and we're going to the format, but let's place the format on the 180 00:19:16,850 --> 00:19:28,730 next indentation level; in here, I'll write element.findtext and here we're going to try finding 181 00:19:29,660 --> 00:19:30,700 the chapter numbers. 182 00:19:30,710 --> 00:19:33,370 So write chapter number.
183 00:19:34,130 --> 00:19:46,550 OK, so let's write a comma here, then I will write element.findtext, OK, and then I will search 184 00:19:46,550 --> 00:19:51,200 for chapter title. 185 00:19:52,670 --> 00:20:02,780 OK, and after that I'll write element.findtext and then let's write 186 00:20:04,740 --> 00:20:06,470 page count. 187 00:20:08,130 --> 00:20:08,750 OK. 188 00:20:08,820 --> 00:20:09,970 And that's it here. 189 00:20:10,590 --> 00:20:16,820 So this function can be run by writing the if and calling it in the main block. 190 00:20:16,860 --> 00:20:17,490 All right. 191 00:20:17,980 --> 00:20:18,600 __name__ 192 00:20:20,880 --> 00:20:21,780 equals 193 00:20:24,030 --> 00:20:38,680 '__main__', actually, main, OK, and then books, open, and then 'books.xml', so our code is ready. 194 00:20:38,930 --> 00:20:49,670 Guys, let's save it and let's go to the terminal and type python and then iterate_books. 195 00:20:50,130 --> 00:20:59,580 And once we do that, you can see that first we're getting one, chapter one, and 30 should be the page 196 00:20:59,580 --> 00:21:07,220 count here, and then we get chapter two, and chapter two has twenty-five pages, as displayed here; then 197 00:21:07,260 --> 00:21:11,640 we get all the information for the book, so we get the book. 198 00:21:11,640 --> 00:21:13,860 It is Python 2, author Mark Martin. 199 00:21:13,860 --> 00:21:26,790 Then we have 13 chapters, um, 500 pages and the author is Author One, and then we get obviously information 200 00:21:26,790 --> 00:21:27,770 about the chapters. 201 00:21:28,080 --> 00:21:34,140 So we have chapter one, 30 pages, and another with twenty-five, and so on. 202 00:21:34,920 --> 00:21:36,140 This is for book two. 203 00:21:36,480 --> 00:21:40,170 So this is how our function here is working. 204 00:21:40,410 --> 00:21:45,870 And obviously first we get the information for the chapters and then we get the information for the book, and 205 00:21:45,870 --> 00:21:46,980 you can see all the pages.
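A sketch of the script dictated above, under several assumptions: the real books.xml is not shown, so `SAMPLE` and every tag name in it (`number_of_chapters`, `page_count`, `chapter_number`, `chapter_title` and so on) are hypothetical, and the function returns its lines instead of only printing them so the result can be checked. `iterparse` reports only "end" events unless `events` is passed, so "start" is requested explicitly here.

```python
import io
from xml.etree.ElementTree import iterparse

# Hypothetical stand-in for books.xml; all tag names are assumptions.
SAMPLE = b"""<root>
  <book>
    <title>Sample Book</title>
    <publisher>Sample Press</publisher>
    <number_of_chapters>2</number_of_chapters>
    <page_count>55</page_count>
    <author>Sample Author</author>
    <chapter>
      <chapter_number>1</chapter_number>
      <chapter_title>Intro</chapter_title>
      <page_count>30</page_count>
    </chapter>
    <chapter>
      <chapter_number>2</chapter_number>
      <chapter_title>Next</chapter_title>
      <page_count>25</page_count>
    </chapter>
  </book>
</root>"""


def books(file):
    """Stream the document and collect one line per chapter and per book."""
    lines = []
    for event, element in iterparse(file, events=("start", "end")):
        # An "end" event means the element and all its children are complete.
        if event == "end" and element.tag == "book":
            lines.append("{0}, {1}, {2}, {3}, {4}".format(
                element.findtext("title"),
                element.findtext("publisher"),
                element.findtext("number_of_chapters"),
                element.findtext("page_count"),
                element.findtext("author")))
        if event == "end" and element.tag == "chapter":
            lines.append("{0}, {1}, {2}".format(
                element.findtext("chapter_number"),
                element.findtext("chapter_title"),
                element.findtext("page_count")))
    return lines


if __name__ == "__main__":
    for line in books(io.BytesIO(SAMPLE)):
        print(line)
```

Because a chapter closes before the book that contains it, the chapter lines come out first and the book line last, which matches the order seen in the terminal run described above. With the real file, the call would be `books(open("books.xml", "rb"))`.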
206 00:21:47,130 --> 00:21:55,860 And this is a way, actually, that you can take a simple XML file and transfer it into a Python representation 207 00:21:56,100 --> 00:21:57,150 quite easily. 208 00:21:57,180 --> 00:22:04,980 So these XML files are all over the Internet, and now we know how to work with them and transfer them into 209 00:22:04,980 --> 00:22:06,270 a Python representation. 210 00:22:06,780 --> 00:22:07,290 That's it. 211 00:22:07,320 --> 00:22:09,090 Guys, thanks very much for watching. 212 00:22:09,870 --> 00:22:16,440 I appreciate you spending this time to check out the XML representation and transferring it to Python. 213 00:22:16,650 --> 00:22:18,930 And I will see you in our next video.