1 00:00:00,330 --> 00:00:01,080 Hello, everyone. 2 00:00:01,800 --> 00:00:06,030 In this video, we will import our movie collection data into Python. 3 00:00:07,540 --> 00:00:13,140 So before doing that, let's first import all the important libraries that we need. 4 00:00:14,620 --> 00:00:23,450 First, we will import numpy with the alias np. We will import pandas with the alias pd. 5 00:00:24,040 --> 00:00:32,950 We will import seaborn with the alias sns, and we will also import matplotlib.pyplot as plt. 6 00:00:34,450 --> 00:00:36,010 To execute this cell, 7 00:00:37,180 --> 00:00:40,720 if you remember, you can press Shift+Enter. 8 00:00:41,730 --> 00:00:46,930 It will execute the cell and it will make the next cell the active cell. 9 00:00:47,730 --> 00:00:53,730 If you want to insert a blank cell after your current cell, then you can press Alt+Enter. 10 00:00:54,030 --> 00:01:00,240 It will execute your current cell and it will insert an empty cell below your current cell. 11 00:01:01,570 --> 00:01:09,430 So here we will use Shift+Enter. Now, importing data as a pandas data frame is very easy. 12 00:01:10,240 --> 00:01:14,050 You just have to use the read_csv method. 13 00:01:15,500 --> 00:01:24,880 And if you hit Shift+Tab, you will get all the parameters that we need for this read_csv 14 00:01:24,950 --> 00:01:28,820 method. The first parameter here is the file path. 15 00:01:30,120 --> 00:01:36,920 So first, you have to enter the file path of the file you want to import. Since my movie_regression.csv 16 00:01:36,920 --> 00:01:40,380 file is in my default folder, 17 00:01:40,980 --> 00:01:46,140 I don't have to write the whole file path. I can just write the file name. 18 00:01:47,380 --> 00:01:55,190 If you have stored your file in some other directory, you have to write the full file 19 00:01:55,390 --> 00:01:55,600 path. 
20 00:01:57,460 --> 00:02:03,420 And remember to put forward slashes instead of backslashes while inserting your file 21 00:02:03,550 --> 00:02:03,850 path. 22 00:02:06,380 --> 00:02:14,740 The second parameter here that I'm going to use is header=0, since my CSV file has headers 23 00:02:15,470 --> 00:02:17,060 at the zeroth row. 24 00:02:17,450 --> 00:02:18,950 That is, the first row 25 00:02:19,860 --> 00:02:21,000 contains the headers. 26 00:02:21,390 --> 00:02:26,610 That's why I have to write header=0. The header 27 00:02:26,730 --> 00:02:35,100 is at zero since the headers are at the first row of my file, and indexing 28 00:02:35,120 --> 00:02:37,370 always starts at zero in Python. 29 00:02:40,330 --> 00:02:41,560 I can execute this. 30 00:02:46,260 --> 00:02:53,030 Now, to view a sample of our df data frame, we can just write df.head(). 31 00:02:55,320 --> 00:02:59,820 This is a method which will give us a sample of the first five rows of our data frame. 32 00:03:01,820 --> 00:03:03,290 You can see this is our 33 00:03:03,800 --> 00:03:06,680 data frame. On the top, 34 00:03:06,800 --> 00:03:13,550 we have all the column headers, and on the left we have the indexes: zero, one, two, three, 35 00:03:13,550 --> 00:03:13,940 four. 36 00:03:14,870 --> 00:03:19,280 And in between, we have all the data of our data frame. 37 00:03:22,320 --> 00:03:25,590 You can see that the last column here is collection. 38 00:03:25,830 --> 00:03:32,460 This is our Y variable, or dependent variable, and the rest of the variables are our independent 39 00:03:32,460 --> 00:03:32,940 variables. 40 00:03:36,390 --> 00:03:45,110 Now, to get a quick summary of the data types and a count for each variable, we can use the info method. 41 00:03:45,960 --> 00:03:47,640 We will write df.info(). 42 00:03:51,660 --> 00:03:52,530 And we will execute. 
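The loading steps described so far (a file path with forward slashes, `header=0`, then `df.head()`) can be sketched roughly as below. This is a minimal sketch and not the course notebook: it writes a tiny stand-in CSV into a temporary folder so that it runs anywhere, and the column names and values are hypothetical. Only `pd.read_csv`, its `header` parameter, and `DataFrame.head` come from the video.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in data; the course file has 506 rows and more columns.
csv_text = (
    "budget,genre,collection\n"
    "100.5,Drama,48000\n"
    "220.0,Comedy,43200\n"
    "150.2,Thriller,69400\n"
    "90.0,Drama,66800\n"
    "310.7,Action,72400\n"
    "75.3,Comedy,57400\n"
)

with tempfile.TemporaryDirectory() as folder:
    csv_path = Path(folder) / "movie_regression.csv"
    csv_path.write_text(csv_text)

    # as_posix() renders the full path with forward slashes, even on Windows.
    full_path = csv_path.as_posix()

    # header=0 tells pandas that row 0 (the first row) holds the column names.
    df = pd.read_csv(full_path, header=0)

# head() returns the first five rows by default.
sample = df.head()
print(sample)
```

If the file sits in the notebook's working folder, passing just the file name to `pd.read_csv` is enough; the full path is only needed when the file lives elsewhere.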
43 00:03:56,020 --> 00:04:02,290 You can see on the top we have the type of the data frame. Since this is a pandas data frame, 44 00:04:02,440 --> 00:04:05,620 we are getting pandas.core.frame.DataFrame. 45 00:04:06,890 --> 00:04:10,780 Then, since our data frame consists of five hundred and six entries, 46 00:04:11,150 --> 00:04:19,250 the second row here is telling us that there are five hundred and six entries and the index range is 47 00:04:19,250 --> 00:04:21,070 from zero to five zero five. 48 00:04:22,750 --> 00:04:26,980 Then we have all the data columns that we have in our data frame. 49 00:04:27,370 --> 00:04:29,230 We have the count for the columns. 50 00:04:30,810 --> 00:04:36,900 You can see for all the columns except time taken, the count is five zero six. 51 00:04:39,250 --> 00:04:47,860 This means that there are some null values in our time taken variable; there are empty rows in the time taken column. 52 00:04:49,180 --> 00:04:55,240 In the later part of this course, we will see how to treat our time taken variable to correct all 53 00:04:55,240 --> 00:04:56,140 the missing values. 54 00:04:59,010 --> 00:04:59,370 Then, 55 00:05:00,720 --> 00:05:08,880 at the last, we have the type of each column. You can see most of our variables are of float and int 56 00:05:08,890 --> 00:05:09,130 type. 57 00:05:10,340 --> 00:05:11,840 But there are two variables, 58 00:05:13,080 --> 00:05:20,650 which are 3D available and genre, which are of object type. Object means string. 59 00:05:21,150 --> 00:05:24,900 So these two variables are string type variables. 60 00:05:25,620 --> 00:05:31,470 And in the later part of this course, we will see how to convert these categorical string variables 61 00:05:32,040 --> 00:05:36,810 into numerical dummy variables. In the next video, 62 00:05:36,870 --> 00:05:39,730 we will see how to treat missing values.
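The reading of `df.info()` described above can be sketched on a small stand-in frame. This is only an illustration under stated assumptions: the column names and values are hypothetical, and the text column is given an explicit `object` dtype to mirror what the video shows (some newer pandas versions may report a dedicated string dtype for text columns instead).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame; the course data has 506 rows.
df = pd.DataFrame({
    "budget": [100.5, 220.0, 150.2, 90.0],
    "time_taken": [110.0, np.nan, 147.0, 185.0],  # one missing value
    "trailer_views": [527367, 494055, 547051, 516279],
    "genre": pd.Series(["Drama", "Comedy", "Thriller", "Drama"], dtype=object),
})

# Prints the frame type, the index range, per-column non-null counts and dtypes.
df.info()

# A non-null count below the number of rows means missing values;
# isnull().sum() reports the missing count per column directly.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Columns stored with the object dtype hold strings in this frame.
string_columns = [name for name in df.columns if df[name].dtype == object]
print(string_columns)
```

A column whose non-null count in `df.info()` is lower than the total number of entries is the one that needs missing-value treatment, and the object-typed columns are the ones that would later be turned into dummy variables.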