1
00:00:00,330 --> 00:00:01,080
Hello, everyone.

2
00:00:01,800 --> 00:00:06,030
In this video, we will import our movie collection data into Python.

3
00:00:07,540 --> 00:00:13,140
So before doing that, let's first import all the important libraries that we need.

4
00:00:14,620 --> 00:00:23,450
First, we will import numpy with the alias np, and we will import pandas with the alias pd.

5
00:00:24,040 --> 00:00:32,950
We will import seaborn with the alias sns, and we will also import matplotlib.pyplot as plt.

6
00:00:34,450 --> 00:00:36,010
To execute this cell,

7
00:00:37,180 --> 00:00:40,720
if you remember, you can press Shift + Enter.

8
00:00:41,730 --> 00:00:46,930
It will execute the cell and make the next cell the active cell.

9
00:00:47,730 --> 00:00:53,730
If you want to insert a blank cell after your current cell, then you can press Alt + Enter.

10
00:00:54,030 --> 00:01:00,240
It will execute your current cell and insert an empty cell below your current cell.
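The import cell described above would look roughly like this. It is a minimal sketch that assumes all four libraries are already installed in your environment.

```python
# Import the libraries with their conventional aliases.
import numpy as np                 # numerical arrays
import pandas as pd                # DataFrames and CSV import
import seaborn as sns              # statistical plots
import matplotlib.pyplot as plt    # general plotting
```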

11
00:01:01,570 --> 00:01:09,430
So here we will use Shift + Enter. Now, importing data as a pandas DataFrame is very easy.

12
00:01:10,240 --> 00:01:14,050
You just have to use the read_csv method.

13
00:01:15,500 --> 00:01:24,880
And if you hit Shift + Tab, you will get all the parameters that we need for this read_csv

14
00:01:24,950 --> 00:01:28,820
method. The first parameter here is the file path.

15
00:01:30,120 --> 00:01:36,920
So first, you have to enter the file path of the file you want to import. Since my

16
00:01:36,920 --> 00:01:40,380
movie_regression.csv file is in my default folder,

17
00:01:40,980 --> 00:01:46,140
I don't have to write the whole file path; I can just write the file name.

18
00:01:47,380 --> 00:01:55,190
If you have stored your file in some other directory, you have to write the full file

19
00:01:55,390 --> 00:01:55,600
path.

20
00:01:57,460 --> 00:02:03,420
And remember to put forward slashes instead of backslashes while inserting your file

21
00:02:03,550 --> 00:02:03,850
path.

22
00:02:06,380 --> 00:02:14,740
The second parameter here that I'm going to use is header equal to zero, since my CSV file has headers

23
00:02:15,470 --> 00:02:17,060
at row zero.

24
00:02:17,450 --> 00:02:18,950
That is, the first row

25
00:02:19,860 --> 00:02:21,000
contains the header.

26
00:02:21,390 --> 00:02:26,610
That's why I have to write header equal to zero.

27
00:02:26,730 --> 00:02:35,100
My file has its header at zero, since the headers are in the first row of my file and indexing

28
00:02:35,120 --> 00:02:37,370
always starts at zero in Python.
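As a rough sketch of the read_csv call described above: since the course file is not available here, this example first writes a tiny stand-in CSV to a temporary folder. The file name movie_sample.csv and its columns are assumptions made only for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the course's CSV file: the first row holds the headers.
sample_csv = "Budget,Genre,Collection\n100.0,Drama,500\n250.5,Comedy,900\n"

# Write the sample file into a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "movie_sample.csv"
csv_path.write_text(sample_csv)

# as_posix() returns the full path written with forward slashes.
# header=0 tells pandas that row zero (the first row) contains the column names.
df = pd.read_csv(csv_path.as_posix(), header=0)

print(df.columns.tolist())
print(df.shape)
```

If the file sits in your default (working) folder, passing just the file name instead of the full path works the same way.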

29
00:02:40,330 --> 00:02:41,560
I can execute this.

30
00:02:46,260 --> 00:02:53,030
Now, to view a sample of our df DataFrame, we can just write df.head().

31
00:02:55,320 --> 00:02:59,820
This is a method which will give us a sample of the first five rows of our DataFrame.

32
00:03:01,820 --> 00:03:03,290
You can see this is our

33
00:03:03,800 --> 00:03:06,680
DataFrame. On the top,

34
00:03:06,800 --> 00:03:13,550
we have all the column headers, and on the left we have the indexes: zero, one, two, three,

35
00:03:13,550 --> 00:03:13,940
four.

36
00:03:14,870 --> 00:03:19,280
And in between, we have all the data of our DataFrame.
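To illustrate what head returns, here is a small sketch on a made-up DataFrame. The column names and values are assumptions, not the course data.

```python
import pandas as pd

# Hypothetical data with eight rows, so that head() visibly returns only part of it.
df = pd.DataFrame({
    "Budget": [100.0, 250.5, 80.0, 120.0, 310.0, 95.0, 60.5, 400.0],
    "Collection": [500, 900, 300, 650, 1200, 410, 280, 1500],
})

# head() with no argument returns the first five rows.
sample = df.head()

# Column headers appear on top, the index (0 to 4) on the left, the data in between.
print(sample)
```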

37
00:03:22,320 --> 00:03:25,590
You can see that the last column here is collection.

38
00:03:25,830 --> 00:03:32,460
This is our y variable, or dependent variable, and the rest of the variables are our independent

39
00:03:32,460 --> 00:03:32,940
variables.

40
00:03:36,390 --> 00:03:45,110
Now, to get a quick summary of the data types and the count of each variable, we can use the info method.

41
00:03:45,960 --> 00:03:47,640
We will write df.info().

42
00:03:51,660 --> 00:03:52,530
And we will execute.

43
00:03:56,020 --> 00:04:02,290
You can see on the top we have the type of the DataFrame. Since this is a pandas DataFrame,

44
00:04:02,440 --> 00:04:05,620
we are getting pandas.core.frame.DataFrame.

45
00:04:06,890 --> 00:04:10,780
Then, since our DataFrame consists of five hundred and six entries,

46
00:04:11,150 --> 00:04:19,250
the second row here is telling us that there are five hundred and six entries and the index range is

47
00:04:19,250 --> 00:04:21,070
from zero to five zero five.

48
00:04:22,750 --> 00:04:26,980
Then we have all the data columns that we have in our DataFrame.

49
00:04:27,370 --> 00:04:29,230
We have the count of columns.

50
00:04:30,810 --> 00:04:36,900
You can see that for all the columns except time taken, the count is five zero six.

51
00:04:39,250 --> 00:04:47,860
This means that there are some null values in our time taken variable; there are empty rows in the time taken column.

52
00:04:49,180 --> 00:04:55,240
In the later part of this course, we will see how to treat our time taken variable to correct all

53
00:04:55,240 --> 00:04:56,140
the missing values.
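The same check can be sketched on a small made-up DataFrame. The column name Time_taken and all values here are assumptions used only to show how a missing value lowers the non-null count.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one value is missing in the Time_taken column.
df = pd.DataFrame({
    "Budget": [100.0, 250.5, 80.0, 120.0],
    "Time_taken": [110.0, np.nan, 95.5, 130.0],
    "Collection": [500, 900, 300, 650],
})

# info() prints the DataFrame type, the index range, and for each column
# its non-null count and data type.
df.info()

# count() gives the non-null count per column; isna().sum() gives the missing count.
non_null_counts = df.count()
missing_counts = df.isna().sum()
print(missing_counts)
```

A column whose non-null count is lower than the number of entries has missing values.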

54
00:04:59,010 --> 00:04:59,370
Then,

55
00:05:00,720 --> 00:05:08,880
at the end, we have the type of each column. You can see most of our variables are of float and int

56
00:05:08,890 --> 00:05:09,130
type.

57
00:05:10,340 --> 00:05:11,840
But there are two variables,

58
00:05:13,080 --> 00:05:20,650
which are 3D available and genre, which are of object type. Object means string.

59
00:05:21,150 --> 00:05:24,900
So these two variables are string type variables.
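One way to list the string type columns is sketched below on made-up data. The column names 3D_available and Genre are assumed spellings, and the exact label shown for text columns (object in the video) may differ depending on your pandas version.

```python
import pandas as pd

# Hypothetical data mixing numeric and string columns.
df = pd.DataFrame({
    "Budget": [100.0, 250.5, 80.0],
    "3D_available": ["YES", "NO", "YES"],
    "Genre": ["Drama", "Comedy", "Thriller"],
    "Collection": [500, 900, 300],
})

# dtypes shows the data type of each column.
print(df.dtypes)

# Excluding numeric types leaves only the string (categorical) columns.
string_columns = df.select_dtypes(exclude="number").columns.tolist()
print(string_columns)
```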

60
00:05:25,620 --> 00:05:31,470
And in the later part of this course, we will see how to convert these categorical string variables

61
00:05:32,040 --> 00:05:36,810
into numerical dummy variables. In the next video,

62
00:05:36,870 --> 00:05:39,730
we will see how to treat missing values.