1 00:00:00,500 --> 00:00:05,720 In this lecture, we will learn how to import our time series file 2 00:00:06,450 --> 00:00:07,170 in Python 3 00:00:09,870 --> 00:00:10,970 for this demonstration. 4 00:00:11,790 --> 00:00:16,360 We will be importing the daily total female births dataset 5 00:00:16,640 --> 00:00:22,400 in Python. This dataset is publicly available on platforms like Kaggle. 6 00:00:23,520 --> 00:00:29,040 And you can also download the CSV file from the resources section of this lecture. 7 00:00:30,570 --> 00:00:32,640 This dataset has two columns. 8 00:00:33,750 --> 00:00:36,960 The first column is for dates, and the second column 9 00:00:37,260 --> 00:00:38,190 is for the number of births. 10 00:00:38,220 --> 00:00:43,410 Now let's import this dataset. 11 00:00:43,560 --> 00:00:47,630 We will use pandas' read_csv method. 12 00:00:48,300 --> 00:00:57,210 So let's just first import pandas, and then we are importing this dataset using the read_csv 13 00:00:57,390 --> 00:01:05,130 method. Since this CSV file is available in the working directory of my Python environment, 14 00:01:05,280 --> 00:01:12,960 I am not providing the complete path; I am just providing the file name. If you want, 15 00:01:13,050 --> 00:01:18,810 you can also provide the whole path of this file instead of just the file name. 16 00:01:19,920 --> 00:01:22,590 And then, my file has headers. 17 00:01:22,640 --> 00:01:33,000 That's why I'm writing header equal to zero; zero means that the headers are available at the 0th index. 18 00:01:33,120 --> 00:01:35,830 That is the first row of my dataset. 19 00:01:36,900 --> 00:01:44,670 If your file does not contain any headers, you have to write header equal to None. 20 00:01:45,360 --> 00:01:53,210 Let's just import this dataset, and we are saving this dataset into an object called data frame. 21 00:01:54,550 --> 00:01:58,830 So we are creating a DataFrame object of pandas with the name df.
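The import step described above might be sketched like this. A small inline sample stands in for the lecture's CSV file (the real file name, taken from the lecture's dataset, would be something like "daily-total-female-births.csv"):

```python
import pandas as pd
from io import StringIO

# Stand-in for the lecture's CSV file; with the real file you would call
# pd.read_csv("daily-total-female-births.csv", header=0) instead.
csv_data = StringIO(
    "Date,Births\n"
    "1959-01-01,35\n"
    "1959-01-02,32\n"
    "1959-01-03,30\n"
)

# header=0 tells pandas the column headers sit in the first row (index 0);
# a file with no header row would need header=None instead.
df = pd.read_csv(csv_data, header=0)
```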
22 00:01:59,880 --> 00:02:00,690 Let's just run it. 23 00:02:05,230 --> 00:02:09,340 Now, to look at the first five records of this newly created DataFrame, 24 00:02:10,450 --> 00:02:12,130 we can just call the head method. 25 00:02:12,730 --> 00:02:15,860 We write df dot head. 26 00:02:22,190 --> 00:02:25,320 And you can see we have the first five rows of the dataset. 27 00:02:26,200 --> 00:02:28,150 The first column is of dates. 28 00:02:28,720 --> 00:02:30,460 The second column is of births. 29 00:02:31,210 --> 00:02:34,840 And we have the index from zero and so on. 30 00:02:35,980 --> 00:02:40,030 Now, let's check the data type of this date column. 31 00:02:41,380 --> 00:02:48,160 We want to know whether pandas is identifying this text as a string or as a date. 32 00:02:52,380 --> 00:02:56,530 For that, we can write df, 33 00:02:59,190 --> 00:03:00,630 then the date column, 34 00:03:01,760 --> 00:03:03,680 and then the dtype. 35 00:03:08,750 --> 00:03:17,710 You can see the output we are getting is O, which stands for object, or in other words, strings. 36 00:03:18,860 --> 00:03:25,730 So currently, pandas is not identifying this date text as a date. 37 00:03:26,750 --> 00:03:32,620 It is identifying this value as a string. For our 38 00:03:32,660 --> 00:03:34,430 future data analysis, 39 00:03:34,880 --> 00:03:41,270 we want this value as dates, not as a string. To do that, 40 00:03:41,390 --> 00:03:47,690 we can use a parameter, that is, parse_dates. Using this parameter, 41 00:03:47,930 --> 00:03:55,340 we can mention the column which contains dates, and pandas will automatically identify the format 42 00:03:55,370 --> 00:03:58,760 of that column and convert it into a date format. 43 00:04:00,800 --> 00:04:02,810 So let's create another DataFrame. 44 00:04:04,070 --> 00:04:05,660 I'm calling it df2.
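The dtype check described above can be reproduced like this (a small inline sample stands in for the lecture's file):

```python
import pandas as pd
from io import StringIO

csv_data = "Date,Births\n1959-01-01,35\n1959-01-02,32\n"

# Without parse_dates, the Date column is loaded as plain strings,
# so its dtype is object ("O"):
df = pd.read_csv(StringIO(csv_data), header=0)
print(df.head())
print(df["Date"].dtype)  # object
```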
45 00:04:07,090 --> 00:04:13,550 And simply, we are writing the same statement, but this time we are using an additional parameter of 46 00:04:13,550 --> 00:04:18,230 parse_dates, and then we are mentioning that it is equal to zero. 47 00:04:18,770 --> 00:04:20,030 That is the first column. 48 00:04:21,440 --> 00:04:24,170 There are dates in that column. 49 00:04:25,630 --> 00:04:29,870 So suppose if my dates are in the second column, I have to write one here. 50 00:04:30,540 --> 00:04:33,500 I have to mention the index which contains the dates. 51 00:04:34,700 --> 00:04:36,050 Let us run this. 52 00:04:40,010 --> 00:04:42,160 So my df2 DataFrame is ready. 53 00:04:42,190 --> 00:04:50,340 Now, to see the first five rows, I can write df2 dot head. 54 00:04:54,230 --> 00:05:01,370 You can see the output is similar, but this time these dates are in datetime format, 55 00:05:01,820 --> 00:05:03,410 not in the form of a string. 56 00:05:03,410 --> 00:05:10,740 To check that, I can just write df2, 57 00:05:12,590 --> 00:05:19,980 then I want to look at the data type of the date column, so I just display the dtype. 58 00:05:23,680 --> 00:05:29,500 You can see, earlier the dtype was O, which stands for string. 59 00:05:30,460 --> 00:05:34,410 This time the dtype is M8; M8 stands for 60 00:05:34,680 --> 00:05:35,890 datetime format. 61 00:05:37,540 --> 00:05:43,780 Now, you can see that pandas has automatically identified the format of these dates. 62 00:05:45,400 --> 00:05:47,150 So here the format of my dates 63 00:05:47,320 --> 00:05:47,860 is YYYY 64 00:05:47,920 --> 00:05:50,860 MM DD. 65 00:05:52,880 --> 00:06:03,880 And pandas has automatically identified that format for me. Instead of that, if the format was MM DD YYYY, 66 00:06:04,130 --> 00:06:07,780 in that case also, pandas would have automatically identified the format.
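The parse_dates step above might look like this (again with a small inline sample standing in for the lecture's file):

```python
import pandas as pd
from io import StringIO

csv_data = "Date,Births\n1959-01-01,35\n1959-01-02,32\n"

# parse_dates=[0] tells pandas that column 0 holds dates; pandas infers
# the format and stores the column as datetime64[ns] ("M8[ns]").
df2 = pd.read_csv(StringIO(csv_data), header=0, parse_dates=[0])
print(df2["Date"].dtype)
```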
67 00:06:10,130 --> 00:06:14,700 But in some cases, pandas has some trouble in identifying the format. 68 00:06:15,140 --> 00:06:17,890 In those cases, you can use the date_parser 69 00:06:18,620 --> 00:06:19,810 parameter as well. 70 00:06:22,770 --> 00:06:32,610 So only in cases where you are finding problems in loading date and datetime data using this 71 00:06:32,610 --> 00:06:40,320 method alone, in those cases you have to use date_parser. For date_parser, 72 00:06:41,220 --> 00:06:46,620 you have to mention the format of your data that is present in the columns. 73 00:06:48,510 --> 00:06:58,440 So, for example, in this case, my data is available in the form of year, month, and day. 74 00:06:58,980 --> 00:07:01,800 Then I have to mention the format of such dates. 75 00:07:04,340 --> 00:07:12,340 For example, here, this %Y, %m, and %d means that if my date is 1999 76 00:07:12,340 --> 00:07:21,560 slash 12 slash 01, then we have a space, then the hour. 77 00:07:22,520 --> 00:07:24,770 Then there is a colon. 78 00:07:25,280 --> 00:07:27,950 Then there is a minute, and then there is a second. 79 00:07:28,580 --> 00:07:35,450 So if my date is in this form, I can mention the format in this way. 80 00:07:37,820 --> 00:07:45,830 So for date_parser, you have to first create a date parser function where you have to mention the format, 81 00:07:46,310 --> 00:07:52,280 and then pass this function as an argument for the date underscore parser parameter. 82 00:07:55,780 --> 00:08:04,210 You can also look at the link that I have provided you to look at all the directives that are available 83 00:08:04,300 --> 00:08:05,640 with this date parser. 84 00:08:08,470 --> 00:08:15,360 So suppose if your months are in this format, you have to write %b instead of 85 00:08:15,460 --> 00:08:15,630 %m. 86 00:08:16,570 --> 00:08:18,650 So just look at all these directives.
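A sketch of explicit format parsing, assuming dates with a time component as in the example above. Note that the date_parser parameter used in the lecture was deprecated in recent pandas versions; a version-neutral equivalent is to load the column as text and convert it with pd.to_datetime and an explicit format string (the sample values here are made up):

```python
import pandas as pd
from io import StringIO

# Dates in "%Y/%m/%d %H:%M:%S" form, matching the lecture's example.
csv_data = "Date,Births\n1999/12/01 08:30:15,35\n1999/12/02 09:45:00,32\n"

df = pd.read_csv(StringIO(csv_data), header=0)
# strftime directives: %Y = 4-digit year, %m = month number,
# %d = day, %H:%M:%S = hour:minute:second. Use %b for abbreviated
# month names ("Dec") instead of %m.
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d %H:%M:%S")
```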
87 00:08:19,660 --> 00:08:23,500 And create your own date parser function. 88 00:08:23,890 --> 00:08:28,680 And then use it to import your dates in datetime format. 89 00:08:30,970 --> 00:08:33,820 But for most cases, you are not going to need it. 90 00:08:34,910 --> 00:08:39,960 Pandas will automatically identify the datetime format for you. 91 00:08:44,500 --> 00:08:51,490 Now, another way to import datetime data is by using series instead of DataFrames. 92 00:08:52,450 --> 00:08:55,180 So here we were creating DataFrames. 93 00:08:55,720 --> 00:08:59,290 Our indexes were in the form of numerical data. 94 00:09:01,410 --> 00:09:10,710 But there is one more method to import a time series, where we will have dates or datetime values as our 95 00:09:10,710 --> 00:09:11,370 index, 96 00:09:11,820 --> 00:09:16,590 and then a single column for the values of the series. 97 00:09:18,810 --> 00:09:24,330 So, for example, let's just import this same data as a series as well. 98 00:09:26,160 --> 00:09:29,610 So I am creating another object that we are calling series. 99 00:09:30,880 --> 00:09:34,860 And here also I am using the same method, pd dot read_csv. 100 00:09:35,670 --> 00:09:45,450 Then we have to mention the file name, then the header, and then we are using parse_dates equal to zero, since 101 00:09:45,540 --> 00:09:48,060 our first column contains dates. 102 00:09:48,810 --> 00:09:54,600 And then, for a series, we need to make our datetime data our index column. 103 00:09:55,590 --> 00:09:58,980 So I am writing index underscore col equal to zero. 104 00:09:59,540 --> 00:10:03,410 That is, our first column is the index for our series. 105 00:10:04,020 --> 00:10:12,090 And then, to convert this DataFrame into a series, I can write squeeze equal to True. Let's just run this. 106 00:10:13,350 --> 00:10:16,980 So my series is ready. 107 00:10:20,040 --> 00:10:22,020 Let's look at the first five values.
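The series import described above might be sketched like this. One caveat about versions: the squeeze=True parameter of read_csv used in the lecture was removed in pandas 2.0, so this sketch calls the .squeeze("columns") method on the result instead, which works across versions:

```python
import pandas as pd
from io import StringIO

csv_data = "Date,Births\n1959-01-01,35\n1959-01-02,32\n1959-01-03,30\n"

# index_col=0 makes the parsed date column the index; squeezing the
# remaining one-column DataFrame along "columns" yields a Series.
series = pd.read_csv(
    StringIO(csv_data), header=0, parse_dates=[0], index_col=0
).squeeze("columns")
print(series.head())
```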
108 00:10:30,300 --> 00:10:32,530 You can see these are the first five values. 109 00:10:34,110 --> 00:10:39,540 You can notice the series name is Births, but there is no individual column name. 110 00:10:40,410 --> 00:10:44,000 The values that are contained in this series are births. 111 00:10:44,020 --> 00:10:48,510 So instead of a column name, we have our series name as Births. 112 00:10:51,450 --> 00:10:57,570 The dtype is integer, since the second column contains integer values. 113 00:10:59,280 --> 00:11:03,780 And for this series, we have datetime values as our indexes. 114 00:11:04,170 --> 00:11:09,420 And the second column, that is the births column, is the values of our series. 115 00:11:10,350 --> 00:11:17,520 So the only difference between a DataFrame and a series is that in a DataFrame we have numerical indexes 116 00:11:17,520 --> 00:11:20,460 from zero to the length of our DataFrame. 117 00:11:21,150 --> 00:11:22,170 We have two columns. 118 00:11:22,420 --> 00:11:23,970 The first one is dates, 119 00:11:24,060 --> 00:11:27,150 and the second one is births. In a series, 120 00:11:27,270 --> 00:11:35,910 we have indexes as the datetime series data, and the values corresponding to those indexes as the values 121 00:11:36,000 --> 00:11:37,310 of the series. 122 00:11:39,420 --> 00:11:43,300 Now let's look at some different attributes of the DataFrame 123 00:11:43,500 --> 00:11:45,240 and the series that we have created. 124 00:11:48,210 --> 00:11:53,210 To look at the size of our DataFrame or series, we can write the dot shape attribute. 125 00:11:53,690 --> 00:11:57,400 So, series dot shape. 126 00:11:59,640 --> 00:12:02,060 Series is our object name here. 127 00:12:04,490 --> 00:12:06,720 We are just using the dot shape attribute. 128 00:12:07,900 --> 00:12:10,960 Similarly, for the DataFrame, we write df2, 129 00:12:14,060 --> 00:12:14,980 using dot shape. 130 00:12:18,080 --> 00:12:21,740 You can see we have 365 values in our series.
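The shape comparison above can be sketched like this (a two-row inline sample stands in for the lecture's 365-row file, where the shapes would be (365,) and (365, 2)):

```python
import pandas as pd
from io import StringIO

csv_data = "Date,Births\n1959-01-01,35\n1959-01-02,32\n"

df2 = pd.read_csv(StringIO(csv_data), header=0, parse_dates=[0])
series = pd.read_csv(StringIO(csv_data), header=0, parse_dates=[0],
                     index_col=0).squeeze("columns")

# A Series reports only its length; a DataFrame reports (rows, columns).
print(series.shape)  # (2,)
print(df2.shape)     # (2, 2)
```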
131 00:12:22,950 --> 00:12:25,550 Since a series only has one single column 132 00:12:25,730 --> 00:12:29,170 (you cannot have more than one column in your series), 133 00:12:30,070 --> 00:12:32,420 that's why we are not getting the number of columns. 134 00:12:33,260 --> 00:12:38,930 Whereas for the DataFrame that we have created, we have two columns, the first one for the date and the second 135 00:12:38,930 --> 00:12:39,920 one for the births. 136 00:12:40,660 --> 00:12:43,160 That's why we are getting 365 comma 2. 137 00:12:43,430 --> 00:12:47,570 There are 365 rows and two columns. 138 00:12:51,350 --> 00:12:57,070 Now let's look at how to get a subset of our series or our DataFrame. 139 00:13:00,430 --> 00:13:10,030 So for a series, if I want the values of January of 1959, I can just write series, and I can mention the 140 00:13:10,120 --> 00:13:11,320 year and the month. 141 00:13:11,680 --> 00:13:21,010 You remember, this is the advantage of using the datetime format instead of a string format. Here, 142 00:13:21,140 --> 00:13:22,850 I'm not mentioning the day value. 143 00:13:23,560 --> 00:13:27,970 I will automatically get all the values of January 1959. 144 00:13:28,600 --> 00:13:36,580 If you run this, you can see that I'm getting all the values from day one to day 31. 145 00:13:38,380 --> 00:13:40,660 This is the way to slice your data 146 00:13:40,810 --> 00:13:50,480 in the case of a series. If you just write 1959 here, you will get data for all the months and all the days. 147 00:13:54,060 --> 00:13:58,650 Similarly, for a DataFrame, we can write conditions. 148 00:14:00,600 --> 00:14:03,550 So here we want the values from the DataFrame 149 00:14:04,530 --> 00:14:06,520 where the DataFrame date 150 00:14:06,750 --> 00:14:08,370 value is more than this, 151 00:14:08,610 --> 00:14:11,550 and the DataFrame date value is less than this. 152 00:14:13,380 --> 00:14:19,230 So for these 21 days, from 2nd Jan to 21st Jan.
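The partial date slicing described above can be sketched like this (a short synthetic series with a datetime index stands in for the lecture's full-year data):

```python
import pandas as pd

# Two months of daily stand-in values on a datetime index.
idx = pd.date_range("1959-01-01", "1959-02-28", freq="D")
series = pd.Series(range(len(idx)), index=idx, name="Births")

# With a datetime index, a partial date string selects a whole period:
jan = series.loc["1959-01"]     # every day of January 1959
year = series.loc["1959"]       # every row of the year
print(len(jan))                 # 31
```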
153 00:14:20,700 --> 00:14:23,400 If I want to get those values, I can write it like this. 154 00:14:24,720 --> 00:14:28,330 Here also, this is the advantage of using the datetime format. 155 00:14:29,670 --> 00:14:37,470 If we were using string format, this would be just a string for pandas, and it would not be able to identify 156 00:14:37,500 --> 00:14:42,060 whether January 2 is between these two values or not. 157 00:14:42,270 --> 00:14:51,480 But since we are using the datetime format, pandas can identify that January 5 lies between 1st Jan and 158 00:14:51,630 --> 00:14:52,460 21st Jan. 159 00:14:54,120 --> 00:14:55,530 Let's run this as well. 160 00:14:56,790 --> 00:15:00,360 You can see the result is as expected. 161 00:15:00,870 --> 00:15:05,040 We have values from 2nd Jan till 21st Jan. 162 00:15:06,880 --> 00:15:12,330 So in the case of a series, you can just write the month value and you will get all the values. 163 00:15:12,600 --> 00:15:21,870 But for the DataFrame, you have to mention the range of dates between which you want the data. Now, to 164 00:15:21,870 --> 00:15:26,410 find important statistics about your DataFrame or series, 165 00:15:27,120 --> 00:15:29,270 you can use the dot describe method. 166 00:15:29,910 --> 00:15:36,000 So just write your series or DataFrame name and use the dot describe method here. 167 00:15:36,390 --> 00:15:44,550 You will get the values of the count, mean, standard deviation, minimum value, the value present 168 00:15:44,580 --> 00:15:47,360 at the 25th percentile, the value present at 169 00:15:47,400 --> 00:15:51,930 the 50th percentile, and the value present at the 75th percentile, 170 00:15:52,680 --> 00:15:54,690 and the maximum value of your series. 171 00:15:56,020 --> 00:16:00,990 So for our series, the total number of observations is 365. 172 00:16:01,800 --> 00:16:06,370 The mean of all these values is forty-one point nine eight. 173 00:16:06,960 --> 00:16:09,090 The standard deviation is seven point three.
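The date range filter and the describe call above can be sketched like this (one month of synthetic data stands in for the lecture's dataset):

```python
import pandas as pd

# 31 days of stand-in data for January 1959.
df2 = pd.DataFrame({
    "Date": pd.date_range("1959-01-01", periods=31, freq="D"),
    "Births": range(30, 61),
})

# Because Date is datetime64, the comparison operators understand date
# strings; this keeps 2nd Jan through 21st Jan (20 rows).
mask = (df2["Date"] > "1959-01-01") & (df2["Date"] < "1959-01-22")
subset = df2[mask]
print(len(subset))  # 20

# describe() summarises count, mean, std, min, quartiles, and max.
stats = df2["Births"].describe()
print(stats)
```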
174 00:16:09,870 --> 00:16:13,890 The minimum value is twenty-three, and the maximum value is 73. 175 00:16:14,370 --> 00:16:19,770 And then we have the 25th and 75th percentile values as well. 176 00:16:21,000 --> 00:16:26,220 The name of this series is Births, and the dtype is float. 177 00:16:27,150 --> 00:16:36,810 So these are the stats for our series. Similarly, for the DataFrame, we can again use the dot describe method. Here, 178 00:16:36,810 --> 00:16:41,410 also, you will get all these values for your columns which contain numerical values. 179 00:16:42,630 --> 00:16:47,280 So the statistics are the same, since we have loaded the same data. 180 00:16:48,660 --> 00:16:56,170 So this is the way to load your data and to get some basic statistics about your data. 181 00:16:58,570 --> 00:17:02,870 Now, to summarize, you can load your data as a DataFrame. 182 00:17:03,320 --> 00:17:12,590 And in case you have only one column for your values, you can also import it as a series. For 183 00:17:12,590 --> 00:17:13,310 dates, 184 00:17:13,460 --> 00:17:18,610 you need to take special care to import them in a datetime format, 185 00:17:19,130 --> 00:17:28,880 using parse_dates and mentioning the column name. For series and DataFrames containing time series data, 186 00:17:28,940 --> 00:17:36,830 you can use all the basic methods of pandas DataFrames, such as head for looking at the first five 187 00:17:36,830 --> 00:17:45,590 values, shape for looking at the size of your DataFrame, and the describe method to look at some basic 188 00:17:45,710 --> 00:17:47,550 statistics about your data. 189 00:17:48,980 --> 00:17:50,270 That's all for this video. 190 00:17:51,440 --> 00:17:51,810 Thank you.