1 00:00:00,500 --> 00:00:05,720 In this lecture, we will learn how to import our time series file 2 00:00:06,450 --> 00:00:07,170 in Python 3 00:00:09,870 --> 00:00:10,970 for this demonstration. 4 00:00:11,790 --> 00:00:16,360 We will be importing the daily total female births dataset 5 00:00:16,640 --> 00:00:22,400 in Python. This dataset is publicly available on platforms like Kaggle. 6 00:00:23,520 --> 00:00:29,040 And you can also download the CSV file from the resources section of this lecture. 7 00:00:30,570 --> 00:00:32,640 This dataset has two columns. 8 00:00:33,750 --> 00:00:36,960 The first column is for dates, and the second column 9 00:00:37,260 --> 00:00:38,190 is for the number of births. 10 00:00:38,220 --> 00:00:43,410 Now let's import this dataset. 11 00:00:43,560 --> 00:00:47,630 We will use pandas' read_csv method. 12 00:00:48,300 --> 00:00:57,210 So let's just first import pandas, and then we are importing this dataset using the read_csv 13 00:00:57,390 --> 00:01:05,130 method. Since this CSV file is available in the working directory of my Python environment, 14 00:01:05,280 --> 00:01:12,960 I am not providing the complete path; I am just providing the file name. If you want, 15 00:01:13,050 --> 00:01:18,810 you can also provide the whole path of this file instead of just the file name. 16 00:01:19,920 --> 00:01:22,590 And then, my file has headers. 17 00:01:22,640 --> 00:01:33,000 That's why I'm writing header equal to zero; zero means that the headers are available at the 0th index. 18 00:01:33,120 --> 00:01:35,830 That is the first row of my dataset. 19 00:01:36,900 --> 00:01:44,670 If your file does not contain any headers, you have to write header equal to None. 20 00:01:45,360 --> 00:01:53,210 Let's just import this dataset, and we are saving this dataset into an object called data frame. 21 00:01:54,550 --> 00:01:58,830 So we are creating a DataFrame object of pandas with the name df.
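The import step described above might be sketched like this. A small inline sample stands in for the lecture's CSV file (the real file name, taken from the lecture's dataset, would be something like "daily-total-female-births.csv"):

```python
import pandas as pd
from io import StringIO

# Stand-in for the lecture's CSV file; with the real file you would call
# pd.read_csv("daily-total-female-births.csv", header=0) instead.
csv_data = StringIO(
    "Date,Births\n"
    "1959-01-01,35\n"
    "1959-01-02,32\n"
    "1959-01-03,30\n"
)

# header=0 tells pandas the column headers sit in the first row (index 0);
# a file with no header row would need header=None instead.
df = pd.read_csv(csv_data, header=0)
```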
22 00:01:59,880 --> 00:02:00,690 Let's just run it. 23 00:02:05,230 --> 00:02:09,340 Now, to look at the first five records of this newly created DataFrame, 24 00:02:10,450 --> 00:02:12,130 we can just call the head method. 25 00:02:12,730 --> 00:02:15,860 We write df dot head. 26 00:02:22,190 --> 00:02:25,320 And you can see we have the first five rows of the dataset. 27 00:02:26,200 --> 00:02:28,150 The first column is of dates. 28 00:02:28,720 --> 00:02:30,460 The second column is of births. 29 00:02:31,210 --> 00:02:34,840 And we have the index from zero and so on. 30 00:02:35,980 --> 00:02:40,030 Now, let's check the data type of this date column. 31 00:02:41,380 --> 00:02:48,160 We want to know whether pandas is identifying this text as a string or as a date. 32 00:02:52,380 --> 00:02:56,530 For that, we can write df, 33 00:02:59,190 --> 00:03:00,630 then the date column, 34 00:03:01,760 --> 00:03:03,680 and then the dtype. 35 00:03:08,750 --> 00:03:17,710 You can see the output we are getting is O, which stands for object, or in other words, strings. 36 00:03:18,860 --> 00:03:25,730 So currently, pandas is not identifying this date text as a date. 37 00:03:26,750 --> 00:03:32,620 It is identifying this value as a string. For our 38 00:03:32,660 --> 00:03:34,430 future data analysis, 39 00:03:34,880 --> 00:03:41,270 we want this value as dates, not as a string. To do that, 40 00:03:41,390 --> 00:03:47,690 we can use a parameter, that is, parse_dates. Using this parameter, 41 00:03:47,930 --> 00:03:55,340 we can mention the column which contains dates, and pandas will automatically identify the format 42 00:03:55,370 --> 00:03:58,760 of that column and convert it into a date format. 43 00:04:00,800 --> 00:04:02,810 So let's create another DataFrame. 44 00:04:04,070 --> 00:04:05,660 I'm calling it df2.
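The dtype check described above can be reproduced like this (a small inline sample stands in for the lecture's file):

```python
import pandas as pd
from io import StringIO

csv_data = "Date,Births\n1959-01-01,35\n1959-01-02,32\n"

# Without parse_dates, the Date column is loaded as plain strings,
# so its dtype is object ("O"):
df = pd.read_csv(StringIO(csv_data), header=0)
print(df.head())
print(df["Date"].dtype)  # object
```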
45 00:04:07,090 --> 00:04:13,550 And simply, we are writing the same statement, but this time we are using an additional parameter of 46 00:04:13,550 --> 00:04:18,230 parse_dates, and then we are mentioning that it is equal to zero. 47 00:04:18,770 --> 00:04:20,030 That is the first column. 48 00:04:21,440 --> 00:04:24,170 There are dates in that column. 49 00:04:25,630 --> 00:04:29,870 So suppose if my dates are in the second column, I have to write one here. 50 00:04:30,540 --> 00:04:33,500 I have to mention the index which contains the dates. 51 00:04:34,700 --> 00:04:36,050 Let us run this. 52 00:04:40,010 --> 00:04:42,160 So my df2 DataFrame is ready. 53 00:04:42,190 --> 00:04:50,340 Now, to see the first five rows, I can write df2 dot head. 54 00:04:54,230 --> 00:05:01,370 You can see the output is similar, but this time these dates are in datetime format, 55 00:05:01,820 --> 00:05:03,410 not in the form of a string. 56 00:05:03,410 --> 00:05:10,740 To check that, I can just write df2, 57 00:05:12,590 --> 00:05:19,980 then I want to look at the data type of the date column, so I just display the dtype. 58 00:05:23,680 --> 00:05:29,500 You can see, earlier the dtype was O, which stands for string. 59 00:05:30,460 --> 00:05:34,410 This time the dtype is M8; M8 stands for 60 00:05:34,680 --> 00:05:35,890 datetime format. 61 00:05:37,540 --> 00:05:43,780 Now, you can see that pandas has automatically identified the format of these dates. 62 00:05:45,400 --> 00:05:47,150 So here the format of my dates 63 00:05:47,320 --> 00:05:47,860 is YYYY 64 00:05:47,920 --> 00:05:50,860 MM DD. 65 00:05:52,880 --> 00:06:03,880 And pandas has automatically identified that format for me. Instead of that, if the format was MM DD YYYY, 66 00:06:04,130 --> 00:06:07,780 in that case also, pandas would have automatically identified the format.
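The parse_dates step above might look like this (again with a small inline sample standing in for the lecture's file):

```python
import pandas as pd
from io import StringIO

csv_data = "Date,Births\n1959-01-01,35\n1959-01-02,32\n"

# parse_dates=[0] tells pandas that column 0 holds dates; pandas infers
# the format and stores the column as datetime64[ns] ("M8[ns]").
df2 = pd.read_csv(StringIO(csv_data), header=0, parse_dates=[0])
print(df2["Date"].dtype)
```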
67 00:06:10,130 --> 00:06:14,700 But in some cases, pandas has some trouble in identifying the format. 68 00:06:15,140 --> 00:06:17,890 In those cases, you can use the date_parser 69 00:06:18,620 --> 00:06:19,810 parameter as well. 70 00:06:22,770 --> 00:06:32,610 So only in cases where you are finding problems in loading date and datetime data using this 71 00:06:32,610 --> 00:06:40,320 method alone, in those cases you have to use date_parser. For date_parser, 72 00:06:41,220 --> 00:06:46,620 you have to mention the format of your data that is present in the columns. 73 00:06:48,510 --> 00:06:58,440 So, for example, in this case, my data is available in the form of year, month, and day. 74 00:06:58,980 --> 00:07:01,800 Then I have to mention the format of such dates. 75 00:07:04,340 --> 00:07:12,340 For example, here, this %Y, %m, and %d means that if my date is 1999 76 00:07:12,340 --> 00:07:21,560 slash 12 slash 01, then we have a space, then the hour. 77 00:07:22,520 --> 00:07:24,770 Then there is a colon. 78 00:07:25,280 --> 00:07:27,950 Then there is a minute, and then there is a second. 79 00:07:28,580 --> 00:07:35,450 So if my date is in this form, I can mention the format in this way. 80 00:07:37,820 --> 00:07:45,830 So for date_parser, you have to first create a date parser function where you have to mention the format, 81 00:07:46,310 --> 00:07:52,280 and then pass this function as an argument for the date underscore parser parameter. 82 00:07:55,780 --> 00:08:04,210 You can also look at the link that I have provided you to look at all the directives that are available 83 00:08:04,300 --> 00:08:05,640 with this date parser. 84 00:08:08,470 --> 00:08:15,360 So suppose if your months are in this format, you have to write %b instead of 85 00:08:15,460 --> 00:08:15,630 %m. 86 00:08:16,570 --> 00:08:18,650 So just look at all these directives.
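A sketch of explicit format parsing, assuming dates with a time component as in the example above. Note that the date_parser parameter used in the lecture was deprecated in recent pandas versions; a version-neutral equivalent is to load the column as text and convert it with pd.to_datetime and an explicit format string (the sample values here are made up):

```python
import pandas as pd
from io import StringIO

# Dates in "%Y/%m/%d %H:%M:%S" form, matching the lecture's example.
csv_data = "Date,Births\n1999/12/01 08:30:15,35\n1999/12/02 09:45:00,32\n"

df = pd.read_csv(StringIO(csv_data), header=0)
# strftime directives: %Y = 4-digit year, %m = month number,
# %d = day, %H:%M:%S = hour:minute:second. Use %b for abbreviated
# month names ("Dec") instead of %m.
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d %H:%M:%S")
```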
87 00:08:19,660 --> 00:08:23,500 And create your own date parser function. 88 00:08:23,890 --> 00:08:28,680 And then use it to import your dates in datetime format. 89 00:08:30,970 --> 00:08:33,820 But for most cases, you are not going to need it. 90 00:08:34,910 --> 00:08:39,960 Pandas will automatically identify the datetime format for you. 91 00:08:44,500 --> 00:08:51,490 Now, another way to import datetime data is by using series instead of DataFrames. 92 00:08:52,450 --> 00:08:55,180 So here we were creating DataFrames. 93 00:08:55,720 --> 00:08:59,290 Our indexes were in the form of numerical data. 94 00:09:01,410 --> 00:09:10,710 But there is one more method to import a time series, where we will have dates or datetime values as our 95 00:09:10,710 --> 00:09:11,370 index, 96 00:09:11,820 --> 00:09:16,590 and then a single column for the values of the series. 97 00:09:18,810 --> 00:09:24,330 So, for example, let's just import this same data as a series as well. 98 00:09:26,160 --> 00:09:29,610 So I am creating another object that we are calling series. 99 00:09:30,880 --> 00:09:34,860 And here also I am using the same method, pd dot read_csv. 100 00:09:35,670 --> 00:09:45,450 Then we have to mention the file name, then the header, and then we are using parse_dates equal to zero, since 101 00:09:45,540 --> 00:09:48,060 our first column contains dates. 102 00:09:48,810 --> 00:09:54,600 And then, for a series, we need to make our datetime data our index column. 103 00:09:55,590 --> 00:09:58,980 So I am writing index underscore col equal to zero. 104 00:09:59,540 --> 00:10:03,410 That is, our first column is the index for our series. 105 00:10:04,020 --> 00:10:12,090 And then, to convert this DataFrame into a series, I can write squeeze equal to True. Let's just run this. 106 00:10:13,350 --> 00:10:16,980 So my series is ready. 107 00:10:20,040 --> 00:10:22,020 Let's look at the first five values.
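The series import described above might be sketched like this. One caveat about versions: the squeeze=True parameter of read_csv used in the lecture was removed in pandas 2.0, so this sketch calls the .squeeze("columns") method on the result instead, which works across versions:

```python
import pandas as pd
from io import StringIO

csv_data = "Date,Births\n1959-01-01,35\n1959-01-02,32\n1959-01-03,30\n"

# index_col=0 makes the parsed date column the index; squeezing the
# remaining one-column DataFrame along "columns" yields a Series.
series = pd.read_csv(
    StringIO(csv_data), header=0, parse_dates=[0], index_col=0
).squeeze("columns")
print(series.head())
```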
108 00:10:30,300 --> 00:10:32,530 You can see these are the first five values. 109 00:10:34,110 --> 00:10:39,540 You can notice the series name is Births, but there is no individual column name. 110 00:10:40,410 --> 00:10:44,000 The values that are contained in this series are births. 111 00:10:44,020 --> 00:10:48,510 So instead of a column name, we have our series name as Births. 112 00:10:51,450 --> 00:10:57,570 The dtype is integer, since the second column contains integer values. 113 00:10:59,280 --> 00:11:03,780 And for this series, we have datetime values as our indexes. 114 00:11:04,170 --> 00:11:09,420 And the second column, that is the births column, is the values of our series. 115 00:11:10,350 --> 00:11:17,520 So the only difference between a DataFrame and a series is that in a DataFrame we have numerical indexes 116 00:11:17,520 --> 00:11:20,460 from zero to the length of our DataFrame. 117 00:11:21,150 --> 00:11:22,170 We have two columns. 118 00:11:22,420 --> 00:11:23,970 The first one is dates, 119 00:11:24,060 --> 00:11:27,150 and the second one is births. In a series, 120 00:11:27,270 --> 00:11:35,910 we have indexes as the datetime series data, and the values corresponding to those indexes as the values 121 00:11:36,000 --> 00:11:37,310 of the series. 122 00:11:39,420 --> 00:11:43,300 Now let's look at some different attributes of the DataFrame 123 00:11:43,500 --> 00:11:45,240 and the series that we have created. 124 00:11:48,210 --> 00:11:53,210 To look at the size of our DataFrame or series, we can write the dot shape attribute. 125 00:11:53,690 --> 00:11:57,400 So, series dot shape. 126 00:11:59,640 --> 00:12:02,060 Series is our object name here. 127 00:12:04,490 --> 00:12:06,720 We are just using the dot shape attribute. 128 00:12:07,900 --> 00:12:10,960 Similarly, for the DataFrame, we write df2, 129 00:12:14,060 --> 00:12:14,980 using dot shape. 130 00:12:18,080 --> 00:12:21,740 You can see we have 365 values in our series.
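The shape comparison above can be sketched like this (a two-row inline sample stands in for the lecture's 365-row file, where the shapes would be (365,) and (365, 2)):

```python
import pandas as pd
from io import StringIO

csv_data = "Date,Births\n1959-01-01,35\n1959-01-02,32\n"

df2 = pd.read_csv(StringIO(csv_data), header=0, parse_dates=[0])
series = pd.read_csv(StringIO(csv_data), header=0, parse_dates=[0],
                     index_col=0).squeeze("columns")

# A Series reports only its length; a DataFrame reports (rows, columns).
print(series.shape)  # (2,)
print(df2.shape)     # (2, 2)
```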
131 00:12:22,950 --> 00:12:25,550 Since a series only has one single column 132 00:12:25,730 --> 00:12:29,170 (you cannot have more than one column in your series), 133 00:12:30,070 --> 00:12:32,420 that's why we are not getting the number of columns. 134 00:12:33,260 --> 00:12:38,930 Whereas for the DataFrame that we have created, we have two columns, the first one for the date and the second 135 00:12:38,930 --> 00:12:39,920 one for the births. 136 00:12:40,660 --> 00:12:43,160 That's why we are getting 365 comma 2. 137 00:12:43,430 --> 00:12:47,570 There are 365 rows and two columns. 138 00:12:51,350 --> 00:12:57,070 Now let's look at how to get a subset of our series or our DataFrame. 139 00:13:00,430 --> 00:13:10,030 So for a series, if I want the values of January of 1959, I can just write series, and I can mention the 140 00:13:10,120 --> 00:13:11,320 year and the month. 141 00:13:11,680 --> 00:13:21,010 You remember, this is the advantage of using the datetime format instead of a string format. Here, 142 00:13:21,140 --> 00:13:22,850 I'm not mentioning the day value. 143 00:13:23,560 --> 00:13:27,970 I will automatically get all the values of January 1959. 144 00:13:28,600 --> 00:13:36,580 If you run this, you can see that I'm getting all the values from day one to day 31. 145 00:13:38,380 --> 00:13:40,660 This is the way to slice your data 146 00:13:40,810 --> 00:13:50,480 in the case of a series. If you just write 1959 here, you will get data for all the months and all the days. 147 00:13:54,060 --> 00:13:58,650 Similarly, for a DataFrame, we can write conditions. 148 00:14:00,600 --> 00:14:03,550 So here we want the values from the DataFrame 149 00:14:04,530 --> 00:14:06,520 where the DataFrame date 150 00:14:06,750 --> 00:14:08,370 value is more than this, 151 00:14:08,610 --> 00:14:11,550 and the DataFrame date value is less than this. 152 00:14:13,380 --> 00:14:19,230 So for these 21 days, from 2nd Jan to 21st Jan.
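The partial date slicing described above can be sketched like this (a short synthetic series with a datetime index stands in for the lecture's full-year data):

```python
import pandas as pd

# Two months of daily stand-in values on a datetime index.
idx = pd.date_range("1959-01-01", "1959-02-28", freq="D")
series = pd.Series(range(len(idx)), index=idx, name="Births")

# With a datetime index, a partial date string selects a whole period:
jan = series.loc["1959-01"]     # every day of January 1959
year = series.loc["1959"]       # every row of the year
print(len(jan))                 # 31
```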
153 00:14:20,700 --> 00:14:23,400 If I want to get those values, I can write it like this. 154 00:14:24,720 --> 00:14:28,330 Here also, this is the advantage of using the datetime format. 155 00:14:29,670 --> 00:14:37,470 If we were using string format, this would be just a string for pandas, and it would not be able to identify 156 00:14:37,500 --> 00:14:42,060 whether January 2 is between these two values or not. 157 00:14:42,270 --> 00:14:51,480 But since we are using the datetime format, pandas can identify that January 5 lies between 1st Jan and 158 00:14:51,630 --> 00:14:52,460 21st Jan. 159 00:14:54,120 --> 00:14:55,530 Let's run this as well. 160 00:14:56,790 --> 00:15:00,360 You can see the result is as expected. 161 00:15:00,870 --> 00:15:05,040 We have values from 2nd Jan till 21st Jan. 162 00:15:06,880 --> 00:15:12,330 So in the case of a series, you can just write the month value and you will get all the values. 163 00:15:12,600 --> 00:15:21,870 But for the DataFrame, you have to mention the range of dates between which you want the data. Now, to 164 00:15:21,870 --> 00:15:26,410 find important statistics about your DataFrame or series, 165 00:15:27,120 --> 00:15:29,270 you can use the dot describe method. 166 00:15:29,910 --> 00:15:36,000 So just write your series or DataFrame name and use the dot describe method here. 167 00:15:36,390 --> 00:15:44,550 You will get the values of the count, mean, standard deviation, minimum value, the value present 168 00:15:44,580 --> 00:15:47,360 at the 25th percentile, the value present at 169 00:15:47,400 --> 00:15:51,930 the 50th percentile, and the value present at the 75th percentile, 170 00:15:52,680 --> 00:15:54,690 and the maximum value of your series. 171 00:15:56,020 --> 00:16:00,990 So for our series, the total number of observations is 365. 172 00:16:01,800 --> 00:16:06,370 The mean of all these values is forty-one point nine eight. 173 00:16:06,960 --> 00:16:09,090 The standard deviation is seven point three.
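The date range filter and the describe call above can be sketched like this (one month of synthetic data stands in for the lecture's dataset):

```python
import pandas as pd

# 31 days of stand-in data for January 1959.
df2 = pd.DataFrame({
    "Date": pd.date_range("1959-01-01", periods=31, freq="D"),
    "Births": range(30, 61),
})

# Because Date is datetime64, the comparison operators understand date
# strings; this keeps 2nd Jan through 21st Jan (20 rows).
mask = (df2["Date"] > "1959-01-01") & (df2["Date"] < "1959-01-22")
subset = df2[mask]
print(len(subset))  # 20

# describe() summarises count, mean, std, min, quartiles, and max.
stats = df2["Births"].describe()
print(stats)
```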
174 00:16:09,870 --> 00:16:13,890 The minimum value is twenty-three, and the maximum value is 73. 175 00:16:14,370 --> 00:16:19,770 And then we have the 25th and 75th percentile values as well. 176 00:16:21,000 --> 00:16:26,220 The name of this series is Births, and the dtype is float. 177 00:16:27,150 --> 00:16:36,810 So these are the stats for our series. Similarly, for the DataFrame, we can again use the dot describe method. Here, 178 00:16:36,810 --> 00:16:41,410 also, you will get all these values for your columns which contain numerical values. 179 00:16:42,630 --> 00:16:47,280 So the statistics are the same, since we have loaded the same data. 180 00:16:48,660 --> 00:16:56,170 So this is the way to load your data and to get some basic statistics about your data. 181 00:16:58,570 --> 00:17:02,870 Now, to summarize, you can load your data as a DataFrame. 182 00:17:03,320 --> 00:17:12,590 And in case you have only one column for your values, you can also import it as a series. For 183 00:17:12,590 --> 00:17:13,310 dates, 184 00:17:13,460 --> 00:17:18,610 you need to take special care to import them in a datetime format, 185 00:17:19,130 --> 00:17:28,880 using parse_dates and mentioning the column name. For series and DataFrames containing time series data, 186 00:17:28,940 --> 00:17:36,830 you can use all the basic methods of pandas DataFrames, such as head for looking at the first five 187 00:17:36,830 --> 00:17:45,590 values, shape for looking at the size of your DataFrame, and the describe method to look at some basic 188 00:17:45,710 --> 00:17:47,550 statistics about your data. 189 00:17:48,980 --> 00:17:50,270 That's all for this video. 190 00:17:51,440 --> 00:17:51,810 Thank you.