1 00:00:00,870 --> 00:00:07,080 In this video, we will learn about frequency distributions, and once we know about frequency distributions, 2 00:00:07,080 --> 00:00:09,180 we will learn how to draw a histogram. 3 00:00:12,090 --> 00:00:17,340 So when we build a frequency distribution, we are basically trying to summarize the data 4 00:00:17,970 --> 00:00:27,330 so that we have different categories from that data and we record the number of occurrences of each 5 00:00:27,330 --> 00:00:28,650 category in that data. 6 00:00:29,850 --> 00:00:37,860 For example, if I have a list of students of a college and, against their names, I have the branch that 7 00:00:37,860 --> 00:00:47,430 they belong to, I can create a table like this in which I have summarized how many of those students 8 00:00:47,610 --> 00:00:49,090 belong to each branch. 9 00:00:50,280 --> 00:00:57,120 So in an engineering college, in that list, 100 students belong to the computer science branch. 10 00:00:57,930 --> 00:01:00,960 Others belong to the mechanical engineering branch, and so on. 11 00:01:02,610 --> 00:01:06,600 When we have categories like this, that is, the categories are discrete, 12 00:01:07,530 --> 00:01:13,600 we have five different branches and to each branch we can clearly assign a student, 13 00:01:15,510 --> 00:01:18,510 this is called qualitative data or categorical data. 14 00:01:19,830 --> 00:01:24,840 And this type of distribution is called a frequency distribution for qualitative data. 15 00:01:27,060 --> 00:01:34,830 So the raw data would be the student name, and in front of that is written the branch to which that student 16 00:01:34,830 --> 00:01:35,280 belongs. 17 00:01:36,750 --> 00:01:43,830 And we have this list for 420 students, and we summarize the data as how many students belong to each 18 00:01:43,830 --> 00:01:44,180 group.
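As a rough sketch of what that summary table is doing, here is how the counting could look in Python. The student names and the short list of records are invented for illustration; they are not the 420-student list from the video.

```python
from collections import Counter

# Hypothetical raw data: (student name, branch) pairs, invented for illustration.
records = [
    ("Student A", "Computer Science"),
    ("Student B", "Mechanical"),
    ("Student C", "Computer Science"),
    ("Student D", "Biotechnology"),
    ("Student E", "Mechanical"),
    ("Student F", "Computer Science"),
]

# The frequency distribution: how many students fall into each branch.
frequency = Counter(branch for _, branch in records)

for branch, count in frequency.most_common():
    print(f"{branch}: {count}")
```

Each distinct branch becomes one row of the table, and the count next to it is the frequency of that category.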
19 00:01:45,750 --> 00:01:50,430 Once we have assigned these frequencies, we can also find out the relative frequency of each category 20 00:01:50,760 --> 00:01:55,890 by using this formula, that is, frequency of that category divided by the sum of all frequencies. 21 00:01:56,730 --> 00:02:02,100 That is, if I want to find out the relative frequency of students belonging to biotechnology, 22 00:02:02,580 --> 00:02:03,810 it is 60, 23 00:02:05,550 --> 00:02:09,570 that is, the frequency of this category, divided by the total frequency, which is 24 00:02:09,570 --> 00:02:13,200 420, and that is equal to nearly 14.3 percent. 25 00:02:14,790 --> 00:02:21,030 So when we build a frequency distribution for qualitative data, that is, there are clear, discrete 26 00:02:21,030 --> 00:02:24,180 categories, and we draw a graph of it, 27 00:02:24,480 --> 00:02:26,190 it is similar to a bar chart. 28 00:02:27,150 --> 00:02:34,490 A graphical representation of a frequency distribution for qualitative data is known as a bar chart. Bar charts 29 00:02:34,500 --> 00:02:37,830 we have already discussed; we know how to draw a bar chart. 30 00:02:38,340 --> 00:02:45,510 If we have data like this, you can just select the data in Excel, go to bar chart, and draw this kind 31 00:02:45,510 --> 00:02:45,960 of graph. 32 00:02:47,790 --> 00:02:52,710 The next type of frequency distribution is the frequency distribution for quantitative data, 33 00:02:52,890 --> 00:02:54,300 that is, continuous data. 34 00:02:55,320 --> 00:03:03,330 So, for example, say I have a list of students and in front of them I have the marks of these students in Science. 35 00:03:04,620 --> 00:03:08,270 So marks is a continuous value from zero to 100. 36 00:03:09,210 --> 00:03:14,940 If I want to straightaway create categories, maybe I'll get one hundred categories: one category of zero, 37 00:03:14,940 --> 00:03:18,030 one category of two, one category of value three. 38 00:03:18,810 --> 00:03:20,450 And that does not really make sense.
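Going back to the relative frequency formula for the branch table, a small Python sketch may help. Only the counts for computer science (100), biotechnology (60) and the total (420) come from the video; the other three branch names and counts are assumptions chosen so that the total works out.

```python
# Frequencies per branch. Computer Science = 100, Biotechnology = 60 and the
# total of 420 are from the video; the other three entries are hypothetical.
frequency = {
    "Computer Science": 100,
    "Mechanical": 90,      # hypothetical
    "Electrical": 90,      # hypothetical
    "Civil": 80,           # hypothetical
    "Biotechnology": 60,
}

total = sum(frequency.values())  # sum of all frequencies

# Relative frequency = frequency of the category / sum of all frequencies.
relative = {branch: count / total for branch, count in frequency.items()}

for branch, share in relative.items():
    print(f"{branch}: {share:.1%}")
```

For biotechnology this is 60 divided by 420, and the relative frequencies of all categories always add up to one.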
39 00:03:21,750 --> 00:03:30,630 So when we have continuous data, we try to create ranges, small ranges, and within each range we 40 00:03:30,630 --> 00:03:38,310 assign the frequency, the number of instances belonging to that range. So, for example, instead of 41 00:03:38,310 --> 00:03:46,710 creating zero, one, two, three as categories, we have created up to 35 as one category, 35 to 55 as 42 00:03:46,710 --> 00:03:50,070 another, 55 to 75 as another, and so on. 43 00:03:51,420 --> 00:04:00,450 And within each range I have found out how many students belong to that range and assigned that number 44 00:04:00,450 --> 00:04:05,700 here. So this table is a frequency distribution for continuous data. 45 00:04:07,320 --> 00:04:16,560 And when we create a graph of this data, that is called a histogram. So a graph of the frequency distribution 46 00:04:16,560 --> 00:04:18,830 of a categorical variable is called a bar chart. 47 00:04:19,260 --> 00:04:23,730 A graph of the frequency distribution of continuous data is called a histogram. 48 00:04:25,020 --> 00:04:28,820 So how do we draw a histogram when we have continuous data? 49 00:04:29,460 --> 00:04:33,000 I have outlined that process in these steps. 50 00:04:34,440 --> 00:04:40,650 When you have a list of continuous data, the first thing you need to decide is the number of classes 51 00:04:40,650 --> 00:04:41,210 that you want. 52 00:04:41,490 --> 00:04:42,960 That is the number of categories. 53 00:04:43,680 --> 00:04:48,660 For example, in the previous example, we had five classes. 54 00:04:49,600 --> 00:04:55,830 So the first thing you need to decide is how many classes you want in your table. 55 00:04:57,420 --> 00:05:04,470 Second, using that number of classes, we find out the class width using this formula: the maximum 56 00:05:04,470 --> 00:05:07,860 value of the data and the minimum value of the data, 57 00:05:08,460 --> 00:05:11,520 that is, the difference between these two values, divided by the number
58 00:05:11,580 --> 00:05:19,860 of classes. That gives us the class width. Once we have the class width, we can start creating the classes. We start 59 00:05:19,860 --> 00:05:24,900 from the minimum value, we add the class width, and we get the first class. 60 00:05:25,110 --> 00:05:28,100 This will become clear when we go through the example. 61 00:05:28,800 --> 00:05:30,680 This is the continuous data that I have. 62 00:05:31,110 --> 00:05:36,660 It's a set of 20 numbers and I want to segregate it into five equal classes. 63 00:05:37,080 --> 00:05:39,880 So I have selected the number of classes, which is five. 64 00:05:40,780 --> 00:05:46,770 Now I want to find out the class width. As I told you, the class width is found by subtracting the minimum value 65 00:05:46,770 --> 00:05:55,950 from the maximum value. The minimum value for this set of numbers is six and the maximum value is 36. So 66 00:05:55,950 --> 00:05:59,860 36 minus six, divided by the number of classes, which is five. 67 00:06:00,510 --> 00:06:05,880 So it comes out to six. Now, to create the classes, 68 00:06:06,240 --> 00:06:13,560 I will start with the minimum value, which is six, and I will add the class width to get this 69 00:06:13,560 --> 00:06:14,230 upper limit. 70 00:06:14,580 --> 00:06:15,900 So it is six to 12. 71 00:06:17,580 --> 00:06:26,310 The next class will start from this value, 12, and we will add the class width to get its upper limit, and so on. 72 00:06:27,000 --> 00:06:33,630 So from six to 12, 12 to 18, 18 to 24, 24 to 30, 30 to 36. 73 00:06:36,080 --> 00:06:45,590 To avoid confusion about which class a number like 30 will belong to, we usually follow this 74 00:06:45,590 --> 00:06:52,750 convention that 30 will not belong to this 30 to 36 class; it will belong to this 24 to 30 class. 75 00:06:52,760 --> 00:06:57,640 That is, a boundary value is counted in the class where it is the upper limit and not in the class where it is the lower limit. 76 00:06:58,700 --> 00:07:04,820 Once we have created these classes, we have to start assigning each number to these classes.
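The class width arithmetic above can be sketched in a few lines of Python. The function name `class_edges` is just a label chosen for this sketch; the inputs (minimum 6, maximum 36, five classes) are the ones from the example.

```python
def class_edges(minimum, maximum, num_classes):
    """Return the class boundaries for equal-width classes."""
    # Class width = (maximum value - minimum value) / number of classes.
    width = (maximum - minimum) / num_classes
    # Start at the minimum and keep adding the class width.
    return [minimum + i * width for i in range(num_classes + 1)]


edges = class_edges(6, 36, 5)

# Pair consecutive boundaries to get the (lower, upper) limits of each class.
classes = list(zip(edges[:-1], edges[1:]))

for lower, upper in classes:
    print(f"{lower:g} to {upper:g}")
```

With these inputs the width is (36 - 6) / 5 = 6, so the boundaries are 6, 12, 18, 24, 30 and 36, matching the five classes in the table.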
77 00:07:05,690 --> 00:07:12,710 So I will go through the numbers one by one and I'll add a tally mark to 78 00:07:12,710 --> 00:07:15,040 whichever class that number belongs to. 79 00:07:15,080 --> 00:07:18,140 For example, I start with 10, and it belongs to this class, 80 00:07:18,350 --> 00:07:23,750 so I add a mark there. Then there is 14; it belongs to this class, 81 00:07:23,900 --> 00:07:25,310 so I add a mark here. 82 00:07:26,600 --> 00:07:30,470 I go through all these numbers and I continue adding marks. 83 00:07:31,640 --> 00:07:39,320 In the end, I have this column containing the tally, which is giving me how many numbers belong to each 84 00:07:39,320 --> 00:07:39,770 category. 85 00:07:40,670 --> 00:07:44,580 Lastly, I just need to add up these marks to get the frequency. 86 00:07:45,500 --> 00:07:49,610 So this is how this table is created, and once this table is created, 87 00:07:49,880 --> 00:07:55,630 we can create the histogram. So let us go to Excel and create this histogram. 88 00:07:57,020 --> 00:08:00,500 So here is the list of numbers that I had in my presentation. 89 00:08:01,760 --> 00:08:05,750 And this is the histogram which is created for these numbers. 90 00:08:07,400 --> 00:08:13,580 The good thing about Excel is that you do not need to create that table to create this histogram. 91 00:08:14,450 --> 00:08:18,580 Excel can create this histogram straight away using these numbers. 92 00:08:19,610 --> 00:08:26,010 You can see from six to 12, we have two; from 12 to 18, we have four. 93 00:08:26,210 --> 00:08:34,100 So basically the classes are on the horizontal axis and the height of the bar is giving you the frequency 94 00:08:34,280 --> 00:08:35,060 in each class.
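The tallying step can be sketched in Python as well. The 20 numbers below are hypothetical: the actual list is not reproduced in this transcript, so these values were invented to have a minimum of 6, a maximum of 36, two values in the first class and four in the second. The boundary rule is the convention described earlier: a value equal to a boundary goes into the class where it is the upper limit, except that the first class also includes the minimum.

```python
# Hypothetical data set of 20 numbers (invented for illustration).
data = [10, 14, 6, 15, 16, 18, 20, 21, 22, 23,
        24, 25, 26, 28, 29, 30, 30, 32, 34, 36]

edges = [6, 12, 18, 24, 30, 36]  # class boundaries with a class width of 6


def tally(values, edges):
    """Count how many values fall into each class."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lower, upper = edges[i], edges[i + 1]
            # Classes are (lower, upper]; the first class also includes
            # its lower limit so that the minimum value is counted.
            if lower < v <= upper or (i == 0 and v == lower):
                counts[i] += 1  # one tally mark for this class
                break
    return counts


counts = tally(data, edges)
for i, count in enumerate(counts):
    print(f"{edges[i]} to {edges[i + 1]}: {count}")
```

Adding up the tally marks in each class gives the frequency column, and the frequencies together should account for all 20 numbers.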
95 00:08:36,260 --> 00:08:44,690 One small but important thing to note here is that Excel is using round and square brackets to indicate 96 00:08:45,290 --> 00:08:53,660 which number is included and which is not. For example, in this 24 to 30 class, the number 24 will not 97 00:08:53,660 --> 00:08:55,010 be included in this class. 98 00:08:55,250 --> 00:09:02,690 Therefore, it has a round bracket. But 30 will be included in this class; therefore, it has a square bracket. 99 00:09:03,680 --> 00:09:09,220 So you can see for these four categories, the first number is not included, but the second is included. 100 00:09:09,680 --> 00:09:15,750 But for this first category, both the first and second numbers are included. 101 00:09:17,540 --> 00:09:19,580 This is how our histogram will look. 102 00:09:20,030 --> 00:09:28,190 Let us learn how to create this histogram. I will delete this first. To create a histogram, 103 00:09:28,190 --> 00:09:34,640 select the numbers that we have, go to Insert, and in these statistical charts, 104 00:09:35,840 --> 00:09:40,880 this first option is a histogram. You can see that by default 105 00:09:41,030 --> 00:09:49,160 it has created three classes with the ranges six to 16, 16 to 26 and 26 to 36. 106 00:09:50,670 --> 00:10:00,860 Now I wanted five classes, so for that I will open the Format Axis options. From my earlier lectures, 107 00:10:01,160 --> 00:10:04,380 you know how to open these formatting options: either 108 00:10:04,380 --> 00:10:10,700 right-click and select the format option, or double-click on that element to open the formatting options 109 00:10:10,700 --> 00:10:11,860 for that particular element. 110 00:10:12,590 --> 00:10:18,680 So I double-clicked on the horizontal axis to open its formatting options. In these, 111 00:10:19,010 --> 00:10:24,920 the third option will have the specific axis options for histograms. 112 00:10:25,730 --> 00:10:33,740 Here you can change the bin width and the number of bins. By default, it automatically created three.
113 00:10:34,700 --> 00:10:40,790 Now I want to create five, so either I give a bin width of value six, which I found earlier. 114 00:10:40,790 --> 00:10:41,030 OK, 115 00:10:44,360 --> 00:10:47,420 so this is the histogram that I wanted to see. 116 00:10:47,570 --> 00:10:52,010 Or I could have given the number of bins as five. Both will work. 117 00:10:53,000 --> 00:10:58,490 Also, remember, whenever you draw a histogram, it is advisable that you have data labels on top of 118 00:10:58,490 --> 00:10:58,640 it. 119 00:10:59,780 --> 00:11:03,320 It'll make it easier to find out the frequency of each category. 120 00:11:06,350 --> 00:11:14,420 The second type of statistical chart is a Pareto chart. So this is a Pareto chart. 121 00:11:17,230 --> 00:11:19,400 It is similar to the histogram. 122 00:11:20,620 --> 00:11:27,340 The major difference is that the categories are sorted. They are sorted in the sense that the category 123 00:11:27,340 --> 00:11:34,300 which is contributing the most comes first. This category had a frequency of nine, so it is coming 124 00:11:34,300 --> 00:11:40,510 first. Next comes the category which is contributing the next maximum to the total. 125 00:11:41,110 --> 00:11:46,080 So the categories are arranged in descending order of their frequency values. 126 00:11:47,710 --> 00:11:52,420 And this line is showing the cumulative relative frequency values. 127 00:11:53,210 --> 00:12:01,110 For example, the relative frequency of this category is around 35 percent. 128 00:12:02,510 --> 00:12:08,970 When I add the second category, it becomes nearly 70 percent. With the third category, 129 00:12:09,160 --> 00:12:11,440 it becomes nearly 88 percent. 130 00:12:12,220 --> 00:12:15,640 After adding the fourth category, it becomes nearly 95 percent. 131 00:12:16,060 --> 00:12:20,500 And the last category takes it to 100 percent.
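To see what the Pareto chart is computing, here is a minimal Python sketch. The class frequencies below are hypothetical and are not the ones behind the chart in the video, so the percentages will differ from the ones quoted; the function name `pareto_table` is also just a label for this sketch.

```python
# Hypothetical class frequencies (invented for illustration).
frequency = {"6-12": 2, "12-18": 4, "18-24": 5, "24-30": 6, "30-36": 3}


def pareto_table(freqs):
    """Sort categories by descending frequency and add the cumulative share."""
    total = sum(freqs.values())
    ordered = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for label, count in ordered:
        running += count  # cumulative frequency so far
        rows.append((label, count, running / total))
    return rows


rows = pareto_table(frequency)
for label, count, cumulative in rows:
    print(f"{label}: frequency {count}, cumulative {cumulative:.0%}")
```

The bars of a Pareto chart follow the sorted order of the rows, and the line on the chart follows the cumulative column, which always ends at 100 percent.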
132 00:12:21,790 --> 00:12:31,840 So using Pareto, you can estimate how many of these categories contribute to how much percentage of 133 00:12:31,840 --> 00:12:32,350 the total. 134 00:12:32,860 --> 00:12:39,070 So, for example, if you would like to find out those important two or three categories which are 135 00:12:39,070 --> 00:12:44,200 contributing 80 percent of the total, you can do that using a Pareto analysis. 136 00:12:45,730 --> 00:12:49,460 So this is how we create frequency distributions and histograms. 137 00:12:50,170 --> 00:12:55,090 The good thing about using Excel is you do not need to create a frequency distribution table to create a 138 00:12:55,090 --> 00:12:55,670 histogram. 139 00:12:55,690 --> 00:13:00,350 I told you how to do that so that you understand the concept behind a histogram. 140 00:13:01,210 --> 00:13:02,310 That's all for this video.