1 00:00:00,990 --> 00:00:07,110 In this video we will learn about frequency distributions, and once we know about frequency distributions, 2 00:00:07,110 --> 00:00:09,150 we will learn how to draw a histogram. 3 00:00:12,080 --> 00:00:15,220 So when we are trying to draw a frequency distribution, 4 00:00:15,230 --> 00:00:23,120 we are basically trying to summarize the data so that we have different categories from that data and 5 00:00:23,120 --> 00:00:29,830 we assign the number of occurrences of each category in that data. 6 00:00:29,900 --> 00:00:37,850 For example, if I have data of students of a college and against their names I have the branch that 7 00:00:37,850 --> 00:00:39,940 they belong to, 8 00:00:40,220 --> 00:00:48,290 I can create a table like this in which I have summarized how many of those students belong to 9 00:00:48,410 --> 00:00:50,330 each branch. 10 00:00:50,330 --> 00:00:58,550 So in an engineering college, in that list, hundred students belong to the computer science branch, 82 11 00:00:58,550 --> 00:01:02,690 belong to the mechanical engineering branch and so on. 12 00:01:02,690 --> 00:01:09,500 When we have categories like this, that is, the categories are discrete, we have five different branches 13 00:01:09,980 --> 00:01:15,560 and to each branch we can clearly assign these students. 14 00:01:15,560 --> 00:01:18,470 This is called qualitative data or categorical data. 15 00:01:19,970 --> 00:01:26,410 And this type of distribution is called a frequency distribution for qualitative data. 16 00:01:27,110 --> 00:01:34,700 So the raw data would be stored as a name, and in front of that is written the branch to which that student 17 00:01:34,700 --> 00:01:43,010 belongs, and we have this list for 420 students, and we summarize that data as how many students belong 18 00:01:43,010 --> 00:01:45,800 to each group.
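The summarizing step described above, going from one row per student to one count per branch, can be sketched in a few lines of Python. This is only an illustration: the student names, the short list, and the function name `frequency_distribution` are made up here and are not the 420-student list from the video.

```python
from collections import Counter

# Hypothetical raw data: one (name, branch) pair per student.
# The names and the short list are invented for illustration only.
raw_records = [
    ("Student 1", "Computer Science"),
    ("Student 2", "Mechanical"),
    ("Student 3", "Computer Science"),
    ("Student 4", "Biotechnology"),
    ("Student 5", "Mechanical"),
    ("Student 6", "Computer Science"),
]


def frequency_distribution(records):
    """Count how many records fall into each category (here, each branch)."""
    return Counter(branch for _name, branch in records)


branch_counts = frequency_distribution(raw_records)

# Print the summary table, most frequent branch first.
for branch, count in branch_counts.most_common():
    print(f"{branch}: {count}")
```

Each distinct branch becomes one row of the table, and the count next to it is the frequency of that category.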
19 00:01:45,800 --> 00:01:50,450 Once we have assigned these frequencies, we can also find out the relative frequency of each category 20 00:01:50,840 --> 00:01:56,780 by using this formula, that is, frequency of that category divided by the sum of all frequencies. 21 00:01:56,780 --> 00:02:02,840 That is, if I want to find out the relative frequency of students belonging to biotechnology, it 22 00:02:02,840 --> 00:02:11,690 is 60, that is the frequency of this category, divided by the total frequency, which is 420, which is equal to 23 00:02:11,690 --> 00:02:14,610 nearly 14.3 percent. 24 00:02:14,810 --> 00:02:21,710 So when we build a frequency distribution for qualitative data, that is, there are clear discrete categories, 25 00:02:22,880 --> 00:02:24,500 and we draw a graph of it, 26 00:02:24,590 --> 00:02:32,270 it is similar to a bar chart. A graphical representation of a frequency distribution for qualitative data 27 00:02:32,390 --> 00:02:34,500 is known as a bar chart. Bar charts 28 00:02:34,520 --> 00:02:37,790 we have already discussed; we know how to draw a bar chart. 29 00:02:38,450 --> 00:02:45,500 If we have data like this, you can just select the data in Excel and go to bar chart and draw this kind 30 00:02:45,500 --> 00:02:47,520 of graph. 31 00:02:47,840 --> 00:02:53,930 The next type of frequency distribution is the frequency distribution for quantitative data, that is, continuous 32 00:02:53,930 --> 00:02:54,260 data. 33 00:02:55,340 --> 00:03:02,840 So for example, if I have the names of students and in front of them I have the marks of these students in 34 00:03:02,840 --> 00:03:03,340 science. 35 00:03:04,670 --> 00:03:09,270 So marks is a continuous value from zero to hundred. 36 00:03:09,290 --> 00:03:14,900 If I want to straightaway create categories, maybe I'll get a hundred categories: one category of zero, 37 00:03:14,930 --> 00:03:16,200 one category of two, 38 00:03:16,700 --> 00:03:21,670 one category of value 3, and that does not really make sense.
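As a small aside on the relative frequency formula given earlier (frequency of a category divided by the sum of all frequencies), here is a minimal Python sketch. It uses the biotechnology figures from the video, 60 students out of 420. The function name `relative_frequency` is an assumption made for this illustration.

```python
def relative_frequency(category_frequency, total_frequency):
    """Relative frequency = frequency of the category / sum of all frequencies."""
    if total_frequency <= 0:
        raise ValueError("total_frequency must be positive")
    return category_frequency / total_frequency


# Biotechnology example from the video: 60 students out of 420.
biotech_share = relative_frequency(60, 420)

# 60 / 420 = 0.142857..., which is about 14.3 percent.
print(f"{biotech_share:.1%}")
```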
39 00:03:21,800 --> 00:03:31,030 So when we have continuous data, we try to create ranges, small ranges, and within each range we assign 40 00:03:31,040 --> 00:03:36,470 the frequency, or number of instances, belonging to that category. 41 00:03:36,500 --> 00:03:44,480 So for example, instead of creating 0, 1, 2, 3 as categories, we have created 0 to 35 as one category, 42 00:03:45,480 --> 00:03:49,480 35 to 55 as another, 55 to [inaudible] as another, 43 00:03:49,550 --> 00:04:00,110 and so on, and within each range I have found out how many students belong to that range and assigned that 44 00:04:00,110 --> 00:04:06,580 number here. So this table is the frequency distribution for that continuous data. 45 00:04:07,430 --> 00:04:13,730 And when we create a graph of this data, that is called a histogram. 46 00:04:14,000 --> 00:04:20,810 So a graph of the frequency distribution of a categorical variable is called a bar chart; a graph of the frequency 47 00:04:20,810 --> 00:04:24,670 distribution of continuous data is called a histogram. 48 00:04:25,070 --> 00:04:28,870 So how to draw a histogram when you have continuous data? 49 00:04:29,570 --> 00:04:34,490 I have outlined that process in these six steps. 50 00:04:34,520 --> 00:04:40,640 When you have a list of continuous data, the first thing you need to decide is the number of classes 51 00:04:40,640 --> 00:04:41,570 that you want, 52 00:04:41,570 --> 00:04:43,420 that is, the number of categories. 53 00:04:43,700 --> 00:04:52,820 For example, in the previous example we had five classes. So the first thing you need to decide is how many 54 00:04:53,630 --> 00:04:57,250 classes you want in your table. 55 00:04:57,500 --> 00:05:03,580 Second, using that number of classes, we find out the class width using this formula: 56 00:05:03,710 --> 00:05:10,580 the maximum value of the data and the minimum value of the data, what is the difference between these two values, 57 00:05:10,730 --> 00:05:13,580 divided by the number of classes that we decided in the first step.
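The class width formula just described, (maximum value minus minimum value) divided by the number of classes, can be written as a one-line function. This is a minimal sketch. The function name `class_width` and the 0-to-100 marks range in the usage line are assumptions chosen for illustration.

```python
def class_width(min_value, max_value, num_classes):
    """Class width = (maximum value - minimum value) / number of classes."""
    if num_classes <= 0:
        raise ValueError("num_classes must be a positive integer")
    return (max_value - min_value) / num_classes


# Hypothetical usage: marks ranging from 0 to 100, split into 5 classes.
print(class_width(0, 100, 5))
```

In practice the result is often rounded up to a convenient number so that the class limits are easy to read.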
58 00:05:15,260 --> 00:05:19,260 Once we have the class width, we can start creating the classes. 59 00:05:19,400 --> 00:05:25,130 We start with the minimum value, we add the class width and we get the first class. 60 00:05:25,130 --> 00:05:28,850 This will get clear when we go through the example. 61 00:05:28,850 --> 00:05:30,960 This is the continuous data that I have. 62 00:05:30,960 --> 00:05:36,900 It is a set of 24 numbers and I want to segregate it into five equal classes. 63 00:05:37,130 --> 00:05:39,890 So I have selected the number of classes, which is five. 64 00:05:40,790 --> 00:05:47,090 Now I want to find out the class width. The class width is found by subtracting the minimum value from the 65 00:05:47,090 --> 00:05:48,340 maximum value. 66 00:05:48,590 --> 00:05:58,210 The minimum value for this set of numbers is six and the maximum value is 36, so 36 minus six divided by the 67 00:05:58,400 --> 00:05:59,870 number of classes, which is five. 68 00:06:00,560 --> 00:06:06,200 So it comes out to six. Now, to create the classes, 69 00:06:06,350 --> 00:06:14,560 I will start with the minimum value, which is six, and I will add the class width to get this upper range. 70 00:06:14,630 --> 00:06:17,660 So it is six to twelve. 71 00:06:17,660 --> 00:06:24,720 The next class will start from this value, twelve, and we will add the class width to get 18. 72 00:06:25,790 --> 00:06:26,320 And so on. 73 00:06:27,050 --> 00:06:40,880 So from six to twelve, 12 to 18, 18 to 24, 24 to 30, 30 to 36. To avoid confusion, that if we have a number 30, 74 00:06:41,290 --> 00:06:50,150 to which class will it belong, we usually follow this convention that thirty will not belong to this class. 75 00:06:50,150 --> 00:06:56,870 It would belong to this 24 to 30 class, that is, thirty will be counted as the upper limit of a class and not as the 76 00:06:56,960 --> 00:06:58,760 lower limit. 77 00:06:58,760 --> 00:07:05,370 Once we have created these classes, we have to start assigning each number to these classes.
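The class construction and the boundary convention described above can be sketched as follows. The class limits come from the example (minimum 6, class width 6, five classes). The helper names `make_class_edges` and `class_index` are assumptions for this illustration, and the use of `bisect_left` is just one convenient way to make a boundary value such as 30 fall into the class it closes (24 to 30).

```python
from bisect import bisect_left


def make_class_edges(min_value, width, num_classes):
    """Start at the minimum and keep adding the class width."""
    return [min_value + i * width for i in range(num_classes + 1)]


def class_index(value, edges):
    """Return the 0-based class of `value`.

    A boundary value belongs to the class for which it is the upper
    limit; the overall minimum is kept in the first class.
    """
    if value < edges[0] or value > edges[-1]:
        raise ValueError(f"{value} is outside {edges[0]} to {edges[-1]}")
    # bisect_left finds the first edge that is >= value, i.e. the upper
    # limit of the class; subtracting 1 gives the class index.
    return max(bisect_left(edges, value) - 1, 0)


edges = make_class_edges(6, 6, 5)  # 6, 12, 18, 24, 30, 36
print(edges)
print(class_index(30, edges))  # index of the 24-to-30 class
```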
78 00:07:05,720 --> 00:07:12,710 So I will go through the numbers one by one and I will start adding a mark to each of these classes, to whichever 79 00:07:12,710 --> 00:07:15,060 class that number belongs to. 80 00:07:15,080 --> 00:07:20,570 For example, I start with 10; it belongs to this class so I add a mark here. 81 00:07:20,570 --> 00:07:23,970 Then there is 14. 14 belongs to this class. 82 00:07:24,020 --> 00:07:26,720 I add a mark here. 83 00:07:26,720 --> 00:07:32,470 I go through all these numbers and I continue adding marks. In the end, 84 00:07:32,480 --> 00:07:40,760 I have this column containing the tally, which is giving me how many numbers belong to each category. 85 00:07:40,760 --> 00:07:45,230 Lastly, I just need to add up these marks to get the frequency. 86 00:07:45,540 --> 00:07:47,300 So this is how this table is created. 87 00:07:48,080 --> 00:07:52,110 And once this table is created we can create a histogram. 88 00:07:52,460 --> 00:07:55,610 So let us go to Excel now and create this histogram. 89 00:07:57,080 --> 00:08:04,880 So here is the list of numbers that I had in my presentation and this is the histogram which is created 90 00:08:04,880 --> 00:08:07,470 for these numbers. 91 00:08:07,490 --> 00:08:14,900 The good thing about Excel is that you do not need to create that table to create this histogram. Excel 92 00:08:14,910 --> 00:08:17,180 can create this histogram straight away 93 00:08:17,180 --> 00:08:25,310 using these numbers. You can see from six to twelve we have two, from 12 to 18 94 00:08:25,460 --> 00:08:26,050 we have four. 95 00:08:26,270 --> 00:08:34,130 So basically these classes are on the horizontal axis and the height of the bar is giving you the frequency 96 00:08:34,370 --> 00:08:35,820 in each class.
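The tallying procedure described above, going through the numbers one by one and adding a mark to the class each one belongs to, can be sketched in Python. The class limits 6, 12, 18, 24, 30, 36 are from the example. The seven sample values are hypothetical, since the full list of 24 numbers is not given in the transcript, and the function name `tally` is an assumption.

```python
def tally(values, edges):
    """Count how many values fall into each class.

    Each class includes its upper limit but not its lower limit, except
    the first class, which also includes the overall minimum. Values
    outside the overall range are simply not counted in this sketch.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            low, high = edges[i], edges[i + 1]
            is_overall_minimum = i == 0 and value == low
            if is_overall_minimum or low < value <= high:
                counts[i] += 1  # add one tally mark to this class
                break
    return counts


edges = [6, 12, 18, 24, 30, 36]
sample = [10, 14, 6, 30, 22, 36, 17]  # hypothetical values, not the video's data
frequencies = tally(sample, edges)

# Show the tally marks next to each class, like the column in the table.
for i, count in enumerate(frequencies):
    print(f"{edges[i]:>2} to {edges[i + 1]:>2}: {'|' * count} ({count})")
```

The list of counts is the frequency column of the table, and plotting it as bars over the class ranges gives the histogram.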
97 00:08:36,350 --> 00:08:44,690 One small but important thing to note here is that Excel is using these round and square brackets to indicate 98 00:08:45,350 --> 00:08:54,380 which number is included and which is not. For example, in this 24 to 30, the number 24 will not be included 99 00:08:54,380 --> 00:08:55,230 in this bar. 100 00:08:55,310 --> 00:09:00,810 Therefore it has a round bracket, but 30 will be included in this bar. 101 00:09:01,070 --> 00:09:07,760 Therefore it has a square bracket. So you can see for these four categories the first number is not included 102 00:09:08,030 --> 00:09:09,220 but the second is included. 103 00:09:09,740 --> 00:09:17,370 But for this first category only, the first and second both are included. 104 00:09:17,600 --> 00:09:19,580 This is what our histogram will look like. 105 00:09:20,090 --> 00:09:29,300 Let us learn how to create this histogram. I will delete this first. So to create a histogram, select the numbers 106 00:09:29,300 --> 00:09:35,900 that we have, go to Insert, and in these statistical graphs, 107 00:09:35,900 --> 00:09:45,410 this first option is the histogram. You can see by default it has created three classes with the ranges six 108 00:09:45,410 --> 00:09:50,500 to sixteen, 16 to 26 and 26 to 36. 109 00:09:50,720 --> 00:10:01,240 Now I wanted five classes, so for that I will go to Format Axis. To open Format Axis, 110 00:10:01,250 --> 00:10:07,460 you know how to open these formatting options: either right click and select the format option or you 111 00:10:07,460 --> 00:10:12,390 double click on that chart element to open formatting options for that particular element. 112 00:10:12,650 --> 00:10:19,020 So I double clicked on the horizontal axis to open its formatting options. In these, 113 00:10:19,040 --> 00:10:25,560 the third option will have the specific axis options for the histogram. 114 00:10:25,790 --> 00:10:34,790 Here you can change the bin width and the number of bins. By default it automatically created three bins.
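The bracket notation on the axis, and the link between bin width and number of bins, can be illustrated with a short sketch. The label format imitates what is described above (square bracket for an included limit, round bracket for an excluded one). The function names are assumptions, and the ceiling division in `number_of_bins` is a simple assumption for illustration, not a statement of how Excel computes its bins internally.

```python
import math


def number_of_bins(min_value, max_value, bin_width):
    """How many bins of the given width are needed to cover the data range."""
    return math.ceil((max_value - min_value) / bin_width)


def bin_labels(min_value, bin_width, num_bins):
    """Build axis labels such as '[6, 12]' and '(12, 18]'.

    The first bin includes both limits; every later bin excludes its
    lower limit and includes its upper limit.
    """
    labels = []
    for i in range(num_bins):
        low = min_value + i * bin_width
        high = low + bin_width
        opening = "[" if i == 0 else "("
        labels.append(f"{opening}{low}, {high}]")
    return labels


# Data range from the example: minimum 6, maximum 36.
print(number_of_bins(6, 36, 10), bin_labels(6, 10, 3))  # three bins of width 10
print(number_of_bins(6, 36, 6), bin_labels(6, 6, 5))    # five bins of width 6
```

For this data range, choosing a bin width of 6 and choosing five bins describe the same set of classes.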
115 00:10:34,790 --> 00:10:43,860 Now I want to create five bins. Now either I give a bin width of value 6, which I found out. 116 00:10:44,420 --> 00:10:47,890 So this is the histogram that I wanted to see. Or 117 00:10:47,900 --> 00:10:53,100 I could have given the number of bins as five too. Both will work. 118 00:10:53,100 --> 00:10:59,680 Also remember, whenever you draw a histogram it is advisable that you have data labels on top of it. 119 00:10:59,840 --> 00:11:06,220 It will make it easier to find out what the frequency of each category is. 120 00:11:06,380 --> 00:11:13,640 The second type of statistical chart is a Pareto chart. So this is a 121 00:11:13,670 --> 00:11:14,410 Pareto chart. 122 00:11:17,330 --> 00:11:20,660 It is similar to the histogram. 123 00:11:20,660 --> 00:11:23,980 The major difference is the categories are sorted. 124 00:11:23,990 --> 00:11:30,800 These are sorted in the sense that the category which is contributing the most comes first. 125 00:11:30,800 --> 00:11:38,570 So this category had a frequency of 9, so it is coming first. The next category is the one which is contributing 126 00:11:38,570 --> 00:11:40,550 the next maximum to the total. 127 00:11:41,180 --> 00:11:47,810 So the categories are arranged in the descending order of their frequency values. 128 00:11:47,810 --> 00:11:52,860 And this line is showing the cumulative relative frequency values. 129 00:11:53,270 --> 00:12:02,550 For example, the relative frequency of this category is around 35 percent. 130 00:12:02,600 --> 00:12:09,290 When I add this second category it becomes nearly 70 percent. After adding the third category 131 00:12:09,290 --> 00:12:13,670 it becomes nearly 88 percent. After adding the fourth category 132 00:12:13,670 --> 00:12:15,640 it becomes nearly 95 percent. 133 00:12:16,190 --> 00:12:21,610 And the last category takes it to hundred percent.
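The two ingredients of the Pareto chart described above, categories sorted in descending order of frequency and a running cumulative percentage, can be sketched as follows. The category labels "A" to "E" and their counts are illustrative assumptions: the counts were picked to sum to 24 and to give percentages close to those read off the chart, but they are not stated in the transcript. The function name `pareto_table` is also an assumption.

```python
def pareto_table(frequencies):
    """Sort categories by descending frequency and add cumulative percentages.

    Returns a list of (label, count, cumulative_percent) rows.
    """
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("frequencies must not all be zero")
    ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for label, count in ordered:
        running += count
        rows.append((label, count, running / total * 100))
    return rows


# Illustrative counts only (assumed, not taken from the video's data).
counts = {"A": 2, "B": 4, "C": 9, "D": 8, "E": 1}
rows = pareto_table(counts)

for label, count, cumulative in rows:
    print(f"{label}: {count:>2}  cumulative {cumulative:5.1f}%")
```

The bars of the chart correspond to the sorted counts, and the line corresponds to the cumulative percentage column.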
134 00:12:21,860 --> 00:12:31,850 So using a Pareto chart you can estimate how many of these categories contribute how much percentage of 135 00:12:31,850 --> 00:12:32,920 the total. 136 00:12:32,930 --> 00:12:39,280 So for example, if you would like to find those important two or three categories which cumulatively 137 00:12:39,270 --> 00:12:41,820 contribute 80 percent of the total, 138 00:12:41,830 --> 00:12:45,770 you can do that using a Pareto analysis. 139 00:12:45,770 --> 00:12:49,850 So this is how we create a frequency distribution and a histogram. 140 00:12:50,240 --> 00:12:55,640 The good thing about using Excel is you do not need to create a frequency distribution table to create a histogram. 141 00:12:55,760 --> 00:12:57,140 I told you how to do that 142 00:12:57,230 --> 00:13:01,300 so that you understand the concept behind a histogram. 143 00:13:01,310 --> 00:13:02,300 That's all for this video.