1 00:00:01,090 --> 00:00:09,710 In this video you will learn how to draw scatter plots and bubble charts. Scatter plots are also known as X 2 00:00:09,710 --> 00:00:18,620 Y charts, and these are also a very common type of chart. A scatter plot differs from most of the other 3 00:00:18,620 --> 00:00:25,540 chart types in that both axes display values in a scatter plot. 4 00:00:25,730 --> 00:00:30,200 That is, the horizontal axis is not a category axis. 5 00:00:30,320 --> 00:00:33,530 In a scatter plot it also has values. 6 00:00:36,270 --> 00:00:44,530 This type of chart is often used to show the relationship between two variables. In this example 7 00:00:44,530 --> 00:00:52,090 I have taken the monthly marketing emails that the company sent and the corresponding sales it is 8 00:00:52,090 --> 00:00:53,650 obtaining. 9 00:00:53,740 --> 00:01:02,050 So in the month of January 2018 the company sent out 900 marketing emails and the amount of sales that 10 00:01:02,050 --> 00:01:04,900 it did was 89. 11 00:01:04,910 --> 00:01:12,190 So the question that I want answered is: is there any relationship between marketing emails and sales? 12 00:01:13,810 --> 00:01:15,450 To find the relationship 13 00:01:15,550 --> 00:01:19,560 I can plot a scatter plot of these two variables. 14 00:01:20,800 --> 00:01:29,860 So on the x axis I can take marketing emails and on the y axis I can take the sales value, and each point 15 00:01:30,190 --> 00:01:32,440 will be X comma Y. 16 00:01:32,740 --> 00:01:39,560 So 900 comma 89. When you plot each individual point like this, 17 00:01:39,690 --> 00:01:46,920 this whole plot is called a scatter plot, and if you look at the scatter plot you can probably imagine 18 00:01:46,950 --> 00:01:54,530 that most of the points are telling you that there is a linear relationship between the x axis variable 19 00:01:54,570 --> 00:01:56,610 and the y axis variable. 
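As a rough sketch of the idea above, each scatter-plot point is just an (x, y) pair, and one common way to put a number on "is there a linear relationship" is the Pearson correlation coefficient. The narration does not mention correlation, so treat that as an added technique. Only the first data pair (900 emails, 89 sales) comes from the narration; every other value below is made up for illustration.

```python
from math import sqrt

# Hypothetical monthly figures. Only the first pair (900, 89) is from the
# narration; the remaining values are invented for illustration.
emails = [900, 750, 820, 1000, 680, 950]
sales = [89, 76, 83, 97, 70, 93]

# Each scatter-plot point is one (x, y) pair: (emails sent, sales obtained).
points = list(zip(emails, sales))


def pearson_r(xs, ys):
    """Pearson correlation; values near +1 or -1 suggest a linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson_r(emails, sales)
print(points[0], round(r, 3))
```

A value of r close to +1 matches what the eye sees in the chart: more emails go with more sales.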
20 00:01:56,790 --> 00:02:05,570 That is, if you increase the marketing emails, the amount of sales is correspondingly increasing. 21 00:02:05,790 --> 00:02:11,130 So such relationships can be identified using a scatter plot. 22 00:02:13,970 --> 00:02:17,210 Let us learn how to create this scatter plot. 23 00:02:17,210 --> 00:02:18,190 I will delete this chart. 24 00:02:21,610 --> 00:02:25,150 Since a scatter plot has two variables, 25 00:02:25,150 --> 00:02:27,730 we will select the two variables that we want to plot 26 00:02:34,440 --> 00:02:39,710 and we will go to the Recommended Charts option and we will select this scatter plot. 27 00:02:46,460 --> 00:02:49,040 Here I have the scatter plot. 28 00:02:49,040 --> 00:02:52,520 It is looking a little bit different from the previous one. 29 00:02:52,520 --> 00:02:59,510 The reason is that in the previous scatter plot I was showing you the y axis starting from 30 00:02:59,510 --> 00:03:02,360 a value of 75. 31 00:03:02,450 --> 00:03:09,390 Here it is starting from zero. If your aim is to show the absolute values, 32 00:03:09,670 --> 00:03:11,200 it is better to start from zero. 33 00:03:12,610 --> 00:03:18,790 If your aim is to show the relationship between two variables, you can start with a different value such 34 00:03:18,790 --> 00:03:19,630 as 75. 35 00:03:20,760 --> 00:03:29,770 How to change the value? You select the y axis labels, go to the Format Axis option, and you change the bound 36 00:03:29,860 --> 00:03:31,300 from 0 to 75. 37 00:03:34,070 --> 00:03:41,820 Since I want to show that there is some linear relationship, I have changed the lower bound of my axis 38 00:03:41,830 --> 00:03:42,570 to 75. 39 00:03:43,150 --> 00:03:50,410 Now it is like zooming in into that portion of the chart where most of my points are lying. 40 00:03:53,880 --> 00:03:56,420 Now what are the different types of scatter plots? 41 00:03:56,500 --> 00:03:59,790 We will again go to Change Chart Type. 
42 00:04:01,250 --> 00:04:06,610 And here you can see that we have created the first one, which is a simple scatter plot. 43 00:04:06,830 --> 00:04:15,890 The second one is a scatter plot with smooth lines. The scatter plot with smooth lines can be used 44 00:04:15,890 --> 00:04:19,090 to show how the values are changing in the series. 45 00:04:20,270 --> 00:04:28,210 So it will connect all the points in this series from the first point to the next one, then 46 00:04:28,210 --> 00:04:30,970 to the next one, using smooth lines. 47 00:04:30,980 --> 00:04:35,780 That is, if you look at any two points, they are not joined by a straight line. 48 00:04:35,810 --> 00:04:39,830 They are joined by a curved line so that it connects all the points. 49 00:04:44,010 --> 00:04:50,620 The third option is scatter plot with smooth lines but no data markers. 50 00:04:50,850 --> 00:04:55,150 As you can see, the earlier one had these data points, 51 00:04:55,350 --> 00:04:58,340 these small circles highlighting the data points. 52 00:04:58,500 --> 00:05:03,270 If you do not want these small circles you can select the third option. 53 00:05:03,270 --> 00:05:07,350 It will have smooth lines but it will have no markers. 54 00:05:07,470 --> 00:05:11,190 The point of this is if you want to emphasize the relationship only. 55 00:05:14,480 --> 00:05:17,910 The fourth option is scatter plot with straight lines. 56 00:05:20,620 --> 00:05:24,960 So as I told you, the previous one had smooth lines. 57 00:05:25,000 --> 00:05:29,650 That is, it had curves instead of straight lines. 58 00:05:29,680 --> 00:05:34,000 If you select this one, it has straight lines connecting the points. 59 00:05:36,730 --> 00:05:42,350 The next option is scatter plot with straight lines but no data markers. 60 00:05:42,430 --> 00:05:45,960 Same as before, but these small circles will not be present. 
61 00:05:45,970 --> 00:05:46,780 It will look like this one. 62 00:06:01,810 --> 00:06:10,360 One additional feature that comes with the scatter plot is the trend line. As you all know, a scatter plot is 63 00:06:10,360 --> 00:06:14,380 used to identify a relationship between two variables. 64 00:06:14,470 --> 00:06:21,010 One method is to visually check what trend there is between the two variables. 65 00:06:21,010 --> 00:06:28,390 The other option is to draw a trend line. A trend line is another chart element, so you can add it by clicking 66 00:06:28,390 --> 00:06:29,200 this plus symbol. 67 00:06:32,310 --> 00:06:37,550 If you simply select Trendline, by default it will draw a linear trend line. 68 00:06:37,570 --> 00:06:41,640 There are other options of trend lines also. Let us first draw a linear trend line. 69 00:06:47,050 --> 00:06:55,490 When I select this trend line, this linear trend line, Excel draws the line such that it minimizes the difference 70 00:06:55,850 --> 00:07:00,050 between each data point and the corresponding value on the trend line. 71 00:07:02,700 --> 00:07:03,360 Overall, 72 00:07:03,360 --> 00:07:11,960 this trend line is suggesting that there is a positive relationship between sales and marketing 73 00:07:12,040 --> 00:07:16,950 emails. You can also see in this trend line, 74 00:07:17,140 --> 00:07:26,940 if I increase the x axis value, that is the marketing emails, from nearly 780 to 880, 75 00:07:27,040 --> 00:07:29,910 that is, there is a hundred units increase, 76 00:07:30,100 --> 00:07:33,020 I send an additional hundred emails to the customers, 77 00:07:34,450 --> 00:07:45,700 I will increase the sales from 79 approximately to 88 approximately. So by sending out 78 00:07:46,030 --> 00:07:47,720 a hundred additional emails 79 00:07:47,760 --> 00:07:56,920 we will have an increase in sales of 9 to 10 units. So the slope of this line is telling you the change 80 00:07:57,160 --> 00:08:03,580 in the y axis with the change in x axis. 
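To make the slope idea concrete, here is a minimal sketch of a straight-line fit, assuming the linear trend line is an ordinary least-squares fit (which minimizes the sum of squared vertical distances between the points and the line). The function name `fit_line` and the four data points are hypothetical; the points were chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data (not from the video): emails sent and sales obtained.
emails = [700, 800, 900, 1000]
sales = [71, 80, 89, 98]

slope, intercept = fit_line(emails, sales)

# The slope is the change in y per unit change in x, so the expected
# change in sales for 100 additional emails is slope * 100.
extra_sales_per_100 = slope * 100
print(round(slope, 4), round(intercept, 2), round(extra_sales_per_100, 1))
```

With this made-up data the slope works out to 0.09, so 100 extra emails correspond to about 9 extra units of sales, the same order of magnitude as the 9 to 10 units read off the chart in the narration.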
81 00:08:03,820 --> 00:08:11,670 There are other types of trend lines also. You can change from linear to logarithmic or polynomial. 82 00:08:12,630 --> 00:08:17,060 Although we do not see much difference in this data set, 83 00:08:18,120 --> 00:08:25,710 in your dataset it is always better to draw all these types of trend lines first and then visually 84 00:08:25,710 --> 00:08:33,270 identify which is fitting the data better and use that trend line. 85 00:08:33,360 --> 00:08:41,340 So using this trend line you can also forecast. That is, if you know the number of emails you 86 00:08:41,340 --> 00:08:46,310 are going to send, you can find out the corresponding sales value on the basis of 87 00:08:46,330 --> 00:08:47,250 this trend line. 88 00:08:51,780 --> 00:08:58,870 And once you have plotted this linear trend line and you want to find out what is the equation of this 89 00:08:59,080 --> 00:09:08,290 linear line, that is, considering this is the y axis and this is the x axis, you want to find out y is equal 90 00:09:08,290 --> 00:09:15,220 to a x plus b and what are the values of a and b, you can do that by ticking this option. 91 00:09:18,450 --> 00:09:26,760 This will give you the equation of this line. The equation is y is equal to 0.089 times 92 00:09:26,970 --> 00:09:29,280 x plus 8.8. 93 00:09:30,870 --> 00:09:36,780 So what this means is, if you want to find out what your sales will be when you are sending 94 00:09:36,780 --> 00:09:45,300 out 1000 emails, you can just put the value of x as 1000. It will come out to 89.7 95 00:09:45,840 --> 00:09:49,130 plus 8.8. 96 00:09:50,040 --> 00:09:56,580 So your total sales will be 89.7 plus 8.8, which will be nearly 97 00:09:56,580 --> 00:10:02,360 98.5. 98 00:10:02,370 --> 00:10:04,320 This is how this equation can be used. 
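The forecast step can be sketched as a one-line function. The coefficients below are the rounded values read out in the narration; the function name `forecast_sales` is hypothetical. Note that 0.089 times 1000 is 89.0, while the narration quotes 89.7 for that product, which suggests the slope shown on the chart is closer to 0.0897. That is an inference, not something the narration states.

```python
# Coefficients as spoken in the narration (rounded, so treat as approximate).
SLOPE = 0.089
INTERCEPT = 8.8


def forecast_sales(emails_sent, slope=SLOPE, intercept=INTERCEPT):
    """Evaluate the trend-line equation y = slope * x + intercept."""
    return slope * emails_sent + intercept


# Forecast with the coefficients exactly as spoken.
as_spoken = forecast_sales(1000)

# Forecast assuming the unrounded slope is 0.0897 (an assumption inferred
# from the 89.7 quoted in the narration).
assumed_slope = forecast_sales(1000, slope=0.0897)

print(round(as_spoken, 1), round(assumed_slope, 1))
```

The gap between the two results shows why it helps to display more decimal places in the trend-line label before using the equation for forecasting.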
99 00:10:05,130 --> 00:10:12,780 Basically the point of using a scatter plot is to find out the relationship between two variables. 100 00:10:12,990 --> 00:10:22,080 If you identify a relationship visually, you can also plot a trend line using this option of adding chart elements. 101 00:10:22,090 --> 00:10:28,260 But then once you plot the trend line and you are happy with the trend line and you would like to use 102 00:10:28,650 --> 00:10:34,550 the equation of that trend line, you can find that equation by clicking on this line, 103 00:10:34,650 --> 00:10:43,320 selecting its formatting options and ticking this box. If you are into statistics and you understand the terms 104 00:10:43,320 --> 00:10:48,090 R squared and intercept, you can add those options also. 105 00:10:51,090 --> 00:10:56,220 So the R squared value for this line is 0.48. 106 00:11:00,750 --> 00:11:06,240 So this is the scatter plot. These last two options are remaining. 107 00:11:06,300 --> 00:11:12,590 This is a bubble plot and this is a 3-D bubble plot. 108 00:11:12,750 --> 00:11:16,820 When you want to identify a relationship between two variables only, 109 00:11:16,830 --> 00:11:24,960 we use the two-dimensional scatter plot, but if you have a third dimension also, that is, a third 110 00:11:24,960 --> 00:11:25,770 variable also, 111 00:11:26,700 --> 00:11:32,070 and you want to see the relationship between the first, second and third variable, 112 00:11:32,070 --> 00:11:34,660 you can use these 2-D and 3-D bubble plots. 113 00:11:35,370 --> 00:11:36,150 Let me show you how. 114 00:11:40,370 --> 00:11:41,170 In this data set 115 00:11:41,410 --> 00:11:44,310 I have three data series. 116 00:11:44,950 --> 00:11:50,120 I had eight participants in my weight loss program. 
117 00:11:50,140 --> 00:11:57,880 These are the original weights of these eight participants, this is the time spent by these participants 118 00:11:58,390 --> 00:12:06,520 in our program, and this is the amount of weight lost by each individual participant. 119 00:12:06,550 --> 00:12:11,660 I want to find out the effect of these two variables in determining 120 00:12:11,950 --> 00:12:22,520 this third variable. So what I'm going to do is I'll use a bubble chart which will have on the x axis the 121 00:12:22,640 --> 00:12:27,720 original weight of each individual participant. On the y axis 122 00:12:27,770 --> 00:12:35,870 it will have the number of weeks the participant was in the program, and the radius of the bubble will be 123 00:12:35,870 --> 00:12:37,990 this third variable. 124 00:12:38,030 --> 00:12:47,900 The idea behind creating this chart is, if in this bubble chart circles with larger radius are coming 125 00:12:47,960 --> 00:12:58,460 in a particular area of this chart, you can infer that maximum weight loss is being achieved by people 126 00:12:58,910 --> 00:13:01,330 belonging to that particular category. 127 00:13:01,460 --> 00:13:07,310 For example, most of the weight loss has been achieved by people in this range. 128 00:13:08,840 --> 00:13:15,770 So people belonging to the weight category of 200 to 320 probably 129 00:13:18,350 --> 00:13:27,500 achieve the maximum weight loss, and you should be in the program for at least two or three weeks. 130 00:13:27,500 --> 00:13:39,230 So this square area contains most of the big circles and you can clearly identify the range in which 131 00:13:39,320 --> 00:13:43,990 these circles are occurring. 132 00:13:44,030 --> 00:13:51,310 So basically when you have three data series and you want to find the effect of two of the data series 133 00:13:51,320 --> 00:13:55,070 on the third data series, bubble charts are used. 134 00:13:56,600 --> 00:13:58,870 So now let us learn how to draw a bubble chart. 
135 00:13:58,890 --> 00:14:14,030 I will delete this one. We will select these three series and go to the bubble chart. By default in Excel 136 00:14:14,030 --> 00:14:20,060 the first series is taken as the radius of the bubbles and the other two series are taken as the x axis 137 00:14:20,060 --> 00:14:21,480 and y axis. 138 00:14:21,620 --> 00:14:27,350 But instead what I want to do is I want to take the first variable as the x axis, 139 00:14:27,350 --> 00:14:30,410 the second as the y axis and the third as the radius. 140 00:14:30,410 --> 00:14:34,940 So I have to go to the Select Data option and I will change it. 141 00:14:43,160 --> 00:14:58,300 So the x value series is the original weight series, the y value series is weeks in program and the bubble 142 00:14:58,780 --> 00:15:01,930 radius will be this one. 143 00:15:03,750 --> 00:15:04,070 Okay. 144 00:15:08,170 --> 00:15:14,700 And it will automatically decide what should be the horizontal axis labels, and then click on OK. 145 00:15:15,060 --> 00:15:20,130 And this is the bubble chart that we wanted to create. 146 00:15:20,130 --> 00:15:24,280 The second option in bubble chart is a 3-D bubble chart. As you can see, 147 00:15:24,280 --> 00:15:29,080 this is a 2-D bubble chart where you have circles. If you create it in 3-D, 148 00:15:29,310 --> 00:15:35,340 these will become spheres, so each circle is now looking like a small ball. 149 00:15:41,350 --> 00:15:47,390 So just like the scatter plot, you can use a bubble chart to identify a trend and create a trend line. 150 00:15:50,500 --> 00:15:52,510 Once you have created the trend line you can 151 00:15:55,750 --> 00:15:59,800 format the trend line also. You can change its color, weight etc. 152 00:16:03,010 --> 00:16:05,550 So scatter plots and bubble charts 153 00:16:05,740 --> 00:16:12,620 are basically used to identify a relationship between two or three variables, and this is how we create 154 00:16:12,620 --> 00:16:13,010 them.
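The mapping of three data series onto a bubble chart can be sketched as follows. Each bubble takes its x value from the first series, its y value from the second, and its size from the third, and the chart is then read by asking where the largest bubbles sit. All eight rows of data are hypothetical, since the video's actual figures are not in the narration, and the choice of "top four bubbles" is an arbitrary cut-off for illustration.

```python
# Hypothetical data for eight participants (invented for illustration).
original_weight = [180, 210, 250, 300, 320, 150, 280, 230]   # x axis
weeks_in_program = [1, 3, 2, 3, 2, 1, 3, 2]                   # y axis
weight_lost = [2, 9, 8, 12, 7, 1, 11, 6]                      # bubble size

# One bubble per participant: x, y and size come from the three series.
bubbles = [
    {"x": w, "y": k, "size": lost}
    for w, k, lost in zip(original_weight, weeks_in_program, weight_lost)
]

# Take the four largest bubbles (arbitrary cut-off) and find the region
# of the chart they occupy.
largest = sorted(bubbles, key=lambda b: b["size"], reverse=True)[:4]
x_range = (min(b["x"] for b in largest), max(b["x"] for b in largest))
y_range = (min(b["y"] for b in largest), max(b["y"] for b in largest))

print("largest bubbles fall in weight range", x_range, "and weeks range", y_range)
```

This is the numeric version of drawing a box around the big circles: the x and y ranges describe the category of participants with the most weight lost.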