1
00:00:00,560 --> 00:00:03,140
Hey there, it's Andre game with your quick tip.

2
00:00:03,140 --> 00:00:03,780
That is.

3
00:00:03,890 --> 00:00:07,160
Let's talk about correlation analysis.

4
00:00:07,160 --> 00:00:08,000
What does that mean?

5
00:00:08,270 --> 00:00:14,100
Well, correlation analysis simply means checking which attributes are correlated with each other.

6
00:00:14,150 --> 00:00:18,740
So let's say one column is correlated with another column.

7
00:00:18,890 --> 00:00:27,830
Let's say we're trying to sell our home and there are two columns: one column is the size of the land that

8
00:00:27,830 --> 00:00:34,730
the house is on, and the other column is the size of the house, the floor space.

9
00:00:34,730 --> 00:00:43,070
Now let's say that when we're analyzing our data, we notice that pretty much all houses that have a large

10
00:00:43,250 --> 00:00:51,440
land size also have a large floor space, and they're just correlated. The price goes up the same way every

11
00:00:51,440 --> 00:00:59,220
time the land increases and every time the floor space increases. In this case, these attributes have

12
00:00:59,220 --> 00:01:01,890
high correlation with each other.

13
00:01:01,890 --> 00:01:09,240
In this case, we can actually remove one of them from our analysis or from building our model, because it might

14
00:01:09,240 --> 00:01:10,920
not affect our model.

15
00:01:10,920 --> 00:01:16,070
This gets us to our next point: forward/backward attribute selection.

16
00:01:16,170 --> 00:01:17,980
What does this mean?

17
00:01:17,980 --> 00:01:26,280
Well, we can try training our model using different techniques. Backward attribute selection essentially

18
00:01:26,280 --> 00:01:33,840
says: train the model on all the attributes, and then slowly start taking away attributes or columns

19
00:01:34,200 --> 00:01:35,880
used to train your model.

20
00:01:36,090 --> 00:01:43,150
Does that affect your model? Does it improve the model? Forward attribute selection is the opposite.

21
00:01:43,200 --> 00:01:50,460
Start with just one column when you train the model, and keep adding one attribute at a time until you

22
00:01:50,460 --> 00:01:53,010
get the accuracy to plateau.

23
00:01:53,100 --> 00:02:00,400
That is, if you keep adding columns and, let's say, after the fiftieth column all the other attributes

24
00:02:00,400 --> 00:02:02,630
that you added just don't improve the model,

25
00:02:02,700 --> 00:02:04,710
well, then maybe we might not need them.

26
00:02:04,950 --> 00:02:07,120
These ideas, correlation analysis

27
00:02:07,200 --> 00:02:13,950
and forward/backward attribute selection, are all ways for us to test our model, reduce our data if we

28
00:02:13,950 --> 00:02:21,120
want to, and play with our model instead of just assuming that if we include everything, everything will make

29
00:02:21,120 --> 00:02:22,320
the model better.

30
00:02:22,320 --> 00:02:25,640
That's usually not the case.

31
00:02:25,650 --> 00:02:26,290
All right.

32
00:02:26,370 --> 00:02:27,390
Back to Daniel.
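To make the two ideas from this tip concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method used in the course: the housing data is synthetic, the column names (`land_size`, `floor_space`, `bedrooms`, `junk`), the 0.9 correlation threshold, and the 0.01 minimum gain are all made up, and the helper functions are hypothetical. A plain least-squares fit scored by R^2 on the training data stands in for "train the model and check the accuracy".

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.9):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def r2_of_linear_fit(rows, y):
    """R^2 of a least-squares fit with intercept, scored on the same data."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations (X^T X) w = X^T y as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting (assumes A is non-singular).
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    w = [A[i][p] / A[i][i] for i in range(p)]
    pred = [sum(wi * xi for wi, xi in zip(w, r)) for r in X]
    my = sum(y) / len(y)
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - my) ** 2 for t in y)
    return 1 - ss_res / ss_tot

def forward_select(columns, y, min_gain=0.01):
    """Greedily add the column that raises R^2 most; stop once the gain plateaus."""
    chosen, best = [], 0.0
    remaining = list(columns)
    while remaining:
        scores = {}
        for name in remaining:
            trial = chosen + [name]
            rows = list(zip(*(columns[c] for c in trial)))
            scores[name] = r2_of_linear_fit(rows, y)
        name = max(scores, key=scores.get)
        if scores[name] - best < min_gain:
            break  # adding more columns no longer helps
        chosen.append(name)
        best = scores[name]
        remaining.remove(name)
    return chosen, best

# Synthetic, made-up housing data (an assumption for illustration only).
random.seed(0)
n = 200
land = [random.uniform(200, 1000) for _ in range(n)]
floor = [0.4 * a + random.gauss(0, 10) for a in land]  # tracks land size closely
beds = [random.randint(1, 5) for _ in range(n)]
junk = [random.random() for _ in range(n)]             # unrelated to price
price = [1000 * f + 50000 * b + random.gauss(0, 5000) for f, b in zip(floor, beds)]

columns = {"land_size": land, "floor_space": floor, "bedrooms": beds, "junk": junk}

kept = drop_correlated(columns)
chosen, score = forward_select({k: columns[k] for k in kept}, price)
print("kept after correlation check:", kept)
print("forward selection picked:", chosen, "R^2 =", round(score, 3))
```

Backward selection would run the same loop in the other direction: start with every column and drop one at a time for as long as the score does not get worse. In practice the score would normally be measured on held-out data rather than on the training rows, since training R^2 can only go up as columns are added.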