1 00:00:00,740 --> 00:00:07,820 Hi, let's continue with our project. In this video we're doing some exploratory analysis. So first, 2 00:00:07,820 --> 00:00:15,770 we need to know more about the dataset, so we need to extract further information from 3 00:00:15,780 --> 00:00:24,050 the dataset so we can understand more about our data: data.info(). 4 00:00:26,070 --> 00:00:31,020 And then execute the cell, and you see we got our result. 5 00:00:32,290 --> 00:00:35,350 We got all the data that we need. 6 00:00:37,630 --> 00:00:46,600 So additional information is returned for all variables contained in the dataset. So to get a preview 7 00:00:46,600 --> 00:00:53,830 of the data contained in it, we can calculate some basic statistics, so we can use the describe function. 8 00:00:55,960 --> 00:00:56,690 So let's 9 00:00:57,950 --> 00:01:02,690 create a new variable called summary: data 10 00:01:03,200 --> 00:01:04,580 .describe(). 11 00:01:09,520 --> 00:01:10,990 And then here: 12 00:01:12,250 --> 00:01:14,320 summary equals 13 00:01:15,760 --> 00:01:18,400 summary.transpose(). 14 00:01:20,550 --> 00:01:22,110 And then you can print the 15 00:01:23,360 --> 00:01:24,020 summary. 16 00:01:27,440 --> 00:01:29,630 And we got the following summary. 17 00:01:31,340 --> 00:01:38,900 So describe is a function that generates descriptive statistics that summarize the central tendency, dispersion 18 00:01:38,900 --> 00:01:45,530 and shape of a dataset's distribution, excluding NaN (not a number) values. 19 00:01:45,800 --> 00:01:49,670 So it analyzes both numeric and object series, as well as 20 00:01:49,830 --> 00:01:57,380 DataFrame column sets of mixed data types. The output will vary depending on what is provided. 21 00:01:57,950 --> 00:02:06,020 So as you can see, the variables have different ranges. When the predictors have different ranges, the 22 00:02:06,020 --> 00:02:09,170 impact on the response variable by the features
23 00:02:09,350 --> 00:02:15,380 having a greater numeric range could be more than that of the ones having a lesser numeric range. 24 00:02:15,800 --> 00:02:19,460 And this could in turn impact prediction accuracy. 25 00:02:20,240 --> 00:02:26,930 So our goal is to improve the predictive accuracy and not allow a particular feature to impact the prediction 26 00:02:26,950 --> 00:02:29,050 due to a large numeric range. 27 00:02:29,690 --> 00:02:30,140 Thus, 28 00:02:31,330 --> 00:02:38,680 we may need to scale values under different features so that they fall under a common range. Through this 29 00:02:38,700 --> 00:02:40,010 statistical procedure, 30 00:02:40,240 --> 00:02:46,240 it is possible to compare identical variables belonging to different distributions and also different 31 00:02:46,240 --> 00:02:50,800 variables, or variables expressed in different units. 32 00:02:51,680 --> 00:02:54,030 So that is the end of this video. 33 00:02:54,410 --> 00:02:56,510 I hope you enjoyed it, and 34 00:02:57,830 --> 00:02:59,660 I will see you in the next video.
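The scaling idea mentioned near the end of the video can be illustrated with a small sketch. The video does not say which scaling method is used, so min-max scaling is assumed here as one common choice. The feature names, the values, and the helper `min_max_scale` are hypothetical and written in plain Python so the example runs without the project's dataset.

```python
# Hypothetical feature columns; names and values are invented for illustration
# and are not taken from the video's dataset.
features = {
    "age": [22.0, 35.0, 58.0, 41.0],
    "income": [18000.0, 52000.0, 97000.0, 61000.0],
}


def min_max_scale(values):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Before scaling, the numeric ranges differ by orders of magnitude.
ranges = {name: max(col) - min(col) for name, col in features.items()}

# After scaling, every feature spans the same common range.
scaled = {name: min_max_scale(col) for name, col in features.items()}

for name, col in scaled.items():
    print(name, "original range:", ranges[name], "scaled:", [round(v, 3) for v in col])
```

Once both columns lie between 0 and 1, neither one outweighs the other simply because of its units. Standardization (subtracting the mean and dividing by the standard deviation) is another option and may fit better when a feature has outliers.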