Hey everyone. Before moving on to the plots, there's something important I need to tell you: we need to categorize what we are going to study in this module. Otherwise you may run into problems later.
First, this is not like matplotlib, where we covered everything from the beginning: how the x label is set, how the y label is applied, how the points are created, every piece of syntax. You have already done that. Here in seaborn we are going to move on to plots: how we can analyze professional data using these plots, which is what you will deal with in companies, in industry, or in any project. We are going to work on the same type of data you will meet in real-world settings, like your job, your projects, or your industry.
Before categorizing the module, there's one thing to settle about the data: normally you would build it on your own. If you have seen data before, it looks something like this. This one is an Excel sheet, and in a program it looks like this other one. Notice that this one is a data frame built with pandas, while the other one is an Excel sheet.
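For reference, here is roughly what building a pandas data frame by hand looks like. This is a minimal sketch: the rows are a few values typed in by hand in the layout of seaborn's tips dataset, and the variable name `hand_made` is hypothetical. It also shows the mix of column types (floats, integers, strings) that comes up later in the video.

```python
import pandas as pd

# Typing a data frame by hand: every value has to be entered and checked
# manually, which is why the course loads ready-made datasets instead.
hand_made = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01],  # floating-point column
        "tip": [1.01, 1.66, 3.50],            # floating-point column
        "sex": ["Female", "Male", "Male"],    # string column
        "day": ["Sun", "Sun", "Sun"],         # string column
        "size": [2, 3, 3],                    # integer column
    }
)

print(hand_made)
print(hand_made.dtypes)  # shows which columns are numeric and which hold strings
```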
The usual workflow is: first you get the data in an Excel sheet, then you turn it into a pandas data frame, and then you use it in your programs. But we are not going to waste our time creating data and building data frames, because I believe this process is very slow. It takes hours, and at the end you have only one data frame, because you need to enter all the values one by one and check whether each is right or wrong. Instead, we are going to use some datasets that come predefined with seaborn and are available through the seaborn library. You just need to load these files and you will get the datasets.
To find them, just search for "datasets in seaborn" and you will get the page for seaborn.load_dataset. Don't go to the main official seaborn page: if you just google "seaborn" you get "seaborn: statistical data visualization", and from there it is a little complex to find the datasets. Go to the datasets page instead. On that page there is a link to github.com, because seaborn is officially released on GitHub, and there you will find every file related to seaborn. If you click on that link, it will take you to the GitHub page.
There you have the datasets: these are CSV files, and we also loaded some of them in the pandas section. These predefined files are what we are going to use in this module. The ones we need are this one, tips.csv, and diamonds.csv. You can use any of them you want, but I selected these three because tips is used in practically every kind of teaching material: it has all types of data, with many columns such as total_bill. I will explain that while we practice. That's where we will get the data to analyze. You can also bring in your own data: if you are working on a project, you can use that data too.
Now let me move to the plots. I have categorized them into four basic parts. Two things here. First, this categorization is not official; you will not find it on Google or anywhere else. I have just categorized the plots on a simple basis so you understand them better. You can classify them however you want, or you can follow the official grouping, such as distribution plots and so on. Second, these four categories do not include all the plots.
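Loading the predefined files looks like this in code. A minimal sketch, assuming seaborn is installed and the machine has internet access: `sns.load_dataset` downloads the named CSV from seaborn's GitHub data repository, caches it locally by default, and returns a pandas DataFrame.

```python
import seaborn as sns

# Each name matches a CSV file in the GitHub repository shown in the video
# (tips.csv, diamonds.csv); load_dataset returns a pandas DataFrame.
# Assumption: network access is available on the first call.
tips = sns.load_dataset("tips")
diamonds = sns.load_dataset("diamonds")

print(tips.head())     # first five rows of the tips data
print(diamonds.shape)  # (rows, columns) of the diamonds data
```

After the first download, later calls read the cached copy instead of fetching the file again.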
These four cover just the plots required in typical projects. There are other plots too, such as regression plots, which relate to machine learning, so they belong to that part of the course. If you go on to machine learning (data science in general is a step toward it), I will also tell you about regression plots and how they are applied there.
I defined these four categories based on the plot parameters. If I open the official seaborn website and go to the API reference, you will find different groups of plots. If you click on any plot, like this one, kdeplot, you will see its main signature: the data, and then different parameters such as x, y, cmap, and shade=True. Most of these parameters are just for styling, or, we can say, extra things a plot may need. But the first two denote the axes: what goes on the x-axis and what goes on the y-axis. My four categories are based on those two. Notice that here we have two values, x and y. Let me search the docs and open lineplot as well: lineplot also takes these two parameters.
Maybe that one is not shown here. But here is the thing I need to tell you: when we open a page like lineplot, notice that it has two parameters, and they refer to two different columns. That is what the distribution plots show.
Take any dataset, or just load any of these. Let me take a simple example: if I open any random one, you will notice that there are different columns. Some columns have strings, some have integers, and some have floating-point values.
So first we have point plots. These are plotted like a simple plot, but only one parameter, out of all these columns, is chosen by the user. If I plot a graph of frequency, the other axis shows the counts that the library computes itself. Those are what I call point plots: plots with only one parameter. Officially, a point plot is something else. If you search the docs you will find a pointplot there too. Let me search for it: here is pointplot.
If you open its documentation, you will find something like this. This is what seaborn calls pointplot. But my "point plots" category does not mean this plot; it only means we are using one column, or one parameter. This example uses two parameters, one of them total_bill, so in my categorization it would fall under distribution plots.
Distribution plots are those in which we have two columns. For example, if I plot one column against another to see how much one varies as the other changes, I am using two columns, or two parameters, to show their relation: how they vary with each other. That is a distribution plot.
The third one is categorical plots. A categorical plot is when I plot a numerical value against a string value. For example, if I have the number of smokers in a city, I would plot how many smokers there are and categorize them as male and female. Those plots are known as categorical plots.
After that we have matrix plots. I will tell you more about those while we work on them. So I hope you got the idea of what the four plot types are and how I have categorized them. The last two, categorical and matrix plots, match official seaborn groups, so you will find them in the documentation as well.
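The remaining three categories can be sketched side by side. A minimal example, assuming seaborn 0.11 or newer: the rows are typed in by hand in the layout of the tips data, and the specific functions chosen (`scatterplot`, `barplot`, `heatmap`) are just one representative per category, not the only options.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows in the layout of the tips data, typed inline so this runs offline.
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip":        [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "sex":        ["Female", "Male", "Male", "Male", "Female", "Male"],
    "size":       [2, 3, 3, 2, 4, 4],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# "Distribution" in the video's sense: two numeric columns against each other.
sns.scatterplot(data=df, x="total_bill", y="tip", ax=axes[0])

# Categorical: a numeric column split by a string column (mean tip per sex).
# seaborn's own pointplot function is listed in this same group.
sns.barplot(data=df, x="sex", y="tip", ax=axes[1])

# Matrix: a grid of values, here the correlations between the numeric columns.
corr = df[["total_bill", "tip", "size"]].corr()
sns.heatmap(corr, annot=True, ax=axes[2])

plt.close(fig)
```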
The first two are something I made up to help you understand better. You can also learn the official grouping, but this will help you more for now.
One more thing: why are we not creating data in this module? Because whenever you work on a project, in 90 percent of cases you will find the data already made by someone else. There are many data scientists working on data science projects, and students also publish data files every day. So don't waste time making data files. If you are looking for some particular data but haven't found it online, just get a similar file and change the data to suit your needs. For example, if you are plotting students' marks or heights, download any data related to that field and change it accordingly: change the names and change the heights. That's the normal way of working.
Take a simple example: say we have six mobile companies and some users who use their cell phones or SIM cards. The six companies will not each collect data from every single user. They will do something like divide the area into six parts (you take this one, I take that one) and then share the data, or one company collects the data and sells it to the others. So never worry about the data.
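Adapting a downloaded file to your own scenario is usually a couple of pandas calls. A minimal sketch: the `template` frame stands in for a downloaded file (in practice you would read it with `pd.read_csv`), and all names and values here are hypothetical.

```python
import pandas as pd

# Stand-in for a downloaded file (hypothetical values).
template = pd.DataFrame({"name": ["A", "B", "C"], "height_cm": [150, 162, 171]})

# Adapt it: rename a column to match your project, then replace the values.
# rename() returns a new DataFrame, so the template stays unchanged.
students = template.rename(columns={"name": "student"})
students["student"] = ["Asha", "Ben", "Chen"]  # your own names
students["height_cm"] = [158, 165, 174]        # your own measurements

print(students)
```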
Getting data is easy if you go about it in a smart way. Creating your data on your own is also fine, but it consumes your time, and time is the first thing anyone looks at when hiring you or selecting you for a job.
I hope you got the idea of what we are going to do here and how the plots are categorized. You will understand each category better when we demonstrate them. From the next video we will work on the point plots. Thanks for watching, and we will continue in the next video.