1
00:00:00,300 --> 00:00:01,000
Okay, off we go.

2
00:00:01,000 --> 00:00:03,633
So this dataset has a few columns.

3
00:00:03,633 --> 00:00:06,900
It's got row number, customer ID, surname.

4
00:00:06,900 --> 00:00:10,833
What we're looking at is a dataset from a bank.

5
00:00:11,466 --> 00:00:13,366
Of course, it's all fictional.

6
00:00:13,366 --> 00:00:16,366
It's not a real bank, but it's very realistic.

7
00:00:16,366 --> 00:00:19,166
And here we've got a snapshot of its customers:

8
00:00:19,166 --> 00:00:22,833
if you scroll down to the bottom, you'll see there are 10,000 customers,

9
00:00:22,833 --> 00:00:24,933
so 10,000 rows in this dataset.

10
00:00:25,900 --> 00:00:28,166
And what the bank

11
00:00:28,166 --> 00:00:31,800
did is measure some things about these customers.

12
00:00:32,033 --> 00:00:32,333
Right.

13
00:00:32,333 --> 00:00:34,366
Why did they go through this whole exercise?

14
00:00:34,366 --> 00:00:35,333
And what's the challenge here?

15
00:00:35,333 --> 00:00:39,300
Well, the bank has been seeing unusual churn rates.

16
00:00:39,300 --> 00:00:41,700
Churn is when people leave the company,

17
00:00:41,700 --> 00:00:45,633
and they've seen customers leaving at unusually high rates.

18
00:00:45,633 --> 00:00:47,600
They want to understand what the problem is,

19
00:00:47,600 --> 00:00:50,266
and they want to assess and address that problem.

20
00:00:50,266 --> 00:00:54,766
That's why they've hired you to look into this dataset for them

21
00:00:54,900 --> 00:00:57,000
and give them some insights.

22
00:00:57,000 --> 00:00:59,000
And how did this dataset come to be?

23
00:00:59,000 --> 00:01:03,033
Well, six months ago, the bank said, all right, there's a big problem.

24
00:01:03,433 --> 00:01:06,200
We've got to take a sample of our customers.

25
00:01:06,200 --> 00:01:07,433
By the way, this is a sample.

26
00:01:07,433 --> 00:01:10,433
The 10,000 is a tiny number for this bank.

27
00:01:10,433 --> 00:01:12,733
This bank has millions of customers.

28
00:01:12,733 --> 00:01:16,066
This fictional bank operates in Europe,

29
00:01:16,533 --> 00:01:19,666
in three countries: France, Spain and Germany.

30
00:01:20,233 --> 00:01:22,233
And they have lots and lots of customers.

31
00:01:22,233 --> 00:01:25,833
So what they did is take this sample of 10,000 customers,

32
00:01:26,333 --> 00:01:29,600
and six months ago they measured everything they knew

33
00:01:29,600 --> 00:01:33,233
about them: their customer ID, surname, credit score,

34
00:01:33,400 --> 00:01:37,900
geography, gender, age and tenure,

35
00:01:37,900 --> 00:01:42,066
that is, how long they've been with the bank; the customer's balance at that point

36
00:01:42,066 --> 00:01:45,466
in time; and the number of products they had at that point in time.

37
00:01:45,466 --> 00:01:46,466
The number of products covers

38
00:01:46,466 --> 00:01:49,500
things like: do they have a savings account?

39
00:01:49,500 --> 00:01:51,100
Do they have a credit card?

40
00:01:51,100 --> 00:01:52,533
Do they have a loan?

41
00:01:52,533 --> 00:01:55,400
Did the customer have a credit card or not?

42
00:01:55,400 --> 00:01:57,033
That's a yes/no flag.

43
00:01:57,033 --> 00:01:59,566
Is the customer an active member? Another yes/no

44
00:01:59,566 --> 00:02:00,233
flag.

45
00:02:00,233 --> 00:02:03,266
"Active member" can be measured differently by different organizations.

46
00:02:03,500 --> 00:02:04,600
It could be whether or not

47
00:02:04,600 --> 00:02:07,600
the customer logged into their online banking in the past month,

48
00:02:07,800 --> 00:02:11,666
whether they made a transaction in the past two months, or some other

49
00:02:12,033 --> 00:02:15,966
measure like that. And finally, estimated salary.

50
00:02:15,966 --> 00:02:18,300
The bank doesn't know its customers' salaries,

51
00:02:18,300 --> 00:02:21,566
but based on the other things they know, they can estimate a salary

52
00:02:21,566 --> 00:02:24,000
for each customer, and they've given you that information too.

53
00:02:24,000 --> 00:02:27,000
So six months ago they measured all of these things

54
00:02:27,233 --> 00:02:30,500
and said, all right, for these 10,000 randomly selected

55
00:02:30,500 --> 00:02:33,933
customers, we're just going to watch them.

56
00:02:33,933 --> 00:02:36,700
We're going to wait six months, and six months down the track

57
00:02:36,700 --> 00:02:38,300
we're going to check

58
00:02:38,300 --> 00:02:41,300
which of those customers left and which of them stayed.

59
00:02:41,533 --> 00:02:43,933
And that's what this column, Exited, represents.

60
00:02:43,933 --> 00:02:48,200
It tells you whether or not the person left the bank within those six months.

61
00:02:48,200 --> 00:02:53,166
So this person over here left the bank at some point within the six months,

62
00:02:53,166 --> 00:02:56,166
and as of a couple of days ago, he's no longer with the bank.

63
00:02:56,533 --> 00:02:59,400
This person over here, on the other hand, is still with the bank,

64
00:02:59,400 --> 00:03:00,400
so there's a zero here.

65
00:03:00,400 --> 00:03:04,266
This person left the bank, this person stayed, and so on.

66
00:03:04,266 --> 00:03:06,966
So if you see a one, the person is no longer with the bank;

67
00:03:06,966 --> 00:03:08,766
if you see a zero, the person is still with the bank.

68
00:03:08,766 --> 00:03:12,600
Your goal is to create a geodemographic segmentation model

69
00:03:12,600 --> 00:03:17,500
to tell the bank which of their customers are at highest risk of leaving.

70
00:03:17,700 --> 00:03:21,300
What I wanted to mention today is that for a lot of

71
00:03:21,300 --> 00:03:23,333
customer-centric organizations, this is going to be valuable.

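To make the setup concrete, here is a minimal sketch of the data described above: one record per customer, with the attributes measured at the start of the six-month window and Exited as the 0/1 label observed at the end. The snake_case field names, the `churn_rate` and `churn_rate_by` helpers, and the three sample customers are hypothetical, made up for illustration; they are not taken from the real dataset. The row number is left out because it is only an index.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Attributes measured six months ago (hypothetical field names)
    customer_id: int
    surname: str
    credit_score: int
    geography: str          # "France", "Spain" or "Germany"
    gender: str
    age: int
    tenure: int             # how long the customer has been with the bank
    balance: float
    num_of_products: int
    has_cr_card: int        # 1 = yes, 0 = no
    is_active_member: int   # 1 = yes, 0 = no
    estimated_salary: float
    # Outcome observed six months later: 1 = left the bank, 0 = stayed
    exited: int


def churn_rate(customers):
    """Fraction of customers who left the bank during the observation window."""
    if not customers:
        raise ValueError("need at least one customer")
    return sum(c.exited for c in customers) / len(customers)


def churn_rate_by(customers, key):
    """Churn rate per group, e.g. per country, as a first look at segments."""
    groups = {}
    for c in customers:
        groups.setdefault(key(c), []).append(c)
    return {k: churn_rate(v) for k, v in groups.items()}


# Three made-up customers, for illustration only
sample = [
    Customer(1, "Smith", 620, "France", "Female", 42, 2, 0.0, 1, 1, 1, 100000.0, 1),
    Customer(2, "Garcia", 710, "Spain", "Male", 35, 5, 80000.0, 2, 0, 1, 110000.0, 0),
    Customer(3, "Mueller", 580, "Germany", "Female", 50, 1, 125000.0, 1, 1, 0, 90000.0, 1),
]

print(churn_rate(sample))
print(churn_rate_by(sample, key=lambda c: c.geography))
```

Grouping churn by a single column such as geography is only a first look; a segmentation model of the kind described here weighs all of the columns together to rank customers by their risk of leaving.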
72
00:03:23,333 --> 00:03:24,766
I've done this personally.

73
00:03:24,766 --> 00:03:26,433
I've done this so many times.

74
00:03:26,433 --> 00:03:31,366
It is such a value add to any customer-centric organization.

75
00:03:31,366 --> 00:03:36,200
Whenever an organization deals with customers, this adds a lot of value.

76
00:03:36,233 --> 00:03:40,200
The other thing is that the skill you're going to

77
00:03:40,200 --> 00:03:42,300
learn is very transferable.

78
00:03:42,300 --> 00:03:44,200
It doesn't have to be for a bank.

79
00:03:44,200 --> 00:03:46,900
It doesn't have to be for churn rates.

80
00:03:46,900 --> 00:03:49,500
Geodemographic segmentation models can be applied

81
00:03:49,500 --> 00:03:52,533
to countless scenarios.

82
00:03:52,533 --> 00:03:56,700
Even in a bank, for instance, the same approach could work:

83
00:03:57,433 --> 00:04:00,833
Should the person get a loan or not?

84
00:04:00,833 --> 00:04:03,700
Should the person be approved for credit or not?

85
00:04:03,700 --> 00:04:05,466
Once again, you'd have a binary outcome.

86
00:04:05,466 --> 00:04:09,666
Based on prior experience, you would know whether or not

87
00:04:09,666 --> 00:04:12,033
a person was reliable, and you would build a model that says

88
00:04:12,033 --> 00:04:13,066
which people are

89
00:04:13,066 --> 00:04:16,800
more likely to be reliable and which people are more likely to default.

90
00:04:17,000 --> 00:04:20,000
And that could govern the bank's decision on whether or not to give loans.

91
00:04:20,200 --> 00:04:23,100
It could also be about fraudulent transactions,

92
00:04:23,100 --> 00:04:25,966
not even in a bank, but in other financial institutions.

93
00:04:25,966 --> 00:04:28,233
You could find out which transactions

94
00:04:28,233 --> 00:04:31,233
are more likely to be fraudulent and which are less likely.

95
00:04:31,800 --> 00:04:35,400
There are lots of scenarios where you could apply

96
00:04:35,400 --> 00:04:37,466
a geodemographic segmentation model,

97
00:04:37,466 --> 00:04:39,633
and it doesn't even have to be geodemographic.

98
00:04:39,633 --> 00:04:43,366
Whenever you have a scenario with a binary outcome

99
00:04:43,366 --> 00:04:47,400
and lots of independent variables, you can build a

100
00:04:47,600 --> 00:04:51,133
proper, robust model that will tell you which factors influence the outcome.

101
00:04:51,300 --> 00:04:54,000
So you can apply the knowledge

102
00:04:54,000 --> 00:04:56,133
you're going to learn to any kind of scenario

103
00:04:56,133 --> 00:04:59,100
where you have a binary outcome and lots of independent variables.
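As a minimal sketch of "a binary outcome plus several independent variables", here is logistic regression fitted by plain batch gradient descent in standard-library Python. Logistic regression is swapped in here as one common, easy-to-read model for this kind of problem; it is not necessarily the model this course goes on to build. The toy data, learning rate and epoch count are arbitrary assumptions, and the features are assumed to be already scaled to comparable ranges.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that the outcome is 1 (e.g. the customer leaves)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: [scaled age, is_active_member] -> exited (made up for illustration)
X = [[0.9, 0.0], [0.8, 0.0], [0.7, 1.0], [0.2, 1.0], [0.1, 1.0], [0.3, 0.0]]
y = [1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
print("weights:", w, "bias:", b)
print("P(exit), older inactive customer:", predict_proba(w, b, [0.85, 0.0]))
```

The sign of each fitted weight gives a rough reading of which factors push the outcome towards 1 (here, higher age raises the estimated risk and being an active member lowers it), but weights are only comparable when the inputs are on similar scales, which is why raw columns such as balance and age would normally be standardized first.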