So let's take a few minutes to talk about designing and configuring exception handling in Data Factory. In this lesson, we're going to talk about how we design and think through our exception handling strategy. We'll look at what's actually possible, and then we'll talk about how we implement that strategy. And of course, as with most of these lessons, we'll also see it in action in the Azure portal. So with that, let's get started.

First up, a few key considerations when we design an exception handling strategy: activity dependencies, fault tolerance, and retry attempts.

When we build a pipeline in Data Factory, we can choose what happens when each activity completes or fails. So as we build our pipelines, we need to decide what we want to happen at each of those points.

Fault tolerance is the next consideration. As we copy data through a pipeline, what happens when part of the copy fails?
There are ways to continue past those failures, and we'll look at what they are. And finally, retry attempts. We always need to think about retries and the pros and cons that come with a retry strategy.

For activity dependencies, we have four types: success, failure, completion, and skipped.

With a success dependency, the second activity kicks off only when the first activity succeeds. We don't start the second activity until the first one has succeeded.

With a failure dependency, the second activity executes only if the first activity fails.

I'll show you all of this in the portal in just a second, but these are the definitions you need to know.

With a completion dependency, the second activity executes when the first activity completes, regardless of its status. Fail or succeed, it doesn't matter: as soon as the first activity completes, the completion activity kicks off.

And finally, skipped.
With a skipped dependency, the second activity executes only if the first activity is skipped, that is, if it never runs.

Next up, fault tolerance. The question to ask yourself is: which errors can we safely ignore? We'll look at that in just a second as well.

And finally, retry. We need to define the number of retry attempts and the time limits. Keep in mind that if you have a pipeline running and you set a retry count of 10, every retry holds up the rest of your pipeline. So think about those time limits and what they mean for your pipeline, because you don't want to cause failures or further breakdowns downstream by setting retry attempts and time limits too high. That's one side. The other side is that if you don't set them high enough, you can stop or break an entire pipeline over a transient error that would have cleared within a few seconds. Those are some things to think about as you build your pipelines.
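The four dependency conditions described above can be sketched as a small Python function. This is a simplified model, not Data Factory's implementation: it assumes a single upstream activity whose final status is "Succeeded", "Failed", or "Skipped", and it treats completion as "succeeded or failed". The condition names mirror the values used in a pipeline's `dependencyConditions` JSON, as I understand them.

```python
# Simplified model of Data Factory activity dependency conditions.
# Assumption: one upstream activity whose final status is "Succeeded",
# "Failed", or "Skipped"; other statuses (e.g. cancelled) are not modeled.

UPSTREAM_STATUSES = {"Succeeded", "Failed", "Skipped"}


def dependency_met(condition: str, upstream_status: str) -> bool:
    """Return True if a downstream activity with this dependency condition
    should run, given the upstream activity's final status."""
    if upstream_status not in UPSTREAM_STATUSES:
        raise ValueError(f"unknown upstream status: {upstream_status}")
    if condition == "Succeeded":   # success dependency
        return upstream_status == "Succeeded"
    if condition == "Failed":      # failure dependency
        return upstream_status == "Failed"
    if condition == "Completed":   # completion: it ran, whatever the outcome
        return upstream_status in ("Succeeded", "Failed")
    if condition == "Skipped":     # skipped: the upstream activity never ran
        return upstream_status == "Skipped"
    raise ValueError(f"unknown dependency condition: {condition}")


if __name__ == "__main__":
    for status in ("Succeeded", "Failed", "Skipped"):
        runs = [c for c in ("Succeeded", "Failed", "Completed", "Skipped")
                if dependency_met(c, status)]
        print(f"upstream {status}: downstream runs on {runs}")
```

Reading the printed table row by row is a quick way to check which downstream branches fire for each upstream outcome.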
With that, let's hop into the portal and see what some of these things actually look like.

I've opened up a pipeline in Data Factory, and I want to walk through some of the activity dependencies first. I'll grab a Copy Data activity, drag it onto the canvas, and talk through some of the things you need to set. First, down in the configuration panel, we of course set our source, our sink, and so on, but from an exception handling perspective, what we really want to look at is Retry. Here we can set the maximum number of retry attempts, say 10, and a retry interval of 30 seconds, which is how long it waits between attempts. Ten waits of 30 seconds means we'd spend 300 seconds, or 5 minutes, just waiting on retries for this one copy activity, on top of however long each attempt itself runs.
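As a rough sanity check on the retry arithmetic above, this sketch computes a worst-case duration for one activity. It assumes, as I understand Data Factory's behavior, that the retry count is the number of retries after the first attempt (so 10 retries means up to 11 attempts); `attempt_s` is a hypothetical per-attempt duration you would estimate yourself.

```python
def worst_case_seconds(retries: int, retry_interval_s: int, attempt_s: float) -> float:
    """Worst-case wall-clock time for one activity with retries.

    Assumes `retries` retries after the initial attempt (retries + 1
    attempts in total), each attempt running `attempt_s` seconds before
    failing, with `retry_interval_s` seconds of waiting before each retry.
    """
    if retries < 0 or retry_interval_s < 0 or attempt_s < 0:
        raise ValueError("inputs must be non-negative")
    attempts = retries + 1
    waiting = retries * retry_interval_s  # one wait before each retry
    return attempts * attempt_s + waiting


# The lesson's example: 10 retries with a 30-second interval.
print(worst_case_seconds(10, 30, 0))    # 300: waiting time alone
print(worst_case_seconds(10, 30, 120))  # 1620: if each attempt runs 2 minutes before failing
```

Summing this over every activity on a critical path is one way to see whether a generous retry policy could push the whole pipeline past a downstream deadline.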
Moving a little further along, under Settings is where we set fault tolerance. When we open the fault tolerance option, we can choose to skip incompatible rows, skip missing files, skip forbidden files, or skip files with invalid names. Which of these are available depends on how the copy is set up: some will be grayed out depending on the sources we've selected. But this is how we set fault tolerance.

I'd also recommend enabling logging. Logging lets us record copied files, skipped files, and skipped rows: we specify a storage connection and then choose the log settings. This is especially helpful when we start running pipelines, because it gives you a better feel for what's actually happening as the pipeline runs.

All right, the last thing we want to look at here in the portal is the activity dependencies. If I click this little plus button, I can add activities on success, failure, completion, and skipped.
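The fault-tolerance and logging choices made in the portal end up as properties in the Copy activity's JSON. The sketch below builds that fragment as a Python dict. The property names (`enableSkipIncompatibleRow`, `skipErrorFile` with `fileMissing`, `fileForbidden`, and `invalidFileName`, and `logSettings`) reflect my understanding of the Copy activity fault-tolerance schema, so confirm them against your pipeline's JSON (code) view. The linked service name `LogStorage` and the path `copy-logs/` are hypothetical.

```python
import json


def copy_fault_tolerance(tabular: bool, log_linked_service: str, log_path: str) -> dict:
    """Build a Copy activity typeProperties fragment for fault tolerance and
    session logging. Tabular copies use row skipping; file copies use
    file-level skipping (the portal grays out options that don't apply)."""
    props: dict = {}
    if tabular:
        props["enableSkipIncompatibleRow"] = True
    else:
        props["skipErrorFile"] = {
            "fileMissing": True,      # skip files deleted while the copy runs
            "fileForbidden": True,    # skip files we lack permission to read
            "invalidFileName": True,  # skip names the destination can't accept
        }
    props["logSettings"] = {
        "enableCopyActivityLog": True,
        # "Info" also logs copied files; "Warning" logs only skipped items
        # (my understanding; verify in the docs).
        "copyActivityLogSettings": {"logLevel": "Info", "enableReliableLogging": False},
        "logLocationSettings": {
            "linkedServiceName": {"referenceName": log_linked_service,
                                  "type": "LinkedServiceReference"},
            "path": log_path,
        },
    }
    return props


print(json.dumps(copy_fault_tolerance(False, "LogStorage", "copy-logs/"), indent=2))
```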
Let's say this Copy Data activity is the first step in my pipeline, and it copies data over. After the data copies over, let's say I want to kick off a Databricks notebook. So I drag a Notebook activity onto the canvas, choose the on-success output, and drag that little green box over to the notebook. Now this is my success activity: if the data copies successfully, the notebook kicks off and runs.

I can also add a failure activity. Let's say that if the copy fails, I want to kick off an Azure Function that does something else. I drag the Azure Function activity onto the canvas and, again, drag the failure box over to the new activity.

So now I've got two different things going on. The copy activity runs. If it succeeds, it kicks off the notebook. If it doesn't, it kicks off an Azure Function. That function could alert me or do any number of other things, since we can build a function to do a lot of different things, but either way it does something else to help us out in our pipeline.
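Behind the canvas, each success or failure arrow is stored as a `dependsOn` entry on the downstream activity. This sketch builds the two branches from the walkthrough as Python dicts. The `dependsOn`/`dependencyConditions` shape and the activity type names follow the pipeline JSON as I understand it. The activity names are hypothetical, and type-specific settings (notebook path, function name, linked services) are omitted, so this is not a deployable definition.

```python
import json


def depends_on(upstream: str, condition: str) -> list:
    """One dependsOn entry: run after `upstream` when `condition` holds."""
    return [{"activity": upstream, "dependencyConditions": [condition]}]


# Hypothetical names; only the dependency wiring is shown.
activities = [
    {"name": "CopySalesData", "type": "Copy"},
    {"name": "RunTransformNotebook", "type": "DatabricksNotebook",
     "dependsOn": depends_on("CopySalesData", "Succeeded")},  # success arrow
    {"name": "HandleCopyFailure", "type": "AzureFunctionActivity",
     "dependsOn": depends_on("CopySalesData", "Failed")},     # failure arrow
]

print(json.dumps({"activities": activities}, indent=2))
```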
I can also add, let's say, a webhook that I want to fire whether the copy activity succeeds or fails. So I go down to Completion, grab the blue box, and drag it over to the webhook. Now you can see I have a pipeline that does one thing on success, another on failure, and, regardless of whether the copy succeeds or fails, kicks off this webhook.

And finally, our last one is skipped. Let's say that further along this pipeline we had a Lookup activity, and I connect the notebook to it with a Skipped dependency. If the notebook never gets kicked off because the copy doesn't succeed (say we start the pipeline, the copy activity fails, and execution goes down the failure path to the Azure Function), then the notebook never runs. Because the notebook never ran, the skipped dependency is satisfied, and the Lookup activity runs.

So those are the four dependency conditions you have.
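To check the walkthrough's reasoning end to end, here is a small simulation of the whole example graph under simplified rules: an activity whose dependency isn't met is marked Skipped, completion fires on success or failure, and each activity has exactly one upstream dependency. The activity names and the forced copy failure are hypothetical; this models the logic only and does not call Data Factory.

```python
# Toy simulation of the lesson's example pipeline (not Data Factory itself).
# Each activity maps to (upstream activity, dependency condition); roots map to None.
PIPELINE = {
    "Copy": None,
    "Notebook": ("Copy", "Succeeded"),
    "AzureFunction": ("Copy", "Failed"),
    "Webhook": ("Copy", "Completed"),
    "Lookup": ("Notebook", "Skipped"),
}


def met(condition: str, status: str) -> bool:
    """Whether a dependency condition is satisfied by an upstream status."""
    return {
        "Succeeded": status == "Succeeded",
        "Failed": status == "Failed",
        "Completed": status in ("Succeeded", "Failed"),
        "Skipped": status == "Skipped",
    }[condition]


def simulate(outcomes: dict) -> dict:
    """Return each activity's final status.

    `outcomes` gives the result an activity produces if it runs
    ("Succeeded" or "Failed"); unlisted activities are assumed to succeed.
    PIPELINE must list upstream activities before their dependents.
    """
    status = {}
    for name, dep in PIPELINE.items():
        if dep is None or met(dep[1], status[dep[0]]):
            status[name] = outcomes.get(name, "Succeeded")
        else:
            status[name] = "Skipped"
    return status


print(simulate({"Copy": "Failed"}))  # the failure scenario from the walkthrough
print(simulate({}))                  # the happy path
```

In the failure scenario, the notebook ends up Skipped, which is exactly what lets the Lookup run, while the webhook fires in both scenarios.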
I'd suggest you actually take a whiteboard, map out all the steps in your pipelines, and think about what you want to happen at each one, whether that's email notifications, webhooks, Azure Functions, notebooks, or further copy activities. Whatever it is, map it all out, because it can get quite complicated if you have a fairly large pipeline.

With that, let's jump back in and talk about implementing our strategy. First step: map out the main steps in your pipeline.

Second, think about where notifications matter along those steps. Where do you need to receive emails on success, failure, or completion? Where do you need to know what's going on?

Then, what happens on an outright failure? If a step fails completely, what do you want to do? Do we stop the pipeline entirely? Send notifications? Retry? Kick off a webhook to do something else? Decide up front what happens when an activity step fails outright.
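One way to make the whiteboard plan concrete is to write it down as data and check it for gaps. This sketch is purely illustrative: the `StepPlan` fields, the allowed notification outcomes, and the step names are hypothetical. It captures only the questions from this lesson: which outcomes notify, what happens on an outright failure, and how many retries.

```python
from dataclasses import dataclass, field

# Outcomes a step might notify on (hypothetical planning vocabulary).
VALID_NOTIFY = {"Succeeded", "Failed", "Completed"}


@dataclass
class StepPlan:
    """One whiteboard box: a pipeline step and its exception-handling decisions."""
    name: str
    notify_on: set = field(default_factory=set)  # outcomes that send a notification
    on_failure: str = ""                          # e.g. "stop", "webhook", "function"
    retries: int = 0


def plan_gaps(steps: list) -> list:
    """Return human-readable gaps in the plan."""
    gaps = []
    for step in steps:
        if not step.on_failure:
            gaps.append(f"{step.name}: no action defined for an outright failure")
        unknown = step.notify_on - VALID_NOTIFY
        if unknown:
            gaps.append(f"{step.name}: unknown notification outcomes {sorted(unknown)}")
        if step.retries < 0:
            gaps.append(f"{step.name}: retries must be >= 0")
    return gaps


plan = [
    StepPlan("Copy raw files", notify_on={"Failed"}, on_failure="function", retries=3),
    StepPlan("Transform notebook", notify_on={"Completed"}),
]
for gap in plan_gaps(plan):
    print(gap)
```

Running the check before building anything in the portal surfaces the steps whose failure behavior hasn't been decided yet.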
Then ask: has general troubleshooting been put in place? We'll cover troubleshooting in a later section, but you want to think about it now as well. If you go through the activity steps and start running into issues, ask yourself why. Is there a problem with the source file I'm trying to ingest? Is there a problem with an event hub, or with the data being ingested into my blob storage, for instance? Issues like these can cause trouble in your Data Factory pipeline even though they have little to do with the Data Factory activity itself, so keep them in mind.

Next, think about optimization. We'll cover that in another section too, but have you gone through each activity step and optimized it, so that your pipeline is robust and running as efficiently as it possibly can?

If you've done all that, we're ready for a review. First, have a plan.
It's really important that you think through your exception handling strategy and your pipeline strategy, and that you have an actual plan mapped out, rather than just going into Data Factory, grabbing a few activities, and dragging them onto the canvas.

Once you have that plan, implement your strategy. Make sure you've implemented and tested it, so you have a really good feel for what's going to happen.

For the DP-203 exam, you need to understand what's possible. If you're thinking through optimization, troubleshooting, alerting, and what happens on failure, you should have a really good feel for it, not just for the DP-203 but, honestly, for your career as a data engineer. So make sure you're thinking about what's possible as you design your exception handling strategy.

That's it for this lesson. I'll see you in the next one.