1 00:00:00,850 --> 00:00:01,920 Hey, what's up, Gurus? 2 00:00:01,920 --> 00:00:02,930 Congratulations. 3 00:00:02,930 --> 00:00:05,860 You have made it to the end of a pretty long section. 4 00:00:05,860 --> 00:00:07,010 In this last video, 5 00:00:07,010 --> 00:00:10,550 we are going to recap some of the highlights of this section 6 00:00:10,550 --> 00:00:12,703 and things that you need to keep in mind. 7 00:00:13,820 --> 00:00:16,660 First, this video is all review. 8 00:00:16,660 --> 00:00:18,620 We're not going to be covering anything new, 9 00:00:18,620 --> 00:00:21,530 so if you see anything here and you're not quite sure 10 00:00:21,530 --> 00:00:23,400 what it was or what it means, 11 00:00:23,400 --> 00:00:26,780 then you should go back and re-watch those videos. 12 00:00:26,780 --> 00:00:29,730 Second, focus on the DP-203. 13 00:00:29,730 --> 00:00:32,470 As you've noticed from all of the lessons in this section, 14 00:00:32,470 --> 00:00:34,420 we are focusing on the 203, 15 00:00:34,420 --> 00:00:37,500 which sometimes means glossing over some details 16 00:00:37,500 --> 00:00:39,540 that might be helpful for a data engineer, 17 00:00:39,540 --> 00:00:41,210 such as Cosmos DB, 18 00:00:41,210 --> 00:00:44,210 but aren't likely to be found on the DP-203, 19 00:00:44,210 --> 00:00:48,050 so just keep that in mind as you move through this review. 20 00:00:48,050 --> 00:00:51,390 And finally, if you don't know something, again, review. 21 00:00:51,390 --> 00:00:53,230 If something seems confusing to you, 22 00:00:53,230 --> 00:00:54,860 or you're not quite sure, 23 00:00:54,860 --> 00:00:56,540 go back through and re-watch the video 24 00:00:56,540 --> 00:00:58,300 or redo that hands-on lab. 25 00:00:58,300 --> 00:00:59,990 It's important. 26 00:00:59,990 --> 00:01:02,030 All right, here we go. 27 00:01:02,030 --> 00:01:06,400 So what are the stream processing services?
28 00:01:06,400 --> 00:01:09,000 I'm going to give you a second to see if you can name them. 29 00:01:09,000 --> 00:01:10,823 Hint, hint: there are 3. 30 00:01:12,450 --> 00:01:16,880 If you said Azure Stream Analytics, Databricks, 31 00:01:18,770 --> 00:01:22,200 and HDInsight, you are correct. 32 00:01:22,200 --> 00:01:23,460 And as you'll note, 33 00:01:23,460 --> 00:01:26,270 in this section we focused almost exclusively 34 00:01:26,270 --> 00:01:28,350 on Azure Stream Analytics, 35 00:01:28,350 --> 00:01:31,270 other than a few highlights of the other services, 36 00:01:31,270 --> 00:01:34,000 because if you look at the exam skills, 37 00:01:34,000 --> 00:01:36,840 they're really focusing on Azure Stream Analytics. 38 00:01:36,840 --> 00:01:37,750 That doesn't mean you won't see a Databricks 39 00:01:37,750 --> 00:01:39,550 or HDInsight question, 40 00:01:39,550 --> 00:01:42,783 but Stream Analytics seems to be the most likely choice. 41 00:01:43,980 --> 00:01:47,830 When would I choose streaming over batch? 42 00:01:47,830 --> 00:01:49,990 Well, streaming is a better fit 43 00:01:49,990 --> 00:01:52,770 if you need the information right now. 44 00:01:52,770 --> 00:01:55,280 Remember, it's good for recommendation engines, 45 00:01:55,280 --> 00:01:59,330 fraud detection, marketing applications, and machine learning. 46 00:01:59,330 --> 00:02:01,860 However, streaming is more costly, 47 00:02:01,860 --> 00:02:04,440 and it's a bit more complicated to do transformations 48 00:02:04,440 --> 00:02:06,360 and pull insights out of the data. 49 00:02:06,360 --> 00:02:09,050 This lesson is important. 50 00:02:09,050 --> 00:02:10,950 Make sure that you understand 51 00:02:10,950 --> 00:02:13,160 that there are 5 types of windows 52 00:02:13,160 --> 00:02:15,070 and that you know what they are.
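As a rough illustration of how two of those window types differ, tumbling and hopping, here is a minimal Python sketch that works out which window or windows an event timestamp falls into. It is not Stream Analytics code: the function names are hypothetical, and it assumes integer-second timestamps, windows aligned to time zero, and half-open [start, end) boundaries, which may not match how Stream Analytics handles window edges.

```python
from __future__ import annotations

# Hypothetical helpers. Assumptions: integer-second timestamps, windows
# aligned to time zero, and half-open [start, end) window boundaries.


def tumbling_window(ts: int, size: int) -> tuple[int, int]:
    """Return the one tumbling window [start, end) that contains ts.

    Tumbling windows are fixed-size and never overlap, so every event
    belongs to exactly one window.
    """
    start = (ts // size) * size
    return (start, start + size)


def hopping_windows(ts: int, size: int, hop: int) -> list[tuple[int, int]]:
    """Return every hopping window [start, end) that contains ts.

    Hopping windows are `size` long and start every `hop` units, so when
    hop < size they overlap and one event can land in several windows.
    """
    windows = []
    start = (ts // hop) * hop  # latest window start at or before ts
    while start > ts - size:   # this window still extends past ts
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)


if __name__ == "__main__":
    print(tumbling_window(12, 10))     # (10, 20)
    print(hopping_windows(12, 10, 5))  # [(5, 15), (10, 20)]
```

When the hop equals the window size, the hopping result collapses to the single tumbling window, which is a handy way to remember that a tumbling window is just a hopping window that never overlaps.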
53 00:02:15,070 --> 00:02:16,980 Remember, we talked about Stream Analytics 54 00:02:16,980 --> 00:02:18,390 and setting up those windows 55 00:02:18,390 --> 00:02:20,730 to look at data as it moves across? 56 00:02:20,730 --> 00:02:23,240 There are tumbling windows, hopping windows, 57 00:02:23,240 --> 00:02:28,200 sliding windows, session windows, and snapshot windows. 58 00:02:28,200 --> 00:02:29,920 In that lesson, I recommended 59 00:02:29,920 --> 00:02:33,090 that you understand, in detail, what all of those are. 60 00:02:33,090 --> 00:02:35,160 I still think that's incredibly important. 61 00:02:35,160 --> 00:02:38,110 So if you don't know what all 5 of those are, 62 00:02:38,110 --> 00:02:39,870 go back and re-watch that lesson. 63 00:02:39,870 --> 00:02:41,210 It's really going to help you. 64 00:02:41,210 --> 00:02:44,860 It's an important concept, not just for the DP-203, 65 00:02:44,860 --> 00:02:47,100 but also for your work as a data engineer. 66 00:02:47,100 --> 00:02:49,500 Do you remember watermarks? 67 00:02:49,500 --> 00:02:51,870 Remember that there are 2 formulas for watermarks: 68 00:02:51,870 --> 00:02:56,140 one for when there are incoming events, and one for 69 00:02:56,140 --> 00:02:58,740 when there are no incoming events. 70 00:02:58,740 --> 00:02:59,863 Make sure that you remember 71 00:02:59,863 --> 00:03:03,910 the top formula: watermark = largest event time - 72 00:03:03,910 --> 00:03:05,470 out-of-order tolerance. 73 00:03:05,470 --> 00:03:07,190 That's for incoming events. 74 00:03:07,190 --> 00:03:08,180 And the bottom one, 75 00:03:08,180 --> 00:03:09,830 watermark = current estimated arrival time - 76 00:03:09,830 --> 00:03:11,980 late arrival tolerance, 77 00:03:11,980 --> 00:03:15,420 is for when the arrival time has to be estimated, 78 00:03:15,420 --> 00:03:17,433 which means there's no incoming event. 79 00:03:19,100 --> 00:03:20,840 Remember that there are a couple of different ways 80 00:03:20,840 --> 00:03:22,010 to keep time: event time and arrival time.
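The two watermark formulas are simple subtraction, and writing them as a tiny Python function can help keep straight which tolerance goes with which case. This is a sketch of the arithmetic only, not how Stream Analytics computes watermarks internally; the function and parameter names are hypothetical, and it assumes the job already knows its largest event time and its current estimated arrival time.

```python
from datetime import datetime, timedelta


def watermark(
    has_incoming_events: bool,
    largest_event_time: datetime,
    estimated_arrival_time: datetime,
    out_of_order_tolerance: timedelta,
    late_arrival_tolerance: timedelta,
) -> datetime:
    """Apply the two watermark formulas (hypothetical helper).

    - Incoming events:    largest event time - out-of-order tolerance
    - No incoming events: current estimated arrival time - late arrival tolerance
    """
    if has_incoming_events:
        return largest_event_time - out_of_order_tolerance
    return estimated_arrival_time - late_arrival_tolerance


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    # Events flowing: newest event at 12:00:10, 5 s out-of-order tolerance.
    print(watermark(True, t0 + timedelta(seconds=10), t0,
                    timedelta(seconds=5), timedelta(seconds=30)))
    # 2024-01-01 12:00:05
    # No events: estimated arrival 12:01:00, 30 s late arrival tolerance.
    print(watermark(False, t0, t0 + timedelta(minutes=1),
                    timedelta(seconds=5), timedelta(seconds=30)))
    # 2024-01-01 12:00:30
```

A larger tolerance pushes the watermark further back in time, which gives late or out-of-order events more room but means the job waits longer before it can move on.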
81 00:03:22,010 --> 00:03:23,570 Make sure that you remember that. 82 00:03:23,570 --> 00:03:25,330 And then, very importantly, 83 00:03:25,330 --> 00:03:27,530 remember the concept of tolerance. 84 00:03:27,530 --> 00:03:30,630 We looked at how to set tolerance in the Azure portal. 85 00:03:30,630 --> 00:03:32,670 But remember, when you set tolerance, 86 00:03:32,670 --> 00:03:35,160 if you set it too short, 87 00:03:35,160 --> 00:03:36,720 you're likely to see data loss, 88 00:03:36,720 --> 00:03:40,140 and if you set it too long, you can hold up processing, because the job has to wait longer before it produces output. 89 00:03:40,140 --> 00:03:43,670 So make sure that you keep those concepts in mind. 90 00:03:43,670 --> 00:03:45,530 And if you don't remember how to do all that, 91 00:03:45,530 --> 00:03:46,913 go back and re-watch. 92 00:03:47,900 --> 00:03:49,230 Question. 93 00:03:49,230 --> 00:03:53,240 If I told you I wanted to upsert data via streaming, 94 00:03:53,240 --> 00:03:54,390 what would you tell me? 95 00:03:56,540 --> 00:04:01,413 You should tell me that you have to upsert to Cosmos DB, 96 00:04:02,600 --> 00:04:06,273 that you have to set your compatibility level to 1.2, 97 00:04:07,170 --> 00:04:09,650 and that it's going to require configuration 98 00:04:09,650 --> 00:04:12,240 on the Cosmos DB side. 99 00:04:12,240 --> 00:04:17,240 Keep in mind, Cosmos DB itself? Not on the DP-203. 100 00:04:17,490 --> 00:04:22,090 But upserting via Stream Analytics to Cosmos DB might 101 00:04:22,090 --> 00:04:23,160 be on the 203. 102 00:04:23,160 --> 00:04:26,000 It's at least one of the requirements that they've listed. 103 00:04:26,000 --> 00:04:28,493 So make sure that you remember how to do that. 104 00:04:30,350 --> 00:04:33,320 Don't forget to set alerts and monitor jobs. 105 00:04:33,320 --> 00:04:36,590 Hey, this is a really good general tip to follow, 106 00:04:36,590 --> 00:04:37,880 no matter what you do in the cloud.
107 00:04:37,880 --> 00:04:40,890 Whether it's data engineering, data science, or DevOps, 108 00:04:40,890 --> 00:04:43,330 set alerts. Monitor your jobs. 109 00:04:43,330 --> 00:04:45,460 You will thank yourself for it later. 110 00:04:45,460 --> 00:04:47,730 And remember, as we saw, you do that 111 00:04:47,730 --> 00:04:50,740 by understanding what your key metrics are, 112 00:04:50,740 --> 00:04:54,290 looking at events, utilization, watermark delay, and errors, 113 00:04:54,290 --> 00:04:57,050 and then by creating those alerts. 114 00:04:57,050 --> 00:05:00,483 So make sure that you're doing both of those things. 115 00:05:03,300 --> 00:05:04,430 Remember, we also talked 116 00:05:04,430 --> 00:05:07,350 about partitioning and repartitioning. 117 00:05:07,350 --> 00:05:10,350 Remember, we said that partitioning means taking our data 118 00:05:10,350 --> 00:05:13,973 and dividing it up into buckets based on a partition key. 119 00:05:14,870 --> 00:05:17,210 We talked a lot about why we partition as well. 120 00:05:17,210 --> 00:05:19,480 Remember the story about LEGOs: 121 00:05:19,480 --> 00:05:21,913 we partition to make our searching faster. 122 00:05:23,790 --> 00:05:26,530 And we talked about how to select a partition key 123 00:05:26,530 --> 00:05:28,970 by making sure that you choose a key 124 00:05:28,970 --> 00:05:32,290 that is both static and has high cardinality, 125 00:05:32,290 --> 00:05:34,283 which just means it has a large number of distinct values. 126 00:05:35,730 --> 00:05:39,120 Then we talked about repartitioning, or shuffling. 127 00:05:39,120 --> 00:05:40,570 And that was for those scenarios 128 00:05:40,570 --> 00:05:42,750 that aren't fully parallelized, 129 00:05:42,750 --> 00:05:46,010 which basically means they're not embarrassingly parallel. 130 00:05:46,010 --> 00:05:48,820 Embarrassingly parallel means a one-to-one relationship, 131 00:05:48,820 --> 00:05:53,250 where each input partition maps to exactly one output partition.
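To see why cardinality matters when picking a partition key, here is a minimal Python sketch that hashes a key into one of a fixed number of partitions and counts how the events spread out. The hashing scheme (CRC-32 modulo the partition count), the function names, and the sample keys are all assumptions for illustration; this is not how Stream Analytics or Event Hubs actually assigns partitions.

```python
from __future__ import annotations

import zlib
from collections import Counter


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a partition index (hypothetical scheme).

    CRC-32 is used only because it is deterministic across runs; Python's
    built-in hash() for strings is randomized per process. A real service
    uses its own hashing scheme.
    """
    return zlib.crc32(key.encode("utf-8")) % partition_count


def partition_spread(keys: list[str], partition_count: int) -> Counter:
    """Count how many events land in each partition."""
    return Counter(partition_for(k, partition_count) for k in keys)


if __name__ == "__main__":
    # Low cardinality: only 2 distinct values, so at most 2 of 8 partitions get data.
    low = ["US", "EU"] * 500
    # High cardinality: 1,000 distinct device IDs.
    high = [f"device-{i}" for i in range(1000)]
    print(len(partition_spread(low, 8)), len(partition_spread(high, 8)))
```

With only two distinct key values, no more than two of the eight partitions can ever receive data, however many events arrive, so the other partitions sit idle. The 1,000 distinct device IDs reach all eight partitions in this sketch.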
132 00:05:53,250 --> 00:05:54,930 Remember, we talked about how, if it wasn't, 133 00:05:54,930 --> 00:05:57,710 you can repartition so that you can process the partitions independently. 134 00:05:57,710 --> 00:05:59,410 Make sure that you keep all of that information 135 00:05:59,410 --> 00:06:00,383 in mind as well. 136 00:06:04,050 --> 00:06:04,883 Oh no! 137 00:06:04,883 --> 00:06:07,260 Your Stream Analytics job has crashed. 138 00:06:07,260 --> 00:06:08,670 Should you panic? 139 00:06:08,670 --> 00:06:10,190 The answer? 140 00:06:10,190 --> 00:06:11,190 No. 141 00:06:11,190 --> 00:06:12,023 But why? 142 00:06:13,260 --> 00:06:14,660 I'll give you just a second. 143 00:06:15,780 --> 00:06:16,847 You should have said, 144 00:06:16,847 --> 00:06:19,860 "Well, the job gets started and the work is distributed 145 00:06:19,860 --> 00:06:21,210 "across worker nodes. 146 00:06:21,210 --> 00:06:22,370 "Something bad happens. 147 00:06:22,370 --> 00:06:23,400 "The node fails. 148 00:06:23,400 --> 00:06:25,870 "But remember, Azure Stream Analytics has 149 00:06:25,870 --> 00:06:27,810 "an automatic recovery system 150 00:06:27,810 --> 00:06:30,330 "that's going to generate a new healthy node 151 00:06:30,330 --> 00:06:33,757 "and pick up from the last available checkpoint." 152 00:06:36,390 --> 00:06:40,200 We also talked about the Stream Analytics output error policy. 153 00:06:40,200 --> 00:06:43,030 And remember, there are two options: Drop, which drops events that result 154 00:06:43,030 --> 00:06:44,750 in a data conversion error, 155 00:06:44,750 --> 00:06:48,740 and Retry, which retries until the write succeeds. 156 00:06:48,740 --> 00:06:52,080 However, retrying could literally take forever. 157 00:06:52,080 --> 00:06:56,090 So in summary, we have talked about a ton of stuff. 158 00:06:56,090 --> 00:06:58,930 I hope that very quick walkthrough sparked 159 00:06:58,930 --> 00:07:01,380 a lot of memories and that you have a clear understanding 160 00:07:01,380 --> 00:07:02,863 of all of those concepts.
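Here is a minimal Python sketch of the difference between the two output error policies, Drop and Retry, applied to a hypothetical sink. DataConversionError, OutputErrorPolicy, write_events, and the max_attempts escape hatch are all invented for illustration; in Stream Analytics the policy is a job setting, not code like this.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Iterable, Optional


class DataConversionError(Exception):
    """Hypothetical error: an event can't be converted to the output's schema."""


class OutputErrorPolicy(Enum):
    DROP = "drop"
    RETRY = "retry"


def write_events(
    events: Iterable[dict],
    write: Callable[[dict], None],
    policy: OutputErrorPolicy,
    max_attempts: Optional[int] = None,
) -> list[dict]:
    """Write each event with `write` and return the events that were dropped.

    DROP:  an event that raises DataConversionError is dropped for good.
    RETRY: the same event is retried until the write succeeds. With
           max_attempts=None there is no limit, so a permanently bad event
           blocks forever; max_attempts exists only so this sketch is testable.
    """
    dropped: list[dict] = []
    for event in events:
        attempts = 0
        while True:
            attempts += 1
            try:
                write(event)
                break
            except DataConversionError:
                if policy is OutputErrorPolicy.DROP:
                    dropped.append(event)
                    break
                if max_attempts is not None and attempts >= max_attempts:
                    raise


    return dropped


if __name__ == "__main__":
    def strict_sink(event: dict) -> None:
        # Hypothetical sink that only accepts numeric temperatures.
        if not isinstance(event["temp"], (int, float)):
            raise DataConversionError(event)

    events = [{"temp": 20}, {"temp": "n/a"}, {"temp": 22}]
    print(write_events(events, strict_sink, OutputErrorPolicy.DROP))
    # [{'temp': 'n/a'}]
```

Under Drop the bad event is simply lost and the rest keep flowing; under Retry with no limit, that same event would be retried forever and nothing behind it would be written, which is the "could literally take forever" caveat.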
161 00:07:03,990 --> 00:07:05,940 Some things to keep in mind as you go. 162 00:07:05,940 --> 00:07:07,860 First, think scenario. 163 00:07:07,860 --> 00:07:11,610 Remember, as you go through all of these lessons, 164 00:07:11,610 --> 00:07:14,560 be thinking about how you would fit these into scenarios 165 00:07:14,560 --> 00:07:16,070 that you might be working in, 166 00:07:16,070 --> 00:07:18,450 or think about scenarios that might happen 167 00:07:18,450 --> 00:07:20,480 as you go through and look at this. 168 00:07:20,480 --> 00:07:22,910 Things like, "Well, I'm buying a movie ticket, 169 00:07:22,910 --> 00:07:25,320 "and how would that data be processed?" 170 00:07:25,320 --> 00:07:28,170 That's going to make you a better engineer. 171 00:07:28,170 --> 00:07:31,640 Second, focus on Azure Stream Analytics first. 172 00:07:31,640 --> 00:07:33,250 Most of the stream requirements 173 00:07:33,250 --> 00:07:36,020 are about Azure Stream Analytics. 174 00:07:36,020 --> 00:07:39,660 Doesn't mean you shouldn't look at HDInsight or Databricks. 175 00:07:39,660 --> 00:07:43,030 It just means that 90% of the requirements 176 00:07:43,030 --> 00:07:44,973 are about Azure Stream Analytics. 177 00:07:46,290 --> 00:07:48,280 Finally, don't forget the labs. You're 178 00:07:48,280 --> 00:07:51,400 going to need those for the exam, and for your career. 179 00:07:51,400 --> 00:07:52,470 If you're enjoying the course, 180 00:07:52,470 --> 00:07:55,560 don't forget to smash that thumbs up button on the lessons. 181 00:07:55,560 --> 00:07:58,120 Congratulations on finishing this section, 182 00:07:58,120 --> 00:07:59,453 and I'll see you in the next.