1 00:00:00,320 --> 00:00:02,260 Welcome to our introduction 2 00:00:02,260 --> 00:00:04,950 to Azure Stream Analytics. 3 00:00:04,950 --> 00:00:06,030 Continuing on, 4 00:00:06,030 --> 00:00:08,540 we're going to take a look at Stream Analytics. 5 00:00:08,540 --> 00:00:10,600 And in this video, I'm going to be talking 6 00:00:10,600 --> 00:00:15,420 to you at a high level about what Azure Stream Analytics is. 7 00:00:15,420 --> 00:00:19,240 We're going to talk about when, where, and why it's used. 8 00:00:19,240 --> 00:00:20,073 And then, of course, 9 00:00:20,073 --> 00:00:23,160 we're going to talk a little bit about how it works. 10 00:00:23,160 --> 00:00:24,720 So to start off our discussion, 11 00:00:24,720 --> 00:00:29,330 we are going to be talking about Azure Stream Analytics. 12 00:00:29,330 --> 00:00:32,010 In order to talk about Azure Stream Analytics, 13 00:00:32,010 --> 00:00:36,100 we have to talk about the tale of 2 processes. 14 00:00:36,100 --> 00:00:37,590 So over here on the right, 15 00:00:37,590 --> 00:00:40,380 you will see a giant circle of a pizza, 16 00:00:40,380 --> 00:00:43,380 and you'll also see a picture of a stream. 17 00:00:43,380 --> 00:00:47,360 So we need to talk about batch versus stream. 18 00:00:47,360 --> 00:00:49,690 So when we look at batch, 19 00:00:49,690 --> 00:00:54,060 batch processing is going to be much more like a pizza oven. 20 00:00:54,060 --> 00:00:57,300 Someone places an order, we queue up 2 21 00:00:57,300 --> 00:00:59,410 or 3 or 4 of those orders together, 22 00:00:59,410 --> 00:01:01,400 we throw those pizzas in the oven. 23 00:01:01,400 --> 00:01:04,080 We then pull those out, put them on the rack 24 00:01:04,080 --> 00:01:05,580 and then we serve the pizza; 25 00:01:05,580 --> 00:01:07,300 that's batch process. 26 00:01:07,300 --> 00:01:09,860 Stream process doesn't build things up. 27 00:01:09,860 --> 00:01:12,470 Stream process is just like this stream over here. 
28 00:01:12,470 --> 00:01:13,960 As the data comes through, 29 00:01:13,960 --> 00:01:16,330 we're going to simply be processing that data 30 00:01:16,330 --> 00:01:18,400 in real or near-real time 31 00:01:18,400 --> 00:01:21,150 and we're going to be doing some transformations 32 00:01:21,150 --> 00:01:23,570 and work on that data as it moves through, 33 00:01:23,570 --> 00:01:26,723 but we're not stopping and running everything all at once. 34 00:01:28,320 --> 00:01:31,520 So Azure Stream Analytics then is going 35 00:01:31,520 --> 00:01:34,180 to be this stream process. 36 00:01:34,180 --> 00:01:35,410 And when we talk about batch, 37 00:01:35,410 --> 00:01:38,730 you're going to see that more in things like trading. 38 00:01:38,730 --> 00:01:40,150 So if we're going to place a trading order, 39 00:01:40,150 --> 00:01:41,850 we stop and place this trading order 40 00:01:41,850 --> 00:01:44,500 and then we place the next trading order. 41 00:01:44,500 --> 00:01:46,590 Or traffic patterns, 42 00:01:46,590 --> 00:01:50,250 we have a light that is red, turns to green, 43 00:01:50,250 --> 00:01:52,050 then it turns to yellow and red and it stops, 44 00:01:52,050 --> 00:01:53,550 and then the next side goes. 45 00:01:53,550 --> 00:01:55,530 So it's a batch of traffic going through. 46 00:01:55,530 --> 00:01:57,570 The first batch goes, then we stop, 47 00:01:57,570 --> 00:01:59,840 then we send the next batch through. 48 00:01:59,840 --> 00:02:01,320 Stream processing on the other hand, 49 00:02:01,320 --> 00:02:03,830 you're going to see that for things like fraud detection 50 00:02:03,830 --> 00:02:05,310 because we don't have time to stop 51 00:02:05,310 --> 00:02:06,450 and analyze all the data. 52 00:02:06,450 --> 00:02:09,150 We want to see in real time what's happening. 53 00:02:09,150 --> 00:02:11,430 So we can detect where fraud might be used 54 00:02:11,430 --> 00:02:13,300 and we can turn it off immediately. 
55 00:02:13,300 --> 00:02:15,040 So we have to be processing data 56 00:02:15,040 --> 00:02:18,300 from millions of different sources 57 00:02:18,300 --> 00:02:21,143 and seeing if we can detect patterns from that data. 58 00:02:21,990 --> 00:02:24,350 Also, you'll see that in purchasing. 59 00:02:24,350 --> 00:02:26,530 So, think Amazon here; 60 00:02:26,530 --> 00:02:29,760 if I go out and I buy something on Amazon, 61 00:02:29,760 --> 00:02:31,870 it's going to recommend other products for me 62 00:02:31,870 --> 00:02:33,600 and that's done in real time. 63 00:02:33,600 --> 00:02:36,720 So we need to have that sort of information. 64 00:02:36,720 --> 00:02:39,520 You'll also see that like at AMC theatres. 65 00:02:39,520 --> 00:02:40,830 So if you go to the movie theater 66 00:02:40,830 --> 00:02:42,210 and you buy a ticket, 67 00:02:42,210 --> 00:02:44,470 well, that data is being processed in real time 68 00:02:44,470 --> 00:02:46,430 because it has to know how many seats are left 69 00:02:46,430 --> 00:02:47,390 in the theatre, 70 00:02:47,390 --> 00:02:49,050 which seat you purchased. 71 00:02:49,050 --> 00:02:50,660 It has to be able to process all 72 00:02:50,660 --> 00:02:51,770 of that information through 73 00:02:51,770 --> 00:02:53,530 and spit out your tickets to you. 74 00:02:53,530 --> 00:02:56,493 So those are examples of stream processes. 75 00:02:59,210 --> 00:03:02,810 Be careful when moving immediately to stream. 76 00:03:02,810 --> 00:03:05,330 A lot of businesses, that's what they want 77 00:03:05,330 --> 00:03:07,247 because they look at this and they say, 78 00:03:07,247 --> 00:03:08,930 "Well, why wouldn't I want real time? 79 00:03:08,930 --> 00:03:10,110 "We want everything now. 80 00:03:10,110 --> 00:03:13,120 "I don't want to have to wait and only get data once a day. 81 00:03:13,120 --> 00:03:15,140 "I want everything immediately." 
82 00:03:15,140 --> 00:03:16,360 Be careful when doing that 83 00:03:16,360 --> 00:03:18,430 because when we talk about stream, 84 00:03:18,430 --> 00:03:22,010 stream can be very, very expensive 85 00:03:22,010 --> 00:03:24,270 because we have to have a lot of processing 86 00:03:24,270 --> 00:03:25,820 to do any transformation 87 00:03:25,820 --> 00:03:28,380 on a live flowing stream of data, 88 00:03:28,380 --> 00:03:31,180 especially if it's a large stream of data. 89 00:03:31,180 --> 00:03:32,920 So you always want to go back 90 00:03:32,920 --> 00:03:35,530 and say what is the business case? 91 00:03:35,530 --> 00:03:38,720 What is it exactly that we are trying to solve? 92 00:03:38,720 --> 00:03:39,740 And then let's come up 93 00:03:39,740 --> 00:03:42,690 with a recommendation for that solution 94 00:03:42,690 --> 00:03:47,170 that's going to save our company the most time and money 95 00:03:47,170 --> 00:03:49,530 and/or at least give those arguments 96 00:03:49,530 --> 00:03:51,480 so that we know where the value lies. 97 00:03:51,480 --> 00:03:52,370 Is it in money? 98 00:03:52,370 --> 00:03:53,660 Is it in time? 99 00:03:53,660 --> 00:03:55,720 What are we actually trying to do here? 100 00:03:55,720 --> 00:03:58,040 So just make sure that you have those conversations 101 00:03:58,040 --> 00:03:59,880 and you kind of think through what you're doing, 102 00:03:59,880 --> 00:04:02,513 before immediately recommending a stream solution. 103 00:04:04,860 --> 00:04:08,790 All right, core concepts of Stream Analytics. 104 00:04:08,790 --> 00:04:13,420 So Stream Analytics has basically 3 things happening. 105 00:04:13,420 --> 00:04:15,870 The first is our input. 106 00:04:15,870 --> 00:04:17,260 So you see our input. 107 00:04:17,260 --> 00:04:19,070 We have Event and IoT Hubs 108 00:04:19,070 --> 00:04:21,660 and then also you'll see Blob storage there. 
109 00:04:21,660 --> 00:04:24,950 Those are the 3 sources for Azure Stream Analytics 110 00:04:24,950 --> 00:04:27,040 and that's important to understand. 111 00:04:27,040 --> 00:04:28,270 So that's our first phase. 112 00:04:28,270 --> 00:04:29,103 It's fairly simple. 113 00:04:29,103 --> 00:04:33,430 We just connect our input streams into Stream Analytics. 114 00:04:33,430 --> 00:04:35,880 The next is where the magic happens. 115 00:04:35,880 --> 00:04:38,340 So this is our query section. 116 00:04:38,340 --> 00:04:40,060 So part 2, query, 117 00:04:40,060 --> 00:04:42,950 and in this stage we're going to do transformation. 118 00:04:44,060 --> 00:04:46,540 Transformation on a streaming source, 119 00:04:46,540 --> 00:04:49,030 by definition, is going to be less 120 00:04:49,030 --> 00:04:51,430 than what you can do with a batch source; 121 00:04:51,430 --> 00:04:53,490 simply because with a batch source, 122 00:04:53,490 --> 00:04:54,930 we can get all the data together. 123 00:04:54,930 --> 00:04:56,070 We can stop. 124 00:04:56,070 --> 00:04:58,540 We can pause while we run all of our transformations, 125 00:04:58,540 --> 00:05:01,070 and then we can spit out our end result. 126 00:05:01,070 --> 00:05:05,150 In stream, the more complicated the transformation is, 127 00:05:05,150 --> 00:05:06,990 the more processing power you need. 128 00:05:06,990 --> 00:05:10,160 Because again, there's more data coming in right behind it. 129 00:05:10,160 --> 00:05:13,260 So we don't have time to stop and run everything overnight. 130 00:05:13,260 --> 00:05:14,910 So with transformations, 131 00:05:14,910 --> 00:05:17,630 it's a little bit more difficult and challenging. 132 00:05:17,630 --> 00:05:19,580 And with transformations here, 133 00:05:19,580 --> 00:05:22,040 you're going to use your query step 134 00:05:22,040 --> 00:05:24,540 in order to do those transformations. 
135 00:05:24,540 --> 00:05:25,700 And just like the other lessons, 136 00:05:25,700 --> 00:05:27,530 we're going to dive more into this later, 137 00:05:27,530 --> 00:05:29,690 but again I want you to get the overarching picture 138 00:05:29,690 --> 00:05:30,940 of what's happening here. 139 00:05:31,850 --> 00:05:36,240 Lastly, output. So we send things into the system, 140 00:05:36,240 --> 00:05:38,830 do something to the data that's being streamed through 141 00:05:38,830 --> 00:05:41,230 and then we're going to send it somewhere to be stored 142 00:05:41,230 --> 00:05:43,200 and to save those results. 143 00:05:43,200 --> 00:05:45,080 So that could be to Power BI 144 00:05:45,080 --> 00:05:47,110 to get real-time dashboards, 145 00:05:47,110 --> 00:05:49,430 that could be dumping it into a Data Lake 146 00:05:49,430 --> 00:05:51,070 so that something else can be done to it 147 00:05:51,070 --> 00:05:54,010 or it can be pulled out and sent to a customer's account. 148 00:05:54,010 --> 00:05:55,300 Lots of things can happen there, 149 00:05:55,300 --> 00:05:58,160 but we have our output phase which is last. 150 00:05:58,160 --> 00:06:00,640 That's really it for Stream Analytics. 151 00:06:00,640 --> 00:06:04,100 It's not wildly complicated, at least in its architecture. 152 00:06:04,100 --> 00:06:07,130 And we'll dive further into those transformations 153 00:06:07,130 --> 00:06:09,840 which is kind of the main piece to Stream Analytics. 154 00:06:09,840 --> 00:06:11,990 You can kind of see what's happening later. 155 00:06:12,950 --> 00:06:14,580 The last concept we're going to talk about 156 00:06:14,580 --> 00:06:17,260 is the concept of windowing. 157 00:06:17,260 --> 00:06:19,330 So when we think about windowing, 158 00:06:19,330 --> 00:06:21,580 we are looking at, you can see here, 159 00:06:21,580 --> 00:06:23,660 outdoors, we're looking at a window. 
160 00:06:23,660 --> 00:06:26,020 When we think about that in terms of data, 161 00:06:26,020 --> 00:06:28,290 when do you look at your data 162 00:06:28,290 --> 00:06:31,180 and what is the output of the view that you want to see? 163 00:06:31,180 --> 00:06:33,640 So what I mean by that is 164 00:06:33,640 --> 00:06:35,580 am I interested in taking a look 165 00:06:35,580 --> 00:06:38,020 at every single piece of data? 166 00:06:38,020 --> 00:06:39,620 I can't get very much transformation. 167 00:06:39,620 --> 00:06:42,570 I can't get averages off of every single piece of data. 168 00:06:42,570 --> 00:06:45,970 Do I want to look at a window of only 10 events? 169 00:06:45,970 --> 00:06:49,950 Do I want to look at a window of 10 minutes? 170 00:06:49,950 --> 00:06:53,400 How do I want to handle data that arrives late? 171 00:06:53,400 --> 00:06:56,500 These are concepts of windowing. 172 00:06:56,500 --> 00:06:59,820 So it's looking at the data as it comes into your system 173 00:06:59,820 --> 00:07:02,000 and figuring out the best way to kind of arrange 174 00:07:02,000 --> 00:07:05,210 or clump that data so that we can get transformations 175 00:07:05,210 --> 00:07:07,130 in the views that we want to see. 176 00:07:07,130 --> 00:07:09,440 The averages, the whatever we want to look at, 177 00:07:09,440 --> 00:07:10,950 so that we can get the data that we need 178 00:07:10,950 --> 00:07:13,010 to make our business decision. 179 00:07:13,010 --> 00:07:14,810 So we'll talk more about windowing, 180 00:07:14,810 --> 00:07:17,720 but it's a very important concept to think about 181 00:07:17,720 --> 00:07:20,453 when looking at anything that involves streaming. 182 00:07:23,510 --> 00:07:24,950 All right, so that is it, 183 00:07:24,950 --> 00:07:26,690 and I actually want to pause here. 
184 00:07:26,690 --> 00:07:29,450 So you'll probably notice the same as I do, 185 00:07:29,450 --> 00:07:32,160 there's this very odd looking picture of a woman 186 00:07:32,160 --> 00:07:35,393 in a green face mask and a Macintosh computer. 187 00:07:36,751 --> 00:07:38,060 I'm not really sure what's happening here, 188 00:07:38,060 --> 00:07:40,570 but I was searching for "think," 189 00:07:40,570 --> 00:07:41,880 and this came up, and 190 00:07:41,880 --> 00:07:45,110 yeah, it really made me stop and think for a few minutes. 191 00:07:45,110 --> 00:07:47,623 So there you go, random picture for the day. 192 00:07:48,700 --> 00:07:50,880 But looking back at our review here, 193 00:07:50,880 --> 00:07:52,950 let's talk about Stream Analytics. 194 00:07:52,950 --> 00:07:55,220 We talked about the introduction 195 00:07:55,220 --> 00:07:57,520 for every single one of these services. 196 00:07:57,520 --> 00:07:59,480 It's extremely important 197 00:07:59,480 --> 00:08:02,810 that you think about what the service is used for. 198 00:08:02,810 --> 00:08:04,190 You can't figure out how to put it 199 00:08:04,190 --> 00:08:06,640 into a data engineering architecture 200 00:08:06,640 --> 00:08:08,390 without understanding what it is 201 00:08:08,390 --> 00:08:10,173 and what it does at its core. 202 00:08:11,180 --> 00:08:13,120 Next, how do all the pieces 203 00:08:13,120 --> 00:08:15,670 of Stream Analytics architecture fit together? 204 00:08:15,670 --> 00:08:16,800 This one was pretty easy. 205 00:08:16,800 --> 00:08:21,310 We looked at inputs and queries and outputs. 206 00:08:21,310 --> 00:08:23,740 Those are the 3 main components of streaming. 207 00:08:23,740 --> 00:08:26,310 So make sure you keep those 3 concepts in mind. 208 00:08:26,310 --> 00:08:28,170 Again, we'll dive further into it. 209 00:08:28,170 --> 00:08:29,220 And then you also want to think 210 00:08:29,220 --> 00:08:32,390 about windowing and what is a window. 
211 00:08:32,390 --> 00:08:33,630 And then we'll use that later 212 00:08:33,630 --> 00:08:35,850 to talk about the various types of windows 213 00:08:35,850 --> 00:08:38,320 and how that can be used down the road. 214 00:08:38,320 --> 00:08:40,430 All right, that's actually it for this lesson. 215 00:08:40,430 --> 00:08:41,570 In the next lesson, 216 00:08:41,570 --> 00:08:44,250 we will talk about Databricks. 217 00:08:44,250 --> 00:08:46,060 So we'll be diving in and talking about that
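The windowing idea from this lesson (a window of 10 events versus a window of 10 minutes) can be sketched in a few lines. This is a minimal illustration in plain Python, not Stream Analytics query syntax. The function names, the event shape (a timestamp in seconds paired with a value), and the sample readings are all assumptions made up for the example, and it works on an in-memory list rather than a live stream.

```python
from collections import defaultdict


def time_window_averages(events, window_seconds):
    """Average values per fixed, non-overlapping time window.

    events: iterable of (timestamp_seconds, value) pairs (hypothetical shape).
    Returns {window_start_seconds: average_value}.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Floor the timestamp to the start of the window it falls in.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


def count_window_averages(values, window_size):
    """Average values per window of `window_size` consecutive events.

    The final window may hold fewer events if the input does not divide evenly.
    """
    averages = []
    for i in range(0, len(values), window_size):
        chunk = values[i:i + window_size]
        averages.append(sum(chunk) / len(chunk))
    return averages


if __name__ == "__main__":
    # Made-up sensor readings: (seconds since start, reading).
    readings = [(0, 10.0), (120, 20.0), (599, 30.0), (600, 40.0), (1199, 60.0)]
    print(time_window_averages(readings, window_seconds=600))  # 10-minute windows
    print(count_window_averages([v for _, v in readings], window_size=2))
```

The time-based version clumps events by when they arrived, while the count-based version clumps them by how many have arrived. Neither handles late-arriving data, which is one of the questions raised in the lesson.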