1 00:00:00,280 --> 00:00:01,410 Welcome back. 2 00:00:01,410 --> 00:00:02,770 Hopefully, you've taken a moment 3 00:00:02,770 --> 00:00:04,460 and you have gone to the bathroom 4 00:00:04,460 --> 00:00:07,110 and gotten yourself a snack, and you are all settled in 5 00:00:07,110 --> 00:00:09,480 and ready to start Part 2. 6 00:00:09,480 --> 00:00:10,450 So in this part, 7 00:00:10,450 --> 00:00:13,370 we're going to dive in and I'm actually going to show you 8 00:00:13,370 --> 00:00:15,690 an example in the Azure portal. 9 00:00:15,690 --> 00:00:18,350 One thing to think about as we go through, 10 00:00:18,350 --> 00:00:20,820 don't get too bogged down in the details. 11 00:00:20,820 --> 00:00:24,590 Really focus on that process of what we're doing 12 00:00:24,590 --> 00:00:26,500 with incremental data loads, 13 00:00:26,500 --> 00:00:29,640 which is, looking at our watermark, 14 00:00:29,640 --> 00:00:31,310 deciding from there 15 00:00:31,310 --> 00:00:33,390 how we're going to figure out what's new, 16 00:00:33,390 --> 00:00:37,820 copying the data, and then finally updating our watermark. 17 00:00:37,820 --> 00:00:39,890 And in this example, 18 00:00:39,890 --> 00:00:42,200 all of that's going to be done in the portal. 19 00:00:42,200 --> 00:00:46,193 So you're not actually going to see the watermark selection 20 00:00:46,193 --> 00:00:50,420 other than when we dynamically choose 21 00:00:50,420 --> 00:00:52,320 what our file structure is going to be. 22 00:00:52,320 --> 00:00:53,153 That's actually where 23 00:00:53,153 --> 00:00:57,380 all that watermark discussion would take place. 24 00:00:57,380 --> 00:00:58,350 So, hopefully that helps. 25 00:00:58,350 --> 00:01:01,450 And with that, let's go ahead and dive in to Part 2 26 00:01:01,450 --> 00:01:03,593 and see this example in the portal. 27 00:01:05,000 --> 00:01:09,030 So let's go ahead and jump over into the Azure portal. 
28 00:01:09,030 --> 00:01:11,290 So, here we find ourselves in the Azure portal 29 00:01:11,290 --> 00:01:13,100 and I am going to show you 30 00:01:13,100 --> 00:01:15,770 how to do this incremental data load. 31 00:01:15,770 --> 00:01:19,410 So, we are in Data Factory and in the Data Factory studio. 32 00:01:19,410 --> 00:01:21,690 So, let's start off by clicking on Ingest 33 00:01:21,690 --> 00:01:24,550 to copy data at scale. 34 00:01:24,550 --> 00:01:26,750 Now, since we want to do an incremental data load, 35 00:01:26,750 --> 00:01:28,770 we're going to go ahead and just select 36 00:01:28,770 --> 00:01:30,853 a tumbling window to get us started. 37 00:01:32,050 --> 00:01:33,393 Click on Next. 38 00:01:34,300 --> 00:01:37,440 And I'm going to show you how to create this. 39 00:01:37,440 --> 00:01:40,660 So, let's start off by picking our source data type. 40 00:01:40,660 --> 00:01:42,780 And I'm just going to pick a Blob storage account 41 00:01:42,780 --> 00:01:44,410 that I had set up. 42 00:01:44,410 --> 00:01:46,250 And then, you can see here under Options, 43 00:01:46,250 --> 00:01:48,680 we have File Loading Behavior. 44 00:01:48,680 --> 00:01:52,330 So, I can load all files, which is that full restoration. 45 00:01:52,330 --> 00:01:55,990 Or I can come in here and I can choose an incremental load, 46 00:01:55,990 --> 00:01:56,823 which is what we're going to do. 47 00:01:56,823 --> 00:01:59,200 We're going to pick this last modified date 48 00:01:59,200 --> 00:02:00,380 incremental data load 49 00:02:02,040 --> 00:02:03,510 and go ahead and click on, 50 00:02:03,510 --> 00:02:05,820 oh, I've got to pick an output folder here. 51 00:02:05,820 --> 00:02:07,470 So, let me go ahead and just browse 52 00:02:07,470 --> 00:02:09,930 and pick a folder for us. 53 00:02:09,930 --> 00:02:11,480 Again, it doesn't matter too much right now 54 00:02:11,480 --> 00:02:13,010 because I'm just more explaining 55 00:02:13,010 --> 00:02:15,040 how to do incremental loads.
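To make the "last modified date" loading behavior a bit more concrete, here is a minimal Python sketch of the idea: keep only the files whose last-modified time falls inside the current window. This is not Data Factory's actual implementation or API. The `BlobInfo` class, the `select_modified_in_window` function, the file names, and the half-open window `[start, end)` are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BlobInfo:
    """Hypothetical stand-in for one entry in a blob listing."""
    name: str
    last_modified: datetime


def select_modified_in_window(blobs, window_start, window_end):
    """Return blobs with window_start <= last_modified < window_end.

    The half-open interval is an assumption, chosen so that back-to-back
    windows never pick up the same file twice.
    """
    return [b for b in blobs if window_start <= b.last_modified < window_end]


# Made-up sample listing: one file before the window, one inside, one after.
blobs = [
    BlobInfo("sales_aug.csv", datetime(2021, 8, 31, 23, 0, tzinfo=timezone.utc)),
    BlobInfo("sales_sep_01.csv", datetime(2021, 9, 1, 6, 30, tzinfo=timezone.utc)),
    BlobInfo("sales_sep_02.csv", datetime(2021, 9, 2, 6, 30, tzinfo=timezone.utc)),
]

window_start = datetime(2021, 9, 1, tzinfo=timezone.utc)
window_end = datetime(2021, 9, 2, tzinfo=timezone.utc)

picked = select_modified_in_window(blobs, window_start, window_end)
print([b.name for b in picked])  # only the file modified inside the window
```

A "load all files" run would simply skip the filter and copy everything in the listing.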
56 00:02:15,040 --> 00:02:17,760 And then, we also want to choose Binary Copy. 57 00:02:17,760 --> 00:02:20,270 And again, this isn't as important basically. 58 00:02:20,270 --> 00:02:24,120 We're going to skip schemas at this stage because again, 59 00:02:24,120 --> 00:02:26,470 I just want to kind of walk through the examples 60 00:02:26,470 --> 00:02:28,430 of how to do incremental data load. 61 00:02:28,430 --> 00:02:30,103 So, go ahead and click on Next. 62 00:02:31,770 --> 00:02:35,630 And now, we are ready for our destination datastore. 63 00:02:35,630 --> 00:02:37,770 So, we've defined the source, 64 00:02:37,770 --> 00:02:40,990 and now we're going to define where it's going to go. 65 00:02:40,990 --> 00:02:41,910 So, we're going to go ahead 66 00:02:41,910 --> 00:02:45,050 and just pick a blob storage account again. 67 00:02:45,050 --> 00:02:48,490 And now we can choose a folder path. 68 00:02:48,490 --> 00:02:50,550 So, this is how we can actually go through 69 00:02:50,550 --> 00:02:54,920 and create an incremental data load based upon date. 70 00:02:55,840 --> 00:03:00,840 So, I could create in my blob storage a new folder 71 00:03:01,470 --> 00:03:04,170 for year and month. 72 00:03:04,170 --> 00:03:09,170 And every month, a new file, a new database, gets loaded in 73 00:03:09,180 --> 00:03:14,180 that's going to be under that year/month format. 74 00:03:14,620 --> 00:03:16,890 And so, I would just simply choose that 75 00:03:16,890 --> 00:03:18,670 by again going to browse, 76 00:03:18,670 --> 00:03:21,440 picking the folder where it's going to go to, 77 00:03:21,440 --> 00:03:23,180 and then I would add in 78 00:03:26,240 --> 00:03:27,073 year 79 00:03:30,140 --> 00:03:31,600 and then month. 80 00:03:31,600 --> 00:03:33,370 Let's just do that. 81 00:03:33,370 --> 00:03:35,200 If I can type, there we go. 82 00:03:35,200 --> 00:03:36,160 So, what it's going to do, 83 00:03:36,160 --> 00:03:38,840 you can see it gives us this year and month format. 
84 00:03:38,840 --> 00:03:41,730 So, what this is telling me is that in my blob storage, 85 00:03:41,730 --> 00:03:45,400 it's going to go out and dynamically look for a year 86 00:03:45,400 --> 00:03:47,930 in the format of 4 digits. 87 00:03:47,930 --> 00:03:51,720 So, 2021, and then a 2-digit month. 88 00:03:51,720 --> 00:03:54,730 So, 2021/09 as an example. 89 00:03:54,730 --> 00:03:56,950 So, it's going to pick the folder that I chose, 90 00:03:56,950 --> 00:04:01,950 and it's going to look to put the new file into that. 91 00:04:02,230 --> 00:04:05,270 So in this way, we can do an incremental data load. 92 00:04:05,270 --> 00:04:08,530 It's going to look for the last folder that was updated, 93 00:04:08,530 --> 00:04:10,480 compare that to the latest folder 94 00:04:10,480 --> 00:04:11,313 that's in the blob storage, 95 00:04:11,313 --> 00:04:13,513 and it's going to incrementally load 96 00:04:13,513 --> 00:04:15,203 all the way up to that point. 97 00:04:18,400 --> 00:04:19,530 If I click on Next, 98 00:04:19,530 --> 00:04:21,960 I can then go through and define my settings, 99 00:04:21,960 --> 00:04:25,313 and then I'm ready to review and submit. 100 00:04:26,620 --> 00:04:28,440 So, finish, and you can see here, 101 00:04:28,440 --> 00:04:33,220 it's going to copy from one Azure blob storage 102 00:04:33,220 --> 00:04:35,330 to another Azure blob storage 103 00:04:35,330 --> 00:04:37,493 using that incremental data load. 104 00:04:38,820 --> 00:04:41,120 Alright, so that is another quick example 105 00:04:41,120 --> 00:04:44,550 of how we could use incremental data loads. 106 00:04:44,550 --> 00:04:48,030 So in review, we need to understand first, 107 00:04:48,030 --> 00:04:50,380 what is an incremental data load? 108 00:04:50,380 --> 00:04:53,640 It is a way of uploading only the new data 109 00:04:53,640 --> 00:04:57,970 that exists in our folders or files so that we can 110 00:04:57,970 --> 00:05:01,410 update our database without restoring the entire thing.
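The year/month folder layout from the portal walkthrough (a 4-digit year, then a 2-digit month, such as 2021/09) can be sketched in a few lines of Python. This is only an illustration of the idea of comparing the last folder loaded with the latest folder available. The function names `month_folder` and `folders_to_load` are hypothetical, and this is not how Data Factory resolves the path internally.

```python
from datetime import date


def month_folder(d):
    """Format a date as a year/month folder path, e.g. 2021/09."""
    return d.strftime("%Y/%m")


def folders_to_load(last_loaded, latest):
    """List the month folders after last_loaded, up to and including latest.

    Assumption for this sketch: one folder per calendar month, and the
    month of last_loaded has already been copied.
    """
    folders = []
    year, month = last_loaded.year, last_loaded.month
    while (year, month) < (latest.year, latest.month):
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
        folders.append(month_folder(date(year, month, 1)))
    return folders


print(month_folder(date(2021, 9, 15)))
print(folders_to_load(date(2021, 7, 1), date(2021, 9, 1)))
print(folders_to_load(date(2021, 11, 1), date(2022, 1, 1)))
```

If the last loaded month and the latest month are the same, the list comes back empty and there is nothing new to copy.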
111 00:05:01,410 --> 00:05:03,010 Next, what service is going to be used 112 00:05:03,010 --> 00:05:05,480 for incremental data loads in Azure? 113 00:05:05,480 --> 00:05:08,150 Most commonly, that's probably going to be Data Factory. 114 00:05:08,150 --> 00:05:09,380 There are some other ways you can do that, 115 00:05:09,380 --> 00:05:12,240 but again, I would look at Data Factory for that. 116 00:05:12,240 --> 00:05:14,720 And then finally, what is the process? 117 00:05:14,720 --> 00:05:17,820 So, you need to understand the importance of watermarks, 118 00:05:17,820 --> 00:05:20,640 be that a date, or be that a customer ID, 119 00:05:20,640 --> 00:05:23,210 and how we can use watermarks 120 00:05:23,210 --> 00:05:25,210 to do an incremental data load. 121 00:05:25,210 --> 00:05:26,540 Essentially the process, 122 00:05:26,540 --> 00:05:28,130 whether I'm going through the Azure portal 123 00:05:28,130 --> 00:05:30,230 or whether I'm doing it manually, 124 00:05:30,230 --> 00:05:34,000 is going to be looking at our new watermark, 125 00:05:34,000 --> 00:05:36,730 looking at our last updated watermark, 126 00:05:36,730 --> 00:05:40,280 comparing the difference, and then incrementally loading 127 00:05:40,280 --> 00:05:43,150 all of the data between those 2 watermarks. 128 00:05:43,150 --> 00:05:45,500 And then, we finish out by updating our watermark 129 00:05:45,500 --> 00:05:49,100 so that we have a new point of reference. 130 00:05:49,100 --> 00:05:53,610 That is the process for doing incremental data loads. 131 00:05:53,610 --> 00:05:56,430 Alright, so that is it for this lesson. 132 00:05:56,430 --> 00:05:58,420 We will talk more about watermarks 133 00:05:58,420 --> 00:06:01,160 in the next section as we talk about streaming.
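The watermark process recapped in this lesson (read the last watermark, find the new one, load what lies between them, then save the new watermark) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a Data Factory pipeline: the rows are in-memory dictionaries, the watermark column is a made-up `customer_id`, and `incremental_load` and `watermark_store` are hypothetical names.

```python
def incremental_load(source_rows, target_rows, watermark_store, key="customer_id"):
    """Copy rows newer than the stored watermark, then advance the watermark.

    Assumptions for this sketch: the watermark column only ever increases,
    and watermark_store is a plain dict standing in for a watermark table.
    """
    # 1. Look at the last updated watermark.
    last = watermark_store["last_watermark"]
    # 2. Look at the new watermark: the highest key currently in the source.
    new = max([last] + [row[key] for row in source_rows])
    # 3. Incrementally load everything between the two watermarks.
    delta = [row for row in source_rows if last < row[key] <= new]
    target_rows.extend(delta)
    # 4. Update the watermark so the next run has a new point of reference.
    watermark_store["last_watermark"] = new
    return delta


# Made-up data: customers 1 to 3 were loaded earlier, 4 and 5 are new.
source = [{"customer_id": i, "name": f"customer-{i}"} for i in range(1, 6)]
target = source[:3].copy()
store = {"last_watermark": 3}

first_run = incremental_load(source, target, store)
second_run = incremental_load(source, target, store)

print([row["customer_id"] for row in first_run])
print(second_run)
print(store)
```

Running it twice in a row shows why the final update matters: the second run finds nothing between the two watermarks, so nothing is copied again.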
134 00:06:01,160 --> 00:06:01,993 But for now, 135 00:06:01,993 --> 00:06:04,490 let's go ahead and leave the concept of watermarks 136 00:06:04,490 --> 00:06:07,640 and move a little further down our Azure Data Factory 137 00:06:07,640 --> 00:06:09,403 and batch processing journey.