1 00:00:00,060 --> 00:00:05,760 ‫What is going on is my name is Hussein, and today I'm going to talk about a very interesting read phenomenon 2 00:00:05,760 --> 00:00:17,010 ‫database's called Phantom Rietz and this is something a lot of people overlook and it can, if misused, 3 00:00:17,010 --> 00:00:18,470 ‫lead to a disaster. 4 00:00:18,810 --> 00:00:20,880 ‫So let's just explain, what is that? 5 00:00:20,880 --> 00:00:25,740 ‫Phantom Reed is one of the three phenomenons, which is dirty reads and unrepeatable reeds. 6 00:00:26,070 --> 00:00:31,040 ‫And the Phantom Reed happens in concurrence kind of transaction. 7 00:00:31,050 --> 00:00:36,300 ‫So when another transaction does something and you are in another completely concurrent transaction 8 00:00:36,300 --> 00:00:41,420 ‫running at the same time and you're seeing unpleasant results, let's put it this way. 9 00:00:41,790 --> 00:00:50,940 ‫So Phantom Reed explicitly is when another transaction inserts a new roll and all of a sudden when you 10 00:00:50,940 --> 00:00:58,320 ‫do a query in your own isolated transaction, you start seeing that role despite you should really see 11 00:00:58,320 --> 00:00:58,830 ‫that roll. 12 00:00:59,280 --> 00:01:00,690 ‫Like, let's give an example. 13 00:01:00,720 --> 00:01:01,920 ‫So this is all clear. 14 00:01:02,280 --> 00:01:04,650 ‫So I'm here and this is the transaction number one. 15 00:01:04,650 --> 00:01:09,030 ‫This transaction number two, there is no there are no any transactions. 16 00:01:09,040 --> 00:01:10,170 ‫That is just terminals. 17 00:01:10,930 --> 00:01:14,850 ‫But if I do a select store from sales, I have a sales table here. 18 00:01:15,060 --> 00:01:16,980 ‫You can see that I have five records here. 19 00:01:16,990 --> 00:01:17,270 ‫Right. 20 00:01:17,460 --> 00:01:19,410 ‫This is the dates and all that stuff. 21 00:01:19,410 --> 00:01:19,680 ‫Right. 22 00:01:20,040 --> 00:01:29,040 ‫So now if I do a begin transaction here, I started a transaction and I do a select store from sales. 23 00:01:29,970 --> 00:01:35,980 ‫You can see that I see these five sales report and I'm about let's say I started the transaction due 24 00:01:36,030 --> 00:01:43,500 ‫to produce a report about the all the sales that I made between these dates and let's say I want to 25 00:01:43,500 --> 00:01:44,250 ‫do the some. 26 00:01:44,250 --> 00:01:53,480 ‫Rila says do OK, select videos, some of the price from sales, and let's do a group. 27 00:01:53,500 --> 00:01:59,970 ‫I have to do a group by P e and all of a sudden you have product number one sold forty dollars. 28 00:01:59,970 --> 00:02:00,270 ‫Right. 29 00:02:00,840 --> 00:02:06,110 ‫So now if I go I am still in a transaction, go to a completely different transaction and I'm going 30 00:02:06,110 --> 00:02:07,890 ‫to insert a new sale. 31 00:02:07,890 --> 00:02:11,010 ‫Someone just made a sale and it happened to be between these dates. 32 00:02:11,010 --> 00:02:11,310 ‫Right. 33 00:02:13,590 --> 00:02:20,910 ‫Say bid price dates, values enter let's just say product number one. 34 00:02:21,480 --> 00:02:29,880 ‫It made up fifteen dollars and it was made on February seven, twenty twenty one. 35 00:02:31,410 --> 00:02:33,750 ‫And now I was in a transaction. 36 00:02:33,750 --> 00:02:36,110 ‫So this automatically started transaction committed. 37 00:02:36,750 --> 00:02:39,820 ‫Did that statement and then committed immediately. 38 00:02:39,840 --> 00:02:44,040 ‫Now if I go back to my isolated transactions or so they say. 39 00:02:44,040 --> 00:02:44,460 ‫Right. 40 00:02:44,940 --> 00:02:48,750 ‫And I do a query, all of a sudden the sum has changed. 41 00:02:48,990 --> 00:02:54,760 ‫Remember, you're producing a report and a report has to be consistent based on the start of the transaction. 42 00:02:54,930 --> 00:03:00,900 ‫So if people starting making sales while you're producing a report, you're going to get funky results 43 00:03:01,050 --> 00:03:04,230 ‫like let's say you produce, you're listing all the products. 44 00:03:04,230 --> 00:03:06,150 ‫And then you were doing a query to do the some. 45 00:03:06,460 --> 00:03:08,120 ‫You're going to get an inconsistent result. 46 00:03:08,130 --> 00:03:09,120 ‫Of course you're going to get. 47 00:03:10,350 --> 00:03:16,620 ‫Now, if I do select start from sales and even if I do a range query, let's say between this and this, 48 00:03:16,980 --> 00:03:20,340 ‫you're going to get this extra roll that had been inserted. 49 00:03:20,670 --> 00:03:23,850 ‫This most of the time is undesirable. 50 00:03:24,210 --> 00:03:25,350 ‫So how do we fix this? 51 00:03:25,680 --> 00:03:29,970 ‫Those guys provide us with ways to fix this and most databases actually do it. 52 00:03:30,180 --> 00:03:36,230 ‫And that is called isolation, which I talked about in my introduction to database engineering course. 53 00:03:36,660 --> 00:03:45,300 ‫So now the serialisation isolation level allows us allows the databases to serialize transactions so 54 00:03:45,300 --> 00:03:52,470 ‫that nobody if a transaction has dependency like this one, this has a dependency on this insert, it 55 00:03:52,470 --> 00:03:53,340 ‫will affect it. 56 00:03:53,530 --> 00:04:01,830 ‫If if Bosco's detect or MySQL or any databases that supports serialisation, detects that there is a 57 00:04:01,830 --> 00:04:05,610 ‫dependency, it will try to isolate those changes. 58 00:04:05,730 --> 00:04:14,040 ‫So let's go ahead and be in a let's do the same thing, but in a not isolated and let's do a roll back 59 00:04:14,040 --> 00:04:17,670 ‫first and then do the serialization. 60 00:04:18,000 --> 00:04:29,070 ‫So if I do begin transaction isolation level serializable and now I am in a transaction that is serializable, 61 00:04:29,220 --> 00:04:36,270 ‫that means anything that I read must not depend on other transactions that is currently running. 62 00:04:36,420 --> 00:04:38,790 ‫Otherwise things can go bad. 63 00:04:39,070 --> 00:04:42,780 ‫So if I do select Star from sales, you can do it. 64 00:04:42,780 --> 00:04:44,310 ‫You can see that I have six records. 65 00:04:44,490 --> 00:04:48,530 ‫Let's go ahead and insert something in a completely different row. 66 00:04:48,540 --> 00:04:50,870 ‫Here are the same thing. 67 00:04:51,180 --> 00:04:51,710 ‫Sure. 68 00:04:52,050 --> 00:05:01,050 ‫Now if I go back and a query, you see that I've been isolated from that post, detected that, hey, 69 00:05:01,800 --> 00:05:03,720 ‫something happen on the other end. 70 00:05:04,020 --> 00:05:07,980 ‫But don't worry, we have isolated you from that change. 71 00:05:08,140 --> 00:05:12,020 ‫Okay, so now you can safely commit your change and you're good, right? 72 00:05:12,060 --> 00:05:13,320 ‫You can produce a report. 73 00:05:13,320 --> 00:05:15,930 ‫You can you can do the same and you will be isolated. 74 00:05:15,930 --> 00:05:19,870 ‫You will not see that change that happened on the on the other transactions. 75 00:05:20,850 --> 00:05:22,410 ‫So finally. 76 00:05:23,410 --> 00:05:30,670 ‫This is all true in all databases except postscripts, actually, postcrisis is special when it comes 77 00:05:30,670 --> 00:05:31,000 ‫to this. 78 00:05:31,540 --> 00:05:39,310 ‫Postscripts prevents phantom reads even in other isolation levels, such as repeatable read isolation 79 00:05:39,310 --> 00:05:39,640 ‫level. 80 00:05:40,480 --> 00:05:46,000 ‫So there are beatable readers allow you to query, execute the same query, yet get the same result 81 00:05:46,070 --> 00:05:51,550 ‫a little bit different than phantom rate that the reader all of a sudden you get a new raw read as just 82 00:05:51,630 --> 00:05:52,450 ‫reading a value. 83 00:05:52,450 --> 00:05:58,720 ‫And that value does not change no matter how much how many times you cooperate in the same query, in 84 00:05:58,720 --> 00:05:59,620 ‫the same transaction. 85 00:05:59,980 --> 00:06:05,530 ‫So now I begin my beautiful transaction here, which is not repeatable. 86 00:06:05,650 --> 00:06:06,790 ‫There are seven rows now. 87 00:06:06,800 --> 00:06:07,270 ‫Why? 88 00:06:07,300 --> 00:06:09,850 ‫Because now I started a new transaction. 89 00:06:09,850 --> 00:06:11,990 ‫I picked up the latest changes. 90 00:06:12,010 --> 00:06:18,320 ‫It's almost like it's it is implemented as versioning because NVCA multi version concurrency control. 91 00:06:18,670 --> 00:06:22,960 ‫So when you start a transaction, you create a version that's a version of these rows. 92 00:06:23,050 --> 00:06:28,450 ‫And then as you start editing other people, committing the database increases its version. 93 00:06:29,080 --> 00:06:37,180 ‫So now even a repeatable read, if you do this thing, that phantom thing, you don't see that phantom 94 00:06:37,180 --> 00:06:44,800 ‫read that even in a range queries whether this is an unbounded query like what I'm doing right now or 95 00:06:44,800 --> 00:06:49,390 ‫if it's like you're doing between, hey, give me the sales report between this date and this date that's 96 00:06:49,390 --> 00:06:51,450 ‫also can be affected as a phantom read. 97 00:06:51,730 --> 00:06:59,230 ‫But if you are in a repeatable read isolation level, you do not see that even in because that is very 98 00:06:59,230 --> 00:07:01,570 ‫special only to Bosco's MySQL. 99 00:07:01,720 --> 00:07:05,740 ‫If you do a repeatable read, you don't get rid of Phantom Rate. 100 00:07:06,670 --> 00:07:08,090 ‫That's very, very critical. 101 00:07:08,440 --> 00:07:16,480 ‫The only way to get rid of Phantom Reads and Oracle, MySQL, SQL Server, as far as I know, you have 102 00:07:16,480 --> 00:07:23,980 ‫to do serializable isolation level or you do some sort of a pessimistic locking was like lock until 103 00:07:23,980 --> 00:07:28,600 ‫someone if someone is changing the stiv, don't allow me to read it. 104 00:07:28,870 --> 00:07:29,620 ‫That's another way. 105 00:07:29,680 --> 00:07:32,470 ‫All right, guys, very quickly to talk about Phantom Reads. 106 00:07:32,650 --> 00:07:34,180 ‫I'm going to see you in the next one. 107 00:07:34,190 --> 00:07:35,770 ‫You guys stay the by.