1 00:00:01,390 --> 00:00:02,490 Welcome back. 2 00:00:02,500 --> 00:00:07,280 Let's talk about the types of databases that exist now. 3 00:00:07,310 --> 00:00:08,510 Here's the thing. 4 00:00:08,510 --> 00:00:12,350 The world of databases is extremely complicated. 5 00:00:12,350 --> 00:00:18,310 There are so many databases out there, so many new innovations, different databases for different things. 6 00:00:18,320 --> 00:00:20,150 It's impossible to know all of them. 7 00:00:20,210 --> 00:00:27,920 So we're just going to do a brief overview of what exists and why we have so many databases. 8 00:00:27,920 --> 00:00:29,420 So let's start off with the first one. 9 00:00:29,480 --> 00:00:31,940 Why do we have so many databases? 10 00:00:31,940 --> 00:00:36,530 I mean, look at this little diagram that I found. That is a lot. 11 00:00:36,530 --> 00:00:43,840 Well, because we have different needs with data, different databases are good at different things. 12 00:00:43,880 --> 00:00:48,540 So there are many different databases because we need them for different solutions. 13 00:00:48,770 --> 00:00:54,120 But the first thing we need to talk about when talking about databases is this idea. 14 00:00:54,350 --> 00:00:57,540 There are three main categories of databases. 15 00:00:57,590 --> 00:01:05,170 Now remember, databases are just computers. They're computers with a hard drive, essentially. Just like your 16 00:01:05,170 --> 00:01:11,080 computer is able to store files and have files available anytime, 17 00:01:11,080 --> 00:01:15,350 that's what a database is. Every time you turn off your computer and turn it back on, 18 00:01:15,430 --> 00:01:23,260 you expect your files to be there. And a database is that: you expect that your data persists. 19 00:01:23,260 --> 00:01:28,480 That is, the data that you save, whether we turn off the computer or turn it back on, 20 00:01:28,480 --> 00:01:29,760 it's still going to be there. 21 00:01:29,830 --> 00:01:31,200 And then we can access it.
22 00:01:31,210 --> 00:01:32,780 We can modify it. 23 00:01:32,800 --> 00:01:37,950 That's the role of a database, and we have three main types. 24 00:01:37,950 --> 00:01:45,240 One was the original type of database, which was the relational database. A relational database is a database 25 00:01:45,450 --> 00:01:53,180 like PostgreSQL or MySQL that allows you to use SQL to make transactions. 26 00:01:53,180 --> 00:02:00,540 That is, to write to the database and to read from the database using the SQL language. 27 00:02:00,890 --> 00:02:08,510 Now, relational databases were all great until around the 2000s. Initially we just had one database, 28 00:02:08,540 --> 00:02:12,050 one master database that held all the information that we needed. 29 00:02:12,500 --> 00:02:18,380 But starting in the 2000s we got more and more data, we collected more and more data, and all of a sudden 30 00:02:18,470 --> 00:02:22,340 all that information could not be stored on just one database. 31 00:02:22,370 --> 00:02:27,830 So we had an issue. We now needed two databases to hold that information. 32 00:02:27,830 --> 00:02:35,510 And although I'm simplifying things, the idea was that now, because we needed distributed databases, that 33 00:02:35,510 --> 00:02:42,980 is, we needed more than one database to store all that data, relational databases didn't have the necessary 34 00:02:42,980 --> 00:02:48,200 tools to have this idea of multiple databases working together. 35 00:02:48,200 --> 00:02:56,780 So NoSQL databases were born out of that. Databases like MongoDB became popular because for the 36 00:02:56,780 --> 00:03:00,640 first time they allowed us to do distributed databases. 37 00:03:00,680 --> 00:03:09,920 That is, we can have different machines, 10 or 15 machines, all working together as one database. 38 00:03:09,920 --> 00:03:15,230 Finally, more recently we have this idea of a NewSQL database. 39 00:03:15,230 --> 00:03:16,480 What does that mean?
40 00:03:16,490 --> 00:03:23,560 Well, NewSQL databases are still fairly new, but they try to get the best of both worlds. 41 00:03:23,600 --> 00:03:30,320 You see, relational databases are really, really good for something called ACID transactions. 42 00:03:30,350 --> 00:03:36,830 They have some really nice guarantees when we read and write data to make sure that it's always accurate. 43 00:03:36,920 --> 00:03:45,170 NoSQL databases were a little bit more disorganized, but they allow us to store information on 44 00:03:45,170 --> 00:03:49,230 distributed databases. That is, as our data gets bigger and bigger, 45 00:03:49,370 --> 00:03:57,050 we can have a NoSQL database on different machines so that it scales up with our data. NewSQL 46 00:03:57,040 --> 00:04:05,850 databases like VoltDB or CockroachDB are trying to have the distributed nature of NoSQL 47 00:04:05,870 --> 00:04:12,320 databases, so we can scale up our data, but also have some of the guarantees of relational databases. 48 00:04:12,320 --> 00:04:17,450 Now, these NewSQL databases are still in the works, but definitely keep an eye on them. 49 00:04:17,450 --> 00:04:24,120 Another way to talk about databases is using databases for their own specific needs. 50 00:04:24,170 --> 00:04:31,440 For example, there are some data stores or databases that are specifically used for searches. 51 00:04:31,550 --> 00:04:37,400 For example, Elasticsearch or Solr are used as search databases. 52 00:04:37,400 --> 00:04:41,610 So we put data in there and it allows us to search for things really fast. 53 00:04:41,690 --> 00:04:50,030 Then we have things like computational databases, something like Apache Spark, that allow us to use the 54 00:04:50,030 --> 00:04:55,760 data that we have in the database and make computations and calculations on it, and we'll get to Apache 55 00:04:55,760 --> 00:04:57,080 Spark shortly.
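To make the "nice guarantees" of ACID transactions a little more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the names and amounts are all made up for illustration; they are not from the video. The point is only that a transaction either applies all of its writes or none of them.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success.

    Hypothetical helper for illustration only.
    """
    try:
        # `with conn` commits if the block finishes, and rolls back
        # every write in the block if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

# In-memory database with two example accounts (assumed data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, neither account keeps the partial update: both writes are undone together, which is the "atomic" part of ACID. Distributed NoSQL systems have traditionally relaxed some of these guarantees, and that is the gap NewSQL databases try to close.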
56 00:04:57,080 --> 00:05:02,630 Finally, I just want to introduce you to these terms that you might hear when it comes to data engineering: 57 00:05:02,650 --> 00:05:10,580 OLTP and OLAP. You can think of OLTP databases as relational databases or SQL 58 00:05:10,580 --> 00:05:14,700 databases that allow us to make transactions. 59 00:05:14,870 --> 00:05:22,220 For example, if you have a web app and that web app has a user, you can store user information, you 60 00:05:22,220 --> 00:05:30,740 can upload photos, and have that application interact with a database for a user. Those are OLTP databases. 61 00:05:31,160 --> 00:05:35,890 And an OLAP database is usually used for analytical purposes. 62 00:05:35,930 --> 00:05:40,110 Now, I'll leave some resources if you want to really dive deep into this topic. 63 00:05:40,310 --> 00:05:43,670 But as you can see, there are many ways to describe databases. 64 00:05:43,670 --> 00:05:49,760 We have many different types of databases, but at the end of the day each database is good for its own 65 00:05:49,760 --> 00:05:51,570 specific use. 66 00:05:51,650 --> 00:05:56,630 If we look back at our diagram, databases are used everywhere. 67 00:05:56,630 --> 00:06:04,310 For example, these streams: these are usually OLTP databases, like relational databases, that collect 68 00:06:04,340 --> 00:06:10,740 information. Let's say a user on a mobile phone updates their username. 69 00:06:10,770 --> 00:06:17,500 All that goes to a relational database, an OLTP database. And then a data lake; well, a data lake is 70 00:06:17,500 --> 00:06:18,880 another database, 71 00:06:18,880 --> 00:06:25,440 usually something like Hadoop, which we're going to talk about. And then that data goes into a data warehouse. 72 00:06:25,600 --> 00:06:31,960 All of these things that we see here are essentially databases. We collect information from different 73 00:06:31,960 --> 00:06:32,500 streams.
74 00:06:32,530 --> 00:06:38,860 We have data lakes, which are databases, and data warehouses, which are databases. So over the next couple 75 00:06:38,860 --> 00:06:45,640 of videos, knowing what we know about databases and how we have different types of databases, let's try 76 00:06:45,640 --> 00:06:49,960 and figure out how this whole picture of a data engineer works.
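As a rough illustration of the OLTP versus OLAP distinction mentioned earlier, the sketch below contrasts the two query shapes using Python's built-in `sqlite3` module. The `users` and `purchases` tables and their rows are invented for the example. In practice the analytical query would more likely run against a separate data warehouse, so a single SQLite file stands in for both here purely as a simplifying assumption.

```python
import sqlite3

# One in-memory database stands in for both systems (assumption for brevity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, country TEXT)")
conn.execute("CREATE TABLE purchases (user_id INTEGER, amount INTEGER)")

# OLTP-style work: many small transactions, each touching a row or two.
with conn:
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "ana", "BR"), (2, "ben", "CA"), (3, "cy", "CA")],
    )
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?)",
        [(1, 20), (2, 5), (2, 15), (3, 40)],
    )

# A user updates their username: a single-row transactional write.
with conn:
    conn.execute("UPDATE users SET username = ? WHERE id = ?", ("ana_b", 1))

new_name = conn.execute("SELECT username FROM users WHERE id = 1").fetchone()[0]

# OLAP-style work: one query that scans and aggregates many rows.
revenue_by_country = conn.execute(
    "SELECT u.country, SUM(p.amount) "
    "FROM purchases p JOIN users u ON u.id = p.user_id "
    "GROUP BY u.country ORDER BY u.country"
).fetchall()
```

The username update is the kind of request a web or mobile app sends all day long, while the revenue-per-country query is the kind of question an analyst asks occasionally over the whole dataset.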