0 1 00:00:00,550 --> 00:00:01,210 All right guys. 1 2 00:00:01,240 --> 00:00:07,870 So we're going to get started learning about databases and how to actually persist or user data. 2 3 00:00:07,870 --> 00:00:09,730 Now here's the problem. 3 4 00:00:09,730 --> 00:00:16,540 Previously in the last challenge when we built our blog website, you would have noticed that every single 4 5 00:00:16,540 --> 00:00:19,500 time that you've restarted your server 5 6 00:00:19,540 --> 00:00:27,190 so for example at the moment, I'm using node app.js to run my app.js file and my server is 6 7 00:00:27,190 --> 00:00:28,740 running on port 3000. 7 8 00:00:28,840 --> 00:00:38,680 But if I use Control + C to stop running and then I rerun my app now, if I head back to my blog the post 8 9 00:00:38,680 --> 00:00:44,000 that I added previously will now be removed as soon as I hit refresh. 9 10 00:00:44,210 --> 00:00:51,940 And the reason behind this is because inside app.js, we're using an array to store all of that post 10 11 00:00:52,000 --> 00:00:52,880 data. 11 12 00:00:53,110 --> 00:00:59,130 And we don't have any way of persistently storing the data that the user generates 12 13 00:00:59,170 --> 00:01:04,650 for example new blog posts or new comments or new users sign ups. 13 14 00:01:04,780 --> 00:01:09,010 Now it's not just our website that's generating data. 14 15 00:01:09,040 --> 00:01:16,150 If you take a look at internetlivestats.com, you can see just how much data is being generated as 15 16 00:01:16,150 --> 00:01:17,380 we speak. 16 17 00:01:17,380 --> 00:01:23,620 If you look at the total number of Google searches today not even you know of all time, you can see how 17 18 00:01:23,620 --> 00:01:28,970 quickly that number moves and how many emails are sent or how many blog posts were written today. 18 19 00:01:29,290 --> 00:01:36,370 It's an absolutely huge amount of data that's being generated all across the Internet and they all need 19 20 00:01:36,370 --> 00:01:38,480 to be saved somewhere. 20 21 00:01:38,560 --> 00:01:41,070 That's why we need to know about databases. 21 22 00:01:41,320 --> 00:01:46,530 Once you realize the need for databases, the next step is actually choosing one. 22 23 00:01:46,600 --> 00:01:52,840 And if you look on the Internet this is actually not an easy task because there are so many databases 23 24 00:01:52,870 --> 00:01:58,930 out there. Even if you narrow it down to the databases that work well with Node.js, you still left 24 25 00:01:58,930 --> 00:02:06,360 with a lot of choice. And very frequently I'll come across blog posts have a title that's something like this: 25 26 00:02:06,370 --> 00:02:14,450 Cassandra vs MongoDB versus CouchDB vs... and it's just crazy the amount of choice that you have. 26 27 00:02:14,620 --> 00:02:19,660 But there's a reason for that because depending on the type of data you're looking to store and the 27 28 00:02:19,660 --> 00:02:26,440 structure of your data, you might favor one of these databases over another. But no matter what choice 28 29 00:02:26,440 --> 00:02:27,100 you make 29 30 00:02:27,100 --> 00:02:33,270 the main difference between databases is whether if they are sequel based or if they are no 30 31 00:02:33,280 --> 00:02:35,120 SQL based. Now 31 32 00:02:35,230 --> 00:02:43,480 SQL or sequel stands for Structured Query Language and it's a really old school piece of technology 32 33 00:02:43,510 --> 00:02:47,670 that businesses have been using for many many years. Now 33 34 00:02:47,680 --> 00:02:48,180 NoSQL 34 35 00:02:48,190 --> 00:02:55,450 stands for Not Only Structured Query Language so they can actually be any format as long as it's 35 36 00:02:55,450 --> 00:03:00,960 not SQL although most of them follow a very similar structure. 36 37 00:03:00,970 --> 00:03:05,380 Those are the two big families of databases that you get to choose from. 37 38 00:03:05,470 --> 00:03:10,480 And we're going to talk about the differences between each and I'm going to show you how to work with 38 39 00:03:10,480 --> 00:03:12,600 both of these types of databases. 39 40 00:03:13,120 --> 00:03:19,660 If you do a quick google, you will find that there are so many different types of databases that are 40 41 00:03:19,660 --> 00:03:23,500 categorized as SQUL databases or noSQL. 41 42 00:03:23,500 --> 00:03:29,760 But the top four most popular databases at least when we're working with Node.js, for the sequel family 42 43 00:03:29,770 --> 00:03:34,140 that's MySQL and Postgres. And for noSQL 43 44 00:03:34,180 --> 00:03:41,230 it's going to be MongoDB and redis. Now some of these databases you might have already heard of but 44 45 00:03:41,560 --> 00:03:47,890 a lot of people are confused as to when to use which and what are the similarities and differences and 45 46 00:03:47,890 --> 00:03:49,720 how do you choose anyways. 46 47 00:03:50,930 --> 00:03:55,120 So let's talk about what are the differences between SQL and NOSQL. 47 48 00:03:55,130 --> 00:04:02,840 The first difference is in their structure. So let's say that you're starting a new business and 48 49 00:04:02,870 --> 00:04:08,270 you've decided to start selling stationery. Now because you're shipping a lot of your product to your 49 50 00:04:08,270 --> 00:04:09,120 customers 50 51 00:04:09,290 --> 00:04:12,850 you need to keep a record of their names and addresses. 51 52 00:04:12,980 --> 00:04:15,300 So you create a database for that. 52 53 00:04:15,500 --> 00:04:21,800 And if you used to excel and creating tables then you'll find that a sequel database will serve you 53 54 00:04:21,800 --> 00:04:23,510 quite well. 54 55 00:04:23,510 --> 00:04:27,310 A sequel database will group your data into tables. 55 56 00:04:27,380 --> 00:04:33,490 So this is how our customer data would look if we chose a sequel database such as mySQL or 56 57 00:04:33,500 --> 00:04:41,300 Postgres. But sometimes you might find that sequel databases can be extremely inflexible. 57 58 00:04:41,330 --> 00:04:48,740 For example let's say that in your shop you have me as a customer and I'm one of those annoying customers 58 59 00:04:48,830 --> 00:04:52,100 who don't really like having mail sent to my address. 59 60 00:04:52,100 --> 00:04:54,840 So I say to you, 'You know what? 60 61 00:04:54,890 --> 00:04:57,500 Why don't you deliver to my address 61 62 00:04:57,530 --> 00:05:03,890 but for all other communications I want you to just email me? Because we are after all in the 21st 62 63 00:05:03,890 --> 00:05:05,250 century right? 63 64 00:05:05,270 --> 00:05:12,500 If you had a sequel table then you have a bit of a problem because that doesn't really fit with your 64 65 00:05:12,500 --> 00:05:13,480 table. 65 66 00:05:13,490 --> 00:05:17,910 You don't have a pre-specified email column to fit that data 66 67 00:05:17,910 --> 00:05:22,120 and none of your existing data have an email entry. 67 68 00:05:22,550 --> 00:05:27,770 And what if you had a customer who didn't want to give you any information at all? 68 69 00:05:27,770 --> 00:05:33,290 They didn't want to give you their address or their email and they'd rather protect their privacy. 69 70 00:05:33,320 --> 00:05:40,700 Well then you kind of have a gap in your table and it ends up looking very very irregular and that doesn't 70 71 00:05:40,700 --> 00:05:44,050 sit very well with sequel databases. 71 72 00:05:44,190 --> 00:05:53,010 So if you chose a sequel database, what it will try to do is it will try to square off the table. So you 72 73 00:05:53,010 --> 00:05:55,300 can add a new column called email 73 74 00:05:55,530 --> 00:06:02,060 but for all the places where there's missing information it will automatically insert a null. 74 75 00:06:02,190 --> 00:06:08,880 And as we learned from previous programming lessons null can be very dangerous especially if you're 75 76 00:06:08,880 --> 00:06:14,070 not expecting something to be null or not have any data. 76 77 00:06:14,100 --> 00:06:19,110 For example if you decide to send out your marketing emails, a lot of those emails are going to be 77 78 00:06:19,110 --> 00:06:28,400 going to an email address that's called null right? But if you had a NoSQL based database such as 78 79 00:06:28,490 --> 00:06:32,210 MongoDB, then this wouldn't be a problem at all. 79 80 00:06:33,120 --> 00:06:38,870 If you were working with Mongo the same data would be represented as JSON objects. 80 81 00:06:39,150 --> 00:06:43,750 So you would have key value pairs such as first name, Last name, address 81 82 00:06:43,920 --> 00:06:50,730 and if in one of those documents you had an extra key value pair such as email then it doesn't affect 82 83 00:06:50,790 --> 00:06:57,630 any of the other data records. And none of these records or documents actually have to be the same shape 83 84 00:06:57,960 --> 00:06:59,730 or follow the same structure. 84 85 00:07:00,960 --> 00:07:06,930 If you're old enough to remember that ad with a PC Guy and Mac guy, it's kind of the same with SQL and 85 86 00:07:06,930 --> 00:07:14,910 NoSQL. NoSQL databases tend to be flashier, newer with modern syntax and modern methods 86 87 00:07:15,270 --> 00:07:20,460 but SQL database is old and reliable and it likes structure. 87 88 00:07:20,550 --> 00:07:26,550 Everybody knows that guy who they work with in the office who really likes structure and for everybody 88 89 00:07:26,550 --> 00:07:32,680 to follow rules and to specify a structure ahead of time and just stick to it. 89 90 00:07:32,970 --> 00:07:39,420 But with NoSQL and especially MongoDB you're more flexible. You're able to adapt and maneuver around 90 91 00:07:39,420 --> 00:07:43,070 the situation as and when things come up. 91 92 00:07:43,110 --> 00:07:48,480 So it's really helpful for startups where your data structure is not predefined. 92 93 00:07:48,510 --> 00:07:53,190 You know one day you could be selling toasters and tomorrow you could be pivoting to giving massages. 93 94 00:07:53,480 --> 00:07:59,950 Then a NoSQL database can remain flexible and you're not bound to a particular structure. 94 95 00:08:01,140 --> 00:08:09,300 But even though the NoSQL guy is flashy and he's new and he's hip, you know I'm sure you've 95 96 00:08:09,300 --> 00:08:12,330 met people like this and I've met a lot of NoSQL 96 97 00:08:12,330 --> 00:08:20,340 guys in my time. And what they're not good at are relationships and this is another big difference 97 98 00:08:20,400 --> 00:08:21,670 between SQL 98 99 00:08:21,690 --> 00:08:27,440 and NoSQL databases in a way that they implement relationships between your data. 99 100 00:08:27,750 --> 00:08:30,760 Although previously we were comparing between SQL 100 101 00:08:30,770 --> 00:08:38,930 and NoSQL databases, you also often hear people refer to them as relational and non-relational databases. 101 102 00:08:39,150 --> 00:08:45,810 And this is the highlight that SQL databases of really good at establishing relationships between 102 103 00:08:45,870 --> 00:08:50,040 your data. Coming back to our shop, 103 104 00:08:50,060 --> 00:08:54,990 let's say that you wanted to keep a record of all of the orders that are being made. 104 105 00:08:55,040 --> 00:09:00,190 So you want to know what's the name of the customer, who bought it, what's their address, 105 106 00:09:00,260 --> 00:09:01,490 what did they buy, 106 107 00:09:01,490 --> 00:09:03,270 what was the price of the product, 107 108 00:09:03,290 --> 00:09:05,300 how many of it did they buy, 108 109 00:09:05,300 --> 00:09:06,320 all of these things. 109 110 00:09:06,380 --> 00:09:13,520 But it doesn't really make sense to cram it all into the same table. Especially if that same customer 110 111 00:09:13,550 --> 00:09:16,970 comes back and buys a different product 111 112 00:09:17,150 --> 00:09:24,170 then you have a lot of repeated data and it doesn't make any sense to cram everything into the same table 112 113 00:09:24,170 --> 00:09:24,750 right? 113 114 00:09:25,070 --> 00:09:31,310 Using a SQL database what you would do is you would group related pieces of data into individual 114 115 00:09:31,310 --> 00:09:32,090 tables. 115 116 00:09:32,420 --> 00:09:39,650 So you might have a customer table, a products table and an orders table. And in your database you can 116 117 00:09:39,650 --> 00:09:43,430 specify a relationship between the tables. 117 118 00:09:43,430 --> 00:09:51,230 So for example you could link up the customers table through the customer ID to the orders table. And 118 119 00:09:51,230 --> 00:09:56,080 you could link the products table to the products ID in the orders table. 119 120 00:09:56,180 --> 00:10:01,960 If we had some real data for example, we have a database of all of our customers and where they live. 120 121 00:10:02,180 --> 00:10:06,430 We have a table of our products and what their prices are. 121 122 00:10:06,650 --> 00:10:12,230 And we also have this Orders table which links up the data in the different tables. 122 123 00:10:12,590 --> 00:10:19,670 Let's say we have an order with order ID 1, our very first order. We can have a column where we specify 123 124 00:10:19,670 --> 00:10:21,310 the customer ID. 124 125 00:10:21,500 --> 00:10:25,480 So in this case it's linked to the customer that has an ID of 2. 125 126 00:10:25,580 --> 00:10:28,240 So that is of course me 126 127 00:10:28,250 --> 00:10:28,640 right? 127 128 00:10:28,640 --> 00:10:35,130 cUSTOMER 2 is Angela Yu. Now the next column saves the product ID. 128 129 00:10:35,300 --> 00:10:44,990 So this links to the products table with the product that has an ID of 2. By using this format we can 129 130 00:10:44,990 --> 00:10:49,400 join the tables together at a later point if need be. 130 131 00:10:49,700 --> 00:10:55,910 So we can know for each order which customer bought it and which products were bought. 131 132 00:10:57,480 --> 00:11:01,800 Now using a NoSQL database such as MongoDB 132 133 00:11:01,950 --> 00:11:05,180 you have to rethink how you organize your data. 133 134 00:11:05,340 --> 00:11:10,450 In this case we might have a document where we have an order ID. 134 135 00:11:10,800 --> 00:11:19,050 We have a key value pair that's a customer with customer as the key and their details as the value 135 136 00:11:19,920 --> 00:11:24,190 and also the products as its own embedded objects. 136 137 00:11:24,540 --> 00:11:30,180 So now we have embedded objects in order to represent the entire order. 137 138 00:11:30,270 --> 00:11:33,340 And this might lead to some data repetition. 138 139 00:11:34,250 --> 00:11:38,650 So you could also organize your data in different documents 139 140 00:11:38,660 --> 00:11:39,230 right? 140 141 00:11:39,260 --> 00:11:47,180 So say you have three objects. One has details of the order, one has details of the customer and one has 141 142 00:11:47,180 --> 00:11:51,280 details of the product. And you can also link them together 142 143 00:11:51,600 --> 00:11:55,650 although a little bit more clumsily through using references. 143 144 00:11:55,670 --> 00:12:03,650 So for example in the order document, you can see the customer references the document with a person 144 145 00:12:03,650 --> 00:12:10,900 ID of person_01 and the product references a product with the ID of product_01. 145 146 00:12:10,910 --> 00:12:18,380 So this is how you might implement your relationships in a NoSQL database such as MongoDB. But 146 147 00:12:18,470 --> 00:12:24,080 it's a little bit clumsy and when you start searching this database that structured like this, it might 147 148 00:12:24,170 --> 00:12:29,830 not be quite as efficient or fast as a SQL database. 148 149 00:12:30,060 --> 00:12:37,220 So depending on the structure of your data, you will want to choose a SQL or a NoSQL database. 149 150 00:12:37,440 --> 00:12:45,330 And you should know very often you will see on the Internet people saying MongoDB is the best database 150 151 00:12:45,420 --> 00:12:46,810 out there ever. 151 152 00:12:47,010 --> 00:12:52,810 And other people will say don't ever use MongoDB, it's the worst. Or you know 152 153 00:12:52,810 --> 00:12:55,670 MySQL is the best thing since sliced bread. 153 154 00:12:55,710 --> 00:12:57,670 Don't believe what people say. 154 155 00:12:57,750 --> 00:13:03,370 Instead you have to know what your particular situation is to choose the right database. 155 156 00:13:03,660 --> 00:13:11,100 If you need to store data such as orders, customer details, products inventory, things that have lots of 156 157 00:13:11,100 --> 00:13:16,650 relationships between each other, then you might be better off choosing a SQL database such as 157 158 00:13:16,650 --> 00:13:17,770 MySQL. 158 159 00:13:17,850 --> 00:13:24,230 However if you have a website where you have something that's more of a one to many kind of relationship 159 160 00:13:24,240 --> 00:13:31,440 like here for example say if we were making a database for Instagram, then we might have a user name, 160 161 00:13:31,500 --> 00:13:36,000 a first name, last name, and then we might have a post array. 161 162 00:13:36,090 --> 00:13:43,230 And so this is the array of objects and those objects each have an image URL description and the 162 163 00:13:43,230 --> 00:13:45,280 date that they were created. 163 164 00:13:45,360 --> 00:13:53,100 In this case where you have a single user generating lots of content creating a one user to many posts 164 165 00:13:53,220 --> 00:14:01,350 relationship, then this is much easier to map out using something like MongoDB. The last difference 165 166 00:14:01,350 --> 00:14:07,970 I want to talk about is scalability. Despite it being relatively weak on the relationship front 166 167 00:14:08,010 --> 00:14:14,060 one of the reasons why you might use NoSQL over SQL is because of scalability. 167 168 00:14:14,370 --> 00:14:20,720 This is a real strong point of NoSQL databases and this is a reason why you might choose a NoSQL 168 169 00:14:20,720 --> 00:14:23,650 database over a SQL database. 169 170 00:14:23,820 --> 00:14:30,630 If you've ever had an Excel spreadsheet that has lots and lots of rows, so say 10,000+ rows 170 171 00:14:31,080 --> 00:14:33,050 then you might have experienced this. 171 172 00:14:33,120 --> 00:14:35,770 It gets slower and slower 172 173 00:14:35,790 --> 00:14:42,630 the more rows of data you add until a point where the file is so big that your computer can no longer 173 174 00:14:42,630 --> 00:14:49,170 handle it and you might get a pop up saying Excel cannot complete this task with available resources. 174 175 00:14:49,170 --> 00:14:52,240 Choose less data or close other applications. 175 176 00:14:52,320 --> 00:14:59,430 It's saying that you need a more powerful computer. If you are running your SQL database on your laptop 176 177 00:14:59,430 --> 00:15:00,400 for example 177 178 00:15:00,480 --> 00:15:06,900 then you might just get a more powerful computer, say a desktop computer with higher specs 178 179 00:15:06,960 --> 00:15:07,390 right? 179 180 00:15:07,530 --> 00:15:13,830 And once that start slowing down and struggling, then you might just go for a mainframe and buy the biggest 180 181 00:15:13,830 --> 00:15:15,930 computer that you can find. 181 182 00:15:15,960 --> 00:15:22,020 But as we saw previously there are some companies that have so much data. 182 183 00:15:22,200 --> 00:15:29,430 For example the number of websites that Google has to index and be able to serve up in searches is 183 184 00:15:29,430 --> 00:15:37,470 in the bazillions and they have so much data that even the fastest computer will slow down and struggle. 184 185 00:15:38,220 --> 00:15:42,770 Managing large amounts of data using SQL based databases 185 186 00:15:42,840 --> 00:15:47,210 it's kind of like scaling a building vertically. As soon as you have more data 186 187 00:15:47,220 --> 00:15:53,820 you just add floors onto your building until eventually you end up with the skyscraper of a building. 187 188 00:15:53,880 --> 00:15:58,020 It might just become unstable and end up collapsing on you. 188 189 00:15:58,290 --> 00:16:05,730 And it's also very costly to scale vertically like this. Now because the way that MongoDB organizes 189 190 00:16:05,730 --> 00:16:13,650 its data into smaller chunks or smaller documents of data where essentially each record or each row 190 191 00:16:13,650 --> 00:16:17,510 in the table is represented as a JSON object, 191 192 00:16:17,700 --> 00:16:25,280 then in this case instead of buying larger and larger computers, it allows for a distributed system. 192 193 00:16:25,410 --> 00:16:30,350 So your database can be distributed amongst lots and lots of different computers. 193 194 00:16:30,460 --> 00:16:37,230 And this is equivalent to building low but sturdy buildings and you scale horizontally so you build 194 195 00:16:37,230 --> 00:16:43,160 lots and lots of those. And all the data is distributed and spread out amongst all of them. 195 196 00:16:44,210 --> 00:16:50,810 So if we were to compare the most popular SQL database which is MySQL against the most popular 196 197 00:16:51,230 --> 00:16:58,790 NoSQL database which is MongoDB, then these are some of the pros and cons of each. SQL databases 197 198 00:16:59,000 --> 00:17:02,040 have been around for ages. 198 199 00:17:02,210 --> 00:17:05,300 That means that the technology is more mature. 199 200 00:17:05,570 --> 00:17:11,390 But also if you come across anything that's SQL related on the internet, it all looks like it was created 200 201 00:17:11,420 --> 00:17:13,110 in the 90s. 201 202 00:17:13,490 --> 00:17:21,260 MongoDB is shiny and new but as we're finding out some of the things are not really ironed out. That 202 203 00:17:21,260 --> 00:17:27,460 might mean more frequent changes in the technology as they find new problems and address them. 203 204 00:17:27,650 --> 00:17:34,430 The next difference is that SQL databases run on a table structure as we've seen before whereas Mongo 204 205 00:17:34,430 --> 00:17:37,290 DB works with a document structure. 205 206 00:17:37,490 --> 00:17:43,190 So this way individual rows of data can be each represented as a document. 206 207 00:17:43,310 --> 00:17:47,180 And this leads us onto the next point. With a SQL database 207 208 00:17:47,180 --> 00:17:51,770 you should have a structure in mind before you create your database. 208 209 00:17:51,920 --> 00:17:55,550 And this, in database lingo, is called a schema. 209 210 00:17:55,580 --> 00:18:01,450 So these are the names of your columns and the names of your tables and how many columns you might need. 210 211 00:18:02,120 --> 00:18:04,900 And SQL databases work really well 211 212 00:18:04,940 --> 00:18:10,940 if you have a fixed schema, so if you already know ahead of time what your customer database will look 212 213 00:18:10,940 --> 00:18:13,230 like, what your orders database will look like 213 214 00:18:13,400 --> 00:18:15,910 and you know that they're not likely to change. 214 215 00:18:16,250 --> 00:18:22,040 Whereas with something like MongoDB and other NoSQL databases, they're more flexible to changes. 215 216 00:18:22,220 --> 00:18:27,170 So if you're a young startup and you haven't really figured out how you're going to organize your data 216 217 00:18:27,470 --> 00:18:32,960 or what kind of data you're going to store, then it might be a better idea to choose this particular 217 218 00:18:32,960 --> 00:18:34,220 type of database. 218 219 00:18:34,580 --> 00:18:40,640 But as we saw earlier on, if you need to store data with a lot of relationships between the data then 219 220 00:18:40,760 --> 00:18:46,320 a SQL database will work out to be much faster when you query through your database. 220 221 00:18:46,670 --> 00:18:53,240 Whereas with something like MongoDB it can manage simple relationships but it's really not great when 221 222 00:18:53,240 --> 00:19:00,080 those relationships start to get complex and when you have data in MongoDB that's linked up with complex 222 223 00:19:00,080 --> 00:19:01,090 relationships 223 224 00:19:01,250 --> 00:19:08,470 The speed and efficiency can quickly go down. The last point as we mentioned earlier on is that SQL 224 225 00:19:08,470 --> 00:19:13,300 databases tend to scale vertically. So you tend to need more infrastructure, 225 226 00:19:13,510 --> 00:19:14,930 it's more costly. 226 227 00:19:15,100 --> 00:19:21,160 Whereas with MongoDB and other NoSQL databases they can scale horizontally and allows for a 227 228 00:19:21,160 --> 00:19:28,430 more distributed structure which makes it much easier to scale. 228 229 00:19:28,440 --> 00:19:34,320 So this is your introduction for working with databases and especially looking at the differences between 229 230 00:19:34,320 --> 00:19:36,410 the two major families, SQL 230 231 00:19:36,460 --> 00:19:42,430 and NoSQL databases. Now in the coming lessons we're going to explore in a more hands on way 231 232 00:19:42,660 --> 00:19:48,930 the differences between SQL and NoSQL so that you can try it out and see for yourself how they 232 233 00:19:48,930 --> 00:19:54,640 each work and experience some of the frustrations as well as the benefits of each. 233 234 00:19:54,810 --> 00:19:57,420 So for all of that and more, I'll see on the next lesson.