1 00:00:05,980 --> 00:00:10,630 So let's go through some common database terminology, that should be useful for 2 00:00:10,630 --> 00:00:13,540 you when we're working with the databases in this section of the course. 3 00:00:13,540 --> 00:00:18,490 And the idea here is that it'll make it a lot easier to understand the 4 00:00:18,490 --> 00:00:22,170 app, when you understand what the fundamentals are, and what all the 5 00:00:22,170 --> 00:00:27,840 actual various terms mean. Now at the very basic level, the database is the 6 00:00:27,840 --> 00:00:33,780 container for all the data that you store, or that resides inside it. Now when you 7 00:00:33,780 --> 00:00:38,530 use the term database, you're referring to the entire data, as well as the 8 00:00:38,530 --> 00:00:44,100 structure it's actually stored in, and in addition, any queries and views on 9 00:00:44,100 --> 00:00:49,660 that data. Now in SQLite, which is the database used for Android apps, 10 00:00:49,660 --> 00:00:54,960 the entire database contents are stored in one single file, but that isn't true 11 00:00:54,960 --> 00:00:59,850 of most large database systems. Now the Database Dictionary provides a 12 00:00:59,850 --> 00:01:04,980 comprehensive list of the structures and types of data, that are used in recording 13 00:01:04,980 --> 00:01:10,720 the data. So basically, it describes all the tables and fields within the database. Now 14 00:01:10,720 --> 00:01:15,190 more on the specifics later, but in SQLite there is a table in each 15 00:01:15,190 --> 00:01:20,890 database called SQL_Lite_master, that has this information 16 00:01:20,890 --> 00:01:26,110 in it. Now you can query that table, but there are commands that do it for you, so you 17 00:01:26,110 --> 00:01:29,920 don't have to understand the structure of the master table. But it's there for 18 00:01:29,920 --> 00:01:33,190 you anyway, and as we go through the app, you'll see how these commands 19 00:01:33,190 --> 00:01:36,190 actually work. 20 00:01:36,760 --> 00:01:42,610 Now, a table is a collection of related data held in a database. So think of 21 00:01:42,610 --> 00:01:46,180 a contact database, for example, that stores the name and the address and the 22 00:01:46,180 --> 00:01:50,770 phone number, of perhaps your customers. Or what about an invoice table that 23 00:01:50,770 --> 00:01:55,180 records the invoice number in details of the invoice. So in this slide there are 24 00:01:55,180 --> 00:02:00,160 two tables; Contacts and Invoices, that are used to store information about contacts 25 00:02:00,160 --> 00:02:05,290 and invoice details. Now its SQL databases such as SQLite, again used 26 00:02:05,290 --> 00:02:09,720 in Android app development, or another database like Microsoft SQL Server - 27 00:02:09,720 --> 00:02:14,290 they're ideal for storing structured data that can be organized neatly into 28 00:02:14,290 --> 00:02:18,880 rows and columns, like you can see in these examples. Now with all the interest 29 00:02:18,880 --> 00:02:24,910 in Big Data, there are now databases such as No SQL or Hadoop, that can cope with data 30 00:02:24,910 --> 00:02:27,730 that doesn't have such an obvious structure, but we're going to be 31 00:02:27,730 --> 00:02:30,730 restricting our use of database to structured data. 32 00:02:32,720 --> 00:02:38,540 So a Field is the basic unit of data in a table. So a field in a database can be 33 00:02:38,540 --> 00:02:43,610 thought of, probably in a similar way, to what an instance variable in Java is. And 34 00:02:43,610 --> 00:02:47,570 you've seen those obviously in different apps, and just like a class variable, a 35 00:02:47,570 --> 00:02:52,940 database field has a name and a type. Now the type restricts what kind of 36 00:02:52,940 --> 00:02:57,620 data can be stored in a field, for example, it could be a string or it could 37 00:02:57,620 --> 00:03:02,930 accept numbers. Now many databases also allowed date fields, large text fields 38 00:03:02,930 --> 00:03:07,220 and also fields where you can store things like photographs or audio - and 39 00:03:07,220 --> 00:03:11,780 these field types can often, are often called BLOBs, which is 40 00:03:11,780 --> 00:03:16,490 supposed to stand for Binary Large OBject. It's a great name, and I can't 41 00:03:16,490 --> 00:03:20,090 help thinking that the original acronym was something like L.B.O, for Large Binary 42 00:03:20,090 --> 00:03:23,090 Object, and they came up with a cooler sounding name afterwards. 43 00:03:24,230 --> 00:03:29,620 Now Fields are often referred to as columns in databases. I know this can be 44 00:03:29,620 --> 00:03:33,260 technically confusing sometimes because if you come from an Excel background, 45 00:03:33,260 --> 00:03:36,200 you'll find that the definition is probably a bit different to what you 46 00:03:36,200 --> 00:03:41,090 think of as a column. Now in a spreadsheet, the term column refers to an entire set 47 00:03:41,090 --> 00:03:46,370 of data extending across many rows. but in a relational database, column 48 00:03:46,370 --> 00:03:48,830 generally refers to a single entry - 49 00:03:48,830 --> 00:03:52,250 although, when talking about the structure of a table rather than the 50 00:03:52,250 --> 00:03:56,540 actual data, then you could talk about a column to hold the Invoice Number, which 51 00:03:56,540 --> 00:04:01,250 is probably closer to the spreadsheet use of the term. Now when referring to 52 00:04:01,250 --> 00:04:05,560 the data in a database, though, we're talking about an individual item, like a 53 00:04:05,560 --> 00:04:10,400 base unit of data. Now relational databases existed nearly 10 years before 54 00:04:10,400 --> 00:04:13,940 the first spreadsheet program, and we just have to accept that column means 55 00:04:13,940 --> 00:04:16,940 slightly different things in each case. But more on that shortly. 56 00:04:18,630 --> 00:04:23,550 Now a row or record, is a single set of data for all fields that are in that 57 00:04:23,550 --> 00:04:28,770 table. So if you've got four columns like the example on the screen, and if your 58 00:04:28,770 --> 00:04:33,300 table has an invoice number or a date, a description and an amount, then a row 59 00:04:33,300 --> 00:04:37,320 represents those four values for a single invoice. So it's really a 60 00:04:37,320 --> 00:04:41,820 collection of all the columns that comprise the details of one entry in 61 00:04:41,820 --> 00:04:46,560 that table. So the highlighted record holds the details for invoice number two, 62 00:04:46,560 --> 00:04:51,060 which was a laptop costing just over thousand dollars, sold on the 63 00:04:51,060 --> 00:04:57,390 twenty-fourth of may 2016. And you can use either of the terms, row or record, to 64 00:04:57,390 --> 00:05:01,620 identify it, but the correct relational database terminology is actually row. 65 00:05:01,620 --> 00:05:03,630 .... 66 00:05:03,630 --> 00:05:08,700 Now a Flat File database stores all the data in a single file, which can 67 00:05:08,700 --> 00:05:12,510 result in a lot of duplication of data. Now here, 68 00:05:12,510 --> 00:05:17,160 ISP's credit limit needs to be increased, so that they can purchase the monitor, as 69 00:05:17,160 --> 00:05:20,070 otherwise, they'd actually go over the limit. 70 00:05:20,070 --> 00:05:24,060 So in order to increase the limit, the data in three rows would have to be 71 00:05:24,060 --> 00:05:25,970 modified. 72 00:05:25,970 --> 00:05:31,040 Now as you can see, every row in the table that contains a record for ISP, 73 00:05:31,040 --> 00:05:36,650 must be changed in order to increase the credit limit. So Flat File databases are 74 00:05:36,650 --> 00:05:41,210 not used very often anymore, but they were fairly popular in the early days as 75 00:05:41,210 --> 00:05:46,040 they directly mapped to those card index records, that companies sometimes used. 76 00:05:46,040 --> 00:05:51,050 Now if we were storing names, addresses and phone number type information, then a 77 00:05:51,050 --> 00:05:55,490 Flat File database is fine for the job. And even today, people still use address 78 00:05:55,490 --> 00:06:00,230 books and rolex style contact systems, with each person's details on a 79 00:06:00,230 --> 00:06:04,850 different card. But there isn't really much need to relate the individual cards 80 00:06:04,850 --> 00:06:05,840 to each other - 81 00:06:05,840 --> 00:06:11,030 this works perfectly well. Now as you can see from this invoice example, though, trying 82 00:06:11,030 --> 00:06:16,040 to store all the data in a single table results in duplicate data. Now using a 83 00:06:16,040 --> 00:06:20,390 relational database, tables can be related to other tables which is very 84 00:06:20,390 --> 00:06:22,370 useful. 85 00:06:22,370 --> 00:06:27,290 So continuing on with the invoice example; we can split the data out into 86 00:06:27,290 --> 00:06:32,380 a customer table, which contains standard company data such as their name, address 87 00:06:32,380 --> 00:06:36,590 and a credit limit, and another table called invoices, containing all that 88 00:06:36,590 --> 00:06:41,810 customer's purchases. The Name column in the Customer table is related to the 89 00:06:41,810 --> 00:06:46,720 Name column in the Invoices table. Now in relational database terms, this is called 90 00:06:46,720 --> 00:06:52,760 a Join. In fact, in this example we have a one-to-many join because there can, and 91 00:06:52,760 --> 00:06:57,830 probably will, be many invoice rows for each customer. Using a relational model, 92 00:06:57,830 --> 00:07:01,970 updating a customer's credit limit involves changing the data in just a 93 00:07:01,970 --> 00:07:06,800 single row. So there is a mechanism to join these two tables, to link the 94 00:07:06,800 --> 00:07:11,440 individual records in each table, to each other. And you'll often see designs where a 95 00:07:11,440 --> 00:07:15,080 third table is used to provide the link, as in this next slide. 96 00:07:17,480 --> 00:07:21,890 And it's also very common to use a linking table to relate the data to two 97 00:07:21,890 --> 00:07:28,100 other tables. Now here, when an invoice record is stored, a new record is created 98 00:07:28,100 --> 00:07:33,740 in the Customer_Invoices table to link, for example, invoice 0004 99 00:07:33,740 --> 00:07:38,840 with customer ISP. Now one advantage of this is that the invoice table only 100 00:07:38,840 --> 00:07:43,640 contains data relating to invoices. The rows are not cluttered up with 101 00:07:43,640 --> 00:07:48,560 customer information of any kind, not even the customer name. Splitting the 102 00:07:48,560 --> 00:07:51,860 data up like this is known as normalisation. Now database 103 00:07:51,860 --> 00:07:56,480 normalization is basically the process of removing redundant, duplicated and 104 00:07:56,480 --> 00:08:01,190 irrelevant data from the tables, and the more that this is done, 105 00:08:01,190 --> 00:08:05,720 the higher the level of normalisation. If you look into normalisation, you'll 106 00:08:05,720 --> 00:08:11,450 find that you can go up to level 6 - Sixth Normal Form - but in most practical 107 00:08:11,450 --> 00:08:15,530 applications it's ready to go beyond the third level. Now it's an interesting 108 00:08:15,530 --> 00:08:19,490 subject but the maths can get a bit horrible at the higher levels. Our 109 00:08:19,490 --> 00:08:23,060 example on the slide isn't quite as normal we should be, because we've 110 00:08:23,060 --> 00:08:26,780 used the customer name as the link between the customer and invoice tables. 111 00:08:26,780 --> 00:08:31,130 Now if one of our customers changes its name, which is quite a common thing to happen, 112 00:08:31,130 --> 00:08:35,929 then we'll have to update each of the relevant entries in the Customer 113 00:08:35,929 --> 00:08:40,309 _Invoices table as well. Now the usual way to deal with this is to use a 114 00:08:40,309 --> 00:08:46,550 unique ID field, that stays the same for each customer once its allocated. Now we 115 00:08:46,550 --> 00:08:50,090 shouldn't store the customer balances in the table, as that's best calculated 116 00:08:50,090 --> 00:08:53,260 when needed. 117 00:08:53,260 --> 00:08:58,690 So a View is a way of looking at the data in a format similar to a table, but 118 00:08:58,690 --> 00:09:04,210 bringing data together from more than one Joined table. So in this example, the view 119 00:09:04,210 --> 00:09:09,520 contains columns from the Customer table and the Invoices table. So a view can 120 00:09:09,520 --> 00:09:13,270 just contain just some columns from a single table - for example, just the 121 00:09:13,270 --> 00:09:16,930 Description column from the Invoices table to produce a list of items that 122 00:09:16,930 --> 00:09:23,230 have been sold. Now in SQLite, the data in a view can't be updated. So that means 123 00:09:23,230 --> 00:09:27,340 that you can't add a new row to a view and have it place the new data into the 124 00:09:27,340 --> 00:09:32,080 the relevant tables. Now some databases such as Microsoft SQL Server do allow 125 00:09:32,080 --> 00:09:37,120 this, but in that case, the views do have restrictions on the columns they can and 126 00:09:37,120 --> 00:09:41,410 must contain, if they're used in this way. So that's not something we need to worry 127 00:09:41,410 --> 00:09:43,960 about because we can't do that in SQLite. 128 00:09:43,960 --> 00:09:47,290 Alright, so that's enough to get us started, and we're going to explore these terms 129 00:09:47,290 --> 00:09:52,240 more as we go through SQLite. So coming up in the next videos, we're going 130 00:09:52,240 --> 00:09:57,010 to discuss a a few more database concepts and I'll be going into more detail. We'll 131 00:09:57,010 --> 00:10:01,630 be using practical examples and actually using the SQLite database. Now 132 00:10:01,630 --> 00:10:06,670 SQLite is designed to be embedded in applications, and is actually a library 133 00:10:06,670 --> 00:10:11,980 that's called from our application code. It does ship with a shell program that 134 00:10:11,980 --> 00:10:16,210 you can use to create databases and interrogate them, though, and we'll start off by 135 00:10:16,210 --> 00:10:20,410 using that to explore the commands available in SQLite. Now before we 136 00:10:20,410 --> 00:10:24,040 can use that shell program, we need to make sure that it's available on your 137 00:10:24,040 --> 00:10:28,420 system's path. so the next three videos are going to show you how to do that for 138 00:10:28,420 --> 00:10:32,680 Windows, Mac and Linux. So there's one video for each operating system, so 139 00:10:32,680 --> 00:10:36,400 follow the instructions in the relevant video for your operating system and you 140 00:10:36,400 --> 00:10:40,180 can skip over the videos that aren't relevant. So I'll see you in one of those 141 00:10:40,180 --> 00:10:40,600 videos.