1 00:00:00,01 --> 00:00:03,00 - [Instructor] Modern operating systems offer metadata-based 2 00:00:03,00 --> 00:00:05,06 file indexing and searching, a feature which allows us 3 00:00:05,06 --> 00:00:08,06 to find files on our system very quickly by name 4 00:00:08,06 --> 00:00:10,08 or by content once the system has found 5 00:00:10,08 --> 00:00:12,07 and cataloged our files. 6 00:00:12,07 --> 00:00:14,08 On Windows systems, the file searching function 7 00:00:14,08 --> 00:00:16,08 is called Windows Search. 8 00:00:16,08 --> 00:00:19,06 By default, it indexes a user's content from the Documents, 9 00:00:19,06 --> 00:00:22,02 Pictures, Music, and Desktop directories. 10 00:00:22,02 --> 00:00:25,01 This includes the names of the files, and if Windows Search 11 00:00:25,01 --> 00:00:27,02 or a plugin provided to it can read the file, 12 00:00:27,02 --> 00:00:29,05 it will index the file's contents as well, 13 00:00:29,05 --> 00:00:32,01 allowing us to search for text within a text document, 14 00:00:32,01 --> 00:00:34,02 Office document, or PDF. 15 00:00:34,02 --> 00:00:36,01 The index is stored in a system folder 16 00:00:36,01 --> 00:00:39,02 and is periodically updated as content changes. 17 00:00:39,02 --> 00:00:40,09 We can explore the settings for this feature 18 00:00:40,09 --> 00:00:42,06 in the Settings application. 19 00:00:42,06 --> 00:00:45,05 Here we can also set some advanced options. 20 00:00:45,05 --> 00:00:48,00 In the advanced options, we can choose specific subfolders 21 00:00:48,00 --> 00:00:49,07 to exclude from the search scope 22 00:00:49,07 --> 00:00:51,05 and we can add other items too. 23 00:00:51,05 --> 00:00:53,03 And when we plug in an external disk, 24 00:00:53,03 --> 00:00:55,04 we can open its properties to indicate whether 25 00:00:55,04 --> 00:00:57,00 or not it should be indexed as well.
26 00:00:57,00 --> 00:01:00,00 Disks or volumes are tracked by unique identifiers written 27 00:01:00,00 --> 00:01:02,09 to the System Volume Information directory at their root. 28 00:01:02,09 --> 00:01:04,01 To use the Search feature, 29 00:01:04,01 --> 00:01:06,04 we can search in the Start menu or the taskbar 30 00:01:06,04 --> 00:01:08,07 or within Explorer windows. 31 00:01:08,07 --> 00:01:10,01 macOS offers a metadata-based 32 00:01:10,01 --> 00:01:12,05 search function called Spotlight. 33 00:01:12,05 --> 00:01:14,07 On a Mac, the Spotlight service scans files 34 00:01:14,07 --> 00:01:16,00 and records metadata, 35 00:01:16,00 --> 00:01:18,09 and if it can read them, some of the file contents as well. 36 00:01:18,09 --> 00:01:21,03 Once the process of building this index is complete, 37 00:01:21,03 --> 00:01:22,09 we can search for our files and content 38 00:01:22,09 --> 00:01:24,07 on our system very quickly. 39 00:01:24,07 --> 00:01:26,06 Individual Spotlight indexes are stored 40 00:01:26,06 --> 00:01:28,02 on each indexed volume, 41 00:01:28,02 --> 00:01:29,09 so if we have a disk of sensitive data 42 00:01:29,09 --> 00:01:31,08 and we disconnect that disk from the system, 43 00:01:31,08 --> 00:01:33,08 either by unmounting or ejecting it, 44 00:01:33,08 --> 00:01:36,05 the Spotlight Search feature won't return results for files 45 00:01:36,05 --> 00:01:38,00 that are on that disk. 46 00:01:38,00 --> 00:01:39,04 When we attach the disk again, 47 00:01:39,04 --> 00:01:41,08 the index for that disk will be available to Spotlight 48 00:01:41,08 --> 00:01:44,06 and it will be able to return results on that disk. 49 00:01:44,06 --> 00:01:46,06 Spotlight works this way for a few reasons, 50 00:01:46,06 --> 00:01:49,07 one being security, or more specifically secrecy, 51 00:01:49,07 --> 00:01:52,03 as we've seen, and the other is consistency.
52 00:01:52,03 --> 00:01:55,03 If our Mac kept the index of removable drives locally and 53 00:01:55,03 --> 00:01:58,02 those drives were subsequently modified on another system, 54 00:01:58,02 --> 00:02:01,03 then our local index would be out of sync with reality. 55 00:02:01,03 --> 00:02:04,01 And an unreliable index isn't very useful. 56 00:02:04,01 --> 00:02:06,00 There's still a little bit of consistency risk 57 00:02:06,00 --> 00:02:08,01 with an index stored on the device, 58 00:02:08,01 --> 00:02:10,02 but Spotlight on any Mac we plug a disk into 59 00:02:10,02 --> 00:02:12,03 will eventually update the index. 60 00:02:12,03 --> 00:02:15,04 Windows and Linux won't update the Spotlight index though. 61 00:02:15,04 --> 00:02:17,06 The index data for Spotlight volumes are stored 62 00:02:17,06 --> 00:02:18,08 in each volume's root directory, 63 00:02:18,08 --> 00:02:22,01 in a directory called .Spotlight-V100. 64 00:02:22,01 --> 00:02:23,07 This directory can be deleted 65 00:02:23,07 --> 00:02:26,02 without harming our actual data, but if it's missing, 66 00:02:26,02 --> 00:02:28,08 the Mac may attempt to rebuild the index when it discovers 67 00:02:28,08 --> 00:02:31,08 the data it expects to have is missing. 68 00:02:31,08 --> 00:02:33,01 We can search the Spotlight index 69 00:02:33,01 --> 00:02:36,02 through the graphical user interface with Command-Space 70 00:02:36,02 --> 00:02:40,00 or at the command line using the command mdfind. 71 00:02:40,00 --> 00:02:41,01 I'll cancel this though. 72 00:02:41,01 --> 00:02:43,05 We can view metadata about the items in the index 73 00:02:43,05 --> 00:02:45,09 with the mdls command. 74 00:02:45,09 --> 00:02:48,05 Spotlight can be administered from the System Settings area 75 00:02:48,05 --> 00:02:51,00 or through the mdutil terminal command. 76 00:02:51,00 --> 00:02:51,09 In the Spotlight settings, 77 00:02:51,09 --> 00:02:55,06 we can exclude a disk or a folder from being indexed.
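As a rough illustration of the three Spotlight command-line tools just mentioned, here is a minimal Python sketch that builds the argument lists for mdfind, mdls, and mdutil and runs them only when the tools are present, which in practice means on a Mac. The function names, the search term, and the example paths are all hypothetical, and the index check assumes the .Spotlight-V100 directory is visible at the volume root, which may require elevated permissions on some systems.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def spotlight_commands(term: str, item: str, volume: str) -> Dict[str, List[str]]:
    """Build argument lists for the three Spotlight tools (hypothetical helper)."""
    return {
        "search": ["mdfind", "-name", term],  # find items whose name matches term
        "metadata": ["mdls", item],           # list the metadata attributes of one item
        "status": ["mdutil", "-s", volume],   # report the indexing status of a volume
    }


def has_spotlight_index(volume_root: str) -> bool:
    """Return True if a .Spotlight-V100 directory exists at the given volume root."""
    return (Path(volume_root) / ".Spotlight-V100").is_dir()


def run_if_available(command: List[str]) -> Optional[str]:
    """Run the command only if its tool is installed; otherwise return None."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # "report", the PDF path, and "/" are placeholder values for illustration.
    commands = spotlight_commands("report", "/Users/example/report.pdf", "/")
    for label, command in commands.items():
        print(label, "->", " ".join(command))
        output = run_if_available(command)
        if output is not None:
            print(output)
```

On a system without Spotlight, the sketch only prints the commands it would have run, so it is safe to try anywhere.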
78 00:02:55,06 --> 00:02:57,06 The Spotlight index is updated in response 79 00:02:57,06 --> 00:02:59,01 to file system changes. 80 00:02:59,01 --> 00:03:01,09 So when files are created, modified and so on, 81 00:03:01,09 --> 00:03:05,00 the FSEvents feature of macOS tells Spotlight to update 82 00:03:05,00 --> 00:03:06,04 its stored information. 83 00:03:06,04 --> 00:03:08,08 After major operations though, like a system update 84 00:03:08,08 --> 00:03:11,03 or extensive changes to a disk's contents, 85 00:03:11,03 --> 00:03:12,07 the Spotlight service may spend 86 00:03:12,07 --> 00:03:15,00 a sizable amount of time re-indexing. 87 00:03:15,00 --> 00:03:17,07 Like Windows Search, Spotlight gives us some flexibility 88 00:03:17,07 --> 00:03:20,02 about what is indexed and allows us to quickly search 89 00:03:20,02 --> 00:03:22,04 for content on our attached disks. 90 00:03:22,04 --> 00:03:25,02 On a Linux system, we can use the mlocate software 91 00:03:25,02 --> 00:03:27,07 to build and search an index of our files. 92 00:03:27,07 --> 00:03:30,02 To create an index or update it periodically, 93 00:03:30,02 --> 00:03:32,01 we'd run the updatedb command. 94 00:03:32,01 --> 00:03:33,05 This generates a database file 95 00:03:33,05 --> 00:03:36,04 at /var/lib/mlocate/mlocate.db. 96 00:03:36,04 --> 00:03:37,07 Once the database is created, 97 00:03:37,07 --> 00:03:39,09 we can search it with the locate command followed 98 00:03:39,09 --> 00:03:41,02 by a search term. 99 00:03:41,02 --> 00:03:42,08 Unlike on Windows and macOS, 100 00:03:42,08 --> 00:03:45,07 this index isn't updated automatically when files change, 101 00:03:45,07 --> 00:03:48,03 but it is updated on a daily basis by default 102 00:03:48,03 --> 00:03:50,02 by the system's task scheduler, 103 00:03:50,02 --> 00:03:52,03 and it only keeps track of file names, 104 00:03:52,03 --> 00:03:55,00 not file contents or other metadata.
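To make the snapshot behavior of a name-only index concrete, here is a small Python sketch of the same idea: one function records file paths to a database file, and another searches that file without touching the file system. This is not how mlocate is implemented, and the real mlocate database uses its own format; the function names update_db and locate, the plain-text database, and the file names are assumptions made purely for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def update_db(root: str, db_path: str) -> int:
    """Walk root and write every file path to db_path, one per line.

    db_path should live outside root so the database does not index itself.
    Returns the number of paths recorded.
    """
    count = 0
    with open(db_path, "w", encoding="utf-8") as db:
        for directory, _subdirs, filenames in os.walk(root):
            for name in sorted(filenames):
                db.write(os.path.join(directory, name) + "\n")
                count += 1
    return count


def locate(term: str, db_path: str) -> List[str]:
    """Return recorded paths containing term; only the database is consulted."""
    with open(db_path, encoding="utf-8") as db:
        return [line.rstrip("\n") for line in db if term in line]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as db_dir:
        db_path = os.path.join(db_dir, "names.db")
        Path(root, "quarterly-report.txt").write_text("first file")
        update_db(root, db_path)

        # This file is created after indexing, so the index does not know about it yet.
        Path(root, "quarterly-report-v2.txt").write_text("second file")
        print("before re-index:", locate("quarterly-report", db_path))

        # Re-running the indexer, as a scheduled job would, picks up the new file.
        update_db(root, db_path)
        print("after re-index:", locate("quarterly-report", db_path))
```

Note that the database holds only paths, never file contents, which mirrors the limitation described above, and that anyone who can read the database file learns those file names.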
105 00:03:55,00 --> 00:03:57,01 There are other file indexing solutions available 106 00:03:57,01 --> 00:04:00,02 for Linux too, but mlocate is widely used. 107 00:04:00,02 --> 00:04:01,01 All of these indexes 108 00:04:01,01 --> 00:04:02,09 that we've discussed here contain some amount 109 00:04:02,09 --> 00:04:06,01 of information about files; they have to in order to operate. 110 00:04:06,01 --> 00:04:08,04 And that means they can present a security risk, 111 00:04:08,04 --> 00:04:11,02 especially if the index is copied or otherwise distributed 112 00:04:11,02 --> 00:04:12,08 to people who shouldn't have it. 113 00:04:12,08 --> 00:04:14,06 That doesn't mean we should totally turn off 114 00:04:14,06 --> 00:04:16,00 these helpful features, though. 115 00:04:16,00 --> 00:04:18,00 We just need to be mindful of how their information 116 00:04:18,00 --> 00:04:19,04 is stored and collected 117 00:04:19,04 --> 00:04:22,00 and what options we have to exclude sensitive information 118 00:04:22,00 --> 00:04:23,02 from being indexed. 119 00:04:23,02 --> 00:04:25,02 We can mitigate the risk of information breaches 120 00:04:25,02 --> 00:04:27,06 through these databases by following practices 121 00:04:27,06 --> 00:04:29,04 that we should already have adopted. 122 00:04:29,04 --> 00:04:31,09 Don't mix work and personal files or systems 123 00:04:31,09 --> 00:04:33,08 and keep shared volumes clean. 124 00:04:33,08 --> 00:04:35,09 Take steps to protect sensitive information 125 00:04:35,09 --> 00:04:38,02 such as excluding it from indexers you don't control 126 00:04:38,02 --> 00:04:41,00 or encrypting it at rest so its contents aren't available 127 00:04:41,00 --> 00:04:42,01 to be indexed. 128 00:04:42,01 --> 00:04:44,04 Metadata indexing and the search capabilities 129 00:04:44,04 --> 00:04:47,00 it enables make dealing with the huge amount of data 130 00:04:47,00 --> 00:04:49,03 that we use on a daily basis possible.
131 00:04:49,03 --> 00:04:51,02 But like anything else in the digital world 132 00:04:51,02 --> 00:04:52,01 and in the real world, 133 00:04:52,01 --> 00:04:54,00 there are risks we should be aware of.