1 00:00:00,06 --> 00:00:02,06 - [Instructor] Metadata that's embedded within files 2 00:00:02,06 --> 00:00:03,07 is extremely important 3 00:00:03,07 --> 00:00:06,01 in a number of fields and applications. 4 00:00:06,01 --> 00:00:08,04 Image metadata is often the first to come to mind 5 00:00:08,04 --> 00:00:10,04 when we think about this kind of metadata, 6 00:00:10,04 --> 00:00:11,08 but other media and formats 7 00:00:11,08 --> 00:00:14,07 often use embedded metadata as well. 8 00:00:14,07 --> 00:00:18,00 Here, for example, I have a photo that I took recently. 9 00:00:18,00 --> 00:00:20,03 I encourage you to explore your own photos. 10 00:00:20,03 --> 00:00:21,09 In most photo viewing applications, 11 00:00:21,09 --> 00:00:24,01 we can find an information or detail area 12 00:00:24,01 --> 00:00:30,08 to show us some of the metadata attached to a photo. 13 00:00:30,08 --> 00:00:33,04 I'll open up the file info, and here on the right, 14 00:00:33,04 --> 00:00:36,07 I can see camera information and location information. 15 00:00:36,07 --> 00:00:40,08 This photo was taken in Rome 16 00:00:40,08 --> 00:00:46,01 with my iPhone 13 Pro in April of 2022. 17 00:00:46,01 --> 00:00:49,02 This sort of metadata is defined by a standard called EXIF, 18 00:00:49,02 --> 00:00:52,01 which is short for Exchangeable Image File format. 19 00:00:52,01 --> 00:00:53,07 There are dozens of standard tags 20 00:00:53,07 --> 00:00:56,05 that might be applied to any photo created by any camera, 21 00:00:56,05 --> 00:00:58,09 for example, that can be read in a standard way 22 00:00:58,09 --> 00:01:00,08 by other devices and systems. 23 00:01:00,08 --> 00:01:02,04 This standard allows tools to work 24 00:01:02,04 --> 00:01:04,09 with the metadata information of files. 25 00:01:04,09 --> 00:01:06,07 There's also another set of metadata 26 00:01:06,07 --> 00:01:10,06 called IPTC, which is a separate standard often stored alongside EXIF. 
27 00:01:10,06 --> 00:01:13,01 This standard has also been extended to XMP 28 00:01:13,01 --> 00:01:16,00 or Extensible Metadata Platform. 29 00:01:16,00 --> 00:01:19,03 Depending on the type, age, origin, and purpose of a file, 30 00:01:19,03 --> 00:01:21,05 it may have metadata in all of these formats, 31 00:01:21,05 --> 00:01:23,08 some of them, or none of them. 32 00:01:23,08 --> 00:01:26,04 It's common, for example, for a photo management tool 33 00:01:26,04 --> 00:01:30,01 to allow us to search or create albums based on metadata. 34 00:01:30,01 --> 00:01:31,07 I could search to find all the photos 35 00:01:31,07 --> 00:01:34,07 taken with a particular lens or at a particular location 36 00:01:34,07 --> 00:01:38,00 if my camera supports adding GPS information to photos. 37 00:01:38,00 --> 00:01:40,07 Some audio and other media file formats use EXIF 38 00:01:40,07 --> 00:01:42,06 and related tags as well. 39 00:01:42,06 --> 00:01:45,03 While this kind of metadata is useful within applications, 40 00:01:45,03 --> 00:01:47,08 it's also useful outside of applications 41 00:01:47,08 --> 00:01:49,00 because many tools can read 42 00:01:49,00 --> 00:01:51,04 standard embedded metadata formats. 43 00:01:51,04 --> 00:01:52,07 As we'll see later in the course, 44 00:01:52,07 --> 00:01:54,06 many system-wide file search tools 45 00:01:54,06 --> 00:01:56,05 can build their own index of metadata 46 00:01:56,05 --> 00:01:58,06 and use that to search for files quickly. 47 00:01:58,06 --> 00:02:01,01 There are also software tools like ExifTool 48 00:02:01,01 --> 00:02:03,09 which allow us to read and modify metadata in files 49 00:02:03,09 --> 00:02:05,08 without having to open up a photo editor, 50 00:02:05,08 --> 00:02:08,02 or an audio player, or a document browser, 51 00:02:08,02 --> 00:02:10,00 or whichever specialized application 52 00:02:10,00 --> 00:02:12,08 we might use a file's metadata within. 
53 00:02:12,08 --> 00:02:14,03 One thing to be aware of when working 54 00:02:14,03 --> 00:02:15,07 with embedded metadata, though, 55 00:02:15,07 --> 00:02:17,07 is that, generally speaking, there's no guarantee 56 00:02:17,07 --> 00:02:20,07 that any of the values are true or correct. 57 00:02:20,07 --> 00:02:23,01 When I travel with my interchangeable lens camera 58 00:02:23,01 --> 00:02:26,01 which doesn't have internet access or a built-in GPS, 59 00:02:26,01 --> 00:02:27,06 I almost always forget to set 60 00:02:27,06 --> 00:02:29,04 its time zone and date correctly, 61 00:02:29,04 --> 00:02:31,02 so when I come back home from my travels, 62 00:02:31,02 --> 00:02:34,07 I need to modify the date and time of these photos. 63 00:02:34,07 --> 00:02:36,06 Some tools make it easy to simply offset 64 00:02:36,06 --> 00:02:39,03 the existing time by a set number of hours 65 00:02:39,03 --> 00:02:41,05 if we simply forgot to change the time zone, 66 00:02:41,05 --> 00:02:44,01 and sometimes we'll need to make extremely specific changes 67 00:02:44,01 --> 00:02:46,06 if the recorded date and time are way off. 68 00:02:46,06 --> 00:02:49,02 I also often add approximate GPS information 69 00:02:49,02 --> 00:02:51,02 lifted from photos I took with my smartphone 70 00:02:51,02 --> 00:02:53,08 around the same location or from mapping software 71 00:02:53,08 --> 00:02:56,00 after I've found a specific location. 72 00:02:56,00 --> 00:02:57,05 Editing the metadata like this 73 00:02:57,05 --> 00:02:59,01 allows my photo browsing software 74 00:02:59,01 --> 00:03:01,03 to present all of my photos on its handy map 75 00:03:01,03 --> 00:03:04,00 and show them in correct chronological order. 
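The fixed-hour correction described above can be sketched in Python. This is only an illustration, not how any particular photo tool does it: it assumes the timestamp has already been read out of the file as text in the `YYYY:MM:DD HH:MM:SS` layout that EXIF uses for dates, and the function name and sample values are hypothetical.

```python
from datetime import datetime, timedelta

# Layout EXIF uses for date/time strings, e.g. "2022:04:14 21:30:00".
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def shift_exif_timestamp(value: str, hours: float) -> str:
    """Return the EXIF-style timestamp moved by the given number of hours."""
    parsed = datetime.strptime(value, EXIF_FORMAT)
    return (parsed + timedelta(hours=hours)).strftime(EXIF_FORMAT)


# Hypothetical case: the camera clock was left 6 hours behind local time.
print(shift_exif_timestamp("2022:04:14 21:30:00", 6))
```

Note that the shift can roll the date over to the next or previous day, which is one reason to let a date library do the arithmetic rather than editing the hour digits by hand.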
76 00:03:04,00 --> 00:03:07,03 Changing metadata for my personal use in this way is useful, 77 00:03:07,03 --> 00:03:09,05 honest, and non-harmful, 78 00:03:09,05 --> 00:03:12,05 but we can set metadata values to anything we'd like, 79 00:03:12,05 --> 00:03:15,02 so the embedded metadata values we read from files 80 00:03:15,02 --> 00:03:17,01 aren't guaranteed to always be truthful, 81 00:03:17,01 --> 00:03:19,02 accurate, or correct. 82 00:03:19,02 --> 00:03:21,07 The information can be maliciously falsified, 83 00:03:21,07 --> 00:03:25,02 accidentally changed, or, as happens with my travel camera, 84 00:03:25,02 --> 00:03:27,01 simply incorrect to begin with, 85 00:03:27,01 --> 00:03:30,00 so if we rely on metadata for forensic purposes 86 00:03:30,00 --> 00:03:31,08 or anything of consequence, 87 00:03:31,08 --> 00:03:33,08 we need to be careful to validate the data 88 00:03:33,08 --> 00:03:36,07 and not simply trust it uncritically. 89 00:03:36,07 --> 00:03:39,01 Some file formats, though, usually documents, 90 00:03:39,01 --> 00:03:40,08 store their own internal metadata 91 00:03:40,08 --> 00:03:42,07 that can be verified and audited, 92 00:03:42,07 --> 00:03:45,02 and this sort of metadata usually can't be read 93 00:03:45,02 --> 00:03:47,01 or changed from outside of the application 94 00:03:47,01 --> 00:03:49,04 that creates and edits the files. 95 00:03:49,04 --> 00:03:52,00 These values are usually chain of custody records, 96 00:03:52,00 --> 00:03:55,02 audit or review trails, or change-tracking information. 97 00:03:55,02 --> 00:03:56,06 Often, documents like this 98 00:03:56,06 --> 00:03:58,07 will also include cryptographic signatures 99 00:03:58,07 --> 00:04:01,03 that can be independently verified. 100 00:04:01,03 --> 00:04:03,09 Embedded metadata values are often called tags, 101 00:04:03,09 --> 00:04:05,07 and that's a helpful way to think about them. 
102 00:04:05,07 --> 00:04:08,06 They're simply information attached to the data of a file, 103 00:04:08,06 --> 00:04:10,00 and they can be changed or removed 104 00:04:10,00 --> 00:04:12,05 without altering the actual data of a file. 105 00:04:12,05 --> 00:04:14,07 For example, consider this mug. 106 00:04:14,07 --> 00:04:17,07 I've added tags that describe various properties of it. 107 00:04:17,07 --> 00:04:19,07 It's red, it's made of metal, 108 00:04:19,07 --> 00:04:21,06 and should not be put in the microwave. 109 00:04:21,06 --> 00:04:23,01 Even though we can learn these things 110 00:04:23,01 --> 00:04:25,00 simply by looking at the mug itself, 111 00:04:25,00 --> 00:04:27,03 sometimes it's useful to have this kind of information 112 00:04:27,03 --> 00:04:30,00 specifically listed in a way that we can access. 113 00:04:30,00 --> 00:04:31,04 Tags also give us the ability 114 00:04:31,04 --> 00:04:33,09 to attach other non-self-evident information 115 00:04:33,09 --> 00:04:35,04 to an object or file. 116 00:04:35,04 --> 00:04:37,05 While these tags in this photo are separate from 117 00:04:37,05 --> 00:04:39,00 but attached to the object, 118 00:04:39,00 --> 00:04:42,08 file metadata is often included in the file itself. 119 00:04:42,08 --> 00:04:44,04 When tags are embedded within a file, 120 00:04:44,04 --> 00:04:45,08 they are part of the file, 121 00:04:45,08 --> 00:04:49,01 but a file with embedded metadata also has a data component 122 00:04:49,01 --> 00:04:50,08 that represents the actual contents 123 00:04:50,08 --> 00:04:54,03 or information of the file, not including the metadata. 124 00:04:54,03 --> 00:04:55,08 In an image file, for example, 125 00:04:55,08 --> 00:04:58,06 there's binary data that represents the photo itself, 126 00:04:58,06 --> 00:05:00,03 and that's separate from the part of the file 127 00:05:00,03 --> 00:05:02,04 where the metadata is stored. 
128 00:05:02,04 --> 00:05:05,02 I also want to point out that some embedded metadata values 129 00:05:05,02 --> 00:05:08,02 are intrinsically representative of the data itself, 130 00:05:08,02 --> 00:05:11,00 like image dimensions or format information. 131 00:05:11,00 --> 00:05:13,00 We could strip this metadata from the file, 132 00:05:13,00 --> 00:05:14,09 but the actual image is still encoded 133 00:05:14,09 --> 00:05:18,09 in whatever format it originally was, like JPEG or PNG, 134 00:05:18,09 --> 00:05:21,08 and it still resolves to the same 1920 pixels wide 135 00:05:21,08 --> 00:05:24,00 by 1080 pixels tall, and so on. 136 00:05:24,00 --> 00:05:27,01 This metadata can be regenerated from the data itself, 137 00:05:27,01 --> 00:05:29,00 and changing these metadata values 138 00:05:29,00 --> 00:05:32,04 doesn't alter the image's actual format or size. 139 00:05:32,04 --> 00:05:34,07 When working with metadata, it's also useful to know 140 00:05:34,07 --> 00:05:37,06 about a few interesting values we might find. 141 00:05:37,06 --> 00:05:39,09 One of these is the Unix epoch start time, 142 00:05:39,09 --> 00:05:41,07 the time when time began, 143 00:05:41,07 --> 00:05:44,05 at least according to Unix and Linux computers. 144 00:05:44,05 --> 00:05:48,05 That's January 1st, 1970 in the UTC time representation, 145 00:05:48,05 --> 00:05:51,06 and that time is kept by counting seconds since then. 146 00:05:51,06 --> 00:05:53,07 This date is important to know about 147 00:05:53,07 --> 00:05:55,05 because it often appears when the real date 148 00:05:55,05 --> 00:05:57,01 and time are missing. 149 00:05:57,01 --> 00:05:59,04 Instead of leaving the date or time field blank, 150 00:05:59,04 --> 00:06:01,05 some software wants to put in a value, 151 00:06:01,05 --> 00:06:02,07 and if the value is zero, 152 00:06:02,07 --> 00:06:05,02 that usually resolves to this special date. 
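To see how a zero ends up displayed as a date, here is a small Python sketch. It assumes the stored value is a count of seconds since the Unix epoch, and the function name and the fixed UTC offsets are made up for illustration. The same zero can display as the previous calendar day when it is rendered in a time zone behind UTC.

```python
from datetime import datetime, timedelta, timezone

# Start of the Unix epoch: midnight on January 1st, 1970, UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render_epoch_seconds(seconds: int, utc_offset_hours: int) -> str:
    """Format a seconds-since-epoch value as seen from a fixed UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    moment = EPOCH + timedelta(seconds=seconds)
    return moment.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S")


print(render_epoch_seconds(0, 0))   # 1970-01-01 00:00:00
print(render_epoch_seconds(0, -5))  # 1969-12-31 19:00:00
```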
153 00:06:05,02 --> 00:06:08,06 Sometimes we'll also see December 31st, 1969, 154 00:06:08,06 --> 00:06:09,08 which is how the date can render 155 00:06:09,08 --> 00:06:11,09 in time zones west of the prime meridian 156 00:06:11,09 --> 00:06:14,02 when the date value is set to the epoch start. 157 00:06:14,02 --> 00:06:16,05 While some events certainly did happen on this date, 158 00:06:16,05 --> 00:06:20,01 most files we'll use in modern times didn't exist then. 159 00:06:20,01 --> 00:06:23,00 We'll also sometimes see other erroneous dates. 160 00:06:23,00 --> 00:06:24,09 For example, I was going through an old archive 161 00:06:24,09 --> 00:06:27,09 and found files with the year 2036, 162 00:06:27,09 --> 00:06:31,01 14 years in the future of when I'm recording this. 163 00:06:31,01 --> 00:06:33,01 These files definitely weren't from the future, though, 164 00:06:33,01 --> 00:06:36,03 because I burned those archive CDs almost 20 years ago, 165 00:06:36,03 --> 00:06:37,07 so something happened, 166 00:06:37,07 --> 00:06:40,01 maybe a little bit of data corruption. 167 00:06:40,01 --> 00:06:43,08 It's always useful to check metadata for reasonableness. 168 00:06:43,08 --> 00:06:46,03 Another value that results from putting zero in a field 169 00:06:46,03 --> 00:06:49,08 is the location zero degrees north, zero degrees east, 170 00:06:49,08 --> 00:06:52,02 where the equator and prime meridian meet. 171 00:06:52,02 --> 00:06:53,04 That's a point on the globe 172 00:06:53,04 --> 00:06:55,08 in the water off the coast of West Africa, 173 00:06:55,08 --> 00:06:58,07 and it's commonly referred to as Null Island. 174 00:06:58,07 --> 00:07:00,06 Like with our strange times before, 175 00:07:00,06 --> 00:07:02,01 this is a valid location, 176 00:07:02,01 --> 00:07:05,01 and there are probably photos taken precisely there. 
177 00:07:05,01 --> 00:07:06,08 There's even a little weather buoy at that spot 178 00:07:06,08 --> 00:07:10,03 to mark the location, but in most photos and other media, 179 00:07:10,03 --> 00:07:13,00 the coordinates of Null Island in location fields 180 00:07:13,00 --> 00:07:14,03 indicate a missing value 181 00:07:14,03 --> 00:07:17,08 that was replaced sometime along the way with zeros. 182 00:07:17,08 --> 00:07:20,01 Embedded metadata is extremely useful, 183 00:07:20,01 --> 00:07:22,02 and we can correct it and add our own, 184 00:07:22,02 --> 00:07:24,03 but it can also contain incorrect information, 185 00:07:24,03 --> 00:07:27,04 so it's important to double-check it before relying on it. 186 00:07:27,04 --> 00:07:29,07 Next, let's take a look at using the ExifTool software 187 00:07:29,07 --> 00:07:33,00 that I mentioned earlier to work with a file's metadata.
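The kind of double-checking described above could be sketched as a small Python function. It is a rough illustration under a few assumptions: the date and coordinates have already been extracted from the file, the date is timezone-aware, and the function name, the one-day tolerance around the epoch, and the warning wording are all hypothetical choices rather than part of any standard.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def suspicious_values(taken_at, latitude, longitude, now):
    """List reasons the given metadata values deserve a closer look.

    taken_at and now must be timezone-aware datetime objects.
    """
    warnings = []
    # Within a day of the epoch start also covers December 31st, 1969 renderings.
    if abs(taken_at - EPOCH) <= timedelta(days=1):
        warnings.append("date is at the Unix epoch start; probably a missing value")
    if taken_at > now:
        warnings.append("date is in the future")
    if latitude == 0.0 and longitude == 0.0:
        warnings.append("location is Null Island (0, 0); probably a missing value")
    return warnings


checked_on = datetime(2022, 4, 1, tzinfo=timezone.utc)
for warning in suspicious_values(EPOCH, 0.0, 0.0, checked_on):
    print(warning)
```

A warning here only means the value is worth a second look. As noted above, both the epoch date and the Null Island coordinates are valid values that can occasionally be genuine.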