1 00:00:07,260 --> 00:00:07,740 Hello. 2 00:00:08,340 --> 00:00:13,170 In this video, we will discuss the last interview question about connections. 3 00:00:13,590 --> 00:00:15,180 What is a dictionary? 4 00:00:15,630 --> 00:00:21,060 A dictionary is a data structure representing a collection of key value pairs. 5 00:00:21,450 --> 00:00:24,390 Each key in the dictionary must be unique. 6 00:00:24,870 --> 00:00:30,570 Let's declare a dictionary representing the mapping from the country name to its currency. 7 00:00:41,510 --> 00:00:48,290 In this case, both key and the value are strings, but that's not a rule, let's declare a dictionary 8 00:00:48,290 --> 00:00:53,630 with a mapping from the currency name to its value expressed in U.S. dollars. 9 00:01:03,780 --> 00:01:10,470 We can use the collection equalizer instead of adding the developers to the dictionary one by one. 10 00:01:21,900 --> 00:01:25,440 Remember, that kiss in the dictionary must be unique. 11 00:01:25,620 --> 00:01:31,890 This means an attempt to add a new value under the same key will draw an exception. 12 00:01:38,040 --> 00:01:45,510 When using the other method, which the duplicated key an exception is thrown, on the other hand, 13 00:01:45,510 --> 00:01:49,320 if I used the index are the old value would simply be updated. 14 00:01:57,720 --> 00:02:03,030 This time, no exception is thrown and the updated value is presented to the council. 15 00:02:04,390 --> 00:02:07,540 The use cases for dictionaries are endless. 16 00:02:07,870 --> 00:02:12,370 Whenever we need any kind of not being there, most likely the best choice. 17 00:02:12,700 --> 00:02:15,070 Let me give you a very simple example. 18 00:02:16,210 --> 00:02:18,610 Here we have a collection of employees. 19 00:02:19,000 --> 00:02:23,970 Each employee has a department he or she works in and the salary property. 20 00:02:24,250 --> 00:02:30,040 We want to create a method that calculates what is the average salary in each department. 21 00:02:30,520 --> 00:02:37,510 The result of this method should be a mapping from department to the average salary mappings are best 22 00:02:37,510 --> 00:02:38,350 represented. 23 00:02:38,440 --> 00:02:39,490 We've dictionaries. 24 00:02:39,850 --> 00:02:46,720 I will use the links grouped by method to group the employees by department and then calculate the average 25 00:02:46,720 --> 00:02:48,430 salary for each group. 26 00:02:48,940 --> 00:02:54,190 Then I will transform the result into a dictionary using the two dictionary method. 27 00:03:25,180 --> 00:03:27,610 Let's make sure this works as expected. 28 00:03:29,550 --> 00:03:35,760 It seems like the result is correct, and the dictionary is a perfect data structure to represent it. 29 00:03:36,930 --> 00:03:40,500 Now let's take a look under the hood of dictionaries. 30 00:03:40,920 --> 00:03:45,060 The underlying data structure of a dictionary is a hash table. 31 00:03:45,630 --> 00:03:49,740 A hash table is basically an array of linked lists. 32 00:03:50,190 --> 00:03:52,260 We can imagine it like this. 33 00:03:53,250 --> 00:03:56,070 Each element in the least because a value. 34 00:03:56,190 --> 00:04:00,120 And the key object for which the hash code is calculated. 35 00:04:00,480 --> 00:04:04,830 The placing of the developer is not random in this array. 36 00:04:05,220 --> 00:04:08,040 The index is calculated like this. 37 00:04:08,430 --> 00:04:15,270 This naturally gives a number between zero and R0 size minus one, which is a valid index in. 38 00:04:16,260 --> 00:04:21,420 When an item is inserted into the hash table, it's hash code is calculated. 39 00:04:21,720 --> 00:04:24,540 We learned about hash codes in the previous lecture. 40 00:04:24,990 --> 00:04:29,880 Then the index in the array is calculated using this formula. 41 00:04:30,180 --> 00:04:36,180 Finally, the key value power is added to the list stored under the given index. 42 00:04:36,570 --> 00:04:42,120 This means under a certain index, objects with different hash codes can be stored. 43 00:04:42,600 --> 00:04:49,620 But what happens if we add a new key value power to the dictionary, and the key has the same hash code 44 00:04:49,620 --> 00:04:52,500 as some other key that is already stored in it? 45 00:04:52,890 --> 00:05:01,050 Well, the dictionary must ask this is this actually the same key, or is it a different key that accidently 46 00:05:01,080 --> 00:05:02,430 has the same hash code? 47 00:05:02,820 --> 00:05:07,230 There is one method that can answer this question that equals method. 48 00:05:07,710 --> 00:05:14,700 If two keys have the same hash codes and the equals method returns true for them, it means it's actually 49 00:05:14,700 --> 00:05:15,540 the same key. 50 00:05:15,900 --> 00:05:21,720 Then one of the two things will happen if the key value pair was added with the other method, which 51 00:05:21,720 --> 00:05:25,050 expects that the key is not yet present in the dictionary. 52 00:05:25,080 --> 00:05:31,280 An exception will be thrown if the key value per se is set with the index, or it will simply update 53 00:05:31,350 --> 00:05:32,910 the value under the key. 54 00:05:33,240 --> 00:05:39,510 But if the new and the old key have the same hash codes, but the equals method returns false four, 55 00:05:39,510 --> 00:05:45,690 then it means there are actually two different keys that had the same hash called by accident. 56 00:05:45,930 --> 00:05:51,390 In this case, the dictionary stores the new key value pair under the same index. 57 00:05:51,720 --> 00:05:58,620 Remember, under each index there is a link to East, and the dictionary will put a new developer at 58 00:05:58,620 --> 00:05:59,340 its end. 59 00:05:59,610 --> 00:06:05,640 When we try to retrieve the value under this key again, the dictionary will quickly calculate the hash 60 00:06:05,640 --> 00:06:10,530 code and based on this hash, the index in the array of linked lists. 61 00:06:11,100 --> 00:06:15,950 Then it will iterate the list looking for an object with the same key. 62 00:06:16,050 --> 00:06:23,070 So the key for which the equals method returns true if compared with the key for which we try to retrieve 63 00:06:23,070 --> 00:06:30,300 the value because the equals method is needed in case of hash code conflicts to properly identify the 64 00:06:30,300 --> 00:06:30,790 key. 65 00:06:30,810 --> 00:06:36,690 We should always override it when overriding the good hash code method so their implementations are 66 00:06:36,690 --> 00:06:37,530 consistent. 67 00:06:37,680 --> 00:06:44,940 For example, if good hash code method returns the Social Security number for a person object, it means 68 00:06:44,940 --> 00:06:48,090 we consider this number the person's identifier. 69 00:06:48,450 --> 00:06:52,920 The equals method should also only compare the Social Security numbers. 70 00:06:53,250 --> 00:06:57,840 Calculating the hash code is, or at least should be very fast. 71 00:06:58,170 --> 00:07:02,460 Accessing the right element at a given index is extremely fast. 72 00:07:02,760 --> 00:07:06,900 This means as long as the list under each index is small. 73 00:07:06,990 --> 00:07:13,590 Accessing the dictionary value under a given key should be super fast, and this is the main power of 74 00:07:13,590 --> 00:07:14,490 dictionaries. 75 00:07:15,030 --> 00:07:19,140 So how big are the lists stored under each index? 76 00:07:19,470 --> 00:07:21,930 Well, it depends on two factors. 77 00:07:22,380 --> 00:07:26,250 What is the size of the array that represents the hash table? 78 00:07:26,700 --> 00:07:28,680 This is something we don't control. 79 00:07:28,950 --> 00:07:31,830 The dictionary itself adjusts this size. 80 00:07:32,490 --> 00:07:37,950 The other factor is how often the hash codes for different objects are the same. 81 00:07:38,430 --> 00:07:45,930 If, for example, we implemented the hash code method for some type us simply returning one, it would 82 00:07:45,930 --> 00:07:48,630 mean that all hash codes will be duplicated. 83 00:07:49,050 --> 00:07:55,110 There will be only one element in the array representing the hash table, and this element will be a 84 00:07:55,110 --> 00:07:57,390 very long list of developers. 85 00:07:57,630 --> 00:08:03,720 In other words, in this particular case, the performance of the dictionary will be similar to the 86 00:08:03,720 --> 00:08:05,160 performance of other used. 87 00:08:05,490 --> 00:08:10,410 This is the reason for which the hash functions should be uniformly distributed. 88 00:08:10,860 --> 00:08:13,680 The fewer hash code conflict, the better. 89 00:08:13,770 --> 00:08:21,210 The dictionaries performance Let's summarize a dictionary is a data structure representing a collection 90 00:08:21,210 --> 00:08:22,560 of key value pass. 91 00:08:22,980 --> 00:08:25,680 Each key in the dictionary must be unique. 92 00:08:25,950 --> 00:08:28,350 When a key is added to the dictionary. 93 00:08:28,720 --> 00:08:32,950 The dictionary calculates its hash code using the get hash called method. 94 00:08:33,310 --> 00:08:39,130 It uses these hash codes to properly replace the value for the given key in the hash table. 95 00:08:39,160 --> 00:08:42,250 That is the underlying data structure of a dictionary. 96 00:08:42,790 --> 00:08:49,210 Dictionaries are a very common topic during the interview, so you can be asked some related questions. 97 00:08:49,510 --> 00:08:51,940 For example, what is a hash table? 98 00:08:52,360 --> 00:08:57,580 A hash table is a data structure that stores values in an array of corrections. 99 00:08:57,940 --> 00:09:01,840 The index in the area is calculated using the hash code. 100 00:09:02,260 --> 00:09:06,190 It allows quick retrieval of objects with given hash code. 101 00:09:06,430 --> 00:09:10,300 A hash table is the underlying data structure of the dictionary. 102 00:09:10,930 --> 00:09:16,540 Will the dictionary work correctly if we have hash code conflicts for two of its keys? 103 00:09:17,350 --> 00:09:17,890 Yes. 104 00:09:18,250 --> 00:09:25,060 The dictionary still can tell which key is which using the equals method, so it will not mistake them 105 00:09:25,150 --> 00:09:27,700 only because they have the same hash codes. 106 00:09:28,180 --> 00:09:33,430 Why should we override the equals method when we override the good housecoat method? 107 00:09:34,060 --> 00:09:40,810 Because the equals method is needed for the dictionary to distinguish two keys in case of the hash code 108 00:09:40,810 --> 00:09:41,500 conflict? 109 00:09:41,830 --> 00:09:47,560 And because of that, its implementation should be in line with the implementation of the hash code 110 00:09:47,560 --> 00:09:48,160 method. 111 00:09:48,400 --> 00:09:55,120 For example, if the hash code method returns the Social Security number for a person object, it means 112 00:09:55,120 --> 00:09:58,210 we consider this number the person's identifier. 113 00:09:58,600 --> 00:10:03,460 The equals method should also only compare the Social Security numbers. 114 00:10:04,030 --> 00:10:06,430 All right, that's it in this lecture. 115 00:10:06,700 --> 00:10:09,430 Thanks for watching and see you in the next one.