Hi guys, in the previous video we used the RetrievalQA chain to ask questions against a vector store. That works well, but it has one disadvantage: it fails to preserve conversational history.

A common requirement for RAG, or retrieval-augmented generation, apps is support for follow-up questions. Follow-up questions can contain references to past chat history. In this video, I'll show you how to save the chat history so that you can ask follow-up questions.

I am importing ChatOpenAI to instantiate the LLM object. Instead of using the RetrievalQA chain, we'll use another chain called ConversationalRetrievalChain. I am importing it: from langchain.chains import ConversationalRetrievalChain. This chain is used to have a conversation based on the retrieved documents. I will also import the ConversationBufferMemory class, which acts as a buffer for storing conversation memory: from langchain.memory import ConversationBufferMemory.

I am creating the LLM object. I'll use gpt-4-turbo, but you can also use gpt-3.5-turbo if you are using a free OpenAI account. The model is gpt-4-turbo-preview, and the temperature equals 0.

In RAG systems, external data is retrieved and then passed to the LLM during the generation step. I am creating the retriever: retriever equals vector_store.as_retriever, and the arguments are search_type equals "similarity" and search_kwargs equals {"k": 5}. A retriever is a crucial component that helps LLMs find and access relevant information. It does this by searching for relevant data and retrieving it. In this example, the retriever will search by similarity and will retrieve the top k most similar chunks of data.

I am creating a memory object that will be passed to the conversational retrieval chain as an argument. The memory will be automatically updated with the questions and the answers: memory equals ConversationBufferMemory with memory_key equals "chat_history" and return_messages equals True. Memory is specifically designed to store and manage conversation history within the LangChain application. memory_key equals "chat_history" gives your memory a label; when retrieving or interacting with the stored conversation, you'll use the key chat_history.
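Putting the setup so far together, here is a minimal sketch. It assumes the vector_store variable built in the previous video already exists, and that the classic langchain package layout is in use (in newer releases, ChatOpenAI is imported from langchain_openai instead).

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# LLM object: gpt-4-turbo-preview with deterministic output.
llm = ChatOpenAI(model_name="gpt-4-turbo-preview", temperature=0)

# Retriever: similarity search returning the top 5 most similar chunks.
# vector_store is assumed to be the store created in the previous video.
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},
)

# Memory buffer that stores the conversation under the "chat_history" key.
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
)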
I am creating the conversational retrieval chain. The arguments are the LLM, retriever equals retriever, memory equals memory, chain_type equals "stuff", and verbose equals True. chain_type equals "stuff" means use all of the text from the retrieved documents. I am running the code.

Next, I will define a function called ask_question to make it easier to send the questions. The parameters will be q, for the question, and the chain. I am running the chain by calling chain.invoke. It takes a dict that contains the question as an argument and returns a result.

Just to ensure that the data is loaded, split into chunks, and embedded, I will copy and paste the code that does that from a previous cell.

I am sending the first question. It is the same question we asked before: how many pairs of questions and answers did the Stack Overflow dataset have? Then result equals ask_question of q and crc, the chain, and I am printing the result. I am running the code.

It is running the chain. Very well. The answer is in the document and will be returned: the Stack Overflow dataset had 8 million pairs of questions and answers.

Note that the result is a dictionary that contains three key-value pairs: the question sent by the user, the chat history, and the answer. If you only want the answer, use result["answer"].

Let's test if it remembers the last question. q equals "Multiply that number by 10", and result equals ask_question of q and crc. I am running it, and I am also printing the result. Note that it knew exactly what I was referring to: the result of multiplying the number of pairs of questions and answers in the dataset, which is 8 million, by 10 would be 80 million. Very well.

Let's try another one: "Divide the result by 80", and I am running it. The result of dividing 80 million by 80 is 1 million.
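Here is a sketch of the chain, the ask_question helper, and the questions sent in this part of the video. It is a minimal version that may differ slightly from the notebook; in particular, it assumes the chain is built with ConversationalRetrievalChain.from_llm and that the questions are worded as above.

# Conversational retrieval chain that combines the LLM, retriever, and memory.
crc = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    chain_type="stuff",  # "stuff" = pass all retrieved text to the LLM
    verbose=True,
)

def ask_question(q, chain):
    # The chain expects a dict with a "question" key; the chat history
    # is supplied automatically from the attached memory.
    result = chain.invoke({"question": q})
    return result

q = "How many pairs of questions and answers did the Stack Overflow dataset have?"
result = ask_question(q, crc)
print(result["answer"])

# Follow-up questions can refer back to earlier answers because the
# memory is updated after every call.
result = ask_question("Multiply that number by 10.", crc)
result = ask_question("Divide the result by 80.", crc)
print(result["answer"])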
To display the chat history, which contains all the questions and their answers, iterate over the content of the chat_history key, like this: for item in result["chat_history"], print item. You will see all the questions and the answers from the conversation.

That's it. In this video, you learned how to use the ConversationalRetrievalChain and ConversationBufferMemory classes to add memory to your RAG system. In the next video, we will dive deeper into it and see how to use a custom prompt with prompt templates.
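For reference, a sketch of that final loop, assuming result still holds the output of the last ask_question call:

# Each item is a HumanMessage or AIMessage because return_messages=True.
for item in result["chat_history"]:
    print(item)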