In this video, we'll go over caching in LangChain to boost performance and save costs.

What is caching? Caching is the practice of storing frequently accessed data or results in a temporary, faster storage layer. In the context of LangChain, caching helps optimize interactions with LLMs.

It matters because it reduces API calls and speeds up applications. When you repeatedly request the same completion from an LLM, caching ensures that the result is stored locally. Subsequent requests for the same input can then be served directly from the cache, reducing the number of expensive API calls to the LLM provider.

By avoiding redundant API calls, caching significantly speeds up your application. Whether you are building chatbots, content generators, or any other language-related tools, faster responses enhance the user experience.

Let's see how it is done.

LangChain provides an optional caching layer for LLMs, and there are two caching options: in-memory cache and SQLite cache.

Let's implement the in-memory cache. From langchain.globals, I am importing set_llm_cache, and from langchain_openai, I am importing OpenAI.

I am creating an LLM, and I will select on purpose a slower model for demonstration purposes.
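Before continuing with the LangChain code, the caching idea described earlier can be sketched in plain Python. This is a conceptual illustration only, not LangChain's implementation; the function and variable names are made up for the example:

```python
import time

_cache = {}  # prompt -> completion: the "temporary, faster storage layer"

def slow_completion(prompt):
    """Stand-in for an expensive LLM API call (illustrative name)."""
    time.sleep(0.1)  # simulate network latency
    return f"response to: {prompt}"

def cached_completion(prompt):
    # Repeated requests for the same input are served from the cache,
    # avoiding the expensive call entirely.
    if prompt not in _cache:
        _cache[prompt] = slow_completion(prompt)  # first request pays full cost
    return _cache[prompt]

start = time.perf_counter()
first = cached_completion("Tell me a joke")   # cache miss: slow
miss_time = time.perf_counter() - start

start = time.perf_counter()
second = cached_completion("Tell me a joke")  # cache hit: near-instant
hit_time = time.perf_counter() - start

print(first == second)       # True: same stored result
print(hit_time < miss_time)  # True: the hit skips the slow call
```

This is exactly the trade LangChain's cache makes for you: the first request is slow, and every identical request after it is close to free.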
llm = OpenAI(model_name="gpt-3.5-turbo-instruct").

To measure the response time of the model, use the %%time magic command, like this. The %%time magic command in a Jupyter notebook is used to measure the execution time of the code within the current cell.

Next, I will set up an in-memory cache. From langchain.cache, import the InMemoryCache class, and I am setting the in-memory cache by calling set_llm_cache with the constructor of InMemoryCache as an argument.

The prompt will be: "Tell me a joke that a toddler can understand."

I am making the first request: llm.invoke(prompt). I am running the code within this cell.

The request was not in the cache and took longer. The total CPU time was 31 milliseconds, and the wall time was almost 500 milliseconds.

Let's make the same request. This time, the response is already in the cache, and it will be served from there: llm.invoke(prompt).

Take a look at the difference. Now, the CPU time is 0 and the wall time is 1 millisecond. Amazing, isn't it?

Let's talk about SQLite caching.
If you want to use SQLite for caching, just import the SQLiteCache class and set the SQLite database path using set_llm_cache. To save time, I am just pasting the code, because it's really simple.

That's it. In this lecture, you learned about LangChain caching and how to implement it in memory or using SQLite to ensure efficiency, cost savings, and quick responses.