In this video, we'll go over caching in LangChain to boost performance and save costs.

What is caching? Caching is the practice of storing frequently accessed data or results in a temporary, faster storage layer. In the context of LangChain, caching helps optimize interactions with LLMs.

It matters because it reduces API calls and speeds up applications. When you repeatedly request the same completion from an LLM, caching ensures that the result is stored locally. Subsequent requests for the same input can then be served directly from the cache, reducing the number of expensive API calls to the LLM provider.

By avoiding redundant API calls, caching significantly speeds up your application. Whether you are building chatbots, content generators, or any other language-related tools, faster responses enhance the user experience.

Let's see how it is done.

LangChain provides an optional caching layer for LLMs, and there are two caching options: in-memory cache and SQLite cache.

Let's implement the in-memory cache. From langchain.globals, I am importing set_llm_cache, and from langchain_openai, I am importing OpenAI.

I am creating an LLM, and I will select on purpose a slower model for demonstration purposes.
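Before continuing with the LangChain code, the caching idea described earlier can be sketched in plain Python. This is a conceptual illustration only, not LangChain's implementation; the function and variable names are made up for the example:

```python
import time

_cache = {}  # prompt -> completion: the "temporary, faster storage layer"

def slow_completion(prompt):
    """Stand-in for an expensive LLM API call (illustrative name)."""
    time.sleep(0.1)  # simulate network latency
    return f"response to: {prompt}"

def cached_completion(prompt):
    # Repeated requests for the same input are served from the cache,
    # avoiding the expensive call entirely.
    if prompt not in _cache:
        _cache[prompt] = slow_completion(prompt)  # first request pays full cost
    return _cache[prompt]

start = time.perf_counter()
first = cached_completion("Tell me a joke")   # cache miss: slow
miss_time = time.perf_counter() - start

start = time.perf_counter()
second = cached_completion("Tell me a joke")  # cache hit: near-instant
hit_time = time.perf_counter() - start

print(first == second)       # True: same stored result
print(hit_time < miss_time)  # True: the hit skips the slow call
```

This is exactly the trade LangChain's cache makes for you: the first request is slow, and every identical request after it is close to free.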
llm = OpenAI(model_name="gpt-3.5-turbo-instruct").

To measure the response time of the model, use the %%time magic command, like this. The %%time magic command in a Jupyter notebook is used to measure the execution time of the code within the current cell.

Next, I will set up an in-memory cache. From langchain.cache, import the InMemoryCache class, and I am setting the in-memory cache by calling set_llm_cache with the constructor of InMemoryCache as an argument.

The prompt will be: "Tell me a joke that a toddler can understand."

I am making the first request: llm.invoke(prompt). I am running the code within this cell.

The request was not in the cache and took longer. The total CPU time was 31 milliseconds, and the wall time was almost 500 milliseconds.

Let's make the same request. This time, the response is already in the cache, and it will be served from there: llm.invoke(prompt).

Take a look at the difference. Now, the CPU time is 0 and the wall time is 1 millisecond. Amazing, isn't it?

Let's talk about SQLite caching.
If you want to use SQLite for caching, just import the SQLiteCache class and set the SQLite database path using set_llm_cache. To save time, I am just pasting the code, because it's really simple.

That's it. In this lecture, you learned about LangChain caching and how to implement it in memory or using SQLite to ensure efficiency, cost savings, and quick responses.