
ChatBedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock is serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

%pip install --upgrade --quiet  langchain-aws
Note: you may need to restart the kernel to use updated packages.
from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage
chat = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"temperature": 0.1},
)
messages = [
    HumanMessage(
        content="Translate this sentence from English to French. I love programming."
    )
]
chat.invoke(messages)
AIMessage(content="Voici la traduction en français :\n\nJ'aime la programmation.", additional_kwargs={'usage': {'prompt_tokens': 20, 'completion_tokens': 21, 'total_tokens': 41}}, response_metadata={'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0', 'usage': {'prompt_tokens': 20, 'completion_tokens': 21, 'total_tokens': 41}}, id='run-994f0362-0e50-4524-afad-3c4f5bb11328-0')
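The response above carries token counts under response_metadata['usage']. As a minimal sketch of reading them, the snippet below works on a plain dict copied from that output, so it runs without AWS credentials. The helper name summarize_usage is hypothetical, and the key names are assumed to match the output shown here; other langchain-aws versions may report usage differently.

```python
# Metadata copied from the example output above. With a live call this
# would come from the `response_metadata` attribute of the returned message.
response_metadata = {
    "model_id": "anthropic.claude-3-sonnet-20240229-v1:0",
    "usage": {"prompt_tokens": 20, "completion_tokens": 21, "total_tokens": 41},
}


def summarize_usage(metadata: dict) -> str:
    """Format token counts from a metadata dict shaped like the output above.

    Hypothetical helper for illustration; missing keys fall back to 0.
    """
    usage = metadata.get("usage", {})
    return (
        f"prompt={usage.get('prompt_tokens', 0)} "
        f"completion={usage.get('completion_tokens', 0)} "
        f"total={usage.get('total_tokens', 0)}"
    )


print(summarize_usage(response_metadata))
```

With a real model, the same helper could be called as summarize_usage(chat.invoke(messages).response_metadata), assuming the metadata keeps this shape.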

Streaming

To stream responses, you can use the runnable .stream() method.

for chunk in chat.stream(messages):
    print(chunk.content, end="", flush=True)
Voici la traduction en français :

J'aime la programmation.
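If the full text is also needed after streaming, the chunks can be accumulated while they are printed. The following is a sketch only: collect_stream is a hypothetical helper, the fake chunks stand in for what chat.stream(messages) yields, and it assumes each chunk's content is a plain string, as in the loop above.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk.content, end="", flush=True)
        parts.append(chunk.content)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


# Stand-in chunks (hypothetical) so the sketch runs without AWS credentials.
fake_chunks = [
    SimpleNamespace(content=text)
    for text in ["J'aime ", "la ", "programmation."]
]
full_text = collect_stream(fake_chunks)
```

Against a live model, the call would presumably be collect_stream(chat.stream(messages)).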

LLM Caching with OpenSearch Semantic Cache

Use OpenSearch as a semantic cache to cache prompts and responses and evaluate hits based on semantic similarity.

from langchain.globals import set_llm_cache
from langchain_aws import BedrockEmbeddings, ChatBedrock
from langchain_community.cache import OpenSearchSemanticCache
from langchain_core.messages import HumanMessage

bedrock_embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v1", region_name="us-east-1"
)

chat = ChatBedrock(
    model_id="anthropic.claude-3-haiku-20240307-v1:0", model_kwargs={"temperature": 0.5}
)

# Enable LLM cache. Make sure OpenSearch is set up and running. Update URL accordingly.
set_llm_cache(
    OpenSearchSemanticCache(
        opensearch_url="http://localhost:9200", embedding=bedrock_embeddings
    )
)
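To make the idea of a semantic hit concrete, here is a toy, in-memory sketch of similarity-based lookup. It is not how OpenSearchSemanticCache is implemented: the bag-of-words embed function, the stopword list, the 0.8 threshold, and the ToySemanticCache class are all assumptions chosen for illustration, whereas the real cache uses the configured Bedrock embeddings and OpenSearch vector search.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real embedding model needs nothing like this.
STOPWORDS = {"tell", "me", "about", "what", "is", "the", "a"}


def embed(text: str) -> Counter:
    """Toy embedding: counts of lowercase tokens, minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    """Returns a stored response when a new prompt is similar enough."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def update(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def lookup(self, prompt: str):
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = ToySemanticCache()
cache.update("tell me about Amazon Bedrock", "cached answer about Bedrock")
hit = cache.lookup("what is amazon bedrock")   # similar wording: cache hit
miss = cache.lookup("how do I bake bread")     # unrelated: no hit
```

The two prompts differ as strings, so an exact-match cache would miss; comparing embeddings is what lets the second question reuse the first answer.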
%%time
# The first time, it is not yet in cache, so it should take longer
messages = [HumanMessage(content="tell me about Amazon Bedrock")]
response_text = chat.invoke(messages)

print(response_text)
%%time
# The second time, while not a direct hit, the question is semantically similar to the original question,
# so it uses the cached result!

messages = [HumanMessage(content="what is amazon bedrock")]
response_text = chat.invoke(messages)

print(response_text)
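The %%time cell magic only works inside a notebook. Outside one, a small wrapper around time.perf_counter gives a comparable measurement. This is a sketch under stated assumptions: timed and fake_invoke are hypothetical names, and fake_invoke merely simulates a slow first call followed by a fast cached call so the example runs without Bedrock or OpenSearch.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


_answers = {}  # stand-in for the LLM cache


def fake_invoke(prompt: str) -> str:
    """Hypothetical stand-in for chat.invoke: slow on a miss, fast on a hit."""
    if prompt not in _answers:
        time.sleep(0.05)  # simulate model latency on a cache miss
        _answers[prompt] = f"answer to: {prompt}"
    return _answers[prompt]


first, t_first = timed(fake_invoke, "tell me about Amazon Bedrock")
second, t_second = timed(fake_invoke, "tell me about Amazon Bedrock")
print(f"first call: {t_first:.3f}s, second call: {t_second:.3f}s")
```

With the real objects, timed(chat.invoke, messages) could replace the stand-in calls.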
