
LLMInputOutputAdapter.prepare_output - List out of Index error #28

Open
quissuiven opened this issue Apr 26, 2024 · 2 comments
@quissuiven

Hi, I'm currently using ChatBedrock(model_id = "anthropic.claude-3-sonnet-20240229-v1:0"). I'm implementing a reflection workflow for PII extraction, with one prompt for the extractor and one for the reflector.
The workflow makes three invocations: the first extracts PII from a resume, the second critiques the output, and the third refines the output based on the critique.

Currently, I'm hitting an IndexError (list index out of range) in several cases:

  • When I include Pydantic format instructions in the prompt
  • When I include this line in the 2nd prompt: If no modifications are necessary, respond with "Output looks correct. Please return the original output in the same format."
[Screenshot attached: 2024-04-26 at 3:59 PM]

This error did not appear when I was using langchain.llms.Bedrock. I presume it only happens for chat models, when the library tries to prepare the output as an AIMessage but fails to do so. Does anyone know how to resolve this issue?
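For what it's worth, the hypothesis can be illustrated in plain Python. If the Bedrock response body comes back with an empty content list (the function and field names below are assumptions for illustration, not the real LLMInputOutputAdapter code), an unguarded index raises exactly this error, and a small guard avoids it:

```python
import json

def prepare_output_sketch(raw_body: str) -> str:
    """Sketch of an unguarded index into the model response content."""
    body = json.loads(raw_body)
    # If the model returns an empty content list, content[0] raises IndexError.
    return body["content"][0]["text"]

def prepare_output_guarded(raw_body: str) -> str:
    """Same sketch with a defensive check for an empty completion."""
    body = json.loads(raw_body)
    content = body.get("content") or []
    if not content:
        return ""  # or raise a clearer, catchable error instead
    return content[0].get("text", "")
```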

@3coins
Collaborator

3coins commented May 23, 2024

@quissuiven
Is this still a problem? Can you share some sample code to reproduce?

@3coins 3coins added the bedrock label May 23, 2024
@quissuiven
Author

Hi @3coins, yes it's still a problem. Here's the sample code; I'm running it in SageMaker Studio:

!pip install -q langchain kaleido pypdf pydantic langchain-community langchain-core
!pip install -q langchain_aws 

!pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57" \
    "requests" \
    "defusedxml"
    
import boto3
import json
import time
from io import BytesIO
from datetime import datetime
import dateutil.parser
import os
import pypdf
import re
from langchain import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.llms import HuggingFacePipeline, Bedrock
from langchain.schema import BaseOutputParser, StrOutputParser
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser
from langchain.schema import OutputParserException, BaseOutputParser, StrOutputParser
from typing import List, Dict, Tuple
from langchain.schema.runnable import RunnablePassthrough, RunnableParallel, RunnableLambda
from langchain.pydantic_v1 import BaseModel, Field, validator
from langchain_aws import BedrockLLM, ChatBedrock

from langchain_core.prompts import MessagesPlaceholder
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)  

model = ChatBedrock(
    model_id = "anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"temperature": 0}
)

def extract_pii_entities_with_reflection(resume_text):
    #EXTRACTION
    system_prompt_pii_masking = """
        You are a specialist focused on extracting personal identifying information from resumes. 
        Your job is to extract all personally identifying information from a resume. You respond only in valid JSON format.

        Here is your task:
        1. Read the candidate's resume text.
        2. Extract all personally identifying information matching the following template definition:
            person_name (list all people names in the text)
            physical_address (Use your advanced geopolitical knowledge to list all physical addresses in the text. This refers to only full addresses and excludes cities, states and countries.)
            phone_number (list all phone numbers in the text)
            email_address (list all email addresses in the text)
            url (list all URLs in the text)
            date_of_birth (list all dates of birth in the text)
            personal_identification_id (list all personal identification id in the text)

        Only extract information from the text, do not make up any information.
        Put the output in <response></response> XML tags.
    """
    human_prompt_pii_masking = "Here is the resume text: {TEXT}"

    def clean_response(response_message):
        response_str = response_message.content
        final_str = response_str.replace('<response>','')
        final_str = final_str.replace('</response>','')
        return final_str

    extractor_messages = ChatPromptTemplate.from_messages([("system", system_prompt_pii_masking),
                                                    MessagesPlaceholder(variable_name="messages")])

    runnable_extraction = extractor_messages | model | RunnableLambda(clean_response)
    query = human_prompt_pii_masking.format(TEXT=resume_text)
    request = HumanMessage(content = query)
    result_dict_extraction = runnable_extraction.invoke({"messages":[request]})

    #REFLECTION
    reflection_prompt = """
    You are tasked with evaluating personally identifying information extracted from a text. Here are your responsibilities:
    - Check all relevant personally identifying information have been extracted
    - All extracted information are present in the original text
    
    Your Feedback Protocol:
    - If suggesting modifications, include the specific segment and your recommendations.
    - If no modifications are necessary, respond with "Output looks correct. Please return the original output in the same format."
    """
    reflector_messages = ChatPromptTemplate.from_messages(
        [("system",reflection_prompt),
        MessagesPlaceholder(variable_name="messages")]
    )

    runnable_reflection = reflector_messages | model
    human_prompt_reflection = human_prompt_pii_masking.format(TEXT=resume_text)
    result_reflection = runnable_reflection.invoke({"messages": [HumanMessage(content = human_prompt_reflection), AIMessage(content = str(eval(result_dict_extraction)))]})
    
    #REFINED EXTRACTION
    message_1 = HumanMessage(content = human_prompt_reflection)
    message_2 = AIMessage(content = str(eval(result_dict_extraction)))
    message_3 = HumanMessage(content = result_reflection.content)
    return runnable_extraction.invoke({"messages":[message_1, message_2, message_3]})

results_list_with_reflection = []
for index, resume_text in enumerate(resume_extracted_list):         
    print(f"Performing extraction for Resume {index+1}")
    results_dict_reflection = extract_pii_entities_with_reflection(resume_text)
    print(results_dict_reflection)
    print("\n")
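Until the adapter guards against empty completions, a caller-side workaround is to retry the invocation when the IndexError surfaces. This is a hypothetical helper (not part of langchain-aws), shown here with a generic callable so it works with any runnable's invoke:

```python
import time

def invoke_with_retry(invoke_fn, payload, retries=3, delay=1.0):
    """Hypothetical retry wrapper: retries when the underlying call
    raises IndexError on an empty model response."""
    last_exc = None
    for _ in range(retries):
        try:
            return invoke_fn(payload)
        except IndexError as exc:
            last_exc = exc
            time.sleep(delay)
    raise last_exc
```

For example, `invoke_with_retry(runnable_extraction.invoke, {"messages": [request]})` would retry the extraction step up to three times before giving up.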
