[RetrievalAgent] Add base retrieval agent #258

Open
wants to merge 31 commits into
base: develop

Collaborator

@w5688414 w5688414 commented Jan 2, 2024

TODO

BFS method (base version)

  • Text summarization to compress the text retrieved for each sub-query
  • Few-shot retriever for planning
  • Summarization retriever for context planning
  • Unit tests

Graph DFS method (advanced version)

codecov-commenter commented Jan 3, 2024

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison is base (6fbc099) 69.37% compared to head (db39621) 70.09%.
Report is 1 commit behind head on develop.

Files Patch % Lines
...agent/src/erniebot_agent/agents/retrieval_agent.py 87.61% 13 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #258      +/-   ##
===========================================
+ Coverage    69.37%   70.09%   +0.71%     
===========================================
  Files           63       64       +1     
  Lines         3236     3347     +111     
===========================================
+ Hits          2245     2346     +101     
- Misses         991     1001      +10     


@w5688414 w5688414 marked this pull request as ready for review January 3, 2024 05:16
Comment on lines 10 to 12
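QUERY_DECOMPOSITION = """Decompose the question below into sub-questions. Each sub-question must be simple enough. Requirements:
1. Output strictly in [JSON format]: {'sub-question 1': 'specific sub-question 1', 'sub-question 2': 'specific sub-question 2'}
Question: {{prompt}} Sub-questions:"""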
Collaborator

Doesn't this prompt need a few few-shot examples?

Collaborator Author

I added a few-shot retriever, which is more general.
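
The decomposition prompt above asks for a dict of sub-questions, but its example uses single quotes, which a strict JSON parser rejects. The following is a minimal sketch of tolerant parsing, assuming the model replies with a flat dict somewhere in its output. `parse_sub_queries` is a hypothetical helper for illustration and is not part of this PR.

```python
import ast
import json
import re
from typing import List


def parse_sub_queries(llm_output: str) -> List[str]:
    """Extract sub-questions from a dict-shaped model reply (hypothetical helper)."""
    # Take the span from the first "{" to the last "}" so surrounding text is ignored.
    match = re.search(r"\{.*\}", llm_output, flags=re.DOTALL)
    if match is None:
        return []
    raw = match.group(0)
    try:
        parsed = json.loads(raw)  # strict JSON with double quotes
    except json.JSONDecodeError:
        try:
            parsed = ast.literal_eval(raw)  # Python-style single-quoted dict
        except (ValueError, SyntaxError):
            return []
    if not isinstance(parsed, dict):
        return []
    # Only the values (the sub-questions themselves) are needed downstream.
    return [str(value) for value in parsed.values()]
```

Falling back to `ast.literal_eval` keeps the parser safe (no arbitrary code execution) while accepting the single-quoted form the prompt itself demonstrates.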

self.use_extractor = use_extractor
self.extractor = PromptTemplate(CONTENT_COMPRESSOR, input_variables=["context", "query"])

async def _run(self, prompt: str, files: Optional[List[File]] = None) -> AgentResponse:
Collaborator

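  1. Why isn't this written inside _run?
  2. Logging needs to be added, and the steps convention must be followed; otherwise the final response will not carry enough information.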

Collaborator Author

I added logger output inside run_llm.

@w5688414 w5688414 self-assigned this Jan 8, 2024
Comment on lines +127 to +128
else:
    steps_input = HumanMessage(content=self.query_transform.format(query=prompt))
Collaborator

When would execution reach this final else branch?

Collaborator Author

In the zero-shot case, when sub-query decomposition relies entirely on the LLM's own capability.
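
The choice between the few-shot path and this zero-shot fallback can be sketched in isolation. This is an illustrative sketch only: `build_decomposition_prompt`, `INSTRUCTION`, and the plain `str` templates are assumptions, not the PR's `PromptTemplate`-based code.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction text; the PR's real prompts differ.
INSTRUCTION = "Decompose the question into simple sub-questions."


def build_decomposition_prompt(
    query: str, examples: Optional[List[Tuple[str, str]]] = None
) -> str:
    """Return a few-shot prompt when examples exist, else a zero-shot one."""
    parts = [INSTRUCTION]
    if examples:
        # Few-shot: prepend retrieved (question, decomposition) pairs.
        for question, decomposition in examples:
            parts.append(f"Question: {question}\nSub-questions: {decomposition}")
    # With no examples this is the zero-shot case, which relies on the
    # model's own ability to decompose the query.
    parts.append(f"Question: {query}\nSub-questions:")
    return "\n".join(parts)
```

Treating an empty example list the same as `None` means a few-shot retriever that finds nothing relevant degrades to the zero-shot prompt.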

"""


class FaissFewShotSearch:
Collaborator

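FaissFewShotSearch and FaissAbstractSearch both start with "Faiss", but I don't see anything tying them to Faiss. The only related piece is the self.db.similarity_search_with_relevance_scores call, and any vector DB implementation should provide that function, so it isn't specific to Faiss either.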

Collaborator Author

Fixed. The similarity_search_with_relevance_scores function is specific to LangChain.
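
One way to make the class independent of any particular backend is to depend only on the scored-search method. The sketch below assumes the store follows the signature of LangChain's `VectorStore.similarity_search_with_relevance_scores`. The names `ScoredSearchStore` and `FewShotSearch` and the `threshold` parameter are hypothetical and may not match what the PR ended up with.

```python
from typing import Any, List, Protocol, Tuple


class ScoredSearchStore(Protocol):
    """Any store exposing a LangChain-style scored search (assumed interface)."""

    def similarity_search_with_relevance_scores(
        self, query: str, k: int = 4, **kwargs: Any
    ) -> List[Tuple[Any, float]]:
        ...


class FewShotSearch:
    """Backend-agnostic few-shot lookup; the class name is illustrative."""

    def __init__(self, db: ScoredSearchStore, threshold: float = 0.5) -> None:
        self.db = db
        self.threshold = threshold

    def search(self, query: str, top_k: int = 3) -> List[Any]:
        hits = self.db.similarity_search_with_relevance_scores(query, k=top_k)
        # Keep only documents whose relevance score clears the threshold.
        return [doc for doc, score in hits if score >= self.threshold]
```

Because `ScoredSearchStore` is a structural `Protocol`, a Faiss-backed store, another LangChain vector store, or a test double can be passed in without inheritance.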

Comment on lines 95 to 99
@dataclass
class RetrievalStep(AgentStep):
    """A step taken by an agent."""

    name: str
Collaborator

If the only goal is to add a name field, there is no need to create a new class. Just put the name into info, which is a dict.

Collaborator Author

@w5688414 w5688414 Jan 10, 2024

Fixed, but I have a question: could we give every step an alias as a default attribute and expose it separately?
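
The reviewer's suggestion of keeping the name in `info` can be sketched as follows. `AgentStep` here is a simplified stand-in whose fields are assumptions, not the real erniebot_agent class. `make_retrieval_step` and `step_name` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class AgentStep:
    """Stand-in for the real AgentStep; the field names here are assumptions."""

    info: Dict[str, Any] = field(default_factory=dict)
    result: Any = None


def make_retrieval_step(name: str, result: Any) -> AgentStep:
    # Store the step name in the existing info dict instead of subclassing.
    return AgentStep(info={"name": name}, result=result)


def step_name(step: AgentStep, default: str = "unnamed") -> str:
    # Read the alias back without requiring a dedicated attribute.
    return str(step.info.get("name", default))
```

An accessor like `step_name` gives callers the "alias" the author asks about while leaving the step class itself unchanged.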

@w5688414 w5688414 changed the title from "[RetrievalAgent] Add retrieval agent" to "[RetrievalAgent] Add base retrieval agent" Jan 12, 2024
erniebot-agent/src/erniebot_agent/tools/baizhong_tool.py Outdated


class LangChainRetrievalTool(Tool):
    description: str = "Retrieve passages from the knowledge base that are relevant to the user's input query"
Collaborator

You haven't set InputView and OutputView here.

Collaborator Author

@w5688414 w5688414 Jan 16, 2024

LangChain has a metadata field, a Dict used to store metadata, which is hard to express as a pydantic model.

4 participants