continue.dev integration WIP #89
Conversation
Walkthrough

The updates bring significant enhancements across various modules, focusing on document processing, graph topology, category search, and evaluation metrics. These changes include new imports, function modifications, attribute initialization, and logic improvements for categorizing relevant categories. The system's capabilities in document processing, search functionality, and evaluation have been enriched.
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 10531072 | Triggered | Generic High Entropy Secret | fec7010 | cognee/utils.py |
| 10593973 | Triggered | Generic High Entropy Secret | 79311ee | notebooks/full_run.ipynb |
| 10593973 | Triggered | Generic High Entropy Secret | 8e850c1 | notebooks/full_run.ipynb |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Actionable comments posted: 2
Out of diff range and nitpick comments (1)
notebooks/full_run.ipynb (1)

Line 81: Re-importing `SourceCodeGraph` and reconfiguring the graph model is redundant. Consider streamlining the notebook by removing repeated code to enhance readability and maintainability.
```python
'''
Given a string, find the length of the longest substring without repeating characters.

Examples:

Given "abcabcbb", the answer is "abc", which the length is 3.

Given "bbbbb", the answer is "b", with the length of 1.

Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.
'''
```
Ensure proper pairing of triple quotes in the comments for clarity.
- '''
+ """
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
```python
"""
Given a string, find the length of the longest substring without repeating characters.
Examples:
Given "abcabcbb", the answer is "abc", which the length is 3.
Given "bbbbb", the answer is "b", with the length of 1.
Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.
"""
```
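As an aside to the docstring under review: it describes the classic sliding-window exercise. A minimal sketch of a solution (not part of this PR; the function name is mine):

```python
def length_of_longest_substring(s: str) -> int:
    """Sliding-window scan: last_seen maps each character to the index of its
    most recent occurrence, start marks the left edge of the current window of
    unique characters."""
    last_seen = {}
    start = 0
    best = 0
    for i, ch in enumerate(s):
        # If ch already occurs inside the current window, slide the left edge
        # just past its previous occurrence.
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best

print(length_of_longest_substring("abcabcbb"))  # 3 ("abc")
print(length_of_longest_substring("bbbbb"))     # 1 ("b")
print(length_of_longest_substring("pwwkew"))    # 3 ("wke")
```

The scan is O(n): each index advances `start` at most once, matching the examples in the docstring above.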
  type_node_id = generate_node_id(node.type)
  type_node = await graph_client.extract_node(type_node_id)

- if not entity_type_node:
-     node_name = node.entity_type.lower().capitalize()
+ if not type_node:
+     node_name = node.type.lower().capitalize()

-     entity_type_node = (
-         entity_type_node_id,
+     type_node = (
+         type_node_id,
          dict(
-             id = entity_type_node_id,
+             id = type_node_id,
              name = node_name,
-             entity_type = node_name,
+             type = node_name,
The creation of type nodes with standardized 'type' and 'name' properties aligns with the PR's objectives. Consider using a more precise timestamp format or a timezone-aware datetime object for 'created_at' and 'updated_at' to avoid potential issues with time zone differences.
- created_at = datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
- updated_at = datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
+ from datetime import datetime, timezone
+ created_at = datetime.now(timezone.utc).isoformat(),
+ updated_at = datetime.now(timezone.utc).isoformat(),
```python
type_node_id = generate_node_id(node.type)
type_node = await graph_client.extract_node(type_node_id)

if not type_node:
    node_name = node.type.lower().capitalize()

    type_node = (
        type_node_id,
        dict(
            id = type_node_id,
            name = node_name,
            type = node_name,
            created_at = datetime.now(timezone.utc).isoformat(),
            updated_at = datetime.now(timezone.utc).isoformat(),
```
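For context on the timestamp suggestion: a naive `strftime` string drops the UTC offset, while `datetime.now(timezone.utc).isoformat()` yields an ISO 8601 value that round-trips unambiguously. A small standalone illustration:

```python
from datetime import datetime, timezone

# Naive local timestamp: no offset information, ambiguous across time zones.
naive = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# Timezone-aware UTC timestamp in ISO 8601; the +00:00 offset travels with it.
aware = datetime.now(timezone.utc).isoformat()

# Aware values round-trip without losing the zone.
parsed = datetime.fromisoformat(aware)
assert parsed.tzinfo is not None
assert parsed.utcoffset().total_seconds() == 0

# The naive string parses back to a datetime with no tzinfo at all.
assert datetime.strptime(naive, "%Y-%m-%d %H:%M:%S").tzinfo is None
```

Sorting and comparing aware timestamps across machines in different zones then works without guessing what "now" meant.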
Actionable comments posted: 5

Out of diff range and nitpick comments (2)

cognee/modules/search/graph/search_summary.py (1)

Line range hint 18-49: The function `search_summary` is well-designed to handle different graph engines and provides detailed documentation. However, ensure that the f-string in the Neo4j query section is correctly used or removed if not needed.

- cypher_query = f"""
+ cypher_query = """
  MATCH (n)
  WHERE n.id CONTAINS $query AND EXISTS(n.summary)
  RETURN n.id AS nodeId, n.summary AS summary
  """

cognee/infrastructure/databases/graph/networkx/adapter.py (1)

Line range hint 134-134: Remove the unused variable `error` in the exception handling block to clean up the code.

- except Exception as error:
+ except Exception:
      logger.error("Failed to load graph from file: %s", file_path)
      # Initialize an empty graph in case of error
      self.graph = nx.MultiDiGraph()
      await self.save_graph_to_file(file_path)
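On the f-string point above: the `$query` placeholder is resolved by the Neo4j driver at execution time, not by Python, so the literal needs no `f` prefix. A driver-free sketch of the distinction (the commented `session.run` call is illustrative, not from this PR):

```python
# Parameterized Cypher: the string contains a driver-side placeholder, so it
# is a plain literal; Python never substitutes anything into it.
cypher_query = """
MATCH (n)
WHERE n.id CONTAINS $query AND EXISTS(n.summary)
RETURN n.id AS nodeId, n.summary AS summary
"""

# With a real driver session the value would be bound at run time, e.g.:
# session.run(cypher_query, query="search term")

# The placeholder survives untouched in the string; there are no Python
# {braces} for an f-string to fill, so the f prefix is dead weight.
assert "$query" in cypher_query
assert "{" not in cypher_query
```

Binding via driver parameters (rather than interpolating user input into the string) also keeps the query safe from injection.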
cognee/api/v1/cognify/cognify.py (outdated)

  from cognee.utils import render_graph

- await render_graph(graph, include_color=True, include_nodes=True, include_size=True)
+ await render_graph(graph, include_color=True, include_nodes=False, include_size=False)
Undefined variable `graph` used in `render_graph`.
- await render_graph(graph, include_color=True, include_nodes=False, include_size=False)
+ # Ensure that 'graph' is defined before this line or pass the correct graph object.
+ # await render_graph(graph, include_color=True, include_nodes=False, include_size=False)
```python
import logging
from typing import List, Dict
from cognee.infrastructure import infrastructure_config
from.extraction.categorize_relevant_summary import categorize_relevant_summary
```
Ensure the import statement uses correct relative path syntax. It seems there might be a typo with the dot (`.`) before `extraction`.
- from.extraction.categorize_relevant_summary import categorize_relevant_summary
+ from extraction.categorize_relevant_summary import categorize_relevant_summary
  """ Search categories in the graph and return their summary attributes. """

- from cognee.shared.data_models import GraphDBType
+ from cognee.shared.data_models import GraphDBType, DefaultContentPrediction
Move the import statement for `networkx as nx` to the top of the file to follow Python's convention of grouping all imports at the beginning.
- import networkx as nx
+ # Moved to top
notebooks/full_run.ipynb (outdated)

```json
},
{
  "ename": "CatalogException",
  "evalue": "Catalog Error: Table with name file_metadata does not exist!\nDid you mean \"code_staging.file_metadata\"?",
```
Catalog Error: Verify the table name `file_metadata`.
- train_dataset = cognee.datasets.query_data("short_stories")
+ # Ensure the table name is correct or exists in the database configuration.
+ # train_dataset = cognee.datasets.query_data("short_stories")
notebooks/full_run.ipynb (outdated)

```json
"metadata": {},
"outputs": [
{
  "ename": "NameError",
```
NameError: `results` is not defined.
- descriptions = {node: graph.nodes[node].get('summary', 'No summary available') for node in results}
+ # Ensure that 'results' is defined before this line.
+ # descriptions = {node: graph.nodes[node].get('summary', 'No summary available') for node in results}
Actionable comments posted: 5
Out of diff range and nitpick comments (2)
cognee/shared/data_models.py (1)

Line 11: Update the `type` attribute documentation to reflect its purpose more clearly.

cognee/modules/cognify/graph/create.py (1)

Line range hint 183-183: The local variable `relationship_data` is assigned but never used. This could be a leftover from a previous version of the code. Consider removing it or implementing its intended use.

- relationship_data = {}
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (11)
- cognee/api/v1/cognify/cognify.py (4 hunks)
- cognee/api/v1/config/config.py (1 hunks)
- cognee/config.py (2 hunks)
- cognee/infrastructure/InfrastructureConfig.py (4 hunks)
- cognee/infrastructure/data/chunking/LangchainChunkingEngine.py (1 hunks)
- cognee/infrastructure/files/utils/get_file_metadata.py (1 hunks)
- cognee/modules/cognify/graph/create.py (2 hunks)
- cognee/modules/cognify/graph/initialize_graph.py (1 hunks)
- cognee/shared/GithubTopology.py (1 hunks)
- cognee/shared/data_models.py (7 hunks)
- cognee/utils.py (2 hunks)
Files not reviewed due to errors (6)
- cognee/shared/GithubTopology.py (no review received)
- cognee/modules/cognify/graph/initialize_graph.py (no review received)
- cognee/infrastructure/data/chunking/LangchainChunkingEngine.py (no review received)
- cognee/api/v1/config/config.py (no review received)
- cognee/config.py (no review received)
- cognee/api/v1/cognify/cognify.py (no review received)
Additional Context Used
Ruff (4)

cognee/api/v1/cognify/cognify.py (1)
- 80-80: Local variable `graph_topology` is assigned to but never used

cognee/infrastructure/files/utils/get_file_metadata.py (1)
- 24-24: Do not use bare `except`

cognee/modules/cognify/graph/create.py (1)
- 183-183: Local variable `relationship_data` is assigned to but never used

cognee/shared/GithubTopology.py (1)
- 21-21: Undefined name `Directory`

Additional comments not posted (2)
cognee/infrastructure/InfrastructureConfig.py (1)

Line 36: The implementation of `InfrastructureConfig` for managing system configurations is robust and well-structured.

cognee/shared/data_models.py (1)

Lines 252-254: Ensure that the `ResponseSummaryModel` class is integrated properly in the system, especially where document IDs and response summaries are handled.

Verification successful

The review of the `ResponseSummaryModel` integration within the system shows that the model is used in various parts of the system, particularly in functions that generate structured outputs from large language models. The `acreate_structured_output` and `model_dump` functions are designed to utilize the model in a generic way, which suggests that the handling of specific fields like `document_id` and `response_summary` would depend on the implementation of the `ResponseSummaryModel` itself.

Given the abstract nature of these functions and the lack of specific errors or misconfigurations in the script outputs, it appears that the integration is as expected. However, for a complete verification, a detailed review of how `ResponseSummaryModel` is structured and its interaction with these functions would be necessary.

- `cognee/modules/search/graph/search_summary.py`: Uses `ResponseSummaryModel` in `categorize_relevant_summary`.
- `cognee/infrastructure/llm/`: Contains implementations of `acreate_structured_output` that use `ResponseSummaryModel`.

Scripts Executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the integration of `ResponseSummaryModel` in the system.
# Test: Search for the usage of `ResponseSummaryModel`. Expect: Proper handling of document IDs and response summaries.
rg --type py 'ResponseSummaryModel'
```

Length of output: 397

```shell
#!/bin/bash
# Description: Inspect the usage of `ResponseSummaryModel` in the function `categorize_relevant_summary`.
# Test: Search for the implementation of `categorize_relevant_summary` to see how it handles `document_id` and `response_summary`.
rg --type py 'def categorize_relevant_summary' -A 20
```

Length of output: 1388

```shell
#!/bin/bash
# Description: Further inspect the usage of `ResponseSummaryModel` within `acreate_structured_output` and `model_dump`.
# Test: Search for the implementation of `acreate_structured_output` and `model_dump` to see how they handle `document_id` and `response_summary`.
rg --type py 'acreate_structured_output' -A 10
rg --type py 'model_dump' -A 10
```

Length of output: 13097
```python
except:
    keywords = ["no keywords detected" + str(uuid.uuid4())]
```
Specify the exception type in the `except` block to avoid masking unexpected errors.
- except:
+ except Exception as e:
```python
except Exception as e:
    keywords = ["no keywords detected" + str(uuid.uuid4())]
```
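To illustrate the suggestion: catching a narrow set of exception types keeps the keyword fallback while letting genuinely unexpected errors surface. A self-contained sketch (`run_keyword_extraction` is a hypothetical stand-in for the real extractor, not code from this PR):

```python
import uuid
import logging

logger = logging.getLogger(__name__)

def run_keyword_extraction(text: str) -> list[str]:
    # Toy extractor: unique lowercased words, sorted; rejects empty input.
    if not text.strip():
        raise ValueError("empty input")
    return sorted(set(text.lower().split()))

def extract_keywords(text: str) -> list[str]:
    try:
        return run_keyword_extraction(text)
    except (ValueError, KeyError) as error:
        # Narrow except: expected failures get the sentinel fallback, while
        # unrelated bugs and KeyboardInterrupt still propagate instead of
        # being silently masked by a bare `except:`.
        logger.warning("Keyword extraction failed: %s", error)
        return ["no keywords detected" + str(uuid.uuid4())]

print(extract_keywords("Graph topology search"))   # ['graph', 'search', 'topology']
print(extract_keywords("   ")[0][:20])             # sentinel fallback path
```

The sentinel list keeps the same shape as a successful result, so downstream code needs no special casing.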
  @@ -36,13 +34,13 @@ class ChunkStrategy(Enum):
      EXACT = "exact"
      PARAGRAPH = "paragraph"
      SENTENCE = "sentence"
+     CODE = "code"
Tip: Codebase Verification

Several files in the codebase do not handle the new `CODE` value in the `ChunkStrategy` enum:

- `DefaultChunkEngine.py`: This file handles the `PARAGRAPH`, `SENTENCE`, and `EXACT` values but does not include a case for `CODE`.

Analysis chain

Addition of `CODE` to the `ChunkStrategy` enum. Ensure that all usages of this enum are updated to handle this new value.
Scripts Executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify all usages of `ChunkStrategy` enum handle the new `CODE` value.
# Test: Search for the enum usage. Expect: Handling of the new value in all occurrences.
rg --type py 'ChunkStrategy'
```

Length of output: 911
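The verification above flags enum members that fall through silently. One defensive pattern is a dispatcher with an explicit exhaustiveness guard, sketched here with a toy splitter (the real `DefaultChunkEngine` rules are more involved; these splitting rules are illustrative only):

```python
from enum import Enum

class ChunkStrategy(Enum):
    EXACT = "exact"
    PARAGRAPH = "paragraph"
    SENTENCE = "sentence"
    CODE = "code"

def chunk_text(strategy: ChunkStrategy, text: str) -> list[str]:
    """Toy dispatcher that covers every enum member explicitly."""
    if strategy is ChunkStrategy.PARAGRAPH:
        return [p for p in text.split("\n\n") if p]
    if strategy is ChunkStrategy.SENTENCE:
        return [s.strip() for s in text.split(".") if s.strip()]
    if strategy is ChunkStrategy.CODE:
        # Placeholder rule: split on blank lines, keeping block order.
        return [block for block in text.split("\n\n") if block]
    if strategy is ChunkStrategy.EXACT:
        return [text]
    # Exhaustiveness guard: a member added without a branch fails loudly
    # here instead of silently returning nothing.
    raise ValueError(f"Unhandled chunk strategy: {strategy}")

print(chunk_text(ChunkStrategy.CODE, "def f():\n    pass\n\ndef g():\n    pass"))
```

With this shape, adding a fifth member without a branch raises immediately, which is exactly the gap the codebase verification found in `DefaultChunkEngine.py`.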
```python
print("Telemetry is disabled.")
logging.info("Telemetry is disabled.")
```
Logging and printing the same message about telemetry being disabled is redundant. Consider removing the print statement to clean up the console output.
- print("Telemetry is disabled.")
  @@ -44,6 +44,7 @@ async def add_node(client, parent_id: Optional[str], node_id: str, node_data: di

  # Add an edge if a parent ID is provided and the graph engine is NETWORKX
  if parent_id and "default_relationship" in node_data and infrastructure_config.get_config()["graph_engine"] == GraphDBType.NETWORKX:
+     print("Node id", node_id)
Logging the node ID during the edge addition process might not be necessary and could clutter the log files. Consider removing this print statement unless it is essential for debugging.
- print("Node id", node_id)
  @@ -103,6 +104,7 @@ async def add_node(client, parent_id: Optional[str], node_id: str, node_data: di

  async def add_edge(client, parent_id: Optional[str], node_id: str, node_data: dict, created_node_ids):
+     print('NODE ID', node_data)
The print statement here seems to be for debugging purposes. It's a good practice to use a logging framework instead of print statements for better control over logging levels and outputs.
- print('NODE ID', node_data)
+ logging.debug('NODE ID: %s', node_data)
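As the comment notes, a logger gives level control that `print` lacks. A minimal sketch (the logger name and format here are illustrative, not taken from the codebase):

```python
import logging

# One-time application-level configuration; modules just grab named loggers.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("cognify.graph.create")

node_data = {"id": "node-1", "type": "Entity"}

# Debug-level message replaces the ad-hoc print; %-style lazy formatting
# skips building the string entirely when DEBUG is disabled.
logger.debug("NODE ID: %s", node_data)

# Raising the threshold silences debug chatter with no code changes at the
# call sites, which print statements cannot do.
logger.setLevel(logging.WARNING)
logger.debug("this message is suppressed")
```

This is why swapping `print('NODE ID', node_data)` for `logger.debug(...)` keeps the diagnostic available in development without cluttering production logs.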
Actionable comments posted: 3
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- cognee/api/v1/cognify/cognify.py (4 hunks)
- cognee/infrastructure/llm/prompts/extract_topology.txt (1 hunks)
- cognee/modules/topology/extraction/extract_topology.py (1 hunks)
- cognee/modules/topology/infer_data_topology.py (1 hunks)
- cognee/modules/topology/topology.py (1 hunks)
Files skipped from review due to trivial changes (1)
- cognee/infrastructure/llm/prompts/extract_topology.txt
Additional Context Used

Ruff (3)

cognee/api/v1/cognify/cognify.py (1)
- 88-88: Local variable `graph_topology` is assigned to but never used

cognee/modules/topology/topology.py (2)
- 7-7: Redefinition of unused `BaseModel` from line 4
- 8-8: Redefinition of unused `Dict` from line 5

Additional comments not posted (4)

cognee/modules/topology/extraction/extract_topology.py (1)
- 7-14: The implementation of the `extract_categories` function looks clean and efficient.

cognee/modules/topology/infer_data_topology.py (1)
- 9-19: The implementation of the `infer_data_topology` function is robust, handling default configurations and exceptions effectively.

cognee/modules/topology/topology.py (1)
- 71-120: The `TopologyEngine` class and its methods are well-implemented, providing clear functionality for managing repository topology.

cognee/api/v1/cognify/cognify.py (1)
- Line range hint 17-201: The `cognify` function and the main test function are well-implemented, covering a comprehensive range of functionalities for processing and graph operations.
```python
from pydantic import BaseModel, create_model
from typing import Dict, Type, Any

from pydantic import BaseModel, Field
```
Redefinition of `BaseModel` from line 4 is unnecessary and can be removed to avoid confusion.
- from pydantic import BaseModel, Field
```python
from typing import Dict, Type, Any

from pydantic import BaseModel, Field
from typing import Dict, List, Optional, Union
```
Redefinition of `Dict` from line 5 is unnecessary and can be removed to avoid confusion.
- from typing import Dict, List, Optional, Union
cognee/api/v1/cognify/cognify.py (outdated)

```python
graph_topology = infrastructure_config.get_config()["graph_topology"]
```
The variable `graph_topology` is assigned but never used and could be removed to clean up the code.
- graph_topology = infrastructure_config.get_config()["graph_topology"]
Actionable comments posted: 6
Out of diff range and nitpick comments (4)
cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)

Line range hint 6-6: Remove duplicate import of `TextEmbedding`.

- from fastembed import TextEmbedding

Line range hint 5-5: Remove duplicate import of `AsyncOpenAI`.

- from openai import AsyncOpenAI

cognee/modules/cognify/graph/add_label_nodes.py (1)

Line range hint 8-8: Remove the unused variable `vector_client`.

- vector_client = infrastructure_config.get_config("vector_engine")

cognee/modules/cognify/graph/create.py (1)

Line range hint 219-219: Remove the unused variable `relationship_data`.

- relationship_data = {}
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (2)
- `poetry.lock` is excluded by `!**/*.lock`
- `pyproject.toml` is excluded by `!**/*.toml`
- cognee/api/v1/cognify/cognify.py (5 hunks)
- cognee/api/v1/topology/add_topology.py (1 hunks)
- cognee/infrastructure/data/chunking/DefaultChunkEngine.py (1 hunks)
- cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3 hunks)
- cognee/infrastructure/llm/prompts/extract_topology.txt (1 hunks)
- cognee/modules/cognify/graph/add_label_nodes.py (2 hunks)
- cognee/modules/cognify/graph/create.py (4 hunks)
- cognee/modules/topology/topology.py (1 hunks)
Files skipped from review due to trivial changes (1)
- cognee/infrastructure/data/chunking/DefaultChunkEngine.py
Additional Context Used
LanguageTool (2)
cognee/infrastructure/llm/prompts/extract_topology.txt (2)
Near line 2: Inconsistent capitalization in contraction.
Context: ... the text provided to you. Relationship part'S can't be empty, and have to be logical ...
Near line 2: This verb does not appear to agree with the subject. Consider using a different form.
Context: ...Relationship part'S can't be empty, and have to be logical AND CONNECTING ELEMENTS O...
Ruff (11)

cognee/api/v1/cognify/cognify.py (1)
- 121-121: Local variable `graph_topology` is assigned to but never used

cognee/api/v1/topology/add_topology.py (2)
- 16-16: Local variable `graph_topology` is assigned to but never used
- 80-80: Local variable `graph` is assigned to but never used

cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)
- 7-7: Redefinition of unused `TextEmbedding` from line 6
- 8-8: Redefinition of unused `AsyncOpenAI` from line 5

cognee/modules/cognify/graph/add_label_nodes.py (1)
- 8-8: Local variable `vector_client` is assigned to but never used

cognee/modules/cognify/graph/create.py (3)
- 159-159: Local variable `created_node_id` is assigned to but never used
- 219-219: Local variable `relationship_data` is assigned to but never used
- 254-254: Local variable `ids` is assigned to but never used

cognee/modules/topology/topology.py (2)
- 7-7: Redefinition of unused `BaseModel` from line 4
- 8-8: Redefinition of unused `Dict` from line 5

Additional comments not posted (2)

cognee/modules/topology/topology.py (2)
- 7-7: Remove duplicate import of `BaseModel`.
- 8-8: Remove duplicate import of `Dict`.
  @@ -0,0 +1,4 @@
+ You are a topology master and need to extract the following topology information from the text provided to you.
+ Relationship part'S can't be empty, and have to be logical AND CONNECTING ELEMENTS OF THE TOPOLOGY
Correct grammatical issues in the prompt.
- Relationship part'S can't be empty, and have to be logical AND CONNECTING ELEMENTS OF THE TOPOLOGY
+ Relationship parts can't be empty and have to be logical and connecting elements of the topology.
```python
#
# await add("data://" + data_directory_path, "example")

graph = await add_topology()
```
Remove the unused variable `graph`.

- graph = await add_topology()
+ await add_topology()
cognee/api/v1/cognify/cognify.py (outdated)

  @@ -109,78 +118,91 @@ async def process_text(chunk_collection: str, chunk_id: str, input_text: str, fi

  graph_client = await get_graph_client(infrastructure_config.get_config()["graph_engine"])

  graph_topology = infrastructure_config.get_config()["graph_topology"]
Remove the unused variable `graph_topology`.
- graph_topology = infrastructure_config.get_config()["graph_topology"]
```python
if isinstance(value, BaseModel):
    node_id = await generate_node_id(value)
    node_data = value.model_dump()

    # Use the specified default relationship for the edge between the parent node and the current node

    # Add node to the graph
    created_node_id = await add_node(graph_client, parent_id, node_id, node_data)
```
Remove the unused variable created_node_id.
- created_node_id = await add_node(graph_client, parent_id, node_id, node_data)
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
created_node_id = await add_node(graph_client, parent_id, node_id, node_data)
for attribute_name, attribute_value in graph_model:
    ids = await process_attribute_edge(graph_client, root_id, attribute_name, attribute_value, flattened_and_deduplicated)

    ids = await process_attribute(graph_client, root_id, attribute_name, attribute_value)
Remove the unused variable ids.
- ids = await process_attribute(graph_client, root_id, attribute_name, attribute_value)
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
- ids = await process_attribute(graph_client, root_id, attribute_name, attribute_value)
+ await process_attribute(graph_client, root_id, attribute_name, attribute_value)
Actionable comments posted: 1
Out of diff range and nitpick comments (4)
cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)
Line range hint 6-6: Duplicate import of TextEmbedding from fastembed. Remove the redundant import to clean up the code.
- from fastembed import TextEmbedding

Line range hint 5-5: Duplicate import of AsyncOpenAI from openai. It's good practice to avoid duplicate imports to maintain code clarity.
- from openai import AsyncOpenAI
cognee/modules/cognify/graph/add_data_chunks.py (2)
Line range hint 13-13: The variable vector_client is assigned but never used in the uncommented sections of the code. If it's not needed, consider removing it to clean up the code.
- vector_client = infrastructure_config.get_config("vector_engine")
Line range hint 18-59: The commented-out error handling and retry logic could be important for robustness, especially in production environments. Consider implementing a more structured error handling and retry mechanism.

Would you like me to help implement a robust error handling and retry mechanism for this function?
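The PR itself does not include such a mechanism; as a hedged sketch only (the decorator, its parameters, and the `flaky_insert` example are all illustrative, not from the cognee codebase), an async retry with exponential backoff could look like this:

```python
import asyncio
import random

def async_retry(max_attempts=3, base_delay=0.01, exceptions=(Exception,)):
    """Retry an async callable with exponential backoff and jitter."""
    def decorator(func):
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # Retries exhausted: surface the original error.
                    # Exponential backoff with a little jitter between attempts.
                    delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
                    await asyncio.sleep(delay)
        return wrapper
    return decorator

attempts = []

@async_retry(max_attempts=3, exceptions=(ConnectionError,))
async def flaky_insert(rows):
    # Fails twice with a transient error, then succeeds on the third attempt.
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return f"inserted {len(rows)} rows"

result = asyncio.run(flaky_insert([1, 2, 3]))
```

Limiting `exceptions` to known-transient error types keeps permanent failures (bad input, auth errors) from being retried pointlessly.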
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (6)
- cognee/api/v1/cognify/cognify.py (7 hunks)
- cognee/config.py (3 hunks)
- cognee/infrastructure/data/chunking/LangchainChunkingEngine.py (1 hunks)
- cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3 hunks)
- cognee/modules/cognify/graph/add_data_chunks.py (3 hunks)
- cognee/shared/GithubClassification.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- cognee/config.py
- cognee/infrastructure/data/chunking/LangchainChunkingEngine.py
Additional Context Used
Ruff (5)
cognee/api/v1/cognify/cognify.py (1)
125-125: Local variable graph_topology is assigned to but never used
cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3)
7-7: Redefinition of unused TextEmbedding from line 6
8-8: Redefinition of unused AsyncOpenAI from line 5
40-40: Do not use bare except
cognee/modules/cognify/graph/add_data_chunks.py (1)
13-13: Local variable vector_client is assigned to but never used
Additional comments not posted (8)
cognee/shared/GithubClassification.py (4)
8-13: The enum TextSubclass is well-defined with clear, descriptive categories for different types of text content.
15-17: The ContentType class is simple and correctly uses Pydantic for data validation. The default type of "TEXT" is appropriate for the base class.
19-21: The TextContent class correctly inherits from ContentType and introduces a new field subclass, which is a list of TextSubclass. This design supports multiple categorizations for text content, which is flexible and useful.
23-25: The CodeContentPrediction model is succinct and appropriate for predicting content types. It uses TextContent for the label field, which ensures that predictions are tied to the defined text categories.
cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (1)
Line range hint 16-66: The DefaultEmbeddingEngine and LiteLLMEmbeddingEngine classes are well-structured for handling text embeddings. Ensure that the verbose setting and configuration loading are appropriately managed to avoid side effects.

Verification successful

The review comment raised a concern about the potential side effects of the set_verbose setting used in the DefaultEmbeddingEngine class. The verification process involved searching for the usage of set_verbose across the application to ensure it does not alter any critical settings elsewhere.

The results from the executed script show that set_verbose is only used in the DefaultEmbeddingEngine.py file and is not found in any other parts of the application. This indicates that the setting is localized to this specific class and does not impact other components or configurations.

Therefore, the concern about unintended side effects from the set_verbose setting does not apply, as its usage is confined and does not interfere with other parts of the application.

- File: cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py
- Detail: The set_verbose = True setting is used only within this file and does not affect other application settings.
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the verbose setting does not cause unintended side effects across the application.
# Test: Search for `set_verbose` usage in other parts of the application. Expect: No other critical settings are altered.
rg --type py 'set_verbose'

Length of output: 129
cognee/api/v1/cognify/cognify.py (3)
125-125: The variable graph_topology is assigned but never used; it could be removed to clean up the code.
215-215: Ensure that the graph variable is defined before it's used in render_graph. This is crucial to avoid runtime errors.
192-212: The test function is well-structured for testing the cognify process. However, ensure that all configurations and dependencies are correctly set up before running the tests to avoid issues.
print("text", text)
try:
    text = str(text[0])
except:
Avoid using a bare except statement. It's better to specify the exception type to avoid catching unexpected exceptions and to make the error handling clearer.
- except:
+ except SomeSpecificException:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
- except:
+ except SomeSpecificException:
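SomeSpecificException above is only a placeholder; the concrete types depend on what text[0] can actually raise. A self-contained sketch of the pattern (the helper name and the choice of IndexError/TypeError are illustrative assumptions, not from the codebase):

```python
def first_as_text(text):
    """Coerce the first element of a sequence to str, with targeted error handling."""
    try:
        return str(text[0])
    except (IndexError, TypeError) as error:
        # Only the failure modes we expect from text[0] are caught;
        # anything else (e.g. KeyboardInterrupt) still propagates.
        print(f"Falling back to raw value: {error}")
        return str(text)

print(first_as_text(["hello", "world"]))  # hello
print(first_as_text([]))                  # IndexError caught, falls back to "[]"
print(first_as_text(42))                  # TypeError caught, falls back to "42"
```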
Actionable comments posted: 5
Out of diff range and nitpick comments (3)
cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)
Line range hint 14-29: Consider refactoring the lambda function for converting embeddings to lists for improved readability.
- embeddings_list = list(map(lambda embedding: embedding.tolist(), embedding_model.embed(text)))
+ embeddings_list = [embedding.tolist() for embedding in embedding_model.embed(text)]
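The two forms produce identical results; a tiny self-contained illustration (FakeEmbedding here is a stand-in for the real embedding objects, assumed only to expose a tolist() method):

```python
class FakeEmbedding:
    """Minimal stand-in for an embedding vector with a tolist() method."""
    def __init__(self, values):
        self.values = values

    def tolist(self):
        return list(self.values)

embeddings = [FakeEmbedding((0.1, 0.2)), FakeEmbedding((0.3, 0.4))]

# Original style: map + lambda, wrapped in list()
via_map = list(map(lambda e: e.tolist(), embeddings))

# Suggested style: a list comprehension — same result, easier to read
via_comprehension = [e.tolist() for e in embeddings]

print(via_map == via_comprehension)  # True
```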
42-46: Determine if the commented-out debug prints are necessary. If they are not, consider removing them to clean up the code.
cognee/modules/cognify/graph/add_cognitive_layer_graphs.py (1)
[!TIP]
Codebase Verification

Commented-out text processing functions are used elsewhere in the codebase

The commented-out text processing functions (extract_pos_tags, extract_named_entities, extract_sentiment_vader) are indeed used in other parts of the codebase, specifically in cognee/utils.py and cognee/infrastructure/data/utils/extract_keywords.py.

cognee/utils.py:
- extract_pos_tags
- extract_named_entities
- extract_sentiment_vader

cognee/infrastructure/data/utils/extract_keywords.py:
- extract_pos_tags

Given their usage in other files, it is important to retain these functions in the codebase. However, if the commented-out lines in add_cognitive_layer_graphs.py are not needed, they can be removed to clean up the code.

- Consider removing the commented-out lines in add_cognitive_layer_graphs.py if they are no longer needed.

Analysis chain

Line range hint 144-267: Review the commented-out lines related to text processing. If they are no longer needed, consider removing them to clean up the code.

Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the commented-out text processing functions are used elsewhere in the codebase.
# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type python $'extract_pos_tags|extract_named_entities|extract_sentiment_vader'

Length of output: 115
Script:
#!/bin/bash
# Description: Check if the commented-out text processing functions are used elsewhere in the codebase.
# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type py 'extract_pos_tags|extract_named_entities|extract_sentiment_vader'

Length of output: 1081
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (2)
poetry.lock
is excluded by!**/*.lock
,!**/*.lock
pyproject.toml
is excluded by!**/*.toml
Files selected for processing (11)
- cognee/api/v1/cognify/cognify.py (3 hunks)
- cognee/config.py (3 hunks)
- cognee/infrastructure/InfrastructureConfig.py (4 hunks)
- cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3 hunks)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2 hunks)
- cognee/infrastructure/llm/generic_llm_api/adapter.py (2 hunks)
- cognee/modules/cognify/graph/add_cognitive_layer_graphs.py (5 hunks)
- cognee/modules/cognify/graph/add_data_chunks.py (2 hunks)
- cognee/modules/cognify/graph/add_document_node.py (2 hunks)
- cognee/modules/cognify/graph/add_label_nodes.py (1 hunks)
- cognee/modules/data/extraction/knowledge_graph/extract_knowledge_graph.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
- cognee/config.py
- cognee/infrastructure/InfrastructureConfig.py
- cognee/modules/cognify/graph/add_data_chunks.py
Additional Context Used
Ruff (4)
cognee/api/v1/cognify/cognify.py (3)
179-179: Local variable
graph_topology
is assigned to but never used
184-184: Local variable
document_id
is assigned to but never used
201-201: Local variable
classified_categories
is assigned to but never used
cognee/modules/data/extraction/knowledge_graph/extract_knowledge_graph.py (1)
21-21: Local variable
error
is assigned to but never used
Additional comments not posted (2)
cognee/modules/cognify/graph/add_document_node.py (1)
17-17: Correctly sets the document type to "Document".
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
Line range hint 1-1: The implementation of LanceDBAdapter correctly handles interactions with the vector database.
@@ -25,6 +25,7 @@ async def add_document_node(graph_client: GraphDBInterface, parent_node_id, docu
dict(relationship_name = "has_document"),
)

await add_label_nodes(graph_client, document_id, document_metadata["keywords"].split("|"))
#
# await add_label_nodes(graph_client, document_id, document_metadata["keywords"].split("|"))
Tip
Codebase Verification
Consider whether the commented-out lines for adding label nodes are still needed. If not, they should be removed to clean up the code.
- cognee/modules/cognify/graph/add_document_node.py: Line 29
- cognee/api/v1/cognify/cognify.py: Line with # await add_label_nodes(graph_client, document_id, chunk_id, file_metadata["keywords"].split("|"))
Analysis chain
Consider whether the commented-out line for adding label nodes is still needed. If not, it should be removed to clean up the code.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the function `add_label_nodes` is used elsewhere in the codebase.
# Test: Search for the function usage. Expect: No occurrences outside this file.
rg --type python $'add_label_nodes'
Length of output: 65
Script:
#!/bin/bash
# Description: Check if the function `add_label_nodes` is used elsewhere in the codebase.
# Test: Search for the function usage. Expect: No occurrences outside this file.
rg 'add_label_nodes' --glob '*.py'
Length of output: 579
@@ -21,4 +21,4 @@ def sync_extract_knowledge_graph():
    except Exception as error:
        # TODO: Log error to Sentry

        return await extract_content_graph(text, cognitive_layer, graph_model)
    return await extract_content_graph(text, cognitive_layer, graph_model)
Ensure proper error handling by logging the exception. Currently, the error is caught but not logged, which could hinder debugging.
- except Exception as error:
+ except Exception as error:
+ logger.error(f"Failed to extract knowledge graph: {error}")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
return await extract_content_graph(text, cognitive_layer, graph_model)
except Exception as error:
    logger.error(f"Failed to extract knowledge graph: {error}")
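The `logger` referenced in the suggestion is assumed to exist in the module; a minimal, self-contained sketch of the log-then-reraise pattern the suggestion is aiming for (helper and extractor names are illustrative only):

```python
import logging

logger = logging.getLogger("knowledge_graph")

def extract_with_logging(extract, text):
    """Call an extraction function, logging (not swallowing) any failure."""
    try:
        return extract(text)
    except Exception as error:
        # Log first so the failure is visible, then re-raise so callers can react.
        logger.error("Failed to extract knowledge graph: %s", error)
        raise

def broken_extractor(text):
    raise ValueError("model returned malformed JSON")

try:
    extract_with_logging(broken_extractor, "some input")
except ValueError as error:
    print(f"caught: {error}")  # the error was logged and still propagated
```

Re-raising after logging keeps the caller's control flow intact — silently returning None would hide the failure, which is exactly the problem the review flags.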
except Exception as e:
    # It's ok if the collection already exists.
    pass
    print(e)
Specify the exception type instead of using a generic exception to avoid catching unexpected exceptions and to make the error handling clearer.
- except Exception as e:
+ except CollectionExistsException as e:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
- except Exception as e:
-     # It's ok if the collection already exists.
-     pass
-     print(e)
+ except CollectionExistsException as e:
+     # It's ok if the collection already exists.
+     print(e)
async def acreate_structured_output(self, text_input: str, system_prompt: str, response_model: Type[BaseModel]) -> BaseModel:
    """Generate a response from a user query."""

    return await self.aclient.chat.completions.create(
-       model=self.model,
-       messages=[
+       model = self.model,
+       messages = [
            {
                "role": "user",
                "content": f"""Use the given format to
-               extract information from the following input: {text_input}. {system_prompt} """,
-           }
+               extract information from the following input: {text_input}. """,
+           },
+           {"role": "system", "content": system_prompt},
        ],
-       response_model=response_model,
+       response_model = response_model,
Simplify the string formatting in acreate_structured_output to reduce complexity and potential errors.
- "content": f"""Use the given format to
- extract information from the following input: {text_input}. """,
+ "content": "Use the given format to extract information from the following input: " + text_input
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
async def acreate_structured_output(self, text_input: str, system_prompt: str, response_model: Type[BaseModel]) -> BaseModel:
    """Generate a response from a user query."""

    return await self.aclient.chat.completions.create(
        model = self.model,
        messages = [
            {
                "role": "user",
-               "content": f"""Use the given format to
-               extract information from the following input: {text_input}. """,
+               "content": "Use the given format to extract information from the following input: " + text_input,
            },
            {"role": "system", "content": system_prompt},
        ],
        response_model = response_model,
cognee/api/v1/cognify/cognify.py
Outdated
# categories = classified_categories,
# )

classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]

#
# async def process_text(document_id: str, chunk_id: str, chunk_collection: str, input_text: str):
# raw_document_id = document_id.split("__")[-1]
#
# print(f"Processing chunk ({chunk_id}) from document ({raw_document_id}).")
#
# graph_client = await get_graph_client(infrastructure_config.get_config()["graph_engine"])
#
# classified_categories = await get_content_categories(input_text)
# await add_classification_nodes(
# graph_client,
# parent_node_id = document_id,
# categories = classified_categories,
# )
# >>>>>>> origin/main
#
# print(f"Chunk ({chunk_id}) classified.")
#
# # print("document_id", document_id)
# #
# # content_summary = await get_content_summary(input_text)
# # await add_summary_nodes(graph_client, document_id, content_summary)
#
# print(f"Chunk ({chunk_id}) summarized.")
# #
# cognitive_layers = await get_cognitive_layers(input_text, classified_categories)
# cognitive_layers = (await add_cognitive_layers(graph_client, document_id, cognitive_layers))[:2]
# #
# layer_graphs = await get_layer_graphs(input_text, cognitive_layers)
# await add_cognitive_layer_graphs(graph_client, chunk_collection, chunk_id, layer_graphs)
#
# <<<<<<< HEAD
# print("got here 4444")
#
# if infrastructure_config.get_config()["connect_documents"] is True:
# db_engine = infrastructure_config.get_config()["database_engine"]
# relevant_documents_to_connect = db_engine.fetch_cognify_data(excluded_document_id = file_metadata["id"])
#
# list_of_nodes = []
#
# relevant_documents_to_connect.append({
# "layer_id": document_id,
# })
#
# for document in relevant_documents_to_connect:
# node_descriptions_to_match = await graph_client.extract_node_description(document["layer_id"])
# list_of_nodes.extend(node_descriptions_to_match)
#
# nodes_by_layer = await group_nodes_by_layer(list_of_nodes)
#
# results = await resolve_cross_graph_references(nodes_by_layer)
#
# relationships = graph_ready_output(results)
#
# await connect_nodes_in_graph(
# graph_client,
# relationships,
# score_threshold = infrastructure_config.get_config()["intra_layer_score_treshold"]
# )
#
# send_telemetry("cognee.cognify")
#
# print(f"Chunk ({chunk_id}) cognified.")
# =======
# if infrastructure_config.get_config()["connect_documents"] is True:
# db_engine = infrastructure_config.get_config()["database_engine"]
# relevant_documents_to_connect = db_engine.fetch_cognify_data(excluded_document_id = raw_document_id)
#
# list_of_nodes = []
#
# relevant_documents_to_connect.append({
# "layer_id": document_id,
# })
#
# for document in relevant_documents_to_connect:
# node_descriptions_to_match = await graph_client.extract_node_description(document["layer_id"])
# list_of_nodes.extend(node_descriptions_to_match)
#
# nodes_by_layer = await group_nodes_by_layer(list_of_nodes)
#
# results = await resolve_cross_graph_references(nodes_by_layer)
#
# relationships = graph_ready_output(results)
#
# await connect_nodes_in_graph(
# graph_client,
# relationships,
# score_threshold = infrastructure_config.get_config()["intra_layer_score_treshold"]
# )
#
# send_telemetry("cognee.cognify")
#
# print(f"Chunk ({chunk_id}) cognified.")
# >>>>>>> origin/main
Tip
Codebase Verification
Commented-out code cleanup needed
The functions add_classification_nodes, get_content_categories, get_content_summary, and add_summary_nodes are defined and imported but are not used outside of the commented-out sections in cognify.py. Consider removing these commented-out lines to clean up the code.

- cognee/api/v1/cognify/cognify.py: Lines containing commented-out code related to the mentioned functions.
Analysis chain
Review the commented-out lines and debug prints. If they are no longer needed, consider removing them to clean up the code.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the commented-out functions are used elsewhere in the codebase.
# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type python $'add_classification_nodes|get_content_categories|get_content_summary|add_summary_nodes'
Length of output: 137
Script:
#!/bin/bash
# Description: Check if the commented-out functions are used elsewhere in the codebase.
# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type py $'add_classification_nodes|get_content_categories|get_content_summary|add_summary_nodes'
Length of output: 1559
Actionable comments posted: 3
Out of diff range and nitpick comments (1)
cognee/api/v1/cognify/cognify.py (1)
[!TIP]
Codebase Verification

Unused Imports in cognee/api/v1/cognify/cognify.py

The following imports are not actively used in the code and should be considered for removal:
- add_classification_nodes
- add_summary_nodes
- group_nodes_by_layer
- graph_ready_output
- connect_nodes_in_graph
- resolve_cross_graph_references
- get_content_categories
- get_content_summary
- get_cognitive_layers
- get_layer_graphs
- send_telemetry
These imports are either commented out or not referenced in the active code. Removing them can help clean up the codebase and improve maintainability.
Analysis chain
Line range hint 11-35: Ensure all new imports are utilized effectively in the code.

Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all newly imported modules are used in the file.
# Test: Search for the usage of newly imported modules. Expect: At least one occurrence of each module.
rg --type python $'prune|Config|LangchainChunkingEngine|LiteLLMEmbeddingEngine|add_data_chunks|add_document_node|add_classification_nodes|add_cognitive_layer_graphs|add_summary_nodes|group_nodes_by_layer|graph_ready_output|connect_nodes_in_graph|resolve_cross_graph_references|get_graph_client|add_cognitive_layers|guess_file_type|extract_text_from_file|infrastructure_config|get_content_categories|get_content_summary|get_cognitive_layers|get_layer_graphs|TopologyEngine|CodeContentPrediction|ChunkStrategy|DefaultGraphModel|send_telemetry' cognee/api/v1/cognify/cognify.py

Length of output: 606
Script:
#!/bin/bash
# Description: Verify that all newly imported modules are used in the file.
# Test: Search for the usage of newly imported modules. Expect: At least one occurrence of each module.
rg $'prune|Config|LangchainChunkingEngine|LiteLLMEmbeddingEngine|add_data_chunks|add_document_node|add_classification_nodes|add_cognitive_layer_graphs|add_summary_nodes|group_nodes_by_layer|graph_ready_output|connect_nodes_in_graph|resolve_cross_graph_references|get_graph_client|add_cognitive_layers|guess_file_type|extract_text_from_file|infrastructure_config|get_content_categories|get_content_summary|get_cognitive_layers|get_layer_graphs|TopologyEngine|CodeContentPrediction|ChunkStrategy|DefaultGraphModel|send_telemetry' cognee/api/v1/cognify/cognify.py

Length of output: 5726
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- cognee/api/v1/cognify/cognify.py (3 hunks)
Additional Context Used
Ruff (3)
cognee/api/v1/cognify/cognify.py (3)
184-184: Local variable
parent_node_id
is assigned to but never used
186-186: Local variable
document_id
is assigned to but never used
203-203: Local variable
classified_categories
is assigned to but never used
Additional comments not posted (2)
cognee/api/v1/cognify/cognify.py (2)
179-179: Remove unused variable graph_topology. This issue has been flagged in previous reviews. Please ensure it is addressed.
328-328: Undefined variable graph used in render_graph. This issue has been flagged in previous reviews. Please ensure it is addressed.
cognee/api/v1/cognify/cognify.py
Outdated
if graph_topology == "default":
    parent_node_id = f"{file_metadata['name']}.{file_metadata['extension']}"
elif graph_topology == DefaultGraphModel:
    parent_node_id = f"DefaultGraphModel__{USER_ID}"
Local variable parent_node_id is assigned but never used in the process_text function.
- parent_node_id = f"{file_metadata['name']}.{file_metadata['extension']}"
Consider removing this line if parent_node_id is not needed, or ensure it is used appropriately if intended.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
parent_node_id = f"DefaultGraphModel__{USER_ID}"
cognee/api/v1/cognify/cognify.py
Outdated
print(f"Chunk ({chunk_id}) cognified.")
document_id = await add_document_node(
Local variable document_id is assigned but never used in the process_text function.
- document_id = await add_document_node(
- graph_client,
- parent_node_id = f"{file_metadata['name']}.{file_metadata['extension']}",
- document_metadata = file_metadata,
- )
Consider removing this line if document_id is not needed, or ensure it is used appropriately if intended.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
document_id = await add_document_node(
cognee/api/v1/cognify/cognify.py
Outdated
# categories = classified_categories,
# )

classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]
Local variable classified_categories is assigned but never used in the process_text function.
- classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]
Consider removing this line if classified_categories is not needed, or ensure it is used appropriately if intended.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]
Actionable comments posted: 1
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- cognee/api/v1/cognify/cognify.py (3 hunks)
- cognee/config.py (3 hunks)
- cognee/modules/cognify/graph/add_document_node.py (2 hunks)
Files skipped from review as they are similar to previous changes (2)
- cognee/config.py
- cognee/modules/cognify/graph/add_document_node.py
Additional Context Used
Ruff (1)
cognee/api/v1/cognify/cognify.py (1)
188-188: Do not use bare
except
Additional comments not posted (1)
cognee/api/v1/cognify/cognify.py (1)
227-255: Ensure proper asynchronous handling in the main testing block.

The main block uses asyncio.run(test()), which is correct for running async code from a synchronous context. However, ensure that all async functions within test() are awaited properly to avoid runtime errors.

Verification successful

The main block uses asyncio.run(test()) correctly, and all async functions within test() are awaited properly.

- The test function in cognee/api/v1/cognify/cognify.py correctly uses await for the cognify and render_graph functions.

Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify proper use of async/await in the main testing block.
# Test: Search for missing 'await' in async function calls within the main block.
ast-grep --lang python --pattern $'if __name__ == "__main__" { $$$ async def test() { $$$ } $$$ }'

Length of output: 444
Script:
#!/bin/bash
# Description: Verify proper use of async/await in the main testing block.
# Test: Search for missing 'await' in async function calls within the main block.
ast-grep --lang python --pattern $'if __name__ == "__main__":\n    async def test():\n        $$$\n    $$$'

Length of output: 4501
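The pattern being verified reduces to a small runnable sketch (the stub coroutines below are illustrative stand-ins, not the real cognify and render_graph functions):

```python
import asyncio

async def cognify_stub(text):
    # Stand-in for an async pipeline step.
    await asyncio.sleep(0)
    return f"processed: {text}"

async def render_graph_stub(graph):
    # Stand-in for an async rendering step.
    await asyncio.sleep(0)
    return f"rendered {graph}"

async def test():
    # Every coroutine must be awaited; calling it without `await` would
    # return an un-run coroutine object and trigger a RuntimeWarning.
    graph = await cognify_stub("example input")
    return await render_graph_stub(graph)

if __name__ == "__main__":
    # asyncio.run() is the correct bridge from synchronous code into the event loop.
    print(asyncio.run(test()))
```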
Actionable comments posted: 6
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- cognee/api/v1/cognify/cognify.py (3 hunks)
- cognee/api/v1/topology/add_topology.py (1 hunks)
Additional Context Used
Ruff (2)
cognee/api/v1/cognify/cognify.py (1)
188-188: Do not use bare except
cognee/api/v1/topology/add_topology.py (1)
16-16: Local variable graph_topology is assigned to but never used
graph_client = await get_graph_client(graph_db_type)

graph_topology = infrastructure_config.get_config()["graph_topology"]
Remove the unused variable graph_topology.
- graph_topology = infrastructure_config.get_config()["graph_topology"]
engine = TopologyEngine()
topology = await engine.infer_from_directory_structure(node_id=USER_ID, repository=directory, model=model)
Tip
Codebase Verification
Ensure proper error handling in infer_from_directory_structure
The function infer_from_directory_structure is defined in cognee/modules/topology/topology.py. However, the search for error handling patterns (e.g., try: blocks) did not yield any results within this function. This indicates that the function might lack proper error handling mechanisms.
Action Required:
- Add appropriate error handling (e.g., try-except blocks) within the infer_from_directory_structure function to ensure robustness and reliability.
Location:
cognee/modules/topology/topology.py
Example:
async def infer_from_directory_structure(self, node_id: str, repository: str, model):
try:
# Your existing code logic
except Exception as e:
# Handle exceptions appropriately
logger.error(f"Error in infer_from_directory_structure: {e}")
raise
Please review and implement the necessary changes.
Analysis chain
Ensure proper error handling in infer_from_directory_structure.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify error handling in the function `infer_from_directory_structure`.
# Test: Search for the function implementation and check for error handling. Expect: Proper try-except blocks.
ast-grep --lang python --pattern $'async def infer_from_directory_structure($_, $_, $_) {
$$$
}'
Length of output: 584
Script:
#!/bin/bash
# Description: Verify error handling in the function `infer_from_directory_structure`.
# Test: Search for the function implementation and check for error handling. Expect: Proper try-except blocks.
ast-grep --lang python --pattern $'async def infer_from_directory_structure(node_id, repository, model):\n $$$'
Length of output: 114
Script:
#!/bin/bash
# Description: Locate the function `infer_from_directory_structure` and verify error handling.
# Step 1: Locate the function definition
rg 'async def infer_from_directory_structure' -t py
# Step 2: Check for error handling within the function
rg 'try:' -A 10 -B 10 -t py
Length of output: 50516
def flatten_model(model: BaseModel, parent_id: Optional[str] = None) -> Dict[str, Any]:
    """Flatten a single Pydantic model to a dictionary handling nested structures."""
    result = {**model.dict(), 'parent_id': parent_id}
    if hasattr(model, 'default_relationship') and model.default_relationship:
        result.update({
            'relationship_type': model.default_relationship.type,
            'relationship_source': model.default_relationship.source,
            'relationship_target': model.default_relationship.target
        })
    return result
Refactor the flatten_model function to improve readability and maintainability.
- result = {**model.dict(), 'parent_id': parent_id}
- if hasattr(model, 'default_relationship') and model.default_relationship:
- result.update({
- 'relationship_type': model.default_relationship.type,
- 'relationship_source': model.default_relationship.source,
- 'relationship_target': model.default_relationship.target
- })
+ result = model.dict()
+ result['parent_id'] = parent_id
+ if model.default_relationship:
+ result['relationship_type'] = model.default_relationship.type
+ result['relationship_source'] = model.default_relationship.source
+ result['relationship_target'] = model.default_relationship.target
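The flattening logic above can be exercised without Pydantic. Below is a minimal, dependency-free sketch of the same idea using plain dicts; the Relationship stand-in and field names are illustrative assumptions mirroring the diff, not the project's actual classes:

```python
from typing import Any, Dict, Optional

class Relationship:
    """Hypothetical stand-in for the model's default_relationship."""
    def __init__(self, type: str, source: str, target: str):
        self.type = type
        self.source = source
        self.target = target

def flatten_row(data: Dict[str, Any], relationship: Optional[Relationship],
                parent_id: Optional[str] = None) -> Dict[str, Any]:
    """Copy the model's fields into a flat row and attach parent/relationship info."""
    row = dict(data)
    row["parent_id"] = parent_id
    if relationship is not None:
        row["relationship_type"] = relationship.type
        row["relationship_source"] = relationship.source
        row["relationship_target"] = relationship.target
    return row
```

Keeping the relationship columns optional means rows without a relationship stay flat and can still be loaded into the same DataFrame.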
def recursive_flatten(items: Union[List[Any], BaseModel], parent_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Recursively flatten nested Pydantic models or lists of models."""
    if isinstance(items, list):
        return [entry for item in items for entry in recursive_flatten(item, parent_id)]
    elif isinstance(items, BaseModel):
        flat = [flatten_model(items, parent_id)]
        for field, value in items:
            if isinstance(value, (BaseModel, list)):
                flat.extend(recursive_flatten(value, items.dict().get('node_id', None)))
        return flat
    else:
        return []
Optimize the recursive_flatten function to handle different data types more efficiently.
- if isinstance(items, list):
- return [entry for item in items for entry in recursive_flatten(item, parent_id)]
- elif isinstance(items, BaseModel):
- flat = [flatten_model(items, parent_id)]
- for field, value in items:
- if isinstance(value, (BaseModel, list)):
- flat.extend(recursive_flatten(value, items.dict().get('node_id', None)))
- return flat
- else:
- return []
+ return [flatten_model(item, parent_id) if isinstance(item, BaseModel) else recursive_flatten(item, parent_id) for item in items] if isinstance(items, list) else [flatten_model(items, parent_id)]
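The recursion can be sanity-checked without Pydantic. A dependency-free sketch of the same traversal over nested dicts/lists (the real code walks BaseModel fields instead; node_id threading is the only behavior this demonstrates):

```python
from typing import Any, Dict, List, Optional, Union

def recursive_flatten(items: Union[List[Any], Dict[str, Any]],
                      parent_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Flatten a nested tree of dicts/lists into rows, threading each dict's
    node_id down to its children as parent_id."""
    if isinstance(items, list):
        # Flatten every element at the same parent level.
        return [row for item in items for row in recursive_flatten(item, parent_id)]
    if isinstance(items, dict):
        # Scalar fields become one row; nested structures recurse below it.
        row = {k: v for k, v in items.items() if not isinstance(v, (dict, list))}
        row["parent_id"] = parent_id
        rows = [row]
        for value in items.values():
            if isinstance(value, (dict, list)):
                rows.extend(recursive_flatten(value, items.get("node_id")))
        return rows
    return []
```

Note that flattening child rows with extend (rather than appending sub-lists inside a comprehension) is what keeps the result a flat List[Dict]; a single-expression rewrite has to preserve that.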
if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
    await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])
Add error handling for add_edge to manage potential failures gracefully.
- if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
- await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])
+ if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
+ try:
+ await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])
+ except Exception as e:
+ logging.error(f"Failed to add edge: {e}")
Actionable comments posted: 0
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- .github/workflows/test_common.yml (2 hunks)
Files skipped from review due to trivial changes (1)
- .github/workflows/test_common.yml
Actionable comments posted: 1
Out of diff range and nitpick comments (5)
cognee/infrastructure/llm/openai/adapter.py (4)
Line range hint 23-25: Encapsulate API key management to enhance security and maintainability.
Direct manipulation of openai.api_key in the constructor can lead to security risks and makes the code less maintainable. Consider using a method or a separate configuration manager to handle API keys.
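A minimal sketch of what such a configuration manager could look like; the class and environment-variable names are illustrative, not the project's actual API:

```python
import os
from dataclasses import dataclass, field

@dataclass
class LLMConfig:
    """Holds the API key locally instead of mutating openai.api_key globally.

    The adapter would receive an LLMConfig and pass the key to the client
    constructor, so no module-level state is touched.
    """
    api_key: str = field(default_factory=lambda: os.environ.get("OPENAI_API_KEY", ""))

    def require_key(self) -> str:
        """Return the key, failing loudly if it was never configured."""
        if not self.api_key:
            raise ValueError("OPENAI_API_KEY is not configured")
        return self.api_key
```

This keeps the failure mode explicit (a ValueError at construction time) rather than a confusing authentication error on the first request.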
Line range hint 29-31: Make retry attempts configurable to adapt to different runtime environments.
The number of retry attempts is hard-coded in several methods. Making this configurable via an environment variable or configuration file would allow easier adjustments based on the deployment environment.
Also applies to: 33-35, 37-39, 41-43, 45-47
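One way to make the retry budget environment-driven; the variable name LLM_MAX_RETRIES and the default of 5 are assumptions for illustration:

```python
import os

DEFAULT_RETRIES = 5

def get_max_retries(env_var: str = "LLM_MAX_RETRIES",
                    default: int = DEFAULT_RETRIES) -> int:
    """Read the retry budget from the environment, falling back to a default.

    Invalid or non-positive values fall back to the default rather than
    crashing at startup.
    """
    raw = os.environ.get(env_var)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default
```

Each retry-decorated method would then use stop=stop_after_attempt(get_max_retries()) (or equivalent) instead of a literal.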
Line range hint 37-39: Use a constant for the default model to avoid repetition and potential errors.
The default model string "text-embedding-3-large" is repeated in multiple methods. Defining this as a constant at the class or module level would reduce repetition and help prevent errors if the model name changes.
Also applies to: 41-43
Line range hint 57-59: Improve the formatting of the user input in the message for clarity.
The user input is embedded in a larger string, which makes it hard to read. Consider formatting it on a new line or using a clearer delimiter.
cognee/infrastructure/InfrastructureConfig.py (1)
Line range hint 36-149: Add error handling and validation for configuration parameters to enhance robustness.
The get_config and set_config methods handle a wide range of configurations. Adding error handling and validation for the input parameters would make these methods more robust and prevent potential runtime errors.
Also applies to: 204-206
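A sketch of the kind of validation meant here; the key names and allowed engine values are assumptions for illustration, not the actual InfrastructureConfig schema:

```python
from typing import Any, Dict

# Hypothetical whitelist of supported graph engines.
ALLOWED_GRAPH_ENGINES = {"networkx", "neo4j", "falkordb"}

def validate_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a configuration dict before applying it.

    Rejecting bad input up front turns a late, confusing KeyError deep in
    get_config into an immediate, descriptive error at set_config time.
    """
    if not isinstance(config, dict):
        raise TypeError("config must be a dict")
    engine = config.get("graph_engine", "networkx")
    if engine not in ALLOWED_GRAPH_ENGINES:
        raise ValueError(f"unknown graph_engine: {engine!r}")
    return config
```

set_config would call validate_config on its input, and get_config could validate once before returning the assembled dict.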
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (2)
docker-compose.yml is excluded by !**/*.yml
pyproject.toml is excluded by !**/*.toml
Files selected for processing (4)
- cognee/config.py (3 hunks)
- cognee/infrastructure/InfrastructureConfig.py (4 hunks)
- cognee/infrastructure/llm/openai/adapter.py (1 hunks)
- cognee/shared/data_models.py (7 hunks)
Files skipped from review as they are similar to previous changes (2)
- cognee/config.py
- cognee/shared/data_models.py
Additional comments not posted (1)
cognee/infrastructure/InfrastructureConfig.py (1)
35-36: Ensure consistent handling of graph_topology across different methods.
The graph_topology attribute is set in multiple places within the InfrastructureConfig class. Verify that it is consistently handled across all methods to avoid configuration inconsistencies.
Also applies to: 82-84, 149-149, 204-206
if config.monitoring_tool == MonitoringTool.LANGFUSE:
    from langfuse.openai import AsyncOpenAI, OpenAI
else:
    from openai import AsyncOpenAI, OpenAI
Consider using a more dynamic import strategy for AsyncOpenAI and OpenAI.
Using conditional imports based on configuration can lead to code that is harder to maintain and test. Consider using a factory pattern or dependency injection to handle different configurations.
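A minimal sketch of the factory approach, with lazy loaders registered per monitoring tool; the registry, the MonitoringTool values, and the stand-in client classes are all illustrative assumptions, not the project's actual API:

```python
from enum import Enum
from typing import Callable, Dict, Tuple

class MonitoringTool(Enum):
    LANGFUSE = "langfuse"
    NONE = "none"

# Lazy loaders keyed by monitoring tool, so optional dependencies such as
# langfuse are only imported when that tool is actually selected.
_CLIENT_LOADERS: Dict[MonitoringTool, Callable[[], Tuple[type, type]]] = {}

def register_clients(tool: MonitoringTool):
    """Decorator registering a loader that returns (OpenAI, AsyncOpenAI) classes."""
    def decorator(loader: Callable[[], Tuple[type, type]]):
        _CLIENT_LOADERS[tool] = loader
        return loader
    return decorator

def get_clients(tool: MonitoringTool) -> Tuple[type, type]:
    """Resolve the configured tool to its client classes, failing descriptively."""
    loader = _CLIENT_LOADERS.get(tool)
    if loader is None:
        raise ValueError(f"no client loader registered for {tool}")
    return loader()

@register_clients(MonitoringTool.NONE)
def _load_plain() -> Tuple[type, type]:
    # In the real adapter this loader would be:
    #     from openai import OpenAI, AsyncOpenAI; return OpenAI, AsyncOpenAI
    class OpenAI: ...      # stand-ins so the sketch is self-contained
    class AsyncOpenAI: ...
    return OpenAI, AsyncOpenAI
```

Tests can then register a fake loader for any tool value, which is exactly the injectability the top-level conditional import prevents.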
Actionable comments posted: 0
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- cognee/infrastructure/llm/generic_llm_api/adapter.py (2 hunks)
- cognee/infrastructure/llm/openai/adapter.py (1 hunks)
- cognee/shared/data_models.py (7 hunks)
Files skipped from review as they are similar to previous changes (3)
- cognee/infrastructure/llm/generic_llm_api/adapter.py
- cognee/infrastructure/llm/openai/adapter.py
- cognee/shared/data_models.py
Actionable comments posted: 8
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (2)
evals/soldiers_home.pdf is excluded by !**/*.pdf, !**/*.pdf
pyproject.toml is excluded by !**/*.toml
Files selected for processing (6)
- cognee/api/v1/cognify/cognify.py (3 hunks)
- cognee/modules/cognify/graph/add_data_chunks.py (3 hunks)
- evals/generate_test_set.py (1 hunks)
- evals/natural_language_processing.txt (1 hunks)
- evals/simple_rag_vs_cognee_eval.py (1 hunks)
- evals/trump.txt (1 hunks)
Files skipped from review due to trivial changes (1)
- evals/natural_language_processing.txt
Files skipped from review as they are similar to previous changes (1)
- cognee/modules/cognify/graph/add_data_chunks.py
Additional Context Used
Ruff (15)
cognee/api/v1/cognify/cognify.py (4)
110-110: Local variable added_chunks is assigned to but never used
111-111: Local variable added__basic_rag_chunks is assigned to but never used
188-188: Do not use bare except
263-263: Local variable graph is assigned to but never used
evals/generate_test_set.py (4)
34-34: Module level import not at top of file
35-35: Module level import not at top of file
36-36: Module level import not at top of file
41-41: Module level import not at top of file
evals/simple_rag_vs_cognee_eval.py (7)
11-11: Module level import not at top of file
26-26: Module level import not at top of file
27-27: Module level import not at top of file
27-27: Redefinition of unused List from line 5
28-28: Module level import not at top of file
33-33: Local variable response is annotated but never used
71-71: Module level import not at top of file
Additional comments not posted (3)
evals/generate_test_set.py (2)
1-6: Imports and dotenv setup are correctly placed at the top of the file.
23-27: The method generate_goldens_from_docs is used to generate goldens from documents. Ensure that the document paths and parameters are correctly configured for your use case.
evals/trump.txt (1)
1-15: The content of the file provides a detailed account of Donald Trump's speech. Ensure that all factual information is accurate and up-to-date.
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)

from deepeval import evaluate
The instantiation of AnswerRelevancyMetric and the import of evaluate are correctly placed. However, consider moving all imports to the top of the file.
- from deepeval import evaluate
+ # Move this import to the top of the file
evals/simple_rag_vs_cognee_eval.py (Outdated)
dataset.test_cases = convert_goldens_to_test_cases(dataset.goldens)

from deepeval.metrics import HallucinationMetric
The import of HallucinationMetric should be moved to the top of the file to maintain consistency and readability.
- from deepeval.metrics import HallucinationMetric
+ # Move this import to the top of the file
except:
    pass
Avoid using a bare except statement, as it can catch unexpected exceptions and obscure real errors. Specify the exception type or handle the general exception more gracefully.
- except:
+ except Exception as e:
+ logger.error("Failed during cognitive layer processing: %s", str(e))
cognee/api/v1/cognify/cognify.py (Outdated)
config.set_graph_model(SourceCodeGraph)
config.set_classification_model(CodeContentPrediction)
graph = await cognify()
The local variable graph is assigned but never used. Consider removing this assignment if it's not needed.
- graph = await cognify()
+ # Consider removing this line if 'graph' is not used.
Actionable comments posted: 4
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- evals/generate_test_set.py (1 hunks)
- evals/simple_rag_vs_cognee_eval.py (1 hunks)
Additional Context Used
Ruff (10)
evals/generate_test_set.py (4)
36-36: Module level import not at top of file
37-37: Module level import not at top of file
38-38: Module level import not at top of file
43-43: Module level import not at top of file
evals/simple_rag_vs_cognee_eval.py (6)
11-11: Module level import not at top of file
44-44: Module level import not at top of file
45-45: Module level import not at top of file
45-45: Redefinition of unused List from line 5
46-46: Module level import not at top of file
97-97: Module level import not at top of file
import pytest
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
Move module-level imports to the top of the file to improve readability and maintain Python conventions.
- import pytest
- from deepeval import assert_test
- from deepeval.metrics import AnswerRelevancyMetric
- from deepeval import evaluate
+ import pytest
+ from deepeval import assert_test, evaluate
+ from deepeval.metrics import AnswerRelevancyMetric
Also applies to: 43-43
# synthesizer = Synthesizer()
# synthesizer.generate_goldens_from_docs(
#     document_paths=['natural_language_processing.txt', 'soldiers_home.pdf', 'trump.txt'],
#     max_goldens_per_document=5,
#     num_evolutions=5,
#     include_expected_output=True,
#     enable_breadth_evolve=True,
# )
#
# synthesizer.save_as(
#     file_type='json',  # or 'csv'
#     directory="./synthetic_data"
# )
Consider removing or refactoring the commented-out code if it's no longer needed.
It's generally a good practice to keep the codebase clean by removing unused code blocks. This can help improve the maintainability of the code.
import logging
from typing import List, Dict
Remove the redundant import of List, as it is already imported earlier in the file.
- from typing import List, Dict
+ from typing import Dict
Actionable comments posted: 4
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (3)
docker-compose.yml is excluded by !**/*.yml
evals/test_datasets/initial_test/soldiers_home.pdf is excluded by !**/*.pdf, !**/*.pdf
pyproject.toml is excluded by !**/*.toml
Files selected for processing (4)
- cognee/infrastructure/databases/graph/falkordb/adapter.py (1 hunks)
- evals/simple_rag_vs_cognee_eval.py (1 hunks)
- evals/test_datasets/initial_test/natural_language_processing.txt (1 hunks)
- evals/test_datasets/initial_test/trump.txt (1 hunks)
Additional Context Used
LanguageTool (5)
evals/test_datasets/initial_test/natural_language_processing.txt (1)
Near line 1: Possible missing comma found.
Context: ...formation and insights contained in the documents as well as categorize and organize the ...
evals/test_datasets/initial_test/trump.txt (4)
Near line 1: The preposition ‘to’ seems more likely in this position.
Context: ...onstitution – during a bombastic speech for the National Rifle Association in which...
Near line 5: Consider a different adjective to strengthen your wording.
Context: ...t all. I intend to serve four years and do a great job.” The 22nd amendment, which was enacte...
Near line 9: The preposition “about” seems more likely in this position.
Context: ... In his speech to the NRA, Trump spoke on abortion, immigration and criticized Ro...
Near line 9: Possible spelling mistake found.
Context: ...on abortion, immigration and criticized Robert F Kennedy Jr as being part of the “radical left”....
Ruff (7)
evals/simple_rag_vs_cognee_eval.py (7)
11-11: Module level import not at top of file
44-44: Module level import not at top of file
45-45: Module level import not at top of file
45-45: Redefinition of unused List from line 5
46-46: Module level import not at top of file
79-79: Local variable graph is assigned to but never used
131-131: Undefined name run_cognify_base_rag_and_search
Additional comments not posted (4)
evals/test_datasets/initial_test/natural_language_processing.txt (1)
2-2: This paragraph clearly outlines the challenges in NLP.
evals/test_datasets/initial_test/trump.txt (1)
5-5: The adjective "great" adequately conveys the intended meaning here.
evals/simple_rag_vs_cognee_eval.py (1)
131-131: Verify the existence of run_cognify_base_rag_and_search. It appears to be called but is not defined or imported in this file.
cognee/infrastructure/databases/graph/falkordb/adapter.py (1)
13-186: The FalcorDBAdapter class is well-implemented, with clear, asynchronous database operation methods and proper error handling.
@@ -0,0 +1,2 @@
Natural language processing (NLP) is an interdisciplinary subfield of computer science and information retrieval. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches. The goal is a computer capable of "understanding"[citation needed] the contents of documents, including the contextual nuances of the language within them. To this end, natural language processing often borrows ideas from theoretical linguistics. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.
Consider adding a comma after "documents" for better readability.
- ...contained in the documents as well as categorize...
+ ...contained in the documents, as well as categorize...
The 22nd amendment, which was enacted following Franklin Delano Rosevelt’s fourth term, limits the presidency to two terms.

In his speech to the NRA, Trump spoke on abortion, immigration and criticized Robert F Kennedy Jr as being part of the “radical left”. He also complained about the multiple criminal cases against him, including a gag order that bars him from commenting about witnesses in his ongoing New York City criminal trial.
Consider changing "on" to "about" to clarify the topics discussed.
- ...Trump spoke on abortion...
+ ...Trump spoke about abortion...
@@ -0,0 +1,15 @@
Donald Trump flirted with the idea of being president for three terms – a clear violation of the US constitution – during a bombastic speech for the National Rifle Association in which he vowed to reverse gun safety measures green-lighted during the Biden administration. |
Consider changing "for" to "to" to clarify the context of the speech.
- ...speech for the National Rifle Association...
+ ...speech to the National Rifle Association...
Actionable comments posted: 2
Outside diff range and nitpick comments (3)
cognee/modules/search/llm/extraction/categorize_relevant_category.py (1)
Lines 6-17: The implementation of categorize_relevant_category looks solid and follows async programming practices well. Consider reviewing the use of
cognee/modules/search/graph/search_categories.py (1)
Lines 18-52 (line range hint): The enhancements to search_categories to support different graph engines and configurations are well-implemented. Consider adding more detailed docstrings to explain the parameters and the return type more clearly.
cognee/api/v1/cognify/cognify.py (1)
Lines 83-149: The modifications to the cognify function to handle different datasets and graph interactions are comprehensive and well-implemented. Consider adding more comments to explain complex sections of the code for better maintainability.
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- cognee/api/v1/cognify/cognify.py (3 hunks)
- cognee/infrastructure/InfrastructureConfig.py (5 hunks)
- cognee/modules/search/graph/search_categories.py (2 hunks)
- cognee/modules/search/llm/extraction/categorize_relevant_category.py (1 hunks)
- evals/simple_rag_vs_cognee_eval.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- cognee/infrastructure/InfrastructureConfig.py
Additional Context Used
Ruff (11)
cognee/api/v1/cognify/cognify.py (3)
110-110: Local variable added_chunks is assigned to but never used
111-111: Local variable added__basic_rag_chunks is assigned to but never used
188-188: Do not use bare except
cognee/modules/search/graph/search_categories.py (2)
9-9: Module level import not at top of file
10-10: Module level import not at top of file
evals/simple_rag_vs_cognee_eval.py (6)
11-11: Module level import not at top of file
44-44: Module level import not at top of file
45-45: Module level import not at top of file
45-45: Redefinition of unused List from line 5
46-46: Module level import not at top of file
79-79: Local variable graph is assigned to but never used
Additional comments not posted (3)
cognee/modules/search/graph/search_categories.py (1)
Lines 12-16: The strip_exact_regex function is a useful utility for regex operations.
evals/simple_rag_vs_cognee_eval.py (1)
Lines 1-10: The new imports for handling evaluation datasets are correctly placed and used.
cognee/api/v1/cognify/cognify.py (1)
Lines 1-38 (line range hint): The new imports and initial setup in cognify.py are well-organized and relevant to the functionality.
from cognee.infrastructure.llm.get_llm_client import get_llm_client

dataset = EvaluationDataset()
dataset.add_test_cases_from_json_file(
    # file_path is the absolute path to your .json file
    file_path="./synthetic_data/20240519_185842.json",
    input_key_name="input",
    actual_output_key_name="actual_output",
    expected_output_key_name="expected_output",
    context_key_name="context"
)

print(dataset)
# from deepeval.synthesizer import Synthesizer
#
# synthesizer = Synthesizer(model="gpt-3.5-turbo")
#
# dataset = EvaluationDataset()
# dataset.generate_goldens_from_docs(
#     synthesizer=synthesizer,
#     document_paths=['natural_language_processing.txt', 'soldiers_home.pdf', 'trump.txt'],
#     max_goldens_per_document=10,
#     num_evolutions=5,
#     enable_breadth_evolve=True,
# )

print(dataset.goldens)
print(dataset)

import logging
from typing import List, Dict
from cognee.infrastructure import infrastructure_config

logger = logging.getLogger(__name__)

class AnswerModel(BaseModel):
    response: str

def get_answer_base(content: str, context: str, response_model: Type[BaseModel]):
    llm_client = get_llm_client()

    system_prompt = "THIS IS YOUR CONTEXT:" + str(context)

    return llm_client.create_structured_output(content, system_prompt, response_model)

def get_answer(content: str, context, model: Type[BaseModel] = AnswerModel):
    try:
        return get_answer_base(
            content,
            context,
            model
        )
    except Exception as error:
        logger.error("Error extracting cognitive layers from content: %s", error, exc_info = True)
        raise error
The implementation of dataset handling and model testing functions is robust.
Consider using more specific exception types in the get_answer function to handle errors more gracefully.
- except Exception as error:
+ except SpecificExceptionType as error:
Committable suggestion was skipped due to low confidence.
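The narrowing the bot suggests can be sketched as follows. The exception types here are illustrative, since the actual errors raised by create_structured_output depend on the LLM client:

```python
import logging

logger = logging.getLogger(__name__)

def parse_answer(raw: str) -> int:
    # Catch only the failures we expect, log them with context,
    # then re-raise so callers still see the original error.
    try:
        return int(raw.strip())
    except (ValueError, TypeError) as error:
        logger.error("Could not parse answer from %r: %s", raw, error, exc_info=True)
        raise
```

Unexpected exceptions (bugs, cancellation) then propagate untouched instead of being swallowed by a broad except Exception.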
async def process_text(chunk_collection: str, chunk_id: str, input_text: str, file_metadata: dict, document_id: str):
    print(f"Processing chunk ({chunk_id}) from document ({file_metadata['id']}).")

    graph_client = await get_graph_client(infrastructure_config.get_config()["graph_engine"])

    classified_categories = await get_content_categories(input_text)

    graph_topology = infrastructure_config.get_config()["graph_model"]
    if graph_topology == SourceCodeGraph:
        classified_categories = [{'data_type': 'text', 'category_name': 'Code and functions'}]
    elif graph_topology == KnowledgeGraph:
        classified_categories = await get_content_categories(input_text)
    else:
        classified_categories = [{'data_type': 'text', 'category_name': 'Unclassified text'}]

    # await add_label_nodes(graph_client, document_id, chunk_id, file_metadata["keywords"].split("|"))

    await add_classification_nodes(
        graph_client,
        parent_node_id = document_id,
        categories = classified_categories,
    )

    print(f"Chunk ({chunk_id}) classified.")

    content_summary = await get_content_summary(input_text)
    await add_summary_nodes(graph_client, document_id, content_summary)

    print(f"Chunk ({chunk_id}) summarized.")

    cognitive_layers = await get_cognitive_layers(input_text, classified_categories)
    cognitive_layers = (await add_cognitive_layers(graph_client, document_id, cognitive_layers))[:2]

    layer_graphs = await get_layer_graphs(input_text, cognitive_layers)
    await add_cognitive_layer_graphs(graph_client, chunk_collection, chunk_id, layer_graphs)

    if infrastructure_config.get_config()["connect_documents"] is True:
        db_engine = infrastructure_config.get_config()["database_engine"]
        relevant_documents_to_connect = db_engine.fetch_cognify_data(excluded_document_id = raw_document_id)

        list_of_nodes = []

        relevant_documents_to_connect.append({
            "layer_id": document_id,
        })

    cognitive_layers = cognitive_layers[:config.cognitive_layers_limit]

    try:
        cognitive_layers = (await add_cognitive_layers(graph_client, document_id, cognitive_layers))[:2]
        print("cognitive_layers", cognitive_layers)
        layer_graphs = await get_layer_graphs(input_text, cognitive_layers)
        await add_cognitive_layer_graphs(graph_client, chunk_collection, chunk_id, layer_graphs)
    except:
        pass

    # if infrastructure_config.get_config()["connect_documents"] is True:
The process_text function includes robust error handling and detailed logging.
Consider using specific exception types instead of a bare except to handle errors more gracefully.
- except:
+ except SpecificExceptionType:
Committable suggestion was skipped due to low confidence.
Actionable comments posted: 14
Outside diff range and nitpick comments (1)
cognee/modules/search/vector/search_similarity.py (1)
Lines 14-14: Consider removing debug print statements or replacing them with proper logging mechanisms before deploying to production. Also applies to: 23-24
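Replacing the prints with the standard logging module might look like this; the function body is a stand-in for illustration, not the module's real code:

```python
import logging

logger = logging.getLogger(__name__)

def search_similarity(query: str, results: list) -> list:
    # logger.debug replaces ad-hoc print() calls: the output can be
    # silenced or redirected by logging configuration instead of code edits.
    logger.debug("similarity search for %r returned %d results", query, len(results))
    return results
```

Production configs then control verbosity through handlers or logging.basicConfig(level=...), so debug noise stays off by default.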
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (11)
- cognee/api/v1/cognify/cognify.py (3 hunks)
- cognee/api/v1/search/search.py (3 hunks)
- cognee/modules/search/graph/search_adjacent.py (1 hunks)
- cognee/modules/search/graph/search_categories.py (2 hunks)
- cognee/modules/search/graph/search_cypher.py (1 hunks)
- cognee/modules/search/graph/search_neighbour.py (2 hunks)
- cognee/modules/search/llm/extraction/categorize_relevant_category.py (1 hunks)
- cognee/modules/search/vector/bm25.py (1 hunks)
- cognee/modules/search/vector/fusion.py (1 hunks)
- cognee/modules/search/vector/search_similarity.py (1 hunks)
- evals/simple_rag_vs_cognee_eval.py (1 hunks)
Files skipped from review due to trivial changes (2)
- cognee/modules/search/vector/bm25.py
- cognee/modules/search/vector/fusion.py
Files skipped from review as they are similar to previous changes (1)
- cognee/modules/search/llm/extraction/categorize_relevant_category.py
Additional Context Used
Ruff (10)
cognee/api/v1/cognify/cognify.py (2)
111-111: Local variable added__basic_rag_chunks is assigned to but never used
188-188: Do not use bare except
cognee/modules/search/graph/search_categories.py (2)
10-10: Module level import not at top of file
11-11: Module level import not at top of file
evals/simple_rag_vs_cognee_eval.py (6)
11-11: Module level import not at top of file
44-44: Module level import not at top of file
45-45: Module level import not at top of file
45-45: Redefinition of unused List from line 5
46-46: Module level import not at top of file
79-79: Local variable graph is assigned to but never used
await add("data://test_datasets", "initial_test")

graph = await cognify("initial_test")
Local variable graph is assigned to but never used.
- graph = await cognify("initial_test")
+ # Consider removing this line if 'graph' is not used.
logger = logging.getLogger(__name__)

class AnswerModel(BaseModel):
    response: str
The local variable response is annotated but never used within the AnswerModel. Consider removing it if it's not needed.
- response:str
+ # Remove this line if 'response' is not used.
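For context on this nitpick: a class-level annotation on a pydantic BaseModel is not a dead local variable; the framework reads it from __annotations__ to build the response field of the schema. A stdlib analogue with dataclasses illustrates the same mechanism:

```python
from dataclasses import dataclass

# Like pydantic's BaseModel, dataclass consumes class-level annotations
# from __annotations__ to generate fields, so the annotation is used
# even though it never appears in executable statements.
@dataclass
class Answer:
    response: str

answer = Answer(response="ok")
```

Removing the annotation would therefore remove the field itself, not just dead code.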
import logging
from typing import List, Dict
from cognee.infrastructure import infrastructure_config
Consolidate module-level imports and remove the redundant import of List.
- import logging
- from typing import List, Dict
- from cognee.infrastructure import infrastructure_config
+ import logging
+ from typing import Dict
+ from cognee.infrastructure import infrastructure_config
import dotenv
dotenv.load_dotenv()

from cognee.infrastructure.llm.get_llm_client import get_llm_client
Move module-level imports to the top of the file.
- from cognee.infrastructure.llm.get_llm_client import get_llm_client
+ # Move this import to the top of the file
Committable suggestion was skipped due to low confidence.
    except FileTypeException:
        logger.warning("File (%s) has an unknown file type. We are skipping it.", file_metadata["id"])

    added_chunks: list[tuple[str, str, dict]] = await add_data_chunks(data_chunks)
    added_chunks = await add_data_chunks(data_chunks)
    added__basic_rag_chunks = await add_data_chunks_basic_rag(data_chunks)
Local variable added__basic_rag_chunks is assigned to but never used.
- added__basic_rag_chunks = await add_data_chunks_basic_rag(data_chunks)
+ # Consider removing this line if 'added__basic_rag_chunks' is not used.
    else:
        raise ValueError("Unsupported graph engine type.")
Consider enhancing the error message to specify which graph engine type was attempted.
- raise ValueError("Unsupported graph engine type.")
+ raise ValueError(f"Unsupported graph engine type: {infrastructure_config.get_config()['graph_engine']}")
Committable suggestion was skipped due to low confidence.
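The suggested pattern, embedding the offending value in the ValueError, can be sketched like this; the engine names and dispatch dict are hypothetical, not cognee's actual configuration:

```python
def resolve_graph_engine(graph_engine: str) -> str:
    # Hypothetical dispatch table; the real code would return client objects.
    engines = {"networkx": "NetworkXAdapter", "neo4j": "Neo4jAdapter"}
    if graph_engine not in engines:
        # Naming the rejected value makes the failure self-diagnosing
        # without the caller having to inspect the configuration.
        raise ValueError(f"Unsupported graph engine type: {graph_engine!r}")
    return engines[graph_engine]
```

The same one-line change applies to each of the "Unsupported graph engine type" raises flagged below.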
@@ -22,7 +22,7 @@ async def search_adjacent(graph: Union[nx.Graph, any], query: str, infrastructur

    if node_id is None:
        return {}

    from cognee.infrastructure import infrastructure_config
Duplicate import statement for infrastructure_config. Consider removing it to clean up the code.
Enhance the error message to specify which graph engine type was attempted.
- raise ValueError("Unsupported graph engine type in the configuration.")
+ raise ValueError(f"Unsupported graph engine type in the configuration: {infrastructure_config.get_config()['graph_engine']}")
Committable suggestion was skipped due to low confidence.
@@ -20,26 +22,23 @@ async def search_neighbour(graph: Union[nx.Graph, any], id: str, infrastructure_
    Returns:
        - List[str]: A list of 'description' attributes of nodes that share the same 'layer_uuid' with the specified node.
    """
    node_id = other_param.get('node_id') if other_param else None
    from cognee.infrastructure import infrastructure_config
Duplicate import statement for infrastructure_config. Consider removing it to clean up the code.
Enhance the error message to specify which graph engine type was attempted.
- raise ValueError("Unsupported graph engine type in the configuration.")
+ raise ValueError(f"Unsupported graph engine type in the configuration: {infrastructure_config.get_config()['graph_engine']}")
Committable suggestion was skipped due to low confidence.
@@ -21,9 +36,22 @@ async def search_categories(graph: Union[nx.Graph, any], query_label: str, infra
        each representing a node with 'nodeId' and 'summary'.
    """
    # Determine which client is in use based on the configuration
    from cognee.infrastructure import infrastructure_config
Duplicate import statement for infrastructure_config. Consider removing it to clean up the code.
Enhance the error message to specify which graph engine type was attempted.
- raise ValueError("Unsupported graph engine type.")
+ raise ValueError(f"Unsupported graph engine type: {infrastructure_config.get_config()['graph_engine']}")
Committable suggestion was skipped due to low confidence.
    SUMMARY_CLASSIFICATION = 'SUMMARY_CLASSIFICATION'
    NODE_CLASSIFICATION = 'NODE_CLASSIFICATION'
    DOCUMENT_CLASSIFICATION = 'DOCUMENT_CLASSIFICATION',
    CYPHER = 'CYPHER'
Consider adding a default case in the from_str method to handle unexpected search types more gracefully.
- raise ValueError(f"{name} is not a valid SearchType")
+ raise ValueError(f"{name} is not a valid SearchType. Available types: {', '.join([e.name for e in SearchType])}")
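A from_str that lists the valid members, as suggested, might look like the sketch below. Note in passing that the trailing comma after DOCUMENT_CLASSIFICATION in the snippet above would make that member a one-element tuple rather than a string; the sketch drops it:

```python
from enum import Enum

class SearchType(Enum):
    SUMMARY_CLASSIFICATION = 'SUMMARY_CLASSIFICATION'
    NODE_CLASSIFICATION = 'NODE_CLASSIFICATION'
    DOCUMENT_CLASSIFICATION = 'DOCUMENT_CLASSIFICATION'  # no trailing comma
    CYPHER = 'CYPHER'

    @classmethod
    def from_str(cls, name: str) -> "SearchType":
        # Case-insensitive lookup by member name.
        try:
            return cls[name.upper()]
        except KeyError:
            # Listing the valid members lets the caller self-correct.
            valid = ", ".join(member.name for member in cls)
            raise ValueError(f"{name} is not a valid SearchType. Available types: {valid}") from None
```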
No description provided.