
continue.dev integration WIP #89

Merged · 23 commits · May 22, 2024
Conversation

@Vasilije1990 (Contributor) commented Apr 28, 2024

No description provided.

coderabbitai bot (Contributor) commented Apr 28, 2024

Walkthrough

The updates enhance several modules, focusing on document processing, graph topology, category search, and evaluation metrics. Changes include new imports, function modifications, attribute initialization, and improved logic for categorizing relevant categories.

Changes

File Path · Change Summary
  • cognee/api/v1/cognify/cognify.py · Added imports, initialized graph_topology, modified function calls, and commented out cognitive layers and telemetry.
  • cognee/infrastructure/InfrastructureConfig.py · Added and initialized graph_topology and monitoring_tool attributes; updated configuration methods.
  • cognee/modules/search/graph/search_categories.py · Added strip_exact_regex function, updated imports, modified search_categories function signature and logic.
  • cognee/modules/search/llm/extraction/... · Introduced categorize_relevant_category function for categorizing relevant categories using a language model.
  • cognee/infrastructure/databases/graph/falkordb/... · Enhanced FalkorDBAdapter with methods for adding nodes/edges, extracting nodes, filtering, and deleting nodes/graph.
  • evals/generate_test_set.py · Introduced functionality for generating evaluation datasets, synthesizing data, and evaluating metrics using deepeval.
  • evals/simple_rag_vs_cognee_eval.py · Added functions for evaluating models, retrieving answers, converting goldens to test cases, and evaluating metrics.
  • evals/test_datasets/initial_test/natural_language_processing.txt · Introduced an overview of Natural Language Processing (NLP).
  • evals/test_datasets/initial_test/trump.txt · Discussed Donald Trump's remarks on presidency, gun safety, and the NRA convention.

In the land of code, where data flows,
🐰 A rabbit tweaks where logic grows,
With nodes and graphs, a structure grand,
📚 Texts and queries hand in hand.
Evaluation metrics, precise and neat,
🌟 Enhancing systems, a coder's feat.
Celebrate the changes, big and small,
For in this realm, we stand tall! 🚀


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

gitguardian bot commented Apr 28, 2024

⚠️ GitGuardian has uncovered 4 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id | GitGuardian status | Secret | Commit | Filename
10531072 | Triggered | Generic High Entropy Secret | fec7010 | cognee/utils.py
10531072 | Triggered | Generic High Entropy Secret | fec7010 | cognee/utils.py
10593973 | Triggered | Generic High Entropy Secret | 79311ee | notebooks/full_run.ipynb
10593973 | Triggered | Generic High Entropy Secret | 8e850c1 | notebooks/full_run.ipynb
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely, following best practices for secret management.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
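The remediation guidance above boils down to never keeping secrets in tracked files. A minimal sketch of the usual alternative, reading them from the environment; the helper and variable names here are illustrative, not part of the cognee codebase:

```python
import os


def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read a secret from the environment instead of hardcoding it in source."""
    value = os.environ.get(var_name)
    if value is None:
        # Fail fast with a clear message rather than passing None downstream.
        raise RuntimeError(f"Missing required environment variable: {var_name}")
    return value
```

Secrets then live in the deployment environment (or a local, gitignored .env file loaded at startup), so rotating them never requires a commit.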


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Out of diff range and nitpick comments (1)
notebooks/full_run.ipynb (1)

81-81: Re-importing SourceCodeGraph and reconfiguring the graph model is redundant. Consider streamlining the notebook by removing repeated code to enhance readability and maintainability.

Comment on lines +1 to +11
'''
Given a string, find the length of the longest substring without repeating characters.

Examples:

Given "abcabcbb", the answer is "abc", which the length is 3.

Given "bbbbb", the answer is "b", with the length of 1.

Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.
'''

Ensure proper pairing of triple quotes in the comments for clarity.

- '''
+ """

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
'''
Given a string, find the length of the longest substring without repeating characters.
Examples:
Given "abcabcbb", the answer is "abc", which the length is 3.
Given "bbbbb", the answer is "b", with the length of 1.
Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.
'''
"""
Given a string, find the length of the longest substring without repeating characters.
Examples:
Given "abcabcbb", the answer is "abc", which the length is 3.
Given "bbbbb", the answer is "b", with the length of 1.
Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.
"""
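The docstring above specifies the classic longest-substring-without-repeats problem. A sliding-window solution, a standard textbook approach rather than necessarily the implementation in this PR, looks like:

```python
def length_of_longest_substring(s: str) -> int:
    """Length of the longest substring of s with no repeated characters."""
    last_seen = {}  # char -> index of its most recent occurrence
    start = 0       # left edge of the current window (no repeats inside it)
    best = 0
    for i, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= start:
            # ch repeats inside the window: slide the left edge past it.
            start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best
```

Each character is visited once, so the runtime is O(n) with O(k) extra space for the alphabet.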

Comment on lines +32 to +43
type_node_id = generate_node_id(node.type)
type_node = await graph_client.extract_node(type_node_id)

if not entity_type_node:
node_name = node.entity_type.lower().capitalize()
if not type_node:
node_name = node.type.lower().capitalize()

entity_type_node = (
entity_type_node_id,
type_node = (
type_node_id,
dict(
id = entity_type_node_id,
id = type_node_id,
name = node_name,
entity_type = node_name,
type = node_name,

The creation of type nodes with standardized 'type' and 'name' properties aligns with the PR's objectives. Consider using a more precise timestamp format or a timezone-aware datetime object for 'created_at' and 'updated_at' to avoid potential issues with time zone differences.

- created_at = datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
- updated_at = datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
+ from datetime import datetime, timezone
+ created_at = datetime.now(timezone.utc).isoformat(),
+ updated_at = datetime.now(timezone.utc).isoformat(),


Suggested change
type_node_id = generate_node_id(node.type)
type_node = await graph_client.extract_node(type_node_id)
if not entity_type_node:
node_name = node.entity_type.lower().capitalize()
if not type_node:
node_name = node.type.lower().capitalize()
entity_type_node = (
entity_type_node_id,
type_node = (
type_node_id,
dict(
id = entity_type_node_id,
id = type_node_id,
name = node_name,
entity_type = node_name,
type = node_name,
type_node_id = generate_node_id(node.type)
type_node = await graph_client.extract_node(type_node_id)
if not type_node:
node_name = node.type.lower().capitalize()
type_node = (
type_node_id,
dict(
id = type_node_id,
name = node_name,
type = node_name,
created_at = datetime.now(timezone.utc).isoformat(),
updated_at = datetime.now(timezone.utc).isoformat(),
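The timezone-aware timestamp suggested above can be sketched as a small helper (the function name is illustrative):

```python
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Return an ISO 8601 timestamp with an explicit UTC offset."""
    # e.g. "2024-05-22T10:15:30.123456+00:00"; unambiguous across time zones,
    # unlike the naive strftime("%Y-%m-%d %H:%M:%S") format it replaces.
    return datetime.now(timezone.utc).isoformat()
```

Aware timestamps also round-trip cleanly through datetime.fromisoformat, which naive local-time strings do not.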

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

Out of diff range and nitpick comments (2)
cognee/modules/search/graph/search_summary.py (1)

Line range hint 18-49: The function search_summary is well-designed to handle different graph engines and provides detailed documentation. However, ensure that the f-string in the Neo4j query section is correctly used or removed if not needed.

-        cypher_query = f"""
+        cypher_query = """
        MATCH (n)
        WHERE n.id CONTAINS $query AND EXISTS(n.summary)
        RETURN n.id AS nodeId, n.summary AS summary
        """
cognee/infrastructure/databases/graph/networkx/adapter.py (1)

Line range hint 134-134: Remove the unused variable error in the exception handling block to clean up the code.

- except Exception as error:
+ except Exception:
  logger.error("Failed to load graph from file: %s", file_path)
  # Initialize an empty graph in case of error
  self.graph = nx.MultiDiGraph()
  await self.save_graph_to_file(file_path)


from cognee.utils import render_graph

await render_graph(graph, include_color=True, include_nodes=True, include_size=True)
await render_graph(graph, include_color=True, include_nodes=False, include_size=False)

Undefined variable graph used in render_graph.

- await render_graph(graph, include_color=True, include_nodes=False, include_size=False)
+ # Ensure that 'graph' is defined before this line or pass the correct graph object.
+ # await render_graph(graph, include_color=True, include_nodes=False, include_size=False)


Suggested change
await render_graph(graph, include_color=True, include_nodes=False, include_size=False)
# Ensure that 'graph' is defined before this line or pass the correct graph object.
# await render_graph(graph, include_color=True, include_nodes=False, include_size=False)

import logging
from typing import List, Dict
from cognee.infrastructure import infrastructure_config
from.extraction.categorize_relevant_summary import categorize_relevant_summary

Ensure the import statement uses correct relative-import syntax: a space is missing after from. Note that removing the dot entirely would turn this into an absolute import and likely break package resolution, so the minimal fix is to add the space.

- from.extraction.categorize_relevant_summary import categorize_relevant_summary
+ from .extraction.categorize_relevant_summary import categorize_relevant_summary


""" Search categories in the graph and return their summary attributes. """

from cognee.shared.data_models import GraphDBType
from cognee.shared.data_models import GraphDBType, DefaultContentPrediction

Move the import statement for networkx as nx to the top of the file to follow Python's convention of grouping all imports at the beginning.

- import networkx as nx
+ # Moved to top


},
{
"ename": "CatalogException",
"evalue": "Catalog Error: Table with name file_metadata does not exist!\nDid you mean \"code_staging.file_metadata\"?",

Catalog Error: Verify the table name file_metadata.

- train_dataset = cognee.datasets.query_data("short_stories")
+ # Ensure the table name is correct or exists in the database configuration.
+ # train_dataset = cognee.datasets.query_data("short_stories")


"metadata": {},
"outputs": [
{
"ename": "NameError",

NameError: results is not defined.

- descriptions = {node: graph.nodes[node].get('summary', 'No summary available') for node in results}
+ # Ensure that 'results' is defined before this line.
+ # descriptions = {node: graph.nodes[node].get('summary', 'No summary available') for node in results}


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

Out of diff range and nitpick comments (2)
cognee/shared/data_models.py (1)

11-11: Update the type attribute documentation to reflect its purpose more clearly.

cognee/modules/cognify/graph/create.py (1)

Line range hint 183-183: The local variable relationship_data is assigned but never used. This could be a leftover from a previous version of the code. Consider removing it or implementing its intended use.

- relationship_data = {}
Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 8e850c1 and e1a9a23.
Files selected for processing (11)
  • cognee/api/v1/cognify/cognify.py (4 hunks)
  • cognee/api/v1/config/config.py (1 hunks)
  • cognee/config.py (2 hunks)
  • cognee/infrastructure/InfrastructureConfig.py (4 hunks)
  • cognee/infrastructure/data/chunking/LangchainChunkingEngine.py (1 hunks)
  • cognee/infrastructure/files/utils/get_file_metadata.py (1 hunks)
  • cognee/modules/cognify/graph/create.py (2 hunks)
  • cognee/modules/cognify/graph/initialize_graph.py (1 hunks)
  • cognee/shared/GithubTopology.py (1 hunks)
  • cognee/shared/data_models.py (7 hunks)
  • cognee/utils.py (2 hunks)
Files not reviewed due to errors (6)
  • cognee/shared/GithubTopology.py (no review received)
  • cognee/modules/cognify/graph/initialize_graph.py (no review received)
  • cognee/infrastructure/data/chunking/LangchainChunkingEngine.py (no review received)
  • cognee/api/v1/config/config.py (no review received)
  • cognee/config.py (no review received)
  • cognee/api/v1/cognify/cognify.py (no review received)
Additional Context Used
Ruff (4)
cognee/api/v1/cognify/cognify.py (1)

80-80: Local variable graph_topology is assigned to but never used

cognee/infrastructure/files/utils/get_file_metadata.py (1)

24-24: Do not use bare except

cognee/modules/cognify/graph/create.py (1)

183-183: Local variable relationship_data is assigned to but never used

cognee/shared/GithubTopology.py (1)

21-21: Undefined name Directory

Additional comments not posted (2)
cognee/infrastructure/InfrastructureConfig.py (1)

36-36: The implementation of InfrastructureConfig for managing system configurations is robust and well-structured.

cognee/shared/data_models.py (1)

252-254: Ensure that the ResponseSummaryModel class is integrated properly in the system, especially where document IDs and response summaries are handled.

Verification successful

The review of the ResponseSummaryModel integration within the system shows that the model is used in various parts of the system, particularly in functions that generate structured outputs from large language models. The acreate_structured_output and model_dump functions are designed to utilize the model in a generic way, which suggests that the handling of specific fields like document_id and response_summary would depend on the implementation of the ResponseSummaryModel itself.

Given the abstract nature of these functions and the lack of specific errors or misconfigurations in the script outputs, it appears that the integration is as expected. However, for a complete verification, a detailed review of how ResponseSummaryModel is structured and its interaction with these functions would be necessary.

  • cognee/modules/search/graph/search_summary.py: Uses ResponseSummaryModel in categorize_relevant_summary.
  • cognee/infrastructure/llm/: Contains implementations of acreate_structured_output that use ResponseSummaryModel.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of `ResponseSummaryModel` in the system.

# Test: Search for the usage of `ResponseSummaryModel`. Expect: Proper handling of document IDs and response summaries.
rg --type py 'ResponseSummaryModel'

Length of output: 397



Script:

#!/bin/bash
# Description: Inspect the usage of `ResponseSummaryModel` in the function `categorize_relevant_summary`.

# Test: Search for the implementation of `categorize_relevant_summary` to see how it handles `document_id` and `response_summary`.
rg --type py 'def categorize_relevant_summary' -A 20

Length of output: 1388



Script:

#!/bin/bash
# Description: Further inspect the usage of `ResponseSummaryModel` within `acreate_structured_output` and `model_dump`.

# Test: Search for the implementation of `acreate_structured_output` and `model_dump` to see how they handle `document_id` and `response_summary`.
rg --type py 'acreate_structured_output' -A 10
rg --type py 'model_dump' -A 10

Length of output: 13097

Comment on lines +24 to +25
except:
keywords = ["no keywords detected" + str(uuid.uuid4())]

Specify exception type in the except block to avoid masking unexpected errors.

- except:
+ except Exception as e:


Suggested change
except:
keywords = ["no keywords detected" + str(uuid.uuid4())]
except Exception as e:
keywords = ["no keywords detected" + str(uuid.uuid4())]
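The point of the suggestion above is that a bare except hides bugs (it even swallows KeyboardInterrupt and SystemExit). A hedged sketch of the narrower pattern, using a stand-in extractor since the real keyword-extraction call is not shown in this hunk:

```python
import logging
import uuid

logger = logging.getLogger(__name__)


def extract_keywords(text: str) -> list[str]:
    """Extract keywords, falling back to a unique placeholder on failure."""
    try:
        # Stand-in for the real extraction call in get_file_metadata.py.
        keywords = [word for word in text.split() if len(word) > 4]
        if not keywords:
            raise ValueError("no keywords found")
        return keywords
    except ValueError as error:  # catch only the failure we expect
        logger.warning("Keyword extraction failed: %s", error)
        return ["no keywords detected " + str(uuid.uuid4())]
```

Catching a concrete exception type lets genuinely unexpected errors (typos, wrong types) surface instead of being silently converted into placeholder data.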

@@ -36,13 +34,13 @@ class ChunkStrategy(Enum):
EXACT = "exact"
PARAGRAPH = "paragraph"
SENTENCE = "sentence"
CODE = "code"

Tip

Codebase Verification

Several files in the codebase do not handle the new CODE value in ChunkStrategy enum:

  • DefaultChunkEngine.py: This file handles PARAGRAPH, SENTENCE, and EXACT values but does not include a case for CODE.
Analysis chain

Addition of CODE to ChunkStrategy enum. Ensure that all usages of this enum are updated to handle this new value.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify all usages of `ChunkStrategy` enum handle the new `CODE` value.

# Test: Search for the enum usage. Expect: Handling of the new value in all occurrences.
rg --type py 'ChunkStrategy'

Length of output: 911
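To make the missing CODE branch concrete, here is an illustrative dispatch over the enum. The splitting logic is a placeholder only and does not mirror DefaultChunkEngine; the point is that every member, including the new CODE value, must have a handling branch:

```python
from enum import Enum


class ChunkStrategy(Enum):
    EXACT = "exact"
    PARAGRAPH = "paragraph"
    SENTENCE = "sentence"
    CODE = "code"


def chunk_text(text: str, strategy: ChunkStrategy) -> list[str]:
    """Toy chunker showing exhaustive handling of ChunkStrategy members."""
    if strategy is ChunkStrategy.PARAGRAPH:
        return [p for p in text.split("\n\n") if p]
    if strategy is ChunkStrategy.SENTENCE:
        return [s.strip() for s in text.split(".") if s.strip()]
    if strategy is ChunkStrategy.EXACT:
        return [text]
    if strategy is ChunkStrategy.CODE:
        # New member: line-based chunks until a real code splitter exists.
        return text.splitlines()
    raise ValueError(f"Unhandled chunk strategy: {strategy}")
```

The trailing raise guards against the same gap recurring when the next member is added.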

Comment on lines +23 to +24
print("Telemetry is disabled.")
logging.info("Telemetry is disabled.")

Logging and printing the same message about telemetry being disabled is redundant. Consider removing the print statement to clean up the console output.

- print("Telemetry is disabled.")


Suggested change
print("Telemetry is disabled.")
logging.info("Telemetry is disabled.")
logging.info("Telemetry is disabled.")

@@ -44,6 +44,7 @@ async def add_node(client, parent_id: Optional[str], node_id: str, node_data: di

# Add an edge if a parent ID is provided and the graph engine is NETWORKX
if parent_id and "default_relationship" in node_data and infrastructure_config.get_config()["graph_engine"] == GraphDBType.NETWORKX:
print("Node id", node_id)

Logging the node ID during the edge addition process might not be necessary and could clutter the log files. Consider removing this print statement unless it is essential for debugging.

- print("Node id", node_id)


Suggested change
print("Node id", node_id)

@@ -103,6 +104,7 @@ async def add_node(client, parent_id: Optional[str], node_id: str, node_data: di


async def add_edge(client, parent_id: Optional[str], node_id: str, node_data: dict, created_node_ids):
print('NODE ID', node_data)

The print statement here seems to be for debugging purposes. It's a good practice to use a logging framework instead of print statements for better control over logging levels and outputs.

- print('NODE ID', node_data)
+ logging.debug('NODE ID: %s', node_data)


Suggested change
print('NODE ID', node_data)
logging.debug('NODE ID: %s', node_data)
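The same advice applies throughout the PR: route debug output through the logging module so it can be filtered by level instead of always printing. A small sketch, with an illustrative function name:

```python
import logging

# Configure once at application startup, not in library code.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("cognee.graph")


def add_edge_debug(node_id: str, node_data: dict) -> None:
    """Emit the node details at DEBUG level; silent once the level is raised."""
    # %-style lazy formatting: the string is only built if DEBUG is enabled.
    logger.debug("NODE ID %s: %s", node_id, node_data)
```

Raising the level to INFO in production silences these lines with no code changes, which a print statement cannot offer.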

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between e1a9a23 and 01446de.
Files selected for processing (5)
  • cognee/api/v1/cognify/cognify.py (4 hunks)
  • cognee/infrastructure/llm/prompts/extract_topology.txt (1 hunks)
  • cognee/modules/topology/extraction/extract_topology.py (1 hunks)
  • cognee/modules/topology/infer_data_topology.py (1 hunks)
  • cognee/modules/topology/topology.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • cognee/infrastructure/llm/prompts/extract_topology.txt
Additional Context Used
Ruff (3)
cognee/api/v1/cognify/cognify.py (1)

88-88: Local variable graph_topology is assigned to but never used

cognee/modules/topology/topology.py (2)

7-7: Redefinition of unused BaseModel from line 4


8-8: Redefinition of unused Dict from line 5

Additional comments not posted (4)
cognee/modules/topology/extraction/extract_topology.py (1)

7-14: The implementation of extract_categories function looks clean and efficient.

cognee/modules/topology/infer_data_topology.py (1)

9-19: The implementation of infer_data_topology function is robust, handling default configurations and exceptions effectively.

cognee/modules/topology/topology.py (1)

71-120: The TopologyEngine class and its methods are well-implemented, providing clear functionality for managing repository topology.

cognee/api/v1/cognify/cognify.py (1)

Line range hint 17-201: The cognify function and the main test function are well-implemented, covering a comprehensive range of functionalities for processing and graph operations.

from pydantic import BaseModel, create_model
from typing import Dict, Type, Any

from pydantic import BaseModel, Field

Redefinition of BaseModel from line 4 is unnecessary and can be removed to avoid confusion.

- from pydantic import BaseModel, Field


Suggested change
from pydantic import BaseModel, Field

from typing import Dict, Type, Any

from pydantic import BaseModel, Field
from typing import Dict, List, Optional, Union

Redefinition of Dict from line 5 is unnecessary and can be removed to avoid confusion.

- from typing import Dict, List, Optional, Union


Suggested change
from typing import Dict, List, Optional, Union




graph_topology = infrastructure_config.get_config()["graph_topology"]

The variable graph_topology is assigned but never used, which could be removed to clean up the code.

- graph_topology = infrastructure_config.get_config()["graph_topology"]


Suggested change
graph_topology = infrastructure_config.get_config()["graph_topology"]

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

Out of diff range and nitpick comments (4)
cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)

Line range hint 6-6: Remove duplicate import of TextEmbedding.

- from fastembed import TextEmbedding

Line range hint 5-5: Remove duplicate import of AsyncOpenAI.

- from openai import AsyncOpenAI
cognee/modules/cognify/graph/add_label_nodes.py (1)

Line range hint 8-8: Remove the unused variable vector_client.

- vector_client = infrastructure_config.get_config("vector_engine")
cognee/modules/cognify/graph/create.py (1)

Line range hint 219-219: Remove the unused variable relationship_data.

- relationship_data = {}
Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 01446de and 7249c24.
Files ignored due to path filters (2)
  • poetry.lock is excluded by !**/*.lock, !**/*.lock
  • pyproject.toml is excluded by !**/*.toml
Files selected for processing (8)
  • cognee/api/v1/cognify/cognify.py (5 hunks)
  • cognee/api/v1/topology/add_topology.py (1 hunks)
  • cognee/infrastructure/data/chunking/DefaultChunkEngine.py (1 hunks)
  • cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3 hunks)
  • cognee/infrastructure/llm/prompts/extract_topology.txt (1 hunks)
  • cognee/modules/cognify/graph/add_label_nodes.py (2 hunks)
  • cognee/modules/cognify/graph/create.py (4 hunks)
  • cognee/modules/topology/topology.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • cognee/infrastructure/data/chunking/DefaultChunkEngine.py
Additional Context Used
LanguageTool (2)
cognee/infrastructure/llm/prompts/extract_topology.txt (2)

Near line 2: Inconsistent capitalization in contraction.
Context: ... the text provided to you. Relationship part'S can't be empty, and have to be logical ...


Near line 2: This verb does not appear to agree with the subject. Consider using a different form.
Context: ...Relationship part'S can't be empty, and have to be logical AND CONNECTING ELEMENTS O...

Ruff (11)
cognee/api/v1/cognify/cognify.py (1)

121-121: Local variable graph_topology is assigned to but never used

cognee/api/v1/topology/add_topology.py (2)

16-16: Local variable graph_topology is assigned to but never used


80-80: Local variable graph is assigned to but never used

cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)

7-7: Redefinition of unused TextEmbedding from line 6


8-8: Redefinition of unused AsyncOpenAI from line 5

cognee/modules/cognify/graph/add_label_nodes.py (1)

8-8: Local variable vector_client is assigned to but never used

cognee/modules/cognify/graph/create.py (3)

159-159: Local variable created_node_id is assigned to but never used


219-219: Local variable relationship_data is assigned to but never used


254-254: Local variable ids is assigned to but never used

cognee/modules/topology/topology.py (2)

7-7: Redefinition of unused BaseModel from line 4


8-8: Redefinition of unused Dict from line 5

Additional comments not posted (2)
cognee/modules/topology/topology.py (2)

7-7: Remove duplicate import of BaseModel.


8-8: Remove duplicate import of Dict.

@@ -0,0 +1,4 @@
You are a topology master and need to extract the following topology information from the text provided to you.
Relationship part'S can't be empty, and have to be logical AND CONNECTING ELEMENTS OF THE TOPOLOGY

Correct grammatical issues in the prompt.

- Relationship part'S can't be empty, and have to be logical AND CONNECTING ELEMENTS OF THE TOPOLOGY
+ Relationship parts can't be empty and have to be logical and connecting elements of the topology.


#
# await add("data://" +data_directory_path, "example")

graph = await add_topology()

Remove the unused variable graph.

- graph = await add_topology()
+ await add_topology()

@@ -109,78 +118,91 @@ async def process_text(chunk_collection: str, chunk_id: str, input_text: str, fi

graph_client = await get_graph_client(infrastructure_config.get_config()["graph_engine"])

graph_topology = infrastructure_config.get_config()["graph_topology"]

Remove the unused variable graph_topology.

- graph_topology = infrastructure_config.get_config()["graph_topology"]


if isinstance(value, BaseModel):
node_id = await generate_node_id(value)
node_data = value.model_dump()

# Use the specified default relationship for the edge between the parent node and the current node

# Add node to the graph
created_node_id = await add_node(graph_client, parent_id, node_id, node_data)

Remove the unused variable created_node_id.

- created_node_id = await add_node(graph_client, parent_id, node_id, node_data)



for attribute_name, attribute_value in graph_model:
ids = await process_attribute_edge(graph_client, root_id, attribute_name, attribute_value, flattened_and_deduplicated)

ids = await process_attribute(graph_client, root_id, attribute_name, attribute_value)

Remove the unused variable ids.

- ids = await process_attribute(graph_client, root_id, attribute_name, attribute_value)
+ await process_attribute(graph_client, root_id, attribute_name, attribute_value)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Out of diff range and nitpick comments (4)
cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)

Line range hint 6-6: Duplicate import of TextEmbedding from fastembed. Remove the redundant import to clean up the code.

- from fastembed import TextEmbedding

Line range hint 5-5: Duplicate import of AsyncOpenAI from openai. It's good practice to avoid duplicate imports to maintain code clarity.

- from openai import AsyncOpenAI
cognee/modules/cognify/graph/add_data_chunks.py (2)

Line range hint 13-13: The variable vector_client is assigned but never used in the uncommented sections of the code. If it's not needed, consider removing it to clean up the code.

- vector_client = infrastructure_config.get_config("vector_engine")

Line range hint 18-59: The commented-out error handling and retry logic could be important for robustness, especially in production environments. Consider implementing a more structured error handling and retry mechanism.

Would you like me to help implement a robust error handling and retry mechanism for this function?

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 7249c24 and 3c261ce.
Files selected for processing (6)
  • cognee/api/v1/cognify/cognify.py (7 hunks)
  • cognee/config.py (3 hunks)
  • cognee/infrastructure/data/chunking/LangchainChunkingEngine.py (1 hunks)
  • cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3 hunks)
  • cognee/modules/cognify/graph/add_data_chunks.py (3 hunks)
  • cognee/shared/GithubClassification.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • cognee/config.py
  • cognee/infrastructure/data/chunking/LangchainChunkingEngine.py
Additional Context Used
Ruff (5)
cognee/api/v1/cognify/cognify.py (1)

125-125: Local variable graph_topology is assigned to but never used

cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3)

7-7: Redefinition of unused TextEmbedding from line 6


8-8: Redefinition of unused AsyncOpenAI from line 5


40-40: Do not use bare except

cognee/modules/cognify/graph/add_data_chunks.py (1)

13-13: Local variable vector_client is assigned to but never used

Additional comments not posted (8)
cognee/shared/GithubClassification.py (4)

8-13: The enum TextSubclass is well-defined with clear, descriptive categories for different types of text content.


15-17: The ContentType class is simple and correctly uses Pydantic for data validation. The default type of "TEXT" is appropriate for the base class.


19-21: The TextContent class correctly inherits from ContentType and introduces a new field subclass which is a list of TextSubclass. This design supports multiple categorizations for text content, which is flexible and useful.


23-25: The CodeContentPrediction model is succinct and appropriate for predicting content types. It uses TextContent for the label field, which ensures that predictions are tied to the defined text categories.

cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (1)

Line range hint 16-66: The DefaultEmbeddingEngine and LiteLLMEmbeddingEngine classes are well-structured for handling text embeddings. Ensure that the verbose setting and configuration loading are appropriately managed to avoid side effects.

Verification successful

The review comment raised a concern about the potential side effects of the set_verbose setting used in the DefaultEmbeddingEngine class. The verification process involved searching for the usage of set_verbose across the application to ensure it does not alter any critical settings elsewhere.

The results from the executed script show that set_verbose is only used in the DefaultEmbeddingEngine.py file and is not found in any other parts of the application. This indicates that the setting is localized to this specific class and does not impact other components or configurations.

Therefore, the concern about unintended side effects from the set_verbose setting does not apply, as its usage is confined and does not interfere with other parts of the application.

  • File: cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py
  • Detail: The set_verbose = True setting is used only within this file and does not affect other application settings.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the verbose setting does not cause unintended side effects across the application.

# Test: Search for `set_verbose` usage in other parts of the application. Expect: No other critical settings are altered.
rg --type py 'set_verbose'

Length of output: 129

cognee/api/v1/cognify/cognify.py (3)

125-125: The variable graph_topology is assigned but never used, which could be removed to clean up the code.


215-215: Ensure that the graph variable is defined before it's used in render_graph. This is crucial to avoid runtime errors.


192-212: The test function is well-structured for testing the cognify process. However, ensure that all configurations and dependencies are correctly set up before running the tests to avoid issues.

print("text", text)
try:
text = str(text[0])
except:

Avoid using a bare except statement. It's better to specify the exception type to avoid catching unexpected exceptions and to make the error handling clearer.

- except:
+ except SomeSpecificException:

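A runnable sketch of the suggested fix (the function name and fallback behavior are illustrative, not from the PR): only the exceptions that indexing and string conversion can actually raise are caught, so unrelated failures still propagate.

```python
def first_as_text(text):
    """Return the first element as a string, falling back to the raw value.

    Catching (IndexError, TypeError) instead of a bare `except` lets
    unrelated errors (e.g. KeyboardInterrupt) propagate normally.
    """
    try:
        return str(text[0])
    except (IndexError, TypeError):
        return str(text)

print(first_as_text(["hello", "world"]))  # -> hello
print(first_as_text([]))                  # -> []  (IndexError path)
print(first_as_text(42))                  # -> 42  (TypeError path)
```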

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Out of diff range and nitpick comments (3)
cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)

Line range hint 14-29: Consider refactoring the lambda function for converting embeddings to lists for improved readability.

- embeddings_list = list(map(lambda embedding: embedding.tolist(), embedding_model.embed(text)))
+ embeddings_list = [embedding.tolist() for embedding in embedding_model.embed(text)]
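To illustrate the refactor without pulling in fastembed, the stub below mimics an embedding object with a .tolist() method and shows that the comprehension produces the same result as the map/lambda form (all names here are stand-ins, not fastembed's API):

```python
class FakeEmbedding:
    """Stand-in for an embedding result (fastembed returns numpy arrays)."""
    def __init__(self, values):
        self.values = values
    def tolist(self):
        return list(self.values)

def embed(texts):
    # Toy "model": one vector per input text.
    return [FakeEmbedding([float(len(t)), 0.0]) for t in texts]

# Original style: map + lambda.
embeddings_list = list(map(lambda e: e.tolist(), embed(["abc", "de"])))

# Suggested style: list comprehension, same result, easier to read.
embeddings_list_lc = [e.tolist() for e in embed(["abc", "de"])]

assert embeddings_list == embeddings_list_lc == [[3.0, 0.0], [2.0, 0.0]]
```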

42-46: Determine if the commented-out debug prints are necessary. If they are not, consider removing them to clean up the code.

cognee/modules/cognify/graph/add_cognitive_layer_graphs.py (1)

[!TIP]
Codebase Verification

Commented-out text processing functions are used elsewhere in the codebase

The commented-out text processing functions (extract_pos_tags, extract_named_entities, extract_sentiment_vader) are indeed used in other parts of the codebase, specifically in cognee/utils.py and cognee/infrastructure/data/utils/extract_keywords.py.

  • cognee/utils.py:
    • extract_pos_tags
    • extract_named_entities
    • extract_sentiment_vader
  • cognee/infrastructure/data/utils/extract_keywords.py:
    • extract_pos_tags

Given their usage in other files, it is important to retain these functions in the codebase. However, if the commented-out lines in add_cognitive_layer_graphs.py are not needed, they can be removed to clean up the code.

  • Consider removing the commented-out lines in add_cognitive_layer_graphs.py if they are no longer needed.
Analysis chain

Line range hint 144-267: Review the commented-out lines related to text processing. If they are no longer needed, consider removing them to clean up the code.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Check if the commented-out text processing functions are used elsewhere in the codebase.

# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type python $'extract_pos_tags|extract_named_entities|extract_sentiment_vader'

Length of output: 115



Script:

#!/bin/bash
# Description: Check if the commented-out text processing functions are used elsewhere in the codebase.

# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type py 'extract_pos_tags|extract_named_entities|extract_sentiment_vader'

Length of output: 1081

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 3c261ce and db0b19b.
Files ignored due to path filters (2)
  • poetry.lock is excluded by !**/*.lock, !**/*.lock
  • pyproject.toml is excluded by !**/*.toml
Files selected for processing (11)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/config.py (3 hunks)
  • cognee/infrastructure/InfrastructureConfig.py (4 hunks)
  • cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3 hunks)
  • cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2 hunks)
  • cognee/infrastructure/llm/generic_llm_api/adapter.py (2 hunks)
  • cognee/modules/cognify/graph/add_cognitive_layer_graphs.py (5 hunks)
  • cognee/modules/cognify/graph/add_data_chunks.py (2 hunks)
  • cognee/modules/cognify/graph/add_document_node.py (2 hunks)
  • cognee/modules/cognify/graph/add_label_nodes.py (1 hunks)
  • cognee/modules/data/extraction/knowledge_graph/extract_knowledge_graph.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • cognee/config.py
  • cognee/infrastructure/InfrastructureConfig.py
  • cognee/modules/cognify/graph/add_data_chunks.py
Additional Context Used
Ruff (4)
cognee/api/v1/cognify/cognify.py (3)

179-179: Local variable graph_topology is assigned to but never used


184-184: Local variable document_id is assigned to but never used


201-201: Local variable classified_categories is assigned to but never used

cognee/modules/data/extraction/knowledge_graph/extract_knowledge_graph.py (1)

21-21: Local variable error is assigned to but never used

Additional comments not posted (2)
cognee/modules/cognify/graph/add_document_node.py (1)

17-17: Correctly sets the document type to "Document".

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)

Line range hint 1-1: The implementation of LanceDBAdapter correctly handles interactions with the vector database.

@@ -25,6 +25,7 @@ async def add_document_node(graph_client: GraphDBInterface, parent_node_id, docu
dict(relationship_name = "has_document"),
)

await add_label_nodes(graph_client, document_id, document_metadata["keywords"].split("|"))
#
# await add_label_nodes(graph_client, document_id, document_metadata["keywords"].split("|"))

Tip

Codebase Verification

Consider whether the commented-out lines for adding label nodes are still needed. If not, they should be removed to clean up the code.

  • cognee/modules/cognify/graph/add_document_node.py: Line 29
  • cognee/api/v1/cognify/cognify.py: Line with # await add_label_nodes(graph_client, document_id, chunk_id, file_metadata["keywords"].split("|"))
Analysis chain

Consider whether the commented-out line for adding label nodes is still needed. If not, it should be removed to clean up the code.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Check if the function `add_label_nodes` is used elsewhere in the codebase.

# Test: Search for the function usage. Expect: No occurrences outside this file.
rg --type python $'add_label_nodes'

Length of output: 65



Script:

#!/bin/bash
# Description: Check if the function `add_label_nodes` is used elsewhere in the codebase.

# Test: Search for the function usage. Expect: No occurrences outside this file.
rg 'add_label_nodes' --glob '*.py'

Length of output: 579

@@ -21,4 +21,4 @@ def sync_extract_knowledge_graph():
except Exception as error:
# TODO: Log error to Sentry

return await extract_content_graph(text, cognitive_layer, graph_model)
return await extract_content_graph(text, cognitive_layer, graph_model)

Ensure proper error handling by logging the exception. Currently, the error is caught but not logged, which could hinder debugging.

- except Exception as error:
+ except Exception as error:
+     logger.error(f"Failed to extract knowledge graph: {error}")

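A minimal sketch of what such logging could look like (the logger name, the toy extract body, and the re-raise behavior are illustrative assumptions, not the PR's code); a logging.exception-style call also preserves the traceback that a plain f-string message would drop:

```python
import logging

logger = logging.getLogger("cognee.extract_knowledge_graph")

def extract(text):
    """Toy extraction step that logs failures instead of swallowing them."""
    try:
        if not text:
            raise ValueError("empty input")
        return {"nodes": [text]}
    except Exception as error:
        # logger.exception records the message AND the traceback at ERROR level.
        logger.exception("Failed to extract knowledge graph: %s", error)
        raise
```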

Comment on lines +64 to +66
except Exception as e:
# It's ok if the collection already exists.
pass
print(e)

Specify the exception type instead of using a generic exception, to avoid catching unexpected exceptions and to make the error handling clearer. (CollectionExistsException here stands in for whatever exception the vector client actually raises when the collection already exists; the redundant pass can also be dropped.)

- except Exception as e:
-     # It's ok if the collection already exists.
-     pass
-     print(e)
+ except CollectionExistsException as e:
+     # It's ok if the collection already exists.
+     print(e)

Comment on lines +91 to +104
async def acreate_structured_output(self, text_input: str, system_prompt: str, response_model: Type[BaseModel]) -> BaseModel:
    """Generate a response from a user query."""

    return await self.aclient.chat.completions.create(
-       model=self.model,
-       messages=[
+       model = self.model,
+       messages = [
            {
                "role": "user",
-               "content": f"""Use the given format to
-               extract information from the following input: {text_input}. {system_prompt} """,
-           }
+               "content": f"""Use the given format to
+               extract information from the following input: {text_input}. """,
+           },
+           {"role": "system", "content": system_prompt},
        ],
-       response_model=response_model,
+       response_model = response_model,

Simplify the string formatting in acreate_structured_output to reduce complexity and potential errors.

- "content": f"""Use the given format to
- extract information from the following input: {text_input}. """,
+ "content": "Use the given format to extract information from the following input: " + text_input

Suggested change

async def acreate_structured_output(self, text_input: str, system_prompt: str, response_model: Type[BaseModel]) -> BaseModel:
    """Generate a response from a user query."""

    return await self.aclient.chat.completions.create(
        model = self.model,
        messages = [
            {
                "role": "user",
                "content": "Use the given format to extract information from the following input: " + text_input,
            },
            {"role": "system", "content": system_prompt},
        ],
        response_model = response_model,
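The message layout in the suggested version can be sanity-checked without a live client; the helper below is a hypothetical, pure-Python stand-in that assembles the same payload, with the system prompt as its own message rather than interpolated into the user content:

```python
def build_messages(text_input: str, system_prompt: str) -> list:
    """Assemble the chat payload the suggested version would send.

    Keeping the system prompt in a separate "system" message keeps the
    two roles distinct and avoids nesting it inside the user f-string.
    """
    return [
        {
            "role": "user",
            "content": "Use the given format to extract information "
                       "from the following input: " + text_input,
        },
        {"role": "system", "content": system_prompt},
    ]

msgs = build_messages("graphs are sets of nodes and edges", "Return valid JSON.")
assert [m["role"] for m in msgs] == ["user", "system"]
```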

Comment on lines 144 to 297
# categories = classified_categories,
# )

classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]


#
# async def process_text(document_id: str, chunk_id: str, chunk_collection: str, input_text: str):
# raw_document_id = document_id.split("__")[-1]
#
# print(f"Processing chunk ({chunk_id}) from document ({raw_document_id}).")
#
# graph_client = await get_graph_client(infrastructure_config.get_config()["graph_engine"])
#
# classified_categories = await get_content_categories(input_text)
# await add_classification_nodes(
# graph_client,
# parent_node_id = document_id,
# categories = classified_categories,
# )
# >>>>>>> origin/main
#
# print(f"Chunk ({chunk_id}) classified.")
#
# # print("document_id", document_id)
# #
# # content_summary = await get_content_summary(input_text)
# # await add_summary_nodes(graph_client, document_id, content_summary)
#
# print(f"Chunk ({chunk_id}) summarized.")
# #
# cognitive_layers = await get_cognitive_layers(input_text, classified_categories)
# cognitive_layers = (await add_cognitive_layers(graph_client, document_id, cognitive_layers))[:2]
# #
# layer_graphs = await get_layer_graphs(input_text, cognitive_layers)
# await add_cognitive_layer_graphs(graph_client, chunk_collection, chunk_id, layer_graphs)
#
# <<<<<<< HEAD
# print("got here 4444")
#
# if infrastructure_config.get_config()["connect_documents"] is True:
# db_engine = infrastructure_config.get_config()["database_engine"]
# relevant_documents_to_connect = db_engine.fetch_cognify_data(excluded_document_id = file_metadata["id"])
#
# list_of_nodes = []
#
# relevant_documents_to_connect.append({
# "layer_id": document_id,
# })
#
# for document in relevant_documents_to_connect:
# node_descriptions_to_match = await graph_client.extract_node_description(document["layer_id"])
# list_of_nodes.extend(node_descriptions_to_match)
#
# nodes_by_layer = await group_nodes_by_layer(list_of_nodes)
#
# results = await resolve_cross_graph_references(nodes_by_layer)
#
# relationships = graph_ready_output(results)
#
# await connect_nodes_in_graph(
# graph_client,
# relationships,
# score_threshold = infrastructure_config.get_config()["intra_layer_score_treshold"]
# )
#
# send_telemetry("cognee.cognify")
#
# print(f"Chunk ({chunk_id}) cognified.")
# =======
# if infrastructure_config.get_config()["connect_documents"] is True:
# db_engine = infrastructure_config.get_config()["database_engine"]
# relevant_documents_to_connect = db_engine.fetch_cognify_data(excluded_document_id = raw_document_id)
#
# list_of_nodes = []
#
# relevant_documents_to_connect.append({
# "layer_id": document_id,
# })
#
# for document in relevant_documents_to_connect:
# node_descriptions_to_match = await graph_client.extract_node_description(document["layer_id"])
# list_of_nodes.extend(node_descriptions_to_match)
#
# nodes_by_layer = await group_nodes_by_layer(list_of_nodes)
#
# results = await resolve_cross_graph_references(nodes_by_layer)
#
# relationships = graph_ready_output(results)
#
# await connect_nodes_in_graph(
# graph_client,
# relationships,
# score_threshold = infrastructure_config.get_config()["intra_layer_score_treshold"]
# )
#
# send_telemetry("cognee.cognify")
#
# print(f"Chunk ({chunk_id}) cognified.")
# >>>>>>> origin/main

Tip

Codebase Verification

Commented-out code cleanup needed

The functions add_classification_nodes, get_content_categories, get_content_summary, and add_summary_nodes are defined and imported but are not used outside of the commented-out sections in cognify.py. Consider removing these commented-out lines to clean up the code.

  • cognee/api/v1/cognify/cognify.py: Lines containing commented-out code related to the mentioned functions.
Analysis chain

Review the commented-out lines and debug prints. If they are no longer needed, consider removing them to clean up the code.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Check if the commented-out functions are used elsewhere in the codebase.

# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type python $'add_classification_nodes|get_content_categories|get_content_summary|add_summary_nodes'

Length of output: 137



Script:

#!/bin/bash
# Description: Check if the commented-out functions are used elsewhere in the codebase.

# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type py $'add_classification_nodes|get_content_categories|get_content_summary|add_summary_nodes'

Length of output: 1559

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Out of diff range and nitpick comments (1)
cognee/api/v1/cognify/cognify.py (1)

[!TIP]
Codebase Verification

Unused Imports in cognee/api/v1/cognify/cognify.py

The following imports are not actively used in the code and should be considered for removal:

  • add_classification_nodes
  • add_summary_nodes
  • group_nodes_by_layer
  • graph_ready_output
  • connect_nodes_in_graph
  • resolve_cross_graph_references
  • get_content_categories
  • get_content_summary
  • get_cognitive_layers
  • get_layer_graphs
  • send_telemetry

These imports are either commented out or not referenced in the active code. Removing them can help clean up the codebase and improve maintainability.

Analysis chain

Line range hint 11-35: Ensure all new imports are utilized effectively in the code.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that all newly imported modules are used in the file.

# Test: Search for the usage of newly imported modules. Expect: At least one occurrence of each module.
rg --type python $'prune|Config|LangchainChunkingEngine|LiteLLMEmbeddingEngine|add_data_chunks|add_document_node|add_classification_nodes|add_cognitive_layer_graphs|add_summary_nodes|group_nodes_by_layer|graph_ready_output|connect_nodes_in_graph|resolve_cross_graph_references|get_graph_client|add_cognitive_layers|guess_file_type|extract_text_from_file|infrastructure_config|get_content_categories|get_content_summary|get_cognitive_layers|get_layer_graphs|TopologyEngine|CodeContentPrediction|ChunkStrategy|DefaultGraphModel|send_telemetry' cognee/api/v1/cognify/cognify.py

Length of output: 606



Script:

#!/bin/bash
# Description: Verify that all newly imported modules are used in the file.

# Test: Search for the usage of newly imported modules. Expect: At least one occurrence of each module.
rg $'prune|Config|LangchainChunkingEngine|LiteLLMEmbeddingEngine|add_data_chunks|add_document_node|add_classification_nodes|add_cognitive_layer_graphs|add_summary_nodes|group_nodes_by_layer|graph_ready_output|connect_nodes_in_graph|resolve_cross_graph_references|get_graph_client|add_cognitive_layers|guess_file_type|extract_text_from_file|infrastructure_config|get_content_categories|get_content_summary|get_cognitive_layers|get_layer_graphs|TopologyEngine|CodeContentPrediction|ChunkStrategy|DefaultGraphModel|send_telemetry' cognee/api/v1/cognify/cognify.py

Length of output: 5726

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between db0b19b and 4e6fcde.
Files selected for processing (1)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
Additional Context Used
Ruff (3)
cognee/api/v1/cognify/cognify.py (3)

184-184: Local variable parent_node_id is assigned to but never used


186-186: Local variable document_id is assigned to but never used


203-203: Local variable classified_categories is assigned to but never used

Additional comments not posted (2)
cognee/api/v1/cognify/cognify.py (2)

179-179: Remove unused variable graph_topology.

This issue has been flagged in previous reviews. Please ensure it is addressed.


328-328: Undefined variable graph used in render_graph.

This issue has been flagged in previous reviews. Please ensure it is addressed.

if graph_topology == "default":
parent_node_id = f"{file_metadata['name']}.{file_metadata['extension']}"
elif graph_topology == DefaultGraphModel:
parent_node_id = f"DefaultGraphModel__{USER_ID}"

Local variable parent_node_id is assigned but never used in the process_text function.

- parent_node_id = f"{file_metadata['name']}.{file_metadata['extension']}"

Consider removing this line if parent_node_id is not needed, or ensure it is used appropriately if intended.




print(f"Chunk ({chunk_id}) cognified.")
document_id = await add_document_node(

Local variable document_id is assigned but never used in the process_text function.

- document_id = await add_document_node(
-     graph_client,
-     parent_node_id = f"{file_metadata['name']}.{file_metadata['extension']}",
-     document_metadata = file_metadata,
- )

Consider removing this line if document_id is not needed, or ensure it is used appropriately if intended.


Committable suggestion


Suggested change
document_id = await add_document_node(

# categories = classified_categories,
# )

classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]

Local variable classified_categories is assigned but never used in the process_text function.

- classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]

Consider removing this line if classified_categories is not needed, or ensure it is used appropriately if intended.


Committable suggestion


Suggested change
classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 4e6fcde and 57e3e2e.
Files selected for processing (3)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/config.py (3 hunks)
  • cognee/modules/cognify/graph/add_document_node.py (2 hunks)
Files skipped from review as they are similar to previous changes (2)
  • cognee/config.py
  • cognee/modules/cognify/graph/add_document_node.py
Additional Context Used
Ruff (1)
cognee/api/v1/cognify/cognify.py (1)

188-188: Do not use bare except

Additional comments not posted (1)
cognee/api/v1/cognify/cognify.py (1)

227-255: Ensure proper asynchronous handling in the main testing block.

The main block uses asyncio.run(test()) which is correct for running async code from a synchronous context. However, ensure that all async functions within test() are awaited properly to avoid runtime errors.

Verification successful

Ensure proper asynchronous handling in the main testing block.

The main block uses asyncio.run(test()) which is correct for running async code from a synchronous context. However, ensure that all async functions within test() are awaited properly to avoid runtime errors.

  • The test function in cognee/api/v1/cognify/cognify.py correctly uses await for the cognify and render_graph functions.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify proper use of async/await in the main testing block.

# Test: Search for missing 'await' in async function calls within the main block.
ast-grep --lang python --pattern $'if __name__ == "__main__" { $$$ async def test() { $$$ } $$$ }'

Length of output: 444



Script:

#!/bin/bash
# Description: Verify proper use of async/await in the main testing block.

# Test: Search for missing 'await' in async function calls within the main block.
ast-grep --lang python --pattern $'if __name__ == "__main__":\n    async def test():\n        $$$\n    $$$'

Length of output: 4501
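The awaited-coroutine pattern verified above can be reduced to a self-contained sketch; the `cognify` and `render_graph` stubs here are hypothetical placeholders, not the project's real implementations:

```python
import asyncio

async def cognify():
    # Stand-in for the real pipeline step.
    return "graph"

async def render_graph(graph):
    # Stand-in for the real renderer.
    return f"rendered:{graph}"

async def test():
    # Every coroutine is awaited; a missing `await` here would leave an
    # un-run coroutine object behind and raise a RuntimeWarning.
    graph = await cognify()
    return await render_graph(graph)

if __name__ == "__main__":
    # asyncio.run is the correct entry point from synchronous code:
    # it creates, runs, and closes a fresh event loop.
    print(asyncio.run(test()))
```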

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 57e3e2e and 48d21c4.
Files selected for processing (2)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/api/v1/topology/add_topology.py (1 hunks)
Additional Context Used
Ruff (2)
cognee/api/v1/cognify/cognify.py (1)

188-188: Do not use bare except

cognee/api/v1/topology/add_topology.py (1)

16-16: Local variable graph_topology is assigned to but never used


graph_client = await get_graph_client(graph_db_type)

graph_topology = infrastructure_config.get_config()["graph_topology"]

Remove the unused variable graph_topology.

- graph_topology = infrastructure_config.get_config()["graph_topology"]

Committable suggestion


Suggested change
graph_topology = infrastructure_config.get_config()["graph_topology"]

Comment on lines +18 to +20
engine = TopologyEngine()
topology = await engine.infer_from_directory_structure(node_id =USER_ID , repository = directory, model=model)


Tip

Codebase Verification

Ensure proper error handling in infer_from_directory_structure

The function infer_from_directory_structure is defined in cognee/modules/topology/topology.py. However, the search for error handling patterns (e.g., try: blocks) did not yield any results within this function. This indicates that the function might lack proper error handling mechanisms.

Action Required:

  • Add appropriate error handling (e.g., try-except blocks) within the infer_from_directory_structure function to ensure robustness and reliability.

Location:

  • cognee/modules/topology/topology.py

Example:

async def infer_from_directory_structure(self, node_id: str, repository: str, model):
    try:
        # Your existing code logic
    except Exception as e:
        # Handle exceptions appropriately
        logger.error(f"Error in infer_from_directory_structure: {e}")
        raise

Please review and implement the necessary changes.

Analysis chain

Ensure proper error handling in infer_from_directory_structure.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify error handling in the function `infer_from_directory_structure`.

# Test: Search for the function implementation and check for error handling. Expect: Proper try-except blocks.
ast-grep --lang python --pattern $'async def infer_from_directory_structure($_, $_, $_) {
  $$$
}'

Length of output: 584



Script:

#!/bin/bash
# Description: Verify error handling in the function `infer_from_directory_structure`.

# Test: Search for the function implementation and check for error handling. Expect: Proper try-except blocks.
ast-grep --lang python --pattern $'async def infer_from_directory_structure(node_id, repository, model):\n    $$$'

Length of output: 114



Script:

#!/bin/bash
# Description: Locate the function `infer_from_directory_structure` and verify error handling.

# Step 1: Locate the function definition
rg 'async def infer_from_directory_structure' -t py

# Step 2: Check for error handling within the function
rg 'try:' -A 10 -B 10 -t py

Length of output: 50516

Comment on lines +21 to +30
def flatten_model(model: BaseModel, parent_id: Optional[str] = None) -> Dict[str, Any]:
    """Flatten a single Pydantic model to a dictionary handling nested structures."""
    result = {**model.dict(), 'parent_id': parent_id}
    if hasattr(model, 'default_relationship') and model.default_relationship:
        result.update({
            'relationship_type': model.default_relationship.type,
            'relationship_source': model.default_relationship.source,
            'relationship_target': model.default_relationship.target
        })
    return result

Refactor the flatten_model function to improve readability and maintainability.

- result = {**model.dict(), 'parent_id': parent_id}
- if hasattr(model, 'default_relationship') and model.default_relationship:
-     result.update({
-         'relationship_type': model.default_relationship.type,
-         'relationship_source': model.default_relationship.source,
-         'relationship_target': model.default_relationship.target
-     })
+ result = model.dict()
+ result['parent_id'] = parent_id
+ if model.default_relationship:
+     result['relationship_type'] = model.default_relationship.type
+     result['relationship_source'] = model.default_relationship.source
+     result['relationship_target'] = model.default_relationship.target

Committable suggestion


Suggested change
def flatten_model(model: BaseModel, parent_id: Optional[str] = None) -> Dict[str, Any]:
    """Flatten a single Pydantic model to a dictionary handling nested structures."""
    result = {**model.dict(), 'parent_id': parent_id}
    if hasattr(model, 'default_relationship') and model.default_relationship:
        result.update({
            'relationship_type': model.default_relationship.type,
            'relationship_source': model.default_relationship.source,
            'relationship_target': model.default_relationship.target
        })
    return result
def flatten_model(model: BaseModel, parent_id: Optional[str] = None) -> Dict[str, Any]:
    """Flatten a single Pydantic model to a dictionary handling nested structures."""
    result = model.dict()
    result['parent_id'] = parent_id
    if model.default_relationship:
        result['relationship_type'] = model.default_relationship.type
        result['relationship_source'] = model.default_relationship.source
        result['relationship_target'] = model.default_relationship.target
    return result

Comment on lines +32 to +43
def recursive_flatten(items: Union[List[Any], BaseModel], parent_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Recursively flatten nested Pydantic models or lists of models."""
    if isinstance(items, list):
        return [entry for item in items for entry in recursive_flatten(item, parent_id)]
    elif isinstance(items, BaseModel):
        flat = [flatten_model(items, parent_id)]
        for field, value in items:
            if isinstance(value, (BaseModel, list)):
                flat.extend(recursive_flatten(value, items.dict().get('node_id', None)))
        return flat
    else:
        return []

Optimize the recursive_flatten function to handle different data types more efficiently.

- if isinstance(items, list):
-     return [entry for item in items for entry in recursive_flatten(item, parent_id)]
- elif isinstance(items, BaseModel):
-     flat = [flatten_model(items, parent_id)]
-     for field, value in items:
-         if isinstance(value, (BaseModel, list)):
-             flat.extend(recursive_flatten(value, items.dict().get('node_id', None)))
-     return flat
- else:
-     return []
+ return [flatten_model(item, parent_id) if isinstance(item, BaseModel) else recursive_flatten(item, parent_id) for item in items] if isinstance(items, list) else [flatten_model(items, parent_id)]

Committable suggestion


Suggested change
def recursive_flatten(items: Union[List[Any], BaseModel], parent_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Recursively flatten nested Pydantic models or lists of models."""
    if isinstance(items, list):
        return [entry for item in items for entry in recursive_flatten(item, parent_id)]
    elif isinstance(items, BaseModel):
        flat = [flatten_model(items, parent_id)]
        for field, value in items:
            if isinstance(value, (BaseModel, list)):
                flat.extend(recursive_flatten(value, items.dict().get('node_id', None)))
        return flat
    else:
        return []
def recursive_flatten(items: Union[List[Any], BaseModel], parent_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Recursively flatten nested Pydantic models or lists of models."""
    return [flatten_model(item, parent_id) if isinstance(item, BaseModel) else recursive_flatten(item, parent_id) for item in items] if isinstance(items, list) else [flatten_model(items, parent_id)]
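The recursion being discussed can be exercised without Pydantic by modelling nodes as plain dicts; everything below, including the field names, is an illustrative sketch rather than the project's actual helpers:

```python
from typing import Any, Dict, List, Optional, Union

def flatten_node(node: Dict[str, Any], parent_id: Optional[str] = None) -> Dict[str, Any]:
    """Copy scalar fields of one node and record its parent link."""
    flat = {k: v for k, v in node.items() if not isinstance(v, (dict, list))}
    flat["parent_id"] = parent_id
    return flat

def recursive_flatten(
    items: Union[List[Any], Dict[str, Any]],
    parent_id: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Depth-first flatten of nested dicts/lists into one list of rows."""
    if isinstance(items, list):
        # Flatten each element and splice the results into one flat list.
        return [row for item in items for row in recursive_flatten(item, parent_id)]
    if isinstance(items, dict):
        rows = [flatten_node(items, parent_id)]
        for value in items.values():
            if isinstance(value, (dict, list)):
                # Children inherit this node's id as their parent_id.
                rows.extend(recursive_flatten(value, items.get("node_id")))
        return rows
    return []
```

Note that the per-branch `if/elif` shape keeps the list case from producing nested lists, which is the subtlety any compressed one-liner version has to preserve.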

Comment on lines +62 to +63
if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
    await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])

Add error handling for add_edge to manage potential failures gracefully.

- if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
-     await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])
+ if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
+     try:
+         await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])
+     except Exception as e:
+         logging.error(f"Failed to add edge: {e}")

Committable suggestion


Suggested change
if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
    await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])
if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
    try:
        await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])
    except Exception as e:
        logging.error(f"Failed to add edge: {e}")

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 48d21c4 and 4488339.
Files selected for processing (1)
  • .github/workflows/test_common.yml (2 hunks)
Files skipped from review due to trivial changes (1)
  • .github/workflows/test_common.yml

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Out of diff range and nitpick comments (5)
cognee/infrastructure/llm/openai/adapter.py (4)

Line range hint 23-25: Encapsulate API key management to enhance security and maintainability.

Direct manipulation of openai.api_key in the constructor can lead to security risks and makes the code less maintainable. Consider using a method or a separate configuration manager to handle API keys.


Line range hint 29-31: Make retry attempts configurable to adapt to different runtime environments.

The number of retry attempts is hard-coded in several methods. Making this configurable via an environment variable or configuration file would allow easier adjustments based on the deployment environment.

Also applies to: 33-35, 37-39, 41-43, 45-47


Line range hint 37-39: Use a constant for the default model to avoid repetition and potential errors.

The default model string "text-embedding-3-large" is repeated in multiple methods. Defining this as a constant at the class or module level would reduce repetition and help prevent errors if the model name changes.

Also applies to: 41-43


Line range hint 57-59: Improve the formatting of the user input in the message for clarity.

The user input is embedded in a larger string which might make it hard to read. Consider formatting it on a new line or using a clearer delimiter.

cognee/infrastructure/InfrastructureConfig.py (1)

Line range hint 36-149: Add error handling and validation for configuration parameters to enhance robustness.

The get_config and set_config methods handle a wide range of configurations. Adding error handling and validation for the input parameters would make these methods more robust and prevent potential runtime errors.

Also applies to: 204-206

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 4488339 and d1fa826.
Files ignored due to path filters (2)
  • docker-compose.yml is excluded by !**/*.yml
  • pyproject.toml is excluded by !**/*.toml
Files selected for processing (4)
  • cognee/config.py (3 hunks)
  • cognee/infrastructure/InfrastructureConfig.py (4 hunks)
  • cognee/infrastructure/llm/openai/adapter.py (1 hunks)
  • cognee/shared/data_models.py (7 hunks)
Files skipped from review as they are similar to previous changes (2)
  • cognee/config.py
  • cognee/shared/data_models.py
Additional comments not posted (1)
cognee/infrastructure/InfrastructureConfig.py (1)

35-36: Ensure consistent handling of graph_topology across different methods.

The graph_topology attribute is set in multiple places within the InfrastructureConfig class. Verify that it is consistently handled across all methods to avoid configuration inconsistencies.

Also applies to: 82-84, 149-149, 204-206

Comment on lines 16 to 19
if config.monitoring_tool == MonitoringTool.LANGFUSE:
    from langfuse.openai import AsyncOpenAI, OpenAI
else:
    from openai import AsyncOpenAI, OpenAI

Consider using a more dynamic import strategy for AsyncOpenAI and OpenAI.

Using conditional imports based on configuration can lead to code that is harder to maintain and test. Consider using a factory pattern or dependency injection to handle different configurations.
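One way to sketch that factory approach: resolve the client class from configuration in a single registry lookup instead of branching on imports at module load. The class names and the `make_client` helper below are hypothetical stand-ins, not the adapter's real API:

```python
# Hypothetical stand-ins for the plain and Langfuse-wrapped OpenAI clients.
class OpenAIClient:
    source = "openai"

class LangfuseOpenAIClient:
    source = "langfuse.openai"

def make_client(monitoring_tool: str):
    """Pick the client implementation in one place.

    New monitoring backends only require a registry entry, and tests can
    inject a fake class without patching module-level imports.
    """
    registry = {
        "langfuse": LangfuseOpenAIClient,
    }
    return registry.get(monitoring_tool, OpenAIClient)()
```

In the real adapter the registry values would be import callables (or injected dependencies) so that optional packages like `langfuse` are only imported when actually selected.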

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between d1fa826 and 352ea25.
Files selected for processing (3)
  • cognee/infrastructure/llm/generic_llm_api/adapter.py (2 hunks)
  • cognee/infrastructure/llm/openai/adapter.py (1 hunks)
  • cognee/shared/data_models.py (7 hunks)
Files skipped from review as they are similar to previous changes (3)
  • cognee/infrastructure/llm/generic_llm_api/adapter.py
  • cognee/infrastructure/llm/openai/adapter.py
  • cognee/shared/data_models.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between d1fa826 and 2657aa7.
Files ignored due to path filters (2)
  • evals/soldiers_home.pdf is excluded by !**/*.pdf, !**/*.pdf
  • pyproject.toml is excluded by !**/*.toml
Files selected for processing (6)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/modules/cognify/graph/add_data_chunks.py (3 hunks)
  • evals/generate_test_set.py (1 hunks)
  • evals/natural_language_processing.txt (1 hunks)
  • evals/simple_rag_vs_cognee_eval.py (1 hunks)
  • evals/trump.txt (1 hunks)
Files skipped from review due to trivial changes (1)
  • evals/natural_language_processing.txt
Files skipped from review as they are similar to previous changes (1)
  • cognee/modules/cognify/graph/add_data_chunks.py
Additional Context Used
Ruff (15)
cognee/api/v1/cognify/cognify.py (4)

110-110: Local variable added_chunks is assigned to but never used


111-111: Local variable added__basic_rag_chunks is assigned to but never used


188-188: Do not use bare except


263-263: Local variable graph is assigned to but never used

evals/generate_test_set.py (4)

34-34: Module level import not at top of file


35-35: Module level import not at top of file


36-36: Module level import not at top of file


41-41: Module level import not at top of file

evals/simple_rag_vs_cognee_eval.py (7)

11-11: Module level import not at top of file


26-26: Module level import not at top of file


27-27: Module level import not at top of file


27-27: Redefinition of unused List from line 5


28-28: Module level import not at top of file


33-33: Local variable response is annotated but never used


71-71: Module level import not at top of file

Additional comments not posted (3)
evals/generate_test_set.py (2)

1-6: Imports and dotenv setup are correctly placed at the top of the file.


23-27: The method generate_goldens_from_docs is used to generate goldens from documents. Ensure that the document paths and parameters are correctly configured for your use case.

evals/trump.txt (1)

1-15: The content of the file provides a detailed account of Donald Trump's speech. Ensure that all factual information is accurate and up-to-date.

Comment on lines +39 to +41
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)

from deepeval import evaluate

The instantiation of AnswerRelevancyMetric and the import of evaluate are correctly placed. However, consider moving all imports to the top of the file.

- from deepeval import evaluate
+ # Move this import to the top of the file

Committable suggestion


Suggested change
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)
from deepeval import evaluate
from deepeval import evaluate
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)

dataset.test_cases = convert_goldens_to_test_cases(dataset.goldens)


from deepeval.metrics import HallucinationMetric

The import of HallucinationMetric should be moved to the top of the file to maintain consistency and readability.

- from deepeval.metrics import HallucinationMetric
+ # Move this import to the top of the file

Committable suggestion


Suggested change
from deepeval.metrics import HallucinationMetric

Comment on lines +188 to +189
except:
    pass

Avoid using a bare except statement as it can catch unexpected exceptions and obscure real errors. Specify the exception type or handle the general exception more gracefully.

- except:
+ except Exception as e:
+     logger.error("Failed during cognitive layer processing: %s", str(e))

Committable suggestion


Suggested change
except:
    pass
except Exception as e:
    logger.error("Failed during cognitive layer processing: %s", str(e))
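A minimal, runnable illustration of why the narrower handler is preferable; the `process_layer` step and the logger wiring here are hypothetical, not the project's real code:

```python
import logging

logger = logging.getLogger(__name__)

def process_layer(layer):
    """Stand-in for one cognitive-layer processing step."""
    if layer is None:
        raise ValueError("empty cognitive layer")
    return layer.upper()

def run(layers):
    results = []
    for layer in layers:
        try:
            results.append(process_layer(layer))
        except Exception as e:
            # A bare `except:` would also swallow KeyboardInterrupt and
            # SystemExit, and would hide the failure entirely with `pass`.
            # Catching Exception and logging keeps the error visible while
            # letting the loop continue.
            logger.error("Failed during cognitive layer processing: %s", e)
    return results
```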


config.set_graph_model(SourceCodeGraph)
config.set_classification_model(CodeContentPrediction)
graph = await cognify()

The local variable graph is assigned but never used. Consider removing this assignment if it's not needed.

- graph = await cognify()
+ # Consider removing this line if 'graph' is not used.

Committable suggestion


Suggested change
graph = await cognify()
# Consider removing this line if 'graph' is not used.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 2657aa7 and d099cae.
Files selected for processing (2)
  • evals/generate_test_set.py (1 hunks)
  • evals/simple_rag_vs_cognee_eval.py (1 hunks)
Additional Context Used
Ruff (10)
evals/generate_test_set.py (4)

36-36: Module level import not at top of file


37-37: Module level import not at top of file


38-38: Module level import not at top of file


43-43: Module level import not at top of file

evals/simple_rag_vs_cognee_eval.py (6)

11-11: Module level import not at top of file


44-44: Module level import not at top of file


45-45: Module level import not at top of file


45-45: Redefinition of unused List from line 5


46-46: Module level import not at top of file


97-97: Module level import not at top of file

Comment on lines +36 to +38
import pytest
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric

Move module-level imports to the top of the file to improve readability and maintain Python conventions.

- import pytest
- from deepeval import assert_test
- from deepeval.metrics import AnswerRelevancyMetric
- from deepeval import evaluate
+ import pytest
+ from deepeval import assert_test, evaluate
+ from deepeval.metrics import AnswerRelevancyMetric

Also applies to: 43-43


Committable suggestion


Suggested change
import pytest
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
import pytest
from deepeval import assert_test, evaluate
from deepeval.metrics import AnswerRelevancyMetric

Comment on lines +8 to +20
# synthesizer = Synthesizer()
# synthesizer.generate_goldens_from_docs(
# document_paths=['natural_language_processing.txt', 'soldiers_home.pdf', 'trump.txt'],
# max_goldens_per_document=5,
# num_evolutions=5,
# include_expected_output=True,
# enable_breadth_evolve=True,
# )
#
# synthesizer.save_as(
# file_type='json', # or 'csv'
# directory="./synthetic_data"
# )

Consider removing or refactoring the commented-out code if it's no longer needed.

It's generally a good practice to keep the codebase clean by removing unused code blocks. This can help improve the maintainability of the code.



import logging
from typing import List, Dict

Remove the redundant import of List as it is already imported earlier in the file.

- from typing import List, Dict
+ from typing import Dict

Committable suggestion


Suggested change
from typing import List, Dict
from typing import Dict

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between d099cae and 8ef2373.
Files ignored due to path filters (3)
  • docker-compose.yml is excluded by !**/*.yml
  • evals/test_datasets/initial_test/soldiers_home.pdf is excluded by !**/*.pdf, !**/*.pdf
  • pyproject.toml is excluded by !**/*.toml
Files selected for processing (4)
  • cognee/infrastructure/databases/graph/falkordb/adapter.py (1 hunks)
  • evals/simple_rag_vs_cognee_eval.py (1 hunks)
  • evals/test_datasets/initial_test/natural_language_processing.txt (1 hunks)
  • evals/test_datasets/initial_test/trump.txt (1 hunks)
Additional Context Used
LanguageTool (5)
evals/test_datasets/initial_test/natural_language_processing.txt (1)

Near line 1: Possible missing comma found.
Context: ...formation and insights contained in the documents as well as categorize and organize the ...

evals/test_datasets/initial_test/trump.txt (4)

Near line 1: The preposition ‘to’ seems more likely in this position.
Context: ...onstitution – during a bombastic speech for the National Rifle Association in which...


Near line 5: Consider a different adjective to strengthen your wording.
Context: ...t all. I intend to serve four years and do a great job.” The 22nd amendment, which was enacte...


Near line 9: The preposition “about” seems more likely in this position.
Context: ... In his speech to the NRA, Trump spoke on abortion, immigration and criticized Ro...


Near line 9: Possible spelling mistake found.
Context: ...on abortion, immigration and criticized Robert F Kennedy Jr as being part of the “radical left”....

Ruff (7)
evals/simple_rag_vs_cognee_eval.py (7)

11-11: Module level import not at top of file


44-44: Module level import not at top of file


45-45: Module level import not at top of file


45-45: Redefinition of unused List from line 5


46-46: Module level import not at top of file


79-79: Local variable graph is assigned to but never used


131-131: Undefined name run_cognify_base_rag_and_search

Additional comments not posted (4)
evals/test_datasets/initial_test/natural_language_processing.txt (1)

2-2: This paragraph clearly outlines the challenges in NLP.

evals/test_datasets/initial_test/trump.txt (1)

5-5: The adjective "great" adequately conveys the intended meaning here.

evals/simple_rag_vs_cognee_eval.py (1)

131-131: Verify the existence of run_cognify_base_rag_and_search. It appears to be called but is not defined or imported in this file.

cognee/infrastructure/databases/graph/falkordb/adapter.py (1)

13-186: The FalcorDBAdapter class is well-implemented with clear, asynchronous database operation methods and proper error handling.

@@ -0,0 +1,2 @@
Natural language processing (NLP) is an interdisciplinary subfield of computer science and information retrieval. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches. The goal is a computer capable of "understanding"[citation needed] the contents of documents, including the contextual nuances of the language within them. To this end, natural language processing often borrows ideas from theoretical linguistics. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Consider adding a comma after "documents" for better readability.

- ...contained in the documents as well as categorize...
+ ...contained in the documents, as well as categorize...



The 22nd amendment, which was enacted following Franklin Delano Roosevelt’s fourth term, limits the presidency to two terms.

In his speech to the NRA, Trump spoke on abortion, immigration and criticized Robert F Kennedy Jr as being part of the “radical left”. He also complained about the multiple criminal cases against him, including a gag order that bars him from commenting about witnesses in his ongoing New York City criminal trial.

Consider changing "on" to "about" to clarify the topics discussed.

- ...Trump spoke on abortion...
+ ...Trump spoke about abortion...


@@ -0,0 +1,15 @@
Donald Trump flirted with the idea of being president for three terms – a clear violation of the US constitution – during a bombastic speech for the National Rifle Association in which he vowed to reverse gun safety measures green-lighted during the Biden administration.

Consider changing "for" to "to" to clarify the context of the speech.

- ...speech for the National Rifle Association...
+ ...speech to the National Rifle Association...


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Outside diff range and nitpick comments (3)
cognee/modules/search/llm/extraction/categorize_relevant_category.py (1)

6-17: The implementation of categorize_relevant_category looks solid and follows async programming practices well.

Consider reviewing the use of print statements for sensitive data logging in a production environment.

cognee/modules/search/graph/search_categories.py (1)

Line range hint 18-52: The enhancements to search_categories to support different graph engines and configurations are well-implemented.

Consider adding more detailed docstrings to explain the parameters and the return type more clearly.

cognee/api/v1/cognify/cognify.py (1)

83-149: The modifications to the cognify function to handle different datasets and graph interactions are comprehensive and well-implemented.

Consider adding more comments to explain complex sections of the code for better maintainability.

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 8ef2373 and 63356f2.
Files selected for processing (5)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/infrastructure/InfrastructureConfig.py (5 hunks)
  • cognee/modules/search/graph/search_categories.py (2 hunks)
  • cognee/modules/search/llm/extraction/categorize_relevant_category.py (1 hunks)
  • evals/simple_rag_vs_cognee_eval.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • cognee/infrastructure/InfrastructureConfig.py
Additional Context Used
Ruff (11)
cognee/api/v1/cognify/cognify.py (3)

110-110: Local variable added_chunks is assigned to but never used


111-111: Local variable added__basic_rag_chunks is assigned to but never used


188-188: Do not use bare except

cognee/modules/search/graph/search_categories.py (2)

9-9: Module level import not at top of file


10-10: Module level import not at top of file

evals/simple_rag_vs_cognee_eval.py (6)

11-11: Module level import not at top of file


44-44: Module level import not at top of file


45-45: Module level import not at top of file


45-45: Redefinition of unused List from line 5


46-46: Module level import not at top of file


79-79: Local variable graph is assigned to but never used

Additional comments not posted (3)
cognee/modules/search/graph/search_categories.py (1)

12-16: The strip_exact_regex function is a useful utility for regex operations.
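The function body is not visible in this review view; as a hedged sketch (signature and semantics assumed from the name), an exact-match stripper built on `re.escape` might look like:

```python
import re

def strip_exact_regex(text: str, pattern: str) -> str:
    """Remove every exact occurrence of `pattern` from `text`.

    re.escape treats the pattern as a literal string, so regex
    metacharacters such as [ or * are matched verbatim.
    """
    return re.sub(re.escape(pattern), "", text)
```

This makes it safe to strip literal markers such as JSON code-fence prefixes from LLM output without hand-escaping metacharacters.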

evals/simple_rag_vs_cognee_eval.py (1)

1-10: The new imports for handling evaluation datasets are correctly placed and used.

cognee/api/v1/cognify/cognify.py (1)

Line range hint 1-38: The new imports and initial setup in cognify.py are well-organized and relevant to the functionality.

Comment on lines +11 to +69
from cognee.infrastructure.llm.get_llm_client import get_llm_client

dataset = EvaluationDataset()
dataset.add_test_cases_from_json_file(
# file_path is the absolute path to your .json file
file_path="./synthetic_data/20240519_185842.json",
input_key_name="input",
actual_output_key_name="actual_output",
expected_output_key_name="expected_output",
context_key_name="context"
)

print(dataset)
# from deepeval.synthesizer import Synthesizer
#
# synthesizer = Synthesizer(model="gpt-3.5-turbo")
#
# dataset = EvaluationDataset()
# dataset.generate_goldens_from_docs(
# synthesizer=synthesizer,
# document_paths=['natural_language_processing.txt', 'soldiers_home.pdf', 'trump.txt'],
# max_goldens_per_document=10,
# num_evolutions=5,
# enable_breadth_evolve=True,
# )


print(dataset.goldens)
print(dataset)




import logging
from typing import List, Dict
from cognee.infrastructure import infrastructure_config

logger = logging.getLogger(__name__)

class AnswerModel(BaseModel):
response:str
def get_answer_base(content: str, context:str, response_model: Type[BaseModel]):
llm_client = get_llm_client()

system_prompt = "THIS IS YOUR CONTEXT:" + str(context)

return llm_client.create_structured_output(content, system_prompt, response_model)
def get_answer(content: str,context, model: Type[BaseModel]= AnswerModel):

try:
return (get_answer_base(
content,
context,
model
))
except Exception as error:
logger.error("Error extracting cognitive layers from content: %s", error, exc_info = True)
raise error


The implementation of dataset handling and model testing functions is robust.

Consider using more specific exception types in the get_answer function to handle errors more gracefully.

- except Exception as error:
+ except SpecificExceptionType as error:

Committable suggestion was skipped due to low confidence.
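As a sketch of that suggestion applied to the get_answer shown above: narrow the bare Exception to the failure modes the function can actually distinguish. The get_answer_base stub below stands in for the PR's llm_client.create_structured_output call and is an assumption, not the real client.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass
class AnswerModel:
    response: str

def get_answer_base(content: str, context: str, response_model=AnswerModel):
    # Stub for llm_client.create_structured_output; raises TypeError
    # when the payload does not fit the response model's fields.
    return response_model(response=f"{context}: {content}")

def get_answer(content: str, context: str, model=AnswerModel):
    try:
        return get_answer_base(content, context, model)
    except TypeError as error:
        # The structured output did not match the response schema.
        logger.error("Structured output failed validation: %s", error, exc_info=True)
        raise
    except (TimeoutError, ConnectionError) as error:
        # Transient transport failures: log and re-raise so callers can retry.
        logger.error("LLM request failed: %s", error, exc_info=True)
        raise
```

Narrow except clauses keep genuinely unexpected errors (programming bugs, KeyboardInterrupt) from being logged as extraction failures.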

Comment on lines +154 to +195
async def process_text(chunk_collection: str, chunk_id: str, input_text: str, file_metadata: dict, document_id: str):
print(f"Processing chunk ({chunk_id}) from document ({file_metadata['id']}).")

graph_client = await get_graph_client(infrastructure_config.get_config()["graph_engine"])

classified_categories = await get_content_categories(input_text)

graph_topology = infrastructure_config.get_config()["graph_model"]
if graph_topology == SourceCodeGraph:
classified_categories = [{'data_type': 'text', 'category_name': 'Code and functions'}]
elif graph_topology == KnowledgeGraph:
classified_categories = await get_content_categories(input_text)
else:
classified_categories = [{'data_type': 'text', 'category_name': 'Unclassified text'}]

# await add_label_nodes(graph_client, document_id, chunk_id, file_metadata["keywords"].split("|"))

await add_classification_nodes(
graph_client,
parent_node_id = document_id,
categories = classified_categories,
)
graph_client,
parent_node_id = document_id,
categories = classified_categories,
)

print(f"Chunk ({chunk_id}) classified.")

content_summary = await get_content_summary(input_text)
await add_summary_nodes(graph_client, document_id, content_summary)

print(f"Chunk ({chunk_id}) summarized.")

cognitive_layers = await get_cognitive_layers(input_text, classified_categories)
cognitive_layers = (await add_cognitive_layers(graph_client, document_id, cognitive_layers))[:2]

layer_graphs = await get_layer_graphs(input_text, cognitive_layers)
await add_cognitive_layer_graphs(graph_client, chunk_collection, chunk_id, layer_graphs)

if infrastructure_config.get_config()["connect_documents"] is True:
db_engine = infrastructure_config.get_config()["database_engine"]
relevant_documents_to_connect = db_engine.fetch_cognify_data(excluded_document_id = raw_document_id)

list_of_nodes = []

relevant_documents_to_connect.append({
"layer_id": document_id,
})
cognitive_layers = cognitive_layers[:config.cognitive_layers_limit]

try:
cognitive_layers = (await add_cognitive_layers(graph_client, document_id, cognitive_layers))[:2]
print("cognitive_layers", cognitive_layers)
layer_graphs = await get_layer_graphs(input_text, cognitive_layers)
await add_cognitive_layer_graphs(graph_client, chunk_collection, chunk_id, layer_graphs)
except:
pass





# if infrastructure_config.get_config()["connect_documents"] is True:

The process_text function includes robust error handling and detailed logging.

Consider using specific exception types instead of a bare except to handle errors more gracefully.

- except:
+ except SpecificExceptionType:

Committable suggestion was skipped due to low confidence.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

Outside diff range and nitpick comments (1)
cognee/modules/search/vector/search_similarity.py (1)

14-14: Consider removing debug print statements or replacing them with proper logging mechanisms before deploying to production.

Also applies to: 23-24

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 63356f2 and 3fadb27.
Files selected for processing (11)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/api/v1/search/search.py (3 hunks)
  • cognee/modules/search/graph/search_adjacent.py (1 hunks)
  • cognee/modules/search/graph/search_categories.py (2 hunks)
  • cognee/modules/search/graph/search_cypher.py (1 hunks)
  • cognee/modules/search/graph/search_neighbour.py (2 hunks)
  • cognee/modules/search/llm/extraction/categorize_relevant_category.py (1 hunks)
  • cognee/modules/search/vector/bm25.py (1 hunks)
  • cognee/modules/search/vector/fusion.py (1 hunks)
  • cognee/modules/search/vector/search_similarity.py (1 hunks)
  • evals/simple_rag_vs_cognee_eval.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • cognee/modules/search/vector/bm25.py
  • cognee/modules/search/vector/fusion.py
Files skipped from review as they are similar to previous changes (1)
  • cognee/modules/search/llm/extraction/categorize_relevant_category.py
Additional Context Used
Ruff (10)
cognee/api/v1/cognify/cognify.py (2)

111-111: Local variable added__basic_rag_chunks is assigned to but never used


188-188: Do not use bare except

cognee/modules/search/graph/search_categories.py (2)

10-10: Module level import not at top of file


11-11: Module level import not at top of file

evals/simple_rag_vs_cognee_eval.py (6)

11-11: Module level import not at top of file


44-44: Module level import not at top of file


45-45: Module level import not at top of file


45-45: Redefinition of unused List from line 5


46-46: Module level import not at top of file


79-79: Local variable graph is assigned to but never used


await add("data://test_datasets", "initial_test")

graph = await cognify("initial_test")

Local variable graph is assigned to but never used.

- graph = await cognify("initial_test")
+ # Consider removing this line if 'graph' is not used.


logger = logging.getLogger(__name__)

class AnswerModel(BaseModel):
response:str

The local variable response is annotated but never used within the AnswerModel. Consider removing it if it's not needed.

- response:str
+ # Remove this line if 'response' is not used.


Comment on lines +44 to +46
import logging
from typing import List, Dict
from cognee.infrastructure import infrastructure_config

Consolidate module-level imports and remove redundant import of List.

- import logging
- from typing import List, Dict
- from cognee.infrastructure import infrastructure_config
+ import logging
+ from typing import Dict
+ from cognee.infrastructure import infrastructure_config


import dotenv
dotenv.load_dotenv()

from cognee.infrastructure.llm.get_llm_client import get_llm_client

Move module-level imports to the top of the file.

- from cognee.infrastructure.llm.get_llm_client import get_llm_client
+ # Move this import to the top of the file

Committable suggestion was skipped due to low confidence.

except FileTypeException:
logger.warning("File (%s) has an unknown file type. We are skipping it.", file_metadata["id"])

added_chunks: list[tuple[str, str, dict]] = await add_data_chunks(data_chunks)
added_chunks = await add_data_chunks(data_chunks)
added__basic_rag_chunks = await add_data_chunks_basic_rag(data_chunks)

Local variable added__basic_rag_chunks is assigned to but never used.

- added__basic_rag_chunks = await add_data_chunks_basic_rag(data_chunks)
+ # Consider removing this line if 'added__basic_rag_chunks' is not used.


Comment on lines +23 to +24
else:
raise ValueError("Unsupported graph engine type.")

Consider enhancing the error message to specify which graph engine type was attempted.

- raise ValueError("Unsupported graph engine type.")
+ raise ValueError(f"Unsupported graph engine type: {infrastructure_config.get_config()['graph_engine']}")

Committable suggestion was skipped due to low confidence.

@@ -22,7 +22,7 @@ async def search_adjacent(graph: Union[nx.Graph, any], query: str, infrastructur

if node_id is None:
return {}

from cognee.infrastructure import infrastructure_config

Duplicate import statement for infrastructure_config. Consider removing it to clean up the code.


Enhance the error message to specify which graph engine type was attempted.

- raise ValueError("Unsupported graph engine type in the configuration.")
+ raise ValueError(f"Unsupported graph engine type in the configuration: {infrastructure_config.get_config()['graph_engine']}")

Committable suggestion was skipped due to low confidence.
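Since the same "Unsupported graph engine type" enhancement is suggested in search_adjacent, search_neighbour, and search_categories, a small shared guard (a hypothetical helper, not present in the PR) would keep the enhanced message in one place:

```python
def require_graph_engine(config: dict, supported: tuple):
    """Return the configured graph engine, or raise a descriptive error
    that names the engine that was actually attempted."""
    engine = config.get("graph_engine")
    if engine not in supported:
        raise ValueError(f"Unsupported graph engine type in the configuration: {engine}")
    return engine
```

Each search module could then call this once instead of repeating the raise, so the message stays consistent across files.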

@@ -20,26 +22,23 @@ async def search_neighbour(graph: Union[nx.Graph, any], id: str, infrastructure_
Returns:
- List[str]: A list of 'description' attributes of nodes that share the same 'layer_uuid' with the specified node.
"""
node_id = other_param.get('node_id') if other_param else None
from cognee.infrastructure import infrastructure_config

Duplicate import statement for infrastructure_config. Consider removing it to clean up the code.


Enhance the error message to specify which graph engine type was attempted.

- raise ValueError("Unsupported graph engine type in the configuration.")
+ raise ValueError(f"Unsupported graph engine type in the configuration: {infrastructure_config.get_config()['graph_engine']}")

Committable suggestion was skipped due to low confidence.

@@ -21,9 +36,22 @@ async def search_categories(graph: Union[nx.Graph, any], query_label: str, infra
each representing a node with 'nodeId' and 'summary'.
"""
# Determine which client is in use based on the configuration
from cognee.infrastructure import infrastructure_config

Duplicate import statement for infrastructure_config. Consider removing it to clean up the code.


Enhance the error message to specify which graph engine type was attempted.

- raise ValueError("Unsupported graph engine type.")
+ raise ValueError(f"Unsupported graph engine type: {infrastructure_config.get_config()['graph_engine']}")

Committable suggestion was skipped due to low confidence.

Comment on lines +23 to +26
SUMMARY_CLASSIFICATION = 'SUMMARY_CLASSIFICATION'
NODE_CLASSIFICATION = 'NODE_CLASSIFICATION'
DOCUMENT_CLASSIFICATION = 'DOCUMENT_CLASSIFICATION',
CYPHER = 'CYPHER'

Consider adding a default case in the from_str method to handle unexpected search types more gracefully.

- raise ValueError(f"{name} is not a valid SearchType")
+ raise ValueError(f"{name} is not a valid SearchType. Available types: {', '.join([e.name for e in SearchType])}")

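A minimal sketch of the suggested from_str with a descriptive error (sketch only; the enum's real definition lives in the PR). Note also that the trailing comma after 'DOCUMENT_CLASSIFICATION' in the diff above makes that member's value a one-element tuple in Python, which would break string comparisons:

```python
from enum import Enum

class SearchType(Enum):
    SUMMARY_CLASSIFICATION = "SUMMARY_CLASSIFICATION"
    NODE_CLASSIFICATION = "NODE_CLASSIFICATION"
    # No trailing comma here: a comma would turn the value into a tuple.
    DOCUMENT_CLASSIFICATION = "DOCUMENT_CLASSIFICATION"
    CYPHER = "CYPHER"

    @classmethod
    def from_str(cls, name: str) -> "SearchType":
        try:
            return cls[name.upper()]
        except KeyError:
            valid = ", ".join(member.name for member in cls)
            raise ValueError(f"{name} is not a valid SearchType. Available types: {valid}") from None
```

Looking the member up by name (case-insensitively) and listing the valid names on failure gives callers an actionable error instead of a bare rejection.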

@Vasilije1990 Vasilije1990 merged commit 9bb30bc into main May 22, 2024
4 of 16 checks passed