
continue.dev integration WIP #89

Merged · 23 commits · May 22, 2024
Conversation

@Vasilije1990 (Contributor) commented Apr 28, 2024

No description provided.

coderabbitai bot (Contributor) commented Apr 28, 2024

Walkthrough

The updates enhance several modules, focusing on document processing, graph topology, category search, and evaluation metrics. Changes include new imports, function modifications, attribute initialization, and improved logic for categorizing relevant categories.

Changes

File Path · Change Summary
  • cognee/api/v1/cognify/cognify.py · Added imports, initialized graph_topology, modified function calls, and commented out cognitive layers and telemetry.
  • cognee/infrastructure/InfrastructureConfig.py · Added and initialized graph_topology and monitoring_tool attributes; updated configuration methods.
  • cognee/modules/search/graph/search_categories.py · Added strip_exact_regex function, updated imports, modified search_categories function signature and logic.
  • cognee/modules/search/llm/extraction/... · Introduced categorize_relevant_category function for categorizing relevant categories using a language model.
  • cognee/infrastructure/databases/graph/falkordb/... · Enhanced FalkorDBAdapter with methods for adding nodes/edges, extracting nodes, filtering, and deleting nodes/graph.
  • evals/generate_test_set.py · Introduced functionality for generating evaluation datasets, synthesizing data, and evaluating metrics using deepeval.
  • evals/simple_rag_vs_cognee_eval.py · Added functions for evaluating models, retrieving answers, converting goldens to test cases, and evaluating metrics.
  • evals/test_datasets/initial_test/natural_language_processing.txt · Introduced an overview of Natural Language Processing (NLP).
  • evals/test_datasets/initial_test/trump.txt · Discussed Donald Trump's remarks on presidency, gun safety, and the NRA convention.

In the land of code, where data flows,
🐰 A rabbit tweaks where logic grows,
With nodes and graphs, a structure grand,
📚 Texts and queries hand in hand.
Evaluation metrics, precise and neat,
🌟 Enhancing systems, a coder's feat.
Celebrate the changes, big and small,
For in this realm, we stand tall! 🚀


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

gitguardian bot commented Apr 28, 2024

⚠️ GitGuardian has uncovered 4 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id | GitGuardian status | Secret | Commit | Filename
10531072 | Triggered | Generic High Entropy Secret | fec7010 | cognee/utils.py
10531072 | Triggered | Generic High Entropy Secret | fec7010 | cognee/utils.py
10593973 | Triggered | Generic High Entropy Secret | 79311ee | notebooks/full_run.ipynb
10593973 | Triggered | Generic High Entropy Secret | 8e850c1 | notebooks/full_run.ipynb
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely, following best practices for secret management.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
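The remediation guidance above boils down to never keeping secrets in tracked files. A minimal sketch of the usual alternative, reading them from the environment; the helper and variable names here are illustrative, not part of the cognee codebase:

```python
import os


def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read a secret from the environment instead of hardcoding it in source."""
    value = os.environ.get(var_name)
    if value is None:
        # Fail fast with a clear message rather than passing None downstream.
        raise RuntimeError(f"Missing required environment variable: {var_name}")
    return value
```

Secrets then live in the deployment environment (or a local, gitignored .env file loaded at startup), so rotating them never requires a commit.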


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Out of diff range and nitpick comments (1)
notebooks/full_run.ipynb (1)

81-81: Re-importing SourceCodeGraph and reconfiguring the graph model is redundant. Consider streamlining the notebook by removing repeated code to enhance readability and maintainability.

Comment on lines +1 to +11
'''
Given a string, find the length of the longest substring without repeating characters.

Examples:

Given "abcabcbb", the answer is "abc", which the length is 3.

Given "bbbbb", the answer is "b", with the length of 1.

Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.
'''

Ensure proper pairing of triple quotes in the comments for clarity.

- '''
+ """

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
'''
Given a string, find the length of the longest substring without repeating characters.
Examples:
Given "abcabcbb", the answer is "abc", which the length is 3.
Given "bbbbb", the answer is "b", with the length of 1.
Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.
'''
"""
Given a string, find the length of the longest substring without repeating characters.
Examples:
Given "abcabcbb", the answer is "abc", which the length is 3.
Given "bbbbb", the answer is "b", with the length of 1.
Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.
"""
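The docstring above specifies the classic longest-substring-without-repeats problem. A sliding-window solution, a standard textbook approach rather than necessarily the implementation in this PR, looks like:

```python
def length_of_longest_substring(s: str) -> int:
    """Length of the longest substring of s with no repeated characters."""
    last_seen = {}  # char -> index of its most recent occurrence
    start = 0       # left edge of the current window (no repeats inside it)
    best = 0
    for i, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= start:
            # ch repeats inside the window: slide the left edge past it.
            start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best
```

Each character is visited once, so the runtime is O(n) with O(k) extra space for the alphabet.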

Comment on lines +32 to +43
type_node_id = generate_node_id(node.type)
type_node = await graph_client.extract_node(type_node_id)

if not entity_type_node:
node_name = node.entity_type.lower().capitalize()
if not type_node:
node_name = node.type.lower().capitalize()

entity_type_node = (
entity_type_node_id,
type_node = (
type_node_id,
dict(
id = entity_type_node_id,
id = type_node_id,
name = node_name,
entity_type = node_name,
type = node_name,

The creation of type nodes with standardized 'type' and 'name' properties aligns with the PR's objectives. Consider using a more precise timestamp format or a timezone-aware datetime object for 'created_at' and 'updated_at' to avoid potential issues with time zone differences.

- created_at = datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
- updated_at = datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
+ from datetime import datetime, timezone
+ created_at = datetime.now(timezone.utc).isoformat(),
+ updated_at = datetime.now(timezone.utc).isoformat(),


Suggested change
type_node_id = generate_node_id(node.type)
type_node = await graph_client.extract_node(type_node_id)
if not entity_type_node:
node_name = node.entity_type.lower().capitalize()
if not type_node:
node_name = node.type.lower().capitalize()
entity_type_node = (
entity_type_node_id,
type_node = (
type_node_id,
dict(
id = entity_type_node_id,
id = type_node_id,
name = node_name,
entity_type = node_name,
type = node_name,
type_node_id = generate_node_id(node.type)
type_node = await graph_client.extract_node(type_node_id)
if not type_node:
node_name = node.type.lower().capitalize()
type_node = (
type_node_id,
dict(
id = type_node_id,
name = node_name,
type = node_name,
created_at = datetime.now(timezone.utc).isoformat(),
updated_at = datetime.now(timezone.utc).isoformat(),
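The timezone-aware timestamp suggested above can be sketched as a small helper (the function name is illustrative):

```python
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Return an ISO 8601 timestamp with an explicit UTC offset."""
    # e.g. "2024-05-22T10:15:30.123456+00:00"; unambiguous across time zones,
    # unlike the naive strftime("%Y-%m-%d %H:%M:%S") format it replaces.
    return datetime.now(timezone.utc).isoformat()
```

Aware timestamps also round-trip cleanly through datetime.fromisoformat, which naive local-time strings do not.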

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

Out of diff range and nitpick comments (2)
cognee/modules/search/graph/search_summary.py (1)

Line range hint 18-49: The function search_summary is well-designed to handle different graph engines and provides detailed documentation. However, ensure that the f-string in the Neo4j query section is correctly used or removed if not needed.

-        cypher_query = f"""
+        cypher_query = """
        MATCH (n)
        WHERE n.id CONTAINS $query AND EXISTS(n.summary)
        RETURN n.id AS nodeId, n.summary AS summary
        """
cognee/infrastructure/databases/graph/networkx/adapter.py (1)

Line range hint 134-134: Remove the unused variable error in the exception handling block to clean up the code.

- except Exception as error:
+ except Exception:
  logger.error("Failed to load graph from file: %s", file_path)
  # Initialize an empty graph in case of error
  self.graph = nx.MultiDiGraph()
  await self.save_graph_to_file(file_path)


from cognee.utils import render_graph

await render_graph(graph, include_color=True, include_nodes=True, include_size=True)
await render_graph(graph, include_color=True, include_nodes=False, include_size=False)

Undefined variable graph used in render_graph.

- await render_graph(graph, include_color=True, include_nodes=False, include_size=False)
+ # Ensure that 'graph' is defined before this line or pass the correct graph object.
+ # await render_graph(graph, include_color=True, include_nodes=False, include_size=False)


Suggested change
await render_graph(graph, include_color=True, include_nodes=False, include_size=False)
# Ensure that 'graph' is defined before this line or pass the correct graph object.
# await render_graph(graph, include_color=True, include_nodes=False, include_size=False)

import logging
from typing import List, Dict
from cognee.infrastructure import infrastructure_config
from.extraction.categorize_relevant_summary import categorize_relevant_summary

Ensure the import statement uses correct relative-import syntax: a space is missing after from. Note that removing the dot entirely would turn this into an absolute import and likely break package resolution, so the minimal fix is to add the space.

- from.extraction.categorize_relevant_summary import categorize_relevant_summary
+ from .extraction.categorize_relevant_summary import categorize_relevant_summary


""" Search categories in the graph and return their summary attributes. """

from cognee.shared.data_models import GraphDBType
from cognee.shared.data_models import GraphDBType, DefaultContentPrediction

Move the import statement for networkx as nx to the top of the file to follow Python's convention of grouping all imports at the beginning.

- import networkx as nx
+ # Moved to top


},
{
"ename": "CatalogException",
"evalue": "Catalog Error: Table with name file_metadata does not exist!\nDid you mean \"code_staging.file_metadata\"?",

Catalog Error: Verify the table name file_metadata.

- train_dataset = cognee.datasets.query_data("short_stories")
+ # Ensure the table name is correct or exists in the database configuration.
+ # train_dataset = cognee.datasets.query_data("short_stories")


"metadata": {},
"outputs": [
{
"ename": "NameError",

NameError: results is not defined.

- descriptions = {node: graph.nodes[node].get('summary', 'No summary available') for node in results}
+ # Ensure that 'results' is defined before this line.
+ # descriptions = {node: graph.nodes[node].get('summary', 'No summary available') for node in results}


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

Out of diff range and nitpick comments (2)
cognee/shared/data_models.py (1)

11-11: Update the type attribute documentation to reflect its purpose more clearly.

cognee/modules/cognify/graph/create.py (1)

Line range hint 183-183: The local variable relationship_data is assigned but never used. This could be a leftover from a previous version of the code. Consider removing it or implementing its intended use.

- relationship_data = {}
Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 8e850c1 and e1a9a23.
Files selected for processing (11)
  • cognee/api/v1/cognify/cognify.py (4 hunks)
  • cognee/api/v1/config/config.py (1 hunks)
  • cognee/config.py (2 hunks)
  • cognee/infrastructure/InfrastructureConfig.py (4 hunks)
  • cognee/infrastructure/data/chunking/LangchainChunkingEngine.py (1 hunks)
  • cognee/infrastructure/files/utils/get_file_metadata.py (1 hunks)
  • cognee/modules/cognify/graph/create.py (2 hunks)
  • cognee/modules/cognify/graph/initialize_graph.py (1 hunks)
  • cognee/shared/GithubTopology.py (1 hunks)
  • cognee/shared/data_models.py (7 hunks)
  • cognee/utils.py (2 hunks)
Files not reviewed due to errors (6)
  • cognee/shared/GithubTopology.py (no review received)
  • cognee/modules/cognify/graph/initialize_graph.py (no review received)
  • cognee/infrastructure/data/chunking/LangchainChunkingEngine.py (no review received)
  • cognee/api/v1/config/config.py (no review received)
  • cognee/config.py (no review received)
  • cognee/api/v1/cognify/cognify.py (no review received)
Additional Context Used
Ruff (4)
cognee/api/v1/cognify/cognify.py (1)

80-80: Local variable graph_topology is assigned to but never used

cognee/infrastructure/files/utils/get_file_metadata.py (1)

24-24: Do not use bare except

cognee/modules/cognify/graph/create.py (1)

183-183: Local variable relationship_data is assigned to but never used

cognee/shared/GithubTopology.py (1)

21-21: Undefined name Directory

Additional comments not posted (2)
cognee/infrastructure/InfrastructureConfig.py (1)

36-36: The implementation of InfrastructureConfig for managing system configurations is robust and well-structured.

cognee/shared/data_models.py (1)

252-254: Ensure that the ResponseSummaryModel class is integrated properly in the system, especially where document IDs and response summaries are handled.

Verification successful

The review of the ResponseSummaryModel integration within the system shows that the model is used in various parts of the system, particularly in functions that generate structured outputs from large language models. The acreate_structured_output and model_dump functions are designed to utilize the model in a generic way, which suggests that the handling of specific fields like document_id and response_summary would depend on the implementation of the ResponseSummaryModel itself.

Given the abstract nature of these functions and the lack of specific errors or misconfigurations in the script outputs, it appears that the integration is as expected. However, for a complete verification, a detailed review of how ResponseSummaryModel is structured and its interaction with these functions would be necessary.

  • cognee/modules/search/graph/search_summary.py: Uses ResponseSummaryModel in categorize_relevant_summary.
  • cognee/infrastructure/llm/: Contains implementations of acreate_structured_output that use ResponseSummaryModel.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of `ResponseSummaryModel` in the system.

# Test: Search for the usage of `ResponseSummaryModel`. Expect: Proper handling of document IDs and response summaries.
rg --type py 'ResponseSummaryModel'

Length of output: 397



Script:

#!/bin/bash
# Description: Inspect the usage of `ResponseSummaryModel` in the function `categorize_relevant_summary`.

# Test: Search for the implementation of `categorize_relevant_summary` to see how it handles `document_id` and `response_summary`.
rg --type py 'def categorize_relevant_summary' -A 20

Length of output: 1388



Script:

#!/bin/bash
# Description: Further inspect the usage of `ResponseSummaryModel` within `acreate_structured_output` and `model_dump`.

# Test: Search for the implementation of `acreate_structured_output` and `model_dump` to see how they handle `document_id` and `response_summary`.
rg --type py 'acreate_structured_output' -A 10
rg --type py 'model_dump' -A 10

Length of output: 13097

Comment on lines +24 to +25
except:
keywords = ["no keywords detected" + str(uuid.uuid4())]

Specify exception type in the except block to avoid masking unexpected errors.

- except:
+ except Exception as e:


Suggested change
except:
keywords = ["no keywords detected" + str(uuid.uuid4())]
except Exception as e:
keywords = ["no keywords detected" + str(uuid.uuid4())]
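The point of the suggestion above is that a bare except hides bugs (it even swallows KeyboardInterrupt and SystemExit). A hedged sketch of the narrower pattern, using a stand-in extractor since the real keyword-extraction call is not shown in this hunk:

```python
import logging
import uuid

logger = logging.getLogger(__name__)


def extract_keywords(text: str) -> list[str]:
    """Extract keywords, falling back to a unique placeholder on failure."""
    try:
        # Stand-in for the real extraction call in get_file_metadata.py.
        keywords = [word for word in text.split() if len(word) > 4]
        if not keywords:
            raise ValueError("no keywords found")
        return keywords
    except ValueError as error:  # catch only the failure we expect
        logger.warning("Keyword extraction failed: %s", error)
        return ["no keywords detected " + str(uuid.uuid4())]
```

Catching a concrete exception type lets genuinely unexpected errors (typos, wrong types) surface instead of being silently converted into placeholder data.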

@@ -36,13 +34,13 @@ class ChunkStrategy(Enum):
EXACT = "exact"
PARAGRAPH = "paragraph"
SENTENCE = "sentence"
CODE = "code"

Tip

Codebase Verification

Several files in the codebase do not handle the new CODE value in ChunkStrategy enum:

  • DefaultChunkEngine.py: This file handles PARAGRAPH, SENTENCE, and EXACT values but does not include a case for CODE.
Analysis chain

Addition of CODE to ChunkStrategy enum. Ensure that all usages of this enum are updated to handle this new value.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify all usages of `ChunkStrategy` enum handle the new `CODE` value.

# Test: Search for the enum usage. Expect: Handling of the new value in all occurrences.
rg --type py 'ChunkStrategy'

Length of output: 911
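To make the missing CODE branch concrete, here is an illustrative dispatch over the enum. The splitting logic is a placeholder only and does not mirror DefaultChunkEngine; the point is that every member, including the new CODE value, must have a handling branch:

```python
from enum import Enum


class ChunkStrategy(Enum):
    EXACT = "exact"
    PARAGRAPH = "paragraph"
    SENTENCE = "sentence"
    CODE = "code"


def chunk_text(text: str, strategy: ChunkStrategy) -> list[str]:
    """Toy chunker showing exhaustive handling of ChunkStrategy members."""
    if strategy is ChunkStrategy.PARAGRAPH:
        return [p for p in text.split("\n\n") if p]
    if strategy is ChunkStrategy.SENTENCE:
        return [s.strip() for s in text.split(".") if s.strip()]
    if strategy is ChunkStrategy.EXACT:
        return [text]
    if strategy is ChunkStrategy.CODE:
        # New member: line-based chunks until a real code splitter exists.
        return text.splitlines()
    raise ValueError(f"Unhandled chunk strategy: {strategy}")
```

The trailing raise guards against the same gap recurring when the next member is added.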

Comment on lines +23 to +24
print("Telemetry is disabled.")
logging.info("Telemetry is disabled.")

Logging and printing the same message about telemetry being disabled is redundant. Consider removing the print statement to clean up the console output.

- print("Telemetry is disabled.")


Suggested change
print("Telemetry is disabled.")
logging.info("Telemetry is disabled.")
logging.info("Telemetry is disabled.")

@@ -44,6 +44,7 @@ async def add_node(client, parent_id: Optional[str], node_id: str, node_data: di

# Add an edge if a parent ID is provided and the graph engine is NETWORKX
if parent_id and "default_relationship" in node_data and infrastructure_config.get_config()["graph_engine"] == GraphDBType.NETWORKX:
print("Node id", node_id)

Logging the node ID during the edge addition process might not be necessary and could clutter the log files. Consider removing this print statement unless it is essential for debugging.

- print("Node id", node_id)


Suggested change
print("Node id", node_id)

@@ -103,6 +104,7 @@ async def add_node(client, parent_id: Optional[str], node_id: str, node_data: di


async def add_edge(client, parent_id: Optional[str], node_id: str, node_data: dict, created_node_ids):
print('NODE ID', node_data)

The print statement here seems to be for debugging purposes. It's a good practice to use a logging framework instead of print statements for better control over logging levels and outputs.

- print('NODE ID', node_data)
+ logging.debug('NODE ID: %s', node_data)


Suggested change
print('NODE ID', node_data)
logging.debug('NODE ID: %s', node_data)
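The same advice applies throughout the PR: route debug output through the logging module so it can be filtered by level instead of always printing. A small sketch, with an illustrative function name:

```python
import logging

# Configure once at application startup, not in library code.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("cognee.graph")


def add_edge_debug(node_id: str, node_data: dict) -> None:
    """Emit the node details at DEBUG level; silent once the level is raised."""
    # %-style lazy formatting: the string is only built if DEBUG is enabled.
    logger.debug("NODE ID %s: %s", node_id, node_data)
```

Raising the level to INFO in production silences these lines with no code changes, which a print statement cannot offer.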

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between e1a9a23 and 01446de.
Files selected for processing (5)
  • cognee/api/v1/cognify/cognify.py (4 hunks)
  • cognee/infrastructure/llm/prompts/extract_topology.txt (1 hunks)
  • cognee/modules/topology/extraction/extract_topology.py (1 hunks)
  • cognee/modules/topology/infer_data_topology.py (1 hunks)
  • cognee/modules/topology/topology.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • cognee/infrastructure/llm/prompts/extract_topology.txt
Additional Context Used
Ruff (3)
cognee/api/v1/cognify/cognify.py (1)

88-88: Local variable graph_topology is assigned to but never used

cognee/modules/topology/topology.py (2)

7-7: Redefinition of unused BaseModel from line 4


8-8: Redefinition of unused Dict from line 5

Additional comments not posted (4)
cognee/modules/topology/extraction/extract_topology.py (1)

7-14: The implementation of extract_categories function looks clean and efficient.

cognee/modules/topology/infer_data_topology.py (1)

9-19: The implementation of infer_data_topology function is robust, handling default configurations and exceptions effectively.

cognee/modules/topology/topology.py (1)

71-120: The TopologyEngine class and its methods are well-implemented, providing clear functionality for managing repository topology.

cognee/api/v1/cognify/cognify.py (1)

Line range hint 17-201: The cognify function and the main test function are well-implemented, covering a comprehensive range of functionalities for processing and graph operations.

from pydantic import BaseModel, create_model
from typing import Dict, Type, Any

from pydantic import BaseModel, Field

Redefinition of BaseModel from line 4 is unnecessary and can be removed to avoid confusion.

- from pydantic import BaseModel, Field


Suggested change
from pydantic import BaseModel, Field

from typing import Dict, Type, Any

from pydantic import BaseModel, Field
from typing import Dict, List, Optional, Union

Redefinition of Dict from line 5 is unnecessary and can be removed to avoid confusion.

- from typing import Dict, List, Optional, Union


Suggested change
from typing import Dict, List, Optional, Union




graph_topology = infrastructure_config.get_config()["graph_topology"]

The variable graph_topology is assigned but never used, which could be removed to clean up the code.

- graph_topology = infrastructure_config.get_config()["graph_topology"]


Suggested change
graph_topology = infrastructure_config.get_config()["graph_topology"]

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

Out of diff range and nitpick comments (4)
cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)

Line range hint 6-6: Remove duplicate import of TextEmbedding.

- from fastembed import TextEmbedding

Line range hint 5-5: Remove duplicate import of AsyncOpenAI.

- from openai import AsyncOpenAI
cognee/modules/cognify/graph/add_label_nodes.py (1)

Line range hint 8-8: Remove the unused variable vector_client.

- vector_client = infrastructure_config.get_config("vector_engine")
cognee/modules/cognify/graph/create.py (1)

Line range hint 219-219: Remove the unused variable relationship_data.

- relationship_data = {}
Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 01446de and 7249c24.
Files ignored due to path filters (2)
  • poetry.lock is excluded by !**/*.lock, !**/*.lock
  • pyproject.toml is excluded by !**/*.toml
Files selected for processing (8)
  • cognee/api/v1/cognify/cognify.py (5 hunks)
  • cognee/api/v1/topology/add_topology.py (1 hunks)
  • cognee/infrastructure/data/chunking/DefaultChunkEngine.py (1 hunks)
  • cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3 hunks)
  • cognee/infrastructure/llm/prompts/extract_topology.txt (1 hunks)
  • cognee/modules/cognify/graph/add_label_nodes.py (2 hunks)
  • cognee/modules/cognify/graph/create.py (4 hunks)
  • cognee/modules/topology/topology.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • cognee/infrastructure/data/chunking/DefaultChunkEngine.py
Additional Context Used
LanguageTool (2)
cognee/infrastructure/llm/prompts/extract_topology.txt (2)

Near line 2: Inconsistent capitalization in contraction.
Context: ... the text provided to you. Relationship part'S can't be empty, and have to be logical ...


Near line 2: This verb does not appear to agree with the subject. Consider using a different form.
Context: ...Relationship part'S can't be empty, and have to be logical AND CONNECTING ELEMENTS O...

Ruff (11)
cognee/api/v1/cognify/cognify.py (1)

121-121: Local variable graph_topology is assigned to but never used

cognee/api/v1/topology/add_topology.py (2)

16-16: Local variable graph_topology is assigned to but never used


80-80: Local variable graph is assigned to but never used

cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)

7-7: Redefinition of unused TextEmbedding from line 6


8-8: Redefinition of unused AsyncOpenAI from line 5

cognee/modules/cognify/graph/add_label_nodes.py (1)

8-8: Local variable vector_client is assigned to but never used

cognee/modules/cognify/graph/create.py (3)

159-159: Local variable created_node_id is assigned to but never used


219-219: Local variable relationship_data is assigned to but never used


254-254: Local variable ids is assigned to but never used

cognee/modules/topology/topology.py (2)

7-7: Redefinition of unused BaseModel from line 4


8-8: Redefinition of unused Dict from line 5

Additional comments not posted (2)
cognee/modules/topology/topology.py (2)

7-7: Remove duplicate import of BaseModel.


8-8: Remove duplicate import of Dict.

@@ -0,0 +1,4 @@
You are a topology master and need to extract the following topology information from the text provided to you.
Relationship part'S can't be empty, and have to be logical AND CONNECTING ELEMENTS OF THE TOPOLOGY

Correct grammatical issues in the prompt.

- Relationship part'S can't be empty, and have to be logical AND CONNECTING ELEMENTS OF THE TOPOLOGY
+ Relationship parts can't be empty and have to be logical and connecting elements of the topology.


#
# await add("data://" +data_directory_path, "example")

graph = await add_topology()

Remove the unused variable graph.

- graph = await add_topology()
+ await add_topology()

@@ -109,78 +118,91 @@ async def process_text(chunk_collection: str, chunk_id: str, input_text: str, fi

graph_client = await get_graph_client(infrastructure_config.get_config()["graph_engine"])

graph_topology = infrastructure_config.get_config()["graph_topology"]

Remove the unused variable graph_topology.

- graph_topology = infrastructure_config.get_config()["graph_topology"]


if isinstance(value, BaseModel):
node_id = await generate_node_id(value)
node_data = value.model_dump()

# Use the specified default relationship for the edge between the parent node and the current node

# Add node to the graph
created_node_id = await add_node(graph_client, parent_id, node_id, node_data)

Remove the unused variable created_node_id.

- created_node_id = await add_node(graph_client, parent_id, node_id, node_data)



for attribute_name, attribute_value in graph_model:
ids = await process_attribute_edge(graph_client, root_id, attribute_name, attribute_value, flattened_and_deduplicated)

ids = await process_attribute(graph_client, root_id, attribute_name, attribute_value)

Remove the unused variable ids.

- ids = await process_attribute(graph_client, root_id, attribute_name, attribute_value)
+ await process_attribute(graph_client, root_id, attribute_name, attribute_value)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Out of diff range and nitpick comments (4)
cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)

Line range hint 6-6: Duplicate import of TextEmbedding from fastembed. Remove the redundant import to clean up the code.

- from fastembed import TextEmbedding

Line range hint 5-5: Duplicate import of AsyncOpenAI from openai. It's good practice to avoid duplicate imports to maintain code clarity.

- from openai import AsyncOpenAI
cognee/modules/cognify/graph/add_data_chunks.py (2)

Line range hint 13-13: The variable vector_client is assigned but never used in the uncommented sections of the code. If it's not needed, consider removing it to clean up the code.

- vector_client = infrastructure_config.get_config("vector_engine")

Line range hint 18-59: The commented-out error handling and retry logic could be important for robustness, especially in production environments. Consider implementing a more structured error handling and retry mechanism.

Would you like me to help implement a robust error handling and retry mechanism for this function?

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 7249c24 and 3c261ce.
Files selected for processing (6)
  • cognee/api/v1/cognify/cognify.py (7 hunks)
  • cognee/config.py (3 hunks)
  • cognee/infrastructure/data/chunking/LangchainChunkingEngine.py (1 hunks)
  • cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3 hunks)
  • cognee/modules/cognify/graph/add_data_chunks.py (3 hunks)
  • cognee/shared/GithubClassification.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • cognee/config.py
  • cognee/infrastructure/data/chunking/LangchainChunkingEngine.py
Additional Context Used
Ruff (5)
cognee/api/v1/cognify/cognify.py (1)

125-125: Local variable graph_topology is assigned to but never used

cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3)

7-7: Redefinition of unused TextEmbedding from line 6


8-8: Redefinition of unused AsyncOpenAI from line 5


40-40: Do not use bare except

cognee/modules/cognify/graph/add_data_chunks.py (1)

13-13: Local variable vector_client is assigned to but never used

Additional comments not posted (8)
cognee/shared/GithubClassification.py (4)

8-13: The enum TextSubclass is well-defined with clear, descriptive categories for different types of text content.


15-17: The ContentType class is simple and correctly uses Pydantic for data validation. The default type of "TEXT" is appropriate for the base class.


19-21: The TextContent class correctly inherits from ContentType and introduces a new field subclass which is a list of TextSubclass. This design supports multiple categorizations for text content, which is flexible and useful.


23-25: The CodeContentPrediction model is succinct and appropriate for predicting content types. It uses TextContent for the label field, which ensures that predictions are tied to the defined text categories.

cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (1)

Line range hint 16-66: The DefaultEmbeddingEngine and LiteLLMEmbeddingEngine classes are well-structured for handling text embeddings. Ensure that the verbose setting and configuration loading are appropriately managed to avoid side effects.

Verification successful

The review comment raised a concern about the potential side effects of the set_verbose setting used in the DefaultEmbeddingEngine class. The verification process involved searching for the usage of set_verbose across the application to ensure it does not alter any critical settings elsewhere.

The results from the executed script show that set_verbose is only used in the DefaultEmbeddingEngine.py file and is not found in any other parts of the application. This indicates that the setting is localized to this specific class and does not impact other components or configurations.

Therefore, the concern about unintended side effects from the set_verbose setting does not apply, as its usage is confined and does not interfere with other parts of the application.

  • File: cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py
  • Detail: The set_verbose = True setting is used only within this file and does not affect other application settings.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the verbose setting does not cause unintended side effects across the application.

# Test: Search for `set_verbose` usage in other parts of the application. Expect: No other critical settings are altered.
rg --type py 'set_verbose'

Length of output: 129

cognee/api/v1/cognify/cognify.py (3)

125-125: The variable graph_topology is assigned but never used, which could be removed to clean up the code.


215-215: Ensure that the graph variable is defined before it's used in render_graph. This is crucial to avoid runtime errors.


192-212: The test function is well-structured for testing the cognify process. However, ensure that all configurations and dependencies are correctly set up before running the tests to avoid issues.

print("text", text)
try:
text = str(text[0])
except:

Avoid using a bare except statement. It's better to specify the exception type to avoid catching unexpected exceptions and to make the error handling clearer.

- except:
+ except SomeSpecificException:

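A runnable sketch of the suggested fix (the function name and fallback behavior are illustrative, not from the PR): only the exceptions that indexing and string conversion can actually raise are caught, so unrelated failures still propagate.

```python
def first_as_text(text):
    """Return the first element as a string, falling back to the raw value.

    Catching (IndexError, TypeError) instead of a bare `except` lets
    unrelated errors (e.g. KeyboardInterrupt) propagate normally.
    """
    try:
        return str(text[0])
    except (IndexError, TypeError):
        return str(text)

print(first_as_text(["hello", "world"]))  # -> hello
print(first_as_text([]))                  # -> []  (IndexError path)
print(first_as_text(42))                  # -> 42  (TypeError path)
```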

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Out of diff range and nitpick comments (3)
cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (2)

Line range hint 14-29: Consider refactoring the lambda function for converting embeddings to lists for improved readability.

- embeddings_list = list(map(lambda embedding: embedding.tolist(), embedding_model.embed(text)))
+ embeddings_list = [embedding.tolist() for embedding in embedding_model.embed(text)]
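To illustrate the refactor without pulling in fastembed, the stub below mimics an embedding object with a .tolist() method and shows that the comprehension produces the same result as the map/lambda form (all names here are stand-ins, not fastembed's API):

```python
class FakeEmbedding:
    """Stand-in for an embedding result (fastembed returns numpy arrays)."""
    def __init__(self, values):
        self.values = values
    def tolist(self):
        return list(self.values)

def embed(texts):
    # Toy "model": one vector per input text.
    return [FakeEmbedding([float(len(t)), 0.0]) for t in texts]

# Original style: map + lambda.
embeddings_list = list(map(lambda e: e.tolist(), embed(["abc", "de"])))

# Suggested style: list comprehension, same result, easier to read.
embeddings_list_lc = [e.tolist() for e in embed(["abc", "de"])]

assert embeddings_list == embeddings_list_lc == [[3.0, 0.0], [2.0, 0.0]]
```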

42-46: Determine if the commented-out debug prints are necessary. If they are not, consider removing them to clean up the code.

cognee/modules/cognify/graph/add_cognitive_layer_graphs.py (1)

[!TIP]
Codebase Verification

Commented-out text processing functions are used elsewhere in the codebase

The commented-out text processing functions (extract_pos_tags, extract_named_entities, extract_sentiment_vader) are indeed used in other parts of the codebase, specifically in cognee/utils.py and cognee/infrastructure/data/utils/extract_keywords.py.

  • cognee/utils.py:
    • extract_pos_tags
    • extract_named_entities
    • extract_sentiment_vader
  • cognee/infrastructure/data/utils/extract_keywords.py:
    • extract_pos_tags

Given their usage in other files, it is important to retain these functions in the codebase. However, if the commented-out lines in add_cognitive_layer_graphs.py are not needed, they can be removed to clean up the code.

  • Consider removing the commented-out lines in add_cognitive_layer_graphs.py if they are no longer needed.
Analysis chain

Line range hint 144-267: Review the commented-out lines related to text processing. If they are no longer needed, consider removing them to clean up the code.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Check if the commented-out text processing functions are used elsewhere in the codebase.

# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type python $'extract_pos_tags|extract_named_entities|extract_sentiment_vader'

Length of output: 115



Script:

#!/bin/bash
# Description: Check if the commented-out text processing functions are used elsewhere in the codebase.

# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type py 'extract_pos_tags|extract_named_entities|extract_sentiment_vader'

Length of output: 1081

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 3c261ce and db0b19b.
Files ignored due to path filters (2)
  • poetry.lock is excluded by !**/*.lock, !**/*.lock
  • pyproject.toml is excluded by !**/*.toml
Files selected for processing (11)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/config.py (3 hunks)
  • cognee/infrastructure/InfrastructureConfig.py (4 hunks)
  • cognee/infrastructure/databases/vector/embeddings/DefaultEmbeddingEngine.py (3 hunks)
  • cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2 hunks)
  • cognee/infrastructure/llm/generic_llm_api/adapter.py (2 hunks)
  • cognee/modules/cognify/graph/add_cognitive_layer_graphs.py (5 hunks)
  • cognee/modules/cognify/graph/add_data_chunks.py (2 hunks)
  • cognee/modules/cognify/graph/add_document_node.py (2 hunks)
  • cognee/modules/cognify/graph/add_label_nodes.py (1 hunks)
  • cognee/modules/data/extraction/knowledge_graph/extract_knowledge_graph.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • cognee/config.py
  • cognee/infrastructure/InfrastructureConfig.py
  • cognee/modules/cognify/graph/add_data_chunks.py
Additional Context Used
Ruff (4)
cognee/api/v1/cognify/cognify.py (3)

179-179: Local variable graph_topology is assigned to but never used


184-184: Local variable document_id is assigned to but never used


201-201: Local variable classified_categories is assigned to but never used

cognee/modules/data/extraction/knowledge_graph/extract_knowledge_graph.py (1)

21-21: Local variable error is assigned to but never used

Additional comments not posted (2)
cognee/modules/cognify/graph/add_document_node.py (1)

17-17: Correctly sets the document type to "Document".

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)

Line range hint 1-1: The implementation of LanceDBAdapter correctly handles interactions with the vector database.

@@ -25,6 +25,7 @@ async def add_document_node(graph_client: GraphDBInterface, parent_node_id, docu
dict(relationship_name = "has_document"),
)

await add_label_nodes(graph_client, document_id, document_metadata["keywords"].split("|"))
#
# await add_label_nodes(graph_client, document_id, document_metadata["keywords"].split("|"))

Tip

Codebase Verification

Consider whether the commented-out lines for adding label nodes are still needed. If not, they should be removed to clean up the code.

  • cognee/modules/cognify/graph/add_document_node.py: Line 29
  • cognee/api/v1/cognify/cognify.py: Line with # await add_label_nodes(graph_client, document_id, chunk_id, file_metadata["keywords"].split("|"))
Analysis chain

Consider whether the commented-out line for adding label nodes is still needed. If not, it should be removed to clean up the code.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Check if the function `add_label_nodes` is used elsewhere in the codebase.

# Test: Search for the function usage. Expect: No occurrences outside this file.
rg --type python $'add_label_nodes'

Length of output: 65



Script:

#!/bin/bash
# Description: Check if the function `add_label_nodes` is used elsewhere in the codebase.

# Test: Search for the function usage. Expect: No occurrences outside this file.
rg 'add_label_nodes' --glob '*.py'

Length of output: 579

@@ -21,4 +21,4 @@ def sync_extract_knowledge_graph():
except Exception as error:
# TODO: Log error to Sentry

return await extract_content_graph(text, cognitive_layer, graph_model)
return await extract_content_graph(text, cognitive_layer, graph_model)

Ensure proper error handling by logging the exception. Currently, the error is caught but not logged, which could hinder debugging.

- except Exception as error:
+ except Exception as error:
+     logger.error(f"Failed to extract knowledge graph: {error}")

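A minimal sketch of what such logging could look like (the logger name, the toy extract body, and the re-raise behavior are illustrative assumptions, not the PR's code); a logging.exception-style call also preserves the traceback that a plain f-string message would drop:

```python
import logging

logger = logging.getLogger("cognee.extract_knowledge_graph")

def extract(text):
    """Toy extraction step that logs failures instead of swallowing them."""
    try:
        if not text:
            raise ValueError("empty input")
        return {"nodes": [text]}
    except Exception as error:
        # logger.exception records the message AND the traceback at ERROR level.
        logger.exception("Failed to extract knowledge graph: %s", error)
        raise
```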

Comment on lines +64 to +66
except Exception as e:
# It's ok if the collection already exists.
pass
print(e)

Specify the exception type instead of using a generic exception, to avoid catching unexpected exceptions and to make the error handling clearer. (CollectionExistsException here stands in for whatever exception the vector client actually raises when the collection already exists; the redundant pass can also be dropped.)

- except Exception as e:
-     # It's ok if the collection already exists.
-     pass
-     print(e)
+ except CollectionExistsException as e:
+     # It's ok if the collection already exists.
+     print(e)

Comment on lines +91 to +104
async def acreate_structured_output(self, text_input: str, system_prompt: str, response_model: Type[BaseModel]) -> BaseModel:
    """Generate a response from a user query."""

    return await self.aclient.chat.completions.create(
-       model=self.model,
-       messages=[
+       model = self.model,
+       messages = [
            {
                "role": "user",
-               "content": f"""Use the given format to
-               extract information from the following input: {text_input}. {system_prompt} """,
-           }
+               "content": f"""Use the given format to
+               extract information from the following input: {text_input}. """,
+           },
+           {"role": "system", "content": system_prompt},
        ],
-       response_model=response_model,
+       response_model = response_model,

Simplify the string formatting in acreate_structured_output to reduce complexity and potential errors.

- "content": f"""Use the given format to
- extract information from the following input: {text_input}. """,
+ "content": "Use the given format to extract information from the following input: " + text_input

Suggested change

async def acreate_structured_output(self, text_input: str, system_prompt: str, response_model: Type[BaseModel]) -> BaseModel:
    """Generate a response from a user query."""

    return await self.aclient.chat.completions.create(
        model = self.model,
        messages = [
            {
                "role": "user",
                "content": "Use the given format to extract information from the following input: " + text_input,
            },
            {"role": "system", "content": system_prompt},
        ],
        response_model = response_model,
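The message layout in the suggested version can be sanity-checked without a live client; the helper below is a hypothetical, pure-Python stand-in that assembles the same payload, with the system prompt as its own message rather than interpolated into the user content:

```python
def build_messages(text_input: str, system_prompt: str) -> list:
    """Assemble the chat payload the suggested version would send.

    Keeping the system prompt in a separate "system" message keeps the
    two roles distinct and avoids nesting it inside the user f-string.
    """
    return [
        {
            "role": "user",
            "content": "Use the given format to extract information "
                       "from the following input: " + text_input,
        },
        {"role": "system", "content": system_prompt},
    ]

msgs = build_messages("graphs are sets of nodes and edges", "Return valid JSON.")
assert [m["role"] for m in msgs] == ["user", "system"]
```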

Comment on lines 144 to 297
# categories = classified_categories,
# )

classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]


#
# async def process_text(document_id: str, chunk_id: str, chunk_collection: str, input_text: str):
# raw_document_id = document_id.split("__")[-1]
#
# print(f"Processing chunk ({chunk_id}) from document ({raw_document_id}).")
#
# graph_client = await get_graph_client(infrastructure_config.get_config()["graph_engine"])
#
# classified_categories = await get_content_categories(input_text)
# await add_classification_nodes(
# graph_client,
# parent_node_id = document_id,
# categories = classified_categories,
# )
# >>>>>>> origin/main
#
# print(f"Chunk ({chunk_id}) classified.")
#
# # print("document_id", document_id)
# #
# # content_summary = await get_content_summary(input_text)
# # await add_summary_nodes(graph_client, document_id, content_summary)
#
# print(f"Chunk ({chunk_id}) summarized.")
# #
# cognitive_layers = await get_cognitive_layers(input_text, classified_categories)
# cognitive_layers = (await add_cognitive_layers(graph_client, document_id, cognitive_layers))[:2]
# #
# layer_graphs = await get_layer_graphs(input_text, cognitive_layers)
# await add_cognitive_layer_graphs(graph_client, chunk_collection, chunk_id, layer_graphs)
#
# <<<<<<< HEAD
# print("got here 4444")
#
# if infrastructure_config.get_config()["connect_documents"] is True:
# db_engine = infrastructure_config.get_config()["database_engine"]
# relevant_documents_to_connect = db_engine.fetch_cognify_data(excluded_document_id = file_metadata["id"])
#
# list_of_nodes = []
#
# relevant_documents_to_connect.append({
# "layer_id": document_id,
# })
#
# for document in relevant_documents_to_connect:
# node_descriptions_to_match = await graph_client.extract_node_description(document["layer_id"])
# list_of_nodes.extend(node_descriptions_to_match)
#
# nodes_by_layer = await group_nodes_by_layer(list_of_nodes)
#
# results = await resolve_cross_graph_references(nodes_by_layer)
#
# relationships = graph_ready_output(results)
#
# await connect_nodes_in_graph(
# graph_client,
# relationships,
# score_threshold = infrastructure_config.get_config()["intra_layer_score_treshold"]
# )
#
# send_telemetry("cognee.cognify")
#
# print(f"Chunk ({chunk_id}) cognified.")
# =======
# if infrastructure_config.get_config()["connect_documents"] is True:
# db_engine = infrastructure_config.get_config()["database_engine"]
# relevant_documents_to_connect = db_engine.fetch_cognify_data(excluded_document_id = raw_document_id)
#
# list_of_nodes = []
#
# relevant_documents_to_connect.append({
# "layer_id": document_id,
# })
#
# for document in relevant_documents_to_connect:
# node_descriptions_to_match = await graph_client.extract_node_description(document["layer_id"])
# list_of_nodes.extend(node_descriptions_to_match)
#
# nodes_by_layer = await group_nodes_by_layer(list_of_nodes)
#
# results = await resolve_cross_graph_references(nodes_by_layer)
#
# relationships = graph_ready_output(results)
#
# await connect_nodes_in_graph(
# graph_client,
# relationships,
# score_threshold = infrastructure_config.get_config()["intra_layer_score_treshold"]
# )
#
# send_telemetry("cognee.cognify")
#
# print(f"Chunk ({chunk_id}) cognified.")
# >>>>>>> origin/main

Tip

Codebase Verification

Commented-out code cleanup needed

The functions add_classification_nodes, get_content_categories, get_content_summary, and add_summary_nodes are defined and imported but are not used outside of the commented-out sections in cognify.py. Consider removing these commented-out lines to clean up the code.

  • cognee/api/v1/cognify/cognify.py: Lines containing commented-out code related to the mentioned functions.
Analysis chain

Review the commented-out lines and debug prints. If they are no longer needed, consider removing them to clean up the code.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Check if the commented-out functions are used elsewhere in the codebase.

# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type python $'add_classification_nodes|get_content_categories|get_content_summary|add_summary_nodes'

Length of output: 137



Script:

#!/bin/bash
# Description: Check if the commented-out functions are used elsewhere in the codebase.

# Test: Search for the functions usage. Expect: No occurrences outside this file.
rg --type py $'add_classification_nodes|get_content_categories|get_content_summary|add_summary_nodes'

Length of output: 1559

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Out of diff range and nitpick comments (1)
cognee/api/v1/cognify/cognify.py (1)

[!TIP]
Codebase Verification

Unused Imports in cognee/api/v1/cognify/cognify.py

The following imports are not actively used in the code and should be considered for removal:

  • add_classification_nodes
  • add_summary_nodes
  • group_nodes_by_layer
  • graph_ready_output
  • connect_nodes_in_graph
  • resolve_cross_graph_references
  • get_content_categories
  • get_content_summary
  • get_cognitive_layers
  • get_layer_graphs
  • send_telemetry

These imports are either commented out or not referenced in the active code. Removing them can help clean up the codebase and improve maintainability.

Analysis chain

Line range hint 11-35: Ensure all new imports are utilized effectively in the code.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that all newly imported modules are used in the file.

# Test: Search for the usage of newly imported modules. Expect: At least one occurrence of each module.
rg --type python $'prune|Config|LangchainChunkingEngine|LiteLLMEmbeddingEngine|add_data_chunks|add_document_node|add_classification_nodes|add_cognitive_layer_graphs|add_summary_nodes|group_nodes_by_layer|graph_ready_output|connect_nodes_in_graph|resolve_cross_graph_references|get_graph_client|add_cognitive_layers|guess_file_type|extract_text_from_file|infrastructure_config|get_content_categories|get_content_summary|get_cognitive_layers|get_layer_graphs|TopologyEngine|CodeContentPrediction|ChunkStrategy|DefaultGraphModel|send_telemetry' cognee/api/v1/cognify/cognify.py

Length of output: 606



Script:

#!/bin/bash
# Description: Verify that all newly imported modules are used in the file.

# Test: Search for the usage of newly imported modules. Expect: At least one occurrence of each module.
rg $'prune|Config|LangchainChunkingEngine|LiteLLMEmbeddingEngine|add_data_chunks|add_document_node|add_classification_nodes|add_cognitive_layer_graphs|add_summary_nodes|group_nodes_by_layer|graph_ready_output|connect_nodes_in_graph|resolve_cross_graph_references|get_graph_client|add_cognitive_layers|guess_file_type|extract_text_from_file|infrastructure_config|get_content_categories|get_content_summary|get_cognitive_layers|get_layer_graphs|TopologyEngine|CodeContentPrediction|ChunkStrategy|DefaultGraphModel|send_telemetry' cognee/api/v1/cognify/cognify.py

Length of output: 5726

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between db0b19b and 4e6fcde.
Files selected for processing (1)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
Additional Context Used
Ruff (3)
cognee/api/v1/cognify/cognify.py (3)

184-184: Local variable parent_node_id is assigned to but never used


186-186: Local variable document_id is assigned to but never used


203-203: Local variable classified_categories is assigned to but never used

Additional comments not posted (2)
cognee/api/v1/cognify/cognify.py (2)

179-179: Remove unused variable graph_topology.

This issue has been flagged in previous reviews. Please ensure it is addressed.


328-328: Undefined variable graph used in render_graph.

This issue has been flagged in previous reviews. Please ensure it is addressed.

if graph_topology == "default":
parent_node_id = f"{file_metadata['name']}.{file_metadata['extension']}"
elif graph_topology == DefaultGraphModel:
parent_node_id = f"DefaultGraphModel__{USER_ID}"

Local variable parent_node_id is assigned but never used in the process_text function.

- parent_node_id = f"{file_metadata['name']}.{file_metadata['extension']}"

Consider removing this line if parent_node_id is not needed, or ensure it is used appropriately if intended.




print(f"Chunk ({chunk_id}) cognified.")
document_id = await add_document_node(

Local variable document_id is assigned but never used in the process_text function.

- document_id = await add_document_node(
-     graph_client,
-     parent_node_id = f"{file_metadata['name']}.{file_metadata['extension']}",
-     document_metadata = file_metadata,
- )

Consider removing this line if document_id is not needed, or ensure it is used appropriately if intended.


Committable suggestion


Suggested change
document_id = await add_document_node(

# categories = classified_categories,
# )

classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]

Local variable classified_categories is assigned but never used in the process_text function.

- classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]

Consider removing this line if classified_categories is not needed, or ensure it is used appropriately if intended.


Committable suggestion


Suggested change
classified_categories= [{'data_type': 'text', 'category_name': 'Source code in various programming languages'}]

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 4e6fcde and 57e3e2e.
Files selected for processing (3)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/config.py (3 hunks)
  • cognee/modules/cognify/graph/add_document_node.py (2 hunks)
Files skipped from review as they are similar to previous changes (2)
  • cognee/config.py
  • cognee/modules/cognify/graph/add_document_node.py
Additional Context Used
Ruff (1)
cognee/api/v1/cognify/cognify.py (1)

188-188: Do not use bare except

Additional comments not posted (1)
cognee/api/v1/cognify/cognify.py (1)

227-255: Ensure proper asynchronous handling in the main testing block.

The main block uses asyncio.run(test()) which is correct for running async code from a synchronous context. However, ensure that all async functions within test() are awaited properly to avoid runtime errors.

Verification successful

Ensure proper asynchronous handling in the main testing block.

The main block uses asyncio.run(test()) which is correct for running async code from a synchronous context. However, ensure that all async functions within test() are awaited properly to avoid runtime errors.

  • The test function in cognee/api/v1/cognify/cognify.py correctly uses await for the cognify and render_graph functions.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify proper use of async/await in the main testing block.

# Test: Search for missing 'await' in async function calls within the main block.
ast-grep --lang python --pattern $'if __name__ == "__main__" { $$$ async def test() { $$$ } $$$ }'

Length of output: 444



Script:

#!/bin/bash
# Description: Verify proper use of async/await in the main testing block.

# Test: Search for missing 'await' in async function calls within the main block.
ast-grep --lang python --pattern $'if __name__ == "__main__":\n    async def test():\n        $$$\n    $$$'

Length of output: 4501
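The awaited-coroutine pattern verified above can be reduced to a self-contained sketch; the `cognify` and `render_graph` stubs here are hypothetical placeholders, not the project's real implementations:

```python
import asyncio

async def cognify():
    # Stand-in for the real pipeline step.
    return "graph"

async def render_graph(graph):
    # Stand-in for the real renderer.
    return f"rendered:{graph}"

async def test():
    # Every coroutine is awaited; a missing `await` here would leave an
    # un-run coroutine object behind and raise a RuntimeWarning.
    graph = await cognify()
    return await render_graph(graph)

if __name__ == "__main__":
    # asyncio.run is the correct entry point from synchronous code:
    # it creates, runs, and closes a fresh event loop.
    print(asyncio.run(test()))
```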

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 57e3e2e and 48d21c4.
Files selected for processing (2)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/api/v1/topology/add_topology.py (1 hunks)
Additional Context Used
Ruff (2)
cognee/api/v1/cognify/cognify.py (1)

188-188: Do not use bare except

cognee/api/v1/topology/add_topology.py (1)

16-16: Local variable graph_topology is assigned to but never used


graph_client = await get_graph_client(graph_db_type)

graph_topology = infrastructure_config.get_config()["graph_topology"]

Remove the unused variable graph_topology.

- graph_topology = infrastructure_config.get_config()["graph_topology"]

Committable suggestion


Suggested change
graph_topology = infrastructure_config.get_config()["graph_topology"]

Comment on lines +18 to +20
engine = TopologyEngine()
topology = await engine.infer_from_directory_structure(node_id =USER_ID , repository = directory, model=model)


Tip

Codebase Verification

Ensure proper error handling in infer_from_directory_structure

The function infer_from_directory_structure is defined in cognee/modules/topology/topology.py. However, the search for error handling patterns (e.g., try: blocks) did not yield any results within this function. This indicates that the function might lack proper error handling mechanisms.

Action Required:

  • Add appropriate error handling (e.g., try-except blocks) within the infer_from_directory_structure function to ensure robustness and reliability.

Location:

  • cognee/modules/topology/topology.py

Example:

async def infer_from_directory_structure(self, node_id: str, repository: str, model):
    try:
        # Your existing code logic
    except Exception as e:
        # Handle exceptions appropriately
        logger.error(f"Error in infer_from_directory_structure: {e}")
        raise

Please review and implement the necessary changes.

Analysis chain

Ensure proper error handling in infer_from_directory_structure.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify error handling in the function `infer_from_directory_structure`.

# Test: Search for the function implementation and check for error handling. Expect: Proper try-except blocks.
ast-grep --lang python --pattern $'async def infer_from_directory_structure($_, $_, $_) {
  $$$
}'

Length of output: 584



Script:

#!/bin/bash
# Description: Verify error handling in the function `infer_from_directory_structure`.

# Test: Search for the function implementation and check for error handling. Expect: Proper try-except blocks.
ast-grep --lang python --pattern $'async def infer_from_directory_structure(node_id, repository, model):\n    $$$'

Length of output: 114



Script:

#!/bin/bash
# Description: Locate the function `infer_from_directory_structure` and verify error handling.

# Step 1: Locate the function definition
rg 'async def infer_from_directory_structure' -t py

# Step 2: Check for error handling within the function
rg 'try:' -A 10 -B 10 -t py

Length of output: 50516

Comment on lines +21 to +30
def flatten_model(model: BaseModel, parent_id: Optional[str] = None) -> Dict[str, Any]:
    """Flatten a single Pydantic model to a dictionary handling nested structures."""
    result = {**model.dict(), 'parent_id': parent_id}
    if hasattr(model, 'default_relationship') and model.default_relationship:
        result.update({
            'relationship_type': model.default_relationship.type,
            'relationship_source': model.default_relationship.source,
            'relationship_target': model.default_relationship.target
        })
    return result

Refactor the flatten_model function to improve readability and maintainability.

- result = {**model.dict(), 'parent_id': parent_id}
- if hasattr(model, 'default_relationship') and model.default_relationship:
-     result.update({
-         'relationship_type': model.default_relationship.type,
-         'relationship_source': model.default_relationship.source,
-         'relationship_target': model.default_relationship.target
-     })
+ result = model.dict()
+ result['parent_id'] = parent_id
+ if model.default_relationship:
+     result['relationship_type'] = model.default_relationship.type
+     result['relationship_source'] = model.default_relationship.source
+     result['relationship_target'] = model.default_relationship.target

Committable suggestion


Suggested change
def flatten_model(model: BaseModel, parent_id: Optional[str] = None) -> Dict[str, Any]:
    """Flatten a single Pydantic model to a dictionary handling nested structures."""
    result = {**model.dict(), 'parent_id': parent_id}
    if hasattr(model, 'default_relationship') and model.default_relationship:
        result.update({
            'relationship_type': model.default_relationship.type,
            'relationship_source': model.default_relationship.source,
            'relationship_target': model.default_relationship.target
        })
    return result
def flatten_model(model: BaseModel, parent_id: Optional[str] = None) -> Dict[str, Any]:
    """Flatten a single Pydantic model to a dictionary handling nested structures."""
    result = model.dict()
    result['parent_id'] = parent_id
    if model.default_relationship:
        result['relationship_type'] = model.default_relationship.type
        result['relationship_source'] = model.default_relationship.source
        result['relationship_target'] = model.default_relationship.target
    return result

Comment on lines +32 to +43
def recursive_flatten(items: Union[List[Any], BaseModel], parent_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Recursively flatten nested Pydantic models or lists of models."""
    if isinstance(items, list):
        return [entry for item in items for entry in recursive_flatten(item, parent_id)]
    elif isinstance(items, BaseModel):
        flat = [flatten_model(items, parent_id)]
        for field, value in items:
            if isinstance(value, (BaseModel, list)):
                flat.extend(recursive_flatten(value, items.dict().get('node_id', None)))
        return flat
    else:
        return []

Optimize the recursive_flatten function to handle different data types more efficiently.

- if isinstance(items, list):
-     return [entry for item in items for entry in recursive_flatten(item, parent_id)]
- elif isinstance(items, BaseModel):
-     flat = [flatten_model(items, parent_id)]
-     for field, value in items:
-         if isinstance(value, (BaseModel, list)):
-             flat.extend(recursive_flatten(value, items.dict().get('node_id', None)))
-     return flat
- else:
-     return []
+ return [flatten_model(item, parent_id) if isinstance(item, BaseModel) else recursive_flatten(item, parent_id) for item in items] if isinstance(items, list) else [flatten_model(items, parent_id)]

Committable suggestion


Suggested change
def recursive_flatten(items: Union[List[Any], BaseModel], parent_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Recursively flatten nested Pydantic models or lists of models."""
    if isinstance(items, list):
        return [entry for item in items for entry in recursive_flatten(item, parent_id)]
    elif isinstance(items, BaseModel):
        flat = [flatten_model(items, parent_id)]
        for field, value in items:
            if isinstance(value, (BaseModel, list)):
                flat.extend(recursive_flatten(value, items.dict().get('node_id', None)))
        return flat
    else:
        return []
def recursive_flatten(items: Union[List[Any], BaseModel], parent_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Recursively flatten nested Pydantic models or lists of models."""
    return [flatten_model(item, parent_id) if isinstance(item, BaseModel) else recursive_flatten(item, parent_id) for item in items] if isinstance(items, list) else [flatten_model(items, parent_id)]
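The recursion being discussed can be exercised without Pydantic by modelling nodes as plain dicts; everything below, including the field names, is an illustrative sketch rather than the project's actual helpers:

```python
from typing import Any, Dict, List, Optional, Union

def flatten_node(node: Dict[str, Any], parent_id: Optional[str] = None) -> Dict[str, Any]:
    """Copy scalar fields of one node and record its parent link."""
    flat = {k: v for k, v in node.items() if not isinstance(v, (dict, list))}
    flat["parent_id"] = parent_id
    return flat

def recursive_flatten(
    items: Union[List[Any], Dict[str, Any]],
    parent_id: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Depth-first flatten of nested dicts/lists into one list of rows."""
    if isinstance(items, list):
        # Flatten each element and splice the results into one flat list.
        return [row for item in items for row in recursive_flatten(item, parent_id)]
    if isinstance(items, dict):
        rows = [flatten_node(items, parent_id)]
        for value in items.values():
            if isinstance(value, (dict, list)):
                # Children inherit this node's id as their parent_id.
                rows.extend(recursive_flatten(value, items.get("node_id")))
        return rows
    return []
```

Note that the per-branch `if/elif` shape keeps the list case from producing nested lists, which is the subtlety any compressed one-liner version has to preserve.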

Comment on lines +62 to +63
if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
    await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])

Add error handling for add_edge to manage potential failures gracefully.

- if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
-     await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])
+ if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
+     try:
+         await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])
+     except Exception as e:
+         logging.error(f"Failed to add edge: {e}")

Committable suggestion


Suggested change
if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
    await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])
if pd.notna(row['relationship_source']) and pd.notna(row['relationship_target']):
    try:
        await graph_client.add_edge(row['relationship_source'], row['relationship_target'], relationship_name=row['relationship_type'])
    except Exception as e:
        logging.error(f"Failed to add edge: {e}")

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 48d21c4 and 4488339.
Files selected for processing (1)
  • .github/workflows/test_common.yml (2 hunks)
Files skipped from review due to trivial changes (1)
  • .github/workflows/test_common.yml

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Out of diff range and nitpick comments (5)
cognee/infrastructure/llm/openai/adapter.py (4)

Line range hint 23-25: Encapsulate API key management to enhance security and maintainability.

Direct manipulation of openai.api_key in the constructor can lead to security risks and makes the code less maintainable. Consider using a method or a separate configuration manager to handle API keys.


Line range hint 29-31: Make retry attempts configurable to adapt to different runtime environments.

The number of retry attempts is hard-coded in several methods. Making this configurable via an environment variable or configuration file would allow easier adjustments based on the deployment environment.

Also applies to: 33-35, 37-39, 41-43, 45-47


Line range hint 37-39: Use a constant for the default model to avoid repetition and potential errors.

The default model string "text-embedding-3-large" is repeated in multiple methods. Defining this as a constant at the class or module level would reduce repetition and help prevent errors if the model name changes.

Also applies to: 41-43


Line range hint 57-59: Improve the formatting of the user input in the message for clarity.

The user input is embedded in a larger string which might make it hard to read. Consider formatting it on a new line or using a clearer delimiter.

cognee/infrastructure/InfrastructureConfig.py (1)

Line range hint 36-149: Add error handling and validation for configuration parameters to enhance robustness.

The get_config and set_config methods handle a wide range of configurations. Adding error handling and validation for the input parameters would make these methods more robust and prevent potential runtime errors.

Also applies to: 204-206

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 4488339 and d1fa826.
Files ignored due to path filters (2)
  • docker-compose.yml is excluded by !**/*.yml
  • pyproject.toml is excluded by !**/*.toml
Files selected for processing (4)
  • cognee/config.py (3 hunks)
  • cognee/infrastructure/InfrastructureConfig.py (4 hunks)
  • cognee/infrastructure/llm/openai/adapter.py (1 hunks)
  • cognee/shared/data_models.py (7 hunks)
Files skipped from review as they are similar to previous changes (2)
  • cognee/config.py
  • cognee/shared/data_models.py
Additional comments not posted (1)
cognee/infrastructure/InfrastructureConfig.py (1)

35-36: Ensure consistent handling of graph_topology across different methods.

The graph_topology attribute is set in multiple places within the InfrastructureConfig class. Verify that it is consistently handled across all methods to avoid configuration inconsistencies.

Also applies to: 82-84, 149-149, 204-206

Comment on lines 16 to 19
if config.monitoring_tool == MonitoringTool.LANGFUSE:
    from langfuse.openai import AsyncOpenAI, OpenAI
else:
    from openai import AsyncOpenAI, OpenAI

Consider using a more dynamic import strategy for AsyncOpenAI and OpenAI.

Using conditional imports based on configuration can lead to code that is harder to maintain and test. Consider using a factory pattern or dependency injection to handle different configurations.
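One way to sketch that factory approach: resolve the client class from configuration in a single registry lookup instead of branching on imports at module load. The class names and the `make_client` helper below are hypothetical stand-ins, not the adapter's real API:

```python
# Hypothetical stand-ins for the plain and Langfuse-wrapped OpenAI clients.
class OpenAIClient:
    source = "openai"

class LangfuseOpenAIClient:
    source = "langfuse.openai"

def make_client(monitoring_tool: str):
    """Pick the client implementation in one place.

    New monitoring backends only require a registry entry, and tests can
    inject a fake class without patching module-level imports.
    """
    registry = {
        "langfuse": LangfuseOpenAIClient,
    }
    return registry.get(monitoring_tool, OpenAIClient)()
```

In the real adapter the registry values would be import callables (or injected dependencies) so that optional packages like `langfuse` are only imported when actually selected.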

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between d1fa826 and 352ea25.
Files selected for processing (3)
  • cognee/infrastructure/llm/generic_llm_api/adapter.py (2 hunks)
  • cognee/infrastructure/llm/openai/adapter.py (1 hunks)
  • cognee/shared/data_models.py (7 hunks)
Files skipped from review as they are similar to previous changes (3)
  • cognee/infrastructure/llm/generic_llm_api/adapter.py
  • cognee/infrastructure/llm/openai/adapter.py
  • cognee/shared/data_models.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between d1fa826 and 2657aa7.
Files ignored due to path filters (2)
  • evals/soldiers_home.pdf is excluded by !**/*.pdf, !**/*.pdf
  • pyproject.toml is excluded by !**/*.toml
Files selected for processing (6)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/modules/cognify/graph/add_data_chunks.py (3 hunks)
  • evals/generate_test_set.py (1 hunks)
  • evals/natural_language_processing.txt (1 hunks)
  • evals/simple_rag_vs_cognee_eval.py (1 hunks)
  • evals/trump.txt (1 hunks)
Files skipped from review due to trivial changes (1)
  • evals/natural_language_processing.txt
Files skipped from review as they are similar to previous changes (1)
  • cognee/modules/cognify/graph/add_data_chunks.py
Additional Context Used
Ruff (15)
cognee/api/v1/cognify/cognify.py (4)

110-110: Local variable added_chunks is assigned to but never used


111-111: Local variable added__basic_rag_chunks is assigned to but never used


188-188: Do not use bare except


263-263: Local variable graph is assigned to but never used

evals/generate_test_set.py (4)

34-34: Module level import not at top of file


35-35: Module level import not at top of file


36-36: Module level import not at top of file


41-41: Module level import not at top of file

evals/simple_rag_vs_cognee_eval.py (7)

11-11: Module level import not at top of file


26-26: Module level import not at top of file


27-27: Module level import not at top of file


27-27: Redefinition of unused List from line 5


28-28: Module level import not at top of file


33-33: Local variable response is annotated but never used


71-71: Module level import not at top of file

Additional comments not posted (3)
evals/generate_test_set.py (2)

1-6: Imports and dotenv setup are correctly placed at the top of the file.


23-27: The method generate_goldens_from_docs is used to generate goldens from documents. Ensure that the document paths and parameters are correctly configured for your use case.

evals/trump.txt (1)

1-15: The content of the file provides a detailed account of Donald Trump's speech. Ensure that all factual information is accurate and up-to-date.

Comment on lines +39 to +41
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)

from deepeval import evaluate

The instantiation of AnswerRelevancyMetric and the import of evaluate are correctly placed. However, consider moving all imports to the top of the file.

- from deepeval import evaluate
+ # Move this import to the top of the file

Committable suggestion


Suggested change
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)
from deepeval import evaluate
from deepeval import evaluate
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)

dataset.test_cases = convert_goldens_to_test_cases(dataset.goldens)


from deepeval.metrics import HallucinationMetric

The import of HallucinationMetric should be moved to the top of the file to maintain consistency and readability.

- from deepeval.metrics import HallucinationMetric
+ # Move this import to the top of the file

Committable suggestion


Suggested change
from deepeval.metrics import HallucinationMetric

Comment on lines +188 to +189
except:
    pass

Avoid using a bare except statement as it can catch unexpected exceptions and obscure real errors. Specify the exception type or handle the general exception more gracefully.

- except:
+ except Exception as e:
+     logger.error("Failed during cognitive layer processing: %s", str(e))

Committable suggestion


Suggested change
except:
    pass
except Exception as e:
    logger.error("Failed during cognitive layer processing: %s", str(e))
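A minimal, runnable illustration of why the narrower handler is preferable; the `process_layer` step and the logger wiring here are hypothetical, not the project's real code:

```python
import logging

logger = logging.getLogger(__name__)

def process_layer(layer):
    """Stand-in for one cognitive-layer processing step."""
    if layer is None:
        raise ValueError("empty cognitive layer")
    return layer.upper()

def run(layers):
    results = []
    for layer in layers:
        try:
            results.append(process_layer(layer))
        except Exception as e:
            # A bare `except:` would also swallow KeyboardInterrupt and
            # SystemExit, and would hide the failure entirely with `pass`.
            # Catching Exception and logging keeps the error visible while
            # letting the loop continue.
            logger.error("Failed during cognitive layer processing: %s", e)
    return results
```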


config.set_graph_model(SourceCodeGraph)
config.set_classification_model(CodeContentPrediction)
graph = await cognify()

The local variable graph is assigned but never used. Consider removing this assignment if it's not needed.

- graph = await cognify()
+ # Consider removing this line if 'graph' is not used.

Committable suggestion


Suggested change
graph = await cognify()
# Consider removing this line if 'graph' is not used.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 2657aa7 and d099cae.
Files selected for processing (2)
  • evals/generate_test_set.py (1 hunks)
  • evals/simple_rag_vs_cognee_eval.py (1 hunks)
Additional Context Used
Ruff (10)
evals/generate_test_set.py (4)

36-36: Module level import not at top of file


37-37: Module level import not at top of file


38-38: Module level import not at top of file


43-43: Module level import not at top of file

evals/simple_rag_vs_cognee_eval.py (6)

11-11: Module level import not at top of file


44-44: Module level import not at top of file


45-45: Module level import not at top of file


45-45: Redefinition of unused List from line 5


46-46: Module level import not at top of file


97-97: Module level import not at top of file

Comment on lines +36 to +38
import pytest
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric

Move module-level imports to the top of the file to improve readability and maintain Python conventions.

- import pytest
- from deepeval import assert_test
- from deepeval.metrics import AnswerRelevancyMetric
- from deepeval import evaluate
+ import pytest
+ from deepeval import assert_test, evaluate
+ from deepeval.metrics import AnswerRelevancyMetric

Also applies to: 43-43


Committable suggestion


Suggested change
import pytest
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
import pytest
from deepeval import assert_test, evaluate
from deepeval.metrics import AnswerRelevancyMetric

Comment on lines +8 to +20
# synthesizer = Synthesizer()
# synthesizer.generate_goldens_from_docs(
# document_paths=['natural_language_processing.txt', 'soldiers_home.pdf', 'trump.txt'],
# max_goldens_per_document=5,
# num_evolutions=5,
# include_expected_output=True,
# enable_breadth_evolve=True,
# )
#
# synthesizer.save_as(
# file_type='json', # or 'csv'
# directory="./synthetic_data"
# )

Consider removing or refactoring the commented-out code if it's no longer needed.

It's generally a good practice to keep the codebase clean by removing unused code blocks. This can help improve the maintainability of the code.



import logging
from typing import List, Dict

Remove the redundant import of List as it is already imported earlier in the file.

- from typing import List, Dict
+ from typing import Dict

Committable suggestion


Suggested change
from typing import List, Dict
from typing import Dict

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between d099cae and 8ef2373.
Files ignored due to path filters (3)
  • docker-compose.yml is excluded by !**/*.yml
  • evals/test_datasets/initial_test/soldiers_home.pdf is excluded by !**/*.pdf, !**/*.pdf
  • pyproject.toml is excluded by !**/*.toml
Files selected for processing (4)
  • cognee/infrastructure/databases/graph/falkordb/adapter.py (1 hunks)
  • evals/simple_rag_vs_cognee_eval.py (1 hunks)
  • evals/test_datasets/initial_test/natural_language_processing.txt (1 hunks)
  • evals/test_datasets/initial_test/trump.txt (1 hunks)
Additional Context Used
LanguageTool (5)
evals/test_datasets/initial_test/natural_language_processing.txt (1)

Near line 1: Possible missing comma found.
Context: ...formation and insights contained in the documents as well as categorize and organize the ...

evals/test_datasets/initial_test/trump.txt (4)

Near line 1: The preposition ‘to’ seems more likely in this position.
Context: ...onstitution – during a bombastic speech for the National Rifle Association in which...


Near line 5: Consider a different adjective to strengthen your wording.
Context: ...t all. I intend to serve four years and do a great job.” The 22nd amendment, which was enacte...


Near line 9: The preposition “about” seems more likely in this position.
Context: ... In his speech to the NRA, Trump spoke on abortion, immigration and criticized Ro...


Near line 9: Possible spelling mistake found.
Context: ...on abortion, immigration and criticized Robert F Kennedy Jr as being part of the “radical left”....

Ruff (7)
evals/simple_rag_vs_cognee_eval.py (7)

11-11: Module level import not at top of file


44-44: Module level import not at top of file


45-45: Module level import not at top of file


45-45: Redefinition of unused List from line 5


46-46: Module level import not at top of file


79-79: Local variable graph is assigned to but never used


131-131: Undefined name run_cognify_base_rag_and_search

Additional comments not posted (4)
evals/test_datasets/initial_test/natural_language_processing.txt (1)

2-2: This paragraph clearly outlines the challenges in NLP.

evals/test_datasets/initial_test/trump.txt (1)

5-5: The adjective "great" adequately conveys the intended meaning here.

evals/simple_rag_vs_cognee_eval.py (1)

131-131: Verify the existence of run_cognify_base_rag_and_search. It appears to be called but is not defined or imported in this file.

cognee/infrastructure/databases/graph/falkordb/adapter.py (1)

13-186: The FalcorDBAdapter class is well-implemented with clear, asynchronous database operation methods and proper error handling.

@@ -0,0 +1,2 @@
Natural language processing (NLP) is an interdisciplinary subfield of computer science and information retrieval. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches. The goal is a computer capable of "understanding"[citation needed] the contents of documents, including the contextual nuances of the language within them. To this end, natural language processing often borrows ideas from theoretical linguistics. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Consider adding a comma after "documents" for better readability.

- ...contained in the documents as well as categorize...
+ ...contained in the documents, as well as categorize...



The 22nd amendment, which was enacted following Franklin Delano Roosevelt’s fourth term, limits the presidency to two terms.

In his speech to the NRA, Trump spoke on abortion, immigration and criticized Robert F Kennedy Jr as being part of the “radical left”. He also complained about the multiple criminal cases against him, including a gag order that bars him from commenting about witnesses in his ongoing New York City criminal trial.

Consider changing "on" to "about" to clarify the topics discussed.

- ...Trump spoke on abortion...
+ ...Trump spoke about abortion...


@@ -0,0 +1,15 @@
Donald Trump flirted with the idea of being president for three terms – a clear violation of the US constitution – during a bombastic speech for the National Rifle Association in which he vowed to reverse gun safety measures green-lighted during the Biden administration.

Consider changing "for" to "to" to clarify the context of the speech.

- ...speech for the National Rifle Association...
+ ...speech to the National Rifle Association...


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Outside diff range and nitpick comments (3)
cognee/modules/search/llm/extraction/categorize_relevant_category.py (1)

6-17: The implementation of categorize_relevant_category looks solid and follows async programming practices well.

Consider reviewing the use of print statements for sensitive data logging in a production environment.

cognee/modules/search/graph/search_categories.py (1)

Line range hint 18-52: The enhancements to search_categories to support different graph engines and configurations are well-implemented.

Consider adding more detailed docstrings to explain the parameters and the return type more clearly.

cognee/api/v1/cognify/cognify.py (1)

83-149: The modifications to the cognify function to handle different datasets and graph interactions are comprehensive and well-implemented.

Consider adding more comments to explain complex sections of the code for better maintainability.

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 8ef2373 and 63356f2.
Files selected for processing (5)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/infrastructure/InfrastructureConfig.py (5 hunks)
  • cognee/modules/search/graph/search_categories.py (2 hunks)
  • cognee/modules/search/llm/extraction/categorize_relevant_category.py (1 hunks)
  • evals/simple_rag_vs_cognee_eval.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • cognee/infrastructure/InfrastructureConfig.py
Additional Context Used
Ruff (11)
cognee/api/v1/cognify/cognify.py (3)

110-110: Local variable added_chunks is assigned to but never used


111-111: Local variable added__basic_rag_chunks is assigned to but never used


188-188: Do not use bare except

cognee/modules/search/graph/search_categories.py (2)

9-9: Module level import not at top of file


10-10: Module level import not at top of file

evals/simple_rag_vs_cognee_eval.py (6)

11-11: Module level import not at top of file


44-44: Module level import not at top of file


45-45: Module level import not at top of file


45-45: Redefinition of unused List from line 5


46-46: Module level import not at top of file


79-79: Local variable graph is assigned to but never used

Additional comments not posted (3)
cognee/modules/search/graph/search_categories.py (1)

12-16: The strip_exact_regex function is a useful utility for regex operations.
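The function body is not visible in this review view; as a hedged sketch (signature and semantics assumed from the name), an exact-match stripper built on `re.escape` might look like:

```python
import re

def strip_exact_regex(text: str, pattern: str) -> str:
    """Remove every exact occurrence of `pattern` from `text`.

    re.escape treats the pattern as a literal string, so regex
    metacharacters such as [ or * are matched verbatim.
    """
    return re.sub(re.escape(pattern), "", text)
```

This makes it safe to strip literal markers such as JSON code-fence prefixes from LLM output without hand-escaping metacharacters.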

evals/simple_rag_vs_cognee_eval.py (1)

1-10: The new imports for handling evaluation datasets are correctly placed and used.

cognee/api/v1/cognify/cognify.py (1)

Line range hint 1-38: The new imports and initial setup in cognify.py are well-organized and relevant to the functionality.

Comment on lines +11 to +69
from cognee.infrastructure.llm.get_llm_client import get_llm_client

dataset = EvaluationDataset()
dataset.add_test_cases_from_json_file(
# file_path is the absolute path to your .json file
file_path="./synthetic_data/20240519_185842.json",
input_key_name="input",
actual_output_key_name="actual_output",
expected_output_key_name="expected_output",
context_key_name="context"
)

print(dataset)
# from deepeval.synthesizer import Synthesizer
#
# synthesizer = Synthesizer(model="gpt-3.5-turbo")
#
# dataset = EvaluationDataset()
# dataset.generate_goldens_from_docs(
# synthesizer=synthesizer,
# document_paths=['natural_language_processing.txt', 'soldiers_home.pdf', 'trump.txt'],
# max_goldens_per_document=10,
# num_evolutions=5,
# enable_breadth_evolve=True,
# )


print(dataset.goldens)
print(dataset)




import logging
from typing import List, Dict
from cognee.infrastructure import infrastructure_config

logger = logging.getLogger(__name__)

class AnswerModel(BaseModel):
response:str
def get_answer_base(content: str, context:str, response_model: Type[BaseModel]):
llm_client = get_llm_client()

system_prompt = "THIS IS YOUR CONTEXT:" + str(context)

return llm_client.create_structured_output(content, system_prompt, response_model)
def get_answer(content: str,context, model: Type[BaseModel]= AnswerModel):

try:
return (get_answer_base(
content,
context,
model
))
except Exception as error:
logger.error("Error extracting cognitive layers from content: %s", error, exc_info = True)
raise error


The implementation of dataset handling and model testing functions is robust.

Consider using more specific exception types in the get_answer function to handle errors more gracefully.

- except Exception as error:
+ except SpecificExceptionType as error:

Committable suggestion was skipped due to low confidence.
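As a sketch of that suggestion applied to the get_answer shown above: narrow the bare Exception to the failure modes the function can actually distinguish. The get_answer_base stub below stands in for the PR's llm_client.create_structured_output call and is an assumption, not the real client.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass
class AnswerModel:
    response: str

def get_answer_base(content: str, context: str, response_model=AnswerModel):
    # Stub for llm_client.create_structured_output; raises TypeError
    # when the payload does not fit the response model's fields.
    return response_model(response=f"{context}: {content}")

def get_answer(content: str, context: str, model=AnswerModel):
    try:
        return get_answer_base(content, context, model)
    except TypeError as error:
        # The structured output did not match the response schema.
        logger.error("Structured output failed validation: %s", error, exc_info=True)
        raise
    except (TimeoutError, ConnectionError) as error:
        # Transient transport failures: log and re-raise so callers can retry.
        logger.error("LLM request failed: %s", error, exc_info=True)
        raise
```

Narrow except clauses keep genuinely unexpected errors (programming bugs, KeyboardInterrupt) from being logged as extraction failures.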

Comment on lines +154 to +195
async def process_text(chunk_collection: str, chunk_id: str, input_text: str, file_metadata: dict, document_id: str):
print(f"Processing chunk ({chunk_id}) from document ({file_metadata['id']}).")

graph_client = await get_graph_client(infrastructure_config.get_config()["graph_engine"])

classified_categories = await get_content_categories(input_text)

graph_topology = infrastructure_config.get_config()["graph_model"]
if graph_topology == SourceCodeGraph:
classified_categories = [{'data_type': 'text', 'category_name': 'Code and functions'}]
elif graph_topology == KnowledgeGraph:
classified_categories = await get_content_categories(input_text)
else:
classified_categories = [{'data_type': 'text', 'category_name': 'Unclassified text'}]

# await add_label_nodes(graph_client, document_id, chunk_id, file_metadata["keywords"].split("|"))

await add_classification_nodes(
graph_client,
parent_node_id = document_id,
categories = classified_categories,
)
graph_client,
parent_node_id = document_id,
categories = classified_categories,
)

print(f"Chunk ({chunk_id}) classified.")

content_summary = await get_content_summary(input_text)
await add_summary_nodes(graph_client, document_id, content_summary)

print(f"Chunk ({chunk_id}) summarized.")

cognitive_layers = await get_cognitive_layers(input_text, classified_categories)
cognitive_layers = (await add_cognitive_layers(graph_client, document_id, cognitive_layers))[:2]

layer_graphs = await get_layer_graphs(input_text, cognitive_layers)
await add_cognitive_layer_graphs(graph_client, chunk_collection, chunk_id, layer_graphs)

if infrastructure_config.get_config()["connect_documents"] is True:
db_engine = infrastructure_config.get_config()["database_engine"]
relevant_documents_to_connect = db_engine.fetch_cognify_data(excluded_document_id = raw_document_id)

list_of_nodes = []

relevant_documents_to_connect.append({
"layer_id": document_id,
})
cognitive_layers = cognitive_layers[:config.cognitive_layers_limit]

try:
cognitive_layers = (await add_cognitive_layers(graph_client, document_id, cognitive_layers))[:2]
print("cognitive_layers", cognitive_layers)
layer_graphs = await get_layer_graphs(input_text, cognitive_layers)
await add_cognitive_layer_graphs(graph_client, chunk_collection, chunk_id, layer_graphs)
except:
pass





# if infrastructure_config.get_config()["connect_documents"] is True:

The process_text function includes robust error handling and detailed logging.

Consider using specific exception types instead of a bare except to handle errors more gracefully.

- except:
+ except SpecificExceptionType:

Committable suggestion was skipped due to low confidence.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

Outside diff range and nitpick comments (1)
cognee/modules/search/vector/search_similarity.py (1)

14-14: Consider removing debug print statements or replacing them with proper logging mechanisms before deploying to production.

Also applies to: 23-24

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 63356f2 and 3fadb27.
Files selected for processing (11)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/api/v1/search/search.py (3 hunks)
  • cognee/modules/search/graph/search_adjacent.py (1 hunks)
  • cognee/modules/search/graph/search_categories.py (2 hunks)
  • cognee/modules/search/graph/search_cypher.py (1 hunks)
  • cognee/modules/search/graph/search_neighbour.py (2 hunks)
  • cognee/modules/search/llm/extraction/categorize_relevant_category.py (1 hunks)
  • cognee/modules/search/vector/bm25.py (1 hunks)
  • cognee/modules/search/vector/fusion.py (1 hunks)
  • cognee/modules/search/vector/search_similarity.py (1 hunks)
  • evals/simple_rag_vs_cognee_eval.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • cognee/modules/search/vector/bm25.py
  • cognee/modules/search/vector/fusion.py
Files skipped from review as they are similar to previous changes (1)
  • cognee/modules/search/llm/extraction/categorize_relevant_category.py
Additional Context Used
Ruff (10)
cognee/api/v1/cognify/cognify.py (2)

111-111: Local variable added__basic_rag_chunks is assigned to but never used


188-188: Do not use bare except

cognee/modules/search/graph/search_categories.py (2)

10-10: Module level import not at top of file


11-11: Module level import not at top of file

evals/simple_rag_vs_cognee_eval.py (6)

11-11: Module level import not at top of file


44-44: Module level import not at top of file


45-45: Module level import not at top of file


45-45: Redefinition of unused List from line 5


46-46: Module level import not at top of file


79-79: Local variable graph is assigned to but never used


await add("data://test_datasets", "initial_test")

graph = await cognify("initial_test")

Local variable graph is assigned to but never used.

- graph = await cognify("initial_test")
+ # Consider removing this line if 'graph' is not used.


logger = logging.getLogger(__name__)

class AnswerModel(BaseModel):
response:str

The local variable response is annotated but never used within the AnswerModel. Consider removing it if it's not needed.

- response:str
+ # Remove this line if 'response' is not used.


Comment on lines +44 to +46
import logging
from typing import List, Dict
from cognee.infrastructure import infrastructure_config

Consolidate module-level imports and remove redundant import of List.

- import logging
- from typing import List, Dict
- from cognee.infrastructure import infrastructure_config
+ import logging
+ from typing import Dict
+ from cognee.infrastructure import infrastructure_config


import dotenv
dotenv.load_dotenv()

from cognee.infrastructure.llm.get_llm_client import get_llm_client

Move module-level imports to the top of the file.

- from cognee.infrastructure.llm.get_llm_client import get_llm_client
+ # Move this import to the top of the file

Committable suggestion was skipped due to low confidence.

except FileTypeException:
logger.warning("File (%s) has an unknown file type. We are skipping it.", file_metadata["id"])

added_chunks: list[tuple[str, str, dict]] = await add_data_chunks(data_chunks)
added_chunks = await add_data_chunks(data_chunks)
added__basic_rag_chunks = await add_data_chunks_basic_rag(data_chunks)

Local variable added__basic_rag_chunks is assigned to but never used.

- added__basic_rag_chunks = await add_data_chunks_basic_rag(data_chunks)
+ # Consider removing this line if 'added__basic_rag_chunks' is not used.


Comment on lines +23 to +24
else:
raise ValueError("Unsupported graph engine type.")

Consider enhancing the error message to specify which graph engine type was attempted.

- raise ValueError("Unsupported graph engine type.")
+ raise ValueError(f"Unsupported graph engine type: {infrastructure_config.get_config()['graph_engine']}")

Committable suggestion was skipped due to low confidence.

@@ -22,7 +22,7 @@ async def search_adjacent(graph: Union[nx.Graph, any], query: str, infrastructur

if node_id is None:
return {}

from cognee.infrastructure import infrastructure_config

Duplicate import statement for infrastructure_config. Consider removing it to clean up the code.


Enhance the error message to specify which graph engine type was attempted.

- raise ValueError("Unsupported graph engine type in the configuration.")
+ raise ValueError(f"Unsupported graph engine type in the configuration: {infrastructure_config.get_config()['graph_engine']}")

Committable suggestion was skipped due to low confidence.
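Since the same "Unsupported graph engine type" enhancement is suggested in search_adjacent, search_neighbour, and search_categories, a small shared guard (a hypothetical helper, not present in the PR) would keep the enhanced message in one place:

```python
def require_graph_engine(config: dict, supported: tuple):
    """Return the configured graph engine, or raise a descriptive error
    that names the engine that was actually attempted."""
    engine = config.get("graph_engine")
    if engine not in supported:
        raise ValueError(f"Unsupported graph engine type in the configuration: {engine}")
    return engine
```

Each search module could then call this once instead of repeating the raise, so the message stays consistent across files.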

@@ -20,26 +22,23 @@ async def search_neighbour(graph: Union[nx.Graph, any], id: str, infrastructure_
Returns:
- List[str]: A list of 'description' attributes of nodes that share the same 'layer_uuid' with the specified node.
"""
node_id = other_param.get('node_id') if other_param else None
from cognee.infrastructure import infrastructure_config

Duplicate import statement for infrastructure_config. Consider removing it to clean up the code.


Enhance the error message to specify which graph engine type was attempted.

- raise ValueError("Unsupported graph engine type in the configuration.")
+ raise ValueError(f"Unsupported graph engine type in the configuration: {infrastructure_config.get_config()['graph_engine']}")

Committable suggestion was skipped due to low confidence.

@@ -21,9 +36,22 @@ async def search_categories(graph: Union[nx.Graph, any], query_label: str, infra
each representing a node with 'nodeId' and 'summary'.
"""
# Determine which client is in use based on the configuration
from cognee.infrastructure import infrastructure_config

Duplicate import statement for infrastructure_config. Consider removing it to clean up the code.


Enhance the error message to specify which graph engine type was attempted.

- raise ValueError("Unsupported graph engine type.")
+ raise ValueError(f"Unsupported graph engine type: {infrastructure_config.get_config()['graph_engine']}")

Committable suggestion was skipped due to low confidence.

Comment on lines +23 to +26
SUMMARY_CLASSIFICATION = 'SUMMARY_CLASSIFICATION'
NODE_CLASSIFICATION = 'NODE_CLASSIFICATION'
DOCUMENT_CLASSIFICATION = 'DOCUMENT_CLASSIFICATION',
CYPHER = 'CYPHER'

Consider adding a default case in the from_str method to handle unexpected search types more gracefully.

- raise ValueError(f"{name} is not a valid SearchType")
+ raise ValueError(f"{name} is not a valid SearchType. Available types: {', '.join([e.name for e in SearchType])}")

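A minimal sketch of the suggested from_str with a descriptive error (sketch only; the enum's real definition lives in the PR). Note also that the trailing comma after 'DOCUMENT_CLASSIFICATION' in the diff above makes that member's value a one-element tuple in Python, which would break string comparisons:

```python
from enum import Enum

class SearchType(Enum):
    SUMMARY_CLASSIFICATION = "SUMMARY_CLASSIFICATION"
    NODE_CLASSIFICATION = "NODE_CLASSIFICATION"
    # No trailing comma here: a comma would turn the value into a tuple.
    DOCUMENT_CLASSIFICATION = "DOCUMENT_CLASSIFICATION"
    CYPHER = "CYPHER"

    @classmethod
    def from_str(cls, name: str) -> "SearchType":
        try:
            return cls[name.upper()]
        except KeyError:
            valid = ", ".join(member.name for member in cls)
            raise ValueError(f"{name} is not a valid SearchType. Available types: {valid}") from None
```

Looking the member up by name (case-insensitively) and listing the valid names on failure gives callers an actionable error instead of a bare rejection.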

@Vasilije1990 Vasilije1990 merged commit 9bb30bc into main May 22, 2024
4 of 16 checks passed