llm-security

This project investigates the security of large language models by performing binary classification on a set of input prompts to detect malicious ones. It analyzes several approaches: classical ML algorithms, a trained LLM, and a fine-tuned LLM.

cybersecurity transformers-models prompt-injection llm-prompting llm-security

Updated Dec 18, 2023
Jupyter Notebook

leondz / lm_risk_cards

Risks and targets for assessing LLMs & LLM vulnerabilities

security vulnerability red-teaming llm llm-security

Updated May 7, 2024
Python

raga-ai-hub / raga-llm-hub

Framework for LLM evaluation, guardrails and security

guardrails llmops llm-security llm-evaluation

Updated Mar 10, 2024
Python

briland / LLM-security-and-privacy

LLM security and privacy

security awesome privacy awesome-list llm generative-ai llm-security llm-framework llm-threats llm-vulnerabilities llm-privacy awesome-llm-security-and-privacy

Updated Apr 15, 2024
TeX

lakeraai / pint-benchmark

A benchmark for prompt injection detection systems.

benchmark llm prompt-injection llm-security llm-benchmarking

Updated May 7, 2024
Jupyter Notebook

llm-platform-security / SecGPT

SecGPT: An execution isolation architecture for LLM-based systems

sandbox gpt isolation multi-agent-systems openai-api llm chatgpt langchain llm-agent llm-security llm-framework llm-privacy llm-platform llm-based-systems

Updated Apr 29, 2024
Python

LostOxygen / llm-confidentiality

Whispers in the Machine: Confidentiality in LLM-integrated Systems

security machine-learning framework deep-learning transformers openai prompt-toolkit gpt confidentiality systems-security llm prompt-engineering chatgpt prompt-injection llm-security

Updated Apr 24, 2024
Python

levitation-opensource / Manipulative-Expression-Recognition

MER is software that identifies and highlights manipulative communication in text from human conversations and AI-generated responses. MER benchmarks language models for manipulative expressions, fostering the development of transparency and safety in AI. It also supports victims of manipulation by detecting manipulative patterns in human communication.

benchmarking sentiment-analysis manipulation transparency fraud-prevention human-computer-interaction human-robot-interaction expression-recognition sentiment-classification fraud-detection psychometrics misinformation conversation-analysis conversation-analytics llm prompt-engineering prompt-injection llm-security llm-training llm-test

Updated Jan 31, 2024
HTML

llm-security

Here are 36 public repositories matching this topic...

Giskard-AI / giskard

pathwaycom / llm-app

protectai / llm-guard

deadbits / vigil-llm

EasyJailbreak / EasyJailbreak

chawins / llm-sp

liu00222 / Open-Prompt-Injection

ZenGuard-AI / fast-llm-security-guardrails

yevh / TaaC-AI

NaniDAO / ie

msoedov / agentic_security

llm-platform-security / chatgpt-plugin-eval

sinanw / llm-security-prompt-injection

leondz / lm_risk_cards

raga-ai-hub / raga-llm-hub

briland / LLM-security-and-privacy

lakeraai / pint-benchmark

llm-platform-security / SecGPT

LostOxygen / llm-confidentiality

levitation-opensource / Manipulative-Expression-Recognition
