[Proposal] Backend-free support #670
Comments
This is a pretty interesting idea: it solves the explosion of backend DLLs we have, but still keeps the advantage of automatic feature selection for end-users. My main concern (to add to the potential risks) is security - there's obviously a huge security risk in downloading DLLs and executing them as part of a program (rather than just as part of the install step). I think we should include this as a separate backend (e.g. …).

Some comments on specific implementation details:
When we build the release, could we embed the commit ID directly into the source code and release that? That way you can just download straight from GitHub (e.g. …). This has two advantages: it reduces security exposure slightly, and it means zero extra work on deployments!
We'd probably want to integrate it into `NativeLibraryConfig` somehow, but we'd need to offer an async API as well to prevent hangs. So you could call something like:
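The snippet this comment alludes to was not preserved; a minimal sketch of the async call pattern being suggested (the method names `WithAutoDownload` and `DownloadAndLoadAsync` are hypothetical, not part of the LLamaSharp API):

```csharp
// Hypothetical usage sketch. The entry point is assumed; details vary by
// LLamaSharp version.
var config = NativeLibraryConfig.Instance;

// Hypothetical: opt in to auto-download - nothing is fetched yet.
config.WithAutoDownload();

// Hypothetical: detect CPU/GPU features, pick the best backend variant,
// download it if not cached, then load it - awaited so the app never hangs
// on the first native call.
await config.DownloadAndLoadAsync(cancellationToken);
```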
I experimented with flattening and it doesn't work :( We can flatten our DLLs, but as soon as one of them depends on another DLL which we don't load directly (e.g. …), loading fails.
This seems like a separate feature to loading DLLs and I think it'd be really cool! There's a lot less inherent risk in downloading model weights instead of DLLs. The proposed API:

```csharp
public static LLamaWeights LoadFromFile(IModelParams @params);
public static async Task<LLamaWeights> LoadFromHuggingFace(string name, IModelParams @params);
```
I concur with @martindevans: downloading on the fly is a security risk, and in most enterprise environments downloading back-ends for LLamaSharp like this will be blocked outright. So the alternative of manually associating a back-end should still be an option.
I think it could be a good fit for distributing an application to different desktop environments, but it doesn't fit server environments, where you will be using containers to deploy the code. From the security side, and after the XZ backdoor, I don't see this as a step towards being more secure.
My idea is a bit different from that: I prefer to add the auto-downloading as a configuration option in the main package, for example, …
I agree with that. We could add a static variable to LLamaSharp to mark the default commit_id for the current version.
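A minimal sketch of what such a static marker might look like (the class name, field name, and value below are placeholders, not existing LLamaSharp API):

```csharp
// Hypothetical sketch: a constant stamped by CI at release time, recording
// which llama.cpp commit this LLamaSharp release was built against, so the
// downloader knows exactly which artifacts to fetch.
public static class LLamaSharpVersionInfo
{
    // Placeholder value - CI would substitute the real llama.cpp commit id.
    public const string DefaultLlamaCppCommit = "0000000";
}
```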
I'm not sure if this could be done without manually clicking the …

The security issue is absolutely one of the most important things. I don't mean to drop the existing backends, but only to provide one more option for users, especially those who are new to programming/LLMs and want to distribute desktop apps. It could be disabled by default, leaving it up to the user whether to enable it. :)
I am sorry, but I do not think that this is the right solution for several reasons:
What I am doing now takes 10 minutes to make the backend. I use the CMake GUI to generate a VS solution and compile llama.dll and llava_shared.dll, and I use these directly in the C# code. But even this can be automated with the CMake script I have started in …
It could do, if we wanted to (just add a …).
It can provide more setups if done right. Our current backend packages can package anything of course, but because they're packaged together we do have to make a decision on what's worth including (e.g. CUDA with every AVX variant would massively bloat things). With auto download we could just have every variant precompiled and sitting in the repo, ready for auto download. Less bloat, more hardware support!

One proposal we've discussed before is to provide every individual backend as a different nuget package with exactly one binary in it (e.g. …).
Agreed, definitely a problem with providing native dependencies. I've moved all the build work over to GH actions, so it's fairly public/auditable, but still not perfect.
This doesn't solve half of the problem. There are two types of applications, with very different needs from backends.

Server apps are compiled and deployed to a specific platform and in practice only need one specific backend. CMake works for that - you just compile the DLL you need. That's fully supported at the moment, since you can install no backend package and just drop the DLL in the appropriate place. I do think your work with cmake/submodules to make this use case easier is valuable!

However, applications deployed to an end-user (e.g. a game using LLamaSharp) cannot just ship one single backend. You absolutely need feature detection to select the best backend for whatever hardware the user has. Obviously you have no idea what that might be when shipping the app, and if feature detection wasn't built into LLamaSharp everyone would need to build it themselves. The current backend situation isn't ideal for the app use case, since you need to ship a load of DLLs to every user even though they only need one - you just don't know which one. This is where auto downloading would be great.
About the security problem, I think we could allow the user to specify a Huggingface repo, instead of using the official one provided by us. If we take one step further, we could let the user decide the full downloading behavior, for example by providing APIs like the ones below.

```csharp
NativeLibraryConfig.SetDownloadHandler(IDownloadHandler handler);

public interface IDownloadHandler
{
    // We pass the best configuration we detected, but the user could certainly ignore it.
    // The `Path` in `recommendedConfiguration` is a url pointing to a file in our official repo.
    // The returned value should contain the real path to the local file, and the selected library type.
    NativeLibraryInfo Download(NativeLibraryInfo recommendedConfiguration);
}

// Path: the path of the library file.
// IsCuda: whether it's compiled with CUDA.
// AvxLevel: which AVX level the library uses.
// CommitHash: optional, the commit id of the llama.cpp repo it was compiled from.
public record class NativeLibraryInfo(string Path, bool IsCuda, AvxLevel AvxLevel, string? CommitHash = null);
```

Though it's still not perfect, it could help reduce the security risks.
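To make the hook concrete, here is a sketch of a custom handler built on the proposed interface. The self-hosted mirror and pinned-checksum logic are illustrative assumptions, not part of the proposal:

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a handler that redirects the recommended download to a
// self-hosted mirror and verifies a pinned SHA-256 checksum before accepting
// the file - one way to blunt the "executing downloaded DLLs" risk.
public sealed class MirroredDownloadHandler : IDownloadHandler
{
    private readonly string _mirrorBaseUrl;                             // e.g. an internal artifact server
    private readonly IReadOnlyDictionary<string, string> _knownHashes;  // file name -> expected SHA-256 (hex)

    public MirroredDownloadHandler(string mirrorBaseUrl, IReadOnlyDictionary<string, string> knownHashes)
        => (_mirrorBaseUrl, _knownHashes) = (mirrorBaseUrl, knownHashes);

    public NativeLibraryInfo Download(NativeLibraryInfo recommended)
    {
        // Keep the recommended file name, but fetch it from our own mirror.
        var fileName  = System.IO.Path.GetFileName(new Uri(recommended.Path).LocalPath);
        var localPath = System.IO.Path.Combine(System.IO.Path.GetTempPath(), fileName);

        using var http = new System.Net.Http.HttpClient();
        var bytes = http.GetByteArrayAsync($"{_mirrorBaseUrl}/{fileName}").GetAwaiter().GetResult();

        // Refuse files whose hash doesn't match the pinned value.
        var hash = Convert.ToHexString(System.Security.Cryptography.SHA256.HashData(bytes));
        if (!_knownHashes.TryGetValue(fileName, out var expected) ||
            !hash.Equals(expected, StringComparison.OrdinalIgnoreCase))
            throw new InvalidOperationException($"Checksum mismatch for {fileName}");

        System.IO.File.WriteAllBytes(localPath, bytes);
        return recommended with { Path = localPath };
    }
}
```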
My personal opinion is that you are making this much more complicated than necessary. The game developer you mentioned above would just run CMake with 2-3 configurations (each takes 5-6 min) and get a different VS solution for each setup. Or you could just make one VS solution with a fine-tuned CMake script where you select the configuration in VS... I do not think that a professional software company would distribute DLLs from someone else if they can just compile them easily...
I've made a prototype of the library I mentioned above -- HuggingfaceHub. It can download files from Huggingface now, though it needs more testing. At the least, it proves that it's absolutely possible to implement this proposal, and possibly to support automatically downloading models in the future.
@zsogitbe I agree that a professional software company won't use the auto-downloading in their distributions. However, it's just as important to provide convenience for developers/users who are less experienced. As you can see, most of the issues and PRs are opened by individuals, rather than by employees of companies that use LLamaSharp. I started programming in 2016 with VB.NET but never used CMake until 2020. I believe there are many .NET developers with no C++/CMake experience, so it's necessary to provide a way for them to use LLamaSharp as easily as possible.

Your work on experimental_cpp, if I'm not misunderstanding it, mainly solves the problem that the user can only compile llama.cpp from the CLI instead of a GUI. Would you like to add some docs about how to use it with the GUI (maybe some screenshots?) so that we could merge it into the branch and let users know about this option?
But which 2-3 configurations would they choose (there are a lot more than 2-3 possible configurations)? Once they had those 3 DLLs, how would the software select which one to use on the end-user machine? This is back to the problem we're trying to solve by distributing all the builds and using auto selection, except now the developer has to implement it all instead of us! CMake and self-built DLLs are great for server software, but orthogonal to the other issue of selecting which backend to use for applications.
Agreed, but they will probably still need auto selection (unless they rebuild it themselves as part of the installer), and could maybe even use auto downloading if we allow it to be configured with self-hosted URLs.
Maybe we need to think more about it to find the best solution. The CMake GUI (Graphical User Interface) provides an interactive way to configure CMake projects: you define some parameters (CUDA, AVX2, ...) and it generates the VS solution automatically. After this you add the C++ projects you need to the C# projects (this is how I use the library - clean and easy to understand). But I do not know how to generate multiplatform DLLs on Windows (i.e. for Mac and Linux). My initial work on experimental_cpp attempts to do the above automatically (with manually changing some parameters in CMakeLists.txt to choose CUDA, AVX2, etc.).
Added a PR with work-in-progress code for the automatic solution generator: 9c91fac
While the idea of downloading might work for local boxes, IMO it's a no-no for production use. E.g. downloading remote code at start means having a backdoor open for code injection, plus it's a bootstrap performance killer for lambdas. Even with some caching (which would complicate things), the security concern is pretty big. Packages would need a strong signature with a trusted cert (and who would decide what to trust?), and the client downloading assemblies would have to verify those signatures. Anyway, that seems like a lot of additional complexity and trust :-) Without going too far off topic: .NET has a pretty robust dependency injection framework - why not detect the platform hardware, and then simply inject the right backend, without overriding assembly names?
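A rough sketch of the DI approach being suggested, using Microsoft.Extensions.DependencyInjection. The `IBackend` abstraction, the concrete backends, and the detection helper are hypothetical, not existing LLamaSharp types:

```csharp
using Microsoft.Extensions.DependencyInjection;

// Hypothetical abstraction over a native backend.
public interface IBackend { string LibraryPath { get; } }
public sealed class CudaBackend : IBackend { public string LibraryPath => "runtimes/cuda12/llama.dll"; }
public sealed class Avx2Backend : IBackend { public string LibraryPath => "runtimes/avx2/llama.dll"; }

public static class BackendRegistration
{
    // Detect the platform hardware once at startup and register the matching
    // backend; consumers just take a dependency on IBackend - no assembly
    // name overriding, no runtime download.
    public static IServiceCollection AddLlamaBackend(this IServiceCollection services)
    {
        if (CudaIsAvailable())
            services.AddSingleton<IBackend, CudaBackend>();
        else
            services.AddSingleton<IBackend, Avx2Backend>();
        return services;
    }

    // Placeholder: a real implementation would probe for the CUDA runtime.
    private static bool CudaIsAvailable() => false;
}
```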
@dluc Would it sound reasonable if we leave it up to the user to decide whether or not to use this feature? As you can see here, developers who use LLamaSharp can already insert a downloading process before the library loading; what I made in that PR is an attempt to add an official implementation of this feature. I have a question here: many applications have a plugin system, and when the user downloads a plugin, they are actually downloading remote code. What's the difference between plugin downloading and this proposal? Is plugin downloading safer for some reason?
We have supported selecting the right native library according to the system information (code, doc). However, it's triggered via setting the …
Introduction
LLamaSharp uses llama.cpp as a backend and has introduced dynamic native library loading, which allows us to choose which DLL to load at runtime. However, users still need to install the backend packages unless they have exactly one DLL to use. The problem is that, most of the time, a user only needs one DLL - for example, the CUDA11 one. However, many DLLs have to be included, especially if we support CUDA with AVX in the future.
Dividing the backends into packages with a single file each, as previously discussed in other issues, appears to be a solution. However, if the user has to choose a specific backend, what is the purpose of our backend selection strategy? Furthermore, this approach may lead to an excessive number of backend packages, causing potential difficulties.
Is it possible to select the native library based on the configuration and system information, download only the selected one, and avoid having too many backend packages? That is the point of this proposal.
Brief Description
My idea is to put all the native library files on HuggingFace, then download the selected one according to the configuration and system information at runtime. That's all!
APIs
The following APIs will be exposed for users to get this feature.
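The API listing from the original post was not preserved here; a hypothetical sketch of what the exposed surface might look like (all names below are illustrative, not the actual proposal):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shape of the auto-download settings surface.
public static class NativeLibraryDownloadSettings
{
    // Turn the feature on; disabled by default for security reasons.
    public static bool AutoDownloadEnabled { get; set; } = false;

    // Where downloaded libraries are cached; defaults to ~/.llama-sharp.
    public static string CacheDir { get; set; } =
        System.IO.Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
            ".llama-sharp");

    // Optionally pre-fetch the best-matching library at startup, so the
    // first llama.cpp call doesn't block on a network fetch.
    public static Task EnsureNativeLibraryAsync(CancellationToken token = default)
        => Task.CompletedTask;   // sketch only
}
```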
p.s. To be honest, I don't think it's good to put the methods for downloading in `NativeLibraryConfig`, but I haven't come up with a better idea yet.

Behaviors
Priorities
The most important thing is: what is the behavior when this feature is used with a backend installed?
My answer would be that we'll follow the priorities below.

1. If the user has specified a library with `WithLibrary`, just load it.

Directory structure
We will cache the files in a default directory (maybe `~/.llama-sharp`) or one specified by the user. In this directory, we will make subdirectories named by version, in which the downloaded files live.

In this way, there are two possible directory structures, listed below.
- the first one: flatten all the files
- the second one: keep the current structure
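To illustrate the two options (the version number, file names, and nested layout below are hypothetical, for illustration only):

```
# Option 1: flatten all the files under each version directory
~/.llama-sharp/
  v0.12.0/
    llama-cuda12.dll
    llama-avx2.dll

# Option 2: keep the current runtimes/ structure under each version directory
~/.llama-sharp/
  v0.12.0/
    runtimes/
      win-x64/native/cuda12/llama.dll
      win-x64/native/avx2/llama.dll
```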
I'm open to this and will leave the decision till the final time, depending on discussions about it.
How to implement
Downloading files from Huggingface
It would not be implemented in LLamaSharp itself. I'll create a repo named `HuggingfaceHub`, and I'm already working on it. I'm pretty sure the downloading can be implemented without too many difficulties. As evidence, llama.cpp already has an example function to download model files from Huggingface. In this proposal the downloading will be more complex, because we are making a library API rather than an example, but I think I can handle it.

After the completion of this library, we could depend on it in LLamaSharp to download files. The reason I won't put it in LLamaSharp itself is that users may want a newer version of `HuggingfaceHub` but with an old version of LLamaSharp.

Pushing files to Huggingface
I'll do this in our CI. We only need to push files when we are going to publish a new release. I'll add a secret key to the GitHub Actions secrets and use huggingface-cli to push the files.
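A sketch of what that CI step might look like as a GitHub Actions fragment. The repo id, secret name, and local path are placeholders; `huggingface-cli upload` is the real command shipped with the huggingface_hub package:

```yaml
# Hypothetical release-workflow step: push the compiled backends to HuggingFace.
# "SciSharp/llamasharp-backends", HF_TOKEN, and ./runtimes are placeholders.
- name: Upload native libraries to HuggingFace
  env:
    HF_TOKEN: ${{ secrets.HF_TOKEN }}   # huggingface-cli reads this env var
  run: |
    pip install -U huggingface_hub
    huggingface-cli upload SciSharp/llamasharp-backends ./runtimes runtimes \
      --commit-message "backends for ${{ github.ref_name }}"
```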
Advantages
I believe this feature will bring the following advantages:
e.g. `new LLamaWeights("Facebook/LLaMA", "llama2.gguf")`.
.Potential risks
I would appreciate any suggestions on this proposal!