Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

files: CreateFile adds backwards-compatible io.Reader input, Purpose const #590

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tucnak
Copy link

@tucnak tucnak commented Nov 16, 2023

This PR supersedes #577 coincidentally.

Describe the change
I have re-implemented FileRequest and CreateFile so as to support arbitrary io.Reader inputs which must have priority over FilePath and also fixed some errors in the implementation. I've also took the liberty of adding a Purpose enum and testing for it, as fine-tuning requires a particular file format. Perhaps it's worthwhile getting rid of CreateFormFile semantics altogether at some later stage as it's, frankly, short-sighted and embarrassing. Thankfully, it hasn't touched the API boundary.

Also: I've made some other adjustments in streaming API that have not been included part of this PR. I'm eager to hear feedback on this commit as it's implementing "teeing" of streaming response so as to make proxying so much easier. I reckon a great deal of go-openai use cases will revolve around proxying, in fact that's our primary use-case & if you're supposed to forward the response as well as parse it, then this type of change will be very favourable. My implementation takes advantage of io.Pipe and io.TeeReader and hence the server code is quite straightforward:

// passthrough is re-directing the output of tee buffer
func passthrough(c *fiber.Ctx, r io.Reader) error {
	var ctx = c.Context()
	ctx.SetContentType("text/event-stream")
	ctx.Response.Header.Set("Cache-Control", "no-cache")
	ctx.Response.Header.Set("Connection", "keep-alive")
	ctx.Response.Header.Set("Transfer-Encoding", "chunked")
	ctx.Response.Header.Set("Access-Control-Allow-Origin", "*")
	ctx.Response.Header.Set("Access-Control-Allow-Headers", "Cache-Control")
	ctx.Response.Header.Set("Access-Control-Allow-Credentials", "true")
	ctx.SetBodyStreamWriter(fasthttp.StreamWriter(func(w *bufio.Writer) {
		io.Copy(w, r)
		w.Flush()
	}))
	return nil
}

//v1/completions
func completions(c *fiber.Ctx) error {
	var req openai.CompletionRequest
	api, err := parseRequest(c, &req)
	if err != nil {
		return err
	}
	if req.Stream {
		stream, err := api.CreateCompletionStream(ctx, req)
		if err != nil {
			return err
		}
		tee := stream.Tee()
		go func() {
			defer stream.Close()
			defer tee.Close()
			for chunk, err := stream.Recv(); err != io.EOF; {
				if err != nil {
					api.Errorln(err)
					return
				}
				// process streaming chunk normally...
			}
		}()
		return passthrough(c, tee)
	}
	resp, err := api.CreateCompletion(ctx, req)
	if err != nil {
		return err
	}
	return c.JSON(resp)
}

@tucnak
Copy link
Author

tucnak commented Nov 16, 2023

I've also learnt about #568 which (to my shock and regret) has been merged a few days earlier, unfortunately I had missed the commit in my tree at the time of drafting this PR. @sashabaranov please consider reviewing/re-evaluating the API surface, as it seems the changes are very fresh, and there's time still to nip it in the bud. I mean, having two separate functions for doing what effectively amounts to poor man's io.Reader work— is something you would likely regret soon. I've made a similar mistake when originally designing telebot some 7 years ago, and I wasn't as lucky as you are now; by the time I figured it out, it had already infected the API.

@floodfx
Copy link
Contributor

floodfx commented Nov 16, 2023

Hey @tucnak. Author of #568 here. Agree with your feedback that perhaps more design thinking should go into API design. On that note, why not go further and remove the FileName and FilePath and only allow a io.Reader? Personally not a huge fan of "you must set one or the other". One could go further and separate out files uploads for fine tuning as well which you point out has different format requirements. (Though I'd hesitate to assume having the correct file extension means it is formatted properly.)

Anyway, we can probably agree that these issues could be better resolved in a more thorough design phase facilitated by the contributors. The downside of which is slower iteration of the library. Which is worse? Less than idea API or slower development? My vote is the latter.

@tucnak
Copy link
Author

tucnak commented Nov 17, 2023

@floodfx Thanks for feedback! I'd originally chose to keep FileName and FilePath for compatibility so that my change doesn't break any existing code. I'm not the maintainer, and don't otherwise care for the API design, for the most part I'm simply trying to protect my fork from upstream, and it seems like integrating changes like this would likely make our lives easier along the road...

@floodfx
Copy link
Contributor

floodfx commented Nov 17, 2023

Oh yeah, I get why you kept the existing properties. I added a new func so I wouldn't break existing code. Both valid solutions. :) My larger point is the maintainers could have a bit of a heavier hand or at least a perspective on generally the approach they want folks like us to take and then we could all follow that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants