reorganize documentation #688

Open
bwanglzu opened this issue Mar 12, 2023 · 20 comments

@bwanglzu
Member

bwanglzu commented Mar 12, 2023

As pointed out by @guenthermi, the documentation needs a lot of improvement, for example the motivation, data preparation, tabs, and README. We'll list the items in this issue and work through them over time.

This is also linked to the requirements from the design team (@CatStark).

@guenthermi
Member

guenthermi commented Mar 13, 2023

Here is a list of the things we wanted to change:
Problem: For many non-ML engineers it is not really clear how Finetuner helps to improve search, how it works, and what they need to do with it to improve their search system.
Proposed Changes: The README and the first documentation page should include a short explanation, based on a picture, that describes the major steps of using Finetuner, i.e., (1) prepare (and label) your data, (2) submit a fine-tuning job to the cloud, (3) integrate the model into your neural search pipeline.

Problem: The documentation is perceived as very technical, and it is hard to get started.
Proposed Changes: We should create a getting started page (maybe re-using one of the notebooks) that covers the main points of the walkthrough on one page. The notebooks are already good, but they are missing some essential parts of a good getting started page, for example data preparation and hosting.

Problem: The data preparation section only explains the format. For someone new to metric learning and search, it is hard to understand what kind of data needs to be labeled, how to label data, and why different models need data in different formats.
Proposed Changes: Add explanations for this to the data preparation section.


Tickets:

https://www.notion.so/Outline-for-Finetuner-doc-ae9a657e0b854359b767fe7c26cd9ee7

@scott-martens
Member

I would propose dividing this into three (maybe more) tickets, because this is a big job.

  • The general education section (What is fine tuning? How does it work? Why?) is something I could start on right away, with an engineering review; or we could do it the other way around, have engineers write it and Team Tech Content can review. I am open to either.
  • A Getting Started/Quick Start page is also a good idea. I would propose one or more engineers start on that with Tech Content involvement and review.
  • I have a more general problem with some of our notebooks, that I want to discuss. A lot of times, you follow the instructions and they just don't work. Or there is such a long processing or setup time that they're impossible to follow.
  • There should be a more general rewrite of the Finetuner docs to increase readability on several fronts. It's been a backlog ticket in Tech Content for a long time. We should perhaps prioritize it, and break it up into smaller tickets. Tech Content should probably lead on this, with engineering support.

@bwanglzu
Member Author

The general education section (What is fine tuning? How does it work? Why?) is something I could start on right away, with an engineering review; or we could do it the other way around, have engineers write it and Team Tech Content can review. I am open to either.

If you can write something and we review it, that would be nice. I think it would be good for "someone out of the loop" to write it: the engineering team already understands the software well and may not notice when the writer and the reader are not on the same page.

@scott-martens
Member

I will link a ticket from the tech content board.

@bwanglzu
Member Author

A Getting Started/Quick Start page is also a good idea. I would propose one or more engineers start on that with Tech Content involvement and review.

What do you think about our current walkthrough? Or would this quick start be a separate section?

@bwanglzu
Member Author

I have a more general problem with some of our notebooks, that I want to discuss. A lot of times, you follow the instructions and they just don't work. Or there is such a long processing or setup time that they're impossible to follow.

In general, the notebooks in our documentation are well tested. But indeed, since fine-tuning can be time-consuming, the fit might take a while to finish. Notebooks outside the Finetuner documentation are not guaranteed to be runnable.

@bwanglzu
Member Author

fyi @CatStark

@scott-martens
Member

What do you think about our current walkthrough? Or would this quick start be a separate section?

I think the walkthrough is too fast, and I might offer a fully working example with some data to fine-tune with. I would warn at each stage about what might go wrong. Like: Did you log in to Jina Cloud? Or: This may take some time, depending on cloud load. Or even: Make sure your current Python environment is the one where you actually installed exactly the things we told you to install.

It's not idiot proof. I know, because I'm an idiot. :)

I've been going through other documentation (the main Jina docs actually) and doing the things it shows on the first pages, like a new user. My failure rate has been very, very high.
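
As a rough illustration of the "is this the right Python environment?" warning suggested above, a getting-started page could open with a small preflight check. This is only a sketch using the Python standard library; the helper name `check_environment` is hypothetical and is not part of Finetuner, and it only checks whether a top-level package can be found by the running interpreter.

```python
import importlib.util
import sys
from typing import Dict, Optional


def check_environment(package: str = "finetuner") -> Dict[str, Optional[object]]:
    """Report which interpreter is running and whether `package` is importable from it.

    Hypothetical helper for illustration; `package` must be a top-level package name.
    """
    # find_spec returns None when the top-level package cannot be found
    spec = importlib.util.find_spec(package)
    return {
        "python_executable": sys.executable,
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "package": package,
        "package_installed": spec is not None,
        "package_location": spec.origin if spec is not None else None,
    }


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
if not report["package_installed"]:
    print("The package was not found in this environment; check which Python you are running.")
```

Printing the interpreter path next to the install location makes it easier to spot the case where the package was installed into a different environment than the one running the notebook.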

@scott-martens
Member

I will link a ticket from the tech content board.

https://github.com/jina-ai/team-tech-content/issues/77

@guenthermi
Member

What do you think about our current walkthrough? Or would this quick start be a separate section?

The walkthrough is not a "Getting Started" page, since it tries to cover all cases. The "Getting Started" page should cover only one very specific example, but in much more detail and in an idiot-proof way, as Scott said.

@guenthermi
Member

The general education section (What is fine tuning? How does it work? Why?) is something I could start on right away, with an engineering review;

Sounds good. As I wrote, it would be nice to have a very simple flow chart for this that shows something like the 3 steps I mentioned, which are then explained along with the example in the getting started section. We could later send this to the design team to make it more beautiful.

@bwanglzu
Member Author

In addition, we need to add a JAC page to the JAC documentation.

@bwanglzu
Member Author

bwanglzu commented Mar 20, 2023

We need to add documentation for:

  1. LLRD
  2. CosineSimilarityLoss
  3. The new way of constructing a DA from CSV for CosineSimilarityLoss

LLRD:

LLRD (layer-wise learning rate decay) assigns a different learning rate to each layer of the model backbone. It sets a large learning rate for the top layer and uses a multiplicative decay rate to decrease the learning rate layer by layer from top to bottom. With a large learning rate, the features of the top layers change more and can adapt to new tasks. In contrast, the bottom layers have a small learning rate, so the strong features learned during pre-training are preserved.
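
The decay schedule described here can be sketched in a few lines of plain Python. This is a minimal illustration, not Finetuner's implementation: the function name `layerwise_learning_rates` and the parameters `top_lr` and `decay` are assumptions chosen for the example, and the layers are simply indexed from bottom (0) to top.

```python
from typing import List


def layerwise_learning_rates(num_layers: int, top_lr: float, decay: float) -> List[float]:
    """Return one learning rate per layer, ordered from the bottom layer (index 0) to the top.

    Hypothetical helper for illustration: the top layer gets `top_lr`, and each
    layer below it is multiplied by `decay` one more time.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in the interval (0, 1]")
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


# Example: 4 layers, top-layer learning rate 1e-4, each lower layer halved.
rates = layerwise_learning_rates(num_layers=4, top_lr=1e-4, decay=0.5)
for depth, lr in enumerate(rates):
    print(f"layer {depth}: lr={lr:.2e}")
```

In a real training setup, each of these values would typically be attached to the parameters of the corresponding backbone layer, for example through per-group learning rates in the optimizer.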

@CatStark
Member

We are looking for a new dataset

@guenthermi
Member

guenthermi commented Apr 19, 2023

Results from the Getting Started guide:
pretrained: [image: download (2)]
finetuned: [image: download (3)]
pretrained: [image: download (1)]
finetuned: [image: results]

@guenthermi
Member

Getting started image:
[image: download (4)]

@guenthermi
Member

JAC images:
[image: download (6)]
[image: download (5)]
[image: download (7)]

@guenthermi
Member

[image: download (8)]
