reorganize documentation #688

bwanglzu · 2023-03-12T12:00:46Z

As pointed out by @guenthermi, a lot of improvements need to be made to the documentation page, such as motivation, data preparation, tabs, and README. We'll list them in this issue and improve them over time.

This is also linked to a requirement from the design team (@CatStark).

guenthermi · 2023-03-13T12:30:40Z

Here is a list of the things we wanted to change:
Problem: For many non-ML engineers, it is not clear how Finetuner helps improve search, how it works, and what they need to do with it to improve their search system.
Proposed Changes: We should include in the README and on the first documentation page a short explanation, based on a picture, that describes the major steps of using Finetuner, i.e., (1) prepare (and label) your data, (2) submit a finetuning job to the cloud, (3) integrate the model into your neural search pipeline.

Problem: The documentation is perceived as very technical, and it is hard to get started.
Proposed Changes: We should create a getting started page (maybe re-use one of the notebooks) which should cover the main points from the walkthrough on one page. The notebooks are already good, but they are missing some essential parts for a good getting started page. For example, the data preparation and the hosting are missing.

Problem: The data preparation section only explains the format. For someone new to metric learning and search, it is hard to understand what kind of data needs to be labeled, how to label data, and why different models need data in different formats.
Proposed Changes: Add explanations for this to the data preparation section.

Tickets:

https://www.notion.so/Outline-for-Finetuner-doc-ae9a657e0b854359b767fe7c26cd9ee7

scott-martens · 2023-03-13T12:41:33Z

I would propose dividing this into three (maybe more) tickets, because this is a big job.

The general education section (What is fine tuning? How does it work? Why?) is something I could start on right away, with an engineering review; or we could do it the other way around, have engineers write it and Team Tech Content can review. I am open to either.
A Getting Started/Quick Start page is also a good idea. I would propose one or more engineers start on that with Tech Content involvement and review.
I have a more general problem with some of our notebooks, that I want to discuss. A lot of times, you follow the instructions and they just don't work. Or there is such a long processing or setup time that they're impossible to follow.
There should be a more general rewrite of the Finetuner docs to increase readability on several fronts. It's been a backlog ticket in Tech Content for a long time. We should perhaps prioritize it, and break it up into smaller tickets. Tech Content should probably lead on this, with engineering support.

bwanglzu · 2023-03-13T14:25:33Z

> The general education section (What is fine tuning? How does it work? Why?) is something I could start on right away, with an engineering review; or we could do it the other way around, have engineers write it and Team Tech Content can review. I am open to either.

If you can write something and we review it, that would be great. I think it would be good for someone "out of the loop" to write it, since the engineering team already understands the software well and may not notice when the writer and the reader are not on the same page.

scott-martens · 2023-03-13T14:26:21Z

I will link a ticket from the tech content board.

bwanglzu · 2023-03-13T14:26:23Z

> A Getting Started/Quick Start page is also a good idea. I would propose one or more engineers start on that with Tech Content involvement and review.

What do you think about our current walkthrough? Or would this quick start be a separate section?

bwanglzu · 2023-03-13T14:27:53Z

> I have a more general problem with some of our notebooks, that I want to discuss. A lot of times, you follow the instructions and they just don't work. Or there is such a long processing or setup time that they're impossible to follow.

In general, the notebooks in our documentation are well tested. But indeed, since fine-tuning can be time-consuming by nature, the fit might take a while to finish. Notebooks outside the Finetuner documentation are not guaranteed to be runnable.

bwanglzu · 2023-03-13T14:28:39Z

fyi @CatStark

scott-martens · 2023-03-13T14:32:56Z

> What do you think about our current walkthrough? Or would this quick start be a separate section?

I think the walkthrough is too fast, and I might offer a fully working example with some data to fine-tune with. I would warn at each stage of what might go wrong. Like: Did you log in to Jina Cloud? Or: This may take some time, depending on cloud load. Or even: Make sure your current Python environment is the one where you actually installed exactly the things we told you to install.

It's not idiot proof. I know, because I'm an idiot. :)

I've been going through other documentation (the main Jina docs actually) and doing the things it shows on the first pages, like a new user. My failure rate has been very, very high.
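
One of the checks suggested above, confirming that the active Python environment is the one where the package was actually installed, could look roughly like the sketch below. It uses only the standard library. The helper names `installed_version` and `preflight_report` are made up for illustration and are not part of Finetuner; the default package name `finetuner` is an assumption about what a getting started page would ask the reader to install.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the version of `package` visible to this interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def preflight_report(package: str = "finetuner") -> str:
    """Describe which interpreter is running and whether `package` is installed in it."""
    version = installed_version(package)
    lines = [f"Python interpreter: {sys.executable}"]
    if version is None:
        # Hypothetical wording; a real guide would point to its own install step.
        lines.append(f"{package} is NOT installed in this environment.")
    else:
        lines.append(f"{package} {version} is installed.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(preflight_report())
```

Running this in the first cell of a notebook would show the reader which interpreter is active before any cloud call is made.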

scott-martens · 2023-03-13T14:39:00Z

> I will link a ticket from the tech content board.

https://github.com/jina-ai/team-tech-content/issues/77

guenthermi · 2023-03-13T15:35:27Z

> What do you think about our current walkthrough? Or would this quick start be a separate section?

The walkthrough is not a "Getting Started" page, since it tries to cover all cases. The "Getting Started" page should cover only one very specific example, but in much more detail and idiot-proof, as Scott said.

guenthermi · 2023-03-13T15:39:58Z

> The general education section (What is fine tuning? How does it work? Why?) is something I could start on right away, with an engineering review;

Sounds good. As I wrote, it would be nice to have a very simple flow chart for this, showing something like the 3 steps I mentioned, which are then explained along with the example in the getting started section. We could later send this to the design team to make it more beautiful.

bwanglzu · 2023-03-15T16:05:15Z

One more thing: we need to add a JAC page to the JAC documentation.

bwanglzu · 2023-03-20T14:23:14Z

We need to add documentation for:

LLRD
CosineSimilarityLoss
the new way of constructing a DA from CSV for CosineSimilarityLoss

LLRD:

LLRD (layer-wise learning rate decay) assigns a different learning rate to each layer of the model backbone. It sets a large learning rate for the top layer and uses a multiplicative decay rate to decrease the learning rate layer by layer from top to bottom. With a large learning rate, the features of the top layers change more and can adapt to new tasks. In contrast, the bottom layers have a small learning rate, so the strong features learned during pre-training are preserved.
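
The decay described above can be illustrated with a small sketch. This is not Finetuner's implementation; the function name `layerwise_learning_rates` and its parameters are hypothetical, and the sketch assumes the layers are simply numbered from bottom (index 0) to top (last index).

```python
from typing import List


def layerwise_learning_rates(num_layers: int, top_lr: float, decay: float) -> List[float]:
    """Return one learning rate per layer, ordered from bottom (index 0) to top.

    The top layer gets `top_lr`; each layer below it gets the rate of the
    layer above multiplied by `decay`.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if not 0 < decay <= 1:
        raise ValueError("decay must be in the interval (0, 1]")
    # A layer that sits k steps below the top gets top_lr * decay**k.
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


if __name__ == "__main__":
    for index, lr in enumerate(layerwise_learning_rates(4, top_lr=1e-3, decay=0.5)):
        print(f"layer {index}: lr={lr:.6f}")
```

In a training framework, each of these values would typically be attached to the parameters of the corresponding layer, for example as a separate optimizer parameter group.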

scott-martens · 2023-03-21T15:43:42Z

Outline of docs from meeting w/ @bwanglzu @guenthermi @LMMilliken : https://www.notion.so/Outline-for-Finetuner-doc-ae9a657e0b854359b767fe7c26cd9ee7?pvs=4

guenthermi · 2023-04-05T07:41:09Z

First draft of the getting started: https://colab.research.google.com/drive/1DSvA9x4xi6GL7ulcUjgIYlgvUdQJqrJG?usp=sharing

CatStark · 2023-04-17T07:47:15Z

We are looking for a new dataset.

guenthermi · 2023-04-19T13:39:14Z

Results from Getting Started Guide (images not preserved): two pairs of pretrained vs. finetuned results.

guenthermi · 2023-04-19T13:58:15Z

getting started image:

guenthermi · 2023-04-19T14:02:40Z

JAC images:

guenthermi · 2023-04-20T12:08:51Z

bwanglzu mentioned this issue Mar 20, 2023: feat: add score document support in csv #696 (Merged)

CatStark assigned guenthermi Mar 21, 2023

scott-martens self-assigned this Mar 21, 2023

bwanglzu mentioned this issue Mar 24, 2023: Finetuner Docs Massive Edit #699 (Closed)
