
Pretrained models config files #45

RiccardoLincetto opened this issue May 6, 2024 · 10 comments

@RiccardoLincetto

Hi, could you provide the configuration files that you used to train the models made available on Hugging Face? I noticed that the one in the repo refers to the small model, but I would like to try fine-tuning the base and large models.

@ZexinHe
Collaborator

ZexinHe commented May 6, 2024

Hi,

You can simply change the model configs and dataset configs based on the differences described in the model_card.md.
Here's an example.
[screenshot of example config changes]
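
For a concrete starting point, here is a small sketch (not from the maintainers) that pulls the published config.json for each checkpoint size and prints the fields that differ; the repo ids are the ones mentioned in this thread, and any other sizes are assumed to follow the same naming pattern. The differing fields are the values to adjust in the model: section of the training YAML.

import transformers

repos = [
    "zxhezexin/openlrm-obj-small-1.1",
    "zxhezexin/openlrm-obj-base-1.1",
    "zxhezexin/openlrm-mix-large-1.1",
]

# Fetch the published config.json of each checkpoint.
configs = {r: transformers.PretrainedConfig.from_pretrained(r).to_dict() for r in repos}

# Print only the fields that differ between sizes.
keys = sorted(set().union(*configs.values()))
for key in keys:
    values = [configs[r].get(key) for r in repos]
    if len(set(map(str, values))) > 1:
        print(key, {r.split("/")[-1]: v for r, v in zip(repos, values)})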

@da2r-20

da2r-20 commented May 23, 2024

Hi, could you provide the configuration files that you used to train the models made available on Hugging Face? [...]

Every published model contains a config.json file with this info.
See here for example: https://huggingface.co/zxhezexin/openlrm-obj-small-1.1/tree/main

You can also fetch this configuration with the following code:

import transformers

model_config = transformers.PretrainedConfig.from_pretrained("zxhezexin/openlrm-obj-base-1.1")
print(model_config)

which prints:

PretrainedConfig {
  "camera_embed_dim": 1024,
  "encoder_feat_dim": 768,
  "encoder_freeze": false,
  "encoder_model_name": "dinov2_vitb14_reg",
  "encoder_type": "dinov2",
  "rendering_samples_per_ray": 96,
  "transformer_dim": 768,
  "transformer_heads": 12,
  "transformer_layers": 12,
  "transformers_version": "4.28.1",
  "triplane_dim": 48,
  "triplane_high_res": 64,
  "triplane_low_res": 32
}
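
If you want to drop these values straight into a training config, here is a hedged sketch (assuming PyYAML is installed, and that the model: keys in the training YAML match the config.json fields one-to-one, as they do for the fields shown above):

import transformers
import yaml  # PyYAML, assumed to be installed

repo = "zxhezexin/openlrm-obj-base-1.1"
hf_cfg = transformers.PretrainedConfig.from_pretrained(repo).to_dict()

# Keep only the fields that appear under model: in the training YAML;
# bookkeeping entries such as transformers_version are dropped.
model_keys = [
    "camera_embed_dim", "rendering_samples_per_ray", "transformer_dim",
    "transformer_layers", "transformer_heads", "triplane_low_res",
    "triplane_high_res", "triplane_dim", "encoder_type",
    "encoder_model_name", "encoder_feat_dim", "encoder_freeze",
]
model_section = {"model": {k: hf_cfg[k] for k in model_keys if k in hf_cfg}}
print(yaml.safe_dump(model_section, sort_keys=False))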

Personally, for pretraining I've changed the code to load the pretrained model directly:

from openlrm.utils.hf_hub import wrap_model_hub

class LRMTrainer(Trainer):

    ...

    def _build_model(self, cfg):
        assert (
            cfg.experiment.type == "lrm"
        ), f"Config type {cfg.experiment.type} does not match with runner {self.__class__.__name__}"
        from openlrm.models import ModelLRM

        model_class = wrap_model_hub(ModelLRM)
        model = model_class.from_pretrained(cfg.experiment.pretrained)
        return model

You can replace cfg.experiment.pretrained with "zxhezexin/openlrm-obj-base-1.1", or add a pretrained key to your config; a sketch of that fallback is shown below.
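
For example, a minimal sketch of that fallback, assuming cfg is an OmegaConf DictConfig as elsewhere in OpenLRM (the default repo id is only an illustration):

from openlrm.models import ModelLRM
from openlrm.utils.hf_hub import wrap_model_hub


def build_pretrained_model(cfg, default_repo="zxhezexin/openlrm-obj-base-1.1"):
    # Prefer an explicit experiment.pretrained entry; fall back to a Hugging Face
    # repo id otherwise. Assumes cfg is an OmegaConf DictConfig, so .get() works.
    repo_or_path = cfg.experiment.get("pretrained", default_repo)
    model_class = wrap_model_hub(ModelLRM)
    return model_class.from_pretrained(repo_or_path)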

@ZexinHe
Collaborator

ZexinHe commented May 29, 2024

@da2r-20 This is amazing! Thanks!

@hayoung-jeremy

Hi @ZexinHe , thank you for your advice.
I'm wondering if I can set the resolution much higher than 336, such as 1008 (since the patch size is 14)?
My goal is to improve the texture quality of the inference results.
I'm fine-tuning openlrm-mix-large-1.1 with 1000 pairs of data,
but the training result is not good.
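
As a rough sanity check on jumping from 336 to 1008 (assuming the DINOv2 encoder splits the source image into 14x14 patches, so the token count grows with the square of the resolution):

patch = 14  # DINOv2 ViT patch size

for res in (336, 1008):
    assert res % patch == 0, f"{res} is not divisible by the patch size"
    side = res // patch
    print(f"source_image_res={res}: {side}x{side} patches = {side * side} tokens")

# 336 -> 24x24 = 576 tokens, 1008 -> 72x72 = 5184 tokens: 9x more encoder tokens,
# and the encoder's self-attention cost grows with the square of the token count,
# so expect a large memory and step-time hit at 1008.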

training data

There are 1000 custom glb files, all processed through blender_script.py properly.
I know the amount of data is not enough, so I'm currently just trying to overfit.

train-sample.yaml

experiment:
    type: lrm
    seed: 42
    parent: lrm-objaverse
    child: small-dummyrun

model:
    camera_embed_dim: 1024
    rendering_samples_per_ray: 128
    transformer_dim: 1024
    transformer_layers: 16
    transformer_heads: 16
    triplane_low_res: 32
    triplane_high_res: 64
    triplane_dim: 80
    encoder_type: dinov2
    encoder_model_name: dinov2_vitb14_reg
    encoder_feat_dim: 768
    encoder_freeze: false

dataset:
    subsets:
        -   name: objaverse
            root_dirs:
                - "/home/ubuntu/training-tokyo/OpenLRM/views"
            meta_path:
                train: "/home/ubuntu/training-tokyo/OpenLRM/train_uids.json"
                val: "/home/ubuntu/training-tokyo/OpenLRM/val_uids.json"
            sample_rate: 1.0
    sample_side_views: 3
    source_image_res: 1008 # higher resolution
    render_image:
        low: 512 # higher resolution
        high: 1008 # higher resolution
        region: 64
    normalize_camera: true
    normed_dist_to_center: auto
    num_train_workers: 4
    num_val_workers: 2
    pin_mem: true

train:
    mixed_precision: bf16  # REPLACE THIS BASED ON GPU TYPE
    find_unused_parameters: false
    loss:
        pixel_weight: 1.0
        perceptual_weight: 1.0
        tv_weight: 5e-4
    optim:
        lr: 4e-4
        weight_decay: 0.05
        beta1: 0.9
        beta2: 0.95
        clip_grad_norm: 1.0
    scheduler:
        type: cosine
        warmup_real_iters: 3000
    batch_size: 3  # reduced it because of the CUDA OOM error
    accum_steps: 1
    epochs: 2000  # modified it for overfitting
    debug_global_steps: null

val:
    batch_size: 2 # modified
    global_step_period: 1000
    debug_batches: null

saver:
    auto_resume: true
    load_model: "/home/ubuntu/training-tokyo/OpenLRM/model.safetensors" # this refers to "zxhezexin/openlrm-mix-large-1.1" 
    checkpoint_root: ./exps/checkpoints
    checkpoint_global_steps: 1000
    checkpoint_keep_level: 5

logger:
    stream_level: WARNING
    log_level: INFO
    log_root: ./exps/logs
    tracker_root: ./exps/trackers
    enable_profiler: false
    trackers:
        - tensorboard
    image_monitor:
        train_global_steps: 100
        samples_per_log: 4

compile:
    suppress_errors: true
    print_specializations: true
    disable: true

training result

[TRAIN STEP]loss=0.21, loss_pixel=0.0265, loss_perceptual=0.184, loss_tv=0.424, lr=3.04e-13: 100%|| 60000/60000 [15:40:28<00:00,  1.06s/it]

As you can see above, the loss value is still too high, and the inference results based on this checkpoint are not good.
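
As a quick check on where that 0.21 comes from (assuming the logged total is the weighted sum of the logged terms, with the weights from the loss: section above):

# Weights from the loss: section of train-sample.yaml above.
pixel_weight, perceptual_weight, tv_weight = 1.0, 1.0, 5e-4

# Values from the final training log line.
loss_pixel, loss_perceptual, loss_tv = 0.0265, 0.184, 0.424

total = pixel_weight * loss_pixel + perceptual_weight * loss_perceptual + tv_weight * loss_tv
print(f"weighted total = {total:.4f}")  # ~0.2107, matching the logged loss=0.21

# The perceptual term alone is ~87% of the total, so the remaining error is mostly
# perceptual (blur/texture) rather than raw pixel error.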

[inference result GIFs: F_AAA_22FW_T002, F_AAB_23SS_O001]

previous training's inference results

[inference result GIFs: F_KOC_22SS_T004, M_AAA_22FW_T004]

I really need to increase the texture resolution.
Could you please give me some advice on that?

@joshkiller

Hi @ZexinHe , thank you for your advice. I'm wondering if I can set the resolution much higher than 336, such as 1008 (since the patch size is 14)? [...]

Hi boss, can I ask a question related to fine-tuning?

@hayoung-jeremy

Hi @joshkiller, I'm very new to AI and not an expert, but I'd be happy to help with anything I can!

@joshkiller

I was wondering whether someone can fine-tune a model without changing its general behavior. For example, I find that a model such as Stable Diffusion sometimes generates images that can't be used for 3D reconstruction. How, and with what kind of data, can we remedy that problem so that the model generates only complete, single objects? I'm doing my master's internship on a text-to-3D pipeline.

@da2r-20

da2r-20 commented May 30, 2024

@joshkiller

I was wondering whether someone can fine-tune a model without changing its general behavior.

Usually what you are describing can be achieved with LoRA and its derivatives (there's a short sketch at the end of this comment),
but I'm not sure OpenLRM can help you here:
OpenLRM does single-image -> 3D reconstruction.

Text -> 3D is a different but related task, and there are other models available for it.

If you want to create data using OpenLRM you could, say, take image-text pairs and get the 3D representation using OpenLRM.
There has been some work in other tasks suggesting that synthetic data pairs generated by a trained model can be beneficial,
but for text -> 3D my opinion is that you need good data to improve.
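
For reference, a minimal sketch of what LoRA fine-tuning looks like in code, using the peft library on a Stable Diffusion UNet; the model id, rank, and target modules here are illustrative assumptions, not a recipe from this repo.

from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

# Load only the UNet of a Stable Diffusion checkpoint; the base weights stay
# frozen and only the small LoRA adapters are trained, which is how
# "fine-tune without changing the general behavior" is usually approached.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"  # illustrative model id
)

lora_config = LoraConfig(
    r=8,                      # adapter rank (illustrative)
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
)
unet = get_peft_model(unet, lora_config)
unet.print_trainable_parameters()  # only a small fraction of the UNet is trainable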

@joshkiller

Usually what you are describing can be achieved with LoRA and its derivatives [...]

Thanks a lot for your answers. I will try to delve into LoRA more than I have so far.

@da2r-20

da2r-20 commented May 30, 2024

@hayoung-jeremy I'm also trying to fine-tune the same model.
Currently it manages to overfit, but with some issues:
it overfits to the shape of the object well,
but the textures get lost and the overall look of the inferred object appears blurry compared to the pretrained model.

@ZexinHe I've also noticed that the original paper uses perceptual_weight=2.0
[screenshot from the paper showing the loss weights]

Training with this weight didn't improve my results, though.
