
Pretrained models config files #45

RiccardoLincetto opened this issue May 6, 2024 · 10 comments

@RiccardoLincetto

Hi, could you provide the configuration files that you used to train the models made available on Hugging Face? I noticed that the one in the repo refers to the small model, but I would like to try fine-tuning the base and large models.

@ZexinHe
Collaborator

ZexinHe commented May 6, 2024

Hi,

You can simply change the model configs and dataset configs based on the differences described in the model_card.md.
Here's an example.
[screenshot of example config changes]
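
For a concrete starting point, here is a small sketch (not from the maintainers) that pulls the published config.json for each checkpoint size and prints the fields that differ; the repo ids are the ones mentioned in this thread, and any other sizes are assumed to follow the same naming pattern. The differing fields are the values to adjust in the model: section of the training YAML.

import transformers

repos = [
    "zxhezexin/openlrm-obj-small-1.1",
    "zxhezexin/openlrm-obj-base-1.1",
    "zxhezexin/openlrm-mix-large-1.1",
]

# Fetch the published config.json of each checkpoint.
configs = {r: transformers.PretrainedConfig.from_pretrained(r).to_dict() for r in repos}

# Print only the fields that differ between sizes.
keys = sorted(set().union(*configs.values()))
for key in keys:
    values = [configs[r].get(key) for r in repos]
    if len(set(map(str, values))) > 1:
        print(key, {r.split("/")[-1]: v for r, v in zip(repos, values)})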

@da2r-20

da2r-20 commented May 23, 2024

Hi, could you provide the configuration files that you used to train the models made available on Hugging Face? [...]

Every published model contains a config.json file with this info.
See here for example: https://huggingface.co/zxhezexin/openlrm-obj-small-1.1/tree/main

You can also fetch this configuration with the following code:

import transformers

model_config = transformers.PretrainedConfig.from_pretrained("zxhezexin/openlrm-obj-base-1.1")
print(model_config)

which prints:

PretrainedConfig {
  "camera_embed_dim": 1024,
  "encoder_feat_dim": 768,
  "encoder_freeze": false,
  "encoder_model_name": "dinov2_vitb14_reg",
  "encoder_type": "dinov2",
  "rendering_samples_per_ray": 96,
  "transformer_dim": 768,
  "transformer_heads": 12,
  "transformer_layers": 12,
  "transformers_version": "4.28.1",
  "triplane_dim": 48,
  "triplane_high_res": 64,
  "triplane_low_res": 32
}
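
If you want to drop these values straight into a training config, here is a hedged sketch (assuming PyYAML is installed, and that the model: keys in the training YAML match the config.json fields one-to-one, as they do for the fields shown above):

import transformers
import yaml  # PyYAML, assumed to be installed

repo = "zxhezexin/openlrm-obj-base-1.1"
hf_cfg = transformers.PretrainedConfig.from_pretrained(repo).to_dict()

# Keep only the fields that appear under model: in the training YAML;
# bookkeeping entries such as transformers_version are dropped.
model_keys = [
    "camera_embed_dim", "rendering_samples_per_ray", "transformer_dim",
    "transformer_layers", "transformer_heads", "triplane_low_res",
    "triplane_high_res", "triplane_dim", "encoder_type",
    "encoder_model_name", "encoder_feat_dim", "encoder_freeze",
]
model_section = {"model": {k: hf_cfg[k] for k in model_keys if k in hf_cfg}}
print(yaml.safe_dump(model_section, sort_keys=False))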

Personally, for pretraining I've changed the code to load the pretrained model directly:

from openlrm.utils.hf_hub import wrap_model_hub

class LRMTrainer(Trainer):

    ...

    def _build_model(self, cfg):
        assert (
            cfg.experiment.type == "lrm"
        ), f"Config type {cfg.experiment.type} does not match with runner {self.__class__.__name__}"
        from openlrm.models import ModelLRM

        model_class = wrap_model_hub(ModelLRM)
        model = model_class.from_pretrained(cfg.experiment.pretrained)
        return model

You can replace cfg.experiment.pretrained with "zxhezexin/openlrm-obj-base-1.1", or add a pretrained key to your config; a sketch of that fallback is shown below.
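
For example, a minimal sketch of that fallback, assuming cfg is an OmegaConf DictConfig as elsewhere in OpenLRM (the default repo id is only an illustration):

from openlrm.models import ModelLRM
from openlrm.utils.hf_hub import wrap_model_hub


def build_pretrained_model(cfg, default_repo="zxhezexin/openlrm-obj-base-1.1"):
    # Prefer an explicit experiment.pretrained entry; fall back to a Hugging Face
    # repo id otherwise. Assumes cfg is an OmegaConf DictConfig, so .get() works.
    repo_or_path = cfg.experiment.get("pretrained", default_repo)
    model_class = wrap_model_hub(ModelLRM)
    return model_class.from_pretrained(repo_or_path)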

@ZexinHe
Collaborator

ZexinHe commented May 29, 2024

@da2r-20 This is amazing! Thanks!

@hayoung-jeremy

Hi @ZexinHe , thank you for your advice.
I'm wondering if I can set the resolution much higher than 336, such as 1008 (since the patch size is 14)?
My goal is to improve the texture quality of the inference results.
I'm fine-tuning openlrm-mix-large-1.1 with 1000 pairs of data,
but the training result is not good.
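
As a rough sanity check on jumping from 336 to 1008 (assuming the DINOv2 encoder splits the source image into 14x14 patches, so the token count grows with the square of the resolution):

patch = 14  # DINOv2 ViT patch size

for res in (336, 1008):
    assert res % patch == 0, f"{res} is not divisible by the patch size"
    side = res // patch
    print(f"source_image_res={res}: {side}x{side} patches = {side * side} tokens")

# 336 -> 24x24 = 576 tokens, 1008 -> 72x72 = 5184 tokens: 9x more encoder tokens,
# and the encoder's self-attention cost grows with the square of the token count,
# so expect a large memory and step-time hit at 1008.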

training data

There are 1000 custom glb files, all processed through blender_script.py properly.
I know the amount of data is not enough, so I'm currently just trying to overfit.

train-sample.yaml

experiment:
    type: lrm
    seed: 42
    parent: lrm-objaverse
    child: small-dummyrun

model:
    camera_embed_dim: 1024
    rendering_samples_per_ray: 128
    transformer_dim: 1024
    transformer_layers: 16
    transformer_heads: 16
    triplane_low_res: 32
    triplane_high_res: 64
    triplane_dim: 80
    encoder_type: dinov2
    encoder_model_name: dinov2_vitb14_reg
    encoder_feat_dim: 768
    encoder_freeze: false

dataset:
    subsets:
        -   name: objaverse
            root_dirs:
                - "/home/ubuntu/training-tokyo/OpenLRM/views"
            meta_path:
                train: "/home/ubuntu/training-tokyo/OpenLRM/train_uids.json"
                val: "/home/ubuntu/training-tokyo/OpenLRM/val_uids.json"
            sample_rate: 1.0
    sample_side_views: 3
    source_image_res: 1008 # higher resolution
    render_image:
        low: 512 # higher resolution
        high: 1008 # higher resolution
        region: 64
    normalize_camera: true
    normed_dist_to_center: auto
    num_train_workers: 4
    num_val_workers: 2
    pin_mem: true

train:
    mixed_precision: bf16  # REPLACE THIS BASED ON GPU TYPE
    find_unused_parameters: false
    loss:
        pixel_weight: 1.0
        perceptual_weight: 1.0
        tv_weight: 5e-4
    optim:
        lr: 4e-4
        weight_decay: 0.05
        beta1: 0.9
        beta2: 0.95
        clip_grad_norm: 1.0
    scheduler:
        type: cosine
        warmup_real_iters: 3000
    batch_size: 3  # reduced it because of the CUDA OOM error
    accum_steps: 1
    epochs: 2000  # modified it for overfitting
    debug_global_steps: null

val:
    batch_size: 2 # modified
    global_step_period: 1000
    debug_batches: null

saver:
    auto_resume: true
    load_model: "/home/ubuntu/training-tokyo/OpenLRM/model.safetensors" # this refers to "zxhezexin/openlrm-mix-large-1.1" 
    checkpoint_root: ./exps/checkpoints
    checkpoint_global_steps: 1000
    checkpoint_keep_level: 5

logger:
    stream_level: WARNING
    log_level: INFO
    log_root: ./exps/logs
    tracker_root: ./exps/trackers
    enable_profiler: false
    trackers:
        - tensorboard
    image_monitor:
        train_global_steps: 100
        samples_per_log: 4

compile:
    suppress_errors: true
    print_specializations: true
    disable: true

training result

[TRAIN STEP]loss=0.21, loss_pixel=0.0265, loss_perceptual=0.184, loss_tv=0.424, lr=3.04e-13: 100%|| 60000/60000 [15:40:28<00:00,  1.06s/it]

As you can see above, the loss value is still too high, and the inference results based on this checkpoint are not good.
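
As a quick check on where that 0.21 comes from (assuming the logged total is the weighted sum of the logged terms, with the weights from the loss: section above):

# Weights from the loss: section of train-sample.yaml above.
pixel_weight, perceptual_weight, tv_weight = 1.0, 1.0, 5e-4

# Values from the final training log line.
loss_pixel, loss_perceptual, loss_tv = 0.0265, 0.184, 0.424

total = pixel_weight * loss_pixel + perceptual_weight * loss_perceptual + tv_weight * loss_tv
print(f"weighted total = {total:.4f}")  # ~0.2107, matching the logged loss=0.21

# The perceptual term alone is ~87% of the total, so the remaining error is mostly
# perceptual (blur/texture) rather than raw pixel error.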

[inference result GIFs: F_AAA_22FW_T002, F_AAB_23SS_O001]

previous training's inference results

[inference result GIFs: F_KOC_22SS_T004, M_AAA_22FW_T004]

I really need to increase the texture resolution.
Could you please give me some advice on that?

@joshkiller

Hi @ZexinHe , thank you for your advice. I'm wondering if I can set the resolution much higher than 336, such as 1008 (since the patch size is 14)? [...]

Hi boss, can I ask a question related to fine-tuning?

@hayoung-jeremy

Hi @joshkiller, I'm very new to AI and not an expert, but I'd be happy to help with anything I can!

@joshkiller

I was wondering whether someone can fine-tune a model without changing its general behavior. For example, I find that a model such as Stable Diffusion sometimes generates images that can't be used for 3D reconstruction. How, and with what kind of data, can we remedy that problem so that the model generates only complete, single objects? I'm doing my master's internship on a text-to-3D pipeline.

@da2r-20

da2r-20 commented May 30, 2024

@joshkiller

I was wondering whether someone can fine-tune a model without changing its general behavior.

Usually what you are describing can be achieved with LoRA and its derivatives (there's a short sketch at the end of this comment),
but I'm not sure OpenLRM can help you here:
OpenLRM does single-image -> 3D reconstruction.

Text -> 3D is a different but related task, and there are other models available for it.

If you want to create data using OpenLRM you could, say, take image-text pairs and get the 3D representation using OpenLRM.
There has been some work in other tasks suggesting that synthetic data pairs generated by a trained model can be beneficial,
but for text -> 3D my opinion is that you need good data to improve.
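
For reference, a minimal sketch of what LoRA fine-tuning looks like in code, using the peft library on a Stable Diffusion UNet; the model id, rank, and target modules here are illustrative assumptions, not a recipe from this repo.

from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

# Load only the UNet of a Stable Diffusion checkpoint; the base weights stay
# frozen and only the small LoRA adapters are trained, which is how
# "fine-tune without changing the general behavior" is usually approached.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"  # illustrative model id
)

lora_config = LoraConfig(
    r=8,                      # adapter rank (illustrative)
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
)
unet = get_peft_model(unet, lora_config)
unet.print_trainable_parameters()  # only a small fraction of the UNet is trainable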

@joshkiller

Usually what you are describing can be achieved with LoRA and its derivatives [...]

Thanks a lot for your answers. I will try to delve into LoRA more than I have so far.

@da2r-20

da2r-20 commented May 30, 2024

@hayoung-jeremy I'm also trying to fine-tune the same model.
Currently it manages to overfit, but with some issues:
it overfits to the shape of the object well,
but the textures get lost and the overall look of the inferred object appears blurry compared to the pretrained model.

@ZexinHe I've also noticed that the original paper uses perceptual_weight=2.0
[screenshot from the paper showing the loss weights]

Training with this weight didn't improve my results, though.
