This repository has been archived by the owner on Feb 25, 2022. It is now read-only.

Not able to generate predicted text after Done with copy master to slices. with 1.3B pre-trained model #269

Open
SanchiMittal opened this issue Jan 21, 2022 · 0 comments


Describe the bug

Running the main.py script with the pre-trained 1.3B model and the --predict flag hangs for hours after printing "Done with copy master to slices.", and no predictions are generated.

To Reproduce

Steps to reproduce the behavior:

  1. Download the pre-trained 1.3B model from https://mystic.the-eye.eu/public/AI/gptneo-release/GPT3_XL/ using wget.
  2. Create a file containing the prompt text, sample_prompt.txt.
  3. Edit the config file at ./GPT_1_3B/mystic.the-eye.eu/public/AI/gptneo-release/GPT3_XL/config.json: set "mesh_shape": "x:1,y:1" (according to the available GPU devices) and set "model_path" to GPT_1_3B/mystic.the-eye.eu/public/AI/gptneo-release/GPT3_XL/.
  4. From the root directory of the repository, run python3 main.py --predict --prompt sample_prompt.txt --gpu_ids 'device:GPU:0' --model "/home/sanchi/GPTNeo/GPT_1_3B/mystic.the-eye.eu/public/AI/gptneo-release/GPT3_XL/config.json"
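
The config edit in step 3 can also be scripted, which makes it easier to confirm what was actually written to disk. The following is a minimal sketch, not part of the repository: `update_config` and `mesh_size` are hypothetical helper names, and the only assumption about the file is that config.json is a flat JSON object with "mesh_shape" and "model_path" keys, as in the steps above.

```python
import json
from pathlib import Path


def update_config(config_path, mesh_shape, model_path):
    """Set mesh_shape and model_path in a config.json and return the result.

    Hypothetical helper for illustration; it only rewrites the two keys
    named in the reproduction steps and leaves every other key untouched.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["mesh_shape"] = mesh_shape
    config["model_path"] = model_path
    path.write_text(json.dumps(config, indent=2))
    return config


def mesh_size(mesh_shape):
    """Return the product of the mesh dimensions, e.g. "x:1,y:2" -> 2."""
    size = 1
    for dim in mesh_shape.split(","):
        _name, count = dim.split(":")
        size *= int(count)
    return size
```

Comparing `mesh_size(config["mesh_shape"])` with the number of devices passed to --gpu_ids is a quick sanity check, on the assumption that the two are expected to match; the same comparison can be made against the 'mesh_shape' and 'gpu_ids' values printed in the params line of the runtime logs.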

Expected behavior
The script generates the predicted text for the prompt.

Runtime Logs

Current step 362000
Saving config to /home/sanchi/GPTNeo/GPT_1_3B/mystic.the-eye.eu/public/AI/gptneo-release/GPT3_XL/
Done!
params = defaultdict(<function fetch_model_params.<locals>.<lambda> at 0x7f1f57167b80>, {'n_head': 16, 'n_vocab': 50257, 'embed_dropout': 0, 'lr': 0.0002, 'lr_decay': 'cosine', 'warmup_steps': 3000, 'beta1': 0.9, 'beta2': 0.95, 'epsilon': 1e-08, 'opt_name': 'adam', 'weight_decay': 0, 'train_batch_size': 512, 'attn_dropout': 0, 'train_steps': 400000, 'lr_decay_end': 300000, 'eval_steps': 10, 'predict_steps': 0, 'res_dropout': 0, 'eval_batch_size': 128, 'predict_batch_size': 128, 'iterations': 500, 'n_embd': 2048, 'datasets': [['pile', None, None, None]], 'model_path': '/home/sanchi/GPTNeo/GPT_1_3B/mystic.the-eye.eu/public/AI/gptneo-release/GPT3_XL/', 'n_ctx': 2048, 'n_layer': 24, 'scale_by_depth': True, 'scale_by_in': False, 'attention_types': ['global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local', 'global', 'local'], 'mesh_shape': 'x:1,y:2', 'layout': 'batch:x,memory_length:y,embd:y', 'activation_function': 'gelu', 'recompute_grad': True, 'gradient_clipping': 1.0, 'tokens_per_mb_per_replica': 4096, 'precision': 'bfloat16', 'padding_id': 50257, 'eos_id': 50256, 'dataset_configs': {'pile': {'n_vocab': 50257, 'path': 'gs://neo-datasets/pile/pile_*.tfrecords', 'eval_path': 'gs://neo-datasets/pile_val.tfrecords', 'tokenizer_is_pretrained': True, 'tokenizer_path': 'gpt2', 'eos_id': 50256, 'padding_id': 50257}}, 'mlm_training': False, 'causal': True, 'num_cores': 2, 'auto_layout': False, 'auto_layout_and_mesh_shape': False, 'use_tpu': False, 'gpu_ids': ['device:GPU:0', 'device:GPU:1'], 'steps_per_checkpoint': 5000, 'predict': True, 'model': 'GPT', 'export': False, 'sampling_use_entmax': False, 'moe_layers': None, 'slow_sampling': False})
Using config: {'_model_dir': '/home/sanchi/GPTNeo/GPT_1_3B/mystic.the-eye.eu/public/AI/gptneo-release/GPT3_XL/', '_tf_random_seed': None, '_save_summary_steps': 500, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=500, num_shards=2, num_cores_per_replica=1, per_host_input_for_training=4, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1, experimental_allow_per_host_v2_parallel_get_next=False, experimental_feed_hook=None), '_cluster': None}
_TPUContext: eval_on_tpu True
eval_on_tpu ignored because use_tpu is False.
Predictions generated
Calling model_fn.
Running infer on CPU/GPU
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Defaulting to GELU activation (see here: https://arxiv.org/abs/1606.08415)
Variable gpt2/h0/attn/k                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h0/attn/o                                               size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h0/attn/q                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h0/attn/v                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h0/mlp/conv1d_main/c_fc/kernel                          size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h0/mlp/conv1d_main/c_proj/kernel                        size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h1/attn/k                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h1/attn/o                                               size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h1/attn/q                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h1/attn/v                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h1/mlp/conv1d_main/c_fc/kernel                          size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h1/mlp/conv1d_main/c_proj/kernel                        size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h10/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h10/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h10/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h10/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h10/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h10/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h11/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h11/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h11/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h11/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h11/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h11/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h12/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h12/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h12/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h12/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h12/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h12/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h13/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h13/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h13/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h13/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h13/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h13/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h14/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h14/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h14/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h14/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h14/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h14/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h15/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h15/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h15/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h15/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h15/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h15/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h16/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h16/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h16/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h16/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h16/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h16/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h17/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h17/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h17/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h17/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h17/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h17/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h18/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h18/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h18/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h18/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h18/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h18/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h19/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h19/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h19/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h19/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h19/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h19/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h2/attn/k                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h2/attn/o                                               size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h2/attn/q                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h2/attn/v                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h2/mlp/conv1d_main/c_fc/kernel                          size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h2/mlp/conv1d_main/c_proj/kernel                        size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h20/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h20/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h20/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h20/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h20/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h20/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h21/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h21/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h21/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h21/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h21/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h21/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h22/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h22/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h22/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h22/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h22/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h22/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h23/attn/k                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h23/attn/o                                              size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h23/attn/q                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h23/attn/v                                              size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h23/mlp/conv1d_main/c_fc/kernel                         size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h23/mlp/conv1d_main/c_proj/kernel                       size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h3/attn/k                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h3/attn/o                                               size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h3/attn/q                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h3/attn/v                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h3/mlp/conv1d_main/c_fc/kernel                          size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h3/mlp/conv1d_main/c_proj/kernel                        size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h4/attn/k                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h4/attn/o                                               size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h4/attn/q                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h4/attn/v                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h4/mlp/conv1d_main/c_fc/kernel                          size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h4/mlp/conv1d_main/c_proj/kernel                        size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h5/attn/k                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h5/attn/o                                               size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h5/attn/q                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h5/attn/v                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h5/mlp/conv1d_main/c_fc/kernel                          size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h5/mlp/conv1d_main/c_proj/kernel                        size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h6/attn/k                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h6/attn/o                                               size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h6/attn/q                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h6/attn/v                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h6/mlp/conv1d_main/c_fc/kernel                          size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h6/mlp/conv1d_main/c_proj/kernel                        size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h7/attn/k                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h7/attn/o                                               size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h7/attn/q                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h7/attn/v                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h7/mlp/conv1d_main/c_fc/kernel                          size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h7/mlp/conv1d_main/c_proj/kernel                        size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h8/attn/k                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h8/attn/o                                               size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h8/attn/q                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h8/attn/v                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h8/mlp/conv1d_main/c_fc/kernel                          size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h8/mlp/conv1d_main/c_proj/kernel                        size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/h9/attn/k                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h9/attn/o                                               size 4194304      slice_size 2097152      Shape[heads=2048, embd=2048]                                
Variable gpt2/h9/attn/q                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h9/attn/v                                               size 4194304      slice_size 2097152      Shape[embd=2048, heads=2048]                                
Variable gpt2/h9/mlp/conv1d_main/c_fc/kernel                          size 16777216     slice_size 8388608      Shape[embd=2048, intermediate_expanded=8192]                
Variable gpt2/h9/mlp/conv1d_main/c_proj/kernel                        size 16777216     slice_size 8388608      Shape[intermediate_expanded=8192, embd=2048]                
Variable gpt2/wpe                                                     size 4194304      slice_size 2097152      Shape[embed_sequence=2048, embd=2048]                       
Variable gpt2/wte                                                     size 102926336    slice_size 51463168     Shape[vocab=50257, embd=2048]                               
Variable stacked/gpt2/h0/mlp/conv1d_main/c_fc/bias                    size 65536        slice_size 65536        Shape[stacked=8, intermediate_expanded=8192]                
    gpt2/h0/mlp/conv1d_main/c_fc/bias
    gpt2/h1/mlp/conv1d_main/c_fc/bias
    gpt2/h2/mlp/conv1d_main/c_fc/bias
    gpt2/h3/mlp/conv1d_main/c_fc/bias
    gpt2/h4/mlp/conv1d_main/c_fc/bias
    gpt2/h5/mlp/conv1d_main/c_fc/bias
    gpt2/h6/mlp/conv1d_main/c_fc/bias
    gpt2/h7/mlp/conv1d_main/c_fc/bias
Variable stacked/gpt2/h0/norm_1/g                                     size 131072       slice_size 65536        Shape[stacked=64, embd=2048]                                
    gpt2/h0/norm_1/g
    gpt2/h0/norm_1/b
    gpt2/h0/attn/compute_output_bias/o_b
    gpt2/h0/norm_2/g
    gpt2/h0/norm_2/b
    gpt2/h0/mlp/conv1d_main/c_proj/bias
    gpt2/h1/norm_1/g
    gpt2/h1/norm_1/b
    gpt2/h1/attn/compute_output_bias/o_b
    gpt2/h1/norm_2/g
    gpt2/h1/norm_2/b
    gpt2/h1/mlp/conv1d_main/c_proj/bias
    gpt2/h2/norm_1/g
    gpt2/h2/norm_1/b
    gpt2/h2/attn/compute_output_bias/o_b
    gpt2/h2/norm_2/g
    gpt2/h2/norm_2/b
    gpt2/h2/mlp/conv1d_main/c_proj/bias
    gpt2/h3/norm_1/g
    gpt2/h3/norm_1/b
    gpt2/h3/attn/compute_output_bias/o_b
    gpt2/h3/norm_2/g
    gpt2/h3/norm_2/b
    gpt2/h3/mlp/conv1d_main/c_proj/bias
    gpt2/h4/norm_1/g
    gpt2/h4/norm_1/b
    gpt2/h4/attn/compute_output_bias/o_b
    gpt2/h4/norm_2/g
    gpt2/h4/norm_2/b
    gpt2/h4/mlp/conv1d_main/c_proj/bias
    gpt2/h5/norm_1/g
    gpt2/h5/norm_1/b
    gpt2/h5/attn/compute_output_bias/o_b
    gpt2/h5/norm_2/g
    gpt2/h5/norm_2/b
    gpt2/h5/mlp/conv1d_main/c_proj/bias
    gpt2/h6/norm_1/g
    gpt2/h6/norm_1/b
    gpt2/h6/attn/compute_output_bias/o_b
    gpt2/h6/norm_2/g
    gpt2/h6/norm_2/b
    gpt2/h6/mlp/conv1d_main/c_proj/bias
    gpt2/h7/norm_1/g
    gpt2/h7/norm_1/b
    gpt2/h7/attn/compute_output_bias/o_b
    gpt2/h7/norm_2/g
    gpt2/h7/norm_2/b
    gpt2/h7/mlp/conv1d_main/c_proj/bias
    gpt2/h8/norm_1/g
    gpt2/h8/norm_1/b
    gpt2/h8/attn/compute_output_bias/o_b
    gpt2/h8/norm_2/g
    gpt2/h8/norm_2/b
    gpt2/h8/mlp/conv1d_main/c_proj/bias
    gpt2/h9/norm_1/g
    gpt2/h9/norm_1/b
    gpt2/h9/attn/compute_output_bias/o_b
    gpt2/h9/norm_2/g
    gpt2/h9/norm_2/b
    gpt2/h9/mlp/conv1d_main/c_proj/bias
    gpt2/h10/norm_1/g
    gpt2/h10/norm_1/b
    gpt2/h10/attn/compute_output_bias/o_b
    gpt2/h10/norm_2/g
Variable stacked/gpt2/h10/norm_2/b                                    size 131072       slice_size 65536        Shape[stacked=64, embd=2048]                                
    gpt2/h10/norm_2/b
    gpt2/h10/mlp/conv1d_main/c_proj/bias
    gpt2/h11/norm_1/g
    gpt2/h11/norm_1/b
    gpt2/h11/attn/compute_output_bias/o_b
    gpt2/h11/norm_2/g
    gpt2/h11/norm_2/b
    gpt2/h11/mlp/conv1d_main/c_proj/bias
    gpt2/h12/norm_1/g
    gpt2/h12/norm_1/b
    gpt2/h12/attn/compute_output_bias/o_b
    gpt2/h12/norm_2/g
    gpt2/h12/norm_2/b
    gpt2/h12/mlp/conv1d_main/c_proj/bias
    gpt2/h13/norm_1/g
    gpt2/h13/norm_1/b
    gpt2/h13/attn/compute_output_bias/o_b
    gpt2/h13/norm_2/g
    gpt2/h13/norm_2/b
    gpt2/h13/mlp/conv1d_main/c_proj/bias
    gpt2/h14/norm_1/g
    gpt2/h14/norm_1/b
    gpt2/h14/attn/compute_output_bias/o_b
    gpt2/h14/norm_2/g
    gpt2/h14/norm_2/b
    gpt2/h14/mlp/conv1d_main/c_proj/bias
    gpt2/h15/norm_1/g
    gpt2/h15/norm_1/b
    gpt2/h15/attn/compute_output_bias/o_b
    gpt2/h15/norm_2/g
    gpt2/h15/norm_2/b
    gpt2/h15/mlp/conv1d_main/c_proj/bias
    gpt2/h16/norm_1/g
    gpt2/h16/norm_1/b
    gpt2/h16/attn/compute_output_bias/o_b
    gpt2/h16/norm_2/g
    gpt2/h16/norm_2/b
    gpt2/h16/mlp/conv1d_main/c_proj/bias
    gpt2/h17/norm_1/g
    gpt2/h17/norm_1/b
    gpt2/h17/attn/compute_output_bias/o_b
    gpt2/h17/norm_2/g
    gpt2/h17/norm_2/b
    gpt2/h17/mlp/conv1d_main/c_proj/bias
    gpt2/h18/norm_1/g
    gpt2/h18/norm_1/b
    gpt2/h18/attn/compute_output_bias/o_b
    gpt2/h18/norm_2/g
    gpt2/h18/norm_2/b
    gpt2/h18/mlp/conv1d_main/c_proj/bias
    gpt2/h19/norm_1/g
    gpt2/h19/norm_1/b
    gpt2/h19/attn/compute_output_bias/o_b
    gpt2/h19/norm_2/g
    gpt2/h19/norm_2/b
    gpt2/h19/mlp/conv1d_main/c_proj/bias
    gpt2/h20/norm_1/g
    gpt2/h20/norm_1/b
    gpt2/h20/attn/compute_output_bias/o_b
    gpt2/h20/norm_2/g
    gpt2/h20/norm_2/b
    gpt2/h20/mlp/conv1d_main/c_proj/bias
    gpt2/h21/norm_1/g
    gpt2/h21/norm_1/b
Variable stacked/gpt2/h16/mlp/conv1d_main/c_fc/bias                   size 65536        slice_size 65536        Shape[stacked=8, intermediate_expanded=8192]                
    gpt2/h16/mlp/conv1d_main/c_fc/bias
    gpt2/h17/mlp/conv1d_main/c_fc/bias
    gpt2/h18/mlp/conv1d_main/c_fc/bias
    gpt2/h19/mlp/conv1d_main/c_fc/bias
    gpt2/h20/mlp/conv1d_main/c_fc/bias
    gpt2/h21/mlp/conv1d_main/c_fc/bias
    gpt2/h22/mlp/conv1d_main/c_fc/bias
    gpt2/h23/mlp/conv1d_main/c_fc/bias
Variable stacked/gpt2/h21/attn/compute_output_bias/o_b                size 36864        slice_size 18432        Shape[stacked=18, embd=2048]                                
    gpt2/h21/attn/compute_output_bias/o_b
    gpt2/h21/norm_2/g
    gpt2/h21/norm_2/b
    gpt2/h21/mlp/conv1d_main/c_proj/bias
    gpt2/h22/norm_1/g
    gpt2/h22/norm_1/b
    gpt2/h22/attn/compute_output_bias/o_b
    gpt2/h22/norm_2/g
    gpt2/h22/norm_2/b
    gpt2/h22/mlp/conv1d_main/c_proj/bias
    gpt2/h23/norm_1/g
    gpt2/h23/norm_1/b
    gpt2/h23/attn/compute_output_bias/o_b
    gpt2/h23/norm_2/g
    gpt2/h23/norm_2/b
    gpt2/h23/mlp/conv1d_main/c_proj/bias
    gpt2/ln_f/g
    gpt2/ln_f/b
Variable stacked/gpt2/h8/mlp/conv1d_main/c_fc/bias                    size 65536        slice_size 65536        Shape[stacked=8, intermediate_expanded=8192]                
    gpt2/h8/mlp/conv1d_main/c_fc/bias
    gpt2/h9/mlp/conv1d_main/c_fc/bias
    gpt2/h10/mlp/conv1d_main/c_fc/bias
    gpt2/h11/mlp/conv1d_main/c_fc/bias
    gpt2/h12/mlp/conv1d_main/c_fc/bias
    gpt2/h13/mlp/conv1d_main/c_fc/bias
    gpt2/h14/mlp/conv1d_main/c_fc/bias
    gpt2/h15/mlp/conv1d_main/c_fc/bias
Trainable Variables            count: 152     Total size: 1315575808       Total slice_size: 657886208      
All Variables                  count: 152     Total size: 1315575808       Total slice_size: 657886208      
Counters:
allconcat: 1.05e+06
 allconcat/0: 1.05e+06
  allconcat/0/reshape_op: 1.05e+06
allreduce: 2.19e+11
 allreduce/[0]: 2
  allreduce/[0]/reduce_op: 2
 allreduce/[1]: 2.19e+11
  allreduce/[1]/einsum_op: 2.19e+11
  allreduce/[1]/reduce_op: 2.53e+08
einsum: 4.24e+14
einsum_unique: 4.11e+14
output: 3.36e+12
 output/AddOperation: 7.75e+11
 output/BinaryOpWithBroadcasting: 1.32e+08
 output/BroadcastOperation: 1.03e+11
 output/ConcatOperation: 5.15e+10
 output/Constant: 4.92e+04
 output/EinsumOperation: 8.01e+11
 output/ImportOperation: 5.24e+05
 output/OneHotOperation: 2.64e+10
 output/RangeOperation: 6.35e+04
 output/ReduceOperation: 4.54e+08
 output/ReshapeOperation: 1.93e+11
 output/ScalarAddOperation: 1.03e+11
 output/ScalarMultiplyOperation: 3.22e+11
 output/ShiftOperation: 2.58e+10
 output/SlicewiseOperation: 7.55e+11
 output/StackedVariable: 6.92e+05
 output/StopGradient: 1.55e+11
 output/UnstackOperation: 6.92e+05
 output/Variable: 1.32e+09
 output/WhileLoopOperation: 5.15e+10
output_unique: 2.32e+12
 output_unique/AddOperation: 5.94e+11
 output_unique/BinaryOpWithBroadcasting: 6.79e+07
 output_unique/BroadcastOperation: 1.03e+11
 output_unique/ConcatOperation: 2.58e+10
 output_unique/Constant: 2.46e+04
 output_unique/EinsumOperation: 4.92e+11
 output_unique/ImportOperation: 2.62e+05
 output_unique/OneHotOperation: 1.32e+10
 output_unique/RangeOperation: 3.28e+04
 output_unique/ReduceOperation: 2.27e+08
 output_unique/ReshapeOperation: 1.03e+11
 output_unique/ScalarAddOperation: 5.16e+10
 output_unique/ScalarMultiplyOperation: 1.68e+11
 output_unique/ShiftOperation: 1.29e+10
 output_unique/SlicewiseOperation: 6e+11
 output_unique/StackedVariable: 4.96e+05
 output_unique/StopGradient: 1.29e+11
 output_unique/UnstackOperation: 4.96e+05
 output_unique/Variable: 1.32e+09
 output_unique/WhileLoopOperation: 2.58e+10
variables: 1.32e+09
 variables/trainable: 1.32e+09
Done calling model_fn.
Graph was finalized.
Restoring parameters from /home/sanchi/GPTNeo/GPT_1_3B/mystic.the-eye.eu/public/AI/gptneo-release/GPT3_XL/model.ckpt-362000
Running local_init_op.
Done running local_init_op.
Before copy master to slices.
Done with copy master to slices.
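
A note on reading the variable table in the log above: each `Variable` line reports both `size` and `slice_size`, and their ratio suggests how many ways that tensor is split across the mesh. The following is a minimal sketch for computing that ratio, assuming the line format shown in this log; the regular expression and the function name `split_factors` are illustrative and not part of the repository.

```python
import re

# Matches lines such as:
# "Variable gpt2/wte   size 102926336   slice_size 51463168   Shape[vocab=50257, embd=2048]"
# The pattern is an assumption based on the log format in this report.
VAR_LINE = re.compile(r"^Variable\s+(\S+)\s+size\s+(\d+)\s+slice_size\s+(\d+)")


def split_factors(log_text):
    """Return {variable_name: size / slice_size} for every 'Variable' line."""
    factors = {}
    for line in log_text.splitlines():
        match = VAR_LINE.match(line.strip())
        if match:
            name = match.group(1)
            size = int(match.group(2))
            slice_size = int(match.group(3))
            factors[name] = size / slice_size
    return factors


# Two lines copied from the log above.
sample = (
    "Variable gpt2/wte    size 102926336    slice_size 51463168     "
    "Shape[vocab=50257, embd=2048]\n"
    "Variable stacked/gpt2/h0/mlp/conv1d_main/c_fc/bias    size 65536    "
    "slice_size 65536    Shape[stacked=8, intermediate_expanded=8192]"
)

for name, factor in split_factors(sample).items():
    print(name, factor)
```

In this log most variables show a ratio of 2, which would be consistent with the graph having been built for a two-device mesh (the params dump reports `'mesh_shape': 'x:1,y:2'`), rather than the single-GPU setup described in the reproduction steps.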

Environment:

  • GPUs: I am using a DGX machine with 4 GPUs, each with 32 GB of memory.
  • Configs: Ubuntu 18.04.5, conda environment with Python 3.9.7
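
One thing that may be worth checking with this setup: the params dump in the log shows `'mesh_shape': 'x:1,y:2'` together with two entries in `gpu_ids`, while the reproduction steps describe a single-GPU configuration. Below is a small sketch that compares the number of devices implied by a mesh shape string with the number of GPU ids. It assumes the product of the mesh dimensions is expected to equal the number of devices; the helper `mesh_size` is hypothetical and not a function from the repository.

```python
from math import prod


def mesh_size(mesh_shape):
    """Parse a string like 'x:1,y:2' and return the product of its dimensions."""
    dims = [int(part.split(":")[1]) for part in mesh_shape.split(",")]
    return prod(dims)


# Values as they appear in the params dump of this report.
mesh_shape = "x:1,y:2"
gpu_ids = ["device:GPU:0", "device:GPU:1"]

devices_needed = mesh_size(mesh_shape)
if devices_needed != len(gpu_ids):
    print(f"Mismatch: mesh needs {devices_needed} devices, got {len(gpu_ids)}")
else:
    print(f"Mesh and device list agree on {devices_needed} devices")
```

If the run is meant to use a single GPU, the same check with `"x:1,y:1"` and a one-element device list should also agree; a disagreement between the two would be a reasonable first thing to rule out before looking further into the hang.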