Sequential `for samples in batch` loop in VideoChat2 is slow; any way to parallelize it? #143

dragen1860 · 2024-03-12T07:36:18Z

Hi all,
I noticed a sequential for loop in the forward function of videochat2_it.py:

Ask-Anything/video_chat2/models/videochat2_it.py

Lines 240 to 288 in 078540a

    
    # handle each prompt individually
    for idx, prompt in enumerate(text_input):
        tmp_img_embeds = img_embeds[idx].unsqueeze(0)
        # split the prompt via END_TOKEN
        end_token = self.img_end_token if use_image else self.end_token
        p_before, p_after = prompt.split(end_token)
        p_after = end_token + p_after
        p_before_tokens = self.llama_tokenizer(p_before, return_tensors="pt", add_special_tokens=False).to(tmp_img_embeds.device)
        p_after_tokens = self.llama_tokenizer(p_after, return_tensors="pt", add_special_tokens=False).to(tmp_img_embeds.device)
        if self.use_lora:
            p_before_embeds = self.llama_model.base_model.model.model.embed_tokens(p_before_tokens.input_ids)
            p_after_embeds = self.llama_model.base_model.model.model.embed_tokens(p_after_tokens.input_ids)
        else:
            p_before_embeds = self.llama_model.model.embed_tokens(p_before_tokens.input_ids)
            p_after_embeds = self.llama_model.model.embed_tokens(p_after_tokens.input_ids)
        input_embeds = torch.cat([p_before_embeds, tmp_img_embeds, p_after_embeds], dim=1)

        # extract the answers and mask the target
        # the answers are only in the p_after
        sep1 = self.begin_signal + self.role[0] + ": "
        sep2 = self.begin_signal + self.role[1] + ": "
        raw_text = p_after.split(sep2)
        for idx in range(1, len(raw_text)):
            raw_text[idx] = sep2 + raw_text[idx]
        # the first raw_text contains system and question
        # the last raw_text only contains answer
        # rstrip() for the extra " "
        answer_targets = p_after_tokens.input_ids.clone()
        # target: "###Human:       ###Assistant: xxxxx. ###"
        system = raw_text[0].split(sep1)[0]
        system_len = self._get_text_len(system.rstrip())
        sep_len = self._get_text_len(sep1.rstrip())
        cur_len = self._get_text_len(raw_text[0].rstrip())
        answer_targets[:, :system_len] = -100
        answer_targets[:, (system_len+sep_len):cur_len] = -100
        for text in raw_text[1:-1]:
            total_len = self._get_text_len(text.rstrip())
            ans_len = self._get_text_len((text.split(sep1)[0]+sep1).rstrip())
            answer_targets[:, (cur_len+ans_len):(cur_len+total_len)] = -100
            cur_len += total_len
        cur_len += self._get_text_len(raw_text[-1].rstrip())
        assert cur_len == answer_targets.shape[1], f"The final length ({cur_len}) is not equal to the original prompt ({answer_targets.shape[1]}): {prompt}"

        max_len = max(max_len, input_embeds.shape[1])
        input_embed_list.append(input_embeds)
        p_before_len_list.append(p_before_tokens.input_ids.shape[1])
        target_list.append(answer_targets)

    # plus one for bos

Since most tensor computation can run in parallel, is there a good way to batch this for loop instead of processing one sample at a time?
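One direction that might help, as a rough sketch only: keep the string splitting and tokenization per sample, but do the embedding lookup once for the whole batch and pad the concatenated sequences in a single call. The helper names `embed_segments` and `build_batch` below are hypothetical and not part of the repository. The sketch also assumes each sample's visual embedding is a 2-D `(num_tokens, hidden)` tensor rather than the `(1, num_tokens, hidden)` shape used in the loop, and it leaves out the bos handling and the `answer_targets` masking. I have not profiled whether the embedding lookup is the actual bottleneck here.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence


def embed_segments(embed_tokens, id_list):
    """Embed many variable-length 1-D id tensors with a single lookup call.

    `embed_tokens` stands in for the model's embedding layer (an assumption).
    """
    lengths = [ids.numel() for ids in id_list]
    flat_embeds = embed_tokens(torch.cat(id_list))  # (sum(lengths), hidden)
    # Split the flat result back into one tensor per sample.
    return list(torch.split(flat_embeds, lengths, dim=0))


def build_batch(before, visual, after):
    """Concatenate [before, visual, after] per sample, then right-pad with zeros."""
    seqs = [torch.cat(parts, dim=0) for parts in zip(before, visual, after)]
    lengths = torch.tensor([s.shape[0] for s in seqs], device=seqs[0].device)
    padded = pad_sequence(seqs, batch_first=True)  # (B, max_len, hidden)
    positions = torch.arange(padded.shape[1], device=padded.device)
    attention_mask = positions[None, :] < lengths[:, None]  # True on real tokens
    return padded, attention_mask
```

With this layout the per-sample Python work shrinks to tokenization and the target masking, which are string operations and would still need a loop (or a `collate_fn` in the data loader) unless the masking logic is rewritten.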

Andy1621 · 2024-03-12T08:04:00Z

Looking forward to your solution!
