Replacing pipeline components with different models #302
Replies: 2 comments · 4 replies
-
Also one other question: could you tell me how small the ROIs (the input to the text extraction service) generally are when using dd.analyzer's layout model? Are they as small as a word, or more like a paragraph, a list, etc.?
-
It does not work out of the box. If you want to use a particular library, you have to write a wrapper. That is, if you want to use an end-to-end OCR predictor from PaddleOCR, you will have to wrap the OCR detector in a deepdoctection class:

    # Import paths may differ between deepdoctection versions.
    from deepdoctection.extern.base import ObjectDetector
    from deepdoctection.utils.detection_types import ImageType

    class PaddleOCRDetector(ObjectDetector):
        def __init__(self, config_path_yaml, path_weights):
            # if the paddle model requires a config file and a weights file
            self.name = "paddle-ocr"
            self.config = config_path_yaml
            self.path_weights = path_weights
            self.paddle_model = ...  # code to instantiate the PaddleOCR model

        def predict(self, np_img: ImageType):
            # transform the numpy image so that it can be loaded into the paddle model
            # (transform_to_paddle_input is a placeholder you write yourself)
            paddle_input = transform_to_paddle_input(np_img)
            paddle_outputs = self.paddle_model(paddle_input)
            # transform paddle outputs into a list of `DetectionResult`
            # (paddle_outputs_to_detection_results is also a placeholder)
            detection_results = paddle_outputs_to_detection_results(paddle_outputs)
            return detection_results

        @classmethod
        def get_requirements(cls):
            return []  # or you can write a requirement function to check if PaddlePaddle is installed

        def clone(self):
            return self.__class__(self.config, self.path_weights)  # your init input values

I recommend looking at some examples in the library to see how the interface is implemented. You can then plug your wrapper into the corresponding pipeline component.
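To make the `paddle_outputs_to_detection_results` step a bit more concrete, here is a minimal sketch of what such a helper could look like. It rests on two assumptions that you should check against your own setup: that the PaddleOCR output for one image is a list of lines, each made of four corner points plus a `(text, confidence)` pair, and that an axis-aligned box is good enough for the downstream component. `SimpleDetectionResult` is a hypothetical stand-in so the snippet runs without deepdoctection installed; in a real wrapper you would build deepdoctection's own `DetectionResult` objects instead.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


# Hypothetical stand-in for deepdoctection's DetectionResult (not the real class).
@dataclass
class SimpleDetectionResult:
    box: List[float]  # [xmin, ymin, xmax, ymax] in absolute pixel coordinates
    score: float
    text: str


# Assumed raw format of one OCR line: (four (x, y) corner points, (text, confidence)).
PaddleLine = Tuple[Sequence[Sequence[float]], Tuple[str, float]]


def paddle_outputs_to_detection_results(
    lines: Sequence[PaddleLine],
) -> List[SimpleDetectionResult]:
    """Convert quadrilateral OCR lines into axis-aligned box results."""
    results = []
    for points, (text, confidence) in lines:
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # Take the bounding rectangle of the four corner points.
        box = [min(xs), min(ys), max(xs), max(ys)]
        results.append(SimpleDetectionResult(box=box, score=float(confidence), text=text))
    return results


# Hand-written sample in the assumed format (not real model output).
sample = [
    ([[10, 20], [110, 22], [109, 48], [9, 46]], ("Invoice", 0.98)),
    ([[10, 60], [200, 60], [200, 85], [10, 85]], ("Total: 42.00", 0.91)),
]
results = paddle_outputs_to_detection_results(sample)
for r in results:
    print(r)
```

Collapsing the four corner points to their bounding rectangle loses the rotation of slanted text lines, which is usually acceptable for word or line boxes but worth keeping in mind for heavily skewed scans.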
-
Is it possible to replace pipeline components with models that aren't specified in ModelCatalog?
For example, to have the text extraction service use PaddleOCR instead of Tesseract?
More generally, can you also register any kind of model in the model catalog by using the register() method from deepdoctection/extern/model.py?