Extracting Text (OCR) [beta]
A common task for computer vision is extracting text from images, also known as OCR (Optical Character Recognition).
The LandingLens Python SDK has OCR models available out-of-the-box, without the need to train your own model. The models are pre-trained on a variety of font types and are optimized for accuracy and speed.
Running OCR Inference
To extract text from an image, use the `landingai.predict.OcrPredictor` class and run inference on a `Frame`.
The model works well with several font types. Let's try it with this example image, which contains handwriting:
```python
from landingai.predict import OcrPredictor
from landingai.pipeline.image_source import Frame

predictor = OcrPredictor(api_key="<insert your API key here>")  # (1)!
frame = Frame.from_image("/path/to/image.png")  # (2)!
frame.run_predict(predictor)  # (3)!
for prediction in frame.predictions:  # (4)!
    print(f"{prediction.text} (Confidence: {prediction.score})")  # (5)!
```
1. Create an `OcrPredictor` instance with your API key. Visit https://app.landing.ai/ and see Getting the API Key for more details on how to get your API key. You can optionally pass a `language` parameter to tell the backend to pick a model trained on that particular language; the default, `ch`, supports Chinese and English characters, while the value `en` only supports English characters and may sometimes give better results if you already know the image contains only English characters.
2. Create a `Frame` instance from an image file. You could use any image source, such as a webcam, a video file, or screenshots. See Image Acquisition for more details.
3. Run inference on the frame to extract the text.
4. Iterate over the predictions.
5. Print the text and confidence score.
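As a sketch of the `language` choice described in step 1, the helper below encodes the trade-off in plain Python. The helper name and the structure are hypothetical, for illustration only; only the `"ch"` and `"en"` codes come from the documentation above.

```python
# Language codes documented for OcrPredictor: "ch" (default, Chinese +
# English) and "en" (English-only, sometimes more accurate on pure
# English text). The helper below is a hypothetical illustration.
def pick_language(image_is_english_only: bool) -> str:
    """Pick the narrower English-only model when the input is known to be English."""
    return "en" if image_is_english_only else "ch"

# Usage sketch (requires a real API key):
# predictor = OcrPredictor(api_key="<insert your API key here>",
#                          language=pick_language(True))
```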
In the example above, the output lists each piece of detected text together with its confidence score.
You can also use the `in` operator to check whether a certain set of characters is present in the predictions.
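A membership check like that can be supported by a container that implements `__contains__`. The class below is a minimal stand-in that mimics the behavior, not the SDK's actual implementation; the prediction strings are made-up sample data.

```python
# Minimal sketch of a predictions container supporting the `in` operator.
# The real landingai predictions class may differ; this mimics the idea.
class PredictionList(list):
    """A list of OCR text predictions with substring membership checks."""

    def __contains__(self, text: str) -> bool:
        # True if `text` appears inside any predicted string.
        return any(text in predicted for predicted in self)

preds = PredictionList(["HELLO WORLD", "Invoice #1234"])
print("WORLD" in preds)    # True: substring of the first prediction
print("Receipt" in preds)  # False: appears in no prediction
```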
The results may vary depending on the image quality, the font, and the language. Try it with your own images to see how well the model performs, and send us feedback about the results.