Maximizing Text Recognition Accuracy with Image Transformers in Spark OCR
Wednesday, June 24th, 2020: 2:00 PM to 3:00 PM
June 21, 2020
Volume 267, Issue 3

online

Spark OCR is an optical character recognition library that scales natively on any Spark cluster, enables private processing of documents without uploading them to a cloud service, and, most importantly, provides state-of-the-art accuracy for a variety of common use cases. A primary way to maximize accuracy is to use a set of pre-built image pre-processing transformers for noise reduction, skew correction, object removal, automated scaling, erosion, binarization, and dilation. These transformers can be combined into OCR pipelines that effectively resolve the common 'document noise' issues that reduce OCR accuracy.
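To make the pipeline idea concrete, here is a minimal plain-Python sketch of chaining pre-processing transformers. It is not Spark OCR's API: the names `binarize`, `erode`, `dilate`, and `pipeline` are hypothetical, the image is a tiny grayscale page stored as nested lists, and only three of the steps listed above (binarization, erosion, dilation) are shown. Running erosion and then dilation (a morphological opening) is one simple form of noise reduction: it removes ink specks smaller than the 3x3 neighbourhood while restoring larger shapes.

```python
from typing import Callable, List

# Hypothetical types for this sketch: a grayscale page is a list of rows of
# 0..255 intensities; a transformer maps one image to another image.
Image = List[List[int]]
Transformer = Callable[[Image], Image]


def binarize(threshold: int = 128) -> Transformer:
    """Mark pixels darker than `threshold` as ink (1), everything else as background (0)."""
    def apply(img: Image) -> Image:
        return [[1 if px < threshold else 0 for px in row] for row in img]
    return apply


def _morph(img: Image, keep: Callable[[List[int]], bool]) -> Image:
    """Apply `keep` to each pixel's in-bounds 3x3 neighbourhood of a binary image."""
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[ny][nx]
                      for ny in range(y - 1, y + 2)
                      for nx in range(x - 1, x + 2)
                      if 0 <= ny < h and 0 <= nx < w]
            row.append(1 if keep(window) else 0)
        out.append(row)
    return out


def erode() -> Transformer:
    """A pixel stays ink only if every in-bounds neighbour is ink."""
    return lambda img: _morph(img, all)


def dilate() -> Transformer:
    """A pixel becomes ink if any in-bounds neighbour is ink."""
    return lambda img: _morph(img, any)


def pipeline(*stages: Transformer) -> Transformer:
    """Chain transformers so each stage consumes the previous stage's output."""
    def apply(img: Image) -> Image:
        for stage in stages:
            img = stage(img)
        return img
    return apply


# Usage: a 7x7 page with a 3x3 block of ink and one isolated dark speck.
W, B = 255, 0
page = [
    [W, W, W, W, W, W, W],
    [W, B, B, B, W, W, W],
    [W, B, B, B, W, W, W],
    [W, B, B, B, W, W, W],
    [W, W, W, W, W, B, W],  # speck at row 4, column 5
    [W, W, W, W, W, W, W],
    [W, W, W, W, W, W, W],
]

# Binarize, then erode and dilate: the speck disappears, the block survives.
denoise = pipeline(binarize(128), erode(), dilate())
clean = denoise(page)
for row in clean:
    print("".join("#" if px else "." for px in row))
```

Printed, `clean` shows only the 3x3 block; the single-pixel speck is gone. The same opening would also erase any genuine stroke thinner than three pixels, which is why the order and parameters of such steps have to be tuned to the documents being processed.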

Hosted by Julia Mohler from the Miami Hadoop User Group

Published in the SF IT Hadoop section of Volume 267, Issue 3.