OCTess

About Our Tool

Using an optical character recognition machine learning model, we developed and validated an algorithm for extracting macular optical coherence tomography data. It achieved an accuracy of 99.97%, comparable to a human extractor, while being significantly more efficient.

Purpose

Manually extracting data from spectral domain optical coherence tomography (SD-OCT) reports in large databases is a time- and resource-intensive process. To address this, we have created an optical character recognition (OCR) algorithm that automatically extracts clinical and demographic data from Cirrus SD-OCT macular cube reports. Read our paper here! (pending approval)

How it works

Zeiss Cirrus 5000 and 6000 monocular PDF/PNG outputs are the only files that we accept. Demographic variables for extraction include the patient’s name, birthdate, laterality (left or right eye), SD-OCT scan date, and gender. Clinical variables for extraction include superior, central superior, nasal, central nasal, inferior, central inferior, temporal, central temporal, and central macular thickness, as well as average cube volume, average cube thickness, signal strength, and foveal coordinates. Watch the video for a short demonstration. Read more about it here!
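As a rough illustration of the extraction step, the sketch below pulls a few of the variables listed above out of text that OCR has already produced. The label wording, the sample text, and the `parse_report_text` helper are all hypothetical and are not taken from the OCTess source; real Cirrus report layouts may differ.

```python
import re

# Hypothetical label patterns paired with a converter for the captured value.
# The real report wording may differ from these assumptions.
FIELDS = {
    "signal_strength": (re.compile(r"Signal Strength:\s*(\d+)/10"), int),
    # OD (oculus dexter) is the right eye, OS (oculus sinister) the left.
    "laterality": (re.compile(r"\b(OD|OS)\b"),
                   lambda code: "right" if code == "OD" else "left"),
    "cube_volume": (re.compile(r"Cube Volume[^:]*:\s*([\d.]+)"), float),
    "cube_avg_thickness": (re.compile(r"Cube Average Thickness[^:]*:\s*([\d.]+)"), float),
}

# Made-up OCR output used only to exercise the parser.
SAMPLE_TEXT = """Signal Strength: 8/10
OD
Cube Volume (mm3): 10.2
Cube Average Thickness (um): 283
"""


def parse_report_text(text):
    """Return a dict of parsed fields; fields that are not found map to None."""
    record = {}
    for name, (pattern, convert) in FIELDS.items():
        match = pattern.search(text)
        record[name] = convert(match.group(1)) if match else None
    return record


if __name__ == "__main__":
    print(parse_report_text(SAMPLE_TEXT))
```

Returning `None` for a missing field, rather than raising, keeps one unreadable value from discarding the rest of a report.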

Tesseract

Our algorithm uses an open-source OCR engine called Tesseract (pyTesseract version 0.3.9) to convert images to text. “OCTess” (a portmanteau of OCT and Tesseract) was evaluated on two different Tesseract engines: a legacy engine that works by recognizing character patterns, and a newer OCR engine based on a recurrent neural network. Both engines are publicly available and have been developed by Google. Learn more about Tesseract here!
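For readers who want to try the two engines themselves, the sketch below shows one way to choose between them through pyTesseract. It relies on Tesseract's `--oem` option (0 for the legacy engine, 1 for the LSTM neural network engine); the `build_config` and `ocr_image` helpers are hypothetical and are not the OCTess implementation. It also assumes that Tesseract, pyTesseract, and Pillow are installed, and that the installed language data includes the legacy model, which the legacy mode needs.

```python
# Tesseract's --oem flag selects the OCR engine mode:
# 0 = legacy engine only, 1 = LSTM neural network engine only.
ENGINE_MODES = {"legacy": 0, "lstm": 1}


def build_config(engine):
    """Build the Tesseract config string for the chosen engine."""
    if engine not in ENGINE_MODES:
        raise ValueError(f"unknown engine: {engine!r}")
    return f"--oem {ENGINE_MODES[engine]}"


def ocr_image(path, engine="lstm"):
    """Run OCR on one image file and return the recognized text."""
    # Imported here so build_config works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, config=build_config(engine))
```

Calling `ocr_image("report.png", engine="legacy")` and `ocr_image("report.png", engine="lstm")` on the same file (the file name is a placeholder) gives a simple side-by-side comparison of the two engines.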

Seconds Per Document

Percent Accuracy

Second Improvement

Times Faster

Usage

Upload PNGs and/or PDFs

Credits

Authors and Developers

This web app is an implementation of OCTess: An Optical Character Recognition Algorithm for Automated Data Extraction of Spectral Domain Optical Coherence Tomography Reports. Published in Retina (2023).

Michael Balas, MD(C)¹; Josh Herman, MD(C)¹; Nishaant (Shaan) Bhambra, MD²; Jack Longwell, HBSc³; Marko M Popovic, MD, MPH(C)⁴; Isabela M Melo, MD⁴,⁵; Rajeev H Muni, MD, MSc⁴,⁵

1 Temerty Faculty of Medicine, University of Toronto

2 Faculty of Medicine, McGill University

3 Department of Mathematics and Statistics, McMaster University

4 Department of Ophthalmology & Vision Sciences, University of Toronto

5 Department of Ophthalmology, St. Michael’s Hospital/Unity Health Toronto

App developed by Jack Longwell

F.A.Q

Frequently Asked Questions

What kind of document types are acceptable?

The algorithm will accept both PDF and PNG files. Please note that Zeiss Cirrus 5000 and 6000 monocular PDF/PNG outputs are the only files that will work as intended.

How does my extracted data get returned?

All data is extracted via the OCTess algorithm and written to an Excel spreadsheet. You will be able to download your extracted data after the algorithm has finished.

How long will it take to get my data extracted?

You can expect the web app to take around 10 seconds per file that you upload. Please note that the majority of this time will be spent uploading your files to the worker.

How many documents can I upload at once?

This will depend on the size of your files and your internet upload speed. You should be able to upload around 20 files at once. Larger batches will likely result in a timeout error.
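For larger collections, one workaround is to upload in groups. The snippet below is a minimal sketch of splitting a file list into groups, assuming a limit of roughly 20 files per upload as suggested in the answer above. The `batches` helper is hypothetical and not part of OCTess, and the upload step itself is left out.

```python
def batches(items, size=20):
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Placeholder file names standing in for a folder of exported reports.
    files = [f"report_{i:03d}.pdf" for i in range(45)]
    for number, group in enumerate(batches(files), start=1):
        print(f"batch {number}: {len(group)} files")
```

The right group size depends on file size and connection speed, so 20 is a starting point and not a fixed limit.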

Contact

Name:

Jack Longwell

Location:

36 Queen St E, Toronto, ON M5B 1W8

Email:

[email protected]

Welcome to OCTess

An automated spectral domain optical coherence tomography data extraction tool