Linguistics & Technology

OCR & OCR-on-a-Chip

Wizcomtech's OCR engines originate from the Ligature OCR engine, and tens of man-years have been devoted to making them sophisticated, accurate and lightweight. Based on neural network technology and other intelligent statistical approaches, they are designed to operate on mobile devices with limited resources, such as minimal storage space and low memory. Wizcomtech's OCR engine recognizes text in a wide variety of languages. Our expertise in embedded environments also enables the integration of other engines into Wizcomtech products, such as text-to-speech, handwriting recognition and other OCR engines.

Grey to Bit and Image Extraction
Wizcomtech's OCR engine includes a pre-processing phase that applies a sophisticated grey-to-bit algorithm (converting the greyscale scan into a 1-bit black-and-white image), followed by image enhancement and extraction of text regions.
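
The grey-to-bit algorithm itself is not described here. As a rough illustration of the general idea only, the sketch below uses Otsu's method, a standard global thresholding technique, as a stand-in. The function names and the tiny sample image are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light pixels."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their grey levels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance: large when the two groups are well separated.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def grey_to_bit(image):
    """image: list of rows of 0-255 ints. Returns rows of bits, 1 = ink (dark)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]


# Hypothetical 2x3 greyscale patch: low values are dark ink, high values are paper.
sample = [[20, 30, 200],
          [210, 25, 220]]
bits = grey_to_bit(sample)
```

A production pre-processor would more likely use a locally adaptive threshold to cope with uneven lighting; the global threshold is used here only to keep the sketch short.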

Open Architecture
Our software platform is flexible and adaptable. Its development tools can be used to implement a wide variety of embedded applications which can be tailored to meet specific market requirements.

Portable Scanning and Recognition Technology
Wizcomtech's electro-optics system, using our patented technology, provides precise text scanning at 400 dpi resolution. The proprietary ASIC (Application-Specific Integrated Circuit) and our highly integrated hardware design combine to create a powerful device in a compact, ergonomically designed package.

Linguistic Technology
Wizcomtech's dictionary databases are created specifically for use in electronic (linguistic) products. They are specially designed to operate in low-resource environments (in terms of both storage and runtime memory consumption) and are built to identify and translate any word form that might appear in written text. These databases and the accompanying engines consequently have many advantages over dictionaries that are primarily compiled for printing. These advantages are not abstract lexicographical concepts, but concrete features that make Wizcomtech's products more useful and easier to use.

Morphological Search Engines
Wizcomtech's sophisticated search engines were tailored to operate on embedded systems using our efficiently compressed dictionary databases. Our databases and search mechanisms are optimized to obtain immediate responses with minimum memory consumption.
Our products also use an extensive indexing system in addition to the dynamic Reverse Derivation Technology (RDT). Our highly effective search engines feature:

  • A wide range of inflections within each index.
  • An exhaustive range of inflections using RDT.
  • Recognition of alternative spellings, directing the user to the correct entry even if the headword was entered using a different spelling, for example:
    • Words with country-dependent spelling such as color/colour and center/centre.
    • Words whose original spelling has been changed by spelling reforms, such as in German and Dutch.
    • Words which can be spelled either with a hyphen or as one word.
  • Direct lookup of idioms and phrasal verbs, which takes the user straight to the relevant place in the dictionary database.
  • Inflected forms of phrasal verbs: for example, entering "walked in" directs the user to "walk in".
  • Homonym differentiation.
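
How these lookup features might fit together can be sketched as follows. This is an illustration under stated assumptions, not Wizcomtech's implementation: RDT is proprietary, so plain suffix stripping stands in for it, and every table and function name below is hypothetical.

```python
# Hypothetical miniature dictionary data.
HEADWORDS = {"walk", "walk in", "go", "colour", "centre", "email"}
INFLECTION_INDEX = {"went": "go", "gone": "go"}             # stored irregular forms
SPELLING_VARIANTS = {"color": "colour", "center": "centre"}
# Stand-in for rule-based reverse derivation: (suffix to strip, replacement).
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")]


def lookup_word(word):
    """Map a single word form to its headword, or None if nothing matches."""
    word = SPELLING_VARIANTS.get(word.lower(), word.lower())
    if word in HEADWORDS:
        return word
    if word in INFLECTION_INDEX:              # 1. indexed inflections
        return INFLECTION_INDEX[word]
    if "-" in word and word.replace("-", "") in HEADWORDS:
        return word.replace("-", "")          # 2. hyphenated vs. one-word spelling
    for suffix, replacement in SUFFIX_RULES:  # 3. derive the base form by rule
        if word.endswith(suffix):
            candidate = word[:-len(suffix)] + replacement
            candidate = SPELLING_VARIANTS.get(candidate, candidate)
            if candidate in HEADWORDS:
                return candidate
    return None


def lookup(text):
    """Look up a word or a short phrase such as an inflected phrasal verb."""
    words = text.lower().split()
    if not words:
        return None
    joined = " ".join(words)
    if joined in HEADWORDS:
        return joined
    if len(words) > 1:
        # Reduce the first word to its headword, then retry the whole phrase.
        head = lookup_word(words[0])
        if head is not None:
            candidate = " ".join([head] + words[1:])
            if candidate in HEADWORDS:
                return candidate
        return None
    return lookup_word(joined)
```

With this sample data, lookup("walked in") resolves to "walk in", and lookup("colors") resolves to "colour" by first stripping the plural ending and then applying the spelling variant.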

Language-specific solutions
Wizcomtech has developed a number of features to increase both the efficiency and user-friendliness of dictionary databases whose source language is not English. The following are some examples:

Compound Words
Some languages (notably German) combine words to form compounds. The number of combinations is huge, and only the most common of these compounds are generally listed as headwords in dictionaries. Our compound words look-up feature divides a compound word that is not a headword into its component parts, and provides the translation of each part. In the vast majority of cases this is sufficient for the user to understand the meaning of the compound word.
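
A minimal sketch of compound splitting, assuming a small hypothetical lexicon and a simple recursive search that prefers the longest leading component. Linking elements between components (such as the German linking "s") are deliberately ignored, and none of the names below come from Wizcomtech's products.

```python
from functools import lru_cache

# Hypothetical German-English lexicon of component words.
LEXICON = {"haus": "house", "tür": "door", "schlüssel": "key"}


def split_compound(word, lexicon=LEXICON, min_len=3):
    """Split word into known components, or return None if no full split exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(word):
            return ()
        # Try the longest candidate component first.
        for end in range(len(word), start + min_len - 1, -1):
            part = word[start:end]
            if part in lexicon:
                rest = solve(end)
                if rest is not None:
                    return (part,) + rest
        return None

    parts = solve(0)
    return list(parts) if parts else None


def translate_compound(word, lexicon=LEXICON):
    """Return (component, translation) pairs for a compound that is not a headword."""
    parts = split_compound(word, lexicon)
    if parts is None:
        return None
    return [(part, lexicon[part]) for part in parts]
```

For example, translate_compound("Haustürschlüssel") yields the pairs haus/house, tür/door and schlüssel/key, from which the reader can infer "front-door key".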

Affixes as Headwords
This feature is similar to compound words. Huge numbers of words can be formed using prefixes and suffixes (e.g., -based, -ship, anti-, sub-). Only the most common of these are generally listed as headwords. When the user scans an infrequently used word that consists of an affix and a word, a translation of both the affix and the rest of the word is displayed, rather than a "word not found" message.
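
The affix fallback can be sketched in the same spirit. The tables and the function name below are hypothetical, and the sketch assumes the affixes themselves are stored as dictionary entries.

```python
# Hypothetical data: ordinary headwords plus affixes stored as entries.
HEADWORDS = {"committee", "cloud", "war"}
PREFIXES = ("anti", "sub")
SUFFIXES = ("based", "ship")


def split_affix(word):
    """Return [prefix-, stem] or [stem, -suffix] for an unlisted word, else None."""
    word = word.lower()
    if word in HEADWORDS:
        return [word]  # already listed as a headword, no fallback needed
    for prefix in PREFIXES:
        if word.startswith(prefix):
            stem = word[len(prefix):].lstrip("-")
            if stem in HEADWORDS:
                return [prefix + "-", stem]
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[:-len(suffix)].rstrip("-")
            if stem in HEADWORDS:
                return [stem, "-" + suffix]
    return None
```

Here split_affix("subcommittee") returns the prefix entry "sub-" and the headword "committee", each of which can then be translated separately.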

Split Verbs
In some languages (notably German and Dutch) many verbs are split into two parts, where the first part is not necessarily followed directly by the second and any number of words may appear between them. In most cases, the two parts of the split verb, when viewed separately, have an entirely different meaning from the combined verb. Our split verbs feature allows the user to scan an entire sentence containing a split verb. The split verb will be identified and translated as a complete entity. The user is also able to enter the two parts separately and receive the translation of the complete split verb.
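
One way such a feature could work is sketched below for German, assuming a table of finite verb forms and a set of separable prefixes. All data and names are hypothetical; a real engine would need far fuller morphology and clause analysis.

```python
# Hypothetical data for two German separable verbs.
SEPARABLE_VERBS = {"anrufen": "to call (by phone)", "aufstehen": "to get up"}
VERB_FORMS = {"rufe": "rufen", "rufst": "rufen", "ruft": "rufen",
              "stehe": "stehen", "stehst": "stehen", "steht": "stehen"}
PREFIXES = {"an", "auf", "aus", "ein", "mit", "zu"}


def find_split_verb(sentence):
    """Return the infinitive of a split verb found in the sentence, or None."""
    words = [w.strip(".,!?;:").lower() for w in sentence.split()]
    words = [w for w in words if w]
    for i, word in enumerate(words):
        base = VERB_FORMS.get(word)
        if base is None:
            continue
        # The detached prefix usually closes the clause, so scan from the end.
        for later in reversed(words[i + 1:]):
            if later in PREFIXES and later + base in SEPARABLE_VERBS:
                return later + base
    return None


def translate_split_verb(sentence):
    """Return (infinitive, translation) for the split verb, or None."""
    verb = find_split_verb(sentence)
    return (verb, SEPARABLE_VERBS[verb]) if verb else None
```

With this data, "Ich rufe dich morgen an." is resolved to "anrufen", and entering just the two parts ("rufe an") gives the same result.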

Word Segmentation
We have developed an algorithm that deals with languages that do not use spaces as word delimiters (e.g., Japanese, Chinese). This algorithm divides the scanned sentence into words and then displays the corresponding entries.
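
The algorithm itself is not described here. As an illustration only, the sketch below uses forward maximum matching, a common dictionary-based baseline for segmenting text written without spaces. The lexicon and the function name are hypothetical.

```python
# Hypothetical miniature Chinese lexicon.
LEXICON = {"我", "喜欢", "北京", "大学", "北京大学"}


def segment(text, lexicon=LEXICON, max_len=None):
    """Greedy forward maximum matching: always take the longest known word."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so that unknown text
            # still advances the scan instead of stalling.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words
```

For the sentence "我喜欢北京大学" this produces the words 我, 喜欢 and 北京大学, each of which can then be looked up as its own entry. Greedy matching can pick the wrong boundary in ambiguous cases, which is one reason practical systems often add statistical scoring.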

