

Removal of clutter from historical scanned documents

An innovation project led by Lumex A/S and partly financed by the Norwegian Research Council aims to improve optical character recognition (OCR) in noisy historical documents. The goal is to develop solutions that detect, locate and characterize clutter, and then apply adapted OCR to the regions that contain it. Clutter can result from tears, cracks and aging of the paper, or from stamps and annotations that were added deliberately. Ink smears and blobs from the printing process are also frequent.

NR contributes to the project by developing novel methods that help remove various types of clutter from such images. The images below show results where clutter has been automatically located and marked.


Detected clutter marked in red.
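
To make the detect, locate and characterize steps concrete, here is a minimal sketch of one simple heuristic. It is not the method developed in the project, which is not described here. It assumes a binarized page (1 = ink, 0 = background), groups ink pixels into connected components, and flags components that are much larger than the typical one, as an ink blob or stamp might be. The function names and the area_factor parameter are hypothetical and chosen only for illustration.

```python
from collections import deque
from statistics import median


def connected_components(image):
    """Group 4-connected ink pixels (value 1) of a binary image.

    Returns a list of components, each a list of (row, col) coordinates.
    """
    if not image or not image[0]:
        return []
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def find_clutter_candidates(image, area_factor=4.0):
    """Return bounding boxes (top, left, bottom, right) of oversized components.

    area_factor is a hypothetical tuning parameter: a component is flagged
    when its pixel count exceeds area_factor times the median component size.
    """
    components = connected_components(image)
    if not components:
        return []
    typical = median(len(pixels) for pixels in components)
    boxes = []
    for pixels in components:
        if len(pixels) > area_factor * typical:
            ys = [y for y, _ in pixels]
            xs = [x for _, x in pixels]
            boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes


# Toy page: seven isolated "character" pixels and one 3x4 blob.
page = [
    [1, 0, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 1, 1, 1],
]
print(find_clutter_candidates(page))  # [(2, 4, 4, 7)]
```

The returned boxes are the regions a later step could mask out or pass to an adapted OCR routine. A size test alone would miss clutter that is similar in scale to text, such as handwritten annotations, so a practical detector would need additional features.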

    • Lumex A/S
    • The National Archives of Norway

    Norsk Regnesentral
    Postal address: Postboks 114 Blindern, 0314 Oslo
    Visiting address: Gaustadalleen 23a, Kristen Nygaards hus, 0373 Oslo
    Phone: (+47) 22 85 25 00