Medieval messages and documents obscured by incomprehensible ciphers can be found in libraries and archives all over the world. Artificial intelligence is helping historians crack open these mysterious texts.
Deep in the archives of the Vatican Library, a mysterious handwritten book scrawled with strange symbols had lain unread for more than 400 years. Its cryptic pages apparently concealed secret remedies “for affections of the human body”, according to an inscription scratched inside the cover. Such healing practices were kept under wraps at the time because they could attract suspicion or even accusations of witchcraft.
Known as the Borg cipher, the 408-page manuscript is mostly incomprehensible – coded using 34 obscure symbols with a few Roman letters, and with a front page written in Arabic. There was no known key to reveal what was encrypted. Some of the pages have also been damaged with age, making the code even harder to read.
But with the help of machine learning – a form of artificial intelligence – researchers were able to unravel the code. They discovered the text was filled with thousands of bizarre treatments such as drinking several glasses of high-quality red wine or fermenting a nutmeg in some dough to combat dysentery.
“It is like detective work where every symbol, pattern, and partial solution may bring us closer to someone’s secrets and to a lost historical world,” says Beáta Megyesi, a professor in computational linguistics at Stockholm University in Sweden, who was part of the team that decoded the text. Even with the help of AI, the process of unlocking the cipher key was painstaking.
Now Megyesi and her colleagues are leading efforts to harness the power of AI to crack historic ciphers more efficiently, potentially unlocking a wealth of coded information from the past that has until now been uncrackable.
According to some estimates, around 1% of the material in archives and libraries around the world is fully or partially encrypted. Some of the earliest known ciphers date back to Ancient Greece and Rome.
Decoys, dead languages and bad handwriting
Together, coded historic documents conceal diplomatic intelligence, the rituals of secret societies, medical knowledge, love affairs or everyday details that people wanted to keep secret. This is information currently missing from historical narratives.
In some cases, decoding these documents has the potential to rewrite what we know about a famous individual or an entire period of history. (One recent example was a collection of coded letters found to have been written by Mary, Queen of Scots, during her long imprisonment in England. They revealed her involvement in plots to regain her throne and her tense relationship with her son, James VI of Scotland and future King James I of England.)
Historic ciphers can be relatively simple: the Borg cipher, for example, uses a simple substitution cipher, meaning that each symbol stands in for a single Roman letter to hide what was written. Others, however, can be far harder to unravel.
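To make the idea concrete, here is a minimal Python sketch of a simple substitution cipher. The key is hypothetical: it is generated from a random seed and uses shuffled capital letters as stand-in "symbols", whereas the Borg manuscript used its own obscure signs. Because each letter maps to exactly one symbol and vice versa, decryption is simply the key read backwards.

```python
import random
import string

ALPHABET = string.ascii_lowercase

def make_key(seed: int) -> dict[str, str]:
    """Build a hypothetical one-to-one key: each plaintext letter maps to one symbol."""
    rng = random.Random(seed)
    # Stand-in "symbols": a shuffled copy of the alphabet in capitals, so they are
    # visually distinct from the plaintext. Real manuscripts used invented glyphs.
    symbols = rng.sample(ALPHABET.upper(), len(ALPHABET))
    return dict(zip(ALPHABET, symbols))

def encrypt(plaintext: str, key: dict[str, str]) -> str:
    # Letters are swapped for their symbol; spaces and punctuation pass through.
    return "".join(key.get(ch, ch) for ch in plaintext.lower())

def decrypt(ciphertext: str, key: dict[str, str]) -> str:
    # The key is one-to-one, so it can simply be inverted.
    inverse = {symbol: letter for letter, symbol in key.items()}
    return "".join(inverse.get(ch, ch) for ch in ciphertext)

key = make_key(seed=1)
secret = encrypt("drink red wine", key)
print(secret)
print(decrypt(secret, key))  # → drink red wine
```

Anyone holding the key can read such a text instantly. For historians, the hard part is recovering the key once it has been lost.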
In some cases, nothing is known about the language in which the underlying text was written. Extra, meaningless symbols can also be inserted as decoys to throw off anyone hoping to snoop on the text. In other cases, several different signs can be used to represent the same letter.
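The two complications above, decoy symbols (often called "nulls") and several signs for one letter ("homophones"), can be sketched in a few lines of Python. Everything here is illustrative: the numeric codes, the dot separators and the short German plaintext "derfeind" ("der Feind", the enemy) are assumptions chosen for the example, not taken from any real cipher key.

```python
import random

# Hypothetical key, for illustration only: a frequent letter such as "e" gets
# several interchangeable codes (homophones); rarer letters get fewer.
HOMOPHONES = {
    "d": ["14"],
    "e": ["21", "37", "45", "58"],
    "f": ["16"],
    "i": ["23", "62"],
    "n": ["31", "74"],
    "r": ["42"],
}
# Meaningless decoy codes ("nulls") that stand for no letter at all.
NULLS = {"90", "91", "99"}

def encrypt(plaintext: str, rng: random.Random, null_rate: float = 0.3) -> str:
    codes = []
    for letter in plaintext:
        if rng.random() < null_rate:
            codes.append(rng.choice(sorted(NULLS)))  # slip in a decoy
        codes.append(rng.choice(HOMOPHONES[letter]))  # pick any homophone
    return ".".join(codes)

def decrypt(ciphertext: str) -> str:
    # Invert the key: every homophone points back to exactly one letter,
    # and nulls are simply skipped.
    inverse = {code: letter for letter, codes in HOMOPHONES.items() for code in codes}
    return "".join(inverse[c] for c in ciphertext.split(".") if c not in NULLS)

rng = random.Random(7)
secret = encrypt("derfeind", rng)
print(secret)           # output varies with the seed
print(decrypt(secret))  # → derfeind
```

Each run of `encrypt` can produce a different-looking ciphertext for the same message, yet the key holder decodes it without ambiguity. That asymmetry is what makes such ciphers hard for outsiders.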
This can mean a huge amount of work – often involving trial and error – to decode even a small amount of text. It took Cecile Pierrot, a cryptologist at the French National Institute for Computer Science Research (INRIA) in Nancy, France, and her colleagues six months to gradually unravel the key to a 500-year-old letter from Charles V, the Holy Roman Emperor and King of Spain, that had been written using 120 different cipher symbols across three pages.
(The decrypted letter revealed Charles V – one of the most powerful men of his time – undone by fear of a plot to kill him. The king was terrified that an Italian mercenary warlord serving the French king, Francis I, was about to assassinate him.)
Before code-breaking can begin, researchers must first painstakingly transform a handwritten cipher into a digital document that can be fed into code-cracking software. Bad handwriting and fading of the ink can make this task even harder.
Pierrot says it typically takes her a day just to transcribe a two-page letter containing symbols that are unfamiliar to her.
How AI is helping speed-read secrets
But AI is starting to speed up the process. Michelle Waldispühl, a professor of German linguistics at the University of Oslo in Norway, and her colleagues recently used an online AI platform called Transkribus to transcribe a secret letter written in 1637 by the nobleman Sigismund Heusner von Wandersleben to the Swedish Lord High Chancellor, Axel Oxenstierna. It was sent at the height of the 30 Years’ War, a religious conflict that would ultimately claim millions of lives and devastate huge swathes of Europe.
The tool has been trained on various languages, scripts and handwriting styles that cover several centuries. After the image of a document is uploaded to the system, the AI detects blocks of text and individual lines before scanning the whole text character by character to turn it into a digital form.
Although some manual corrections were needed, the tool worked quite well on Von Wandersleben’s letter because it was only partly encrypted, using neatly written numbers separated by dots with clear spaces between them. The other parts were not encoded and were simply written in 17th-Century German script.
Existing AI transcription platforms often struggle when manuscripts are encrypted with unusual characters, such as invented signs, astrological symbols or numbers that are written in an odd way.
But Megyesi, Waldispühl and their colleagues are developing their own AI tool to turn handwritten historical texts with obscure symbols or scripts into machine-readable documents as part of the multinational Descrypt project.
“We are developing more adaptable models trained and tested across a broad range of scripts, alphabets and symbolic repertoires,” says Megyesi.
Once a secret document has been transcribed, the detective work can begin. At present, cryptologists often use specially designed computer software that does not rely on AI, whose algorithms try to work out which cipher was used and then break the code.
Simple ciphers can often be cracked by analysing the frequency of symbols used and matching them to letters of the alphabet that appear at the same rate in a language. In English, for example, the letter E is the most common while Z, Q and X are the least frequent.
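A first pass at frequency analysis might look like the Python sketch below: count the symbols, rank them, and pair each with the English letter of the same rank. The letter order used is one commonly quoted approximation, and the sample ciphertext is a made-up example ("the enemy is near" with every letter shifted four places). On short texts the resulting guess is usually only partly right, which is why trial and error is still needed.

```python
from collections import Counter

# One commonly quoted ordering of English letters, most to least frequent.
# Published tables differ slightly in the middle, but E leads and X, Q, Z trail.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def frequency_guess(ciphertext: str) -> dict[str, str]:
    """Pair each cipher symbol with the English letter of the same frequency rank."""
    counts = Counter(ch for ch in ciphertext if not ch.isspace())
    # most_common() sorts by count; ties keep the order symbols were first seen.
    ranked = [symbol for symbol, _ in counts.most_common()]
    # zip stops at 26, so any symbols beyond the alphabet are left unmapped.
    return dict(zip(ranked, ENGLISH_ORDER))

def apply_guess(ciphertext: str, guess: dict[str, str]) -> str:
    # Unmapped characters (such as spaces) are kept as they are.
    return "".join(guess.get(ch, ch) for ch in ciphertext)

secret = "XLI IRIQC MW RIEV"  # made-up example ciphertext
guess = frequency_guess(secret)
print(apply_guess(secret, guess))  # only E comes out right on a text this short
```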
But Von Wandersleben, writing from the front lines of the 30 Years’ War, used up to eight different symbols to represent the letter E, which blunts this kind of frequency analysis. Unpicking the code took trial and error, as well as Waldispühl’s knowledge of old German.
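The toy Python illustration below, using a made-up English sentence, shows why spreading E across several symbols hurts frequency analysis. It measures how much of the text the single most common symbol accounts for, before and after E is split into eight hypothetical codes (labelled `e0` to `e7` purely for the example).

```python
import random
from collections import Counter

PLAINTEXT = "the enemy is near the river and the bridge is weak"  # made-up example

def top_share(symbols: list[str]) -> float:
    """Fraction of the text taken up by its single most frequent symbol."""
    counts = Counter(symbols)
    return counts.most_common(1)[0][1] / len(symbols)

letters = [ch for ch in PLAINTEXT if ch.isalpha()]

# A simple substitution only relabels letters, so its symbol counts equal the
# plaintext's letter counts; the letters themselves can stand in for symbols.
simple = letters

# Homophonic version: each "e" becomes one of eight interchangeable codes.
rng = random.Random(0)
homophonic = [f"e{rng.randrange(8)}" if ch == "e" else ch for ch in letters]

print(f"simple:     {top_share(simple):.3f}")      # → 0.225 (E is 9 of 40 letters)
print(f"homophonic: {top_share(homophonic):.3f}")  # lower: E no longer stands out
```

In the plain version E is clearly the most common symbol. Once it is split eight ways, no single code dominates, so matching symbol ranks to a letter-frequency table no longer lands on E.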
“It was very much back and forth between the machine and the human validator,” says Waldispühl. “Maybe at some point AI can do it completely independently.”
Hidden behind the cipher were Von Wandersleben’s warnings about the threat posed by factions among Sweden’s Protestant allies in the war. He told Oxenstierna that he had been forced to make strategic retreats from the conflict after learning of a conspiracy among his allies, including Lord Franz Heinrich of Saxony.
Reopening cold case codes
Megyesi and her team are now exploring how AI could skip the transcription stage altogether, simply by analysing photos of the pages to decipher secret messages. They recently showed how the approach could work for simple codes, where every letter is replaced by a single symbol.
They tested the system on a 105-page manuscript they had already decoded, known as the Copiale cipher, which details the rituals, rules and ideals of an 18th-Century German secret society.




