In Edgar Allan Poe's short story The Gold-Bug this encoded message occurs:
53++!305))6;4826)4+.)4+);806;48! 860))85;]8:+8!83(88)5!; 46(;8896?;8)+(;485);5!2:+(; 49562(5-4)88; 4069285);)6 !8)4++;1(+9;48081;8:8+1;48!85;4)485! 52880681(+9;48;(88;4(+?3 4;48)4+;161;:188;+?;
The code is explained as such:
> the solution is by no means so difficult as you might be led to imagine from the first hasty inspection of the characters. These characters, as anyone might readily guess, form a cipher – that is to say, they convey a meaning.
Even though the art of secret messages is old, some of its vocabulary is modern. The word "cryptograph" is often credited to Poe's short story The Gold-Bug from 1843.
Poe is perhaps best known as the inventor of the detective genre, but he is also known for his uncanny stories. In this fictitious example Poe uses a substitution cipher, in which every character is paired up: one acts as the secret symbol, the other as its plaintext value.
Each pair contains characters that can be substituted for one another. In The Gold-Bug, the "values" are unknown, and one of the main characters tackles the problem by counting the strange signs and putting the counts in a table. Poe used his understanding of communication (language) when he dealt with cryptography. To me, Poe's procedure has strong similarities to heuristics in algorithms. Sometimes you cannot rely on pure abstraction and logic; you must make assumptions. The better the assumptions, the higher the chance of an accurate or good outcome.
> Now, in English, the letter which most frequently occurs is e. Afterwards, the succession runs thus: a o i d h n r s t u y c f g l m w b k p q x z. E however predominates so remarkably that an individual sentence of any length is rarely seen, in which it is not the prevailing character.
>
> Let us assume 8, then, as e. Now, of all the words in the language, 'the' is the most usual; let us see, therefore, whether there are not repetitions of any three characters in the same order of collocation, the last of them being 8. If we discover repetitions of such letters, so arranged, they will most probably represent the word 'the'. On inspection, we find no less than seven such arrangements, the characters being ;48. We may, therefore, assume that the semicolon represents t, that 4 represents h, and that 8 represents e -- the last being now well confirmed. Thus a great step has been taken.
— Edgar Allan Poe in The Gold-Bug
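The counting Poe describes can be sketched in a few lines. This is only an illustration: the helper names `countNGrams` and `mostFrequent` are made up for this example, and the ciphertext is the version quoted in this article with line breaks removed and a plain hyphen for the dash.

```javascript
// Ciphertext as quoted in this article (line breaks removed, plain hyphen for the dash).
const cipher =
  "53‡‡†305))6*;4826)4‡.)4‡);806*;48†8" +
  "¶60))85;;]8*;:‡*8†83(88)5*†;46(;88*96" +
  "*?;8)*‡(;485);5*†2:*‡(;4956*2(5*-4)8" +
  "¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡" +
  "1;48†85;4)485†528806*81(‡9;48;(88;4" +
  "(‡?34;48)4‡;161;:188;‡?;";

// Hypothetical helper: count how often each substring of length n occurs.
function countNGrams(text, n) {
  const counts = new Map();
  for (let i = 0; i + n <= text.length; i++) {
    const gram = text.slice(i, i + n);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Hypothetical helper: list [gram, count] pairs, most frequent first.
function mostFrequent(counts) {
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const symbolCounts = mostFrequent(countNGrams(cipher, 1));
const trigramCounts = countNGrams(cipher, 3);

console.log(symbolCounts.slice(0, 3)); // the three most common symbols
console.log(trigramCounts.get(";48")); // occurrences of the candidate for "the"
```

The most common symbol becomes the guess for e, and a frequently repeated three-symbol group ending in that symbol becomes the guess for "the", which is the heuristic step in the quote.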
Now let's reproduce this, creating an encoder and a decoder with JavaScript. For this we will use the following conversion table:
A: "5", B: "2", C: "-", D: "†", E: "8", F: "1", G: "3", H: "4", I: "6", J: ",", K: "7", L: "0", M: "9", N: "*", O: "‡", P: ".", Q: "$", R: "(", S: ")", T: ";", U: "?", V: "¶", W: "]", X: "¢", Y: ":", Z: "["
Given a conversion table, this kind of algorithm, a simple substitution cipher, is easy to understand and implement. (It is not a Caesar cipher, which shifts the alphabet by a fixed number of positions.) To encode, we simply substitute each character in the message with its corresponding symbol in the table. Decoding is the same procedure in the other direction.
Basically, we loop through the message and switch each and every character with another (the 'opposite' value). But… we have a problem. The result is not very satisfying—well, it's not formatted anyway.
This is, by the way, a good example of an area where the human mind outshines the computer. The computer will encode and decode vastly faster. But it can't understand text, so it's quite tricky for it to separate the words, something we humans can do quite fast.
This is our expected output:
AGOODGLASSINTHEBISHOPSHOSTELINTHEDE VILSSEATTWENTYONEDEGREESANDTHIRTEENMI NUTESNORTHEASTANDBYNORTHMAINBRANCHSEVENTH LIMBEASTSIDESHOOTFROMTHELEFTEYE OFTHEDEATHSHEADABEELINEFROMTHETREE THROUGHTHESHOTFIFTYFEETOUT
Is there some way to manage this? …making it even easier?
I will present a theoretically correct procedure, but one with flawed results unless you have billions of years to sit around and wait for the result (and a whole planet acting as RAM).
My hypothesis was that I could find the words in the string, which lacks blank spaces, by comparing every run of consecutive characters with actual words in an English dictionary. This I managed. The problem, though, is that the algorithm finds more words than the sentence actually contains.
I thought that if I combined the elements of the resulting array (in this case approximately 200), joined them, and kept only the combinations with the same length as the evaluated string, we would find only a few matching combinations, perhaps even only one. But I did not consider the 5.092×10^46 average Gregorian years it would take.
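To get a feeling for the scale, here is a rough sketch that only counts how many subsets of about 200 candidate words exist. It does not reproduce the figure in years above, and it ignores ordering, so it is a lower bound on the combinations under that assumption.

```javascript
// Assumption: roughly 200 candidate words, as estimated in the text.
const candidateCount = 200n;

// Each candidate is either included or left out, which gives 2^n subsets.
const subsetCount = 2n ** candidateCount;

console.log(subsetCount.toString());        // the exact number of subsets
console.log(subsetCount.toString().length); // how many decimal digits it has
```

Even at billions of checks per second, a number with this many digits is far out of reach for brute force.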
I will present what I managed step by step and why it failed.
First, the conversion table as a JavaScript object:
const goldBugKey = {
A: "5", B: "2", C: "-", D: "†",
E: "8", F: "1", G: "3", H: "4",
I: "6", J: ",", K: "7", L: "0",
M: "9", N: "*", O: "‡", P: ".",
Q: "$", R: "(", S: ")", T: ";",
U: "?", V: "¶", W: "]", X: "¢",
Y: ":", Z: "["
};
But this only covers one direction. We also need the pairs in the opposite order, so we swap each key with its value.
// Swap every [letter, symbol] pair so that symbols map back to letters.
const keyReversed = Object.fromEntries(
  Object.entries(goldBugKey).map(([letter, symbol]) => [symbol, letter])
);
This is the encoded message the reader is confronted with in The Gold-Bug:
const codeFromTheGoldBug = `53‡‡†305))6
*;4826)4‡.)4‡);806*;48†8
¶60))85;;]8*;:‡*8†83(88)5*†;46(;88*96
*?;8)*‡(;485);5*†2:*‡(;4956*2(5*-4)8
¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡
1;48†85;4)485†528806*81(‡9;48;(88;4
(‡?34;48)4‡;161;:188;‡?;`;
This translates to the following sentence (we need both for the proof of concept):
// A template literal (backticks) is required for a string that spans several lines.
const sentenceFromTheGoldBug = `A good glass
in the bishop's hostel in the devil's seat
twenty-one degrees and thirteen minutes
northeast and by north main branch seventh
limb east side shoot from the left eye of
the death's-head a bee line from the tree
through the shot fifty feet out.`;
To avoid problems with mismatched lower- and uppercase letters, we arbitrarily choose to uppercase all letters. We also trim the string to avoid confusion with whitespace at both ends. Finally, we want to handle each character as a separate element, so we split the string into an array.
Next follows the most important step: the conversion. We substitute each character with the corresponding character in the conversion table.
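const senToEncode = Array.from(sentenceFromTheGoldBug.toUpperCase().trim());
// Characters without an entry in the table (spaces, punctuation) become a blank.
// Joining with "" avoids stripping commas afterwards, which would also delete the symbol for J.
const encodedMsg = senToEncode
  .map((char) => goldBugKey[char] || " ")
  .join("");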
And then we do the same in the other direction (if decoding is our task).
const codeToDecodeArr = Array.from(codeFromTheGoldBug);
// Symbols without an entry in the reversed table (here: line breaks) become "\n".
const decodedMsg = codeToDecodeArr
  .map((char) => keyReversed[char] || "\n")
  .join("");
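As a quick sanity check of the two directions, the following self-contained sketch encodes a fragment of the sentence and decodes it again. The helpers `substitute` and `lettersOnly` are hypothetical names introduced only for this example; the table is the one from this article.

```javascript
// Conversion table from the article.
const goldBugKey = {
  A: "5", B: "2", C: "-", D: "†", E: "8", F: "1", G: "3", H: "4", I: "6",
  J: ",", K: "7", L: "0", M: "9", N: "*", O: "‡", P: ".", Q: "$", R: "(",
  S: ")", T: ";", U: "?", V: "¶", W: "]", X: "¢", Y: ":", Z: "["
};
const keyReversed = Object.fromEntries(
  Object.entries(goldBugKey).map(([letter, symbol]) => [symbol, letter])
);

// Hypothetical helper: map every character through a table and
// replace anything the table does not know with `fallback`.
function substitute(text, table, fallback) {
  return Array.from(text, (char) => table[char] || fallback).join("");
}

const plain = "A good glass in the bishop's hostel";
const encoded = substitute(plain.toUpperCase().trim(), goldBugKey, " ");
const decoded = substitute(encoded, keyReversed, "\n");

// Compare letters only: spaces and punctuation are not part of the cipher.
const lettersOnly = (text) => text.toUpperCase().replace(/[^A-Z]/g, "");

console.log(encoded);
console.log(lettersOnly(decoded) === lettersOnly(plain));
```

Only the letters survive the round trip; word boundaries and punctuation are lost, which is exactly the formatting problem discussed earlier.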
I have used the NPM package an-array-of-english-words, which contains 275 000 words. My assumption is that no individual encoded word is longer than 27 characters, the length of the longest word in the works of Shakespeare. According to Wikipedia the word is “Honorificabilitudinitatibus”, meaning “the state of being able to achieve honours”.
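The listing that follows looks words up in an object called `wordsArrAsObj`, which is not shown being built. A minimal sketch of how such a lookup could be created, assuming the package exports a plain array of lowercase strings (the short `words` array here is a hypothetical stand-in for it):

```javascript
// Hypothetical stand-in for the word list; the real list is far larger.
const words = ["a", "ago", "go", "good", "glass", "lass", "in", "the"];

// Turn the array into an object keyed by word, so that a lookup
// is a single property access instead of a scan through the array.
const wordsArrAsObj = Object.fromEntries(words.map((word) => [word, true]));

console.log(wordsArrAsObj["good"]); // a known word
console.log(wordsArrAsObj["oodg"]); // not a word
```

A `Set` with `has()` would serve the same purpose; the object form is used here only because it matches the `wordsArrAsObj[tempString]` lookup.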
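// Assumption: wordsArrAsObj is a lookup object built from the word list,
// e.g. { a: true, ag: true, ... }, with lowercase keys.
const MAX_WORD_LENGTH = 27;
// Drop line breaks and lowercase the text so substrings can match dictionary entries.
const charsInMsg = decodedMsg.replace(/\s/g, "").toLowerCase().split("");
const relevantGuesses = [];
for (let i = 0; i < charsInMsg.length; i++) {
  let tempString = "";
  // Grow the candidate one character at a time, starting at position i.
  for (let len = 0; len < MAX_WORD_LENGTH && i + len < charsInMsg.length; len++) {
    tempString += charsInMsg[i + len];
    if (wordsArrAsObj[tempString]) {
      relevantGuesses.push(tempString);
    }
  }
}
console.log(relevantGuesses);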
Given the sentence provided in The Gold-Bug this would be the output:
a - ag - ago - agood - go - goo - good - o - oo - o - od - glass - la - las - lass - lassi - a - as - ass - si - sin - i - in - nth - the - he - bi - bis - bish - bishop - bishops - i - is - ish - sh - shop - shops - ho - hop - hops - o - op - ops - sh - ho - hos - host - hostel - o - os - st - te - tel - el - elint - li - lin - lint - i - in - nth - the - he - ed - de - dev - devil - devils - evil - evils - i - sea - seat - ea - eat - a - at - att - twenty - we - wen - went - en - y - yo - yon - o - on - one - ne - ned - ed - de - deg - degree - degrees - gree - grees - re - ree - rees - ee - es - san - sand - a - an - and - thir - thirteen - hi - i - te - tee - teen - ee - een - en - mi - minute - minutes - i - in - nu - nut - u - ut - ute - utes - te - tes - es - snort - no - nor - north - northeast - o - or - ort - the - he - heast - ea - eas - east - a - as - st - stand - standby - ta - tan - a - an - and - by - y - no - nor - north - o - or - ort - hm - ma - main - a - ai - ain - i - in - bra - bran - ran - a - an - seven - seventh - eve - even - event - vent - en - nth - li - limb - i - be - beast - beasts - ea - eas - east - easts - a - as - st - si - side - sides - sideshoot - i - id - ide - ides - de - es - sh - shoo - shoot - ho - hoo - hoot - o - oo - oot - o - fro - from - rom - o - om - the - he - hele - el - left - lefte - ef - eft - te - eye - y - ye - o - of - oft - the - he - ed - de - death - deaths - ea - eat - eath - a - at - sh - she - shea - he - head - ea - a - ad - da - dab - a - ab - be - bee - beeline - ee - eel - el - li - lin - line - i - in - ne - nef - ef - fro - from - rom - o - om - the - he - het - et - tree - re - ree - ee - et - eth - thro - through - rough - rought - o - ou - ought - u - ug - ugh - the - he - hes - es - sh - shot - ho - hot - o - fifty - i - if - y - fe - fee - feet - ee - et - to - tout - o - ou - out - u - ut -
As you can see, all the words are included, but the .length hypothesis turned out to be quite problematic. We can't know which of the words actually belong to the sentence. It would be possible to guess, and to check which combinations of words together add up to the length of the string. But we can't know whether a given combination is the right one unless we (the algorithm) understand the sentence.
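The ambiguity shows up even on a tiny fragment. The sketch below counts in how many ways a string can be split into dictionary words, using a hypothetical mini-dictionary taken from the guesses above; `countSegmentations` is a name made up for this example and is not part of the procedure described in this article.

```javascript
// Hypothetical mini-dictionary; a real word list would be far larger.
const dictionary = new Set(["a", "ag", "ago", "go", "good", "o", "od", "glass"]);

// Count the ways `text` can be split into words from `dict`.
function countSegmentations(text, dict, maxWordLength = 27) {
  // ways[i] holds the number of ways to split the first i characters.
  const ways = new Array(text.length + 1).fill(0);
  ways[0] = 1; // the empty prefix can be split in exactly one way
  for (let end = 1; end <= text.length; end++) {
    const firstStart = Math.max(0, end - maxWordLength);
    for (let start = firstStart; start < end; start++) {
      if (ways[start] > 0 && dict.has(text.slice(start, end))) {
        ways[end] += ways[start];
      }
    }
  }
  return ways[text.length];
}

console.log(countSegmentations("agoodglass", dictionary));
```

With this small dictionary the ten letters can already be read as "a good glass", "ago od glass", "a go od glass" or "ag o od glass". All of them have the right length, so length alone cannot pick the intended sentence.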