
[PicoCTF 2018] - crypto - Hertz

This is one of my writeups for PicoCTF 2018.

Problem

Here's another simple cipher for you where we made a bunch of substitutions. Can you decrypt it? Connect with nc 2018shell3.picoctf.com 14928.

Hint:

  1. NOTE: Flag is not in the usual flag format

Solution

The problem description indicates that we are dealing with a substitution cipher. Substitution ciphers are vulnerable to frequency analysis: letters in any language appear in text at different, fairly stable frequencies. So, given a long enough ciphertext, we can reasonably guess that the letter that appears most often in it is the substitute for the most common letter in English ("e"). After guessing a few of the most common letters, we can work out the remaining substitutions by reading the partially deciphered text.
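As a rough illustration of that first step, here is a minimal sketch that pairs ciphertext letters with English letters purely by frequency rank. The helper name initial_key and the ENGLISH_ORDER string are assumptions of this sketch (the order follows the commonly cited Wikipedia letter-frequency table), not part of the original solution. It only gives a starting guess: letters with close or equal counts easily swap places, which is why the guesses still have to be refined by reading the text.

```python
from collections import Counter
from string import ascii_lowercase

# English letters from most to least frequent, following the commonly cited
# Wikipedia frequency table (an assumption; other corpora order the tail differently)
ENGLISH_ORDER = 'etaoinshrdlcumwfgypbvkjxqz'


def initial_key(ciphertext: str) -> dict[str, str]:
    """Hypothetical helper: map each ciphertext letter to the English letter
    with the same frequency rank. Only a first guess, to be refined by hand."""
    counts = Counter(c for c in ciphertext.lower() if c in ascii_lowercase)
    # Most common ciphertext letter first; ties keep first-seen order
    ranked = [letter for letter, _ in counts.most_common()]
    # zip stops at the shorter sequence, so unused English letters are dropped
    return dict(zip(ranked, ENGLISH_ORDER))
```

For example, initial_key('zzzyyx') ranks z, y, x by count and pairs them with e, t, a.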

The name of the task is a hint at this approach (hertz being a unit used to measure frequency).

I saved the ciphertext from the server to a file named ciphertext and used the following bit of code to look at the character frequencies:

from collections import Counter
from string import ascii_lowercase

with open('ciphertext', 'r') as f:
    ciphertext = f.read()

# Count only lowercase letters; spaces, punctuation and digits are ignored
charcount = Counter(c for c in ciphertext if c in ascii_lowercase)
total_chars = sum(charcount.values())
# Print each letter's share of all counted letters, in order of first appearance
for char, count in charcount.items():
    print(f'{char}: {(count*100)/total_chars}% ({count})')

It produced this output:

l: 2.174877940523746% (49)
r: 7.23479804704838% (163)
o: 6.56901908566356% (148)
w: 2.3524189968930314% (53)
q: 6.1695517088326675% (139)
s: 6.924101198402131% (156)
z: 9.409675987572125% (212)
g: 7.1016422547714155% (160)
v: 5.947625388371061% (134)
e: 13.049267643142477% (294)
y: 6.924101198402131% (156)
k: 2.2192632046160674% (50)
p: 2.929427430093209% (66)
a: 2.4411895250776743% (55)
i: 5.0599201065246335% (114)
u: 1.3759431868619618% (31)
h: 1.908566355969818% (43)
f: 1.020861074123391% (23)
n: 3.195739014647137% (72)
c: 2.796271637816245% (63)
b: 2.174877940523746% (49)
j: 0.1775410563692854% (4)
d: 0.0887705281846427% (2)
x: 0.5770084332001776% (13)
t: 0.13315579227696406% (3)
m: 0.04438526409232135% (1)

We can compare that with the letter frequencies of the English language. The output shows that "e" is by far the most common letter in the ciphertext, at about 13%; according to Wikipedia, the letter e has a frequency of 12.702% in English. From that we can infer that the letter e is most likely not substituted at all in this cipher (it maps to itself).

Let's look at the next most common letters:

  • "z" has a frequency of about 9.41%: the closest match in English is "t" (about 9.06%).
  • Similarly, the frequency of "y" in the ciphertext (about 6.92%) is very close to that of "i" in English (about 6.97%).
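The "closest match" reasoning in the list above can be sketched as a nearest-value lookup. The ENGLISH_FREQ table below is an assumption of this sketch (the values from the Wikipedia letter-frequency table, which also gives the 12.702% figure for e), and closest_english_letter is a hypothetical helper, not part of the original solution.

```python
# English letter frequencies in percent, as given in the Wikipedia
# "Letter frequency" table (assumed values; other corpora differ slightly)
ENGLISH_FREQ = {
    'e': 12.702, 't': 9.056, 'a': 8.167, 'o': 7.507, 'i': 6.966,
    'n': 6.749, 's': 6.327, 'h': 6.094, 'r': 5.987, 'd': 4.253,
    'l': 4.025, 'c': 2.782, 'u': 2.758, 'm': 2.406, 'w': 2.360,
    'f': 2.228, 'g': 2.015, 'y': 1.974, 'p': 1.929, 'b': 1.492,
    'v': 0.978, 'k': 0.772, 'j': 0.153, 'x': 0.150, 'q': 0.095,
    'z': 0.074,
}


def closest_english_letter(percent: float) -> str:
    """Hypothetical helper: the English letter whose frequency is nearest to `percent`."""
    return min(ENGLISH_FREQ, key=lambda letter: abs(ENGLISH_FREQ[letter] - percent))


# Ciphertext frequencies taken from the output above
for cipher_letter, pct in [('e', 13.049), ('z', 9.410), ('y', 6.924)]:
    print(cipher_letter, '->', closest_english_letter(pct))
```

This prints e -> e, z -> t and y -> i. The lookup is only trustworthy for the most frequent letters: "s" has exactly the same count as "y" in the ciphertext, so it would also be matched to "i", and only reading the partially decrypted text can settle which guess is right.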

Next, I built the following script:

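from colorama import Fore

with open('ciphertext', 'r') as f:
    ciphertext = f.read()

# Guessed substitutions: ciphertext letter -> plaintext letter
sub = {
    'e': 'e',
    'z': 't',
    'y': 'i',
}

# Swap each guessed letter for its plaintext letter, shown in green;
# letters without a guess yet are printed unchanged
text = ''
for char in ciphertext:
    if char in sub:
        char = f'{Fore.GREEN}{sub[char]}{Fore.RESET}'
    text += char

print(text)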

It simply prints the text, applying the guessed substitutions and printing the decrypted letters in green (it uses a third-party library, colorama, for that). Just by reading the partially decrypted text, we can guess more letters.
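If colorama is not available, a dependency-free variant can mark progress with letter case instead of colour. This is a minimal sketch, not the script used above: partial_decrypt is a hypothetical helper, and it assumes the ciphertext letters are lowercase (as in the frequency count). Decrypted letters come out in lowercase and letters without a guess yet in UPPERCASE, using Python's built-in str.maketrans and str.translate.

```python
from string import ascii_lowercase


def partial_decrypt(ciphertext: str, sub: dict[str, str]) -> str:
    """Hypothetical helper: apply the guesses in `sub` (ciphertext -> plaintext).

    Guessed letters become their plaintext letter (lowercase); letters with
    no guess yet are shown in UPPERCASE. Other characters pass through.
    """
    table = {c: sub.get(c, c.upper()) for c in ascii_lowercase}
    return ciphertext.translate(str.maketrans(table))


sub = {'e': 'e', 'z': 't', 'y': 'i'}
print(partial_decrypt('zey qz', sub))  # tei Qt
```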

I then extended the sub dictionary progressively, each newly guessed substitution helping to guess the next, until the flag appeared in the partially decrypted text: substitution_ciphers_are_solvable_uyhyldalrg.
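When growing the dictionary by hand, it is easy to assign the same plaintext letter to two different ciphertext letters. In a simple substitution cipher the key is one-to-one, so a quick check can flag contradictory guesses. This is a minimal sketch with a hypothetical helper name, key_conflicts; the example dictionary deliberately contains a wrong guess for illustration.

```python
from collections import Counter


def key_conflicts(sub: dict[str, str]) -> dict[str, list[str]]:
    """Hypothetical helper: find plaintext letters claimed by more than one
    ciphertext letter. The key of a simple substitution cipher is one-to-one,
    so any entry returned here means at least one guess is wrong."""
    counts = Counter(sub.values())
    return {
        plain: sorted(c for c, p in sub.items() if p == plain)
        for plain, n in counts.items()
        if n > 1
    }


# Illustrative key with a deliberate clash: both 's' and 'y' guessed as 'i'
sub = {'e': 'e', 'z': 't', 'y': 'i', 's': 'i'}
print(key_conflicts(sub))  # {'i': ['s', 'y']}
```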