Foray into Cryptography

Until recently my main experience with cryptography was fitting my password to the arbitrary requirements of whatever website I’m signing up for. ‘Crypto’ keeps popping up randomly in web apps, so I bought a used version of Understanding Cryptography to understand… cryptography. The previous owner extensively highlighted the introductory chapter, but abruptly stopped at around page 22. I was about ready to give up too, but the next page has this fantastic quote in the “Lessons Learned” summary section:

“Never ever develop your own crypto algorithm unless you have a team of experienced cryptanalysts checking your design.”

I can definitely get behind something that not only recommends being lazy, but is also one of the major tenets of the field. Despite the brilliance of a single person or small team working in isolation, they will probably never beat the time-tested, heavily researched methods. I love that in the job of hiding information, the experts are extremely open about how to hide it.

One of the first exercises in the book is to crack a deciphered message that uses the infamous Caesar cipher.

The Caesar Cipher, or Shift Cipher, encrypts a document by assigning each letter a numerical value then ‘shifting’ it by some amount. For example, if the key is 3 then ‘A’ becomes ‘D’, ‘B’ becomes ‘E’, and so on. Once you get to ‘Z’ it loops around, so ‘Y’ becomes ‘B’, ‘Z’ becomes ‘C’, etc.

The encryption function e for our cleartext (normal) letter x, looks like this:

e(x) = (x + k) % 26

The decryption function for the encrypted letter y, looks like this:

d(y) = (y - k) % 26

where is the key (the number of spaces to shift) and is the modulo operator.

This encryption method isn’t very secure because the encrypted data retains characteristics of English (or whatever language the cleartext is written in). Certain letters and words are statistically more likely than others, so through analysis of the encrypted text you can determine the encryption key. Here’s the encrypted text given in the book:

“xultpaajcxitltlxaarpjhtiwtgxktghidhipxciwtvgtpilpitghlxiwiwtxgqadds”

Now the plan for decryption (in ruby):

  1. Write helper functions to convert words to numbers where A = 0, B = 1, C = 2, and vice versa.
  2. Write a function that will retrieve the letter frequency from the text
  3. Compare the most frequent letters to the most frequent letters in English to determine the key.

Admittedly, that’s a lot of work when you could just count the letters to skip right to step 3. But let’s pretend we had to do this on thousands of pages worth of data.

To turn letters into numbers I added a method to the String class named numerify:

class String
  #makes it so that 'a' is 0, 'b' is 1, etc.
  def numerify
    num = []
    self.downcase.each_char {|c| num << c.ord - 97}
    num
  end
end

Notice that returns an Array of numbers, so to reverse that function I had to modify the Array class:

class Array
  #takes an array of encrypted letters (as numbers) and turns into lowercase letters
  def denumerify
    letters = ""
    self.each {|n| letters += (n + 97).chr}
    letters
  end
end

Now we write the encryption and decryption functions using the equations from above.

#input: string to be encrypted, key (int)
#output: encrypted string as a comma separated string
def encrypt(x, k)
  e = []
  x.numerify.each {|i| e << ((i + k) % 26)}
  e.denumerify
end
#input: takes in an encrypted string and the key,
#output returns the decrypted string
def decrypt(y, k)
  d = []
  y.numerify.each{|i| d << ((i - k) % 26)}
  d.denumerify
end

Finally, we have to count the letter frequency so we can actually decipher the message. Here was my original way of doing it, which returns a key => value hash (letter => count).

letter_freq = {}
encrypted_text.chars.each do |i|
  letter_freq[i].nil? ? letter_freq[i] = 1 : letter_freq[i] += 1
end

But there’s always a better way! I found this more rubyesque method on Stack Overflow which I could tack onto the ‘String’ class:

class String
  def frequency
    self.chomp.each_char.with_object(Hash.new(0)) do |letter, hash|
      hash[letter.downcase] += 1
    end
  end
  #makes it so that 'a' is 0, 'b' is 1, etc.
  def numerify
    num = []
    self.downcase.each_char {|c| num << c.ord - 97}
    num
  end
end

Almost there. Now we can call the frequency method on our encrypted string to get the letter count, then sort by occurrence:

letter_freq = encrypted_string.frequency
#sort by occurance
sorted_letter_frequency = []
letter_freq.each_pair do |i|
  sorted_letter_frequency << i
  sorted_letter_frequency.sort! do |i, j|
    j[1] <=> i[1]
  end
end

The most common letter in the encrypted text is t, then i, then x. Wikipedia has some nice graphs on letter frequency, but usually you can just assume the most common letter is ‘e’.

E(4) -> T(19) = 15 letters in between, so we print out the decryption with a key of 15, and there’s our answer.

p decrypt(encrypted_string, 15)

I’m not going to spoil this one, but let’s just say these crypto guys are a bit sick in the head. Anyways, it was fun pretending to be Jake Gyllenhaal in Zodiac and I hope to write more about encryption in the future.

 

Mental Exercise for Menial Chores

If there’s one thing I learned from teaching, it’s that one of the most mind numbing activities is test proctoring. Most proctors aren’t allowed to use their smartphone, read a book, draw, or even sit. You are supposed to walk around and “actively proctor,” whatever that means. One of my coping tricks was to determine the collatz sequence of a random starting number, mentally.

A collatz sequence is a simple algorithm that creates a chain of numbers based on a starting ‘seed’ number. The rule is simple:

If n is even then:

n –> n/2

If n is odd then:

n –> 3n + 1

For example if you start with 5, an odd number, the next number would be

3*5 + 1 = 16

Since 16 is even, we divide it by two, so the next number is 8. So far our sequence is

5 –> 16 –> 8

Since 8 is even, we divide by two again, giving us 4. Divide that, and on and on until you reach 1. The overall sequence contains 6 numbers:

5 — > 16 –> 8 –> 4 –> 2 –> 1

This is called the collatz conjecture because it’s unproven whether or not any starting number will always reach one, like in the example above. Here is the question from Project Euler  which I will try to solve.

Which starting number, under one million, produces the longest chain?

How I will approach this problem

  1. Write a helper function that will determine the next number in the sequence
  2. Write a function that will loop through every number keeping track of the number of iterations.
  3. Once the chain has finished, check if it has surpassed the last winner. If it has, then replace it as the current ‘leader’ .

First, the helper function to determine the next number:

def next_collatz(n)
  if (n%2 == 0)
    return n/2
  else
    return 3*n+1
  end
end

Then, declare the variables to keep track of the current leader (longest)

longest = 0
counter = 1
winning_num = 0
x = 1

Construct the loop to go through all numbers less than one million

while counter < 1000000
  x = counter
  chain = 1
  while x > 1
    x = next_collatz(x)
    chain+=1
  end
  #print("N: ", counter, " Had a chain of: ",chain,"\n")
  if (chain > longest)
    longest = chain
    winning_num=counter
  end
  counter+=1
end

Print the winner!

print("longest: ",longest," winningnum: ",winning_num,"\n") #Note this is the number that produces the longest chain, not the length of chain

 

Champernownes Constant

Champernowne was a mathematician/friend of Alan Turing who is most well known for his research on his self-named number, Champernowne’s Constant. You can generate this number by appending positive integers in ascending order after the decimal place:

0.123456789101112131415… etc.

It was a lot easier to come up with unique numbers in the 30’s.

Here are the challenges

1. Write an algorithm to quickly determine what number (0-9) is at a given index. For example, c(4) = 4, c(11) = 0, c(14) = 1, etc.

2. From Project Euler: Determine the product of c(1) * c(10) * c(100) * c(1,000) * c(10,000) * c(100,000) * c(1,000,000)

Since this number is a geometric series, you must know the previous number to generate the next. There is a tempting brute force approach:

c = "0."
i = 1
while i < 10000000
  c += i.to_s
  i += 1
end

While that does work, after 100,000 loops you’re going to start running out of memory. Here’s a better, but still nonfunctional solution:

idxs = [1, 10, 100, 1000, 10000, 100000, 1000000]
i = 1 #loop number counter
prod = 1
index = 1
c = "0"
while i <= 1000000
  s = i.to_s #we need to check each digit in the string
  #p s
  #c += s
  s.chars.each_index do |j|
    if idxs.include?(index)
      prod *= s[j].to_i
    end
    index += 1 #this is the current place in the constant
  end
  i+=1
end
p prod

That works by changing a second incrementor, index, based on each number’s value. For example, the number ’12’ would change index by 2, the number ‘142’ would change index by 3, etc. It is still too slow to actually do the math.

Unable to find a solution fast enough to store the value in memory and look through it, I went with storing the data into a file instead, and reading in from the file.

First we create the file:

#Approximately 6mb
File.open("champernownes", "a") do |f|
  i = 0
  while i < 1000000
    f<<i.to_s
    i+=1
  end
end

Then we read through it character by character, storing the product whenever we reach one of the indices relevant to the challenge.

idxs = [1, 10, 100, 1000, 10000, 100000, 1000000]
prod = 1
File.open("champernownes", "r") do |f|
  i = 0
  f.each_char do |c|
    if idxs.include?(i)
      prod *= c.to_i
    end
    i+=1
  end
end
p prod

Be assured this is nowhere near the fastest algorithm for solving this problem, but it was one of the first times that I considered storing the ‘brute force’ solution into a file and then reading from that.