Friday, October 18, 2013

Merkle-Hellman Knapsack cryptosystem

Recently I read about the Merkel Hellman crypto system and I thought I'd reproduce my understanding of the same.

 -=====================]
The cryptosystem
Like other asymmetric cryptosytems such as RSA, D-H, ECC based cryptography -- the Merkle-Hellman cryptosystem uses the mathematically hard problem -- In this case it uses "The Subset Problem".

 -=====================]
The subset problem
The problem can be stated as follows : Given a set of positive integers, find the subset that adds to a given sum "s". The problem is known to be NP-hard(not NP-complete as it is not a decision problem). 

The cryptosystem is based on the insight that the presence or absence of an element in the subset can be denoted by a 1 and a 0 respectively, and that can be mapped against the 1 and 0 bits of a message. Before we discuss how exactly this fact is implemented as a cryptosystem we briefly discuss what is referred to as a super-increasing sequence.

 -====================]
Notation
sigma refers to the summation symbol in Math.
sigma(i=1, n)(i) would mean the summation of numbers from 0 through n.

 -====================]
Super-increasing sequence and its relevance

A super-increasing sequence is one where each element of the sequence is greater than the sum of all elements before it. An example of such a sequence would be :-

2 4 7 14 28 55

Note that the sum of no two combinations of the above sequence would yield the same sum. Why this is important will be clear in a moment.

Consider a message of the same length as the super-increasing sequence(let the length be n) M = (1,0,1,1,0,0)
A value "c" be computed as c = sigma(i=1, n) (S[i] * M[i]), where S be the super-increasing sequence and m be the message. It is important that two distinct messages do not evaluate to the same sum "c" -- using such sequences ensures the same.

 -====================]
Knapsack and Trapdoor knapsack

It is also noteworthy that a knapsack can be trivially solved when using a super-increasing sequence -- using them directly for encryption would not work out. Hence, we resort to transforming the super-increasing sequence(a trivial knapsack) into a trapdoor knapsack as follows :-

trapdoor[i] = (S[i] * r mod q) (for i = 1 -> n)

where "q" is a number greater than the sum of all elements of s and "r" and "q" are co-prime. At this point you might want to spend a few moments pondering over why "q" would need to be a value strictly greater than the sum of all elements in the trivial knapsack. We answer this later on in this post.

 -====================]
Merkle-Hellman cryptosystem

Finally, lets get down to it. This is how a message can be encrypted/decrypted in a Merkle-Hellman cryptosystem.
- First, we select a superincreasing sequence
- We select q such that it is greater than the sum of all elements of the sequence -
- We select an "r" which is lesser than "q" and co-prime to q
- We compute a "beta" sequence such that beta[i] = W[i] * rmodq
- At this point the public key is the ("beta") sequence and the private key is (W, r, q).

Encryption is performed by computing "c" such that
c = m[i] * beta[i]

Decryption we would need to figure out c' computed as follows :-
c' = c . r^(-1) modq and finding the elements in W that would add up to c'. The corresponding bits set to 1 and 0 would lead to the original message.

 -====================]
Why this works 

The encryption of the message uses a trapdoor knapsack, the solution of which is NP-hard(its not a decision problem)[3]. The decryption, which requires knowledge of a superincreasing sequence can be easily performed.
The trapdoor knapsack is created from the superincreasing sequence, and the encrypted message is the sum of a subset of the trapdoor knapsack.
Knowledge of the private key lets you deduce the message by transforming the sum of the subset of the trapdoor knapsack into the sum of the subset of a trivial knapsack. q is chosen to be a value greater than the sum of all elements so that no two plaintext messages sum up to the same "c". "r" is chosen to be co-prime to "q" so that an r-inverse exists(which is required for decryption).[4] A PoC implementation using sage can be found here[5]
 
[3] http://en.wikipedia.org/wiki/Karp%27s_21_NP-complete_problems
[4] http://en.wikipedia.org/wiki/Merkle-Hellman_knapsack_cryptosystem
[5] https://gist.github.com/eQu1NoX/dfa27a9d0c4a1651150a