Monday, December 16, 2013

Yet another library for padding oracle exploitation

There seem to be quite a few tools out there that try to help one exploit padding oracle vulnerabilities. I just wrote my own for fun.

The repository with the code can be found here.
A dummy server can be found in the same repository along with code that uses the library.

You can use the library as shown here. You are expected to create an instance of "PaddingAttack" with the ciphertext, the IV, a callback, the blocksize(defaults to 16) and a logging level(defaults to INFO).

The details of how it can be used can be found here.


Reference :
[+] blog.gdssecurity.com/l/b/2010/09/14/automated-padding-oracle-attacks-with-padbuster/

Saturday, November 9, 2013

RWTH CTF : Smartgrid writeup

RWTH CTF 2013 had an interesting python service by the name of Smartgrid that had an admin interface challenge-response based authentication set up as follows : https://gist.github.com/eQu1NoX/fdd78e7b44cb9afc10b8

Looking at the details of the key we see that it has a very low public exponent of 3. The message length is also small(1024 bits, 128 bytes) -- and there is no padding. Hence, we could try adding in different multiples of N, take a cuberoot and check if its the message(by taking a cube again and checking against the ciphertext).

Something like this :
https://gist.github.com/eQu1NoX/b0bbb47b540d57b22909

After this we could issue "readstatus" requests that would get us flags.

Friday, October 18, 2013

Merkle-Hellman Knapsack cryptosystem

Recently I read about the Merkel Hellman crypto system and I thought I'd reproduce my understanding of the same.

 -=====================]
The cryptosystem
Like other asymmetric cryptosytems such as RSA, D-H, ECC based cryptography -- the Merkle-Hellman cryptosystem uses the mathematically hard problem -- In this case it uses "The Subset Problem".

 -=====================]
The subset problem
The problem can be stated as follows : Given a set of positive integers, find the subset that adds to a given sum "s". The problem is known to be NP-hard(not NP-complete as it is not a decision problem). 

The cryptosystem is based on the insight that the presence or absence of an element in the subset can be denoted by a 1 and a 0 respectively, and that can be mapped against the 1 and 0 bits of a message. Before we discuss how exactly this fact is implemented as a cryptosystem we briefly discuss what is referred to as a super-increasing sequence.

 -====================]
Notation
sigma refers to the summation symbol in Math.
sigma(i=1, n)(i) would mean the summation of numbers from 0 through n.

 -====================]
Super-increasing sequence and its relevance

A super-increasing sequence is one where each element of the sequence is greater than the sum of all elements before it. An example of such a sequence would be :-

2 4 7 14 28 55

Note that the sum of no two combinations of the above sequence would yield the same sum. Why this is important will be clear in a moment.

Consider a message of the same length as the super-increasing sequence(let the length be n) M = (1,0,1,1,0,0)
A value "c" be computed as c = sigma(i=1, n) (S[i] * M[i]), where S be the super-increasing sequence and m be the message. It is important that two distinct messages do not evaluate to the same sum "c" -- using such sequences ensures the same.

 -====================]
Knapsack and Trapdoor knapsack

It is also noteworthy that a knapsack can be trivially solved when using a super-increasing sequence -- using them directly for encryption would not work out. Hence, we resort to transforming the super-increasing sequence(a trivial knapsack) into a trapdoor knapsack as follows :-

trapdoor[i] = (S[i] * r mod q) (for i = 1 -> n)

where "q" is a number greater than the sum of all elements of s and "r" and "q" are co-prime. At this point you might want to spend a few moments pondering over why "q" would need to be a value strictly greater than the sum of all elements in the trivial knapsack. We answer this later on in this post.

 -====================]
Merkle-Hellman cryptosystem

Finally, lets get down to it. This is how a message can be encrypted/decrypted in a Merkle-Hellman cryptosystem.
- First, we select a superincreasing sequence
- We select q such that it is greater than the sum of all elements of the sequence -
- We select an "r" which is lesser than "q" and co-prime to q
- We compute a "beta" sequence such that beta[i] = W[i] * rmodq
- At this point the public key is the ("beta") sequence and the private key is (W, r, q).

Encryption is performed by computing "c" such that
c = m[i] * beta[i]

Decryption we would need to figure out c' computed as follows :-
c' = c . r^(-1) modq and finding the elements in W that would add up to c'. The corresponding bits set to 1 and 0 would lead to the original message.

 -====================]
Why this works 

The encryption of the message uses a trapdoor knapsack, the solution of which is NP-hard(its not a decision problem)[3]. The decryption, which requires knowledge of a superincreasing sequence can be easily performed.
The trapdoor knapsack is created from the superincreasing sequence, and the encrypted message is the sum of a subset of the trapdoor knapsack.
Knowledge of the private key lets you deduce the message by transforming the sum of the subset of the trapdoor knapsack into the sum of the subset of a trivial knapsack. q is chosen to be a value greater than the sum of all elements so that no two plaintext messages sum up to the same "c". "r" is chosen to be co-prime to "q" so that an r-inverse exists(which is required for decryption).[4] A PoC implementation using sage can be found here[5]
 
[3] http://en.wikipedia.org/wiki/Karp%27s_21_NP-complete_problems
[4] http://en.wikipedia.org/wiki/Merkle-Hellman_knapsack_cryptosystem
[5] https://gist.github.com/eQu1NoX/dfa27a9d0c4a1651150a

Wednesday, May 8, 2013

Self solving keygens for basic crackmes ?

The capability of using SMT solvers to solve and reduce equations makes me wonder if it would be possible to generate and solve equations generated by analyzing x86 asm instructions. Each basic block were taken an equation could be generated with the RHS as the value being compared against just before the jump of a basic block(an oversimplification I agree, I doubt that this will have any practical implications at all!). This is mainly meant to be used on a parsed tracelog rather than the disassembly/binary as such. I have a very basic PoC uploaded here that uses the Python API for the z3 SMT solver, I'll keep uploading newer gists as revisions.

Saturday, January 19, 2013

Authenticated Encryption using AES-GCM -- a 50,000 foot view

I recently read the NIST specification for GCM and thought I'd document a very high level view of the same down here. Please note that this post will not go into the details of each step but will try to give you very high level view of how the various functions fit together in order to provide Authenticated Encryption.

As always, the referenced NIST paper is listed out in the References section. The images used are from the same paper.

GCM or the Galois Counter Mode is a mode of operation for the AES block cipher that provides Authenticated Encryption.

-===================]
Authenticated Encryption
Its a mode of a block cipher which provides not just confidentiality but integrity/authenticity too. In other words, if you encrypted blob has been tampered with(intentionally or otherwise), the algorithm provides a means for its detection.

-=============]
GMAC and AAD
There is a specialization of the GCM mode, wherein the mode is used to ensure the integrity and authenticity of the data alone -- this mode is referred to as GMAC. The data whose integrity alone needs to be maintained is referred to as Additional Authenticated Data(AAD).

-=============]
Functions of GCM
The two functions in GCM are Authenticated Encryption and Authenticated Decryption. Like in the case of the CTR mode for AES, both of these functions are parallelizable.
Its very interesting to note that it would be possible to give two kinds on input to the AE function at the same time -- the plaintext and the AAD ; in that particular case, GCM would protect the confidentiality and the integrity of plaintext and the integrity of the AAD.

To summarize, the AE function would take in the plaintext and the AAD, encrypt the plaintext and compute an "Authentication Tag" using the ciphertext and the AAD . The AD function would take in the ciphertext, the AAD, the Authentication tag, recompute the tag, verify its integrity and proceed to decrypt the ciphertext.

-====]
The IV
For the rest of this document we will use the terms IV, nonce and counter as follows :-
- The IV is the Initialization Vector which is created from the nonce and the counter value. You might be able to relate it to the IV in CTR mode where IV = nonce (concatenated with) counter. In GCM the IV creation is slightly different, but we'll look into it in more depth at a later stage.
- The Nonce is a value typically specified by the person doing the encryption. If you ever use an API to encrypt data(like BC for Java, Crypto++ for C++), chances are that you will be passing in this particular value(even though the API documentation might refer to this as "IV").
- The counter value is an internal counter that is incremented for every block. The initial state of the counter is generated by applying a function to the nonce value.

Now we will proceed to take a brief look at the different functions internal to GCM and its functionality :-

-==============]
Incrementing function
This is the function that is responsible for incrementing the counter value in GCM.

Given an input X and s, the function increments the rightmost s bits of the binary representation of X, modulo 2^s.

-============]
GHASH function

In the block diagram below, each column is a round of the GHASH function.

GHASH is a hash function which is not a cryptographically strong hash in itself but is responsible for integrity in GCM. It takes an input bit string of a length which is a multiple of 128 and outputs a GHASH block. The output of one round of the GHASH function is XOR'd with the input of the next round of GHASH and so on. The first GHASH round takes has the input XOR'd with a zero block of size 128(a bit string of 128 0's).


-===========]
GCTR function

For those of your who are familiar with the CTR mode of AES, you might find the block diagram below familiar :-



As shown in the above diagram, the Initial Counter Block(ICB) which is the IV, is passed through the Cipher function, and the result is xored against our input partitioned into 128 bit blocks. Of course the rightmost Xn block might not be a multiple of 128 bits and might be smaller than that. The output is the concatenated output ( Y1 || Y2 .... || Yn ). The values CB2, CB3... CBn are generated by applying the increment function on CB1, CB2 ... CBn-1 respectively.

-==========]
CIPH function
In this case we refer to the CIPH function as the underlying encryption algorithm. Here we are discussing AES, so CIPH refers to AES.

-======================]
Tying it all together to Encrypt
The following block diagram demonstrates how Authenticated Encryption happens :-


The following steps take place when the AE function is invoked :-

  1. You create an initial block "H"(which is then used as the hash subkey) but passing a zero-block of size 128 through CIPH
  2. Calculate a block J0 based on the IV value and size
  3. Compute a block C such that C = GCTRk(inc32(J0), P)
  4. At this point we know the ciphertext C and the AAD. We calculate values "u" and "v" based on the lengths of C and AAD which can be used for padding later on
  5. Steps from here are about computing the Authentication Tag. A block S is calculated by concatenating AAD, C, the length of C, the length of AAD and a padding with 0's. The padding length would be u + v.
  6. The S block is then encrypted using GCTR and the most significant "t" bits of the result form the Authentication Tag. i.e. T = MSBt(GCTRk(J0, S)). Please note that this is the only part of the whole process where GCTR is used to encrypt with J0.
  7. Return (C, T)
The bit length of the tag may be one of the following 5 values :- 128, 120, 112, 104 or 96. 


-=======================]
Tying it all together to Decrypt
The following block diagram shows how Authenticated Decryption works :-

The following steps take place when AD is invoked :-

  1. Check if the length of the IV, C and AAD are all supported by the implementation. Else FAIL.
  2. You create an initial block "H"(which is then used as the hash subkey) but passing a zero-block of size 128 through CIPH
  3. Calculate a block J0 based on the IV value and size
  4. Compute a block P such that P = GCTRk(inc32(J0), C)
  5. At this point we know the ciphertext C and the AAD. We calculate values "u" and "v" based on the lengths of C and AAD which can be used for padding later on
  6. Steps from here are about recomputing the Authentication Tag. A block S is calculated by concatenating AAD, C, the length of C, the length of AAD and a padding with 0's. The padding length would be u + v.
  7. The S block is then encrypted using GCTR and the most significant "t" bits of the result form the Authentication Tag. i.e. T' = MSBt(GCTRk(J0, S)). Please note that this is the only part of the whole process where GCTR is used to encrypt with J0.
  8. If T = T', return (P). Else FAIL


Reference:
1. NIST Recommendation for Block Cipher Modes Of Operation: Galois/Counter Mode(GCM) and GMAC

Sunday, January 13, 2013

A Quick High-level look at RSA


Todays post is my attempt at reproducing my understanding of how the Math in RSA works.

I've mentioned references at the bottom of the email -- hopefully those links will help you dig deeper and provide you the clarity this post might not be able to provide.

Well, lets get started. I'm going to assume that the reader already know what PKI is, and that public keys and private keys are used to encrypt and decrypt respectively and that private keys are used for signing.

Most tutorials start off with a little bit of Math, and this is for good reason, I assure you. Also, whenever a theorem is mentioned I strongly urge you to play around with Sage, and check if the theorem does hold true(which is probably the next best thing to reading the proof of the theorem itself).

-==================]
The Modulus operation

As computer programmers we all know what the following does :-

                       a = b % c;

NOTE:
=> From this point onward "=" refers to the "congruence operator" and not the "equality operator".

When we talk about Modular arithmetic in Math, we would write the above statement as :-

                        b = a (mod c)

-=============================]
Primes and Relatively-Prime numbers

We all know what primes are, so I'm not even going to bother with it. Relatively Prime numbers are the ones that have a gcd of 1.

gcd (a, b) = 1, iff a and b are relatively prime.

If you give it a bit of thought you realize that 1 is relatively prime to every number out there.


-======================]
Extended Euclidean Theorem

Again you've probably heard of this, but I'll just mention it anyhow. With Extended Euclidean Theorem, you can calculate values x and y such that,

ax + by = 1,

when gcd(a, b) = 1

It isn't hard to do this on pen and paper for small values -- I also urge you to check out the sage function "xgcd" which performs this computation for you.

-=====================]
The notation "Zn" and "Zn*"

Zn has different meanings in different contexts -- however in the context of Cryptography, Zn is the set of all numbers from 0 to n-1.

Zn* is the a subset of Zn in which all the numbers are relatively prime to n

So far, so good huh?

-=====================]
When I say....

Later in this post I will say stuff like "is 1 in Zn".

What I mean when I say that is :-

a = 1 mod n "


-====================]
Fermat's Theorem

Fermat came up with a ton of theorems -- one of them stated the following :-

" For any x in Zp* such that p is prime, the value x^(p-1) is 1 in Zp ".

In other words,

x ^ (p-1) = 1 mod p "


Why this is relevant will hopefully become more clear in the next part.

-====================]
Euler Totient Function

Simply put, its the size of Zn*.

In other words, it is the count of the numbers in the range 0 to n-1 that are relatively prime to n.

-====================]
Euler's Theorem

About a hundred years later, Euler came with an extension of Fermat's theorem. Fermat's theorem only applied to primes, but Euler extended it to composite numbers too.

" For a number x in Zn*, the value x^(phi(n)) is 1 in Zn "

In other words,

x ^ (phi(n)) = 1 mod n "

Even better, the theorem suggests a way to compute phi(n) for certain values of n.

If n = p * q such that and are primes, you can compute n as :-

phi(n) = phi(p) * phi(q) = (p-1)(q-1)"

Please note that if n is sufficiently huge and if its prime factorization cannot be computed, phi(n) also cant be computed.

-====================]
Steps of RSA

About time, huh?

Step 1. Choose two primes p and and compute n = p * q
Step 2. Choose a value e in Zn*
Step 3. Using Euler's Theorem, compute d such that 

" de = 1 ( mod phi(n) ) "

At this point, our public key is (n, e) and our private key is (p, q, d).

Step 4. Encryption is done as 

c = m^e (mod n) "

Step 5. Decryption is done as

d = c^d (mod n) "

-====================]
A short note on Step 3

At step 3 we have the equation :-

" de = 1 ( mod phi(n) ) "

Please note that this can be re-written as :-

d.e - k.phi(n) = 1

At this point, the value of d is solved using the Extended Euclidean Theorem.



-====================]
References :-
- "Number Theory and the RSA Cryptosystem" - Minh Van Nguyen
- Stanford Coursera course on Cryptography(10-1, 10-2) - Dan Boneh

Friday, January 11, 2013

PoCS : Synfloods and SynCookies

This is the first part of hopefully quite a few posts to come in the "Papers in Computer Security" series taken from [1]. Today we will have a look at "Syn Cookies"[2], [3].

A SynFlood is a kind of DoS attack where you create try to get a server to open more "half-open" TCP connections than it can handle, eventually reaching a state where legitimate requests to the server dont get served. A half-connection happens when a client sends over a SYN, the server responds with a SYN-ACK and waits for a ACK back from the client, but does not receive it.

A classic pattern that can be noticed in DoS attacks is that it costs nothing for a client to send out a new connection request but the server has to spawn a new thread/process to handle each request -- there is a certain asymmetry in that scenario.

A solution to this problem would be to recognize an ACK to belong to the same TCP stream as the SYN that came initially but not have a thread waiting for the ACK at the same time. One of the solutions would be to use SYN Cookies.

A SYN Cookie is a choice of a TCP sequence number.

First a quick review of how TCP sequence numbers work in a TCP handshake :-

Client -> Server : Syn(n)
Server -> Client : Syn(n+1), Ack(m)
Client -> Server : Ack(m+1)

"n" and "m" are the sequence numbers chosen at random by the client and the server respectively. Also, please note that an ACK(m) typically means "I have received all packets upto m-1, send me the packet corresponding to m". In this case, it doesn't really apply however.

When using SYN cookies, the sequence number "m" is chosen such that :-

  • The top 5 bits are a counter mod 2^5, which increments every 64 seconds
  • The next 3 bits is an encoding of the MSS selected by the server in response to the clients MSS
  • The last 24 bits is a secret function of t, client IP and port, server IP and port
Now when the server receives (m+1), it recomputes "m" using a recent value of "t"(I'm not very sure how "t" is kept track of) and it is compared with the sequence number in the ACK it just received.

In order to allow SYN Cookies in Linux you'd have to do :-

echo 1 > /proc/sys/net/ipv4/tcp_syncookies



Now when an attacker spoofs the source address and sends over a SYN when the server has a full SYN queue, the server does not drop the connection -- instead it operates as it would have had there been additional space in the SYN queue and sends over an ACK with a sequence number computed as mentioned above.

Its of utmost importance however, that attackers be unable to predict "m".

Also, its quite obvious that this does not solve the problem of DDoS attack caused by say, a botnet as full connections are established with clients in that case.


[1] https://www.cs.utexas.edu/~shmat/courses/cs380s/syllabus.html
[2] http://cr.yp.to/syncookies.html
[3] https://www.cs.utexas.edu/~shmat/courses/cs380s/syncookies.pdf