Tangible Computing
7. Diffie-Hellman Key Exchange

7.1 Talking Arduinos

We can get two Arduinos to talk to each other over their serial ports by connecting them together as per the diagram:

Note that the connections you want to make are:
Machine A TX1 ----> Machine B RX1
Machine A RX1 <---- Machine B TX1

In case one or more machines is a laptop, you may also need to connect
Machine A GND ----- Machine B GND
The reason for connecting the two grounds together when you have a laptop involved is that the the laptop has its own ground that is typically isolated from the ground of the power supply. The two machines need to share a common reference point for 0 Volts, and that is the ground.

Warning: when connecting two arduinos that are attached to desktops together, you typically do not connect their grounds together. They already have a ground established by the USB ports, and so the boards have ground potentials already established. But, if the desktops are plugged into different circuits, it is possible for them to have different ground potentials (although they should not). If you then connect their grounds together you can have a current running between them - not a good thing. After connecting the serial ports together, you can use this code to exchange messages. Characters typed in one serial monitor appear in the other, and vice-versa.


    void setup() {
      // initialize both serial ports:
    void loop() {
      // read from serial port 1, and send to serial monitor on port 0
      if (Serial1.available()) {
        // returns one byte of unsigned serial data, or -1 if none available
        int16_t inByte = Serial1.read();
      // read from serial monitor on port 0, and send to port 1
      if (Serial.available()) {
        // returns one byte of unsigned serial data, or -1 if none available
        int16_t inByte = Serial.read();

If we take a thrid Arduio (say called Eve) running the same program, and attach only its RX line to one of the TX lines between the two talking Alice and Bob Arduinos we can evesdrop on their conversation.

So in general it's a good idea to prevent Eve from understanding our conversation even if they manage to connect up to our transmission lines. We can implement a simple encryption scheme that uses modular arithmetic.

Remember that in C/C++, the % is the mod operator, where a % b is the remainder after dividing a evenly by b, provided that a is non-negative, and b is positive. If one or both of a and b are negative, you need to determine how your % operator behaves or your modular arithmetic will be off.

Given an encryption key, we can take a single character message m and encrypt it to the single character m' as follows:
m' = (m + key) % 256
and later decrypt it by doing
m = (m' - key) % 256
This is not a very secure encryption algorithm!

7.2 Diffie-Hellman Key Exchange

Diffie-Hellman was one of the first algorithms for public key distribution, invented in 1976. Alice wants to send a message to Bob, but they have not met in-person to share a secret key. If their communication is always insecure, then how can Alice send a message to Bob that only Bob can read?

For example, suppose Bob is gpu.srv.ualberta.ca and Alice is a student that wants to communicate her password to Bob without the ever-eavesdropping Eve snooping her password. Alice and Bob can do so with this process of secret key generation, all communicated publicly, yet Eve is unaware of their secret.

We now walk through the process:

  1. Agree on a prime and generator Alice and Bob agree on a large prime, p, and g such that g is a generator for the multiplicative group of integers modulo p. What this means is simply that
    g^0 mod p, g^1 mod p, g^2 mod p, ..., g^(p-1) mod p
    are all of the unique integers in the range 1, ..., p-1$. The pair (p,g) can be made public or even shared by a group of users.

    We will choose fixed values: p = 37, g = 5.

    Question: how do you know that g is a generator for the modulus p?

  2. Choose secret personal keys Alice and Bob each choose secret random keys, say a and b, in the range 1 to (p-1).

  3. Compute the index of the secret Each computes the index of their secret,
    A = g^a mod p B = g^b mod p

  4. Exchange indexes publically Alice sends A to Bob, and Bob sends B to Alice.

  5. Compute the shared sceret Alice then computes k = B^a mod p, while Bob computed k = A^b mod p.

    Both have computed the same key, k, so now they have a shared secred. But Eve has only seen (p,g) A and B. These are insufficient to compute k. Eve needs to be able to compute the discrete log modulo p. That is, given X, Eve needs to be able to compute the y such that X = g^y mod p. This is considered to be difficult for large primes p.

  6. Used the shared secret for further communication. At this point the shared key k can be used as the key to some kind of encryption function.

7.3 Implementing Diffie-Hellman on the Arduino

First, we need to get some prime-generator pairs that are the appropriate size for the arithmetic we can perform. A quick call to our local Primes-R-Us gives us this:

Num Bits p g factors of p-1
15 19211 6 2 x 5 x 17 x 113
31 2147483647
Mersenne 31
also 7
2 * 3^2 * 7 * 11 * 31 * 151 * 331

Let's trust these numbers for the moment, and assume they are suitable (p,g) pairs. There is a way to check if g is a generator when we have the factors of (p-1).

Let's implement simple Diffie-Hellman key exhange.
What are the things we need to compute?
How do we perform things like the mod operation when we have limited bit-length numbers?

All arithmetic that we do here is with the natural numbers. But it is also modular, and in two different ways. One is that we will be using modular arithmetic in the cryptographic computations, modulo our prime p. The other, a bit more subtle, is that because we are using numbers of a fixed number (say n) of bits, we are also doing modular arithmetic mod 2**n. We have to be careful about not overflowing our n-bit representations.

When faced with a complicated task like DH, it is best to break the larger task into smaller tasks. These smaller tasks ideally should be self-contained, solved independently, and easily tested on their own. How can we break down the DH procedure into smaller tasks? Once we have identified subproblems then we can solve these problems almost independently, and in any order.

If we can't do modular exponentiation then we are blocked. So we should work on that first.

7.4 Thinking like a Computer Scientist: Modular Exponentiation

We are ready to focus on our first subtask for DH: modular exponentiation. Recall that both encryption and decryption involves the following computation:
(b ** e) mod m
where b is the base, e is the exponent, and m is the modulus.

Key exchange raising the generator to an exponent and taking the result mod a prime. Other kinds of encryption, like RSA raise a message, M, to the encrypting exponent, E, modulo N. Decryption raises the cipher text, C, to the decrypting (private key) exponent, D, modulo N. Thus it makes sense to figure out how to do this in general for given b, e, and m.

When we see some step or computation that is self-contained and used repeatedly, you should immediately think of writing a procedure (or subroutine, function, method, etc). We have already seen procedures, but let's look at the parts of a procedure more closely.

7.5 Starter Program

Load the following program into the Arduino IDE. Note that it only involves communication over the serial port, additional input/output components needs to be hooked up.


    // global state variables
    void setup()
    void loop()
      uint32_t b, e, m;
      char line[128];
      b = readlong();
      Serial.print(" ");
      e = readlong();
      Serial.print(" ");
      m = readlong();
      Serial.print("answer = ");  
      Serial.println(pow_mod(b, e, m));
    // common functions and procedures
        Compute and return (b ** e) mod m
        For unsigned b, e, m and m > 0
    uint32_t pow_mod(uint32_t b, uint32_t e, uint32_t m)
      return 0;
        Read a sequence characters from the serial monitor and interpret
        them as a decimal integer.
        The characters can be leading and tailing blanks, a leading '-',
        and the digits 0-9.
        Return that number as a 32 bit int.
    int32_t readlong()
      char s[128]; /* 128 characters should be more than enough. */
      readline(s, 128);
      return atol(s);
        Read a \n terminated line from the serial monitor and store the result 
        in string s.
        String s has maximum size, including the terminating \0, of bufsize
        characters.  If the input line is longer than can be stored in s, then
        it is truncated.  bufsize must be at least 1.
        s will always be a properly terminted string.
    void readline(char *s, int bufsize)
      uint8_t i = 0;
      while( i < bufsize-1 ) {
        // wait for a character 
        while (Serial.available() == 0) { } /* Do nothing */
        // grab the character and save in the buffer
        s[i] = Serial.read();
        // if end of line or somehow a \0 got sent, we are done
        if (s[i] == '\n' || s[i] == '\0') break;
        i += 1;
      // \0 teminate the string
      s[i] = '\0';

If you look over loop() you will see that this program simply reads three unsigned long numbers from the Serial monitor, and then prints the result from calling the procedure pow_mod with these values. Currently, pow_mod does nothing.

7.6 Procedures

The SOS and Colour Intro Labs discussed procedures but let's review. The pow_mod program contains a number of procedures. By default, setup and loop are procedures that must be defined for all Arduino programs. Let's look in depth at the procedure pow_mod, as that is where we will be solving our first subtask.
uint32_t pow_mod(uint32_t b, uint32_t e, uint32_t m) {
     return 0;
Its definition has some key parts:
  1. First, procedures are always defined outside of any other procedure.
  2. Second, prior to the procedure name is the procedure's return type: the type of value returned by the procedure. If the procedure does something (e.g., turn on an LED) but does not compute a value, then it may have no return value, in which case its type is void.

    In this case the procedure's return type is uint32_t (unsigned long).

  3. Third, after the name, inside of the parentheses are the list of the procedure's arguments. Arguments, are values the procedure takes as input, which are used to control what the procedure does or what is returned. A procedure can take more than one argument, each separated by a comma. When a procedure is defined, each argument is defined by a type and a variable name. These variable names can then be used inside of the procedure to refer to the values passed to the procedure when it was called.

    pow_mod's arguments are three uint32_t inputs called b, e and m.

  4. Finally, the procedure definition contains a block of code (within { ... }) which is the code to be executed when the procedure is called. If the procedure has a return value (besides void) then the statement return RETURNVALUE; causes the procedure to finish with the value RETURNVALUE.

    Currently pow_mod doesn't execute any other code besides returning the value 0. We want this function to compute B to the exponent E modulo M, and return that value.

7.7 Algorithm Design

Now we get to be computer scientists! What is an algorithm or detailed procedure for doing modular exponentiation?

There are potentially many different such procedures. Let's start by thinking of the simplest possible one.
ulong pow_mod(ulong b, ulong e, ulong m) {
     ulong r = 1;
     for (ulong i=0; i<e; i++) {
         r = r * b % m;
     return r;
The notes here are purposefully blank. Think of a procedure for doing modular exponentiation.

(Actually, it's not blank. If you highlight the blank space you'll see some code for doing modular exponentiation.)

Implement your proposed procedure. Or the one above, after you are sure you understand it!

7.8 Algorithm Analysis

Computer science doesn't stop with the design of an algorithm. Rather, this is where computer science starts: algorithm analysis. Just having a potential solution isn't good enough. There are many questions we likely need to answer about any algorithm for solving a problem. Broadly speaking though, there are really just two questions.
  1. Correctness. Does it produce the desired output? Or more commonly, for what inputs does it produce the desired output?
  2. Efficiency / Complexity. How does it scale? How long does it take to execute (as a function of the inputs)? How much memory (or other resources) does it use?
We will never stop asking and answering these questions. Every time we design an algorithm these questions will be at the forefront. We will throughout the course develop fancier techniques for doing algorithm analysis, but let's just analyze our first algorithm intuitively.

Is our program correct? Or, for what inputs will the program be correct? How do you know? Is it correct when the input numbers are very small? What about very large? What's the range of input values for which your procedure will work?

How fast is the program? Does the program run faster or slower if the argument b gets large? What about e? Or m? How long would it take to answer 7 ** 123456789 mod 13? How can you figure that out?

Why do we care about the answers to these questions?

Let's start by investigating the speed of our algorithm empirically using the micros() timing function. We can time a portion of code using:
long time = micros();
... section of code to time ...
time = micros() - time;
Serial.print("time = ");
The function micros() returns the number of microseconds since the Arduino was turned on. By recording this in time we can subtract off the new value after the code executes to find the amount of elapsed time taken to execute the intervening lines of code. Use this formulation to answer the following questions on our exponentiation function.
  1. How does the size of b affect the computation time? Try small values of b, e, and m. Then double the value of b. Again. Again. Again. Does its size affect the running time?
  2. Do the same thing for e and m? Which affects the running time?
  3. Make a table of different values of e and the resulting running time. Have e range from 2 up to 8192. Plot the results on a graph.
  4. Figure out how long it would take to compute 7 ** 123456789 mod 13.
If you made a graph, you would see it's a line. So we say that the running time of this algorithm for pow_mod is linear in the value of e. What would the running time graph look like for b or m? In such a case, we say the running time is constant in these parameters.

We can just look at the original algorithm and come to the same conclusion that we did by running and timing it. Let's start by assuming that addition and multiplication (as well as subtraction, division, mod, bit-shifting, et cetera) takes constant time, regardless of the size of the numbers (at least for any values representable as an unsigned long). This is essentially true, in fact. Now for each line of code of the algorithm, consider: You can introduce new constants, such as T_mul or T_add or T_mod, to refer to the time it takes to do multiplication, addition, and mod (et cetera). Now the total running time can written as just the sum over every line of code of the number of times it is executed multiplied by its execution time. In the end we get one big formula.

Does the resulting function depend upon b or m? Does it depend upon e? What does the formula look like? If we gather terms, we can see it has the same form as the equation of a line. So we can conclude that our algorithm is linear in e without even running the algorithm!

How do we know the output of our program is correct? One answer, is to try running it on inputs where we know the answer and can check if it's correct. If not, we might have a bug: an error in our program. Or, our algorithm might be wrong, and it doesn't work for all inputs. As long as we know the subset of inputs for which it is correct (and if this subset contains the inputs we want to use it on), this may still be good enough.

What example inputs should we test it on? This is a hard question. If we want certainty, we should test it on every input we might ever need it to run on. However, this is rarely practical. We will usually accept a small number of test cases. We can still be smart about which test cases to choose. In particular, it is a good idea to test boundary cases. Boundary cases are the extremes of the input range where it is expected to be correct. Since, the input numbers for our function are non-negative integers, this means we should have test cases that test the answer when the inputs are near zero (e.g., 0, 1, and 2), our boundary on the input. We should also test a small number of general cases.

Since we cannot empirically test our function on all possible inputs, can we analytically prove that it is correct without having to execute it at all? Let's prove it's correct.

We can use a form of induction, called a Loop Invariant. A loop invariant is a statement about the relationships of the variables that is true at the beginning of each time through a loop. What is true about the relationship of r to b and e at the start of each iteration of the while loop?
r = (b ** i) % m
We can prove this by induction. First we start by proving it is true before the loop starts. i = 0, so (b ** i) % m = 1 = r. Now, prove that if the loop invariant holds at some point, that it will hold at the start of the next iteration. So r = (b ** i) % m. At the end of this iteration r = ((b ** i) * b) % m = (b ** (i + 1)) % m. But then i is incremented so r = (b ** i) % m, thus proving our loop invariant.

Now we can prove our algorithm is correct. When the loop terminates, our invariant will hold and in addition i = e (why?). This gives us r = (b ** i) % m = (b ** e) % m. So returning r returns the value (b ** e) % m.

7.9 A Better Algorithm

We clearly need a better algorithm. While there's no guarantee a faster algorithm will always exist, in our case one does. Take some time to think about it. Try considering the following hints:
  1. Hint #1: Suppose e was a power of 2. In other words, we could write e = 2**k for some value k. Is there a faster algorithm for pow_mod for just this case? If you find such an algorithm, think about the case where e is the addition of two powers of 2, i.e., e = 2**k + 2**j. Can we always write e as the addition of some number of powers of 2? Can we use this fact to construct a general algorithm?
  2. Hint #2: Consider v = b*b;. What is v? It is b**2. Now consider v = v*v;. What is v now? It is b**4. What if we do this k times? Now go back and read Hint #1 again.

7. Diffie-Hellman Key Exchange
Tangible Computing / Version 3.20 2013-03-25