How Bitcoin Works

We build a blockchain pretty much.

Why blockchain?

Before we can look into how blockchain works, we must first understand what problem it’s trying to solve.

Blockchain is a distributed ledger that ensures trust between a number of concurrent users who don’t trust each other, or a third party ledger.

How does a blockchain work?

There are several different variations of blockchain protocols, for this I’ll be using a very simple protocol; way to simple to be deployed, but useful for education.

A blockchain data structure is essentially a linked list of blocks.

Each block contains a number of transactions, the hash of the previous block as well as a proof of work.

About the proof of work.

The proof of work is what is ensuring the integrity of the blockchain. In order to append a block to the chain, you must have found an integer $p^*$ such that the proof of the previous block $p$ concatenated with $p^*$ has a number of leading zeros.

Exactly how many zeros depends on the difficulty scale of the blockchain. On production level blockchains, the difficulty automatically adjusts depending on the number of users who are trying to find a suitable proof, or ‘mine’ blocks.

Okay, that’s fine, but what happens when two nodes conflict on what blocks should go into the chain?

If there’s a conflict about which blocks should go in the chain, and the nodes end up with different chains, the node with the longest chain is the authoritative one.

And that’s all there’s to it.

Implementing a blockchain

What we will be implementing:

Core data structures

Proof of work Simple networking Consensus algorithm As there’s about 300 lines of code, I won’t be showing everything here; just the important bits. You can see all the code on Github.

Core data structures

The main object we’re going to interact with is the Blockchain class which is going to contain the chain itself as well as the uncommitted transactions which will be baked into the next block.

class Blockchain():
  def __init__(self):
    self.current_transactions = []
    self.chain = []

    # genesis block
    self.new_block(previous_hash=1, proof=100)

We also have a genesis block. The reason we have that is so that when we add our first real block, we can reference the hash of the previous block which would be the genesis block.

The attributes of the genesis block aren’t important. The only thing that matters is that it’s a valid block.

Creating a new block.

class Blockchain():
  ...
  def new_block(self, proof, previous_hash=None):
    block = {
      'transactions': self.current_transactions,
      'proof': proof,
      'previous_hash': previous_hash or self.hash(self.chain[-1]),
    }

    # Reset the current list of transactions
    self.current_transactions = []
    self.chain.append(block)

Adding a new block is relatively simple, but there are a few things we need for it to work properly.

Firstly, we need to embed the transactions which is what we actually care about.

Secondly, we need the proof that this block is valid; that is the integer p^*p∗ that makes the hash of the previous proof concatenated with the current proof have a certain number of leading zeros.

Thirdly, we need to include the previous hash. This is so we can check if the chain has been corrupted later on.

As a bit of cleanup, we need to empty the transactions cache so that we don’t commit the same transactions in multiple blocks.

Adding transactions to future blocks is also really easy:

class Blockchain():
  ...
  def new_transaction(self, payload):
    self.current_transactions.append(payload)

Where the payload is whatever content you want to store in your blockchain; whether it is actual financial transactions, posts on a social network, or pledges to sell one’s soul; it doesn’t matter.

Proof of work

Proof of work is what ensures that nobody tampers with the blockchain; atleast without having access to more than 50% of the resources on the network.

class Blockchain():
  ...
  def proof_of_work(self, last_proof):
    # use random guessing to enable concurrent operation
    proof = random.randint(1, 2**32)
    while self._valid_proof(last_proof, proof) is False:
      proof += 1
    return proof

  def _valid_proof(self, last_proof, proof, leading_zeros=4):
    guess = f'{last_proof}{proof}'.encode()
    guess_hash = hashlib.sha256(guess).hexdigest()
    return guess_hash[:leading_zeros] == "0"*leading_zeros

As stated previously, the goal of proof of work is to find an integer, proof, $p^*$ such that $\text{hash}(p\Vert p^*)$ has a certain number of leading zeros. Where $p$ is the proof of the previous block.

Since we’re using a very strong hashing algorithm; sha256, it’s difficult to calculate a $p^*$ such that the hash starts with a number of zeros.

In fact, a viable approach is just to try a bunch of different numbers and hope you will get one right eventually.

We can do this very simply by incrementing $p*$ due to avalanche property of hash algorithms which causes the output to be wildly different even when there’s just a small change in the input such as incrementing a number.

When we eventually will have multiple nodes working on finding the same $p*$ it’d be silly to have them all start at zero and doing redundant work by checking solutions which are already known not to work.

To combat this, we start the process at a random number between $1$ and $2^{32}$ which is the upper limit for a 32 bit unsigned integer.

Networking

It’d not be very interesting to have a blockchain if there’s just one node contributing to it.

You have to be a special kind of paranoid if you don’t even trust yourself.

In order to make the blockchain work over the network with multiple nodes, we need a data structure to keep track of all the nodes on the network.

class Blockchain():
  def __init__(self):
    ...
    self.nodes = set()

For this, we simply use a set which solves the problem of duplicate nodes.

We don’t need anything fancy to manage nodes.

class Blockchain():
  ...
  def register_node(self, address):
    parsed_url = urlparse(address)
    self.nodes.add(parsed_url.netloc)
  ...
  @property
   def get_nodes(self):
    return self.nodes

As I said in the beginning, this is very simple networking.

In the Github repository, there’s also included a lightweight Flask wrapper which exposes an api, so the blockchain is actually (barely) usable.

Consensus algorithm

The consensus algorithm ensures that the nodes agree on which blocks are committed to the chain.

We use a relatively simple algorithm where the longest chain is the authoritative one.

class Blockchain():
  ...
  def resolve_conflicts(self):
    new_chain = None

    # only look for chains longer than current chain
    current_length = len(self.chain)

    for node in self.nodes:
      # uses the Flask API available in the repository
      response = requests.get(f'http://{node}/chain')
      content = response.json()

      if response.status_code == 200:
        length = content['length']
        chain = content['chain']

        # check if other chain is longer than our chain
        if length > current_length:
          current_length = length
          new_chain = chain

      # replace chain if other chain is longer
      if new_chain:
        self.chain = new_chain
        return True
    return False

So the algorithm for resolving any conflicts is very simple; it just checks the chains of the other nodes on the network, and if one of their chains is longer than our own, we replace our own with the longer version.

And that’s really all there’s to it.

You now have the core of a functioning blockchain.

It may seem a bit daunting, but once you’ve gotten some time to digest it all, you will likely find that blockchains aren’t that complicated after all.

For the full implementation, check out the Github repository.