Hyperledger Fabric: Transitioning from Development to Production

You've set up your development environment. Designed your chaincode. Set up the client. Made a decent-looking UI as well. Everything works fine locally. All of your tests pass, and all the services are up and running.

All this, and you still aren't satisfied. You want to do more. You want to scale the network. Simulate a production-level environment. Maybe you even want to deploy to production. You give it an attempt. Add multiple orderers. Add more organizations with their own peers and CAs. You try testing it with multiple machines.

This is where everything you've done fails, no matter what you try. You somehow debug everything and come up with some nifty hacks you're not entirely proud of. Some parts work, some don't. You wind up the day, still not satisfied.

I don't know if you can relate to this. I went through this phase, racking my brain over multiple cups of coffee, trying to clear my head and restarting the debugging from scratch. Multiple facepalms and "ah, this is how this works" moments later, I decided to write this article and be your friendly neighbourhood developer-man (sorry).

The First Steps

Fabric provides two ordering service implementations for the network - Solo, and Kafka-Zookeeper. If you've only been working with Solo mode (configurable in configtx.yaml), this is where you change it. When the network is shared among more than 3 peers, it makes sense to have multiple orderers, so that it survives when one particular, cursed node goes down.

Now by design, orderers are like the postmen in a Fabric network. They collect the transactions, order them into blocks, and deliver those blocks to the peers. Solo mode runs a single ordering node, so if that node goes down, the entire network goes down. So here, the trick is to use a Kafka-based ordering service. I've comprehensively explained how it works in Fabric here.

Fabric docs provide some great best practices to follow.

So here goes our first step - Solo to Kafka. Don't forget to specify the brokers in your transaction config files. A sample has been provided below.

configtx.yaml

---
Orderer: &OrdererDefaults
  OrdererType: kafka
  Addresses:
    - orderer0.example.com:7050
    - orderer1.example.com:7050
    - orderer2.example.com:7050
    - orderer3.example.com:7050
  BatchTimeout: 2s
  BatchSize:
    MaxMessageCount: 10
    AbsoluteMaxBytes: 99 MB
    PreferredMaxBytes: 512 KB
 
  Kafka:
    Brokers:
      - kafka0:9092
      - kafka1:9092
      - kafka2:9092
      - kafka3:9092

Now that we have our orderers' dependencies resolved, let's dive into some network level tips.

A Swarm of Services

All the images that we use for Hyperledger Fabric are Docker images, and the services that we deploy are dockerized. To deploy to production, we have two choices. Well, we have a lot of choices, but I'll only describe the ones that might just not get you fired. For now.

The two choices are Kubernetes and Docker Swarm. I decided to stick with Docker Swarm because I didn't really like the hacky docker-in-docker setup for the former. More on this here.

What's a swarm?

From the Docker docs,

"A swarm consists of multiple Docker hosts which run in swarm mode and act as managers (to manage membership and delegation) and workers (which run swarm services). A given Docker host can be a manager, a worker, or perform both roles".

It is a cluster management and orchestration feature built into Docker Engine since version 1.12.

So consider this - the blockchain administrator can have access to the manager nodes, and each potential organization can be a designated worker node. This way, a worker node will only have access to the allowed resources, and nothing gets in the way. Swarm managers use the Raft consensus algorithm to agree on the state of the cluster, and managers and workers together accommodate replication of services, thus providing crash tolerance.
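As a rough sketch of that layout, the functions below wrap the standard Swarm bootstrap commands. The function names and the idea of passing the manager's address as an argument are my own illustration, not something Fabric or Docker prescribes. Port 2377 is Docker's default cluster management port.

```shell
#!/bin/sh
# Sketch only: init_manager and join_as_worker are hypothetical wrappers
# around real Docker CLI commands.

# Run on the administrator's machine: make it the first manager and
# print the token that worker nodes need in order to join.
init_manager() {
  manager_ip=$1
  docker swarm init --advertise-addr "$manager_ip" >/dev/null || return 1
  docker swarm join-token -q worker
}

# Run on each organization's machine: join the swarm as a worker.
join_as_worker() {
  token=$1
  manager_ip=$2
  docker swarm join --token "$token" "$manager_ip:2377"
}
```

The administrator would run init_manager with the manager's IP address, hand the printed token to each organization, and each organization would then run join_as_worker with that token and the same address.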

DNS - Do Not Sforget_to_give_hostname

Not giving hostnames to your services is a rookie mistake. Every service needs a hostname, because this is how the other services in the Swarm resolve it to an IP address. Zookeeper ensembles rely on this extensively.

Consider 3 Zookeeper instances. They depend on each other for synchronization. When these are deployed to a swarm, the Swarm assigns each one an internal IP address, which is pretty dynamic, so the instances have to refer to each other by name.

---
zookeeper0:
  hostname: zookeeper0.example.com
  image: hyperledger/fabric-zookeeper
  environment:
    - ZOO_SERVERS=server.1=zookeeper0:2888:3888 server.2=zookeeper1:2888:3888 server.3=zookeeper2:2888:3888

Another Zookeeper instance that would depend on this,

---
zookeeper1:
  hostname: zookeeper1.example.com
  environment:
    - ZOO_SERVERS=server.1=zookeeper0:2888:3888 server.2=zookeeper1:2888:3888 server.3=zookeeper2:2888:3888

Restrictions and Constraints

Usually, inside a Swarm, the services are distributed to any available node by default, and Swarm spreads them across the cluster. When a node goes down, the managers can redistribute its services to the remaining nodes in the swarm.

Now there are some very important services that we would like to keep afloat, like the Kafka-Zookeeper ensemble, since they synchronize transactions among all the orderers. Hence, what we would like to do here is make sure that we don't suffer any downtime. There also may be a service that holds certificates, and it is important that the certificates don't leak in the Swarm network. Therefore, we need restrictions on stack deployment in the Swarm.

We can constrain the services deployed to nodes in the swarm. A simple example is shown below.

---
services:
  peer0:
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
      placement:
        constraints:
          - node.role == manager
          - node.hostname == Skcript

Note that the deploy section defines the crash handling as well (the restart_policy), so use it to your benefit. Once each service is pinned to the node that holds its certificates, almost every authentication issue can be resolved. You might observe that this defeats the purpose of Docker Swarm, which is supposed to keep the services running wherever it can. If that is a concern, you'll have to spend more on another node that can take over during downtime, and extend your constraints to include it.
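To write constraints like the ones above, you need the exact hostname and role of each node as the Swarm sees them. The sketch below prints them from a manager node. The function name is hypothetical, while the commands and template fields are the standard Docker CLI ones.

```shell
#!/bin/sh
# Sketch: print "<hostname> <role>" for every node. Run it on a manager.
# list_node_roles is a hypothetical helper around real Docker CLI commands.
list_node_roles() {
  docker node ls -q | while read -r node_id; do
    docker node inspect \
      --format '{{.Description.Hostname}} {{.Spec.Role}}' "$node_id"
  done
}
```

The values it prints are the ones to use on the right-hand side of node.hostname and node.role.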

Protecting Certificates and Crypto-material

Certificates are not supposed to be disclosed to other entities in a network. The easiest thing to do is to supply the certificates of a particular organization only to the node hosting it, and install them to a location that works both on a plain Linux machine and as a mount inside a Docker container, like /var/<network-name>/certs. This way, you mount only the required volumes. Speaking of volumes, be sure to use absolute paths when mounting. You can only deploy services to a Docker Swarm from a manager node, but the volume paths are resolved on the node that actually runs the service. Hence that node needs to have the certificates at the expected location, else the service will shut down.

An example:

docker-compose-peer.yml

---
volumes:
  - /var/run/:/host/var/run/
  - /var/network/certs/crypto-config/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/msp:/var/hyperledger/msp

The /var/network/certs/ directory should be copied to the worker node that will host the service before deploying it.
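Since a missing host directory is an easy way to end up with a service that keeps shutting down, a small pre-flight check on each worker can help. This is only a sketch: the function name check_cert_paths is hypothetical, and it checks nothing beyond the directories existing at absolute paths.

```shell
#!/bin/sh
# Sketch: verify that every host directory a service will mount exists on THIS node.
# check_cert_paths is a hypothetical helper, not part of Docker or Fabric.
check_cert_paths() {
  missing=0
  for path in "$@"; do
    case $path in
      /*) ;;   # absolute path, as recommended above
      *) echo "not an absolute path: $path" >&2; missing=1; continue ;;
    esac
    if [ ! -d "$path" ]; then
      echo "missing on $(uname -n): $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it on the worker with the host side of each volume mapping, for example check_cert_paths /var/network/certs/crypto-config, and deploy only if it succeeds.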

I don't recommend serving certificates from a standalone service floating in the swarm that anyone can access.

Naming the services

docker stack deploy doesn't allow certain characters, such as dots, in service names. If you worked directly from the tutorials, you would have service names like peer0.org1.example.com and orderer0.example.com.

So it is better to name them peer0_org1, orderer0, and so on.
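If you have many tutorial-style names to convert, the renaming can be scripted. The helper below is a minimal sketch of one possible convention (strip the domain, turn the remaining dots into underscores). The function name is hypothetical and the domain is passed in explicitly.

```shell
#!/bin/sh
# Sketch: derive a Swarm-friendly service name from a tutorial-style hostname.
# swarm_safe_name is a hypothetical helper.
swarm_safe_name() {
  fqdn=$1
  domain=$2
  short=${fqdn%".$domain"}              # peer0.org1.example.com -> peer0.org1
  printf '%s\n' "$short" | tr '.' '_'   # peer0.org1 -> peer0_org1
}
```

For example, swarm_safe_name peer0.org1.example.com example.com prints peer0_org1, and swarm_safe_name orderer0.example.com example.com prints orderer0.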

Fetching the names

Docker Swarm prefixes the stack name to the service name, and suffixes the container name with the replica number and a generated task ID. Hence, to execute any commands, we need the name of the container, given to it by the Swarm. So for example, if you've named your service peer0_org1, and the stack you've deployed it to is deadpool, the name that swarm will give it would look like deadpool_peer0_org1.1.sa213adsdaa…..

You can fetch its name by a simple command, run on the node that hosts the service,

docker ps --format="{{.Names}}" | grep peer0_org1
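A plain grep for peer0_org1 would also match a service called peer0_org10, or the same service in another stack. The sketch below is a slightly stricter variant that matches on the full stack and service prefix. The function name pick_container is hypothetical, and it reads the container names from standard input.

```shell
#!/bin/sh
# Sketch: read container names on stdin and print the first one that belongs
# to the given stack and service. pick_container is a hypothetical helper.
pick_container() {
  stack=$1
  service=$2
  # Match on the exact "<stack>_<service>." prefix so that peer0_org1
  # does not accidentally match peer0_org10.
  awk -v prefix="${stack}_${service}." 'index($0, prefix) == 1 { print; exit }'
}
```

It can be fed from the same docker ps command: docker ps --format="{{.Names}}" | pick_container deadpool peer0_org1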

PRO TIP: Environment variables are your best friends. Do have a dedicated .env for all your scripts.
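One possible way to share a .env file between scripts is to source it with auto-export switched on. The sketch below assumes the file contains plain shell assignments such as ORG1_CHANNEL_NAME=mychannel. The function name load_env is hypothetical.

```shell
#!/bin/sh
# Sketch: load KEY=value pairs from an env file and export them to child processes.
# load_env is a hypothetical helper; the file must contain valid shell assignments.
load_env() {
  env_file=${1:-.env}
  case $env_file in
    */*) ;;                      # already has a directory component
    *) env_file=./$env_file ;;   # avoid a PATH lookup by the "." builtin
  esac
  if [ ! -f "$env_file" ]; then
    echo "env file not found: $env_file" >&2
    return 1
  fi
  set -a   # auto-export every variable assigned from here on
  . "$env_file"
  set +a
}
```

Calling load_env at the top of each script keeps names like $ORG1_PEER_NAME and $ORG1_CHANNEL_NAME in one place.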

Channels and Chaincodes

When you have multiple organizations, and you want custom channels to run amongst these, and install different smart contracts on each channel, this is how it should work.

Channel creation has to be done based on what is defined in the configtx.yaml. You create the channel, and join the respective peers. A sample channel creation command looks like this,

docker exec -e "CORE_PEER_LOCALMSPID=Org1MSP" -e "CORE_PEER_MSPCONFIGPATH=/var/hyperledger/users/Admin@org1.example.com/msp" "$ORG1_PEER_NAME" peer channel create -o "$ORG1_ORDERER_NAME":7050 -c "$ORG1_CHANNEL_NAME" -f "$ORG1_CHANNEL_TX_LOCATION"

Now to join the channel from a separate organization, its peer first has to fetch the channel's genesis block from the orderer. Note that 0 here means we are fetching the 0th block.

# fetch the channel's genesis block (block 0) into the joining peer's container
docker exec -e "CORE_PEER_LOCALMSPID=CHMSP" -e "CORE_PEER_MSPCONFIGPATH=/var/hyperledger/users/Admin@ch.example.com/msp" "$CH_PEER_NAME" peer channel fetch 0 -o "$ORDERER_NAME":7050 -c "$CHANNEL_NAME"
 
# join the channel using the fetched block (saved as <channel>_0.block by default)
docker exec -e "CORE_PEER_LOCALMSPID=CHMSP" -e "CORE_PEER_MSPCONFIGPATH=/var/hyperledger/users/Admin@ch.example.com/msp" "$CH_PEER_NAME" peer channel join -b "${CHANNEL_NAME}_0.block"
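The fetch and the join can be wrapped in one helper so that both steps run in the same container with the same admin identity, which keeps the fetched block file where the join expects it. This is a sketch under assumptions: the function name join_channel and its argument order are hypothetical, and port 7050 is taken from the commands above.

```shell
#!/bin/sh
# Sketch: fetch block 0 and join the channel from the same peer container,
# so the fetched block file is where "peer channel join" looks for it.
# join_channel and its argument order are hypothetical.
join_channel() {
  msp_id=$1      # e.g. CHMSP
  admin_msp=$2   # admin MSP directory inside the container
  container=$3   # container name as reported by docker ps
  orderer=$4     # orderer host name, without the port
  channel=$5

  docker exec \
    -e "CORE_PEER_LOCALMSPID=$msp_id" \
    -e "CORE_PEER_MSPCONFIGPATH=$admin_msp" \
    "$container" peer channel fetch 0 -o "$orderer:7050" -c "$channel" || return 1

  docker exec \
    -e "CORE_PEER_LOCALMSPID=$msp_id" \
    -e "CORE_PEER_MSPCONFIGPATH=$admin_msp" \
    "$container" peer channel join -b "${channel}_0.block"
}
```

A call would look like join_channel CHMSP /var/hyperledger/users/Admin@ch.example.com/msp "$CH_PEER_NAME" "$ORDERER_NAME" "$CHANNEL_NAME". If the fetch fails, the join is not attempted.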

Chaincode deployment is similar. However, it is interesting to note that you have to instantiate the chaincode only once per channel, from any one peer. On the other peers, you will have to manually install the chaincode, but instantiating is not required. An invoke command then runs against the most recently instantiated version of the chaincode and executes the transaction.
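The install-everywhere, instantiate-once pattern can be sketched as a loop. The example assumes the install and instantiate commands of the Fabric 1.x peer CLI, and that all listed peers belong to one organization. The function name deploy_chaincode and the variables MSP_ID and ADMIN_MSP are hypothetical, and the empty Args list is only a placeholder for whatever your chaincode's Init expects.

```shell
#!/bin/sh
# Sketch: install a chaincode on every listed peer, then instantiate it once.
# deploy_chaincode, MSP_ID and ADMIN_MSP are hypothetical names; ORDERER_NAME
# and CHANNEL_NAME are expected in the environment, as in the commands above.
deploy_chaincode() {
  cc_name=$1
  cc_version=$2
  cc_path=$3
  shift 3
  first_peer=$1   # the one peer that sends the instantiate request

  for container in "$@"; do
    docker exec \
      -e "CORE_PEER_LOCALMSPID=$MSP_ID" \
      -e "CORE_PEER_MSPCONFIGPATH=$ADMIN_MSP" \
      "$container" peer chaincode install \
      -n "$cc_name" -v "$cc_version" -p "$cc_path" || return 1
  done

  # Init arguments depend on the chaincode; an empty list is only a placeholder.
  docker exec \
    -e "CORE_PEER_LOCALMSPID=$MSP_ID" \
    -e "CORE_PEER_MSPCONFIGPATH=$ADMIN_MSP" \
    "$first_peer" peer chaincode instantiate \
    -o "$ORDERER_NAME:7050" -C "$CHANNEL_NAME" \
    -n "$cc_name" -v "$cc_version" -c '{"Args":[]}'
}
```

The remaining arguments after the chaincode path are the container names of the peers. The first one listed is the peer that instantiates.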

Winding up

So if you're able to invoke and query successfully, you should be good to go.

Document your configuration well, and create some scripts that save time, like ones that ssh into the nodes and execute the above commands from a manager node itself.

Attach your client to the services above, and enjoy your production-level Fabric network setup. Try deploying your network to the IBM Cloud, or AWS.

If you have any queries or suggestions, do comment below.