Monday, February 10, 2020

Automated deployment of an HDFS cluster, a Kubernetes cluster, and Google Compute Engine instances using Terraform and Ansible in 20 minutes

Summary:

This post walks through the steps to deploy a Hadoop HDFS cluster, a Kubernetes cluster, and the underlying Google Compute Engine (GCE) instances using Terraform and Ansible. The whole deployment is automated and finishes in about 20 minutes.


Environment:
  • Google Compute Engine (not GKE)
  • Hadoop HDFS cluster running on the Kubernetes cluster
      • ZooKeeper: 3
      • NameNode: 2
      • DataNode: 3
      • JournalNode: 3
  • Kubernetes cluster running on GCE
      • Master node: 1
      • Worker nodes: 3
  • Jump host: CentOS 7 Vagrant VM running on a laptop
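The replica counts above are consistent with majority-quorum arithmetic. ZooKeeper and the HDFS JournalNodes each keep working only while a strict majority of their members is up, so three is the smallest ensemble that survives one node failure. The two NameNodes form an active/standby pair and do not need a quorum of their own. Here is a small sketch of that arithmetic, with the counts copied from the list above:

```python
# Majority-quorum arithmetic for the replicated services in the environment list.
# ZooKeeper and the HDFS Quorum Journal Manager (JournalNodes) both need a strict
# majority of their members to be available in order to keep operating.

ENSEMBLES = {
    "ZooKeeper": 3,
    "JournalNode": 3,
}


def quorum_size(members: int) -> int:
    """Smallest strict majority of `members` nodes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for name, members in ENSEMBLES.items():
        print(f"{name}: {members} members, quorum {quorum_size(members)}, "
              f"tolerates {tolerated_failures(members)} failure(s)")
    # An even count buys no extra fault tolerance: 4 members still tolerate only 1 failure.
    print("4 members tolerate", tolerated_failures(4), "failure(s)")
```

This is also why odd sizes are preferred. Going from 3 to 4 members adds a node without adding fault tolerance, and you need 5 members to survive two failures.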




Prerequisites:

  • Install Ansible, Terraform, and Git on the jump host. The versions used in this post are:

# ansible --version
ansible 2.9.3
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, Aug  7 2019, 00:51:29) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]


# terraform version
Terraform v0.12.20
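Before you clone the repository, you can confirm that the jump host has all three tools and that Terraform is on the 0.12 line used here. This post was run with Terraform v0.12.20 and Ansible 2.9.3, and other versions may need changes to the code. The preflight sketch below assumes a Python 3 interpreter on the jump host; the Ansible shown above runs on Python 2.7, so python3 may need to be installed separately. It also assumes each tool prints its version in the format shown above. The code sticks to Python 3.6-compatible calls, since that is what CentOS 7 packages.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Commands that print each tool's version (as used in this post).
VERSION_COMMANDS = {
    "ansible": ["ansible", "--version"],
    "terraform": ["terraform", "version"],
    "git": ["git", "--version"],
}

VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Return the first MAJOR.MINOR.PATCH triple found in `text`, or None."""
    match = VERSION_RE.search(text)
    return tuple(int(part) for part in match.groups()) if match else None


def tool_version(tool: str) -> Optional[Tuple[int, int, int]]:
    """Run the tool's version command; return None if the tool is not on PATH."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(VERSION_COMMANDS[tool],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return parse_version(result.stdout)


if __name__ == "__main__":
    for tool in VERSION_COMMANDS:
        version = tool_version(tool)
        shown = "not found" if version is None else ".".join(map(str, version))
        print(f"{tool}: {shown}")
    tf = tool_version("terraform")
    if tf is not None and tf[:2] != (0, 12):
        print("warning: this post used Terraform 0.12.x")
```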



  • Clone the Terraform IaC code and the Ansible playbooks from the Git repository:
  https://github.com/vdsridevops/terraform-k8s-gce-hdfs.git

# git clone https://github.com/vdsridevops/terraform-k8s-gce-hdfs.git
Cloning into 'terraform-k8s-gce-hdfs'...
remote: Enumerating objects: 56, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 56 (delta 6), reused 56 (delta 6), pack-reused 0
Unpacking objects: 100% (56/56), done.



Directory structure:

# tree
.
├── ansible_hosts
├── ansible_playbook
│   ├── ansible_hosts
│   ├── group_vars
│   │   └── env_variables
│   ├── hosts
│   ├── playbooks
│   │   ├── configure_hdfs_nodes.yml
│   │   ├── configure_master_node.yml
│   │   ├── configure_worker_nodes.yml
│   │   ├── post_configure_k8s_cluster.yml
│   │   ├── pre_configure_hdfs_nodes.yml
│   │   ├── prerequisites.yml
│   │   └── setting_up_nodes.yml
│   ├── post_configure_nodes.yml
│   ├── setup_hdfs_nodes.yml
│   ├── setup_master_node.yml
│   ├── setup_worker_nodes.yml
│   └── storageclass.yaml
├── clean.sh
├── main.tf
├── modules
│   ├── gce-hdfs-disk
│   │   ├── main.tf
│   │   └── variables.tf
│   ├── gce-k8s-master
│   │   ├── main.tf
│   │   └── variables.tf
│   └── gce-k8s-worker
│       ├── main.tf
│       └── variables.tf
├── README.md
├── terraform.tfstate
├── terraform.tfstate.backup
└── variables.tf

7 directories, 30 files



  • Edit variables.tf in the Terraform root directory and in each module (modules/*/variables.tf) to match your environment.

  • Edit the Ansible variables file (ansible_playbook/group_vars/env_variables) to match your environment.

Terraform plan graph: [image not reproduced; the original shows it in two parts, a left-side and a right-side continuation]



Terraform Modules used:

gce-hdfs-disk

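The gce-hdfs-disk module creates the persistent disks on Google Compute Engine that the Hadoop HDFS cluster uses for storage. After deployment, gcloud lists them alongside the kubernetes-* disks, which appear to be the boot disks of the master and worker instances: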

# gcloud compute disks list
NAME                    LOCATION    LOCATION_SCOPE  SIZE_GB  TYPE         STATUS
hdfs-journalnode-k8s-0  us-west1-a  zone            20       pd-standard  READY
hdfs-journalnode-k8s-1  us-west1-a  zone            20       pd-standard  READY
hdfs-journalnode-k8s-2  us-west1-a  zone            20       pd-standard  READY
hdfs-krb5-k8s           us-west1-a  zone            20       pd-standard  READY
hdfs-namenode-k8s-0     us-west1-a  zone            100      pd-standard  READY
hdfs-namenode-k8s-1     us-west1-a  zone            100      pd-standard  READY
kubernetes-master       us-west1-a  zone            10       pd-standard  READY
kubernetes-worker1      us-west1-a  zone            10       pd-standard  READY
kubernetes-worker2      us-west1-a  zone            10       pd-standard  READY
kubernetes-worker3      us-west1-a  zone            10       pd-standard  READY
zookeeper-0             us-west1-a  zone            5        pd-standard  READY
zookeeper-1             us-west1-a  zone            5        pd-standard  READY
zookeeper-2             us-west1-a  zone            5        pd-standard  READY
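You can also check this listing from a script instead of by eye. The minimal sketch below parses the table that gcloud compute disks list prints. It assumes the column layout shown above (NAME, LOCATION, LOCATION_SCOPE, SIZE_GB, TYPE, STATUS) with no spaces inside values. The name prefixes in PREFIXES come from the disk names in this listing; adjust them if your variables use different names.

```python
import subprocess
from typing import Dict, List

# Name prefixes taken from the disk listing in this post.
PREFIXES = ("hdfs-journalnode", "hdfs-namenode", "zookeeper", "kubernetes")


def fetch_disk_table() -> str:
    """Run `gcloud compute disks list` and return its table output."""
    return subprocess.run(
        ["gcloud", "compute", "disks", "list"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout


def parse_disk_table(text: str) -> List[Dict[str, str]]:
    """Split the whitespace-aligned table into one dict per disk, keyed by column header."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def summarize(disks: List[Dict[str, str]]) -> Dict[str, object]:
    """Total size in GB, disks whose STATUS is not READY, and disk count per name prefix."""
    return {
        "total_gb": sum(int(d["SIZE_GB"]) for d in disks),
        "not_ready": [d["NAME"] for d in disks if d["STATUS"] != "READY"],
        "counts": {p: sum(d["NAME"].startswith(p) for d in disks) for p in PREFIXES},
    }
```

Usage: `print(summarize(parse_disk_table(fetch_disk_table())))`. For the listing above, this reports 335 GB in total, three JournalNode disks, two NameNode disks, three ZooKeeper disks, four kubernetes-* disks, and no disk outside READY.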




gce-k8s-master

The gce-k8s-master module deploys the Kubernetes master node on Google Compute Engine. It creates the instance from a CentOS 7 image, copies the SSH key to it, installs Kubernetes, and initializes the node as the master.



gce-k8s-worker

The gce-k8s-worker module deploys three Kubernetes worker nodes on Google Compute Engine. It creates each instance from a CentOS 7 image, copies the SSH key to it, installs Kubernetes, and joins the node to the master as a worker.
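Once both modules have run, the cluster should contain four nodes: one master and three workers. The quick check below assumes kubectl is run where the master's kubeconfig is available. It also assumes the default kubectl get nodes table, whose first two columns are NAME and STATUS. The expected count of 4 comes from the environment list; change it if you alter the worker count.

```python
import subprocess
from typing import Dict


def fetch_node_table() -> str:
    """Run `kubectl get nodes` (needs a working kubeconfig) and return its output."""
    return subprocess.run(
        ["kubectl", "get", "nodes"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout


def node_statuses(table: str) -> Dict[str, str]:
    """Map NAME -> STATUS, skipping the header row of the default table."""
    rows = [line.split() for line in table.strip().splitlines()[1:] if line.strip()]
    return {row[0]: row[1] for row in rows}


def cluster_ready(statuses: Dict[str, str], expected_nodes: int = 4) -> bool:
    """True when exactly `expected_nodes` nodes (1 master + 3 workers here) are all Ready."""
    return len(statuses) == expected_nodes and all(s == "Ready" for s in statuses.values())
```

Usage: `cluster_ready(node_statuses(fetch_node_table()))`. A node that is still joining reports NotReady, so the check returns False until every node has settled.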

Ansible playbooks used:


  • prerequisites.yml
  • setting_up_nodes.yml
  • configure_master_node.yml
  • configure_worker_nodes.yml
  • post_configure_k8s_cluster.yml
  • pre_configure_hdfs_nodes.yml
  • configure_hdfs_nodes.yml


Automated deployment using Terraform:

# terraform init

# terraform plan

# terraform apply
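You can also run the three commands unattended, for example from a script on the jump host. The sketch below assumes it is started from the repository root. It saves the plan with -out so that apply executes exactly what was planned, without the interactive approval prompt, and it times the run against the roughly 20 minutes quoted above. -input=false and -out are standard Terraform CLI options. The plan file name tfplan is an arbitrary choice.

```python
import subprocess
import time
from typing import List


def terraform_commands(plan_file: str = "tfplan") -> List[List[str]]:
    """init, plan saved to `plan_file`, then apply exactly that plan; no interactive prompts."""
    return [
        ["terraform", "init", "-input=false"],
        ["terraform", "plan", "-input=false", "-out=" + plan_file],
        ["terraform", "apply", "-input=false", plan_file],
    ]


def deploy(workdir: str = ".", plan_file: str = "tfplan") -> float:
    """Run the commands in `workdir`, stop at the first failure, return elapsed minutes."""
    start = time.monotonic()
    for cmd in terraform_commands(plan_file):
        print("+", " ".join(cmd))
        # check=True raises CalledProcessError, so a failed plan never reaches apply.
        subprocess.run(cmd, cwd=workdir, check=True)
    return (time.monotonic() - start) / 60
```

Usage: `print(f"finished in {deploy():.1f} minutes")`. Stopping at the first non-zero exit code mirrors what you would do by hand: never apply a plan that failed to build.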


Verification: