This post walks through the automated deployment of a Hadoop HDFS cluster on a Kubernetes cluster running on Google Compute Engine instances, using Terraform and Ansible, in about 20 minutes.
Environment:
- Google Compute Engine (not GKE)
- Hadoop HDFS cluster running on the Kubernetes cluster
  - ZooKeeper - 3
  - Name node - 2
  - Data node - 3
  - Journal node - 3
- Kubernetes cluster running on GCE
  - Master node - 1
  - Worker node - 3
- Jump host: Vagrant CentOS 7 VM running on a laptop
Prerequisites:
- Install Ansible, Terraform, and git on the jump host:
# ansible --version
ansible 2.9.3
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /bin/ansible
python version = 2.7.5 (default, Aug 7 2019, 00:51:29) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
# terraform version
Terraform v0.12.20
- Download the Terraform IaC and Ansible playbooks from the git repository:
# git clone https://github.com/vdsridevops/terraform-k8s-gce-hdfs.git
Cloning into 'terraform-k8s-gce-hdfs'...
remote: Enumerating objects: 56, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 56 (delta 6), reused 56 (delta 6), pack-reused 0
Unpacking objects: 100% (56/56), done.
Directory structure:
# tree
.
├── ansible_hosts
├── ansible_playbook
│   ├── ansible_hosts
│   ├── group_vars
│   │   └── env_variables
│   ├── hosts
│   ├── playbooks
│   │   ├── configure_hdfs_nodes.yml
│   │   ├── configure_master_node.yml
│   │   ├── configure_worker_nodes.yml
│   │   ├── post_configure_k8s_cluster.yml
│   │   ├── pre_configure_hdfs_nodes.yml
│   │   ├── prerequisites.yml
│   │   └── setting_up_nodes.yml
│   ├── post_configure_nodes.yml
│   ├── setup_hdfs_nodes.yml
│   ├── setup_master_node.yml
│   ├── setup_worker_nodes.yml
│   └── storageclass.yaml
├── clean.sh
├── main.tf
├── modules
│   ├── gce-hdfs-disk
│   │   ├── main.tf
│   │   └── variables.tf
│   ├── gce-k8s-master
│   │   ├── main.tf
│   │   └── variables.tf
│   └── gce-k8s-worker
│       ├── main.tf
│       └── variables.tf
├── README.md
├── terraform.tfstate
├── terraform.tfstate.backup
└── variables.tf

7 directories, 30 files
- Edit variables.tf in the Terraform root and in each module to match your environment.
- Edit the environment variables file (ansible_playbook/group_vars/env_variables) to match your environment.
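As a rough sketch, the root variables.tf might expose values like the following (the variable names here are illustrative assumptions, not necessarily the ones used in the repository):

```hcl
# Hypothetical examples only - adjust names/defaults to the actual repo.
variable "project" {
  description = "GCP project ID"
  type        = string
}

variable "zone" {
  description = "Zone for all instances and disks"
  type        = string
  default     = "us-west1-a"
}

variable "worker_count" {
  description = "Number of Kubernetes worker nodes"
  type        = number
  default     = 3
}
```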
Terraform plan graph: (graph image omitted; it was rendered as left-side and right-side continuations)
Terraform Modules used:
gce-hdfs-disk
The gce-hdfs-disk module creates persistent disks on Google Compute Engine for use by the Hadoop HDFS cluster.
# gcloud compute disks list
NAME                    LOCATION    LOCATION_SCOPE  SIZE_GB  TYPE         STATUS
hdfs-journalnode-k8s-0  us-west1-a  zone            20       pd-standard  READY
hdfs-journalnode-k8s-1  us-west1-a  zone            20       pd-standard  READY
hdfs-journalnode-k8s-2  us-west1-a  zone            20       pd-standard  READY
hdfs-krb5-k8s           us-west1-a  zone            20       pd-standard  READY
hdfs-namenode-k8s-0     us-west1-a  zone            100      pd-standard  READY
hdfs-namenode-k8s-1     us-west1-a  zone            100      pd-standard  READY
kubernetes-master       us-west1-a  zone            10       pd-standard  READY
kubernetes-worker1      us-west1-a  zone            10       pd-standard  READY
kubernetes-worker2      us-west1-a  zone            10       pd-standard  READY
kubernetes-worker3      us-west1-a  zone            10       pd-standard  READY
zookeeper-0             us-west1-a  zone            5        pd-standard  READY
zookeeper-1             us-west1-a  zone            5        pd-standard  READY
zookeeper-2             us-west1-a  zone            5        pd-standard  READY
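A minimal sketch of what one such disk resource inside the gce-hdfs-disk module could look like; the names and sizes mirror the listing above, but the count/variable scheme is an assumption, not the repo's actual code:

```hcl
# Hypothetical sketch (not the repo's exact code): three 20 GB
# journal-node disks, named to match the gcloud listing above.
resource "google_compute_disk" "hdfs_journalnode" {
  count = var.journalnode_count          # assumed variable, e.g. 3
  name  = "hdfs-journalnode-k8s-${count.index}"
  type  = "pd-standard"
  zone  = "us-west1-a"
  size  = 20                             # size in GB
}
```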
gce-k8s-master
The gce-k8s-master module deploys the Kubernetes master node on Google Compute Engine.
CentOS 7 image with the SSH key copied in.
Installs Kubernetes and initializes the node as the master.
gce-k8s-worker
The gce-k8s-worker module deploys 3 Kubernetes worker nodes on Google Compute Engine.
CentOS 7 image with the SSH key copied in.
Installs Kubernetes and joins each worker to the master.
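In the root main.tf, each of these modules is wired in with a module block roughly like the following (the argument names are illustrative assumptions, not the repo's actual interface):

```hcl
# Hypothetical sketch of a module call in the root main.tf.
module "k8s_worker" {
  source       = "./modules/gce-k8s-worker"
  worker_count = 3              # assumed argument name
  zone         = "us-west1-a"   # assumed argument name
}
```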
Ansible playbooks used:
- prerequisites.yml
- setting_up_nodes.yml
- configure_master_node.yml
- configure_worker_nodes.yml
- post_configure_k8s_cluster.yml
- pre_configure_hdfs_nodes.yml
- configure_hdfs_nodes.yml
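These playbooks run against the ansible_hosts inventory. As a hypothetical sketch, such an inventory could be generated like this (the group names and IPs below are placeholders, not values taken from the repo):

```shell
# Hypothetical inventory sketch - group names and IPs are placeholders.
cat > ansible_hosts <<'EOF'
[master]
kubernetes-master ansible_host=10.138.0.2

[workers]
kubernetes-worker1 ansible_host=10.138.0.3
kubernetes-worker2 ansible_host=10.138.0.4
kubernetes-worker3 ansible_host=10.138.0.5
EOF
grep -c 'ansible_host' ansible_hosts   # prints 4
```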
Deployment:
# terraform init
# terraform plan
# terraform apply
Verification: