Sunday, May 14, 2017

ElasticSearch: Issues Adding a new node

The Issue

The new node was visible in the cluster, but existing shards were not relocating to it.

The steps

I had a pre-existing elasticsearch cluster of 3 nodes, and I went about adding a fourth. In a round-robin fashion, I updated the elasticsearch.yml configuration of the pre-existing nodes to include the new node, updating the list of unicast hosts and the minimum number of master nodes:

elasticsearch.yml
 discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]  
 discovery.zen.minimum_master_nodes: 3  
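
For the restart itself, I did something along these lines on each node, waiting for the cluster to come back to green before moving on to the next one (a minimal sketch; the systemd service name elasticsearch is an assumption about your install):

 # Rolling restart, one node at a time
 sudo systemctl restart elasticsearch
 # Wait until the cluster reports green again before touching the next node
 until curl -s 'http://localhost:9200/_cluster/health' | grep -q '"status":"green"'; do sleep 5; done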

I restarted each node in turn and checked the cluster health as follows:


 [root@mongo-elastic-node-1 centos]# curl -XGET 10.0.0.1:9200/_cluster/health?pretty  
 {  
  "cluster_name" : "cpi",  
  "status" : "green",  
  "timed_out" : false,  
  "number_of_nodes" : 4,  
  "number_of_data_nodes" : 4,  
  "active_primary_shards" : 40,  
  "active_shards" : 71,  
  "relocating_shards" : 2,  
  "initializing_shards" : 0,  
  "unassigned_shards" : 0,  
  "delayed_unassigned_shards" : 0,  
  "number_of_pending_tasks" : 0,  
  "number_of_in_flight_fetch" : 0,  
  "task_max_waiting_in_queue_millis" : 0,  
  "active_shards_percent_as_number" : 100.0  
 }  

The important item to notice above is the bit about "relocating_shards": the cluster is relocating 2 shards. To find out which shards are going where, you can check with this command:

 [root@mongo-elastic-node-1 centos]# curl -XGET http://10.0.0.9:9200/_cat/shards | grep RELO  
  % Total  % Received % Xferd Average Speed  Time  Time   Time Current  
                  Dload Upload  Total  Spent  Left Speed  
 100 7881 100 7881  0   0  318k   0 --:--:-- --:--:-- --:--:-- 334k  
 cpi12         2 p RELOCATING 6953804  5.8gb 10.0.0.2 cpi2 -> 10.0.0.4 fBmdkD2gT6-jTJ6k_bEF0w cpi4  
 cpi12         0 r RELOCATING 6958611  5.5gb 10.0.0.3 cpi3 -> 10.0.0.4 fBmdkD2gT6-jTJ6k_bEF0w cpi4  

Here it's saying that the cluster is trying to send shards belonging to the index called cpi12 from nodes cpi2 and cpi3 to node cpi4. More specifically, it's RELOCATING shard #2 and shard #0 to cpi4. To monitor its progress, I would log in to cpi4 and check whether the disk space usage was going up. And here is where I noticed my first problem:


 [root@elastic-node-4 elasticsearch]# df -h  
 Filesystem   Size Used Avail Use% Mounted on  
 /dev/vdb     69G  52M  66G  1% /mnt  

The mounted folder where I expected to find my elasticsearch data remained unchanged at 52 MB.
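
In hindsight, a more direct way to watch a relocation than eyeballing disk usage is the cat recovery API, which lists each recovering or relocating shard together with its source node, target node, and progress (a sketch; the grep just hides entries that have already finished):

 curl -XGET 'http://10.0.0.1:9200/_cat/recovery?v' | grep -v done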

Debugging

I remained stumped on this one for a long time and did the following checks:

  • The elasticsearch.yml config file on every node, ensuring that discovery.zen.ping.unicast.hosts was set correctly.
  • Every node could ping the new node and vice versa.
  • Every node could access ports 9200 and 9300 on the new node and vice versa, using the telnet command.
  • Every node had sufficient disk space for the shard relocation.
  • The new node had the right permissions to write to its elasticsearch data folder.
  • Cluster settings: curl 'http://localhost:9200/_cluster/settings?pretty', looking for restrictive cluster.routing settings (see the example after this list).
  • Restarted elasticsearch on each node, three times over.
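
For the cluster settings check in particular, what I was looking for was any transient or persistent override under cluster.routing that disables or throttles allocation. A sketch of the check, plus the command to re-enable allocation in case it had been left disabled (only needed if it actually was):

 # Look for cluster.routing.allocation.* overrides in the current settings
 curl -XGET 'http://localhost:9200/_cluster/settings?pretty'
 # Re-enable shard allocation if it had been disabled, e.g. during an earlier restart
 curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'
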
However, none of the above solved the issue. Even worse, the repeated restarts of each node managed to get my cluster into a worse state, where some of the shards became UNASSIGNED:

 [root@mongo-elastic-node-1 bin]# curl -XGET http://10.0.0.1:9200/_cat/shards | grep UNASS  
  % Total  % Received % Xferd Average Speed  Time  Time   Time Current  
                  Dload Upload  Total  Spent  Left Speed  
 100 5250 100 5250  0   0  143k   0 --:--:-- --:--:-- --:--:-- 146k  
 .marvel-es-2017.05.13 0 p UNASSIGNED  
 .marvel-es-2017.05.13 0 r UNASSIGNED  
 .marvel-es-2017.05.14 0 p UNASSIGNED  
 .marvel-es-2017.05.14 0 r UNASSIGNED  
 cpi14         1 p UNASSIGNED  
 cpi13         1 p UNASSIGNED  
 cpi13         4 p UNASSIGNED  

After much browsing on the web, I found one forum answer mentioning that the state of the plugins on all nodes must be exactly the same: http://stackoverflow.com/questions/28473687/elasticsearch-cluster-no-known-master-node-scheduling-a-retry

The solution

The point about the plugins jogged my memory: I had previously installed the marvel plugin on the original nodes. To see what plugins are installed on each node, run the plugin command from the command line:

 [root@elastic-node-3 elasticsearch]# cd /usr/share/elasticsearch/bin  
 [root@elastic-node-3 bin]# ./plugin list  
 Installed plugins in /usr/share/elasticsearch/plugins:  
   - license  
   - marvel-agent  

It turned out my 3 pre-existing nodes each had the license and marvel-agent plugins installed, whereas the fresh install on the 4th node had no plugins at all. Because of this, the nodes were able to acknowledge each other but refused to talk. To fix this, I manually removed the plugins from each node:

 [root@elastic-node-3 bin]# ./plugin remove license  
 -> Removing license...  
 Removed license  
 [root@elastic-node-3 bin]# ./plugin remove marvel-agent  
 -> Removing marvel-agent...  
 Removed marvel-agent  
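
The opposite fix would presumably have worked as well: bringing the new node up to parity by installing the same plugins on it, rather than removing them everywhere else. Roughly, for this 2.x install (untested in my case):

 cd /usr/share/elasticsearch/bin
 ./plugin install license
 ./plugin install marvel-agent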

Before I could see if shard relocation would work, I first had to assign the UNASSIGNED shards:

 [root@mongo-elastic-node-1 elasticsearch]# curl -XPOST -d '{ "commands" : [{ "allocate" : { "index": "cpi14", "shard":1, "node":"cpi4", "allow_primary":true } }]}' localhost:9200/_cluster/reroute?pretty  
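
Since there were several unassigned shards, a small loop over the _cat/shards output saves some typing; this is just a sketch, and it assumes (as in my case) that every unassigned shard should be allocated to cpi4:

 # Reroute every UNASSIGNED shard to cpi4; allow_primary accepts the risk of an empty primary
 curl -s 'http://localhost:9200/_cat/shards' | awk '$4 == "UNASSIGNED" {print $1, $2}' | \
 while read index shard; do
   curl -XPOST 'http://localhost:9200/_cluster/reroute?pretty' -d '{ "commands" : [{ "allocate" : { "index" : "'$index'", "shard" : '$shard', "node" : "cpi4", "allow_primary" : true } }]}'
 done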

I had to repeat this command for every UNASSIGNED shard. Checking the cluster health afterwards, I could see that there were no more unassigned shards and that 2 shards were currently relocating:

 [root@elastic-node-4 elasticsearch]# curl -XGET localhost:9200/_cluster/health?pretty  
 {  
  "cluster_name" : "cpi",  
  "status" : "green",  
  "timed_out" : false,  
  "number_of_nodes" : 4,  
  "number_of_data_nodes" : 4,  
  "active_primary_shards" : 40,  
  "active_shards" : 71,  
  "relocating_shards" : 2,  
  "initializing_shards" : 0,  
  "unassigned_shards" : 0,  
  "delayed_unassigned_shards" : 0,  
  "number_of_pending_tasks" : 0,  
  "number_of_in_flight_fetch" : 0,  
  "task_max_waiting_in_queue_millis" : 0,  
  "active_shards_percent_as_number" : 100.0  
 }  

Checking the disk space usage on the new node again showed that shards were indeed relocating this time. Yay!

References

http://stackoverflow.com/questions/23656458/elasticsearch-what-to-do-with-unassigned-shards

http://stackoverflow.com/questions/28473687/elasticsearch-cluster-no-known-master-node-scheduling-a-retry

https://www.elastic.co/guide/en/elasticsearch/plugins/2.2/listing-removing.html

Wednesday, May 3, 2017

MongoDB: Switching to the WiredTiger storage engine

We were already running in production with a replicated MongoDB cluster of 3 nodes that was running out of disk space, each node having a 750 GB drive at 77% usage. The obvious solution was to expand the disk space, but at the same time I wanted to be more efficient with the disk space usage itself.

Previously we were using the MMAPv1 storage engine, which has no support for compression, and I wanted to switch over to the WiredTiger storage engine, which does support compression options.

Here I describe the strategy I used:


Since my MongoDB cluster was replicated, I was able to take down one node at a time to perform the switchover to WiredTiger. Once I was finished with one node, I could bring it back up, take down the next node, and so on until all nodes were upgraded. By doing it this way, there was no downtime whatsoever from the perspective of the user.


For each node I did the following (a consolidated sketch of the commands follows the list):



  • Shut down the mongod service.
  • Moved the mongo data folder, which in my case was /var/lib/mongo, to another attached storage volume for backup purposes, in case the procedure failed.
  • Recreated the mongo data folder, in my case /var/lib/mongo, and assigned the appropriate permissions: chown mongod:mongod /var/lib/mongo
  • Modified the /etc/mongod.conf configuration file to include the following: storageEngine=wiredTiger
  • Restarted the mongod service.
  • Checked that wiredTiger is configured correctly from the mongo command line:
 db.serverStatus().storageEngine  
 { "name" : "wiredTiger", "supportsCommittedReads" : true }  

Now that the node is back up and running, replication will happen in the background. If you head over to your primary mongo node and type rs.status(), you should see a status of STARTUP2 for the rebuilt member.
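
A quick way to keep an eye on it from the primary is a one-liner against the mongo shell (a sketch):

 # Prints each replica set member and its state; the rebuilt node should go STARTUP2 -> SECONDARY
 mongo --eval 'rs.status().members.forEach(function (m) { print(m.name + "  " + m.stateStr); })'
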
Once the node has replicated successfully, repeat the same procedure for the next node.

References:


https://docs.mongodb.com/v3.0/release-notes/3.0-upgrade/?_ga=1.86531032.1131483509.1428671022#change-replica-set-storage-engine-to-wiredtiger

https://askubuntu.com/questions/643252/how-to-migrate-mongodb-2-6-to-3-0-with-wiredtiger