Monday, September 8, 2014

Resetting (Deleting and Cleaning Out) an Ambari Cluster

If you are experimenting with Ambari for Hadoop cluster provisioning, it is useful to be able to wipe the Ambari server and agents clean so you can try again. Ambari provides commands for this, but there are also a couple of things to watch out for, detailed below. These instructions worked for me on Ambari 1.6.1 with Red Hat Enterprise Linux 6.5.

First, stop and reset the Ambari server:

[root@test-ambari ambuser]# ambari-server stop
[root@test-ambari ambuser]# ambari-server reset

Next, to prevent a possible obscure "no more mirrors to try" error on re-provisioning, clean out the yum cache on all the agent machines, as I showed in an earlier post. I have SaltStack installed, so I can run it across my cluster like this (or you can just log into each machine and run 'yum clean all'):

[root@test-ambari ~]# salt '*' cmd.run 'yum clean all'

Then go to each Ambari agent machine and run the host cleanup. It would be nice to do this with SaltStack too, but the cleanup script calls sudo, and that requires giving sudo tty permissions for the command (which I didn't want to get into).

The output from my run is shown further down, but you may see different behaviour depending on the particulars of the cluster and how far the prior provisioning process got:

[root@master-master ~]# python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent
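
If logging into each agent by hand gets tedious, one possible alternative to SaltStack is a small ssh loop. The following is only a sketch under assumptions, not something verified on this cluster: it assumes root ssh access to every agent, and the function names (build_cleanup_command, clean_hosts) are hypothetical. The ssh -t option requests a pseudo-terminal, which is what sudo's tty requirement usually asks for.

```python
import subprocess

# Path taken from the command shown above; adjust for your Python version.
HOST_CLEANUP = "/usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py"


def build_cleanup_command(host, user="root"):
    """Return the ssh argument list that runs HostCleanup.py on one agent."""
    remote = "python {0} --silent".format(HOST_CLEANUP)
    # -t requests a pseudo-terminal so sudo calls inside the script see a tty.
    return ["ssh", "-t", "{0}@{1}".format(user, host), remote]


def clean_hosts(hosts, run=subprocess.call):
    """Run the cleanup on each host in turn and return {host: exit_code}.

    `run` is injectable so the loop can be dry-run without touching any host.
    """
    results = {}
    for host in hosts:
        results[host] = run(build_cleanup_command(host))
    return results
```

Calling clean_hosts(["test-master.example.com", "test-agent3.example.com"]) from an interactive shell would run the cleanup on those two machines one after the other; pass your own list of agent host names.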

Now restart the Ambari server:

[root@test-ambari ambuser]# ambari-server start

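I don't know whether this is documented anywhere, but I provision my cluster with a script that uses the API, and I found I had to wait until all the machines (agents) had registered themselves with the Ambari server again (at least, that is what I think is going on). Here I use the Ambari API, piped through grep and wc, to monitor the number of registered machines; the first column of the wc output is the count. Note that $USER and $PWD stand in for the Ambari admin credentials: in a real shell those variables already hold your login name and current directory, so substitute your own values. It took about 45 seconds for all the agents to register (when the count finally hit 4).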

[root@test-ambari ~]# curl -sH "X-Requested-By: ambari" -u $USER:$PWD -i  http://localhost:8080/api/v1/hosts | grep host_name | wc
      2       6     102
[root@test-ambari ~]# curl -sH "X-Requested-By: ambari" -u $USER:$PWD -i  http://localhost:8080/api/v1/hosts | grep host_name | wc
      3       9     152
[root@test-ambari ~]# curl -sH "X-Requested-By: ambari" -u $USER:$PWD -i  http://localhost:8080/api/v1/hosts | grep host_name | wc
      3       9     152
[root@test-ambari ~]# curl -sH "X-Requested-By: ambari" -u $USER:$PWD -i  http://localhost:8080/api/v1/hosts | grep host_name | wc
      4      12     202
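
The same wait can be scripted instead of re-running curl by hand. This is a minimal sketch, assuming the /api/v1/hosts response is JSON with an "items" list whose entries carry "Hosts" -> "host_name" (which is what the grep above is matching). The function names, timeout, and polling interval are illustrative choices, not part of Ambari.

```python
import base64
import json
import time

try:  # Python 3
    from urllib.request import Request, urlopen
except ImportError:  # Python 2, as shipped with RHEL 6
    from urllib2 import Request, urlopen


def fetch_hosts(base_url, user, password):
    """GET /api/v1/hosts with basic auth and return the decoded JSON body."""
    token = base64.b64encode("{0}:{1}".format(user, password).encode("utf-8"))
    request = Request(base_url + "/api/v1/hosts")
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    request.add_header("X-Requested-By", "ambari")
    return json.loads(urlopen(request).read().decode("utf-8"))


def registered_hosts(payload):
    """Extract host names, assuming items[].Hosts.host_name in the response."""
    return [item["Hosts"]["host_name"] for item in payload.get("items", [])]


def wait_for_hosts(fetch, expected, timeout=120, interval=5, sleep=time.sleep):
    """Poll fetch() until at least `expected` hosts are registered.

    Returns the list of host names, or raises RuntimeError on timeout.
    """
    waited = 0
    while True:
        names = registered_hosts(fetch())
        if len(names) >= expected:
            return names
        if waited >= timeout:
            raise RuntimeError(
                "only {0} of {1} hosts registered".format(len(names), expected)
            )
        sleep(interval)
        waited += interval
```

For the four-node cluster here, a call might look like wait_for_hosts(lambda: fetch_hosts("http://localhost:8080", "admin", "secret"), expected=4), where the user name and password are placeholders for your own Ambari credentials.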

If you proceed before everything is registered, you may run into this error when using the API:

  "status" : 400,
  "message" : "Attempted to add unknown hosts to a cluster.  These hosts have not been registered with the server: test-agent3.example.com"

At this point, you should have a clean Ambari server/agent substrate on which to create the next cluster. Happy provisioning!
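
To avoid the 400 error rather than react to it, a provisioning script could compare the hosts it intends to use against what the server reports before making the cluster call. A small sketch, assuming the decoded /api/v1/hosts response has an "items" list with "Hosts" -> "host_name" entries; unregistered_hosts is a hypothetical helper name, and the sample payload is made up for illustration.

```python
def unregistered_hosts(wanted, payload):
    """Return the wanted host names that the server has not yet registered."""
    registered = set(
        item["Hosts"]["host_name"] for item in payload.get("items", [])
    )
    return sorted(set(wanted) - registered)


# Illustrative payload only: one agent has not registered yet.
sample = {"items": [{"Hosts": {"host_name": "test-master.example.com"}}]}
missing = unregistered_hosts(
    ["test-master.example.com", "test-agent3.example.com"], sample
)
print(missing)
```

If the returned list is non-empty, the script can wait and check again instead of sending a request that names hosts the server does not know about.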



Here are the commands with output:

Ambari server stop/reset:

[root@test-ambari ambuser]# ambari-server stop
Using python  /usr/bin/python2.6
Stopping ambari-server
Ambari Server stopped
[root@test-ambari ambuser]# ambari-server reset
Using python  /usr/bin/python2.6
Resetting ambari-server
**** WARNING **** You are about to reset and clear the Ambari Server database. This will remove all cluster host and configuration information from the database. You will be required to re-configure the Ambari server and re-run the cluster wizard. 
Are you SURE you want to perform the reset [yes/no] (no)? y
Confirm server reset [yes/no](no)? y
Resetting the Server database...
Connecting to local database...done.
WARNING: Non critical error in DDL, use --verbose for more information
Ambari Server 'reset' completed with warnings.

Yum cache cleaning:

[root@test-ambari ~]# salt '*' cmd.run 'yum clean all'
test-master.example.com:
    Loaded plugins: product-id, refresh-packagekit, rhnplugin, security,
    Cleaning repos: HDP-2.1 HDP-UTILS-1.1.0.17 Updates-ambari-1.6.1 ambari-1.x
                  : dogfood dogfood_6_x86-64 epel6_x86-64 rhel-x86_64-server-6
                  : rhel-x86_64-server-optional-6 rhel-x86_64-server-supplementary-6
    Cleaning up Everything
...SNIP

Host cleanup (on the agents); your output could be quite different:

[root@master-master ~]# python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent
INFO:HostCleanup:
Killing pid's: ['']
INFO:HostCleanup:Deleting packages: ['']
INFO:HostCleanup:
Deleting users: ['ambari-qa', 'yarn', 'hdfs', 'mapred', 'zookeeper']
INFO:HostCleanup:Executing command: sudo userdel -rf ambari-qa
INFO:HostCleanup:Successfully deleted user: ambari-qa
INFO:HostCleanup:Executing command: sudo userdel -rf yarn
INFO:HostCleanup:Successfully deleted user: yarn
INFO:HostCleanup:Executing command: sudo userdel -rf hdfs
INFO:HostCleanup:Successfully deleted user: hdfs
INFO:HostCleanup:Executing command: sudo userdel -rf mapred
INFO:HostCleanup:Successfully deleted user: mapred
INFO:HostCleanup:Executing command: sudo userdel -rf zookeeper
INFO:HostCleanup:Successfully deleted user: zookeeper
INFO:HostCleanup:Executing command: sudo groupdel hadoop
WARNING:HostCleanup:Cannot delete group : hadoop, groupdel: cannot remove the primary group of user 'tez'
INFO:HostCleanup:Path doesn't exists: /home/ambari-qa
INFO:HostCleanup:Path doesn't exists: /home/yarn
INFO:HostCleanup:Path doesn't exists: /home/hdfs
INFO:HostCleanup:Path doesn't exists: /home/mapred
INFO:HostCleanup:Path doesn't exists: /home/zookeeper
INFO:HostCleanup:
Deleting directories: ['']
INFO:HostCleanup:Path doesn't exists: 
INFO:HostCleanup:
Deleting repo files: []
INFO:HostCleanup:
Erasing alternatives:{'symlink_list': [''], 'target_list': ['']}
INFO:HostCleanup:Path doesn't exists: 

INFO:HostCleanup:Clean-up completed. The output is at /var/lib/ambari-agent/data/hostcleanup.result

Restart the Ambari server:

[root@test-ambari ambuser]# ambari-server start
Using python  /usr/bin/python2.6
Starting ambari-server
Ambari Server running with 'root' privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Waiting for server start...
sh: line 0: ulimit: open files: cannot modify limit: Operation not permitted
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Ambari Server 'start' completed successfully.

Once the agents have registered, re-run the API provisioning script:

[root@test-ambari ambuser]# python AmbariApiScript.py

2 comments:

simran said...

I struggled with the same issue for like weeks and then I decided to share it here: http://www.yourtechchick.com/hadoop/how-to-completely-remove-and-uninstall-hdp-components-hadoop-uninstall-on-linux-system/

Clark Updike said...

Actually, you are stripping it back further than I was. I still want Ambari installed and functional, but I want the cluster removed so I can install a new cluster using Ambari but without reinstalling it.