VMware Identity Manager is a virtual appliance based on SUSE Linux Enterprise 11. It is meant to work as a black box, and normally there is no need to get your hands dirty on the command-line interface unless you are upgrading the appliance. When you have implemented an Identity Manager cluster, however, there are more things that can break and need fixing. One example is the RabbitMQ service, which is configured as a cluster when you add more nodes. During an upgrade, the RabbitMQ services need to be stopped on each node (and in a specific order), or this service can break.
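As a sketch of that ordered shutdown, the snippet below only prints the commands it would run instead of executing them over SSH. The node names, the ordering, and the `rabbitmq-server` service name are assumptions here; check the upgrade guide for your version before stopping anything.

```shell
# Dry-run sketch: print the RabbitMQ stop commands in the order they
# would be run, rather than executing them over ssh.
print_stop_commands() {
    for node in "$@"; do
        # In a real run you would execute this instead of echoing it.
        echo "ssh root@$node 'service rabbitmq-server stop'"
    done
}

# Secondary nodes first, primary node last (assumed order -- verify
# against the upgrade documentation for your release).
print_stop_commands node3.example.local node2.example.local node1.example.local
```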
There is also the Elasticsearch service, which is likewise configured as a cluster and is responsible for search and reporting. This service is the reason VMware recommends at least three nodes in an Identity Manager cluster. If you have two nodes and one node goes down, the following limitations apply until the node is brought back up:
- The dashboard does not display data.
- Most reports are unavailable.
- Sync log information is not displayed for directories.
- The search field in the top-right corner of the administration console does not return any results.
- Auto-complete is not available for text fields.
You can check whether the Elasticsearch and RabbitMQ services are running correctly on each node in the system diagnostics web interface. At the bottom you should see:
- Analytics Connection: Connection test successful (this is the Elasticsearch service)
- Messaging Connection: Connection test successful (this is the RabbitMQ service)
After upgrading to 2.8.0 I encountered the following:
So I had a problem with the Elasticsearch service. Right after booting the appliances it can take a while before the “Connection test successful” message appears, but this was taking far too long. I opened the command line on the first node and entered:
curl -XGET 'https://localhost:9200/_cluster/health?pretty=true'
status=red => this should be green!
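To make the health check easier to read, the sketch below pulls just the fields that matter out of the `_cluster/health` response. Against a live node you would pipe in `curl -s -XGET 'https://localhost:9200/_cluster/health?pretty=true'`; here a canned response stands in for the cluster (the field names match the health API, the values are made up for illustration).

```shell
# Summarize the pretty-printed _cluster/health JSON: extract the
# "status" and "unassigned_shards" fields with sed. This relies on
# ?pretty=true putting one key per line.
health_summary() {
    sed -n 's/.*"status" *: *"\([a-z]*\)".*/status=\1/p; s/.*"unassigned_shards" *: *\([0-9]*\).*/unassigned_shards=\1/p'
}

# Canned sample response (values are made up, not from a real cluster).
health_summary <<'EOF'
{
  "cluster_name" : "horizon",
  "status" : "red",
  "unassigned_shards" : 6
}
EOF
```

Anything other than `status=green` means the cluster needs attention; `unassigned_shards` above zero points at the shard problem described below.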
VMware suggests that if Elasticsearch does not start correctly or its status is red, you should follow these steps to troubleshoot.
- Ensure port 9300 is open.
a. Update node details by adding the IP addresses of all nodes in the cluster to the /usr/local/horizon/scripts/updateiptables.hzn file:
ALL_IPS="node1IPadd node2IPadd node3IPadd"
b. Run the updateiptables.hzn script on all nodes in the cluster.
- Restart Elasticsearch on all nodes in the cluster.
service elasticsearch restart
- Check logs for more details.
tail -f horizon.log
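The three steps above can be pulled together per node as follows. This is a dry-run sketch that prints each command rather than executing it; the IP addresses are placeholders, and the `sed` edit assumes `ALL_IPS=` appears as a plain variable assignment in updateiptables.hzn, as shown in the documentation.

```shell
# Dry-run sketch of the port-9300 troubleshooting steps on one node.
# Prints what would be done instead of doing it.
ES_SCRIPT=/usr/local/horizon/scripts/updateiptables.hzn
ALL_IPS="10.0.0.11 10.0.0.12 10.0.0.13"   # placeholder node IPs

troubleshoot_es() {
    # 1. Set ALL_IPS in updateiptables.hzn (assumes the variable is a
    #    plain assignment on its own line in that file).
    echo "sed -i 's/^ALL_IPS=.*/ALL_IPS=\"$ALL_IPS\"/' $ES_SCRIPT"
    # 2. Run the script to open port 9300 between the nodes.
    echo "$ES_SCRIPT"
    # 3. Restart Elasticsearch and watch the log.
    echo "service elasticsearch restart"
    echo "tail -f horizon.log"
}

troubleshoot_es
```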
In my case, there were unassigned shards (unassigned_shards : 6), which is probably why the status is not green. The documentation doesn’t explain how to fix this, so I did the following:
First, let’s list all unassigned shards:
curl -XGET 'https://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
I decided to simply delete the index “v3_2016-12-13” by entering:
curl -XDELETE 'https://localhost:9200/v3_2016-12-13/'
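If several indexes have unassigned shards, you could generate a delete command per affected index from the `_cat/shards` output. The sketch below is a dry run that only prints the commands; a canned shard listing stands in for the live cluster. Deleting an index throws its data away, so review the list before running anything.

```shell
# Print a curl -XDELETE for every index that has UNASSIGNED shards.
# Reads _cat/shards output on stdin (columns: index shard prirep
# state unassigned.reason); dry run, nothing is deleted.
print_delete_commands() {
    grep UNASSIGNED | awk '{print $1}' | sort -u |
    while read -r index; do
        echo "curl -XDELETE 'https://localhost:9200/$index/'"
    done
}

# Canned sample listing (made up, not from a real cluster).
print_delete_commands <<'EOF'
v3_2016-12-13 0 p UNASSIGNED CLUSTER_RECOVERED
v3_2016-12-13 0 r UNASSIGNED CLUSTER_RECOVERED
EOF
```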
I then checked the status again:
And it was now green!
I’m not sure whether this is required, but I also restarted the horizon-workspace service: service horizon-workspace restart
After this, the connection test was successful again. I hope this helps others experiencing the same issue.