Using a Dashboard to Identify Application Infrastructure Issues
You can use IO to quickly identify issues in your infrastructure. To get you started, here is an example of how you can use a standard dashboard, the Application Health by Tier dashboard, to review the health of your application infrastructure, identify issues, and drill down to view related open cases and investigations.
Navigate to the Application Health by Tier Dashboard
The Application Health by Tier dashboard shows you all the applications and tiers with issues. The Platinum tier has two applications with critical issues, EHR and Ordering System. Let's look at the open cases on the Ordering System application.
Drill down to view the Ordering System application's open cases.
There are three open cases. Two are based on single metric alarm rules, and one is based on the Exchange Performance rule template. Let's look at that open case.
The application experienced some read latency during a one-hour period. The Primary Rule shows us the conditions required for the alarm to trigger. Thirteen events were recorded.
Select the Latest Alarms tab to view the event details.
This type of alarm rule does not include an investigation or any trend charts, so we need to use other features in the platform to investigate the alarm.
Possible Causes of Application Performance Issues
Flow control on storage ports
High CPU utilization on non-ESX hosts
Incorrectly set HBA queue depth settings
High utilization on HBA ports
Speed mismatches between HBA ports and storage ports (FC SAN)
View Issues on Related Application Infrastructure
We can use the Topology feature to determine if there are issues on the related application infrastructure.
Use the Topology button on the open case page to view the impacted application's end-to-end topology.
The Default view is loaded. All related infrastructure, including other applications and tiers, is shown. Switch to the Application – Fibre Channel view to focus on the application.
Review the alarms on the Fibre Channel infrastructure
There are warnings on two of the HBA ports, critical alarms on hosts and sub-entities, and critical alarms on 5 of the 7 storage ports.
Since we are interested in flow control events on the storage ports, let's review the open cases on the impacted storage ports.
The PowerMax storage array's ports all show a Link Buffer-to-buffer Credits alarm that matches the date and time of the application's performance alarm. We'll investigate these alarms first. Drill down on one of the open cases.
The time frame of the storage port's alarm matches that of the application's alarm. We can use the slider on the Master/Detail trend chart to view the event.
We can use the investigation to troubleshoot this issue.
The investigation detected a speed mismatch between the HBA ports connected to the storage port.
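To make the idea of a speed mismatch concrete, here is a minimal sketch that compares the link speed of each HBA port with that of the storage port it connects to. It does not use the IO API or reproduce how the investigation works; the Port class, the port names, and the speeds are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """A hypothetical port record; not an IO data structure."""
    name: str
    speed_gbps: int  # negotiated link speed in Gbit/s


def find_speed_mismatches(storage_port, hba_ports):
    """Return the HBA ports whose link speed differs from the storage port's."""
    return [p for p in hba_ports if p.speed_gbps != storage_port.speed_gbps]


# Made-up example data: one storage port and the HBA ports connected to it.
storage = Port("array-port-1", 32)
hbas = [Port("hba-a", 32), Port("hba-b", 8), Port("hba-c", 16)]

mismatched = find_speed_mismatches(storage, hbas)
for p in mismatched:
    print(f"{p.name}: {p.speed_gbps} Gb/s vs storage port at {storage.speed_gbps} Gb/s")
```

In this made-up data set, hba-b and hba-c are flagged. A slower HBA reading from a faster storage port is a common cause of buffer-to-buffer credit depletion on the storage side, which is consistent with the alarm seen on the storage ports.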
The investigation also ran Queue Solver and determined that changing the queue depth settings on the HBA port may also improve performance.
Review the alarms on the VMware infrastructure
Switch to the Application - VMware topology view and expand the view.
There are alarms on ESX hosts and VMs.
Review the alarms on the ESX hosts
Both ESX hosts have Exchange Performance alarms that occur at the same time as the application's Exchange Performance alarm. These alarms likely all reflect the same performance event, but IO shows them as separate cases because they are triggered by different alarm rules set on different entity types.
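The reasoning that alarms on different entities belong to the same event can be sketched as grouping alarms whose time windows overlap. This is only an illustration under assumed data: the entity names, timestamps, and tuple layout are hypothetical, and the sketch is not how IO correlates cases.

```python
from datetime import datetime


def group_overlapping(alarms):
    """Group alarms whose [start, end] windows overlap.

    `alarms` is a list of (entity, start, end) tuples. Alarms are sorted by
    start time and merged into the current group while they begin before the
    group's latest end time.
    """
    groups = []
    group_end = None
    for entity, start, end in sorted(alarms, key=lambda a: a[1]):
        if groups and start <= group_end:
            groups[-1].append(entity)
            group_end = max(group_end, end)
        else:
            groups.append([entity])
            group_end = end
    return groups


def t(hhmm):
    """Build a timestamp on an arbitrary, made-up date."""
    return datetime.strptime("2024-01-01 " + hhmm, "%Y-%m-%d %H:%M")


# Hypothetical alarm windows for illustration only.
alarms = [
    ("application", t("10:00"), t("11:00")),
    ("esx-host-1", t("10:05"), t("10:55")),
    ("esx-host-2", t("10:10"), t("11:05")),
    ("unrelated-host", t("14:00"), t("14:30")),
]

groups = group_overlapping(alarms)
print(groups)
```

With this sample data, the application and both ESX hosts fall into one group, while the later alarm on the unrelated host forms its own group.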
Review the alarm statistics and investigations.
Is the vSphere cluster imbalanced in CPU utilization?
The automated investigation found that rebalancing the VMs on the cluster would not improve the CPU utilization on the host.
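One situation in which rebalancing would not help is when every host in the cluster is similarly busy. The sketch below measures imbalance as the spread between the most and least utilized hosts. The host names, utilization figures, and the 20-point threshold are assumptions made for illustration; they do not describe the criteria the automated investigation uses.

```python
from statistics import mean, pstdev


def cpu_imbalance(host_cpu_pct):
    """Return (mean, population standard deviation, spread) of host CPU %."""
    values = list(host_cpu_pct.values())
    return mean(values), pstdev(values), max(values) - min(values)


def is_imbalanced(host_cpu_pct, max_spread_pct=20.0):
    """Flag the cluster when the busiest and idlest hosts differ by more
    than `max_spread_pct` percentage points (an assumed threshold)."""
    _, _, spread = cpu_imbalance(host_cpu_pct)
    return spread > max_spread_pct


# Hypothetical utilization figures for a two-host cluster.
balanced_but_busy = {"esx-host-1": 88.0, "esx-host-2": 91.0}
uneven = {"esx-host-1": 92.0, "esx-host-2": 35.0}

print(is_imbalanced(balanced_but_busy))
print(is_imbalanced(uneven))
```

In the first case both hosts are heavily loaded, so moving VMs between them would only shift the load. In the second case the spread is large, so rebalancing could relieve the busier host.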
Are there VMs on this ESX host that have a runaway process?
The investigation does not reveal any runaway processes.
Review the alarms on the ESX VM
The ESX VM has several open CPU utilization cases. Let's review the most recent one.
Review the alarm statistics and investigations.
Is there a runaway process?
There is no runaway process on the VM.
Is there insufficient vCPU for the workload on this VM?
vCPU appears to be sufficient.