Application Discovery

The Application Discovery analytic facilitates discovery of Application entities in your environment using a variety of algorithms. When run, the analytic produces a Zip file containing two results files that are used to create Application entities through Entity Import. The two files are mapping.csv, which is a roadmap for how IO identifies content in the second file, and apps.csv. Refer to the IO Administrator Guide for details on using Entity Import.

Please note that the Application Discovery analytic does not itself create Applications using the existing application suggestion mechanism.

The export dialog contains further instructions (see below).
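Before handing the results to Entity Import, it can be useful to confirm that the downloaded archive really contains the two files named above. The following is a minimal sketch, not part of IO: the function name `inspect_results` is hypothetical, the UTF-8 encoding is an assumption, and the column layout of the CSV files is not documented here, so the sketch only reports whatever header row it finds.

```python
import csv
import io
import zipfile

# File names taken from this page; everything else is an assumption.
EXPECTED = {"apps.csv", "mapping.csv"}


def inspect_results(zip_source):
    """Return {file name: header row} for the two expected CSV files.

    zip_source may be a path or a binary file-like object.
    """
    headers = {}
    with zipfile.ZipFile(zip_source) as archive:
        # Compare base names so a leading folder inside the archive does not matter.
        members = {name.rsplit("/", 1)[-1]: name for name in archive.namelist()}
        missing = EXPECTED - set(members)
        if missing:
            raise ValueError(f"missing from archive: {sorted(missing)}")
        for base in sorted(EXPECTED):
            with archive.open(members[base]) as raw:
                # Encoding is assumed; adjust if the files use another one.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                headers[base] = next(csv.reader(text), [])
    return headers
```

Calling `inspect_results` with the path of the downloaded Zip file returns the header row of each file, or raises `ValueError` if either file is absent.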

Caution

The Application Discovery analytic can be run by only one IO user at a time. If another user attempts to run the analytic while a run is already in progress, an error is issued.

Caution

This analytic can consume a large amount of system memory. If there is not sufficient memory available to complete a run, the user is advised to increase the amount of memory allocated to their Virtual Edition virtual machine by a specific amount. Note that the IO Virtual Edition Guide suggests allocating 64GB of RAM to the manager node rather than the default 32GB.

Increasing memory for physical Appliances (4210s and 4220s) is not an option, so the remedy is to limit the number of unique hosts and VMs that appear in the Pre-processed Host & VM Names preview (see below).

In general, the amount of time it takes to run the analytic depends on the number of unique, preprocessed names of hosts and VMs and on the clock speed of the ESX Host's CPU.
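Because memory use and run time depend on the number of unique pre-processed names, it can help to estimate that number offline before a run. The sketch below approximates the pre-processing described under Pre-Processing Rules later on this page (lowercase the name, drop the domain part, remove every regular expression match). It is an illustration only: the function name `preprocess`, the sample hostnames, and the rule of splitting at the first dot are assumptions, not IO's actual implementation.

```python
import re


def preprocess(names, patterns):
    """Lowercase, drop the domain part, then remove every regex match."""
    compiled = [re.compile(p) for p in patterns]
    result = []
    for name in names:
        # Assumption: the domain part starts at the first dot.
        short = name.lower().split(".", 1)[0]
        for rx in compiled:
            short = rx.sub("", short)
        result.append(short)
    return result


# Hypothetical inventory, including two VMs named vm-UUID.
names = [
    "VM-3f2a9c1e-7b4d-4e2a-9c1f-0a1b2c3d4e5f.corp.example.com",
    "vm-9d8c7b6a-1234-4abc-8def-001122334455.corp.example.com",
    "SQLPROD01.corp.example.com",
    "sqlprod02",
]

# Removes a trailing "-UUID" (8-4-4-4-12 hexadecimal digits).
uuid_suffix = r"-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$"

processed = preprocess(names, [uuid_suffix])
unique = sorted(set(processed))
print(len(unique), unique)
```

With this sample, the two vm-UUID names collapse to the single name vm, so four hosts yield three unique pre-processed names. The same pattern can then be tried in the analytic's own preview before starting a run.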

Running Application Discovery

  1. Start a new Application Discovery run by clicking New Analytic > Application Discovery or the Run New button.

    analytics-appd-1.png
  2. Select the algorithms you wish to run.

    1. Application Discovery. This algorithm matches host communication ports with the default communication ports of widely used applications. The algorithm works best when the applications are configured to use the default communication ports. Required integration: NetFlow.

    2. Entity Grouping Algorithms. These algorithms discover groups of hosts and VMs that can possibly form applications. Hostname Analysis is the only algorithm that works without NetFlow and NetStat data. It is recommended to run Application Discovery before running the entity grouping algorithms. There are three options in this category.

      1. Communication Heuristics. This algorithm discovers communication between different hosts, filters out unrelated traffic, and then identifies application components running on the hosts. The identified components are scored to find the relative confidence of the application. The results are presented as host and VM groups that can form applications. Required integrations: NetFlow and/or Operating System; both integrations are recommended.

      2. Used Ports and Services. This algorithm creates application suggestions based on well-known services (identified by port number), process names, and top talkers. The results are presented as host and VM groups that can form applications. The algorithm works best when the applications are configured to use the default communication ports. Required integrations: NetFlow and/or Windows, Linux, or Solaris Operating System.

      3. Hostname Analysis. This algorithm groups hosts and VMs based on hostname similarity. The algorithm works best when 30% of hosts and VMs have applications assigned. Required integration: none.

    You may optionally check the Use Advanced Options checkbox. There are two advanced options, shown below in the Advanced column.

    analytics-appd-2.png

    The first advanced option is self-explanatory.

    The second option applies to the Hostname Analysis algorithm. The following options are available.

    • Pre-Processing Rules - lets you supply one or more regular expressions that remove matching patterns from hostnames already discovered by IO. Before the regular expressions are applied, hostnames are converted to lowercase and split from the domain part of the hostname, if applicable. Click the "+" (plus) button to add additional regular expressions, up to a maximum of 20, and then select the Custom Regx option from the drop-down menu as shown below. A Preview Results button is available to validate the results of the applied regular expressions.

      vw-api-8.png

      Note

      Regular expressions are processed by Python. Refer to the Python documentation for supported regular expression syntax.

      As shown below, a count of processed host and VM names is shown. Should you need to decrease the number of processed hosts to overcome memory or run-time constraints, you may need to use a more targeted regular expression. For example, if your environment has lots of VMs with names such as vm-UUID, where UUID is a unique identifier, you should consider using a regular expression that eliminates the -UUID part of the name.

      analytics-appd-preview.png
    • Name Analysis Algorithms - a checkbox that automatically selects the appropriate algorithms and parameters to use based on available data. The available algorithms are ADA, a proprietary clustering algorithm, and HCA, a well-known hierarchical clustering algorithm. Both are described in more detail below.

      Important

      This control may be grayed out if fewer than 30% of the host entities have already been assigned to Application entities. If more than 30% of the host entities are already assigned to Application entities, this option automatically runs both ADA and HCA with different sets of parameter values across their respective ranges, and then compares the results with the groups of hosts already assigned to applications to identify the best matches of hosts and virtual machines to applications.

      • ADA, a proprietary clustering algorithm developed by Virtana. The Name Grouping Threshold control shown below impacts the size of discovered host and virtual machine groups. The larger the threshold, the more groups with fewer hosts and virtual machines are created. The recommended range for the threshold is 0.6 to 0.9. The default value is 0.75.

        analytics-appd-ada-control.png
      • HCA, a well-known, non-proprietary clustering algorithm that implements Hierarchical Clustering Analysis. The Name Grouping Threshold control shown below impacts the size of discovered host and virtual machine groups. The smaller the threshold, the more groups with fewer hosts and virtual machines are created. A value of 1 means that all hosts and virtual machines are placed in a single group. The recommended range for the threshold is 0.15 to 0.3. The default value is 0.22.

        analytics-appd-hca-control.png
  3. Select a time range for the export by clicking on the date field.

    The ideal date range for extracting the most granular data (5-minute) is 30 days. The granularity of data extracted for ranges over 30 days will be based on IO's data persistence policy.

    vwanalytics-migrationanalysis3.png
  4. Click Start Discovery to start the application discovery process.

    The process runs in the background. You can choose to be notified when it is complete by checking the Notify me when completed box.

    analytics-appd-3.png

    A message is displayed when the analysis is complete. Click the Export Results button to retrieve the results Zip file. A dialog containing further instructions is then displayed.

    analytics-appd-4.png

    The Zip file has a name similar to the following: Application_Discovery_Result_4ccdb60c-0d9e-4524-88dc-f81155fa0686.zip. As noted above, it contains two files: apps.csv and mapping.csv.
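To build intuition for how the HCA Name Grouping Threshold in step 2 affects the resulting groups, the following sketch clusters a handful of hostnames with a generic hierarchical (single-linkage) method. It is not Virtana's implementation: the distance measure (based on Python's `difflib.SequenceMatcher`), the linkage rule, the function names, and the sample hostnames are all assumptions, so the numeric thresholds here are not directly comparable to the values used in IO. Only the general behavior carries over: a smaller threshold yields more, smaller groups, and a threshold of 1 places every name in one group.

```python
from difflib import SequenceMatcher
from itertools import combinations


def distance(a, b):
    """Name distance in [0, 1]; 0 means the strings are identical."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def group_names(names, threshold):
    """Single-linkage agglomerative grouping.

    Repeatedly merge the two closest groups until the closest pair
    is farther apart than `threshold`.
    """
    groups = [[n] for n in names]
    while len(groups) > 1:
        best = None
        for i, j in combinations(range(len(groups)), 2):
            d = min(distance(a, b) for a in groups[i] for b in groups[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        groups[i].extend(groups[j])
        del groups[j]  # safe because i < j
    return groups


# Hypothetical pre-processed hostnames.
names = ["sqlprod01", "sqlprod02", "webprod01", "webprod02", "mailgw01"]

for threshold in (0.05, 0.22, 1.0):
    print(threshold, group_names(names, threshold))
```

With these sample names, the lowest threshold leaves every host in its own group, the middle one pairs the sqlprod and webprod hosts while leaving mailgw01 alone, and a threshold of 1 merges everything into a single group.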