HPE Diagnostics User Interface freezes or hangs

After you launch the HPE Diagnostics User Interface (UI) by clicking “Open Diagnostics” (or “Open in This Window”) and entering a user name and password, an action such as viewing data over a long period (days or weeks) for all Java Probes may cause the UI to freeze so that no further actions are possible. This issue may occur in larger Diagnostics environments that have many Mediators and numerous probes.

Once the Diagnostics UI has frozen, no recovery is possible and the browser tab displaying the UI must be closed. The UI can then be opened in a separate browser tab (or a new browser instance); however, the freeze is likely to recur.

A possible cause is that the Java applet that runs the Diagnostics UI has insufficient memory: while the UI is performing the user action, the applet process runs out of memory and terminates. When this process terminates, the UI remains visible in the browser but is no longer functional.

To confirm that the Java applet process is terminating, open the Windows Task Manager and locate the process named “jp2launcher.exe”. This process will appear while the Diagnostics UI is being initialised (after the user name and password are validated):

[Image: Windows Task Manager showing the “jp2launcher.exe” process]

If “jp2launcher.exe” is no longer listed when the Diagnostics UI freezes, the process may have run out of memory.
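The same check can be scripted rather than done by eye in Task Manager. The following is a minimal sketch, assuming a Windows client where the standard `tasklist` command is available; the function names are illustrative and are not part of Diagnostics. It parses the CSV output of `tasklist` and returns the PID and memory usage of every “jp2launcher.exe” process, so an empty result after a freeze suggests that the applet process has terminated.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output, image_name="jp2launcher.exe"):
    """Return (pid, memory_kb) for each matching row of `tasklist /FO CSV /NH` output."""
    matches = []
    for row in csv.reader(io.StringIO(output)):
        # Data rows have five columns: image name, PID, session name,
        # session number and memory usage (for example "262,144 K").
        if len(row) == 5 and row[0].lower() == image_name.lower():
            digits = "".join(ch for ch in row[4] if ch.isdigit())
            matches.append((int(row[1]), int(digits) if digits else 0))
    return matches


def find_applet_processes():
    """Windows only: run tasklist filtered on the applet launcher and parse the result."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH", "/FI", "IMAGENAME eq jp2launcher.exe"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

Calling `find_applet_processes()` at intervals while repeating the action that causes the freeze also shows how the process memory grows before the process disappears.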

A workaround for this issue is to increase the heap (memory) available to the Java applet. This can be done by modifying the Java runtime parameters as follows:

1) Open the Windows Control Panel and search for “Java” to locate the “Java (32-bit)” Control Panel entry. Click on the entry to open the Java Control Panel:

[Image: Java Control Panel]

Note: If there is no “Java (32-bit)” Control Panel entry, this may be because multiple Java versions are installed on the client platform. Locate the 32-bit Java installation in a folder under “C:\Program Files (x86)” (typically “C:\Program Files (x86)\Java\jre7\bin”) and run the file “javacpl.exe” by right-clicking it and selecting “Run as administrator”.

2) Click on the “Java” tab and then on the “View” button to open the “Java Runtime Environment Settings” dialog.

3) The default heap size is typically 256MB; however, this may be system dependent. The actual process memory size can be seen in the Windows Task Manager before the process terminates. Specify a larger heap size, test whether the UI still freezes, and increase the size as needed. The following example configures a 700MB heap; however, up to 1GB (“-Xmx1G”) may be used (because this is a 32-bit process, memory is limited):

[Image: “Java Runtime Environment Settings” dialog with the heap size parameter set]
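For reference, the heap size is set with the standard JVM option “-Xmx”, entered in the “Runtime Parameters” column of the 32-bit JRE row in the “Java Runtime Environment Settings” dialog. The small helper below is only a sketch of how that value is formed; the function name and the cap constant are illustrative assumptions, with the cap taken from the 1GB limit mentioned above.

```python
MAX_HEAP_MB = 1024  # upper limit suggested above for the 32-bit applet process


def heap_parameter(megabytes):
    """Return the -Xmx runtime parameter for the requested heap size, capped at 1GB."""
    if megabytes <= 0:
        raise ValueError("heap size must be positive")
    return f"-Xmx{min(megabytes, MAX_HEAP_MB)}m"


print(heap_parameter(700))  # the string to enter as the runtime parameter
```

After the parameter is changed, the browser needs to be restarted so that a new applet process picks up the setting.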

Reduce instrumentation overhead

First of all, make sure that what you are instrumenting is appropriate. For example, the basic recommendation is not to instrument get/set calls: these simply return or set a single value and are very fast. Transactions with a very small transaction time do not need to be instrumented for performance.

Then note that Diagnostics is designed to use the level of instrumentation that will provide adequate information to troubleshoot a temporary or hard to reproduce performance issue while imposing a low overhead that can be tolerated in most production environments.

To achieve this goal, Diagnostics provides two mechanisms which automatically adjust data collection in response to the performance characteristics of the currently executing server request.

The first such mechanism is latency-based trimming. If a particular invocation of an instrumented method is fast, the invocation is not reported (there will be no corresponding node in the Call Profile). This cuts the overhead substantially, as the Diagnostics Agent does not have to create the necessary object and place it in the call tree. At the same time, it is assumed that such fast calls are of no interest to a user who wants to pinpoint performance issues. You can adjust the reporting threshold (51 ms by default) to eliminate some of these fast calls (shown as very thin bars in the Call Profile). These calls have relatively high overhead and probably do not provide any useful information for diagnosing performance issues.
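The effect of latency-based trimming on a call tree can be illustrated with a short sketch. This is not the Diagnostics Agent's implementation: the `CallNode` type and the `trim` function are hypothetical, the comparison (keep calls at or above the threshold) is an assumption, and the sketch trims an already-built tree for clarity, whereas the agent avoids creating the node in the first place.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallNode:
    method: str
    latency_ms: float
    children: List["CallNode"] = field(default_factory=list)


def trim(node, threshold_ms=51.0):
    """Return a copy of the call tree without invocations faster than threshold_ms."""
    kept = [
        trim(child, threshold_ms)
        for child in node.children
        if child.latency_ms >= threshold_ms
    ]
    # The root (the server request itself) is always kept.
    return CallNode(node.method, node.latency_ms, kept)


def methods(node):
    """List the method names in the tree, parents before children."""
    return [node.method] + [m for child in node.children for m in methods(child)]


request = CallNode("doGet", 400, [
    CallNode("loadUser", 120, [CallNode("getName", 0.2)]),
    CallNode("formatDate", 3),
    CallNode("runQuery", 250),
])
print(methods(trim(request)))
```

In this made-up example the fast `getName` and `formatDate` calls are dropped, while the slow calls that matter for diagnosis remain.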

Another automatic data collection mechanism is stack trace sampling (for Java 1.5 or later). This feature reports long-running methods even if they are not instrumented. Thus, by enabling this feature and tuning it to provide an adequate level of information, the user can turn off some of the instrumentation and trust that any potential performance issues in this module will be reported by stack trace sampling.
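The idea behind sampling can be shown with a conceptual sketch: if the same method appears in several consecutive stack snapshots, it has probably been running for about that many sampling intervals, whether or not it is instrumented. The function below is an illustration only, not how Diagnostics implements the feature; the function name, the input format (a list of stacks, each a list of method names) and the parameters are assumptions.

```python
def long_running_methods(samples, interval_ms, min_ms):
    """Estimate how long each method ran from periodic stack snapshots.

    A method seen in N consecutive snapshots is estimated to have run for
    N * interval_ms. Only methods whose longest run reaches min_ms are reported.
    """
    longest = {}
    current = {}
    for stack in samples:
        # Methods missing from this snapshot start again from zero.
        current = {method: current.get(method, 0) + 1 for method in set(stack)}
        for method, run in current.items():
            longest[method] = max(longest.get(method, 0), run)
    return {
        method: run * interval_ms
        for method, run in longest.items()
        if run * interval_ms >= min_ms
    }


snapshots = [
    ["service", "parse"],
    ["service", "query"],
    ["service", "query"],
    ["service", "query"],
    ["service", "render"],
]
print(long_running_methods(snapshots, interval_ms=50, min_ms=150))
```

With these made-up snapshots, `query` is reported even though nothing in the sketch depends on it being instrumented, while the short `parse` and `render` calls are not.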

As far as light-weight code injection is concerned, we do exactly that. Our instrumentation is as light-weight as possible. One should realize, though, that a major portion of the overhead is caused just by taking a time stamp (which is necessary to calculate the latency).

Diagnostics thresholds

Each numeric metric for an entity (CPU of a host, heap used in a VM…) can have a threshold value set. The threshold is evaluated against the metric data points received, usually every 5 seconds. A metric with a threshold set has one of the following status levels: Green, Yellow or Red. The entity’s status is derived from all its metric statuses according to worst-child rules (if any metric for the entity is red, the entity is red).

As long as the metric value does not exceed the threshold, the status remains Green. If 3 or more metric data points are beyond the threshold, the status turns to Yellow. If the average metric value within the last 5 minutes is beyond the threshold, the status becomes Red. Once the 5-minute average goes below the threshold, the status becomes Green again. Note that the Diagnostics status does not revert to Yellow; it goes directly back to Green.
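The status rules can be sketched as a small state machine. This is an interpretation, not the Diagnostics server's code, and it makes several assumptions that the text does not confirm: “beyond the threshold” means greater than the threshold, the 3 data points must be consecutive, a single data point at or below the threshold returns Yellow to Green, and Red is only evaluated once a full 5 minutes of data has been received. The class and parameter names are hypothetical.

```python
from collections import deque

GREEN, YELLOW, RED = "Green", "Yellow", "Red"


class ThresholdStatus:
    def __init__(self, threshold, interval_s=5, window_s=300, yellow_after=3):
        self.threshold = threshold
        # Number of data points that make up the averaging window (60 by default).
        self.window = deque(maxlen=window_s // interval_s)
        self.yellow_after = yellow_after
        self.violations = 0  # consecutive points above the threshold (assumption)
        self.status = GREEN

    def add(self, value):
        """Record one metric data point and return the resulting status."""
        self.window.append(value)
        self.violations = self.violations + 1 if value > self.threshold else 0
        full = len(self.window) == self.window.maxlen
        average = sum(self.window) / len(self.window)
        if full and average > self.threshold:
            self.status = RED
        elif self.status == RED:
            # Leaving Red goes directly back to Green, never to Yellow.
            self.violations = 0
            self.status = GREEN
        elif self.violations >= self.yellow_after:
            self.status = YELLOW
        else:
            self.status = GREEN
        return self.status


# Shortened 25-second window (5 points) so that the transitions are easy to follow.
tracker = ThresholdStatus(threshold=80, interval_s=5, window_s=25)
history = [tracker.add(v) for v in [10, 10, 10, 10, 10, 90, 90, 90, 90, 90, 10]]
print(history)
```

In this run the third high value turns the status Yellow, the status turns Red once the window average exceeds the threshold, and the first low value afterwards brings the average down and returns the status directly to Green.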

The threshold values for metrics are configurable in the UI (details pane) and some metrics also have default thresholds set. The default threshold configuration is set in the server’s etc directory in thresholds.configuration.

If you need to set thresholds on specific methods, add a separate entry in the points file for each method; this allows you to set up thresholds and alerts for that specific method.

Get user access information

You can get a list of the active users seen by the Diagnostics server in the last 60 seconds. You can also see the Queries/sec value, which indicates how much load each user generates with summary or trend queries.

From the main Diagnostics UI, select Configure Diagnostics; the Components page is displayed. (You can also access the Components page by selecting the Maintenance link in any Diagnostics view.) Select the query link and then select the Active Users link at the bottom of that page to display a list of active users. This data is also available under the Mercury System groupby.

Identify load balancing issues in a cluster

Assuming you’ve put all probes for the JVMs in the cluster into the same probe group, in the Aggregate Server Request view you can add the count metric to the entity table, which tells you the total number of requests across the cluster. You can drill into the aggregate server request to see the server request performance in each JVM. Again, use the count metric to see the number of server request instances for each JVM.
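Once the per-JVM counts have been read from the UI, comparing them is simple arithmetic. The sketch below assumes the counts have been copied by hand into a dictionary; the JVM names and numbers are made up for illustration, and the function name is hypothetical.

```python
def request_share(counts):
    """Return each JVM's fraction of the total request count."""
    total = sum(counts.values())
    if total == 0:
        return {jvm: 0.0 for jvm in counts}
    return {jvm: count / total for jvm, count in counts.items()}


# Hypothetical count metric values for one server request in a three-JVM cluster.
counts = {"jvm1": 400, "jvm2": 380, "jvm3": 20}
for jvm, share in request_share(counts).items():
    print(f"{jvm}: {share:.1%}")
```

In an evenly balanced three-JVM cluster each share would be close to one third, so a JVM that receives only a small fraction of the requests, as `jvm3` does in this example, is worth investigating.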

Diagnostics and SiteScope integration

What port does the Diagnostics/SiteScope integration use? You point SiteScope to a Diagnostics MEDIATOR on the standard port, 2006. For example, you’d set Receiver URL to: http://meditor.customer.com:2006/metricdata/siteScopeData.

And once I tag an existing SiteScope monitor with this Diagnostics integration, will I need to do any restarts or touch my files? No, there is no need to bounce any servers or touch any files.

When you first try to view SiteScope data in the Diagnostics External Monitors view, by default the monitor’s status is gray and no data is graphed. To see a status (red, yellow, green), you must first set a threshold on a metric (in the details pane). To see data in the graph, you must first select a metric to be charted (in the details pane).