Exception Signal 11 while starting the Java probe with WebSphere

Immediately upon starting the WebSphere application server with the Diagnostics probe, the JVM crashes, and "Exception Signal 11" is observed in the WebSphere logs and in the crash dump file.

For example:

JVMDG217: Dump Handler is Processing Signal 11 - Please Wait.

JVMDG303: JVM Requesting Java core file

JVMDG304: Java core file written to /opt/WebSphere/AppServer/javacore.20101129.161814.22608.txt

JVMDG215: Dump Handler has Processed Exception Signal 11.

"signal 11 received" is seen in java core dump logs too.

Process-level CPU usage metrics are not supported by the probe on Linux with a 1.4.2 JVM. The probe should detect this at startup, but in some cases it does not, which leads to this crash.

Comment out the following metrics in the etc/metrics.config file and restart the application/probe:

ProcessMetrics/processCpuUtil=ProcessCpuUtil|percent|Probe

ProcessMetrics/processCpuUtilAbs = ProcessCpuUtilAbs|percent|Probe
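For example, after commenting (assuming "#" marks a comment in metrics.config, as in standard Java properties files), the entries would read:

#ProcessMetrics/processCpuUtil=ProcessCpuUtil|percent|Probe

#ProcessMetrics/processCpuUtilAbs = ProcessCpuUtilAbs|percent|Probe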


Error: “RoleBasedAuth E SECJ0306E: No received or invocation credential exist on the thread. The Role based authorization check …”

When using the Diagnostics 8.04 Java Agent to probe a WebSphere 6.1 Application Server (WAS), the following error messages may be written repeatedly to the WAS SystemOut.log:

[3/19/11 18:35:09:625 CET] 00000028 RoleBasedAuth E SECJ0306E: No received or invocation credential exist on the thread. The Role based authorization check will not have an accessId of the caller to check. The parameters are: access check method getState on resource Server and module Server. The stack trace is java.lang.Exception: Invocation and received credentials are both null

at com.ibm.ws.security.role.RoleBasedAuthorizerImpl.checkAccess(RoleBasedAuthorizerImpl.java:287)

at com.ibm.ws.management.AdminServiceImpl.preInvoke(AdminServiceImpl.java:2061)

at com.ibm.ws.management.AdminServiceImpl.preInvoke(AdminServiceImpl.java:1900)

at com.ibm.ws.management.AdminServiceImpl.preInvoke(AdminServiceImpl.java:1800)

at com.ibm.ws.management.AdminServiceImpl.preInvoke(AdminServiceImpl.java:1773)

at com.ibm.ws.management.AdminServiceImpl.getAttribute(AdminServiceImpl.java:735)

at com.ibm.ws.management.AdminServiceImpl.getAttribute(AdminServiceImpl.java:702)

at com.ibm.ws.management.PlatformMBeanServer.getAttribute(PlatformMBeanServer.java:662)

at com.mercury.diagnostics.capture.metrics.jmx.WebSphereJMXCollector.isStartupCompleted(WebSphereJMXCollector.java:158)

at com.mercury.diagnostics.capture.metrics.jmx.WebSphereJMXCollector.doInitialize(WebSphereJMXCollector.java:65)

at com.mercury.diagnostics.capture.metrics.jmx.JMXCollector.initialize(JMXCollector.java:139)

at com.mercury.diagnostics.capture.metrics.CollectorControl.initialize(CollectorControl.java:385)

at com.mercury.diagnostics.capture.metrics.CollectorAgent.validateInitialization(CollectorAgent.java:912)

at com.mercury.diagnostics.capture.metrics.CollectorAgent.run(CollectorAgent.java:681)

at java.lang.Thread.run(Thread.java:810)

.

[3/19/11 18:35:09:633 CET] 00000028 RoleBasedAuth A SECJ0305I: The role-based authorization check failed for admin-authz operation Server:getState. The user UNAUTHENTICATED (unique ID: unauthenticated) was not granted any of the following required roles: adminsecuritymanager, operator, iscadmins, deployer, administrator, monitor, configurator.

[3/19/11 18:35:09:695 CET] 00000028 ServiceLogger I com.ibm.ws.ffdc.IncidentStreamImpl initialize FFDC0009I: FFDC opened incident stream file /prod/IBM/websphere61/AppServer/profiles/mpeprodmpeasp06/logs/ffdc/ap1ga0f1_00000028_11.03.19_18.35.09_0.txt

[3/19/11 18:35:09:711 CET] 00000028 ServiceLogger I com.ibm.ws.ffdc.IncidentStreamImpl resetIncidentStream FFDC0010I: FFDC closed incident stream file /prod/IBM/websphere61/AppServer/profiles/mpeprodmpeasp06/logs/ffdc/ap1ga0f1_00000028_11.03.19_18.35.09_0.txt

[3/19/11 18:35:09:771 CET] 00000028 ServiceLogger I com.ibm.ws.ffdc.IncidentStreamImpl open FFDC0009I: FFDC opened incident stream file /prod/IBM/websphere61/AppServer/profiles/mpeprodmpeasp06/logs/ffdc/ap1ga0f1_00000028_11.03.19_18.35.09_1.txt

[3/19/11 18:35:09:787 CET] 00000028 ServiceLogger I com.ibm.ws.ffdc.IncidentStreamImpl resetIncidentStream FFDC0010I: FFDC closed incident stream file /prod/IBM/websphere61/AppServer/profiles/mpeprodmpeasp06/logs/ffdc/ap1ga0f1_00000028_11.03.19_18.35.09_1.txt

WebSphere 6.1 uses role-based security to protect access to the MBeanServer when administrative security is enabled. This causes security exceptions when the Diagnostics JMX collectors access the MBeanServer. A workaround for this issue is included in the Diagnostics WebSphere6JMXCollector. However, when the Diagnostics Java Agent runs with WAS 6.1, both the WebSphere5JMXCollector and the WebSphere6JMXCollector find the same WAS MBeanServer instance, and the errors are reported because the WebSphere5JMXCollector does not include the required workaround.

To remove this error, edit the metrics.config file in the probe's etc configuration directory and either comment out or delete all metrics entries for the WebSphere 5.x collector, that is, all entries beginning with "WebSphere5", for instance:

WebSphere5/beanModule.creates = EJB Creates|count|EJB
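As with the earlier fix, each such entry can be disabled with a leading "#" (again assuming "#" comments are honored in metrics.config), for example:

#WebSphere5/beanModule.creates = EJB Creates|count|EJB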

Diagnostics Commander jetty.log shows several out-of-threads error messages

If there are several "OUT OF THREADS" and "LOW ON THREADS" messages in the Commander jetty.log file, and increasing jetty.threads.max from 200 to 600 in the webserver.properties file does not resolve the problem, the cause is that there are not enough connections for the number of probes: each mediator needs 40 connections per probe, plus 40.
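As a rough check of that formula: a mediator serving 20 probes would need 20 × 40 + 40 = 840 connections, which already exceeds both the default jetty.threads.max of 200 and the increased value of 600.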

Steps to resolve the out-of-threads issue on the Diagnostics Commander server:

• Edit the file "<MercuryDiagnostics_Install>\Server\etc\webserver.properties" and set:

jetty.threads.max=1000

• Stop the Diagnostics application running on the Diagnostics Commander server.

• Edit the file "<MercuryDiagnostics_Install>\Server\nanny\windows\dat\nanny\server.nanny" and replace the line:

start_nt=<MercuryDiagnostics_Install>\Server\jre\bin\javaw.exe^ -server -Xmx1200m -Dsun.net.client.defaultReadTimeout=70000 -Dsun.net.client.defaultConnectTimeout=30000 "-javaagent:<MercuryDiagnostics_Install>\Server\probe\lib\probeagent.jar" -classpath "<MercuryDiagnostics_Install>\Server\lib\mediator.jar;<MercuryDiagnostics_Install>\Server\lib\loading.jar;<MercuryDiagnostics_Install>\Server\lib\common.jar;<MercuryDiagnostics_Install>\Server\lib\mercury_picocontainer-1.1.jar" com.mercury.opal.mediator.util.DiagnosticsServer

with:

start_nt=<MercuryDiagnostics_Install>\Server\jre\bin\javaw.exe^ -server -Xmx1024m -Xms1024m -XX:MaxNewSize=448m -XX:NewSize=448m -XX:SurvivorRatio=6 -Dsun.net.client.defaultReadTimeout=70000 -Dsun.net.client.defaultConnectTimeout=30000 "-javaagent:<MercuryDiagnostics_Install>\Server\probe\lib\probeagent.jar" -classpath "<MercuryDiagnostics_Install>\Server\lib\mediator.jar;<MercuryDiagnostics_Install>\Server\lib\loading.jar;<MercuryDiagnostics_Install>\Server\lib\common.jar;<MercuryDiagnostics_Install>\Server\lib\mercury_picocontainer-1.1.jar" com.mercury.opal.mediator.util.DiagnosticsServer

• Start the Diagnostics application on the Diagnostics Commander server.

Where "<MercuryDiagnostics_Install>" is, for example, "E:\MercuryDiagnostics".

Diagnostics support on VMware

It should be possible to install the Diagnostics Server on a VMware virtual machine if the machine's MAC address is static.

The Diagnostics Server's license depends on the machine's MAC address, which by default is dynamic on a VMware virtual machine. Contact your system administrator to make the VMware MAC address static. The following VMware knowledge base article describes how to do so:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=219
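If the virtual machine's configuration (.vmx) file is edited directly, the relevant settings look roughly like the following sketch (the MAC address shown is only a placeholder; a manually assigned static MAC must fall within the range VMware reserves for this purpose):

ethernet0.addressType = "static"

ethernet0.address = "00:50:56:00:00:01"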

Installing the Mediator component should not be a problem from a licensing point of view, as the Diagnostics license uses the Diagnostics Server's MAC address, not the Mediator's.

Connection Failure: The connection to the named instance has failed

The following error occurs when a named instance is specified for the SQL Server collector in sqlserver-config.xml:

Connection Failure: The connection to the named instance has failed. Error: java.net.SocketTimeoutException: Receive timed out.

The connection to the database fails when an instanceName is present because of a bug in the JDBC driver.
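For illustration only, the kind of entry that triggers the timeout looks roughly like this (the element and attribute names here are hypothetical placeholders; consult the comments in your own sqlserver-config.xml for the actual schema):

<sqlserver hostname="dbhost01" instanceName="SQLINST1" />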

The issue is fixed in Diagnostics 8.03 and later.

List of probes displayed in the Diagnostics UI differs from that displayed by the registrar

In an environment containing several hundred HP Diagnostics probes, you may compare the list of probes displayed in the Diagnostics UI on the Commander server (view "Entire Enterprise > Probes") with the list displayed by the registrar at:

http://<Commander_name>:2006/registrar/view_components (filtered to show only probes)

In some circumstances, more probes appear in the Diagnostics UI view than in the registrar.

The list of probes displayed in the Diagnostics UI "Entire Enterprise > Probes" view is based on the information contained in the Diagnostics time series database (TSDB) and is consequently a historical view of the probes present in a Diagnostics environment. In particular, the TSDB may contain data for probes that have since been decommissioned or renamed, so some probes may appear in the Diagnostics UI that are no longer present in the environment.

By contrast, the registrar view returned by:

http://<Commander_name>:2006/registrar/view_components

is based on the current Diagnostics environment. It shows whether each probe is active at the time the report is generated and more accurately reflects the probes present at that time, so it is a more reliable report than the one produced by the Diagnostics UI.

However, any probes that have lost connectivity to their Mediator or Commander for an extended period may not be listed at all; this may warrant further investigation to determine the underlying reason.