ClassNotFoundException when trying to start topology - java

This is a strange problem that I'm hoping someone's encountered and overcome.
I have a relatively simple topology:
Spout -> BoltA -> BoltB
               -> BoltC
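For reference, buildTopology wires things up roughly like this (a minimal sketch, not the actual code; the component ids, bolt class names, and shuffle groupings are my assumptions):
// Sketch of the wiring; all names here are illustrative.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new TrafficSpout(source, service));
builder.setBolt("boltA", new BoltA()).shuffleGrouping("spout");
// BoltB and BoltC both subscribe to BoltA's output
builder.setBolt("boltB", new BoltB()).shuffleGrouping("boltA");
builder.setBolt("boltC", new BoltC()).shuffleGrouping("boltA");
return builder.createTopology();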
If I use bolts A and B, it works. A and C works too. Enabling all three, though, gives me this:
16:05:20.627 [main] WARN backtype.storm.daemon.nimbus - Topology submission exception. (topology name='traffic') #<RuntimeException java.lang.RuntimeException: java.lang.ClassNotFoundException: backtype.storm.daemon.nimbus$normalize_conf$get_merged_conf_val__3916$fn__3917>
16:05:20.634 [main] ERROR o.a.zookeeper.server.NIOServerCnxn - Thread Thread[main,5,main] died
java.lang.RuntimeException: java.lang.ClassNotFoundException: backtype.storm.daemon.nimbus$normalize_conf$get_merged_conf_val__3916$fn__3917
at backtype.storm.utils.Utils.deserialize(Utils.java:88) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at backtype.storm.daemon.nimbus$read_storm_conf.invoke(nimbus.clj:89) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at backtype.storm.daemon.nimbus$start_storm.invoke(nimbus.clj:724) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at backtype.storm.daemon.nimbus$eval3974$exec_fn__1459__auto__$reify__3987.submitTopologyWithOpts(nimbus.clj:962) ~[na:na]
at backtype.storm.daemon.nimbus$eval3974$exec_fn__1459__auto__$reify__3987.submitTopology(nimbus.clj:971) ~[na:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_55]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_55]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_55]
I'm running it locally using the code below:
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(topoName, conf, buildTopology(source, service));
try {
    Thread.sleep(10000);
} catch (InterruptedException e) {
    throw new RuntimeException(e);
}
cluster.killTopology(topoName);
cluster.shutdown();
I'm running against storm-core-0.9.1-incubating.jar
All I could find was this discussion about it "being a classpath issue". The fact that it only happens after adding a third bolt, but it doesn't matter which bolt, really confuses me. Is there something that Storm does once it has a "sufficiently complex topology" that kicks in at 3 bolts or 4 components?
EDIT
Well, it gets weirder. After finding someone with the same problem, I tried their solution and added this to my logback.xml:
<logger name="backtype.storm.daemon.nimbus" level="INFO" />
which fixed the problem. Anything higher (WARN, ERROR) brings it back; anything lower (DEBUG) is fine. Removing this line and setting the root logger to DEBUG brings the problem back, so I'm really confused. If something in the code path depends on logger.isInfoEnabled(), shouldn't it behave the same for rootLogger=INFO and backtype.storm.daemon.nimbus=INFO?
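For anyone comparing notes, the working configuration looks roughly like this (a sketch; only the nimbus <logger> line is the actual fix, while the appender and root level are just my assumptions about a typical setup):
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- the one line that makes the ClassNotFoundException go away -->
  <logger name="backtype.storm.daemon.nimbus" level="INFO" />

  <root level="WARN">
    <appender-ref ref="STDOUT" />
  </root>
</configuration>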

Related

How to know which activity an error in my production application comes from?

My application has been in production for 2 months and I was checking the section of the Play Console that shows crashes. I have been able to solve some problems, but there are some where I do not know exactly which activity they come from. My error is java.lang.IllegalStateException, and when I press "see more":
java.lang.IllegalStateException:
at androidx.work.impl.utils.ForceStopRunnable.run (ForceStopRunnable.java:115)
at androidx.work.impl.utils.SerialExecutor$Task.run (SerialExecutor.java:91)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:641)
at java.lang.Thread.run (Thread.java:919)
Caused by: android.database.sqlite.SQLiteCantOpenDatabaseException:
at android.database.sqlite.SQLiteConnection.nativeOpen (Native Method)
at android.database.sqlite.SQLiteConnection.open (SQLiteConnection.java:300)
at android.database.sqlite.SQLiteConnection.open (SQLiteConnection.java:218)
at android.database.sqlite.SQLiteConnectionPool.openConnectionLocked (SQLiteConnectionPool.java:737)
at android.database.sqlite.SQLiteConnectionPool.open (SQLiteConnectionPool.java:284)
at android.database.sqlite.SQLiteConnectionPool.open (SQLiteConnectionPool.java:251)
at android.database.sqlite.SQLiteDatabase.openInner (SQLiteDatabase.java:1386)
at android.database.sqlite.SQLiteDatabase.open (SQLiteDatabase.java:1331)
at android.database.sqlite.SQLiteDatabase.openDatabase (SQLiteDatabase.java:967)
at android.database.sqlite.SQLiteDatabase.openDatabase (SQLiteDatabase.java:955)
at android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked (SQLiteOpenHelper.java:448)
at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase (SQLiteOpenHelper.java:391)
at androidx.sqlite.db.framework.FrameworkSQLiteOpenHelper$OpenHelper.getWritableSupportDatabase (FrameworkSQLiteOpenHelper.java:145)
at androidx.sqlite.db.framework.FrameworkSQLiteOpenHelper.getWritableDatabase (FrameworkSQLiteOpenHelper.java:106)
at androidx.room.RoomDatabase.beginTransaction (RoomDatabase.java:352)
at androidx.work.impl.utils.ForceStopRunnable.cleanUp (ForceStopRunnable.java:156)
at androidx.work.impl.utils.ForceStopRunnable.run (ForceStopRunnable.java:87)
Out of 750 users who have installed my application, it happened to one of them 5 times. Should a bug that happened to a single user (out of 750) be investigated or not? And, most importantly, how can I know exactly where the error came from?
When throwing an exception, include the user that is currently being processed so you can track it down. Put as much info in the exception as possible that will help you figure out which item might be causing it.
https://stackify.com/best-practices-exceptions-java/
These are some good examples as well as good practices to follow. In those messages, add the user id or something else that helps you identify what's going on. Learn to write custom exceptions, and catch the most specific exceptions you can. Don't just catch Exception; it's better to be specific about the exception you're catching than to cast a wide net.
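A minimal sketch of that advice, assuming a hypothetical exception class and field names (none of this is from the question):
// Hypothetical custom exception that carries enough context to trace
// a Play Console crash back to a specific user and operation.
public class WorkCleanupException extends RuntimeException {
    public WorkCleanupException(String userId, String operation, Throwable cause) {
        super("Cleanup failed for user=" + userId + " during " + operation, cause);
    }
}

// Usage: catch the specific exception and rethrow it with context,
// instead of letting a bare IllegalStateException reach the crash report.
try {
    workDatabase.beginTransaction();
} catch (SQLiteCantOpenDatabaseException e) {
    throw new WorkCleanupException(currentUserId, "database cleanup", e);
}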

How to debug why the map job fails after multiple retries

I wrote a MapReduce job to scan an HBase table over a certain time range to count elements we need for analysis.
The mappers in the MR job keep failing, but I don't know why. Each time I run the job, a different number of mappers fail. The YARN log (see below) from Cloudera Manager isn't helpful in pinpointing the problem, although someone said I might be running out of memory.
It seems to retry multiple times but fails each time. What do I need to do to stop it failing, or how can I log things to help me determine what is happening?
Below is a log from YARN for one of the mappers that failed.
Error: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Thu Jun 15 16:26:57 PDT 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60301: row '152_p3401.db161139.sjc102.dbi_1496271480' on table 'dbi_based_data' at region=dbi_based_data,151_p3413.db162024.iad4.dbi_1476974340,1486675565213.d83250d0682e648d165872afe5abd60e., hostname=hslave35118.ams9.mysecretdomain.com,60020,1483570489305, seqNum=19308931
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:207)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:236)
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:216)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60301: row '152_p3401.db161139.sjc102.dbi_1496271480' on table 'dbi_based_data' at region=dbi_based_data,151_p3413.db162024.iad4.dbi_1476974340,1486675565213.d83250d0682e648d165872afe5abd60e., hostname=hslave35118.ams9.mysecretdomain.com,60020,1483570489305, seqNum=19308931
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Call to hslave35118.ams9.mysecretdomain.com/10.216.35.118:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=12, waitTime=60001, operationTimeout=60000 expired.
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:291)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:219)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:64)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:360)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:334)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
... 4 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=12, waitTime=60001, operationTimeout=60000 expired.
at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:73)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
... 13 more
So it looks like in my case I needed to extend the timeout settings. In my Java program I had to add the following lines to make the exception go away:
conf.set("hbase.rpc.timeout","90000");
conf.set("hbase.client.scanner.timeout.period","90000");
The answer was found at this link on Cloudera's site.
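For context, those two settings go on the Configuration the job is built from, roughly like this (a sketch; the job name, mapper class, output types, and time range are illustrative, not from the question):
Configuration conf = HBaseConfiguration.create();
// raise both timeouts past the 60000 ms default that shows up in the trace
conf.set("hbase.rpc.timeout", "90000");
conf.set("hbase.client.scanner.timeout.period", "90000");

Scan scan = new Scan();
scan.setTimeRange(startMillis, endMillis);  // the analysis window

Job job = Job.getInstance(conf, "dbi-count");
TableMapReduceUtil.initTableMapperJob(
    "dbi_based_data", scan, CountMapper.class,
    ImmutableBytesWritable.class, LongWritable.class, job);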

OPC client issue

I am getting the following error in the OPC client code.
I start my client, close it, and start it again, and then I see the following error.
It is clear that something from the previous run is causing it, but I am unable to figure out what.
When I diff the jstack output from my first run against the output after closing it, I do not see any running OPC threads.
Has anyone seen this issue? Or is there some other way I can debug it?
2016-05-19 16:35:53,564 WARN [netty-event-loop-0] io.netty.channel.ChannelInitializer - Failed to initialize a channel. Closing: [id: 0xe25cac5b]
java.lang.ExceptionInInitializerError
at com.digitalpetri.opcua.stack.client.UaTcpStackClient$1.initChannel(UaTcpStackClient.java:340)
at com.digitalpetri.opcua.stack.client.UaTcpStackClient$1.initChannel(UaTcpStackClient.java:337)
at io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119)
at io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:449)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:377)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:423)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: 'awaiting-handshake' is already in use
at io.netty.util.UniqueName.<init>(UniqueName.java:53)
at io.netty.util.AttributeKey.<init>(AttributeKey.java:47)
at io.netty.util.AttributeKey.valueOf(AttributeKey.java:39)
at com.digitalpetri.opcua.stack.client.handlers.UaTcpClientAcknowledgeHandler.<clinit>(UaTcpClientAcknowledgeHandler.java:44)
... 13 more
Looks like you might have some kind of ClassLoader issue: a static final field of UaTcpClientAcknowledgeHandler is somehow being initialized twice.
What exactly happens when you "close" your client?
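For illustration, on the Netty 4.0.x line shown in the trace, AttributeKey names were globally unique, so re-running the static initializer reproduces exactly this failure (a sketch assuming that older Netty version; newer releases pool keys and don't throw):
// On old Netty 4.0.x, the second valueOf() with the same name threw,
// because AttributeKey extended UniqueName and names registered only once.
AttributeKey<Object> first = AttributeKey.valueOf("awaiting-handshake");   // ok
AttributeKey<Object> second = AttributeKey.valueOf("awaiting-handshake");  // IllegalArgumentException: 'awaiting-handshake' is already in use
// UaTcpClientAcknowledgeHandler makes this call in a static initializer,
// so if the class is loaded twice (e.g. by two ClassLoaders while Netty
// itself is shared), the registration runs twice and fails the second time.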

Hadoop Configuration Error

I am attempting to start up my Hadoop application; however, upon startup I am seeing this in the log files. Does anyone have a clue as to what the problem is?
Creating filesystem for hdfs://10.170.4.141:9000
java.io.IOException: config()
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:229)
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:216)
at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:60)
at org.apache.hadoop.net.NetUtils.makeSocketAddr(NetUtils.java:188)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:168)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:198)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:88)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1413)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:68)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1431)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:256)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:125)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:240)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372)
at org.blismedia.VolumeReportGenerateUpdates.main(VolumeReportGenerateUpdates.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
I think you're running into HADOOP-2851. This "error" can safely be ignored.
Apparently, Configuration's constructor logs an exception to the debug log despite no exception actually being thrown. Why? Your guess is as good as mine. But the issue was resolved in their tracker as Won't Fix. "It's a feature, not a bug."
public Configuration(boolean loadDefaults) {
    if (LOG.isDebugEnabled()) {
        LOG.debug(StringUtils.stringifyException(new IOException("config()")));
    }
    // ...
}
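If the noise bothers you, one option is to keep that logger above DEBUG; a sketch, assuming a stock log4j.properties setup:
# hide the spurious "config()" trace without lowering logging elsewhere
log4j.logger.org.apache.hadoop.conf.Configuration=INFO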

Java Quartz Ibatis Cron Issues

I have a Java webapp using an iBATIS row handler to load a very large dataset (1 million rows in an InnoDB table). The process is run as a nightly cron job by the Quartz scheduler. However, after it has been processing for 6 minutes, it dies with the following stack trace:
WARN [DefaultQuartzScheduler_Worker-8] MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(168) | Could not invoke method 'doBatch' on target object [org.myCron#4adb34]
org.springframework.jdbc.UncategorizedSQLException: SqlMapClient operation: encountered SQLException [
--- The error occurred in org/myCron/mySqlMap.xml.
--- The error occurred while applying a result map.
--- Check the mySqlMap.outputMapping.
--- The error happened while setting a property on the result object.
--- Cause: com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:
** BEGIN NESTED EXCEPTION **
java.io.EOFException
STACKTRACE:
java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1903)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2402)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2860)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6106)
at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:168)
at sun.reflect.GeneratedMethodAccessor71.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at com.ibatis.common.jdbc.logging.ResultSetLogProxy.invoke(ResultSetLogProxy.java:47)
at $Proxy10.next(Unknown Source)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleResults(SqlExecutor.java:380)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleMultipleResults(SqlExecutor.java:301)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.executeQuery(SqlExecutor.java:190)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.sqlExecuteQuery(GeneralStatement.java:205)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithCallback(GeneralStatement.java:173)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithRowHandler(GeneralStatement.java:133)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryWithRowHandler(SqlMapExecutorDelegate.java:649)
at com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryWithRowHandler(SqlMapSessionImpl.java:156)
at com.ibatis.sqlmap.engine.impl.SqlMapClientImpl.queryWithRowHandler(SqlMapClientImpl.java:133)
at org.springframework.orm.ibatis.SqlMapClientTemplate$5.doInSqlMapClient(SqlMapClientTemplate.java:267)
at org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:165)
at org.springframework.orm.ibatis.SqlMapClientTemplate.queryWithRowHandler(SqlMapClientTemplate.java:265)
at org.myCron.doBatch(MyCron.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:248)
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:165)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:66)
at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
** END NESTED EXCEPTION **
The stack trace is very vague. The only hint I see is "the error happened while setting a property on the result object". There are only two properties on the result object: a String and an Integer. Both of them permit null values, but my select statements indicate that neither of them has any null values. They both have a proper getter/setter (which makes sense, since the process runs for a while successfully before dying). Every time the cron runs, it dies at a random point, so it isn't stuck on a particular row.
Note - The method 'doBatch' does exist since that is the method that starts the cron process. If it couldn't find doBatch, it couldn't successfully process the first thousand rows.
I've also tried running the job outside of Quartz and it fails there as well. We tried increasing our MySQL net_read_timeout, net_write_timeout, and delayed_insert_timeout, but none of these settings helped. I also tried setting log4j to DEBUG and did not get any helpful info.
Any other ideas about what I could try?
Sounds like MySQL closed the connection for some reason. Check the MySQL log to see if anything shows up; turn on various logging options for MySQL if necessary.
Also, start printing debug data (including timestamps) from your app. Just print everything, then see what the last action was; perhaps some rarely triggered condition in your code has a bug.
I.e., every single time you talk to MySQL, log it before AND after.
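A minimal sketch of that logging, assuming the Spring SqlMapClientTemplate shown in the trace (the statement name and row-count interval are illustrative):
// Log before and after the MySQL call, plus periodic per-row progress,
// so the log shows exactly where the connection drops.
log.info("doBatch: starting queryWithRowHandler at " + new Date());
final int[] count = { 0 };
sqlMapClientTemplate.queryWithRowHandler("outputMapping.selectAll", new RowHandler() {
    public void handleRow(Object row) {
        count[0]++;
        if (count[0] % 10000 == 0) {
            log.info("doBatch: processed " + count[0] + " rows at " + new Date());
        }
        // ... existing per-row work ...
    }
});
log.info("doBatch: finished " + count[0] + " rows at " + new Date());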
