Hadoop Pig Cassandra get_range_slices error - java

I'm using Cassandra 1.0.9 with the latest Pig and Hadoop to run MapReduce jobs.
The job is just a simple Pig script that extracts two columns from a Cassandra column family.
It seems to work at first, and then it hits this problem:
java.lang.RuntimeException: org.apache.thrift.TApplicationException: Internal error processing get_range_slices
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:334)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:348)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:222)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:178)
at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.thrift.TApplicationException: Internal error processing get_range_slices
at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:754)
at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:734)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:289)
... 17 more
Is there a way around it? I could show the Pig script on request.
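To narrow down whether the problem is in Pig or in Cassandra itself, one thing I can try is reproducing the call outside Hadoop. Below is a minimal sketch of a standalone Thrift client issuing the same kind of get_range_slices request that ColumnFamilyRecordReader makes for each batch; the host, keyspace, and column family names are placeholders, not my actual schema.

    import java.nio.ByteBuffer;
    import java.util.List;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.KeyRange;
    import org.apache.cassandra.thrift.KeySlice;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class RangeSliceCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder host, keyspace and column family -- substitute your own.
            TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
            transport.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            client.set_keyspace("MyKeyspace");

            // Ask for one small page of rows over the whole key range, roughly
            // what ColumnFamilyRecordReader does for each batch.
            KeyRange range = new KeyRange();
            range.setCount(100);
            range.setStart_key(ByteBuffer.wrap(new byte[0]));
            range.setEnd_key(ByteBuffer.wrap(new byte[0]));

            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(new SliceRange(
                    ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 2));

            List<KeySlice> rows = client.get_range_slices(
                    new ColumnParent("MyColumnFamily"), predicate, range, ConsistencyLevel.ONE);
            System.out.println("Fetched " + rows.size() + " rows");
            transport.close();
        }
    }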

Related

com.bea.wli.monitoring.MonitoringException: Caused by: java.lang.IllegalArgumentException: Server 'null' not found

I have Java code that uses the Java Management Extensions (JMX) monitoring API to get service statistics from the Oracle WebLogic console.
I have a production server and a testing server.
The code works fine and retrieves service statistics from the WebLogic console of the testing server, but when I run it against the production server it shows the error below:
java.lang.reflect.UndeclaredThrowableException
at $Proxy0.getProxyServiceStatistics(Unknown Source)
at reports.All_ServiceStatisticsReport.getProxyServiceDetails(All_ServiceStatisticsReport.java:235)
at reports.All_ServiceStatisticsReport.<init>(All_ServiceStatisticsReport.java:189)
at reports.All_ServiceStatisticsReport.main(All_ServiceStatisticsReport.java:567)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at reports.All_ServiceStatisticsReport$ServiceDomainMBeanInvocationHandler.invoke(All_ServiceStatisticsReport.java:145)
... 4 more
Caused by: com.bea.wli.monitoring.MonitoringException: [OSB-473057]Failed to get statistics for services due to java.lang.IllegalArgumentException: Server 'null' not found
at weblogic.utils.StackTraceDisabled.unknownMethod()
Caused by: java.lang.IllegalArgumentException: Server 'null' not found
... 1 more
Process exited with exit code 0.
The line where the error occurs is:
proxyServiceResourceStatisticMap = serviceDomainMbean.getProxyServiceStatistics(filteredProxyServiceRefs,resourceType.value(),null);
I am passing all the right arguments according to the Oracle docs:
https://docs.oracle.com/middleware/1213/osb/java-api/com/bea/wli/monitoring/ServiceDomainMBean.html
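For reference, here is a stripped-down version of the failing call, following the signature in that Javadoc. "osb_server1" is only a placeholder for a managed-server name, not something from my actual domain; in my real code the third argument is null, as shown above.

    import java.util.HashMap;
    import com.bea.wli.config.Ref;
    import com.bea.wli.monitoring.ResourceType;
    import com.bea.wli.monitoring.ServiceDomainMBean;
    import com.bea.wli.monitoring.ServiceResourceStatistic;

    public class ProxyStatsCheck {
        // Same call as in my code, but with an explicit managed-server name
        // instead of null; "osb_server1" is only a placeholder.
        static HashMap<Ref, ServiceResourceStatistic> fetchStats(
                ServiceDomainMBean serviceDomainMBean,
                Ref[] filteredProxyServiceRefs,
                ResourceType resourceType) throws Exception {
            return serviceDomainMBean.getProxyServiceStatistics(
                    filteredProxyServiceRefs,
                    resourceType.value(),
                    "osb_server1");
        }
    }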
So why am I getting this error?
Thanks in advance.

getting java.lang.NullPointerException when trying to run nutch from eclipse with a cassandra datastore

I am trying to run Apache Nutch from Eclipse with Cassandra as the datastore on Windows. This is the error I am getting:
InjectorJob: starting at 2017-02-17 17:35:42
InjectorJob: Injecting urlDir: C:/Users/STAN/Desktop/trunk/urls/seeds.txt
InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
InjectorJob: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:434)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:281)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:115)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:246)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:267)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:290)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:299)
InjectorJob: Injecting urlDir: C:/Users/STAN/Desktop/trunk/urls/seeds.txt
As far as I understand, this should be a directory, not a file.
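To separate a Nutch/Gora problem from a Hadoop-on-Windows problem, a tiny standalone program that only uses Hadoop's local file system should go through the same mkdirs()/setPermission() path shown in the trace (the directory below is just a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LocalFsCheck {
        public static void main(String[] args) throws Exception {
            // Create a directory through Hadoop's local file system. This should
            // exercise the same RawLocalFileSystem.mkdirs()/setPermission() code
            // path that fails in the stack trace above, without Nutch or Gora.
            FileSystem localFs = FileSystem.getLocal(new Configuration());
            Path testDir = new Path("C:/tmp/hadoop-fs-check");   // placeholder directory
            System.out.println("mkdirs returned: " + localFs.mkdirs(testDir));
        }
    }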

KiteSDK MapReduce: EOF exception during parquet file load

I have a Hadoop MapReduce job which uses the KiteSDK DatasetKeyInputFormat. It is configured to read a Parquet file.
Every time I run the job I get the following exception:
Error: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at parquet.hadoop.ParquetInputSplit.readArray(ParquetInputSplit.java:304)
at parquet.hadoop.ParquetInputSplit.readFields(ParquetInputSplit.java:263)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:372)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:754)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
The same file can be read successfully by the MapReduce jobs that Hive generates, i.e. I can query it without problems.
To isolate the issue I created a MapReduce job based on the KiteSDK MapReduce example, but I still get the same exception.
Note: the Avro and CSV formats work fine.
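For reference, this is roughly how the input side of my job is wired up, following the Kite MapReduce example; the dataset URI is a placeholder, not my real dataset:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.kitesdk.data.mapreduce.DatasetKeyInputFormat;

    public class ParquetJobSetup {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "kite-parquet-read");
            job.setJarByClass(ParquetJobSetup.class);

            // Point the input format at the Parquet-backed dataset;
            // "dataset:hdfs:/data/events" is a placeholder URI, not my real dataset.
            DatasetKeyInputFormat.configure(job).readFrom("dataset:hdfs:/data/events");
            job.setInputFormatClass(DatasetKeyInputFormat.class);

            // ...mapper, output format and output path as in the Kite example...
        }
    }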

Apache nutch 1.5 and solr 4.7 indexing

I have crawled websites using Apache Nutch and want to index the data in Solr. I have been following the tutorial mentioned here.
However, the tutorial covers indexing while crawling, whereas in my case I need to index data that has already been crawled.
I am running the command below:
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
[abc@xyz nutch-crawler]$ bin/nutch index http://abc.xyz:8983/solr/ pryder/crawldb/ -linkdb pryder/linkdb/ pryder/segments/20140330021243/
Indexer: starting at 2014-04-02 20:34:09
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/solr/client/solrj/impl/CommonsHttpSolrServer
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2398)
at java.lang.Class.getConstructor0(Class.java:2708)
at java.lang.Class.newInstance0(Class.java:328)
at java.lang.Class.newInstance(Class.java:310)
at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:157)
at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:57)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:91)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 11 more
What could be going wrong here?
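To check whether this is simply a classpath problem, a small probe for the exact class named in the ClassNotFoundException can be run with the same classpath the bin/nutch script assembles (just a sketch):

    public class SolrJClassCheck {
        public static void main(String[] args) {
            // Probe for the exact class the indexer plugin fails to load.
            try {
                Class.forName("org.apache.solr.client.solrj.impl.CommonsHttpSolrServer");
                System.out.println("CommonsHttpSolrServer is on the classpath");
            } catch (ClassNotFoundException e) {
                System.out.println("CommonsHttpSolrServer is NOT on the classpath");
            }
        }
    }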

ClassNotFoundException error while interacting with database

I was trying to run a MapReduce program that reads from an underlying database. With a Hadoop distribution installed from the official Hadoop downloads, the program worked fine. But when I compiled my own Hadoop distribution and tried to run the same program, I got the error below. I followed the usual steps, such as putting the MySQL connector JAR in the hadoop/lib directory and adding one to the distributed cache. These steps worked for the downloaded distribution, but they did not work for the distribution I built myself.
Can anyone tell me what might have gone wrong? I also tried updating the classpath and the HADOOP_CLASSPATH variable, but nothing worked.
hduser@ramanujan:~$ hadoop jar SimpleConn.jar
13/04/15 13:50:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/04/15 13:50:17 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/04/15 13:50:17 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/04/15 13:50:17 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hduser/.staging/job_1366013851608_0001
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:169)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:70)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:470)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:490)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1236)
at DBCountPageView.run(DBCountPageView.java:227)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at DBCountPageView.main(DBCountPageView.java:236)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.getConnection(DBInputFormat.java:195)
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:163)
... 21 more
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:188)
at org.apache.hadoop.mapreduce.lib.db.DBConfiguration.getConnection(DBConfiguration.java:148)
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.getConnection(DBInputFormat.java:189)
... 22 more
Be sure to add any dependencies to both HADOOP_CLASSPATH and -libjars when submitting a job, as in the following examples:
Use the following to add all the JAR dependencies from the current and lib directories:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`echo *.jar`:`echo lib/*.jar | sed 's/ /:/g'`
Bear in mind that when starting a job through hadoop jar, you also need to pass the JARs of any dependencies via -libjars. I like to use:
hadoop jar <jar> <class> -libjars `echo ./lib/*.jar | sed 's/ /,/g'` [args...]
Note: the sed commands use different delimiters; HADOOP_CLASSPATH is colon-separated, while the -libjars list must be comma-separated.
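For context, the driver class named in the ClassNotFoundException is the one registered on the job when the DB input is configured, and it is loaded both on the client (while computing splits at submit time, which is where the stack trace above fails) and inside the tasks. A rough sketch of that setup, with a placeholder JDBC URL and credentials:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;

    public class DbJobSetup {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "db-read");
            job.setJarByClass(DbJobSetup.class);

            // Register the JDBC driver and connection details on the job.
            // DBInputFormat.setConf() later calls Class.forName() on this driver
            // class, which is where the ClassNotFoundException above is thrown.
            DBConfiguration.configureDB(job.getConfiguration(),
                    "com.mysql.jdbc.Driver",            // driver class
                    "jdbc:mysql://localhost/mydb",      // placeholder URL
                    "user", "password");                // placeholder credentials

            job.setInputFormatClass(DBInputFormat.class);
            // ...DBInputFormat.setInput(...), mapper and output settings...
        }
    }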
