SUTime / english.sutime.txt takes a long time to load from runnable jar - java

I am working on a Java project and I am using the Stanford CoreNLP library (http://stanfordnlp.github.io/CoreNLP/) within it.
I have added it as a referenced library and also added some Maven dependencies. I have set up the whole project with Eclipse.
Now I create a runnable jar for this project, but when I execute that jar it takes a lot of time to load english.sutime.txt, as follows:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.9 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [2.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [2.6 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Jun 29, 2016 6:37:39 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 83 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Jun 29, 2016 6:58:03 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Read 25 rules
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
done [3.1 sec].
So after english.sutime.txt is read I have to wait about 15 minutes before processing continues. The project does not take this long when executed from Eclipse.
Can anyone help me resolve this? My runnable jar contains all dependencies within it.
Thanks,
Priyank

While creating the runnable jar file in Eclipse, select "Extract required libraries into generated JAR" instead of "Package required libraries into generated JAR". The "Package" option nests the dependency jars inside your jar and loads them through a jar-in-jar class loader, which is much slower at streaming large classpath resources such as english.sutime.txt; extracting the libraries lets those resources be read directly.
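Independently of how the jar is built, it also helps to construct the pipeline only once and reuse it, since all of the model loading shown in the log happens in the StanfordCoreNLP constructor. A minimal sketch (the class name PipelineHolder is illustrative):
import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class PipelineHolder {
    // Built once per JVM; tokenizer, tagger, NER models and SUTime rules all load here.
    private static final StanfordCoreNLP PIPELINE = createPipeline();

    private static StanfordCoreNLP createPipeline() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
        return new StanfordCoreNLP(props);
    }

    public static Annotation annotate(String text) {
        // Reusing the pipeline avoids paying the model-loading cost per document.
        Annotation document = new Annotation(text);
        PIPELINE.annotate(document);
        return document;
    }
}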

Related

format Alluxio: No Under File System Factory found for: hdfs://nameservice1/alluxio/journal/BlockMaster

I want to deploy Alluxio on a cluster with HA. My CDH version: 3.0.0+cdh6.3.2.
I built Alluxio with a specific Hadoop release version:
mvn install -Phadoop-3 -Dhadoop.version=3.0.0 -DskipTests
I put alluxio-assembly-server-2.4.1-2-SNAPSHOT-jar-with-dependencies.jar and alluxio-underfs-hdfs-2.4.1-2-SNAPSHOT-jar-with-dependencies.jar in the lib/ folder of Alluxio on every node.
/opt/alluxio-2.4.1-1/conf/alluxio-site.properties:
alluxio.master.mount.table.root.ufs=hdfs://nameservice1/alluxio/data
alluxio.master.journal.type=UFS
alluxio.master.journal.folder=hdfs://nameservice1/alluxio/journal/
alluxio.master.security.impersonation.root.users=*
alluxio.worker.tieredstore.level0.dirs.quota=10GB
alluxio.worker.tieredstore.level1.dirs.quota=10GB
alluxio.worker.tieredstore.level2.dirs.quota=10GB
alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=test-cdh001:2181,test-cdh002:2181,test-cdh003:2181
alluxio.underfs.hdfs.configuration=/etc/hadoop/conf/core-site.xml:/etc/hadoop/conf/hdfs-site.xml
When I format the Alluxio cluster with the following command on one of the master nodes:
./bin/alluxio format
I got an error:
Executing the following command on all worker nodes and logging to /opt/alluxio-2.4.1-1/logs/task.log: /opt/alluxio-2.4.1-1/bin/alluxio formatWorker
Waiting for tasks to finish...
All tasks finished
Formatting Alluxio Master # test-cdh001
2021-01-07 18:35:58,766 INFO Format - Formatting master journal: hdfs://nameservice1/alluxio/journal/
2021-01-07 18:35:58,806 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio-2.4.1-1/lib
2021-01-07 18:35:58,869 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio-2.4.1-1/extensions
2021-01-07 18:35:58,886 WARN ExtensionFactoryRegistry - No factory implementation supports the path hdfs://nameservice1/alluxio/journal/BlockMaster
2021-01-07 18:35:58,887 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio-2.4.1-1/lib
2021-01-07 18:35:58,906 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio-2.4.1-1/extensions
2021-01-07 18:35:58,915 WARN ExtensionFactoryRegistry - No factory implementation supports the path hdfs://nameservice1/alluxio/journal/BlockMaster
2021-01-07 18:35:58,915 ERROR Format - Failed to format
java.lang.IllegalArgumentException: No Under File System Factory found for: hdfs://nameservice1/alluxio/journal/BlockMaster
at alluxio.underfs.UnderFileSystem$Factory.create(UnderFileSystem.java:95)
at alluxio.master.journal.ufs.UfsJournal.<init>(UfsJournal.java:149)
at alluxio.master.journal.ufs.UfsJournalSystem.createJournal(UfsJournalSystem.java:73)
at alluxio.master.journal.ufs.UfsJournalSystem.createJournal(UfsJournalSystem.java:47)
at alluxio.cli.Format.format(Format.java:120)
at alluxio.cli.Format.main(Format.java:97)
Any help would be much appreciated.
Add your Hadoop version to the mount command with "--option alluxio.underfs.version=".
For example: alluxio fs mount --option alluxio.underfs.version=3.2 /mnt/hdfs/emr hdfs://hostname:8020/tmp/emr
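Since the failure here happens while formatting the journal rather than mounting a path, the equivalent setting can also go in alluxio-site.properties so the journal UFS picks it up too (a sketch; whether this property applies to your journal path and the exact version string depend on the underfs-hdfs builds bundled in lib/):
# Hypothetical addition to /opt/alluxio-2.4.1-1/conf/alluxio-site.properties
alluxio.underfs.version=3.0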

Error while using JsonLoader from elephant-bird-pig

I'm trying to use JsonLoader from the elephant-bird-pig package.
My script is simple:
register elephant-bird-pig-4.5.jar
register elephant-bird-hadoop-compat-4.5.jar
A = load '1_record_2.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
DUMP A;
And I get an error:
2014-09-30 16:15:32,439 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-30 16:15:32,447 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-30 16:15:32,448 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-09-30 16:15:32,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-30 16:15:32,450 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-30 16:15:32,450 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-30 16:15:32,464 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop1/10.242.8.131:8050
2014-09-30 16:15:32,466 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-30 16:15:32,466 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-30 16:15:32,467 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/twitter/elephantbird/util/HadoopCompat
Details at logfile: pig_1412081068149.log
I don't know what is missing. Can you please suggest something?
File pig_1412081068149.log contains:
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. com/twitter/elephantbird/util/HadoopCompat
java.lang.NoClassDefFoundError: com/twitter/elephantbird/util/HadoopCompat
at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.setLocation(LzoBaseLoadFunc.java:93)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:477)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:298)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:191)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1324)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1309)
at org.apache.pig.PigServer.storeEx(PigServer.java:980)
at org.apache.pig.PigServer.store(PigServer.java:944)
at org.apache.pig.PigServer.openIterator(PigServer.java:857)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774) etc...
Which class is missing (java.lang.NoClassDefFoundError)? What other libraries should I add?
Thanks
pawel
I've checked that the registered libraries exist and that the classes are inside them.
Everything looked fine, but I was still getting this error.
So I left the pig shell and opened it once again - the same script then worked fine.
You missed two important jars to register.
register elephant-bird-core-4.5.jar
register json-simple-1.1.1.jar
Add these two lines to your pig script and everything should work fine.
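Putting the original script and the missing registrations together, the full Pig script would look like this (jar versions assumed to match the 4.5 artifacts above):
register elephant-bird-core-4.5.jar
register elephant-bird-pig-4.5.jar
register elephant-bird-hadoop-compat-4.5.jar
register json-simple-1.1.1.jar
A = load '1_record_2.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
DUMP A;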

Sonar-runner execution failure causing cast exception

After configuring the Sonar tools (SonarQube, MySQL database and Sonar-runner), I performed an analysis of an Android project without any problem. But after installing the Android plugin for Sonar and repeating the analysis, it fails with the following error:
INFO - Preview mode
Load batch settings
User cache: /home/user/.sonar/cache
INFO - Install plugins
INFO - Exclude plugins: devcockpit, jira, pdfreport, views, report, buildstability, scmactivity, buildbreaker
INFO - Create JDBC datasource for jdbc:h2:/home/user/workspace/myAndroidProject/.sonar/.sonartmp/preview1394469024394-0
INFO - Initializing Hibernate
INFO - Load project settings
INFO - Apply project exclusions
INFO - ------------- Scan myAndroidProject
INFO - Load module settings
INFO - Language is forced to java
INFO - Loading technical debt model...
INFO - Loading technical debt model done: 424 ms
INFO - Configure Maven plugins
INFO - Base dir: /home/user/workspace/myAndroidProject
INFO - Working dir: /home/user/workspace/myAndroidProject/.sonar
INFO - Source dirs: /home/user/workspace/myAndroidProject/src
INFO - Source encoding: UTF-8, default locale: en_EN
INFO - Index files
INFO - Included sources:
INFO - src/**
INFO - 116 files indexed
WARN - Accessing the filesystem before the Sensor phase is deprecated and will not be supported in the future. Please update your plugin.
INFO - Index files
INFO - Included sources:
INFO - src/**
INFO - 116 files indexed
WARN - Accessing the filesystem before the Sensor phase is deprecated and will not be supported in the future. Please update your plugin.
INFO - Index files
INFO - Included sources:
INFO - src/**
INFO - 116 files indexed
INFO - Quality profile for java: Sonar way
INFO - Sensor JavaSourceImporter...
INFO - Sensor JavaSourceImporter done: 49 ms
INFO - Sensor JavaSquidSensor...
INFO - Java AST scan...
INFO - 116 source files to be analyzed
INFO - 116/116 source files analyzed
INFO - Java AST scan done: 6693 ms
WARN - Java bytecode has not been made available to the analyzer. The Depth of Inheritance Tree (DIT) metric, Response for Class (RFC) metric, Number of Children (NOC) metric, Lack of Cohesion (LCOM4) metric, deprecated dependencies metrics, UnusedPrivateMethod rule, RedundantThrowsDeclarationCheck rule, S1160 rule, S1217 rule are disabled.
INFO: ------------------------------------------------------------------------
INFO: EXECUTION FAILURE
INFO: ------------------------------------------------------------------------
Total time: 18.440s
Final Memory: 12M/357M
INFO: ------------------------------------------------------------------------
ERROR: Error during Sonar runner execution
ERROR: Unable to execute Sonar
ERROR: Caused by: org.sonar.api.resources.Directory cannot be cast to org.sonar.api.resources.JavaPackage
My sonar-project.properties file is the following:
#Required metadata
sonar.projectKey=mKey
sonar.projectName=myAndroidProject
sonar.projectVersion=1.0
# Paths to source directories.
# Paths are relative to the sonar-project.properties file. Replace "\" by "/" on Windows.
# Do not put the "sonar-project.properties" file in the same directory with the source code.
# (i.e. never set the "sonar.sources" property to ".")
sonar.sources=src
# The value of the property must be the key of the language.
sonar.language=java
# Encoding of the source code
sonar.sourceEncoding=UTF-8
# Analysis mode
sonar.analysis.mode=preview
#Enables the Lint profile to analyze the code using the Lint rules.
#sonar.profile=Android Lint
I'm using the following environment:
SonarQube 4.2 RC1
Sonar-runner 2.3
Database: MySQL
Ubuntu 12.04 LTS
Java 1.7
I tried uninstalling the Android plugin, but the problem persists. The only way I've found to solve it is to delete the database and the user and create them again.
As stated on http://docs.codehaus.org/pages/viewpage.action?pageId=236224987, the Android plugin is not yet compatible with SonarQube 4.2-RC1. See also http://jira.codehaus.org/browse/SONARPLUGINS-3483.
You need to provide the binaries (the bytecode .class files) to the Sonar analyzer.
Add the following line to your sonar-project.properties (use forward slashes, as noted above):
# Path to the class files
sonar.binaries=build/classes/main
If the above line doesn't work, check the actual path of your compiled classes and put it in the sonar.binaries property.
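For a typical Eclipse/ADT Android project the compiled classes usually land under bin/classes, so the property would look like this (the path is an assumption; adjust it to your actual build output):
# Hypothetical path for an Eclipse/ADT Android build
sonar.binaries=bin/classes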

Cannot register pig UDF jar

I'm having problems with a particular UDF jar and have so far been unable to figure out where to begin. I have a way of testing the jar from the command line, and it works. If I REGISTER the jar in a pig script, pig fails to create a jar for the job. I can register other jars without trouble, and this jar was working until a few days ago. Here is the output when running the pig script:
[michael#hadoop01 logitech-correlation]$ pig -f MatchWithClassifier.pig -param date=20130301 -param siteId=0
2013-05-10 11:20:30,523 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0-cdh4.1.2 (rexported) compiled Nov 01 2012, 18:38:58
2013-05-10 11:20:30,524 [main] INFO org.apache.pig.Main - Logging error messages to: /home/michael/correlation/pig_1368210030521.log
2013-05-10 11:20:30,981 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop01/
2013-05-10 11:20:31,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop01.dev.terapeak.com:8021
2013-05-10 11:20:32,143 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: FILTER
2013-05-10 11:20:32,390 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-05-10 11:20:32,422 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2013-05-10 11:20:32,422 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2013-05-10 11:20:32,508 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-10 11:20:32,518 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-10 11:20:32,522 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5623238576559565298.jar
2013-05-10 11:20:36,398 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
Details at logfile: /home/michael/correlation/pig_1368210030521.log
The stack trace is below:
Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:727)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:259)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:180)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1275)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1260)
at org.apache.pig.PigServer.execute(PigServer.java:1250)
at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:430)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.util.zip.ZipException: invalid distance too far back
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
at java.util.zip.ZipInputStream.read(ZipInputStream.java:163)
at java.util.jar.JarInputStream.read(JarInputStream.java:194)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.pig.impl.util.JarManager.addStream(JarManager.java:242)
at org.apache.pig.impl.util.JarManager.mergeJar(JarManager.java:216)
at org.apache.pig.impl.util.JarManager.mergeJar(JarManager.java:206)
at org.apache.pig.impl.util.JarManager.createJar(JarManager.java:126)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:411)
... 17 more
================================================================================
Based on this, I would think the problem comes from the "java.util.zip.ZipException: invalid distance too far back" exception. Is pig having some issue reading the jar?
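That exception usually indicates that the zip stream inside the jar is corrupt. One way to check the jar outside of Pig is to stream it with the same java.util.jar classes that JarManager uses internally (a diagnostic sketch; "JarCheck" and the jar path argument are placeholders):
import java.io.FileInputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;

public class JarCheck {
    public static void main(String[] args) throws Exception {
        // Usage: java JarCheck /path/to/udf.jar
        // Fully decompresses every entry; a corrupt jar throws the same
        // ZipException seen in the Pig stack trace above.
        byte[] buf = new byte[8192];
        try (JarInputStream in = new JarInputStream(new FileInputStream(args[0]))) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                while (in.read(buf, 0, buf.length) != -1) {
                    // Draining each entry forces full inflation.
                }
            }
        }
        System.out.println("Jar read cleanly.");
    }
}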

Unable to disable Hibernate log messages

I'm using Hibernate for a personal project.
In my project, I have this directory layout:
+ conf
log4j.properties
+ bin
my classes
Using the Windows console, I go to the project directory (the parent of bin and conf) and start the application with a command like this:
java -cp conf;lib/lib1.jar;lib/lib2.jar;[etc] com.moc.Main
My log4j.properties file is this (taken from a Hibernate example):
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d - %m%n
log4j.rootLogger=info, stdout
log4j.logger.org.hibernate=error
log4j.logger.org.hibernate.tool.hbm2ddl=error
log4j.logger.org.hibernate.hql.ast.QueryTranslatorImpl=error
log4j.logger.org.hibernate.hql.ast.HqlSqlWalker=error
log4j.logger.org.hibernate.hql.ast.SqlGenerator=error
log4j.logger.org.hibernate.hql.ast.AST=error
On application start, this is the output:
2010-11-06 19:00:56,376 - Logger.getRootLogger().info() statement
12 [main] INFO org.hibernate.cfg.Environment - Hibernate 3.5.3-Final
13 [main] INFO org.hibernate.cfg.Environment - hibernate.properties not found
16 [main] INFO org.hibernate.cfg.Environment - Bytecode provider name : javassist
20 [main] INFO org.hibernate.cfg.Environment - using JDK 1.4 java.sql.Timestamp handling
108 [main] INFO org.hibernate.cfg.Configuration - configuring from resource: com/moc/hibernate.cfg.xml
108 [main] INFO org.hibernate.cfg.Configuration - Configuration resource: com/moc/hibernate.cfg.xml
124 [main] INFO org.hibernate.cfg.Configuration - Reading mappings from file: conf\hiber\Customer.hbm.xml
.
.
.
and so on
.
.
.
795 [main] INFO org.hibernate.impl.SessionFactoryImpl - closing
795 [main] INFO org.hibernate.connection.DriverManagerConnectionProvider - cleaning up connection pool: jdbc:mysql://localhost/mydb
The Hibernate log lines are printed in red, while my own log lines are black.
Why do I still see INFO output from Hibernate? What am I doing wrong?
A good way of checking your log4j configuration and what happens at runtime is to add the
-Dlog4j.debug option to the java command line. In your case it becomes:
java -Dlog4j.debug -cp conf;lib/lib1.jar;lib/lib2.jar;[etc] com.moc.Main
This prints information to the console about the sequence in which the log4j configuration is loaded. You can then determine whether your log4j.properties is being loaded correctly.
Your log4j configuration looks OK, but is your log4j.properties file on the classpath and in the root package? That is, is it in the root of conf, lib1.jar, lib2.jar or any other jar/directory in your classpath?
Try this to check whether the file is being loaded correctly.
On this line:
log4j.rootLogger=info, stdout
change it to
log4j.rootLogger=error, stdout
This sets the log level for the root logger, and hence all loggers, to ERROR. If you still see INFO log entries, your log4j.properties file must not be loading correctly, most likely for the reasons stated above.
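Another quick check (a sketch, not part of the original answer) is to ask the classloader which log4j.properties it resolves first; run it with the same -cp used for com.moc.Main:
import java.net.URL;

public class Log4jLocator {
    public static void main(String[] args) {
        // The first log4j.properties found on the classpath is the one log4j loads.
        URL url = Thread.currentThread().getContextClassLoader()
                .getResource("log4j.properties");
        System.out.println("log4j.properties resolved from: " + url);
    }
}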
Can you try this syntax instead?
log4j.category.org.hibernate=ERROR
