How to run Nutch 1.9 in Eclipse on Windows? - java

I want to run Nutch 1.9 in Eclipse on Windows. I followed the tutorial from http://wiki.apache.org/nutch/RunNutchInEclipse and opened the project in Eclipse.
But when I run Nutch, I get the following error:
2014-09-19 17:45:48,039 INFO crawl.Injector (Injector.java:inject(283)) - Injector: starting at 2014-09-19 17:45:48
2014-09-19 17:45:48,043 INFO crawl.Injector (Injector.java:inject(284)) - Injector: crawlDb: K:/kumar/Nutch/apache-nutch-1.9/crawlresult
2014-09-19 17:45:48,043 INFO crawl.Injector (Injector.java:inject(285)) - Injector: urlDir: K:/kumar/Nutch/apache-nutch-1.9/urls
2014-09-19 17:45:48,043 INFO crawl.Injector (Injector.java:inject(294)) - Injector: Converting injected urls to crawl db entries.
2014-09-19 17:45:48,207 INFO jvm.JvmMetrics (JvmMetrics.java:init(71)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2014-09-19 17:45:48,252 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(661)) - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2014-09-19 17:45:48,268 INFO mapred.FileInputFormat (FileInputFormat.java:listStatus(192)) - Total input paths to process : 1
2014-09-19 17:45:48,485 INFO mapred.JobClient (JobClient.java:monitorAndPrintJob(1275)) - Running job: job_local_0001
2014-09-19 17:45:48,487 INFO mapred.FileInputFormat (FileInputFormat.java:listStatus(192)) - Total input paths to process : 1
2014-09-19 17:45:48,526 INFO mapred.MapTask (MapTask.java:runOldMapper(347)) - numReduceTasks: 0
2014-09-19 17:45:48,565 INFO plugin.PluginRepository (PluginManifestParser.java:parsePluginFolder(87)) - Plugins: looking in: K:\Nutch\apache-nutch-1.9\plugins
2014-09-19 17:45:48,566 WARN plugin.PluginRepository (PluginManifestParser.java:parsePluginFolder(101)) - java.io.FileNotFoundException: K:\Nutch\apache-nutch-1.9\plugins\creativecommons\plugin.xml (The system cannot find the file specified)
It seems that Hadoop is causing the error. I don't know how to solve this problem. I know Nutch requires a Unix environment, but I want to run Nutch in Eclipse on Windows.
Can anybody help me solve this?

Download Cygwin, then add it to your PATH environment variable. I think your problem is caused by the fact that Windows can't invoke a native Unix command. That is what I did; however, as soon as I got past that problem, I encountered other problems.
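To check whether the Unix commands are actually reachable after adding Cygwin to PATH, a small lookup like the one below may help. It is only a sketch: the class name PathLookupSketch and the helper findOnPath are made up for illustration, and the command names chmod and bash are assumed examples of Unix tools, not names taken from the log above.

```java
import java.io.File;
import java.util.regex.Pattern;

class PathLookupSketch {
    /**
     * Returns the first regular file on the given PATH-style string that
     * matches the command name (bare, or with one of the extensions),
     * or null if nothing matches.
     */
    static File findOnPath(String command, String pathVar, String... extensions) {
        if (pathVar == null || pathVar.isEmpty()) {
            return null;
        }
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        for (String dir : pathVar.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, command);
            if (candidate.isFile()) {
                return candidate;
            }
            // Then try each extension, e.g. ".exe" on Windows.
            for (String ext : extensions) {
                candidate = new File(dir, command + ext);
                if (candidate.isFile()) {
                    return candidate;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        // Illustrative command names (assumption): adjust to whatever fails for you.
        for (String cmd : new String[] {"chmod", "bash"}) {
            File found = findOnPath(cmd, path, ".exe");
            System.out.println(cmd + " -> " + (found == null ? "not found on PATH" : found.getPath()));
        }
    }
}
```

If a command is reported as not found, Eclipse was probably started before PATH was changed; restarting Eclipse makes it pick up the new value.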


format Alluxio: No Under File System Factory found for: hdfs://nameservice1/alluxio/journal/BlockMaster

I want to deploy Alluxio on a cluster with HA. My CDH version: 3.0.0+cdh6.3.2.
I built Alluxio against a specific Hadoop release version:
mvn install -Phadoop-3 -Dhadoop.version=3.0.0 -DskipTests
I put alluxio-assembly-server-2.4.1-2-SNAPSHOT-jar-with-dependencies.jar and alluxio-underfs-hdfs-2.4.1-2-SNAPSHOT-jar-with-dependencies.jar in the lib/ folder of Alluxio on every node.
/opt/alluxio-2.4.1-1/conf/alluxio-site.properties:
alluxio.master.mount.table.root.ufs=hdfs://nameservice1/alluxio/data
alluxio.master.journal.type=UFS
alluxio.master.journal.folder=hdfs://nameservice1/alluxio/journal/
alluxio.master.security.impersonation.root.users=*
alluxio.worker.tieredstore.level0.dirs.quota=10GB
alluxio.worker.tieredstore.level1.dirs.quota=10GB
alluxio.worker.tieredstore.level2.dirs.quota=10GB
alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=test-cdh001:2181,test-cdh002:2181,test-cdh003:2181
alluxio.underfs.hdfs.configuration=/etc/hadoop/conf/core-site.xml:/etc/hadoop/conf/hdfs-site.xml
When I format the Alluxio cluster with the following command on one of the master nodes:
./bin/alluxio format
I got an error:
Executing the following command on all worker nodes and logging to /opt/alluxio-2.4.1-1/logs/task.log: /opt/alluxio-2.4.1-1/bin/alluxio formatWorker
Waiting for tasks to finish...
All tasks finished
Formatting Alluxio Master @ test-cdh001
2021-01-07 18:35:58,766 INFO Format - Formatting master journal: hdfs://nameservice1/alluxio/journal/
2021-01-07 18:35:58,806 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio-2.4.1-1/lib
2021-01-07 18:35:58,869 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio-2.4.1-1/extensions
2021-01-07 18:35:58,886 WARN ExtensionFactoryRegistry - No factory implementation supports the path hdfs://nameservice1/alluxio/journal/BlockMaster
2021-01-07 18:35:58,887 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio-2.4.1-1/lib
2021-01-07 18:35:58,906 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio-2.4.1-1/extensions
2021-01-07 18:35:58,915 WARN ExtensionFactoryRegistry - No factory implementation supports the path hdfs://nameservice1/alluxio/journal/BlockMaster
2021-01-07 18:35:58,915 ERROR Format - Failed to format
java.lang.IllegalArgumentException: No Under File System Factory found for: hdfs://nameservice1/alluxio/journal/BlockMaster
at alluxio.underfs.UnderFileSystem$Factory.create(UnderFileSystem.java:95)
at alluxio.master.journal.ufs.UfsJournal.<init>(UfsJournal.java:149)
at alluxio.master.journal.ufs.UfsJournalSystem.createJournal(UfsJournalSystem.java:73)
at alluxio.master.journal.ufs.UfsJournalSystem.createJournal(UfsJournalSystem.java:47)
at alluxio.cli.Format.format(Format.java:120)
at alluxio.cli.Format.main(Format.java:97)
Any help would be much appreciated.
Add your Hadoop version to the mount command with "--option alluxio.underfs.version=".
For example: alluxio fs mount --option alluxio.underfs.version=3.2 /mnt/hdfs/emr hdfs://hostname:8020/tmp/emr

wildfly-swarm launch from Eclipse "org.jboss.modules.ModuleLoadException"

For multiple projects, I use wildfly-swarm to avoid installing a webserver. The swarm jar file is generated and I can successfully launch it through "java -jar mypackage-swarm.jar".
However, to debug it easily, I would like to launch it from my IDE (Eclipse). Whether I launch my main class directly or use Fakereplace, I get the following exception:
Fakereplace is running.
Dependencies not bundled, will resolve from local M2REPO
2017-01-17 08:31:41,806 org.wildfly.swarm.internal.SwarmMessages [main] DEBUG Logging Provider: org.jboss.logging.Log4jLoggerProvider
2017-01-17 08:31:41,811 org.wildfly.swarm.Swarm [main] DEBUG WFSWARM0020: Stage Config found in swarm.project.stage.file system property at location: null
2017-01-17 08:31:41,862 INFO [org.wildfly.swarm] (main) WFSWARM0018: Installed fraction: Logging - STABLE org.wildfly.swarm:logging:2016.12.1
2017-01-17 08:31:41,865 INFO [org.wildfly.swarm] (main) WFSWARM0018: Installed fraction: Undertow - STABLE org.wildfly.swarm:undertow:2016.12.1
2017-01-17 08:31:41,865 INFO [org.wildfly.swarm] (main) WFSWARM0018: Installed fraction: Spring WebMVC - STABLE org.wildfly.swarm:spring:2016.12.1
org.jboss.modules.ModuleLoadException: Error loading module from modules/org/apache/xerces/main/module.xml
...
Caused by: org.jboss.modules.xml.XmlPullParserException: Failed to resolve artifact 'xerces:xercesImpl:2.11.0.SP5' (position: END_TAG seen ...<resources>\n <artifact name="xerces:xercesImpl:2.11.0.SP5"/>... @5:52)
I've found similar exceptions on the internet, but I cannot find a solution.
Does anyone have any idea about this?
Why don't you just run it as a jar application?
I only know how to do that in IntelliJ; for Eclipse, you can find a solution here:
eclipse: how to debug a Java program as a .jar file?

Error while using JsonLoader from elephant-bird-pig

I'm trying to use JsonLoader from the elephant-bird-pig package.
My script is simple:
register elephant-bird-pig-4.5.jar
register elephant-bird-hadoop-compat-4.5.jar
A = load '1_record_2.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
DUMP A
And I get an error:
2014-09-30 16:15:32,439 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-30 16:15:32,447 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-30 16:15:32,448 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-09-30 16:15:32,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-30 16:15:32,450 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-30 16:15:32,450 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-30 16:15:32,464 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop1/10.242.8.131:8050
2014-09-30 16:15:32,466 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-30 16:15:32,466 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-30 16:15:32,467 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/twitter/elephantbird/util/HadoopCompat
Details at logfile: pig_1412081068149.log
I don't know what is missing. Can you please suggest something?
File pig_1412081068149.log contains:
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. com/twitter/elephantbird/util/HadoopCompat
java.lang.NoClassDefFoundError: com/twitter/elephantbird/util/HadoopCompat
at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.setLocation(LzoBaseLoadFunc.java:93)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:477)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:298)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:191)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1324)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1309)
at org.apache.pig.PigServer.storeEx(PigServer.java:980)
at org.apache.pig.PigServer.store(PigServer.java:944)
at org.apache.pig.PigServer.openIterator(PigServer.java:857)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774) etc...
Which class is missing (java.lang.NoClassDefFoundError)? Which additional libraries should I add?
Thanks
pawel
I checked that the registered libraries exist and that the classes are inside them.
Everything looked fine, but I was still getting this error.
So I left the Pig shell and opened it again, and the same script worked fine.
You forgot to register two important jars.
register elephant-bird-core-4.5.jar
register json-simple-1.1.1.jar
Add these two to your pig script and everything should work fine.
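To see which of the registered jars actually contains the class named in the NoClassDefFoundError, one option is to look for its entry in each jar. The following is a minimal sketch using only the JDK; the class name JarClassFinder and its helper methods are hypothetical, and the jar paths are whatever you pass on the command line.

```java
import java.io.IOException;
import java.util.jar.JarFile;

class JarClassFinder {
    /** Converts a dotted or slash-separated class name into its jar entry name. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath contains an entry for the class. */
    static boolean jarContains(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Class name copied from the stack trace in the question.
        String className = "com/twitter/elephantbird/util/HadoopCompat";
        // Each argument is a path to a jar file, e.g. the jars registered in the script.
        for (String jarPath : args) {
            System.out.println(jarPath + " -> " + jarContains(jarPath, className));
        }
    }
}
```

Running it with the jar files from the register lines as arguments prints true or false per jar. A class that is present in a jar can still fail to load if that jar was not picked up by the running session, which would be consistent with the error disappearing after restarting the Pig shell.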

Sonar-runner execution failure causing cast exception

After configuring the Sonar tools (SonarQube, MySQL database and Sonar-runner), I performed an analysis of an Android project without any problem. But after installing the Android plugin for Sonar and repeating the analysis, it fails with the following error:
INFO - Preview mode
Load batch settings
User cache: /home/user/.sonar/cache
INFO - Install plugins
INFO - Exclude plugins: devcockpit, jira, pdfreport, views, report, buildstability, scmactivity, buildbreaker
INFO - Create JDBC datasource for jdbc:h2:/home/user/workspace/myAndroidProject/.sonar/.sonartmp/preview1394469024394-0
INFO - Initializing Hibernate
INFO - Load project settings
INFO - Apply project exclusions
INFO - ------------- Scan myAndroidProject
INFO - Load module settings
INFO - Language is forced to java
INFO - Loading technical debt model...
INFO - Loading technical debt model done: 424 ms
INFO - Configure Maven plugins
INFO - Base dir: /home/user/workspace/myAndroidProject
INFO - Working dir: /home/user/workspace/myAndroidProject/.sonar
INFO - Source dirs: /home/user/workspace/myAdnroidProject/src
INFO - Source encoding: UTF-8, default locale: en_EN
INFO - Index files
INFO - Included sources:
INFO - src/**
INFO - 116 files indexed
WARN - Accessing the filesystem before the Sensor phase is deprecated and will not be supported in the future. Please update your plugin.
INFO - Index files
INFO - Included sources:
INFO - src/**
INFO - 116 files indexed
WARN - Accessing the filesystem before the Sensor phase is deprecated and will not be supported in the future. Please update your plugin.
INFO - Index files
INFO - Included sources:
INFO - src/**
INFO - 116 files indexed
INFO - Quality profile for java: Sonar way
INFO - Sensor JavaSourceImporter...
INFO - Sensor JavaSourceImporter done: 49 ms
INFO - Sensor JavaSquidSensor...
INFO - Java AST scan...
INFO - 116 source files to be analyzed
INFO - 116/116 source files analyzed
INFO - Java AST scan done: 6693 ms
WARN - Java bytecode has not been made available to the analyzer. The Depth of Inheritance Tree (DIT) metric, Response for Class (RFC) metric, Number of Children (NOC) metric, Lack of Cohesion (LCOM4) metric, deperecated dependencies metrics, UnusedPrivateMethod rule, RedundantThrowsDeclarationCheck rule, S1160 rule, S1217 rule are disabled.
INFO: ------------------------------------------------------------------------
INFO: EXECUTION FAILURE
INFO: ------------------------------------------------------------------------
Total time: 18.440s
Final Memory: 12M/357M
INFO: ------------------------------------------------------------------------
ERROR: Error during Sonar runner execution
ERROR: Unable to execute Sonar
ERROR: Caused by: org.sonar.api.resources.Directory cannot be cast to org.sonar.api.resources.JavaPackage
My sonar-project.properties file is the following:
#Required metadata
sonar.projectKey=mKey
sonar.projectName=myAndroidProject
sonar.projectVersion=1.0
# Paths to source directories.
# Paths are relative to the sonar-project.properties file. Replace "\" by "/" on Windows.
# Do not put the "sonar-project.properties" file in the same directory with the source code.
# (i.e. never set the "sonar.sources" property to ".")
sonar.sources=src
# The value of the property must be the key of the language.
sonar.language=java
# Encoding of the source code
sonar.sourceEncoding=UTF-8
# Analysis mode
sonar.analysis.mode=preview
#Enables the Lint profile to analyze the code using the Lint rules.
#sonar.profile=Android Lint
I'm using the following environment:
SonarQube 4.2 RC1
Sonar-runner 2.3
Database: MySQL
Ubuntu 12.04 LTS
Java 1.7
I tried uninstalling the Android plugin, but the problem persists. The only way I've found to solve it is to delete the database and the user and create them again.
As stated on http://docs.codehaus.org/pages/viewpage.action?pageId=236224987, the Android plugin is not yet compatible with SonarQube 4.2-RC1. See also http://jira.codehaus.org/browse/SONARPLUGINS-3483.
You need to provide the binaries (bytecode .class files) to the Sonar runner.
Add the following line to your sonar-project.properties:
# Path to the class files
sonar.binaries=build\\classes\\main
If the above line doesn't work, check the actual path of your binaries and set it in the sonar.binaries property.
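How the value of sonar.binaries is read depends on properties-file escaping. The sketch below assumes the runner parses sonar-project.properties with java.util.Properties semantics; the class name SonarBinariesPathSketch is made up, and build/classes/main is only a placeholder path.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class SonarBinariesPathSketch {
    /** Parses properties-file text and returns the value stored under key. */
    static String read(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // File content with two backslashes per separator, as in the line above.
        String backslashes = "sonar.binaries=build\\\\classes\\\\main";
        // The same location written with forward slashes.
        String forwardSlashes = "sonar.binaries=build/classes/main";
        System.out.println(read(backslashes, "sonar.binaries"));    // build\classes\main
        System.out.println(read(forwardSlashes, "sonar.binaries")); // build/classes/main
    }
}
```

The doubled backslashes collapse to single ones when the file is loaded. On Linux, such as the Ubuntu machine in the question, a backslash is not a path separator, so the forward-slash form is the safer choice there.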

Jetty Server does not start

I have a problem running jetty server.
>>> STARTING EMBEDDED JETTY SERVER, PRESS ANY KEY TO STOP
[main] INFO org.apache.wicket.velocity.Initializer - Initialized Velocity successfully
[main] WARN org.apache.wicket.protocol.http.WicketFilter - initialization failed, destroying now
[main] INFO org.apache.wicket.Application - [wicket.project] destroy: Wicket core library initializer
[main] INFO org.apache.wicket.Application - [wicket.project] destroy: DevUtils DebugBar Initializer
[main] INFO org.apache.wicket.Application - [wicket.project] destroy: Wicket extensions initializer
[main] INFO org.apache.wicket.Application - [wicket.project] destroy: Wicket JMX initializer
[main] INFO org.apache.wicket.Application - [wicket.project] destroy: org.apache.wicket.velocity.Initializer@1453a1c7
[main] WARN org.eclipse.jetty.util.component.AbstractLifeCycle - FAILED wicket.project: javax.servlet.ServletException: java.lang.UnsupportedOperationException: path to '/C:/Users/F%c4%b1rat/Desktop/2/src/itudb1323.db': 'C:\Users\F%c4%b1rat' does not exist
javax.servlet.ServletException: java.lang.UnsupportedOperationException: path to '/C:/Users/F%c4%b1rat/Desktop/2/src/itudb1323.db': 'C:\Users\F%c4%b1rat' does not exist
at org.apache.wicket.protocol.http.WicketFilter.init(WicketFilter.java:449)
The problem seems to be this: path to '/C:/Users/F%c4%b1rat/Desktop/2/src/itudb1323.db': 'C:\Users\F%c4%b1rat' does not exist
The path should be C:/Users/Fırat/Desktop/2/src/itudb1323.db; however, it tries to find F%c4%b1rat.
The ı is interpreted by Java as the character U+0131 LATIN SMALL LETTER DOTLESS I (ı).
Translated to UTF-8 from the Windows codepage (Windows-1254 for Turkish, since Windows-1252 has no ı), it becomes hex 0xC4 0xB1, hence the F%c4%b1rat part of the path, which has to be URL-encoded for the URLClassLoader.
It sounds like you have hit a JVM bug with Unicode and/or Windows codepage support in the URLClassLoader. I would encourage you not to deploy on these kinds of paths, or to upgrade your JVM to see whether a later version supports this better.
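The encoding described above can be reproduced with a few lines of Java. This is only an illustrative sketch: the class DotlessIPathSketch and its helper names are invented here, and it shows where the F%c4%b1rat form comes from, not where the JVM goes wrong.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DotlessIPathSketch {
    /** Percent-encodes every non-ASCII byte of the UTF-8 form of the text. */
    static String percentEncodeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int unsigned = b & 0xFF;
            if (unsigned < 0x80) {
                out.append((char) unsigned);
            } else {
                // Values >= 0x80 always print as two hex digits.
                out.append('%').append(Integer.toHexString(unsigned));
            }
        }
        return out.toString();
    }

    /** Decodes a percent-encoded string as UTF-8 (note: this also turns '+' into a space). */
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String name = "F\u0131rat"; // \u0131 is the dotless i, written as an escape
        System.out.println(percentEncodeNonAscii(name)); // F%c4%b1rat
        System.out.println(decode("F%c4%b1rat").equals(name)); // true
    }
}
```

Decoding the path with UTF-8 gives back the real directory name, so the error message suggests that the encoded form was used as a file path without being decoded first.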
You can try moving your workspace into a directory whose path does not contain characters specific to the Windows-1254 character set (for example, directly under C:/). This is a temporary workaround, but it can fix your problem.
