Apache Nutch 1.5 and Solr 4.7 indexing - java

I have crawled websites using Apache Nutch and want to index the data in Solr. I have been following the tutorial mentioned here.
However, the tutorial covers indexing as part of the crawl, whereas I need to index data that has already been crawled.
I am running the command below:
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
[abc#xyz nutch-crawler]$ bin/nutch index http://abc.xyz:8983/solr/ pryder/crawldb/ -linkdb pryder/linkdb/ pryder/segments/20140330021243/
Indexer: starting at 2014-04-02 20:34:09
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/solr/client/solrj/impl/CommonsHttpSolrServer
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2398)
at java.lang.Class.getConstructor0(Class.java:2708)
at java.lang.Class.newInstance0(Class.java:328)
at java.lang.Class.newInstance(Class.java:310)
at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:157)
at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:57)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:91)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 11 more
What is going wrong here?
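The stack trace shows the Nutch indexer plugin asking for org.apache.solr.client.solrj.impl.CommonsHttpSolrServer. As far as I know, that class belongs to SolrJ 3.x and was replaced by HttpSolrServer in SolrJ 4.x, so a likely cause is that the SolrJ jar on Nutch's classpath does not match the version the plugin was built against. A minimal sketch for checking which of the two class names is actually loadable (the class name ClassProbe is made up for illustration, and the code assumes a JDK 7 or later):

```java
public class ClassProbe {

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            // class named in the stack trace above
            "org.apache.solr.client.solrj.impl.CommonsHttpSolrServer",
            // name used by SolrJ 4.x (assumption: this is the replacement class)
            "org.apache.solr.client.solrj.impl.HttpSolrServer"
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "missing"));
        }
    }
}
```

Run it with the same jars on the classpath that Nutch uses. If only the second name is found, the SolrJ jar is newer than the one the indexer plugin expects.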

Related

Configuring Pig-0.12.1 with Hadoop-2.5.0

I have an existing Hadoop client (hadoop-0.20.2) at $HADOOP_HOME. With this version of Hadoop all the client configuration files are placed in the directory: $HADOOP_HOME/conf
To get Pig to work, I have set $PIG_CLASSPATH=$HADOOP_HOME/conf. I can then simply run Pig from its home directory without issues.
I tried to set up a new Hadoop client (hadoop-2.5.0), setting $HADOOP_HOME to point at this directory. For this version of Hadoop I have placed the client configuration at $HADOOP_HOME/etc/hadoop
I then set $PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop. However, when I try to run Pig now I get the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
at org.apache.pig.Main.run(Main.java:587)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.JobConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 7 more
Are there other locations that need to be added to the PIG_CLASSPATH to get this up and running?
I managed to find the cause. I had copied configuration files that had been sent to me into $HADOOP_HOME/etc/hadoop. Inside hadoop-env.sh there was a line
export HADOOP_MAPRED_HOME=....
that pointed the location of the relevant jars at a non-existent directory.
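A stale variable like this is easy to miss, so it can help to check that each Hadoop-related variable names a directory that really exists. The following is only a sketch: the class name HadoopEnvCheck and the choice of variables are assumptions, it needs Java 8 or later, and it only sees variables that are exported in the shell it is started from (for example after sourcing hadoop-env.sh).

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

public class HadoopEnvCheck {

    /** Maps each variable name to true if it is set and names an existing directory. */
    static Map<String, Boolean> checkDirs(Map<String, String> env, String... names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            String value = env.get(name);
            result.put(name, value != null && new File(value).isDirectory());
        }
        return result;
    }

    public static void main(String[] args) {
        // Variable names taken from the question and answer above
        Map<String, Boolean> status =
                checkDirs(System.getenv(), "HADOOP_HOME", "HADOOP_MAPRED_HOME");
        status.forEach((name, ok) ->
                System.out.println(name + " -> " + (ok ? "ok" : "unset or not a directory")));
    }
}
```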

Java Error when trying to start stardog server

When I try to start my Stardog server, my terminal shows the following Java error:
Exception in thread "main" java.lang.NoClassDefFoundError: com/complexible/stardog/cli/admin/CLI
Caused by: java.lang.ClassNotFoundException: com.complexible.stardog.cli.admin.CLI
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
I have no idea why this keeps happening, because I have gotten my Stardog server to run before, but today it started showing me this error.
Does anybody have an idea why this keeps happening?
I'm working on OS X Mavericks (if that helps anyone).
There seems to be a problem with the classpath.
Searching for this I found this exchange:
This is due to a bug in the stardog.bat script. If you add a semicolon to line 21 so that it reads 'set CLASSPATH=%HOMEDIR%\client\api*;%HOMEDIR%\client\cli*;%HOMEDIR%\client\http*;%HOMEDIR%\client\snarl*;%HOMEDIR%\pack\client*;%SLF4J_JARS%' this problem will be fixed.

Trouble setting up lucene on Mac OSX

I'm having a lot of trouble getting Lucene to work on Mac OS 10.7.5
I downloaded the binaries from http://lucene.apache.org/core/2_9_4/demo.html.
I changed my classpath
$ echo $CLASSPATH
/Users/me/Downloads/lucene-4.5.1/demo/lucene-demo-4.5.1.jar:/Users/me/Downloads/lucene-4.5.1/core/lucene-core-4.5.1.jar
Now I'm trying to run it.
$java org.apache.lucene.demo.IndexFiles /Users/me/Downloads/lucene-4.5.1/src
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/analysis/standard/StandardAnalyzer
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
at java.lang.Class.getMethod0(Class.java:2774)
at java.lang.Class.getMethod(Class.java:1663)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.analysis.standard.StandardAnalyzer
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 6 more
It's not working. Can someone give me a step-by-step guide to getting up and running with Lucene? I have a simple task I want to achieve, which is searching for text in a directory of files in a more efficient way than grep. Any help would be appreciated.
You'll also need to add two more jars to your classpath: lucene-analyzers-common-{version}.jar to fix this problem, and lucene-queryparser-{version}.jar to fix the next one. More recent demo documentation makes this clear (the documentation you linked to is for version 2.9.4).
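To see which jar on a classpath actually provides a class (or that none does), the jars can be scanned for the matching .class entry. This is a rough sketch rather than a Lucene tool: the class name FindClassInJars is made up, it assumes JDK 7 or later, and it looks only at entries ending in .jar, ignoring directories.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

public class FindClassInJars {

    /** Returns the jar entries of the given classpath that contain the named class. */
    static List<String> jarsContaining(String classpath, String className) {
        String entryName = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            File file = new File(entry);
            if (!file.isFile() || !entry.endsWith(".jar")) {
                continue; // skip directories, missing files, and non-jar entries
            }
            try (JarFile jar = new JarFile(file)) {
                if (jar.getEntry(entryName) != null) {
                    hits.add(entry);
                }
            } catch (IOException e) {
                // unreadable or corrupt jar: treat it as not containing the class
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace above
        String className = args.length > 0
                ? args[0]
                : "org.apache.lucene.analysis.standard.StandardAnalyzer";
        List<String> hits = jarsContaining(System.getProperty("java.class.path"), className);
        System.out.println(hits.isEmpty() ? className + " not found in any jar" : hits.toString());
    }
}
```

With only the demo and core jars on the classpath, as in the question, the expectation is that no jar is reported for StandardAnalyzer until lucene-analyzers-common is added.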

Not able to connect MS SQL server 2008 with java

I tried to connect Java to my MS SQL Express instance, and I downloaded sqljdbc4.jar as well.
When I don't bother with the CLASSPATH and try to execute my program (even with sqljdbc in
C:\Program Files (x86)\Java\jre6\lib)
I get the following error:
java.lang.ClassNotFoundException: com.microsoft.jdbc.sqlserver.SQLServerDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at Connect.getConnection(Connect.java:24)
at Connect.displayDbProperties(Connect.java:42)
at Connect.main(Connect.java:78)
Error Trace in getConnection() : com.microsoft.jdbc.sqlserver.SQLServerDriver
Error: No active Connection
But if I set the CLASSPATH variable to point to sqljdbc4.jar, a ClassNotFoundException occurs for my own class name instead.
Need immediate help. Kindly respond.
I invoke it from the command line. I tried setting the classpath in the system variables dialog box of Windows 7. When I do that, or use set CLASSPATH="C:\temp\sqljdbc4.jar", the JRE fails to recognize the main class of my program and throws
Exception in thread "main" java.lang.NoClassDefFoundError: Connect
Caused by: java.lang.ClassNotFoundException: Connect
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: Connect. Program will exit.
So I am at a loss here.
Your problem occurs because the classloader has not found the driver jar. How do you start your program?
Did you pass -cp or -classpath if you run it from the command line? Or did you add the jar to the lib folder if it is a web application?
You should set the classpath to include the sqljdbc4.jar file. See details here.
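Two details may be worth checking here. First, the class name in the first stack trace (com.microsoft.jdbc.sqlserver.SQLServerDriver) is the name used by the older SQL Server 2000 driver; the driver in sqljdbc4.jar is com.microsoft.sqlserver.jdbc.SQLServerDriver. Second, once CLASSPATH is set explicitly, the current directory is no longer searched unless "." is listed too, which would explain why Connect itself stops being found. Something like java -cp .;C:\temp\sqljdbc4.jar Connect covers both the program and the driver. A minimal sketch of the loading code, where the class name ConnectSketch, the host, port, database, user, and password are all placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ConnectSketch {

    // Driver class shipped in sqljdbc4.jar (not the SQL Server 2000 name from the stack trace)
    static final String DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver";

    /** Builds a JDBC URL in the form used by the Microsoft driver. */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
    }

    public static void main(String[] args) {
        try {
            Class.forName(DRIVER); // throws if sqljdbc4.jar is not on the classpath
        } catch (ClassNotFoundException e) {
            System.out.println("Driver not on classpath: " + e.getMessage());
            return;
        }
        String url = jdbcUrl("localhost", 1433, "master"); // placeholder connection details
        try (Connection connection = DriverManager.getConnection(url, "user", "password")) {
            System.out.println("Connected: " + !connection.isClosed());
        } catch (SQLException e) {
            System.out.println("Driver loaded, but connecting failed: " + e.getMessage());
        }
    }
}
```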

using nutch 1.4 in ubuntu

I am trying to use the Nutch 1.4 crawler on Ubuntu. However, when I try to execute nutch with all the settings suggested in the Nutch wiki, it gives this error:
erogol#erogol-G50V:~/Desktop/search engine/apache-nutch-1.4-bin/runtime/local$
bin/nutch crawl urls -dir crawl -depth 1
bin/nutch: line 108: [: /home/erogol/Desktop/search: binary operator expected
Exception in thread "main" java.lang.NoClassDefFoundError: engine/apache-nutch-1/4-bin/runtime/local/logs
Caused by: java.lang.ClassNotFoundException: engine.apache-nutch-1.4-bin.runtime.local.logs
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: engine/apache-nutch-1.4-bin/runtime/local/logs. Program will exit.
Do you have any suggestions for solving this problem with Nutch?
Thanks in advance... all nutch knowers :)
The problem isn't about Nutch: the space in the "search engine" folder name causes it. As you can see from the ClassNotFoundException, the part after the space is taken as the name of the class to be executed. Can't you just rename "search engine" to something else, like search-engine?
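The effect can be reproduced without Nutch. When a path containing a space is used unquoted in a command line, it is split into two words, and the java launcher treats the first word that is not an option as the main class. The sketch below imitates that splitting. The -Dhadoop.log.dir option is an assumption about what the bin/nutch script passes, and the splitting ignores real shell quoting rules; the class name SpaceInPath is made up.

```java
import java.util.Arrays;
import java.util.List;

public class SpaceInPath {

    /** Splits on whitespace, roughly as an unquoted shell expansion would. */
    static List<String> naiveSplit(String commandLine) {
        return Arrays.asList(commandLine.trim().split("\\s+"));
    }

    /** Returns the first word after the program name that does not look like an option. */
    static String firstNonOption(List<String> tokens) {
        for (String token : tokens.subList(1, tokens.size())) {
            if (!token.startsWith("-")) {
                return token; // simplification: options with separate values are not handled
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String logDir =
                "/home/erogol/Desktop/search engine/apache-nutch-1.4-bin/runtime/local/logs";
        // Assumed shape of the command the script builds, with the path left unquoted
        String command = "java -Dhadoop.log.dir=" + logDir + " org.apache.nutch.crawl.Crawl";
        List<String> tokens = naiveSplit(command);
        System.out.println("tokens: " + tokens);
        System.out.println("taken as main class: " + firstNonOption(tokens));
    }
}
```

The word reported as the main class is the part of the path after the space, which matches the name in the error message.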
