I'm trying to make use of Tika from a C# project that needs to extract text from a large volume of files.
I started with a simple proof of concept built on TikaJAXRS: it reads the content of each file and sends it in an HTTP PUT request to the TikaJAXRS server at http://localhost:9998/tika. This works reasonably well, but it struck me that the overhead of streaming content through HTTP must be slowing things down.
So I decided to write a Java implementation to see how the performance would compare once HTTP is removed from the equation. What I've found is unexpected. It performs much slower, taking roughly twice as long to parse 65 files of various types totaling 16MB. 1200ms for the TikaJAXRS HTTP scenario, 2400ms for the Java app.
Both the TikaJAXRS server and the Tika libraries I'm using are version 1.7. My Java code listing is below. What am I missing, why is my Java app so much slower?
import org.apache.log4j.varia.NullAppender;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import java.io.File;
import java.io.IOException;
import java.util.Collection;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.TrueFileFilter;
import org.apache.commons.lang3.time.StopWatch;
public class TikaTest {
public static void main(String[] args) {
// I'm not interested in what log4j has to say...
org.apache.log4j.BasicConfigurator.configure(new NullAppender());
File folder = new File("C:\\LMDevelopment");
StopWatch timer = new StopWatch();
timer.start();
Collection<File> files = FileUtils.listFiles(folder, TrueFileFilter.INSTANCE, TrueFileFilter.INSTANCE);
Tika tikaClient = new Tika();
try {
tikaClient.parseToString(files.iterator().next());
} catch (IOException e) {
e.printStackTrace();
} catch (TikaException e) {
e.printStackTrace();
}
System.out.println("Time to warm up: " + timer.getTime() + "ms");
timer.reset();
timer.start();
for (File f : files)
{
try {
tikaClient.parseToString(f);
} catch (IOException e) {
e.printStackTrace();
} catch (TikaException e) {
e.printStackTrace();
}
}
timer.stop();
System.out.println("Time to parse all files: " + timer.getTime() + "ms");
}
}
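One thing that may be worth ruling out before comparing the two numbers is whether a single warm-up call is enough for the JVM to reach a steady state. The sketch below only repeats a measured pass several times and reports each duration; it is a measurement aid, not an explanation of the gap. The names RepeatedTiming, timePasses, and parseAllFiles are hypothetical, and the placeholder workload stands in for the Tika loop in the listing above.

```java
import java.util.ArrayList;
import java.util.List;

final class RepeatedTiming {

    /** Runs the task the given number of times and returns the elapsed milliseconds of each pass. */
    static List<Long> timePasses(Runnable task, int passes) {
        List<Long> elapsedMs = new ArrayList<>();
        for (int i = 0; i < passes; i++) {
            long start = System.nanoTime();
            task.run();
            elapsedMs.add((System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in: replace the body with the loop that calls parseToString on every file.
        Runnable parseAllFiles = () -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i % 7;
            }
            if (sum < 0) {
                System.out.println(sum); // keeps the loop from being optimized away entirely
            }
        };
        List<Long> timings = timePasses(parseAllFiles, 5);
        for (int i = 0; i < timings.size(); i++) {
            System.out.println("Pass " + (i + 1) + ": " + timings.get(i) + "ms");
        }
    }
}
```

If the later passes are much faster than the first, the 2400ms figure mostly reflects start-up cost rather than steady-state parsing speed.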
I have been experimenting with Java Swing using a GUI and have hit a wall. I am trying to play a sound using Java Sound. Ultimately, I want to push a button and have the sound play. I have tried a lot of combinations, but none seem to work. Here is the latest code I tried; when I run it, it reports:
Error: could not find or load main class.
I am not seeing why:
package net.codejava.sound;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.UnsupportedAudioFileException;
/**
* This is an example program that demonstrates how to play back an audio file
* using the SourceDataLine in Java Sound API.
* @author www.codejava.net
*
*/
public class AudioPlayerExample2 {
// size of the byte buffer used to read/write the audio stream
private static final int BUFFER_SIZE = 4096;
/**
* Play a given audio file.
* @param audioFilePath Path of the audio file.
*/
void play(String audioFilePath) {
File audioFile = new File(audioFilePath);
try {
AudioInputStream audioStream = AudioSystem.getAudioInputStream(audioFile);
AudioFormat format = audioStream.getFormat();
DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
SourceDataLine audioLine = (SourceDataLine) AudioSystem.getLine(info);
audioLine.open(format);
audioLine.start();
System.out.println("Playback started.");
byte[] bytesBuffer = new byte[BUFFER_SIZE];
int bytesRead = -1;
while ((bytesRead = audioStream.read(bytesBuffer)) != -1) {
audioLine.write(bytesBuffer, 0, bytesRead);
}
audioLine.drain();
audioLine.close();
audioStream.close();
System.out.println("Playback completed.");
} catch (UnsupportedAudioFileException ex) {
System.out.println("The specified audio file is not supported.");
ex.printStackTrace();
} catch (LineUnavailableException ex) {
System.out.println("Audio line for playing back is unavailable.");
ex.printStackTrace();
} catch (IOException ex) {
System.out.println("Error playing the audio file.");
ex.printStackTrace();
}
}
public static void main(String[] args) {
String audioFilePath = "https://codehs.com/uploads/1981fc4b1d2e4123e9cbe7ab8cc1962a";
AudioPlayerExample2 player = new AudioPlayerExample2();
player.play(audioFilePath);
}
}
I made a couple small changes to the tutorial code example you posted, and the program worked perfectly well.
Here are my changes:
(1) Replaced "File audioFile = new File(audioFilePath);" with the following:
URL audioFile = null;
try {
audioFile = new URL(audioFilePath);
} catch (MalformedURLException e) {
e.printStackTrace();
}
(2) Added the following line to the module-info file (required if you are using Java 9 or higher):
requires java.desktop;
My package setting is slightly different, but I assume you know how to properly set up packages. Your class is in the file folder specified by the package statement, yes?
The error being cited: "could not find or load main class" indicates that something is going wrong with how the code is being invoked rather than a problem with the audio part of the code. What version of Java are you using? What IDE? What is the command you are issuing to execute the program? FWIW, my setup that successfully executed this code has an up-to-date Eclipse IDE running Java 11.
Nam Ha Minh's tutorials at codejava.net usually are quite good. I think he is one of the more reliable tutorial writers out there.
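Change (1) can also be written as a small helper that accepts either a web address or a local path and hands the result to AudioSystem.getAudioInputStream(URL). This is only a sketch under my own assumptions: the class name AudioSource and the methods toUrl and open are hypothetical, and the http/https prefix check is a simplification.

```java
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

final class AudioSource {

    /** Turns an http(s) address or a local file path into a URL. */
    static URL toUrl(String location) throws MalformedURLException {
        if (location.startsWith("http://") || location.startsWith("https://")) {
            return URI.create(location).toURL();
        }
        // Anything else is treated as a path on the local file system.
        return new File(location).toURI().toURL();
    }

    /** Opens the audio data behind the location; fails if no installed reader supports it. */
    static AudioInputStream open(String location)
            throws IOException, UnsupportedAudioFileException {
        return AudioSystem.getAudioInputStream(toUrl(location));
    }

    public static void main(String[] args) throws Exception {
        for (String location : args) {
            System.out.println(location + " -> " + toUrl(location));
        }
    }
}
```

The play method from the question could then call AudioSource.open(audioFilePath) instead of building a File, and the rest of the playback loop stays unchanged.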
I have this code that reads an mp3 file
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
public class Sound {
public static void main(String[] args) {
File sampleFile = new File("test.mp3");
try {
AudioSystem.getAudioFileFormat(sampleFile);
} catch (UnsupportedAudioFileException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
The problem is that it throws an UnsupportedAudioFileException, even though the file is an MP3 file. Does Java not support MP3 files? If not, what are other ways to validate an audio file (such as OGG or WAV)?
You may take a look at the Apache Tika library. It can detect the type of a file from its content and extract file metadata. It supports the MP3 format.
Here is an example of file type detection with Apache Tika.
You need to add the MP3SPI library so that the Java audio API can recognize and decode MP3 files.
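For formats the stock Java Sound readers do handle (WAV, for example), validation can be reduced to a yes/no check around AudioSystem.getAudioFileFormat. The sketch below assumes the data fits in memory; AudioFileCheck, isRecognized, and silentWav are hypothetical names, and the generated WAV exists only so the check has something to run against. With an MP3 provider such as MP3SPI on the classpath, the same check should start accepting MP3 data as well, though I have not verified that here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

final class AudioFileCheck {

    /** Returns true when an installed Java Sound reader recognizes the data, false otherwise. */
    static boolean isRecognized(byte[] data) {
        // getAudioFileFormat needs mark/reset support, which ByteArrayInputStream provides.
        try (InputStream in = new ByteArrayInputStream(data)) {
            AudioSystem.getAudioFileFormat(in);
            return true;
        } catch (UnsupportedAudioFileException | IOException e) {
            // Unknown format or unreadable data: both count as "not a valid audio file" here.
            return false;
        }
    }

    /** Builds a short in-memory WAV file containing silence. */
    static byte[] silentWav() throws IOException {
        AudioFormat format = new AudioFormat(8000f, 16, 1, true, false);
        byte[] pcm = new byte[1600]; // 800 frames of 16-bit mono silence
        AudioInputStream stream = new AudioInputStream(
                new ByteArrayInputStream(pcm), format, pcm.length / format.getFrameSize());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(stream, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Generated WAV recognized: " + isRecognized(silentWav()));
        System.out.println("Plain text recognized: " + isRecognized("not audio".getBytes()));
    }
}
```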
I want to use the ttorrent Java library in my project, and I am trying to figure out how it works. When I run it as a standalone program with
./client -o ~ ~/file.torrent -i eth3
the progress always stays at 0%. When I tried to use it as a library with simple code like this:
import java.io.File;
import java.net.InetAddress;
import java.util.concurrent.TimeUnit;
import org.apache.log4j.BasicConfigurator;
import com.turn.ttorrent.client.Client;
import com.turn.ttorrent.client.Client.ClientState;
import com.turn.ttorrent.client.SharedTorrent;
public class Main {
/**
* @param args
*/
public static void main(String[] args) {
BasicConfigurator.configure();
// Get options
File output = new File("/home/user");
// Get the .torrent file path
File torrentPath = new File("/home/user/file.torrent");
// Start downloading file
try {
SharedTorrent torrent = SharedTorrent.fromFile(torrentPath, output);
System.out.println("Starting client for torrent: "
+ torrent.getName());
Client client = new Client(InetAddress.getLocalHost(),
torrent);
try {
System.out.println("Start to download: " + torrent.getName());
client.download(); // DONE for completion signal
while (!ClientState.SEEDING.equals(client.getState())) {
// Check if there's an error
if (ClientState.ERROR.equals(client.getState())) {
throw new Exception("ttorrent client Error State");
}
// Display statistics
System.out
.printf("%f %% - %d bytes downloaded - %d bytes uploaded\n",
torrent.getCompletion(),
torrent.getDownloaded(),
torrent.getUploaded());
// Wait one second
TimeUnit.SECONDS.sleep(1);
}
System.out.println("download completed.");
} catch (Exception e) {
System.err.println("An error occurs...");
e.printStackTrace(System.err);
} finally {
System.out.println("stop client.");
client.stop();
}
} catch (Exception e) {
System.err.println("An error occurs...");
e.printStackTrace(System.err);
}
}
}
it always stays at 0% too. I downloaded the same torrent file with another client and it worked fine, so I assume the problem is not a lack of seeds.
I'm trying to create a simple Flash chat application for educational purposes, but I'm stuck trying to send a policy file from my Java server to the Flash app (after several hours of googling with little luck).
The policy file request reaches the server, which sends a hardcoded policy XML back to the app, but the Flash app doesn't seem to react to it at all until it gives me a security sandbox error.
I'm loading the policy file using the following code in the client:
Security.loadPolicyFile("xmlsocket://myhostname:" + PORT);
The server recognizes the request as "<policy-file-request/>" and responds by sending the following xml string to the client:
public static final String POLICY_XML =
"<?xml version=\"1.0\"?>"
+ "<cross-domain-policy>"
+ "<allow-access-from domain=\"*\" to-ports=\"*\" />"
+ "</cross-domain-policy>";
The code used to send it looks like this:
try {
_dataOut.write(PolicyServer.POLICY_XML + (char)0x00);
_dataOut.flush();
System.out.println("Policy sent to client: " + PolicyServer.POLICY_XML);
} catch (Exception e) {
trace(e);
}
Did I mess something up with the xml or is there something else I might have overlooked?
I've seen your approach, and after some time trying I wrote a working class that listens on any port you want:
package Server;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
public class PolicyServer {
public static final String POLICY_XML =
"<?xml version=\"1.0\"?>"
+ "<cross-domain-policy>"
+ "<allow-access-from domain=\"*\" to-ports=\"*\" />"
+ "</cross-domain-policy>";
public PolicyServer(){
ServerSocket ss = null;
try {
ss = new ServerSocket(843);
} catch (IOException e) {
e.printStackTrace();
return; // without a listening socket there is nothing to accept
}
while(true){
try {
final Socket client = ss.accept();
new Thread(new Runnable() {
@Override
public void run() {
try {
client.setSoTimeout(10000); //clean failed connections
client.getOutputStream().write(PolicyServer.POLICY_XML.getBytes());
client.getOutputStream().write(0x00); //write required endbit
client.getOutputStream().flush();
BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
//reading two lines empties Flash's buffer, and after that it works
in.readLine();
in.readLine();
} catch (IOException e) {
}
}
}).start();
} catch (Exception e) {}
}
}
}
Try adding \n at the end of the policy XML.
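For reference, the framing in this exchange can be isolated from the socket code. As far as I know, Flash sends the request string followed by a null byte and expects the policy XML to be terminated by a null byte as well, which matches the 0x00 written in the code above. The sketch below only builds and recognizes those byte sequences; PolicyFrames, isPolicyRequest, and encodeResponse are hypothetical names, and UTF-8 is an assumption on my part.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PolicyFrames {

    static final String REQUEST = "<policy-file-request/>";

    /** True when the first length bytes hold the policy request, with or without its trailing null byte. */
    static boolean isPolicyRequest(byte[] received, int length) {
        int end = length;
        if (end > 0 && received[end - 1] == 0) {
            end--; // drop the null terminator before comparing
        }
        String text = new String(received, 0, end, StandardCharsets.UTF_8);
        return REQUEST.equals(text.trim());
    }

    /** Encodes the policy XML followed by a single null byte. */
    static byte[] encodeResponse(String policyXml) {
        byte[] xml = policyXml.getBytes(StandardCharsets.UTF_8);
        // copyOf pads with zeros, so the extra last byte is the null terminator.
        return Arrays.copyOf(xml, xml.length + 1);
    }

    public static void main(String[] args) {
        byte[] request = (REQUEST + "\0").getBytes(StandardCharsets.UTF_8);
        System.out.println("Request recognized: " + isPolicyRequest(request, request.length));
        byte[] response = encodeResponse("<cross-domain-policy/>");
        System.out.println("Response ends with null byte: " + (response[response.length - 1] == 0));
    }
}
```

Writing the encoded bytes directly to the socket's OutputStream avoids any question about how a writer or data stream encodes the trailing (char)0x00.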
How do I back up and restore any kind of database from inside my Java application using flat files? Are there any tools or frameworks available to back up a database to a flat file such as CSV, XML, or a secure encrypted file, and to restore a database from CSV or XML files? It should also be capable of table-wise backup and restore.
There are many ways to do this. It really depends on how complicated your "database" is.
The simplest solution is to write to a text file in a CSV format:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
public class FileOutput {
    public static void main(String[] args) {
        File file = new File("C:\\MyFile.csv");
        // try-with-resources flushes and closes both streams, even if writing fails
        try (FileOutputStream fos = new FileOutputStream(file);
             PrintWriter output = new PrintWriter(fos)) {
            output.println("Column A, Column B, Column C");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
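For a real table, the CSV route needs two more pieces: escaping of field values and a loop over the query result. The sketch below is one way to do that with plain JDBC interfaces. The names CsvTableDump, escape, toCsvLine, and dump are hypothetical, the quoting follows the usual CSV convention of doubling embedded quotes, and writing SQL NULL as an empty field is a simplification that a restore step would have to account for.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class CsvTableDump {

    /** Quotes a field when it contains a comma, a quote, or a line break; null becomes an empty field. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    /** Joins the escaped fields with commas. */
    static String toCsvLine(List<String> fields) {
        List<String> escaped = new ArrayList<>();
        for (String field : fields) {
            escaped.add(escape(field));
        }
        return String.join(",", escaped);
    }

    /** Writes a header row followed by every row of the result set. */
    static void dump(ResultSet rows, Writer out) throws SQLException, IOException {
        ResultSetMetaData meta = rows.getMetaData();
        int columns = meta.getColumnCount();
        List<String> cells = new ArrayList<>();
        for (int c = 1; c <= columns; c++) {
            cells.add(meta.getColumnLabel(c));
        }
        out.write(toCsvLine(cells) + "\n");
        while (rows.next()) {
            cells.clear();
            for (int c = 1; c <= columns; c++) {
                cells.add(rows.getString(c)); // every value is written in its string form
            }
            out.write(toCsvLine(cells) + "\n");
        }
    }
}
```

Calling dump once per table, each with its own SELECT * result and output file, gives a table-wise backup; the matching JDBC driver must be on the classpath.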
Or, if you're looking for an XML solution, you can use the Xerces API. A Xerces-derived parser ships with the JDK behind the javax.xml packages, so you only have to import those packages.
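A minimal sketch of the XML route, using only the javax.xml and org.w3c.dom packages that ship with the JDK. The element and attribute names (table, row, col, name) are my own choice for illustration, as are the class XmlTableDump and the method toXml; rows are passed in as maps from column name to value.

```java
import java.io.StringWriter;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class XmlTableDump {

    /** Serializes the rows of one table as XML; special characters in values are escaped by the serializer. */
    static String toXml(String tableName, List<Map<String, String>> rows)
            throws ParserConfigurationException, TransformerException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element table = doc.createElement("table");
        table.setAttribute("name", tableName);
        doc.appendChild(table);
        for (Map<String, String> row : rows) {
            Element rowElement = doc.createElement("row");
            table.appendChild(rowElement);
            for (Map.Entry<String, String> cell : row.entrySet()) {
                Element col = doc.createElement("col");
                col.setAttribute("name", cell.getKey());
                col.setTextContent(cell.getValue());
                rowElement.appendChild(col);
            }
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "1");
        row.put("name", "a<b");
        System.out.println(toXml("users", Collections.singletonList(row)));
    }
}
```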