Reading word file and saving it as odt - java

I have downloaded the ODF Toolkit, so I don't have to set up OpenOffice. I can create an .odt file as shown below.
My question is: can I read .doc and .docx files and save them as .odt?
Could you help me, please?
Here is the code:
import org.odftoolkit.odfdom.doc.OdfTextDocument;
import java.net.URI;

public class QuickOdt {
    public static void main(String[] args) {
        OdfTextDocument outputDocument;
        try {
            outputDocument = OdfTextDocument.newTextDocument();
            outputDocument.addText("I'm using the ODFDOM toolkit!");
            outputDocument.newParagraph();
            outputDocument.newImage(new URI("images/odf-community.jpg"));
            outputDocument.newParagraph("This is a new paragraph");
            outputDocument.save("quick.odt");
        } catch (Exception e) {
            System.err.println("Unable to create output file.");
            System.err.println(e.getMessage());
        }
    }
}

You should look into Apache POI for reading Word documents (.doc and .docx), and Apache OpenOffice Writer for writing the .odt format.

Related

Crawl online directories and parse online pdf document to extract text in java

I need to crawl an online directory, for example http://svn.apache.org/repos/asf/, and whenever the crawl comes across a pdf, docx, txt, or odt file, I need to parse it and extract its text.
I am using Files.walk to crawl locally on my laptop and the Apache Tika library to parse the text, and that works just fine, but I don't know how to do the same for an online directory.
Here's the code that walks my PC and parses the files, so you have an idea of what I'm doing:
public static void GetFiles() throws IOException {
    // PathXml is the directory path, such as "/home/user/", that
    // is taken from an XML file.
    Files.walk(Paths.get(PathXml)).forEach(filePath -> { // Crawling process (using Java 8)
        if (Files.isRegularFile(filePath)) {
            if (filePath.toString().endsWith(".pdf") || filePath.toString().endsWith(".docx") ||
                    filePath.toString().endsWith(".txt")) {
                try {
                    TikaReader.ParsedText(filePath.toString());
                } catch (IOException e) {
                    e.printStackTrace();
                } catch (SAXException e) {
                    e.printStackTrace();
                } catch (TikaException e) {
                    e.printStackTrace();
                }
                System.out.println(filePath);
            }
        }
    });
}
And here's the TikaReader method:
public static String ParsedText(String file) throws IOException, SAXException, TikaException {
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    // try-with-resources closes the stream even if parsing fails
    try (InputStream stream = new FileInputStream(file)) {
        parser.parse(stream, handler, metadata);
        System.out.println(handler.toString());
        return handler.toString();
    }
}
So again, how can I do the same thing for the online directory given above?
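One way to approach this, as a rough sketch rather than a tested crawler: download each index page, pull out the href values, recurse into the links that end in "/", and hand the stream of each matching file to the parser. The class below covers only the link-extraction step and uses nothing beyond the JDK. The class name `ListingLinks`, its method names, and the regex are illustrative assumptions; a regex like this is only adequate for simple auto-generated index pages, and a real HTML parser would be more robust.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts links from the HTML of a directory index page.
public class ListingLinks {
    // Matches href="..." attributes; assumes double-quoted attribute values.
    private static final Pattern HREF =
            Pattern.compile("href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);
    private static final String[] EXTENSIONS = {".pdf", ".docx", ".txt", ".odt"};

    // Resolves every href on the page against the page's own URI.
    private static List<URI> links(URI base, String html) {
        List<URI> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                result.add(base.resolve(m.group(1)));
            } catch (IllegalArgumentException e) {
                // href is not a valid URI (for example, it contains spaces); skip it
            }
        }
        return result;
    }

    /** Links whose path ends with one of the wanted document extensions. */
    public static List<URI> documentLinks(URI base, String html) {
        List<URI> docs = new ArrayList<>();
        for (URI link : links(base, html)) {
            String path = link.getPath();
            if (path == null) {
                continue; // opaque URI such as mailto:
            }
            String lower = path.toLowerCase(Locale.ROOT);
            for (String ext : EXTENSIONS) {
                if (lower.endsWith(ext)) {
                    docs.add(link);
                    break;
                }
            }
        }
        return docs;
    }

    /** Links ending in "/" that lie strictly below the base, so "../" is never followed. */
    public static List<URI> subdirectoryLinks(URI base, String html) {
        List<URI> dirs = new ArrayList<>();
        String prefix = base.toString();
        for (URI link : links(base, html)) {
            String s = link.toString();
            if (s.endsWith("/") && s.startsWith(prefix) && !s.equals(prefix)) {
                dirs.add(link);
            }
        }
        return dirs;
    }
}
```

Each document URI could then be opened with `uri.toURL().openStream()`, and that stream passed to `parser.parse(stream, handler, metadata)` in place of the `FileInputStream`, while each subdirectory URI is fetched and scanned the same way.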

Tika slower in my Java app than in TikaJAXRS

I'm trying to make use of Tika from a C# project that needs to extract text from a large volume of files.
I started with a simple proof of concept that made use of TikaJAXRS, reading the content of the files and making an HTTP PUT request with the file content to the TikaJAXRS server at http://localhost:9998/tika. This works reasonably well, but it struck me that the overhead of streaming content through HTTP must be slowing things down.
So I decided to write a Java implementation to see how the performance would compare once HTTP is removed from the equation. What I found was unexpected: it is much slower, taking roughly twice as long to parse 65 files of various types totaling 16MB. That is 1200ms for the TikaJAXRS HTTP scenario versus 2400ms for the Java app.
Both the TikaJAXRS server and the Tika libraries I'm using are version 1.7. My Java code listing is below. What am I missing? Why is my Java app so much slower?
import org.apache.log4j.varia.NullAppender;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import java.io.File;
import java.io.IOException;
import java.util.Collection;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.TrueFileFilter;
import org.apache.commons.lang3.time.StopWatch;

public class TikaTest {
    public static void main(String[] args) {
        // I'm not interested in what log4j has to say...
        org.apache.log4j.BasicConfigurator.configure(new NullAppender());
        File folder = new File("C:\\LMDevelopment");
        StopWatch timer = new StopWatch();
        timer.start();
        Collection<File> files = FileUtils.listFiles(folder, TrueFileFilter.INSTANCE, TrueFileFilter.INSTANCE);
        Tika tikaClient = new Tika();
        try {
            tikaClient.parseToString(files.iterator().next());
        } catch (IOException e) {
            e.printStackTrace();
        } catch (TikaException e) {
            e.printStackTrace();
        }
        System.out.println("Time to warm up: " + timer.getTime() + "ms");
        timer.reset();
        timer.start();
        for (File f : files) {
            try {
                tikaClient.parseToString(f);
            } catch (IOException e) {
                e.printStackTrace();
            } catch (TikaException e) {
                e.printStackTrace();
            }
        }
        timer.stop();
        System.out.println("Time to parse all files: " + timer.getTime() + "ms");
    }
}

Java detecting an audio file (mp3)

I have this code that reads an mp3 file
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class Sound {
    public static void main(String[] args) {
        File sampleFile = new File("test.mp3");
        try {
            AudioSystem.getAudioFileFormat(sampleFile);
        } catch (UnsupportedAudioFileException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The problem is that it throws an UnsupportedAudioFileException, even though the file is an mp3 file. Doesn't Java support mp3 files? If not, what other ways are there to validate an audio file (such as ogg or wav)?
You may take a look at the Apache Tika library. It can detect the type of a file from its content and extract file metadata. It supports the mp3 format.
Here is an example of file type detection with Apache Tika.
You need to add the MP3SPI library so that the Java audio API can recognize and decode mp3 files.
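To illustrate what "detecting a type by its content" means, here is a small JDK-only sketch that looks at a file's leading bytes. It is not how Tika or MP3SPI are implemented, and it does not validate that the audio data is playable; it only recognizes a few well-known signatures (an ID3v2 tag or an MPEG Layer III frame header for mp3, "OggS" for ogg, "RIFF....WAVE" for wav). The class name `AudioSniffer` and the returned labels are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: guesses an audio container from the first bytes of a file.
public class AudioSniffer {

    /** Reads up to 12 bytes from the file and classifies them. */
    public static String sniff(Path file) throws IOException {
        byte[] head = new byte[12];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            while (n < head.length && (r = in.read(head, n, head.length - n)) > 0) {
                n += r;
            }
        }
        return sniff(Arrays.copyOf(head, n));
    }

    /** Classifies the leading bytes of a file; returns "unknown" if nothing matches. */
    public static String sniff(byte[] head) {
        if (startsWith(head, 0, "ID3")) {
            return "mp3"; // ID3v2 tag, which usually precedes MP3 audio
        }
        if (startsWith(head, 0, "OggS")) {
            return "ogg";
        }
        if (startsWith(head, 0, "RIFF") && startsWith(head, 8, "WAVE")) {
            return "wav";
        }
        // MPEG audio frame header: 11 sync bits set, layer bits 01 = Layer III
        if (head.length >= 2
                && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xE0) == 0xE0
                && (head[1] & 0x06) == 0x02) {
            return "mp3";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, int offset, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        if (data.length < offset + m.length) {
            return false;
        }
        for (int i = 0; i < m.length; i++) {
            if (data[offset + i] != m[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A signature check like this is a heuristic; for anything beyond a quick filter, a library that actually parses the format is the safer choice.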

Java How do I read and write an internal properties file?

I have a file I'm using to hold system information that my program needs on execution.
The program will read from it and write to it periodically. How do I do this? Among other problems, I'm having trouble with paths.
Example
How do I read from and write to this properties file if I deploy the application as a runnable JAR?
Take a look at the java.util.Properties class: http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html
You can use this class to work with the key=value pairs in your property/config file.
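As a minimal sketch of that, assuming the settings live in an ordinary file on disk (not inside the JAR), loading and saving could look like this. The class name `SettingsFile` and the header comment text are invented for the example; `Properties.load(Reader)` and `Properties.store(Writer, String)` are the standard JDK calls.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: reads and writes key=value settings in a plain file.
public class SettingsFile {

    /** Loads the file if it exists; otherwise returns an empty Properties object. */
    public static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                props.load(in);
            }
        }
        return props;
    }

    /** Writes all entries back to the file, replacing its previous content. */
    public static void save(Properties props, Path file) throws IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(out, "application settings");
        }
    }
}
```

Typical use would be to call `load`, change values with `setProperty`, and call `save` again whenever the program needs to persist its state.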
As for the second part of your question, how to build a runnable JAR: I'd do that with Maven. Take a look at this:
How can I create an executable JAR with dependencies using Maven?
and this:
http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html
I see you're not using Maven to build your project at all.
You can't write to a file that exists as part of a ZIP file... it does not exist as a file on the filesystem.
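A common way around this, sketched here under assumptions: ship read-only defaults inside the JAR and keep the values the program changes in a file outside it. The class name `LayeredSettings`, the resource name, and the external path are all hypothetical. The sketch relies on the `Properties(Properties defaults)` constructor, where lookups fall back to the defaults when a key is missing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: bundled defaults from the classpath plus a writable external file.
public class LayeredSettings {

    /**
     * bundledResource is a classpath name such as "/defaults.properties" (read-only,
     * possibly inside the JAR); externalFile is a normal file the program may write.
     */
    public static Properties load(String bundledResource, Path externalFile) throws IOException {
        Properties defaults = new Properties();
        try (InputStream in = LayeredSettings.class.getResourceAsStream(bundledResource)) {
            if (in != null) { // the resource may be absent
                defaults.load(in);
            }
        }
        Properties settings = new Properties(defaults);
        if (Files.exists(externalFile)) {
            try (InputStream in = Files.newInputStream(externalFile)) {
                settings.load(in); // external values override the bundled defaults
            }
        }
        return settings;
    }
}
```

The external file could live somewhere like `Paths.get(System.getProperty("user.home"), ".myapp", "settings.properties")` (an invented location). Note that `store` on the returned object writes only the overriding entries, not the defaults.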
Have you considered the Preferences API?
To read from a file, you can create a Scanner over it:
Scanner diskReader = new Scanner(new File("myProp.properties"));
Then, for example, to read a boolean value from the properties file, use:
boolean Example = diskReader.nextBoolean();
If you want to write to a file, it's a bit more complicated, but this is how I do it:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;
import java.util.Scanner;

public class UpdateAFile {
    static Random random = new Random();
    static int numberValue = random.nextInt(100);

    public static void main(String[] args) {
        File file = new File("myFile.txt");
        BufferedWriter writer = null;
        Scanner diskScanner = null;
        try {
            // second argument "true" opens the file in append mode
            writer = new BufferedWriter(new FileWriter(file, true));
        } catch (IOException e) {
            e.printStackTrace();
        }
        try {
            diskScanner = new Scanner(file);
        } catch (FileNotFoundException e1) {
            e1.printStackTrace();
        }
        appendTo(writer, Integer.valueOf(numberValue).toString());
        int otherValue = diskScanner.nextInt();
        appendTo(writer, Integer.valueOf(otherValue + 10).toString());
        int yetAnotherValue = diskScanner.nextInt();
        appendTo(writer, Integer.valueOf(yetAnotherValue * 10).toString());
        try {
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    static void appendTo(BufferedWriter writer, String string) {
        try {
            writer.write(string);
            writer.newLine();
            writer.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
And then write to the file with:
writer.write("BlahBlahBlah");

java database backup and restore

How do I back up and restore any kind of database to flat files from inside my Java application? Are there any tools or frameworks available to back up a database to a flat file such as CSV, XML, or a secure encrypted file, and to restore from CSV or XML files into a database? It should also be capable of table-wise backup and restore.
There are many ways to do this. It really depends on how complicated your "database" is.
The simplest solution is to write to a text file in CSV format:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintWriter;

public class FileOutput {
    public static void main(String[] args) {
        File file = new File("C:\\MyFile.csv");
        // try-with-resources flushes and closes the writer and the stream under it
        try (PrintWriter output = new PrintWriter(new FileOutputStream(file))) {
            output.println("Column A, Column B, Column C");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
Or, if you're looking for an XML solution, you can play with the Xerces API. I think a version of it is included in the JDK behind the javax.xml packages, so you just have to import those packages.
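For the CSV route, real table data needs a little more care than a fixed header line, because values may themselves contain commas, quotes, or line breaks. The following is a small sketch of a row writer that applies the usual CSV quoting convention (wrap the field in double quotes and double any embedded quotes). The class name `CsvRows` is invented, and writing nulls as empty fields is an assumption you may want to change; fetching the rows from the database is left out.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical helper: writes one CSV record per call, quoting fields when needed.
public class CsvRows {

    /** Quotes a field if it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field == null) {
            return ""; // assumption: nulls become empty fields
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Writes the fields as one comma-separated line terminated by CRLF. */
    public static void writeRow(Writer out, List<String> fields) throws IOException {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(escape(fields.get(i)));
        }
        line.append("\r\n");
        out.write(line.toString());
    }
}
```

A backup loop would call `writeRow` once for the column names and once per row; restoring needs the matching parser, which a CSV library would handle more reliably than hand-written splitting.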
