Converting huge amount of small PDF files to PNG - java

I need to convert over 500k PDF files to PNG (at a sufficient density) so I can process them later (reading a QR code embedded in each PDF). The files normally don't exceed 200 KB.
I tried using magick to convert them (first with convert, then with mogrify), but figured it would take days to finish and that doing it with threads might be better. So I implemented a little app in Java that creates n threads and runs, in the Windows shell (Runtime.getRuntime().exec()), a command built from the file name, the target and so on.
The problem is that it kills my PC. Apparently magick uses multiple threads to process each image, and since some of those are taken by the app I wrote, it takes longer and uses resources that the JVM wouldn't normally take. Here is my code:
public class SuperPdfToPngConverter {
public static void main(String[] args) {
// TODO Auto-generated method stub
System.out.println("Setting Up Environment...");
System.out.println("Reading existing files...");
// To change according to our needs
String targetFolder = "D:\\Digest";
// Execution parameters
ArrayList<String> myList = getTodoList(targetFolder);
ArrayList<String> rejected = new ArrayList<String>();
System.out.println("I've found " + myList.size() + " documents pending to be converted.");
System.out.println(" Treating files... ");
int count = 1;
ExecutorService executor = Executors.newFixedThreadPool(16);
for (String fileId : myList) {
System.out.println("Queueing file " + count + ", " + fileId);
Runnable worker = new WorkerThread(fileId, targetFolder, true, 150);
executor.execute(worker);
//rejected.add(fileId);
count++;
}
executor.shutdown();
while (!executor.isTerminated()) {
}
System.out.println("Finished");
System.out.println(
"Treated " + (count - 1) + " documents; ");
}
And the working part on each thread goes like this:
@Override
public void run() {
System.out.println(Thread.currentThread().getName()+" Start. File = "+ fileName);
processCommand();
System.out.println(Thread.currentThread().getName()+" End.");
}
private void processCommand() {
String fileNamePng = fileName.replace(".pdf", ".png");
String cmd = "magick convert -density " + density + "x" + density + " " + fileName + " " + fileNamePng;
System.out.println(cmd);
try {
Runtime.getRuntime().exec(cmd, null, new File(targetPath));
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("Document " + fileName + " processed.");
}
I was wondering what I can do to make this run a little faster and more stably. I don't care much about how long it takes (as long as it's reasonable).
Do you think my approach is good? Is it better if I use a library rather than a tool like magick?
Thank you for your insight.
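One thing worth noting about the code above: Runtime.getRuntime().exec() returns as soon as the process has been started, so each worker ends almost immediately and the 16-thread pool does not actually limit how many magick processes run at the same time. Below is a minimal sketch of one way to bound that, assuming each worker should block until its own magick process exits. The class and method names (BoundedPdfConverter, buildCommand, convert, convertAll) are made up for illustration, and the number of workers is something to tune, not a recommendation.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BoundedPdfConverter {

    // Same command as in the question, but as an argument list (no string tokenizing).
    static List<String> buildCommand(String pdfName, int density) {
        String pngName = pdfName.replace(".pdf", ".png");
        return List.of("magick", "convert", "-density", density + "x" + density, pdfName, pngName);
    }

    // Starts one magick process and blocks until it exits; returns its exit code.
    static int convert(File workDir, String pdfName, int density)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(pdfName, density))
                .directory(workDir)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // avoid blocking on unread output
                .start();
        return p.waitFor();
    }

    // At most `workers` magick processes exist at any time, because each
    // pool thread waits for its process before taking the next file.
    static void convertAll(File workDir, List<String> pdfNames, int density, int workers)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (String name : pdfNames) {
            pool.execute(() -> {
                try {
                    int exit = convert(workDir, name, density);
                    if (exit != 0) {
                        System.err.println(name + " failed with exit code " + exit);
                    }
                } catch (IOException e) {
                    System.err.println(name + ": " + e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        // Blocks without spinning, unlike an empty while (!isTerminated()) loop.
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    }
}
```

Since magick may itself use several threads per image, a worker count well below the number of cores is probably a sensible starting point.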

Related

Shutting a Java application programmatically and restarting it again with different VM Arguments

I have a Java application (say X) whose performance (time taken to run the application) has to be logged in a text file along with the specified VM Arguments. Performance of my application will vary as we change the maximum Heap Size.
So, my requirement is to start the application X from another program (Java, Python or a shell script) with one set of VM arguments (say -Xmx50M), perform the operations, log the time, shut it down, and then perform the same set of operations with a different VM argument. I have to do this multiple times for multiple VM arguments.
I am not sure how I can do this.
I have read various threads here and blogs but couldn't find anything which could let me shut and then restart the application with different set of VM arguments.
I have also tried using shutdown hooks but that didn't help. I guess I am doing something wrong in the usage.
public static void restartApplication() throws IOException {
try {
String java = System.getProperty("java.home") + "/bin/java";
List<String> vmArguments = ManagementFactory.getRuntimeMXBean()
.getInputArguments();
StringBuffer vmArgsOneLine = new StringBuffer();
for (String arg : vmArguments) {
if (!arg.contains("-agentlib")) {
vmArgsOneLine.append(arg);
vmArgsOneLine.append(" ");
}
}
final StringBuffer cmd = new StringBuffer("\"" + java + "\" "
+ vmArgsOneLine);
String[] mainCommand = System.getProperty("sun.java.command")
.split(" ");
if (mainCommand[0].endsWith(".jar")) {
cmd.append("-jar " + new File(mainCommand[0]).getPath());
} else {
cmd.append("-cp \"" + System.getProperty("java.class.path")
+ "\" " + mainCommand[0]);
}
for (int i = 1; i < mainCommand.length; i++) {
cmd.append(" ");
cmd.append(mainCommand[i]);
}
Runtime.getRuntime().addShutdownHook(new Thread() {
@Override
public void run() {
try {
Runtime.getRuntime().exec(cmd.toString());
} catch (IOException e) {
e.printStackTrace();
}
}
});
System.exit(0);
} catch (Exception e) {
throw new IOException(
"Error while trying to restart the application", e);
}
}
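Since the requirement is to run X from another program, one alternative to restarting from inside X is a small launcher that starts a fresh child JVM per setting and waits for it to finish. The following is only a sketch under that assumption: HeapSizeBenchmark, buildCommand and timeRun are made-up names, "X" stands in for the real main class, and the measured time includes JVM startup.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class HeapSizeBenchmark {

    // Builds: <java.home>/bin/java <vmArg> -cp <classPath> <mainClass>
    static List<String> buildCommand(String vmArg, String classPath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
        cmd.add(vmArg);
        cmd.add("-cp");
        cmd.add(classPath);
        cmd.add(mainClass);
        return cmd;
    }

    // Runs one child JVM to completion and returns the elapsed wall-clock time in ms.
    static long timeRun(List<String> cmd) throws IOException, InterruptedException {
        long start = System.nanoTime();
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        p.waitFor();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        String classPath = System.getProperty("java.class.path");
        String mainClass = args.length > 0 ? args[0] : "X"; // hypothetical main class
        for (String heap : new String[] {"-Xmx50M", "-Xmx100M", "-Xmx200M"}) {
            long ms = timeRun(buildCommand(heap, classPath, mainClass));
            System.out.println(heap + " took " + ms + " ms");
        }
    }
}
```

With this shape no shutdown hook is needed: each run ends when the child JVM exits, and the launcher decides what to start next.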

Downloading long files freezes the computer

I have made a launcher for my game Privateers. It works perfectly, downloads everything needed - But, for some reason, it freezes the entire computer!
The freeze usually occurs when I am doing something else on the computer; if I stay away from the computer until the download completes, nothing happens. However, when I tested it on my mother's computer while she was playing the game "World Of Tanks", the computer froze almost immediately. If I play a game, the launcher also tends to freeze my computer.
I use Windows 8, my mother uses Windows 7.
On my own computer, when this happens I am able to move the mouse very slowly (with a delay of between 30 seconds and 2 minutes), alt+tab won't work, and control+alt+delete will work (but the task manager does not appear when I open it).
On my mother's computer it is basically the same, except that EVERYTHING is frozen 100% apart from the mouse, which works fine.
It only happens when downloading large (5MB+) files. When my launcher downloads smaller files there is no issue.
I use the following code to download files:
void download(String source, String destination, int size) {
File ofile = new File(System.getProperty("user.dir") + "", destination);
System.out.printf("\nDownloading\n\t%s\nTo\n\t%s\n", source, destination);
try {
if (ofile.exists()) ofile.delete();
if (!ofile.createNewFile()) {
throw new IOException("Can't create " + ofile.getAbsolutePath());
}
int inChar = 0;
URL url = new URL(source);
InputStream input = url.openStream();
FileOutputStream fos = new FileOutputStream(ofile);
for (int i = 0; i < size && inChar != -1; i++) {
int percentage = (int) ((i * 100.0f) / size);
progressBar.setValue(((int) ((percentage * 100.0f) / 100)));
fr.setTitle(ofile.getName() + ": " + progressBar.getValue() + "%" + " Total: " + oprogressBar.getValue() + "%");
inChar = input.read();
fos.write(inChar);
}
input.close();
fos.close();
System.out.println("Downloaded " + ofile.getAbsolutePath());
} catch (EOFException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
I have been unable to find a duplicate of this happening when searching on the internet. Any help is appreciated.
Maybe multithreading will help you out over here.
See more about it in this post
Buffer the input stream, or read more than one character at a time, or both.
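A minimal sketch of the "read more than one character at a time" suggestion, assuming the download loop can be reduced to a plain stream copy. BufferedCopy and the 8 KiB buffer size are illustrative choices, not something from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class BufferedCopy {

    // Copies everything from in to out in 8 KiB chunks; returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes actually read
            total += n;
        }
        return total;
    }
}
```

With a loop like this, the progress bar and window title could be updated once per chunk instead of once per byte, which removes millions of UI updates for a 5 MB file.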

java saved file disappears (ObjectOutputStream)

Well, this is a strange one.
The first save attempt usually works (one more try at most).
But under heavy load (many saves in a row) the saved file disappears.
If I uncomment the "Thread.sleep", the error is caught; otherwise the validation passes successfully.
public void save(Object key, T objToSave) throws FileNotFoundException, IOException {
IOException ex = null;
for (int i = 0; i < NUM_OF_RETRIES; i++) {
try {
/* saving */
String filePath = getFilePath(key);
OutputStream outStream = getOutStream(filePath);
ObjectOutputStream os = new ObjectOutputStream(outStream);
os.writeObject(objToSave);
os.close();
/* validations warnings etc. */
if (i>0){
logger.warn(objToSave + " saved on attamped " + i);
/* sleep more on each fail */
Thread.sleep(100+i*8);
}
//Thread.sleep(50);
File doneFile = new File(filePath);
if (! (doneFile.exists())){
logger.error("got here but file was not witten to disk ! id was" + key);
throw new IOException();
}
logger.info("6. start persist " + key + " path=" + new File(filePath).getAbsolutePath() + " exists="+doneFile.exists());
return;
} catch (IOException e) {
logger.error(objToSave + " failed on attamped " + i);
ex = e;
} catch (InterruptedException e) {
e.printStackTrace();
}
}
throw ex;
}
It is not a Java writer issue.
I was not using threads explicitly, but in my test I was deleting the folder I was saving to using: Runtime.getRuntime().exec("rm -rf saver_directory");
I found out the hard way that it is asynchronous, and the exact timing of the delete relative to the create varied by milliseconds.
So the solution was adding a "sleep" after the delete.
The correct answer would be to do the delete in Java and not take shortcuts ;)
Thank you all.
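For reference, a minimal sketch of doing the delete in Java, so that it has finished before the next save starts. DirectoryCleaner is a made-up name, and the sketch assumes nothing else is writing into the directory while it is being deleted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DirectoryCleaner {

    // Deletes root and everything below it; returns only after the delete is done.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```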

Issues with running Runtime.getRuntime().exec

I'm using process = Runtime.getRuntime().exec(cmd,null,new File(path));
to execute some SQL in file (abz.sql)
Command is:
"sqlplus "+ context.getDatabaseUser() + "/"
+ context.getDatabasePassword() + "@"
+ context.getDatabaseHost() + ":"
+ context.getDatabasePort() + "/"
+ context.getSid() + " @"
+ "\""
+ script + "\"";
String path=context.getReleasePath()+ "/Server/DB Scripts";
It executes the file, but the process never exits. Hence I tried using:
Writer out = new OutputStreamWriter(process.getOutputStream());
out.append("commit;\r\n");
out.append("exit \r\n");
System.out.println("---------"+out);
out.close();
This is the complete block that I'm using:
if("ORACLE".equals(context.getConnectionField()))
{
String cmd=
"sqlplus "+ context.getDatabaseUser() + "/"
+ context.getDatabasePassword() + "@"
+ context.getDatabaseHost() + ":"
+ context.getDatabasePort() + "/"
+ context.getSid() + " @"
+ "\""
+ script +"\"";
String path=context.getReleasePath()+ "/Server/DB Scripts";
process = Runtime.getRuntime().exec(cmd,null,new File(path));
out = new OutputStreamWriter(process.getOutputStream());
out.append("commit;\r\n");
out.append("exit \r\n");
System.out.println("---------"+out);
out.close();
Integer result1 = null;
while (result1 == null) {
try {
result1 = process.waitFor();
}
catch (InterruptedException e) {}
}
if(process.exitValue() != 0)
return false;
return true;
}
The code shown fails to read the error stream of the Process. That might be blocking progress. ProcessBuilder was introduced in Java 1.5 and has a handy method to redirectErrorStream() - so that it is only necessary to consume a single stream.
For more general tips, read & implement all the recommendations of When Runtime.exec() won't.
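A minimal sketch of the ProcessBuilder approach, assuming the goal is simply to merge stderr into stdout and drain that single stream so the child can never block on a full pipe. MergedOutputRunner, mergedBuilder and run are made-up names, and this sketch sends no input to the child.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;

final class MergedOutputRunner {

    // A builder whose process writes stderr into the same stream as stdout.
    static ProcessBuilder mergedBuilder(List<String> command) {
        return new ProcessBuilder(command).redirectErrorStream(true);
    }

    // Starts the command, drains its combined output, then returns the exit code.
    static int run(List<String> command, StringBuilder output)
            throws IOException, InterruptedException {
        Process process = mergedBuilder(command).start();
        process.getOutputStream().close(); // nothing to send; the child sees end-of-input
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        return process.waitFor();
    }
}
```

Reading until end of stream before calling waitFor() matters here: waiting first can deadlock if the child produces more output than the pipe buffer holds.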
I can see a few issues here. The version of 'exec' that you are using will tokenize the command string using StringTokenizer, so unusual characters in the password (like spaces) or the other parameters being substituted are accidents waiting to happen. I recommend switching to the version
Process exec(String[] cmdarray,
String[] envp,
File dir)
throws IOException
It is a bit more work to use but much more robust.
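As a rough illustration of the array form, assuming a connect string of the shape user/password@host:port/sid followed by @script. SqlPlusCommand and its methods are made-up names; whether sqlplus itself needs extra quoting for a script path containing spaces is outside what this sketch covers.

```java
import java.io.File;
import java.io.IOException;

final class SqlPlusCommand {

    // One array element per argument, so a space inside a value does not split it.
    static String[] build(String user, String password, String host,
                          int port, String sid, String script) {
        return new String[] {
            "sqlplus",
            user + "/" + password + "@" + host + ":" + port + "/" + sid,
            "@" + script
        };
    }

    // Same exec overload as quoted above: command array, inherited environment, working directory.
    static Process start(String[] cmdarray, File workingDir) throws IOException {
        return Runtime.getRuntime().exec(cmdarray, null, workingDir);
    }
}
```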
The second issue is that there are all kinds of caveats about whether or not exec will run concurrently with the Java process (see http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Process.html). So you need to say which operating system you're on. If it does not run concurrently, then your strategy of writing to the output stream cannot work!
The last bit of the program is written rather obscurely. I suggest ...
for (;;) {
try {
process.waitFor();
return process.exitValue() == 0;
} catch ( InterruptedException ignored ) {
System.out.println( "INTERRUPTED!" ); // Debug only.
}
}
This eliminates the superfluous variable result1, eliminates the superfluous boxing and highlights a possible cause of endless looping.
Hope this helps & good luck!

Thread interrupt not ending blocking call on input stream read

I'm using RXTX to read data from a serial port. The reading is done within a thread spawned in the following manner:
CommPortIdentifier portIdentifier = CommPortIdentifier.getPortIdentifier(port);
CommPort comm = portIdentifier.open("Whatever", 2000);
SerialPort serial = (SerialPort)comm;
...settings
Thread t = new Thread(new SerialReader(serial.getInputStream()));
t.start();
The SerialReader class implements Runnable and just loops indefinitely, reading from the port and constructing the data into useful packages before sending it off to other applications. However, I've reduced it down to the following simplicity:
public void run() {
ReadableByteChannel byteChan = Channels.newChannel(in); //in = InputStream passed to SerialReader
ByteBuffer buffer = ByteBuffer.allocate(100);
while (true) {
try {
byteChan.read(buffer);
} catch (Exception e) {
System.out.println(e);
}
}
}
When a user clicks a stop button, the following functionality fires that should in theory close the input stream and break out of the blocking byteChan.read(buffer) call. The code is as follows:
public void stop() {
t.interrupt();
serial.close();
}
However, when I run this code, I never get a ClosedByInterruptException, which SHOULD fire once the input stream closes. Furthermore, the execution blocks on the call to serial.close() -- because the underlying input stream is still blocking on the read call. I've tried replacing the interrupt call with byteChan.close(), which should then cause an AsynchronousCloseException, however, I'm getting the same results.
Any help on what I'm missing would be greatly appreciated.
You can't make a stream that doesn't support interruptible I/O into an InterruptibleChannel simply by wrapping it (and, anyway, ReadableByteChannel doesn't extend InterruptibleChannel).
You have to look at the contract of the underlying InputStream. What does SerialPort.getInputStream() say about the interruptibility of its result? If it doesn't say anything, you should assume that it ignores interrupts.
For any I/O that doesn't explicitly support interruptibility, the only option is generally closing the stream from another thread. This may immediately raise an IOException (though it might not be an AsynchronousCloseException) in the thread blocked on a call to the stream.
However, even this is extremely dependent on the implementation of the InputStream—and the underlying OS can be a factor too.
Note the source code comment on the ReadableByteChannelImpl class returned by newChannel():
private static class ReadableByteChannelImpl
extends AbstractInterruptibleChannel // Not really interruptible
implements ReadableByteChannel
{
InputStream in;
⋮
The RXTX SerialInputStream (what is returned by the serial.getInputStream() call) supports a timeout scheme that ended up solving all my problems. Adding the following before creating the new SerialReader object causes the reads to no longer block indefinitely:
serial.enableReceiveTimeout(1000);
Within the SerialReader object, I had to change a few things around to read directly from the InputStream instead of creating the ReadableByteChannel, but now, I can stop and restart the reader without issue.
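A rough sketch of what such a reader loop could look like, written against a plain InputStream so it does not depend on RXTX. StoppableReader is a made-up class, and the sketch assumes that a timed-out read returns with zero bytes and that a negative count means end of stream; the actual RXTX return value on timeout should be checked before relying on this.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class StoppableReader implements Runnable {
    private final InputStream in;
    private final ByteArrayOutputStream received = new ByteArrayOutputStream();
    private volatile boolean running = true;

    StoppableReader(InputStream in) {
        this.in = in;
    }

    // Called from another thread; the loop notices it after the current read returns.
    void stop() {
        running = false;
    }

    byte[] received() {
        return received.toByteArray();
    }

    @Override
    public void run() {
        byte[] buffer = new byte[100];
        try {
            while (running) {
                int n = in.read(buffer); // assumed to return once the receive timeout expires
                if (n > 0) {
                    received.write(buffer, 0, n);
                } else if (n < 0) {
                    break; // end of stream (assumption, see above)
                }
                // n == 0: nothing arrived this round, so re-check the flag
            }
        } catch (IOException e) {
            System.err.println("Read failed: " + e);
        }
    }
}
```

Because each read is bounded by the timeout, stop() followed by a join on the reader thread should complete within roughly one timeout period, after which the port can be closed.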
I am using the code below to shut down RXTX. I run tests that start the readers up and shut them down, and this seems to work OK. My reader looks like:
private void addPartsToQueue(final InputStream inputStream) {
byte[] buffer = new byte[1024];
int len = -1;
boolean first = true;
// the read can throw
try {
while ((len = inputStream.read(buffer)) > -1) {
if (len > 0) {
if (first) {
first = false;
t0 = System.currentTimeMillis();
} else
t1 = System.currentTimeMillis();
final String part = new String(new String(buffer, 0, len));
queue.add(part);
//System.out.println(part + " " + (t1 - t0));
}
try {
Thread.sleep(sleep);
} catch (InterruptedException e) {
//System.out.println(Thread.currentThread().getName() + " interrupted " + e);
break;
}
}
} catch (IOException e) {
System.err.println(Thread.currentThread().getName() + " " + e);
//if(interruSystem.err.println(e);
e.printStackTrace();
}
//System.out.println(Thread.currentThread().getName() + " is ending.");
}
thanks
public void shutdown(final Device device) {
shutdown(serialReaderThread);
shutdown(messageAssemblerThread);
serialPort.close();
if (device != null)
device.setSerialPort(null);
}
public static void shutdown(final Thread thread) {
if (thread != null) {
//System.out.println("before intterupt() on thread " + thread.getName() + ", it's state is " + thread.getState());
thread.interrupt();
//System.out.println("after intterupt() on thread " + thread.getName() + ", it's state is " + thread.getState());
try {
Thread.sleep(100);
} catch (InterruptedException e) {
System.out.println(Thread.currentThread().getName() + " was interrupted trying to sleep after interrupting" + thread.getName() + " " + e);
}
//System.out.println("before join() on thread " + thread.getName() + ", it's state is " + thread.getState());
try {
thread.join();
} catch (InterruptedException e) {
System.out.println(Thread.currentThread().getName() + " join interruped");
}
//System.out.println(Thread.currentThread().getName() + " after join() on thread " + thread.getName() + ", it's state is" + thread.getState());
}
}
