I am writing a small program to retrieve a large number of XML files. The program sort of works, but no matter which solution from Stack Overflow I use, every XML file I save locally is missing the end of the file. By "the end of the file" I mean approximately 5-10 lines of XML code. The files are of different lengths (~500-2500 lines), and the total length doesn't seem to have an effect on the size of the missing bit. Currently the code looks like this:
package plos;

import static org.apache.commons.io.FileUtils.copyURLToFile;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.logging.Level;
import java.util.logging.Logger;

public class PlosXMLfetcher {
    public PlosXMLfetcher(URL u, File f) {
        try {
            copyURLToFile(u, f);
        } catch (IOException ex) {
            Logger.getLogger(PlosXMLfetcher.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
I have tried using BufferedInputStream and ReadableByteChannel as well. I have tried running it in threads, and I have tried using read and readLine. Every solution gives me an incomplete XML file in return.
In some of my tests (I can't remember which, sorry), I got a socket connection reset error - but the above code executes without error messages.
I have manually downloaded some of the XML files as well, to check if they are actually complete on the remote server - which they are.
I'm guessing that somewhere along the way a BufferedWriter or BufferedOutputStream has not had flush() called on it.
Why not write your own copy function to rule out FileUtils.copyURLToFile(u, f)?
public void copyURLToFile(URL u, File f) throws IOException {
    InputStream in = u.openStream();
    try {
        FileOutputStream out = new FileOutputStream(f);
        try {
            byte[] buffer = new byte[1024];
            int count;
            while ((count = in.read(buffer)) > 0) {
                out.write(buffer, 0, count);
            }
            out.flush();
        } finally {
            out.close();
        }
    } finally {
        in.close();
    }
}
I'm looking for a foolproof way to generate a temporary file that will always end up with a unique name on a per-JVM basis. Basically, I want to be sure that in a multithreaded application, if two or more threads attempt to create a temporary file at exactly the same moment, they will each end up with a unique temporary file and no exceptions will be thrown.
This is the method I have currently:
public File createTempFile(InputStream inputStream) throws FileUtilsException {
File tempFile = null;
OutputStream outputStream = null;
try {
tempFile = File.createTempFile("app", ".tmp");
tempFile.deleteOnExit();
outputStream = new FileOutputStream(tempFile);
IOUtils.copy(inputStream, outputStream);
} catch (IOException e) {
logger.debug("Unable to create temp file", e);
throw new FileUtilsException(e);
} finally {
try { if (outputStream != null) outputStream.close(); } catch (Exception e) {}
try { if (inputStream != null) inputStream.close(); } catch (Exception e) {}
}
return tempFile;
}
Is this perfectly safe for my goal? I reviewed the documentation at the URL below, but I'm not sure.
See java.io.File#createTempFile
The answer posted at the URL below answers my question. The method I posted is safe in a multithreaded, single-JVM-process environment. To make it safe in a multithreaded, multi-JVM-process environment (e.g. a clustered web app), you can use Chris Cooper's idea, which involves passing a unique value as the prefix argument of File.createTempFile within each JVM process.
Is createTempFile thread-safe?
Just use the thread name and current time in millis to name the file.
You can supply a different prefix or suffix to the temporary files for this exact reason.
Assign a unique ID to each process at startup and use that unique ID as the prefix or suffix: multiple threads in the same JVM will not clash, and now separate JVMs will not clash either.
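A rough sketch of that idea follows; the class name and the prefix built from the runtime MXBean name are just illustrations, and any value that is unique per JVM process would do as well.
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class PerJvmTempFiles {
    // Typically "pid@hostname", which is unique per JVM process on a machine.
    private static final String JVM_ID = ManagementFactory.getRuntimeMXBean().getName();

    public static File createUniqueTempFile() throws IOException {
        // createTempFile already guarantees uniqueness across threads in one JVM;
        // the per-JVM prefix keeps separate JVM processes from clashing as well.
        File tempFile = File.createTempFile("app-" + JVM_ID + "-", ".tmp");
        tempFile.deleteOnExit();
        return tempFile;
    }
}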
Here is how I compressed the string into a file:
public static void compressRawText(File outFile, String src) {
FileOutputStream fo = null;
GZIPOutputStream gz = null;
try {
fo = new FileOutputStream(outFile);
gz = new GZIPOutputStream(fo);
gz.write(src.getBytes());
gz.flush();
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
gz.close();
fo.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Here is how I decompressed it:
static int BUFFER_SIZE = 8 * 1024;
static int STRING_SIZE = 2 * 1024 * 1024;
public static String decompressRawText(File inFile) {
InputStream in = null;
InputStreamReader isr = null;
StringBuilder sb = new StringBuilder(STRING_SIZE);//constant resizing is costly, so set the STRING_SIZE
try {
in = new FileInputStream(inFile);
in = new BufferedInputStream(in, BUFFER_SIZE);
in = new GZIPInputStream(in, BUFFER_SIZE);
isr = new InputStreamReader(in);
char[] cbuf = new char[BUFFER_SIZE];
int length = 0;
while ((length = isr.read(cbuf)) != -1) {
sb.append(cbuf, 0, length);
}
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
in.close();
} catch (Exception e1) {
e1.printStackTrace();
}
}
return sb.toString();
}
The decompression seems to take forever. I have a feeling that I am doing too many redundant steps in the decompression bit. Any idea how I could speed it up?
EDIT: I have modified the code to the above based on the recommendations given:
1. I changed the pattern to simplify my code a bit, but if I can't use IOUtils, is it still OK to use this pattern?
2. I set the StringBuilder buffer to 2M, as suggested by entonio. Should I set it a little higher? Memory is still OK; I still have around 10M available according to the heap monitor in Eclipse.
3. I cut the BufferedReader and added a BufferedInputStream, but I am still not sure about the BUFFER_SIZE. Any suggestions?
The above modification has improved the time taken to loop over all 30 of my 2M files from almost 30 seconds to around 14, but I need to get it under 10. Is that even possible on Android? Basically, I need to process 60M of text in total, which I have divided into 30 files of 2M each; the timing above covers only looping over the files and getting each file's String into memory, before any processing of the strings starts. Since I don't have much experience, would it be better to use 60 files of 1M instead? Is there any other improvement I should adopt? Thanks.
ALSO: Since physical IO is quite time-consuming, and since my compressed versions of the files are all quite small (around 2K from 2M of text), is it possible for me to still do the above, but on a file that is already mapped to memory, possibly using Java NIO? Thanks.
The BufferedReader's only purpose is the readLine() method, which you don't use, so why not just read from the InputStreamReader? Also, decreasing the buffer size may be helpful. Also, you should probably specify the encoding when both reading and writing, though that shouldn't have an impact on performance.
edit: more data
If you know the size of the string ahead of time, you should add a length parameter to decompressRawText and use it to initialise the StringBuilder. Otherwise it will be constantly resized in order to accommodate the result, and that's costly.
edit: clarification
2MB implies a lot of resizes. There is no harm if you specify a capacity higher than the length you end up with after reading (other than temporarily using more memory, of course).
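Putting those suggestions together, here is a rough sketch of what the read side could look like. The expectedLength parameter and the explicit UTF-8 charset are assumptions for illustration; BUFFER_SIZE is the constant from the question, and the usual java.io/java.util.zip imports are assumed.
public static String decompressRawText(File inFile, int expectedLength) throws IOException {
    StringBuilder sb = new StringBuilder(expectedLength); // pre-size to avoid repeated resizing
    InputStream in = new FileInputStream(inFile);
    try {
        in = new BufferedInputStream(in, BUFFER_SIZE);
        in = new GZIPInputStream(in, BUFFER_SIZE);
        Reader reader = new InputStreamReader(in, "UTF-8"); // explicit encoding, no BufferedReader needed
        char[] cbuf = new char[BUFFER_SIZE];
        int length;
        while ((length = reader.read(cbuf)) != -1) {
            sb.append(cbuf, 0, length);
        }
    } finally {
        in.close(); // closes the whole decorator chain
    }
    return sb.toString();
}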
You should wrap the FileInputStream with a BufferedInputStream before wrapping with a GZipInputStream, rather than using a BufferedReader.
The reason is that, depending on implementation, any of the various input classes in your decoration hierarchy could decide to read on a byte-by-byte basis (and I'd say the InputStreamReader is most likely to do this). And that would translate into many read(2) calls once it gets to the FileInputStream.
Of course, this may just be superstition on my part. But, if you're running on Linux, you can always test with strace.
Edit: one nice pattern to follow when building up a chain of stream delegates is to use a single InputStream variable. Then, you only have one thing to close in your finally block (and can use Jakarta Commons IOUtils to avoid lots of nested try-catch-finally blocks).
InputStream in = null;
try
{
in = new FileInputStream("foo");
in = new BufferedInputStream(in);
in = new GZIPInputStream(in);
// do something with the stream
}
finally
{
IOUtils.closeQuietly(in);
}
Add a BufferedInputStream between the FileInputStream and the GZIPInputStream.
Similarly when writing.
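For the write side, a minimal sketch of the same decorator chain follows; it assumes the question's BUFFER_SIZE constant and the usual java.io/java.util.zip imports, and the explicit UTF-8 charset is an addition.
public static void compressRawText(File outFile, String src) throws IOException {
    OutputStream out = new FileOutputStream(outFile);
    try {
        out = new BufferedOutputStream(out, BUFFER_SIZE);
        out = new GZIPOutputStream(out, BUFFER_SIZE);
        out.write(src.getBytes("UTF-8"));
    } finally {
        out.close(); // closing the outermost stream writes the gzip trailer and closes the chain
    }
}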
I'm trying to delete a file, after writing something in it, with FileOutputStream. This is the code I use for writing:
private void writeContent(File file, String fileContent) {
FileOutputStream to;
try {
to = new FileOutputStream(file);
to.write(fileContent.getBytes());
to.flush();
to.close();
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
As you can see, I flush and close the stream, but when I try to delete, file.delete() returns false.
I checked before deletion to see if the file exists, and: file.exists(), file.canRead(), file.canWrite(), file.canExecute() all return true. Just after calling these methods I try file.delete() and returns false.
Is there anything I've done wrong?
Another bug in Java. I seldom find them; this is only my second in my 10-year career. This is my solution, as others have mentioned. I had never used System.gc() before. But here, in my case, it is absolutely crucial. Weird? YES!
finally
{
try
{
in.close();
in = null;
out.flush();
out.close();
out = null;
System.gc();
}
catch (IOException e)
{
logger.error(e.getMessage());
e.printStackTrace();
}
}
The trick that worked was pretty odd. The thing is, when I previously read the content of the file, I used a BufferedReader. After reading, I closed the buffer.
Meanwhile I switched, and now I'm reading the content using a FileInputStream. I also close the stream after I finish reading. And now it's working.
The problem is that I don't have an explanation for this.
I don't know BufferedReader and FileOutputStream to be incompatible.
I tried this simple thing and it seems to be working.
file.setWritable(true);
file.delete();
It works for me.
If this does not work, try running your Java application with sudo on Linux, or as administrator on Windows, just to make sure Java has the rights to change the file properties.
Before trying to delete/rename any file, you must ensure that all the readers or writers (for example: BufferedReader/InputStreamReader/BufferedWriter) are properly closed.
When you read/write data from/to a file, the file is held by the process and not released until the program execution completes. If you want to perform delete/rename operations before the program ends, then you must use the close() method that comes with the java.io.* classes.
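For instance, a minimal sketch of closing a reader before attempting the delete; the file name and charset are placeholders:
File file = new File("data.txt"); // placeholder path
BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));
try {
    String line;
    while ((line = reader.readLine()) != null) {
        // process each line here
    }
} finally {
    reader.close(); // release the handle; otherwise delete() may return false, especially on Windows
}
boolean deleted = file.delete(); // should succeed once nothing holds the file open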
As Jon Skeet commented, you should close your file in the finally {...} block, to ensure that it's always closed. And, instead of swallowing the exceptions with e.printStackTrace, simply don't catch them and add the exception to the method signature. If you can't for any reason, at least do this:
catch(IOException ex) {
throw new RuntimeException("Error processing file XYZ", ex);
}
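Put together, a rough sketch of the question's writeContent with the close moved into finally; the explicit UTF-8 charset is an assumption:
private void writeContent(File file, String fileContent) {
    FileOutputStream to = null;
    try {
        to = new FileOutputStream(file);
        to.write(fileContent.getBytes("UTF-8")); // explicit charset; adjust if needed
        to.flush();
    } catch (IOException ex) {
        throw new RuntimeException("Error writing " + file, ex);
    } finally {
        if (to != null) {
            try {
                to.close(); // always release the handle, even if the write failed
            } catch (IOException ignored) {
                // nothing useful to do if the close itself fails
            }
        }
    }
}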
Now, question #2:
What if you do this:
...
to.close();
System.out.println("Please delete the file and press <enter> afterwards!");
System.in.read();
...
Would you be able to delete the file?
Also, files are flushed when they're closed. But since I use IOUtils.closeQuietly(...), which doesn't throw exceptions, I call the flush method explicitly to ensure that the contents of the file are there before I try to close it. Something like this:
...
try {
...
to.flush();
} catch(IOException ex) {
throw new CannotProcessFileException("whatever", ex);
} finally {
IOUtils.closeQuietly(to);
}
So I know that the contents of the file are in there. Since what usually matters to me is that the contents were written, not whether the file could be closed, it really doesn't matter to me if the close fails. In your case, as it does matter, I would recommend closing the file yourself and treating any exceptions accordingly.
There is no reason you should not be able to delete this file. I would look to see who has a hold on it. On Unix/Linux, you can use the lsof utility to check which process has a lock on the file. On Windows, you can use Process Explorer.
For lsof, it's as simple as:
lsof /path/and/name/of/the/file
For Process Explorer, you can use the Find menu and enter the file name; it will show you the handle, which will point you to the process locking the file.
Here is some code that does what I think you need to do:
FileOutputStream to;
try {
String file = "/tmp/will_delete.txt";
to = new FileOutputStream(file );
to.write(new String("blah blah").getBytes());
to.flush();
to.close();
File f = new File(file);
System.out.print(f.delete());
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
It works fine on OS X. I haven't tested it on Windows, but I suspect it should work there too, although I will admit to having seen some unexpected behavior on Windows w.r.t. file handling.
If you are working in the Eclipse IDE, it could mean that you didn't close the file in a previous launch of the application. When I got the same error message while trying to delete a file, that was the reason. It seems the Eclipse IDE doesn't close all files after an application terminates.
Hopefully this will help. I came across a similar problem where I couldn't delete my file after my Java code copied its content to another folder. After extensive googling, I explicitly declared every file-operation-related variable, called the close() method of each file-operation object, and set them to null. Then there is a method called System.gc(), which will clear up the file I/O mapping (I'm not sure about that; I'm just repeating what the websites say).
Here is my example code:
public void start() {
File f = new File(this.archivePath + "\\" + this.currentFile.getName());
this.Copy(this.currentFile, f);
if(!this.currentFile.canWrite()){
System.out.println("Write protected file " +
this.currentFile.getAbsolutePath());
return;
}
boolean ok = this.currentFile.delete();
if(ok == false){
System.out.println("Failed to remove " + this.currentFile.getAbsolutePath());
return;
}
}
private void Copy(File source, File dest) {
FileInputStream fin;
FileOutputStream fout;
FileChannel cin = null, cout = null;
try {
fin = new FileInputStream(source);
cin = fin.getChannel();
fout = new FileOutputStream(dest);
cout = fout.getChannel();
long size = cin.size();
MappedByteBuffer buf = cin.map(FileChannel.MapMode.READ_ONLY, 0, size);
cout.write(buf);
buf.clear();
buf = null;
cin.close();
cin = null;
fin.close();
fin = null;
cout.close();
cout = null;
fout.close();
fout = null;
System.gc();
} catch (Exception e){
this.message = e.getMessage();
e.printStackTrace();
}
}
The answer is that when you load the file, you need to apply the close() method somewhere in your code. That works for me.
There was a problem once in Ruby where files on Windows needed an "fsync" to actually be able to turn around and re-read the file after writing and closing it. Maybe this is a similar manifestation (and if so, I think it's really a Windows bug).
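If you want to rule out something similar in Java, the closest analogue of an fsync is FileDescriptor.sync(); a minimal sketch, with placeholder file name and content:
File file = new File("example.txt"); // placeholder path
FileOutputStream out = new FileOutputStream(file);
try {
    out.write("some content".getBytes("UTF-8"));
    out.flush();        // flush the Java-side buffers
    out.getFD().sync(); // ask the OS to push its buffers to disk, like fsync
} finally {
    out.close();
}
boolean deleted = file.delete();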
None of the solutions listed here worked in my situation. My solution was to use a while loop, attempting to delete the file, with a 5 second (configurable) limit for safety.
File f = new File("/path/to/file");
int limit = 20; //Only try for 5 seconds, for safety
while(!f.delete() && limit > 0){
synchronized(this){
try {
this.wait(250); //Wait for 250 milliseconds
} catch (InterruptedException e) {
e.printStackTrace();
}
}
limit--;
}
Using the above loop worked without having to do any manual garbage collecting or setting the stream to null, etc.
The problem could be that the file is still seen as open and locked by a program; or maybe it was opened by a component of your own program, in which case you have to make sure you call that component's dispose() method to solve the problem.
i.e. JFrame frame;
....
frame.dispose();
You have to close all of the streams, or use a try-with-resources block:
static public String head(File file) throws FileNotFoundException, UnsupportedEncodingException, IOException
{
final String readLine;
try (FileInputStream fis = new FileInputStream(file);
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
LineNumberReader lnr = new LineNumberReader(isr))
{
readLine = lnr.readLine();
}
return readLine;
}
If file.delete() is returning false, then in most cases your BufferedReader handle has not been closed. Just close it; that seems to work normally for me.
I had the same problem on Windows. I used to read the file in Scala line by line with
Source.fromFile(path).getLines()
Now I read it as a whole with
import org.apache.commons.io.FileUtils._
// encoding is null for platform default
val content=readFileToString(new File(path),null.asInstanceOf[String])
which closes the file properly after reading and now
new File(path).delete
works.
For Eclipse/NetBeans:
Restart your IDE and run your code again. This is the only trick that worked for me after an hour-long struggle.
Here is my code:
File file = new File("file-path");
if(file.exists()){
if(file.delete()){
System.out.println("Delete");
}
else{
System.out.println("not delete");
}
}
Output:
Delete
Another corner case that this could happen: if you read/write a JAR file through a URL and later try to delete the same file within the same JVM session.
File f = new File("/tmp/foo.jar");
URL j = f.toURI().toURL();
URL u = new URL("jar:" + j + "!/META-INF/MANIFEST.MF");
URLConnection c = u.openConnection();
// open a Jar entry in auto-closing manner
try (InputStream i = c.getInputStream()) {
// just read some stuff; for demonstration purposes only
byte[] first16 = new byte[16];
i.read(first16);
System.out.println(new String(first16));
}
// ...
// i is now closed, so we should be good to delete the jar; but...
System.out.println(f.delete()); // says false!
The reason is that Java's internal JAR file handling logic tends to cache JarFile entries:
// inner class of `JarURLConnection` that wraps the actual stream returned by `getInputStream()`
class JarURLInputStream extends FilterInputStream {
JarURLInputStream(InputStream var2) {
super(var2);
}
public void close() throws IOException {
try {
super.close();
} finally {
// if `getUseCaches()` is set, `jarFile` won't get closed!
if (!JarURLConnection.this.getUseCaches()) {
JarURLConnection.this.jarFile.close();
}
}
}
}
And each JarFile (rather, the underlying ZipFile structure) would hold a handle to the file, right from the time of construction up until close() is invoked:
public ZipFile(File file, int mode, Charset charset) throws IOException {
// ...
jzfile = open(name, mode, file.lastModified(), usemmap);
// ...
}
// ...
private static native long open(String name, int mode, long lastModified,
boolean usemmap) throws IOException;
There's a good explanation on this NetBeans issue.
Apparently there are two ways to "fix" this:
You can disable the JAR file caching - for the current URLConnection, or for all future URLConnections (globally) in the current JVM session:
URL u = new URL("jar:" + j + "!/META-INF/MANIFEST.MF");
URLConnection c = u.openConnection();
// for only c
c.setUseCaches(false);
// globally; for some reason this method is not static,
// so we still need to access it through a URLConnection instance :(
c.setDefaultUseCaches(false);
[HACK WARNING!] You can manually purge the JarFile from the cache when you are done with it. The cache manager sun.net.www.protocol.jar.JarFileFactory is package-private, but some reflection magic can get the job done for you:
class JarBridge {
static void closeJar(URL url) throws Exception {
// JarFileFactory jarFactory = JarFileFactory.getInstance();
Class<?> jarFactoryClazz = Class.forName("sun.net.www.protocol.jar.JarFileFactory");
Method getInstance = jarFactoryClazz.getMethod("getInstance");
getInstance.setAccessible(true);
Object jarFactory = getInstance.invoke(jarFactoryClazz);
// JarFile jarFile = jarFactory.get(url);
Method get = jarFactoryClazz.getMethod("get", URL.class);
get.setAccessible(true);
Object jarFile = get.invoke(jarFactory, url);
// jarFactory.close(jarFile);
Method close = jarFactoryClazz.getMethod("close", JarFile.class);
close.setAccessible(true);
//noinspection JavaReflectionInvocation
close.invoke(jarFactory, jarFile);
// jarFile.close();
((JarFile) jarFile).close();
}
}
// and in your code:
// i is now closed, so we should be good to delete the jar
JarBridge.closeJar(j);
System.out.println(f.delete()); // says true, phew.
Please note: All this is based on Java 8 codebase (1.8.0_144); they may not work with other / later versions.
I have a Java Applet that I'm making some edits to and am running into performance issues. More specifically, the applet generates an image which I need to export to the client's machine.
This is really at the proof-of-concept stage, so bear with me. For right now, the image is exported to the client's machine at a pre-defined location (this will be replaced with a save dialog or something in the future). However, the process takes nearly 15 seconds for a 32kb file.
I've done some 'shoot-from-the-hip' profiling where I have printed messages to the console at logical intervals throughout the method in question. I've found, to my surprise, that the bottleneck appears to be the actual data stream writing process, not the JPEG encoding.
KEEP IN MIND THAT I ONLY HAVE A BASIC KNOWLEDGE OF JAVA AND ITS METHODS
So go slow :p - I'm mainly looking for suggestions to solve the problem rather than the solution itself.
Here is the block of code where the magic happens:
ByteArrayOutputStream jpegOutput = new ByteArrayOutputStream();
JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(jpegOutput);
encoder.encode(biFullView);
byte[] imageData = jpegOutput.toByteArray();
String myFile="C:" + File.separator + "tmpfile.jpg";
File f = new File(myFile);
try {
DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(myFile), 512));
dos.writeBytes(byteToString(imageData));
dos.flush();
dos.close();
}
catch (SecurityException ee) {
System.out.println("writeFile: caught security exception");
}
catch (IOException ioe) {
System.out.println("writeFile: caught i/o exception");
}
As I mentioned, using System.out.println() I've narrowed the performance bottleneck to the DataOutputStream block. Using a variety of machines with varying hardware specs seems to have little effect on the overall performance.
Any pointers/suggestions/direction would be much appreciated.
EDIT:
As requested, byteToString():
public String byteToString(byte[] data){
String text = new String();
for ( int i = 0; i < data.length; i++ ){
text += (char) ( data[i] & 0x00FF );
}
return text;
}
You might want to take a look at ImageIO.
And I think the reason for the performance problem is the string concatenation loop in byteToString. You never want to do concatenation in a loop. You could use the String(byte[]) constructor instead, but you don't really need to be turning the bytes into a string anyway.
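For example, a minimal sketch of writing the already-encoded bytes straight to disk, reusing the myFile and imageData names from the question's snippet (no byte-to-String conversion at all):
OutputStream out = new BufferedOutputStream(new FileOutputStream(myFile), 8192);
try {
    out.write(imageData); // write the raw JPEG bytes; no character conversion involved
    out.flush();
} finally {
    out.close();
}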
If you don't need the image data byte array, you can encode directly to the file:
String myFile="C:" + File.separator + "tmpfile.jpg";
File f = new File(myFile);
FileOutputStream fos = null;
try {
fos = new FileOutputStream(f);
JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(
new BufferedOutputStream(fos));
encoder.encode(biFullView);
}
catch (SecurityException ee) {
System.out.println("writeFile: caught security exception");
}
catch (IOException ioe) {
System.out.println("writeFile: caught i/o exception");
} finally {
    if (fos != null) {
        try {
            fos.close();
        } catch (IOException e) {
            // ignore: nothing useful to do if the close fails here
        }
    }
}
If you need the byte array to perform other operations it's better to write it directly to the FileOutputStream:
//...
fos = new FileOutputStream(myFile));
fos.write(imageData, 0, imageData.length);
//...
You could also use the standard ImageIO API (classes in the com.sun.image.codec.jpeg package are not part of the core Java APIs).
String myFile="C:" + File.separator + "tmpfile.jpg";
File f = new File(myFile);
ImageIO.write(biFullView, "jpeg", f);