I am trying to read a text file while running the program from a jar archive.
I came across advice that I need to use an InputStream to read the file. The snippet of code:
buffer = new BufferedInputStream(this.getClass().getResourceAsStream((getClass().getClassLoader().getResource("English_names.txt").getPath())));
System.out.println(buffer.read()+" yeas");
At the line System.out.println(buffer.read()+" yeas"); the program stops and nothing happens from then on. When I print the buffer object itself, it is not null.
What might be the problem?
From InputStream#read():
This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
So basically, the stream appears to be waiting on content. I'm guessing it's how you've constructed the stream; you can simplify the construction to:
InputStream resourceStream = getClass().getResourceAsStream("/English_names.txt");
InputStream buffer = new BufferedInputStream(resourceStream);
I'd also check to make sure that resourceStream is not null.
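For example, a guard along these lines (a minimal sketch; FileNotFoundException is just one reasonable choice of exception here) fails fast instead of blocking:

InputStream resourceStream = getClass().getResourceAsStream("/English_names.txt");
if (resourceStream == null) {
    // getResourceAsStream returns null when the resource cannot be found
    throw new FileNotFoundException("English_names.txt was not found on the classpath");
}
InputStream buffer = new BufferedInputStream(resourceStream);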
You do not need to worry about passing a null InputStream into the BufferedInputStream constructor: the constructor accepts a null argument without throwing (you would only get a NullPointerException later, when you try to read). What you should check is whether getResourceAsStream found the resource at all, since it returns null when it does not. Also, since InputStream implements AutoCloseable, a try-with-resources block will take care of closing your streams properly:
try (
final InputStream is = getClass().getResourceAsStream("/English_names.txt");
final BufferedInputStream bis = new BufferedInputStream(is);
) {
if (null == is)
    throw new IOException("requested resource was not found");
// Do your reading.
// Do note that if you are using InputStream.read() you may want to call it in a loop until it returns -1
} catch (IOException ex) {
// Either resource is not found or other I/O error occurred
}
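Following the comment above, a minimal sketch of that read loop (treating the bytes as plain ASCII text; for anything else, wrap the stream in a Reader):

int b;
while ((b = bis.read()) != -1) {
    // -1 signals end of stream; each call returns one unsigned byte (0-255)
    System.out.print((char) b);
}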
I have written 2 methods to read the file
public static void parseCsvFile(String path) throws IOException {
FileInputStream inputStream = null;
Scanner sc = null;
try {
inputStream = new FileInputStream(path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
//logger.info(line);
}
// note that Scanner suppresses exceptions
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (sc != null) {
sc.close();
}
}
}
public static void parseCsvUsingJavaStream(String path) {
try (Stream<String> stream = Files.lines(Paths.get(path))) {
stream.forEach(System.out::println);
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
From the first approach, what I understand is that the method does not load all the lines from the file into memory at once, which is memory efficient. I want to achieve the same using a lambda expression. My question is: does my second approach load all the lines into memory? If yes, how can I make my second approach memory efficient?
The answer to your question is in the Files.lines javadoc:
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.
Your second code sample should be roughly as memory-efficient as your first code sample.
Using the streams API should result in about the same memory usage as the other approach, unless you parallelize the stream.
From the Javadoc:
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.
Bytes from the file are decoded into characters using the specified charset and the same line terminators as specified by readAllLines are supported.
After this method returns, then any subsequent I/O exception that occurs while reading from the file or when a malformed or unmappable byte sequence is read, is wrapped in an UncheckedIOException that will be thrown from the Stream method that caused the read to take place. In case an IOException is thrown when closing the file, it is also wrapped as an UncheckedIOException.
The returned stream encapsulates a Reader. If timely disposal of file system resources is required, the try-with-resources construct should be used to ensure that the stream's close method is invoked after the stream operations are completed.
I have a synchronized method in which I am using DataInputStream.readFully(). It's throwing an EOFException. Why does the readFully method throw EOF when it is still inside the synchronized method? Below is the code for reference:
private static synchronized String getTransactionId() {
try {
String txnId_fname = SiteConfiguration.getInstance().getProperty("TRANSACTION.INFO_FILE", //
LaneProcessor.DEFAULT_TRANSACTION_ID_FILE_NAME);
File tmpFile = new File(txnId_fname);
if (!tmpFile.exists()) {
tmpFile.createNewFile();
}
else {
long sz = tmpFile.length();
if ( 12 == sz ) {
// read the transaction id from the file, the ID must be 12 bytes long to be valid.
DataInputStream dis = new DataInputStream(new FileInputStream(tmpFile));
byte[] datainBytes = new byte[dis.available()];
dis.readFully(datainBytes);
transactionIdLog = new String(datainBytes, 0, datainBytes.length);
if ( Stringer.isNumeric(transactionIdLog))
{
transactionId = Long.valueOf(transactionIdLog);
}
dis.close();
//log.debug("transaction id from the existing file"+transactionId);
}
}
transactionId = ConvertUtils.incrementLong(transactionId);
transactionIdLog = Long.toString(transactionId);
transactionIdLog = Stringer.zpad(transactionIdLog, 12);
_out = new FileOutputStream(tmpFile);
_out.write(transactionIdLog.getBytes());
_out.flush();
_out.close();
}
catch (Exception e) {
log.error("Error in transaction id generation" + e.getMessage(), e);
}
return transactionIdLog;
}
The contract for available is that it returns an estimate of the number of bytes available; if you try to read that many bytes, the program won't block but it may read fewer bytes than available says. If available's result is too high, then readFully could get an EOF exception. Unfortunately, I tried looking at the source of FileInputStream.available to see how it worked, but it's native, so I can't tell whether it could return a "too large" value. All I can say is, based on the javadoc, I don't think your code is guaranteed to work.
To see whether this really is the problem, I'd recommend having the program output datainBytes.length after the array is created, and then check that against the actual file size.
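For instance, a quick diagnostic (reusing tmpFile and dis from the question's code) could print both numbers just before the read:

long fileSize = tmpFile.length();
int estimate = dis.available();
System.out.println("available() = " + estimate + ", file length = " + fileSize);
byte[] datainBytes = new byte[estimate];
dis.readFully(datainBytes);

If the two numbers ever differ, available() is the culprit.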
Will the synchronized method throw EOFException?
Literally No. Any exception in the method will be caught and logged. So it won't propagate an EOFException. What is more, there is no throw new EOFException(...).
But could your method catch EOFException and log it? I think the answer is Yes!
The readFully method will throw EOFException if it cannot fill the buffer, and you have set the buffer size to the number of bytes that available() says are readable. But consider this scenario:
Your application executes to the point where available() returns.
Your application is paused (e.g. by the OS scheduler).
Some other application truncates the file.
Your application is resumed, and calls readFully ... only to discover that there are ZERO bytes to be read.
EOFException ...
This illustrates the point that the result of available() is only a hint. You can't entirely rely on it.
But, I don't think it is technically possible to code that method in such a way that an EOFException cannot occur. You certainly can't do it without some kind of file locking ... to prevent other applications truncating the file while your application is reading it.
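Since the surrounding code already verifies that the file is exactly 12 bytes long, one way to take available() out of the picture (a sketch; it still cannot prevent the truncation race described above) is to size the buffer from that known length:

byte[] datainBytes = new byte[12]; // the ID must be 12 bytes long, per the sz check above
dis.readFully(datainBytes);        // can still throw EOFException if the file shrinks mid-read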
When loading huge files with ObjectInputStream, all objects that are read are buffered by the stream for object graph resolution.
This causes huge memory overhead which isn't needed in my case (all objects read are independent).
Is there an equivalent to the reset() method of ObjectOutputStream which resets this buffer?
Code example:
try (FileInputStream fileInputStream = new FileInputStream(filename);
BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream);
ObjectInputStream objectInputStream = new ObjectInputStream(bufferedInputStream)) {
while (object = objectInputStream.readObject()) {
System.Out.println(object.toString());
}
}
There is actually a reset method on the class, but it does a completely different thing.
See Java APIs which cause memory bloat
It's up to the sender to decide when to break the integrity of sent object graphs, by calling ObjectOutputStream.reset(). Not the receiver.
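For illustration, a sender that resets periodically might look like this (filename, objects, and the 1000-object interval are hypothetical placeholders):

try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(filename))) {
    int count = 0;
    for (Object obj : objects) {
        oos.writeObject(obj);
        if (++count % 1000 == 0) {
            oos.reset(); // writes a reset marker into the stream
        }
    }
}

Each reset() tells the reading ObjectInputStream to discard its back-reference table as well, which is what keeps the receiver's memory bounded.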
NB your code doesn't compile, and wouldn't be valid if it did:
while (object = objectInputStream.readObject()) {
}
This should be
try {
while (true) {
object = objectInputStream.readObject();
// ...
}
}
catch (EOFException exc) {
// end of stream
}
There is a misconception abroad that readObject() returns null at end of stream. It doesn't. It throws EOFException. It can return null any time you wrote a null.
Hmm, it seems you need to use some sort of lazy loading technique where you only load the necessary components of the object graph, not everything.
I just wanted to see if there was a better way I should be handling this. My understanding of streams is that as long as you close a stream, any streams encapsulated within it will be closed, which is why I only close the TarArchiveOutputStream in the finally block. If I get a FileNotFoundException on rawDir or archiveFile I want to log it; anything else I want to throw.
public static void createTarGzOfDirectory(File rawDir, File archiveFile) throws IOException {
FileOutputStream fOut = null;
BufferedOutputStream bOut = null;
GzipCompressorOutputStream gzOut = null;
TarArchiveOutputStream tOut = null;
try {
fOut = new FileOutputStream(archiveFile);
bOut = new BufferedOutputStream(fOut);
gzOut = new GzipCompressorOutputStream(bOut);
tOut = new TarArchiveOutputStream(gzOut);
addFileToTarGz(tOut, rawDir, "");
} catch (FileNotFoundException e) {
log.error("File not found: " + e);
} finally {
if(tOut != null) {
tOut.finish();
tOut.close();
}
}
}
Any other considerations or thoughts on improving things?
My understanding of streams is that as long as you close a stream, any streams encapsulated within it will be closed ...
That is correct.
However, your code is (effectively) assuming that if tOut is null, then none of the other streams in the chain have been created. That's a somewhat dodgy assumption. Consider this sequence:
The FileOutputStream is created and is assigned to fOut.
The BufferedOutputStream is created and is assigned to bOut.
The GzipCompressorOutputStream constructor throws an exception or error. (Maybe the heap is full ...).
The catch is skipped ... wrong exception.
The finally checks tOut, finds it is null, and does nothing.
Net result: we've leaked the file descriptor / channel held by the FileOutputStream.
The key to getting this example (absolutely) right is to understand which of those stream objects holds the critical resources, and ensuring that THAT stream gets closed. The other streams that don't hold resources don't have to be closed.
} finally {
if (fOut != null) {
fOut.close();
}
}
The other point is that you need to move the tOut.finish() call into the try block after the addFileToTarGz call.
If the addFileToTarGz call throws an exception, or if you don't get that far, the finish call is a waste of time.
The finish call will attempt to write the index to the archive, and THAT could throw an IOException. If this happens in the finally block, then any following code in the finally block to close the stream chain won't get executed ... and a file descriptor will be leaked.
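Putting those two points together, a corrected version might look roughly like this (a sketch reusing addFileToTarGz and log from the question):

public static void createTarGzOfDirectory(File rawDir, File archiveFile) throws IOException {
    FileOutputStream fOut = null;
    try {
        fOut = new FileOutputStream(archiveFile);
        TarArchiveOutputStream tOut = new TarArchiveOutputStream(
                new GzipCompressorOutputStream(new BufferedOutputStream(fOut)));
        addFileToTarGz(tOut, rawDir, "");
        tOut.finish(); // writing the index can throw; do it where it cannot mask a close()
        tOut.close();  // closes the whole chain down to fOut
        fOut = null;   // nothing left for the finally block to clean up
    } catch (FileNotFoundException e) {
        log.error("File not found: " + e);
    } finally {
        if (fOut != null) {
            fOut.close(); // only reached if something failed before the normal close
        }
    }
}

The finally block now guards the one object that holds the OS resource, and finish() can no longer prevent the stream chain from being closed.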
Although it would look ugly, and is maybe unlikely to be the case, you should close them all in cascade. Yes, if you close the TarArchiveOutputStream, it is supposed to close the underlying streams. But, depending on the implementation, that may not always be the case. Moreover, and probably mainly, if one of the intermediate constructors throws an exception, tOut will be null, but the other ones may not be, meaning that your streams are open but you did not close any of them.
You could chain all your constructors together like so:
tOut = new TarArchiveOutputStream(new GzipCompressorOutputStream(new BufferedOutputStream(new FileOutputStream(archiveFile))));
And save yourself 6 lines of initialization and 3 local variables for debugging. Not everyone likes chaining things that way - I personally find it more readable but the rest of your team may prefer it your way.
As far as closing the stream, it looks correct to me.
My code makes use of a BufferedReader to read from a file [main.txt] and a PrintWriter to write to another temp file [main.temp]. I close both streams, and yet I was not able to call the delete() method on the File object associated with [main.txt]. Only after calling System.gc() after closing both streams was I able to delete the File object.
public static boolean delete (String str1, String str2, File FileLoc)
{
File tempFile = null;
BufferedReader Reader = null;
PrintWriter Writer = null;
try
{
tempFile = new File (FileLoc.getAbsolutePath() + ".tmp");
Reader = new BufferedReader(new FileReader(FileLoc));
Writer = new PrintWriter(new FileWriter(tempFile));
String lsCurrLine = null;
while((lsCurrLine = Reader.readLine()) != null)
{
// ...
// ...
if (true)
{
Writer.println(lsCurrLine);
Writer.flush();
}
}
Reader.close();
Writer.close();
System.gc();
}
catch(FileNotFoundException loFileExp)
{
System.out.println("\n File not found . Exiting");
return false;
}
catch(IOException loFileExp)
{
System.out.println("\n IO Exception while deleting the record. Exiting");
return false;
}
}
Is this reliable? Or is there a better fix?
@user183717 - the code you posted is clearly not all of the relevant code. For instance, those "..."'s, and the fact that File.delete() is not actually called in that code.
When a stream object is garbage collected, its finalizer closes the underlying file descriptor. So, the fact that the delete only works when you added the System.gc() call is strong evidence that your code is somehow failing to close some stream for the file. It may well be a different stream object to the one that is opened in the code that you posted.
Properly written stream handling code uses a finally block to make sure that streams get closed no matter what. For example:
Reader reader = new BufferedReader(new FileReader(file));
try {
// do stuff
} finally {
try {
reader.close();
} catch (IOException ex) {
// ...
}
}
If you don't follow that pattern or something similar, there's a good chance that there are scenarios where streams don't always get closed. In your code for example, if one of the read or write calls threw an exception you'd skip past the statements that closed the streams.
Is this [i.e. calling System.gc();] reliable?
No.
The JVM may be configured to ignore your application's gc() call.
There's no guarantee that the lost stream will be unreachable ... yet.
There's no guarantee that calling System.gc() will notice that the stream is unreachable. Hypothetically, the stream object might be tenured, and calling System.gc() might only collect the Eden space.
Even if the stream is found to be unreachable by the GC, there's no guarantee that the GC will run the finalizer immediately. Hypothetically, running the finalizers can be deferred ... indefinitely.
Or is there a better fix ?
Yes. Fix your application to close its streams properly.
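On Java 7 and later, try-with-resources does this for you; a minimal sketch of the question's copy-then-delete flow (file names taken from the question) would be:

File source = new File("main.txt");
File temp = new File("main.temp");
try (BufferedReader reader = new BufferedReader(new FileReader(source));
     PrintWriter writer = new PrintWriter(new FileWriter(temp))) {
    String line;
    while ((line = reader.readLine()) != null) {
        writer.println(line);
    }
}
// both streams are now closed, whether or not an exception was thrown,
// so the delete no longer depends on the garbage collector
if (!source.delete()) {
    System.out.println("could not delete " + source);
}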
Try using the java.io.File class. Here is a simple sample:
File f = new File("file path or file name");
f.delete();
When you say you "close both the streams" you mean the BufferedReader and the PrintWriter?
You should only need to close the BufferedReader before the delete will work, but you also need to close the underlying stream; normally calling BufferedReader.close() will do that. It sounds like you think you are closing the stream but you aren't actually succeeding.
One problem with your code: you don't close the streams if exceptions occur. It's usually best to close the streams in a finally block.
Also, the code you posted doesn't use File.delete() anywhere? And what exactly do the ... lines do - are they re-assigning Reader to a new stream by any chance?
Try using Apache Commons IO:
http://commons.apache.org/io/description.html