Java zip.close() hangs

I am trying to add two small files to a zip, as that is the format the destination requires. Both files are less than 1000 KB, but when I run my code the program hangs indefinitely in zip.close(), with no errors.
What am I doing wrong?
val is = new PipedInputStream()
val os = new PipedOutputStream(is)
val cos = new CountingOutputStream(os)
val zip = new ZipOutputStream(cos)
val fis = new FileInputStream(file)
zip.putNextEntry(new ZipEntry(location))
var i = 0
while (i != -1) {
  zip.write(i)
  i = fis.read()
}
zip.closeEntry()
fis.close()
zip.close()

When using piped streams, you need to read from the PipedInputStream at the same time you're writing to the PipedOutputStream; otherwise the pipe's buffer fills up and the write blocks.
Based on your code, you're not doing the reading part (in a separate thread, of course). You can test this by writing to a FileOutputStream instead, and it should write the file nicely.
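To illustrate the pattern (a hedged sketch, not the original poster's code; the file and entry names are placeholders), the reading side runs on its own thread while the zip is written on the current one:
import java.io.*;
import java.util.zip.*;

public class PipedZipExample {
    public static void main(String[] args) throws Exception {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        // Consumer thread: drain the pipe into a file. Without this, the pipe's
        // small internal buffer (1 KB by default) fills up and zip.close() blocks.
        Thread consumer = new Thread(() -> {
            try (OutputStream dest = new FileOutputStream("result.zip")) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    dest.write(buf, 0, n);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        consumer.start();

        // Producer: write the zip entry on the current thread; closing the zip
        // also closes the pipe, which lets the consumer's read() return -1.
        try (ZipOutputStream zip = new ZipOutputStream(out);
             FileInputStream fis = new FileInputStream("smallfile.txt")) {
            zip.putNextEntry(new ZipEntry("smallfile.txt"));
            byte[] buf = new byte[8192];
            int n;
            while ((n = fis.read(buf)) != -1) {
                zip.write(buf, 0, n);
            }
            zip.closeEntry();
        }
        consumer.join();
    }
}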

Related

Read S3 Object and write into InMemory Buffer

I am trying to read from S3 and write into an in-memory buffer like this:
def inMemoryDownload(bucketName: String, key: String): String = {
  val s3Object = s3client.getObject(new GetObjectRequest(bucketName, key))
  val s3Stream = s3Object.getObjectContent()
  val outputStream = new ByteArrayOutputStream()
  val buffer = new Array[Byte](10 * 1024)
  var bytesRead: Int = s3Stream.read(buffer)
  while (bytesRead > -1) {
    info("writing.......")
    outputStream.write(buffer)
    info("reading.......")
    bytesRead = s3Stream.read(buffer)
  }
  val data = new String(outputStream.toByteArray)
  outputStream.close()
  s3Object.getObjectContent.close()
  data
}
But it is giving me a heap space error (the size of the file on S3 is 4 MB).
You should be using the number of bytes you just read when writing into the stream. The way you have it written, it writes the entire buffer every time. I doubt that is the cause of your memory problem, but it could be. Imagine that read returns a single byte to you every time and you write 10K into the stream: for a 4 MB file, that's 40 GB right there.
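In other words, the write call inside the loop would pass along the count (a one-line sketch reusing the names from the code above):
outputStream.write(buffer, 0, bytesRead) // write only the bytes actually read this iteration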
Another problem is that (I am not 100% sure, but I suspect) getObjectContent creates a new input stream every time it is called. Basically, you would just keep reading the same bytes over and over again in the loop. You should put it into a variable instead, and close that variable at the end rather than calling getObjectContent again.
Also, if I may make a suggestion, try rewriting your code in actual Scala, not just syntactically but idiomatically. Avoid mutable state and use functional transformations. If you are going to write Scala code, you might as well take some time to get into the right mindset. You'll grow to appreciate it eventually, I promise :)
Something like this, perhaps?
val input = s3Object.getObjectContent
Stream
  .continually(input.read(buffer))
  .takeWhile(_ > 0)
  .foreach { output.write(buffer, 0, _) }

Java SequenceInputStream

I am trying to send multiple files from my server (NanoHttpd) to my client (Apache DefaultHttpClient).
My approach is to send multiple files via one response of NanoHttpd.
For this purpose I wanted to use SequenceInputStream.
I am trying to concatenate multiple files, send them via the response (an InputStream), and write every file back out into a separate file on my client.
On the server side I call this:
List<InputStream> data = new ArrayList<InputStream>(files.size());
for (String file_name : files) {
    File file = new File(file_name);
    data.add(new FileInputStream(file));
}
InputStream is = new SequenceInputStream(Collections.enumeration(data));
return new NanoHTTPD.Response(HTTP_OK, "application/octet-stream", is);
Now my question is how to receive and split the files correctly.
I have tried it this way on my client, but it does not work:
int read = 0;
int remaining = 0;
byte[] bytes = new byte[buffer];
// Read till the end of the stream
while ((read != -1) && (counter < files.size())) {
    // Create a .o file for the current file
    read = 0;
    remaining = is.available();
    // Should open each stream
    while (remaining > 0) {
        read = is.read(bytes);
        remaining = remaining - read;
        os.write(bytes, 0, read);
    }
    os.flush();
    os.close();
}
This way I want to go over all the streams (until read == -1, or I know there are no files left), and read each stream into its own file.
I clearly seem to be misunderstanding something fundamental, since is.available() is always 0.
Could anyone please tell me how to read properly from this SequenceInputStream, or how to solve my problem?
Thanks in advance.
It won't work this way. SequenceInputStream merges all the input streams into one solid byte stream; there are no separators or EOF markers between them. I suggest abandoning the idea and looking for a different approach.
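One possible different approach, purely as a sketch of my own (not from the answer above; fileCount and the output names are made up): prefix each file with its length on the server, then have the client read exactly that many bytes per file:
// Server side (sketch, exception handling omitted; assumes java.io.* and java.nio.file.*):
DataOutputStream out = new DataOutputStream(os);
for (String fileName : files) {
    File f = new File(fileName);
    out.writeLong(f.length());          // 8-byte length prefix before each file
    Files.copy(f.toPath(), out);        // then the raw file bytes
}
out.flush();

// Client side: read the prefix, then copy exactly that many bytes into a separate file:
DataInputStream in = new DataInputStream(is);
for (int i = 0; i < fileCount; i++) {
    long remaining = in.readLong();
    try (FileOutputStream fos = new FileOutputStream("file" + i + ".o")) {
        byte[] buf = new byte[8192];
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) throw new EOFException("stream ended mid-file");
            fos.write(buf, 0, n);
            remaining -= n;
        }
    }
}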

How to chain multiple different InputStreams into one InputStream

I'm wondering if there is any idiomatic way to chain multiple InputStreams into one continuous InputStream in Java (or Scala).
What I need it for is to parse flat files that I load over the network from an FTP server. What I want to do is to take file[1..N], open up streams, and then combine them into one stream. So when file1 comes to an end, I want to start reading from file2, and so on, until I reach the end of fileN.
I need to read these files in a specific order; the data comes from a legacy system that produces files in batches, so data in one file depends on data in another, but I would like to handle them as one continuous stream to simplify my domain logic interface.
I searched around and found PipedInputStream, but I'm not positive that is what I need. An example would be helpful.
It's right there in JDK! Quoting JavaDoc of SequenceInputStream:
A SequenceInputStream represents the logical concatenation of other input streams. It starts out with an ordered collection of input streams and reads from the first one until end of file is reached, whereupon it reads from the second one, and so on, until end of file is reached on the last of the contained input streams.
You want to concatenate an arbitrary number of InputStreams, while this constructor of SequenceInputStream accepts only two (there is also a constructor taking an Enumeration, used in the answers below). But since SequenceInputStream is also an InputStream, you can apply it recursively (nest them):
new SequenceInputStream(
    new SequenceInputStream(
        new SequenceInputStream(file1, file2),
        file3
    ),
    file4
);
...you get the idea.
See also
How do you merge two input streams in Java? (dup?)
This is done using SequenceInputStream, which is straightforward in Java, as Tomasz Nurkiewicz's answer shows. I had to do this repeatedly in a project recently, so I added some Scala-y goodness via the "pimp my library" pattern.
object StreamUtils {
  implicit def toRichInputStream(str: InputStream) = new RichInputStream(str)

  class RichInputStream(str: InputStream) {
    // a bunch of other handy Stream functionality, deleted
    def ++(str2: InputStream): InputStream = new SequenceInputStream(str, str2)
  }
}
With that, I can do stream sequencing as follows
val mergedStream = stream1++stream2++stream3
or even
val streamList = //some arbitrary-length list of streams, non-empty
val mergedStream = streamList.reduceLeft(_++_)
Another solution: first create a list of input streams and then create the SequenceInputStream from it:
List<InputStream> iss = Files.list(Paths.get("/your/path"))
        .filter(Files::isRegularFile)
        .map(f -> {
            try {
                return new FileInputStream(f.toString());
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        })
        .collect(Collectors.toList());

InputStream sequence = new SequenceInputStream(Collections.enumeration(iss));
Here is a more elegant solution using a Vector. This one is for Android specifically (it reads from the AssetManager), but you can use a Vector the same way in any Java code:
AssetManager am = getAssets();
Vector<InputStream> v = new Vector<>(Constant.PAGES);
for (int i = 0; i < Constant.PAGES; i++) {
    String fileName = "file" + i + ".txt";
    InputStream is = am.open(fileName);
    v.add(is);
}
Enumeration<InputStream> e = v.elements();
SequenceInputStream sis = new SequenceInputStream(e);
InputStreamReader isr = new InputStreamReader(sis);
Scanner scanner = new Scanner(isr); // or use a BufferedReader
Here's a simple Scala version that concatenates an Iterator[InputStream]:
import java.io.{InputStream, SequenceInputStream}
import scala.collection.JavaConverters._
def concatInputStreams(streams: Iterator[InputStream]): InputStream =
  new SequenceInputStream(streams.asJavaEnumeration)

Load text file to memory in Java

I have a wiki.txt file and its size is 50 MB.
I need to do several things with the file, so I thought that the best way in terms of performance is to load the file into memory. Is that correct?
This is the code that I have written:
File file = new File("wiki.txt");
FileInputStream fileInputStream = new FileInputStream(file);
FileChannel fileChannel = fileInputStream.getChannel();
MappedByteBuffer mapByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, file.length());
System.out.println((char)mapByteBuffer.get());
I get an error on this line: mapByteBuffer.get().
I tried a few variants of the get() function, but all of them gave an error, and e.getMessage() didn't even give me anything; I just got null.
Another important thing to note: my text file contains English words, and the action I need to perform is a search, to check whether an expression exists in this text file.
Thank you.
I would suggest using a memory-mapped file, to read the file directly from the disk instead of loading it into memory.
RandomAccessFile file = new RandomAccessFile("wiki.txt", "r");
FileChannel channel = file.getChannel();
// map the whole file read-only (the file was opened with mode "r", so a writable mapping would fail)
MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
And then you can read the buffer as usual.
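For instance, a minimal sketch of "reading as usual" from the mapped buffer (assuming the mapping above and single-byte characters, as another answer below notes):
// walk the mapped bytes without copying the whole file onto the heap
while (buf.hasRemaining()) {
    char c = (char) (buf.get() & 0xFF); // treat each byte as an ISO-8859-1 character
    // ... inspect c, e.g. build up words to compare against the search term
}
buf.rewind(); // reposition to the start if another pass is needed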
My answer for point (1), whether loading the file into memory is the best approach:
It depends on what you want to do with the file. If your processing doesn't involve rewinding (looking back at what was read before), it's best to just read it as a stream and process it in one go (instead of loading it all into memory).
Even if you need random access across the file, you may also be interested in doing block-wise file operations, because your solution may not scale well when the file grows.
Use RandomAccessFile if you are on Java 1.4 or above.
For random access, the operating system usually handles file buffer caching quite well, so you don't have to handle it yourself.
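A small sketch of that block-wise, random-access suggestion (the offset and block size here are arbitrary choices, not from the answer):
// read one 4 KB block starting at the 1 MB mark, letting the OS handle caching
RandomAccessFile raf = new RandomAccessFile("wiki.txt", "r");
raf.seek(1024 * 1024);
byte[] block = new byte[4096];
int n = raf.read(block);               // number of bytes actually read (or -1 at EOF)
String text = new String(block, 0, n, java.nio.charset.StandardCharsets.ISO_8859_1);
raf.close();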
It is important to read the whole error, not just the message. Often the real information is in the exception's name not the text associated with it.
You will get an error if the file is empty as there is no first byte.
Note: the approach you are using assumes ASCII 7-bit characters. If you want to assume ISO-8859-1 characters you can use (char) (byteBuffer.get() & 0xFF)
However, if you have plain text you may find that using strings is simpler and not much slower; e.g. you can read a 50 MB file as text in less than a second. I would only use a memory-mapped file if this takes far too long.
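For example, a sketch of that plain-string approach (the charset choice is an assumption; wiki.txt is the file from the question):
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

// load the whole 50 MB file as one String, then search it directly
String content = new String(Files.readAllBytes(Paths.get("wiki.txt")), StandardCharsets.ISO_8859_1);
boolean found = content.contains("some expression");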
I would suggest using a BufferedReader. It is much faster and requires relatively fewer resources.
First read the number of lines:
InputStream is = new BufferedInputStream(new FileInputStream(filename));
byte[] chars = new byte[1024];
int numberOfChars = 0;
int count = 0;
while ((numberOfChars = is.read(chars)) != -1) {
    for (int i = 0; i < numberOfChars; ++i) {
        if (chars[i] == '\n' && numberOfChars - i != 1) {
            ++count;
        }
    }
}
count++;
is.close();
return count; // number of lines
Then read the lines:
BufferedReader in = new BufferedReader(new FileReader(fileName));
for (int i = 0; i < endLine; i++) {
    String oneLine = in.readLine();
}
Within these strings you can then search for whatever you need.

What method is more efficient for concatenating large files in Java using FileChannels

I want to find out what method is better of two that I have come up with for concatenating my text files in Java. If someone has some insight they can share about what goes on at the kernel level that explains the difference between these methods of writing to a FileChannel, I would greatly appreciate it.
From what I understand from the documentation and other Stack Overflow conversations, allocateDirect allocates space right on the drive and mostly avoids using RAM. My concern is that the ByteBuffer created with allocateDirect might overflow or fail to be allocated if the File infile is large, say 1 GB. At this point in the development of our software I am guaranteed that the File will be no larger than 2 GB, but there is potential in the future that it might be as big as 10 or 20 GB.
I have observed that the transferFrom loop never iterates more than once... so it seems to succeed in writing the entire infile at once; but I haven't tested it with files bigger than 60 MB. I looped anyway, because the documentation specifies that there is no guarantee of how much will be written at once. With transferFrom only able to accept, on my system, an int32 as its count parameter, I won't be able to specify that more than 2 GB at a time be transferred... Again, kernel expertise would help me understand.
Thanks in advance for your help!!
Using a ByteBuffer:
boolean concatFiles(StringBuffer sb, File infile, File outfile) {
    FileChannel inChan = null, outChan = null;
    try {
        ByteBuffer buff = ByteBuffer.allocateDirect((int) (infile.length() + sb.length()));
        // write the StringBuffer so it goes in the output file first:
        buff.put(sb.toString().getBytes());
        // create the FileChannels:
        inChan = new RandomAccessFile(infile, "r").getChannel();
        outChan = new RandomAccessFile(outfile, "rw").getChannel();
        // read the infile into the buffer:
        inChan.read(buff);
        // prep the buffer:
        buff.flip();
        // write the buffer out to the file via the FileChannel:
        outChan.write(buff);
        inChan.close();
        outChan.close();
    } catch...etc
}
Using transferTo (or transferFrom):
boolean concatFiles(StringBuffer sb, File infile, File outfile) {
    FileChannel inChan = null, outChan = null;
    try {
        // write the StringBuffer so it goes in the output file first:
        PrintWriter fw = new PrintWriter(outfile);
        fw.write(sb.toString());
        fw.flush();
        fw.close();
        // create the channels appropriate for appending:
        outChan = new FileOutputStream(outfile, true).getChannel();
        inChan = new RandomAccessFile(infile, "r").getChannel();
        long startSize = outfile.length();
        long inFileSize = infile.length();
        long bytesWritten = 0;
        // set the position where we should start appending the data:
        outChan.position(startSize);
        long startByte = outChan.position();
        while (bytesWritten < inFileSize) {
            bytesWritten += outChan.transferFrom(inChan, startByte + bytesWritten,
                                                 (int) (inFileSize - bytesWritten));
        }
        inChan.close();
        outChan.close();
    } catch ... etc
transferTo() can be far more efficient, as there is less data copying, or none at all if the transfer can be done entirely in the kernel. And if that isn't available on your platform, it will still use highly tuned code.
You do need the loop; one day it will iterate more than once, and your code will keep working.
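For reference, a minimal sketch of the looped transferTo() this answer is pointing at (my own illustration, reusing the inChan/outChan names from the question, not the answerer's code):
// copy the whole input channel into the output channel, however many calls it takes
long position = 0;
long remaining = inChan.size();
while (remaining > 0) {
    long transferred = inChan.transferTo(position, remaining, outChan);
    if (transferred <= 0) break;        // defensive: stop if nothing was transferred
    position += transferred;
    remaining -= transferred;
}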
