Java - using an InputStream as a File

I'm trying to generate a PDF document from an uploaded ".docx" file using JODConverter.
The call to the method that generates the PDF is something like this:
File inputFile = new File("document.doc");
File outputFile = new File("document.pdf");
// connect to an OpenOffice.org instance running on port 8100
OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
// convert
DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
converter.convert(inputFile, outputFile);
// close the connection
connection.disconnect();
I'm using Apache Commons FileUpload to handle uploading the docx file, which gives me an InputStream object. I'm aware that java.io.File is just an abstract reference to a file in the system.
I want to avoid the disk write (saving the InputStream to disk) and the disk read (reading the saved file in JODConverter).
Is there any way I can get a File object referring to an input stream? Any other way to avoid disk I/O will also do!
EDIT: I don't care if this ends up using a lot of system memory. The application is going to be hosted on a LAN with few to zero parallel users.

File-based conversions are faster than stream-based ones (provided by StreamOpenOfficeDocumentConverter), but they require the OpenOffice.org service to be running locally and to have the correct permissions on the files.
Try the stream-based method to avoid disk writes:
convert(java.io.InputStream inputStream, DocumentFormat inputFormat, java.io.OutputStream outputStream, DocumentFormat outputFormat)
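A minimal sketch of what that call might look like, assuming JODConverter 2.x's StreamOpenOfficeDocumentConverter and DefaultDocumentFormatRegistry (the stream variables are placeholders, and older format registries may not know the "docx" extension):
// uploadStream comes from Commons FileUpload; pdfOut could be the servlet response
DocumentFormatRegistry registry = new DefaultDocumentFormatRegistry();
DocumentFormat inputFormat = registry.getFormatByFileExtension("docx");
DocumentFormat outputFormat = registry.getFormatByFileExtension("pdf");

OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
try {
    DocumentConverter converter = new StreamOpenOfficeDocumentConverter(connection);
    converter.convert(uploadStream, inputFormat, pdfOut, outputFormat);
} finally {
    connection.disconnect();
}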

There is no way to do this and keep the code solid. For one, the .convert() method only takes two Files as arguments.
So you'd have to extend File, which is possible in theory but very fragile: you'd be required to delve into the library code, which can change at any time and break your extended class.
(Well, there is a way to avoid disk writes if you use a RAM-backed filesystem and read/write from that filesystem, of course.)

Chances are that Commons FileUpload has written the upload to the filesystem anyhow.
Check whether your FileItem is an instance of DiskFileItem. If it is, DiskFileItem's write implementation will try to move the file to the File object you pass. You are not causing any extra disk I/O then, since the write already happened.
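A hedged sketch of that check, using the Commons FileUpload API (the method and variable names beyond FileItem/DiskFileItem are placeholders):
import java.io.File;
import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItem;

void save(FileItem item, File target) throws Exception {
    if (item instanceof DiskFileItem && !item.isInMemory()) {
        // The upload is already spooled to a temp file on disk;
        // write() will typically just move/rename it, no extra copy.
        item.write(target);
    } else {
        // Small upload still in memory: this is the first and only disk write.
        item.write(target);
    }
}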

Related

How can I read a Base64 file that comes as a string?

I am currently developing a REST service which receives in its request a field containing a file in Base64 format (a string of "n" characters). Within the service logic I convert that character string to a File and save it to a predetermined path.
The problem is that when the file is large (3 MB) the service becomes slow and takes a long time to respond.
This is the code I am using:
String filename = "TEXT.DOCX";
// THE FIELD base64file IS THE BASE64 STRING THAT COMES FROM THE REQUEST
byte[] fileByteArray = java.util.Base64.getDecoder().decode(base64file);
// VALIDATE FILE SIZE
if (1 * 1024 * 1024 < fileByteArray.length) {
    logger.info("The file [" + filename + "] is too large");
} else {
    try (BufferedOutputStream stream = new BufferedOutputStream(
            new FileOutputStream(new File("C:\\" + filename)))) {
        stream.write(fileByteArray);
    }
}
How can I avoid this problem, so that my service does not take so long to write the file?
Buffering does not improve your performance here, as all you are trying to do is write the file as fast as possible. Generally it looks fine; change your code to use the FileOutputStream directly and see if that improves things:
try (FileOutputStream stream = new FileOutputStream(path)) {
    stream.write(bytes);
}
Alternatively you could also try using something like Apache Commons to do the task for you:
FileUtils.writeByteArrayToFile(new File(path), bytes);
Try the following, which also works for large files.
Path outPath = Paths.get(filename);
try (InputStream in = Base64.getDecoder().wrap(base64Stream)) {
    Files.copy(in, outPath);
}
This keeps only a small buffer in memory. Your code might be slow because it holds the entire decoded file in memory.
wrap takes an InputStream (base64Stream above), which you should provide, not the entire String.
From a network point of view:
Both JSON and XML can support large amounts of data exchange, and 3 MB is not really huge. But there is a limit on how much the browser can handle (if this call comes from a user interface).
Also, a web server like Tomcat handles 2 MB by default (check maxPostSize: http://tomcat.apache.org/tomcat-6.0-doc/config/http.html#Common_Attributes)
You can also try chunking the request payload (although that shouldn't be required for a 3 MB file)
From an implementation point of view:
Write operations on your disk could be slow; this also depends on your OS.
If your file is really large, you can use Java's FileChannel class with a ByteBuffer, as sketched below.
To find the cause of the slowness (network delay or code), compare the performance of a simple Java program against the web service call.
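A minimal sketch of that FileChannel/ByteBuffer approach (the method and variable names are illustrative):
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

void writeWithChannel(byte[] fileByteArray, String filename) throws IOException {
    Path target = Paths.get(filename);
    try (FileChannel channel = FileChannel.open(target,
            StandardOpenOption.CREATE, StandardOpenOption.WRITE,
            StandardOpenOption.TRUNCATE_EXISTING)) {
        ByteBuffer buffer = ByteBuffer.wrap(fileByteArray);
        while (buffer.hasRemaining()) {
            channel.write(buffer); // a single write may be partial, hence the loop
        }
    }
}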

Java | method to write a datahandler to a file takes more time than expected

I am trying to read mail from MS Exchange using Camel and getting the attachments as DataHandlers. A 10 MB file takes around 3 hours to write to its destination.
File outputFile = new File(someDirectory, someFileName);
DataHandler attachment_data = destination1Attachments.get("someFileName.txt");
try (FileOutputStream fos = new FileOutputStream(outputFile)) {
attachment_data.writeTo(fos);
}
I have also noticed that sometimes a 6 to 7 MB file takes around 2 to 3 minutes, and when another mail arrives just after that, it takes more time than expected.
Because of GC?
Trying to find the exact root cause, or any other method to write the data to the file.
Update 1:
Tried wrapping the FileOutputStream in a BufferedOutputStream, as mentioned by @user207421 in the comments. Not much change (just a second or so).
This could be due to the default implementation of the write mechanism:
attachment_data.writeTo(fos);
This theory applies only if DataHandler.getDataSource() != null.
In that method's implementation, only 8 bytes are read at a time and written to the stream; the sheer number of reads and writes might be causing the issue.
Try reading from DataHandler.getInputStream() yourself and writing to the file with a larger buffer, as in the sketch below.
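A hedged sketch of that suggestion (the 64 KB buffer size is an assumption; outputFile and attachment_data are from the question):
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.activation.DataHandler;

void writeAttachment(DataHandler attachment_data, File outputFile) throws IOException {
    try (InputStream in = attachment_data.getInputStream();
         OutputStream out = new FileOutputStream(outputFile)) {
        byte[] buffer = new byte[64 * 1024]; // 64 KB chunks instead of tiny reads
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}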
One must assume that either the object is loaded into memory or writeTo is very inefficient. Hence, specify a DataFlavor and inspect attachment_data.getTransferDataFlavors():
DataFlavor flavor = new DataFlavor(InputStream.class, "application/octetstream");
try (InputStream in = (InputStream) attachment_data.getTransferData(flavor)) {
    // copy 'in' to the output file here
}
Some fiddling needed.

Java: FileOutputStream and FileInputStream together on the same file

Can I open a file (a Linux character device) for read+write and use the two classes to implement a client-server style dialog?
Something like this:
File file = new File("/dev/ttyS0");
FileOutputStream fo = new FileOutputStream(file);
FileInputStream fi = new FileInputStream(file);
After the above declarations, can I continuously send polls (questions) to the file and read its replies? (Of course, attached to ttyS0 there is some kind of server.)
I was not able to test it, but you might want to give RandomAccessFile a try.
It does not give you the opportunity to create streams, but it implements DataInput and DataOutput. Maybe that's good enough for your purpose?
RandomAccessFile docs
String file = "/dev/ttyS0";
try {
    RandomAccessFile f = new RandomAccessFile(file, "rwd");
} catch (IOException e) {
    e.printStackTrace();
}
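A hedged sketch of how a poll/reply exchange over a single RandomAccessFile might look (the "POLL" framing and the device's protocol are assumptions):
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

try (RandomAccessFile f = new RandomAccessFile("/dev/ttyS0", "rwd")) {
    f.write("POLL\n".getBytes(StandardCharsets.US_ASCII)); // send a question
    byte[] reply = new byte[256];
    int n = f.read(reply); // blocks until the device answers (or EOF)
    if (n > 0) {
        System.out.println(new String(reply, 0, n, StandardCharsets.US_ASCII));
    }
}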
The /dev/ttyS0 file is a device file for a serial terminal.
If the device has been configured appropriately to connect to a serial terminal line, then you should be able to read and write like that. However, on a typical desktop or laptop it probably won't work, because there won't be a connected serial line.
(For example, when I do this on my PC:
$ sudo bash -c "cat < /dev/ttyS0"
I get this:
cat: -: Input/output error
which is saying that the device cannot be read from.)
Note that a /dev/tty* device does not behave like a regular file. The characters that are written in no way relate to the characters that you read back. Also note that it is not possible to make ioctl requests using the standard Java APIs. So configuring the terminal driver from Java would be problematic.
If you were talking about reading and writing a regular file, it should work too. However, the behavior could be rather confusing, especially if you have buffering in your streams. One issue you need to deal with is that the two file descriptors are independent of each other.
If you need to do this kind of thing with a regular file, you should probably use RandomAccessFile.
I didn't try RandomAccessFile; it could also work... It worked smoothly with FileInputStream and FileOutputStream, see this answer on SO: https://stackoverflow.com/a/56935267/7332147

Random-access Zip file without writing it to disk

I have a 1-2 GB zip file with 500-1000k entries. I need to get files by name in a fraction of a second, without fully unpacking the archive. If the file is stored on an HDD, this works fine:
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipMapper {
    private HashMap<String, ZipEntry> map;
    private ZipFile zf;

    public ZipMapper(File file) throws IOException {
        map = new HashMap<>();
        zf = new ZipFile(file);
        Enumeration<? extends ZipEntry> en = zf.entries();
        while (en.hasMoreElements()) {
            ZipEntry ze = en.nextElement();
            map.put(ze.getName(), ze);
        }
    }

    public Node getNode(String key) throws IOException {
        return Node.loadFromStream(zf.getInputStream(map.get(key)));
    }
}
But what can I do if the program downloaded the zip file from Amazon S3 and only has its InputStream (or a byte array)? While downloading 1 GB takes ~1 second, writing it to the HDD may take some time, and it is slightly harder to handle multiple files since we don't have an HDD garbage collector.
ZipInputStream does not allow random access to entries.
It would be nice to create a virtual File in memory backed by a byte array, but I couldn't find a way to do that.
You could mark the file to be deleted on exit.
If you want to go for an in-memory approach: have a look at the new NIO.2 File API. Oracle provides a filesystem provider for zip/jar, and AFAIK ShrinkWrap provides an in-memory filesystem. You could try a combination of the two.
I've written some utility methods to copy directories and files to/from a Zip file using the NIO.2 File API (the library is open source):
Maven:
<dependency>
    <groupId>org.softsmithy.lib</groupId>
    <artifactId>softsmithy-lib-core</artifactId>
    <version>0.3</version>
</dependency>
Tutorial:
http://softsmithy.sourceforge.net/lib/current/docs/tutorial/nio-file/index.html
API: CopyFileVisitor.copy
Especially PathUtils.resolve helps with resolving paths across filesystems.
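For illustration, a minimal sketch of opening a zip through the NIO.2 zip filesystem provider and reading a single entry (file and entry names are placeholders; note the archive still has to live on a Path, so this alone does not solve the in-memory part):
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path zip = Paths.get("archive.zip"); // illustrative name
try (FileSystem zipFs = FileSystems.newFileSystem(zip, (ClassLoader) null)) {
    Path entry = zipFs.getPath("some/entry.xml"); // illustrative entry
    try (InputStream in = Files.newInputStream(entry)) {
        // read just this entry; the rest of the archive stays untouched
    }
}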
You can use the SecureBlackbox library; it allows ZIP operations on any seekable stream.
I think you should consider using your OS to create an "in-memory" filesystem (i.e. a RAM drive).
In addition, take a look at the FileSystems API.
A completely different approach: if the server has the file on disk (and possibly cached in RAM already), make it give you the file(s) directly. In other words, submit which files you need and let the server extract and deliver them.
The Blackbox library only has an Extract(String name, String outputPath) method. It seems it can indeed randomly access any file in a seekable zip stream, but it can't write the result to a byte array or return a stream.
I couldn't find any documentation for ShrinkWrap. I couldn't find any suitable implementations of FileSystem/FileSystemProvider etc.
However, it turned out that the (Large) Amazon EC2 instance I'm running somehow writes a 1 GB file to disk in ~1 second. So I just write the file to disk and use ZipFile.
If the HDD were slow, I think a RAM disk would be the easiest solution.

How to verify the content of ZipFile before saving to disk

I have an application that requires a user to upload a zip file containing an XML report file among other files.
What I want to do is verify it is a zip, then open it and check whether there is an XML file, and verify a few required nodes in that XML.
I want to do this before saving the zip file to disk/filesystem, and without creating a temporary file. I will only save the file if it passes validation.
I am using Spring's multipart CommonsMultipartFile to manage uploads.
The application uses Java, JSP, and Tomcat.
Thanks.
See my comment on the OP about the wisdom of buffering the entire file in memory.
One quick first check for a valid zip file is to look at the first four bytes for the appropriate "magic" bytes: a zip file should start with {(byte) 0x50, (byte) 0x4B, (byte) 0x03, (byte) 0x04}. The only way to really check it, however, is to attempt to unzip it.
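A hedged sketch of that header check on the upload stream, before anything touches disk (the BufferedInputStream wrapper is an assumption, used so the stream can be rewound afterwards):
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.Arrays;

static boolean looksLikeZip(BufferedInputStream in) throws IOException {
    byte[] magic = {(byte) 0x50, (byte) 0x4B, (byte) 0x03, (byte) 0x04};
    byte[] header = new byte[4];
    in.mark(4);
    int read = 0, n;
    while (read < 4 && (n = in.read(header, read, 4 - read)) != -1) {
        read += n;
    }
    in.reset(); // rewind so the caller can still read the full upload
    return read == 4 && Arrays.equals(header, magic);
}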
If you want to check whether a file is a ZIP file, perhaps you can use the getContentType() method of the URLConnection class. Something like this:
URL u = new URL(fileUrl);
URLConnection uc = u.openConnection();
String type = uc.getContentType();
But it would be much faster to detect the magic bytes which, for the ZIP format, are 50 4B.
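And a hedged sketch of scanning the upload for an XML entry before saving, assuming the Spring CommonsMultipartFile from the question (uploadedFile is a placeholder name):
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

boolean hasXmlReport = false;
try (ZipInputStream zin = new ZipInputStream(uploadedFile.getInputStream())) {
    ZipEntry entry;
    while ((entry = zin.getNextEntry()) != null) {
        if (entry.getName().toLowerCase().endsWith(".xml")) {
            hasXmlReport = true; // parse and validate the required nodes here
            break;
        }
    }
} catch (IOException e) {
    hasXmlReport = false; // not a readable zip: reject the upload
}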
