I have an application that requires a user to upload a zip file containing an XML report file among other files.
What I want to do is verify that it is a zip, then open it, check whether there is an XML file, and verify a few required nodes in that XML.
I want to do this before I save the zip file to disk, and without creating a temporary file. I will only save the file if it passes validation.
I am using Spring multipart CommonsMultipartFile to manage uploads.
The application uses Java, JSP, and Tomcat.
Thanks.
See my comment on the OP about the wisdom of buffering the entire file in memory.
One quick first check for a valid zip file is to compare the first 4 bytes against the appropriate "magic" bytes. A zip file should start with the 4 bytes {(byte)0x50, (byte)0x4b, (byte)0x03, (byte)0x04}. The only way to really check it, however, is to attempt to unzip it.
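A minimal sketch of both checks done entirely in memory, assuming the upload is already available as a byte array (for example from the multipart file's getBytes()). The class name ZipUploadCheck, its method names, and the element names passed to hasElements are hypothetical and only for illustration; zipOf exists just to build sample input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ZipUploadCheck {

    // Local file header signature ("PK\3\4") that starts a non-empty ZIP archive.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4b, 0x03, 0x04};

    /** Cheap pre-check: do the first four bytes match the ZIP signature? */
    static boolean hasZipMagic(byte[] data) {
        if (data == null || data.length < ZIP_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < ZIP_MAGIC.length; i++) {
            if (data[i] != ZIP_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Real check: unzip in memory and return the first .xml entry's bytes, or null if none. */
    static byte[] findXmlEntry(byte[] data) throws IOException {
        if (!hasZipMagic(data)) {
            return null;
        }
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                String name = entry.getName().toLowerCase(Locale.ROOT);
                if (!entry.isDirectory() && name.endsWith(".xml")) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zin.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }

    /** Returns true only if every named element occurs at least once in the XML. */
    static boolean hasElements(byte[] xml, String... names) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Uploaded XML is untrusted, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        for (String name : names) {
            if (doc.getElementsByTagName(name).getLength() == 0) {
                return false;
            }
        }
        return true;
    }

    /** Helper for trying the sketch: builds a one-entry ZIP in memory. */
    static byte[] zipOf(String entryName, String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry(entryName));
            zout.write(content.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```

If findXmlEntry returns non-null and hasElements passes, the same byte array can then be written to its final location. A corrupt archive surfaces as an IOException while reading the entries, which is the "attempt to unzip it" check.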
If you want to check whether a file is a ZIP file, perhaps you can use getContentType() method of the URLConnection class? Something like this:
URL u = new URL(fileUrl);
URLConnection uc = u.openConnection();
String type = uc.getContentType();
But it would be much faster to detect the magic bytes which, for the ZIP format, start with 50 4B ("PK").
This is my first hands-on project with Java Spring Boot, as I have mostly used C#. I have a requirement to read a file from a blob URL path and append some string data (like a key) to that file in the stream before my API downloads it.
Here are the ways that I have tried to do it:
FileOutputStream/InputStream: This throws a FileNotFoundException as it is not able to resolve the blob path.
URLConnection: This got me somewhere and I was able to download the file successfully but when I tried to write/append some value to the file before I download, I failed.
Here is the code I have been using:
//EXTERNAL_FILE_PATH is the azure storage path ending with for e.g. *.txt
URL urlPath = new URL(EXTERNAL_FILE_PATH);
URLConnection connection = urlPath.openConnection();
connection.setDoOutput(true); //I am doing this as I need to append some data and the docs mention to set this flag to true.
OutputStreamWriter out = new OutputStreamWriter(connection.getOutputStream());
out.write("I have added this");
out.close();
//This is where the issue occurs: the error says it cannot read data because output is set to true, so only writing is allowed and no read operation. So, I get a 405, Method Not Allowed...
inputStream = connection.getInputStream();
I am not sure if the framework allows me to modify a file at the URL path, read it at the same time, and download it.
Please help me understand if there is a better way to do this.
From a logical point of view, you are not appending data to the file at the URL. You need to create a new file, write some data, and after that append the content of the file from the URL. The algorithm could look like this:
Create new File on the disk, maybe in TMP folder.
Write some data to the file.
Download file from the URL and append it to file on the disk.
Some good articles from which you can start:
Download a File From an URL in Java
How to download and save a file from Internet using Java?
How to append text to an existing file in Java
How to write data with FileOutputStream without losing old data?
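The three steps above can be sketched roughly as follows. The class and method names (PrependAndDownload, writeWithPrefix) are made up for illustration, and the method takes a plain InputStream so the source can be anything; with a real URL it would typically come from something like new URL(EXTERNAL_FILE_PATH).openStream().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PrependAndDownload {

    /**
     * Creates a temp file, writes the extra data first, then copies the
     * remote content after it. Returns the path of the combined file.
     */
    static Path writeWithPrefix(String extraData, InputStream remote) throws IOException {
        // Step 1: new file on disk, in the default temp folder.
        Path tmp = Files.createTempFile("download-", ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            // Step 2: write our own data.
            out.write(extraData.getBytes(StandardCharsets.UTF_8));
            // Step 3: append the downloaded content.
            byte[] buf = new byte[8192];
            int n;
            while ((n = remote.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return tmp;
    }
}
```

If the extra data should come after the downloaded content instead, swap steps 2 and 3. The resulting file is what the API would then serve to the client.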
I have uploaded a MultipartFile which is a video, and am trying to capture a frame of it using some code. However, when it tries to open the file, it always gives the error java.io.IOException: File Not Found.
First I extract the multipart file to a normal file like so:
File convertedFile = new File(multipartFile.getOriginalFilename());
multipartFile.transferTo(convertedFile);
Then I set the file name of the video to the code to capture the frame like this:
"file://" + convertedFile.getName()
then it fails when it tries to create a URL out of the file name:
new URL(_videoFilename);
How can I make it find the file?
I haven't worked with MultipartFile but you can find this in the Javadoc
The file contents are either stored in memory or temporarily on disk.
In either case, the user is responsible for copying file contents to a
session-level or persistent store as and if desired. The temporary
storages will be cleared at the end of request processing.
Therefore, you certainly have to use transferTo(File dest) to be able to handle your file in a known location.
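One thing that likely contributes to the error in the question: "file://" + convertedFile.getName() builds a URL from a bare relative name, so the file name ends up where the host would go and the directory is lost. A small sketch, assuming the file has already been transferred to a known location, is to let File build the URL from its absolute path. FileUrlSketch and toFileUrl are hypothetical names.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

class FileUrlSketch {

    /** Builds a file: URL from the file's absolute path instead of concatenating strings. */
    static URL toFileUrl(File file) throws MalformedURLException {
        return file.getAbsoluteFile().toURI().toURL();
    }
}
```

Passing an absolute File to transferTo as well avoids any doubt about which directory a relative name is resolved against.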
I'm getting from the client an inputStream and file Metadata, and saving it in my SQL table. This table also holds full file path and some unique uid.
I want to be able to pass a uid and get a "handler" to the file, but can't seem to understand if I need to return outputStream, InputStream or File?
Which one should be returned?
I want this handler for the client for the following reasons:
The user will pass it to another function
The user will decide to convert stream to a file and copy it to some local path
Also, when returning an OutputStream, is it enough to do the following:
OutputStream out = new FileOutputStream(PATH_TO_MY_FILE);
return out;
Am I returning an empty stream? Does out contain all file data?
I thought maybe the best way will be to return file:
File f = new File(PATH_TO_MY_FILE);
return f;
Editing:
My metadata holds the file name and file type. When I get the InputStream I save it in my folder and set the path in the SQL table to be: folderPath+"/"+filename + "."+ fileType
When the user runs the function get(fileUid), I want to retrieve the full path (by using an SQL query) and return the file (handler).
Can you please advise?
Thanks
The user will decide to convert stream to a file and copy it to some local path
This tells us that what you need to give them is an InputStream (or Reader), since they'll be reading from it.
Your code will be reading from your database or whatever, presumably via the InputStream you get back from ResultSet#getBinaryStream or similar. You might give that directly to the caller, or you may prefer to have your code in the middle, perhaps working through a memory buffer.
Re your comment below:
I'm saving the file at some DB folder...
Databases don't have folders; file systems have folders. It sounds like the file isn't stored in your database table, just the path to it. If so, use FileInputStream with the path to get an InputStream for it, which you can return to the caller.
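A minimal sketch of that last suggestion. The FileStore class is hypothetical, and the in-memory map only stands in for the SQL query that looks up the stored path by uid.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class FileStore {

    // Stand-in for the SQL table that maps a uid to the stored file path.
    private final Map<String, Path> pathsByUid = new HashMap<>();

    void register(String uid, Path path) {
        pathsByUid.put(uid, path);
    }

    /** Returns a stream the caller can read the file from; the caller must close it. */
    InputStream open(String uid) throws FileNotFoundException {
        Path path = pathsByUid.get(uid);
        if (path == null) {
            throw new FileNotFoundException("No file registered for uid " + uid);
        }
        return new FileInputStream(path.toFile());
    }
}
```

On the OutputStream part of the question: new FileOutputStream(path) without the append flag opens the file for writing and truncates it, so it holds none of the existing data and would wipe the stored file.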
I have a property file named sysconfig.properties. I want to read it multiple times because it is mutable. But I found that after I change the content of sysconfig.properties and read it again, the content is unchanged: it is the same as the first time I read it. The content of the sysconfig.properties file is as follows:
isInitSuccess=TRUE
isStartValid=2013
Sometimes it may be changed as follows:
isInitSuccess=FALSE
isStartValid=2013
The code that reads the properties file is as follows:
InputStream inStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(filePath);
I use this code to read the file multiple times, but every time "isInitSuccess" is "TRUE", even though I changed it to isInitSuccess=FALSE. Does the system read it only once, so that when I read the file again it just gets the input stream from memory?
But when I use the code below, it will work fine:
InputStream inStream = new FileInputStream(new File(strPath));
I googled but did not find any help. The problem confused me a lot; any help would be appreciated.
You need to read up on what the classpath is.
In short, Java has a concept of classpath which includes all the resources (.class files, .properties files, and anything really) it needs to run. When you use ClassLoader#getResourceAsStream(String), you're actually getting the InputStream of a classpath resource. This resource can be a physical resource on disk or it can be in an archive.
When you use a FileInputStream, you are getting the InputStream of a file on disk.
The InputStream from the ClassLoader and the one from the FileInputStream do not correspond to the same file.
You should read up on how your IDE (or whatever build system) handles your files.
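To illustrate the file-on-disk side, here is a small sketch that re-reads a properties file from an explicit path on every call, so edits to that file are picked up. ReloadableConfig and load are hypothetical names, and the path is assumed to point at the file you actually edit, not at the copy your build places on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class ReloadableConfig {

    /** Opens the file fresh on each call and loads its current content. */
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }
}
```

Calling load again after the file changes returns the new values, which matches the behaviour the question saw with FileInputStream.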
I'm trying to generate a PDF document from an uploaded ".docx" file using JODConverter.
The call to the method that generates the PDF is something like this :
File inputFile = new File("document.doc");
File outputFile = new File("document.pdf");
// connect to an OpenOffice.org instance running on port 8100
OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
// convert
DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
converter.convert(inputFile, outputFile);
// close the connection
connection.disconnect();
I'm using Apache Commons FileUpload to handle uploading the docx file, from which I can get an InputStream object. I'm aware that java.io.File is just an abstract reference to a file in the system.
I want to avoid the disk write (saving the InputStream to disk) and the disk read (reading the saved file in JODConverter).
Is there any way I can get a File object referring to an input stream? Any other way to avoid disk IO will also do!
EDIT: I don't care if this will end up using a lot of system memory. The application is going to be hosted on a LAN with very little to zero number of parallel users.
File-based conversions are faster than stream-based ones (provided by StreamOpenOfficeDocumentConverter) but they require the OpenOffice.org service to be running locally and have the correct permissions to the files.
To avoid writing to disk, try the stream-based method from the docs:
convert(java.io.InputStream inputStream, DocumentFormat inputFormat, java.io.OutputStream outputStream, DocumentFormat outputFormat)
There is no way to do it and make the code solid. For one, the .convert() method only takes two Files as arguments.
So, this would mean you'd have to extend File, which is possible in theory, but very fragile, as you are required to delve into the library code, which can change at any time and make your extended class non functional.
(well, there is a way to avoid disk writes if you use a RAM-backed filesystem and read/write from that filesystem, of course)
Chances are that commons fileupload has written the upload to the filesystem anyhow.
Check if your FileItem is an instance of DiskFileItem. If so, the write implementation of DiskFileItem will try to move the file to the File object you pass. You are then not causing any extra disk IO, since the write has already happened.