I'm writing a web application and want the user to be able to click a link and get a file download.
The interface I have to call is in a third-party library that I can't alter:
writeFancyData(File file, Data data);
Is there an easy way to create a File object that I can pass to this method, such that writing to it will stream to the HTTP response instead?
Notes:
Obviously I could just write a temporary file, read it back in, and then write it to the output stream of the HTTP response. However, what I'm looking for is a way to avoid the file system I/O, ideally by creating a fake file that, when written to, will instead write to the output stream of the HTTP response.
e.g.
writeFancyData(new OutputStreamBackedFile(response.getOutputStream()), data);
I need to use the writeFancyData method as it writes a file in a very specific format that I can't reproduce.
Assuming writeFancyData is a black box, it's not possible. As a thought experiment, consider an implementation of writeFancyData that did something like this:
public void writeFancyData(File file, Data data) {
    File localFile = new File(file.getPath());
    ...
    // process data from file
    ...
}
Given that the only thing you can return from any extended version of File is the path name, you're just not going to be able to get the data you want into that method. If the signature included some sort of stream you would be in a much better position, but since all you can pass in is a File, this can't be done.
In practice the implementation probably uses one of the FileInputStream or FileReader classes, which use the File object really just for the name and then call out to native methods to get a file descriptor and handle the actual I/O.
As dlawrence writes, given only the API it is impossible to determine what it is doing with the File.
A non-Java approach is to create a named pipe. You could establish a reader for the pipe in your program, create a File on that path and pass it to the API.
Before doing anything so fancy, though, I would recommend analyzing performance and verifying that disk I/O is indeed a bottleneck.
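If you do go down the named-pipe route, here is a rough sketch (POSIX only; assumes the servlet response and the writeFancyData method from the question, and that mkfifo is on the PATH):

void streamViaFifo(HttpServletResponse response, Data data) throws IOException, InterruptedException {
    // create a FIFO; the library will "write a file" while we read the bytes as they arrive
    Path pipe = Path.of(System.getProperty("java.io.tmpdir"), "fancy-" + UUID.randomUUID());
    new ProcessBuilder("mkfifo", pipe.toString()).start().waitFor();
    Thread pump = new Thread(() -> {
        try (InputStream in = Files.newInputStream(pipe)) {
            in.transferTo(response.getOutputStream()); // unblocks once the library opens the FIFO for writing
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
    pump.start();
    writeFancyData(pipe.toFile(), data); // the library just sees an ordinary path
    pump.join();
    Files.delete(pipe);
}

Nothing ever hits the disk; the bytes flow through the kernel's pipe buffer straight to the response.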
Given that API, the best you can do is to give it the File for a file in a RAM disk filesystem.
And lodge a bug / defect report against the API asking for an overload that takes a Writer or OutputStream argument.
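A rough sketch of the RAM-disk variant on Linux (assuming tmpfs is mounted at /dev/shm, and reusing response and data from the question):

File ramFile = new File("/dev/shm", "fancy-" + UUID.randomUUID());
try {
    writeFancyData(ramFile, data); // a real file as far as the API is concerned, but backed by RAM
    try (InputStream in = new FileInputStream(ramFile)) {
        in.transferTo(response.getOutputStream());
    }
} finally {
    ramFile.delete(); // nothing is ever written to a physical disk
}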
I have code like
ParquetWriter<Record> writer = getParquetWriter("s3a://my_bucket/my_object_path.snappy.parquet");
for (Record r : someIterable) {
    validate(r);
    writer.write(r);
}
writer.close();
If validate throws an exception, I want to release all resources associated with the writer, but I don't want to create any objects in S3 in that case. Is this achievable?
If I close the writer, it will complete the S3 multipart upload and create an object in the cloud. If I don't close it, the parts written so far will remain in the disk buffer, clogging up the works.
Yes, it is a problem. It's been discussed in HADOOP-16906, "Add some Abortable.abort() interface for streams etc which can be terminated".
The problem here is that it's not enough to add this to the S3ABlockOutputStream class; we'd need to pass it through FSDataOutputStream etc., specify it in the FS APIs, define the semantics when the passthrough doesn't work, commit to maintaining it, and so on. A lot of effort. If you do want to do that though, patches welcome...
Keep an eye on HDFS-13934, the multipart upload API. This will let you do the upload and then commit or abort it, but it doesn't quite fit your workflow.
I'm afraid you will have to go with the upload. Do remember to set a lifecycle rule on the bucket to delete old uploads, and look at the hadoop s3guard uploads command to list and abort them too.
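If your validation doesn't depend on the writer, one workaround sketch under these constraints is to validate everything before the writer (and hence the multipart upload) is ever created. This assumes the records fit in memory and reuses the hypothetical getParquetWriter helper from the question:

List<Record> records = new ArrayList<>();
for (Record r : someIterable) {
    validate(r); // any failure happens before a single byte goes to S3
    records.add(r);
}
try (ParquetWriter<Record> writer =
        getParquetWriter("s3a://my_bucket/my_object_path.snappy.parquet")) {
    for (Record r : records) {
        writer.write(r);
    }
}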
If I had a directory filled with different object files, is there a way I could read them into my application without opening a new stream every time? I am currently using ObjectInputStream, but I don't mind using another form of I/O.
For example, if I stored my users directly on my hard drive as objects (each having its own file: name.user), is there a way I could load them all back in using the same stream? Or would it be impossible, seeing how a new File object would be needed for each individual file? Is there a way around this?
Each file will need its own stream behind the scenes; there's no way round that. But that doesn't stop you creating your own InputStream that manages this for you, and then allows you to read everything off from one stream.
The idea would be that when you try to read from your CompoundObjectInputStream or whatever, it looks to see whether there are any more files it hasn't yet processed; if so, it opens the next one with another stream internally and passes the data through. When there are no more files in that directory, the CompoundObjectInputStream indicates end-of-stream.
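A minimal sketch of that idea (class and method names are made up; note that one ObjectInputStream is opened per file internally, because each serialized file starts with its own stream header):

import java.io.*;
import java.util.*;

public class CompoundObjectReader implements Closeable {
    private final Iterator<File> files;
    private ObjectInputStream current;

    public CompoundObjectReader(File dir) {
        File[] found = dir.listFiles((d, name) -> name.endsWith(".user"));
        this.files = Arrays.asList(found != null ? found : new File[0]).iterator();
    }

    // Returns the next object, or null once every file has been read.
    public Object readNextObject() throws IOException, ClassNotFoundException {
        while (true) {
            if (current == null) {
                if (!files.hasNext()) {
                    return null; // end of the compound "stream"
                }
                current = new ObjectInputStream(new FileInputStream(files.next()));
            }
            try {
                return current.readObject();
            } catch (EOFException endOfFile) {
                current.close(); // this file is exhausted, move on to the next one
                current = null;
            }
        }
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
        }
    }
}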
No, there is not. Each physical file requires its own FileInputStream, FileChannel, or other corresponding native accessor.
Note that File has no direct link to a physical file; it is just an abstract path name.
I have some text in a String that I can't write to disk. I need to pass it to a method which only accepts File as a type. What would be the chain of conversions through which I could do this? I imagine I start with ByteArrayInputStream, but where next?
The problem is that java.io.File does not provide any method to read from it; it's just a reference to a file in a file system, and that file may not even exist.
The method you are calling may simply get the full path of the file using file.getAbsolutePath() and use that to open an InputStream.
If there's a method that receives an InputStream then you could send your ByteArrayInputStream.
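For illustration, if such an overload existed (hypothetical method name), the conversion is a one-liner:

InputStream in = new ByteArrayInputStream(myString.getBytes(StandardCharsets.UTF_8));
someMethod(in); // hypothetical InputStream-accepting overload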
If the method could accept a URL, you could even open up a little HTTP server and serve the data... but with a File it's difficult.
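For completeness, a sketch of that HTTP idea using the JDK's built-in com.sun.net.httpserver (myString and the URL-accepting method are assumptions; run this inside a method that throws IOException):

HttpServer server = HttpServer.create(new InetSocketAddress(0), 0); // any free port
server.createContext("/data", exchange -> {
    byte[] body = myString.getBytes(StandardCharsets.UTF_8);
    exchange.sendResponseHeaders(200, body.length);
    try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
    }
});
server.start();
URL url = new URL("http://localhost:" + server.getAddress().getPort() + "/data");
methodThatTakesUrl(url); // hypothetical
server.stop(0);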
I have an application which needs to read a file which is the serialized result of an ArrayList (ArrayList<String>, 50,000 records, size: 20 MB).
I don't know exactly how to read the data into the Hadoop platform. I only have some sense that I need to override InputFormat and OutputFormat.
I'm a beginner on the Hadoop platform. Could you give me some advice?
Thanks,
Zheng.
To start with you'll need to extend FileInputFormat, notably implementing the abstract FileInputFormat.createRecordReader method.
You can look through the source of something like the LineRecordReader (which is what TextInputFormat uses to process text files).
From there you're pretty much on your own (i.e. it depends on how your ArrayList has been serialized). Look through the source for the LineRecordReader and try and relate that to how your ArrayList has been serialized.
Some other points of note: is your file format splittable? That is, can you seek to an offset in the file and recover the stream from there? (Text files can, as they just scan forward to the end of the current line and then start from there.) If your file format uses compression, you also need to take this into account (you cannot, for example, randomly seek to a position in a gzip file). By default FileInputFormat.isSplitable will return true, which you may want to initially override to be false. If you do stick with 'unsplittable' then note that your file will be processed by a single mapper, no matter its size.
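To make that concrete, here is a rough sketch against the newer mapreduce API, assuming the file is exactly one serialized ArrayList<String> (the class name and the key/value choice are mine, not anything standard):

import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.List;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// The whole file is deserialized in one go, so splitting is disabled and a
// single RecordReader emits one String per record.
public class SerializedListInputFormat extends FileInputFormat<NullWritable, Text> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // Java serialization can't be resumed mid-file
    }

    @Override
    public RecordReader<NullWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new RecordReader<NullWritable, Text>() {
            private List<String> items;
            private int index = -1;

            @Override
            @SuppressWarnings("unchecked")
            public void initialize(InputSplit split, TaskAttemptContext ctx)
                    throws IOException {
                Path path = ((FileSplit) split).getPath();
                FileSystem fs = path.getFileSystem(ctx.getConfiguration());
                try (ObjectInputStream in = new ObjectInputStream(fs.open(path))) {
                    items = (List<String>) in.readObject();
                } catch (ClassNotFoundException e) {
                    throw new IOException(e);
                }
            }

            @Override
            public boolean nextKeyValue() {
                return ++index < items.size();
            }

            @Override
            public NullWritable getCurrentKey() {
                return NullWritable.get();
            }

            @Override
            public Text getCurrentValue() {
                return new Text(items.get(index));
            }

            @Override
            public float getProgress() {
                return items == null ? 0f : (float) index / items.size();
            }

            @Override
            public void close() {
            }
        };
    }
}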
Before processing data on Hadoop you should upload the data to HDFS or another supported file system, of course, if it wasn't uploaded there by something else already. If you are controlling the uploading process, you can convert the data at the upload stage into something you can easily process, such as:
a simple text file (one line per array item)
a SequenceFile, if the array items can contain '\n' characters
This is the simplest solution since you don't have to interfere with Hadoop's internals; a sketch of the conversion follows.
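A sketch of that conversion step (the file names are placeholders; assumes the file holds exactly one serialized ArrayList<String>, run inside a method that throws IOException and ClassNotFoundException):

try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("records.ser"));
     PrintWriter out = new PrintWriter(new FileWriter("records.txt"))) {
    @SuppressWarnings("unchecked")
    List<String> items = (List<String>) in.readObject();
    for (String item : items) {
        out.println(item); // one line per array item
    }
}
// then upload the result, e.g.: hadoop fs -put records.txt /input/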
In my program I want the user to be able to take some images from a directory and save them under a single file that can be transferred to another computer and actually read and displayed (using the same program).
How would I go about doing this, especially if I want to save other data along with it, perhaps objects and such? I know you can use the ObjectOutputStream class, but I'm not sure how to integrate it with images.
So overall, I want the program to be able to read/write data, objects, and images to/from a single file.
Thanks in Advance.
[EDIT - From Responses + Comment regarding Zip Files]
A zip might be able to get the job done.
But I want it to be readable only by the program. (You think making it a zip and changing the file extension would work, then when reading it just changing it back and reading it as a zip?) I don't want users to be able to see the contents directly.
I'll elaborate a bit more: it's a game, and users can create their own content using XML files, images and such. But when a user creates something, I don't want other users to be able to see exactly how they created it, or what they used, only the end result.
You can programmatically create a zip file and read a zip file from Java; there is no need to expose it as a regular .zip file.
See the java.util.zip package for more information and for code samples on how to read and write zips using Java.
Now if you want to prevent users from unzipping this file, but you don't want to complicate your life by encrypting the content or creating a complex format, you can emulate a simple internet message format, similar to the one used by e-mails to attach files.
You can read more about the Internet Message Format in RFC 5322.
This would be a custom file format only used by your application so you can do it as simple as you want. You just have to define your format.
It could be:
A header with the names (and number) of the files in the bundle.
Followed by a list of separators (for instance limits.a.txt=yadayada, some identifier so you know you have finished with that content).
The actual content.
So, you create the bundle with something like the following:
public void createBundle(List<File> yourFiles, File bundle) throws IOException {
    try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(bundle))) {
        writeHeader(out, yourFiles);  // hypothetical: writes the filenames entry
        writeLimits(out, yourFiles);  // hypothetical: writes one separator per file
        for (File f : yourFiles) {
            writeFileTo(f, out);      // hypothetical: appends the file's bytes
        }
    }
}
Sort of...
And the result would be a zipped file with something like:
filenames =a.jpg, b.xml, c.ser, d.properties, e.txt
limits.a.jpg =poiurqpoiurqpoeiruqeoiruqproi
limits.b.xml =faklsdjfñaljsdfñalksjdfa
limits.c.ser =sdf09asdf0as9dfasd09fasdfasdflkajsdfñlk
limits.d.properties =adfa0sd98fasdf90asdfaposdifasdfklasdfkñm
limits.e.txt =asdf9asdfaoisdfapsdfñlj
attachments=
<include binary data from a.jpg here>
--poiurqpoiurqpoeiruqeoiruqproi
<include binary data from b.xml here>
--faklsdjfñaljsdfñalksjdfa
etc
Since it's your file format, you can keep it as simple as possible or complicate your life ad infinitum.
If you manage to include a MIME library in your app, that could save you a lot of time.
Finally, if you want to add extra security, you have to encrypt the file, which is not that hard after all; the problem is that if you ship the encryption code too, your users could get curious and decompile it to find out how it works. But a good encryption mechanism would prevent this.
So, depending on your needs, you can go from a simple zip, to a zip with a custom format, to a zip with a complicated custom format, to a zip with a custom complicated encrypted format.
Since that's too broad, you may ask about specific parts here: https://stackoverflow.com/questions/ask
In your case I would use a ZIP library to package all the images in a ZIP file. For the metadata you want to save along with them, use XML files. XML and ZIP are quite de-facto standards today, simple to handle and flexible if you want to add new files or metadata later. There are also serialization tools that can serialize your objects into XML. (I don't know them exactly in Java, but I'm sure there are some.)
Yep, just pack/unpack them with java.util.zip.*, which is pretty straightforward. Every Windows version since XP has built-in zip support, so you're good to go. There are many good (and faster) free zip libraries for Java/C#, too.
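A minimal sketch with java.util.zip (the "images" directory and "bundle.pack" name are placeholders):

try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream("bundle.pack"))) {
    for (File img : new File("images").listFiles()) {
        zip.putNextEntry(new ZipEntry(img.getName()));
        try (FileInputStream in = new FileInputStream(img)) {
            in.transferTo(zip); // copy the image bytes into the zip entry
        }
        zip.closeEntry();
    }
}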
I know you can use the ObjectOutputStream class, but I'm not sure how to integrate it with images.
Images are binary data, so reading the image into a byte[] and writing the byte[] to the ObjectOutputStream should work. It is, however, memory hogging, since every byte of the image eats at least one byte of the JVM's memory. You'll need to take this into account.
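A short sketch of that approach (the paths and the object are placeholders):

byte[] imageBytes = Files.readAllBytes(Paths.get("images/a.jpg"));
try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("data.bundle"))) {
    out.writeObject(someSerializableObject); // your other data
    out.writeObject(imageBytes);             // the image travels as a plain byte[]
}
// reading back: byte[] b = (byte[]) objectInputStream.readObject();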