Is it possible to set custom metadata on files using Java?

Is it possible to get and set custom metadata on File instances? I want to use the files that I process through my system as some kind of a very simple database, where every file should contain additional custom metadata, such as the email of the sender, some timestamps, etc.
It is for an internal system, so security is not an issue.

In Java 7 and later you can do this using the Path class and UserDefinedFileAttributeView.
Here is the example taken from there:
A file's MIME type can be stored as a user-defined attribute by using this code snippet:
Path file = ...;
UserDefinedFileAttributeView view = Files
    .getFileAttributeView(file, UserDefinedFileAttributeView.class);
view.write("user.mimetype",
    Charset.defaultCharset().encode("text/html"));
To read the MIME type attribute, you would use this code snippet:
Path file = ...;
UserDefinedFileAttributeView view = Files
    .getFileAttributeView(file, UserDefinedFileAttributeView.class);
String name = "user.mimetype";
ByteBuffer buf = ByteBuffer.allocate(view.size(name));
view.read(name, buf);
buf.flip();
String value = Charset.defaultCharset().decode(buf).toString();

You should always check whether the filesystem supports UserDefinedFileAttributeView for the specific file whose attributes you want to set.
You can do so by invoking:
Files.getFileStore(Paths.get(path_to_file)).supportsFileAttributeView(UserDefinedFileAttributeView.class);
In my experience, UserDefinedFileAttributeView is not supported on FAT* and HFS+ (macOS) filesystems.
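The write, read, and support check above can be combined into one small helper. This is only a sketch: the class name FileMetadata and its methods are hypothetical, UTF-8 is assumed as the encoding, and a filesystem without support is reported as "nothing stored" rather than as an error.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;
import java.util.Optional;

// Hypothetical helper; names are illustrative and not part of any library.
class FileMetadata {

    // Returns the attribute view, or null if the file store does not support it.
    private static UserDefinedFileAttributeView viewOf(Path file) {
        try {
            boolean supported = Files.getFileStore(file)
                    .supportsFileAttributeView(UserDefinedFileAttributeView.class);
            return supported
                    ? Files.getFileAttributeView(file, UserDefinedFileAttributeView.class)
                    : null;
        } catch (IOException e) {
            return null;
        }
    }

    // Stores value under name; returns false if the attribute could not be written.
    static boolean write(Path file, String name, String value) {
        UserDefinedFileAttributeView view = viewOf(file);
        if (view == null) {
            return false;
        }
        try {
            view.write(name, StandardCharsets.UTF_8.encode(value));
            return true;
        } catch (IOException e) {
            return false; // some filesystems report support but still reject the write
        }
    }

    // Reads the attribute back, or returns empty if it is absent or unsupported.
    static Optional<String> read(Path file, String name) {
        UserDefinedFileAttributeView view = viewOf(file);
        if (view == null) {
            return Optional.empty();
        }
        try {
            if (!view.list().contains(name)) {
                return Optional.empty();
            }
            ByteBuffer buf = ByteBuffer.allocate(view.size(name));
            view.read(name, buf);
            buf.flip();
            return Optional.of(StandardCharsets.UTF_8.decode(buf).toString());
        } catch (IOException e) {
            return Optional.empty();
        }
    }
}
```

A caller would use it as FileMetadata.write(path, "sender", "alice@example.com") followed by FileMetadata.read(path, "sender"), treating an empty result as "no metadata available".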

Related

Let Tika suggest a file-extension [duplicate]

I am uploading files to an Amazon s3 bucket and have access to the InputStream and a String containing the MIME Type of the file but not the original file name. It's up to me to actually create the file name and extension before pushing the file up to S3. Is there a library or convenient way to determine the appropriate extension to use from the MIME Type?
I've seen some references to the Apache Tika library but that seems like overkill and I haven't been able to get it to successfully detect file extensions yet. From what I've been able to gather it seems like this code should work, but I'm just getting an empty string when my type variable is "image/jpeg"
MimeType mimeType = null;
try {
    mimeType = new MimeTypes().forName(type);
} catch (MimeTypeException e) {
    Logger.error("Couldn't Detect Mime Type for type: " + type, e);
}
if (mimeType != null) {
    String extension = mimeType.getExtension();
    //do something with the extension
}
As some of the commenters have pointed out, there is no universal 1:1 mapping between mimetypes and file extensions. Some mimetypes have more than one possible extension, many extensions are shared by multiple mimetypes, and some mimetypes have no extension.
Wherever possible, you're much better off storing the mimetype and using that going forward, and forgetting about the extension.
That said, if you do want to get the most common file extension for a given mimetype, then Tika is a good way to go. Apache Tika has a very large set of mimetypes it knows about, and for many of these it also knows mime magic for detection, common extensions, descriptions etc.
If you want to get the most common extension for a JPEG file, then as shown in this Apache Tika unit test you just need to do something like:
MimeTypes allTypes = MimeTypes.getDefaultMimeTypes();
MimeType jpeg = allTypes.forName("image/jpeg");
String jpegExt = jpeg.getExtension(); // .jpg
assertEquals(".jpg", jpeg.getExtension());
The key thing is that you need to load the XML file that's bundled in the Tika jar to get the definitions of all the mimetypes. If you might be dealing with custom mimetypes too, Tika supports those as well; replace the first line with:
TikaConfig config = TikaConfig.getDefaultConfig();
MimeTypes allTypes = config.getMimeRepository();
By using the TikaConfig method to get the MimeTypes, Tika will also check your classpath for custom mimetype definitions, and include those too.
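To make the "no 1:1 mapping" point concrete without pulling in Tika, here is a minimal sketch of a hand-maintained lookup. The class PreferredExtensions and its three-entry table are hypothetical and purely illustrative; they are not Tika's data, and a real table would need many more types.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical hand-maintained table; NOT Tika's data, and far from complete.
class PreferredExtensions {
    private static final Map<String, List<String>> EXTENSIONS = new HashMap<>();
    static {
        // One mimetype can have several extensions; the first is treated as preferred.
        EXTENSIONS.put("image/jpeg", Arrays.asList(".jpg", ".jpeg", ".jpe"));
        EXTENSIONS.put("text/html", Arrays.asList(".html", ".htm"));
        // Some mimetypes have no sensible extension at all.
        EXTENSIONS.put("application/octet-stream", Collections.<String>emptyList());
    }

    // Returns the preferred extension, or empty if the type is unknown or has none.
    static Optional<String> preferred(String mimeType) {
        // Drop parameters such as "; charset=utf-8" before the lookup.
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        List<String> known = EXTENSIONS.get(base);
        if (known == null || known.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(known.get(0));
    }
}
```

Returning an Optional forces the caller to decide what to do when no extension applies, for example storing the object without one and keeping the mimetype alongside it.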

Serialization and file size

When I serialize a File object (for a file whose size on the hard drive is 3,404,851 bytes) using org.apache.commons.lang3.SerializationUtils like this:
File fileObject = new File(path);
byte[] fileBuffer = SerializationUtils.serialize(fileObject);
The fileBuffer.length returns 91! Shouldn't it be rather 3,404,851 ?
You're serializing the File object, not the file itself.
The File object contains just a few fields describing the file and its location, not the entire file contents.
The documentation of the serialized form (https://docs.oracle.com/javase/8/docs/api/serialized-form.html#java.io.File) shows that serialization saves only the path of the file (together with the original separator character, so the path can be converted on deserialization if needed), not its contents.
In the Oracle documentation, the "See Also" section of a class's Javadoc links to the description of its serialized form.
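The difference is easy to see side by side. The sketch below uses the JDK's ObjectOutputStream in place of SerializationUtils so that it has no dependencies; the class name FileBytes is hypothetical. To get the actual bytes of the file, read its contents instead of serializing the File object.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical helper contrasting "serialize the File object" with "read the file".
class FileBytes {

    // Serializes the File object itself: essentially just its path, so the result is small.
    static byte[] serializedForm(File file) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(file);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file's contents: the array length matches the size on disk.
    static byte[] contents(File file) {
        try {
            return Files.readAllBytes(file.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Files.readAllBytes loads everything into memory, which is fine for a file of a few megabytes; for much larger files a stream would be the safer choice.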

Loading multiple properties sets from single file for multiple class instances

I have a class of which I need a different instance if one of its attributes changes. These changes are read at runtime from a property file.
I would like to have a single file detailing the properties of all the single instances:
------------
name=Milan
surface=....
------------
name=Naples
surface=....
How can I load each set of properties into a different Properties object (maybe creating a Properties[])? Is there a built-in Java method to do so?
If I have to parse it manually, how could I create an InputStream each time I find the divider String between the sets?
ArrayList<Properties> properties = new ArrayList<>();
if( whateverItIs.nextLine() == "----" ){
InputStream limitedInputStream = next-5-lines ;
properties.add(new Properties().load(limitedInputStream));
}
Something like above. And, by the way, any constructor method which directly creates the class from a file?
EDIT: any pointing in the right direction to look it for myself would be fine too.
First of all, read the whole file as a single string. Then use split and StringReader.
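String propertiesFile = FileUtils.readFileToString(file, "utf-8");
// Split on whole lines made of four or more dashes
String[] propertyDivs = propertiesFile.split("(?m)^-{4,}[ \\t]*$");
ArrayList<Properties> properties = new ArrayList<Properties>();
for (String propertyDiv : propertyDivs) {
    if (propertyDiv.trim().isEmpty()) continue; // skip the empty chunk before the first divider
    Properties p = new Properties();
    p.load(new StringReader(propertyDiv)); // load() returns void, so it cannot be chained into add()
    properties.add(p);
}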
The example above uses the Apache Commons IO library for the file-to-String one-liner, because Java had no such built-in method at the time (Java 11 added Files.readString). Reading the file is also easy to implement with the standard library; see "Whole text file to a String in Java".
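For reference, the same approach can be written with the standard library only. This is a sketch under a few assumptions: the class name PropertySections is hypothetical, the file is UTF-8, and a divider is any line consisting solely of four or more dashes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper; splits one text file into several Properties objects.
class PropertySections {

    // Parses text whose sections are separated by lines of four or more dashes.
    static List<Properties> parse(String text) {
        List<Properties> sections = new ArrayList<>();
        for (String chunk : text.split("(?m)^-{4,}[ \\t]*$")) {
            if (chunk.trim().isEmpty()) {
                continue; // nothing between two dividers, or before the first one
            }
            Properties p = new Properties();
            try {
                p.load(new StringReader(chunk));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            sections.add(p);
        }
        return sections;
    }

    // Reads the whole file as UTF-8 and parses it.
    static List<Properties> load(Path file) {
        try {
            return parse(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each returned Properties object can then be handed to the constructor or factory of the class being configured, for example by reading getProperty("name").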

Android get file using path (in String format)

My app needs to get an existing file for processing. Now I have the path of the file in String format, how can I get the File with it? Is it correct to do this:
File fileToSave = new File(dirOfTheFile);
Here dirOfTheFile is the path of the file. If I implement it in this way, will I get the existing file or the system will create another file for me?
That's what you want to do. If the file exists, the File object will refer to it; otherwise nothing is created on disk until you create or write the file yourself. You can check whether the file exists by calling fileToSave.exists() and act appropriately if it does not.
The new keyword is creating a File object in code, not necessarily a new file on the device.
I would caution you not to use hardcoded paths for dirOfTheFile. For example, if you're accessing external storage, call Environment.getExternalStorageDirectory() instead of hardcoding /sdcard.
The File object is just a reference to a file (a wrapper around the path of the file); creating a new File object does not actually create or read the file. To do that, use FileInputStream to read, FileOutputStream to write, or the various File helper methods (such as exists() and createNewFile()) to perform operations on the path in question. Note that, as others have pointed out, you should use one of the utilities provided by the system to locate directories on the internal or external storage, depending on where you want your files.
Try this:
File fileToSave = new File(dirOfTheFile);
if (fileToSave.exists()) {
    // the file exists. use it
} else {
    // create file here
}
If the parent folder does not exist, you may have to call fileToSave.getParentFile().mkdirs() to create the parent folders.
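Putting these answers together, a "get it if it exists, otherwise create it" helper could look like the sketch below. The class name FileHandles and the method getOrCreate are hypothetical; only plain java.io calls are used, so nothing here is Android-specific.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper; shows that only createNewFile()/mkdirs() touch the disk.
class FileHandles {

    // Returns a File for the path, creating an empty file (and parent folders) only if missing.
    static File getOrCreate(String path) {
        File file = new File(path); // just wraps the path; nothing is created yet
        if (file.exists()) {
            return file;
        }
        File parent = file.getParentFile();
        if (parent != null && !parent.exists() && !parent.mkdirs()) {
            throw new UncheckedIOException(
                    new IOException("Could not create folder " + parent));
        }
        try {
            file.createNewFile(); // this call is what actually creates the file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }
}
```

Calling getOrCreate twice with the same path returns a File for the same existing file the second time; the existing contents are left untouched.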

Java - when there is no need of writing to file - should I use input or output stream?

I'm getting from the client an inputStream and file Metadata, and saving it in my SQL table. This table also holds full file path and some unique uid.
I want to be able to pass a uid and get a "handler" to the file, but can't seem to understand if I need to return outputStream, InputStream or File?
Which one should be returned?
I want this handler for the client for the following reasons:
The user will pass it to another function
The user will decide to convert stream to a file and copy it to some local path
Also, when returning an OutputStream, is it enough to do the following:
OutputStream out = new FileOutputStream(PATH_TO_MY_FILE);
return out;
Am I returning an empty stream? Does out contain all file data?
I thought maybe the best way will be to return file:
File f = new File(PATH_TO_MY_FILE);
return f;
Edit:
My metadata holds the file name and file type. When I get the InputStream, I save it in my folder and set the path in the SQL table to be: folderPath+"/"+filename + "."+ fileType
When the user runs the function get(fileUid), I want to retrieve the full path (using an SQL query) and return the file (handler).
Can you please advise?
Thanks
The user will decide to convert stream to a file and copy it to some local path
This tells us that what you need to give them is an InputStream (or Reader), since they'll be reading from it.
Your code will be reading from your database or whatever, presumably via the InputStream you get back from ResultSet#getBinaryStream or similar. You might give that directly to the caller, or you may prefer to have your code in the middle, perhaps working through a memory buffer.
Re your comment below:
I'm saving the file at some DB folder...
Databases don't have folders; file systems have folders. It sounds like the file isn't stored in your database table, just the path to it. If so, use FileInputStream with the path to get an InputStream for it, which you can return to the caller.
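A minimal sketch of that last suggestion follows. The class StoredFiles and its methods are hypothetical, and the SQL lookup that turns a uid into a full path is assumed to happen elsewhere; the sketch starts from the path string.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; the uid-to-path SQL query is out of scope here.
class StoredFiles {

    // Server side: open the stored file for reading and hand the stream to the caller.
    static InputStream open(String fullPath) {
        try {
            return new FileInputStream(fullPath);
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Caller side: copy the stream to a local path, closing it afterwards.
    // Returns the number of bytes copied.
    static long copyTo(InputStream in, Path target) {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning an InputStream leaves the caller free to pass it on to another function or to save it locally. A new FileOutputStream on the same path would instead truncate the stored file, which is why it is not the right thing to return here.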
