Read metadata with ExifTool

I'm trying to read metadata values from Illustrator (.ai) files using ExifTool. I tried the following:
File[] images = new File("filepath").listFiles();
ExifTool tool = new ExifTool(Feature.STAY_OPEN);
for (File f : images) {
    if (f.toString().contains(".ai")) {
        System.out.println("test " + tool.getImageMeta(f, Tag.DATE_TIME_ORIGINAL));
    }
}
tool.close();
The code above does not print any value. I also tried this:
public static final File[] IMAGES = new File("filepath").listFiles();
ExifTool tool = new ExifTool(Feature.STAY_OPEN);
for (File f : IMAGES) {
    System.out.println("\n[" + f.getName() + "]");
    System.out.println(tool.getImageMeta(f, Format.NUMERIC, Tag.values()));
}
This only prints {IMAGE_HEIGHT=2245, IMAGE_WIDTH=5393}. How do I read the other metadata values using ExifTool? Any advice or reference links would be highly appreciated.

With the given API, one of the following applies:
1 - the API does not contain the tag you are looking for
2 - the file itself might not have that tag filled in
3 - you might want to write your own wrapper using a more general tag command when calling exiftool.exe
Look in the source code and find the enum containing all the tags available to the API; that will show you what you are restricted to. You might want to consider writing your own class similar to the one you are using. I am in the middle of doing the same. That way you can store the tags in, say, a Set or HashMap instead of an enum and be much less limited in tag choice. Then all you have to do is write the commands for the tags you want to the process's OutputStream and read the results from its InputStream.
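As a rough illustration of the "write your own class" idea, here is a minimal sketch. It assumes the exiftool executable is on the PATH and that its -s option prints one "TagName : value" line per tag. The names ExifToolCli, parseTags and readTags are made up for this example and are not part of the wrapper library used in the question. For simplicity it starts one process per file and passes the file as a command-line argument, instead of keeping a process open and writing commands to its OutputStream.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any ExifTool wrapper library.
class ExifToolCli {

    // Parses lines of the form "TagName : value" into an ordered map.
    static Map<String, String> parseTags(Reader output) throws IOException {
        Map<String, String> tags = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(output);
        String line;
        while ((line = reader.readLine()) != null) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "name : value" line
            }
            // Split on the first colon only, because values may contain colons.
            tags.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return tags;
    }

    // Runs "exiftool -s <file>" (assumed to be on the PATH) and returns every tag it reports.
    static Map<String, String> readTags(File file) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("exiftool", "-s", file.getAbsolutePath())
                .redirectErrorStream(true)
                .start();
        Map<String, String> tags;
        try (Reader out = new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8)) {
            tags = parseTags(out);
        }
        process.waitFor();
        return tags;
    }
}
```

Because the result is a plain Map keyed by tag name, the caller is no longer limited to the tags listed in an enum; whatever the tool reports for the file ends up in the map.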

Related

Multiple file reading loop and distinguishing between .pdf and .doc files

I am writing a Java program in Eclipse that scans resumes for keywords and picks the most suitable resume among them, in addition to showing the keywords found in each resume. The resumes can be in doc or pdf format.
I've successfully implemented a program that reads pdf files and doc files separately (using Apache's PDFBox and POI jar packages and importing the libraries for the required methods), displays the keywords, and shows resume strength in terms of the number of keywords found.
Now there are two issues I am stuck on:
(1) I need to distinguish between a pdf file and a doc file within the program. That is easily achievable with an if statement, but I am not sure how to write the code that detects whether a file has a .pdf or .doc extension. (I intend to build an application to select the resumes, but then the program has to decide whether to run the doc reading block or the pdf reading block.)
(2) I intend to run the program on a list of resumes, so I need a loop that runs the keyword scanning for each resume. I can't think of a way to do this: even if the files were named 'resume1', 'resume2', etc., we can't put the loop variable into the file location like 'C:/Resumes_Folder/Resume[i]', because that is just the path string.
Any help would be appreciated!

You can use a FileFilter to read only one type or the other, then respond accordingly. File.listFiles(FileFilter) gives you an array containing only files of the desired type.
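A minimal sketch of that approach follows. The class and method names (ResumeFiles, byExtension, list) are hypothetical and only serve the example.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

// Hypothetical helper illustrating File.listFiles(FileFilter).
class ResumeFiles {

    // Accepts regular files whose name ends with the given extension, ignoring case.
    static FileFilter byExtension(String extension) {
        String suffix = extension.toLowerCase(Locale.ROOT);
        return file -> file.isFile()
                && file.getName().toLowerCase(Locale.ROOT).endsWith(suffix);
    }

    // Returns the matching files in the folder, or an empty array if it cannot be listed.
    static File[] list(File folder, String extension) {
        File[] files = folder.listFiles(byExtension(extension));
        return files != null ? files : new File[0];
    }
}
```

Looping over the result of ResumeFiles.list(new File("C:/Resumes_Folder"), ".pdf") then visits every PDF resume in the folder without having to build file names from a counter.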
The second requirement is confusing to me. I think you would be well served by creating a class that encapsulates the data and behavior that you want for a parsed Resume. Write a factory class that takes in an InputStream and produces a Resume with the data you need inside.
You are making a classic mistake: You are embedding all the logic in a main method. This will make it harder to test your code.
All problem solving consists of breaking big problems into smaller ones, solving the small problems, and assembling them to finally solve the big problem.
I would recommend that you decompose this problem into smaller classes. For example, don't worry about looping over a directory's worth of files until you can read and parse an individual PDF and DOC file.
Create an interface:
public interface ResumeParser {
    Resume parse(InputStream is) throws IOException;
}
Write separate implementations for PDF and Word documents.
Create a factory to give you the appropriate ResumeParser based on file type:
public class ResumeParserFactory {
    // Picks a parser from the file name or extension passed in.
    public ResumeParser create(String fileType) {
        if (fileType.contains(".pdf")) {
            return new PdfResumeParser();
        } else if (fileType.contains(".doc")) {
            return new WordResumeParser();
        } else {
            throw new IllegalArgumentException("Unknown document type: " + fileType);
        }
    }
}
Be sure to write unit tests as you go. You should know how to use JUnit.

An alternative to using a FileFilter is a DirectoryStream, because Files.newDirectoryStream makes it easy to specify the relevant file extensions with a glob pattern:
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.{doc,pdf}")) {
    for (Path entry : stream) {
        // process files here
    }
} catch (DirectoryIteratorException ex) {
    // I/O error encountered during the iteration; the cause is an IOException
    throw ex.getCause();
}

You can do something basic like:
// Put the path to the folder containing all the resumes here
File f = new File("C:\\");
List<String> names = Arrays.asList(Objects.requireNonNull(f.list()));
for (String fileName : names) {
    // Compare the whole extension, including the dot, ignoring case
    String lowerName = fileName.toLowerCase();
    if (lowerName.endsWith(".doc")) {
        // doc file logic here
    } else if (lowerName.endsWith(".pdf")) {
        // pdf file logic here
    }
}
But as DuffyMo's answer says, you can also use a FileFilter (it's definitely a better option than my quick code).
Hope it helps.

Reading & Parsing Blockchain DAT files

I'm working on some code that reads the DAT files in the Blockchain, and I was trying to use bitcoinj because it seemed fairly straightforward. However, I can't seem to get it to actually read the blocks within the DAT file. I've tried many different versions and have made no significant progress.
I'm feeling like this should be fairly straightforward, and I'm just missing something simple here. To be clear, I'm not trying to write to the Blockchain, just read the DAT files.
Thanks!
Here is a code snippet.
NetworkParameters np = new MainNetParams();
Context c = new Context( np );
Context.getOrCreate(MainNetParams.get());
List<File> blockChainFiles = new ArrayList<>();
blockChainFiles.add( new File( "blk00000.dat" ) );
BlockFileLoader bfl = new BlockFileLoader(np, blockChainFiles);
int blockNum = 0;
// Iterate over the blocks in the dataset.
for (Block block : bfl) {
...
This code produces the following error:
Exception in thread "main" java.lang.IllegalStateException: Context does not match implicit network params: org.bitcoinj.params.MainNetParams#9d1d82f2 vs org.bitcoinj.params.MainNetParams#9d1d82f2
at org.bitcoinj.core.Context.getOrCreate(Context.java:147)
at testBitcoin.main(testBitcoin.java:20)

The block .dat files contain multiple blocks per file, including orphans, separated by magic numbers.
Please refer to https://en.bitcoin.it/wiki/Protocol_documentation#Message_structure
Your code doesn't seem to look for the magic numbers or skip ahead by the lengths specified in the message structure.
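To make the layout concrete, here is a minimal sketch of scanning such a file by hand. It assumes each record is a 4-byte magic number (f9 be b4 d9 on the main network), followed by a 4-byte little-endian length, followed by that many bytes of raw block data. It also assumes that a magic value of zero means trailing padding. The class name BlockFileScanner is hypothetical, and the sketch only returns raw bytes; it does not decode the blocks.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the assumed layout: magic (4 bytes), length (4 bytes, little-endian), payload.
class BlockFileScanner {

    // Main-network magic bytes f9 be b4 d9, read here as a big-endian int.
    static final int MAINNET_MAGIC = 0xF9BEB4D9;

    // Returns the raw bytes of each block record found in the stream.
    static List<byte[]> readBlocks(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<byte[]> blocks = new ArrayList<>();
        while (true) {
            int magic;
            try {
                magic = data.readInt();
            } catch (EOFException endOfFile) {
                break; // no more records
            }
            if (magic == 0) {
                break; // assumption: zero bytes are padding at the end of the file
            }
            if (magic != MAINNET_MAGIC) {
                throw new IOException("Unexpected magic number: " + Integer.toHexString(magic));
            }
            // DataInputStream reads big-endian, so reverse the bytes of the length field.
            int length = Integer.reverseBytes(data.readInt());
            if (length < 0) {
                throw new IOException("Invalid block length: " + length);
            }
            byte[] block = new byte[length];
            data.readFully(block);
            blocks.add(block);
        }
        return blocks;
    }
}
```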

Just get rid of the complaining line, Context.getOrCreate(MainNetParams.get());, it's not needed.
The following slightly altered version of your code worked for me:
List<File> blockChainFiles = new ArrayList<>();
blockChainFiles.add(new File("blk00000.dat"));
MainNetParams params = MainNetParams.get();
Context context = new Context(params);
BlockFileLoader bfl = new BlockFileLoader(params, blockChainFiles);
// Iterate over the blocks in the dataset.
for (Block block : bfl) {
System.out.println(block.getHashAsString());
}

You can use my blockchain parser. It is written in Python and can parse all the data from blk*.dat files into a simple text view.

Loading multiple properties sets from single file for multiple class instances

I have a class of which I need a different instance if one of its attributes changes. These changes are read at runtime from a property file.
I would like to have a single file detailing the properties of all the single instances:
------------
name=Milan
surface=....
------------
name=Naples
surface=....
How can I load each set of properties into a different Properties object (maybe creating a Properties[])? Is there a built-in Java method to do so?
If I have to parse it manually, how could I create an InputStream each time I find the dividing String between the sets?
ArrayList<Properties> properties = new ArrayList<>();
if( whateverItIs.nextLine() == "----" ){
InputStream limitedInputStream = next-5-lines ;
properties.add(new Properties().load(limitedInputStream));
}
Something like the above. By the way, is there any constructor or method that creates the class directly from a file?
EDIT: A pointer in the right direction, so that I can look into it myself, would be fine too.

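First, read the whole file into a single string. Then split it on the separator lines and load each part through a StringReader.
String propertiesFile = FileUtils.readFileToString(file, "utf-8");
// Split on whole separator lines (four or more dashes), not on every run of "----"
String[] propertyDivs = propertiesFile.split("(?m)^-{4,}[ \\t]*$");
List<Properties> properties = new ArrayList<Properties>();
for (String propertyDiv : propertyDivs) {
    if (propertyDiv.trim().isEmpty()) {
        continue; // skip the empty part before the first separator
    }
    // load() returns void, so create the object first
    Properties p = new Properties();
    p.load(new StringReader(propertyDiv));
    properties.add(p);
}
The example above uses the Apache Commons IO library to read the file into a String in one line, because Java did not have such a built-in method at the time (Java 11 later added Files.readString). However, reading a file can also be implemented easily with the standard Java libraries; see "Whole text file to a String in Java".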
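For completeness, a variant that relies only on the standard library could look like the sketch below. It assumes a separator is a line consisting of four or more dashes, as in the sample file from the question; the class name PropertySections is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: splits text on separator lines and loads each section.
class PropertySections {

    // A separator is assumed to be a line consisting only of four or more dashes.
    static List<Properties> parse(String text) throws IOException {
        List<Properties> result = new ArrayList<>();
        for (String section : text.split("(?m)^-{4,}[ \\t]*$")) {
            if (section.trim().isEmpty()) {
                continue; // nothing before the first separator or between two separators
            }
            Properties properties = new Properties();
            properties.load(new StringReader(section)); // load() returns void
            result.add(properties);
        }
        return result;
    }
}
```

The text itself can be obtained with new String(Files.readAllBytes(path), StandardCharsets.UTF_8), or with Files.readString(path) on Java 11 and later.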

Java - find matching pairs from list

background:
I need to load test a process on a server that I am working with. What I am doing is I am creating a bunch of files on client side and will upload them to server. The server is monitoring for new files (in input dir, file names are unique) and once there is a new file it processes it, once done, it creates a response file with same name but different extension to output dir. If the processing fails, it puts the incoming file to error dir. I am using the inotifywait to monitor the changes on server, which outputs:
10:48:47 /path/to/in/ CREATE ABCD.infile1
10:48:55 /path/to/out/ CREATE ABCD.outfile1
or
10:49:11 /path/to/in/ CREATE ASDF.infile1
10:49:19 /path/to/err/ CREATE ASDF.infile1
problem:
I need to parse the list of all results (I am planning to implement this in Java) so that I take each infile and match it with the same file name (found in either ERR or OUT), calculate the time taken, and indicate whether it was a success or not. My idea is to create 3 lists (in, out, err) and parse them, something like this (in pseudo-code):
inList
outList
errList
for item : inList
if outlist.contains(item) parse;
else if errList.contains(item) parse;
else error;
question:
Is this efficient? Or is there a better way to approach this situation? You might think that this is code that gets executed just once, so why bother, but I really would like to know how to handle this properly.

The solution with lists is problematic, because you have to keep them synchronized with the state of the drive and reload them every time. What is more, at some point you will reach the capacity limit for files stored in a single location.
One alternative is to use the I/O API to check whether a path exists. Another is to introduce an intermediate database that stores the keys and the physical path of each file.
If I were you, I would start with the I/O API and design a simple interface that can be replaced later if the solution turns out to be inefficient.
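For the log-matching part of the question itself, a map keyed by file name avoids scanning three lists repeatedly. The following is only a sketch under some assumptions: each inotifywait line has the four whitespace-separated fields shown in the question, paths contain no spaces, and the directories end in /in/, /out/ and /err/. The class name TransferLog is hypothetical.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical matcher for lines such as "10:48:47 /path/to/in/ CREATE ABCD.infile1".
class TransferLog {

    // Returns one summary per completed file: "<name> <SUCCESS|ERROR> <seconds>s".
    static List<String> summarize(List<String> lines) {
        Map<String, LocalTime> started = new HashMap<>();
        List<String> results = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 4) {
                continue; // not in the expected "time dir event file" shape
            }
            LocalTime time = LocalTime.parse(parts[0]);
            String dir = parts[1];
            // Key on the name without its extension, because the response file changes extension.
            int dot = parts[3].lastIndexOf('.');
            String key = dot > 0 ? parts[3].substring(0, dot) : parts[3];
            if (dir.endsWith("/in/")) {
                started.put(key, time);
            } else {
                LocalTime start = started.remove(key);
                if (start != null) {
                    String status = dir.endsWith("/out/") ? "SUCCESS" : "ERROR";
                    results.add(key + " " + status + " "
                            + Duration.between(start, time).getSeconds() + "s");
                }
            }
        }
        return results;
    }
}
```

Each line is handled once with constant-time map lookups, so the cost grows linearly with the size of the log, whereas calling contains on a list for every input file grows quadratically.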

You can use the "UserDefinedFileAttributeView" concept.
Create your own file attribute, say "Result", and set its value for the files in the IN directory: if the file is moved to the OUT directory, "Result"="Success", and if the file is moved to the ERR directory, "Result"="Error".
I tried the code below; I hope it helps.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.UserDefinedFileAttributeView;

public static void main(String[] args) {
    try {
        Path file = Paths.get("C:\\Users\\rohit\\Desktop\\imp docs\\Steps.txt");
        UserDefinedFileAttributeView userView =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        String attribName = "RESULT";
        String attribValue = "SUCCESS";
        // Store the attribute on the file
        userView.write(attribName, Charset.defaultCharset().encode(attribValue));
        // Read back every user-defined attribute and print those marked as successful
        for (String s : userView.list()) {
            ByteBuffer buf = ByteBuffer.allocate(userView.size(s));
            userView.read(s, buf);
            buf.flip();
            String value = Charset.defaultCharset().decode(buf).toString();
            if ("SUCCESS".equals(value)) {
                System.out.printf("User defined attribute: %s; value: %s%n", s, value);
            }
        }
    } catch (IOException e) {
        // Do not swallow the error silently
        e.printStackTrace();
    }
}
You can do this for every file placed in the IN directory.

Files, URIs, and URLs conflicting in Java

I am getting some strange behavior when trying to convert between Files and URLs, particularly when a file/path has spaces in its name. Is there any safe way to convert between the two?
My program has a file saving functionality where the actual "Save" operation is delegated to an outside library that requires a URL as a parameter. However, I also want the user to be able to pick which file to save to. The issue is that when converting between File and URL (using URI), spaces show up as "%20" and mess up various operations. Consider the following code:
//...user has selected file
File userFile = myFileChooser.getSelectedFile();
URL userURL = userFile.toURI().toURL();
System.out.println(userFile.getPath());
System.out.println(userURL);
File myFile = new File(userURL.getFile());
System.out.println(myFile.equals(userFile));
This prints false (due to the "%20" sequences), and it is causing significant issues in my program because Files and URLs are handed off between components and operations often have to be performed on them (like getting parent directories or subdirectories). Is there a way to make File/URL handling safe for paths with whitespace?
P.S. Everything works fine if my paths have no spaces in them (and the paths look equal), but that is a user restriction I cannot impose.

The problem is that you use URL to construct the second file:
File myFile = new File(userURL.getFile());
If you stick to the URI, you are better off:
URI userURI = userFile.toURI();
URL userURL = userURI.toURL();
...
File myFile = new File(userURI);
or
File myFile = new File( userURL.toURI() );
Both ways worked for me, when testing file names with blanks.
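The two directions can be wrapped in a small helper so the rest of the program never calls URL.getFile() directly. This is a minimal sketch; the class name FileUrls and its method names are hypothetical.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper for converting between File and URL via URI.
class FileUrls {

    // File -> URL; toURI() percent-encodes spaces as "%20".
    static URL toUrl(File file) throws MalformedURLException {
        return file.toURI().toURL();
    }

    // URL -> File; going through URI decodes "%20" back into spaces.
    static File toFile(URL url) throws URISyntaxException {
        return new File(url.toURI());
    }
}
```

Note that File.toURI() resolves relative paths to absolute ones, so the round trip compares equal only when the original File is already absolute (for example after calling getAbsoluteFile()).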

Use this instead:
System.out.println(myFile.toURI().toURL().equals(userURL));
That should print true.
