Extract additional information from CSV/XML file in Java Code - java

I have a question for you.
I have an XML file (or a CSV file):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<City>
<Code>LO</Code>
<Name>London</Name>
</City>
and I want to extract additional information about it (for example Author, Description, Creator, Comments, Format, ContentType, etc.) in Java code.
I read this similar question, but it deals with an Excel file rather than CSV or XML: How to set Author name to excel file using poi
Given a filename as input (for example, test.csv or test.xml), I would like to print that additional information (for example, System.out.println(getAuthor)).
Can anyone help?

This information is not inside the file itself, which only contains its content (like your XML string). It depends on the operating system (which one are you using?). It is also a little unclear what you are looking for, so here is what I can say about each item you mentioned:
Author
Path path = Paths.get("C:/Users/Thomas/workspace_eclipse_java/Test/javassist-3.12.1.GA.jar");
FileOwnerAttributeView owner= Files.getFileAttributeView(path, FileOwnerAttributeView.class);
System.out.println("owner: " + owner.getOwner().getName());
Description
I have no idea what this should be. Never saw this on Windows or Linux.
Creator
Do you mean the Author again?
Comments
I have no idea what this should be. Never saw this on Windows or Linux.
Format
Check the file extension
ContentType
Check the file extension or take a look inside.
Generally
In general, you can check which attribute views are available like this:
FileSystem fileSystem = FileSystems.getDefault();
Set<String> fileSystemViews = fileSystem.supportedFileAttributeViews();
for (String fileSystemView : fileSystemViews)
System.out.println(fileSystemView);
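To pull the points above together, here is a minimal sketch that collects what the file system itself knows about a file: size, timestamps, owner, and a guessed content type. The class and method names (FileMetadataSketch, describe) are made up for illustration. Which values are actually available depends on the platform, so the owner view and the content type are both treated as optional.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileOwnerAttributeView;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class FileMetadataSketch {

    // Collects metadata that the file system keeps about a file (not its content).
    static Map<String, String> describe(Path path) throws IOException {
        Map<String, String> info = new LinkedHashMap<>();

        BasicFileAttributes basic = Files.readAttributes(path, BasicFileAttributes.class);
        info.put("size", Long.toString(basic.size()));
        info.put("created", basic.creationTime().toString());
        info.put("modified", basic.lastModifiedTime().toString());

        // The owner view can be missing on some file systems, so check for null.
        FileOwnerAttributeView ownerView =
                Files.getFileAttributeView(path, FileOwnerAttributeView.class);
        if (ownerView != null) {
            info.put("owner", ownerView.getOwner().getName());
        }

        // probeContentType returns null when the platform cannot guess the type.
        String contentType = Files.probeContentType(path);
        info.put("contentType", contentType == null ? "unknown" : contentType);
        return info;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("test", ".xml");
        Files.write(file, "<City><Code>LO</Code></City>".getBytes(StandardCharsets.UTF_8));
        describe(file).forEach((key, value) -> System.out.println(key + ": " + value));
        Files.delete(file);
    }
}
```

Things like Description or Comments do not show up here because a plain CSV or XML file has no standard place to store them.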

Related

How to save file with custom file extension in java?

Hello everyone, I hope you are all well.
I'm designing a document program. Rather than saving files with a .txt extension or using an MS Office API in Java, I want to create my own file format with a custom extension such as ".sad", so that this sort of file can only be read by my programs. How can this be done?
Your requirement seems ambiguous. Are you looking to make a program that creates MS Office Word documents or plain text files with a custom file extension?
In the case of the former, you can't have a custom extension as MS Word documents, by definition, have a .doc / .docx extension.
However, if you are looking to create a program that produces text files then you can easily have a custom extension. Just look at this tutorial: How to create a file in Java
I already stated why this is a bad idea. Yet I have a solution for you (more like a how-not-to-do-it)
Take your plain text you want to save, convert it to bytes and apply this "highly enthusiastic encryption nobody will ever be able to break" on it:
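// needs java.io.ByteArrayOutputStream, java.nio.charset.StandardCharsets, java.util.Random
String plainText = "yadayada";
byte[] bytesFromText = plainText.getBytes(StandardCharsets.UTF_8);
ByteArrayOutputStream encrypted = new ByteArrayOutputStream(bytesFromText.length * 2);
Random random = new Random();
for (int i = 0; i < bytesFromText.length; i++) {
    if (i % 2 == 0) {
        // put a random junk byte in front of every even-indexed real byte
        encrypted.write(random.nextInt(256));
    }
    encrypted.write(bytesFromText[i]);
}
byte[] result = encrypted.toByteArray();
I leave it up to you to figure out why this is a bad idea and how to decrypt it. ;)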
You can create a file with any extension.
For example,
File f = new File("confidential.sad");
Hope this will work for you :)
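Note that new File("confidential.sad") only builds a path object; nothing is written to disk until you write through it. Here is a minimal sketch of saving text under a custom extension and reading it back. The class name CustomExtensionSketch and its two methods are invented for this example. The extension by itself does nothing to stop other programs from opening the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class CustomExtensionSketch {

    // Writes the text as UTF-8 bytes; the extension in the path is just a name.
    static void save(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the whole file back as UTF-8 text.
    static String load(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("sad-demo").resolve("confidential.sad");
        save(file, "my document text");
        System.out.println(file + " contains: " + load(file));
    }
}
```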
Working with custom files in Java
Here is a tutorial that explains how to create your own files with a custom extension such as .doc or .sad, embed some information in them, and read that information back from the file after saving it.
ZIP
Similar applications often use archives to store data. Consider MS-Word and its documents with the .docx file extension. If you change the extension of any .docx file to .zip, you will find that the document is actually a zip archive, with only a different extension.
https://www.ict.social/java/files/working-with-custom-files-in-java-zip-archive
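The same idea can be sketched with java.util.zip from the JDK: the document is stored as one entry inside a zip archive, and the archive is saved under the custom extension. This is not the tutorial's code; the class name ZipContainerSketch and the entry name content.txt are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of any library.
final class ZipContainerSketch {

    // Name of the single entry inside the archive (an arbitrary choice).
    static final String ENTRY_NAME = "content.txt";

    // Stores the text as one zip entry in the target file.
    static void save(Path target, String text) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(target))) {
            zip.putNextEntry(new ZipEntry(ENTRY_NAME));
            zip.write(text.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
    }

    // Scans the archive for the entry and returns its content as text.
    static String load(Path source) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(source))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (ENTRY_NAME.equals(entry.getName())) {
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    int read;
                    while ((read = zip.read(chunk)) != -1) {
                        buffer.write(chunk, 0, read);
                    }
                    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        throw new IOException("entry not found: " + ENTRY_NAME);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("zip-demo").resolve("document.sad");
        save(file, "my document text");
        System.out.println(load(file));
    }
}
```

A container like this makes it easy to add more entries later (images, metadata), but anyone who renames the file to .zip can still open it.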
I have published a library that saves files and handles everything with a single line of code. You can find it here, along with its documentation:
Github repository
and the answer to your question is so easy
String path = FileSaver
.get()
.save(file,"file.custom");

Storing File Name only in a parameter

I have a requirement in which a file path comes in as a parameter, as I found while debugging:
private void processFile(String filePath)
{
}
Now this file path can be like:
C:\abc\file1.txt
or
C:\abc\def\file1.txt
or
C:\ghj\ytr\wer\file1.txt
so I extract the file name as shown below:
String p = new File(filePath).getName();
Now the issue is that printing p to the console gives
file1.txt
whereas I want only the file name to be stored, without the extension:
p should contain only file1 and no extension. Please advise.
What vishal_aim said will work and is correct, but in my opinion it is better to use a library, because it is more expressive and you won't have to fix all the bugs its authors have already fixed. Therefore, you could use FilenameUtils from Apache Commons IO:
FilenameUtils.getBaseName(yourFile)
Here's what the documentation says:
a/b/c.txt --> c
a.txt --> a
a/b/c --> c
a/b/c/ --> ""
That last case is something that probably didn't occur to any of us here as a possibility, but the library writers already thought of it for us.
There is no built-in API to get the file name without its extension. But why can't you truncate it programmatically, like this:
p = p.substring(0, p.lastIndexOf("."));
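One caveat with the substring call above: lastIndexOf returns -1 when the name contains no dot, and substring(0, -1) then throws a StringIndexOutOfBoundsException. Here is a small sketch of a safer version in plain Java. The class and method names (BaseNameSketch, baseName) are made up. It splits on both / and \ so that it behaves the same on every platform, which is an assumption about the paths you receive.

```java
// Hypothetical helper, not part of any library.
final class BaseNameSketch {

    // Returns the file name without directory and without the last extension.
    static String baseName(String filePath) {
        // Accept both separators so Windows-style paths also work on Linux.
        int slash = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
        String name = filePath.substring(slash + 1);

        int dot = name.lastIndexOf('.');
        // dot <= 0 covers "no extension" and dot-files such as ".gitignore".
        return dot > 0 ? name.substring(0, dot) : name;
    }

    public static void main(String[] args) {
        System.out.println(baseName("C:\\abc\\def\\file1.txt"));
        System.out.println(baseName("a/b/c"));
    }
}
```

Only the last extension is removed, so archive.tar.gz becomes archive.tar.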

Capture generated output file path and name using CSSDK

We are in the process of converting over to using the XSLT compiler for page generation. I have a Xalan Java extension that uses the CSSDK to capture some metadata we have stored in the Extended Attributes for output to the page. There are no problems getting the EAs rendered to the output file.
The problem is that I don't know how to dynamically capture the file path and name of the output file.
So just as POC, I have the CSVPath hard coded to the output file in my Java extension. Here's a code sample:
CSSimpleFile sourceFile = (CSSimpleFile)client.getFile(new CSVPath("/some-path-to-the-output.jsp"));
Can someone point me in the CSSDK to where I could capture the output file?
I found the answer.
First, get or create your CSClient. You can use the examples provided in the cssdk/samples. I tweaked one so that I captured the CSClient in the method getClientForCurrentUser(). Watch out for SOAP vs Java connections. In development, I was using a SOAP connection and for the make_toolkit build, the Java connection was required for our purposes.
Check the following snippet. The request CSClient is captured in the static variable client.
CSSimpleFile sourceFile = (CSSimpleFile)client.getFile(new CSVPath(XSLTExtensionContext.getContext().getOutputDirectory().toString() + "/" + XSLTExtensionContext.getContext().getOutputFileName()));

How to extract data from a lot of URLs?

I have about 3200 URLs pointing to small XML files that contain some data in the form of strings. The XML files are displayed (not downloaded) when I go to the URLs. I need to extract some data from all of those XML files and save it in a single .txt file, XML file, or whatever. How can I automate this process?
*Note: This is what the files look like. I need to copy the 'location' and 'title' from all of them and put them in one single file. How can this be achieved?
<?xml version="1.0"?>
<playlist xmlns="http://xspf.org/ns/0/" version="1">
<tracklist>
<location>http://radiotool.com/fransn.mp3</location>
<title>France, Paris radio 104.5</title>
</tracklist>
</playlist>
*edit: Fixed XML.
It's easy enough with XQuery or XSLT, though the details will depend on how the URLs are held. If they're in a Java List, then (with Saxon at least) you can supply this list as a parameter to the following query:
declare variable $urls as xs:string* external;
<data>{
for $u in $urls return doc($u)//*:tracklist
}</data>
The Java code would be something like:
// needs net.sf.saxon.s9api.* plus java.util.ArrayList and java.util.List
Processor proc = new Processor(false);
XQueryCompiler c = proc.newXQueryCompiler();
XQueryEvaluator q = c.compile(query).load(); // query holds the XQuery text above
List<XdmItem> urls = new ArrayList<>();
for (String url : inputUrls) {
    urls.add(new XdmAtomicValue(url));
}
q.setExternalVariable(new QName("urls"), new XdmValue(urls));
q.setDestination(...)
q.run();
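If you would rather not add Saxon, roughly the same extraction can be sketched with the DOM parser that ships with the JDK. This is an alternative to the XQuery above, not a translation of it. The class and method names (PlaylistExtractSketch, locationAndTitle, extractAll) are made up, and the sketch assumes each file has a single location and a single title, as in the sample.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any library.
final class PlaylistExtractSketch {

    // Parses one playlist document and returns { location, title }.
    static String[] locationAndTitle(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xml);
        return new String[] { firstText(doc, "location"), firstText(doc, "title") };
    }

    // "*" matches any namespace, similar to *:tracklist in the XQuery.
    private static String firstText(Document doc, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS("*", localName);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    // Downloads each URL in turn and returns one tab-separated line per file.
    static List<String> extractAll(List<String> urls) throws Exception {
        List<String> lines = new ArrayList<>();
        for (String url : urls) {
            try (InputStream in = URI.create(url).toURL().openStream()) {
                String[] pair = locationAndTitle(in);
                lines.add(pair[0] + "\t" + pair[1]);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<?xml version=\"1.0\"?>"
                + "<playlist xmlns=\"http://xspf.org/ns/0/\" version=\"1\"><tracklist>"
                + "<location>http://radiotool.com/fransn.mp3</location>"
                + "<title>France, Paris radio 104.5</title>"
                + "</tracklist></playlist>";
        String[] pair = locationAndTitle(
                new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(pair[0] + "\t" + pair[1]);
    }
}
```

The lines returned by extractAll can then be written to a single text file, for example with Files.write.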
Have a look at the JSoup library: http://jsoup.org/
It has facilities for fetching and fixing the contents of a URL. It is intended for HTML, though, so I'm not sure how well it handles XML, but it is worth a look.

identify the file extension using java

I have files of different formats stored in a DB and I want to copy them to my local machine.
How can I identify the file format (doc, xls, etc.)?
Regards,
krishna
Thanks for the suggestions. Based on them I wrote the code and it is now complete.
Please have a look at my blog, where I posted the code:
http://muralie39.wordpress.com/java-program-to-copy-files-from-oracle-to-localhost/
Thank you guys..
Thanks,
krishna
If your files are named according to convention, you can just parse the filename:
String filename = "yourFileName";
int dotPosition = filename.lastIndexOf(".");
String extension = "";
if (dotPosition != -1) {
extension = filename.substring(dotPosition);
}
System.out.println("The file is of type: " + extension);
That's the simplest approach, assuming your files are named using some kind of standard naming convention. The extensions could be proprietary to your system, even, as long as they follow a convention this will work.
If you need to actually scan the file to get the format information, you will need to do some more investigation into document formats.
How are the files stored? Do you have filenames with extensions, or just the binary data?
Mime Util has tools to detect format both from extensions and from magic headers, but of course that's never 100%.
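To give an idea of what detection from magic headers means, here is a tiny hand-rolled sketch that looks at the first bytes of the data. It is not how Mime Util works internally, just an illustration. The class name MagicHeaderSketch and the returned labels are made up, and only three well-known signatures are checked.

```java
// Hypothetical helper, not part of any library.
final class MagicHeaderSketch {

    // Guesses a coarse format from the first bytes of a file or BLOB.
    static String sniff(byte[] head) {
        if (startsWith(head, 0x25, 0x50, 0x44, 0x46)) {
            return "pdf";   // "%PDF"
        }
        if (startsWith(head, 0x50, 0x4B, 0x03, 0x04)) {
            return "zip";   // "PK..": plain zip, but also docx, xlsx, jar
        }
        if (startsWith(head, 0xD0, 0xCF, 0x11, 0xE0)) {
            return "ole2";  // legacy MS Office container: doc, xls, ppt
        }
        return "unknown";
    }

    // Compares the leading bytes as unsigned values against the signature.
    private static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = { 0x25, 0x50, 0x44, 0x46, 0x2D };
        System.out.println(sniff(sample));
    }
}
```

As the labels show, a header often identifies only the container: telling doc from xls, or docx from xlsx, means looking further inside the file, which is where a real detection library helps.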
You can use the Apache Tika library.
As Dmitri pointed out however, you may have incorrect results sometimes if detecting mime type from file headers or file extension.
