Extracting Multiple Embedded Files via Oracle Search and Export - java

I am currently implementing the Oracle Outside In Search and Export tool in Java to extract the metadata and content of different files. I was able to do this for multiple files inside a folder; however, I wasn't able to extract files embedded in another file. I would like to know if this is possible in Search and Export.
If not, I'd go with Clean Content, but it only accepts Microsoft Office and PDF files.

Search Export can convert or extract embedded files from within archives or within other types of files. We distinguish between three different types of embeddings, each of which has its own option to control their conversion. The three types are archive sub-docs, email attachments, and generic embeddings. By default the first two are converted, but the third isn't. To enable generic embeddings conversion, set the SCCEX_XML_EMBEDDINGS flag in the SCCOPT_XML_SEARCHML_FLAGS option. If you are using the exporter sample app supplied with the SDK, try enabling the following in your CFG file.
embeddingsflag yes
If you are trying to extract a binary copy of the embedding, it becomes a three-step process. On your initial conversion, set the SCCEX_XML_PRODUCEOBJECTINFO flag in the SCCOPT_XML_SEARCHML_FLAGS option. Use that information for the desired embedding(s) to fill in a SCCDAOBJECT structure that is passed to DAOpenDocument. The hDoc returned from that function can then be passed to DASaveInputObject to save a binary copy of the embedding. This works for all three types of embeddings described above. There is no Java sample app that demonstrates this process.

Related

Need to get absolute path of a webpage using Java

I am writing a program that downloads some HTML. I need to retain the original filenames and folder structure as they are on the server, so I need a way to handle links like "www.google.com". If you type it in, it will obviously download some PHP, but I need to know exactly what that file is called. I am writing in Kotlin, but an answer in Java will work too.
I need to retain the original filenames and folder structure as are on the server
This is not possible: you cannot know the structure of the data on the server. There may not even be a file and folder structure on the server; the returned data could be entirely dynamically generated and not based on a filesystem at all.
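Since the server's real layout is unknowable, the best a downloader can do is invent a local layout from the URL itself. A minimal sketch, assuming you are happy to mirror the host and path segments as local directories and to use a made-up default name for URLs that end in "/" (DEFAULT_NAME = "index.html" is hypothetical, as are the class and the toLocalPath helper; they are not a standard API):

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

class UrlToLocalPath {
    // Hypothetical default for URLs ending in "/" (or with no path at all):
    // the URL does not reveal which file, if any, the server actually used.
    static final String DEFAULT_NAME = "index.html";

    static Path toLocalPath(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost() != null ? uri.getHost() : "unknown-host";
        String path = uri.getPath() != null ? uri.getPath() : "";
        if (path.isEmpty() || path.endsWith("/")) {
            path += DEFAULT_NAME;
        }
        if (path.startsWith("/")) {
            path = path.substring(1);
        }
        // Each URL path segment becomes a local directory; the last one becomes the file name.
        return Paths.get(host, path.split("/"));
    }

    public static void main(String[] args) {
        System.out.println(toLocalPath("http://www.google.com"));
        System.out.println(toLocalPath("https://example.com/docs/page.php?id=3"));
    }
}
```

Query strings are dropped and ".." segments are not sanitized, so a real downloader should add that check. A server may also suggest a name in a Content-Disposition response header, but even that is only a suggestion, not the name of a file on its disk.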

How to see custom properties in the Windows Properties dialog of a file when I right-click it in Explorer?

Can anybody explain how to proceed in the following scenario?
I need to add custom properties (that is, new metadata such as classification_of_file with the value sensitive) to all kinds of files, such as txt, pdf, doc, docx, ppt, pptx, xls, xlsx, etc., using Java, and then I want to see this custom property in the Windows Properties dialog when I right-click the file in Explorer.
note:
Is there any API I can use to do this?
Is it possible to do this by using Apache Jackrabbit?
Are you talking about Windows property on a specific file when you right click on it using Explorer?
If so, you need to use the Java API for file attributes, specifically UserDefinedFileAttributeView.
You can use this view to write any property you may want on a specific file.
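import java.nio.charset.Charset;
import java.nio.file.*;
import java.nio.file.attribute.UserDefinedFileAttributeView;

Path path = FileSystems.getDefault().getPath("C:/file.txt");
UserDefinedFileAttributeView view =
        Files.getFileAttributeView(path, UserDefinedFileAttributeView.class);
view.write("classification_of_file", Charset.defaultCharset().encode("sensitive")); // throws IOException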
You can also call FileStore.supportsFileAttributeView() to check if your file system supports it.
You will find more explanations on file attributes in the Java documentation.
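Here is a self-contained version of the snippet above that also calls FileStore.supportsFileAttributeView() and reads the value back. It is a sketch under assumptions: the class and method names are hypothetical, and a temp file stands in for C:/file.txt. One caveat worth verifying on your machine: on Windows the JDK stores user-defined attributes as NTFS alternate data streams, and as far as I know Explorer's Properties dialog does not display those, so this alone may not satisfy the "see it in Explorer" requirement.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;
import java.util.Optional;

class CustomFileAttribute {

    // Writes a user-defined attribute and reads it back.
    // Returns Optional.empty() if the file store does not support user-defined attributes.
    static Optional<String> writeAndRead(Path file, String name, String value) throws IOException {
        if (!Files.getFileStore(file).supportsFileAttributeView(UserDefinedFileAttributeView.class)) {
            return Optional.empty();
        }
        UserDefinedFileAttributeView view =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        if (view == null) {
            return Optional.empty();
        }
        try {
            view.write(name, StandardCharsets.UTF_8.encode(value));
            ByteBuffer buf = ByteBuffer.allocate(view.size(name));
            view.read(name, buf);
            buf.flip(); // switch the buffer from writing to reading
            return Optional.of(StandardCharsets.UTF_8.decode(buf).toString());
        } catch (FileSystemException e) {
            // e.g. extended attributes disabled on this particular mount
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".txt");
        try {
            Optional<String> stored = writeAndRead(file, "classification_of_file", "sensitive");
            System.out.println(stored.map(v -> "stored: " + v)
                    .orElse("user-defined attributes not supported here"));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```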
As for the second point, I don't know Apache Jackrabbit so I can't help you that much.
Apache Jackrabbit will not help you set properties on a file that's stored in your filesystem.
It can nicely manage metadata of any kind for files that it stores itself, and which you can make available via WebDAV, but that requires storing files in the JCR repository.

How to know file type without extension

While trying to come up with a servlet-based application to read files and manipulate them (image type conversion), a question came up:
Is it possible to inspect a file's content and determine its type?
Is there a standard that specifies that each file MUST provide some kind of marker in its content, so that an application does not have to rely on the file extension?
Consider an application scenario:
I am creating an application that will be able to convert different file formats to a set of output formats. Say a user uploads a PDF; my application can suggest that the possible conversion formats are Microsoft Word, TIFF, JPEG, etc.
As my application will gradually support more file formats (over a period of time), I want it to inspect the input file instead of having the user specify the format, and then suggest the possible output formats to the user.
I understand this is an open ended, broad question. Please let me know if it needs to be modified.
Thanks,
Ayusman
Yes, you can figure out the type without an extension by using the file's magic number.
Also, the file command actually determines the type through a three-step check:
Check filesystem properties to identify empty files, folders, etc.
The said magic number
In text files, check for language in it
Here's a library that'll help you with Magic Numbers: jmimemagic
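To illustrate the magic-number idea, here is a minimal sniffer that compares a file's first bytes against a handful of well-known signatures. It is a sketch, not a replacement for a library like jmimemagic: the table is tiny, the class name is hypothetical, and formats such as DOCX, XLSX and PPTX all start with the plain ZIP signature, so telling them apart requires looking inside the archive.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class MagicNumberSniffer {
    // A few well-known signatures, checked in order.
    private static final List<Map.Entry<String, byte[]>> SIGNATURES = List.of(
            Map.entry("application/pdf", new byte[] {0x25, 0x50, 0x44, 0x46}),            // "%PDF"
            Map.entry("image/png", new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A}),
            Map.entry("image/jpeg", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}),
            Map.entry("image/gif", new byte[] {0x47, 0x49, 0x46, 0x38}),                  // "GIF8"
            Map.entry("image/tiff", new byte[] {0x49, 0x49, 0x2A, 0x00}),                 // "II*\0", little-endian
            Map.entry("image/tiff", new byte[] {0x4D, 0x4D, 0x00, 0x2A}),                 // "MM\0*", big-endian
            Map.entry("application/zip", new byte[] {0x50, 0x4B, 0x03, 0x04}));           // also DOCX/XLSX/PPTX

    static String detect(byte[] header) {
        for (Map.Entry<String, byte[]> sig : SIGNATURES) {
            byte[] magic = sig.getValue();
            if (header.length >= magic.length
                    && Arrays.equals(Arrays.copyOf(header, magic.length), magic)) {
                return sig.getKey();
            }
        }
        return "application/octet-stream"; // unknown: fall back to a generic type
    }

    static String detect(InputStream in) throws IOException {
        return detect(in.readNBytes(16)); // 16 bytes covers every signature above
    }

    public static void main(String[] args) throws IOException {
        byte[] fakePdf = "%PDF-1.4\n...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(new ByteArrayInputStream(fakePdf)));
    }
}
```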

creating own file extension [duplicate]

This question already has answers here:
How to create my own file extension like .odt or .doc? [closed]
(3 answers)
Closed 8 years ago.
I'm developing a desktop application using NetBeans (Java Desktop Application), and I need to implement my own file format that is specific to this application. I'm quite uncertain how to go about it. What code should I use so that my Java application can read that file and open it the way I want?
If it's character data, use Reader/Writer. If it's binary data, use InputStream/OutputStream. That's it. They are available in several flavors, like BufferedReader, which eases reading a text file line by line, and so on.
They're part of the Java IO API. Start learning it here: Java IO tutorial.
By the way, Java on its own doesn't really care about the file extension or format. It's the code logic you write that handles each character or byte of the file according to some file format specification (which, if you'd like to invent one yourself, you first have to write up).
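As an illustration of "code logic according to a file format specification", here is a sketch of a tiny binary format written with DataOutputStream and read back with DataInputStream. Everything about the format is hypothetical (the "MYAP" magic number, the version field, a title string followed by a list of ints); the point is that the reader checks a magic number and version before trusting the rest of the file.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class MyFormat {
    static final int MAGIC = 0x4D594150; // ASCII "MYAP", a made-up signature
    static final int VERSION = 1;

    static final class Doc {
        final String title;
        final List<Integer> values;

        Doc(String title, List<Integer> values) {
            this.title = title;
            this.values = values;
        }
    }

    static void write(OutputStream out, Doc doc) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(MAGIC);
        data.writeInt(VERSION);
        data.writeUTF(doc.title);
        data.writeInt(doc.values.size());
        for (int v : doc.values) {
            data.writeInt(v);
        }
        data.flush();
    }

    static Doc read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != MAGIC) {
            throw new IOException("Not a MYAP file");
        }
        int version = data.readInt();
        if (version != VERSION) {
            throw new IOException("Unsupported MYAP version " + version);
        }
        String title = data.readUTF();
        int count = data.readInt();
        List<Integer> values = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            values.add(data.readInt());
        }
        return new Doc(title, values);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".myap"); // extension chosen freely; Java ignores it
        try {
            try (OutputStream out = Files.newOutputStream(file)) {
                write(out, new Doc("My notes", List.of(1, 2, 3)));
            }
            try (InputStream in = Files.newInputStream(file)) {
                Doc doc = read(in);
                System.out.println(doc.title + " " + doc.values);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A leading magic number is also what lets tools recognize such a file without relying on its extension; the version field leaves room to change the layout later.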
I am not sure this directly addresses your question, but since you mentioned a custom file format, it is worth noting that applications launched using Java Web Start can declare a file association. If the user double clicks one of those file types, the file name will be passed to the main(String[]) of the app.
This ability is demonstrated in the File Service demo of the JNLP API, available at my site.
As to the exact format of the file & the best ways to load and save it, there are a large number of possibilities that can be narrowed down with more details of the information it contains.
Choosing a new or existing file extension does not affect your application (or anyone's). It is up to the programmer which files their app reads.
For example, you may think you can't read a PDF or DOC file directly as a text file, but that is not because they are written or stored differently; it is because they contain headers or characters your app does not understand. So we might use a plugin or library that understands those headers (or rather, the grammar of the PDF/DOC format), strips them, and tells our app what text (or anything else) the file contains.
So if you want your own extension, and specifically want no other application to be able to read it, just write the data in a way that only your program understands. Writing a file in binary pretty much ensures that a user can't read it just by opening it, but it is still possible to read if it is merely a collection of raw characters.
If you are asking for code to hide data, there are plenty of algorithms you might use, usually grouped under encryption, because you are basically trying to lock away your content. But if you don't need all that fuss, and you are simply trying to keep a file from being read directly (and a successful attempt to read it would not harm your application), write it in binary.
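If keeping the contents private actually matters, binary encoding is not enough and real encryption is the right tool. A minimal sketch using the JDK's built-in AES-GCM support in javax.crypto; the class name and the "IV first, then ciphertext" layout are illustrative choices, not a standard file format:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class EncryptedPayload {
    private static final int IV_LENGTH = 12;   // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128;   // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        return kg.generateKey();
    }

    // Output layout: [12-byte IV][ciphertext + 16-byte tag]
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        RANDOM.nextBytes(iv); // a fresh IV for every encryption
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] body = cipher.doFinal(plain);
        return ByteBuffer.allocate(iv.length + body.length).put(iv).put(body).array();
    }

    static byte[] decrypt(SecretKey key, byte[] stored) throws GeneralSecurityException {
        byte[] iv = Arrays.copyOfRange(stored, 0, IV_LENGTH);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        // Throws AEADBadTagException if the data was tampered with or the key is wrong.
        return cipher.doFinal(stored, IV_LENGTH, stored.length - IV_LENGTH);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        byte[] stored = encrypt(key, "my private data".getBytes(StandardCharsets.UTF_8));
        String back = new String(decrypt(key, stored), StandardCharsets.UTF_8);
        System.out.println(back);
    }
}
```

The hard part is key management: if the key ships inside the application, a determined user can extract it, so this raises the bar rather than making the file truly unreadable.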

Create AppleDouble formatted file in Linux

I'm working on an application that syncs data. For Mac OS, files are uploaded and if they contain resource fork information, the fork is read and stored as a string using: file/..namedfork/rsrc
Users can access their files through a Java web application running on a Linux server. Is there a way I can generate a valid AppleDouble file using only the data fork and the string I read from the namedfork? I don't mind losing the Finder metadata.
Note: The generated file will be downloaded (using the Web Application) as a single file for Mac OS users.
Is this possible?
Regards
As far as I'm aware, OS 9/OS X can only natively access the resource forks on files served by AppleTalk shares. For other media, e.g. SMB (Microsoft Networking) or HTTP, the only way to preserve the resource fork is to place the file in an archive.
There are several Mac-specific archive formats that support this, for example, StuffIt and HQX. I very much doubt the Linux binaries for StuffIt would allow packaging a resource fork from a separate file, but at least there is something for you to evaluate.
Looking at the AppleDouble Wikipedia entry, it seems it may be possible to create such a file from a non-Apple machine using an open source tool, and sending the resultant file using the multipart/appledouble MIME type. Perhaps you could call this binary from your Java code?
The wikipedia article states:
AppleSingle combined both file forks and the related Finder meta-file information into a single file, whereas AppleDouble stored them as two separate files.
The apple knowledgebase article states:
The second new file has the name of the original file prefixed by a "._ " and contains the resource fork of the original file.
So I assume you just have to save the content of your resource fork string into the appropriately named file.
Edit:
After your comment I'm not sure what you want. Your question was how to
Create AppleDouble formatted file in Linux
and the documentation I linked to shows that you need to create two files to do that: one containing the data, and one containing the resource fork, whose name is prefixed with ._. If that is not what you want, then you need to ask a different question.
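For reference, the "._" companion file is not a raw dump of the fork: per the AppleSingle/AppleDouble format specification, it begins with a small big-endian header (magic number 0x00051607, version 0x00020000, 16 zero filler bytes, a 2-byte entry count) followed by entry descriptors (ID, offset, length), where entry ID 2 is the resource fork. Below is a minimal sketch that builds such a file with only a resource-fork entry. It assumes the fork is available as raw bytes (storing it as a String risks corrupting binary data), the class name is hypothetical, and whether the Finder accepts a header without the Finder Info entry (ID 9) is untested here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class AppleDoubleWriter {
    private static final int APPLEDOUBLE_MAGIC = 0x00051607;
    private static final int VERSION_2 = 0x00020000;
    private static final int ENTRY_RESOURCE_FORK = 2;

    // Builds the contents of a "._<name>" AppleDouble file holding only a resource fork.
    static byte[] build(byte[] resourceFork) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes); // writes big-endian, as the format requires
        int headerSize = 4 + 4 + 16 + 2;         // magic, version, filler, entry count
        int entrySize = 4 + 4 + 4;               // entry ID, offset, length
        int dataOffset = headerSize + entrySize; // one entry, so the fork starts right after it
        out.writeInt(APPLEDOUBLE_MAGIC);
        out.writeInt(VERSION_2);
        out.write(new byte[16]);                 // filler, all zeros in version 2
        out.writeShort(1);                       // number of entries
        out.writeInt(ENTRY_RESOURCE_FORK);
        out.writeInt(dataOffset);
        out.writeInt(resourceFork.length);
        out.write(resourceFork);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] fork = {1, 2, 3}; // placeholder bytes; use the real resource fork here
        byte[] header = build(fork);
        System.out.println("._ file size: " + header.length + " bytes");
    }
}
```

The web application would then save this as "._" followed by the original file name and offer it alongside the data-fork file.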
