How to extract images from Thumbs.db in Java?

Hi,
I read about the Apache POI project and tried to use it to extract images from Thumbs.db, but my code throws an exception. The code is:
InputStream stream = new FileInputStream("C:\\Thumbs.db");
POIFSFileSystem fs = new POIFSFileSystem(stream);
DirectoryEntry root = fs.getRoot();
Entry entry = root.getEntry("2");
DocumentInputStream is = fs.createDocumentInputStream(entry.getName());
JPEGImageDecoder decoder = JPEGCodec.createJPEGDecoder(is);
JPEGDecodeParam param = JPEGCodec.getDefaultJPEGEncodeParam(4, JPEGDecodeParam.COLOR_ID_RGBA);
decoder.setJPEGDecodeParam(param);
BufferedImage originalBufferedImage = decoder.decodeAsBufferedImage();
It fails with the exception "com.sun.image.codec.jpeg.ImageFormatException: Not a JPEG file: starts with 0x0c 0x00".
What is the problem with the code above?
Can you suggest another way to do this?

Each image stream in Thumbs.db starts with a header that comes before the JPEG data, so you need to read past that header before you can start decoding the image. Try the changes I added below; they should remove the ImageFormatException you are getting.
InputStream stream = new FileInputStream("C:\\Thumbs.db");
POIFSFileSystem fs = new POIFSFileSystem(stream);
DirectoryEntry root = fs.getRoot();
Entry entry = root.getEntry("2");
DocumentInputStream is = fs.createDocumentInputStream(entry.getName());
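// Added: read past the header to fix the ImageFormatException.
// The first byte is taken as the header length; the loop starts at 1
// because that first byte has already been consumed.
int header_len = is.read();
for (int i = 1; i < header_len; i++) {
    is.read();
}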
JPEGImageDecoder decoder = JPEGCodec.createJPEGDecoder(is);
JPEGDecodeParam param = JPEGCodec.getDefaultJPEGEncodeParam(4,JPEGDecodeParam.COLOR_ID_RGBA);
decoder.setJPEGDecodeParam(param);
BufferedImage originalBufferedImage = decoder.decodeAsBufferedImage();
I hope that helps!
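
The header skip above can also be sketched with the standard library only. This is a minimal sketch, not the answerer's code. It assumes the stream begins with a 4-byte little-endian header length (consistent with the 0x0c 0x00 bytes in the error message), and it decodes with javax.imageio.ImageIO instead of com.sun.image.codec.jpeg, an internal API that recent JDKs no longer ship. The class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

final class ThumbsStreamDecoder {
    /**
     * Skips the leading header of a thumbnail stream and decodes the rest.
     * Assumption: the first 4 bytes hold the total header length (little-endian).
     */
    static BufferedImage decodeAfterHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] len = new byte[4];
        data.readFully(len);
        int headerLen = (len[0] & 0xFF)
                | ((len[1] & 0xFF) << 8)
                | ((len[2] & 0xFF) << 16)
                | ((len[3] & 0xFF) << 24);
        if (headerLen < 4) {
            throw new IOException("Unexpected header length: " + headerLen);
        }
        // Skip the rest of the header; 4 bytes of it were already read.
        data.readFully(new byte[headerLen - 4]);
        // Returns null when no registered reader recognises the data.
        return ImageIO.read(data);
    }
}
```

With POI, the argument would be the DocumentInputStream from the snippet above. Check the result for null before using it, since ImageIO.read does not throw when the remaining bytes are not an image format it knows.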

Related

PNGJInputException when editing Metadata

I want to add some metadata to a PNG image. I am on Android, and the PNGJ library seems to be quite helpful, yet I always get a "PngjInputException".
Here is the corresponding code snippet:
PngReader pngReader = new PngReader(file);
File destFile = new File(file.getAbsolutePath());
PngWriter pngWriter = new PngWriter(destFile, pngReader.imgInfo, true);
pngWriter.copyChunksFrom(pngReader.getChunksList(), ChunkCopyBehaviour.COPY_ALL_SAFE);
LinkedHashMap<String, byte[]> hashMap = bluetoothHelper.getHashMap();
for (String key : hashMap.keySet()) {
    pngWriter.getMetadata().setText(key, hashMap.get(key).toString());
}
for (int row = 0, c = 0; row < pngWriter.imgInfo.rows; row++) {
    IImageLine line = pngReader.readRow();
    pngWriter.writeRow(line);
}
pngReader.end();
pngWriter.end();
This is the Exception thrown:
ar.com.hjg.pngj.PngjInputException: Failed to feed bytes (premature ending?)
Can anyone help me with this exception?

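You are writing to the same file you are reading: destFile is built from file.getAbsolutePath(), so the writer overwrites the PNG while the reader is still reading it. You should never do that. Change destFile to a different file and retry. (Warning: first check that the source PNG was not corrupted by running the code above.)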
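
If the original file must end up modified, one common pattern is to write to a temporary file and replace the original only after the write has finished. The sketch below shows that pattern with the standard library only. SafeRewrite and Transformer are hypothetical names, and the PNGJ read/write loop from the question would go inside the callback. On Android, java.nio.file is only available on newer API levels, so treat this as an illustration rather than a drop-in fix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class SafeRewrite {
    /** Hypothetical callback: reads from source and writes the result to target. */
    interface Transformer {
        void apply(Path source, Path target) throws IOException;
    }

    /** Runs the transformer against a temp file, then replaces the source with it. */
    static void rewrite(Path source, Transformer transformer) throws IOException {
        Path dir = source.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "rewrite-", ".tmp");
        try {
            // The source is only read here; all output goes to the temp file.
            transformer.apply(source, tmp);
            // Replace the original only after the new file is complete.
            Files.move(tmp, source, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if the transformer failed.
            Files.deleteIfExists(tmp);
        }
    }
}
```

If the transformer throws, the original file is left untouched, which also avoids the corruption the warning above mentions.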

Enumset error in hadoop sequence file

I am trying to create a sequence file with metadata and a CreateFlag, but it gives me the error:
cannot resolve method for createwriter
I am new to Hadoop and Java programming. I have added the code below.
I am trying to add multiple images to a sequence file with keys. The sequence file should be created if it does not exist; if it already exists, the image data should be appended to it.
Path path = new Path("hdfs://localhost:8020/user/image_data/SequenceFileCodecTest.seq");
FSDataInputStream in = null;
Text key = new Text();
BytesWritable value = new BytesWritable();
SequenceFile.Metadata metadata = null;
SequenceFile.Writer writer = null;
Option optPath = SequenceFile.Writer.file(path);
Option optKey = SequenceFile.Writer.keyClass(key.getClass());
Option optVal = SequenceFile.Writer.valueClass(value.getClass());
Option optCom = SequenceFile.Writer.compression(SequenceFile.CompressionType.RECORD);
final EnumSet<CreateFlag> flag = EnumSet.of(CreateFlag.CREATE,CreateFlag.APPEND);
writer = SequenceFile.createWriter(conf,optPath,optKey, optVal,optCom,new DefaultCodec(),metadata,flag);

java.net.malformedURL exception

URL stringfile = getXsl("test.xml");
File originFile = new File(stringfile.getFile());
String xml = null;
ByteArrayOutputStream pdfStream = null;
try {
    FileInputStream fis = new FileInputStream(originFile);
    int length = fis.available();
    byte[] readData = new byte[length];
    fis.read(readData);
    xml = (new String(readData)).trim();
    fis.close();
    xml = xml.substring(xml.lastIndexOf("<HttpCommandList>")+17, xml.lastIndexOf("</HttpCommandList>"));
    String[] splitxml = xml.split("</HttpCommand>");
    for (int i = 0; i < splitxml.length; i++) {
        tmpxml = splitxml[i].trim() + "</HttpCommand>";
        System.out.println("splitxml:" +tmpxml);
        pdfStream = new ByteArrayOutputStream();
        pdf = new com.lowagie.text.Document();
        PdfWriter.getInstance(pdf, pdfStream);
        pdf.open();
        URL xslToUse = getXsl("test.xsl");
        // Here I am using a utility class to transform the XML and
        // generate the XML needed by iText to generate the PDF using MessageBuffer contents
        String iTextXml = XmlUtil.transformXml(tmpxml.toString(), xslToUse).trim();
        // generate the PDF document by parsing the specified XML file
        XmlParser.parse(pdf, new ByteArrayInputStream(iTextXml.getBytes()));
    }
With the code above, I get a java.net.MalformedURLException: no protocol during the XmlParser.parse call.
I am trying to generate the PDF document by parsing the specified XML file.

We would need the actual XML file to say what is missing. I expect that a URL in it has no protocol defined, like this:
192.168.1.2/ (no protocol)
file://192.168.1.2/ (there is one)
And java.net.URL seems to need one.
Also try:
new File("somexsl.xlt").toURI().toURL();
See here and here.
It always helps to post the complete stack trace. No one can tell where the exception actually occurred if you don't post the line numbers.
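
The two points in this answer can be checked with the standard library alone. This is a minimal sketch; UrlProtocolCheck and its method names are hypothetical, and it only demonstrates how java.net.URL treats the two example strings and the File-to-URL conversion suggested above. It does not reproduce what XmlParser does internally.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

final class UrlProtocolCheck {
    /** Returns true when java.net.URL accepts the string, false when it is malformed. */
    @SuppressWarnings("deprecation") // new URL(String) is deprecated in newer JDKs; used here to show the exception
    static boolean parses(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // For a string without a scheme the message starts with "no protocol".
            return false;
        }
    }

    /** Converts a file path to a URL with the file: protocol, as suggested above. */
    static URL fileUrl(String path) throws MalformedURLException {
        return new File(path).toURI().toURL();
    }
}
```

Here parses("192.168.1.2/") is false and parses("file://192.168.1.2/") is true, while fileUrl("somexsl.xlt") yields a URL whose protocol is "file".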

Any difference in content extracted by pdfbox and itext

I am evaluating pdfbox-1.8.6 and iText-4.2.0 (free) for performance. I have noticed that content extraction is faster in iText, but searching for words with a regex in the content extracted by iText takes longer than in the content extracted by pdfbox.
The extracted content and its size seem to be the same in both cases.
Can anybody explain this?
My environment is Ubuntu 12.04 / Java 1.7.
EDIT: adding sample code.
pdfbox
//in constructor
fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis);
pdfDocument = PDDocument.load(bis);
textStripper = new PDFTextStripper();
//in extractContent method
textStripper.setStartPage(pageNo);
textStripper.setEndPage(pageNo);
return textStripper.getText(pdfDocument);
iText
//in constructor
fis = new FileInputStream(file);
reader = new PdfReader(fis);
pdfTextExtractor = new PdfTextExtractor(reader);
//in extractContent method
return pdfTextExtractor.getTextFromPage(pageNo);
searching words
StopWatch searchTime = ...
StopWatch pdfTime = ...
for (int i = 1; i <= pageCount; i++) {
    // fetch one page
    pdfTime.resume();
    String pageContent = pdfParserService.extractPageContent(i);
    pdfTime.suspend();
    if (pageContent == null) {
        pageContent = "";
    }
    pageContent = pageContent.replace("-\n", "").replace("\n", "").replace("\\", " ");
    searchTime.resume();
    Collection list = searchService.search(pageContent, wordList);
    searchTime.suspend();
}
pdfTime.stop();
searchTime.stop();
System.out.println(pdfTime.getTime());
System.out.println(searchTime.getTime());

Open Microsoft Word in Java

I'm trying to open an MS Word 2003 document in Java, search for a specified String and replace it with a new String. I use Apache POI to do that. My code looks like this:
public void searchAndReplace(String inputFilename, String outputFilename,
        HashMap<String, String> replacements) {
    File outputFile = null;
    File inputFile = null;
    FileInputStream fileIStream = null;
    FileOutputStream fileOStream = null;
    BufferedInputStream bufIStream = null;
    BufferedOutputStream bufOStream = null;
    POIFSFileSystem fileSystem = null;
    HWPFDocument document = null;
    Range docRange = null;
    Paragraph paragraph = null;
    CharacterRun charRun = null;
    Set<String> keySet = null;
    Iterator<String> keySetIterator = null;
    int numParagraphs = 0;
    int numCharRuns = 0;
    String text = null;
    String key = null;
    String value = null;
    try {
        // Create an instance of the POIFSFileSystem class and
        // attach it to the Word document using an InputStream.
        inputFile = new File(inputFilename);
        fileIStream = new FileInputStream(inputFile);
        bufIStream = new BufferedInputStream(fileIStream);
        fileSystem = new POIFSFileSystem(bufIStream);
        document = new HWPFDocument(fileSystem);
        docRange = document.getRange();
        numParagraphs = docRange.numParagraphs();
        keySet = replacements.keySet();
        for (int i = 0; i < numParagraphs; i++) {
            paragraph = docRange.getParagraph(i);
            text = paragraph.text();
            numCharRuns = paragraph.numCharacterRuns();
            for (int j = 0; j < numCharRuns; j++) {
                charRun = paragraph.getCharacterRun(j);
                text = charRun.text();
                System.out.println("Character Run text: " + text);
                keySetIterator = keySet.iterator();
                while (keySetIterator.hasNext()) {
                    key = keySetIterator.next();
                    if (text.contains(key)) {
                        value = replacements.get(key);
                        charRun.replaceText(key, value);
                        docRange = document.getRange();
                        paragraph = docRange.getParagraph(i);
                        charRun = paragraph.getCharacterRun(j);
                        text = charRun.text();
                    }
                }
            }
        }
        bufIStream.close();
        bufIStream = null;
        outputFile = new File(outputFilename);
        fileOStream = new FileOutputStream(outputFile);
        bufOStream = new BufferedOutputStream(fileOStream);
        document.write(bufOStream);
    } catch (Exception ex) {
        System.out.println("Caught an: " + ex.getClass().getName());
        System.out.println("Message: " + ex.getMessage());
        System.out.println("Stacktrace follows.............");
        ex.printStackTrace(System.out);
    }
}
I call this function with the following arguments:
HashMap<String, String> replacements = new HashMap<String, String>();
replacements.put("AAA", "BBB");
searchAndReplace("C:/Test.doc", "C:/Test1.doc", replacements);
When the Test.doc file contains a simple line like "AAA EEE", it works. With a more complicated file, the content is read successfully and Test1.doc is generated, but when I try to open it, Word gives the following error:
Word unable to read this document. It may be corrupt.
Try one or more of the following:
* Open and repair the file.
* Open the file with Text Recovery converter.
(C:\Test1.doc)
Please tell me what to do; I'm a beginner with POI and have not found a good tutorial for it.

First of all you should be closing your document.
Besides that, what I suggest doing is resaving your original Word document as a Word XML document, then changing the extension manually from .XML to .doc . Then look at the XML of the actual document you're working with and trace the content to make sure you're not accidentally editing hexadecimal values (AAA and EEE could be hex values in other fields).
Without seeing the actual Word document it's hard to say what's going on.
Unfortunately, there is not much documentation for POI at all, especially for Word documents.
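
On the first point: the question's code never closes or flushes bufOStream, so bytes still held in the BufferedOutputStream may never reach the file. Whether that is what corrupts Test1.doc is an assumption, but closing the stream is cheap to rule out. The sketch below shows the pattern with the standard library only. ClosedWrite and Content are hypothetical names, with Content standing in for a call such as document.write(out).

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class ClosedWrite {
    /** Hypothetical stand-in for a call such as document.write(out). */
    interface Content {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Writes the content to target and guarantees the stream is closed. */
    static void save(Path target, Content content) throws IOException {
        // try-with-resources closes the stream, which also flushes the
        // buffer, even if writeTo throws.
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            content.writeTo(out);
        }
    }
}
```

In the question's method this would correspond to wrapping the creation of bufOStream in a try-with-resources block, or calling bufOStream.close() in a finally block.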

I don't know whether it is OK to answer my own question, but I'll do so to share the knowledge.
After searching the web, the solution I ended up with is this:
The docx4j library is very good for dealing with MS docx files. Its documentation is still thin and its forum is only just getting started, but overall it helped me do what I needed.
Thanks to all who helped me.

You could try the OpenOffice API, but there aren't many resources out there to tell you how to use it.

You can also try this one: http://www.dancrintea.ro/doc-to-pdf/

Looks like this could be the issue.
