How to write XMP metadata to a PSD? (JAVA) - java

I have a Photoshop file in which I want to change two text values with a Java program. Opening the PSD in a text editor, I can find the text that I want to change: LayerText Eighty, LayerText Nine.
I hid some content in blue for privacy reasons. In the ExifTool GUI I can see the values as well, so I assumed they were stored under TextLayerText. In Photoshop they are text layers. I did some research and came across Sanselan from Apache Commons. With it I can read the same XML that I found in my text editor.
File imageFile = new File(fileField.getText());
File outputFile = new File(fileField.getText().split("\\.")[0] + ".png");
BufferedImage image = Sanselan.getBufferedImage(imageFile);
logArea.append("--- XMP Metadata ----\n");
logArea.append(Sanselan.getXmpXml(imageFile));
Map params = new HashMap();
params.put("TextLayerText", "");
Sanselan.writeImage(image, outputFile, ImageFormat.IMAGE_FORMAT_PNG, params);
This is the code I currently have. It declares two files, the input and the output, then reads the XMP and prints it. I then create a params Map, but writing the image fails with:
org.apache.sanselan.ImageWriteException: Unknown parameter: TextLayerText
The goal of this program is to modify the two text layers and render a PNG from the result. The PNG is rendered if I leave the params empty, and I can read the values with Sanselan.getXmpXml, but I am struggling to find a way to change them. I put all the pictures into one image because my reputation does not allow me to post more than 2 links.
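A note on the error above: as far as I know, Sanselan's writers only accept their own predefined parameter keys (for XMP that is SanselanConstants.PARAM_KEY_XMP_XML, which takes the whole XMP packet as a string), so an arbitrary key such as TextLayerText is rejected. Also keep in mind that editing the XMP only changes metadata; it does not re-render the text layers, so the pixels of the PNG would still show the old text. If changing the metadata is still useful, a minimal sketch of editing the layer text inside the XMP string with the JDK's XML APIs could look like this. The class and method names are hypothetical, and it assumes the values sit in photoshop:LayerText elements in the http://ns.adobe.com/photoshop/1.0/ namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: rewrites photoshop:LayerText values in an XMP packet.
final class XmpLayerTextEditor {
    // Assumed namespace of the Photoshop XMP schema.
    static final String PHOTOSHOP_NS = "http://ns.adobe.com/photoshop/1.0/";

    // Returns the XMP packet with every LayerText equal to oldText replaced by newText.
    static String replaceLayerText(String xmpXml, String oldText, String newText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmpXml)));

            NodeList nodes = doc.getElementsByTagNameNS(PHOTOSHOP_NS, "LayerText");
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                if (oldText.equals(node.getTextContent())) {
                    node.setTextContent(newText);
                }
            }

            // Serialize the modified DOM back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not edit the XMP packet", e);
        }
    }
}
```

The returned string could then be passed to the writer under the XMP key instead of a custom one. To change what the PNG actually shows, the text layers would have to be re-rendered by a tool that understands PSD layers, which is outside what this sketch does.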

Related

GSON / iText: Extract Text From PDF 1.7 byte[]

I'm automating tests using Rest-Assured and GSON, and I need to validate the contents of a PDF file that is returned in the response of a POST request. The content of the files varies and can contain anything from just text, to text and tables, or text and tables and graphics. Every page can, and most likely will, differ in glyph content. I am only concerned with ALL text on the PDF page, be it plain text, text inside a table, or text associated with (or inside) an image. Since every PDF returned by the request is different, I cannot define search areas (as far as I know). I just need to extract all text on the page.
I extract the pdf data into a byte array like so:
Gson pdfGson = new Gson();
byte[] pdfBytes =
pdfGson.fromJson(this.response.as(JsonObject.class)
.get("pdfData").getAsJsonObject().get("data").getAsJsonArray(), byte[].class);
(I've tried other extraction methods for the byte[], but this is the only way I've found that returns valid data.) This returns a very large byte[] like so:
[37, 91, 22, 45, 23, ...]
When I parse the array I run into the same issue as This Question (except my pdf is 1.7) and I attempt to implement the accepted answer, adjusted for my purposes and as explained in the documentation for iText:
byte[] decodedPdfBytes = PdfReader.decodeBytes(pdfBytes, new PdfDictionary(), FilterHandlers.getDefaultFilterHandlers());
IRandomAccessSource source = new RandomAccessSourceFactory().createSource(decodedPdfBytes);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ReaderProperties readerProperties = new ReaderProperties();
// Ineffective:
readerProperties.setPassword(user.password.getBytes());
PdfReader pdfReader = new PdfReader(source, readerProperties);
// Ineffective:
pdfReader.setUnethicalReading(true);
PdfDocument pdfDoc = new PdfDocument(pdfReader, new PdfWriter(baos));
// Page numbers are 1-based, so the last page needs <= here
for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) {
String text = PdfTextExtractor.getTextFromPage(pdfDoc.getPage(i));
System.out.println(text);
}
This DOES decode the pdf page, and return text, but it is only the header text. No other text is returned.
For what it's worth, on the front end, when the user clicks the button to generate the pdf, it returns a blob containing the download data, so I'm relatively sure that the metadata is GSA encoded, but I'm not sure if that matters at all. I'm not able to share an example of the pdf docs due to sensitive material.
Any point in the right direction would be greatly appreciated! I've spent 3 days trying to find a solution.
For those looking for a solution - ultimately we wound up going a different route. We never found a solution to this specific issue.
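One cheap check before handing such an array to any PDF library is whether the bytes already form a complete PDF file. A PDF file normally begins with the ASCII marker %PDF- followed by the version (for example %PDF-1.7) and has %%EOF near its end. The sketch below tests both using only the JDK. The class and method names are hypothetical, and it assumes the header sits at offset 0, which is the usual case but not guaranteed for every producer.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: quick structural checks on a byte[] that should hold a PDF file.
final class PdfBytesCheck {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final String EOF_MARKER = "%%EOF";

    // True if the array starts with the "%PDF-" marker.
    static boolean startsWithPdfHeader(byte[] data) {
        if (data == null || data.length < HEADER.length) {
            return false;
        }
        for (int i = 0; i < HEADER.length; i++) {
            if (data[i] != HEADER[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the three-character version after the marker (e.g. "1.7"), or null.
    static String headerVersion(byte[] data) {
        if (!startsWithPdfHeader(data) || data.length < HEADER.length + 3) {
            return null;
        }
        return new String(data, HEADER.length, 3, StandardCharsets.US_ASCII);
    }

    // True if "%%EOF" occurs within the last 1024 bytes.
    static boolean hasEofMarker(byte[] data) {
        if (data == null) {
            return false;
        }
        int from = Math.max(0, data.length - 1024);
        String tail = new String(data, from, data.length - from, StandardCharsets.ISO_8859_1);
        return tail.contains(EOF_MARKER);
    }
}
```

If both checks pass, the array is likely a plain PDF file rather than an encoded stream, which would suggest reading it as-is instead of decoding it first. If the header check fails, the problem is probably in how the bytes were extracted from the JSON rather than in the text extraction.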

Disable pdf-text searching with pdfBox

I have a PDF document (no form) in which I want to disable text searching using PDFBox (Java).
I can imagine the following possibilities:
Flatten text
Remove Text information (without removing text itself)
Add overlay to document.
Currently I have no idea how to implement that. Does anyone have an idea how to solve this?
Many thanks for your help. I think I found a way that fits the requirements (honestly, not a really clean one):
Add the rectangle to the address sections
Convert the PDF to an image
Convert the image back to a PDF.
All text information is lost this way, but the user can no longer see the critical information. Since this is only for display (the initial PDF document doesn't get changed), this is OK for now.
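The masking step of the workflow above can be sketched with the JDK alone once a page exists as a BufferedImage. Rendering the page to an image and wrapping the result back into a PDF are left to PDFBox and are not shown here; the class name, method name, and coordinates are hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper: blacks out a rectangular area of a rendered page image.
final class PageMasker {
    // Overwrites the pixels inside 'area' with opaque black, modifying 'page' in place.
    static void mask(BufferedImage page, Rectangle area) {
        Graphics2D g = page.createGraphics();
        try {
            g.setColor(Color.BLACK);
            g.fillRect(area.x, area.y, area.width, area.height);
        } finally {
            g.dispose();
        }
    }
}
```

A call such as PageMasker.mask(pageImage, new Rectangle(40, 60, 300, 80)) would cover one address block. Because the rectangle overwrites the pixels, the covered content is gone from the image rather than merely hidden, unlike a rectangle drawn on top of text that is still present in the PDF.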
It depends on your goals:
to block everything for some text: print, mark with black ink, and scan again;
to delete sensitive text: you have to scan the text content and remove/replace it (with PDFBox), but this is risky (some text is split into several pieces);
to mask some text in the viewer: find the text and add a black rectangle (with PDFBox), but this is not very safe. The rectangle can be removed, or another tool can be used to read the text. Usually, if the text is still in the file, some tool can find it;
to prevent copy/paste of the text (but not search / view): use the security options, with a password:
see: https://pdfbox.apache.org/2.0/cookbook/encryption.html
PDDocument doc = PDDocument.load(new File("filename.pdf"));
// Define the length of the encryption key.
// Possible values are 40, 128 or 256.
int keyLength = 128;
// 256 => crashes
AccessPermission ap = new AccessPermission();
// disable printing and content extraction, everything else is allowed
ap.setCanPrint(false);
ap.setCanExtractContent(false);
ap.setCanExtractForAccessibility(false);
// Owner password (to open the file with all permissions) is "12345"
// User password (to open the file but with restricted permissions, is empty here)
StandardProtectionPolicy spp = new StandardProtectionPolicy("12345", "", ap);
spp.setEncryptionKeyLength(keyLength);
spp.setPermissions(ap);
doc.protect(spp);
doc.save("filename-encrypted2.pdf");
doc.close();

How to manipulate image metadata in icafe

I am looking at examples of icafe library https://github.com/dragon66/icafe to see how to manipulate the image metadata but I can't find any examples.
I am trying to add a field to the exif metadata like Description and add some sample text to that field.
Also, from what I have found I can't seem to tell whether icafe will work on image input stream or does it need an absolute path to a file stored on the disk?
Although there is no example on the wiki page, there is a detailed example of how to manipulate metadata in the source code package com.icafe4j.test. The class is named TestMetadata, and it shows how to insert different kinds of metadata such as EXIF, IPTC, XMP, Comment, Thumbnail, etc.
ICAFE works with InputStream and OutputStream. So it doesn't matter if it comes from a local file or not as long as it is an InputStream. If you only want to add some comments, you can simply do something like this:
FileInputStream fin = new FileInputStream("input.png");
FileOutputStream fout = new FileOutputStream("comment-inserted.png");
Metadata.insertComments(fin, fout, Arrays.asList("Comment1", "Comment2"));
The above code works equally for common image formats like JPEG, TIFF, PNG, GIF, etc., as long as the format supports the metadata in question.
If you want to work with Exif, you can use:
Metadata.insertExif(InputStream fin, OutputStream fout, Exif exif, boolean update);
which also has a parameter "update" to control whether or not you want to keep the original Exif data if present. Details on how to create an Exif instance can be found in the same example.

Add text in MS Word doc using apache POI

I have been trying to edit different types of documents using Apache POI. The script should handle both the .doc and .docx extensions. I could successfully edit the .docx file using the XWPF API, and the required text was added at the end of the .docx file.
For editing .doc files (which include a header, a footer and a few paragraphs), the following script is used, which relies on HWPFDocument.
FileInputStream fis = new FileInputStream(args[0]);
POIFSFileSystem fs = new POIFSFileSystem(fis);
HWPFDocument doc = new HWPFDocument(fs);
Range range = doc.getRange();
CharacterRun run = range.insertAfter("FROM SEHWAGGG A FOUUURRRRRR");
run.setBold(true);
run.setItalic(true);
The script works fine with normal documents that do not have a header and footer, but the issue appears with complex documents. It inserts the text, but in between the paragraphs (and at the beginning when using insertBefore()). No text replacement is required; I just have to put the text at the end of the document. I searched for similar scripts, but most of them handle text replacement.
How can I add the text at the end, after all paragraphs?
I've tested it with the following document:
At first (with your original code) it completely destroyed the document:
By changing the following line, the insert works fine for me:
// Old
Range range = doc.getRange();
// New
Range range = doc.getEndnoteRange();
I'm afraid you are out of luck with HWPF with the current state of the project.
I created a custom HWPF library for one of our clients, but the changes are not public. The changes were huge, so you can't spend - say - a week and assume that things will be fixed. You might get away with the current public HWPF when only some text needs to be replaced without changing the string length ("abc" -> "123" or "a " -> "1234").

java - OCR with Asprise library

I am making an Android app that captures a photo and saves the text from it using OCR. This is my code with the Asprise library, but something is wrong with the "recognize" method:
Ocr.setUp();
Ocr ocr = new Ocr();
ocr.startEngine("eng", Ocr.SPEED_FASTEST);
String s = ocr.recognize(theImage, Ocr.RECOGNIZE_TYPE_ALL, Ocr.OUTPUT_FORMAT_PLAINTEXT);
ocr.stopEngine();
"theImage" is a Bitmap, but the method expects the "RenderedImage" type there (though a Bitmap is rendered too), and the fourth parameter of the "recognize" method is "Object... propSpec", whereas the sample on the official Asprise site passes only 3 parameters. The parameters in the "recognize" line are now underlined in red. So, what should I change in my code so that it works properly?
P.S. Of course, I've heard about the tess-two library, but it's a bit complicated for me to add it in Android Studio (I don't know why they couldn't just make it possible to add it with a single line in build.gradle).
I've implemented the same thing you want to do with the following code, and it works as I wanted. Other issues may come from the file readers on your PC, i.e. if you want to OCR a PDF file, a PDF reader should be installed.
Ocr.setUp();
Ocr ocr = new Ocr();
ocr.startEngine("eng", Ocr.SPEED_FASTEST);
String s = ocr.recognize(new File[] {new File(path)},
Ocr.RECOGNIZE_TYPE_ALL, Ocr.OUTPUT_FORMAT_PLAINTEXT);
System.out.println("Result: \n" + s);
ocr.stopEngine();
System.out.println("---END---");
