Error decoding base64 string - java

I have an XML file with a node called "CONTENIDO"; this node contains a PDF file encoded as a base64 string.
I'm trying to read this node, decode the base64 string, and save the PDF file to my computer.
The problem is that the saved file has the same size (in KB) as the original PDF and the same number of pages, but all the pages are blank, and when I open it a popup appears with the error "unknown distinctive 806.6n". I don't know what that means.
I've searched the internet for a solution and tried different ways to decode the string, but I always get the same result. The XML is OK: I've checked the base64 string and it is fine.
I've also debugged the code and seen that the content of the variable "fichero", where I read the base64 string, is also OK, so I don't know what the problem can be.
This is my code:
package prueba.sap.com;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import sun.misc.BASE64Decoder;
import javax.xml.bind.DatatypeConverter;
public class anexoPO {
public static void main(String[] args) throws Exception {
FileInputStream inFile =
new FileInputStream("C:/prueba/prueba_attach_b64.xml");
FileOutputStream outFile =
new FileOutputStream("C:/prueba/salida.pdf");
anexoPO myMapping = new anexoPO();
myMapping.execute(inFile, outFile);
System.out.println("Success");
System.out.println(inFile);
}
public void execute(InputStream in, OutputStream out)
throws com.sap.aii.mapping.api.StreamTransformationException {
try {
//************************Code To Generate The XML Parsing Objects*****************************//
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(in);
Document docout = db.newDocument();
NodeList CONTENIDO = doc.getElementsByTagName("CONTENIDO");
String fichero = CONTENIDO.item(0).getChildNodes().item(0).getNodeValue();
//************** decode *************/
//import sun.misc.BASE64Decoder;
//BASE64Decoder decoder = new BASE64Decoder();
//byte[] decoded = decoder.decodeBuffer(fichero);
//import org.apache.commons.codec.binary.*;
//byte[] decoded = Base64.decode(fichero);
//import javax.xml.bind.DatatypeConverter;
byte[] decoded = DatatypeConverter.parseBase64Binary(fichero);
//************** decode *************/
String str = new String(decoded);
out.write(str.getBytes());
} catch (Exception e) {
System.out.print("Problem parsing the file");
e.printStackTrace();
}
}
}
Thanks in advance.

Definitely:
out.write(decoded);
out.close();
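Strings cannot represent all bytes, and PDF is binary: new String(decoded) interprets the bytes in the platform's default charset, and any byte sequence that is not valid in that charset is replaced, so str.getBytes() no longer returns the original data.
Also remove the import of sun.misc.BASE64Decoder, as this package does not exist everywhere. The compiler might drop the unused import, but I would not bet on it.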
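To make the effect visible, here is a minimal sketch that compares writing the decoded bytes directly with sending them through a String first. The class and method names are made up for illustration, the sample bytes are not a real PDF, and US-ASCII is used only as a stand-in for whatever the platform default charset happens to be. It uses java.util.Base64 (Java 8+) instead of DatatypeConverter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BinaryRoundTrip {
    // Decodes base64 text and returns the raw bytes untouched.
    static byte[] decodeDirect(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    // Reproduces the bug: the decoded bytes take a detour through a String.
    // US_ASCII is an assumption standing in for the platform default charset.
    static byte[] decodeViaString(String base64) {
        byte[] decoded = Base64.getDecoder().decode(base64);
        String str = new String(decoded, StandardCharsets.US_ASCII);
        return str.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "%PDF" followed by four bytes above 0x7F (made-up sample data)
        byte[] original = {0x25, 0x50, 0x44, 0x46,
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        String base64 = Base64.getEncoder().encodeToString(original);

        byte[] direct = decodeDirect(base64);
        byte[] viaString = decodeViaString(base64);

        System.out.println("direct matches original:     " + Arrays.equals(original, direct));
        System.out.println("via String matches original: " + Arrays.equals(original, viaString));
        System.out.println("same length:                 " + (original.length == viaString.length));
    }
}
```

With this stand-in charset the corrupted copy keeps the same length, because every byte that cannot be represented is replaced by a single '?'. That would be consistent with a file of the right size whose contents are unreadable. If the base64 text in the XML contains line breaks, Base64.getMimeDecoder() tolerates them, while the basic decoder used here does not.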

Related

Decoding B64 image in Tensorflow

I'm having a terrible time dealing with image en/de-coding in TensorFlow Java. I need to handle the B64 because I have a saved model from Google AutoML vision that expects that input format. Just to be explicit the Maven import is:
<dependency>
<groupId>org.tensorflow</groupId>
<artifactId>tensorflow-core-platform</artifactId>
<version>0.4.0</version>
</dependency>
and the following minimal example shows the root issue:
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import javax.activation.MimetypesFileTypeMap;
import org.apache.commons.codec.binary.Base64;
import org.tensorflow.Graph;
import org.tensorflow.Output;
import org.tensorflow.Session;
import org.tensorflow.op.image.DecodeJpeg;
import org.tensorflow.op.image.DecodeJpeg.Options;
import org.tensorflow.types.TString;
import org.tensorflow.types.TUint8;
public class tensorflowLoadMinimal{
public static void main(String[] args) throws Exception{
// Get a public JPG locally for example purposes
String imgUrl = "https://file-examples-com.github.io/"
+ "uploads/2017/10/file_example_JPG_100kB.jpg";
String localPath = "/tmp/imgFile.jpg";
InputStream in = new URL(imgUrl).openStream();
Files.copy(in, Paths.get(localPath), StandardCopyOption.REPLACE_EXISTING);
// Sanity checking the JPG; base64 encode
File f = new File(localPath);
System.out.println("Mime Type of " + f.getName() + " is " +
new MimetypesFileTypeMap().getContentType(f));
byte[] fileBytes = Files.readAllBytes(Paths.get(localPath));
String encodedString = Base64.encodeBase64String(fileBytes);
// Make b64 string a tensor; wrap in TF structs
Graph graph = new Graph();
Session s = new Session(graph);
TString tensor = TString.scalarOf(encodedString);
Output<TString> tensorAsOut = graph
.opBuilder("Const", "imgPixels", graph.baseScope())
.setAttr("dtype", tensor.dataType())
.setAttr("value", tensor)
.build()
.<TString> output(0);
// Try to decode b64 as Jpeg... and fail
Options[] opts = new Options[1];
opts[0] = DecodeJpeg.channels(3L);
DecodeJpeg dJpg = DecodeJpeg.create(graph.baseScope(), tensorAsOut, opts);
Output<TUint8> jpgOut = dJpg.image();
s.run(jpgOut);
s.close();
}
}
It confirms I have a JPG file, and then fails to do the decoding, complaining the input format is not an image file, with succinct output:
Mime Type of imgFile.jpg is image/jpeg
...
Exception in thread "main" org.tensorflow.exceptions.TFInvalidArgumentException: Unknown image file format. One of JPEG, PNG, GIF, BMP required.
[[{{node DecodeJpeg}}]]
at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:87)
...
at orc.tensorflowLoadMinimal.main(tensorflowLoadMinimal.java:55)
Where am I going wrong?
It reads:
Unknown image file format. One of JPEG, PNG, GIF, BMP required.
Which can be fixed either by removing this superfluous part:
// Try to decode b64 as Jpeg... and fail
Options[] opts = new Options[1];
opts[0] = DecodeJpeg.channels(3L);
DecodeJpeg dJpg = DecodeJpeg.create(graph.baseScope(), tensorAsOut, opts);
Output<TUint8> jpgOut = dJpg.image();
s.run(jpgOut);
s.close();
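... or by passing the expected parameter JPEG, likely into DecodeJpeg.create() or opts. Note, though, that DecodeJpeg expects the raw JPEG bytes as its input: a base64-encoded string is plain text, so it would have to be base64-decoded before it can be recognized as an image.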
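As a rough illustration of why a decoder would not recognize the base64 string as an image, the sketch below checks for the JPEG start-of-image marker (the bytes FF D8) before and after base64 encoding. It is plain Java and does not touch the TensorFlow API; the class name, the helper and the six sample bytes are made up for the example, and java.util.Base64 is used in place of the commons-codec class from the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class JpegMagicCheck {
    // True if the data begins with the JPEG start-of-image marker FF D8.
    static boolean looksLikeJpeg(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8;
    }

    public static void main(String[] args) {
        // Stand-in for the first bytes of a JPEG file (hypothetical sample)
        byte[] fileBytes = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 0x00, 0x10};

        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        byte[] encodedAsBytes = encoded.getBytes(StandardCharsets.US_ASCII);
        byte[] decodedAgain = Base64.getDecoder().decode(encoded);

        System.out.println("raw bytes look like JPEG:     " + looksLikeJpeg(fileBytes));
        System.out.println("base64 text looks like JPEG:  " + looksLikeJpeg(encodedAsBytes));
        System.out.println("decoded text looks like JPEG: " + looksLikeJpeg(decodedAgain));
    }
}
```

The base64 text starts with the characters "/9j/" rather than the marker bytes, so anything that sniffs the file format from the leading bytes sees text, not a JPEG.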

how to fill windows exif tags

Good evening,
I want to fill in the Windows file properties of a JPG photo.
Apparently these correspond to the following EXIF tags:
[Exif IFD0] Windows XP Title
[Exif IFD0] Windows XP Author
[Exif IFD0] Windows XP Subject
I looked through icafe.jar but have not found these tags.
Can I do this with icafe or another Java library?
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.ArrayList;
import com.icafe4j.image.meta.Metadata;
import com.icafe4j.image.meta.exif.Exif;
import com.icafe4j.image.meta.jpeg.JpegExif;
import com.icafe4j.image.meta.exif.ExifTag;
import com.icafe4j.image.tiff.TiffTag;
import com.icafe4j.image.tiff.FieldType;
fin = new FileInputStream(Fm_filePathIn);
fout = new FileOutputStream(Fm_filePathOut);
List<Metadata> metaList = new ArrayList<Metadata>();
metaList.add(populateExif(JpegExif.class));
Exif populateExif(Class<?> exifClass) throws IOException {
Exif exif = new JpegExif();
exif.addImageField(ExifTag.WINDOWS_XP_AUTHOR, FieldType.WINDOWSXP, "Toto");
exif.addImageField(ExifTag.WINDOWS_XP_KEYWORDS, FieldType.WINDOWSXP, "Copyright;Authorbisou");
// Insert ThumbNailIFD
// Since we don't provide thumbnail image, it will be created later from the input stream
exif.setThumbnailRequired(true);
return exif;
}
fin.close();
fout.close();
Those tags do exist in ICAFE, but they are defined as TiffTag, not ExifTag. Replace ExifTag with TiffTag and it will work. TestMetada.java shows this clearly.
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import com.icafe4j.image.meta.Metadata;
import com.icafe4j.image.meta.exif.Exif;
import com.icafe4j.image.meta.jpeg.JpegExif;
import com.icafe4j.image.meta.exif.ExifTag;
import com.icafe4j.image.tiff.TiffTag;
import com.icafe4j.image.tiff.FieldType;
public class TestWindowsXP {
public static void main(String[] args) throws IOException {
FileInputStream fin = new FileInputStream(Fm_filePathIn);
FileOutputStream fout = new FileOutputStream(Fm_filePathOut);
List<Metadata> metaList = new ArrayList<Metadata>();
Exif exif = new JpegExif();
exif.addImageField(TiffTag.WINDOWS_XP_AUTHOR, FieldType.WINDOWSXP, "Toto");
exif.addImageField(TiffTag.WINDOWS_XP_KEYWORDS, FieldType.WINDOWSXP, "Copyright;Authorbisou");
// Insert ThumbNailIFD
// Since we don't provide thumbnail image, it will be created later from the input stream
exif.setThumbnailRequired(true);
metaList.add(exif);
Metadata.insertMetadata(metaList, fin, fout);
fin.close();
fout.close();
}
}
Right-clicking the resulting image and opening its properties shows the information you wanted to insert.

Redis/java - writing and reading binary data

I'm trying to write a gzip file to Redis and read it back. The problem is that when I save the bytes I read back to a file and open it with gzip, the file is invalid. The strings also differ when I look at them in the Eclipse console.
Here's my code:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import redis.clients.jedis.Jedis;
public class TestRedis
{
public static void main(String[] args) throws IOException
{
String fileName = "D:/temp/test_write.gz";
String jsonKey = fileName;
Jedis jedis = new Jedis("127.0.0.1");
byte[] jsonContent = ReadFile(new File(fileName).getPath());
// test-write data we're storing in redis
FileOutputStream fostream = new FileOutputStream("D:/temp/test_write_before_redis.gz"); // looks ok
fostream.write(jsonContent);
fostream.close();
jedis.set(jsonKey.getBytes(), jsonContent);
System.out.println("writing, key: " + jsonKey + ",\nvalue: " + new String(jsonContent)); // looks ok
byte[] readJsonContent = jedis.get(jsonKey).getBytes();
String readJsonContentString = new String(readJsonContent);
FileOutputStream fos = new FileOutputStream("D:/temp/test_read.gz"); // invalid gz file :(
fos.write(readJsonContent);
fos.close();
System.out.println("n\nread json content from redis: " + readJsonContentString);
}
private static byte[] ReadFile(String aFilePath) throws IOException
{
Path path = Paths.get(aFilePath);
return Files.readAllBytes(path);
}
}
You are reading with Jedis.get(String), which performs an internal UTF-8 conversion, but writing with Jedis.set(byte[], byte[]), which does not. That mismatch is the likely cause. Try Jedis.get(byte[]) to read from Redis and skip the UTF-8 conversion, e.g.:
byte[] readJsonContent = jedis.get(jsonKey.getBytes());
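The damage done by such a conversion can be reproduced without Redis. The following is only a sketch in plain Java: it pushes the first four bytes of a gzip stream through a UTF-8 String and back, which is roughly what happens when binary data is fetched as a String and then turned back into bytes. The class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8RoundTrip {
    // Simulates binary data being returned as a String and converted back.
    static byte[] throughUtf8String(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // gzip magic number 1F 8B, compression method 08, flags 00
        byte[] gzipHeader = {0x1F, (byte) 0x8B, 0x08, 0x00};
        byte[] roundTripped = throughUtf8String(gzipHeader);

        System.out.println("original length:      " + gzipHeader.length);
        System.out.println("round-tripped length: " + roundTripped.length);
        System.out.println("identical:            " + Arrays.equals(gzipHeader, roundTripped));
    }
}
```

The byte 0x8B is not valid UTF-8 on its own, so it is replaced by the replacement character U+FFFD, which encodes back to three bytes. The gzip magic number is gone and the data gets longer, which is why the file written after the read is rejected.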

How to return binary data from AWS Lambda written in Java

Given that it is now possible to handle binary data in Amazon Api Gateway and Amazon Lambda, I wanted to try to make an Amazon Lambda endpoint which returned an Excel spreadsheet. It is entirely possible to do so using node/js, as demonstrated here. Unfortunately, any time I try to do this using Java, it falls to pieces.
My initial attempt was to create a simple workbook using apache XSSFWorkbook, write it to the output stream provided by RequestStreamHandler, and done.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
public class FileRequestHandler implements RequestStreamHandler {
public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context)
throws IOException {
Workbook wb = new XSSFWorkbook();
String sheetName = "Problem sheet";
wb.createSheet(sheetName);
wb.write(outputStream);
}
}
When tested locally, the output stream can be piped to a file resulting in a valid output excel file.
import com.amazonaws.util.StringInputStream;
import org.junit.Test;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
public class FileRequestHandlerTest {
@Test
public void shouldCreateExcelFile() throws IOException {
FileRequestHandler fileRequestHandler = new FileRequestHandler();
InputStream inputStream = new StringInputStream("hello world");
String fileName = "FileRequestLambda";
String path = fileName + ".xlsx";
FileOutputStream fileOutputStream = new FileOutputStream(path);
fileRequestHandler.handleRequest(inputStream, fileOutputStream, TestUtils.createContext());
fileOutputStream.close();
}
}
But when I run it in Amazon Lambda, I get malformed binary output:
PK ... _rels/.rels ... [Content_Types].xml ... docProps/app.xml ... docProps/core.xml ... xl/sharedStrings.xml ... xl/styles.xml ... xl/workbook.xml ... xl/_rels/workbook.xml.rels ... xl/worksheets/sheet1.xml ... (binary ZIP data rendered as garbled text; only the entry names are readable)
The output is about 5KB in size, while the output on my local computer is about 3KB. This appears to be a problem with binary output in general for Java on Amazon Lambda. When I run code that writes an image to the output stream, it also works locally, but on Amazon Lambda the result is a garbled image twice the size.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
import java.io.*;
import java.net.URL;
public class ImageRequestHandler implements RequestStreamHandler {
public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context)
throws IOException {
String address = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1d/AmazonWebservices_Logo.svg/580px-AmazonWebservices_Logo.svg.png";
URL url = new URL(address);
InputStream in = new BufferedInputStream(url.openStream());
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
int n;
while (-1!=(n=in.read(buf)))
{
out.write(buf, 0, n);
}
out.close();
in.close();
byte[] response = out.toByteArray();
outputStream.write(response);
}
}
The types of the input and output streams are:
lambdainternal.util.NativeMemoryAsInputStream
lambdainternal.util.LambdaByteArrayOutputStream
Help?
I had the same problem returning a JPG image from Amazon Lambda and found a workaround.
You need to wrap the output stream in a base64 encoder:
OutputStream encodedStream = Base64.getEncoder().wrap(outputStream);
encodedStream.write(response);
encodedStream.close();
Then you need to update Method Response and Integration Response of your function as described here: AWS Gateway API base64Decode produces garbled binary?
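For anyone who wants to try the wrapping step outside Lambda, here is a small sketch that uses a ByteArrayOutputStream as a stand-in for the stream Lambda provides. The class name, the helper method and the eight sample bytes (a PNG signature) are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class Base64StreamDemo {
    // Writes the payload to the target stream as base64 text.
    // Closing the wrapper flushes the final padded group and also closes the target.
    static void writeBase64(byte[] payload, OutputStream target) throws IOException {
        try (OutputStream encoded = Base64.getEncoder().wrap(target)) {
            encoded.write(payload);
        }
    }

    public static void main(String[] args) throws IOException {
        // PNG signature bytes, used here only as sample binary data
        byte[] payload = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeBase64(payload, sink);

        String text = new String(sink.toByteArray(), StandardCharsets.US_ASCII);
        byte[] decoded = Base64.getDecoder().decode(text);

        System.out.println("base64 text: " + text);
        System.out.println("decodes back to payload: " + Arrays.equals(payload, decoded));
    }
}
```

The close() call matters: without it the last partial group and its padding are never written, and the receiver would get truncated base64.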

Convert DOCX to HTML including IMAGES

I am using docx4j to convert DOCX to HTML. The conversion works and I get the HTML output, which I will embed as the body of an email. But I have some issues, listed below:
Unable to display images in email body
Losing the spaces and bullets
Here is the code I have written:
WordprocessingMLPackage wordMLPackage;
wordMLPackage = Docx4J.load(new java.io.File(resourcePath2));
HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
htmlSettings.setImageDirPath(imageFolder + resourcePath2 + "_files");
htmlSettings.setImageTargetUri(imageFolder +resourcePath2.substring(resourcePath2.lastIndexOf("/")+1) + "_files");
htmlSettings.setWmlPackage(wordMLPackage);
OutputStream os;
os = new ByteArrayOutputStream();
Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);
Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_SAVE_FLAT_XML);
DOCX = ((ByteArrayOutputStream)os).toString();
You could use something like this instead (it uses the Apache POI XWPF converter rather than docx4j):
package tcg.doc.web.managedBeans;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.core.FileURIResolver;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;
@Component
@Scope("session")
@Qualifier("ConvertWord")
public class ConvertWord {
private static final String docName = "TestDocx.docx";
private static final String outputlFolderPath = "d:/";
String htmlNamePath = "docHtml.html";
String zipName="_tmp.zip";
File docFile = new File(outputlFolderPath+docName);
File zipFile = new File(zipName);
public void ConvertWordToHtml() {
try {
// 1) Load DOCX into XWPFDocument
InputStream doc = new FileInputStream(new File(outputlFolderPath+docName));
System.out.println("InputStream"+doc);
XWPFDocument document = new XWPFDocument(doc);
// 2) Prepare XHTML options (here we set the IURIResolver to load images from a "word/media" folder)
XHTMLOptions options = XHTMLOptions.create(); //.URIResolver(new FileURIResolver(new File("word/media")));;
// Extract image
String root = "target";
File imageFolder = new File( root + "/images/" + doc );
options.setExtractor( new FileImageExtractor( imageFolder ) );
// URI resolver
options.URIResolver( new FileURIResolver( imageFolder ) );
OutputStream out = new FileOutputStream(new File(htmlPath()));
XHTMLConverter.getInstance().convert(document, out, options);
System.out.println("OutputStream "+out.toString());
} catch (FileNotFoundException ex) {
} catch (IOException ex) {
}
}
public static void main(String[] args) {
ConvertWord cwoWord=new ConvertWord();
cwoWord.ConvertWordToHtml();
System.out.println();
}
public String htmlPath(){
// d:/docHtml.html
return outputlFolderPath+htmlNamePath;
}
public String zipPath(){
// d:/_tmp.zip
return outputlFolderPath+zipName;
}
}
Maven dependency for pom.xml:
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
<version>1.0.4</version>
</dependency>
or download it from Here
For images to work in an email body, I guess you need to use either a data URI or publish them to a web-reachable location.
In either case, you'll need to write an implementation of:
public interface ConversionImageHandler {
/**
* @param picture
* @param relationship of the image
* @param part of the image, if it is an internal image, otherwise null
* @return uri for the image we've saved, or null
* @throws Docx4JException this exception will be logged, but not propagated
*/
public String handleImage(AbstractWordXmlPicture picture, Relationship relationship, BinaryPart part) throws Docx4JException;
}
and configure docx4j to use it with htmlSettings.setImageHandler.
You can look at some of the existing implementations in the docx4j source code, and take advantage of the helper methods in AbstractConversionImageHandler (eg createEncodedImage if you want data URIs).
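To show what a data URI looks like, here is a sketch in plain Java that builds one from image bytes. It does not use the docx4j API; the class name, the helper method and the sample bytes are hypothetical, and in a real handler the bytes would come from the image part being converted.

```java
import java.util.Base64;

class DataUriDemo {
    // Builds a data URI that embeds the image bytes directly in the HTML.
    static String toDataUri(String mimeType, byte[] imageBytes) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        // PNG signature bytes standing in for a real image (hypothetical sample)
        byte[] imageBytes = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

        String uri = toDataUri("image/png", imageBytes);
        String html = "<img src=\"" + uri + "\" alt=\"embedded image\"/>";

        System.out.println(html);
    }
}
```

A string of this shape is what an image handler would return as the URI for the image. Support for data URIs varies between email clients, so publishing the images to a web-reachable location may still be the safer of the two options.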
