Base64 encode and decode not give same result - java

I have been banging my head against a wall on this all day. I have a PDF file that we generate. The PDF file looks fine in Acrobat.
I need to encode the file in base64. Using Apache codec library I do this:
String base64buf = Base64.encodeBase64String( m_reportText.getBytes( "UTF-8" ) );
As a test I write base64buf out to a file:
Files.write( new File( "report.b64" ).toPath(), base64buf.getBytes( "UTF-8") );
Then I convert it back, just to see if it is working:
String encodedName = "report.b64";
String decodedName = "report.pdf";
// Read original file.
byte[] encodedBuffer = Files.readAllBytes( new File( encodedName ).toPath() );
// Decode
byte[] decodedBuffer = Base64.decodeBase64( encodedBuffer );
// Write out decodedBuffer.
FileOutputStream outputStream = new FileOutputStream( decodedName );
outputStream.write( decodedBuffer );
outputStream.close();
I open report.pdf in Acrobat and it is a blank document. It has the correct number of pages (all are blank).
What am I missing here?

m_reportText is a String, hence contains Unicode text. However a PDF is in general binary data. That should really be avoided, as the superfluous conversion in both directions is lossy and error prone. For a hack you could try storing and retrieving the PDF bytes as ISO-8859-1.
Use a byte[] m_reportText.

Related

Umlauts get lost on another system after encoding and decoding Base64

Giving the following implementation I face the problem that, on another system, the XML file is missing the Umlaute (ä, ü, ö) compared to the origin XML file. Instead of the Umlaute the replacement character is inserted in the XML file. (0xEF 0xBF 0xBD (efbfbd))
Get a zip file containing a XML with Umlauts
Decompress the zip file
Encode the xml content to a Base64 payload and save it to the db
Querys the entity
Get the Base64 payload
Decode the Base64 content
Decoded Base64 content is a XML which should contain the origin Umlauts
Whats driving me crazy is the fact that the decoded Base64 content is missing the Umlaute on another system. Instead of the umlaute I get the replacement character. On my system the same implementation is working without the replacement.
The following code is just a MCVE to explain the problem which works fine on my system but on a other system (Windows Server 2013) misses the umlaute after decode.
String requestUrl = "https://myserver/mypath/Message_166741.zip";
HttpGet httpget = new HttpGet(String requestUrl = "https://myserver/mypath/Message_166741.zip";);
HttpResponse response = httpClient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream inputStream = entity.getContent();
byte[] decompressedInputStream = decompress(inputStream);
String content = null;
content = new String(decompressedInputStream, StandardCharsets.UTF_8);
String originFileName = new SimpleDateFormat("yyyyMMddHHmm'_origin.xml'").format(new Date());
String originFileNameWithPath = String.format("C:\\temp\\Tests\\%1$s", originFileName);
// File contains the expected umlauts
FileUtils.writeStringToFile(new File(originFileNameWithPath), content);
String payloadUTF8 = Base64.encodeBase64String(ZipUtils.compress(content.getBytes("UTF-8")));
String payload = Base64.encodeBase64String(ZipUtils.compress(content.getBytes()));
String payloadJavaBase64 = new String(java.util.Base64.getEncoder().encode(ZipUtils.compress(content.getBytes())));
String xmlMessageJavaBase64;
byte[] compressedBinaryJavaBase64 = java.util.Base64.getDecoder().decode(payloadJavaBase64);
byte[] decompressedBinaryJavaBase64= ZipUtils.decompress(compressedBinaryJavaBase64);
xmlMessageJavaBase64 = new String(decompressedBinaryJavaBase64, "UTF-8");
String xmlMessageUTF8;
byte[] compressedBinaryUTF8 = java.util.Base64.getDecoder().decode(payloadUTF8);
byte[] decompressedBinaryUTF8 = ZipUtils.decompress(compressedBinaryUTF8);
xmlMessageUTF8 = new String(decompressedBinaryUTF8, "UTF-8");
String xmlMessage;
byte[] compressedBinary = java.util.Base64.getDecoder().decode(payload);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
xmlMessage = new String(decompressedBinary, "UTF-8");
String processedFileName = new SimpleDateFormat("yyyyMMddHHmm'_processed.xml'").format(new Date());
String processedFileNameUTF8 = new SimpleDateFormat("yyyyMMddHHmm'_processedUTF8.xml'").format(new Date());
String processedFileNameJavaBase64 = new SimpleDateFormat("yyyyMMddHHmm'_processedJavaBase64.xml'").format(new Date());
// These files do not contain the umlauts anymore.
// Instead of the umlauts a replacement character is inserted (0xEF 0xBF 0xBD (efbfbd))
String processedFileNameWithPath = String.format("C:\\temp\\Tests\\%1$s", processedFileName);
String processedFileNameWithPathUTF8 = String.format("C:\\temp\\Tests\\%1$s", processedFileNameUTF8);
String processedFileNameWithPathJavaBase64 = String.format("C:\\temp\\Tests\\%1$s", processedFileNameJavaBase64);
FileUtils.writeStringToFile(new File(processedFileNameWithPath), xmlMessage);
FileUtils.writeStringToFile(new File(processedFileNameWithPathUTF8), xmlMessageUTF8);
FileUtils.writeStringToFile(new File(processedFileNameWithPathJavaBase64), xmlMessageJavaBase64);
The three files are just for testing purpose but I hope you getting the problem
Edit
Both ways create XML file with ü, ö, ä on my machine
Only the WITHOUT implementation create an XML XML file with ü, ö, ä on another system The "content" string of WITH UTF-8 contains for ü =>
// WITHOUT UTF-8 IN BYTE[] => STRING CTOR
byte[] dci = decompress(inputStream);
content = new String(dci);
byte[] compressedBinary = java.util.Base64.getDecoder().decode(content);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
String xml = new String(decompressedBinary);
// WITH UTF-8 IN BYTE[] => STRING CTOR
byte[] dci = decompress(inputStream);
content = String(dci, StandardCharsets.UTF_8);;
byte[] compressedBinary = java.util.Base64.getDecoder().decode(content);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
String xml = new String(decompressedBinary, "UTF-8");
Edit #2
There also seems to be a difference between running the code in IntelliJ and outside of IntelliJ on my machine. Did not know that this makes such a huge difference. So - if I run the code outside of IntelliJ (java.exe -jar myjarfile) the WITH UTF8 Part replaces the Ü. with ... I don't know. Notepad++ shows xFC. Funny: My raspberry pi shows both files with Ü where my Windows / notepad++ shows xFC.
That whole thing confuses me and I would like to understand whats the problem is. Also because the XML file contains the UTF8 as encode in header.
Edit #3 Final Solution
// ## SERVER
// Get ZIP from request URL
HttpGet httpget = new HttpGet(requestUrl);
HttpResponse response = httpClient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream inputStream = entity.getContent();
byte[] decompressedInputStream = decompress(inputStream);
// Produces a XML string which SHOULD contain ü, ö, ä
String xmlOfZipFileContent = new String(decompressedInputStream, StandardCharsets.UTF_8);
// Just for testing write to file
String xmlOfZipFileSavePath = String.format("C:\\temp\\Tests\\%1$s", new SimpleDateFormat("yyyyMMddHHmm'_original.xml'").format(new Date()));
FileUtils.writeStringToFile(new File(xmlOfZipFileSavePath), xmlOfZipFileContent, StandardCharsets.UTF_8);
// The payloadExplicitUtf8 gets stored into the DB
String payload = java.util.Base64.getEncoder().encodeToString(ZipUtils.compress(xmlOfZipFileContent.getBytes(StandardCharsets.UTF_8)));
// Store payload to db
// Client queries database and gets the payload
// payload = dbEntity.get().payload
// The following three lines is on client
byte[] compressedBinaryPayload = java.util.Base64.getDecoder().decode(payload);
byte[] decompressedBinaryPayload = ZipUtils.decompress(compressedBinaryPayload);
String xmlMessageOutOfPayload = new String(decompressedBinaryPayload, StandardCharsets.UTF_8);
String xmlOfPayloadSavePath = String.format("C:\\temp\\Tests\\%1$s", new SimpleDateFormat("yyyyMMddHHmm'_payload.xml'").format(new Date()));
FileUtils.writeStringToFile(new File(xmlOfPayloadSavePath), xmlMessageOutOfPayload, StandardCharsets.UTF_8);
If I understood correctly, your situation seems to be the following:
// Decompress data from the server, it's in ISO-8859-1 or similar 1 byte encoding
byte[] dci = decompress(inputStream);
// Data gets corrupted because of wrong charset
// This is where ü gets converted to unicode replacement character
content = new String(dci, StandardCharsets.UTF_8);
The rest of the code uses UTF8 explicitly, but it doesn't matter as the data has already been corrupted at this point. In the end you expect an UTF-8 encoded file.
Also because the XML file contains the UTF8 as encode in header.
That doesn't prove anything. If you treat it as just a text file, you can write it out in as many encodings as you want to, and it would still claim to be UTF8.
InputStream inputStream = entity.getContent();
byte[] decompressedInputStream = decompress(inputStream);
Fine, and it is assumed that the bytes are in UTF-8, as:
String content = new String(decompressedInputStream, StandardCharsets.UTF_8);
Should the bytes not be in UTF-8, you could try Windows Latin-1:
Charset.forName("Windows-1252")
Otherwise decompressedInputStream can be used whereever content is converted to bytes in UTF-8.
...
The FileUtils.writeStringToFile without encoding specified uses the default platform encoding.
// File contains the expected umlauts
//FileUtils.writeStringToFile(new File(originFileNameWithPath), content);
Better is to ensure that UTF-8 is written. Either add the encoding to convert the Unicode String to bytes in UTF-8, or simply write the original bytes:
Files.write(Paths.get(originFileNameWithPath), decompressedInputStream);
Also the Base64 encoded UTF-8 bytes of the String should be used:
String payloadUTF8 = Base64.encodeBase64String(ZipUtils.compress(
content.getBytes(StandardCharsets.UTF_8)));
String payloadJavaBase64 = new String(java.util.Base64.getEncoder().encode(
ipUtils.compress(content.getBytes(StandardCharsets.UTF_8))));
The standard JavaSE Base64 will do; though do not use its decodeString and encodeString as that uses ISO-8859-1, Latin-1.

base 64 decode and write to doc file

I have a base64 encoded String . Which looks like this
UEsDBBQABgAIAAAAIQDhD46/jQEAACkGAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAACAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA .
I decodeD the String and write it to word file using FileWriter. But When I tried opening the doc file i get an error saying corrupt data.
I would like to know what are the steps I need to follow to write the content to a word document after decoding the data . below is the code what I did and went wrong.
byte[] encodedBytes = stringBase64.getBytes();
byte[] decodedBytes = Base64.decodeBase64(encodedBytes);
String decodeString = new String(decodedBytes);
filewriter = new java.io.FileWriter("F:\xxx.docx”);
BufferedWriter bw = new BufferedWriter(fw);
bw.write(decodeString);
The decoded data isn't plain text data - it's just binary data. So write it with a FileStream, not a FileWriter:
// If your Base64 class doesn't have a decode method taking a string,
// find a better one!
byte[] decodedBytes = Base64.decodeBase64(stringBase64);
// Note the try-with-resources block here, to close the stream automatically
try (OutputStream stream = new FileOutputStream("F:\\xxx.doc")) {
stream.write(decodedBytes);
}
Or even better:
byte[] decodedBytes = Base64.decodeBase64(stringBase64);
Files.write(Paths.get("F:\\xxx.doc"), decodedBytes);
Please have a look.
byte[] encodedBytes = /* your encoded bytes*/
// Decode data on other side, by processing encoded data
byte[] decodedBytes= Base64.decodeBase64(encodedBytes );
String yourValue=new String(decodedBytes);
System.out.println("Decoded String is " + yourValue);
Now further you can write this string into a file and read further.

Trying to Change the Encdoing of a File in Java is Doubling the Contents of the File

I have a FileOutputStream in java that is reading the contents of UDP packets and saving them to a file. At the end of reading them, I sometimes want to convert the encoding of the file. The problem is that currently when doing this, it just ends up doubling all the contents of the file. The only workaround that I could think to do would be to create a temp file with the new encoding and then save it as the original file, but this seems too hacky.
I must be just overlooking something in my code:
if(mode.equals("netascii")){
byte[] convert = new byte[(int)file.length()];
FileInputStream input = new FileInputStream(file);
input.read(convert);
String temp = new String(convert);
convert = Charset.forName("US-ASCII").encode(temp).array();
fos.write(convert);
}
JOptionPane.showMessageDialog(frame, "Read Successful!");
fos.close();
}
Is there anything suspect?
Thanks in advance for any help!
The problem is the array of bytes you've read from the InputStream will be converted as if its ascii chars, which I'm assuming its not. Specify the InputStream encoding when converting its bytes to String and you'll get a standard Java string.
I've assumed UTF-16 as the InputStream's encoding here:
byte[] convert = new byte[(int)file.length()];
FileInputStream input = new FileInputStream(file);
// read file bytes until EOF
int r = input.read(convert);
while(r!=-1) r = input.read(convert,r,convert.length);
String temp = new String(convert, Charset.forName("UTF-16"));

Byte Order Mark to read UTF-8 cvs and excel files

Friends
I have to use BOM ( Byte Order Mark ) to make sure that downloaded files in cvs and excel in UTF-8 formats are displayed properly.
**
My question is can BOM be applied to FileOutputStream instead of
OutputStream as below ?
**
String targetfileName = reportXMLFileName.substring(0, reportXMLFileName.lastIndexOf(".") + 1);
targetfileName += "xlsx";
File tmpFile=new File((filePath !=null?filePath.trim():"")+"/"+templateFile);
FileOutputStream out = new FileOutputStream((filePath !=null?filePath.trim():"")+"/"+targetfileName);
/* Here is the example*/
out.write(new byte[] {(byte)0xEF, (byte)0xBB, (byte)0xBF });
substitute(tmpFile, tmp, sheetRef.substring(1), out);
out.close();
(As asked in comment.)
In the following file may be File, (File)OutputStream or String (filename).
final String ENCODING = "UTF-8";
PrintWriter out = new PrintWriter(file, ENCODING))));
try {
out.print("\ufffe"); // Write the BOM
out.println("<?xml version=\"1.0\" encoding=\"" + ENCODING + "\"?>");
...
} finally {
out.close();
}
Or since Java 7:
try (PrintWriter out = new PrintWriter(f, ENCODING))))) {
out.print("\ufffe"); // Write the BOM
out.println("<?xml version=\"1.0\" encoding=\"" + ENCODING + "\"?>");
...
}
Working with text, makes it more natural to use a Writer, as one does not need oneself to convert string with String.getBytes("UTF-8").
Yes. A BOM sits at the beginning of any data stream, whether over the network or in a file. Just write the BOM to the file at the beginning in the same manner, just to a FileOutputStream. Anyway, remember that a FileOutputStream is a type of OutputStream.

Java - Image encoding in XML

I thought I would find a solution to this problem relatively easily, but here I am calling upon the help from ye gods to pull me out of this conundrum.
So, I've got an image and I want to store it in an XML document using Java. I have previously achieved this in VisualBasic by saving the image to a stream, converting the stream to an array, and then VB's xml class was able to encode the array as a base64 string. But, after a couple of hours of scouring the net for an equivalent solution in Java, I've come back empty handed. The only success I have had has been by:
import it.sauronsoftware.base64.*;
import java.awt.image.BufferedImage;
import org.w3c.dom.*;
...
BufferedImage img;
Element node;
...
java.io.ByteArrayOutputStream os = new java.io.ByteArrayOutputStream();
ImageIO.write(img, "png", os);
byte[] array = Base64.encode(os.toByteArray());
String ss = arrayToString(array, ",");
node.setTextContent(ss);
...
private static String arrayToString(byte[] a, String separator) {
StringBuffer result = new StringBuffer();
if (a.length > 0) {
result.append(a[0]);
for (int i=1; i<a.length; i++) {
result.append(separator);
result.append(a[i]);
}
}
return result.toString();
}
Which is okay I guess, but reversing the process to get it back to an image when I load the XML file has proved impossible. If anyone has a better way to encode/decode an image in an XML file, please step forward, even if it's just a link to another thread that would be fine.
Cheers in advance,
Hoopla.
I've done something similar (encoding and decoding in Base64) and it worked like a charm. Here's what I think you should do, using the class Base64 from the Apache Commons project:
// ENCODING
BufferedImage img = ImageIO.read(new File("image.png"));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(img, "png", baos);
baos.flush();
String encodedImage = Base64.encodeToString(baos.toByteArray());
baos.close(); // should be inside a finally block
node.setTextContent(encodedImage); // store it inside node
// DECODING
String encodedImage = node.getTextContent();
byte[] bytes = Base64.decode(encodedImage);
BufferedImage image = ImageIO.read(new ByteArrayInputStream(bytes));
Hope it helps.
Apache Commons has a Base64 class that should be helpful to you:
From there, you can just write out the bytes (they are already in a readable format)
After you get your byte array
byte[] array = Base64.encode(os.toByteArray());
use an encoded String :
String encodedImg = new String( array, "utf-8");
Then you can do fun things in your xml like
<binImg string-encoding="utf-8" bin-encoding="base64" img-type="png"><![CDATA[ encodedIImg here ]]></binImg>
With Java 6, you can use DatatypeConverter to convert a byte array to a Base64 string:
byte[] imageData = ...
String base64String = DatatypeConverter.printBase64Binary(imageData);
And to convert it back:
String base64String = ...
byte[] imageData = DatatypeConverter.parseBase64Binary(base64String);
Your arrayToString() method is rather bizarre (what's the point of that separator?). Why not simply say
String s = new String(array, "US-ASCII");
The reverse operation is
byte[] array = s.getBytes("US-ASCII");
Use the ASCII encoding, which should be sufficient when dealing with Base64 encoded data. Also, I'd prefer a Base64 encoder from a reputable source like Apache Commons.
You don't need to invent your own XML data type for this. XML schema defines standard binary data types, such as base64Binary, which is exactly what you are trying to do.
Once you use the standard types, it can be converted into binary automatically by some parsers (like XMLBeans). If your parser doesn't handle it, you can find classes for base64Binary in many places since the datatype is widely used in SOAP, XMLSec etc.
most easy implementation I was able to made is as below, And this is from Server to Server XML transfer containing binary data Base64 is from the Apache Codec library:
- Reading binary data from DB and create XML
Blob blobData = oRs.getBlob("ClassByteCode");
byte[] bData = blobData.getBytes(1, (int)blobData.length());
bData = Base64.encodeBase64(bData);
String strClassByteCode = new String(bData,"US-ASCII");
on requesting server read the tag and save it in DB
byte[] bData = strClassByteCode.getBytes("US-ASCII");
bData = Base64.decodeBase64(bData);
oPrStmt.setBytes( ++nParam, bData );
easy as it can be..
I'm still working on implementing the streaming of the XML as it is generated from the first server where the XML is created and stream it to the response object, this is to take care when the XML with binary data is too large.
Vishesh Sahu
The basic problem is that you cannot have an arbitrary bytestream in an XML document, so you need to encode it somehow. A frequent encoding scheme is BASE64, but any will do as long as the recipient knows about it.
I know that the question was aking how to encode an image via XML, but it is also possible to just stream the bytes via an HTTP GET request instead of using XML and encoding an image. Note that input is a FileInputStream.
Server Code:
File f = new File(uri_string);
FileInputStream input = new FileInputStream(f);
OutputStream output = exchange.getResponseBody();
int c = 0;
while ((c = input.read()) != -1) {
output.write(c); //writes each byte to the exchange.getResponseBody();
}
result = new DownloadFileResult(int_list);
if (input != null) {input.close();}
if (output != null){ output.close();}
Client Code:
InputStream input = connection.getInputStream();
List<Integer> l = new ArrayList<>();
int b = 0;
while((b = input.read()) != -1){
l.add(b);//you can do what you wish with this list of ints ie- write them to a file. see code below.
}
Here is how you would write the Integer list to a file:
FileOutputStream out = new FileOutputStream("path/to/file.png");
for(int i : result_bytes_list){
out.write(i);
}
out.close();
node.setTextContent( base64.encodeAsString( fileBytes ) )
using org.apache.commons.codec.binary.Base64

Categories