org.xmlpull.v1.XmlPullParserException - java

I'm trying to bind an XML file (as a byte[]) to a Java object. This is my code:
public void inputConfigXML(String xmlfile, byte[] xmlData) {
    IBindingFactory bFact = BindingDirectory.getFactory(GroupsDTO.class);
    IUnmarshallingContext uctx = bFact.createUnmarshallingContext();
    groups = (GroupsDTO) uctx.unmarshalDocument(new ByteArrayInputStream(xmlData), "UTF8");
}
The unmarshalDocument() call is giving me this exception. What do I do?
FYI: Running as JUnit test case
The following is the stacktrace -
Error parsing document (line 1, col 1)
org.xmlpull.v1.XmlPullParserException: only whitespace content allowed before start tag and not \u0 (position: START_DOCUMENT seen \u0... #1:1)
at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1519)
at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1395)
at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
at org.jibx.runtime.impl.XMLPullReaderFactory$XMLPullReader.next(XMLPullReaderFactory.java:291)
at org.jibx.runtime.impl.UnmarshallingContext.toStart(UnmarshallingContext.java:451)
at org.jibx.runtime.impl.UnmarshallingContext.unmarshalElement(UnmarshallingContext.java:2755)
at org.jibx.runtime.impl.UnmarshallingContext.unmarshalDocument(UnmarshallingContext.java:2905)
at abc.dra.DRAAPI.inputConfigXML(DRAAPI.java:31)
at abc.dra.XMLToObject_Test.test(XMLToObject_Test.java:34)
[...]
This is my code that forms the byte[]:
void test() {
    String xmlfile = "output.xml";
    File file = new File(xmlfile);
    byte[] xmlData = new byte[(int) file.length()];
    groups = dra.inputConfigXML(xmlfile, xmlData);
}

The ByteArrayInputStream is empty:
only whitespace content allowed before start tag and not \u0
(position: START_DOCUMENT seen \u0... #1:1)
means that a \u0 byte was found as the first character of the XML.
Ensure you have content within your byte[] and that the UTF-8 doesn't start with a BOM.
I don't think the BOM is your problem here, but I have often encountered issues regarding BOMs and Java.
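If you do need to guard against a BOM, a minimal sketch (my own illustration, not part of the original answer; xmlData is the parameter from the question) could strip the three UTF-8 BOM bytes (EF BB BF) before unmarshalling:
if (xmlData.length >= 3
        && (xmlData[0] & 0xFF) == 0xEF
        && (xmlData[1] & 0xFF) == 0xBB
        && (xmlData[2] & 0xFF) == 0xBF) {
    xmlData = java.util.Arrays.copyOfRange(xmlData, 3, xmlData.length); // drop the BOM
}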
Update
You don't fill the byte[]. You have to read the file content into the byte[]:
read this: File to byte[] in Java
By the way: byte[] xmlData = new byte[(int) file.length()]; is bad code style, because you will run into problems with larger XML files. If they are larger than Integer.MAX_VALUE you will read a corrupt file.
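For example, a minimal sketch using java.nio (assuming the whole file fits in memory, java.nio.file.Files/Paths are imported, and dra/xmlfile are the names from the question's test()):
byte[] xmlData = Files.readAllBytes(Paths.get("output.xml")); // actually fills the array with the file content
groups = dra.inputConfigXML(xmlfile, xmlData);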

Hari,
JiBX needs characters as input. I think you have specified your encoding incorrectly. Try this code instead:
FileInputStream fis = new FileInputStream("output.xml");
InputStreamReader isr = new InputStreamReader(fis, "UTF8");
groups = (GroupsDTO) uctx.unmarshalDocument(isr);
If you must use the code you have written, I would try outputting the text to the console (System.out.println(xxx)) to make sure you are decoding the UTF-8 correctly.
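For instance, a quick sanity check could look like this (a sketch; xmlData is the byte[] passed into inputConfigXML):
System.out.println(new String(xmlData, StandardCharsets.UTF_8)); // should print the XML document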
Don

Go to the mvn repository path and delete that folder for the XML file.

Related

FileChannel not writing special characters properly

I'm trying to write some text into a file using a FileChannel. So far everything works fine except for the fact that umlauts are not written correctly.
Path fileChannel = Paths.get("c:/channel.txt");
try (FileChannel channel = FileChannel.open(
fileChannel,
StandardOpenOption.CREATE,
StandardOpenOption.READ,
StandardOpenOption.WRITE)) {
String response = "Würzburg";
channel.write(ByteBuffer.wrap(response.getBytes(StandardCharsets.UTF_8)));
}
In this example, I want to write Würzburg into the file, but when I open it, it contains the following: WÃ¼rzburg. The file itself is UTF-8.
Any suggestions, what could be done?
Edit:
Finally, I would like to read out the file again, for example like this:
try (FileChannel channel = FileChannel.open(fileChannel, StandardOpenOption.READ)) {
byte[] buffer = new byte[(int) channel.size()];
ByteBuffer bb = ByteBuffer.wrap(buffer);
channel.read(bb);
String request = new String(buffer, StandardCharsets.UTF_8);
System.out.println(request);
}
However, a comparison of the strings response and request shows that they are not identical.
Running on a Windows machine, Java 17 (IntelliJ configured with Adoptium OpenJDK).
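One way to narrow this down (a sketch, not necessarily the fix) is to read the file back as UTF-8 text with java.nio and compare it to the original string; this separates an editor/display issue from a genuine problem in the write or read code:
String onDisk = Files.readString(Path.of("c:/channel.txt")); // decodes as UTF-8 by default
System.out.println(onDisk.equals("Würzburg")); // true means the bytes on disk are correct UTF-8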

Umlauts get lost on another system after encoding and decoding Base64

Given the following implementation, I face the problem that, on another system, the XML file is missing the umlauts (ä, ü, ö) compared to the original XML file. Instead of the umlauts, the replacement character is inserted into the XML file (0xEF 0xBF 0xBD).
1. Get a zip file containing an XML with umlauts
2. Decompress the zip file
3. Encode the XML content to a Base64 payload and save it to the DB
4. Query the entity
5. Get the Base64 payload
6. Decode the Base64 content
7. The decoded Base64 content is an XML which should contain the original umlauts
What's driving me crazy is the fact that the decoded Base64 content is missing the umlauts on another system; instead of the umlauts I get the replacement character. On my system the same implementation works without the replacement.
The following code is just an MCVE to explain the problem. It works fine on my system, but on another system (Windows Server 2013) the umlauts are missing after decoding.
String requestUrl = "https://myserver/mypath/Message_166741.zip";
HttpGet httpget = new HttpGet(requestUrl);
HttpResponse response = httpClient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream inputStream = entity.getContent();
byte[] decompressedInputStream = decompress(inputStream);
String content = null;
content = new String(decompressedInputStream, StandardCharsets.UTF_8);
String originFileName = new SimpleDateFormat("yyyyMMddHHmm'_origin.xml'").format(new Date());
String originFileNameWithPath = String.format("C:\\temp\\Tests\\%1$s", originFileName);
// File contains the expected umlauts
FileUtils.writeStringToFile(new File(originFileNameWithPath), content);
String payloadUTF8 = Base64.encodeBase64String(ZipUtils.compress(content.getBytes("UTF-8")));
String payload = Base64.encodeBase64String(ZipUtils.compress(content.getBytes()));
String payloadJavaBase64 = new String(java.util.Base64.getEncoder().encode(ZipUtils.compress(content.getBytes())));
String xmlMessageJavaBase64;
byte[] compressedBinaryJavaBase64 = java.util.Base64.getDecoder().decode(payloadJavaBase64);
byte[] decompressedBinaryJavaBase64= ZipUtils.decompress(compressedBinaryJavaBase64);
xmlMessageJavaBase64 = new String(decompressedBinaryJavaBase64, "UTF-8");
String xmlMessageUTF8;
byte[] compressedBinaryUTF8 = java.util.Base64.getDecoder().decode(payloadUTF8);
byte[] decompressedBinaryUTF8 = ZipUtils.decompress(compressedBinaryUTF8);
xmlMessageUTF8 = new String(decompressedBinaryUTF8, "UTF-8");
String xmlMessage;
byte[] compressedBinary = java.util.Base64.getDecoder().decode(payload);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
xmlMessage = new String(decompressedBinary, "UTF-8");
String processedFileName = new SimpleDateFormat("yyyyMMddHHmm'_processed.xml'").format(new Date());
String processedFileNameUTF8 = new SimpleDateFormat("yyyyMMddHHmm'_processedUTF8.xml'").format(new Date());
String processedFileNameJavaBase64 = new SimpleDateFormat("yyyyMMddHHmm'_processedJavaBase64.xml'").format(new Date());
// These files do not contain the umlauts anymore.
// Instead of the umlauts a replacement character is inserted (0xEF 0xBF 0xBD (efbfbd))
String processedFileNameWithPath = String.format("C:\\temp\\Tests\\%1$s", processedFileName);
String processedFileNameWithPathUTF8 = String.format("C:\\temp\\Tests\\%1$s", processedFileNameUTF8);
String processedFileNameWithPathJavaBase64 = String.format("C:\\temp\\Tests\\%1$s", processedFileNameJavaBase64);
FileUtils.writeStringToFile(new File(processedFileNameWithPath), xmlMessage);
FileUtils.writeStringToFile(new File(processedFileNameWithPathUTF8), xmlMessageUTF8);
FileUtils.writeStringToFile(new File(processedFileNameWithPathJavaBase64), xmlMessageJavaBase64);
The three files are just for testing purposes, but I hope you get the problem.
Edit
Both ways create an XML file with ü, ö, ä on my machine.
Only the WITHOUT implementation creates an XML file with ü, ö, ä on another system. The "content" string of the WITH UTF-8 variant contains the replacement character instead of ü.
// WITHOUT UTF-8 IN BYTE[] => STRING CTOR
byte[] dci = decompress(inputStream);
content = new String(dci);
byte[] compressedBinary = java.util.Base64.getDecoder().decode(content);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
String xml = new String(decompressedBinary);
// WITH UTF-8 IN BYTE[] => STRING CTOR
byte[] dci = decompress(inputStream);
content = new String(dci, StandardCharsets.UTF_8);
byte[] compressedBinary = java.util.Base64.getDecoder().decode(content);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
String xml = new String(decompressedBinary, "UTF-8");
Edit #2
There also seems to be a difference between running the code inside IntelliJ and outside of IntelliJ on my machine. I did not know that this makes such a huge difference. So - if I run the code outside of IntelliJ (java.exe -jar myjarfile), the WITH UTF-8 part replaces the Ü with ... I don't know what. Notepad++ shows xFC. Funny: my Raspberry Pi shows both files with Ü where my Windows/Notepad++ shows xFC.
That whole thing confuses me and I would like to understand what the problem is, also because the XML file declares UTF-8 as the encoding in its header.
Edit #3 Final Solution
// ## SERVER
// Get ZIP from request URL
HttpGet httpget = new HttpGet(requestUrl);
HttpResponse response = httpClient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream inputStream = entity.getContent();
byte[] decompressedInputStream = decompress(inputStream);
// Produces a XML string which SHOULD contain ü, ö, ä
String xmlOfZipFileContent = new String(decompressedInputStream, StandardCharsets.UTF_8);
// Just for testing write to file
String xmlOfZipFileSavePath = String.format("C:\\temp\\Tests\\%1$s", new SimpleDateFormat("yyyyMMddHHmm'_original.xml'").format(new Date()));
FileUtils.writeStringToFile(new File(xmlOfZipFileSavePath), xmlOfZipFileContent, StandardCharsets.UTF_8);
// The payloadExplicitUtf8 gets stored into the DB
String payload = java.util.Base64.getEncoder().encodeToString(ZipUtils.compress(xmlOfZipFileContent.getBytes(StandardCharsets.UTF_8)));
// Store payload to db
// Client queries database and gets the payload
// payload = dbEntity.get().payload
// The following three lines is on client
byte[] compressedBinaryPayload = java.util.Base64.getDecoder().decode(payload);
byte[] decompressedBinaryPayload = ZipUtils.decompress(compressedBinaryPayload);
String xmlMessageOutOfPayload = new String(decompressedBinaryPayload, StandardCharsets.UTF_8);
String xmlOfPayloadSavePath = String.format("C:\\temp\\Tests\\%1$s", new SimpleDateFormat("yyyyMMddHHmm'_payload.xml'").format(new Date()));
FileUtils.writeStringToFile(new File(xmlOfPayloadSavePath), xmlMessageOutOfPayload, StandardCharsets.UTF_8);
If I understood correctly, your situation seems to be the following:
// Decompress data from the server, it's in ISO-8859-1 or similar 1 byte encoding
byte[] dci = decompress(inputStream);
// Data gets corrupted because of wrong charset
// This is where ü gets converted to unicode replacement character
content = new String(dci, StandardCharsets.UTF_8);
The rest of the code uses UTF-8 explicitly, but it doesn't matter, as the data has already been corrupted at this point. In the end you expect a UTF-8 encoded file.
Also because the XML file contains the UTF8 as encode in header.
That doesn't prove anything. If you treat it as just a text file, you can write it out in as many encodings as you want to, and it would still claim to be UTF8.
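A sketch of the implied fix, under the assumption that the decompressed bytes are ISO-8859-1 (or a similar one-byte encoding) rather than UTF-8:
byte[] dci = decompress(inputStream);
// Decode with the charset the bytes were actually written in; ISO-8859-1 maps every byte, so nothing is lost.
String content = new String(dci, StandardCharsets.ISO_8859_1);
// From here on, keep every conversion explicitly UTF-8.
byte[] utf8Bytes = content.getBytes(StandardCharsets.UTF_8);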
InputStream inputStream = entity.getContent();
byte[] decompressedInputStream = decompress(inputStream);
Fine, and it is assumed that the bytes are in UTF-8, as:
String content = new String(decompressedInputStream, StandardCharsets.UTF_8);
Should the bytes not be in UTF-8, you could try Windows Latin-1:
Charset.forName("Windows-1252")
Otherwise decompressedInputStream can be used wherever content is converted to bytes in UTF-8.
...
The FileUtils.writeStringToFile without encoding specified uses the default platform encoding.
// File contains the expected umlauts
//FileUtils.writeStringToFile(new File(originFileNameWithPath), content);
Better is to ensure that UTF-8 is written. Either add the encoding to convert the Unicode String to bytes in UTF-8, or simply write the original bytes:
Files.write(Paths.get(originFileNameWithPath), decompressedInputStream);
Also the Base64 encoded UTF-8 bytes of the String should be used:
String payloadUTF8 = Base64.encodeBase64String(ZipUtils.compress(
content.getBytes(StandardCharsets.UTF_8)));
String payloadJavaBase64 = new String(java.util.Base64.getEncoder().encode(
ZipUtils.compress(content.getBytes(StandardCharsets.UTF_8))));
The standard Java SE Base64 will do; though do not use its decodeString and encodeString, as those use ISO-8859-1 (Latin-1).
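A minimal round-trip sketch with java.util.Base64 (ZipUtils and content are the question's own helpers), keeping every String/byte[] conversion explicitly UTF-8 so no platform default can corrupt the umlauts:
String payload = java.util.Base64.getEncoder()
        .encodeToString(ZipUtils.compress(content.getBytes(StandardCharsets.UTF_8)));
byte[] decompressed = ZipUtils.decompress(java.util.Base64.getDecoder().decode(payload));
String xml = new String(decompressed, StandardCharsets.UTF_8); // identical to content if nothing else corrupts it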

How to fix "Improper Neutralization of Script-Related HTML Tags in a Web Page (Basic XSS)" in a ServletOutputStream

The following code gives the Veracode flaw "Improper Neutralization of Script-Related HTML Tags in a Web Page" on the line out.write(outByte, 0, iRead):
try {
bytesImage = helper.getBlob(Integer.parseInt(id) );
ByteArrayInputStream bin = new ByteArrayInputStream(bytesImage);
ServletOutputStream out = response.getOutputStream();
outByte = new byte[bytesImage.length];
int iRead = 0;
while ((iRead = bin.read(outByte)) > 0) {
out.write(outByte,0,iRead);
}
I found a lot of similar issues here, but all of them deal with strings only. Those could be fixed with something like this:
out.write(ESAPI.encoder().encodeForHTML(theSimpleString));
but for the binary OutputStream this will not work.
Any hints on how to get the above Veracode issue solved?
Thanks
As suggested by @sinkmanu, I tried to convert the bytes to a String and then applied ESAPI.encoder().encodeForHTML().
I added two conversion methods:
private static String base64Encode(byte[] bytes) {
return new BASE64Encoder().encode(bytes);
}
private static byte[] base64Decode(String s) throws IOException {
return new BASE64Decoder().decodeBuffer(s);
}
then tried with this code:
...
bytes = helper.getBlob( inId );
// 1 -> this solves Veracode issue but image is not valid anymore
String encodedString = base64Encode(bytes) ;
String safeString = ESAPI.encoder().encodeForHTML(encodedString);
safeBytes = base64Decode(safeString);
// 2 -> as written above, when i use the safe 'safeBytes' the Veracode flaw is gone but the app is not working anymore (image not ok)
// ByteArrayInputStream bin = new ByteArrayInputStream(safeBytes);
// outBytes = new byte[safeBytes.length];
// 3 -> just use the 'unsafe' bytes -> app is working but veracode flaw needs to be fixed!
ByteArrayInputStream bin = new ByteArrayInputStream(bytes);
outBytes = new byte[bytes.length];
int iRead=0;
ServletOutputStream out = response.getOutputStream();
while ((iRead = bin.read(outBytes)) > 0) {
out.write( outBytes, 0, iRead);
}
...
The above could solve the Veracode issue (when 2 is uncommented), but the image then seems to be corrupt (it cannot be processed anymore).
Any hint how i can solve the veracode issue with the binary stream?
The solution to above is this:
String safeString = ESAPI.encoder().encodeForBase64(bytes,false);
byte[] safeBytes = ESAPI.encoder().decodeFromBase64(safeString);
In the ESAPI libs there are also methods to encode and decode from Base64. This was the solution to my problem. The above two lines do the magic for Veracode, and when using the "safeBytes" later in the code, everything is fine.
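Putting the pieces together, a minimal sketch of the whole path (helper.getBlob, inId and response are the names from the code above) might look like this:
byte[] bytes = helper.getBlob(inId);
// Base64 round trip through ESAPI; the data itself is unchanged, but the scanner sees it pass through the encoder.
String safeString = ESAPI.encoder().encodeForBase64(bytes, false);
byte[] safeBytes = ESAPI.encoder().decodeFromBase64(safeString);
// Stream the binary data to the client.
ServletOutputStream out = response.getOutputStream();
ByteArrayInputStream bin = new ByteArrayInputStream(safeBytes);
byte[] outBytes = new byte[safeBytes.length];
int iRead;
while ((iRead = bin.read(outBytes)) > 0) {
    out.write(outBytes, 0, iRead);
}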
You can validate the file using the following function:
ESAPI.validator().getValidFileContent()
To sanitize strings you can use encodeForHTML from the ESAPI library or StringEscapeUtils from Apache:
import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;
String data = "<script>alert(document.cookie);</script>";
String escaped = escapeHtml(data);
If your data is not a String, you have to convert it to a String. Also, if you are sure that the data you have is already escaped, you can ignore the warning because it is a false positive.

Trying to Change the Encoding of a File in Java is Doubling the Contents of the File

I have a FileOutputStream in java that is reading the contents of UDP packets and saving them to a file. At the end of reading them, I sometimes want to convert the encoding of the file. The problem is that currently when doing this, it just ends up doubling all the contents of the file. The only workaround that I could think to do would be to create a temp file with the new encoding and then save it as the original file, but this seems too hacky.
I must be just overlooking something in my code:
if(mode.equals("netascii")){
byte[] convert = new byte[(int)file.length()];
FileInputStream input = new FileInputStream(file);
input.read(convert);
String temp = new String(convert);
convert = Charset.forName("US-ASCII").encode(temp).array();
fos.write(convert);
}
JOptionPane.showMessageDialog(frame, "Read Successful!");
fos.close();
}
Is there anything suspect?
Thanks in advance for any help!
The problem is that the array of bytes you've read from the InputStream will be converted as if they were ASCII chars, which I'm assuming they are not. Specify the InputStream's encoding when converting its bytes to a String and you'll get a standard Java string.
I've assumed UTF-16 as the InputStream's encoding here:
byte[] convert = new byte[(int) file.length()];
FileInputStream input = new FileInputStream(file);
// read file bytes until EOF, tracking the offset so each read appends after the previous one
int off = 0;
int r;
while (off < convert.length && (r = input.read(convert, off, convert.length - off)) != -1) {
    off += r;
}
String temp = new String(convert, Charset.forName("UTF-16"));
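An alternative sketch (assuming, as above, that the source file is UTF-16): read the whole file, decode it with the source charset, and write the converted bytes to a separate file rather than back into the stream that produced the original, which avoids the doubled content:
byte[] raw = Files.readAllBytes(file.toPath());
String text = new String(raw, StandardCharsets.UTF_16); // decode with the source encoding
Files.write(Paths.get("converted.txt"), text.getBytes(StandardCharsets.US_ASCII)); // write the netascii output elsewhere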

Java - Image encoding in XML

I thought I would find a solution to this problem relatively easily, but here I am calling upon the help from ye gods to pull me out of this conundrum.
So, I've got an image and I want to store it in an XML document using Java. I have previously achieved this in VisualBasic by saving the image to a stream, converting the stream to an array, and then VB's xml class was able to encode the array as a base64 string. But, after a couple of hours of scouring the net for an equivalent solution in Java, I've come back empty handed. The only success I have had has been by:
import it.sauronsoftware.base64.*;
import java.awt.image.BufferedImage;
import org.w3c.dom.*;
...
BufferedImage img;
Element node;
...
java.io.ByteArrayOutputStream os = new java.io.ByteArrayOutputStream();
ImageIO.write(img, "png", os);
byte[] array = Base64.encode(os.toByteArray());
String ss = arrayToString(array, ",");
node.setTextContent(ss);
...
private static String arrayToString(byte[] a, String separator) {
StringBuffer result = new StringBuffer();
if (a.length > 0) {
result.append(a[0]);
for (int i=1; i<a.length; i++) {
result.append(separator);
result.append(a[i]);
}
}
return result.toString();
}
Which is okay I guess, but reversing the process to get it back to an image when I load the XML file has proved impossible. If anyone has a better way to encode/decode an image in an XML file, please step forward, even if it's just a link to another thread that would be fine.
Cheers in advance,
Hoopla.
I've done something similar (encoding and decoding in Base64) and it worked like a charm. Here's what I think you should do, using the class Base64 from the Apache Commons project:
// ENCODING
BufferedImage img = ImageIO.read(new File("image.png"));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(img, "png", baos);
baos.flush();
String encodedImage = Base64.encodeBase64String(baos.toByteArray());
baos.close(); // should be inside a finally block
node.setTextContent(encodedImage); // store it inside node
// DECODING
String encodedImage = node.getTextContent();
byte[] bytes = Base64.decodeBase64(encodedImage);
BufferedImage image = ImageIO.read(new ByteArrayInputStream(bytes));
Hope it helps.
Apache Commons has a Base64 class that should be helpful to you.
From there, you can just write out the bytes (they are already in a readable format).
After you get your byte array
byte[] array = Base64.encode(os.toByteArray());
use an encoded String :
String encodedImg = new String( array, "utf-8");
Then you can do fun things in your xml like
<binImg string-encoding="utf-8" bin-encoding="base64" img-type="png"><![CDATA[ encodedImg here ]]></binImg>
With Java 6, you can use DatatypeConverter to convert a byte array to a Base64 string:
byte[] imageData = ...
String base64String = DatatypeConverter.printBase64Binary(imageData);
And to convert it back:
String base64String = ...
byte[] imageData = DatatypeConverter.parseBase64Binary(base64String);
Your arrayToString() method is rather bizarre (what's the point of that separator?). Why not simply say
String s = new String(array, "US-ASCII");
The reverse operation is
byte[] array = s.getBytes("US-ASCII");
Use the ASCII encoding, which should be sufficient when dealing with Base64 encoded data. Also, I'd prefer a Base64 encoder from a reputable source like Apache Commons.
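For instance, a round-trip sketch along those lines with the Apache Commons Codec Base64 class (os and node are the names used in the question's code):
// Encode the PNG bytes to a Base64 String via US-ASCII and store it in the element.
String s = new String(Base64.encodeBase64(os.toByteArray()), StandardCharsets.US_ASCII);
node.setTextContent(s);
// Reverse: read the text content back, Base64-decode it, and rebuild the image.
byte[] imageBytes = Base64.decodeBase64(node.getTextContent());
BufferedImage restored = ImageIO.read(new ByteArrayInputStream(imageBytes));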
You don't need to invent your own XML data type for this. XML schema defines standard binary data types, such as base64Binary, which is exactly what you are trying to do.
Once you use the standard types, it can be converted into binary automatically by some parsers (like XMLBeans). If your parser doesn't handle it, you can find classes for base64Binary in many places since the datatype is widely used in SOAP, XMLSec etc.
The easiest implementation I was able to make is as below. This is for server-to-server XML transfer containing binary data; Base64 is from the Apache Commons Codec library.
Reading binary data from the DB and creating the XML:
Blob blobData = oRs.getBlob("ClassByteCode");
byte[] bData = blobData.getBytes(1, (int)blobData.length());
bData = Base64.encodeBase64(bData);
String strClassByteCode = new String(bData,"US-ASCII");
On the requesting server, read the tag and save it in the DB:
byte[] bData = strClassByteCode.getBytes("US-ASCII");
bData = Base64.decodeBase64(bData);
oPrStmt.setBytes( ++nParam, bData );
Easy as it can be.
I'm still working on streaming the XML to the response object as it is generated on the first server, to handle the case where the XML with binary data is too large.
Vishesh Sahu
The basic problem is that you cannot have an arbitrary bytestream in an XML document, so you need to encode it somehow. A frequent encoding scheme is BASE64, but any will do as long as the recipient knows about it.
I know that the question was asking how to encode an image via XML, but it is also possible to just stream the bytes via an HTTP GET request instead of using XML and encoding an image. Note that input is a FileInputStream.
Server Code:
File f = new File(uri_string);
FileInputStream input = new FileInputStream(f);
OutputStream output = exchange.getResponseBody();
int c = 0;
while ((c = input.read()) != -1) {
output.write(c); //writes each byte to the exchange.getResponseBody();
}
result = new DownloadFileResult(int_list);
if (input != null) {input.close();}
if (output != null){ output.close();}
Client Code:
InputStream input = connection.getInputStream();
List<Integer> l = new ArrayList<>();
int b = 0;
while((b = input.read()) != -1){
l.add(b);//you can do what you wish with this list of ints ie- write them to a file. see code below.
}
Here is how you would write the Integer list to a file:
FileOutputStream out = new FileOutputStream("path/to/file.png");
for(int i : result_bytes_list){
out.write(i);
}
out.close();
node.setTextContent( base64.encodeAsString( fileBytes ) )
using org.apache.commons.codec.binary.Base64
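The decoding counterpart is the matching call on the same commons-codec instance (a sketch; base64 and node are the names used above):
byte[] fileBytes = base64.decode(node.getTextContent());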
