I have a problem displaying Cyrillic characters. I have HTML that contains Cyrillic characters, but after converting it to PDF they are all displayed as ### instead of the actual characters. I'm using the library like this:
var document = Jsoup.parse(new ByteArrayInputStream(resultHtml), "UTF-8", "/");
ByteArrayOutputStream os = new ByteArrayOutputStream();
try (os) {
    var temp = new W3CDom().fromJsoup(document);
    PdfRendererBuilder builder = new PdfRendererBuilder();
    builder.toStream(os);
    builder.useFont(new File("/resources/fonts/times.ttf"), "Times");
    builder.withW3cDocument(temp, null);
    builder.run();
}
return os;
The resultHtml is an HTML string and it's fine: using the iText7 library I got the result I wanted, a PDF with the characters rendered correctly, but that library is not free. I mention this only to narrow down the possible causes, so I assume the problem is in how I use this library. I don't really have any resources related to the HTML, which is why the baseUri is / and the second argument of withW3cDocument is null. The library gives me two warnings, but I don't think they are the problem, because it says it's ignoring them:
com.openhtmltopdf.css-parse WARNING:: (null#inline_style_1) so-language is an unrecognized CSS property at line 21. Ignoring declaration.
com.openhtmltopdf.css-parse WARNING:: (null#inline_style_1) so-language is an unrecognized CSS property at line 32. Ignoring declaration.
I checked in the debugger: the document is fine, because I can see the parsed HTML with the Cyrillic characters displayed normally, but temp becomes [#document:null]. I read that this doesn't mean the document is null, but maybe that's the problem? I tried different charsets like CP1251 and CP1252, but they give strange characters too. At first I tried all charsets without the font declaration, because the only font in use is Times New Roman and I assumed it is the default; then I added the font to resources and declared it in code, but it didn't help. I'm using version 1.0.10 of the library and version 1.14.3 of jsoup.
I have a simple xml file on my hard drive.
When I open it with notepad++ this is what I see:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<content>
... more stuff here ...
</content>
But when I read it using a FileInputStream I get:
?<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<content>...
I'm using JAXB to parse the XML files, and it throws a "Content is not allowed in prolog" exception because of that "?" sign.
What is this extra "?" sign? why is it there and how do I get rid of it?
That extra character is a byte order mark, a special Unicode character code which lets the XML parser know what the byte order (little endian or big endian) of the bytes in the file is.
Normally, your XML parser should be able to understand this. (If it doesn't, I would regard that as a bug in the XML parser.)
As a workaround, make sure that the program that produces this XML leaves off the BOM.
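If you cannot change the producer, a workaround on the consumer side is to skip the BOM yourself before handing the stream to JAXB. A minimal sketch, assuming a JAXB-annotated Content class for the <content> root (the class and method names are hypothetical):
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import javax.xml.bind.JAXB;

public class BomSafeJaxb {
    // Strips a leading UTF-8 BOM (0xEF 0xBB 0xBF) before unmarshalling.
    public static Content read(File xmlFile) throws IOException {
        try (PushbackInputStream in = new PushbackInputStream(new FileInputStream(xmlFile), 3)) {
            byte[] head = new byte[3];
            int n = in.readNBytes(head, 0, 3);
            boolean hasBom = n == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
            if (!hasBom && n > 0) {
                in.unread(head, 0, n); // not a BOM: push the bytes back for the parser
            }
            return JAXB.unmarshal(in, Content.class);
        }
    }
}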
Check the encoding of the file. I've seen a similar thing: opening the file in most editors it looked fine, but it turned out it was encoded as UTF-8 without a BOM (or with one, I can't recall off the top of my head). Notepad++ should be able to switch between the two.
You can use Notepad++ to show all symbols from the View > Show Symbol > Show All Characters menu. It will show you any extra bytes present at the beginning. There is a possibility that they are a byte order mark; if so, this approach will not help. In that case you will need a hex editor, or, if you have Cygwin installed, follow the steps in the last paragraph of this answer. Once you can see the file as hex codes, look at the first few bytes. Do they match one of the codes listed at http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
If they indeed are a byte order mark, or if you are unable to determine the cause of the error, just try this:
From the menu, select Encoding > Encode in UTF-8 without BOM, and then save the file.
(On Linux, you can use command line tools to check what's at the beginning of the file, e.g. xxd -g1 filename | head or od -t cx1 filename | head.)
You might have a stray newline before the declaration. Delete it.
Select View > Show Symbol > Show All Characters in Notepad++ to see what's happening.
This is not a JAXB problem; the problem is in the way you read the XML. Try using an InputStream:
...
Unmarshaller u = jaxbContext.createUnmarshaller();
XmlDataObject xmlDataObject = (XmlDataObject) u.unmarshal(new FileInputStream("foo.xml"));
...
Besides a FileInputStream, a ByteArrayInputStream also worked for me:
JAXB.unmarshal(new ByteArrayInputStream(string.getBytes("UTF-8")), Delivery.class);
=> No unmarshaling error anymore.
I have a Java-based web service client connected to a Java web service (implemented on the Axis1 framework).
I am getting the following exception in my log file:
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696)
at org.apache.axis.Message.getSOAPEnvelope(Message.java:435)
at org.apache.ws.axis.security.WSDoAllReceiver.invoke(WSDoAllReceiver.java:114)
at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:198)
at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
at org.apache.axis.client.Call.invoke(Call.java:2767)
at org.apache.axis.client.Call.invoke(Call.java:2443)
at org.apache.axis.client.Call.invoke(Call.java:2366)
at org.apache.axis.client.Call.invoke(Call.java:1812)
This is often caused by a white space before the XML declaration, but it could be any text, like a dash or any character. I say often caused by white space because people assume white space is always ignorable, but that's not the case here.
Another thing that often happens is a UTF-8 BOM (byte order mark), which is allowed before the XML declaration and can be treated as whitespace if the document is handed to the XML parser as a stream of characters rather than as a stream of bytes.
The same can happen if schema files (.xsd) are used to validate the xml file and one of the schema files has an UTF-8 BOM.
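For illustration, here is a rough sketch of the two ways of handing the same document to a DOM parser that this distinction refers to (the file name is a placeholder, and whether a decoded BOM is tolerated on the character-stream route depends on the parser):
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class BomParseDemo {
    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // 1) Stream of bytes: the parser sniffs the encoding and has to deal with the raw BOM bytes itself.
        builder.parse(new FileInputStream("data.xml"));

        // 2) Stream of characters: we decode the bytes, so a UTF-8 BOM reaches the parser
        //    as the single character U+FEFF ahead of the declaration.
        builder.parse(new InputSource(
                new InputStreamReader(new FileInputStream("data.xml"), StandardCharsets.UTF_8)));
    }
}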
Actually, in addition to Yuriy Zubarev's post: the same error appears when you pass a nonexistent XML file to the parser. For example, you pass
new File("C:/temp/abc")
when only C:/temp/abc.xml file exists on your file system
In either case
builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
document = builder.parse(new File("C:/temp/abc"));
or
DOMParser parser = new DOMParser();
parser.parse("file:C:/temp/abc");
Both give the same error message.
It is a very disappointing bug, because the following trace
javax.servlet.ServletException
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
...
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
... 40 more
says nothing like "the file name is incorrect" or "such a file does not exist". In my case I had an absolutely correct XML file and had to spend two days to determine the real problem.
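A minimal guard for this case (using the same DocumentBuilder setup as the snippet above and the hypothetical example path) fails fast with a clear message instead of the misleading prolog error:
File xmlFile = new File("C:/temp/abc"); // the file that actually exists is C:/temp/abc.xml
if (!xmlFile.isFile()) {
    throw new FileNotFoundException("XML file not found: " + xmlFile.getAbsolutePath());
}
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document = builder.parse(xmlFile);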
Try adding a space between the encoding="UTF-8" string in the prolog and the terminating ?>. In XML the prolog designates this bracket-question mark delimited element at the start of the document (while the tag prolog in stackoverflow refers to the programming language).
Added: Is that dash in front of your prolog part of the document? That would be the error there, having data in front of the prolog, -<?xml version="1.0" encoding="UTF-8"?>.
I had the same problem (and solved it) while trying to parse an XML document with freemarker.
I had no spaces before the header of XML file.
The problem occurs when and only when the file encoding and the XML encoding attribute are different. (ex: UTF-8 file with UTF-16 attribute in header).
So I had two ways of solving the problem:
changing the encoding of the file itself (a sketch follows this list)
changing the header UTF-16 to UTF-8
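A minimal sketch of the first option (Java 11+, using java.nio.file.Files; it assumes the file's bytes are actually UTF-8 while its header claims UTF-16, so the content is re-saved as UTF-16 to match the header):
String content = Files.readString(Path.of("data.xml"), StandardCharsets.UTF_8); // read with the real encoding
Files.writeString(Path.of("data.xml"), content, StandardCharsets.UTF_16);       // re-save in the declared encoding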
It means the XML is malformed or the response body is not an XML document at all.
Just spent 4 hours tracking down a similar problem in a WSDL. Turns out the WSDL used an XSD which imports another namespace XSD. This imported XSD contained the following:
<?xml version="1.0" encoding="UTF-8"?>
<schema targetNamespace="http://www.xyz.com/Services/CommonTypes" elementFormDefault="qualified"
xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:CommonTypes="http://www.xyz.com/Services/CommonTypes">
<include schemaLocation=""></include>
<complexType name="RequestType">
<....
Note the empty include element! This was the root of my woes. I guess this is a variation on Egor's file not found problem above.
+1 to disappointing error reporting.
My answer probably won't help you specifically, but it helps with this problem in general.
When you see this kind of exception, try opening your XML file in a hex editor; sometimes you can see additional bytes at the beginning of the file which a text editor doesn't show.
Delete them and your XML will parse.
In my case, removing the 'encoding="UTF-8"' attribute altogether worked.
It looks like a character set encoding issue, maybe because your file isn't really in UTF-8.
For the same issue, I removed the following line,
File file = new File("c:\\file.xml");
InputStream inputStream= new FileInputStream(file);
Reader reader = new InputStreamReader(inputStream,"UTF-8");
InputSource is = new InputSource(reader);
is.setEncoding("UTF-8");
It is working fine now. I'm not sure why that UTF-8 setting caused a problem; to my surprise, it also works fine with UTF-8.
I am using Windows 7 32-bit and the NetBeans IDE with Java jdk1.6.0_13. No idea how it works.
Sometimes it's the code, not the XML
The following code,
Document doc = dBuilder.parse(new InputSource(new StringReader("file.xml")));
will also result in this error,
[Fatal Error] :1:1: Content is not allowed in prolog.org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
because it's attempting to parse the string literal, "file.xml" (not the contents of the file.xml file) and failing because "file.xml" as a string is not well-formed XML.
Fix: Remove StringReader():
Document doc = dBuilder.parse(new InputSource("file.xml"));
Similarly, dirty buffer problems can leave residual junk ahead of the actual XML. If you've carefully checked your XML and are still getting this error, log the exact content being passed to the parser; sometimes what is actually being parsed (or attempted) is surprising.
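For example, a quick way to see what the parser actually receives (the file name is a placeholder; needs java.nio.file.Files, java.nio.file.Paths and java.nio.charset.StandardCharsets):
byte[] data = Files.readAllBytes(Paths.get("file.xml"));
StringBuilder hex = new StringBuilder();
for (int i = 0; i < Math.min(data.length, 16); i++) {
    hex.append(String.format("%02X ", data[i])); // a UTF-8 BOM shows up here as EF BB BF
}
System.out.println("first bytes: " + hex);
System.out.println("as text: " + new String(data, 0, Math.min(data.length, 64), StandardCharsets.UTF_8));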
First clean the project, then rebuild it. I was facing the same issue, and everything was fine after this.
If all else fails, open the file in binary to make sure there are no funny characters at the beginning (three non-printable characters at the start of the file that identify it as UTF-8, i.e. a BOM). We did this and found some, so we converted the file from UTF-8 to ASCII and it worked.
As Mike Sokolov has already pointed out, one of the possible reasons is the presence of some character(s) (such as whitespace) before the <?xml tag.
If your input XML is being read as a String (as opposed to a byte array), then you can replace your input string using the code below to make sure that all "unnecessary" characters before the XML declaration are wiped off.
inputXML=inputXML.substring(inputXML.indexOf("<?xml"));
You do need to be sure that the input XML actually contains the <?xml declaration, though (otherwise indexOf returns -1 and substring will throw).
To fix the BOM issue on Unix / Linux systems:
Check if there's an unwanted BOM character:
hexdump -C myfile.xml | more
An unwanted BOM will show up as the bytes ef bb bf at the very start of the dump, just before the <?xml declaration.
Alternatively, do file myfile.xml. A file with a BOM character will appear as: myfile.xml: XML 1.0 document text, UTF-8 Unicode (with BOM) text
Fix a single file with: tail -c +4 myfile.xml > temp.xml && mv temp.xml myfile.xml
Repeat step 1 or 2 to check that the file has been sanitised. It's probably also sensible to run view myfile.xml to check the contents are intact.
Here's a bash script to sanitise a whole folder of XML files:
#!/usr/bin/env bash
# This script is to sanitise XML files to remove any BOM characters
has_bom() { head -c3 "$1" | LC_ALL=C grep -qe '\xef\xbb\xbf'; }
for filename in *.xml ; do
    if has_bom "${filename}"; then
        tail -c +4 "${filename}" > temp.xml
        mv temp.xml "${filename}"
    fi
done
What I have tried [did not work]
In my case the web.xml in my application had an extra space. Even after I deleted it, it did not work!
I was playing with logging.properties and web.xml in my Tomcat, but even after I reverted them the error persisted!
Solution
To be specific, I had tried adding
org.apache.catalina.filters.ExpiresFilter.level = FINE
Tomcat expire filter is not working correctly
I followed the instructions found here and got the same error.
I tried several things to solve it (i.e. changing the encoding, typing the XML file rather than copy-pasting it, etc.) in Notepad and XML Notepad, but nothing worked.
The problem got solved when I edited and saved my XML file in Notepad++ (encoding --> utf-8 without BOM)
In my case I got this error because the API I used could return the data either in XML or in JSON format. When I tested it using a browser, it defaulted to the XML format, but when I invoked the same call from a Java application, the API returned the JSON formatted response, that naturally triggered a parsing error.
Just an additional thought on this one for the future. This bug can also appear when you simply hit the delete key or some other key at random while an XML file is the active window and you are not paying attention. This has happened to me before with the struts.xml file in my web application. Clumsy elbows...
I was also getting the same error,
XML reader error: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,2] Message: Reference is not allowed in prolog.
when my application was creating an XML response for a RESTful web service call.
While creating the XML string I replaced the &lt; and &gt; entities with < and >, and then the error went away and I was getting a proper response. Not sure how it worked, but it worked.
sample:
String body = "<ns:addNumbersResponse xmlns:ns=\"http://java.duke.org\"><ns:return>"
+sum
+"</ns:return></ns:addNumbersResponse>";
I had the same issue.
First I downloaded the XML file to my local desktop, and I got "Content is not allowed in prolog" while importing the file to the portal server. Even though the file looked fine to me visually, somehow it was corrupted.
So I re-downloaded the same file, tried again, and it worked.
We had the same problem recently, and it turned out to be caused by a bad URL and consequently a standard 403 HTTP response (which obviously isn't the valid XML the client was looking for). I'm going to share the details in case someone in the same context runs into this problem:
This was a Spring based web application in which a "JaxWsPortProxyFactoryBean" bean was configured to expose a proxy for a remote port.
<bean id="ourPortJaxProxyService"
class="org.springframework.remoting.jaxws.JaxWsPortProxyFactoryBean"
p:serviceInterface="com.amir.OurServiceSoapPortWs"
p:wsdlDocumentUrl="${END_POINT_BASE_URL}/OurService?wsdl"
p:namespaceUri="http://amir.com/jaxws" p:serviceName="OurService"
p:portName="OurSoapPort" />
The "END_POINT_BASE_URL" is an environment variable configured in "setenv.sh" of the Tomcat instance that hosts the web application. The content of the file is something like this:
export END_POINT_BASE_URL="http://localhost:9001/BusinessAppServices"
#export END_POINT_BASE_URL="http://localhost:8765/BusinessAppServices"
The missing ";" after each line caused the malformed URL and thus the bad response. That is, instead of "BusinessAppServices/OurService?wsdl" the URL had a CR before "/". "TCP/IP Monitor" was quite handy while troubleshooting the problem.
For all those that get this error:
WARNING: Catalina.start using conf/server.xml: Content is not allowed in prolog.
Not very informative... but what it actually means is that there is garbage in your conf/server.xml file.
I have seen this exact error in other XML files; it can be caused by making changes with a text editor that introduces the garbage.
The way to verify whether or not you have garbage in the file is to open it with a hex editor. If you see any characters before this string
"<?xml version="1.0" encoding="UTF-8"?>"
like this would be garbage
"‰ŠŒ<?xml version="1.0" encoding="UTF-8"?>"
that is your problem....
The solution is to use a good hex editor, one that will allow you to save files with different types of encoding.
Then just save it as UTF-8.
Some systems that use XML files may need the file saved as UTF-8 with no BOM,
which means with no byte order mark.
Hope this helps someone out there!!
For me, a Build->Clean fixed everything!
I had the same problem with some XML files. I solved it by reading the file with ANSI encoding (Windows-1252) and writing a file with UTF-8 encoding, using a small script in Python. I tried using Notepad++, but I didn't have success:
import os

path = os.path.dirname(__file__)
file_name = 'my_input_file.xml'

if __name__ == "__main__":
    with open(os.path.join(path, file_name), 'r', encoding='cp1252') as f1:
        lines = f1.read()
    f2 = open(os.path.join(path, 'my_output_file.xml'), 'w', encoding='utf-8')
    f2.write(lines)
    f2.close()
I had faced a similar problem too; the reason was a garbage character at the beginning of the file.
Fix: just open the file in a text editor (tested with Sublime Text), remove any indentation if there is any, copy and paste all the content of the file into a new file, and save it. That's it! When I ran the new file it parsed without any errors.
I took Dineshkumar's code and modified it to validate my XML file correctly:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.apache.log4j.Logger;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class Myclass{
private static final Logger LOGGER = Logger.getLogger(Myclass.class);
/**
* Validate XML file against the XSD schemas in the pathEsquema directory
* @param pathEsquema directory that contains the XSD schemas used to validate
* @param pathFileXML XML file to validate
* @throws BusinessException if it throws any Exception
*/
public static void validarXML(String pathEsquema, String pathFileXML)
throws BusinessException{
String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
String nameFileXSD = "file.xsd";
String MY_SCHEMA1 = pathEsquema + nameFileXSD;
ParserErrorHandler parserErrorHandler;
try{
SchemaFactory schemaFactory = SchemaFactory.newInstance(W3C_XML_SCHEMA);
Source [] source = {
new StreamSource(new File(MY_SCHEMA1))
};
Schema schemaGrammar = schemaFactory.newSchema(source);
Validator schemaValidator = schemaGrammar.newValidator();
schemaValidator.setErrorHandler(
parserErrorHandler= new ParserErrorHandler());
/** validate xml instance against the grammar. */
File file = new File(pathFileXML);
InputStream isS= new FileInputStream(file);
Reader reader = new InputStreamReader(isS,"UTF-8");
schemaValidator.validate(new StreamSource(reader));
if(parserErrorHandler.getErrorHandler().isEmpty()&&
parserErrorHandler.getFatalErrorHandler().isEmpty()){
if(!parserErrorHandler.getWarningHandler().isEmpty()){
LOGGER.info(
String.format("WARNING validate XML:[%s] Descripcion:[%s]",
pathFileXML,parserErrorHandler.getWarningHandler()));
}else{
LOGGER.info(
String.format("OK validate XML:[%s]",
pathFileXML));
}
}else{
throw new BusinessException(
String.format("Error validate XML:[%s], FatalError:[%s], Error:[%s]",
pathFileXML,
parserErrorHandler.getFatalErrorHandler(),
parserErrorHandler.getErrorHandler()));
}
}
catch(SAXParseException e){
throw new BusinessException(String.format("Error validate XML:[%s], SAXParseException:[%s]",
pathFileXML,e.getMessage()),e);
}
catch (SAXException e){
throw new BusinessException(String.format("Error validate XML:[%s], SAXException:[%s]",
pathFileXML,e.getMessage()),e);
}
catch (IOException e) {
throw new BusinessException(String.format("Error validate XML:[%s],
IOException:[%s]",pathFileXML,e.getMessage()),e);
}
}
}
Format your document like this:
<?xml version="1.0" encoding="UTF-8" ?>
<root>
%children%
</root>
I had the same issue with Spring's
MarshallingMessageConverter
and some pre-processing code.
Maybe someone will need the reason:
BytesMessage#readBytes reads bytes... and I forgot that reading is a one-direction operation.
You cannot read twice.
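In other words, read the BytesMessage once into a buffer and work with that buffer afterwards. A minimal sketch (message is an assumed javax.jms.BytesMessage; JMSException handling is omitted):
byte[] body = new byte[(int) message.getBodyLength()]; // BytesMessage is a forward-only stream
message.readBytes(body);                                // read it exactly once
String xml = new String(body, StandardCharsets.UTF_8);  // reuse the array as often as needed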
Try BOMInputStream from Apache Commons IO (org.apache.commons.io):
public static <T> T getContent(Class<T> instance, SchemaType schemaType, InputStream stream) throws JAXBException, SAXException, IOException {
    JAXBContext context = JAXBContext.newInstance(instance);
    Unmarshaller unmarshaller = context.createUnmarshaller();
    Reader reader = new InputStreamReader(new BOMInputStream(stream), "UTF-8");
    JAXBElement<T> entry = unmarshaller.unmarshal(new StreamSource(reader), instance);
    return entry.getValue();
}
I was having the same problem while parsing the info.plist file on my Mac. However, the problem was fixed using the following command, which converted the file into XML.
plutil -convert xml1 info.plist
Hope that helps someone.
I've been beating my head against this absolutely infuriating bug for the last 48 hours, so I thought I'd finally throw in the towel and try asking here before I throw my laptop out the window.
I'm trying to parse the response XML from a call I made to AWS SimpleDB. The response is coming back on the wire just fine; for example, it may look like:
<?xml version="1.0" encoding="utf-8"?>
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
<ListDomainsResult>
<DomainName>Audio</DomainName>
<DomainName>Course</DomainName>
<DomainName>DocumentContents</DomainName>
<DomainName>LectureSet</DomainName>
<DomainName>MetaData</DomainName>
<DomainName>Professors</DomainName>
<DomainName>Tag</DomainName>
</ListDomainsResult>
<ResponseMetadata>
<RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId>
<BoxUsage>0.0000071759</BoxUsage>
</ResponseMetadata>
</ListDomainsResponse>
I pass in this XML to a parser with
XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(response.getContent());
and call eventReader.nextEvent(); a bunch of times to get the data I want.
Here's the bizarre part -- it works great inside the local server. The response comes in, I parse it, everyone's happy. The problem is that when I deploy the code to Google App Engine, the outgoing request still works, and the response XML seems 100% identical and correct to me, but the response fails to parse with the following exception:
com.amazonaws.http.HttpClient handleResponse: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.): <?xml version="1.0" encoding="utf-8"?>
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><ListDomainsResult><DomainName>Audio</DomainName><DomainName>Course</DomainName><DomainName>DocumentContents</DomainName><DomainName>LectureSet</DomainName><DomainName>MetaData</DomainName><DomainName>Professors</DomainName><DomainName>Tag</DomainName></ListDomainsResult><ResponseMetadata><RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId><BoxUsage>0.0000071759</BoxUsage></ResponseMetadata></ListDomainsResponse>
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:153)
... (rest of lines omitted)
I have double, triple, quadruple checked this XML for 'invisible characters' or non-UTF8 encoded characters, etc. I looked at it byte-by-byte in an array for byte-order-marks or something of that nature. Nothing; it passes every validation test I could throw at it. Even stranger, it happens if I use a Saxon-based parser as well -- but ONLY on GAE, it always works fine in my local environment.
It makes it very hard to trace the code for problems when I can only run the debugger on an environment that works perfectly (I haven't found any good way to remotely debug on GAE). Nevertheless, using the primitive means I have, I've tried a million approaches including:
XML with and without the prolog
With and without newlines
With and without the "encoding=" attribute in the prolog
Both newline styles
With and without the chunking information present in the HTTP stream
And I've tried most of these in multiple combinations where it made sense they would interact -- nothing! I'm at my wit's end. Has anyone seen an issue like this before that can hopefully shed some light on it?
Thanks!
The encodings of your XML and XSD (or DTD) files are different, e.g.:
XML file header: <?xml version='1.0' encoding='utf-8'?>
XSD file header: <?xml version='1.0' encoding='utf-16'?>
Another possible scenario that causes this is when anything comes before the XML declaration, i.e. you might have something like this in the buffer:
helloworld<?xml version="1.0" encoding="utf-8"?>
or even a space or special character.
There are also some special characters called byte order marks that could be in the buffer.
Before passing the buffer to the parser, do this:
String xml = "<?xml ...";
xml = xml.trim().replaceFirst("^([\\W]+)<","<");
I had this issue after inspecting the XML file in Notepad++ and saving the file, even though I had the UTF-8 XML declaration at the top: <?xml version="1.0" encoding="utf-8"?>
It got fixed by saving the file in Notepad++ with Encoding (tab) > Encode in UTF-8 selected (it was Encode in UTF-8-BOM).
This error message is always caused by invalid content before the XML declaration, for example an extra small dot "." at the beginning of the file.
Any characters before the "<?xml…" will cause the "org.xml.sax.SAXParseException: Content is not allowed in prolog" error message.
To fix it, just delete all those weird characters before the "<?xml".
Ref: http://www.mkyong.com/java/sax-error-content-is-not-allowed-in-prolog/
I caught the same error message today.
The solution was to change the document from UTF-8 with BOM to UTF-8 without BOM.
I was facing the same issue. In my case the XML files were generated from a C# program and fed into AS400 for further processing. After some analysis I identified that I was using UTF-8 encoding while generating the XML files, whereas javac (on AS400) uses "UTF-8 without BOM".
So I had to write extra code similar to the one below:
//create encoding with no BOM
Encoding outputEnc = new UTF8Encoding(false);
//open file with encoding
TextWriter file = new StreamWriter(filePath, false, outputEnc);
file.Write(doc.InnerXml);
file.Flush();
file.Close(); // save and close it
In my xml file, the header looked like this:
<?xml version="1.0" encoding="utf-16"? />
In a test file, I was reading the file bytes and decoding the data as UTF-8 (not realizing the header in this file was utf-16) to create a string.
byte[] data = Files.readAllBytes(Paths.get(path));
String dataString = new String(data, "UTF-8");
When I tried to deserialize this string into an object, I was seeing the same error:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
When I updated the second line to
String dataString = new String(data, "UTF-16");
I was able to deserialize the object just fine. So as Romain had noted above, the encodings need to match.
Removing the xml declaration solved it
<?xml version='1.0' encoding='utf-8'?>
Unexpected reason: # character in file path
Due to some internal bug, the error Content is not allowed in prolog also appears if the file content itself is 100% correct but you are supplying the file name like C:\Data\#22\file.xml.
This may possibly apply to other special characters, too.
How to check: If you move your file into a path without special characters and the error disappears, then it was this issue.
I was facing the same "Content is not allowed in prolog" problem in my XML file.
Solution
Initially my root folder was '#Filename'.
When I removed the first character '#', the error was resolved.
There is no need to remove the # from the filename.
Try it this way instead:
rather than passing a File or URL object to the unmarshaller method, use a FileInputStream.
File myFile = new File("........");
Object obj = unmarshaller.unmarshal(new FileInputStream(myFile));
In the spirit of "just delete all those weird characters before the <?xml", here's my Java code, which works well with input via a BufferedReader:
BufferedReader test = new BufferedReader(new InputStreamReader(fisTest));
test.mark(4);
while (true) {
    int earlyChar = test.read();
    System.out.println(earlyChar);
    if (earlyChar == 60) {
        test.reset();
        break;
    } else {
        test.mark(4);
    }
}
FWIW, the bytes I was seeing are (in decimal) 239, 187, 191, i.e. the UTF-8 byte order mark EF BB BF.
I had a tab character instead of spaces.
Replacing the tab '\t' fixed the problem.
Cut and paste the whole doc into an editor like Notepad++ and display all characters.
In my instance of the problem, the solution was to replace German umlauts (äöü) with their HTML equivalents...
Below are the causes of the "org.xml.sax.SAXParseException: Content is not allowed in prolog" exception:
First, check the file paths of schema.xsd and file.xml.
The encoding in your XML and XSD (or DTD) should be the same.
XML file header: <?xml version='1.0' encoding='utf-8'?>
XSD file header: <?xml version='1.0' encoding='utf-8'?>
Anything coming before the XML declaration, e.g.: hello<?xml version='1.0' encoding='utf-16'?>
I zipped the XML on macOS and sent it to a Windows machine; the default compression changed these files, so the changed encoding produced this message.
This happened to me with @JmsListener in Spring Boot when listening to IBM MQ. My method received a String parameter and got this exception when I tried to deserialize it with JAXB.
It seemed that the string I got was the result of a byte[] being turned into a String of comma-separated numbers.
I solved it by changing the parameter type to byte[] and then creating a String from it:
@JmsListener(destination = "Q1")
public void receiveQ1Message(byte[] msgBytes) {
var msg = new String(msgBytes);