Java – org.xml.sax.SAXParseException: Reference is not allowed in prolog (auto-generated XHTML)

I wanted to try out Flying Saucer to generate a PDF from XHTML. So I made a layout in LibreOffice, let it generate the XHTML, and wanted to hand that over to the parsing library (in Java) to produce the PDF.
However, I couldn't take over the XML code 1:1 because I needed to escape things. So I replaced every "<" with "&lt;", every ">" with "&gt;", and every double quote with \".

When I try to parse the whole thing, I get the following error:

[Fatal Error] :1:2: Reference is not allowed in prolog.

I tried to track it down with some logical thinking and googling. If I understood correctly, the following is my "prolog":

    buf.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
    buf.append("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN\" \"http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd\">");
    buf.append("<html xmlns=\"http://www.w3.org/1999/xhtml\"><!--This file was converted to xhtml by OpenOffice.org - see http://xml.openoffice.org/odf2xhtml for more info.--><head profile=\"http://dublincore.org/documents/dcmi-terms/\"><meta http-equiv=\"Content-Type\" content=\"application/xhtml+xml; charset=utf-8\"/><title xml:lang=\"en-US\">- no title specified</title><meta name=\"DCTERMS.title\" content=\"\" xml:lang=\"en-US\"/><meta name=\"DCTERMS.language\" content=\"en-US\" scheme=\"DCTERMS.RFC4646\"/><meta name=\"DCTERMS.source\" content=\"http://xml.openoffice.org/odf2xhtml\"/><meta name=\"DCTERMS.issued\" content=\"2012-11-20T20:59:05.11\" scheme=\"DCTERMS.W3CDTF\"/><meta name=\"DCTERMS.provenance\" content=\"\" xml:lang=\"en-US\"/><meta name=\"DCTERMS.subject\" content=\",\" xml:lang=\"en-US\"/><link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\" hreflang=\"en\"/><link rel=\"schema.DCTERMS\" href=\"http://purl.org/dc/terms/\" hreflang=\"en\"/><link rel=\"schema.DCTYPE\" href=\"http://purl.org/dc/dcmitype/\" hreflang=\"en\"/><link rel=\"schema.DCAM\" href=\"http://purl.org/dc/dcam/\" hreflang=\"en\"/><style type=\"text/css\">");

Sorry for the huge (and ugly) block. The next thing I did was comment out one line at a time to see where the problem is.

The error still appears if I comment out the first two lines. After commenting out the third as well, I get a different error ("Content is not allowed in prolog" or similar).

Here is the third line again. I can't find the error in it; any help is appreciated 🙂

        buf.append("<html xmlns=\"http://www.w3.org/1999/xhtml\"><!--This file was converted to xhtml by OpenOffice.org - see http://xml.openoffice.org/odf2xhtml for more info.--><head profile=\"http://dublincore.org/documents/dcmi-terms/\"><meta http-equiv=\"Content-Type\" content=\"application/xhtml+xml; charset=utf-8\"/><title xml:lang=\"en-US\">- no title specified</title><meta name=\"DCTERMS.title\" content=\"\" xml:lang=\"en-US\"/><meta name=\"DCTERMS.language\" content=\"en-US\" scheme=\"DCTERMS.RFC4646\"/><meta name=\"DCTERMS.source\" content=\"http://xml.openoffice.org/odf2xhtml\"/><meta name=\"DCTERMS.issued\" content=\"2012-11-20T20:59:05.11\" scheme=\"DCTERMS.W3CDTF\"/><meta name=\"DCTERMS.provenance\" content=\"\" xml:lang=\"en-US\"/><meta name=\"DCTERMS.subject\" content=\",\" xml:lang=\"en-US\"/><link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\" hreflang=\"en\"/><link rel=\"schema.DCTERMS\" href=\"http://purl.org/dc/terms/\" hreflang=\"en\"/><link rel=\"schema.DCTYPE\" href=\"http://purl.org/dc/dcmitype/\" hreflang=\"en\"/><link rel=\"schema.DCAM\" href=\"http://purl.org/dc/dcam/\" hreflang=\"en\"/><style type=\"text/css\">");

Thanks in advance!

Edit 1: http://validator.w3.org/check validated it as totally correct!

Solution to answer:

It appears you're being confused by the bad layout of this blog article. If you download the sample code, you'll see that the '<' and '>' characters are not converted to "&lt;" and "&gt;" in the author's actual code and data.

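In order to get quotes into hard-coded Java strings, you do of course have to escape them with a backslash. But you shouldn't need any of this XML escaping: once the markup is escaped, the input starts with the entity reference "&lt;" instead of a tag, and the parser rejects it because a reference may not appear before the root element.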
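
To see the difference without Flying Saucer in the picture, here is a minimal sketch that feeds the same small document to the JDK's built-in `DocumentBuilder` twice, once as raw markup and once with "<" and ">" replaced by entities. The class name `PrologDemo`, the helpers `xmlEscape` and `parseError`, and the shortened sample document are made up for illustration; the DOCTYPE is left out on purpose so the parser does not try to fetch the DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class PrologDemo {

    /** Hypothetical helper: the escaping described in the question (do NOT do this to markup). */
    static String xmlEscape(String markup) {
        return markup.replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Returns null if the input is well-formed XML, otherwise the parser's complaint. */
    static String parseError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // Includes the line:column position, like the message in the question.
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the Java-level escaping (\") is needed to write the markup as a string literal.
        String raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>t</title></head><body/></html>";

        System.out.println("raw:     " + parseError(raw));
        System.out.println("escaped: " + parseError(xmlEscape(raw)));
    }
}
```

The raw string parses cleanly, so the first line reports no error. The escaped string begins with `&lt;?xml`, so the parser hits an entity reference before any element and fails, which is the situation behind the message in the question.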