XMLParser is eating my whitespace

I am losing significant whitespace from a wiki page I am parsing and I'm thinking it's because of the parser. I have this in my Groovy script:
@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2' )
def slurper = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser())
slurper.keepWhitespace = true
inputStream.withStream {
    doc = slurper.parse(it)
    println "originalContent = " + doc.'**'.find{ it.@id == 'editpageform' }.'**'.find { it.@name=='originalContent'}.@value
}
Where inputStream is initialized from a URL GET request to edit a Confluence wiki page.
Later on, in the withStream block, I do this:
println "originalContent = " + doc.'**'.find{ it.@id == 'editpageform' }.'**'.find { it.@name=='originalContent'}.@value
I notice that all the original content of the page is stripped of its newlines. I originally thought it was a server-side thing, but when I made the same request in my browser and viewed the source I could see newlines in the "originalContent" hidden parameter. Is there an easy way to disable the whitespace normalization and preserve the contents of the field? The above was run against an internal Confluence wiki page but could most likely be reproduced when editing any arbitrary wiki page.
Update (the code above has been edited):
I added a call to "slurper.keepWhitespace = true" in an attempt to preserve whitespace, but that still doesn't work. I'm thinking this property is intended for elements and not attributes? Is there a way to easily tweak flags on the underlying Java XMLParser? Is there a specific setting for whitespace in attribute values?

I first tried to reproduce this with a Confluence page of my own, but there was no value attribute and no text content in the input node, so I created my own test HTML.
I figured the TagSoup parser would need to be configured to preserve whitespace too; setting this on the slurper alone won't help, because the parser's default is to ignore whitespace.
So I did just that. The TagSoup feature ignorable-whitespace is documented, by the way (search for "whitespace" on the page).
Anyway, it doesn't work as hoped. Whitespace in attributes is preserved, as you can see from the example, but whitespace in text content is not preserved despite setting the extra feature. Maybe this is a bug in TagSoup or in XmlSlurper?
I suggest you have a closer look at your HTML too: is there really a value attribute present?
@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2' )
String html = """\
<html><head><title>test</title></head><body>
<p>
<form id="editpageform">
<p>
<input name="originalContent" value=" ">
</input>
</p>
</form>
</p>
</body></html>
"""
def inputStream = new ByteArrayInputStream(html.getBytes())
def parser = new org.ccil.cowan.tagsoup.Parser()
parser.setFeature("http://www.ccil.org/~cowan/tagsoup/features/ignorable-whitespace", true)
def slurper = new XmlSlurper(parser)
slurper.keepWhitespace = true
inputStream.withStream {
    doc = slurper.parse(it)
    def parse = { doc.'**'.find{ it.@id == 'editpageform' }.'**'.find { it.@name=='originalContent'} }
    println "originalContent (name) = '${parse().@name}'"
    println "originalContent (value) = '${parse().@value}'"
    println "originalContent (text) = '${parse().text()}'"
}

It seems the newlines are not preserved in the value attribute. See below:
@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2' )
String html = """\
<html><head><title>test</title></head><body>
<p>
<form id="editpageform">
<p>
<input name="originalContent" value="
">
</input>
</p>
</form>
</p>
</body></html>
"""
def inputStream = new ByteArrayInputStream(html.getBytes())
def parser = new org.ccil.cowan.tagsoup.Parser()
parser.setFeature("http://www.ccil.org/~cowan/tagsoup/features/ignorable-whitespace", true)
def slurper = new XmlSlurper(parser)
slurper.keepWhitespace = true
inputStream.withStream {
    doc = slurper.parse(it)
    def parse = { doc.'**'.find{ it.@id == 'editpageform' }.'**'.find { it.@name=='originalContent'} }
    println "originalContent (name) = '${parse().@name}'"
    println "originalContent (value) = '${parse().@value}'"
    println "originalContent (text) = '${parse().text()}'"
    assert parse().@value.toString().contains('\n') : "Should contain a newline"
}
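For comparison, standard XML attribute-value normalization (XML 1.0, section 3.3.3) replaces a literal line break inside an attribute value with a space, while a character reference such as &#10; survives parsing. The sketch below uses the JDK's built-in DOM parser rather than TagSoup, so it only illustrates the XML rule; whether TagSoup applies the same rule to HTML input is an assumption that would need checking. The class and method names are made up for this illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows XML attribute-value normalization with the JDK parser.
class AttrWhitespaceDemo {

    // Parses a one-element document and returns its "value" attribute.
    static String valueOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getAttribute("value");
    }

    public static void main(String[] args) throws Exception {
        // A literal newline inside the attribute value is normalized to a space.
        String literal = valueOf("<input value=\"a\nb\"/>");
        // A character reference to the same newline is kept as a newline.
        String escaped = valueOf("<input value=\"a&#10;b\"/>");

        System.out.println("literal newline kept: " + literal.contains("\n"));
        System.out.println("escaped newline kept: " + escaped.contains("\n"));
    }
}
```

If the same normalization is at work in the wiki case, the newlines seen in "view source" would be lost by any conforming XML-style attribute handling unless the server emitted them as character references.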

Related

HtmlUnit asNormalizedText() returns empty string

I have this code:
HtmlPage rowPage = ...
String address1 = ((HtmlDivision)rowPage.getFirstByXPath("//div[contains(@class, 'client_address1')]")).asXml();
System.out.println("address1 = " + address1);
String address1_2 = ((HtmlDivision)rowPage.getFirstByXPath("//div[contains(@class, 'client_address1')]")).asNormalizedText();
System.out.println("address1_2 = " + address1_2);
and my output is:
address1 = <div class="client_address1 clientRow">
123 Somewhere ln
</div>
address1_2 =
I expect asNormalizedText() to return 123 Somewhere ln. What circumstances would cause asNormalizedText to return nothing?

A bit more specific XPath would help
//div[contains(@class, 'client_address1')]/text()

What circumstances would cause asNormalizedText to return nothing?
From the javadoc:
Returns a normalized textual representation of this element that represents
what would be visible to the user if this page was shown in a web browser.
Please check the css class - maybe the style hides the text.

Java jcabi xpath returns unescaped text

Consider the following:
String s = "<tag>This has a &lt;a href=\"#\"&gt;link&lt;a&gt;.</tag>";
final XML xml = new XMLDocument(s);
String extractedText = xml.xpath("//tag/text()").get(0);
System.out.println(extractedText); // Output: This has a <a href="#">link<a>.
System.out.println(s.contains(extractedText)); // Output: false!
System.out.println(s.contains("This has a &lt;a href=\"#\"&gt;link&lt;a&gt;.")); // Output: true
I have an XML file given as a string with some escaped HTML. Using the jcabi library, I get the text of the relevant elements (in this case everything in <tag>s). However, what I get isn't actually what's in the original string: I'm expecting &lt; and &gt; but am getting < and > instead. The original string, paradoxically, does not contain the substring that I extracted from it.
How can I get the actual text and not an unescaped version?
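A minimal sketch of why this happens, assuming the original string carried &lt; and &gt; entities: an XML parser decodes entities when it builds text nodes, so the extracted text is the unescaped form. The example uses the JDK's DOM parser instead of jcabi (an assumption made to keep it self-contained), and re-escapes the text by hand; the class and helper names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: text nodes come back unescaped; re-escape to compare with the source.
class EscapedTextDemo {

    // Returns the text content of the document's root element (entities already decoded).
    static String textOf(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement()
                .getTextContent();
    }

    // Re-applies the three escapes needed in XML text; '&' must be handled first.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws Exception {
        String s = "<tag>This has a &lt;a href=\"#\"&gt;link&lt;a&gt;.</tag>";
        String extracted = textOf(s);

        System.out.println(extracted);                      // decoded text
        System.out.println(s.contains(extracted));          // decoded form is not in the source
        System.out.println(s.contains(escape(extracted)));  // re-escaped form is
    }
}
```

Re-escaping only reproduces the source when the source used exactly these entity forms; a document that wrote `&#60;` or left `>` unescaped would not match byte for byte.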

Html Slurping in Groovy

I am trying to parse HTML that comes to me as a giant String. When I get to line 13, NodeChild page = it.parent(),
I am able to find the key that I am looking for, but the data comes to me like This Is Value One In My KeyThis is Value Two in my KeyThis is Value Three In My Key and so on. I see a recurring pattern where the values run together with no space between them, so the only separator is the point where one value ends and the next starts with an uppercase letter.
I would like to put the values into an ArrayList one way or another. Is there a method that I am missing from the docs that is able to do this automatically? Is there a better way to parse this?
class htmlParsingStuff{
    private def slurper = new XmlSlurper(new Parser())
    private void slurpItUp(String rawHTMLString){
        ArrayList urlList = []
        def htmlParser = slurper.parseText(rawHTMLString)
        htmlParser.depthFirst().findAll() {
            //Loop through all of the HTML Tags to get to the key that I am looking for
            //EDIT: I see that I am able to iterate through the parent object, I just need a way to figure out how to get into that object
            boolean trigger = it.text() == 'someKey'
            if (trigger){
                //I found the key that I am looking for
                NodeChild page = it.parent()
                page = page.replace('someKey', '')
                LazyMap row = ["page": page, "type": "Some Type"]
                urlList.add(row)
            }
        }
    }
}

I can't provide you with working code since I don't know your specific HTML.
But: don't use XmlSlurper for parsing HTML. HTML is often not well formed, and therefore XmlSlurper is not the right tool for the job.
For HTML use a library like JSoup. You will find it much easier to use, especially if you have some jQuery knowledge. Since you didn't post your HTML snippet I made up my own example:
@Grab(group='org.jsoup', module='jsoup', version='1.10.1')
import org.jsoup.Jsoup
def html = """
<html>
<body>
<table>
<tr><td>Key 1</td></tr>
<tr><td>Key 2</td></tr>
<tr><td>Key 3</td></tr>
<tr><td>Key 4</td></tr>
<tr><td>Key 5</td></tr>
</table>
</body>
</html>"""
def doc = Jsoup.parse(html)
def elements = doc.select('td')
def result = elements.collect {it.text()}
// contains ['Key 1', 'Key 2', 'Key 3', 'Key 4', 'Key 5']
To manipulate the document you would use
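// Element and Tag are needed in addition to the Jsoup import above
import org.jsoup.nodes.Element
import org.jsoup.parser.Tag
def doc = Jsoup.parse(html)
def elements = doc.select('td')
elements.each { oldElement ->
    // build a replacement <td> and swap it in for the old one
    def newElement = new Element(Tag.valueOf('td'), '')
    newElement.text('Another key')
    oldElement.replaceWith(newElement)
}
println doc.outerHtml()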
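If only the already-concatenated text is available, one fallback is to split it wherever a lowercase letter is immediately followed by an uppercase letter. This is a heuristic sketch based on the sample string in the question, not part of either library's API, and it would mis-split values that legitimately contain such a boundary (for example "iPhone"). The class and method names are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits run-together values at lowercase-to-uppercase boundaries.
class CamelBoundarySplit {

    // Zero-width lookbehind/lookahead: split between [a-z] and [A-Z] without consuming characters.
    static List<String> split(String joined) {
        return Arrays.asList(joined.split("(?<=[a-z])(?=[A-Z])"));
    }

    public static void main(String[] args) {
        String text = "This Is Value One In My KeyThis is Value Two in my KeyThis is Value Three In My Key";
        // Prints each recovered value on its own line.
        split(text).forEach(System.out::println);
    }
}
```

Selecting the individual elements with a parser, as in the JSoup example above, avoids the need for this kind of guesswork.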

Form not binding in Play 2.4.6 framework

The problem in a nutshell:
Two forms with more or less identical code, both taking a single value "ean". One form works as intended; the other always fails to bind. I included
println(form.data)
in each controller to see what was going on. When entering the value "h", for example, into each form, the working form prints out
Map(ean -> h)
whereas the "broken" form prints out
Map()
So why is the second form not binding values?
The story:
I have been doing a project in Scala using the Play framework. Things were going well until I wanted to create a new form. For some reason this form always fails to bind. I couldn't see why this was happening so I decided to make a "copy" of a currently working form, only changing the names of the variables etc. However this "copy" form also has the same problem! I've looked online for a bit of help and the closest problem I can find is the following:
Issue with bindFromRequest in Play! Framework 2.3
But after trying the posted solution it doesn't seem to help at all. Below I've posted the relevant chunks of code, the "ProductPartForm" is the original working form and the "AddDealForm" is the broken copy form. Am I making a silly trivial error somewhere? Any help would be greatly appreciated. Please also note that I'm aware that the "success" message doesn't work (as you can see from the comment), however that shouldn't have any effect on the problem I'm considering.
Thanks!
The code:
Classes:
package models
case class ProductPartForm(ean: Long) {
}
and
package models
case class AddDealForm(ean : Long) {
}
Controller:
package controllers
class Suppliers extends Controller {
private val productForm: Form[ProductPartForm] = Form(mapping("ean" -> longNumber)(ProductPartForm.apply)(ProductPartForm.unapply))
private val dealForm: Form[AddDealForm] = Form(mapping("ean" -> longNumber)(AddDealForm.apply)(AddDealForm.unapply))
def supplierList = Action {
implicit request =>
val suppliers = Supplier.findAll
Ok(views.html.supplierList(suppliers, productForm, dealForm))
}
def findByProduct = Action { implicit request =>
val newProductForm = productForm.bindFromRequest()
newProductForm.fold(
hasErrors = { form =>
println(form.data)
val message = "Incorrent EAN number! Please try again."
Redirect(routes.Suppliers.supplierList()).flashing("error" -> message)
},
success = { newProduct =>
val productSuppliers = Supplier.findByProduct(newProductForm.get.ean)
val message2 = "It worked!" //can't display message?
Ok(views.html.supplierList(productSuppliers, productForm ,dealForm)).flashing("success" -> message2)
}
)
}
def addDeal = Action { implicit request =>
val newDealForm = dealForm.bindFromRequest()
dealForm.fold(
hasErrors = { form =>
println(form.data)
val message = "Incorrent EAN number! Please try again."
Redirect(routes.Suppliers.supplierList()).flashing("error" -> message)
},
success = { newDeal =>
val message2 = "a"
Redirect(routes.Products.list).flashing("success" -> message2)
}
)
}
HTML:
@helper.form(action = routes.Suppliers.findByProduct()) {
    <fieldset style="margin-left:200px">
        <legend>
            @helper.inputText(productForm("ean"))
        </legend>
    </fieldset>
    <div style="padding-bottom:60px">
        <input type="submit" class="btn primary" value="Submit" style="margin-left:400px">
    </div>
}
@helper.form(action = routes.Suppliers.addDeal()) {
    <fieldset style="margin-left:200px">
        <legend>
            @helper.inputText(dealForm("ean"))
        </legend>
    </fieldset>
    <div style="padding-bottom:60px">
        <input type="submit" class="btn primary" value="Submit" style="margin-left:400px">
    </div>
}
Routes:
POST /Suppliers controllers.Suppliers.findByProduct
POST /Suppliers/b controllers.Suppliers.addDeal

I had exactly the same problem with Play 2.4.6. In my case the problem was that I had not specified a request body parser. More about body parsers can be found here:
https://www.playframework.com/documentation/2.5.x/ScalaBodyParsers .
You should specify a body parser in your action (use urlFormEncoded if you use a simple form):
def findByProduct = Action(parse.urlFormEncoded) { implicit request =>
    // bind the form and handle the result here, as in the original action
}

Dynamically add SWFObject using Wicket

I am trying to add a Flash (*.swf) file to my Wicket application. I found some information here, but unfortunately it is not working, and I don't know why. On a web page, the tag
<object wicket:id="swf" data="resources/test.swf" width="700" height="70" style="float: right; margin: 15px 0 0 0;"></object>
render as
<object height="70" style="float: right; margin: 15px 0 0 0;" width="140" data="../../resources/wicketapp.ViewPanel/resources/test.swf" type="application/x-shockwave-flash"><param name="movie" value="../../resources/wicketapp.ViewPanel/resources/test.swf">
</object>
Clearly, this is not the path of my Flash file. Also, I want to load the file dynamically, but the method of embedding Flash discussed in the above link is static. How can I load swf files dynamically?

Looking at the linked implementation, if you want an absolute path you should precede it with a slash:
// if it's an absolute path, return it:
if( src.startsWith( "/" ) || src.startsWith( "http://" ) || src.startsWith( "https://" ) )
return(src);
Otherwise a wicket resource path is generated.
I'd actually recommend using swfobject for embedding flash - there is some nice wicket integration code at the start of this page, along with a flash-based component that uses it.

As I understand your question, you want to change the SWF file at runtime. I solved this problem as shown below (this is Scala code, but I suppose you will understand it):
class SWFObject(id: String) extends WebComponent(id)
with LoggerSupport {
def script: String = """
var swfVersionStr = "10.0.0";
var xiSwfUrlStr = "flash/playerProductInstall.swf";
var flashvars = {};
var params = {};
params.quality = "high";
params.bgcolor = "#ebf4ff";
params.allowscriptaccess = "sameDomain";
params.allowfullscreen = "true";
var attributes = {};
attributes.align = "middle";
swfobject.embedSWF(
"${name}", "flashContent",
"100%", "100%",
swfVersionStr, xiSwfUrlStr,
flashvars, params, attributes);
swfobject.createCSS("#flashContent", "display:block;text-align:left;");
"""
/**
* Path to SWF file.
*/
var swfFile: String = _;
override def onComponentTag(tag: ComponentTag) = {
checkComponentTag(tag, "script")
}
override def onComponentTagBody(markupStream: MarkupStream, openTag: ComponentTag) = {
val relativeName = getRequestCycle()
.getProcessor()
.getRequestCodingStrategy()
.rewriteStaticRelativeUrl(swfFile)
// substitute the resolved SWF URL into the script template
val body = script.replace("${name}", relativeName)
replaceComponentTagBody(markupStream, openTag, body)
}
}
Here is an example of its use:
private val gameObject = new SWFObject("game");
gameObject.swfFile = "flash/" + swfFile;
The HTML uses the swfobject script and is based on the standard FlashBuilder export.
