Add a new string in between an existing string - Java

I have read the HTML content into a string, and now I need to add a new attribute to the body tag. I was thinking of using StringBuilder for this, but I can't work out the logic. Any help would be really appreciated.
Existing HTML
<body class="temporaryrevision">
HTML that I want to create
<body class="temporaryrevision" bgcolor="#FFFFFF">

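String htmlString = "<html>...<body class=\"temporaryrevision\">..</body>...</html>";
// split() drops the matched delimiter, so the body tag has to be written back in
String[] tempData = htmlString.split("<body class=\"temporaryrevision\"");
String data = tempData[0] + "<body class=\"temporaryrevision\" bgcolor=\"#FFFFFF\"" + tempData[1];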
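Since the question asks about StringBuilder specifically, here is a minimal sketch of that approach: find the opening body tag, then insert the new attribute just before the tag's closing ">". It assumes the tag is written in lowercase as "<body" and that no attribute value inside the body tag contains a ">" character; the class and method names are made up for illustration.

```java
// Hypothetical helper: inserts an attribute into the first <body ...> start tag.
class BodyAttributeInserter {

    static String addBodyAttribute(String html, String attribute) {
        StringBuilder sb = new StringBuilder(html);
        // Locate the start of the <body ...> tag; -1 means there is no body tag.
        int bodyStart = sb.indexOf("<body");
        if (bodyStart < 0) {
            return html;
        }
        // The first '>' after "<body" closes the start tag
        // (assumes no '>' inside the tag's attribute values).
        int tagEnd = sb.indexOf(">", bodyStart);
        if (tagEnd < 0) {
            return html;
        }
        // Insert " attribute" right before '>' so existing attributes are kept.
        sb.insert(tagEnd, " " + attribute);
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body class=\"temporaryrevision\">text</body></html>";
        // prints <html><body class="temporaryrevision" bgcolor="#FFFFFF">text</body></html>
        System.out.println(addBodyAttribute(html, "bgcolor=\"#FFFFFF\""));
    }
}
```

Unlike split(), this leaves the rest of the string untouched and does nothing if the body tag is missing.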

You can use jQuery for this (this runs in the browser, not on the Java string):
$( ".temporaryrevision" ).attr( "bgcolor", "#FFFFFF" );

I'll suggest a way that is generic:
Use an XML parser.
Load the file into Java and pass it to an XML parser that builds an object tree (a DOM parser such as DocumentBuilder; SAXParser only reports parsing events and does not build a tree).
Traverse the parsed object tree to the element with the required tag name (the body element in this example).
Use the library's method to add the attribute to the element.
Convert the object tree back to XML format (basic HTML).
Write the updated content back, replacing the file contents.
This approach is more complex, and it needs the HTML to be well-formed XML, but it will work generically for all such needs.
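The steps above can be sketched with the JDK's built-in DOM classes (javax.xml.parsers and javax.xml.transform). This is only a sketch under one big assumption: the markup must be well-formed XHTML, because an XML parser rejects typical HTML such as an unclosed <br>. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomAttributeAdder {

    static String addAttribute(String xhtml, String tagName, String name, String value) {
        try {
            // Parse the markup into a DOM tree (fails on markup that is not well-formed).
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            // Go to the first element with the requested tag name and set the attribute.
            Element target = (Element) doc.getElementsByTagName(tagName).item(0);
            if (target == null) {
                return xhtml;
            }
            target.setAttribute(name, value);

            // Serialize the modified tree back to a string as XML, without an XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or serialize the markup", e);
        }
    }
}
```

The DOM does not guarantee attribute order, so bgcolor may come out before class; that is harmless for HTML. For real-world HTML that is not well-formed, an HTML-aware parser such as jsoup is the more practical choice.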

Related

JavaFX: how to parse an HTML string into an HTML Element?

I wrote a method to insert a div containing text passed as a parameter.
Then I noticed I need to add various HTML content into that div. The current method is based on these five basic lines:
//engine is the WebEngine object of some WebView object
Node html = engine.getDocument().getChildNodes().item(0);
Node body = html.getChildNodes().item(1);
Element e = engine.getDocument().createElement("div");
e.setTextContent(msg);
body.appendChild(e);
So here is my question: is there a way to parse some HTML content into an Element object, so that I can append that element to the document?
Example HTML String: <b>SomeText</b>
I solved the problem with JavaScript: I can append any HTML data with JS.
Example:
engine.executeScript("document.body.innerHTML += '<div><b>SomeText</b></div>' ");
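The script above breaks if the HTML contains a single quote, a backslash or a line break, because the markup is pasted straight into a JavaScript string literal. Here is a minimal sketch of a helper that escapes those characters first; the class and method names are hypothetical. It also swaps innerHTML += for the standard DOM method insertAdjacentHTML('beforeend', ...), which appends the new markup without re-parsing the existing body content (innerHTML += recreates the existing nodes, which drops event listeners attached to them).

```java
// Hypothetical helper: builds a script that appends an HTML fragment to <body>,
// escaping the fragment so it is safe inside a single-quoted JavaScript string literal.
class HtmlScriptBuilder {

    static String appendToBodyScript(String html) {
        StringBuilder escaped = new StringBuilder();
        for (char c : html.toCharArray()) {
            switch (c) {
                case '\\': escaped.append("\\\\"); break; // backslash
                case '\'': escaped.append("\\'"); break;  // single quote ends the literal
                case '\n': escaped.append("\\n"); break;  // raw line breaks are not allowed
                case '\r': escaped.append("\\r"); break;
                default: escaped.append(c);
            }
        }
        // insertAdjacentHTML parses only the new fragment and appends it at the end of body
        return "document.body.insertAdjacentHTML('beforeend', '" + escaped + "');";
    }
}
```

Usage: engine.executeScript(HtmlScriptBuilder.appendToBodyScript("<div><b>SomeText</b></div>"));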
I recently created such a tool; I hope it helps:
https://github.com/graycatdeveloper/JavaFXHtmlText
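If you'd rather stay in the Java DOM API, one option (a sketch, not tested against WebView) is to parse the fragment with the JDK's XML parser and then rebuild it node by node with the target document's own createElement, setAttribute, createTextNode and appendChild, the same kind of calls the question's code already makes on engine.getDocument(). Rebuilding avoids handing nodes from one DOM implementation to another. It only works for well-formed fragments (<br/> rather than <br>, no HTML-only entities such as &nbsp;), and the names below are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class HtmlFragmentCopier {

    // Parses a well-formed fragment (e.g. "<b>SomeText</b>") wrapped in a <div>
    // and returns an equivalent <div> element created by the target document.
    static Element parseIntoDiv(Document target, String fragment) {
        Document parsed;
        try {
            parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<div>" + fragment + "</div>")));
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed: " + fragment, e);
        }
        return (Element) copyInto(target, parsed.getDocumentElement());
    }

    // Recursively recreates elements (with attributes) and text nodes in 'target';
    // other node types, such as comments, are skipped.
    private static Node copyInto(Document target, Node source) {
        if (source.getNodeType() == Node.TEXT_NODE) {
            return target.createTextNode(source.getNodeValue());
        }
        if (source.getNodeType() != Node.ELEMENT_NODE) {
            return null;
        }
        Element copy = target.createElement(source.getNodeName());
        NamedNodeMap attributes = source.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            copy.setAttribute(attribute.getNodeName(), attribute.getNodeValue());
        }
        NodeList children = source.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = copyInto(target, children.item(i));
            if (child != null) {
                copy.appendChild(child);
            }
        }
        return copy;
    }
}
```

With the question's variables, usage would look like body.appendChild(HtmlFragmentCopier.parseIntoDiv(engine.getDocument(), "<b>SomeText</b>"));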

How to select text in HTML tag without a tag around it (JSoup)

I would like to select the text inside the strong tag, but without the div nested inside it.
Is it possible to do this with jsoup directly?
My attempt at the selection (it doesn't work; it selects the full content inside the strong tag):
Elements selection = htmlDocument.select("strong").select("*:not(.dontwantthatclass)");
HTML:
<strong>
I want that text
<div class="dontwantthatclass">
</div>
</strong>
You are looking for the ownText() method.
String txt = htmlDocument.select("strong").first().ownText();
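jsoup is a third-party library, so as a self-contained illustration of what ownText() does, here is the same idea written against the JDK's org.w3c.dom API: keep only the text nodes that are direct children of the element and skip everything inside nested elements. This is a sketch with hypothetical names; it needs well-formed markup, and it only trims leading and trailing whitespace of the result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OwnTextDemo {

    // Concatenates only the Text nodes that are direct children of 'element',
    // ignoring text inside nested elements (the idea behind jsoup's ownText()).
    static String ownText(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Returns the own text of the first element named 'tagName', or null if there is none.
    static String ownTextOfFirst(String markup, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            Element first = (Element) doc.getElementsByTagName(tagName).item(0);
            return first == null ? null : ownText(first);
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed", e);
        }
    }
}
```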
Have a look at the various methods jsoup has for dealing with this: https://jsoup.org/apidocs/org/jsoup/nodes/Element.html. You can use remove(), removeChild(), etc.
One thing you can do is use a regex.
Here is a sample regex that matches a start tag together with its end tag, as well as self-closing tags such as <br/>:
https://www.debuggex.com/r/1gmcSdz9s3MSimVQ
So you can do it like this (Elements.html() returns the inner HTML of the selection):
String cleaned = selection.html().replaceAll("(?is)<([^ >]+)[^>]*>.*?</\\1>|<[^/]+/>", "");
You can further modify this regex to match most of your cases.
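For reference, here is a self-contained Java version of this regex approach, applied to the inner HTML of the strong tag from the question. The (?is) flags make it case-insensitive and let "." match newlines, which the sample needs because the div spans several lines. Treat it as a sketch with hypothetical names: a regex cannot reliably handle nested elements with the same tag name (for <div><div></div></div> it stops at the first </div>), so the ownText() answer is the sturdier option.

```java
import java.util.regex.Pattern;

class StripNestedTags {

    // Alternative 1 removes an element plus its content up to the matching close tag;
    // alternative 2 removes self-closing tags such as <br/>.
    private static final Pattern NESTED_ELEMENT =
            Pattern.compile("(?is)<([^ >]+)[^>]*>.*?</\\1>|<[^/]+/>");

    // Removes nested elements from an inner-HTML string and trims the remaining text.
    static String stripNestedElements(String innerHtml) {
        return NESTED_ELEMENT.matcher(innerHtml).replaceAll("").trim();
    }
}
```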
Another thing you can do is further process your variable using JavaScript or VBScript:
Elements selection = htmlDocument.select("strong");
jQuery code:
var removeHTML = function(text, selector) {
    var wrapped = $("<div>" + text + "</div>");
    wrapped.find(selector).remove();
    return wrapped.html();
}
Combined with a regular expression, you can use jsoup's ownText() method to get the text and remove unwanted strings.
I guess you're using jQuery, so you could use the innerText property of your strong element:
var selection = $("strong")[0].innerText;
https://jsfiddle.net/scratch_cf/8ds4uwLL/
PS: If you want to wrap the retrieved text in a strong tag, I think you'll have to build a new element, like $('<strong>retrievedText</strong>');

JSoup check if <HTML>,<HEAD> and <BODY> tags are present

Hi, I am using JSoup to parse an HTML file. After parsing, I want to check whether the file contains the <html> tag. I am using the following code to check that:
htmlDom = parser.parse("<p>My First Heading</p>clk");
Elements pe = htmlDom.select("html");
System.out.println("size "+pe.size());
The output I get is "size 1" even though there is no html tag present. My guess is that this is because the html tag is not mandatory and is implicit. The same is the case for the head and body tags. Is there any way I can check for sure whether these tags are present in the input file?
Thank you.
It does not return 1 because the tag is implicit, but because it is present in the Document object htmlDom after you have parsed the custom HTML.
That is because Jsoup tries to conform to the HTML5 parsing rules, and thus adds missing elements and tries to fix a broken document structure. I'm quite sure you would also get 1 if you ran the following:
Elements pe = htmlDom.select("head");
System.out.println("size "+pe.size());
To parse the HTML without Jsoup trying to clean it or make it valid, you can instead use the included XML parser (Parser.xmlParser()), as below, which parses the HTML as is.
String customHtml = "<p>My First Heading</p>clk";
Document customDoc = Jsoup.parse(customHtml, "", Parser.xmlParser());
So, as opposed to your assumption in the comments of the question, this is very much possible to do with Jsoup.
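If all you need is to know whether the original input literally contains these tags, another option (a different technique from the answer's XML parser) is to scan the raw text before handing it to any parser. A minimal sketch with hypothetical names; it is case-insensitive and does not mistake <header> for <head>, but it will report a tag that only appears inside a comment or a script string.

```java
import java.util.regex.Pattern;

class ExplicitTagCheck {

    // True if the raw input contains a start tag for 'tagName' (case-insensitive),
    // e.g. <html> or <HTML lang="en">, but not <header> when checking for "head".
    static boolean hasStartTag(String rawHtml, String tagName) {
        Pattern startTag = Pattern.compile(
                "<" + Pattern.quote(tagName) + "(\\s[^>]*)?/?>", Pattern.CASE_INSENSITIVE);
        return startTag.matcher(rawHtml).find();
    }
}
```

Usage: ExplicitTagCheck.hasStartTag(customHtml, "html") returns false for "<p>My First Heading</p>clk".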

jmesa renders the html as text

I'm using jmesa directly in Java, calling tableModel.render() to get the HTML. Some of the web objects in my result lists contain HTML, for example:
class blah {
    String email;
    public String getEmailLink() {
        // mailto: link whose text is the address itself
        return "<a href='mailto:" + email + "'>" + email + "</a>";
    }
}
In my Java code I would just do this:
htmlRow.addColumn(new HtmlColumn("emailLink"));
jmesa is rendering this as text. How can I tell jmesa to output the value as-is, so that it is rendered as HTML in the document?
TIA
Looking at the JMesa source code, HtmlCellEditor automatically escapes HTML.
I haven't tested it, but you should be able to override the default HtmlCellEditor with a different type, such as the bare-bones BasicCellEditor. It shouldn't be too much extra code:
HtmlColumn emailLinkColumn = new HtmlColumn("emailLink");
emailLinkColumn.setCellEditor(new BasicCellEditor());
htmlRow.addColumn(emailLinkColumn);
Another option is to create a custom CellEditor and have it create your <a> tag for you instead of doing it in your bean. This page should get you started with custom CellEditors if you want to go that route.
BTW, if you are only changing the value inside a cell, overriding/replacing the CellEditor is probably all you need (the CellEditor is analogous to the body of a <td>). The CellRenderer is concerned with the entire cell (analogous to the <td> as well as its contents).
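If you go the custom CellEditor route, the editor's job boils down to returning a ready-made HTML string for the cell. Here is a minimal sketch of that string-building part in plain Java, with hypothetical names; how it plugs into jmesa's CellEditor interface is not shown. The email address is HTML-escaped so an odd value cannot break the markup. The escapeHtml method also shows the kind of escaping that makes the question's link appear as text: "<" becomes "&lt;", so the browser displays the tag instead of rendering it.

```java
// Hypothetical helper for building the cell's HTML outside the bean.
class EmailLinkBuilder {

    // Escapes the characters that are special in HTML text and quoted attribute values.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a complete mailto: link for the given address.
    static String emailLink(String email) {
        String safe = escapeHtml(email);
        return "<a href=\"mailto:" + safe + "\">" + safe + "</a>";
    }
}
```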
Use an HtmlCellRenderer as shown in this tutorial.

Parsing HTML and get all the nodes

I need to parse an HTML file in Java. Unlike XML, there are no repetitive tags, so I need code that can parse the HTML file and reach all nodes, including nested tags. The HTML code is not fixed; in other words, given any HTML code, I need to reach all the tags in it.
Try this HTML Parser:
http://htmlparser.sourceforge.net/samples.html
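If you'd rather not add a dependency, the JDK ships an old HTML parser in Swing (javax.swing.text.html.parser.ParserDelegator) that reports every tag it encounters, nested or not, through a callback. A minimal sketch with hypothetical names; this parser only understands HTML 3.2, so newer tags may not be recognized, and it may report implied tags that are not literally in the input.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class TagLister {

    // Returns the names of the tags in document order, including nested ones.
    static List<String> listTags(String html) {
        List<String> tags = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int pos) {
                tags.add(tag.toString()); // tags with content, e.g. <p>, <b>
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attributes, int pos) {
                tags.add(tag.toString()); // tags without an end tag, e.g. <br>, <img>
            }
        };
        try {
            // 'true' tells the parser to ignore any charset declared in the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tags;
    }
}
```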
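I think you need this (JavaScript, run in the browser):
var els = document.getElementsByTagName("*");
// getElementsByTagName returns a live collection, so collect the names before writing anything
var names = [];
for (var i = 0; i < els.length; i++) names.push(els[i].nodeName);
document.write(names.join("<br />"));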
