How to get SVGDocument object from an SVG string? - java

I want to get an SVGDocument object to load into a JSVGCanvas, but I only have an SVG string, not a file, so I cannot construct the document from a URI.

You can read your SVG from a StringReader like this:
StringReader reader = new StringReader(svgString);
String uri = "file:make-something-up";
String parser = XMLResourceDescriptor.getXMLParserClassName();
SAXSVGDocumentFactory f = new SAXSVGDocumentFactory(parser);
SVGDocument doc = f.createSVGDocument(uri, reader);
You need to make up a valid URI, but its value is not important unless your SVG makes relative references to other URIs.
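To illustrate why the made-up URI only matters for relative references, here is a small sketch using plain java.net.URI rather than Batik's own resolver (whose internals may differ). The base path /tmp/svg/drawing.svg and the class name BaseUriDemo are hypothetical, chosen only for illustration.

```java
import java.net.URI;

public class BaseUriDemo {
    // Resolves a reference found inside a document against the document's base URI.
    public static URI resolve(String baseUri, String reference) {
        return URI.create(baseUri).resolve(reference);
    }

    public static void main(String[] args) {
        String base = "file:///tmp/svg/drawing.svg"; // hypothetical base URI

        // A relative reference depends on the base URI.
        System.out.println(resolve(base, "images/logo.png").getPath());

        // An absolute reference is returned unchanged, whatever the base is.
        System.out.println(resolve(base, "http://example.com/logo.png"));
    }
}
```

If the SVG contains only absolute references (or none at all), the base URI never influences the result, which is why a placeholder is enough in that case.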

Related

Trying to get the normal text between an opening and closing element tag

I am trying to print just the text between one specific element's opening and closing tags in an XML file. Here is my Java code:
SAXBuilder builder = new SAXBuilder();
byte[] requestFile = FileManager.getByteArray(args[0]);
byte[] responseFile = FileManager.getByteArray(args[1]);
InputStream request = new ByteArrayInputStream(requestFile);
InputStream response = new ByteArrayInputStream(responseFile);
Document requestDoc = builder.build(request);
Document responseDoc = builder.build(response);
String xpathResponseStr = "//status";
JDOMXPath xpath = new JDOMXPath(xpathResponseStr);
Element responseElem = (Element)xpath.selectSingleNode(requestDoc);
String statusRequestText = responseElem.getTextTrim();
System.out.println("RESPONSE: \n" + statusRequestText);
And here is my XML file that I am reading in:
<response>
<status>success</status>
<generatedDate>
<date>2022-09-08</date>
<time>12:03:23</time>
</generatedDate>
<filingInformation>
<paymentInformation>
<amount>0.00</amount>
</paymentInformation>
</filingInformation>
</response>
I am essentially trying to get my console to print the word "success" between the tags. But instead I am getting a null pointer. Not sure if this is because my xpath expression is incorrect or what exactly. Any input would help!
My mistake was passing the wrong Document object in this call:
Element responseElem = (Element)xpath.selectSingleNode(requestDoc);
It should have received responseDoc instead of requestDoc. The two objects hold different XML, and requestDoc contains no <status> element, so selectSingleNode returned null and the following getTextTrim() call threw the NullPointerException.
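The same failure mode can be reproduced with a minimal sketch. It uses the JDK's built-in javax.xml.xpath API instead of JDOM and Jaxen, so the calls differ from the code above; the class name StatusLookup and the sample request XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StatusLookup {
    // Parses an XML string into a DOM document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the trimmed text of the first <status> element, or null if there is none.
    public static String statusOf(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate("//status", doc, XPathConstants.NODE);
        return node == null ? null : node.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        Document requestDoc = parse("<request><id>1</id></request>"); // hypothetical request
        Document responseDoc = parse("<response><status>success</status></response>");

        System.out.println("REQUEST:  " + statusOf(requestDoc));  // no <status> element here
        System.out.println("RESPONSE: " + statusOf(responseDoc));
    }
}
```

Checking the node for null before reading its text turns the crash into a visible "not found" result, which makes it easier to spot that the wrong document was queried.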

Get website name as displayed on a browser tab?

How can I get the String representation of what is displayed on a tab when opening a website in a browser? Let's say I opened http://www.stackoverflow.com: is it possible to extract the String "Stack Overflow", as it is shown on the browser tab?
I'm interested in Java implementation - java.net.URL doesn't seem to have a method for that.
No, java.net.URL won't do it; you need an HTML parser such as jsoup. Then you just take the content of the title tag in the head.
E.g., assuming you have a URL:
Document doc = Jsoup.connect(url).get();
Element titleElement = doc.select("head title").first(); // Or just "title", it's always supposed to be in the head
String title = titleElement == null ? null : titleElement.text();
Look for the following pattern in the response:
private static final Pattern TITLE_TAG = Pattern.compile("\\<title>(.*)\\</title>", Pattern.CASE_INSENSITIVE|Pattern.DOTALL);
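As a rough sketch of how such a pattern might be applied to a page that has already been downloaded into a String: the class and method names below are hypothetical, and the pattern is a slightly adjusted variant (a reluctant `.*?` so it stops at the first closing tag, and `[^>]*` to tolerate attributes on the title tag), not the exact one above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {
    // Adjusted variant of the pattern above: reluctant match, attributes allowed.
    private static final Pattern TITLE_TAG =
            Pattern.compile("<title[^>]*>(.*?)</title>",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the title text with whitespace collapsed, or null if no title tag is found.
    public static String extractTitle(String html) {
        Matcher matcher = TITLE_TAG.matcher(html);
        if (!matcher.find()) {
            return null;
        }
        return matcher.group(1).replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><TITLE>\n  Stack Overflow\n</TITLE></head><body></body></html>";
        System.out.println(extractTitle(html));
    }
}
```

This only works on simple, well-behaved pages; comments or scripts containing the text "<title>" would confuse it.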
Another option, since parsing HTML with a regex is generally discouraged, is to use
javax.swing.text.html.HTMLDocument
URL url = new URL("http://yourwebsitehere.com");
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
HTMLEditorKit htmlKit = new HTMLEditorKit();
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
// Ignore <meta> charset directives so that read() does not throw ChangedCharSetException
htmlDoc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
// Parse the page into the document; the title property is only set during parsing
htmlKit.read(br, htmlDoc, 0);
String title = (String) htmlDoc.getProperty(HTMLDocument.TitleProperty);
System.out.println("HTMLDocument Title: " + title);

Convert html String to org.w3c.dom.Document in Java

To convert from HTML String to
org.w3c.dom.Document
I'm using
jtidy-r938.jar
here is my code:
public static Document getDoc(String html) {
Tidy tidy = new Tidy();
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setWraplen(Integer.MAX_VALUE);
// tidy.setPrintBodyOnly(true);
tidy.setXmlOut(false);
tidy.setShowErrors(0);
tidy.setShowWarnings(false);
// tidy.setForceOutput(true);
tidy.setQuiet(true);
Writer out = new StringWriter();
PrintWriter dummyOut = new PrintWriter(out);
tidy.setErrout(dummyOut);
tidy.setSmartIndent(true);
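// Encode explicitly as UTF-8 so the bytes match tidy.setInputEncoding("UTF-8")
ByteArrayInputStream inputStream = new ByteArrayInputStream(html.getBytes(java.nio.charset.StandardCharsets.UTF_8));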
Document doc = tidy.parseDOM(inputStream, null);
return doc;
}
But sometimes the library works incorrectly and some tags are lost.
Can you recommend a good open-source library for this task?
Thanks very much!
You don't say in which cases the library gives a bad result.
Nevertheless, I regularly work with HTML files that I must extract data from, and the main problem I encounter is that some tags are invalid, for example because they are not closed.
The best solution I have found is the HtmlCleaner API.
It lets you make your HTML well formed.
After that, transforming it into a W3C Document or another strict format is easier.
With HtmlCleaner, you could do something like this:
HtmlCleaner cleaner = new HtmlCleaner();
TagNode node = cleaner.clean(html);
DomSerializer ser = new DomSerializer(cleaner.getProperties());
Document myW3cDoc = ser.createDOM(node);
The DomSerializer used here is the class from HtmlCleaner.
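To show what "not well formed" means in practice, here is a small sketch that feeds two fragments to the JDK's strict XML parser. It does not use HtmlCleaner or JTidy; the class name WellFormedCheck and the sample fragments are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Returns true if a strict XML parser accepts the markup as is.
    public static boolean isWellFormed(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Typical HTML: <br> is never closed, so a strict parser rejects it.
        System.out.println(isWellFormed("<p>first line<br>second line</p>"));
        // The cleaned-up form closes every tag and is accepted.
        System.out.println(isWellFormed("<p>first line<br/>second line</p>"));
    }
}
```

A cleaning step such as HtmlCleaner's is what turns the first kind of input into the second, so that a DOM can be built from it.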

CSS parser parsing string content

I am trying to use CSS Parser in a Java project to extract the CSS rules/DOM from a String of text input.
All the examples that I have come across take the CSS file as input. Is there a way to bypass the file reading and work with the string content of the CSS file directly?
Because the class that I am working on gets only the string content of the css file and all the reading has already been taken care of.
Right now I have this, where the 'cssfile' is the filepath for css file being parsed.
InputStream stream = oParser.getClass().getResourceAsStream(cssfile);
InputSource source = new InputSource(new InputStreamReader(stream));
CSSOMParser parser = new CSSOMParser();
CSSStyleSheet stylesheet = parser.parseStyleSheet(source, null, null);
CSSRuleList ruleList = stylesheet.getCssRules();
System.out.println("Number of rules: " + ruleList.getLength());
A workaround I found is to wrap the contents in a StringReader and set it as the character stream of the InputSource, although there may be a better way to do this:
InputSource inputSource = new InputSource();
Reader characterStream = new StringReader(cssContent);
inputSource.setCharacterStream(characterStream);
CSSStyleSheet stylesheet = cssParserObj.parseStyleSheet(inputSource, null, null);
CSSRuleList ruleList = stylesheet.getCssRules();

XPath application using tika parser

I want to clean irregular web content (it may be HTML, PDF, images, etc., but mostly HTML). I am using the Tika parser for that, but I don't know how to apply XPath the way I do with HtmlCleaner.
The code I use is:
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
URL u = new URL("http://stackoverflow.com/questions/9128696/is-there-any-way-to-reach-drop-moment-in-drag-and-drop");
new HtmlParser().parse(u.openStream(),handler, metadata, context);
System.out.println(handler.toString());
In this case, however, I get no output, although for the URL google.com I do get output.
In either case, I don't know how to apply the XPath.
Any ideas, please?
I also tried building a custom XPath matcher, the way BodyContentHandler does:
HttpClient client = new HttpClient();
GetMethod method = new GetMethod("http://stackoverflow.com/questions/9128696/is-there-any-way-to-reach-drop-moment-in-drag-and-drop");
int status = client.executeMethod(method);
HtmlParser parse = new HtmlParser();
XPathParser parser = new XPathParser("xhtml", "http://www.w3.org/1999/xhtml");
//Matcher matcher = parser.parse("/xhtml:html/xhtml:body/descendant:node()");
Matcher matcher = parser.parse("/html/body//h1");
ContentHandler textHandler = new MatchingContentHandler(new WriteOutContentHandler(), matcher);
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
parse.parse(method.getResponseBodyAsStream(), textHandler,metadata ,context);
System.out.println("content: " + textHandler.toString());
But I am still not getting the content at the given XPath.
I'd suggest you take a look at the source code for BodyContentHandler, which comes with Tika. BodyContentHandler only returns the XML within the body tag, based on an XPath.
In general, though, you should use a MatchingContentHandler to wrap your chosen ContentHandler with an XPath, which is what BodyContentHandler does internally.
