JSOUP parsing for multiple rows

I am trying to parse information from a particular website using JSOUP.
So far I can parse and display a single row. The page contains a lot of HTML and I am quite new to this, so I was wondering: is there a way to parse all table rows on the page whose id contains the word "fixturerow"?
Here is my parser code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://www.irishrugby.ie/club/ulsterbankleagueandcup/fixtures.php").get();
// "tr#fixturerow0" matches only the row whose id is exactly fixturerow0
Elements kelime = doc.select("tr#fixturerow0");
for (Element sectd : kelime) {
    Elements tds = sectd.select("td");
    String result = tds.get(0).text();
    String result1 = tds.get(1).text();
    String result2 = tds.get(2).text();
    String result3 = tds.get(3).text();
    String result4 = tds.get(4).text();
    String result5 = tds.get(5).text();
    String result6 = tds.get(6).text();
    String result7 = tds.get(7).text();
    System.out.println("Date: " + result);
    System.out.println("Time: " + result1);
    System.out.println("League: " + result2);
    System.out.println("Home Team: " + result3);
    System.out.println("Score: " + result4);
    System.out.println("Away Team: " + result5);
    System.out.println("Venue: " + result6);
    System.out.println("Ref: " + result7);
}
Thanks for your time!

You can use the ^= (starts-with) selector:
Elements kelime = doc.select("tr[id^=fixturerow]");
This will return all elements with an id that starts with fixturerow.

You may have better luck with a selector that matches ids starting with the text of interest. Try changing
Elements kelime = doc.select("tr#fixturerow0");
to
Elements kelime = doc.select("tr[id^=fixturerow]");
Here [id^=fixturerow] matches elements whose id attribute starts with "fixturerow", so fixturerow0, fixturerow1 and so on are all selected.
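For completeness, here is a minimal self-contained sketch of the ^= approach applied to the question's loop. It assumes jsoup is on the classpath and parses a small hypothetical HTML string instead of fetching the live page, so the dates, teams and grounds below are invented; the column labels follow the question's code. The class and method names are also hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FixtureRows {
    // Column labels in the order the question's code prints them
    private static final String[] LABELS =
            {"Date", "Time", "League", "Home Team", "Score", "Away Team", "Venue", "Ref"};

    // Parses the HTML, prints every fixture row and returns how many rows matched
    static int printFixtures(String html) {
        Document doc = Jsoup.parse(html);
        // [id^=fixturerow] selects every <tr> whose id starts with "fixturerow"
        Elements rows = doc.select("tr[id^=fixturerow]");
        for (Element row : rows) {
            Elements tds = row.select("td");
            // Stop at whichever runs out first: labels or cells
            for (int i = 0; i < LABELS.length && i < tds.size(); i++) {
                System.out.println(LABELS[i] + ": " + tds.get(i).text());
            }
            System.out.println();
        }
        return rows.size();
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the live fixtures page
        String html = "<table>"
                + "<tr id=\"fixturerow0\"><td>Sat 1 Oct</td><td>14:30</td><td>Division 1A</td>"
                + "<td>Team A</td><td>10-3</td><td>Team B</td><td>Ground A</td><td>Ref A</td></tr>"
                + "<tr id=\"fixturerow1\"><td>Sat 8 Oct</td><td>15:00</td><td>Division 1A</td>"
                + "<td>Team C</td><td>7-7</td><td>Team D</td><td>Ground C</td><td>Ref B</td></tr>"
                + "<tr id=\"header\"><td>not a fixture</td></tr>"
                + "</table>";
        System.out.println("Fixture rows found: " + printFixtures(html));
    }
}
```

Against the real site, the same select call can be run on the Document returned by Jsoup.connect(...).get(); only the parsing step changes.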

Related

Hazelcast Count mechanism

I am trying to use Hazelcast aggregation to perform count operations.
Example:
Here I want to count the number of entries whose JSON contains a salary1 field.
String json1 = "{\r\n" + " \"salary\": 200\r\n" + "}";
String json2 = "{\r\n" + " \"salary\": 300\r\n" + "}";
String json5 = "{\r\n" + " \"salary1\": 300\r\n" + "}";
map.put(1, new HazelcastJsonValue(json1));
map.put(2, new HazelcastJsonValue(json2));
map.put(3, new HazelcastJsonValue(json5));
Long count = map.aggregate(Aggregators.count("salary1"));
System.out.println("count is " + count);
There is only one salary1 field, but the aggregation still returns a count of all entries.
What is the issue?

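I think the problem is that Aggregators.count("salary1") counts every entry it visits, including entries where salary1 is missing (null), so the attribute path does not act as a filter. Use a Predicate to filter the entries first, then count them. Try the following: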
Predicate p = Predicates.notEqual("salary1", null);
Long count = map.aggregate(Aggregators.count(), p);

Splitting array in string does not give last element

Hi, I am splitting a string into an array and storing each character in a variable, but the result is missing the last digit.
String str = "123456";
String[] arrOfStr = str.split("");
String otpnum1 = arrOfStr[0];
String otpnum2 = arrOfStr[1];
String otpnum3 = arrOfStr[2];
String otpnum4 = arrOfStr[3];
String otpnum5 = arrOfStr[4];
String otpnum6 = arrOfStr[5];
System.out.println("otp"+otpnum1+otpnum2+otpnum3+otpnum4+otpnum5+otpnum6);
OUTPUT
System.out: otp12345

You are printing the values without any space or newline between them, which is why you cannot tell the individual variables apart. Use this:
System.out.println("otp " + otpnum1 + " " + otpnum2 + " " + otpnum3 + " " + otpnum4 + " " + otpnum5 + " " + otpnum6);

I understand: the output is otp12345, while you expected otp123456.
But your code looks correct.
I tried your code here as a test, and it works fine.
The output was: otp123456
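One likely explanation, assuming the code ran on a runtime older than Java 8: before Java 8, "123456".split("") returned a leading empty string ("", "1", ..., "6"), so arrOfStr[0] was empty and the sixth digit sat at index 6, which the code never reads. That produces exactly otp12345. Java 8 changed split so that a zero-width match at the start no longer yields that empty element, which would explain why the code works for the answerer. A sketch that avoids split("") altogether by reading characters directly (the class and method names are hypothetical):

```java
import java.util.Arrays;

public class OtpSplit {
    // Splits an OTP string into one-character strings without relying on split(""),
    // whose leading-empty-string behavior differs before Java 8
    static String[] digits(String otp) {
        String[] out = new String[otp.length()];
        for (int i = 0; i < otp.length(); i++) {
            out[i] = String.valueOf(otp.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] arr = digits("123456");
        System.out.println(Arrays.toString(arr));         // [1, 2, 3, 4, 5, 6]
        System.out.println("otp" + String.join("", arr)); // otp123456
    }
}
```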

How to replace all domains with pattern in a XML string in Java?

I have XML output like this (the <xml> element and the xlink:href attribute are just placeholders, so I cannot rely on them when building the regex pattern):
<xml>http://localhost:8080/def/abc/xyx</xml>
<element xlink:href="http://localhostABCDEF/def/ABC/XYZ">Some Text</element>
...
What I want to do is use a Java regex to replace the domain part matching this pattern (I don't know the existing domains in advance):
http(s)?://.*/def/.*
with an input domain (e.g. http://google.com/def), so that the result is:
<xml>http://google.com/def/abc/xyx</xml>
<element xlink:href="http://google.com/def/ABC/XYZ">Some Text</element>
...
How can I do this? I think a Java regex can do it, or perhaps String.replaceAll (though that does not seem to be possible).

Regex: http[s]?:\/{2}.+\/def
Substitution: http://google.com/def
Details:
? Matches between zero and one times
[] Matches a single character present in the list
{2} Matches exactly two times
. Matches any character except a line terminator
+ Matches between one and unlimited times
Java code:
String domain = "http://google.com/def";
String html = "<xml>http://localhost:8080/def/abc/xyx</xml>\r\n<element xlink:href=\"http://localhostABCDEF/def/ABC/XYZ\">Some Text</element>";
html = html.replaceAll("http[s]?:\\/{2}.+\\/def", domain);
System.out.print(html);
Output:
<xml>http://google.com/def/abc/xyx</xml>
<element xlink:href="http://google.com/def/ABC/XYZ">Some Text</element>
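A caveat on the greedy .+ above: it works here only because each URL sits on its own line (. does not match line terminators). If two URLs share a line, for example <a href="http://one.example/def/a"/><b href="http://two.example/def/b"/>, .+ runs to the last /def and the text between the two URLs is swallowed. A sketch of a tighter variant, assuming the URLs never contain whitespace, quotes or angle brackets (the class, the method and the example.* hosts are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DomainRewrite {
    // Scheme, then the shortest run of non-delimiter characters up to "/def",
    // which must be followed by "/". Excluding quotes, whitespace and < >
    // stops one match from spanning two URLs on the same line.
    private static final Pattern DEF_PREFIX =
            Pattern.compile("https?://[^\\s\"'<>]*?/def(?=/)");

    static String replaceDomain(String xml, String newPrefix) {
        // quoteReplacement keeps any '$' or '\' in newPrefix literal
        return DEF_PREFIX.matcher(xml).replaceAll(Matcher.quoteReplacement(newPrefix));
    }

    public static void main(String[] args) {
        String xml = "<xml>http://localhost:8080/def/abc/xyx</xml>\n"
                + "<a xlink:href=\"http://one.example/def/a\"/><b xlink:href=\"https://two.example/def/b\"/>";
        System.out.println(replaceDomain(xml, "http://google.com/def"));
    }
}
```

Both URLs on the second line are rewritten independently, and the first line gives the same result as the answer's regex.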

Actually, this can be done with a regex, which is simpler than parsing the XML document. Here is the answer:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "<epsg:CommonMetaData>\n"
+ " <epsg:type>geographic 2D</epsg:type>\n"
+ " <epsg:informationSource>EPSG. See 3D CRS for original information source.</epsg:informationSource>\n"
+ " <epsg:revisionDate>2007-08-27</epsg:revisionDate>\n"
+ " <epsg:changes>\n"
+ " <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2002.151\"/>\n"
+ " <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2003.370\"/>\n"
+ " <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2006.810\"/>\n"
+ " <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2007.079\"/>\n"
+ " </epsg:changes>\n"
+ " <epsg:show>true</epsg:show>\n"
+ " <epsg:isDeprecated>false</epsg:isDeprecated>\n"
+ " </epsg:CommonMetaData>\n"
+ " </gml:metaDataProperty>\n"
+ " <gml:metaDataProperty>\n"
+ " <epsg:CRSMetaData>\n"
+ " <epsg:projectionConversion xlink:href=\"http://www.opengis.net/def/coordinateOperation/EPSG/0/15593\"/>\n"
+ " <epsg:sourceGeographicCRS xlink:href=\"http://www.opengis.net/def/crs/EPSG/0/4979\"/>\n"
+ " </epsg:CRSMetaData>\n"
+ " </gml:metaDataProperty>"
+ "<gml:identifier codeSpace=\"OGP\">http://www.opengis.net/def/area/EPSG/0/1262</gml:identifier>";
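// Group 1: scheme, host and any path up to "/def"; group 2: the rest of the line after "/def"
Pattern pattern = Pattern.compile("(https?://.*?/def)(/.*)");
Matcher matcher = pattern.matcher(text);
String prefixDomain = "http://localhost:8080/def";
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    // Keep everything after "/def" (splitting on "def" would break if the domain contained "def")
    String url = prefixDomain + matcher.group(2);
    // quoteReplacement stops '$' or '\' in the URL from being read as group references
    matcher.appendReplacement(sb, Matcher.quoteReplacement(url));
    System.out.println(url);
}
matcher.appendTail(sb);
System.out.println(sb.toString());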
This produces the output shown at https://www.diffchecker.com/CyJ8fY8p

How to generate dynamic table in a java class from a list

I am working on an application where I have to build an HTML table in a Java class and save it in the database. I am creating the table in Java, but I don't know how to generate its rows dynamically from 3 lists. Here is what I have done:
+"Interview LineUp"
+" <table border ='1'>"
+"<tr>"
+"<td>Interviewe</td>"
+"<td>Timing1</td>"
+"<td>Timing2</td> "
+"</tr> "
+"<tr>"
+"<td>name</td>"
+"<td>timing1</td> "
+"<td>timing2</td> "
+"</tr> "
+"</table>"
This is the table I am using in the Java class. I have 3 lists containing the names, the timing1 values and the timing2 values. If each list has 3 values, I want 3 rows to be generated.
The lists are:
List<String> interviewTimingToFrom1 = Arrays.asList(interviewTime1.split(","));
List<String> interviewTimingToFrom2 = Arrays.asList(interviewTime2.split(","));
List<String> listOfinterviewerName = Arrays.asList(intervierwName.split(","));
I tried something like this:
+"<tr>";
for(int k=0;k<listOfinterviewerName .size();k++){
+"<td>listOfinterviewerName .get(k)</td>"
+}
How can I do that in the Java class? Thanks in advance.

+"test" is not a valid Java statement. What are you adding the text to?
When building a String incrementally, you should always use a StringBuilder.
List<String> interviewTimingToFrom1 = Arrays.asList(interviewTime1.split(","));
List<String> interviewTimingToFrom2 = Arrays.asList(interviewTime2.split(","));
List<String> listOfinterviewerName = Arrays.asList(intervierwName.split(","));
StringBuilder buf = new StringBuilder();
buf.append("<html>" +
"<body>" +
"<table>" +
"<tr>" +
"<th>Interviewe</th>" +
"<th>Timing1</th>" +
"<th>Timing2</th>" +
"</tr>");
for (int i = 0; i < listOfinterviewerName.size(); i++) {
buf.append("<tr><td>")
.append(listOfinterviewerName.get(i))
.append("</td><td>")
.append(interviewTimingToFrom1.get(i))
.append("</td><td>")
.append(interviewTimingToFrom2.get(i))
.append("</td></tr>");
}
buf.append("</table>" +
"</body>" +
"</html>");
String html = buf.toString();
Of course, to guard against cross-site scripting (XSS) attacks, you should HTML-escape the values.
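A minimal sketch of that escaping, combined with the row loop above. The escapeHtml helper is hand-written for illustration (an assumption, not a library API); in production a maintained encoder is the safer choice. It also caps the row count at the shortest list, so mismatched input cannot throw IndexOutOfBoundsException. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class InterviewTable {
    // Escapes the five characters that are significant in HTML text and attribute values
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds one <tr> per index, stopping at the shortest list
    static String buildTable(List<String> names, List<String> timing1, List<String> timing2) {
        int rows = Math.min(names.size(), Math.min(timing1.size(), timing2.size()));
        StringBuilder buf = new StringBuilder();
        buf.append("<table border=\"1\">")
           .append("<tr><th>Interviewee</th><th>Timing1</th><th>Timing2</th></tr>");
        for (int i = 0; i < rows; i++) {
            buf.append("<tr><td>").append(escapeHtml(names.get(i)))
               .append("</td><td>").append(escapeHtml(timing1.get(i)))
               .append("</td><td>").append(escapeHtml(timing2.get(i)))
               .append("</td></tr>");
        }
        buf.append("</table>");
        return buf.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Adam,Smith <admin>".split(","));
        List<String> timing1 = Arrays.asList("11:30,12:30".split(","));
        List<String> timing2 = Arrays.asList("13:30,15:00".split(","));
        System.out.println(buildTable(names, timing1, timing2));
    }
}
```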

I am not entirely sure what you are asking, but as I understand it, the code below should work for you.
public static void main(String[] args) {
String s = ""+"Interview LineUp"
+" <table border ='1'>"
+"<tr>"
+"<td>Interviewe</td>"
+"<td>Timing1</td>"
+"<td>Timing2</td> "
+"</tr> "
;
String interviewTime1="11:30,12:30";
String interviewTime2="13:30,15:00";
String intervierwName="Adam,Smith";
List<String> interviewTimingToFrom1 = Arrays.asList(interviewTime1.split(","));
List<String> interviewTimingToFrom2 = Arrays.asList(interviewTime2.split(","));
List<String> listOfinterviewerName = Arrays.asList(intervierwName.split(","));
for(int i=0;i<interviewTimingToFrom1.size();i++)
{
s = s.concat( "<tr>"
+"<td>"+listOfinterviewerName.get(i)+"</td>"
+"<td>"+interviewTimingToFrom1.get(i)+"</td> "
+"<td>"+interviewTimingToFrom2.get(i)+"</td> "
+"</tr> ");
}
s=s.concat( "</table>");
System.out.println(s);
}

Search Function in HTML

How can I search for text in an HTMLDocument and return the start and end index of that word or sentence, ignoring tags while searching?
Searching: stackoverflow
html: <p class="red">stack<b>overflow</b></p>
This should return indices 15 and 31, just like a browser's find-in-page search.

If you want to do that in Java, here is a rough example using Jsoup. Of course, you would need to work out the details so that the code handles any given HTML.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<html><head><title>First parse</title></head>"
        + "<body><p class=\"red\">stack<b>overflow</b></p></body></html>";
String search = "stackoverflow";
Document doc = Jsoup.parse(html);
Element p = doc.body().getElementsByTag("p").first();
String pPlainText = p.text(); // "stackoverflow"
// contains() is a literal substring check; matches() would treat the text as a regex
if (pPlainText.contains(search)) {
    System.out.println("text found in html");
    String pElementString = doc.body().html(); // inner HTML of <body>: <p class="red">stack<b>overflow</b></p>
    String firstWord = p.ownText(); // "stack"
    String secondWord = p.children().first().ownText(); // "overflow"
    // search the text in pElementString
    int start = pElementString.indexOf(firstWord); // 15
    int end = pElementString.lastIndexOf(secondWord) + secondWord.length(); // 31
    System.out.println(start + " >> " + end);
} else {
    System.out.println("cannot find searched text");
}
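The Jsoup answer relies on knowing how the text is split across elements. A more general, browser-like approach is to build the tag-stripped text while remembering where each visible character sits in the raw HTML, search the plain text, and map the hit back. The sketch below uses only the standard library and is deliberately naive (the class and method names are hypothetical): it treats everything between < and > as markup and does not decode entities such as &amp;.

```java
public class TagAwareSearch {
    // Returns {start, endExclusive} offsets into the raw HTML for the first occurrence
    // of needle in the tag-stripped text, or null if there is no match.
    // Naive: comments, CDATA, entities and '>' inside attribute values are not handled.
    static int[] find(String html, String needle) {
        if (needle.isEmpty()) {
            return null;
        }
        StringBuilder text = new StringBuilder();
        // rawIndex[k] = position in html of the k-th visible character
        int[] rawIndex = new int[html.length()];
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inTag) {
                if (c == '>') {
                    inTag = false;
                }
            } else if (c == '<') {
                inTag = true;
            } else {
                rawIndex[text.length()] = i;
                text.append(c);
            }
        }
        int pos = text.indexOf(needle);
        if (pos < 0) {
            return null;
        }
        int start = rawIndex[pos];
        int end = rawIndex[pos + needle.length() - 1] + 1;
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int[] r = find("<p class=\"red\">stack<b>overflow</b></p>", "stackoverflow");
        System.out.println(r[0] + " >> " + r[1]); // 15 >> 31
    }
}
```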

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
