The row contains
row number -- name surname -- instructor name-- E
</tr>
<tr height=20 style='height:15.0pt'>
<td height=20 class=xl6429100 align=right width=28 style='height:15.0pt;
border-top:none;width:21pt'>row number</td>
<td class=xl8629100 width=19 style='border-top:none;border-left:none;
width:14pt'> </td>
<td class=xl6529100 width=137 style='border-top:none;border-left:none;
width:103pt'>name</td>
<td class=xl6529100 width=92 style='border-top:none;border-left:none;
width:69pt'>surname</td>
<td class=xl7929100 style='border-top:none;border-left:none'>instructor name</td>
<td class=xl8129100 style='border-top:none'>grade</td>
I want to retrieve only one row from this HTML file to check my own grade. I fetch the HTML source with Java, but how can I get to the row I want? I will find the surname first. Once I have that row, how can I reach the grade column?
Here is my code:
import java.net.*;
import java.io.*;
public class staj {
public static void main(String[] args) throws Exception {
URL staj = new URL("http://www.cs.bilkent.edu.tr/~sekreter/SummerTraining/2014G/CS399.htm");
BufferedReader in = new BufferedReader(new InputStreamReader(staj.openStream()));
String inputLine;
String grade;
while ((inputLine = in.readLine()) != null){
if(inputLine.contains(mysurname))
//grade = WHAT?
}
in.close();
}
Also, is Java efficient and appropriate for this purpose? Which language would be better?
You should definitely use the Jsoup library to extract what you need from an HTML document: http://jsoup.org/
I've created sample code that extracts data from the table in your description: https://gist.github.com/wololock/15f511fd9d7da9770f1d
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class GradeLookup {
    public static void main(String[] args) throws IOException {
        String url = "http://www.cs.bilkent.edu.tr/~sekreter/SummerTraining/2014G/CS399.htm";
        String username = "Samet";
        Document document = Jsoup.connect(url).get();
        // keep only the rows whose text contains the name
        Elements rows = document.select("tr:contains(" + username + ")");
        for (Element row : rows) {
            System.out.println("---------------");
            System.out.printf("No: %s\n", row.select("td:eq(0)").text());
            System.out.printf("Evaluator: %s\n", row.select("td:eq(4)").text());
            System.out.printf("Status: %s\n", row.select("td:eq(5)").text());
        }
    }
}
Take a look at this:
document.select("tr:contains("+username+")");
Jsoup lets you use jQuery-like methods and selectors to extract data from HTML documents. This selector returns only those tr elements that contain the given name somewhere in their nested elements. Once you have that list of rows, you can extract the data from each one. Here we use:
row.select("td:eq(n)")
where :eq(n) selects the td whose zero-based index among its siblings is n, so td:eq(0) is the first cell of the row. Here is the output:
---------------
No: 85
Evaluator: Buğra Gedik
Status: E
---------------
No: 105
Evaluator: Çiğdem Gündüz Demir
Status: E
I have a web page on which we want to track whether a location is empty/free. The website is provided to us as a service by an external company.
I already log in to the page from my application, and when I print the whole Document doc it shows everything, including the data I want.
So far I have tried to select just the data I need (UID and status), but I don't know how to do it.
I have no idea what selector to use in:
Elements data = doc.select("a");
I have tried div.tiles and div.tableauDeBoard, but they didn't work.
The expected output is:
2652 free
2653 free
and so on
I hope I did everything right; it's my first time posting here.
String URL = "..."; // URL elided to shorten the code
Document doc = Jsoup.connect(URL).get();
Elements data = doc.select("a");
System.out.println(data.outerHTML());
<div id="tableauDeBoard" class="porlet-body" style="min-height: 40px;">
<div class="tiles">
<a href="..."
class="tile-v2 undefined popovers" data-content="Status : free<hr>time : 4h22<hr>last change : 05.11.2019 at 18:46<hr>UID : 2652<hr> Typ : P<hr>connection OK" data-html="true" data-placement="auto" data-container="body" data-trigger="hover" data-original-title="" title="">
<div class="tile-code-ville"></div>
<div class="tile-id-automate">2652</div>
<div class="tile-infos"><span class="tile-icon-transmission">
<img src="....png">
</span><span class="tile-icon-jauge"></span></div></a>
<!-- The same block repeats 7 times with different UID/time/status -->
</div>
You can try the following approach:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class TestJsoup{
public static void main(String[] args) {
StringBuilder html = new StringBuilder();
html.append("<div id=\"tableauDeBoard\" class=\"porlet-body\" style=\"min-height: 40px;\">");
html.append(" <div class=\"tiles\">");
html.append(" <a href=\"...\"");
html.append(
"class=\"tile-v2 undefined popovers\" data-content=\"Status : free<hr>time : 4h22<hr>last change : 05.11.2019 at 18:46<hr>UID : 2652<hr> Typ : P<hr>connection OK\" data-html=\"true\" data-placement=\"auto\" data-container=\"body\" data-trigger=\"hover\" data-original-title=\"\" title=\"\">\r\n"
+ "");
html.append("<div class=\"tile-code-ville\"></div>");
html.append("<div class=\"tile-id-automate\">2652</div>\r\n");
html.append(" <div class=\"tile-infos\"><span class=\"tile-icon-transmission\">");
html.append("<img src=\"....png\">\r\n");
html.append("</span><span class=\"tile-icon-jauge\"></span></div></a>\r\n");
html.append(" </div>"); // closes div.tiles
html.append("</div>"); // closes div#tableauDeBoard
Document doc = Jsoup.parse(html.toString());
Elements allClassElements = doc.getElementsByClass("tiles"); //fetching the elements of the class "tiles"
for (Element ele : allClassElements) {
Elements links = ele.getElementsByTag("a"); // Finding the anchor tag which contains the required data
for (Element link : links) {
String str = link.attr("data-content"); // to get the status value
//Without Regex
String oo = str.substring(str.indexOf(":") + 1, str.indexOf("<hr>"));
System.out.println(link.text() + " " + oo.replaceAll("\\s+", "")); //link.text contains id value
// Using Regex
Pattern p = Pattern.compile("\\:.*?\\<");
Matcher m = p.matcher(str);
if (m.find())
System.out.println(link.text() + " "
+ m.group().subSequence(1, m.group().length() - 1).toString().replaceAll("\\s+", ""));
}
}
}
}
Console Output:
2652 free
2652 free
For more details on extracting data with Jsoup, see the Jsoup cookbook at
https://jsoup.org/cookbook/extracting-data/attributes-text-html
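Both variants in the answer above pick out only the first field of data-content (the status). If other fields such as the UID are needed too, one option is to split the attribute value on <hr> and then split each piece on its first colon. This is a minimal sketch using only the JDK; the names TileContent and parseFields are made up for illustration, and the input string is the attribute value from the question (in the Jsoup loop it would come from link.attr("data-content")).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TileContent {
    // Splits "key : value<hr>key : value<hr>..." into an ordered map of trimmed keys and values.
    static Map<String, String> parseFields(String dataContent) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : dataContent.split("<hr>")) {
            // Split on the first colon only, so a value such as "05.11.2019 at 18:46" stays whole.
            String[] keyValue = part.split(":", 2);
            if (keyValue.length == 2) {
                fields.put(keyValue[0].trim(), keyValue[1].trim());
            }
            // Pieces without a colon (for example "connection OK") are skipped.
        }
        return fields;
    }

    public static void main(String[] args) {
        String dataContent = "Status : free<hr>time : 4h22<hr>last change : 05.11.2019 at 18:46"
                + "<hr>UID : 2652<hr> Typ : P<hr>connection OK";
        Map<String, String> fields = parseFields(dataContent);
        System.out.println(fields.get("UID") + " " + fields.get("Status")); // prints "2652 free"
    }
}
```

Looking fields up by name avoids depending on the status being the first entry in the attribute.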
I am trying to scrape data from multiple tables on this website: http://www.national-autograss.co.uk/march.htm
I need to keep the table data together with their respective dates located in h2 so I would like a way to do the following:
Find first date header h2
Extract table data beneath h2 (can be multiple tables)
Move on to next header and extract tables etc
I have written code to extract all parts separately but I do not know how to extract the data so that it stays with the relevant date header.
Any help or guidance would be much appreciated. The code I am starting with is below but like I said all it is doing is iterating through the data.
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Main {
public static void main(String[] args) {
Document doc = null;
try {
doc = Jsoup.connect("http://www.national-autograss.co.uk/march.htm").get();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Elements elementsTable1 = doc.select("#table1");
Elements elementsTable2 = doc.select("#table2");
Elements dateElements = doc.select("h2");
for (int i = 0; i < dateElements.size(); i++) {
System.out.println(dateElements.get(i).text());
System.out.println(elementsTable1.get(i).text());
System.out.println(elementsTable2.get(i).text());
}
}
}
It seems that the values you want are stored in <tr> elements inside tables, and that the first child of each table is an <h2>:
<table align="center"><col width="200"><col width="150"><col width="100"><col width="120"><col width="330"><col width="300">
<h2>Sunday 30 March</h2>
<tr id="table1">
<td><b>Club</b></td>
<td><b>Venue</b></td>
<td><b>Start Time</b></td>
<td><b>Meeting Type</b></td>
<td><b>Number of Days for Meeting</b></td>
<td><b>Notes</b></td>
</tr>
<tr id="table2">
<td>Evesham</td>
<td>Dodwell</td>
<td>11:00am</td>
<td>RO</td>
<td>Single Days Racing</td>
<td></td>
</tr>
</table>
My suggestion is to select all tables and, whenever the first child of a table is an h2, process the rest of its children:
Elements tables = doc.select("table");
for (Element table : tables) {
    if (table.child(0).tagName().equals("h2")) {
        // children.get(0) is the h2 with the date; the remaining children are the rows
        Elements children = table.children();
    }
}
Hope this helps!
EDIT: You need to remove all <col> elements first, because they come before the <h2> in each table (I did not notice this earlier):
for(Element element : doc.select("col"))
{
element.remove();
}
I'm creating a servlet to display a front end for a small program I've written. In this program I have a LinkedList called ExecutionQueue whose contents I put into a StringBuilder.
public String getJobsForPrint() {
Iterator<JobRequest> it = ExecutionQueue.iterator();
StringBuilder result = new StringBuilder();
String NEW_LINE = System.getProperty("line.separator");
while (it.hasNext()) {
JobRequest temp = it.next();
result.append(this.getClass().getName()).append(" Object {").append(NEW_LINE);
result.append(" User ID: ").append(temp.getUserID());
result.append(" Start Date: ").append(temp.getStartDate());
result.append(" End Date: ").append(temp.getEndDate());
result.append(" Deadline Date: ").append(temp.getDeadDate());
result.append(" Department: ").append(temp.getDepartment());
result.append(" Project Name: ").append(temp.getProjectName());
result.append(" Project Application: ").append(temp.getProjectApplication());
result.append(" Priority: ").append(temp.getPriority());
result.append(" Cores: ").append(temp.getCores());
result.append(" Disk Space: ").append(temp.getDiskSpace());
result.append(" Analysis: ").append(temp.getAnaylsis()).append(NEW_LINE);
result.append("}");
}
return result.toString();
}
On the servlet side I use the string like this:
protected void processExecutionQueue(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
PrintWriter out = response.getWriter();
out.println("<!DOCTYPE html>");
out.println("<html>");
out.println("<head>");
out.println("<title>Execution Queue</title>");
out.println("</head>");
out.println("<body>");
out.println("<div>");
out.println("<div style='position:absolute; top:20px; right: 20px;'><a href='/ProjectAndBackend/index.jsp'>LOGOUT</a></div>");
out.println("</div>");
out.println("<p>Queue:</p>");
out.println("Execution Queue:" + SystemServlet.getScheduler().getJobsForPrint());
out.println("</body>");
out.println("</html>");
}
At the moment the page shows the strings one after another, with all the data taken from the linked list. On the web page I would like to put that data into a table so that it looks neater, rather than having strings tossed onto the page.
How would I write the HTML so that specific elements of the string appear in specific columns?
In the example below I have the headers, and each data cell should take its value from the string; rows should keep being written until all the data from the iterator has been displayed.
<table border="1">
<tr>
<th>User ID</th>
<th>Start Date</th>
</tr>
<tr>
<td>User ID DATA</td>
<td>Start Date DATA</td>
</tr>
</table>
Or if anyone can direct me to an example as I can't find any with my current searches.
When you build the StringBuilder, wrap each value in HTML tags so that every job becomes one table row.
The code should look something like this:
StringBuilder result = new StringBuilder();
// inside the loop over the queue, where temp is the current JobRequest
result.append("<tr><td>").append(temp.getUserID()).append("</td><td>").append(temp.getStartDate()).append("</td></tr>");
Note:
1. Create the base HTML and the table header columns inside the processExecutionQueue() method.
2. Only the data rows need to be created in the getJobsForPrint() method.
The result returned by getJobsForPrint() is then embedded in the HTML that the servlet writes.
Hope you can complete the code with this suggestion.
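The split described in this answer (the servlet writes the table frame and header, the queue class returns only the data rows) can be sketched as follows. To keep the example standalone it uses plain String arrays instead of the asker's JobRequest class. The names JobRows, rowsFor, and escape are invented for the illustration; escape is included on the assumption that values may contain characters such as < or &, which would otherwise be read as markup.

```java
import java.util.Arrays;
import java.util.List;

class JobRows {
    // Replaces the characters that a browser would otherwise interpret as markup.
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds one <tr> per job; in this sketch each job is just {userId, startDate}.
    static String rowsFor(List<String[]> jobs) {
        StringBuilder result = new StringBuilder();
        for (String[] job : jobs) {
            result.append("<tr>");
            for (String cell : job) {
                result.append("<td>").append(escape(cell)).append("</td>");
            }
            result.append("</tr>\n");
        }
        return result.toString();
    }

    public static void main(String[] args) {
        List<String[]> jobs = Arrays.asList(
                new String[] {"u1", "2014-01-01"},
                new String[] {"u2", "2014-02-01"});
        // The servlet would print the table frame and header itself, then the data rows.
        System.out.println("<table border=\"1\">");
        System.out.println("<tr><th>User ID</th><th>Start Date</th></tr>");
        System.out.print(rowsFor(jobs));
        System.out.println("</table>");
    }
}
```

In the servlet, the three println calls would become out.println calls on the response writer, with the row string coming from getJobsForPrint().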
I am trying to store the results of my query in a string, and print them to the bottom of my JSP page by passing that string to it. Right now, the JSP page displays fine initially, but nothing is happening when I click the button to post the command. Earlier when I accessed the servlet from an html page, and printed all my output to out using a PrintWriter, I got the results to display, but they would display on a separate page.
1) Is it a good idea to store out in this way, or should I make it something different than a string?
2) How do I get the results of the query to post to the JSP page?
databaseServlet.java
import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
import java.sql.*;
@SuppressWarnings("serial")
public class databaseServlet extends HttpServlet {
private Connection conn;
private Statement statement;
public void init(ServletConfig config) throws ServletException {
try {
Class.forName(config.getInitParameter("databaseDriver"));
conn = DriverManager.getConnection(
config.getInitParameter("databaseName"),
config.getInitParameter("username"),
config.getInitParameter("password"));
statement = conn.createStatement();
}
catch (Exception e) {
e.printStackTrace();
}
}
protected void doPost (HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String out = "\n";
String query = request.getParameter("query");
if (query.toString().toLowerCase().contains("select")) {
//SELECT Queries
try {
ResultSet resultSet = statement.executeQuery(query.toString());
ResultSetMetaData metaData = resultSet.getMetaData();
int numberOfColumns = metaData.getColumnCount();
for(int i = 1; i<= numberOfColumns; i++){
out.concat(metaData.getColumnName(i));
}
out.concat("\n");
while (resultSet.next()){
for (int i = 1; i <= numberOfColumns; i++){
out.concat((String) resultSet.getObject(i));
}
out.concat("\n");
}
}
catch (Exception f) {
f.printStackTrace();
}
}
else if (query.toString().toLowerCase().contains("delete") || query.toLowerCase().contains("insert")) {
//DELETE and INSERT commands
try {
conn.prepareStatement(query.toString()).executeUpdate(query.toString());
out = "\t\t Database has been updated!";
}
catch (Exception l){
l.printStackTrace();
}
}
else {
//Not a valid response
out = "\t\t Not a valid command or query!";
}
RequestDispatcher dispatcher = request.getRequestDispatcher("/dbServlet.jsp");
dispatcher.forward(request, response);
request.setAttribute("queryResults", out);
}
}
dbServlet.jsp
<?xml version = "1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- dbServlet.html -->
<html xmlns = "http://www.w3.org/1999/xhtml">
<head>
<title>MySQL Servlet</title>
<style type="text/css">
body{background-color: green;}
</style>
</head>
<body>
<h1>This is the MySQL Servlet</h1>
<form action = "/database/database" method = "post">
<p>
<label>Enter your query and click the button to invoke a MySQL Servlet
<textarea name = "query" cols="20" rows="5"></textarea>
<input type = "submit" value = "Run MySQL Servlet" />
<input type = "reset" value = "Clear Command" />
</label>
</p>
</form>
<hr>
<%=
request.getAttribute("queryResults");
%>
</body>
</html>
You currently have:
dispatcher.forward(request, response);
request.setAttribute("queryResults", out);
The order should be reversed:
request.setAttribute("queryResults", out);
dispatcher.forward(request, response);
The attribute has to be set before the request is dispatched; otherwise the JSP is rendered before the attribute exists.
1) Is it a good idea to store out in this way, or should I make it something different than a string?
Since this is tabular data, I'd use something that preserves that structure, so that the JSP can take it apart easily for custom formatting such as bold headers or an HTML table. That could be a custom bean, or simply a List<String[]>.
2) How do I get the results of the query to post to the JSP page?
What you are doing now (request.setAttribute) should work. However, you need to set the attribute before you forward the request.
You could then print the String you have now like this:
<%= request.getAttribute("queryResults") %>
Or, if you go with a table structure:
<% List<String[]> rows = (List<String[]>) request.getAttribute("queryResults"); %>
and then loop over that.
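One thing worth checking while restructuring: String is immutable in Java, so a call such as out.concat(...) returns a new string and leaves out unchanged unless the result is assigned. A minimal sketch of the difference, together with a List<String[]> holding a header row and a data row; the class name ResultBuffer, the method asText, and the sample values are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ResultBuffer {
    // Joins the cells of each row with tabs and ends every row with a newline.
    static String asText(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.append(String.join("\t", row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String out = "";
        out.concat("ignored");      // the returned string is discarded; out is still ""
        out = out.concat("kept");   // the result is assigned; out is now "kept"
        System.out.println(out);

        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"id", "name"}); // header row
        rows.add(new String[] {"1", "Ada"});   // data row
        System.out.print(asText(rows));
    }
}
```

Keeping the rows as a list until the last moment lets the JSP decide how to render them, while a StringBuilder covers the case where plain text is really what is wanted.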
1) Is it a good idea to store out in this way, or should I make it something different than a string?
No. Don't mix presentation logic into the Java code; leverage your JSP for that. I would advise you to use Java objects and store the values of each row in one object instance. Put all the objects in a collection and use that collection in the JSP for display. The same goes for the column names.
2) How do I get the results of the query to post to the JSP page?
With your current String format of queryResults, just print the result in your JSP with the <%= %> expression or with out.println:
<hr>
<%=request.getAttribute("queryResults") %>
or
<% out.println(request.getAttribute("queryResults"));%>
But if you decide to use a collection as advised in answer 1, get the collection back from the request, iterate over it, and print the results. For example, if you use a List<String[]> in which each String[] holds one row of data:
<%@ page import="java.util.List" %>
<TABLE id="results">
<% List<String> columns = (List<String>)request.getAttribute("queryColumns");
List<String[]> results = (List<String[]>)request.getAttribute("queryResults");
out.println("<TR>");
for(String columnName: columns ){
out.println("<TD>"+columnName+"</TD>");
}
out.println("</TR>");
//print data
for(String[] rowData: results){
out.println("<TR>");
for(String data: rowData){
out.println("<TD>"+data+"</TD>");
}
out.println("</TR>");
}
%>
</TABLE>
A page usually has many internal links. I want to parse an HTML file so that I get the headings of the page and their corresponding text in a map.
Steps I took:
1) Got all the internal reference elements.
2) Searched the document for id = XXX, where XXX comes from the element <a href="#XXX">.
3) That takes me to <span id="XXX">little text here </span> <some tags here too ><p> actual text here </p> <p> here too </p>
4) How do I go from the <span> to the <p> elements?
5) I tried going to the parent of the span, expecting <p> to be one of its children, which is true. But the parent also contains the <p> elements of the other internal links.
EDIT: added a sample portion of the HTML file:
<li class="toclevel-1 tocsection-1"><a href="#Enforcing_mutual_exclusion">
<span class="tocnumber">1</span> <span class="toctext">Enforcing mutual exclusion</span> </a><ul>
<li class="toclevel-2 tocsection-2"><a href="#Hardware_solutions">
<span class="tocnumber">1.1</span> <span class="toctext">Hardware solutions</span>
</a></li>
<li class="toclevel-2 tocsection-3"><a href="#Software_solutions">
<h2><span class="editsection">[<a href="/w/index.php?title=Mutual_exclusion&amp;action=edit&amp;section=1" title="Edit section: Enforcing mutual exclusion">
edit</a>]</span> <span class="mw-headline" id="Enforcing_mutual_exclusion">
<!-- See the id above (Enforcing_mutual_exclusion): it is the same as the first internal
link. Jsoup takes me to this span element. I want to access every <p> element after
this <span> tag, up to the next <span> tag whose id matches one of the internal links. -->
Enforcing mutual exclusion</span></h2>
<p>There are both software and hardware solutions for enforcing mutual exclusion.
The different solutions are shown below.</p>
<h3><span class="editsection">[<a href="/w/index.php?title=Mutual_exclusion&amp;action=edit&amp;section=2" title="Edit section: Hardware solutions">
edit</a>]</span> <span class="mw-headline" id="Hardware_solutions">Hardware
solutions</span></h3>
<p>On a <a href="/wiki/Uniprocessor" title="Uniprocessor" class="mw-redirect">uniprocessor</a> system a common way to achieve mutual exclusion inside
kernels is
disable <a href="/wiki/Interrupt" title="Interrupt">
Here is my code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public final class Website {
private URL websiteURL ;
private Document httpDoc ;
LinkedHashMap<String, ArrayList<String>> internalLinks =
new LinkedHashMap<String, ArrayList<String>>();
public Website(URL __websiteURL) throws MalformedURLException, IOException, Exception{
if(__websiteURL == null)
throw new Exception();
websiteURL = __websiteURL;
httpDoc = Jsoup.parse(connect());
System.out.println("Parsed the http file to Document");
}
/* Here is my function: it first collects all the internal links in internalLinksElements.
It then takes the href value of each <a ..> tag so that the id can be searched for in the document.
*/
public void getDataWithHeadingsTogether(){
Elements internalLinksElements;
internalLinksElements = httpDoc.select("a[href^=#]");
for(Element element : internalLinksElements){
// some inline links were bad. i only those having span as their child.
Elements spanElements = element.select("span");
if(!spanElements.isEmpty()){
System.out.println("Text(): " + element.text()); // this can not give what i want
// ok i get the href tag name that would be the id
String href = element.attr("href") ;
href = href.replace("#", "");
System.out.println(href);
// selecting the element where we have that id.
Element data = httpDoc.getElementById(href);
// got the span
if(data == null)
continue;
Elements children = new Elements();
// problem is here.
while(children.isEmpty()){
// going to its element unless gets some data.
data = data.parent();
System.out.println(data);
children = data.select("p");
}
// its giving me all the data of file. thats bad.
System.out.println(children.text());
}
}
}
/**
*
* @return String Get all the headings of the document.
* @throws MalformedURLException
* @throws IOException
*/
@SuppressWarnings("CallToThreadDumpStack")
public String connect() throws MalformedURLException, IOException{
// Is this thread safe ? url.openStream();
BufferedReader reader = null;
try{
reader = new BufferedReader( new InputStreamReader(websiteURL.openStream()));
System.out.println("Got the reader");
} catch(Exception e){
e.printStackTrace();
System.out.println("Bye");
String html = "<html><h1>Heading 1</h1><body><h2>Heading 2</h2><p>hello</p></body></html>";
return html;
}
String inputLine, result = new String();
while((inputLine = reader.readLine()) != null){
result += inputLine;
}
reader.close();
System.out.println("Made the html file");
return result;
}
/**
*
* @param argv all the command line parameters.
* @throws MalformedURLException
* @throws IOException
*/
public static void main(String[] argv) throws MalformedURLException, IOException, Exception{
System.setProperty("proxyHost", "172.16.0.3");
System.setProperty("proxyPort","8383");
System.out.println("Sending url");
// a html file or any url place here ------------------------------------
URL url = new URL("put a html file here ");
Website website = new Website(url);
System.out.println(url.toString());
System.out.println("++++++++++++++++++++++++++++++++++++++++++++++++");
website.getDataWithHeadingsTogether();
}
}
I think you need to understand that the <span>s that you are locating are children of header elements, and that the data you want to store is made up of siblings of that header.
Therefore, you need to grab the <span>'s parent, and then use nextSibling to collect nodes that are your data for that <span>. You need to stop collecting data when you run out of siblings, or you encounter another header element, because another header indicates the start of the next item's data.
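The walk described above can be sketched as follows. To keep the example dependency-free it uses the JDK's built-in org.w3c.dom parser on a small, hand-written, well-formed snippet rather than Jsoup; a real Wikipedia page is not well-formed XML, so Jsoup remains the right tool for the actual document, where the corresponding calls would be Element.parent() and Element.nextElementSibling(). The class and method names (SectionText, paragraphsFor, parse) and the sample markup are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SectionText {
    // Returns the text of every <p> that follows the header containing <span id="spanId">,
    // stopping at the next header element or when the siblings run out.
    static List<String> paragraphsFor(Document doc, String spanId) {
        List<String> paragraphs = new ArrayList<>();
        NodeList spans = doc.getElementsByTagName("span");
        for (int i = 0; i < spans.getLength(); i++) {
            Element span = (Element) spans.item(i);
            if (!spanId.equals(span.getAttribute("id"))) {
                continue;
            }
            Node header = span.getParentNode(); // the <h2>/<h3> wrapping the span
            for (Node n = header.getNextSibling(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip text nodes between elements
                }
                String tag = n.getNodeName();
                if (tag.matches("h[1-6]")) {
                    break; // another header: the next section starts here
                }
                if (tag.equals("p")) {
                    paragraphs.add(n.getTextContent().trim());
                }
            }
        }
        return paragraphs;
    }

    // Parses a well-formed XHTML fragment; checked exceptions are wrapped for brevity.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the snippet", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<div>"
                + "<h2><span id=\"A\">A</span></h2><p>one</p><p>two</p>"
                + "<h3><span id=\"B\">B</span></h3><p>three</p>"
                + "</div>";
        Document doc = parse(xhtml);
        System.out.println(paragraphsFor(doc, "A")); // [one, two]
        System.out.println(paragraphsFor(doc, "B")); // [three]
    }
}
```

Because the loop starts from the header rather than from the span's ancestors, it never reaches the paragraphs that belong to other sections.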