I need to create a web scraper utility that fetches a web resource by URL, then counts the occurrences of the given word(s) on the page and the number of characters.
URL url = new URL(urlStr);
URLConnection connection = url.openConnection();
InputStream inputStream = connection.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream,"UTF-8"));
With that I can read all the text on the page (including the HTML tags), so what do I do next?
Can someone help me with that? Some documentation or something to read would be enough. I may use only Java SE; I can't use third-party libraries.
For example, suppose you have login.html:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Login Page</title>
</head>
<body>
<div id="login" class="simple" >
<form action="login.do">
Username : <input id="username" type="text" />
Password : <input id="password" type="password" />
<input id="submit" type="submit" />
<input id="reset" type="reset" />
</form>
</div>
</body>
</html>
To parse it, you can use the jsoup library (a third-party dependency):
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
/**
* Java Program to parse/read HTML documents from File using Jsoup library.
*/
public class HTMLParser{
public static void main(String args[]) {
// Parse HTML String using JSoup library
String HTMLSTring = "<!DOCTYPE html>"
+ "<html>"
+ "<head>"
+ "<title>JSoup Example</title>"
+ "</head>"
+ "<body>"
+ "<table><tr><td><h1>HelloWorld</h1></tr>"
+ "</table>"
+ "</body>"
+ "</html>";
Document html = Jsoup.parse(HTMLSTring);
String title = html.title();
String h1 = html.body().getElementsByTag("h1").text();
System.out.println("Input HTML String to JSoup :" + HTMLSTring);
System.out.println("After parsing, Title : " + title);
System.out.println("After parsing, Heading : " + h1);
// JSoup Example 2 - Reading HTML page from URL
Document doc;
try {
doc = Jsoup.connect("http://google.com/").get();
title = doc.title();
} catch (IOException e) {
e.printStackTrace();
}
System.out.println("Jsoup Can read HTML page from URL, title : " + title);
// JSoup Example 3 - Parsing an HTML file in Java
// Wrong: Jsoup.parse("login.html", "ISO-8859-1") would parse the string "login.html" itself as HTML.
// Right: pass a File together with its charset.
try {
Document htmlFile = Jsoup.parse(new File("login.html"), "ISO-8859-1");
title = htmlFile.title();
Element div = htmlFile.getElementById("login");
String cssClass = div.className(); // getting the class from the HTML element
System.out.println("Jsoup can also parse HTML file directly");
System.out.println("title : " + title);
System.out.println("class of div tag : " + cssClass);
} catch (IOException e) {
e.printStackTrace();
}
}
}
Output:
Input HTML String to JSoup :<!DOCTYPE html><html><head><title>JSoup Example</title></head><body><table><tr><td><h1>HelloWorld</h1></tr></table></body></html>
After parsing, Title : JSoup Example
After parsing, Heading : HelloWorld
Jsoup Can read HTML page from URL, title : Google
Jsoup can also parse HTML file directly
title : Login Page
class of div tag : simple
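Since the question rules out third-party libraries, here is a minimal Java SE-only sketch of the counting step. The class name PageStats and its methods are made up for illustration. It strips tags with regular expressions, which is only an approximation of real HTML parsing: entities such as &amp; are not decoded, and malformed markup may leave residue. It also assumes the page is UTF-8, as in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class PageStats {

    // Reads the whole response body into one string (assumes UTF-8).
    static String fetch(String urlStr) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                URI.create(urlStr).toURL().openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Crude tag stripping: drops script/style blocks, comments, then all tags,
    // and collapses runs of whitespace into single spaces.
    static String visibleText(String html) {
        String text = html
                .replaceAll("(?is)<(script|style)\\b.*?</\\1>", " ")
                .replaceAll("(?s)<!--.*?-->", " ")
                .replaceAll("(?s)<[^>]*>", " ");
        return text.replaceAll("\\s+", " ").trim();
    }

    // Counts whole-word, case-insensitive occurrences of word in text.
    static int countWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Usage: java PageStats.java <url> <word> [<word> ...]
    public static void main(String[] args) throws IOException {
        String html = args.length > 0
                ? fetch(args[0])
                : "<html><head><title>Login Page</title></head><body>Username : Password :</body></html>";
        String text = visibleText(html);
        System.out.println("Characters: " + text.length());
        for (int i = 1; i < args.length; i++) {
            System.out.println(args[i] + ": " + countWord(text, args[i]));
        }
    }
}
```

Whether "number of characters" means the raw response or only the visible text is not stated in the question; the sketch counts the visible text, and html.length() would give the raw count.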
I want to export data to an Excel sheet.
<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"%>
<%@include file="connection.jsp" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Insert title here</title>
</head>
<body>
<%
String ht = (String)session.getAttribute("ht");
%>
<table border="1">
<%
pst = con.prepareStatement("select * from attendance where ht='"+ht+"'");
res = pst.executeQuery();
if(res.next())
{
String uname = res.getString(2);
%>
<b>Student Name:<%=uname%></b>
<%
String hlt = res.getString(1);
%>
<b>Hallticket:<%=hlt%></b>
<tr><th>CG</th><th>CD</th><th>MPI</th><th>HCI</th><th>WT</th><th>MPI-Lab</th><th>CT=Lab</th><th>WT-Lab</th></tr>
<%
String cg = res.getString(3);
String cd = res.getString(4);
String mpi = res.getString(5);
String hci = res.getString(6);
String wt = res.getString(7);
String mpi_lab = res.getString(8);
String ct_lab = res.getString(9);
String wt_lab = res.getString(10);
%>
<tr>
<td align="center"><%=cg%></td>
<td align="center"><%=cd%></td>
<td align="center"><%=mpi%></td>
<td align="center"><%=hci%></td>
<td align="center"><%=wt%></td>
<td align="center"><%=mpi_lab%></td>
<td align="center"><%=ct_lab%></td>
<td align="center"><%=wt_lab%></td>
</tr>
<br/><br/>
<%
}
%>
</table>
</body>
</html>
I want the data that is retrieved from the database and displayed in the table to be written to an Excel sheet.
Can anyone please tell me how to do it?
I am using a MySQL database.
Try using a servlet to do the Excel writing. You could pull it into your existing page with a jsp:include if you wanted to.
In the servlet you'll have to do something like this:
// Set the content type before obtaining the output stream.
resp.setContentType("application/vnd.ms-excel");
ServletOutputStream out = resp.getOutputStream();
/*
* get data
*/
if (data != null) {
for (int i = 0; i < data.length; i++) {
String dataRow = "";
for (int j = 0; j < data[i].length; j++) {
dataRow += data[i][j] + "\t"; // add tab delimiter
}
out.println(dataRow); // print one row per line
}
} else { // bad data
out.println("No data to report.");
}
out.flush();
Hope it helps you. :)
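The row-building loop can also be pulled out into a plain method, so it can be checked without a servlet container. The following is only a sketch: the class name TsvExport and the method toTsv are made up for illustration. It additionally replaces tabs and line breaks inside a cell with a space, because they would otherwise shift the columns.

```java
// Hypothetical helper, not part of the Servlet API.
final class TsvExport {

    // Null becomes an empty cell; embedded tabs and line breaks become one space.
    static String cell(String value) {
        return value == null ? "" : value.replaceAll("[\\t\\r\\n]+", " ");
    }

    // Joins each row with tabs and ends every row with CRLF.
    static String toTsv(String[][] data) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : data) {
            for (int j = 0; j < row.length; j++) {
                if (j > 0) {
                    sb.append('\t');
                }
                sb.append(cell(row[j]));
            }
            sb.append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] data = {
            {"CG", "CD", "MPI"},
            {"85", "90", "78"}
        };
        System.out.print(toTsv(data));
    }
}
```

In the servlet, the result could then be written with resp.getWriter().write(TsvExport.toTsv(data)) after setting the content type.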
Add the following lines to the JSP page that you want to export to Excel:
response.setContentType("application/xls");
response.setHeader("Content-Disposition", "attachment;filename=File.xls");
Alternatively, learn about Apache POI, which writes real Excel files.
Also, change if(res.next()) to while(res.next()) so that every row is exported, not just the first one.
I am trying to implement an AJAX call that populates the options of a select drop-down based on the value of a text field. Any help would be appreciated.
This is my method, which gets the templates for a given number:
System.out.println("Getting template for " + no_nego);
//Do the database code or business logic here.
try {
Connection con;
con = null;
Class.forName("com.mysql.jdbc.Driver").newInstance();
con = DriverManager.getConnection("jdbc:mysql://localhost:8081/RSI_MANAGEMENT", "root", "user");
Statement stmt = null;
stmt = con.createStatement();
String tableName = "rsi_demande";
String sql;
sql = "select filename from " + tableName +
" Where (filename IS NOT NULL and no_negociateur=" + getNo_nego() + " ) ";
ResultSet res = null;
res = stmt.executeQuery(sql);
while (res.next()) {
listeTemplateDownload.add(res.getString(1));
}
//setListeTemplateDownload(listeTemplateDownload);
stmt.close();
} catch (Exception ex1) {
ex1.printStackTrace();
}
for (int i = 0; i < 2; i++)
System.out.println(listeTemplateDownload.get(i));
JSONArray json = new JSONArray();
json.addAll(getListeTemplateDownload());
json.toString();
System.out.printf("JSON: %s", json.toString());
return Action.SUCCESS;
}
And here is my JSP page:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<script src="js/jquery-1.11.1.min.js"></script>
</head>
<body>
<script>
$(function() {
$("#no_nego").change(
function() {
var state = {
"no_nego": $("#no_nego").val()
};
$.ajax({
url: "readDistricts",
data: JSON.stringify(state),
dataType: 'JSON',
contentType: 'application/json',
type: 'POST',
async: true,
success: function() {
var $select = $('#listeTemplateDownload');
$select.html('');
console.log(listeTemplateDownload.size());
for (var i = 0; i < getListeTemplateDownload().size(); i++) {
$select.append(
'<option value= ' + listeTemplateDownload.get(i) + '</option>');
}
}
});
});
});
</script>
<h3>Struts 2 Dynamic Drop down List</h3>
State :
<input type="text" id="no_nego"></select> District :
<select id="listeTemplateDownload"></select>
</body>
</html>
I want the list to be generated dynamically once the user has finished entering the number.
But how can I populate the select element with this data?
Solved.
The problem was the append method:
JSP:
<%@ taglib prefix="s" uri="/struts-tags" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<script src="js/jquery-1.11.1.min.js"></script>
<script>
$(document).ready(function () {
$('#listeTemplateDownload').html('');
$("#no_nego").change(
function() {
var no_nego = {
"no_nego" : $("#no_nego").val()
};
$.ajax({
url : "readDistricts.action",
data : JSON.stringify(no_nego),
dataType : 'json',
contentType : 'application/json',
type : 'post',
async : true,
success : function(res) {
console.log(res.listeTemplateDownload.length);
for ( var i = 0; i < res.listeTemplateDownload.length; i++) {
$('#listeTemplateDownload').append( '<option value=' + res.listeTemplateDownload[i] + '>' + res.listeTemplateDownload[i] + '</option>');
}
}
});
});
});
</script>
</head>
<body>
<h3>Struts 2 Dynamic Drop down List</h3>
Negociateur n°:
<input type="text" id="no_nego" > Template :
<select id="listeTemplateDownload"></select>
</body>
</html>
I'm working on my first project using docx4j... My goal is to export xhtml from a webapp (ckeditor created html) into a docx, edit it in Word, then import it back into the ckeditor wysiwyg.
(*crosspost from http://www.docx4java.org/forums/xhtml-import-f28/html-docx-html-inserts-a-lot-of-space-t1966.html#p6791?sid=78b64a02482926c4dbdbafbf50d0a914
will update when answered)
I have created an html test document with the following contents:
<html><ul><li>TEST LINE 1</li><li>TEST LINE 2</li></ul></html>
My code then creates a docx from this html like so:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.createPackage();
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();
XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
xHTMLImporter.setHyperlinkStyle("Hyperlink");
wordMLPackage.getMainDocumentPart().getContent()
.addAll(xHTMLImporter.convert(new File("test.html"), null));
System.out.println(XmlUtils.marshaltoString(wordMLPackage
.getMainDocumentPart().getJaxbElement(), true, true));
wordMLPackage.save(new java.io.File("test.docx"));
My code then attempts to convert the docx BACK to html like so:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.createPackage();
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();
XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
xHTMLImporter.setHyperlinkStyle("Hyperlink");
WordprocessingMLPackage docx = WordprocessingMLPackage.load(new File("test.docx"));
AbstractHtmlExporter exporter = new HtmlExporterNG2();
OutputStream os = new java.io.FileOutputStream("test.html");
HTMLSettings htmlSettings = new HTMLSettings();
javax.xml.transform.stream.StreamResult result = new javax.xml.transform.stream.StreamResult(
os);
exporter.html(docx, result, htmlSettings);
The html returned is:
<?xml version="1.0" encoding="UTF-8"?><html xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<style>
<!--/*paged media */ div.header {display: none }div.footer {display: none } /*@media print { */@page { size: A4; margin: 10%; @top-center {content: element(header) } @bottom-center {content: element(footer) } }/*element styles*/ .del {text-decoration:line-through;color:red;} .ins {text-decoration:none;background:#c0ffc0;padding:1px;}
/* TABLE STYLES */
/* PARAGRAPH STYLES */
.DocDefaults {display:block;margin-bottom: 4mm;line-height: 115%;font-size: 11.0pt;}
.Normal {display:block;}
/* CHARACTER STYLES */ span.DefaultParagraphFont {display:inline;}
-->
</style>
<script type="text/javascript">
<!--function toggleDiv(divid){if(document.getElementById(divid).style.display == 'none'){document.getElementById(divid).style.display = 'block';}else{document.getElementById(divid).style.display = 'none';}}
--></script>
</head>
<body>
<!-- userBodyTop goes here -->
<div class="document">
<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">• <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 1</span>
</p>
<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">• <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 2</span>
</p>
</div>
<!-- userBodyTail goes here -->
</body>
</html>
There is now a lot of extra space after each line. I'm not sure why this is happening; the conversion appears to add a lot of extra whitespace/carriage returns.
It's not clear from your question whether you are worried about whitespace in the (X)HTML source document, or in your page as rendered (presumably in CKEditor). If the latter, then the browser and CKEditor version may be relevant.
Whitespace may or may not be significant; try Googling 'xhtml significant whitespace' for more.
By way of background, depending on docx4j property docx4j.Convert.Out.HTML.OutputMethodXML, docx4j will use
<xsl:output method="html" encoding="utf-8" omit-xml-declaration="no" indent="no"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>
or
<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no" indent="no"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>
Note the difference in the value of @method. If you want something different, you can modify docx2html.xsl or docx2xhtml.xsl respectively.
Is there a way to convert wordMLPackage to html without all the extra stuff like:
<?xml version="1.0" encoding="UTF-8"?>
and the css?
Could it just be something as simple as the original HTML with inline CSS, like <html><body><div style="...."></div></body></html>?
I've created an autocomplete with the jQuery UI library and am trying to read the text box value in Java, but I get null instead of the value. Please help me get the value from the text box. This is the line that returns null: String query = (String)request.getParameter("country");
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<link rel="stylesheet" href="http://code.jquery.com/ui/1.10.3/themes/smoothness/jquery-ui.css" />
<script src="http://code.jquery.com/jquery-1.9.1.js"></script>
<script src="http://code.jquery.com/ui/1.10.3/jquery-ui.js"></script>
<style>
input {
font-size: 120%; }
</style>
</head>
<body>
<h3>Feature</h3>
<input type="text" id="country" name="country"/>
<script>
//$("#country").autocomplete("getdata.jsp");
$("#country").autocomplete({
source: "getdata.jsp",
minLength: 2,
select: function( event, ui ) {
log( ui.item ?
"Selected: " + ui.item.value + " aka " + ui.item.id :
"Nothing selected, input was " + this.value );
}
});
</script>
</body>
</html>
getdata.jsp
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<%@page import="java.sql.*"%>
<%@page import="java.util.*"%>
<%
String query = (String)request.getParameter("country");
System.out.println("query"+query);
try{
String s[]=null;
Class.forName("oracle.jdbc.driver.OracleDriver");
Connection con =DriverManager.getConnection("XXXXX");
Statement st=con.createStatement();
ResultSet rs = st.executeQuery("select name from table1 where name like '"+query+"%'");
List li = new ArrayList();
while(rs.next())
{
li.add(rs.getString(1));
}
String[] str = new String[li.size()];
Iterator it = li.iterator();
int i = 0;
while(it.hasNext())
{
String p = (String)it.next();
str[i] = p;
i++;
}
//jQuery related start
int cnt=1;
for(int j=0;j<str.length;j++)
{
if(str[j].toUpperCase().startsWith(query.toUpperCase()))
{
out.print(str[j]+"\n");
if(cnt>=5)// 5=How many results have to show while we are typing(auto suggestions)
break;
cnt++;
}
}
//jQuery related end
rs.close();
st.close();
con.close();
}
catch(Exception e){
e.printStackTrace();
}
%>
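This is not a form submission, so no country parameter is sent and getParameter("country") returns null. When source is a URL string, jQuery UI Autocomplete appends the typed text as a term query parameter, so you can read request.getParameter("term") in getdata.jsp instead. If you want to keep the name country, build the request when the user types rather than once at page load, and note that the widget expects the response to be a JSON array:
source: function(request, response) { $.getJSON("getdata.jsp", { country: request.term }, response); },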
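jQuery UI Autocomplete with a remote source expects JSON back, while getdata.jsp above prints one name per line. Here is a rough Java SE-only sketch of producing a JSON array of at most a given number of prefix matches. The class name Suggestions and its methods are made up for illustration, and the escaping covers only quotes, backslashes and control characters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of jQuery UI or the Servlet API.
final class Suggestions {

    // Wraps s in double quotes and escapes characters that would break a JSON string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Returns a JSON array with up to limit names that start with term (case-insensitive).
    static String toJsonArray(List<String> names, String term, int limit) {
        String prefix = term == null ? "" : term.toUpperCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder("[");
        int count = 0;
        for (String name : names) {
            if (count >= limit) {
                break;
            }
            if (name.toUpperCase(Locale.ROOT).startsWith(prefix)) {
                if (count > 0) {
                    sb.append(',');
                }
                sb.append(quote(name));
                count++;
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("India", "Indonesia", "Iran", "Italy");
        System.out.println(toJsonArray(names, "in", 5));
    }
}
```

In the JSP, the result would be written with out.print(...) and the page's content type set to application/json.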
Part of my homework for tomorrow is to search for and add entries using Java EE. If the searched item does not exist, an add-item option is shown as follows:
Supposedly, when the Stock ID does not exist, it should be carried over to the Stock ID text field of the Add Item form. But I have no idea how to do it. My code is as follows:
Servlet:
public void doGet(HttpServletRequest request,
HttpServletResponse response)
throws ServletException, IOException {
response.setContentType("text/html");
PrintWriter out = response.getWriter();
Item item = (Item) request.getAttribute("invenItem");
if (item != null) {
out.println("<html><title>Inventory Item</title>");
out.println("<body><h1>Inventory Item Details:</h1>");
out.println("Stock ID : " + item.getStockID() + "<br/>");
out.println("Name : " + item.getItemName() + "<br/>");
out.println("Unit Price: " + item.getUnitPrice() + "<br/>");
out.println("On Stock : " + item.getOnStock() + "<br/>");
out.println("</body>");
out.println("</html>");
} else {
RequestDispatcher rd = request.getRequestDispatcher("/DataForm.html");
rd.include(request, response);
out.println("Sorry Item not found..");
rd = request.getRequestDispatcher("AddEntry.html");
rd.include(request, response);
}
}
}
HTML:
<html>
<head>
<title>Add Entry</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<h2>Add Item:</h2>
Stock ID: <input type ="text" name ="stockId" value="???"> <br> <--how to get it?
Item Name: <input type ="text" name ="name"> <br>
Unit Price: <input type ="text" name ="unitPrice"> <br>
On Stock : <input type ="text" name ="stock"> <br><br>
<input type ="submit" value ="Add Item">
</body>
</html>
You're approaching this the wrong way. HTML belongs in JSP files, not in servlet classes. Also, EL ${} doesn't run in plain HTML files at all, only in JSP files. Rename your .html files to .jsp. EL like ${param.id} will then work, although you would still have an XSS hole open.
See also:
Our JSP wiki page
Our Servlets wiki page
(please read them; they contain hello world examples which should turn on some lights in your head)
You can't use the expression language (i.e. ${param.id}) in plain HTML files. It'll only be interpreted in JSPs (files with a .jsp extension).
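The XSS hole arises when a request parameter is echoed into the page without escaping. In a JSP, the usual remedy is JSTL's fn:escapeXml, e.g. ${fn:escapeXml(param.stockId)}. If the value has to be written from Java code, a minimal Java SE-only sketch could look like the following. The class name HtmlEscaper and its methods are made up for illustration, and the field name stockId is taken from the form in the question.

```java
// Hypothetical helper, not part of the Servlet or JSP API.
final class HtmlEscaper {

    // Replaces the five characters that are significant in HTML text and attribute values.
    static String escape(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the Stock ID field pre-filled with the (escaped) value the user searched for.
    static String stockIdInput(String stockId) {
        return "<input type=\"text\" name=\"stockId\" value=\"" + escape(stockId) + "\">";
    }

    public static void main(String[] args) {
        System.out.println(stockIdInput("A-100"));
        System.out.println(stockIdInput("\"><script>alert(1)</script>"));
    }
}
```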