I have a question about the Jsoup library.
I have this little program, which downloads and parses an HTML page (google.com) and gets its title:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class HTMLParser {
    public static void main(String[] args) {
        // JSoup Example - Reading HTML page from URL
        String title = null; // declared outside try so it is visible when printing
        try {
            Document doc = Jsoup.connect("http://google.com/").get();
            title = doc.title();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("Jsoup can read HTML page from URL, title : " + title);
    }
}
The program works very well, BUT the problem is:
when I try to parse the page at the IP address 192.168.1.1 (I changed google.com to 192.168.1.1, which is the address of my router):
doc = Jsoup.connect("http://192.168.1.1/").get();
it does not work and shows me the error below:
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=401, URL=http://192.168.1.1/
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:537)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:194)
at HTMLParser.main(HTMLParser.java:43)
At first I thought the problem was related to the username and password, so I changed the address 192.168.1.1 to username:password@192.168.1.1:
doc = Jsoup.connect("http://username:password@192.168.1.1/").get();
but it does not work; the program reads the entire line as an address.
If anyone has any idea, please help me! Thanks, everybody.
As saka1029 suggested, you can request the URL with authentication yourself and then use Jsoup.parse(String) to get the Document object.
Or you can simply use Jsoup's own methods to send the request (with an Authorization header) and get the response; see these related questions:
Getting HTML Source using Jsoup of a password protected website
Jsoup connection with basic access authentication
(I usually use javax.xml.bind.DatatypeConverter.printBase64Binary for the Base64 conversion; that class was removed from the JDK in Java 11, where java.util.Base64, available since Java 8, does the same job.)
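A minimal sketch of the Base64 step using only java.util.Base64 (Java 8+). It builds the value of an HTTP Basic "Authorization" header as defined in RFC 7617, i.e. "Basic " followed by base64("user:password"). The class name BasicAuth and the credentials in main are hypothetical placeholders, not the router's real ones.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuth {
    /**
     * Builds an HTTP Basic "Authorization" header value (RFC 7617):
     * "Basic " + base64(user + ":" + password).
     */
    public static String header(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical placeholder credentials
        System.out.println(header("admin", "secret"));
    }
}
```

With Jsoup, the value can be passed through Connection.header, for example Jsoup.connect("http://192.168.1.1/").header("Authorization", BasicAuth.header("admin", "secret")).get(). This assumes the router actually uses Basic authentication; a 401 can also come from Digest authentication, which a static header like this will not satisfy.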
Thank you very much, saka1029 and Griddoor. I read what you suggested and it helped a lot.
I used this solution:
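import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Base64; // Apache Commons Codec
URL url = new URL("http://user:pass@domain.com/url");
URLConnection urlConnection = url.openConnection();
if (url.getUserInfo() != null) { // "user:pass" part of the URL
    String basicAuth = "Basic " + new String(new Base64().encode(url.getUserInfo().getBytes(StandardCharsets.UTF_8)), StandardCharsets.US_ASCII);
    urlConnection.setRequestProperty("Authorization", basicAuth);
}
InputStream inputStream = urlConnection.getInputStream();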
(from: Connecting to remote URL which requires authentication using Java)
and used this method to read the InputStream:
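// Apache Commons IO: org.apache.commons.io.IOUtils; assumes the page is UTF-8
StringWriter writer = new StringWriter();
IOUtils.copy(inputStream, writer, StandardCharsets.UTF_8);
String theString = writer.toString();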
(from: Read/convert an InputStream to a String)
Then I parse theString with Jsoup.
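If you would rather not depend on Commons IO, a JDK-only sketch (Java 9+ for InputStream.readAllBytes) can read the body and decode it with the charset the server declares in its Content-Type header, instead of a hard-coded or platform-default one. The class ReadBody, its method names, and the UTF-8 fallback are assumptions made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ReadBody {
    /** Extracts the charset parameter from a Content-Type header; falls back to UTF-8 (an assumption). */
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed charset name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Reads the whole stream and decodes it; the stream is closed afterwards. */
    public static String readBody(InputStream in, String contentType) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), charsetOf(contentType));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readBody(new ByteArrayInputStream(latin1), "text/html; charset=ISO-8859-1"));
    }
}
```

In the answer's flow this would be ReadBody.readBody(urlConnection.getInputStream(), urlConnection.getContentType()), followed by Jsoup.parse(theString). Jsoup can also read the stream directly with Jsoup.parse(inputStream, null, baseUri); passing null as the charset name asks Jsoup to detect it from the page.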
I have a WebDAV server where documents are stored. They are available by URL, e.g. https://my-url.net/document.docx. Now I'd like to get a document and read its content. What I have:
public void getDocumentContent() throws ExternalIntegrationException {
var client = getHttpClient();
var download = new HttpGet(doc);
try {
InputStream input = client.execute(download).getEntity().getContent();
String str = IOUtils.toString(input, StandardCharsets.UTF_8);
System.out.println(str);
} catch(IOException e) {
throw new ExternalIntegrationException("Failure download file from " + webDavPath + ". " +
"Details:" + e.getMessage(), e);
}
}
private HttpClient getHttpClient() {
var credentialsProvider = new BasicCredentialsProvider();
var credentials = new UsernamePasswordCredentials(userName, password);
credentialsProvider.setCredentials(AuthScope.ANY, credentials);
return HttpClientBuilder.create()
.setDefaultCredentialsProvider(credentialsProvider)
.build();
}
My System.out.println (for tests) prints this in the console:
(binary output; apart from unreadable bytes, the only legible parts are ZIP markers and entry names such as PK, _rels/.rels, word/_rels/document.xml.rels and word/document.xml)
How can I get a .docx file from a URL without downloading it to disk, read the document content and save it as a String (or maybe a List, if there are several documents)?
Why is it not working for you?
A .docx file is a ZIP archive of XML files (that is why your output contains PK and entry names like word/document.xml), so you can't simply print its raw bytes as a string.
Solution:
One option is to save the file locally, open it as a FileInputStream, and delete the file at the end.
Saving is not strictly required, though: Apache POI's XWPFDocument constructor accepts any InputStream, so you can also pass the stream returned by getEntity().getContent() directly.
Once you have the variable "input" as an InputStream, you can use the following code:
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
public String readDocxFile(InputStream input) throws IOException {
    StringBuilder text = new StringBuilder();
    // try-with-resources closes both the document and the underlying stream
    try (InputStream in = input; XWPFDocument document = new XWPFDocument(in)) {
        for (XWPFParagraph para : document.getParagraphs()) {
            text.append(para.getText()).append('\n');
        }
    }
    return text.toString();
}
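Because a .docx is just a ZIP archive, the paragraph text of a simple document can also be pulled out with the JDK alone, without POI: find the word/document.xml entry with java.util.zip.ZipInputStream, parse it with a namespace-aware DOM parser, and join the w:t text runs of each w:p paragraph. This is a swapped-in technique, not what the answer above uses, and a rough sketch: it ignores headers, footnotes, tabs and line breaks, and may double-count text in nested text boxes. The class name DocxText is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DocxText {
    // WordprocessingML namespace used by the <w:...> elements in word/document.xml
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Unzips the .docx stream and returns the text of each paragraph in word/document.xml. */
    public static List<String> paragraphs(InputStream docx)
            throws IOException, ParserConfigurationException, SAXException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return parseParagraphs(zip.readAllBytes()); // reads the current entry only
                }
            }
        }
        throw new IOException("word/document.xml not found; is this really a .docx?");
    }

    static List<String> parseParagraphs(byte[] xml)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations (a common XXE hardening step)
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<String> result = new ArrayList<>();
        NodeList paras = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            // Concatenate every <w:t> text run inside this paragraph
            NodeList texts = ((Element) paras.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                sb.append(texts.item(j).getTextContent());
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

Used with the question's code, DocxText.paragraphs(client.execute(download).getEntity().getContent()) would give a List<String> without writing anything to disk; for anything beyond plain body text, POI remains the more robust choice.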
I'm trying to make a program that submits a search query to Google and then opens the browser with the results.
I have managed to connect to Google but I'm stuck because I don't know how to insert the search query into the URL and submit it.
I have tried to use HtmlUnit but it doesn't seem to work.
This is the code so far:
URL url = new URL("http://google.com");
HttpURLConnection hr = (HttpURLConnection) url.openConnection();
System.out.println(hr.getResponseCode());
String str = "search from java!";
You can use core Java classes (java.net.URI and java.awt.Desktop) to build the search URL and open it in the default browser. I have used an additional method to create the search query for Google, which replaces the spaces with %20 for the URL:
public static void main(String[] args) {
URI uri= null;
String googleUrl = "https://www.google.com/search?q=";
String searchQuery = createQuery("search from Java!");
String query = googleUrl + searchQuery;
try {
uri = new URI(query);
Desktop.getDesktop().browse(uri);
} catch (IOException | URISyntaxException e) {
e.printStackTrace();
}
}
private static String createQuery(String query) {
query = query.replaceAll(" ", "%20");
return query;
}
The classes used are all core Java:
import java.awt.Desktop;
import java.net.URI;
import java.net.URISyntaxException;
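Replacing only spaces leaves other characters unescaped: a query containing &, # or ? would change the meaning of the URL. A small sketch that encodes the whole query with java.net.URLEncoder instead (the Charset overload needs Java 10+). URLEncoder produces form encoding, so spaces become + rather than %20, which Google's q parameter accepts. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GoogleSearchUri {
    private static final String GOOGLE_SEARCH = "https://www.google.com/search?q=";

    /** Percent-encodes the whole query (spaces become '+', '&' becomes %26, '#' becomes %23, ...). */
    public static URI searchUri(String query) {
        return URI.create(GOOGLE_SEARCH + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(searchUri("search from Java!"));
        // To open it in the browser, as in the answer:
        // java.awt.Desktop.getDesktop().browse(searchUri("search from Java!"));
    }
}
```

URI.create is used here because the encoded string is always a valid URI, so there is no checked URISyntaxException to catch.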
I'm trying to build http://IP:4567/foldername/1234?abc=xyz. I don't know much about it, but I wrote the code below after searching on Google:
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
public class MyUrlConstruct {
public static void main(String a[]){
try {
String protocol = "http";
String host = "IP";
int port = 4567;
String path = "foldername/1234";
URL url = new URL (protocol, host, port, path);
System.out.println(url.toString()+"?");
} catch (MalformedURLException ex) {
ex.printStackTrace();
}
}
}
I am able to build the URL http://IP:port/foldername/1234?, but I am stuck at the query part. Please help me move forward.
You can just pass the raw spec:
new URL("http://IP:4567/foldername/1234?abc=xyz");
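If you want to keep the four-argument constructor from the question, its last ("file") argument may carry the query string too, as long as it starts with /. A sketch, with example.com standing in for IP and a hypothetical helper name; note that nothing is URL-encoded here, so the path and query must already be URL-safe.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class MyUrlConstructFixed {
    /** URL(protocol, host, port, file): "file" is the path plus an optional "?query". */
    static URL build(String host, int port, String path, String query) throws MalformedURLException {
        return new URL("http", host, port, path + "?" + query); // path should start with '/'
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = build("example.com", 4567, "/foldername/1234", "abc=xyz");
        System.out.println(url);            // the full URL
        System.out.println(url.getQuery()); // just the query part
    }
}
```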
Or you can use something like org.apache.http.client.utils.URIBuilder (from Apache HttpClient) and build it in a safe manner with proper URL encoding:
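import org.apache.http.client.utils.URIBuilder;
URIBuilder builder = new URIBuilder();
builder.setScheme("http");
builder.setHost("IP");
builder.setPort(4567);
builder.setPath("/foldername/1234");
builder.addParameter("abc", "xyz");
URL url = builder.build().toURL(); // build() throws URISyntaxException, toURL() MalformedURLException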
Use OkHttp
There is a very popular library named OkHttp (over 20K stars on GitHub at the time of writing). With this library, you can build the URL like this:
import okhttp3.HttpUrl;
URL url = new HttpUrl.Builder()
.scheme("http")
.host("example.com")
.port(4567)
.addPathSegments("foldername/1234")
.addQueryParameter("abc", "xyz")
.build().url();
Or you can simply parse a URL:
URL url = HttpUrl.parse("http://example.com:4567/foldername/1234?abc=xyz").url();
In general non-Java terms, a URL is a specialized type of URI. You can use the URI class (which is more modern than the venerable URL class, which has been around since Java 1.0) to create a URI more reliably, and you can convert it to a URL with the toURL method of URI:
String protocol = "http";
String host = "example.com";
int port = 4567;
String path = "/foldername/1234";
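String auth = null;
String query = "abc=xyz";
String fragment = null;
// new URI(...) throws URISyntaxException; toURL() throws MalformedURLException
URI uri = new URI(protocol, auth, host, port, path, query, fragment);
URL url = uri.toURL();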
Note that the path needs to start with a slash.
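One caveat with the seven-argument URI constructor: it leaves & and = in the query untouched (they are legal there) and always quotes %, so a value containing & cannot be expressed safely, and pre-encoded values get double-encoded. A JDK-only sketch that form-encodes each key and value with URLEncoder (Java 10+) and then parses the finished string with URI.create, which does no further quoting. The class and method names are hypothetical, and the path is assumed to be URL-safe already.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class UrlWithQuery {
    /** Form-encodes each key and value, then joins them as k1=v1&k2=v2. */
    static String queryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        params.forEach((k, v) -> joiner.add(
                URLEncoder.encode(k, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return joiner.toString();
    }

    /** Builds scheme://host:port/path?query; the path must start with '/'. */
    public static URL build(String scheme, String host, int port, String path,
                            Map<String, String> params) throws MalformedURLException {
        String spec = scheme + "://" + host + ":" + port + path;
        if (!params.isEmpty()) {
            spec += "?" + queryString(params);
        }
        // URI.create parses the already-encoded string as-is (no second round of quoting)
        return URI.create(spec).toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order
        params.put("abc", "xyz");
        System.out.println(build("http", "example.com", 4567, "/foldername/1234", params));
    }
}
```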
If you happen to be using Spring already, I have found the org.springframework.web.util.UriComponentsBuilder to be quite nifty. Here is how you would use it in your case.
final URL myUrl = UriComponentsBuilder
.fromHttpUrl("http://IP:4567/foldername/1234?abc=xyz")
.build()
.toUri()
.toURL();
If using Spring Framework:
UriComponentsBuilder.newInstance()
    .scheme(scheme)
    .host(host)
    .port(port)
    .path(path)
    .queryParam("abc", "xyz")
    .build()
    .toUri()
    .toURL();
A new UriComponentsBuilder class helps to create UriComponents
instances by providing fine-grained control over all aspects of
preparing a URI including construction, expansion from template
variables, and encoding.
Know more:
https://www.baeldung.com/spring-uricomponentsbuilder
JavaDoc:
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/util/UriComponentsBuilder.html
If you use Android, you can use the Uri.Builder API. Example:
val uri = Uri.Builder().scheme("https").authority("s3.amazonaws.com").appendEncodedPath(bucketName).appendEncodedPath(fileName).build()
Docs:
https://developer.android.com/reference/android/net/Uri.Builder
I want to write code to log in to websites with Java.
Here is the code:
package login;
import java.net.*;
import java.io.*;
public class ConnectToURL {
// Variables to hold the URL object and its connection to that URL.
private static URL URLObj;
private static URLConnection connect;
public static void main(String[] args) {
try {
CookieManager cManager = new CookieManager();
CookieHandler.setDefault(cManager);
// Establish a URL and open a connection to it. Set it to output mode.
URLObj = new URL("https://accounts.google.com/ServiceLogin?service=mail&continue=https://mail.google.com/mail/#identifier");
connect = URLObj.openConnection();
connect.setDoOutput(true);
}
catch (MalformedURLException ex) {
System.out.println("The URL specified was unable to be parsed or uses an invalid protocol. Please try again.");
System.exit(1);
}
catch (Exception ex) {
System.out.println("An exception occurred. " + ex.getMessage());
System.exit(1);
}
try {
// Create a buffered writer to the URLConnection's output stream and write our forms parameters.
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(connect.getOutputStream()));
writer.write("Email=myemail@gmail.com&Passwd=123456&submit=Login");
writer.close();
// Now establish a buffered reader to read the URLConnection's input stream.
BufferedReader reader = new BufferedReader(new InputStreamReader(connect.getInputStream()));
String lineRead = "";
// Read all available lines of data from the URL and print them to screen.
while ((lineRead = reader.readLine()) != null) {
System.out.println(lineRead);
}
reader.close();
}
catch (Exception ex) {
System.out.println("There was an error reading or writing to the URL: " + ex.getMessage());
}
}
}
I have tried this code on Facebook and Gmail, but it didn't work.
It keeps telling me that cookies are not enabled (I use Chrome, and they are enabled there).
Is there any other way to achieve this?
If your goal is just to log in to some website, a much better solution is to use Selenium WebDriver.
It has an API for creating instances of modern browser drivers and operating on their web elements.
Code example:
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;
public class Example {
public static void main(String[] args) {
// Create a new instance of the html unit driver
// Notice that the remainder of the code relies on the interface,
// not the implementation.
WebDriver driver = new HtmlUnitDriver();
// And now use this to visit Google
driver.get("http://www.google.com");
// Find the text input element by its name
WebElement element = driver.findElement(By.name("q"));
// Enter something to search for
element.sendKeys("Cheese!");
// Now submit the form. WebDriver will find the form for us from the element
element.submit();
// Check the title of the page
System.out.println("Page title is: " + driver.getTitle());
driver.quit();
}
}
It also lets you manage cookies (see the Cookies section of its documentation).
Just look at the documentation on how to configure driver instances and manage web elements; the preferred way is to use the Page Object pattern.
Update:
Locating elements on a page that don't have id or name attributes can be done with XPath expressions; Firefox extensions like these are very useful for that:
FirePath
XPath Checker
And use concise, short XPath functions.
For example:
<table>
<tr>
<td>
<p>some text here 1</p>
</td>
</tr>
<tr>
<td>
<p>some text here 2</p>
</td>
</tr>
<tr>
<td>
<p>some text here 3</p>
</td>
</tr>
</table>
To get the text "some text here 2", you can use the following XPath:
//tr[2]/td/p
If you know that the text is static, you can use contains():
//p[contains(text(), 'some text here 2')]
To check whether your XPath is unique on the page, the best way is to use the browser console.
How to do this is described here: How to verify an XPath expression
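Both expressions can also be tried offline with the JDK's javax.xml.xpath API against the sample table, including a count() check for uniqueness. This is only a sketch for experimenting: it treats the snippet as well-formed XML, whereas in the browser Selenium evaluates XPath against the live HTML DOM. The class name XPathCheck is hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathCheck {
    // The sample table from above, as well-formed XML
    static final String TABLE = "<table>"
            + "<tr><td><p>some text here 1</p></td></tr>"
            + "<tr><td><p>some text here 2</p></td></tr>"
            + "<tr><td><p>some text here 3</p></td></tr>"
            + "</table>";

    /** Evaluates an expression against TABLE and returns its XPath string value. */
    public static String eval(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(TABLE)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(eval("//tr[2]/td/p"));
        System.out.println(eval("//p[contains(text(), 'some text here 2')]"));
        // Uniqueness check: how many nodes does the expression match?
        System.out.println(eval("count(//p[contains(text(), 'some text here 2')])"));
    }
}
```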
What exactly are you trying to do with this? You are almost certainly better off using something like Selenium WebDriver for browser automation tasks, as you piggyback on the work of an existing web browser to handle things like cookies.
In this case, the message says cookies are not enabled in your web browser, but you're not actually using a web browser; you're sending requests from your Java application.
I want to log in to an HTTPS website with a username and password, go to one URL on that website and download the page at that URL (and maybe parse its contents). I want to do this using only core Java APIs, not HtmlUnit, Jsoup etc. I found the code below to learn how to do this, but it does not show how to log in to a website. Please tell me how I can log in, maintain a session and then finally close the connection.
Source - http://www.mkyong.com/java/java-https-client-httpsurlconnection-example/
import java.net.MalformedURLException;
import java.net.URL;
import java.security.cert.Certificate;
import java.io.*;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLPeerUnverifiedException;
public class HttpsClient{
public static void main(String[] args)
{
new HttpsClient().testIt();
}
private void testIt(){
String https_url = "https://www.google.com/";
URL url;
try {
url = new URL(https_url);
HttpsURLConnection con = (HttpsURLConnection)url.openConnection();
//dump all cert info
print_https_cert(con);
//dump all the content
print_content(con);
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
private void print_https_cert(HttpsURLConnection con){
if(con!=null){
try {
System.out.println("Response Code : " + con.getResponseCode());
System.out.println("Cipher Suite : " + con.getCipherSuite());
System.out.println("\n");
Certificate[] certs = con.getServerCertificates();
for(Certificate cert : certs){
System.out.println("Cert Type : " + cert.getType());
System.out.println("Cert Hash Code : " + cert.hashCode());
System.out.println("Cert Public Key Algorithm : "
+ cert.getPublicKey().getAlgorithm());
System.out.println("Cert Public Key Format : "
+ cert.getPublicKey().getFormat());
System.out.println("\n");
}
} catch (SSLPeerUnverifiedException e) {
e.printStackTrace();
} catch (IOException e){
e.printStackTrace();
}
}
}
private void print_content(HttpsURLConnection con){
if(con!=null){
try {
System.out.println("****** Content of the URL ********");
BufferedReader br =
new BufferedReader(
new InputStreamReader(con.getInputStream()));
String input;
while ((input = br.readLine()) != null){
System.out.println(input);
}
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
Every website manages logins differently. You will need to scout the website, find out how the session is maintained, and mimic the functions in such a way that the server can't tell that it is not a browser.
In general, a web server stores a secret session token (a hash) in a cookie. Here is the process:
1. POST the login name and password to the form's URL using HttpsURLConnection.
2. The server responds with a Set-Cookie header carrying the token it wants stored in the cookie; the cookie name usually contains "session".
3. Send subsequent requests with that value in the Cookie header.
All of the above can be done using only URL and HttpsURLConnection, but you will need to mimic a browser closely enough that the server can't tell the difference.
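A core-Java sketch of those steps, under the assumption that the site uses a plain HTML form and cookie-based sessions. java.net.CookieManager, installed as the default CookieHandler, stores Set-Cookie values from responses and HttpURLConnection (and HttpsURLConnection) then sends them back automatically on later requests to the same site. The class FormLogin, the URLs and the form field names are hypothetical; real sites often need extra hidden fields (such as CSRF tokens) that you would find by scouting with a tool like Fiddler.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

public class FormLogin {
    /** Encodes fields as application/x-www-form-urlencoded, the format of a plain HTML form POST. */
    public static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** Installs a JVM-wide cookie store so session cookies are replayed on later requests. */
    public static CookieManager installCookieManager() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    /** POSTs the form (step 1) and returns the response body. */
    public static String post(String url, Map<String, String> fields) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = con.getOutputStream()) {
            out.write(formBody(fields).getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = con.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            con.disconnect();
        }
    }
}
```

A session would then look like: call FormLogin.installCookieManager() once, call FormLogin.post("https://example.com/login", Map.of("username", "me", "password", "secret")), and open the protected page with an ordinary new URL(...).openConnection(); the stored cookie is sent without further code. "Closing" the session usually means requesting the site's logout URL, or simply discarding the CookieManager.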
For scouting, I would recommend using a tool like Fiddler. It captures all communication to and from the web server, so that you can see exactly what is going on at the HTTP level and mimic it in your Java code.
Here is an overview of Fiddler. I have never looked at the logs. Fiddler has a nice interface. The video is really boring, but it gives an overview of the interface. You want to look at the raw text view and mimic that.
For your other question, OWASP is a great resource for best practices. The reality is that there is a lot of insecure and bad code out there that does things you would never expect. I have seen a server put the boolean value inside a script tag to be stored as a JavaScript variable. You just have to carefully watch how the server changes its responses after you log in. A popular website following best practices will use the method above.