I have two parts working perfectly fine.
First I connect to the website and put my login name and password into the form.
I get the cookies and store them.
Connection.Response login = Jsoup.connect("website")
        .data("name", "name")
        .data("password", "password")
        .method(Connection.Method.POST)
        .execute();
Map<String, String> cookies = login.cookies();
This works fine, and so does the next request:
Document doc1 = Jsoup.connect("website/subpages")
        .cookies(cookies)
        .get();
doc1 loads correctly and I can get the text with:
String pages1 = doc1.toString();
But on my last request I get a server error 500:
Document pages2 = Jsoup.connect("website/anothersubpage")
        .cookies(cookies)
        .get();
I guess the problem is that the last URL, "website/anothersubpage", is not a fixed URL.
Each time I log in and get a new session key (which is the cookie I store in cookies), the URL of the subpage changes.
After thinking about it, I parsed the whole page into a String and used substring to get the new, variable URL.
String newLink = text.substring(text.indexOf("Start href"), text.indexOf("End href"));
It worked, so newLink now holds the link (the href="...") from the website.
But now if I use the same code as before:
Document pages2 = Jsoup.connect(newLink)
        .cookies(cookies)
        .get();
I still get error 500. I have tried many things over the past 3 days but can't get it to work.
I am really grateful for every suggestion or tip.
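One thing worth checking, offered as a guess rather than a diagnosis: a link cut out of raw HTML with substring can still carry the surrounding quotes, HTML-escaped ampersands (&amp;) or a relative path, and any of these makes the request differ from what a browser would send. The sketch below uses only java.net.URI to clean up and resolve such a link. The class name HrefResolver, the example.com base URL and the sid parameter are made up for illustration. If the page has already been parsed by jsoup, Element.absUrl("href") performs a similar resolution.

```java
import java.net.URI;

final class HrefResolver {
    /** Turns an href copied out of raw HTML into an absolute URL string. */
    static String resolve(String pageUrl, String rawHref) {
        // Remove whitespace and a leading/trailing quote left over from href="...".
        String href = rawHref.trim().replaceAll("^[\"']|[\"']$", "");
        // In HTML source, '&' inside attribute values is usually written as "&amp;".
        href = href.replace("&amp;", "&");
        // Resolve relative links against the URL of the page they were found on.
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical values; substitute the real page URL and the extracted link.
        System.out.println(resolve("https://example.com/subpages/index.html",
                "\"page.jsp?sid=abc&amp;tab=2\""));
    }
}
```

Printing the resolved URL next to the address the browser shows after clicking the same link makes differences easy to spot.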
I'm trying to connect and retrieve the page title from here. The code works fine if I remove everything after ".com" from the link. The following code does not work:
try {
    Document doc = Jsoup.connect("https://news.google.com/news/local/section/geo/Sammamish,%20WA%2098075,%20United%20States/Sammamish,%20Washington?ned=us&hl=en")
            .data("query", "Java")
            .userAgent("Chrome")
            .cookie("auth", "token")
            .timeout(3000)
            .post();
    String title = doc.title();
    Log.d("hellomate", title);
} catch (IOException e) {
    Log.d("hellomatee", e.toString());
}
If the code worked, the title returned should be "Sammamish Washington - Google News".
The error returned from the code is: "org.jsoup.HttpStatusException: HTTP error fetching URL. Status=405, URL=https://news.google.com/news/local/section/geo/Sammamish,%20WA%2098075,%20United%20States/Sammamish,%20Washington?ned=us&hl=en"
What does status 405 mean? Does jsoup not allow the kind of URL I used?
Thanks.
Status 405 is an HTTP error code that means "Method Not Allowed": the server does not accept a POST for this URL. You can find some documentation from Microsoft on it here. As @Andreas said, you can fix this by changing .post(); to .get();.
The example in the jsoup docs shows how you would probably want to structure your requests:
Jsoup.connect("http://en.wikipedia.org/").get();
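As a quick reference for the status codes that come up with this kind of request, here is a small lookup built on the constants in java.net.HttpURLConnection. It is only a sketch: the class name HttpStatusHint and the hint texts are my own, and the list is deliberately short. With jsoup, the numeric code should be available from HttpStatusException.getStatusCode().

```java
import java.net.HttpURLConnection;

final class HttpStatusHint {
    /** Returns the standard reason phrase plus a short hint for a few common codes. */
    static String describe(int status) {
        switch (status) {
            case HttpURLConnection.HTTP_BAD_REQUEST:    // 400
                return "400 Bad Request: the server could not understand the request";
            case HttpURLConnection.HTTP_BAD_METHOD:     // 405
                return "405 Method Not Allowed: e.g. POST sent to a URL that only accepts GET";
            case HttpURLConnection.HTTP_INTERNAL_ERROR: // 500
                return "500 Internal Server Error: the server failed while handling the request";
            case HttpURLConnection.HTTP_UNAVAILABLE:    // 503
                return "503 Service Unavailable: the server is refusing or unable to serve right now";
            default:
                return status + " (not covered by this sketch)";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(405));
    }
}
```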
I need an app that takes your login credentials, logs in for you, downloads the data and displays it (in a TextView, for example). I can pull the data out of the source and all that, but I don't know how to do the logging in. (I know I should send the credentials via a POST request, but that's about all.)
You can try using jsoup to connect and log in with your credentials. It lets you make POST and GET requests. Once logged in, you can navigate to pages by providing links and cookies. The challenge is making sure you provide all the data necessary to log in successfully.
For example:
Connection.Response res = Jsoup.connect("http://some-site.com/login.jsp")
        .data("username", "somename")
        .data("password", "apass")
        .method(Method.POST)
        .timeout(90000) // connection timeout in milliseconds (90 seconds)
        .execute();
Map<String, String> cookies = res.cookies();
Document doc = Jsoup
        .connect("http://some-site.com/home.jsp")
        .cookies(cookies)
        .get();
You can then work on the doc object to get all the information you may need.
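To check that you really are sending "all the data necessary", it can help to see what the body of such a POST looks like and compare it with what the browser sends in its network tab. The sketch below builds an application/x-www-form-urlencoded body using only the JDK. The class name FormBody and the field values are placeholders, and it is an approximation of what a form POST carries, not jsoup's own implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class FormBody {
    /** Encodes fields as name=value pairs joined by '&', in insertion order. */
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(enc(field.getKey()) + "=" + enc(field.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "somename"); // placeholder values
        fields.put("password", "apass");
        System.out.println(encode(fields));
    }
}
```

If the browser's request contains fields that are missing here (hidden inputs, tokens), those are likely the ones the login form also expects.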
I am desperately trying to send an HTTP POST to authenticate to Twitter via Java, but I keep getting an HTTP 400 response code.
This is the HTML code for the form that I want to use:
Sign in Remember me · Forgot password?
New to Twitter?
I am using jsoup to try to access the class "signin" from the form at the top, because I then want to use this code:
Element loginform = doc.getElementByClass("signin");
Elements inputElements = loginform.getElementsByTag("input");
List<NameValuePair> paramList = new ArrayList<NameValuePair>();
for (Element inputElement : inputElements) {
    String key = inputElement.attr("name");
    String value = inputElement.attr("value");
    if (key.equals("session[username_or_email]"))
        value = username;
    else if (key.equals("session[password]"))
        value = password;
    paramList.add(new BasicNameValuePair(key, value));
}
This code does not work because loginform would have to be obtained with getElementById instead of getElementByClass, but in the form "signin" is a class, not an id.
So my question is: how do I get inputElements from a class instead of an id? I need this so that I can extract the parameters to send a valid HTTP POST to Twitter and authenticate an account.
All help is greatly appreciated.
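The power of jsoup lies in using CSS selectors to parse HTML. Take a look at the examples on jsoup.org.
For your specific example, try:
doc.select("input.signin")
The pattern is tag.class or tag#id, and these can be combined with other selectors as shown in the documentation. If the class is on the form rather than on the inputs, a descendant selector such as form.signin input matches the inputs inside that form.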
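Once a selector has returned the inputs, the loop from the question boils down to copying every name/value pair and overriding the two credential fields. Here is that step on its own as a JDK-only sketch. The two session[...] field names are taken from the question; the class name LoginParams and the authenticity_token field in the usage example are hypothetical stand-ins for whatever hidden fields the real form contains.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LoginParams {
    /**
     * Copies the form's default values and fills in the credentials.
     * Fields that are not present in the form are left out, as in the original loop.
     */
    static Map<String, String> fill(Map<String, String> formDefaults,
                                    String username, String password) {
        Map<String, String> params = new LinkedHashMap<>(formDefaults);
        // Map.replace only changes a value when the key already exists.
        params.replace("session[username_or_email]", username);
        params.replace("session[password]", password);
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("authenticity_token", "tok123"); // hypothetical hidden field
        defaults.put("session[username_or_email]", "");
        defaults.put("session[password]", "");
        System.out.println(fill(defaults, "me@example.com", "secret"));
    }
}
```

Keeping the hidden fields unchanged matters because a login form often rejects a POST that omits them.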
This is specifically with amazon.com. I am receiving a 503 error for their domain, but I can successfully parse other domains.
I am using the line
Document doc = Jsoup.connect(url).timeout(30000).get();
to connect to the URL.
You have to set a user agent:
Document doc = Jsoup.connect(url).timeout(30000).userAgent("Mozilla/17.0").get();
(Or another one; it is best to choose a real browser's user agent.)
Otherwise you'll get blocked.
Please see also: Jsoup: select(div[class=rslt prod]) returns null when it shouldn't
You can try:
val ret = Jsoup.connect(url)
        .userAgent("Mozilla/5.0 Chrome/26.0.1410.64 Safari/537.31")
        .timeout(2 * 1000)
        .followRedirects(true)
        .maxBodySize(1024 * 1024 * 3) // 3 MB max
        //.ignoreContentType(true) // for downloading XML, JSON, etc.
        .get()
This may work; amazon.com may need followRedirects set to true.
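To see exactly which headers go out, the same idea can be sketched with the JDK's own java.net.http API (Java 11+) instead of jsoup. I am using it here only because it runs without extra dependencies. The class name BrowserLikeRequest and the user-agent string are placeholders, and nothing is sent over the network by this code.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

final class BrowserLikeRequest {
    /** Builds (but does not send) a GET request with an explicit User-Agent header. */
    static HttpRequest build(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("https://www.example.com/",
                "Mozilla/5.0 (X11; Linux x86_64)"); // placeholder user agent
        // Print the headers that would be sent with this request.
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```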
I want to use jsoup to crawl a page that is only available when I am signed in. I guess this means I need to sign in on one page and send the cookies to another page.
I read some earlier posts here and wrote the following code:
public static void main(String[] args) throws IOException {
    Connection.Response res = Jsoup.connect("login.yahoo.com")
            .data("login", "myusername", "passwd", "mypassword")
            .method(Method.POST)
            .execute();
    Document doc = res.parse();
    String sessionId = res.cookie("SESSIONID");
    Document doc2 = Jsoup.connect("http://health.groups.yahoo.com/group/asthma/messages")
            .cookie("SESSIONID", sessionId)
            .get();
    Elements Eles = doc2.getElementsByClass("message");
    String content = Eles.first().text();
    System.out.println(content);
}
My question is how I can know my cookie name (i.e. "SESSIONID") here for sending my login info? I used the .cookies() method to get all the cookies from the login page:
B
DK
YM
T
PH
Y
F
I tried them one by one but none worked. I could get sessionId from some of them, but I could not successfully get nodes from the second page, which means I didn't successfully sign in. Could anyone give me some suggestions? Many thanks!
I've also struggled with logging in to websites with jsoup.
What I came up with was a hybrid of Selenium WebDriver and jsoup.
WebDriver can remote-control a browser; typically this is used for testing purposes.
For my application, it was not desirable to have the browser visible and getting in the way on screen, so I used the "silent" driver HtmlUnitDriver instead. You can instantiate it with this line of code:
HtmlUnitDriver driver = new HtmlUnitDriver(true); // true enables JavaScript support (using Rhino, I believe)
Now, to log in to a website I use:
String baseUrl = "http://www.thesite.com";
driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
driver.get(baseUrl);
driver.findElement(By.id("TextBoxUser")).clear();
driver.findElement(By.id("TextBoxUser")).sendKeys("username");
driver.findElement(By.id("TextBoxPass")).clear();
driver.findElement(By.id("TextBoxPass")).sendKeys("password");
driver.findElement(By.id("Button1")).click();
Get the page content:
String htmlContent = driver.getPageSource();
Start using jsoup:
Document document = Jsoup.parse(htmlContent);
This has worked great for me.
Steffn Otto Jensen
Have you tried something like this:
Connection.Response res = Jsoup.connect("https://login.yahoo.com/config/login?")
        .data("login", "myusername", "passwd", "mypassword")
        .method(Method.POST)
        .execute();
Map<String, String> cookies = res.cookies();
Connection connection = Jsoup.connect("http://health.groups.yahoo.com/group/asthma/messages");
for (Map.Entry<String, String> cookie : cookies.entrySet()) {
    connection.cookie(cookie.getKey(), cookie.getValue());
}
Document doc = connection.get();
// Selector example:
// Element e = doc.select(".ygrp-grdescr").first();
// System.out.println(e.text()); // Print => This list will be for asthmatics, and anyone whose life is affected by it. Discussions include causes, problems, and treatment
I hope this works for your problem.
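For debugging, it can also help to see what the stored cookies look like as the single Cookie request header that is sent on the wire, so it can be compared with what a logged-in browser sends. A minimal sketch using only the JDK follows. The class name CookieHeader and the cookie values are made up; the names B and Y come from the list in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CookieHeader {
    /** Joins cookies into the "name=value; name=value" form used by the Cookie header. */
    static String from(Map<String, String> cookies) {
        StringJoiner header = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            header.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return header.toString();
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("B", "abc"); // made-up values
        cookies.put("Y", "v=1");
        System.out.println("Cookie: " + from(cookies));
    }
}
```

If the browser's header contains cookies that are missing from this output, they were probably set on a redirect or by JavaScript, which would explain why forwarding only the login response's cookies is not enough.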