JSoup parsing form with checkboxes and select input - java

I have a form that I need to read with jsoup. It contains several fields, including checkboxes and comboboxes (select inputs).
I am reading their values with the following code:
Element campaignForm = doc.getElementById("Campaign");
Elements allInputFields = campaignForm.getElementsByTag("input");
Elements allSelections = campaignForm.getElementsByTag("select");
Map<String, String> postData = new HashMap<String, String>();
for(Element selectField:allSelections){
postData.put(selectField.attr("name"), selectField.attr("value"));
}
for(Element inputField:allInputFields){
if(inputField.attr("type").equalsIgnoreCase("checkbox")){
postData.put(inputField.attr("name"), inputField.attr("checked").equalsIgnoreCase("checked")?"1":"0");
}else{
postData.put(inputField.attr("name"), inputField.attr("value"));
}
}
When I print the postData map, it gives correct values for the text input fields, but not for the checkboxes and dropdowns (comboboxes). Please let me know if there is a different way to handle checkboxes and select inputs in jsoup.
EDIT:
I got the checkboxes working with the help of a comment, but the select inputs are still not working.
Thanks in advance.

I got it working with the following code:
for(Element selectField:allSelections){
String nameField = selectField.attr("name");
String valueField = "";
Elements allOptions = selectField.getElementsByTag("option");
for(Element opt:allOptions){
if(opt.attr("selected").equalsIgnoreCase("selected")){
valueField = opt.attr("value");
break;
}
}
postData.put(nameField, valueField);
}
for(Element inputField:allInputFields){
if(inputField.attr("type").equalsIgnoreCase("checkbox")){
postData.put(inputField.attr("name"), inputField.attr("checked").equalsIgnoreCase("checked")?"1":"0");
}else{
postData.put(inputField.attr("name"), inputField.attr("value"));
}
}

Related

I want to get the list of IMDb's top 250 movies, but I get "{}" instead of the whole list

I have added the libraries properly and there are no errors, but the code does not show the desired result. The method returns a String, which I save to a variable and then set on a text view. I am stuck here. Please help me.
public String TableToJson() throws JSONException {
int i=0;
String s="http://www.imdb.com/chart/top";
Document doc = Jsoup.parse(s);
JSONObject jsonParentObject = new JSONObject();
//JSONArray list = new JSONArray();
for (Element table : doc.select("table")) {
for (Element row : table.select("tr")) {
JSONObject jsonObject = new JSONObject();
Elements tds = row.select("td");
i++;
String no = Integer.toString(i);
String Name = tds.get(1).text();
String rating = tds.get(2).text();
jsonObject.put("Ranking", no);
jsonObject.put("Title", Name);
jsonObject.put("Rating", rating);
jsonParentObject.put(Name, jsonObject);
}
}
return jsonParentObject.toString();
}
and output is only
{}
A regular expression will work for you.
A sample pattern could look like this:
<strong title=".*</strong>
It showed 250 matches when tested with freeformatter.
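The pattern above is greedy (`.*`), so on markup delivered as a single line it could swallow everything between the first `<strong` and the last `</strong>`. Below is a minimal sketch of a non-greedy variant using `java.util.regex`. The class name `StrongTitleRegex`, the method `extract`, and the sample markup are assumptions for illustration, not IMDb's real page. It also assumes the HTML has already been downloaded (for example with `Jsoup.connect(url).get()`), because `Jsoup.parse(String)` treats its argument as markup rather than as a URL to fetch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StrongTitleRegex {
    // Non-greedy group: stops at the first </strong> after each opening tag.
    private static final Pattern STRONG =
            Pattern.compile("<strong title=\"[^\"]*\"[^>]*>(.*?)</strong>", Pattern.DOTALL);

    /** Returns the text inside every <strong title="..."> element of the given HTML. */
    static List<String> extract(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = STRONG.matcher(html);
        while (m.find()) {
            values.add(m.group(1).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, not copied from the real page.
        String html = "<td><strong title=\"9.2 based on N votes\">9.2</strong></td>"
                    + "<td><strong title=\"9.1 based on M votes\">9.1</strong></td>";
        System.out.println(extract(html)); // prints [9.2, 9.1]
    }
}
```

Regular expressions are fragile against markup changes, so a selector such as `doc.select("strong[title]")` on a properly fetched document may be the sturdier choice.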

Converting List<WebElement> to WebElement

I am using Appium and I want to print the names of the elements in a list.
I am using the following code:
List<WebElement> list = getDriver().findElementsByXPath(getLocator(Locators.MY_ITEM));
List<String> strings = new ArrayList<>();
for (WebElement object : list) {
String text = object.getText();
logger.info(text);
if (!text.isEmpty())
strings.add(text);
}
But the text I get is always empty.
What is the suggested approach here?
Note: each element is of type UIACollectionCell on iOS, and on Android the locator is //android.widget.TextView[@text='%s']
From what I understand, you should be getting the text from the text attribute, replace:
String text = object.getText();
with:
String text = object.getAttribute("text");

how to store webtable in java collection hashmap or hashset or arraylist?

In my application on users profile page, user has:
Name: XYZ
Age: ##
Address: st.XYZ
and so on...
When an element is missing (for example, age), another row takes its place, so I can't hardcode the XPath of the elements. What I want is:
I want to extract (print) the entire table data and compare it with the actual values.
So when I ask for "Name" as the key, it should give me the cell value next to it as the value for that key.
What I tried:
I was able to get the text of the tr elements while keeping the td fixed. But for another user, when some row is missing, it fails or gives the wrong value.
for (int i = 2; i < 58; i++) {
String actor_name = new WebDriverWait(driver, 30).until(ExpectedConditions
.elementToBeClickable(By.xpath(first_part+i+part_two))).getText();
System.out.print("\n"+"S.no. "+(i-1)+" "+actor_name);
try {
driver.findElement(By.xpath(first_part+i+part_two)).click();
new WebDriverWait(driver, 30).until(ExpectedConditions
.elementToBeClickable(By.partialLinkText("bio"))).click();
//driver.findElement(By.partialLinkText("bio")).click();
} catch (Exception e) {
// TODO: handle exception
System.out.println("Not a link");
}
Thread.sleep(5000);
System.out.print(" "+driver.findElement(By.xpath("//*[@id='overviewTable']/tbody/tr[3]/td[2]")).getText());
driver.get("http://www.imdb.com/title/tt2310332/fullcredits?ref_=tt_cl_sm#cast");
}
The above code works fine for the top 3 actors on this page but fails for the 4th, because the rows on that actor's bio page do not line up the same way.
On the bio page there are two columns in the table: one holds the attribute and the other holds its value. I want to build a collection of key-value pairs, with the attribute (the value from the left column) as the key and the value from the right column as its value, so that I can fetch a value by naming its attribute.
I am using Java to write the scripts.
Can you try the following code and let me know if you have any concerns?
driver.get("http://www.imdb.com/title/tt2310332/fullcredits?ref_=tt_cl_sm#cast");
String height = "";
String actorName = "";
driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
List<WebElement> lstUrls = driver.findElements(By
.xpath("//span[@itemprop='name']/..")); // all a tags
List<String> urls = new ArrayList<>();
for (WebElement webElement : lstUrls) {
urls.add(webElement.getAttribute("href")); // saving all hrefs attached in each a tag
}
Map<String, String> actorHeightData = new HashMap<String, String>();
for (String string : urls) {
driver.get(string);
actorName = driver.findElement(
By.xpath(".//*[@id='overview-top']/h1/span")).getText(); // Getting actor's name
driver.findElement(By.xpath("//a[text()='Biography']")).click(); // Clicking Biography
try {
height = driver.findElement(
By.xpath("//td[.='Height']/following-sibling::td"))
.getText(); // Getting height
} catch (NoSuchElementException nsee) {
height = ""; // If height not found
}
actorHeightData.put(actorName, height); // Adding to map
}
You can create a class PersonData with all the nullable fields you need, but with getters that never return null.
For example:
class PersonData {
    private String name;

    // Never returns null: a missing name is reported as an empty string.
    public String getName() {
        if (name == null)
            return "";
        return name;
    }
}
Then store all the persons in a List.
On your page you can then ask a person for any field and always have something to put in the table's cell.
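The same "never return null" idea also works with a plain map, which matches the key/value layout of the two-column table described in the question. Here is a minimal sketch, assuming the left and right cell texts have already been read from the page into two parallel lists. The class name `ProfileTable`, the method `toMap`, and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ProfileTable {
    /** Pairs each label from the left column with the value from the right column. */
    static Map<String, String> toMap(List<String> labels, List<String> values) {
        if (labels.size() != values.size()) {
            throw new IllegalArgumentException("label and value columns differ in length");
        }
        Map<String, String> table = new LinkedHashMap<>(); // keeps the row order
        for (int i = 0; i < labels.size(); i++) {
            // Strip a trailing colon so "Name:" and "Name" map to the same key.
            String key = labels.get(i).trim().replaceAll(":$", "");
            table.put(key, values.get(i).trim());
        }
        return table;
    }

    public static void main(String[] args) {
        // Hypothetical profile in which the "Age" row is missing.
        Map<String, String> profile = toMap(
                Arrays.asList("Name:", "Address:"),
                Arrays.asList("XYZ", "st.XYZ"));
        System.out.println(profile.get("Name"));                      // prints XYZ
        System.out.println(profile.getOrDefault("Age", "").isEmpty()); // prints true
    }
}
```

Because lookups go by label rather than by row index, a missing row no longer shifts the other values.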

Remove White Space From Text that i scraped from website

I am trying to scrape a list of medicines from a website.
I am using jsoup to parse the HTML.
Here is my code:
URL url = new URL("http://www.medindia.net/drug-price/index.asp?alpha=a");
Document doc1 = Jsoup.parse(url, 0);
Elements rows = doc1.getElementsByAttributeValue("style", "padding-left:5px;border-right:1px solid #A5A5A5;");
for(Element row : rows){
String htm = row.text();
if(!(htm.equals("View Price")||htm.contains("Show Details"))) {
System.out.println(htm);
System.out.println();
}
}
Here is the output that I am getting:
P.S. This is not the complete output; I couldn't take a screenshot of all of it, so I have shown only part.
I need to know two things:
Question 1. Why am I getting an extra space in front of each drug name, and why am I getting an extra new line after some drug names?
Question 2. How do I resolve this issue?
A few things:
It's not the complete output because there's more than one page. I added a for loop that fixes that for you.
You should probably trim the output using htm.trim().
You should probably make sure not to print when the string is empty (!htm.isEmpty()).
That website contains a character with code 160 (a non-breaking space, U+00A0), which trim() does not remove. I added a small fix that solves the problem (with .replace).
Here's the fixed code:
for(char page='a'; page <= 'z'; page++) {
String urlString = String.format("http://www.medindia.net/drug-price/index.asp?alpha=%c", page);
URL url = new URL(urlString);
Document doc1 = Jsoup.parse(url, 0);
Elements rows = doc1.getElementsByAttributeValue("style", "padding-left:5px;border-right:1px solid #A5A5A5;");
for(Element row : rows){
String htm = row.text().replace((char) 160, ' ').trim();
if(!(htm.equals("View Price")||htm.contains("Show Details"))&& !htm.isEmpty())
{
System.out.println(htm.trim());
System.out.println();
}
}
}
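The reason the `.replace` call is needed in addition to `trim()`: `String.trim()` only strips characters up to U+0020, and character 160 is U+00A0. Below is a minimal sketch using only the standard library. The class name `NbspCleaner`, the method `clean`, and the sample string are assumptions for illustration.

```java
final class NbspCleaner {
    private static final char NBSP = (char) 160; // U+00A0, non-breaking space

    /** Replaces non-breaking spaces with ordinary spaces, then trims. */
    static String clean(String raw) {
        return raw.replace(NBSP, ' ').trim();
    }

    public static void main(String[] args) {
        // Hypothetical cell text wrapped in non-breaking spaces.
        String cell = NBSP + "Abacavir" + NBSP;
        System.out.println(cell.trim().length());  // prints 10: trim() leaves both NBSPs
        System.out.println(clean(cell).length());  // prints 8
        System.out.println(clean(String.valueOf(NBSP)).isEmpty()); // prints true
    }
}
```

A cell that contains nothing but a non-breaking space becomes an empty string after cleaning, so the `!htm.isEmpty()` check can filter it out.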
Do one thing:
Use the trim function in the println call: System.out.println(htm.trim());
UPDATED:
After a lot of effort I was able to parse all 80 medicines like this:
URL url = new URL("http://www.medindia.net/drug-price/index.asp?alpha=a");
Document doc1 = Jsoup.parse(url, 0);
Elements rows = doc1.select("td.ta13blue");
Elements rows1 = doc1.select("td.ta13black.tbold");
int cnt=0;
for(Element row : rows){
cnt++;
String htm = row.text().trim();
if(!(htm.equals("View Price")||htm.contains("Show Details") || htm.startsWith("Drug"))) {
System.out.println(cnt+" : "+htm);
System.out.println();
}
}
for(Element row1 : rows1){
cnt++;
String htm = row1.text().trim();
if(!(htm.equals("View Price")||htm.contains("Show Details") || htm.startsWith("Drug"))) {
System.out.println(cnt+" : "+htm);
System.out.println();
}
}
1) Selecting elements by style is quite dangerous.
2) Calling something ROWS when it is actually a list of FIELDS is even more dangerous :)
3) If you open the page, you can see that the extra lines are added ONLY after "black names", that is, item names not wrapped in an anchor link.
Your problem is that the second field in those rows is neither Show Details nor View Price, and it is not even empty. It is:
<td bgcolor="#FFFFDB" align="center"
style="padding-left:5px;border-right:1px solid #A5A5A5;">
</td>
It is a one-space string. Modify your code like this:
for (Element row : rows) {
    String htm = row.text().trim(); // <-- this one
    if (!(htm.equals("View Price")
            || htm.contains("Show Details")
            || htm.isEmpty())) { // <-- and this one: after trim(), a one-space string is empty
        System.out.println(htm);
        System.out.println();
    }
}

Multiple Line String to separated new strings for every line

I have the following code. I am using the jsoup library to retrieve the URLs from a website; after that, I am checking if the URLs contain the keyword I want, and list them in another string. My problem is that I am not able to retrieve only one URL.
Have a look at my code:
// Get the webpage and parse it.
org.jsoup.nodes.Document doc = Jsoup.connect("http://www.examplepage").get();
// Get the anchors with href attribute.
// Or, you can use doc.select("a") to get all the anchors.
org.jsoup.select.Elements links = doc.select("a[href]");
// Iterate over all the links and process them.
String scrapedlinks = "";
String scrapedlinks3 = "";
for (org.jsoup.nodes.Element link : links) {
    scrapedlinks += link.attr("abs:href") + "\n";
}
// Split the collected string back into single URLs.
String[] links2 = scrapedlinks.split("\n");
for (String newlink : links2) {
    if (newlink.contains("mysearchterm")) {
        scrapedlinks3 += newlink;
        String[] scrapedlines = scrapedlinks3.split("\n");
    }
}
I think it will be easier if you store your URLs directly in an ArrayList:
ArrayList<String> urls = new ArrayList<String>();
for (org.jsoup.nodes.Element link : links)
    urls.add(link.attr("abs:href"));
After this you can easily access them with
urls.get(i);
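Building on the list approach, the keyword check from the question can be applied to each URL in turn, so no joined string has to be split again. Here is a minimal sketch with plain `java.util` collections. The class name `UrlFilter`, the method `containing`, and the sample URLs are hypothetical; in the real program the input list would hold the `abs:href` values collected above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class UrlFilter {
    /** Returns the URLs that contain the given keyword, in their original order. */
    static List<String> containing(List<String> urls, String keyword) {
        List<String> matches = new ArrayList<>();
        for (String url : urls) {
            if (url.contains(keyword)) {
                matches.add(url);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical URLs standing in for the scraped links.
        List<String> urls = Arrays.asList(
                "http://example.com/mysearchterm/1",
                "http://example.com/other",
                "http://example.com/page?q=mysearchterm");
        for (String match : containing(urls, "mysearchterm")) {
            System.out.println(match); // one matching URL per line
        }
    }
}
```

Each match is already a separate string, so it can be used on its own with `matches.get(i)`.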
