I'm working on a web scraper, and I've been stuck on this problem for two days in a row.
The problem is in this method: when the bot visits a website, it should harvest all URLs and add the ones it hasn't visited yet to the List<String> "toVisit".
Problematic code:
Elements temp = userAgent.visit(currentUrl).findEvery("<a href>");
for (Element e : temp) {
String x = e.getAt("href");
if(!visited.contains(x)) {
toVisit.add(x);
}
}
However, the if statement doesn't filter the URLs (or it filters them in a way I don't understand), and I have no idea why.
I tried deleting the "!" in the condition, adding an else branch and moving toVisit.add(x) there, but it didn't help.
When I print every URL, I can see that the bot visits the same ones twice or even five times.
EDIT (visited defined)
static List<String> visited = new ArrayList<String>();
EDIT2 (whole code)
import java.util.ArrayList;
import java.util.List;
import com.jaunt.*;
public class b03 {
static String currentUrl = "https://stackoverflow.com";
static String stayAt = currentUrl;
static String searchingTerm = "";
static int toSearch = 50;
static List<String> toVisit = new ArrayList<String>();
static List<String> visited = new ArrayList<String>();
static UserAgent userAgent = new UserAgent();
public static void main(String[] args) {
System.out.println("*started searching...*");
while(visited.size() < toSearch)
visitUrl(currentUrl);
System.out.println("\n\n*done*\n\n");
}
public static void visitUrl(String url) {
visited.add(url);
evaluateUrls();
searchTerm();
toVisit.remove(0);
currentUrl = toVisit.get(0);
}
public static void searchTerm() {
//if(userAgent.doc.getTextContent().contains(searchingTerm))
System.out.println(visited.size() +") "+ currentUrl);
}
public static void evaluateUrls() {
try {
Elements temp = userAgent.visit(currentUrl).findEvery("<a href>");
for (Element e : temp) {
String x = e.getAt("href");
if(!visited.contains(x) && x.contains(stayAt)) {
toVisit.add(x);
}
}
}catch (Exception e) {
System.out.println(e);
}
}
}
Your bot visits some URLs several times because you add them to the toVisit list several times.
To illustrate: let's assume that the first few links your bot finds on the Stack Overflow site are "home" (stackoverflow.com), tags (stackoverflow.com/tags), users (stackoverflow.com/users) and jobs (stackoverflow.com/jobs), and that your bot adds three of those (everything except the already visited home page) to the toVisit list.
Next it visits the tags page (stackoverflow.com/tags). This page again contains links to the same four URLs. Since the bot hasn't visited the users and jobs pages yet, it adds them to the toVisit list a second time.
To fix this, you should only add urls to the toVisit list that are not in the visited list and not in the toVisit list:
if (!visited.contains(x) && !toVisit.contains(x) && x.contains(stayAt)) {
toVisit.add(x);
}
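The same idea can be shown without Jaunt. The sketch below is a minimal breadth-first crawl over a hypothetical in-memory link graph (the class name CrawlOrderSketch and the map standing in for real pages are assumptions, not part of the question). It records a URL in a HashSet as soon as it is queued, so each page is queued and visited at most once. It assumes Java 9+ for List.of.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CrawlOrderSketch {
    // Returns the pages in the order a breadth-first crawl would visit them.
    // A URL goes into 'seen' when it is queued, so it can never be queued twice.
    static List<String> crawl(Map<String, List<String>> links, String start, int limit) {
        List<String> order = new ArrayList<>();
        Deque<String> toVisit = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        toVisit.add(start);
        seen.add(start);
        while (!toVisit.isEmpty() && order.size() < limit) {
            String url = toVisit.poll();
            order.add(url);
            for (String next : links.getOrDefault(url, List.of())) {
                if (seen.add(next)) { // add() returns false if 'next' was already seen
                    toVisit.add(next);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical link graph: every page links to the same four pages
        Map<String, List<String>> links = new HashMap<>();
        List<String> nav = List.of("home", "tags", "users", "jobs");
        for (String page : nav) {
            links.put(page, nav);
        }
        System.out.println(crawl(links, "home", 50)); // [home, tags, users, jobs]
    }
}
```

Using a Set for the seen URLs also makes each membership check constant time on average, whereas List.contains scans the whole list.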
I cannot test this code because I don't have the Jaunt library.
Split your code up and make it readable.
Avoid using "static" where possible.
Hope it helps
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import com.jaunt.*;
public class B03 {
static String currentUrl = "https://stackoverflow.com";
static String stayAt = currentUrl;
static String searchingTerm = "";
static int toSearch = 50;
static List<String> toVisit = new ArrayList<String>();
static List<String> visited = new ArrayList<String>();
static UserAgent userAgent = new UserAgent();
public static void main(String[] args) {
System.out.println("*started searching...*");
toVisit.add(currentUrl);
while(toVisit.size() > 0 && visited.size() < toSearch){
visitUrl(toVisit.get(0));
}
System.out.println("\n\n*done*\n\n");
}
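public static void visitUrl(String url) {
List<String> subUrls = evaluateUrls(url);
searchTerm(url);
visited.add(url);
// remove the page we just processed (exactly once)
toVisit.remove(url);
// queue each link only once: skip links already visited or already queued,
// and stay on the starting site
toVisit.addAll(subUrls.stream()
.distinct()
.filter(e -> !visited.contains(e) && !toVisit.contains(e) && e.contains(stayAt))
.collect(Collectors.toList()));
}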
public static void searchTerm(String currentUrl) {
//if(userAgent.doc.getTextContent().contains(searchingTerm))
System.out.println(visited.size() +") "+ currentUrl);
}
public static List<String> evaluateUrls(String currentUrl) {
List<String> subUrls = new ArrayList<>();
try {
Elements temp = userAgent.visit(currentUrl).findEvery("<a href>");
for (Element e : temp) {
String x = e.getAt("href");
subUrls.add(x);
}
}catch (Exception e) {
System.out.println(e);
}
return subUrls;
}
}
I have data for tracks and track links like the following:
trackname - abc
links - www.abc.com
www.abc1.com
www.abc2.com
trackname - xyz
links - www.xyz.com
www.xyz1.com
www.xyz2.com
I want to create an array within an array in Java, so the final array would be:
trackdata = {
[0] {
[trackname] = 'abc',
[tracklinks] = {
[0] = "www.abc.com";
[1] = "www.abc1.com";
[2] = "www.abc2.com";
}
},
[1] {
[trackname] = 'xyz',
[tracklinks] = {
[0] = "www.xyz.com";
[1] = "www.xyz1.com";
[2] = "www.xyz2.com";
}
}
}
I have tried to do this using ArrayList and Map, but without success.
Map<String,String> map = new HashMap<>();
map.put("trackname", "abc");
ArrayList<String> myLinks= new ArrayList<>();
myLinks.add("www.abc.com");
myLinks.add("www.abc1.com");
myLinks.add("www.abc2.com");
map.put("tracklinks", myLinks);
Please help me here.
Consider using a multimap, a map whose values are list objects:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
public class Starter {
public static void main(String[] args) {
Map<String, ArrayList<String>> map = new HashMap<String, ArrayList<String>>();
ArrayList<String> myLinks= new ArrayList<>();
myLinks.add("www.abc.com");
myLinks.add("www.abc1.com");
myLinks.add("www.abc2.com");
map.put("abc", myLinks);
System.out.println(map); // {abc=[www.abc.com, www.abc1.com, www.abc2.com]}
}
}
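To hold both tracks, the same multimap can be filled in a loop. This is a minimal sketch that assumes the input arrives as (track name, link) pairs; the class name and the build helper are hypothetical. computeIfAbsent creates the list for a track name the first time it is seen, and LinkedHashMap keeps the tracks in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrackMultimapSketch {
    // Build trackname -> list of links, creating each list on first use
    static Map<String, List<String>> build(String[][] pairs) {
        Map<String, List<String>> tracks = new LinkedHashMap<>(); // keeps insertion order
        for (String[] pair : pairs) {
            tracks.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return tracks;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"abc", "www.abc.com"}, {"abc", "www.abc1.com"}, {"abc", "www.abc2.com"},
            {"xyz", "www.xyz.com"}, {"xyz", "www.xyz1.com"}, {"xyz", "www.xyz2.com"}
        };
        System.out.println(build(pairs));
        // {abc=[www.abc.com, www.abc1.com, www.abc2.com], xyz=[www.xyz.com, www.xyz1.com, www.xyz2.com]}
    }
}
```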
You could create a class and then access its properties as needed.
class TrackData {
private String trackme;
private List<String> trackLink;
public String getTrackme() {return trackme;}
public void setTrackme(String trackme) {this.trackme = trackme;}
public List<String> getTrackLink() {return trackLink;}
public void setTrackLink(List<String> trackLink) {this.trackLink = trackLink;}
// without this, System.out.println prints only the class name and hash code
@Override
public String toString() {return "TrackData[trackme=" + trackme + ", trackLink=" + trackLink + "]";}
}
To access it:
@Test
void arrayInArray_Test1() {
List<TrackData> trackData = new ArrayList<>();
trackData.add(new TrackData(){{
setTrackme("abc");
setTrackLink(new ArrayList<String>(){{
add("www.abc.com");
add("www.abc1.com");
add("www.abc2.com");
}});
}});
trackData.add(new TrackData(){{
setTrackme("xyz");
setTrackLink(new ArrayList<String>(){{
add("www.xyz.com");
add("www.xyz1.com");
add("www.xyz2.com");
}});
}});
System.out.println(trackData);
}
If you are using Java 16 or later, you can create a record instead of a class.
You can achieve this as follows:
import java.util.ArrayList;
import java.util.List;
public class TrackTest {
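A minimal sketch of the record variant, assuming Java 16 or later (the class name TrackRecordSketch is hypothetical). The record declares the two components, and the compiler generates the constructor, the accessors trackName() and trackLinks(), and equals, hashCode and toString.

```java
import java.util.ArrayList;
import java.util.List;

public class TrackRecordSketch {
    // One track: its name plus the list of its links
    public record TrackData(String trackName, List<String> trackLinks) {}

    public static void main(String[] args) {
        List<TrackData> trackData = new ArrayList<>();
        trackData.add(new TrackData("abc", List.of("www.abc.com", "www.abc1.com", "www.abc2.com")));
        trackData.add(new TrackData("xyz", List.of("www.xyz.com", "www.xyz1.com", "www.xyz2.com")));
        // Accessor methods are named after the components
        System.out.println(trackData.get(1).trackLinks().get(0)); // www.xyz.com
    }
}
```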
public static void main(String[] args) {
List<Tracks> trackList = new ArrayList<>();
Tracks track1 = new Tracks("abc");
track1.getTrackLinks().add("www.abc.com");
track1.getTrackLinks().add("www.abc1.com");
track1.getTrackLinks().add("www.abc2.com");
Tracks track2 = new Tracks("xyz");
track2.getTrackLinks().add("www.xyz.com");
track2.getTrackLinks().add("www.xyz1.com");
track2.getTrackLinks().add("www.xyz2.com");
trackList.add(track1);
trackList.add(track2);
System.out.println(trackList);
}
static class Tracks{
private String trackName;
private List<String> trackLinks;
public Tracks(String trackName) {
this.trackName = trackName;
this.trackLinks = new ArrayList<>();
}
public Tracks(String trackName, List<String> trackLinks) {
this.trackName = trackName;
this.trackLinks = trackLinks;
}
public String getTrackName() {
return trackName;
}
public List<String> getTrackLinks() {
return trackLinks;
}
@Override
public String toString() {
return "Tracks [trackName=" + trackName + ", trackLinks=" + trackLinks + "]";
}
}
}
Let me know if you want a different approach.
How are you?
Why don't you do this?
Create a class named URL, for example.
public class URL {
//attributes
String domain;
String url;
//constructor
public URL(String domain, String url){
this.domain = domain;
this.url = url;
}
}
In your main class, you can create an ArrayList to store your URL instances.
// requires: import java.util.Scanner;
public static void newURL(){
String domain, url;
Scanner keyboard = new Scanner(System.in);
//ask for the domain (as an example)
System.out.println("What is the domain of the URL?");
domain = keyboard.nextLine();
System.out.println("What is the URL?");
url = keyboard.nextLine();
//now you have the attributes of the new URL
URL newUrl = new URL(domain, url);
//add newUrl to your ArrayList<URL> here
}
What is your objective? That's an important question.
If it is for simple tracking in a small program, you can do this.
I'd like to find the methods which changed between two arbitrary Java files.
What I've Tried
I've tried using diff (GNU diffutils 3.3) to find the changed lines in the file, and diff --show-c-function to connect those lines to the changed method. Unfortunately, for Java this lists the class, not the function.
I also tried git diff, which seems able to find the changed function (at least as displayed on GitHub, e.g. https://github.com/apache/commons-math/commit/34adc606601cb578486d4a019b4655c5aff607b5), but it doesn't always list the full signature, and my files are not in the same Git repository.
Desired Results
Input:
~/repos/d4jBugs/Math_54_buggy/src/main/java/org/apache/commons/math/dfp/Dfp.java
~/repos/d4jBugs/Math_54_fixed/src/main/java/org/apache/commons/math/dfp/Dfp.java
State of Files:
The changed methods between those two files are public double toDouble() and protected Dfp(final DfpField field, double x)
Output: (fully qualified names)
org.apache.commons.math.dfp.Dfp.toDouble()
org.apache.commons.math.dfp.Dfp(final DfpField field, double x)
Summary
Can I find the modified methods with the GNU diffutils tool or git diff and if yes, how would I do that? (Note: I'm not bound to those tools and am happy to install something else if needed.)
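I used JavaParser 3.4.4; it probably works with other versions too, but I have not tested them.
It can be imported in Gradle with the following (the older compile configuration no longer exists in Gradle 7 and later, so implementation is used):
implementation group: 'com.github.javaparser', name: 'javaparser-core', version: '3.4.4'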
You can use my class like:
HashSet<String> changedMethods = MethodDiff.methodDiffInClass(
oldFileNameWithPath,
newFileNameWithPath
);
MethodDiff Source:
import com.github.javaparser.JavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.Node;
import com.github.javaparser.ast.body.CallableDeclaration;
import com.github.javaparser.ast.body.ClassOrInterfaceDeclaration;
import com.github.javaparser.ast.body.ConstructorDeclaration;
import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.comments.Comment;
import com.github.javaparser.printer.PrettyPrinterConfiguration;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
/**
* Created by Loren Klingman on 10/19/17.
* Finds Changes Between Methods of Two Java Source Files
*/
public class MethodDiff {
private static PrettyPrinterConfiguration ppc = null;
class ClassPair {
final ClassOrInterfaceDeclaration clazz;
final String name;
ClassPair(ClassOrInterfaceDeclaration c, String n) {
clazz = c;
name = n;
}
}
public static PrettyPrinterConfiguration getPPC() {
if (ppc != null) {
return ppc;
}
PrettyPrinterConfiguration localPpc = new PrettyPrinterConfiguration();
localPpc.setColumnAlignFirstMethodChain(false);
localPpc.setColumnAlignParameters(false);
localPpc.setEndOfLineCharacter("");
localPpc.setIndent("");
localPpc.setPrintComments(false);
localPpc.setPrintJavadoc(false);
ppc = localPpc;
return ppc;
}
public static <N extends Node> List<N> getChildNodesNotInClass(Node n, Class<N> clazz) {
List<N> nodes = new ArrayList<>();
for (Node child : n.getChildNodes()) {
if (child instanceof ClassOrInterfaceDeclaration) {
// Don't go into a nested class
continue;
}
if (clazz.isInstance(child)) {
nodes.add(clazz.cast(child));
}
nodes.addAll(getChildNodesNotInClass(child, clazz));
}
return nodes;
}
private List<ClassPair> getClasses(Node n, String parents, boolean inMethod) {
List<ClassPair> pairList = new ArrayList<>();
for (Node child : n.getChildNodes()) {
if (child instanceof ClassOrInterfaceDeclaration) {
ClassOrInterfaceDeclaration c = (ClassOrInterfaceDeclaration)child;
String cName = parents+c.getNameAsString();
if (inMethod) {
System.out.println(
"WARNING: Class "+cName+" is located inside a method. We cannot predict its name at"
+ " compile time so it will not be diffed."
);
} else {
pairList.add(new ClassPair(c, cName));
pairList.addAll(getClasses(c, cName + "$", inMethod));
}
} else if (child instanceof MethodDeclaration || child instanceof ConstructorDeclaration) {
pairList.addAll(getClasses(child, parents, true));
} else {
pairList.addAll(getClasses(child, parents, inMethod));
}
}
return pairList;
}
private List<ClassPair> getClasses(String file) {
try {
CompilationUnit cu = JavaParser.parse(new File(file));
return getClasses(cu, "", false);
} catch (FileNotFoundException f) {
throw new RuntimeException("EXCEPTION: Could not find file: "+file);
}
}
public static String getSignature(String className, CallableDeclaration m) {
return className+"."+m.getSignature().asString();
}
public static HashSet<String> methodDiffInClass(String file1, String file2) {
HashSet<String> changedMethods = new HashSet<>();
HashMap<String, String> methods = new HashMap<>();
MethodDiff md = new MethodDiff();
// Load all the method and constructor values into a Hashmap from File1
List<ClassPair> cList = md.getClasses(file1);
for (ClassPair c : cList) {
List<ConstructorDeclaration> conList = getChildNodesNotInClass(c.clazz, ConstructorDeclaration.class);
List<MethodDeclaration> mList = getChildNodesNotInClass(c.clazz, MethodDeclaration.class);
for (MethodDeclaration m : mList) {
String methodSignature = getSignature(c.name, m);
if (m.getBody().isPresent()) {
methods.put(methodSignature, m.getBody().get().toString(getPPC()));
} else {
System.out.println("Warning: No Body for "+file1+" "+methodSignature);
}
}
for (ConstructorDeclaration con : conList) {
String methodSignature = getSignature(c.name, con);
methods.put(methodSignature, con.getBody().toString(getPPC()));
}
}
// Compare everything in file2 to what is in file1 and log any differences
cList = md.getClasses(file2);
for (ClassPair c : cList) {
List<ConstructorDeclaration> conList = getChildNodesNotInClass(c.clazz, ConstructorDeclaration.class);
List<MethodDeclaration> mList = getChildNodesNotInClass(c.clazz, MethodDeclaration.class);
for (MethodDeclaration m : mList) {
String methodSignature = getSignature(c.name, m);
if (m.getBody().isPresent()) {
String body1 = methods.remove(methodSignature);
String body2 = m.getBody().get().toString(getPPC());
if (body1 == null || !body1.equals(body2)) {
// Javassist doesn't add spaces for methods with 2+ parameters...
changedMethods.add(methodSignature.replace(" ", ""));
}
} else {
System.out.println("Warning: No Body for "+file2+" "+methodSignature);
}
}
for (ConstructorDeclaration con : conList) {
String methodSignature = getSignature(c.name, con);
String body1 = methods.remove(methodSignature);
String body2 = con.getBody().toString(getPPC());
if (body1 == null || !body1.equals(body2)) {
// Javassist doesn't add spaces for methods with 2+ parameters...
changedMethods.add(methodSignature.replace(" ", ""));
}
}
}
// Anything left in methods was only in the first file, so it counts as "changed"
for (String method : methods.keySet()) {
// Javassist doesn't add spaces for methods with 2+ parameters...
changedMethods.add(method.replace(" ", ""));
}
return changedMethods;
}
private static void removeComments(Node node) {
for (Comment child : node.getAllContainedComments()) {
child.remove();
}
}
}
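The comparison step of MethodDiff boils down to diffing two maps from signature to normalized body. The sketch below isolates that step without JavaParser; the class name and the sample signatures and bodies are hypothetical. It reports methods whose body changed, methods that exist only in the new file, and methods that exist only in the old file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SignatureDiffSketch {
    // Given signature -> normalized body for the old and new file,
    // return every signature that was added, removed, or whose body changed.
    static Set<String> changed(Map<String, String> oldBodies, Map<String, String> newBodies) {
        Set<String> result = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<String, String> e : newBodies.entrySet()) {
            String before = oldBodies.get(e.getKey()); // null means the method is new
            if (!e.getValue().equals(before)) {
                result.add(e.getKey());
            }
        }
        for (String sig : oldBodies.keySet()) {
            if (!newBodies.containsKey(sig)) { // the method was removed
                result.add(sig);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        before.put("Dfp.toDouble()", "{return a;}");
        before.put("Dfp.negate()", "{return -a;}");
        Map<String, String> after = new HashMap<>();
        after.put("Dfp.toDouble()", "{return b;}");
        after.put("Dfp.negate()", "{return -a;}");
        System.out.println(changed(before, after)); // [Dfp.toDouble()]
    }
}
```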
One of my methods, which uses both a map and Java reflection, is not working. I'm not sure whether it is because of reflection or something else, but the same method works in another class where I don't use reflection.
The method findAccessors() should retrieve a value from map2. The method is defined in the class ReadEdges. It is called by another method, findmethod(), which is defined in the class FindMethod.
Whenever I call findAccessors() from findmethod(), it returns an empty LinkedList instead of the value from map2. The classes are given below:
Class ReadEdges :
import java.io.BufferedReader;
import java.io.CharArrayReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.regex.Pattern;
import javax.swing.JOptionPane;
public class ReadEdges {
static DFSclass dfs = new DFSclass();
List<String> sourcenodes=new ArrayList<String>(); // source node
List<String> destinationnodes=new ArrayList<String>(); // destination node
LinkedHashSet<String> findtransitions=new LinkedHashSet<String>();
LoanApprovalSystem LS = new LoanApprovalSystem();
TestdataGeneration testdata = new TestdataGeneration();
private static final String edgePat = "([a-zA-Z]|[0-9])+(,|\\x20)([a-zA-Z]|[0-9])+";
private static final String start=dfs.getstart();
private static final String edge = dfs.getedge();
private static final String transitions=dfs.gettransitions();
public static String a;
public static String b;
public static String c;
public static String d;
private Map<String, LinkedHashSet<String>> map = new HashMap();
private Map<String, LinkedHashSet<String>> map2 = new HashMap();
public int getLineCount(String edge){
int count = edge.split("[\n|\r]").length;
//System.out.println(count);
return count;
}
public void addEdge(String node1, String node2) throws IOException{
LinkedHashSet<String> adjacent = map.get(node1);
{
if(adjacent==null) {
adjacent = new LinkedHashSet();
map.put(node1, adjacent);
}
adjacent.add(node2);
}
}
public void addedgeandAccessor(String edge, String accessor) throws IOException{
LinkedHashSet<String> adjacent2 = map2.get(edge);
{
if(adjacent2==null) {
adjacent2 = new LinkedHashSet();
map2.put(edge, adjacent2);
//System.out.println(map2);
}
adjacent2.add(accessor);
//System.out.println(map2);
}
}
public void ReadEdge(String edgeinput,String transitionsinput,String accessorinput) throws InvalidInputException
{
char[] buf = edgeinput.toCharArray();
BufferedReader br = new BufferedReader(new CharArrayReader(buf));
char[] buf2 = transitionsinput.toCharArray();
BufferedReader br2 = new BufferedReader(new CharArrayReader(buf2));
String str2 = null;
char[] buf3 = accessorinput.toCharArray();
BufferedReader br3 = new BufferedReader(new CharArrayReader(buf3));
String str3 = null;
try
{
//a string for a next edge
String str = null;
//a StringTokinizer
StringTokenizer newNodes = null;
//get edges and set edges for the graph
while((((str = br.readLine()) != null) && (str2 = br2.readLine()) != null) && ((str3 = br3.readLine()) != null))
{
c=str;
d=str2;
LinkedHashSet<String> adjacent = map.get(str);
if(adjacent==null) {
adjacent = new LinkedHashSet();
map.put(str, adjacent);
}
adjacent.add(str2);
addedgeandAccessor(str,str3);
//if the edge inputs are not in good format, throw the exception
if(!Pattern.matches(edgePat, str.trim()))
JOptionPane.showMessageDialog(null,"An invalid input '" + str + "' for an edge. Please read the notes above the forms. ");
//use a comma to separate tokens
newNodes = new StringTokenizer (str, ", ");
//get the value of source node of an edge
String src = newNodes.nextToken();
//create the source node and destination node
String srcNode = src;
String desNode = newNodes.nextToken();
a=srcNode;
b=desNode;
addEdge(srcNode, desNode);
//System.out.println(adjacent);
//findTransition(a,b);
//findAccessors(a,b);
}
//System.out.println(listoftransitions);
}
catch (IOException e) {
JOptionPane.showMessageDialog(null, "Something is Wrong!");
e.printStackTrace();
}
}
public LinkedList<String> adjacentNodes(String last) {
LinkedHashSet<String> adjacent = map.get(last);
if(adjacent==null) {
return new LinkedList();
}
return new LinkedList<String>(adjacent);
}
public LinkedList<String> findTransition(String node1, String node2) throws IOException{
LinkedHashSet<String> adjacent = map.get(node1+" "+node2);
if(adjacent==null) {
return new LinkedList();
}
findtransitions = adjacent;
return new LinkedList<String>(findtransitions);
}
public LinkedList<String> findAccessors(String node1, String node2) {
LinkedHashSet<String> adjacent = map2.get(node1+" "+node2);
if(adjacent==null) {
return new LinkedList();
}
System.out.println(adjacent);
return new LinkedList<String>(adjacent);
}
public String getsrcNode(){
return a;
}
public String getedgeline(){
return c;
}
public String gettransitionline(){
return d;
}
}
Class FindMethod :
import java.util.ArrayList;
import java.util.LinkedList;
import java.lang.reflect.*;
public class FindMethod {
ReadEdges r = new ReadEdges();
LoanApprovalSystem LS = new LoanApprovalSystem();
TestdataGeneration testdata = new TestdataGeneration();
int method1;
String method2;
boolean method3;
boolean method4;
String method5;
String m;
//returns the method name using reflection
public String getmethod(Method method){
FindMethod fm = new FindMethod();
m = method.getName();
String str = "";
str += m+"(" +fm.getparameter(method)+ ")";
// System.out.println(str);
return str;
}
//returns the parameter types of the method using reflection (e.g. int)
public String getparameter(Method method){
String str = "";
Class<?>[] params = method.getParameterTypes();
for (int i = 0; i < params.length; i++) {
if (i > 0) {
str += ", ";
}
str += (params[i].getSimpleName());
}
return str;
}
public void findmethod(String s,String t,String transition) throws InstantiationException, IllegalAccessException, NoSuchMethodException, SecurityException, IllegalArgumentException, InvocationTargetException{
FindMethod fm = new FindMethod();
LoanApprovalSystem cls = new LoanApprovalSystem();
Class<? extends LoanApprovalSystem> c = cls.getClass();
Object obj = c.newInstance();
Method[] methods = LoanApprovalSystem.class.getMethods();
for(Method method : methods)
{
//returns the method name (i.e. Receive or Asses)
m = method.getName();
fm.getmethod(method);
if(transition.equals(fm.getmethod(method)) && (transition.equals("Receive(int)")) )
{
if(fm.getparameter(method).equals("int") )
{
//LS.Receive(testdata.TestData(s,t));
//invoking the method at runtime where m="Receive".
method = c.getMethod(m, int.class);
method.invoke(obj,testdata.TestData(s,t));
LinkedList<String> accessors= r.findAccessors(s,t);
System.out.println("A:"+accessors);
method1=LS.getamount();
System.out.println(m+"("+method1+")");
System.out.println("Amount: "+method1);
}
}
}
}
public static void main(String[] args) throws InstantiationException, IllegalAccessException, NoSuchMethodException, SecurityException, IllegalArgumentException, InvocationTargetException
{
FindMethod fm = new FindMethod();
fm.findmethod("1","2","Receive(int)");
}
}
Can anybody tell me why findAccessors() doesn't work when called from findmethod()? Or please suggest a solution to this problem.
Note: this program uses one more class, LoanApprovalSystem. If anyone needs it, I can post its definition too.
It looks like you are calling the default constructor for ReadEdges:
'ReadEdges r = new ReadEdges();'
when you need to call a constructor that populates the maps:
'ReadEdges r = new ReadEdges(edgeinput, transitionsinput, accessorinput);'
EDIT:
The method
public void ReadEdge(String edgeinput, String transitionsinput, String accessorinput)
is never called, so map and map2 stay empty.
To make it a constructor, remove void and add an 's' to ReadEdge so that its name matches the class:
public ReadEdges(String edgeinput, String transitionsinput, String accessorinput) throws InvalidInputException
Then, when you instantiate ReadEdges in the FindMethod class, you need to supply the arguments.
ReadEdges r = new ReadEdges();
should be:
ReadEdges r = new ReadEdges(edgeinput, transitionsinput, accessorinput);
For more information, read about 'constructor overloading' and 'method overloading'. http://beginnersbook.com/2013/05/constructor-overloading/
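The underlying issue is that map2 is only filled by ReadEdge(...), so a freshly constructed ReadEdges always returns an empty list. The following self-contained sketch illustrates this; AccessorStoreSketch, load(...) and the sample accessor names are hypothetical stand-ins for ReadEdges, ReadEdge(...) and your real input, and it assumes one edge and one accessor per line, as ReadEdge reads them.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.Map;

public class AccessorStoreSketch {
    // Hypothetical stand-in for ReadEdges: map2 is only filled by load()
    private final Map<String, LinkedHashSet<String>> map2 = new HashMap<>();

    // Plays the role of ReadEdge(...): line i of each input belongs together
    public void load(String edgeInput, String accessorInput) {
        String[] edges = edgeInput.split("\\R");
        String[] accessors = accessorInput.split("\\R");
        for (int i = 0; i < Math.min(edges.length, accessors.length); i++) {
            map2.computeIfAbsent(edges[i].trim(), k -> new LinkedHashSet<>()).add(accessors[i].trim());
        }
    }

    public LinkedList<String> findAccessors(String node1, String node2) {
        LinkedHashSet<String> found = map2.get(node1 + " " + node2);
        return found == null ? new LinkedList<>() : new LinkedList<>(found);
    }

    public static void main(String[] args) {
        AccessorStoreSketch r = new AccessorStoreSketch();
        System.out.println(r.findAccessors("1", "2")); // [] because nothing was loaded yet
        r.load("1 2\n2 3", "getAmount\ngetRisk");
        System.out.println(r.findAccessors("1", "2")); // [getAmount]
    }
}
```

Whether the loading happens in a constructor or in a method you call explicitly, it has to run on the same ReadEdges instance before findAccessors() is called.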
I am writing a program to merge sort words in a string.
But when I run my code, it seems to lose some data in some places.
I tested it with the string: "hello world the cat sat on the bloody mat"
but all I get back is [bloody, cat, hello, mat]
Here is my code:
package mergeSort;
import java.util.LinkedList;
public class mergeSort
{
public static String sort(String userInput)
{
if (userInput == null)
{
return "";
}
LinkedList<String> input = toList(userInput);
String output = MergeSort(input).toString();
return output;
}
private static LinkedList<String> toList(String input)
{
LinkedList<String> output = new LinkedList<String>();
String[] array = input.split("\\s");
for (String element : array)
{
output.addFirst(element);
}
return (output);
}
private static LinkedList<String> MergeSort(LinkedList<String> inputstring)
{
LinkedList<String> sequence1 = new LinkedList<String>();
LinkedList<String> sequence2 = new LinkedList<String>();
if (inputstring.size() <= 1)
{
return inputstring;
}
for (int index = 0; index <= (inputstring.size() / 2); index++)
{
sequence1.addLast(inputstring.removeFirst());
}
while (!(inputstring.isEmpty()))
{
sequence2.addLast(inputstring.removeFirst());
}
sequence1 = MergeSort(sequence1);
sequence2 = MergeSort(sequence2);
return merge(sequence1, sequence2);
}
private static LinkedList<String> merge(LinkedList<String> sequence1,
LinkedList<String> sequence2)
{
LinkedList<String> merged = new LinkedList<String>();
while (!(sequence1.isEmpty()) && !(sequence2.isEmpty()))
{
if (sequence1.peekFirst().compareTo(sequence2.peekFirst()) < 0)
{
merged.addLast(sequence1.removeFirst());
}
else
{
merged.addLast(sequence2.removeFirst());
}
}
while (!(sequence1.isEmpty()))
{
merged.addLast(sequence1.removeFirst());
}
while (!(sequence1.isEmpty()))
{
merged.addLast(sequence2.removeFirst());
}
return (merged);
}
}
The testerclass:
package mergeSort;
public class mainTester
{
public static void main(String[] args)
{
String test = "hello world the cat sat on the bloody mat";
System.out.println(mergeSort.sort(test));
System.exit(0);
}
}
The problem is here:
while (!(sequence1.isEmpty()))
{
merged.addLast(sequence1.removeFirst());
}
while (!(sequence1.isEmpty()))
{
merged.addLast(sequence2.removeFirst());
}
It is in your merge function: both loops check sequence1 for emptiness. Replace sequence1 with sequence2 in the second loop's condition, and the missing words will come back.
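For reference, here is a minimal self-contained sketch of the sort with the drain loops fixed (the class name MergeSortWords is hypothetical). It also computes the split point once before removing elements, rather than re-reading the list size in the loop condition while the list shrinks; the question's loop happens to terminate, but a fixed count is easier to reason about. It splits on \\s+ and keeps the original word order instead of reversing it with addFirst.

```java
import java.util.Arrays;
import java.util.LinkedList;

public class MergeSortWords {
    // Split the list in half, sort each half recursively, then merge the halves
    static LinkedList<String> mergeSort(LinkedList<String> input) {
        if (input.size() <= 1) {
            return input;
        }
        LinkedList<String> left = new LinkedList<>();
        LinkedList<String> right = new LinkedList<>();
        int half = input.size() / 2; // computed once, before the list shrinks
        for (int i = 0; i < half; i++) {
            left.addLast(input.removeFirst());
        }
        right.addAll(input); // the remaining elements form the right half
        return merge(mergeSort(left), mergeSort(right));
    }

    static LinkedList<String> merge(LinkedList<String> a, LinkedList<String> b) {
        LinkedList<String> merged = new LinkedList<>();
        while (!a.isEmpty() && !b.isEmpty()) {
            // <= keeps equal words in their original relative order
            merged.addLast(a.peekFirst().compareTo(b.peekFirst()) <= 0 ? a.removeFirst() : b.removeFirst());
        }
        // Drain each list, checking that same list for emptiness
        while (!a.isEmpty()) merged.addLast(a.removeFirst());
        while (!b.isEmpty()) merged.addLast(b.removeFirst());
        return merged;
    }

    public static void main(String[] args) {
        String test = "hello world the cat sat on the bloody mat";
        LinkedList<String> words = new LinkedList<>(Arrays.asList(test.split("\\s+")));
        System.out.println(mergeSort(words)); // [bloody, cat, hello, mat, on, sat, the, the, world]
    }
}
```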
Summary:
My Jsoup parser works perfectly on its own, but fails to gather any values once copy-pasted into one of my Android application's AsyncTask classes. The 2D array it returns is filled with nothing but nulls.
Long version:
I have been working on an application that uses page scraping via Jsoup to pull and display content from various blogs. I have written a few parsers so far, and all of them work as expected. Unfortunately, my most recent parser (written for nyc-shows.brooklynvegan.com) has been having issues.
Here is the parser method itself, invoked by a main method with print statements added. Run this yourself. It works (not perfectly, but it works).
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Main {
static String TAG_EVENT = "li.ds-entry";
static String TAG_TITLE = ".ds-entry-title";
static String TAG_LOCATION = ".location";
static String TAG_DATE_AND_TIME = ".ds-date";
static String TAG_TICKET_URL = ".ds-buy-tickets";
static String FEED_URL = "http://nyc-shows.brooklynvegan.com/";
public static void main(String[] args) throws IOException {
String values[][] = new String[50][6];
values = getFeedItems();
for (int i=0; i<values.length; i++) {
for (int j=0; j<6; j++) {
System.out.println(values[i][j]);
}
System.out.println("-----------------");
}
}
public static String[][] getFeedItems() throws IOException {
Document doc = null;
String values[][] = new String[50][6];
try{
doc = Jsoup.connect(FEED_URL).timeout(0).get();
Elements events = doc.select(TAG_EVENT);
String delimSpace = "[ ]";
int i = 0;
for (Element event : events) {
//Set event title
Element title = event.select(TAG_TITLE).first();
String titleString = title.text();
if (title != null) {
boolean isFake = checkFake(titleString);
if (!isFake) {
values[i][0] = titleString;
}
else {
continue;
}
}
//Set event date and time i guess
Element dateAndTime = event.select(TAG_DATE_AND_TIME).first();
if (dateAndTime != null) {
String[] dateAndTimeTokens = dateAndTime.text().split(delimSpace);
String date = dateAndTimeTokens[1];
String time = dateAndTimeTokens[3];
values[i][1] = date;
values[i][2] = time;
}
//Set price (tbd)
values[i][3] = "See Ticket";
//Set location
Element location = event.select(TAG_LOCATION).first();
if (location != null) {
values[i][4] = location.text();
}
//Set ticket urls
Element ticketContainer = event.select(TAG_TICKET_URL).first();
if (ticketContainer != null) {
String ticket = ticketContainer.select("a").attr("href");
values[i][5] = ticket;
}
else {
values[i][3] = "Free";
}
i++;
} //End of event loop
} //End of try clause
catch (IOException e) {
e.printStackTrace();
}
return values;
}
public static boolean checkFake(String s) {
boolean isFake = false;
String[] days = {"Today", "Tomorrow", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"};
for (int i=0; i<days.length; i++) {
if (s.contains(days[i])) {
isFake = true;
return isFake;
}
}
return isFake;
}
}
Now, here is the exact same method moved into an AsyncTask, which my application runs in the background while a loading screen is displayed.
package com.example.nylist;
import java.io.IOException;
import android.app.Activity;
import android.content.Context;
import android.os.AsyncTask;
import android.util.Log;
import android.widget.Toast;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class BVParser extends AsyncTask<Void, Void, String[][]> {
static String TAG_EVENT = "li.ds-entry";
static String TAG_TITLE = ".ds-entry-title";
static String TAG_LOCATION = ".location";
static String TAG_DATE_AND_TIME = ".ds-date";
static String TAG_TICKET_URL = ".ds-buy-tickets";
static String FEED_URL = "http://nyc-shows.brooklynvegan.com/";
Context context;
Activity activity;
public BVParser(Activity context) {
this.context = context.getApplicationContext();
this.activity = context;
}
@Override
protected void onPreExecute() {
super.onPreExecute();
Toast.makeText(context, "Fetching...", Toast.LENGTH_LONG).show();
}
@Override
protected String[][] doInBackground(Void... param) {
String values[][] = new String[50][6];
try {
values = getFeedItems();
}
catch (IOException e) {
Log.d("ASSERT", "Exception occured during doInBackground", e);
e.printStackTrace();
}
Log.d("ASSERT", ("values successfully returned by doInBackground, first title is: "+values[0][0]));
return values;
}
protected void onPostExecute(String[][] result) {
super.onPostExecute(result);
int eventCount = result.length;
Log.d("ASSERT", ("event count in onPostExecute is: "+eventCount));
ListRow[] listrow_data = new ListRow[eventCount];
ListRow temp;
for (int i=0; i<eventCount; i++) {
if (result[i] != null) {
temp = new ListRow(context, result[i][0], result[i][1], result[i][2],
result[i][3], result[i][4], result[i][5], i);
listrow_data[i] = temp;
}
}
((EventList) activity).setList(listrow_data);
}
public String[][] getFeedItems() throws IOException {
Document doc = null;
String values[][] = new String[50][6];
int i = 0;
try{
Log.d("ASSERT","Made it to try block");
doc = Jsoup.connect(FEED_URL).timeout(0).get();
Elements events = doc.select(TAG_EVENT);
Log.d("ASSERT","printing events, whatever it is: "+events);
String delimSpace = "[ ]";
//******THIS LOOP NEVER BEGINS*****//
for (Element event : events) {
Log.d("ASSERT","Made it to getFeedItems's main for loop");
//Set event title
Element title = event.select(TAG_TITLE).first();
String titleString = title.text();
Log.d("ASSERT","This title is: "+titleString);
boolean isFake = checkFake(titleString);
if (!isFake) {
values[i][0] = titleString;
}
else {
continue;
}
//Set event date and time i guess
Element dateAndTime = event.select(TAG_DATE_AND_TIME).first();
if (dateAndTime != null) {
String[] dateAndTimeTokens = dateAndTime.text().split(delimSpace);
String date = dateAndTimeTokens[1];
String time = dateAndTimeTokens[3];
values[i][1] = date;
values[i][2] = time;
}
//Set price
values[i][3] = "See Ticket";
//Set location
Element location = event.select(TAG_LOCATION).first();
if (location != null) {
values[i][4] = location.text();
}
//Set ticket urls
Element ticketContainer = event.select(TAG_TICKET_URL).first();
if (ticketContainer != null) {
String ticket = ticketContainer.select("a").attr("href");
values[i][5] = ticket;
}
else {
values[i][3] = "Free";
}
i++;
} //End of event loop
} //End of try clause
catch (IOException e) {
Log.d("ASSERT","Exception during getFeedItems");
e.printStackTrace();
}
Log.d("ASSERT","The first title in getFeedItems before returning is: "+values[0][0]);
return values;
}
private static boolean checkFake(String s) {
boolean isFake = false;
String[] days = {"Today", "Tomorrow", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"};
for (int i=0; i<days.length; i++) {
if (s.contains(days[i])) {
isFake = true;
return isFake;
}
}
return isFake;
}
}
Debugging Attempts:
I added log statements throughout the code to debug the problem. If you run this, you will see that the problem occurs somewhere inside getFeedItems() itself, specifically within the try block. Although the log statement at the beginning of the try block appears, the for loop over events never runs, because the log statement at its beginning never prints.
Question:
Can someone explain why the loop through events doesn't begin? Is events null, and if so, why? Why does the method behave differently on its own than within my AsyncTask? I have been tearing my hair out. The logic in this parser is almost identical to the logic in the (working) others that I have written, and yet this one returns a 2D array with nothing but nulls. I have trouble even beginning to understand where the logic error might be, and I just can't seem to find the typo.
PS:
If comparing this with my other parser would help, let me know and I'll post the source. Thanks in advance.