I am currently working on a Java program that crawls a webpage and prints out some information from it.
There is one part that I can't figure out: when I try to print one specific String array, all I get for that line is " ] ". However, a few lines earlier I print another String array in exactly the same way and it prints fine. When I test what is actually being assigned to the "categories" variable, it's the correct information and it can be printed there.
public class Crawler {
    private Document htmlDocument;
    String[] keywords, categories;
    public void printData(String urlToCrawl)
    {
        nextURL = urlToCrawl;
        crawl();
        // This does what it's supposed to do. (Print Statement 1)
        System.out.print("Keywords: ");
        for (String i : keywords) { System.out.print(i + ", "); }
        // This doesn't. (Print Statement 2)
        System.out.print("Categories: ");
        for (String b : categories) { System.out.print(b + ", "); }
    }
    public void crawl()
    {
        // Gather data
        // Open up JSOUP for HTTP parsing.
        Connection connection = Jsoup.connect(nextURL).userAgent(USER_AGENT);
        Document htmlDocument = connection.get();
        this.htmlDocument = htmlDocument;
        System.out.println("Received Webpage " + nextURL);
        int guacCounter = 0;
        for (Element guac : htmlDocument.select("script"))
        {
            if (guacCounter == 5)
            {
                //String concentratedGuac = guac.toString();
                String[] items = guac.toString().split("\\n");
                categories = processGuac(items);
                break;
            }
            else if (guacCounter < 5) {
                guacCounter++;
            }
        }
    }
    public String[] processKeywords(String totalKeywords)
    {
        String[] separatedKeywords = totalKeywords.split(",");
        //System.out.println(separatedKeywords.toString());
        return separatedKeywords;
    }
    public String[] processGuac(String[] inputGuac)
    {
        int categoryIsOnLine = 6;
        String categoryData = inputGuac[categoryIsOnLine - 1];
        categoryData = categoryData.replace(",", "");
        categoryData = categoryData.replace("'", "");
        categoryData = categoryData.replace("|", ",");
        categoryData = categoryData.split(":")[1];
        // This prints out the list of categories in string form. (Print Statement 3)
        System.out.println("Testing here: " + categoryData.toString());
        String[] categoryList = categoryData.split(",");
        // This prints out the list of categories in array form correctly. (Print Statement 4)
        System.out.println("Testing here too: ");
        for (String a : categoryList) { System.out.println(a); }
        return categoryList;
    }
}
I cut out a lot of the irrelevant parts of my code so there might be some missing variables.
Here is what my printouts look like:
PS1:
Keywords: What makes a good friend, making friends, signs of a good friend, supporting friends, conflict management,
PS2:
]
PS3:
Testing here: wellbeing,friends-and-family,friendships
PS4:
Testing here too:
wellbeing
friends-and-family
friendships
I have the following code which reads a sentence from the keyboard and gives me the departure and the arrival. The departure is identified by the preposition "da" (from), while the arrival is identified by the preposition "a" (to). What I'm struggling with now is this: I want to get the departure and the arrival even if I change the order of the prepositions. Do you know how I could proceed?
this is the OUTPUT I get :
I want to go from ostuni to trapani
Partenza :ostuni
Arrivo :trapani
but if I write it like this:
I want to go to ostuni from trapani
I would like the departure and arrival to still be printed correctly, that is:
Partenza :trapani
Arrivo :ostuni
Is this kind of processing possible?
Thanks a lot for your attention! Good day.
package eustema.eubot.controller;
import eustema.eubot.intent.Intent;
public class EubotEngine {
    public Intent getIntent(String stringInput) {
        String str1 = "";
        String str2 = "";
        Intent dictionary = null;
        for (String str3 : Intent.keyWord) {
            if (stringInput.contains(str3)) {
                //System.out.println("The string contains: " + str3);
                int indice1 = stringInput.indexOf(str3) + str3.length();
                String splittable =
                        stringInput.substring(indice1, stringInput.length()).trim();
                String splittable2[] = splittable.split(" ");
                int index = 0;
                for (String str : splittable2) {
                    str = splittable2[index + 1];
                    str1 = str;
                    System.out.println("Partenza :" + str1);
                    break;
                }
                String splittable3[] = splittable.split(" ");
                for (String str : splittable3) {
                    str = splittable3[index + 3];
                    str2 = str;
                    System.out.println("Arrivo :" + str2);
                    break;
                }
                index++;
                dictionary = new Intent();
                dictionary.setTesto(stringInput);
            }
        }
        return dictionary;
    }
}
package eustema.eubot.intent;
public class Intent {
    public String testo;
    public String getTesto() {
        return testo;
    }
    public void setTesto(String testo) {
        this.testo = testo;
    }
    public static String[] keyWord = { "devo andare", "voglio andare", "vorrei andare", "devo recarmi" };
    public static String[] parameter = { "bari", "roma", "milano", "pisa", "firenze", "napoli", "como", "torino" };
}
package eustema.eubot.main;
import java.util.Scanner;
import eustema.eubot.controller.*;
import eustema.eubot.intent.*;
public class Test {
    public static void main(String[] args) {
        System.out.println("<<-|-|-|-|-|-|-|-|-|<<<BENVENUTO IN EuBoT>>>|-|-|-|-|-|-|-|-|->>");
        EubotEngine controller = new EubotEngine();
        Scanner input = new Scanner(System.in);
        String string;
        while (true) {
            string = input.nextLine();
            Intent intent = controller.getIntent(string);
        }
    }
}
I know this will not be considered a good answer:)
This is non-trivial to solve by means of imperative programming. The reason is there are many forms in which one can express the same intent. Things like filler words, synonyms, inversions and in general things you did not think about could disrupt your algorithm.
Of course it depends on the level of accuracy you want to achieve. If you are happy that this will not work for all cases, you could always put in conditions like:
if ("from".equals(arr[index-1])) setStart(arr[index]);
if ("to".equals(arr[index-1])) setDestination(arr[index]);
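The conditions above can be sketched as a small, self-contained class. This is only an illustration under stated assumptions: the class name RouteParser and the method parse are hypothetical, it keys on the Italian prepositions "da" and "a" from the question, and it assumes each city is a single word that directly follows its preposition.

```java
class RouteParser {
    /**
     * Returns {start, destination}. An entry stays null when the
     * corresponding preposition does not occur in the sentence.
     */
    static String[] parse(String sentence) {
        String[] words = sentence.trim().toLowerCase().split("\\s+");
        String start = null;
        String destination = null;
        // Look at each word together with the word that precedes it.
        for (int i = 1; i < words.length; i++) {
            String previous = words[i - 1];
            if ("da".equals(previous)) {
                start = words[i];        // "da <city>" marks the departure
            } else if ("a".equals(previous)) {
                destination = words[i];  // "a <city>" marks the arrival
            }
        }
        return new String[] { start, destination };
    }

    public static void main(String[] args) {
        String[] route = parse("voglio andare a trapani da ostuni");
        System.out.println("Partenza :" + route[0]);
        System.out.println("Arrivo :" + route[1]);
    }
}
```

Because the lookup depends on the preposition rather than on a fixed word position, swapping the two halves of the sentence gives the same result. It will still fail on multi-word city names or filler words between the preposition and the city.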
Google, Amazon and Apple are battling to improve this sort of human-computer interaction, but they are using a more mathematical/statistical approach through machine learning.
So, if you're looking for state of the art:
Main search terms: context-free grammars.
Other key words: Markov models, Information extraction, vector space models, tf-idf
I am working on a school assignment that required us to use SQL statements in Java code as well as use the LIKE operator for a search. In order to properly search I have to get a string from the user, and split the string by any delimiter, and then run the query like so:
SELECT * FROM movies WHERE (movies.title LIKE '%userInput%');
I then return this query in the form of an ArrayList.
When I first tested it with no user input, my query became: SELECT * FROM movies WHERE (movies.title LIKE '%%');. This gave me the correct results.
However, when I put a title in there, all of a sudden I get a NullPointerException on this line:
if(title.equals("")) { return "(movies.title LIKE '%%') "; } from this section of my code:
public String getSearchString(String title) {
    if(title.equals("")) { return "(movies.title LIKE '%%') "; }
    String ret = "(";
    ArrayList<String> titleArray = Util.splitSearch(title);
    for(int i = 0; i < titleArray.size() - 1; ++i) {
        String temp = titleArray.get(i);
        String stmt = "movies.title LIKE '%" + temp + "%' OR ";
        ret += stmt;
    }
    String temp = "movies.title LIKE '%" + titleArray.get(titleArray.size() - 1) + "%')";
    ret += temp;
    return ret;
}
This is then called like so:
public List<Movie> listMovies(String title) throws SQLException {
    List<Movie> search = new ArrayList<Movie>();
    if(null != title && title.isEmpty()) { title = ""; }
    ResultSet res = queryMovies(getSearchString(title));
    while(res.next()) {
        Movie mov = new Movie();
        mov.setTitle(res.getString("title"));
        search.add(mov);
    }
    return search;
}
private static ResultSet queryMovies(String st) throws SQLException {
    ResultSet res = null;
    try {
        PreparedStatement ps = dbcon.prepareStatement(st);
        res = ps.executeQuery();
    } catch(SQLException e) {
        e.printStackTrace();
    }
    return res;
}
I unfortunately have to do it this way since I won't know how much a user will enter, and I am not allowed to use external libraries that make the formatting easier. For reference, my Util.splitSearch(...) method looks like this. It should keep anything that is an alphanumeric character and split on anything that is not alphanumeric:
public static ArrayList<String> splitSearch(String str) {
    String[] strArray = str.split("[^a-zA-Z0-9']");
    return new ArrayList<>(Arrays.asList(strArray));
}
What is interesting is that when I pass in getSearchString(""); explicitly, I do not get a NullPointerException. I only get one when I allow the variable title to be used, and I still get one when no string is entered.
Am I splitting the String wrong? Am I somehow giving SQL the wrong statement? Any help would be appreciated, as I am very new to this.
The "title" that is passed in from the input is null, so you get a NullPointerException when you call title.equals("").
Best practice is to do a null check first, such as (null != title && title.equals("")).
You can also write "".equals(title), which never throws because the method is called on the literal.
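As a minimal sketch of the null-safe check, assuming the goal is to treat a null title the same as an empty one: the class name TitleNormalizer and both method names are hypothetical. Note that the guard in listMovies, (null != title && title.isEmpty()), only fires when the title is already empty, so a null title still reaches getSearchString unchanged.

```java
class TitleNormalizer {
    /** Returns an empty string for null input, otherwise the title unchanged. */
    static String normalize(String title) {
        return title == null ? "" : title;
    }

    /** Null-safe emptiness test: equals is called on the literal, never on title. */
    static boolean isEmptyTitle(String title) {
        return "".equals(normalize(title));
    }

    public static void main(String[] args) {
        System.out.println(isEmptyTitle(null));    // true
        System.out.println(isEmptyTitle(""));      // true
        System.out.println(isEmptyTitle("Alien")); // false
    }
}
```

Calling something like normalize(title) before getSearchString(title) would let the existing empty-string branch handle the missing-input case.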
I am new to MVC and, following this link, I have built a search page that returns PDF metadata using Solr. The if statement and for loop on the HTML side do not work as expected.
Searching.java in models folder:
public class Searching {
public String q;
public String outputTitle;
public String outputAuthor;
public String outputContent;
public String outputPage;
public String outputPath;
}
search function in Application.java:
final static Form<Searching> searchForm = form(Searching.class);
final static List<Searching> searchList = new ArrayList<Searching>();
public static Result search() {
    Form<Searching> filledForm = searchForm.bindFromRequest();
    Searching searched = filledForm.get();
    ....(database connection lines)
    QueryResponse response = solr.query(query);
    SolrDocumentList results = response.getResults();
    if(results.isEmpty())
        System.out.println("SEARCH NOT FOUND");
    else {
        for (int i = 0; i < results.size(); ++i) {
            searched.outputTitle = (String)results.get(i).getFirstValue("title");
            searched.outputAuthor = (String)results.get(i).getFirstValue("author");
            searched.outputPage = results.get(i).getFirstValue("pageNumber").toString();
            searched.outputContent = (String)results.get(i).getFirstValue("content");
            searched.outputPath = (String)results.get(i).getFirstValue("path");
            searchList.add(searched);
        }
        System.out.println("\nresults.getNumFound(): "+ searched.outputFound);
        System.out.println("results.size(): "+results.size());
    }
    return play.mvc.Results.ok(search.render(searched, searchForm, searchList));
}
search.scala.html
@(searched: Searching, searchForm: Form[Searching], searchList: List[Searching])
.. some buttons,a search bar...
@if(searchList.isEmpty()) {
    <h1>Error</h1>
} else {
    @for(search <- searchList) {
        <ul>Title: @search.outputTitle</ul>
        <ul>Author: @search.outputAuthor <a href="@search.outputPath" download>Download PDF</a></ul>
        <ul>Number of Page(s): @search.outputPage</ul>
    }
}
The Java code works well and I can see the output on the terminal, but the HTML side has a problem: it shows one book many times, once for each entry in searchList.
I am posting the answer explicitly even though I was able to help the OP in the chat. Maybe somebody else will run into this problem without checking the chat.
The problem is that, even though you assign new values inside the for loop, you are still using the same searched object on every iteration, so every entry in the list refers to that one object. What you have to do is create a new object when entering the loop. Something like:
for (...) {
    searched = new Searching();
    searched.outputTitle = (String)results.get(i).getFirstValue("title");
    ....
    searchList.add(searched);
}
This solves the problem with the duplicates and everything is fine now.
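The difference can be reproduced without Solr or Play. The sketch below is an illustration only: SearchResultDemo and its two methods are hypothetical names, and the nested Searching class is a one-field stand-in for the real model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SearchResultDemo {
    // Stand-in for the Searching model; only one field is kept for brevity.
    static class Searching {
        String outputTitle;
    }

    /** Creates a fresh object per title, so every list entry is distinct. */
    static List<Searching> toResults(List<String> titles) {
        List<Searching> searchList = new ArrayList<>();
        for (String title : titles) {
            Searching searched = new Searching();
            searched.outputTitle = title;
            searchList.add(searched);
        }
        return searchList;
    }

    /** Reproduces the bug: one shared object is mutated and added repeatedly. */
    static List<Searching> toResultsShared(List<String> titles) {
        List<Searching> searchList = new ArrayList<>();
        Searching searched = new Searching();
        for (String title : titles) {
            searched.outputTitle = title; // overwrites the value seen by all entries
            searchList.add(searched);
        }
        return searchList;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("A", "B", "C");
        System.out.println(toResults(titles).get(0).outputTitle);       // A
        System.out.println(toResultsShared(titles).get(0).outputTitle); // C
    }
}
```

In the code from the question, searchList is also a static field that is never cleared, so entries from earlier requests may accumulate; building the list locally inside search() would avoid that as well.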
I have trouble splitting a name by a space, and I can't seem to figure out why. Could someone please provide me with a solution?
My code is like this:
public void getPlayerNames(int id){
    try {
        Document root = Jsoup.connect("http://www.altomfotball.no/element.do?cmd=team&teamId=" + id).get();
        Element table = root.getElementById("sd_players_table");
        Elements names = table.getElementsByTag("a");
        for(Element name : names){
            getPlayers().add(new Player(name.text()));
        }
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
which returns the names of football players as strings. The names come back as Mario Balotelli, Steven Gerrard, and so on, so I assumed I could use string.split(" "); to get the first and last names. But whenever I try to access the second element of the resulting array, I get an index out of bounds exception. Here is the code that fetches the first name:
/**
 * Method to get the first name of a player
 */
public static String getFirstName(String name){
    String[] nameArray = name.split(" ");
    return nameArray[0];
}
Thanks for answers!
Sindre M
EDIT ######
So I got it to work, but thanks for the effort. The problem was that, even though I could not see it in a simple sysout statement, the names actually contained a "&nbsp;" (non-breaking space) character, so I solved it by running replaceAll("&nbsp;", " ") on the names for better formatting.
If you're trying to write a screen-scraper you need to be more defensive in your code... Definitely test the length of the array first and log any unexpected inputs so you can incorporate them later...
public static String getFirstName(String name) {
    String[] nameArray = name.split(" ");
    if (nameArray.length >= 1) { // <== check length before you access nameArray[0]
        return nameArray[0];
    } else {
        // log error
    }
    return null;
}
Additionally java.util.Optional in Java 8 provides a great alternative to returning null...
public static Optional<String> getFirstName(String name) {
    String[] nameArray = name.split(" ");
    if (nameArray.length >= 1) {
        return Optional.of(nameArray[0]);
    } else {
        // log error
    }
    return Optional.empty();
}
You might be getting &nbsp; in the actual string, since you are retrieving it from an HTML page. Try to debug and check.
package com.appkart.examples;
public class SplitProgram {
    public void firstNameArray(String nameString) {
        String strArr[] = nameString.split(",");
        for (String name : strArr) {
            String playerName = name.trim();
            String firstName = playerName.substring(0, playerName.indexOf(" "));
            System.out.println(firstName);
        }
    }
    public static void main(String[] args) {
        String nameString = "Mario Balotelli, Steven Gerrard";
        SplitProgram program = new SplitProgram();
        program.firstNameArray(nameString);
    }
}
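If the scraped names do contain non-breaking spaces, one way to handle them is to normalize the character U+00A0 to an ordinary space before splitting. This is a sketch under that assumption, with NameSplitter and firstName as hypothetical names. In Java, the regex class \s does not match U+00A0 by default, so splitting on whitespace alone would not separate such names.

```java
class NameSplitter {
    /**
     * Returns the first whitespace-separated token of the name,
     * or an empty string when there is nothing to return.
     */
    static String firstName(String name) {
        if (name == null) {
            return "";
        }
        // Turn non-breaking spaces (U+00A0) into ordinary spaces first.
        String normalized = name.replace('\u00A0', ' ').trim();
        if (normalized.isEmpty()) {
            return "";
        }
        return normalized.split("\\s+")[0];
    }

    public static void main(String[] args) {
        System.out.println(firstName("Mario\u00A0Balotelli")); // Mario
        System.out.println(firstName("Steven Gerrard"));       // Steven
    }
}
```

Returning an empty string for missing input is a design choice made for this sketch; returning an Optional works just as well.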
I think that the correct answer should be:
String[] nameArray = name.split("\\s+");
But to be honest, there are already a couple of answers to this on Stack Overflow.
E.g.
How to split a String by space
How do I split a string with any whitespace chars as delimiters?
First try to replace white space as
string.replace(" ","");
then try to split with [,] as
String strAr[] = string.split(",");
My code is to add RSS feeds to a list - and the code originally was only to pull one feed from the first position in a list, and add this object to another list.
This was the original code:
public static List<Feed> getFeedsFromXml(String xml) {
    Pattern feedPattern = Pattern.compile("<feed>\\s*<name>\\s*([^<]*)</name>\\s*<uri>\\s*([^<]*)</uri>\\s*</feed>");
    Matcher feedMatch = feedPattern.matcher(xml);
    while (feedMatch.find()) {
        String feedName = feedMatch.group(1);
        String feedURI = feedMatch.group(2);
        feeds.add(new Feed(feedName, feedURI));
    }
    return feeds;
}
@POST
@Consumes(MediaType.APPLICATION_XML)
@Produces(MediaType.APPLICATION_XML)
public String addXmlFeed() throws IOException
{
    int i = 0;
    String stringXml = "<feed><name>SMH Top Headlines</name><uri>http://feeds.smh.com.au/rssheadlines/top.xml</uri></feed><feed><name>UTS Library News</name>";
    getFeedsFromXml(stringXml);
    Feed f = (Feed) feeds.get(0);
    feedList.add(f);
    String handler = "You have successfully added: \n";
    String xmlStringReply = "" + f + "\n";
    feedList.save(feedFile);
    return handler + xmlStringReply;
}
Everything was going well, and then I decided to implement a for loop for handling the adding of more than one feed to the list, and I tried the following (only the code for the second method in question):
@POST
@Consumes(MediaType.APPLICATION_XML)
@Produces(MediaType.APPLICATION_XML)
public String addXmlFeed() throws IOException
{
    int i = 0;
    String stringXml = "<feed><name>SMH Top Headlines</name><uri>http://feeds.smh.com.au/rssheadlines/top.xml</uri></feed><feed><name>UTS Library News</name>";
    getFeedsFromXml(stringXml);
    for (Feed feed: feeds)
    {
        Feed f = (Feed) feeds.get(i++);
        feedList.add(f);
        String handler = "You have successfully added: \n";
        String xmlStringReply = "" + f + "\n";
    }
    feedList.save(feedFile);
    return handler + xmlStringReply;
}
Now I'm sure this is a basic problem, but now in the line:
return handler + xmlStringReply;
handler and xmlStringReply cannot be resolved to a variable as they are within the FOR LOOP.
Is there any easy way around this?
The scope of those 2 variables is limited to the for loop. To access them outside the loop, you need to increase their scope by declaring them before the loop:
String handler = "";
String xmlStringReply = "";
for (Feed f: feeds) {
    feedList.add(f);
    handler = "You have successfully added: \n";
    xmlStringReply = "" + f + "\n";
}
feedList.save(feedFile);
return handler + xmlStringReply;
Also, your current code overwrites the value of your strings at each loop, whereas you probably meant to concatenate the values. In that case, you could use a StringBuilder instead of string concatenation:
StringBuilder xmlStringReply = new StringBuilder("You have successfully added: \n");
for (Feed f: feeds) {
    feedList.add(f);
    xmlStringReply.append(f + "\n");
}
feedList.save(feedFile);
return xmlStringReply.toString();
The question you need to answer is "what do I want to return if I add several feeds?".
Maybe you'd like to return "You have successfully added: feed1 feed2 feed3\n"
In that case, the code is:
StringBuilder response = new StringBuilder("You have successfully added: ");
for (Feed feed: feeds)
{
    feedList.add(feed);
    response.append(feed.toString()).append(" ");
}
feedList.save(feedFile);
return response.toString();
By the way, your feed and f variables refer to the same object, so one of them is redundant!
Don't write:
int i = 0;
for (Feed feed: feeds)
{
    Feed f = (Feed) feeds.get(i++);
    feedList.add(f);
}
but
for (Feed feed: feeds)
{
    feedList.add(feed);
}
You need to accumulate the result into a variable. I am using StringBuilder because it makes string concatenation efficient.
@POST
@Consumes(MediaType.APPLICATION_XML)
@Produces(MediaType.APPLICATION_XML)
public String addXmlFeed() throws IOException
{
    String stringXml = "<feed><name>SMH Top Headlines</name><uri>http://feeds.smh.com.au/rssheadlines/top.xml</uri></feed><feed><name>UTS Library News</name>";
    getFeedsFromXml(stringXml);
    StringBuilder replyBuilder = new StringBuilder("You have successfully added: \n");
    for (Feed feed : feeds)
    {
        feedList.add(feed);
        String xmlStringReply = feed + "\n";
        replyBuilder.append(xmlStringReply);
    }
    feedList.save(feedFile);
    return replyBuilder.toString();
}
Because they are now out of scope.
Besides the original error, which you can easily fix using the other suggestions, I would suggest that you not make feeds an instance variable. I can see that your method getFeedsFromXml() already returns the list, so I think it would be better to define that variable inside the method and then call the method like this:
List<Feed> feeds = getFeedsFromXml(stringXml);
Or, if this doesn't give you the desired behaviour, you should rename the method to something like loadFeedsFromXml(). Keeping feeds as an instance variable may result in threading issues.
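A sketch of the first suggestion, with the list kept local to the method, might look like the following. The regular expression is the one from the question; the FeedParser class and the minimal Feed class with name and uri fields are assumptions made so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FeedParser {
    // Minimal stand-in for the Feed class used in the question.
    static final class Feed {
        final String name;
        final String uri;

        Feed(String name, String uri) {
            this.name = name;
            this.uri = uri;
        }
    }

    private static final Pattern FEED_PATTERN = Pattern.compile(
            "<feed>\\s*<name>\\s*([^<]*)</name>\\s*<uri>\\s*([^<]*)</uri>\\s*</feed>");

    /** Parses all complete feed elements into a new list on each call. */
    static List<Feed> getFeedsFromXml(String xml) {
        List<Feed> feeds = new ArrayList<>(); // local, so no state is shared between calls
        Matcher feedMatch = FEED_PATTERN.matcher(xml);
        while (feedMatch.find()) {
            feeds.add(new Feed(feedMatch.group(1), feedMatch.group(2)));
        }
        return feeds;
    }

    public static void main(String[] args) {
        String xml = "<feed><name>SMH Top Headlines</name>"
                + "<uri>http://feeds.smh.com.au/rssheadlines/top.xml</uri></feed>";
        for (Feed feed : getFeedsFromXml(xml)) {
            System.out.println(feed.name + " -> " + feed.uri);
        }
    }
}
```

Because every call builds its own list, two requests handled at the same time cannot see each other's feeds, and calling the method twice does not add the same feeds again.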
Now, trying to improve on your looping,
StringBuilder xmlStringReply = new StringBuilder("You have successfully added: \n");
for (Feed feed: feeds) {
    feedList.add(feed);
    xmlStringReply.append(feed + "\n");
}
feedList.save(feedFile);
return xmlStringReply.toString();
Moreover, I see that your feedList is also an instance variable. This can again cause threading issues, as it doesn't sound immutable or stateless, and synchronising the methods will cost you performance. See if you can make it local to this method. A rule of thumb is to keep variable scope as narrow as possible.
A good rule of thumb is to view scope like this:
{ // think of the opening curly as a constructor
    int i;
} // and of the closing curly as a destructor
Any variable declared between the curlies is only accessible inside the curlies. Keep this in mind whenever you're working with variables and loops:
for(int i = 0; i < 10; i++){
    //some code here
} // after this curly i is no longer in scope or accessible.
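Applied to the feed example, the rule means the accumulator has to be declared before the loop if it is needed after it. The following is a small runnable sketch that uses plain strings instead of Feed objects; ScopeDemo and buildReply are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

class ScopeDemo {
    static String buildReply(List<String> feedNames) {
        // Declared before the loop, so it is still in scope after the loop ends.
        StringBuilder reply = new StringBuilder("You have successfully added: \n");
        for (String name : feedNames) {
            String line = name + "\n"; // 'line' exists only inside this iteration
            reply.append(line);
        }
        // 'line' and 'name' are no longer accessible here, but 'reply' is.
        return reply.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildReply(Arrays.asList("SMH Top Headlines", "UTS Library News")));
    }
}
```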