can someone explain what this code is doing - java

I have two problems here:
The following block of code has me confused. First, I do not understand what the code is actually doing; I copied it from a tutorial, and it seems to do what I want. If anyone can explain it piece by piece, that would be really helpful.
The second problem is that I do not know why it throws an ArrayIndexOutOfBoundsException, perhaps because I do not understand the code. I really need clarification.
try {
Document searchLink = Jsoup.connect("https://www.google.com.ng/search?dcr=0&source=hp&ei=5-cIWuZ30cCwB7aUhrAN&q=" + URLEncoder.encode(searchValue, encoding))
.userAgent("Mozilla/5.0").get();
String websiteLink = searchLink.getElementsByTag("cite").get(0).text();
//we are setting the value for the action "titles" in the wikipedia API with our own article title
//we use the string method replaceAll() to remove the title of the article from the wikipedia URL that we generated from google
//
String wikiAPItoSearch = "https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles="
+ URLEncoder.encode(websiteLink.replaceAll("https://en.wikipedia.org/wiki/", ""),encoding);
System.out.println(wikiAPItoSearch);
//extraction of textfiles
//from this point till down i cant really grab what is happening
HttpURLConnection httpconn = (HttpURLConnection) new URL(wikiAPItoSearch).openConnection();
httpconn.addRequestProperty("userAgent", "Mozilla/5.0");
BufferedReader bf = new BufferedReader(new InputStreamReader(httpconn.getInputStream()));
//read line by line
String response = bf.lines().collect(Collectors.joining());
bf.close();
///it returns ArrayIndexOutOfBounds here
String result = response.split("\"extract\":\"")[1];
System.out.println(result);
} catch (IOException e) {
// TODO: handle exception
e.printStackTrace();
}

I don't think anyone will take the time to explain the code for you. A good opportunity for you to do some debugging.
ArrayIndexOutOfBounds comes from response.split("\"extract\":\"")[1]. There is no guarantee that the String response can be split into at least 2 parts.
Add a check to avoid the error. Instead of...
String result = response.split("\"extract\":\"")[1];
use...
String[] parts = response.split("\"extract\":\"");
String result;
if (parts.length >= 2) {
result = parts[1];
} else {
result = "Error..." + response; // a simple fallback
}
This is how split works:
String input = "one,two,three";
String[] parts = input.split(",");
System.out.println(parts[0]); // prints 'one'
System.out.println(parts[2]); // prints 'three'
So in your case, [1] means the second item in the parts array. "\"extract\":\"" has to appear at least once in the response; otherwise there will be only one item in the parts array, and you will get an error when you try to reach the second item (since it doesn't exist). Keep in mind that .split takes a regular expression. This particular pattern contains only quotes, letters and a colon, none of which are regex metacharacters, so it is matched literally; a delimiter containing characters such as . | ( or [ would need escaping, for example with Pattern.quote.
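To make the point about split concrete, here is a minimal sketch. The class name, the helper afterMarker and the two sample responses are made up for illustration; they are not the real Wikipedia output. It quotes the delimiter defensively and returns null when the marker is missing instead of indexing blindly.

```java
import java.util.regex.Pattern;

class ExtractSplitDemo {
    // Returns the text after the first occurrence of marker, or null if the marker is absent.
    static String afterMarker(String response, String marker) {
        // Pattern.quote makes split treat the marker literally;
        // the limit of 2 splits at the first occurrence only.
        String[] parts = response.split(Pattern.quote(marker), 2);
        return parts.length == 2 ? parts[1] : null;
    }

    public static void main(String[] args) {
        String marker = "\"extract\":\"";
        // Hypothetical responses, shortened for the example
        String withExtract = "{\"title\":\"Java\",\"extract\":\"Java is a language\"}";
        String withoutExtract = "{\"title\":\"Java\",\"revisions\":[]}";

        System.out.println(afterMarker(withExtract, marker));    // text after the marker
        System.out.println(afterMarker(withoutExtract, marker)); // null, no exception
    }
}
```

For anything beyond a quick experiment, a JSON parser would be a more robust way to read the field than splitting the raw text.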

Oops... I realized it was the API I was using that caused the error. The API URL I got from Wikimedia does not return an "extract" field, so there was nothing to split on. I checked other Stack Overflow posts for a cleaner API call, specifically one whose response contains "extract" so it can serve as the delimiter.
This is the new API URL I got:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=
This was the former one that caused the error:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=
I think the error came from my not understanding the process in depth. Thanks for the responses.
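For anyone following along, a small sketch of how the request URL could be built from the new endpoint. The class and method names are hypothetical; only the base URL comes from the post above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class WikiExtractUrl {
    // Base URL taken from the post; the title is appended after "titles="
    static final String BASE =
        "https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=";

    // Builds the request URL for one article title (hypothetical helper)
    static String forTitle(String title) {
        try {
            return BASE + URLEncoder.encode(title, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forTitle("Albert Einstein"));
    }
}
```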


Iterate through a dictionary array

I have a String array containing a poem which has deliberate spelling mistakes. I am trying to iterate through the String array to identify the spelling mistakes by comparing the String array to a String array containing a dictionary. If possible I would like a suggestion that allows me to continue using nested for loops
for (int i = 0; i < poem2.length; i++) {
boolean found = false;
for (int j = 0; j < dictionary3.length; j++) {
if (poem2[i].equals(dictionary3[j])) {
found = true;
break;
}
}
if (found==false) {
System.out.println(poem2[i]);
}
}
The output is printing out the correctly spelt words as well as the incorrectly spelt ones and I am aiming to only print out the incorrectly spelt ones. Here is how I populate the 'dictionary3' and 'poem2' arrays:
char[] buffer = null;
try {
BufferedReader br1 = new BufferedReader(new
java.io.FileReader(poem));
int bufferLength = (int) (new File(poem).length());
buffer = new char[bufferLength];
br1.read(buffer, 0, bufferLength);
br1.close();
} catch (IOException e) {
System.out.println(e.toString());
}
String text = new String(buffer);
String[] poem2 = text.split("\\s+");
char[] buffer2 = null;
try {
BufferedReader br2 = new BufferedReader(new java.io.FileReader(dictionary));
int bufferLength = (int) (new File(dictionary).length());
buffer2 = new char[bufferLength];
br2.read(buffer2, 0, bufferLength);
br2.close();
} catch (IOException e) {
System.out.println(e.toString());
}
String dictionary2 = new String(buffer);
String[] dictionary3 = dictionary2.split("\n");
Your basic problem is in line
String dictionary2 = new String(buffer);
where you were trying to convert the characters of the dictionary, which are stored in buffer2, but you used buffer (without the 2 suffix). This style of naming variables suggests that you need either a loop or, in this case, a separate method that returns the array of words held in a given file (you could also pass the delimiter to split on as a method parameter).
So your dictionary2 held the characters from buffer, which represent the poem, not the dictionary data.
Another problem is
String[] dictionary3 = dictionary2.split("\n");
because you are splitting only on \n, but some operating systems such as Windows use \r\n as the line separator. Your dictionary array may then contain words like foo\r instead of foo, which causes poem2[i].equals(dictionary3[j]) to always fail.
To avoid this problem you can split on \\R (available since Java 8) or \r?\n|\r.
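A quick way to see the difference, as a small sketch (the class name and the sample text are made up for the demonstration):

```java
import java.util.Arrays;

class LineSplitDemo {
    // Splits on any line break; \R (Java 8+) covers \n, \r\n and \r among others
    static String[] splitLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        String windowsText = "foo\r\nbar\r\nbaz";

        // Splitting on \n alone leaves a trailing \r on "foo" and "bar"
        String[] naive = windowsText.split("\n");
        System.out.println(naive[0].equals("foo"));   // false, the element is "foo\r"

        // Splitting on \R gives clean words
        String[] clean = splitLines(windowsText);
        System.out.println(Arrays.toString(clean));
    }
}
```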
There are other problems in your code, such as closing the resource inside the try block. If an exception is thrown before that point, close() is never invoked and the resource is left open. To solve this, close resources in a finally block (which always runs after try, whether or not an exception is thrown) or, better, use try-with-resources.
BTW you can simplify/clarify the code responsible for reading words from files
List<String> poem2 = new ArrayList<>();
// try-with-resources closes the Scanner even if reading fails
try (Scanner scanner = new Scanner(new File(yourFileLocation))) {
while (scanner.hasNext()) { // has more words
poem2.add(scanner.next());
}
}
For the dictionary, use a Set such as HashSet instead of a List: it avoids duplicates, and checking whether a HashSet contains an element is usually much faster than scanning a list. Such collections already provide methods like contains(element), so you wouldn't need that inner loop.
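As a rough sketch of the Set idea (the class name, the misspelled helper and the sample words are invented for the example; in the real program the words would come from the two files):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SpellCheckDemo {
    // Returns the poem words that are not in the dictionary, in their original order
    static List<String> misspelled(String[] poemWords, Set<String> dictionary) {
        List<String> result = new ArrayList<>();
        for (String word : poemWords) {
            if (!dictionary.contains(word)) { // replaces the inner loop
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("the", "quick", "brown", "fox"));
        String[] poem = {"the", "quikc", "brown", "fxo"};
        System.out.println(misspelled(poem, dictionary));
    }
}
```

Note that this comparison is case-sensitive and does not strip punctuation, just like the equals check in the question.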
I copied your code and ran it, and I noticed two issues. Good news is, both are very quick fixes.
#1
When I printed out everything in dictionary3 after it is read in, it is the exact same as everything in poem2. This line in your code for reading in the dictionary is the problem:
String dictionary2 = new String(buffer);
You're using buffer, which was the variable you used to read in the poem. Therefore, buffer contains the poem and your poem and dictionary end up the same. I think you want to use buffer2 instead, which is what you used to read in the dictionary:
String dictionary2 = new String(buffer2);
When I changed that, the dictionary and poem appear to have the proper entries.
#2
The other problem, as Pshemo pointed out in their answer (which is completely correct, and a very good answer!) is that you are splitting on \n for the dictionary. The only thing I would say differently from Pshemo here is that you should probably split on \\s+ just like you did for the poem, to stay consistent. In fact, when I debugged, I noticed that the dictionary words all have "\r" appended to the end, probably because you were splitting on \n. To fix this, change this line:
String[] dictionary3 = dictionary2.split("\n");
To this:
String[] dictionary3 = dictionary2.split("\\s+");
Try changing those two lines, and let us know if that resolves your issue. Best of luck!
Convert your dictionary to an ArrayList and use contains instead.
Something like this should work:
// dictionaryList is the ArrayList built from dictionary3 (see below)
found = dictionaryList.contains(poem2[i]);
With this method you can also get rid of that nested loop, as the contains method handles that for you.
You can convert your dictionary array to an ArrayList like this:
List<String> dictionaryList = new ArrayList<>(Arrays.asList(dictionary3));

How to test if a URL contains certain parameters but not others using regex

I have an url in this format:
http://www.example.com/path?param1=value1&param2=value2
I need a regex that matches the path together with param1 and param2 in any order, but fails if param3 is present, so:
String str1 = "/path?param1=value1&param2=value2"; // This will match
String str2 = "/path?param2=value2&param1=value1"; // This will match
String str3 = "/path?param1=value1&param2=value&param3=value3"; // This will not match
So far I've tried using lookarounds to match the parameters, but it is failing:
/path\?(?!param3)(?=param1=.*)(?=param2=.*)
Any thoughts?
P.S. For the curious, I'm trying to match a specific URL from an AndroidManifest.xml file: https://developer.android.com/guide/topics/manifest/data-element.html
Try this one out:
^(?!.*param3)(?=.*param1=)(?=.*param2=).*$
https://regex101.com/r/rI1lH5/1
If you want the path in as well, then
^\/path(?!.*param3)(?=.*param1=)(?=.*param2=).*$
This started as a comment and I got a little carried away. If possible, you can avoid regex altogether: sanitize the query and check whether it contains exactly the parameters you need.
private boolean checkProperQueryString(String url, String[] requiredKeys){
try{
UrlQuerySanitizer sanitizer = new UrlQuerySanitizer(url);
// Check that you have the right number of parameters
List<UrlQuerySanitizer.ParameterValuePair> parameters =
sanitizer.getParameterList();
if(parameters == null || parameters.size() != requiredKeys.length)
return false;
// Check to make sure that the parameters you have are the
// correct ones
for(String key : requiredKeys){
if(TextUtils.isEmpty(sanitizer.getValue(key)))
return false;
}
// We pass every test, success!
return true;
} catch(Exception e){
// Catch any errors (haven't tested this so not sure of errors)
e.printStackTrace();
return false;
}
}
You can then make the call doing something like this
boolean validUrl = checkProperQueryString(url, new String[]{"param1", "param2"});
This doesn't directly answer your question, again just too much for a comment :P
Let me know if this just adds confusion for anyone and I can remove it.
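UrlQuerySanitizer and TextUtils are Android classes. Outside Android, roughly the same check can be sketched with java.net.URI. Everything below (class name, method name, the exact matching rule) is an assumption for illustration, and parameter names are compared without percent-decoding.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class QueryKeyCheck {
    // True when the query string contains exactly the required parameter names, in any order
    static boolean hasExactlyKeys(String url, Set<String> requiredKeys) {
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return requiredKeys.isEmpty();
        }
        Set<String> found = new HashSet<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            found.add(eq >= 0 ? pair.substring(0, eq) : pair);
        }
        return found.equals(requiredKeys);
    }

    public static void main(String[] args) {
        Set<String> required = new HashSet<>(Arrays.asList("param1", "param2"));
        System.out.println(hasExactlyKeys("/path?param2=value2&param1=value1", required));
        System.out.println(hasExactlyKeys("/path?param1=value1&param2=value&param3=value3", required));
    }
}
```

Unlike the regex, this rejects any extra parameter, not only param3, which may or may not be what you want.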
The regex provided by Michael works well, but there is a glitch: it also matches parameter names that merely end with the target name, such as newParam3. To avoid that, change it to:
^(?!.*(\\?|&)param3)(?=.*(\\?|&)param1=)(?=.*(\\?|&)param2=).*$
Basically we check if the parameter name starts with a ? or &. Also if you want to make a parameter optional then you can just put a ? at the end like:
(?!.*(\\?|&)param3)(?=.*(\\?|&)param1=)(?=.*(\\?|&)param2=)?.*$
In the above param2 is optional.
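To check the lookahead approach against the three strings from the question, here is a small test sketch. The pattern is a slight variation of the ones above (it uses the character class [?&] and includes the /path prefix), and the class and method names are made up.

```java
import java.util.regex.Pattern;

class UrlParamRegexDemo {
    // param1 and param2 must each follow a ? or &; param3 must not appear as a parameter name
    static final Pattern REQUIRED = Pattern.compile(
        "^/path(?!.*[?&]param3=)(?=.*[?&]param1=)(?=.*[?&]param2=).*$");

    static boolean matches(String url) {
        return REQUIRED.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("/path?param1=value1&param2=value2"));
        System.out.println(matches("/path?param2=value2&param1=value1"));
        System.out.println(matches("/path?param1=value1&param2=value&param3=value3"));
        // A name that only ends in "param3" is not treated as param3
        System.out.println(matches("/path?newparam3=x&param1=a&param2=b"));
    }
}
```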

File reader finding exception

EDIT: Sorry, I'm an idiot. I didn't realize my loop was going for too long for my test documents data, the final product will be 150 lines long, but mine was only 9, causing the error. Sorry for the time wasting, and thanks for the help
So I need to write a program that reads in data from a file. The data is separated by ",", and I am using split to store it in an array. Every third value is an integer and needs to be parsed as such, but then I encounter an exception. My code specifically:
try {
BufferedReader read = new BufferedReader(new FileReader("temp.txt"));
String file = read.readLine();
String[] store=file.split(",");
for (int i=2; i<150; i=i+3){
int result=Integer.parseInt(store[i]);
if (result>highresult){
highresult = result;
fName = store[i - 1];
sName = store[i - 2];
}
}
read.close();
} catch (IOException e) {
System.out.println("File Read Error");
}
The exception is:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 11
at testing.Testing.main(Testing.java:28)
Java Result: 1
The exception is encountered on the parser inside my for loop, and I cannot for the life of me find the issue. Been staring at it so long I think I'm seeing double. Any ideas whats up with it?
Adding for clearness - the test file itself contains the following data
Name1,Sname1,50,Name2,Sname2,75,Name3,Sname3,100
Testing it with a sysout message shows it collects the data from the array correctly, the issue appears to be when changing every third piece of data, in the array at [2][5][8] (50, 75, 100 in this case) into an integer.
Thank you for your patience, still new to this website.
Well, I can see two possible problems: either store[i] is not an int, or you are trying to read a value outside the bounds of the array.
Just print out i before parsing and look at its value. If it is within the array, look at the corresponding value in the file.
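Since the asker's edit confirms the loop simply ran past the end of the data, one way to avoid the hard-coded 150 is to bound the loop by the array length. A minimal sketch, with a made-up class and method name and the sample line from the question:

```java
class HighScoreDemo {
    // Returns {first field, second field, score} of the record with the highest score,
    // or null if the line holds no complete record
    static String[] highest(String line) {
        String[] store = line.split(",");
        String[] best = null;
        int high = Integer.MIN_VALUE;
        // store.length instead of 150: the loop stops where the data stops
        for (int i = 2; i < store.length; i += 3) {
            int result = Integer.parseInt(store[i].trim());
            if (result > high) {
                high = result;
                best = new String[] {store[i - 2], store[i - 1], store[i].trim()};
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] best = highest("Name1,Sname1,50,Name2,Sname2,75,Name3,Sname3,100");
        System.out.println(best[0] + " " + best[1] + " " + best[2]);
    }
}
```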

Strings that are initialized in a different place than they are declared

I'm trying to read the title from a webpage and save it as a String. However, since Strings are immutable in java, I can't just set it to null and change it when I need to. Therefore, I'm getting an error on the next to last line that strTitle may not have been initialized. This seems like it should be easy to deal with, but I can't figure it out. Thanks in advance.
URL allRecipe = new URL(inputLine); //user defined url
BufferedReader urlIn = new BufferedReader(
new InputStreamReader(allRecipe.openStream()));
String inputFromWeb;
//loops through webpage and finds title
while((inputFromWeb = urlIn.readLine()) != null){
//getting title
if(inputFromWeb.contains("<title>")){
strTitle = urlIn.readLine();
}
}//end while
urlIn.close();
//print out title
System.out.println("Title:");
System.out.println(strTitle); //this line returns the error
System.out.println("\n");
You wrote: "since strings are immutable in java and I can't just set it to null and change it when I need to."
Sure you can. If you are initializing a String reference to null and then assigning to it a different String, you are not changing any String, you are just changing the String reference.
You also wrote: "However, as is I'm getting an error on the next to last line that strTitle may not have been initialized."
Declaring the variable before the loop with an initial value,
String strTitle = null;
will solve your problem.
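A small sketch of the same idea without the network part. The class name, the findTitle helper and the sample lines are invented; like the question's code, it assumes the title text sits on the line after the one containing the title tag.

```java
class TitleDemo {
    // Returns the line following the one that contains "<title>", or null if there is none
    static String findTitle(String[] lines) {
        String title = null; // initialized, so the compiler knows it always has a value
        for (int i = 0; i < lines.length - 1; i++) {
            if (lines[i].contains("<title>")) {
                title = lines[i + 1].trim(); // reassigns the reference; no String is modified
            }
        }
        return title;
    }

    public static void main(String[] args) {
        String[] page = {"<html>", "<title>", "  My Recipe", "</title>", "</html>"};
        System.out.println("Title: " + findTitle(page));
    }
}
```

Because the variable may still be null when no title is found, it is worth checking for null before using it.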

How to set a java string variable equal to "htp://website htp://website " [closed]

So I have a large list of websites and I want to put them all in a String variable. I know I could go through all of the links individually and escape the //, but there are over a few hundred links. Is there a way to do a "block escape", so everything in between the "block" is escaped? This is an example of what I want to save in the variable.
String links="http://website http://website http://website http://website http://website http://website"
Also can anyone think of any other problems I might run into while doing this?
I made it htp instead of http because I am not allowed to post "hyperlinks" according to stack overflow as I am not at that level :p
Thanks so much
Edit: I am making a program because I have about 50 pages of a Word document that is filled with both emails and other text. I want to filter out just the emails. I wrote the program to do this, which was very simple; now I just need to figure out a way to store the pages in a String variable that the program will be run on.
Your question is not well-written. Improve it, please. In its current format it will be closed as "too vague".
Do you want to filter e-mails or websites? Your example is about websites, your text about e-mails. As I don't know and I decided to try to help you anyway, I decided to do both.
Here goes the code:
// (?: ... ) is a non-capturing group
private static final Pattern EMAIL_REGEX =
Pattern.compile("[A-Za-z0-9](?:(?:[_\\.\\-]?[a-zA-Z0-9]+)*)#(?:[A-Za-z0-9]+)(?:(?:[\\.\\-]?[a-zA-Z0-9]+)*)\\.(?:[A-Za-z]{2,})");
private static final Pattern WEBSITE_REGEX =
Pattern.compile("http(?:s?)://[_#\\.\\-/\\?&=a-zA-Z0-9]*");
public static String readFileAsString(String fileName) throws IOException {
File f = new File(fileName);
byte[] b = new byte[(int) f.length()];
InputStream is = null;
try {
is = new FileInputStream(f);
is.read(b);
return new String(b, "UTF-8");
} finally {
if (is != null) is.close();
}
}
public static List<String> filterEmails(String everything) {
List<String> list = new ArrayList<String>(8192);
Matcher m = EMAIL_REGEX.matcher(everything);
while (m.find()) {
list.add(m.group());
}
return list;
}
public static List<String> filterWebsites(String everything) {
List<String> list = new ArrayList<String>(8192);
Matcher m = WEBSITE_REGEX.matcher(everything);
while (m.find()) {
list.add(m.group());
}
return list;
}
To ensure that it works, first let's test the filterEmails and filterWebsites methods:
public static void main(String[] args) {
System.out.println(filterEmails("Orange, pizza whatever else joe#somewhere.com a lot of text here. Blahblah blah with Luke Skywalker (luke#starwars.com) hfkjdsh fhdsjf jdhf Paulo <aaa.aaa#bgf-ret.com.br>"));
System.out.println(filterWebsites("Orange, pizza whatever else joe#somewhere.com a lot of text here. Blahblah blah with Luke Skywalker (http://luke.starwars.com/force) hfkjdsh fhdsjf jdhf Paulo <https://darth.vader/blackside?sith=true&midclorians> And the http://www.somewhere.com as x."));
}
It outputs:
[joe#somewhere.com, luke#starwars.com, aaa.aaa#bgf-ret.com.br]
[http://luke.starwars.com/force, https://darth.vader/blackside?sith=true&midclorians, http://www.somewhere.com]
To test the readFileAsString method:
public static void main(String[] args) throws IOException {
System.out.println(readFileAsString("C:\\The_Path_To_Your_File\\SomeFile.txt"));
}
If that file exists, its content will be printed.
If you don't like the fact that it returns List<String> instead of a String with items divided by spaces, this is simple to solve:
public static String collapse(List<String> list) {
StringBuilder sb = new StringBuilder(50 * list.size());
for (String s : list) {
sb.append(" ").append(s);
}
sb.delete(0, 1);
return sb.toString();
}
Sticking all together:
String fileName = ...;
String webSites = collapse(filterWebsites(readFileAsString(fileName)));
String emails = collapse(filterEmails(readFileAsString(fileName)));
I suggest that you save your Word document as plain text. Then you can read the text with standard classes such as java.util.Scanner or the readers in the java.io package.
To solve the issue of overwriting the String variable each time you read a line, you can use an array or ArrayList. This is much more ideal than holding all the web addresses in a single String because you can easily access each address individually whenever you like.
For your first problem, take all the text out of Word, put it in something that does regular expressions, and use regular expressions to quote each line and end each line with +. Now edit the last line and change + to ;. Above the first line write String links =. Copy this new file into your Java source.
Here's an example using regexr.
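To answer your second question (thinking of problems): there is an upper limit on the length of a Java string literal. If I recall correctly it is about 2^16; more precisely, the class file format limits a string constant to 65,535 bytes in its modified UTF-8 encoding.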
Oh and Perl was basically written for you to do this kind of thing (take 50 pages of text and separate out what is a url and what is an email)... not to mention grep.
I'm not sure what kind of 'list of websites' you're referring to, but for a comma-separated file of websites, for example, you could read the entire file and use the String split function to get an array, or you could use a BufferedReader to read the file line by line and add each line to an ArrayList.
From there you can simply loop the array and append to a String, or if you need to:
do a "block escape", so everything in between the "block" is escaped
You can use a Regular Expression to extract parts of each String according to a pattern:
String oldString = "<someTag>I only want this part</someTag>";
String regExp = "(?i)(<someTag.*?>)(.+?)(</someTag>)";
String newString = oldString.replaceAll(regExp, "$2");
The above expression would remove the xml tags due to the "$2" which means you're interested in the second group of the expression, where groups are identified by round brackets ( ).
Using "$1$3" instead should then give you only the surrounding xml tags.
Another much simpler approach to removing certain "blocks" from a String is the String replace function, where to remove the block you could simply pass in an empty string as the new value.
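The three variants can be tried side by side in a short sketch. The class and method names are made up; the regular expression and the sample string are the ones from this answer.

```java
class BlockReplaceDemo {
    static final String REG_EXP = "(?i)(<someTag.*?>)(.+?)(</someTag>)";

    // Keeps only the second group, i.e. the text between the tags
    static String inner(String s) {
        return s.replaceAll(REG_EXP, "$2");
    }

    // Keeps only the first and third groups, i.e. the surrounding tags
    static String tagsOnly(String s) {
        return s.replaceAll(REG_EXP, "$1$3");
    }

    // Plain, non-regex removal of a literal block
    static String removeBlock(String s, String block) {
        return s.replace(block, "");
    }

    public static void main(String[] args) {
        String oldString = "<someTag>I only want this part</someTag>";
        System.out.println(inner(oldString));
        System.out.println(tagsOnly(oldString));
        System.out.println(removeBlock("keep [drop] keep", "[drop] "));
    }
}
```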
I hope some of this helps; otherwise you could try to provide a full example with your input "list of websites" and the output you want.
