How to explode a string on a hyphen in Java? - java

I have a task that involves writing a program that reads text from a file, produces a word count, and lists how many times each word occurs in the file. I managed to strip punctuation from the words, but I'm really stumped on this:
I want Java to treat the string "hello-funny-world" as three separate strings and store them in my array list. This is what I have so far; with this section of code, "hello funny world" is still seen as one string:
while (reader.hasNext()){
String nextword2 = reader.next();
String nextWord3 = nextword2.replaceAll("[^a-zA-Z0-9'-]", "");
String nextWord = nextWord3.replace("-", " ");
int apcount = 0;
for (int i = 0; i < nextWord.length(); i++){
if (nextWord.charAt(i)== 39){
apcount++;
}
}
int i = nextWord.length() - apcount;
if (wordlist.contains(nextWord)){
int index = wordlist.indexOf(nextWord);
count.set(index, count.get(index) + 1);
}
else{
wordlist.add(nextWord);
count.add(1);
if (i / 2 * 2 == i){
wordlisteven.add(nextWord);
}
else{
wordlistodd.add(nextWord);
}
}

This can work for you:
List<String> items = Arrays.asList("hello-funny-world".split("-"));

Since you are using '-' as the separator, I would suggest simply using Java's split():
String name="this-is-string";
String arr[]=name.split("-");
System.out.println("Here " +arr.length);
You can then iterate over this array with a for loop.
Hope this helps.
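To tie split() back to the word-count task in the question, here is a minimal sketch. It assumes that whitespace and hyphens are the only separators, and that a map of counts is an acceptable stand-in for the question's parallel wordlist/count lists. The class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HyphenWordCount {
    // Counts words, treating runs of whitespace and hyphens as separators.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.split("[\\s-]+")) {
            // Strip punctuation but keep letters, digits and apostrophes.
            String word = token.replaceAll("[^a-zA-Z0-9']", "");
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hello-funny-world, hello again"));
        // prints {hello=2, funny=1, world=1, again=1}
    }
}
```

The key difference from the code in the question is that the hyphenated token is split into several words before counting, rather than having its hyphens replaced by spaces inside a single string.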

Related

How many times the word is used on the html page

I have a method that should return an integer which is the number of uses of the searchWord in the text of an HTML document:
public int searchForWord(String searchWord) {
int count = 0;
if(this.htmlDocument == null){
System.out.println("ERROR! Call crawl() before performing analysis on the document");
}
System.out.println("Searching for the word " + searchWord + "...");
String bodyText = this.htmlDocument.body().text();
if (bodyText.toLowerCase().contains(searchWord.toLowerCase())){
count++;
}
return count;
}
But my method always returns count=1, even if the word is used several times. I understand that the error should be obvious, but I’m stuck and I don’t see it.
You are currently only checking once that the text contains the search word, so the count will always be either 0 or 1. To find the total count, keep calling String#indexOf(str, fromIndex) in a loop for as long as the word is found, using the second argument to start each search just past the previous match.
public int searchForWord(String searchWord) {
int count = 0;
if(this.htmlDocument == null){
System.out.println("ERROR! Call crawl() before performing analysis on the document");
}
System.out.println("Searching for the word " + searchWord + "...");
String bodyText = this.htmlDocument.body().text();
for(int idx = -1; (idx = bodyText.indexOf(searchWord, idx + 1)) != -1; count++);
return count;
}
According to the Java docs String#contains:
Returns true if and only if this string contains the specified sequence of char values.
You're asking if the word you're looking for is contained in the document, which it is.
You could:
Split the text on words (splitting it by spaces) and then count how many times it appears
Iterate the String using String#indexOf starting on index 0 and then from last index you found until the end of the String.
Iterate the String using contains but starting from a certain index (doing this logic yourself).
I'd go for the 2nd approach as it seems like the easiest one.
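The 2nd approach could look roughly like the sketch below. It works on a plain String rather than the question's htmlDocument, lowercases both sides to keep the case-insensitive behaviour of the original code, and counts non-overlapping matches; the class and method names are hypothetical.

```java
class OccurrenceCounter {
    // Counts non-overlapping, case-insensitive occurrences of word in text.
    static int countOccurrences(String text, String word) {
        if (text == null || word == null || word.isEmpty()) {
            return 0;
        }
        String haystack = text.toLowerCase();
        String needle = word.toLowerCase();
        int count = 0;
        int idx = haystack.indexOf(needle);
        while (idx != -1) {
            count++;
            // Continue searching just past the match that was found.
            idx = haystack.indexOf(needle, idx + needle.length());
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("Java is fun. I like java; JAVA rocks.", "java"));
        // prints 3
    }
}
```

Note that this counts substrings, so searching for "java" would also count "javascript"; whether that matters depends on the requirements.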
These are only conditional statements; you aren't looping through the HTML text. Therefore, if it finds an instance of searchWord in bodyText, it increments the count once and then exits the method with a value of 1. I suggest looping through every word in the HTML, adding it to an array, and counting it that way, using something like this:
char[] bodyTextA = bodyText.toCharArray();
Or keep it in a string array and split it by a space, or new line, or whatever criteria you have. Example of space:
// split puts "Hello", "I'm", "your", and "String" into their own slots in the array
String str = "Hello I'm your String";
String[] split = str.split("\\s+");
Your issue here is that the if statement checks whether the text contains the word and then increments your count variable. So even if the text contains the word multiple times, your logic is basically: if it contains the word at all, increase count by one. You will have to rewrite your code to check for multiple occurrences of the word. There are many ways to go about this: you could loop through the entire body text, you could split the body text into an array of words and check that, or you could remove the search word from the text each time you find it and keep checking until the text no longer contains it.
You can use indexOf(String, int), passing the index of the last match found:
public int searchForWord(String searchWord) {
int count = 0;
if(this.htmlDocument == null){
System.out.println("ERROR! Call crawl() before performing analysis on the document");
}
System.out.println("Searching for the word " + searchWord + "...");
String bodyText = this.htmlDocument.body().text();
int index = -1; // start at -1 so that a match at position 0 is not skipped
while ((index = bodyText.indexOf(searchWord, index + 1)) != -1) {
count++;
}
return count;
}

How to remove phrase from beginning to end

I've been trying for a while now, and I'm about to give up. I want to extract the data from type (regardless of letter case) up to the numbers. Essentially, I'm trying to get rid of model and birthday in each line, but what makes it more difficult is that it's all one string; I spaced it out just to make it easier to read.
I'm looking for a Java regex solution. This is what I tried, but of course it deletes the whole string after the first number (4,66):
[;][mo].*
Thank you in advance!
Input:
Type:Carro;high:4,66;model:fourDoors;birthday:01/01/1980
type:Truck;high:5,66;model:twoDoors;birthday:29/05/1977
tYpe:motorcycle;high:1,55;model:fiveDoors;birthday:01/01/1980
type:Carro;high:4,66;type:model;birthday:6/12/1887
type:Carro;high:9,66;model:Doors;birthday:05/12/2010
Expected OutPut:
Type:Carro;high:4,66
type:Truck;high:5,66
tYpe:motorcycle;high:1,55
type:Carro;high:4,66
type:Carro;high:9,66
Hopefully this will work for you. There are a few ways to make this code slightly smaller, however, this should at least help to get you on the right path.
I placed it into a main method, but it would be easy to put it into its own function. This would allow you to pass any number of arrays at it.
I added all of the logic in the comments within the code, I hope it helps:
public static void main(String[] args) {
/*Get your strings into an Array*/
String[] str = {"Type:Carro;high:4,66;model:fourDoors;birthday:01/01/1980",
"type:Truck;high:5,66;model:twoDoors;birthday:29/05/1977",
"tYpe:motorcycle;high:1,55;model:fiveDoors;birthday:01/01/1980",
"type:Carro;high:4,66;type:model;birthday:6/12/1887",
"type:Carro;high:9,66;model:Doors;birthday:05/12/2010",
"Expected OutPut:",
"Type:Carro;high:4,66",
"type:Truck;high:5,66",
"tYpe:motorcycle;high:1,55",
"type:Carro;high:4,66",
"type:Carro;high:9,66"
};
/*Create a "final staging" array*/
String[] newStr = new String[str.length - 1];
for (int j = 0; j < str.length - 1; j++) {//For each of your strings
str[j] = str[j].toLowerCase();//set the string to lower
/*If they don't contain a semi-colon and a model or birthday reference go to else*/
if (str[j].contains(";") && str[j].contains("model") || str[j].contains("birthday")) {
/*Otherwise, split the string by semi-colon*/
String[] sParts = str[j].split(";");
String newString = "";//the new string that will be created
for (int i = 0; i < sParts.length - 1; i++) {//for each part of the sParts array
if (sParts[i].contains("model") || sParts[i].contains("birthday")) {//if it contains what is not desired
//Do Nothing
} else {
newString += (newString.isEmpty() ? "" : ";") + sParts[i];//otherwise concatenate it to the newString, restoring the semicolon separator
}
newStr[j] = newString;//add the string to the "final staging" array
}
} else {
newStr[j] = str[j];//if it didn't have semi-colons and birthday or model, just add it to the "final staging" array
}
}
for (String newS : newStr) {// finally if you want to see the "final staging" array data... output it.
System.out.println(newS);
}
}
OUTPUT
type:carro;high:4,66
type:truck;high:5,66
type:motorcycle;high:1,55
type:carro;high:4,66
type:carro;high:9,66
expected output:
type:carro;high:4,66
type:truck;high:5,66
type:motorcycle;high:1,55
type:carro;high:4,66
If I happened to miss something in the requirements, please let me know, I would be happy to fix it.
String str = "Type:Carro;high:4,66;model:fourDoors;birthday:01/01/1980,type:Truck;high:5,66;model:twoDoors;birthday:29/05/1977,tYpe:motorcycle;high:1,55;model:fiveDoors;birthday:01/01/1980,type:Carro;high:4,66;type:model;birthday:6/12/1887";
StringTokenizer tokens = new StringTokenizer(str, ",");
while (tokens.hasMoreTokens()) {
String token = tokens.nextToken() ;
StringTokenizer tokens2 = new StringTokenizer(token, ":");
while (tokens2.hasMoreTokens()) {
String key = tokens2.nextToken() ;
if (key.equalsIgnoreCase("type")){
System.out.println("locate: "+key+"\n");
}
}
}
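Since the question asks for a regex, here is a hedged sketch of one way to do it: rather than deleting model and birthday, it extracts every "type:...;high:..." pair from the data. It assumes the high value always looks like digits, a comma, then digits (as in the sample input), and it works even when all records are run together in one string. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TypeHighExtractor {
    // "type:" in any letter case, its value, then ";high:" and a number like 4,66.
    private static final Pattern TYPE_HIGH =
            Pattern.compile("type:[^;]*;high:\\d+,\\d+", Pattern.CASE_INSENSITIVE);

    // Returns every type/high pair found in the data, in order of appearance.
    static List<String> extract(String data) {
        List<String> found = new ArrayList<>();
        Matcher m = TYPE_HIGH.matcher(data);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String data = "Type:Carro;high:4,66;model:fourDoors;birthday:01/01/1980"
                + "type:Carro;high:4,66;type:model;birthday:6/12/1887";
        for (String pair : extract(data)) {
            System.out.println(pair);
        }
        // prints:
        // Type:Carro;high:4,66
        // type:Carro;high:4,66
    }
}
```

Because the pattern requires ";high:" right after the type value, the stray "type:model" field in the fourth sample record is not picked up as a match.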

Ignoring upper/lowercase strings

My goal is to change any form of the word "java" in a sentence to "JAVA". I've got everything done, but my code won't handle mixed cases, for example JaVa, JAva, etc. I know I am supposed to use toUpperCase and toLowerCase or equalsIgnoreCase, but I am not sure how to use them properly. I am not allowed to use replace or replaceAll; my teacher wants the substring method.
Scanner input=new Scanner(System.in);
System.out.println("Enter a sentence with words including java");
String sentence=input.nextLine();
String find="java";
String replace="JAVA";
String result="";
int n;
do{
n=sentence.indexOf(find);
if(n!=-1){
result =sentence.substring(0,n);
result=result +replace;
result = result + sentence.substring(n+find.length());
sentence=result;
}
}while(n!=-1);
System.out.println(sentence);
}
}
You can't do that using String.indexOf because it is case sensitive.
The simple solution is to use a regex with a case insensitive pattern; e.g.
Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(str).replaceAll(repl);
That also has the benefit of avoiding the messy string-bashing you are currently using to do the replacement.
In your example, your input string is also valid as a regex ... because it doesn't include any regex meta-characters. If it did, then the simple workaround is to use Pattern.quote(str) which will treat the meta-characters as literal matches.
It is also worth noting that String.replaceAll(...) is a "convenience method" for doing a regex replace on a string, though it does case-sensitive matching unless the regex itself enables case-insensitivity (for example with an embedded (?i) flag).
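Putting the pieces above together, a minimal sketch might look like this. The helper name replaceIgnoreCase is hypothetical; Pattern.quote and Matcher.quoteReplacement are used so that neither the target nor the replacement is interpreted as regex syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveReplace {
    // Replaces every occurrence of target in input, ignoring letter case.
    static String replaceIgnoreCase(String input, String target, String replacement) {
        // quote() makes any regex meta-characters in target match literally.
        Pattern p = Pattern.compile(Pattern.quote(target), Pattern.CASE_INSENSITIVE);
        // quoteReplacement() keeps '$' and '\' in the replacement literal too.
        return p.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceIgnoreCase("I like JaVa and jAVA.", "java", "JAVA"));
        // prints I like JAVA and JAVA.
    }
}
```

Unlike lowercasing the whole sentence first, this leaves the rest of the text in its original case.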
For the record, here is a partial solution that does the job by string-bashing. @ben - this is presented for you to read and understand ... not to copy. It is deliberately uncommented to encourage you to read it carefully.
// WARNING ... UNTESTED CODE
String input = ...
String target = ...
String replacement = ...
String inputLc = input.toLowerCase();
String targetLc = target.toLowerCase();
String result = "";
int pos = 0;
int pos2;
while ((pos2 = inputLc.indexOf(targetLc, pos)) != -1) {
if (pos2 - pos > 0) {
result += input.substring(pos, pos2);
}
result += replacement;
pos = pos2 + target.length();
}
if (pos < input.length()) {
result += input.substring(pos);
}
It would probably be more efficient to use a StringBuilder instead of a String for result.
Are you allowed to use toUpperCase() or toLowerCase()? If so, try this one:
Scanner input=new Scanner(System.in);
System.out.println("Enter a sentence with words including java");
String sentence=input.nextLine();
String find="java";
String replace="JAVA";
String result="";
result = sentence.toLowerCase();
result = result.replace(find,replace);
System.out.println(result);
}
Reply with the result :)
Update: Based on "I've got everything done but my code won't read in mixed cases, for example: JaVa, JAva, etc.", you can use your own code:
Scanner input=new Scanner(System.in);
System.out.println("Enter a sentence with words including java");
String sentence=input.nextLine();
String find="java";
String replace="JAVA";
String result="";
int n;
do{
// convert the sentence to lowercase so that letter case is ignored, then do the next step
sentence = sentence.toLowerCase();
n=sentence.indexOf(find);
if(n!=-1){
result =sentence.substring(0,n);
result=result +replace;
result = result + sentence.substring(n+find.length());
sentence=result;
}
}while(n!=-1);
System.out.println(sentence);
}
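Update 2: I moved the toLowerCase conversion outside the loop. Inside the loop it would turn each inserted "JAVA" back into "java", so the loop would never end.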
public static void main(String[] args){
String sentence = "Hello my name is JAva im a jaVa Man with a jAvA java Ice cream";
String find="java";
String replace="JAVA";
String result="";
int n;
// convert the sentence to lowercase so that letter case is ignored, then do the next step
sentence = sentence.toLowerCase();
System.out.println(sentence);
do{
n=sentence.indexOf(find);
if(n!=-1){
result =sentence.substring(0,n);
result=result +replace;
result = result + sentence.substring(n+find.length());
sentence=result;
}
}while(n!=-1);
System.out.println(sentence);
}
RESULT
hello my name is java im a java man with a java java ice cream
hello my name is JAVA im a JAVA man with a JAVA JAVA ice cream
A quick solution would be to remove your do/while loop entirely and just use a case-insensitive regex with String.replaceAll(), like:
sentence = sentence.replaceAll("(?i)java", "JAVA");
System.out.println(sentence);
Or, more generally and using your variable names:
sentence = sentence.replaceAll("(?i)" + find, replace);
System.out.println(sentence);
EDIT:
Based on your comments, if you need to use the substring method, here is one way.
First, since String.indexOf does case-sensitive comparisons, you can write your own case-insensitive method, let's call it indexOfIgnoreCase(). This method would look something like:
// Find the index of the first occurrence of the String find within the String str, starting from start index
// Return -1 if no match is found
int indexOfIgnoreCase(String str, String find, int start) {
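for(int i = start; i + find.length() <= str.length(); i++) { // stop while a full-length window still fits, so substring cannot run past the end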
if(str.substring(i, i + find.length()).equalsIgnoreCase(find)) {
return i;
}
}
return -1;
}
Then, you can use this method in the following manner.
You find the index of the word you need, then you add the portion of the String before this word (up to the found index) to the result, then you add the replaced version of the word you found, then you add the rest of the String after the found word.
Finally, you update the starting search index by the length of the found word.
String find = "java";
String replace = "JAVA";
int index = 0;
while(index + find.length() <= sentence.length()) {
index = indexOfIgnoreCase(sentence, find, index); // use the custom indexOf method here
if(index == -1) {
break;
}
sentence = sentence.substring(0, index) + // copy the string up to the found word
replace + // replace the found word
sentence.substring(index + find.length()); // copy the remaining part of the string
index += find.length();
}
System.out.println(sentence);
You could use a StringBuilder to make this more efficient, since the + operator creates a new String on each concatenation.
Furthermore, you could combine the logic in the indexOfIgnoreCase and the rest of the code in a single method like:
String find = "java";
String replace = "JAVA";
StringBuilder sb = new StringBuilder();
int i = 0;
while(i + find.length() <= sentence.length()) {
// if found a match, add the replacement and update the index accordingly
if(sentence.substring(i, i + find.length()).equalsIgnoreCase(find)) {
sb.append(replace);
i += find.length();
}
// otherwise add the current character and update the index accordingly
else {
sb.append(sentence.charAt(i));
i++;
}
}
sb.append(sentence.substring(i)); // append the rest of the string
sentence = sb.toString();
System.out.println(sentence);

Reversing the order of a string

I'm still shaky on how basic Java works. Here is a method I wrote but don't fully understand; would anyone care to explain how it works?
It's supposed to take a string s and return it in reverse order.
Edit: Mainly the for loop is what is confusing me.
So if I input "12345", I want my output to be "54321".
public String reverse(String s){
String r = "";
for(int i=0; i<s.length(); i++){
r = s.charAt(i) + r;
}
return r;
}
The for loop runs up to the last index of the input string and, on each pass, adds the character at index i to the front of the result string. Here + is concatenation, and the order of the operands matters:
Example
String z="hello";
String x="world";
==> x+z = "worldhello", which is different from z+x = "helloworld"
For your case (here a is the input and s is the result):
String s="";
String a="1234";
s=a.charAt(0)+s ==> s='1'+"" = "1" ( + : concatenation )
s=a.charAt(1)+s ==> s='2'+"1" = "21" ( + : concatenation )
s=a.charAt(2)+s ==> s='3'+"21" = "321" ( + : concatenation )
s=a.charAt(3)+s ==> s='4'+"321" = "4321" ( + : concatenation )
etc..
public String reverse(String s){
String r = ""; // this is the output, initialized to the empty string
for(int i=0; i<s.length(); i++){
r = s.charAt(i) + r; // put the character at index i in front of r
}
return r;
}
What this code does is the following:
It creates a new variable r="".
Then, looping once for each character of the input string, it adds the current character to the beginning of r.
i=0) r="1"
i=1) r="21"
i=2) r="321"
i=3) r="4321"
i=4) r="54321"
When you enter the loop you are having empty string in r.
Now r=""
In 1st iteration, you are taking first character (i=0) and appending r to it.
r = "1" + "";
Now r=1
In 2nd iteration, you are taking second character (i=1) and appending r to it
r = "2" + "1";
Now r=21
You can trace the execution on paper like this; then you will easily understand what is happening.
What the method does is take each character from the string s and put it at the front of the new string r. Renaming the variables may help illustrate this.
public String reverse(String s){
String alreadyReversed = "";
for(int i=0; i<s.length(); i++){
//perform the following until count i is as long as string s
char thisCharacterInTheString = s.charAt(i); // for i==0 returns first
// character in passed String
alreadyReversed = thisCharacterInTheString + alreadyReversed;
}
return alreadyReversed;
}
So in the first iteration of the for loop alreadyReversed equals 1 + itself (an empty string).
In the second iteration alreadyReversed equals 2 + itself (1).
Then 3 + itself (21).
Then 4 + 321.
Then 5 + 4321.
Go back to your problem statement (take an input string and produce an output string in reverse order). Then consider how you would do this (not how to write Java code to do this).
You would probably come up with two alternatives:
Starting at the back of the input string, get one character at a time and form a new string (thus reversing its order).
Starting at the front of the string, get a character. Then for each next character, put it in front of all the characters you have created so far.
Your pseudo code results might be like the following
Option 1
let l = the length of the input string
set the output string to ""
while l > 0
add the "lth" character of the input string to the output string
subtract 1 from l
Option 2 left as an exercise for the questioner.
Then you would consider how to write Java to handle your algorithm. You will find that there are several ways to get the "lth" character of a string. First, in Java a string of length l has characters in position 0 through l-1. You can use string.charAt(loc) or string.substring(loc,loc+1) to get the character at position loc
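As a rough Java translation of the Option 1 pseudo code above (walking the input from the back), something like the following should work. The class name is made up, and a StringBuilder is used for the output, which is a choice the pseudo code does not prescribe.

```java
class ReverseFromBack {
    // Builds the reversed string by reading the input from its last character.
    static String reverse(String s) {
        StringBuilder out = new StringBuilder();
        // l counts down from the length; the "lth" character sits at index l - 1.
        for (int l = s.length(); l > 0; l--) {
            out.append(s.charAt(l - 1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("12345"));
        // prints 54321
    }
}
```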

Remove last set of value from a comma separated string in java

I want to remove the last set of data from a string using Java.
For example, I have a string like A,B,C, and I want to remove ,C, to get an output value like A,B. How is this possible in Java? Please help.
String start = "A,B,C,";
String result = start.substring(0, start.lastIndexOf(',', start.lastIndexOf(',') - 1));
Here is a fairly "robust" reg-exp solution:
Pattern p = Pattern.compile("((\\w,?)+),\\w+,?");
for (String test : new String[] {"A,B,C", "A,B", "A,B,C,",
"ABC,DEF,GHI,JKL"}) {
Matcher m = p.matcher(test);
if (m.matches())
System.out.println(m.group(1));
}
Output:
A,B
A
A,B
ABC,DEF,GHI
Since there may be a trailing comma, something like this (using org.apache.commons.lang.StringUtils):
List<String> list = new ArrayList<>(Arrays.asList(myString.split(",")));
list.remove(list.size() - 1);
myString = StringUtils.join(list, ",");
You can use String#lastIndexOf to find the index of the second-to-last comma, and then String#substring to extract just the part before it. Since your sample data ends with a ",", you'll need to use the version of String#lastIndexOf that accepts a starting point and have it skip the last character (e.g., feed in the string's length minus 1).
I wasn't going to post actual code, on the theory that it's better to teach a man to fish, but as everyone else is:
String data = "A,B,C,";
String shortened = data.substring(0, data.lastIndexOf(',', data.length() - 2));
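If the trailing comma is not guaranteed to be there, the same lastIndexOf/substring idea can be wrapped in a small helper. This is only a sketch; the name dropLast and the choice to return an empty string for a single-field input are assumptions, not something the question specifies.

```java
class DropLastField {
    // Removes the last comma-separated field, with or without a trailing comma.
    static String dropLast(String csv) {
        // Ignore one trailing comma, if present.
        String trimmed = csv.endsWith(",") ? csv.substring(0, csv.length() - 1) : csv;
        int cut = trimmed.lastIndexOf(',');
        // With no comma left there is only one field, so nothing remains.
        return cut == -1 ? "" : trimmed.substring(0, cut);
    }

    public static void main(String[] args) {
        System.out.println(dropLast("A,B,C,")); // prints A,B
        System.out.println(dropLast("A,B,C"));  // prints A,B
    }
}
```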
You can use regex to do this
String start = "A,B,C,";
String result = start.replaceAll(",[^,]*,$", "");
System.out.println(result);
prints
A,B
This simply erases the second-to-last comma, the data that follows it, and the last comma.
If a full String.split() is not possible, then how about just scanning the string for commas and stopping at the second one, without including it in the final answer?
String start = "A,B,C,";
StringBuilder result = new StringBuilder();
int count = 0;
for(char ch:start.toCharArray()) {
if(ch == ',') {
count++;
if(count==2) {
break;
}
}
result.append(ch);
}
System.out.println("Result = "+result.toString());
Simple trick, but should be efficient.
In case you want last set of data removed, irrespective of how much you want to read, then
start.substring(0, start.lastIndexOf(',', start.lastIndexOf(',')-1))
Another way to do this is using a StringTokenizer:
String input = "A,B,C,";
StringTokenizer tokenizer = new StringTokenizer(input, ",");
String output = "";
int tokenCount = tokenizer.countTokens();
for (int i = 0; i < tokenCount - 1; i++) {
output += tokenizer.nextToken();
if (i < tokenCount - 2) { // no comma after the last token that is kept
output += ",";
}
}
public string RemoveLastSepratorFromString(string input)
{
string result = input;
if (result.Length > 1)
{
result = input.Remove(input.Length - 1, 1);
}
return result;
}
// use from above method
string test = "1,2,3,";
string strResult = RemoveLastSepratorFromString(test);
//output --> 1,2,3
