Counting occurrences in a String array and deleting the repeats using Java

I'm having trouble with some code. I have read words from a text file into a String array and removed the periods and commas. Now I need to count the occurrences of each word, and I managed to do that as well. However, my output lists every word in the file, repeats included, each with its count.
Like this:
the 2
birds 2
are 1
going 2
north 2
north 2
Here is my code:
public static String counter(String[] wordList)
{
//String[] noRepeatString = null ;
//int[] countArr = null ;
for (int i = 0; i < wordList.length; i++)
{
int count = 1;
for(int j = 0; j < wordList.length; j++)
{
if(i != j) //to avoid comparing itself
{
if (wordList[i].compareTo(wordList[j]) == 0)
{
count++;
//noRepeatString[i] = wordList[i];
//countArr[i] = count;
}
}
}
System.out.println (wordList[i] + " " + count);
}
return null;
}
I need to figure out how to 1) store the count values in an array, and 2) remove the repetitions.
As the commented-out lines show, I tried to use countArr[] and noRepeatString[] for this, but I got a NullPointerException.
Any thoughts on this would be much appreciated :)
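The NullPointerException most likely comes from declaring noRepeatString and countArr as null and then indexing into them: an array has to be allocated with new before anything can be stored in it. Below is a minimal sketch of the parallel-array approach the question describes, assuming both arrays are sized to wordList.length (the largest possible number of distinct words) and only the first n slots are used. The class name WordCounter and the method countDistinct are hypothetical, and the sample words are made up to reproduce the kind of output shown above.

```java
class WordCounter {
    // Writes each distinct word into uniqueWords and its count into counts
    // (same index) and returns how many distinct words were found.
    // Both arrays must already be allocated with at least wordList.length slots.
    static int countDistinct(String[] wordList, String[] uniqueWords, int[] counts) {
        int distinct = 0;
        for (String word : wordList) {
            int k = 0;
            // Look for the word among the distinct words stored so far.
            while (k < distinct && !uniqueWords[k].equals(word)) {
                k++;
            }
            if (k == distinct) {
                // First occurrence: append it with a count of 1.
                uniqueWords[distinct] = word;
                counts[distinct] = 1;
                distinct++;
            } else {
                // Repeat: only bump the existing count, so it is never listed twice.
                counts[k]++;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        String[] wordList = {"the", "birds", "are", "going", "north", "the", "birds", "going", "north"};
        // Allocating with new is what "String[] noRepeatString = null;" was missing.
        String[] noRepeatString = new String[wordList.length];
        int[] countArr = new int[wordList.length];
        int n = countDistinct(wordList, noRepeatString, countArr);
        for (int i = 0; i < n; i++) {
            System.out.println(noRepeatString[i] + " " + countArr[i]);
        }
    }
}
```

The linear search keeps the sketch close to the question's nested loops; for large inputs a HashMap-based count avoids the O(n²) scan.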

I would first convert the array into a list because they are easier to operate on than arrays.
List<String> list = Arrays.asList(wordList);
Then create a copy of that list (you'll see why in a second):
ArrayList<String> listTwo = new ArrayList<String>(list);
Now you remove all the duplicates in the second list:
HashSet<String> hs = new HashSet<String>();
hs.addAll(listTwo);
listTwo.clear();
listTwo.addAll(hs);
Then loop through the second list and get the frequency of each word in the first list. But first, create another ArrayList to store the results:
ArrayList<String> results = new ArrayList<String>();
for(String word : listTwo){
int count = Collections.frequency(list, word);
String result = word + ": " + count;
results.add(result);
}
Finally you can output the results list:
for(String freq : results){
System.out.println(freq);}
I have not tested this code (I can't right now). Please ask if there is a problem or it doesn't work. See these questions for reference:
How do I remove repeated elements from ArrayList?
One-liner to count number of occurrences of String in a String[] in Java?
How do I clone a generic List in Java?
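Put together, the steps above form a small program. The self-contained sketch below swaps the HashSet for a LinkedHashSet (an assumption, not part of the answer) so that the words come out in the order they first appear; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

class FrequencyExample {
    static List<String> countWords(String[] wordList) {
        // Arrays.asList returns a fixed-size List view, which is fine for reading.
        List<String> list = Arrays.asList(wordList);
        // LinkedHashSet drops repeats but keeps first-appearance order.
        List<String> unique = new ArrayList<>(new LinkedHashSet<String>(list));
        List<String> results = new ArrayList<>();
        for (String word : unique) {
            // frequency scans the whole list each time: O(n * distinct) overall.
            int count = Collections.frequency(list, word);
            results.add(word + ": " + count);
        }
        return results;
    }

    public static void main(String[] args) {
        String[] words = {"the", "birds", "the", "north"};
        for (String freq : countWords(words)) {
            System.out.println(freq);
        }
    }
}
```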

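Some syntax issues in the loop above (a missing () after new ArrayList<String> and a missing + before count) needed fixing; with them corrected, it works fine: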
ArrayList<String> results = new ArrayList<String>();
for(String word : listTwo){
int count = Collections.frequency(list, word);
String result = word +": "+ count;
results.add(result);
}

Related

Is it correct to convert 2D CharArray to String and use .charAt() to compare a character?

I have a char variable called temp that I'd like to compare with the element stored at CharArray[X][Y] while I'm inside a third for loop nested in the loops over the 2D array.
For example:
char temp;
temp = ' ';
String end;
end = "";
for (int i = 0; i < CharArray.length; i++){
for (int m = 0; m < 2; m++){
if (somethingY){
if (somethingZ){
for (int j = 0; j < something.length; j++){
//something
temp = somethingX;
if (temp == String.valueOf(CharArray[i][m]).charAt(0)){
end = String.valueOf(CharArray[i][m]);
System.out.print(end);
}
}
}
}
}
}
I've tried printing temp right after temp = somethingX, and it prints just fine. But when I store the String in a String variable, the variable end never gets printed.
According to this, it won't do anything if the object is something else, but "end" is a String.
So, what am I doing wrong?
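For reference, a char taken from a char[][] can be compared with == directly; String.valueOf(c).charAt(0) always gives back the same char c, so the conversion is harmless but unnecessary. If end never prints, that suggests the if condition (or one of the enclosing conditions) is never true for the values involved. A minimal sketch with a made-up grid; the names CharCompare, collectMatches and target are hypothetical:

```java
class CharCompare {
    // Collects every cell of grid that equals target into a String.
    static String collectMatches(char[][] grid, char target) {
        StringBuilder end = new StringBuilder();
        for (int i = 0; i < grid.length; i++) {
            for (int m = 0; m < grid[i].length; m++) {
                // == compares the primitive char values directly;
                // no String conversion is needed.
                if (grid[i][m] == target) {
                    end.append(grid[i][m]);
                }
            }
        }
        return end.toString();
    }

    public static void main(String[] args) {
        char[][] grid = {{'A', 'B'}, {'C', 'A'}};
        System.out.println(collectMatches(grid, 'A')); // prints AA
    }
}
```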
EDIT: In case there's any confusion: I'm trying to print end, and I figured that if temp == String.valueOf(CharArray[i][m]).charAt(0) is correct, the assignment to end should be too.
EDIT2: Defined "temp" for people...
EDIT3: I tried "end.equals(String.valueOf(CharArray[i][m]));", but still nothing happens when I try to print it. I get no errors or anything.
EDIT4: I tried putting String.valueOf(CharArray[i][m]).charAt(0) into another variable called "temp2" and doing if (temp == temp2), but still the same thing.
EDIT5: I tried temp == CharArray[0][m] and then end = CharArray[0][m], but still nothing prints.
EDIT6: OK. Since this will never get resolved, I'll just explain the whole point of my problem: I have an ArrayList where each line is a combination of a letter, a space and a number (e.g. "E 3"). I need to check whether a letter repeats and, if it does, sum the numbers from all the lines with that letter.
For example, if I have the following ArrayList:
Z 3
O 9
I 1
J 7
Z 7
K 2
O 2
I 8
K 8
J 1
I need the output to be:
Z 10
O 11
I 9
J 8
K 10
I didn't want people to do the whole thing for me, but it seems I've no choice, since I've wasted 2 days on this problem and I'm running out of time.
Use a map:
ArrayList<String> input=new ArrayList<String>();
input.add("O 2");
input.add("O 2");
Map<String, Integer> map= new HashMap<String, Integer>();
for (String s:input) {
String[] splitted=s.split(" ");
String letter=splitted[0];
Integer number=Integer.parseInt(splitted[1]);
Integer num=map.get(letter);
if (num==null) {
map.put(letter,number);
}
else {
map.put(letter,number+num);
}
}
for (Map.Entry<String, Integer> entry : map.entrySet()) {
System.out.println(entry.getKey() + " " + Integer.toString(entry.getValue()));
}
Without using a map:
ArrayList<String> input=new ArrayList<String>();
input.add("O 2");
input.add("O 2");
ArrayList<String> letters=new ArrayList<String>();
ArrayList<Integer> numbers=new ArrayList<Integer>();
for (String s:input) {
String[] splitted=s.split(" ");
String letter=splitted[0];
Integer number=Integer.parseInt(splitted[1]);
int index=-1;
boolean isthere=false;
for (String l:letters) {
index++;
if (l.equals(letter)) {
isthere=true; //BUGFIX
break;
}
}
if (isthere==false) { //BUGFIX
letters.add(letter);
numbers.add(number);
}
else {
numbers.set(index,numbers.get(index)+number);
}
}
for (int i=0; i < letters.size(); i++) {
System.out.println(letters.get(i) + " " + numbers.get(i));
}
Converting it back into the "letter number" format for a nicer output:
ArrayList<String> output=new ArrayList<String>();
for (int i=0; i < letters.size(); i++) {
output.add(letters.get(i)+" "+Integer.toString(numbers.get(i)));
}
Feel free to comment if you are having any questions.
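On Java 8 or later, the map version can be shortened with Map.merge, and using a LinkedHashMap instead of a HashMap keeps the letters in first-appearance order, which matches the expected output in the question. A sketch under those assumptions; the class LetterSums and the method sumByLetter are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LetterSums {
    // Sums the numbers per letter for lines of the form "Z 3".
    static Map<String, Integer> sumByLetter(List<String> input) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (String s : input) {
            String[] parts = s.trim().split("\\s+");
            // merge stores the number if the letter is new, otherwise adds it.
            sums.merge(parts[0], Integer.parseInt(parts[1]), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Z 3", "O 9", "I 1", "J 7", "Z 7", "K 2", "O 2", "I 8", "K 8", "J 1");
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sumByLetter(input).entrySet()) {
            output.add(e.getKey() + " " + e.getValue());
        }
        output.forEach(System.out::println);
    }
}
```

With the question's input this prints Z 10, O 11, I 9, J 8 and K 10, one per line.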

Java, removing elements from an ArrayList

I'm having an issue with this project. The basic premise is that the user enters a phrase and the program is supposed to find any duplicate words and how many there are.
My issue arises when entering just one word multiple times, such as...
hello hello hello hello hello
The output for that would be;
"There are 2 duplicates of the word "hello" in the phrase you entered."
"There are 1 duplicates of the word "hello" in the phrase you entered."
This only seems to happen in cases like this. If I enter a random phrase with repeated words scattered throughout, it displays the correct answer. I think the problem has something to do with removing the duplicate words and how many times the loop iterates through the phrase, but I just cannot wrap my head around it. I've added print statements everywhere, changed the number of iterations in all sorts of ways, and threw it into a Java Visualizer, but I still couldn't find the exact problem. Any help is greatly appreciated!
This is for an assignment in my online Java course, but it's only for learning/practice and does not count toward my major. I'm not looking for answers, though, just help.
import java.util.*;
public class DuplicateWords {
public static void main(String[] args) {
List<String> inputList = new ArrayList<String>();
List<String> finalList = new ArrayList<String>();
int duplicateCounter;
String duplicateStr = "";
Scanner scan = new Scanner(System.in);
System.out.println("Enter a sentence to determine duplicate words entered: ");
String inputValue = scan.nextLine();
inputValue = inputValue.toLowerCase();
inputList = Arrays.asList(inputValue.split("\\s+"));
finalList.addAll(inputList);
for(int i = 0; i < inputList.size(); i++) {
duplicateCounter = 0;
for(int j = i + 1; j < finalList.size(); j++) {
if(finalList.get(i).equalsIgnoreCase(finalList.get(j))
&& !finalList.get(i).equals("!") && !finalList.get(i).equals(".")
&& !finalList.get(i).equals(":") && !finalList.get(i).equals(";")
&& !finalList.get(i).equals(",") && !finalList.get(i).equals("\"")
&& !finalList.get(i).equals("?")) {
duplicateCounter++;
duplicateStr = finalList.get(i).toUpperCase();
}
if(finalList.get(i).equalsIgnoreCase(finalList.get(j))) {
finalList.remove(j);
}
}
if(duplicateCounter > 0) {
System.out.printf("There are %s duplicates of the word \"%s\" in the phrase you entered.", duplicateCounter, duplicateStr);
System.out.println();
}
}
}
}
Based on some suggestions, I edited my code, but I'm not sure I'm going in the right direction:
String previous = "";
for(Iterator<String> i = inputList.iterator(); i.hasNext();) {
String current = i.next();
duplicateCounter = 0;
for(int j = + 1; j < finalList.size(); j++) {
if(current.equalsIgnoreCase(finalList.get(j))
&& !current.equals("!") && !current.equals(".")
&& !current.equals(":") && !current.equals(";")
&& !current.equals(",") && !current.equals("\"")
&& !current.equals("?")) {
duplicateCounter++;
duplicateStr = current.toUpperCase();
}
if(current.equals(previous)) {
i.remove();
}
}
if(duplicateCounter > 0) {
System.out.printf("There are %s duplicates of the word \"%s\" in the phrase you entered.", duplicateCounter, duplicateStr);
System.out.println();
}
}
The problem with your code is that when you remove an item, you still increment the index, so you skip over what would have been the next item. In abbreviated form, your code is:
for (int j = i + 1; j < finalList.size(); j++) {
String next = finalList.get(j);
if (some test on next)
finalList.remove(j);
}
After remove is called, the next item sits at the same index j, because removing an item by index shifts every item to its right one place left to fill the gap. To fix this, add this line after removing:
j--;
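That would fix your problem; however, there's a cleaner way to do this:
// inputList must be modifiable, e.g. new ArrayList<>(Arrays.asList(...)): the fixed-size list returned by Arrays.asList throws UnsupportedOperationException on remove()
String previous = "";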
for (Iterator<String> i = inputList.iterator(); i.hasNext();) {
String current = i.next();
if (current.equals(previous)) {
i.remove(); // removes current item
}
previous = current;
}
inputList now has all adjacent duplicates removed.
To remove all duplicates:
List<String> finalList = inputList.stream().distinct().collect(Collectors.toList());
If you like pain, do it "manually":
Set<String> duplicates = new HashSet<>(); // sets are unique
for (Iterator<String> i = inputList.iterator(); i.hasNext();)
if (!duplicates.add(i.next())) // add returns true if the set changed
i.remove(); // removes current item
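Here is a runnable sketch of the index-based fix described above, with j-- applied after each removal so that no element is skipped. It copies the words into a modifiable ArrayList first, bounds the outer loop by finalList.size() (a change from the question, which used inputList.size()), and leaves out the question's punctuation filter for brevity. The class DuplicateFix, the method countDuplicates and the returned Map are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateFix {
    // Returns each duplicated word with the number of extra copies found.
    static Map<String, Integer> countDuplicates(String phrase) {
        List<String> finalList = new ArrayList<>(Arrays.asList(phrase.toLowerCase().split("\\s+")));
        Map<String, Integer> duplicates = new LinkedHashMap<>();
        for (int i = 0; i < finalList.size(); i++) {
            int duplicateCounter = 0;
            for (int j = i + 1; j < finalList.size(); j++) {
                if (finalList.get(i).equals(finalList.get(j))) {
                    duplicateCounter++;
                    finalList.remove(j);
                    // Everything right of j shifted left by one, so re-check index j.
                    j--;
                }
            }
            if (duplicateCounter > 0) {
                duplicates.put(finalList.get(i), duplicateCounter);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        Map<String, Integer> d = countDuplicates("hello hello hello hello hello");
        for (Map.Entry<String, Integer> e : d.entrySet()) {
            System.out.printf("There are %d duplicates of the word \"%s\" in the phrase you entered.%n",
                    e.getValue(), e.getKey().toUpperCase());
        }
    }
}
```

For "hello hello hello hello hello" this reports a single line with 4 duplicates instead of the two lines described in the question.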
I would start by populating a Map<String, Integer> with each word, incrementing the Integer each time you encounter that word. Something like:
String inputValue = scan.nextLine().toLowerCase();
String[] words = inputValue.split("\\s+");
Map<String, Integer> countMap = new HashMap<>();
for (String word : words) {
Integer current = countMap.get(word);
int v = (current == null) ? 1 : current + 1;
countMap.put(word, v);
}
Then you can iterate over the Map's entrySet and display every key (word) whose count is greater than 1. Something like:
String msgFormat = "There are %d duplicates of the word \"%s\" in "
+ "the phrase you entered.%n";
for (Map.Entry<String, Integer> entry : countMap.entrySet()) {
if (entry.getValue() > 1) {
System.out.printf(msgFormat, entry.getValue(), entry.getKey());
}
}
Before you add inputList to finalList, remove any duplicate items from inputList.

Compare content of two text files and split words java

I know this question has already been asked several times, but I can't figure out how to apply the answers to my code.
My goal is the following:
I have two files, griechenland_test.txt and outagain5.txt. I want to read them both and then find what percentage of outagain5.txt is contained in the other file.
outagain5.txt has lines like these:
mit dem 542824
und die 517126
griechenland_test.txt is a normal Wikipedia article about that topic (ordinary text, without frequency counts).
1. Problem
- How can I split the input into bigrams, i.e. every two consecutive words, with each pair overlapping the one before? So if I have the words A, B, C, D, I want AB, BC, CD.
I have this:
while ((sCurrentLine = in.readLine()) != null) {
// System.out.println(sCurrentLine);
arr = sCurrentLine.split(" ");
for (int i = 0; i < arr.length; i++) {
if (null == hash.get(arr[i])) {
hash.put(arr[i], 1);
} else {
int x = hash.get(arr[i]) + 1;
hash.put(arr[i], x);
}
}
}
Then I read the other file with this code. I only add the words, not the number: I split each line on 4 spaces, so the two words end up in h[0].
for (String line = br.readLine(); line != null; line = br.readLine()) {
String h[] = line.split(" ");
words.add(h[0]);
}
2. Problem
Now I compare each String x in hash with each String s in words. I added the else branch with System.out.println to print the words that are not contained in outagain5.txt, but several of the printed words ARE contained in outagain5.txt. I don't understand why :D
So I think the comparison doesn't work properly, or maybe solving the first problem will fix this one too.
ArrayList<String> words = new ArrayList<String>();
ArrayList<String> neuS = new ArrayList<String>();
ArrayList<Long> neuZ = new ArrayList<Long>();
for (String x : hash.keySet()) {
summe = summe + hash.get(x);
long neu = hash.get(x);
for (String s : words) {
if (x.equals(s)) {
neuS.add(x);
neuZ.add(neu);
disc = disc + 1;
} else {
System.out.println(x);
break;
}
}
}
Hope I made my question clear, thanks a lot!!
public static List<String> ngrams(int n, String str) {
List<String> ngrams = new ArrayList<String>();
String[] words = str.split(" ");
for (int i = 0; i < words.length - n + 1; i++)
ngrams.add(concat(words, i, i+n));
return ngrams;
}
public static String concat(String[] words, int start, int end) {
StringBuilder sb = new StringBuilder();
for (int i = start; i < end; i++)
sb.append((i > start ? " " : "") + words[i]);
return sb.toString();
}
It is much easier to use the generic n-gram approach, so you can split into groups of 2 or 3 words if you want. I have used this exact code almost every time I need to split words in the (AB), (BC), (CD) format. Here is the link I grabbed the code from: NGram Sequence.
If I recall correctly, String has a method split(regex, limit) that splits the string around matches of the regular expression, and the limit parameter lets you control how many times the pattern is applied.
I am referencing this JavaDoc https://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String, int).
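To make the limit parameter concrete, here is a minimal sketch using one of the sample lines from outagain5.txt, with single spaces assumed between the fields. The class SplitLimitDemo and the method splitOnce are hypothetical names.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // With limit 2 the pattern is applied at most once,
    // so the result has at most two elements.
    static String[] splitOnce(String line) {
        return line.split(" ", 2);
    }

    public static void main(String[] args) {
        String line = "mit dem 542824";
        // No limit: split around every match.
        System.out.println(Arrays.toString(line.split(" "))); // prints [mit, dem, 542824]
        // Limit 2: only the first space is used as a split point.
        System.out.println(Arrays.toString(splitOnce(line))); // prints [mit, dem 542824]
    }
}
```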
As for comparing the two text files, I would recommend having your code read both of them, populate two separate arrays, and then compare their entries. Hope I helped.
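In the question's comparison loop, the else branch prints x and breaks as soon as the first entry of words differs from it, so x is reported as missing even when it appears later in the list. The sketch below takes a different approach: it uses a HashSet lookup instead of the nested loop. It assumes that "percentage" means the share of outagain5.txt bigrams that occur at least once in the article, and it uses made-up in-memory sample data instead of the files. The names BigramOverlap, bigrams and percentContained are hypothetical. Tokenization is plain whitespace splitting, with no punctuation or case handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BigramOverlap {
    // Overlapping word pairs: "A B C D" -> ["A B", "B C", "C D"].
    static List<String> bigrams(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            result.add(words[i] + " " + words[i + 1]);
        }
        return result;
    }

    // Percentage of reference bigrams (lines like "mit dem    542824")
    // that occur at least once in the article text.
    static double percentContained(List<String> referenceLines, String article) {
        Set<String> articleBigrams = new HashSet<>(bigrams(article));
        int total = 0;
        int found = 0;
        for (String line : referenceLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }
            total++;
            // A single set lookup replaces the inner loop, so there is no
            // else branch that fires on the first non-matching entry.
            if (articleBigrams.contains(parts[0] + " " + parts[1])) {
                found++;
            }
        }
        return total == 0 ? 0.0 : 100.0 * found / total;
    }

    public static void main(String[] args) {
        String article = "mit dem Zug und die Stadt";
        List<String> reference = Arrays.asList(
                "mit dem    542824", "und die    517126", "in der    400000");
        System.out.println(bigrams("A B C D"));
        System.out.println(percentContained(reference, article));
    }
}
```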

Apply a Frequency to an Element in an Array

I am trying to write a script that takes a set of Words (a custom class) and sorts them alphabetically into an array by their text value (this part works). From there, I count how many terms ahead of each one are the same as it, and that count becomes the frequency for all of those identical terms. This continues until every element in the array has been assigned a frequency. Finally, the elements are re-sorted back into their original positions using a pre-stored variable that holds their original order. Here is the code:
public void setFrequencies() {
List<Word> dupeWordList;
dupeWordList = new ArrayList<>(wordList);
dupeWordList.removeAll(Collections.singleton(null));
Collections.sort(dupeWordList, (Word one, Word other) -> one.getValue().compareTo(other.getValue()));
int count;
int currElement;
for(currElement = 0; currElement < dupeWordList.size(); currElement++) {
count = 1;
Word tempWord = dupeWordList.get(currElement);
tempWord.setFrequency(count);
if(currElement+1 <= dupeWordList.size() - 1) {
Word nextWord = dupeWordList.get(currElement+1);
while(tempWord.getValue().equals(nextWord.getValue())) {
count++;
currElement++;
tempWord.setFrequency(count);
for(int e = 0; e < count - 1; e++) {
Word middleWord = new Word();
if(currElement-count+2+e < dupeWordList.size() - 1) {
middleWord = dupeWordList.get(currElement-count+2+e);
}
middleWord.setFrequency(count);
}
if(currElement+1 <= dupeWordList.size() - 1) {
nextWord = dupeWordList.get(currElement+1);
} else {
break;
}
}
break;
}
}
List<Word> reSortedList = new ArrayList<>(wordList);
Word fillWord = new Word();
fillWord.setFrequency(0);
fillWord.setValue(null);
Collections.fill(reSortedList, fillWord);
for(int i = 0; i < dupeWordList.size(); i++) {
Word word = dupeWordList.get(i);
int wordOrder = word.getOrigOrder();
reSortedList.set(wordOrder, word);
}
System.out.println(Arrays.toString(DebugFreq(reSortedList)));
setWordList(reSortedList);
}
public int[] DebugFreq(List<Word> rSL) {
int[] results = new int[rSL.size()];
for(int i=0; i < results.length; i++) {
results[i] = rSL.get(i).getFrequency();
}
return results;
}
As you can see, I set up a little debug method at the bottom. When I run this method, it shows that every word was given a frequency of 1. I can't see the issue in my code, and it doesn't produce any errors. Keep in mind that I have had it display the sorted dupeWordList, and it does alphabetize correctly and there are consecutive duplicate elements in it, so this should not be happening.
If I understand you correctly, the code below would be your solution.
You have a list of strings (terms or words) sorted in alphabetical order.
// The list below is already sorted in alphabetical order.
List<String> dupeWordList = new ArrayList<>(wordList);
To count the frequency of the words in your list, a Map<String, Integer> can help:
//Take a Map with String as key and Integer as value.
Map<String,Integer> map = new HashMap<String,Integer>();
//Iterate your List
for(String s : dupeWordList)
{
if(map.containsKey(s))
{
map.put(s,map.get(s)+1);
// Autoboxing converts between int and Integer, so no cast is needed.
}else
{
map.put(s,1);
}
}
Now the map holds each word or term as a key, with its frequency as the value.
Hope it helps.
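One likely culprit in the question's code is the unconditional break; after the while loop, which exits the outer for loop once the first element has been processed. The map idea above can also be applied to the Word objects themselves, with no sorting and re-sorting: count each value in one pass, then write the count back onto every Word in a second pass. The sketch below uses a minimal hypothetical stand-in for the question's Word class (only getValue, getFrequency and setFrequency are assumed); FrequencySetter is likewise a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the question's Word class.
class Word {
    private final String value;
    private int frequency;

    Word(String value) { this.value = value; }
    String getValue() { return value; }
    int getFrequency() { return frequency; }
    void setFrequency(int frequency) { this.frequency = frequency; }
}

class FrequencySetter {
    // Pass 1 counts each value; pass 2 writes the count onto every Word.
    // The list is never reordered, so the original positions are kept.
    static void setFrequencies(List<Word> wordList) {
        Map<String, Integer> counts = new HashMap<>();
        for (Word w : wordList) {
            if (w != null) {
                counts.merge(w.getValue(), 1, Integer::sum);
            }
        }
        for (Word w : wordList) {
            if (w != null) {
                w.setFrequency(counts.get(w.getValue()));
            }
        }
    }

    public static void main(String[] args) {
        List<Word> words = new ArrayList<>(Arrays.asList(
                new Word("b"), new Word("a"), new Word("b"), new Word("c"), new Word("b")));
        setFrequencies(words);
        int[] freqs = new int[words.size()];
        for (int i = 0; i < freqs.length; i++) {
            freqs[i] = words.get(i).getFrequency();
        }
        System.out.println(Arrays.toString(freqs)); // prints [3, 1, 3, 1, 3]
    }
}
```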

how to get all permutations of strings in an array list by specified size

I am trying to get all two-word permutations of the strings (words) in a Java ArrayList. Following is the code I tried, but I am not getting all the permutations of the words in the list.
for (int y = 0; y<newList.size(); y++){
String first = newList.get(y);
String second = "";
if(y+1<newList.size()){
second = newList.get(y+1);
}
ArrayList<String> tmpArr = new ArrayList<String>();
tmpArr.add(first);
tmpArr.add(second);
ArrayList<String> retArray = combine(tmpArr);
for (int c = 0; c <retArray.size(); c++) {
System.out.println(retArray.get(c));
}
}
public static ArrayList<String> combine(ArrayList<String> arr){
ArrayList<String> retArr = new ArrayList<String>();
if(arr.size()==0){
retArr.add("");
return retArr;
}
ArrayList<String> tmpArr = (ArrayList<String>)arr.clone();
tmpArr.remove(0);
for(String str1 : combine(tmpArr)){
for(String str2 : arr){
retArr.add(str2+","+str1);
if(retArr.size() == 10)
return retArr;
}
}
return retArr;
}
Please let me know how to correct the code to get all permutations of size 2 of the words in the list (i.e. every ordered pair of two words as output).
For example, if the input data is as follows:
Input - [visit,party,delegation]
Expected Output -
[visit,party
visit,delegation
party,visit
party,delegation
delegation,visit
delegation,party]
Current Output -
[visit,party,
party,party,
party,delegation,
delegation,delegation,]
This will give you all the two-word permutations of a list of words.
List<String> strings = Arrays.asList("party","visit","delegation");
for (int i = 0; i < strings.size(); i++){
for (int j = 0; j < strings.size(); j++){
if (i != j) System.out.println(strings.get(i) + "," + strings.get(j));
}
}
EDIT: added the if statement and changed the loop since you wanted permutations of different orders as well.
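The nested loops above handle exactly two words. Since the title asks for permutations "by specified size", here is a general recursive sketch using backtracking that produces every ordered selection of k distinct positions. The class Permutations and the methods permutations and build are hypothetical names. If the input list contains repeated words, repeated output lines will appear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Permutations {
    // All ordered selections of k distinct elements (k-permutations) of items.
    static List<List<String>> permutations(List<String> items, int k) {
        List<List<String>> result = new ArrayList<>();
        build(items, k, new ArrayList<>(), new boolean[items.size()], result);
        return result;
    }

    private static void build(List<String> items, int k, List<String> current,
                              boolean[] used, List<List<String>> result) {
        if (current.size() == k) {
            result.add(new ArrayList<>(current)); // copy, since current keeps changing
            return;
        }
        for (int i = 0; i < items.size(); i++) {
            if (used[i]) {
                continue; // each position at most once per permutation
            }
            used[i] = true;
            current.add(items.get(i));
            build(items, k, current, used, result);
            current.remove(current.size() - 1); // backtrack
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("visit", "party", "delegation");
        for (List<String> p : permutations(words, 2)) {
            System.out.println(String.join(",", p));
        }
    }
}
```

For the question's input, this prints the six pairs in the same order as the expected output.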
You should step through your program with a debugger to see what happens.
My suggestion for the fault reason is: you create a new retArr in every call of combine, so you get only the results of the last call.
Make sure you use the same retArr for all executions of combine.
