I have an ArrayList of words read from a file, one word per element. Some are complete words such as "physicist", "water", and "gravity". Others are single letters that were split off during processing; for example, "it's" became "it" and "s". I want to remove all single-letter words except "I" and "A", because these are actual words.
This is the code I have for now:
for (int i = 0; i < dictionnary.size(); i++) {
    if (dictionnary.get(i).compareToIgnoreCase("I") != 0 || dictionnary.get(i).compareToIgnoreCase("A") != 0 || dictionnary.get(i).length() == 1) {
        dictionnary.remove(i);
    }
}
Here dictionnary is my ArrayList. However, when I print the contents of the ArrayList, the "s" from "it's" remains. I also know that there was originally a word "E" that was removed by the loop above. I'm confused as to why the "s" remains and how to fix it.
From my understanding, this code goes through the ArrayList and checks whether the length of each element is 1 (which is true for all single-letter words), and whether that element is "I" or "A", regardless of capitalization. It then removes the elements that are not "I" or "A".
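Consider using the collection's Iterator for safe removal of elements during iteration. Note also that your condition is always true: no word can equal both "I" and "A", so at least one of the first two tests passes and every element the loop visits is removed. The three tests need to be combined with && (length is 1, and not "I", and not "A").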
for (Iterator<String> iter = dictionary.iterator(); iter.hasNext(); ) {
    String word = iter.next();
    // Remove single-letter words, but keep "I" and "A" in either case.
    if (word.length() == 1
            && !"I".equalsIgnoreCase(word)
            && !"A".equalsIgnoreCase(word)) {
        iter.remove();
    }
}
My suggestion is the following:
You can use removeIf, which takes a predicate and removes every element that matches it.
import java.util.ArrayList;
import java.util.List;

public static void main(String[] args) {
    List<String> dictionary = new ArrayList<>();
    dictionary.add("I");
    dictionary.add("A");
    dictionary.add("p");
    dictionary.add("its");
    dictionary.add("water");
    dictionary.add("s");
    int sizeRemove = 1; // length of the words to remove
    dictionary.removeIf(
        word ->
            !"I".equalsIgnoreCase(word)
                && !"A".equalsIgnoreCase(word)
                && word.length() == sizeRemove
    );
    System.out.println(dictionary);
}
The output is the following:
[I, A, its, water]
Reference:
https://www.programiz.com/java-programming/library/arraylist/removeif
Use iterators instead. Let's say you have a list of (1,2,3,4,5) and you want to remove the numbers 2 and 3. You start looping through and get to the second element 2. Here your i is 1. You remove that element and go to i=2. What you have now is (1,3,4,5). Since i=2, you have missed one element.
That is why you should use an iterator instead. See @vsfDawg's answer.
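The (1,2,3,4,5) example above can be reproduced with a short sketch. The class and method names here are made up for illustration; the first method uses the forward index loop and shows the skipped element, the second uses an iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IndexSkipDemo {
    // Forward index loop: after remove(i), the next element slides into
    // slot i, and the following i++ steps over it without checking it.
    static List<Integer> removeWithIndex(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        for (int i = 0; i < list.size(); i++) {
            int value = list.get(i);
            if (value == 2 || value == 3) {
                list.remove(i); // i is an int, so this is remove(int index)
            }
        }
        return list;
    }

    // Iterator-based removal: the iterator keeps its own position in sync.
    static List<Integer> removeWithIterator(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            int value = it.next();
            if (value == 2 || value == 3) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(removeWithIndex(numbers));    // [1, 3, 4, 5]
        System.out.println(removeWithIterator(numbers)); // [1, 4, 5]
    }
}
```

The 3 survives the index loop because it moved into the slot that had just been examined.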
What is the best way to remove every item from an ArrayList whose characters are all the same?
Please refer to the example string list data below:
Element 1: FFFFFFFF
Element 2: 123
Element 3: ABCD1234
Element 4: FFFFFFFFFFFFFFFFF
Element 5: ABCDEF
From the above data, I want to remove 1st and 4th records because they contain all the characters as "F".
What I have tried so far is explained in pseudo-code below:
1. Iterated the list till the end in a loop
2. Get the data of current element
3. Check if the element string contains all "F" characters and nothing else.
4. If yes, note the index position of current element else move to next element
5. Use second loop to remove the elements from the stored index position
6. Here I got stuck because removing an element from arraylist changes its size and index position of remaining elements
Note: It would be more helpful if the method could take any character (for example, to remove elements that contain only "A").
You can call List.removeIf() with a regex to test for repeating characters:
listOfData.removeIf(s -> s.matches("(.)\\1*"));
To break down the regex:
. matches any character
(.) captures that first character
\1 backreferences that capture
* finds 0 or more of the same
In other words, if the string consists of only a character followed by itself n times, remove it.
If you want to test for a specific repeating character, say c, it's even easier:
listOfData.removeIf(s -> s.matches(c + "+"));
This means "match one or more instances of c". Note that this doesn't handle special characters like '('.
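For characters that have a special meaning in a regex, one option is to quote the character with Pattern.quote before building the pattern. The sketch below assumes the list is mutable; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RepeatedCharFilter {
    // Removes every element that consists only of the character c,
    // repeated one or more times.
    static void removeAllOf(char c, List<String> list) {
        // Pattern.quote makes characters such as '(' or '*' match literally.
        Pattern onlyC = Pattern.compile(Pattern.quote(String.valueOf(c)) + "+");
        list.removeIf(s -> onlyC.matcher(s).matches());
    }

    public static void main(String[] args) {
        List<String> listOfData = new ArrayList<>(Arrays.asList(
                "FFFFFFFF", "123", "ABCD1234", "FFFFFFFFFFFFFFFFF", "ABCDEF"));
        removeAllOf('F', listOfData);
        System.out.println(listOfData); // [123, ABCD1234, ABCDEF]
    }
}
```

Compiling the pattern once also avoids recompiling the regex for every element, which String.matches would do.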
String s = "FFFF"; // populate the string here
int same = 1; // the first character always matches itself
for (int j = 1; j < s.length(); j++) {
    if (s.charAt(0) == s.charAt(j)) {
        same++;
    }
}
if (s.length() == same) {
    // all characters are the same: remove the element
}
If you are using Java 8, you can use List.removeIf() as @shmosel suggested. If you need to compile your code under older Java versions, try something like the following.
public static void removeCharacterSetElementFromList(char character, ArrayList<String> list) {
    // Iterate over a copy so that removing from the original list is safe.
    ArrayList<String> listCopy = new ArrayList<String>(list);
    for (String listItem : listCopy) {
        // An empty string is never removable.
        boolean removable = listItem.length() > 0;
        for (int i = 0; i < listItem.length(); i++) {
            if (character != listItem.charAt(i)) {
                removable = false;
                break;
            }
        }
        if (removable) list.remove(listItem);
    }
}
Then you can simply call removeCharacterSetElementFromList('F', listOfData); to remove the records.
My answer to this question is as follows, but I want to know if I can use this code and what will be the complexity:
import java.util.LinkedHashMap;
import java.util.Map.Entry;

public class FirstNonRepeatingCharacterinAString {
    private char firstNonRepeatingCharacter(String str) {
        LinkedHashMap<Character, Integer> hash =
                new LinkedHashMap<Character, Integer>();
        for (int i = 0; i < str.length(); i++) {
            if (hash.get(str.charAt(i)) == null)
                hash.put(str.charAt(i), 1);
            else
                hash.put(str.charAt(i), hash.get(str.charAt(i)) + 1);
        }
        System.out.println(hash.toString());
        for (Entry<Character, Integer> c : hash.entrySet()) {
            if (c.getValue() == 1)
                return c.getKey();
        }
        return 0;
    }

    public static void main(String args[]) {
        String str = "geeksforgeeks";
        FirstNonRepeatingCharacterinAString obj =
                new FirstNonRepeatingCharacterinAString();
        char c = obj.firstNonRepeatingCharacter(str);
        System.out.println(c);
    }
}
Your question about whether you "can use this code" is a little ambiguous - if you wrote it, I'd think you can use it :)
As for the complexity, it is O(n) where n is the number of characters in the String. To count the number of occurrences, you must iterate over the entire String, plus iterate over them again to find the first one with a count of 1. In the worst case, you have no non-repeating characters, or the only non-repeating character is the last one. In either case, you have to iterate over the whole String once more. So it's O(n+n) = O(n).
EDIT
A note on ordering: your code is safe here. A default LinkedHashMap is insertion-ordered, and calling put(Character,Integer) for a key that is already present updates the value without moving the entry, so the second loop sees the characters in order of first appearance. That would not hold for an access-ordered LinkedHashMap (created with the three-argument constructor and accessOrder set to true), where every get and put moves the entry to the end. If you want to avoid the repeated get/put pair, you could store a mutable counter such as an int[] as the value and increment it in place.
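The two passes discussed in this answer can be sketched compactly with Map.merge. This is an illustrative rewrite, not the asker's code; the class name is made up, and returning null when every character repeats is a choice made here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FirstUnique {
    // Returns the first character that occurs exactly once, or null if none does.
    static Character firstNonRepeating(String str) {
        // LinkedHashMap iterates keys in the order they were first inserted.
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < str.length(); i++) {
            // Pass 1: insert 1 for a new character, otherwise add 1 to its count.
            counts.merge(str.charAt(i), 1, Integer::sum);
        }
        for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
            // Pass 2: the first entry with a count of 1 is the answer.
            if (entry.getValue() == 1) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstNonRepeating("geeksforgeeks")); // f
    }
}
```

Both passes are linear in the length of the string, which matches the O(n) bound given above.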
I have a large array list of sentences and another array list of words.
My program loops through the array list and removes an element from that array list if the sentence contains any of the words from the other.
The sentences array list can be very large, and I coded a quick and dirty nested for loop. This works when there are not many sentences, but when there are, the operation takes a ridiculously long time to finish.
for (int i = 0; i < SENTENCES.size(); i++) {
    for (int k = 0; k < WORDS.size(); k++) {
        if (SENTENCES.get(i).contains(" " + WORDS.get(k) + " ") == true) {
            //Do something
        }
    }
}
Is there a more efficient way of doing this than a nested for loop?
There are a few inefficiencies in your code, but at the end of the day, if you have to search for sentences containing words, there is no getting away from loops.
That said, there are a couple of things to try.
First, make WORDS a HashSet. Its contains method will be far quicker than an ArrayList's because it does a hash lookup instead of a linear scan.
Second, switch the logic about a bit like this:
Iterator<String> sentenceIterator = SENTENCES.iterator();
sentenceLoop:
while (sentenceIterator.hasNext())
{
    String sentence = sentenceIterator.next();
    for (String word : sentence.replaceAll("\\p{P}", " ").toLowerCase().split("\\s+"))
    {
        if (WORDS.contains(word))
        {
            sentenceIterator.remove();
            continue sentenceLoop;
        }
    }
}
This code (which assumes you're trying to remove sentences that contain certain words) uses Iterators and avoids the string concatenation and parsing logic you had in your original code (replacing it with a single regex), both of which should be quicker. Note that it lower-cases each sentence, so the entries in WORDS must be lower case too.
But bear in mind, as with all things performance you'll need to test these changes to see they improve the situation.
What you must change is the way you handle the removal of the data. This is indicated by this part of your problem description:
The sentences array list can be very large (...). While this works for when there are not many sentences, in cases where there are, the time it takes to finish this operation is ridiculously long.
The cause of this is that removal time in ArrayList takes O(N), and since you're doing this inside a loop, then it will take at least O(N^2).
I recommend using LinkedList rather than ArrayList to store the sentences, and use Iterator rather than your naive List#get since it already offers Iterator#remove in time O(1) for LinkedList.
In case you cannot change the design to LinkedList, I recommend storing the sentences that are valid in a new List, and in the end replace the contents of your original List with this new List, thus saving lot of time.
Apart from this big improvement, you can improve the algorithm even more by using a Set to store the words to lookup rather than using another List since the lookup in a Set is O(1).
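The "collect the valid sentences into a new List" idea, combined with a Set for the words, might look like the sketch below. It assumes whitespace-separated, case-sensitive tokens with no punctuation handling; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SentenceFilter {
    // Returns the sentences that contain none of the given words.
    // Appending to a new list avoids the O(N) shift of each ArrayList.remove call.
    static List<String> keepClean(List<String> sentences, Set<String> words) {
        List<String> kept = new ArrayList<>(sentences.size());
        for (String sentence : sentences) {
            boolean hit = false;
            // Assumption: tokens are separated by whitespace only.
            for (String token : sentence.split("\\s+")) {
                if (words.contains(token)) { // O(1) expected for a HashSet
                    hit = true;
                    break;
                }
            }
            if (!hit) {
                kept.add(sentence);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "the quick fox jumps", "a slow dog sleeps", "no fox here");
        Set<String> words = new HashSet<>(Arrays.asList("fox"));
        System.out.println(keepClean(sentences, words)); // [a slow dog sleeps]
    }
}
```

The caller can then replace the contents of the original list with the returned one.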
What you could do is put all your words into a HashSet. This allows you to check if a word is in the set very quickly. See https://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html for documentation.
HashSet<String> wordSet = new HashSet<>(); // diamond operator avoids the raw type
for (String word : WORDS) {
    wordSet.add(word);
}
Then it's just a matter of splitting each sentence into the words that make it up, and checking if any of those words are in the set.
for (String sentence : SENTENCES) {
    String[] sentenceWords = sentence.split(" "); // You probably want to use a regex here instead of just splitting on a " ", but this is just an example.
    for (String word : sentenceWords) {
        if (wordSet.contains(word)) {
            // The sentence contains one of the special words.
            // DO SOMETHING
            break;
        }
    }
}
I will create a set of words from second ArrayList:
Set<String> listOfWords = new HashSet<String>();
listOfWords.add("one");
listOfWords.add("two");
I will then iterate over the set and the first ArrayList and use contains:
for (String word : listOfWords) {
    for (String sentence : Sentences) {
        if (sentence.contains(word)) {
            // do something
        }
    }
}
Also, if you are free to use any open source jar, check this out:
searching string in another string
First, your program has a bug: it would not count words at the beginning and at the end of a sentence.
Your current program has runtime complexity of O(s*w), where s is the length, in characters, of all sentences, and w is the length of all words, also in characters.
If words is relatively small (a few hundred items or so) you could use regex to speed things up considerably: construct a pattern like this, and use it in a loop:
StringBuilder regex = new StringBuilder();
boolean first = true;
// Let's say WORDS={"quick", "brown", "fox"}
regex.append("\\b(?:");
for (String w : WORDS) {
    if (!first) {
        regex.append('|');
    } else {
        first = false;
    }
    regex.append(w);
}
regex.append(")\\b");
// Now regex is "\b(?:quick|brown|fox)\b", i.e. your list of words
// separated by OR signs, enclosed in non-capturing groups
// anchored to word boundaries by '\b's on both sides.
Pattern p = Pattern.compile(regex.toString());
for (int i = 0; i < SENTENCES.size(); i++) {
    if (p.matcher(SENTENCES.get(i)).find()) {
        // Do something
    }
}
Since regex gets pre-compiled into a structure more suitable for fast searches, your program would run in O(s*max(w)), where s is the length, in characters, of all sentences, and w is the length of the longest word. Given that the number of words in your collection is about 200 or 300, this could give you an order of magnitude decrease in running time.
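If the words may contain regex metacharacters, the same alternation can be built with each word quoted. The sketch below is a variant of the loop above using streams and Pattern.quote; the class and method names are made up for illustration, and it assumes a non-empty word list whose entries begin and end with word characters (so the \b anchors behave as expected).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordPattern {
    // Builds \b(?:w1|w2|...)\b with every word quoted so it matches literally.
    // Assumption: words is non-empty.
    static Pattern anyWord(List<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    public static void main(String[] args) {
        Pattern p = anyWord(Arrays.asList("quick", "brown", "fox"));
        System.out.println(p.matcher("The fox.").find());  // true
        System.out.println(p.matcher("foxes run").find()); // false
    }
}
```

Unlike the " word " check in the question, the word-boundary anchors also match words at the start or end of a sentence and next to punctuation.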
If you have enough memory you can tokenize SENTENCES and put them in a Set. Then it would be better in performance and also more correct than current implementation.
Well, looking at your code I would suggest two things that will improve the performance from each iteration:
Remove " == true". The contains operation already returns a boolean, so it is enough for the if, comparing it with true adds one extra operation for each iteration that is not needed.
Do not concatenate Strings inside a loop (" " + WORDS.get(k) + " ") as it is a quite expensive operation because + operator creates new objects. Better use a string buffer / builder and clear it after each iteration with stringBuffer.setLength(0);.
Besides that, for this case I do not know any other approach, maybe you can use regular expressions if you can abstract a pattern out of those words you want to remove and have then only one loop.
Hope it helps!
If you are concerned about efficiency, I think the most effective way to do this is the Aho-Corasick algorithm. You currently have two nested loops plus a contains() call (which, I think, takes at best time proportional to the length of the sentence plus the length of the word). Aho-Corasick needs a single loop over the sentences, and checking a sentence against all the words takes time proportional to the length of the sentence, so the cost no longer grows with the number of words (apart from the preprocessing time for building the finite state machine, which is relatively small).
I'll approach this from a more theoretical angle. If you have no memory limitation, you can try to mimic the logic of counting sort.
Say M1 = sentences.size, M2 = number of words per sentence, and N = words.size.
Assume all sentences have the same number of words, just for simplicity.
Your current approach's complexity is O(M1.M2.N).
We can create a mapping from each word to its positions in the sentences.
Loop through your ArrayList of sentences and turn it into a two-dimensional jagged array of words. Loop through the new array and create a HashMap where key, value = word, ArrayList of word positions (say with length X). That's O(2M1.M2.X) = O(M1.M2.X).
Then loop through your words ArrayList, look each word up in the HashMap, loop through its list of positions, and remove each one. That's O(N.X).
Say you need to give the result as an ArrayList of strings; then we need another loop to concatenate everything. That's O(M1.M2).
Total complexity is O(M1.M2.X) + O(N.X) + O(M1.M2).
Assuming X is much smaller than N, you'll probably get better performance.
So I have this array, and I want to delete strings that are 2 or 4 characters in length (strings that contain 2 or 4 characters). I am doing this method, and it doesn't work, even though logically, it SHOULD work.
public static void main(String[] args)
{
    ArrayList<String> list = new ArrayList<String>();
    list.add("This");
    list.add("is");
    list.add("a");
    list.add("test");
    for (int i=0; i<list.size(); i++)
    {
        if(list.get(i).length()==2 || list.get(i).length()==4)
        {
            list.remove(i);
        }
    }
}
I'd like to stick to this method of doing it. Can you please give me some suggestions as to how to correct this code?
The output of this code when I run it is:
[is, a]
Even though I want the output to be
[a]
because "is" is 2 characters long.
The list is changing. Iterate from last element to first or use iterator.
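Iterating from the last element to the first might look like the sketch below. It keeps the index-based style from the question; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BackwardRemoval {
    // Walking backward means a removal only shifts elements that
    // have already been checked, so nothing is skipped.
    static void removeLength2Or4(List<String> list) {
        for (int i = list.size() - 1; i >= 0; i--) {
            int len = list.get(i).length();
            if (len == 2 || len == 4) {
                list.remove(i);
            }
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("This", "is", "a", "test"));
        removeLength2Or4(list);
        System.out.println(list); // [a]
    }
}
```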
PeterPeiGuo is right - you are removing elements which is shifting your index.
This is a prime candidate for an iterator.
Iterator<String> it = list.iterator();
while(it.hasNext()) {
    String val = it.next();
    if(val.length() == 4 || val.length() == 2) {
        it.remove();
    }
}
Another option:
When you remove an element, decrease your index by 1.
This works, but it is not good coding style.
for (int i=0; i<list.size(); i++)
{
    if(list.get(i).length()==2 || list.get(i).length()==4)
    {
        list.remove(i);
        i--;
    }
}
Deleting things from the list changes the indexes of the remaining things in the list.
When your code runs, in the first iteration, i is 0 and it deletes the "This" entry at index 0.
On the second iteration, i is 1, so it never checks the value at index 0, which is now "is" because "This" was removed.
As PeterPeiGuo says in his answer, you can work around it in this particular case just by going backward, but traversing a collection and mutating it simultaneously always has a risk of introducing plenty of confusion.
How to find the number of occurrence of every unique character in a String? You can use at most one loop. please post your solution, thanks.
Since this sounds like a homework problem, let's try to go over how to solve this problem by hand. Once we do that, let's see how we can try to implement that in code.
What needs to be done?
Let's take the following string:
it is nice and sunny today.
In order to get a count of how many times each character appears in the above string, we should:
Iterate over each character of the string
Keep a tally of how many times each character in the string appears
How would we actually try it?
Doing this by hand might look like this:
First, we find a new character i, so we could note that in a table and say that i has appeared 1 time so far:
'i' -> 1
Second, we find another new character t, so we could add that in the above table:
'i' -> 1
't' -> 1
Third, a space, and repeat again...
'i' -> 1
't' -> 1
' ' -> 1
Fourth, we encounter an i which happens to exist in the table already. So, we'll want to retrieve the existing count, and replace it with the existing count + 1:
'i' -> 2
't' -> 1
' ' -> 1
And so on.
How to translate into code?
Translating the above to code, we may write something like this:
For every character in the string
    Check to see if the character has already been encountered
        If no, then remember the new character and say we encountered it once
        If yes, then take the number of times it has been encountered, and increment it by one
For the implementation, as others have mentioned, using a loop and a Map could achieve what is needed.
The loop (such as a for or while loop) could be used to iterate over the characters in the string.
The Map (such as a HashMap) could be used to keep track of how many times a character has appeared. In this case, the key would be the character and the value would be the count for how many times the character appears.
Good luck!
It's homework, so I cannot post the code, but here is one approach:
Iterate through the string, char by char.
Put the char in a hashmap key and initialize its value to 1 (count). Now, if the char is encountered again, update the value (count+1). Else add the new char to key and again set its value (count=1)
Here you go! I have written a rough program that counts the occurrences of each unique character:
import java.util.ArrayList;
import java.util.HashMap;

public class CountUniqueChars{
    public static void main(String args[]){
        HashMap<Character, Integer> map;
        ArrayList<HashMap<Character, Integer>> list = new ArrayList<HashMap<Character,Integer>>();
        int i;
        int x = 0;
        Boolean fire = false;
        String str = "Hello world";
        str = str.replaceAll("\\s", "").toLowerCase();
        System.out.println(str.length());
        for(i=0; i<str.length() ; i++){
            if(list.size() <= 0){
                map = new HashMap<Character, Integer>();
                map.put(str.charAt(i), 1);
                list.add(map);
            }else{
                map = new HashMap<Character, Integer>();
                map.put(str.charAt(i), 1);
                fire = false;
                for (HashMap<Character, Integer> t : list){
                    if(t.containsKey(str.charAt(i)) == map.containsKey(str.charAt(i))){
                        x = list.indexOf(t);
                        fire = true;
                        map.put(str.charAt(i), t.get(str.charAt(i))+1);
                    }
                }
                if(fire){
                    list.remove(x);
                }
                list.add(map);
            }
        }
        System.out.println(list);
    }
}
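The tally-in-a-map approach described in the earlier answers can also be sketched with a single map and a single loop. This is an illustrative alternative, not a rewrite of the program above; the class name is made up, and a LinkedHashMap is used only so the output follows the order of first appearance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CharCounter {
    // Counts how many times each character occurs, using exactly one loop.
    static Map<Character, Integer> count(String text) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < text.length(); i++) {
            // New character: store 1. Seen before: add 1 to the stored count.
            counts.merge(text.charAt(i), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello")); // {h=1, e=1, l=2, o=1}
    }
}
```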