Duplicates in Array even though a Set was used

For a class project, we have to take a string (a paragraph), make it into an array of the individual words, and then turn those words into objects stored in an Object array. The words cannot repeat, so I used a Set to keep only the unique values, but certain words are still repeating! Here is the code for the method. Sorry for the vague description.
private void processDocument()
{
String r = docReader.getLine();
lines++;
while(docReader.hasLines()==true)
{
r= r+" " +docReader.getLine();
lines++;
}
r = r.trim();
String[] linewords = r.split(" ");
while(linewords.length>words.length)
{
this.expandWords();
}
String[] newWord = new String[linewords.length];
for(int i=0;i<linewords.length;i++)
{
newWord[i] = (this.stripPunctuation(linewords[i]));
}
Set<String> set = new HashSet<String>(Arrays.asList(newWord));
Object[]newArray = set.toArray();
words = new Word[set.size()-1];
String newString = null;
for(int i =0;i<set.size();i++)
{
if(i==0)
{
newString = newArray[i].toString() + "";
}
else
{
newString = newString+newArray[i].toString()+" ";
}
}
newString = newString.trim();
String[] newWord2 = newString.split(" ");
for(int j=0;j<set.size()-1;j++)
{
Word newWordz = new Word(newWord2[j].toLowerCase());
words[j] = newWordz;
}

I believe the problem is that the words are capitalized differently when you put them into the HashSet: "The" and "the" are different strings (they are not equal() and have different hash codes), so the Set keeps both. Convert everything to lowercase the moment you read it from the file and it should work:
newWord[i] = (this.stripPunctuation(linewords[i])).toLowerCase();

Try this:
public String[] unique(String[] array) {
// toArray(new String[0]) returns a String[]; the no-argument toArray() returns Object[] and would not compile here
return new HashSet<String>(Arrays.asList(array)).toArray(new String[0]);
}
Shamelessly copied from Bohemian's answer.
Also, as noted by @Brinnis, make sure that words are trimmed and in the right case:
for(int i = 0; i < linewords.length; i++) {
newWord[i] = this.stripPunctuation(linewords[i]).trim().toLowerCase();
}
String[] newArray = unique(newWord);
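A minimal self-contained sketch of the fix suggested above: normalize every token (strip punctuation, trim, lower-case) before it goes into the Set. The regex used here is an assumed stand-in for the question's stripPunctuation() method, which is not shown, and LinkedHashSet is used only so the result keeps first-seen order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class UniqueWords {
    // Normalizes each token before deduplicating, so "The", "the" and "THE" collapse to one entry.
    static String[] uniqueWords(String paragraph) {
        Set<String> seen = new LinkedHashSet<>(); // keeps first-seen order
        for (String token : paragraph.trim().split("\\s+")) {
            // Assumed stand-in for stripPunctuation(): drop everything except letters, digits and apostrophes
            String w = token.replaceAll("[^\\p{L}\\p{N}']", "").toLowerCase(Locale.ROOT);
            if (!w.isEmpty()) {
                seen.add(w);
            }
        }
        return seen.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(uniqueWords("The cat saw the Cat. THE end!")));
    }
}
```

Because normalization happens before the add, the Set's equals()/hashCode() comparison sees identical strings and keeps only one copy of each word.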

Related

Text to string array and delete duplicates

The idea of the program is that it reads text from the scanner and splits it by spaces.
I need to write a method that creates an array from the text, removes duplicates, and returns an array of the words that are used only once.
I can't figure out how to make a new array of unique words. I may only use simple, basic constructs (no HashSet etc.).
For example:
a b a b c a b d
result:
c d
public static String Dublicate(String text) {
String[] dublic = text.split(" ");
String result="";
for (int i = 0; i < dublic.length; i++) {
for (int j = i + 1; j < dublic.length; j++)
if (dublic[i].equals(dublic[j]))
dublic[j] = "delete";
}
for (String s: dublic) {
if (s !="delete") {
result =result + s + " ";
}
}
return result;
}

Split By Space
For splitting by space we can use the split() method and pass a string containing a single space (" ") as the parameter.
String[] texts = text.split(" ");
Delete the duplicate elements
If we can use Java 1.8 or later, we can use the Stream API to get the distinct elements:
Arrays.stream(texts).distinct().toArray(String[]::new);
Or, if we need to implement it in Java 1.7, we can use a HashSet to get the distinct elements:
String[] distinctElements = new HashSet<String>(Arrays.asList(texts)).toArray(new String[0]);
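The final source code can look like this:
public static String[] textToArray1_7(String text) {
//split by space
String[] texts = text.split(" ");
//Distinct values via HashSet (works in Java 1.7; order is not preserved)
return new HashSet<String>(Arrays.asList(texts)).toArray(new String[0]);
}
public static String[] textToArray1_8(String text) {
//split by space
String[] texts = text.split(" ");
//Distinct values via the Stream API (Java 1.8+; keeps first-occurrence order)
return Arrays.stream(texts).distinct().toArray(String[]::new);
}
Note that both methods keep one copy of every word (a b c d for the example input); they do not drop the words that occur more than once.
If you have any further questions, feel free to ask for clarification.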
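If collections are allowed, a sketch of what the question actually asks for (words that occur exactly once, not merely distinct words) can count occurrences with Collectors.groupingBy and keep the entries whose count is 1. The class and method names are illustrative, and runs of whitespace are treated as a single separator.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordsUsedOnce {
    // Counts every word (LinkedHashMap keeps input order), then keeps only words seen exactly once.
    static String[] wordsUsedOnce(String text) {
        Map<String, Long> counts = Arrays.stream(text.trim().split("\\s+"))
                .collect(Collectors.groupingBy(Function.identity(),
                        LinkedHashMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == 1L)
                .map(Map.Entry::getKey)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", wordsUsedOnce("a b a b c a b d")));
    }
}
```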

You forgot to mark the i-th element as a duplicate when it actually is one. See my comments in the code below:
public static String Dublicate(String text) {
String[] dublic = text.split(" ");
String result="";
for (int i=0; i<dublic.length; i++){
if (dublic[i].equals("delete")) { // Minor optimization:
// skip elements that are already marked
continue;
}
boolean isDub = false; // we need to track i-th element
for(int j=i+1; j<dublic.length; j++) {
if (dublic[i].equals(dublic[j])) {
dublic[j] = "delete";
isDub = true; // i-th element is also a duplicate...
}
}
if (isDub) {
dublic[i] = "delete"; // ...so you should also mark it
}
}
for(String s: dublic){
if(!s.equals("delete")) { // for strings you should use "!equals" instead of "!="
result = result + s + " ";
}
}
return result;
}
P.S. If the original text contains the word "delete", the result will be incorrect, since you use "delete" as a reserved marker word.
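A sketch that avoids the reserved-word problem from the P.S. by recording duplicates in a parallel boolean[] instead of overwriting words with "delete". It uses only arrays and loops, in line with the question's restriction; the class and method names are illustrative.

```java
public class UniqueWithoutMarker {
    // Same O(n^2) pairwise comparison, but duplicates are flagged in dup[]
    // so no word value is reserved as a marker.
    static String wordsUsedOnce(String text) {
        String[] words = text.trim().split("\\s+");
        boolean[] dup = new boolean[words.length];
        for (int i = 0; i < words.length; i++) {
            if (dup[i]) {
                continue; // an earlier equal word already flagged all later copies
            }
            for (int j = i + 1; j < words.length; j++) {
                if (words[i].equals(words[j])) {
                    dup[i] = true;
                    dup[j] = true;
                }
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (!dup[i]) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(words[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordsUsedOnce("a b a b c a b d delete"));
    }
}
```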

If the array of unique strings needs to be returned, then the array of strings produced by splitting the input has to be compacted to exclude the duplicates (set to null below), and a smaller copy returned:
public static String[] uniques(String text) {
String[] words = text.split(" ");
int p = 0; // index/counter of unique elements
for (int i = 0; i < words.length; i++) {
String curr = words[i];
if (null == curr) {
continue;
}
boolean dupFound = false;
for (int j = i + 1; j < words.length; j++) {
if (null == words[j]) {
continue;
}
if (curr.equals(words[j])) {
words[j] = null;
dupFound = true;
}
}
if (dupFound) {
words[i] = null;
} else {
words[p++] = words[i]; // shift unique elements to the start of array
}
}
return Arrays.copyOf(words, p);
}
If the array of unique strings is returned, it can be conveniently converted into a String using String.join, as shown in the test below.
Test:
System.out.println(Arrays.toString(uniques("a b a b c a b d")));
System.out.println(String.join(" ", uniques("a b a b c a b d")));
Output
[c, d]
c d

How do I exclude capitalizing specific words in a String?

I'm new to programming, and here I'm required to capitalise the user's input, except for certain words.
For example, if the input is
THIS IS A TEST, I get This Is A Test
However, I want to get the format This is a Test
String s = in.nextLine();
StringBuilder sb = new StringBuilder(s.length());
String wordSplit[] = s.trim().toLowerCase().split("\\s");
String[] t = {"is","but","a"};
for(int i=0;i<wordSplit.length;i++){
if(wordSplit[i].equals(t))
sb.append(wordSplit[i]).append(" ");
else
sb.append(Character.toUpperCase(wordSplit[i].charAt(0))).append(wordSplit[i].substring(1)).append(" ");
}
System.out.println(sb);
}
This is the closest I have gotten so far but I seem to be unable to exclude capitalising the specific words.

The problem is that you are comparing each word to the entire array. Java does not disallow this, but it does not really make a lot of sense. Instead, you could loop over the array and compare each element, but that's a bit lengthy in code, and also not very fast if the array of words gets bigger.
Instead, I'd suggest creating a Set from the array and checking whether it contains the word:
String[] t = {"is","but","a"};
Set<String> t_set = new HashSet<>(Arrays.asList(t));
...
if (t_set.contains(wordSplit[i])) {
...

Your problem (as pointed out by @sleepToken) is that
if(wordSplit[i].equals(t))
is checking to see if the current word is equal to the array containing your keywords.
Instead what you want to do is to check whether the array contains a given input word, like so:
if (Arrays.asList(t).contains(wordSplit[i].toLowerCase()))
Note that List.contains() is case-sensitive (there is no case-insensitive variant), so it's important to convert the word in question to lower case before searching for it.
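If a case-insensitive lookup is wanted without lower-casing the input first, one option is a TreeSet built with String.CASE_INSENSITIVE_ORDER, whose contains() compares through that comparator instead of equals(). A small sketch; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class IgnoreCaseLookup {
    // The comparator treats "IS" and "is" as the same element,
    // so contains() effectively ignores case.
    private static final Set<String> SKIP = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

    static {
        SKIP.addAll(Arrays.asList("is", "but", "a"));
    }

    static boolean isSkipped(String word) {
        return SKIP.contains(word);
    }

    public static void main(String[] args) {
        System.out.println(isSkipped("IS"));   // true
        System.out.println(isSkipped("Test")); // false
    }
}
```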

You're already doing the iteration once. Just do it again; iterate through every String in t for each String in wordSplit:
for (int i = 0; i < wordSplit.length; i++){
boolean found = false;
for (int j = 0; j < t.length; j++) {
if(wordSplit[i].equals(t[j])) {
found = true;
}
}
if (found) { /* do your stuff */ }
else { }
}

First of all, write a method that checks whether the array contains the word:
static boolean contains(String[] arr, String word) {
for (int i = 0; i < arr.length; i++) {
if (word.equals(arr[i])) {
return true;
}
}
return false;
}
Then change your condition wordSplit[i].equals(t) to contains(t, wordSplit[i]).

In the line if(wordSplit[i].equals(t)) you are not comparing against each word to ignore; you are comparing the word with the whole array.
You can do something like this:
import java.util.Arrays;
import java.util.List;
public class Sample {
public static void main(String[] args) {
String s = "THIS IS A TEST";
String[] ignore = {"is","but","a"};
List<String> toIgnoreList = Arrays.asList(ignore);
StringBuilder result = new StringBuilder();
for (String s1 : s.split(" ")) {
if(!toIgnoreList.contains(s1.toLowerCase())) {
result.append(s1.substring(0,1).toUpperCase())
.append(s1.substring(1).toLowerCase())
.append(" ");
} else {
result.append(s1.toLowerCase())
.append(" ");
}
}
System.out.println("Result: " + result);
}
}
Output is:
Result: This is a Test

To check the words to exclude, the List.contains() method would be a better choice.
The expression below checks whether the exclude list contains the word and, if not, capitalises the first letter:
tlist.contains(x) ? x : x.substring(0,1).toUpperCase() + x.substring(1)
The expression corresponds to:
if(tlist.contains(x)) { // ?
x = x; // do nothing
} else { // :
x = x.substring(0,1).toUpperCase() + x.substring(1);
}
or:
if(!tlist.contains(x)) {
x = x.substring(0,1).toUpperCase() + x.substring(1);
}
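If you're allowed to use Java 8 (this needs java.util.List, java.util.Arrays, java.util.stream.Stream and java.util.stream.Collectors):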
String s = in.nextLine();
String wordSplit[] = s.trim().toLowerCase().split("\\s");
List<String> tlist = Arrays.asList("is","but","a");
String result = Stream.of(wordSplit).map(x ->
tlist.contains(x) ? x : (x = x.substring(0,1).toUpperCase() + x.substring(1)))
.collect(Collectors.joining(" "));
System.out.println(result);
Output:
This is a Test
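None of the versions above capitalise an excluded word when it starts the sentence (input "a test" gives "a Test"). A sketch that always capitalises the first word and tolerates repeated spaces, assuming that is the desired title-case rule; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TitleCase {
    // Words kept in lower case unless they are the first word (assumed rule).
    private static final Set<String> SMALL_WORDS = new HashSet<>(Arrays.asList("is", "but", "a"));

    static String titleCase(String input) {
        String[] words = input.trim().toLowerCase(Locale.ROOT).split("\\s+");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            String w = words[i];
            if (w.isEmpty()) {
                continue; // only happens for blank input
            }
            boolean keepLower = i > 0 && SMALL_WORDS.contains(w);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(keepLower ? w : Character.toUpperCase(w.charAt(0)) + w.substring(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(titleCase("THIS IS A TEST"));
        System.out.println(titleCase("a TEST but not IS"));
    }
}
```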

ArrayIndexOutOfBounds when trying to add to list from string array

I am having one problem that is preventing my entire code from working. It throws an ArrayIndexOutOfBoundsException, but the loop matches the file array perfectly, so I'm not sure what the problem is.
public void Menu() {
prompt.welcomeMsg();
prompt.nGramOptionMsg();
String userInput = input.next();
while (userInput.charAt(0) != 's' || userInput.charAt(0) != 'S') {
if (userInput.charAt(0) == 'n' || userInput.charAt(0) == 'N') {
prompt.nGramLengthMsg();
int userIntut = input.nextInt();
nGram = new NGram(userIntut);
prompt.fileUpload();
String userFilePut = input.next();
FileOpener file = new FileOpener(userFilePut);
String[] fileArray = file.openFile();
for (int i = 0; i < fileArray.length; i++) {
String[] splitedFileArray = fileArray[i].split("\\s+");
list.add(splitedFileArray[i]);
}
String[] listToStringArray = (String[]) list.toArray(new String[0]);
String[] nGrams = nGram.arrayToNGram(fileArray);
for (int i = 0; i < nGrams.length; i++) {
Word word;
if (!hashMap.containsKey(nGrams[i])) {
word = new Word(nGrams[i], 1);
hashMap.put(word.getNGram(), word);
} else {
Word tempWord = hashMap.remove(nGrams[i]);
tempWord.increaseAbsoluteFrequency();
hashMap.put(tempWord.getNGram(), tempWord);
}
}
HashMapFiller fill = new HashMapFiller();
fill.hashMap(hashMap);
fill.print();
prompt.goAgain();
}
}
The problem occurs when list.add tries to add an element of splitedFileArray. I tried using fileArray.length-1 as the loop bound, but it gave a similar error, just with -1.

The root cause of this problem is the following line, where you index the array returned by split() with i. That i is the index into fileArray (the line number), not into the words of the current line, and the array returned by split() can have fewer than i + 1 elements, for example when a later line contains only one or two words:
list.add(splitedFileArray[i]);
You can resolve this by adding all the words of each line:
for (int i = 0; i < fileArray.length; i++) {
String[] splitedFileArray = fileArray[i].split("\\s+");
list.addAll(Arrays.asList(splitedFileArray));
}
Hope this answer helps you resolve your problem.
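A self-contained sketch of the addAll fix, with one extra assumption: lines are trimmed and blank lines skipped, because "".split("\\s+") returns a single empty string and a leading space produces an empty first token. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitLines {
    // Collects every word from every line; nothing is indexed by the line number.
    static List<String> wordsOf(String[] lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // avoid adding a lone "" for blank lines
            }
            words.addAll(Arrays.asList(trimmed.split("\\s+")));
        }
        return words;
    }

    public static void main(String[] args) {
        String[] fileArray = {"the quick", "  brown fox jumps ", "", "over"};
        System.out.println(wordsOf(fileArray));
    }
}
```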

A function that display the same text with two letters reversed

I'm trying to make an encryptor. What I want it to do:
get the text I enter, reverse the first two letters of every word,
and then display it again.
I have tried a lot of ways. This is the last one I've tried:
private void TranslateToEf(){
String storage = Display.getText();
String[] arr = storage.split("\\W+");
for ( String ss : arr) {
char c[] = ss.toCharArray();
char temp = c[0];
c[0] = c[1];
c[1] = temp;
String swappedString = new String(c);
Display.appendText(swappedString + " ");
}
}

You may want to consider keeping all the delimiters lost by the first String.split("\\W+") so they can be included in the final result. I would do that with String.split("\\w+").
You may also want to consider that when you swap the first two letters of a capitalised word, the first letter should become lowercase and the second letter uppercase. Otherwise, just do a direct swap.
Code sample:
public static void main(String[] args) throws Exception {
String data = "Hello;World! My name is John. I write code.";
String[] words = data.split("\\W+");
String[] delimiters = data.split("\\w+");
int delimiterIndex = 0;
StringBuilder sb = new StringBuilder();
for (String word : words) {
if (word.length() < 2) {
sb.append(word);
} else {
char firstLetter = word.charAt(0);
char secondLetter = word.charAt(1);
if (Character.isUpperCase(firstLetter)) {
// Swap the first two letters and change casing
sb.append(Character.toUpperCase(secondLetter))
.append(Character.toLowerCase(firstLetter));
} else {
// Swap the first two letters
sb.append(secondLetter)
.append(firstLetter);
}
// Append the rest of the word past the first two letters
sb.append(word.substring(2));
}
// Append delimiters
if (delimiterIndex < delimiters.length) {
// Skip blank delimiters if there are any
while (delimiters[delimiterIndex].isEmpty()) {
delimiterIndex++;
}
// Append delimiter
sb.append(delimiters[delimiterIndex++]);
}
}
data = sb.toString();
// Display result
System.out.println(data);
}
Results:
Ehllo;Owrld! Ym anme si Ojhn. I rwite ocde.
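An alternative sketch that keeps every delimiter without a second split: Matcher.find() walks the runs of word characters (\w+), and appendReplacement/appendTail copy the text between them unchanged. Unlike the answer above, it does a plain swap without the case adjustment, and words shorter than two characters are left alone; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SwapFirstTwo {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Swaps the first two characters of each \w+ run; spaces and punctuation are copied as-is.
    static String swapFirstTwo(String text) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String w = m.group();
            String swapped = w.length() < 2 ? w : "" + w.charAt(1) + w.charAt(0) + w.substring(2);
            // quoteReplacement guards against '$' or '\' being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(swapped));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(swapFirstTwo("Hello;World! My name is John."));
    }
}
```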

public class Encrypto {
public static void main(String[] args) {
String input="Hello World";
String [] word = input.split(" ");
// System.out.println(word[0]);
String encryWord="";
for(int i=0;i<word.length;i++){
if (word[i].length() > 1) { // need at least two letters to swap
String tmp0 = String.valueOf(word[i].charAt(1));
String tmp1 = String.valueOf(word[i].charAt(0));
encryWord += tmp0.toLowerCase() + tmp1.toLowerCase() + word[i].substring(2) + " ";
}else{
encryWord += word[i] + " "; // keep the separator after one-letter or empty words
}
}
System.out.println(encryWord);
}
}
I think this answer will be helpful for you.

There are a few problems.
Declare zz outside the loop if you want to use it outside.
Append zz on every iteration. Not just assign it.
Something like this,
private void TranslateToEf(){
String storage = Display.getText();
String[] arr = storage.split("\\W+");
String zz = "";
for ( String ss : arr) {
char c[] = ss.toCharArray();
char temp = c[0];
c[0] = c[1];
c[1] = temp;
String swappedString = new String(c);
String b= " ";
zz += swappedString + b;
}
Display.setText(zz + " ");
}
You are splitting with non-word (\W+) characters, but replacing it only with a space " ". This could alter the string with special characters.

I'm not sure exactly what you are looking for, but here is a small modification of your code; see if it suits your needs:
String storage = "Test test t";
String[] arr = storage.split("\\W+");
String abc = "";
for ( String ss : arr) {
if(ss.length() > 1)
{
char c[] = ss.toCharArray();
char temp = c[0];
c[0] = c[1];
c[1] = temp;
String swappedString = new String( c );
String b = " ";
String zz = swappedString + b;
abc = abc + zz;
}else{
abc = abc + ss + " "; // keep the separator after one-letter words too
}
}
System.out.println(abc);

In Java strings are immutable. You can't modify them "on the fly", you need to reassign them to a new instance.
Additionally, you are setting the last display text to zz, but zz is a local variable to your loop, and therefore it gets re-instantiated with every iteration. In other words, you would be assigning to display only the last word!
Here is what you have to do to make it work:
String storage = Display.getText();
String[] arr = storage.split("\\W+");
String[] newText = new String[arr.length];
for ( int i = 0; i<arr.length; i++) {
String original = arr[i];
// "" + forces String concatenation; char + char alone would add the numeric character codes
String modified = "" + original.charAt(1) + original.charAt(0) + original.substring(2);
newText[i] = modified;
}
//Join with spaces
String modifiedText = Arrays.asList(newText).stream().collect(Collectors.joining(" "));
Display.setText(modifiedText);
Note that:
1) We are assuming all strings have at least 2 chars
2) We are assuming your splitting logic is correct. Can you think of some edge cases where your regexp fails?

Separating an address line into House Number, Street name, and Apartment in Java or COBOL

I am currently trying to figure out the best way to take an address line and separate it into three fields for a file: house number, street name, and apartment number. Thankfully, the city, state, and zip are already in columns, so all I have to parse out are the three things listed above, but even that is proving difficult. My initial hope was to do this in COBOL using SQL, but I don't think I am able to use the PATINDEX example someone else had listed in a separate question thread; I kept getting SQL code -440. My second thought was to do this in Java, using the strings as arrays and checking the arrays for numbers, then letters, then a compare for "Apt" or something to that effect. I have the following so far to test out what I'm ultimately trying to do, but I am getting an out-of-bounds exception for the array.
class AddressTest{
public static void main (String[] arguments){
String adr1 = "100 village rest court";
String adr2 = "1000 Arbor lane Apt. 21-D";
String[] HouseNbr = new String[9];
String[] Street = new String[20];
String[] Apt = new String[5];
for(int i = 0; i < adr1.length();i++){
String[] forloop = new String[] {adr1};
if (forloop[i].substring(0,1).matches("[0-9]")){
if(forloop[i+1].substring(0,1).matches("[0-9]")){
HouseNbr[i] = forloop[i];
}
else if(forloop[i+1].substring(0,1).matches(" ")){
}
else if(forloop[i].substring(0,1).matches(" ")){
}
else{
Street[i] = forloop[i];
}
}
}
for(int j = 0; j < HouseNbr.length; j++){
System.out.println(HouseNbr[j]);
}
for(int k = 0; k < Street.length; k++){
System.out.println(Street[k]);
}
}
}
Any other thoughts would be extremely helpful.

I would consider removing the unnecessary arrays and using a StringTokenizer (StringUtils comes from Apache Commons Lang 3):
import java.util.StringTokenizer;
import org.apache.commons.lang3.StringUtils;
public class TokenizerExample {
public static void main(String[] args) {
String number = null;
String address = null; // not filled in by this example
String aptNumber = null;
String str = "This is String , split by StringTokenizer";
StringTokenizer st = new StringTokenizer(str);
System.out.println("---- Split by space ------");
while (st.hasMoreTokens()) {
String s = st.nextToken(); // println returns void, so read the token first
System.out.println(s);
if (StringUtils.isNumeric(s)) {
number = s;
continue;
}
if (s.contains("Apt")) { // indexOf returns an int, not a boolean
aptNumber = s.substring(s.indexOf("Apt"));
continue;
}
}
System.out.println("---- Split by comma ',' ------");
StringTokenizer st2 = new StringTokenizer(str, ",");
while (st2.hasMoreTokens()) {
System.out.println(st2.nextToken());
}
}
}

If you leverage the freely available U.S. Postal Service zip code finder (https://tools.usps.com/go/ZipLookupAction!input.action), you can get back an address in standardized format. The valid options on that format are documented by the USPS and will make it easier to write a very complicated regex, or a number of simple regexes, to read the standard form.

I am still working on it, but for anyone in the future who may need to do this:
import java.util.Arrays;
import java.util.StringTokenizer;
import org.apache.commons.lang3.*;
class AddressTest{
public static void main (String[] arguments){
String adr1 = "100 village rest court";
//String adr2 = "1000 Arbor lane Apt. 21-D";
String reader = new String();
String holder = new String();
StringTokenizer a1 = new StringTokenizer(adr1);
String[] HouseNbr = new String[9];
String[] StreetName = new String[20];
String[] Apartment = new String[5];
int counter = 0;
while(a1.hasMoreElements()){
reader = a1.nextElement().toString();
System.out.println("Reader: " + reader);
if(StringUtils.isNumeric(reader)){
// split("") yields one-character strings; since Java 8 there is no leading empty element, so index from 0
String[] HNBR = reader.split("");
for(int i = 0; i < HNBR.length && i < HouseNbr.length; i++){
System.out.println("HNBR:_" + HNBR[i]);
HouseNbr[i] = HNBR[i];
}
}
else if(StringUtils.startsWith(reader, "Apt.") && a1.hasMoreElements()){
holder = a1.nextElement().toString();
String[] ANBR = holder.split("");
for(int j = 0; j < ANBR.length && j < Apartment.length; j++){
Apartment[j] = ANBR[j];
}
}
else{
String STR[] = reader.split("");
for(int k = 0; k < STR.length; k++){
if(counter == StreetName.length){
break;
}
else{
StreetName[counter] = STR[k];
counter++;
}
}
if((counter < StreetName.length) && a1.hasMoreElements()){
StreetName[counter] = " ";
counter++;
}
}
}
System.out.println(Arrays.toString(HouseNbr) + " " + Arrays.toString(StreetName)
+ " " + Arrays.toString(Apartment));
}
}
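For comparison, a regex-based sketch that splits a line into house number, street and apartment. The pattern encodes an assumed layout (leading digits, then the street, then an optional "Apt" or "Apt." followed by one token); real addresses vary far more, which is why standardizing them first, for example through the USPS lookup, helps. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressSplit {
    // Assumed layout: digits, whitespace, street, then optionally "apt" followed by
    // a dot or whitespace (case-insensitive) and the apartment designator.
    private static final Pattern ADDRESS = Pattern.compile(
            "^\\s*(\\d+)\\s+(.*?)(?:\\s+apt(?:\\.|\\s)\\s*(\\S+))?\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Returns {houseNumber, street, apartment}; apartment is null when absent,
    // and the whole result is null when the line does not start with a number.
    static String[] split(String line) {
        Matcher m = ADDRESS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        for (String adr : new String[] {"100 village rest court", "1000 Arbor lane Apt. 21-D"}) {
            String[] parts = split(adr);
            System.out.println(parts == null ? "no match: " + adr
                    : parts[0] + " | " + parts[1] + " | " + parts[2]);
        }
    }
}
```

Requiring a dot or whitespace right after "apt" keeps street names such as "Aptos Road" from being read as an apartment.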
