Sorting an ArrayList<String> in a TreeMap - java

I am piping in a file and tracking word pairs from it. Because I use a TreeMap, the keys are all sorted. However, the words I add under each key are not sorted.
Here is the part of the process function I need help with:
private static void process(){
if(!result.containsKey(thisWord)){
result.put(thisWord, new ArrayList<String>());
}
// Add nextWord to the list of adjacent words to thisWord:
result.get(thisWord).add(nextWord); // nextword is not sorted within the key
thisWord is sorted; nextWord is not.
Can I use Collections.sort(result); somehow?
I'm just not sure how to get to the nextWord list within result to do that.
Or is there no way to do it in my situation? I would rather not change things unless you recommend it.
This is the program:
import java.util.Map.Entry;
import java.util.TreeSet;
import java.io.*;
import java.util.*;
public class program1 {
private static List<String> inputWords = new ArrayList<String>();
private static Map<String, List<String>> result = new TreeMap<String, List<String>>();
public static void main(String[] args) {
collectInput();
process();
generateOutput();
}
private static void collectInput(){
Scanner sc = new Scanner(System.in);
String word;
while (sc.hasNext()) { // is there another word?
word = sc.next(); // get next word
if (word.equals("---"))
{
break;
}
inputWords.add(word);
}
}
private static void process(){
// Iterate through every word in our input list
for(int i = 0; i < inputWords.size() - 1; i++){
// Create references to this word and next word:
String thisWord = inputWords.get(i);
String nextWord = inputWords.get(i+1);
// If this word is not in the result Map yet,
// then add it and create a new empty list for it.
if(!result.containsKey(thisWord)){
result.put(thisWord, new ArrayList<String>());
}
// Add nextWord to the list of adjacent words to thisWord:
result.get(thisWord).add(nextWord); // need to sort nextword
// Collections.sort(result);
}
}
private static void generateOutput()
{
for(Entry e : result.entrySet()){
System.out.println(e.getKey() + ":");
// Count the number of unique instances in the list:
Map<String, Integer> count = new HashMap<String, Integer>();
List<String> words = (List)e.getValue();
for(String s : words){
if(!count.containsKey(s)){
count.put(s, 1);
}
else{
count.put(s, count.get(s) + 1);
}
}
// Print the occurrences of following symbols:
for(Entry f : count.entrySet()){
System.out.println(" " + f.getKey() + ", " + f.getValue() );
}
}
System.out.println();
}
}

If you want the collection of "nextword"s sorted, why not use a TreeSet rather than an ArrayList? The only reason I can see against it is if you might have duplicates. If duplicates are allowed, then yes, use Collections.sort on the ArrayList when you're done adding to them. Or look in the Apache Commons or Google collection classes - I don't know them off the top of my head, but I'm sure there is a sorted List that allows duplicates in one or both of them.
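Both options could be sketched roughly as follows. This is only an illustration, not the asker's full program: the class name AdjacentWords and the method names sortedWithDuplicates and sortedUnique are made up for the example, and computeIfAbsent assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class AdjacentWords {

    // Option 1: keep duplicates in a List and sort each list once, after all words are added.
    static Map<String, List<String>> sortedWithDuplicates(List<String> words) {
        Map<String, List<String>> result = new TreeMap<>();
        for (int i = 0; i < words.size() - 1; i++) {
            result.computeIfAbsent(words.get(i), k -> new ArrayList<>()).add(words.get(i + 1));
        }
        for (List<String> followers : result.values()) {
            Collections.sort(followers);
        }
        return result;
    }

    // Option 2: a TreeSet keeps the followers sorted as they are added, but drops duplicates.
    static Map<String, SortedSet<String>> sortedUnique(List<String> words) {
        Map<String, SortedSet<String>> result = new TreeMap<>();
        for (int i = 0; i < words.size() - 1; i++) {
            result.computeIfAbsent(words.get(i), k -> new TreeSet<>()).add(words.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "z", "b", "a", "b", "z");
        System.out.println(sortedWithDuplicates(words)); // "b" keeps both occurrences of "z"
        System.out.println(sortedUnique(words));         // "b" keeps "z" only once
    }
}
```

Since the original generateOutput counts how often each following word occurs, the first option is probably the one that fits that program, because a TreeSet would lose the repeats.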

result.get(thisWord).add(nextWord);
Collections.sort(result.get(thisWord));

Why don't you try something like this?
Collections.sort(inputWords);


Split a List of Strings based on value in Java

What I want to do is to split an array of strings, when the first 6 characters in the string are zeroes ("000000") or when all the digits in the string are zeroes. Limiting to 6 characters won't be very dynamic.
I got this code, and it does what I want to achieve.
import java.util.*;
public class Main
{
public static void main(String[] args) {
ArrayList<String> unsplitted = new ArrayList<String>();
unsplitted.add("000000: this_should_go_into_first_array");
unsplitted.add("000234: something1");
unsplitted.add("0000ff: something2");
unsplitted.add("000111: something3");
unsplitted.add("000051: something4");
unsplitted.add("007543: something5");
unsplitted.add("000000: and_this_should_go_into_second_array");
unsplitted.add("005612: something7");
unsplitted.add("005712: something8");
System.out.println("Unsplitted list: "+ unsplitted);
List<String> arrlist1 = unsplitted.subList(0, 6);
List<String> arrlist2 = unsplitted.subList(6, unsplitted.size());
System.out.println("Sublist of arrlist1: "+ arrlist1);
System.out.println("Sublist of arrlist2: "+ arrlist2);
}
}
This prints the wanted results:
Sublist of arrlist1: [000000: this_should_go_into_first_array, 000234: something1, 0000ff: something2, 000111: something3, 000051: something4, 007543: something5]
Sublist of arrlist2: [000000: and_this_should_go_into_second_array, 005612: something7, 005712: something8]
However, I don't know the indexes for the zeroes beforehand, so how can I achieve the same result by finding the zeroes dynamically?
You can simply iterate over your list and start a new "bucket" each time you detect a string that starts with 000000:
ArrayList<String> unsplitted = new ArrayList<String>();
unsplitted.add("000000: this_should_go_into_first_array");
unsplitted.add("000234: something1");
unsplitted.add("0000ff: something2");
unsplitted.add("000111: something3");
unsplitted.add("000051: something4");
unsplitted.add("007543: something5");
unsplitted.add("000000: and_this_should_go_into_second_array");
unsplitted.add("005612: something7");
unsplitted.add("005712: something8");
List<List<String>> results = new ArrayList<>();
unsplitted.forEach(w -> {
if(w.startsWith("000000") || results.isEmpty()) {
// no bucket or detect 000000
List<String> bucket = new ArrayList<>();
bucket.add(w);
results.add(bucket);
}
else {
// does not start with 000000: put the value in the last bucket
results.get(results.size() - 1).add(w);
}
});
results.forEach(w -> {
System.out.println("Sublist " + w);
});
Is this the result you expected?
The result:
Sublist [000000: this_should_go_into_first_array, 000234: something1, 0000ff: something2, 000111: something3, 000051: something4, 007543: something5]
Sublist [000000: and_this_should_go_into_second_array, 005612: something7, 005712: something8]
The question is quite interesting. There are different ways to implement this, but I am going to show you a solution that works for any length of the first part, which we can consider as a key.
As you said in your introduction, it wouldn't be dynamic if the check were limited to only 6 characters. Instead, as an example, you can take the position of the ':' character as a reference and partition the elements of the list on whether everything before it is zeroes.
Here is the solution I propose:
import java.util.*;
import java.util.function.*;
import java.util.stream.*;
public class Main
{
public static void main(String[] args) {
ArrayList<String> unsplitted = new ArrayList<String>();
unsplitted.add("000000: this_should_go_into_first_array");
unsplitted.add("000234: something1");
unsplitted.add("0000ff: something2");
unsplitted.add("000111: something3");
unsplitted.add("000051: something4");
unsplitted.add("007543: something5");
unsplitted.add("000000: and_this_should_go_into_second_array");
unsplitted.add("005612: something7");
unsplitted.add("005712: something8");
System.out.println("Non-split list: "+ unsplitted);
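Predicate<String> filter = (String s) -> {
int indexOfCol = s.indexOf(":");
// A line without ':' has no key; returning early also avoids substring(0, -1) throwing.
if (indexOfCol < 0) return false;
// Note: String.repeat requires Java 11 or later.
return s.substring(0, indexOfCol).equals("0".repeat(indexOfCol));
};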
Map<Boolean, List<String>> splitMap = unsplitted.stream()
.collect(Collectors.partitioningBy(filter));
List<String> arrayZeroStart = splitMap.get(true);
List<String> arrayNonZeroStart = splitMap.get(false);
System.out.println("Sublist of arrayZeroStart: "+ arrayZeroStart);
System.out.println("Sublist of arrayWithout: "+ arrayNonZeroStart);
}
}
And here is the output:
Non-split list: [000000: this_should_go_into_first_array, 000234: something1, 0000ff: something2, 000111: something3, 000051: something4, 007543: something5, 000000: and_this_should_go_into_second_array, 005612: something7, 005712: something8]
Sublist of arrayZeroStart: [000000: this_should_go_into_first_array, 000000: and_this_should_go_into_second_array]
Sublist of arrayWithout: [000234: something1, 0000ff: something2, 000111: something3, 000051: something4, 007543: something5, 005612: something7, 005712: something8]
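If the goal is still to get one sublist per run of lines (as in the question) rather than a zero/non-zero partition, the two ideas above can be combined: start a new bucket whenever the key before the ':' is all zeroes, whatever its length. A minimal sketch, assuming every line looks like key: value; the names ZeroKeySplitter, hasAllZeroKey and split are invented for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ZeroKeySplitter {

    // True when the key (the text before the first ':') is non-empty and consists only of '0'.
    static boolean hasAllZeroKey(String line) {
        int colon = line.indexOf(':');
        String key = colon >= 0 ? line.substring(0, colon) : line;
        return !key.isEmpty() && key.chars().allMatch(c -> c == '0');
    }

    // Starts a new bucket at every all-zero key; any other line joins the current bucket.
    static List<List<String>> split(List<String> lines) {
        List<List<String>> buckets = new ArrayList<>();
        for (String line : lines) {
            if (buckets.isEmpty() || hasAllZeroKey(line)) {
                buckets.add(new ArrayList<>());
            }
            buckets.get(buckets.size() - 1).add(line);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> unsplitted = Arrays.asList(
                "000000: this_should_go_into_first_array",
                "000234: something1",
                "0000ff: something2",
                "000000: and_this_should_go_into_second_array",
                "005612: something7");
        split(unsplitted).forEach(bucket -> System.out.println("Sublist " + bucket));
    }
}
```

Because the check looks at the whole key, "0000ff" does not start a new bucket, while a shorter all-zero key such as "0000" would.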

java arraylist ordering and optimisation

I have written the constructor below. It takes a word from a file and passes it to an external method 'Translate' that returns a translation of the word.
In parallel (in code that I have not yet fully written), the constructor takes a string word as an argument, which it will look up in the dictionary once that code is written.
But before I do that I need to put both the word and the translation in an ArrayList. I know that a Map would be better, but I need to use an ArrayList.
My code does this, but there is something that I do not understand.
I write the word to the ArrayList and then the translation, so I would expect the ArrayList to be Word1, Translation1, Word2, Translation2, ...
But when I run my print command it prints all the words and then all the translations.
The reason I am trying to figure this out is that I want to be able to sort the ArrayList by word (the dictionary is unsorted), then look up an individual word and quickly pick up its translation.
So my question is: am I using the ArrayList correctly (accepting the limitations of the ArrayList for this exercise), and how can I sort using the word as the key?
import java.io.FileNotFoundException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Scanner;
class Translate {
String original;
String translation;
public Translate(String original) throws FileNotFoundException {
this.original = original;
this.translation = translation;
ArrayList al = new ArrayList();
Path p1 = Paths.get("C:/Users/Green/documents/dictionary.txt");
Scanner sc = new Scanner(p1.toFile()).useDelimiter("\\s*-\\s*");
while (sc.hasNext()) {
String word = (sc.next());
String translation = (Translate(word));
al.add(word);
al.add(translation);
System.out.println("Print Arraylist using for loop");
for(int i=0; i < al.size(); i++) {
System.out.println( al.get(i));
}
}
}
public static void main(String args[]) throws FileNotFoundException {
Translate gcd = new Translate("envolope");
}
}
Arranging Strings in an Alphabetical Order with Java 7 (Classic way):
for (int i = 0; i < count; i++) {
for (int j = i + 1; j < count; j++) {
if (str[i].compareTo(str[j])>0) {
temp = str[i];
str[i] = str[j];
str[j] = temp;
}
}
}
Sorting the strings Java 8:
arrayList.sort((p1, p2) -> p1.compareTo(p2));
Using Comparator Java 8:
arrayList.sort(Comparator.comparing(MyObject::getA));
Find word Java 7:
List <String> listClone = new ArrayList<String>();
for (String string : list) {
if(string.matches("(?i)(text).*")) {
listClone.add(string);
}
}
Using a java.util.HashSet:
Set<String> set = new HashSet<String>(list);
if (set.contains("text")) {
System.out.println("String found!");
}
Using contains:
for (String s : list) {
if (s.contains("text")) {
System.out.println(s);
}
}
Find word Java 8:
List<String> matches = list.stream()
.filter(it -> it.contains("text"))
.collect(Collectors.toList());
Performance of contains() in a HashSet vs ArrayList:
contains() is faster on a HashSet than on an ArrayList: a hash lookup takes constant time on average, while ArrayList.contains scans the list element by element.
Reference:
How to search for a string in an arraylist
Sorting ArrayList with Lambda in Java 8
HashSet vs ArrayList contains performance
https://www.baeldung.com/java-hashset-arraylist-contains-performance
https://beginnersbook.com/2018/10/java-program-to-sort-strings-in-an-alphabetical-order/
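To tie the sorting and searching snippets back to the question (word and translation in one ArrayList, sorted by word), one possible sketch keeps each word together with its translation in a small object, sorts with a Comparator, and looks words up with Collections.binarySearch. The names PairDictionary and WordPair and the sample words are made up for illustration; they are not from the original code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class PairDictionary {

    // Hypothetical holder for one dictionary line: a word and its translation.
    static final class WordPair {
        final String word;
        final String translation;

        WordPair(String word, String translation) {
            this.word = word;
            this.translation = translation;
        }

        String getWord() {
            return word;
        }
    }

    static final Comparator<WordPair> BY_WORD = Comparator.comparing(WordPair::getWord);

    private final List<WordPair> entries = new ArrayList<>();

    void add(String word, String translation) {
        entries.add(new WordPair(word, translation));
    }

    // Sort once after loading; the binary search in lookup relies on this order.
    void sortByWord() {
        entries.sort(BY_WORD);
    }

    // Returns the translation, or null when the word is not in the list.
    String lookup(String word) {
        int index = Collections.binarySearch(entries, new WordPair(word, null), BY_WORD);
        return index >= 0 ? entries.get(index).translation : null;
    }

    public static void main(String[] args) {
        PairDictionary dictionary = new PairDictionary();
        dictionary.add("house", "maison");
        dictionary.add("cat", "chat");
        dictionary.add("envelope", "enveloppe");
        dictionary.sortByWord();
        System.out.println(dictionary.lookup("envelope"));
        System.out.println(dictionary.lookup("dog"));
    }
}
```

Storing pairs instead of alternating word and translation entries means a sort can never separate a word from its translation.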

Values are overridden and not added

The values in the list are being overwritten in my program. I want to reuse the same list object to add different values.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map.Entry;
import java.util.Scanner;
public class CommonValue {
static int key = 100;
public static void main(String[] args) throws IOException {
HashMap<Integer, ArrayList<String>> map = new HashMap<Integer, ArrayList<String>>();
ArrayList<String> list = new ArrayList<String>();
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
StringBuffer sBuffer = new StringBuffer();
Scanner scan = new Scanner(System.in);
String choice = null;
do {
System.out.println("enter the how many element to add");
int numOfElement = Integer.parseInt(reader.readLine());
String userInput;
int i = 0;
do {
// adding element in the list
System.out.println("enter the element to add in the list");
userInput = scan.next();
list.add(userInput);
i++;
} while (i < numOfElement);
// adding list in the map with key
map.put(key, list);
System.out.println(map);
list.clear();
// my initial key is 100; it is incremented each time I move on to another key
key++;
System.out.println("do you want to go for next key");
System.out.println("y or n");
choice = scan.next();
} while (choice.equals("y"));
for (Entry<Integer, ArrayList<String>> entry : map.entrySet()) {
key = entry.getKey();
ArrayList<String> value = entry.getValue();
System.out.println("key" + entry.getKey() + ": value " + entry.getValue());
}
}
}
Output:
enter the how many element to add
2
enter the element to add in the list
a
enter the element to add in the list
x
{100=[a, x]}
do you want to go for next key
y or n
y
enter the how many element to add
1
enter the element to add in the list
z
{100=[z], 101=[z]}
do you want to go for next key
y or n
Actually the output I need is:
{100=[a,x], 101=[z]}
The issue is that you keep adding the same instance of List to the Map without making a copy. This will not work, because clearing the list outside the map also clears the list inside the map - after all, it's the same object.
Replace list.clear(); with list = new ArrayList<String>(); to fix this problem.
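The difference can be seen in a stripped-down sketch without the console input. The class name SharedListDemo and its two methods are made up for the example; the values mirror the output shown in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SharedListDemo {

    // Reuses one list and clears it: both keys end up pointing at the same object.
    static Map<Integer, List<String>> withClear() {
        Map<Integer, List<String>> map = new HashMap<>();
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("x");
        map.put(100, list);
        list.clear();               // also empties the list already stored under key 100
        list.add("z");
        map.put(101, list);
        return map;
    }

    // Creates a fresh list per key, so the list stored earlier is left untouched.
    static Map<Integer, List<String>> withNewList() {
        Map<Integer, List<String>> map = new HashMap<>();
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("x");
        map.put(100, list);
        list = new ArrayList<>();   // the map keeps its reference to the old list
        list.add("z");
        map.put(101, list);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(withClear());   // key 100 and key 101 both show [z]
        System.out.println(withNewList()); // key 100 shows [a, x], key 101 shows [z]
    }
}
```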
You have to instantiate a new List for every entry in your HashMap.
Currently, you are adding the very same List instance to every entry.
In combination with list.clear(), this produces the observed output: whatever was added last to the one and only list is what every key shows.
You are making a mistake on the line below:
list.clear();
Instead, just initialize list again with a new instance:
list = new ArrayList<String>();

How can I retrieve the value in a Hashmap stored in an arraylist type hashmap?

I am a beginner in Java. Basically, I have loaded each text document and stored each individual word of the document in a hashmap. After that, I tried storing all the hashmaps in an ArrayList. Now I am stuck on how to retrieve all the words from the hashmaps that are in the ArrayList!
private static long numOfWords = 0;
private String userInputString;
private static long wordCount(String data) {
long words = 0;
int index = 0;
boolean prevWhiteSpace = true;
while (index < data.length()) {
//Initialise character variable that will be checked.
char c = data.charAt(index++);
//Determine whether it is a space.
boolean currWhiteSpace = Character.isWhitespace(c);
//If previous is a space and character checked is not a space,
if (prevWhiteSpace && !currWhiteSpace) {
words++;
}
//Assign current character's determination of whether it is a spacing as previous.
prevWhiteSpace = currWhiteSpace;
}
return words;
} //
public static ArrayList StoreLoadedFiles()throws Exception{
final File f1 = new File ("C:/Users/Admin/Desktop/dataFiles/"); //specify the directory to load files
String data=""; //reset the words stored
ArrayList<HashMap> hmArr = new ArrayList<HashMap>(); //array of hashmap
for (final File fileEntry : f1.listFiles()) {
Scanner input = new Scanner(fileEntry); //load files
while (input.hasNext()) { //while there are still words in the document, continue to load all the words in a file
data += input.next();
input.useDelimiter("\t"); //similar to split function
} //while loop
String textWords = data.replaceAll("\\s+", " "); //remove all found whitespaces
HashMap<String, Integer> hm = new HashMap<String, Integer>(); //Creates a Hashmap that would be renewed when next document is loaded.
String[] words = textWords.split(" "); //store individual words into a String array
for (int j = 0; j < numOfWords; j++) {
int wordAppearCount = 0;
if (hm.containsKey(words[j].toLowerCase().replaceAll("\\W", ""))) { //replace non-word characters
wordAppearCount = hm.get(words[j].toLowerCase().replaceAll("\\W", "")); //remove non-word character and retrieve the index of the word
}
if (!words[j].toLowerCase().replaceAll("\\W", "").equals("")) {
//Words stored in hashmap are in lower case and have special characters removed.
hm.put(words[j].toLowerCase().replaceAll("\\W", ""), ++wordAppearCount);//index of word and string word stored in hashmap
}
}
hmArr.add(hm);//stores every single hashmap inside an ArrayList of hashmap
} //end of for loop
return hmArr; //return hashmap ArrayList
}
public static void LoadAllHashmapWords(ArrayList m){
for(int i=0;i<m.size();i++){
m.get(i); //stuck here!
}
Firstly, your logic won't work correctly. In the StoreLoadedFiles() method you iterate through the words with for (int j = 0; j < numOfWords; j++) {. The numOfWords field is initialized to zero, so this loop won't execute at all. You should use the length of the words array as the bound instead.
Having said that, to retrieve the values from a list of hashmaps, first iterate through the list, and for each hashmap take its entry set. A Map.Entry is the key/value pair that you store in the hashmap. When you invoke map.entrySet(), it returns a java.util.Set<Map.Entry<Key, Value>>. A Set is returned because the keys are unique.
So a complete program would look like this:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map.Entry;
import java.util.Scanner;
public class FileWordCounter {
public static List<HashMap<String, Integer>> storeLoadedFiles() {
final File directory = new File("C:/Users/Admin/Desktop/dataFiles/");
List<HashMap<String, Integer>> listOfWordCountMap = new ArrayList<HashMap<String, Integer>>();
Scanner input = null;
StringBuilder data;
try {
for (final File fileEntry : directory.listFiles()) {
input = new Scanner(fileEntry);
input.useDelimiter("\t");
data = new StringBuilder();
while (input.hasNext()) {
data.append(input.next());
}
input.close();
String wordsInFile = data.toString().replaceAll("\\s+", " ");
HashMap<String, Integer> wordCountMap = new HashMap<String, Integer>();
for(String word : wordsInFile.split(" ")){
String strippedWord = word.toLowerCase().replaceAll("\\W", "");
int wordAppearCount = 0;
if(strippedWord.length() > 0){
if(wordCountMap.containsKey(strippedWord)){
wordAppearCount = wordCountMap.get(strippedWord);
}
wordCountMap.put(strippedWord, ++wordAppearCount);
}
}
listOfWordCountMap.add(wordCountMap);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} finally {
if(input != null) {
input.close();
}
}
return listOfWordCountMap;
}
public static void loadAllHashmapWords(List<HashMap<String, Integer>> listOfWordCountMap) {
for(HashMap<String, Integer> wordCountMap : listOfWordCountMap){
for(Entry<String, Integer> wordCountEntry : wordCountMap.entrySet()){
System.out.println(wordCountEntry.getKey() + " - " + wordCountEntry.getValue());
}
}
}
public static void main(String[] args) {
List<HashMap<String, Integer>> listOfWordCountMap = storeLoadedFiles();
loadAllHashmapWords(listOfWordCountMap);
}
}
Since you are a beginner in Java programming, I would like to point out a few best practices that you could start using from the beginning.
Closing resources : In your loop that reads the files you open a Scanner with Scanner input = new Scanner(fileEntry); but you never close it. This leaks resources, because each open Scanner holds on to a file handle. Always close resources, either in the finally block of a try-catch-finally or, from Java 7 on, with try-with-resources.
Avoid unnecessary redundant calls : If an operation gives the same result on every iteration of a loop, move it outside the loop. In your case, setting the scanner delimiter with input.useDelimiter("\t"); is a one-time operation after the scanner is created, so it can be moved out of the while loop.
Use StringBuilder instead of String : Repeated string manipulation such as concatenation in a loop should be done with a StringBuilder (or StringBuffer when you need synchronization) instead of += or +. String is immutable, meaning its value cannot be changed, so each concatenation creates a new String object and leaves a lot of unused instances in memory. StringBuilder is mutable, so its contents can be changed in place.
Naming convention : The usual Java convention for method and variable names is camelCase: start with a lower-case letter and capitalize the first letter of each following word. So it is standard practice to name a method storeLoadedFiles rather than StoreLoadedFiles. (This could be opinion based ;))
Give descriptive names : It is good practice to give descriptive names, as it helps with later code maintenance. For example, wordCountMap is a better name than hm. Anyone who goes through your code in the future will understand it better and faster. Again, opinion based.
Use generics as much as possible : This avoids explicit casts and lets the compiler check the element types.
Avoid repetition : Similar to point 2, if an operation produces the same result and is needed several times, store the result in a variable and use the variable. In your case you were calling words[j].toLowerCase().replaceAll("\\W", "") multiple times. The result is the same every time, but each call creates unnecessary instances and clutters the code. So you could assign it to a String once and use that String elsewhere.
Try using the for-each loop wherever possible : This relieves you of managing indexes yourself.
These are just suggestions. I tried to include most of them in my code, but I won't say it's perfect. Since you are a beginner, adopting these practices now will help them become habit. Happy coding.. :)
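As a rough illustration of the points on closing resources, generics and avoiding repetition, here is one way the counting step could look with try-with-resources (Java 7) and Map.merge (Java 8). It is a sketch, not a drop-in replacement: the name WordCountSketch is invented, it splits on whitespace rather than on tabs, and it reads from any Readable so that it can be tried without files. For a real file you could pass Files.newBufferedReader(path).

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class WordCountSketch {

    // Counts normalized words from one source; the Scanner is closed automatically.
    static Map<String, Integer> countWords(Readable source) {
        Map<String, Integer> wordCountMap = new HashMap<>();
        try (Scanner input = new Scanner(source)) {   // try-with-resources closes the Scanner
            while (input.hasNext()) {
                // Normalize once and reuse the result instead of repeating the call chain.
                String strippedWord = input.next().toLowerCase().replaceAll("\\W", "");
                if (!strippedWord.isEmpty()) {
                    // merge inserts 1 for a new word, otherwise adds 1 to the stored count.
                    wordCountMap.merge(strippedWord, 1, Integer::sum);
                }
            }
        }
        return wordCountMap;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new StringReader("Hello, hello world!"));
        System.out.println("hello: " + counts.get("hello"));
        System.out.println("world: " + counts.get("world"));
    }
}
```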
for (HashMap<String, Integer> map : m) {
for(Entry<String,Integer> e:map.entrySet()){
//your code here
}
}
Or, if you are using Java 8, you can use lambdas:
m.stream().forEach((map) -> {
map.entrySet().stream().forEach((e) -> {
//your code here
});
});
But first you have to change the method signature to public static void LoadAllHashmapWords(List<HashMap<String,Integer>> m); otherwise you would have to use a cast.
P.S. Are you sure your extracting method works? I tested it a bit and got a list of empty hashmaps every time.

Find unique words in a file - Java

Using a msdos window I am piping in an amazon.txt file.
I am trying to use the collections framework. Keep in mind I want to keep this
as simple as possible.
What I want to do is count all the unique words in the file... with no duplicates.
This is what I have so far. Please be kind this is my first java project.
import java.util.Scanner;
import java.util.ArrayList;
import java.util.Iterator;
public class project1 {
// ArrayList<String> a = new ArrayList<String>();
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
String word;
String grab;
int count = 0;
ArrayList<String> a = new ArrayList<String>();
// Iterator<String> it = a.iterator();
System.out.println("Java project\n");
while (sc.hasNext()) {
word = sc.next();
a.add(word);
if (word.equals("---")) {
break;
}
}
Iterator<String> it = a.iterator();
while (it.hasNext()) {
grab = it.next();
if (grab.contains("a")) {
System.out.println(it.next()); // Just a check to see
count++;
}
}
System.out.println("I counted abc = ");
System.out.println(count);
System.out.println("\nbye...");
}
}
In your version, the word list a will contain all words, including duplicates. You can either
(a) check for every new word whether it is already in the list (List#contains is the method to call), or, the recommended solution,
(b) replace ArrayList<String> with TreeSet<String>. This eliminates duplicates automatically and stores the words in alphabetical order.
Edit
If you want to count the unique words, do the same as above; the desired result is the collection's size. So if you entered the sequence "a a b c ---", the result would be 3, as there are three unique words (a, b and c).
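A minimal sketch of option (b), kept close to the original program. The class name UniqueWords and the helper readUnique are made up for the example. Note that it checks for the "---" marker before adding the word, whereas the code in the question adds the word first, so the marker itself would end up in the collection.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

class UniqueWords {

    // Reads words until "---" (or end of input) and returns them sorted, without duplicates.
    static Set<String> readUnique(Scanner sc) {
        Set<String> unique = new TreeSet<>();
        while (sc.hasNext()) {
            String word = sc.next();
            if (word.equals("---")) {
                break;              // stop before the marker is stored
            }
            unique.add(word);       // adding a word that is already present has no effect
        }
        return unique;
    }

    public static void main(String[] args) {
        Set<String> unique = readUnique(new Scanner(System.in));
        System.out.println(unique);
        System.out.println("Unique words: " + unique.size());
    }
}
```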
Instead of ArrayList<String>, use HashSet<String> (not sorted) or TreeSet<String> (sorted) if you don't need a count of how often each word occurs, Hashtable<String,Integer> (not sorted) or TreeMap<String,Integer> (sorted) if you do.
If there are words you don't want, place those in a HashSet<String> and check that this doesn't contain the word your Scanner found before placing into your collection. If you only want dictionary words, put your dictionary in a HashSet<String> and check that it contains the word your Scanner found before placing into your collection.
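For the case where you do want a count per word, plus a set of words to skip, a sketch could look like this. The names WordFrequency and count, and the sample sentence, are only for illustration; Map.merge assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeMap;

class WordFrequency {

    // Counts how often each word occurs, skipping any word found in 'unwanted'.
    static Map<String, Integer> count(Scanner sc, Set<String> unwanted) {
        Map<String, Integer> counts = new TreeMap<>();   // TreeMap keeps the words sorted
        while (sc.hasNext()) {
            String word = sc.next();
            if (word.equals("---")) {
                break;
            }
            if (unwanted.contains(word)) {
                continue;                                // ignore words we do not want
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> unwanted = new HashSet<>(Arrays.asList("the", "a"));
        Scanner sc = new Scanner("the cat saw the other cat ---");
        System.out.println(count(sc, unwanted));
    }
}
```

For a dictionary-only filter, the check would be inverted: keep the word only if the dictionary set contains it.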
