Split a string in Java by symbols - java

The request sounds simple, but it comes with a twist.
Basically, I need not only to split a string but also to keep the "symbols" so that I can display them later.
Here's how it works:
Input:
Once upon a time there was a kingdom far, far away
On this kingdom lived a princess
And she was happy by herself
Output:
Once, 1 time, line 1-column 1
upon, 1 time, line 1-column 6
a, 3 times, line 1-column 11, line 1-column 28, line 2-column 23
...
It should also ignore spaces, \n and \f, but it should treat "," as a symbol of its own, so that splitting by spaces does not produce "far," and "far" as different symbols.
Is there any way to make the split method consider multiple delimiters?
I am also not sure whether there is a better way than a for loop to record the line and column at which each "symbol" occurs.

Split the string into symbols, keep the symbols as keys in the hash map and their count in corresponding hash map values.
Example:
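Map<String, Integer> tokensWithCounts = new HashMap<String, Integer>();
// Split on whitespace and on the boundaries around punctuation, so "," becomes its own token
String[] tokens = str.split("\\s+|(?=\\p{Punct})|(?<=\\p{Punct})");
List<String> tokensList = Arrays.asList(tokens);
for (String token : tokensList) {
    if (token.isEmpty()) {
        continue; // split can yield empty strings, e.g. for whitespace followed by punctuation
    }
    if (tokensWithCounts.get(token) == null) {
        tokensWithCounts.put(token, 0);
    }
    tokensWithCounts.put(token, tokensWithCounts.get(token) + 1);
}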
EDIT: Sorry, I got the question wrong. You will keep the values in a HashMap<String, List<String>>. Tokens will still be keys, List will have all occurrences of the token in the form of "line 1-column 1". The count of occurrences is the .size() of the List<String> that is stored in the HashMap.
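The idea in the EDIT can be sketched as follows. This is a minimal sketch, not a definitive implementation: the class name SymbolPositions, the method index and the token pattern are assumptions made for illustration. It scans each line with a Matcher instead of split, because Matcher.start() gives the column directly. A token is taken to be a run of letters or digits, or a single ASCII punctuation character; whitespace and anything else is skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SymbolPositions {
    // A token is either a run of letters/digits or one punctuation character.
    // Whitespace (spaces, \n, \f) matches neither alternative and is skipped.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|\\p{Punct}");

    // Maps each token to the list of its positions, in first-seen order.
    static Map<String, List<String>> index(String text) {
        Map<String, List<String>> positions = new LinkedHashMap<>();
        String[] lines = text.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = TOKEN.matcher(lines[i]);
            while (m.find()) {
                // Lines and columns are reported 1-based, as in the question.
                String where = "line " + (i + 1) + "-column " + (m.start() + 1);
                positions.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(where);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "Once upon a time there was a kingdom far, far away\n"
                + "On this kingdom lived a princess\n"
                + "And she was happy by herself";
        // The count of a token is the size of its position list.
        index(text).forEach((token, where) -> System.out.println(
                token + ", " + where.size() + " time(s), " + String.join(", ", where)));
    }
}
```

The LinkedHashMap is only there to keep the tokens in the order they first appear; a plain HashMap works too if the order does not matter.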

Related

counting number of occurrences of words in a text java

So I'm building a TreeMap from scratch and I'm trying to count the number of occurrences of every word in a text using Java. The text is read from a text file, but I can easily read it from there. I really don't know how to count every word, can someone help?
Imagine the text is something like:
Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.
Output:
Over 1
time 1
computer 1
algorithms 5
...
If possible, I want to ignore case, so that upper- and lower-case forms of a word are counted together.
EDIT: I don't want to use any sort of Map (e.g. HashMap) or anything similar to do this.
Break down the problem as follows (this is one potential solution - not THE solution):
Split the text into words (create a list or array of words).
Remove punctuation marks.
Create your map to collect results.
Iterate over your list of words and add "1" to the value of each encountered key
Display results (Iterate over the map's EntrySet)
Split the text into words
My preference is to split words using a space as the delimiter. The reason is that if you split on non-word characters, you may break up hyphenated words. Hyphenation is becoming less common, but there are still plenty of words that follow this rule, for example middle-aged. If such a word is encountered, it MIGHT have to be treated as one word and not two.
Remove punctuation marks
Because of the decision above, you will first need to remove punctuation characters that might be attached to your words. Keep in mind that if you use a regular expression to split the words, you might be able to accomplish this step at the same time as the step above. In fact, that would be preferred so that you don't have to iterate twice: do both in a single pass. While you are at it, call toLowerCase() on the input string to eliminate the distinction between capitalized and lowercase words.
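As a rough illustration of doing both steps in one pass, the sketch below lowercases the text and splits on any run of characters that is not a letter, digit, apostrophe or hyphen. The class name WordSplitter and the exact character class are assumptions for illustration. Note that, unlike removing all punctuation, this keeps "other's" intact.

```java
import java.util.Arrays;

class WordSplitter {
    // Lowercases the text, then splits on runs of characters that are not
    // letters, digits, apostrophes or hyphens. Punctuation is dropped in the
    // same pass, while "middle-aged" and "other's" stay whole.
    static String[] words(String text) {
        return text.toLowerCase().split("[^\\p{L}\\p{N}'-]+");
    }

    public static void main(String[] args) {
        String sample = "Over time, middle-aged engineers use each other's work.";
        // prints [over, time, middle-aged, engineers, use, each, other's, work]
        System.out.println(Arrays.toString(words(sample)));
    }
}
```

Two caveats: a text that begins with punctuation yields an empty first element, and a free-standing dash would come through as a token of its own, so real input may need an extra filter.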
Create your map to collect results
This is where you are going to collect your counts, using the TreeMap implementation of the Java Map. One thing to be aware of with this particular implementation is that the map is sorted according to the natural ordering of its keys. In this case, since the keys are the words from the input text, the keys will be arranged in alphabetical order, not by the magnitude of the count. If sorting the entries by count is important, there is a technique where you "reverse" the map, making the values the keys and the keys the values. However, since two or more words could have the same count, you will need to create a new map of <Integer, Set<String>> so that you can group together words with the same count.
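The "reversed" map could look something like the sketch below. The class name CountInverter and the choice of TreeMap with a reversed comparator (highest count first) and TreeSet (words in alphabetical order) are assumptions for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CountInverter {
    // Builds count -> words. Keys are ordered from highest to lowest count,
    // and words sharing a count are grouped in an alphabetically sorted set.
    static Map<Integer, Set<String>> invert(Map<String, Integer> wordCount) {
        Map<Integer, Set<String>> byCount = new TreeMap<>(Collections.reverseOrder());
        for (Map.Entry<String, Integer> e : wordCount.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        return byCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new TreeMap<>();
        wordCount.put("algorithms", 5);
        wordCount.put("for", 2);
        wordCount.put("other", 2);
        wordCount.put("time", 1);
        // prints {5=[algorithms], 2=[for, other], 1=[time]}
        System.out.println(invert(wordCount));
    }
}
```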
Iterate over your list of words
At this point, you should have a list of words and a map structure to collect the counts. Using a lambda expression, you should be able to perform a count() of your words very easily. But if you are not familiar or comfortable with lambda expressions, you can use a regular loop to iterate over your list, do a containsKey() check to see whether the word was encountered before, get() the value if the map already contains the word, and then add 1 to the previous value. Lastly, put() the new count in the map.
Display results
Again, you can use a Lambda Expression to print out the EntrySet key value pairs or simply iterate over the entry set to display the results.
Based on all of the above points, a potential solution could look like this (not using lambdas, for the OP's sake):
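// Requires: import java.util.Iterator; import java.util.Map; import java.util.Map.Entry; import java.util.TreeMap;
public static void main(String[] args) {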
String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
text = text.replaceAll("\\p{P}", ""); // replace all punctuations
text = text.toLowerCase(); // turn all words into lowercase
String[] wordArr = text.split(" "); // create list of words
Map<String, Integer> wordCount = new TreeMap<>();
// Collect the word count
for (String word : wordArr) {
if(!wordCount.containsKey(word)){
wordCount.put(word, 1);
} else {
int count = wordCount.get(word);
wordCount.put(word, count + 1);
}
}
Iterator<Entry<String, Integer>> iter = wordCount.entrySet().iterator();
System.out.println("Output: ");
while(iter.hasNext()) {
Entry<String, Integer> entry = iter.next();
System.out.println(entry.getKey() + ": " + entry.getValue());
}
}
This produces the following output
Output:
advantage: 1
algorithms: 5
and: 1
combine: 1
computer: 1
each: 1
engineers: 1
even: 1
for: 2
in: 1
invent: 1
more: 1
new: 1
of: 2
other: 2
others: 1
over: 1
producing: 1
results: 2
take: 1
the: 1
things: 1
time: 1
to: 1
turn: 1
utilize: 1
with: 1
work: 1
Why did I break the problem down like this for such a mundane task? Simple. I believe each of those discrete steps should be extracted into functions to improve code reusability. Yes, it is cool to use a lambda expression to do everything at once and make your code look much simpler. But what if you need to repeat some intermediate step over and over? Most of the time, code is duplicated to accomplish this. In reality, a better solution is often to break these tasks into methods. Some of these tasks, like transforming the input text, can be done in a single method since those activities are closely related. (There is such a thing as a method doing "too little.")
public String[] createWordList(String text) {
return text.replaceAll("\\p{P}", "").toLowerCase().split(" ");
}
public Map<String, Integer> createWordCountMap(String[] wordArr) {
Map<String, Integer> wordCountMap = new TreeMap<>();
for (String word : wordArr) {
if(!wordCountMap.containsKey(word)){
wordCountMap.put(word, 1);
} else {
int count = wordCountMap.get(word);
wordCountMap.put(word, count + 1);
}
}
return wordCountMap;
}
public void displayCount(Map<String, Integer> wordCountMap) {
Iterator<Entry<String, Integer>> iter = wordCountMap.entrySet().iterator();
while(iter.hasNext()) {
Entry<String, Integer> entry = iter.next();
System.out.println(entry.getKey() + ": " + entry.getValue());
}
}
Now, after doing that, your main method looks more readable and your code is more reusable.
public static void main(String[] args) {
WordCount wc = new WordCount();
String text = "...";
String[] wordArr = wc.createWordList(text);
Map<String, Integer> wordCountMap = wc.createWordCountMap(wordArr);
wc.displayCount(wordCountMap);
}
UPDATE:
One small detail I forgot to mention: if a HashMap is used instead of a TreeMap, the output is no longer sorted alphabetically. A HashMap hashes the key, not the value, and its iteration order is unspecified, so any apparent ordering by count is a coincidence and must not be relied on. To order by count you still need to sort the entries or "reverse" the map as described above. After switching to HashMap, the output may begin like this, in no guaranteed order:
Output:
algorithms: 5
other: 2
for: 2
turn: 1
computer: 1
producing: 1
...
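Since HashMap makes no promise about iteration order, output that must be ordered by count is safer to sort explicitly. A minimal sketch, assuming a hypothetical helper class SortByCount; ties are broken alphabetically so that the result is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortByCount {
    // Copies the entries into a list and sorts them by count (highest first),
    // then by word, so the order does not depend on the map implementation.
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> wordCount) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCount.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new HashMap<>();
        wordCount.put("time", 1);
        wordCount.put("other", 2);
        wordCount.put("algorithms", 5);
        wordCount.put("for", 2);
        for (Map.Entry<String, Integer> e : sortedByCount(wordCount)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```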
My suggestion is to split with a regular expression and then use a stream with grouping (see EX3).
EX1: this solution uses only an array, no List or Map; in my opinion it is not optimal.
@Test
public void testApp2() {
final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
final String lowerText = text.toLowerCase();
final String[] split = lowerText.split("\\W+");
System.out.println("Output: ");
for (String s : split) {
if (s == null) {
continue;
}
int count = 0;
for (int i = 0; i < split.length; i++) {
final boolean sameWord = s.equals(split[i]);
if (sameWord) {
count = count + 1;
split[i] = null;
}
}
System.out.println(s + " " + count);
}
}
EX2: I think this is what you mean, although I'm not sure whether using a List goes too far.
@Test
public void testApp() {
final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
final String[] split = text.split("\\W+");
final List<String> list = new ArrayList<>();
System.out.println("Output: ");
for (String s : split) {
if(!list.contains(s.toUpperCase())){ // compare in the same case the words are stored in
list.add(s.toUpperCase());
final long count = Arrays.stream(split).filter(s::equalsIgnoreCase).count();
System.out.println(s+" "+count);
}
}
}
EX3: the same test for your example, but using a Map.
@Test
public void test() {
final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
Map<String, Long> result = Arrays.stream(text.split("\\W+")).collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));
assertEquals(Long.valueOf(5), result.get("algorithms")); // expected value first
System.out.println("Output: ");
result.entrySet().stream().forEach(x -> System.out.println(x.getKey() + " " + x.getValue()));
}

Count the number of numbers in String (Java) [duplicate]

Hi all :) I've got a very easy problem, but I have spent more than 2 hours searching here and in the Java docs.
I have a string that contains more than 5k lines; each line holds 6 numbers from 1-49, separated by ";". I want to count how many times each number occurs in this very long string. Most of the topics I found were about character counting. The closest I got was the idea of using Commons Lang and its .countMatches function. Should I use an ArrayList? I need a clue; if the solution is too long, just give me a tip on how to do it :)
The straightforward solution is to read your file line by line and split each line by ;, which gives you each number as a string. Then put them into a HashMap<String, Integer>; if the key already exists, just add 1 to the value. At the end you have the count for each string (your number).
I hope I understand your question right.
Try this:
Create a variable to hold the counts. Here is an example: Map<String, Integer> counts = new HashMap<String, Integer>();
Split each line using String.split() (specifically line.split(";")).
Each time you split a line you will receive an array of numbers. For each of these numbers, retrieve the value from the counts map. If it is null, add the number to the map with a count of 1; if it is not null, increment the count and put it back in the map.
Edit: some code.
// Assumes `reader` is an open java.io.BufferedReader over the input (hypothetical name);
// readLine() throws IOException, which the caller must handle.
Map<String, Integer> counts = new HashMap<String, Integer>();
String line;
String[] parts;
while ((line = reader.readLine()) != null)
{
    parts = line.split(";");
    for (String current : parts)
    {
        Integer value = counts.get(current);
        if (value == null) // number not in the counts map yet.
        {
            counts.put(current, 1);
        }
        else
        {
            int currentCount = value.intValue() + 1;
            counts.put(current, currentCount);
        }
    }
}
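Because the values are known to lie between 1 and 49, a plain int array indexed by the number is a possible alternative to the map. The sketch below is only an illustration under that assumption: the class name NumberCounter is hypothetical, and the lines are passed in as a List<String> (which could come from java.nio.file.Files.readAllLines, for example).

```java
import java.util.Arrays;
import java.util.List;

class NumberCounter {
    // counts[n] holds how often n occurred; index 0 stays unused.
    // Assumes every value on every line is an integer in the range 1..49.
    static int[] count(List<String> lines) {
        int[] counts = new int[50];
        for (String line : lines) {
            for (String part : line.split(";")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    counts[Integer.parseInt(trimmed)]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1;7;13;25;31;49", "7;8;13;20;31;40");
        int[] counts = count(lines);
        for (int n = 1; n <= 49; n++) {
            if (counts[n] > 0) {
                System.out.println(n + ": " + counts[n]);
            }
        }
    }
}
```

A value outside 1..49 or a non-numeric field would throw here, so untrusted input needs a range check or the map-based version above.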

Split a text file using delimiter \t and store in hashmap in java

I have a text file of n columns and n rows separated by tabs. How do I use the split function to store the columns and rows in a HashMap? Please help. My text file looks like this:
Dept Id Name Contact
IT 1 zzz 678
ECE 2 ttt 789
IT 3 rrr 908
I tried the following, but it didn't work.
Map<String,String> map=new HashMap<String,String>();
while(lineReader!=null)
{
String[] tokens = lineReader.split("\\t");
key = tokens[0];
values = tokens[1];
map.put(key , values );
System.out.println("ID:"+map.get(key ));
System.out.println("Other Column Values:"+map.get(values ));
}
This returns the key of the last entry (row) of the file and null as the value. But I need to store all rows and columns in the map. How do I do it?
If I understand your data correctly,
After
String[] tokens = lineReader.split("\\t");
is processed on the first line, you'd have 4 tokens in the array.
I think you are using the wrong logic. If you want to store the data in the map in the following way:
IT -> (1 ZZZ 678)
and so on, then you need to process the data differently.
What you are actually storing in the map is as follows:
IT -> 1
ECE -> 2
...
and so on.
That's why you get null when you try to do:
map.get(values);
What you should print instead is the key and map.get(key).
Actually, in any case I don't think a Map is what you want (but I don't know what you really want).
For now though, to understand the problem, try printing:
System.out.println("Total columns: "+ tokens.length);
Updated:
This should work for you. It isn't the most elegant implementation for what you want, but gets the job done. You should try improving it from here on.
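Map<String,String> map=new HashMap<String,String>();
String line;
// Assumes lineReader is a java.io.BufferedReader over the file
while((line = lineReader.readLine()) != null)
{
    String[] tokens = line.split("\\t");
    String key = tokens[1];                 // Id column
    String values = tokens[2]+tokens[3];    // Name and Contact columns
    map.put(key , values );
    System.out.println("ID:"+key);
    System.out.println("Other Column Values:"+map.get(key));
}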
Good luck!
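If the goal is the "IT -> (1 zzz 678)" layout mentioned earlier, one possible approach is to map each department to a list of its rows, since a department can appear more than once. This is a sketch under assumptions: the class name DeptGrouping is hypothetical, the lines are assumed to be read already, and the header row is assumed to have been skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DeptGrouping {
    // Maps the first column (Dept) to the remaining columns of every row
    // that belongs to that department.
    static Map<String, List<String>> groupByFirstColumn(List<String> lines) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        for (String line : lines) {
            String[] tokens = line.split("\t", 2); // key, then the rest of the row
            if (tokens.length < 2) {
                continue; // skip lines that contain no tab
            }
            rows.computeIfAbsent(tokens[0], k -> new ArrayList<>())
                .add(tokens[1].replace('\t', ' '));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "IT\t1\tzzz\t678", "ECE\t2\tttt\t789", "IT\t3\trrr\t908");
        // prints {IT=[1 zzz 678, 3 rrr 908], ECE=[2 ttt 789]}
        System.out.println(groupByFirstColumn(lines));
    }
}
```

With a Map<String,String>, the second IT row would silently overwrite the first, which is why the value here is a list.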

search elements in an array in java

I'm wondering what kind of method I should use to search for elements in an array, and what data structure I should use to store the result.
For example, a txt file contains the following:
123 Name line Moon night table
124 Laugh Cry Dog
123 quote line make pet table
127 line array hello table
and the search elements are line+table
I read every line as a string and then split it by spaces.
The output should look like this:
123 2 (ID 123 occurs twice that contains the search elements)
127 1
I want some suggestions on what kind of method to use to search the elements in the array and what kind of data structure to store the return value in (the ID and the number of occurrences; I'm thinking of a HashMap).
Read the text file and store each line that ends with table in an ArrayList<String>. Then use contains on each element of the ArrayList<String>. Store the result in a HashMap<key,value> where the key is the ID and the value is an Integer representing the number of times the ID occurs.
First, I would keep reading through the file line by line, there's really no other way of going about it other than that.
Second, to pick out the rows to save, you don't need to do the split (assumption: they all end in (space)table). You can just get them by using:
if (line.endsWith(" table"))
Then, I would suggest using a Map<String, Integer> to store your information. This way, you have the ID at the start of the line (key) and how many times it was found in the file (value).
Map<String, Integer> map = new HashMap<String, Integer>();
....reading file....
if (line.endsWith(" table")) {
String number = line.substring(0, line.indexOf(" "));
if (!map.containsKey(number)) {
map.put(number, 1);
} else {
Integer value = map.get(number);
value++;
map.put(number, value);
}
}
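The endsWith shortcut relies on every matching line ending in " table". A somewhat more general sketch is shown below; it checks that every search term occurs among the words of the line. The class name TermSearch and the method countMatches are hypothetical, and the lines are assumed to be read from the file already.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TermSearch {
    // Counts, per leading ID, the lines whose words include every search term.
    static Map<String, Integer> countMatches(List<String> lines, Set<String> terms) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (String line : lines) {
            String[] words = line.trim().split("\\s+");
            if (words[0].isEmpty()) {
                continue; // blank line
            }
            // Everything after the ID is a candidate word.
            Set<String> present = new HashSet<>(Arrays.asList(words).subList(1, words.length));
            if (present.containsAll(terms)) {
                hits.merge(words[0], 1, Integer::sum);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "123 Name line Moon night table",
                "124 Laugh Cry Dog",
                "123 quote line make pet table",
                "127 line array hello table");
        Set<String> terms = new HashSet<>(Arrays.asList("line", "table"));
        // prints {123=2, 127=1}
        System.out.println(countMatches(lines, terms));
    }
}
```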

how to find duplicate and unique string entries using Hashtable

Assume I'm taking input a string from command line and I want to find the duplicate and unique entries in the string by using Hashtable.
eg:
i/p:
hi hello bye hi good hello name hi day hi
o/p:
Unique elements are: bye, good, name, day
Duplicate elements are:
hi 4 times
hello 2 times
You can break the input apart by calling split(" ") on the input String. This will return a String[] containing the individual words. Iterate over this array, and use each String as the key into your Hashtable, with the value being an Integer. Each time you encounter a word, increment its value; if no value is there yet, start from 0 so that the first increment stores 1.
Hashtable<String, Integer> hashtable = new Hashtable<String, Integer>();
String[] splitInput = input.split(" ");
for(String inputToken : splitInput) {
Integer val = hashtable.get(inputToken);
if(val == null) {
val = Integer.valueOf(0);
}
++val;
hashtable.put(inputToken, val);
}
Also, you may want to look into HashMap rather than Hashtable. HashMap is not thread safe, but is faster. Hashtable is a bit slower, but is thread safe. If you are trying to do this in a single thread, I would recommend HashMap.
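Once the counts exist, separating unique from duplicate entries is a single pass over the map. A minimal sketch, assuming a hypothetical class UniqueAndDuplicate; it uses a LinkedHashMap rather than a Hashtable only so that the words come out in the order they were first seen. The same loop works with a Hashtable, but without any guaranteed order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UniqueAndDuplicate {
    // Counts each whitespace-separated token, keeping first-seen order.
    static Map<String, Integer> count(String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : input.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("hi hello bye hi good hello name hi day hi");
        List<String> unique = new ArrayList<>();
        StringBuilder duplicates = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == 1) {
                unique.add(e.getKey());          // seen exactly once
            } else {
                duplicates.append(e.getKey()).append(' ')
                          .append(e.getValue()).append(" times\n");
            }
        }
        System.out.println("Unique elements are: " + String.join(", ", unique));
        System.out.print("Duplicate elements are:\n" + duplicates);
    }
}
```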
Use a hashtable with string as key and a numeric type as counter.
Go through all the words and if they are not in the map, insert them; otherwise increase the count (the data part of the hashtable).
hth
Mario
You can convert each string into an integer and then use the generated integer as the hash value. To convert a string to an int, you can treat it as a base-256 number and convert it.
