Java Unicode Characters - java

I'm familiar with this kind of problem for ASCII, but I have no experience with the same problems for Unicode characters. For example, given a string array of words, how do I return the word that occurs most frequently? Thanks in advance!
P.S.: For ASCII you can always use an array of length 256 to represent all the characters, but you can't do that with Unicode. Is a HashMap required, and is it the best way to solve the problem? I heard that there are better ways. Below is what I can think of:
String str = "aa df ds df df"; // assume they are Unicode
String[] words = str.split(" ");
HashMap<String, Integer> map = new HashMap<String, Integer>();
for (String word : words){
if (map.containsKey(word)){
int f = map.get(word);
map.put(word, f+1);
} else{
map.put(word, 1);
}
}
int max = 0;
String maxWord = "";
for (String word : words){
int f = map.get(word);
if (f > max){
max = f;
maxWord = word;
}
}
System.out.println(maxWord + " " +max);

// Inspired by GameKyuubi: it can also be solved by sorting the array and then counting the most frequently used word in constant space.
Arrays.sort(words);
int max = 0;
int count = 0;
String maxWord = "";
String prev = "";
for (String word : words){
if (prev.equals("") || word.equals(prev)){
count++;
} else{
count = 1;
}
if (max < count){
max = count;
maxWord = word;
}
prev = word;
}
System.out.println(maxWord + " " +max);
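On the Unicode part of the question: a HashMap keyed by String does not care whether the words are ASCII or not, because String.equals and String.hashCode work on the string's contents, so no fixed 256-slot array is needed. Below is a minimal sketch of the same counting idea using Map.merge and tracking the maximum in the same pass. The class and method names are made up for illustration. One assumption to be aware of: visually identical words that are encoded differently (composed vs. decomposed accents) would count as different keys unless you normalize them first, for example with java.text.Normalizer.

```java
import java.util.HashMap;
import java.util.Map;

public class MostFrequentWord {
    // Returns the most frequent word, or null for an empty array.
    // On a tie, the word that reaches the top count first wins.
    static String mostFrequent(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String word : words) {
            // merge() stores 1 for a new key, otherwise adds 1 to the stored count
            int c = counts.merge(word, 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] words = "aa df ds df df".split(" ");
        System.out.println(mostFrequent(words)); // df
    }
}
```

This runs in a single pass over the words, whereas the sort-based variant pays for the sort first.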

Related

Find the most common word from user input

I'm very new to Java creating a software application that allows a user to input text into a field and the program runs through all of the text and identifies what the most common word is. At the moment, my code looks like this:
JButton btnMostFrequentWord = new JButton("Most Frequent Word");
btnMostFrequentWord.addActionListener(new ActionListener() {
public void actionPerformed(ActionEvent e) {
String text = textArea.getText();
String[] words = text.split("\\s+");
HashMap<String, Integer> occurrences = new HashMap<String, Integer>();
for (String word : words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
}
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + occurrences.values());
}
});
This just prints what the values of the words are, but I would like it to tell me what the number one most common word is instead. Any help would be really appreciated.
Just after your for loop, you can sort the map by value then reverse the sorted entries by value and select the first.
for (String word: words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
}
Map.Entry<String,Integer> tempResult = occurrences.entrySet().stream()
.sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
.findFirst().get();
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + tempResult.getKey());
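If only the top entry is needed, sorting every entry is more work than necessary. A possible alternative, sketched here with hypothetical class and method names, is Collections.max with the same comparingByValue comparator, which scans the entries once. It assumes the map is not empty (Collections.max throws NoSuchElementException otherwise).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class MaxEntryDemo {
    // Returns the key with the largest count; the map must not be empty.
    static String mostFrequent(Map<String, Integer> occurrences) {
        return Collections.max(occurrences.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        Map<String, Integer> occurrences = new HashMap<>();
        for (String word : "to be or not to be to".split("\\s+")) {
            occurrences.merge(word, 1, Integer::sum);
        }
        System.out.println(mostFrequent(occurrences)); // to
    }
}
```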
For anyone who is more familiar with Java, here is a very easy way to do it with Java 8:
List<String> words = Arrays.asList(text.split("\\s+"));
Collections.sort(words, Comparator.comparingInt(word -> {
return Collections.frequency(words, word);
}).reversed());
The most common word is stored in words.get(0) after sorting.
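Note that the comparator above calls Collections.frequency for every comparison, and each call scans the whole list, so it gets slow for long texts. A different Java 8 approach (not the one from this answer, just a sketch with made-up names) is to count everything in one pass with Collectors.groupingBy and then pick the largest entry:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StreamWordCount {
    // Counts each word once, then picks the entry with the highest count.
    static String mostFrequent(String text) {
        Map<String, Long> counts = Arrays.stream(text.trim().split("\\s+"))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null); // only reached if there were no entries at all
    }

    public static void main(String[] args) {
        System.out.println(mostFrequent("a b a c a b")); // a
    }
}
```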
I would do something like this
int max = 0;
String a = null;
for (String word : words) {
int value = 0;
if(occurrences.containsKey(word)){
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
if(max < value+1){
max = value+1;
a = word;
}
}
System.out.println(a);
You could sort it, and the solution would be much shorter, but I think this runs faster.
You can either iterate through the occurrences map afterwards and find the max, or track the max while counting, like below:
String text = textArea.getText();
String[] words = text.split("\\s+");
HashMap<String, Integer> occurrences = new HashMap<>();
int mostFreq = -1;
String mostFreqWord = null;
for (String word : words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
value = value + 1;
occurrences.put(word, value);
if (value > mostFreq) {
mostFreq = value;
mostFreqWord = word;
}
}
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + mostFreqWord);

How to get the total count of occurrence of a word in a sentence

I am trying to find the total number of occurrences of a word in a sentence.
I tried the following code:
String str = "This is stackoverflow and you will find great solutions here.stackoverflowstackoverflow is a large community of talented coders.It hepls you to find solutions for every complex problems.";
String findStr = "hello World";
String[] split=findStr.split(" ");
for(int i=0;i<split.length;i++){
System.out.println(split[i]);
String indexWord=split[i];
int lastIndex = 0;
int count = 0;
while(lastIndex != -1){
lastIndex = str.indexOf(indexWord,lastIndex);
System.out.println(lastIndex);
if(lastIndex != -1){
count ++;
lastIndex += findStr.length();
}
}
System.out.println("Count for word "+indexWord+" is : "+count);
}
If I pass a string like "stack solution", it should be split into two words (on the space), and I need to find the number of occurrences of each word in the sentence. The count is correct if I pass only one word. The code also has to match substrings that contain the searched string.
E.g. "stack" appears three times in the sentence, but the count is only 2.
Thanks.
When you increment lastIndex after a match, you mean to increment it by the length of the match (indexWord), not the length of the string of input words (findStr). Just replace the line
lastIndex += findStr.length();
with
lastIndex += indexWord.length();
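Put together, the corrected loop could look like the sketch below. The helper name countOccurrences is made up for this example, and the empty-string guard is an addition that is not in the question: without it, an empty search word would never advance the index.

```java
public class SubstringCount {
    // Counts non-overlapping occurrences of word inside text, substrings included.
    static int countOccurrences(String text, String word) {
        if (word.isEmpty()) {
            return 0; // guard: an empty needle matches everywhere and never advances
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(word, from)) != -1) {
            count++;
            from += word.length(); // advance by the length of the matched word
        }
        return count;
    }

    public static void main(String[] args) {
        String str = "This is stackoverflow here.stackoverflowstackoverflow is large";
        for (String w : "stack is".split(" ")) {
            System.out.println("Count for word " + w + " is : " + countOccurrences(str, w));
        }
    }
}
```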
Try this code:
String str = "helloslkhellodjladfjhello";
String findStr = "hello";
int lastIndex = 0;
int count = 0;
while(lastIndex != -1){
lastIndex = str.indexOf(findStr,lastIndex);
if(lastIndex != -1){
count ++;
lastIndex += findStr.length();
}
}
System.out.println(count);
You can use map for this as well.
public static void main(String[] args) {
String value = "This is simple sting with simple have two occurence";
Map<String, Integer> map = new HashMap<>();
for (String w : value.split(" ")) {
if (!w.equals("")) {
Integer n = map.get(w);
n = (n == null) ? 1 : ++n;
map.put(w, n);
}
}
System.out.println("map" + map);
}
Is there any reason not to use a ready-made API here?
StringUtils in Apache commons-lang has a countMatches method that counts the number of occurrences of one String in another.
E.g.
String input = "This is stackoverflow and you will find great solutions here.stackoverflowstackoverflow is a large community of talented coders.It hepls you to find solutions for every complex problems.";
String findStr = "stackoverflow is";
for (String s : Arrays.asList(findStr.split(" "))) {
int occurance = StringUtils.countMatches(input, s);
System.out.println(occurance);
}

sub arraylist's size isn't correct

After a lot of searching I still haven't found a proper answer to my question, so here it is:
I have to write a Java program that reads an array of strings and finds the longest sequence of equal elements in it. If several sequences have the same longest length, the program should print the leftmost of them. The input strings are given on a single line, separated by spaces.
For example:
if the input is: "hi yes yes yes bye",
the output should be: "yes yes yes".
And here is my source code:
public static void main(String[] args) {
System.out.println("Please enter a sequence of strings separated by spaces:");
Scanner inputStringScanner = new Scanner(System.in);
String[] strings = inputStringScanner.nextLine().split(" ");
System.out.println(String.join(" ", strings));
ArrayList<ArrayList<String>> stringsSequencesCollection = new ArrayList<ArrayList<String>>();
ArrayList<String> stringsSequences = new ArrayList<String>();
stringsSequences.add(strings[0]);
for (int i = 1; i < strings.length; i++) {
if(strings[i].equals(strings[i - 1])) {
stringsSequences.add(strings[i]);
} else {
System.out.println(stringsSequences + " " + stringsSequences.size());
stringsSequencesCollection.add(stringsSequences);
stringsSequences.clear();
stringsSequences.add(strings[i]);
//System.out.println("\n" + stringsSequences);
}
if(i == strings.length - 1) {
stringsSequencesCollection.add(stringsSequences);
stringsSequences.clear();
System.out.println(stringsSequences + " " + stringsSequences.size());
}
}
System.out.println(stringsSequencesCollection.size());
System.out.println(stringsSequencesCollection.get(2).size());
System.out.println();
int maximalStringSequence = Integer.MIN_VALUE;
int index = 0;
ArrayList<String> currentStringSequence = new ArrayList<String>();
for (int i = 0; i < stringsSequencesCollection.size(); i++) {
currentStringSequence = stringsSequencesCollection.get(i);
System.out.println(stringsSequencesCollection.get(i).size());
if (stringsSequencesCollection.get(i).size() > maximalStringSequence) {
maximalStringSequence = stringsSequencesCollection.get(i).size();
index = i;
//System.out.println("\n" + index);
}
}
System.out.println(String.join(" ", stringsSequencesCollection.get(index)));
I think it should work correctly, but there is a problem: the sizes of the sub ArrayLists aren't correct. Every sub ArrayList has size 1, and for this reason the output is wrong. I don't understand why. If anybody can help me fix the code I will be grateful!
I think it is fairly straightforward: just keep track of the max sequence length as you go through the array building sequences.
String input = "hi yes yes yes bye";
String sa[] = input.split(" ");
int maxseqlen = 1;
String last_sample = sa[0];
String longest_seq = last_sample;
int seqlen = 1;
String seq = last_sample;
for (int i = 1; i < sa.length; i++) {
String sample = sa[i];
if (sample.equals(last_sample)) {
seqlen++;
seq += " " + sample;
if (seqlen > maxseqlen) {
longest_seq = seq;
maxseqlen = seqlen;
}
} else {
seqlen = 1;
seq = sample;
}
last_sample = sample;
}
System.out.println("longest_seq = " + longest_seq);
Lots of issues.
First of all, when dealing with the last string of the list you are not actually printing it before clearing it. Should be:
if(i == strings.length - 1)
//...
System.out.println(stringsSequences + " " + stringsSequences.size());
stringsSequences.clear();
This is the error in the output.
Secondly, and most importantly, when you do stringsSequencesCollection.add you are adding an OBJECT, i.e. a reference to the collection. When after you do stringsSequences.clear(), you empty the collection you just added too (this is because it's not making a copy, but keeping a reference!). You can verify this by printing stringsSequencesCollection after the first loop finishes: it will contain 3 empty lists.
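To illustrate the reference problem in isolation: the smallest change to the original approach is to add a copy of the working list, so that clearing the working list afterwards does not touch what was stored. This is only a sketch, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class AliasDemo {
    // Groups consecutive equal items into separate lists.
    static List<List<String>> groupRuns(String[] items) {
        List<List<String>> runs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String item : items) {
            if (!current.isEmpty() && !current.get(0).equals(item)) {
                runs.add(new ArrayList<>(current)); // store a copy, not the live list
                current.clear();                    // safe: the copy is unaffected
            }
            current.add(item);
        }
        if (!current.isEmpty()) {
            runs.add(new ArrayList<>(current));
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(groupRuns("hi yes yes yes bye".split(" ")));
        // [[hi], [yes, yes, yes], [bye]]
    }
}
```

With runs.add(current) instead of the copy, every element of runs would be the same list object, and it would be empty after the last clear().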
So how do we do this? First of all, we need a more appropriate data structure. We are going to use a Map that, for each string, contains the length of its longest sequence. Since we want to manage ties too, we'll also have another map that for each string stores the leftmost ending position of the longest sequence:
Map<String, Integer> lengths= new HashMap<>();
Map<String, Integer> indexes= new HashMap<>();
String[] split = input.split(" ");
lengths.put(split[0], 1);
indexes.put(split[0], 0);
int currentLength = 1;
int maxLength = 1;
for (int i = 1; i<split.length; i++) {
String s = split[i];
if (s.equals(split[i-1])) {
currentLength++;
}
else {
currentLength = 1;
}
int oldLength = lengths.getOrDefault(s, 0);
if (currentLength > oldLength) {
lengths.put(s, currentLength);
indexes.put(s, i);
}
maxLength = Math.max(maxLength, currentLength);
}
//At this point, lengths maps each string -> its max sequence length, and indexes maps each string -> the leftmost ending index of its longest sequence. Now we need to reason on those!
Now we can just scan for the strings with the longest sequences:
//Find all strings with equal maximal length sequences
Set<String> longestStrings = new HashSet<>();
for (Map.Entry<String, Integer> e: lengths.entrySet()) {
if (e.getValue() == maxLength) {
longestStrings.add(e.getKey());
}
}
//Of those, search the one with minimal index
int minIndex = input.length();
String bestString = null;
for (String s: longestStrings) {
int index = indexes.get(s);
if (index < minIndex) {
minIndex = index;
bestString = s;
}
}
System.out.println(bestString);
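The code below produces the expected output for the sample input (it collects every element that equals its predecessor, so it is only correct when the input contains a single run of repeated elements):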
public static void main(String[] args) {
System.out.println("Please enter a sequence of strings separated by spaces:");
Scanner inputStringScanner = new Scanner(System.in);
String[] strings = inputStringScanner.nextLine().split(" ");
System.out.println(String.join(" ", strings));
List <ArrayList<String>> stringsSequencesCollection = new ArrayList<ArrayList<String>>();
List <String> stringsSequences = new ArrayList<String>();
//stringsSequences.add(strings[0]);
boolean flag = false;
for (int i = 1; i < strings.length; i++) {
if(strings[i].equals(strings[i - 1])) {
if(flag == false){
stringsSequences.add(strings[i]);
flag= true;
}
stringsSequences.add(strings[i]);
}
}
int maximalStringSequence = Integer.MIN_VALUE;
int index = 0;
List <String> currentStringSequence = new ArrayList<String>();
for (int i = 0; i < stringsSequencesCollection.size(); i++) {
currentStringSequence = stringsSequencesCollection.get(i);
System.out.println(stringsSequencesCollection.get(i).size());
if (stringsSequencesCollection.get(i).size() > maximalStringSequence) {
maximalStringSequence = stringsSequencesCollection.get(i).size();
index = i;
//System.out.println("\n" + index);
}
}
System.out.println(stringsSequences.toString());
}

Java words reverse

I am new to Java and I found an interesting problem which I wanted to solve. I am trying to code a program that reverses the position of each word of a string. For example, if the input string is "HERE AM I", the output string will be "I AM HERE". I have made a start, but it's not working out for me. Could anyone kindly point out the error and how to fix it? I am really curious to know what's going wrong. Thanks!
import java.util.Scanner;
public class Count{
static Scanner sc = new Scanner(System.in);
static String in = ""; static String ar[];
void accept(){
System.out.println("Enter the string: ");
in = sc.nextLine();
}
void intArray(int words){
ar = new String[words];
}
static int Words(String in){
in = in.trim(); //Rm space
int wc = 1;
char c;
for (int i = 0; i<in.length()-1;i++){
if (in.charAt(i)==' '&&in.charAt(i+1)!=' ') wc++;
}
return wc;
}
void generate(){
char c; String w = ""; int n = 0;
for (int i = 0; i<in.length(); i++){
c = in.charAt(i);
if (c!=' '){
w += c;
}
else {
ar[n] = w; n++;
}
}
}
void printOut(){
String finale = "";
for (int i = ar.length-1; i>=0;i--){
finale = finale + (ar[i]);
}
System.out.println("Reversed words: " + finale);
}
public static void main(String[] args){
Count a = new Count();
a.accept();
int words = Words(in);
a.intArray(words);
a.generate();
a.printOut();
}
}
Got it. Here is my code that implements split and reverse from scratch.
The split function is implemented by iterating through the string and keeping track of start and end indexes. When the character at the current index is a " ", the program sets the end index to that position, adds the substring from start to end to an ArrayList, and then sets a new start index just past the space.
Reverse is very straightforward: you simply iterate over the list of parts from the last element to the first.
Example:
Input: df gf sd
Output: sd gf df
import java.util.Scanner;
import java.util.ArrayList;
import java.util.Collections;
public class Count{
public static void main(String[] args)
{
Scanner scan = new Scanner(System.in);
System.out.println("Enter string to reverse: ");
String unreversed = scan.nextLine();
System.out.println("Reversed String: " + reverse(unreversed));
}
public static String reverse(String unreversed)
{
ArrayList<String> parts = new ArrayList<String>();
String reversed = "";
int start = 0;
int end = 0;
for (int i = 0; i < unreversed.length(); i++)
{
if (unreversed.charAt(i) == ' ')
{
end = i;
parts.add(unreversed.substring(start, end));
start = i + 1;
}
}
parts.add(unreversed.substring(start, unreversed.length()));
for (int i = parts.size()-1; i >= 0; i--)
{
reversed += parts.get(i);
reversed += " ";
}
return reversed;
}
}
Here is my suggestion:
String s = " HERE AM I ";
s = s.trim();
int j = s.length() - 1;
int index = 0;
StringBuilder builder = new StringBuilder();
for (int i = j; i >= 0; i--) {
char c = s.charAt(i);
if (Character.isWhitespace(c)) {
index = i;
String r = s.substring(index+1, j+1);
j = index - 1;
builder.append(r);
builder.append(" ");
}
}
String r = s.substring(0, j+1); // first word; also covers input without any whitespace
builder.append(r);
System.out.println(builder.toString());
From adding debug output between each method call it's easy to determine that you're successfully reading the input, counting the words, and initializing the array. That means that the problem is in generate().
Problem 1 in generate() (why "HERE" is duplicated in the output): after you add w to your array (when the word is complete) you don't reset w to "", meaning every word has the previous word(s) prepended to it. This is easily seen by adding debug output (or using a debugger) to print the state of ar and w each iteration of the loop.
Problem 2 in generate() (why "I" isn't in the output): there isn't a trailing space in the string, so the condition that adds a word to the array is never met for the last word before the loop terminates at the end of the string. The easy fix is to just add ar[n] = w; after the end of the loop to cover the last word.
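A sketch of generate() with both fixes applied is below. It is written as a static method that takes the input and the word count as parameters so that it can run on its own, which is a change from the original instance method. It also skips empty words produced by repeated spaces, which is an extra assumption so that the result lines up with what Words() counts.

```java
public class GenerateFix {
    // Splits in on spaces into an array of wordCount words.
    static String[] generate(String in, int wordCount) {
        String[] ar = new String[wordCount];
        String w = "";
        int n = 0;
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c != ' ') {
                w += c;
            } else if (!w.isEmpty()) { // assumption: ignore runs of spaces
                ar[n++] = w;
                w = "";                // fix 1: reset w once the word is stored
            }
        }
        if (!w.isEmpty()) {
            ar[n] = w;                 // fix 2: store the last word after the loop
        }
        return ar;
    }

    public static void main(String[] args) {
        String[] ar = generate("HERE AM I", 3);
        StringBuilder out = new StringBuilder();
        for (int i = ar.length - 1; i >= 0; i--) {
            out.append(ar[i]);
            if (i > 0) out.append(' ');
        }
        System.out.println("Reversed words: " + out); // Reversed words: I AM HERE
    }
}
```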
I would use the split function and then print from the end of the list to the front.
String[] splitString = str.split(" ");
for(int i = splitString.length - 1; i >= 0; i--){
System.out.print(splitString[i]);
if(i != 0) System.out.print(' ');
}
Oops read your comment. Disregard this if it is not what you want.
This uses a hand-written function that does the same job as split, instead of the predefined split function:
public static void main(String args[]) {
Scanner sc = new Scanner(System.in);
System.out.println("Enter the string : ");
String input = sc.nextLine();
// This splits the string into array of words separated with " "
String arr[] = myOwnSplit(input.trim(), ' '); // ["I", "AM", "HERE"]
// This ll contain the reverse string
String rev = "";
// Reading the array from the back
for(int i = (arr.length - 1) ; i >= 0 ; i --) {
// putting the words into the reverse string with a space to it's end
rev += (arr[i] + " ");
}
// Getting rid of the last extra space
rev = rev.trim(); // Strings are immutable, so the trimmed result must be assigned back
System.out.println("The reverse of the given string is : " + rev);
}
// The is my own version of the split function
public static String[] myOwnSplit(String str, char regex) {
char[] arr = str.toCharArray();
ArrayList<String> spltedArrayList = new ArrayList<String>();
String word = "";
// splitting the string based on the regex and bulding an arraylist
for(int i = 0 ; i < arr.length ; i ++) {
char c = arr[i];
if(c == regex) {
spltedArrayList.add(word);
word = "";
} else {
word += c;
}
if(i == (arr.length - 1)) {
spltedArrayList.add(word);
}
}
String[] splitedArray = new String[spltedArrayList.size()];
// Converting the arraylist to string array
for(int i = 0 ; i < spltedArrayList.size() ; i++) {
splitedArray[i] = spltedArrayList.get(i);
}
return splitedArray;
}

How do I count occurrences of words in a line

I am fairly new to Java. I want to count the occurrences of words in a particular line. So far I can only count the words, but I have no idea how to count occurrences.
Is there a simple way to do this?
Scanner file = new Scanner(new FileInputStream("/../output.txt"));
int count = 0;
while (file.hasNextLine()) {
String s = file.nextLine();
count++;
if(s.contains("#AVFC")){
System.out.printf("There are %d words on this line ", s.split("\\s").length-1);
System.out.println(count);
}
}
file.close();
Output:
There are 4 words on this line 1
There are 8 words on this line 13
There are 3 words on this line 16
Simplest way I can think of is to use String.split("\\s"), which will split based on spaces.
Then have a HashMap containing a word as the key with the value being the number of times it is used.
HashMap<String, Integer> mapOfWords = new HashMap<String, Integer>();
while (file.hasNextLine()) {
String s = file.nextLine();
String[] words = s.split("\\s");
int count;
for (String word : words) {
if (mapOfWords.get(word) == null) {
mapOfWords.put(word, 1);
}
else {
count = mapOfWords.get(word);
mapOfWords.put(word, count + 1);
}
}
}
Here is the implementation you requested, which skips lines that contain certain words:
HashMap<String, Integer> mapOfWords = new HashMap<String, Integer>();
while (file.hasNextLine()) {
String s = file.nextLine();
String[] words = s.split("\\s");
int count;
if (isStringWanted(s) == false) {
continue;
}
for (String word : words) {
if (mapOfWords.get(word) == null) {
mapOfWords.put(word, 1);
}
else {
count = mapOfWords.get(word);
mapOfWords.put(word, count + 1);
}
}
}
private boolean isStringWanted(String s) {
String[] checkStrings = new String[] {"chelsea", "Liverpool", "#LFC"};
for (String check : checkStrings) {
if (s.contains(check)) {
return false;
}
}
return true;
}
Try the code below; it may solve your problem. In addition, you can call String.toLowerCase() on each word before you put it into the HashMap to make the count case-insensitive.
String line ="a a b b b b a q c c";
...
Map<String,Integer> map = new HashMap<String,Integer>();
Scanner scanner = new Scanner(line);
while (scanner.hasNext()) {
String s = scanner.next();
Integer count = map.put(s,1);
if(count!=null) map.put(s,count + 1);
}
...
System.out.println(map);
Result:
{b=4, c=2, q=1, a=3}
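The simplest way would be to store the split words in an ArrayList, then iterate over your ArrayList and use [Collections.frequency](http://www.tutorialspoint.com/java/util/collections_frequency.htm). Note that each call scans the whole list, so this is not the fastest approach for large inputs.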
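A small sketch of the Collections.frequency idea, with invented class and method names. It asks for the frequency once per distinct word rather than once per list element, and uses a LinkedHashMap only so that the output keeps first-seen order:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class FrequencyDemo {
    // Maps each distinct word of the line to how often it occurs.
    static Map<String, Integer> counts(String line) {
        List<String> words = Arrays.asList(line.trim().split("\\s+"));
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String w : new LinkedHashSet<>(words)) {
            // Collections.frequency scans the whole list on every call
            result.put(w, Collections.frequency(words, w));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(counts("a a b b b b a q c c")); // {a=3, b=4, q=1, c=2}
    }
}
```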
Check Guava's Multiset. Their description starts with 'The traditional Java idiom for e.g. counting how many times a word occurs in a document is something like:'. You will find some code snippets there showing how to do that without a Multiset.
BTW: If you only want to count the number of words in your string, why not just count the spaces? (With single spaces between words, the word count is the number of spaces plus one.) You could use StringUtils from Apache Commons. It's much better than creating an array of the split parts. Also have a look at their implementation.
int count = StringUtils.countMatches(string, " ");
In a given String, occurrences of another String can be counted using String#indexOf(String, int) in a loop:
String haystack = "This is a string";
String needle = "i";
int index = -1; // start at -1 so that a match at index 0 is not skipped by index + 1
while (index != -1) {
index = haystack.indexOf(needle, index + 1);
if (index != -1) {
System.out.println(String.format("Found %s in %s at index %s.", needle, haystack, index));
}
}
