Efficiently Compare Successive Characters in String - java

I'm doing some text analysis, and need to record the frequencies of character transitions in a String. I have n categories of characters: for the sake of example, isUpperCase(), isNumber(), and isSpace().
Given that there are n categories, there will be n^2 categories of transitions, e.g. "isUpperCase() --> isUpperCase()", "isUpperCase() --> isLetter()", "isLetter() --> isUpperCase()", etc.
Given a block of text, I would like to record the number of transitions that took place. I would imagine constructing a Map with the transition types as the Keys, and an Integer as each Value.
For the block of text "TO " (note the trailing space), the Map would look like [isUpper -> isUpper : 1, isUpper -> isSpace : 1]
The part I cannot figure out, though, is how to construct a Map where, from what I can see, the Key would consist of 2 boolean methods.

Create an enum that represents character types - you need a way to get a character type enum given a character. I'm sure there are better ways to do that than what I have done below but that is left as an exercise to the reader.
Next create a method that takes the previous and current characters and concatenates their types into a unique String.
Finally loop over the input string and hey presto.
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;

// The enclosing class name is arbitrary; the original snippet showed only its members.
public class TransitionCount {
    private static enum CharacterType {
        UPPER {
            @Override
            boolean isA(final char c) {
                return Character.isUpperCase(c);
            }
        },
        LOWER {
            @Override
            boolean isA(final char c) {
                return Character.isLowerCase(c);
            }
        },
        SPACE {
            @Override
            boolean isA(final char c) {
                return Character.isWhitespace(c);
            }
        },
        UNKNOWN {
            @Override
            boolean isA(final char c) {
                return false;
            }
        };

        abstract boolean isA(final char c);

        // Returns the first type whose isA() accepts the character, or UNKNOWN if none does.
        public static CharacterType toType(final char c) {
            for (CharacterType type : values()) {
                if (type.isA(c)) {
                    return type;
                }
            }
            return UNKNOWN;
        }
    }

    private static String getTransitionType(final CharacterType prev, final CharacterType current) {
        return prev + "_TO_" + current;
    }

    public static void main(String[] args) {
        final String myString = "AAaaA Aaa AA";
        // put() is overridden so that it adds to the existing count instead of replacing it.
        final Map<String, Integer> countMap = new TreeMap<String, Integer>() {
            @Override
            public Integer put(final String key, final Integer value) {
                final Integer currentCount = get(key);
                if (currentCount == null) {
                    return super.put(key, value);
                }
                return super.put(key, currentCount + value);
            }
        };
        final char[] myStringAsArray = myString.toCharArray();
        CharacterType prev = CharacterType.toType(myStringAsArray[0]);
        for (int i = 1; i < myStringAsArray.length; ++i) {
            final CharacterType current = CharacterType.toType(myStringAsArray[i]);
            countMap.put(getTransitionType(prev, current), 1);
            prev = current;
        }
        for (final Entry<String, Integer> entry : countMap.entrySet()) {
            System.out.println(entry);
        }
    }
}
Output:
LOWER_TO_LOWER=2
LOWER_TO_SPACE=1
LOWER_TO_UPPER=1
SPACE_TO_SPACE=1
SPACE_TO_UPPER=2
UPPER_TO_LOWER=2
UPPER_TO_SPACE=1
UPPER_TO_UPPER=2
Running the method on the content of your question (825 chars) took 9ms.
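On Java 8 or later, the counting step can probably be done without subclassing TreeMap, by using Map.merge. The following is only a minimal sketch: the class name TransitionCounter and the helper classify are made up for illustration, and the three categories are the ones assumed in the answer above.

```java
import java.util.Map;
import java.util.TreeMap;

class TransitionCounter {
    // Hypothetical helper: maps a character to a category name.
    static String classify(char c) {
        if (Character.isUpperCase(c)) return "UPPER";
        if (Character.isLowerCase(c)) return "LOWER";
        if (Character.isWhitespace(c)) return "SPACE";
        return "UNKNOWN";
    }

    // Counts the transitions between the categories of adjacent characters.
    static Map<String, Integer> countTransitions(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 1; i < text.length(); i++) {
            String key = classify(text.charAt(i - 1)) + "_TO_" + classify(text.charAt(i));
            // Inserts 1 for a new key, otherwise adds 1 to the stored count.
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countTransitions("TO "));
    }
}
```

The key is still a concatenated String, as in the answer; only the bookkeeping changes.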

If you think most of the transitions will be present, then a two-dimensional array would work best:
int n = _categories.size();
int[][] _transitionFreq = new int[n][n];
If you think it will be a sparse array, then a map will be more efficient in terms of memory usage, but less efficient in terms of performance.
It's a trade-off you'll have to make depending on your data and the number of character types.
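One way the two-dimensional array could be indexed is by enum ordinal. This is a sketch under assumptions: the Category enum, the categorize helper, and the class name TransitionMatrix are illustrative and not part of the answer above.

```java
class TransitionMatrix {
    // Assumed categories for illustration; OTHER catches everything else.
    enum Category { UPPER, LOWER, SPACE, OTHER }

    static Category categorize(char c) {
        if (Character.isUpperCase(c)) return Category.UPPER;
        if (Character.isLowerCase(c)) return Category.LOWER;
        if (Character.isWhitespace(c)) return Category.SPACE;
        return Category.OTHER;
    }

    // freq[from][to] holds the number of transitions from category "from" to category "to".
    static int[][] count(String text) {
        int n = Category.values().length;
        int[][] freq = new int[n][n];
        for (int i = 1; i < text.length(); i++) {
            int from = categorize(text.charAt(i - 1)).ordinal();
            int to = categorize(text.charAt(i)).ordinal();
            freq[from][to]++;
        }
        return freq;
    }

    public static void main(String[] args) {
        int[][] freq = count("AAa B");
        System.out.println(freq[Category.UPPER.ordinal()][Category.LOWER.ordinal()]);
    }
}
```

The array always occupies n * n ints, which is what makes it a poor fit when most transitions never occur.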

Related

LinkedHashMap - Iteration (Java)

I'm trying to iterate through a LinkedHashMap keySet, but I am having difficulty getting it to work.
Essentially I am searching the keySet to find a word, and another word. If the second word is immediately after the first word I wish to return true. This is the progress I have made so far.
for (String word : storedWords.keySet()) {
value0++;
if(word.equals(firstWord)){
value1 = value0;
}
if(word.equals(secondWord)){
value2 = value0;
}
int value3 = value2 - 1;
if(value1 == value3){
result = true;
break;
}
}
EDIT:
I've solved my problem and am thankful for all of those who helped. I apologise for making a post when there was a lot of information available on the website however I just lacked the understanding of the logic behind it.
You can avoid iterating over the whole keySet by storing the indices of each element in a separate map; then you can just test if both keys are present and have indices differing by 1. For convenience, encapsulate both maps into an object:
import java.util.*;
public class MapWithIndices<K, V> {
private final Map<K, V> map = new LinkedHashMap<>();
private final Map<K, Integer> indices = new HashMap<>();
public V get(K k) {
return map.get(k);
}
public V put(K k, V v) {
V old = map.put(k, v);
if(!indices.containsKey(k)) { // assign an index only on first insertion, even if the old value was null
indices.put(k, indices.size());
}
return old;
}
public boolean areAdjacent(K k1, K k2) {
Integer i1 = indices.get(k1);
Integer i2 = indices.get(k2);
return i1 != null && i2 != null && i1 + 1 == i2;
}
}
You can add more Map methods (e.g. size) by delegating them to map. However, the remove method cannot be implemented efficiently since it requires recomputing all later indices. If removing from the map is required, an alternative data structure design should be considered; for example, indices can store the original insertion order of each key, and an order statistic tree can be used to count how many existing keys have a lower original-insertion-order.
Map<String, String> map = ...
for (Map.Entry<String, String> entry : map.entrySet()) {
System.out.println(entry.getKey() + "/" + entry.getValue());
}
I think this is sort of in line with what you started with. You might want to test the performance though.
import java.util.LinkedHashMap;
import java.util.Map;
class Testing {
Map<String, Integer> storedWords = new LinkedHashMap<>();
{
storedWords.put("One",1);
storedWords.put("Two",2);
storedWords.put("Three",3);
storedWords.put("Four",4);
storedWords.put("Five",5);
}
public static void main(String[] args) {
Testing t = new Testing();
String firstWord;
String secondWord;
firstWord = "Three";
secondWord = "Five";
System.out.println(t.consecutive(firstWord, secondWord)); // false
firstWord = "Two";
secondWord = "Three";
System.out.println(t.consecutive(firstWord, secondWord)); // true
}
public boolean consecutive(String firstWord, String secondWord) {
boolean foundfirst = false;
for (String word : storedWords.keySet()) {
if (!foundfirst && word.equals(firstWord)){
foundfirst = true;
continue;
}
if (foundfirst) {
if(word.equals(secondWord)){
return true;
} else {
foundfirst = false; // reset to search for the first word again
}
}
}
return false;
}
}

recursively finding value of a String from a map

I have a hashmap containing Key and Value <String, String>.
i.e. mapValue:
mapValue.put("A","B-7");
mapValue.put("B","START+18");
mapValue.put("C","A+25");
Now I want to evaluate the expression for 'C'. For C, the expression should be expanded to (((START+18)-7)+25).
So if I pass the string C to some method, it should return the string "(((START+18)-7)+25)", and I also want to evaluate it according to operator priority.
Thanks
In general, the logic of such a function (assuming you know the possible operations and the syntax is strict) may be as follows:
public String eval(HashMap<String, String> mapValue, String variable) {
//get expression to be evaluated
String tmp = mapValue.get(variable);
// For each known operation
for (String op : OPERATIONS) {
// split expression in operators in Array
String[] vars = tmp.split("\\" + op);
// for each Element of splitted expr. Array
for (int i = 0; i < vars.length; i++) {
//Check if Element is a valid key in HashMap
if (mapValue.containsKey(vars[i])) {
//if it is replace element with result of iteration
vars[i] = eval(mapValue, vars[i]); // DO ITERATION
}
//if Element is not a valid key in the HashMap, do nothing
}
//Join splitted string with proper operator
tmp = join(vars, op);
}
//return in parenthesis
return "(" + tmp + ")";
}
The result of 'eval(mapValue,"C")' would be:
(((START+18)-7)+25)
Some short join function may be implemented as follows:
public String join(String[] arr, String d) {
String result = arr[0];
int i = 1;
while (i < arr.length) {
result += d + arr[i];
i++;
}
return result;
}
All the code above is meant to illustrate the logic; a real implementation should add exception handling, better string operations, and so on.
Hope it helps
Cheers!
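The eval method above refers to an OPERATIONS collection that is never shown. A self-contained sketch of the same recursion, assuming OPERATIONS contains only "+" and "-" and using String.join in place of the hand-written join (the class name ExpressionExpander is made up):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

class ExpressionExpander {
    // Assumption: only these two binary operators occur in the expressions.
    static final String[] OPERATIONS = {"+", "-"};

    static String eval(Map<String, String> mapValue, String variable) {
        String tmp = mapValue.get(variable);
        for (String op : OPERATIONS) {
            // Pattern.quote escapes the operator so it is treated literally by split.
            String[] vars = tmp.split(Pattern.quote(op));
            for (int i = 0; i < vars.length; i++) {
                // Only whole tokens are looked up, so "START" is not confused with "A".
                if (mapValue.containsKey(vars[i])) {
                    vars[i] = eval(mapValue, vars[i]);
                }
            }
            tmp = String.join(op, vars);
        }
        return "(" + tmp + ")";
    }

    public static void main(String[] args) {
        Map<String, String> mapValue = new HashMap<>();
        mapValue.put("A", "B-7");
        mapValue.put("B", "START+18");
        mapValue.put("C", "A+25");
        System.out.println(eval(mapValue, "C")); // prints (((START+18)-7)+25)
    }
}
```

Note that a cyclic definition (for example A referring to B and B referring back to A) would make this recursion run until the stack overflows; the sketch does not guard against that.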
As mentioned in the comments, I would not recommend recursion: it can lead to a StackOverflowError if the recursion gets too deep.
I would also recommend not using String equations. Strings are slow to parse and can lead to unexpected results (as mentioned by @rtruszk, "START" contains the variable name "A").
I created an example as my recommendation:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
public class X {
static interface Expression {
}
static class Combination implements Expression {
Expression[] values;
public Combination(Expression... values) {
this.values = values;
}
@Override
public String toString() {
return "?";
}
}
static class Reference implements Expression {
private String reference;
public Reference(String reference) {
this.reference = reference;
}
@Override
public String toString() {
return reference;
}
}
static class Number implements Expression {
private int value;
public Number(int value) {
this.value = value;
}
@Override
public String toString() {
return ""+value;
}
}
public static void main(String[] args) {
Map<String, Expression> mapValue = new HashMap<>();
mapValue.put("START", new Number(42));
String x = "C";
mapValue.put("A", new Combination( new Reference("B"), new Number(-7)));
mapValue.put("B", new Combination(new Reference("START"), new Number(+18)));
mapValue.put("C", new Combination( new Reference("A"), new Number(+25)));
int result = 0;
ArrayList<Expression> parts = new ArrayList<>();
parts.add(mapValue.get(x));
while (!parts.isEmpty()) {
debuggingOutput(x, result, parts);
Expression expression = parts.remove(0);
if (expression instanceof Combination)
parts.addAll(Arrays.asList(((Combination) expression).values));
else if (expression instanceof Reference)
parts.add(mapValue.get(((Reference) expression).reference));
else if (expression instanceof Number)
result += ((Number) expression).value;
}
System.out.println(result);
}
private static void debuggingOutput(String x, int result, ArrayList<Expression> parts) {
System.out.print(x);
System.out.print(" = ");
System.out.print(result);
for (Expression part : parts) {
System.out.print(" + ");
System.out.print(part);
}
System.out.println();
}
}

Searching through an Array of Objects

I'm attempting to return the index of where an object appears in an array of objects.
public static int search(WordCount[] list,WordCount word, int n)
{
int result = -1;
int i=0;
while (result < 0 && i < n)
{
if (word.equals(list[i]))
{
result = i;
break;
}
i++;
}
return result;
}
WordCount[] is the array of objects.
word is an instance of WordCount.
n is the number of objects in WordCount[]
It runs, but isn't returning the index correctly. Any and all help is appreciated. Thanks for your time.
CLASS
class WordCount
{
String word;
int count;
static boolean compareByWord;
public WordCount(String aWord)
{
setWord(aWord);
count = 1;
}
private void setWord(String theWord)
{
word=theWord;
}
public void increment()
{
count += 1;
}
public static void sortByWord()
{
compareByWord = true;
}
public static void sortByCount()
{
compareByWord = false;
}
public String toString()
{
String result = String.format("%s (%d)",word, count);
return result;
}
}
How I'm calling it...
for (int i=0;i<tokens.length;i++)
{
if (tokens[i].length()>0)
{
WordCount word = new WordCount(tokens[i]);
int foundAt = search(wordList, word, n);
if (foundAt >= 0)
{
wordList[foundAt].increment();
}
else
{
wordList[n]=word;
n++;
}
}
}
}
By default, Object#equals just returns whether or not the two references refer to the same object (same as the == operator). Looking at what you are doing, what you need to do is create a method in your WordCount to return word, e.g.:
public String getWord() {
return word;
}
Then change your comparison in search from:
if (word.equals(list[i]))
to:
if (word.getWord().equals(list[i].getWord()))
Or change the signature of the method to accept a String so you don't create a new object if you don't have to.
I wouldn't recommend overriding equals in WordCount so that it uses only word to determine object equality because you have other fields. (For example, one would also expect that two counters were equal only if their counts were the same.)
The other way you can do this is to use a Map which is an associative container. An example is like this:
public static Map<String, WordCount> getCounts(String[] tokens) {
Map<String, WordCount> map = new TreeMap<String, WordCount>();
for(String t : tokens) {
WordCount count = map.get(t);
if(count == null) {
// The WordCount constructor already starts the count at 1.
map.put(t, new WordCount(t));
} else {
count.increment();
}
}
return map;
}
This method is probably not working because the implementation of .equals() you are using is not correctly checking if the two objects are equal.
You need to either override the equals() and hashCode() methods for your WordCount object, or have it return something you can compare, i.e.: word.getWord().equals(list[i].getWord())
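For the first option, overriding equals() and hashCode() so that only the word is compared might look roughly like this. WordEntry is a hypothetical stand-in for WordCount, reduced to the fields that matter here.

```java
import java.util.Objects;

// Hypothetical stand-in for WordCount: equality is based on the word only.
class WordEntry {
    final String word;
    int count = 1;

    WordEntry(String word) {
        this.word = word;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof WordEntry)) return false;
        return Objects.equals(word, ((WordEntry) obj).word);
    }

    @Override
    public int hashCode() {
        // Must agree with equals: equal words give equal hash codes.
        return Objects.hashCode(word);
    }

    // Linear search over the first n slots, relying on the equals() above.
    static int search(WordEntry[] list, WordEntry target, int n) {
        for (int i = 0; i < n; i++) {
            if (target.equals(list[i])) {
                return i;
            }
        }
        return -1;
    }
}
```

With this definition two entries with the same word but different counts are considered equal, which is a design choice you should be comfortable with before adopting it.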
It seems easier to use:
public static int search(WordCount[] list, WordCount word)
{
for(int i = 0; i < list.length; i++){
if(list[i] != null && list[i].word.equals(word.word)){ // compare the words, not the references
return i;
}
}
return -1;
}
This checks each value in the array and compares it against the word that you specified.
The odd thing in the current approach is that you have to create a new WordCount object in order to look for the count of a particular word. You could add a method like
public boolean hasEqualWord(WordCount other)
{
return word.equals(other.word);
}
in your WordCount class, and use it instead of the equals method:
....
while (result < 0 && i < n)
{
if (word.hasEqualWord(list[i])) // <--- Use it here!
{
....
}
}
But I'd recommend that you rethink what you are going to model there, and how. While it is not technically "wrong" to create a class that summarizes a word and its "count", there may be more elegant solutions. For example, when this is only about counting words, you could consider a map:
Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
for (int i=0;i<tokens.length;i++)
{
if (tokens[i].length()>0)
{
Integer count = counts.get(tokens[i]);
if (count == null)
{
count = 0;
}
counts.put(tokens[i], count+1);
}
}
Afterwards, you can look up the number of occurrences of each word in this map:
String word = "SomeWord";
Integer count = counts.get(word);
System.out.println(word+" occurred "+count+" times");

How to get N most often words in given text, sorted from max to min?

I have been given a large text as input. I have made a HashMap that stores each different word as a key, and number of times that occurs as value (Integer).
Now I have to write a method mostOften(int k) that returns a List of the k most frequent words, ordered from the highest number of occurrences to the lowest (descending order), using the HashMap I built earlier.
The catch is that whenever two words have the same number of occurrences, they should be sorted alphabetically.
My first idea was to swap the keys and values of the HashMap and put them into a TreeMap, so that the TreeMap sorts the words by key (the Integer number of occurrences); I would then take the first or last k entries from the TreeMap.
But there will certainly be collisions when two or three words have the same count. I can compare the words alphabetically, but what Integer should I use as the key for the second word?
Any ideas how to implement this, or other options?
Hints:
Look at the javadocs for the Collections.sort methods ... both of them!
Look at the javadocs for Map.entrySet().
Think about how to implement a Comparator that compares instances of a class with two fields, using the 2nd as a "tie breaker" when the other compares as equal.
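Taken together, the hints point towards something like the following. It is only a sketch: the class name TopWords is invented, and the map is assumed to be the word-to-count HashMap from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class TopWords {
    static List<String> mostOften(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Higher counts first; the word itself is the tie breaker for equal counts.
        Comparator<Map.Entry<String, Integer>> byCountThenWord = (a, b) -> {
            int cmp = Integer.compare(b.getValue(), a.getValue());
            return cmp != 0 ? cmp : a.getKey().compareTo(b.getKey());
        };
        Collections.sort(entries, byCountThenWord);
        List<String> result = new ArrayList<>();
        // Math.min guards against k being larger than the number of distinct words.
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(k, entries.size()))) {
            result.add(e.getKey());
        }
        return result;
    }
}
```

Sorting all entries costs O(n log n) in the number of distinct words, which is usually acceptable unless the vocabulary is very large.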
Here's the solution I came up with.
First, create a class MyWord that stores the String value of the word and its number of occurrences.
Implement the Comparable interface for this class so that it sorts by occurrences first (descending) and then alphabetically when the number of occurrences is the same.
Then, for the mostOften method, create a new List of MyWord and add one MyWord per entry of your original map.
Sort this list.
Take the first k items of this list using subList.
Add those Strings to a List<String> and return it.
public class Test {
public static void main(String [] args){
Map<String, Integer> m = new HashMap<>();
m.put("hello",5);
m.put("halo",5);
m.put("this",2);
m.put("that",2);
m.put("good",1);
System.out.println(mostOften(m, 3));
}
public static List<String> mostOften(Map<String, Integer> m, int k){
List<MyWord> l = new ArrayList<>();
for(Map.Entry<String, Integer> entry : m.entrySet())
l.add(new MyWord(entry.getKey(), entry.getValue()));
Collections.sort(l);
List<String> list = new ArrayList<>();
for(MyWord w : l.subList(0, Math.min(k, l.size()))) // avoid an exception when k exceeds the list size
list.add(w.word);
return list;
}
}
class MyWord implements Comparable<MyWord>{
public String word;
public int occurence;
public MyWord(String word, int occurence) {
super();
this.word = word;
this.occurence = occurence;
}
@Override
public int compareTo(MyWord arg0) {
int cmp = Integer.compare(arg0.occurence,this.occurence);
return cmp != 0 ? cmp : word.compareTo(arg0.word);
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + occurence;
result = prime * result + ((word == null) ? 0 : word.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
MyWord other = (MyWord) obj;
if (occurence != other.occurence)
return false;
if (word == null) {
if (other.word != null)
return false;
} else if (!word.equals(other.word))
return false;
return true;
}
}
Output : [halo, hello, that]
In addition to your Map of word counts, I would use a PriorityQueue of fixed size K. For a fixed K this keeps the work linear in the number of words N. Here is code that uses this approach:
In the constructor we read the input stream word by word, updating the counters in the Map.
At the same time we update the priority queue, keeping its size at most K (we want the top K words).
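import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Scanner;

public class TopNWordsCounter
{
    public static class WordCount
    {
        String word;
        int count;

        public WordCount(String word)
        {
            this.word = word;
            this.count = 0; // incremented once per occurrence in the constructor loop below
        }
    }

    private PriorityQueue<WordCount> pq;
    private Map<String, WordCount> dict;

    public TopNWordsCounter(Scanner scanner)
    {
        // Min-heap on count: the head is the least frequent of the kept words,
        // so poll() evicts the weakest candidate rather than the strongest.
        pq = new PriorityQueue<>(10, new Comparator<WordCount>()
        {
            @Override
            public int compare(WordCount o1, WordCount o2)
            {
                return Integer.compare(o1.count, o2.count);
            }
        });
        dict = new HashMap<>();
        while (scanner.hasNext())
        {
            String word = scanner.next();
            WordCount wc = dict.get(word);
            if (wc == null)
            {
                wc = new WordCount(word);
                dict.put(word, wc);
            }
            if (pq.contains(wc))
            {
                // Remove before changing the count so the heap order stays valid.
                pq.remove(wc);
                wc.count++;
                pq.add(wc);
            }
            else
            {
                wc.count++;
                if (pq.size() < 10 || wc.count >= pq.peek().count)
                {
                    pq.add(wc);
                }
            }
            if (pq.size() > 10)
            {
                pq.poll();
            }
        }
    }

    public List<String> getTopTenWords()
    {
        List<String> topTen = new ArrayList<>();
        while (!pq.isEmpty())
        {
            topTen.add(pq.poll().word);
        }
        // poll() yields the least frequent word first; reverse to get descending order.
        Collections.reverse(topTen);
        return topTen;
    }
}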

remove smallest k elements from hashmap in JAVA

I have a hashmap of objects. Each object has two attributes (let's say int length and int weight).
I want to remove k elements with the smallest length.
What is the efficient way of doing this?
Map<K, V> map = new HashMap<>();
...
Set<K> keys = map.keySet();
TreeSet<K> smallest = new TreeSet<>(new Comparator<K>(){
public int compare(K o1, K o2) {
return o1.getLength() - o2.getLength();
}
});
smallest.addAll(keys);
for(int x = 0; x < num; x++) {
keys.remove(smallest.pollFirst());
}
Where K is your key type, V is your value type, and num is the number of elements you wish to remove.
If you are doing this frequently, it might be a good idea to use a TreeMap in the first place.
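One caveat with the snippet above: a TreeSet whose comparator looks only at the length treats two keys of equal length as duplicates and keeps just one of them. If the length lives on the value, as the question seems to suggest, a sorted list of entries avoids that. The sketch below assumes a hypothetical Item value class with the two attributes from the question; SmallestRemover and removeSmallest are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SmallestRemover {
    // Hypothetical value type with the two attributes mentioned in the question.
    static class Item {
        final int length;
        final int weight;

        Item(int length, int weight) {
            this.length = length;
            this.weight = weight;
        }
    }

    // Removes the k entries whose values have the smallest length (ties in arbitrary order).
    static <K> void removeSmallest(Map<K, Item> map, int k) {
        List<Map.Entry<K, Item>> entries = new ArrayList<>(map.entrySet());
        entries.sort((a, b) -> Integer.compare(a.getValue().length, b.getValue().length));
        // Collect the keys first, then remove, so no entry is read after its removal.
        List<K> doomed = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            doomed.add(entries.get(i).getKey());
        }
        for (K key : doomed) {
            map.remove(key);
        }
    }
}
```

This costs O(n log n) per call; for repeated removals a structure that stays sorted would likely be a better fit.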
The easiest, though certainly not the most efficient, way is to create a TreeMap with a Comparator for your type, putAll() the elements from your map into it, and remove k elements with the help of keySet(). Afterwards the TreeMap will no longer contain the k smallest elements.
You didn't mention whether the attribute you discriminate on is part of the key or the value. If it's the key, then the TreeMap discussed above is applicable.
Otherwise, if you need to do this often, I'd be inclined to implement my own map, delegating everything in the Map interface to a HashMap (or another appropriate structure). Override the add/remove operations and, if necessary, the iterator, then use add/remove to maintain a sorted list of the values.
This obviously assumes the values don't change and is highly coupled to your problem.
Keep in mind that TreeMap sorts by the natural ordering of its keys. Hence you can create a key whose Comparable implementation is based on the length of its value. For example (I'm on lunch, so the code isn't perfect, but it should get you to what you need):
package com.trip.test;
import java.util.SortedMap;
import java.util.TreeMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class ComparisonTest {
private static Logger logger = LoggerFactory.getLogger(ComparisonTest.class);
private static String[] a = {"1","2","3","4"};
private static String[] b = {"A","B","D"};
private static String[] c = {"1","B","D","1","B","D"};
/**
* @param args
*/
static SortedMap<KeyWithLength, String[]> myMap = new TreeMap<KeyWithLength, String[]>();
static {
myMap.put(new KeyWithLength("a", a.length), a);
myMap.put(new KeyWithLength("b", b.length), b);
myMap.put(new KeyWithLength("c", c.length), c);
}
public static void main(String[] args) {
// print Map
logger.info("Original Map:");
int i = 0;
for (String[] strArray: myMap.values() ){
logger.info(String.format("*** Entry %s: ", i++));
printStrings(strArray);
}
// chop off 2 shortest
chopNShortest(myMap, 2);
// print Map
logger.info("ShortenedMap:");
i = 0;
for (String[] strArray: myMap.values() ){
logger.info(String.format("*** Entry %s: ", i++));
printStrings(strArray);
}
}
static void printStrings(String[] strArray){
StringBuffer buf = new StringBuffer();
for (String str: strArray){
buf.append(String.format("%s, ", str));
}
logger.info(buf.toString());
}
static void chopNShortest(SortedMap<KeyWithLength, String[]> sortedMap, int n) {
// Assuming map is not unmodifiable
if (n <= sortedMap.size()-1){
for (int i = 0; i< n;i++){
sortedMap.remove(sortedMap.firstKey());
}
}
}
}
class KeyWithLength implements Comparable<KeyWithLength> {
private String key;
private Integer length;
public KeyWithLength(String key, int length) {
super();
this.key = key;
this.length = length;
}
public String getKey() {
return key;
}
public int getLength() {
return length;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((key == null) ? 0 : key.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
KeyWithLength other = (KeyWithLength) obj;
if (key == null) {
if (other.key != null)
return false;
} else if (!key.equals(other.key))
return false;
return true;
}
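@Override
public int compareTo(KeyWithLength another) {
// Order by length; break ties on the key so equal-length entries do not overwrite each other in the TreeMap.
int cmp = compare(this.length, another.length);
return cmp != 0 ? cmp : this.key.compareTo(another.key);
}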
public static int compare(int x, int y) {
return (x < y) ? -1 : ((x == y) ? 0 : 1);
}
}
The output:
Original Map:
*** Entry 0:
A, B, D,
*** Entry 1:
1, 2, 3, 4,
*** Entry 2:
1, B, D, 1, B, D,
ShortenedMap:
*** Entry 0:
1, B, D, 1, B, D,
