recursively finding value of a String from a map - java

I have a HashMap with String keys and String values (<String, String>), e.g. mapValue:
mapValue.put("A","B-7");
mapValue.put("B","START+18");
mapValue.put("C","A+25");
Now I want to evaluate the expression for 'C'. For C, the expression would be replaced by (((START+18)-7)+25).
So if I pass the string C to some method, it should return the string "(((START+18)-7)+25)". I also want to evaluate it according to operator precedence.
Thanks

Generally, the logic of such a function (assuming you know the possible operations and the syntax is strict) may be as follows:
// the known binary operators (adjust to your syntax)
private static final String[] OPERATIONS = {"+", "-", "*", "/"};

public String eval(HashMap<String, String> mapValue, String variable) {
    // get the expression to be evaluated
    String tmp = mapValue.get(variable);
    // for each known operation
    for (String op : OPERATIONS) {
        // split the expression on the operator into an array
        String[] vars = tmp.split("\\" + op);
        // for each element of the split expression
        for (int i = 0; i < vars.length; i++) {
            // check whether the element is a valid key in the map
            if (mapValue.containsKey(vars[i])) {
                // if it is, replace the element with the result of the recursive call
                vars[i] = eval(mapValue, vars[i]); // RECURSION
            }
            // if the element is not a valid key, do nothing
        }
        // join the split string back together with the same operator
        tmp = join(vars, op);
    }
    // return the result in parentheses
    return "(" + tmp + ")";
}
The result of 'eval(mapValue,"C")' would be:
(((START+18)-7)+25)
A short join function may be implemented as follows (on Java 8 or later, String.join(op, vars) does the same):
public String join(String[] arr, String d) {
String result = arr[0];
int i = 1;
while (i < arr.length) {
result += d + arr[i];
i++;
}
return result;
}
All the code above is meant to illustrate the logic; real code should add exception handling, more robust string operations, etc.
Hope it helps
Cheers!
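The answer above only builds the expression string, while the question also asks for the evaluated result. Below is a minimal recursive sketch under assumptions that are not in the question: every value is a flat chain of terms joined by + or - (no *, /, parentheses or unary minus), and START gets a hypothetical numeric value (42 here) from a separate constants map. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class RecursiveEval {
    // Returns the numeric value of 'name'. Assumes each expression is a flat chain of
    // terms joined by '+' or '-' (no '*', '/', parentheses or unary minus) and that
    // leaf symbols such as START get their values from 'constants'.
    public static int evaluate(Map<String, String> exprs, Map<String, Integer> constants, String name) {
        Integer constant = constants.get(name);
        if (constant != null) {
            return constant;
        }
        String expr = exprs.get(name);
        if (expr == null) {
            return Integer.parseInt(name); // not a key, so it must be a numeric literal
        }
        int total = 0;
        char op = '+';
        int start = 0;
        for (int i = 0; i <= expr.length(); i++) {
            boolean atEnd = i == expr.length();
            if (atEnd || expr.charAt(i) == '+' || expr.charAt(i) == '-') {
                // Recurse into the term between the previous operator and this one.
                int value = evaluate(exprs, constants, expr.substring(start, i).trim());
                total = (op == '+') ? total + value : total - value;
                if (!atEnd) {
                    op = expr.charAt(i);
                }
                start = i + 1;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> mapValue = new HashMap<>();
        mapValue.put("A", "B-7");
        mapValue.put("B", "START+18");
        mapValue.put("C", "A+25");
        Map<String, Integer> constants = new HashMap<>();
        constants.put("START", 42); // hypothetical value for START
        System.out.println(evaluate(mapValue, constants, "C")); // 42 + 18 - 7 + 25 = 78
    }
}
```

Because each referenced key is evaluated as a whole before it is combined, B-7 behaves like (B)-7, which matches the parenthesised string. A cyclic definition (for example A referring back to C) would recurse until a StackOverflowError, which is the concern raised in the next answer.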

As mentioned in the comments, I would not recommend recursion: it can lead to a StackOverflowError if the recursion gets too deep.
I would also recommend not using string equations. Strings are slow to parse and can lead to unexpected results (as @rtruszk mentioned, "START" contains the variable "A").
I created an example as my recommendation:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
public class X {
static interface Expression {
}
static class Combination implements Expression {
Expression[] values;
public Combination(Expression... values) {
this.values = values;
}
@Override
public String toString() {
return "?";
}
}
static class Reference implements Expression {
private String reference;
public Reference(String reference) {
this.reference = reference;
}
@Override
public String toString() {
return reference;
}
}
static class Number implements Expression {
private int value;
public Number(int value) {
this.value = value;
}
@Override
public String toString() {
return ""+value;
}
}
public static void main(String[] args) {
Map<String, Expression> mapValue = new HashMap<>();
mapValue.put("START", new Number(42));
String x = "C";
mapValue.put("A", new Combination( new Reference("B"), new Number(-7)));
mapValue.put("B", new Combination(new Reference("START"), new Number(+18)));
mapValue.put("C", new Combination( new Reference("A"), new Number(+25)));
int result = 0;
ArrayList<Expression> parts = new ArrayList<>();
parts.add(mapValue.get(x));
while (!parts.isEmpty()) {
debuggingOutput(x, result, parts);
Expression expression = parts.remove(0);
if (expression instanceof Combination)
parts.addAll(Arrays.asList(((Combination) expression).values));
else if (expression instanceof Reference)
parts.add(mapValue.get(((Reference) expression).reference));
else if (expression instanceof Number)
result += ((Number) expression).value;
}
System.out.println(result);
}
private static void debuggingOutput(String x, int result, ArrayList<Expression> parts) {
System.out.print(x);
System.out.print(" = ");
System.out.print(result);
for (Expression part : parts) {
System.out.print(" + ");
System.out.print(part);
}
System.out.println();
}
}

Related

Java: Converting between bases with custom symbols

I was wondering if you can create a custom base with your own symbols instead of the ones Java gives you with Integer.parseInt (0-9 followed by a-z, up to radix 36).
I was thinking of something like this:
public class Base {
private String symbols;
public Base(String symbols) {
this.symbols = symbols;
}
// for example: new Base("0123456789"); would represent base 10
public static String convertBases(Base from, Base to, String toConvert) {
// Takes toConvert, which is encoded in base "from", and converts it to base "to"
throw new UnsupportedOperationException("not implemented yet");
}
}
I am not sure how to implement this. Does anyone have the code for this?
To do this, you first need to parse the input text in the "from" base and then format the value in the "to" base, exactly as you would with the standard base "alphabet":
public static String convertBases(int fromRadix, int toRadix, String text) {
int value = Integer.parseInt(text, fromRadix);
return Integer.toString(value, toRadix);
}
So, first you implement parse and toString, then implementing convertTo is easy:
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;
public class Base {
private final String symbols;
private final BigInteger radix;
private final Map<Character, Integer> symbolIndex;
public Base(String symbols) {
if (symbols.length() <= 1)
throw new IllegalArgumentException("Must provide at least 2 symbols: length=" + symbols.length());
this.symbols = symbols;
this.radix = BigInteger.valueOf(symbols.length());
this.symbolIndex = new HashMap<>(symbols.length() * 4 / 3 + 1);
for (int i = 0; i < symbols.length(); i++) {
Integer prevIndex = this.symbolIndex.putIfAbsent(symbols.charAt(i), i);
if (prevIndex != null)
throw new IllegalArgumentException("Duplicate symbol at index " + prevIndex +
" and " + i + ": " + symbols.charAt(i));
}
}
public BigInteger parse(String text) {
BigInteger value = BigInteger.ZERO;
for (int i = 0; i < text.length(); i++) {
Integer index = this.symbolIndex.get(text.charAt(i));
if (index == null)
throw new IllegalArgumentException("Not a valid number: " + text);
value = value.multiply(this.radix).add(BigInteger.valueOf(index));
}
return value;
}
public String toString(BigInteger value) {
if (value.signum() < 0)
throw new IllegalArgumentException("Negative value not allowed: " + value);
if (value.signum() == 0)
return this.symbols.substring(0, 1);
StringBuilder buf = new StringBuilder();
for (BigInteger v = value; v.signum() != 0; v = v.divide(this.radix))
buf.append(this.symbols.charAt(v.mod(this.radix).intValue()));
return buf.reverse().toString();
}
public String convertTo(Base newBase, String text) {
return newBase.toString(parse(text));
}
}
Test
Base base3 = new Base("012");
Base base6alpha = new Base("ABCDEF");
System.out.println(base3.convertTo(base6alpha, "0")); // 0 -> A
System.out.println(base3.convertTo(base6alpha, "2")); // 2 -> C
System.out.println(base3.convertTo(base6alpha, "10")); // 3 -> D
System.out.println(base3.convertTo(base6alpha, "200")); // 18 -> DA
Output
A
C
D
DA
Test 2
Base obscure = new Base("^JsdloYF9%");
Base base64 = new Base("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/");
BigInteger value = new BigInteger("123456789012345678901234567890"); // Too large for int and long
String obscureValue = obscure.toString(value);
String base64Value = base64.toString(value);
System.out.println(obscureValue);
System.out.println(base64Value);
System.out.println(base64.convertTo(obscure, base64Value));
System.out.println(obscure.convertTo(base64, obscureValue));
Output
JsdloYF9%^JsdloYF9%^JsdloYF9%^
BjukP9sNz4O5OPwrS
JsdloYF9%^JsdloYF9%^JsdloYF9%^
BjukP9sNz4O5OPwrS
Let's start with a value type. It holds a string representation and a Base object (i.e., a string representation plus something like a decoder). Why? Because we don't want to pass around Strings that we need to inspect to "guess" which base they are in.
public class CustomNumber {
private final String stringRepresentation;
private final Base base;
public CustomNumber(String stringRepresentation, Base base) {
super();
this.stringRepresentation = stringRepresentation;
this.base = base;
}
public long decimalValue() {
return base.toDecimal(stringRepresentation);
}
public CustomNumber toBase(Base newBase) {
long decimalValue = this.decimalValue();
String stringRep = newBase.fromDecimal(decimalValue);
return new CustomNumber(stringRep, newBase);
}
}
Then we need to define an interface which is broad enough to handle any regular or custom-symbol base. We will later build concrete implementations on top.
public interface Base {
public long toDecimal(String stringRepresentation);
public String fromDecimal(long decimalValue);
}
We are all set. Let's do an example implementation that supports the standard decimal number format before moving on to custom string symbols:
public class StandardBaseLong implements Base{
public long toDecimal(String stringRepresentation) {
return Long.parseLong(stringRepresentation);
}
public String fromDecimal(long decimalValue) {
return Long.toString(decimalValue);
}
}
Now finally, coming to the custom string base:
public class CustomBase implements Base{
private String digits;
public CustomBase(String digits) {
this.digits = digits;
}
public long toDecimal(String stringRepresentation) {
//Write logic to interpret that string as your base
return 0L;
}
public String fromDecimal(long decimalValue) {
//Write logic to generate string output in your base format
return null;
}
}
Now you have a framework to work with various custom and standard bases.
Of course, there could be more customisations and improved features (more convenience constructors, hashCode and equals implementations, and arithmetic), but they are beyond the scope of this answer.
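The two stub methods in CustomBase can be filled in with the usual positional-notation loops (the same ones the BigInteger-based answer uses), only with long arithmetic. A sketch, assuming the digits string contains no duplicate symbols and only non-negative values are needed; the class name CustomBaseSketch is hypothetical, and its toDecimal/fromDecimal have the same signatures as the Base interface, so the bodies could be copied into CustomBase.

```java
public class CustomBaseSketch {
    private final String digits; // assumed to contain no duplicate symbols

    public CustomBaseSketch(String digits) {
        if (digits.length() < 2) {
            throw new IllegalArgumentException("Need at least 2 digits");
        }
        this.digits = digits;
    }

    // Reads the most significant digit first: value = value * radix + digitIndex.
    public long toDecimal(String stringRepresentation) {
        long radix = digits.length();
        long value = 0;
        for (char c : stringRepresentation.toCharArray()) {
            int index = digits.indexOf(c);
            if (index < 0) {
                throw new IllegalArgumentException("Not a digit of this base: " + c);
            }
            // multiplyExact/addExact throw ArithmeticException on long overflow
            value = Math.addExact(Math.multiplyExact(value, radix), index);
        }
        return value;
    }

    // Repeatedly takes value % radix as the next least significant digit.
    public String fromDecimal(long decimalValue) {
        if (decimalValue < 0) {
            throw new IllegalArgumentException("Negative values not supported: " + decimalValue);
        }
        if (decimalValue == 0) {
            return digits.substring(0, 1);
        }
        long radix = digits.length();
        StringBuilder sb = new StringBuilder();
        for (long v = decimalValue; v > 0; v /= radix) {
            sb.append(digits.charAt((int) (v % radix)));
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        CustomBaseSketch base6alpha = new CustomBaseSketch("ABCDEF");
        System.out.println(base6alpha.fromDecimal(18)); // DA
        System.out.println(base6alpha.toDecimal("DA")); // 18
    }
}
```

Using long keeps the sketch close to the interface in this answer; for values beyond Long.MAX_VALUE the BigInteger version is the one to use.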

In Java, is there a cleaner approach to an if statement with a slew of ||'s

I know this question is basic but I am looking for a less-clumsy approach to the following if statement:
if ((sOne.Contains('*')) || (sOne.Contains('/')) || (sOne.Contains('-')) || (sOne.Contains('+')) || (sOne.Contains('%'))){
I should also note that sOne.Contains() refers to the following code...
public boolean Contains(char key) {
// Checks stack for key
boolean retval = arrs.contains(key);
return retval;
}
It should also be noted that those five chars will never be changed.
You could use a breaking for-each loop over a character array:
for (char c : "*/-+%".toCharArray()) {
if (sOne.Contains(c)) {
// ...
break;
}
}
If you're extremely concerned about performance you might also want to pull out the toCharArray() call and cache the result in a static final char[] constant.
You could even use this strategy to define other convenience methods on your sOne object, like ContainsAny or ContainsAll (credit to Sina Madrid for the name ContainsAny):
public boolean ContainsAny (CharSequence keys) {
for (int i = 0; i < keys.length(); i++)
if (Contains(keys.charAt(i))) return true;
return false;
}
public boolean ContainsAll (CharSequence keys) {
for (int i = 0; i < keys.length(); i++)
if (!Contains(keys.charAt(i))) return false;
return true;
}
Usage would look something like this:
if (sOne.ContainsAny("*/-+%")) {
// ...
}
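A sketch of the caching suggestion: the character array is built once in a static final constant instead of on every check. CharStack is a hypothetical stand-in for whatever class sOne is, assuming (as the question's Contains method suggests) that it wraps a collection of characters.

```java
import java.util.ArrayList;
import java.util.List;

public class CachedKeysDemo {
    // Computed once when the class is loaded, not on every check.
    private static final char[] OPERATORS = "*/-+%".toCharArray();

    // Hypothetical stand-in for the type of sOne, keeping the question's Contains(char).
    public static class CharStack {
        private final List<Character> arrs = new ArrayList<>();

        public CharStack(String text) {
            for (char c : text.toCharArray()) {
                arrs.add(c);
            }
        }

        public boolean Contains(char key) {
            return arrs.contains(key); // key is autoboxed to Character
        }
    }

    // Same breaking loop as above, but over the cached constant.
    public static boolean containsOperator(CharStack sOne) {
        for (char c : OPERATORS) {
            if (sOne.Contains(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsOperator(new CharStack("a+b"))); // true
        System.out.println(containsOperator(new CharStack("abc"))); // false
    }
}
```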
You can try using a regular expression like this (this assumes sOne is a String, or that you can get its content as one):
if (sOne.matches(".*[-*/+%].*")) {
// your code
}
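If sOne is a String, a variant is to compile the pattern once and use Matcher.find(), which looks for a match anywhere in the input, so the .* wrapping is not needed. It also avoids a subtlety of matches(): by default . does not match line terminators, so matches(".*[-*/+%].*") returns false for a multi-line string even when it contains an operator. The class and method names below are illustrative.

```java
import java.util.regex.Pattern;

public class OperatorRegex {
    // Compiled once; '-' comes first so it is a literal inside the character class.
    private static final Pattern OPERATOR = Pattern.compile("[-*/+%]");

    // find() succeeds if any single character of the input is in the class.
    public static boolean containsOperator(String s) {
        return OPERATOR.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsOperator("a+b"));    // true
        System.out.println(containsOperator("abc"));    // false
        System.out.println(containsOperator("x\ny%z")); // true, despite the line break
    }
}
```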
Somewhere, you need the characters in a list:
List<Character> keys = Arrays.asList('*', '/', '-', '+', '%');
And then you can do:
if (keys.stream().anyMatch(sOne::Contains)) {
How about this method:
Items are your keys stored in an Array.
public static boolean stringContainsItemFromList(String inputStr, String[] items)
{
for(int i =0; i < items.length; i++)
{
if(inputStr.contains(items[i]))
{
return true;
}
}
return false;
}
You don't show what arrs is, but since it has a contains method I'm going to assume it is a collection. So if you put your keys into a static collection like a Set, and arrs is a collection of some type as well, you could use Collections.disjoint, which returns true if the two collections have no elements in common.
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
public class Test {
static Set<Character> keys = new HashSet<Character>(Arrays.asList('*','/','-','+','%'));
static class SOne {
Set<Character> arrs = null;
SOne(String line) {
arrs = line.chars().mapToObj(e->(char)e).collect(Collectors.toSet());
}
public boolean Contains(Set<Character> checkset) {
return !Collections.disjoint(arrs, checkset);
}
}
static public void main(String args[]) {
SOne sOne = new SOne("Does not contain");
SOne sTwo = new SOne("Does contain a + sign");
if(sOne.Contains(keys)) {
System.out.println("Fail: Contains a key");
} else {
System.out.println("Pass: Does not contain a key");
}
if(sTwo.Contains(keys)) {
System.out.println("Pass: Contains a key");
} else {
System.out.println("Fail: Does not contain a key");
}
}
}
You could write a method like this and still reuse your existing method (replace YourType with the actual type of sOne; a plain type parameter T would not compile, because T has no Contains method):
static boolean ContainsAny(YourType sOne, char... keys) {
for (char key : keys) {
if (sOne.Contains(key))
return true;
}
return false;
}
You can then invoke it like so with any number of characters to evaluate:
if (ContainsAny(sOne, '%', '_', '-')) {
//code here
}
If you're using an if statement of this form in only one place, it would be fine to keep the same structure but just format it more neatly:
if (sOne.contains('+') ||
sOne.contains('-') ||
sOne.contains('*') ||
sOne.contains('/') ||
sOne.contains('%')) {
...
}
P.S. In your method it is not necessary to define a boolean variable merely to immediately return it; you can return the other method's result directly:
public boolean contains(char key) {
// Checks stack for key
return arrs.contains(key);
}
Please also note the naming conventions: method names (contains) should start with a lower-case letter.

JAVA: Cast String to (dynamically known) primitive type in order to instantiate a (dynamically known) class

I have a repository class that uses text files (a requirement), meaning that I have to read strings and cast them in order to instantiate objects. The problem is that I want my repository class to be as general as possible, so that I can use it to manipulate different object types.
So, is there a (more elegant) way to dynamically cast strings to whatever field (primitive) type is needed at runtime, while avoiding lots of try-catch structures with numerous ifs/switches?
As a short simplified version, I want objectA.txt to contain only objectA's information, similarly for objectB.txt, and my Repository code to handle both:
Repository repoA = new Repository("objectA.txt", < list of Types for A >); TypeA a=repoA.getOne();
Repository repoB = new Repository("objectB.txt", < list of Types for B >); TypeB b=repoB.getOne();
What I have:
public class FileRepository extends InMemoryRepository{
private String fileName;
private List<Class> types;
public FileRepository(String fileName, List<Class> types) {
//@param types
// - list containing the Class to be instantiated, followed by its field types
super();
this.fileName = fileName;
this.types=types;
loadData();
}
private void loadData() {
Path path = Paths.get(fileName);
try {
Files.lines(path).forEach(line -> {
List<String> items = Arrays.asList(line.split(","));
//create Class array for calling the correct constructor
Class[] cls=new Class[types.size()-1];
for (int i=1; i<types.size(); i++){
cls[i-1]=types.get(i);
}
Constructor constr=null;
try {
//get the needed constructor
constr = types.get(0).getConstructor(cls);
} catch (NoSuchMethodException e) {
//do something
e.printStackTrace();
}
//here is where the fun begins
//arg0 ... argn are the primitives that need to be converted from String
//something like:
//*(type.get(1))* arg0=*(cast to types.get(1))* items.get(0);
//*(type.get(2))* arg1=*(cast to types.get(2))* items.get(1);
//...
Object obj= (Object) constr.newInstance(arg0, ..., argn); // pseudocode
});
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
P.S.: I'm a JAVA newbie, so please keep the explanations as simple as possible.
No IDE on hand, so I hope this makes sense:
private static final Map<Class, Function<String, ?>> parsers = new HashMap<>();
static {
parsers.put(Long.class, Long::parseLong);
parsers.put(Integer.class, Integer::parseInt);
parsers.put(String.class, String::toString);
parsers.put(Double.class, Double::parseDouble);
parsers.put(Float.class, Float::parseFloat);
// add your own types here.
}
public <T> T parse(Class<T> klass, String value) {
// add some null-handling logic here? and empty values.
return (T)parsers.get(klass).apply(value);
}
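Then, when you need to create the parameters for your constructor (cls holds the constructor's parameter types, so cls[i] is the type of items.get(i)):
Object[] parameters =
IntStream
.range(0, cls.length)
.mapToObj(i -> parse(cls[i], items.get(i)))
.toArray();
Object obj = constr.newInstance(parameters);
If your constructors take primitive parameters, also register int.class, long.class, etc. as keys in parsers, since getConstructor reports the exact declared types.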
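Putting the parser-map idea together with the constructor lookup from the question gives something like the sketch below. Person, PARSERS and fromLine are hypothetical names; it assumes one comma-separated line per object with the fields in constructor order, and it registers both primitive classes (int.class) and wrapper classes (Integer.class) as keys, because getConstructor needs the exact parameter types the constructor declares.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ReflectiveLineParser {
    // Hypothetical example type standing in for TypeA / TypeB from the question.
    public static class Person {
        public final String name;
        public final int age;

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return name + " (" + age + ")";
        }
    }

    // One String parser per supported parameter type, for primitives and wrappers alike.
    private static final Map<Class<?>, Function<String, Object>> PARSERS = new HashMap<>();

    static {
        PARSERS.put(int.class, Integer::valueOf);
        PARSERS.put(Integer.class, Integer::valueOf);
        PARSERS.put(long.class, Long::valueOf);
        PARSERS.put(Long.class, Long::valueOf);
        PARSERS.put(double.class, Double::valueOf);
        PARSERS.put(Double.class, Double::valueOf);
        PARSERS.put(boolean.class, Boolean::valueOf);
        PARSERS.put(Boolean.class, Boolean::valueOf);
        PARSERS.put(String.class, s -> s);
    }

    // Splits one comma-separated line, converts each item to the matching parameter
    // type and calls the constructor that declares exactly those parameter types.
    public static <T> T fromLine(Class<T> type, Class<?>[] paramTypes, String line) {
        String[] items = line.split(",");
        if (items.length != paramTypes.length) {
            throw new IllegalArgumentException("Expected " + paramTypes.length + " fields in: " + line);
        }
        Object[] args = new Object[items.length];
        for (int i = 0; i < items.length; i++) {
            Function<String, Object> parser = PARSERS.get(paramTypes[i]);
            if (parser == null) {
                throw new IllegalArgumentException("No parser registered for " + paramTypes[i]);
            }
            args[i] = parser.apply(items[i].trim());
        }
        try {
            Constructor<T> constructor = type.getConstructor(paramTypes);
            return constructor.newInstance(args); // a boxed Integer is unboxed for an int parameter
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot construct " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Person p = fromLine(Person.class, new Class<?>[] {String.class, int.class}, "Alice, 30");
        System.out.println(p); // Alice (30)
    }
}
```

In the question's FileRepository, types.get(0) would play the role of type and the remaining entries of types the role of paramTypes.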
I think you can make use of auto-boxing and auto-unboxing, combined with the observation that the numeric wrapper classes and Boolean each provide a static valueOf(String) method that parses the string and returns an instance of that wrapper type (Character is the exception: it only has valueOf(char)).
The following is an attempt at a type-safe implementation that suits your needs:
import java.io.*;
import java.lang.reflect.*;
import java.util.*;
import java.util.function.Consumer;
/**
* Created by kmhaswade on 3/18/16.
*/
// Not thread-safe
public class NonStreamingGenericPrimitiveDataRepo<T> implements Iterable<T> {
@Override
public Iterator<T> iterator() {
return new Iterator<T>() {
@Override
public boolean hasNext() {
return theIterator.hasNext();
}
@Override
public T next() {
String next = theIterator.next();
// String has no valueOf(String) method, so return the line as-is
if (String.class.equals(theType))
return theType.cast(next);
try {
Method m = theType.getDeclaredMethod("valueOf", String.class);
return (T) m.invoke(null, next);
} catch (NoSuchMethodException | IllegalAccessException e) {
throw new RuntimeException("This is impossible!");
} catch (InvocationTargetException e) {
throw new RuntimeException("data: " + next + " does not represent type: " + theType);
}
}
};
}
@Override
public void forEach(Consumer<? super T> action) {
throw new RuntimeException("left as an exercise :-) ");
}
private final ArrayList<String> theCache;
private final Iterator<String> theIterator;
private final Class<T> theType;
public NonStreamingGenericPrimitiveDataRepo(Reader reader, Class<T> theType) throws IOException {
Objects.requireNonNull(reader);
Objects.requireNonNull(theType);
if (Integer.class.equals(theType)
|| Long.class.equals(theType)
|| Float.class.equals(theType)
|| Double.class.equals(theType)
|| Boolean.class.equals(theType)
|| String.class.equals(theType)) {
theCache = new ArrayList<>();
try (BufferedReader br = new BufferedReader(reader)) {
String line;
while ((line = br.readLine()) != null)
theCache.add(line);
}
theIterator = theCache.iterator();
this.theType = theType;
} else {
throw new IllegalArgumentException("Not a wrapper type: " + theType);
}
}
public static void main(String[] args) throws IOException {
for (int i : new NonStreamingGenericPrimitiveDataRepo<>(ints(), Integer.class))
System.out.println("read an int: " + i);
for (float f : new NonStreamingGenericPrimitiveDataRepo<>(floats(), Float.class))
System.out.println("read a float: " + f);
for (boolean b: new NonStreamingGenericPrimitiveDataRepo<>(booleans(), Boolean.class))
System.out.println("read a boolean: " + b);
}
static StringReader ints() {
return new StringReader("1\n2\n-3\n4\n");
}
static StringReader floats() {
return new StringReader("1.0f\n3.25f\n-3.33f\n4.44f\n");
}
static StringReader booleans() {
return new StringReader("false \ntrue\n");
}
}
If you want to identify which primitive type a String represents, you can use the following:
public class Test {
final static String LONG_PATTERN = "[-+]?\\d+";
final static String DOUBLE_PATTERN = "[-+]?(\\d*[.])?\\d+";
final static String BOOLEAN_PATTERN = "(true|false)";
final static String CHAR_PATTERN = "[abcdefghijklmnopqrstuvwxyz]";
public static void main(String[] args) {
String[] xList= {
"1", //int
"111111111111", //long
"1.1", //float
"1111111111111111111111111111111111111111111111111111.1", //double
"c", //char
"true", //boolean
"end" //String
};
for (String x: xList){
if(x.matches(LONG_PATTERN)){
long temp = Long.parseLong(x);
if (temp >= Integer.MIN_VALUE && temp <= Integer.MAX_VALUE){
System.out.println( x + " is int use downcast");
} else {
System.out.println( x + " is long");
}
} else if(x.matches(DOUBLE_PATTERN)){
double temp = Double.parseDouble(x);
if (Math.abs(temp) <= Float.MAX_VALUE){ // Float.MIN_VALUE is the smallest positive float, not the most negative one
System.out.println( x + " is float use downcast");
} else {
System.out.println( x + " is Double");
}
} else if (x.toLowerCase().matches(BOOLEAN_PATTERN)){
boolean temp = x.toLowerCase().equals("true");
System.out.println(x + " is Boolean");
} else if(x.length() == 1){
System.out.println(x + " is char");
}else {
System.out.println( x + " is String");
}
}
}
}
Output:
1 is int use downcast
111111111111 is long
1.1 is float use downcast
1111111111111111111111111111111111111111111111111111.1 is Double
c is char
true is Boolean
end is String
The code above sorts your String into one of five categories: integer (int or long), floating point (float or double), boolean, char, and, if none of those matches, String. As the Java Language Specification describes, the primitive types fall into two categories, boolean and numeric, with the numeric types split into integral and floating-point types:
Integral types
byte
short
int
long
char (a 16-bit unsigned value, normally used to represent a character)
Floating-point types
float
double
Boolean type
boolean
This way you can identify which type your String belongs to. You can extend the code to check the range and cast the numbers down to byte and short as well.
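The last suggestion, narrowing integers to byte or short, can be sketched with the MIN_VALUE/MAX_VALUE constants of each wrapper class. The method name narrowestType is hypothetical, and char is deliberately left out because it is unsigned and usually holds a character rather than a number.

```java
public class IntegralNarrowing {
    // Returns the name of the smallest integral primitive type whose range contains value.
    // char is not considered: it is unsigned and normally holds a character, not a number.
    public static String narrowestType(long value) {
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "byte";
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "short";
        }
        if (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE) {
            return "int";
        }
        return "long";
    }

    public static void main(String[] args) {
        for (String x : new String[] {"1", "300", "70000", "111111111111"}) {
            long value = Long.parseLong(x);
            String type = narrowestType(value);
            System.out.println(x + " fits in " + type);
            if (type.equals("short")) {
                short s = (short) value; // no overflow: the range was checked above
                System.out.println("  as short: " + s);
            }
        }
    }
}
```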

Searching through an Array of Objects

I'm attempting to return the index of where an object appears in an array of objects.
public static int search(WordCount[] list,WordCount word, int n)
{
int result = -1;
int i=0;
while (result < 0 && i < n)
{
if (word.equals(list[i]))
{
result = i;
break;
}
i++;
}
return result;
}
WordCount[] is the array of objects.
word is an instance of WordCount.
n is the number of objects in WordCount[]
It runs, but isn't returning the index correctly. Any and all help is appreciated. Thanks for your time.
CLASS
class WordCount
{
String word;
int count;
static boolean compareByWord;
public WordCount(String aWord)
{
setWord(aWord);
count = 1;
}
private void setWord(String theWord)
{
word=theWord;
}
public void increment()
{
count++;
}
public static void sortByWord()
{
compareByWord = true;
}
public static void sortByCount()
{
compareByWord = false;
}
public String toString()
{
String result = String.format("%s (%d)",word, count);
return result;
}
}
How I'm calling it...
for (int i=0;i<tokens.length;i++)
{
if (tokens[i].length()>0)
{
WordCount word = new WordCount(tokens[i]);
int foundAt = search(wordList, word, n);
if (foundAt >= 0)
{
wordList[foundAt].increment();
}
else
{
wordList[n]=word;
n++;
}
}
}
}
By default, Object#equals just returns whether or not the two references refer to the same object (same as the == operator). Looking at what you are doing, what you need to do is create a method in your WordCount to return word, e.g.:
public String getWord() {
return word;
}
Then change your comparison in search from:
if (word.equals(list[i]))
to:
if (word.getWord().equals(list[i].getWord()))
Or change the signature of the method to accept a String so you don't create a new object if you don't have to.
I wouldn't recommend overriding equals in WordCount so that it uses only word to determine object equality because you have other fields. (For example, one would also expect that two counters were equal only if their counts were the same.)
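A sketch of the String-based signature suggested above. WordSearch and the getCount accessor are illustrative names; WordCount is trimmed down to what the search needs, and increment() uses count++.

```java
public class WordSearch {
    // Minimal stand-in for the question's WordCount, with the suggested getWord() accessor.
    public static class WordCount {
        private final String word;
        private int count = 1;

        public WordCount(String word) {
            this.word = word;
        }

        public String getWord() {
            return word;
        }

        public void increment() {
            count++;
        }

        public int getCount() {
            return count;
        }
    }

    // Compares the stored words, so no temporary WordCount has to be created.
    // Only the first n slots of the array are in use.
    public static int search(WordCount[] list, String word, int n) {
        for (int i = 0; i < n; i++) {
            if (list[i].getWord().equals(word)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] tokens = {"to", "be", "or", "not", "to", "be"};
        WordCount[] wordList = new WordCount[tokens.length];
        int n = 0;
        for (String token : tokens) {
            int foundAt = search(wordList, token, n);
            if (foundAt >= 0) {
                wordList[foundAt].increment();
            } else {
                wordList[n++] = new WordCount(token);
            }
        }
        for (int i = 0; i < n; i++) {
            System.out.println(wordList[i].getWord() + " " + wordList[i].getCount());
        }
    }
}
```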
Another way to do this is to use a Map, which is an associative container. For example:
public static Map<String, WordCount> getCounts(String[] tokens) {
Map<String, WordCount> map = new TreeMap<String, WordCount>();
for(String t : tokens) {
WordCount count = map.get(t);
if(count == null) {
// new word: the constructor already sets the count to 1
map.put(t, new WordCount(t));
} else {
count.increment();
}
}
return map;
}
This method is probably not working because WordCount does not override equals(), so the inherited Object.equals() only checks whether the two references point to the same object, and a freshly created WordCount is never the same object as one already in the array.
You need to either override the equals() and hashCode() methods in your WordCount class, or compare something that identifies the word, e.g. word.getWord().equals(list[i].getWord())
It seems easier to use:
public static int search(WordCount[] list, WordCount word)
{
for(int i = 0; i < list.length; i++){
if(list[i] == word){
return i;
}
}
return -1;
}
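This checks each value in the array and compares it against the word that you specified. Note that == compares references, just like the default equals(), so it only finds the exact WordCount instance you pass in, not a different object holding the same word.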
The odd thing in the current approach is that you have to create a new WordCount object in order to look for the count of a particular word. You could add a method like
public boolean hasEqualWord(WordCount other)
{
return word.equals(other.word);
}
in your WordCount class, and use it instead of the equals method:
....
while (result < 0 && i < n)
{
if (word.hasEqualWord(list[i])) // <--- Use it here!
{
....
}
}
But I'd recommend rethinking what you are going to model there, and how. While it is not technically "wrong" to create a class that summarizes a word and its "count", there may be more elegant solutions. For example, when this is only about counting words, you could consider a map:
Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
for (int i=0;i<tokens.length;i++)
{
if (tokens[i].length()>0)
{
Integer count = counts.get(tokens[i]);
if (count == null)
{
count = 0;
}
counts.put(tokens[i], count+1);
}
}
Afterwards, you can look up the number of occurrences of each word in this map:
String word = "SomeWord";
Integer count = counts.get(word);
System.out.println(word+" occurred "+count+" times");
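On Java 8 or later, the get / null check / put sequence can be collapsed into a single Map.merge call, which stores the given value for a new key and otherwise combines the old and new values with the supplied function. A sketch; WordCounts and count are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCounts {
    // merge() puts 1 for a new key, otherwise replaces the old count with Integer.sum(old, 1).
    public static Map<String, Integer> count(String[] tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(new String[] {"to", "be", "", "or", "not", "to", "be"});
        System.out.println(counts); // {to=2, be=2, or=1, not=1}
    }
}
```

LinkedHashMap keeps the words in first-seen order, as in the answer above.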

Efficiently Compare Successive Characters in String

I'm doing some text analysis, and need to record the frequencies of character transitions in a String. I have n categories of characters: for the sake of example, isUpperCase(), isNumber(), and isSpace().
Given that there are n categories, there will be n^2 categories of transitions, e.g. "isUpperCase() --> isUpperCase()", "isUpperCase --> isLetter()", "isLetter() --> isUpperCase()", etc.
Given a block of text, I would like to record the number of transitions that took place. I would imagine constructing a Map with the transition types as the Keys, and an Integer as each Value.
For the block of text "TO " (with a trailing space), the Map would look like [isUpper -> isUpper : 1, isUpper -> isSpace : 1]
The part I cannot figure out, though, is how to construct a Map where, from what I can see, the Key would consist of 2 boolean methods.
Create an enum that represents character types - you need a way to get a character type enum given a character. I'm sure there are better ways to do that than what I have done below but that is left as an exercise to the reader.
Next create a method that takes the previous and current characters and concatenates their types into a unique String.
Finally loop over the input string and hey presto.
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;

public class TransitionCounter {
    private static enum CharacterType {
        UPPER {
            @Override
            boolean isA(final char c) {
                return Character.isUpperCase(c);
            }
        },
        LOWER {
            @Override
            boolean isA(final char c) {
                return Character.isLowerCase(c);
            }
        },
        SPACE {
            @Override
            boolean isA(final char c) {
                return Character.isWhitespace(c);
            }
        },
        UNKNOWN {
            @Override
            boolean isA(char c) {
                return false;
            }
        };

        abstract boolean isA(final char c);

        // Returns the first type whose isA() accepts c, or UNKNOWN.
        public static CharacterType toType(final char c) {
            for (CharacterType type : values()) {
                if (type.isA(c)) {
                    return type;
                }
            }
            return UNKNOWN;
        }
    }

    private static String getTransitionType(final CharacterType prev, final CharacterType current) {
        return prev + "_TO_" + current;
    }

    public static void main(String[] args) {
        // two spaces between "Aaa" and "AA" (this produces SPACE_TO_SPACE)
        final String myString = "AAaaA Aaa  AA";
        // put() is overridden to add to the existing count instead of replacing it
        final Map<String, Integer> countMap = new TreeMap<String, Integer>() {
            @Override
            public Integer put(final String key, final Integer value) {
                final Integer currentCount = get(key);
                if (currentCount == null) {
                    return super.put(key, value);
                }
                return super.put(key, currentCount + value);
            }
        };
        final char[] myStringAsArray = myString.toCharArray();
        CharacterType prev = CharacterType.toType(myStringAsArray[0]);
        for (int i = 1; i < myStringAsArray.length; ++i) {
            final CharacterType current = CharacterType.toType(myStringAsArray[i]);
            countMap.put(getTransitionType(prev, current), 1);
            prev = current;
        }
        for (final Entry<String, Integer> entry : countMap.entrySet()) {
            System.out.println(entry);
        }
    }
}
Output:
LOWER_TO_LOWER=2
LOWER_TO_SPACE=1
LOWER_TO_UPPER=1
SPACE_TO_SPACE=1
SPACE_TO_UPPER=2
UPPER_TO_LOWER=2
UPPER_TO_SPACE=1
UPPER_TO_UPPER=2
Running the method on the content of your question (825 chars) took 9ms.
If you think most of the transitions will be present, then a two-dimensional array would work best:
int n = _categories.size();
int[][] _transitionFreq = new int[n][n];
If you think it will be a sparse array, then a map will be more efficient in terms of memory usage, but less efficient in terms of performance.
It's a trade-off you'll have to make depending on your data and the number of character types.
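A sketch of the two-dimensional array variant, indexing rows and columns by enum ordinal. The Category enum and categorize method are assumptions standing in for your own n categories; a character matching none of the earlier tests falls into OTHER.

```java
public class TransitionMatrix {
    // Example categories (an assumption; substitute your own n categories).
    public enum Category { UPPER, LOWER, DIGIT, SPACE, OTHER }

    static Category categorize(char c) {
        if (Character.isUpperCase(c)) return Category.UPPER;
        if (Character.isLowerCase(c)) return Category.LOWER;
        if (Character.isDigit(c)) return Category.DIGIT;
        if (Character.isWhitespace(c)) return Category.SPACE;
        return Category.OTHER;
    }

    // freq[from.ordinal()][to.ordinal()] counts how often 'from' is directly followed by 'to'.
    public static int[][] countTransitions(String text) {
        int n = Category.values().length;
        int[][] freq = new int[n][n];
        for (int i = 1; i < text.length(); i++) {
            Category prev = categorize(text.charAt(i - 1));
            Category current = categorize(text.charAt(i));
            freq[prev.ordinal()][current.ordinal()]++;
        }
        return freq;
    }

    public static void main(String[] args) {
        int[][] freq = countTransitions("TO ");
        for (Category from : Category.values()) {
            for (Category to : Category.values()) {
                int count = freq[from.ordinal()][to.ordinal()];
                if (count > 0) {
                    System.out.println(from + " -> " + to + " : " + count);
                }
            }
        }
    }
}
```

The array always uses n * n ints regardless of how many transitions occur, which is the memory side of the trade-off described above.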
