Java String parsing - {k1=v1,k2=v2,...}


I have the following string which will probably contain ~100 entries:
String foo = "{k1=v1,k2=v2,...}";
and am looking to write the following function:
String getValue(String key){
// return the value associated with this key
}
I would like to do this without using any parsing library. Any ideas for something speedy?

If you know your string will always look like this, try something like:
Map<String, String> map = new HashMap<String, String>();
public void parse(String foo) {
String foo2 = foo.substring(1, foo.length() - 1); // hack off braces
StringTokenizer st = new StringTokenizer(foo2, ",");
while (st.hasMoreTokens()) {
String thisToken = st.nextToken();
StringTokenizer st2 = new StringTokenizer(thisToken, "=");
map.put(st2.nextToken(), st2.nextToken());
}
}
String getValue(String key) {
return map.get(key); // null when the key is absent
}
Warning: I didn't actually try this; there might be minor syntax errors but the logic should be sound. Note that I also did exactly zero error checking, so you might want to make what I did more robust.
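As a rough illustration of what "more robust" could mean here, the sketch below uses the same split-on-comma, split-on-equals idea but validates the braces and each entry. The class name BraceMapParser and the choice to throw IllegalArgumentException on malformed input are assumptions made for this example, not part of the answer above. It still assumes that keys and values contain no ',' and that keys contain no '='.

```java
import java.util.HashMap;
import java.util.Map;

final class BraceMapParser {
    // Parses "{k1=v1,k2=v2}" into a map.
    // Assumption: keys and values contain no ',' and keys contain no '='.
    static Map<String, String> parse(String foo) {
        if (foo == null || foo.length() < 2
                || foo.charAt(0) != '{' || foo.charAt(foo.length() - 1) != '}') {
            throw new IllegalArgumentException("Expected a string wrapped in braces");
        }
        Map<String, String> map = new HashMap<>();
        String body = foo.substring(1, foo.length() - 1).trim();
        if (body.isEmpty()) {
            return map; // "{}" holds no entries
        }
        for (String entry : body.split(",")) {
            int eq = entry.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in entry: " + entry);
            }
            // Everything before the first '=' is the key, the rest is the value.
            map.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse("{k1=v1, k2=v2}");
        System.out.println(map.get("k1"));      // v1
        System.out.println(map.get("missing")); // null, no NullPointerException
    }
}
```

Parsing once and keeping the map around means each later lookup is a plain hash lookup, which matters if getValue is called many times on the same string.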

The speediest, but ugliest answer I can think of is parsing it character by character using a state machine. It's very fast, but very specific and quite complex. The way I see it, you could have several states:
Parsing Key
Parsing Value
Ready
Example:
int length = foo.length();
int state = READY;
for (int i=0; i<length; ++i) {
switch (state) {
case READY:
//Skip commas and brackets
//Transition to the KEY state if you find a letter
break;
case KEY:
//Read until you hit a = then transition to the value state
//append each letter to a StringBuilder and track the name
//Store the name when you transition to the value state
break;
case VALUE:
//Read until you hit a , then transition to the ready state
//Remember to save the built-key and built-value somewhere
break;
}
}
Alternatively, this is much quicker to write with a StringTokenizer (which is fast) or with regular expressions (which are slower). At run time, though, parsing character by character is most likely the fastest option.
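To make the three states concrete, here is one possible way to fill in the skeleton above. It is only a sketch: the class name StateMachineParser, the State enum and the decision to trim whitespace around keys and values are assumptions for illustration, and it assumes there is no escaping and that '=', ',', '{' and '}' never occur inside a key or a value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StateMachineParser {
    private enum State { READY, KEY, VALUE }

    // Single pass over the input, one character at a time.
    static Map<String, String> parse(String foo) {
        Map<String, String> result = new LinkedHashMap<>();
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        State state = State.READY;
        for (int i = 0; i < foo.length(); i++) {
            char c = foo.charAt(i);
            switch (state) {
                case READY:
                    // Skip separators, braces and whitespace until a key starts.
                    if (c != ',' && c != '{' && c != '}' && !Character.isWhitespace(c)) {
                        key.append(c);
                        state = State.KEY;
                    }
                    break;
                case KEY:
                    if (c == '=') {
                        state = State.VALUE;
                    } else {
                        key.append(c);
                    }
                    break;
                case VALUE:
                    if (c == ',' || c == '}') {
                        // Entry finished: store it and reset both buffers.
                        result.put(key.toString().trim(), value.toString().trim());
                        key.setLength(0);
                        value.setLength(0);
                        state = State.READY;
                    } else {
                        value.append(c);
                    }
                    break;
            }
        }
        if (state == State.VALUE) {
            // Input ended without a closing brace; keep the last entry anyway.
            result.put(key.toString().trim(), value.toString().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("{k1=v1, k2 = v2,k3=v3}")); // {k1=v1, k2=v2, k3=v3}
    }
}
```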

If the string has many entries you might be better off parsing manually without a StringTokenizer to save some memory (in case you have to parse thousands of these strings, it's worth the extra code):
public static Map<String, String> parse(String s) {
Map<String, String> map = new HashMap<String, String>();
s = s.substring(1, s.length() - 1).trim(); //get rid of the brackets
int kpos = 0; //the starting position of the key
int eqpos = s.indexOf('='); //the position of the key/value separator
boolean more = eqpos > 0;
while (more) {
int cmpos = s.indexOf(',', eqpos + 1); //position of the entry separator
String key = s.substring(kpos, eqpos).trim();
if (cmpos > 0) {
map.put(key, s.substring(eqpos + 1, cmpos).trim());
eqpos = s.indexOf('=', cmpos + 1);
more = eqpos > 0;
if (more) {
kpos = cmpos + 1;
}
} else {
map.put(key, s.substring(eqpos + 1).trim());
more = false;
}
}
return map;
}
I tested this code with these strings and it works fine:
{k1=v1}
{k1=v1, k2 = v2, k3= v3,k4 =v4}
{k1= v1,}

Written without testing:
String result = null;
int i = foo.indexOf(key+"=");
if (i != -1 && (foo.charAt(i-1) == '{' || foo.charAt(i-1) == ',')) {
int j = foo.indexOf(',', i);
if (j == -1) j = foo.length() - 1;
result = foo.substring(i+key.length()+1, j);
}
return result;
Yes, it's ugly :-)

Well, assuming no '=' nor ',' in values, the simplest (and shabby) method is:
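int keyPos = foo.indexOf(key+'=');
if (keyPos == -1) return null; // key not found
int start = keyPos + key.length() + 1;
int end = foo.indexOf(',', start);
if (end == -1) end = foo.indexOf('}', start); // last entry has no trailing comma
return (start <= end) ? foo.substring(start, end) : null;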
Yeah, not recommended :)

Adding code to check for the existence of key in foo is left as an exercise to the reader :-)
String foo = "{k1=v1,k2=v2,...}";
String getValue(String key){
int offset = foo.indexOf(key+'=') + key.length() + 1; // first character of the value
int end = foo.indexOf(',', offset);
if (end == -1) end = foo.indexOf('}', offset); // last entry has no trailing comma
return foo.substring(offset, end);
}
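For what it's worth, here is one way the existence check left as an exercise could look. This is a sketch under assumptions: the class name DirectLookup and the two-argument getValue are made up for the example, values are assumed to contain no ',' or '}', and there must be no whitespace around '=' or ','. It also rejects a match that is only the tail of a longer key (searching for k1 should not hit xk1=).

```java
final class DirectLookup {
    // Returns the value stored under key, or null when the key is absent.
    static String getValue(String foo, String key) {
        String needle = key + "=";
        int from = 0;
        while (true) {
            int pos = foo.indexOf(needle, from);
            if (pos == -1) {
                return null; // key not present
            }
            // Accept the match only if it starts an entry, i.e. follows '{' or ','.
            if (pos > 0 && (foo.charAt(pos - 1) == '{' || foo.charAt(pos - 1) == ',')) {
                int start = pos + needle.length();
                int end = foo.indexOf(',', start);
                if (end == -1) {
                    end = foo.indexOf('}', start); // last entry
                }
                if (end == -1) {
                    end = foo.length(); // tolerate a missing closing brace
                }
                return foo.substring(start, end);
            }
            from = pos + 1; // matched the tail of a longer key; keep looking
        }
    }

    public static void main(String[] args) {
        String foo = "{xk1=a,k1=v1,k2=v2}";
        System.out.println(getValue(foo, "k1")); // v1
        System.out.println(getValue(foo, "k2")); // v2
        System.out.println(getValue(foo, "k3")); // null
    }
}
```

Each call scans the string from the start, so this only pays off when few lookups are made; for many lookups, parsing into a map once is likely cheaper.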

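Please find my solution. Note that Strings.isNullOrEmpty is not part of the JDK (it is presumably Guava's com.google.common.base.Strings), and that String.split treats both tokens as regular expressions: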
public class KeyValueParser {
private final String line;
private final String divToken;
private final String eqToken;
private Map<String, String> map = new HashMap<String, String>();
// user_uid=224620; pass=e10adc3949ba59abbe56e057f20f883e;
public KeyValueParser(String line, String divToken, String eqToken) {
this.line = line;
this.divToken = divToken;
this.eqToken = eqToken;
process();
}
public void process() {
if (Strings.isNullOrEmpty(line) || Strings.isNullOrEmpty(divToken) || Strings.isNullOrEmpty(eqToken)) {
return;
}
for (String div : line.split(divToken)) {
if (Strings.isNullOrEmpty(div)) {
continue;
}
String[] split = div.split(eqToken);
if (split.length != 2) {
continue;
}
String key = split[0];
String value = split[1];
if (Strings.isNullOrEmpty(key)) {
continue;
}
map.put(key.trim(), value.trim());
}
}
public String getValue(String key) {
return map.get(key);
}
}
Usage:
KeyValueParser line = new KeyValueParser("user_uid=224620; pass=e10adc3949ba59abbe56e057f20f883e;", ";", "=");
String userUID = line.getValue("user_uid");

Related

Check if a string contains another string twice

I have a for loop that iterates through a list of maps. For every entry I want to check whether it contains a certain String more than once and, if so, delete every occurrence except the first, but I have no clue how to do it.
for (Map<String, String> entry : mapList) {
String line = "";
for (String key : entry.keySet()) {
if (StringUtils.containsAny(key, "5799")) {
line += entry.get(key) + "|";
}
list1.add(line);
}
}
I am thankful for every idea.

From your comments I assume your requirements are as follows:
You have a string that contains multiple parts delimited by the pipe character |, e.g. "a|e|b|c|a|c|a|d".
You want to remove all repeated parts while preserving the order of the elements, e.g. you want "a|e|b|c|d".
To achieve that, split the string at the pipe, collect the elements into a LinkedHashSet, and rejoin them using the pipe.
Example using Java 8:
//The pipe needs to be escaped because split() interprets the input as a regex
Set<String> elements = new LinkedHashSet<>( Arrays.asList( input.split( "\\|" ) ) );
//rejoin using the pipe
String output = elements.stream().collect( Collectors.joining( "|" ) );

To see if key contains a string s at least twice, and to remove the second occurrence, use indexOf twice, with the second call starting the search after the first occurrence:
static String removeSecond(String key, String s) {
int idxFirst = key.indexOf(s);
if (idxFirst != -1) {
int idxSecond = key.indexOf(s, idxFirst + s.length());
if (idxSecond != -1) {
return key.substring(0, idxSecond) +
key.substring(idxSecond + s.length());
}
}
return key; // Nothing to remove
}
Test
System.out.println(removeSecond("mississippi", "ss")); // prints: missiippi
System.out.println(removeSecond("mississippi", "i")); // prints: missssippi
System.out.println(removeSecond("mississippi", "pp")); // prints: mississippi
UPDATE
If you want to remove all duplicates, i.e. leave only the first occurrence, keep searching. For best performance of building the new string, use StringBuilder.
static String removeDuplicates(String key, String s) {
int idx = key.indexOf(s);
if (idx == -1)
return key; // Nothing to remove
StringBuilder buf = new StringBuilder();
int prev = 0;
for (int start = idx + s.length(); (idx = key.indexOf(s, start)) != -1; prev = start = idx + s.length())
buf.append(key.substring(prev, idx));
return (prev == 0 ? key : buf.append(key.substring(prev)).toString());
}
Test
System.out.println(removeDuplicates("mississippi", "ss")); // prints: missiippi
System.out.println(removeDuplicates("mississippi", "i")); // prints: misssspp
System.out.println(removeDuplicates("mississippi", "s")); // prints: misiippi
System.out.println(removeDuplicates("mississippi", "ab")); // prints: mississippi

If you want to remove all occurrences except the first one:
public static String removeExceptFirst(String master, String child) throws Exception {
int firstIndex = master.indexOf(child);
int lastIndexOf = master.lastIndexOf(child);
if (firstIndex == lastIndexOf) {
if (firstIndex == -1) {
throw new Exception("No occurrence!");
} else {
throw new Exception("Only one occurrence!");
}
}
while (true) {
firstIndex = master.indexOf(child);
lastIndexOf = master.lastIndexOf(child);
if (firstIndex == lastIndexOf) {
return master;
}
master = master.substring(0, lastIndexOf) + master.substring(child.length() + lastIndexOf);
}
}

Unformat formatted String

I have a simple formatted String:
double d = 12.348678;
int i = 9876;
String s = "ABCD";
System.out.printf("%08.2f%5s%09d", d, s, i);
// %08.2f = '12.348678' -> '00012,35'
// %5s = 'ABCD' -> ' ABCD'
// %09d = '9876' -> '000009876'
// %08.2f%5s%09d = '00012,35 ABCD000009876'
When I know the pattern %08.2f%5s%09d and the String 00012,35 ABCD000009876:
Can I "unformat" this String in some way?
E.g. the expected result would be something like 3 tokens: '00012,35', ' ABCD', '000009876'

This is specific to your pattern. A general parser for a format string (because what we call unformatting is really parsing) would look quite different.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Unformat {
// Optional flags, then the width (captured), an optional precision, and the conversion.
private static final Pattern SPEC = Pattern.compile("[-#+ 0,(]*([0-9]+)(?:\\.[0-9]+)?[fsd]");
public static Integer getWidth(String format) {
Matcher matcher = SPEC.matcher(format);
if (matcher.matches()) {
return Integer.valueOf(matcher.group(1));
}
return null;
}
public static void main(String[] args) {
String format = "%08.2f%5s%09d";
String formatted = "00012.35 ABCD000009876";
String[] formats = format.split("%");
List<String> result = new ArrayList<String>();
// Running offset into the formatted string. It has to live in the loop's scope:
// an Integer passed to a helper method is passed by value and would never advance.
int start = 0;
for (int j = 1; j < formats.length; j++) {
Integer width = getWidth(formats[j]);
if (width != null) {
result.add(formatted.substring(start, start + width));
start += width;
}
}
System.out.println(result);
}
}

Judging by your output format of "%08.2f%5s%09d", it seems comparable to this pattern
"([0-9]{5,}[.,][0-9]{2,})(.{5,})([0-9]{9,})"
Try the following:
public static void main(String[] args) {
String data = "00012,35 ABCD000009876";
Matcher matcher = Pattern.compile("([0-9]{5,}[.,][0-9]{2,})(.{5,})([0-9]{9,})").matcher(data);
List<String> matches = new ArrayList<>();
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
matches.add(matcher.group(i));
}
}
System.out.println(matches);
}
Results:
[00012,35, ABCD, 000009876]
UPDATE
After seeing the comments, here's a generic example that does not use regular expressions, so as not to copy @bpgergo (+1 to you for the generic regular-expression approach). I also added some logic in case the format is ever wider than the data.
public static void main(String[] args) {
String data = "00012,35 ABCD000009876";
// Format exceeds width of data
String format = "%08.2f%5s%09d%9s";
String[] formatPieces = format.replaceFirst("^%", "").split("%");
List<String> matches = new ArrayList<>();
int index = 0;
for (String formatPiece : formatPieces) {
// Remove any argument indexes or flags
formatPiece = formatPiece.replaceAll("^([0-9]+\\$)|[-+,<]", "");
int length = 0;
switch (formatPiece.charAt(formatPiece.length() - 1)) {
case 'f':
if (formatPiece.contains(".")) {
length = Integer.parseInt(formatPiece.split("\\.")[0]);
} else {
length = Integer.parseInt(formatPiece.substring(0, formatPiece.length() - 1));
}
break;
case 's':
length = Integer.parseInt(formatPiece.substring(0, formatPiece.length() - 1));
break;
case 'd':
length = Integer.parseInt(formatPiece.substring(0, formatPiece.length() - 1));
break;
}
if (index + length < data.length()) {
matches.add(data.substring(index, index + length));
} else {
// We've reached the end of the data and need to break from the loop
matches.add(data.substring(index));
break;
}
index += length;
}
System.out.println(matches);
}
Results:
[00012,35, ABCD, 000009876]

You can do something like this:
//Find the end of the first value,
//this value will always have 2 digits after the decimal point.
int index = val.indexOf(".") + 3;
String token1 = val.substring(0, index);
//Remove the first value from the original String
val = val.substring(index);
//get all values after the last non-numerical character.
String token3 = val.replaceAll(".+\\D", "");
//remove the previously extracted value from the remainder of the original String.
String token2 = val.replace(token3, "");
This will fail if the string field itself ends with a digit, and probably in some other cases.

Since you know the pattern, you are effectively dealing with a kind of regular expression, so use regular expressions for this.
Java has a decent regular expression API for such tasks.
Regular expressions can have capturing groups, and each group would hold a single "unformatted" part, just as you want. It all depends on the regex you use or create.
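A minimal sketch of the capturing-group idea, assuming every field is exactly as wide as its format width (a value that overflows its minimum width would break this). The class name FixedWidthGroups and the split helper are made up for the example, and the widths 8, 5 and 9 are read off %08.2f%5s%09d by hand rather than parsed from the format string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FixedWidthGroups {
    // Splits data into consecutive fields of the given widths,
    // using one capturing group per field, e.g. (.{8})(.{5})(.{9}).
    static List<String> split(String data, int... widths) {
        StringBuilder regex = new StringBuilder();
        for (int width : widths) {
            regex.append("(.{").append(width).append("})");
        }
        Matcher matcher = Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(data);
        List<String> tokens = new ArrayList<>();
        if (matcher.matches()) {
            for (int g = 1; g <= matcher.groupCount(); g++) {
                tokens.add(matcher.group(g));
            }
        }
        return tokens; // empty when the data does not have exactly the expected length
    }

    public static void main(String[] args) {
        List<String> tokens = split("00012,35 ABCD000009876", 8, 5, 9);
        for (String token : tokens) {
            System.out.println("'" + token + "'");
        }
    }
}
```

The groups keep the padding (leading zeros and spaces), which matches the three tokens asked for in the question; converting them back to a double, a String and an int would be a separate step.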

The easiest thing to do would be to parse the string using a regex, e.g. with myString.replaceAll(). myString.split(",") may also be helpful for splitting your string into a String array.

LZW decoding misses the first code entry

I followed the Rosetta Code Java implementation of LZW.
I tried to do the LZW coding with my own dictionary instead of the ASCII dictionary used there.
With my own dictionary, decoding goes wrong: every decoded word is missing its first 'a'.
The result should be 'abraca abrac abra', not 'braca brac bra'.
I think the problem is in the decode() method, at String act = "" + (char)(int)compressed.remove(0); which drops the first 'a' of each word.
But I have no idea how to modify this line.
For example, if I use String act = ""; instead of the line above, the decoding is badly wrong. I don't know how to solve this little problem, or maybe I am looking for the solution in the wrong place.
public class LZW {
public static List<Integer> encode(String uncompressed) {
Map<String,Integer> dictionary = DictionaryInitStringInt();
int dictSize = dictionary.size();
String act = "";
List<Integer> result = new ArrayList<Integer>();
for (char c : uncompressed.toCharArray()) {
String next = act + c;
if (dictionary.containsKey(next))
act = next;
else {
result.add(dictionary.get(act));
// Add next to the dictionary.
dictionary.put(next, dictSize++);
act = "" + c;
}
}
// Output the code for act.
if (!act.equals(""))
result.add(dictionary.get(act));
return result;
}
public static String decode(List<Integer> compressed) {
Map<Integer,String> dictionary = DictionaryInitIntString();
int dictSize = dictionary.size();
String act = "" + (char)(int)compressed.remove(0);
//String act = "";
String result = act;
for (int k : compressed) {
String entry;
if (dictionary.containsKey(k))
entry = dictionary.get(k);
else if (k == dictSize)
entry = act + act.charAt(0);
else
throw new IllegalArgumentException("Nincs ilyen kulcs: " + k);
result += entry;
dictionary.put(dictSize++, act + entry.charAt(0));
act = entry;
}
return result;
}
public static Map<String,Integer> DictionaryInitStringInt()
{
char[] characters = {'a','b','c','d','e','f','g','h','i','j', 'k','l','m','n',
'o','p','q','r','s','t','u','v','w','x','y','z',' ','!',
'?','.',','};
int charactersLength = characters.length;
Map<String,Integer> dictionary = new HashMap<String,Integer>();
for (int i = 0; i < charactersLength; i++)
dictionary.put("" + characters[i], i);
return dictionary;
}
public static Map<Integer,String> DictionaryInitIntString()
{
char[] characters = {'a','b','c','d','e','f','g','h','i','j', 'k','l','m','n',
'o','p','q','r','s','t','u','v','w','x','y','z',' ','!',
'?','.',','};
int charactersLength = characters.length;
Map<Integer,String> dictionary = new HashMap<Integer,String>();
for (int i = 0; i < charactersLength; i++)
dictionary.put(i,"" + characters[i]);
return dictionary;
}
public static void main(String[] args) {
List<Integer> compressed = encode("abraca abrac abra");
System.out.println(compressed);
String decodeed = decode(compressed);
// decodeed will be 'braca brac bra'
System.out.println(decodeed);
}
}

The Rosetta example uses
"" + (char) (int) compressed.remove(0);
because the first 256 entries of its dictionary map exactly to the corresponding char values.
With a custom dictionary this line should be:
String act = dictionary.get(compressed.remove(0));
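To check the fix end to end, here is a condensed round-trip sketch with a custom alphabet. The class name CustomAlphabetLzw and the ALPHABET constant are assumptions for this example (the alphabet mirrors the characters from the question), and the input is assumed to contain only characters from that alphabet. Unlike the original, decode does not remove elements from the list it is given.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CustomAlphabetLzw {
    // Code i stands for ALPHABET.charAt(i); 'a' is therefore code 0, not char 97.
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz !?.,";

    static List<Integer> encode(String text) {
        Map<String, Integer> dictionary = new HashMap<>();
        for (int i = 0; i < ALPHABET.length(); i++) {
            dictionary.put(String.valueOf(ALPHABET.charAt(i)), i);
        }
        List<Integer> result = new ArrayList<>();
        String act = "";
        for (char c : text.toCharArray()) {
            String next = act + c;
            if (dictionary.containsKey(next)) {
                act = next;
            } else {
                result.add(dictionary.get(act));
                dictionary.put(next, dictionary.size()); // next free code
                act = String.valueOf(c);
            }
        }
        if (!act.isEmpty()) {
            result.add(dictionary.get(act));
        }
        return result;
    }

    static String decode(List<Integer> compressed) {
        if (compressed.isEmpty()) {
            return "";
        }
        Map<Integer, String> dictionary = new HashMap<>();
        for (int i = 0; i < ALPHABET.length(); i++) {
            dictionary.put(i, String.valueOf(ALPHABET.charAt(i)));
        }
        // The fix: look the first code up in the dictionary instead of casting it to char.
        String act = dictionary.get(compressed.get(0));
        StringBuilder result = new StringBuilder(act);
        for (int k : compressed.subList(1, compressed.size())) {
            String entry;
            if (dictionary.containsKey(k)) {
                entry = dictionary.get(k);
            } else if (k == dictionary.size()) {
                entry = act + act.charAt(0); // code defined by the step being decoded
            } else {
                throw new IllegalArgumentException("Unknown code: " + k);
            }
            result.append(entry);
            dictionary.put(dictionary.size(), act + entry.charAt(0));
            act = entry;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        List<Integer> compressed = encode("abraca abrac abra");
        System.out.println(compressed);
        System.out.println(decode(compressed)); // abraca abrac abra
    }
}
```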

Treemap Problem

I am trying to count frequency of words in a text file. But I have to use a different approach. For example, if the file contains BRAIN-ISCHEMIA and ISCHEMIA-BRAIN, I need to count BRAIN-ISCHEMIA twice (and leaving ISCHEMIA-BRAIN) or vice versa. Here is my piece of code-
// Mapping of String->Integer (word -> frequency)
HashMap<String, Integer> frequencyMap = new HashMap<String, Integer>();
// Iterate through each line of the file
String[] temp;
String currentLine;
String currentLine2;
while ((currentLine = in.readLine()) != null) {
// Remove this line if you want words to be case sensitive
currentLine = currentLine.toLowerCase();
temp=currentLine.split("-");
currentLine2=temp[1]+"-"+temp[0];
// Iterate through each word of the current line
// Delimit words based on whitespace, punctuation, and quotes
StringTokenizer parser = new StringTokenizer(currentLine);
while (parser.hasMoreTokens()) {
String currentWord = parser.nextToken();
Integer frequency = frequencyMap.get(currentWord);
// Add the word if it doesn't already exist, otherwise increment the
// frequency counter.
if (frequency == null) {
frequency = 0;
}
frequencyMap.put(currentWord, frequency + 1);
}
StringTokenizer parser2 = new StringTokenizer(currentLine2);
while (parser2.hasMoreTokens()) {
String currentWord2 = parser2.nextToken();
Integer frequency = frequencyMap.get(currentWord2);
// Add the word if it doesn't already exist, otherwise increment the
// frequency counter.
if (frequency == null) {
frequency = 0;
}
frequencyMap.put(currentWord2, frequency + 1);
}
}
// Display our nice little Map
System.out.println(frequencyMap);
But for the following file-
ISCHEMIA-GLUTAMATE
ISCHEMIA-BRAIN
GLUTAMATE-BRAIN
BRAIN-TOLERATE
BRAIN-TOLERATE
TOLERATE-BRAIN
GLUTAMATE-ISCHEMIA
ISCHEMIA-GLUTAMATE
I am getting the following output-
{glutamate-brain=1, ischemia-glutamate=3, ischemia-brain=1, glutamate-ischemia=3, brain-tolerate=3, brain-ischemia=1, tolerate-brain=3, brain-glutamate=1}
The problem is in second while block I think. Any light on this problem will be highly appreciated.

From an algorithm perspective, you may want to consider the following approach:
For each string, split, then sort, then re-combine (i.e. take DEF-ABC and convert to ABC-DEF. ABC-DEF would convert to ABC-DEF). Then use that as the key for your frequency count.
If you need to hold onto the exact original item, just include that in your key - so the key would have: ordinal (the re-combined string) and original.
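The split, sort, re-combine step can be sketched as follows. The names PairFrequency, canonicalKey and count are made up for this example, and lower-casing the line is an assumption carried over from the question's code. A TreeMap is used only so that the output order is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class PairFrequency {
    // Builds an order-independent key: "ISCHEMIA-BRAIN" and "BRAIN-ISCHEMIA"
    // both become "brain-ischemia".
    static String canonicalKey(String line) {
        String[] parts = line.trim().toLowerCase(Locale.ROOT).split("-");
        Arrays.sort(parts);
        return String.join("-", parts);
    }

    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> frequency = new TreeMap<>();
        for (String line : lines) {
            // merge() inserts 1 for a new key and adds 1 to an existing count.
            frequency.merge(canonicalKey(line), 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "ISCHEMIA-GLUTAMATE", "ISCHEMIA-BRAIN", "GLUTAMATE-BRAIN", "BRAIN-TOLERATE",
                "BRAIN-TOLERATE", "TOLERATE-BRAIN", "GLUTAMATE-ISCHEMIA", "ISCHEMIA-GLUTAMATE");
        System.out.println(count(lines));
        // {brain-glutamate=1, brain-ischemia=1, brain-tolerate=3, glutamate-ischemia=3}
    }
}
```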

Disclaimer: I stole the sweet trick suggested by Kevin Day for my implementation.
I still want to post it, just to let you know that using the right data structure (Multiset/Bag) and the right library (google-guava) not only simplifies the code but also makes it efficient.
Code
public class BasicFrequencyCalculator
{
public static void main(final String[] args) throws IOException
{
@SuppressWarnings("unchecked")
Multiset<Word> frequency = Files.readLines(new File("c:/2.txt"), Charsets.ISO_8859_1, new LineProcessor() {
private final Multiset<Word> result = HashMultiset.create();
@Override
public Object getResult()
{
return result;
}
@Override
public boolean processLine(final String line) throws IOException
{
result.add(new Word(line));
return true;
}
});
for (Word w : frequency.elementSet())
{
System.out.println(w.getOriginal() + " = " + frequency.count(w));
}
}
}
public class Word
{
private final String key;
private final String original;
public Word(final String orig)
{
this.original = orig.trim();
String[] temp = original.toLowerCase().split("-");
Arrays.sort(temp);
key = temp[0] + "-"+temp[1];
}
@Override
public int hashCode()
{
final int prime = 31;
int result = 1;
result = prime * result + ((getKey() == null) ? 0 : getKey().hashCode());
return result;
}
@Override
public boolean equals(final Object obj)
{
if (this == obj)
{
return true;
}
if (obj == null)
{
return false;
}
if (!(obj instanceof Word))
{
return false;
}
Word other = (Word) obj;
if (getKey() == null)
{
if (other.getKey() != null)
{
return false;
}
}
else if (!getKey().equals(other.getKey()))
{
return false;
}
return true;
}
@Override
public String toString()
{
return getOriginal();
}
public String getKey()
{
return key;
}
public String getOriginal()
{
return original;
}
}
Output
BRAIN-TOLERATE = 3
ISCHEMIA-GLUTAMATE = 3
GLUTAMATE-BRAIN = 1
ISCHEMIA-BRAIN = 1

Thanks everyone for your help. Here is how I solved it-
// Mapping of String->Integer (word -> frequency)
TreeMap<String, Integer> frequencyMap = new TreeMap<String, Integer>();
// Iterate through each line of the file
String[] temp;
String currentLine;
String currentLine2;
while ((currentLine = in.readLine()) != null) {
temp=currentLine.split("-");
currentLine2=temp[1]+"-"+temp[0];
// Iterate through each word of the current line
StringTokenizer parser = new StringTokenizer(currentLine);
while (parser.hasMoreTokens()) {
String currentWord = parser.nextToken();
Integer frequency = frequencyMap.get(currentWord);
Integer frequency2 = frequencyMap.get(currentLine2);
// Add the word if it doesn't already exist, otherwise increment the
// frequency counter.
if (frequency == null) {
if (frequency2 == null)
frequency = 0;
else {
frequencyMap.put(currentLine2, frequency2 + 1);
break;
}//else
} //if (frequency == null)
frequencyMap.put(currentWord, frequency + 1);
}//while (parser.hasMoreTokens())
}//while ((currentLine = in.readLine()) != null)
// Display our nice little Map
System.out.println(frequencyMap);

arrayListOutOfBoundsException

This is my class Debugger. Can anyone try to run it and see what's wrong? I've spent hours on it already. :(
public class Debugger {
private String codeToDebug = "";
public Debugger(String code) {
codeToDebug = code;
}
/**
* This method iterates over a css file and adds all the properties to an arraylist
*/
public void searchDuplicates() {
boolean isInside = false;
ArrayList<String> methodStorage = new ArrayList();
int stored = 0;
String[] codeArray = codeToDebug.split("");
try {
int i = 0;
while(i<codeArray.length) {
if(codeArray[i].equals("}")) {
isInside = false;
}
if(isInside && !codeArray[i].equals(" ")) {
boolean methodFound = false;
String method = "";
int c = i;
while(!methodFound) {
method += codeArray[c];
if(codeArray[c+1].equals(":")) {
methodFound = true;
} else {
c++;
}
}
methodStorage.add(stored, method);
System.out.println(methodStorage.get(stored));
stored++;
boolean stillInside = true;
int skip = i;
while(stillInside) {
if(codeArray[skip].equals(";")) {
stillInside = false;
} else {
skip++;
}
}
i = skip;
}
if(codeArray[i].equals("{")) {
isInside = true;
}
i++;
}
} catch(ArrayIndexOutOfBoundsException ar) {
System.out.println("------- array out of bounds exception -------");
}
}
/**
* Takes in String and outputs the number of characters it contains
* @param input
* @return Number of characters
*/
public static int countString(String input) {
String[] words = input.split("");
int counter = -1;
for(int i = 0; i<words.length; i++){
counter++;
}
return counter;
}
public static void main(String[] args) {
Debugger h = new Debugger("body {margin:;\n}");
h.searchDuplicates();
}
}

Any place where an array element is read after its index has been changed, without a bounds check, is a candidate for an ArrayIndexOutOfBoundsException.
In the code above, there are at least two places where the index is changed without a subsequent bounds check:
The while loop checking the !methodFound condition
The while loop checking the stillInside condition
In both cases the index is incremented, but nothing checks it against the array length before the next element is read from the String[], so there is no guarantee that the index is still inside the bounds of the array.
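One way to make such scans safe is to put the bounds check into the loop condition itself and to report "not found" with a sentinel value. The helper below is only a sketch; the names BoundedScan and indexOf are made up for this example and are not part of the question's code.

```java
final class BoundedScan {
    // Returns the index of the first element equal to target at or after from,
    // or -1 if there is none. The index is compared against array.length
    // before every access, so it can never run past the end.
    static int indexOf(String[] array, String target, int from) {
        for (int i = Math.max(from, 0); i < array.length; i++) {
            if (array[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] chars = {"m", "a", "r", "g", "i", "n", ":", ";", "\n", "}"};
        int colon = indexOf(chars, ":", 0);
        System.out.println(colon); // 6
        // Searching for a second colon simply reports -1 instead of throwing.
        System.out.println(indexOf(chars, ":", colon + 1)); // -1
    }
}
```

The caller then has to handle -1 explicitly, which is where a parser would report a missing ':' or ';' instead of failing with an exception.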

I think these blocks of code cause your problem:
int c = i;
while(!methodFound) {
method += codeArray[c];
if(codeArray[c+1].equals(":")) {
methodFound = true;
} else {
c++;
}
}
int skip = i;
while(stillInside) {
if(codeArray[skip].equals(";")) {
stillInside = false;
} else {
skip++;
}
}
i = skip;
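The reason is that the loops never check the index against codeArray.length. If c reaches codeArray.length - 1 without a ':' having been found, codeArray[c + 1] throws an ArrayIndexOutOfBoundsException; likewise, skip can run past the end of the array when no ';' follows.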

Check that your index is still inside the array before using it, by adding these bounds checks:
while (!methodFound && c < codeArray.length) {
while (stillInside && skip < codeArray.length) {
if (i < codeArray.length && codeArray[i].equals("{")) {
so your code looks like:
public class Debugger {
private String codeToDebug = "";
public Debugger(String code) {
codeToDebug = code;
}
/**
* This method iterates over a css file and adds all the properties to an
* arraylist
*/
public void searchDuplicates() {
boolean isInside = false;
List<String> methodStorage = new ArrayList<String>();
int stored = 0;
String[] codeArray = codeToDebug.split("");
try {
int i = 0;
while (i < codeArray.length) {
if (codeArray[i].equals("}")) {
isInside = false;
}
if (isInside && !codeArray[i].equals(" ")) {
boolean methodFound = false;
String method = "";
int c = i;
while (!methodFound && c < codeArray.length) {
method += codeArray[c];
if (codeArray[c].equals(":")) {
methodFound = true;
} else {
c++;
}
}
methodStorage.add(stored, method);
System.out.println(methodStorage.get(stored));
stored++;
boolean stillInside = true;
int skip = i;
while (stillInside && skip < codeArray.length) {
if (codeArray[skip].equals(";")) {
stillInside = false;
} else {
skip++;
}
}
i = skip;
}
if (i < codeArray.length && codeArray[i].equals("{")) {
isInside = true;
}
i++;
}
} catch (ArrayIndexOutOfBoundsException ar) {
System.out.println("------- array out of bounds exception -------");
ar.printStackTrace();
}
}
/**
* Takes in String and outputs the number of characters it contains
*
* @param input
* @return Number of characters
*/
public static int countString(String input) {
String[] words = input.split("");
int counter = -1;
for (int i = 0; i < words.length; i++) {
counter++;
}
return counter;
}
public static void main(String[] args) {
Debugger h = new Debugger("body {margin:prueba;\n}");
h.searchDuplicates();
}
}
Also, declaring variables with the implementation type is bad practice; that is why, in the code above, I changed ArrayList variable = new ArrayList() to List variable = new ArrayList().

I couldn't resist implementing this task of writing a CSS parser in a completely different way. I have split the task of parsing into many small ones.
The smallest is called skipWhitespace, since you will need it everywhere when parsing text files.
The next one is parseProperty, which reads one property of the form name:value;.
Based on that, parseSelector reads a complete CSS selector, starting with the selector name, an opening brace, possibly many properties, and finishing with the closing brace.
Still based on that, parseFile reads a complete file, consisting of possibly many selectors.
Note how carefully I checked whether the index is small enough. I did that before every access to the chars array.
I used LinkedHashMaps to save the properties and the selectors, because these kinds of maps remember in which order the things have been inserted. Normal HashMaps don't do that.
The task of parsing a text file is generally quite complex, and this program only attempts to handle the basics of CSS. If you need a full CSS parser, you should definitely look for a ready-made one. This one cannot handle @media or similar things where you have nested blocks. But it shouldn't be too difficult to add that to the existing code.
This parser will not handle CSS comments very well. It only expects them at a few places. If comments appear in other places, the parser will not treat them as comments.
import java.util.LinkedHashMap;
import java.util.Map;
public class CssParser {
private final char[] chars;
private int index;
public CssParser(String code) {
this.chars = code.toCharArray();
this.index = 0;
}
private void skipWhitespace() {
/*
* Here you should also skip comments in the CSS file, which look like
* this comment (standard CSS has no // line comments).
*/
while (index < chars.length && Character.isWhitespace(chars[index]))
index++;
}
private void parseProperty(String selector, Map<String, String> properties) {
skipWhitespace();
// get the CSS property name
StringBuilder sb = new StringBuilder();
while (index < chars.length && chars[index] != ':')
sb.append(chars[index++]);
String propertyName = sb.toString().trim();
if (index == chars.length)
throw new IllegalArgumentException("Expected a colon at index " + index + ".");
// skip the colon
index++;
// get the CSS property value
sb.setLength(0);
while (index < chars.length && chars[index] != ';' && chars[index] != '}')
sb.append(chars[index++]);
String propertyValue = sb.toString().trim();
/*
* Here is the check for duplicate property definitions. The method
* Map.put(Object, Object) always returns the value that had been stored
* under the given name before.
*/
String previousValue = properties.put(propertyName, propertyValue);
if (previousValue != null)
throw new IllegalArgumentException("Duplicate property \"" + propertyName + "\" in selector \"" + selector + "\".");
if (index < chars.length && chars[index] == ';')
index++;
skipWhitespace();
}
private void parseSelector(Map<String, Map<String, String>> selectors) {
skipWhitespace();
// get the CSS selector
StringBuilder sb = new StringBuilder();
while (index < chars.length && chars[index] != '{')
sb.append(chars[index++]);
String selector = sb.toString().trim();
if (index == chars.length)
throw new IllegalArgumentException("CSS Selector name \"" + selector + "\" without content.");
// skip the opening brace
index++;
skipWhitespace();
Map<String, String> properties = new LinkedHashMap<String, String>();
selectors.put(selector, properties);
while (index < chars.length && chars[index] != '}') {
parseProperty(selector, properties);
skipWhitespace();
}
// skip the closing brace
index++;
}
private Map<String, Map<String, String>> parseFile() {
Map<String, Map<String, String>> selectors = new LinkedHashMap<String, Map<String, String>>();
while (index < chars.length) {
parseSelector(selectors);
skipWhitespace();
}
return selectors;
}
public static void main(String[] args) {
CssParser parser = new CssParser("body {margin:prueba;A:B;a:Arial, Courier New, \"monospace\";\n}");
Map<String, Map<String, String>> selectors = parser.parseFile();
System.out.println("There are " + selectors.size() + " selectors.");
for (Map.Entry<String, Map<String, String>> entry : selectors.entrySet()) {
String selector = entry.getKey();
Map<String, String> properties = entry.getValue();
System.out.println("Selector " + selector + ":");
for (Map.Entry<String, String> property : properties.entrySet()) {
String name = property.getKey();
String value = property.getValue();
System.out.println(" Property name \"" + name + "\" value \"" + value + "\"");
}
}
}
}
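As for the comment handling mentioned above, a comment-aware replacement for skipWhitespace could look roughly like this. It is a sketch, not part of the parser as posted: the class name CssSkipper and the static, index-returning signature are assumptions chosen so the snippet runs on its own, and it only recognises /* ... */ comments.

```java
final class CssSkipper {
    // Returns the first index at or after start that is neither whitespace
    // nor inside a /* ... */ comment.
    static int skipWhitespaceAndComments(char[] chars, int start) {
        int index = start;
        while (index < chars.length) {
            if (Character.isWhitespace(chars[index])) {
                index++;
            } else if (index + 1 < chars.length && chars[index] == '/' && chars[index + 1] == '*') {
                index += 2;
                // Advance to the closing "*/"; an unterminated comment runs to the end of input.
                while (index + 1 < chars.length && !(chars[index] == '*' && chars[index + 1] == '/')) {
                    index++;
                }
                index = Math.min(index + 2, chars.length);
            } else {
                break; // real content starts here
            }
        }
        return index;
    }

    public static void main(String[] args) {
        char[] css = "  /* c */ body {margin:0;}".toCharArray();
        System.out.println(skipWhitespaceAndComments(css, 0)); // 10, the index of 'b'
    }
}
```

Inside CssParser the same loop would work on the chars and index fields directly; calling it wherever skipWhitespace is called now would let comments appear between selectors and between properties.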
