I am trying to count the frequency of words in a text file, but with a twist. For example, if the file contains BRAIN-ISCHEMIA and ISCHEMIA-BRAIN, I need to count BRAIN-ISCHEMIA twice (and ignore ISCHEMIA-BRAIN), or vice versa. Here is my code:
// Mapping of String->Integer (word -> frequency)
HashMap<String, Integer> frequencyMap = new HashMap<String, Integer>();
// Iterate through each line of the file
String[] temp;
String currentLine;
String currentLine2;
while ((currentLine = in.readLine()) != null) {
// Remove this line if you want words to be case sensitive
currentLine = currentLine.toLowerCase();
temp=currentLine.split("-");
currentLine2=temp[1]+"-"+temp[0];
// Iterate through each word of the current line
// Delimit words based on whitespace, punctuation, and quotes
StringTokenizer parser = new StringTokenizer(currentLine);
while (parser.hasMoreTokens()) {
String currentWord = parser.nextToken();
Integer frequency = frequencyMap.get(currentWord);
// Add the word if it doesn't already exist, otherwise increment the
// frequency counter.
if (frequency == null) {
frequency = 0;
}
frequencyMap.put(currentWord, frequency + 1);
}
StringTokenizer parser2 = new StringTokenizer(currentLine2);
while (parser2.hasMoreTokens()) {
String currentWord2 = parser2.nextToken();
Integer frequency = frequencyMap.get(currentWord2);
// Add the word if it doesn't already exist, otherwise increment the
// frequency counter.
if (frequency == null) {
frequency = 0;
}
frequencyMap.put(currentWord2, frequency + 1);
}
}
// Display our nice little Map
System.out.println(frequencyMap);
But for the following file-
ISCHEMIA-GLUTAMATE
ISCHEMIA-BRAIN
GLUTAMATE-BRAIN
BRAIN-TOLERATE
BRAIN-TOLERATE
TOLERATE-BRAIN
GLUTAMATE-ISCHEMIA
ISCHEMIA-GLUTAMATE
I am getting the following output-
{glutamate-brain=1, ischemia-glutamate=3, ischemia-brain=1, glutamate-ischemia=3, brain-tolerate=3, brain-ischemia=1, tolerate-brain=3, brain-glutamate=1}
I think the problem is in the second while block. Any help with this problem would be highly appreciated.
From an algorithm perspective, you may want to consider the following approach:
For each string, split, then sort, then re-combine (i.e. take DEF-ABC and convert to ABC-DEF. ABC-DEF would convert to ABC-DEF). Then use that as the key for your frequency count.
If you need to hold onto the exact original item, just include that in your key - so the key would have: ordinal (the re-combined string) and original.
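A minimal sketch of the split, sort, re-combine idea, assuming each line holds terms separated by a hyphen. The names `PairCounter`, `canonicalKey` and `count` are made up for illustration and are not part of the original code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PairCounter {
    // Builds an order-independent key: "DEF-ABC" and "ABC-DEF" both become "ABC-DEF".
    static String canonicalKey(String pair) {
        String[] parts = pair.trim().split("-");
        Arrays.sort(parts);
        return String.join("-", parts);
    }

    // Counts how often each canonical key occurs; blank lines are ignored.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> frequency = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            frequency.merge(canonicalKey(line), 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "ISCHEMIA-GLUTAMATE", "ISCHEMIA-BRAIN", "GLUTAMATE-BRAIN",
                "BRAIN-TOLERATE", "BRAIN-TOLERATE", "TOLERATE-BRAIN",
                "GLUTAMATE-ISCHEMIA", "ISCHEMIA-GLUTAMATE");
        System.out.println(count(lines));
    }
}
```

With the sample file from the question, each unordered pair ends up under a single key, so BRAIN-TOLERATE and GLUTAMATE-ISCHEMIA are each counted three times. Note that the key is the sorted form, not necessarily the spelling that appeared first in the file.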
Disclaimer: I stole the sweet trick suggested by Kevin Day for my implementation.
I still want to post this to show that using the right data structure (a Multiset/Bag) and the right library (Google Guava) not only simplifies the code but also makes it efficient.
Code
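import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import com.google.common.base.Charsets;
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import com.google.common.io.Files;
import com.google.common.io.LineProcessor;
public class BasicFrequencyCalculator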
{
public static void main(final String[] args) throws IOException
{
// Parameterizing LineProcessor lets readLines return a Multiset<Word> directly
Multiset<Word> frequency = Files.readLines(new File("c:/2.txt"), Charsets.ISO_8859_1, new LineProcessor<Multiset<Word>>() {
private final Multiset<Word> result = HashMultiset.create();
@Override
public Multiset<Word> getResult()
{
return result;
}
@Override
public boolean processLine(final String line) throws IOException
{
result.add(new Word(line));
return true;
}
});
for (Word w : frequency.elementSet())
{
System.out.println(w.getOriginal() + " = " + frequency.count(w));
}
}
}
public class Word
{
private final String key;
private final String original;
public Word(final String orig)
{
this.original = orig.trim();
String[] temp = original.toLowerCase().split("-");
Arrays.sort(temp);
key = temp[0] + "-"+temp[1];
}
@Override
public int hashCode()
{
final int prime = 31;
int result = 1;
result = prime * result + ((getKey() == null) ? 0 : getKey().hashCode());
return result;
}
@Override
public boolean equals(final Object obj)
{
if (this == obj)
{
return true;
}
if (obj == null)
{
return false;
}
if (!(obj instanceof Word))
{
return false;
}
Word other = (Word) obj;
if (getKey() == null)
{
if (other.getKey() != null)
{
return false;
}
}
else if (!getKey().equals(other.getKey()))
{
return false;
}
return true;
}
@Override
public String toString()
{
return getOriginal();
}
public String getKey()
{
return key;
}
public String getOriginal()
{
return original;
}
}
Output
BRAIN-TOLERATE = 3
ISCHEMIA-GLUTAMATE = 3
GLUTAMATE-BRAIN = 1
ISCHEMIA-BRAIN = 1
Thanks everyone for your help. Here is how I solved it-
// Mapping of String->Integer (word -> frequency)
TreeMap<String, Integer> frequencyMap = new TreeMap<String, Integer>();
// Iterate through each line of the file
String[] temp;
String currentLine;
String currentLine2;
while ((currentLine = in.readLine()) != null) {
temp=currentLine.split("-");
currentLine2=temp[1]+"-"+temp[0];
// Iterate through each word of the current line
StringTokenizer parser = new StringTokenizer(currentLine);
while (parser.hasMoreTokens()) {
String currentWord = parser.nextToken();
Integer frequency = frequencyMap.get(currentWord);
Integer frequency2 = frequencyMap.get(currentLine2);
// Add the word if it doesn't already exist, otherwise increment the
// frequency counter.
if (frequency == null) {
if (frequency2 == null)
frequency = 0;
else {
frequencyMap.put(currentLine2, frequency2 + 1);
break;
}//else
} //if (frequency == null)
frequencyMap.put(currentWord, frequency + 1);
}//while (parser.hasMoreTokens())
}//while ((currentLine = in.readLine()) != null)
// Display our nice little Map
System.out.println(frequencyMap);
I have a line in the following format:
"123","45","{"VFO":[B501], "AGN":[605,B501], "AXP":[665], "QAV":[720,223R,251Q,496M,548A,799M]}","4"
It can be longer, but it always contains:
"number","number","someValues","digit"
I need to wrap the values inside someValues in quotes.
For the test string, the expected result is:
"123","45","{"VFO":["B501"], "AGN":["605","B501"], "AXP":["665"], "QAV":["720","223R","251Q","496M","548A","799M"]}","4"
Please suggest the simplest solution in Java.
P.S. My attempt:
String valuePattern = "\\[(.*?)\\]";
Pattern valueR = Pattern.compile(valuePattern);
Matcher valueM = valueR.matcher(line);
List<String> list = new ArrayList<String>();
while (valueM.find()) {
list.add(valueM.group(0));
}
String value = "";
for (String element : list) {
element = element.substring(1, element.length() - 1);
String[] strings = element.split(",");
String singleGroup = "[";
for (String el : strings) {
singleGroup += "\"" + el + "\",";
}
singleGroup = singleGroup.substring(0, singleGroup.length() - 1);
singleGroup = singleGroup + "]";
value += singleGroup;
}
System.out.println(value);
EDITED
OK, here is the shortest way I found. It works nicely in my opinion, except that I had to add the comma and the bracket back manually. Somebody might be able to do it in one step, but I found it tricky to handle replacements with nested groups.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern p = Pattern.compile("(\\[(\\w+))|(,(\\w+))");
Matcher m = p.matcher("\"123\",\"45\",\"{\"VFO\":[B501], \"AGN\":[605,B501], \"AXP\":[665], \"QAV\":[720,223R,251Q,496M,548A,799M]}\",\"4\"");
StringBuffer s = new StringBuffer();
while (m.find()){
if(m.group(2)!=null){
m.appendReplacement(s, "[\""+m.group(2)+"\"");
}else if(m.group(4)!=null){
m.appendReplacement(s, ",\""+m.group(4)+"\"");
}
}
m.appendTail(s);
System.out.println(s);
As I commented above, I think the real solution here is to fix the thing that's generating this malformed output. In the general case I don't believe it's possible to parse correctly: if the strings contain embedded bracket or comma characters then it becomes impossible to determine which parts are which.
You can get pretty close, though, by simply ignoring all quote characters and tokenizing the rest:
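import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
public final class AlmostJsonSanitizer {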
enum TokenType {
COMMA(','),
COLON(':'),
LEFT_SQUARE_BRACKET('['),
RIGHT_SQUARE_BRACKET(']'),
LEFT_CURLY_BRACKET('{'),
RIGHT_CURLY_BRACKET('}'),
LITERAL(null);
static Map<Character, TokenType> LOOKUP;
static {
Map<Character, TokenType> lookup = new HashMap<Character, TokenType>();
for (TokenType tokenType : values()) {
lookup.put(tokenType.ch, tokenType);
}
LOOKUP = Collections.unmodifiableMap(lookup);
}
private final Character ch;
private TokenType(Character ch) {
this.ch = ch;
}
}
static class Token {
final TokenType type;
final String string;
Token(TokenType type, String string) {
this.type = type;
this.string = string;
}
}
private static class Tokenizer implements Iterator<Token> {
private final String buffer;
private int pos;
Tokenizer(String buffer) {
this.buffer = buffer;
this.pos = 0;
}
@Override
public boolean hasNext() {
return pos < buffer.length();
}
@Override
public Token next() {
char ch = buffer.charAt(pos);
TokenType type = TokenType.LOOKUP.get(ch);
// If it's in the lookup table, return a token of that type
if (type != null) {
pos++;
return new Token(type, null);
}
// Otherwise it's a literal
StringBuilder sb = new StringBuilder();
while (pos < buffer.length()) {
ch = buffer.charAt(pos);
// If we've found a different type of token then stop,
// without consuming it, so the next call returns that token
if (TokenType.LOOKUP.get(ch) != null) {
break;
}
pos++;
// Skip all quote characters
if (ch == '"') {
continue;
}
sb.append(ch);
}
return new Token(TokenType.LITERAL, sb.toString());
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
}
/** Convenience method to allow using a foreach loop below. */
static Iterable<Token> tokenize(final String input) {
return new Iterable<Token>() {
@Override
public Iterator<Token> iterator() {
return new Tokenizer(input);
}
};
}
public static String sanitize(String input) {
StringBuilder result = new StringBuilder();
for (Token token : tokenize(input)) {
switch (token.type) {
case COMMA:
result.append(", ");
break;
case COLON:
result.append(": ");
break;
case LEFT_SQUARE_BRACKET:
case RIGHT_SQUARE_BRACKET:
case LEFT_CURLY_BRACKET:
case RIGHT_CURLY_BRACKET:
result.append(token.type.ch);
break;
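case LITERAL:
// Skip literals that consisted only of quotes or whitespace
if (!token.string.trim().isEmpty()) {
result.append('"').append(token.string.trim()).append('"');
}
break;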
}
}
return result.toString();
}
}
If you wanted to you could also do some sanity checks like ensuring the brackets are balanced. Up to you, this is just an example.
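As a rough illustration of such a sanity check, here is a small stack-based sketch. The class and method names are made up for this example, and it assumes that brackets never appear inside the values themselves (which, as said above, cannot be handled reliably anyway):

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BracketCheck {
    // Returns true if every '[' or '{' is closed by the matching ']' or '}' in the right order.
    static boolean isBalanced(String input) {
        Deque<Character> open = new ArrayDeque<>();
        for (char ch : input.toCharArray()) {
            if (ch == '[' || ch == '{') {
                open.push(ch);
            } else if (ch == ']' || ch == '}') {
                if (open.isEmpty()) {
                    return false; // closing bracket without an opening one
                }
                char expected = (open.pop() == '[') ? ']' : '}';
                if (ch != expected) {
                    return false; // wrong kind of closing bracket
                }
            }
        }
        return open.isEmpty(); // anything left open means the input is unbalanced
    }
}
```

You could call `BracketCheck.isBalanced(input)` before sanitizing and reject the line if it returns false.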
I am trying to compare lines from a text file in Java.
For example, there is a text file with these lines:
temp1 am 32.5 pm 33.5
temp2 am 33.5 pm 33.5
temp3 am 32.5 pm 33.5
temp4 am 31.5 pm 35
a b c d e
Here a is the name of the line, b is a constant (am), c is a variable, d is a constant (pm), and e is another variable.
Only the variables are compared: temp1's c with temp2's c, temp1's e with temp2's e, and so on.
When two or more lines have the same c and the same e, the method should throw a FormatException.
In the example above, temp1's c equals temp3's c and temp1's e equals temp3's e, so a FormatException should be thrown.
This is what I have so far:
public static Temp read(String file) throws FormatException {
String line = "";
FileReader fr = new FileReader(fileName);
Scanner scanner = new Scanner(fr);
while(scanner.hasNextLine()) {
String line = scanner.nextLine();
System.out.println(line);
}
scanner.close();
if () {
throw new FormatException("Error.");
How can I implement this check?
You will need to split each line to extract the variables, and use a Set to check for duplicates, as follows:
Set<String> ceValues = new HashSet<>();
while(scanner.hasNextLine()) {
String line = scanner.nextLine();
String[] values = line.split(" ");
if (!ceValues.add(String.format("%s %s", values[2], values[4]))) {
// The value has already been added so we throw an exception
throw new FormatException("Error.");
}
}
As I don't want to do your homework for you, let me get you started:
while(scanner.hasNextLine()) {
String line = scanner.nextLine();
String[] partials = line.split(" ");
String a = partials[0];
//...
String e = partials[4];
}
I'm splitting the line on a space, as this is the only delimiter in your case. This gives us 5 separate strings (a through e). You will need to save them in a String[][] for later analysis, but you should be able to figure out for yourself how to do this.
Try playing around with this and update your question if you're still stuck.
Here is an example that includes:
a collection in which to store your lines
simple pattern matching logic (see the Java Regex Tutorial for more)
a try-with-resources statement
a recursive check method
First of all I would make a simple POJO representing a line info:
public class LineInfo {
private String lineName;
private String am;
private String pm;
public LineInfo(String lineName, String am, String pm) {
this.lineName = lineName;
this.am = am;
this.pm = pm;
}
// getters and setters
}
Second I would need a pattern to validate each line and extract data from them:
// group 1: line name, group 2: am value (group 3: its optional fraction),
// group 4: pm value (group 5: its optional fraction)
private static final String LINE_REGEX = "(\\w+)\\s+am\\s+(\\d+(\\.\\d+)?)\\s+pm\\s+(\\d+(\\.\\d+)?)";
private static final Pattern LINE_PATTERN = Pattern.compile(LINE_REGEX);
Third I would rework the read method like this (I return void for simplicity):
public static void read(String fileName) throws FormatException {
// collect your lines (or better the information your lines provide) in some data structure, like a List
final List<LineInfo> lines = new ArrayList<>();
// with this syntax your FileReader and Scanner will be closed automatically
try (FileReader fr = new FileReader(fileName); Scanner scanner = new Scanner(fr)) {
while (scanner.hasNextLine()) {
final String line = scanner.nextLine();
final Matcher matcher = LINE_PATTERN.matcher(line);
if (matcher.find()) {
lines.add(new LineInfo(matcher.group(1), matcher.group(2), matcher.group(4)));
} else {
throw new FormatException("Line \"" + line + "\" is not valid.");
}
}
// recursive method
compareLines(lines, 0);
} catch (final IOException e) {
e.printStackTrace();
// or handle it in some way
}
}
private static void compareLines(List<LineInfo> lines, int index) throws FormatException {
// if there are no more lines return
if (index == lines.size()) {
return;
}
final LineInfo line = lines.get(index);
for (int i = index + 1; i < lines.size(); i++) {
final LineInfo other = lines.get(i);
// do the check
if (line.getAm().equals(other.getAm()) && line.getPm().equals(other.getPm())) {
throw new FormatException(String.format("Lines #%d (%s) and #%d (%s) does not meet the requirements.",
index, line.getLineName(), i, other.getLineName()));
}
}
// do the same thing with the next line
compareLines(lines, index + 1);
}
If I understood your question correctly, you need to check line by line for duplicates, using c and e as the criteria.
This means line n must be compared against all the other lines, and if it is repeated, an exception is thrown.
My suggestion is:
Define a class that represents the elements c and e of each line:
class LinePojo {
private String c;
private String e;
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((c == null) ? 0 : c.hashCode());
result = prime * result + ((e == null) ? 0 : e.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
LinePojo other = (LinePojo) obj;
if (c == null) {
if (other.c != null)
return false;
} else if (!c.equals(other.c))
return false;
if (e == null) {
if (other.e != null)
return false;
} else if (!e.equals(other.e))
return false;
return true;
}
@Override
public String toString() {
return "(c=" + c + ", e=" + e + ")";
}
public LinePojo(String c, String e) {
this.c = c;
this.e = e;
}
}
Then create a list of that class, into which every line is inserted after checking whether an equal element is already there:
List<LinePojo> myList = new ArrayList<LinePojo>();
Then iterate line by line:
while(scanner.hasNextLine()) {
String line = scanner.nextLine();
String[] lineInfo = line.split(" ");
LinePojo lp = new LinePojo(lineInfo[2], lineInfo[4]);
if (myList.contains(lp)) {
throw new IllegalArgumentException("there is a duplicate element");
} else {
myList.add(lp);
}
}
I've been working for hours trying to order a linked list of strings alphabetically (dictionary-like). The given string is lowercase only.
For example, the input "hello my name is albert" will be sorted in the list as:
Node 1: albert,
Node 2: hello,
Node 3: is,
etc.
My code so far reads a string like the example above and inserts its words as nodes, unordered.
I searched the web for ways to sort a linked list alphabetically with good performance, and found that merge sort can be useful.
I changed the merge sort to work with strings using compareTo(), but my code throws a NullPointerException on the following line:
if(firstList._word.compareTo(secondList._word) < 0){
I'm looking for help fixing the following code, or another way of sorting a linked list alphabetically (without Collections.sort).
My full code (after trying to add the merge sort) is:
public class TextList
{
public WordNode _head;
public TextList()
{
_head = null;
}
public TextList (String text)
{
this._head = new WordNode();
int lastIndex = 0;
boolean foundSpace = false;
String newString;
WordNode prev,next;
if (text.length() == 0) {
this._head._word = null;
this._head._next = null;
}
else {
for (int i=0;i<text.length();i++)
{
if (text.charAt(i) == ' ') {
newString = text.substring(lastIndex,i);
insertNode(newString);
// Update indexes
lastIndex = i;
// set to true when the string has a space
foundSpace = true;
}
}
if (!foundSpace) {
//If we didnt find any space, set the given word
_head.setWord(text);
_head.setNext(null);
}
else {
//Insert last word
String lastString = text.substring(lastIndex,text.length());
WordNode lastNode = new WordNode(_head._word,_head._next);
_head.setNext(new WordNode(lastString,lastNode));
}
sortList(_head);
}
}
private void insertNode(String word)
{
//Create a new node and put the curret node in it
WordNode newWord = new WordNode(_head._word,_head.getNext());
//Set the new information in the head
_head._word = word;
_head.setNext(newWord);
}
private WordNode sortList(WordNode start) {
if (start == null || start._next == null) return start;
WordNode fast = start;
WordNode slow = start;
// get in middle of the list :
while (fast._next!= null && fast._next._next !=null){
slow = slow._next; fast = fast._next._next;
}
fast = slow._next;
slow._next=null;
return mergeSortedList(sortList(start),sortList(fast));
}
private WordNode mergeSortedList(WordNode firstList,WordNode secondList){
WordNode returnNode = new WordNode("",null);
WordNode trackingPointer = returnNode;
while(firstList!=null && secondList!=null){
if(firstList._word.compareTo(secondList._word) < 0){
trackingPointer._next = firstList; firstList=firstList._next;
}
else {
trackingPointer._next = secondList; secondList=secondList._next
;}
trackingPointer = trackingPointer._next;
}
if (firstList!=null) trackingPointer._next = firstList;
else if (secondList!=null) trackingPointer._next = secondList;
return returnNode._next;
}
public String toString() {
String result = "";
while(_head.getNext() != null){
_head = _head.getNext();
result += _head._word + ", ";
}
return "List: " + result;
}
public static void main(String[] args) {
TextList str = new TextList("a b c d e a b");
System.out.println(str.toString());
}
}
In the past I wrote a method to sort strings alphabetically in an array as school homework, so here it is:
private void sortStringsAlphabetically(){
for (int all = 0; all < names.length; all++) {
for (int i = all + 1; i < names.length; i++) {
if (names[all].compareTo(names[i]) > 0) {
String tmp = names[i];
names[i] = names[all];
names[all] = tmp;
}
}
}
}
This code works for arrays, specifically an array of names. You can adapt it to work with a list; that is straightforward, especially given the wide range of methods in the List interface and all its implementations.
Cheers.
If you don't want to write a lot of code that compares the words letter by letter and sorts them, use Collections.sort().
I don't know what the problem with Collections.sort() is, so I would just use it.
Here is a short piece of code that does exactly what you want:
String test = "hello my name is albert";
test = test.replaceAll(" ", "\n");
String[] te = test.split("\n");
List<String> stlist = new ArrayList<String>();
for(String st : te) {
stlist.add(st);
}
Collections.sort(stlist);
Regarding the NPE you mentioned: it probably occurs because the head initially holds a null string, which you keep pushing down the list in the insert method.
this._head = new WordNode();
Also, adding the last element is not done properly. Just reuse the insert method, as below:
insertNode(text.substring(lastIndex,text.length()));
These are the places I think are problematic when you convert the string to a linked list.
You can use the code below to handle the initial null:
private void insertNode(String word) {
if (this._head == null) {
this._head = new WordNode(word, null);
} else {
WordNode newWord = new WordNode(_head._word, _head.getNext());
_head._word = word;
_head.setNext(newWord);
}
}
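For reference, here is a minimal self-contained sketch of a merge sort on a singly linked list of strings. The question does not show WordNode, so the `Node` class below is a hypothetical stand-in, and the other names are made up as well. Two things differ from the question's code: no node ever holds a null word, and the caller uses the node returned by `sort` as the new head (the question's constructor calls sortList(_head) but discards the result).

```java
final class SortedWords {
    // Hypothetical stand-in for the WordNode class, which the question does not show.
    static final class Node {
        final String word;
        Node next;

        Node(String word, Node next) {
            this.word = word;
            this.next = next;
        }
    }

    // Builds a list from whitespace-separated words; returns null for blank input.
    static Node fromText(String text) {
        Node head = null;
        String[] words = text.trim().split("\\s+");
        for (int i = words.length - 1; i >= 0; i--) {
            if (!words[i].isEmpty()) {
                head = new Node(words[i], head);
            }
        }
        return head;
    }

    // Merge sort; the caller must use the returned node as the new head.
    static Node sort(Node start) {
        if (start == null || start.next == null) {
            return start;
        }
        Node slow = start;
        Node fast = start;
        while (fast.next != null && fast.next.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        Node second = slow.next;
        slow.next = null; // cut the list into two halves
        return merge(sort(start), sort(second));
    }

    // Merges two already sorted lists into one sorted list.
    static Node merge(Node a, Node b) {
        Node dummy = new Node("", null);
        Node tail = dummy;
        while (a != null && b != null) {
            if (a.word.compareTo(b.word) <= 0) {
                tail.next = a;
                a = a.next;
            } else {
                tail.next = b;
                b = b.next;
            }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;
        return dummy.next;
    }

    // Joins the words with ", " for display.
    static String join(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(n.word);
        }
        return sb.toString();
    }
}
```

Usage would look like `Node head = SortedWords.sort(SortedWords.fromText("hello my name is albert"));`, after which `SortedWords.join(head)` gives the words in dictionary order.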
This is my class Debugger. Can anyone try running it and see what's wrong? I've spent hours on it already. :(
public class Debugger {
private String codeToDebug = "";
public Debugger(String code) {
codeToDebug = code;
}
/**
* This method iterates over a CSS file and adds all the properties to an ArrayList
*/
public void searchDuplicates() {
boolean isInside = false;
ArrayList<String> methodStorage = new ArrayList();
int stored = 0;
String[] codeArray = codeToDebug.split("");
try {
int i = 0;
while(i<codeArray.length) {
if(codeArray[i].equals("}")) {
isInside = false;
}
if(isInside && !codeArray[i].equals(" ")) {
boolean methodFound = false;
String method = "";
int c = i;
while(!methodFound) {
method += codeArray[c];
if(codeArray[c+1].equals(":")) {
methodFound = true;
} else {
c++;
}
}
methodStorage.add(stored, method);
System.out.println(methodStorage.get(stored));
stored++;
boolean stillInside = true;
int skip = i;
while(stillInside) {
if(codeArray[skip].equals(";")) {
stillInside = false;
} else {
skip++;
}
}
i = skip;
}
if(codeArray[i].equals("{")) {
isInside = true;
}
i++;
}
} catch(ArrayIndexOutOfBoundsException ar) {
System.out.println("------- array out of bounds exception -------");
}
}
/**
* Takes in String and outputs the number of characters it contains
* @param input
* @return Number of characters
*/
public static int countString(String input) {
String[] words = input.split("");
int counter = -1;
for(int i = 0; i<words.length; i++){
counter++;
}
return counter;
}
public static void main(String[] args) {
Debugger h = new Debugger("body {margin:;\n}");
h.searchDuplicates();
}
}
Any place where an array element is accessed without a bounds check after the index has been manipulated is a candidate for an ArrayIndexOutOfBoundsException.
In the above code, there are at least two instances where the index is being manipulated without being subject to a bounds check.
The while loop checking the !methodFound condition
The while loop checking the stillInside condition
In those two cases, the index is being manipulated by incrementing or adding a value to the index, but there are no bound checks before an element is being obtained from the String[], therefore there is no guarantee that the index being specified is not outside the bounds of the array.
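One way to make such a scan safe is to put the bounds check into the loop condition and report "not found" instead of running off the end. A small sketch, with a made-up helper name that is not part of the original code:

```java
final class BoundedScan {
    // Returns the index of the first element equal to target at or after 'from',
    // or -1 if the end of the array is reached first.
    static int indexOf(String[] array, String target, int from) {
        for (int i = Math.max(from, 0); i < array.length; i++) {
            if (array[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }
}
```

Both inner loops could then be written as something like `int colon = BoundedScan.indexOf(codeArray, ":", i);`, followed by an explicit check for `-1` to handle malformed input.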
I think this block of code may be causing your problem:
int c = i;
while(!methodFound) {
method += codeArray[c];
if(codeArray[c+1].equals(":")) {
methodFound = true;
} else {
c++;
}
}
int skip = i;
while(stillInside) {
if(codeArray[skip].equals(";")) {
stillInside = false;
} else {
skip++;
}
}
i = skip;
The reason is that once c (which starts at i) reaches codeArray.length - 1, the index c + 1 is past the end of the array, so accessing it throws an ArrayIndexOutOfBoundsException.
Try checking that your index is within the bounds of the array, by adding:
while (!methodFound && c < codeArray.length) {
while (stillInside && skip < codeArray.length) {
if (i < codeArray.length && codeArray[i].equals("{")) {
so that your code looks like:
public class Debugger {
private String codeToDebug = "";
public Debugger(String code) {
codeToDebug = code;
}
/**
* This method iterates over a CSS file and adds all the properties to a
* list
*/
public void searchDuplicates() {
boolean isInside = false;
List<String> methodStorage = new ArrayList<String>();
int stored = 0;
String[] codeArray = codeToDebug.split("");
try {
int i = 0;
while (i < codeArray.length) {
if (codeArray[i].equals("}")) {
isInside = false;
}
if (isInside && !codeArray[i].equals(" ")) {
boolean methodFound = false;
String method = "";
int c = i;
while (!methodFound && c < codeArray.length) {
method += codeArray[c];
if (codeArray[c].equals(":")) {
methodFound = true;
} else {
c++;
}
}
methodStorage.add(stored, method);
System.out.println(methodStorage.get(stored));
stored++;
boolean stillInside = true;
int skip = i;
while (stillInside && skip < codeArray.length) {
if (codeArray[skip].equals(";")) {
stillInside = false;
} else {
skip++;
}
}
i = skip;
}
if (i < codeArray.length && codeArray[i].equals("{")) {
isInside = true;
}
i++;
}
} catch (ArrayIndexOutOfBoundsException ar) {
System.out.println("------- array out of bounds exception -------");
ar.printStackTrace();
}
}
/**
* Takes in String and outputs the number of characters it contains
*
* @param input
* @return Number of characters
*/
public static int countString(String input) {
String[] words = input.split("");
int counter = -1;
for (int i = 0; i < words.length; i++) {
counter++;
}
return counter;
}
public static void main(String[] args) {
Debugger h = new Debugger("body {margin:prueba;\n}");
h.searchDuplicates();
}
}
Also, declaring variables with implementation types is bad practice; that is why, in the code above, I changed ArrayList variable = new ArrayList() to List variable = new ArrayList().
I couldn't resist implementing this task of writing a CSS parser in a completely different way. I have split the parsing task into many small ones.
The smallest is called skipWhitespace, since you will need it everywhere when parsing text files.
The next one is parseProperty, which reads one property of the form name:value;.
Based on that, parseSelector reads a complete CSS selector, starting with the selector name, an opening brace, possibly many properties, and finishing with the closing brace.
Still based on that, parseFile reads a complete file, consisting of possibly many selectors.
Note how carefully I checked whether the index is small enough. I did that before every access to the chars array.
I used LinkedHashMaps to save the properties and the selectors, because these kinds of maps remember in which order the things have been inserted. Normal HashMaps don't do that.
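To illustrate the ordering guarantee, here is a tiny sketch (the class and method names are made up for this example). A LinkedHashMap iterates its keys in the order they were first inserted:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InsertionOrderDemo {
    // Returns the keys in the order in which a LinkedHashMap iterates them.
    static List<String> keysInOrder(String... keys) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String key : keys) {
            map.put(key, "value of " + key);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Prints the property names in insertion order.
        System.out.println(keysInOrder("margin", "color", "font"));
    }
}
```

Re-inserting an existing key replaces its value but keeps its original position, so the duplicate check in the parser does not disturb the order.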
The task of parsing a text file is generally quite complex, and this program only attempts to handle the basics of CSS. If you need a full CSS parser, you should definitely look for a ready-made one. This one cannot handle @media or similar things where you have nested blocks. But it shouldn't be too difficult to add that to the existing code.
This parser will not handle CSS comments very well. It only expects them at a few places. If comments appear in other places, the parser will not treat them as comments.
import java.util.LinkedHashMap;
import java.util.Map;
public class CssParser {
private final char[] chars;
private int index;
public CssParser(String code) {
this.chars = code.toCharArray();
this.index = 0;
}
private void skipWhitespace() {
/*
* Here you should also skip comments in the CSS file, which look like
* this comment. (CSS has no // line comments.)
*/
while (index < chars.length && Character.isWhitespace(chars[index]))
index++;
}
private void parseProperty(String selector, Map<String, String> properties) {
skipWhitespace();
// get the CSS property name
StringBuilder sb = new StringBuilder();
while (index < chars.length && chars[index] != ':')
sb.append(chars[index++]);
String propertyName = sb.toString().trim();
if (index == chars.length)
throw new IllegalArgumentException("Expected a colon at index " + index + ".");
// skip the colon
index++;
// get the CSS property value
sb.setLength(0);
while (index < chars.length && chars[index] != ';' && chars[index] != '}')
sb.append(chars[index++]);
String propertyValue = sb.toString().trim();
/*
* Here is the check for duplicate property definitions. The method
* Map.put(Object, Object) always returns the value that had been stored
* under the given name before.
*/
String previousValue = properties.put(propertyName, propertyValue);
if (previousValue != null)
throw new IllegalArgumentException("Duplicate property \"" + propertyName + "\" in selector \"" + selector + "\".");
if (index < chars.length && chars[index] == ';')
index++;
skipWhitespace();
}
private void parseSelector(Map<String, Map<String, String>> selectors) {
skipWhitespace();
// get the CSS selector
StringBuilder sb = new StringBuilder();
while (index < chars.length && chars[index] != '{')
sb.append(chars[index++]);
String selector = sb.toString().trim();
if (index == chars.length)
throw new IllegalArgumentException("CSS Selector name \"" + selector + "\" without content.");
// skip the opening brace
index++;
skipWhitespace();
Map<String, String> properties = new LinkedHashMap<String, String>();
selectors.put(selector, properties);
while (index < chars.length && chars[index] != '}') {
parseProperty(selector, properties);
skipWhitespace();
}
// skip the closing brace
index++;
}
private Map<String, Map<String, String>> parseFile() {
Map<String, Map<String, String>> selectors = new LinkedHashMap<String, Map<String, String>>();
while (index < chars.length) {
parseSelector(selectors);
skipWhitespace();
}
return selectors;
}
public static void main(String[] args) {
CssParser parser = new CssParser("body {margin:prueba;A:B;a:Arial, Courier New, \"monospace\";\n}");
Map<String, Map<String, String>> selectors = parser.parseFile();
System.out.println("There are " + selectors.size() + " selectors.");
for (Map.Entry<String, Map<String, String>> entry : selectors.entrySet()) {
String selector = entry.getKey();
Map<String, String> properties = entry.getValue();
System.out.println("Selector " + selector + ":");
for (Map.Entry<String, String> property : properties.entrySet()) {
String name = property.getKey();
String value = property.getValue();
System.out.println(" Property name \"" + name + "\" value \"" + value + "\"");
}
}
}
}
I have the following string which will probably contain ~100 entries:
String foo = "{k1=v1,k2=v2,...}"
and am looking to write the following function:
String getValue(String key){
// return the value associated with this key
}
I would like to do this without using any parsing library. Any ideas for something speedy?
If you know your string will always look like this, try something like:
HashMap map = new HashMap();
public void parse(String foo) {
String foo2 = foo.substring(1, foo.length() - 1); // hack off braces
StringTokenizer st = new StringTokenizer(foo2, ",");
while (st.hasMoreTokens()) {
String thisToken = st.nextToken();
StringTokenizer st2 = new StringTokenizer(thisToken, "=");
map.put(st2.nextToken(), st2.nextToken());
}
}
String getValue(String key) {
return map.get(key).toString();
}
Warning: I didn't actually try this; there might be minor syntax errors but the logic should be sound. Note that I also did exactly zero error checking, so you might want to make what I did more robust.
The speediest, but ugliest answer I can think of is parsing it character by character using a state machine. It's very fast, but very specific and quite complex. The way I see it, you could have several states:
Parsing Key
Parsing Value
Ready
Example:
int length = foo.length();
int state = READY;
for (int i=0; i<length; ++i) {
switch (state) {
case READY:
//Skip commas and brackets
//Transition to the KEY state if you find a letter
break;
case KEY:
//Read until you hit a = then transition to the value state
//append each letter to a StringBuilder and track the name
//Store the name when you transition to the value state
break;
case VALUE:
//Read until you hit a , then transition to the ready state
//Remember to save the built-key and built-value somewhere
break;
}
}
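Filled in, the three states above could look roughly like the sketch below. It assumes well-formed input in which neither keys nor values contain '=', ',' or braces; the class name and the enum are my own choices for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StateMachineParser {
    private enum State { READY, KEY, VALUE }

    // Parses "{k1=v1,k2=v2}" into a map in a single pass over the characters.
    static Map<String, String> parse(String foo) {
        Map<String, String> result = new LinkedHashMap<>();
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        State state = State.READY;
        for (int i = 0; i < foo.length(); i++) {
            char ch = foo.charAt(i);
            switch (state) {
                case READY:
                    // Skip braces, commas and whitespace; anything else starts a key.
                    if (ch != '{' && ch != '}' && ch != ',' && !Character.isWhitespace(ch)) {
                        key.append(ch);
                        state = State.KEY;
                    }
                    break;
                case KEY:
                    // Read until '=' and then switch to reading the value.
                    if (ch == '=') {
                        state = State.VALUE;
                    } else {
                        key.append(ch);
                    }
                    break;
                case VALUE:
                    // A comma or closing brace ends the value; store the entry.
                    if (ch == ',' || ch == '}') {
                        result.put(key.toString().trim(), value.toString().trim());
                        key.setLength(0);
                        value.setLength(0);
                        state = State.READY;
                    } else {
                        value.append(ch);
                    }
                    break;
            }
        }
        // Input that ends without a closing brace still yields its last entry.
        if (state == State.VALUE) {
            result.put(key.toString().trim(), value.toString().trim());
        }
        return result;
    }
}
```

After parsing once, `getValue(key)` is just a lookup in the returned map.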
Alternatively, you can write this much more quickly using StringTokenizer (which is fast) or regular expressions (which are slower to run). But overall, parsing character by character is most likely the fastest at run time.
If the string has many entries you might be better off parsing manually without a StringTokenizer to save some memory (in case you have to parse thousands of these strings, it's worth the extra code):
public static Map parse(String s) {
HashMap map = new HashMap();
s = s.substring(1, s.length() - 1).trim(); //get rid of the brackets
int kpos = 0; //the starting position of the key
int eqpos = s.indexOf('='); //the position of the key/value separator
boolean more = eqpos > 0;
while (more) {
int cmpos = s.indexOf(',', eqpos + 1); //position of the entry separator
String key = s.substring(kpos, eqpos).trim();
if (cmpos > 0) {
map.put(key, s.substring(eqpos + 1, cmpos).trim());
eqpos = s.indexOf('=', cmpos + 1);
more = eqpos > 0;
if (more) {
kpos = cmpos + 1;
}
} else {
map.put(key, s.substring(eqpos + 1).trim());
more = false;
}
}
return map;
}
I tested this code with these strings and it works fine:
{k1=v1}
{k1=v1, k2 = v2, k3= v3,k4 =v4}
{k1= v1,}
Written without testing:
String result = null;
int i = foo.indexOf(key+"=");
if (i != -1 && (foo.charAt(i-1) == '{' || foo.charAt(i-1) == ',')) {
int j = foo.indexOf(',', i);
if (j == -1) j = foo.length() - 1;
result = foo.substring(i+key.length()+1, j);
}
return result;
Yes, it's ugly :-)
Well, assuming no '=' nor ',' in values, the simplest (and shabby) method is:
int start = foo.indexOf(key+'=') + key.length() + 1;
// substring's end index is exclusive, so use the separator position itself
int end = foo.indexOf(',', start);
if (end == -1) end = foo.indexOf('}', start);
return (start<end)?foo.substring(start,end):null;
Yeah, not recommended :)
Adding code to check for the existence of the key in foo is left as an exercise for the reader :-)
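For what it's worth, one possible shape of that exercise is sketched below. It still assumes no '=' or ',' inside values and no spaces after the commas; the class name is made up for this example:

```java
final class KeyLookup {
    // Returns the value stored under key, or null when the key is absent.
    static String getValue(String foo, String key) {
        String needle = key + "=";
        int from = 0;
        while (true) {
            int i = foo.indexOf(needle, from);
            if (i == -1) {
                return null; // key not present
            }
            // Accept the match only if it starts an entry, so "k1" does not match inside "xk1".
            if (i > 0 && (foo.charAt(i - 1) == '{' || foo.charAt(i - 1) == ',')) {
                int start = i + needle.length();
                int end = foo.indexOf(',', start);
                if (end == -1) {
                    end = foo.indexOf('}', start);
                }
                if (end == -1) {
                    end = foo.length();
                }
                return foo.substring(start, end);
            }
            from = i + 1; // keep searching after a false match
        }
    }
}
```

The loop is there because the first textual occurrence of `key=` may sit inside a longer key, in which case the search has to continue.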
String foo = "{k1=v1,k2=v2,...}";
String getValue(String key){
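// offset points at the first character of the value
int offset = foo.indexOf(key+'=') + key.length() + 1;
// the value ends at the next comma, or at the closing brace for the last entry
int end = foo.indexOf(',', offset);
if (end == -1) end = foo.indexOf('}', offset);
return foo.substring(offset, end);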
}
Here is my solution (Strings.isNullOrEmpty comes from Guava):
public class KeyValueParser {
private final String line;
private final String divToken;
private final String eqToken;
private Map<String, String> map = new HashMap<String, String>();
// user_uid=224620; pass=e10adc3949ba59abbe56e057f20f883e;
public KeyValueParser(String line, String divToken, String eqToken) {
this.line = line;
this.divToken = divToken;
this.eqToken = eqToken;
proccess();
}
public void proccess() {
if (Strings.isNullOrEmpty(line) || Strings.isNullOrEmpty(divToken) || Strings.isNullOrEmpty(eqToken)) {
return;
}
for (String div : line.split(divToken)) {
if (Strings.isNullOrEmpty(div)) {
continue;
}
String[] split = div.split(eqToken);
if (split.length != 2) {
continue;
}
String key = split[0];
String value = split[1];
if (Strings.isNullOrEmpty(key)) {
continue;
}
map.put(key.trim(), value.trim());
}
}
public String getValue(String key) {
return map.get(key);
}
}
Usage
KeyValueParser line = new KeyValueParser("user_uid=224620; pass=e10adc3949ba59abbe56e057f20f883e;", ";", "=");
String userUID = line.getValue("user_uid");