I'm moving some code from Objective-C to Java. The project is an XML/HTML parser. In Objective-C I pretty much only use the scanUpToString("mystring"); method.
I looked at the Java Scanner class, but it breaks everything into tokens. I don't want that. I just want to be able to scan up to occurrences of substrings and keep track of the scanner's current location in the overall string.
Any help would be great thanks!
EDIT
to be more specific. I don't want Scanner to tokenize.
String test = "<title balh> blah <title> blah>";
Scanner feedScanner = new Scanner(test);
String title = "<title";
String a = feedScanner.next(title);
String b = feedScanner.next(title);
In the above code I'd like feedScanner.next(title); to scan up to the end of the next occurrence of "<title".
What actually happens is that the first call to feedScanner.next works, since the default delimiter is whitespace; the second call, however, fails (for my purposes).
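You can achieve this with the String class (java.lang.String).
First get the index of the first occurrence of your substring.
int firstOccurrence = string.indexOf(substring);
Then walk through the rest of the string, asking for the next occurrence each time by starting the search just past the previous one; indexOf returns -1 when there are no more.
int nextIndex = string.indexOf(substring, firstOccurrence + substring.length());
If you want to keep the positions, box them as Integer (the wrapper class) and add them to an ArrayList.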
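Put together, that loop could look roughly like this. It is only a sketch; the class and method names (OccurrenceFinder, findAll) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class OccurrenceFinder {
    // Returns the start index of every occurrence of target in text, in order.
    static List<Integer> findAll(String text, String target) {
        List<Integer> positions = new ArrayList<>();
        if (target.isEmpty()) {
            return positions; // an empty target would never advance the search
        }
        int from = 0;
        while (true) {
            int hit = text.indexOf(target, from);
            if (hit < 0) {
                break; // indexOf returns -1 once nothing more is found
            }
            positions.add(hit);
            from = hit + target.length(); // resume just past this match
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findAll("<title balh> blah <title> blah>", "<title"));
    }
}
```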
This really is easier if you just use String's methods directly:
String test = "<title balh> blah <title> blah>";
String target = "<title";
int index = 0;
index = test.indexOf( target, index ) + target.length();
// index is now 6 (the space between "<title" and "balh")
index = test.indexOf( target, index ) + target.length();
// Index is now at the ">" in "<title> blah"
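If you want something closer to what you had in Objective-C, the indexOf calls can be wrapped in a small class that remembers its position. This is a minimal sketch loosely modelled on scanUpToString; SubstringScanner and its method names are my own invention, not an existing Java API, and it ignores the whitespace skipping that NSScanner does by default.

```java
class SubstringScanner {
    private final String source;
    private int location = 0; // index of the next unread character

    SubstringScanner(String source) {
        this.source = source;
    }

    int getLocation() {
        return location;
    }

    boolean isAtEnd() {
        return location >= source.length();
    }

    // Returns the text from the current location up to (not including) the
    // next occurrence of stop and leaves the location at the start of stop.
    // If stop does not occur again, returns the rest and moves to the end.
    String scanUpTo(String stop) {
        int hit = source.indexOf(stop, location);
        int end = (hit < 0) ? source.length() : hit;
        String scanned = source.substring(location, end);
        location = end;
        return scanned;
    }

    // Consumes expected if it starts exactly at the current location.
    boolean scanString(String expected) {
        if (source.startsWith(expected, location)) {
            location += expected.length();
            return true;
        }
        return false;
    }
}
```

Typical use is to alternate scanUpTo("<title") and scanString("<title"), reading getLocation() whenever you need the offset into the overall string.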
Depending on what you actually want to do besides walking through the string, different approaches might be better or worse. E.g. if you want to get the " balh> blah " string between the two "<title" occurrences, a Scanner is convenient:
String test = "<title balh> blah <title> blah>";
Scanner scan = new Scanner(test);
scan.useDelimiter("<title");
String stuff = scan.next(); // gets " balh> blah "
Maybe String.split is something for you?
String s = "The almighty String is mystring is your String is our mystring-object - isn't it?";
String[] parts = s.split("mystring");
Result:
Array("The almighty String is ", " is your String is our ", "-object - isn't it?")
You know that a "mystring" must have stood between each pair of neighbouring parts. I'm not sure about the start and the end of the input, so you may also need s.startsWith("mystring") / s.endsWith("mystring").
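On the start/end question: as far as split is concerned, a delimiter at the very start produces a leading empty string, while trailing empty strings are dropped unless you pass a negative limit. A small sketch to check this (SplitEdges and parts are illustrative names; Pattern.quote is used so the delimiter is treated as literal text rather than a regex):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitEdges {
    // Splits s on the literal delimiter. With keepTrailing, a delimiter at the
    // very end shows up as a trailing empty string instead of being dropped.
    static String[] parts(String s, String delimiter, boolean keepTrailing) {
        String regex = Pattern.quote(delimiter);
        return keepTrailing ? s.split(regex, -1) : s.split(regex);
    }

    public static void main(String[] args) {
        String s = "mystring at the start, and at the end mystring";
        System.out.println(Arrays.toString(parts(s, "mystring", false)));
        System.out.println(Arrays.toString(parts(s, "mystring", true)));
    }
}
```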
Related
I'm sure this is fairly simple, however I've tried googling the question but can't find an answer that fits my problem.
I'm playing around with string manipulation and one of the things I'm trying to do is get the first letter of each word. (And then place them all into a string)
I'm having trouble with registering each 'space' so that my If statement will be triggered. Here's what I have so far.
while (scanText.hasNext()) {
boolean isSpace = false;
if (scanText.hasNext(" ")) {isSpace = true;}
String s = scanText.next();
if (isSpace) {firstLetters += s + " ";}
}
Also, if there is a much better way to do this, then please let me know.
You can also split the original text by spaces, and collect the words.
String input = " Hello world aaa ";
String[] split = input.trim().split("\\s+"); // all types of whitespace; " +" to pick spaces only
// operate on "split" array containing words now: [Hello, world, aaa]
However using regexps here might be overkill.
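For the first-letters task itself, the split approach might be finished off like this. It is a sketch; firstLetters is a hypothetical helper name, and it assumes "word" simply means a run of non-whitespace characters.

```java
class FirstLetters {
    // Collects the first character of every whitespace-separated word.
    static String firstLetters(String input) {
        StringBuilder letters = new StringBuilder();
        for (String word : input.trim().split("\\s+")) {
            if (!word.isEmpty()) { // blank input still yields one empty "word"
                letters.append(word.charAt(0));
            }
        }
        return letters.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstLetters(" Hello world aaa "));
    }
}
```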
Assuming that scanText is a Scanner object, you could use something like the example shown in the documentation:
Scanner s = new Scanner(input).useDelimiter("\\s+"); //regex for spaces
https://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
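With a Scanner, the loop from the question can drop the space check entirely: hasNext(" ") tests whether the next token matches the pattern, and with whitespace as the delimiter a token never contains a space, so that test never succeeds. A minimal sketch (the class name is made up, and it relies on the default whitespace delimiter):

```java
import java.util.Scanner;

class FirstLettersScanner {
    // Reads whitespace-separated tokens and keeps the first character of each.
    static String firstLetters(String input) {
        StringBuilder letters = new StringBuilder();
        try (Scanner words = new Scanner(input)) { // default delimiter: whitespace
            while (words.hasNext()) {
                letters.append(words.next().charAt(0));
            }
        }
        return letters.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstLetters("Hello big   world"));
    }
}
```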
The title speaks for itself. I'm trying to create a calculator that integrates polynomial functions using basic coding, not just whipping out a math operator to do it for me :). I haven't had to go far until I hit a wall, as I'm unable to find a way to: create a substring of the numbers in the original string until a non-numerical character is reached. i.e. if the string is 123x, I want to create a substring of 123, without the 'x'. Here is what I've got so far:
public static void indefinite()
{
int x = 0;
System.out.print("Enter your function to integrate:\n F ");
Scanner input = new Scanner(System.in);
String function = input.nextLine();
String s1 = "";
for (int i = 0; i < function.length(); i++)
{
s1 = s1 + function.substring(x, i+1);
x = i+1;
}
}
It all looks a bit nonsensical, but basically, if the string 'function' is 32x^4, I want the substring to be 32. I'll figure out the rest myself, but this part I can't seem to do.
P.S. I know the for loop's bounds are wrong: it shouldn't run to the end of the string if I'm looking at functions with more than just 2x^3. I haven't gotten around to figuring that out yet, so I just made sure it works for one term.
Use replaceAll() to "extract" it:
String number = str.replaceAll("\\D.*", "");
This replaces the first non digit and everything after it with nothing (effectively deleting it), leaving you with just the number.
You can also go directly to a numeric primitive, without having to use a String variable if you prefer (like me) to have less code:
int number = Integer.parseInt(str.replaceAll("\\D.*", ""));
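Since the question asks for "basic coding", here is a sketch of the same extraction with a plain loop instead of a regex (leadingDigits is an illustrative name). Note that either way the result is an empty string when the input does not start with a digit, and Integer.parseInt("") throws a NumberFormatException, so check for that before parsing.

```java
class LeadingDigits {
    // Returns the run of digits at the start of s ("" if s starts with a non-digit).
    static String leadingDigits(String s) {
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String digits = leadingDigits("32x^4");
        // Guard against an empty result before handing it to parseInt.
        int coefficient = digits.isEmpty() ? 1 : Integer.parseInt(digits);
        System.out.println(coefficient);
    }
}
```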
You could split your string at the letter-digit marks, like so:
str.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
For instance, "123x54y7z" will return [123, x, 54, y, 7, z]
I got a string that looks like this.
String line = "50464,"STRONACHLACHAR, PIER BUILDING AND PIER INCLUDING REVETMENT WALLS AND RAILINGS",C,04/05/2006,STIRLING,BUCHANAN";
If I want to split the string into 6 parts and put them in an array with
String[] result = line.split(",");
I would have a problem: STRONACHLACHAR, PIER BUILDING AND PIER INCLUDING REVETMENT WALLS AND RAILINGS should be a single element because it sits between double quotes, but since it also contains a comma it would be split there as well. I also don't want the quotes in the result.
My idea: look for the position of the first " and take the substring starting at that position + 1, then look for the second " in that substring and cut there.
Now I have two strings, one with everything before the first " and one with everything after the second ", plus the quoted value STRONACHLACHAR, PIER BUILDING AND PIER INCLUDING REVETMENT WALLS AND RAILINGS. Then I can remove the comma there using replace(",", "");, maybe saving its position so I can put it back once the string has been split on commas, but that aside. The next step is to concatenate everything again so I get:
50464,STRONACHLACHAR PIER BUILDING AND PIER INCLUDING REVETMENT WALLS AND RAILINGS,C,04/05/2006,STIRLING,BUCHANAN
which can be successfully split on commas, and I end up with an array of 6 elements I can work with.
In code it would look like this. I left out the part that puts the comma back.
String line = "50464,\"STRONACHLACHAR, PIER BUILDING AND PIER INCLUDING REVETMENT WALLS AND RAILINGS\",C,04/05/2006,STIRLING,BUCHANAN";
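String end2 = line; // fall back to the unchanged line when it contains no quotes
if(line.contains("\"")){
int pos = line.indexOf("\"");
String firstPart = line.substring(0, pos); // everything before the opening quote
String temp = line.substring(pos+1);
int pos2 = temp.indexOf("\"");
String secondPart = temp.substring(pos2+1);
String temp2 = temp.substring(0, pos2); // the quoted text, without the closing quote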
String temp3 = temp2.replace(",", "");
String end = firstPart.concat(temp3);
end2 = end.concat(secondPart);
}
String[] output = end2.split(",");
for(int i = 0; i < output.length; i++){
System.out.println(output[i] + " ");
}
But what I am wondering is whether this is good programming practice, or am I overcomplicating this? It's a 1500-line file and every line would have to be checked. Even then there might be other irregularities that need to be dealt with.
By the way, the purpose of this is that every line should end up as an array of 6 elements, no more, no less.
What kind of parameters/thinking should I keep in mind when processing a file with lines like this one?
Your data appears to be well-formed comma-separated value (CSV) lines. Instead of tying yourself in knots, I suggest reusing a library such as http://opencsv.sourceforge.net
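If pulling in a library is not an option, a quote-aware splitter is not much code. The following is a hand-rolled sketch, not the opencsv API; SimpleCsv and splitLine are names I made up. It assumes fields never span multiple lines and that a doubled quote inside a quoted field stands for a literal quote.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleCsv {
    // Splits one CSV line on commas, except commas inside double quotes.
    // The surrounding quotes are removed from the returned fields.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c); // commas in here are ordinary text
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing comma
        return fields;
    }
}
```

Because the comma inside the quotes is never removed, there is nothing to "put back" afterwards, and you can check fields.size() == 6 per line to catch irregular rows.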
I am trying to concatenate and parse at the same time. I am making an Excel-like program where I can type a1 = "Hello" + "World" and have cell A1 show HelloWorld. I just need to know how to parse the plus sign and join those two words. Please tell me if you need more code to understand this, such as the runner.
This is my parseInput class :
public class ParseInput {
private static String inputs;
static int col;
private static int row;
private static String operation;
private static Value field;
public static void parseInput(String input){
//splits the input at each regular expression match. \w is used for letters and \d && \D for integers
inputs = input;
Scanner tokens = new Scanner(inputs);
String none0 = tokens.next();
@SuppressWarnings("unused")
String none1 = tokens.next();
operation = tokens.nextLine().substring(1);
String[] holder = new String[2];
String regex = "(?<=[\\w&&\\D])(?=\\d)";
holder = none0.split(regex);
row = Integer.parseInt(holder[1]);
col = 0;
int counter = -1;
char temp = holder[0].charAt(0);
char check = 'a';
while(check <= temp){
if(check == temp){
col = counter +1;
}
counter++;
check = (char) (check + 1);
}
System.out.println(col);
System.out.println(row);
System.out.println(operation);
setField(Value.parseValue(operation));
Spreadsheet.changeCell(row, col, field);
}
public static Value getField() {
return field;
}
public static void setField(Value field) {
ParseInput.field = field;
}
}
This is actually a pretty complicated problem unless you can constrain input to a very small subset of what Excel accepts. If not then you'll probably want to look into something like ANTLR. However, assuming the above input then you'll want to do something like:
1. Split the string on the equal sign into s1 and s2.
2. Split s2 on the plus sign into s3 and s4.
3. Trim all the strings, remove the quotes around s3 and s4.
4. Concatenate s3 and s4 and assign to your datastore indexed by s1.
Depending on how complex your concatenation needs are you can either use string concatenation or a StringBuilder:
result = "" + s3 + s4; // string concatenation
result = new StringBuilder().append(s3).append(s4).toString(); // StringBuilder
Let me know if you have any questions about any of the steps detailed above.
Details on (1) above, assuming input is a1 = "Hello" + "World":
String[] strings = input.split("=");
String s1 = strings[0].trim(); // a1
String s2 = strings[1].trim(); // "Hello" + "World"
strings = s2.split("\\+"); // "+" is a regex metacharacter, so it must be escaped
String s3 = strings[0].trim().replaceAll("^\"", "").replaceAll("\"$", ""); // Hello
String s4 = strings[1].trim().replaceAll("^\"", "").replaceAll("\"$", ""); // World
String field = s3 + s4;
String colString = s1.replaceAll("[\\d]", ""); // a
String rowString = s1.replaceAll("[\\D]", ""); // 1
int col = colString.charAt(0) - 'a'; // 0
int row = Integer.parseInt(rowString);
Spreadsheet.changeCell(row, col, field);
I suggest you to implement your custom grammar using a parser generator like JavaCC.
Here you can find a simple tutorial.
I believe this is the better solution because in this way you can handle every expression you need.
Are you sure you want to use all the classes you are using? To parse something like "a=b+c+d.." (assuming you are not trying to validate the input), the easiest and possibly the most efficient way is to use the split method of java.lang.String.
Then join whatever is required using a StringBuilder.
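A rough sketch of that split-then-join idea, for any number of operands. ConcatParser and evaluate are hypothetical names, and the sketch assumes that no "=" or "+" appears inside a quoted literal (it does no validation beyond checking for the "=").

```java
class ConcatParser {
    // Evaluates the right-hand side of e.g.  a1 = "Hello" + "World"
    // by splitting on '+' and joining the unquoted operands.
    static String evaluate(String input) {
        String[] sides = input.split("=", 2); // at most two pieces: target, expression
        if (sides.length < 2) {
            throw new IllegalArgumentException("no '=' in: " + input);
        }
        StringBuilder result = new StringBuilder();
        for (String operand : sides[1].split("\\+")) { // '+' escaped: it is a regex metacharacter
            String text = operand.trim();
            if (text.length() >= 2 && text.startsWith("\"") && text.endsWith("\"")) {
                text = text.substring(1, text.length() - 1); // strip surrounding quotes
            }
            result.append(text);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("a1 = \"Hello\" + \"World\""));
    }
}
```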
You need to design and implement a parser and an evaluator. And before that, you need to design the language that your parser/evaluator is going to evaluate.
How to do it.
If your language is really simple, you can get away with parsing it by hand, using something like StringTokenizer to do the tokenization.
Otherwise, you are probably best off learning to use a Java "parser generator" such as JavaCC or ANTLR.
Either way, you need to do some background reading to understand all of the terminology. You could start with Wikipedia and/or the tutorial material from one of the parser generators. Alternatively, there are good textbooks on this topic.
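To give an idea of the hand-written route, here is a sketch of tokenizing with StringTokenizer. HandTokenizer is an illustrative name, and the "language" is assumed to consist only of cell names, "=", "+" and double-quoted literals. Passing true as the third constructor argument makes the tokenizer return the delimiters as tokens too, which is what lets the loop reassemble quoted text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class HandTokenizer {
    // Turns  a1 = "Hello World" + "x"  into  [a1, =, "Hello World", +, "x"].
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // '=', '+', '"' and space are delimiters; each one comes back as its own token.
        StringTokenizer st = new StringTokenizer(input, "=+\" ", true);
        StringBuilder literal = null; // non-null while inside a quoted literal
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (t.equals("\"")) {
                if (literal == null) {
                    literal = new StringBuilder(); // opening quote
                } else {
                    tokens.add("\"" + literal + "\""); // closing quote: emit literal
                    literal = null;
                }
            } else if (literal != null) {
                literal.append(t); // inside quotes everything is text, even '+'
            } else if (!t.equals(" ")) {
                tokens.add(t); // identifier or operator; spaces are skipped
            }
        }
        return tokens; // note: an unterminated literal is silently dropped
    }
}
```

An evaluator would then walk this token list; once the grammar grows beyond this, a parser generator is the less painful option.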
In addition to what Abdullah said, if you really want to save every ounce of memory you can, use a StringBuilder instead of String concatenation. I believe I read somewhere that repeated String concatenation creates a new String object for each concatenation, while a StringBuilder appends everything to a single buffer. It shouldn't matter too much here, though.
In my early days I wrote an equation evaluator in your style. It cost me a huge amount of code and complexity because I didn't know about expression trees. With expression trees you will be able to add more capabilities to your parser easily, in plain Java. You will find plenty of examples of using expression trees.
I need to trim a String in java so that:
The quick brown fox jumps over the lazy dog.
becomes
The quick brown...
In the example above, I'm trimming to 12 characters. If I just use substring I would get:
The quick br...
I already have a method for doing this using substring, but I wanted to know the fastest (most efficient) way to do this, because a page may have many trim operations.
The only way I can think of is to split the string on spaces and put it back together until its length passes the given length. Is there another way? Perhaps a more efficient one, where I can use the same method to do a "soft" trim that preserves the last word (as shown in the example above) and a hard trim, which is pretty much a substring.
Thanks,
Below is a method I use to trim long strings in my webapps.
The "soft" boolean as you put it, if set to true will preserve the last word.
This is the most concise way of doing it that I could come up with that uses a StringBuffer which is a lot more efficient than recreating a string which is immutable.
public static String trimString(String string, int length, boolean soft) {
if(string == null || string.trim().isEmpty()){
return string;
}
StringBuffer sb = new StringBuffer(string);
// -3 because we add 3 dots at the end. Returned string length has to be length including the dots.
int actualLength = length - 3;
if(sb.length() > actualLength){
if(!soft)
return escapeHtml(sb.insert(actualLength, "...").substring(0, actualLength+3));
else {
int endIndex = sb.indexOf(" ",actualLength);
// no space after the cut point: the cut falls inside the last word, so keep the whole text
if(endIndex < 0)
return escapeHtml(string);
return escapeHtml(sb.insert(endIndex,"...").substring(0, endIndex+3));
}
}
return string;
}
Update
I've changed the code so that the ... is appended in the StringBuffer, this is to prevent needless creations of String implicitly which is slow and wasteful.
Note: escapeHtml is a static import from apache commons:
import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;
You can remove it and the code should work the same.
Here is a simple, regex-based, 1-line solution:
str.replaceAll("(?<=.{12})\\b.*", "..."); // How easy was that!? :)
Explanation:
(?<=.{12}) is a positive lookbehind, which asserts that there are at least 12 characters to the left of the match; it is zero-width, so it consumes no characters itself
\b.* matches the first word boundary (after at least 12 characters - above) to the end
This is replaced with "..."
Here's a test:
public static void main(String[] args) {
String input = "The quick brown fox jumps over the lazy dog.";
String trimmed = input.replaceAll("(?<=.{12})\\b.*", "...");
System.out.println(trimmed);
}
Output:
The quick brown...
If performance is an issue, pre-compile the regex for an approximately 5x speed up (YMMV) by compiling it once:
static Pattern pattern = Pattern.compile("(?<=.{12})\\b.*");
and reusing it:
String trimmed = pattern.matcher(input).replaceAll("...");
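One caveat worth sketching: when the input ends in a word character, the pattern can match a second time, empty, at the very end of the string, and replaceAll then appends a second "...". Using replaceFirst on the precompiled pattern avoids that. RegexTrim and softTrim are illustrative names, and the 12 is hard-coded as in the answer above.

```java
import java.util.regex.Pattern;

class RegexTrim {
    // Compiled once and reused; matches from the first word boundary that has
    // at least 12 characters before it through the end of the line.
    private static final Pattern AFTER_12 = Pattern.compile("(?<=.{12})\\b.*");

    static String softTrim(String input) {
        // replaceFirst: only the first match is replaced, so a possible empty
        // second match at the end of the input cannot add another ellipsis.
        return AFTER_12.matcher(input).replaceFirst("...");
    }

    public static void main(String[] args) {
        System.out.println(softTrim("The quick brown fox jumps over the lazy dog."));
        System.out.println(softTrim("The quick brown fox jumps"));
    }
}
```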
Please try following code:
private String trim(String src, int size) {
if (src.length() <= size) return src;
int pos = src.lastIndexOf(" ", size - 3);
if (pos < 0) return src.substring(0, size);
return src.substring(0, pos) + "...";
}
Try searching for the last occurrence of a space near the cut position (index 11 for a 12-character limit), trim the string there, and append "...".
Your requirements aren't clear. If you have trouble articulating them in a natural language, it's no surprise that they'll be difficult to translate into a computer language like Java.
"preserve the last word" implies that the algorithm will know what a "word" is, so you'll have to tell it that first. The split is a way to do it. A scanner/parser with a grammar is another.
I'd worry about making it work before I concerned myself with efficiency. Make it work, measure it, then see what you can do about performance. Everything else is speculation without data.
How about:
mystring = mystring.replaceAll("^(.{12}.*?)\\b.*$", "$1...");
I use this hack. Suppose the trimmed string should be about 120 characters long:
String textToDisplay = textToTrim.substring(0, Math.min(120, textToTrim.length()));
if (textToDisplay.length() != textToTrim.length()) {
// finish the word that the cut landed in, then add the ellipsis
int nextSpace = textToTrim.indexOf(" ", textToDisplay.length());
if (nextSpace < 0) nextSpace = textToTrim.length(); // no later space: keep the rest
textToDisplay = textToDisplay + textToTrim.substring(textToDisplay.length(), nextSpace) + " ...";
}
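The same idea can be written as one small method that covers both the soft and the hard trim from the question, with no regex and no splitting. This is a sketch under my own naming (Ellipsis.trim); it assumes limit is not negative and, unlike some answers above, does not count the three dots towards the limit.

```java
class Ellipsis {
    // Hard trim: cut exactly at limit. Soft trim: extend the cut to the next
    // space at or after limit so the last word is kept whole.
    static String trim(String text, int limit, boolean soft) {
        if (text == null || text.length() <= limit) {
            return text; // nothing to cut
        }
        int cut = limit;
        if (soft) {
            int space = text.indexOf(' ', limit);
            if (space < 0) {
                return text; // limit falls inside the last word: keep everything
            }
            cut = space;
        }
        return text.substring(0, cut) + "...";
    }

    public static void main(String[] args) {
        String s = "The quick brown fox jumps over the lazy dog.";
        System.out.println(trim(s, 12, true));
        System.out.println(trim(s, 12, false));
    }
}
```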