Getting the next token and remaining String with StreamTokenizer - java

I have a StreamTokenizer that will tokenize a String. I am interested in a way to get the next token from a String, as well as the remaining String (without the token we just took).
public static void parseString(String s){
StreamTokenizer st = new StreamTokenizer(new StringReader(s));
try {
while (st.nextToken() != st.TT_EOF){
if (st.ttype == st.TT_WORD){
System.out.println("Word: " + st.sval);
if (st.sval.equals("start")){
start(st.sval, ???)
}
}
else if (st.ttype == st.TT_NUMBER){
System.out.println("Number: " + st.nval);
}
else if (st.ttype == '\''){
System.out.println(st.sval);
}
else{
System.out.println((char)st.ttype);
}
}
} catch (IOException e){}
}
public String start(String text, String theRest){
return "<start>" + text + "" + parseString(theRest) + "</start>";
}
Some things I've tried:
- I've tried just using the original String s, but StreamTokenizer doesn't alter a String after it tokenizes it (I forget the word to describe this).
- I could find the index of the current token, and slice that token out of the original string.
I was wondering if there was a more elegant way to go about this?

Regarding the first bullet point, the word you're thinking of is probably "immutable". You're correct: anything that appears to manipulate a String in fact creates a new String; the original is left intact.
For the second bullet point, frankly I would have suggested the same. At the moment I cannot think of a better way.
Here's a general example:
// indexOf finds the first occurrence, so a repeated token needs a start offset
int startIndex = s.indexOf(currentToken) + currentToken.length();
String remainingString = s.substring(startIndex);
If my string is "Hi my name is Paul", and the current token is "name", the result of remainingString should be " is Paul".
You could easily encapsulate that in a helper method somewhere to help keep things clean and separated.
Probably not the answer you're looking for, but hopefully that somewhat helps.
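The index-and-slice idea can be wrapped in a small helper. This is only a minimal sketch: the class name TokenSlicer and the method remainderAfter are hypothetical, and the fromIndex parameter is an assumption added so that a token occurring more than once can be searched for from where the previous token ended.

```java
final class TokenSlicer {
    /**
     * Returns everything in s after the first occurrence of token found at or
     * after fromIndex, or null if the token does not occur there.
     */
    static String remainderAfter(String s, String token, int fromIndex) {
        int at = s.indexOf(token, fromIndex);
        if (at < 0) {
            return null; // token not present from fromIndex onwards
        }
        return s.substring(at + token.length());
    }

    public static void main(String[] args) {
        // prints the remainder " is Paul" (with its leading space)
        System.out.println(remainderAfter("Hi my name is Paul", "name", 0));
    }
}
```

Keep in mind that a literal search like this fits plain word tokens best. For quoted strings StreamTokenizer stores the text without its quotes in sval, and numbers are parsed into nval, so the token value may not appear verbatim in the original String.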

Related

How do I print out information with CSV spreadsheet?

I'm currently working on a project with CSV. In the task, I am supposed to pass a country name in the tester method, and when I call the tester method, it should print the information for that country, for example "Germany Chemical 32000". However, no matter what country name I put in (I'm sure the country exists in the spreadsheet), it always prints "NOT FOUND", and I don't understand why. I'm guessing the problem is in the if statement of the countryInfo method, but I can't find it, probably due to a lack of domain knowledge, so I hope someone can explain or give me a hint.
public void tester(){
FileResource fr = new FileResource();
CSVParser parser = fr.getCSVParser();
String GermanyInfo = countryInfo(parser,"Peru");
System.out.println(GermanyInfo);
}
public String countryInfo(CSVParser parser, String country){
String countryInfo = " ";
for (CSVRecord record : parser){
String nation = record.get("Country");
if (nation.contains(country)){
String countryExport = record.get("Exports");
String exportValue = record.get("Value (dollars)");
countryInfo = country + ": " + countryExport + " " + exportValue;
}else{
countryInfo = "NOT FOUND";
}
}
return countryInfo;
}
Debug Process:
Hey guys, after more testing I found that the problem really is in the if statement. My for-each loop runs through the parser one row at a time. The if statement checks whether the Country column of that row contains the name I asked for, but once it finds a match the loop just keeps going because I haven't told it to stop. It finds Germany, then moves on to the following rows, and every non-matching row overwrites the result with "NOT FOUND", which is what gets returned at the end of the file. To fix this, I need either a return statement right after the line that builds countryInfo (instead of only at the end of my method), or a "break;" after that line, which ends the loop and goes on to the return statement at the bottom.
If you're sure that the country search in your loop works, just add the return statement in the right place. I would suggest changing your method like this:
public String countryInfo(CSVParser parser, String country) {
for (CSVRecord record : parser) {
String nation = record.get("Country");
if (nation.contains(country)) {
String countryExport = record.get("Exports");
String exportValue = record.get("Value (dollars)");
return country + ": " + countryExport + " " + exportValue;
}
}
return "NOT FOUND";
}
In this case, as soon as the country is found, the method returns the information about it. If no matching country is found, the method returns the String "NOT FOUND".
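The "break" alternative mentioned in the debugging note can be sketched without the CSV library. This is an illustration under stated assumptions: the class name CountryLookup is hypothetical, and each row is modelled as a plain String array {country, exports, value} standing in for a CSVRecord.

```java
import java.util.List;

final class CountryLookup {
    /** Each row is {country, exports, value}; a stand-in for a CSVRecord. */
    static String countryInfo(List<String[]> rows, String country) {
        String info = "NOT FOUND"; // default if no row matches
        for (String[] row : rows) {
            if (row[0].contains(country)) {
                info = country + ": " + row[1] + " " + row[2];
                break; // stop at the first match so later rows cannot overwrite it
            }
        }
        return info;
    }
}
```

Setting the default before the loop, rather than in an else branch, is what keeps non-matching rows from clobbering a result that was already found.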

Recursive command parser that solves a repeat statement

I am building a parser that recognizes simple commands such as "DOWN.", "UP." and "REP 3.". It must be able to parse the commands rather freely. It should be legal to write
"DOWN % asdf asdf asdf
."
Here % starts a comment and the full stop signifies end-of-command. The full stop can be on the next line.
This is all well and good so far; however, I'm struggling with the REP part (REP stands for repeat).
I should be able to issue a command as follows:
DOWN .DOWN. REP 3 " DOWN. DOWN.
DOWN . % hello this is a comment
REP 2 " DOWN. ""
This should give me 17 DOWNS. The semantics is as follows for repeat: REP x " commands " where x is the amount of times it shall repeat the commands listed inside the quotation marks. Note that REP can be nested inside of REP. The following code is for handling the DOWN command. The incoming text is read from System.in or a text file.
public void repeat(String workingString) {
if (workingString.matches(tokens)) {
if (workingString.matches("REP")) {
repada();
} else
if (workingString.matches("(DOWN).*")) {
String job = workingString.substring(4);
job = job.trim();
if (job.equals("")) {
String temp= sc.next();
temp= temp.trim();
// Word after DOWN.
if (temp.matches("\\.")) {
leo.down();
// If word after DOWN is a comment %
} else if (temp.matches("%.*")) {
boolean t = comment();
} else {
throw new SyntaxError();
}
} else if (job.matches("\\..*")) {
workingString += job;
System.out.println("Confirm DOWN with .");
}
} else if (workingString.matches("\\.")) {
instructions += workingString;
System.out.println("Fullstop");
} else if (workingString.matches("%.*")) {
comment();
} else {
// work = sc.next();
work = work.trim().toUpperCase();
System.out.println(work);
}
} else {
System.out.println("No such token: " + workingString);
}
}
I got a working start on the repeat function:
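public String repada(){
String times = sc.next();
times = times.trim(); // trim() returns a new String, so the result must be assigned
if (times.matches("%.*")) {
comment();
times = sc.next();
}
String quote = sc.next();
quote = quote.trim();
if(quote.matches("%.*")){
comment();
quote = sc.next();
}
String repeater = "";
System.out.println("REP " + times + " "+quote);
return repeater; // placeholder: the commands inside the quotes are not parsed yet
}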
However I'm thinking my whole system of doing things might need a rework. Any advice on how I could more easily solve this issue would be greatly appreciated!
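One common way to handle nested REP is to tokenize the whole input first and then parse it recursively. The following is a minimal sketch, not a drop-in replacement for the code above: the class RepParser and all of its method names are hypothetical, it assumes the grammar REP x " commands " with no full stop after the closing quote (as in the example), and it returns the expanded command list instead of driving leo directly.

```java
import java.util.ArrayList;
import java.util.List;

final class RepParser {
    private final List<String> tokens;
    private int pos = 0;

    RepParser(String input) {
        this.tokens = tokenize(input);
    }

    /** Splits input into words, numbers, "." and quote tokens; a % comment runs to end of line. */
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '%') {
                while (i < input.length() && input.charAt(i) != '\n') {
                    i++; // skip the comment
                }
            } else if (c == '.' || c == '"') {
                out.add(String.valueOf(c));
                i++;
            } else {
                int start = i;
                while (i < input.length() && !Character.isWhitespace(input.charAt(i))
                        && ".\"%".indexOf(input.charAt(i)) < 0) {
                    i++;
                }
                out.add(input.substring(start, i).toUpperCase());
            }
        }
        return out;
    }

    /** Parses commands until end of input, or until the closing quote when inside a REP body. */
    List<String> parseCommands(boolean insideRep) {
        List<String> result = new ArrayList<>();
        while (pos < tokens.size()) {
            String t = tokens.get(pos++);
            if (t.equals("\"")) {
                if (!insideRep) throw new IllegalArgumentException("Unexpected quote");
                return result; // closing quote ends this REP body
            } else if (t.equals("REP")) {
                int times = Integer.parseInt(next());
                expect("\"");
                List<String> body = parseCommands(true); // recursion handles nested REP
                for (int k = 0; k < times; k++) result.addAll(body);
            } else if (t.equals("DOWN") || t.equals("UP")) {
                expect(".");
                result.add(t);
            } else {
                throw new IllegalArgumentException("No such token: " + t);
            }
        }
        if (insideRep) throw new IllegalArgumentException("Missing closing quote");
        return result;
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String wanted) {
        String t = next();
        if (!t.equals(wanted)) throw new IllegalArgumentException("Expected " + wanted + " but got " + t);
    }

    public static void main(String[] args) {
        String program = "DOWN .DOWN. REP 3 \" DOWN. DOWN.\n"
                + "DOWN . % hello this is a comment\n"
                + "REP 2 \" DOWN. \"\"";
        System.out.println(new RepParser(program).parseCommands(false).size()); // prints 17
    }
}
```

Because comments and whitespace are dropped by the tokenizer, the parser never has to check for % after every word, and the quote is unambiguous: an opening quote only ever follows "REP x", so any other quote must close the current body.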

In java trying to extract XMLNS using a Regexpression

I have been trying for a few hours to get this right, and I really can't seem to do it...
Given a string
"xmlns:oai-identifier=\"http://www.openarchives.org/OAI/2.0/oai-identifier\""
what is the correct expression to "save" the http://www.openarchives.org/OAI/2.0/oai-identifier bit?
Thanks in advance, really having trouble getting this right.
String validXML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><feed "
+ "xmlns:oai-identifier=\"http://www.openarchives.org/OAI/2.0/oai-identifier\" "
+ "xmlns:mingo-identifier=\"http://www.google.com\" "
+ "xmlns:abeve-identifier=\"http://www.news.ycombinator.org/OAI/2.0/oai-identifier\">"
+ "</feed>";
Pattern p = Pattern.compile(".*\\\"(.*)\\\".*");
Matcher m = p.matcher(validXML);
System.out.println(m.group(1));
This is not printing anything. Be aware that this attempt was just to get the string inside the quotes; I was going to worry about the other part once I got that working... Too bad I never got that working. Thanks
Regular Expressions are so expensive - don't use them when you don't need to!! There are a million other ways to parse a string.
String validXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><feed "
+ "xmlns:oai-identifier=\"http://www.openarchives.org/OAI/2.0/oai-identifier\" "
+ "xmlns:mingo-identifier=\"http://www.google.com\" "
+ "xmlns:abeve-identifier=\"http://www.news.ycombinator.org/OAI/2.0/oai-identifier\">"
+ "</feed>";
String start = "xmlns:oai-identifier=\"";
String end = "\""; // closing quote of the attribute value
int location = validXml.indexOf(start);
String result;
if (location >= 0) { // indexOf returns -1 when absent; 0 is a valid position
result = validXml.substring(location + start.length());
int endIndex = result.indexOf(end);
if (endIndex >= 0) {
result = result.substring(0, endIndex);
}
else {
throw new Exception("Could not find end!");
}
}
else {
throw new Exception("Could not find start!");
}
System.out.println(result);
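I think there are two problems. First, m.group(1) is called without first calling m.find() or m.matches(), so no match has been attempted yet and group() throws an IllegalStateException. Second, the first .* in your regular expression is greedy: it consumes as much as it can, so the group ends up capturing the last quoted value in the string instead of the one you want.
Try changing ".*\\\"(.*)\\\".*" to "xmlns:oai-identifier=\"([^\"]*)\"" and call m.find() before m.group(1).
The quotes only need escaping for the Java string literal (\"). A double quote is not special in a regular expression, so the extra backslashes in \\\" are unnecessary, though harmless.
Note also that a single find() call matches only one namespace declaration, not each one in the validXML variable in your example. To collect an arbitrary number of xmlns: attributes, use a more general pattern such as "xmlns:[^=]+=\"([^\"]*)\"" and call find() in a loop.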
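As a minimal sketch of the regular-expression route, the snippet below collects every xmlns:prefix="uri" pair by calling find() in a loop. The class name XmlnsRegex and the method namespaces are hypothetical, and the pattern assumes double-quoted values with no whitespace around the equals sign, so it is not a substitute for a real XML parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlnsRegex {
    // Group 1 is the prefix, group 2 the URI. [^\"]* cannot run past the closing quote,
    // which avoids the greedy .* problem.
    private static final Pattern XMLNS = Pattern.compile("xmlns:([^=\\s]+)=\"([^\"]*)\"");

    /** Returns prefix -> URI for every xmlns:prefix="uri" found, in document order. */
    static Map<String, String> namespaces(String xml) {
        Map<String, String> found = new LinkedHashMap<>();
        Matcher m = XMLNS.matcher(xml);
        while (m.find()) { // find() must succeed before group() may be called
            found.put(m.group(1), m.group(2));
        }
        return found;
    }
}
```

With the validXML string from the question, namespaces(validXML).get("oai-identifier") would give the URI that the question asks for.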
Since you are reading XML, you might be using DOM, so you can extract the namespace from the prefix name using lookupNamespaceURI() once you parse the document with the setNamespaceAware() option set to true:
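import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(validXML)));
String namespace = doc.lookupNamespaceURI("oai-identifier");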
It's simpler and you don't have to do any string parsing.

removal of repeated string

I have a string something like
JNDI Locations eis/FileAdapter,eis/FileAdapter used by composite
HelloWorld1.0.jar are not available in the
destination domain.
Here eis/FileAdapter occurs twice (eis/FileAdapter,eis/FileAdapter).
I want it to be formatted as
JNDI Locations eis/FileAdapter used by composite
HelloWorld1.0.jar are not available in the
destination domain.
I tried the following:
String[] missingAdapters =((textMissingAdapterList.item(0)).getNodeValue().trim().split(","));
missingAdapters[0]
but then I am missing the second part. Is there a better way to handle this?
In your comment below the question you confirm that the duplicates will always be connected via a comma. Using this information, this should work (for most cases):
String replaceCustomDuplicates(String str) {
if (str.indexOf(",") < 0) {
return str; // nothing to do
}
StringBuilder result = new StringBuilder(str.length());
for (String token : str.split(" ", -1)) {
if (token.indexOf(",") > 0) {
String[] parts = token.split(",");
if (parts.length == 2 && parts[0].equals(parts[1])) {
token = parts[0];
}
}
result.append(token + " ");
}
return result.delete(result.length() - 1, result.length()).toString();
}
A little demo with your example:
String str = "JNDI Locations eis/FileAdapter,eis/FileAdapter used by composite";
System.out.println(str);
str = replaceCustomDuplicates(str);
System.out.println(str);
That should do it:
String[] missingAdapters = ((textMissingAdapterList.item(0)).getNodeValue().trim().split(","));
String result = missingAdapters[0] + " " + missingAdapters[1].split(" ", 2)[1];
This assumes that the duplicated string you want to drop contains no space.
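A different technique from the two answers above is a regular expression with a backreference, which collapses a comma-joined run of identical tokens in one call. This is a sketch under assumptions: the class name DuplicateCollapser is hypothetical, and it assumes the duplicated items contain no whitespace or commas themselves.

```java
final class DuplicateCollapser {
    /**
     * Collapses "x,x" (or "x,x,x", ...) into "x" when the run is delimited by
     * whitespace, commas or the ends of the string. Items that differ are left alone.
     */
    static String collapse(String text) {
        // ([^\s,]+)   one item without spaces or commas
        // (?:,\1)+    followed by one or more ",<same item>"
        // the lookbehind and lookahead make sure whole items are compared, not prefixes
        return text.replaceAll("(?<![^\\s,])([^\\s,]+)(?:,\\1)+(?![^\\s,])", "$1");
    }

    public static void main(String[] args) {
        // prints: JNDI Locations eis/FileAdapter used by composite
        System.out.println(collapse("JNDI Locations eis/FileAdapter,eis/FileAdapter used by composite"));
    }
}
```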

using tokenizer to read a line

public void GrabData() throws IOException
{
try {
BufferedReader br = new BufferedReader(new FileReader("data/500.txt"));
String line = "";
int lineCounter = 0;
int TokenCounter = 1;
arrayList = new ArrayList < String > ();
while ((line = br.readLine()) != null) {
//lineCounter++;
StringTokenizer tk = new StringTokenizer(line, ",");
System.out.println(line);
while (tk.hasMoreTokens()) {
arrayList.add(tk.nextToken());
System.out.println("check");
TokenCounter++;
if (TokenCounter > 12) {
er = new DataRecord(arrayList);
DR.add(er);
arrayList.clear();
System.out.println("check2");
TokenCounter = 1;
}
}
}
} catch (FileNotFoundException ex) {
Logger.getLogger(Driver.class.getName()).log(Level.SEVERE, null, ex);
}
}
Hello, I am using a tokenizer to read the contents of a line and store them in an ArrayList. The GrabData method above does that job.
The only problem is that the company name (the third column in every line) is in quotes and has a comma in it. I have included one line as an example. The tokenizer depends on the comma to separate the line into tokens, but the company name throws it off, I guess. If it weren't for the comma in the company column, everything would work as normal.
Example:
Essie,Vaill,"Litronic , Industries",14225 Hancock Dr,Anchorage,Anchorage,AK,99515,907-345-0962,907-345-1215,essie#vaill.com,http://www.essievaill.com
Any ideas?
First of all, StringTokenizer is considered legacy code. From the Javadoc:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
With the split() method you get an array of strings. While iterating through the array, check whether the current string starts with a quote; if it does, check whether the next one ends with a quote. If both conditions hold, you know the split happened inside a quoted field, so you can merge the two pieces (restoring the comma between them), process the result as you like, and continue iterating normally. In that pass you advance with i += 2 instead of the regular i++, so the second piece is not processed again.
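The split-then-merge approach can be sketched as follows. The class name CsvMerge and the method splitLine are hypothetical; as a small generalization, the loop keeps merging until it finds the closing quote, so a quoted field with more than one comma is also handled. Escaped quotes inside a field are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvMerge {
    /** Splits on commas, then re-joins pieces that belong to one quoted field. */
    static List<String> splitLine(String line) {
        String[] parts = line.split(",", -1); // -1 keeps trailing empty fields
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            String field = parts[i];
            if (field.startsWith("\"")) {
                // The split cut through a quoted field: append pieces until it closes.
                while (!isClosed(field) && i + 1 < parts.length) {
                    field += "," + parts[++i]; // put the removed comma back
                }
                if (isClosed(field)) {
                    field = field.substring(1, field.length() - 1); // drop the quotes
                }
            }
            fields.add(field);
        }
        return fields;
    }

    private static boolean isClosed(String field) {
        return field.length() > 1 && field.endsWith("\"");
    }
}
```

For the example line, this yields 12 fields, with "Litronic , Industries" (without the quotes) as the third one.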
You can accomplish this using Regular Expressions. The following code:
String s = "asd,asdasd,asd, \"asdasdasd,asdasdasd\", asdasd, asd";
System.out.println(s);
s = s.replaceAll("(?<=\")([^\"]+?),([^\"]+?)(?=\")", "$1 $2");
s = s.replaceAll("\"", "");
System.out.println(s);
yields
asd,asdasd,asd, "asdasdasd,asdasdasd", asdasd, asd
asd,asdasd,asd, asdasdasd asdasdasd, asdasd, asd
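which, from my understanding, is the preprocessing your tokenizer code requires. Note that the pattern replaces only one comma per quoted field, and on a line with several quoted fields it can also match the text between two of them, so check it against your real data. Hope this helps.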
While StringTokenizer might not natively handle this for you, a couple lines of code will do it... probably not the most efficient, but should get the idea across...
while(tk.hasMoreTokens()) {
String token = tk.nextToken();
/* If the item is encapsulated in quotes, loop through all tokens to
* find closing quote
*/
if( token.startsWith("\"") ){
while( tk.hasMoreTokens() && ! token.endsWith("\"") ) {
// append our token with the next one. Don't forget to retain commas!
token += "," + tk.nextToken();
}
if( !token.endsWith("\"") ) {
// open quote found but no close quote. Error out.
throw new BadFormatException("Incomplete string:" + token);
}
// remove leading and trailing quotes
token = token.substring(1, token.length()-1);
}
}
As you can see in the class description, Oracle discourages the use of StringTokenizer.
Instead of a tokenizer I would use the String split() method,
which accepts a regular expression as its argument and can significantly reduce your code.
String str = "Essie,Vaill,\"Litronic , Industries\",14225 Hancock Dr,Anchorage,Anchorage,AK,99515,907-345-0962,907-345-1215,essie#vaill.com,http://www.essievaill.com";
String[] strs = str.split("(?<! ),(?! )");
List<String> list = new ArrayList<String>(strs.length);
for(int i = 0; i < strs.length; i++) list.add(strs[i]);
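Just pay attention to your regex: this one splits only on commas that have no space on either side, so it assumes that a comma inside a quoted field always has a space next to it and that the separating commas never do.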
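If the space assumption is too fragile, a commonly used alternative is to split only on commas that are followed by an even number of double quotes, which means the comma lies outside any quoted field. This is a minimal sketch: the class name QuoteAwareSplit is hypothetical, and it assumes quotes are balanced and never escaped inside a field.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteAwareSplit {
    // A comma followed by an even number of quotes up to the end of the line
    // is outside any quoted field.
    private static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        for (String field : line.split(COMMA_OUTSIDE_QUOTES, -1)) {
            // Strip the surrounding quotes of a quoted field.
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }
}
```

The lookahead rescans the rest of the line for every comma, so for very large files a dedicated CSV library is likely the better choice.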
