how do you get String Tokenizer to ignore text?


I have this code:
public void readTroops() {
File file = new File("resources/objects/troops.txt");
StringBuffer contents = new StringBuffer();
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader(file));
String text = null;
// repeat until all lines is read
while ((text = reader.readLine()) != null) {
StringTokenizer troops = new StringTokenizer(text,"=");
String list = troops.nextToken();
String value = troops.nextToken();
}
and this file:
//this is a comment part of the text file//
Total=1
The problem is that (1) I can't get it to ignore everything between the // markers, and (2) I can't get it to read the file when there is a blank line ('ENTER') between the entries. For example, this text works:
Total=1
So my question is: what do I put in the delimiter argument, i.e.
StringTokenizer troops = new StringTokenizer(text,"=","WHAT GOES HERE?");
So how can I get the tokenizer to ignore 'ENTER'/new lines and anything between the // markers, or something similar? Thanks.
P.S. I don't mind if you use String.split to answer my question.

Use the method countTokens to skip lines that don't have two tokens:
while ((text = reader.readLine()) != null) {
StringTokenizer troops = new StringTokenizer(text,"=");
if(troops.countTokens() == 2){
String list = troops.nextToken();
String value = troops.nextToken();
....
}else {
//ignore this line
}
}
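Since the question allows String.split, here is a minimal sketch of the same idea without StringTokenizer. The class name TroopLineParser and the rule "skip blank lines and lines starting with //" are my own assumptions about the file format, not something from the original code:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects key=value pairs, skipping blanks and // comments.
class TroopLineParser {
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Ignore empty lines ("ENTER") and comment lines
            if (line.isEmpty() || line.startsWith("//")) {
                continue;
            }
            // Split on the first '=' only, so values may contain '='
            String[] parts = line.split("=", 2);
            if (parts.length == 2) {
                values.put(parts[0].trim(), parts[1].trim());
            }
            // Lines without '=' are silently ignored
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "//this is a comment part of the text file//", "", "Total=1");
        System.out.println(parse(lines)); // prints {Total=1}
    }
}
```

In the reader loop you would pass each line from reader.readLine() through the same checks instead of building a list first.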

Properties prop = new Properties();
prop.load(new FileInputStream("properties_file.txt"));
assertEquals("1", prop.getProperty("Total"));
P.S. You may want to keep a reference to the input stream so that you can close it.

Thinking outside the box, maybe you can use Properties instead of a tokenizer (if you change your comments to start with #)?
Properties troops = new Properties();
InputStream inputStream = SomeClass.class.getResourceAsStream("troops.properties");
try {
troops.load(inputStream);
} catch (IOException e) {
// Handle error
} finally {
// Close inputStream in a safe manner
}
troops.getProperty("Total"); // Returns "1"
Or if you are using Java 7:
Properties troops = new Properties();
try (InputStream inputStream = SomeClass.class.getResourceAsStream("troops.properties")) {
troops.load(inputStream);
} catch (IOException e) {
// Handle error
}
troops.getProperty("Total"); // Returns "1"
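To see that Properties really does skip # comments and blank lines, here is a small self-contained sketch that loads from a string instead of a resource. The class name TroopProperties and the sample text are made up for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class: loads key=value text the way a .properties file would be read.
class TroopProperties {
    static Properties load(String text) {
        Properties troops = new Properties();
        try (StringReader reader = new StringReader(text)) {
            // Lines starting with '#' (or '!') and blank lines are ignored by load()
            troops.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return troops;
    }

    public static void main(String[] args) {
        Properties troops = load("# this is a comment\n\nTotal=1\n");
        System.out.println(troops.getProperty("Total")); // prints 1
    }
}
```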

If you are reading from a file, a better way would be to use a StreamTokenizer, which lets you define your own tokenizer syntax. I used this approach to build an HTML rendering engine. It parses directly from a reader and also provides useful functions for identifying numbers, which it seems you may need.
(I will post an example once Eclipse loads!)
public static String render(String file, HashMap vars){
// Create a stringbuffer to rebuild the string
StringBuffer renderedFile = new StringBuffer();
try{
FileReader in = new FileReader(file);
BufferedReader reader = new BufferedReader(in); // create your reader
StreamTokenizer tok;
tok = new StreamTokenizer(reader); //the tokenizer then takes in the reader as a builder
tok.resetSyntax();
tok.wordChars(0, 255); // treat every char (including spaces) as part of a word
/*
* quoteChar sets a delimiter: text between two '$' characters, e.g. $hello$,
* is returned as its own token (the text ends up in sval), so it can be
* recognised and then replaced or skipped
*/
tok.quoteChar('$');
while(tok.nextToken()!=StreamTokenizer.TT_EOF){ //while it is not at the end of file
String s = tok.sval;
if (vars.containsKey(s))
s =(String)vars.get(s);
renderedFile.append(s);
}
}
catch(Exception e){System.out.println("Error Loading Template");}
return renderedFile.toString();
}
Check this out for a good tutorial http://tutorials.jenkov.com/java-io/streamtokenizer.html
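Applied to the troops file from the question, a StreamTokenizer sketch could look like the following. This is only an illustration: the class name TroopStreamParser is hypothetical, and it assumes every value is a number and every comment is a // comment running to the end of the line:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads name=number pairs, skipping // comments and blank lines.
class TroopStreamParser {
    static Map<String, Double> parse(Reader reader) {
        StreamTokenizer tok = new StreamTokenizer(reader);
        tok.slashSlashComments(true); // skip everything from // to end of line
        Map<String, Double> values = new LinkedHashMap<>();
        String key = null;
        try {
            // Whitespace (including newlines) is skipped by the default syntax
            while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                if (tok.ttype == StreamTokenizer.TT_WORD) {
                    key = tok.sval;            // e.g. "Total"
                } else if (tok.ttype == StreamTokenizer.TT_NUMBER && key != null) {
                    values.put(key, tok.nval); // numbers are parsed as double
                    key = null;
                }
                // '=' arrives as an ordinary character token and is ignored here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "//this is a comment part of the text file//\nTotal=1\n\nArchers=5\n";
        System.out.println(parse(new StringReader(text))); // prints {Total=1.0, Archers=5.0}
    }
}
```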

Related

How to split single text file into multiple with character as delimiter

I have a text document that has multiple separate entries all compiled into one .log file.
The format of the file looks something like this.
$#UserID#$
Date
User
UserInfo
SteamFriendID
=========================
<p>Message</p>
$#UserID#$
Date
User
UserInfo
SteamFriendID
========================
<p>Message</p>
$#UserID#$
Date
User
UserInfo
SteamFriendID
========================
<p>Message</p>
I'm trying to take everything in between the instances of "$#UserID#$", and print them into separate text files.
So far, with the looking that I've done, I tried implementing it using StringBuilder in something like this.
FileReader fr = new FileReader("Path to raw file.");
int idCount = 1;
FileWriter fw = new FileWriter("Path to parsed files" + idCount);
BufferedReader br = new BufferedReader(fr);
//String line, date, user, userInfo, steamID;
StringBuilder sb = new StringBuilder();
//br.readLine();
while ((line = br.readLine()) != null) {
if(line.substring(0,1).contains("$#")) {
if (sb.length() != 0) {
File file = new File("Path to parsed logs" + idCount);
PrintWriter pw = new PrintWriter(file, "UTF-8");
pw.println(sb.toString());
pw.close();
//System.out.println(sb.toString());
Sb.delete(0, sb.length());
idCount++;
}
continue;
}
sb.append(line + "\r\n");
}
But this only gives me the first 2 of the entries in separate parsed files, leaving the 3rd one out for some reason.
The other way I was thinking about doing it was reading in all the lines using .readAllLines(), store the list as an array, loop through the lines to find "$#", get that line's index & then recursively write the lines starting at the index given.
Does anyone know of a better way to do this, or would be willing to explain to me why I'm only getting two of the three entries parsed?

Short / quick fix is to write the contents of the StringBuilder once after your while loop like this:
public static void main(String[] args) {
try {
int idCount = 1;
FileReader fr = new FileReader("<path to desired file>");
BufferedReader br = new BufferedReader(fr);
//String line, date, user, userInfo, steamID;
StringBuilder sb = new StringBuilder();
//br.readLine();
String line = "";
while ((line = br.readLine()) != null) {
if(line.startsWith("$#")) {
if (sb.length() != 0) {
writeFile(sb.toString(), idCount);
System.out.println(sb);
sb.setLength(0);
idCount++;
}
continue;
}
sb.append(line + "\r\n");
}
if (sb.length() != 0) {
writeFile(sb.toString(), idCount);
System.out.println(sb);
idCount++;
}
} catch (IOException e) {
e.printStackTrace();
}
}
private static void writeFile(String content, int id) throws IOException
{
File file = new File("<path to desired dir>\\ID_" + id + ".txt");
file.createNewFile();
PrintWriter pw = new PrintWriter(file, "UTF-8");
pw.println(content);
pw.close();
}
I've changed two additional things:
The condition "line.substring(0,1).contains("$#")" did not work properly: the substring call returns only one character, but it is compared against two characters, so it is never true. I changed that to use the 'startsWith' method.
After the content of the StringBuilder is written to file, you did not reset or empty it, which results in the second and third files also containing every previous block (the third file then equals the input). That is now done with "sb.setLength(0);".
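If it helps, the splitting logic can be separated from the file writing so the "flush the last block after the loop" step is easy to see. This is just a sketch; LogSplitter, splitEntries and flush are names I made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: groups lines into entries delimited by lines starting with "$#".
class LogSplitter {
    static List<String> splitEntries(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith("$#")) {
                flush(current, entries); // a new header closes the previous entry
                continue;
            }
            current.append(line).append(System.lineSeparator());
        }
        flush(current, entries); // the last entry has no header after it
        return entries;
    }

    private static void flush(StringBuilder current, List<String> entries) {
        if (current.length() > 0) {
            entries.add(current.toString());
            current.setLength(0); // reset so entries do not accumulate
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "$#UserID#$", "Date", "$#UserID#$", "User", "$#UserID#$", "<p>Message</p>");
        System.out.println(splitEntries(lines).size()); // prints 3
    }
}
```

Each returned string can then be handed to something like the writeFile method above.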

Combined Xml String Split Java

I am trying to split a combined text file. The combined text file has multiple xml files inside. I want to split on <?xml version='1.0'?> which is the start of every new xml inside the combined text file. Not sure what is the best way to do this. Currently this is what I have which does not split correctly.
Updated, working code (fixed the quotes-within-quotes problem and added Pattern.quote):
Scanner scanner = new Scanner( new File("src/main/resources/Flume_Sample"), "UTF-8" );
String combinedText = scanner.useDelimiter("\\A").next();
scanner.close(); // Put this call in a finally block
String delimiter = "<?xml version=\"1.0\"?>";
String[] xmlFiles = combinedText.split("(?="+Pattern.quote(delimiter)+")");
for (int i = 0; i < xmlFiles.length; i++){
File file = new File("src/main/resources/output_"+i);
FileWriter writer = new FileWriter(file);
writer.write(xmlFiles[i]);
System.out.println(xmlFiles[i]);
writer.close();
}

The split method takes a regular expression string, so you may want to escape your delimiter String to make it a valid regex:
String[] xmlFiles = combinedText.split(Pattern.quote(delimiter));
See the Pattern.quote method.

Also be aware that this approach loads the entire input file into memory.
A streamed approach would perform better if the input file is large.
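For the in-memory variant, here is a short sketch of why the lookahead form "(?=...)" combined with Pattern.quote keeps the XML declaration at the start of each piece. The class name XmlSplitter is hypothetical, and empty pieces are filtered out as a precaution:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: splits combined text before each occurrence of the delimiter.
class XmlSplitter {
    static List<String> split(String combined, String delimiter) {
        // Pattern.quote escapes '?' and '.', the lookahead keeps the delimiter in the output
        String[] pieces = combined.split("(?=" + Pattern.quote(delimiter) + ")");
        List<String> documents = new ArrayList<>();
        for (String piece : pieces) {
            if (!piece.trim().isEmpty()) {
                documents.add(piece);
            }
        }
        return documents;
    }

    public static void main(String[] args) {
        String delimiter = "<?xml version=\"1.0\"?>";
        String combined = delimiter + "<a/>" + delimiter + "<b/>";
        for (String doc : split(combined, delimiter)) {
            System.out.println(doc);
        }
    }
}
```

Splitting on Pattern.quote(delimiter) alone also works, but then the declaration is consumed and has to be added back to each piece.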

I would use something like this if you want to parse the data manually.
public static void parseFile(File file) throws AttributeException, LineException{
BufferedReader br = null;
String s = "";
int counter = 0;
if(file != null){
try{
br = new BufferedReader(new FileReader(file));
while((s = br.readLine()) != null){
if(s.contains("<?xml version='1.0'?>")){
// Start writing to a new file here, using a StringBuffer and a FileWriter.
}
}
br.close();
}catch (IOException e){
System.out.println(e);
}
}
}

How to replace a line with a new line using Java

Using a BufferedReader, I parse a file. If a line starting with Oranges: is found, I want to replace it with ApplesAndOranges.
try (BufferedReader br = new BufferedReader(new FileReader(resourcesFilePath))) {
String line;
while ((line = br.readLine()) != null) {
if (line.startsWith("Oranges:")){
int startIndex = line.indexOf(":");
line = line.substring(startIndex + 2);
String updatedLine = "ApplesAndOranges";
updateLine(line, updatedLine);
I call a method updateLine and I pass my original line as well as the updated line value.
private static void updateLine(String toUpdate, String updated) throws IOException {
BufferedReader file = new BufferedReader(new FileReader(resourcesFilePath));
PrintWriter writer = new PrintWriter(new File(resourcesFilePath+".out"), "UTF-8");
String line;
while ((line = file.readLine()) != null)
{
line = line.replace(toUpdate, updated);
writer.println(line);
}
file.close();
if (writer.checkError())
throw new IOException("Can't Write To File"+ resourcesFilePath);
writer.close();
}
To get the file to update, I have to save it under a different name (resourcesFilePath+".out"). If I use the original file name, the saved version becomes blank.
So here is my question: how can I replace a line with any value in the original file without losing any data?

For this you can use regular expressions (RegExp) like this:
str = str.replaceAll("^Orange:(.*)", "OrangeAndApples:$1");
It's an example and maybe not exactly what you want. In the first parameter, the expression in parentheses is called a capturing group. The matched text is replaced by the second parameter, and $1 is replaced by the value of the capturing group. In our example, Orange:Hello at the beginning of a line is replaced by OrangeAndApples:Hello.
In your code, it seems you create one file per line; inlining the sub-method would probably be better.
try (
BufferedReader br = new BufferedReader(new FileReader(resourcesFilePath));
BufferedWriter writer = Files.newBufferedWriter(outputFilePath, charset);
) {
String line;
while ((line = br.readLine()) != null) {
String repl = line.replaceAll("Orange:(.*)","OrangeAndApples:$1");
writer.write(repl);
writer.newLine(); // BufferedWriter has no writeln method
}
}

The easiest way to overwrite everything in your original file would be to read in everything, changing whatever you want to change, and close the stream. Afterwards, open the file again and overwrite it and all its lines with the data you want.
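A minimal sketch of that read-everything-then-overwrite approach, using java.nio.file.Files (Java 7+). The class and method names are made up, and it assumes the file is UTF-8 and small enough to hold in memory:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rewrites a file in place, changing the prefix of matching lines.
class LineReplacer {
    static void replacePrefix(Path file, String oldPrefix, String newPrefix) {
        try {
            // readAllLines reads the whole file and closes it before we write
            List<String> updated = new ArrayList<>();
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (line.startsWith(oldPrefix)) {
                    updated.add(newPrefix + line.substring(oldPrefix.length()));
                } else {
                    updated.add(line);
                }
            }
            // Files.write truncates the existing file by default and writes every line
            Files.write(file, updated, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling LineReplacer.replacePrefix(path, "Oranges:", "ApplesAndOranges:") would then update the file without needing a second ".out" file. The original version went blank most likely because the file was opened for writing (which truncates it) while it was still being read.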

You can use RandomAccessFile to write to the file, and nio.Files to read the bytes from it. In this case, I put it as a string.
You can also read the file with RandomAccessFile, but it is easier to do it this way, in my opinion.
import java.io.RandomAccessFile;
import java.io.File;
import java.io.IOException;
import java.nio.file.*;
public void replace(File file){
try {
RandomAccessFile raf = new RandomAccessFile(file, "rw");
Path p = Paths.get(file.toURI());
String line = new String(Files.readAllBytes(p));
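if(line.startsWith("Oranges:")){
// Strings are immutable, so keep the result of replaceAll
line = line.replaceAll("Oranges:", "ApplesandOranges:");
// writeUTF would prepend a 2-byte length; truncate and write the raw bytes instead
raf.setLength(0);
raf.write(line.getBytes());
}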
raf.close();
} catch (IOException e) {
e.printStackTrace();
}
}

Parsing in Java with C style?

I am new to java text parsing and I'm wondering what is the best way to parse a file when the format of each line is known.
I have a file that has the following format for each line:
Int;String,double;String,double;String,double;String,double;String,double
Note how the String,double act as a pair separated by a comma and each pair is separated by a semicolon.
A few examples:
1;art,0.1;computer,0.5;programming,0.6;java,0.7;unix,0.3
2;291,0.8;database,0.6;computer,0.2;java,0.9;undegraduate,0.7
3;coffee,0.5;colombia,0.2;java,0.1;export,0.4;import,0.5
I'm using the following code to read each line:
public static void main(String args[]) {
try {
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream("textfile.txt");
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println(strLine);
}
// Close the input stream
in.close();
} catch (Exception e) {// Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
Thanks in advance :)

You could use the Scanner class, for starters:
A simple text scanner which can parse primitive types and strings using regular expressions.
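For the line format in the question, a Scanner-based sketch might look like this. The class name KeywordLine and its fields are my own invention, and the locale is pinned to Locale.US on the assumption that the doubles always use a dot as the decimal separator:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

// Hypothetical holder for one parsed line: an id plus (word, weight) pairs.
class KeywordLine {
    final int id;
    final Map<String, Double> weights = new LinkedHashMap<>();

    KeywordLine(int id) {
        this.id = id;
    }

    static KeywordLine parse(String line) {
        try (Scanner s = new Scanner(line)) {
            s.useDelimiter("[;,]");  // both ';' and ',' separate tokens
            s.useLocale(Locale.US);  // make nextDouble() expect '.' as decimal separator
            KeywordLine result = new KeywordLine(s.nextInt());
            while (s.hasNext()) {
                String word = s.next();         // e.g. "art" (or "291", read as a string)
                double weight = s.nextDouble(); // e.g. 0.1
                result.weights.put(word, weight);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        KeywordLine k = parse("1;art,0.1;computer,0.5;programming,0.6;java,0.7;unix,0.3");
        System.out.println(k.id + " " + k.weights);
    }
}
```

A malformed line makes nextInt() or nextDouble() throw an exception, so real code would need to decide how to handle that.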

If you are truly trying to do "C" style parsing, where is the buffer which contains the characters being accumulated for the "next" field? Where is the check that sees if the field separator was read, and where is the code that flushes the current field into the correct data structure once the end of line / field separator is read?
A character by character read loop in Java looks like
int readChar = 0;
while ((readChar = in.read()) != -1) {
// do something with the new readChar.
}

You can provide a pattern and use the Scanner
String input = "fish1-1 fish2-2";
java.util.Scanner s = new java.util.Scanner(input);
s.findInLine("(\\d+)");
java.util.regex.MatchResult result = s.match();
for (int i=1; i<=result.groupCount(); i++)
System.out.println(result.group(i));
s.close();

Read XML, Replace Text and Write to same XML file via Java

Currently I am trying something very simple. I am looking through an XML document for a certain phrase, which I then try to replace. The problem I am having is that when I read the lines, I store each line in a StringBuffer. When I write it to a document, everything is written on a single line.
Here my code:
File xmlFile = new File("abc.xml");
BufferedReader br = new BufferedReader(new FileReader(xmlFile));
String line = null;
while((line = br.readLine())!= null)
{
if(line.indexOf("abc") != -1)
{
line = line.replaceAll("abc","xyz");
}
sb.append(line);
}
br.close();
BufferedWriter bw = new BufferedWriter(new FileWriter(xmlFile));
bw.write(sb.toString());
bw.close();
I am assuming I need a newline character when I perform sb.append, but unfortunately I don't know which character to use, as "\n" does not work.
Thanks in advance!
P.S. I figured there must be a way to use Xalan to format the XML file after I write to it or something. Not sure how to do that though.

readLine returns everything between the newline characters, so when you write the lines back out, the newline characters are missing. These characters depend on the OS: for example, Windows uses two characters for a newline, Unix uses one. To be OS-agnostic, retrieve the system property "line.separator":
String newline = System.getProperty("line.separator");
and append it to your stringbuffer:
sb.append(line).append(newline);

Modified as suggested by Brel, your text-substituting approach should work, and it will work well enough for simple applications.
If things start to get a little hairier, and you end up wanting to select elements based on their position in the XML structure, or you need to be sure to change element text but not tag text (think <abc>abc</abc>), then you'll want to call in the cavalry and process the XML with an XML parser.
Essentially you read in a Document using a DocumentBuilder, you hop around the document's nodes doing whatever you need to, and then ask the Document to write itself back to file. Or do you ask the parser? Anyway, most XML parsers have a handful of options that let you format the XML output: you can specify indentation (or not) and maybe newlines for every opening tag, that kind of thing, to make your XML look pretty.
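As a rough sketch of the parser route using only the JDK's built-in javax.xml classes: the document is parsed with a DocumentBuilder, only text nodes are changed, and a Transformer writes the result back out. The class name XmlTextReplacer is hypothetical, and reading from and writing to strings (rather than files) is just to keep the example short:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: replaces text inside elements but leaves tag names alone.
class XmlTextReplacer {
    static String replaceText(String xml, String oldText, String newText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            replaceInTextNodes(doc.getDocumentElement(), oldText, newText);

            // The Transformer (not the Document itself) serialises the tree
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    private static void replaceInTextNodes(Node node, String oldText, String newText) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(oldText, newText));
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            replaceInTextNodes(child, oldText, newText);
        }
    }
}
```

With input <root><abc>abc</abc></root>, replacing "abc" with "xyz" changes the element text but keeps the <abc> tag. Setting OutputKeys.INDENT to "yes" on the transformer is the usual way to ask for pretty-printed output.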

sb would be the StringBuffer object, which has not been instantiated in this example. It can be added before the while loop:
StringBuffer sb = new StringBuffer();

Scanner scan = new Scanner(System.in);
String filePath = scan.next();
String oldString = "old_string";
String newString = "new_string";
String oldContent = "";
BufferedReader br = null;
FileWriter writer = null;
File xmlFile = new File(filePath);
try {
br = new BufferedReader(new FileReader(xmlFile));
String line = br.readLine();
while (line != null) {
oldContent = oldContent + line + System.lineSeparator();
line = br.readLine();
}
String newContent = oldContent.replaceAll(oldString, newString);
writer = new FileWriter(xmlFile);
writer.write(newContent);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
scan.close();
// br or writer may still be null if opening the file failed
if (br != null) br.close();
if (writer != null) writer.close();
} catch (IOException e) {
e.printStackTrace();
}
}
