Java - reading file as binary with readLine

I have some Ruby code that reads a file line by line. For each line it decides whether to append the line to the current block or, once the block has ended, to handle that block and carry on parsing the rest of the file.
Here it is:
File.open(ARGV[0], 'rb') do |f|
fl = false
text = ''
f.readlines.each do |line|
if (line =~ /^end_block/)
fl = false
# parse text variable
end
text += line if fl == true
if (line =~ /^start_block/)
fl = true
end
end
end
That is, I need the file to be opened for reading as binary, and I still need a readLine method.
So, the question is: how can I do exactly the same with Groovy/Java?

You can use java.io.DataInputStream, which provides a readLine() method as well as readFully(byte[]) and read(byte[]).
Warning: the JavaDoc marks readLine() as deprecated because it does not properly convert bytes to characters (read the details in the JavaDoc).
So think twice about your real requirements and whether this is a suitable trade-off in your case.
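If the lines really must stay raw bytes, one alternative to the deprecated readLine() is to collect bytes up to each '\n' yourself. The following is only a minimal sketch: RawLineReader and readRawLine are hypothetical names, not JDK classes, and the helper treats only '\n' as the terminator (a preceding '\r' stays in the returned bytes).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the JDK.
final class RawLineReader {
    // Returns the bytes of the next line without the trailing '\n',
    // or null when the stream is already at end of input.
    static byte[] readRawLine(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null; // nothing left to read
        }
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            line.write(b);
            b = in.read();
        }
        return line.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 0xFF is not valid on its own in UTF-8, yet it survives untouched here.
        byte[] data = {'a', (byte) 0xFF, '\n', 'b'};
        InputStream in = new ByteArrayInputStream(data);
        byte[] line;
        while ((line = readRawLine(in)) != null) {
            System.out.println(line.length + " byte(s)");
        }
    }
}
```

For a real file you would probably wrap the FileInputStream in a BufferedInputStream first, since this sketch reads one byte at a time.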

If you have line-formatted text, that's not binary IMHO. True binary data can contain any byte, including newline and carriage return, which would create false line breaks when the data is read line by line.
What you probably mean is that you have text and want to read each byte without decoding errors or mangling. Decoding as ISO-8859-1 does exactly that, because it maps every byte value to exactly one character.
You can try
BufferedReader br = new BufferedReader(new InputStreamReader(
new FileInputStream(filename), "ISO-8859-1"));
StringBuilder sb = new StringBuilder();
String line;
boolean include = false;
while((line = br.readLine()) != null) {
if (line.startsWith("end_block"))
include = false;
else if (line.startsWith("start_block"))
include = true;
else if (include)
sb.append(line).append('\n'); // new lines back in.
}
br.close();
String text = sb.toString();
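The claim that ISO-8859-1 does not mangle bytes can be checked with a small round trip. This is just an illustrative sketch; Latin1RoundTrip and roundTrip are made-up names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class.
final class Latin1RoundTrip {
    // Decodes the bytes as ISO-8859-1 and encodes them again.
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Every possible byte value, 0x00 through 0xFF.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        System.out.println(Arrays.equals(all, roundTrip(all))); // prints "true"
    }
}
```

Note that readLine() still consumes the '\n' and '\r' bytes as line terminators, so the original terminators are not preserved exactly.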

Maybe something like this:
public final class Read
{
private static final Pattern START_BLOCK = Pattern.compile("whatever");
private static final Pattern END_BLOCK = Pattern.compile("whatever");
public static void main(final String... args)
throws IOException
{
if (args.length < 1) {
System.err.println("Not enough arguments");
System.exit(1);
}
final FileReader r = new FileReader(args[0]);
final BufferedReader reader = new BufferedReader(r);
final StringBuilder sb = new StringBuilder();
boolean inBlock = false;
String line;
while ((line = reader.readLine()) != null) {
if (END_BLOCK.matcher(line).matches()) {
inBlock = false;
continue;
}
if (inBlock)
sb.append(line);
if (START_BLOCK.matcher(line).matches())
inBlock = true;
}
System.out.println(sb.toString());
System.exit(0);
}
}

Related

How to split single text file into multiple with character as delimiter

I have a text document that has multiple separate entries all compiled into one .log file.
The format of the file looks something like this.
$#UserID#$
Date
User
UserInfo
SteamFriendID
=========================
<p>Message</p>
$#UserID#$
Date
User
UserInfo
SteamFriendID
========================
<p>Message</p>
$#UserID#$
Date
User
UserInfo
SteamFriendID
========================
<p>Message</p>
I'm trying to take everything in between the instances of "$#UserID#$" and print each entry into a separate text file.
So far, based on what I've found, I've tried implementing it with a StringBuilder, something like this:
FileReader fr = new FileReader("Path to raw file.");
int idCount = 1;
FileWriter fw = new FileWriter("Path to parsed files" + idCount);
BufferedReader br = new BufferedReader(fr);
//String line, date, user, userInfo, steamID;
StringBuilder sb = new StringBuilder();
//br.readLine();
while ((line = br.readLine()) != null) {
if(line.substring(0,1).contains("$#")) {
if (sb.length() != 0) {
File file = new File("Path to parsed logs" + idCount);
PrintWriter pw = new PrintWriter(file, "UTF-8");
pw.println(sb.toString());
pw.close();
//System.out.println(sb.toString());
Sb.delete(0, sb.length());
idCount++;
}
continue;
}
sb.append(line + "\r\n");
}
But this only writes the first two entries to separate parsed files and leaves the third one out for some reason.
The other way I was thinking about doing it was to read in all the lines using .readAllLines(), store the list as an array, loop through the lines to find "$#", get that line's index and then recursively write the lines starting at that index.
Does anyone know of a better way to do this, or would be willing to explain to me why I'm only getting two of the three entries parsed?
The quick fix is to write the contents of the StringBuilder once more after your while loop, like this:
public static void main(String[] args) {
try {
int idCount = 1;
FileReader fr = new FileReader("<path to desired file>");
BufferedReader br = new BufferedReader(fr);
//String line, date, user, userInfo, steamID;
StringBuilder sb = new StringBuilder();
//br.readLine();
String line = "";
while ((line = br.readLine()) != null) {
if(line.startsWith("$#")) {
if (sb.length() != 0) {
writeFile(sb.toString(), idCount);
System.out.println(sb);
sb.setLength(0);
idCount++;
}
continue;
}
sb.append(line + "\r\n");
}
if (sb.length() != 0) {
writeFile(sb.toString(), idCount);
System.out.println(sb);
idCount++;
}
} catch (IOException e) {
e.printStackTrace();
}
}
private static void writeFile(String content, int id) throws IOException
{
File file = new File("<path to desired dir>\\ID_" + id + ".txt");
file.createNewFile();
PrintWriter pw = new PrintWriter(file, "UTF-8");
pw.println(content);
pw.close();
}
I've changed two additional things:
The condition line.substring(0,1).contains("$#") did not work properly: the substring call returns only one character, which is then searched for a two-character string, so it is never true. I changed it to use the startsWith method.
After the content of the StringBuilder was written to a file, it was not reset or emptied, so the second and third files also contained every previous block (the third file would then equal the whole input). That is now done with sb.setLength(0);.
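The same logic can also be written as a function that only groups lines, which keeps the "flush once more after the loop" step easy to see and to test without touching the file system. This is a sketch under that assumption; EntrySplitter, splitEntries and flush are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class.
final class EntrySplitter {
    // Groups lines into entries. A line starting with "$#" begins a new entry
    // and is itself left out, as in the code above.
    static List<String> splitEntries(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith("$#")) {
                flush(current, entries);
            } else {
                current.append(line).append("\r\n");
            }
        }
        flush(current, entries); // the last entry has no delimiter after it
        return entries;
    }

    // Moves the buffered text into the list and empties the buffer.
    private static void flush(StringBuilder current, List<String> entries) {
        if (current.length() != 0) {
            entries.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "$#UserID#$", "Date", "$#UserID#$", "User", "$#UserID#$", "UserInfo");
        System.out.println(splitEntries(lines).size()); // prints "3"
    }
}
```

Each returned string could then be handed to something like writeFile from the answer above.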

Is this a good way of reading from a text file?

I've been looking around on the Internet trying to figure out which could be the best way to read from text files which are not very long (the use case here involves small OpenGL shaders). I ended up with this:
private static String load(final String path)
{
String text = null;
try
{
final FileReader fileReader = new FileReader(path);
fileReader.read(CharBuffer.wrap(text));
// ...
}
catch(IOException e)
{
e.printStackTrace();
}
return text;
}
In which cases could this chunk of code result in inefficiencies? Is that CharBuffer.wrap(text) a good thing?
If you want to read the file line by line:
BufferedReader br = new BufferedReader(new FileReader(path));
try {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
String everything = sb.toString();
} finally {
br.close();
}
If you want to read the complete file in one go:
String text = new String(Files.readAllBytes(...)), or Files.readAllLines(...) if you want a list of lines.
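I would usually just roll like this. The CharBuffer.wrap(text) call will not work here: text is null at that point, and a CharBuffer wrapped around a String is read-only anyway, so the reader cannot fill it (see the FileReader docs).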
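FileReader fr = new FileReader(path);
BufferedReader br = new BufferedReader(fr);
StringBuilder sb = new StringBuilder();
String s;
while((s = br.readLine()) != null) {
// readLine() strips the line terminator, so put one back
sb.append(s).append('\n');
}
br.close(); // also closes the underlying FileReader
return sb.toString();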
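For small files such as shaders, reading everything in one call and naming the charset explicitly is usually enough. The sketch below assumes the files are UTF-8 encoded, which the question does not state; SmallFileLoader and load are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class.
final class SmallFileLoader {
    // Reads the whole file into memory and decodes it as UTF-8 (an assumption;
    // use whatever encoding the files were actually saved in).
    static String load(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a real shader source file.
        Path tmp = Files.createTempFile("shader", ".glsl");
        try {
            Files.write(tmp, "void main() {}\n".getBytes(StandardCharsets.UTF_8));
            System.out.print(load(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Unlike the readLine() loops, this keeps the line terminators exactly as they are in the file.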

Read all lines with BufferedReader

I want to type a multiple line text into the console using a BufferedReader and when I hit "Enter" to find the sum of the length of the whole text. The problem is that it seems I'm getting into an infinite loop and when I press "Enter" the program does not come to an end. My code is below:
InputStreamReader instream = new InputStreamReader(System.in);
BufferedReader buffer = new BufferedReader(instream);
line= buffer.readLine();
while (line!=null){
length = length + line.length();
line= buffer.readLine();
}
Could you please tell me what I'm doing wrong?
One line of code using Java 8:
line = buffer.lines().collect(Collectors.joining());
The idiomatic way to read all of the lines is while ((line = buffer.readLine()) != null). Also, I would suggest a try-with-resources statement. Something like
try (InputStreamReader instream = new InputStreamReader(System.in);
BufferedReader buffer = new BufferedReader(instream)) {
long length = 0;
String line;
while ((line = buffer.readLine()) != null) {
length += line.length();
}
System.out.println("Read length: " + length);
} catch (Exception e) {
e.printStackTrace();
}
If you want to end the loop when you receive an empty line, add a test for that in the while loop
while ((line = buffer.readLine()) != null) {
if (line.isEmpty()) {
break;
}
length += line.length();
}
JLS-14.15. The break Statement says
A break statement transfers control out of an enclosing statement.
line will not be null when you press enter; it will be an empty string.
Take note of what the BufferedReader JavaDoc says about readLine():
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
And readLine() returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
So when you press [Enter], you are giving the BufferedReader a new line containing only \n, \r, or \r\n. This means that readLine() will return an empty string.
So try something like this instead:
InputStreamReader instream = new InputStreamReader(System.in);
BufferedReader buffer = new BufferedReader(instream);
line = buffer.readLine();
while( (line != null) && (!line.isEmpty()) ){
length = length + line.length();
line = buffer.readLine();
}
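The two return values of readLine() ("" for a blank line, null at end of input) can be tried out without a console by feeding a StringReader. This is only a sketch; ReadLineDemo and lengthUntilBlankLine are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class.
final class ReadLineDemo {
    // Sums line lengths until an empty line or end of input, whichever comes first.
    static int lengthUntilBlankLine(BufferedReader reader) throws IOException {
        int length = 0;
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            length += line.length();
        }
        return length;
    }

    public static void main(String[] args) throws IOException {
        // "ab" and "cde" are counted; the blank line stops the loop.
        BufferedReader reader = new BufferedReader(new StringReader("ab\ncde\n\nignored\n"));
        System.out.println(lengthUntilBlankLine(reader)); // prints "5"
    }
}
```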
When you press only Enter, the value returned by buffer.readLine() isn't null; it is an empty String.
Therefore you should also test for the empty string with !line.equals("") (or line.length() > 0), while keeping the null check so that end of input does not cause a NullPointerException.
Now your code will look something like this:
InputStreamReader instream = new InputStreamReader(System.in);
BufferedReader buffer = new BufferedReader(instream);
line = buffer.readLine();
while (line != null && !line.equals("")){
length = length + line.length();
line = buffer.readLine();
}
This should solve your problem. Hope this helped! :)
Since Java 8 you can use BufferedReader#lines method directly on buffered reader.
try (InputStreamReader in = new InputStreamReader(System.in);
BufferedReader buffer = new BufferedReader(in)) {
final int length = buffer.lines().mapToInt(String::length).sum();
System.out.println("Read length: " + length);
} catch (Exception e) {
e.printStackTrace();
}
Snarky answer: what you're doing wrong is only creating 2 objects in Java to do something... if you search, you can probably find a few more classes that extend BufferedReader or ExtendedBufferReader etc., and then it can be real Enterprise Java.
Now that I've gotten that out of my system, a more useful answer: System.in reaches end of stream when you input EOF, which is Control-D under Linux and, I think, macOS, and, I think, Control-Z plus Enter under Windows. If you want to check for Enter instead (or, more specifically, two Enters: one to finish the last line and one to indicate that you're done, which is essentially how HTTP determines where the headers end and the body begins), then @dbank's solution should be a viable option, with a minor fix I'm going to try to make: moving the ! inside the while predicate instead of !while.
(Edit #2: I realized readLine strips the newline, so an empty line would be "" instead of the newline, so now my code devolves to another answer, with the EOF bit as an answer instead of a comment.)
Edit: that's weird, @dbank had answered while I was typing my answer, and I would have stopped had I not thought mentioning the EOF alternative was worthwhile. To repeat his code from memory with the edit I was going to make:
InputStreamReader instream = new InputStreamReader(System.in);
BufferedReader buffer = new BufferedReader(instream);
line= buffer.readLine();
while (line != null && !line.equals("")){
length = length + line.length();
line= buffer.readLine();
}
The first method puts every line into a String[] array, and the second method gets the number of lines contained in the text file. I hope this is useful to someone.
public static void main(String... args) throws IOException {
String[] data = getLines();
for(String v : data) {
System.out.println(v);
}
}
public static String[] getLines() throws IOException {
BufferedReader bufferReader = new BufferedReader(new FileReader("C:\\testing.txt"));
String line = bufferReader.readLine();
String[] data = new String[getLinesLength()];
int i = 0;
while(line != null) {
data[i] = line;
line = bufferReader.readLine();
i++;
}
bufferReader.close();
return data;
}
public static int getLinesLength() throws IOException {
BufferedReader bufferReader = new BufferedReader(new FileReader("C:\\testing.txt"));
String line = bufferReader.readLine();
int size = 0;
while(line != null) {
size += 1;
line = bufferReader.readLine();
}
bufferReader.close();
return size;
}
Good example from @Russel Yang (https://stackoverflow.com/a/40412945/11079418).
Use this code to also add a newline character after each line.
String lines = bufferedReader.lines().map(line -> line + "\n").collect(Collectors.joining());
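A closely related variant passes the separator to Collectors.joining directly, which puts a newline between lines but not after the last one. Which of the two you want depends on whether a trailing newline matters; the sketch below uses hypothetical names (JoinLines, joinWithNewlines).

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.stream.Collectors;

// Hypothetical demo class.
final class JoinLines {
    // Joins all remaining lines with "\n" between them (no trailing newline).
    static String joinWithNewlines(BufferedReader reader) {
        return reader.lines().collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("first\nsecond\n"));
        System.out.println(joinWithNewlines(reader).length()); // prints "12"
    }
}
```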

how do you get String Tokenizer to ignore text?

I have this code:
public void readTroops() {
File file = new File("resources/objects/troops.txt");
StringBuffer contents = new StringBuffer();
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader(file));
String text = null;
// repeat until all lines is read
while ((text = reader.readLine()) != null) {
StringTokenizer troops = new StringTokenizer(text,"=");
String list = troops.nextToken();
String value = troops.nextToken();
}
and this file:
//this is a comment part of the text file//
Total=1
The problem is that I can't get it to ignore everything between the // markers, and I can't get it to read the file when there is a blank line ('ENTER') in between. For example, this text works:
Total=1
So my question is: what do I put into the delimiter argument, i.e.
StringTokenizer troops = new StringTokenizer(text,"=","WHAT GOES HERE?");
So how can I get the tokenizer to ignore 'ENTER'/newlines and anything in between // markers, or something similar? Thanks.
P.S. I don't mind if you use String.split to answer my question.
Use the method countTokens to skip lines that don't have two tokens:
while ((text = reader.readLine()) != null) {
StringTokenizer troops = new StringTokenizer(text,"=");
if(troops.countTokens() == 2){
String list = troops.nextToken();
String value = troops.nextToken();
....
}else {
//ignore this line
}
}
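Since String.split is acceptable to the asker, the same filtering can be sketched without StringTokenizer: skip blank lines and lines starting with //, then split the rest once on "=". TroopParser and parse are hypothetical names, and the comment rule (a line that starts with //) is an assumption based on the sample file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class.
final class TroopParser {
    // Turns "name=value" lines into a map, ignoring blank lines and // comments.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("//")) {
                continue; // blank line or comment
            }
            String[] parts = line.split("=", 2); // split on the first '=' only
            if (parts.length == 2) {
                values.put(parts[0].trim(), parts[1].trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "//this is a comment part of the text file//", "", "Total=1");
        System.out.println(parse(lines)); // prints "{Total=1}"
    }
}
```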
Properties prop = new Properties();
prop.load(new FileInputStream("properties_file.txt"));
assertEquals("1",prop.getProperty("Total"));
P.S. You may want to keep a reference to the input stream so that you can close it.
Thinking out of the box, maybe you can use Properties instead of tokenizer (if you update your comments to start with #)?
Properties troops = new Properties();
InputStream inputStream = SomeClass.class.getResourceAsStream("troops.properties");
try {
troops.load(inputStream);
} catch (IOException e) {
// Handle error
} finally {
// Close inputStream in a safe manner
}
troops.getProperty("Total"); // Returns "1"
Or if you are using Java 7:
Properties troops = new Properties();
try (InputStream inputStream = SomeClass.class.getResourceAsStream("troops.properties")) {
troops.load(inputStream);
} catch (IOException e) {
// Handle error
}
troops.getProperty("Total"); // Returns "1"
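To see how Properties treats comments and blank lines without creating a resource file, the text can be loaded from a StringReader. This is a small sketch with hypothetical names (PropertiesDemo, parse); the sample text mirrors the question's file with the comment changed to start with #.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demo class.
final class PropertiesDemo {
    // Parses key=value text; lines starting with '#' and blank lines are skipped.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties troops = parse("# this is a comment\n\nTotal=1\n");
        System.out.println(troops.getProperty("Total")); // prints "1"
    }
}
```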
If you are reading in the file, a better way would be to use a StreamTokenizer, which lets you define the tokenizer's syntax yourself. I used this approach to create an HTML rendering engine. It parses directly from a reader and also provides useful functions for identifying numbers, which it seems you may need.
(I will post an example once my eclipse loads!)
public static String render(String file, HashMap vars){
// Create a stringbuffer to rebuild the string
StringBuffer renderedFile = new StringBuffer();
try{
FileReader in = new FileReader(file);
BufferedReader reader = new BufferedReader(in); // create your reader
StreamTokenizer tok;
tok = new StreamTokenizer(reader); //the tokenizer then takes in the reader as a builder
tok.resetSyntax();
tok.wordChars(0, 255); //sets all chars (inc spaces to be counted as words)
/*
* quoteChar makes '$' a quote delimiter: for $hello$ the tokenizer returns hello as a single token in sval, which is then looked up in vars below
*/
tok.quoteChar('$');
while(tok.nextToken()!=StreamTokenizer.TT_EOF){ //while it is not at the end of file
String s = tok.sval;
if (vars.containsKey(s))
s =(String)vars.get(s);
renderedFile.append(s);
}
}
catch(Exception e){System.out.println("Error Loading Template");}
return renderedFile.toString();
}
Check this out for a good tutorial http://tutorials.jenkov.com/java-io/streamtokenizer.html
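Applied to the troops file from the question, a StreamTokenizer can skip the // comment line and the blank line on its own. The sketch below assumes the values are numeric, as in Total=1; TokenizerDemo and parse are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class.
final class TokenizerDemo {
    // Parses "name=number" pairs, skipping // comments and blank lines.
    static Map<String, Double> parse(Reader reader) throws IOException {
        StreamTokenizer tok = new StreamTokenizer(reader);
        tok.slashSlashComments(true); // treat "//" to end of line as a comment
        Map<String, Double> values = new LinkedHashMap<>();
        String name = null;
        while (tok.nextToken() != StreamTokenizer.TT_EOF) {
            if (tok.ttype == StreamTokenizer.TT_WORD) {
                name = tok.sval;            // left-hand side, e.g. "Total"
            } else if (tok.ttype == StreamTokenizer.TT_NUMBER && name != null) {
                values.put(name, tok.nval); // right-hand side, parsed as a double
                name = null;
            }
            // any other token, such as '=', is skipped
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String text = "//this is a comment part of the text file//\n\nTotal=1\n";
        System.out.println(parse(new StringReader(text)).get("Total"));
    }
}
```

Numbers come back as double in nval, so cast to int if whole counts are expected.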

How to see if a Reader is at EOF?

My code needs to read in all of a file. Currently I'm using the following code:
BufferedReader r = new BufferedReader(new FileReader(myFile));
while (r.ready()) {
String s = r.readLine();
// do something with s
}
r.close();
If the file is currently empty, though, then s is null, which is no good. Is there any Reader that has an atEOF() method or equivalent?
The docs say:
public int read() throws IOException
Returns:
The character read, as an integer in the range 0 to 65535 (0x00-0xffff), or -1 if the end of the stream has been reached.
So in the case of a Reader one should check against EOF like
// Reader r = ...;
int c;
while (-1 != (c = r.read())) {
// use c
}
In the case of a BufferedReader and readLine(), it may be
String s;
while (null != (s=br.readLine())) {
// use s
}
because readLine() returns null on EOF.
Use this function (it relies on mark() and reset(), so the Reader must support marking, as BufferedReader does):
public static boolean eof(Reader r) throws IOException {
r.mark(1);
int i = r.read();
r.reset();
return i < 0;
}
A standard pattern for what you are trying to do is:
BufferedReader r = new BufferedReader(new FileReader(myFile));
String s = r.readLine();
while (s != null) {
// do something with s
s = r.readLine();
}
r.close();
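The ready() method will not work for this: it only tells you whether the next read is guaranteed not to block, not whether the stream is at EOF. You must read from the stream and check the return value to see if you are at EOF.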
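The read-then-check pattern handles an empty source naturally, because the very first readLine() already returns null. A minimal sketch with hypothetical names (ReadAllLines, readAll), using a StringReader in place of a file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class.
final class ReadAllLines {
    // Collects every line until readLine() signals EOF by returning null.
    static List<String> readAll(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(source)) {
            String s;
            while ((s = r.readLine()) != null) {
                lines.add(s);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("")).size());     // prints "0"
        System.out.println(readAll(new StringReader("a\nb")).size()); // prints "2"
    }
}
```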
