I need to read a set of XML and property files and parse the data. Currently I am using an InputStream and a StringBuilder to do this, but the resulting string does not match the input file. I do not want to remove the whitespace and new lines. How do I achieve this?
is = test.getInputStream();
br = new BufferedReader(new InputStreamReader(is));
String line5;
StringBuilder sb5 = new StringBuilder();
while ((line5 = br.readLine()) != null) {
sb5.append(line5);
}
String s = sb5.toString();
My output is:
#test 123 #test2 345
Expected output is:
#test
123
#test2
345
Any thoughts? Thanks
br.readLine() consumes the line breaks without returning them, so you need to add them back to your StringBuilder after appending each line.
is = test.getInputStream();
br = new BufferedReader(new InputStreamReader(is));
String line5;
StringBuilder sb5 = new StringBuilder();
while ((line5 = br.readLine()) != null) {
sb5.append(line5);
sb5.append("\n");
}
If you want an extremely simple solution for reading a file into a String, Apache Commons-IO has a method for exactly this (org.apache.commons.io.FileUtils):
FileUtils.readFileToString(File file, String encoding);
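If adding Commons IO is not an option, the JDK has been able to do the same since Java 7 by reading all bytes with Files.readAllBytes and decoding them with an explicit charset. This is a swapped-in standard-library technique, not part of the Commons IO answer, and the class name WholeFile is hypothetical. Like readFileToString, it keeps all whitespace and line breaks exactly as stored.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WholeFile {
    // Reads the raw bytes and decodes them as UTF-8; no characters are added or removed
    static String read(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

On Java 11 and later, Files.readString(path) offers the same in a single call (it also assumes UTF-8 by default).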
The readLine() method doesn't return the EOL character (\n), so when appending the string to the builder you need to add the EOL character yourself, like sb5.append(line5 + "\n");
The various readLine methods discard the newline from the input.
From the BufferedReader docs:
Returns: A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
A solution may be as simple as adding back a newline to your StringBuilder for every readLine: sb5.append(line5 + "\n");.
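A better alternative is to read into an intermediate buffer first, using the read method with your own char[]. You can still use StringBuilder.append, and you get a String that matches the file contents exactly, including the original line terminators (appending "\n" after every readLine turns \r\n into \n and may add a trailing newline the file never had).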
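A minimal sketch of the buffer-based approach, assuming the content fits in memory and the caller supplies a Reader opened with the right charset. The class name ExactReader and the 8192-char buffer size are arbitrary choices for illustration. Because no line splitting happens, CRLF, LF and a missing final newline all survive unchanged.

```java
import java.io.IOException;
import java.io.Reader;

class ExactReader {
    // Copies every character from the reader into a String, line terminators included
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192]; // arbitrary buffer size
        int n;
        // read() returns the number of chars stored in buffer, or -1 at end of stream
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }
}
```

For the question's stream this could be called as ExactReader.readAll(new InputStreamReader(test.getInputStream(), StandardCharsets.UTF_8)), assuming the files are UTF-8.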
Related
I would like to remove the newline characters that appear in the middle of a line of a file while reading the file.
If I read the file with BufferedReader, such a character is treated as a line break, so the line is split in the middle. I want to read the file and remove those mid-line newline characters as I go.
Each line is a simple JSON object.
Thank you
If what you're saying is that you want to remove the newlines from the original file after reading it, I think you can write to a new (temporary) file while you're reading the lines, and then replace the original file with it when you're done writing.
If I'm interpreting your question correctly, what you want to do isn't quite as simple as "read and write at the same time". What you need is a loop and a StringBuilder.
public String readFileWithNoLines(BufferedReader reader) throws IOException {
StringBuilder builder = new StringBuilder();
String line;
while((line = reader.readLine()) != null) {
builder.append(line);
}
return builder.toString();
}
Then you want to write the return value of that function to the file.
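A minimal sketch combining both answers: join the lines with readLine() (which drops the terminators), write the result to a temporary file in the same directory, then move it over the original. It assumes UTF-8, that the whole file fits in memory, and that removing every line break is what you want. Note that this also removes the breaks between records, so if each JSON object must stay on its own line you would need extra logic to decide where a record ends. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class LineBreakRemover {
    static void removeLineBreaks(Path file) throws IOException {
        StringBuilder builder = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                builder.append(line); // readLine() has already stripped \n or \r\n
            }
        }
        // Write to a temp file next to the original so the move stays on one file system
        Path dir = file.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "joined", ".tmp");
        try (BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            writer.write(builder.toString());
        }
        // Replace the original file with the rewritten one
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```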
I have a big file (about 30 MB), and here is the code I use to read data from the file:
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(new FileReader(file));
try {
    String line = br.readLine();
    while (line != null) {
        sb.append(line).append("\n");
        line = br.readLine();
    }
} finally {
    br.close();
}
Then I need to split the content I read, so I use
String[] inst = sb.toString().split("GO");
The problem is that sometimes a substring exceeds the maximum String length and I can't get all the data into the string. How can I avoid this?
Thanks
Use a Scanner with "GO" as its delimiter, Scanner s = new Scanner(input).useDelimiter("GO");, and read each part with s.next().
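A minimal sketch of the Scanner approach: Scanner pulls text from the Reader in pieces, so only one delimited chunk is held as a String at a time instead of the whole file (each individual chunk still has to fit in memory). The delimiter argument is a regular expression; "GO" has no special characters, so it works as-is. The class name and the Consumer callback are illustrative choices, not part of the original answer. Consecutive delimiters can produce empty chunks.

```java
import java.io.Reader;
import java.util.Scanner;
import java.util.function.Consumer;

class ChunkReader {
    // Calls handler once per piece of text found between matches of delimiterRegex
    static void forEachChunk(Reader input, String delimiterRegex, Consumer<String> handler) {
        try (Scanner s = new Scanner(input).useDelimiter(delimiterRegex)) {
            while (s.hasNext()) {
                handler.accept(s.next()); // the chunk keeps its internal line breaks
            }
        }
    }
}
```

Usage would look like ChunkReader.forEachChunk(new BufferedReader(new FileReader(file)), "GO", chunk -> { /* process one instruction block */ });. Closing the Scanner also closes the Reader passed in.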
Why it happens: the erroneous result may be caused by a non-contiguous heap, as the CMS collector doesn't defragment memory.
(This does not answer the how-to-solve part, though.)
You could load the string in parts, i.e. using substring.
I'm trying to learn Java/Android and right now I'm doing some experiments with the replaceAll function. But I've found that with large text files the process gets sluggish so I was wondering if there is a way to skip the "useless" parts of a file to have a better performance. (Note: Just skip them, not delete them)
Note: I am not trying to "count lines" or "println" or "system.out"; I'm just replacing strings and saving the changes in the same file.
Example
AAAA
CCCC- 9234802394819102948102948104981209381'238901'2309'129831'2381'2381'23081'23081'284091824098304982390482304981'20841'948023984129048'1489039842039481'204891'29031'923481290381'20391'294872385710239841'20391'20931'20853029573098341'290831'20893'12894093274019799919208310293810293810293810293810298'120931¿2093¿12039¿120931¿203912¿0391¿203912¿039¿12093¿12093¿12093¿12093¿12093¿1209312¿0390¿... DDDD
AAAA
CCCC- 9234802394819102948102948104981209381'238901'2309'129831'2381'2381'23081'23081'284091824098304982390482304981'20841'948023984129048'1489039842039481'204891'29031'923481290381'20391'294872385710239841'20391'20931'20853029573098341'290831'20893'12894093274019799919208310293810293810293810293810298'120931¿2093¿12039¿120931¿203912¿0391¿203912¿039¿12093¿12093¿12093¿12093¿12093¿1209312¿0390¿... DDDD
and so on, repeated a huge number of times.
I want to replace all "AAAA" with "BBBB", but there are large portions of data between the strings I am replacing. These portions always begin with "CCCC" and end with "DDDD".
Here's the code I am using to replace the string.
File file = new File("my_file.txt");
BufferedReader reader = new BufferedReader(new FileReader(file));
String line = "", oldtext = "";
while((line = reader.readLine()) != null) {
oldtext += line + "\r\n";
}
reader.close();
// Replacing "AAAA" strings
String newtext= oldtext.replaceAll("AAAA", "BBBB");
FileWriter writer = new FileWriter("my_file.txt");
writer.write(newtext);
writer.close();
I think reading all the lines is inefficient, especially since I won't be modifying those parts (and they make up about 90% of the file).
Does anyone know a solution?
You are wasting a lot of time on this line:
oldtext += line + "\r\n";
In Java, String is immutable, which means a String object cannot be modified. Therefore, when you concatenate, Java actually makes a complete copy of oldtext. So, for every line in your file, you copy every line that came before it into a new String. Take a look at StringBuilder for a way to build a String without these copies.
However, in your case, you do not need the whole file in memory, because you can process line by line. By moving your replaceAll and write into your loop, you can operate on each line as you read it. This will keep the memory footprint of the routine down, because you are only keeping a single line in memory.
Note that FileWriter truncates its file when it is opened, and here it is opened before you read the input file, so the output file needs a different name. If you want to keep the original name, you can call renameTo on the output File after closing both streams.
File file = new File("my_file.txt");
BufferedReader reader = new BufferedReader(new FileReader(file));
FileWriter writer = new FileWriter("my_out_file.txt");
String line;
while ((line = reader.readLine()) != null) {
    // Replace "AAAA" in the current line only
    String newtext = line.replaceAll("AAAA", "BBBB");
    writer.write(newtext);
    // readLine() strips the line terminator, so write it back
    writer.write("\r\n");
}
reader.close();
writer.close();
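The question also asks about skipping the large CCCC...DDDD portions. Every line still has to be read, but a cheap startsWith check avoids running the replacement on those lines. A minimal sketch, assuming UTF-8 input and (as in the example data) that each CCCC...DDDD block sits on a single line starting with "CCCC". It writes through a temporary file and moves it over the original afterwards, as suggested above. It writes "\n" as the line terminator, which is an assumption; use "\r\n" if you need Windows line endings. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class StreamingReplacer {
    static void replaceOutsideDataBlocks(Path file, String target, String replacement)
            throws IOException {
        Path temp = Files.createTempFile(file.toAbsolutePath().getParent(), "replace", ".tmp");
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Lines starting with CCCC are data blocks: copy them without searching
                if (!line.startsWith("CCCC")) {
                    line = line.replace(target, replacement); // literal match, not a regex
                }
                writer.write(line);
                writer.write("\n"); // readLine() removed the terminator
            }
        }
        // Only after both streams are closed, replace the original file
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```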
My program must read text files line by line.
The files are in UTF-8.
I am not sure the files are valid; they may contain unprintable characters.
Is it possible to check for this without going down to the byte level?
Thanks.
Open the file with a FileInputStream, then use an InputStreamReader with the UTF-8 Charset to read characters from the stream, and use a BufferedReader to read lines, e.g. via BufferedReader#readLine, which will give you a string. Once you have the string, you can check for characters that aren't what you consider to be printable.
E.g. (without error checking), using try-with-resources (available since Java 7):
String line;
try (
InputStream fis = new FileInputStream("the_file_name");
InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
BufferedReader br = new BufferedReader(isr);
) {
while ((line = br.readLine()) != null) {
// Deal with the line
}
}
While it's not hard to do this manually using BufferedReader and InputStreamReader, I'd use Guava:
List<String> lines = Files.readLines(file, Charsets.UTF_8);
You can then do whatever you like with those lines.
EDIT: Note that this will read the whole file into memory in one go. In most cases that's actually fine - and it's certainly simpler than reading it line by line, processing each line as you read it. If it's an enormous file, you may need to do it that way as per T.J. Crowder's answer.
Just found out that with the Java NIO (java.nio.file.*) you can easily write:
List<String> lines=Files.readAllLines(Paths.get("/tmp/test.csv"), StandardCharsets.UTF_8);
for(String line:lines){
System.out.println(line);
}
instead of dealing with FileInputStreams and BufferedReaders...
If you want to check whether a string contains unprintable characters, you can use a regular expression:
[^\p{Print}]
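One caveat, based on the java.util.regex.Pattern documentation: by default \p{Print} is a US-ASCII class, so [^\p{Print}] also matches legitimate UTF-8 text such as é or ü. Compiling with Pattern.UNICODE_CHARACTER_CLASS (Java 7+) switches it to the Unicode definition. A minimal sketch; tab counts as unprintable under both definitions, and a U+FFFD replacement character left by a lenient decoder counts as printable, so treat the class name and these choices as assumptions to adjust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrintableCheck {
    // Unicode-aware: without the flag, \p{Print} only covers printable US-ASCII
    private static final Pattern UNPRINTABLE =
            Pattern.compile("[^\\p{Print}]", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns the index of the first unprintable char in the line, or -1 if there is none
    static int firstUnprintable(String line) {
        Matcher m = UNPRINTABLE.matcher(line);
        return m.find() ? m.start() : -1;
    }
}
```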
How about the following:
FileReader fileReader = new FileReader(new File("test.txt"));
BufferedReader br = new BufferedReader(fileReader);
String line = null;
// if no more lines the readLine() returns null
while ((line = br.readLine()) != null) {
// reading lines until the end of the file
}
Source: http://devmain.blogspot.co.uk/2013/10/java-quick-way-to-read-or-write-to-file.html
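I found the following ways to do it:
private static final String fileName = "C:/Input.txt";

public static void main(String[] args) throws IOException {
    // 1. Files.lines (Java 8): a lazy stream that must be closed
    try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
        String[] asArray = lines.toArray(String[]::new);
    }
    // 2. Files.readAllLines: loads every line into a List at once
    List<String> readAllLines = Files.readAllLines(Paths.get(fileName));
    readAllLines.forEach(s -> System.out.println(s));
    // 3. Scanner: hasNextLine/nextLine return whole lines (next() returns single tokens)
    try (Scanner scanner = new Scanner(new File(fileName))) {
        while (scanner.hasNextLine()) {
            System.out.println(scanner.nextLine());
        }
    }
}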
The answer by T.J. Crowder is the Java 6 approach; in Java 7 the valid answer is the one by McIntosh (the Charset.forName("UTF-8") lookup in Crowder's answer is discouraged in favour of StandardCharsets.UTF_8):
List<String> lines = Files.readAllLines(Paths.get("/tmp/test.csv"),
StandardCharsets.UTF_8);
for(String line: lines){ /* DO */ }
This closely resembles the Guava approach posted by Skeet above, and of course the same caveat applies: the whole file is read into memory. For big files (Java 7), read line by line instead:
try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
    for (String line = reader.readLine(); line != null; line = reader.readLine()) {
        // process one line at a time
    }
}
If every character in the file is properly encoded in UTF-8, you won't have any problem reading it with a UTF-8 reader. It is then up to you to check each character and decide whether you consider it printable.
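One subtlety: an InputStreamReader built from a Charset silently replaces malformed byte sequences with U+FFFD, so a badly encoded file can be read without any error. A minimal sketch of a stricter reader, assuming you want invalid UTF-8 to fail loudly: pass a CharsetDecoder configured with CodingErrorAction.REPORT, and readLine() then throws a CharacterCodingException (in practice a MalformedInputException). Files.newBufferedReader, shown above, already reports malformed input this way. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class StrictUtf8 {
    // Reads all lines, throwing CharacterCodingException (an IOException) on invalid UTF-8
    static List<String> readLines(InputStream in) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, decoder))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```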
I am using this code to read a txt file, line by line.
// Open the file (the path is hard-coded here)
FileInputStream fstream = new FileInputStream("/Users/dimitramicha/Desktop/SweetHome3D1.txt");
// DataInputStream is not needed; wrap the stream in a reader directly
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
// Read the file line by line into the str array
int i = 0;
while ((strLine = br.readLine()) != null) {
    str[i] = strLine;
    i++;
}
// Closing the reader also closes the underlying stream
br.close();
and I save it in an array.
Afterwards, I would like to write an if statement that tests the Strings I saved in the array. But when I do, it doesn't work, because (I think) it also saves the spaces (backslashes). Do you have any idea how I can save the data in the array without the spaces?
I would do:
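// replace(CharSequence, CharSequence) removes every space character
String strLineWithoutSpaces = strLine.replace(" ", "");
str[i] = strLineWithoutSpaces;
You can chain more replace calls if you find other characters that you don't want, or use strLine.replaceAll("\\s+", "") to remove all whitespace at once.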
Have a look at the replace method in String and call it on strLine before putting it in the array.
You can use a Scanner which by default uses white space to separate tokens. Have a look at this tutorial.
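A minimal sketch of the Scanner suggestion, assuming what you want is the individual whitespace-separated words rather than whole lines. Scanner's default delimiter is whitespace (spaces, tabs and line breaks), so each next() call returns one token with no surrounding spaces. Collecting into a List also avoids having to size the str array in advance. The class name is hypothetical.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenReader {
    // Returns every whitespace-separated token; spaces and line breaks are dropped
    static List<String> readTokens(Reader source) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }
}
```

For the question's file this would be TokenReader.readTokens(new FileReader("/Users/dimitramicha/Desktop/SweetHome3D1.txt")); closing the Scanner also closes that reader.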