Reading characters from a file - Java

I want to read a file one character at a time and write the contents of the first file to another file, also one character at a time.
I have asked this question before but didn't get a satisfactory answer.
I am able to read the file and print it to standard output, but I can't write the same characters to a file.

It would have been useful to link to your previous question so we could see what was unsatisfactory. Here's a basic example:
public static void copy( File src, File dest ) throws IOException {
Reader reader = new FileReader(src);
Writer writer = new FileWriter(dest);
int oneChar = 0;
while( (oneChar = reader.read()) != -1 ) {
writer.write(oneChar);
}
writer.close();
reader.close();
}
Additional things to consider:
wrap reader/writer with BufferedReader/Writer for better performance
the close calls should be in a finally block to prevent resource leaks
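Here is a sketch that applies both suggestions. It assumes Java 7+ so that try-with-resources can take the place of an explicit finally block, and the class name CharCopy is only for illustration. The method still reads and writes one character at a time. The buffering wrappers stop that from turning into one system call per character. Like the original, it uses FileReader/FileWriter, so the platform default encoding applies.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

public class CharCopy {
    // Copies src to dest one character at a time (platform default encoding).
    // try-with-resources closes writer, then reader, even if read() or write() throws.
    public static void copy(File src, File dest) throws IOException {
        try (Reader reader = new BufferedReader(new FileReader(src));
             Writer writer = new BufferedWriter(new FileWriter(dest))) {
            int oneChar;
            while ((oneChar = reader.read()) != -1) {
                writer.write(oneChar);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        copy(new File(args[0]), new File(args[1]));
    }
}
```

Closing the BufferedWriter also flushes it. That is why the copy is complete once the try block ends.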

You can read characters from a file by using a FileReader (there's a read method that lets you do it one character at a time if you like), and you can write characters to a file using a FileWriter (there's a one-character-at-a-time write method). There are also methods that work on blocks of characters rather than one character at a time, but you seemed to want the one-at-a-time versions, so...
That's great if you're not worried about setting the character encoding. If you are, look at using FileInputStream and FileOutputStream with InputStreamReader and OutputStreamWriter wrappers (respectively). The FileInputStream and FileOutputStream classes work with bytes, and the stream reader/writer convert between bytes and characters according to the encoding you choose.
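Here is a minimal sketch of that stream-based approach. The class name is hypothetical. ISO-8859-1 for input and UTF-8 for output are only example encodings, so substitute whatever your files actually use. The method accepts generic streams, so it works the same way with FileInputStream/FileOutputStream or with the in-memory streams used in the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedCopy {
    // Decodes bytes from `in` using inCs, then re-encodes each char to `out` using outCs.
    public static void copy(InputStream in, Charset inCs,
                            OutputStream out, Charset outCs) throws IOException {
        Reader reader = new InputStreamReader(in, inCs);
        Writer writer = new OutputStreamWriter(out, outCs);
        int c;
        while ((c = reader.read()) != -1) {
            writer.write(c);
        }
        writer.flush(); // flush, not close: the caller still owns the streams
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9}; // "café" in ISO-8859-1
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1,
             out, StandardCharsets.UTF_8);
        // 'é' is one byte in ISO-8859-1 but two bytes in UTF-8
        System.out.println(latin1.length + " bytes in, " + out.size() + " bytes out"); // 4 bytes in, 5 bytes out
    }
}
```

To copy files, open a FileInputStream and a FileOutputStream in a try-with-resources statement and pass them in.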

Related

A code I don't understand about the write method in I/O

I just started learning IO and there's something about this code I don't understand:
public static void main(String[] args) throws IOException {
FileReader reader = new FileReader("C://blablablal.txt");
FileWriter writer = new FileWriter("C://blabla.txt");
int c;
while ((c = reader.read()) != -1) {
writer.write(c);
}
reader.close();
writer.close();
}
I'd be happy to get an explanation of how the write method, which is given c (an int) in the while loop, actually writes it as a character or a string in the txt file.
Thanks
First, the FileReader opens the file at the given path so it is ready to be read (if the file doesn't exist, the constructor throws a FileNotFoundException).
Second, the FileWriter opens (or creates) the file it will write to.
Next, the reader reads characters from the first file until it reaches the end of the file (-1), and the writer writes each one to the other file.
Finally, the reader and writer are closed.
Comment here if you have any doubts.
The write(int c) method casts the int (32 bits) to a char (16 bits). The char is then converted to appropriate bytes depending on the encoding. In this case the encoding used is the platform default encoding, but you should always specify the encoding used to make sure that it will work properly in any environment.
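You can check both points directly with the small demo below (the class and method names are hypothetical). The StringWriter call shows that the high 16 bits of the int are ignored. The OutputStreamWriter calls show the same char producing a different number of bytes depending on the encoding passed in.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriteIntDemo {
    // Returns how many bytes writing the single char c produces in encoding cs.
    public static int encodedLength(int c, Charset cs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(bytes, cs);
        w.write(c);
        w.flush();
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        sw.write(0x10041); // high 16 bits (0x0001) are ignored, leaving 0x0041
        System.out.println(sw); // A

        int eAcute = 0xE9; // 'é'
        System.out.println(encodedLength(eAcute, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_8));      // 2
    }
}
```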
The reason it doesn't take char c as a parameter is apparently a design oversight which is too late to correct now.
reader.read() reads one character from a file at a time. This seems a little confusing because you are reading a character in as an integer value, but all it really means is that the integer value corresponds to a character. When there are no more characters to read, read() returns -1, and that is how you know you've reached the end of the file.
writer.write works the same way in reverse: it takes the integer value of a character and writes that character to the file.
Here is a link to a good tutorial on Java IO FileReader.
And here is one for Java IO FileWriter.

Reading all content of a Java BufferedReader including the line termination characters

I'm writing a TCP client that receives some binary data and sends it to a device. The problem arises when I use BufferedReader to read what it has received.
I'm extremely puzzled by finding out that there is no method available to read all the data. The readLine() method that everybody is using, detects both \n and \r characters as line termination characters, so I can't get the data and concat the lines, because I don't know which char was the line terminator. I also can't use read(buf, offset, num), because it doesn't return the number of bytes it has read. If I read it byte by byte using read() method, it would become terribly slow. Please someone tell me what is the solution, this API seems quite stupid to me!
Well, first of all, thanks to everyone. I think the main problem was that I had read tutorialspoint instead of the Java documentation. But pardon me for that: I live in Iran, and Oracle doesn't let us access the documentation for whatever reason. Thanks anyway for the patient and helpful responses.
This is more than likely an XY problem.
The beginning of your question reads:
I'm writing a TCP client that receives some binary data and sends it to a device. The problem arises when I use BufferedReader to read what it has received.
This is binary data; do not use a Reader to start with! A Reader wraps an InputStream using a Charset and yields a stream of chars, not bytes. See, among other sources, here for more details.
Next:
I'm extremely puzzled by finding out that there is no method available to read all the data
With reason. There is no telling how large the data may be, and as a result such a method would be fraught with problems if the data you receive is too large.
So, now that using a Reader is out of the way, what you really need to do is this:
read some binary data from a Socket;
copy this data to another source.
The solutions to do that are many; here is one solution which requires nothing but the standard JDK (7+):
final byte[] buf = new byte[8192]; // or other
try (
final InputStream in = theSocket.getInputStream();
final OutputStream out = whatever();
) {
int nrBytes;
while ((nrBytes = in.read(buf)) != -1)
out.write(buf, 0, nrBytes);
}
Wrap this code in a method, etc.
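Wrapped in a method, the loop might look like the sketch below (the class and method names are hypothetical). Because it takes a plain InputStream/OutputStream, you can pass in theSocket.getInputStream() and the device's stream, and you can also test it against in-memory streams. On Java 9+, in.transferTo(out) does the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copies raw bytes until end of stream and returns the number copied.
    // No Reader is involved, so nothing is decoded and no byte is altered.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int nrBytes;
        while ((nrBytes = in.read(buf)) != -1) {
            out.write(buf, 0, nrBytes); // write only the bytes actually read
            total += nrBytes;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x00, 0x0A, 0x0D, (byte) 0xFF, 0x7F}; // \n and \r pass through untouched
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(data), out) + " bytes copied"); // 5 bytes copied
    }
}
```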
I'm extremely puzzled by finding out that there is no method available to read all the data.
There are three.
The readLine() method that everybody is using, detects both \n and \r characters as line termination characters, so I can't get the data and concat the lines, because I don't know which char was the line terminator.
Correct. It is documented to suppress the line terminator.
I also can't use read(buf, offset, num), because it doesn't return the number of bytes it has read.
It returns the number of chars read.
If I read it byte by byte using read() method, it would become terribly slow.
That reads it char by char, not byte by byte, but you're wrong about the performance. It's buffered.
Please someone tell me what is the solution
You shouldn't be using a Reader for binary data in the first place. I can only suggest you re-read the Javadoc for:
BufferedInputStream.read() throws IOException;
BufferedInputStream.read(byte[]) throws IOException;
BufferedInputStream.read(byte[], int, int) throws IOException;
The last two both return the number of bytes read, or -1 at end of stream.
this API seems quite stupid to me!
No comment.
In the first place, everyone who reads text has to plan for \n, \r and \r\n as possible line terminators, except when parsing HTTP headers, which must be separated with \r\n. You could easily read line by line and output whatever line separator you like.
Secondly, the read method returns the number of characters it has read into a char[], so it works correctly if you want to read a chunk of chars and do your own line parsing and outputting.
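Here is a sketch of that chunked approach (the class and method names are hypothetical). It reads all the text from a Reader and leaves every line terminator exactly as it was. It is meant for text. For the binary data in the question, the byte-stream loop in the first answer remains the right tool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadAllChars {
    // Reads the whole Reader into a String, chunk by chunk.
    // read(char[], int, int) returns how many chars it stored (or -1 at the end),
    // so only that many are appended, and \r, \n and \r\n pass through unchanged.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "one\r\ntwo\nthree\rfour";
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            System.out.println(readAll(br).equals(text)); // true
        }
    }
}
```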
The best thing I can recommend is that you use BufferedReader.read() and iterate over every character in the file. Something like this:
String filename = ...; // path of the file to read
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
    StringBuilder line = new StringBuilder();
    int c;
    while ((c = br.read()) != -1) {
        if (c == '\n') {
            // Unix terminator: line holds an endl-free line
            // do stuff, not sure what you want with the endl encoding
            line.setLength(0);
        } else if (c == '\r') {
            // either a lone \r (old Mac) or the start of \r\n (Windows)
            br.mark(1);
            int next = br.read();
            if (next == '\n') {
                // do extra stuff since you know that you've got a \r\n
            } else {
                br.reset(); // lone \r: un-read the character we peeked at
            }
            // line holds an endl-free line; do stuff with it
            line.setLength(0);
        } else {
            line.append((char) c);
        }
    }
    if (line.length() > 0) {
        // last line, which had no terminator
    }
}
Previously answered by https://stackoverflow.com/users/615234/arrdem

Java File Replace Lines

I have a 250 GB .txt file and only 50 GB of space left on my hard drive.
Every line in this .txt file has a long prefix, and I want to delete this prefix
to make the file smaller.
First I wanted to read the file line by line, change each line, and write it into another file:
// read line out of first file
line = line.replace(prefix, "");
// write line into second file
The problem is that I don't have enough space for that.
So how can I delete all the prefixes from my file?
Check RandomAccessFile: http://docs.oracle.com/javase/7/docs/api/java/io/RandomAccessFile.html
You have to keep track of the position you are reading from and the position you are writing to. Initially both are at the start. Then you read N bytes (one line), shorten it, seek back N bytes and write M bytes (the shortened line). Then you seek forward (N - M) bytes to get back to the position where next line starts. Then you do this over and over again. In the end truncate excess with setLength(long).
You can also do it in batches (like read 4kb, process, write, repeat) to make it more efficient.
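Here is a sketch of the batched version. It assumes the file uses '\n' line endings and an encoding in which the prefix is a fixed byte sequence (ASCII, ISO-8859-1 and UTF-8 all qualify). The class name, stripPrefix, and the 64 KB chunk size are illustrative choices. The code keeps one read position and one write position, and it relies on the write position never overtaking the read position.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class StripPrefixInPlace {
    private static boolean startsWith(byte[] line, byte[] prefix) {
        if (line.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (line[i] != prefix[i]) return false;
        }
        return true;
    }

    // Moves the pending line into `out`, minus a leading prefix if present.
    private static void emit(ByteArrayOutputStream pending, byte[] prefix, ByteArrayOutputStream out) {
        byte[] line = pending.toByteArray();
        int start = startsWith(line, prefix) ? prefix.length : 0;
        out.write(line, start, line.length - start);
        pending.reset();
    }

    // Rewrites the file in place and returns its new length in bytes.
    public static long stripPrefix(File file, byte[] prefix) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            byte[] buf = new byte[64 * 1024];
            ByteArrayOutputStream pending = new ByteArrayOutputStream(); // unfinished line
            ByteArrayOutputStream out = new ByteArrayOutputStream();     // shortened lines
            long readPos = 0;
            long writePos = 0;
            while (true) {
                raf.seek(readPos);
                int n = raf.read(buf);
                if (n == -1) break;
                readPos += n;
                out.reset();
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        emit(pending, prefix, out);
                        out.write('\n');
                    } else {
                        pending.write(buf[i]);
                    }
                }
                // Each emitted line is no longer than its input, so this write
                // only overwrites bytes that have already been read.
                raf.seek(writePos);
                raf.write(out.toByteArray());
                writePos += out.size();
            }
            if (pending.size() > 0) { // last line had no trailing '\n'
                out.reset();
                emit(pending, prefix, out);
                raf.seek(writePos);
                raf.write(out.toByteArray());
                writePos += out.size();
            }
            raf.setLength(writePos); // truncate the now-unused tail
            return writePos;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("strip", ".txt");
        f.deleteOnExit();
        Files.write(f.toPath(), "PFX:a\nPFX:bb\nc\nPFX:d".getBytes(StandardCharsets.US_ASCII));
        long len = stripPrefix(f, "PFX:".getBytes(StandardCharsets.US_ASCII));
        String result = new String(Files.readAllBytes(f.toPath()), StandardCharsets.US_ASCII);
        System.out.println(result.replace("\n", "|") + " (" + len + " bytes)"); // a|bb|c|d (8 bytes)
    }
}
```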
The process is identical in all languages. Some make it easier by hiding the seeking back and forth behind an API.
Of course you have to be absolutely sure that your program works flawlessly, since there is no way to undo this process.
Also, RandomAccessFile is a bit limited, since it works only with bytes: getFilePointer() gives you a byte offset, but nothing maps that offset back to a character position in your decoded strings. Therefore you have to convert between "decoded strings" and "encoded bytes" as you go. If your file is in UTF-8, a given character in the string can take one or more bytes in the file, so you can't just do seek(string.length()). You have to use seek(string.getBytes(encoding).length) and factor in possible line break conversions (Windows uses two characters for a line break, Unix uses only one). But if you have ASCII, ISO-Latin-1 or a similarly simple character encoding and know which line break characters the file uses, the problem should be pretty simple.
And as I edit my answer to cover all the possible corner cases, I think it would be better to read the file using a BufferedReader with the correct character encoding, and to open a RandomAccessFile on the same file for the writing, provided your OS allows a file to be opened twice. This way you get complete Unicode support from the BufferedReader, and you don't have to keep track of read and write positions yourself. You have to do the writing with RandomAccessFile because opening a FileWriter on the file truncates it to zero length unless you open it in append mode.
Something like this. It works on trivial examples but it has no error checking and I absolutely give no guarantees. Test it on a smaller file first.
public static void main(String[] args) throws IOException {
File f = new File(args[0]);
BufferedReader reader = new BufferedReader(new InputStreamReader(
new FileInputStream(f), "UTF-8")); // Use correct encoding here.
RandomAccessFile writer = new RandomAccessFile(f, "rw");
String line = null;
long totalWritten = 0;
while ((line = reader.readLine()) != null) {
line = line.trim() + "\n"; // Remove your prefix here.
byte[] b = line.getBytes("UTF-8");
writer.write(b);
totalWritten += b.length;
}
reader.close();
writer.setLength(totalWritten);
writer.close();
}
You can use RandomAccessFile. It allows you to overwrite parts of the file, and since the Javadoc mentions no copying or caching mechanism, this should work without additional disk space.
So you could overwrite the unwanted parts with spaces.
Split the 250 GB file into 5 files of 50 GB each. Then process each file and then delete it. This way you will always have 50 GB left on your machine and you will also be able to process 250 GB file.
Since it does not have to be done in Java, I would recommend Python for this.
Save the following as replace.py in the same folder as your text file:
import fileinput
for line in fileinput.input("your-file.txt", inplace=True):
    print(line.replace("oldstring", "newstring"), end="")
Replace the two strings with your own and run python replace.py.

Read lines from Java FileInputStream without losing my place

I have a FileInputStream. I'd like to read character-oriented, linewise data from it, until I find a particular delimiter. Then I'd like to pass the FileInputStream, with the current position set immediately after the end of the delimiter line, to a library that needs an InputStream.
I can use a BufferedReader to walk through the file a line at a time, and everything works great:
BufferedReader br = new BufferedReader(new InputStreamReader(myFileStream))
However, this leaves the underlying file stream at a non-deterministic position: the BufferedReader had to look ahead, and I don't know how far, and AFAICT there's no way to tell the BufferedReader to rewind the underlying stream to just after the last-returned line.
Is this the best solution? It seems crazy to have a ReaderInputStream(BufferedReader(InputStreamReader(FileInputStream))) but it's the only way I've seen to avoid rolling my own. I'd really like to avoid writing my own entire stream-that-reads-lines implementation if at all possible.
You cannot unbuffer a buffered reader. You have to use the same wrapper for the life of the application. In your situation I would use
DataInputStream dis = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
String line = dis.readLine();
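While DataInputStream.readLine() is deprecated, it could work for you if you are careful: it treats each byte as one character, so it is only safe for ASCII or ISO-8859-1 text, and afterwards you must pass dis itself (not the FileInputStream) to the library. Otherwise your only option is to read the bytes yourself and decode the text using the required encoding.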
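Here is a minimal sketch of that last option. It assumes '\n'-terminated lines and an encoding such as UTF-8 or ISO-8859-1, where the byte 0x0A can only mean '\n' (this is not true of UTF-16). The class and method names are hypothetical. The method pulls one byte at a time and never reads ahead, so afterwards the stream sits exactly after the delimiter line and can be handed to the library. On a raw FileInputStream each read() is a system call, so this is slow if the text part is long.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnbufferedLines {
    // Reads bytes up to and including the next '\n' and decodes them with cs.
    // Nothing after the '\n' is consumed. Returns null at end of stream.
    public static String readLine(InputStream in, Charset cs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            bytes.write(b);
        }
        if (b == -1 && bytes.size() == 0) {
            return null; // nothing left
        }
        String line = new String(bytes.toByteArray(), cs);
        // also accept \r\n line endings
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "header\r\nDELIM\nbinary payload".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(data); // stands in for your FileInputStream
        String line;
        while ((line = readLine(in, StandardCharsets.UTF_8)) != null && !line.equals("DELIM")) {
            System.out.println("text: " + line); // text: header
        }
        // `in` is now positioned right after the delimiter line (readAllBytes needs Java 9+)
        System.out.println("rest: " + new String(in.readAllBytes(), StandardCharsets.UTF_8)); // rest: binary payload
    }
}
```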

Substitute chars on Java 1.4 InputStream

I have an InputStream that is returning, for example:
<?xml version='1.0' ?><env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><bbs:rule xmlns:bbs="http://com.foo/bbs">
I then pass the stream to a method that return a byte array.
I'd like to substitute "com.foo" with something else, like "org.bar" before I pass to the byte[] method.
What is a good way to do that?
If you have a byte array, you can turn it into a String. Pay attention to the encoding; in the example I use UTF-8. I think this is a simple way to do it:
String newString = new String(byteArray, "utf-8");
newString = newString.replace("com.foo", "org.bar");
return newString.getBytes("utf-8");
One way is to wrap your InputStream in your own FilterInputStream subclass that does the transformation on the fly. It will have to be a look-ahead stream that checks every "c" character to see if it is followed by "om.foo" and, if so, makes the substitution. You'll have to override read(), and also read(byte[], int, int), because FilterInputStream's version of the latter reads straight from the wrapped stream and would bypass your read().
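Here is a sketch of such a filter. It is written for a current JDK rather than 1.4, since it uses generics and ArrayDeque, and the class name is hypothetical. It matches raw bytes, so it assumes the search and replacement strings encode to the same bytes everywhere in the stream. That holds for ASCII text like "com.foo" in UTF-8 or ISO-8859-1. Besides read(), it overrides read(byte[], int, int), skip() and available(), because FilterInputStream's versions of those go straight to the wrapped stream and would bypass the substitution.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

public class ReplacingInputStream extends FilterInputStream {
    private final byte[] search;
    private final byte[] replacement;
    private final Deque<Integer> lookahead = new ArrayDeque<>(); // read from `in`, not yet examined
    private final Deque<Integer> output = new ArrayDeque<>();    // transformed, ready to return

    public ReplacingInputStream(InputStream in, byte[] search, byte[] replacement) {
        super(in);
        if (search.length == 0) throw new IllegalArgumentException("search must not be empty");
        this.search = search.clone();
        this.replacement = replacement.clone();
    }

    // True if the look-ahead window holds exactly the search bytes.
    private boolean lookaheadMatches() {
        if (lookahead.size() < search.length) return false;
        int i = 0;
        for (int b : lookahead) {
            if ((byte) b != search[i++]) return false;
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        while (output.isEmpty()) {
            // Fill the window up to search.length bytes (fewer near the end of the stream).
            int b;
            while (lookahead.size() < search.length && (b = in.read()) != -1) {
                lookahead.addLast(b);
            }
            if (lookahead.isEmpty()) return -1;
            if (lookaheadMatches()) {
                lookahead.clear();
                for (byte r : replacement) output.addLast(r & 0xFF);
            } else {
                output.addLast(lookahead.removeFirst()); // no match starts here: release one byte
            }
        }
        return output.removeFirst();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        int count = 0;
        int c;
        while (count < len && (c = read()) != -1) {
            b[off + count++] = (byte) c;
        }
        return count == 0 ? -1 : count;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) skipped++;
        return skipped;
    }

    @Override
    public int available() {
        return output.size(); // bytes that can be returned without touching `in`
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<bbs:rule xmlns:bbs=\"http://com.foo/bbs\">".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ReplacingInputStream(new ByteArrayInputStream(xml),
                "com.foo".getBytes(StandardCharsets.UTF_8), "org.bar".getBytes(StandardCharsets.UTF_8));
        // readAllBytes (Java 9+) goes through the overridden read(byte[], int, int)
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

You would wrap the original InputStream this way before handing it to the method that returns the byte array.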
A stream reads/writes bytes. Trying to replace text in a binary representation is asking for trouble. So the first thing to do would be wrapping this stream into a Reader (like InputStreamReader) which will take care of translating the binary data into character information for you. You'll have to know the encoding of your streamed data, however, to make sure it is interpreted correctly. For example, UTF-8 or ISO-8859-1.
Once you have your textual data, you can think of how to replace parts of it. One way to do this is using regular expressions. However, this means you'll first have to read the entire stream into a string, do the substitution and then return the byte array. For large amounts of data, this might be inefficient.
Since you're dealing with XML data, you could make use of a higher-level approach and parse the XML in some way that allows you to process the contents without having to store them entirely in an intermediate format. A SAXParser with your own ContentHandler would do the trick. As events arrive, simply write them out again but with the proper alterations. Another approach would be an XSLT transformation with some extension function magic.
Wasn't there supposed to be some support for stream manipulations like this in java.nio? Or was this planned for an upcoming Java version?
This may not be the most efficient way to do it, but it certainly works.
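InputStream is = ...; // your input stream
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// "UTF-8" is an assumption; use your stream's actual encoding
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(baos, "UTF-8"));
String line = null;
while((line = reader.readLine()) != null)
{
    if(line.contains("com.foo"))
    {
        line = line.replace("com.foo", "org.bar");
    }
    writer.write(line);
    writer.newLine(); // readLine() strips the terminator, so put one back
}
writer.flush(); // otherwise the last bytes may still sit in the BufferedWriter
return baos.toByteArray();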
