Get offset in file read by BufferedReader? - java

I'm reading a file line by line. The file was encrypted with a CipherOutputStream and compressed with a DeflaterOutputStream, and it can contain UTF-8 characters, such as Russian letters.
I want to obtain the offset into the actual file, i.e. the number of bytes consumed from it by each br.readLine() call. The problem is that the file is both encrypted and deflated, so the length of the String read is larger than the number of bytes read from the file.
InputStream fis = tempURL.openStream(); // tempURL holds the URL to download
CipherInputStream cis = new CipherInputStream(fis, pbeCipher);
InflaterInputStream iis = new InflaterInputStream(cis);
BufferedReader br = new BufferedReader(new InputStreamReader(iis, "UTF-8"));
br.readLine();
int fSize = tempURL.openConnection().getContentLength(); // fetch file size

Use a CountingInputStream from the Apache Commons IO project:
InputStream fis = tempURL.openStream();
CountingInputStream countStream = new CountingInputStream(fis);
CipherInputStream cis = new CipherInputStream(countStream, pbeCipher);
...
Later you can obtain the file position with countStream.getByteCount().
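For context, a minimal sketch of the full chain with the counter spliced in directly above the raw stream (assuming pbeCipher is an initialized Cipher, as in the question). Note that because InflaterInputStream and BufferedReader read ahead internally, getByteCount() can run slightly ahead of the bytes that actually belong to the lines returned so far; it tells you how far into the download you are, not an exact per-line boundary.
// Uses org.apache.commons.io.input.CountingInputStream
InputStream fis = tempURL.openStream();
CountingInputStream countStream = new CountingInputStream(fis);
CipherInputStream cis = new CipherInputStream(countStream, pbeCipher);
InflaterInputStream iis = new InflaterInputStream(cis);
BufferedReader br = new BufferedReader(new InputStreamReader(iis, "UTF-8"));
String line = br.readLine();
long rawBytesRead = countStream.getByteCount(); // encrypted+compressed bytes consumed so far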

With compressed input you may find that a String doesn't correspond to a whole number of bytes in the file, so the question cannot be answered exactly: a character can occupy less than a byte once compressed (otherwise there would be no point in compressing it).
BTW: It is usually best to compress the data before encrypting it, as the result will usually be much more compact. Compressing data after it has been encrypted only helps if the encrypted output is Base64-encoded or similar. Compression works best when the contents are predictable (e.g. repeating sequences, common characters), whereas the purpose of encryption is to make the data appear unpredictable.
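For illustration, a minimal sketch of the compress-then-encrypt order on the write side (pbeCipher and the file name are assumptions; any initialized Cipher would do). The read side in the question, a CipherInputStream wrapped by an InflaterInputStream, already unwinds this order correctly:
// Deflate first so the compressor sees predictable plaintext, then encrypt.
OutputStream fos = new FileOutputStream("data.bin"); // assumed file name
CipherOutputStream cos = new CipherOutputStream(fos, pbeCipher);
DeflaterOutputStream dos = new DeflaterOutputStream(cos);
dos.write(text.getBytes("UTF-8"));
dos.close(); // finishes the deflater and closes the wrapped streams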

Related

java Files.readAllBytes(image.png) doesn't work

I was trying to read from one file and then write to another. I use the code below to do so.
byte[] bytes = Files.readAllBytes(file1);
Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file2), "UTF-8"));
for (int i = 0; i < bytes.length; i++)
    writer.write(bytes[i]);
writer.close();
But when I change file1 to picture.png and file2 to picture2.png, this method doesn't work and I can't open picture2.png using image viewer.
What have I done wrong?
Writers are for writing text, possibly in different encodings (i.e. UTF-8, UTF-16, etc.). For writing raw bytes, don't use writers. Just use (File)OutputStreams.
It is truly as simple as
byte[] bytes = ...;
FileOutputStream fos = ...;
fos.write(bytes);
The other answers explain why what you have potentially fails.
I'm curious why you're already using one Java NIO method, but not others? The library already has methods to do this for you.
byte[] bytes = Files.readAllBytes(file1);
Files.write(file2, bytes, StandardOpenOption.CREATE_NEW); // or relevant OpenOptions
or
FileOutputStream out = new FileOutputStream(file2); // or buffered
Files.copy(file1, out);
out.close();
or
Files.copy(file1, file2, options);
The problem is that Writer.write() doesn't take a byte. It takes a char, and an encoded char is variable-size, often bigger than one byte, so the raw bytes of the image are transformed on the way out.
But once you've got the whole thing read in as a byte[], you can just use Files.write() to send the whole array to a file in much the same way that you read it in:
Files.write(filename, bytes);
This is the more modern NIO idiom, rather than using an OutputStream.
It's worth reading the tutorial.

Files not copying correctly with Java

I have written a little program that just reads a file's contents and writes them to a new copy. This works perfectly with text files, but with PNGs and video files it fails to create the file correctly (the image is all black, or the video will not play). I know there are APIs that can copy files in one line, but I'd love to know why this isn't working. Here is the code:
import java.io.*;
public class CopyFile
{
    public static void main(String[] args) throws Exception
    {
        File file = new File("test.mp4");
        File copy = new File("copy.mp4");
        InputStreamReader input = new InputStreamReader(new FileInputStream(file));
        OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(copy));
        System.out.println(input.ready());
        while (input.ready())
        {
            int i = input.read();
            //System.out.print((char) ((byte) i));
            out.write(i);
        }
        input.close();
        out.flush();
        out.close();
    }
}
Don't use Reader and Writer unless you know the input is text. Use InputStream and OutputStream.
Don't use ready(), or available() either. Neither is a valid test for end of stream: they only tell you whether input can be read without blocking, which isn't the same thing at all. See the Javadoc.
You're not detecting end of stream correctly: when read() returns -1, you still copy that value to the output.
Copying a single character or byte at a time is extremely slow.
The canonical way to copy streams in Java is as follows:
while ((count = in.read(buffer)) > 0)
{
out.write(buffer, 0, count);
}
where count is an int, and buffer is a byte[] of any size greater than zero, typically 8192.
Readers and Writers are for reading character streams (i.e., text). Pictures and videos are binary data, not text, and will probably be corrupted if you pass them through character streams. This is because, depending on the character set, there is not necessarily a reversible mapping between bytes and characters. Some byte sequences are gibberish if interpreted as characters, then gibberish gets written back to the file.
Use the InputStream and OutputStream that you open directly, instead of wrapping them up as a Reader and Writer, and it will work correctly. These are byte streams and can handle any type of data.
E.g.,
InputStream input = new FileInputStream(file);
OutputStream out = new FileOutputStream(copy);
P.S. This will still be quite slow. You can wrap the streams in a BufferedInputStream and BufferedOutputStream for a simple way to improve performance, although the one-line copy APIs will probably still be faster.
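Putting the advice from both answers together, here is a minimal sketch of the corrected program: byte streams, buffering, and a proper end-of-stream test (file names taken from the question):
import java.io.*;
public class CopyFile
{
    public static void main(String[] args) throws IOException
    {
        InputStream in = new BufferedInputStream(new FileInputStream("test.mp4"));
        OutputStream out = new BufferedOutputStream(new FileOutputStream("copy.mp4"));
        byte[] buffer = new byte[8192];
        int count;
        while ((count = in.read(buffer)) > 0) // read() returns -1 at end of stream
        {
            out.write(buffer, 0, count);
        }
        in.close();
        out.close(); // flushes the buffer before closing
    }
}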

Java transfer files via UDP. Compressed files are damaged?

I'm trying to transfer files with a DatagramSocket in Java. I'm reading the files in 4096-byte pieces. We are using ACKs, so all pieces arrive in the right order; we have successfully tried pdf, exe, jpg and a lot more, but iso, zip and 7z do not work. They have exactly the same size afterwards. Do you have any idea?
Reading the Parts:
byte[] b = new byte[FileTransferClient.PACKAGE_SIZE - 32];
FileInputStream read = new FileInputStream(file);
read.skip((part - 1) * (FileTransferClient.PACKAGE_SIZE - 32));
read.read(b);
content = b;
Writing the Parts:
stream = new FileOutputStream(new File(this.filePath));
stream.write(output);
...
stream.write(output);
stream.close();
(Sorry for my grammar, I'm German)
Your write() method calls are assuming that the entire buffer was filled by receive(). You must use the length provided with the DatagramPacket:
datagramSocket.receive(packet);
stream.write(packet.getData(), packet.getOffset(), packet.getLength());
If there is overhead in the packet, e.g. a sequence number, which there should be, you will need to adjust the offset and length accordingly.
NB TCP will ensure 'everything gets transferred and is not damaged'.
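As a sketch, a receive loop that respects the datagram length might look like this; PACKAGE_SIZE is taken from the question, while HEADER_SIZE (the 32 bytes of per-packet overhead) and the loop condition are assumptions about your protocol:
byte[] buf = new byte[FileTransferClient.PACKAGE_SIZE];
FileOutputStream stream = new FileOutputStream(filePath);
DatagramPacket packet = new DatagramPacket(buf, buf.length);
while (morePartsExpected) // hypothetical end-of-transfer condition
{
    packet.setLength(buf.length); // reset before each receive, or later packets get truncated
    datagramSocket.receive(packet);
    // write only the bytes this datagram actually carries, after the header
    stream.write(packet.getData(), packet.getOffset() + HEADER_SIZE,
            packet.getLength() - HEADER_SIZE);
}
stream.close();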

Commons Net FTPClient retrieved file encoding issue

I'm retrieving a file from an FTP server. The file is encoded as UTF-8.
ftpClient.connect(props.getFtpHost(), props.getFtpPort());
ftpClient.login(props.getUsername(), props.getPassword());
ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
inputStream = ftpClient.retrieveFileStream(fileNameBuilder.toString());
And then somewhere else I'm reading the input stream
bufferedReader = new BufferedReader(new InputStreamReader(
inputStream, "UTF-8"));
But the file is not getting read as UTF-8 Encoded!
I tried ftpClient.setAutodetectUTF8(true); but still doesn't work.
Any ideas?
EDIT:
For example a row in the original file is
...00248090041KENAN SARÐIN 00000000015.993FAC...
After downloading it through FTPClient, I parse it and load it into a Java object; one of the fields of the object is name, which for this row is read as "KENAN SAR�IN".
I tried dumping to disk directly:
File file = new File("D:/testencoding/downloaded-file.txt");
FileOutputStream fop = new FileOutputStream(file);
ftpClient.retrieveFile(fileName, fop);
if (!file.exists()) {
file.createNewFile();
}
I compared the MD5 checksums of the two files (the one on the FTP server and the one dumped to disk), and they are the same.
I would separate out the problems first: dump the file to disk, and compare it with the original. If it's the same as the original, the problem has nothing to do with UTF-8. The FTP code looks okay though, and if you're saying you want the raw binary data, I'd expect it not to mess with anything.
If the file is the same after transfer as before, then the problem has nothing to do with FTP. You say "the file is not getting read as UTF-8 Encoded" but it's not clear what you mean. How certain are you that it's UTF-8 text to start with? If you could edit your question with the binary data, how it's being read as text, and how you'd expect it to be read as text, that would really help.
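If it helps, one way to do that comparison is with java.security.MessageDigest (the file names here are placeholders):
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] original = md.digest(Files.readAllBytes(Paths.get("original.txt")));
byte[] downloaded = md.digest(Files.readAllBytes(Paths.get("downloaded-file.txt")));
System.out.println(MessageDigest.isEqual(original, downloaded) ? "identical" : "different");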
Try to download the file content as bytes and not as characters using InputStream and OutputStream instead of InputStreamReader. This way you are sure that the file is not changed during transfer.
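A minimal sketch of that byte-for-byte download (note that after retrieveFileStream() you must call completePendingCommand() to finalize the transfer):
ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
InputStream in = ftpClient.retrieveFileStream(fileName);
OutputStream out = new FileOutputStream("downloaded-file.txt"); // assumed local path
byte[] buffer = new byte[8192];
int count;
while ((count = in.read(buffer)) > 0)
{
    out.write(buffer, 0, count);
}
out.close();
in.close();
ftpClient.completePendingCommand(); // required after retrieveFileStream()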

Java String I/O

I have to write code in Java with the following structure:
Read String From File
// Perform some string processing
Write output string in file
Now, for reading/writing string to/from file, I am using,
BufferedReader br = new BufferedReader(new FileReader("Text.txt"), 32768);
BufferedWriter out = new BufferedWriter(new FileWriter("AnotherText.txt"), 32768);
while ((line = br.readLine()) != null) {
    // perform some string processing
    out.write(outputString);
    out.newLine();
}
However, reading and writing seem quite slow. Is there a faster way to read/write strings to/from a file in Java?
Additional info:
1) The file being read is 144 MB.
2) I can allocate a large amount of memory (50 MB) for reading or writing.
3) I have to write it as a String, not as bytes.
It sounds slower than it should be.
You can try increasing the buffer size.
Maybe also try an OutputStreamWriter around a FileOutputStream instead of FileWriter, so you control the charset.
You mentioned 50MB. Are you modifying the memory parameters of the program at all when you run it using a -X switch?
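One way to apply these suggestions, a sketch with larger buffers and an explicit charset (the buffer sizes are guesses to be tuned):
BufferedReader br = new BufferedReader(new InputStreamReader(
        new FileInputStream("Text.txt"), "UTF-8"), 1 << 20); // 1 MB buffer
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream("AnotherText.txt"), "UTF-8"), 1 << 20);
String line;
while ((line = br.readLine()) != null) {
    // string processing here
    out.write(line);
    out.newLine();
}
br.close();
out.close();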
Ignoring the fact that you have not posted what your performance requirements are:
Try reading/writing the file as bytes and converting between bytes and characters/strings internally (see the sketch below).
This question might be helpful: Number of lines in a file in Java
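A minimal sketch of the byte-based approach suggested above, assuming the file is UTF-8 and the heap is large enough for the 144 MB file plus its decoded String (adjust with -Xmx):
byte[] bytes = Files.readAllBytes(Paths.get("Text.txt"));
String content = new String(bytes, StandardCharsets.UTF_8);
// ... string processing on content ...
Files.write(Paths.get("AnotherText.txt"), content.getBytes(StandardCharsets.UTF_8));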
