Read line as array of bytes without default encoding

How can I read one line from a file (terminated by \n, \r, or both) as an array of bytes in the most efficient way, without going through String? If I read the line into a String, the default encoding is applied, and I don't want that step.

I don't think you can do this without doing it manually. But to save you time, I'll write the code for you:
public static byte[] firstLine(InputStream in) throws IOException {
    byte[] buffer = new byte[1024]; // arbitrary initial size
    int idx = 0;
    int b; // int, not byte, so that -1 (end of stream) can be told apart from data
    while ((b = in.read()) != -1 && b != 0x0d && b != 0x0a) { // 0x0d is CR, 0x0a is LF
        if (idx >= buffer.length)
            buffer = Arrays.copyOf(buffer, buffer.length * 2); // grow when full
        buffer[idx++] = (byte) b;
    }
    return Arrays.copyOf(buffer, idx); // trim to the bytes actually read
}
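For reading more than the first line, the same idea can be extended so that a CR LF pair counts as one terminator and the end of the stream is reported to the caller. The following is only a sketch under assumptions: the class name ByteLineReader and the choice of PushbackInputStream (used to put back the byte after a lone CR) are illustrative, not part of the original answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper: reads lines as raw bytes, never decoding them.
class ByteLineReader {
    /** Returns the next line without its terminator, or null at end of stream. */
    static byte[] readLine(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null; // nothing left to read
        }
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (b != -1 && b != '\n' && b != '\r') {
            line.write(b);
            b = in.read();
        }
        if (b == '\r') {
            // Treat CR LF as a single terminator; push back anything else.
            int next = in.read();
            if (next != '\n' && next != -1) {
                in.unread(next);
            }
        }
        return line.toByteArray();
    }
}
```

Because this reads one byte at a time, it is probably worth wrapping the file stream in a BufferedInputStream before handing it to the PushbackInputStream.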

Related

Decryption only yields one correct line after encrypting line by line using RC4 algorithm

I have to encrypt a file line by line using the RC4 algorithm.
Encrypting the whole file and decrypting the whole file yields the original which is fine.
When I attempt to read the file one line at a time, encrypt it, and then write the encrypted line to a file, decryption of the resulting file yields just one correct line, which is the first line of the original file.
I have tried to read the file and feed it to rc4 routine using a byte array whose size is a multiple of the key length but the results were the same. Here is my attempt:
try
{
BufferedReader br = new BufferedReader((new FileReader(fileToEncrypt)));
FileOutputStream fos = new FileOutputStream("C:\\Users\\nikaselo\\Documents\\Encryption\\encrypted.csv", true);
File file = new File("C:\\Users\\nikaselo\\Documents\\Encryption\\encrypted.csv");
// encrypt
while ((line = br.readLine()) != null)
{
byte [] encrypt = fed.RC4(line.getBytes(), pwd);
if (encrypt != null) fos.write(encrypt);
fos.flush();
}
fos.close();
// test decrypt
FileInputStream fis = null;
fis = new FileInputStream(file);
byte[] input = new byte[512];
int bytesRead;
while ((bytesRead = fis.read(input)) != -1)
{
byte [] de= fed.RC4(input, pwd);
String result = new String(de);
System.out.println(result);
}
}
catch (Exception ex)
{
ex.printStackTrace();
}
and here is my RC4 function
public byte [] RC4 (byte [] Str, String Pwd) throws Exception
{
int[] Sbox = new int [256] ;
int A, B,c,Tmp;;
byte [] Key = {};
byte [] ByteArray = {};
//KEY
if ((Pwd.length() == 0 || Str.length == 0))
{
byte [] arr = {};
return arr;
}
if(Pwd.length() > 256)
{
Key = Pwd.substring(0, 256).getBytes();
}
else
{
Key = Pwd.getBytes();
}
//String
for( A = 0 ; A <= 255; A++ )
{
Sbox[A] = A;
}
A = B = c= 0;
for (A = 0; A <= 255; A++)
{
B = (B + Sbox[A] + Key[A % Pwd.length()]) % 256;
Tmp = Sbox[A];
Sbox[A] = Sbox[B];
Sbox[B] = Tmp;
}
A = B = c= 0;
ByteArray = Str;
for (A = 0; A <= Str.length -1 ; A++)
{
B = (B + 1) % 256;
c = (c + Sbox[B]) % 256;
Tmp = Sbox[B];
Sbox[B] = Sbox[c];
Sbox[c] = Tmp;
ByteArray[A] = (byte) (ByteArray[A] ^ (Sbox[(Sbox[B] + Sbox[c]) % 256]));
}
return ByteArray;
}
Running this gives me one clean line and the rest is just unreadable.

You are encrypting line by line, but you are trying to decrypt in 512-byte blocks.
Your options, as I see it, are:
1. Encrypt and decrypt in fixed-size blocks.
2. Pad each line out to 512 bytes (and split lines that are longer than 512 bytes).
3. Introduce a delimiter. This will be tricky because potentially any delimiter could appear in the ciphertext, so you should Base64-encode each encrypted line and separate them with line feeds.
Option 1 is probably the easiest (and the one used in real encryption), but if you have to do it line by line, I would go with option 3. It introduces a vulnerability, but RC4 is no longer considered secure anyway.
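The delimiter approach could look roughly like the sketch below. It is an illustration under assumptions: the class name LineFraming and its methods are hypothetical, and the rc4 helper is a compact reimplementation written for this example (a fresh key schedule per call), not the asker's function. Each line is encrypted on its own and Base64-encoded, so a line feed can safely separate the records and each record can be decrypted independently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper illustrating "Base64 per encrypted line".
class LineFraming {
    // Plain RC4 with a fresh key schedule on every call, so the same
    // function both encrypts and decrypts. The key must not be empty.
    static byte[] rc4(byte[] data, byte[] key) {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) {
            s[i] = i;
        }
        for (int i = 0, j = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xff)) & 0xff;
            int t = s[i]; s[i] = s[j]; s[j] = t;
        }
        byte[] out = new byte[data.length];
        int i = 0, j = 0;
        for (int n = 0; n < data.length; n++) {
            i = (i + 1) & 0xff;
            j = (j + s[i]) & 0xff;
            int t = s[i]; s[i] = s[j]; s[j] = t;
            out[n] = (byte) (data[n] ^ s[(s[i] + s[j]) & 0xff]);
        }
        return out;
    }

    // Encrypts one line; the result contains no CR or LF characters.
    static String encryptLine(String line, byte[] key) {
        byte[] cipher = rc4(line.getBytes(StandardCharsets.UTF_8), key);
        return Base64.getEncoder().encodeToString(cipher);
    }

    // Reverses encryptLine for a single stored record.
    static String decryptLine(String encoded, byte[] key) {
        byte[] cipher = Base64.getDecoder().decode(encoded);
        return new String(rc4(cipher, key), StandardCharsets.UTF_8);
    }
}
```

On the writing side, each encryptLine result would be written followed by a line feed; on the reading side, the file can be read with readLine and each record passed to decryptLine. The explicit UTF-8 charset is a choice made for this sketch so that the result does not depend on the platform default.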

How to read whole file with read(char[] cbuf, int off, int len)

I've got this source:
public static void inBufferBooks() throws IOException
{
Reader inStreamBooks = null;
BufferedReader bufferIn = null;
try
{
inStreamBooks = new FileReader("Files/BufferBook.txt");
bufferIn = new BufferedReader(inStreamBooks);
char text[] = new char[10];
int i = -1;
while ((i = inStreamBooks.read(text, 0, 10)) != -1)
{
System.out.print(text);
}
When I read the file, the console prints extra characters at the end of the text: leftovers from the previous fill of the array.
How can I read whole text from the file without redundant chars from last array?

How can I read whole text from the file without redundant chars from last array?
Use the value read returns to you to determine how many characters in the array are still valid. From the documentation:
Returns:
The number of characters read, or -1 if the end of the stream has been reached

You need to remember how many characters you read and only print that many.
for (int len; (len = inStreamBooks.read(text, 0, text.length)) != -1; ) {
    System.out.print(new String(text, 0, len));
}
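As a self-contained illustration of the same rule, the sketch below collects the whole text into a String and only ever appends the number of characters that read reported. The class name ReadAll and the 10-character chunk size (mirroring the question) are assumptions made for the example.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: drains a Reader, honoring the count returned by read().
class ReadAll {
    static String readAll(Reader in) throws IOException {
        char[] chunk = new char[10]; // deliberately small, as in the question
        StringBuilder sb = new StringBuilder();
        int len;
        while ((len = in.read(chunk, 0, chunk.length)) != -1) {
            // Only the first len chars are valid; the rest is stale data.
            sb.append(chunk, 0, len);
        }
        return sb.toString();
    }
}
```

Passing a BufferedReader around a FileReader to readAll keeps the small chunk size from turning into many small file reads.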

To resolve the problem, I changed my while loop like this:
while ((i = bufferText.read(text, 0, text.length)) != -1) {
    if (text.length == i) {
        System.out.print(text);
    } else {
        System.out.print(Arrays.copyOfRange(text, 0, i));
    }
}
Thanks, everyone, for the help.

Find single line comments in byte array

Is it possible to find instances of // in a line read from a file into a byte array and then "snip" from // to the end of the line out? I'm trying
FileInputStream fis = new FileInputStream(file);
byte[] buffer = new byte[8 * 1024];
int read;
while ((read = fis.read(buffer)) != -1)
{
for (int i = 0; i < read; i++)
{
if (buffer[i] == '//')
{
buffer = buffer[0:i];
}
}
}
but I'm getting Invalid character constant at if (buffer[i] == '//') on the '//' part. Am I doing something wrong, or is this just not possible?

Old-school solution
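for (int i = 0; i < read - 1; i++)
{
    if (buffer[i] == '/' && buffer[i + 1] == '/')
    {
        // Java has no slice syntax; copy the bytes before the comment instead
        buffer = Arrays.copyOfRange(buffer, 0, i);
        break;
    }
}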

Single quotes (' and ') delimit exactly one character. Since // is two characters, '//' is not a valid character constant; a character and a string are different things. You therefore have to check both positions in the byte array individually to confirm that there are two successive / characters.
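Put together, the two-position check can be wrapped in a small method that returns the part of a line before the first //. This is a sketch only: the class and method names are hypothetical, and it assumes the array holds a single line in an ASCII-compatible encoding. It does not try to recognize // inside string literals such as "http://".

```java
import java.util.Arrays;

// Hypothetical helper: cuts a single line at the first "//".
class CommentStripper {
    /** length is the number of valid bytes in line (e.g. the value read() returned). */
    static byte[] stripLineComment(byte[] line, int length) {
        for (int i = 0; i < length - 1; i++) {
            // Compare two neighboring bytes against the char literal '/'.
            if (line[i] == '/' && line[i + 1] == '/') {
                return Arrays.copyOf(line, i); // keep everything before the comment
            }
        }
        return Arrays.copyOf(line, length); // no comment found
    }
}
```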

What is CharsetDecoder.decode(ByteBuffer, CharBuffer, endOfInput)

I have a problem with CharsetDecoder class.
First example of code (which works):
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final ByteBuffer b = ByteBuffer.allocate(3);
final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
for (int i=0; i<tab.length; i++){
b.put(tab, i, 1);
}
try {
b.flip();
System.out.println("a" + dec.decode(b).toString() + "a");
} catch (CharacterCodingException e1) {
e1.printStackTrace();
}
The result is a€a
But when I execute this code:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(3);
final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
for (int i=0; i<tab.length; i++){
ByteBuffer buffer = ByteBuffer.wrap(tab, i, 1);
dec.decode(buffer, chars, i == 2);
}
dec.flush(chars);
System.out.println("a" + chars.toString() + "a");
The result is a
Why isn't the result the same?
How do I use the decode(ByteBuffer, CharBuffer, endOfInput) method of CharsetDecoder to get the result a€a?
-- EDIT --
Based on Jesper's code, I came up with the following. It's not perfect, but it works with step = 1, 2, and 3:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(6);
final byte[] tab = new byte[]{(byte)97, (byte)-30, (byte)-126, (byte)-84, (byte)97, (byte)97}; //char €
final ByteBuffer buffer = ByteBuffer.allocate(10);
final int step = 3;
for (int i = 0; i < tab.length; i++) {
// Add the next byte to the buffer
buffer.put(tab, i, step);
i+=step-1;
// Remember the current position
final int pos = buffer.position();
int l=chars.position();
// Try to decode
buffer.flip();
final CoderResult result = dec.decode(buffer, chars, i >= tab.length -1);
System.out.println(result);
if (result.isUnderflow() && chars.position() == l) {
// Underflow, prepare the buffer for more writing
buffer.position(pos);
}else{
if (buffer.position() == buffer.limit()){
//ByteBuffer decoded
buffer.clear();
buffer.position(0);
}else{
//a part of ByteBuffer is decoded. We keep only bytes which are not decoded
final byte[] b = buffer.array();
final int f = buffer.position();
final int g = buffer.limit() - buffer.position();
buffer.clear();
buffer.position(0);
buffer.put(b, f, g);
}
}
buffer.limit(buffer.capacity());
}
dec.flush(chars);
chars.flip();
System.out.println(chars.toString());

The method decode(ByteBuffer, CharBuffer, boolean) returns a result, but you are ignoring it. If you print the result in your second code fragment:
for (int i = 0; i < tab.length; i++) {
ByteBuffer buffer = ByteBuffer.wrap(tab, i, 1);
System.out.println(dec.decode(buffer, chars, i == 2));
}
you'll see this output:
UNDERFLOW
MALFORMED[1]
MALFORMED[1]
a a
Apparently it does not work correctly if you start decoding in the middle of a character. The decoder expects that the first thing it reads is the start of a valid UTF-8 sequence.
edit - When the decoder reports UNDERFLOW, it expects you to add more data to the input buffer and then call decode() again, but you must re-offer it the data from the start of the UTF-8 sequence that you are trying to decode. You can't continue in the middle of a UTF-8 sequence.
Here is a version that works, adding one byte from tab in every iteration of the loop:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(3);
final byte[] tab = new byte[]{(byte) -30, (byte) -126, (byte) -84}; //char €
final ByteBuffer buffer = ByteBuffer.allocate(10);
for (int i = 0; i < tab.length; i++) {
// Add the next byte to the buffer
buffer.put(tab[i]);
// Remember the current position
final int pos = buffer.position();
// Try to decode
buffer.flip();
final CoderResult result = dec.decode(buffer, chars, i == 2);
System.out.println(result);
if (result.isUnderflow()) {
// Underflow, prepare the buffer for more writing
buffer.limit(buffer.capacity());
buffer.position(pos);
}
}
dec.flush(chars);
chars.flip();
System.out.println("a" + chars.toString() + "a");

The decoder does not internally cache the data from partial characters, but this does not mean that you have to do complicated things to figure out what data to re-feed the decoder. You gave it a clear way to represent what data it actually consumed, i.e. the input ByteBuffer and its position. In the second example, by giving it a new ByteBuffer every time, the OP failed to pass the decoder back what it reported it had not yet consumed.
The standard pattern for using NIO Buffers is input, flip, output, compact, loop. Short of optimization (which may be premature), there is no reason to re-implement compact yourself. You might just get it wrong, like @Jesper and @lecogiteur did (if more than a single character was ever presented). You should NOT be resetting to the position from before the decode call.
The second example should have read something like:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(3);
final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
final ByteBuffer buffer = ByteBuffer.wrap(new byte[3]);
for (int i=0; i<tab.length; i++){
    buffer.put(tab, i, 1); // In actual usage some type of IO read/transfer would occur here
    buffer.flip();
    dec.decode(buffer, chars, i == 2);
    buffer.compact(); // keeps any bytes the decoder has not consumed yet
}
dec.flush(chars);
chars.flip(); // switch the output buffer from writing to reading
System.out.println("a" + chars.toString() + "a");
NOTE: The above does not check the return value to detect malformed input or other error handling for running safely on arbitrary input/IO conditions.
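A version of the same input, flip, decode, compact loop that also inspects the CoderResult might look like the sketch below. The class and method names are hypothetical, and step (which must be positive) is just the number of bytes offered per iteration, standing in for whatever an IO read would deliver. The input buffer is sized at step + 4 on the assumption that at most three bytes of an incomplete UTF-8 sequence can be left over after a decode call.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes UTF-8 that arrives in arbitrary chunks.
class ChunkedUtf8Decoder {
    static String decode(byte[] data, int step) throws CharacterCodingException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(step + 4);
        CharBuffer out = CharBuffer.allocate(64);
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += step) {
            in.put(data, off, Math.min(step, data.length - off)); // input
            in.flip();
            decodeAvailable(dec, in, out, sb, false);             // output
            in.compact(); // unconsumed bytes move to the front
        }
        in.flip();
        decodeAvailable(dec, in, out, sb, true); // signal end of input
        CoderResult r;
        do {
            r = dec.flush(out);
            drain(out, sb);
        } while (r.isOverflow());
        return sb.toString();
    }

    private static void decodeAvailable(CharsetDecoder dec, ByteBuffer in, CharBuffer out,
                                        StringBuilder sb, boolean endOfInput)
            throws CharacterCodingException {
        while (true) {
            CoderResult r = dec.decode(in, out, endOfInput);
            drain(out, sb);
            if (r.isUnderflow()) {
                return; // decoder needs more input
            }
            if (r.isError()) {
                r.throwException(); // malformed or unmappable input
            }
            // Otherwise OVERFLOW: out has been drained, so decode again.
        }
    }

    private static void drain(CharBuffer out, StringBuilder sb) {
        out.flip();
        sb.append(out);
        out.clear();
    }
}
```

With the six-byte array from the question's edit (a, €, a, a), this should give the same string for step values 1, 2, and 3.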

How to read bytes from a file, whereas the result byte[] is exactly as long

I want the result byte[] to be exactly as long as the file content. How to achieve that.
I am thinking of ArrayList<Byte>, but it does not seem to be efficient.

Personally I'd go the Guava route:
File f = ...
byte[] content = Files.toByteArray(f);
Apache Commons IO has similar utility methods if you want.
If that's not what you want, it's not too hard to write that code yourself:
public static byte[] toByteArray(File f) throws IOException {
if (f.length() > Integer.MAX_VALUE) {
throw new IllegalArgumentException(f + " is too large!");
}
int length = (int) f.length();
byte[] content = new byte[length];
int off = 0;
int read = 0;
InputStream in = new FileInputStream(f);
try {
// Test for -1 before adding, so end of stream never decrements off
while (off < length && (read = in.read(content, off, length - off)) != -1) {
    off += read;
}
if (off != length) {
// file size has shrunken since check, handle appropriately
} else if (in.read() != -1) {
// file size has grown since check, handle appropriately
}
return content;
} finally {
in.close();
}
}

I'm pretty sure File#length() doesn't iterate through the file. (Assuming this is what you meant by length()) Each OS provides efficient enough mechanisms to find file size without reading it all.

Allocate an adequate buffer (if necessary, resize it while reading) and keep track of how many bytes read. After finishing reading, create a new array with the exact length and copy the content of the reading buffer.
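That approach can be sketched as follows. The class name ExactRead and the 8192-byte starting size are arbitrary choices for the example. The method works on any InputStream, so it does not depend on File#length() being accurate.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper: reads a stream to the end into an exactly sized array.
class ExactRead {
    static byte[] readFully(InputStream in) throws IOException {
        byte[] buf = new byte[8192]; // arbitrary starting size
        int used = 0;                // number of valid bytes in buf
        int n;
        while ((n = in.read(buf, used, buf.length - used)) != -1) {
            used += n;
            if (used == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2); // grow when full
            }
        }
        return Arrays.copyOf(buf, used); // trim to the exact length
    }
}
```

On Java 7 and later, java.nio.file.Files.readAllBytes(Path) in the standard library returns an exactly sized array as well.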

A small function that you can use:
// Returns the contents of the file in a byte array.
public static byte[] getBytesFromFile(File file) throws IOException {
InputStream is = new FileInputStream(file);
// Get the size of the file
long length = file.length();
// You cannot create an array using a long type.
// It needs to be an int type.
// Before converting to an int type, check
// to ensure that file is not larger than Integer.MAX_VALUE.
if (length > Integer.MAX_VALUE) {
throw new RuntimeException(file.getName() + " is too large");
}
// Create the byte array to hold the data
byte[] bytes = new byte[(int)length];
// Read in the bytes
int offset = 0;
int numRead = 0;
while (offset < bytes.length
&& (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
offset += numRead;
}
// Ensure all the bytes have been read in
if (offset < bytes.length) {
throw new IOException("Could not completely read file "+file.getName());
}
// Close the input stream and return bytes
is.close();
return bytes;
}
