random characters in byte to string conversion java - java

I am converting byte [] into a string. Everytime that I convert the byte array to a string, it has a prefixed-type character before it every single time. I have tried different characters, uppercase, etc.. Still has the prefix.
When I write the byte code to system output, it still has the character.
System.out.write(theByteArray);
System.out.println(new String(theByteArray, "UTF-8"));
When I write the text to a file, it seems like the byte array printed flawlessly, but then I scan it and end up with the weird prefix symbol...
Text to be encrypted >
"aaaa"
Text when decrypted and converted to a string >
"aaaa"
The Character seems to disappear, here is an image of it.
I am wanting to compare the given string to another string, kind of like decrypting a password, and comparing it to a database. If one matches, then it gives access.
Code that is generating this byte code.
Keep in mind, the byte I am looking at is decData, and this is NOT my code.
byte[] encData;
byte[] decData;
File inFile = new File(fileName+ ".encrypted");
//Generate the cipher using pass:
Cipher cipher = FileEncryptor.makeCipher(pass, false);
//Read in the file:
FileInputStream inStream = new FileInputStream(inFile);
encData = new byte[(int)inFile.length()];
inStream.read(encData);
inStream.close();
//Decrypt the file data:
decData = cipher.doFinal(encData);
//Figure out how much padding to remove
int padCount = (int)decData[decData.length - 1];
//Naive check, will fail if plaintext file actually contained
//this at the end
//For robust check, check that padCount bytes at the end have same value
if( padCount >= 1 && padCount <= 8 ) {
decData = Arrays.copyOfRange( decData , 0, decData.length - padCount);
}
FileOutputStream target = new FileOutputStream(new File(fileName + ".decrypted.txt"));
target.write(decData);
target.close();

Looks like encData contains BOM and I think Java, when reading in a stream with BOM, will just treat the BOM as an UTF-8 character, which caused the "prefix". You can try the solution suggested here: Reading UTF-8 - BOM marker.
On the other hand, byte order mark is optional and not recommended for UTF-8 encoding. So two questions to ask is:
Is the original data encoded using utf-8?
If it is, it might be worth while to find out why did the BOM gets into the original data in the first place.

Related

Converting bytes with Charset results in diamonds at end of string?

I am currently storing a String as an array of bytes. However, when I try to use the following code to convert the bytes back to a String using Charset, I have diamonds at the end:
byte[] testbytes = "abc123".getBytes(); // tried getBytes("UTF-8"/StandardCharsets.UTF_8) too
Charset charset = Charset.forName("UTF-8"); // ISO-8859-1 has no diamonds
CharBuffer charBuffer = charset.decode( ByteBuffer.wrap( Arrays.copyOfRange(testbytes,0,testbytes.length) ) );
System.out.println("converted = " + String.valueOf(charBuffer.array()) );
// returns this - abc123����������
If I set the encoding to ISO-8859-1 instead, it converts fine. I thought it might be the encoding of the source code file but opening that in Notepad++ suggests it is also in UTF-8.
Am I missing something or is this just a problem with Android Studio's Logcat window?
- Edit 1 -
Further testing shows that 3 character strings do not have this padding at the end problem. If you use longer strings, Charset.decode seems to pad out the char array with \u0000 values according to the break point.
String.valueOf will end up printing the padded characters as diamonds while creating a new String object removes the padding but, I would like to not use String at all to convert a byte array to a char array due to sensitive values.
- Edit 2 -
It appears the above happens if you call charset.decode() again so, I'm guessing there's a buffer that's being appended to but not sure at what point. Tried clearing with charBuffer.clear() but the second block of code's output appears to be the same i.e. 3 char + 2 spaces + 6 new chars.
String test1 = "123";
byte[] test1b = test1.getBytes();
char[] expected1 = test1.toCharArray();
CharBuffer charBuffer = charset.decode( ByteBuffer.wrap( test1b ) );
char[] actual1 = charBuffer.array(); // size 3, correct
String test2 = "123456";
byte[] test2b = test2.getBytes();
char[] expected2 = test2.toCharArray();
CharBuffer charBuffer2 = charset.decode( ByteBuffer.wrap( test2b ) );
char[] actual2 = charBuffer2.array(); // size 11, padded with '\u0000' 0
Did you try to use the String constructor that receives an array of bytes?
Like:
byte[] testbytes = "abc123".getBytes(StandardCharsets.UTF_8);
String stringDecoded = new String(testbytes, StandardCharsets.UTF_8);
Maybe it can solve your problem.

Base64 Encoded to Decoded File Conversion Problem

I am processing very large files (> 2Gig). Each input file is Base64 encoded, andI am outputting to new files after decoding. Depending on the buffer size (LARGE_BUF) and for a given input file, my input to output conversion either works fine, is missing one or more bytes, or throws an exception at the outputStream.write line (IllegalArgumentException: Last unit does not have enough bits). Here is the code snippet (could not cut and paste so my not be perfect):
.
.
final int LARGE_BUF = 1024;
byte[] inBuf = new byte[LARGE_BUF];
try(InputStream inputStream = new FileInputStream(inFile); OutputStream outStream new new FileOutputStream(outFile)) {
for(int len; (len = inputStream.read(inBuf)) > 0); ) {
String out = new String(inBuf, 0, len);
outStream.write(Base64.getMimeDecoder().decode(out.getBytes()));
}
}
For instance, for my sample input file, if LARGE_BUF is 1024, output file is 4 bytes too small, if 2*1024, I get the exception mentioned above, if 7*1024, it works correctly. Grateful for any ideas. Thank you.
First, you are converting bytes into a String, then immediately back into bytes. So, remove the use of String entirely.
Second, base64 encoding turns each sequence of three bytes into four bytes, so when decoding, you need four bytes to properly decode three bytes of original data. It is not safe to create a new decoder for each arbitrarily read sequence of bytes, which may or may not have a length which is an exact multiple of four.
Finally, Base64.Decoder has a wrap(InputStream) method which makes this considerably easier:
try (InputStream inputStream = Base64.getDecoder().wrap(
new BufferedInputStream(
Files.newInputStream(Paths.get(inFile))))) {
Files.copy(inputStream, Paths.get(outFile));
}

How to read/write extended ASCII characters as a string into ANSI coded text file in java

This is my encryption program. Primarily used to encrypt Files(text)
This part of the program converts List<Integer> elements intobyte [] and writes it into a text file. Unfortunately i cannot provide the algorithm.
void printit(List<Integer> prnt, File outputFile) throws IOException
{
StringBuilder building = new StringBuilder(prnt.size());
for (Integer element : prnt)
{
int elmnt = element;
//building.append(getascii(elmnt));
building.append((char)elmnt);
}
String encryptdtxt=building.toString();
//System.out.println(encryptdtxt);
byte [] outputBytes = offo.getBytes();
FileOutputStream outputStream =new FileOutputStream(outputFile);
outputStream.write(outputBytes);
outputStream.close();
}
This is the decryption program where the decryption program get input from a .enc file
void getfyle(File inputFile) throws IOException
{
FileInputStream inputStream = new FileInputStream(inputFile);
byte[] inputBytes = new byte[(int)inputFile.length()];
inputStream.read(inputBytes);
inputStream.close();
String fylenters = new String(inputBytes);
for (char a:fylenters.toCharArray())
{
usertext.add((int)a);
}
for (Integer bk : usertext)
{
System.out.println(bk);
}
}
Since the methods used here, in my algorithm require List<Integer> byte[] gets converted to String first and then to List<Integer>and vice versa.
The elements while writing into a file during encryption do not match the elements read from the .enc file.
Is my method of converting List<Integer> to byte[] correct??
or is something else wrong? . I do know that java can't print extended ASCII characters so i used this .But, even this failed.It gives a lot of ?s
Is there a solution??
please help me .. and also how to do it for other formats(.png.mp3....etc)
The format of the encrypted file can be anything (it needn't be .enc)
thanxx
There are thousands of different 'extended ASCII' codes and Java supports about a hundred of them,
but you have to tell it which 'Charset' to use or the default often causes data corruption.
While representing arbitrary "binary" bytes in hex or base64 is common and often necessary,
IF the bytes will be stored and/or transmitted in ways that preserve all 256 values, often called "8-bit clean",
and File{Input,Output}Stream does, you can use "ISO-8859-1" which maps Java char codes 0-255 to and from bytes 0-255 without loss, because Unicode is based partly on 8859-1.
on input, read (into) a byte[] and then new String (bytes, charset) where charset is either the name "ISO-8859-1"
or the java.nio.charset.Charset object for that name, available as java.nio.charset.StandardCharSets.ISO_8859_1;
or create an InputStreamReader on a stream reading the bytes from a buffer or directly from the file, using that charset name or object, and read chars and/or a String from the Reader
on output, use String.getBytes(charset) where charset is that charset name or object and write the byte[];
or create an OutputStreamWriter on a stream writing the bytes to a buffer or the file, using that charset name or object, and write chars and/or String to the Writer
But you don't actually need char and String and Charset at all. You actually want to write a series of Integers as bytes, and read a series of bytes as Integers. So just do that:
void printit(List<Integer> prnt, File outputFile) throws IOException
{
byte[] outputBytes = new byte[prnt.size()]; int i = 0;
for (Integer element : prnt) outputBytes[i++] = (byte)element;
FileOutputStream outputStream =new FileOutputStream(outputFile);
outputStream.write(b);
outputStream.close();
// or replace the previous three lines by one
java.nio.file.Files.write (outputFile.toPath(), outputBytes);
}
void getfyle(File inputFile) throws IOException
{
FileInputStream inputStream = new FileInputStream(inputFile);
byte[] inputBytes = new byte[(int)inputFile.length()];
inputStream.read(inputBytes);
inputStream.close();
// or replace those four lines with
byte[] inputBytes = java.nio.file.Files.readAllBytes (inputFile.toPath());
for (byte b: inputBytes) System.out.println (b&0xFF);
// or if you really wanted a list not just a printout
ArrayList<Integer> list = new ArrayList<Integer>(inputBytes.length);
for (byte b: inputBytes) list.add (b&0xFF);
// return list or store it or whatever
}
Arbitrary data bytes are not all convertible to any character encoding and encryption creates data bytes including all values 0 - 255.
If you must convert the encrypted data to a string format the standard methods are to convert to Base64 or hexadecimal.
In encryption part:
`for (Integer element : prnt)
{
int elmnt = element;
//building.append(getascii(elmnt));
char b = Integer.toString(elmnt).charAt(0);
building.append(b);
}`
-->this will convert int to char like 1 to '1' and 5 to '5'

What encoding Java uses to create string from give unicode data?

I am quite perplexed on why I should not be encoding unicode text with UTF-8 for comparison when other text(to compare) has been encoded with UTF-8?
I wanted to compare a text(= アクセス拒否 - means Access denied) stored in external file encoded as UTF-8 with a constant string stored in a .java file as
public static final String ACCESS_DENIED_IN_JAPANESE = "\u30a2\u30af\u30bb\u30b9\u62d2\u5426"; // means Access denied
The java file was encoded as Cp1252.
I read the file as as input stream by using below code. Point to note that I am using UTF-8 for encoding.
InputStream in = new FileInputStream("F:\\sample.txt");
int b1;
byte[] bytes = new byte[4096];
int i = 0;
while (true) {
b1 = in.read();
if (b1 == -1)
break;
bytes[i++] = (byte) b1;
}
String japTextFromFile = new String(bytes, 0, i, Charset.forName("UTF-8"));
Now when I compare as
System.out.println(ACCESS_DENIED_IN_JAPANESE.equals(japTextFromFile)); // result is `true` , and works fine
but when I encode ACCESS_DENIED_IN_JAPANESE with UTF-8 and try to compare it with japTextFromFile result is false. The code is
String encodedAccessDenied = new String(ACCESS_DENIED_IN_JAPANESE.getBytes(),Charset.forName("UTF-8"));
System.out.println(encodedAccessDenied .equals(japTextFromFile)); // result is `false`
So my doubt is why above comparison is failing, when both the strings are same and have been encoded with UTF-8? The result should be true.
However, in first case, when compared different encoded strings- one with UTF-16(Java default way of encoding string) and other with UTF-8 , result is true, which I think should be false as it is different encoding ,no matter text we read, is same.
Where I am wrong in my understanding? Any clarification is greatly appreciated.
ACCESS_DENIED_IN_JAPANESE.getBytes() does not use UTF-8. It uses your platform's default charset. But then you use UTF-8 to turn those bytes back into a String. This gets you a different String to the one you started with.
Try this:
String encodedAccessDenied = new String(ACCESS_DENIED_IN_JAPANESE.getBytes(StandardCharsets.UTF_8),StandardCharsets.UTF_8
);
System.out.println(encodedAccessDenied .equals(japTextFromFile)); // result is `true`
The best way I know is put all static texts into a text file encoded with UTF-8. And then read those resources with FileReader, setting encoding parameter to "UTF-8"

Stream decoding of Base64 data

I have some large base64 encoded data (stored in snappy files in the hadoop filesystem).
This data was originally gzipped text data.
I need to be able to read chunks of this encoded data, decode it, and then flush it to a GZIPOutputStream.
Any ideas on how I could do this instead of loading the whole base64 data into an array and calling Base64.decodeBase64(byte[]) ?
Am I right if I read the characters till the '\r\n' delimiter and decode it line by line?
e.g. :
for (int i = 0; i < byteData.length; i++) {
if (byteData[i] == CARRIAGE_RETURN || byteData[i] == NEWLINE) {
if (i < byteData.length - 1 && byteData[i + 1] == NEWLINE)
i += 2;
else
i += 1;
byteBuffer.put(Base64.decodeBase64(record));
byteCounter = 0;
record = new byte[8192];
} else {
record[byteCounter++] = byteData[i];
}
}
Sadly, this approach doesn't give any human readable output.
Ideally, I would like to stream read, decode, and stream out the data.
Right now, I'm trying to put in an inputstream and then copy to a gzipout
byteBuffer.get(bufferBytes);
InputStream inputStream = new ByteArrayInputStream(bufferBytes);
inputStream = new GZIPInputStream(inputStream);
IOUtils.copy(inputStream , gzipOutputStream);
And it gives me a
java.io.IOException: Corrupt GZIP trailer
Let's go step by step:
You need a GZIPInputStream to read zipped data (that and not a GZIPOutputStream; the output stream is used to compress data). Having this stream you will be able to read the uncompressed, original binary data. This requires an InputStream in the constructor.
You need an input stream capable of reading the Base64 encoded data. I suggest the handy Base64InputStream from apache-commons-codec. With the constructor you can set the line length, the line separator and set doEncode=false to decode data. This in turn requires another input stream - the raw, Base64 encoded data.
This stream depends on how you get your data; ideally the data should be available as InputStream - problem solved. If not, you may have to use the ByteArrayInputStream (if binary), StringBufferInputStream (if string) etc.
Roughly this logic is:
InputStream fromHadoop = ...; // 3rd paragraph
Base64InputStream b64is = // 2nd paragraph
new Base64InputStream(fromHadoop, false, 80, "\n".getBytes("UTF-8"));
GZIPInputStream zis = new GZIPInputStream(b64is); // 1st paragraph
Please pay attention to the arguments of Base64InputStream (line length and end-of-line byte array), you may need to tweak them.
Thanks to Nikos for pointing me in the right direction.
Specifically this is what I did:
private static final byte NEWLINE = (byte) '\n';
private static final byte CARRIAGE_RETURN = (byte) '\r';
byte[] lineSeparators = new byte[] {CARRIAGE_RETURN, NEWLINE};
Base64InputStream b64is = new Base64InputStream(inputStream, false, 76, lineSeparators);
GZIPInputStream zis = new GZIPInputStream(b64is);
Isn't 76 the length of the Base64 line? I didn't try with 80, though.

Categories