Byte array - locating a character position

I have a byte array in java. That array contains '%' symbol somewhere in it. I want to find the position of that symbol in that array. Is there any way to find this?
Thanks in Advance!
[EDIT]
I tried below code and it worked fine.
byte[] b = {55,37,66};
String s = new String(b);
System.out.println(s.indexOf("%"));
I have one doubt: does every character take exactly one byte in Java?

A correct and more direct Guava solution:
Bytes.indexOf(byteArray, (byte) '%');

using Google Guava:
com.google.common.primitives.Bytes.asList(byteArray).indexOf(Byte.valueOf((byte) '%'))

I come from the future with some streaming and lambda stuff.
If it's just a matter of finding a byte in a byte[]:
Input:
byte[] bytes = {55,37,66};
byte findByte = '%';
With streaming and lambda stuff:
OptionalInt firstMatch = IntStream.range(0, bytes.length).filter(i -> bytes[i] == findByte).findFirst();
int index = firstMatch.isPresent() ? firstMatch.getAsInt() : -1;
Which is pretty much the same as:
Actually, I think I still prefer this (e.g. put it in some utility class).
int index = -1;
for (int i = 0 ; i < bytes.length ; i++)
if (bytes[i] == findByte)
{
index = i;
break;
}
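As a minimal sketch of the utility-class idea, the loop above could be wrapped in a static method. The class name ByteSearch and the method name indexOf are made up for illustration and do not come from any library.

```java
// Hypothetical utility class; the names are illustrative only.
final class ByteSearch {
    private ByteSearch() {}

    /** Returns the index of the first occurrence of target in bytes, or -1 if absent. */
    static int indexOf(byte[] bytes, byte target) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] bytes = {55, 37, 66};
        System.out.println(indexOf(bytes, (byte) '%')); // prints 1
    }
}
```

This works on the raw bytes, so the result is a byte offset and no charset is involved.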
EDIT
Your question is actually more about finding a character rather than finding a byte.
What could be improved in your solution:
String s = new String(bytes); // will not always give the same result
// there is an invisible 2nd argument : i.e. charset
String s = new String(bytes, charset); // default charset depends on your system.
So, your program may act different on different platforms.
Some charsets use 1 byte per character, others use 2, 3, ... or are irregular.
So, the size of your string may vary from platform to platform.
Secondly, some byte sequences cannot be represented as strings at all. i.e. if the charset does not have a character for the matching value.
So, how could you improve it:
If you just know that your byte array will always contain plain old ascii values, you could use this:
byte[] b = {55,37,66};
String s = new String(b, StandardCharsets.US_ASCII);
System.out.println(s.indexOf("%"));
On the other hand, if you know that your content contains UTF-8 characters, use :
byte[] b = {55,37,66};
String s = new String(b, StandardCharsets.UTF_8);
System.out.println(s.indexOf("%"));
etc ...
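To illustrate why the distinction matters, here is a small sketch (the class and method names are hypothetical) showing that the position of '%' counted in characters can differ from its position counted in bytes once a multi-byte character precedes it. It assumes the data is UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; assumes the byte array holds UTF-8 text.
final class CharVsByteIndex {
    /** Index of target counted in characters, after decoding the bytes as UTF-8. */
    static int charIndex(byte[] utf8, char target) {
        return new String(utf8, StandardCharsets.UTF_8).indexOf(target);
    }

    /** Index of target counted in bytes, scanning the raw array. */
    static int byteIndex(byte[] bytes, byte target) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) takes two bytes in UTF-8: 0xC3 0xA9.
        byte[] bytes = "\u00e9%".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);                 // prints 3
        System.out.println(charIndex(bytes, '%'));        // prints 1
        System.out.println(byteIndex(bytes, (byte) '%')); // prints 2
    }
}
```

So if you need an offset into the array, search the bytes; if you need an offset into the text, decode with an explicit charset first.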

Related

Get bytes from the int returned from socket InputStream read()

I have an InputStream and I want to read each char until I find a comma "," from a socket.
Here's my code:
private static Packet readPacket(InputStream is) throws Exception
{
int ch;
Packet p = new Packet();
String type = "";
while((ch = is.read()) != 44) //44 is the "," in ISO-8859-1 codification
{
if(ch == -1)
throw new IOException("EOF");
type += new String(ch, "ISO-8859-1"); //<----DOES NOT COMPILE
}
...
}
The String constructor does not accept an int, only an array of bytes. I read the documentation and it says
read():
Reads the next byte of data from the input stream.
How can I convert this int to a byte, then? Does it use only the least significant 8 bits of the int's 32 bits?
Since I'm working with Java, I want to keep it fully platform compatible (little endian vs. big endian, etc.). What's the best approach here, and why?
PS: I don't want to use any ready-to-use classes like DataInputStream, etc.

The String constructor takes a byte[] (an array), so wrap the byte in one:
type += new String(new byte[] { (byte) ch }, "ISO-8859-1");
By the way, it would be more elegant to use a StringBuilder for type and make use of its append methods. It's faster and also shows the intent better:
private static Packet readPacket(InputStream is) throws Exception {
int ch;
Packet p = new Packet();
StringBuilder type = new StringBuilder();
while((ch = is.read()) != 44) {
if(ch == -1)
throw new IOException("EOF");
// NOTE: conversion from byte to char here is iffy, this works for ISO8859-1/US-ASCII
// but fails horribly for UTF etc.
type.append((char) ch);
}
String data = type.toString();
...
}
Also, to make it more flexible (e.g. to work with other character encodings), your method should take an InputStreamReader, which handles the conversion from bytes to characters for you (take a look at the javadoc of the InputStreamReader(InputStream, Charset) constructor).

For this you can use an InputStreamReader, which can read encoded character data from a raw byte stream:
InputStreamReader reader = new InputStreamReader(is, "ISO-8859-1");
You may now use reader.read(), which will consume as many bytes from is as it needs (it may also read ahead and buffer more), decode them as ISO-8859-1, and return the character as an int that can be correctly cast to a char.
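As a rough sketch of how that could look for the comma-terminated field in the question: the class and method names below are hypothetical, and a ByteArrayInputStream stands in for the socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative only.
final class PacketFieldReader {
    /** Reads characters up to (not including) the next comma. */
    static String readUntilComma(Reader reader) throws IOException {
        StringBuilder field = new StringBuilder();
        int ch;
        while ((ch = reader.read()) != ',') {
            if (ch == -1) {
                throw new EOFException("stream ended before ','");
            }
            field.append((char) ch); // safe: the reader already decoded the bytes
        }
        return field.toString();
    }

    /** Convenience wrapper for in-memory data (stands in for a socket stream). */
    static String firstField(byte[] packet) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(packet), StandardCharsets.ISO_8859_1)) {
            return readUntilComma(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] packet = "LOGIN,alice".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(firstField(packet)); // prints LOGIN
    }
}
```

Because the reader may buffer bytes beyond the comma, keep reading the rest of the packet through the same reader rather than going back to the underlying stream.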
Edit: Responding to comment about not using any "ready-to-use" classes:
I don't know if InputStreamReader counts. If it does, check out Durandal's answer, which is sufficient for certain single-byte encodings (like US-ASCII, arguably, or ISO-8859-1).
For multibyte encodings, if you do not want to use any other classes, you would first buffer all data into a byte[] array, then construct a String from that.
Edit: Responding to a related question in the comments on Abhishek's answer.
Q:
Abhishek wrote: Can you please enlighten me a little more? i have tried casting integer ASCII to character..it has worked..can you kindly tell where did i go wrong?
A:
You didn't go "wrong", per se. The reason ASCII works is the same reason that Brian pointed out that ISO-8859-1 works. US-ASCII is a single byte encoding, and bytes 0x00-0x7f have the same value as their corresponding Unicode code points. So a cast to char is conceptually incorrect, but in practice, since the values are the same, it works. Same with ISO-8859-1; bytes 0x00-0xff have the same value as their corresponding code points in that encoding. A cast to char would not work in e.g. IBM01141 (a single byte encoding but with different values).
And, of course, a single byte to char cast would not work for multibyte encodings like UTF-16, as more than one input byte must be read (a variable number, in fact) to determine the correct value of a corresponding char.
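The difference can be seen in a small sketch (hypothetical class name) that casts each byte to a char and compares the result with proper decoding. It uses UTF-8 as the multibyte example because that charset is guaranteed to be available on every Java platform.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class comparing a per-byte cast with real decoding.
final class CastVsDecode {
    /** Widens each byte to a char; only correct when byte value equals code point (ISO-8859-1). */
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF)); // mask because Java bytes are signed
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1); // one byte: 0xE9
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);        // two bytes: 0xC3 0xA9

        System.out.println(castEachByte(latin1).equals(original)); // prints true
        System.out.println(castEachByte(utf8).equals(original));   // prints false
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(original)); // prints true
    }
}
```

With the value returned by InputStream.read() the mask is unnecessary, since read() already returns 0 to 255, but it matters as soon as the value comes from a byte variable.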

type += new String(String.valueOf(ch).getBytes("ISO-8859-1"));

Partial answer: Try replacing :
type += new String(ch, "ISO-8859-1");
by
type+=(char)ch;
This can be done if you receive the ASCII value of the char. The code converts the ASCII value to a char by casting.
It's better to avoid lengthy code, and this would work just fine. The read() function comes in several forms:
One is: int b = inpstr.read();
Another is: inpstr.read(byte[])
So it's up to you which method you want to use; they have different purposes.

Convert a single byte to a string?

This is simply to error check my code, but I would like to convert a single byte out of a byte array to a string. Does anyone know how to do this? This is what I have so far:
recBuf = read( 5 );
Log.i( TAG, (String)recBuf[0] );
But of course this doesn't work.
I have googled around a bit but have only found ways to convert an entire byte[] array to a string...
new String( recBuf );
I know I could just do that, and then sift through the string, but it would make my task easier if I knew how to operate this way.

You can make a new byte array with a single byte:
new String(new byte[] { recBuf[0] })

Use toString method of Byte
String s=Byte.toString(recBuf[0] );
Try above , it works.
Example:
byte b=14;
String s=Byte.toString(b );
System.out.println("String value="+ s);
Output:
String value=14

There's a String constructor of the form String(byte[] bytes, int offset, int length). You can always use that for your conversion.
So, for example:
byte[] bite = new byte[]{65,67,68};
for(int index = 0; index < bite.length; index++)
System.out.println(new String(bite, index,1));

What about converting it to char? Or simply wrap it in a one-element array:
new String(new byte[] { buffer[0] })

public static String toString (byte value)
Since: API Level 1
Returns a string containing a concise, human-readable description of the specified byte value.
Parameters
value the byte to convert to a string.
Returns
a printable representation of value.
This is how you can convert a single byte to a string; adapt the code to your requirements.

Edit:
How about
""+ recBuf[0];//Hacky... not sure if would work
((Byte)recBuf[0]).toString();
Pretty sure that would work.

Another alternative could be converting the byte to a char and finally to a String:
Log.i(TAG, Character.toString((char) recBuf[0]));
Or
Log.i(TAG, String.valueOf((char) recBuf[0]));

You're assuming that you're using an 8-bit character encoding (like ASCII), and this would be wrong for many others.
But with your assumption you might just as well use a simple cast to char, like
char yourChar = (char) yourByte;
or, if you really need a String:
String string = String.valueOf((char)yourByte);
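The answers in this thread produce different things, which the following sketch tries to make explicit. The class and method names are hypothetical, and ISO-8859-1 is an assumption about what the received bytes contain.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper contrasting three ways to turn one byte into a String.
final class SingleByteToString {
    /** The byte's numeric value in decimal, e.g. "65". */
    static String asNumber(byte b) {
        return Byte.toString(b);
    }

    /** The byte's value as two hex digits, handy for logging raw data. */
    static String asHex(byte b) {
        return String.format("%02X", b & 0xFF);
    }

    /** The character the byte encodes, assuming ISO-8859-1. */
    static String asCharacter(byte b) {
        return new String(new byte[] {b}, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte b = 65;
        System.out.println(asNumber(b));    // prints 65
        System.out.println(asHex(b));       // prints 41
        System.out.println(asCharacter(b)); // prints A
    }
}
```

For error checking a receive buffer, the numeric or hex form is usually the safer choice, since it stays readable even when the byte is not a printable character.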

How to convert 1s and 0s to String?

Please have a look at the following machine code
0111001101110100011100100110010101110011011100110110010101100100
This means something. I need to convert this to a string. When I use Integer.parseInt() with the above as the string and 2 as the radix (to convert it to bytes), it throws a NumberFormatException.
And I believe I have to separate this into groups of 8 bits (like 01110011, 01110100, etc.). Am I correct?
Please help me to convert this correctly to string.
Thanks

final String s =
"0111001101110100011100100110010101110011011100110110010101100100";
final StringBuilder b = new StringBuilder();
for (int i = 0; i < s.length(); i+=8)
b.append((char)Integer.parseInt(s.substring(i,i+8),2));
System.out.println(b);
prints "stressed"

A shorter way of reading large integers is to use BigInteger
final String s = "0111001101110100011100100110010101110011011100110110010101100100";
System.out.println(new String(new BigInteger(s, 2).toByteArray(), StandardCharsets.US_ASCII));
prints
stressed

It depends on the encoding of the String.
An ASCII-encoded string uses 1 byte for each character, while a UTF-16-encoded string takes 2 bytes for most characters (4 for those outside the Basic Multilingual Plane) and a UTF-8-encoded string takes 1 to 4. There are many other encodings, and the binary layout differs for each.
So you need to find the encoding that was used to write this string to binary format
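Putting the pieces of this thread together, a sketch might split the bit string into groups of 8, parse each group, and decode the resulting bytes with an explicitly chosen charset. The class name is hypothetical, and US-ASCII is an assumption about how the sample was written. Parsing the whole string at once fails because 64 binary digits do not fit in an int.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the charset must match the one used to produce the bits.
final class BitStringDecoder {
    static String decode(String bits, Charset charset) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        byte[] bytes = new byte[bits.length() / 8];
        for (int i = 0; i < bytes.length; i++) {
            // Each group of 8 bits fits easily in an int, unlike the full string.
            bytes[i] = (byte) Integer.parseInt(bits.substring(i * 8, i * 8 + 8), 2);
        }
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String bits = "0111001101110100011100100110010101110011011100110110010101100100";
        System.out.println(decode(bits, StandardCharsets.US_ASCII)); // prints stressed
    }
}
```

Going through a byte array instead of casting each value to char keeps the decoding step separate, so switching to UTF-8 or another encoding is a one-argument change.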

Length of Strings regarding XOR operation for byte array

I am creating an encryption algorithm that XORs two strings. While I know how to XOR the two strings, the problem is the length. I have two byte arrays: one for the plain text, which is of variable size, and one for the key, which is, let's say, 56 bytes. What I want to know is the correct method of XORing the two. Should I concatenate them into one string in binary and XOR the two values? Or XOR each byte array position with a concatenated binary value of the key? Any help is greatly appreciated.
Regards,
Milinda

To encode just move through the array of bytes from the plain text, repeating the key as necessary with the mod % operator. Be sure to use the same character set at both ends.
Conceptually we're repeating the key like this, ignoring encoding.
hello world, there are sheep
secretsecretsecretsecretsecr
Encrypt
String plainText = "hello world, there are sheep";
Charset charSet = Charset.forName("UTF-8");
byte[] plainBytes = plainText.getBytes(charSet);
String key = "secret";
byte[] keyBytes = key.getBytes(charSet);
byte[] cipherBytes = new byte[plainBytes.length];
for (int i = 0; i < plainBytes.length; i++) {
cipherBytes[i] = (byte) (plainBytes[i] ^ keyBytes[i % keyBytes.length]);
}
String cipherText = new String(cipherBytes, charSet);
System.out.println(cipherText);
To decrypt just reverse the process.
// decode
for (int i = 0; i < cipherBytes.length; i++) {
plainBytes[i] = (byte) (cipherBytes[i] ^ keyBytes[i % keyBytes.length]);
}
plainText = new String(plainBytes, charSet); // <= make sure same charset both ends
System.out.println(plainText);

(As noted in comments, you shouldn't use this for anything real. Proper cryptography is incredibly hard to do properly from scratch - don't do it yourself, use existing implementations.)
There's no such concept as "XOR" when it comes to strings, really. XOR specifies the result given two bits, and text isn't made up of bits - it's made up of characters.
Now you could just take the Unicode representation of each character (an integer) and XOR those integers together - but the result may well be a sequence of integers which is not a valid Unicode representation of any valid string.
It's not clear that you're even thinking in the right way to start with - you talk about having strings, but also having 56 bytes. You may have an encoded representation of a string (e.g. the result of converting a string to UTF-8) but that's not the same thing.
If you've got two byte arrays, you can easily XOR those together - and perhaps cycle back to the start of one of them if it's shorter than the other, so that the result is always the same length as the longer array. However, even if both inputs are (say) UTF-8 encoded text, the result often won't be valid UTF-8 encoded text. If you must have the result in text form, I'd suggest using Base64 at that point - there's a public domain base64 encoder which has a simple API.
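A minimal sketch of that approach follows. It assumes Java 8 or later and uses the JDK's java.util.Base64 rather than the public domain encoder mentioned above; the class and method names are hypothetical. As already noted, this is for illustration only and is not real cryptography.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: XOR on bytes, Base64 for the text form. Not secure.
final class XorBase64Sketch {
    /** XORs data with key, cycling through the key when it is shorter than data. */
    static byte[] xor(byte[] data, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    static String encrypt(String plainText, String key) {
        byte[] cipher = xor(plainText.getBytes(StandardCharsets.UTF_8),
                            key.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(cipher); // always valid text
    }

    static String decrypt(String base64Cipher, String key) {
        byte[] cipher = Base64.getDecoder().decode(base64Cipher);
        return new String(xor(cipher, key.getBytes(StandardCharsets.UTF_8)),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encrypted = encrypt("hello world, there are sheep", "secret");
        System.out.println(decrypt(encrypted, "secret")); // prints hello world, there are sheep
    }
}
```

The XORed bytes never pass through a String directly, so it does not matter whether they happen to form valid UTF-8.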

My java class implementation of XOR encryption has gone wrong

I am new to java but I am very fluent in C++ and C# especially C#. I know how to do xor encryption in both C# and C++. The problem is the algorithm I wrote in Java to implement xor encryption seems to be producing wrong results. The results are usually a bunch of spaces and I am sure that is wrong. Here is the class below:
public final class Encrypter {
public static String EncryptString(String input, String key)
{
int length;
int index = 0, index2 = 0;
byte[] ibytes = input.getBytes();
byte[] kbytes = key.getBytes();
length = kbytes.length;
char[] output = new char[ibytes.length];
for(byte b : ibytes)
{
if (index == length)
{
index = 0;
}
int val = (b ^ kbytes[index]);
output[index2] = (char)val;
index++;
index2++;
}
return new String(output);
}
public static String DecryptString(String input, String key)
{
int length;
int index = 0, index2 = 0;
byte[] ibytes = input.getBytes();
byte[] kbytes = key.getBytes();
length = kbytes.length;
char[] output = new char[ibytes.length];
for(byte b : ibytes)
{
if (index == length)
{
index = 0;
}
int val = (b ^ kbytes[index]);
output[index2] = (char)val;
index++;
index2++;
}
return new String(output);
}
}

Strings in Java are Unicode - and Unicode strings are not general holders for bytes like ASCII strings can be.
You're taking a string and converting it to bytes without specifying what character encoding you want, so you're getting the platform default encoding - probably US-ASCII, UTF-8 or one of the Windows code pages.
Then you're performing arithmetic/logic operations on these bytes. (I haven't looked at what you're doing here - you say you know the algorithm.)
Finally, you're taking these transformed bytes and trying to turn them back into a string - that is, back into characters. Again, you haven't specified the character encoding (but you'll get the same as you got converting characters to bytes, so that's OK), but, most importantly...
Unless your platform default encoding uses a single byte per character (e.g. US-ASCII), then not all of the byte sequences you will generate represent valid characters.
So, two pieces of advice come from this:
1. Don't use strings as general holders for bytes.
2. Always specify a character encoding when converting between bytes and characters.
In this case, you might have more success if you specifically give US-ASCII as the encoding. EDIT: This last sentence is not true (see comments below). Refer back to point 1 above! Use bytes, not characters, when you want bytes.

If you use non-ascii strings as keys you'll get pretty strange results. The bytes in the kbytes array will be negative. Sign-extension then means that val will come out negative. The cast to char will then produce a character in the FF80-FFFF range.
These characters will certainly not be printable, and depending on what you use to check the output you may be shown "box" or some other replacement characters.
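The sign-extension effect can be reproduced in isolation with a short sketch (hypothetical class name; the byte values are chosen only for illustration):

```java
// Hypothetical demo of sign extension when XORing bytes and casting to char.
final class SignExtensionDemo {
    /** Mirrors the question's code: the int result is cast straight to char. */
    static char unmasked(byte data, byte key) {
        return (char) (data ^ key);
    }

    /** Keeps only the low 8 bits before the cast. */
    static char masked(byte data, byte key) {
        return (char) ((data ^ key) & 0xFF);
    }

    public static void main(String[] args) {
        byte data = 's';        // 0x73
        byte key = (byte) 0xC3; // negative as a Java byte, e.g. part of a non-ASCII key
        System.out.println(Integer.toHexString(unmasked(data, key))); // prints ffb0
        System.out.println(Integer.toHexString(masked(data, key)));   // prints b0
    }
}
```

Masking avoids the FF80-FFFF range, but storing the result in a byte[] rather than a char[] sidesteps the problem altogether.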
