Java: Remove first UTF string from byte array


I'm trying to remove a written string from a byte array and maintain the original separate objects:
byte[] data... // this is populated with the following:
// 00094E6966747943686174001C00074D657373616765000B4372616674656446757279000474657374
// to string using converter : " ChannelMessageUsernametest"
// note the leading whitespace; ignore the quotes
// The byte array was compiled by writing the following (writeUTF from a writer):
// Channel
// Message
// Username
// test
Now I'm trying to strip Channel from the byte array:
ByteArrayDataInput input = ByteStreams.newDataInput(data);
String channel = input.readUTF(); // Channel, don't want this
String message = input.readUTF(); // Message
// works good, but I don't want Channel,
// and I can't remove it from the data before it arrives,
// I have to work with what I have
Here is my problem:
byte[] newData = Arrays.copyOfRange(data, channel.length() + 2, data.length);
// I use Arrays.copyOfRange to strip the channel and the "whitespace" before it (I assume it's not needed).
// The range also has to skip the two-byte length prefix that writeUTF writes
// before the characters, hence channel.length() + 2
// ...
ByteArrayDataInput newInput = ByteStreams.newDataInput(newData);
String next = newInput.readUTF(); // MessageUsernametest, all in one string
See how I lose the separation of the objects? How can I keep the original "sections" of the objects from byte[] data inside byte[] newData?
It's safe to assume that String channel (before and after stripping) is a string.
It's NOT safe to assume that every object is a string; assume everything is random, because it is.

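As long as you can guarantee that channel is always in a reasonable character range (for example, alphanumeric ASCII, so that each character takes exactly one byte in the modified UTF-8 that writeUTF produces), changing the channel.length() + 2 to channel.length() + 4 should be sufficient. The extra two bytes skip the two-byte length (0x001C in your dump) that precedes the rest of the data, so newData starts directly at the length prefix of Message and each readUTF() call returns one string again.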
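A minimal sketch of the same idea that does not count characters at all, using java.io.DataInputStream in place of Guava's ByteStreams. It assumes, as the hex dump suggests, that the channel written with writeUTF is followed by a two-byte length and then the payload; stripChannel is a hypothetical helper name. Because readUTF() consumes exactly as many bytes as the channel's own length prefix says, this also works for non-ASCII channels.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ChannelStripper {

    // Hypothetical helper: drops the writeUTF-encoded channel and the two-byte
    // length that follows it, and returns the payload bytes unchanged, so every
    // string inside keeps its own length prefix.
    static byte[] stripChannel(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF();                               // read and discard the channel
            int payloadLength = in.readUnsignedShort(); // 0x001C (28) in the question's dump
            byte[] payload = new byte[payloadLength];
            in.readFully(payload);                      // EOFException if data is truncated
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hex dump from the question, turned back into bytes.
        String hex = "00094E6966747943686174001C00074D657373616765000B4372616674656446757279000474657374";
        byte[] data = new byte[hex.length() / 2];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }

        DataInputStream rest = new DataInputStream(new ByteArrayInputStream(stripChannel(data)));
        System.out.println(rest.readUTF()); // Message
        System.out.println(rest.readUTF()); // CraftedFury
        System.out.println(rest.readUTF()); // test
    }
}
```

If the bytes after the channel are not always a length-prefixed block, the readUnsignedShort() step is the part to adapt.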

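Java Strings have 16-bit elements, so a byte array can be passed through a String, although not as memory efficiently, as long as the conversion maps every byte to exactly one char and back. ISO-8859-1 does that; the platform default charset (often UTF-8) does not, because it replaces byte sequences that are invalid in that charset, so arbitrary data would not survive the round trip:
import java.nio.charset.StandardCharsets;

private byte[] removeElements(byte[] data, int fromIndex, int len) {
    // ISO-8859-1 maps each byte to one char, so string indices equal byte indices
    String str = new String(data, StandardCharsets.ISO_8859_1);
    String str1 = str.substring(0, fromIndex);
    String str2 = str.substring(fromIndex + len);
    return (str1 + str2).getBytes(StandardCharsets.ISO_8859_1);
}
In the same manner, you can also search for a String inside the byte array:
private int findStringInByteArray(byte[] mainByte, String str, int fromIndex) {
    String main = new String(mainByte, StandardCharsets.ISO_8859_1);
    return main.indexOf(str, fromIndex);
}
Now you can call these methods together:
// removes only the channel's characters; the two-byte length prefix
// that writeUTF wrote before them stays in newData
byte[] newData = removeElements(
    data,
    findStringInByteArray(data, channel, 0),
    channel.length());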

Related

Splitting a string with byte length limits in java

I want to split a String into a String[] array whose elements meet the following conditions:
s.getBytes(encoding).length should not exceed maxsize (an int).
If I join the split strings with StringBuilder or the + operator, the result should be exactly the original string.
The input string may have unicode characters which can have multiple bytes when encoded in e.g. UTF-8.
The desired prototype is shown below.
public static String[] SplitStringByByteLength(String src,String encoding, int maxsize)
And the testing code:
public boolean isNice(String str, String encoding, int max) throws UnsupportedEncodingException
{
    StringBuilder b = new StringBuilder();
    String[] splitted = SplitStringByByteLength(str, encoding, max);
    for (String s : splitted)
    {
        if (s.getBytes(encoding).length > max)
            return false;
        b.append(s);
    }
    if (str.compareTo(b.toString()) != 0)
        return false;
    return true;
}
Though it seems easy when the input string has only ASCII characters, the fact that it could contain multibyte characters confuses me.
Thank you in advance.
Edit: I added my code implementation (inefficient).
public static String[] SplitStringByByteLength(String src, String encoding, int maxsize) throws UnsupportedEncodingException
{
    ArrayList<String> splitted = new ArrayList<String>();
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < src.length(); i++) // Java strings are not '\0'-terminated
    {
        String tmp = builder.toString();
        char c = src.charAt(i);
        builder.append(c);
        if (builder.toString().getBytes(encoding).length > maxsize)
        {
            splitted.add(tmp);                       // the chunk without c still fits
            builder = new StringBuilder().append(c); // start the next chunk with c
        }
    }
    if (builder.length() > 0)
        splitted.add(builder.toString());            // add the last chunk
    return splitted.toArray(new String[splitted.size()]);
}
Is this the only way to solve this problem?

The class CharsetEncoder has provision for your requirement. Extract from the Javadoc of the encode method:
public final CoderResult encode(CharBuffer in,
ByteBuffer out,
boolean endOfInput)
Encodes as many characters as possible from the given input buffer, writing the results to the given output buffer...
In addition to reading characters from the input buffer and writing bytes to the output buffer, this method returns a CoderResult object to describe its reason for termination:
...
CoderResult.OVERFLOW indicates that there is insufficient space in the output buffer to encode any more characters. This method should be invoked again with an output buffer that has more remaining bytes. This is typically done by draining any encoded bytes from the output buffer.
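A possible implementation could be:
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.util.ArrayList;
import java.util.List;

public static String[] SplitStringByByteLength(String src, String encoding, int maxsize) {
    Charset cs = Charset.forName(encoding);
    CharsetEncoder coder = cs.newEncoder();
    ByteBuffer out = ByteBuffer.allocate(maxsize); // output buffer of required size
    CharBuffer in = CharBuffer.wrap(src);
    List<String> ss = new ArrayList<>(); // a list to store the chunks
    int pos = 0;
    while (true) {
        CoderResult cr = coder.encode(in, out, true); // try to encode as much as possible
        if (cr.isError()) {
            // malformed input (e.g. a lone surrogate) or a character the charset cannot encode
            throw new IllegalArgumentException("cannot encode input: " + cr);
        }
        int newpos = src.length() - in.length(); // in.length() = chars not yet encoded
        if (cr.isOverflow() && newpos == pos) {
            throw new IllegalArgumentException("maxsize is too small for a single character");
        }
        ss.add(src.substring(pos, newpos)); // add what has been encoded to the list
        pos = newpos; // store new input position
        out.clear(); // and empty the output buffer for the next chunk
        if (!cr.isOverflow()) {
            break; // everything has been encoded
        }
    }
    return ss.toArray(new String[0]);
}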
This splits the original string into chunks that, when encoded, fill byte arrays of the given size as much as possible (assuming, of course, that maxsize is not ridiculously small).
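To see why this approach never cuts a character in half, here is a small sketch (the class and method names are illustrative) that encodes into a deliberately small buffer. CharsetEncoder.encode stops with OVERFLOW before a character whose bytes would not fit and leaves that character unread in the input buffer, which is exactly what the loop above relies on when it computes newpos.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class EncoderOverflowDemo {

    // Encodes as much of `text` as fits into `capacity` bytes of UTF-8 and
    // returns how many chars (UTF-16 units) the encoder consumed.
    static int charsThatFit(String text, int capacity) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(capacity);
        CoderResult result = encoder.encode(in, out, true);
        // OVERFLOW: the output buffer is full; UNDERFLOW: all input was consumed
        System.out.println(result + ", bytes written: " + out.position());
        return in.position();
    }

    public static void main(String[] args) {
        // "ab" followed by U+1F600, a surrogate pair that needs 4 bytes in UTF-8.
        // With 5 bytes of room, 'a' and 'b' fit but the emoji does not.
        System.out.println(charsThatFit("ab\uD83D\uDE00", 5)); // 2
    }
}
```

With a 6-byte buffer the same call consumes all 4 chars, since the pair then fits as a whole.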

The problem lies in the existence of Unicode "supplementary characters" (see Javadoc of the Character class), that take up two "character places" (a surrogate pair) in a String, and you shouldn't split your String in the middle of such a pair.
An easy approach to splitting would be to assume the worst case, that a single Unicode code point takes at most four bytes in UTF-8, and, for a limit of 400 bytes, split the string after every 99 code points (using string.offsetByCodePoints(pos, 99)). In most cases you won't fill the 400 bytes, but you'll be on the safe side.
Some words about code points and characters
When Java started, Unicode had fewer than 65536 characters, so Java decided that 16 bits were enough for a character. Later the Unicode standard exceeded the 16-bit limit, and Java had a problem: a single Unicode element (now called a "code point") no longer fit into a single Java character.
They decided on an encoding into 16-bit units (UTF-16) that is 1:1 for most common code points and uses two "characters" for the exotic code points beyond the 16-bit limit (a pair built from so-called "surrogate characters", taken from the reserved range U+D800 to U+DFFF below 65535). So now it can happen that, e.g., string.charAt(5) and string.charAt(6) must be seen in combination, as a "surrogate pair" that together encodes one Unicode code point.
That's the reason why you shouldn't split a string at an arbitrary index.
To help the application programmer, the String class then got a new set of methods, working in code point units, and e.g. string.offsetByCodePoints(pos, 99) means: from the index pos, advance by 99 code points forward, giving an index that will often be pos+99 (in case the string doesn't contain anything exotic), but might be up to pos+198, if all the following string elements happen to be surrogate pairs.
Using the code-point methods, you are safe not to land in the middle of a surrogate pair.
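A minimal sketch of the worst-case approach described above, generalized from the 400-byte example: with at most four UTF-8 bytes per code point, a chunk of maxBytes / 4 code points always fits (100 for 400 bytes; the 99 above leaves a little extra margin). The class and method names are illustrative; codePointCount and offsetByCodePoints are standard String methods.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointSplitter {

    // Splits `src` into chunks of at most maxBytes / 4 code points. Since a code
    // point takes at most 4 bytes in UTF-8, each chunk encodes to <= maxBytes
    // bytes, and no chunk ends in the middle of a surrogate pair.
    static List<String> split(String src, int maxBytes) {
        int perChunk = maxBytes / 4;
        if (perChunk < 1) {
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> chunks = new ArrayList<>();
        int left = src.codePointCount(0, src.length()); // code points not yet assigned
        int pos = 0;
        while (left > 0) {
            int take = Math.min(perChunk, left);
            int next = src.offsetByCodePoints(pos, take); // skips whole surrogate pairs
            chunks.add(src.substring(pos, next));
            pos = next;
            left -= take;
        }
        return chunks;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b\u00e1"; // a, U+1F600 (surrogate pair), b, á
        System.out.println(split(s, 8));   // chunks of at most 2 code points
    }
}
```

The trade-off is the one the answer mentions: chunks are safe but usually smaller than the limit.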

Having problems with splitting a String into max 1Mb size subStrings

I have to split a String into strings of at most 1 MB. With UTF-8 as the character encoding, some letters take up more than 1 byte, so I have to avoid splitting a character in the middle (for example, 'á' is 2 bytes, so one byte can't go at the end of one String and the other at the beginning of the next String).
public static List<String> cutString(String original, int chunkSize, String encoding) throws UnsupportedEncodingException {
    List<String> strings = new ArrayList<>();
    final int end = original.length();
    int from = 0;
    int to = 0;
    do {
        to = (to + chunkSize > end) ? end : to + chunkSize;
        String chunk = original.substring(from, to); // get chunk
        while (chunk.getBytes(encoding).length > chunkSize) { // cut the chunk from the end
            chunk = original.substring(from, --to);
        }
        strings.add(chunk); // add chunk to collection
        from = to; // next chunk
    } while (to < end);
    return strings;
}
I'm using the following method to generate an example String:
private static String createDataSize(int msgSize) {
StringBuilder sb = new StringBuilder(msgSize);
for (int i = 0; i < msgSize; i++) {
sb.append("a");
}
return sb.toString();
}
Calling the method as follows:
String exampleString = createDataSize(1024*1024*3);
cutString(exampleString, 1024*1024, "UTF-8");
It has no problems: I get back 3 Strings, as the 3-megabyte String is split into three 1-megabyte Strings. But if I change the createDataSize() method to append 'á' instead, so the example String consists only of "áááááá...", the inner while loop in the cutString method takes forever, since it removes the 'á' characters one by one until the chunk fits into the given size. How can I improve the inner while loop, or come up with a similar solution? The Strings can be smaller than 1 megabyte, just not bigger!

Using binary search logic would clearly fit your need.
Instead of decrementing by one, start by cutting the chunk in half: if you still have some room, add back half of the remaining range; if not, remove half again. And so on.
A simpler solution would be to remove, in one step, as many characters as there are excess bytes (chunk.getBytes(encoding).length - chunkSize), since each character takes at least one byte. Then see how many bytes you can still add if you want to fill the chunk completely.
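A minimal sketch of the binary-search idea, keeping the shape of cutString from the question but taking a java.nio.charset.Charset instead of an encoding name (an assumption that avoids UnsupportedEncodingException). It also backs off by one char if the best cut would split a surrogate pair. Each chunk then needs a logarithmic number of getBytes() calls instead of one per removed character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ByteChunker {

    // Splits `original` into chunks that encode to at most chunkSize bytes.
    // For each chunk, a binary search finds the largest end index that still
    // fits, instead of shrinking the chunk one char at a time.
    // Assumes every char encodes to at least one byte (true for UTF-8).
    static List<String> cutString(String original, int chunkSize, Charset charset) {
        List<String> chunks = new ArrayList<>();
        final int end = original.length();
        int from = 0;
        while (from < end) {
            int lo = from + 1;                                  // shortest candidate end
            int hi = (int) Math.min(end, (long) from + chunkSize); // >= 1 byte per char
            int best = from;                                    // longest end known to fit
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (original.substring(from, mid).getBytes(charset).length <= chunkSize) {
                    best = mid;   // fits: try a longer chunk
                    lo = mid + 1;
                } else {
                    hi = mid - 1; // too big: try a shorter chunk
                }
            }
            // Do not end a chunk between the two halves of a surrogate pair.
            if (best > from && best < end
                    && Character.isHighSurrogate(original.charAt(best - 1))
                    && Character.isLowSurrogate(original.charAt(best))) {
                best--;
            }
            if (best == from) {
                throw new IllegalArgumentException("chunkSize is too small for a single character");
            }
            chunks.add(original.substring(from, best));
            from = best;
        }
        return chunks;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3 * 1024 * 1024; i++) {
            sb.append('\u00e1'); // 'á', 2 bytes in UTF-8
        }
        List<String> chunks = cutString(sb.toString(), 1024 * 1024, StandardCharsets.UTF_8);
        System.out.println(chunks.size()); // 6: each chunk holds 524288 'á' = 1 MB
    }
}
```

The search is valid because the encoded length of original.substring(from, mid) never decreases as mid grows.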

Getting Exception in Converting ByteArray to String with Fixed length

I want to convert bytes into a String.
I have an Android application and I am using a flat file for data storage.
Suppose I have lots of records in my flat file.
In this flat-file database my record size is fixed at 10 characters, and I store many String records in sequence.
When I read one record from the flat file, I get a fixed number of bytes, because I wrote 10 bytes for every record.
If my string is S = "abc123";
then it is stored in the flat file as the ASCII values of abc123, and the rest of the bytes are 0.
So the byte array is [97, 98, 99, 49, 50, 51, 0, 0, 0, 0].
When I want to get my actual string back from the byte array, I use the code below, and it works fine.
But when my inputString is "1234567890", it causes a problem.
public class MainActivity extends Activity {
public static short messageNumb = 0;
public static short appID = 16;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
// record with size 10 and its in bytes.
byte[] recordBytes = new byte[10];
// fill record by 0's
Arrays.fill(recordBytes, (byte) 0);
// input string
String inputString = "abc123";
int length = 0;
int SECTOR_LENGTH = 10;
// convert in bytes
byte[] inputBytes = inputString.getBytes();
// set how many bytes we have to write.
length = SECTOR_LENGTH < inputBytes.length ? SECTOR_LENGTH
: inputBytes.length;
// copy bytes in record size.
System.arraycopy(inputBytes, 0, recordBytes, 0, length);
// Here i write this record in the file.
// Now time to read record from the file.
// Suppose i read one record from the file successfully.
// convert this read bytes to string which we wrote.
Log.d("TAG", "String is = " + getStringFromBytes(recordBytes));
}
public String getStringFromBytes(byte[] inputBytes) {
String s;
s = new String(inputBytes);
return s = s.substring(0, s.indexOf(0));
}
}
But I get a problem when my string has all 10 characters. Then there is no 0 byte left in my byte array, so in this line
s = s.substring(0, s.indexOf(0));
I am getting the below exception:
java.lang.StringIndexOutOfBoundsException: length=10; regionStart=0; regionLength=-1
at java.lang.String.startEndAndLength(String.java:593)
at java.lang.String.substring(String.java:1474)
So what can I do when my string length is 10?
I have two solutions: I can check whether inputBytes.length == 10 and skip the substring call in that case, or I can check whether the byte array contains a 0.
But I don't want to use the first solution, because I use this in many places in my application. So is there any other way to achieve this?
Please suggest a good solution that works in every case. I think the second solution would be best (check whether the byte array contains a 0, and only then apply substring).

public String getStringFromBytes(byte[] inputBytes) {
String s;
s = new String(inputBytes);
int zeroIndex = s.indexOf(0);
return zeroIndex < 0 ? s : s.substring(0, zeroIndex);
}
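A variant of the same fix, sketched under the assumption that records are written with an explicit charset: it looks for the first 0 byte in the array itself and decodes only the bytes before it, instead of relying on the platform default charset. getStringFromRecord is an illustrative name. UTF-8 is an assumed choice; scanning for 0 is still safe there, because a 0 byte never occurs inside a multi-byte UTF-8 character.

```java
import java.nio.charset.StandardCharsets;

final class RecordDecoder {

    // Decodes a fixed-size, zero-padded record: everything up to the first 0 byte,
    // or the whole array if the record is completely filled (no padding).
    static String getStringFromRecord(byte[] record) {
        int len = 0;
        while (len < record.length && record[len] != 0) {
            len++;
        }
        // Assumption: records were written with getBytes(StandardCharsets.UTF_8).
        return new String(record, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] padded = {97, 98, 99, 49, 50, 51, 0, 0, 0, 0};
        byte[] full = "1234567890".getBytes(StandardCharsets.UTF_8);
        System.out.println(getStringFromRecord(padded)); // abc123
        System.out.println(getStringFromRecord(full));   // 1234567890
    }
}
```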

I think this line causes the error:
s = s.substring(0, s.indexOf(0));
s.indexOf(0)
returns -1. Perhaps you should specify the ASCII code for zero, which is 48,
so this will work: s = s.substring(0, s.indexOf(48));
Check the documentation for indexOf(int):
public int indexOf (int c)    Since: API Level 1
Searches in this string for the first index of the specified character. The search for the character starts at the beginning and moves towards the end of this string.
Parameters: c, the character to find.
Returns: the index in this string of the specified character, -1 if the character isn't found.

Unique Bytes from a String

I want to get bytes from a string value (for encryption/decryption purposes).
I have used the getBytes() method for that, but every time I call getBytes(),
it returns a new byte array.
I want a unique byte array for a particular string.
How? Also, I want to store that information (string or bytes) in a file, and get it back in the form of bytes.

getBytes() returns a new byte[] every time, but the contents are the same.
Please check the sample below:
byte[] b1 = "abc".getBytes();
byte[] b2 = "abc".getBytes();
if(b1 == b2)
{
System.out.println("Equal Not possible");//Not this
}
if(Arrays.equals(b1, b2))
{
System.out.println("Equal possible");//Gets printed
}
As the contents of the arrays are equal, it makes no difference to any kind of encryption/decryption algorithm in the whole Java world!
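The question also asks how to store the bytes in a file and get them back. A minimal sketch using java.nio.file.Files (Java 7+); the class name and file location are placeholders, and UTF-8 is an assumed choice so that the bytes do not depend on the platform default charset. The array read back is again a new object, but Arrays.equals confirms the contents match.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class BytesFileRoundTrip {

    // Writes the string's UTF-8 bytes to `file` and reads them back.
    static byte[] writeAndReadBack(String text, Path file) {
        byte[] written = text.getBytes(StandardCharsets.UTF_8);
        try {
            Files.write(file, written);      // creates or overwrites the file
            return Files.readAllBytes(file); // a new array with the same contents
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("secret", ".bin"); // placeholder location
        byte[] back = writeAndReadBack("abc", file);
        System.out.println(Arrays.equals(back, "abc".getBytes(StandardCharsets.UTF_8))); // true
        Files.delete(file);
    }
}
```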

If String gave you the same byte array every time, it would violate the contract of the method. Here is why:
String a = "test";
byte[] abytes1 = a.getBytes();
abytes1[0] = 0; // we are modifying the byte array.
// There is no way to prevent this!
// some other caller later on does this:
byte[] abytes2 = a.getBytes();
If abytes2 was the same array as abytes1, it would have a 0 as its first entry and would not match the value of the string. String.getBytes() must create a new array every time in case a caller decides to modify the array.

Convert Java string to byte array

I have a byte array which I'm encrypting then converting to a string so it can be transmitted. When I receive the string I then have to convert the string back into a byte array so it can be decrypted. I have checked that the received string matches the sent string (including length) but when I use something like str.getBytes() to convert it to a byte array, it does not match my original byte array.
example output:
SENT: WzShnf/fOV3NZO2nqnOXZbM1lNwVpcq3qxmXiiv6M5xqC1A3
SENT STR: [B@3e4a9a7d
RECEIVED STR: [B@3e4a9a7d
RECEIVED: W0JAM2U0YTlhN2Q=
Any ideas how I can convert the received string to a byte array that matches the sent byte array?
Thanks

You used array.toString(), which is implemented like this:
return "[B@" + Integer.toHexString(this.hashCode());
(In fact it inherits the definition from Object, and the part before the @ is simply the result of getClass().getName().)
And the hashCode here does not depend on the content.
Instead, use new String(array, encoding).
Of course, this only works for byte arrays that really represent text in that encoding (so the resulting strings contain readable characters), not for arbitrary arrays. For those, better use Base64, as Bozho recommended (but make sure to use it on both sides of the channel).

This looks like Base64. Take a look at commons-codec Base64 class.
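On Java 8 and later the JDK ships java.util.Base64, which can stand in for the commons-codec class (a swapped-in alternative, not the class named above). A sketch of the round trip, encoding on the sending side and decoding on the receiving side; the class and method names are illustrative. Decoding the question's RECEIVED value W0JAM2U0YTlhN2Q= this way gives the text [B@3e4a9a7d, i.e. the array's toString() rather than its contents.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64RoundTrip {

    // Sender: turn arbitrary (e.g. encrypted) bytes into printable ASCII text.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Receiver: turn that text back into the identical bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] cipherText = {(byte) 0x9F, 0x00, 0x7F, (byte) 0xFF}; // arbitrary, non-text bytes
        String sent = encode(cipherText);
        byte[] received = decode(sent);
        System.out.println(sent);                                 // nwB//w==
        System.out.println(Arrays.equals(cipherText, received)); // true
    }
}
```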

You can't just use getBytes() on two different machines, since getBytes() uses the platform's default charset.
Decode and encode the array with a specified charset (e.g. UTF-8) to make sure you get the correct results.
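A small sketch of this suggestion, naming the charset on both ends instead of relying on the platform default (the class and method names are illustrative). It round-trips text safely; as the second check shows, arbitrary encrypted bytes are generally not valid UTF-8, so for those Base64 remains the safer transport.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetRoundTrip {

    // Sender and receiver both use UTF-8 explicitly, so the result does not
    // depend on either machine's default charset.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "na\u00efve caf\u00e9"; // "naïve café"
        System.out.println(fromBytes(toBytes(text)).equals(text)); // true

        // 0xFF is not valid UTF-8: it is replaced with U+FFFD on decoding,
        // so encoding the result does not give back the original bytes.
        byte[] raw = {(byte) 0xFF, 0x41};
        System.out.println(Arrays.equals(toBytes(fromBytes(raw)), raw)); // false
    }
}
```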

First convert your byte array to a readable string:
String line = Arrays.toString(your_array);
Then send it, and on the receiving side use the function below to turn it back into a byte array:
public static byte[] StringToByteArray(String line)
{
    // line looks like "[12, -3, 45]", as produced by Arrays.toString(byte[])
    String some = line.substring(1, line.length() - 1);
    if (some.isEmpty())
        return new byte[0]; // "[]" stands for an empty array
    String[] parts = some.split(", ");
    byte[] a = new byte[parts.length];
    for (int i = 0; i < parts.length; i++)
        a[i] = Byte.parseByte(parts[i]);
    return a;
}
