I am writing a program to generate some truly random numbers, so I am trying to write an algorithm that calculates them from various factors. I also want to define my own encoding, so that when the result is converted, it can only contain certain characters.
For example, if the user wants only lowercase letters, I want something like this:
char[] chars;
String result = new String(chars, "MY OWN ENCODING");
Is there a way to do this?
Thanks a lot!
You probably want to create a converter class:
public class Converter {

    private String charset;

    public Converter(String charset) {
        this.charset = charset;
    }

    public char[] convert(char[] input) {
        char[] result = new char[input.length];
        // do your own magic here
        return result;
    }
}
Once the magic converter logic is implemented and in place:
Converter converter = new Converter("abcdefghijklmnopqrstuvwxyz");
char[] result = converter.convert(myCharArray);
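The "magic" itself is left open above. As a rough sketch (this mapping is just one possible approach, not part of the original answer), convert() could map each input character onto the configured charset, for example by taking its value modulo the charset length:

public char[] convert(char[] input) {
    char[] result = new char[input.length];
    for (int i = 0; i < input.length; i++) {
        // Map the raw character value onto an index into the allowed charset
        // (the modulo mapping is an assumption; any deterministic mapping works).
        result[i] = charset.charAt(input[i] % charset.length());
    }
    return result;
}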
You seem to have a misunderstanding of what a "character encoding" is.
Character encodings, which you can specify as the second argument in the String(byte[], String) constructor, do not have anything to do with what you seem to have in mind.
A character encoding is what specifies how characters in a string are converted to raw bytes and vice versa. For example, the ASCII encoding specifies that a byte with the value 65 represents the character 'A', 66 is 'B', and so on.
A character encoding is not something you can use to "convert" or "encode" a char[] into a String object, and it is not something that can filter out, for example, non-lower-case characters or transform arbitrary characters into lower-case ones. Forget about character encodings for this purpose, because that's not what they are for.
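For illustration (this snippet is only an example of what encodings are for, not a solution to the question), an encoding is what you use when going between chars and raw bytes:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDemo {
    public static void main(String[] args) {
        // An encoding maps characters to bytes and back; it does not filter or transform them.
        byte[] bytes = "AB".getBytes(StandardCharsets.US_ASCII);    // [65, 66]
        String text = new String(bytes, StandardCharsets.US_ASCII); // "AB"
        System.out.println(Arrays.toString(bytes) + " -> " + text);
    }
}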
Just write some method that converts the output of your random generator to characters in a suitable range. Here's a simple example of a program that generates a string from random lower-case characters:
import java.util.Random;

public class Example {

    public static void main(String[] args) {
        System.out.println(generate(10));
    }

    public static String generate(int len) {
        Random random = new Random();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; ++i) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }
}
I have to encrypt a string using repeating-key XOR with the key "ICE".
I think I wrote a correct algorithm to do it, but the expected solution is 5 bytes shorter than my calculated hex string. Why? Up to those extra 5 bytes, the strings are equal.
Did I miss something about how to do repeating XOR?
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class ES5 {

    public static void main(String[] args) throws UnsupportedEncodingException {
        String str1 = "Burning 'em, if you ain't quick and nimble";
        String str2 = "I go crazy when I hear a cymbal";
        String correct1 = "0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a2622632427276527";

        byte[] cr = Encript(str1.getBytes(StandardCharsets.UTF_8), "ICE");
        String cr22 = HexFormat.of().formatHex(cr);
        System.out.println(cr22);
        System.out.println(correct1);
    }

    private static byte doXOR(byte b, byte b1) {
        return (byte) (b ^ b1);
    }

    private static byte[] Encript(byte[] bt1, String ice) {
        int x = 0;
        byte[] rt = new byte[bt1.length];
        for (int i = 0; i < bt1.length; i++) {
            rt[i] = doXOR(bt1[i], (byte) (ice.charAt(x) & 0x00FF));
            x++;
            if (x == 3) x = 0;
        }
        return rt;
    }
}
Hmmm. The String contains characters, and XOR works on bytes.
That's why the first thing is to run String.getBytes() to receive a byte array.
Here, depending on the characters and their encoding, the number of bytes can be larger than the number of characters, so you may want to print and compare those counts first.
Then you perform XOR on the bytes, which may take you into a completely different range of characters - so you cannot rely on new String(byte[]) at all. Instead you have to create a hex string representation of the byte[].
Finally, compare this hex string with the value in correct1. To me, that value already looks like a hex representation, so do not apply hex encoding to it again.
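A rough sketch of that advice (class and variable names here are illustrative, not from the question):

import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class XorHexSketch {
    public static void main(String[] args) {
        String plaintext = "Burning 'em, if you ain't quick and nimble";
        byte[] input = plaintext.getBytes(StandardCharsets.UTF_8);
        byte[] key = "ICE".getBytes(StandardCharsets.UTF_8);

        // Compare character count and byte count before doing anything else.
        System.out.println(plaintext.length() + " chars, " + input.length + " bytes");

        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = (byte) (input[i] ^ key[i % key.length]);
        }

        // Hex-encode the raw bytes; do not go through new String(byte[]).
        System.out.println(HexFormat.of().formatHex(out));
    }
}

HexFormat requires Java 17; on older versions, formatting each byte with String.format("%02x", b) does the same job.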
I have some Java code that converts a hexadecimal string into bytes. It seems to work okay for very short hexadecimal strings but fails with an error if I use a long string, and I can't figure out why. I'm new to Java and programming in general. Feel free to point out any other areas where I could improve.
Here is my code:
public class Hextobinary {

    static String hexToBinary(String hex) {
        int i = Integer.parseInt(hex, 16);
        String bin = Integer.toBinaryString(i);
        return bin;
    }

    public static void main(String[] args) {
        String h = "5F";
        String x = hexToBinary(h);
        System.out.println(x);
    }
}
Many Thanks
There is a built-in for this using DatatypeConverter, so you may not have to do it yourself.
import javax.xml.bind.DatatypeConverter;

public class HexUtils {

    public String toHex(final byte[] arr) {
        return DatatypeConverter.printHexBinary(arr);
    }

    public byte[] fromHex(final String str) {
        return DatatypeConverter.parseHexBinary(str);
    }
}
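For example (note that javax.xml.bind ships with Java 8 but was removed from the JDK in Java 11, where it has to be added as a separate dependency):

HexUtils utils = new HexUtils();
byte[] bytes = utils.fromHex("5F3A");   // two bytes: 0x5F, 0x3A
System.out.println(utils.toHex(bytes)); // prints "5F3A"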
You are parsing your string to an int. That will work for short hex strings, but not for longer ones. An int is 32 bits, or 8 hex characters. Any string longer than that will not fit into an int.
If you do write your own method, split the hex string into two-character chunks, process each pair of characters into a byte, and store the bytes in a byte array. That will allow you to deal with longer hex strings.
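A rough sketch of that approach (class and method names here are just for illustration):

public class HexToBytes {

    // Parse a hex string of even length into a byte array, two characters at a time.
    static byte[] hexToBytes(String hex) {
        byte[] result = new byte[hex.length() / 2];
        for (int i = 0; i < result.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            result[i] = (byte) ((hi << 4) | lo);
        }
        return result;
    }

    public static void main(String[] args) {
        for (byte b : hexToBytes("5F3A90")) {
            System.out.printf("%02X ", b); // prints: 5F 3A 90
        }
    }
}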
If you are using huge strings, the type int (Integer) of the variable i cannot store the value contained in the string hex. An Integer can only store values ranging from -80000000 (hexadecimal) to +7FFFFFFF. Any longer string will cause your function to produce false results.
One quick solution is to use the type Long (and the function parseLong) instead of Integer. The type Long can hold values ranging from -8000000000000000 (hexadecimal) to +7FFFFFFFFFFFFFFF. But if you need to convert longer strings, this is not going to work anymore.
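For example, this value still fits in a long even though it overflows an int (the string here is arbitrary):

String hex = "1F5F3A90C2";               // 10 hex digits: too big for int
long value = Long.parseLong(hex, 16);
System.out.println(Long.toBinaryString(value));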
Because MySQL 5.1 does not support 4 byte UTF-8 sequences, I need to replace/drop the 4 byte sequences in these strings.
I'm looking for a clean way to replace these characters.
Having the Apache libraries replace the characters with a question mark is fine for this case, although an ASCII equivalent would be nicer, of course.
N.B. The input is from external sources (e-mail names) and upgrading the database is not a solution at this point in time.
We ended up implementing the following method in Java for this problem.
Basically, it replaces any character with a code point higher than the last 3-byte UTF-8 character.
The offset calculations are there to make sure we stay on Unicode code point boundaries.
public static final String LAST_3_BYTE_UTF_CHAR = "\uFFFF";
public static final String REPLACEMENT_CHAR = "\uFFFD";

public static String toValid3ByteUTF8String(String s) {
    final int length = s.length();
    StringBuilder b = new StringBuilder(length);
    for (int offset = 0; offset < length; ) {
        final int codepoint = s.codePointAt(offset);
        // Replace anything above U+FFFF, i.e. anything that needs 4 bytes in UTF-8.
        if (codepoint > LAST_3_BYTE_UTF_CHAR.codePointAt(0)) {
            b.append(REPLACEMENT_CHAR);
        } else if (Character.isValidCodePoint(codepoint)) {
            b.appendCodePoint(codepoint);
        } else {
            b.append(REPLACEMENT_CHAR);
        }
        offset += Character.charCount(codepoint);
    }
    return b.toString();
}
Another simple solution is to use the regular expression [^\u0000-\uFFFF]. For example, in Java:
text.replaceAll("[^\\u0000-\\uFFFF]", "\uFFFD");
5-byte UTF-8 sequences begin with a 111110xx byte and 6-byte UTF-8 sequences begin with a 1111110x byte. It is important to note that no continuation bytes of 1-4-byte UTF-8 sequences are that large, because continuation bytes always have the form 10xxxxxx.
Therefore you can just go through the bytes, and every time you see a byte of the form 111110xx, emit only a '?' to the output stream/array and skip the next 4 bytes of the input; the same goes for the 6-byte sequences. A sketch of this scan follows.
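A minimal sketch of that byte-level scan, assuming the input is already a raw UTF-8 byte array (class and method names are illustrative):

import java.io.ByteArrayOutputStream;

public class Utf8Filter {

    // Replace the lead byte of 5- and 6-byte sequences with '?' and drop their continuation bytes.
    static byte[] filterLongSequences(byte[] utf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(utf8.length);
        for (int i = 0; i < utf8.length; i++) {
            int b = utf8[i] & 0xFF;
            if ((b & 0xFC) == 0xF8) {         // 111110xx: 5-byte sequence
                out.write('?');
                i += 4;                       // skip 4 continuation bytes
            } else if ((b & 0xFE) == 0xFC) {  // 1111110x: 6-byte sequence
                out.write('?');
                i += 5;                       // skip 5 continuation bytes
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }
}

The 4-byte sequences the question is actually about (lead byte 11110xxx) could be handled the same way, with a test of (b & 0xF8) == 0xF0 and a skip of 3 continuation bytes.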
I am new to Java, but I am very fluent in C++ and C#, especially C#. I know how to do XOR encryption in both C# and C++. The problem is that the algorithm I wrote in Java to implement XOR encryption seems to produce wrong results. The results are usually a bunch of spaces, and I am sure that is wrong. Here is the class below:
public final class Encrypter {

    public static String EncryptString(String input, String key) {
        int length;
        int index = 0, index2 = 0;
        byte[] ibytes = input.getBytes();
        byte[] kbytes = key.getBytes();
        length = kbytes.length;
        char[] output = new char[ibytes.length];
        for (byte b : ibytes) {
            if (index == length) {
                index = 0;
            }
            int val = (b ^ kbytes[index]);
            output[index2] = (char) val;
            index++;
            index2++;
        }
        return new String(output);
    }

    public static String DecryptString(String input, String key) {
        int length;
        int index = 0, index2 = 0;
        byte[] ibytes = input.getBytes();
        byte[] kbytes = key.getBytes();
        length = kbytes.length;
        char[] output = new char[ibytes.length];
        for (byte b : ibytes) {
            if (index == length) {
                index = 0;
            }
            int val = (b ^ kbytes[index]);
            output[index2] = (char) val;
            index++;
            index2++;
        }
        return new String(output);
    }
}
Strings in Java are Unicode - and Unicode strings are not general holders for bytes like ASCII strings can be.
You're taking a string and converting it to bytes without specifying what character encoding you want, so you're getting the platform default encoding - probably US-ASCII, UTF-8 or one of the Windows code pages.
Then you're performing arithmetic/logic operations on these bytes. (I haven't looked at what you're doing here - you say you know the algorithm.)
Finally, you're taking these transformed bytes and trying to turn them back into a string - that is, back into characters. Again, you haven't specified the character encoding (but you'll get the same as you got converting characters to bytes, so that's OK), but, most importantly...
Unless your platform default encoding uses a single byte per character (e.g. US-ASCII), then not all of the byte sequences you will generate represent valid characters.
So, two pieces of advice come from this:
Don't use strings as general holders for bytes
Always specify a character encoding when converting between bytes and characters.
In this case, you might have more success if you specifically give US-ASCII as the encoding. EDIT: This last sentence is not true. Refer back to point 1 above! Use bytes, not characters, when you want bytes.
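As a sketch of that advice (not the original poster's code, just an illustration), XOR the bytes obtained with an explicit charset and keep the result as bytes, for example hex-encoded for display:

import java.nio.charset.StandardCharsets;

public class XorBytes {

    // XOR input bytes against a repeating key; the result stays a byte[], not a String.
    static byte[] xor(byte[] input, byte[] key) {
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = (byte) (input[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] input = "hello world".getBytes(StandardCharsets.UTF_8);
        byte[] key = "key".getBytes(StandardCharsets.UTF_8);

        byte[] cipher = xor(input, key);
        // Represent the raw bytes as hex instead of forcing them into a String.
        StringBuilder hex = new StringBuilder();
        for (byte b : cipher) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex);

        // XOR-ing again with the same key recovers the original bytes.
        System.out.println(new String(xor(cipher, key), StandardCharsets.UTF_8));
    }
}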
If you use non-ASCII strings as keys, you'll get pretty strange results. The bytes in the kbytes array will be negative. Sign extension then means that val will come out negative. The cast to char will then produce a character in the FF80-FFFF range.
These characters will certainly not be printable, and depending on what you use to check the output you may be shown "box" or some other replacement characters.
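A tiny illustration of that sign-extension effect (the key byte value is arbitrary):

byte keyByte = (byte) 0xE9;   // a non-ASCII key byte, stored as -23
int val = 'a' ^ keyByte;      // sign extension: 0x61 ^ 0xFFFFFFE9 = 0xFFFFFF88
char c = (char) val;          // '\uFF88', far outside the printable ASCII range
System.out.printf("%08X -> %04X%n", val, (int) c);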
Is there any way in Java so that I can get Unicode equivalent of any character? e.g.
Suppose a method getUnicode(char c). A call getUnicode('÷') should return \u00f7.
You can do it for any Java char using this one-liner:
System.out.println( "\\u" + Integer.toHexString('÷' | 0x10000).substring(1) );
But it's only going to work for the Unicode characters up to Unicode 3.0, which is why I was careful to say you could do it for any Java char.
Java was designed way before Unicode 3.1 came out, and hence Java's char primitive is inadequate to represent Unicode 3.1 and up: there's no longer a "one Unicode character to one Java char" mapping (instead, a monstrous hack is used).
So you really have to check your requirements here: do you need to support Java char or any possible Unicode character?
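To see the difference in practice, here is a character outside what a single char can hold (U+1F600, written as its surrogate pair):

String s = "\uD83D\uDE00";                            // U+1F600: one code point, two chars
System.out.println(s.length());                       // 2
System.out.println(s.codePointCount(0, s.length()));  // 1
System.out.printf("U+%X%n", s.codePointAt(0));        // U+1F600
System.out.printf("\\u%04x%n", (int) s.charAt(0));    // \ud83d (just the high surrogate)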
If you have Java 5, use char c = ...; String s = String.format ("\\u%04x", (int)c);
If your source isn't a Unicode character (char) but a String, you must use charAt(index) to get the Unicode character at position index.
Don't use codePointAt(index) for this, because it returns full Unicode code points (up to 21 bits), which can't always be represented with just 4 hex digits (they can need up to 6). See the docs for an explanation.
[EDIT] To make it clear: this answer doesn't use Unicode code points but the units Java uses to represent Unicode characters (i.e. surrogate pairs), since char is 16-bit while Unicode code points need up to 21 bits. The question should really be "How can I convert a char to a 4-digit hex number?", since it's not (really) about Unicode.
private static String toUnicode(char ch) {
return String.format("\\u%04x", (int) ch);
}
char c = 'a';
String a = Integer.toHexString(c); // gives you---> a = "61"
I found this nice code on the web.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class Unicode {

    public static void main(String[] args) {
        System.out.println("Use CTRL+C to quit the program.");

        // Create the reader for reading in the text typed in the console.
        InputStreamReader inputStreamReader = new InputStreamReader(System.in);
        BufferedReader bufferedReader = new BufferedReader(inputStreamReader);

        try {
            String line = null;
            while ((line = bufferedReader.readLine()) != null && line.length() > 0) {
                for (int index = 0; index < line.length(); index++) {
                    // Convert the integer to a hexadecimal code...
                    String hexCode = Integer.toHexString(line.codePointAt(index)).toUpperCase();
                    // ...but it must be a four-digit value.
                    String hexCodeWithAllLeadingZeros = "0000" + hexCode;
                    String hexCodeWithLeadingZeros = hexCodeWithAllLeadingZeros.substring(hexCodeWithAllLeadingZeros.length() - 4);
                    System.out.println("\\u" + hexCodeWithLeadingZeros);
                }
            }
        } catch (IOException ioException) {
            ioException.printStackTrace();
        }
    }
}
Original Article
Are you set on using Unicode escapes? With Java it's simpler if you write your program to use the "dec" value (HTML code); then you can simply cast between the char and int data types:
char a = 98;
char b = 'b';
char c = (char) (b+0002);
System.out.println(a);
System.out.println((int)b);
System.out.println((int)c);
System.out.println(c);
Gives this output
b
98
100
d