Input string compressed as string - java

I want to compress or transform a string into a new string.
For example:
input string:
USERNAME/REGISTERID
output string after compress:
<some-string-in-UTF8-format>
output string after decompress:
USERNAME/REGISTERID
Is there a compression or hashing method for this kind of transformation?
I would prefer a solution in Java, or an algorithm described in basic steps.
I have already read about Huffman coding and tried to use it, but the compressed output consists of bytes that are not valid UTF-8.

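You could use ZipOutputStream.
ByteArrayOutputStream result = new ByteArrayOutputStream();
try (ZipOutputStream zip = new ZipOutputStream(result)) {
    zip.putNextEntry(new ZipEntry("data")); // a zip stream needs an entry before any write
    zip.write("myString".getBytes(StandardCharsets.UTF_8));
} // closing the stream flushes the remaining compressed data into result
byte[] bytes = result.toByteArray();
You just have to figure out the right string encoding for the resulting bytes. This can be done with a Base64 representation.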
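The compress-then-Base64 idea can be sketched roughly as follows. This is only a sketch under some assumptions: it uses GZIPOutputStream rather than ZipOutputStream (no entry bookkeeping), the java.util.Base64 class from Java 8 or later instead of commons-codec, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: class and method names are illustrative only.
final class StringCompressor {

    // Compress the UTF-8 bytes of the text, then encode them as URL-safe Base64.
    static String compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the buffer holds the complete gzip data.
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buffer.toByteArray());
    }

    // Reverse the two steps: Base64-decode, then decompress back to text.
    static String decompress(String encoded) {
        byte[] compressed = Base64.getUrlDecoder().decode(encoded);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[256];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String packed = compress("USERNAME/REGISTERID");
        System.out.println(packed);             // ASCII-only text, safe to store as a string
        System.out.println(decompress(packed)); // prints USERNAME/REGISTERID
    }
}
```

The output contains only letters, digits, - and _, so it is valid in UTF-8 and in practically any other charset.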

Take a look at Base64, commons-codec, etc.
Commons-codec provides a very simple Base64 class to use.
You can't use a hash function, as hash functions are meant to be one-way only: given an MD5 or SHA-1 hash, you should not be able to decode it to find out what the source message was.

See iconv and mb_convert_encoding. For encoding, maybe consider base64_encode.

If you have database IDs for your identifiers, as your names suggest, why not use that number as the encoding? (Store it as a string if you like.)
You shouldn't expect a general-purpose compression algorithm to do better: they all need some headers, and the header alone is probably longer than your input string.
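The overhead is easy to measure. The sketch below assumes GZIP as the compression format (other formats have smaller headers, but the effect on a 19-character input is similar), and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: class and method names are illustrative only.
final class CompressionOverhead {

    // Number of bytes produced by gzip-compressing the UTF-8 form of the text.
    static int gzipSize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        String input = "USERNAME/REGISTERID";
        System.out.println("input bytes: " + input.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("gzip bytes:  " + gzipSize(input));
    }
}
```

The gzip format alone spends 18 bytes on its fixed header and trailer, so the compressed form of such a short string ends up larger than the input.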

It looks like someone is asking you to obfuscate username/password combinations. This is probably not a good idea, since it suggests security where there is none. You might as well implement a ROT13 encryption for this and use double ROT13 to decrypt.

Related

base64 url safe removes =

The following code (using the commons-codec Base64 class):
byte[] a = Hex.decodeHex("9349c513ed080dab".toCharArray());
System.out.println(Base64.encodeBase64URLSafeString(a));
System.out.println(Base64.encodeBase64String(a));
gives the following output:
k0nFE-0IDas //should be k0nFE-0IDas=
k0nFE+0IDas=
Base64.encodeBase64URLSafeString(a) returns k0nFE-0IDas instead of k0nFE-0IDas=. Why is this happening?
Why is this happening?
Because that's what it's documented to do:
Note: no padding is added.
The = characters at the end of a base64 string are called padding. They're used to make sure that the final string's length is a multiple of 4 characters - but they're not really required, in terms of information theory, so it's reasonable to remove them so long as you then convert the data back to binary using a method which doesn't expect padding. The Apache Codec Base64 class claims it transparently handles both regular and URL-safe base64, so presumably does handle a lack of padding.
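For comparison, here is a small sketch of the same behaviour with the java.util.Base64 class from Java 8 or later (an assumption, since the question uses commons-codec). Note that java.util.Base64's URL-safe encoder pads by default and only drops the padding when asked to via withoutPadding(); its decoder accepts both forms. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class: names are illustrative only.
final class PaddingDemo {

    // The eight bytes from the question (hex 9349c513ed080dab).
    static final byte[] DATA = {
        (byte) 0x93, 0x49, (byte) 0xc5, 0x13, (byte) 0xed, 0x08, 0x0d, (byte) 0xab
    };

    // URL-safe alphabet (- and _ instead of + and /), padding omitted.
    static String urlSafeNoPadding(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    // Standard alphabet with = padding.
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // The decoder treats padding as optional.
    static byte[] decodeUrlSafe(String text) {
        return Base64.getUrlDecoder().decode(text);
    }

    public static void main(String[] args) {
        System.out.println(urlSafeNoPadding(DATA)); // prints k0nFE-0IDas
        System.out.println(standard(DATA));         // prints k0nFE+0IDas=
        System.out.println(Arrays.equals(DATA, decodeUrlSafe("k0nFE-0IDas")));  // prints true
        System.out.println(Arrays.equals(DATA, decodeUrlSafe("k0nFE-0IDas="))); // prints true
    }
}
```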

When is encoding being relevant in Java?

This might be a bit of a beginner question, but it's fairly relevant when debugging encoding problems in Java: at what point does an encoding become relevant to a String object?
Say I have a String object that I want to save to a file. Does the String object itself use some sort of encoding that I should manipulate, or do I only specify the encoding when I create the stream of bytes to save?
The same applies to importing: when I open a file and get its bytes, I assume there's no encoding at hand, only bytes. When I parse these bytes into a String, I have to use an encoding to work out which characters they are. After I parse those bytes, does the String (in memory) carry some sort of meta information about the encoding, or is this handled entirely by the JVM?
This is vital because I'm having file import/export issues and I need to understand at which point I should worry about getting the encoding right.
I hope I explained my question well, and thank you in advance!
Java strings do not carry any encoding information. They don't know where they came from, and they don't know where they are going. In memory, a Java String is always exposed as a sequence of UTF-16 code units.
You (optionally) specify what encoding to use whenever you want to turn a String into a sequence of bytes (e.g., to save to a file), or when you want to turn a sequence of bytes (e.g., read from a file) into a String.
Encoding matters for a String when you are serializing it to, or deserializing it from, disk or the web. There are multiple text encodings: ASCII, Latin-1, UTF-8 and UTF-16 (which comes in two byte orders, big-endian UTF-16BE and little-endian UTF-16LE).
See InputStreamReader for how to load a String from text encoded in a non-default format.
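A minimal sketch of where the encoding enters the picture, assuming an in-memory byte array stands in for the file (a FileInputStream would be used the same way). The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: names are illustrative only.
final class CharsetDemo {

    // Decode bytes into a String; the charset is chosen here, not stored in the String.
    static String read(byte[] data, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo"; // "héllo", written with an escape to avoid source-encoding issues

        // The same String yields different bytes depending on the charset used when writing.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length);   // prints 6
        System.out.println(latin1.length); // prints 5

        // Reading with the charset used for writing restores the text; any other charset garbles it.
        System.out.println(read(utf8, StandardCharsets.UTF_8).equals(original));      // prints true
        System.out.println(read(utf8, StandardCharsets.ISO_8859_1).equals(original)); // prints false
    }
}
```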

How to extract byte-array from one xml and store it in another in Java

I am using DocumentBuilderFactory and DocumentBuilder to parse an XML document, so it is a DOM parser.
What I am trying to do is extract byte-array data (it's an image encoded in Base64), store it in an object, and later in the code write it out to another XML document, again encoded in Base64.
What is the best way to store it in the meantime: as a String or as a byte array?
How can I best extract the byte-array data and write it out?
I am not experienced with this, so I wanted to get the group's opinion.
UPDATE: I am given the XML and have no control over it. The incoming data is Base64 encoded:
<byte-array>
... base64 encoded image ...
</byte-array>
With the parser I have, I need to store this node, and the question is whether that should be as bytes or as a string, before writing it out to another node in a new XML document, again in Base64 encoding.
Thanks
The image should be stored in the first xml as a string. Perhaps something like this:
<img src="data:image/gif;base64,sssssssssssss"/>
If you need to write the same data to the second XML, just reuse the string, which is already encoded. If you need to change the image, get the attribute (element.getAttribute("src")), decode it with one of the many libraries (e.g. Apache Commons Codec), and then re-encode it as a string for the second XML.
UPDATE RESPONSE:
As to your update: inside the <byte-array> element you should have plain text. It can be stored as text and then used as text in the second XML.
Base64 encoding is generally used when you need to transport binary data over text-based protocols such as HTTP. It encodes the binary data into a set of characters that can be sent over a text-based protocol without any encoding/decoding problems.
I'm not sure whether you are sending the XML over the wire, but you can use either of the following methods:
Send the Base64 string as a simple string. In this case the onus of encoding and decoding is on the sending and receiving applications.
Use the standard base64Binary XML type. In this case, the parser will take care of decoding the string.
There's a class in Apache Commons that will help you ensure data integrity with Base64:
import org.apache.commons.codec.binary.Base64;
String yourString = "testing";
byte[] encoded = Base64.encodeBase64(yourString.getBytes());
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html
That said, Base64 data consists only of ASCII letters and digits plus +, / and the = padding character, so there shouldn't be any data loss if you store it in a String.
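If the image bytes do need to be inspected or changed between the two documents, the extraction and re-encoding could look roughly like the sketch below. It assumes the element is named byte-array as in the question's update, uses java.util.Base64 from Java 8 or later, and invents the output element name image and the class and method names purely for illustration. The MIME decoder is used because it ignores line breaks inside the element text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: class, method and output element names are illustrative only.
final class ByteArrayCopy {

    // Parse the XML and decode the text of the first <byte-array> element into raw bytes.
    static byte[] extract(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            String text = doc.getElementsByTagName("byte-array").item(0).getTextContent();
            return Base64.getMimeDecoder().decode(text.trim());
        } catch (Exception e) {
            throw new IllegalStateException("could not read byte-array element", e);
        }
    }

    // Build a new document whose root element holds the bytes, Base64 encoded again.
    static Document write(byte[] data) {
        try {
            Document out = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = out.createElement("image"); // assumed element name
            root.setTextContent(Base64.getEncoder().encodeToString(data));
            out.appendChild(root);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not build output document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><byte-array>\n  aGVsbG8=\n</byte-array></root>";
        byte[] image = extract(xml);
        System.out.println(image.length); // prints 5
        System.out.println(write(image).getDocumentElement().getTextContent()); // prints aGVsbG8=
    }
}
```

If the data is only copied from one document to the other unchanged, keeping the Base64 text as a String avoids the decode and re-encode steps entirely.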

byte array to string in java

SecureRandom random = SecureRandom.getInstance("SHA1PRNG");
byte[] salt = new byte[16];
random.nextBytes(salt);
I would like to convert salt to a string to store/read. I don't seem to be able to get this to work. I have read that I need to use the right encoding but I'm not sure what encoding to use. I have tried the following but get junk:
String s = new String(salt, "UTF-8");
String s = new String(salt, "UTF-16");
String s = new String(salt);
Edit: For context, I'm trying to work through and understand this code. I'm trying to view the salt and password so I can monkey with the code.
You need to use the Base64 class (Apache Commons) or sun.misc.BASE64Encoder/BASE64Decoder to encode the byte array.
Like AVD says, the solution is to use Base64 encoding or some other binary-as-text encoding. (For example, Hex encoding.)
Why? Because binary data is not text!
What you are currently doing is telling the String constructor that the bytes are text that has been correctly encoded as UTF-8 or UTF-16 or (in the last case) the platform's default encoding. This is patently false. The "junk" you are seeing is what you get if you attempt to decode random binary stuff as text.
Worse still, the decoding process is probably lossy when you apply it to random binary data. For instance, some sequences of bytes are simply invalid if you try to treat them as UTF-8. (The spec for UTF-8 says so!) When the UTF-8 decoder sees one of these invalid sequences, it replaces it with a substitute character (U+FFFD in Java, often displayed as '?') that means "invalid character". If you then turn the characters in the string back into bytes, you will get a different byte sequence from the one that you started with. That's probably a disaster for your use case.
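A short sketch of both points, assuming java.util.Base64 from Java 8 or later in place of the Apache Commons or sun.misc classes mentioned above. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: class and method names are illustrative only.
final class SaltText {

    // Binary-as-text encoding: every byte value survives the round trip.
    static String toText(byte[] salt) {
        return Base64.getEncoder().encodeToString(salt);
    }

    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    // What the question attempted: treat arbitrary bytes as UTF-8 text and convert back.
    static byte[] viaUtf8String(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);

        String stored = toText(salt);
        System.out.println(stored); // 24 printable ASCII characters
        System.out.println(Arrays.equals(salt, fromText(stored))); // prints true

        // 0xC3 starts a two-byte UTF-8 sequence, but 0x28 is not a valid continuation byte.
        byte[] invalid = {(byte) 0xC3, 0x28};
        System.out.println(Arrays.equals(invalid, viaUtf8String(invalid))); // prints false
    }
}
```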

Why does the Blowfish output in Java and PHP differ by only 2 chars?

I have Blowfish encryption code in PHP and in Java, used in both directions, that was working fine until today, when I ran into a problem.
The same content is encrypted differently in Java and PHP, and the outputs differ by only 2 characters, which is really weird.
PHP
wTHzxfxLHdMm/JMFnoh0hciS/JADvFFg
Java
wTHzxfxLHdMm/JMFnoh0hciS/D8DvFFg
-------------------------^^
As you can see, those two positions do not match. Unfortunately the value is a real email address and I can't share it. I was also unable to reproduce the problem with the few other values I tested. I tried changing the Base64 encoder class on the Java side, and that didn't help either.
The source code for PHP is here, and for Java is here.
What could I do to resolve this problem?
Let's have a look at your Java code:
String c = new String(Test.encrypt((new String("thevalue")).getBytes(),
(new String("mykey")).getBytes()));
...
System.out.println("Base64 encoded String:" +
new sun.misc.BASE64Encoder().encode(c.getBytes()));
What you are doing here is:
1. Convert the plaintext string to bytes, using the system's default encoding.
2. Convert the key to bytes, using the system's default encoding.
3. Encrypt the bytes.
4. Convert the encrypted bytes back to a string, using the system's default encoding.
5. Convert the encrypted string back to bytes, using the system's default encoding.
6. Encode these encrypted bytes using Base64.
The problem is in step 4. It assumes that an arbitrary byte array represents a string in your system's default encoding, and that encoding this string again gives back the same byte[]. This holds for some encodings (the ISO-8859 series, for example), but not for others. In Java, when a byte (or byte sequence) is not valid in the given encoding, it is replaced by a substitute character, which on converting back is typically mapped to byte 63 (ASCII ?). In fact, the documentation even says:
The behavior of this constructor when the given bytes are not valid in the default charset is unspecified.
In your case, there is no reason to do this at all - simply use the bytes which your encrypt method outputs directly to convert them to Base64.
byte[] encrypted = Test.encrypt("thevalue".getBytes(),
"mykey".getBytes());
System.out.println("Base64 encoded String:"+ new sun.misc.BASE64Encoder().encode(encrypted));
(Also note that I removed the superfluous new String("...") constructor calls here, though this does not relate to your problem.)
The point to remember: Never ever convert an arbitrary byte[], which did not come from encoding a string, to a string. Output of an encryption algorithm (and most other cryptographic algorithms, except decryption) certainly belongs to the category of data which should not be converted to a string.
And never ever use the System's default encoding, if you want portable programs.
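Decoding the two Base64 strings from the question seems to support this explanation: they differ in a single byte, which is 0x90 in the PHP output and 0x3F (ASCII ?) in the Java output. The sketch below finds that byte and then reproduces the substitution with a bytes-to-String-to-bytes round trip. US-ASCII is used for the round trip only because it is guaranteed to be available and treats 0x90 as invalid; the asker's actual default charset is unknown. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class: names are illustrative only.
final class CiphertextDiff {

    static final String PHP = "wTHzxfxLHdMm/JMFnoh0hciS/JADvFFg";
    static final String JAVA = "wTHzxfxLHdMm/JMFnoh0hciS/D8DvFFg";

    // Index of the first byte at which the two arrays differ, or -1 if they are equal.
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    // Steps 4 and 5 from the list above: bytes -> String -> bytes with one charset.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] php = Base64.getDecoder().decode(PHP);
        byte[] java = Base64.getDecoder().decode(JAVA);

        int i = firstDifference(php, java);
        System.out.printf("byte %d: PHP=0x%02x Java=0x%02x%n", i, php[i] & 0xff, java[i] & 0xff);
        // prints: byte 19: PHP=0x90 Java=0x3f

        // The PHP byte, pushed through a String in a charset where it is invalid, becomes '?'.
        byte[] damaged = roundTrip(new byte[] {php[i]}, StandardCharsets.US_ASCII);
        System.out.printf("after round trip: 0x%02x%n", damaged[0] & 0xff);
        // prints: after round trip: 0x3f
    }
}
```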
Your code seems right to me.
It looks like you have a trailing white space in the input to one of these programs, and it is only one. I'll tell you why:
Each of these 4-char blocks represents 3 bytes of the encrypted data. The differing part (JA and D8 in the 7th block) actually comes from a single differing byte.
wTHz xfxL HdMm /JMF noh0 hciS /JAD vFFg
wTHz xfxL HdMm /JMF noh0 hciS /D8D vFFg
If I have got it right your email address is 19 characters long. The 20th character in one of your input strings is a white space.
Question: Have you tried the associated PHP decryption library to decrypt the PHP generated encrypted text? Have you tried the associated JAVA decryption library to decrypt the JAVA encrypted text?
If both produce differing outputs, then one MUST fail decrypting.
Is that one PHP, or Java?
Whichever one it is -- I would try to duplicate another such failure with a publicly shareable string... give that string as a unit test -- to the developer or developers that created the encrypt/decrypt code in the language that the round-trip encrypt/decrypt fails in.
Then... wait for them to fix it.
Not sure of any faster solutions -- except maybe change encryption/decryption library providers... or roll your own...
