I have a Base64 string. I am trying to decode it and then decompress it.
String textToDecode = "H4sIAAAAAAAAAAEgAN//0JTQtdGC0LDQu9C40LfQuNGA0L7QstCw0L3QvdGL0LmRCuyiIAAAAA==\n";
byte[] data = Base64.decode(textToDecode, Base64.DEFAULT);
String result = GzipUtil.decompress(data);
Code that I am using for decompression:
public static String decompress(byte[] compressed) throws IOException {
final int BUFFER_SIZE = 32;
ByteArrayInputStream is = new ByteArrayInputStream(compressed);
GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
StringBuilder string = new StringBuilder();
byte[] data = new byte[BUFFER_SIZE];
int bytesRead;
while ((bytesRead = gis.read(data)) != -1) {
string.append(new String(data, 0, bytesRead));
}
gis.close();
is.close();
return string.toString();
}
I should get this String:
Детализированный
Instead, I am getting this string with question mark (replacement) symbols:
Детализирован��ый
What is my mistake, and how do I solve it?
One problem is that when converting from bytes to String (which is internally Unicode), no encoding is given, so the platform default is used. Also, with a multi-byte encoding like UTF-8 you cannot take a fixed number of bytes (like 32) and expect every chunk to end in a valid sequence.
You evidently lost half of a multi-byte sequence at a chunk boundary, which also suggests the encoding is UTF-8.
final int BUFFER_SIZE = 32;
ByteArrayInputStream is = new ByteArrayInputStream(compressed);
GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] data = new byte[BUFFER_SIZE];
int bytesRead;
while ((bytesRead = gis.read(data)) != -1) {
baos.write(data, 0, bytesRead);
}
gis.close();
return baos.toString("UTF-8"); // Or "Windows-1251" ...
The above does away with the buffer-boundary problem and specifies the encoding, so the code produces the same result on every machine.
And keep in mind:
new String(bytes, encoding)
string.getBytes(encoding)
It is possible that the problem is here:
string.append(new String(data, 0, bytesRead))
You are using the default character encoding to decode bytes into a Java String. If the (current) default encoding is different to the encoding used when encoding the original characters to bytes (prior to compression, etc), then you could get bytes that don't decode correctly. The decoder will then replace them with the decoder's replacement character; i.e. '\uFFFD' by default.
If this is the problem, then the solution is to find out what the correct character encoding is and use String(byte[], int, int, Charset) to create the String.
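For example, a minimal sketch of that change to the original loop, assuming the text really is UTF-8 (requires import java.nio.charset.StandardCharsets):
// Decode each chunk with an explicit charset instead of the platform default
string.append(new String(data, 0, bytesRead, StandardCharsets.UTF_8));
Note that this alone does not fix the chunk-boundary problem described above; collecting all the bytes first and decoding once is still the safer approach.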
If you work only with streams and convert to a String just once at the end, you avoid these problems; these few lines of code should do the job well:
public static String decompress(byte[] compressed) throws IOException {
try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
try (GZIPInputStream gis = new GZIPInputStream(
new ByteArrayInputStream(compressed))) {
org.apache.commons.compress.utils.IOUtils.copy(gis, bos);
}
return bos.toString("UTF-8"); // decode once here, with an explicit charset
}
}
I have been searching the web for this particular problem. Maybe I'm doing something wrong or I'm missing something here...
So I'm trying to convert a file stream (an Excel file; the MIME type, application/octet-stream or application/vnd.ms-excel, doesn't matter) to a Base64 encoded string.
The reason I'm doing this is that I want to provide the file in a REST API inside a JSON object, so that the browser can later decode the Base64 string and download the file.
When I receive the InputStream and save it to disk, everything works fine...
Even when I use Postman to get the file: if I save it, it opens in Excel with all the right data.
THE CODE -> I used this simple example to download a file from a URL:
URL url = new URL(fileURL);
HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
//etc...i get response code OK(200) get file name etc
// opens input stream from the HTTP connection
InputStream inputStream = httpConn.getInputStream();
String saveFilePath1 = "C:\\test1.xlsx";
String saveFilePath2 = "C:\\test2.xlsx";
FileOutputStream outputStream = new FileOutputStream(saveFilePath1);
int bytesRead = -1;
byte[] buffer = new byte[BUFFER_SIZE];
while ((bytesRead = inputStream.read(buffer)) != -1) {
outputStream.write(buffer, 0, bytesRead);
}
//FOR TESTING PURPOSES AT THIS POINT I HAVE SAVED THE STREAM INTO
//**test1.xlsx** SUCCESSFULLY and opens into excel and everything
//is fine.
//THE PROBLEM RESIDES HERE IN THIS NEXT PIECE OF CODE
//import org.apache.commons.codec.binary.Base64;
//I try to encode the string to Base64
String encodedBytesBase64 = Base64.encodeBase64String(buffer);
//WHEN I DO THE DECODE AND WRITE THE BYTES into test2.xlsx this file doesn't work...
FileOutputStream fos = new FileOutputStream(saveFilePath2);
byte[] bytes = Base64.decodeBase64(encodedBytesBase64);
fos.write(bytes);
//Close streams from saved file test2
fos.close();
//Close streams from saved file test1
outputStream.close();
inputStream.close();
I even took the string to check if it is a valid Base64 string, which it is according to this site -> Base64 Validator
But when I try to decode the string on the same website, it tells me there's a different encoding:
Is it possible this is the problem?
I think you can ignore those warnings. Rather, the issue is here:
int bytesRead = -1;
byte[] buffer = new byte[BUFFER_SIZE];
while ((bytesRead = inputStream.read(buffer)) != -1) {
outputStream.write(buffer, 0, bytesRead);
}
...and then later:
String encodedBytesBase64 = Base64.encodeBase64String(buffer);
As you can see in the first part, you are reusing buffer to read the input stream and write to the output stream. If this loops around more than once, buffer will be overwritten with the next chunk of data from the input stream. So, when you are encoding buffer, you are only using the last chunk of the file.
The next problem is that when you are encoding, you are encoding the full buffer array, ignoring the bytesRead.
One option might be to read the inputStream and write it to a ByteArrayOutputStream, and then encode that.
int bytesRead = -1;
byte[] buffer = new byte[BUFFER_SIZE];
ByteArrayOutputStream array = new ByteArrayOutputStream();
while ((bytesRead = inputStream.read(buffer)) != -1) {
array.write(buffer, 0, bytesRead);
}
String encoded = Base64.encodeBase64String(array.toByteArray());
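If you would rather not pull in Commons Codec, the JDK's built-in java.util.Base64 (Java 8+) should do the same job, something like:
// Same idea using the built-in encoder instead of commons-codec
String encoded = java.util.Base64.getEncoder().encodeToString(array.toByteArray());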
Why can't I read a POST request with 150k chars?
I can only read ~15k chars every time
InputStream is = socket.getInputStream();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
while (is.available() > 0 && (length = is.read(buffer)) != -1) {
baos.write(buffer, 0, length);
}
System.out.println(baos.toString(StandardCharsets.UTF_8.name()));
UPD: if I ignore is.available(), the code freezes in the while loop:
InputStream is = socket.getInputStream();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
while ((length = is.read(buffer)) != -1) {
baos.write(buffer, 0, length);
}
System.out.println(baos.toString(StandardCharsets.UTF_8.name()));
There are no exceptions.
The docs for available() say:
available()
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking
So I'm going to guess that you internally have 15k buffers and you're only reading up to the end of your own buffer, not to the end of the stream. You should frankly be ignoring available() in this case and just call read(byte[]) until it returns -1.
Your updated code example looks almost exactly like the code I use to read streams. I think the problem must be on the sender's side. Either the sender is not closing the stream properly, or there's some network issue that doesn't allow enough packets through.
For reference, here's the code I use to read an entire stream. (Lightly tested.)
public static ByteArrayOutputStream readFully( InputStream ins )
throws IOException
{
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] bytes = new byte[ 1024 ];
for( int length; ( length = ins.read( bytes ) ) != -1; )
bos.write( bytes, 0, length );
return bos;
}
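Applied to your socket, the usage would be roughly this (assuming the request body is UTF-8, as in your own snippet):
// Read the whole stream first, then decode it once with an explicit charset
ByteArrayOutputStream baos = readFully(socket.getInputStream());
System.out.println(baos.toString(StandardCharsets.UTF_8.name()));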
I have a byte array and I want to decompress it. When I run the code below, it gives:
java.util.zip.ZipException: Not in GZIP format
I get this byte array from a SOAP web service. When I call this web service from SoapUI, it returns:
<size>491520</size>
<studentData>
<dataContent>Uy0xMDAwMF90MTAwMDAtVXNlciBTZWN1cml0eSBB........</dataContent>
</studentData>
Is there a problem with the data coming from the web service, or with my decompress method?
public static byte[] decompress(final byte[] input) throws Exception{
try (ByteArrayInputStream bin = new ByteArrayInputStream(input);
GZIPInputStream gzipper = new GZIPInputStream(bin)) {
byte[] buffer = new byte[1024];
ByteArrayOutputStream out = new ByteArrayOutputStream();
int len;
while ((len = gzipper.read(buffer)) > 0) {
out.write(buffer, 0, len);
}
gzipper.close();
out.close();
return out.toByteArray();
}
}
EDIT:
I decoded the Base64 and wrote it to a file called "test.gzip". Now I can extract this file with 7-Zip and I can see all the student files without any problem.
String encoded = Base64.getEncoder().encodeToString(studentData.getDataContent());
byte[] decoded = Base64.getDecoder().decode(encoded);
FileOutputStream fos = new FileOutputStream("test.gzip");
fos.write(decoded);
fos.close();
But when I try to decompress the decoded bytes, it still gives the same error:
decompress(decoded);
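For reference, one quick way to check what the decoded bytes actually are (a hedged diagnostic sketch, not from the original post): GZIP data starts with the magic bytes 0x1F 0x8B, while ZIP archives start with 'P' 'K', and 7-Zip happily extracts plain ZIP archives too.
// Inspect the leading magic bytes of the 'decoded' array from the EDIT above
if (decoded.length >= 2 && (decoded[0] & 0xFF) == 0x1F && (decoded[1] & 0xFF) == 0x8B) {
    System.out.println("GZIP stream - GZIPInputStream should handle it");
} else if (decoded.length >= 2 && decoded[0] == 'P' && decoded[1] == 'K') {
    System.out.println("ZIP archive - java.util.zip.ZipInputStream would be needed instead");
} else {
    System.out.println("Neither GZIP nor ZIP magic bytes");
}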
When I receive a short HTTP request everything is fine, but long requests get corrupted. I took a trace with Wireshark and also printed the packet as hex values in the Java console, and some additional values show up in that output. Why?
How can I solve it?
Is there anything wrong with the conversion of the HTTP request to hex?
The following code is what I use to read the request before converting it to hex.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
InputStream responseData = request.getInputStream();
byte[] buffer = new byte[1000];
int bytesRead = 0;
while ((bytesRead = responseData.read(buffer)) > 0) {
baos.write(buffer, 0, bytesRead);
sb=baos.toString();
str = baos.toString();
sb.append(str);
sb = new String(baos.toByteArray(),UTF8);
}
baos.close(); // connection.close();
You can't convert the read bytes to a String until all your input is read because a fraction of the input might be invalid UTF-8 encoded data.
Also, don't use ByteArrayOutputStream.toString() because it uses the platform's default character set to decode bytes to characters, which varies between environments. Instead use ByteArrayOutputStream.toString(String charsetName) and specify the encoding.
Also, you should use ServletRequest.getCharacterEncoding() to detect the encoding and fall back to UTF-8, for example, if it is unknown.
First read all input, and then convert it to a String:
String encoding = request.getCharacterEncoding();
if (encoding == null)
encoding = "UTF-8";
// First read all input data
while ((bytesRead = responseData.read(buffer)) > 0) {
baos.write(buffer, 0, bytesRead);
}
// We have all input, now convert it to String:
String text = baos.toString(encoding);
Better Alternative
Since you convert the binary input to a String, you should use ServletRequest.getReader() instead of reading binary data using ServletRequest.getInputStream() and converting it to String manually.
E.g. reading all lines:
BufferedReader reader = request.getReader();
StringBuilder sb = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
// Process line, here I just append it to a StringBuilder
sb.append(line);
// If you want to preserve newline characters, keep the next line:
sb.append('\n');
}
I read this post but I am not following it. I have seen this but have not seen a proper example of converting a ByteArrayInputStream to a String using a ByteArrayOutputStream.
To retrieve the contents of a ByteArrayInputStream as a String, is using a ByteArrayOutputStream recommended, or is there a preferable way?
I was considering this example, extending ByteArrayInputStream and using a Decorator to add functionality at run time. Would that be a better solution than employing a ByteArrayOutputStream?
A ByteArrayOutputStream can read from any InputStream and at the end yield a byte[].
However with a ByteArrayInputStream it is simpler:
int n = in.available();
byte[] bytes = new byte[n];
in.read(bytes, 0, n);
String s = new String(bytes, StandardCharsets.UTF_8); // Or any encoding.
For a ByteArrayInputStream available() yields the total number of bytes.
Addendum 2021-11-16
Since Java 9 you can use the shorter readAllBytes().
byte[] bytes = in.readAllBytes();
Answer to comment: using ByteArrayOutputStream
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[8192];
for (;;) {
int nread = in.read(buf, 0, buf.length);
if (nread <= 0) {
break;
}
baos.write(buf, 0, nread);
}
in.close();
baos.close();
byte[] bytes = baos.toByteArray();
Here in may be any InputStream.
Since Java 10 there is also ByteArrayOutputStream#toString(Charset).
String s = baos.toString(StandardCharsets.UTF_8);
Why has nobody mentioned org.apache.commons.io.IOUtils?
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;
String result = IOUtils.toString(in, StandardCharsets.UTF_8);
Just one line of code.
Java 9+ solution:
new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
Use Scanner and pass the ByteArrayInputStream to its constructor, then read the data from your Scanner; check this example:
ByteArrayInputStream arrayInputStream = new ByteArrayInputStream(new byte[] { 65, 80 });
Scanner scanner = new Scanner(arrayInputStream);
scanner.useDelimiter("\\Z");//To read all scanner content in one String
String data = "";
if (scanner.hasNext())
data = scanner.next();
System.out.println(data);
Use Base64 encoding
Assuming you have your ByteArrayOutputStream:
ByteArrayOutputStream baos =...
String s = Base64.getEncoder().encodeToString(baos.toByteArray());
See http://docs.oracle.com/javase/8/docs/api/java/util/Base64.Encoder.html