I have some code to uncompress gzip a compressedString as below:
public static String decompress(String compressedString) throws IOException {
byte[] byteCompressed = compressedString.getBytes(StandardCharsets.UTF_8)
final StringBuilder outStr = new StringBuilder();
if ((byteCompressed == null) || (byteCompressed.length == 0)) {
return "";
}
if (isCompressed(byteCompressed)) {
final GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(byteCompressed));
final BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gis, "UTF-8"));
String line;
while ((line = bufferedReader.readLine()) != null) {
outStr.append(line);
}
} else {
outStr.append(byteCompressed);
}
return outStr.toString();
}
public static boolean isCompressed(final byte[] compressed) {
return (compressed[0] == (byte) (GZIPInputStream.GZIP_MAGIC)) && (compressed[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8));
}
I use this code to uncompress a String as below:
H4sIAAAAAAAAAHNJLQtJLS4BALwLiloHAAAA
But this code uncompress a unexpected String although I can uncompress online normally in the web
Anyone can help me give the right uncompress code? Thanks
Your string is base64 encoded gzip data, so you need to base64 decode it, instead of trying to encode it as UTF-8 bytes.
String input = "H4sIAAAAAAAAAHNJLQtJLS4BALwLiloHAAAA";
byte[] byteCompressed = Base64.getDecoder().decode(input);
// ... rest of your code
Related
I have sounds in my jar file directory.
I need to use these sounds and I am trying to extract them using this method:
String charset = "ISO-8859-1";
public void extractSounds(String pathIn, String pathOut) throws IOException {
BufferedReader r = new BufferedReader(new InputStreamReader(getClass().getResourceAsStream(pathIn), charset));
String line = r.readLine();
String result = null;
FileOutputStream fos = new FileOutputStream(pathOut);
while(line != null) {
if(result != null) {
result += "\r" + line;
line = r.readLine();
} else {
result = line;
line = r.readLine();
}
}
fos.write(result.getBytes(charset));
}}
But when I extract the sounds they get distorted and I don't know what the problem is, because it basically just copies the file.
Sounds:
Original,
Extracted
I would be very grateful if you could help me find a solution or suggest another method to extract the sound files.
Don't assume you are reading text. You should not try to mutate the data. Just copy it in chunks.
Try something like
InputStream in = ...;
ByteArrayOutputStream out = new ByteArrayOutputStream();
final int BUF_SIZE = 1 << 8;
byte[] buffer = new byte[BUF_SIZE];
int bytesRead = -1;
while((bytesRead = in.read(buffer)) > -1) {
out.write(buffer, 0, bytesRead);
}
I need to convert a Reader object into InputStream. My solution right now is below. But my concern is since this will handle big chunks of data, it will increase the memory usage drastically.
private static InputStream getInputStream(final Reader reader) {
char[] buffer = new char[10240];
StringBuilder builder = new StringBuilder();
int charCount;
try {
while ((charCount = reader.read(buffer, 0, buffer.length)) != -1) {
builder.append(buffer, 0, charCount);
}
reader.close();
} catch (final IOException e) {
e.printStackTrace();
}
return new ByteArrayInputStream(builder.toString().getBytes(StandardCharsets.UTF_8));
}
Since I use StringBuilder this will keep the full content of the reader object in memory. I want to avoid this. Is there a way I can pipe Reader object? Any help regarding this highly appreciated.
Using the Apache Commons IO library, you can do this conversion in one line:
//import org.apache.commons.io.input.ReaderInputStream;
InputStream inputStream = new ReaderInputStream(reader, StandardCharsets.UTF_8);
You can read the documentaton for this Class at https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/ReaderInputStream.html
It might be worth trying this to see if it solves the memory issue too.
First: a rare requirement, often it is the other way around, or there is a FileChannel, so one can use a ByteBuffer.
A PipedInputStream would be possible, starting a PipedOutputStream in a second thread. However that is unneeded.
A Reader gives chars. Unicode code points are derived from either one or two chars (the latter a surrogate pair).
/**
* Reader for an InputSteam of UTF-8 text bytes.
*/
public class ReaderInputStream extends InputStream {
private final Reader reader;
private boolean eof;
private int byteCount;
private byte[] bytes = new byte[6];
public ReaderInputStream(Reader reader) {
this.reader = reader;
}
#Override
public int read() throws IOException {
if (byteCount > 0) {
int c = bytes[0];
--byteCount;
for (int i = 0; i < byteCount; ++i) {
bytes[i] = bytes[i + 1];
}
return c;
}
if (eof) {
return -1;
}
int c = reader.read();
if (c == -1) {
eof = true;
return -1;
}
char ch = (char) c;
String s;
if (Character.isHighSurrogate(ch)) {
c = reader.read();
if (c == -1) {
// Error, low surrogate expected.
eof = true;
//return -1;
throw new IOException("Expected a low surrogate char i.o. EOF");
}
char ch2 = (char) c;
if (!Character.isLowSurrogate(ch2)) {
throw new IOException("Expected a low surrogate char");
}
s = new String(new char [] {ch, ch2});
} else {
s = Character.toString(ch);
}
byte[] bs = s.getBytes(StandardCharsets.UTF_8);
byteCount = bs.length;
System.arraycopy(bs, 0, bytes, 0, byteCount);
return read();
}
}
Path source = Paths.get("...");
Path target = Paths.get("...");
try (Reader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
InputStream in = new ReaderInputStream(reader)) {
Files.copy(in, target);
}
I searched for an example of how to compress a string in Java.
I have a function to compress then uncompress. The compress seems to work fine:
public static String encStage1(String str)
{
String format1 = "ISO-8859-1";
String format2 = "UTF-8";
if (str == null || str.length() == 0)
{
return str;
}
System.out.println("String length : " + str.length());
ByteArrayOutputStream out = new ByteArrayOutputStream();
String outStr = null;
try
{
GZIPOutputStream gzip = new GZIPOutputStream(out);
gzip.write(str.getBytes());
gzip.close();
outStr = out.toString(format2);
System.out.println("Output String lenght : " + outStr.length());
} catch (Exception e)
{
e.printStackTrace();
}
return outStr;
}
But the reverse is complaining about the string not being in GZIP format, even when I pass the return from encStage1 straight back into the decStage3:
public static String decStage3(String str)
{
if (str == null || str.length() == 0)
{
return str;
}
System.out.println("Input String length : " + str.length());
String outStr = "";
try
{
String format1 = "ISO-8859-1";
String format2 = "UTF-8";
GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str.getBytes(format2)));
BufferedReader bf = new BufferedReader(new InputStreamReader(gis, format2));
String line;
while ((line = bf.readLine()) != null)
{
outStr += line;
}
System.out.println("Output String lenght : " + outStr.length());
} catch (Exception e)
{
e.printStackTrace();
}
return outStr;
}
I get this error when I call with a string return from encStage1:
public String encIDData(String idData)
{
String tst = "A simple test string";
System.out.println("Enc 0: " + tst);
String stg1 = encStage1(tst);
System.out.println("Enc 1: " + toHex(stg1));
String dec1 = decStage3(stg1);
System.out.println("unzip: " + toHex(dec1));
}
Output/Error:
Enc 0: A simple test string
String length : 20
Output String lenght : 40
Enc 1: 1fefbfbd0800000000000000735428efbfbdefbfbd2defbfbd495528492d2e51282e29efbfbdefbfbd4b07005aefbfbd21efbfbd14000000
Input String length : 40
java.io.IOException: Not in GZIP format
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:137)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
A small error is:
gzip.write(str.getBytes());
takes the default platform encoding, which on Windows will never be ISO-8859-1. Better:
gzip.write(str.getBytes(format1));
You could consider taking "Cp1252", Windows Latin-1 (for some European languages), instead of "ISO-8859-1", Latin-1. That adds comma like quotes and such.
The major error is converting the compressed bytes to a String. Java separates binary data (byte[], InputStream, OutputStream) from text (String, char, Reader, Writer) which internally is always kept in Unicode. A byte sequence does not need to be valid UTF-8. You might get away by converting the bytes as a single byte encoding (ISO-8859-1 for instance).
The best way would be
gzip.write(str.getBytes(StandardCharsets.UTF_8));
So you have full Unicode, every script may be combined.
And uncompressing to a ByteArrayOutputStream and new String(baos.toByteArray(), StandardCharsets.UTF_8).
Using BufferedReader on an InputStreamReader with UTF-8 is okay too, but a readLine throws away the newline characters
outStr += line + "\r\n"; // Or so.
Clean answer:
public static byte[] encStage1(String str) throws IOException
{
try (ByteArrayOutputStream out = new ByteArrayOutputStream())
{
try (GZIPOutputStream gzip = new GZIPOutputStream(out))
{
gzip.write(str.getBytes(StandardCharsets.UTF_8));
}
return out.toByteArray();
//return out.toString(StandardCharsets.ISO_8859_1);
// Some single byte encoding
}
}
public static String decStage3(byte[] str) throws IOException
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str)))
{
int b;
while ((b = gis.read()) != -1) {
baos.write((byte) b);
}
}
return new String(baos.toByteArray(), StandardCharset.UTF_8);
}
usage of toString/getBytes for encoding/decoding is a wrong way. try to use something like BASE64 encoding for this purpose (java.util.Base64 in jdk 1.8)
as a proof try this simple test:
import org.testng.annotations.Test;
import java.io.ByteArrayOutputStream;
import static org.testng.Assert.assertEquals;
public class SimpleTest {
#Test
public void test() throws Exception {
final String CS = "utf-8";
byte[] b0 = {(byte) 0xff};
ByteArrayOutputStream out = new ByteArrayOutputStream();
out.write(b0);
out.close();
byte[] b1 = out.toString(CS).getBytes(CS);
assertEquals(b0, b1);
}
}
I want read all data ,synchronously , receive from client or server without readline() method in java(like readall() in c++).
I don't want use something like code below:
BufferedReader reader = new BufferedReader(new inputStreamReader(socket.getInputStream()));
String line = null;
while ((line = reader.readLine()) != null)
document.append(line + "\n");
What method should i use?
If you know the size of incoming data you could use a method like :
public int read(char cbuf[], int off, int len) throws IOException;
where cbuf is Destination buffer.
Otherwise, you'll have to read lines or read bytes. Streams aren't aware of the size of incoming data. The can only sequentially read until end is reached (read method returns -1)
refer here streams doc
sth like that:
public static String readAll(Socket socket) throws IOException {
BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
StringBuilder sb = new StringBuilder();
String line;
while ((line = reader.readLine()) != null)
sb.append(line).append("\n");
return sb.toString();
}
You could use something like this:
public static String readToEnd(InputStream in) throws IOException {
byte[] b = new byte[1024];
int n;
StringBuilder sb = new StringBuilder();
while ((n = in.read(b)) >= 0) {
sb.append(b);
}
return sb.toString();
}
try this
public static String readToEnd(InputStream in) throws IOException {
return new String(in.readAllBytes(),StandardCharsets.UTF_8);
}
How can i decompress a String that was zipped by PHP gzcompress() function?
Any full examples?
thx
I tried it now like this:
public static String unzipString(String zippedText) throws Exception
{
ByteArrayInputStream bais = new ByteArrayInputStream(zippedText.getBytes("UTF-8"));
GZIPInputStream gzis = new GZIPInputStream(bais);
InputStreamReader reader = new InputStreamReader(gzis);
BufferedReader in = new BufferedReader(reader);
String unzipped = "";
while ((unzipped = in.readLine()) != null)
unzipped+=unzipped;
return unzipped;
}
but it's not working if i i'm trying to unzip a PHP gzcompress (-ed) string.
PHP's gzcompress uses Zlib NOT GZIP
public static String unzipString(String zippedText) {
String unzipped = null;
try {
byte[] zbytes = zippedText.getBytes("ISO-8859-1");
// Add extra byte to array when Inflater is set to true
byte[] input = new byte[zbytes.length + 1];
System.arraycopy(zbytes, 0, input, 0, zbytes.length);
input[zbytes.length] = 0;
ByteArrayInputStream bin = new ByteArrayInputStream(input);
InflaterInputStream in = new InflaterInputStream(bin);
ByteArrayOutputStream bout = new ByteArrayOutputStream(512);
int b;
while ((b = in.read()) != -1) {
bout.write(b); }
bout.close();
unzipped = bout.toString();
}
catch (IOException io) { printIoError(io); }
return unzipped;
}
private static void printIoError(IOException io)
{
System.out.println("IO Exception: " + io.getMessage());
}
Try a GZIPInputStream. See this example and this SO question.
See
http://developer.android.com/reference/java/util/zip/InflaterInputStream.html
since the DEFLATE algorithm is gzip.