I have some large base64 encoded data (stored in snappy files in the hadoop filesystem).
This data was originally gzipped text data.
I need to be able to read chunks of this encoded data, decode it, and then flush it to a GZIPOutputStream.
Any ideas on how I could do this instead of loading the whole base64 data into an array and calling Base64.decodeBase64(byte[]) ?
Am I right if I read the characters till the '\r\n' delimiter and decode it line by line?
e.g. :
for (int i = 0; i < byteData.length; i++) {
if (byteData[i] == CARRIAGE_RETURN || byteData[i] == NEWLINE) {
if (i < byteData.length - 1 && byteData[i + 1] == NEWLINE)
i += 2;
else
i += 1;
byteBuffer.put(Base64.decodeBase64(record));
byteCounter = 0;
record = new byte[8192];
} else {
record[byteCounter++] = byteData[i];
}
}
Sadly, this approach doesn't give any human readable output.
Ideally, I would like to stream read, decode, and stream out the data.
Right now, I'm trying to wrap the decoded bytes in an InputStream and then copy that to a GZIPOutputStream:
byteBuffer.get(bufferBytes);
InputStream inputStream = new ByteArrayInputStream(bufferBytes);
inputStream = new GZIPInputStream(inputStream);
IOUtils.copy(inputStream , gzipOutputStream);
And it gives me a
java.io.IOException: Corrupt GZIP trailer
Let's go step by step:
You need a GZIPInputStream to read zipped data (that and not a GZIPOutputStream; the output stream is used to compress data). Having this stream you will be able to read the uncompressed, original binary data. This requires an InputStream in the constructor.
You need an input stream capable of reading the Base64 encoded data. I suggest the handy Base64InputStream from apache-commons-codec. With the constructor you can set the line length, the line separator and set doEncode=false to decode data. This in turn requires another input stream - the raw, Base64 encoded data.
This stream depends on how you get your data; ideally the data should be available as an InputStream - problem solved. If not, you may have to use a ByteArrayInputStream (if binary), a StringBufferInputStream (if string; note that this class is deprecated, so a ByteArrayInputStream over the string's bytes is the safer choice) etc.
Roughly this logic is:
InputStream fromHadoop = ...; // 3rd paragraph
Base64InputStream b64is = // 2nd paragraph
new Base64InputStream(fromHadoop, false, 80, "\n".getBytes("UTF-8"));
GZIPInputStream zis = new GZIPInputStream(b64is); // 1st paragraph
Please pay attention to the arguments of Base64InputStream (line length and end-of-line byte array), you may need to tweak them.
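For anyone who wants to try the chain without commons-codec, here is a minimal sketch of the same idea using only the JDK. It swaps Base64InputStream for java.util.Base64.getMimeDecoder().wrap(...), which is my substitution, not what the answer above uses. The class and method names (Base64GzipStreamDemo, decodeAndGunzip) are made up for illustration, and the sample data is generated in main rather than read from Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: Base64 text in, gunzipped bytes out, all streamed.
class Base64GzipStreamDemo {

    // Decodes Base64 text from 'encoded', gunzips it, and copies the plain bytes to 'out'.
    static void decodeAndGunzip(InputStream encoded, OutputStream out) throws IOException {
        // The MIME decoder ignores line separators, so CRLF-wrapped input is fine.
        try (InputStream b64 = Base64.getMimeDecoder().wrap(encoded);
             GZIPInputStream gz = new GZIPInputStream(b64)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n); // write only the bytes actually read
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build sample input: gzip some text, then Base64-encode it MIME style.
        ByteArrayOutputStream gzipped = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(gzipped)) {
            gz.write("hello, streaming world".getBytes(StandardCharsets.UTF_8));
        }
        byte[] encoded = Base64.getMimeEncoder().encode(gzipped.toByteArray());

        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        decodeAndGunzip(new ByteArrayInputStream(encoded), plain);
        System.out.println(plain.toString("UTF-8"));
    }
}
```

If the goal is to re-compress the result, 'out' can itself be a GZIPOutputStream; nothing in the loop holds more than one buffer in memory.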
Thanks to Nikos for pointing me in the right direction.
Specifically this is what I did:
private static final byte NEWLINE = (byte) '\n';
private static final byte CARRIAGE_RETURN = (byte) '\r';
byte[] lineSeparators = new byte[] {CARRIAGE_RETURN, NEWLINE};
Base64InputStream b64is = new Base64InputStream(inputStream, false, 76, lineSeparators);
GZIPInputStream zis = new GZIPInputStream(b64is);
Isn't 76 the length of the Base64 line? I didn't try with 80, though.
I am processing very large files (> 2 GB). Each input file is Base64 encoded, and I am outputting to new files after decoding. Depending on the buffer size (LARGE_BUF) and for a given input file, my input to output conversion either works fine, is missing one or more bytes, or throws an exception at the outStream.write line (IllegalArgumentException: Last unit does not have enough bits). Here is the code snippet (could not cut and paste, so it may not be perfect):
.
.
final int LARGE_BUF = 1024;
byte[] inBuf = new byte[LARGE_BUF];
try (InputStream inputStream = new FileInputStream(inFile); OutputStream outStream = new FileOutputStream(outFile)) {
for (int len; (len = inputStream.read(inBuf)) > 0; ) {
String out = new String(inBuf, 0, len);
outStream.write(Base64.getMimeDecoder().decode(out.getBytes()));
}
}
For instance, for my sample input file, if LARGE_BUF is 1024, output file is 4 bytes too small, if 2*1024, I get the exception mentioned above, if 7*1024, it works correctly. Grateful for any ideas. Thank you.
First, you are converting bytes into a String, then immediately back into bytes. So, remove the use of String entirely.
Second, base64 encoding turns each sequence of three bytes into four bytes, so when decoding, you need four bytes to properly decode three bytes of original data. It is not safe to create a new decoder for each arbitrarily read sequence of bytes, which may or may not have a length which is an exact multiple of four.
Finally, Base64.Decoder has a wrap(InputStream) method which makes this considerably easier:
try (InputStream inputStream = Base64.getDecoder().wrap(
new BufferedInputStream(
Files.newInputStream(Paths.get(inFile))))) {
Files.copy(inputStream, Paths.get(outFile));
}
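Two details are worth checking when adapting the snippet above, so here is a hedged, self-contained variant. First, the question used getMimeDecoder(), which suggests the input has line breaks; the basic getDecoder() rejects those, so this sketch uses the MIME decoder. Second, Files.copy throws FileAlreadyExistsException if the target exists, hence REPLACE_EXISTING. The class name, method name, and temp files are invented for the demo.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.Base64;
import java.util.Random;

// Hypothetical demo class: decode a Base64 file of any size without chunk-boundary bugs.
class Base64FileDecodeDemo {

    // Streams 'encodedFile' through a Base64 decoder into 'decodedFile'.
    static void decodeFile(Path encodedFile, Path decodedFile) throws IOException {
        // The decoder stream keeps its own state across reads, so the read sizes
        // never need to be multiples of four.
        try (InputStream in = Base64.getMimeDecoder().wrap(
                new BufferedInputStream(Files.newInputStream(encodedFile)))) {
            Files.copy(in, decodedFile, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[10_000];
        new Random(42).nextBytes(original);

        Path encoded = Files.createTempFile("demo", ".b64");
        Path decoded = Files.createTempFile("demo", ".bin");
        // MIME encoding inserts CRLF every 76 characters, like many real-world files.
        Files.write(encoded, Base64.getMimeEncoder().encode(original));

        decodeFile(encoded, decoded);
        System.out.println(Arrays.equals(original, Files.readAllBytes(decoded)));
    }
}
```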
I am reading the contents of a zip file, and when I find the sample.xml file, I edit its contents and write it to the output zip file.
public class CopyEditZip {
static String fileSeparator = System.getProperty("file.separator");
public static void main(String[] args) {
System.getProperty("file.separator");
ZipFile zipFile;
try {
zipFile = new ZipFile("c:/temp/source.zip");
ZipOutputStream zos = new ZipOutputStream(new
FileOutputStream(
"c:/temp/target.zip"));
for (Enumeration e = zipFile.entries();
e.hasMoreElements();)
{
ZipEntry entryIn = (ZipEntry) e.nextElement();
if (entryIn.getName().contains("sample.xml")) {
zos.putNextEntry(new ZipEntry("sample.xml"));
InputStream is = zipFile.getInputStream(entryIn);
byte[] buf = new byte[1024];
int len;
while ((len = (is.read(buf))) > 0) {
String x = new String(buf);
if (x.contains("Input")) {
System.out.println("edit count");
x = x.replace("Input", "output");
}
buf = x.getBytes();
zos.write(buf, 0, (len < buf.length) ? len
: buf.length);
}
is.close();
zos.closeEntry();
}
}
zos.close();
zipFile.close();
} catch (Exception ex) {
}
}
}
Now the sample.xml in the output is not coming out correctly. Some data is truncated and some is lost. Does this have to do with the buffer not getting written correctly? Is there any other way to edit the file and write it out?
EDIT: I see the xml is getting written, followed by some more data from the xml.
My end tag is called Broker, and it is followed by a few more lines of data. Not sure how it is writing more data after the end tag.
EDIT:
I put a counter and sysout of line by line to see what came out during each iteration of the while loop.
here are the last two lines
18
put.fileFtpDirectory"/><ConfigurableProperty uri="CDTSFileInput#File
Input.fileFtpServer"/><ConfigurableProperty uri="CDTSFileInput#File
Input.fileFtpUser"/><ConfigurableProperty uri="CDTSFileInput#File
Input.longRetryInterval"/><ConfigurableProperty uri="CDTSFileInput#File
Input.messageCodedCharSetIdProperty"/><ConfigurableProperty
uri="CDTSFileInput#File Input.messageEncodingProperty"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.remoteTransferType"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.retryThreshold"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.shortRetryInterval"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.validateMaster"/>
<ConfigurableProperty override="30" uri="CDTSFileInput#File
Input.waitInterval"/><ConfigurableProperty override="no"
uri="CDTSFileInput#FileInput.connectDatasourceBeforeFlowStarts"/>
<ConfigurableProperty uri="CDTSFileInput#FileInput.validateMaster"/>
<ConfigurableProperty override="/apps/cdts/trace/ExceptionTrace-
CDTSFileInput-CDT.REF_EXT.Q01.txt" uri="CDTSFileInp
19
ut#FilePath_ExceptionTrace"/><ConfigurableProperty
override="/apps/cdts/trace/SnapTrace-CDTSFileInput-CDT.REF_EXT.Q01.txt"
uri="CDTSFileInput#FilePath_SnapTraceENV"/><ConfigurableProperty
override="/apps/cdts/trace/SnapTrace-CDTSFileInput-CDT.REF_EXT.Q01.txt"
uri="CDTSFileInput#FilePath_SnapTraceNOENV"/><ConfigurableProperty
override="EXTERNAL" uri="CDTSFileInput#INPUTORIGIN"/>
<ConfigurableProperty
override="/apps/cdts/data_in/data_in_fileinput_gtr1"
uri="CDTSFileInput#InputDirectory"/><ConfigurableProperty override="GTR"
uri="CDTSFileInput#SUBMITTERID"/><ConfigurableProperty
override="FILEINPT" uri="CDTSFileInput#SUBMITTERTYPE"/>
<ConfigurableProperty override="" uri="CDTSFileInput#excludePattern"/>
<ConfigurableProperty override="*" uri="CDTSFileInput#filenamePattern"/>
<ConfigurableProperty override="no"
uri="CDTSFileInput#recursiveDirectories"/></CompiledMessageFlow>
</Broker>ileInput#FileInput.validateMaster"/><ConfigurableProperty
override="/apps/cdts/trace/ExceptionTrace-CDTSFileInput-
CDT.REF_EXT.Q01.txt" uri="CDTSFileInp
The xml ends at </Broker>, but part of the last but one line is getting appended again.
Reading and writing text
If the file is a text file, you shouldn't be reading it as bytes. You should wrap the input stream with a reader, read lines, and write them back to a writer wrapped around the output stream.
One of the reasons for this is that the file could be in an encoding that is not single-byte, like UTF-8. This means that a character can be split between one buffer and the next.
Another problem is that the word Input might be split between buffers. So you might just get Inp in one and ut in the next, and you won't match it properly. Reading lines is a good way of ensuring that you won't be stopping in the middle of a word.
However, it's a little less simple to write text using a ZipOutputStream, as you don't get a separate output stream for each entry. Therefore, you'll need to extract the bytes from the line you read, and write those to the zip file - much like you did.
Reading and writing bytes
Even if the file happens to be in ASCII, you have a couple of problems in your read/write loops. The first, minor one is that your loop condition should be:
while ((len = is.read(buf)) >= 0)
You really should only terminate the loop when you get -1. In theory, you could get a read in the middle of the loop that didn't read any bytes at all, if the buffer size is zero, but that doesn't mean the stream is ended. So >=, not >.
But your worse problem is that you read len bytes, but you translate the whole buffer to a string. So if you have a buffer of 1024 bytes, and len is only 50, then only 50 bytes of the buffer will be the content of the latest read, and the rest are going to come from the previous read, or be zero.
So always use exactly len bytes if that's what you read. You should be using
String x = new String(buf,0,len);
Rather than
String x = new String(buf);
Also, you should note that when you do:
buf = x.getBytes();
Your buffer is no longer 1024 bytes long. If there were originally 1024 bytes, and you have 10 Input occurrences in your string, the buffer will now be 1034 bytes long (assuming a one-byte encoding). len is no longer pertinent - it will be smaller than the number of bytes you actually need to write. So that's another reason why you have characters that are lost.
Encoding
Usually, XML files are UTF-8. It is important to state the encoding explicitly when you convert bytes to string and vice versa, and also when you create readers and writers. Otherwise, characters may be read inappropriately.
Summary
Prefer a line-based read loop for a text file.
If you read bytes rather than lines: if you read len bytes, use len bytes, not the whole buffer.
If you change the data, don't use the old len.
Use encodings.
So a sketch of the new loop would be:
for (Enumeration<? extends ZipEntry> e = zipFile.entries(); e.hasMoreElements();) {
ZipEntry entryIn = e.nextElement();
if (entryIn.getName().contains("sample.xml")) {
zos.putNextEntry(new ZipEntry("sample.xml"));
try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(zipFile.getInputStream(entryIn),
StandardCharsets.UTF_8))) {
String line;
while ((line = bufferedReader.readLine()) != null) {
if (line.contains("Input")) {
System.out.println("edit count");
line = line.replace("Input", "output");
}
line += System.lineSeparator(); // Add newline back.
byte[] buf = line.getBytes(StandardCharsets.UTF_8);
zos.write(buf);
}
}
zos.closeEntry();
}
}
Note:
Try-with-resources for opening the buffered reader. It will be closed automatically (with its underlying reader and input stream).
Don't use the raw type Enumeration. Use a proper wildcard and you'll be able to avoid the explicit cast.
Since you create a buffer from the full line, and only that line, you can write that full buffer and don't need offset and length.
I'm trying to make a file hexadecimal converter (input file -> output hex string of the file)
The code I came up with is
static String open2(String path) throws FileNotFoundException, IOException,OutOfMemoryError {
System.out.println("BEGIN LOADING FILE");
StringBuilder sb = new StringBuilder();
//sb.ensureCapacity(2147483648);
int size = 262144;
FileInputStream f = new FileInputStream(path);
FileChannel ch = f.getChannel( );
byte[] barray = new byte[size];
ByteBuffer bb = ByteBuffer.wrap( barray );
while (ch.read(bb) != -1)
{
//System.out.println(sb.capacity());
sb.append(bytesToHex(barray));
bb.clear();
}
System.out.println("FILE LOADED; BRING IT BACK");
return sb.toString();
}
I am sure that "path" is a valid filename.
The problem is with big files (>= 500mb): the JVM throws an OutOfMemoryError: Java heap space on the StringBuilder.append.
To create this code I followed some tips from http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly but I got a doubt when I tried to force a space allocation for the StringBuilder sb: "2147483648 is too big for an int".
If I want to use this code even with very big files (let's say up to 2gb if I really have to stop somewhere) what's the better way to output a hexadecimal string conversion of the file in terms of speed?
I'm now working on copying the converted string into a file. Anyway I'm having problems of "writing the empty buffer on the file" after the eof of the original one.
static String open3(String path) throws FileNotFoundException, IOException {
System.out.println("BEGIN LOADING FILE (Hope this is the last change)");
FileWriter fos = new FileWriter("HEXTMP");
int size = 262144;
FileInputStream f = new FileInputStream(path);
FileChannel ch = f.getChannel( );
byte[] barray = new byte[size];
ByteBuffer bb = ByteBuffer.wrap( barray );
while (ch.read(bb) != -1)
{
fos.write(bytesToHex(barray));
bb.clear();
}
System.out.println("FILE LOADED; BRING IT BACK");
return "HEXTMP";
}
Obviously the HEXTMP file created has a size that is a multiple of 256k, but if the file is 257k it will be a 512k file with a LOT of "000000" at the end.
I know I just have to create a last byte array with cut length.
(I used a file writer because i wanted to write the string of hex; otherwise it would have just copied the file as-is)
Why are you loading complete file?
You can load a few bytes into a buffer from the input file, process the bytes in the buffer, then write the processed bytes to the output file. Continue this until all bytes from the input file are processed.
FileInputStream fis = new FileInputStream("in file");
FileOutputStream fos = new FileOutputStream("out");
byte buffer [] = new byte[8192];
while(true){
int count = fis.read(buffer);
if(count == -1)
break;
byte[] processed = processBytesToConvert(buffer, count);
fos.write(processed);
}
fis.close();
fos.close();
So just read few bytes in buffer, convert it to hex string, get bytes from converted hex string, then write back these bytes to file, and continue for next few input bytes.
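To make that concrete, here is one possible shape for the conversion step, as a sketch only. The question's bytesToHex and the processBytesToConvert call above are not shown in the thread, so toHex below is my own stand-in, and HexDumpDemo and hexDump are invented names. The important part is that only the first count bytes of the buffer are converted, which also avoids the stale data at the end of the output that the question describes.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: stream a file to hex text one chunk at a time.
class HexDumpDemo {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Converts only the first 'count' bytes of 'buffer' to lowercase hex characters.
    static char[] toHex(byte[] buffer, int count) {
        char[] out = new char[count * 2];
        for (int i = 0; i < count; i++) {
            int v = buffer[i] & 0xFF;       // treat the byte as unsigned
            out[2 * i] = HEX[v >>> 4];      // high nibble
            out[2 * i + 1] = HEX[v & 0x0F]; // low nibble
        }
        return out;
    }

    // Streams 'source' to 'target' as hex text; memory use stays at one buffer.
    static void hexDump(Path source, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
             Writer out = Files.newBufferedWriter(target, StandardCharsets.US_ASCII)) {
            byte[] buffer = new byte[8192];
            int count;
            while ((count = in.read(buffer)) != -1) {
                out.write(toHex(buffer, count)); // never the whole buffer
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("hexdemo", ".bin");
        Path dst = Files.createTempFile("hexdemo", ".hex");
        Files.write(src, new byte[] {0x00, 0x7F, (byte) 0xFF, 0x10});
        hexDump(src, dst);
        System.out.println(new String(Files.readAllBytes(dst), StandardCharsets.US_ASCII));
    }
}
```

The 8192-byte buffer is an arbitrary choice; the output is always exactly twice the input size regardless of the chunk size.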
The problem here is that you try to read the whole file and store it in memory.
You should use stream, read some lines of your input file, convert them and write them in the output file. That way your program can scale, whatever the size of the input file is.
The key would be to read file in chunks instead of reading all of it in one go. Depending on its use you could vary size of the chunk. For example, if you are trying to make a hex viewer / editor determine how much content is being shown in the viewport and read only as much of data from file. Or if you are simply converting and dumping hex to another file use any chunk size that is small enough to fit in memory but big enough for performance. This should be tunable over some runs. Perhaps use filesystem NIO in Java 7 so that you can do all three tasks - reading, processing and writing - concurrently. The link included in question gives good primer on reading files.
Hi i have a problem i'm not able to solve.
In my Android\java application i call a script download.php. Basically it gives a file in output that i download and save on my device. I had to add a control on all my php scripts that basically consist in sending a token to the script and check if it's valid or not. If it's a valid token i will get the output (in this case a file in the other scripts a json file) if it's not i get back a string "false".
To check this condition in my other java files i used the IOUtils method to turn the input stream into a String, check it, and then
InputStream newInputStream = new ByteArrayInputStream(mystring.getBytes("UTF-8"));
to get a valid input stream again and read it......it works with my JSon files, but not in this case......i get this error:
11-04 16:50:31.074: ERROR/AndroidRuntime(32363):
java.lang.OutOfMemoryError
when i try IOUtils.toString(inputStream, "UTF-8");
I think it's because in this case i'm trying to download really long file.
fileOutput = new BufferedOutputStream(new FileOutputStream(file,false));
inputStream = new BufferedInputStream(conn.getInputStream());
String result = IOUtils.toString(inputStream, "UTF-8");
if(result.equals("false"))
{
return false;
}
else
{
Reader r = new InputStreamReader(MyMethods.stringToInputStream(result));
int totalSize = conn.getContentLength();
int downloadedSize = 0;
byte[] buffer = new byte[1024];
int bufferLength = 0;
while ( (bufferLength = inputStream.read(buffer)) > 0 )
{
fileOutput.write(buffer, 0, bufferLength);
downloadedSize += bufferLength;
}
fileOutput.flush();
fileOutput.close();
Don't read the stream as a string to start with. Keep it as binary data, and start off by just reading the first 5 bytes. You can then check whether those 5 bytes are the 5 bytes used to encode "false" in UTF-8, and act accordingly if so. Otherwise, write those 5 bytes to the output file and then do the same looping/reading/writing as before. Note that to read those 5 bytes you may need to loop (however unlikely that seems). Perhaps your IOUtils class has something to say "read at least 5 bytes"? Will the real content ever be smaller than 5 bytes?
To be honest, it would be better if you could use a header in the response to indicate the different result, instead of just a body with "false" - are you in control of the PHP script?
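A rough sketch of the prefix check, under a few assumptions of mine: the error body is exactly the five bytes of "false" with no trailing newline (mirroring the equals("false") test in the question), so the code reads one byte more than the marker to tell a bare "false" from a file that merely starts with those letters. PrefixCheckDemo, readFully and download are invented names, and the streams in main are in-memory stand-ins for the connection and the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: inspect the first bytes of a response, then stream the rest.
class PrefixCheckDemo {

    private static final byte[] FALSE_MARKER = "false".getBytes(StandardCharsets.UTF_8);

    // Reads up to 'max' bytes, looping because a single read() may return fewer.
    static int readFully(InputStream in, byte[] dest, int max) throws IOException {
        int total = 0;
        while (total < max) {
            int n = in.read(dest, total, max - total);
            if (n == -1) {
                break; // stream ended early
            }
            total += n;
        }
        return total;
    }

    // Returns false if the body is exactly "false"; otherwise copies the whole body to 'out'.
    static boolean download(InputStream in, OutputStream out) throws IOException {
        byte[] head = new byte[FALSE_MARKER.length + 1];
        int got = readFully(in, head, head.length);
        if (got == FALSE_MARKER.length
                && Arrays.equals(Arrays.copyOf(head, got), FALSE_MARKER)) {
            return false;
        }
        out.write(head, 0, got); // the peeked bytes belong to the real content
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        boolean ok = download(new ByteArrayInputStream(FALSE_MARKER), sink);
        System.out.println(ok + ", bytes written: " + sink.size());
    }
}
```

Because nothing is ever turned into a String, memory use stays constant no matter how large the download is.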
I thought I would find a solution to this problem relatively easily, but here I am calling upon the help from ye gods to pull me out of this conundrum.
So, I've got an image and I want to store it in an XML document using Java. I have previously achieved this in VisualBasic by saving the image to a stream, converting the stream to an array, and then VB's xml class was able to encode the array as a base64 string. But, after a couple of hours of scouring the net for an equivalent solution in Java, I've come back empty handed. The only success I have had has been by:
import it.sauronsoftware.base64.*;
import java.awt.image.BufferedImage;
import org.w3c.dom.*;
...
BufferedImage img;
Element node;
...
java.io.ByteArrayOutputStream os = new java.io.ByteArrayOutputStream();
ImageIO.write(img, "png", os);
byte[] array = Base64.encode(os.toByteArray());
String ss = arrayToString(array, ",");
node.setTextContent(ss);
...
private static String arrayToString(byte[] a, String separator) {
StringBuffer result = new StringBuffer();
if (a.length > 0) {
result.append(a[0]);
for (int i=1; i<a.length; i++) {
result.append(separator);
result.append(a[i]);
}
}
return result.toString();
}
Which is okay I guess, but reversing the process to get it back to an image when I load the XML file has proved impossible. If anyone has a better way to encode/decode an image in an XML file, please step forward, even if it's just a link to another thread that would be fine.
Cheers in advance,
Hoopla.
I've done something similar (encoding and decoding in Base64) and it worked like a charm. Here's what I think you should do, using the class Base64 from the Apache Commons project:
// ENCODING
BufferedImage img = ImageIO.read(new File("image.png"));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(img, "png", baos);
baos.flush();
String encodedImage = Base64.encodeBase64String(baos.toByteArray());
baos.close(); // should be inside a finally block
node.setTextContent(encodedImage); // store it inside node
// DECODING
String encodedImage = node.getTextContent();
byte[] bytes = Base64.decodeBase64(encodedImage);
BufferedImage image = ImageIO.read(new ByteArrayInputStream(bytes));
Hope it helps.
Apache Commons has a Base64 class that should be helpful to you:
From there, you can just write out the bytes (they are already in a readable format)
After you get your byte array
byte[] array = Base64.encode(os.toByteArray());
use an encoded String :
String encodedImg = new String( array, "utf-8");
Then you can do fun things in your xml like
<binImg string-encoding="utf-8" bin-encoding="base64" img-type="png"><![CDATA[ encodedImg here ]]></binImg>
With Java 6, you can use DatatypeConverter to convert a byte array to a Base64 string:
byte[] imageData = ...
String base64String = DatatypeConverter.printBase64Binary(imageData);
And to convert it back:
String base64String = ...
byte[] imageData = DatatypeConverter.parseBase64Binary(base64String);
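Note that DatatypeConverter lives in javax.xml.bind, which was removed from the JDK in Java 11. On Java 8 and later, java.util.Base64 gives the same round trip with no extra dependency. Below is a small sketch of storing bytes in a DOM element and reading them back; XmlBase64Demo, storeBytes and loadBytes are names I made up, and the eight sample bytes simply stand in for real image data.

```java
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: binary data in and out of an XML element as Base64 text.
class XmlBase64Demo {

    // Stores 'data' as Base64 text inside 'node'.
    static void storeBytes(Element node, byte[] data) {
        node.setTextContent(Base64.getEncoder().encodeToString(data));
    }

    // Reads the Base64 text back. The MIME decoder is used on purpose: it ignores
    // whitespace and line breaks that XML tooling may have added around the text.
    static byte[] loadBytes(Element node) {
        return Base64.getMimeDecoder().decode(node.getTextContent());
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element image = doc.createElement("image");
        doc.appendChild(image);

        // Stand-in for real image bytes, e.g. from ByteArrayOutputStream.toByteArray().
        byte[] original = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        storeBytes(image, original);
        System.out.println(Arrays.equals(original, loadBytes(image)));
    }
}
```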
Your arrayToString() method is rather bizarre (what's the point of that separator?). Why not simply say
String s = new String(array, "US-ASCII");
The reverse operation is
byte[] array = s.getBytes("US-ASCII");
Use the ASCII encoding, which should be sufficient when dealing with Base64 encoded data. Also, I'd prefer a Base64 encoder from a reputable source like Apache Commons.
You don't need to invent your own XML data type for this. XML schema defines standard binary data types, such as base64Binary, which is exactly what you are trying to do.
Once you use the standard types, it can be converted into binary automatically by some parsers (like XMLBeans). If your parser doesn't handle it, you can find classes for base64Binary in many places since the datatype is widely used in SOAP, XMLSec etc.
The easiest implementation I was able to make is below. It is for a server-to-server XML transfer containing binary data; Base64 is from the Apache Commons Codec library:
- Read the binary data from the DB and create the XML
Blob blobData = oRs.getBlob("ClassByteCode");
byte[] bData = blobData.getBytes(1, (int)blobData.length());
bData = Base64.encodeBase64(bData);
String strClassByteCode = new String(bData,"US-ASCII");
- On the requesting server, read the tag and save it in the DB
byte[] bData = strClassByteCode.getBytes("US-ASCII");
bData = Base64.decodeBase64(bData);
oPrStmt.setBytes( ++nParam, bData );
easy as it can be..
I'm still working on streaming the XML to the response object as it is generated on the first server, to handle the case where the XML with binary data is too large.
Vishesh Sahu
The basic problem is that you cannot have an arbitrary bytestream in an XML document, so you need to encode it somehow. A frequent encoding scheme is BASE64, but any will do as long as the recipient knows about it.
I know that the question was asking how to encode an image via XML, but it is also possible to just stream the bytes via an HTTP GET request instead of using XML and encoding the image. Note that input is a FileInputStream.
Server Code:
File f = new File(uri_string);
FileInputStream input = new FileInputStream(f);
OutputStream output = exchange.getResponseBody();
int c = 0;
while ((c = input.read()) != -1) {
output.write(c); //writes each byte to the exchange.getResponseBody();
}
result = new DownloadFileResult(int_list);
if (input != null) {input.close();}
if (output != null){ output.close();}
Client Code:
InputStream input = connection.getInputStream();
List<Integer> l = new ArrayList<>();
int b = 0;
while((b = input.read()) != -1){
l.add(b);//you can do what you wish with this list of ints ie- write them to a file. see code below.
}
Here is how you would write the Integer list to a file:
FileOutputStream out = new FileOutputStream("path/to/file.png");
for(int i : result_bytes_list){
out.write(i);
}
out.close();
node.setTextContent( base64.encodeAsString( fileBytes ) )
using org.apache.commons.codec.binary.Base64