I'm trying to figure out how to replace binary data using Java.
Below is a PHP example that replaces "foo" with "bar" in a SWF file.
<?php
$fp = fopen("binary.swf","rb");
$size = filesize("binary.swf");
$search = bin2hex("foo");
$replace = bin2hex("bar");
$data = fread($fp, $size);
$data16 = bin2hex($data);
$data16 = str_replace($search, $replace, $data16);
$data = pack('H*',$data16);
header("Content-Type:application/x-shockwave-flash");
echo $data;
?>
How do I do this in Java?
Try this:
InputStream in = new FileInputStream("filename");
StringBuilder sb = new StringBuilder();
byte[] b = new byte[4096];
for (int n; (n = in.read(b)) != -1;) {
sb.append(new String(b, 0, n));
}
in.close();
String data = sb.toString();
data = data.replace("foo", "bar");
//do whatever you want with data
I'm not sure how well this will work with truly binary data (such as the SWF file in your example). new String(b, 0, n) decodes the bytes with the platform default charset, so byte sequences that are not valid in that charset are replaced with a substitute character and cannot be recovered when you convert the String back to bytes. A multi-byte character can also be split across two 4096-byte reads. For binary data you probably want to keep working on bytes (a byte[] or a ByteArrayInputStream), but then you don't have easy ways of doing a search/replace.
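If you stay on bytes, the search/replace has to be written by hand. Here is a minimal sketch of that idea; the class name ByteReplace and its methods are made up for this example, and it uses a simple linear scan rather than anything clever:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteReplace {

    /** Returns a copy of data in which every occurrence of search is replaced by replacement. */
    static byte[] replace(byte[] data, byte[] search, byte[] replacement) {
        if (search.length == 0) {
            return data.clone(); // nothing sensible to search for
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        int i = 0;
        while (i < data.length) {
            if (matchesAt(data, i, search)) {
                out.write(replacement, 0, replacement.length);
                i += search.length; // continue after the matched bytes
            } else {
                out.write(data[i]);
                i++;
            }
        }
        return out.toByteArray();
    }

    /** True if search occurs in data starting exactly at index pos. */
    private static boolean matchesAt(byte[] data, int pos, byte[] search) {
        if (pos + search.length > data.length) {
            return false;
        }
        for (int j = 0; j < search.length; j++) {
            if (data[pos + j] != search[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 'f', 'o', 'o', (byte) 0xFF, 'f', 'o', 'o'};
        byte[] foo = "foo".getBytes(StandardCharsets.US_ASCII);
        byte[] bar = "bar".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(replace(data, foo, bar)));
    }
}
```

For a real file you could feed it Files.readAllBytes(...) and write the result back out. Note that a replacement of a different length changes the file size, which a format like SWF may not tolerate.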
I have a problem converting this Java code, which generates a Base64-encoded MD5 hash, to PHP.
I've tried for more than 5 hours without success.
This is the java code:
public static void main(String[] args) {
// TODO code application logic here
try {
String string = "customString";
String format = "20190101000000";
StringBuilder sb = new StringBuilder();
sb.append(format);
sb.append(string);
String sb2 = sb.toString();
byte[] bytes = sb2.getBytes();
byte[] bArr = new byte[16];
MessageDigest instance2 = MessageDigest.getInstance("MD5");
instance2.update(bytes, 0, bytes.length);
instance2.digest(bArr, 0, 16);
PrintStream printStream6 = System.out;
String a2 = Base64.getEncoder().encodeToString(bArr);
if (a2.length() >= 20) {
a2 = a2.substring(0, 19).trim();
}
StringBuilder sb8 = new StringBuilder();
sb8.append("MD5 16: ");
sb8.append(a2);
printStream6.println(sb8.toString());
} catch (Exception e) {
System.out.println(e);
}
}
And this is my PHP:
<?php
$string = 'customString';
$format = '20190101000000';
$res = $format . $string;
$md5 = md5($res, true);
echo $md5;
echo '------------------';
$base = base64_encode($md5);
echo $base;
echo '------------------';
$result = substr($base, 0, 19);
echo $result;
echo '------------------';
The Java result is 1B2M2Y8AsgTpgAmY7Ph and the PHP result is iSKxA+7Y1mMnHhwf0yb.
Check the charset encodings. Java Strings are UTF-16 internally, but when you convert to byte[] with sb2.getBytes() the platform default charset is used (e.g. ISO-8859-1).
You have to provide the charset in Java to get deterministic behavior:
sb2.getBytes(java.nio.charset.Charset.forName("UTF-8"));
Or, the other way round: if the goal isn't simply to make both produce the same output, and you have to implement a PHP solution compatible with your existing Java solution, convert the PHP UTF-8 string to the charset Java uses before calling md5(...). Use the iconv function for that.
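To compare the two sides, it can help to have a Java version where the charset is explicit and the digest is taken from the return value of digest(). This is only a sketch; the class and method names (Md5Base64, md5Base64) are invented for the example:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class Md5Base64 {

    /** Base64 of the MD5 digest of the UTF-8 bytes of input, cut to at most maxLen characters. */
    static String md5Base64(String input, int maxLen) throws NoSuchAlgorithmException {
        // explicit charset instead of the platform default
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
        byte[] digest = MessageDigest.getInstance("MD5").digest(bytes);
        String encoded = Base64.getEncoder().encodeToString(digest);
        return encoded.length() > maxLen ? encoded.substring(0, maxLen) : encoded;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // same inputs as in the question; compare with substr($base, 0, 19) in PHP
        System.out.println("MD5 16: " + md5Base64("20190101000000" + "customString", 19));
    }
}
```

For a pure-ASCII input like the one in the question, UTF-8 and ISO-8859-1 produce the same bytes, so the explicit charset mainly matters once non-ASCII characters show up in the string.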
I seem to be hitting a constant unexpected end of my file. My file contains first a couple of strings, then byte data.
The file contains a few separated strings, which my code reads correctly.
However when I begin to read the bytes, it returns nothing. I am pretty sure it has to do with me using the Readers. Does the BufferedReader read the entire stream? If so, how can I solve this?
I have checked the file, and it does contain plenty of data after the strings.
InputStreamReader is = new InputStreamReader(in);
BufferedReader br = new BufferedReader(is);
String line;
{
line = br.readLine();
String split[] = line.split(" ");
if (!split[0].equals("#binvox")) {
ErrorHandler.log("Not a binvox file");
return false;
}
ErrorHandler.log("Binvox version: " + split[1]);
}
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead, cnt = 0;
byte[] data = new byte[16384];
while ((nRead = in.read(data, 0, data.length)) != -1) {
buffer.write(data, 0, nRead);
cnt += nRead;
}
buffer.flush();
// cnt is always 0
The binvox format is as follows:
#binvox 1
dim 64 40 32
translate -3 0 -2
scale 6.434
data
[byte data]
I'm basically trying to convert the following C code to Java:
http://www.cs.princeton.edu/~min/binvox/read_binvox.html
To read all the lines, you can do this:
ArrayList<String> lines = new ArrayList<String>();
while ((line = br.readLine()) != null) {
    lines.add(line);
}
and then you can loop over the lines to split each one, or do the processing directly inside the loop.
As icza has already written, you can't wrap an InputStream in a BufferedReader and use both. The BufferedReader will read as much from the InputStream as it wants, and that data is then no longer available from the InputStream.
You have several ways to fix it:
1. Don't use any Reader. Read the bytes yourself from the InputStream and call new String(bytes) on the text part.
2. Store your data encoded (e.g. Base64). Encoded data can be read from a Reader. I would recommend this solution. It would look like this:
public byte[] readBytes(BufferedReader in) throws IOException
{
    // assumes the Base64 text was written on a single line; Base64.getEncoder() output contains no line breaks
    String base64 = in.readLine();
    byte[] data = Base64.getDecoder().decode(base64);
    return data;
}
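The option without any Reader could be sketched roughly like this. RawLineReader is a hypothetical helper written for this example, and it assumes the header is plain ASCII with lines ending in \n (a preceding \r is dropped):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class RawLineReader {

    /** Reads bytes up to (not including) the next '\n'; returns null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return new String(line.toByteArray(), StandardCharsets.US_ASCII);
            }
            if (b != '\r') { // tolerate \r\n line endings
                line.write(b);
            }
        }
        // end of stream: return what was collected, or null if there was nothing
        return line.size() == 0 ? null : new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] file = "#binvox 1\ndim 64 40 32\ndata\n\u0001\u0002\u0003".getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new ByteArrayInputStream(file);
        String line;
        while ((line = readLine(in)) != null && !line.equals("data")) {
            System.out.println("header: " + line);
        }
        // nothing was read ahead, so the binary part is still in the stream
        System.out.println("binary bytes left: " + in.available());
    }
}
```

Because exactly one byte is consumed per read() call, the stream is positioned on the first binary byte once the "data" line has been read. Wrap the source in a BufferedInputStream if the single-byte reads turn out to be slow.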
You can't wrap an InputStream in a BufferedReader and use both.
As its name hints, BufferedReader might read ahead and buffer data from the underlying InputStream which then will not be available when reading from the underlying InputStream directly.
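A small experiment makes the read-ahead visible. This is just an illustrative sketch (ReadAheadDemo is a made-up name) that uses an in-memory stream in place of the real file:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ReadAheadDemo {

    /** Reads one line through a BufferedReader and reports how many bytes are left in the raw stream. */
    static int bytesLeftAfterFirstLine(byte[] content) throws IOException {
        InputStream in = new ByteArrayInputStream(content);
        BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        br.readLine();         // returns only the first line, but the reader buffers much more
        return in.available(); // what a direct in.read(...) could still see
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "#binvox 1\ndata\n\u0001\u0002\u0003".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("bytes left for in.read(): " + bytesLeftAfterFirstLine(content));
    }
}
```

With an input this small the reader swallows the whole stream on the first readLine(), so nothing is left for a direct read, which matches the "cnt is always 0" symptom in the question.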
The suggested solution is not to mix text and binary data in one file. Store them in two separate files so that they can be read separately. If the remaining data is not binary, read it through your BufferedReader wrapper rather than through the InputStream, just as you read the first lines.
I recommend to create a BinvoxDetectorStream that pre-reads some bytes
public class BinvoxDetectorStream extends InputStream {
private InputStream orig;
private byte[] buffer = new byte[4096];
private int buflen;
private int bufpos = 0;
public BinvoxDetectorStream(InputStream in) throws IOException {
this.orig = new BufferedInputStream(in);
// pre-read the first block so the header can be inspected later
this.buflen = orig.read(this.buffer, 0, this.buffer.length);
}
public BinvoxInfo getBinvoxInfo() throws IOException {
// creating a reader for the buffered bytes, to read a line, and compare the header
ByteArrayInputStream bais = new ByteArrayInputStream(buffer, 0, Math.max(buflen, 0));
BufferedReader rdr = new BufferedReader(new InputStreamReader(bais));
String line = rdr.readLine();
if (line == null) { return null; } // empty stream
String split[] = line.split(" ");
if (split[0].equals("#binvox")) {
BinvoxInfo info = new BinvoxInfo();
info.version = split[1];
split = rdr.readLine().split(" ");
[... parse all properties ...]
// seek for "data\r\n" in the buffered data
while(bufpos < buflen && !(bufpos>=6 &&
buffer[bufpos-6] == 'd' &&
buffer[bufpos-5] == 'a' &&
buffer[bufpos-4] == 't' &&
buffer[bufpos-3] == 'a' &&
buffer[bufpos-2] == '\r' &&
buffer[bufpos-1] == '\n') ) {
bufpos++;
}
return info;
}
return null;
}
@Override
public int read() throws IOException {
if(bufpos < buflen) {
return buffer[bufpos++] & 0xFF; // mask so bytes >= 0x80 are not returned as negative values
}
return orig.read();
}
}
Then, you can detect the Binvox version without touching the original stream:
BinvoxDetectorStream bds = new BinvoxDetectorStream(in);
BinvoxInfo info = bds.getBinvoxInfo();
if (info == null) {
return false;
}
...
[moving bytes in the usual way, but using bds!!! ]
This way we preserve the original bytes in bds, so we'll be able to copy it later.
I saw someone else's code that solved exactly this.
They used DataInputStream, which offers both readLine (although deprecated) and readByte.
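A rough sketch of how that could look, assuming an ASCII header that ends with a "data" line as in the binvox description above (DataInputStreamDemo and readPayload are names invented for this example):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class DataInputStreamDemo {

    /** Skips header lines up to and including "data", then returns the remaining bytes. */
    @SuppressWarnings("deprecation") // readLine() maps each byte to one char, which is fine for ASCII headers
    static byte[] readPayload(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        String line;
        while ((line = in.readLine()) != null && !line.equals("data")) {
            // header lines such as "dim 64 40 32" would be parsed here
        }
        ByteArrayOutputStream rest = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            rest.write(chunk, 0, n);
        }
        return rest.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        byte[] header = "#binvox 1\ndim 64 40 32\ndata\n".getBytes(StandardCharsets.US_ASCII);
        file.write(header, 0, header.length);
        file.write(0xFF);
        file.write(0x00);
        byte[] payload = readPayload(new ByteArrayInputStream(file.toByteArray()));
        System.out.println(payload.length + " payload bytes");
    }
}
```

DataInputStream does no buffering of its own, so after the header lines the stream is still positioned at the first binary byte.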
I have a file which is split into two parts by "\n\n": the first part is a fairly short String and the second is a byte array, which can be quite long.
I am trying to read the file as follows:
byte[] result;
try (final FileInputStream fis = new FileInputStream(file)) {
final InputStreamReader isr = new InputStreamReader(fis);
final BufferedReader reader = new BufferedReader(isr);
String line;
// reading until \n\n
while (!(line = reader.readLine()).trim().isEmpty()){
// processing the line
}
// copying the rest of the byte array
result = IOUtils.toByteArray(reader);
reader.close();
}
Even though the resulting array is the size it should be, its contents are broken. If I try to use toByteArray directly on fis or isr, the contents of result are empty.
How can I read the rest of the file correctly and efficiently?
Thanks!
The reason your contents are broken is that reading through a Reader decodes your data as text in the default character encoding, i.e. it converts the 8-bit binary values into characters using whatever logic your default encoding prescribes, and IOUtils.toByteArray(...) then encodes those characters back into bytes. This usually leads to many of the binary values getting corrupted.
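If the Reader decodes with a charset that maps every byte to exactly one character, this can work:
result = IOUtils.toByteArray(reader, "ISO-8859-1");
ISO-8859-1 uses exactly one byte per character, and Java's implementation maps every byte value 0-255 to the character with the same code point, so arbitrary bytes survive the round trip. For this to work, the InputStreamReader has to be created with the same charset, e.g. new InputStreamReader(fis, "ISO-8859-1"); with the default charset the damage is already done before toByteArray is called.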
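You can check how a given charset treats arbitrary bytes with a quick round-trip test. The following is only a sketch (Latin1RoundTrip and its methods are names chosen for the example) that decodes and re-encodes all 256 byte values:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {

    /** One byte of every possible value, 0x00 through 0xFF. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    /** True if decoding the bytes to a String and encoding them again gives the same bytes. */
    static boolean roundTrips(byte[] bytes, Charset cs) {
        return Arrays.equals(bytes, new String(bytes, cs).getBytes(cs));
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println("ISO-8859-1: " + roundTrips(all, StandardCharsets.ISO_8859_1)); // true
        System.out.println("UTF-8:      " + roundTrips(all, StandardCharsets.UTF_8));      // false
    }
}
```

UTF-8 fails because bytes such as 0x80-0xFF on their own are not valid sequences and get replaced during decoding. The platform default charset is often UTF-8, which is why the array in the question has a plausible size but broken contents.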
But a much cleaner solution would be to read the String at the beginning as binary data first and then convert it to text via new String(bytes), rather than reading the binary data at the end as a String and then converting it back.
This might mean, though, that you need to implement your own version of a BufferedReader for performance purposes.
You can find the source code of the standard BufferedReader via the obvious Google search, which will (for example) lead you here:
http://www.docjar.com/html/api/java/io/BufferedReader.java.html
It's a bit long, but conceptually not too difficult to understand, so hopefully it will be useful as a reference.
Alternatively, you could read the file into a byte array, find the position of \n\n and split the array into the header line and the bytes:
byte[] a = Files.readAllBytes(Paths.get("file"));
String line = "";
byte[] result = a;
for (int i = 0; i < a.length - 1; i++) {
    if (a[i] == '\n' && a[i + 1] == '\n') {
        line = new String(a, 0, i);
        // skip both newline bytes so the payload starts right after "\n\n"
        int len = a.length - i - 2;
        result = new byte[len];
        System.arraycopy(a, i + 2, result, 0, len);
        break;
    }
}
Thanks for all the comments - the final implementation was done in this way:
try (final FileInputStream fis = new FileInputStream(file)) {
ByteBuffer buffer = ByteBuffer.allocate(64);
boolean wasLast = false;
String headerValue = null, headerKey = null;
byte[] result = null;
while (true) {
int read = fis.read();
if (read == -1) {
// end of file reached before \n\n was found
break;
}
byte current = (byte) read;
if (current == '\n') {
if (wasLast) {
// this is \n\n
break;
} else {
// just a new line in header
wasLast = true;
headerValue = new String(buffer.array(), 0, buffer.position());
buffer.clear();
}
} else if (current == '\t') {
// headerKey\theaderValue\n
headerKey = new String(buffer.array(), 0, buffer.position());
buffer.clear();
} else {
buffer.put(current);
wasLast = false;
}
}
// reading the rest
result = IOUtils.toByteArray(fis);
}
I am trying to use the following code to read a Google text document, but the value returned is a stream of garbage characters instead of the real contents. How can I fix this?
for (DocumentListEntry entry : resultFeed.getEntries()) {
String docId = entry.getDocId();
String docType = entry.getType();
URL exportUrl = new URL("https://docs.google.com/feeds/download/"
+ docType
+ "s/Export?docID="
+ docId
+ "&exportFormat=doc");
MediaContent mc = new MediaContent();
mc.setUri(exportUrl.toString());
MediaSource ms = client.getMedia(mc);
InputStream inStream = null;
try {
inStream = ms.getInputStream();
int c;
while ((c = inStream.read()) != -1) {
System.out.print((char)c);
}
} finally {
if (inStream != null) {
inStream.close();
}
}
}
From a quick read of the documentation, it looks like you are reading the raw bytes of a Microsoft Word-encoded document.
Try changing the &exportFormat=doc to html or txt and see if the output makes more sense.
I suspect that the files you are trying to print use some other encoding, but you're printing them byte by byte as if each byte were one character. I would try to read the whole stream into a byte array and then convert it to a String using the right encoding (e.g. UTF-8).
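Something along these lines, as a sketch only: it assumes the export really is text (for example a txt or html export) and that you know its encoding. StreamDecode and readAll are names made up here, and UTF-8 is a guess, not something the API promises:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamDecode {

    /** Reads the stream to its end and decodes all bytes at once with the given charset. */
    static String readAll(InputStream in, Charset cs) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        // decoding the complete array avoids splitting multi-byte characters
        return new String(buf.toByteArray(), cs);
    }

    public static void main(String[] args) throws IOException {
        // stand-in for ms.getInputStream(): the UTF-8 bytes of "héllo"
        byte[] utf8 = {'h', (byte) 0xC3, (byte) 0xA9, 'l', 'l', 'o'};
        System.out.println(readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));
    }
}
```

In the loop from the question you would call readAll(inStream, ...) instead of casting each byte to char. For a binary export such as doc this still prints garbage, because the content is not text in any encoding.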
Some characters are not supported by certain charsets, so the test below fails. I would like to use HTML entities to encode ONLY those unsupported characters. How can I do this in Java?
public void testWriter() throws IOException{
String c = "\u00A9";
String encoding = "gb2312";
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
Writer writer = new BufferedWriter(new OutputStreamWriter(outStream, encoding));
writer.write(c);
writer.close();
String result = new String(outStream.toByteArray(), encoding);
assertEquals(c, result);
}
I'm not positive I understand the question, but something like this might help:
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
...
StringBuilder buf = new StringBuilder(c.length());
CharsetEncoder enc = Charset.forName("gb2312").newEncoder();
for (int idx = 0; idx < c.length(); ++idx) {
char ch = c.charAt(idx);
if (enc.canEncode(ch))
buf.append(ch);
else {
buf.append("&#");
buf.append((int) ch);
buf.append(';');
}
}
String result = buf.toString();
This code is not robust, because it doesn't handle characters beyond the Basic Multilingual Plane. But by iterating over the code points in the String, and using the canEncode(CharSequence) method of the CharsetEncoder, you should be able to handle any character.
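The code-point variant could look something like this. It is a sketch under the same assumptions as the snippet above; EntityFallback and encodeUnsupported are names invented for the example, and the demo falls back to US-ASCII in case the JVM does not ship gb2312:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class EntityFallback {

    /** Replaces every code point the charset cannot encode with a numeric entity such as &#169;. */
    static String encodeUnsupported(String text, Charset charset) {
        CharsetEncoder enc = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp); // 2 for characters outside the BMP
            String piece = text.substring(i, i + len);
            if (enc.canEncode(piece)) {
                out.append(piece);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Charset cs = Charset.isSupported("gb2312")
                ? Charset.forName("gb2312")
                : StandardCharsets.US_ASCII; // fallback so the demo still runs
        System.out.println(encodeUnsupported("copyright \u00A9 and \uD83D\uDE00", cs));
    }
}
```

The result contains only characters the target charset can represent, so it can then be written through an OutputStreamWriter with that charset without losing anything.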
Try using StringEscapeUtils from apache commons.
Just use utf-8, and that way there is no reason to use entities.
If there is an argument that some clients need gb2312 because they don't understand Unicode, then entities are not much use either, because the numeric entities represent Unicode code points.