Why won't Java's BufferedReader act like Objective-C's NSData? - java

I'm developing an application that runs on Android and iOS devices. For this app I need to get an XML stream from a URL. This XML is not really clean, because some text, for example:
Révélation
Will become :
R�v�lation
Of course I know the best thing to do is to fix the XML generator script. But I'm only working as a developer for a firm and don't have access to it, so for the moment I'm trying to do what I can with what I have.
Now here is the reason for this post. When I put this data in an Objective-C NSData object:
NSData *data = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:url]];
And then try to read every byte :
NSUInteger len = [data length];
Byte *byteData = (Byte*)malloc(len);
memcpy(byteData, [data bytes], len);
for(int i = 0 ; i < len ; i++)
{
NSLog(@"%d", byteData[i]);
}
It correctly displays the int value of the char, special character or not. Then I just have to handle (unichar)byteData[i] to solve it.
Now with Java and Android, I'm trying to do a basic BufferedReader operation:
URL myURL = new URL(url);
BufferedReader in = new BufferedReader(new InputStreamReader(myURL.openStream()));
Then print every char's int one by one :
int i;
while((i = in.read()) != -1) System.out.print(i);
But with Java, doing this immediately gives me the replacement character's code (65533) instead of the correct one, and I can't manage to replace it.
Any ideas? Thanks for reading.

BufferedReader in = new BufferedReader(
new InputStreamReader(myURL.openStream(), "UTF-8"));
InputStreams are for bytes, binary data.
Readers are for characters, String, text.
The InputStreamReader bridges this conceptual difference, saying which encoding the binary data is in, and has an optional parameter for the encoding. The default encoding is that of the current platform - so not very portable.
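To see why the charset argument matters, here is a minimal sketch. It assumes (the question does not confirm this) that the feed is really written in ISO-8859-1 while being decoded as UTF-8; the class and method names are made up for illustration. If that assumption holds, the charset to pass to InputStreamReader would be the one the feed actually uses rather than "UTF-8".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question or answer.
class CharsetDemo {
    // Decode raw bytes with an explicitly named charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "Révélation" as ISO-8859-1 bytes: each 'é' is the single byte 0xE9.
        byte[] raw = "R\u00e9v\u00e9lation".getBytes(StandardCharsets.ISO_8859_1);

        // As UTF-8, 0xE9 followed by 'v' is malformed, so the decoder
        // substitutes the replacement character U+FFFD (65533).
        String wrong = decode(raw, StandardCharsets.UTF_8);
        // Decoding with the charset the bytes were written in keeps the 'é'.
        String right = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println((int) wrong.charAt(1));
        System.out.println((int) right.charAt(1));
    }
}
```

Once a Reader has produced U+FFFD, the original byte is gone, which is why the replacement cannot be undone afterwards; the decision has to be made where the bytes are decoded.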

Related

Sending large data over TCP/IP socket

I have a small project running a server in C# and a client in Java. The server sends images to the client.
Some images are quite big (up to 10MiB sometimes), so I split the image bytes and send it in chunks of 32768 bytes each.
My C# Server code is as follows:
using (var stream = new MemoryStream(ImageData))
{
for (int j = 1; j <= dataSplitParameters.NumberOfChunks; j++)
{
byte[] chunk;
if (j == dataSplitParameters.NumberOfChunks)
chunk = new byte[dataSplitParameters.FinalChunkSize];
else
chunk = new byte[dataSplitParameters.ChunkSize];
int result = stream.Read(chunk, 0, chunk.Length);
string line = DateTime.Now + ", Status OK, " + ImageName+ ", ImageChunk, " + j + ", " + dataSplitParameters.NumberOfChunks + ", " + chunk.Length;
//write read params
streamWriter.WriteLine(line);
streamWriter.Flush();
//write the data
binaryWriter.Write(chunk);
binaryWriter.Flush();
Console.WriteLine(line);
string deliveryReport = streamReader.ReadLine();
Console.WriteLine(deliveryReport);
}
}
And my Java Client code is as follows:
long dataRead = 0;
for (int j = 1; j <= numberOfChunks; j++) {
String line = bufferedReader.readLine();
tokens = line.split(", ");
System.out.println(line);
int toRead = Integer.parseInt(tokens[tokens.length - 1]);
byte[] chunk = new byte[toRead];
int read = inputStream.read(chunk, 0, toRead);
//do something with the data
dataRead += read;
String progressReport = pageLabel + ", progress: " + dataRead + "/" + dataLength + " bytes.";
bufferedOutputStream.write((progressReport + "\n").getBytes());
bufferedOutputStream.flush();
System.out.println(progressReport);
}
The problem is when I run the code, either the client crashes with an error saying it is reading bogus data, or both the client and the server hang. This is the error:
Document Page 1, progress: 49153/226604 bytes.
�9��%>�YI!��F�����h�
Exception in thread "main" java.lang.NumberFormatException: For input string: .....
What am I doing wrong?
The basic problem.
Once you wrap an InputStream in a BufferedReader, you must stop accessing the InputStream directly. The BufferedReader is buffered: it will read as much data as it wants to. It is NOT limited to reading exactly up to the next newline and stopping there.
The BufferedReader on the Java side has read a lot more than that, so it has already consumed a whole bunch of image data, and there's no way back from there. Creating that BufferedReader makes the job impossible, so you can't do that.
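The over-reading is easy to reproduce without a socket. The sketch below uses an in-memory stream as a stand-in for the connection; the class name and the sample data are invented for the demonstration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the question's code.
class OverReadDemo {
    // Returns how many bytes are still left in the raw stream after a
    // BufferedReader wrapped around it has returned a single line.
    static int bytesLeftAfterReadLine(byte[] wire) throws IOException {
        InputStream raw = new ByteArrayInputStream(wire);
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.ISO_8859_1));
        reader.readLine();      // asks for one line only...
        return raw.available(); // ...but the reader has already buffered more
    }

    public static void main(String[] args) throws IOException {
        // One text line followed by 13 bytes standing in for image data.
        byte[] wire = "header line\nBINARYPAYLOAD".getBytes(StandardCharsets.ISO_8859_1);
        // Fewer than 13 bytes remain (typically none): the payload now sits
        // inside the reader's buffer, out of reach of inputStream.read.
        System.out.println(bytesLeftAfterReadLine(wire));
    }
}
```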
The underlying problem.
You have a single TCP/IP connection. On this, you send some irrelevant text (the page, the progress, etc), and then you send an unknown amount of image data, and then you send another irrelevant progress update.
That's fundamentally broken. How can an image parser possibly know that halfway through sending an image, you get a status update line? Text is just binary data too, there is no magic identifier that lets a client know: This byte is part of the image data, but this byte is some text sent in-between with progress info.
The simple fix.
You'd think the simple fix is.. well, stop doing that then! Why are you sending this progress? The client is perfectly capable of knowing how many bytes it read, there is no point sending that. Just.. take your binary data. open the outputstream. send all that data. And on the client side, open the inputstream, read all that data. Don't involve strings. Don't use anything that smacks of 'works with characters' (so, BufferedReader? No. BufferedInputStream is fine).
... but now the client doesn't know the title, nor the total size!
So make a wire protocol. It can be near trivial.
This is your wire protocol:
4 bytes, big endian: SizeOfName
SizeOfName number of bytes. UTF-8 encoded document title.
4 bytes, big endian: SizeOfData
SizeOfData number of bytes. The image data.
And that's if you actually want the client to be able to render a progress bar and to know the title. If that's not needed, don't do any of that, just straight up send the bytes, and signal that the file has been completely sent by.. closing the connection.
Here's some sample java code:
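import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

try (InputStream in = ....) {
    int nameSize = readInt(in);
    byte[] nameBytes = in.readNBytes(nameSize);
    String name = new String(nameBytes, StandardCharsets.UTF_8);
    int dataSize = readInt(in);
    try (OutputStream out =
            Files.newOutputStream(Paths.get("/Users/TriSky/image.png"))) {
        byte[] buffer = new byte[65536];
        while (dataSize > 0) {
            // Never read past the end of this image, in case more data follows it.
            int r = in.read(buffer, 0, Math.min(buffer.length, dataSize));
            if (r == -1) throw new IOException("Early end-of-stream");
            out.write(buffer, 0, r);
            dataSize -= r;
        }
    }
}
// Reads a 4-byte big-endian int (big-endian is ByteBuffer's default byte order).
public int readInt(InputStream in) throws IOException {
    byte[] b = in.readNBytes(4);
    if (b.length < 4) throw new EOFException("Early end-of-stream");
    return ByteBuffer.wrap(b).getInt();
}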
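For completeness, the sending side of the same framing could look roughly like this. It is written in Java to match the receiving code above; the question's real server is C#, where BinaryWriter writes integers little-endian, so there the two length fields would have to be written big-endian by hand. The class and method names are my own.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sender for the wire protocol described above.
class FrameWriter {
    // Writes: 4-byte big-endian name length, UTF-8 name,
    //         4-byte big-endian data length, raw data.
    static void writeFrame(OutputStream out, String name, byte[] data) throws IOException {
        DataOutputStream dout = new DataOutputStream(out);
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        dout.writeInt(nameBytes.length); // DataOutputStream.writeInt is big-endian
        dout.write(nameBytes);
        dout.writeInt(data.length);
        dout.write(data);
        dout.flush();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory sink stands in for the socket's output stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeFrame(sink, "page1.png", new byte[] {1, 2, 3});
        System.out.println(sink.size() + " bytes written");
    }
}
```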
Closing notes
Another bug in your app is that you're using the wrong method. Java's read(byte[]) method will NOT (necessarily) fill that byte array completely. All read(byte[]) guarantees is that it reads at least 1 byte (unless the stream has ended, in which case it reads nothing and returns -1). The idea is that read returns the optimal number of bytes: exactly as many as are available right now. How many is that? Who knows. If you ignore the value returned by in.read(bytes), your code is necessarily broken, and you're doing just that. What you really want is, for example, readNBytes, which guarantees that it fills the byte array completely (or reads until the stream ends, whichever happens first).
Note that in the transfer code above, I also use the basic read, but here I don't ignore the return value.
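For runtimes without readNBytes (it only exists on Java 9 and later), the same guarantee takes a short loop. This is a sketch; the "trickle" stream is an invented stand-in for a socket that delivers data in small pieces.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class.
class ReadFullyDemo {
    // Keeps calling read until buf is full; throws if the stream ends first.
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int r = in.read(buf, off, buf.length - off);
            if (r == -1) throw new EOFException("stream ended after " + off + " bytes");
            off += r;
        }
    }

    // Test stream that hands out at most 3 bytes per read call.
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

        // A single read call returns early, leaving most of the array unfilled.
        byte[] once = new byte[10];
        int got = trickle(data).read(once, 0, once.length);
        System.out.println("single read returned " + got + " of " + once.length);

        // The loop keeps going until every requested byte has arrived.
        byte[] full = new byte[10];
        readFully(trickle(data), full);
        System.out.println("last byte after readFully: " + full[9]);
    }
}
```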
Your Java code seems to be using a BufferedReader. It reads data into a buffer of its own, meaning it is no longer available in the underlying socket input stream - that's your first problem. You have a second problem with how inputStream.read is used - it's not guaranteed to read all the bytes you ask for, you would have to put a loop around it.
This is not a particularly easy problem to solve. When you mix binary and text data in the same stream, it is difficult to read it back. In Java, there is a class called DataInputStream that can help a little - it has a readLine method to read a line of text, and also methods to read binary data:
DataInputStream dataInput = new DataInputStream(inputStream);
for (int j = 1; j <= numberOfChunks; j++) {
String line = dataInput.readLine();
...
byte[] chunk = new byte[toRead];
dataInput.readFully(chunk); // returns void: fills the whole array or throws EOFException
...
}
DataInputStream has limitations: the readLine method is deprecated because it assumes the text is encoded in latin-1, and does not let you use a different text encoding. If you want to go further down this road you'll want to create a class of your own to read your stream format.
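A class of your own for this could be as small as the sketch below. It is only an illustration under a few assumptions: header lines end in "\n" or "\r\n", the last comma-separated field is the chunk length as in the question, and the class and method names are invented.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the original answer.
class MixedStreamReader {
    private final DataInputStream in;

    MixedStreamReader(InputStream raw) {
        // One buffered stream serves both text and binary reads,
        // so no bytes get stranded in a second buffer.
        this.in = new DataInputStream(new BufferedInputStream(raw));
    }

    // Reads bytes up to the next '\n' and decodes them; returns null at end of stream.
    String readLine(Charset charset) throws IOException {
        int b = in.read();
        if (b == -1) return null;
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            line.write(b);
            b = in.read();
        }
        String text = new String(line.toByteArray(), charset);
        // Drop the '\r' of a "\r\n" line ending.
        return text.endsWith("\r") ? text.substring(0, text.length() - 1) : text;
    }

    // Reads exactly length bytes, or throws EOFException if the stream ends early.
    byte[] readChunk(int length) throws IOException {
        byte[] chunk = new byte[length];
        in.readFully(chunk);
        return chunk;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write("Status OK, ImageChunk, 3\r\n".getBytes(StandardCharsets.UTF_8));
        wire.write(new byte[] {10, 13, 7}); // binary payload may contain newline bytes
        MixedStreamReader reader =
                new MixedStreamReader(new ByteArrayInputStream(wire.toByteArray()));
        String header = reader.readLine(StandardCharsets.UTF_8);
        int toRead = Integer.parseInt(header.substring(header.lastIndexOf(' ') + 1));
        System.out.println(header + " -> " + reader.readChunk(toRead).length + " bytes");
    }
}
```

The important design point is that every read, text or binary, goes through the same object, so the buffering problem described above cannot occur.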
Some images are quite big (up to 10MiB sometimes), so I split the image bytes and send it in chunks of 32768 bytes each.
You know this is totally unnecessary right? There is absolutely no problem sending multiple megabytes of data into a TCP socket, and streaming all of the data in on the receiving side.
When you send an image, open it as a normal file, split it into chunks, and Base64-encode each chunk before sending it; the client then decodes each chunk. Image data is arbitrary binary data rather than text, so Base64 encoding turns those bytes into ordinary characters like AfHM65Hkgf7MM.

How to avoid unwanted data being read from java servlet to android

I have a Java servlet which takes some data from an android app and returns a string data back to the android app using the following code.
response.getOutputStream().write(STRING_MESSAGE.getBytes());
The value I pass here is read from the android activity as:
InputStream is = con.getInputStream();
byte[] b = new byte[1024];
while(is.read(b) != -1) {
buffer.append(new String(b));
}
The value is then converted to String using:
String result = buffer.toString();
But after doing so, the result has some added unwanted characters (they appear as a '?' inside a diamond shape) appended to the original string I have passed from the servlet. How can I avoid this?
As nafas said, the encoding is probably the problem.
Try replacing the write to the output stream with this:
response.getOutputStream().write(STRING_MESSAGE.getBytes(Charset.forName("UTF-8")));
You also have to apply the same change when reading from the InputStream:
buffer.append(new String(b, Charset.forName("UTF-8")));
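Separately from the charset, the read loop in the question ignores how many bytes each read call actually delivered, so new String(b) also decodes the unused tail of the 1024-byte buffer. That may well be where the extra characters come from. A sketch that uses the count and decodes only once at the end (class and method names are invented):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class.
class ReadAllDemo {
    // Collects the whole stream as bytes, then decodes it in one step.
    static String readAll(InputStream is, Charset charset) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] b = new byte[1024];
        int n;
        while ((n = is.read(b)) != -1) {
            collected.write(b, 0, n); // only the n bytes that were just read
        }
        // Decoding once keeps multi-byte characters intact even when
        // they were split across two read calls.
        return new String(collected.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        // What the original loop effectively does: decode the whole buffer.
        byte[] b = new byte[1024];
        new ByteArrayInputStream(body).read(b, 0, b.length);
        System.out.println(new String(b, StandardCharsets.UTF_8).length());

        // Using the byte count instead.
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8).length());
    }
}
```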

Parsing PDF that has been downloaded from internet

I have searched Stack Overflow for questions about this topic. They really helped me, but I'm stuck again.
My problem is that I need to write a method that downloads a PDF from a site (like www.example.com/abc.pdf) and then reads its content. I don't want to save the file, just print its text to System.out, so I don't need to write the bytes to a FileOutputStream. I tried casting the bytes to char to get characters (it may be the dumbest solution), but I got unknown characters. Any ideas, or have I understood this the wrong way?
Here is the code and its output:
String textlink="http://www.selab.isti.cnr.it/ws-mate/example.pdf";// it comes from main class
public String HtmlTest(String textLink) throws IOException{
StringBuilder sd=new StringBuilder();
URL link=new URL(textLink);
URLConnection urlConn = link.openConnection();
BufferedInputStream in = null;
try
{
in = new BufferedInputStream(urlConn.getInputStream());
byte data[] = new byte[1024];
in.read(data, 0, 1024);
for (int j = 0; j < data.length; j++) {
if(j%100==0){
sd.append((char)data[j]+"\n"); // i used this for making readable text
}
else{
sd.append((char)data[j]);
}
}
}
finally
{
if (in != null)
in.close();
}
return sd.toString();
}
Output
run:
%
PDF-1.3
%ᅦ↓マᄁ
7 0 obj
<</Length 8 0 R/Filter /FlateDecode>>
stream
xワᆳY[モᅴᄊ○ᄈ&?BoNf,,q%¢ᄐ4￞x&゙6ᄅロlᅮ
ラᄐ￐폐Zeム￲f→チ
You're not going to get very far trying to read a .pdf file as though it were basically a text file. For starters, the "text" is in a compressed binary format; there are other issues you'll probably also have to deal with.
STRONG SUGGESTION:
Use a Java .pdf library like Apache PDFBox
IMHO.
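To make the "compressed binary format" point concrete: the /FlateDecode filter in the output above refers to zlib/deflate compression, which the JDK exposes through java.util.zip. The sketch below only shows why such bytes look like noise when cast to char. It is not a PDF parser, the sample text and class name are invented, and real PDFs add object structure, fonts and text encodings on top, which is why a library remains the practical route.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; compresses and decompresses with zlib.
class FlateDemo {
    static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] packed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            // Guard against looping forever on truncated input.
            if (n == 0 && inflater.needsInput()) throw new DataFormatException("truncated data");
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] plain = "BT /F1 12 Tf (Hello, PDF) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] packed = deflate(plain);
        // A zlib stream starts with byte 0x78, the letter 'x', matching the
        // 'x' that begins the garbled stream data in the question's output.
        System.out.println((char) packed[0]);
        System.out.println(new String(inflate(packed), StandardCharsets.US_ASCII));
    }
}
```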

java: error checking php output

Hi, I have a problem I'm not able to solve.
In my Android/Java application I call a script, download.php. Basically it outputs a file that I download and save on my device. I had to add a check to all my PHP scripts: the app sends a token to the script, which checks whether it is valid. If the token is valid I get the output (in this case a file; in the other scripts, a JSON file); if it is not, I get back the string "false".
To check this condition in my other Java files I used an IOUtils method to turn the input stream into a String, check it, and then
InputStream newInputStream = new ByteArrayInputStream(mystring.getBytes("UTF-8"));
to get a valid input stream again and read it. It works with my JSON files, but not in this case. I get this error:
11-04 16:50:31.074: ERROR/AndroidRuntime(32363):
java.lang.OutOfMemoryError
when I try IOUtils.toString(inputStream, "UTF-8");
I think it's because in this case I'm trying to download a really long file.
fileOutput = new BufferedOutputStream(new FileOutputStream(file,false));
inputStream = new BufferedInputStream(conn.getInputStream());
String result = IOUtils.toString(inputStream, "UTF-8");
if(result.equals("false"))
{
return false;
}
else
{
Reader r = new InputStreamReader(MyMethods.stringToInputStream(result));
int totalSize = conn.getContentLength();
int downloadedSize = 0;
byte[] buffer = new byte[1024];
int bufferLength = 0;
while ( (bufferLength = inputStream.read(buffer)) > 0 )
{
fileOutput.write(buffer, 0, bufferLength);
downloadedSize += bufferLength;
}
fileOutput.flush();
fileOutput.close();
Don't read the stream as a string to start with. Keep it as binary data, and start off by just reading the first 5 bytes. You can then check whether those 5 bytes are the 5 bytes used to encode "false" in UTF-8, and act accordingly if so. Otherwise, write those 5 bytes to the output file and then do the same looping/reading/writing as before. Note that to read those 5 bytes you may need to loop (however unlikely that seems). Perhaps your IOUtils class has something to say "read at least 5 bytes"? Will the real content ever be smaller than 5 bytes?
To be honest, it would be better if you could use a header in the response to indicate the different result, instead of just a body with "false" - are you in control of the PHP script?
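The byte-prefix check could be sketched like this. One deliberate deviation, which is my own assumption rather than part of the answer: it reads one byte more than "false" needs, so that a real file which merely starts with the bytes "false" is not mistaken for the error reply. Class and method names are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class.
class TokenCheckDownload {
    private static final byte[] FALSE_BYTES = "false".getBytes(StandardCharsets.UTF_8);

    // Returns false if the body is exactly "false"; otherwise copies the
    // whole body to out and returns true. Nothing is held as a String.
    static boolean copyUnlessFalse(InputStream in, OutputStream out) throws IOException {
        byte[] head = new byte[FALSE_BYTES.length + 1];
        int filled = 0;
        // A single read may return fewer bytes than requested, so loop.
        while (filled < head.length) {
            int r = in.read(head, filled, head.length - filled);
            if (r == -1) break;
            filled += r;
        }
        if (filled == FALSE_BYTES.length
                && Arrays.equals(Arrays.copyOf(head, filled), FALSE_BYTES)) {
            return false;
        }
        out.write(head, 0, filled); // the prefix is part of the real content
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
        return true;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        InputStream body = new ByteArrayInputStream("false".getBytes(StandardCharsets.UTF_8));
        System.out.println(copyUnlessFalse(body, file) + ", bytes written: " + file.size());
    }
}
```

Memory use stays at one small buffer regardless of file size, which avoids the OutOfMemoryError from building a String of the whole download.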

Java - Image encoding in XML

I thought I would find a solution to this problem relatively easily, but here I am calling upon the help from ye gods to pull me out of this conundrum.
So, I've got an image and I want to store it in an XML document using Java. I have previously achieved this in VisualBasic by saving the image to a stream, converting the stream to an array, and then VB's xml class was able to encode the array as a base64 string. But, after a couple of hours of scouring the net for an equivalent solution in Java, I've come back empty handed. The only success I have had has been by:
import it.sauronsoftware.base64.*;
import java.awt.image.BufferedImage;
import org.w3c.dom.*;
...
BufferedImage img;
Element node;
...
java.io.ByteArrayOutputStream os = new java.io.ByteArrayOutputStream();
ImageIO.write(img, "png", os);
byte[] array = Base64.encode(os.toByteArray());
String ss = arrayToString(array, ",");
node.setTextContent(ss);
...
private static String arrayToString(byte[] a, String separator) {
StringBuffer result = new StringBuffer();
if (a.length > 0) {
result.append(a[0]);
for (int i=1; i<a.length; i++) {
result.append(separator);
result.append(a[i]);
}
}
return result.toString();
}
Which is okay I guess, but reversing the process to get it back to an image when I load the XML file has proved impossible. If anyone has a better way to encode/decode an image in an XML file, please step forward, even if it's just a link to another thread that would be fine.
Cheers in advance,
Hoopla.
I've done something similar (encoding and decoding in Base64) and it worked like a charm. Here's what I think you should do, using the class Base64 from the Apache Commons project:
import org.apache.commons.codec.binary.Base64;
// ENCODING
BufferedImage img = ImageIO.read(new File("image.png"));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(img, "png", baos);
baos.flush();
// encodeBase64String is the static helper; encodeToString needs a Base64 instance
String encodedImage = Base64.encodeBase64String(baos.toByteArray());
baos.close(); // should be inside a finally block
node.setTextContent(encodedImage); // store it inside node
// DECODING
String encodedImage = node.getTextContent();
byte[] bytes = Base64.decodeBase64(encodedImage); // static counterpart of decode
BufferedImage image = ImageIO.read(new ByteArrayInputStream(bytes));
Hope it helps.
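On Java 8 and later the same round trip works without a third-party library, using java.util.Base64. This is a minimal sketch that swaps in the JDK class for the Commons one; the element name "image", the sample bytes and the class name are made up, and the raw bytes stand in for the PNG data produced by ImageIO.write.

```java
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class.
class Base64XmlDemo {
    // Stores binary data in the node as Base64 text.
    static void store(Element node, byte[] raw) {
        node.setTextContent(Base64.getEncoder().encodeToString(raw));
    }

    // Reads the Base64 text back into the original bytes.
    static byte[] load(Element node) {
        return Base64.getDecoder().decode(node.getTextContent());
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element node = doc.createElement("image");
        // Stand-in for the bytes of an encoded image.
        byte[] raw = {(byte) 0x89, 'P', 'N', 'G', 0, (byte) 0xFF};
        store(node, raw);
        System.out.println(node.getTextContent());
        System.out.println(Arrays.equals(raw, load(node)));
    }
}
```

If the XML is pretty-printed and the text picks up line breaks, Base64.getMimeDecoder() tolerates them, whereas the basic decoder used here rejects them.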
Apache Commons has a Base64 class that should be helpful to you.
From there, you can just write out the bytes (they are already in a readable format)
After you get your byte array
byte[] array = Base64.encode(os.toByteArray());
use an encoded String :
String encodedImg = new String( array, "utf-8");
Then you can do fun things in your xml like
<binImg string-encoding="utf-8" bin-encoding="base64" img-type="png"><![CDATA[ encodedImg here ]]></binImg>
With Java 6, you can use DatatypeConverter to convert a byte array to a Base64 string:
byte[] imageData = ...
String base64String = DatatypeConverter.printBase64Binary(imageData);
And to convert it back:
String base64String = ...
byte[] imageData = DatatypeConverter.parseBase64Binary(base64String);
Your arrayToString() method is rather bizarre (what's the point of that separator?). Why not simply say
String s = new String(array, "US-ASCII");
The reverse operation is
byte[] array = s.getBytes("US-ASCII");
Use the ASCII encoding, which should be sufficient when dealing with Base64 encoded data. Also, I'd prefer a Base64 encoder from a reputable source like Apache Commons.
You don't need to invent your own XML data type for this. XML schema defines standard binary data types, such as base64Binary, which is exactly what you are trying to do.
Once you use the standard types, it can be converted into binary automatically by some parsers (like XMLBeans). If your parser doesn't handle it, you can find classes for base64Binary in many places since the datatype is widely used in SOAP, XMLSec etc.
The easiest implementation I was able to come up with is below. It is for a server-to-server XML transfer containing binary data; Base64 is from the Apache Commons Codec library:
- Read the binary data from the DB and create the XML
Blob blobData = oRs.getBlob("ClassByteCode");
byte[] bData = blobData.getBytes(1, (int)blobData.length());
bData = Base64.encodeBase64(bData);
String strClassByteCode = new String(bData,"US-ASCII");
- On the requesting server, read the tag and save it in the DB
byte[] bData = strClassByteCode.getBytes("US-ASCII");
bData = Base64.decodeBase64(bData);
oPrStmt.setBytes( ++nParam, bData );
As easy as it can be.
I'm still working on streaming the XML as it is generated on the first server, writing it directly to the response object; this is to handle the case where the XML with binary data is too large.
Vishesh Sahu
The basic problem is that you cannot have an arbitrary bytestream in an XML document, so you need to encode it somehow. A frequent encoding scheme is BASE64, but any will do as long as the recipient knows about it.
I know that the question was asking how to encode an image via XML, but it is also possible to just stream the bytes via an HTTP GET request instead of using XML and encoding the image. Note that input is a FileInputStream.
Server Code:
File f = new File(uri_string);
FileInputStream input = new FileInputStream(f);
OutputStream output = exchange.getResponseBody();
int c = 0;
while ((c = input.read()) != -1) {
output.write(c); //writes each byte to the exchange.getResponseBody();
}
result = new DownloadFileResult(int_list);
if (input != null) {input.close();}
if (output != null){ output.close();}
Client Code:
InputStream input = connection.getInputStream();
List<Integer> l = new ArrayList<>();
int b = 0;
while((b = input.read()) != -1){
l.add(b);//you can do what you wish with this list of ints ie- write them to a file. see code below.
}
Here is how you would write the Integer list to a file:
FileOutputStream out = new FileOutputStream("path/to/file.png");
for(int i : result_bytes_list){
out.write(i);
}
out.close();
node.setTextContent( base64.encodeAsString( fileBytes ) )
using org.apache.commons.codec.binary.Base64
