How to write an S3 object to a String in java - java

I am trying to download the content from s3object to a string in the following manner:
S3Object s3Object = amazonS3Client.getObject(bucketName, key);
S3ObjectInputStream stream = s3Object.getObjectContent();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(stream));
String text = "";
String temp = "";
try {
while((temp = bufferedReader.readLine()) != null){
text = text+temp;
}
bufferedReader.close();
stream.close();
} catch (IOException e) {
m_logger.error("Exception while reading the string " + e);
}
But while downloading the content, I get the following error:
Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
at com.amazonaws.services.s3.internal.DigestValidationInputStream.validateMD5Digest(DigestValidationInputStream.java:73)
at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:61)
at java.io.FilterInputStream.read(FilterInputStream.java:133)

You can try jcabi-s3 (I'm a developer), which does this job for you:
Region region = new Region.Simple("key", "secret");
Bucket bucket = region.bucket("my.example.com");
Ocket.Text ocket = new Ocket.Text(bucket.ocket("test.txt"));
String content = ocket.read();
Check this blog post: http://www.yegor256.com/2014/05/26/amazon-s3-java-oop-adapter.html

Your code looks correct to me (although I'd put the close statements in a finally block, and handle line endings in concatenating the rows of the file in the text = text + temp statement).
Looking at the error message, I get the feeling that this is something occurring in the framework. Have you tried to get data from another object? Or to download the object you're trying to read through an alternative means to see that the data isn't corrupted?
Good Luck!
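A minimal sketch of the two suggestions above (closing the streams reliably and keeping line endings), assuming the object holds UTF-8 text. The class name StreamText and the ByteArrayInputStream standing in for s3Object.getObjectContent() are illustrative only; this does not by itself explain the hash mismatch.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class StreamText {
    // Reads the stream line by line, assuming UTF-8, and keeps a '\n' after each line.
    // try-with-resources closes the reader (and the wrapped stream) even on failure.
    static String readLines(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the S3 object content; any InputStream works here.
        InputStream fake = new ByteArrayInputStream(
                "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readLines(fake));
    }
}
```

With the real client, the stream returned by getObjectContent() would be passed in place of the fake one.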

Related

Remove Base64 prefix from InputStream

I have a Base64 encoded Image String residing in a File Server. The encoded String has a prefix (ex: "data:image/png;base64,") for support in popular modern browsers (it's obtained via JavaScript's Canvas.toDataURL() method). The client sends a request for the image to my server which verifies them and returns a stream of the Base64 encoded String.
If the client is a web client, the image can be displayed as is within an <img> tag by setting the src to the Base64 encoded String. However, if the client is an Android client, the String needs to be decoded into a Bitmap without the prefix. Though, this can be done fairly easily.
The Problem:
In order to simplify my code and not reinvent the wheel, I'm using an Image Library for the Android client to handle loading, displaying, and caching the images (Facebook's Fresco Library to be exact). However, no library seems to support Base64 decoding (I want my cake and to eat it too). A solution I came up with is to decode the Base64 String on the server as it is being streamed to the client.
The Attempt:
S3Object obj = s3Client.getObject(new GetObjectRequest(bucketName, keyName));
Base64.Decoder decoder = Base64.getDecoder();
//decodes the stream as it is being read
InputStream stream = decoder.wrap(obj.getObjectContent());
try{
return new StreamingOutput(){
@Override
public void write(OutputStream output) throws IOException, WebApplicationException{
int nextByte = 0;
while((nextByte = stream.read()) != -1){
output.write(nextByte);
}
output.flush();
output.close();
stream.close();
}
};
}catch(Exception e){
e.printStackTrace();
}
Unfortunately, the Fresco library still has a problem displaying the image (with no stack traces!). As there doesn't seem to be an issue on my server when decoding the stream (no stack traces either), it leads me to believe that it must be an issue with the prefix. Which leaves me with a dilemma.
The Question: How do I remove the Base64 prefix from a Stream being sent to the client without storing and editing the entire Stream on the server? Is this possible?
Fresco does support decoding data URIs, just as the web client does.
The demo app has an example of this.
How do I remove the Base64 prefix from a Stream being sent to the client without storing and editing the entire Stream on the server?
Removing the prefix while sending the stream to the client turns out to be a pretty complex task. If you don't mind storing the whole String on the server you could simply do:
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
String line;
try {
br = new BufferedReader(new InputStreamReader(stream));
while ((line = br.readLine()) != null) {
sb.append(line);
}
String result = sb.toString();
//comma is the character which separates the prefix and the Base64 String
int i = result.indexOf(",");
result = result.substring(i + 1);
//Now, that we have just the Base64 encoded String, we can decode it
Base64.Decoder decoder = Base64.getDecoder();
byte[] decoded = decoder.decode(result);
//Now, just write each byte from the byte array to the output stream
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
But being more efficient and not storing the entire Stream on the server makes for a much more complicated task. We could use the Base64.Decoder.wrap() method, but the stream it returns throws an IOException when it reaches a value that cannot be decoded (wouldn't it be nice if they provided a method that just left the bytes as is if they can't be decoded?). Unfortunately, the Base64 prefix can't be decoded because it's not Base64 encoded, so reading it would throw an IOException.
To get around this problem, we would have to use an InputStreamReader to read the InputStream with the specified appropriate Charset. Then we would have to cast the ints received from the InputStream's read() method call to chars. When we reach the appropriate amount of chars, we would have to compare it with the Base64 prefix's intro ("data"). If it's a match, we know the Stream contains the prefix, so continue reading until we reach the prefix end character (the comma: ","). Finally, we can begin streaming out the bytes after the prefix. Example:
S3Object obj = s3Client.getObject(new GetObjectRequest(bucketName, keyName));
Base64.Decoder decoder = Base64.getDecoder();
InputStream stream = obj.getObjectContent();
InputStreamReader reader = new InputStreamReader(stream);
try{
return new StreamingOutput(){
@Override
public void write(OutputStream output) throws IOException, WebApplicationException{
//for checking if string has base64 prefix
char[] pre = new char[4]; //"data" has at most four bytes on a UTF-8 encoding
boolean containsPre = false;
int count = 0;
int nextByte = 0;
while((nextByte = stream.read()) != -1){
if(count < pre.length){
pre[count] = (char) nextByte;
count++;
}else if(count == pre.length){
//determine whether has prefix or not and act accordingly
count++;
//Arrays.toString(pre) would yield "[d, a, t, a]", so build the String from the chars directly
containsPre = new String(pre).equalsIgnoreCase("data");
if(!containsPre){
//doesn't have Base64 prefix so write all the bytes until this point
for(int i = 0; i < pre.length; i++){
output.write((int) pre[i]);
}
output.write(nextByte);
}
}else if(containsPre && count < 25){
//the comma character (,) is considered the end of the Base64 prefix
//so look for the comma, but be realistic, if we don't find it at about 25 characters
//we can assume the String is not encoded correctly
containsPre = (Character.toString((char) nextByte).equals(",")) ? false : true;
count++;
}else{
output.write(nextByte);
}
}
output.flush();
output.close();
stream.close();
}
};
}catch(Exception e){
e.printStackTrace();
return null;
}
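A more compact sketch of the same idea, assuming the prefix always starts with "data:" and ends at the first comma. It consumes the prefix bytes first and only then hands the rest of the stream to Base64.Decoder.wrap(), so nothing is held in memory beyond a five-byte pushback buffer. The class and method names (DataUriStreams, skipPrefix, decode) are hypothetical, and a ByteArrayInputStream stands in for the S3 stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class DataUriStreams {
    private static final String MARKER = "data:";

    // Returns a stream positioned just after the "data:...," prefix,
    // or at the very start when no such prefix is present.
    static InputStream skipPrefix(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, MARKER.length());
        byte[] head = new byte[MARKER.length()];
        int n = 0;
        while (n < head.length) {
            int r = pb.read(head, n, head.length - n);
            if (r == -1) break;
            n += r;
        }
        boolean hasPrefix = n == head.length
                && new String(head, StandardCharsets.US_ASCII).equalsIgnoreCase(MARKER);
        if (!hasPrefix) {
            pb.unread(head, 0, n); // put the bytes back, they are payload
            return pb;
        }
        int b;
        while ((b = pb.read()) != ',') { // the comma ends the prefix
            if (b == -1) throw new IOException("data URI prefix has no comma");
        }
        return pb;
    }

    // Decodes Base64 lazily while the caller reads from the returned stream.
    static InputStream decode(InputStream in) throws IOException {
        return Base64.getDecoder().wrap(skipPrefix(in));
    }

    public static void main(String[] args) throws IOException {
        String payload = Base64.getEncoder()
                .encodeToString("hello".getBytes(StandardCharsets.US_ASCII));
        InputStream raw = new ByteArrayInputStream(
                ("data:image/png;base64," + payload).getBytes(StandardCharsets.US_ASCII));
        try (InputStream decoded = decode(raw)) {
            int b;
            while ((b = decoded.read()) != -1) System.out.print((char) b);
        }
        System.out.println();
    }
}
```

Inside a StreamingOutput, the decoded stream would simply be copied to the OutputStream in chunks.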
This seems a bit hefty of a task to do on the server, so I think decoding on the client side is a better choice. Unfortunately, most Android client side libraries don't have support for Base64 decoding (especially with the prefix). However, as @tyronen pointed out, Fresco does support it if the String is already obtained. Though, this removes one of the key reasons to use an image loading library.
Android Client Side Decoding
To decode on the client side application is pretty easy. First obtain the String from the InputStream:
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
String line;
try {
br = new BufferedReader(new InputStreamReader(stream));
while ((line = br.readLine()) != null) {
sb.append(line);
}
return sb.toString();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Then decode the String using Android's Base64 class:
int i = result.indexOf(",");
result = result.substring(i + 1);
byte[] decodedString = Base64.decode(result, Base64.DEFAULT);
Bitmap bitMap = BitmapFactory.decodeByteArray(decodedString, 0, decodedString.length);
The Fresco library seems hard to update due to them using a lot of delegation. So, I moved on to using the Picasso image loading library and created my own fork of it with the Base64 decoding ability.

Hebrew rendering in website

I am working on a product which has an internet "Admin Panel" - Somewhere the user can see information about the product. One of the minimal requirements is that the website has both English and Hebrew Version. So what is the problem? The problem is that some of the characters look like this, But they should look like this.
When I get a request from a browser I read an HTML file using this code (JAVA):
public static String loadPage(String page, String lang) {
Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
try (BufferedReader br = Files.newBufferedReader(path)) {
StringBuilder website = new StringBuilder();
String currentLine;
while ((currentLine = br.readLine()) != null) {
website.append(currentLine);
}
return website.toString();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}
(Thanks to Jon Skeet for helping with reading it as UTF-8.) After I read the file, I replace some of the comments with the correct data (for example, I have a comment like this: <!--username--> and I replace it with "Itay"). After the replacement I just send the response.
The server itself is hosted using sun's HttpServer.
I also made sure to do these things:
I saved the html file as UTF-8
In the html file there is this meta tag: <meta charset="UTF-8">
One of the response headers is: Content-Type=text/html;charset=utf-8
By the way, I am using Chrome.
So I hope I gave enough details about my problem and if you need more feel free to tell me!
(I also hope I posted the question with the right tags and title)
Basically, don't use FileReader. It always uses the platform-default encoding, which may well not be appropriate for this file.
If you're using a modern version of Java, it's better to use:
Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
br = Files.newBufferedReader(path);
That will read in UTF-8 by default - if you wanted a different charset, you can specify it as another argument to newBufferedReader.
I'd also advise you to use a try-with-resources statement to get rid of all the cruft with a manual finally block:
Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
try (BufferedReader br = Files.newBufferedReader(path)) {
StringBuilder website = new StringBuilder();
String currentLine;
while ((currentLine = br.readLine()) != null) {
website.append(currentLine);
}
return website.toString();
}
That will remove all line breaks, mind you. (Note that I've used StringBuilder to avoid performance issues from repeated string concatenation...)
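If the line breaks should survive, one option is to read the whole file as bytes and decode it explicitly. This is only a sketch: the class name TemplateLoader and the temp file in main are made up for illustration, and it assumes the template fits comfortably in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TemplateLoader {
    // Reads the entire file and decodes it as UTF-8, keeping line breaks as they are.
    static String loadUtf8(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical template written to a temp file just for the demo.
        Path tmp = Files.createTempFile("template", ".html");
        String html = "<h1>\u05E9\u05DC\u05D5\u05DD</h1>\n<p><!--username--></p>\n";
        Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
        System.out.println(loadUtf8(tmp).equals(html));
        Files.delete(tmp);
    }
}
```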
You need to tell your FileReader to read as UTF8.
In the end I found that I really did have a problem reading as UTF-8, but the other problem was that I had not sent the response back as UTF-8. So this is how I sent it:
public void end(HttpExchange t, String response, long tStart, int status) throws IOException {
try {
String temp = convertToUTF8(response);
t.sendResponseHeaders(status, temp.length());
OutputStream os = t.getResponseBody();
OutputStream bout= new BufferedOutputStream(os);
OutputStreamWriter out = new OutputStreamWriter(bout, "UTF-8");
out.write(response);
out.flush();
out.close();
}catch (UnsupportedEncodingException e) {
System.out.println("This VM does not support the UTF-8 character set.");
}catch (IOException e) {
System.out.println(e.getMessage());
}
long tEnd = System.currentTimeMillis();
long tDelta = tEnd - tStart;
System.out.println("Done handling request! Time took: " + tDelta);
}
Again, thank you Jon Skeet for your answer, it was very helpful!
Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
try (BufferedReader br = Files.newBufferedReader(path)) {
StringBuilder website = new StringBuilder();
String currentLine;
while ((currentLine = br.readLine()) != null) {
website.append(currentLine);
}
return website.toString();
}
(This is how to read the file as UTF-8 using his way)
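One thing to watch in the sending code above: sendResponseHeaders expects the body length in bytes, while String.length() counts chars, and Hebrew letters take two bytes each in UTF-8. A hedged sketch of an alternative that encodes once and uses the byte count follows; the class name Utf8Response and its methods are hypothetical, and the unknown convertToUTF8 helper is not used here.

```java
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class Utf8Response {
    // Encodes the body once so the same bytes are both measured and sent.
    static byte[] utf8Bytes(String body) {
        return body.getBytes(StandardCharsets.UTF_8);
    }

    // Sends an HTML response whose declared length is the UTF-8 byte count.
    static void send(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = utf8Bytes(body);
        exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream os = exchange.getResponseBody()) {
            os.write(bytes);
        }
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD";
        // The two numbers differ, which is why String.length() is the wrong length to declare.
        System.out.println(hebrew.length() + " chars, " + utf8Bytes(hebrew).length + " bytes");
    }
}
```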

Download file line by line java

I know this question might sound really basic for most of you. I need to download a large file from a server. The first line of this file contains a time tag. I want to download the entire file only if my time tag does not match the file's. For this I'm using the code below. However, I'm not sure whether it actually avoids needlessly downloading the entire file.
Please help me out !
public String downloadString(String url,String myTime)
{
try {
URL url1 = new URL(url);
URLConnection tc = url1.openConnection();
tc.setConnectTimeout(timeout);
tc.setReadTimeout(timeout);
BufferedReader br = new BufferedReader(new InputStreamReader(tc.getInputStream()));
StringBuilder sb = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
if(line.contains(myTime))
{
Log.d("TIME CHECK", "Article already updated");
break;
}
sb.append(line+"\n");
}
br.close();
return sb.toString();
}
catch(Exception e)
{
Log.d("Error","In JSON downloading");
}
return null;
}
No, there is no easy way to control exactly to the last byte what will be downloaded. Even at the Java level you are involving a BufferedReader, which will obviously download more than you ask for, buffering it. There are other buffers as well, including at the OS level, which you cannot control. The proper technique to download only new files with HTTP is to use the If-Modified-Since header.
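A minimal sketch of a conditional request, assuming the server honours If-Modified-Since and that the client keeps its last download time as epoch milliseconds. The class and method names (ConditionalDownload, fetchIfModifiedSince) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class ConditionalDownload {
    // Returns the body as UTF-8 text, or null when the server answers 304 Not Modified.
    // A lastSeenMillis of 0 means "no previous copy": the header is then not sent.
    static String fetchIfModifiedSince(String address, long lastSeenMillis, int timeoutMillis)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        conn.setIfModifiedSince(lastSeenMillis); // becomes the If-Modified-Since header
        try {
            if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
                return null; // nothing new, and no body was transferred
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller would treat null as "article already up to date" and otherwise store the new body together with the current time.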
Your code won't download the whole file but as the BufferedReader has a default buffer size of 8192 you will read at least that many characters.
You can go byte-by-byte or chunk-by-chunk if size is the concern:
// url is the address as a String; out is an OutputStream you have already opened
BufferedInputStream in = new BufferedInputStream(new URL(url).openStream());
byte[] data = new byte[1024];
int count;
while ((count = in.read(data, 0, 1024)) != -1)
{
out.write(data, 0, count);
}
in.close();
Check this question please
How to download and save a file from Internet using Java?

issues in reading google text document with google apis

I am trying to use following code to read a Google text document. But the value returned is a stream with garbage characters instead of the real contents. How can I fix this.
for (DocumentListEntry entry : resultFeed.getEntries()) {
String docId = entry.getDocId();
String docType = entry.getType();
URL exportUrl = new URL("https://docs.google.com/feeds/download/"
+ docType
+ "s/Export?docID="
+ docId
+ "&exportFormat=doc");
MediaContent mc = new MediaContent();
mc.setUri(exportUrl.toString());
MediaSource ms = client.getMedia(mc);
InputStream inStream = null;
try {
inStream = ms.getInputStream();
int c;
while ((c = inStream.read()) != -1) {
System.out.print((char)c);
}
} finally {
if (inStream != null) {
inStream.close();
}
}
}
From a quick read of the documentation, it looks like you are reading the raw bytes of a Microsoft Word-encoded document.
Try changing the &exportFormat=doc to html or txt and see if the output makes more sense.
I suspect that the files you are trying to print out have some other encoding, but you're printing them byte by byte as if each byte were an ASCII character. I would try to read the whole stream as a byte array and then convert it to a string using some other encoding (e.g. UTF-8).
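A small sketch of that suggestion, assuming the export really is text (for example txt or html) in a known charset; a binary .doc export will still look like garbage either way. The class name StreamDecoding is illustrative, and a ByteArrayInputStream stands in for ms.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamDecoding {
    // Collects all bytes first, then decodes them in one go with the given charset.
    static String decode(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8);

        // Casting each byte to char splits the two-byte character into two wrong ones.
        StringBuilder cast = new StringBuilder();
        for (byte b : utf8) cast.append((char) (b & 0xFF));

        String decoded = decode(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println("cast:    " + cast);
        System.out.println("decoded: " + decoded);
    }
}
```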

Get web page content to String is very slow

I download a web page with HttpURLConnection.getInputStream() and, to get the content into a String, I use the following method:
String content="";
isr = new InputStreamReader(pageContent);
br = new BufferedReader(isr);
try {
do {
line = br.readLine();
content += line;
} while (line != null);
return content;
} catch (Exception e) {
System.out.println("Error: " + e);
return null;
}
The download of the page is fast, but the processing to get the content into a String is very slow. Is there a faster way to get the content into a String?
I transform it to String to insert in the database.
Read into buffer by number of bytes, not something arbitrary like lines. That alone should be a good start to speeding this up, as the reader will not have to find the line end.
Use a StringBuffer instead.
Edit for an example:
StringBuffer buffer=new StringBuffer();
for(int i=0;i<20;++i)
buffer.append(Integer.toString(i));
String result=buffer.toString();
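Putting the two suggestions above together (fixed-size chunks and a mutable buffer), a sketch could look like the following. It uses StringBuilder rather than StringBuffer on the assumption that only one thread touches the buffer; the class name FastRead is made up, and a StringReader stands in for the InputStreamReader over the page content.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class FastRead {
    // Copies the reader into a StringBuilder in 8 KiB chunks,
    // avoiding both line scanning and repeated String concatenation.
    static String readAll(Reader reader) throws IOException {
        StringBuilder content = new StringBuilder();
        char[] chunk = new char[8192];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            content.append(chunk, 0, n);
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        try (Reader page = new StringReader("<html>\n<body>hello</body>\n</html>\n")) {
            System.out.print(readAll(page));
        }
    }
}
```

Unlike the loop in the question, this never appends the text "null" at the end and keeps the line breaks.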
Use a BLOB/CLOB to put the content directly into the database.
Is there any specific reason for building the string line by line before putting it in the database?
I'm using jsoup to get specified content of a page and here is a web demo based on jquery and jsoup to catch any content of a web page, you should specify the ID or Class for the page content you need to catch: http://www.gbin1.com/technology/democenter/20120720jsoupjquerysnatchpage/index.html
