Java Tomcat UTF-8 Issues - java

Here is a live example of what I am talking about:
http://185.112.249.77:9999/Api/search?search=ж
That URL displays no results.
http://188.226.217.48:8338/api/clan_search/ж
It does display results.
How come this is happening?
My code for reading the parameter is: String search = request.getParameter("search");
System.out.println(search); also outputs a ?
I looked around and it seems there might be something I need to do in the Tomcat8 Config but I can't find what or figure out what has to be done.
I'd appreciate any help with this.
This problem also occurs when I am printing out the results. The first one shows no results and the 2nd shows the results and in UTF-8.
What is the most likely issue causing this and what code/config files would you need to see?
EDIT
I am receiving a byte array, which I convert to an InputStream via a ByteArrayInputStream like this:
InputStream myis = new ByteArrayInputStream(decryptedPayload);
I have a class which handles the packet and it extends a class I made called PacketInputStream. This class has a readString function which goes like this:
public String readString() throws IOException {
int length = readVarInt();
byte[] data = new byte[length];
readFully(data);
return new String(data, UTF8);
}
The string decoded from the returned byte[] doesn't display properly, and it also doesn't work when I send it through a GET parameter.
Thanks
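Two effects that commonly produce this kind of symptom can be reproduced with the standard library alone: decoding a percent-encoded query value with the wrong charset, and writing a character through an encoder that cannot represent it (which substitutes a ?). The sketch below only illustrates those two effects; it is not a diagnosis of the servers above, and the class and method names are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original code.
class QueryDecodingDemo {

    // Decodes a percent-encoded value using the named charset.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Round-trips text through US-ASCII; characters ASCII cannot
    // represent are replaced by the encoder's replacement byte.
    static String throughAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String raw = "%D0%B6"; // the UTF-8 bytes of ж (U+0436), percent-encoded
        // Matching charset: the original character comes back.
        System.out.println(decode(raw, "UTF-8").equals("\u0436"));
        // Mismatched charset: each byte becomes its own unrelated character.
        System.out.println(decode(raw, "ISO-8859-1").length());
        // Unmappable character on output.
        System.out.println(throughAscii("\u0436"));
    }
}
```

If the value is already wrong right after getParameter, the first effect is the more likely suspect; if only the printed output is wrong, the console's charset is.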

Related

Issue when convert buffer to string with hexadecimal code of LF

I am trying to download a web page with all its resources. First I download the HTML, and to make sure the file stays formatted I use the function below.
There is an issue: I found 10 in the final file, and I then found out that this is the character code of LF (line feed). This causes trouble for my JavaScript functions.
Example of the final result :
<!DOCTYPE html>10<html lang="fr">10 <head>10 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />10
Can someone help me find the real issue?
public static String scanfile(File file) {
StringBuilder sb = new StringBuilder();
try {
BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
while (true) {
String readLine = bufferedReader.readLine();
if (readLine != null) {
sb.append(readLine);
sb.append(System.lineSeparator());
Log.i(TAG,sb.toString());
} else {
bufferedReader.close();
return sb.toString();
}
}
} catch (IOException e) {
e.printStackTrace();
return null;
}
}
There are multiple problems with your code.
Charset error
BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
This is going to fail in subtle ways.
Files (and, for that matter, data given to you by webservers) come as bytes: a stream of numbers, each number being between 0 and 255.
So, if you are a webserver and you want to send the character ö, what byte(s) do you send?
The answer is complicated. The mapping that explains how some character is rendered in byte(s)-form is called a character set encoding (shortened to 'charset').
Anytime bytes are turned into characters or vice versa, there is always a charset involved. Always.
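To make that concrete, here is a small sketch that encodes ö with two different charsets and then decodes the UTF-8 bytes with the wrong one. The class name and helper are invented for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of the original answer.
class CharsetBytesDemo {

    // Converts Java's signed bytes to values in the range 0..255.
    static int[] unsigned(byte[] bytes) {
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF;
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "\u00f6"; // ö

        // The same character becomes different bytes under different charsets.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(unsigned(utf8)));
        System.out.println(Arrays.toString(unsigned(latin1)));

        // Decoding with a charset other than the one used for encoding
        // silently yields different characters.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(text));
    }
}
```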
So, you're reading a file (that'd be bytes), and turning it into a Reader (which is chars). Thus, charset is involved.
Which charset? The API of new FileReader(path) explains which one: "The system default". You do not want that.
Thus, this code is broken. You want one of two things:
Option 1 - write the data as is
When doing the job of querying the webserver for the data and relaying this information onto disk, you'd want to just store the bytes (after all, webserver gives bytes, and disks store bytes, that's easy), but the webserver also sends the encoding, in a header, and you need to save this separately. Because to read that 'sack of bytes', you need to know the charset to turn it into characters.
How would you do this? Well, up to you. You could for example decree that the data file starts with the name of a charset encoding (as sent via that header), then a 0 byte, and then the data, unmodified. I think you should go with option 2, however.
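A minimal sketch of that ad-hoc container format, assuming the charset name is plain ASCII and never contains a 0 byte. The class and method names are hypothetical, and the format itself is just the one suggested above, not any standard.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the layout is: charset name, one 0 byte, raw body.
class TaggedBytes {

    // Prefixes the unmodified body with the charset name and a 0 separator.
    static byte[] pack(String charsetName, byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] name = charsetName.getBytes(StandardCharsets.US_ASCII);
        out.write(name, 0, name.length);
        out.write(0);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    // Reads the charset name up to the 0 byte, then decodes the rest with it.
    static String unpack(byte[] packed) {
        int sep = 0;
        while (sep < packed.length && packed[sep] != 0) {
            sep++;
        }
        if (sep == packed.length) {
            throw new IllegalArgumentException("missing 0 separator");
        }
        String name = new String(packed, 0, sep, StandardCharsets.US_ASCII);
        Charset charset = Charset.forName(name);
        return new String(packed, sep + 1, packed.length - sep - 1, charset);
    }

    public static void main(String[] args) {
        byte[] body = "\u00f6".getBytes(StandardCharsets.ISO_8859_1);
        byte[] packed = pack("ISO-8859-1", body);
        System.out.println(unpack(packed).equals("\u00f6"));
    }
}
```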
Option 2
Another, better option for text-based documents (which HTML is), is this: When reading the data, convert it to characters, using the encoding as that header tells you. Then, to save it to disk, turn the chars back to bytes, using UTF-8, which is a great encoding and an industry standard. That way, when reading, you just know it's UTF-8, period.
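As a rough sketch of option 2: decode with whatever charset the server declared, then re-encode as UTF-8 before saving. How you obtain the declared charset from the response header is left out here, and the class and method names are made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; 'declared' stands for the charset named in the
// response header, which the caller has to look up.
class Transcode {

    // Bytes in the declared charset -> chars -> bytes in UTF-8.
    static byte[] toUtf8(byte[] body, Charset declared) {
        String text = new String(body, declared);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xF6}; // ö as sent by an ISO-8859-1 server
        byte[] utf8 = toUtf8(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals("\u00f6"));
    }
}
```

The returned bytes are what you would write to disk, for example with Files.write.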
To read a UTF-8 text file, you do:
Files.newBufferedReader(file.toPath());
The reason this works, is that the Files API, unlike most other APIs (and unlike FileReader, which you should never ever use), defaults to UTF_8 and not to platform-default. If you want, you can make it more readable:
Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8);
Same thing, but now it is clear in the code what's happening.
Broken exception handling
} catch (IOException e) {
e.printStackTrace();
return null;
}
This is not okay - if you catch an exception, either [A] throw something else, or [B] handle the problem. And 'log it and keep going' is definitely not 'handling' it. Your strategy of exception handling results in 1 error resulting in a thousand things going wrong with a thousand stack traces, and all of them except the first are undesired and irrelevant, hence why this is horrible code and you should never write it this way.
The easy solution is to just put throws IOException on your scanfile method. The method inherently interacts with files, so it SHOULD be throwing that. Note that your public static void main(String[] args) method can, and usually should, be declared with throws Exception.
It also makes your code simpler and shorter, yay!
Resource Management failure
A FileReader is a resource. You MUST close it, no matter what happens. You are not doing that: if .readLine() throws an exception, then your code will jump to the catch handler and bufferedReader.close() is never executed.
The solution is to use the ARM (Automatic Resource Management) construct, also known as try-with-resources:
try (var br = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
// code goes here
}
This construct ensures that close() is invoked, regardless of how the 'code goes here' block exits. Even if it 'exits' via an exception or a return statement.
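Putting the three fixes together, the original method could look roughly like this. It is a sketch that assumes the file on disk is UTF-8; the wrapper class name is made up, and the Log.i call is left out because it is Android-specific.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical wrapper class for the rewritten method.
class ScanFileSketch {

    // Reads the file as UTF-8, line by line, and lets IOException propagate.
    static String scanfile(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        // The reader is closed however this block exits.
        try (BufferedReader br = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append(System.lineSeparator());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("scanfile", ".html");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "<p>\u00f6</p>\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(scanfile(tmp));
    }
}
```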
The problem
Your 'read a file and print it' code is, other than the above three items, mostly fine. The problem is that the HTML file on disk is corrupted; the error lies in your code that reads the data from the web server and saves it to disk. You did not paste that code.
Specifically, System.lineSeparator() returns the actual line separator characters (such as a line feed), not the text '10'. Thus, assuming the code you pasted really is the code you are running, if you are seeing an actual '10' show up, then that means the HTML file on disk has that in there. It's not the read code.
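The code that actually introduced the 10 was not posted, so the following is only a hypothetical illustration of one way a literal 10 can end up in text: appending the numeric code of the line feed character instead of the character itself.

```java
// Hypothetical demo class; it does not reproduce the asker's real code.
class LineSeparatorDemo {

    // Appends the line feed character itself.
    static String withSeparator(String line) {
        return new StringBuilder(line).append('\n').toString();
    }

    // Appends the character's numeric code: the int overload of append
    // writes the decimal digits of the value.
    static String withCharCode(String line) {
        return new StringBuilder(line).append((int) '\n').toString();
    }

    public static void main(String[] args) {
        System.out.println(withSeparator("<html>").endsWith("\n"));
        System.out.println(withCharCode("<html>"));
    }
}
```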
Closing thoughts
More generally, the job of 'just read a file on disk with a known encoding' can be done in far fewer lines of code:
public static String scanFile(String path) throws IOException {
return Files.readString(Paths.get(path));
}
You should just use the above code instead (Files.readString is available since Java 11). It's simple, short, doesn't have any bugs, cannot leak resources, has proper exception handling, and will use UTF-8.
Actually, there is no problem in this function; I was mistakenly adding the 10 in another function in my code.

java android application cannot make http get

Can someone help me get JSON from the web? At the end of the function, jsonResponsce is empty. I use this method to do it:
private String getJson() {
jsonResponsce = "";
AsyncTask.execute(new Runnable() {
@Override
public void run() {
try{
URL httpbinEndpoint = new URL(webPage);
HttpsURLConnection myConnection = (HttpsURLConnection) httpbinEndpoint.openConnection();
myConnection.setRequestMethod("GET");
// Enable writing
myConnection.setDoOutput(true);
String internetData = "";
// Write the data
myConnection.getOutputStream().write(internetData.getBytes());
jsonResponsce = internetData;
} catch (IOException e) {
e.printStackTrace();
}
}
});
return jsonResponsce;
}
I added the Internet permission to the manifest. I am trying to get JSON from the following address: https://shori-dodjo-mobile-app.firebaseio.com/.json. Full code is placed here: https://github.com/GenkoKaradimov/Shori-Dodjo-Android-App/
You are executing the request asynchronously so the method starts the execution and then completes and therefore there is no result. The result will be there in a second but by that time the method getJson has already completed. You most probably need to put the code that uses the json at the end of the run method.
In addition your code for reading from the stream seems wrong. It should probably be something like
BufferedReader br = new BufferedReader(new InputStreamReader(myConnection.getInputStream()));
jsonResponsce = br.lines().collect(Collectors.joining("\n"));
(I haven't tested this)
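To illustrate the timing issue without any Android classes, here is a plain-Java sketch in which the result is handed to a callback once the background work finishes, instead of being returned from the method that started it. The names are invented, the Supplier stands in for the real network call, and on Android the callback would additionally have to switch back to the main thread before touching the UI.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical demo class; 'fetch' is a stand-in for the HTTP request.
class AsyncResultDemo {

    // Starts the work on the executor and returns immediately; the result
    // only exists later, inside the callback.
    static void fetchAsync(ExecutorService executor, Supplier<String> fetch, Consumer<String> onResult) {
        executor.execute(() -> onResult.accept(fetch.get()));
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        fetchAsync(executor, () -> "{\"ok\":true}", json -> System.out.println("got " + json));
        // Code placed here runs without waiting for the fetch to finish.
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```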
There are multiple issues in your code:
First, AsyncTask means it's async(hronous), so you can't return the result right away. Instead, override AsyncTask's onPostExecute and do what you need to do with the data there. Here is the sample implementation.
Second, you're using getOutputStream, which is intended for writing to the connection, i.e. sending data to the server. In your case you need to getInputStream and read from it. Easiest way is to wrap it in a BufferedReader and read until it returns -1 (marking end of stream), and then convert to string.
There are a few quirks: You should handle, or at least recognize errors by checking HTTP status code, handle encodings (the convert-bytes-to-string part), and handle cases when response is compressed, e.g. using DEFLATE or gzip. I've implemented that in a pure Java way (reference code, warning: outdated docs), but I'd seriously recommend using one of the established libraries such as Retrofit or Volley.
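For the convert-bytes-to-string part, a minimal sketch could look like the following. It assumes you already have the response stream (for example from myConnection.getInputStream(), after checking getResponseCode()) and that you know the charset; UTF-8 in the demo is an assumption, and the class name is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for turning a response stream into a String.
class StreamToString {

    // Collects all bytes until read() reports end of stream (-1),
    // then decodes them in one go with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a network stream.
        byte[] fake = "{\"name\":\"\u0436\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(fake), StandardCharsets.UTF_8));
    }
}
```

Decoding only after all bytes have arrived avoids cutting a multi-byte character in half at a chunk boundary.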
JSON objects usually get returned as HashMaps.
So you might need something like HashMap<String, Object> yourMap = new HashMap<>();
then
Object value = yourMap.get("the object's name on the other side");
Right now it looks like all you are trying to do is save the byte data, but this byte data needs to have a type. Hope this helps.

why epublib for android web view load data url show [B@41408d8?

Hi everyone, I am trying to show the content of an EPUB using epublib.
This is my code:
File f = new File(Environment.getExternalStorageDirectory() + "/documents/cindersilly.epub");
String path = f.getPath();
FileInputStream epubInputStream = new FileInputStream(f);
Book book = new EpubReader().readEpub(epubInputStream);
wvTest.loadDataWithBaseURL(f.getAbsolutePath(), book.getContents().get(0).getData().toString(), "text/html", "UTF-8", null);
and I get this result:
[B@41408d8
What is that?
And how can I solve this so the content shows in the WebView?
thanks
You haven't posted enough code to be able to see all the details, but your getData() method is returning a byte[]. When you invoke toString() on an object, it tries to convert it into a String; but arrays don't have a toString() that returns anything particularly useful. What you get is a type tag ([B, which means byte array), an @ sign, and the object's hash code in hexadecimal. None of that is the array's content.
If you want to be able to see the contents of the array, you can use Arrays.toString() to turn it into something more readable. You pass it the byte[] you've got (in this case, the output of getData()) and it builds a String listing the numeric byte values, such as [60, 104, 116]. Your code would look like this:
wvTest.loadDataWithBaseURL(f.getAbsolutePath(), Arrays.toString(book.getContents().get(0).getData()), "text/html", "UTF-8", null);
Note that this shows the byte values, not the text they encode. To display the HTML itself, decode the bytes with a charset instead, for example new String(data, "UTF-8").
It's also possible you weren't intending it to return a byte[] at all, in which case your problem is further back in your code.
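The three ways of turning a byte[] into a String behave quite differently, which this small sketch shows side by side. It uses a short in-memory array rather than epublib, assumes the data is UTF-8, and the class name is invented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; 'data' stands in for what getData() returns.
class ByteArrayToStringDemo {

    // Decodes the bytes as UTF-8 text.
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "<p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(data.toString());       // type tag and hash code, not content
        System.out.println(Arrays.toString(data)); // the numeric byte values
        System.out.println(decode(data));          // the text the bytes encode
    }
}
```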

UTF8 characters showing weirdly or random basis in Android TextView

It has been at least 5 applications in which I have attempted to display UTF8 encoded characters and every time, quite sporadically and rarely I see random characters being replaced by diamond question marks (see image for better details).
I enclose a page layout to demonstrate my issues. The layout is very basic; it is a very simple poll I am creating. The "Съгласен съм" text is taken from a database, where it has just been inserted by a script, using a copy-pasted constant. The text is displayed in TextViews.
Has anyone ever encountered such an issue? Please advise!
EDIT: Something I forgot to mention is that the number and position of weird characters vary between different Android phone models.
Finally I got it all sorted out in all my applications. Actually the issues boiled down to 3 different reasons, and I will list all of them below so that these findings of mine could help people in the future.
Reason 1: Incorrect encoding of user created file.
This actually was the problem with the application I posted about in the question. The problem was that the encoding of the insert script I used for introducing the values in the database was "UTF8 without BOM". I converted this encoding to "UTF8" using Notepad++ and reinserted the values in the database and the issue was resolved. Thanks to @user3249477 for pointing me to thinking in this direction. By the way "UTF8 without BOM" seems to be the default encoding Eclipse uses when creating UTF8 files, so take care!
Reason 2: Incorrect encoding of generated file.
The problem of reason 1 pointed me to what to look for in some of the other cases I was facing. In one application of mine I am provided with raw data that I insert in my backend database using a simple Java application. The problem there turned out to be that I was passing through an intermediate format: files stored on the file system that I used to verify I interpreted the raw data correctly. I noticed that these files were also created as "UTF8 without BOM". I used this code to write to these files:
BufferedOutputStream outputStream = new BufferedOutputStream(new FileOutputStream(outputFilePath));
writer = new BufferedWriter(new OutputStreamWriter(outputStream, STRING_ENCODING));
writer.append(string);
Which I changed to:
BufferedOutputStream outputStream = new BufferedOutputStream(new FileOutputStream(outputFilePath));
writer = new BufferedWriter(new OutputStreamWriter(outputStream, STRING_ENCODING));
// prepending a bom
writer.write('\ufeff');
writer.append(string);
Following the prescriptions from this answer. The line I added basically made all the intermediate files be encoded in "UTF8" with BOM and resolved my encoding issues.
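For reference, this is what prepending '\ufeff' does at the byte level when the writer encodes as UTF-8. The sketch writes to memory instead of a file, assumes STRING_ENCODING was UTF-8, and uses an invented class name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing the bytes a UTF-8 BOM produces.
class BomDemo {

    // Encodes the text as UTF-8 with U+FEFF written first.
    static byte[] encodeWithBom(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write('\ufeff'); // the byte order mark
            writer.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encodeWithBom("a");
        // The first three bytes are the UTF-8 encoding of U+FEFF.
        System.out.printf("%02X %02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF);
    }
}
```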
Reason 3: Incorrect parsing of HTTP responses
The last issue I encountered in a few of my applications was that I was not interpreting the UTF8 HTTP responses correctly. I used to have the following code:
HttpResponse response = httpClient.execute(host, request, (HttpContext) null);
String responseBody = null;
responseBody = IOHelper.getInputStreamContents(responseStream);
Where IOHelper is a utility I have written myself that reads stream contents into a String. I replaced this code with the method already provided in the Android API:
HttpResponse response = httpClient.execute(host, request, (HttpContext) null);
String responseBody = null;
if (response.getEntity() != null) {
responseBody = EntityUtils.toString(response.getEntity(), HTTP.UTF_8);
}
And this fixed the encoding issues I was having with HTTP responses.
In conclusion, I can say that one needs to take special care of BOM / without BOM strings when using UTF8 encoding in Android. I am very happy I learnt so many new things during this investigation.

Reading data from URL returning strange characters [duplicate]

This question already has answers here:
JSON URL from StackExchange API returning jibberish?
(3 answers)
Closed 9 years ago.
I am trying to grab the data from a JSON file through Java. If I navigate to the URL using my browser, everything displays fine, but if I try to get the data using Java I get a bunch of characters that cannot be interpreted or parsed. Note that this code works with other JSON files. Could this be a server-side thing with the way the JSON file is created? I tried messing around with different character sets and that did not seem to fix the problem.
public static void main(String[] args) throws Exception {
URL url = new URL("http://www.minecraftpvp.com/api/ping.json");
URLConnection connection = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
boolean hasLine = true;
while (hasLine) {
String line = in.readLine();
if (line != null) {
System.out.println(line);
} else {
hasLine = false;
}
}
}
The output I get from this is just a ton of strange characters that make no sense at all, whereas if I change the URL to something like google.com, it works fine.
EDIT: JSON URL from StackExchange API returning jibberish? Seemed to have answered my question. I tried searching before I asked to make sure the answer wasn't here and couldn't find anything. Guess I didn't look hard enough.
Yes that URL is returning gzip encoded content by default.
You can do one of three things:
1. Explicitly set the Accept-Encoding: header in your request. A web service should not return gzip compression unless it is listed as an accepted encoding in the request, so this website is not being very friendly. Your browser is setting it as accepted, I suspect, which is why you can see it there. Just set it to an empty value and it should, as per the spec, return non-encoded responses; your mileage may vary on this one.
2. Or use the answer to "How to handle non-UTF8 html page in Java?", which shows how to decompress the response. This should be the preferred option over #1.
3. And/or ask the person hosting the service to implement the recommended scheme, which is to only provide compressed responses if the client says it can handle them or if it can infer it from the browser fingerprint with high confidence.
Best of luck C.
You need to inspect the Content-Encoding header. The URL in question improperly returns gzip-compressed content even when you don't ask for it, and you'll need to run it through a decoder.
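A rough sketch of that check: wrap the response stream in a GZIPInputStream only when the Content-Encoding value says gzip. With a URLConnection, the value would come from connection.getContentEncoding() and the stream from connection.getInputStream(); the demo below uses in-memory data instead of a real request, assumes the body is UTF-8, and the class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; not taken from the linked answers.
class GzipAwareBody {

    // Returns a decompressing stream for gzip bodies, the raw stream otherwise.
    static InputStream unwrap(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(body);
        }
        return body;
    }

    // Reads the whole stream and decodes it as UTF-8 (an assumption).
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a compressed server response.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write("{\"ok\":true}".getBytes(StandardCharsets.UTF_8));
        }
        InputStream body = unwrap(new ByteArrayInputStream(compressed.toByteArray()), "gzip");
        System.out.println(readUtf8(body));
    }
}
```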
