Character looks like "?" when reading the content of an uploaded file - Java

I have a client that uploads a vcf file. On the server side I read its contents and save them to a txt file. But there is a character problem when I read it: if there are Turkish characters, they show up as "?". My read code is here:
FileItemStream item = null;
ServletFileUpload upload = new ServletFileUpload();
FileItemIterator iterator = upload.getItemIterator(request);
String encoding = null;
while (iterator.hasNext()) {
    item = iterator.next();
    if ("fileUpload".equals(item.getFieldName())) {
        InputStreamReader isr = new InputStreamReader(item.openStream(), "UTF-8");
        String str = "";
        String temp = "";
        BufferedReader br = new BufferedReader(isr);
        while ((temp = br.readLine()) != null) {
            str += temp;
        }
        br.close();
        File f = new File("C:/sedat.txt");
        BufferedWriter buf = new BufferedWriter(new FileWriter(f));
        buf.write(str);
        buf.close();
    }
}

Use an OutputStreamWriter with an explicit charset instead of FileWriter, which always uses the platform default encoding:
BufferedWriter buf = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f), "UTF-8"));
If this is production code, I would recommend writing the output straight to the file and not accumulating it in a string first. And you could avoid any potential encoding issues by reading the source as an InputStream and writing it as an OutputStream (skipping the conversion to characters altogether).
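For illustration, a minimal sketch of that byte-for-byte approach, reusing the item from the loop above (the output path is just an example):
InputStream in = item.openStream();
// Copy the raw bytes; with no Reader/Writer involved, no charset conversion can mangle them.
OutputStream out = new FileOutputStream("C:/sedat.vcf");
byte[] buffer = new byte[8192];
int read;
while ((read = in.read(buffer)) != -1) {
    out.write(buffer, 0, read);
}
out.close();
in.close();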

Related

Having trouble reading in content of url using InputStream

So I run the code below and it prints "!DOCTYPE html". How do I get the content of the url, like the html for instance?
public static void main(String[] args) throws IOException {
    URL u = new URL("https://www.whitehouse.gov/");
    InputStream ins = u.openStream();
    InputStreamReader isr = new InputStreamReader(ins);
    BufferedReader websiteText = new BufferedReader(isr);
    System.out.println(websiteText.readLine());
}
According to the Java tutorial (https://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html): "When you run the program, you should see, scrolling by in your command window, the HTML commands and textual content from the HTML file located at ...". Why am I not getting that?
In your program, you did not use a while loop, so only the first line is read.
URL u = new URL("https://www.whitehouse.gov/");
InputStream ins = u.openStream();
InputStreamReader isr = new InputStreamReader(ins);
BufferedReader websiteText = new BufferedReader(isr);
String inputLine;
while ((inputLine = websiteText.readLine()) != null) {
    System.out.println(inputLine);
}
websiteText.close();
You are only reading one line of the text.
Try this and you will see that you get two lines:
System.out.println(websiteText.readLine());
System.out.println(websiteText.readLine());
Try reading it in a loop to get all the text.
BufferedReader has had a #lines() method since Java 8. Its return type is Stream<String>. To read an entire site you could do something like this:
String htmlText = websiteText.lines()
        .reduce((text, nextLine) -> text + "\n" + nextLine)
        .orElse(null);
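As a small aside (not part of the original answer), Collectors.joining produces the same result without the repeated string concatenation of reduce:
String htmlText = websiteText.lines()
        .collect(java.util.stream.Collectors.joining("\n"));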

Read any file as a binary string

As the title suggests, is there any way to read a binary representation of a given file (.txt, .docx, .exe, etc.) in Java (or any other language)?
In Java, I know how to read the content of a file as text, i.e.:
String line;
BufferedReader br = new BufferedReader(new FileReader("myFile.txt"));
while ((line = br.readLine()) != null) {
    System.out.println(line);
}
But I'm not sure if (or how) it is possible to read the binary representation of the file itself.
File file = new File(filePath);
byte[] bytes = new byte[(int)file.length()];
DataInputStream dataInputStream = new DataInputStream(new BufferedInputStream(new FileInputStream(filePath)));
dataInputStream.readFully(bytes);
dataInputStream.close();
Now bytes is a byte array containing all of the file's data.
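If by "binary string" you literally mean the bits, one possible sketch (purely illustrative) is to format each byte as eight binary digits:
// Illustrative: render each byte as 8 bits, e.g. "01001000 01101001 ..."
StringBuilder sb = new StringBuilder();
for (byte b : bytes) {
    sb.append(String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0')).append(' ');
}
String binaryString = sb.toString().trim();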

How to Decide InputStream Encoding?

My objective is to download an XML feed into an InputStream, then convert it to a String so that it may be used with XmlPullParser.
I convert the InputStream into a String like this:
InputStream input_stream = connection.getInputStream();
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(new InputStreamReader(input_stream, "UTF-8"));
String line;
while ((line = br.readLine()) != null) {
    sb.append(line);
}
Here's the problem, some XML feeds define specific encoding. Take this one for example:
http://voxinox.ch/podcasts/valdo/feed.xml
If I use a default of "UTF-8", some characters from the feed look like a black rhombus with a question mark in it. If I use the encoding specified in the XML header (iso-8859-1), it works, which is no surprise.
The thing is, how do I decide which encoding to use before I start reading the input stream that contains the encoding declaration? Is there a better way of doing this?
Here is an example of how I get the encoding from an XML input stream:
FileInputStream finput = new FileInputStream(myFile);
String encoding = getInputEncoding(finput);
Log.d("Encoding: ", "> " + encoding);

public String getInputEncoding(FileInputStream finput) {
    String encoding = "";
    if (finput != null) {
        try {
            BufferedReader myReader = new BufferedReader(new InputStreamReader(finput));
            String getline = myReader.readLine();
            myReader.close();
            Log.d("Line: ", "> " + getline);
            String[] separated = getline.split("encoding=\"");
            String encoding1 = separated[1];
            String[] separated2 = encoding1.split("\"");
            encoding = separated2[0];
        } catch (Exception e) {
        }
    }
    return encoding;
}
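A less fragile alternative (my suggestion, not part of the original answer) is to let an XML parser apply the detection rules from the XML spec, for example with the standard javax.xml.stream API:
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

// Sketch: the StAX reader detects the encoding from the BOM and/or the XML declaration.
// createXMLStreamReader throws XMLStreamException, which the caller must handle.
XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(input_stream);
String declared = xml.getCharacterEncodingScheme(); // value of encoding="..." in the declaration, may be null
String detected = xml.getEncoding();                // encoding auto-detected from the stream, may be null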

Java IO: Issue with reading a stream using BufferedReader

I am trying to read a JSON response using a BufferedReader as shown below. I'm using the Apache Commons HTTP client. The response comes as single-line JSON, around 1,060,000 characters and approximately 1 MB in size. The problem I'm facing is that only part of the stream is read by the reader and the rest is missing. How can I read the full JSON without losing any data? Is this related to the 'CharBufferSize' of BufferedReader or to the number of characters in the stream?
InputStream stream = method.getResponseBodyAsStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(stream, "UTF-8"));
StringBuilder builder = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    builder.append(line);
}
Try using a JSON parser.
import org.codehaus.jackson.*;

JsonFactory fac = new JsonFactory();
JsonParser parser = fac.createJsonParser(stream);
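For instance (just a sketch using the same org.codehaus.jackson API the snippet above imports), you can read the whole response into a tree and not worry about line boundaries at all:
// Sketch: the ObjectMapper consumes the parser to the end of the JSON document.
org.codehaus.jackson.map.ObjectMapper mapper = new org.codehaus.jackson.map.ObjectMapper();
org.codehaus.jackson.JsonNode root = mapper.readTree(parser);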
If you just want to copy the complete stream into the StringBuilder, you should use the InputStreamReader and a char-array buffer.
InputStream stream = method.getResponseBodyAsStream();
InputStreamReader reader = new InputStreamReader(stream, "UTF-8");
StringBuilder builder = new StringBuilder();
char[] buffer = new char[4096];
int read;
while ((read = reader.read(buffer)) != -1) {
    builder.append(buffer, 0, read);
}
Finally I was able to solve it using IOUtils from the Apache Commons IO library. Here is the code.
BoundedInputStream boundedInputStream = new BoundedInputStream(stream);
BufferedReader reader = new BufferedReader(new InputStreamReader(boundedInputStream, "UTF-8"));
StringBuilder builder = new StringBuilder();
StringBuilderWriter writer = new StringBuilderWriter(builder);
IOUtils.copy(reader, writer);
Although it has been a while, this may be helpful for someone.
Here is the original source:
Most Robust way of reading a file or stream using Java (To prevent DoS attacks)
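One caveat worth adding (my note, not part of that answer): the single-argument BoundedInputStream constructor does not actually impose a limit, so for the DoS scenario you would pass a maximum size, for example:
// Assumption: cap the response at 10 MB; pick whatever limit suits your case.
BoundedInputStream boundedInputStream = new BoundedInputStream(stream, 10L * 1024 * 1024);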

Java: how to convert a file to UTF-8

I have a file that contains some non-UTF-8 characters (it is encoded in something like ISO-8859-1), and I want to convert that file (or read it) to UTF-8 encoding. How can I do it?
The code is like this:
File file = new File("some_file_with_non_utf8_characters.txt");
/* some code to convert the file to an utf8 file */
...
Edit: added an encoding example.
The following code converts a file from srcEncoding to tgtEncoding:
public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
    BufferedReader br = null;
    BufferedWriter bw = null;
    try {
        br = new BufferedReader(new InputStreamReader(new FileInputStream(source), srcEncoding));
        bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding));
        char[] buffer = new char[16384];
        int read;
        while ((read = br.read(buffer)) != -1)
            bw.write(buffer, 0, read);
    } finally {
        try {
            if (br != null)
                br.close();
        } finally {
            if (bw != null)
                bw.close();
        }
    }
}
--EDIT--
Using Try-with-resources (Java 7):
public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(source), srcEncoding));
         BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding))) {
        char[] buffer = new char[16384];
        int read;
        while ((read = br.read(buffer)) != -1)
            bw.write(buffer, 0, read);
    }
}
String charset = "ISO-8859-1"; // or what corresponds
BufferedReader in = new BufferedReader(
new InputStreamReader (new FileInputStream(file), charset));
String line;
while( (line = in.readLine()) != null) {
....
}
There you have the text decoded. You can write it back out, using the symmetric Writer/OutputStream classes, in whatever encoding you prefer (e.g. UTF-8).
You need to know the encoding of the input file. For example, if the file is in Latin-1, you would do something like this,
FileInputStream fis = new FileInputStream("test.in");
InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
Reader in = new BufferedReader(isr);
FileOutputStream fos = new FileOutputStream("test.out");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
Writer out = new BufferedWriter(osw);
int ch;
while ((ch = in.read()) > -1) {
    out.write(ch);
}
out.close();
in.close();
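For completeness (not from the original answers), on Java 7+ the same conversion can be sketched with java.nio.file, assuming the file fits in memory:
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: decode the whole file as Latin-1, then write it back out as UTF-8.
byte[] raw = Files.readAllBytes(Paths.get("test.in"));
String text = new String(raw, StandardCharsets.ISO_8859_1);
Files.write(Paths.get("test.out"), text.getBytes(StandardCharsets.UTF_8));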
You only want to read it as UTF-8?
What I did recently, given a similar problem, was to start the JVM with -Dfile.encoding=UTF-8 and read/print as normal. I don't know if that is applicable in your case.
With that option:
System.out.println("á é í ó ú")
prints correctly the characters. Otherwise it prints a ? symbol
