I have a file that contains some non-UTF-8 characters (for example, it is encoded as ISO-8859-1), and I want to convert that file to UTF-8, or at least read it correctly. How can I do that?
The code looks like this:
File file = new File("some_file_with_non_utf8_characters.txt");
/* some code to convert the file to an utf8 file */
...
Edit: added an encoding example.
The following code converts a file from srcEncoding to tgtEncoding:
public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
BufferedReader br = null;
BufferedWriter bw = null;
try{
br = new BufferedReader(new InputStreamReader(new FileInputStream(source),srcEncoding));
bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding));
char[] buffer = new char[16384];
int read;
while ((read = br.read(buffer)) != -1)
bw.write(buffer, 0, read);
} finally {
try {
if (br != null)
br.close();
} finally {
if (bw != null)
bw.close();
}
}
}
--EDIT--
Using Try-with-resources (Java 7):
public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
try (
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(source), srcEncoding));
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding)); ) {
char[] buffer = new char[16384];
int read;
while ((read = br.read(buffer)) != -1)
bw.write(buffer, 0, read);
}
}
String charset = "ISO-8859-1"; // or what corresponds
BufferedReader in = new BufferedReader(
new InputStreamReader (new FileInputStream(file), charset));
String line;
while( (line = in.readLine()) != null) {
....
}
There you have the decoded text. You can write it out with the symmetric Writer/OutputStream classes, using whichever encoding you prefer (e.g. UTF-8).
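The write side mentioned above could look roughly like the sketch below. It assumes Java 7 or later; the class name `LineTranscoder` and the method `toUtf8` are made up for illustration. Note that `readLine()` strips line terminators, so the output uses the platform line separator rather than whatever the source file used.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineTranscoder {
    // Hypothetical helper: decodes `source` with `sourceCharset` and rewrites it as UTF-8.
    static void toUtf8(Path source, Charset sourceCharset, Path target) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine(); // readLine() dropped the terminator, so add one back
            }
        }
    }
}
```

For example, `LineTranscoder.toUtf8(Paths.get("test.in"), StandardCharsets.ISO_8859_1, Paths.get("test.out"))` would be one way to call it.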
You need to know the encoding of the input file. For example, if the file is in Latin-1, you would do something like this:
FileInputStream fis = new FileInputStream("test.in");
InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
Reader in = new BufferedReader(isr);
FileOutputStream fos = new FileOutputStream("test.out");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
Writer out = new BufferedWriter(osw);
int ch;
while ((ch = in.read()) > -1) {
out.write(ch);
}
out.close();
in.close();
You only want to read it as UTF-8?
What I did recently given a similar problem is to start the JVM with -Dfile.encoding=UTF-8, and reading/printing as normal. I don't know if that is applicable in your case.
With that option:
System.out.println("á é í ó ú");
prints the characters correctly. Otherwise it prints a ? symbol for each of them.
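If changing the JVM flag is not an option, a possible alternative is to give the output stream an explicit encoding instead of relying on the default. The sketch below is only an illustration; the class `ExplicitOutputEncoding` and the method `printWith` are invented names. It also shows where the `?` comes from: a charset that cannot represent a character substitutes its replacement byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

final class ExplicitOutputEncoding {
    // Hypothetical helper: prints `text` through a PrintStream that encodes
    // with `charsetName` and returns the raw bytes that were produced.
    static byte[] printWith(String text, String charsetName) throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Shows which charset the JVM picked up (influenced by -Dfile.encoding).
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Wrap System.out so that the encoding no longer depends on the default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("\u00e1 \u00e9 \u00ed \u00f3 \u00fa"); // á é í ó ú
    }
}
```

Whether the characters then display correctly still depends on the terminal or console interpreting the bytes as UTF-8.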
I am using Postman to send a JSON request. I then get the input stream using getInputStream().
InputStream inputStream = request.getInputStream();
I have a JSON request with 2032 characters, and it might grow depending on the scenario. I tried a few suggestions for similar issues, but with all of them I am able to read only 1011 characters.
Below are the ways which I tried.
Declarations:
BufferedReader bufferedReader = null;
StringBuilder stringBuilder = new StringBuilder();
// stringBuilder.ensureCapacity(1048576);
JSONObject jObj = null;
InputStream inputStream = request.getInputStream();
1)
bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
char[] charBuffer = new char[1048576];
int bytesRead = -1;
while ((bytesRead = bufferedReader.read(charBuffer)) > 0) {
stringBuilder.append(charBuffer, 0, bytesRead);
}
2)
bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
String line = "";
String result = "";
while ((line = bufferedReader.readLine()) != null)
result += line;
inputStream.close();
3)
String line;
try {
bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
while ((line = bufferedReader.readLine()) != null) {
stringBuilder.append(line);
}
} catch (Exception e) {
// TODO: handle exception
}
4)
stringBuilder.ensureCapacity(1048576);
BoundedInputStream boundedInputStream = new BoundedInputStream(inputStream);
bufferedReader = new BufferedReader(new InputStreamReader(boundedInputStream, "UTF-8"));
// StringBuilder builder= new StringBuilder();
StringBuilderWriter bufferedwriter = new StringBuilderWriter(stringBuilder);
IOUtils.copy(bufferedReader, bufferedwriter);
5)
bufferedReader = request.getReader();
char[] charBuffer = new char[1048576];
int bytesRead = -1;
while ((bytesRead = bufferedReader.read(charBuffer)) > 0) {
stringBuilder.append(charBuffer, 0, bytesRead);
}
Final consumption (the result from the second variation was my latest try):
// jObj = new JSONObject(stringBuilder.toString());
// jObj = new JSONObject(bufferedwriter.toString());
jObj = new JSONObject(result.toString());
Note: I increased the char buffer capacity to 1048576 just to check whether that would solve it, but that also has no effect on the input stream.
Could anyone please advise me on how to read a large JSON input? Also let me know if I am doing something wrong.
Thanks in advance.
You seem to want to convert the JSON into a String. With Java 8 this has become a bit simpler.
// (1)
try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream))) {
// (2)
String json = reader.lines().collect(Collectors.joining("\n"));
// do something with `json`...
}
Explained:
(1) Create a BufferedReader from the input stream. Using "try-with-resources" means that the reader will be closed automatically when leaving the try {} block.
(2) The BufferedReader has a method lines() which returns a Stream<String>. You can simply join all the Strings using the joining collector.
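As a variation on the same idea, the body can be collected as raw bytes and decoded once at the end, which keeps line terminators and makes the charset explicit. This is a minimal sketch, assuming the request body is UTF-8; `BodyReader` and `readFully` are illustrative names. It does not by itself explain the truncation described in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class BodyReader {
    // Hypothetical helper: drains the stream completely and decodes it as UTF-8.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() may return fewer bytes than requested, so loop until end of stream (-1).
        while ((n = in.read(chunk)) != -1) {
            collected.write(chunk, 0, n);
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With a servlet request this might be called as `BodyReader.readFully(request.getInputStream())`, and the result passed to `new JSONObject(...)`.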
I have a client that uploads a vcf file. On the server side I receive the file, read its contents, and save them to a txt file. But there is a character problem when I read it: Turkish characters show up as "?". My read code is here:
FileItemStream item = null;
ServletFileUpload upload = new ServletFileUpload();
FileItemIterator iterator = upload.getItemIterator(request);
String encoding = null;
while (iterator.hasNext()) {
item = iterator.next();
if ("fileUpload".equals(item.getFieldName())) {
InputStreamReader isr = new InputStreamReader(item.openStream(), "UTF-8");
String str = "";
String temp="";
BufferedReader br = new BufferedReader(isr);
while((temp=br.readLine()) != null){
str +=temp;
}
br.close();
File f = new File("C:/sedat.txt");
BufferedWriter buf = new BufferedWriter(new FileWriter(f));
buf.write(str);
buf.close();
}
}
BufferedWriter buf = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f), "UTF-8"));
If this is production code, I would recommend writing the output straight to the file rather than accumulating it in a string first. You could also avoid any potential encoding issues by reading the source as an InputStream and writing it as an OutputStream, skipping the conversion to characters altogether.
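A byte-for-byte copy along those lines might look like the sketch below (Java 7 or later). The class `UploadSaver` and the method `save` are made-up names; in the upload loop, the stream would be the one returned by `item.openStream()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class UploadSaver {
    // Hypothetical helper: writes the uploaded bytes to `target` unchanged,
    // so no charset is involved and no characters can be lost.
    // Returns the number of bytes written.
    static long save(InputStream upload, Path target) throws IOException {
        try (InputStream in = upload) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Because the bytes are never decoded, the saved file has exactly the encoding the client sent.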
In PHP we can use file_get_contents() like this:
<?php
$data = file_get_contents('php://input');
echo file_put_contents("image.jpg", $data);
?>
How can I implement this in Java (JSP)?
Here's a function I created in Java a while back that returns a String of the file contents. Hope it helps.
There might be some issues with \n and \r but it should get you started at least.
// Converts a file to a string
private String fileToString(String filename) throws IOException
{
BufferedReader reader = new BufferedReader(new FileReader(filename));
StringBuilder builder = new StringBuilder();
String line;
// For every line in the file, append it to the string builder
while((line = reader.readLine()) != null)
{
builder.append(line);
}
reader.close();
return builder.toString();
}
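The \n and \r issue mentioned above comes from readLine(), which drops line terminators, so the lines end up concatenated. One way around it, sketched here under the assumption that the file fits in memory and that its encoding is known, is to read all the bytes and decode them in one step. `FileText` and `read` are illustrative names.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileText {
    // Hypothetical helper: returns the whole file as a String,
    // keeping every \r and \n exactly as stored on disk.
    static String read(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }
}
```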
This will read a file from a URL and write it to a local file. Just add try/catch and imports as needed.
byte buf[] = new byte[4096];
URL url = new URL("http://path.to.file");
BufferedInputStream bis = new BufferedInputStream(url.openStream());
FileOutputStream fos = new FileOutputStream(target_filename);
int bytesRead = 0;
while((bytesRead = bis.read(buf)) != -1) {
fos.write(buf, 0, bytesRead);
}
fos.flush();
fos.close();
bis.close();
I'm trying to read from a text/plain file over the internet, line-by-line. The code I have right now is:
URL url = new URL("http://kuehldesign.net/test.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
LinkedList<String> lines = new LinkedList<String>();
String readLine;
while ((readLine = in.readLine()) != null) {
lines.add(readLine);
}
for (String line : lines) {
out.println("> " + line);
}
The file, test.txt, contains ¡Hélló!, which I am using in order to test the encoding.
When I review the OutputStream (out), I see it as > ¬°H√©ll√≥!. I don't believe this is a problem with the OutputStream since I can do out.println("é"); without problems.
Any ideas for reading from the InputStream as UTF-8? Thanks!
Solved my own problem. This line:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
needs to be:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
or since Java 7:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
String file = "";
int BUFFER_SIZE = 8192;
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(filename), "UTF-8"), BUFFER_SIZE)) {
    StringBuilder sb = new StringBuilder();
    String str;
    while ((str = br.readLine()) != null) {
        sb.append(str);
    }
    file = sb.toString();
} catch (IOException e) {
    // Do not swallow the error silently
    e.printStackTrace();
}
Try this. :-)
I ran into the same problem: every time it finds a special character, it marks it as ��. To solve this, I tried using the ISO-8859-1 encoding:
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("txtPath"), "ISO-8859-1"));
String line;
while ((line = br.readLine()) != null) {
}
I hope this can help anyone who sees this post.
If you use the constructor InputStreamReader(InputStream in, Charset cs), bad characters are silently replaced. To change this behaviour, use a CharsetDecoder:
public static Reader newReader(InputStream is) {
    return new InputStreamReader(is,
        StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
    );
}
Then catch java.nio.charset.CharacterCodingException.
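To illustrate the catch side, here is a small sketch that uses such a strict reader to check whether a byte array is valid UTF-8. The class `StrictUtf8` and the method `isValidUtf8` are hypothetical names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {
    // Hypothetical helper: returns false as soon as the decoder reports bad input.
    static boolean isValidUtf8(byte[] data) throws IOException {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data),
                StandardCharsets.UTF_8.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT))) {
            char[] buffer = new char[1024];
            while (reader.read(buffer) != -1) {
                // Decoding is the only goal here; the characters are discarded.
            }
            return true;
        } catch (CharacterCodingException e) {
            // Thrown by read() because the decoder is set to REPORT.
            return false;
        }
    }
}
```

A Latin-1 encoded "Hél" (bytes 0x48 0xE9 0x6C) is rejected, because 0xE9 starts a multi-byte UTF-8 sequence that 0x6C cannot continue.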
This is a logging function that logs the error stream from the execution of an external program. Everything works fine, but I do not want to generate the log file when there is no data in the error stream. Currently it creates a zero-size file. Please help.
FileOutputStream fos = new FileOutputStream(logFile);
PrintWriter pw = new PrintWriter(fos);
Process proc = Runtime.getRuntime().exec(externalProgram);
InputStreamReader isr = new InputStreamReader(proc.getErrorStream());
BufferedReader br = new BufferedReader(isr);
String line=null;
while ( (line = br.readLine()) != null)
{
if (pw != null){
pw.println(line);
pw.flush();
}
}
Thank you.
Simply defer the creation of the FileOutputStream and PrintWriter until you need them:
PrintWriter pw = null;
Process proc = Runtime.getRuntime().exec(externalProgram);
InputStreamReader isr = new InputStreamReader(proc.getErrorStream());
BufferedReader br = new BufferedReader(isr);
String line;
while ( (line = br.readLine()) != null)
{
if (pw == null)
{
pw = new PrintWriter(new FileOutputStream(logFile));
}
pw.println(line);
pw.flush();
}
Personally I'm not a big fan of PrintWriter - the fact that it just swallows all exceptions concerns me. I'd also use OutputStreamWriter so that you can explicitly specify the encoding. Anyway, that's aside from the real question here.
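For what it is worth, a version with OutputStreamWriter and an explicit encoding might look like the sketch below. It assumes the external program writes UTF-8 to its error stream and that the log should be UTF-8 as well; `LazyErrorLog` and `copy` are made-up names. It would be called with `proc.getErrorStream()` as the first argument.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class LazyErrorLog {
    // Hypothetical helper: copies `errorStream` to `logFile` line by line.
    // The file is only created once the first line arrives.
    // Returns true if a log file was written.
    static boolean copy(InputStream errorStream, File logFile) throws IOException {
        Writer log = null;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(errorStream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (log == null) {
                    log = new BufferedWriter(new OutputStreamWriter(
                            new FileOutputStream(logFile), StandardCharsets.UTF_8));
                }
                log.write(line);
                log.write(System.lineSeparator());
                log.flush();
            }
        } finally {
            if (log != null) {
                log.close();
            }
        }
        return log != null;
    }
}
```

Unlike PrintWriter, a plain Writer throws IOException on failure, so write errors are not lost.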
The obvious thing to do is to change
FileOutputStream fos = new FileOutputStream(logFile);
PrintWriter pw = new PrintWriter(fos);
....
if (pw != null){
...
}
to
FileOutputStream rawLog = null;
try {
PrintWriter log = null;
....
if (log == null) {
rawLog = new FileOutputStream(logFile);
log = new PrintWriter(new OutputStreamWriter(rawLog, "UTF-8"));
}
...
} finally {
// Thou shalt close thy resources.
// Icky null check - might want to split this using the Execute Around idiom.
if (rawLog != null) {
rawLog.close();
}
}