Special character in txt file not being passed in the InputStream - java

I have an
InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("templates/createUser/new-user.txt");
and the content of the new-user.txt is :
Hello™ how r u ®
but when they are displayed in the output they are displayed as
Hello��� how r u��
Can you tell me what changes I should make so that the text is displayed correctly?
UPDATE
So here is the code :-
Handlebars handlebars = new Handlebars();
InputStream txtInputStream = this.getClass().getClassLoader()
.getResourceAsStream("templates/createUser/new-user.txt");
Template textTemplate = handlebars.compileInline(IOUtils.toString(txtInputStream));
String emailText = textTemplate.apply(vars);

The problem does not lie in the InputStream object. An InputStream is just a stream of bytes and knows nothing about encodings. The encoding only matters when the bytes are turned into characters, so specify it on the reader:
Reader reader = new InputStreamReader(inputStream, "UTF-8");
as opposed to using this:
Reader reader = new InputStreamReader(inputStream); // does not specify encoding
Alternatively, with Commons IO you can read the string straight from the stream by passing the encoding:
String theString = IOUtils.toString(inputStream, "UTF-8");
Edit:
I did not realize you posted full code in the comments. Just change your second to last line to:
Template textTemplate = handlebars.compileInline(IOUtils.toString(txtInputStream, "UTF-8"));
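
To see the effect without Handlebars or Commons IO, here is a minimal JDK-only sketch. The class and method names (CharsetDecodeDemo, readAll) are made up for illustration, and it assumes the text file was saved as UTF-8. It decodes the same bytes once with the right charset and once with ISO-8859-1, which is roughly what happens when the platform default does not match the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDecodeDemo {

    // Reads the whole stream and decodes its bytes with the given charset.
    static String readAll(InputStream in, Charset charset) {
        try {
            return new String(in.readAllBytes(), charset); // readAllBytes needs Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The sample text from the question (trademark and registered signs),
        // written with escapes so the source file encoding does not matter.
        String original = "Hello\u2122 how r u \u00AE";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 matches original:      " + right.equals(original));
        System.out.println("ISO-8859-1 matches original: " + wrong.equals(original));
    }
}
```

The bytes are identical in both calls; only the charset used for decoding differs, which is why the fix belongs in the decoding step and not in the txt file.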

Related

How do I uncompress .bz file contents and convert them to plain English

I have .bz file contents in an S3 bucket.
Using S3Object, I'm able to read the file:
S3Object object = s3.getObject(bucket, path);
S3ObjectInputStream inputStream = object.getObjectContent();
Now I want to uncompress this content.
I tried converting it with the code below, but it still gives me unreadable binary data rather than English text.
String text = new BufferedReader(
new InputStreamReader(inputStream, StandardCharsets.UTF_8))
.lines()
.collect(Collectors.joining("\n"));
How do I get the uncompressed text here?
You could use a library, for example Apache Commons Compress. The stream has to be decompressed before its bytes are decoded as text:
S3Object object = s3.getObject(bucket, path);
S3ObjectInputStream inputStream = object.getObjectContent();
BZip2CompressorInputStream bzInputStream = new BZip2CompressorInputStream(inputStream);
// Then just write to a string.
// This is Java 9+.
String plaintext = new String(bzInputStream.readAllBytes(), StandardCharsets.UTF_8);
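
The JDK itself has no bzip2 support, so the following runnable sketch swaps in gzip (java.util.zip) purely to illustrate the same layering: wrap the raw stream in a decompressor first, then decode the result as UTF-8. The class and method names are hypothetical, and the in-memory byte array stands in for the S3 object content.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class DecompressThenDecode {

    // Compresses UTF-8 text with gzip; stands in for the compressed object content.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompresses first, then decodes the decompressed bytes as UTF-8.
    static String readCompressedText(InputStream compressed) {
        try (InputStream in = new GZIPInputStream(compressed)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("plain English text");
        System.out.println(readCompressedText(new ByteArrayInputStream(compressed)));
    }
}
```

With Commons Compress on the classpath, BZip2CompressorInputStream would take the place of GZIPInputStream in readCompressedText.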

Java, Reading a file that has UCS-2 Little Endian encoding

I'm trying to read a txt file that is encoded as UCS-2 LE, using the code below. The ??? is the encoding name I need, but I am not sure what it is supposed to be.
InputStream HostFile = new FileInputStream(Location + FileName);
Reader file = new InputStreamReader(HostFile, Charset.forName(???));
PrintWriter writer = new PrintWriter(outLocation, "UTF-8");
Any ideas would be appreciated.
Reader file = new InputStreamReader(HostFile, Charset.forName("UTF-16LE"));
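
One detail worth knowing: Java's UTF-16LE decoder does not strip a byte order mark, so if the file starts with FF FE the first character read is U+FEFF. A minimal sketch of handling that, with a made-up class name and an in-memory byte array in place of the real file:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf16LeReader {

    // Reads the first line of a UTF-16LE stream, dropping a leading BOM if present.
    static String firstLine(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_16LE))) {
            String line = reader.readLine();
            if (line != null && line.startsWith("\uFEFF")) {
                line = line.substring(1); // UTF-16LE keeps the BOM as a character
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // FF FE is the little-endian BOM; the rest is "hi" in UTF-16LE.
        byte[] data = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        System.out.println(firstLine(new ByteArrayInputStream(data)));
    }
}
```

StandardCharsets.UTF_16LE is equivalent to Charset.forName("UTF-16LE") and avoids a typo in the charset name.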

Java: UTF8 encoding is not displayed correctly in JTextArea

I am trying to display txt or docx file content in a JTextArea, but the text area does not correctly display Armenian or Russian text. Specifying UTF-8 encoding in the InputStreamReader does not help:
public class TextReader {
public static String getText(File textFile) throws IOException {
FileInputStream fis = new FileInputStream(textFile);
InputStreamReader isr = new InputStreamReader(fis, "UTF8");
BufferedReader br = new BufferedReader(isr);
StringBuilder text = new StringBuilder();
String c;
while ((c = br.readLine()) != null)
text.append(c + "\n");
fis.close();
isr.close();
br.close();
return String.valueOf(text);
}
}
I am using this static method in another class in JTextArea:
String text = TextReader.getText(currentFile);
textArea.setText(text);
After running and choosing the file, I got random characters. What could be the solution in this case?
Your code seems to be fine. My guess is that you are trying to read a docx file.
You can't read docx files directly this way. Use a library such as Apache POI.
If you are indeed using a text file, the application you used to save it may have written it in a different encoding. You could try saving some (hard-coded) sample Russian text to a text file using Java itself and reading it back into your JTextArea.
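
A minimal sketch of that round-trip check, leaving out the Swing part. The class name and the temp file are made up for illustration. If this prints true but your own file still shows random characters, the file was most likely not saved as UTF-8.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class RoundTripCheck {

    // Writes the text to a temp file as UTF-8 and reads it back with the same charset.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("sample", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Russian "Privet" (hello), written with escapes so the source encoding does not matter.
        String sample = "\u041F\u0440\u0438\u0432\u0435\u0442";
        System.out.println(roundTrip(sample).equals(sample));
    }
}
```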

read greek characters from xls file into java

I am trying to read an xls file in Java and convert it to csv. The problem is that it contains Greek characters. I have tried several different methods with no success.
br = new BufferedReader(new InputStreamReader(
new FileInputStream(saveDir+"/"+fileName+".xls"), "UTF-8"));
FileWriter writer1 = new FileWriter(saveDir+"/A"+fileName+".csv");
byte[] bytes = thisLine.getBytes("UTF-8");
writer1.append(new String(bytes, "UTF-8"));
I tried that with different encodings, such as UTF-16 and windows-1253, and of course also without the bytes array. None of them worked. Any ideas?
Use "ISO-8859-7" instead of "UTF-8". It covers Latin and Greek. See documentation
InputStream in = new BufferedInputStream(new FileInputStream(new File(myfile)));
result = new Scanner(in,"ISO-8859-7").useDelimiter("\\A").next();
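A Byte Order Mark (BOM) should be written at the start of the CSV file, and the writer should use UTF-8 explicitly so that the BOM and the Greek text are encoded consistently.
Can you try this code?
PrintWriter writer1 = new PrintWriter(saveDir+"/A"+fileName+".csv", "UTF-8");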
writer1.print('\ufeff');
....
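
For reference, a small self-contained sketch of the BOM idea that writes to memory instead of a file, so the resulting bytes can be inspected. The class and method names are hypothetical. It assumes the CSV is meant to be UTF-8, in which case the character U+FEFF comes out as the three bytes EF BB BF.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class CsvWithBom {

    // Encodes one CSV line as UTF-8, preceded by a byte order mark.
    static byte[] csvBytes(String line) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(buffer, StandardCharsets.UTF_8))) {
            writer.print('\uFEFF'); // encoded as EF BB BF in UTF-8
            writer.print(line);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Two Greek letters (alpha, beta) as escapes, separated by a comma.
        byte[] bytes = csvBytes("\u03B1,\u03B2");
        System.out.printf("%02X %02X %02X%n", bytes[0], bytes[1], bytes[2]);
    }
}
```

Some spreadsheet applications use the BOM as a hint that a CSV file is UTF-8; without it they may fall back to a legacy code page.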

How do I get a FileInputStream from FileItem in java?

I am trying to avoid FileItem's getInputStream(), because it gives me the wrong encoding, so I need a FileInputStream instead. Is there any way to get a FileInputStream without using this method? Or can I turn my FileItem into a File?
if (this.strEncoding != null && !this.strEncoding.isEmpty()) {
br = new BufferedReader(new InputStreamReader(clsFile.getInputStream(), this.strEncoding));
}
else {
// br = ?????
}
You can try
FileItem#getString(encoding)
Returns the contents of the file item as a String, using the specified encoding.
You can use the write method here.
File file = new File("/path/to/file");
fileItem.write(file);
An InputStream is binary data, bytes. It must be converted to text by giving the encoding of those bytes.
Java uses Unicode internally to represent text in all scripts. For text it uses String/char/Reader/Writer.
For binary data it uses byte[]/InputStream/OutputStream.
So you could use a bridging class, like InputStreamReader:
String encoding = "UTF-8"; // Or "Windows-1252" ...
BufferedReader in = new BufferedReader(
new InputStreamReader(fileItem.getInputStream(),
encoding));
Or if you read the bytes:
String s = new String(bytes, encoding);
The encoding is often an optional parameter (in that case there is also an overloaded method without the encoding).
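
Putting this together for the if/else in the question, here is a hedged sketch of one way to fill the else branch: fall back to a fixed charset when no encoding is configured. The class and method names are hypothetical, UTF-8 as the fallback is an assumption, and a ByteArrayInputStream stands in for fileItem.getInputStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingAwareReader {

    // Returns the named charset, or UTF-8 (an assumed default) when none is given.
    static Charset resolve(String encoding) {
        return (encoding == null || encoding.isEmpty())
                ? StandardCharsets.UTF_8
                : Charset.forName(encoding);
    }

    // Bridges a byte stream to text using the resolved charset.
    static BufferedReader open(InputStream in, String encoding) {
        return new BufferedReader(new InputStreamReader(in, resolve(encoding)));
    }

    // Convenience helper for the demo: reads the first line and closes the reader.
    static String firstLine(InputStream in, String encoding) {
        try (BufferedReader reader = open(in, encoding)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, encoded as ISO-8859-1 (one byte per character).
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};
        String text = firstLine(new ByteArrayInputStream(latin1), "ISO-8859-1");
        System.out.println(text.equals("caf\u00E9"));
    }
}
```

The same InputStream serves both branches; there is no need for a FileInputStream, because the encoding is chosen when the reader is created and not by the stream.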
