I'm trying to read a txt file that has UCS-2 LE encoding, using the code below. The ??? is the encoding value I need, but I'm not sure what it's supposed to be.
InputStream HostFile = new FileInputStream(Location + FileName);
Reader file = new InputStreamReader(HostFile, Charset.forName(???));
PrintWriter writer = new PrintWriter(outLocation, "UTF-8");
Any ideas would be appreciated.
Use "UTF-16LE". UCS-2 is a subset of UTF-16, so a UTF-16LE decoder reads UCS-2 LE data correctly:
Reader file = new InputStreamReader(HostFile, Charset.forName("UTF-16LE"));
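A minimal self-contained sketch of the same idea, assuming Java 7+ (for StandardCharsets and try-with-resources) and that the text fits in memory; the class and method names are hypothetical. Files saved as UCS-2 LE often start with a byte order mark (FF FE). The UTF_16LE decoder keeps that mark as the character U+FEFF instead of consuming it, so the sketch strips it explicitly before the text is re-encoded as UTF-8, as the question's PrintWriter does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Ucs2ToUtf8 {
    // Hypothetical helper: decode UTF-16LE (which covers UCS-2 LE) bytes into a String.
    static String readUtf16Le(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_16LE)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        // UTF_16LE does not consume a leading BOM; it decodes it as U+FEFF, so drop it here.
        if (sb.length() > 0 && sb.charAt(0) == '\uFEFF') {
            sb.deleteCharAt(0);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a UCS-2 LE file with a BOM: FF FE followed by little-endian code units.
        byte[] body = "Hello".getBytes(StandardCharsets.UTF_16LE);
        byte[] file = new byte[body.length + 2];
        file[0] = (byte) 0xFF;
        file[1] = (byte) 0xFE;
        System.arraycopy(body, 0, file, 2, body.length);

        String text = readUtf16Le(new ByteArrayInputStream(file));
        System.out.println(text); // Hello
        // Re-encode as UTF-8, which is what PrintWriter(outLocation, "UTF-8") would write.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 5
    }
}
```

The plain "UTF-16" charset would also consume the BOM, but it assumes big-endian when no BOM is present, which is why the explicit little-endian charset is used here.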
I have the following InputStream:
InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("templates/createUser/new-user.txt");
and the content of new-user.txt is:
Hello™ how r u ®
but in the output it is displayed as:
Hello��� how r u��
What changes should I make to my txt file so that the data is displayed correctly?
UPDATE
Here is the code:
Handlebars handlebars = new Handlebars();
InputStream txtInputStream = this.getClass().getClassLoader()
.getResourceAsStream("templates/createUser/new-user.txt");
Template textTemplate = handlebars.compileInline(IOUtils.toString(txtInputStream));
String emailText = textTemplate.apply(vars);
The problem is not the InputStream object. An InputStream is just a stream of bytes; it knows nothing about encodings. The problem arises when those bytes are decoded into characters, so you should specify the encoding on your reader:
Reader reader = new InputStreamReader(inputStream, "UTF-8");
as opposed to using this:
Reader reader = new InputStreamReader(inputStream); // no encoding given, so the platform default charset is used
Alternatively, Commons IO can read the whole stream into a string with an explicit encoding:
String theString = IOUtils.toString(inputStream, "UTF-8");
Edit:
I didn't realize you had posted the full code. Just change your second-to-last line to:
Template textTemplate = handlebars.compileInline(IOUtils.toString(txtInputStream, "UTF-8"));
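To see why the explicit charset matters, here is a minimal sketch that decodes the UTF-8 bytes of the question's text once correctly and once with the wrong charset. It assumes Java 9+ for InputStream.readAllBytes and uses plain JDK calls instead of Commons IO; the class name is hypothetical. ISO-8859-1 stands in for a platform default that is not UTF-8; the exact garbage seen in the question depends on the real default charset and on the console.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResourceDecoding {
    // Read every byte from the stream and decode it with an explicit charset
    // (a JDK-only counterpart of IOUtils.toString(in, "UTF-8")).
    static String readAll(InputStream in, Charset cs) throws IOException {
        try (InputStream s = in) {
            return new String(s.readAllBytes(), cs);
        }
    }

    public static void main(String[] args) throws IOException {
        // The file content from the question, as it would be stored in a UTF-8 file.
        byte[] fileBytes = "Hello\u2122 how r u \u00AE".getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(fileBytes), StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("Hello\u2122 how r u \u00AE")); // true
        // Each multi-byte UTF-8 sequence turns into several unrelated characters.
        System.out.println(wrong.length() > right.length()); // true
    }
}
```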
I am trying to convert pptx files to txt (text extraction) using the Apache POI framework (Java).
I'm new to Java, so I don't know much about BufferedReader, InputStream, etc.
Here is what I tried:
import org.apache.poi.xslf.XSLFSlideShow;
import org.apache.poi.xslf.extractor.XSLFPowerPointExtractor;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
... Classes and Stuff ....
String inputfile = "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx";
InputStream fis = new FileInputStream(inputfile);
BufferedReader br1 = new BufferedReader(new InputStreamReader(fis));
String fileName = br1.readLine();
System.out.println(new XSLFPowerPointExtractor(new XMLSlideShow(new XSLFSlideShow(fileName))).getText());
br1.close();
My goal is to store the extracted text in a variable, but I can't even print it to the console. This is what I get:
org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: 'PK
org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:102)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:199)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
org.apache.poi.POIXMLDocument.openPackage(POIXMLDocument.java:69)
org.apache.poi.xslf.XSLFSlideShow.<init>(XSLFSlideShow.java:90)
Any help would be greatly appreciated!
You are doing much too much. In fact, you are reading the first line of the PPTX file's contents and using it as the file name. Simply pass the path instead:
System.out.println(new XSLFPowerPointExtractor(
new XMLSlideShow(new XSLFSlideShow(
"X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx"))).getText());
or, more generically:
POITextExtractor extractor = ExtractorFactory.createExtractor(
new java.io.File("X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx"));
System.out.println(extractor.getText());
extractor.close();
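The 'PK at the start of the exception message is the tell-tale sign. A .pptx file is a ZIP archive (Office Open XML), and a ZIP archive begins with the local file header signature "PK" followed by the bytes 03 04, so br1.readLine() returns that binary header rather than a path. The sketch below uses only the JDK, a hypothetical class name, and a tiny in-memory ZIP standing in for the PPTX to show what such a readLine() call returns.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipFirstLine {
    // Build a tiny ZIP archive in memory; a real .pptx starts with the same header.
    static byte[] makeZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("[Content_Types].xml"));
            zip.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = makeZip();
        // Mimic the question: treat the archive's first "line" as if it were a file name.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(zip), StandardCharsets.ISO_8859_1))) {
            String firstLine = br.readLine();
            System.out.println(firstLine.startsWith("PK")); // true
        }
    }
}
```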
I can't give you a definitive answer (I don't use POI myself), but I can tell you where your mistake might lie.
The XSLFSlideShow constructor expects a file path as its argument, but you are passing the first line read from the file's contents instead. Try it as follows:
String filePath = "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx";
System.out.println(new XSLFPowerPointExtractor(new XMLSlideShow(new XSLFSlideShow(filePath))).getText());
I am trying to read an xls file in Java and convert it to CSV. The problem is that it contains Greek characters. I have tried various methods with no success.
br = new BufferedReader(new InputStreamReader(
new FileInputStream(saveDir+"/"+fileName+".xls"), "UTF-8"));
FileWriter writer1 = new FileWriter(saveDir+"/A"+fileName+".csv");
byte[] bytes = thisLine.getBytes("UTF-8");
writer1.append(new String(bytes, "UTF-8"));
I've used that with different encodings, like UTF-16 and windows-1253, and of course without the byte array. None worked. Any ideas?
Use "ISO-8859-7" instead of "UTF-8"; it covers Latin and Greek characters. See the documentation.
InputStream in = new BufferedInputStream(new FileInputStream(new File(myfile)));
result = new Scanner(in,"ISO-8859-7").useDelimiter("\\A").next();
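A minimal sketch of the ISO-8859-7 suggestion, assuming the input is text that was actually saved as ISO-8859-7 and that the JVM ships that charset (standard JDK builds do). It shows that each Greek letter maps to a single byte in that charset and round-trips through the same Scanner idiom as the answer; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Scanner;

public class GreekRoundTrip {
    static final Charset GREEK = Charset.forName("ISO-8859-7");

    // Read the whole stream as one token: "\\A" matches only at the start of the input,
    // so no further delimiter is found and next() returns everything.
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in, "ISO-8859-7").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        String greek = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1"; // Καλημέρα
        byte[] encoded = greek.getBytes(GREEK);
        System.out.println(encoded.length); // 8: one byte per Greek letter
        System.out.println(readAll(new ByteArrayInputStream(encoded)).equals(greek)); // true
    }
}
```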
A Byte Order Mark (BOM) should be written at the start of the CSV file.
Try this code:
PrintWriter writer1 = new PrintWriter(saveDir+"/A"+fileName+".csv", "UTF-8"); // the BOM below only means "UTF-8" if the file is written as UTF-8
writer1.print('\ufeff');
....
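A self-contained sketch of the BOM approach, assuming the CSV should be UTF-8. The key detail is that the character '\ufeff' only becomes the UTF-8 BOM bytes EF BB BF if the writer itself encodes UTF-8. The class and method names are hypothetical, and the CRLF line ending follows RFC 4180.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class CsvWithBom {
    // Write a UTF-8 CSV that starts with a byte order mark; closing the writer flushes it.
    static void writeCsv(OutputStream out, String... rows) {
        try (PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            writer.print('\ufeff'); // encoded as EF BB BF by the UTF-8 writer
            for (String row : rows) {
                writer.print(row);
                writer.print("\r\n"); // RFC 4180 line separator
            }
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeCsv(buffer, "\u0391,\u0392"); // "Α,Β" (Greek capital alpha, beta)
        byte[] bytes = buffer.toByteArray();
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF); // EF BB BF
    }
}
```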
I'm using a FileOutputStream in Java to save the contents of UDP packets to a file. After reading them, I sometimes want to convert the file's encoding. The problem is that when I do this, the file ends up with all of its contents doubled. The only workaround I can think of is to write a temp file with the new encoding and then save it over the original file, but that seems too hacky.
I must be overlooking something in my code:
if(mode.equals("netascii")){
byte[] convert = new byte[(int)file.length()];
FileInputStream input = new FileInputStream(file);
input.read(convert);
String temp = new String(convert);
convert = Charset.forName("US-ASCII").encode(temp).array();
fos.write(convert);
}
JOptionPane.showMessageDialog(frame, "Read Successful!");
fos.close();
}
Is there anything suspect?
Thanks in advance for any help!
The problem is that new String(convert) decodes the bytes with the platform's default charset, which I'm assuming is not the file's actual encoding. Specify the file's encoding when converting its bytes to a String and you'll get a proper Java string.
I've assumed UTF-16 as the InputStream's encoding here:
byte[] convert = new byte[(int)file.length()];
FileInputStream input = new FileInputStream(file);
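// read file bytes until the buffer is full or EOF is reached
int off = 0;
while (off < convert.length) {
    int r = input.read(convert, off, convert.length - off);
    if (r == -1) break; // EOF reached early
    off += r;
}
input.close();
String temp = new String(convert, 0, off, Charset.forName("UTF-16"));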
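The answer covers decoding; the doubling itself most likely comes from writing the converted bytes to fos, which has already written the original content (the question does not show how fos was opened, so this is an inference). A minimal sketch, assuming Java 7+ NIO and a file that fits in memory, that rewrites the file with Files.write, whose default options truncate the existing content. It also avoids ByteBuffer.array(), which returns the whole backing array and can include unused bytes past the buffer's limit. The class name and the UTF-16 to US-ASCII pair are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ConvertInPlace {
    // Re-encode a file from one charset to another, replacing its previous contents.
    static void convert(Path file, Charset from, Charset to) throws IOException {
        byte[] original = Files.readAllBytes(file); // reads the whole file, no manual loop
        String text = new String(original, from);   // decode with the file's real encoding
        Files.write(file, text.getBytes(to));       // default options truncate, so nothing is appended
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("netascii", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_16)); // BOM + 10 bytes
        convert(tmp, StandardCharsets.UTF_16, StandardCharsets.US_ASCII);
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.US_ASCII)); // hello
        System.out.println(Files.size(tmp)); // 5
        Files.delete(tmp);
    }
}
```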
I am using this code to save data to a file, but what ends up in the file is ????????. Please help me find a suitable solution.
File gpxfile = new File(activate, "activate.csv");
OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream(gpxfile),"UTF-8");
writer.write(merchantId);
It works for me. Make sure merchantId actually contains valid Hindi text. For instance:
String str = "मानक हिन्दी";
writer.write(str);
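One common way to end up with ???????? is encoding the text with a charset that cannot represent it, for example when some step falls back to a non-Unicode default: String.getBytes replaces each unmappable character with '?'. A small sketch with a hypothetical class name, contrasting that with an explicit UTF-8 writer closed via try-with-resources (which also makes sure the data is flushed):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class HindiEncoding {
    // Write text as UTF-8 and return the raw bytes; closing the writer flushes it.
    static byte[] writeUtf8(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String hindi = "\u092E\u093E\u0928\u0915"; // मानक
        // ISO-8859-1 has no Devanagari, so every character is replaced with '?'.
        String lossy = new String(hindi.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
        System.out.println(lossy); // ????
        // UTF-8 keeps the text intact (3 bytes per Devanagari character here).
        byte[] utf8 = writeUtf8(hindi);
        System.out.println(utf8.length); // 12
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(hindi)); // true
    }
}
```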