Saving Information in Hindi - java

I am using this code to save data to a file, but what ends up in the file is ????????. Please help me find a suitable solution.
File gpxfile = new File(activate, "activate.csv");
OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream(gpxfile),"UTF-8");
writer.write(merchantId);

It works for me. Make sure merchantId contains valid Hindi. For instance:
String str = "मानक हिन्दी";
writer.write(str);
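If it still fails for you, a round trip like the one below may help narrow things down. This is only a minimal sketch: the class name HindiCsvDemo and its two helper methods are made up for illustration, and the Hindi text is written with Unicode escapes so that the encoding of the source file cannot interfere. It writes the text as UTF-8 (closing the writer so that nothing stays in the buffer) and reads it back with the same charset.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original question.
class HindiCsvDemo {

    // "मानक हिन्दी" spelled with Unicode escapes, independent of source-file encoding.
    static final String SAMPLE =
            "\u092E\u093E\u0928\u0915 \u0939\u093F\u0928\u094D\u0926\u0940";

    // Writes the text as UTF-8; try-with-resources flushes and closes the writer.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    // Reads the whole file back, decoding it explicitly as UTF-8.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```

If readUtf8 returns the same string that was passed to writeUtf8, the file itself is fine, and the question marks most likely come from how merchantId was obtained or from the program used to view the file.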

Related

How to set 'charset' for DatumWriter || write avro that contains arabic characters to HDFS

Some of the data contains Arabic text. After the data is written, both the reader code and the hadoop fs -text command show ?? instead of the Arabic characters.
1) Writer
// avro object is provided as SpecificRecordBase
Path path = new Path(pathStr);
DatumWriter<SpecificRecord> datumWriter = new SpecificDatumWriter<>();
FileSystem fs = FileSystem.get(URI.create(hdfsUri), conf); // HDFS File System
FSDataOutputStream outputStream = fs.create(path);
DataFileWriter<SpecificRecord> dataFileWriter = new DataFileWriter<>(datumWriter);
Schema schema = getSchema(); // method to get schema
dataFileWriter.setCodec(CodecFactory.snappyCodec());
dataFileWriter.create(schema, outputStream);
dataFileWriter.append(avroObject);
2) Reader
Configuration conf = new Configuration();
FsInput in = new FsInput(new Path(hdfsFilePathStr), conf);
DatumReader<Row> datumReader = new GenericDatumReader<>();
DataFileReader<Row> dataFileReader = new DataFileReader<>(in, datumReader);
GenericRecord outputData = (GenericRecord) dataFileReader.iterator().next();
I've also tried the hadoop fs -text {filePath} command; there too, the Arabic values appear as ??.
It will be really difficult to change the format in which data is written because there are numerous consumers of the same file.
Tried reading through SpecificRecordBase, still getting ??.
Edit
Also tried these (in both reader and writer):
Configuration conf = new Configuration();
conf.set("file.encoding", StandardCharsets.UTF_16.displayName());
AND
System.setProperty("file.encoding", StandardCharsets.UTF_16.displayName());
Doesn't help.
Apparently, HDFS does not handle a lot of non-English characters well. To work around that, change the field type from string to bytes in your Avro schema.
To convert your value from String to bytes, use:
ByteBuffer.wrap(str.getBytes(StandardCharsets.UTF_8)).
Then, while reading, to convert it back to String use:
new String(byteData.array(), StandardCharsets.UTF_8).
Rest of the code in your reader and writer stays the same.
With this change, the hadoop fs -text command will show proper text for English characters but may show gibberish for non-English characters. Your reader will still be able to rebuild the UTF-8 String from the ByteBuffer.
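The two conversions can be sketched with plain JDK classes as follows. The class and method names (Utf8BytesField, toBytes, fromBytes) are invented for this illustration and are not part of Avro. One deliberate difference from the snippet above: the sketch decodes through Charset.decode instead of byteData.array(), because array() returns the whole backing array and can include bytes outside the buffer's position and limit.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for converting values of a field declared as "bytes".
class Utf8BytesField {

    // String -> ByteBuffer holding the UTF-8 encoding of the value.
    static ByteBuffer toBytes(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }

    // ByteBuffer -> String. Only the bytes between position and limit are decoded;
    // duplicate() keeps the caller's buffer position untouched.
    static String fromBytes(ByteBuffer buffer) {
        return StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
    }
}
```

On the writer side the result of toBytes would be stored in the record, and on the reader side fromBytes would be applied to the ByteBuffer that comes back.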

Java, Reading a file that has UCS-2 Little Endian encoding

I'm trying to read a txt file that uses the UCS-2 LE encoding, and I have the code below. The ??? is the encoding name I need, but I am not sure what it is supposed to be.
InputStream HostFile = new FileInputStream(Location + FileName);
Reader file = new InputStreamReader(HostFile, Charset.forName(???));
PrintWriter writer = new PrintWriter(outLocation, "UTF-8");
Any ideas would be appreciated.
Reader file = new InputStreamReader(HostFile, Charset.forName("UTF-16LE"));
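For completeness, here is a small sketch of the whole conversion (read as UTF-16LE, write as UTF-8). The class name Ucs2ToUtf8 and the method convert are made up for this example. It assumes the input may start with a byte order mark: Java's UTF-16LE decoder passes a BOM through as the character U+FEFF, so the sketch skips that character if it comes first.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical converter: UTF-16LE (UCS-2 LE) input file to UTF-8 output file.
class Ucs2ToUtf8 {

    static void convert(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_16LE);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            // Skip a leading byte order mark (U+FEFF) if one is present.
            reader.mark(1);
            int first = reader.read();
            if (first != -1 && first != 0xFEFF) {
                reader.reset();
            }
            // Copy the remaining characters in chunks.
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }
}
```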

Java InputStreamReader Error (org.apache.poi.openxml4j.exceptions.InvalidOperationException)

I am trying to convert pptx files to txt (text extraction) using the Apache POI framework (Java).
I'm new to Java, so I don't know much about BufferedReader, InputStream, etc.
What I tried is:
import org.apache.poi.xslf.XSLFSlideShow;
import org.apache.poi.xslf.extractor.XSLFPowerPointExtractor;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
... Classes and Stuff ....
String inputfile = "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx";
InputStream fis = new FileInputStream(inputfile);
BufferedReader br1 = new BufferedReader(new InputStreamReader(fis));
String fileName = br1.readLine();
System.out.println(new XSLFPowerPointExtractor(new XMLSlideShow(new XSLFSlideShow(fileName))).getText());
br1.close();
My goal is to store the extracted text in a variable, but I can't even print it to the console. What I get is:
org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: 'PK
org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:102)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:199)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
org.apache.poi.POIXMLDocument.openPackage(POIXMLDocument.java:69)
org.apache.poi.xslf.XSLFSlideShow.<init>(XSLFSlideShow.java:90)
Any help would be greatly appreciated!
You are doing far too much: you are reading the contents of the PPTX itself and using that as the file name. Simply use:
System.out.println(new XSLFPowerPointExtractor(
new XMLSlideShow(new XSLFSlideShow(
"X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx"))).getText());
or, more generically:
POITextExtractor extractor = ExtractorFactory.createExtractor(
new java.io.File("X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx"));
System.out.println(extractor.getText());
extractor.close();
I can't give you a definitive answer (I don't use POI myself), but I can point to where the mistake probably lies.
The constructor of the class XSLFSlideShow expects a file path as its argument, but you are passing the first line read from the file's contents (which is why the error message starts with 'PK', the ZIP signature). Pass the path directly:
String filePath = "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx";
System.out.println(new XSLFPowerPointExtractor(new XMLSlideShow(new XSLFSlideShow(filePath))).getText());

Getting XML data from the byteArray of a zipFile

I'm writing a simple program that retrieves XML data from an object, and parses it dynamically, based on user criteria. I am having trouble getting the XML data from the object, due to the format it is available in.
The object containing the XML returns the data as a byteArray of a zipFile, like so.
MyObject data = getData();
byte[] byteArray = data.getPayload();
//The above returns the byteArray of a zipFile
I checked this by converting the byteArray to a String:
String str = new String(byteArray);
//The above returns a string with strange characters in it.
Then I wrote the data to a file.
FileOutputStream fos = new FileOutputStream("new.txt");
fos.write(byteArray);
I renamed new.txt as new.zip. When I opened it using WinRAR, out popped the XML.
My problem is that I don't know how to do this conversion in Java using streams, without first writing the data to a zip file and then reading it back. Writing the data to disk would make the software far too slow.
Any ideas/code snippets/info you could give me would be really appreciated!! Thanks
Also, if you need a better explanation from me, I'd be happy to elaborate.
As another option, I am wondering whether an XMLReader would work with a ZipInputStream as InputSource.
ByteArrayInputStream bis = new ByteArrayInputStream(byteArray);
ZipInputStream zis = new ZipInputStream(bis);
InputSource inputSource = new InputSource(zis);
A zip archive can contain several files. You have to position the zip stream on the first entry before parsing the content:
ByteArrayInputStream bis = new ByteArrayInputStream(byteArray);
ZipInputStream zis = new ZipInputStream(bis);
ZipEntry entry = zis.getNextEntry();
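InputSource inputSource = new InputSource(zis);
ZipInputStream signals end of stream at the end of the current entry, so the parser reads only that entry and no extra bounding is needed. (getCompressedSize() is the compressed byte count, not the length of the data you read from the stream, so it is not a suitable bound.) If the parser must not close the underlying stream, a wrapper such as CloseShieldInputStream from Apache Commons IO (http://commons.apache.org/io) can be used.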
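A JDK-only variant of the same idea, as a rough sketch: read the first entry fully into memory and hand the resulting bytes to the parser, so nothing touches the disk. The class name ZipXmlInMemory and its methods are invented for this example, and it assumes the XML is the first entry in the archive. It relies on ZipInputStream.read returning -1 at the end of the current entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: extracts and parses the first entry of an in-memory zip.
class ZipXmlInMemory {

    // Returns the uncompressed bytes of the first entry in the archive.
    static byte[] firstEntryBytes(byte[] zipBytes) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zis.getNextEntry();
            if (entry == null) {
                throw new IOException("zip archive has no entries");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            // read() returns -1 once the current entry is exhausted.
            while ((n = zis.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Parses the first entry as an XML document.
    static Document parseFirstEntry(byte[] zipBytes)
            throws IOException, ParserConfigurationException, SAXException {
        byte[] xml = firstEntryBytes(zipBytes);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }
}
```

With the code from the question, this would be called as ZipXmlInMemory.parseFirstEntry(data.getPayload()).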

Writing in the beginning of a text file Java

I need to write something at the beginning of a text file. I have a text file with content, and I want to write something before this content. Say I have:
Good afternoon sir,how are you today?
I'm fine,how are you?
Thanks for asking,I'm great
After modifying, I want it to look like this:
Page 1-Scene 59
25.05.2011
Good afternoon sir,how are you today?
I'm fine,how are you?
Thanks for asking,I'm great
Just made up the content :) How can I modify a text file this way?
You can't really modify it that way - file systems don't generally let you insert data in arbitrary locations - but you can:
Create a new file
Write the prefix to it
Copy the data from the old file to the new file
Move the old file to a backup location
Move the new file to the old file's location
Optionally delete the old backup file
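The steps above can be sketched with java.nio.file roughly as follows. The class name PrependToFile and the method prepend are hypothetical, the prefix is assumed to be UTF-8 text, and the backup step is skipped: the new file simply replaces the old one. Note that the replaced file ends up with the temporary file's permissions and attributes, not the original's.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: prepends text to a file via a temporary file.
class PrependToFile {

    static void prepend(Path file, String prefix) throws IOException {
        // Create the temp file next to the target so the final move stays on one file system.
        Path dir = file.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "prepend", ".tmp");
        try (OutputStream out = Files.newOutputStream(temp)) {
            out.write(prefix.getBytes(StandardCharsets.UTF_8)); // write the prefix
            Files.copy(file, out);                              // stream the old content after it
        }
        // Replace the old file with the new one.
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For the example in the question, the call would be something like PrependToFile.prepend(path, "Page 1-Scene 59\n25.05.2011\n").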
In case it is useful to someone, here is the full source code of a method that prepends lines to a file using the Apache Commons IO library. The code does not read the whole file into memory, so it works on files of any size.
public static void prependPrefix(File input, String prefix) throws IOException {
LineIterator li = FileUtils.lineIterator(input);
File tempFile = File.createTempFile("prependPrefix", ".tmp");
BufferedWriter w = new BufferedWriter(new FileWriter(tempFile));
try {
w.write(prefix);
while (li.hasNext()) {
w.write(li.next());
w.write("\n");
}
} finally {
IOUtils.closeQuietly(w);
LineIterator.closeQuietly(li);
}
FileUtils.deleteQuietly(input);
FileUtils.moveFile(tempFile, input);
}
I think what you want is random access; check out the related Java tutorial. However, I don't believe you can insert data at an arbitrary point in the file; if I recall correctly, you can only overwrite existing data. To insert, your code would have to:
copy a block,
overwrite with your new stuff,
copy the next block,
overwrite with the previously copied block,
return to 3 until no more blocks
As @atk suggested, java.nio.channels.SeekableByteChannel is a good interface, but it is only available from Java 1.7 onwards.
Update: If you have no issue with using FileUtils, then use
String fileString = FileUtils.readFileToString(file);
This isn't a direct answer to the question, but often files are accessed via InputStreams. If this is your use case, then you can chain input streams via SequenceInputStream to achieve the same result. E.g.
InputStream inputStream = new SequenceInputStream(new ByteArrayInputStream("my line\n".getBytes()), new FileInputStream(new File("myfile.txt")));
I will leave this here in case anyone needs it. It prepends the contents of fileName2 to fileName1 by buffering both files in memory:
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
try (FileInputStream fileInputStream1 = new FileInputStream(fileName1);
FileInputStream fileInputStream2 = new FileInputStream(fileName2)) {
while (fileInputStream2.available() > 0) {
byteArrayOutputStream.write(fileInputStream2.read());
}
while (fileInputStream1.available() > 0) {
byteArrayOutputStream.write(fileInputStream1.read());
}
}
try (FileOutputStream fileOutputStream = new FileOutputStream(fileName1)) {
byteArrayOutputStream.writeTo(fileOutputStream);
}
