jena read inputstream from gzipped file


I have the following code, which reads a dataset into a Jena model from an InputStream. I would like my program to be able to read compressed (gzipped) files as well, using the same filePath.
Dataset dataset = TDBFactory.createDataset(tdbPath);
Model model = dataset.getDefaultModel();
InputStream str = FileManager.get().open(filePath);
model.read(str,null, "N-TRIPLES");

You need to wrap the stream in a GZIPInputStream to read it:
Dataset dataset = TDBFactory.createDataset(tdbPath);
Model model = dataset.getDefaultModel();
InputStream str = FileManager.get().open(filePath);
if (useGZIP) {
str = new GZIPInputStream(str);
}
model.read(str,null, "N-TRIPLES");
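
The useGZIP flag above has to come from somewhere. One possible approach, sketched here with only the JDK, is to peek at the first two bytes of the stream and check for the gzip magic number (0x1f 0x8b). The names GzipSniffer and maybeGunzip are hypothetical, chosen for this illustration; they are not part of Jena. The demo uses InputStream.readAllBytes, which assumes Java 9 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not part of Jena): decides at runtime whether a stream is gzipped.
final class GzipSniffer {

    // Returns a decompressing stream if the input starts with the gzip
    // magic number (0x1f 0x8b); otherwise returns the buffered input as-is.
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);            // remember the start so the peeked bytes can be re-read
        int b1 = in.read();
        int b2 = in.read();
        in.reset();            // rewind to the marked position
        boolean gzipped = (b1 == 0x1f && b2 == 0x8b);
        return gzipped ? new GZIPInputStream(in) : in;
    }

    public static void main(String[] args) throws IOException {
        // Build a small gzipped N-Triples payload in memory for the demo.
        byte[] plain = "<http://example.org/s> <http://example.org/p> \"o\" .\n"
                .getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(plain);
        }
        // The helper detects the compression and yields the original text.
        try (InputStream in = maybeGunzip(new ByteArrayInputStream(buf.toByteArray()))) {
            System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

The stream returned by maybeGunzip could then be passed to model.read in place of the raw stream.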

If you use the newer RDFDataMgr APIs then GZip compression should be handled entirely transparently:
RDFDataMgr.read(model, filePath, Lang.NTRIPLES);

Related

How to set 'charset' for DatumWriter || write avro that contains arabic characters to HDFS

Some of the data contains values in Arabic, and after the data is written, both the reader code and the hadoop fs -text command show ?? instead of the Arabic characters.
1) Writer
// avro object is provided as SpecificRecordBase
Path path = new Path(pathStr);
DatumWriter<SpecificRecord> datumWriter = new SpecificDatumWriter<>();
FileSystem fs = FileSystem.get(URI.create(hdfsUri), conf); // HDFS File System
FSDataOutputStream outputStream = fs.create(path);
DataFileWriter<SpecificRecord> dataFileWriter = new DataFileWriter<>(datumWriter);
Schema schema = getSchema(); // method to get schema
dataFileWriter.setCodec(CodecFactory.snappyCodec());
dataFileWriter.create(schema, outputStream);
dataFileWriter.append(avroObject);
2) Reader
Configuration conf = new Configuration();
FsInput in = new FsInput(new Path(hdfsFilePathStr), conf);
DatumReader<Row> datumReader = new GenericDatumReader<>();
DataFileReader<Row> dataFileReader = new DataFileReader<>(in, datumReader);
GenericRecord outputData = (GenericRecord) dataFileReader.iterator().next();
I've also tried the hadoop fs -text {filePath} command; there, too, the Arabic values appear as ??.
It would be really difficult to change the format in which the data is written, because there are numerous consumers of the same file.
I tried reading through SpecificRecordBase and still get ??.
Edit
Also tried these (in both reader and writer):
Configuration conf = new Configuration();
conf.set("file.encoding", StandardCharsets.UTF_16.displayName());
AND
System.setProperty("file.encoding", StandardCharsets.UTF_16.displayName());
Neither setting helps.

Apparently, HDFS does not handle many non-English characters well. To work around that, change the field type from string to bytes in your Avro schema.
To convert your value from String to bytes, use:
ByteBuffer.wrap(str.getBytes(StandardCharsets.UTF_8)).
Then, while reading, to convert it back to String use:
new String(byteData.array(), StandardCharsets.UTF_8).
The rest of the code in your reader and writer stays the same.
With this change, the hadoop fs -text command will show English characters as proper text but may show gibberish for non-English characters; your reader will still be able to rebuild the UTF-8 String from the ByteBuffer.
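
The round trip described above can be sketched with the JDK alone. The names Utf8Bytes, toBytes and toText are made up for this illustration and are not Avro API. One deliberate difference from the snippet above: the sketch copies only the readable region of the buffer instead of calling array(), on the assumption that a ByteBuffer handed back by a reader may be backed by a larger or offset array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not Avro API): converts between a String and the
// ByteBuffer representation used for a "bytes" field.
final class Utf8Bytes {

    // Writer side: encode the text as UTF-8 and wrap it for a bytes field.
    static ByteBuffer toBytes(String text) {
        return ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reader side: decode only the bytes between position and limit.
    static String toText(ByteBuffer data) {
        ByteBuffer view = data.duplicate();   // leave the caller's position untouched
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "مرحبا بالعالم";
        String restored = toText(toBytes(original));
        // Reports whether the Arabic text survived the round trip.
        System.out.println(original.equals(restored));
    }
}
```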

Writing Image metadata in png image using pngj

Hi, I am writing some custom metadata into a PNG image using PNGJ.
I am able to write the metadata using PngWriter on the server side and read it using PngReader on the client side.
But if I save the image to local disk and try to read the metadata using the javax.imageio.* classes, I cannot read the metadata inside the image.
I also tried reading it with http://www.extractmetadata.com/, but that does not work either.
Below is the code I used to write it.
public synchronized static byte[] writeMetaDataInImageUsingPngJ(byte[] imageData, String key, String metaData)
{
InputStream image = new ByteArrayInputStream(imageData);
PngReader reader = new PngReader(image);
OutputStream outStream = new ByteArrayOutputStream();
PngWriter writer = new PngWriter(outStream, reader.imgInfo);
// instruct the writer to copy all ancillary chunks from source
writer.copyChunksFrom(reader.getChunksList(), ChunkCopyBehaviour.COPY_ALL_SAFE);
// add a new textual chunk (can also be done after writing the rows)
writer.getMetadata().setText(key, metaData);
// copy all rows
for (int row = 0; row < reader.imgInfo.rows; row++ )
{
IImageLine l1 = reader.readRow();
writer.writeRow(l1);
}
reader.end();
writer.end();
return ((ByteArrayOutputStream) outStream).toByteArray();
}
How can I make sure that users can read the metadata with ordinary libraries if I write it using PNGJ?

Convert Outputstream to file

Well, I'm stuck with a problem.
I need to create a PDF from an HTML source, and I did it this way:
File pdf = new File("/home/wrk/relatorio.pdf");
OutputStream out = new FileOutputStream(pdf);
InputStream input = new ByteArrayInputStream(build.toString().getBytes());//Build is a StringBuilder obj
Tidy tidy = new Tidy();
Document doc = tidy.parseDOM(input, null);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(out);
out.flush();
out.close();
I'm using JSP, so I need to send this file to the user as a download rather than write it on the server.
How do I turn this OutputStream output into a file in Java without writing the file to the hard drive?

If you're using VRaptor 3.3.0+ you can use the ByteArrayDownload class. Starting with your code, you can use this:
@Path("/download-relatorio")
public Download download() {
// Everything will be stored into this OutputStream
ByteArrayOutputStream out = new ByteArrayOutputStream();
InputStream input = new ByteArrayInputStream(build.toString().getBytes());
Tidy tidy = new Tidy();
Document doc = tidy.parseDOM(input, null);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(out);
out.flush();
out.close();
// Now that you have finished, return a new ByteArrayDownload()
// The 2nd and 3rd parameters are the Content-Type and File Name
// (which will be shown to the end-user)
return new ByteArrayDownload(out.toByteArray(), "application/pdf", "Relatorio.pdf");
}

A File object does not actually hold the data but delegates all operations to the file system (see this discussion).
You could, however, create a temporary file using File.createTempFile. Also look here for a possible alternative without using a File object.

Use temporary files.
File temp = File.createTempFile(prefix, suffix);
prefix -- The prefix string defines the file's name; it must be at least three characters long.
suffix -- The suffix string defines the file's extension; if null, the suffix ".tmp" will be used.
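
A minimal sketch of the temporary-file approach, using only the JDK. The class name TempPdf, the method name writeToTempFile and the "relatorio" prefix are illustrative assumptions; pdfBytes stands for whatever the PDF renderer produced in memory.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper: stores in-memory bytes in a temporary file.
final class TempPdf {

    // Creates a temp file in the default temp directory and writes the bytes to it.
    static File writeToTempFile(byte[] pdfBytes) throws IOException {
        File temp = File.createTempFile("relatorio", ".pdf"); // prefix needs at least 3 characters
        temp.deleteOnExit();                                  // ask the JVM to clean up on normal exit
        Files.write(temp.toPath(), pdfBytes);
        return temp;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder content; a real caller would pass the rendered PDF bytes.
        byte[] fake = "%PDF-placeholder".getBytes(StandardCharsets.US_ASCII);
        File f = writeToTempFile(fake);
        System.out.println(f.getAbsolutePath() + " (" + f.length() + " bytes)");
    }
}
```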

Compress and serialize String on the fly

I need to store a binary object (java class having several collections inside) in the key-value storage.
The size limit for the value is 4K.
I created XStream based serializer and deserializer, so when I am done filling my class members I can serialize it to a String or to a file.
In the worst case the serialized String/file size is ~30K. I manage to achieve a good compression rate, so after compression my file is ~2K, which fits the bill.
My question: is there any useful Java API/library/technique that can:
compress a String and serialize the compressed object;
decompress a previously compressed object and create a regular String from it?
I am looking for one-liners that do not require intermediate storage of serialized object to file for later compression.
Appreciate your help!

Try a GZIPOutputStream for zipping the String:
ByteArrayOutputStream out = new ByteArrayOutputStream();
Writer writer = new BufferedWriter(new OutputStreamWriter(new GZIPOutputStream(out)));
writer.write(string);
writer.close(); // flushes the buffers and writes the GZIP trailer; without it the output is incomplete
byte[] zipped = out.toByteArray();
And to unzip again:
ByteArrayInputStream in = new ByteArrayInputStream(zipped);
BufferedReader reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(in)));
string = reader.readLine(); // reads only the first line; loop over readLine() for multi-line strings
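
For strings that span several lines, a slightly fuller sketch may be useful. It assumes UTF-8 as the character encoding and reads the whole decompressed stream instead of a single line. The names GzipStrings, compress and decompress are hypothetical, and InputStream.readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip round trip between String and byte[] without touching disk.
final class GzipStrings {

    // Encodes the text as UTF-8 and gzips it in memory.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the stream writes the GZIP trailer
        return out.toByteArray();
    }

    // Gunzips the bytes and decodes the complete result as UTF-8.
    static String decompress(byte[] zipped) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(zipped))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String xml = "<object>\n  <item>1</item>\n  <item>2</item>\n</object>\n";
        byte[] zipped = compress(xml);
        // Reports whether the multi-line text survived the round trip.
        System.out.println(xml.equals(decompress(zipped)));
    }
}
```

The resulting byte[] can be stored as the value directly, provided the key-value store accepts binary values.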

Getting XML data from the byteArray of a zipFile

I'm writing a simple program that retrieves XML data from an object, and parses it dynamically, based on user criteria. I am having trouble getting the XML data from the object, due to the format it is available in.
The object containing the XML returns the data as a byteArray of a zipFile, like so.
MyObject data = getData();
byte[] byteArray = data.getPayload();
//The above returns the byteArray of a zipFile
The way I checked this, is by writing the byteArray to a String
String str = new String(byteArray);
//The above returns a string with strange characters in it.
Then I wrote the data to a file.
FileOutputStream fos = new FileOutputStream("new.txt");
fos.write(byteArray);
I renamed new.txt as new.zip. When I opened it using WinRAR, out popped the XML.
My problem is that I don't know how to do this conversion in Java using streams, without first writing the data to a zip file and then reading it back. Writing data to disk would make the software far too slow.
Any ideas/code snippets/info you could give me would be really appreciated!! Thanks
Also, if you need a better explanation from me, I'd be happy to elaborate.
As another option, I am wondering whether an XMLReader would work with a ZipInputStream as InputSource.
ByteArrayInputStream bis = new ByteArrayInputStream(byteArray);
ZipInputStream zis = new ZipInputStream(bis);
InputSource inputSource = new InputSource(zis);

A zip archive can contain several files. You have to position the zip stream on the first entry before parsing the content:
ByteArrayInputStream bis = new ByteArrayInputStream(byteArray);
ZipInputStream zis = new ZipInputStream(bis);
ZipEntry entry = zis.getNextEntry();
InputSource inputSource = new InputSource(new BoundedInputStream(zis, entry.getCompressedSize()));
The BoundedInputStream class is taken from Apache Commons IO (http://commons.apache.org/io)
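
As an alternative sketch that needs only the JDK: ZipInputStream.read returns -1 at the end of the current entry, so after getNextEntry() the stream can be handed to an XML parser directly. The names ZipXml and parseFirstEntry are hypothetical, and the sketch assumes the first entry in the archive is the XML file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: parses XML straight out of an in-memory zip archive.
final class ZipXml {

    // Positions the zip stream on the first entry and parses it as XML.
    // Assumption: the first entry is the XML file (not a directory).
    static Document parseFirstEntry(byte[] zipBytes)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zis.getNextEntry();
            if (entry == null) {
                throw new IOException("zip archive contains no entries");
            }
            // The stream signals end-of-input at the end of this entry.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(zis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a one-entry archive in memory to stand in for data.getPayload().
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf)) {
            zos.putNextEntry(new ZipEntry("data.xml"));
            zos.write("<root><item>42</item></root>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        Document doc = parseFirstEntry(buf.toByteArray());
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```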
