Java extract .cab files - java

I want to extract files from a .cab file and save them. I could not find any useful information on other questions and on GitHub there is no documentation.
For now I can iterate through all entries (2 text files) with CAB Parser but I cannot find a good way to save them and would appreciate some hints.
File inputFile = new File(inputPath);
File outputPath = new File(inputFile.getParentFile().getAbsolutePath());
final InputStream is = new FileInputStream(inputFile);
CabParser cabParser = new CabParser(is, inputPath);
Arrays.stream(cabParser.files).forEach(element -> {
CabFileEntry cabFile = element;
});
Thanks a lot!

This seems to work with simple.cab:
final InputStream is = new FileInputStream("/tmp/simple.cab");
CabParser cabParser = new CabParser(is, new File("/tmp/simple/"));
cabParser.extractStream();
However, it fails with VC_RED.cab.
CabParser README.md:
extract files from within a .cab which are compressed via the LZX compression algorithm
Maybe not all the possible compression algorithms are supported by this library.
One option is to run cabextract with Runtime.exec(...).

Related

How read XLS file from gcp bucket

I have on bucket XLS file and I have to pull the file and read the data on the stream and work with data. I worked with CSV file and this is my code:
try (ReadChannel reader = storage.reader(bucketName, fileName)) {
ByteBuffer bytes = ByteBuffer.allocate(BUFFER_SIZE);
while (reader.read(bytes) > 0) {
bytes.flip();
// outChannel.write(bytes);
set(new String(bytes.array(), "UTF-8"), fileName);
bytes.clear();
}
}
I think this might be what you are looking for. GCS Input Channel
GcsService gcsService = GcsServiceFactory.createGcsService();
GcsFilename fileName = new GcsFilename("TestBucket", "Test1.xlsx");
GcsInputChannel readChannel = gcsService.openPrefetchingReadChannel(fileName, 0, BUFFER_SIZE);
InputStream inputStream = Channels.newInputStream(readChannel);
I've also found an interesting workaround that might help you, in this answer you have a piece of code that converts all the excel files to CSV to let you manipulate them as normal if you wish to.
EDIT: The error was there because it was not initialized. Have a look at my code edit.
In your case as far as I understood you might need to use "OutputChannel" not "InputChannel" but it's the same concept.
Hope this helps.

Iterate over repeated specific HL7 segment in Pentaho Data Integration KETTLE

I want to get a specific segment hl7, by name of segment for example,
I am using Pipeparser class but i still don't know how to get each segment by structure name (MSH,PID,OBX,...).
Sometimes I have a repeated segments like DG1 or PV1 or OBX(see attached lines) how can I get data fileds from each row segment in Pentaho kettle (do i need java code in kettle, if there is such solution, help please).
OBX|1|TX|PTH_SITE1^Site A|1|left||||||F|||||||
OBX|2|TX|PTH_SPEC1^Specimen A||C-FNA^Fine Needle Aspiration||||||F|||||||
or
DG1|1|I10C|G30.0|Alzheimer's disease with early onset|20160406|W|||||||||
DG1|2|I10C|E87.70|Fluid overload, unspecified|20160406|W|||||||||
You should use HL7 Parser for proper/accurate parsing of any HL7 formatted message.
With the help of HAPI you can either parse your message as Stream
// Open an InputStream to read from the file
File file = new File("hl7_messages.txt");
InputStream is = new FileInputStream(file);
// It's generally a good idea to buffer file IO
is = new BufferedInputStream(is);
// The following class is a HAPI utility that will iterate over
// the messages which appear over an InputStream
Hl7InputStreamMessageIterator iter = new Hl7InputStreamMessageIterator(is);
while (iter.hasNext()) {
Message next = iter.next();
// Do something with the message
}
Or you can read it as String
File file = new File("hl7_messages.txt");
is = new FileInputStream(file);
is = new BufferedInputStream(is);
Hl7InputStreamMessageStringIterator iter2 = new Hl7InputStreamMessageStringIterator(is);
while (iter2.hasNext()) {
String next = iter2.next();
// Do something with the message
}
Hope this helps you in right direction.

Java InputStreamReader Error (org.apache.poi.openxml4j.exceptions.InvalidOperationException)

I am trying to convert pptx files to txt (Text Extraction) using Apache POI Framework (Java).
I'm new in coding Java, so I don't know a lot about Buffered Readers/InputStream, etc.
What I tried is:
import org.apache.poi.xslf.XSLFSlideShow;
import org.apache.poi.xslf.extractor.XSLFPowerPointExtractor;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
... Classes and Stuff ....
String inputfile = "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx";
InputStream fis = new FileInputStream(inputfile);
BufferedReader br1 = new BufferedReader(new InputStreamReader(fis));
String fileName = br1.readLine();
System.out.println(new XSLFPowerPointExtractor(new XMLSlideShow(new XSLFSlideShow(fileName))).getText());
br1.close();
My goal is, to write the extracted text into a variable, but It doesn't even work to print it on console... What I get is:
org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: 'PK
org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:102)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:199)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
org.apache.poi.POIXMLDocument.openPackage(POIXMLDocument.java:69)
org.apache.poi.xslf.XSLFSlideShow.<init>(XSLFSlideShow.java:90)
Any help would be greatly appreciated!
You are doing much to much, in fact you are trying to read the data of the PPTX itself as filename, better simply use
System.out.println(new XSLFPowerPointExtractor(
new XMLSlideShow(new XSLFSlideShow(
"X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx"))).getText());
or more generic
POITextExtractor extractor = ExtractorFactory.createExtractor(
new java.io.File("X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx"");
System.out.println(extractor.getText());
extractor.close();
I cannot give you the correct answer (because I myself don't use POI), but I can tell you where your mistake might lie.
The constructor of the class XSLFSlideShow is expecting file path as its argument. But you are passing an InputStream. Try it as follows:
String filePath = "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx";
System.out.println(new XSLFPowerPointExtractor(new XMLSlideShow(new XSLFSlideShow(filePath))).getText());

Getting XML data from the byteArray of a zipFile

I'm writing a simple program that retrieves XML data from an object, and parses it dynamically, based on user criteria. I am having trouble getting the XML data from the object, due to the format it is available in.
The object containing the XML returns the data as a byteArray of a zipFile, like so.
MyObject data = getData();
byte[] byteArray = data.getPayload();
//The above returns the byteArray of a zipFile
The way I checked this, is by writing the byteArray to a String
String str = new String(byteArray);
//The above returns a string with strange characters in it.
Then I wrote the data to a file.
FileOutputStream fos = new FileOutputStream("new.txt");
fos.write(byteArray);
I renamed new.txt as new.zip. When I opened it using WinRAR, out popped the XML.
My problem is that, I don't know how to do this conversion in Java using streams, without writing the data to a zip file first, and then reading it. Writing data to disk will make the software way too slow.
Any ideas/code snippets/info you could give me would be really appreciated!! Thanks
Also, if you need a better explanation from me, I'd be happy to elaborate.
As another option, I am wondering whether an XMLReader would work with a ZipInputStream as InputSource.
ByteArrayInputStream bis = new ByteArrayInputStream(byteArray);
ZipInputStream zis = new ZipInputStream(bis);
InputSource inputSource = new InputSource(zis);
A zip archive can contain several files. You have to position the zip stream on the first entry before parsing the content:
ByteArrayInputStream bis = new ByteArrayInputStream(byteArray);
ZipInputStream zis = new ZipInputStream(bis);
ZipEntry entry = zis.getNextEntry();
InputSource inputSource = new InputSource(new BoundedInputStream(zis, entry.getCompressedSize()));
The BoundedInputStream class is taken from Apache Commons IO (http://commons.apache.org/io)

Writing in the beginning of a text file Java

I need to write something into a text file's beginning. I have a text file with content and i want write something before this content. Say i have;
Good afternoon sir,how are you today?
I'm fine,how are you?
Thanks for asking,I'm great
After modifying,I want it to be like this:
Page 1-Scene 59
25.05.2011
Good afternoon sir,how are you today?
I'm fine,how are you?
Thanks for asking,I'm great
Just made up the content :) How can i modify a text file like this way?
You can't really modify it that way - file systems don't generally let you insert data in arbitrary locations - but you can:
Create a new file
Write the prefix to it
Copy the data from the old file to the new file
Move the old file to a backup location
Move the new file to the old file's location
Optionally delete the old backup file
Just in case it will be useful for someone here is full source code of method to prepend lines to a file using Apache Commons IO library. The code does not read whole file into memory, so will work on files of any size.
public static void prependPrefix(File input, String prefix) throws IOException {
LineIterator li = FileUtils.lineIterator(input);
File tempFile = File.createTempFile("prependPrefix", ".tmp");
BufferedWriter w = new BufferedWriter(new FileWriter(tempFile));
try {
w.write(prefix);
while (li.hasNext()) {
w.write(li.next());
w.write("\n");
}
} finally {
IOUtils.closeQuietly(w);
LineIterator.closeQuietly(li);
}
FileUtils.deleteQuietly(input);
FileUtils.moveFile(tempFile, input);
}
I think what you want is random access. Check out the related java tutorial. However, I don't believe you can just insert data at an arbitrary point in the file; If I recall correctly, you'd only overwrite the data. If you wanted to insert, you'd have to have your code
copy a block,
overwrite with your new stuff,
copy the next block,
overwrite with the previously copied block,
return to 3 until no more blocks
As #atk suggested, java.nio.channels.SeekableByteChannel is a good interface. But it is available from 1.7 only.
Update : If you have no issue using FileUtils then use
String fileString = FileUtils.readFileToString(file);
This isn't a direct answer to the question, but often files are accessed via InputStreams. If this is your use case, then you can chain input streams via SequenceInputStream to achieve the same result. E.g.
InputStream inputStream = new SequenceInputStream(new ByteArrayInputStream("my line\n".getBytes()), new FileInputStream(new File("myfile.txt")));
I will leave it here just in case anyone need
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
try (FileInputStream fileInputStream1 = new FileInputStream(fileName1);
FileInputStream fileInputStream2 = new FileInputStream(fileName2)) {
while (fileInputStream2.available() > 0) {
byteArrayOutputStream.write(fileInputStream2.read());
}
while (fileInputStream1.available() > 0) {
byteArrayOutputStream.write(fileInputStream1.read());
}
}
try (FileOutputStream fileOutputStream = new FileOutputStream(fileName1)) {
byteArrayOutputStream.writeTo(fileOutputStream);
}

Categories