Unpack whole Tar easily in Java

What is the easiest way to unpack a Tar (or BZip2+Tar) archive in Java?
Apache Commons Compress has classes for unpacking Tar, but you have to iterate through all archive entries and save each entry's InputStream to a file.
Is there a way to simply unpack all files from a Tar archive "in one line of code"?
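For reference, the iteration described above looks roughly like this (a minimal sketch, assuming Apache Commons Compress is on the classpath; the archive and destination paths are placeholders):
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UntarExample {
    public static void main(String[] args) throws IOException {
        Path archive = Paths.get("my.tar.bz2");   // placeholder archive
        Path destDir = Paths.get("out");          // placeholder destination directory
        try (InputStream fi = Files.newInputStream(archive);
             InputStream bi = new BufferedInputStream(fi);
             InputStream bzi = new BZip2CompressorInputStream(bi);   // drop this wrapper for a plain .tar
             TarArchiveInputStream tar = new TarArchiveInputStream(bzi)) {
            TarArchiveEntry entry;
            while ((entry = tar.getNextTarEntry()) != null) {
                Path target = destDir.resolve(entry.getName());
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(tar, target);   // copies the current entry's bytes to disk
                }
            }
        }
    }
}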

I think your best bet is to launch tar as a subprocess. These libraries work with individual filesystem entries, so there is no easy way of doing it in one line of code:
ProcessBuilder pb = new ProcessBuilder();
pb.directory(new File("/path/to/your/tar"));   // directory containing the archive; tar runs here
pb.command("tar", "-xzvf", "my.tar.gz");       // placeholder archive name; -z assumes a gzipped tar
pb.start();
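In practice you would also wait for the process and check its exit code; for example, replacing the plain pb.start() with something like this (a sketch):
// Forward tar's output to this process, wait for it to finish, and fail on a non-zero status.
Process p = pb.inheritIO().start();
try {
    int exit = p.waitFor();
    if (exit != 0) {
        throw new IOException("tar exited with status " + exit);
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}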

Plexus Archiver can do it.
But it also requires a dependency on plexus-container:
org.codehaus.plexus:plexus-archiver:3.4
org.codehaus.plexus:plexus-container-default:1.7.1
Example:
import org.codehaus.plexus.archiver.tar.TarUnArchiver;
import org.codehaus.plexus.logging.Logger;
import org.codehaus.plexus.logging.console.ConsoleLogger;
import org.junit.Test;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.arrayContaining;
public class UnpackTarTest {

    @Test
    public void wholeTarAtOnce() throws IOException {
        File srcFile = new File(getClass().getResource("my.tar").getFile());
        File destDir = Files.createTempDirectory("UnpackTarTest_").toFile();
        destDir.deleteOnExit();

        final TarUnArchiver ua = new TarUnArchiver();
        ua.setSourceFile(srcFile);
        ua.enableLogging(new ConsoleLogger(Logger.LEVEL_DEBUG, "console_logger"));
        ua.setDestDirectory(destDir);
        ua.extract();

        assertThat(destDir.list(), arrayContaining("mytar"));
    }
}

Related

Is there a way to read programmatically a .jmod file in Java?

I opened a .jmod file with 7-zip and I can see the contents. I tried to read it with ZipInputStream programmatically but it doesn't work: does someone know how to do it?
There is no documentation in JEP 261: Module System regarding the format used by JMOD files. That's not an oversight, as far as I can tell, because leaving the format as an implementation detail means they can change the format, without notice, whenever they want. That being said, currently JMOD files appear to be packaged in the ZIP format; this other answer quotes the following from JEP 261:
The final format of JMOD files is an open issue, but for now it is based on ZIP files.
However, I can't find that quote anywhere in JEP 261. It looks to be from an older version of the specification—at least, I found similar wording in the history of JDK-8061972 (the issue associated with the JEP).
What this means is you should—for the time being—be able to read a JMOD file by using any of the APIs which allow reading ZIP files. For instance, you could use one of the following:
The java.util.zip API:
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;
public class Main {
    public static void main(String[] args) throws IOException {
        var jmodFile = new File(args[0]).getAbsoluteFile();
        System.out.println("Listing entries in JMOD file: " + jmodFile);
        try (var zipFile = new ZipFile(jmodFile)) {
            for (var entries = zipFile.entries(); entries.hasMoreElements(); ) {
                System.out.println(entries.nextElement());
            }
        }
    }
}
Note: To read the contents of an entry, see ZipFile#getInputStream(ZipEntry).
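For example, a small sketch of reading one entry's bytes, reusing jmodFile from the example above (the entry name here is just an assumption about the JMOD layout):
try (var zipFile = new ZipFile(jmodFile)) {
    var entry = zipFile.getEntry("classes/module-info.class");  // hypothetical entry name
    if (entry != null) {
        try (var in = zipFile.getInputStream(entry)) {
            byte[] bytes = in.readAllBytes();   // the entry's contents
            System.out.println(entry + ": " + bytes.length + " bytes");
        }
    }
}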
The ZIP FileSystemProvider API:
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
public class Main {
    public static void main(String[] args) throws IOException {
        var jmodFile = Path.of(args[0]).toAbsolutePath().normalize();
        System.out.println("Listing entries in JMOD file: " + jmodFile);
        try (var fileSystem = FileSystems.newFileSystem(jmodFile)) {
            Files.walk(fileSystem.getRootDirectories().iterator().next())
                    .forEachOrdered(System.out::println);
        }
    }
}
Note: To read the contents of an entry, use one of the many methods provided by the java.nio.file.Files class.
Note: The Path#of(String,String...) method was added in Java 11 and the FileSystems#newFileSystem(Path) method was added in Java 13. Replace those method calls if using an older version of Java.
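For instance, a sketch of the equivalent calls for older releases (Java 7 and up), substituting Paths.get and the (Path, ClassLoader) overload of newFileSystem:
// Same listing as above, using APIs available since Java 7.
// Additional imports needed: java.nio.file.FileSystem and java.nio.file.Paths.
Path jmodFile = Paths.get(args[0]).toAbsolutePath().normalize();
try (FileSystem fileSystem = FileSystems.newFileSystem(jmodFile, (ClassLoader) null)) {
    Files.walk(fileSystem.getRootDirectories().iterator().next())
         .forEachOrdered(System.out::println);
}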
To reiterate, however: The format used by JMOD files is not documented and may change without notice.

How to specify path of a file in java/terminal on Hadoop?

I am running a task on Hadoop2:
$hadoop jar hipi.jar "/5" "/processWOH" 1
hipi.jar: the jar file name
"/5": the input folder name
"/processWOH": the output folder name
I am getting an exception regarding the path /localhost:9000/5/LC814000.tif:
Error: java.io.FileNotFoundException: /localhost:9000/5/LC814000.tif (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at ProcessWithoutHIPI.ProcessRecordReaderWOH.getCurrentKey(ProcessRecordReaderWOH.java:81)
at ProcessWithoutHIPI.ProcessRecordReaderWOH.getCurrentKey(ProcessRecordReaderWOH.java:1)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getCurrentKey(MapTask.java:507)
at org.apache.hadoop.mapreduce.task.MapContextImpl.getCurrentKey(MapContextImpl.java:70)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCurrentKey(WrappedMapper.java:81)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I think (I am not sure) the problem is with the extra "/localhost:9000" added to the path, but I don't know how it gets added (by Hadoop, the Java code, ...).
Note: this jar file runs fine outside of Hadoop, but on Hadoop (HDFS) it does not.
Any help is appreciated.
Update:
As I discovered later, the "/5" folder is searched for on the local file system, not in HDFS. If I create a folder named "localhost:9000" under root on the local file system (i.e. /localhost:9000) and put "/5" inside it, the code runs, but in that case the data is taken from outside Hadoop, as if I were not using Hadoop at all.
So is this a programming mistake, i.e. should I use the Hadoop IO packages instead of the Java IO packages to work with HDFS rather than the local file system, or is it another problem?
The default directory of your HDFS is /localhost:9000/; Hadoop cannot find your input file there, so just put it in /localhost:9000/:
$hadoop fs -put $LOCAL_PATH_OF_INPUT_FILE:/5 /localhost:9000/
$hadoop jar hipi.jar "/5" "/processWOH" 1
Good luck!
The problem, as I said earlier, is that Java IO (i.e. the File class, Path class, ...) treats paths as local file system paths, whereas Hadoop IO (the FileSystem class, Path class, ...) treats paths as HDFS paths.
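Concretely, a record reader like the one in the stack trace would open the split's file through the Hadoop FileSystem API instead of FileInputStream; a minimal sketch (the path is taken from the question, everything else is illustrative):
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsOpenExample {
    public static void main(String[] args) throws IOException {
        // In a real RecordReader the Configuration would come from context.getConfiguration().
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);           // resolves to HDFS when fs.defaultFS points at it
        Path imagePath = new Path("/5/LC814000.tif");   // an HDFS path, not a local java.io path
        try (InputStream in = fs.open(imagePath)) {
            byte[] buffer = new byte[4096];
            System.out.println("read " + in.read(buffer) + " bytes from HDFS");
        }
    }
}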
Please have a look here:
read/write from/in HDFS
Using FileSystem API to read and write data to HDFS
Reading data from and writing data to Hadoop Distributed File System (HDFS) can be done in a lot of ways. Now let us start by using the FileSystem API to create and write to a file in HDFS, followed by an application to read a file from HDFS and write it back to the local file system.
Step 1: Once you have downloaded a test dataset, we can write an application to read a file from the local file system and write the contents to Hadoop Distributed File System.
package com.hadoop.hdfs.writer;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ToolRunner;
public class HdfsWriter extends Configured implements Tool {

    public static final String FS_PARAM_NAME = "fs.defaultFS";

    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("HdfsWriter [local input path] [hdfs output path]");
            return 1;
        }
        String localInputPath = args[0];
        Path outputPath = new Path(args[1]);
        Configuration conf = getConf();
        System.out.println("configured filesystem = " + conf.get(FS_PARAM_NAME));
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(outputPath)) {
            System.err.println("output path exists");
            return 1;
        }
        OutputStream os = fs.create(outputPath);
        InputStream is = new BufferedInputStream(new FileInputStream(localInputPath));
        IOUtils.copyBytes(is, os, conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int returnCode = ToolRunner.run(new HdfsWriter(), args);
        System.exit(returnCode);
    }
}
Step 2: Export the Jar file and run the code from terminal to write a sample file to HDFS:
[training@localhost ~]$ hadoop jar HdfsWriter.jar com.hadoop.hdfs.writer.HdfsWriter sample.txt /user/training/HdfsWriter_sample.txt
Step 3: Verify whether the file is written into HDFS and check the contents of the file:
[training@localhost ~]$ hadoop fs -cat /user/training/HdfsWriter_sample.txt
Step 4: Next, we write an application to read the file we just created in Hadoop Distributed File System and write its contents back to the local file system:
package com.hadoop.hdfs.reader;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class HdfsReader extends Configured implements Tool {

    public static final String FS_PARAM_NAME = "fs.defaultFS";

    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("HdfsReader [hdfs input path] [local output path]");
            return 1;
        }
        Path inputPath = new Path(args[0]);
        String localOutputPath = args[1];
        Configuration conf = getConf();
        System.out.println("configured filesystem = " + conf.get(FS_PARAM_NAME));
        FileSystem fs = FileSystem.get(conf);
        InputStream is = fs.open(inputPath);
        OutputStream os = new BufferedOutputStream(new FileOutputStream(localOutputPath));
        IOUtils.copyBytes(is, os, conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int returnCode = ToolRunner.run(new HdfsReader(), args);
        System.exit(returnCode);
    }
}
Step 5: Export the Jar file and run the code from the terminal to read the sample file back from HDFS and write it to the local file system:
[training@localhost ~]$ hadoop jar HdfsReader.jar com.hadoop.hdfs.reader.HdfsReader /user/training/HdfsWriter_sample.txt /home/training/HdfsReader_sample.txt
Step 6: Verify whether the file is written back into the local file system:
[training@localhost ~]$ cat /home/training/HdfsReader_sample.txt
FileSystem is an abstract class that represents a generic file system. Most Hadoop file system implementations can be accessed and updated through the FileSystem object. To create an instance of the HDFS, you call the method FileSystem.get(). The FileSystem.get() method will look at the URI assigned to the fs.defaultFS parameter of the Hadoop configuration files on your classpath and choose the correct implementation of the FileSystem class to instantiate. The fs.defaultFS parameter of HDFS has the value hdfs://.
Once an instance of the FileSystem class has been created, the HdfsWriter class calls the create() method to create a file in HDFS. The create() method returns an OutputStream object, which can be manipulated using normal Java I/O methods. Similarly HdfsReader calls the method open() to open a file in HDFS, which returns an InputStream object that can be used to read the contents of the file.
The FileSystem API is extensive. To demonstrate some of the other methods available in the API, we can add some error checking to the HdfsWriter and HdfsReader classes we created.
To check whether the file exists before we call create(), use:
boolean exists = fs.exists(inputPath);
To check whether the path is a file, use:
boolean isFile = fs.isFile(inputPath);
To rename a file that already exists, use:
boolean renamed = fs.rename(inputPath, new Path("old_file.txt"));

Why can my library not access its resources?

I have a class that reads a file:
package classlibrary;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URL;
public class ReadingResource {

    public static String readResource() throws IOException {
        URL resource = ClassLoader.getSystemClassLoader().getResource("classlibrary/test_file.txt");
        BufferedReader br = new BufferedReader(new FileReader(resource.getPath()));
        return br.readLine();
    }
}
The resource file is in the same directory where this class is.
I made a library out of this class and the file.
Now I want to use it in the other class:
package uritesting;
import classlibrary.ReadingResource;
import java.io.IOException;
public class URITesting {

    public static void main(String[] args) throws IOException {
        System.out.println(ReadingResource.readResource());
    }
}
When I make a .jar file out of this class, set the class as the main class, add the .jar from above, and execute it as "java -jar URITesting.jar", I get a FileNotFoundException saying the class ReadingResource cannot find the specified file. It is funny because the path specified in the exception message is actually the correct path to the file.
You can find the files here.
EDIT:
I developed the project in NetBeans. When I run it there, it works fine. The classpath is different in that case. It contains both resources of the URITestingProject and ReadingResource.
However, when I run it as a standalone JAR, the classpath contains URITestingProject only. What is strange to me is that it doesn't complain about not finding the class ReadingResource. It means that it is loaded, although it is not in the classpath :/
The problem is resource.getPath(). It's not possible to compute a path, valid for a FileReader, to something inside a jar file, on another server, and so on. However, you can get the data through a stream instead:
InputStream data = ClassLoader.getSystemClassLoader().getResourceAsStream("classlibrary/test_file.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(data, "utf-8"));
As a side note: when reading with a Reader it's a good idea to specify the encoding, as done with "utf-8" above.
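Put together, a version of readResource() along those lines might look like this (a sketch using try-with-resources and StandardCharsets; behaviour otherwise as in the snippet above):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadingResource {

    public static String readResource() throws IOException {
        InputStream data = ClassLoader.getSystemClassLoader()
                .getResourceAsStream("classlibrary/test_file.txt");
        if (data == null) {
            throw new IOException("resource classlibrary/test_file.txt not found");
        }
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(data, StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }
}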

Convert OWL/XML in RDF/XML with a simple command line in shell

I'm asking for your help to create a converter to transform OWL/XML into RDF/XML. My purpose is to use OWLapi 2 through a simple shell command with bash.
My files are in OWL/XML, but I have to transform them into RDF/XML to load them into my Fuseki database. I could transform each file with Protégé or a converter available online, but I have more than one thousand files to convert.
See my current Java file (but I don't know how to use it):
package owl2rdf;
import java.io.File;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.RDFXMLOntologyFormat;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
@SuppressWarnings("deprecation")
public class owl2rdf {

    public static void main(String[] args) throws Exception {
        // Get hold of an ontology manager
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();

        // Load the ontology from a local file
        File file = new File(args[0]);
        System.out.println("Loaded ontology: " + file);
        OWLOntology ontology = manager.loadOntologyFromOntologyDocument(file);

        // Get the ontology format; in our case it's normally OWL/XML
        System.out.println(" format: " + manager.getOntologyFormat(ontology));

        // Save the file into RDF/XML format
        RDFXMLOntologyFormat rdfxmlFormat = new RDFXMLOntologyFormat();
        manager.saveOntology(ontology, rdfxmlFormat, IRI.create(file));
    }
}
When I execute this code, I get many errors related to exceptions I don't understand at all, but I saw it's a common error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
Caused by: java.lang.NoClassDefFoundError: com/google/inject/Provider
Caused by: java.lang.ClassNotFoundException: com.google.inject.Provider
To convert an entire directory of OWL/XML files into RDF/XML files:
1- Create your class owl2rdf.java (package owl2rdf):
package owl2rdf;
//import all necessary classes
import java.io.File;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.RDFXMLOntologyFormat;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
@SuppressWarnings("deprecation")
public class owl2rdf {

    // create a main() function to take an argument; here in the example one argument only
    public static void main(String[] args) throws Exception {
        // Get hold of an ontology manager
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();

        // Load the ontology from a local file
        File file = new File(args[0]);
        System.out.println("Loaded ontology: " + file);
        OWLOntology ontology = manager.loadOntologyFromOntologyDocument(file);

        // Save the file into RDF/XML format
        // in this case, my ontology file and format already manage prefixes
        RDFXMLOntologyFormat rdfxmlFormat = new RDFXMLOntologyFormat();
        manager.saveOntology(ontology, rdfxmlFormat, IRI.create(file));
    }
}
2- With a Java IDE such as Eclipse or something else, manage all the dependencies (Maven repo, downloaded jars, classpath, etc.).
3- Create your bash script my-script.sh; the one here is absolutely not optimized:
#!/bin/bash
cd your-dir
for i in *
do
    # get the absolute path; be careful, realpath comes with the latest coreutils package
    r=$(realpath "$i")
    # to avoid trouble with relative paths and java -jar, the .jar is kept in the parent directory
    cd ..
    java -jar owl2rdf.jar "$r"
    cd your-dir
done
echo "Conversion finished, see below if there are errors."
4- Execute your script:
$ chmod +x my-script.sh; ./my-script.sh
5- Aha moment: all your OWL/XML files are converted into RDF/XML. You can, for example, import them into a Fuseki or Sesame database.

How to build jar artifact so at run-time it will be able to read packed files?

At run-time the application successfully reads files at the project's root. Everything works fine when executing the project in IntelliJ, but when the jar artifact built by IntelliJ is executed in a Windows environment, it has trouble locating/reading the files although they reside in the root of the jar file. How can I fix it?
UPDATE
I am using the Jersey framework. I read the file from the root path like this:
package example;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
#Path("/monitor")
public class MonitoringPage {
#GET
#Produces("text/html")
public String getMonitoringPage() throws IOException {
String page, line;
page = "";
BufferedReader br = new BufferedReader(new FileReader("MonitoringPage.htm"));
while((line = br.readLine()) != null){
page += line + "\r\n";
}
br.close();
return page;
}
}
My jar has MonitoringPage.htm in its root, but it cannot find it for some strange reason.
I am running the jar with a bat script:
java -jar "Rs.jar"
JAVA_HOME=C:\Program Files\Java\jdk1.7.0_51
Path=.......C:\Program Files\Java\jdk1.7.0_51\bin
Don't read it as a file from the file system (which is what happens when you use File, FileReader or many of the FileXxx variants). Once you package the jar, the file will no longer be in the file system location you are expecting.
Instead, read it as a resource via a URL. You can use:
MonitoringPage.class.getResourceAsStream("/MonitoringPage.htm") which will return an InputStream.
From that InputStream you can just do something like
InputStream is = MonitoringPage.class.getResourceAsStream("/MonitoringPage.htm");
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
Note: this all assumes you have the file at the root of the classpath (which it looks like from your image). The / at the front of the path brings the search to the root of the classpath, so just use the path to the file relative to that root.
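Applied to the Jersey resource from the question, the method might look roughly like this (a sketch; it assumes MonitoringPage.htm sits at the root of the jar/classpath, and needs imports for InputStream, InputStreamReader and StandardCharsets):
@GET
@Produces("text/html")
public String getMonitoringPage() throws IOException {
    StringBuilder page = new StringBuilder();
    // Load the page from the classpath root instead of the process's working directory.
    try (InputStream is = MonitoringPage.class.getResourceAsStream("/MonitoringPage.htm");
         BufferedReader br = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
        String line;
        while ((line = br.readLine()) != null) {
            page.append(line).append("\r\n");
        }
    }
    return page.toString();
}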
