Detect file type based on content

Detect file type based on content - java

Tried the following:
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.spi.FileTypeDetector;
import org.apache.tika.Tika;
import org.apache.tika.mime.MimeTypes;
/**
*
* @author kiriti.k
*/
public class TikaFileTypeDetector {
private final Tika tika = new Tika();
public TikaFileTypeDetector() {
super();
}
public String probeContentType(Path path) throws IOException {
// Try to detect based on the file name only for efficiency
String fileNameDetect = tika.detect(path.toString());
if (!fileNameDetect.equals(MimeTypes.OCTET_STREAM)) {
return fileNameDetect;
}
// Then check the file content if necessary
String fileContentDetect = tika.detect(path.toFile());
if (!fileContentDetect.equals(MimeTypes.OCTET_STREAM)) {
return fileContentDetect;
}
// Specification says to return null if we could not
// conclusively determine the file type
return null;
}
public static void main(String[] args) throws IOException {
Tika tika = new Tika();
// expects file path as the program argument
if (args.length != 1) {
printUsage();
return;
}
Path path = Paths.get(args[0]);
TikaFileTypeDetector detector = new TikaFileTypeDetector();
// Analyse the file - first based on file name for efficiency.
// If cannot determine based on name and then analyse content
String contentType = detector.probeContentType(path);
System.out.println("File is of type - " + contentType);
}
public static void printUsage() {
System.out.print("Usage: java -classpath ... "
+ TikaFileTypeDetector.class.getName()
+ " ");
}
}
The above program only checks the file extension. How do I make it also check the content (MIME type) and then determine the type? I am using tika-app-1.8.jar in NetBeans 8.0.2. What am I missing?

The code checks the file extension first and returns the MIME type based on that, if it finds a result. If you want it to check the content first, just switch the two statements:
public String probeContentType(Path path) throws IOException {
// Check contents first
String fileContentDetect = tika.detect(path.toFile());
if (!fileContentDetect.equals(MimeTypes.OCTET_STREAM)) {
return fileContentDetect;
}
// Try file name only if content search was not successful
String fileNameDetect = tika.detect(path.toString());
if (!fileNameDetect.equals(MimeTypes.OCTET_STREAM)) {
return fileNameDetect;
}
// Specification says to return null if we could not
// conclusively determine the file type
return null;
}
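Be aware that this may have a huge performance impact, because content-based detection has to open and read every file, whereas name-based detection only inspects a string.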
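To make that cost concrete, here is a minimal sketch of what content-based detection involves: opening the file and comparing its leading bytes against known signatures. This is not how Tika is implemented; the class name MagicSniffer and its tiny signature table are assumptions made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: recognises a few well-known magic numbers.
class MagicSniffer {

    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    /** Returns a MIME type for the given leading bytes, or null if no signature matches. */
    static String sniff(byte[] head) {
        if (startsWith(head, PDF)) return "application/pdf";
        if (startsWith(head, PNG)) return "image/png";
        if (startsWith(head, ZIP)) return "application/zip";
        return null;
    }

    /** Reads at most the first 8 bytes of the file; the file name is ignored. */
    static String sniff(Path path) throws IOException {
        byte[] head = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            int n;
            while (total < head.length
                    && (n = in.read(head, total, head.length - total)) != -1) {
                total += n;
            }
        }
        return sniff(Arrays.copyOf(head, total));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    /** Writes a PDF header into a file with a misleading .txt extension and sniffs it. */
    static String demo() {
        try {
            Path file = Files.createTempFile("sniff", ".txt");
            try {
                Files.write(file, "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII));
                return sniff(file);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints application/pdf
    }
}
```

A name-based check would report the demo file as text because of its .txt extension; the content check reports a PDF, at the price of one file read per lookup.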

You can use Files.probeContentType(path)
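A minimal sketch of that call follows. Note that the result depends on the file type detectors installed in the running JVM and on the platform default, which may look only at the file name, so it is not guaranteed to inspect the content. The class name ProbeExample and the "unknown" fallback are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical wrapper around Files.probeContentType.
class ProbeExample {

    /** Returns the probed MIME type, or "unknown" when no detector recognises the file. */
    static String describe(Path path) {
        try {
            String type = Files.probeContentType(path);
            return type != null ? type : "unknown";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Probes a freshly created temporary .txt file. */
    static String demo() {
        try {
            Path file = Files.createTempFile("probe", ".txt");
            try {
                return describe(file);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The printed value is platform dependent: often text/plain, sometimes unknown.
        System.out.println(demo());
    }
}
```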


Java Gherkin parser stream does not release file locks

I am using the Gherkin parser to parse feature files and return the list of Gherkin documents; see the function below:
import io.cucumber.gherkin.Gherkin;
import io.cucumber.messages.IdGenerator;
import io.cucumber.messages.Messages;
import io.cucumber.messages.Messages.Envelope;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class GherkinUtils {
private static final Logger LOG = LogManager.getLogger(GherkinUtils.class);
public static ArrayList<Messages.GherkinDocument> getGherkinDocumentsFromFiles() {
IdGenerator idGenerator = new IdGenerator.Incrementing();
ArrayList<Messages.GherkinDocument> listOfGherkinDocuments = new ArrayList<>();
String pathFolderFrameworkFeatures = SettingsUtils.getPathFolderFrameworkFeatures();
List<String> listOfPathsForFeatureFiles = FileUtils.getAllFilePathsFromFolder(pathFolderFrameworkFeatures);
try (Stream<Envelope> dataStream = Gherkin.fromPaths(listOfPathsForFeatureFiles, false, true, false, idGenerator)){
List<Envelope> envelopes = dataStream.collect(Collectors.toList());
for (Envelope env : envelopes) {
Messages.GherkinDocument gherkinDocument = env.getGherkinDocument();
listOfGherkinDocuments.add(gherkinDocument);
}
} catch (Exception e) {
LOG.error("Error occurred while trying to read the feature files", new Exception(e));
}
FileUtils.renameAllFeatureFiles("b");
return listOfGherkinDocuments;
}
}
Just before the return statement, you can see the call that renames all feature files, just to check whether they are locked.
The problem is that only the first file is ever renamed; the rest are always locked.
If I place the rename call at the top of the function, all the files are renamed successfully.
My understanding is that the try-with-resources statement closes the stream automatically. I also tried closing it manually inside the try block, but the result is the same.
What am I missing? How can I make it release the file locks?
Update 1:
This exact line is what causes the files (except the first one) to be locked:
List<Envelope> envelopes = dataStream.collect(Collectors.toList());
Here is the file name update function definition in case you want to test it:
public static void renameAllFeatureFiles(String fileName) {
String pathFeaturesFolder = SettingsUtils.getPathFolderFrameworkFeatures();
List<String> pathList = FileUtils.getAllFilePathsFromFolder(pathFeaturesFolder);
int counter = 0;
for (String path : pathList) {
counter ++;
File file = new File(path);
File newFile = new File(pathFeaturesFolder + "\\" + fileName +counter+".feature");
System.out.println("File: " + path + " locked: " + !file.renameTo(newFile));
}
}
And here is a sample feature file content:
Feature: Test
Scenario: test 1
Given User will do something
And User will do something
Update 2:
I tried running it in a separate thread using a JavaFX Task, but the issue is still the same :(
Except for one file (this is really strange), all files are locked:
public static void runInNewThread() {
// define the execution task that will run in a new thread
Task<Void> newTask = new Task<>() {
@Override
protected Void call() {
ArrayList<Messages.GherkinDocument> listOfGherkinDocuments = GherkinUtils.getGherkinDocumentsFromFiles();
return null;
}
};
// run the task in a new thread
Thread th = new Thread(newTask);
th.setDaemon(true);
th.start();
}

For now, I am using a workaround: I create copies of the feature files and run the parser on the copies, so that the originals are never locked.
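The copy workaround could look roughly like the sketch below. It uses only java.nio.file; the class name FeatureFileCopies and the temporary-directory layout are assumptions, and the step where the Gherkin parser reads the copies is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating the copy workaround; not part of the Gherkin API.
class FeatureFileCopies {

    /** Copies each file into a fresh temporary directory and returns the paths of the copies. */
    static List<Path> copyToTempDir(List<Path> originals) throws IOException {
        Path tempDir = Files.createTempDirectory("features-copy");
        List<Path> copies = new ArrayList<>();
        for (Path original : originals) {
            // Files with the same name from different folders would overwrite each other here.
            Path copy = tempDir.resolve(original.getFileName().toString());
            Files.copy(original, copy, StandardCopyOption.REPLACE_EXISTING);
            copies.add(copy);
        }
        return copies;
    }

    /** Copies a sample file, renames the original, and checks the copy kept the content. */
    static boolean demo() {
        try {
            Path original = Files.createTempFile("sample", ".feature");
            Files.write(original, "Feature: Test\n".getBytes(StandardCharsets.UTF_8));
            Path copy = copyToTempDir(Collections.singletonList(original)).get(0);
            // A parser would read the copy here, leaving the original untouched.
            Path renamed = original.resolveSibling("renamed-" + original.getFileName());
            Files.move(original, renamed);
            boolean sameContent =
                    Arrays.equals(Files.readAllBytes(renamed), Files.readAllBytes(copy));
            Files.delete(renamed);
            Files.delete(copy);
            Files.delete(copy.getParent());
            return sameContent;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("original renamed, copy intact: " + demo());
    }
}
```

The paths returned by copyToTempDir would be converted to strings and handed to the parser in place of the original paths.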

How to invoke model from TensorFlow Java?

The following Python code passes ["hello", "world"] into the Universal Sentence Encoder and returns an array of floats denoting their encoded representation.
import tensorflow as tf
import tensorflow_hub as hub
module = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4")
model = tf.keras.Sequential(module)
print("model: ", model(["hello", "world"]))
This code works but I'd now like to do the same thing using the Java API. I've successfully loaded the module, but I am unable to pass inputs into the model and extract the output. Here is what I've got so far:
import org.tensorflow.Graph;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.Tensors;
import org.tensorflow.framework.ConfigProto;
import org.tensorflow.framework.GPUOptions;
import org.tensorflow.framework.GraphDef;
import org.tensorflow.framework.MetaGraphDef;
import org.tensorflow.framework.NodeDef;
import org.tensorflow.util.SaverDef;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
public final class NaiveBayesClassifier
{
public static void main(String[] args)
{
new NaiveBayesClassifier().run();
}
protected SavedModelBundle loadModule(Path source, String... tags) throws IOException
{
return SavedModelBundle.load(source.toAbsolutePath().normalize().toString(), tags);
}
public void run()
{
try (SavedModelBundle module = loadModule(Paths.get("universal-sentence-encoder"), "serve"))
{
Graph graph = module.graph();
try (Session session = new Session(graph, ConfigProto.newBuilder().
setGpuOptions(GPUOptions.newBuilder().setAllowGrowth(true)).
setAllowSoftPlacement(true).
build().toByteArray()))
{
Tensor<String> input = Tensors.create(new byte[][]
{
"hello".getBytes(StandardCharsets.UTF_8),
"world".getBytes(StandardCharsets.UTF_8)
});
List<Tensor<?>> result = session.runner().feed("serving_default_inputs", input).
addTarget("???").run();
}
}
catch (IOException e)
{
e.printStackTrace();
}
}
}
I used https://stackoverflow.com/a/51952478/14731 to scan the model for possible input/output nodes. I believe the input node is "serving_default_inputs" but I can't figure out the output node. More importantly, I don't have to specify any of these values when invoking the model in Python through Keras, so is there a way to do the same using the Java API?
UPDATE: Thanks to roywei, I can now confirm that the input node is serving_default_input and the output node is StatefulPartitionedCall_1, but when I plug these names into the aforementioned code I get:
2020-05-22 22:13:52.266287: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:809 : Failed precondition: Table not initialized.
Exception in thread "main" java.lang.IllegalStateException: [_Derived_]{{function_node __inference_pruned_6741}} {{function_node __inference_pruned_6741}} Error while reading resource variable EncoderDNN/DNN/ResidualHidden_0/dense/kernel/part_25 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/EncoderDNN/DNN/ResidualHidden_0/dense/kernel/part_25/class tensorflow::Var does not exist.
[[{{node EncoderDNN/DNN/ResidualHidden_0/dense/kernel/ConcatPartitions/concat/ReadVariableOp_25}}]]
[[StatefulPartitionedCall_1/StatefulPartitionedCall]]
at libtensorflow@1.15.0/org.tensorflow.Session.run(Native Method)
at libtensorflow@1.15.0/org.tensorflow.Session.access$100(Session.java:48)
at libtensorflow@1.15.0/org.tensorflow.Session$Runner.runHelper(Session.java:326)
at libtensorflow@1.15.0/org.tensorflow.Session$Runner.run(Session.java:276)
Meaning, I still cannot invoke the model. What am I missing?

I figured it out after roywei pointed me in the right direction.
I needed to use SavedModelBundle.session() instead of constructing my own Session instance. This is because the loader initializes the graph variables.
Instead of passing a ConfigProto to the Session constructor, I passed it to the SavedModelBundle loader.
I needed to use fetch() instead of addTarget() to retrieve the output tensor.
Here is the working code:
public final class NaiveBayesClassifier
{
public static void main(String[] args)
{
new NaiveBayesClassifier().run();
}
public void run()
{
try (SavedModelBundle module = loadModule(Paths.get("universal-sentence-encoder"), "serve"))
{
try (Tensor<String> input = Tensors.create(new byte[][]
{
"hello".getBytes(StandardCharsets.UTF_8),
"world".getBytes(StandardCharsets.UTF_8)
}))
{
MetaGraphDef metadata = MetaGraphDef.parseFrom(module.metaGraphDef());
Map<String, Shape> nameToInput = getInputToShape(metadata);
String firstInput = nameToInput.keySet().iterator().next();
Map<String, Shape> nameToOutput = getOutputToShape(metadata);
String firstOutput = nameToOutput.keySet().iterator().next();
System.out.println("input: " + firstInput);
System.out.println("output: " + firstOutput);
System.out.println();
List<Tensor<?>> result = module.session().runner().feed(firstInput, input).
fetch(firstOutput).run();
for (Tensor<?> tensor : result)
{
{
// Size the array from the tensor's actual shape (assumes a rank-2 output: batch x embedding)
long[] shape = tensor.shape();
float[][] array = new float[(int) shape[0]][(int) shape[1]];
tensor.copyTo(array);
System.out.println(Arrays.deepToString(array));
}
}
}
}
catch (IOException e)
{
e.printStackTrace();
}
}
/**
* Loads a graph from a file.
*
* @param source the directory to load the model from
* @param tags the model variant(s) to load
* @return the loaded model bundle
* @throws NullPointerException if any of the arguments are null
* @throws IOException if an error occurs while reading the file
*/
protected SavedModelBundle loadModule(Path source, String... tags) throws IOException
{
// https://stackoverflow.com/a/43526228/14731
try
{
return SavedModelBundle.loader(source.toAbsolutePath().normalize().toString()).
withTags(tags).
withConfigProto(ConfigProto.newBuilder().
setGpuOptions(GPUOptions.newBuilder().setAllowGrowth(true)).
setAllowSoftPlacement(true).
build().toByteArray()).
load();
}
catch (TensorFlowException e)
{
throw new IOException(e);
}
}
/**
* @param metadata the graph metadata
* @return the first signature, or null
*/
private SignatureDef getFirstSignature(MetaGraphDef metadata)
{
Map<String, SignatureDef> nameToSignature = metadata.getSignatureDefMap();
if (nameToSignature.isEmpty())
return null;
return nameToSignature.get(nameToSignature.keySet().iterator().next());
}
/**
* @param metadata the graph metadata
* @return the "serving_default" signature, or the first signature if it is absent
*/
private SignatureDef getServingSignature(MetaGraphDef metadata)
{
return metadata.getSignatureDefOrDefault("serving_default", getFirstSignature(metadata));
}
/**
* @param metadata the graph metadata
* @return a map from an output name to its shape
*/
protected Map<String, Shape> getOutputToShape(MetaGraphDef metadata)
{
Map<String, Shape> result = new HashMap<>();
SignatureDef servingDefault = getServingSignature(metadata);
for (Map.Entry<String, TensorInfo> entry : servingDefault.getOutputsMap().entrySet())
{
TensorShapeProto shapeProto = entry.getValue().getTensorShape();
List<Dim> dimensions = shapeProto.getDimList();
long firstDimension = dimensions.get(0).getSize();
long[] remainingDimensions = dimensions.stream().skip(1).mapToLong(Dim::getSize).toArray();
Shape shape = Shape.make(firstDimension, remainingDimensions);
result.put(entry.getValue().getName(), shape);
}
return result;
}
/**
* @param metadata the graph metadata
* @return a map from an input name to its shape
*/
protected Map<String, Shape> getInputToShape(MetaGraphDef metadata)
{
Map<String, Shape> result = new HashMap<>();
SignatureDef servingDefault = getServingSignature(metadata);
for (Map.Entry<String, TensorInfo> entry : servingDefault.getInputsMap().entrySet())
{
TensorShapeProto shapeProto = entry.getValue().getTensorShape();
List<Dim> dimensions = shapeProto.getDimList();
long firstDimension = dimensions.get(0).getSize();
long[] remainingDimensions = dimensions.stream().skip(1).mapToLong(Dim::getSize).toArray();
Shape shape = Shape.make(firstDimension, remainingDimensions);
result.put(entry.getValue().getName(), shape);
}
return result;
}
}

There are two ways to get the names:
1) Using Java:
You can read the input and output names from the org.tensorflow.proto.framework.MetaGraphDef stored in saved model bundle.
Here is an example on how to extract the information:
https://github.com/awslabs/djl/blob/master/tensorflow/tensorflow-engine/src/main/java/ai/djl/tensorflow/engine/TfSymbolBlock.java#L149
2) Using python:
Load the saved model in TensorFlow for Python and print the names:
loaded = tf.saved_model.load("path/to/model/")
print(list(loaded.signatures.keys()))
infer = loaded.signatures["serving_default"]
print(infer.structured_outputs)
I recommend taking a look at Deep Java Library; it handles the input and output names automatically.
It supports TensorFlow 2.1.0 and allows you to load Keras models as well as TF Hub SavedModels. See the DJL documentation for details.
Feel free to open an issue if you have a problem loading your model.

You can load a TF model with Deep Java Library:
System.setProperty("ai.djl.repository.zoo.location", "https://storage.googleapis.com/tfhub-modules/google/universal-sentence-encoder/1.tar.gz?artifact_id=encoder");
// build() returns a Criteria, which is what loadModel() expects
Criteria<NDList, NDList> criteria =
Criteria.builder()
.setTypes(NDList.class, NDList.class)
.optArtifactId("ai.djl.localmodelzoo:encoder")
.build();
ZooModel<NDList, NDList> model = ModelZoo.loadModel(criteria);
See https://github.com/awslabs/djl/blob/master/docs/load_model.md#load-model-from-a-url for details.

I need to do the same, but there still seem to be lots of missing pieces regarding DJL usage. E.g., what do you do after this?
ZooModel<NDList, NDList> model = ModelZoo.loadModel(criteria);
I finally found an example in the DJL source code. The key take-away is to not use NDList for the input/output at all:
Criteria<String[], float[][]> criteria =
Criteria.builder()
.optApplication(Application.NLP.TEXT_EMBEDDING)
.setTypes(String[].class, float[][].class)
.optModelUrls(modelUrl)
.build();
try (ZooModel<String[], float[][]> model = ModelZoo.loadModel(criteria);
Predictor<String[], float[][]> predictor = model.newPredictor()) {
return predictor.predict(inputs.toArray(new String[0]));
}
See https://github.com/awslabs/djl/blob/master/examples/src/main/java/ai/djl/examples/inference/UniversalSentenceEncoder.java for the complete example.

Get Random File from Folder - Java [duplicate]

I am looking for a way to get a list of all resource names from a given classpath directory, something like a method List<String> getResourceNames (String directoryName).
For example, given a classpath directory x/y/z containing files a.html, b.html, c.html and a subdirectory d, getResourceNames("x/y/z") should return a List<String> containing the following strings: ['a.html', 'b.html', 'c.html', 'd'].
It should work both for resources in filesystem and jars.
I know that I can write a quick snippet with Files, JarFiles and URLs, but I do not want to reinvent the wheel. My question is, given existing publicly available libraries, what is the quickest way to implement getResourceNames? Spring and Apache Commons stacks are both feasible.

Custom Scanner
Implement your own scanner. For example:
(limitations of this solution are mentioned in the comments)
private List<String> getResourceFiles(String path) throws IOException {
List<String> filenames = new ArrayList<>();
try (
InputStream in = getResourceAsStream(path);
BufferedReader br = new BufferedReader(new InputStreamReader(in))) {
String resource;
while ((resource = br.readLine()) != null) {
filenames.add(resource);
}
}
return filenames;
}
private InputStream getResourceAsStream(String resource) {
final InputStream in
= getContextClassLoader().getResourceAsStream(resource);
return in == null ? getClass().getResourceAsStream(resource) : in;
}
private ClassLoader getContextClassLoader() {
return Thread.currentThread().getContextClassLoader();
}
Spring Framework
Use PathMatchingResourcePatternResolver from Spring Framework.
Ronmamo Reflections
The other techniques might be slow at runtime for huge CLASSPATH values. A faster solution is to use ronmamo's Reflections API, which precompiles the search at compile time.

Here is the code
Source: forums.devx.com/showthread.php?t=153784
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Enumeration;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
/**
* list resources available from the classpath
*/
public class ResourceList{
/**
* for all elements of java.class.path get a Collection of resources Pattern
* pattern = Pattern.compile(".*"); gets all resources
*
* @param pattern
* the pattern to match
* @return the resources in the order they are found
*/
public static Collection<String> getResources(
final Pattern pattern){
final ArrayList<String> retval = new ArrayList<String>();
final String classPath = System.getProperty("java.class.path", ".");
final String[] classPathElements = classPath.split(System.getProperty("path.separator"));
for(final String element : classPathElements){
retval.addAll(getResources(element, pattern));
}
return retval;
}
private static Collection<String> getResources(
final String element,
final Pattern pattern){
final ArrayList<String> retval = new ArrayList<String>();
final File file = new File(element);
if(file.isDirectory()){
retval.addAll(getResourcesFromDirectory(file, pattern));
} else{
retval.addAll(getResourcesFromJarFile(file, pattern));
}
return retval;
}
private static Collection<String> getResourcesFromJarFile(
final File file,
final Pattern pattern){
final ArrayList<String> retval = new ArrayList<String>();
ZipFile zf;
try{
zf = new ZipFile(file);
} catch(final ZipException e){
throw new Error(e);
} catch(final IOException e){
throw new Error(e);
}
final Enumeration e = zf.entries();
while(e.hasMoreElements()){
final ZipEntry ze = (ZipEntry) e.nextElement();
final String fileName = ze.getName();
final boolean accept = pattern.matcher(fileName).matches();
if(accept){
retval.add(fileName);
}
}
try{
zf.close();
} catch(final IOException e1){
throw new Error(e1);
}
return retval;
}
private static Collection<String> getResourcesFromDirectory(
final File directory,
final Pattern pattern){
final ArrayList<String> retval = new ArrayList<String>();
final File[] fileList = directory.listFiles();
for(final File file : fileList){
if(file.isDirectory()){
retval.addAll(getResourcesFromDirectory(file, pattern));
} else{
try{
final String fileName = file.getCanonicalPath();
final boolean accept = pattern.matcher(fileName).matches();
if(accept){
retval.add(fileName);
}
} catch(final IOException e){
throw new Error(e);
}
}
}
return retval;
}
/**
* list the resources that match args[0]
*
* @param args
* args[0] is the pattern to match, or list all resources if
* there are no args
*/
public static void main(final String[] args){
Pattern pattern;
if(args.length < 1){
pattern = Pattern.compile(".*");
} else{
pattern = Pattern.compile(args[0]);
}
final Collection<String> list = ResourceList.getResources(pattern);
for(final String name : list){
System.out.println(name);
}
}
}
If you are using Spring, have a look at PathMatchingResourcePatternResolver.

Using Reflections
Get everything on the classpath:
Reflections reflections = new Reflections(null, new ResourcesScanner());
Set<String> resourceList = reflections.getResources(x -> true);
Another example - get all files with extension .csv from some.package:
Reflections reflections = new Reflections("some.package", new ResourcesScanner());
Set<String> resourceList = reflections.getResources(Pattern.compile(".*\\.csv"));

So in terms of the PathMatchingResourcePatternResolver this is what is needed in the code:
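@Autowired
ResourcePatternResolver resourceResolver;
// getResources() declares IOException, so the caller has to handle or propagate it
public Resource[] getResources() throws IOException {
return resourceResolver.getResources("classpath:config/*.xml");
}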

If you use Apache Commons IO, you can use the following for the filesystem (optionally with an extension filter):
Collection<File> files = FileUtils.listFiles(new File("directory/"), null, false);
and for resources/classpath:
List<String> files = IOUtils.readLines(MyClass.class.getClassLoader().getResourceAsStream("directory/"), Charsets.UTF_8);
If you don't know whether "directory/" is in the filesystem or in the resources, you may add an
if (new File("directory/").isDirectory())
or
if (MyClass.class.getClassLoader().getResource("directory/") != null)
before the calls and use both in combination...
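The combination could be sketched as follows using only the JDK instead of Commons IO. The class name DirectoryOrResourceLister is an assumption. Reading a classpath directory as a stream of names is behaviour that file-based class loaders typically show; it usually does not work for directories inside a JAR.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: tries the filesystem first, then falls back to the classpath.
class DirectoryOrResourceLister {

    static List<String> list(String directory, ClassLoader loader) throws IOException {
        File dir = new File(directory);
        String[] names = dir.isDirectory() ? dir.list() : null;
        if (names != null) {
            List<String> result = new ArrayList<>(Arrays.asList(names));
            Collections.sort(result);
            return result;
        }
        // Fallback: read the directory resource line by line (one entry name per line).
        try (InputStream in = loader.getResourceAsStream(directory)) {
            if (in == null) {
                return Collections.emptyList();
            }
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            return reader.lines().sorted().collect(Collectors.toList());
        }
    }

    /** Lists a temporary directory containing a.xml and b.xml. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("listing");
            Path a = Files.createFile(dir.resolve("a.xml"));
            Path b = Files.createFile(dir.resolve("b.xml"));
            List<String> names =
                    list(dir.toString(), DirectoryOrResourceLister.class.getClassLoader());
            Files.delete(a);
            Files.delete(b);
            Files.delete(dir);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a.xml, b.xml]
    }
}
```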

The most robust mechanism for listing all resources in the classpath is currently to use this pattern with ClassGraph, because it handles the widest possible array of classpath specification mechanisms, including the new JPMS module system. (I am the author of ClassGraph.)
List<String> resourceNames;
try (ScanResult scanResult = new ClassGraph().acceptPaths("x/y/z").scan()) {
resourceNames = scanResult.getAllResources().getNames();
}

The Spring framework's PathMatchingResourcePatternResolver is really awesome for these things:
private Resource[] getXMLResources() throws IOException
{
ClassLoader classLoader = MethodHandles.lookup().lookupClass().getClassLoader();
PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver(classLoader);
return resolver.getResources("classpath:x/y/z/*.xml");
}
Maven dependency:
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>LATEST</version>
</dependency>

This should work (if spring is not an option):
public static List<String> getFilenamesForDirnameFromCP(String directoryName) throws URISyntaxException, UnsupportedEncodingException, IOException {
List<String> filenames = new ArrayList<>();
URL url = Thread.currentThread().getContextClassLoader().getResource(directoryName);
if (url != null) {
if (url.getProtocol().equals("file")) {
File file = Paths.get(url.toURI()).toFile();
if (file != null) {
File[] files = file.listFiles();
if (files != null) {
for (File filename : files) {
filenames.add(filename.toString());
}
}
}
} else if (url.getProtocol().equals("jar")) {
String dirname = directoryName + "/";
String path = url.getPath();
String jarPath = path.substring(5, path.indexOf("!"));
try (JarFile jar = new JarFile(URLDecoder.decode(jarPath, StandardCharsets.UTF_8.name()))) {
Enumeration<JarEntry> entries = jar.entries();
while (entries.hasMoreElements()) {
JarEntry entry = entries.nextElement();
String name = entry.getName();
if (name.startsWith(dirname) && !dirname.equals(name)) {
URL resource = Thread.currentThread().getContextClassLoader().getResource(name);
filenames.add(resource.toString());
}
}
}
}
}
return filenames;
}

My way, no Spring, used during a unit test:
URI uri = TestClass.class.getResource("/resources").toURI();
Path myPath = Paths.get(uri);
Stream<Path> walk = Files.walk(myPath, 1);
for (Iterator<Path> it = walk.iterator(); it.hasNext(); ) {
Path filename = it.next();
System.out.println(filename);
}
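A variant of the same idea, sketched with only the JDK: Files.walk returns a stream that holds an open directory handle, so it is usually wrapped in try-with-resources, and its first element is the starting directory itself, which is filtered out here. The class name DirectChildren is an assumption, and like the snippet above this only works when the resources live on the regular filesystem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: names of the direct children of a directory.
class DirectChildren {

    static List<String> names(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir, 1)) {
            return walk
                    .filter(p -> !p.equals(dir)) // walk() also yields the starting directory
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Lists a temporary directory containing a.html, b.html and a subdirectory d. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("resources");
            Path a = Files.createFile(dir.resolve("a.html"));
            Path b = Files.createFile(dir.resolve("b.html"));
            Path d = Files.createDirectory(dir.resolve("d"));
            List<String> result = names(dir);
            Files.delete(a);
            Files.delete(b);
            Files.delete(d);
            Files.delete(dir);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a.html, b.html, d]
    }
}
```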

With Spring it's easy. Be it a file, a folder, or even multiple files, chances are you can do it via injection.
This example demonstrates the injection of multiple files located in x/y/z folder.
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
@Service
public class StackoverflowService {
@Value("classpath:x/y/z/*")
private Resource[] resources;
public List<String> getResourceNames() {
return Arrays.stream(resources)
.map(Resource::getFilename)
.collect(Collectors.toList());
}
}
It does work for resources in the filesystem as well as in JARs.

Used a combination of Rob's response.
final String resourceDir = "resourceDirectory/";
// Thread.currentThread().getClass() is java.lang.Thread, whose class loader is null; use the context class loader
List<String> files = IOUtils.readLines(Thread.currentThread().getContextClassLoader().getResourceAsStream(resourceDir), Charsets.UTF_8);
for (String f : files) {
String data = IOUtils.toString(Thread.currentThread().getContextClassLoader().getResourceAsStream(resourceDir + f), Charsets.UTF_8);
// ... process data
}

I think you can leverage the Zip File System Provider to achieve this. When using FileSystems.newFileSystem, it looks like you can treat the objects in that ZIP as "regular" files.
From its documentation:
Specify the configuration options for the zip file system in the java.util.Map object passed to the FileSystems.newFileSystem method. See the Zip File System Properties topic for information about the provider-specific configuration properties for the zip file system.
Once you have an instance of a zip file system, you can invoke the methods of the java.nio.file.FileSystem and java.nio.file.Path classes to perform operations such as copying, moving, and renaming files, as well as modifying file attributes.
The documentation for the jdk.zipfs module in Java 11 states:
The zip file system provider treats a zip or JAR file as a file system and provides the ability to manipulate the contents of the file. The zip file system provider can be created by FileSystems.newFileSystem if installed.
Here is a contrived example I did using your example resources. It uses a .zip (a .jar is a .zip archive too), but you could adapt the code to use classpath resources instead:
Setup
cd /tmp
mkdir -p x/y/z
touch x/y/z/{a,b,c}.html
echo 'hello world' > x/y/z/d
zip -r example.zip x
Java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.util.Collections;
import java.util.stream.Collectors;
public class MkobitZipRead {
public static void main(String[] args) throws IOException {
final URI uri = URI.create("jar:file:/tmp/example.zip");
try (
final FileSystem zipfs = FileSystems.newFileSystem(uri, Collections.emptyMap());
) {
Files.walk(zipfs.getPath("/")).forEach(path -> System.out.println("Files in zip:" + path));
System.out.println("-----");
final String manifest = Files.readAllLines(
zipfs.getPath("x", "y", "z").resolve("d")
).stream().collect(Collectors.joining(System.lineSeparator()));
System.out.println(manifest);
}
}
}
Output
Files in zip:/
Files in zip:/x/
Files in zip:/x/y/
Files in zip:/x/y/z/
Files in zip:/x/y/z/c.html
Files in zip:/x/y/z/b.html
Files in zip:/x/y/z/a.html
Files in zip:/x/y/z/d
-----
hello world
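Building on the example above, the getResourceNames method from the question could be sketched for a single archive as shown below. The class name ZipResourceNames and the archive parameter are assumptions; locating the JAR that actually contains the directory on the classpath is not covered. FileSystems.newFileSystem throws FileSystemAlreadyExistsException if a zip file system for the same URI is already open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: lists one directory inside a zip or JAR via the zip file system provider.
class ZipResourceNames {

    /** Returns the names of the direct children of directoryName inside the archive. */
    static List<String> getResourceNames(Path archive, String directoryName) throws IOException {
        URI uri = URI.create("jar:" + archive.toUri());
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap());
             Stream<Path> children = Files.list(zipfs.getPath("/", directoryName))) {
            return children
                    .map(p -> stripTrailingSlash(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Some JDK versions render zip directories with a trailing slash.
    private static String stripTrailingSlash(String name) {
        return name.endsWith("/") ? name.substring(0, name.length() - 1) : name;
    }

    /** Builds a small archive mirroring the setup above and lists x/y/z. */
    static List<String> demo() {
        try {
            Path zip = Files.createTempFile("example", ".zip");
            try {
                String[] entries = {"x/", "x/y/", "x/y/z/",
                        "x/y/z/a.html", "x/y/z/b.html", "x/y/z/c.html", "x/y/z/d"};
                try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                    for (String name : entries) {
                        out.putNextEntry(new ZipEntry(name));
                        out.closeEntry();
                    }
                }
                return getResourceNames(zip, "x/y/z");
            } finally {
                Files.delete(zip);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```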

None of the answers worked for me, even though I had put my resources in resources folders and followed the answers above. What did the trick was:
@Value("file:*/**/resources/**/schema/*.json")
private Resource[] resources;

Expanding on Luke Hutchison's answer above, using his ClassGraph library, I was able to easily get a list of all files in a resource folder with almost no effort at all.
Let's say that in your resource folder, you have a folder called MyImages. This is how easy it is to get a URL list of all the files in that folder:
import io.github.classgraph.ClassGraph;
import io.github.classgraph.ResourceList;
import io.github.classgraph.ScanResult;
public static LinkedList<URL> getURLList (String folder) {
LinkedList<URL> urlList = new LinkedList<>();
ScanResult scanResult = new ClassGraph().enableAllInfo().scan();
ResourceList resources = scanResult.getAllResources();
for (URL url : resources.getURLs()) {
if (url.toString().contains(folder)) {
urlList.addLast(url);
}
}
return urlList;
}
Then you simply do this:
LinkedList<URL> myURLFileList = getURLList("MyImages");
The URLs can then be loaded into streams or use Apache's FileUtils to copy the files somewhere else like this:
String outPath = "/My/Output/Path";
for(URL url : myURLFileList) {
FileUtils.copyURLToFile(url, new File(outPath, url.getFile()));
}
I think ClassGraph is a pretty slick library for making tasks like this very simple and easy to comprehend.

Based on @rob's information above, I created the following implementation, which I am releasing to the public domain:
private static List<String> getClasspathEntriesByPath(String path) throws IOException {
InputStream is = Main.class.getClassLoader().getResourceAsStream(path);
if (is == null) {
throw new IOException("Resource not found on classpath: " + path);
}
// Read the directory listing line by line; the reader closes the stream when done
try (BufferedReader reader = new BufferedReader(new InputStreamReader(is, Charset.defaultCharset()))) {
return reader
.lines() // One line per directory entry
.filter(line -> line.trim().length()>0) // Filter out empty lines
.collect(Collectors.toList()); // Collect remaining lines into a List
}
}
While I would not have expected getResourceAsStream to work like that on a directory, it really does and it works well.

Java file encoding magic

Strange thing happened in Java Kingdom...
Long story short: I use Java API V3 to connect to QuickBooks and fetch data from there (services, for example).
Everything goes fine except when a service name contains Russian characters (or probably any non-Latin characters).
Here is Java code that does it (I know it's far from perfect)
package com.mde.test;
import static com.intuit.ipp.query.GenerateQuery.$;
import static com.intuit.ipp.query.GenerateQuery.select;
import java.util.LinkedList;
import java.util.List;
import com.intuit.ipp.core.Context;
import com.intuit.ipp.core.ServiceType;
import com.intuit.ipp.data.Item;
import com.intuit.ipp.exception.FMSException;
import com.intuit.ipp.query.GenerateQuery;
import com.intuit.ipp.security.OAuthAuthorizer;
import com.intuit.ipp.services.DataService;
import com.intuit.ipp.util.Config;
public class TestEncoding {
public static final String QBO_BASE_URL_SANDBOX = "https://sandbox-quickbooks.api.intuit.com/v3/company";
private static String consumerKey = "consumerkeycode";
private static String consumerSecret = "consumersecretcode";
private static String accessToken = "accesstokencode";
private static String accessTokenSecret = "accesstokensecretcode";
private static String appToken = "apptokencode";
private static String companyId = "companyidcode";
private static OAuthAuthorizer oauth = new OAuthAuthorizer(consumerKey, consumerSecret, accessToken, accessTokenSecret);
private static final int PAGING_STEP = 500;
public static void main(String[] args) throws FMSException {
List<Item> res = findAllServices(getDataService());
System.out.println(res.get(1).getName());
}
public static List<Item> findAllServices(DataService service) throws FMSException {
Item item = GenerateQuery.createQueryEntity(Item.class);
List<Item> res = new LinkedList<>();
for (int skip = 0; ; skip += PAGING_STEP) {
String query = select($(item)).skip(skip).take(PAGING_STEP).generate();
List<Item> items = (List<Item>)service.executeQuery(query).getEntities();
if (items.size() > 0)
res.addAll(items);
else
break;
}
System.out.println("All services fetched");
return res;
}
public static DataService getDataService() throws FMSException {
Context context = getContext();
if (context == null) {
System.out.println("Context is null, something wrong, dataService also will null.");
return null;
}
return getDataService(context);
}
private static Context getContext() {
try {
return new Context(oauth, appToken, ServiceType.QBO, companyId);
} catch (FMSException e) {
System.out.println("Context is not loaded");
return null;
}
}
protected static DataService getDataService(Context context) throws FMSException {
DataService service = new DataService(context);
Config.setProperty(Config.BASE_URL_QBO, QBO_BASE_URL_SANDBOX);
return new DataService(context);
}
}
This file is saved in UTF-8. And it prints something like
All services fetched
РЎСЌСЂРІС‹СЃ, РѕС‚РЅСЋРґСЊ
But! When I save this file in UTF-8 with BOM.... I get the correct data!
All services fetched
Сэрвыс, отнюдь
Can anybody explain what is happening? :)
// I use Eclipse to run the code

You are fetching data from a system that doesn't share the same byte ordering as you, so when you save the file with BOM, it adds enough information in the file that future programs will read it in the remote system's byte ordering.
When you save it without BOM, it wrote the file in the remote system's byte ordering without any indication of the stored byte order, so when you read it you read it with the local system's (different) byte order. This jumbles up the bytes within the multi-byte characters, making the output appear as nonsense.
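Whatever the root cause turns out to be, the garbled line in the question looks like what appears when the UTF-8 bytes of "Сэрвыс" are decoded as windows-1251. The following is a minimal sketch to check that reading; the class name MojibakeDemo and the choice of windows-1251 are assumptions made for illustration, not something the question states.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Assumption: whatever decoded the bytes used windows-1251.
    static final Charset CP1251 = Charset.forName("windows-1251");

    /** Decodes the UTF-8 bytes of the text as if they were windows-1251. */
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1251);
    }

    /** Reverses misdecode: re-encodes as windows-1251 and decodes as UTF-8. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1251), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The service name from the question, written with Unicode escapes
        // so that the encoding of this source file does not matter.
        String original = "\u0421\u044d\u0440\u0432\u044b\u0441";
        String garbled = misdecode(original);
        System.out.println(garbled.length()); // 12: each 2-byte letter became 2 chars
        System.out.println(repair(garbled).equals(original));
    }
}
```

If the repaired string matches the original, the mismatch is between the charset used to write the bytes and the one used to display them, which is worth checking in the Eclipse run configuration and console settings.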

How to list the files inside a JAR file?

I have this code which reads all the files from a directory.
File textFolder = new File("text_directory");
File [] texFiles = textFolder.listFiles( new FileFilter() {
public boolean accept( File file ) {
return file.getName().endsWith(".txt");
}
});
It works great. It fills the array with all the files that end with ".txt" from directory "text_directory".
How can I read the contents of a directory in a similar fashion within a JAR file?
So what I really want to do is, to list all the images inside my JAR file, so I can load them with:
ImageIO.read(this.getClass().getResource("CompanyLogo.png"));
(That one works because "CompanyLogo" is hardcoded, but the number of images inside the JAR file could be anywhere from 10 to 200.)
EDIT
So I guess my main problem would be: How to know the name of the JAR file where my main class lives?
Granted, I could read it using java.util.zip.
My structure is like this:
my.jar!/Main.class
my.jar!/Aux.class
my.jar!/Other.class
my.jar!/images/image01.png
my.jar!/images/image02a.png
my.jar!/images/imwge034.png
my.jar!/images/imagAe01q.png
my.jar!/META-INF/manifest
Right now I'm able to load for instance "images/image01.png" using:
ImageIO.read(this.getClass().getResource("images/image01.png"));
But only because I know the file name, for the rest I have to load them dynamically.

CodeSource src = MyClass.class.getProtectionDomain().getCodeSource();
if (src != null) {
URL jar = src.getLocation();
ZipInputStream zip = new ZipInputStream(jar.openStream());
while(true) {
ZipEntry e = zip.getNextEntry();
if (e == null)
break;
String name = e.getName();
if (name.startsWith("path/to/your/dir/")) {
/* Do something with this entry. */
...
}
}
}
else {
/* Fail... */
}
Note that in Java 7, you can create a FileSystem from the JAR (zip) file, and then use NIO's directory walking and filtering mechanisms to search through it. This would make it easier to write code that handles JARs and "exploded" directories.
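The FileSystem approach mentioned above could look roughly like the sketch below. It assumes the JAR location is already known as a Path, and it uses Files.walk, which needs Java 8 or later. The class name JarDirLister and its parameters are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class JarDirLister {
    /**
     * Lists regular files under dirInJar (for example "/images")
     * whose names end with the given suffix (for example ".png").
     */
    static List<String> list(Path jarFile, String dirInJar, String suffix) throws IOException {
        // The cast selects the (Path, ClassLoader) overload; newer JDKs also offer (Path, Map).
        // Both the file system and the walk stream are closed by try-with-resources.
        try (FileSystem fs = FileSystems.newFileSystem(jarFile, (ClassLoader) null);
             Stream<Path> walk = Files.walk(fs.getPath(dirInJar))) {
            return walk.filter(Files::isRegularFile)
                       .map(Path::toString)
                       .filter(name -> name.endsWith(suffix))
                       .sorted()
                       .collect(Collectors.toList());
        }
    }
}
```

The results are collected into strings before the file system is closed, because Path objects from a closed zip file system can no longer be used for I/O.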

Code that works for both IDEs and .jar files:
import java.io.*;
import java.net.*;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;
public class ResourceWalker {
public static void main(String[] args) throws URISyntaxException, IOException {
URI uri = ResourceWalker.class.getResource("/resources").toURI();
Path myPath;
if (uri.getScheme().equals("jar")) {
FileSystem fileSystem = FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap());
myPath = fileSystem.getPath("/resources");
} else {
myPath = Paths.get(uri);
}
Stream<Path> walk = Files.walk(myPath, 1);
for (Iterator<Path> it = walk.iterator(); it.hasNext();){
System.out.println(it.next());
}
}
}

erickson's answer worked perfectly:
Here's the working code.
CodeSource src = MyClass.class.getProtectionDomain().getCodeSource();
List<String> list = new ArrayList<String>();
if( src != null ) {
URL jar = src.getLocation();
ZipInputStream zip = new ZipInputStream( jar.openStream());
ZipEntry ze = null;
while( ( ze = zip.getNextEntry() ) != null ) {
String entryName = ze.getName();
if( entryName.startsWith("images") && entryName.endsWith(".png") ) {
list.add( entryName );
}
}
}
webimages = list.toArray( new String[ list.size() ] );
And I have just modified my load method from this:
File[] webimages = ...
BufferedImage image = ImageIO.read(this.getClass().getResource(webimages[nextIndex].getName() ));
To this:
String [] webimages = ...
BufferedImage image = ImageIO.read(this.getClass().getResource(webimages[nextIndex]));

I would like to expand on acheron55's answer, since it is quite an unsafe solution, for several reasons:
It doesn't close the FileSystem object.
It doesn't check if the FileSystem object already exists.
It isn't thread-safe.
This is a somewhat safer solution:
private static ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();
public void walk(String path) throws Exception {
URI uri = getClass().getResource(path).toURI();
if ("jar".equals(uri.getScheme())) {
safeWalkJar(path, uri);
} else {
// Resolve against the resource's URI; the raw resource path is not a file system path
Files.walk(Paths.get(uri));
}
}
private void safeWalkJar(String path, URI uri) throws Exception {
synchronized (getLock(uri)) {
// this'll close the FileSystem object at the end
try (FileSystem fs = getFileSystem(uri)) {
Files.walk(fs.getPath(path));
}
}
}
private Object getLock(URI uri) {
String fileName = parseFileName(uri);
locks.computeIfAbsent(fileName, s -> new Object());
return locks.get(fileName);
}
private String parseFileName(URI uri) {
String schemeSpecificPart = uri.getSchemeSpecificPart();
return schemeSpecificPart.substring(0, schemeSpecificPart.indexOf("!"));
}
private FileSystem getFileSystem(URI uri) throws IOException {
try {
return FileSystems.getFileSystem(uri);
} catch (FileSystemNotFoundException e) {
return FileSystems.newFileSystem(uri, Collections.<String, String>emptyMap());
}
}
There's no real need to synchronize over the file name; one could simply synchronize on the same object every time (or make the method synchronized), it's purely an optimization.
I would say that this is still a problematic solution, since there might be other parts in the code that use the FileSystem interface over the same files, and it could interfere with them (even in a single threaded application).
Also, it doesn't check for nulls (for instance, on getClass().getResource()).
This particular Java NIO interface is kind of horrible, since it introduces a global/singleton non thread-safe resource, and its documentation is extremely vague (a lot of unknowns due to provider specific implementations). Results may vary for other FileSystem providers (not JAR). Maybe there's a good reason for it being that way; I don't know, I haven't researched the implementations.

So I guess my main problem would be, how to know the name of the jar where my main class lives.
Assuming that your project is packed in a Jar (not necessarily true!), you can use ClassLoader.getResource() or findResource() with the class name (followed by .class) to get the jar that contains a given class. You'll have to parse the jar name from the URL that gets returned (not that tough), which I will leave as an exercise for the reader :-)
Be sure to test for the case where the class is not part of a jar.
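The parsing step left as an exercise above could be sketched as follows. This is only one possible approach and assumes the usual jar:file:...!/entry URL layout; the class and method names (JarLocator, jarPartOf, jarOf) are hypothetical. It returns null when the class is not loaded from a JAR, which covers the case the answer warns about.

```java
import java.net.URL;

final class JarLocator {
    /**
     * Extracts the JAR part from a "jar:" URL string such as
     * jar:file:/opt/app/my.jar!/com/example/Main.class.
     * Returns null for any other scheme.
     */
    static String jarPartOf(String url) {
        if (url == null || !url.startsWith("jar:")) {
            return null; // e.g. "file:" when classes come from an exploded directory
        }
        int bang = url.indexOf("!/");
        return bang < 0 ? null : url.substring("jar:".length(), bang);
    }

    /**
     * Returns the URL string of the JAR containing the class, or null if it is not in a JAR.
     * Note: getSimpleName() is only suitable for top-level classes.
     */
    static String jarOf(Class<?> clazz) {
        URL url = clazz.getResource(clazz.getSimpleName() + ".class");
        return url == null ? null : jarPartOf(url.toString());
    }
}
```

For the example URL in the comment, jarPartOf returns file:/opt/app/my.jar, which is still a URL string and would need converting (for instance via URI and Paths.get) before being opened as a file.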

I've ported acheron55's answer to Java 7 and closed the FileSystem object. This code works in IDEs, in jar files and in a jar inside a war on Tomcat 7; but note that it does not work in a jar inside a war on JBoss 7 (it gives FileSystemNotFoundException: Provider "vfs" not installed, see also this post). Furthermore, like the original code, it is not thread-safe, as suggested by errr. For these reasons I have abandoned this solution; however, if you can accept these issues, here is my ready-made code:
import java.io.IOException;
import java.net.*;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Collections;
public class ResourceWalker {
public static void main(String[] args) throws URISyntaxException, IOException {
URI uri = ResourceWalker.class.getResource("/resources").toURI();
System.out.println("Starting from: " + uri);
try (FileSystem fileSystem = (uri.getScheme().equals("jar") ? FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap()) : null)) {
Path myPath = Paths.get(uri);
Files.walkFileTree(myPath, new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
System.out.println(file);
return FileVisitResult.CONTINUE;
}
});
}
}
}

Here is an example of using the Reflections library to recursively scan the classpath by regex name pattern, augmented with a couple of Guava perks to fetch the resource contents:
Reflections reflections = new Reflections("com.example.package", new ResourcesScanner());
Set<String> paths = reflections.getResources(Pattern.compile(".*\\.template$"));
Map<String, String> templates = new LinkedHashMap<>();
for (String path : paths) {
log.info("Found " + path);
String templateName = Files.getNameWithoutExtension(path);
URL resource = getClass().getClassLoader().getResource(path);
String text = Resources.toString(resource, StandardCharsets.UTF_8);
templates.put(templateName, text);
}
This works with both jars and exploded classes.

Here's a method I wrote for a "run all JUnits under a package". You should be able to adapt it to your needs.
private static void findClassesInJar(List<String> classFiles, String path) throws IOException {
final String[] parts = path.split("\\Q.jar\\\\E");
if (parts.length == 2) {
String jarFilename = parts[0] + ".jar";
String relativePath = parts[1].replace(File.separatorChar, '/');
JarFile jarFile = new JarFile(jarFilename);
final Enumeration<JarEntry> entries = jarFile.entries();
while (entries.hasMoreElements()) {
final JarEntry entry = entries.nextElement();
final String entryName = entry.getName();
if (entryName.startsWith(relativePath)) {
classFiles.add(entryName.replace('/', File.separatorChar));
}
}
}
}
Edit:
Ah, in that case, you might want this snippet as well (same use case :) )
private static File findClassesDir(Class<?> clazz) {
try {
String path = clazz.getProtectionDomain().getCodeSource().getLocation().getFile();
final String codeSourcePath = URLDecoder.decode(path, "UTF-8");
// Directory of the class's package under the code source location
return new File(codeSourcePath, clazz.getPackage().getName().replace('.', File.separatorChar));
} catch (UnsupportedEncodingException e) {
throw new AssertionError("impossible", e);
}
}

Just to mention that if you are already using Spring, you can take advantage of the PathMatchingResourcePatternResolver.
For instance, to get all the PNG files from an images folder in resources:
ClassLoader cl = this.getClass().getClassLoader();
ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver(cl);
Resource[] resources = resolver.getResources("images/*.png");
for (Resource r: resources){
logger.info(r.getFilename());
// From your example
// ImageIO.read(cl.getResource("images/" + r.getFilename()));
}

A jar file is just a zip file with a structured manifest. You can open the jar file with the usual java zip tools and scan the file contents that way, inflate streams, etc. Then use that in a getResourceAsStream call, and it should be all hunky dory.
EDIT / after clarification
It took me a minute to remember all the bits and pieces and I'm sure there are cleaner ways to do it, but I wanted to see that I wasn't crazy. In my project image.jpg is a file in some part of the main jar file. I get the class loader of the main class (SomeClass is the entry point) and use it to discover the image.jpg resource. Then some stream magic to get it into this ImageInputStream thing and everything is fine.
InputStream inputStream = SomeClass.class.getClassLoader().getResourceAsStream("image.jpg");
JPEGImageReaderSpi imageReaderSpi = new JPEGImageReaderSpi();
ImageReader ir = imageReaderSpi.createReaderInstance();
ImageInputStream iis = new MemoryCacheImageInputStream(inputStream);
ir.setInput(iis);
....
ir.read(0); //will hand us a buffered image

Given an actual JAR file, you can list the contents using JarFile.entries(). You will need to know the location of the JAR file though - you can't just ask the classloader to list everything it could get at.
You should be able to work out the location of the JAR file based on the URL returned from ThisClassName.class.getResource("ThisClassName.class"), but it may be a tiny bit fiddly.
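As an illustration of the JarFile.entries() approach described above, here is a minimal sketch that assumes the JAR location has already been worked out and is available as a File. The class name JarEntryLister and its parameters are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class JarEntryLister {
    /** Returns the names of non-directory entries that start with prefix and end with suffix. */
    static List<String> list(File jar, String prefix, String suffix) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar)) { // closes the underlying file handle
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName(); // entry names use '/' separators
                if (!entry.isDirectory() && name.startsWith(prefix) && name.endsWith(suffix)) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

The returned names are relative to the JAR root, so they are best passed to ClassLoader.getResource(name), or to Class.getResource("/" + name), rather than resolved relative to a class's package.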

Some time ago I made a function that gets classes from inside a JAR:
public static Class[] getClasses(String packageName)
throws ClassNotFoundException{
ArrayList<Class> classes = new ArrayList<Class> ();
packageName = packageName.replaceAll("\\." , "/");
File f = new File(jarName);
if(f.exists()){
try{
JarInputStream jarFile = new JarInputStream(
new FileInputStream (jarName));
JarEntry jarEntry;
while(true) {
jarEntry=jarFile.getNextJarEntry ();
if(jarEntry == null){
break;
}
if((jarEntry.getName ().startsWith (packageName)) &&
(jarEntry.getName ().endsWith (".class")) ) {
classes.add(Class.forName(jarEntry.getName().
replaceAll("/", "\\.").
substring(0, jarEntry.getName().length() - 6)));
}
}
}
catch( Exception e){
e.printStackTrace ();
}
Class[] classesA = new Class[classes.size()];
classes.toArray(classesA);
return classesA;
}else
return null;
}

public static ArrayList<String> listItems(String path) throws Exception{
InputStream in = ClassLoader.getSystemClassLoader().getResourceAsStream(path);
byte[] b = new byte[in.available()];
in.read(b);
String data = new String(b);
String[] s = data.split("\n");
List<String> a = Arrays.asList(s);
ArrayList<String> m = new ArrayList<>(a);
return m;
}

There are two very useful utilities both called JarScan:
www.inetfeedback.com/jarscan
jarscan.dev.java.net
See also this question: JarScan, scan all JAR files in all subfolders for specific class

The most robust mechanism for listing all resources in the classpath is currently to use this pattern with ClassGraph, because it handles the widest possible array of classpath specification mechanisms, including the new JPMS module system. (I am the author of ClassGraph.)
How to know the name of the JAR file where my main class lives?
URI mainClasspathElementURI;
try (ScanResult scanResult = new ClassGraph().whitelistPackages("x.y.z")
.enableClassInfo().scan()) {
mainClasspathElementURI =
scanResult.getClassInfo("x.y.z.MainClass").getClasspathElementURI();
}
How can I read the contents of a directory in a similar fashion within a JAR file?
List<String> classpathElementResourcePaths;
try (ScanResult scanResult = new ClassGraph().overrideClasspath(mainClasspathElementURI)
.scan()) {
classpathElementResourcePaths = scanResult.getAllResources().getPaths();
}
There are lots of other ways to deal with resources too.

One more for the road that's a bit more flexible for matching specific filenames because it uses wildcard globbing. In a functional style this could resemble:
import java.io.IOException;
import java.net.URISyntaxException;
import java.nio.file.FileSystem;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;
import static java.nio.file.FileSystems.getDefault;
import static java.nio.file.FileSystems.newFileSystem;
import static java.util.Collections.emptyMap;
/**
* Responsible for finding file resources.
*/
public class ResourceWalker {
/**
* Globbing pattern to match font names.
*/
public static final String GLOB_FONTS = "**.{ttf,otf}";
/**
* @param directory The root directory to scan for files matching the glob.
* @param glob The globbing pattern that file paths must match.
* @param c The consumer function to call for each matching path
* found.
* @throws URISyntaxException Could not convert the resource to a URI.
* @throws IOException Could not walk the tree.
*/
public static void walk(
final String directory, final String glob, final Consumer<Path> c )
throws URISyntaxException, IOException {
final var resource = ResourceWalker.class.getResource( directory );
final var matcher = getDefault().getPathMatcher( "glob:" + glob );
if( resource != null ) {
final var uri = resource.toURI();
final Path path;
FileSystem fs = null;
if( "jar".equals( uri.getScheme() ) ) {
fs = newFileSystem( uri, emptyMap() );
path = fs.getPath( directory );
}
else {
path = Paths.get( uri );
}
try( final var walk = Files.walk( path, 10 ) ) {
for( final var it = walk.iterator(); it.hasNext(); ) {
final Path p = it.next();
if( matcher.matches( p ) ) {
c.accept( p );
}
}
} finally {
if( fs != null ) { fs.close(); }
}
}
}
}
Consider parameterizing the file extensions, left as an exercise for the reader.
Be careful with Files.walk. According to the documentation:
This method must be used within a try-with-resources statement or similar control structure to ensure that the stream's open directories are closed promptly after the stream's operations have completed.
Likewise, newFileSystem must be closed, but not before the walker has had a chance to visit the file system paths.

Just a different way of listing/reading files from a jar URL and it does it recursively for nested jars
https://gist.github.com/trung/2cd90faab7f75b3bcbaa
URL urlResource = Thread.currentThread().getContextClassLoader().getResource("foo");
JarReader.read(urlResource, new InputStreamCallback() {
@Override
public void onFile(String name, InputStream is) throws IOException {
// got file name and content stream
}
});

Detect file type based on content - java

You can use Files.probeContentType(path)
