Convert JSON to ORC using orc-tools - java

I am trying to convert a JSON file to ORC using the orc-tools jar mentioned at
https://orc.apache.org/docs/tools.html#java-orc-tools
I have added this dependency to my pom.xml:
<dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-tools</artifactId>
    <version>1.3.1</version>
</dependency>
However, after adding the dependency, I cannot see or import the class org.apache.orc.tools.json.JsonSchemaFinder, which is used to infer the schema from JSON files.
An example using this class can be seen in this commit:
https://github.com/apache/orc/pull/95/commits/2ee0be7e60e7ca77f574110ba1babfa2a8e93f3f
Am I using the wrong jar here?

This is scheduled for release in ORC 1.4.0. The current 1.3.x versions don't include these features.
You can still check out the ORC git repository, copy the org.apache.orc.tools.convert and org.apache.orc.tools.json packages into your own project, and use these features. Alternatively, you can build a jar from the ORC repo and use that.
public static void main(Configuration conf,
                        String[] args) throws IOException, ParseException {
    CommandLine opts = parseOptions(args);
    TypeDescription schema;
    if (opts.hasOption('s')) {
        schema = TypeDescription.fromString(opts.getOptionValue('s'));
    } else {
        schema = computeSchema(opts.getArgs());
    }
    String outFilename = opts.hasOption('o')
        ? opts.getOptionValue('o') : "output.orc";
    Writer writer = OrcFile.createWriter(new Path(outFilename),
        OrcFile.writerOptions(conf).setSchema(schema));
    VectorizedRowBatch batch = schema.createRowBatch();
    for (String file: opts.getArgs()) {
        System.err.println("Processing " + file);
        RecordReader reader = new JsonReader(new Path(file), schema, conf);
        while (reader.nextBatch(batch)) {
            writer.addRowBatch(batch);
        }
        reader.close();
    }
    writer.close();
}

Related

How do I make json file in src/main/resources/ accessable when spring application is hosted in a docker container?

I'm trying to read a JSON file located under src/main/resources. This works fine locally, but after building the Docker image the same code reports "IOException - File not found" when it is deployed.
Here is the class that attempts to load the file:
@Component
public class KontorAdmin {
    List<KontorInfo> kontorer;
    String filePath = "./src/main/resources/kontortoggle.json";

    public KontorAdmin() throws IOException {
        initKontor();
    }

    public void initKontor() throws IOException {
        File file = new File(filePath);
        ObjectMapper objectMapper = new ObjectMapper();
        objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        KontorWrapper kontorWrapper = objectMapper.readValue(file, KontorWrapper.class);
        this.kontorer = kontorWrapper.getKontorer();
    }
}
Here is the dockerfile:
FROM openjdk:11
VOLUME /tmp
ADD target/rek-service.jar rek-service.jar
EXPOSE 80:80
ENTRYPOINT ["java","-jar","rek-service.jar"]
I'm thinking I could ADD the file to the Docker image, but how do I then make it pass the local unit tests, which expect a different path to the file?
Thanks for any help
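Actually, the resource file is on your classpath.
You could try reading it like this:
File file = ResourceUtils.getFile("classpath:kontortoggle.json");
or
File file = new File(getClass().getResource("/kontortoggle.json").getFile());
Note that both variants return a File, so they only work while the resource exists as a real file on disk (for example when running from the IDE), not once it is packaged inside the jar.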
Your problem is the file access. When the jar is built, your JSON file is packaged inside it, so you can no longer find it through plain file access by name: the resource is now part of rek-service.jar. You have several options for accessing it.
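To make the difference concrete, here is a minimal, JDK-only sketch. It builds a throwaway jar containing a single entry and then looks the entry up through a class loader. The class name JarResourceDemo and its helper methods are made up for illustration, and the JSON content is a stand-in, not the real file.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class JarResourceDemo {

    /** Builds a temporary jar with one entry, mimicking a packaged resource (hypothetical helper). */
    static Path buildJar(String entryName, String content) throws IOException {
        Path jar = Files.createTempFile("demo", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return jar;
    }

    /** Reads a classpath resource completely through a stream (hypothetical helper). */
    static String readResource(ClassLoader loader, String name) throws IOException {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("Resource not found: " + name);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = buildJar("kontortoggle.json", "{\"kontorer\":[]}");
        // Parent is null so that only the temporary jar is searched.
        try (URLClassLoader loader = new URLClassLoader(new URL[] {jar.toUri().toURL()}, null)) {
            URL url = loader.getResource("kontortoggle.json");
            System.out.println(url.getProtocol());                 // prints "jar"
            System.out.println(new File(url.getFile()).exists());  // prints "false"
            System.out.println(readResource(loader, "kontortoggle.json"));
        }
    }
}
```

The resource URL uses the jar: protocol, so turning it into a java.io.File yields a path that does not exist, while reading it as a stream works regardless of how the application is packaged.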
In your use case you are using a FasterXML (Jackson) ObjectMapper, which can read from files or from streams. I would suggest using a stream, like this:
@Component
public class KontorAdmin {
    List<KontorInfo> kontorer;

    public KontorAdmin() throws IOException {
        initKontor();
    }

    public void initKontor() throws IOException {
        // The leading "/" resolves the name from the classpath root
        // instead of relative to this class's package.
        InputStream is = getClass().getResourceAsStream("/kontortoggle.json");
        ObjectMapper objectMapper = new ObjectMapper();
        objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        KontorWrapper kontorWrapper = objectMapper.readValue(is, KontorWrapper.class);
        this.kontorer = kontorWrapper.getKontorer();
    }
}
But from the @Component annotation I suspect that you are using Spring. With Spring it is much easier to get an InputStream or a File from the classpath: take a look at ClassPathResource or ResourceUtils. According to its Javadoc, ResourceUtils is mainly intended for internal use within the framework, but it is widely used in Spring applications.
new ClassPathResource("data/employees.dat", this.getClass().getClassLoader());
ResourceUtils.getFile("classpath:kontortoggle.json");
@Component
public class KontorAdmin {
    List<KontorInfo> kontorer;

    public KontorAdmin() throws IOException {
        initKontor();
    }

    public void initKontor() throws IOException {
        ObjectMapper objectMapper = new ObjectMapper();
        objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        KontorWrapper kontorWrapper = objectMapper.readValue(ResourceUtils.getFile("classpath:kontortoggle.json"), KontorWrapper.class);
        this.kontorer = kontorWrapper.getKontorer();
    }
}

How to get Maven to read a file from my resources folder

I have been happily using JUnit to run my tests and everything has been fine.
However, I now need to use Maven but for some reason it cannot find any of my resource files.
The files are in the expected place: src/main/resources
I am using the following code to try to read a file:
public Map<String, String> readCsv(String filename) {
    Map<String, String> headersAsMap;
    CSVDataManipulator csvDataManipulator = new CSVDataManipulator();
    ClassLoader classLoader = getClass().getClassLoader();
    String wrkbook = new File(classLoader.getResource(filename).getFile()).toString();
    headersAsMap = csvDataManipulator.getAllRecordsAsMap(wrkbook);
    return headersAsMap;
}
However, try as I might it cannot find the file.
I've tried lots of different code and tried moving the files to different locations but I cannot get Maven to find my resource files.
Any help would be greatly appreciated!
Thanks
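To my understanding, a relative name passed to Class.getResource(..) is looked up in a folder structure matching the package of the class, whereas classLoader.getResource(..) resolves the name from the root of the classpath. So if you call getClass().getResource("my-csv.csv") from a class in the package com.matt.stuff, you'll have to put the CSV file in src/main/resources/com/matt/stuff.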
Or you could just use this to grab your csv file:
private static String readFile(String fileName) throws IOException {
    // fileName can be src/main/resources/my-csv.csv
    return new String(Files.readAllBytes(Paths.get(fileName)));
}
A File is indeed a file on the file system. For a resource which might be a file zipped in a jar, and has a path on the class path, you need something else.
Traditionally one would use a more general InputStream instead of a File.
InputStream in = getClass().getResourceAsStream("/.../x.csv"); // Path on the class path
With the new class Path, more general than File, you can deal with several (virtual) file systems:
URL url = getClass().getResource("/.../x.csv"); // Path on the class path
Path path = Paths.get(url.toURI());
Files.copy(path, Paths.get("..."));
With a bit of luck, your CSVDataManipulator can be given not only a File but also an InputStream or a Reader (new InputStreamReader(in, "UTF-8")).
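As a rough sketch of that idea, the class below parses CSV from a Reader instead of a File, so the same code works for a classpath resource, a file on disk, or an in-memory string. CsvHeaderReader, firstRecordAsMap and openResource are hypothetical names, and the naive comma split is an assumption that ignores quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvHeaderReader {

    /** Maps each header in the first CSV line to the matching value in the second line. */
    static Map<String, String> firstRecordAsMap(Reader source) throws IOException {
        Map<String, String> record = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String headerLine = reader.readLine();
            String valueLine = reader.readLine();
            if (headerLine == null || valueLine == null) {
                return record; // fewer than two lines: nothing to map
            }
            String[] headers = headerLine.split(",");
            String[] values = valueLine.split(",", -1); // -1 keeps trailing empty fields
            for (int i = 0; i < headers.length; i++) {
                record.put(headers[i].trim(), i < values.length ? values[i].trim() : "");
            }
        }
        return record;
    }

    /** Opens a classpath resource as a UTF-8 Reader; a leading "/" means the classpath root. */
    static Reader openResource(String absoluteName) throws IOException {
        InputStream in = CsvHeaderReader.class.getResourceAsStream(absoluteName);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + absoluteName);
        }
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory Reader stands in for openResource("/.../x.csv").
        Map<String, String> record = firstRecordAsMap(new StringReader("name,city\nAda,London\n"));
        System.out.println(record); // prints {name=Ada, city=London}
    }
}
```

Because the parser only depends on Reader, unit tests can feed it a StringReader while production code passes the stream obtained from the classpath.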
Here's the file structure of my Maven project, built from the quick-start-archetype with commons-io added as a dependency:
src
src/main
src/main/java
src/main/java/com
src/main/java/com/essexboy
src/main/java/com/essexboy/App.java
src/main/resources
src/main/resources/dir1
src/main/resources/dir1/test.txt
src/main/resources/dir2
src/main/resources/dir2/test.txt
src/main/resources/test.txt
src/test
src/test/java
src/test/java/com
src/test/java/com/essexboy
src/test/java/com/essexboy/AppTest.java
Here's my test
public class AppTest {
    @Test
    public void shouldAnswerWithTrue() throws Exception {
        StringWriter writer = new StringWriter();
        IOUtils.copy(getClass().getResourceAsStream("/test.txt"), writer, Charset.defaultCharset());
        assertEquals("from root", writer.toString().trim());
        writer = new StringWriter();
        IOUtils.copy(getClass().getResourceAsStream("/dir1/test.txt"), writer, Charset.defaultCharset());
        assertEquals("from dir1", writer.toString().trim());
        writer = new StringWriter();
        IOUtils.copy(getClass().getResourceAsStream("/dir2/test.txt"), writer, Charset.defaultCharset());
        assertEquals("from dir2", writer.toString().trim());
    }
}

Accessing and manipulating resource files in Maven project

I know that there are already many questions on this topic, but none of them seem to help. In my Maven project I am trying to access files under src/main/resources, and I know that at runtime, when I want to read a file and write to it, I must access it through the classpath or something similar.
I am really not sure how that works. I managed to write code that reads from a file, but I don't know how to write code that writes to one. Here is what I have so far:
private File userFile;
private ObjectMapper om;
private InputStream is;

public UserService() {
    om = new ObjectMapper();
    ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
    is = classLoader.getResourceAsStream("files/userJSON.json");
}
This is the constructor of the UserService class, whose methods I use to access the User resources located in the userJSON.json file. Here is the method I use for reading the file:
public List<User> readJSON() throws JsonParseException, JsonMappingException, IOException {
    BufferedReader br = new BufferedReader(new InputStreamReader(is));
    List<User> list = om.readValue(br, new TypeReference<List<User>>(){});
    return list;
}
How can I write to the file? If possible, please also explain how this works, i.e. how Maven handles files at runtime and so on.
Just call this method; it will read the file and convert it to a JSONObject:
private static JSONObject readUserJSON() throws FileNotFoundException, IOException, ParseException {
    // Absolute path of the current working directory (no trailing separator)
    String jarPath = (new File("")).getAbsolutePath();
    JSONParser parser = new JSONParser();
    try (FileReader reader = new FileReader(jarPath + File.separator + "resource/files/userJSON.json")) {
        return (JSONObject) parser.parse(reader);
    }
}
Add this dependency to your pom.xml:
<dependency>
    <groupId>com.googlecode.json-simple</groupId>
    <artifactId>json-simple</artifactId>
    <version>1.1.1</version>
</dependency>
The statement below gives you the absolute path of the current working directory; you can build the file's path from it and write to the file using a FileWriter:
String jarPath = (new File("")).getAbsolutePath();
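Keep in mind that anything under src/main/resources is packaged into the jar (or copied to target/classes) and should be treated as read-only at runtime. One common pattern, sketched below with JDK classes only, is to copy the bundled default to a writable location on first use and do all later reads and writes there. WritableCopy and ensureWritableCopy are hypothetical names, and the temporary directory and in-memory stream are stand-ins for a real data directory and for getResourceAsStream("files/userJSON.json").

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WritableCopy {

    /**
     * Copies the bundled default content to target unless target already exists,
     * so that edits made in an earlier run are not overwritten.
     */
    static Path ensureWritableCopy(InputStream bundledDefault, Path target) throws IOException {
        if (Files.notExists(target)) {
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            Files.copy(bundledDefault, target);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp directory plays the role of the application's data directory.
        Path dataDir = Files.createTempDirectory("app-data");
        Path userFile = dataDir.resolve("files").resolve("userJSON.json");

        // Assumption: this in-memory stream stands in for the classpath resource stream.
        InputStream bundled = new ByteArrayInputStream("[]".getBytes(StandardCharsets.UTF_8));
        ensureWritableCopy(bundled, userFile);

        // From here on, the application reads and writes the external copy.
        Files.write(userFile, "[{\"name\":\"Ada\"}]".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(userFile), StandardCharsets.UTF_8));
    }
}
```

With Jackson, the external copy could then be passed to ObjectMapper.readValue and ObjectMapper.writeValue as a regular File (userFile.toFile()).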

How to load external properties file with Java without rebuilding the Jar?

I use Gradle, which structures projects Maven-style, so I have the following:
src/main/java/Hello.java and src/main/resources/test.properties
My Hello.java looks like this:
public class Hello {
    public static void main(String[] args) {
        Properties configProperties = new Properties();
        ClassLoader classLoader = Hello.class.getClassLoader();
        try {
            configProperties.load(classLoader.getResourceAsStream("test.properties"));
            System.out.println(configProperties.getProperty("first") + " " + configProperties.getProperty("last"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This works fine. However, I want to be able to point to a .properties file outside of my project, and I want it to be flexible enough that I can point to any location without rebuilding the jar every time. Is there a way to do this without using the File API and passing the file path as an argument to the main method?
You can try this, which first tries to load the properties file from the file system (so you don't have to rebuild the jar) and, if it is not found there, falls back to the classpath:
public class Hello {
    public static void main(String[] args) {
        String configPath = "test.properties";
        if (args.length > 0) {
            configPath = args[0];
        } else if (System.getenv("CONFIG_TEST") != null) {
            configPath = System.getenv("CONFIG_TEST");
        }
        File file = new File(configPath);
        try (InputStream input = file.exists() ? new FileInputStream(file) : Hello.class.getClassLoader().getResourceAsStream(configPath)) {
            Properties configProperties = new Properties();
            configProperties.load(input);
            System.out.println(configProperties.getProperty("first") + " " + configProperties.getProperty("last"));
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
You can pass the properties file path as an argument, or set the path in an environment variable named CONFIG_TEST.
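If you want this lookup in a reusable form, a minimal sketch could look like the following. ExternalProps, its load method and the config.path system property are hypothetical names chosen for illustration; only CONFIG_TEST comes from the answer above. The sketch also fails with a clear exception when the file is found neither on disk nor on the classpath.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class ExternalProps {

    /**
     * Loads properties from externalPath if it points to an existing file,
     * otherwise from the classpath resource classpathName.
     */
    static Properties load(String externalPath, String classpathName) throws IOException {
        Path external = externalPath == null ? null : Paths.get(externalPath);
        boolean useExternal = external != null && Files.isRegularFile(external);
        Properties props = new Properties();
        // A null resource is allowed in try-with-resources; it is simply not closed.
        try (InputStream in = useExternal
                ? Files.newInputStream(external)
                : ExternalProps.class.getClassLoader().getResourceAsStream(classpathName)) {
            if (in == null) {
                throw new FileNotFoundException(
                        "Neither " + externalPath + " nor classpath:" + classpathName + " exists");
            }
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) {
        // Assumption: -Dconfig.path=... takes precedence over the CONFIG_TEST environment variable.
        String path = System.getProperty("config.path", System.getenv("CONFIG_TEST"));
        try {
            Properties props = load(path, "test.properties");
            System.out.println(props.getProperty("first") + " " + props.getProperty("last"));
        } catch (IOException ex) {
            System.err.println("Could not load configuration: " + ex.getMessage());
        }
    }
}
```

Run it with, for example, java -Dconfig.path=/some/dir/test.properties ExternalProps to point at a file outside the jar without rebuilding.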
Archaius may be complete overkill for such a simple problem, but it is a great way to manage external properties. It is a library for handling configuration: hierarchies of configuration, configuration from property files, configuration from databases, configuration from user defined sources. It may seem complicated, but you will never have to worry about hand-rolling a half-broken solution to configuration again. The Getting Started page has a section on using a local file as the configuration source.

Mapr using Java

I am new to Hadoop, MapR and Pivotal. I have written Java code that writes to Pivotal, but I am facing an issue while writing to MapR.
public class HadoopFileSystemManager {
    private String url;

    public void writeFile(String filePath,String data) throws IOException, URISyntaxException {
        Path fPath = new Path(filePath);
        String url = url = "hdfs://"+ip+":"+"8020";
        FileSystem fs = FileSystem.get(new URI(url),new Configuration());
        System.out.println(fs.getWorkingDirectory());
        FSDataOutputStream writeStream = fs.create(fPath);
        writeStream.writeChars(data);
        writeStream.close();
    }
}
This code works fine with Pivotal but fails with MapR.
For MapR I am using port 7222.
I am getting the following error:
"An existing connection was forcibly closed by the remote host"
Please let me know whether I am using the right port, or whether anything needs to be changed in the code specifically for MapR.
I have stopped iptables.
Any info is much appreciated.
Thanks
Try this code, but make sure you have the MapR client set up on the node where you run the test.
public class HadoopFileSystemManager {
    private String url;

    public void writeFile(String filePath,String data) throws IOException, URISyntaxException {
        System.setProperty( "java.library.path", "/opt/mapr/lib" );
        Path fPath = new Path(filePath);
        // ip is the cluster host; it is not defined in the original snippet
        String url = "hdfs://"+ip+":"+"8020";
        FileSystem fs = FileSystem.get(new URI(url),new Configuration());
        System.out.println(fs.getWorkingDirectory());
        FSDataOutputStream writeStream = fs.create(fPath);
        writeStream.writeChars(data);
        writeStream.close();
    }
}
Add the following to the classpath:
/opt/mapr/hadoop/hadoop-0.20.2/conf:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/maprfs-0.1.jar:.:/opt/mapr/hadoop/hadoop-0.20.2/lib/commons-logging-1.0.4.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.2.jar
The statement System.setProperty( "java.library.path", "/opt/mapr/lib" ); in the code above can be removed if you instead pass -Djava.library.path on the command line when running your program from a terminal.
/opt/mapr may not be the path to your MapR files. If so, replace the path accordingly wherever applicable.
After comment:
If you are using Maven to build your project, try adding the following to the pom.xml with scope provided. MapR is compatible with the normal Apache Hadoop distribution, so you can build against that and then supply the MapR jars on the classpath when you run your program.
<dependency>
    <groupId>hadoop</groupId>
    <artifactId>hadoop</artifactId>
    <version>0.20.2</version>
    <scope>provided</scope>
</dependency>
