I have a JSON file with a varying schema:
{"asin":"xxxxxx", "title":"xxxsomething"}
{"asin":"yyyyy"}
{"asin":"zzzzzz", "title":"zzzsomething"}
I have written a Pig script that uses Twitter's elephant-bird library to load the JSON data and convert it into a tab-separated file.
However, if a line in the input JSON file is missing the "title" key (line 2 in the example above), the TSV file also has nothing in its place:
xxxxxx xxxsomething
yyyyyy
zzzzzz zzzsomething
I would like to supply a custom default value when a particular key is missing. How can I do this in Pig Latin?
Expected output:
xxxxxx xxxsomething
yyyyyy default_string
zzzzzz zzzsomething
Here's my script:
REGISTER elephant-bird-elephant-bird-4.13/pig/target/elephant-bird-pig-4.13.jar;
REGISTER elephant-bird-elephant-bird-4.13/hadoop-compat/target/elephant-bird-hadoop-compat-4.13.jar;
REGISTER elephant-bird-elephant-bird-4.13/core/target/elephant-bird-core-4.13-thrift9.jar;
reviews = load '../data/Amazon/meta_Amazon_Instant_Video.json'
using com.twitter.elephantbird.pig.load.JsonLoader();
tabs = FOREACH reviews generate (chararray)$0#'asin' as asin_new, (chararray)$0#'title';
A = ORDER tabs BY asin_new;
DESCRIBE A;
STORE A INTO 'hdfs://localhost:9000/meta_Amazon_Instant_Video.tsv';
You can write a UDF for that: have it check whether a field is null or empty and, if so, return the default string instead.
I am reading an application.yml file in my Java Spring application and getting a property called body to send in a request (it is a very long JSON string). Sometimes it contains names or values like the one in the example below, which break the YAML. Is there any way to make it accept the JSON properly? Here is a small example of the kind of data, found inside that large JSON, that breaks my application.yml:
data:
body: '{"name":"O'brien"}'
The problem is the ' in the person's name.
I tried using <%='putting the very big json here'%>, but then I get "Nested mappings are not allowed in compact mappings". I also tried <%=very big json%>, but got the same error.
I am trying to retrieve a group of values from a properties file based on a key.
myproerty.properties
key=1
name=adam
place=USA
address=Michigan
Key=2
name=umesh
place=india
address=bengaluru
I want to retrieve the values that belong to one particular key.
Earlier I tried the method below, but it does not differentiate between keys.
myProperties = new Properties();
myProperties.load(HelloWorld.class.getResourceAsStream("/myproerty.properties"));
name=myProperties.getProperty("adam");
But how do we retrieve a group of values based on the key?
It seems that you need to read your properties file as an INI file. Take a look at:
How to parse ini file with sections in Java?
What is the easiest way to parse an INI file in Java?
So your file should look like this:
[key1]
name=adam
place=USA
address=Michigan
[Key2]
name=umesh
place=india
address=bengaluru
and use a library like ini4j to parse such INI files.
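To illustrate the grouping idea without pulling in a library, here is a minimal sketch in plain Java. It does not use ini4j; the class name `IniSections` and its `parse` method are hypothetical helpers written for this example, and it only handles simple `[section]` headers and `key=value` lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; this is not the ini4j API.
final class IniSections {

    // Parses "[section]" headers and "key=value" lines into section -> (key -> value).
    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        Map<String, String> current = null;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(";") || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                // Start a new group named after the section header.
                current = new LinkedHashMap<>();
                sections.put(line.substring(1, line.length() - 1).trim(), current);
                continue;
            }
            int eq = line.indexOf('=');
            if (eq > 0 && current != null) {
                current.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        String ini = String.join("\n",
                "[key1]", "name=adam", "place=USA", "address=Michigan",
                "[Key2]", "name=umesh", "place=india", "address=bengaluru");
        Map<String, Map<String, String>> sections = parse(ini);
        // Look up a whole group by its section name, then a value inside it.
        System.out.println(sections.get("Key2"));
        System.out.println(sections.get("key1").get("name"));
    }
}
```

Section names are matched exactly here, so `key1` and `Key2` must be looked up with the same capitalization used in the file. A real INI library will handle quoting, escapes and other edge cases that this sketch ignores.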
I have JSON data in a properties file and am trying to retrieve it in Java. When I retrieve the JSON data by its property name, I get only the first string/word of the JSON.
Inside the property file, I have the below content.
profile: {"fname": "ABC","lname": "XYZ","meetings":{"morning":10,"evening":60}}
I am trying to read the content as a string using the property name 'profile', and I am getting the error message below.
Expected ',' instead of ''
Can someone help me with this issue? I tried escaping and unescaping, but I still have the same problem.
It may depend on what you are using to deserialize the JSON, but well-formed JSON is a single element, so what you have needs to be inside a container. That is, your file content should be:
{ "profile": {"fname": "ABC","lname": "XYZ","meetings":{"morning":10,"evening":60}}}
You can do it like this:
profile={"fname": "ABC","lname": "XYZ","meetings":{"morning":10,"evening":60}}
Or, if you want to spread it over multiple lines:
profile={\
"fname": "ABC",\
"lname": "XYZ",\
"meetings":{\
"morning":10,\
"evening":60\
}\
}
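As a quick check of the multi-line form, the sketch below loads it with `java.util.Properties` from an in-memory string. The class name `JsonPropertyDemo` and the method `readProfile` are hypothetical names chosen for this example; in a real application the text would come from the properties file itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; only java.util.Properties is a real API here.
final class JsonPropertyDemo {

    // Loads properties from text and returns the raw value of "profile".
    static String readProfile(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("profile");
    }

    public static void main(String[] args) {
        // A trailing backslash continues the value on the next line.
        String text = String.join("\n",
                "profile={\\",
                "\"fname\": \"ABC\",\\",
                "\"lname\": \"XYZ\",\\",
                "\"meetings\":{\\",
                "\"morning\":10,\\",
                "\"evening\":60\\",
                "}\\",
                "}");
        // The continuation lines are joined into one JSON string.
        System.out.println(readProfile(text));
    }
}
```

The returned value is a plain string; it still has to be handed to whatever JSON parser the application uses.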
I'm trying to use the tTikaExtractor component to extract the content of several files in a folder.
It works with a single file, but when I add a tFileList component, I don't understand how to get the content of the 2 different files.
I think it is related to flows/iterations, but I cannot manage to make it work.
For example, I have this simple job:
tFileList -(iterate)-> tTikaExtractor -(onComponentOk)-> tJava -(row1)-> tFileOutputJSON
In my Java component I only have this:
String content = (String) globalMap.get("tTikaExtractor_1_CONTENT");
row1.content=content;
But in my JSON output I only get the content of the last file, not of all the files!
Can you help me with this?
That is because you are not appending records to the output: it writes the records one by one, so in the end only the last record is left in the file.
Perhaps you can write all the rows to a delimited file first, then use tFileInputDelimited--main--tFileOutputJSON to transfer all the rows.
My scenario is to read a file from a file endpoint that contains only key-value pairs, like a properties file, and take a few values from it based on the key.
Any idea how to do this other than by using a custom bean or Java component?
I would like to know whether this is possible in Mule or Camel.
Thanks in advance.
If you want to use a Camel route to pick up files, then something like this:
from("file:inbox")
    .convertBodyTo(Properties.class)
    .log("The foo value is ${body[foo]}")
    .log("The bar value is ${body[bar]}");
What we then need is a type converter from java.io.File -> java.util.Properties, which we could add to camel-core out of the box.
I logged a ticket to add that type converter out of the box in Camel: https://issues.apache.org/jira/browse/CAMEL-7312
I think the easiest solution to the problem described is to use the java.util.Properties class. Load the file with Properties, which holds key-value pairs only.
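A minimal sketch of that approach in plain Java follows, outside of any Camel or Mule route. The class name `PropertyPicker`, the method `pick`, and the keys `foo` and `bar` are assumptions made for this example; the file is a temporary one created only for the demo.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for illustration; only java.util.Properties is a real API here.
final class PropertyPicker {

    // Loads key-value pairs from the reader and returns only the requested keys that exist.
    static Map<String, String> pick(Reader source, String... keys) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> picked = new LinkedHashMap<>();
        for (String key : keys) {
            String value = props.getProperty(key);
            if (value != null) {
                picked.put(key, value);
            }
        }
        return picked;
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file so the example is self-contained.
        Path file = Files.createTempFile("sample", ".properties");
        Files.write(file, "foo=1\nbar=2\nbaz=3\n".getBytes(StandardCharsets.ISO_8859_1));
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            System.out.println(pick(reader, "foo", "bar"));
        } finally {
            Files.delete(file);
        }
    }
}
```

Keys that are absent from the file are simply left out of the result, so the caller can decide how to handle missing values.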