Hadoop read JSON from HDFS - java

I'm trying to read a JSON file into my Hadoop MapReduce job.
How can I do this? I've put a file 'testinput.json' into /input in HDFS.
To run the job I execute hadoop jar popularityMR2.jar populariy input output, where input is the input directory in HDFS.
public static class PopularityMapper extends Mapper<Object, Text, Text, Text>{
protected void map(Object key, Text value,
Context context)
throws IOException, InterruptedException {
JSONParser jsonParser = new JSONParser();
try {
JSONObject jsonobject = (JSONObject) jsonParser.parse(new FileReader("hdfs://input/testinput.json"));
JSONArray jsonArray = (JSONArray) jsonobject.get("votes");
Iterator<JSONObject> iterator = jsonArray.iterator();
while(iterator.hasNext()) {
JSONObject obj = iterator.next();
String song_id_rave_id = (String) obj.get("song_ID") + "," + (String) obj.get("rave_ID")+ ",";
String preference = (String) obj.get("preference");
System.out.println(song_id_rave_id + "||" + preference);
context.write(new Text(song_id_rave_id), new Text(preference));
}
}catch(ParseException e) {
e.printStackTrace();
}
}
}
This is what my mapper function looks like now. I read the file from HDFS, but it always fails with a file-not-found error.
Does anyone know how I can read this JSON into a JSONObject?
Thanks

FileReader cannot read from HDFS, only from the local filesystem.
The file path comes from the Job parameters - FileInputFormat.addInputPath(job, new Path(args[0]));
You wouldn't read the file in the Mapper class, anyway.
MapReduce reads line-delimited files by default (TextInputFormat), so your JSON objects would have to be one per line, such as
{"votes":[]}
{"votes":[]}
From the mapper, you would parse the Text objects into JSONObject like so
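protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    JSONParser jsonParser = new JSONParser();
    try {
        // Each input line (value) is expected to hold one complete JSON object
        JSONObject jsonobject = (JSONObject) jsonParser.parse(value.toString());
        JSONArray jsonArray = (JSONArray) jsonobject.get("votes");
        // ... iterate over jsonArray and call context.write(...) as in the question
    } catch (ParseException e) {
        e.printStackTrace();
    }
}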
If you only have one JSON object in the file, then you probably shouldn't be using MapReduce.
Otherwise, you would have to implement a WholeFileInputFormat and set that in the Job
job.setInputFormatClass(WholeFileInputFormat.class);

I tried reading the JSON from an HDFS path with the following function, which uses the pydoop library, and it works as expected. Hope it helps.
import pydoop.hdfs as hdfs
def lreadline(inputJsonIterator):
    # Open the HDFS path in text mode and return its content split into lines
    with hdfs.open(inputJsonIterator, mode='rt') as f:
        lines = f.read().split('\n')
    return lines

Related

Java - Access JSON Element (IllegalFormatConversionException: d != java.lang.String)

I am trying to learn Java and build a plugin for Minecraft. I can successfully get the JSON data from my API endpoint; however, I am now getting an IllegalFormatConversionException: d != java.lang.String, which means that the argument I am passing to the format string does not match the type its format specifier expects.
I am trying to access a JSON element from my endpoint called condition
{
"todaysdate": "2021-02-12",
"temperature": 25,
"description": [
"Overcast"
],
"condition": 122
}
Coming from C#; I know there's a website called json2sharp where you can create a root class for the JSON. I'm not sure how it would be applied in Java but currently, my code looks like this.
private String fetchWeather() throws IOException, InvalidConfigurationException {
// Download
final URL url = new URL(API);
final URLConnection request = url.openConnection();
// Set HEADER
request.setRequestProperty("x-api-key", plugin.apiKey);
request.setConnectTimeout(5000);
request.setReadTimeout(5000);
request.connect();
// Convert to a JSON object to print data
JsonParser jp = new JsonParser(); //from gson
JsonElement root = jp.parse(new InputStreamReader((InputStream) request.getContent())); //Convert the input stream to a json element
JsonObject rootobj = root.getAsJsonObject();
//JsonElement code = rootobj.get("condition");
String condition_code = rootobj.get("condition").toString();
plugin.getLogger().fine(String.format(
"[%s] Weather is %d",
world.getName(), condition_code
));
return condition_code;
}
If I call
private JsonObject fetchWeather() throws IOException, InvalidConfigurationException {
// Download
final URL url = new URL(API);
final URLConnection request = url.openConnection();
// Set HEADER
request.setRequestProperty("x-api-key", plugin.apiKey);
request.setConnectTimeout(5000);
request.setReadTimeout(5000);
request.connect();
// Convert to a JSON object to print data
JsonParser jp = new JsonParser(); //from gson
JsonElement root = jp.parse(new InputStreamReader((InputStream) request.getContent())); //Convert the input stream to a json element
JsonObject rootobj = root.getAsJsonObject();
return rootobj;
}
state = fetchWeather();
plugin.getLogger().warning(state.toString());
I do get the actual JSON with all of its elements, so I know that the URL and the request work, but I get an IllegalFormatConversionException if I try to print the condition code with the logger.
So, if I change the return type of fetchWeather() to the root JSON object and print it through the state variable above, it works; but if I return the condition JSON element and print it, I get the exception.
Before posting this question I read some other questions, but I couldn't get a working solution from their answers. I am hoping someone can point out what I'm doing wrong, because I know I'm messing up somewhere with the variable format.
Thanks.
Line 37: state = fetchWeather();
Line 103:
plugin.getLogger().fine(String.format(
"[%s] Weather is %d",
world.getName(), bId
));
The condition_code variable is a String, so you should use the %s format specifier instead of %d.
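To make the mismatch concrete, here is a minimal, self-contained sketch. The class and method names are made up for illustration and are not part of the plugin. It formats a String value with %s and shows that the same call with %d is rejected at runtime. If you would rather keep %d, you would need to pass a number instead, for example by reading the element with Gson's getAsInt().

```java
import java.util.IllegalFormatConversionException;

// Hypothetical demo class; only String.format is taken from the question.
final class FormatSpecifierDemo {

    // %s accepts any object, so it is the right specifier for a String value.
    static String weatherMessage(String worldName, String conditionCode) {
        return String.format("[%s] Weather is %s", worldName, conditionCode);
    }

    // Reproduces the failure: %d only accepts integral types,
    // so a String argument triggers IllegalFormatConversionException.
    static boolean failsWithDecimalSpecifier(String worldName, String conditionCode) {
        try {
            String.format("[%s] Weather is %d", worldName, conditionCode);
            return false;
        } catch (IllegalFormatConversionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(weatherMessage("world", "122"));
        System.out.println(failsWithDecimalSpecifier("world", "122"));
    }
}
```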

How to get value for particular key only without reading whole JSON file in Java?

I need some information about reading a JSON file using Java. I have sample data in an abc.json file as follows:
{
"id":"abc",
"name":"abc",
"otherInfo": {},
"personalData":[{}]
}
I am reading this file as follows:
JSONParser parser = new JSONParser();
Object obj = parser.parse(new FileReader("...")); //the location of the file
JSONObject jsonObject = (JSONObject) obj;
JSONArray numbers = (JSONArray) jsonObject.get("personalData");
My question is: will the code above read the whole file and then return the data for the "personalData" key, or will it fetch only the data for that key? And if it reads the entire file, is there any way to fetch the value of a particular key without reading the whole file?

Loop through json file appending key value to List

I am trying to loop through a JSON file and append the value each time to patientList. So far I believe I have done the hard part, however the simplest part seems to be taking a lot of time, that is appending the values to patientList. My getJsonFile method gets the path of the JSON file. The format of the JSON file is below. I am able to print jsonArray so I know I am good up to that point, but lost after that.
Json file.
[{"patient":1},{"patient":2},{"patient":3},{"patient":4},{"patient":5},{"patient":6},{"patient":7},{"patient":8},{"patient":9}]
getJsonFile method.
private List<Integer> getJsonFile(String path)
{
List<Integer> patientList = new ArrayList<>();
try (FileReader reader = new FileReader(path))
{
JSONParser jsonParser = new JSONParser();
JSONArray jsonArray = (JSONArray)jsonParser.parse(reader);
System.out.println(jsonArray);
// Update patientList
for (int i = 0; i < jsonArray.size(); i++ )
{
patientList.add(jsonArray(i));
}
}
catch(IOException | ParseException | NullPointerException e)
{
e.printStackTrace();
}
return patientList;
}
Your JSONArray contains objects such as {"patient":1}
so you cannot add an element directly with patientList.add(jsonArray(i));
You have to access the int value inside that object:
JSONObject patient = jsonArray.getJsonObject(i);
patientList.add(patient.getInt("patient"));
Edit
You are using the json-simple library, which is outdated and has quite limited features. In this case you have to cast the data yourself. Note that json-simple returns whole numbers as Long, so a direct cast to Integer would fail:
JSONObject patient = (JSONObject) jsonArray.get(i);
patientList.add(((Long) patient.get("patient")).intValue());
I recommend you remove this library and use the standard Java JSON API (JSON-P, javax.json) instead. If you want more advanced features, Jackson or Gson is the library to use.
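One detail worth checking with the cast-based approach: as far as I know, json-simple hands back whole numbers as Long, and a Long cannot be cast to Integer. The sketch below uses only the standard library and stands in for the parser by putting Long values into a plain list. The class and method names are hypothetical. It shows the failing cast and a conversion through Number that works regardless of the boxed type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the Long values stand in for what a parser might return.
final class NumberCastDemo {

    // Converts loosely typed numeric values to ints without assuming a concrete boxed type.
    static List<Integer> toInts(List<?> parsedValues) {
        List<Integer> result = new ArrayList<>();
        for (Object value : parsedValues) {
            result.add(((Number) value).intValue());
        }
        return result;
    }

    // Returns true when a direct (Integer) cast of the value fails at runtime.
    static boolean integerCastFails(Object value) {
        try {
            Integer ignored = (Integer) value;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Long> parsed = Arrays.asList(1L, 2L, 3L);
        System.out.println(toInts(parsed));
        System.out.println(integerCastFails(parsed.get(0)));
    }
}
```

Casting to Number first keeps the loop working even if a different parser returns Integer or Double for the same field.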

Reading json objects separated by new line

I am trying to write a test case where I want to stream json objects from a json file separated by new line into Java.
I want to stream one event object in Java and serialize it.
The json file is of the form:
{"event":[{"D49-64":0,"Bezeichnung":"A 41","D33-48":0}]}
{"event":[{"D49-64":1,"Bezeichnung":"A 41","D33-48":0}]}
Any suggestions to stream the objects in Java will be beneficial.
The blob that you have posted is not a valid JSONObject, but two individual objects.
To stream this, you would end up with something like the following:
String pathToFile = "/path/to/something.txt";
// try-with-resources closes the reader even if parsing fails
try (BufferedReader someReader = new BufferedReader(new FileReader(pathToFile))) {
    String someData;
    while ((someData = someReader.readLine()) != null) {
        // Each line holds one complete JSON object
        JSONObject o = new JSONObject(someData);
        doSomethingWith(o);
    }
}
The library I generally use for JSON manipulation is org.json
I was solving the same problem: reading data from a file that just contains a sequence of JSON objects. I am using the com.fasterxml.jackson library for JSON manipulation. While it does not have a direct method for exactly this, the solution is still quite simple:
// InputStream in - input stream with your data
ObjectMapper mapper = new ObjectMapper();
JsonParser parser = mapper.getFactory().createParser(in);
ObjectNode nextObject;
// readTree returns null when the end of the stream is reached,
// so test for null before processing to avoid handling a null object
while ((nextObject = mapper.readTree(parser)) != null) {
    // process your object here
}

Parsing list of json

I need to parse a list of JSON objects stored in a single file.
What I have done so far is,
test.json file contains:
{"location":"lille","lat":28.4,"long":51.7,"country":"FR"}
with this file I have the code below
public class JsonReader {
public static void main(String[] args) throws ParseException {
JSONParser parser = new JSONParser();
try {
Object obj = parser.parse(new FileReader("c:\\test.json"));
JSONObject locationjson= (JSONObject) obj;
String location= (String) locationjson.get("location");
System.out.printf("%s",location);
long lat = (Long) locationjson.get("lat");
System.out.printf("\t%d",lat);
//similarly for other objects
This code works, but it only prints the single JSON object in test.json.
Now I have to print a list of JSON objects from a file, test1.json, as shown below. Each line is a single valid JSON object, and the file contains a list of them. What I need is to parse each object and print it on its own line. Will using a bean class work?
{"Atlas":{"location":"lille","lat":28.4,"long":51.7,"country":"FR"}}
{"Atlas":{"location":"luxum","lat":24.1,"long":54.7,"country":"LU"}}
{"Atlas":{"location":"ghent","lat":28.1,"long":50.1,"country":"BE"}}
{"Atlas":{"location":"alborg","lat":23.4,"long":53.7,"country":"DN"}}
Your help is appreciated !
The JSON should have a root node.
If you don't have that, you can read from the file line-by-line, and pass each line into the JSONParser wrapped in a StringReader (since the JSONParser.parse() method takes a Reader).
e.g.
BufferedReader in
    = new BufferedReader(new FileReader("test.json"));
boolean done = false;
while(!done) {
    String s = in.readLine();
    if (s == null) {
        // End of file reached
        done = true;
    }
    else {
        StringReader sr = new StringReader(s);
        // etc...
    }
}
Edit: I've assumed you're using JSONParser. If you're using a different parser (which one?) then it may take a String argument.
JSONParser.parse() also accepts a String as an argument.
Read the file with FileReader and for each line that you read use JSONParser.parse(String) method.
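The line-by-line approach can be sketched with the standard library alone. In this minimal example the class and method names are hypothetical, and the JSON parsing step is left out on purpose: each returned record is the string you would pass to your parser, for example JSONParser.parse(String). Blank lines are skipped so that a trailing newline does not produce an empty record.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it only splits the input into one record per line.
final class LineDelimitedRecords {

    // Returns each non-blank line of the source as one record.
    static List<String> readRecords(Reader source) {
        List<String> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    records.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "{\"Atlas\":{\"location\":\"lille\"}}\n\n"
                + "{\"Atlas\":{\"location\":\"ghent\"}}\n";
        for (String record : readRecords(new StringReader(sample))) {
            // A real program would hand 'record' to the JSON parser here
            System.out.println(record);
        }
    }
}
```

For a real file you would pass a FileReader instead of the StringReader used in main.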
First of all: make sure you have created a class for the JSON item in each row, with its string properties declared in the header file and synthesized in the implementation file. Then create an array property in the implementation file that will do the parsing. In that file, use NSJSONSerialization in your retrieve-data method:
-(void)retrieveData{
    NSURL * url =[NSURL URLWithString:getDataURL];
    NSData *data =[NSData dataWithContentsOfURL:url];
    json = [NSJSONSerialization JSONObjectWithData:data options:kNilOptions error:nil];
    newsArray = [[NSMutableArray alloc]init];
    for (int i = 0; i < json.count; i++){
        NSString * cID =[[json objectAtIndex:i]objectForKey:@"Name1"];
        NSString * cName = [[json objectAtIndex:i]objectForKey:@"name2"];
        NSString * cState =[[json objectAtIndex:i]objectForKey:@"name3"];
        NSString * cPopulation =[[json objectAtIndex:i]objectForKey:@"Edition"];
        NSString * cCountry =[[json objectAtIndex:i]objectForKey:@"name4"];
        City *myCity = [[City alloc]initWithCityID:cID andCityName:cName andCityState:cState andCityPopulation:cPopulation andCityCountry:cCountry];
        [newsArray addObject:myCity];
    }
    // Reload the table once, after all rows have been parsed
    [self.myTableView reloadData];
}
Then retrieve the JSON objects or arrays and enter their keys in the Name1, name2 placeholders to parse them into your table. Also make sure you have already set up your table, assigned it a cell identifier, and indexed it by row.
-(UITableViewCell *)tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath{
//static NSString *strIdentifier=@"identity";
static NSString *cellIdentifier = @"Cell";
UITableViewCell *cell = [tableView dequeueReusableCellWithIdentifier:cellIdentifier];
if(cell == nil){
cell = [[UITableViewCell alloc]initWithStyle:UITableViewCellStyleSubtitle reuseIdentifier:cellIdentifier];
}
// cell.textLabel.text = [NSString stringWithFormat:@"Cell %d",indexPath.row];
City *currentCity =[newsArray objectAtIndex:indexPath.row];
cell.textLabel.text = currentCity.Name1;
cell.detailTextLabel.text = currentCity.Name2;
return cell;
}
