I'm trying to parse data obtained via Apache HttpClient in the fastest and most efficient way possible.
The data returned in the response is a string, but in a CSV-like format.
For example, the string looks like this:
date, price, status, ...
2014-02-05, 102.22, OK,...
2014-02-05, NULL, OK
I thought about taking the string and parsing it manually, but this may be too slow, since I have to do it for multiple requests.
Also, the data returned is about 23,000 lines from one source, and I may have to parse several sources.
I'm also storing the data in a hash map of type:
Map<String, Map<String, MyObject>>
where the key is the source name, and the value is a map whose values are the parsed objects.
So I have two questions: what is the best way to parse a 23,000-line file into objects, and what is the best way to store it?
I tried a CSV parser; however, doubles that are not present are stored as NULL rather than 0, so I will need to parse them manually.
Thanks
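One minimal sketch of the manual approach, using only the JDK. It assumes the first line is the header, that the first three columns are date, price and status in that order, and that no field is quoted or contains a comma. MyObject and its fields, the "source1" name and the row-number inner key are hypothetical placeholders. Because parse takes a Reader, you could, for example, wrap the stream from the HttpClient response entity (HttpEntity.getContent() in HttpClient 4.x) in an InputStreamReader instead of building one big String first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CsvSketch {

    // Hypothetical stand-in for the question's MyObject; the real fields are not shown.
    static final class MyObject {
        final String date;
        final double price;
        final String status;

        MyObject(String date, double price, String status) {
            this.date = date;
            this.price = price;
            this.status = status;
        }
    }

    // Missing cells, empty cells and the literal NULL all become 0.0.
    static double parseDoubleOrZero(String cell) {
        String s = cell.trim();
        if (s.isEmpty() || s.equalsIgnoreCase("NULL")) {
            return 0.0;
        }
        return Double.parseDouble(s);
    }

    // Returns the trimmed cell at index i, or "" when the row is shorter than expected.
    static String cell(String[] cells, int i) {
        return i < cells.length ? cells[i].trim() : "";
    }

    // Reads one row per line. Plain split(","): quoted fields containing commas are NOT handled.
    static List<MyObject> parse(Reader source) {
        List<MyObject> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // skip the header line: date, price, status, ...
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
                rows.add(new MyObject(cell(cells, 0), parseDoubleOrZero(cell(cells, 1)), cell(cells, 2)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    // Stores one source's rows in the question's Map<String, Map<String, MyObject>>.
    // Keying the inner map by row number is a hypothetical choice: the sample's dates
    // repeat, so use whatever actually identifies a row in the real data.
    static void store(Map<String, Map<String, MyObject>> bySource, String source, List<MyObject> rows) {
        Map<String, MyObject> inner = bySource.computeIfAbsent(source, k -> new LinkedHashMap<>());
        for (int i = 0; i < rows.size(); i++) {
            inner.put(String.valueOf(i), rows.get(i));
        }
    }

    public static void main(String[] args) {
        String body = "date, price, status\n"
                + "2014-02-05, 102.22, OK\n"
                + "2014-02-05, NULL, OK\n";
        Map<String, Map<String, MyObject>> bySource = new LinkedHashMap<>();
        store(bySource, "source1", parse(new StringReader(body)));
        for (MyObject o : bySource.get("source1").values()) {
            System.out.println(o.date + " " + o.price + " " + o.status);
        }
    }
}
```

A single pass with String.split over 23,000 lines is usually cheap next to the HTTP request itself, but it is worth measuring before optimizing further. If fields can be quoted, a CSV library is the safer choice, and parseDoubleOrZero can still be applied to the cells it returns.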
Related
I have a really big JSON file to read and store in a database. I am using Gson in mixed mode (streaming plus object model). If the file format is correct, it works like a charm, but if the format is incorrect within one object, the whole file is skipped with an exception (reader.hasNext() throws an exception).
Is there a way to skip a particular bad record and continue reading the rest of the file?
Sample JSON file structure:
[{
"A":1,
"B":2,
"C":3
}]
Now let's say a comma or colon is missing in this object.
Another example: there are multiple objects and the comma between two of them is missing, i.e. }{ instead of },{.
"Let's say a comma or colon is missing in this object."
Unfortunately, if you're missing a comma or a colon, then it's impossible to parse the JSON data.
But:
it's actually a good thing the parser doesn't accept this data because it protects you from accidentally reading garbage. Since you are putting this data into a database, it's protecting you from potentially filling your database with garbage.
I believe the best solution is to fix the producer of this JSON data and implement the necessary safeguards to prevent bad JSON data in the future.
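If rejecting the whole file is not an option in the short term, one heuristic is to find the record boundaries yourself and parse each record separately, so that one malformed object costs only that object. The malformed record itself still cannot be parsed. The sketch below uses only the JDK and assumes the top level is an array of objects whose braces and quotes are balanced; it does not validate or repair anything, and a missing quote can still throw the boundaries off. Each chunk it produces could then be handed to Gson, for example with gson.fromJson(chunk, YourRecord.class) inside a try/catch for JsonSyntaxException (YourRecord is a placeholder for your own class).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

final class RecordSplitter {

    // Streams the text of each top-level {...} object to the sink, tracking brace
    // depth and ignoring braces inside string literals. It only finds record
    // boundaries; it assumes braces and quotes are balanced and does not validate JSON.
    static void forEachTopLevelObject(Reader in, Consumer<String> sink) {
        StringBuilder current = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        try {
            int ch;
            while ((ch = in.read()) != -1) {
                char c = (char) ch;
                if (depth > 0) {
                    current.append(c); // inside a record: keep every character
                }
                if (inString) {
                    if (escaped) {
                        escaped = false;
                    } else if (c == '\\') {
                        escaped = true;
                    } else if (c == '"') {
                        inString = false;
                    }
                } else if (c == '"') {
                    inString = true;
                } else if (c == '{') {
                    if (depth == 0) {
                        current.setLength(0);
                        current.append(c); // start of a new top-level record
                    }
                    depth++;
                } else if (c == '}' && depth > 0) {
                    depth--;
                    if (depth == 0) {
                        sink.accept(current.toString()); // complete record
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The second record is missing a colon, and the comma before it is missing too.
        String json = "[{\"A\":1,\"B\":2,\"C\":3}{\"A\" 4,\"B\":5,\"C\":6},{\"A\":7,\"B\":8,\"C\":9}]";
        forEachTopLevelObject(new StringReader(json), System.out::println);
    }
}
```

Because it reads from a Reader one character at a time (wrap the file in a BufferedReader), it does not need the whole file in memory. Records that fail to parse should be logged rather than silently dropped; otherwise you trade garbage in the database for silently missing data.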
My question seems very similar to this question, but what happens if there are duplicate keys in the JSON file?
The duplicate keys are in the JSON file because the file contents originate from Postgres, which allows duplicate keys to be inserted in older JSON-format files.
My input looks like this:
{
"61":{"value":5,"to_value":5},
"58":{"r":0,"g":0,"b":255}, "58":{"r":165,"g":42,"b":42},"58":{"r":0,"g":255,"b":0},
"63":{"r":0,"g":0,"b":0},
"57":{"r":0,"g":0,"b":255},"57":{"r":0,"g":255,"b":0}
}
If you look carefully, "58" appears several times as a key. The top-level keys such as "61" and "58" are mapped to nested objects with different keys.
To put what I want to achieve simply, the output for the above input JSON should look like this.
Either an approach or a full solution is equally appreciated, in Java only.
{
"61":[5,5],
"58": [{"r":0,"g":0,"b":255},{"r":165,"g":42,"b":42},{"r":0,"g":255,"b":0}],
"63":[{"r":0,"g":0,"b":0}],
"57":[{"r":0,"g":0,"b":255},{"r":0,"g":255,"b":0}]
}
A good tool for parsing JSON is this library: JSONObject
Here is an example of its usage in a previous SO question:
Parsing JSON which contains duplicate keys
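If the parser you use either keeps only the last value for a repeated key or rejects the document, a small hand-written parser can keep every occurrence instead. The sketch below uses only the JDK and is not the JSONObject approach from the answer. It assumes the input is otherwise valid JSON, so a typo such as a key's missing closing quote must be fixed first, and it deliberately leaves out arrays and unicode escapes to stay short. Each top-level key maps to a List of whole values; nested objects keep the usual last-wins behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateKeyJson {
    private final String s;
    private int pos;

    private DuplicateKeyJson(String s) {
        this.s = s;
    }

    // Top level: every value of a repeated key is kept in a List, in document order.
    static Map<String, Object> parseTopLevel(String json) {
        return new DuplicateKeyJson(json).parseObject(true);
    }

    // keepAll collects values into Lists; otherwise a later duplicate key replaces the value.
    @SuppressWarnings("unchecked")
    private Map<String, Object> parseObject(boolean keepAll) {
        expect('{');
        Map<String, Object> obj = new LinkedHashMap<>();
        if (peek() != '}') {
            do {
                String key = parseString();
                expect(':');
                Object value = parseValue();
                if (keepAll) {
                    ((List<Object>) obj.computeIfAbsent(key, k -> new ArrayList<>())).add(value);
                } else {
                    obj.put(key, value);
                }
            } while (consumeComma());
        }
        expect('}');
        return obj;
    }

    // Objects -> LinkedHashMap, numbers -> Double, plus String, Boolean, null (no arrays).
    private Object parseValue() {
        char c = peek();
        if (c == '{') {
            return parseObject(false);
        }
        if (c == '"') {
            return parseString();
        }
        for (String word : new String[] {"true", "false", "null"}) {
            if (s.startsWith(word, pos)) {
                pos += word.length();
                return word.equals("null") ? null : Boolean.valueOf(word);
            }
        }
        int start = pos;
        while (pos < s.length() && "+-0123456789.eE".indexOf(s.charAt(pos)) >= 0) {
            pos++;
        }
        return Double.valueOf(s.substring(start, pos)); // NumberFormatException if not a number
    }

    // Decodes the simple backslash escapes; unicode escapes are not decoded in this sketch.
    private String parseString() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        for (char c = s.charAt(pos++); c != '"'; c = s.charAt(pos++)) {
            if (c == '\\') {
                char e = s.charAt(pos++);
                c = e == 'n' ? '\n' : e == 't' ? '\t' : e == 'r' ? '\r' : e == 'b' ? '\b' : e == 'f' ? '\f' : e;
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Skips whitespace; throws StringIndexOutOfBoundsException at end of input.
    private char peek() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) {
            pos++;
        }
        return s.charAt(pos);
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        }
        pos++;
    }

    private boolean consumeComma() {
        if (peek() != ',') {
            return false;
        }
        pos++;
        return true;
    }
}
```

For the sample input, the List stored under "58" would hold three maps in document order, and "61" would hold a single map {value=5.0, to_value=5.0}. Producing "61":[5,5] as in the desired output would be an additional step over that map's values.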
I have a JSON string that looks like:
"{\"info\":{\"length\":{\"value\":18},\"name\":{\"value\":\"ABC\"}}}"
Say length and name are attribute names.
I have another map (say attributeMap), created from the results I retrieve from the database, that stores each attribute name with its attribute value.
I need to be able to parse the string and compare the value each attribute has in the above string with the value returned from attributeMap. Based on those comparisons, I will need to make some decisions.
In order to do this, I should convert the above string to a format that makes this comparison easier and more efficient. I don't think I should be writing my own parser for this. What would be the right way to do it?
You should use an existing JSON parser, such as Gson (Google; recommended for simplicity), Jackson, the simple org.json, or any other.
Then you will get a JSONObject (org.json), JsonObject (Gson) or JsonNode (Jackson) to navigate and do the comparison.
You can find a parsing example here: How to parse JSON in Java
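Once a library has parsed the string, the comparison itself only needs plain Maps. The sketch below assumes the JSON has already been turned into nested java.util.Map objects, for example with Jackson's ObjectMapper.readValue(json, Map.class) or Gson's fromJson(json, Map.class), and that you have taken the "info" entry out of the result. The names compare and sameValue are hypothetical, and the database values in main are made up.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class AttributeCompare {

    // For each attribute under "info" (already parsed into nested Maps by a JSON
    // library), reports whether its "value" equals the entry in attributeMap.
    static Map<String, Boolean> compare(Map<String, ?> info, Map<String, ?> attributeMap) {
        Map<String, Boolean> matches = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : info.entrySet()) {
            Object parsed = entry.getValue() instanceof Map
                    ? ((Map<?, ?>) entry.getValue()).get("value")
                    : null;
            matches.put(entry.getKey(), sameValue(parsed, attributeMap.get(entry.getKey())));
        }
        return matches;
    }

    // Numbers are compared by numeric value, because parsers differ in the Number
    // type they produce (for example Integer or Double for 18).
    static boolean sameValue(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue()) == 0;
        }
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        // Hand-built equivalent of the "info" object from the question.
        Map<String, Object> info = new LinkedHashMap<>();
        info.put("length", Collections.singletonMap("value", 18));
        info.put("name", Collections.singletonMap("value", "ABC"));
        // Hypothetical values as they might come back from the database.
        Map<String, Object> attributeMap = new LinkedHashMap<>();
        attributeMap.put("length", 18.0);
        attributeMap.put("name", "XYZ");
        System.out.println(compare(info, attributeMap)); // {length=true, name=false}
    }
}
```

Returning a name-to-boolean map keeps the decision logic separate from the parsing, so the same comparison works whichever parser you pick.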
This is my XML file:
<waveform>
<Ivalue>12,13,14,15,16,17,18</Ivalue>
<IIvalue>1,4,15,23,22,44</IIvalue>
</waveform>
<waveform>
<Ivalue>12,13,14,15,16,17,18</Ivalue>
<IIvalue>1,4,15,23,22,44</IIvalue>
</waveform>
I know how to retrieve the values by tag, but is it possible to store these values in separate int[] arrays?
Thanks
You can use JAXB to extract the content of tags like Ivalue as a String.
To my knowledge, it is at least not easy to get it directly as an int array with JAXB.
However, it is easy to split the string using String.split and convert each part with Integer.parseInt.
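A sketch of the split-and-parse step, plus reading the tags with the JDK's built-in DOM parser rather than JAXB (JAXB is no longer bundled with the JDK from Java 11 onward, so DOM avoids an extra dependency). The helper names toIntArray and readTag are hypothetical. Since the file above has two top-level waveform elements, which is not well-formed XML on its own, the example wraps them in a hypothetical waveforms root element.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class WaveformValues {

    // "12,13,14" -> {12, 13, 14}: String.split, then Integer.parseInt on each part.
    static int[] toIntArray(String csv) {
        String trimmed = csv.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] parts = trimmed.split(",");
        int[] values = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i].trim());
        }
        return values;
    }

    // Returns one int[] per element with the given tag name, in document order.
    static List<int[]> readTag(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            List<int[]> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(toIntArray(nodes.item(i).getTextContent()));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String waveform = "<waveform><Ivalue>12,13,14,15,16,17,18</Ivalue>"
                + "<IIvalue>1,4,15,23,22,44</IIvalue></waveform>";
        // Hypothetical <waveforms> root so that the document is well-formed.
        String xml = "<waveforms>" + waveform + waveform + "</waveforms>";
        for (int[] values : readTag(xml, "Ivalue")) {
            System.out.println(Arrays.toString(values)); // [12, 13, 14, 15, 16, 17, 18]
        }
    }
}
```

On Java 8 or later, Arrays.stream(s.split(",")).map(String::trim).mapToInt(Integer::parseInt).toArray() is an equivalent one-liner for the conversion step, and it works the same way on a String obtained through JAXB.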
Does anybody know an easy way to insert a JSON string into Cassandra?
Suppose I have a JSON string like this: {'key1':'val1','key2':'val2'}
In MongoDB we can insert a JSON string directly, like dbobj.insert(jsonstring);
So is there any way to do something like this in Cassandra? (I am coding in Java.)
There are at least 3 ways, but it depends on what you are trying to achieve and what kinds of queries you want to run.
You could store the JSON string as just a plain string/byte, as a Cassandra column name (assuming there is something you can use as the row key). You won't be able to do queries based on the JSON content, though; this would be opaque data that you process client-side.
You could split up the JSON before storage, so that key1, key2 are column names and val1, val2 are the corresponding column values. Again, you'd need something to use as a row key. This method would let you retrieve individual values, and use secondary indexes to retrieve rows with particular values.
You could even use key1, key2 as row keys, with val1, val2 as column names. Given that you have the key-val pairs grouped in JSON, they presumably belong to the same entity and are related, so this is unlikely to be useful, but I mention it for completeness.
Edited to add: If your question is actually how to insert data into Cassandra at all, then you should read the docs for a Java client such as Hector (there are other options too - see http://wiki.apache.org/cassandra/ClientOptions)
You can try this (it requires Cassandra 2.2 or later, which added JSON support):
INSERT INTO users JSON '{"id": "user123", "age": 42, "state": "TX"}';
Reference:
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-2-json-support
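When the statement is built as a string in Java, the JSON sits inside a CQL string literal, so any single quote in it has to be escaped by doubling it. The JSON also has to be valid JSON with double-quoted keys and strings: the question's {'key1':'val1','key2':'val2'} is not valid JSON and should be written as {"key1":"val1","key2":"val2"}. A minimal JDK-only sketch follows; insertJson is a hypothetical helper, not part of any driver, and the table name is assumed to be trusted.

```java
final class CqlJsonInsert {

    // Builds an "INSERT INTO <table> JSON '<json>';" statement (Cassandra 2.2+).
    // Single quotes inside the JSON are doubled, which is how a CQL string literal
    // escapes them. The table name is assumed to be trusted and is not escaped.
    static String insertJson(String table, String json) {
        return "INSERT INTO " + table + " JSON '" + json.replace("'", "''") + "';";
    }

    public static void main(String[] args) {
        // JSON requires double-quoted names and string values.
        String json = "{\"id\": \"user123\", \"age\": 42, \"state\": \"TX\"}";
        System.out.println(insertJson("users", json));
        // prints: INSERT INTO users JSON '{"id": "user123", "age": 42, "state": "TX"}';
    }
}
```

With a driver, passing the JSON through a bind marker in a prepared statement (INSERT INTO users JSON ?) avoids the manual escaping altogether; CQL accepts a bind marker in that position, but check your driver's documentation for the exact binding call.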