JSON sorting in Java

I have a huge JSON file (400MB) and I want to sort it by timestamp. Do you have any idea how I can do this?
I created a program with a loop that sorts small files, but my file is too big, and I ended up with an infinite loop.

Use the big-sorter library, as below:
Sorter
    .serializer(Serializer.jsonArray())
    .comparator((x, y) ->
        x.get("time").asText().compareTo(y.get("time").asText()))
    .input(new File("input.json"))
    .output(new File("sorted.json"))
    .sort();
With this method I sorted 10 million records in a 440MB file in 54s with max heap set at 64MB (-Xmx64m).

I would create an object T implementing Comparable, where T represents an entry in the JSON file.
I would then load the JSON into a list of objects of type T using Gson.
Then you can call Collections.sort on your list.
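A minimal sketch of that approach, using a hypothetical Entry class standing in for the real record type (with Gson you would obtain the list via fromJson rather than building it by hand). Note that this loads everything into memory, so it suits files much smaller than the 400MB one in the question:

```java
import java.util.*;

public class SortByTimestamp {
    // Hypothetical type representing one JSON entry; with Gson you would
    // populate a list of these via new Gson().fromJson(reader, Entry[].class).
    static class Entry implements Comparable<Entry> {
        final String time;
        final String payload;
        Entry(String time, String payload) { this.time = time; this.payload = payload; }
        @Override public int compareTo(Entry other) {
            // ISO-8601 timestamps sort correctly as plain strings
            return this.time.compareTo(other.time);
        }
    }

    static List<Entry> sortByTime(List<Entry> entries) {
        Collections.sort(entries); // uses compareTo above
        return entries;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>(List.of(
            new Entry("2021-03-02T10:00:00Z", "b"),
            new Entry("2021-03-01T09:00:00Z", "a")));
        sortByTime(entries);
        System.out.println(entries.get(0).payload); // a
    }
}
```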

Related

Efficiently copy large timeseries results in Java

I am querying data from a time-series database (Influx in my case) using Java.
I have approximately 20,000-100,000 values (Strings) in the database.
Mapping the results that I get via the Influx Java API to my domain objects seems to be very inefficient (about 0.5 s on a small machine).
I suppose this is due to the "resource-intensive" object creation of the domain objects.
I am currently using StreamsAPI:
QueryResult series = result.getResults().get(0).getSeries().get(0);
List<ItemHistoryEntity> mappedList = series.getValues().stream().parallel().map(valueList ->
new ItemHistoryEntity(valueList)).collect(Collectors.toList());
Unfortunately, downsampling my data at the database is not an option in my case.
How can I do this more efficiently in Java?
EDIT:
The next thing I will do with the list is downsampling. The problem is that for further downsampling I need the oldest timestamp in the list, and to get this timestamp I need to iterate over the full list. Would it be more efficient never to call Collectors.toList() until I have reduced the size of the list, even though I then need to iterate it at least twice? Or should I find the oldest timestamp using an additional DB query, then iterate the list only once and call the collector only for the reduced list?
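One option the edit hints at is doing the mapping and the minimum search in a single pass, so the mapped list is only traversed once. A sketch, with hypothetical stand-in types for the raw Influx rows and the domain entity:

```java
import java.util.*;

public class SinglePassMapping {
    // Hypothetical stand-in for ItemHistoryEntity from the question.
    record HistoryEntry(long timestamp, String value) {}

    // Maps raw rows to entities and tracks the oldest timestamp in the same
    // loop, avoiding a second full iteration over the mapped list.
    static long mapAndFindOldest(List<Object[]> rows, List<HistoryEntry> out) {
        long oldest = Long.MAX_VALUE;
        for (Object[] row : rows) {
            HistoryEntry e = new HistoryEntry((Long) row[0], (String) row[1]);
            out.add(e);
            if (e.timestamp() < oldest) oldest = e.timestamp();
        }
        return oldest;
    }

    public static void main(String[] args) {
        List<Object[]> rows = List.of(
            new Object[]{5L, "a"}, new Object[]{2L, "b"}, new Object[]{9L, "c"});
        List<HistoryEntry> mapped = new ArrayList<>();
        long oldest = mapAndFindOldest(rows, mapped);
        System.out.println(oldest); // 2
    }
}
```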

(Java) Saving/Loading an ArrayList using Properties

I have been researching different methods for saving and loading configuration settings for my application. I've looked into Preferences, JSON, Properties and XML but I think I've settled on using the Properties method for most of my application settings.
However, I'm not able to find any information on how to best save and load an ArrayList from that file. It seems only individual key/value string pairs are possible.
So my question is basically, is there a better way to do this? I have an ArrayList of Strings in my application that I need to be able to save and load. Can this be done with Properties or do I need to use a separate file just to hold this list and then read it in as an ArrayList (per line, perhaps)?
EDIT: I should mention, I would like to keep all config files as readable text so I am avoiding using Serialization.
You can use commas to place multiple values on the same key.
key:value1,value2,value3
Then split them after reading them in, using String's split function, which will give you a String[] array that can be turned into an ArrayList via Arrays.asList().
Here's a partial MCVE:
ArrayList<String> al = new ArrayList<>();
al.add("value1");
al.add("value2");
al.add("value3");
String values = al.toString();
// Substring used to get rid of "[" and "]"
prop.setProperty("name", values.substring(1, values.length() - 1));
Note that toString() separates elements with a comma and a space, so trim each value when splitting the string back apart.
I found that using the following combination worked perfectly in my case.
Save:
String csv = String.join(",", arrayList);
props.setProperty("list", csv);
This will create a String containing each element of the ArrayList, separated with a comma.
Load:
arrayList = Arrays.asList(csv.split(","));
Takes the csv String and splits it at each comma, adding the elements to the arrayList reference. (Note that Arrays.asList returns a fixed-size list; wrap it in new ArrayList<>(...) if you need to modify it afterwards.)
I've seen two approaches for writing lists to a Properties file. One is to store each element of the list as a separate entry by adding the index to the name of the property—something like "mylist.1", "mylist.2". The other is to make a single value of the elements, separated by a delimiter.
The advantage of the first method is that you can handle any value without worrying about what to do if the value contains the delimiter. The advantage of the second is that you can retrieve the whole list without iterating over all entries in the Properties.
In either case, you probably want to write a wrapper (or find a library) around the Properties object that adds methods to store and retrieve lists using whichever scheme you choose. Often these wrappers have methods to validate and convert other common data types, like numbers and URLs.
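A minimal sketch of such a wrapper using the first (indexed-key) scheme, with a hypothetical naming convention of name.0, name.1, and so on; unlike the delimiter approach, values containing commas survive unchanged:

```java
import java.util.*;

public class PropertiesListUtil {
    // Stores each element under an indexed key: mylist.0, mylist.1, ...
    // This avoids choosing a delimiter that might occur inside a value.
    static void putList(Properties props, String name, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            props.setProperty(name + "." + i, values.get(i));
        }
    }

    // Reads consecutive indexed keys until the first gap.
    static List<String> getList(Properties props, String name) {
        List<String> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            String v = props.getProperty(name + "." + i);
            if (v == null) break;
            result.add(v);
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        putList(props, "mylist", List.of("a,b", "c")); // comma in value is fine
        System.out.println(getList(props, "mylist")); // [a,b, c]
    }
}
```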

How to process JSON in Java

I have a .JSON file that has content as:
"Name":"something"
"A":10
"B": 12
"Name":"something else"
"A":5
"B":9
....
I want to read this file and then find which of these objects has the most parts (the sum of A + B). What would be a good approach to that? I thought about reading the JSON data into a Map, then going through each object of the Map, finding its total A + B, and storing that in another list of (name, sum) pairs. After that I would sort that list and find out which entry has the largest A + B. Would this be a good solution?
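Assuming the file has first been turned into valid JSON and parsed (e.g. with Jackson or Gson) into one map per object, no intermediate sorted list is needed: a single pass with max suffices. A sketch with the parsing step stubbed out as hand-built maps:

```java
import java.util.*;

public class MaxParts {
    // Assumes each JSON object has already been parsed into a Map
    // (e.g. via Jackson's ObjectMapper or Gson); finds the object
    // with the largest A + B in one pass.
    static Map<String, Object> maxByParts(List<Map<String, Object>> objects) {
        return objects.stream()
            .max(Comparator.comparingInt(
                (Map<String, Object> o) -> (int) o.get("A") + (int) o.get("B")))
            .orElseThrow();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> objects = List.of(
            Map.of("Name", "something", "A", 10, "B", 12),
            Map.of("Name", "something else", "A", 5, "B", 9));
        System.out.println(maxByParts(objects).get("Name")); // something
    }
}
```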

How can I improve performance of string processing with less memory?

I'm implementing this in Java.
Symbol file     Store data file
1\item1         10\storename1
10\item20       15\storename6
11\item6        15\storename9
15\item14       1\storename250
5\item5         1\storename15
The user will search store names using wildcards like storename?
My job is to search the store names and produce a full string using symbol data. For example:
item20-storename1
item14-storename6
item14-storename9
My approach is:
reading the store data file line by line
if a line contains a match for the search string (like storename?), I push that line to an intermediate store result file
I also copy the itemno of a matching storename into an arraylist (like 10, 15)
when the arraylist's size % 100 == 0, I remove duplicate item numbers using a hashset, reducing the arraylist size significantly
when the arraylist size > 1000:
sort that list using Collections.sort(itemno_arraylist)
open the symbol file & start reading it line by line
for each line, call Collections.binarySearch(itemno_arraylist, itemno)
if it matches, push the result to an intermediate symbol result file
continue with step 1 until EOF of the store data file
...
After all of this I would combine two result files (symbol result file & store result file) to present actual strings list.
This approach works, but it consumes a lot of CPU time and main memory.
I want to know a better solution with reduced CPU time (currently 2 min) & memory (currently 80MB). There are many collection classes available in Java. Which one would give a more efficient solution for this kind of huge string-processing problem?
Any thoughts on this kind of string-processing problem in Java would be great and helpful.
Note: Both files would be nearly a million lines long.
Replace the two flat files with an embedded database (there's plenty of them, I used SQLite and Db4O in the past): problem solved.
So you need to replace 10\storename1 with item20-storename1 because the symbol file contains 10\item20. The obvious solution is to load the symbol file into a Map:
String[] tokens = symbolFile.readLine().split("\\\\"); // "\\\\" matches a single backslash
map.put(tokens[0], tokens[1]);
Then read the store file line by line and replace:
String[] tokens = storeFile.readLine().split("\\\\");
output.println(map.get(tokens[0]) + '-' + tokens[1]);
This is the fastest method, though it still uses a lot of memory for the map. You can reduce the memory by storing the map in a database, but this would increase the time significantly.
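Putting the two snippets together, a self-contained sketch of the map-based join (operating on in-memory lists of lines here for brevity; the real code would stream both files with a BufferedReader):

```java
import java.util.*;

public class SymbolJoin {
    // Joins symbol lines ("10\item20") with store lines ("10\storename1")
    // into "item20-storename1" via a lookup map keyed on the item number.
    static List<String> join(List<String> symbolLines, List<String> storeLines) {
        Map<String, String> symbols = new HashMap<>();
        for (String line : symbolLines) {
            String[] tokens = line.split("\\\\"); // regex for a single backslash
            symbols.put(tokens[0], tokens[1]);
        }
        List<String> out = new ArrayList<>();
        for (String line : storeLines) {
            String[] tokens = line.split("\\\\");
            out.add(symbols.get(tokens[0]) + "-" + tokens[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> result = join(
            List.of("10\\item20", "15\\item14"),
            List.of("10\\storename1", "15\\storename6"));
        System.out.println(result); // [item20-storename1, item14-storename6]
    }
}
```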
If your input data file does not change frequently, then parse the file once and put the data into a List of a custom class, e.g. FileStoreRecord, mapping each record in the file. Define an equals method on your custom class. Perform all subsequent steps over the List; e.g. for search, you can call the contains method, passing the search string in the form of a FileStoreRecord object.
If the file changes over time, you may want to refresh the List after a certain interval, or keep track of the list's creation time and compare it against the file's update timestamp before using it; if they differ, recreate the list. Another way to manage the file check would be a thread that continuously polls the file and, the moment it is updated, notifies you to refresh the list.
Is there any limitation preventing you from using a Map?
You can add items to a Map, and then you can search it easily.
One million records means 1M * recordSize, so it should not be a problem.
Map<Integer, Item> itemMap = new HashMap<>();
...
Item item = itemMap.get(store.getItemNo());
But the best solution would be to use a database.

Help needed in creating a hashset from a hashmap

I've been able to read a four column text file into a hashmap and get it to write to a output file. However, I need to get the second column(distinct values) into a hashset and write to the output file. I've been able to create the hashset, but it is grabbing everything and not sorting. By the way I'm new, so please take this into consideration when you answer. Thanks
Neither HashSet nor HashMap are meant to sort. They're fundamentally unsorted data structures. You should use an implementation of SortedSet, such as TreeSet.
Some guesses, related to Mr. Skeet's answer and your apparent confusion...
Are you sure you are not inserting the whole line into the TreeSet? If you are going to use ONLY the second column, you will need to split() the strings (representing the lines) into columns; that's not done automatically.
Also, if you are actually trying to sort the whole file using the second column as the key, you will need a TreeMap instead, with the 2nd column as the key and the whole line as the data. But that won't solve the splitting; it only keeps the relation between the line and the key.
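A minimal sketch of that TreeMap idea, assuming whitespace-separated columns and unique values in the second column (duplicate keys would overwrite each other):

```java
import java.util.*;

public class SortBySecondColumn {
    // Sorts whole lines by their second whitespace-separated column:
    // the TreeMap keeps its entries ordered by key, so iterating the
    // values yields the lines in sorted order.
    static Collection<String> sortLines(List<String> lines) {
        TreeMap<String, String> byKey = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.split("\\s+");
            byKey.put(cols[1], line); // second column as key, full line as value
        }
        return byKey.values();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1 banana x y", "2 apple x y");
        System.out.println(sortLines(lines)); // [2 apple x y, 1 banana x y]
    }
}
```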
Edit: Here is some terminology for you, you might need it.
You have a Set. It's a collection of other objects, like Strings. You add other objects to it, and then you can fetch all the objects in it by iterating through the set. Adding is done through the method add(), and iterating can be done using the enhanced for-loop syntax or the iterator() method.
The set doesn't "grab" or "take" stuff; you add something to the set, in this case a String, not an array of Strings (which is written as String[]).
(It's apparently possible to add an array to a TreeSet (they are objects too), but the order is not related to the contents of the Strings. Maybe that's what you are doing.)
String key = splittedLine[1]; // 2:nd element
"The second element of the keys" doesn't make sense at all. And what's the duplicates you're talking about. (note the correct use of apostrophes... :-)
