Create a dynamic Mongo query in Java

I'm migrating from MongoDB with Hibernate OGM & ORM to the 'pure' Java MongoDB driver (org.mongodb:mongodb-driver-core:4.4.0), because "Hibernate OGM is not going to work with ORM 5.5 (the latest version requires ORM 5.3)". See:
How to use Hibernate ORM 5.5.x.Final with Jakarta 9 on wildfly-preview-25.0.0.Final
I now want to create a 'dynamic' version of a query, built from any number of conditions (say 1 to 99), that returns a FindIterable<Document>, as I did something similar with Hibernate OGM & ORM:
if (MotorcycleController.motorcycleManufacturers.length > MotorcyclesEJB.ZERO) {
    stringBuilderSQL.append(WHERE);
    stringBuilderSQL.append(OPEN_BRACKET);
    for (int x = MotorcyclesEJB.ZERO; x < MotorcycleController.motorcycleManufacturers.length; x++) {
        stringBuilderSQL.append(MotorcyclesEJB.MANUFACTURER);
        stringBuilderSQL.append(MotorcyclesEJB.EQUALS);
        stringBuilderSQL.append(MotorcyclesEJB.SINGLE_QUOTE);
        stringBuilderSQL.append(MotorcycleController.motorcycleManufacturers[x]);
        stringBuilderSQL.append(MotorcyclesEJB.SINGLE_QUOTE);
        if ((x + ONE) < MotorcycleController.motorcycleManufacturers.length) {
            stringBuilderSQL.append(MotorcyclesEJB.OR);
        }
    }
    stringBuilderSQL.append(CLOSE_BRACKET); // appended once, after the loop, to close the bracket
}
I can create a (static) version matching multiple manufacturers against the MongoDB collection using:
FindIterable<Document> motorcycleApriliaMotoGuzzi = mongoCollectionMotorcycleManufacturer.find(or(eq("manufacturer", "Aprilia"), eq("manufacturer", "Moto Guzzi")));
which shows results like this (one example):
INFO [com.gostophandle.ejb.MongoDBEJB] (ServerService Thread Pool -- 97) >>>>> motorcycleApriliaMotoGuzzi = Document{{_id=61d70d6a8c9e88075702af3e, manufacturer=Aprilia, model=RS 660, modelType=E5, typesOf=Sport, dateProductionStarted=Fri Jan 01 00:00:00 GMT 2021, dateProductionEnded=Fri Jan 01 00:00:00 GMT 2021, engine=Document{{type=Four-Stroke, displacement=659.0, cylinder=2.0, capacityUnit=cc, carburation=, bore=0.0, boreMeasurement=mm, stroke=0.0, strokeMeasurement=mm, distribution=, maxiumPowerHp=0.0, maxiumPowerKilowatt=0.0, maxiumPowerRpm=0.0, maximumTorque=0.0, maximumTorqueUnit=Nm, maximumTorqueRpm=0.0}}, performance=Document{{topSpeedMph=105.0, topSpeedKph=0.0, accelleration30Mph=0.0, accelleration60Mph=0.0, accelleration100Mph=0.0, accelleration30Kph=0.0, accelleration60Kph=0.0, accelleration100Kph=0.0}}, dimensionsWeights=Document{{batteryCapacity=, casterAngleDegrees=0.0, dimensionsL=0.0, dimensionsW=0.0, dimensionsH=0.0, frameType=, fuelTankCapacityLitres=0.0, fuelConsumption=0.0, groundClearance=0.0, kerbWeight=0.0, seatHeight=0.0, trail=0.0, wheelbase=0.0}}, chassisBrakesSuspensionWheels=Document{{frame=1, swingarm=2, absSystem=3, frontBrakes=4, rearBrakes=5, frontSuspension=6, rearSuspension=7, tyresFront=8, tyresRear=9, frontTyre=10, rearTyre=11, frontWheel=12, rearWheel=13, instrumentDisplayFunctions=14}}, transmission=Document{{clutch=1, clutchOperation=2, finalDrive=3, gearbox=4, transmissionType=5, primaryReduction=0.0, gearRatios1st=0.0, greaRatios2nd=0.0, gearRatios3rd=0.0, gearRatios4th=0.0, gearRatios5th=0.0, gearRatios6th=0.0}}, instruments=Document{{headlights=1, socket=2, ignitionSystem=3, instruments=4, tailLight=5, usbSocket=6}}, electrics=Document{{}}, colours=[Document{{colour=Acid Gold}}, Document{{colour=Lava Red}}, Document{{colour=Apex Black}}], accessories=[], image=Document{{file=/Users/NOTiFY/IdeaProjects/GoStopHandle/images, url=/Aprilia/2021/, png=ap6115200ebm03-01-m.webp, dimensionsWidth=1500, dimensionsHeight=1000}}}}
I can't get it to create a dynamic version using 'find', 'or' & 'eq' etc.
Any suggestions? TIA.

There are two Filters methods for constructing the Bson for OR:
Filters.or(Bson...)
Filters.or(Iterable<Bson>)
Using the latter, you can construct the Bson for each of the conditions that you want to OR together, collect them in a List, and then pass that list to that method to construct the Bson for the OR. I guess this is really an IN operation, because the conditions all test the same field, but for demonstration purposes:
import java.util.List;
import java.util.stream.Collectors;
import org.bson.conversions.Bson;
import com.mongodb.client.model.Filters;

public Bson or(String field, List<String> values) {
    // Build an eq(field, value) filter per value, then OR them all together.
    return Filters.or(
            values.stream()
                  .map(v -> Filters.eq(field, v))
                  .collect(Collectors.toList()));
}
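Since all the conditions test the same field, the same match can also be expressed directly with Filters.in, which accepts the field name and an Iterable of values (a minimal equivalent sketch, not part of the original answer):
import java.util.List;
import org.bson.conversions.Bson;
import com.mongodb.client.model.Filters;

// Matches documents whose "field" value is any of the given values,
// equivalent to OR-ing an eq() filter per value.
public Bson in(String field, List<String> values) {
    return Filters.in(field, values);
}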

@vsfDawg - Perfect
List<String> stringList = new ArrayList<>();
stringList.add("Aprilia");
stringList.add("Moto Guzzi");
Bson bson = or("manufacturer", stringList);
MongoCursor<Document> cursor = mongoCollectionMotorcycles.find(bson).iterator();
try {
    while (cursor.hasNext()) {
        LOGGER.info(">>>>> 6.4 motorcycleApriliaMotoGuzzi = {}", cursor.next());
    }
} finally {
    cursor.close();
}
public Bson or(String field, List<String> values) {
    return Filters.or(
            values.stream()
                  .map(v -> Filters.eq(field, v))
                  .collect(Collectors.toList()));
}
Displays data:
INFO [com.gostophandle.ejb.MotorcyclesEJB] (default task-1) >>>>> 6.4 motorcycleApriliaMotoGuzzi = Document{{_id=61d70d6a8c9e88075702af3e, manufacturer=Aprilia, model=RS 660, modelType=E5, typesOf=Sport, dateProductionStarted=Fri Jan 01 00:00:00 GMT 2021, dateProductionEnded=Fri Jan 01 00:00:00 GMT 2021, engine=Document{{type=Four-Stroke, displacement=659.0, cylinder=2.0, capacityUnit=cc, carburation=, bore=0.0, boreMeasurement=mm, stroke=0.0, strokeMeasurement=mm, distribution=, maxiumPowerHp=0.0, maxiumPowerKilowatt=0.0, maxiumPowerRpm=0.0, maximumTorque=0.0, maximumTorqueUnit=Nm, maximumTorqueRpm=0.0}}, performance=Document{{topSpeedMph=105.0, topSpeedKph=0.0, accelleration30Mph=0.0, accelleration60Mph=0.0, accelleration100Mph=0.0, accelleration30Kph=0.0, accelleration60Kph=0.0, accelleration100Kph=0.0}}, dimensionsWeights=Document{{batteryCapacity=, casterAngleDegrees=0.0, dimensionsL=0.0, dimensionsW=0.0, dimensionsH=0.0, frameType=, fuelTankCapacityLitres=0.0, fuelConsumption=0.0, groundClearance=0.0, kerbWeight=0.0, seatHeight=0.0, trail=0.0, wheelbase=0.0}}, chassisBrakesSuspensionWheels=Document{{frame=1, swingarm=2, absSystem=3, frontBrakes=4, rearBrakes=5, frontSuspension=6, rearSuspension=7, tyresFront=8, tyresRear=9, frontTyre=10, rearTyre=11, frontWheel=12, rearWheel=13, instrumentDisplayFunctions=14}}, transmission=Document{{clutch=1, clutchOperation=2, finalDrive=3, gearbox=4, transmissionType=5, primaryReduction=0.0, gearRatios1st=0.0, greaRatios2nd=0.0, gearRatios3rd=0.0, gearRatios4th=0.0, gearRatios5th=0.0, gearRatios6th=0.0}}, instruments=Document{{headlights=1, socket=2, ignitionSystem=3, instruments=4, tailLight=5, usbSocket=6}}, electrics=Document{{}}, colours=[Document{{colour=Acid Gold}}, Document{{colour=Lava Red}}, Document{{colour=Apex Black}}], accessories=[], image=Document{{file=/Users/NOTiFY/IdeaProjects/GoStopHandle/images, url=/Aprilia/2021/, png=ap6115200ebm03-01-m.webp, dimensionsWidth=1500, dimensionsHeight=1000}}}}
12:42:59,335 INFO [com.gostophandle.ejb.MotorcyclesEJB] (default task-1) >>>>> 6.4 motorcycleApriliaMotoGuzzi = Document{{_id=61d70d6a8c9e88075702af58, manufacturer=Moto Guzzi, model=Le Mans, modelType=I, typesOf=Sport, dateProductionStarted=Thu Jan 01 00:00:00 GMT 1976, dateProductionEnded=Sat Jan 01 00:00:00 GMT 1977, engine=Document{{type=Four-Stroke, displacement=850.0, cylinder=2.0, capacityUnit=cc, carburation=null, bore=80.0, boreMeasurement=mm, stroke=74.0, strokeMeasurement=mm, distribution=null, maxiumPowerHp=85.0, maxiumPowerKilowatt=38.0, maxiumPowerRpm=6200.0, maximumTorque=60.0, maximumTorqueUnit=Nm, maximumTorqueRpm=4900.0}}, electrics=Document{{}}, colours=[Document{{colour=Red}}, Document{{colour=Silver Blue}}], accessories=[Document{{productNumber=MG0123456789, productName=Product 1}}, Document{{productNumber=MG0123456789, productName=Product 2}}], image=Document{{file=/Users/NOTiFY/IdeaProjects/GoStopHandle/images, url=/MotoGuzzi/1976/, png=motorcycle.png, dimensionsWidth=900, dimensionsHeight=440}}}}
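A side note: MongoCursor implements Closeable, so the try/finally above can be shortened with try-with-resources (a sketch using the same bson and collection names as above):
// MongoCursor is Closeable, so try-with-resources closes it automatically.
try (MongoCursor<Document> cursor = mongoCollectionMotorcycles.find(bson).iterator()) {
    while (cursor.hasNext()) {
        LOGGER.info(">>>>> 6.4 motorcycleApriliaMotoGuzzi = {}", cursor.next());
    }
}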

Related

Dataset api of Spark giving different result as compare to Dataframe

I am using Spark 2.1 and have one Hive table in ORC format; the schema is as follows.
col_name data_type
tuid string
puid string
ts string
dt string
source string
peer string
# Partition Information
# col_name data_type
dt string
source string
peer string
# Detailed Table Information
Database: test
Owner: test
Create Time: Tue Nov 22 15:25:53 GMT 2016
Last Access Time: Thu Jan 01 00:00:00 GMT 1970
Location: hdfs://apps/hive/warehouse/nis.db/dmp_puid_tuid
Table Type: MANAGED
Table Parameters:
transient_lastDdlTime 1479828353
SORTBUCKETCOLSPREFIX TRUE
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Storage Desc Parameters:
serialization.format 1
When I apply a filter on this table using the partition columns, it works fine and reads only the specific partitions.
val puid = spark.read.table("nis.dmp_puid_tuid")
  .as(Encoders.bean(classOf[DmpPuidTuid]))
  .filter( """peer = "AggregateKnowledge" and dt = "20170403"""")
and this is my physical plan for this query
== Physical Plan ==
HiveTableScan [tuid#1025, puid#1026, ts#1027, dt#1022, source#1023, peer#1024], MetastoreRelation nis, dmp_puid_tuid, [isnotnull(peer#1024), isnotnull(dt#1022),
(peer#1024 = AggregateKnowledge), (dt#1022 = 20170403)]
but when I use the code below, it reads the entire table into Spark:
val puid = spark.read.table("nis.dmp_puid_tuid")
  .as(Encoders.bean(classOf[DmpPuidTuid]))
  .filter( tp => tp.getPeer().equals("AggregateKnowledge") && Integer.valueOf(tp.getDt()) >= 20170403)
Physical plan for above dataframe
== Physical Plan ==
*Filter <function1>.apply
+- HiveTableScan [tuid#1058, puid#1059, ts#1060, dt#1055, source#1056, peer#1057], MetastoreRelation nis, dmp_puid_tuid
Note: DmpPuidTuid is a Java bean class.
When you pass a Scala function to filter, you prevent the Spark optimizer from seeing which columns of the dataset are actually used, because the optimizer does not try to look inside the compiled code of the function. If you pass a column expression instead, such as col("peer") === "AggregateKnowledge" && col("dt").cast(IntegerType) >= 20170403, then the optimizer will be able to see which columns are actually required and adjust the plan accordingly.
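In the Java API that would look roughly like this (a sketch assuming a SparkSession named spark, as in the question):
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;

// Column expressions remain visible to the Catalyst optimizer,
// so partition pruning on peer/dt can still happen.
Dataset<Row> puid = spark.read().table("nis.dmp_puid_tuid")
        .filter(col("peer").equalTo("AggregateKnowledge")
                .and(col("dt").cast(DataTypes.IntegerType).geq(20170403)));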

Obtain Master public DNS value from AWS EMR Cluster using the Java SDK

I need to obtain the master public DNS value via the Java SDK. The only information that I'll have at the start of the application is the ClusterName which is static.
Thus far I've been able to pull out all the other information that I need, excluding this, and this, unfortunately, is vital for the application to be a success.
This is the code that I'm currently working with:
List<ClusterSummary> summaries = clusters.getClusters();
for (ClusterSummary cs : summaries) {
    if (cs.getName().equals("test") && WHITELIST.contains(cs.getStatus().getState())) {
        ListInstancesResult instances = emr.listInstances(new ListInstancesRequest().withClusterId(cs.getId()));
        clusterHostName = instances.getInstances().get(0).toString();
        jobFlowId = cs.getId();
    }
}
I've removed the get for PublicIpAddress, as I wanted the full toString for testing. To be clear: this method does give me the DNS that I need, but I have no way of differentiating between the instances.
If my EMR cluster has 4 machines, I don't know which position in the list an instance will occupy. For my basic trial I've only got two machines, 1 master and a worker; .get(0) has returned the master's values on one run and the worker's on the next.
The information that I'm able to obtain from these is below. My only option at the moment is to use the 'ReadyDateTime' as an identifier, as the master 'should' always be ready first, but this feels hacky and I was hoping for a cleaner solution.
{Id: id,
Ec2InstanceId: id,
PublicDnsName: ec2-54--143.compute-1.amazonaws.com,
PublicIpAddress: 54..143,
PrivateDnsName: ip-10--158.ec2.internal,
PrivateIpAddress: 10..158,
Status: {State: RUNNING,StateChangeReason: {},
Timeline: {CreationDateTime: Tue Feb 21 09:18:08 GMT 2017,
ReadyDateTime: Tue Feb 21 09:25:11 GMT 2017,}},
InstanceGroupId: id,
EbsVolumes: []}
{Id: id,
Ec2InstanceId: id,
PublicDnsName: ec2-54--33.compute-1.amazonaws.com,
PublicIpAddress: 54..33,
PrivateDnsName: ip-10--95.ec2.internal,
PrivateIpAddress: 10..95,
Status: {State: RUNNING,StateChangeReason: {},
Timeline: {CreationDateTime: Tue Feb 21 09:18:08 GMT 2017,
ReadyDateTime: Tue Feb 21 09:22:48 GMT 2017,}},
InstanceGroupId: id
EbsVolumes: []}
Don't use ListInstances. Instead, use DescribeCluster, which returns as one of the fields MasterPublicDnsName.
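With the v1 AWS SDK used in the question, that is roughly (a sketch, assuming clusterId holds the cluster's id and emr is the AmazonElasticMapReduce client):
import com.amazonaws.services.elasticmapreduce.model.DescribeClusterRequest;
import com.amazonaws.services.elasticmapreduce.model.DescribeClusterResult;

// DescribeCluster returns the master's public DNS directly,
// so there is no need to guess which instance is the master.
DescribeClusterResult result = emr.describeCluster(
        new DescribeClusterRequest().withClusterId(clusterId));
String masterDns = result.getCluster().getMasterPublicDnsName();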
To expand on what was mentioned by Jonathon:
AmazonEC2Client ec2 = new AmazonEC2Client(cred);
DescribeInstancesResult describeInstancesResult = ec2.describeInstances(new DescribeInstancesRequest().withInstanceIds(clusterInstanceIds));
List<Reservation> reservations = describeInstancesResult.getReservations();
for (Reservation res : reservations) {
    for (GroupIdentifier group : res.getGroups()) {
        if (group.getGroupName().equals("ElasticMapReduce-master")) { // the master node belongs to this security group
            masterDNS = res.getInstances().get(0).getPublicDnsName();
        }
    }
}
Below is working code to get the master public DNS name:
AWSCredentials credentials_profile = new DefaultAWSCredentialsProviderChain().getCredentials();
AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(credentials_profile);
Region euWest1 = Region.getRegion(Regions.US_EAST_1);
emr.setRegion(euWest1);
DescribeClusterFunction fun = new DescribeClusterFunction(emr);
DescribeClusterResult res = fun.apply(new DescribeClusterRequest().withClusterId(clusterId));
String publicDNSName = res.getCluster().getMasterPublicDnsName();

How to correctly retrieve 'extended media' URLs in Twitter4J?

My group and I are working with the Twitter4J API, and we have a problem with media.
If we get a tweet directly, with this:
status = twitter.showStatus(Long.parseLong(tweetId2pics));
the two pictures' URLs appear with both
getMedia.getMediaURL()
and
getExtendedMedia.getMediaURL().
But when we use a query to get the same tweet, with these lines:
Query query = new Query(tweetId2pics);
try {
    QueryResult results = twitter.search(query);
} catch (TwitterException e) {
    e.printStackTrace();
}
The extended media field appeared to be empty.
Here are the two JSONs returned by the requests (which are, ridiculously, the same):
StatusJSONImpl{createdAt=Tue Dec 15 17:44:15 CET 2015, id=676804977189363712, text='Nouvel essai avec deux images #hashtagtoutnul https://t.co/o4d3Jefcy2', source='Twitter Web Client', isTruncated=false, inReplyToStatusId=-1, inReplyToUserId=-1, isFavorited=false, isRetweeted=false, favoriteCount=0, inReplyToScreenName='null', geoLocation=null, place=PlaceJSONImpl{name='Nancy', streetAddress='null', countryCode='FR', id='66fabed9d649aa12', country='France', placeType='city', url='https://api.twitter.com/1.1/geo/id/66fabed9d649aa12.json', fullName='Nancy, Lorraine', boundingBoxType='Polygon', boundingBoxCoordinates=[[Ltwitter4j.GeoLocation;#256c668], geometryType='null', geometryCoordinates=null, containedWithIn=[]}, retweetCount=0, isPossiblySensitive=false, lang='fr', contributorsIDs=[], retweetedStatus=null, userMentionEntities=[], urlEntities=[], hashtagEntities=[HashtagEntityJSONImpl{text='hashtagtoutnul'}], mediaEntities=[MediaEntityJSONImpl{id=676804976056926208, url=https://t.co/o4d3Jefcy2, mediaURL=http://pbs.twimg.com/media/CWR-ijQXAAAuipc.png, mediaURLHttps=https://pbs.twimg.com/media/CWR-ijQXAAAuipc.png, expandedURL=http://twitter.com/SteakdeNiche/status/676804977189363712/photo/1, displayURL='pic.twitter.com/o4d3Jefcy2', sizes={0=Size{width=150, height=150, resize=101}, 1=Size{width=340, height=191, resize=100}, 2=Size{width=600, height=337, resize=100}, 3=Size{width=1024, height=576, resize=100}}, type=photo}], symbolEntities=[], currentUserRetweetId=-1, user=UserJSONImpl{id=366530792, name='SteakdeNiche ', screenName='SteakdeNiche', location='France', description='French engineer student, video games lover, electro-music listener. ~ Don't forget, what you want will be. ~', isContributorsEnabled=false, profileImageUrl='http://pbs.twimg.com/profile_images/461515308834889729/9sC6qD9x_normal.jpeg', profileImageUrlHttps='https://pbs.twimg.com/profile_images/461515308834889729/9sC6qD9x_normal.jpeg', isDefaultProfileImage=false, url='null', isProtected=false, followersCount=39, status=null, profileBackgroundColor='EDECE9', profileTextColor='BD2A2A', profileLinkColor='94D487', profileSidebarFillColor='000000', profileSidebarBorderColor='FFFFFF', profileUseBackgroundImage=true, isDefaultProfile=false, showAllInlineMedia=false, friendsCount=151, createdAt=Fri Sep 02 12:28:05 CEST 2011, favouritesCount=48, utcOffset=3600, timeZone='Paris', profileBackgroundImageUrl='http://pbs.twimg.com/profile_background_images/442649966905810945/x5poZ0qE.jpeg', profileBackgroundImageUrlHttps='https://pbs.twimg.com/profile_background_images/442649966905810945/x5poZ0qE.jpeg', profileBackgroundTiled=true, lang='fr', statusesCount=288, isGeoEnabled=true, isVerified=false, translator=false, listedCount=0, isFollowRequestSent=false, withheldInCountries=null}, withHeldInCountries=null, quotedStatusId=-1, quotedStatus=null}
StatusJSONImpl{createdAt=Tue Dec 15 17:44:15 CET 2015, id=676804977189363712, text='Nouvel essai avec deux images #hashtagtoutnul https://t.co/o4d3Jefcy2', source='Twitter Web Client', isTruncated=false, inReplyToStatusId=-1, inReplyToUserId=-1, isFavorited=false, isRetweeted=false, favoriteCount=0, inReplyToScreenName='null', geoLocation=null, place=PlaceJSONImpl{name='Nancy', streetAddress='null', countryCode='FR', id='66fabed9d649aa12', country='France', placeType='city', url='https://api.twitter.com/1.1/geo/id/66fabed9d649aa12.json', fullName='Nancy, Lorraine', boundingBoxType='Polygon', boundingBoxCoordinates=[[Ltwitter4j.GeoLocation;#48688905], geometryType='null', geometryCoordinates=null, containedWithIn=[]}, retweetCount=0, isPossiblySensitive=false, lang='fr', contributorsIDs=[], retweetedStatus=null, userMentionEntities=[], urlEntities=[], hashtagEntities=[HashtagEntityJSONImpl{text='hashtagtoutnul'}], mediaEntities=[MediaEntityJSONImpl{id=676804976056926208, url=https://t.co/o4d3Jefcy2, mediaURL=http://pbs.twimg.com/media/CWR-ijQXAAAuipc.png, mediaURLHttps=https://pbs.twimg.com/media/CWR-ijQXAAAuipc.png, expandedURL=http://twitter.com/SteakdeNiche/status/676804977189363712/photo/1, displayURL='pic.twitter.com/o4d3Jefcy2', sizes={0=Size{width=150, height=150, resize=101}, 1=Size{width=340, height=191, resize=100}, 2=Size{width=600, height=337, resize=100}, 3=Size{width=1024, height=576, resize=100}}, type=photo}], symbolEntities=[], currentUserRetweetId=-1, user=UserJSONImpl{id=366530792, name='SteakdeNiche ', screenName='SteakdeNiche', location='France', description='French engineer student, video games lover, electro-music listener. ~ Don't forget, what you want will be. ~', isContributorsEnabled=false, profileImageUrl='http://pbs.twimg.com/profile_images/461515308834889729/9sC6qD9x_normal.jpeg', profileImageUrlHttps='https://pbs.twimg.com/profile_images/461515308834889729/9sC6qD9x_normal.jpeg', isDefaultProfileImage=false, url='null', isProtected=false, followersCount=39, status=null, profileBackgroundColor='EDECE9', profileTextColor='BD2A2A', profileLinkColor='94D487', profileSidebarFillColor='000000', profileSidebarBorderColor='FFFFFF', profileUseBackgroundImage=true, isDefaultProfile=false, showAllInlineMedia=false, friendsCount=151, createdAt=Fri Sep 02 12:28:05 CEST 2011, favouritesCount=48, utcOffset=3600, timeZone='Paris', profileBackgroundImageUrl='http://pbs.twimg.com/profile_background_images/442649966905810945/x5poZ0qE.jpeg', profileBackgroundImageUrlHttps='https://pbs.twimg.com/profile_background_images/442649966905810945/x5poZ0qE.jpeg', profileBackgroundTiled=true, lang='fr', statusesCount=288, isGeoEnabled=true, isVerified=false, translator=false, listedCount=0, isFollowRequestSent=false, withheldInCountries=null}, withHeldInCountries=null, quotedStatusId=-1, quotedStatus=null}
Could someone explain why both getters work for the first request, but only getMedia works for the second?
Thanks in advance!

Java sort a csv file based on column date

I need to sort a CSV file based on the date column. This is what the masterRecords ArrayList looks like:
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:30:00 AM MYT,+0,COMPL
I need to sort it based on the time of day: 07:00:00, 07:15:00, 07:30:00, etc. I wrote this code to sort it:
// Date is fixed on per 15min interval
ArrayList<String> sortDate = new ArrayList<String>();
sortDate.add(":00:");
sortDate.add(":15:");
sortDate.add(":30:");
sortDate.add(":45:");
BufferedWriter bw = new BufferedWriter(new FileWriter(tempPath + filename));
for (int k = 0; k < sortDate.size(); k++) {
    String date = sortDate.get(k);
    for (int j = 0; j < masterRecords.size(); j++) {
        String[] splitLine = masterRecords.get(j).split(",", -1);
        if (splitLine[10].contains(date)) {
            bw.write(masterRecords.get(j) + System.getProperty("line.separator").replaceAll(String.valueOf((char) 0x0D), ""));
            masterRecords.remove(j);
        }
    }
}
bw.close();
You can see from the above that it loops through the first list (sortDate) and, within that, loops through the second list (masterRecords), writing matches to a new file. It seems to work, as the new file comes out sorted, but I noticed that masterRecords has 10000 records while the new file shrinks to 5000 records. I'm assuming it's how I remove the records from the master list. Anyone know why?
It is not safe to remove an item from a list while looping over it by index: when you call remove(j), the following elements shift left and the j++ then skips the record right after the removed one, which is why roughly half of your records disappear. You have to iterate with an Iterator instead, for example:
List<String> names = ....
Iterator<String> i = names.iterator();
while (i.hasNext()) {
    String s = i.next(); // must be called before you can call i.remove()
    // Do something
    i.remove();
}
The documentation says:
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
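Applied to the question's inner loop, the fix looks roughly like this (a sketch using java.util.Iterator, with bw, sortDate and masterRecords as in the question):
for (int k = 0; k < sortDate.size(); k++) {
    String date = sortDate.get(k);
    Iterator<String> it = masterRecords.iterator();
    while (it.hasNext()) {
        String record = it.next();
        String[] splitLine = record.split(",", -1);
        if (splitLine[10].contains(date)) {
            bw.write(record + System.getProperty("line.separator").replaceAll(String.valueOf((char) 0x0D), ""));
            it.remove(); // removal via the iterator, so no records are skipped
        }
    }
}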
The accepted answer by Lautaro Cozzani is correct.
And Now for Something Completely Different
For fun here is an entirely different approach.
I used two libraries:
Apache Commons CSV
Joda-Time
Apache Commons CSV
The Commons CSV library handles the parsing of various flavors of CSV. It can return a List of the rows from the file, each row being represented by their CSVRecord object. You can ask that object for the first field, second field, and so on.
Joda-Time
Joda-Time does the work of parsing the date-time strings.
Avoid 3-letter Time Zone Codes
Beware: Joda-Time refuses to try to parse the three-letter time zone code MYT. For good reason: those 3 or 4 letter codes are mere conventions, neither standardized nor unique. My example code below assumes all your data is using MYT, and assigns the proper time zone name Asia/Kuala_Lumpur. I suggest you enlighten whoever creates your input data about proper time zone names and about ISO 8601 string formats.
Java 8
My example code requires Java 8, using the new Lambda syntax and "streams".
Example Code
This example does a double-layer sort. First the rows are sorted by the minute-of-hour (00, 15, 30, 45). Within each of those groups, the rows are sorted by the date-time value (ordered by year, month, day-of-month, and time-of-day).
First we open the .csv text file, and parse its contents into CSVRecord objects.
String filePathString = "/Users/brainydeveloper/input.csv";
try {
    Reader in = new FileReader( filePathString ); // Get the input file.
    List<CSVRecord> recs = CSVFormat.DEFAULT.parse( in ).getRecords(); // Parse the input file.
Next we wrap those CSVRecord objects each inside a smarter class that extracts the two values we care about: first the DateTime, secondly the minute-of-hour of that DateTime. See further down for the simple code of that class CsvRecWithDateTimeAndMinute.
List<CsvRecWithDateTimeAndMinute> smartRecs = new ArrayList<>( recs.size() ); // Collect transformed data.
for ( CSVRecord rec : recs ) { // For each CSV record…
    CsvRecWithDateTimeAndMinute smartRec = new CsvRecWithDateTimeAndMinute( rec ); // …transform CSV rec into one of our objects with DateTime and minute-of-hour.
    smartRecs.add( smartRec );
}
Next we take that list of our smarter wrapped objects, and break that list into multiple lists. Each new list contains the CSV row data for a particular minute-of-hour (00, 15, 30, and 45). We store these in a map.
If our input data has only occurrences of those four values, the resulting map will have only four keys. Indeed, you can do a sanity-check by looking for more than four keys. Extra keys would mean either something went terribly wrong in parsing or there is some data with unexpected minute-of-hour values.
Each key (the Integer of those numbers) leads to a List of our smart wrapper objects. Here is some of that fancy new Lambda syntax.
Map<Integer , List<CsvRecWithDateTimeAndMinute>> byMinuteOfHour = smartRecs.stream().collect( Collectors.groupingBy( CsvRecWithDateTimeAndMinute::getMinuteOfHour ) );
The map does not give us our sub-lists with our keys (minute-of-hour Integers) sorted. We might get back the 15 group before we get the 00 group. So extract the keys, and sort them.
// Access the map by the minuteOfHour value in order. We want ":00:" first, then ":15", then ":30:", and ":45:" last.
List<Integer> minutes = new ArrayList<Integer>( byMinuteOfHour.keySet() ); // Fetch the keys of the map.
Collections.sort( minutes ); // Sort that List of keys.
Following along that list of ordered keys, ask the map for each key's list. That list of data needs to be sorted to get our second-level sort (by date-time).
List<CSVRecord> outputList = new ArrayList<>( recs.size() ); // Make an empty List in which to put our CSVRecords in double-sorted order.
for ( Integer minute : minutes ) {
    List<CsvRecWithDateTimeAndMinute> list = byMinuteOfHour.get( minute );
    // Secondary sort. For each group of records with ":00:" (for example), sort them by their full date-time value.
    // Sort the List by defining an anonymous Comparator using new Lambda syntax in Java 8.
    Collections.sort( list , ( CsvRecWithDateTimeAndMinute r1 , CsvRecWithDateTimeAndMinute r2 ) -> {
        return r1.getDateTime().compareTo( r2.getDateTime() );
    } );
    for ( CsvRecWithDateTimeAndMinute smartRec : list ) {
        outputList.add( smartRec.getCSVRecord() );
    }
}
We are done manipulating the data. Now it is time to export back out to a text file in CSV format.
// Now we have complete List of CSVRecord objects in double-sorted order (first by minute-of-hour, then by date-time).
// Now let's dump those back to a text file in CSV format.
try ( PrintWriter out = new PrintWriter( new BufferedWriter( new FileWriter( "/Users/brainydeveloper/output.csv" ) ) ) ) {
    final CSVPrinter printer = CSVFormat.DEFAULT.print( out );
    printer.printRecords( outputList );
}
} catch ( FileNotFoundException ex ) {
    System.out.println( "ERROR - Exception needs to be handled." );
} catch ( IOException ex ) {
    System.out.println( "ERROR - Exception needs to be handled." );
}
The code above loads the entire CSV data set into memory at once. If you wish to conserve memory, iterate via the parse method rather than calling getRecords. At least that is what the doc seems to say; I've not experimented with that, as my use-cases so far all fit easily into memory.
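A sketch of that lazier approach, assuming the same input file (CSVParser is Iterable<CSVRecord>, so rows stream one at a time):
import java.io.FileReader;
import java.io.Reader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

try ( Reader in = new FileReader( "/Users/brainydeveloper/input.csv" ) ;
      CSVParser parser = CSVFormat.DEFAULT.parse( in ) ) {
    for ( CSVRecord rec : parser ) { // Records are read lazily, one at a time.
        System.out.println( rec.get( 0 ) );
    }
}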
Here is that smart class to wrap each CSVRecord object:
package com.example.jodatimeexperiment;

import org.apache.commons.csv.CSVRecord;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

/**
 * @author Basil Bourque
 */
public class CsvRecWithDateTimeAndMinute
{
    // Statics
    static public final DateTimeFormatter FORMATTER = DateTimeFormat.forPattern( "MMM dd yyyy' - 'hh:mm:ss aa 'MYT'" ).withZone( DateTimeZone.forID( "Asia/Kuala_Lumpur" ) );

    // Member vars.
    private final CSVRecord rec;
    private final DateTime dateTime;
    private final Integer minuteOfHour;

    public CsvRecWithDateTimeAndMinute( CSVRecord recordArg )
    {
        this.rec = recordArg;
        // Parse record to extract DateTime.
        // Expect value such as: Dec 15 2014 - 07:15:00 AM MYT
        String input = this.rec.get( 7 - 1 ); // Index (zero-based counting). So field # 7 = index # 6.
        this.dateTime = CsvRecWithDateTimeAndMinute.FORMATTER.parseDateTime( input );
        // From DateTime extract minute of hour
        this.minuteOfHour = this.dateTime.getMinuteOfHour();
    }

    public DateTime getDateTime()
    {
        return this.dateTime;
    }

    public Integer getMinuteOfHour()
    {
        return this.minuteOfHour;
    }

    public CSVRecord getCSVRecord()
    {
        return this.rec;
    }

    @Override
    public String toString()
    {
        return "CsvRecWithDateTimeAndMinute{ " + " minuteOfHour=" + minuteOfHour + " | dateTime=" + dateTime + " | rec=" + rec + " }";
    }
}
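As an aside, on Java 8 the same parsing could be done with the built-in java.time classes instead of Joda-Time; a sketch under the same Kuala Lumpur assumption, with MYT treated as a literal:
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Pattern mirrors the Joda one above; the zone is assigned explicitly,
// just as the Joda version does with withZone().
DateTimeFormatter f = DateTimeFormatter.ofPattern( "MMM dd uuuu' - 'hh:mm:ss a 'MYT'" , Locale.ENGLISH );
ZonedDateTime zdt = LocalDateTime.parse( "Dec 15 2014 - 07:15:00 AM MYT" , f )
                                 .atZone( ZoneId.of( "Asia/Kuala_Lumpur" ) );
int minuteOfHour = zdt.getMinute();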
With this input…
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:30:00 AM MYT,+0,COMPL
…you will get this output…
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:30:00 AM MYT,+0,COMPL

EclipseLink Profiler shows multiple registered object being created

In my development environment, when I run a ReadAllQuery using a simple get-all JPQL query, I noticed (using the EclipseLink profiler) that there are several read object queries that get executed, each adding time to my total time. For example, running a query like this returns the following EclipseLink profile output.
@SuppressWarnings("unchecked")
public List<Person> getAllPeople() {
    EntityManager entityManager = factory.createEntityManager();
    List<Person> people = null;
    try {
        people = entityManager.createQuery("Select p from Person p where p.active = true").getResultList();
    } catch (Exception e) {
        // TODO: handle exception
    } finally {
        if (entityManager.isOpen()) {
            entityManager.close();
        }
    }
    return people;
}
This returns multiple "Register the existing object" statements, and looking at the log output each is being done in a unit of work. Why, and how can I prevent these?
[EL Finest]: connection: 2012-07-16 20:21:26.558--ServerSession(1144634498)--Connection(1713234840)--Thread(Thread["http-bio-8080"-exec-14,5,main])--Connection released to connection pool [read].
Profile(ReadAllQuery,
class=org.bixin.dugsi.domain.ApplicantSchool,
number of objects=2,
total time=3494000,
local time=3494000,
profiling time=84000,
Timer:Logging=412000,
Timer:ObjectBuilding=1670000,
Timer:SqlPrepare=24000,
Timer:ConnectionManagement=215000,
Timer:StatementExecute=455000,
Timer:Caching=68000,
Timer:DescriptorEvents=9000,
Timer:RowFetch=97000,
time/object=1747000,
)
}End profile
Register the existing object Address[id=5,persons={[Applicant[major=Bachelors in Islamic Studies,nativeLanguage=<null>,ethnicity=<null>,hispanic=<null>,religiousAffiliation=<null>,schools={[ApplicantSchool[id=8,name=John Hopkisn,fromMonth=May,fromYear=2013,toMonth=March,toYear=2011,schoolType=Highschool,creditsCompleted=unavailable,gpa=unavailable,applicant=<null>,version=1,_persistence_applicant_vh={QueryBasedValueHolder: not instantiated},_persistence_fetchGroup=<null>], ApplicantSchool[id=7,name=,fromMonth=<null>,fromYear=<null>,toMonth=<null>,toYear=<null>,schoolType=College,creditsCompleted=,gpa=,applicant=<null>,version=1,_persistence_applicant_vh={QueryBasedValueHolder: not instantiated},_persistence_fetchGroup=<null>]]},FirstName=warsame,MiddleName=a,LastName=bashir,primaryTelephone=2342342333,secondaryTelephone=,emailAddress=warsme#d.com,birthDay=Sun Jul 22 00:00:00 CDT 2012,gender=Male,DateAdded=Fri Jul 13 18:16:33 CDT 2012,address=<null>,imagePath=<null>,active=true,marital=Single,school=<null>,nativeLanguage=Arabic,ethnicity=[Asian],hispanic=No,religiousAffiliation=Islam,id=651,version=1,_persistence_school_vh={null},_persistence_address_vh={Address[id=5,persons=org.eclipse.persistence.indirection.IndirectSet#47f322c8,streetAddress=243 city join,streetAddress2=<null>,city=saudi,state_us=South Carolina (SC),zipCode=24234,country=Antarctica,version=1,_persistence_fetchGroup=<null>]},_persistence_fetchGroup=<null>]]},streetAddress=243 city join,streetAddress2=<null>,city=saudi,state_us=South Carolina (SC),zipCode=24234,country=Antarctica,version=1,_persistence_fetchGroup=<null>]
Profile(
total time=5000,
local time=5000,
Timer:DescriptorEvents=5000,
)
Profile(
total time=196000,
local time=196000,
Timer:Register=196000,
)
[EL Finest]: transaction: 2012-07-16 20:21:26.564--UnitOfWork(26103836)--Thread(Thread["http-bio-8080"-exec-14,5,main])--Register the existing object Applicant[major=Bachelors in Islamic Studies,nativeLanguage=<null>,ethnicity=<null>,hispanic=<null>,religiousAffiliation=<null>,schools={[ApplicantSchool[id=8,name=John Hopkisn,fromMonth=May,fromYear=2013,toMonth=March,toYear=2011,schoolType=Highschool,creditsCompleted=unavailable,gpa=unavailable,applicant=<null>,version=1,_persistence_applicant_vh={QueryBasedValueHolder: not instantiated},_persistence_fetchGroup=<null>], ApplicantSchool[id=7,name=,fromMonth=<null>,fromYear=<null>,toMonth=<null>,toYear=<null>,schoolType=College,creditsCompleted=,gpa=,applicant=<null>,version=1,_persistence_applicant_vh={QueryBasedValueHolder: not instantiated},_persistence_fetchGroup=<null>]]},FirstName=warsame,MiddleName=a,LastName=bashir,primaryTelephone=2342342333,secondaryTelephone=,emailAddress=warsme#d.com,birthDay=Sun Jul 22 00:00:00 CDT 2012,gender=Male,DateAdded=Fri Jul 13 18:16:33 CDT 2012,address=<null>,imagePath=<null>,active=true,marital=Single,school=<null>,nativeLanguage=Arabic,ethnicity=[Asian],hispanic=No,religiousAffiliation=Islam,id=651,version=1,_persistence_school_vh={null},_persistence_address_vh={Address[id=5,persons={[Applicant[major=Bachelors in Islamic Studies,nativeLanguage=<null>,ethnicity=<null>,hispanic=<null>,religiousAffiliation=<null>,schools={[ApplicantSchool[id=8,name=John Hopkisn,fromMonth=May,fromYear=2013,toMonth=March,toYear=2011,schoolType=Highschool,creditsCompleted=unavailable,gpa=unavailable,applicant=<null>,version=1,_persistence_applicant_vh={QueryBasedValueHolder: not instantiated},_persistence_fetchGroup=<null>], ApplicantSchool[id=7,name=,fromMonth=<null>,fromYear=<null>,toMonth=<null>,toYear=<null>,schoolType=College,creditsCompleted=,gpa=,applicant=<null>,version=1,_persistence_applicant_vh={QueryBasedValueHolder: not instantiated},_persistence_fetchGroup=<null>]]},FirstName=warsame,MiddleName=a,LastName=bashir,primaryTelephone=2342342333,secondaryTelephone=,emailAddress=warsme#d.com,birthDay=Sun Jul 22 00:00:00 CDT 2012,gender=Male,DateAdded=Fri Jul 13 18:16:33 CDT 2012,address=<null>,imagePath=<null>,active=true,marital=Single,school=<null>,nativeLanguage=Arabic,ethnicity=[Asian],hispanic=No,religiousAffiliation=Islam,id=651,version=1,_persistence_school_vh={null},_persistence_address_vh=org.eclipse.persistence.internal.indirection.QueryBasedValueHolder#6b8099d3,_persistence_fetchGroup=<null>]]},streetAddress=243 city join,streetAddress2=<null>,city=saudi,state_us=South Carolina (SC),zipCode=24234,country=Antarctica,version=1,_persistence_fetchGroup=<null>]},_persistence_fetchGroup=<null>]
Profile(
total time=1349000,
local time=1349000,
Timer:Logging=1349000,
)
Caused by the logging level being set to FINEST. At that level EclipseLink logs each "Register the existing object" step, and the profiler output above shows the cost (Timer:Logging=1349000); lowering the log level removes the messages and that overhead.
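For example, the level can be lowered when creating the factory (a sketch; "myPersistenceUnit" is a placeholder, and the property can equally be set in persistence.xml):
import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

// "eclipselink.logging.level" is the standard EclipseLink logging property;
// WARNING suppresses the FINEST-level registration messages.
Map<String, String> props = new HashMap<>();
props.put("eclipselink.logging.level", "WARNING");
EntityManagerFactory factory = Persistence.createEntityManagerFactory("myPersistenceUnit", props);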
