Java sort a csv file based on column date - java
I need to sort a CSV file by its date column. This is what the masterRecords ArrayList looks like:
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:30:00 AM MYT,+0,COMPL
I need to sort it by the time in the date column (07:15:00, 07:30:00, etc.). I wrote this code to do the sorting:
// Date is fixed on per 15min interval
ArrayList<String> sortDate = new ArrayList<String>();
sortDate.add(":00:");
sortDate.add(":15:");
sortDate.add(":30:");
sortDate.add(":45:");
BufferedWriter bw = new BufferedWriter(new FileWriter(tempPath + filename));
for (int k = 0; k < sortDate.size(); k++) {
String date = sortDate.get(k);
for (int j = 0; j < masterRecords.size(); j++) {
String[] splitLine = masterRecords.get(j).split(",", -1);
if (splitLine[10].contains(date)) {
bw.write(masterRecords.get(j) + System.getProperty("line.separator").replaceAll(String.valueOf((char) 0x0D), ""));
masterRecords.remove(j);
}
}
}
bw.close();
As you can see above, the code loops through the first list (sortDate) and, for each entry, loops through the second list (masterRecords), writing the matching records to a new file. It seems to work, as the new file is sorted, but I noticed that masterRecords has 10000 records while the new file contains only 5000. I'm assuming the cause is how I remove the records from the master list. Does anyone know why?
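It is not safe to remove an item from a list while looping over it by index. After masterRecords.remove(j), every later element shifts down one position, and the following j++ then steps over the element that has just moved into slot j, so that element is never examined.
You have to iterate over the list with an Iterator and remove through it, for example: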
List<String> names = ....
Iterator<String> i = names.iterator();
while (i.hasNext()) {
String s = i.next(); // must be called before you can call i.remove()
// Do something
i.remove();
}
The documentation says:
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
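To make the skipping concrete, here is a minimal sketch (the class and method names are made up for illustration, not taken from the question) that removes every element of a small list, once with the indexed loop and once through an Iterator:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RemoveWhileLoopingDemo {

    // Mirrors the loop in the question: remove(j) shifts later elements left,
    // then j++ steps over the element that just moved into slot j.
    static List<String> removeByIndex(List<String> source) {
        List<String> remaining = new ArrayList<>(source);
        for (int j = 0; j < remaining.size(); j++) {
            remaining.remove(j);
        }
        return remaining;
    }

    // Iterator.remove() keeps the cursor consistent, so nothing is skipped.
    static List<String> removeWithIterator(List<String> source) {
        List<String> remaining = new ArrayList<>(source);
        Iterator<String> it = remaining.iterator();
        while (it.hasNext()) {
            it.next();   // must be called before it.remove()
            it.remove();
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d");
        System.out.println(removeByIndex(rows));      // [b, d]
        System.out.println(removeWithIterator(rows)); // []
    }
}
```

Half of the elements survive the indexed loop, which matches the 10000 to 5000 drop described in the question.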
The accepted answer by Lautaro Cozzani is correct.
And Now for Something Completely Different
For fun here is an entirely different approach.
I used two libraries:
Apache Commons CSV
Joda-Time
Apache Commons CSV
The Commons CSV library handles the parsing of various flavors of CSV. It can return a List of the rows from the file, each row being represented by their CSVRecord object. You can ask that object for the first field, second field, and so on.
Joda-Time
Joda-Time does the work of parsing the date-time strings.
Avoid 3-letter Time Zone Codes
Beware: Joda-Time refuses to try to parse the three-letter time zone code MYT. For good reason: Those 3 or 4 letter codes are mere conventions, neither standardized nor unique. My example code below assumes all your data is using MYT. My code assigns the proper time zone name, Asia/Kuala_Lumpur. I suggest you encourage whoever creates your input data to learn about proper time zone names and about ISO 8601 string formats.
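For readers without Joda-Time, the same idea can be sketched with the java.time classes built into Java 8. This swaps in a different library from the one used in this answer, and the class name is hypothetical. The literal MYT is matched as plain text and the real zone is assigned explicitly, on the assumption that every row uses MYT:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class MytTimestampParser {
    // 'MYT' is quoted, so it is treated as literal text rather than parsed as a zone.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMM dd yyyy' - 'hh:mm:ss a 'MYT'", Locale.ENGLISH);

    // Assumption: all input rows are in Malaysian time.
    static final ZoneId ZONE = ZoneId.of("Asia/Kuala_Lumpur");

    static ZonedDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMAT).atZone(ZONE);
    }

    public static void main(String[] args) {
        ZonedDateTime zdt = parse("Dec 15 2014 - 07:15:00 AM MYT");
        System.out.println(zdt.getMinute()); // 15
    }
}
```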
Java 8
My example code requires Java 8, using the new Lambda syntax and "streams".
Example Code
This example does a double-layer sort. First the rows are sorted by the minute-of-hour (00, 15, 30, 45). Within each of those groups, the rows are sorted by the date-time value (ordered by year, month, day-of-month, and time-of-day).
First we open the .csv text file, and parse its contents into CSVRecord objects.
String filePathString = "/Users/brainydeveloper/input.csv";
try {
Reader in = new FileReader( filePathString ); // Get the input file.
List<CSVRecord> recs = CSVFormat.DEFAULT.parse( in ).getRecords(); // Parse the input file.
Next we wrap those CSVRecord objects each inside a smarter class that extracts the two values we care about: first the DateTime, secondly the minute-of-hour of that DateTime. See further down for the simple code of that class CsvRecWithDateTimeAndMinute.
List<CsvRecWithDateTimeAndMinute> smartRecs = new ArrayList<>( recs.size() ); // Collect transformed data.
for ( CSVRecord rec : recs ) { // For each CSV record…
CsvRecWithDateTimeAndMinute smartRec = new CsvRecWithDateTimeAndMinute( rec ); // …transform CSV rec into one of our objects with DateTime and minute-of-hour.
smartRecs.add( smartRec );
}
Next we take that list of our smarter wrapped objects, and break that list into multiple lists. Each new list contains the CSV row data for a particular minute-of-hour (00, 15, 30, and 45). We store these in a map.
If our input data has only occurrences of those four values, the resulting map will have only four keys. Indeed, you can do a sanity-check by looking for more than four keys. Extra keys would mean either something went terribly wrong in parsing or there is some data with unexpected minute-of-hour values.
Each key (the Integer of those numbers) leads to a List of our smart wrapper objects. Here is some of that fancy new Lambda syntax.
Map<Integer , List<CsvRecWithDateTimeAndMinute>> byMinuteOfHour = smartRecs.stream().collect( Collectors.groupingBy( CsvRecWithDateTimeAndMinute::getMinuteOfHour ) );
The map does not give us our sub-lists with our keys (minute-of-hour Integers) sorted. We might get back the 15 group before we get the 00 group. So extract the keys, and sort them.
// Access the map by the minuteOfHour value in order. We want ":00:" first, then ":15", then ":30:", and ":45:" last.
List<Integer> minutes = new ArrayList<Integer>( byMinuteOfHour.keySet() ); // Fetch the keys of the map.
Collections.sort( minutes ); // Sort that List of keys.
Following along that list of ordered keys, ask the map for each key's list. That list of data needs to be sorted to get our second-level sort (by date-time).
List<CSVRecord> outputList = new ArrayList<>( recs.size() ); // Make an empty List in which to put our CSVRecords in double-sorted order.
for ( Integer minute : minutes ) {
List<CsvRecWithDateTimeAndMinute> list = byMinuteOfHour.get( minute );
// Secondary sort. For each group of records with ":00:" (for example), sort them by their full date-time value.
// Sort the List by defining an anonymous Comparator using new Lambda syntax in Java 8.
Collections.sort( list , ( CsvRecWithDateTimeAndMinute r1 , CsvRecWithDateTimeAndMinute r2 ) -> {
return r1.getDateTime().compareTo( r2.getDateTime() );
} );
for ( CsvRecWithDateTimeAndMinute smartRec : list ) {
outputList.add( smartRec.getCSVRecord() );
}
}
We are done manipulating the data. Now it is time to export back out to a text file in CSV format.
// Now we have complete List of CSVRecord objects in double-sorted order (first by minute-of-hour, then by date-time).
// Now let's dump those back to a text file in CSV format.
try ( PrintWriter out = new PrintWriter( new BufferedWriter( new FileWriter( "/Users/brainydeveloper/output.csv" ) ) ) ) {
final CSVPrinter printer = CSVFormat.DEFAULT.print( out );
printer.printRecords( outputList );
}
} catch ( FileNotFoundException ex ) {
System.out.println( "ERROR - Exception needs to be handled." );
} catch ( IOException ex ) {
System.out.println( "ERROR - Exception needs to be handled." );
}
The code above loads the entire CSV data set into memory at once. If you wish to conserve memory, use the parse method rather than the getRecords method. At least that is what the doc seems to be saying. I've not experimented with that, as my use-cases so far all fit easily into memory.
Here is that smart class to wrap each CSVRecord object:
package com.example.jodatimeexperiment;
import org.apache.commons.csv.CSVRecord;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;
/**
*
* @author Basil Bourque
*/
public class CsvRecWithDateTimeAndMinute
{
// Statics
static public final DateTimeFormatter FORMATTER = DateTimeFormat.forPattern( "MMM dd yyyy' - 'hh:mm:ss aa 'MYT'" ).withZone( DateTimeZone.forID( "Asia/Kuala_Lumpur" ) );
// Member vars.
private final CSVRecord rec;
private final DateTime dateTime;
private final Integer minuteOfHour;
public CsvRecWithDateTimeAndMinute( CSVRecord recordArg )
{
this.rec = recordArg;
// Parse record to extract DateTime.
// Expect value such as: Dec 15 2014 - 07:15:00 AM MYT
String input = this.rec.get( 7 - 1 ); // Index (zero-based counting). So field # 7 = index # 6.
this.dateTime = CsvRecWithDateTimeAndMinute.FORMATTER.parseDateTime( input );
// From DateTime extract minute of hour
this.minuteOfHour = this.dateTime.getMinuteOfHour();
}
public DateTime getDateTime()
{
return this.dateTime;
}
public Integer getMinuteOfHour()
{
return this.minuteOfHour;
}
public CSVRecord getCSVRecord()
{
return this.rec;
}
@Override
public String toString()
{
return "CsvRecWithDateTimeAndMinute{ " + " minuteOfHour=" + minuteOfHour + " | dateTime=" + dateTime + " | rec=" + rec + " }";
}
}
With this input…
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:30:00 AM MYT,+0,COMPL
…you will get this output…
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:15:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014 - 07:30:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:30:00 AM MYT,+0,COMPL
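The same double-layer ordering (minute-of-hour first, then full date-time) can also be sketched with the standard library alone, using a chained Comparator. This is only an illustration under simplifying assumptions: the class name is hypothetical, java.time replaces Joda-Time, the rows are split naively on commas (no quoted fields), and each timestamp is re-parsed on every comparison, which is fine for a sketch but wasteful for large files.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class MinuteThenDateSort {
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMM dd yyyy' - 'hh:mm:ss a 'MYT'", Locale.ENGLISH);

    // Field # 7 (index 6) holds the timestamp. Naive split: assumes no quoted commas.
    static LocalDateTime timestampOf(String line) {
        return LocalDateTime.parse(line.split(",", -1)[6], FORMAT);
    }

    // Primary key: minute-of-hour. Secondary key: the full date-time.
    static List<String> sort(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator
                .comparingInt((String line) -> timestampOf(line).getMinute())
                .thenComparing(MinuteThenDateSort::timestampOf));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:15:00 AM MYT,+0,COMPL",
                "GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014 - 07:00:00 AM MYT,+0,COMPL",
                "GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014 - 07:00:00 AM MYT,+0,COMPL");
        sort(rows).forEach(System.out::println);
    }
}
```

With these three rows, the Jan 22 07:00 row comes first, then Dec 15 07:00, then Dec 15 07:15.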
Related
Create a dynamic Mongo query in Java
I'm migrating MongoDB access from Hibernate OGM & ORM to the 'pure' Java MongoDB driver (org.mongodb:mongodb-driver-core:4.4.0), because "Hibernate OGM is not going to work with ORM 5.5 (the latest version requires ORM 5.3)". See: How to use Hibernate ORM 5.5.x.Final with Jakarta 9 on wildfly-preview-25.0.0.Final
I now want to create a 'dynamic' version of a query, say x -> 99 (FindIterable Document), as I did with Hibernate OGM & ORM:
if (MotorcycleController.motorcycleManufacturers.length > MotorcyclesEJB.ZERO) {
    stringBuilderSQL.append(WHERE);
    stringBuilderSQL.append(OPEN_BRACKET);
    for (int x = MotorcyclesEJB.ZERO; x < MotorcycleController.motorcycleManufacturers.length; x++) {
        stringBuilderSQL.append(MotorcyclesEJB.MANUFACTURER);
        stringBuilderSQL.append(MotorcyclesEJB.EQUALS);
        stringBuilderSQL.append(MotorcyclesEJB.SINGLE_QUOTE);
        stringBuilderSQL.append(MotorcycleController.motorcycleManufacturers[x]);
        stringBuilderSQL.append(MotorcyclesEJB.SINGLE_QUOTE);
        if ((x + ONE) < MotorcycleController.motorcycleManufacturers.length) {
            stringBuilderSQL.append(MotorcyclesEJB.OR);
        }
        stringBuilderSQL.append(CLOSE_BRACKET);
    }
}
I can create a static version of the query against a MongoDB 'Collection' using:
FindIterable<Document> motorcycleApriliaMotoGuzzi = mongoCollectionMotorcycleManufacturer.find(or(eq("manufacturer", "Aprilia"), eq("manufacturer", "Moto Guzzi")));
Which can show results (example of one):
INFO [com.gostophandle.ejb.MongoDBEJB] (ServerService Thread Pool -- 97) >>>>> motorcycleApriliaMotoGuzzi = Document{{_id=61d70d6a8c9e88075702af3e, manufacturer=Aprilia, model=RS 660, modelType=E5, typesOf=Sport, dateProductionStarted=Fri Jan 01 00:00:00 GMT 2021, dateProductionEnded=Fri Jan 01 00:00:00 GMT 2021, engine=Document{{type=Four-Stroke, displacement=659.0, cylinder=2.0, capacityUnit=cc, carburation=, bore=0.0, boreMeasurement=mm, stroke=0.0, strokeMeasurement=mm, distribution=, maxiumPowerHp=0.0, maxiumPowerKilowatt=0.0, maxiumPowerRpm=0.0, maximumTorque=0.0, maximumTorqueUnit=Nm, 
maximumTorqueRpm=0.0}}, performance=Document{{topSpeedMph=105.0, topSpeedKph=0.0, accelleration30Mph=0.0, accelleration60Mph=0.0, accelleration100Mph=0.0, accelleration30Kph=0.0, accelleration60Kph=0.0, accelleration100Kph=0.0}}, dimensionsWeights=Document{{batteryCapacity=, casterAngleDegrees=0.0, dimensionsL=0.0, dimensionsW=0.0, dimensionsH=0.0, frameType=, fuelTankCapacityLitres=0.0, fuelConsumption=0.0, groundClearance=0.0, kerbWeight=0.0, seatHeight=0.0, trail=0.0, wheelbase=0.0}}, chassisBrakesSuspensionWheels=Document{{frame=1, swingarm=2, absSystem=3, frontBrakes=4, rearBrakes=5, frontSuspension=6, rearSuspension=7, tyresFront=8, tyresRear=9, frontTyre=10, rearTyre=11, frontWheel=12, rearWheel=13, instrumentDisplayFunctions=14}}, transmission=Document{{clutch=1, clutchOperation=2, finalDrive=3, gearbox=4, transmissionType=5, primaryReduction=0.0, gearRatios1st=0.0, greaRatios2nd=0.0, gearRatios3rd=0.0, gearRatios4th=0.0, gearRatios5th=0.0, gearRatios6th=0.0}}, instruments=Document{{headlights=1, socket=2, ignitionSystem=3, instruments=4, tailLight=5, usbSocket=6}}, electrics=Document{{}}, colours=[Document{{colour=Acid Gold}}, Document{{colour=Lava Red}}, Document{{colour=Apex Black}}], accessories=[], image=Document{{file=/Users/NOTiFY/IdeaProjects/GoStopHandle/images, url=/Aprilia/2021/, png=ap6115200ebm03-01-m.webp, dimensionsWidth=1500, dimensionsHeight=1000}}}} I can't get it to create a dynamic version using 'find', 'or' & 'eq' etc. Any suggestions? TIA.
There are two Filters methods for constructing the Bson for OR:
Filters.or(Bson...)
Filters.or(Iterable<Bson>)
Using the latter, you can construct a Bson for each of the conditions that you want to OR together, collect them in a List, and then pass that list to the method to construct the Bson for the OR. I guess this is really an IN operation because these are all the same field, but for demonstration purposes:
public Bson or(String field, List<String> values) {
    return Filters.or(
        values.stream()
            .map(v -> Filters.eq(field, v))
            .collect(Collectors.toList()));
}
@vsfDawg - Perfect
List<String> stringList = new ArrayList<>();
stringList.add("Aprilia");
stringList.add("Moto Guzzi");
Bson bson = or("manufacturer", stringList);
MongoCursor<Document> cursor = mongoCollectionMotorcycles.find(or("manufacturer", stringList)).iterator();
try {
    while (cursor.hasNext()) {
        LOGGER.info(">>>>> 6.4 motorcycleApriliaMotoGuzzi = {}", cursor.next());
    }
} finally {
    cursor.close();
}
public Bson or(String field, List<String> values) {
    return Filters.or(
        values.stream()
            .map(v -> Filters.eq(field, v))
            .collect(Collectors.toList()));
}
Displays data:
INFO [com.gostophandle.ejb.MotorcyclesEJB] (default task-1) >>>>> 6.4 motorcycleApriliaMotoGuzzi = Document{{_id=61d70d6a8c9e88075702af3e, manufacturer=Aprilia, model=RS 660, modelType=E5, typesOf=Sport, dateProductionStarted=Fri Jan 01 00:00:00 GMT 2021, dateProductionEnded=Fri Jan 01 00:00:00 GMT 2021, engine=Document{{type=Four-Stroke, displacement=659.0, cylinder=2.0, capacityUnit=cc, carburation=, bore=0.0, boreMeasurement=mm, stroke=0.0, strokeMeasurement=mm, distribution=, maxiumPowerHp=0.0, maxiumPowerKilowatt=0.0, maxiumPowerRpm=0.0, maximumTorque=0.0, maximumTorqueUnit=Nm, maximumTorqueRpm=0.0}}, performance=Document{{topSpeedMph=105.0, topSpeedKph=0.0, accelleration30Mph=0.0, accelleration60Mph=0.0, accelleration100Mph=0.0, accelleration30Kph=0.0, accelleration60Kph=0.0, accelleration100Kph=0.0}}, dimensionsWeights=Document{{batteryCapacity=, casterAngleDegrees=0.0, dimensionsL=0.0, dimensionsW=0.0, dimensionsH=0.0, frameType=, fuelTankCapacityLitres=0.0, fuelConsumption=0.0, groundClearance=0.0, kerbWeight=0.0, seatHeight=0.0, trail=0.0, wheelbase=0.0}}, chassisBrakesSuspensionWheels=Document{{frame=1, swingarm=2, absSystem=3, frontBrakes=4, rearBrakes=5, frontSuspension=6, rearSuspension=7, tyresFront=8, tyresRear=9, frontTyre=10, rearTyre=11, frontWheel=12, rearWheel=13, instrumentDisplayFunctions=14}}, transmission=Document{{clutch=1, clutchOperation=2, finalDrive=3, gearbox=4, 
transmissionType=5, primaryReduction=0.0, gearRatios1st=0.0, greaRatios2nd=0.0, gearRatios3rd=0.0, gearRatios4th=0.0, gearRatios5th=0.0, gearRatios6th=0.0}}, instruments=Document{{headlights=1, socket=2, ignitionSystem=3, instruments=4, tailLight=5, usbSocket=6}}, electrics=Document{{}}, colours=[Document{{colour=Acid Gold}}, Document{{colour=Lava Red}}, Document{{colour=Apex Black}}], accessories=[], image=Document{{file=/Users/NOTiFY/IdeaProjects/GoStopHandle/images, url=/Aprilia/2021/, png=ap6115200ebm03-01-m.webp, dimensionsWidth=1500, dimensionsHeight=1000}}}} 12:42:59,335 INFO [com.gostophandle.ejb.MotorcyclesEJB] (default task-1) >>>>> 6.4 motorcycleApriliaMotoGuzzi = Document{{_id=61d70d6a8c9e88075702af58, manufacturer=Moto Guzzi, model=Le Mans, modelType=I, typesOf=Sport, dateProductionStarted=Thu Jan 01 00:00:00 GMT 1976, dateProductionEnded=Sat Jan 01 00:00:00 GMT 1977, engine=Document{{type=Four-Stroke, displacement=850.0, cylinder=2.0, capacityUnit=cc, carburation=null, bore=80.0, boreMeasurement=mm, stroke=74.0, strokeMeasurement=mm, distribution=null, maxiumPowerHp=85.0, maxiumPowerKilowatt=38.0, maxiumPowerRpm=6200.0, maximumTorque=60.0, maximumTorqueUnit=Nm, maximumTorqueRpm=4900.0}}, electrics=Document{{}}, colours=[Document{{colour=Red}}, Document{{colour=Silver Blue}}], accessories=[Document{{productNumber=MG0123456789, productName=Product 1}}, Document{{productNumber=MG0123456789, productName=Product 2}}], image=Document{{file=/Users/NOTiFY/IdeaProjects/GoStopHandle/images, url=/MotoGuzzi/1976/, png=motorcycle.png, dimensionsWidth=900, dimensionsHeight=440}}}}
Obtain Master public DNS value from AWS EMR Cluster using the Java SDK
I need to obtain the master public DNS value via the Java SDK. The only information I'll have at the start of the application is the ClusterName, which is static. So far I've been able to pull out all the other information I need except this, which unfortunately is vital for the application to succeed. This is the code I'm currently working with:
List<ClusterSummary> summaries = clusters.getClusters();
for (ClusterSummary cs : summaries) {
    if (cs.getName().equals("test") && WHITELIST.contains(cs.getStatus().getState())) {
        ListInstancesResult instances = emr.listInstances(new ListInstancesRequest().withClusterId(cs.getId()));
        clusterHostName = instances.getInstances().get(0).toString();
        jobFlowId = cs.getId();
    }
}
I've removed the get for PublicIpAddress because I wanted the full toString for testing. To be clear, this method does give me the DNS that I need, but I have no way of differentiating between the instances. If my EMR cluster has 4 machines, I don't know which position in the list the master will occupy. For my basic trial I've only got two machines, 1 master and 1 worker, and .get(0) has returned the master on some runs and the worker on others. The information I can obtain from these is below. The only option I can see at the moment is to use 'ReadyDateTime' as an identifier, since the master 'should' always be ready first, but this feels hacky and I was hoping for a cleaner solution.
{Id: id, Ec2InstanceId: id, PublicDnsName: ec2-54--143.compute-1.amazonaws.com, PublicIpAddress: 54..143, PrivateDnsName: ip-10--158.ec2.internal, PrivateIpAddress: 10..158, Status: {State: RUNNING,StateChangeReason: {}, Timeline: {CreationDateTime: Tue Feb 21 09:18:08 GMT 2017, ReadyDateTime: Tue Feb 21 09:25:11 GMT 2017,}}, InstanceGroupId: id, EbsVolumes: []} {Id: id, Ec2InstanceId: id, PublicDnsName: ec2-54--33.compute-1.amazonaws.com, PublicIpAddress: 54..33, PrivateDnsName: ip-10--95.ec2.internal, PrivateIpAddress: 10..95, Status: {State: RUNNING,StateChangeReason: {}, Timeline: {CreationDateTime: Tue Feb 21 09:18:08 GMT 2017, ReadyDateTime: Tue Feb 21 09:22:48 GMT 2017,}}, InstanceGroupId: id EbsVolumes: []}
Don't use ListInstances. Instead, use DescribeCluster, which returns as one of the fields MasterPublicDnsName.
To expand on what was mentioned by Jonathon:
AmazonEC2Client ec2 = new AmazonEC2Client(cred);
DescribeInstancesResult describeInstancesResult = ec2.describeInstances(new DescribeInstancesRequest().withInstanceIds(clusterInstanceIds));
List<Reservation> reservations = describeInstancesResult.getReservations();
for (Reservation res : reservations) {
    for (GroupIdentifier group : res.getGroups()) {
        if (group.getGroupName().equals("ElasticMapReduce-master")) { // yaaaaaaaaah, Wahay!
            masterDNS = res.getInstances().get(0).getPublicDnsName();
        }
    }
}
Below is the working code to get the public DNS name.
AWSCredentials credentials_profile = new DefaultAWSCredentialsProviderChain().getCredentials();
AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(credentials_profile);
Region usEast1 = Region.getRegion(Regions.US_EAST_1);
emr.setRegion(usEast1);
DescribeClusterFunction fun = new DescribeClusterFunction(emr);
DescribeClusterResult res = fun.apply(new DescribeClusterRequest().withClusterId(clusterId));
String publicDNSName = res.getCluster().getMasterPublicDnsName();
Read tab delimited file and ignore empty space
I am working on a simple project in which a tab delimited text file is read into a program.
My problem: When reading the text file there are regularly empty data spaces. This lack of data is causing an unexpected output. For lines that do not have data in the token[4] position, all data read is ignored and "4" is displayed when I run a System.out.println (just a test that the data is being read properly). When I put a value in the token[4] position the data reads fine, but it is not acceptable for me to input a value in the token[4] position. See below for file and code.
2014 Employee Edward Rodrigo 6500
2014 Salesman Patricia Capola 5600 5000000
2014 Executive Suzy Allen 10000 55
2015 Executive James McHale 12500 49
2015 Employee Bernie Johnson 5500
2014 Salesman David Branch 6700 2000000
2015 Salesman Jonathan Stein 4600 300000
2014 Executive Michael Largo 17000 50
2015 Employee Kevin Bolden 9200
2015 Employee Thomas Sullivan 6250
My code is:
// Imports are here
import java.io.*;
import java.util.*;
public class EmployeeData {
    public static void main(String[] args) throws IOException {
        // Initialize variables
        String FILE = "employees.txt"; // Constant for file name to be read
        ArrayList<Employee> emp2014; // Array list for 2014 employees
        ArrayList<Employee> emp2015; // Array list for 2015 employees
        Scanner scan;
        // Try statement for error handling
        try {
            scan = new Scanner(new BufferedReader(new FileReader(FILE)));
            emp2014 = new ArrayList();
            emp2015 = new ArrayList();
            // While loop to read FILE
            while (scan.hasNextLine()) {
                String l = scan.nextLine();
                String[] token = l.split("\t");
                try {
                    String year = token[0];
                    String type = token[1];
                    String name = token[2];
                    String monthly = token[3];
                    String bonus = token[4];
                    System.out.println(year + " " + type + " " + name + " " + monthly + " " + bonus);
                } catch (Exception a) {
                    System.out.println(a.getMessage());
                }
            }
        } catch (Exception b) {
            System.out.println(b.getMessage());
        }
    }
}
The output I receive for lines with "Employee" comes back in an unexpected way. Output:
run:
4
2014 Salesman Patricia Capola 5600 5000000
2014 Executive Suzy Allen 10000 55
2015 Executive James McHale 12500 49
4
2014 Salesman David Branch 6700 2000000
2015 Salesman Jonathan Stein 4600 300000
2014 Executive Michael Largo 17000 50
4
4
BUILD SUCCESSFUL (total time: 0 seconds)
I tried to use an if-then to test for a null value in the token[4] position but that didn't really help me. I've done quite a bit of searching with no success. I am still very new to the programming world, so please pardon my coding inefficiencies. Any support and general feedback to improve my skills is greatly appreciated!
Thank you, Bryan
Java Devil is right that the underlying issue is an ArrayIndexOutOfBoundsException. But it's also worth exploring why you didn't see that. As we discussed in the comments, your "Try statement for error handling" is in fact not handling your errors at all; it is suppressing them, which is generally a poor plan because it allows your program to continue running even after your assumption (that it works correctly) has been violated.
Here's a slightly cleaned up version of your code. The underlying problem that causes the ArrayIndexOutOfBoundsException is still there, but the issue would be immediately apparent if you'd structured your code this way instead. There are a few comments calling out issues inline.
public class EmployeeData {
    // constants should be declared static and final, and not inside main
    private static final String FILE = "employees.txt";

    // If you have an exception and you don't know how to handle it the best thing
    // to do is throw it higher and let the caller of your method decide what to do.
    // If there's *nothing* you want to do with an exception allow main() to throw
    // it as you do here; your program will crash, but that's a good thing!
    public static void main(String[] args) throws IOException {
        // Notice the <> after ArrayList - without it you're defining a "raw type"
        // which is bad - https://stackoverflow.com/q/2770321/113632
        ArrayList<Employee> emp2014 = new ArrayList<>();
        ArrayList<Employee> emp2015 = new ArrayList<>();

        // A try-with-resources block automatically closes the file once you exit the block
        // https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html
        try (Scanner scan = new Scanner(new BufferedReader(new FileReader(FILE)))) {
            while (scan.hasNextLine()) {
                String l = scan.nextLine();
                String[] token = l.split("\t");
                // The code below this line assumes that token has at least five indices;
                // since that isn't always true you need to handle that edge case before
                // accessing the array indices directly.
                String year = token[0];
                String type = token[1];
                String name = token[2];
                String monthly = token[3];
                String bonus = token[4];
                System.out.println(year + " " + type + " " + name + " " + monthly + " " + bonus);
            }
        }
    }
}
This is happening because you are actually getting an ArrayIndexOutOfBoundsException, and the message for that is '4', because the index 4 is not less than the length of the array. You should call printStackTrace() on the exception in your catch block, as this will give you much more detail whenever the caught exception occurs. You can get around the problem by adding the following:
String bonus = "";
if (token.length > 4) bonus = token[4];
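The length check above can be generalized into a small helper that always returns five fields. This is only a sketch: the class name and FIELD_COUNT are made up for illustration, and the field order (year, type, name, monthly, bonus) is taken from the question.

```java
import java.util.Arrays;

class TabLineParser {
    static final int FIELD_COUNT = 5; // year, type, name, monthly, bonus

    // Returns exactly FIELD_COUNT fields, using "" for any that are missing.
    static String[] parse(String line) {
        // A limit of -1 keeps trailing empty fields instead of silently dropping them.
        String[] tokens = line.split("\t", -1);
        // copyOf pads a short array with null (and truncates a long one).
        String[] fields = Arrays.copyOf(tokens, FIELD_COUNT);
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == null) {
                fields[i] = "";
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] f = parse("2014\tEmployee\tEdward Rodrigo\t6500");
        System.out.println(f.length + " fields, bonus='" + f[4] + "'"); // 5 fields, bonus=''
    }
}
```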
Java7 GregorianCalendar timeInMillis from collection are not working well
I need to record a collection of objects that each keep a date and the average day temperature, and I need to be able to trace back the date. So I created a class that keeps these values and made an ArrayList that holds these objects. In my code I test with 5 days. When I run the program and the ArrayList gets filled, everything seems fine and the terminal displays:
dateSaved:2013-10-16 11:59:59 TimeStamp: 1381960799018
dateSaved:2013-10-17 11:59:59 TimeStamp: 1382047199018
dateSaved:2013-10-18 11:59:59 TimeStamp: 1382133599018
dateSaved:2013-10-19 11:59:59 TimeStamp: 1382219999018
These timestamps are all unique and seem to be fine. However, when I then enter the for loop and get the timestamps from each of these entries, I get:
entry: 0 //removed since the first dateSaved has not been pasted*
entry: 1 timeInMillis: 1382306399018
entry: 2 timeInMillis: 1382306399018
entry: 3 timeInMillis: 1382306399018
entry: 4 timeInMillis: 1382306399018
These are all the same time: Sun, 20 Oct 2013 21:59:59 GMT. That is today's date here, but not the time, and I'm not really getting the values I expect. What is going wrong here?
GregorianCalendar date = new GregorianCalendar();
GregorianCalendar beginDate = new GregorianCalendar();
beginDate.roll(beginDate.DAY_OF_YEAR, -5);
while (beginDate.getTimeInMillis() < date.getTimeInMillis()) {
    GCalAndDouble dateAndTemp = new GCalAndDouble(beginDate, WeatherStation.Instance().getValue(Enums.MeasurementType.outsideTemperature, Enums.ValueType.average, beginDate));
    list.add(dateAndTemp);
    System.out.println("dateSaved:" + new SimpleDateFormat("YYYY-MM-dd KK:mm:ss").format(new Timestamp(beginDate.getTimeInMillis())) + " TimeStamp: " + beginDate.getTimeInMillis());
    long timeTemp = beginDate.getTimeInMillis();
    beginDate.setTimeInMillis(timeTemp + 86400000); // + the amount of milliseconds in a day.
}
for (int j = 0; j < 5; j++) {
    GCalAndDouble tempdateandtemp = list.get(j);
    long timestamptemp = tempdateandtemp.getDate().getTimeInMillis();
    System.out.println("entry: " + j + " timeInMillis: " + timestamptemp);
}
Thanks for your help!
You are using the same beginDate object for every entry, so every entry in the list refers to the same calendar and all the values will be the same. The value changed as you were building the list, but the final value is all you will see afterwards. Most likely you intended to create a new date object for each entry (for example, a copy of beginDate) so that each one holds a different date. BTW I prefer to use a long, which is not only more efficient but doesn't have this issue.
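A minimal sketch of the difference, using only java.util (the class and method names are hypothetical, and the GCalAndDouble class from the question is left out): storing the one mutable calendar repeatedly versus storing a clone per entry.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.List;

class CalendarCopyDemo {

    // Stores the same mutable calendar every time: all entries end up identical.
    static List<Calendar> sharedReference(Calendar start, int days) {
        Calendar cursor = (Calendar) start.clone();
        List<Calendar> out = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            out.add(cursor);
            cursor.add(Calendar.DAY_OF_YEAR, 1);
        }
        return out;
    }

    // Stores an independent copy per entry: each keeps its own date.
    static List<Calendar> independentCopies(Calendar start, int days) {
        Calendar cursor = (Calendar) start.clone();
        List<Calendar> out = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            out.add((Calendar) cursor.clone());
            cursor.add(Calendar.DAY_OF_YEAR, 1);
        }
        return out;
    }

    public static void main(String[] args) {
        Calendar start = new GregorianCalendar(2013, Calendar.OCTOBER, 16, 11, 59, 59);
        for (Calendar c : sharedReference(start, 3)) {
            System.out.println("shared: " + c.getTimeInMillis());
        }
        for (Calendar c : independentCopies(start, 3)) {
            System.out.println("copy:   " + c.getTimeInMillis());
        }
    }
}
```

The "shared" lines all print the same value, while the "copy" lines increase from one entry to the next.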
Halting program execution between certain hours of the week
I am writing a Java program that is required to copy files and folders between the following hours:
Mon - 18:00 to 06:30
Tue - 18:00 to 06:30
Wed - 18:00 to 06:30
Thu - 18:00 to 06:30
Fri - 18:00 to 06:30
Sat - all day
Sun - all day
The program will run continuously until it has finished copying all files and folders. However, outside of the above hours the program should just sleep. I am using a properties file to store the above settings.
UPDATE
I am looking for the simplest possible implementation, including the format of the properties in the properties file as well as the code that will make the checks.
I would do it like this:
final Map<Integer, String> schedule = new HashMap<>();
// parse your settings and fill schedule
schedule.put(Calendar.MONDAY, "18:00 to 06:30");
// ...
// create timer to fire e.g. every hour
new Timer().scheduleAtFixedRate(new TimerTask() {
    public void run() {
        Calendar c = Calendar.getInstance();
        String s = schedule.get(c.get(Calendar.DAY_OF_WEEK));
        if (withinTimeRange(c, s)) { // implement withinTimeRange func
            // copy files
        }
    }
}, 0, 1000 * 3600);
Since your program is going to run continuously, the simplest solution is to check the day and time before copying a file. If the time is during off hours, go ahead and copy the next file, otherwise Thread.sleep. If this is an internal, one-off kind of program, I would go ahead and hard-code the business hours instead of reading the properties file. No need to add complexity.
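The check described above could be sketched like this with java.time. The class and method names are hypothetical, and the hard-coded window (copying blocked Monday to Friday from 06:30 up to 18:00, allowed at all other times) is my reading of the schedule in the question:

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

class CopyWindow {
    // Assumption from the question: no copying Mon-Fri between 06:30 and 18:00.
    static final LocalTime BLOCK_START = LocalTime.of(6, 30);
    static final LocalTime BLOCK_END = LocalTime.of(18, 0);

    static boolean isCopyAllowed(LocalDateTime now) {
        DayOfWeek day = now.getDayOfWeek();
        if (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY) {
            return true; // weekends: all day
        }
        LocalTime time = now.toLocalTime();
        return time.isBefore(BLOCK_START) || !time.isBefore(BLOCK_END);
    }

    // How long to sleep before copying may resume; zero if it is allowed now.
    static Duration timeUntilAllowed(LocalDateTime now) {
        if (isCopyAllowed(now)) {
            return Duration.ZERO;
        }
        return Duration.between(now.toLocalTime(), BLOCK_END);
    }

    public static void main(String[] args) throws InterruptedException {
        // Call this before copying each file.
        Thread.sleep(timeUntilAllowed(LocalDateTime.now()).toMillis());
        System.out.println("Copying is allowed now.");
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the methods, keeps the check easy to test with fixed dates.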
Whenever your program is launched, get the current time and check what day it is today. Check whether the current time lies within the permissible time; if yes, let the program continue. If not, find the time at 00:00 of that day and the time at xx:yyZZ (the start of the permissible time), calculate the difference, and let the program sleep for that long.
Thank you for your suggestions. I came up with a working solution in the end, which I will mark as the answer if it gets enough points. The way I attempted to solve this problem was by thinking about non-working hours rather than working hours. This code is just for illustration.
# Properties
Mon = 06:30-18:00
Tue = 06:30-18:00
Wed = 06:30-18:00
Thu = 06:30-18:00
Fri = 06:30-18:00
Loop over the properties to get their values:
String[] days = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
Map<Integer, Integer[]> nonWorkingHours = new HashMap<Integer, Integer[]>();
for( int i = 0; i < days.length; i++ ) // for each property in file
{
    // excluded implementation of getConfig
    String prop = getConfig( days[ i ] ); // e.g. "06:00-19:00"
    if( prop != null ) // days without a property (Sat, Sun) have no non-working hours
    {
        // +1 to match Calendar.DAY_OF_WEEK
        nonWorkingHours.put( i + 1, getHours( prop ) );
    }
}
My function to parse a property, excluding error handling:
// e.g. if prop = "06:00-19:00" then { 6, 0, 19, 0 } is returned
public Integer[] getHours( String prop )
{
    String[] times = prop.split( "(:|-)" );
    Integer[] t = new Integer[4];
    for( int i = 0; i < times.length; i++ )
    {
        t[i] = Integer.parseInt( times[i] );
    }
    return t;
}
And finally the function that implements the halt:
private void sleepIfOutsideWorkingHours()
{
    Calendar c = Calendar.getInstance();
    Integer[] data = nonWorkingHours.get( c.get( Calendar.DAY_OF_WEEK ) );
    if( data != null )
    {
        // read the current hour and minute from the calendar instance
        int currentSeconds = ( c.get( Calendar.HOUR_OF_DAY ) * 3600 ) + ( c.get( Calendar.MINUTE ) * 60 );
        int stopFrom = ( data[ 0 ] * 3600 ) + ( data[ 1 ] * 60 );
        int stopTill = ( data[ 2 ] * 3600 ) + ( data[ 3 ] * 60 );
        if( currentSeconds > stopFrom && currentSeconds < stopTill )
        {
            int secondsDiff = stopTill - currentSeconds;
            if( secondsDiff > 0 )
            {
                try
                {
                    Thread.sleep( secondsDiff * 1000L ); // turn seconds to milliseconds
                }
                catch( InterruptedException e )
                {
                    // error handling
                }
            }
        }
    }
}
And finally, just call the function below before copying each file; if it is called during the non-working hours defined in the properties, it will pause the program until they are over.
sleepIfOutsideWorkingHours();
I am sure there is a simpler way of doing it :-) but there it is.
You should try using a Continuous Integration system. That scenario can easily be set up using Jenkins CI, for example. The reason I advise doing it in an environment like that is that you keep better control over the history of your program runs.