selecting files with regex - java

I'm working on a college project in which my method uses a regex to pull the date out of the name of each log file in a folder. It then compares that date's millisecond timestamp with a cutoff defined by the Ndays parameter and deletes the file if it is older than the cutoff (e.g. 30 days ago; the parameters passed in would be 30 and the folder that contains the logs).
Any assistance is much appreciated. My attempt below isn't working:
public void deleteFilesOlderThanNdays( int Ndays, String aFolder) throws
ParseException{
File directory = new File(aFolder);
if(directory.exists()){
File[] logFiles = directory.listFiles();
for (File log: logFiles) {
String name= log.getName();
//Pattern dateFind = Pattern.compile("\\d\\{0,10}");// or
Pattern dateFind = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
Matcher dateSearch =dateFind.matcher(name);
while(dateSearch.find()){
String logDate = dateSearch.toString();
SimpleDateFormat format= new SimpleDateFormat("yyyy.mm.dd");
Date date= format.parse(logDate);
long cutOffPoint = System.currentTimeMillis() - (Ndays* 24*60*60*1000);
if(date.getTime()< cutOffPoint){
log.delete();
}
}
}
}
}

It turned out to be the simple mistakes pointed out in the replies: mm instead of MM in the date format, and calling toString() rather than group() on the Matcher object. Thanks, guys.
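For reference, the two fixes described above can be sketched as follows. This is a minimal sketch, not the poster's final code: the class name LogCleaner and the helper isOlderThan are hypothetical, and names without a valid date are simply skipped. Two further changes are assumptions on my part: the format string uses dashes so that it matches the regex (SimpleDateFormat rejects a dashed date when the pattern uses dots), and the cutoff is computed with TimeUnit so the arithmetic is done in long (Ndays*24*60*60*1000 overflows int once Ndays reaches 25).

```java
import java.io.File;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogCleaner {
    // Assumed naming convention: the file name contains a date such as 2013-07-03.
    private static final Pattern DATE_IN_NAME = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    /** Returns true if the date embedded in fileName lies more than nDays before nowMillis. */
    static boolean isOlderThan(String fileName, int nDays, long nowMillis) {
        Matcher m = DATE_IN_NAME.matcher(fileName);
        if (!m.find()) {
            return false; // no date in the name: leave the file alone
        }
        // group() returns the matched text; toString() only describes the Matcher.
        String logDate = m.group();
        // MM is the month; mm would be the minute.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setLenient(false);
        try {
            Date date = format.parse(logDate);
            long cutOffPoint = nowMillis - TimeUnit.DAYS.toMillis(nDays); // long arithmetic
            return date.getTime() < cutOffPoint;
        } catch (ParseException e) {
            return false; // digits that are not a real date, e.g. 2013-13-45
        }
    }

    static void deleteFilesOlderThanNdays(int nDays, String aFolder) {
        File[] logFiles = new File(aFolder).listFiles(); // null if aFolder is not a directory
        if (logFiles == null) {
            return;
        }
        long now = System.currentTimeMillis();
        for (File log : logFiles) {
            if (isOlderThan(log.getName(), nDays, now)) {
                log.delete();
            }
        }
    }
}
```

Keeping the date check in its own method makes it possible to try the logic on plain file-name strings before letting it delete anything.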

Related

Java - Do not delete files that match any of the array values

I have a list of files (approximately 500 or more) where each filename contains a date.
file_20180810
file_19950101
file_20180809
etc.
What I want to do is delete the files that are older than the storage period.
I've come up with the following logic so far:
~Get the dates of the valid storage period (i.e. if the storage period is 5 days and today's date is 20180810, store the date values 20180810, 20180809, 20180808, 20180807, 20180806, 20180805 in an array).
~Check every file in the directory to see whether its name contains any of those dates. If it contains one, don't delete it; otherwise delete it.
My problem is that each file name contains only a single date, so if I loop over the dates and delete a file as soon as one date doesn't match, I might delete files with valid dates as well. To show what I want to do in code form, it goes something like this:
if (!fileName.contains(stringDate1) &&
!fileName.contains(stringDate2) &&
!fileName.contains(stringDate3)) //...until storage period
{//delete file}
Is there a better way to express this? Any suggestions for a workaround?
Please and thank you.
Parse dates from your filename. Here's an example:
import java.time.*;
import java.util.regex.*;
public class MyClass {
public static void main(String args[]) {
LocalDate today = LocalDate.now();
long storagePeriod = 5L;
String fileName = "file_20180804";
String pattern = "file_(\\d{4})(\\d{2})(\\d{2})";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(fileName);
if (m.find()) {
// Build the date only when the name matched; LocalDate.of(0, 0, 0) would throw.
int year = Integer.parseInt(m.group(1));
int month = Integer.parseInt(m.group(2));
int day = Integer.parseInt(m.group(3));
LocalDate fileDate = LocalDate.of(year, month, day);
if (fileDate.isBefore(today.minusDays(storagePeriod))) {
System.out.println("Delete this file");
}
}
}
}
You can try using Regex to extract the actual date of each file and check for the inclusion in a validity period.
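Pattern p = Pattern.compile("file_(?<date>\\d{8})"); // eight digits: yyyyMMdd
DateTimeFormatter fmt = DateTimeFormatter.BASIC_ISO_DATE; // parses yyyyMMdd
for (File f : fileList) {
    Matcher m = p.matcher(f.getName());
    if (m.find()) {
        LocalDate fileDate = LocalDate.parse(m.group("date"), fmt);
        if (fileDate.isBefore(periodStartDate)) {
            f.delete();
        }
    }
}
This is a sketch rather than a complete program: fileList (the files to check) and periodStartDate (a LocalDate marking the first day that is still kept) have to be supplied by the caller, but the main idea is here.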
You can delete only the files whose names do not contain one of the dates in the list, like this (tested, working):
String path = ""; // <- Folder we want to clean.
DateFormat df = new SimpleDateFormat("yyyyMMdd"); // <- DateFormat to convert the Calendar dates into our format.
Calendar cal = Calendar.getInstance(); // <- Using Calendar to get the days backwards.
ArrayList<String> dr = new ArrayList<String>(); // <- Save the dates we want to keep. dr = don't remove
dr.add(df.format(cal.getTime())); // <- add the actual date to List
for(int i = 0; i < 5; i++) { // <- Loop 5 Times to get the 5 Last Days
cal.add(Calendar.DATE, -1); // <- remove 1 day from actual Calendar date
dr.add(df.format(cal.getTime())); // <- add the day before to List
}
for(File file : new File(path).listFiles()) { // <- loop through all the files in the folder
String filename = file.getName(); // <- full name; contains() works with or without an extension, and names without a "." no longer break substring()
boolean remove = true; // <- Set removing to "yes"
for(String s : dr) { // <- loop through all the allowed dates
if(filename.contains(s)) { // <- when the file contains the allowed date
remove = false; // <- Set removing to "no"
break; // <- Break the loop for better performance
}
}
if(remove) { // <- If remove is "yes"
file.delete(); // <- Delete the file because it's too old for us!
}
}
But this is not the best way. A much better method is to calculate how old each file is. Because of the _ you can easily extract the date from the file name, like this (not tested):
String path = ""; // <- Folder we want to clean.
Date today = new Date();
DateFormat df = new SimpleDateFormat("yyyyMMdd"); // <- Dateformat you used in the files
long maxage = 5L * 24 * 60 * 60 * 1000; // <- Maximum age in milliseconds (5L keeps the arithmetic in long for larger periods)
for(File file : new File(path).listFiles()) { // <- loop through all the files in the folder
String fds = file.getName().split("_")[1]; // <- Date from the filename as string
try {
Date date = df.parse(fds); // Convert the string to a date
if(today.getTime() - date.getTime() > maxage) { // <- when the file is older than 5 days
file.delete(); // <- Delete the file
}
} catch (ParseException e) {
e.printStackTrace();
}
}
Here is some example code which demonstrates how a list of input files (file name strings, e.g., "file_20180810") can be verified against a supplied set of date strings (e.g., "20180810") and perform an operation (like delete the file) on them.
import java.util.*;
import java.io.*;
public class FilesTesting {
private static final int DATE_STRING_LENGTH = 8; // length of 20180809
public static void main(String [] args) {
List<String> filter = Arrays.asList("20180810", "20180808", "20180809", "20180807", "20180806", "20180805");
List<File> files = Arrays.asList(new File("file_20180810"), new File("file_19950101"), new File("file_20180809"));
for (File file : files) {
String fileDateStr = getDateStringFromFileName(file.getName());
if (filter.contains(fileDateStr)) {
// Do something with it
// Delete file - if it exists
System.out.println(file.toString());
}
}
}
private static String getDateStringFromFileName(String fileName) {
int fileLen = fileName.length();
int dateStrPos = fileLen - DATE_STRING_LENGTH;
return fileName.substring(dateStrPos);
}
}
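The filter list above is hard-coded. As a minimal sketch, assuming the storage period counts back from today inclusive (so 5 days yields 6 date strings, as in the question), the list could be generated instead. The class and method names here are hypothetical, and the date is assumed to be the last 8 characters of the file name, as in the example above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashSet;
import java.util.Set;

class RetentionFilter {
    /** Builds the yyyyMMdd strings for today and the storagePeriod days before it. */
    static Set<String> keepDates(LocalDate today, int storagePeriod) {
        Set<String> keep = new LinkedHashSet<>();
        for (int i = 0; i <= storagePeriod; i++) {
            // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd
            keep.add(today.minusDays(i).format(DateTimeFormatter.BASIC_ISO_DATE));
        }
        return keep;
    }

    /** A file is kept when the last 8 characters of its name are one of the keep dates. */
    static boolean shouldKeep(String fileName, Set<String> keep) {
        if (fileName.length() < 8) {
            return false; // too short to end in a date
        }
        return keep.contains(fileName.substring(fileName.length() - 8));
    }
}
```

A Set gives constant-time lookups, which matters little for 6 dates but keeps the check cheap if the storage period grows.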
If you're using ES6 (JavaScript), you can use Array.prototype.includes, which returns true or false:
['a', 'b', 'c'].includes('b')

Best method for parsing date formats during data import

I created a method for parsing a few different date formats during data import (400K records). My method catches ParseException and tries the next format when parsing fails.
Question: Is there a better (and faster) way to determine the correct date format during data import?
private static final String DMY_DASH_FORMAT = "dd-MM-yyyy";
private static final String DMY_DOT_FORMAT = "dd.MM.yyyy";
private static final String YMD_DASH_FORMAT = "yyyy-MM-dd";
private static final String YMD_DOT_FORMAT = "yyyy.MM.dd";
private static final String SIMPLE_YEAR_FORMAT = "yyyy";
private final List<String> dateFormats = Arrays.asList(YMD_DASH_FORMAT, DMY_DASH_FORMAT,
DMY_DOT_FORMAT, YMD_DOT_FORMAT);
private Date parseDateFromString(String date) throws ParseException {
if (date.equals("0")) {
return null;
}
if (date.length() == 4) {
SimpleDateFormat simpleDF = new SimpleDateFormat(SIMPLE_YEAR_FORMAT);
simpleDF.setLenient(false);
return new Date(simpleDF.parse(date).getTime());
}
for (String format : dateFormats) {
SimpleDateFormat simpleDF = new SimpleDateFormat(format);
try {
return new Date(simpleDF.parse(date).getTime());
} catch (ParseException exception) {
}
}
throw new ParseException("Unknown date format", 0);
}
If you're running single threaded, an obvious improvement is to create the SimpleDateFormat objects only once. In a multithreaded situation using ThreadLocal<SimpleDateFormat> would be required.
Also fix your exception handling. It looks like it's written by someone who shouldn't be trusted to import any data.
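A minimal sketch of both suggestions, under stated assumptions: the class name CachedDateParser is hypothetical, the formatters are created once per thread via ThreadLocal, and parse(String, ParsePosition) is used so that a non-matching format returns null instead of throwing, which avoids the empty catch block. Lenient parsing is switched off so that, for example, "2011.11.01" is not accepted by the dd.MM.yyyy format.

```java
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class CachedDateParser {
    private static final String[] PATTERNS = {"yyyy-MM-dd", "dd-MM-yyyy", "dd.MM.yyyy", "yyyy.MM.dd"};

    // One set of formatters per thread, because SimpleDateFormat is not thread-safe.
    private static final ThreadLocal<List<SimpleDateFormat>> FORMATS = ThreadLocal.withInitial(() -> {
        List<SimpleDateFormat> list = new ArrayList<>();
        for (String pattern : PATTERNS) {
            SimpleDateFormat f = new SimpleDateFormat(pattern);
            f.setLenient(false); // reject values such as day 2011
            list.add(f);
        }
        return list;
    });

    static Date parse(String text) throws ParseException {
        for (SimpleDateFormat f : FORMATS.get()) {
            ParsePosition pos = new ParsePosition(0);
            Date d = f.parse(text, pos); // returns null on failure instead of throwing
            if (d != null && pos.getIndex() == text.length()) {
                return d; // the whole input was consumed by this format
            }
        }
        throw new ParseException("Unknown date format: " + text, 0);
    }
}
```

The exception is now raised only once, after every known format has been tried, and it names the offending input.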
For a similar problem statement, I used the time4j library in the past. Here is an example; the Maven dependencies it needs are listed after the code.
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import net.time4j.PlainDate;
import net.time4j.format.expert.ChronoFormatter;
import net.time4j.format.expert.MultiFormatParser;
import net.time4j.format.expert.ParseLog;
import net.time4j.format.expert.PatternType;
public class MultiDateParser {
static final MultiFormatParser<PlainDate> MULTI_FORMAT_PARSER;
static {
ChronoFormatter<PlainDate> style1 = ChronoFormatter.ofDatePattern("dd-MM-yyyy", PatternType.CLDR,
Locale.GERMAN);
ChronoFormatter<PlainDate> style2 = ChronoFormatter.ofDatePattern("dd.MM.yyyy", PatternType.CLDR, Locale.US);
ChronoFormatter<PlainDate> style3 = ChronoFormatter.ofDatePattern("yyyy-MM-dd", PatternType.CLDR, Locale.US);
ChronoFormatter<PlainDate> style4 = ChronoFormatter.ofDatePattern("yyyy.MM.dd", PatternType.CLDR, Locale.US);
//this is not supported
//ChronoFormatter<PlainDate> style5 = ChronoFormatter.ofDatePattern("yyyy", PatternType.CLDR, Locale.US);
MULTI_FORMAT_PARSER = MultiFormatParser.of(style1, style2, style3, style4);
}
public List<PlainDate> parse() throws ParseException {
String[] input = { "11-09-2001", "09.11.2001", "2011-11-01", "2011.11.01", "2012" };
List<PlainDate> dates = new ArrayList<>();
ParseLog plog = new ParseLog();
for (String s : input) {
plog.reset(); // initialization
PlainDate date = MULTI_FORMAT_PARSER.parse(s, plog);
if (date == null || plog.isError()) {
System.out.println("Wrong entry found: " + s + " at position " + dates.size() + ", error-message="
+ plog.getErrorMessage());
} else {
dates.add(date);
}
}
System.out.println(dates);
return dates;
}
public static void main(String[] args) throws ParseException {
MultiDateParser mdp = new MultiDateParser();
mdp.parse();
}
}
<dependency>
<groupId>net.time4j</groupId>
<artifactId>time4j-core</artifactId>
<version>4.19</version>
</dependency>
<dependency>
<groupId>net.time4j</groupId>
<artifactId>time4j-misc</artifactId>
<version>4.19</version>
</dependency>
The yyyy case will have to be handled differently, as it is not a full date. Logic similar to what you already use (length == 4) may be an option.
The above code prints the following. You can do a quick performance run to see whether this scales to the 400K records you have.
Wrong entry found: 2012 at position 4, error-message=Not matched by any format: 2012
[2001-09-11, 2001-11-09, 2011-11-01, 2011-11-01]
Talking about 400K records, it might be reasonable to do some "bare hands" optimization here.
For example: if your incoming string has a "-" at position 5, you know that the only (potentially) matching format is "yyyy-MM-dd". If it is a ".", you know that it is the other format that starts with yyyy.
So, if you really want to optimize, you could fetch that character and see what it is. That could save up to 3 attempts at parsing with the wrong format!
Beyond that: I am not sure whether "dd" means that your other dates always start with two digits such as "01", or whether "1.1.2016" would be possible, too. If all your dates always use two digits for dd/MM, you can repeat that game and fetch the character at position 3 to choose between "dd.MM.yyyy" and "dd-MM-yyyy".
Of course, there is one disadvantage: if you follow that idea, you are very much hard-coding the expected formats into your code, so adding other formats will become harder. On the other hand, you would save a lot.
Finally: the other thing that might greatly speed things up would be to use stream operations for reading/parsing that information, because then you could look into parallel streams and simply exploit the ability of modern hardware to process 4, 8, 16 dates in parallel.
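The separator check and the parallel stream described above can be sketched together. This is a minimal sketch under assumptions: every input is exactly 10 characters with two-digit day and month, java.time is used in place of SimpleDateFormat (DateTimeFormatter is thread-safe, so it can be shared across a parallel stream), and the class name is hypothetical. "Position 5" counted from 1 is index 4 in Java.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

class SeparatorDispatchParser {
    private static final DateTimeFormatter YMD_DASH = DateTimeFormatter.ofPattern("uuuu-MM-dd");
    private static final DateTimeFormatter YMD_DOT = DateTimeFormatter.ofPattern("uuuu.MM.dd");
    private static final DateTimeFormatter DMY_DASH = DateTimeFormatter.ofPattern("dd-MM-uuuu");
    private static final DateTimeFormatter DMY_DOT = DateTimeFormatter.ofPattern("dd.MM.uuuu");

    /** Picks the one candidate format by looking at the separator, then parses once. */
    static LocalDate parse(String s) {
        if (s.length() != 10) {
            throw new IllegalArgumentException("Unexpected length: " + s);
        }
        char afterYear = s.charAt(4); // separator if the string starts with yyyy
        if (afterYear == '-') return LocalDate.parse(s, YMD_DASH);
        if (afterYear == '.') return LocalDate.parse(s, YMD_DOT);
        char afterDay = s.charAt(2);  // separator if the string starts with dd
        if (afterDay == '-') return LocalDate.parse(s, DMY_DASH);
        if (afterDay == '.') return LocalDate.parse(s, DMY_DOT);
        throw new IllegalArgumentException("Unknown date format: " + s);
    }

    /** Parses all inputs in parallel; the result keeps the input order. */
    static List<LocalDate> parseAll(List<String> inputs) {
        return inputs.parallelStream()
                .map(SeparatorDispatchParser::parse)
                .collect(Collectors.toList());
    }
}
```

Whether the parallel version actually pays off for 400K short strings is something to measure rather than assume.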

Trying to get the arraylist value inside hashmap key

I'm probably being stupid here, but I need help with this one! Basically I need to do a .contains("message") to determine whether the list stored under the key already contains the incoming message.
Thanks in advance!
EDIT: Just as a note, I do not want it to do anything if the message already exists! Currently it's not adding it to the list.
EDIT2: The date will not matter for the incoming message, because the incoming message does not have the date portion.
private Map<Integer,List<String>> map = new HashMap<Integer,List<String>>();
public synchronized void addToProblemList(String incomingMessage, int storeNumber){
Date date = new Date();
SimpleDateFormat sdf = new SimpleDateFormat("MM/dd/yyyy h:mm:ss a");
String formattedDate = sdf.format(date);
if(map.get(storeNumber)==null){
map.put(storeNumber, new ArrayList<String>());
}
for(String lookForText : map.get(storeNumber)){
if(lookForText.contains(incomingMessage)){
}else if(!lookForText.contains(incomingMessage)){
map.get(storeNumber).add(incomingMessage+"\nTime of incident: "+formattedDate+"\n--------------------------------------------------------");
}
}
}
It used to look like this, but it always added it:
public synchronized void addToProblemList(String incomingMessage, int storeNumber){
Date date = new Date();
SimpleDateFormat sdf = new SimpleDateFormat("MM/dd/yyyy h:mm:ss a");
String formattedDate = sdf.format(date);
if(map.get(storeNumber)==null){
map.put(storeNumber, new ArrayList<String>());
}
if(map.get(storeNumber).contains(incomingMessage)==true){
//Do nothing
}
if (map.get(storeNumber).contains(incomingMessage)==false){
map.get(storeNumber).add(incomingMessage+"\nTime of incident: "+formattedDate+"\n--------------------------------------------------------");
}
}
What you are adding to the map is the store number as the key and an empty ArrayList as the value.
So when the first message for a store arrives, the list is empty, and your for loop does not execute because it has no elements to iterate over.
So add this:
if(map.get(storeNumber)==null){
ArrayList<String> aList = new ArrayList<String>();
aList.add(incomingMessage+"\nTime of incident: "+formattedDate+"\n--------------------------------------------------------");
map.put(storeNumber, aList);
}
Note that in map.get(storeNumber).contains(incomingMessage)==true you don't need the boolean comparison, as contains() already returns a boolean.
The reason your original approach didn't work is that List.contains() checks whether the list holds an exactly matching string. It does not, because the string you added also contains "\nTime of incident: "+formattedDate+"\n..., which will not match incomingMessage on its own.
You have this:
for(String lookForText : map.get(storeNumber)){
if(lookForText.contains(incomingMessage)){
}else if(!lookForText.contains(incomingMessage)){
map.get(storeNumber).add(incomingMessage+"\nTime of incident: "+formattedDate+"\n--------------------------------------------------------");
}
}
Try this instead:
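List<String> messages = map.get(storeNumber);
// Each stored entry also holds the timestamp, so test every entry with String.contains() rather than List.contains().
if(messages.stream().noneMatch(stored -> stored.contains(incomingMessage))){
messages.add(incomingMessage+"\nTime of incident: "+formattedDate+"\n--------------------------------------------------------");
}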
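Putting the pieces together: because each stored entry is the message plus a timestamp, an exact List.contains() match on the bare message never succeeds, and adding to the list while iterating over it risks a ConcurrentModificationException. A minimal sketch of the whole method, assuming Java 8 or later; the class name ProblemList, the count helper and passing formattedDate in as a parameter (to keep the example deterministic) are my assumptions, and the separator line is shortened.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ProblemList {
    private static final String TIME_MARKER = "\nTime of incident: ";
    private final Map<Integer, List<String>> map = new HashMap<>();

    /** Returns true if the message was added, false if the store already had it. */
    synchronized boolean addToProblemList(String incomingMessage, int storeNumber, String formattedDate) {
        // Creates the list on first use, so the null check disappears.
        List<String> messages = map.computeIfAbsent(storeNumber, k -> new ArrayList<>());
        // Stored entries are "message + marker + date", so compare by prefix, not equality.
        for (String stored : messages) {
            if (stored.startsWith(incomingMessage + TIME_MARKER)) {
                return false; // already recorded: do nothing
            }
        }
        // The add happens after the loop, never while iterating.
        messages.add(incomingMessage + TIME_MARKER + formattedDate + "\n--------");
        return true;
    }

    synchronized int count(int storeNumber) {
        List<String> messages = map.get(storeNumber);
        return messages == null ? 0 : messages.size();
    }
}
```

Matching on the message followed by the marker, rather than a plain substring, keeps "printer" from being treated as a duplicate of "printer offline".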

combining the name of the file with todays date

I am creating an Excel sheet on the C: drive with the name ABC_0607, and it gets created as shown below.
String outputDir = "C:/Report/";
FileOutputStream fw = new FileOutputStream(new File(outputDir, "ABC_0607.xls"));
I receive this file on a daily basis and it needs to be stored on the C: drive, so I want to modify its name a little so that it is a combination of filename + MM/DD/YYYY. For example, if today's date is 3 July 2013, the file name should look like ABC_0607-MM/DD/YYYY, that is, ABC_0607-07/03/2013.
Please advise how to achieve this.
You can use this method to build the name of your file (it uses underscores in the date, since / is not allowed in a file name):
public String getFileNameFrom(String name) {
String currDate = new SimpleDateFormat("yyyy_MM_dd").format(new Date());
return name + "-" + currDate;
}
Use a StringBuilder initialized with the name of the file. Format the Date using a DateFormat and append the resulting String to it. Put the entire logic inside a method so that it can be reused without code duplication.
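A minimal sketch of that StringBuilder approach, under a few assumptions: the class and method names are hypothetical, the date is written as MM-dd-yyyy with dashes because a slash is a path separator and cannot appear in a file name, and the date is inserted before the extension so the file still ends in .xls. It uses java.time rather than DateFormat.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DatedFileName {
    // Dashes instead of slashes: '/' cannot be part of a file name.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("MM-dd-uuuu");

    /** Inserts "-MM-dd-yyyy" before the extension, or appends it if there is none. */
    static String withDate(String fileName, LocalDate date) {
        StringBuilder sb = new StringBuilder(fileName);
        String suffix = "-" + date.format(STAMP);
        int dot = fileName.lastIndexOf('.');
        if (dot > 0) {
            sb.insert(dot, suffix);
        } else {
            sb.append(suffix);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints ABC_0607-07-03-2013.xls
        System.out.println(withDate("ABC_0607.xls", LocalDate.of(2013, 7, 3)));
    }
}
```

In the question's code this would be used as new File(outputDir, DatedFileName.withDate("ABC_0607.xls", LocalDate.now())).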
I have the following code to create the name for a log file; it can be split hourly, daily or by the minute:
SimpleDateFormat ymd = new SimpleDateFormat("yyyy_MM_dd");
SimpleDateFormat ymdh = new SimpleDateFormat("yyyy_MM_dd_HH");
SimpleDateFormat ymdhm = new SimpleDateFormat("yyyy_MM_dd_HH_mm");
Calendar dt = Calendar.getInstance();
dt.setTimeInMillis(moment);
String fName;
if (_splitType == SPLIT_HOUR)
fName = ymdh.format(dt.getTime());
else if (_splitType == SPLIT_MINUTE)
fName = ymdhm.format(dt.getTime());
else
fName = ymd.format(dt.getTime());

Format date in String Template email

I'm creating an email using String Template but when I print out a date, it prints out the full date (eg. Wed Apr 28 10:51:37 BST 2010). I'd like to print it out in the format dd/mm/yyyy but don't know how to format this in the .st file.
I can't format each date individually (using Java's SimpleDateFormat) because I iterate over a collection of objects with dates.
Is there a way to format the date in the .st email template?
Use additional renderers like this (C#):
internal class AdvancedDateTimeRenderer : IAttributeRenderer
{
public string ToString(object o)
{
return ToString(o, null);
}
public string ToString(object o, string formatName)
{
if (o == null)
return null;
if (string.IsNullOrEmpty(formatName))
return o.ToString();
DateTime dt = Convert.ToDateTime(o);
return string.Format("{0:" + formatName + "}", dt);
}
}
and then add this to your StringTemplate such as:
var stg = new StringTemplateGroup("Templates", path);
stg.RegisterAttributeRenderer(typeof(DateTime), new AdvancedDateTimeRenderer());
then in st file:
$YourDateVariable; format="dd/MM/yyyy"$
it should work
Here is a basic Java example, see StringTemplate documentation on Object Rendering for more information.
StringTemplate st = new StringTemplate("now = $now$");
st.setAttribute("now", new Date());
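st.registerRenderer(Date.class, new AttributeRenderer(){
public String toString(Object date) {
SimpleDateFormat f = new SimpleDateFormat("dd/MM/yyyy");
return f.format((Date) date);
}
// Newer StringTemplate 3 releases also declare this overload; the format name is ignored here.
public String toString(Object date, String formatName) {
return toString(date);
}
});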
st.toString();
StringTemplate 4 includes a DateRenderer class.
My example below is a modified version of the NumberRenderer example in the documentation on renderers in Java:
String template =
"foo(right_now) ::= << <right_now; format=\"full\"> >>\n";
STGroup g = new STGroupString(template);
g.registerRenderer(Date.class, new DateRenderer());
ST st = g.getInstanceOf("foo");
st.add("right_now", new Date());
String result = st.render();
The provided options for format map as such:
"short" => DateFormat.SHORT (default)
"medium" => DateFormat.MEDIUM
"long" => DateFormat.LONG
"full" => DateFormat.FULL
Or, you can use a custom format like so:
foo(right_now) ::= << <right_now; format="MM/dd/yyyy"> >>
You can see these options and other details in the DateRenderer Java source.
One very important point when setting a date format is to use "MM" instead of "mm" for the month. "mm" means minutes. Using "mm" in place of "MM" commonly introduces bugs that are difficult to find.
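The difference is easy to see with the timestamp from the question (28 April 2010, 10:51:37). A minimal sketch using SimpleDateFormat; the class name is hypothetical:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

class MonthVersusMinute {
    static String format(String pattern, Date date) {
        return new SimpleDateFormat(pattern).format(date);
    }

    public static void main(String[] args) {
        // 28 April 2010, 10:51:37 in the default time zone
        Date d = new GregorianCalendar(2010, Calendar.APRIL, 28, 10, 51, 37).getTime();
        System.out.println(format("dd/MM/yyyy", d)); // 28/04/2010
        System.out.println(format("dd/mm/yyyy", d)); // 28/51/2010 (51 is the minute, not the month)
    }
}
```

Nothing fails at runtime with the wrong letters; the output is simply wrong, which is why the mistake is hard to spot.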
