I want to extract dates in different formats from web pages. I am using the Selenium2 Java API to interact with the browser, and I also use jQuery to interact with the document, so solutions for either layer are welcome.
Dates can have very different formats in different locales, and months can be written as names or as numbers. I need to match as many dates as possible, and I am aware that there are many combinations.
For example, if I have an HTML element like this:
<div class="tag_view">
Last update: May,22,2011
View :40
</div>
I want the relevant part of the date to be extracted and recognized:
May,22,2011
This should now be converted to a regular Java Date object.
Update
This should work with the HTML from any web page; the date can be contained in any element, in any format. For example, here on Stack Overflow the source code looks like this:
<span class="relativetime" title="2011-05-13 14:45:06Z">May 13 at 14:45</span>
I want this done in the most efficient way, and I guess that would be a jQuery selector or filter which returns a standardized date representation. But I am open to your suggestions.
Since we can't limit ourselves to any specific element type or children of any element, you're basically talking about searching the whole page's text for dates. The only way to do this with any kind of efficiency is to use regular expressions. Since you're looking for dates in any format, you need a regex for each acceptable format. Once you define what those are, just compile the regexes and run something like:
var datePatterns = new Array();
datePatterns.push(/\d\d\/\d\d\/\d\d\d\d/g);
datePatterns.push(/\d\d\d\d\/\d\d\/\d\d/g);
...
var stringToSearch = $('body').html(); // change this to be more specific if at all possible
var allMatches = new Array();
for (var i = 0; i < datePatterns.length; i++) {
    // match() returns null when nothing matches, so fall back to an empty array
    allMatches = allMatches.concat(stringToSearch.match(datePatterns[i]) || []);
}
You can find more date regexes by searching online, or write them yourself; they're fairly easy. One thing to note: you could probably combine some of the regexes above to make the program more efficient. I'd be very careful with that, since it can make your code hard to read very quickly. One regex per date format seems much cleaner.
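Since the question also asks for a Java-side solution, here is a minimal sketch of the same one-regex-per-format idea in Java. The class name `DateScanner`, the method `findDates`, and the three formats are assumptions chosen for illustration; a real list of formats would be longer. Each regex is paired with the `DateTimeFormatter` that parses its matches, so a hit can be converted straight into a date object.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateScanner {
    // One regex per accepted format, each paired with the formatter for its matches.
    // These three formats are illustrative assumptions, not an exhaustive list.
    private static final Map<Pattern, DateTimeFormatter> FORMATS = new LinkedHashMap<>();
    static {
        FORMATS.put(Pattern.compile("\\b\\d{2}/\\d{2}/\\d{4}\\b"),
                DateTimeFormatter.ofPattern("MM/dd/yyyy", Locale.US));
        FORMATS.put(Pattern.compile("\\b\\d{4}/\\d{2}/\\d{2}\\b"),
                DateTimeFormatter.ofPattern("yyyy/MM/dd", Locale.US));
        FORMATS.put(Pattern.compile("\\b[A-Z][a-z]{2},\\d{1,2},\\d{4}\\b"),
                DateTimeFormatter.ofPattern("MMM,d,yyyy", Locale.US));
    }

    // Returns every date found in the text, grouped by format in declaration order.
    static List<LocalDate> findDates(String text) {
        List<LocalDate> found = new ArrayList<>();
        for (Map.Entry<Pattern, DateTimeFormatter> e : FORMATS.entrySet()) {
            Matcher m = e.getKey().matcher(text);
            while (m.find()) {
                try {
                    found.add(LocalDate.parse(m.group(), e.getValue()));
                } catch (DateTimeParseException ignored) {
                    // Looked like a date but is not one (for example 99/99/2011); skip it.
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findDates("Last update: May,22,2011 View :40, created 2011/05/13"));
    }
}
```

Keeping the regex and the formatter side by side means adding a new format is a single extra entry, which preserves the readability argument made above.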
You could consider using getText to get the element text and then split the String, like this:
// select the element that actually contains the "Last update:" text
String s = selenium.getText("css=div.tag_view");
String date = s.split("Last update:")[1].split("View :")[0].trim();
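To get from the split string to a date object, the extracted text still has to be parsed. The following is a small sketch under the assumption that the element text always looks like the `tag_view` example from the question; `LastUpdateParser` and `parseLastUpdate` are hypothetical names, and `elementText` stands in for whatever `getText` returned.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LastUpdateParser {
    // Matches text such as "May,22,2011"; Locale.US fixes the month names to English.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMM,d,yyyy", Locale.US);

    // elementText is assumed to contain "Last update:" followed later by "View :".
    static LocalDate parseLastUpdate(String elementText) {
        String afterLabel = elementText.split("Last update:")[1];
        String dateText = afterLabel.split("View :")[0].trim();
        return LocalDate.parse(dateText, FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parseLastUpdate("Last update: May,22,2011\nView :40"));
    }
}
```

This only works while the two labels stay fixed, so it suits a known page better than arbitrary ones.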
I will answer this myself because I came up with a working solution. I appreciate comments though.
/**
 * Extract date
 *
 * @return Date object
 * @throws ParseException
 */
public Date extractDate(String text) throws ParseException {
Date date = null;
boolean dateFound = false;
String year = null;
String month = null;
String monthName = null;
String day = null;
String hour = null;
String minute = null;
String second = null;
String ampm = null;
String regexDelimiter = "[-:\\/.,]";
String regexDay = "((?:[0-2]?\\d{1})|(?:[3][01]{1}))";
String regexMonth = "(?:([0]?[1-9]|[1][012])|(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Sept|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?))";
String regexYear = "((?:[1]{1}\\d{1}\\d{1}\\d{1})|(?:[2]{1}\\d{3}))";
String regexHourMinuteSecond = "(?:(?:\\s)((?:[0-1][0-9])|(?:[2][0-3])|(?:[0-9])):([0-5][0-9])(?::([0-5][0-9]))?(?:\\s?(am|AM|pm|PM))?)?";
String regexEndswith = "(?![\\d])";
// DD/MM/YYYY
String regexDateEuropean =
regexDay + regexDelimiter + regexMonth + regexDelimiter + regexYear + regexHourMinuteSecond + regexEndswith;
// MM/DD/YYYY
String regexDateAmerican =
regexMonth + regexDelimiter + regexDay + regexDelimiter + regexYear + regexHourMinuteSecond + regexEndswith;
// YYYY/MM/DD
String regexDateTechnical =
regexYear + regexDelimiter + regexMonth + regexDelimiter + regexDay + regexHourMinuteSecond + regexEndswith;
// see if there are any matches
Matcher m = checkDatePattern(regexDateEuropean, text);
if (m.find()) {
day = m.group(1);
month = m.group(2);
monthName = m.group(3);
year = m.group(4);
hour = m.group(5);
minute = m.group(6);
second = m.group(7);
ampm = m.group(8);
dateFound = true;
}
if(!dateFound) {
m = checkDatePattern(regexDateAmerican, text);
if (m.find()) {
month = m.group(1);
monthName = m.group(2);
day = m.group(3);
year = m.group(4);
hour = m.group(5);
minute = m.group(6);
second = m.group(7);
ampm = m.group(8);
dateFound = true;
}
}
if(!dateFound) {
m = checkDatePattern(regexDateTechnical, text);
if (m.find()) {
year = m.group(1);
month = m.group(2);
monthName = m.group(3);
day = m.group(4); // group 3 is the month name; the day is group 4 in the YYYY/MM/DD pattern
hour = m.group(5);
minute = m.group(6);
second = m.group(7);
ampm = m.group(8);
dateFound = true;
}
}
// construct date object if date was found
if(dateFound) {
String dateFormatPattern = "";
String dayPattern = "";
String dateString = "";
if(day != null) {
dayPattern = "d" + (day.length() == 2 ? "d" : "");
}
if(day != null && month != null && year != null) {
dateFormatPattern = "yyyy MM " + dayPattern;
dateString = year + " " + month + " " + day;
} else if(monthName != null) {
if(monthName.length() == 3) dateFormatPattern = "yyyy MMM " + dayPattern;
else dateFormatPattern = "yyyy MMMM " + dayPattern;
dateString = year + " " + monthName + " " + day;
}
if(hour != null && minute != null) {
//TODO ampm
// HH is the 24-hour clock (0-23) that the regex accepts; "hh" would read 12:24 as 00:24
dateFormatPattern += " HH:mm";
dateString += " " + hour + ":" + minute;
if(second != null) {
dateFormatPattern += ":ss";
dateString += ":" + second;
}
}
if(!dateFormatPattern.equals("") && !dateString.equals("")) {
//TODO support different locales
SimpleDateFormat dateFormat = new SimpleDateFormat(dateFormatPattern.trim(), Locale.US);
date = dateFormat.parse(dateString.trim());
}
}
return date;
}
private Matcher checkDatePattern(String regex, String text) {
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
return p.matcher(text);
}
Related
I am reading data from a file using a file stream and importing it into Oracle tables. The column 'FT_FIRST' has the Date datatype in the Oracle table, so I only want date values to be inserted and other values to be ignored. The date comes from the file in the format 'YYYYMMDD'. If, in the future, the file contains values other than dates for this column and the code tries to insert them into the Oracle table, the Java code might throw an error because the literal does not match the format string.
To avoid this issue, I want to modify my Java code so that it inserts only values in the date format and ignores other values. Currently I only handle specific strings from the file that I know about, ignoring them because they are not in date format.
Java has the DateTimeFormatter class, but I don't know how to use it in my code.
private String createUpdateTableSql(String line, String tableName, String dateFormat, List<ColumnData> columnData) {
List<String> data = Splitter.on("|").trimResults().splitToList(line);
String ftFirst = "";
String tr = "";
String pds = "";
for (int i = 0; i < columnData.size(); i++) {
if(columnData.get(i) == null || "N.A.".equalsIgnoreCase(data.get(i)) || "N.A".equalsIgnoreCase(data.get(i)) || "UNKNOWN".equalsIgnoreCase(data.get(i))) {
continue;
}
if ("FT_FIRST".equalsIgnoreCase(columnData.get(i).getName().trim())) {
ftFirst = data.get(i);
}
if ("TR".equalsIgnoreCase(columnData.get(i).getName().trim())) {
tr = data.get(i);
}
if ("P_S_SOURCE".equalsIgnoreCase(columnData.get(i).getName().trim())) {
pds = data.get(i);
}
}
return "UPDATE " + tableName + " " +
"SET FT_FIRST=to_date('" + ftFirst + "','YYYYMMDD')" +
" WHERE TR='" + tr +
"' AND P_S_SOURCE='" + pds + "'";
}
When you read data from the file, you can parse the date field into a Date object like this:
// NEW CODE
private Date getDateValue(String sDate, String format) {
    SimpleDateFormat sdf = new SimpleDateFormat(format);
    try {
        // parse the value read from the file, not the format string
        return sdf.parse(sDate);
    } catch (ParseException ignored) {
        // TODO: Log this exception
        return null;
    }
}
As you can see, you are parsing a string into a date with the expected format yyyyMMdd (YYYYMMDD in Oracle).
If parsing fails (the field does not contain the expected format), you can add whatever logic you want in the catch clause, or ignore the error and leave the value as null.
Your code will look like this:
private String createUpdateTableSql(String line, String tableName, String dateFormat, List<ColumnData> columnData) {
    List<String> data = Splitter.on("|").trimResults().splitToList(line);
    Date futNoticeFirst = null;
    String ticker = "";
    String pds = "";
    for (int i = 0; i < columnData.size(); i++) {
        if (columnData.get(i) == null || "N.A.".equalsIgnoreCase(data.get(i)) || "N.A".equalsIgnoreCase(data.get(i)) || "UNKNOWN".equalsIgnoreCase(data.get(i))) {
            continue;
        }
        if ("FUT_NOTICE_FIRST".equalsIgnoreCase(columnData.get(i).getName().trim())) {
            // stays null when the field is not a valid yyyyMMdd date
            futNoticeFirst = getDateValue(data.get(i), "yyyyMMdd");
        }
        if ("TICKER".equalsIgnoreCase(columnData.get(i).getName().trim())) {
            ticker = data.get(i);
        }
        if ("PARSEKYABLE_DES_SOURCE".equalsIgnoreCase(columnData.get(i).getName().trim())) {
            pds = data.get(i);
        }
    }
    // Write NULL when the field could not be parsed; otherwise hand the date to Oracle's to_date
    String futNoticeFirstSql = futNoticeFirst == null
            ? "NULL"
            : "to_date('" + new SimpleDateFormat("yyyyMMdd").format(futNoticeFirst) + "','YYYYMMDD')";
    return "UPDATE " + tableName + " " +
            "SET FUT_NOTICE_FIRST=" + futNoticeFirstSql +
            " WHERE TICKER='" + ticker +
            "' AND PARSEKYABLE_DES_SOURCE='" + pds + "'";
}
// NEW CODE
private Date getDateValue(String sDate, String format) {
    SimpleDateFormat sdf = new SimpleDateFormat(format);
    try {
        // parse the value read from the file, not the format string
        return sdf.parse(sDate);
    } catch (ParseException ignored) {
        // TODO: Log this exception
        return null;
    }
}
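Since the question mentions DateTimeFormatter, the same check can be sketched with java.time instead of SimpleDateFormat. This is only an illustration: `StrictDateField` and its `parse` method are hypothetical names, and returning an `Optional` is one possible way to signal "not a date". The strict resolver style rejects impossible dates such as 20180230 rather than adjusting them.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Optional;

class StrictDateField {
    // "uuuu" rather than "yyyy": with ResolverStyle.STRICT, "yyyy" would also require an era.
    private static final DateTimeFormatter YYYYMMDD =
            DateTimeFormatter.ofPattern("uuuuMMdd").withResolverStyle(ResolverStyle.STRICT);

    // Returns the parsed date, or empty when the field is not a valid yyyyMMdd date.
    static Optional<LocalDate> parse(String field) {
        if (field == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(field.trim(), YYYYMMDD));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("20180502")); // a valid date
        System.out.println(parse("N.A."));     // not a date
        System.out.println(parse("20180230")); // right shape, impossible date
    }
}
```

With this approach the hard-coded list of "N.A." and "UNKNOWN" strings would no longer be needed for the date column, because anything that is not a date yields an empty result.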
I'm developing an API in which one of the objects should satisfy the requirements below.
I have a column in a Postgres DB (datatype character[6]) with input values such as:
201800
201801
.
.
.
201812
Requirements:
Check if the input is in the current year.
Check if the last 2 digits are zeros; if so, ignore the value.
If it is 201802: dd=01 (always), 02 (month) and 2018 (year).
After checking the conditions, if the input satisfies them, display it like this:
For example, if input = 201803, then the output should be in MM/DD/YYYY h:m format.
The day is always 01 for any month (01-12), so the output is 01/03/2018 12:00.
I tried this, but it didn't give the output as per my requirement:
public class sample {
public static void main(String[] args) {
try {
String input= "201700" + "00";
SimpleDateFormat sdf1 = new SimpleDateFormat("yyyyMMdd");
Date d = sdf1.parse(input);
System.out.print(d);
String formateDate = new SimpleDateFormat("MM/dd/yyyy hh:mm").format(d);
System.out.print(formateDate);
} catch(Exception e) {}
}
}
Appreciate your help!
I created a method mapDate() that checks your conditions and returns the wanted output. You just have to pass the value as a parameter to the method. See the example below:
import java.text.SimpleDateFormat;
import java.util.Date;
public class Test {
public static void main(String[] args) {
String result1 = mapDate("201802"+"00");
System.out.print("result: " + result1 + "\n");
String result2 = mapDate("201702"+"02");
System.out.print("result: " + result2 + "\n");
String result3 = mapDate("201802"+"02");
System.out.print("result: " + result3);
}
public static String mapDate(String value){
String myDate;
myDate = "";
SimpleDateFormat format1 = new SimpleDateFormat("yyyy");
String thisYear = format1.format(new Date());
if(!value.substring(value.length() - 2).equals("00") && value.substring(0, 4).equals(thisYear))
myDate = "01/" + value.substring(4, 6) + "/" + value.substring(0, 4) + " 12:00";
return myDate;
}
}
The output of this example (when run in 2018, since the method compares against the current year) is:
result:
result:
result: 01/02/2018 12:00
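For comparison, here is a hedged sketch of the same mapping using java.time. It assumes the value is passed exactly as stored in the column (six characters, yyyyMM) rather than with two extra digits appended, and it takes the current year as a parameter so the result does not depend on when the code runs. `YearMonthMapper` is a hypothetical name; the output layout follows the example above.

```java
import java.time.Year;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class YearMonthMapper {
    private static final DateTimeFormatter INPUT = DateTimeFormatter.ofPattern("uuuuMM");

    // Returns "01/MM/yyyy 12:00" for a yyyyMM value in currentYear, otherwise "".
    static String mapDate(String value, int currentYear) {
        if (value == null || value.length() != 6 || value.endsWith("00")) {
            return ""; // missing, malformed, or month "00": ignore
        }
        try {
            YearMonth ym = YearMonth.parse(value, INPUT);
            if (ym.getYear() != currentYear) {
                return ""; // not the current year: ignore
            }
            return String.format(Locale.ROOT, "01/%02d/%04d 12:00",
                    ym.getMonthValue(), ym.getYear());
        } catch (DateTimeParseException e) {
            return ""; // for example month 13
        }
    }

    public static void main(String[] args) {
        System.out.println(mapDate("201802", Year.now().getValue()));
    }
}
```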
I have file names whose prefix always changes, as below:
"Unregistered_2018-05-02_14.40.04_+621241411112_34243555523.mp3"
"Martin_2018-04-01_03.10.40_+111_5213441935.mp3"
"Byan_2018-01-04_04.70.01_+62994_2313325553.mp3"
How can I retrieve the date (2018-01-04), time (04.70.01) and phone number (+111) from this ever-changing data?
Whoever you are, I would be very grateful for help with this.
You can use split with _ like this :
String[] texts = new String[] {
"Unregistered_2018-05-02_14.40.04_+621241411112_34243555523.mp3",
"Martin_2018-04-01_03.10.40_+111_5213441935.mp3",
"Byan_2018-01-04_04.70.01_+62994_2313325553.mp3",
};
for (String text : texts) {
String[] split = text.split("_");
String date = split[1];
String time = split[2];
String phone = split[3];
System.out.println("date = " + date + ", time = " + time + ", phone = " + phone);
}
Outputs
date = 2018-05-02, time = 14.40.04, phone = +621241411112
date = 2018-04-01, time = 03.10.40, phone = +111
date = 2018-01-04, time = 04.70.01, phone = +62994
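If the changing prefix might itself contain an underscore, splitting on `_` would shift the indexes. A regular expression with named groups is one way around that; the sketch below assumes the name always ends with `_<date>_<time>_<+phone>_<digits>.mp3`, and `RecordingName` is a hypothetical class name. The time is kept as a string because values such as 04.70.01 from the question are not valid clock times.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordingName {
    // prefix _ yyyy-MM-dd _ HH.mm.ss _ +phone _ digits .mp3
    private static final Pattern NAME = Pattern.compile(
            "^(?<prefix>.+)_(?<date>\\d{4}-\\d{2}-\\d{2})_(?<time>\\d{2}\\.\\d{2}\\.\\d{2})"
            + "_(?<phone>\\+\\d+)_(?<id>\\d+)\\.mp3$");

    // Returns {date, time, phone}, or null when the name does not follow the layout.
    static String[] parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("date"), m.group("time"), m.group("phone") };
    }

    public static void main(String[] args) {
        String[] parts = parse("Martin_2018-04-01_03.10.40_+111_5213441935.mp3");
        System.out.println(String.join(", ", parts));
    }
}
```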
I got the system time as a string, for example "1240".
Then I wanted to close the application if the system time was less than 1240,
but it gives me the error "Operator '<' cannot be applied to java.lang.String"!
My code is:
runOnUiThread(new Runnable() {
public void run() {
try{
TextView txtCurrentTime= (TextView)findViewById(R.id.showtime);
Date dt = new Date();
int hours = dt.getHours();
int minutes = dt.getMinutes();
int mynum = 1240;
String curTime = hours + "" + minutes;
txtCurrentTime.setText(curTime);
if(curTime < mynum ){
System.exit(0);
}
}catch (Exception e) {}
}
});
What's the problem?
// hours1/minutes1 and hours2/minutes2 are the int values of the two times to compare
try {
    SimpleDateFormat formatter = new SimpleDateFormat("HH:mm:ss");
    String str1 = String.valueOf(hours1) + ":" + String.valueOf(minutes1) + ":" + "00";
    String str2 = String.valueOf(hours2) + ":" + String.valueOf(minutes2) + ":" + "00";
    Date date1 = formatter.parse(str1);
    Date date2 = formatter.parse(str2);
    if (date1.compareTo(date2) < 0) {
        // date1 is earlier than date2
    }
} catch (ParseException e1) {
    e1.printStackTrace();
}
formats for dates
check date/time format as per your requirement from here
The < operator is not defined between a String and an int, so you can't use it here.
Your current time can be calculated like this:
int curTime = 100*hours + minutes;
Then you can use < between two integers.
I believe, though, that you should use system milliseconds, which is more usual.
if(hours * 100 + minutes < mynum){
System.exit(0);
}
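Another option, sketched here as an assumption rather than as what any of the answers above used, is to compare `java.time.LocalTime` values directly (on Android this API is only available natively from API level 26). `CutoffCheck` is a hypothetical name. The `toHhmm` helper also shows why concatenating strings is fragile: 9:05 concatenates to "95", while 100*hours + minutes gives 905.

```java
import java.time.LocalTime;

class CutoffCheck {
    // True when "now" is strictly earlier than the cutoff time.
    static boolean isBeforeCutoff(LocalTime now, LocalTime cutoff) {
        return now.isBefore(cutoff);
    }

    // Integer form of a clock time, e.g. 9:05 -> 905 and 12:40 -> 1240.
    static int toHhmm(int hours, int minutes) {
        return 100 * hours + minutes;
    }

    public static void main(String[] args) {
        LocalTime cutoff = LocalTime.of(12, 40);
        System.out.println(isBeforeCutoff(LocalTime.now(), cutoff));
        System.out.println(toHhmm(9, 5) < 1240);
    }
}
```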
Edited question
I want to pull the date and time out of some strings. Here's an example. All event strings start with [0r(1)2[000p[040qe1w3h162[020t*. Upon encountering a new one, the program should parse the previous set of strings and extract some data. An example event is below:
[0r(1)2[000p[040qe1w3h162[020t*881*11/11/2010*12:24*
*EVENT STARTED*
[020t 12:24:06 SMARTCARD ENTERED
11\11\10 12:24 10390011
123456789098765432 6598
INVALID TRANSACTION, PLEASE CONTACT
ADMIN FOR ADVICE
-----------------------------------
[020t 12:24:52 FILE STACKED
[020t 12:24:59 FILE PRESENTED 0,5,0,0
[020t 12:25:03 FILE TAKEN
11\11\10 12:25 10390011
123456789098765432 6599
WITHDRAW FILES10.00
[000p[040q(1 *6599*1*E*000050000,M-00,R-10200
-----------------------------------
[020t 12:25:34 SMARTCARD TAKEN
[020t 12:25:38 EVENT ENDED
I want to extract date and time as one variable for every activity. e.g.
Activity= EVENT STARTED
Activity time/date= 11/11/2010 12:24
Activity= SmartCard inserted
Activity time/date= 12:24:06
I tried the following
/*
String sample = "[0r(1)2[000p[040qe1w3h162[020t*882*11/11/2010*12:26*";
String regex = "(?x) ^([0r(1)2[000p[040qe1w3h162[020t*):// ([^/:]+) (?:(\\d+))?";
Matcher m = Pattern.compile(regex).matcher(sample);
if(m.find())
{
String ignore = m.group();
String date = m.group(1);
String time = m.group(2);
System.out.println( date + " " + time);
}
*/
//this section isn't useful in light of the edit to the question
Use String.split(String regex):
String line = "[0r(1)2[000p[040qe1w3h162[020t*882*11/11/2010*12:26*";
String[] parts = line.split("\\*");
String date = parts[2];
String time = parts[3];
System.out.println("date=" + date + ", time=" + time);
Output:
date=11/11/2010, time=12:26
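To hold the date and time as one variable, as the question asks, the two parts can be joined and parsed. This is a minimal sketch: `EventHeader` is a hypothetical name, and the day-before-month order in the pattern is an assumption, since 11/11/2010 does not reveal which comes first.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class EventHeader {
    // Assumes day/month/year; swap to "MM/dd/uuuu HH:mm" if the log is month-first.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/uuuu HH:mm");

    // headerLine is an event header such as "...[020t*882*11/11/2010*12:26*".
    static LocalDateTime parseTimestamp(String headerLine) {
        String[] parts = headerLine.split("\\*");
        return LocalDateTime.parse(parts[2] + " " + parts[3], FORMAT);
    }

    public static void main(String[] args) {
        String line = "[0r(1)2[000p[040qe1w3h162[020t*882*11/11/2010*12:26*";
        System.out.println(parseTimestamp(line));
    }
}
```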
class sql
{
public static void main (String [] args)
{
String dateInCase = "11/11/2010";
String termID;
String line = " 11\\11\\10 12:24 10390011"; // backslashes must be escaped; "\11" would be an octal escape
String[] parts = line.split("");
String termId = parts[4]+parts[5]+parts[6]; //to catch terminal ID
String cardInserted = parts[1]+parts[2]+parts[3]+parts[4]+parts[5];
String starter = parts[4]+parts[7]+parts[13]+parts[14]+parts[15];
String tracker = parts[3]+parts[4]+parts[5]+parts[6]+parts[7];
boolean V = (termId.matches("\\s\\d\\d"));
boolean W = (cardInserted.matches("\\s\\s\\s\\s\\s"));//this gets card inserted
boolean X = (starter.matches("\\D\\d\\d\\d\\d"));// a new event set has started
boolean Y = (tracker.matches("\\d\\d\\d\\D\\s")); // this checks for any activity as all activities have numbers in 3,4,5
System.out.println(V);
System.out.println(W);
System.out.println(X);
System.out.println(Y);
if(V == true)
{
parts = line.split("\\ ");
String terminal = parts[2];
System.out.println("terminal " + terminal);
}
if(W == true)//this gets card inserted strings
{
parts =line.split("\\*");
String activity = parts[1];
System.out.print(activity);
}
if(X == true) //action if it sees a new event set
{
parts = line.split("\\*");
String datetime = parts[2]+" "+ parts[3];
System.out.println("time= " + datetime);
dateInCase = parts[2];
}
if(Y == true) //action if it sees a new event
{
parts =line.split("\\ ");
String datetime = dateInCase+ " " + parts[1];
String activity = parts[2]+ " " + parts[3];
System.out.println("time= " + datetime + " activity= " + activity);
}
}
}
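The character-position checks above depend on exact column offsets. As an alternative sketch, the activity lines can be recognised with a single regular expression and combined with the date taken from the event header. `ActivityLine` is a hypothetical name, and the layout (`[020t`, then HH:mm:ss, then the activity text) is assumed from the sample log in the question.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.AbstractMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ActivityLine {
    // "[020t", whitespace, HH:mm:ss, whitespace, then the activity text.
    private static final Pattern ACTIVITY =
            Pattern.compile("^\\s*\\[020t\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(.+?)\\s*$");

    // Returns (timestamp, activity), or null when the line is not an activity line.
    static Map.Entry<LocalDateTime, String> parse(String line, LocalDate eventDate) {
        Matcher m = ACTIVITY.matcher(line);
        if (!m.matches()) {
            return null;
        }
        LocalTime time = LocalTime.parse(m.group(1)); // ISO HH:mm:ss
        return new AbstractMap.SimpleEntry<>(eventDate.atTime(time), m.group(2));
    }

    public static void main(String[] args) {
        LocalDate eventDate = LocalDate.of(2010, 11, 11); // taken from the event header
        System.out.println(parse("[020t 12:24:06 SMARTCARD ENTERED", eventDate));
    }
}
```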