Regex for matching multiple date formats?


Sorry if this is a noob question but I'm not very comfortable with regex and (as of now) this is a little beyond my understanding.
My dilemma is that we have a variety of ID badges that get scanned into an Android application, and I'm trying to parse out some dates.
For example, some dates are represented like so:
"ISS20141231" format = yyyyMMdd desired output = "20141231"
"ISS12312014" format = MMddyyyy desired output = "12312014"
"ISS12-31-2014" format = MM-dd-yyyy desired output = "12312014"
Currently I have a regex pattern:
Pattern p = Pattern.compile("ISS(\\d{8})");
Matcher m = p.matcher(scanData);
which worked fine for the first two examples but recently I have realized that we also occasionally have dates which use dashes (or slashes) as separators.
Is there an efficient means for extracting these dates without having to write multiple patterns and loop through each one checking for a match?
possibly similar to: "ISS([\d{8} (\d{2}\w\d{2}\w\d{4}) (\d{4}\w\d{2}\w\d{2})])"
Thanks!!
[EDIT]
Just to make things a little bit more clear. The substring ("ISSMMddyyyy") is from a much larger string and could be located anywhere within it. So regex must search the original (200+ byte) string for a match.

If that date string is actually a substring of a larger string, and so you need the regex in order to also search for that pattern, you could modify your regex to be:
ISS([\\d\\-/]{8,10})
And then when retrieving the capture group, strip the hyphens and slashes.
String dateStr = m.group(1).replaceAll("[/\\-]", "");
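Putting the two steps together, a minimal sketch might look like the following. The class and method names (BadgeDateExtractor, extractIssueDate) and the sample scan string are made up for illustration; only the pattern and the replaceAll call come from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from the original post.
final class BadgeDateExtractor {
    // "ISS" followed by 8 to 10 characters that are digits, hyphens, or slashes.
    private static final Pattern ISSUE_DATE = Pattern.compile("ISS([\\d\\-/]{8,10})");

    /** Returns the digits of the first ISS date found in scanData, or null if there is none. */
    static String extractIssueDate(String scanData) {
        Matcher m = ISSUE_DATE.matcher(scanData);
        if (!m.find()) {
            return null;
        }
        // Strip separators so "12-31-2014" becomes "12312014".
        return m.group(1).replaceAll("[/\\-]", "");
    }

    public static void main(String[] args) {
        // Invented sample data: the date sits in the middle of a longer string.
        System.out.println(extractIssueDate("HDR001ISS12-31-2014EXP")); // prints 12312014
    }
}
```

Note that find() is used rather than matches(), since the date can sit anywhere in the scanned data. One caveat: because the quantifier is greedy, an 8-digit date that is immediately followed by more digits would be over-captured.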

You can do two replacements: remove the leading ISS first, then strip every / or -. This assumes str already holds just the ISS token, not the full scan string:
str = str.replaceFirst("^ISS", "").replaceAll("[/-]", "");

Or, to use only a regex (this form requires the separators to be present):
Search: ISS([0-9]+)([-./])([0-9]+)([-./])([0-9]+)
Replace: $1$3$5
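If a single pattern is preferred, one option is an alternation that spells out each accepted layout, which is close to what the question was reaching for. This is only a sketch: the class name StrictBadgeDate and the choice to accept just - and / as separators are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative.
final class StrictBadgeDate {
    // Three alternatives: 8 plain digits, NN-NN-NNNN, or NNNN-NN-NN (with - or / as separator).
    private static final Pattern ISS_DATE = Pattern.compile(
            "ISS(\\d{8}|\\d{2}[-/]\\d{2}[-/]\\d{4}|\\d{4}[-/]\\d{2}[-/]\\d{2})");

    /** Returns the date digits without separators, or null when no layout matches. */
    static String find(String scanData) {
        Matcher m = ISS_DATE.matcher(scanData);
        return m.find() ? m.group(1).replaceAll("[-/]", "") : null;
    }
}
```

The trade-off against the looser character-class pattern is that malformed input such as "ISS1231-2014" is rejected instead of being captured.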

Related

replace char in specific pattern or after specific character

The task at hand is to replace "-" with "/" in a birthday format e.g. 03-12-89 -> 03/12/89. However, the "-" must be able to appear elsewhere in the string e.g. "My-birthday-is-on-the: 03/12/89".
I have tried creating substrings, replacing the "-" in the birthday part and then combining the strings again. However, that solution is inflexible and fails the test cases.
I'm thinking I must be able to do this with a regular expression, although I seem unable to construct it. So now I'm back to: String newStr = input.replace("-", "/"); which replaces every instance of "-", and that is not what I want.
Can anyone help?

You can use the following regex:
(?<=\d{2})-
with the replacement / (the slash does not need to be escaped in Java)
INPUT:
My-birthday-is-on-the: 03-12-89
OUTPUT:
My-birthday-is-on-the: 03/12/89
Code:
String input = "My-birthday-is-on-the: 03-12-89";
System.out.println(input.replaceAll("(?<=\\d{2})-", "/"));
OUTPUT:
My-birthday-is-on-the: 03/12/89

The easiest way which comes to mind is just match \d{2}-\d{2}-\d{2}, with capture groups. Then, use those captured numbers to rebuild the birthdate the way you want it. Something like this:
String input = "My-birthday-is-on-the: 03-12-89";
input = input.replaceAll("\\b(\\d{2})-(\\d{2})-(\\d{2})\\b", "$1/$2/$3");
The advantage of specifying the full pattern is that it avoids the chance of matching anything other than a 6 digit dash-separated birthday.
Edit:
Based on your comment below, it sounds like maybe you want to do this replacement on a two dash separated number, with any number of digits. In this case, we can slightly modify the above code to the following:
String input = "Your policy number is: 123-45-6789.";
input = input.replaceAll("\\b(\\d+)-(\\d+)-(\\d+)\\b", "$1/$2/$3");
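To see why spelling out the full pattern is safer than a lookbehind on two digits, here is a small side-by-side sketch. The class name and the sample sentence are invented for illustration; the two regexes are the ones from the answers above.

```java
// Hypothetical comparison helper; names and sample text are illustrative.
final class DashToSlash {
    /** Replaces any hyphen that directly follows two digits. */
    static String withLookbehind(String s) {
        return s.replaceAll("(?<=\\d{2})-", "/");
    }

    /** Replaces hyphens only inside a complete NN-NN-NN group. */
    static String withFullPattern(String s) {
        return s.replaceAll("\\b(\\d{2})-(\\d{2})-(\\d{2})\\b", "$1/$2/$3");
    }

    public static void main(String[] args) {
        String input = "Room-12-B, born 03-12-89";
        System.out.println(withLookbehind(input));  // Room-12/B, born 03/12/89
        System.out.println(withFullPattern(input)); // Room-12-B, born 03/12/89
    }
}
```

The lookbehind version also rewrites the hyphen after "12" in "Room-12-B", while the full pattern leaves it alone.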

Using regex in java to extract a string between two words in html syntax

I have a json feed that feeds html that is used to populate the calendar, I need to retrieve some of the information from it. For example title, time and location. I wanted to use regex to get content between
<span class=\"title\">
and
<\/span><br/><b>
and I am trying to use this code
for(int i = 0; i < json.length(); i++)
{
JSONObject object = new JSONObject(json.getJSONObject(i));
System.out.println(object.getNames(object));
Pattern p = Pattern.compile("(?i)(<span class=\"title\">)(.+?)(<\\/span>)");
Matcher m = p.matcher(json.get(0).toString());
m.find();
System.out.println(m.group(0));
But it doesn't seem to do the job... I have tried multiple iterations and researched examples online, but I am not sure if I am doing something wrong in the regex syntax. Help would be appreciated.
{"hoverContent":"<b>Title: <\/b><span class=\"title\">Accounting Awareness<\/span><br/><b>Time: <\/b><span class=\"time\">5:30 PM - 6:30 PM<br/><b>Location: <\/b><span class=\"location\">1185 Grainger Hall<\/span><br/><b>Description: <\/b><br/><span class=\"description\">Information from Kristen Fuhremann, Director of Professional Programs in Accounting and Q&A from a panel of current and former students who will share their experiences in the accounting program. Panel includes a grad of the IMAcc program currently in law school, a candidate for the IMAcc program who studied abroad, an accounting and finance double major, and an IMAcc student who is also a TA for AIS 100. Casual Attire is appropriate.<br />Contact: Natalie Dickson, <a href=\"mailto:ndickson#wisc.edu\">ndickson#wisc.edu<\/a><\/span><br/>","title":"Accounting Awareness","start":"2013-09-30 17:30:00","allDay":false,"itemId":"2356754a-8178-4afd-b4cf-7f5f5ce89868","end":"2013-09-30 18:30:00"}
null

m.group(0) always returns the entire string that matches the regex. It looks like you want to return a particular group, so you need to use m.group(1) to get the text that matches the first group, m.group(2) for the second group, and so on. In this regex:
"(?i)(<span class=\"title\">)(.+?)(<\\/span>)"
anything in parentheses, except for things that begin with (?, counts as a group, so the portion in (.+?) is the second capture group, and you can try retrieving it with m.group(2). In this case, there's no need to put the <span stuff in parentheses, so you could say
"(?i)<span class=\"title\">(.+?)<\\/span>"
and now use m.group(1) to get at the first (and only) capture group.
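As a rough sketch of the single-group version: the class and method names are hypothetical, and it assumes the HTML has already been pulled out of the JSON as a plain string (so the JSON escapes such as \/ are gone). It also checks the result of find() before touching the group, which the code in the question skips.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative.
final class TitleExtractor {
    // One capture group: the text between the title span's opening and closing tags.
    private static final Pattern TITLE =
            Pattern.compile("(?i)<span class=\"title\">(.+?)</span>");

    /** Returns the title text, or null if the span is not present. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        // group(1) is only valid after a successful find().
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<b>Title: </b><span class=\"title\">Accounting Awareness</span><br/>";
        System.out.println(extractTitle(html)); // prints Accounting Awareness
    }
}
```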

Using regular expressions to parse markup is not really a good idea from a design standpoint.
I would personally just wrap the content in a fake tag and parse it using XML parser. There will be overhead, but you don't use regexp to parse JSON, right? Why not do the same for XML?

Try this regex with DOTALL mode, also avoid redundant escaping:
Pattern p = Pattern.compile("(?si)<span class=\"title\">(.+?)</span>");

How to retrieve portion of number that's within parenthesis in Java?

For part of my Java assignment I'm required to select all records that have a certain area code. I have custom objects within an ArrayList, like ArrayList<Foo>.
Each object has a String phoneNumber variable. They are formatted like "(555) 555-5555"
My goal is to search through each custom object in the ArrayList<Foo> (call it listOfFoos) and place the objects with area code "616" in a temporaryListOfFoos ArrayList<Foo>.
I have looked into tokenizers, but was unable to get the syntax correct. I feel like what I need to do is similar to the post "Ignore parentheses with string tokenizer?", but since I'm only trying to retrieve the first 3 digits (and I don't care about the remaining 7), this really didn't give me exactly what I was looking for.
What I did as a temporary work-around, was...
for (int i = 0; i<listOfFoos.size();i++){
if (listOfFoos.get(i).getPhoneNumber().contains("616")){
tempListOfFoos.add(listOfFoos.get(i));
}
}
This worked for our current dataset, however, if there was a 616 anywhere else in the phone numbers [like "(555) 616-5555"] it obviously wouldn't work properly.
If anyone could give me advice on how to retrieve only the first 3 digits, while ignoring the parentheses, I would greatly appreciate it.

You have two options:
Use value.startsWith("(616)") or,
Use regular expressions with the pattern ^\(616\).* (written as "^\\(616\\).*" in a Java string literal)
The first option will be a lot quicker.
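The startsWith option could be dropped into the filtering loop roughly like this. Foo here is a hypothetical stand-in for the asker's class, reduced to the one getter the loop needs, and withAreaCode is an invented method name.

```java
import java.util.ArrayList;
import java.util.List;

final class AreaCodeFilter {
    /** Hypothetical stand-in for the asker's Foo class. */
    static final class Foo {
        private final String phoneNumber;

        Foo(String phoneNumber) {
            this.phoneNumber = phoneNumber;
        }

        String getPhoneNumber() {
            return phoneNumber;
        }
    }

    /** Returns the entries whose number starts with "(areaCode)". */
    static List<Foo> withAreaCode(List<Foo> listOfFoos, String areaCode) {
        String prefix = "(" + areaCode + ")";
        List<Foo> matches = new ArrayList<>();
        for (Foo foo : listOfFoos) {
            // Only the leading "(616)" counts, so "(555) 616-5555" is skipped.
            if (foo.getPhoneNumber().startsWith(prefix)) {
                matches.add(foo);
            }
        }
        return matches;
    }
}
```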

areaCode = number.substring(number.indexOf('(') + 1, number.indexOf(')')).trim() should do the job for you, given the formatting of phone numbers you have.
Or if you don't have any extraneous spaces, just use areaCode = number.substring(1, 4).
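A slightly more defensive sketch of the indexOf approach is shown below. The class name AreaCodes and the decision to return null for malformed input are assumptions made for illustration.

```java
// Hypothetical helper; names are illustrative.
final class AreaCodes {
    /** Returns the text between the first "(" and the first ")", or null if they are missing. */
    static String areaCode(String number) {
        int open = number.indexOf('(');
        int close = number.indexOf(')');
        // Guard against missing or misordered parentheses before calling substring.
        if (open < 0 || close <= open) {
            return null;
        }
        return number.substring(open + 1, close).trim();
    }
}
```

Comparing the result with "616" using equals then replaces the contains check from the question.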

I think what you need is a capturing group. Have a look at the Groups and capturing section in this document.
Once you are done matching the input with a pattern (for example "\\((\\d+)\\) \\d+-\\d+"), you can get the number in the parentheses using a matcher (an object of java.util.regex.Matcher) with matcher.group(1).

You could use a regular expression as shown below. The pattern will ensure the entire phone number conforms to your pattern ((XXX) XXX-XXXX) plus grabs the number within the parentheses.
int areaCodeToSearch = 555;
String pattern = String.format("\\((%d)\\) \\d{3}-\\d{4}", areaCodeToSearch);
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(phoneNumber);
if (m.matches()) {
String areaCode = m.group(1);
// ...
}
Whether you choose to use a regular expression versus a simple String lookup (as mentioned in other answers) will depend on how bothered you are about the format of the entire string.

Regular expression for java.util.regex.Pattern

I am trying to create a suitable regular expression to use java.util.regex.Pattern
I am using the regular expression shown below to match Strings like so: feed_user_at_gmail_dot_com_testfile
final static Pattern PATTERN1 = Pattern.compile("feed_(.*)_([^_]*)");
This works as expected. But, I need to create another Pattern to match Strings like so: feed_user_at_gmail_dot_com_testfile_ts_20120413_dot_175531_dot_463
The difference is that the second String is a time stamped version of the first String. These two Strings are examples of file names in my database and I need to identify them both separately. The time stamped version is appended with _ts_ followed by DATE as shown above. All dots in the DATE are changed to _dot_
Thanks,
Sony

How about this:
"feed_(.*)_([^_]*)_ts_[0-9]+(_dot_[0-9]+)*"
Or better yet,
"feed_(.*)_([^_]*)_ts_[0-9]+(_dot_[0-9]+){2}"
if dates always have exactly two dots. Note that the digit class must be [0-9], not [1-9], because the timestamps contain zeros.
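Since the original pattern also matches the time-stamped names (its .* swallows the _ts_ part), one way to tell the two apart is to test the stricter pattern first. The sketch below assumes that approach; the class name, the classify method, and its return strings are invented, and it uses [0-9] for the digits because the sample timestamp contains zeros.

```java
import java.util.regex.Pattern;

// Hypothetical helper; names and return values are illustrative.
final class FeedFileNames {
    static final Pattern PLAIN = Pattern.compile("feed_(.*)_([^_]*)");
    static final Pattern TIMESTAMPED =
            Pattern.compile("feed_(.*)_([^_]*)_ts_[0-9]+(_dot_[0-9]+){2}");

    /** Classifies a file name; the time-stamped check must come first. */
    static String classify(String fileName) {
        if (TIMESTAMPED.matcher(fileName).matches()) {
            return "timestamped";
        }
        if (PLAIN.matcher(fileName).matches()) {
            return "plain";
        }
        return "unknown";
    }
}
```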

Java Regex - exclude empty tags from xml

Let's say I have three XML strings:
String logToSearch = "<abc><number>123456789012</number></abc>";
String logToSearch2 = "<abc><number xsi:type=\"soapenc:string\" /></abc>";
String logToSearch3 = "<abc><number /></abc>";
I need a pattern that finds the number tag only if the tag contains a value, i.e. a match should be found only in logToSearch.
I'm not looking for the number itself; rather, matcher.find() should return true only for the first string.
For now I have this:
Pattern pattern = Pattern.compile("<(" + patternString + ").*?>",
Pattern.CASE_INSENSITIVE);
where patternString is simply "number". I tried "<(" + patternString + ")[^/>].*?>", but it didn't work because [^/>] is a character class: it excludes / and > individually rather than the sequence />.
Thanks

This is absolutely the wrong way to parse XML. In fact, if you need more than just the basic example given here, there's provably no way to solve the more complex cases with regex.
Use an easy XML parser, like XOM. Now, using xpath, query for the elements and filter those without data. I can only imagine that this question is a precursor to future headaches unless you modify your approach right now.
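For comparison, here is a rough parser-based sketch. It swaps in the JDK's built-in DOM parser rather than XOM and XPath, purely so that it runs without extra libraries. The class and method names are hypothetical, and it assumes each log string is a well-formed XML fragment.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; names are illustrative.
final class NonEmptyTagCheck {
    /** Returns true if any element named tagName has non-blank text content. */
    static boolean hasNonEmptyTag(String xml, String tagName) {
        try {
            // The default factory is not namespace-aware, so the unbound xsi: prefix is tolerated.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            for (int i = 0; i < nodes.getLength(); i++) {
                if (!nodes.item(i).getTextContent().trim().isEmpty()) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            // Malformed input is reported as an unchecked exception in this sketch.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Unlike the CASE_INSENSITIVE pattern in the question, getElementsByTagName compares tag names case-sensitively.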

So a search for "<number[^/>]*>" would find the opening tag. If you want to be sure it isn't empty, try "<number[^/>]*>[^<]" or "<number[^/>]*>[0-9]"
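Checked against the three strings from the question, the second suggestion could be sketched as follows. The class and method names are invented, and Pattern.quote is added as a precaution in case the tag name ever contains regex metacharacters.

```java
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative.
final class NumberTagRegex {
    /** Returns true if an opening tagName tag is directly followed by a non-"<" character. */
    static boolean hasValue(String log, String tagName) {
        Pattern p = Pattern.compile(
                "<" + Pattern.quote(tagName) + "[^/>]*>[^<]",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(log).find();
    }

    public static void main(String[] args) {
        System.out.println(hasValue("<abc><number>123456789012</number></abc>", "number")); // true
        System.out.println(hasValue("<abc><number xsi:type=\"soapenc:string\" /></abc>", "number")); // false
        System.out.println(hasValue("<abc><number /></abc>", "number")); // false
    }
}
```

This stays a heuristic: it would also accept a longer tag name such as numbers, and it would reject an opening tag whose attribute value contains a slash.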
