Parsing time with Regex in Java

The code snippet below tries to extract the hours, minutes and seconds from a string.
Ex:
"PT5M30S"
"PT1H13M59S"
I am getting a NullPointerException on this line (group is null): int number = new Integer(group.substring(0, group.length()-1));
// Create a Pattern object
Pattern pattern = Pattern.compile("PT(\\d+H)?(\\d+M)?(\\d+S)?");
// Now create matcher object.
Matcher matcher = pattern.matcher(duracaoStr);
int hour = 0;
int minute = 0;
int second = 0;
if(matcher.matches()){
for(int i = 1; i<=matcher.groupCount();i++){
String group = matcher.group(i);
int number = new Integer(group.substring(0, group.length()-1));
if(matcher.group(i).endsWith("H")){
hour = number;
} else if(matcher.group(i).endsWith("M")){
minute = number;
} else if(matcher.group(i).endsWith("S")){
second = number;
}
}
}

Try running this code with each of the two strings, one at a time.
You'll notice that the program runs successfully for the second string, PT1H13M59S, but throws a NullPointerException for the first one, PT5M30S.
The NullPointerException occurs because PT5M30S has no hour value, so group 1 does not take part in the match and matcher.group(1) returns null.
See this Demo:
RegEx
PT(\d+H)?(\d+M)?(\d+S)?
Input
PT5M30S
PT1H13M59S
Match Information
MATCH 1
2. [2-4] `5M`
3. [4-7] `30S`
MATCH 2
1. [10-12] `1H`
2. [12-15] `13M`
3. [15-18] `59S`
Notice that for the first string, in Match 1, there is no output for Group 1.
So you should validate each group before using it. One way is to enclose the code that throws the NullPointerException in a try/catch block and, if the exception occurs, assign a default value to the corresponding variable.
For example:
import java.util.regex.*;
public class HelloWorld {
public static void main(String[] args) {
// Create a Pattern object
Pattern pattern = Pattern.compile("PT(\\d+H)?(\\d+M)?(\\d+S)?");
// Now create matcher object.
Matcher matcher = pattern.matcher("PT5M30S");
int hour = 0;
int minute = 0;
int second = 0;
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
try {
String group = matcher.group(i);
int number = new Integer(group.substring(0, group.length() - 1));
if (matcher.group(i).endsWith("H")) {
hour = number;
} else if (matcher.group(i).endsWith("M")) {
minute = number;
} else if (matcher.group(i).endsWith("S")) {
second = number;
}
} catch (java.lang.NullPointerException e) {
if (i == 1) {
hour = 0;
} else if (i == 2) {
minute = 0;
} else if (i == 3) {
second = 0;
}
}
}
}
}
}
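As a side note that is not part of the answer above: strings such as PT5M30S follow the ISO-8601 duration format, so on Java 8 or later java.time.Duration can parse them without a hand-written regex. The sketch below assumes that format holds for all inputs; the class and method names are hypothetical.

```java
import java.time.Duration;

class IsoDurationSketch {
    /** Returns {hours, minutes, seconds} for an ISO-8601 duration such as "PT1H13M59S". */
    static long[] split(String text) {
        // Duration.parse throws DateTimeParseException on malformed input.
        Duration d = Duration.parse(text);
        long hours = d.toHours();
        long minutes = d.toMinutes() % 60;   // minutes left after whole hours
        long seconds = d.getSeconds() % 60;  // seconds left after whole minutes
        return new long[] {hours, minutes, seconds};
    }
}
```

Missing components simply come back as 0, so no null check is needed with this approach.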

@rD's solution above is sufficient and well answered (please choose his). As an alternative, here is a solution I was working on before I realized the question had already been answered properly:
https://github.com/davethomas11/stackoverflow_Q_39443620
// Create a Pattern object
Pattern pattern = Pattern.compile("PT(\\d+H)?(\\d+M)?(\\d+S)?");
// Now create matcher object.
Matcher matcher = pattern.matcher(duracaoStr);
int hour = 0;
int minute = 0;
int second = 0;
if(matcher.matches()){
for(int i = 1; i<=matcher.groupCount();i++){
String group = matcher.group(i);
//Group will be null if not in pattern
if (group != null) {
int number = new Integer(group.substring(0, group.length()-1));
if(matcher.group(i).endsWith("H")){
hour = number;
} else if(matcher.group(i).endsWith("M")){
minute = number;
} else if(matcher.group(i).endsWith("S")){
second = number;
}
}
}
}
It is the same code, except that I've added a null check.
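The null check can also be combined with a slightly different pattern that captures only the digits, so no substring call is needed. This is a minimal sketch, assuming the same PT...H...M...S input shape as in the question; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DurationRegexSketch {
    // Non-capturing groups (?:...) keep the unit letters out of the captured text.
    private static final Pattern DURATION =
            Pattern.compile("PT(?:(\\d+)H)?(?:(\\d+)M)?(?:(\\d+)S)?");

    /** Returns {hours, minutes, seconds}, or null if the input does not match. */
    static int[] parse(String text) {
        Matcher m = DURATION.matcher(text);
        if (!m.matches()) {
            return null;
        }
        int[] parts = new int[3];
        for (int i = 1; i <= 3; i++) {
            String digits = m.group(i); // null when the optional group is absent
            parts[i - 1] = (digits == null) ? 0 : Integer.parseInt(digits);
        }
        return parts;
    }
}
```

Because each group has a fixed meaning by position, the endsWith checks from the original code are no longer required.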

Related

How to compress string on java without using map

I've recently started learning Java and I want to compress a string like this:
Input:aaaaabbbbwwwccc Output:a5b4w3c3
Input:aaabbccds Output:a3b2c2ds
Input:Abcd Output:Abcd
The following code is what I have so far, but it does not work.
public class CompressString {
public static void main(String[] args) {
String out = "";
Scanner in = new Scanner(System.in);
String input = in.next();
int length = input.length();
int counter = 1;
if (length == 0) {
System.out.println(" ");
} else {
for (int i = 0; i<length;i++){
if (input.charAt(i)==input.charAt(i+1)){
counter++;
}else {
if (counter == 1){
out = out+input.charAt(i-counter);
}else{
out = out+input.charAt(i-counter)+counter;
}
}
i++;
counter = 1;
}
System.out.println(out.toString());
}
}
}
The simplest program to do that would loop through each character in the string and check whether it differs from the previously seen one; if so, it adds the previous character and its count to the compressed string:
String input = "aaaaabbbbwwwccc";
StringBuilder compressed = new StringBuilder();
char last = 0;
int lastCount = 0;
for (int i = 0; i < input.length(); i++) {
char c = input.charAt(i);
if (last == 0 || c != last) {
if (lastCount != 0) {
compressed.append(last);
if (lastCount > 1) {
compressed.append(lastCount);
}
}
last = c;
lastCount = 1;
} else {
lastCount++;
}
}
// take care of the last repeating sequence if any
if (lastCount > 0) {
compressed.append(last);
if (lastCount > 1) {
compressed.append(lastCount);
}
}
Here is a very compact way of doing this with a regex matcher along with a string buffer:
String input = "aaaaabbbbwwwccc";
Pattern r = Pattern.compile("(.)\\1{0,}");
Matcher m = r.matcher(input);
StringBuffer buffer = new StringBuffer();
while (m.find()) {
m.appendReplacement(buffer, m.group(1) + m.group(0).length());
}
m.appendTail(buffer);
System.out.println(buffer.toString());
This prints:
a5b4w3c3
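For an explanation, the above logic searches for the regex pattern (.)\1{0,}. This matches any single character, followed by that same character occurring zero or more additional times. Each match is then replaced with the single character followed by the number of times it occurs. Note that, as written, it also appends a count of 1 to characters that are not repeated (for example, Abcd becomes A1b1c1d1), which differs from the output the question asks for.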
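A variant of the regex approach that leaves out the count for single characters, as the question's expected output a3b2c2ds requires, might look like this. The class and method names are hypothetical, and Pattern.DOTALL is added on the assumption that line breaks should be treated like any other character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunLengthSketch {
    // (.)\1* matches one character followed by zero or more repeats of it.
    private static final Pattern RUN = Pattern.compile("(.)\\1*", Pattern.DOTALL);

    static String compress(String input) {
        Matcher m = RUN.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.group(1));
            int runLength = m.group().length();
            if (runLength > 1) {
                out.append(runLength); // only write counts greater than 1
            }
        }
        return out.toString();
    }
}
```

Appending to a StringBuilder directly, instead of using appendReplacement, also avoids special handling of $ and \ in the replacement text.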

Java store matches in array

Hi, I would like to store my matches in an array, but I keep getting NullPointerException or index-out-of-bounds errors.
final String mcontentURI[] = new String[count];
for (int i = 0; i < count; i++) {
Pattern p = Pattern.compile("src=\"(.*?)\"");
Matcher m = p.matcher(content_val);
if (m.find()) {
mcontentURI[i] = (m.group(i+1));
}
}
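Your regex has only one capturing group, so the group number must stay 1 no matter how often you re-compile the pattern; m.group(i+1) throws an IndexOutOfBoundsException as soon as i is greater than 0. You can put the result at different indexes of the array, though: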
final String mcontentURI[] = new String[count];
final Pattern p = Pattern.compile("src=\"(.*?)\"");
for (int i = 0; i < count; i++) {
Matcher m = p.matcher(content_val); // Use different strings here
if (m.find()) {
mcontentURI[i] = m.group(1);
}
}
Note that mcontentURI[i] would remain null for indexes for which the pattern did not match.
If you want to search the same string, do this:
final String mcontentURI[] = new String[count];
final Pattern p = Pattern.compile("src=\"(.*?)\"");
Matcher m = p.matcher(content_val);
int i = 0;
while (i < count && m.find()) {
mcontentURI[i++] = m.group(1);
}
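If the number of matches is not known in advance, a list avoids both the fixed array size and the null entries. This is a minimal sketch using the same pattern as above; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SrcAttributeSketch {
    // Compiled once and reused; group 1 is the text between the quotes.
    private static final Pattern SRC = Pattern.compile("src=\"(.*?)\"");

    /** Collects the value of every src="..." attribute found in the content. */
    static List<String> findAll(String content) {
        List<String> uris = new ArrayList<>();
        Matcher m = SRC.matcher(content);
        while (m.find()) {          // each call advances to the next match
            uris.add(m.group(1));
        }
        return uris;
    }
}
```

The result can be converted with uris.toArray(new String[0]) if an array is still required.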

Unformat formatted String

I have a simple formatted String:
double d = 12.348678;
int i = 9876;
String s = "ABCD";
System.out.printf("%08.2f%5s%09d", d, s, i);
// %08.2f = '12.348678' -> '00012,35'
// %5s = 'ABCD' -> ' ABCD'
// %09d = '9876' -> '000009876'
// %08.2f%5s%09d = '00012,35 ABCD000009876'
When I know the pattern %08.2f%5s%09d and the string 00012,35 ABCD000009876:
Can I "unformat" this string in some way?
For example, the expected result would be something like 3 tokens: '00012,35', ' ABCD', '000009876'
This is specific to your pattern. A general parser for a format string (what we call unformatting is really parsing) would look quite different.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Unformat {
// Returns the field width captured by group 1, or null if the pattern does not match.
public static Integer getWidth(Pattern pattern, String format) {
Matcher matcher = pattern.matcher(format);
if (matcher.find()) {
return Integer.valueOf(matcher.group(1));
}
return null;
}
// Cuts the next field out of the formatted string, or returns null if no width was found.
public static String getResult(Pattern p, String format, String formatted, int start) {
Integer width = getWidth(p, format);
if (width != null) {
return formatted.substring(start, start + width);
}
return null;
}
public static void main(String[] args) {
String format = "%08.2f%5s%09d";
String formatted = "00012.35 ABCD000009876";
String[] formats = format.split("%");
List<String> result = new ArrayList<String>();
// Java passes Integer arguments by value, so the running offset is tracked here.
int start = 0;
for (int j = 1; j < formats.length; j++) {
Pattern p = null;
if (formats[j].endsWith("f")) {
// ([0-9]+) captures the whole width; a leading 0 flag is harmless when parsed as a number.
p = Pattern.compile("([0-9]+)\\.[0-9]*f");
} else if (formats[j].endsWith("s")) {
p = Pattern.compile("([0-9]+)s");
} else if (formats[j].endsWith("d")) {
p = Pattern.compile("([0-9]+)d");
}
String token = (p == null) ? null : getResult(p, formats[j], formatted, start);
if (token != null) {
result.add(token);
start += token.length();
}
}
System.out.println(result);
}
}
Judging by your output format of "%08.2f%5s%09d", it seems comparable to this pattern
"([0-9]{5,}[\\.|,][0-9]{2,})(.{5,})([0-9]{9,})"
Try the following:
public static void main(String[] args) {
String data = "00012,35 ABCD000009876";
Matcher matcher = Pattern.compile("([0-9]{5,}[\\.|,][0-9]{2,})(.{5,})([0-9]{9,})").matcher(data);
List<String> matches = new ArrayList<>();
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
matches.add(matcher.group(i));
}
}
System.out.println(matches);
}
Results:
[00012,35, ABCD, 000009876]
UPDATE
After seeing the comments, here's a generic example that does not use regular expressions, so as not to copy @bpgergo (+1 to you for the generic regular-expression approach). I also added some logic in case the format ever exceeds the width of the data.
public static void main(String[] args) {
String data = "00012,35 ABCD000009876";
// Format exceeds width of data
String format = "%08.2f%5s%09d%9s";
String[] formatPieces = format.replaceFirst("^%", "").split("%");
List<String> matches = new ArrayList<>();
int index = 0;
for (String formatPiece : formatPieces) {
// Remove any argument indexes or flags
formatPiece = formatPiece.replaceAll("^([0-9]+\\$)|[\\+|-|,|<]", "");
int length = 0;
switch (formatPiece.charAt(formatPiece.length() - 1)) {
case 'f':
if (formatPiece.contains(".")) {
length = Integer.parseInt(formatPiece.split("\\.")[0]);
} else {
length = Integer.parseInt(formatPiece.substring(0, formatPiece.length() - 1));
}
break;
case 's':
length = Integer.parseInt(formatPiece.substring(0, formatPiece.length() - 1));
break;
case 'd':
length = Integer.parseInt(formatPiece.substring(0, formatPiece.length() - 1));
break;
}
if (index + length < data.length()) {
matches.add(data.substring(index, index + length));
} else {
// We've reached the end of the data and need to break from the loop
matches.add(data.substring(index));
break;
}
index += length;
}
System.out.println(matches);
}
Results:
[00012,35, ABCD, 000009876]
You can do something like this:
//Find the end of the first value,
//this value will always have 2 digits after the decimal point.
int index = val.indexOf(".") + 3;
String token1 = val.substring(0, index);
//Remove the first value from the original String
val = val.substring(index);
//get all values after the last non-numerical character.
String token3 = val.replaceAll(".+\\D", "");
//remove the previously extracted value from the remainder of the original String.
String token2 = val.replace(token3, "");
This will fail if the String value contains a number at the end and probably in some other cases.
Since you know the pattern, you are effectively dealing with a kind of regular expression, so you can use regular expressions here.
Java has a decent regular expression API for such tasks.
Regular expressions can have capturing groups, and each group would hold a single "unformatted" part, just as you want. It all depends on the regex you use or create.
The easiest thing to do would be to parse the string using a regex with myString.replaceAll(). myString.split(",") may also be helpful for splitting your string into a string array.
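To make the capturing-group idea concrete, the format string itself can be turned into a regex with one fixed-width group per format specifier. This is only a sketch under stated assumptions: every specifier carries an explicit width, every formatted value fits exactly in that width, and the format contains no literal text between specifiers. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedWidthUnformatSketch {
    // %, optional flags, width (group 1), optional precision, conversion letter.
    private static final Pattern SPEC =
            Pattern.compile("%[-#+ 0,(]*(\\d+)(?:\\.\\d+)?[a-zA-Z]");

    static List<String> unformat(String format, String formatted) {
        // Build e.g. (.{8})(.{5})(.{9}) from %08.2f%5s%09d.
        StringBuilder regex = new StringBuilder();
        Matcher spec = SPEC.matcher(format);
        while (spec.find()) {
            regex.append("(.{").append(Integer.parseInt(spec.group(1))).append("})");
        }
        Matcher m = Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(formatted);
        List<String> tokens = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                tokens.add(m.group(i));
            }
        }
        return tokens; // empty when the string does not fit the format
    }
}
```

For the question's example, unformat("%08.2f%5s%09d", "00012,35 ABCD000009876") yields the three tokens 00012,35, " ABCD" and 000009876.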

Java Regex: How to detect the index of the unmatched char in a complex regex

I'm using regex to control an input and I want to get the exact index of the wrong char.
My regex is :
^[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])?
If I type the following input :
DATE/201A08
Then matcher.group() (using the lookingAt() method) will return "DATE" instead of "DATE/201", so I can't tell that the wrong index is 9.
If I read this right, you can't do this using only one regex.
^[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])? matches a String starting with 1 to 4 uppercase letters, followed either by nothing or by / and exactly 6 digits. So it correctly parses your input as "DATE", since that is valid according to your regex.
Try to split this into two checks. First, check whether the input starts with a valid name such as DATE.
Then, if there's an actual / part, check it against the non-optional pattern.
You want to know whether the entire pattern matched, and when not, how far it matched.
This is where a single regex falls short. A match must succeed before group() returns anything, and if it succeeds on only part of the input, one cannot tell whether everything was matched.
The sensible thing to do is split the matching.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ProgressiveMatch {
private final String[] regexParts;
private String group;
ProgressiveMatch(String... regexParts) {
this.regexParts = regexParts;
}
// lookingAt with (...)?(...)?...
public boolean lookingAt(String text) {
StringBuilder sb = new StringBuilder();
sb.append('^');
for (int i = 0; i < regexParts.length; ++i) {
String part = regexParts[i];
sb.append("(");
sb.append(part);
sb.append(")?");
}
Pattern pattern = Pattern.compile(sb.toString());
Matcher m = pattern.matcher(text);
if (m.lookingAt()) {
boolean all = true;
group = "";
for (int i = 1; i <= regexParts.length; ++i) {
if (m.group(i) == null) {
all = false;
break;
}
group += m.group(i);
}
return all;
}
group = null;
return false;
}
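// Alternative implementation: lookingAt with multiple patterns.
// Named differently here because a class cannot declare two methods with the same signature.
public boolean lookingAtWithPrefixes(String text) {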
for (int n = regexParts.length; n > 0; --n) {
// Match for n parts:
StringBuilder sb = new StringBuilder();
sb.append('^');
for (int i = 0; i < n; ++i) {
String part = regexParts[i];
sb.append(part);
}
Pattern pattern = Pattern.compile(sb.toString());
Matcher m = pattern.matcher(text);
if (m.lookingAt()) {
group = m.group();
return n == regexParts.length;
}
}
group = null;
return false;
}
public String group() {
return group;
}
}
public static void main(String[] args) {
// ^[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])?
ProgressiveMatch match = new ProgressiveMatch("[A-Z]{1,4}", "/",
"[1-2]", "[0-9]", "[0-9]", "[0-9]", "[0-1]", "[0-9]");
boolean matched = match.lookingAt("DATE/201A08");
System.out.println("Matched: " + matched);
System.out.println("Up to: " + match.group());
}
One could make a small DSL in Java, like this (ProgressiveMatchBuilder is not defined here):
ProgressiveMatch match = ProgressiveMatchBuilder
.range("A", "Z", 1, 4)
.literal("/")
.range("1", "2")
.range("0", "9", 3, 3)
.range("0", "1")
.range("0", "9")
.match();
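Another way to locate the first offending character, not taken from the answers above, is to test ever longer prefixes of the input and ask the matcher whether it ran out of input. Matcher.hitEnd() reports that the engine reached the end of the input during the last match attempt, which means more input could still change the result. The sketch below is built on that idea; the class and method names are hypothetical, and it returns a zero-based index.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartialMatchSketch {
    /**
     * Returns -1 if the whole input matches, the index of the first character
     * that rules out any match, or input.length() if the input is only incomplete.
     */
    static int firstBadIndex(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        for (int end = 1; end <= input.length(); end++) {
            m.region(0, end); // resets the matcher and restricts it to the prefix
            // A prefix is still viable if it matches, or if the engine ran out
            // of input while trying (a longer prefix could then still match).
            if (!m.matches() && !m.hitEnd()) {
                return end - 1;
            }
        }
        m.region(0, input.length());
        return m.matches() ? -1 : input.length();
    }
}
```

With the pattern from the question and the input DATE/201A08, this should return 8, the zero-based position of the A.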

Algorithm to search and replace delimited parameters

I have a string that contains multiple parameters delimited by #, like this :
.... #param1# ... #param2# ... #paramN# ...
And I want to replace the parameter placeholders by values.
The current algorithm looks like this:
//retrieve place holder into this SQL select
Pattern p = Pattern.compile(DIMConstants.FILE_LINE_ESCAPE_INDICATOR);
Matcher m = p.matcher(sqlToExec); // get a matcher object
int count = 0;
int start = 0;
int end = 0;
StringBuilder params = new StringBuilder();
while (m.find()) {
count++;
if (count % 2 == 0) {
// Second parameter delimiter
String patternId = sqlToExec.substring(start, m.end());
//Clean value (#value#->value)
String columnName = patternId.substring(1, patternId.length() - 1);
//Look for this column into preLoad row ResultSet and retrieve its value
String preLoadTableValue = DIMFormatUtil.convertToString(sourceRow.get(columnName));
if (!StringUtils.isEmpty(preLoadTableValue)) {
aSQL.append(loadGemaDao.escapeChars(preLoadTableValue).trim());
} else {
aSQL.append(DIMConstants.COL_VALUE_NULL);
}
params.append(" " + columnName + "=" + preLoadTableValue + " ");
end = m.end();
} else {
// First parameter delimiter
start = m.start();
aSQL.append(sqlToExec.substring(end, m.start()));
}
}
if (end < sqlToExec.length()) {
aSQL.append(sqlToExec.substring(end, sqlToExec.length()));
}
I'm looking for a simpler solution, using regexp or another public API. The input parameters would be the source string, a delimiter and a map of values. The output would be the source string with all the parameters replaced.
If this is for a normal SQL query, you might want to look into using PreparedStatements.
Beyond that, am I missing something? Why not just use String.replace()? Your code could look like this:
for(int i = 1; i <= n; i++){
String paramName = "#param" + i + "#";
sqlToExec = sqlToExec.replace(paramName,values.get(paramName));
}
That assumes you have a map called "values" that maps each placeholder of the form "#paramN#" to its replacement string.
If you need it more generic, this will find and return the whole param including the #'s:
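import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ParamFinder {
public static void main(String[] args) {
String foo = "#Field1# #Field2# #Field3#";
Pattern p = Pattern.compile("#.+?#");
Matcher m = p.matcher(foo);
List<String> matchesFound = new ArrayList<>();
int ndx = 0;
// find(ndx) resets the matcher and searches from the given index
while(m.find(ndx)){
matchesFound.add(m.group());
ndx = m.end();
}
for(String match : matchesFound){
System.out.println(match);
}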
}
}
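Putting the pieces together, a replacement routine that takes the source string, a delimiter and a map of values, as the question asks, could be sketched with Matcher.appendReplacement. The class and method names are hypothetical, and the "NULL" fallback for missing or empty values is an assumption standing in for DIMConstants.COL_VALUE_NULL. The sketch does no SQL escaping, so PreparedStatement remains the safer choice for real queries.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderReplacerSketch {
    static String replace(String source, String delimiter, Map<String, String> values) {
        // Pattern.quote makes the delimiter literal even if it is a regex metacharacter.
        String d = Pattern.quote(delimiter);
        Matcher m = Pattern.compile(d + "(.+?)" + d).matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1)); // group 1 is the name without delimiters
            String replacement = (value == null || value.isEmpty()) ? "NULL" : value;
            // quoteReplacement keeps $ and \ in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

StringBuffer is used because the appendReplacement overload that accepts a StringBuilder only exists from Java 9 onward.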
