java regex capturing 2 numbers - java

I'm looking for a way to capture the year and the last number of a string, e.g. "01/02/2017,546.12,24.2,". So far I only get "Found value: 2017" and "Found value: null"; I'm not able to capture group(2). Thanks.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.Scanner;
public class Bourse {
public static void main( String args[] ) {
Scanner clavier = new Scanner(System.in);
// String to be scanned to find the pattern.
String line = clavier.nextLine();
String pattern = "(?<=\\/)(\\d{4})|(\\d+(?:\\.\\d{1,2}))(?=,$)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );
} else {
System.out.println("NO MATCH");
}
}
}

Try this one:
(\\d{2}\\.?\\d{2})
\\d{2} - exactly two digits
\\.? - optional dot
\\d{2} - exactly two digits
If I understood you correctly you're looking for 4 digits, which could be separated by dot.

Your requirements are not very clear, but this works for me to simply grab the year and the last decimal value:
Pattern pattern = Pattern.compile("[0-9]{2}/[0-9]{2}/([0-9]{4}),[^,]+,([0-9.]+),");
String text = "01/02/2017,546.12,24.2,";
Matcher matcher = pattern.matcher(text);
if (matcher.find()) {
String year = matcher.group(1);
String lastDecimal = matcher.group(2);
System.out.println("Year "+year+"; decimal "+lastDecimal);
}
I don't know whether you're deliberately using lookbehind and lookahead, but I think it's simpler to explicitly specify the full date pattern and consume the value between two explicit comma characters. (Obviously if you need the comma to remain in play you can replace the final comma with a lookahead.)
By the way, I'm not a fan of the \d shorthand because in many languages this will match all digit characters from the entire Unicode character space, when usually only matching of ASCII digits 0-9 is desired. (Java does only match ASCII digits when \d is used, but I still think it's a bad habit.)
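Neither answer spells out why group(2) came back null, so here is a minimal sketch of the likely cause, assuming the original pattern is kept unchanged. The two capturing groups sit on opposite sides of an alternation (|), so each successful find() fills only one of them; the first match is the year, which leaves group(2) null. Calling find() in a loop and keeping whichever group is non-null recovers both values. The names AlternationDemo and extract are illustrative, not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {

    // Returns {year, lastNumber}; an entry stays null if that alternative never matched.
    static String[] extract(String line) {
        // Same pattern as in the question (the escaped slash is not needed in Java).
        Pattern p = Pattern.compile("(?<=/)(\\d{4})|(\\d+(?:\\.\\d{1,2}))(?=,$)");
        Matcher m = p.matcher(line);
        String year = null;
        String last = null;
        // Each find() matches ONE side of the alternation, so only one group is set per match.
        while (m.find()) {
            if (m.group(1) != null) {
                year = m.group(1);
            }
            if (m.group(2) != null) {
                last = m.group(2);
            }
        }
        return new String[] { year, last };
    }

    public static void main(String[] args) {
        String[] r = extract("01/02/2017,546.12,24.2,");
        System.out.println("Year: " + r[0] + "; last number: " + r[1]);
    }
}
```

If both values are always present, a single pattern without alternation (as in the answer above) is simpler, because one match then fills both groups.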

Parse, not regex
Regex is overkill here.
Just split the string on the comma-delimiter.
String input = "01/02/2017,546.12,24.2,";
String[] parts = input.split( "," );
Parse each element into a meaningful object rather than treating everything as text.
For a date-only value, the modern approach uses the java.time.LocalDate class built into Java 8 and later.
// Parse the first element, a date-only value.
DateTimeFormatter f = DateTimeFormatter.ofPattern( "dd/MM/uuuu" );
LocalDate localDate = null;
String inputDate = parts[ 0 ] ;
try
{
localDate = LocalDate.parse( inputDate , f );
} catch ( DateTimeException e )
{
System.out.println( "ERROR - invalid input for LocalDate: " + parts[ 0 ] );
}
For numbers with decimals where accuracy matters, avoid the floating-point types and instead use BigDecimal. Given your class name “Bourse”, I assume the numbers relate to money, so accuracy matters. Always use BigDecimal for money matters.
// Loop the numbers
List < BigDecimal > numbers = new ArrayList <>( parts.length );
for ( int i = 1 ; i < parts.length ; i++ )
{ // Start index at 1, skipping over the first element (the date) at index 0.
String s = parts[ i ];
if ( null == s )
{
continue;
}
if ( s.isEmpty( ) )
{
continue;
}
BigDecimal bigDecimal = new BigDecimal( parts[ i ] );
numbers.add( bigDecimal );
}
Extract your two desired pieces of information: the year, and the last number.
Consider passing around a Year object in your code rather than a mere integer to represent the year. This gives you type-safety and makes your code more self-documenting.
// Goals: (1) Get the year of the date. (2) Get the last number.
Year year = Year.from( localDate ); // Where possible, use an object rather than a mere integer to represent the year.
int y = localDate.getYear( );
BigDecimal lastNumber = numbers.get( numbers.size( ) - 1 ); // Fetch last element from the List.
Dump to console.
System.out.println("input: " + input );
System.out.println("year.toString(): " + year );
System.out.println("lastNumber.toString(): " + lastNumber );
See this code run live at IdeOne.com.
input: 01/02/2017,546.12,24.2,
year.toString(): 2017
lastNumber.toString(): 24.2
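On the advice above to prefer BigDecimal over floating-point types, a small sketch may make the difference concrete. It compares the same sum computed with double and with BigDecimal; the class and method names are made up for this illustration.

```java
import java.math.BigDecimal;

class MoneyMathDemo {

    // Binary floating point cannot represent 0.1 or 0.2 exactly, so the sum drifts.
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    // BigDecimal built from strings keeps the decimal digits exactly as written.
    static boolean bigDecimalSumIsExact() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.equals(new BigDecimal("0.3"));
    }

    public static void main(String[] args) {
        System.out.println("double exact?     " + doubleSumIsExact());
        System.out.println("BigDecimal exact? " + bigDecimalSumIsExact());
    }
}
```

Note that the BigDecimal values are constructed from strings, as in the answer's loop; constructing them from double literals would carry the floating-point error along.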

Related

Parse from end to start of a String to grab data before the third occurrence of a delimiter

I am working on some strings and trying to parse through the data and retrieve a string that lies before the third occurrence of " - " from the end of the string. This data comes as a String from the DB and there is some text "-NONE----" that I would like to exclude while parsing.
Input (Below input is a String and not List)
String input1 = "-A123456-B987-013691-000-109264821"
String input2 = "-NONE----"
String input3 = "C1234567-A1241-EF-012361-000-18273460"
Output
String output1 = "-A123456-B987"
String output2= "-NONE----"
String output3 = "C1234567-A1241-EF"
I need to retrieve the data from the beginning of the string up to, but not including, the third "-" (hyphen), where hyphen occurrences are counted from the end of the string.
Any tips are appreciated.
You could use a regex replacement approach:
String input = "-A123456-B987-013691-000-109264821";
String output = input.replaceAll("([^-]*(?:-[^-]+){2}).*", "$1");
System.out.println(output); // -A123456-B987
The regex pattern used here says to match:
( open capture group
[^-]* match optional first term
(?:-[^-]+){2} then match - and a term, twice
) close capture group, available as $1
.* consume the remainder of the string
You could match the three dashes from behind with the $ symbol and then extract everything that is in front of that. I created two capture groups, where the first one is what you want to extract:
private static String extractFront(String input1) {
if(input1.equals("-NONE----")) {
return input1;
} else {
Pattern pattern = Pattern.compile("(.*)(-[^-]*){3}$");
Matcher matcher = pattern.matcher(input1);
if (matcher.find()) {
return matcher.group(1);
}
return null;
}
}
Main to test:
public static void main(String[] args) {
String input1 = "-A123456-B987-013691-000-109264821";
String input2 = "-NONE----";
String input3 = "C1234567-A1241-EF-012361-000-18273460";
System.out.println(extractFront(input1));
System.out.println(extractFront(input2));
System.out.println(extractFront(input3));
}
Output:
-A123456-B987
-NONE----
C1234567-A1241-EF
EDIT: #stubbleweb1995 added the if condition for a complete solution
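For comparison, the same "third hyphen from the end" rule can be sketched without regex by walking backwards with lastIndexOf. This is an alternative technique, not the code from either answer. The names are illustrative, and two behaviors are assumptions: strings containing "-NONE-" are returned unchanged (as the question's expected output suggests), and strings with fewer than three hyphens are also returned unchanged.

```java
class HyphenFromEnd {

    static String beforeThirdHyphenFromEnd(String s) {
        // Assumption: the "-NONE----" placeholder is passed through untouched.
        if (s.contains("-NONE-")) {
            return s;
        }
        int idx = s.length();
        // Step back over three hyphens, starting the search just before the previous hit.
        for (int i = 0; i < 3; i++) {
            idx = s.lastIndexOf('-', idx - 1);
            if (idx < 0) {
                return s; // Assumption: fewer than three hyphens means nothing to cut.
            }
        }
        return s.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(beforeThirdHyphenFromEnd("-A123456-B987-013691-000-109264821"));
        System.out.println(beforeThirdHyphenFromEnd("-NONE----"));
        System.out.println(beforeThirdHyphenFromEnd("C1234567-A1241-EF-012361-000-18273460"));
    }
}
```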
We can use streams, lambdas, and predicate.
Split your input on its end-of-line character, to get an array of strings. We filter out the “NONE” lines.
For each line, we split into pieces, using the hyphen as a delimiter. This gives us an array of strings that we reassemble using only the 3 parts.
Lastly we collect into a list.
Here is some untested code to get you started.
String[] lines = input.split( "\n" ) ;
List < String > results =
Arrays
.stream( lines )
.filter( line -> ! line.contains( "-NONE-" ) ) // Skip the placeholder lines.
.map(
line ->
String.join(
"-" ,
Arrays.copyOf( line.split( "-" , 4 ) , 3 , String[].class ) // Keep only the first 3 pieces.
)
)
.toList()
;

Stuck on how to "append a period" into an acronym

An acronym is a word formed from the initial letters of words in a set phrase. Define a method named createAcronym that takes a string parameter and returns the acronym of the string parameter. Append a period (.) after each letter in the acronym. If a word begins with a lower case letter, don't include that letter in the acronym. Then write a main program that reads a phrase from input, calls createAcronym() with the input phrase as argument, and outputs the returned acronym. Assume the input has at least one upper case letter.
Ex: If the input is:
Institute of Electrical and Electronics Engineers
the output should be:
I.E.E.E.
Ex: If the input is:
Association for computing MACHINERY
the output should be:
A.M.
The letters ACHINERY in MACHINERY don't start a word, so those letters are omitted.
The program must define and call a method:
public static String createAcronym(String userPhrase)
So far my code looks like this:
import java.util.Scanner;
public class LabProgram {
public static String createAcronym(String userPhrase) {
String[] separatedWords = userPhrase.split(" ");
String acronymAlphabets = " ";
for(int i = 0; i < separatedWords.length; ++i) {
if(Character.isUpperCase(separatedWords [i].charAt(0))) {
acronymAlphabets += Character.toUpperCase(separatedWords [i].charAt(0));
}
}
return acronymAlphabets;
}
public static void main(String[] args) {
Scanner userInput = new Scanner(System.in);
System.out.println(createAcronym(userInput.nextLine()));
}
}
The system returns the correct acronym, but I can't for the life of me figure out how to get the periods in the proper place. Side note: I'm really good at getting them at the beginning or the end of the acronym. Any help "appending" this problem would be awesome!
Change
acronymAlphabets += Character.toUpperCase(separatedWords [i].charAt(0));
to
acronymAlphabets += Character.toUpperCase(separatedWords [i].charAt(0))+".";
Here is one way using streams.
String[] titles = {"Institute of Electrical and Electronics Engineers",
"Association for Computing MACHINERY",
"Association of Lions, and Tigers, and Bears, oh my!!"};
for (String title : titles) {
System.out.println(createAcronym(title));
}
prints
I.E.E.E
A.C.M
A.L.T.B
Split on anything that isn't a word character and stream the array.
Filter out any words not starting with an upper case character.
Map the first letter of each word to a String.
Join them together with a period between each letter.
public static String createAcronym(String title) {
return Arrays.stream(title.split("\\W+"))
.filter(str -> Character.isUpperCase(str.charAt(0)))
.map(str -> str.substring(0, 1))
.collect(Collectors.joining("."));
}
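The version above places a period between letters, so its output has no trailing period (I.E.E.E rather than I.E.E.E.). A minimal sketch of a variant that appends a period after every letter, as the assignment asks, could look like this. The class name AcronymStreams and the empty-string guard are additions for illustration.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class AcronymStreams {

    static String createAcronym(String title) {
        return Arrays.stream(title.split("\\W+"))
                // split can yield a leading empty string if the title starts with punctuation.
                .filter(str -> !str.isEmpty())
                .filter(str -> Character.isUpperCase(str.charAt(0)))
                // char + String concatenation produces the letter followed by a period.
                .map(str -> str.charAt(0) + ".")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(createAcronym("Institute of Electrical and Electronics Engineers"));
        System.out.println(createAcronym("Association for computing MACHINERY"));
    }
}
```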
Here's a solution that will include a starting letter if it is capitalized, and tack on a period (".") after any included letters.
A few comments:
Line 3: a StringBuilder is an efficient, powerful way to make small additions to a String
Line 5: use a "for" loop over the words (from line 2)
Line 6: isolate the starting character of "word"
Line 8: if firstChar is upper case, go ahead and append it to the StringBuilder result (no need to uppercase it again – it is already upper case or we wouldn't have made it to line 8); and any time we append a letter, always go ahead and append a ".", too
Line 12: once we're done with all of the words, convert the StringBuilder to a String, and return the result
1. public static String createAcronym(String input) {
2. String[] words = input.split(" ");
3. StringBuilder result = new StringBuilder();
4.
5. for (String word : words) {
6. char firstChar = word.charAt(0);
7. if (Character.isUpperCase(firstChar)) {
8. result.append(firstChar).append(".");
9. }
10. }
11.
12. return result.toString();
13. }
You could further improve this by editing Line 2: instead of calling split() with a single space (" "), you could use a regular expression to split on any run of whitespace (spaces, tabs, etc.).
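A sketch of that improvement, assuming the rest of the method stays as shown: trim the input, split on "\\s+", and skip empty tokens so that charAt(0) is never called on an empty string. The class name AcronymLoop is illustrative.

```java
class AcronymLoop {

    static String createAcronym(String input) {
        StringBuilder result = new StringBuilder();
        // "\\s+" treats any run of spaces, tabs, or newlines as one separator.
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // Only happens when the whole input is blank.
            }
            char firstChar = word.charAt(0);
            if (Character.isUpperCase(firstChar)) {
                result.append(firstChar).append('.');
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(createAcronym("Institute  of\tElectrical   and Electronics Engineers"));
    }
}
```

With a plain split(" "), two consecutive spaces produce an empty word, and word.charAt(0) then throws StringIndexOutOfBoundsException; the whitespace regex avoids that case.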
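Don't you just need to append a period (.) after each letter?
Add the following line inside the if block, right after the letter is appended, so that words starting with a lower case letter do not contribute a period.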
acronymAlphabets += ".";
Footnote: I would say the Answer offered by Kaan above using StringBuilder is better. I have offered an Answer making minimal changes to the code in the question. Given the OP is a novice, I believe it's unhelpful to introduce too many new ideas in a reply.
In programming and Java there's usually a dozen ways to do anything. Bombarding novices with new concepts only confuses them. One step at a time if you ask me.
Other Answers, such as the one by WJS, give a direct, correct solution to the specifics of the Question: appending . to each capital letter. My Answer here is more for fun.
tl;dr
Using code points rather than char.
System.out.println(
String.join( // Join a bunch of strings into a single `String` object.
"" , // Join the strings without any delimiter between them.
Arrays.stream( // Turn an array into a stream of elements for processing.
"Association of Lions, and Tigers, and Bears, oh my!!"
.split( " " ) // Returns an array of string objects, `String[]`.
)
.filter( ( String part ) -> ! part.isBlank() ) // Skip any string elements without significant text.
.map( ( String part ) -> part.codePoints().boxed().toList() ) // Generate a stream of `int` integer numbers representing the code point numbers of each character in the string element. Via boxing, convert those `int` primitives into `Integer` objects. Collect the `Integer` objects into a `List`.
.filter( ( List < Integer > codePoints ) -> Character.isUpperCase( codePoints.get( 0 ) ) ) // Skip any list of code points if the first code point represents a character other than an uppercase letter.
.map( ( List < Integer > codePoints ) -> Character.toString( codePoints.get( 0 ) ) + "." ) // Extract the first code point `Integer` number, turn into a `String` containing the single character represented by that code point, and append a FULL STOP.
.toList() // Collect all those generated strings (single character plus FULL STOP) to a list.
)
);
A.L.T.B.
Code points
I recommend making a habit of using code point integer numbers rather than the legacy char type. As a 16-bit value, char is physically incapable of representing most characters.
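To illustrate the point about char, here is a small sketch using U+10400 (DESERET CAPITAL LETTER LONG I), an uppercase letter outside the 16-bit range. The class and method names are made up for the example.

```java
class CodePointDemo {

    // U+10400 needs two char values (a surrogate pair) in a Java String.
    static final String DESERET_LONG_I = "\uD801\uDC00";

    // Looks only at the first 16-bit char, which here is a lone high surrogate.
    static boolean upperByChar(String s) {
        return Character.isUpperCase(s.charAt(0));
    }

    // Looks at the full code point, so supplementary characters are handled.
    static boolean upperByCodePoint(String s) {
        return Character.isUpperCase(s.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println("length (chars):    " + DESERET_LONG_I.length());
        System.out.println("code points:       " + DESERET_LONG_I.codePointCount(0, DESERET_LONG_I.length()));
        System.out.println("upper by char:     " + upperByChar(DESERET_LONG_I));
        System.out.println("upper by codepoint: " + upperByCodePoint(DESERET_LONG_I));
    }
}
```

For ordinary Latin text both checks agree, which is why the char-based answers work for the inputs in the question.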
This code makes heavy use of streams. If not yet comfortable with streams, you could replace with conventional loops.
Pull apart the phrase.
System.out.println( "input = " + input );
String[] parts = input.split( " " );
if ( parts.length == 0 ) { throw new IllegalArgumentException( "Input must consist of multiple words separated by spaces." ); }
System.out.println( "parts = " + Arrays.toString( parts ) );
Convert each part of the phrase into code point integer numbers.
List < List < Integer > > codePointsOfEachPart =
Arrays
.stream( parts )
.filter( part -> ! part.isBlank() )
.map( part -> part.codePoints().boxed().toList() )
.toList();
Filter for those parts of the phrase that begin with an uppercase letter. From each of the qualifying parts, extract the first letter, append a FULL STOP, and collect to a list.
List < String > abbreviations =
codePointsOfEachPart
.stream()
.filter( ( List < Integer > codePoints ) -> Character.isUpperCase( codePoints.get( 0 ) ) )
.map( codePoints -> Character.toString( codePoints.get( 0 ) ) + "." )
.toList();
// Join the collection of UppercaseLetter + FULL STOP combos into a single piece of text.
String result = String.join( "" , abbreviations );
return result;
}
When run:
input = Association of Lions, and Tigers, and Bears, oh my!!
parts = [Association, of, , Lions,, and, Tigers,, and, Bears,, oh, my!!]
output = A.L.T.B.
Pull that all together for your copy-paste convenience.
private String acronym ( final String input )
{
// Pull apart the phrase.
System.out.println( "input = " + input );
String[] parts = input.split( " " );
if ( parts.length == 0 ) { throw new IllegalArgumentException( "Input must consist of multiple words separated by spaces." ); }
System.out.println( "parts = " + Arrays.toString( parts ) );
// Convert each part of the phrase into code point integer numbers.
List < List < Integer > > codePointsOfEachPart =
Arrays
.stream( parts )
.filter( part -> ! part.isBlank() )
.map( part -> part.codePoints().boxed().toList() )
.toList();
// Filter for those parts of the phrase that begin with an uppercase letter.
// From each of the qualifying parts, extract the first letter, append a FULL STOP, and collect to a list.
List < String > abbreviations =
codePointsOfEachPart
.stream()
.filter( ( List < Integer > codePoints ) -> Character.isUpperCase( codePoints.get( 0 ) ) )
.map( codePoints -> Character.toString( codePoints.get( 0 ) ) + "." )
.toList();
// Join the collection of UppercaseLetter + FULL STOP combos into a single piece of text.
String result = String.join( "" , abbreviations );
return result;
}
That code could be shortened, into a single line of code! Not that I necessarily recommend doing so.
System.out.println(
String.join(
"" ,
Arrays.stream(
"Association of Lions, and Tigers, and Bears, oh my!!"
.split( " " )
)
.filter( part -> ! part.isBlank() )
.map( part -> part.codePoints().boxed().toList() )
.filter( ( List < Integer > codePoints ) -> Character.isUpperCase( codePoints.get( 0 ) ) )
.map( codePoints -> Character.toString( codePoints.get( 0 ) ) + "." )
.toList()
)
);
A.L.T.B.

How to compare word within the whole string in java and increase its count?

I have a String replacedtext which is:
Replaced text:OPTIONS (ERRORS=5000)
LOAD DATA
INFILE *
APPEND INTO TABLE REPO.test
Fields terminated by "," optionally enclosed BY '"'
trailing nullcols
(
CODE ,
user,
date DATE "MM/DD/YYYY"
)
I want to count the number of occurrences of REPO. in this whole string. I tried it this way, but it is not working.
String[] words = replacedtext.split("\\s+");
int count=0;
for(String w:words){
if(w.equals("\\bREPO.\\b")){
count++;
}
}
System.out.println ("count is :"+count);
Output coming is:
count is :0
Since REPO. appears once in the string, my output needs to be count is :1.
w.equals("\\bREPO.\\b") compares the content of w with \\bREPO.\\b literally and therefore you are getting a wrong result.
You can count the occurrences of REPO using the regex API.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String replacedText = """
Replaced text:OPTIONS (ERRORS=5000)
LOAD DATA
INFILE *
APPEND INTO TABLE REPO.test
Fields terminated by "," optionally enclosed BY '"'
trailing nullcols
(
CODE ,
user,
date DATE "MM/DD/YYYY"
)
""";
Matcher matcher = Pattern.compile("\\bREPO\\b").matcher(replacedText);
int count = 0;
while (matcher.find()) {
count++;
}
System.out.println(count);
}
}
Output:
1
Note that \b is used as word boundary matcher.
From Java SE 9 onwards, you can call Matcher#results on a fresh (or reset) matcher to get a Stream of matches, on which you can call count as shown below:
long count = matcher.results().count();
Note: I've used Text Block feature to make the text block look more readable. You can form the string in the Pre-Java15 way.
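Probably two issues here. First, . is a regex metacharacter that matches any single character; to match a literal dot it has to be escaped as \\. in a Java string.
Second, the equals method compares strings literally and never interprets a regex, so the token REPO.test cannot equal your pattern text. You should use a Matcher, or filter by regex and then count.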
I would do it so:
long count = Pattern.compile("\\s").splitAsStream(replacedtext)
.filter(Pattern.compile("\\bREPO\\b").asPredicate())
.count();
System.out.println(count);
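If the goal is to count the literal text REPO. including the dot, one option is to let Pattern.quote do the escaping. The sketch below assumes Java 9 or later (for Matcher#results) and assumes the literal starts with a word character, since it is prefixed with \b. The names LiteralCount and countLiteral are illustrative.

```java
import java.util.regex.Pattern;

class LiteralCount {

    // Counts occurrences of `literal` that start at a word boundary.
    static long countLiteral(String text, String literal) {
        // Pattern.quote wraps the literal so that "." is matched as a plain dot.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(literal));
        return p.matcher(text).results().count();
    }

    public static void main(String[] args) {
        String text = "APPEND INTO TABLE REPO.test";
        System.out.println("count is :" + countLiteral(text, "REPO."));
    }
}
```

Unlike the \bREPO\b patterns above, this does not count a bare REPO that is not followed by a dot.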

Splitting string into two strings with regex

This question was asked several times before but I couldn't find an answer to my question:
I need to split a string into two strings. The first part is a date and the second part is text. This is what I have so far:
String test = "24.12.17 18:17 TestString";
String[] testSplit = test.split("\\d{2}.\\d{2}.\\d{2} \\d{2}:\\d{2}");
System.out.println(testSplit[0]); // "24.12.17 18:17" <-- Does not work
System.out.println(testSplit[1].trim()); // "TestString" <-- works
I can extract "TestString", but I lose the date. Is there any better (or even simpler) way? Help is highly appreciated!
Skip regex; Use three strings
You are working too hard. No need to include the date and the time together as one. Regex is tricky, and life is short.
Just use the plain String::split for three pieces, and re-assemble the date-time.
String[] pieces = "24.12.17 18:17 TestString".split( " " ) ; // Split into 3 strings.
LocalDate ld = LocalDate.parse( pieces[0] , DateTimeFormatter.ofPattern( "dd.MM.uu" ) ) ; // Parse the first string as a date value (`LocalDate`).
LocalTime lt = LocalTime.parse( pieces[1] , DateTimeFormatter.ofPattern( "HH:mm" ) ) ; // Parse the second string as a time-of-day value (`LocalTime`).
LocalDateTime ldt = LocalDateTime.of( ld , lt ) ; // Reassemble the date with the time (`LocalDateTime`).
String description = pieces[2] ; // Use the last remaining string.
See this code run live at IdeOne.com.
ldt.toString(): 2017-12-24T18:17
description: TestString
Tip: If you have any control over that input, switch to using standard ISO 8601 formats for date-time values in text. The java.time classes use the standard formats by default when generating/parsing strings.
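To illustrate the tip, here is a minimal sketch assuming the input were rewritten as an ISO 8601 date-time followed by a space and the description. That input layout, the class name IsoDemo, and the method split are assumptions for the example. No DateTimeFormatter is needed, and a split limit of 2 keeps a description that contains spaces in one piece.

```java
import java.time.LocalDateTime;

class IsoDemo {

    // Returns {date-time as ISO text, description}.
    static String[] split(String line) {
        // Limit 2: split only at the first space, so the description may contain spaces.
        String[] pieces = line.split(" ", 2);
        // LocalDateTime.parse uses the ISO 8601 format by default.
        LocalDateTime ldt = LocalDateTime.parse(pieces[0]);
        return new String[] { ldt.toString(), pieces[1] };
    }

    public static void main(String[] args) {
        String[] r = split("2017-12-24T18:17 Test String with spaces");
        System.out.println("ldt.toString(): " + r[0]);
        System.out.println("description: " + r[1]);
    }
}
```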
You want to match only the separator. By matching the date, you consume it (it's thrown away).
Use a look behind, which asserts but does not consume:
test.split("(?<=^.{14}) ");
This regex means "split on a space that is preceded by 14 characters after the start of input".
Your test code now works:
String test = "24.12.17 18:17 TestString";
String[] testSplit = test.split("(?<=^.{14}) ");
System.out.println(testSplit[0]); // "24.12.17 18:17" <-- works
System.out.println(testSplit[1].trim()); // "TestString" <-- works
If your string is always in this format (and is formatted well), you do not even need to use a regex. Just split at the second space using .substring and .indexOf:
String test = "24.12.17 18:17 TestString";
int idx = test.indexOf(" ", test.indexOf(" ") + 1);
System.out.println(test.substring(0, idx));
System.out.println(test.substring(idx).trim());
See the Java demo.
If you want to make sure your string starts with a datetime value, you may use a matching approach to match the string with a pattern containing 2 capturing groups: one will capture the date and the other will capture the rest of the string:
String test = "24.12.17 18:17 TestString";
String pat = "^(\\d{2}\\.\\d{2}\\.\\d{2} \\d{2}:\\d{2})\\s(.*)";
Matcher matcher = Pattern.compile(pat, Pattern.DOTALL).matcher(test);
if (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2).trim());
}
See the Java demo.
Details:
^ - start of string
(\\d{2}\\.\\d{2}\\.\\d{2} \\d{2}:\\d{2}) - Group 1: a datetime pattern (xx.xx.xx xx:xx-like pattern)
\\s - a whitespace (if it is optional, add * after it)
(.*) - Group 2 capturing any 0+ chars up to the end of string (. will match line breaks, too, because of the Pattern.DOTALL flag).

How to parse a date from a URL format?

My database contains URLs stored as text fields and each URL contains a representation of the date of a report, which is missing from the report itself.
So I need to parse the date from the URL field to a String representation such as:
2010-10-12
2007-01-03
2008-02-07
What's the best way to extract the dates?
Some are in this format:
http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html
http://e.com/data/invoices/2010/09/invoices-report-thursday-september-2-2010.html
http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-15-2010.html
http://e.com/data/invoices/2010/09/invoices-report-monday-september-13th-2010.html
http://e.com/data/invoices/2010/08/invoices-report-monday-august-30th-2010.html
http://e.com/data/invoices/2009/05/invoices-report-friday-may-8th-2009.html
http://e.com/data/invoices/2010/10/invoices-report-wednesday-october-6th-2010.html
http://e.com/data/invoices/2010/09/invoices-report-tuesday-september-21-2010.html
Note the inconsistent use of th following the day of the month in cases such as these two:
http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-15-2010.html
http://e.com/data/invoices/2010/09/invoices-report-monday-september-13th-2010.html
Others are in this format (with three hyphens before the date starts, no year at the end and an optional use of invoices- before report):
http://e.com/data/invoices/2010/09/invoices-report---wednesday-september-1.html
http://e.com/data/invoices/2010/09/invoices-report---thursday-september-2.html
http://e.com/data/invoices/2010/09/invoices-report---wednesday-september-15.html
http://e.com/data/invoices/2010/09/invoices-report---monday-september-13.html
http://e.com/data/invoices/2010/08/report---monday-august-30.html
http://e.com/data/invoices/2009/05/report---friday-may-8.html
http://e.com/data/invoices/2010/10/report---wednesday-october-6.html
http://e.com/data/invoices/2010/09/report---tuesday-september-21.html
You want a regex like this:
"^http://e.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})"
This exploits that everything up through the /year/month/ part of the URL is always the same, and that no number follows till the day of the month. After you have that, you don't care about anything else.
The first capture group is the year, the second the month, and the third the day. The day might not have a leading zero; convert from string to integer and format as needed, or just grab the string length and, if it's not two, then concatenate it to the string "0".
As an example:
import java.util.regex.*;
class URLDate {
public static void
main(String[] args) {
String text = "http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html";
String regex = "http://e.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(text);
if (m.find()) {
int count = m.groupCount();
System.out.format("matched with groups:\n", count);
for (int i = 0; i <= count; ++i) {
String group = m.group(i);
System.out.format("\t%d: %s\n", i, group);
}
} else {
System.out.println("failed to match!");
}
}
}
gives the output:
matched with groups:
0: http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html
1: 2010
2: 09
3: 1
(Note that to use Matcher.matches() instead of Matcher.find(), you would have to make the pattern eat the entire input string by appending .*$ to the pattern.)
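To get from the three captured groups to the yyyy-MM-dd strings the question asks for, one option is to build a java.time.LocalDate and rely on its ISO toString(), which handles the zero padding of the day. This is a sketch under that assumption, with the dots in the host name escaped; the names UrlDateParser and isoDate are illustrative.

```java
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlDateParser {

    // Year and month come from the path; the day is the first number after them.
    private static final Pattern URL_DATE = Pattern.compile(
            "^http://e\\.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})");

    // Returns the date as yyyy-MM-dd, or empty if the URL does not fit the pattern.
    static Optional<String> isoDate(String url) {
        Matcher m = URL_DATE.matcher(url);
        if (!m.find()) {
            return Optional.empty();
        }
        // LocalDate.of throws DateTimeException for impossible dates such as month 13.
        LocalDate date = LocalDate.of(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)));
        return Optional.of(date.toString());
    }

    public static void main(String[] args) {
        System.out.println(isoDate(
                "http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html"));
        System.out.println(isoDate(
                "http://e.com/data/invoices/2009/05/report---friday-may-8.html"));
    }
}
```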
