Java - split a string to different fields using regex

Does anyone have an idea how I can split a string returned by a WS to different strings?
String wsResult = "SONACOM RC, RUE DES ETOILES N. 20, 75250 PARIS (MI)";
I'm trying to split it into:
String name = "SONACOM RC";
String adress = "RUE DES ETOILES N. 20";
String postalCode = "75250";
String city = "PARIS";
N.B.: the format of the WS response stays the same; only the values of these fields change.
Thank you in advance for your help

You could capture your data in 4 capturing groups. Your provided example uses uppercase characters, which you can match with [A-Z].
If you want to match also lowercase characters, digits and an underscore, you could replace [A-Z] or [A-Z\d] with \w.
You can go about this in multiple ways. An approach could be:
([A-Z ]+), +([A-Z\d .]+), +(\d+) +([A-Z\d() ]+)
Explanation
Group 1: match one or more uppercase characters or a whitespace ([A-Z ]+)
Match a comma and one or more whitespaces , +
Group 2: match one or more uppercase characters or digit or whitespace or dot ([A-Z\d .]+)
Match a comma and one or more whitespaces , +
Group 3: match one or more digits (\d+)
Match one or more whitespaces +
Group 4: match one or more uppercase characters or digit or open/close parenthesis or whitespace ([A-Z\d() ]+)
Output in Java
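A minimal sketch of how the pattern above might be applied in Java, assuming the whole response has to match. The class and method names (AddressFields, parse) are only illustrative. Note that group 4 of the pattern as written also captures the trailing "(MI)", so the sketch strips a parenthesised suffix to get the bare city; that step is an extra assumption about the format, not part of the original pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressFields {
    // The pattern from the answer above, with Java string escaping.
    private static final Pattern ADDRESS = Pattern.compile(
            "([A-Z ]+), +([A-Z\\d .]+), +(\\d+) +([A-Z\\d() ]+)");

    // Returns {name, address, postalCode, city}, or null if the input does not match.
    static String[] parse(String wsResult) {
        Matcher m = ADDRESS.matcher(wsResult);
        if (!m.matches()) {
            return null;
        }
        // Assumption: a trailing parenthesised part such as "(MI)" is not part of the city.
        String city = m.group(4).replaceAll("\\s*\\(.*\\)\\s*$", "").trim();
        return new String[] { m.group(1).trim(), m.group(2).trim(), m.group(3), city };
    }

    public static void main(String[] args) {
        String[] f = parse("SONACOM RC, RUE DES ETOILES N. 20, 75250 PARIS (MI)");
        System.out.println(String.join(" | ", f));
    }
}
```

Returning null for a non-matching response keeps the caller in control of error handling; an exception would work just as well.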

One easy way to split it as you'd like is to use wsResult.split(",\\s*"), which splits on each comma and drops the whitespace after it. However, you'll have to add a comma between 75250 and PARIS:
String wsResult = "SONACOM RC, RUE DES ETOILES N. 20, 75250, PARIS (MI)";
String[] temp = wsResult.split(",\\s*");
String name = temp[0];
String adress = temp[1];
String postalCode = temp[2];
String city = temp[3]; // "PARIS (MI)"
Using that you will get the output you're looking for.
EDIT
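Another way to get your output without adding a comma is to split the original string the same way. There is then no temp[3]; temp[2] holds the postal code and the city together, so cut it at the first space:
String postalCode = temp[2].trim(); // "75250 PARIS (MI)"
String city = "";
for (int i = 1; i < postalCode.length(); i++) {
    if (postalCode.charAt(i) == ' ') {
        city = postalCode.substring(i + 1);
        postalCode = postalCode.substring(0, i);
        break;
    }
}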
For more information, check the String class in the Java API documentation.

Related

String format into specific pattern

Is there any pretty and flexible way to format String data into a specific pattern, for example:
data input -> 0123456789
data output <- 012345/678/9
I did it by cutting the String into multiple parts, but I'm looking for a more suitable way.

Assuming you want to group the last character and the three characters before it:
String formatted = str.replaceAll("(...)(.)$", "/$1/$2");
This captures the parts you want in groups and replaces them with intervening slashes.
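As a rough illustration of that replacement, here is a small sketch; the class and method names (SlashFormat, format) are made up for the example. It assumes the slashes are always positioned relative to the end of the string, which is what the $ anchor implies.

```java
class SlashFormat {
    // Inserts a slash before the last four characters and another before the last one.
    static String format(String str) {
        return str.replaceAll("(...)(.)$", "/$1/$2");
    }

    public static void main(String[] args) {
        System.out.println(format("0123456789")); // 012345/678/9
    }
}
```

A string shorter than four characters does not match the pattern and is returned unchanged.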

You can use replaceAll with regex to match multiple groups like so :
String text = "0123456789";
text = text.replaceAll("(\\d{6})(\\d{3})(.*)", "$1/$2/$3");
System.out.println(text);
Output
012345/678/9
Details:
(\d{6}) group one: matches 6 digits
(\d{3}) group two: matches 3 digits
(.*) group three: matches the rest of your string
$1/$2/$3 replaces the match with group 1, then /, then group 2, then /, then group 3

You can use StringBuilder's insert to insert characters at a certain index:
String input = "0123456789";
String output = new StringBuilder(input)
.insert(6, "/")
.insert(10, "/")
.toString();
System.out.println(output); // 012345/678/9

Java replace strings between two commas

String str = "9,3,5,*****,1,2,3";
I'd like to simply access "5", which is between two commas and right before "*****", and then replace only this "5" with another value.
How could I do this in Java?

You can try using the following regex replacement:
String input = "9,3,5,*****,1,2,3";
input = input.replaceAll("[^,]*,\\*{5}", "X,*****");
Here is an explanation of the regex:
[^,]*, match any number of non-comma characters, followed by one comma
\\*{5} followed by five asterisks
This means to match whatever CSV term plus a comma comes before the five asterisks in your string. We then replace this with what you want, along with the five stars in the original pattern.
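A self-contained sketch of this replacement, with the new value passed in as a parameter. The names (ReplaceBeforeStars, replaceTermBeforeStars) are hypothetical, and wrapping the value in Matcher.quoteReplacement is an addition to the answer's code, there so that a value containing $ or \ is inserted literally.

```java
import java.util.regex.Matcher;

class ReplaceBeforeStars {
    // Replaces whatever term sits directly before ",*****" with newValue.
    static String replaceTermBeforeStars(String input, String newValue) {
        String replacement = Matcher.quoteReplacement(newValue) + ",*****";
        return input.replaceAll("[^,]*,\\*{5}", replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceTermBeforeStars("9,3,5,*****,1,2,3", "X"));
    }
}
```

Because [^,]* accepts any non-comma characters, this version also replaces terms that are not digits.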

I'd use a regular expression with a lookahead, to find a string of digits that precedes ",*****", and replace it with the new value. The regular expression you're looking for would be \d+(?=,\*{5}) - that is, one or more digits, with a lookahead consisting of a comma and five asterisks. So you'd write
newString = oldString.replaceAll("\\d+(?=,\\*{5})", "replacement");
Here is an explanation of the regex pattern used in the replacement:
\\d+ match one or more digits, but only when
(?=,\\*{5}) we can lookahead and assert that what follows immediately
is a single comma followed by five asterisks
It is important to note that the lookahead (?=,\\*{5}) asserts but does not consume. Hence, we can ignore it with regards to the replacement.
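To make the "asserts but does not consume" point concrete, here is a short sketch; the names (LookaheadReplace, replaceDigitsBeforeStars) are illustrative only. Since the comma and asterisks are never part of the match, the replacement string only needs to contain the new value.

```java
import java.util.regex.Matcher;

class LookaheadReplace {
    // Replaces a run of digits only when ",*****" follows it; the suffix itself is left untouched.
    static String replaceDigitsBeforeStars(String oldString, String replacement) {
        return oldString.replaceAll("\\d+(?=,\\*{5})", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceDigitsBeforeStars("9,3,5,*****,1,2,3", "6"));
    }
}
```

Unlike a pattern built on [^,]*, this one leaves the string unchanged when the term before the asterisks is not made of digits.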

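I assumed the new value is '6':
String str = "9,3,5,*****,1,2,3";
char newstr = '6';
int idx = str.indexOf(",*") - 1; // position of the character just before ",*"
str = str.substring(0, idx) + newstr + str.substring(idx + 1);
This replaces only that one character (str.replace(oldChar, newChar) would replace every '5' in the string), so it works only for single-character values. If ",*" may be missing or at the very start of the string, check that idx is not negative first; otherwise substring throws a StringIndexOutOfBoundsException.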

You could split on , and then join with a , (after replacing 5 with the desired value - say X). Like,
String[] arr = "9,3,5,*****,1,2,3".split(",");
arr[2] = "X";
System.out.println(String.join(",", arr));
Which outputs
9,3,X,*****,1,2,3

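You can use split() to get at the value, replace it, and join the parts again:
String str = "9,3,5,*****,1,2,3";
String[] myStrings = str.split(",");
String str1 = myStrings[2]; // "5"
myStrings[2] = "X"; // "X" is a placeholder for the new value
str = String.join(",", myStrings);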

Regex add space between all punctuation

I need to add spaces between all punctuation in a string.
\\ "Hello: World." -> "Hello : World ."
\\ "It's 9:00?" -> "It ' s 9 : 00 ?"
\\ "1.B,3.D!" -> "1 . B , 3 . D !"
I think a regex is the way to go, matching all non-punctuation [a-zA-Z\\d]+, adding a space before and/or after, then extracting the remainder matching all punctuation [^a-zA-Z\\d]+.
But I don't know how to (recursively?) call this regex. Looking at the first example, the regex will only match the "Hello". I was thinking of just building a new string by continuously removing and appending the first instance of the matched regex, while the original string is not empty.
private String addSpacesBeforePunctuation(String s) {
    StringBuilder builder = new StringBuilder();
    final String nonpunctuation = "[a-zA-Z\\d]+";
    final String punctuation = "[^a-zA-Z\\d]+";
    String found;
    while (!s.isEmpty()) {
        // regex stuff goes here
        found = ???; // found group from respective regex goes here
        builder.append(found);
        builder.append(" ");
        s = s.replaceFirst(found, "");
    }
    return builder.toString().trim();
}
However this doesn't feel like the right way to go... I think I'm over complicating things...

You can use a lookaround-based regex with the punctuation class \p{Punct} in Java:
str = str.replaceAll("(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)", " ");
(?<=\\S) asserts that the previous char is not whitespace
(?<=\\p{Punct}) asserts that the previous char is a punctuation char
(?=\\p{Punct}) asserts that the next char is a punctuation char
(?=\\S) asserts that the next char is not whitespace
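The lookaround pattern can be checked against the three examples from the question with a small sketch like the following; the class and method names (PunctuationSpacer, spaceOut) are invented for the example. The pattern matches empty positions only, so replaceAll inserts a space at each one and no characters are lost.

```java
class PunctuationSpacer {
    // Inserts a space at every position that touches a punctuation char
    // and has non-whitespace on both sides.
    static String spaceOut(String str) {
        return str.replaceAll("(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)", " ");
    }

    public static void main(String[] args) {
        System.out.println(spaceOut("Hello: World.")); // Hello : World .
        System.out.println(spaceOut("It's 9:00?"));    // It ' s 9 : 00 ?
        System.out.println(spaceOut("1.B,3.D!"));      // 1 . B , 3 . D !
    }
}
```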

When you see a punctuation mark, you have four possibilities:
1. Punctuation is surrounded by spaces.
2. Punctuation is preceded by a space.
3. Punctuation is followed by a space.
4. Punctuation is neither preceded nor followed by a space.
Here is code that does the replacement properly:
String ss = s
.replaceAll("(?<=\\S)\\p{Punct}", " $0")
.replaceAll("\\p{Punct}(?=\\S)", "$0 ");
It uses two expressions: the first adds the missing space before the punctuation (case 3), and the second adds the missing space after it (case 2). Since the expressions are applied one after the other, they take care of case 4 as well. Case 1 requires no change.
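Wrapped in a method, the two chained replacements might look like this sketch (TwoPassSpacer and spaceOut are hypothetical names). The first pass adds a space before any punctuation mark that directly follows a non-space character; the second adds a space after any punctuation mark that is directly followed by one.

```java
class TwoPassSpacer {
    static String spaceOut(String s) {
        return s
                .replaceAll("(?<=\\S)\\p{Punct}", " $0")  // pass 1: missing space before
                .replaceAll("\\p{Punct}(?=\\S)", "$0 ");  // pass 2: missing space after
    }

    public static void main(String[] args) {
        System.out.println(spaceOut("It's 9:00?")); // It ' s 9 : 00 ?
    }
}
```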

How to replace last letter to another letter in java using regular expression

I have seen "," replaced with "." by using ".$"|",$", but this logic does not work with letters.
I need to replace the last letter of every word in the string held in EXAMPLE_TEST with another letter, using Java.
This is my code:
Pattern replace = Pattern.compile("n$");//here got the real problem
matcher2 = replace.matcher(EXAMPLE_TEST);
EXAMPLE_TEST=matcher2.replaceAll("k");
I also tried "//n$", "\n$", etc.
Please help me find a solution.
input text=>njan ayman
output text=> njak aymak

Instead of the end of string $ anchor, use a word boundary \b
String s = "njan ayman";
s = s.replaceAll("n\\b", "k");
System.out.println(s); //=> "njak aymak"

You can use lookahead and group matching:
String EXAMPLE_TEST = "njan ayman";
String s = EXAMPLE_TEST.replaceAll("(n)(?=\\s|$)", "k");
System.out.println("s = " + s); // prints: s = njak aymak
Explanation:
(n) - the matched word character
(?=\\s|$) - which is followed by whitespace or by the end of the input (lookahead)
The above is only an example! If you want to switch a comma at the end of a word with a period, the middle line should be changed to:
s = s.replaceAll("(,)(?=\\s|$)", "\\.");

Here's how I would set it up:
(?=.\b)\w
Which in Java would need to be escaped as following:
(?=.\\b)\\w
It translates to something like "a word character (\w) at a position where the lookahead (?=.\b) sees a single character (.) followed by a word boundary (\b)", in other words the last character of a word.
String s = "njan ayman aowkdwo wdonwan. wadawd,.. wadwdawd;";
s = s.replaceAll("(?=.\\b)\\w", "");
System.out.println(s); //nja ayma aowkdw wdonwa. wadaw,.. wadwdaw;
This removes the last character of all words, but leaves following non-alphanumeric characters. You can specify only specific characters to remove/replace by changing the . to something else.
However, the other answers are perfectly good and might achieve exactly what you are looking for.

// oldLetter and newLetter are placeholder Strings for the letter to replace and its replacement
if (word.endsWith(oldLetter)) {
    word = word.substring(0, word.length() - 1) + newLetter;
}
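The check above works on a single word. To apply it to every word of a sentence without a regex, one option is to split on spaces, test each word, and join the result again. This is only a sketch under the assumption that words are separated by single spaces; the names (LastLetterSwap, swapLastLetter) are hypothetical.

```java
class LastLetterSwap {
    // Replaces oldLetter with newLetter at the end of every space-separated word.
    static String swapLastLetter(String text, String oldLetter, String newLetter) {
        String[] words = text.split(" ");
        for (int i = 0; i < words.length; i++) {
            if (words[i].endsWith(oldLetter)) {
                int cut = words[i].length() - oldLetter.length();
                words[i] = words[i].substring(0, cut) + newLetter;
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(swapLastLetter("njan ayman", "n", "k")); // njak aymak
    }
}
```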

Java regex of string

I want to parse strings to get fields from them. The format of the strings (which come from a dataset) is as follows (-> represents a tab, and * represents a space):
Date(yyyymmdd)->Date(yyyymmdd)->*City,*State*-->Description
I am only interested in the 1st date and the State. I tried regex like this:
String txt="19951010 19951011 Red City, WI Description";
String re1="(\\d+)"; // Integer Number 1
String re2=".*?"; // Non-greedy match on filler
String re3="(?:[a-z][a-z]+)"; // Uninteresting: word
String re4=".*?"; // Non-greedy match on filler
String re5="(?:[a-z][a-z]+)"; // Uninteresting: word
String re6=".*?"; // Non-greedy match on filler
String re7="((?:[a-z][a-z]+))"; // Word 1
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6+re7,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
    String int1=m.group(1);
    String word1=m.group(2);
    System.out.print("("+int1.toString()+")"+"("+word1.toString()+")"+"\n");
}
It works fine if the city has two words (Red City): the State is extracted properly. But if the City has only one word, it does not work. I can't figure it out. I don't need to use regex and am open to any other suggestions. Thanks.

Problem:
Your problem is that each component of your current regex essentially matches a number or [a-z] word, separated by anything that isn't [a-z], which includes commas. So your parts for a two word city are:
Input:
19951010 19951011 Red City, WI Description
Your components:
String re1="(\\d+)"; // Integer Number 1
String re2=".*?"; // Non-greedy match on filler
String re3="(?:[a-z][a-z]+)"; // Uninteresting: word
String re4=".*?"; // Non-greedy match on filler
String re5="(?:[a-z][a-z]+)"; // Uninteresting: word
String re6=".*?"; // Non-greedy match on filler
String re7="((?:[a-z][a-z]+))"; // Word 1
What they match:
re1: "19951010"
re2: " 19951011 "
re3: "Red" (stops at non-letter, e.g. whitespace)
re4: " "
re5: "City" (stops at non-letter, e.g. the comma)
re6: ", " (stops at word character)
re7: "WI"
But with a one-word city:
Input:
19951010 19951011 Pittsburgh, PA Description
What they match:
re1: "19951010"
re2: " 19951011 "
re3: "Pittsburgh" (stops at non-letter, e.g. the comma)
re4: ", "
re5: "PA" (stops at non-letter, e.g. whitespace)
re6: " " (stops at word character)
re7: "Description" (but you want this to be the state)
Solution:
You should do two things. First, simplify your regex a bit; you are going kind of crazy specifying greedy vs. reluctant, etc. Just use greedy patterns. Second, think about the simplest way to express your rules.
Your rules really are:
Date
A bunch of characters that aren't a comma (including second date and city name).
A comma.
State (one word).
So build a regex that sticks to that. You can, as you are doing now, take a shortcut by skipping the second number, but note that you do lose support for cities that start with numbers (which probably won't happen). Also, you don't care about anything after the state. So, e.g.:
String re1 = "(\\d+)"; // match first number
String re2 = "[^,]*"; // skip everything that's not a comma
String re3 = ","; // skip the comma
String re4 = "[\\s]*"; // skip whitespace
String re5 = "([a-z]+)"; // match letters (state)
String regex = re1 + re2 + re3 + re4 + re5; // compile with Pattern.CASE_INSENSITIVE, as before, so [a-z] matches "WI"
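Put together, the components above could be used roughly as follows. This is a sketch, not the only way to wire it up: the class and method names (DateAndState, extract) are made up, and it assumes the pattern is compiled with Pattern.CASE_INSENSITIVE, as in the question, so that [a-z] also matches an upper-case state code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateAndState {
    // first number, anything up to the comma, the comma, optional whitespace, the state
    private static final Pattern LINE = Pattern.compile(
            "(\\d+)" + "[^,]*" + "," + "[\\s]*" + "([a-z]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns {date, state}, or null when the line does not fit the rules.
    static String[] extract(String txt) {
        Matcher m = LINE.matcher(txt);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] a = extract("19951010 19951011 Red City, WI Description");
        String[] b = extract("19951010 19951011 Pittsburgh, PA Description");
        System.out.println(a[0] + " " + a[1]);
        System.out.println(b[0] + " " + b[1]);
    }
}
```

Because everything up to the comma is skipped in one step, the number of words in the city name no longer matters.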
There are other options as well, but I personally find regular expressions to be very straightforward for things like this. You could use various combinations of split(), as other posters have detailed. You could directly look for commas and whitespace with indexOf() and pull out substrings. You could even convince a Scanner or perhaps a StringTokenizer or StreamTokenizer to work for you. However, regular expressions exist to solve problems like this and are a good tool for the job.
Here is an example with StringTokenizer:
StringTokenizer t = new StringTokenizer(txt, " \t");
String date = t.nextToken();
t.nextToken(); // skip second date
t.nextToken(","); // change delimiter to comma and skip city
t.nextToken(" \t"); // back to whitespace and skip comma
String state = t.nextToken();
Still, I feel a regex expresses the rules more cleanly.
By the way, for future debugging, it sometimes helps to just print out all of the capture groups; this can give you insight into what is matching what. A good technique is to put every component of your regex in a capture group temporarily, then print them all out.
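As an illustration of that debugging technique, the sketch below wraps each of the question's seven components in its own capture group and returns what each one matched. The names (GroupDebug, groups) are hypothetical; the flags are the ones used in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDebug {
    // The question's components re1..re7, each in a temporary capture group.
    private static final Pattern DEBUG = Pattern.compile(
            "(\\d+)(.*?)([a-z][a-z]+)(.*?)([a-z][a-z]+)(.*?)([a-z][a-z]+)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the text matched by each group, or an empty array if nothing matches.
    static String[] groups(String txt) {
        Matcher m = DEBUG.matcher(txt);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("19951010 19951011 Pittsburgh, PA Description");
        for (int i = 0; i < g.length; i++) {
            System.out.println("re" + (i + 1) + ": \"" + g[i] + "\"");
        }
    }
}
```

Printing the groups for the one-word city shows the last group landing on "Description" instead of the state, which is the symptom described in the question.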

No need to be so complex with this. You can split on the comma first, and then on whitespace!
//s is your string
String[] first = s.split("\\s*,\\s*");
String[] firstHalf = first[0].split("\\s+");
String[] secondHalf = first[1].split("\\s+");
String date = firstHalf[0];
String state = secondHalf[0];
Now you have your date and your state! Do with them what you want.
