Regex Pattern to avoid : and , in the strings - java

I have a string which comes from the DB.
The string looks something like this:
ABC:def,ghi:jkl,hfh:fhgh,ahf:jasg
In short it is String:String, repeated many times.
I need to parse this string to get only the words, without the : and , separators, and store each word in an ArrayList.
I can do it using the split function (twice), but I figured that with a regex I could do it in one go and get the ArrayList directly.
String strLine="category:hello,good:bye,wel:come";
Pattern titlePattern = Pattern.compile("[a-z]");
Matcher titleMatcher = titlePattern.matcher(strLine);
int i=0;
while(titleMatcher.find())
{
i=titleMatcher.start();
System.out.println(strLine.charAt(i));
}
However, it is not giving me proper results. It ends up giving me the index of each single-character match, and then I need to append the characters together, which is neither logical nor efficient.
Is there any way around this?

String strLine="category:hello,good:bye,wel:come";
String a[] = strLine.split("[,:]");
for(String s :a)
System.out.println(s);
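The question asks for the words in an ArrayList. Here is a minimal sketch that wraps the split result in one. The class and method names are illustrative only. `Arrays.asList` on its own returns a fixed-size list, so the sketch copies the result into a real `ArrayList`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordSplitter {
    // Splits "key:value,key:value" on either ':' or ',' and copies the pieces
    // into a mutable ArrayList (Arrays.asList alone is fixed-size).
    static List<String> words(String line) {
        return new ArrayList<>(Arrays.asList(line.split("[,:]")));
    }

    public static void main(String[] args) {
        // prints [category, hello, good, bye, wel, come]
        System.out.println(words("category:hello,good:bye,wel:come"));
    }
}
```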

Use java.util.StringTokenizer.
Sample:
StringTokenizer st = new StringTokenizer(strLine, ":,");
while(st.hasMoreTokens())
System.out.println(st.nextToken());
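StringTokenizer and split behave differently when two separators are adjacent. The tokenizer treats a run of delimiters as a single separator, so empty fields disappear. `split` keeps the empty fields between delimiters and drops only trailing empty strings. The `StringTokenizer` Javadoc also describes it as a legacy class whose use is discouraged in new code, and recommends `String.split` or `java.util.regex` instead. Here is a small sketch comparing the two on a hypothetical input with an empty field:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVsSplit {
    // StringTokenizer: consecutive delimiters count as one, so empty fields vanish.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, ":,");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // String.split: empty fields between delimiters are kept as "".
    static List<String> split(String s) {
        return Arrays.asList(s.split("[,:]"));
    }

    public static void main(String[] args) {
        String line = "a::b,c";
        System.out.println(tokenize(line)); // [a, b, c]
        System.out.println(split(line));    // [a, , b, c]
    }
}
```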

Even if you can use a regular expression to parse the entire string at once, I think it would be less readable than splitting it with multiple steps.
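For comparison, here is a sketch of the single-pass regex approach the question was after. The question's pattern `[a-z]` matches one lowercase letter at a time, and it would also miss `ABC`. Matching runs of anything except the two separators and reading `group()` returns whole words instead of indexes. The class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinder {
    // One or more characters that are neither ':' nor ','.
    // Compiled once and reused, since compilation is the costly step.
    private static final Pattern WORD = Pattern.compile("[^:,]+");

    static List<String> words(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            out.add(m.group()); // the matched text itself, not just its start index
        }
        return out;
    }
}
```

Whether this reads better than `split("[,:]")` is a matter of taste. Both make a single pass over the string.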

Related

How to split a string in between square brackets like below

I want to split the string
[{"starDate":"","endDate":"","relativeDays":,"cronExpression":""},{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}]
to
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}
Maybe this one-liner works for you. It's not clear whether you want two strings or one string with explicit line breaks, so this produces one string with a line break between the objects:
String str = "[{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"},{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"}]";
str = String.join("}\n", str.replaceAll("^\\[|\\]$", "").split("\\},")); // drop the outer [ ], then put each object on its own line
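If you want two separate strings instead, you can strip the brackets and split only on a comma that sits between `}` and `{`. Lookbehind and lookahead keep both braces in the pieces. This sketch assumes the input always starts with `[` and ends with `]`, and that no `},{` sequence appears inside a quoted value. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;

class ObjectSplitter {
    // Removes the outer '[' and ']' (assumed present), then splits on a comma
    // preceded by '}' and followed by '{'. The lookarounds are zero-width,
    // so the braces stay in the resulting strings.
    static List<String> objects(String json) {
        String inner = json.substring(1, json.length() - 1);
        return Arrays.asList(inner.split("(?<=\\}),(?=\\{)"));
    }

    public static void main(String[] args) {
        String str = "[{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"},{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"}]";
        objects(str).forEach(System.out::println);
    }
}
```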
You should ideally do this the proper way using JSON parsing if it applies.
Meanwhile, for this very specific case, you can use a regular expression to extract the parts you need (example in Java):
String input = "[{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"},{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"}]";
Matcher matcher = Pattern.compile("(\\{[^\\}]*\\})").matcher(input);
while( matcher.find() )
System.out.println(matcher.group(1));
This results in:
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}

Split a string of multiple sentences into single sentences and surround them with html tags

I am a Java beginner and am currently looking for a way to split a String message into substrings based on a delimiter (.). Ideally I would then have single sentences, and I want to wrap each sentence in HTML tags, i.e. <p></p>.
I tried the following with BreakIterator class:
BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.ENGLISH);
List<String> sentences = new ArrayList<String>();
iterator.setText(message);
int start = iterator.first();
String newMessage= "";
for (int end = iterator.next();
end != BreakIterator.DONE;
start = end, end = iterator.next()) {
newMessage= "<p>"+ message.substring(start,end) + "</p>";
sentences.add(newMessage);
}
This only gives me back one sentence. I'm stuck here. I also want to wrap each number in each sentence in a tag.
The String I have contains something like:
String message = "Hello, John. My phone number is: 02365897458. Please call me tomorrow morning, at 8 am.";
The output should be:
String newMessage = "<p>Hello, John.</p><p>My phone number is: <number>02365897458</number>.</p><p>Please call me tomorrow morning, at 8 am.</p>";
Is there a possibility to achieve this?
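One way to keep the question's BreakIterator approach is to append every wrapped sentence to one StringBuilder instead of reassigning `newMessage` on each pass (reassigning leaves only the latest sentence in that variable), and to tag digit runs with `replaceAll`. The sketch below does this. The class name is illustrative. BreakIterator's English sentence rules decide where sentences end, so abbreviations such as "Mr." may cause extra breaks. Every digit run is tagged, including the "8" in "8 am".

```java
import java.text.BreakIterator;
import java.util.Locale;

class BreakIteratorWrapper {
    static String toHtml(String message) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        iterator.setText(message);
        StringBuilder html = new StringBuilder();
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            // A sentence range includes its trailing spaces; trim them off.
            String sentence = message.substring(start, end).trim();
            if (sentence.isEmpty()) {
                continue;
            }
            // $0 is the whole match, i.e. the run of digits.
            html.append("<p>")
                .append(sentence.replaceAll("\\d+", "<number>$0</number>"))
                .append("</p>");
        }
        return html.toString();
    }
}
```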
Try the split method on Java String. You can split on "\\." (the dot must be escaped, because split takes a regular expression) and it will return an array of Strings.
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#split-java.lang.String-
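Escaping matters here because an unescaped "." matches any character. Every character then becomes a delimiter, all the pieces are empty, and `split` removes trailing empty strings, so the array ends up empty. Here is a small sketch on a made-up input. `Pattern.quote` is an alternative to writing the escape by hand.

```java
import java.util.regex.Pattern;

class DotSplit {
    // "." is a regex wildcard: every character is a delimiter -> empty array.
    static String[] unescaped(String s) {
        return s.split(".");
    }

    // "\\." matches a literal dot. The dots themselves are removed.
    static String[] escaped(String s) {
        return s.split("\\.");
    }

    // Pattern.quote wraps the text so it is matched literally.
    static String[] quoted(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String text = "One. Two. Three.";
        System.out.println(unescaped(text).length); // 0
        System.out.println(escaped(text).length);   // 3
    }
}
```

Note that the pieces keep their leading spaces (" Two", " Three"), so you may want to `trim()` them.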
This can easily be done using the StringTokenizer class along with the StringBuilder class (note that the tokenizer drops the "." delimiters):
static String wrapSentences(String message) {
StringBuilder builder = new StringBuilder();
StringTokenizer tokenizer = new StringTokenizer(message, ".");
while(tokenizer.hasMoreTokens()) {
builder.append("<p>");
builder.append(tokenizer.nextToken()); // the "." itself is not part of the token
builder.append("</p>");
}
return builder.toString();
}
You can add more delimiters as required for various tags.
You could surround the sentences by adding a <p> at the start and a </p> at the end, and replacing each full stop with .</p><p>. Take a look at the replace method for strings.
And to add the number tag, you could use a regex replace. The replaceAll method and a regex like [0-9]+, depending on what your numbers look like, can do that.
Something similar to this should work (untested):
newMessage = "<p>" + message.replace(".", ".</p><p>")
.replaceAll("([0-9]+)", "<number>$1</number>") +
"</p>"; // a message ending in "." leaves an empty <p></p> at the end
As said above, you can use the split method. Because you're splitting on dots, be sure to escape the dot in your regex. A simple example (there are other ways to keep the delimiter, but I've done it like this for simplicity since you're a beginner):
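String toSplit = "Hello, John. My phone number is: 02365897458. Please call me tomorrow morning, at 8 am.";
String[] tokens = toSplit.split("\\.");
for (int i = 0; i < tokens.length; i++) {
// assigning to a for-each loop variable would not change the array, so index into it
tokens[i] = "<p>" + tokens[i].trim() + ".</p>";
}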
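Splitting on "\\." removes the dots, which then have to be added back. A split with a lookbehind keeps each dot attached to its sentence instead. Combined with the `replaceAll` number tagging suggested in another answer, this produces the whole message in one method. The sketch assumes every sentence ends with "." followed by whitespace, and the class name is illustrative. It tags every digit run, including the "8" in "8 am". The expected output in the question leaves that one untagged, so a stricter pattern such as `\\d{5,}` might be closer if only long numbers should be wrapped (an assumption about the requirement).

```java
class SentenceWrapper {
    // Splits at whitespace that follows a '.', so the dot stays with its sentence,
    // tags each run of digits, and wraps each sentence in <p>...</p>.
    static String toHtml(String message) {
        StringBuilder sb = new StringBuilder();
        for (String sentence : message.trim().split("(?<=\\.)\\s+")) {
            String tagged = sentence.replaceAll("\\d+", "<number>$0</number>");
            sb.append("<p>").append(tagged).append("</p>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("Hello, John. My phone number is: 02365897458. Please call me tomorrow morning, at 8 am."));
    }
}
```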

Split a string in Java, but keep the delimiters inside each new string

I can't seem to find an answer for this one. What I want to do is to split a string in Java, but I want to keep the delimiters inside each string. For example, if I had the following string:
word1{word2}[word3](word4)"word5"'word6'
The array of new strings would have to be something like this:
["word1", "{word2}", "[word3]", "(word4)", "\"word5\"", "\'word6\'"]
How can I achieve this with a regex or in some other way? I'm still learning regex in Java, so I tried some things, as discussed here for example: How to split a string, but also keep the delimiters?
But I'm not getting the results I expect.
I have this delimiter:
static public final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
And then this method:
private String[] splitLine() { return tokenFactor.split(String.format(WITH_DELIMITER, "\\(|\\)|\\[|\\]|\\{|\\}|\"|\'")); }
But that code splits the delimiters out as individual strings, which is not what I want.
Can anyone please help me? Thanks!
A solution using Pattern and a regex:
It matches every word on its own, or a word with one delimiter character immediately before and after it.
String str = "word1{word2}[word3](word4)\"word5\"'word6'";
Matcher m = Pattern.compile("(([{\\[(\"']\\w+[}\\])\"'])|(\\w+))").matcher(str);
List<String> matches = new ArrayList<>();
while (m.find())
matches.add(m.group());
String[] matchesArray = matches.toArray(new String[0]);
System.out.println(Arrays.toString(matchesArray));
I've shown how to get the result as an array, but you can stop at the list.
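The `\\w+` in the answer above only allows a single word between delimiters. If the content may contain spaces or punctuation, you can use one alternative per delimiter pair: each alternative matches the opening character, everything up to the matching closer, and the closer itself. The last alternative picks up plain text between the groups. This also avoids accepting mismatched pairs such as `{word2]`. It is a sketch assuming delimiters are never nested, and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimitedTokens {
    // {..} | [..] | (..) | ".." | '..' | run of text containing no opening delimiter
    private static final Pattern TOKEN = Pattern.compile(
            "\\{[^}]*\\}|\\[[^\\]]*\\]|\\([^)]*\\)|\"[^\"]*\"|'[^']*'|[^{\\[(\"']+");

    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("word1{word2}[word3](word4)\"word5\"'word6'"));
    }
}
```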

Split a string based on pattern and merge it back

I need to split a string based on a pattern and then merge part of it back together.
For example, below are the actual and expected strings:
String actualstr="abc.def.ghi.jkl.mno";
String expectedstr="abc.mno";
With the code below, I can store the parts in an array and iterate over it to rebuild the string. Is there any way to do this more simply and efficiently?
String[] splited = actualstr.split("[\\.\\.\\.\\.\\.\\s]+");
Though I can access the parts by index, is there any other way to do this easily? Please advise.
You do not understand how regexes work.
Here is your regex without the escapes: [\.\.\.\.\.\s]+
You have a character class ([]), which means there is no reason to have more than one . in it. You also don't need to escape . inside a character class.
Here is an equivalent regex to your regex: [.\s]+. As a Java String that's: "[.\\s]+".
You can do .split("regex") on your string to get an array. It's very simple to get a solution from that point.
I would use replaceAll in this case:
String actualstr="abc.def.ghi.jkl.mno";
String str = actualstr.replaceAll("\\..*\\.", ".");
This replaces everything from the first . through the last . with a single .
You could also use split:
String[] parts = actualstr.split("\\.");
String str = parts[0] + "." + parts[parts.length - 1]; // first and last word
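For this particular merge you can also skip regex entirely: keep the text before the first dot and the text from the last dot onward. This sketch swaps in plain `indexOf`/`lastIndexOf`, which avoids both pattern compilation and the intermediate array. The method name is illustrative.

```java
class FirstLast {
    // "abc.def.ghi.jkl.mno" -> "abc" + ".mno"
    static String firstAndLast(String s) {
        int first = s.indexOf('.');
        if (first < 0) {
            return s; // no dot at all: nothing to merge
        }
        int last = s.lastIndexOf('.');
        return s.substring(0, first) + s.substring(last);
    }

    public static void main(String[] args) {
        System.out.println(firstAndLast("abc.def.ghi.jkl.mno")); // abc.mno
    }
}
```

With a single dot, `first == last` and the input comes back unchanged.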
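A more general helper that splits on a delimiter regex and joins the requested parts back together:
public static String merge(String string, String delimiter, int... partnumbers)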
{
String[] parts = string.split(delimiter);
String result = "";
for ( int x = 0 ; x < partnumbers.length ; x ++ )
{
result += result.length() > 0 ? delimiter.replaceAll("\\\\","") : ""; // strip regex escapes, e.g. "\\." becomes "."
result += parts[partnumbers[x]];
}
return result;
}
and then use it like:
merge("abc.def.ghi.jkl.mno", "\\.", 0, 4);
I would do it this way:
Pattern pattern = Pattern.compile("(\\w*\\.).*\\.(\\w*)");
Matcher matcher = pattern.matcher("abc.def.ghi.jkl.mno");
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
If you can cache the result of
Pattern.compile("(\\w*\\.).*\\.(\\w*)")
and reuse the pattern over and over, this code will be very efficient, because pattern compilation is the most expensive step. The java.lang.String.split() method that other answers suggest has a fast path for a one-character pattern that is not a regex metacharacter, or a backslash-escaped non-alphanumeric character such as "\\.". For any other pattern it calls Pattern.compile() internally on every invocation. See java.util.regex - importance of Pattern.compile()?. So it is much better to compile the Pattern once, cache it, and reuse it.
matcher.group(1) refers to the first group of (), which is "(\w*\.)".
matcher.group(2) refers to the second one, which is "(\w*)".
Although we don't use it here, note that group(0) is the match for the whole regex.
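Caching the compiled pattern typically means a `static final` field. The `Pattern` Javadoc states that Pattern instances are immutable and safe for concurrent use, while Matcher instances are not. So the sketch below shares the Pattern and creates a fresh Matcher per call. The class and method names are illustrative. Falling back to the input when it does not match is an assumption about what you want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CachedPattern {
    // Compiled once when the class is initialised, then shared by every call.
    private static final Pattern FIRST_LAST = Pattern.compile("(\\w*\\.).*\\.(\\w*)");

    static String merge(String s) {
        Matcher m = FIRST_LAST.matcher(s); // Matcher is per call: not thread-safe
        // If the input doesn't match (e.g. fewer than two dots), return it as-is.
        return m.matches() ? m.group(1) + m.group(2) : s;
    }

    public static void main(String[] args) {
        System.out.println(merge("abc.def.ghi.jkl.mno")); // abc.mno
    }
}
```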

regular expression to split the string in java

I want to split a string such as [AO_12345678, Real Estate] into AO_12345678 and Real Estate.
How can I do this in Java using a regex?
The main issue I'm facing is getting rid of the "[" and "]".
Please help.
Does it really have to be regex?
if not:
String s = "[AO_12345678, Real Estate]";
String[] split = s.substring(1, s.length()-1).split(", ");
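If you do want a regex for the brackets, one `replaceAll` can remove a leading `[` and a trailing `]` (each only if present). A split on a comma plus any following whitespace then gives the two parts. This is a sketch, and the class and method names are illustrative.

```java
class BracketPair {
    // ^\[ removes a leading '[', \]$ removes a trailing ']'; the split then
    // consumes the comma and any spaces after it.
    static String[] parts(String s) {
        return s.replaceAll("^\\[|\\]$", "").split(",\\s*");
    }

    public static void main(String[] args) {
        String[] p = parts("[AO_12345678, Real Estate]");
        System.out.println(p[0]); // AO_12345678
        System.out.println(p[1]); // Real Estate
    }
}
```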
I'd go the pragmatic way:
String org = "[AO_12345678, Real Estate]";
String plain = org;
if (plain.startsWith("[")) {
plain = plain.substring(1);
}
if (plain.endsWith("]")) {
plain = plain.substring(0, plain.length() - 1);
}
String[] result = plain.split(",\\s*"); // ["AO_12345678", "Real Estate"]
If the string is always surrounded with '[]' you can just substring it without checking.
One easy way, assuming the format of all your inputs is consistent, is to ignore regex altogether and just split it. Something like the following would work:
String[] parts = input.split(","); // parts is ["[AO_12345678", "Real Estate]"]
String firstWithoutBrace = parts[0].substring(1);
String secondWithoutBrace = parts[1].substring(0, parts[1].length() - 1);
String first = firstWithoutBrace.trim();
String second = secondWithoutBrace.trim();
Of course you can tailor this as you wish - you might want to check whether the braces are present before removing them, for example. Or you might want to keep any spaces before the comma as part of the first string. This should give you a basis to modify to your specific requirements however.
And in a simple case like this I'd much prefer code like the above to a regex that extracted the two strings - I consider the former much clearer!
You can also use StringTokenizer. Here is the code:
String str = "[AO_12345678, Real Estate]";
StringTokenizer st = new StringTokenizer(str, "[],", false);
String s1 = st.nextToken().trim();
String s2 = st.nextToken().trim();
// s1 = "AO_12345678"
// s2 = "Real Estate" (without trim() it would be " Real Estate")
Refer to the Javadoc for more about StringTokenizer:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
Another option using regular expressions (RE) capturing groups:
private static void extract(String text) {
Pattern pattern = Pattern.compile("\\[(.*),\\s*(.*)\\]");
Matcher matcher = pattern.matcher(text);
if (matcher.find()) { // or .matches for matching the whole text
String id = matcher.group(1);
String name = matcher.group(2);
// do something with id and name
System.out.printf("ID: %s%nName: %s%n", id, name);
}
}
If speed or memory is a concern, you can optimize the RE by replacing the greedy .* with negated character classes and possessive quantifiers:
"\\[([^,]*+),\\s*+([^\\]]*+)\\]"
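Here is a runnable sketch of the optimized pattern. The negated classes stop at `,` and `]`. Possessive quantifiers (`*+`) never give characters back, so a failed match fails without retrying shorter matches. The method returns null when the text does not have the expected shape. That choice, and the names, are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdNameExtractor {
    private static final Pattern ID_NAME =
            Pattern.compile("\\[([^,]*+),\\s*+([^\\]]*+)\\]");

    // Returns {id, name}, or null if the whole text is not "[id, name]".
    static String[] extract(String text) {
        Matcher m = ID_NAME.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] r = extract("[AO_12345678, Real Estate]");
        System.out.printf("ID: %s%nName: %s%n", r[0], r[1]);
    }
}
```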
