Regex to get two different words from a string in Java

I will be getting a string such as app1(down) and app2(up).
The words in the brackets indicate the status of each app; they may be up or down.
I need a regex to get the statuses of the apps as a comma-separated string.
Example: I'll get app1(UP) and app2(DOWN)
Required result: UP,DOWN

It's easy using RegEx like this:
\\((.*?)\\)
String x = "app1(UP) and app2(DOWN)";
Matcher m = Pattern.compile("\\((.*?)\\)").matcher(x);
String tmp = "";
while(m.find()) {
tmp+=(m.group(1))+",";
}
System.out.println(tmp);
Output:
UP,DOWN,
Java 8: using StringJoiner
String x = "app1(UP) and app2(DOWN)";
Matcher m = Pattern.compile("\\((.*?)\\)").matcher(x);
StringJoiner sj = new StringJoiner(",");
while(m.find()) {
sj.add((m.group(1)));
}
System.out.print(sj.toString());
Output:
UP,DOWN
(The trailing comma is gone.)
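On Java 9 or later, the loop can also be folded into a stream. This is only a minimal sketch, assuming `Matcher.results()` is available (it was added in Java 9); the class name `StatusJoiner` and the method name `statuses` are made up for illustration:

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StatusJoiner {
    // Captures whatever sits between a pair of round brackets.
    private static final Pattern IN_BRACKETS = Pattern.compile("\\((.*?)\\)");

    // Joins every bracketed value with commas; returns "" when nothing matches.
    static String statuses(String input) {
        return IN_BRACKETS.matcher(input)
                .results()                 // Stream<MatchResult>, Java 9+
                .map(r -> r.group(1))      // keep only the text inside the brackets
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(statuses("app1(UP) and app2(DOWN)")); // prints UP,DOWN
    }
}
```

Like StringJoiner, Collectors.joining only puts the separator between elements, so there is no trailing comma to strip.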

import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;
public class ValidateDemo
{
public static void main(String[] args)
{
String input = "ill get app1(UP) and app2(DOWN)";
Pattern p = Pattern.compile("app[0-9]+\\(([A-Z]+)\\)");
Matcher m = p.matcher(input);
List<String> found = new ArrayList<String>();
while (m.find())
{
found.add(m.group(1));
}
System.out.println(found.toString());
}
}
My first Java program, have mercy.

Consider this code:
private static final Pattern RX_MATCH_APP_STATUS = Pattern.compile("\\s*(?<name>[^(\\s]+)\\((?<status>[^(\\s]+)\\)");
final String input = "app1(UP) or app2(down) let's have also app-3(DOWN)";
final Matcher m = RX_MATCH_APP_STATUS.matcher(input);
while (m.find()) {
final String name = m.group("name");
final String status = m.group("status");
System.out.printf("%s:%s\n", name, status);
}
This plucks as many app status entries from the input line as are actually there, and puts each app name and its status into its own variable. It's then up to you how to handle them (print them or whatever).
Plus: if states other than UP and DOWN come along (like UNKNOWN), this will still work.
Minus: it also matches bracketed text prefixed by some word that is not actually an app name, where the content of the brackets is not an app state.
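To get from the named groups to the comma-separated result the question asks for, the matches could be collected into a map first. This is a sketch under a few assumptions: the class name `AppStatusParser` is hypothetical, statuses are upper-cased so that "down" and "DOWN" compare equal, and the status class excludes ")" instead of "(", which is a small variation on the pattern above:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppStatusParser {
    // name = everything up to "(", status = everything up to ")".
    private static final Pattern APP_STATUS =
            Pattern.compile("(?<name>[^(\\s]+)\\((?<status>[^)\\s]+)\\)");

    // Returns app name -> upper-cased status, in order of appearance.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = APP_STATUS.matcher(input);
        while (m.find()) {
            result.put(m.group("name"), m.group("status").toUpperCase());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> statuses =
                parse("app1(UP) or app2(down) let's have also app-3(DOWN)");
        System.out.println(statuses);
        System.out.println(String.join(",", statuses.values())); // prints UP,DOWN,DOWN
    }
}
```

A LinkedHashMap keeps the apps in the order they appear in the input, which matters if the joined string has to line up with the original sentence.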

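Use this as the regex and test it on http://regexr.com/ (square brackets would turn each word into a character class that matches a single letter, so plain alternation is used):
UP|DOWN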

Related

Is there a regex where if first expression is valid then check for next [duplicate]

I have several strings in the rough form:
[some text] [some number] [some more text]
I want to extract the text in [some number] using the Java Regex classes.
I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls to take the regex string and use it on the source data to produce the value of [some number].
EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].
Full example:
private static final Pattern p = Pattern.compile("^([a-zA-Z]+)([0-9]+)(.*)");
public static void main(String[] args) {
// create matcher for pattern p and given string
Matcher m = p.matcher("Testing123Testing");
// if an occurrence of the pattern was found in the given string...
if (m.find()) {
// ...then you can use group() methods.
System.out.println(m.group(0)); // whole matched expression
System.out.println(m.group(1)); // first expression from round brackets (Testing)
System.out.println(m.group(2)); // second one (123)
System.out.println(m.group(3)); // third one (Testing)
}
}
Since you're looking for the first number, you can use such regexp:
^\D+(\d+).*
and m.group(1) will return you the first number. Note that signed numbers can contain a minus sign:
^\D+(-?\d+).*
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex1 {
public static void main(String[]args) {
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("hello1234goodboy789very2345");
while(m.find()) {
System.out.println(m.group());
}
}
}
Output:
1234
789
2345
Allain basically has the Java code, so you can use that. However, his expression only matches if your numbers are preceded only by a run of word characters.
"(\\d+)"
should be able to find the first string of digits. You don't need to specify what's before it, if you're sure that it's going to be the first string of digits. Likewise, there is no use to specify what's after it, unless you want that. If you just want the number, and are sure that it will be the first string of one or more digits then that's all you need.
If you expect it to be set off by spaces, specifying
"\\s+(\\d+)\\s+"
makes the match even more distinct and might be better.
If you need all three parts, this will do:
"(\\D+)(\\d+)(.*)"
EDIT The Expressions given by Allain and Jack suggest that you need to specify some subset of non-digits in order to capture digits. If you tell the regex engine you're looking for \d then it's going to ignore everything before the digits. If J or A's expression fits your pattern, then the whole match equals the input string. And there's no reason to specify it. It probably slows a clean match down, if it isn't totally ignored.
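The point about not describing what comes before the digits can be sketched as follows. The class name `FirstNumber` is hypothetical, and the sketch assumes the digit run fits into an int (Integer.parseInt throws NumberFormatException otherwise):

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumber {
    // Only the digits are described; find() skips everything before them.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits as an int, or empty if there is none.
    static OptionalInt first(String input) {
        Matcher m = DIGITS.matcher(input);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group()))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(first("Testing123Testing456")); // prints OptionalInt[123]
        System.out.println(first("no digits"));            // prints OptionalInt.empty
    }
}
```

Because find() stops at the first match, later numbers in the string are never looked at.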
In addition to Pattern, the Java String class also has several methods that can work with regular expressions, in your case the code will be:
"ab123abc".replaceFirst("\\D*(\\d*).*", "$1")
where \\D is a non-digit character.
In Java 1.4 and up:
String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
String someNumberStr = matcher.group(1);
// if you need this to be an int:
int someNumberInt = Integer.parseInt(someNumberStr);
}
This function collects all matching sequences from a string. In this example it takes all email addresses from the string.
static final String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
public List<String> getAllEmails(String message) {
List<String> result = null;
Matcher matcher = Pattern.compile(EMAIL_PATTERN).matcher(message);
if (matcher.find()) {
result = new ArrayList<String>();
result.add(matcher.group());
while (matcher.find()) {
result.add(matcher.group());
}
}
return result;
}
For message = "adf@gmail.com, <another@osiem.osiem>>>> lalala@aaa.pl" it will create a List of 3 elements.
Try doing something like this:
// \\D+ instead of a greedy .+ so that the group captures all digits, not just the last one
Pattern p = Pattern.compile("^\\D+(\\d+).*");
Matcher m = p.matcher("Testing123Testing");
if (m.find()) {
System.out.println(m.group(1));
}
Simple Solution
// Regexplanation:
// ^ beginning of line
// \\D+ 1+ non-digit characters
// (\\d+) 1+ digit characters in a capture group
// .* 0+ any character
String regexStr = "^\\D+(\\d+).*";
// Compile the regex String into a Pattern
Pattern p = Pattern.compile(regexStr);
// Create a matcher with the input String
Matcher m = p.matcher(inputStr);
// If we find a match
if (m.find()) {
// Get the String from the first capture group
String someDigits = m.group(1);
// ...do something with someDigits
}
Solution in a Util Class
public class MyUtil {
private static Pattern pattern = Pattern.compile("^\\D+(\\d+).*");
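// Note: a shared static Matcher is not thread-safe; create one per call if threads share this class
private static Matcher matcher = pattern.matcher("");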
// Assumptions: inputStr is a non-null String
public static String extractFirstNumber(String inputStr){
// Reset the matcher with a new input String
matcher.reset(inputStr);
// Check if there's a match
if(matcher.find()){
// Return the number (in the first capture group)
return matcher.group(1);
}else{
// Return some default value, if there is no match
return null;
}
}
}
...
// Use the util function and print out the result
String firstNum = MyUtil.extractFirstNumber("Testing4234Things");
System.out.println(firstNum);
Look you can do it using StringTokenizer
String str = "as:"+123+"as:"+234+"as:"+345;
StringTokenizer st = new StringTokenizer(str,"as:");
while(st.hasMoreTokens())
{
String k = st.nextToken(); // you will get first numeric data i.e 123
int kk = Integer.parseInt(k);
System.out.println("k string token in integer " + kk);
String k1 = st.nextToken(); // you will get second numeric data i.e 234
int kk1 = Integer.parseInt(k1);
System.out.println("new string k1 token in integer :" + kk1);
String k2 = st.nextToken(); // you will get third numeric data i.e 345
int kk2 = Integer.parseInt(k2);
System.out.println("k2 string token is in integer : " + kk2);
}
Since we are taking these numeric data into three different variables we can use this data anywhere in the code (for further use)
How about [^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).* I think it would take care of numbers with fractional part.
I included white spaces and included , as possible separator.
I'm trying to get the numbers out of a string including floats and taking into account that the user might make a mistake and include white spaces while typing the number.
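Turning the captured text into an actual number still needs a clean-up step. A minimal sketch, assuming a comma is meant as a decimal separator and all whitespace inside the capture can be dropped; the class name `LooseNumber` is made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LooseNumber {
    // The pattern from the answer above: digits, optional "." or ",", more digits,
    // with optional whitespace around the separator.
    private static final Pattern NUMBER =
            Pattern.compile("[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*");

    // Extracts the first number and parses it as a double.
    static double parse(String input) {
        Matcher m = NUMBER.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no number in: " + input);
        }
        // Drop the tolerated whitespace and normalise "," to ".".
        String cleaned = m.group(1).replaceAll("\\s+", "").replace(',', '.');
        return Double.parseDouble(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(parse("price: 12 , 5 EUR")); // prints 12.5
        System.out.println(parse("abc 42 def"));        // prints 42.0
    }
}
```

One caveat of the pattern: because the separator is optional, an input such as "12 5" is captured as a single number and parsed as 125.0.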
Sometimes you can use simple .split("REGEXP") method available in java.lang.String. For example:
String input = "first,second,third";
//To retrieve 'first'
input.split(",")[0]
//second
input.split(",")[1]
//third
input.split(",")[2]
If you are reading from a file, this may help:
try{
InputStream inputStream = (InputStream) mnpMainBean.getUploadedBulk().getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
String line;
//Ref:03
while ((line = br.readLine()) != null) {
if (line.matches("[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+")) {
String[] splitRecord = line.split(",");
//do something
}
else{
br.close();
//error
return;
}
}
br.close();
}
catch (IOException ioException){
logger.logDebug("Exception " + ioException.getMessage());
}
Pattern p = Pattern.compile("(\\D+)(\\d+)(.*)");
Matcher m = p.matcher("this is your number:1234 thank you");
if (m.find()) {
String someNumberStr = m.group(2);
int someNumberInt = Integer.parseInt(someNumberStr);
}

Splitting string by new line with a condition

I am trying to split a String by \n only when it's not in my "action block".
Here is an example of a text: message\n [testing](hover: actions!\nnew line!) more\nmessage. I want to split whenever the \n is not inside the [](this \n should be ignored) construct. I made a regex for it that you can see here: https://regex101.com/r/RpaQ2h/1/. In the example it seems to work correctly, so I followed up with an implementation in Java:
final List<String> lines = new ArrayList<>();
final Matcher matcher = NEW_LINE_ACTION.matcher(message);
String rest = message;
int start = 0;
while (matcher.find()) {
if (matcher.group("action") != null) continue;
final String before = message.substring(start, matcher.start());
if (!before.isEmpty()) lines.add(before.trim());
start = matcher.end();
rest = message.substring(start);
}
if (!rest.isEmpty()) lines.add(rest.trim());
return lines;
This should ignore any \n inside the pattern shown above; however, it never matches the "action" group. It seems that once the regex is moved to Java and a \n is present, the group never matches. I am a bit confused as to why, since it worked perfectly on regex101.
Instead of checking whether the group is action, you can simply use regex replacement with the group $1 (the first capture group).
I also changed your regex to (?<action>\[[^\]]*]\([^)]*\))|(?<break>\\n) as [^\]]* doesn't backtrack (.*? backtracks and causes more steps). I did the same with [^)]*.
Working code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
final String regex = "(?<action>\\[[^\\]]*\\]\\([^)]*\\))|(?<break>\\\\n)";
final String string = "message\\n [testing test](hover: actions!\\nnew line!) more\\nmessage";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll("$1");
System.out.println(result);
}
}
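The replacement above removes the breaks but does not produce a list. If a list of lines is what is needed, the same alternation idea can drive a split. This is a sketch, assuming the message contains real newline characters (not the two characters backslash and n); the class name `ActionSplitter` and the group name `nl` are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ActionSplitter {
    // Either a whole [text](action) block, or a newline outside such a block.
    private static final Pattern ACTION_OR_NEWLINE =
            Pattern.compile("\\[[^\\]]*\\]\\([^)]*\\)|(?<nl>\\n)");

    // Splits on newlines, except those inside an action block.
    static List<String> split(String message) {
        List<String> lines = new ArrayList<>();
        Matcher m = ACTION_OR_NEWLINE.matcher(message);
        int start = 0;
        while (m.find()) {
            if (m.group("nl") == null) {
                continue; // an action block was consumed; its newlines are skipped
            }
            String before = message.substring(start, m.start()).trim();
            if (!before.isEmpty()) {
                lines.add(before);
            }
            start = m.end();
        }
        String rest = message.substring(start).trim();
        if (!rest.isEmpty()) {
            lines.add(rest);
        }
        return lines;
    }

    public static void main(String[] args) {
        String message = "message\n [testing](hover: actions!\nnew line!) more\nmessage";
        System.out.println(split(message).size()); // prints 3
    }
}
```

The action alternative comes first, so a newline inside the round brackets is swallowed by that branch before the newline branch ever gets to see it.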

Regex for finding http and https url from a string

I have a string that contains multiple URLs starting with http and https. I need to fetch all those URLs and put them into a list.
I have tried the code below.
List<String> httpLinksList = new ArrayList<>();
String hyperlinkRegex = "((http://|https://)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(/([a-zA-Z-_/\\.0-9#:?=&;,]*)?)?)";
String synopsis = "This is http://stackoverflow.com/questions and https://test.com/method?param=wasd The code below catches all urls in text and returns urls in list";
Pattern pattern = Pattern.compile(hyperlinkRegex);
Matcher matcher = pattern.matcher(synopsis);
while(matcher.find()){
System.out.println(matcher.find()+" "+matcher.group(1)+" "+matcher.groupCount()+" "+matcher.group(2));
httpLinksList.add(matcher.group());
}
System.out.println(httpLinksList);
I need the result below:
[http://stackoverflow.com/questions,
https://test.com/method?param=wasd]
But I am getting this output:
[https://test.com/method?param=wasd]
This regex matches URLs with several schemes, including FTP and others:
String urlRegex = "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class xmlValue {
public static void main(String[] args) {
String text = "This is http://stackoverflow.com/questions and https://test.com/method?param=wasd The code below catches all urls in text and returns urls in list";
System.out.println(extractUrls(text));
}
public static List<String> extractUrls(String text)
{
List<String> containedUrls = new ArrayList<String>();
String urlRegex = "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
Pattern pattern = Pattern.compile(urlRegex, Pattern.CASE_INSENSITIVE);
Matcher urlMatcher = pattern.matcher(text);
while (urlMatcher.find())
{
containedUrls.add(text.substring(urlMatcher.start(0),
urlMatcher.end(0)));
}
return containedUrls;
}
}
Output:
[http://stackoverflow.com/questions,
https://test.com/method?param=wasd]
Credits: @BullyWiiPlaza
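Independently of which regex is used, the loop in the question most likely loses the first URL because matcher.find() is called a second time inside the println: every call to find() advances to the next match, so the group() that follows belongs to the second URL. A sketch of the loop with a single find() per iteration; the class name `UrlLoop` and the deliberately simple pattern are assumptions for illustration, not a full URL grammar:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlLoop {
    // Simplified pattern: scheme followed by any run of non-whitespace.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    static List<String> urls(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {          // the only find() call per iteration
            found.add(m.group());   // group() refers to the match just found
        }
        return found;
    }

    public static void main(String[] args) {
        String synopsis = "This is http://stackoverflow.com/questions and "
                + "https://test.com/method?param=wasd The code below catches all urls";
        System.out.println(urls(synopsis));
    }
}
```

If the match state needs to be printed for debugging, print m.group() or m.start() rather than calling find() again.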
So I know this is not exactly what you asked since you are specifically looking for regex, but I thought this would be fun to try out with an indexOf variant. I will leave it here as an alternative to the regex someone comes up with:
public static void main(String[] args){
String synopsis = "This is http://stackoverflow.com/questions and https://test.com/method?param=wasd The code below catches all urls in text and returns urls in list";
ArrayList<String> list = splitUrl(synopsis);
for (String s : list) {
System.out.println(s);
}
}
public static ArrayList<String> splitUrl(String s)
{
ArrayList<String> list = new ArrayList<>();
int spaceIndex = 0;
while (true) {
int httpIndex = s.indexOf("http", spaceIndex);
if (httpIndex < 0) {
break;
}
spaceIndex = s.indexOf(" ", httpIndex);
if (spaceIndex < 0) {
list.add(s.substring(httpIndex));
break;
}
else {
list.add(s.substring(httpIndex, spaceIndex));
}
}
return list;
}
All the logic is contained in the splitUrl(String s) method, it takes in a String as a parameter and outputs the ArrayList<String> of all the split urls.
It first searches for the index of any http and then the first space that occurs after the url and substrings the difference. It then uses the space it found as the second parameter in indexOf(String, int) to start searching the String beginning after the http that was already found so it does not repeat the same ones.
Additionally a case had to be made when the http is the final part of the String as there is no space afterward. This is done when the indexOf the space returns negative, I use substring(int) instead of substring(int, int) which will take the current location and substring the rest of the String.
The loop ends when either indexOf returns with a negative, though if the space returns negative it does that final substring operation before the break.
Output:
http://stackoverflow.com/questions
https://test.com/method?param=wasd
Note: As someone mentioned in the comments too, this implementation will work with non-Latin characters such as Hiragana too, which could be an advantage over regex.

Replace pattern Java

I am making a program that allows the user to set variables and then use them in their messages such as %variable1% and I need a way of detecting the pattern which indicates a variable (%STRING%) . I am aware that I can use regex to find the patterns but am unsure how to use it to replace text.
I can also see a problem arising when using multiple variables in a single string as it may detect the space between 2 variables as a third variable
e.g. %var1%<-text that may be detected as a variable->%var2%, would this happen and is there any way to stop it?
Thanks.
A non-greedy regex would be helpful in extracting the variables which are within the 2 distinct % signs:
Pattern regex = Pattern.compile("\\%.*?\\%");
In this case, if your String is %variable1%mndhokajg%variable2%, it should print
%variable1%
%variable2%
If your String is %variable1%variable2% it should print
%variable1%
%variable1%%variable2% should print
%variable1%
%variable2%
You can now manipulate/use the extracted variables for your purpose:
Code:
public static void main(String[] args) {
try {
String tag = "%variable1%%variable2%";
Pattern regex = Pattern.compile("\\%.*?\\%");
Matcher regexMatcher = regex.matcher(tag);
while (regexMatcher.find()) {
System.out.println(regexMatcher.group());
}
} catch (Exception e) {
e.printStackTrace();
}
}
Try playing around with different Strings, there can be invalid scenarios with % as part of the String but your requirement doesn't seem to be that stringent.
Oracle's tutorial on the Pattern and Matcher classes should get you started. Here is an example from the tutorial that you may be interested in:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class ReplaceDemo {
private static String REGEX = "dog";
private static String INPUT =
"The dog says meow. All dogs say meow.";
private static String REPLACE = "cat";
public static void main(String[] args) {
Pattern p = Pattern.compile(REGEX);
// get a matcher object
Matcher m = p.matcher(INPUT);
INPUT = m.replaceAll(REPLACE);
System.out.println(INPUT);
}
}
Your second problem shouldn't happen if you use regex properly.
You can use this method for variable detection and their replacements from a passed HashMap:
// regex to detect variables
private final Pattern varRE = Pattern.compile("%([^%]+)%");
public String varReplace(String input, Map<String, String> dictionary) {
Matcher matcher = varRE.matcher( input );
// StringBuffer to hold replaced input
StringBuffer buf = new StringBuffer();
while (matcher.find()) {
// get variable's value from dictionary
String value = dictionary.get(matcher.group(1));
// if found replace the variable's value in input string
// (quoteReplacement keeps any $ or \ in the value literal)
if (value != null)
matcher.appendReplacement(buf, Matcher.quoteReplacement(value));
}
matcher.appendTail(buf);
return buf.toString();
}
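On Java 9 or later, the appendReplacement/appendTail loop can be replaced by the overload of Matcher.replaceAll that takes a function. A minimal sketch under that assumption; the class name `VarExpander` and the choice to leave unknown variables untouched are illustrative:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VarExpander {
    // %name% where name contains no further % signs.
    private static final Pattern VAR = Pattern.compile("%([^%]+)%");

    // Replaces every known %variable% with its value from the dictionary.
    static String expand(String input, Map<String, String> dictionary) {
        return VAR.matcher(input).replaceAll(match -> {
            String value = dictionary.get(match.group(1));
            // Unknown variables stay as they are; quoteReplacement keeps
            // '$' and '\' in the replacement from being interpreted.
            return Matcher.quoteReplacement(value != null ? value : match.group());
        });
    }

    public static void main(String[] args) {
        Map<String, String> vars = Map.of("name", "Bob");
        System.out.println(expand("Hi %name%, %other%!", vars)); // prints Hi Bob, %other%!
    }
}
```

Since [^%]+ cannot cross a percent sign, the text between two variables is never mistaken for a third one.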

replace StringTokenizer by String.split(..)

Is it possible to build a regexp for use with Java's Pattern.split(..) method to reproduce the StringTokenizer("...", "...", true) behaviour?
So that the input is split into an alternating sequence of the predefined token characters and any arbitrary strings running between them.
The JRE reference states that StringTokenizer should be considered deprecated and that String.split(..) could be used instead. So it is considered possible there.
The reason I want to use split is that regular expressions are often highly optimized. The StringTokenizer, for example, is quite slow on the Android platform's VM, while regex patterns seem to be executed by optimized native code there.
Considering that the documentation for split doesn't specify this behavior and has only one optional parameter, which limits how large the array can be: no, you can't.
Also, the only other class I can think of that could have this feature, a Scanner, doesn't have it either. So I think the easiest would be to continue using the tokenizer, even if it's deprecated. That is better than writing your own class; while that shouldn't be too hard (quite trivial really), I can think of better ways to spend one's time.
A regex Pattern can help you:
Pattern p = Pattern.compile("(.*?)(\\s+)");
//put the boundary regex in between the second brackets (where the \\s+ now is); it must not match the empty string, or the loop below never advances
Matcher m = p.matcher(string);
int endindex=0;
while(m.find(endindex)){
//m.group(1) is the part between the pattern
//m.group(2) is the match found of the pattern
endindex = m.end();
}
//then the remainder of the string is string.substring(endindex);
import java.util.List;
import java.util.LinkedList;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Splitter {
public Splitter(String s, String delimiters) {
this.string = s;
this.delimiters = delimiters;
Pattern pattern = Pattern.compile(delimiters);
this.matcher = pattern.matcher(string);
}
public String[] split() {
String[] strs = string.split(delimiters);
String[] delims = delimiters();
if (strs.length == 0) { return new String[0];}
assert(strs.length == delims.length + 1);
List<String> output = new LinkedList<String>();
int i;
for(i = 0;i < delims.length;i++) {
output.add(strs[i]);
output.add(delims[i]);
}
output.add(strs[i]);
return output.toArray(new String[0]);
}
private String[] delimiters() {
List<String> delims = new LinkedList<String>();
while(matcher.find()) {
delims.add(string.subSequence(matcher.start(), matcher.end()).toString());
}
return delims.toArray(new String[0]);
}
public static void main(String[] args) {
Splitter s = new Splitter("a b\tc", "[ \t]");
String[] tokensanddelims = s.split();
assert(tokensanddelims.length == 5);
System.out.print(tokensanddelims[0].equals("a"));
System.out.print(tokensanddelims[1].equals(" "));
System.out.print(tokensanddelims[2].equals("b"));
System.out.print(tokensanddelims[3].equals("\t"));
System.out.print(tokensanddelims[4].equals("c"));
}
private Matcher matcher;
private String string;
private String delimiters;
}
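A shorter route to the same alternating output is to split on zero-width boundaries before and after each delimiter. This is a sketch, assuming each delimiter is a single character described by a character class and Java 8 or later (where a zero-width match at the start does not produce a leading empty string); the class and method names are hypothetical:

```java
import java.util.Arrays;

class DelimSplit {
    // Splits before and after every delimiter character, keeping the delimiters.
    // delimiterClass is a regex character class such as "[ \t]".
    static String[] splitKeepingDelimiters(String input, String delimiterClass) {
        String boundary = "(?<=" + delimiterClass + ")|(?=" + delimiterClass + ")";
        return input.split(boundary);
    }

    public static void main(String[] args) {
        String[] tokens = splitKeepingDelimiters("a b\tc", "[ \t]");
        System.out.println(tokens.length);           // prints 5
        System.out.println(Arrays.toString(tokens));
    }
}
```

As with StringTokenizer and returnDelims set to true, two adjacent delimiters come back as two separate one-character tokens.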
