I am making a program that allows the user to set variables and then use them in their messages, such as %variable1%, and I need a way of detecting the pattern that indicates a variable (%STRING%). I am aware that I can use regex to find the patterns, but I am unsure how to use it to replace text.
I can also see a problem arising when using multiple variables in a single string, as it may detect the text between two variables as a third variable,
e.g. %var1%<-text that may be detected as a variable->%var2%. Would this happen, and is there any way to stop it?
Thanks.
A non-greedy regex would be helpful in extracting the variables which are within the 2 distinct % signs:
Pattern regex = Pattern.compile("\\%.*?\\%");
In this case, if your String is %variable1%mndhokajg%variable2% it should print
%variable1%
%variable2%
If your String is %variable1%variable2% it should print
%variable1%
%variable1%%variable2% should print
%variable1%
%variable2%
You can now manipulate/use the extracted variables for your purpose:
Code:
public static void main(String[] args) {
try {
String tag = "%variable1%%variable2%";
Pattern regex = Pattern.compile("\\%.*?\\%");
Matcher regexMatcher = regex.matcher(tag);
while (regexMatcher.find()) {
System.out.println(regexMatcher.group());
}
} catch (Exception e) {
e.printStackTrace();
}
}
Try playing around with different Strings. There can be invalid scenarios where % appears as an ordinary character in the String, but your requirement doesn't seem to be that stringent.
Oracle's tutorial on the Pattern and Matcher classes should get you started. Here is an example from the tutorial that you may be interested in:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class ReplaceDemo {
private static String REGEX = "dog";
private static String INPUT =
"The dog says meow. All dogs say meow.";
private static String REPLACE = "cat";
public static void main(String[] args) {
Pattern p = Pattern.compile(REGEX);
// get a matcher object
Matcher m = p.matcher(INPUT);
INPUT = m.replaceAll(REPLACE);
System.out.println(INPUT);
}
}
Your second problem shouldn't happen if you use regex properly.
You can use this method for variable detection and their replacements from a passed HashMap:
// regex to detect variables
private final Pattern varRE = Pattern.compile("%([^%]+)%");
public String varReplace(String input, Map<String, String> dictionary) {
Matcher matcher = varRE.matcher( input );
// StringBuffer to hold replaced input
StringBuffer buf = new StringBuffer();
while (matcher.find()) {
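// get variable's value from dictionary (group(1) is the name between the % signs)
String value = dictionary.get(matcher.group(1));
// if found, replace the variable with its value;
// quoteReplacement makes any '$' or '\' in the value be taken literally
if (value != null)
matcher.appendReplacement(buf, Matcher.quoteReplacement(value));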
}
matcher.appendTail(buf);
return buf.toString();
}
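As a rough end-to-end sketch of the method above, the following shows that the text between two variables is not picked up as a third variable. The class name `VarReplaceDemo`, the method name `expand`, and the sample variable names are made up for illustration. Unknown variables are written back unchanged, which has the same effect as skipping them in the method above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VarReplaceDemo {
    // A variable is one or more non-% characters between two % signs
    private static final Pattern VAR = Pattern.compile("%([^%]+)%");

    // Hypothetical helper: replaces every known %name% with its value
    public static String expand(String input, Map<String, String> values) {
        Matcher m = VAR.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Unknown names are put back exactly as they were matched
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("var1", "Hello");
        values.put("var2", "World");
        // prints: Hello<-plain text->World
        System.out.println(expand("%var1%<-plain text->%var2%", values));
    }
}
```

Because each match consumes both of its % signs, the search for the next variable starts after the closing % of the previous one, so the text in the middle is never a candidate.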
Related
I have several strings in the rough form:
[some text] [some number] [some more text]
I want to extract the text in [some number] using the Java Regex classes.
I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls to take the regex string and use it on the source data to produce the value of [some number].
EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].
Full example:
private static final Pattern p = Pattern.compile("^([a-zA-Z]+)([0-9]+)(.*)");
public static void main(String[] args) {
// create matcher for pattern p and given string
Matcher m = p.matcher("Testing123Testing");
// if an occurrence of the pattern was found in the given string...
if (m.find()) {
// ...then you can use group() methods.
System.out.println(m.group(0)); // whole matched expression
System.out.println(m.group(1)); // first expression from round brackets (Testing)
System.out.println(m.group(2)); // second one (123)
System.out.println(m.group(3)); // third one (Testing)
}
}
Since you're looking for the first number, you can use such regexp:
^\D+(\d+).*
and m.group(1) will return you the first number. Note that signed numbers can contain a minus sign:
^\D+(-?\d+).*
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex1 {
public static void main(String[]args) {
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("hello1234goodboy789very2345");
while(m.find()) {
System.out.println(m.group());
}
}
}
Output:
1234
789
2345
Allain basically has the java code, so you can use that. However, his expression only matches if your numbers are only preceded by a stream of word characters.
"(\\d+)"
should be able to find the first string of digits. You don't need to specify what's before it, if you're sure that it's going to be the first string of digits. Likewise, there is no use to specify what's after it, unless you want that. If you just want the number, and are sure that it will be the first string of one or more digits then that's all you need.
If you expect it to be offset by spaces, specifying that makes the match even more distinct:
"\\s+(\\d+)\\s+"
If you need all three parts, this will do:
"(\\D+)(\\d+)(.*)"
EDIT The Expressions given by Allain and Jack suggest that you need to specify some subset of non-digits in order to capture digits. If you tell the regex engine you're looking for \d then it's going to ignore everything before the digits. If J or A's expression fits your pattern, then the whole match equals the input string. And there's no reason to specify it. It probably slows a clean match down, if it isn't totally ignored.
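A minimal sketch of that point, assuming only the first run of digits is wanted: `find()` scans forward on its own, so the pattern does not need to describe what comes before or after the number. The class and method names here are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstNumberDemo {
    // Only the digits are described; find() skips whatever precedes them
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns the first run of digits, or null if there is none
    public static String firstNumber(String input) {
        Matcher m = DIGITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Testing123Testing456")); // prints 123
        System.out.println(firstNumber("no digits here"));       // prints null
    }
}
```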
In addition to Pattern, the Java String class also has several methods that can work with regular expressions, in your case the code will be:
"ab123abc".replaceFirst("\\D*(\\d*).*", "$1")
where \\D is a non-digit character.
In Java 1.4 and up:
String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
String someNumberStr = matcher.group(1);
// if you need this to be an int:
int someNumberInt = Integer.parseInt(someNumberStr);
}
This function collects all matching sequences from a string. In this example it takes all email addresses from the string.
static final String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
public List<String> getAllEmails(String message) {
List<String> result = null;
Matcher matcher = Pattern.compile(EMAIL_PATTERN).matcher(message);
if (matcher.find()) {
result = new ArrayList<String>();
result.add(matcher.group());
while (matcher.find()) {
result.add(matcher.group());
}
}
return result;
}
For message = "adf@gmail.com, <another@osiem.osiem>>>> lalala@aaa.pl" it will create a List of 3 elements.
Try doing something like this:
// \\D+ instead of a greedy .+ which would swallow "12" and leave only "3" for the group
Pattern p = Pattern.compile("^\\D+(\\d+).*");
Matcher m = p.matcher("Testing123Testing");
if (m.find()) {
System.out.println(m.group(1));
}
Simple Solution
// Regexplanation:
// ^ beginning of line
// \\D+ 1+ non-digit characters
// (\\d+) 1+ digit characters in a capture group
// .* 0+ any character
String regexStr = "^\\D+(\\d+).*";
// Compile the regex String into a Pattern
Pattern p = Pattern.compile(regexStr);
// Create a matcher with the input String
Matcher m = p.matcher(inputStr);
// If we find a match
if (m.find()) {
// Get the String from the first capture group
String someDigits = m.group(1);
// ...do something with someDigits
}
Solution in a Util Class
public class MyUtil {
private static Pattern pattern = Pattern.compile("^\\D+(\\d+).*");
private static Matcher matcher = pattern.matcher("");
// Assumptions: inputStr is a non-null String
public static String extractFirstNumber(String inputStr){
// Reset the matcher with a new input String
matcher.reset(inputStr);
// Check if there's a match
if(matcher.find()){
// Return the number (in the first capture group)
return matcher.group(1);
}else{
// Return some default value, if there is no match
return null;
}
}
}
...
// Use the util function and print out the result
String firstNum = MyUtil.extractFirstNumber("Testing4234Things");
System.out.println(firstNum);
Look you can do it using StringTokenizer
String str = "as:"+123+"as:"+234+"as:"+345;
StringTokenizer st = new StringTokenizer(str,"as:");
while(st.hasMoreTokens())
{
String k = st.nextToken(); // you will get first numeric data i.e 123
int kk = Integer.parseInt(k);
System.out.println("k string token in integer " + kk);
String k1 = st.nextToken(); // you will get second numeric data i.e 234
int kk1 = Integer.parseInt(k1);
System.out.println("new string k1 token in integer :" + kk1);
String k2 = st.nextToken(); // you will get third numeric data i.e 345
int kk2 = Integer.parseInt(k2);
System.out.println("k2 string token is in integer : " + kk2);
}
Since we are taking these numeric data into three different variables we can use this data anywhere in the code (for further use)
How about [^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).* I think it would take care of numbers with fractional part.
I included white spaces and included , as possible separator.
I'm trying to get the numbers out of a string including floats and taking into account that the user might make a mistake and include white spaces while typing the number.
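Here is one way the expression above might be used, as a sketch rather than a complete number parser. The class name `LooseNumberDemo` and the clean-up step (dropping the white space and turning a comma into a dot) are assumptions added for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LooseNumberDemo {
    // The expression from the answer above: digits, an optional . or , and more digits,
    // with optional white space around the separator
    private static final Pattern LOOSE_NUMBER =
            Pattern.compile("[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*");

    // Hypothetical helper: returns the first number in a normalized form, or null
    public static String firstNumber(String input) {
        Matcher m = LOOSE_NUMBER.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Remove the white space the user typed and use '.' as the decimal separator
        return m.group(1).replaceAll("\\s+", "").replace(',', '.');
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("price: 12 , 5 EUR")); // prints 12.5
    }
}
```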
Sometimes you can use simple .split("REGEXP") method available in java.lang.String. For example:
String input = "first,second,third";
//To retrieve 'first'
input.split(",")[0]
//second
input.split(",")[1]
//third
input.split(",")[2]
If you are reading from a file, then this can help you:
try {
    InputStream inputStream = (InputStream) mnpMainBean.getUploadedBulk().getInputStream();
    BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
    String line;
    //Ref:03
    while ((line = br.readLine()) != null) {
        // only accept lines that match the expected record layout
        if (line.matches("[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+")) {
            String[] splitRecord = line.split(",");
            //do something
        }
        else {
            br.close();
            //error
            return;
        }
    }
    br.close();
}
catch (IOException ioException) {
    logger.logDebug("Exception " + ioException);
}
Pattern p = Pattern.compile("(\\D+)(\\d+)(.*)");
Matcher m = p.matcher("this is your number:1234 thank you");
if (m.find()) {
String someNumberStr = m.group(2);
int someNumberInt = Integer.parseInt(someNumberStr);
}
I will be getting a string such as app1(down) and app2(up).
The words in the brackets indicate the status of each app; they may be up or down.
Now I need to use a regex to get the statuses of the apps as a comma-separated string.
Example: I'll get app1(UP) and app2(DOWN)
Required result: UP,DOWN
It's easy using RegEx like this:
\\((.*?)\\)
String x = "app1(UP) and app2(DOWN)";
Matcher m = Pattern.compile("\\((.*?)\\)").matcher(x);
String tmp = "";
while(m.find()) {
tmp+=(m.group(1))+",";
}
System.out.println(tmp);
Output:
UP,DOWN,
Java 8: using StringJoiner
String x = "app1(UP) and app2(DOWN)";
Matcher m = Pattern.compile("\\((.*?)\\)").matcher(x);
StringJoiner sj = new StringJoiner(",");
while(m.find()) {
sj.add((m.group(1)));
}
System.out.print(sj.toString());
Output:
UP,DOWN
(Last , is removed)
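On Java 9 or later, the same result can probably be reached without an explicit loop by streaming the matches. This is a sketch that assumes `Matcher.results()` is available (it was added in Java 9); the class and method names are made up for the example.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StatusJoinDemo {
    // Same expression as above: whatever sits between ( and )
    private static final Pattern STATUS = Pattern.compile("\\((.*?)\\)");

    // Hypothetical helper: joins every bracketed status with commas
    public static String statuses(String input) {
        return STATUS.matcher(input)
                .results()                 // one MatchResult per match (Java 9+)
                .map(r -> r.group(1))      // keep only the text inside the brackets
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(statuses("app1(UP) and app2(DOWN)")); // prints UP,DOWN
    }
}
```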
import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;
public class ValidateDemo
{
public static void main(String[] args)
{
String input = "ill get app1(UP) and app2(DOWN)";
Pattern p = Pattern.compile("app[0-9]+\\(([A-Z]+)\\)");
Matcher m = p.matcher(input);
List<String> found = new ArrayList<String>();
while (m.find())
{
found.add(m.group(1));
}
System.out.println(found.toString());
}
}
my first java script, have mercy
Consider this code:
private static final Pattern RX_MATCH_APP_STATUS = Pattern.compile("\\s*(?<name>[^(\\s]+)\\((?<status>[^(\\s]+)\\)");
final String input = "app1(UP) or app2(down) let's have also app-3(DOWN)";
final Matcher m = RX_MATCH_APP_STATUS.matcher(input);
while (m.find()) {
final String name = m.group("name");
final String status = m.group("status");
System.out.printf("%s:%s\n", name, status);
}
This plucks as many app status entries from the input line as are really there, and puts each app name and its status into its own variable. It's then up to you how you want to handle them (print or whatever).
Plus: if other states than UP and DOWN come along later (like UNKNOWN), this will still work.
Minus: it also matches any bracketed text that is prefixed with some word, even when that word is not actually an app name and the content of the brackets is not an app state.
Use this as the regex and test it on http://regexr.com/ (note: no square brackets, since [UP] would be a character class matching a single U or P)
UP|DOWN
From a given string, am trying to replace a pattern such as "sometext.othertext.lasttext" with "lasttext". Is this possible in Java with Regex replace? If yes, how? Thanks in advance.
I tried
"hellow.world".replaceAll("(.*)\\.(.*)", "$2")
which results in world. But, I want to replace any such arbitrary sequence. For instance com.google.code should be replace with code and com.facebook should be replaced with facebook.
Just to add, a test input is:
if (com.google.code) then
and the test output should be:
if (code) then
Thanks.
I believe this is what you are looking for, if you're trying to avoid String methods. It can be made more succinct, but I'm hoping this will give you a better understanding.
As others suggested, String methods are cleaner.
class Split {
public static void main (String[] args) {
String inputString = "if (com.google.code) then";
Pattern p = Pattern.compile("((?<=\\()[^)]*(?=\\)))"); // Find text within parentheses
Pattern p2 = Pattern.compile("(\\w+)(\\))"); // Find last portion of text between . and )
Matcher m = p.matcher(inputString);
Matcher m2 = p2.matcher(inputString);
String in2 = "";
if (m2.find())
in2=m2.group(1); // else ... error checking
inputString = m.replaceAll(in2); // do whatcha need to do
}
}
If the parentheses aren't a concern, use this.
class Split {
public static void main (String[] args) {
String in = "if (com.google.code) then";
Pattern p = Pattern.compile("(\\w+)(\\))");
Matcher m = p.matcher(in);
if(m.find())
in = m.group(1);
System.out.println(in); // or whatever
}
}
Use:
str.replaceAll(".*\\.(\\w+)$", "$1")
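For the test input in the question, where the dotted name sits in the middle of a longer string, a single `replaceAll` that matches only the dotted sequence may be closer to what was asked. This is a sketch under the assumption that each segment consists of word characters only; the class and method names are hypothetical.

```java
public class LastSegmentDemo {
    // Hypothetical helper: replaces every dotted sequence a.b.c with its last segment.
    // (?:\w+\.)+ consumes the leading "segment." parts, (\w+) captures the last segment.
    public static String lastSegments(String input) {
        return input.replaceAll("(?:\\w+\\.)+(\\w+)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(lastSegments("if (com.google.code) then")); // prints if (code) then
        System.out.println(lastSegments("com.facebook"));              // prints facebook
    }
}
```

Text without a dot between word characters is left alone, so the surrounding `if (` and `) then` survive unchanged.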
I am trying to extract text using regex but it is not working, although my regexes work fine in online regex validators.
public class HelloWorld {
public static void main(String []args){
String PATTERN1 = "F\\{([\\w\\s&]*)\\}";
String PATTERN2 = "{([\\w\\s&]*)\\}";
String src = "F{403}#{Title1}";
List<String> fvalues = Arrays.asList(src.split("#"));
System.out.println(fieldExtract(fvalues.get(0), PATTERN1));
System.out.println(fieldExtract(fvalues.get(1), PATTERN2));
}
private static String fieldExtract(String src, String ptrn) {
System.out.println(src);
System.out.println(ptrn);
Pattern pattern = Pattern.compile(ptrn);
Matcher matcher = pattern.matcher(src);
return matcher.group(1);
}
}
Why not use:
Pattern regex = Pattern.compile("F\\{([\\d\\s&]*)\\}#\\{([\\s\\w&]*)\\}");
To get both ?
This way the number will be in group 1 and the title in group 2.
Another thing: if you're going to compile the regex (which can help performance), at least make the regex object static so that it doesn't get compiled each time you call the function (which kind of misses the whole pre-compilation point :) )
First problem:
String PATTERN2 = "\\{([\\w\\s&]*)\\}"; // quote '{'
Second problem:
Matcher matcher = pattern.matcher(src);
if( matcher.matches() ){
return matcher.group(1);
} else ...
The Matcher must be asked to plough the field, otherwise you can't harvest the results.
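Putting both fixes together, the asker's `fieldExtract` could look roughly like this. It is a sketch: the class name `FieldExtractDemo` is made up, `find()` is used here in place of `matches()` so the pattern may match anywhere in the field, and returning `null` when nothing matches is an assumption about the desired behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldExtractDemo {
    // Returns the first capture group, or null if the pattern does not match
    public static String fieldExtract(String src, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(src);
        // find() (or matches()) must run before group() may be called
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String pattern1 = "F\\{([\\w\\s&]*)\\}";
        String pattern2 = "\\{([\\w\\s&]*)\\}"; // the opening brace is escaped too
        String[] parts = "F{403}#{Title1}".split("#");
        System.out.println(fieldExtract(parts[0], pattern1)); // prints 403
        System.out.println(fieldExtract(parts[1], pattern2)); // prints Title1
    }
}
```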
I have a file with lines like:
string1 (tab) string2 (tab) string3 (tab) string4
I want to get string3 from every line. All I know about the lines is that string3 is between the second and the third tab character.
Is it possible to extract it with a pattern like
Pattern pat = Pattern.compile(".\t.\t.\t.");
String string3 = tempValue.split("\\t")[2];
It sounds like you just want:
for (String line : lines) {
String[] bits = line.split("\t");
if (bits.length != 4) {
// Handle appropriately, probably throwing an exception
// or at least logging and then ignoring the line (using a continue
// statement)
}
String third = bits[2];
// Use...
}
(You can escape the string so that the regex engine has to parse the backslash-t as tab, but you don't have to. The above works fine.)
Another alternative to the built-in String.split method using a regex is the Guava Splitter class. Probably not necessary here, but worth being aware of.
EDIT: As noted in comments, if you're going to repeatedly use the same pattern, it's more efficient to compile a single Pattern and use Pattern.split:
private static final Pattern TAB_SPLITTER = Pattern.compile("\t");
...
String[] bits = TAB_SPLITTER.split(line);
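A small self-contained sketch of the compiled-pattern approach follows; the class and method names are invented for the example. One detail worth knowing is that `split` drops trailing empty strings by default, so a line whose last field is empty would come back with only three elements. Passing a negative limit keeps them, which is what this sketch assumes is wanted.

```java
import java.util.regex.Pattern;

public class TabFieldDemo {
    private static final Pattern TAB_SPLITTER = Pattern.compile("\t");

    // Hypothetical helper: returns the third of exactly four tab-separated fields, else null
    public static String thirdField(String line) {
        // A negative limit keeps trailing empty fields instead of discarding them
        String[] bits = TAB_SPLITTER.split(line, -1);
        return bits.length == 4 ? bits[2] : null;
    }

    public static void main(String[] args) {
        System.out.println(thirdField("string1\tstring2\tstring3\tstring4")); // prints string3
        System.out.println(thirdField("only\ttwo"));                          // prints null
    }
}
```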
If you want a regex which captures the third field only and nothing else, you could use the following:
String regex = "(?:[^\\t]*)\\t(?:[^\\t]*)\\t([^\\t]*)\\t(?:[^\\t]*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
System.err.println(matcher.group(1));
}
I don't know whether this would perform any better than split("\\t") for parsing a large file.
UPDATE
I was curious to see how the simple split versus the more explicit regex would perform, so I tested three different parser implementations.
/** Simple split parser */
static class SplitParser implements Parser {
public String parse(String line) {
String[] fields = line.split("\\t");
if (fields.length == 4) {
return fields[2];
}
return null;
}
}
/** Split parser, but with compiled pattern */
static class CompiledSplitParser implements Parser {
private static final String regex = "\\t";
private static final Pattern pattern = Pattern.compile(regex);
public String parse(String line) {
String[] fields = pattern.split(line);
if (fields.length == 4) {
return fields[2];
}
return null;
}
}
/** Regex group parser */
static class RegexParser implements Parser {
private static final String regex = "(?:[^\\t]*)\\t(?:[^\\t]*)\\t([^\\t]*)\\t(?:[^\\t]*)";
private static final Pattern pattern = Pattern.compile(regex);
public String parse(String line) {
Matcher m = pattern.matcher(line);
if (m.matches()) {
return m.group(1);
}
return null;
}
}
I ran each ten times against the same million line file. Here are the average results:
split: 2768.8 ms
compiled split: 1041.5 ms
group regex: 1015.5 ms
The clear conclusion is that it is important to compile your pattern, rather than rely on String.split, if you are going to use it repeatedly.
The result on compiled split versus group regex is not conclusive based on this testing. And probably the regex could be tweaked further for performance.
UPDATE
A further simple optimization is to re-use the Matcher rather than create one per loop iteration.
static class RegexParser implements Parser {
private static final String regex = "(?:[^\\t]*)\\t(?:[^\\t]*)\\t([^\\t]*)\\t(?:[^\\t]*)";
private static final Pattern pattern = Pattern.compile(regex);
// Matcher is not thread-safe...
private Matcher matcher = pattern.matcher("");
// ... so this method is no-longer thread-safe
public String parse(String line) {
matcher = matcher.reset(line);
if (matcher.matches()) {
return matcher.group(1);
}
return null;
}
}