Get Two Specific Word Using Regex and Save it to HashMap - java

I need some help.
I have a String like
LOCALHOST = https://192.168.56.1
I want to get "LOCALHOST" and the IP address, then save them to a HashMap.
This is my code so far. I don't know how to use regex, please help.
The output that I want is a HashMap: {LOCALHOST=192.168.56.1}
public static void main(String[] args) {
try {
String line = "LOCALHOST = https://192.168.56.1";
//this should be a hash map
ArrayList<String> urls = new ArrayList<String>();
//didnt know how to get two string
Matcher m = Pattern.compile("([^ =]+)").matcher(line);
while (m.find()) {
urls.add(m.group());
}
System.out.println(urls);
} catch (Exception e) {
System.out.println("Error: " + e.getMessage());
}
}
Thank you for the help

To answer the question as per the title:
String line = "LOCALHOST = https://192.168.56.1";
Map<String, String> map = new HashMap<String, String>();
String[] parts = line.split(" *= *");
map.put(parts[0], parts[1]);
The regex splits on the equals sign and consumes any spaces around it too, so you don't have to trim the parts. Note that parts[1] still includes the https:// prefix.

Try something like the following:
final Matcher m = Pattern.compile("^(.+) = https:\\/\\/(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})$").matcher(line);
final Map<String,String> map = new HashMap<String,String>();
if (m.matches())
{
final String lh = m.group(1);
final String ip = m.group(2);
map.put(lh,ip);
}
Learn to use a good interactive Regular Expression editor like the one at regex101.com
/^(.+) = https:\/\/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})$/m
^ Start of line
1st Capturing group (.+) 
. 1 to infinite times [greedy] Any character (except newline)
 = https:\/\/ Literal  = https://
2nd Capturing group (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) 
\d 1 to 3 times [greedy] Digit [0-9]
\. Literal .
\d 1 to 3 times [greedy] Digit [0-9]
\. Literal .
\d 1 to 3 times [greedy] Digit [0-9]
\. Literal .
\d 1 to 3 times [greedy] Digit [0-9]
$ End of line
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
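For reference, here is a minimal runnable sketch of the pattern explained above, applied to several lines at once. The class name HostMapDemo, the method name parse and the second sample line are made up for illustration; Pattern.MULTILINE plays the role of the m modifier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostMapDemo {
    // Same pattern as above; MULTILINE lets ^ and $ match at every line break.
    private static final Pattern ENTRY = Pattern.compile(
            "^(.+) = https://(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})$",
            Pattern.MULTILINE);

    // Hypothetical helper: collects every "NAME = https://a.b.c.d" line into a map.
    static Map<String, String> parse(String text) {
        Map<String, String> map = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            map.put(m.group(1), m.group(2)); // group 1 = name, group 2 = IP address
        }
        return map;
    }

    public static void main(String[] args) {
        String text = "LOCALHOST = https://192.168.56.1\nGATEWAY = https://10.0.0.1";
        System.out.println(parse(text)); // {LOCALHOST=192.168.56.1, GATEWAY=10.0.0.1}
    }
}
```

Lines that do not fit the pattern are simply skipped, so a real parser would probably want to report them.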

String line = "LOCALHOST = https://192.168.56.1";
Map<String, String> map = new HashMap<String, String>();
String[] s = line.split("=");
map.put(s[0].trim(), s[1].trim());

This is very simple and does not require a Matcher/Pattern regex. Try this:
HashMap<String, String> x = new HashMap<String, String>();
String line = "LOCALHOST = https://192.168.56.1";
String[] items = line.split("=");
x.put(items[0].trim(), items[1].trim());
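The split-based answers store the whole URL as the value, while the question asks for the bare IP address. One possible way to close that gap, sketched here with made-up names (SplitHostDemo, toMap), is to let java.net.URI extract the host. This assumes the value is always a well-formed URL with a scheme; URI.create throws IllegalArgumentException otherwise, and getHost() returns null when no host is present.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class SplitHostDemo {
    // Hypothetical helper: splits "KEY = URL" on the first '=' and keeps only the URL's host.
    static Map<String, String> toMap(String line) {
        Map<String, String> map = new HashMap<>();
        String[] parts = line.split("=", 2); // limit 2: any later '=' stays in the value
        if (parts.length == 2) {
            String host = URI.create(parts[1].trim()).getHost();
            map.put(parts[0].trim(), host);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(toMap("LOCALHOST = https://192.168.56.1")); // {LOCALHOST=192.168.56.1}
    }
}
```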

Related

Split String by | and numbers

Let's imagine I have the following strings:
String one = "123|abc|123abc";
String two = "123|ab12c|abc|456|abc|def";
String three = "123|1abc|1abc1|456|abc|wer";
String four = "123|abc|def|456|ghi|jkl|789|mno|pqr";
If I do a split on them I expect the following output:
one = ["123|abc|123abc"];
two = ["123|ab12c|abc", "456|abc|def"];
three = ["123|1abc|1abc1", "456|abc|wer"];
four = ["123|abc|def", "456|ghi|jkl", "789|mno|pqr"];
The string has the following structure:
It starts with 1 or more digits, followed by any number of groups, each consisting of a | followed by any number of characters.
When the field after a | consists only of digits, it is considered the start of a new value.
More examples:
In - 123456|xxxxxx|zzzzzzz|xa2314|xzxczxc|1234|qwerty
Out - ["123456|xxxxxx|zzzzzzz|xa2314|xzxczxc", "1234|qwerty"]
I tried multiple variations of the following, but none of them work:
value.split( "\\|\\d+|\\d+" )
You may split on \|(?=\d+(?:\||$)):
List<String> nums = Arrays.asList(new String[] {
"123|abc|123abc",
"123|ab12c|abc|456|abc|def",
"123|1abc|1abc1|456|abc|wer",
"123|abc|def|456|ghi|jkl|789|mno|pqr"
});
for (String num : nums) {
String[] parts = num.split("\\|(?=\\d+(?:\\||$))");
System.out.println(num + " => " + Arrays.toString(parts));
}
This prints:
123|abc|123abc => [123|abc|123abc]
123|ab12c|abc|456|abc|def => [123|ab12c|abc, 456|abc|def]
123|1abc|1abc1|456|abc|wer => [123|1abc|1abc1, 456|abc|wer]
123|abc|def|456|ghi|jkl|789|mno|pqr => [123|abc|def, 456|ghi|jkl, 789|mno|pqr]
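As a quick check, the same split can be wrapped in a small helper and run against the longer example from the question. The names PipeSplitDemo and splitRecords are only illustrative.

```java
import java.util.Arrays;
import java.util.List;

class PipeSplitDemo {
    // Splits at each '|' that is directly followed by a digits-only field
    // (digits up to the next '|' or the end of the string).
    static List<String> splitRecords(String value) {
        return Arrays.asList(value.split("\\|(?=\\d+(?:\\||$))"));
    }

    public static void main(String[] args) {
        System.out.println(splitRecords("123456|xxxxxx|zzzzzzz|xa2314|xzxczxc|1234|qwerty"));
        // [123456|xxxxxx|zzzzzzz|xa2314|xzxczxc, 1234|qwerty]
    }
}
```

Because the lookahead does not consume anything, the digits that start the next record stay attached to it.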
Instead of splitting, you can match the parts in the string:
\b\d+(?:\|(?!\d+(?:$|\|))[^|\r\n]+)*
\b A word boundary
\d+ Match 1+ digits
(?: Non capture group
\|(?!\d+(?:$|\|)) Match | and assert not only digits till either the next pipe or the end of the string
[^|\r\n]+ Match 1+ chars other than a pipe or a newline
)* Close the non capture group and optionally repeat (use + to repeat one or more times to match at least one pipe char)
String regex = "\\b\\d+(?:\\|(?!\\d+(?:$|\\|))[^|\\r\\n]+)+";
String string = "123|abc|def|456|ghi|jkl|789|mno|pqr";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(string);
List<String> matches = new ArrayList<String>();
while (m.find())
matches.add(m.group());
for (String s : matches)
System.out.println(s);
Output
123|abc|def
456|ghi|jkl
789|mno|pqr

Is there a regex where if first expression is valid then check for next [duplicate]

I have several strings in the rough form:
[some text] [some number] [some more text]
I want to extract the text in [some number] using the Java Regex classes.
I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls to take the regex string and use it on the source data to produce the value of [some number].
EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].
Full example:
private static final Pattern p = Pattern.compile("^([a-zA-Z]+)([0-9]+)(.*)");
public static void main(String[] args) {
// create matcher for pattern p and given string
Matcher m = p.matcher("Testing123Testing");
// if an occurrence of the pattern was found in the given string...
if (m.find()) {
// ...then you can use group() methods.
System.out.println(m.group(0)); // whole matched expression
System.out.println(m.group(1)); // first expression from round brackets (Testing)
System.out.println(m.group(2)); // second one (123)
System.out.println(m.group(3)); // third one (Testing)
}
}
Since you're looking for the first number, you can use such regexp:
^\D+(\d+).*
and m.group(1) will return you the first number. Note that signed numbers can contain a minus sign. Because a greedy \D+ would consume the sign itself, make the prefix reluctant:
^\D*?(-?\d+).*
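Here is a small sketch of extracting the first, possibly negative, number. It uses a reluctant prefix (\D*?) so that a minus sign directly before the digits ends up in the capture group instead of being swallowed by the prefix. The class and method names are hypothetical, and the result is assumed to fit into an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumberDemo {
    // \D*? is reluctant, so an optional '-' directly before the digits is captured too.
    private static final Pattern FIRST_NUMBER = Pattern.compile("^\\D*?(-?\\d+)");

    // Hypothetical helper: returns the first integer in the input, or null if there is none.
    static Integer firstNumber(String input) {
        Matcher m = FIRST_NUMBER.matcher(input);
        return m.find() ? Integer.valueOf(m.group(1)) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Testing123Testing")); // 123
        System.out.println(firstNumber("offset -42 px"));     // -42
        System.out.println(firstNumber("no digits"));         // null
    }
}
```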
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex1 {
public static void main(String[]args) {
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("hello1234goodboy789very2345");
while(m.find()) {
System.out.println(m.group());
}
}
}
Output:
1234
789
2345
Allain basically has the java code, so you can use that. However, his expression only matches if your numbers are only preceded by a stream of word characters.
"(\\d+)"
should be able to find the first string of digits. You don't need to specify what's before it, if you're sure that it's going to be the first string of digits. Likewise, there is no use to specify what's after it, unless you want that. If you just want the number, and are sure that it will be the first string of one or more digits then that's all you need.
If you expect the number to be offset by spaces, it is more explicit to specify that:
"\\s+(\\d+)\\s+"
If you need all three parts, this will do:
"(\\D+)(\\d+)(.*)"
EDIT The Expressions given by Allain and Jack suggest that you need to specify some subset of non-digits in order to capture digits. If you tell the regex engine you're looking for \d then it's going to ignore everything before the digits. If J or A's expression fits your pattern, then the whole match equals the input string. And there's no reason to specify it. It probably slows a clean match down, if it isn't totally ignored.
In addition to Pattern, the Java String class also has several methods that can work with regular expressions, in your case the code will be:
"ab123abc".replaceFirst("\\D*(\\d*).*", "$1")
where \\D is a non-digit character.
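A short sketch of how that one-liner behaves, including the case where the input has no digits at all. The wrapper names are made up; the input is assumed to be a single line, since . does not match line breaks by default.

```java
class ReplaceFirstDemo {
    // Hypothetical helper: keeps only the first run of digits of a single-line string.
    static String firstDigits(String input) {
        // \D* skips the leading non-digits, (\d*) captures the digits, .* drops the rest.
        return input.replaceFirst("\\D*(\\d*).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(firstDigits("ab123abc"));      // 123
        System.out.println(firstDigits("abc").isEmpty()); // true: no digits gives an empty string
    }
}
```

An empty result therefore has to be checked before calling something like Integer.parseInt on it.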
In Java 1.4 and up:
String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
String someNumberStr = matcher.group(1);
// if you need this to be an int:
int someNumberInt = Integer.parseInt(someNumberStr);
}
This function collects all matching sequences from a string. In this example it takes all email addresses from the string.
static final String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
public List<String> getAllEmails(String message) {
List<String> result = null;
Matcher matcher = Pattern.compile(EMAIL_PATTERN).matcher(message);
if (matcher.find()) {
result = new ArrayList<String>();
result.add(matcher.group());
while (matcher.find()) {
result.add(matcher.group());
}
}
return result;
}
For message = "adf@gmail.com, <another@osiem.osiem>>>> lalala@aaa.pl" it will create a List of 3 elements.
Try doing something like this:
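// \\D+ instead of a greedy .+ so that the group captures all of the digits, not just the last one
Pattern p = Pattern.compile("^\\D+(\\d+).*");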
Matcher m = p.matcher("Testing123Testing");
if (m.find()) {
System.out.println(m.group(1));
}
Simple Solution
// Regexplanation:
// ^ beginning of line
// \\D+ 1+ non-digit characters
// (\\d+) 1+ digit characters in a capture group
// .* 0+ any character
String regexStr = "^\\D+(\\d+).*";
// Compile the regex String into a Pattern
Pattern p = Pattern.compile(regexStr);
// Create a matcher with the input String
Matcher m = p.matcher(inputStr);
// If we find a match
if (m.find()) {
// Get the String from the first capture group
String someDigits = m.group(1);
// ...do something with someDigits
}
Solution in a Util Class
public class MyUtil {
private static Pattern pattern = Pattern.compile("^\\D+(\\d+).*");
private static Matcher matcher = pattern.matcher("");
// Assumptions: inputStr is a non-null String; single-threaded use (the shared Matcher is not thread-safe)
public static String extractFirstNumber(String inputStr){
// Reset the matcher with a new input String
matcher.reset(inputStr);
// Check if there's a match
if(matcher.find()){
// Return the number (in the first capture group)
return matcher.group(1);
}else{
// Return some default value, if there is no match
return null;
}
}
}
...
// Use the util function and print out the result
String firstNum = MyUtil.extractFirstNumber("Testing4234Things");
System.out.println(firstNum);
You can also do it using StringTokenizer (note that the second argument is a set of delimiter characters, not a delimiter string):
String str = "as:"+123+"as:"+234+"as:"+345;
StringTokenizer st = new StringTokenizer(str,"as:");
while(st.hasMoreTokens())
{
String k = st.nextToken(); // you will get first numeric data i.e 123
int kk = Integer.parseInt(k);
System.out.println("k string token in integer " + kk);
String k1 = st.nextToken(); // you will get second numeric data i.e 234
int kk1 = Integer.parseInt(k1);
System.out.println("new string k1 token in integer :" + kk1);
String k2 = st.nextToken(); // you will get third numeric data i.e 345
int kk2 = Integer.parseInt(k2);
System.out.println("k2 string token is in integer : " + kk2);
}
Since we store these numeric values in three different variables, we can use them anywhere later in the code.
How about [^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).* I think it would take care of numbers with a fractional part.
I included white space and allowed , as a possible separator.
I'm trying to get the numbers out of a string, including floats, taking into account that the user might make a mistake and include white space while typing the number.
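To make the idea concrete, here is a rough sketch that applies this pattern and then normalizes the captured text into a double. The names DecimalDemo and firstDecimal are invented for the example, and treating , as a decimal separator (rather than a thousands separator) is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalDemo {
    // The pattern from the answer above, compiled once.
    private static final Pattern NUMBER =
            Pattern.compile("[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*");

    // Hypothetical helper: returns the first number as a double, or NaN if none is found.
    static double firstDecimal(String input) {
        Matcher m = NUMBER.matcher(input);
        if (!m.find()) {
            return Double.NaN;
        }
        // Drop the tolerated white space and treat ',' as a decimal separator (an assumption).
        String cleaned = m.group(1).replaceAll("\\s", "").replace(',', '.');
        return Double.parseDouble(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(firstDecimal("price: 12 , 5 usd")); // 12.5
        System.out.println(firstDecimal("total 3.75"));        // 3.75
    }
}
```

One side effect of tolerating white space: an input such as "12 34" is captured as a single group and would be read as 1234.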
Sometimes you can use the simple .split("REGEXP") method available in java.lang.String. For example:
String input = "first,second,third";
//To retrieve 'first'
input.split(",")[0]
//second
input.split(",")[1]
//third
input.split(",")[2]
If you are reading from a file, then this can help you:
try {
InputStream inputStream = (InputStream) mnpMainBean.getUploadedBulk().getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
String line;
while ((line = br.readLine()) != null) {
// accept only lines that match the expected record layout
if (line.matches("[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+")) {
String[] splitRecord = line.split(",");
//do something
}
else {
br.close();
//error
return;
}
}
br.close();
}
catch (IOException ioException) {
logger.logDebug("Exception " + ioException.getMessage());
}
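The same validate-then-split loop can be sketched with try-with-resources, which closes the reader on every path without explicit close() calls. Everything named here (RecordReaderDemo, readRecords, the sample lines) is hypothetical, and throwing on the first invalid line is just one possible error policy.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class RecordReaderDemo {
    private static final String RECORD = "[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+";

    // Hypothetical helper: returns the split fields of every line, failing on the first bad line.
    static List<String[]> readRecords(Reader source) {
        List<String[]> records = new ArrayList<>();
        // try-with-resources closes the reader whether the loop finishes or throws.
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.matches(RECORD)) {
                    throw new IllegalArgumentException("Invalid record: " + line);
                }
                records.add(line.split(","));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String[]> records = readRecords(new StringReader("A,1,22,33,4|5:\nB,2,,,|:"));
        System.out.println(records.size()); // 2
    }
}
```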
Pattern p = Pattern.compile("(\\D+)(\\d+)(.*)");
Matcher m = p.matcher("this is your number:1234 thank you");
if (m.find()) {
String someNumberStr = m.group(2);
int someNumberInt = Integer.parseInt(someNumberStr);
}

Regular expression for price - Android

I have a string like the one below:
dfdfm;lg 2500$ jshfsnefsfz5405€mnvkjdf64rfmkd554668¢ odsfrknegj 885486¥ dsflkef 588525dollar
I am getting the values below with this pattern [\\d,]+\\s*\\$|[\\d,]+\\s*€|[\\d,]+\\s*¥|[\\d,]+\\s*¢|[\\d,]+\\s*dollar :
2500$
5405€
554668¢
885486¥
588525dollar
Problem: I don't need the $ € ¢ ¥ dollar suffixes. How can I remove them using the regex above?
Here is my method :
private String getPrice(String caption) {
String pricePattern = "[\\d,]+\\s*\\$|[\\d,]+\\s*€|[\\d,]+\\s*¥|[\\d,]+\\s*¢|[\\d,]+\\s*dollar|[\\d,]+\\s*Euro";
List<String> lstPrice = new ArrayList<>();
Pattern rPrice = Pattern.compile(pricePattern);
Matcher mPrice = rPrice.matcher(caption);
while (mPrice.find()) {
lstPrice.add(mPrice.group());
}
if (lstPrice.size() > 0) {
return lstPrice.get(0);
}
return "";
}
If you need to return all prices, make sure your getPrice method returns List<String> and adjust the regex to match the prices but capture the numbers only:
private List<String> getPrice(String caption) {
String pricePattern = "(?i)(\\d[\\d,]*)\\s*(?:[$€¥¢]|dollar|Euro)";
List<String> lstPrice = new ArrayList<>();
Pattern rPrice = Pattern.compile(pricePattern);
Matcher mPrice = rPrice.matcher(caption);
while (mPrice.find()) {
lstPrice.add(mPrice.group(1));
}
return lstPrice;
}
See the Java demo online.
String s = "dfdfm;lg 2500$ jshfsnefsfz5405€mnvkjdf64rfmkd554668¢ odsfrknegj 885486¥ dsflkef 588525dollar";
System.out.println(getPrice(s));
returns
[2500, 5405, 554668, 885486, 588525]
Pattern details:
(?i) - a case insensitive modifier (embedded flag option)
(\\d[\\d,]*) - Group 1 capturing a digit and then 0+ digits or ,
\\s* - 0+ whitespaces
(?:[$€¥¢]|dollar|Euro) - either $, €, ¥, ¢, dollar or euro (case insensitive search is enabled via (?i))
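If the prices are needed as numbers rather than strings, the captured group can be converted after stripping the commas. This is only a sketch: PriceDemo and parsePrices are invented names, the sample caption is made up, and commas are assumed to be thousands separators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceDemo {
    // The euro, yen and cent signs (U+20AC, U+00A5, U+00A2) are written as escapes
    // to keep this source file ASCII-only.
    private static final Pattern PRICE =
            Pattern.compile("(?i)(\\d[\\d,]*)\\s*(?:[$\u20AC\u00A5\u00A2]|dollar|Euro)");

    // Hypothetical helper: returns each price as a number, ignoring thousands separators.
    static List<Long> parsePrices(String caption) {
        List<Long> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(caption);
        while (m.find()) {
            prices.add(Long.parseLong(m.group(1).replace(",", "")));
        }
        return prices;
    }

    public static void main(String[] args) {
        System.out.println(parsePrices("shoes 2,500$ bag 40 dollar hat 15 EURO")); // [2500, 40, 15]
    }
}
```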
You can try with replaceAll
Replaces every subsequence of the input sequence that matches the
pattern with the given replacement string.
String pricePattern = "2500$ 5405€ 554668¢ 885486¥ 588525dollar";
// replace every run of non-digits with a single space, then trim the trailing one
pricePattern = pricePattern.replaceAll("\\D+", " ").trim(); // "2500 5405 554668 885486 588525"

What is the best regex which can split list of http headers?

My header list is a string in the following format:
"headerName1:value1,headerName2:value2,headerName3:value3,..."
So since a comma can be present in headers, splitting using it might be a problem.
So what would be the characters that might not be present within the headers which I can use for splitting?
This is my code:
public List<Header> getHeaders(String headers) {
List<Header> headersList = new ArrayList<>();
if (!"".equals(headers)) {
String[] spam = headers.split(",");
for (String aSpam : spam) {
String[] header = aSpam.split(":",2);
if (header.length > 1) {
headersList.add(new Header(header[0], header[1]));
} else {
throw new HTTPSinkAdaptorRuntimeException("Invalid format");
}
}
}
return headersList;
}
My desired output is an array, {"headerName1:value1", "headerName2:value2", "headerName3:value3", ...}
The problem is that it does not work well in a scenario like this: "From: Donna Doe, chief bottle washer ,TO: John Doe, chief bottle washer "
I believe you want to extract any 1+ word chars before : as key and then any number of any chars before the end of string or the first sequence of 1+ word chars followed with :.
You may consider using
(\w+):([^,]*(?:,(?!\s*\w+:)[^,]*)*)
which is an unrolled variant of (\w+):(.*?)(?=\s*\w+:|$) regex. See the regex demo.
Details:
(\w+) - Group 1 (key)
: - a colon
([^,]*(?:,(?!\s*\w+:)[^,]*)*) - Group 2 (value):
[^,]* - zero or more chars other than ,
(?:,(?!\s*\w+:)[^,]*)* - zero or more sequences of:
,(?!\s*\w+:) - comma not followed with 0+ whitespaces and then 1+ word chars + :
[^,]* - zero or more chars other than ,
The (.*?)(?=\s*\w+:|$) is more readable, but less efficient. It captures into Group 2 any 0+ chars other than line break chars (with (.*?)), but as few as possible (due to *?) up to the first occurrence of end of string ($) or 0+ whitespaces + 1 or more word chars + : (with the (?=\s*\w+:|$) positive lookahead).
See the Java demo:
Map<String,String> hash = new HashMap<>();
String s = "headerName1:va,lu,e1, headerName2:v,a,lue2,headerName3:valu,,e3,hn:dddd, ddd:val";
Pattern pattern = Pattern.compile("(\\w+):([^,]*(?:,(?!\\s*\\w+:)[^,]*)*)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
hash.put(matcher.group(1), matcher.group(2));
}
System.out.println(hash);
// => {headerName1=va,lu,e1, ddd=val, headerName2=v,a,lue2, hn=dddd, headerName3=valu,,e3}
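As a further sketch, the same pattern can be run against the input from the question, this time with a LinkedHashMap so the headers keep their input order and with the values trimmed. The names HeaderSplitDemo and parseHeaders are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderSplitDemo {
    // Key = word characters before ':', value = everything up to a comma that starts the next "name:".
    private static final Pattern HEADER =
            Pattern.compile("(\\w+):([^,]*(?:,(?!\\s*\\w+:)[^,]*)*)");

    // Hypothetical helper: LinkedHashMap keeps the headers in input order; values are trimmed.
    static Map<String, String> parseHeaders(String headers) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = HEADER.matcher(headers);
        while (m.find()) {
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "From: Donna Doe, chief bottle washer ,TO: John Doe, chief bottle washer ";
        System.out.println(parseHeaders(s));
        // {From=Donna Doe, chief bottle washer, TO=John Doe, chief bottle washer}
    }
}
```

Two limitations are worth keeping in mind: \w does not match a hyphen, so names such as Content-Type would need the \w+ parts widened (for example to [\w-]+), and a value that itself contains a comma followed by a word and a colon would be cut at that point.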

Using Regex to ignore a pattern in java

I have a sentence: "we:PR show:V".
I want to match only those characters after ":" and before "\\s" using regex pattern matcher.
I used following pattern:
Pattern pattern=Pattern.compile("^(?!.*[\\w\\d\\:]).*$");
But it did not work.
What is the best pattern to get the output?
For a situation such as this, if you are using Java, it may be easier to do something with substrings:
String input = "we:PR show:V";
String colon = ":";
String space = " ";
List<String> results = new ArrayList<String>();
int colonLocation = input.indexOf(colon);
while (colonLocation != -1) {
// the value ends at the next space after the colon, or at the end of the string
int spaceLocation = input.indexOf(space, colonLocation);
spaceLocation = (spaceLocation == -1 ? input.length() : spaceLocation);
results.add(input.substring(colonLocation + 1, spaceLocation));
// look for the next colon after the value just extracted
colonLocation = input.indexOf(colon, spaceLocation);
}
return results;
This will likely be faster than trying to match with a regex.
The following regex assumes that any non-whitespace characters following a colon (in turn preceded by non-colon characters) are a valid match:
[^:]+:(\S+)(?:\s+|$)
Use like:
String input = "we:PR show:V";
Pattern pattern = Pattern.compile("[^:]+:(\\S+)(?:\\s+|$)");
Matcher matcher = pattern.matcher(input);
int start = 0;
while (matcher.find(start)) {
String match = matcher.group(1); // = "PR" then "V"
// Do stuff with match
start = matcher.end( );
}
The pattern matches, in order:
At least one character that isn't a colon.
A colon.
At least one non-whitespace character (our match).
At least one whitespace character, or the end of input.
The loop continues as long as the regex matches an item in the string, beginning at the index start, which is always adjusted to point to after the end of the current match.
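A simpler variant is sketched below. It swaps in a shorter pattern, :(\S+), and relies on the fact that find() without arguments already resumes after the previous match, so no start index is tracked. The names TagDemo and tags are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagDemo {
    // A colon followed by one or more non-whitespace characters; group 1 is the tag.
    private static final Pattern TAG = Pattern.compile(":(\\S+)");

    // Hypothetical helper: collects the text after each ':' up to the next white space.
    static List<String> tags(String sentence) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(sentence);
        while (m.find()) { // find() continues after the end of the previous match
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tags("we:PR show:V")); // [PR, V]
    }
}
```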
