Java String tokens - java

I have a string:
String user_name = "id=123 user=aron name=aron app=application";
and a list that contains: {user,cuser,suser}
I need to get the user part from the string, so I have code like this:
List<String> userName = Config.getConfig().getList(Configuration.ATT_CEF_USER_NAME);
String result = null;
for (String param: user_name .split("\\s", 0)){
for(String user: userName ){
String userParam = user.concat("=.*");
if (param.matches(userParam )) {
result = param.split("=")[1];
}
}
}
But the problem is that if the user value contains spaces, it does not work.
For example:
String user_name = "id=123 user=aron nicols name=aron app=application";
Here user has the value aron nicols, which contains a space. How can I write code that gets the exact user value, i.e. aron nicols?

If you want to split only on spaces that come right before a token containing = (such as user=...), you can add a look-ahead condition:
split("\\s(?=\\S*=)")
This regex splits on
\\s a whitespace character
(?=\\S*=) that is followed by zero or more (*) non-space characters (\\S) and then =. The look-ahead (?=...) is a zero-length match, so the text it matches is not consumed by the split and stays in the result.
Demo:
String user_name = "id=123 user=aron nicols name=aron app=application";
for (String s : user_name.split("\\s(?=\\S*=)"))
System.out.println(s);
output:
id=123
user=aron nicols
name=aron
app=application
From your comment on the other answer, it seems that an = escaped with \ should not be treated as a separator between key and value, but as part of the value. In that case you can add a negative look-behind to check that there is no \ before the =: placing (?<!\\\\) right before = requires that = is not preceded by \.
BTW, a regex that matches \ has to be written as \\, and in a Java string literal each \ must itself be escaped, which is why we end up with \\\\.
So you can use
split("\\s(?=\\S*(?<!\\\\)=)")
Demo:
String user_name = "user=Dist\\=Name1, xyz src=activedirectorydomain ip=10.1.77.24";
for (String s : user_name.split("\\s(?=\\S*(?<!\\\\)=)"))
System.out.println(s);
output:
user=Dist\=Name1, xyz
src=activedirectorydomain
ip=10.1.77.24
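To connect the split back to the original goal (finding the value for one of several possible keys), the tokens can be collected into a map. This is only a sketch: the class and method names are made up for illustration, and the key list is hard-coded here instead of being read from the asker's Config object.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyValueTokens {

    // Splits on a whitespace character only when the following run of
    // non-space characters contains an unescaped '=', i.e. a new key starts.
    static Map<String, String> parse(String line) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String token : line.split("\\s(?=\\S*(?<!\\\\)=)")) {
            // Limit 2 keeps any further '=' characters inside the value.
            String[] kv = token.split("=", 2);
            if (kv.length == 2) {
                pairs.put(kv[0], kv[1]);
            }
        }
        return pairs;
    }

    // Returns the value of the first candidate key that is present, or null.
    static String firstValue(Map<String, String> pairs, List<String> keys) {
        for (String key : keys) {
            if (pairs.containsKey(key)) {
                return pairs.get(key);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical key list standing in for the configured one.
        List<String> keys = Arrays.asList("user", "cuser", "suser");
        Map<String, String> pairs =
                parse("id=123 user=aron nicols name=aron app=application");
        System.out.println(firstValue(pairs, keys));
    }
}
```

One limitation to keep in mind: a value that itself contains a word with = in it (for example user=a b=c) is still split at that word.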

Do it like this:
First split input string using this regex:
" +(?=\\w+(?<!\\\\)=)"
This will give you 4 name=value tokens like this:
id=123
user=aron nicols
name=aron
app=application
Now you can just split on = to get your name and value parts.

CODE FISH, this simple regex captures the user in Group 1:
user=\s*(.*?)\s+name=
It will capture "Aron", "Aron Nichols", "Aron Nichols The Benevolent", and so on.
It relies on the knowledge that name= always follows user=.
However, if you're not sure that the token following user is name, you can use this:
user=\s*(.*?)(?=$|\s+\w+=)
Here is how to use the second expression (for the first, just change the string in Pattern.compile):
String ResultString = null;
try {
Pattern regex = Pattern.compile("user=\\s*(.*?)(?=$|\\s+\\w+=)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
ResultString = regexMatcher.group(1);
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
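The question mentions several possible key names (user, cuser, suser). One way to adapt the second expression is to build an alternation from the key list. This is a sketch under assumptions: the class and method names are invented, and the (?<!\S) look-behind is an addition not present in the answer, used so that user does not match inside cuser.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserValueFinder {

    // Returns the value of the first occurrence of any of the keys, or null.
    static String findValue(String line, List<String> keys) {
        StringBuilder alternation = new StringBuilder();
        for (String key : keys) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            // Quote each key so regex metacharacters in it match literally.
            alternation.append(Pattern.quote(key));
        }
        // (?<!\S): the key must start the line or follow whitespace.
        // (?=$|\s+\w+=): the value ends at end of input or before the next key.
        Pattern p = Pattern.compile(
                "(?<!\\S)(?:" + alternation + ")=\\s*(.*?)(?=$|\\s+\\w+=)");
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user", "cuser", "suser");
        System.out.println(
                findValue("id=123 user=aron nicols name=aron app=application", keys));
    }
}
```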

Related

Java Regex multi delimiter split in order

I am trying to split a string that contains multiple delimiters, but I first want to check that the string satisfies the regex and only then split it.
Example:
The testString will contain the delimiters ://, :, #, :, / in that specific order. I need to first check whether the given string satisfies the pattern and, if it does, split it. The other parts of the string may also contain these characters, but I need to split based on the order ://, :, #, :, /.
String testString = "aman://jaspreet:raman!#127.0.0.1:5031/test";
String[]tokens = testString.split("://|\\:|#|\\:|\\/");
for(String s:tokens) {
System.out.println(s);
}
Above I have tried a regex to split, but it does not check the order. It just splits wherever any of the delimiters occurs in the string.
If you first validate the pattern, then you shouldn't do split() after. Use capturing groups to gather the data you already validated.
E.g. in a simple case, foo#bar, with separator #, you would validate with ^([^#]+)#(.+)$, i.e. match and capture text up to #, match but don't capture the #, then match and capture the rest:
Pattern p = Pattern.compile("^([^#]+)#(.+)$");
Matcher m = p.matcher("foo#bar");
if (! m.matches()) {
// invalid data
} else {
String a = m.group(1); // a = "foo"
String b = m.group(2); // b = "bar"
// use a and b here
}
For the matching in the question, a lenient pattern could be:
^(.*?)://(.*?):(.*?)#(.*?):(.*?)/(.*)$
You would then use code above, but with:
String scheme = m.group(1); // "aman"
String user = m.group(2); // "jaspreet"
String password = m.group(3); // "raman!"
String host = m.group(4); // "127.0.0.1"
String port = m.group(5); // "5031"
String path = m.group(6); // "test"
For a stricter matching, replace .*? with a pattern that only matches allowed characters, e.g. [^:]+ if it cannot be empty and cannot contain colons.
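Putting the pieces together, a stricter variant might look like the sketch below. The character classes are assumptions about what each part may contain (for example, that the port is numeric and the host has no colon or slash); the class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrderedDelimiterParser {

    // Assumed stricter pattern: each part excludes the delimiter that ends it.
    private static final Pattern PARTS = Pattern.compile(
            "^([^:/]+)://([^:]+):([^#]+)#([^:/]+):(\\d+)/(.*)$");

    // Returns {scheme, user, password, host, port, path}, or null if the
    // input does not contain the delimiters in the expected order.
    static String[] parse(String input) {
        Matcher m = PARTS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] parts = new String[m.groupCount()];
        for (int i = 0; i < parts.length; i++) {
            parts[i] = m.group(i + 1);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = parse("aman://jaspreet:raman!#127.0.0.1:5031/test");
        System.out.println(String.join(" | ", parts));
    }
}
```

Validation and extraction happen in one step here, so there is no separate split() call that could disagree with the validation.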
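Alternatively, if the separator before the host is really @ (as in a standard URL), you could use the URI class to parse the string. With a literal # this does not apply, because URI treats everything after # as the fragment. Note that getPort() returns an int and getPath() keeps the leading slash.
String testString = "aman://jaspreet:raman!@127.0.0.1:5031/test";
URI uri = URI.create(testString);
String scheme = uri.getScheme(); // "aman"
String userInfo = uri.getUserInfo(); // "jaspreet:raman!"
String host = uri.getHost(); // "127.0.0.1"
int port = uri.getPort(); // 5031
String path = uri.getPath(); // "/test"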

How to append to beginning of java string using replace with regex?

How can I use java string replaceAll or replaceFirst to append to beginning?
String joe = "Joe";
String helloJoe = joe.replaceAll("\\^", "Hello");
Desired Output: "Hello Joe"
You don't need to escape ^. Unescaped, ^ is a regex metacharacter that matches the start of the input; escaped as \\^ it matches a literal ^ character, which "Joe" does not contain.
String helloJoe = joe.replaceFirst("^", "Hello ");
Alternatively, you could simply concatenate with +, or use String.format(String, Object...), like
String whatever = "Joe";
String helloJoe = String.format("Hello %s", whatever);
// String helloJoe = "Hello " + whatever;
System.out.println(helloJoe);
Output is (as requested)
Hello Joe
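For completeness, the regex route from the question can be sketched as below. The class and helper names are made up for illustration; Matcher.quoteReplacement is used because $ and \ have special meaning in a replacement string.

```java
import java.util.regex.Matcher;

public class PrependDemo {

    // "^" matches the empty position at the start of the input, so
    // replaceFirst inserts the prefix there without removing anything.
    static String prepend(String s, String prefix) {
        return s.replaceFirst("^", Matcher.quoteReplacement(prefix));
    }

    public static void main(String[] args) {
        System.out.println(prepend("Joe", "Hello "));           // Hello Joe
        // Escaped caret: looks for a literal '^', finds none, changes nothing.
        System.out.println("Joe".replaceAll("\\^", "Hello"));   // Joe
    }
}
```

Plain concatenation remains the simpler choice when no pattern matching is needed.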

How to use pattern in Java to fetch groups like 'sscanf' does in C?

I have a String of the form user#domain:port
I want to fetch user, domain and port from this String.
So I created a regex:
public static final String MATCH_USER_DOMAIN_PORT = "^([0-9,a-zA-Z-.*_]+)#([a-z0-9]+[\\.-][a-z0-9]+\\.[a-z]{2,}+):(6553[0-5]|655[0-2]\\d|65[0-4]\\d{2}|6[0-4]\\d{3}|[1-5]\\d{4}|[1-9]\\d{0,3})$";
and this is my unit test method so far:
public void test____matchesUserDomainWithPort(){
String identityText = "maxim#domain.com:5555";
String user = "";
String domain = "";
String port = "";
if(identityText.matches(MATCH_USER_DOMAIN_PORT))
{
Pattern p = Pattern.compile(MATCH_USER_DOMAIN_PORT);
Matcher m = p.matcher(identityText);
user = m.group(1);
domain= m.group(2);
port= m.group(3);
}
assertEquals("maxim", user);
assertEquals("domain.com", domain);
assertEquals("5555", port);
}
I get error:
java.lang.IllegalStateException: No successful match so far
at java.util.regex.Matcher.ensureMatch(Matcher.java:607)
....
in row: user = m.group(1);
I opened http://gskinner.com/RegExr/?2v5r0
and there all seems good:
Output:
RegExp: /^([0-9,a-zA-Z-.*_]+#[a-z0-9]+([\.-][a-z0-9]+)*)+\.[a-z]{2,}+:(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3})$/
pattern: ^([0-9,a-zA-Z-.*_]+#[a-z0-9]+([\.-][a-z0-9]+)*)+\.[a-z]{2,}+:(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3})$
flags:
3 capturing groups:
group 1: ([0-9,a-zA-Z-.*_]+#[a-z0-9]+([\.-][a-z0-9]+)*)
group 2: ([\.-][a-z0-9]+)
group 3: (6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3})
Am I missing something?
In C I would just write: sscanf(identityText,"%[^#]#%[^:]:%511s",user,domain,port);
Of course I could split this text on # and : and get the 3 values, but I'm interested in how to do it more elegantly :)
Please help.
Please use
if(identityText.matches(MATCH_USER_DOMAIN_PORT)){
Pattern p = Pattern.compile(MATCH_USER_DOMAIN_PORT);
Matcher m = p.matcher(identityText);
while(m.find()){
user = m.group(1);
domain= m.group(2);
port= m.group(3);
}
}
thanks
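The underlying rule is that Matcher.group(int) may only be called after matches(), find() or lookingAt() has succeeded on that same Matcher; String.matches() uses a separate internal matcher, so it does not count. The sketch below shows the idea with a deliberately simplified pattern (an assumption, not the asker's regex) and a numeric range check for the port done in code; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentityParser {

    // Simplified, assumed pattern: user up to '#', domain up to ':', 1-5 digits.
    private static final Pattern IDENTITY =
            Pattern.compile("^([^#]+)#([^:]+):(\\d{1,5})$");

    // Returns {user, domain, port}, or null if the text is not valid.
    static String[] parse(String text) {
        Matcher m = IDENTITY.matcher(text);
        // group() is only legal after a successful match on this Matcher.
        if (!m.matches()) {
            return null;
        }
        int port = Integer.parseInt(m.group(3));
        if (port < 1 || port > 65535) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("maxim#domain.com:5555");
        System.out.println(parts[0] + ", " + parts[1] + ", " + parts[2]);
    }
}
```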
Yes, I think your regex is wrong.
public static final String MATCH_USER_DOMAIN_PORT = "^([0-9,a-zA-Z-.*_]+#[a-z0-9]+([\\.-][a-z0-9]+)*)+\\.[a-z]{2,}+:(6553[0-5]|655[0-2]\\d|65[0-4]\\d{2}|6[0-4]\\d{3}|[1-5]\\d{4}|[1-9]\\d{0,3})$";
To break it down:
^(
[0-9,a-zA-Z-.*_]+
any number of these characters, will match "maxim"
#
will match "#"
[a-z0-9]+
any number of these characters, will match "domain"
([\\.-][a-z0-9]+)*
will match ".com" (or theoretically ".somethingelse.com", nice)
)+
closes group #1 (the outer parenthesis; the inner one is group #2), which I believe covers the user and domain part, but what's with the "+" ?
\\.
nothing in the input string here
[a-z]{2,}+
is this for a country code like .eu ? Again, what's with the "+" ?
:
(6553[0-5]|655[0-2]\\d|65[0-4]\\d{2}|6[0-4]\\d{3}|[1-5]\\d{4}|[1-9]\\d{0,3})
seems overly complicated - probably don't do the numeric validation with the regex
$
Take a look at Using a regular expression to validate an email address for some advice on validation of email addresses.

How to take a substring using pattern match

I have
String content = "<a data-hovercard=\"/ajax/hovercard/group.php?id=180552688740185\">"
+ "<a data-hovercard=\"/ajax/hovercard/group.php?id=21392174\">";
I want to get every id between "group.php?id=" and "\""
Example: 180552688740185
Here is my code:
String content1 = "";
Pattern script1 = Pattern.compile("group.php?id=.*?\"");
Matcher mscript1 = script1.matcher(content);
while (mscript1.find()) {
content1 += mscript1.group() + "\n";
}
But for some reason it does not work.
Can you give me some advice?
Two things go wrong here. In a regex, . matches any character and ? makes the preceding p optional, so "group.php?id=" never matches the literal text group.php?id=. Also, .*? is broader than needed: the id consists only of digits, so \\d+ is enough.
You also need to capture the id in a group and read that group.
// Pattern.quote makes the special characters (. and ?) match literally
String str = Pattern.quote("group.php?id=") + "(\\d+)";
Pattern script1 = Pattern.compile(str);
Matcher mscript1 = script1.matcher(content);
while (mscript1.find()) {
content1 += mscript1.group(1) + "\n"; // Capture group 1 contains your id
}

substring between two delimiters

I have a string such as: "This is a URL http://www.google.com/MyDoc.pdf which should be used"
I just need to extract the URL that starts with http and ends with pdf:
http://www.google.com/MyDoc.pdf
String sLeftDelimiter = "http://";
String[] tempURL = sValueFromAddAtt.split(sLeftDelimiter );
String sRequiredURL = sLeftDelimiter + tempURL[1];
This gives me the output as "http://www.google.com/MyDoc.pdf which should be used"
Need help on this.
This kind of problem is what regular expressions were made for:
Pattern findUrl = Pattern.compile("\\bhttp.*?\\.pdf\\b");
Matcher matcher = findUrl.matcher("This is a URL http://www.google.com/MyDoc.pdf which should be used");
while (matcher.find()) {
System.out.println(matcher.group());
}
The regular expression explained:
\b before the "http" there is a word boundary (i.e. xhttp does not match)
http the string "http" (be aware that this also matches "https" and "httpsomething")
.*? any character (.) any number of times (*), but try to use the least amount of characters (?)
\.pdf the literal string ".pdf"
\b after the ".pdf" there is a word boundary (i.e. .pdfoo does not match)
If you would like to match only http and https, use this instead of http in your pattern:
https?: - this matches the string http, then an optional "s" (indicated by the ? after the s), and then a colon.
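A self-contained version of this approach, restricted to http and https, could look like the sketch below. Two things here are assumptions rather than part of the answer: the pattern requires :// after the scheme, and it uses \S*? instead of .*? so that a match cannot span whitespace. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfUrlFinder {

    // http or https, then the shortest run of non-space characters
    // that ends in ".pdf" followed by a word boundary.
    private static final Pattern PDF_URL =
            Pattern.compile("\\bhttps?://\\S*?\\.pdf\\b");

    // Collects every matching URL in the order it appears in the text.
    static List<String> findPdfUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = PDF_URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(findPdfUrls(
                "This is a URL http://www.google.com/MyDoc.pdf which should be used"));
    }
}
```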
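Why don't you use the startsWith("http://") and endsWith(".pdf") methods of the String class?
Both methods return a boolean; if both return true, the condition succeeds, otherwise it fails. Note that they test the whole string, so they are useful for checking a candidate token rather than for extracting the URL from the sentence.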
Try this (it relies on the word "which" following the URL in this particular sentence):
String text = "This is a URL http://www.google.com/MyDoc.pdf which should be used";
String url = text.substring(text.indexOf("http:"), text.indexOf("which")).trim();
You can use the power of regular expressions here.
First find the URL in the original string, then remove the other parts.
The following code shows my suggestion:
String regex = "\\b(http|ftp|file)://[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|]";
String str = "This is a URL http://www.google.com/MyDoc.pdf which should be used";
String[] splited = str.split(regex);
for(String current_part : splited)
{
str = str.replace(current_part, "");
}
System.out.println(str);
This snippet can retrieve a URL matching the pattern from any string.
You can add further protocols, such as https, to the protocol part of the regular expression above.
I hope my answer helps you ;)
public static String getStringBetweenStrings(String aString, String aPattern1, String aPattern2) {
// Find the first delimiter; indexOf returns -1 when it is absent
int start = aString.indexOf(aPattern1);
if (start < 0) {
return null;
}
int pos1 = start + aPattern1.length();
// Search for the second delimiter only after the end of the first one
int pos2 = aString.indexOf(aPattern2, pos1);
if (pos2 < 0) {
return null;
}
return aString.substring(pos1, pos2);
}
You can use String.replaceAll with a capturing group and back reference for a very concise solution:
String input = "This is a URL http://www.google.com/MyDoc.pdf which should be used";
System.out.println(input.replaceAll(".*(http.*?\\.pdf).*", "$1"));
Here's a breakdown for the regex: https://regexr.com/3qmus
