Extracting a substring from a given string pattern - java

Here are the Strings:
Example 1 - Movie=HULK/Incredible HULK;old_actor=Edward Norton;new_actor=Mark Ruffalo
Example 2 - Movie=HULK/Incredible HULK;old_movie_release_date=12 December 2008;new_movie_release_date=20 June 2012
How can I extract values like old_actor and new_actor from example 1, and old_movie_release_date and new_movie_release_date from example 2?
I'm new to regex and trying to see how this can be done. Thanks in advance.

You can do it using Java regex as follows:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str1 = "Movie=HULK/Incredible HULK;old_actor=Edward Norton;new_actor=Mark Ruffalo";
String str2 = "Movie=HULK/Incredible HULK;old_movie_release_date=12 December 2008;new_movie_release_date=20 June 2012";
String pattern1 = "Movie=(.*?);old_actor=(.*?);new_actor=(.*?)$";
String pattern2 = "Movie=(.*?);old_movie_release_date=(.*?);new_movie_release_date=(.*?)$";
Matcher m1 = Pattern.compile(pattern1).matcher(str1);
if (m1.find()) {
    System.out.println("old_actor: " + m1.group(2));
    System.out.println("new_actor: " + m1.group(3));
}
Matcher m2 = Pattern.compile(pattern2).matcher(str2);
if (m2.find()) {
    System.out.println("old_movie_release_date: " + m2.group(2));
    System.out.println("new_movie_release_date: " + m2.group(3));
}
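The two patterns above are tied to one exact key order. As a sketch of a more general alternative, a single pattern can pull out every key=value pair, assuming keys consist of word characters and values never contain ';'. The class name KeyValueRegex is illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueRegex {
    // A key (letters, digits, '_'), then '=', then everything up to the next ';'
    private static final Pattern PAIR = Pattern.compile("(\\w+)=([^;]*)");

    // Returns every key/value pair, in input order
    public static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> p1 = parse("Movie=HULK/Incredible HULK;old_actor=Edward Norton;new_actor=Mark Ruffalo");
        System.out.println(p1.get("old_actor")); // Edward Norton
        System.out.println(p1.get("new_actor")); // Mark Ruffalo

        Map<String, String> p2 = parse("Movie=HULK/Incredible HULK;old_movie_release_date=12 December 2008;new_movie_release_date=20 June 2012");
        System.out.println(p2.get("old_movie_release_date")); // 12 December 2008
        System.out.println(p2.get("new_movie_release_date")); // 20 June 2012
    }
}
```

Because the keys are looked up by name, this works for both example strings without a separate pattern per string.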

You could use String.split(String regex).
First call str.split(";"), which gives you an array String[] values whose elements look like Movie=moviename. Then call split("=") on each element of that array:
String[] values = str.split(";");
for (String pair : values) {
    String[] keyValue = pair.split("=");
    // keyValue[0] is the key, keyValue[1] is the value
}
This produces an array of length 2 per pair, with the key at index 0 and the value at index 1.
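Put together as a complete program, the split approach might look like the sketch below. It assumes pairs are separated by ';' and that the first '=' in each pair separates key from value (hence the limit of 2 passed to split). The class and method names are hypothetical, and a LinkedHashMap keeps the pairs in input order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitPairs {
    public static Map<String, String> toMap(String input) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String pair : input.split(";")) {
            // limit 2: split only on the first '=', so an '=' inside a value is kept
            String[] keyValue = pair.split("=", 2);
            if (keyValue.length == 2) { // skip fragments that have no '='
                map.put(keyValue[0], keyValue[1]);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> props = toMap("Movie=HULK/Incredible HULK;old_actor=Edward Norton;new_actor=Mark Ruffalo");
        System.out.println(props.get("old_actor")); // Edward Norton
        System.out.println(props.get("new_actor")); // Mark Ruffalo
    }
}
```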

Just an enhancement to @DerLebkuchenmann's solution:
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
public class Main {
    public static void main(String[] args) {
        String str1 = "Movie=HULK/Incredible HULK;old_actor=Edward Norton;new_actor=Mark Ruffalo";
        String str2 = "Movie=HULK/Incredible HULK;old_movie_release_date=12 December 2008;new_movie_release_date=20 June 2012";
        Map<String, String> props1 = getProps(str1);
        Map<String, String> props2 = getProps(str2);
        System.out.println(String.format("Old Actor: %s", props1.get("old_actor")));
        System.out.println(String.format("New Actor: %s", props1.get("new_actor")));
        System.out.println(String.format("Old Movie Release Date: %s", props2.get("old_movie_release_date")));
        System.out.println(String.format("New Movie Release Date: %s", props2.get("new_movie_release_date")));
    }
    private static Map<String, String> getProps(String str) {
        return Arrays.stream(str.split(";"))
                .map(pair -> pair.split("=", 2)) // split on the first '=' only
                .collect(Collectors.toMap(crumbs -> crumbs[0], crumbs -> crumbs[1]));
    }
}

Another approach, using StringTokenizer and assembling a LinkedHashMap for the result (a LinkedHashMap keeps insertion order, so the printout is deterministic):
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
public class Main {
    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        // both ';' and '=' are delimiters, so tokens alternate key, value, key, value...
        StringTokenizer st = new StringTokenizer("Movie=HULK/Incredible HULK;old_actor=Edward Norton;new_actor=Mark Ruffalo", ";=");
        while (st.hasMoreTokens()) {
            String key = st.nextToken();
            if (st.hasMoreTokens()) { // ensure well-formed
                m.put(key, st.nextToken());
            }
        }
        System.out.println(m);
    }
}
Prints:
{Movie=HULK/Incredible HULK, old_actor=Edward Norton, new_actor=Mark Ruffalo}

Related

Parsing String in Java, then storing in variables

I need help parsing a string in Java. I'm very new to Java and am not sure how to go about it.
Suppose the string I want to parse is:
String str = "NC43-EB2;49.21716;-122.667252;49.216757;-122.666235;";
What I want is:
name = "C43"
direction = "EB2"
Then I'd like to store the two coordinates as pairs:
c1 = 49.21716;-122.667252
c2 = 49.216757;-122.666235
and then make a List to store c1 and c2.
So far I have this:
void parseOnePattern(String str) {
    String toParse = str;
    name = toParse.substring(1, toParse.indexOf("-"));
    direction = toParse.substring(toParse.indexOf("-") + 1, toParse.indexOf(";"));
}
I'm not sure how to move forward. Any help will be appreciated.
A couple of split() calls plus substring() may solve your problem:
String str = "NC43-EB2;49.21716;-122.667252;49.216757;-122.666235;";
String[]s = str.split(";");
String[]n = s[0].split("-");
String name = n[0].substring(1);
String direction = n[1];
String c1 = s[1] +";"+s[2];
String c2 = s[3] +";"+s[4];
System.out.println(name + " " + direction);
System.out.println(c1 + " " + c2);
I hope this helps you.
Welcome to Java and the many operations it lets you perform on Strings. Here is code for some of them to get you started:
public void breakString() {
    String str = "NC43-EB2;49.21716;-122.667252;49.216757;-122.666235";
    // Breaks str into "NC43-EB2", "49.21716", "-122.667252", "49.216757" and "-122.666235"
    String[] allValues = str.split(";", -1);
    String[] nameValuePair = allValues[0].split("-");
    // substring selects only the specified portion of the string ("C43" here)
    String name = nameValuePair[0].substring(1, 4);
    // "49.21716" is a String, so parse it to a double if you want to do numeric operations on it
    double c1 = 0d;
    try {
        c1 = Double.parseDouble(allValues[1]);
    } catch (NumberFormatException e) {
        // TODO: take corrective measures or simply log the error
    }
}
I suggest you go through the documentation of the String class, learn more about operations like splitting strings and converting one data type to another, and use an IDE like Eclipse, which has very helpful features. Also, I haven't tested the code above, so use it as a reference rather than a template.
OK, I made this:
public static void main(String[] args) {
    String str = "NC43-EB2;49.21716;-122.667252;49.216757;-122.666235;";
    String[] strSplit = str.split(";");
    String[] nameSplit = strSplit[0].split("-");
    String name = nameSplit[0].replace("N", "");
    String direction = nameSplit[1];
    String coordinateOne = strSplit[1] + ";" + strSplit[2] + ";";
    String coordinateTwo = strSplit[3] + ";" + strSplit[4] + ";";
    System.out.println("Name: " + name);
    System.out.println("Direction: " + direction);
    System.out.println("Coordinate One: " + coordinateOne);
    System.out.println("Coordinate Two: " + coordinateTwo);
}
Output:
Name: C43
Direction: EB2
Coordinate One: 49.21716;-122.667252;
Coordinate Two: 49.216757;-122.666235;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
String str3 = "NC43-EB2;49.21716;-122.667252;49.216757;-122.666235;";
// the hard-coded indices below only work for this exact string
String sub = str3.substring(0, 4);  // sub = NC43
String sub4 = str3.substring(5, 9); // sub4 = EB2;
HashMap<String, String> hm = new HashMap<>();
hm.put(str3.substring(9, 30), str3.substring(30));
hm.forEach((c1, c2) -> {
    System.out.println(c1 + " - " + c2); // 49.21716;-122.667252; - 49.216757;-122.666235;
});
// edit: if you want a list instead of pairs (I assumed each part was lat + lon)
List<String> coordList = new ArrayList<>();
coordList.add(str3.substring(9, 30));
coordList.add(str3.substring(30));
coordList.forEach(coord -> {
    System.out.println(coord);
});
// output:
// 49.21716;-122.667252;
// 49.216757;-122.666235;
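The question also asks for the coordinates to be stored as pairs in a List. A minimal sketch, assuming a hypothetical Coordinate class with double latitude and longitude fields, and the layout shown in the question (a header, then alternating latitude/longitude values):

```java
import java.util.ArrayList;
import java.util.List;

public class RouteParser {
    // Hypothetical value type for one latitude/longitude pair
    public static final class Coordinate {
        public final double latitude;
        public final double longitude;

        public Coordinate(double latitude, double longitude) {
            this.latitude = latitude;
            this.longitude = longitude;
        }

        @Override
        public String toString() {
            return "(" + latitude + ", " + longitude + ")";
        }
    }

    // Treats every field after the "NC43-EB2" header as alternating latitude/longitude
    public static List<Coordinate> coordinates(String str) {
        String[] parts = str.split(";"); // the trailing empty string is dropped
        List<Coordinate> coords = new ArrayList<>();
        for (int i = 1; i + 1 < parts.length; i += 2) {
            coords.add(new Coordinate(Double.parseDouble(parts[i]), Double.parseDouble(parts[i + 1])));
        }
        return coords;
    }

    public static void main(String[] args) {
        String str = "NC43-EB2;49.21716;-122.667252;49.216757;-122.666235;";
        String[] header = str.split(";")[0].split("-", 2); // "NC43", "EB2"
        String name = header[0].substring(1);               // drop the leading 'N'
        String direction = header[1];
        List<Coordinate> coords = coordinates(str);
        System.out.println(name + " " + direction); // C43 EB2
        System.out.println(coords.size() + " coordinates: " + coords);
    }
}
```

Parsing to double up front means malformed numbers fail early with a NumberFormatException instead of surfacing later.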

Java replaceAll for string

I have a string:
100-200-300-400
I want to replace each dash with "," and add single quotes so it becomes:
'100','200','300','400'
My current code is only able to replace "-" with ","; how can I add the single quotes?
String str1 = "100-200-300-400";
String split = str1.replaceAll("-", ",");
if (split.endsWith(",")) {
    split = split.substring(0, split.length() - 1);
}
You can use:
String split = str1.replaceAll("-", "','");
split = "'" + split + "'";
As an alternative, if you are using Java 8 you could split the String on - and collect the parts with a StringJoiner. This is a bit less time-efficient, but it is safer if you have to take, for example, a trailing - into account (split() drops trailing empty strings).
A small sample could look like this:
import java.util.StringJoiner;
String string = "100-200-300-400-";
String[] parts = string.split("-");
StringJoiner joiner = new StringJoiner("','", "'", "'");
for (String s : parts) {
    joiner.add(s);
}
System.out.println(joiner); // '100','200','300','400'
This will work for you:
public static void main(String[] args) throws Exception {
    String s = "100-200-300-400";
    System.out.println(s.replaceAll("(\\d+)(-|$)", "'$1',").replaceAll(",$", ""));
}
Output:
'100','200','300','400'
Or, if you don't want to call replaceAll() twice:
public static void main(String[] args) throws Exception {
    String s = "100-200-300-400";
    s = s.replaceAll("(\\d+)(-|$)", "'$1',");
    System.out.println(s.substring(0, s.length() - 1));
}
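Java 8 streams can produce the same result without any regex replacement. A sketch, assuming '-' is the only separator in the input (quoteAll is a hypothetical helper name):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class QuoteJoin {
    // Wraps each '-'-separated part in single quotes and joins the parts with ','
    public static String quoteAll(String input) {
        return Arrays.stream(input.split("-"))
                .map(part -> "'" + part + "'")
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(quoteAll("100-200-300-400")); // '100','200','300','400'
    }
}
```

Quoting each element in map() before joining avoids having to strip a trailing comma afterwards.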

How to filter string in java

I have the following string in Java, for example:
String abc = "nama=john; class=6; height=170; weight=70";
How can I extract the value of height from the String?
Expected output: height=170
This is the code I have written so far:
String abc = "nama=john; class=6; height=170; weight=70";
String[] tokens = abc.split("; ");
List<String> listString = new ArrayList<String>();
String mod = new String();
for (String s : tokens) {
mod = s;
System.out.println(s);
listString.add(mod);
}
System.out.println(listString.size());
But I do not get the height value. Instead, I get the value of height only as part of a String.
Thanks.
With this code snippet:
String abc = "nama=john; class=6; height=170; weight=70";
for (String sa : abc.split(";")) {
    System.out.println(sa.trim());
}
you get this output:
nama=john
class=6
height=170
weight=70
If you want to add a specific String to a list, pass sa.trim() to List.add(). To find the height entry, you can test if (sa.trim().startsWith("height")), and then you have the String you need.
You can use this regex:
(?<=height=)(\d+)(?=;|\Z)
In Java you can implement it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("(?<=height=)(\\d+)(?=;|\\Z)");
// create the matcher object
Matcher m = pattern.matcher(abc);
if (m.find()) {
    String height = m.group(0); // "170"
    System.out.println(height);
} else {
    System.out.println("not found");
}
Here is an example: https://regex101.com/r/iM3gY0/2
and here is an executable snippet: https://ideone.com/azngNt
If you want all parameters, you can use this regex:
(\w+)=(\w+)(?=;|"|\Z)
so in Java the Pattern becomes:
Pattern pattern = Pattern.compile("(\\w+)=(\\w+)(?=;|\"|\\Z)");
and here is Regex101 again: https://regex101.com/r/uT6uK1/3
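To collect all parameters at once, one option is a loop over Matcher.find() that reads group(1) as the key and group(2) as the value. This sketch drops the lookahead, since \w+ already stops at the ';' and the space, and it assumes keys and values consist only of word characters. The class name AllParams is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllParams {
    // group(1) = key, group(2) = value; \w covers letters, digits and '_'
    private static final Pattern PARAM = Pattern.compile("(\\w+)=(\\w+)");

    public static Map<String, String> parse(String input) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = PARAM.matcher(input);
        while (m.find()) {
            params.put(m.group(1), m.group(2));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = parse("nama=john; class=6; height=170; weight=70");
        int height = Integer.parseInt(params.get("height"));
        System.out.println(height); // 170
        System.out.println(params); // {nama=john, class=6, height=170, weight=70}
    }
}
```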
@Fast Snail, you are correct, but I think they wanted an integer value:
final String string = "nama=john; class=6; height=170; weight=70";
final String[] tokens = string.split("; ");
for (String token : tokens) {
    if (token.startsWith("height=")) {
        System.out.println(token);
        final String[] heightSplit = token.split("=");
        int heightValue = Integer.parseInt(heightSplit[1]);
        System.out.println("height=" + heightValue);
    }
}
System.out.println(tokens.length);
This should do what you need.
Use StringTokenizer:
import java.util.StringTokenizer;
public class StringVal {
    public static void main(String[] args) {
        // both ';' and ' ' are delimiters, so the tokens are "nama=john", "class=6", ...
        StringTokenizer st = new StringTokenizer("nama=john; class=6; height=170; weight=70", "; ");
        while (st.hasMoreTokens()) {
            String abc = st.nextToken();
            if (abc.startsWith("height=")) {
                StringTokenizer s = new StringTokenizer(abc, "=");
                s.nextToken();                     // skip the key "height"
                System.out.println(s.nextToken()); // prints 170
                break;
            }
        }
    }
}
With Java 8 you can do the following:
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
Map<String, String> map = Arrays.stream("nama=john; class=6; height=170; weight=70".split(";"))
        .map(s -> s.trim().split("="))
        .collect(Collectors.toMap(s -> s[0], s -> s[1]));
System.out.println(map.get("height")); // 170
Thanks, MrT.
Resolved; here is my final code:
String abc = "nama=john; class=6; height=170; weight=70";
String[] tokens = abc.split("; ");
List<String> listString = new ArrayList<String>();
String mod = new String();
for(String s:tokens){
mod =s;
System.out.println(s);
listString.add(mod);
}
System.out.println(listString.size());
for(String abcd :listString){
if(abcd.trim().startsWith("height")){
System.out.println(abcd);
}
}

regex pattern to match particular uri from list of urls

I have a list of URLs (lMapValues) with wildcards, as shown in the code below.
I need to match a URI against this list to find the matching URL.
In the code below, I should get the URL stored under key d in the map m as the match.
That is, if part of the URI matches a URL in the list, that URL should be picked.
I tried splitting the URI into tokens and then checking each token against the list lMapValues. However, it's not giving me the correct result. Below is the code for that.
public class Matcher
{
public static void main( String[] args )
{
Map m = new HashMap();
m.put("a","https:/abc/eRControl/*");
m.put("b","https://abc/xyz/*");
m.put("c","https://work/Mypage/*");
m.put("d","https://cr/eRControl/*");
m.put("e","https://custom/MyApp/*");
List lMapValues = new ArrayList(m.values());
List tokens = new ArrayList();
String uri = "cr/eRControl/work/custom.jsp";
StringTokenizer st = new StringTokenizer(uri,"/");
while(st.hasMoreTokens()) {
String token = st.nextToken();
tokens.add(token);
}
for(int i=0;i<lMapValues.size();i++) {
String value = (String)lMapValues.get(i);
String patternString = "\\b(" + StringUtils.join(tokens, "|") + ")\\b";
Pattern pattern = Pattern.compile(patternString);
java.util.regex.Matcher matcher = pattern.matcher(value);
while (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(value);
}
}
}
}
Please help me with regex pattern to achieve above objective.
Any help will be appreciated.
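It's much simpler to check whether a string starts with a certain value using String.indexOf(): a result of 0 means the value occurs at the very start (String.startsWith() performs the same check).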
String[] urls = {
"abc/eRControl",
"abc/xyz",
"work/Mypage",
"cr/eRControl",
"custom/MyApp"
};
String uri = "cr/eRControl/work/custom.jsp";
for (String url : urls) {
if (uri.indexOf(url) == 0) {
System.out.println("Matched: " + url);
}else{
System.out.println("Not matched: " + url);
}
}
Also, there is no need to store the scheme in the map if you are never going to match against it.
If I understand your goal correctly, you might not even need regular expressions here.
Try this...
package test;
import java.util.HashSet;
import java.util.Set;
public class PartialURLMapper {
private static final Set<String> PARTIAL_URLS = new HashSet<String>();
static {
PARTIAL_URLS.add("cr/eRControl");
// TODO add more partial Strings to check against input
}
public static String getPartialStringIfMatching(final String input) {
if (input != null && !input.isEmpty()) {
for (String partial: PARTIAL_URLS) {
// this will be case-sensitive
if (input.contains(partial)) {
return partial;
}
}
}
// no partial match found, we return an empty String
return "";
}
// main method just to add example
public static void main(String[] args) {
System.out.println(PartialURLMapper.getPartialStringIfMatching("cr/eRControl/work/custom.jsp"));
}
}
... it will return:
cr/eRControl
The problem is that i is acting as a key not as an index on
String value = (String)lMapValues.get(i);
you will be better served exchanging the map for a list, and using the for each loop.
List<String> patterns = new ArrayList<String>();
...
for (String pattern : patterns) {
....
}
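If the goal is to map the URI back to the key of the matching entry (d in the question), one option, sketched here with hypothetical names, is to turn each wildcard URL into a plain prefix by dropping the scheme and the trailing * and then test startsWith(). It assumes * only ever appears at the end of a pattern; the sample map uses https:// for every entry (the question's first entry has a single slash).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UrlPrefixMatcher {
    // Strips an optional "scheme://" and a trailing "*" wildcard,
    // e.g. "https://cr/eRControl/*" -> "cr/eRControl/"
    static String toPrefix(String pattern) {
        int schemeEnd = pattern.indexOf("://");
        String p = schemeEnd >= 0 ? pattern.substring(schemeEnd + 3) : pattern;
        return p.endsWith("*") ? p.substring(0, p.length() - 1) : p;
    }

    // Returns the key of the first pattern whose prefix starts the URI, or null
    public static String findKey(Map<String, String> patterns, String uri) {
        for (Map.Entry<String, String> e : patterns.entrySet()) {
            if (uri.startsWith(toPrefix(e.getValue()))) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("a", "https://abc/eRControl/*");
        m.put("b", "https://abc/xyz/*");
        m.put("c", "https://work/Mypage/*");
        m.put("d", "https://cr/eRControl/*");
        m.put("e", "https://custom/MyApp/*");
        System.out.println(findKey(m, "cr/eRControl/work/custom.jsp")); // d
    }
}
```

Keeping the trailing '/' in the prefix means cr/eRControl/ matches cr/eRControl/work/custom.jsp but not a sibling path such as cr/eRControlOld/.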

Tokenize a string with a space in java

I want to tokenize a string like this
String line = "a=b c='123 456' d=777 e='uij yyy'";
I cannot split it like this:
String[] words = line.split(" ");
Any idea how I can split it so that I get tokens like
a=b
c='123 456'
d=777
e='uij yyy'
The simplest way to do this is by hand implementing a simple finite state machine. In other words, process the string a character at a time:
When you hit a space, break off a token;
When you hit a quote keep getting characters until you hit another quote.
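The state machine described above might be sketched in Java as follows. This is an illustrative sketch, not a complete parser: it assumes single quotes are never escaped or nested, and it keeps the quotes in the output tokens, as in the desired output in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareTokenizer {
    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // state: are we inside '...'?
        for (char c : line.toCharArray()) {
            if (c == '\'') {
                inQuotes = !inQuotes; // entering or leaving a quoted value
                current.append(c);    // keep the quote in the token
            } else if (c == ' ' && !inQuotes) {
                if (current.length() > 0) { // a space outside quotes ends a token
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("a=b c='123 456' d=777 e='uij yyy'")) {
            System.out.println(token);
        }
    }
}
```

Unlike a regex split, the single inQuotes flag makes it easy to extend later, for example to report an unterminated quote when the loop ends with inQuotes still true.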
Depending on the formatting of your original string, you should be able to use a regular expression as a parameter to the java "split" method: Click here for an example.
The example doesn't use the regular expression that you would need for this task though.
You can also use this SO thread as a guideline (although it's in PHP) which does something very close to what you need. Manipulating that slightly might do the trick (although having quotes be part of the output or not may cause some issues). Keep in mind that regex is very similar in most languages.
Edit: going too much further into this type of task may be ahead of the capabilities of regex, so you may need to create a simple parser.
line.split(" (?=[a-z]+=)")
correctly gives:
a=b
c='123 456'
d=777
e='uij yyy'
Make sure you adapt the [a-z]+ part if your key structure changes.
Edit: this solution can fail badly if the value part of a pair contains a "=" character preceded by a space and letters, for example c='x y=z'.
StreamTokenizer can help, although it is easiest to set up to break on '=', as it will always break at the start of a quoted string:
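import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
String s = "a=b c='123 456' d=777 e='uij yyy'";
StreamTokenizer st = new StreamTokenizer(new StringReader(s));
// treat digits as word characters, so 777 comes back as the word "777" rather than the number 777.0
st.ordinaryChars('0', '9');
st.wordChars('0', '9');
while (st.nextToken() != StreamTokenizer.TT_EOF) { // nextToken() can throw IOException
    switch (st.ttype) {
        case StreamTokenizer.TT_NUMBER:
            System.out.println(st.nval);
            break;
        case StreamTokenizer.TT_WORD:
            System.out.println(st.sval);
            break;
        case '=':
            System.out.println("=");
            break;
        default: // quoted string: sval holds the text between the quotes
            System.out.println(st.sval);
    }
}
outputs
a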
=
b
c
=
123 456
d
=
777
e
=
uij yyy
If you leave out the two lines that convert numeric characters to alpha, then you get d=777.0, which might be useful to you.
Assumptions:
Your variable name ('a' in the assignment 'a=b') can be of length 1 or more
Your variable name ('a' in the assignment 'a=b') cannot contain the space character; anything else is fine.
Validation of your input is not required (input assumed to be in valid a=b format)
This works fine for me.
Input:
a=b abc='123 456' &=777 #='uij yyy' ABC='slk slk' 123sdkljhSDFjflsakd#*#&=456sldSLKD)#(
Output:
a=b
abc='123 456'
&=777
#='uij yyy'
ABC='slk slk'
123sdkljhSDFjflsakd#*#&=456sldSLKD)#(
Code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
// SPACE CHARACTER followed by
// sequence of non-space characters of 1 or more followed by
// first occurring EQUALS CHARACTER
final static String regex = " [^ ]+?=";
// static pattern defined outside so that you don't have to compile it
// for each method call
static final Pattern p = Pattern.compile(regex);
public static List<String> tokenize(String input, Pattern p){
input = input.trim(); // this is important for "last token case"
// see end of method
Matcher m = p.matcher(input);
ArrayList<String> tokens = new ArrayList<String>();
int beginIndex=0;
while(m.find()){
int endIndex = m.start();
tokens.add(input.substring(beginIndex, endIndex));
beginIndex = endIndex+1;
}
// LAST TOKEN CASE
//add last token
tokens.add(input.substring(beginIndex));
return tokens;
}
private static void println(List<String> tokens) {
for(String token:tokens){
System.out.println(token);
}
}
public static void main(String args[]){
String test = "a=b " +
"abc='123 456' " +
"&=777 " +
"#='uij yyy' " +
"ABC='slk slk' " +
"123sdkljhSDFjflsakd#*#&=456sldSLKD)#(";
List<String> tokens = RegexTest.tokenize(test, p);
println(tokens);
}
}
Or, with a regex for tokenizing, and a little state machine that just adds the key/val to a map:
String line = "a = b c='123 456' d=777 e = 'uij yyy'";
Map<String,String> keyval = new HashMap<String,String>();
String state = "key";
Matcher m = Pattern.compile("(=|'[^']*?'|[^\\s=]+)").matcher(line);
String key = null;
while (m.find()) {
String found = m.group();
if (state.equals("key")) {
if (found.equals("=") || found.startsWith("'"))
{ System.err.println ("ERROR"); }
else { key = found; state = "equals"; }
} else if (state.equals("equals")) {
if (! found.equals("=")) { System.err.println ("ERROR"); }
else { state = "value"; }
} else if (state.equals("value")) {
if (key == null) { System.err.println ("ERROR"); }
else {
if (found.startsWith("'"))
found = found.substring(1,found.length()-1);
keyval.put (key, found);
key = null;
state = "key";
}
}
}
if (! state.equals("key")) { System.err.println ("ERROR"); }
System.out.println ("map: " + keyval);
prints (HashMap iteration order is not guaranteed, so the entries may appear in a different order):
map: {d=777, e=uij yyy, c=123 456, a=b}
It does some basic error checking, and takes the quotes off the values.
This solution is both general and compact (it is effectively the regex version of cletus' answer):
String line = "a=b c='123 456' d=777 e='uij yyy'";
Matcher m = Pattern.compile("('[^']*?'|\\S)+").matcher(line);
while (m.find()) {
System.out.println(m.group()); // or whatever you want to do
}
In other words, find all runs of characters that are combinations of quoted strings or non-space characters; nested quotes are not supported (there is no escape character).
public static void main(String[] args) {
String token;
String value="";
HashMap<String, String> attributes = new HashMap<String, String>();
String line = "a=b c='123 456' d=777 e='uij yyy'";
StringTokenizer tokenizer = new StringTokenizer(line," ");
while(tokenizer.hasMoreTokens()){
token = tokenizer.nextToken();
value = token.contains("'") ? value + " " + token : token ;
if(!value.contains("'") || value.endsWith("'")) {
//Split the strings and get variables into hashmap
attributes.put(value.split("=")[0].trim(),value.split("=")[1]);
value ="";
}
}
System.out.println(attributes);
}
Output (HashMap iteration order is not guaranteed, so the order may differ):
{d=777, a=b, e='uij yyy', c='123 456'}
Note that a run of consecutive spaces inside a quoted value is collapsed to a single space.
Here the attributes HashMap contains the values.
import java.io.*;
import java.util.Scanner;
public class ScanXan {
public static void main(String[] args) throws IOException {
Scanner s = null;
try {
s = new Scanner(new BufferedReader(new FileReader("<file name>")));
while (s.hasNext()) {
System.out.println(s.next());
// TODO: write the token to your output file
}
} finally {
if (s != null) {
s.close();
}
}
}
}
java.util.StringTokenizer tokenizer = new java.util.StringTokenizer(line, " ");
while (tokenizer.hasMoreTokens()) {
String token = tokenizer.nextToken();
int index = token.indexOf('=');
String key = token.substring(0, index);
String value = token.substring(index + 1);
}
Have you tried splitting by '=' and creating a token out of each pair of the resulting array?
