Need Regular Expression to parse multi-line environmental variables - java

I want to parse a file that is a list of environmental variables similar to this example:
TPS_LIB_DIR = "$DEF_VERSION_DIR\lib\ver215";
TPS_PH_DIR = "$DEF_VERSION_DIR";
TPS_SCHEMA_DIR = "~TPS_DIR\Supersedes\code;" +
"~TPR_DIR\..\Supersedes\code;" +
"~TPN_DIR\..\..\Supersedes\code;" +
"$TPS_VERSION_DIR";
TPS_LIB_DIR = "C:\prog\lib";
BASE_DIR = "C:\prog\base";
SPARS_DIR = "C:\prog\spars";
SIGNALFILE_DIR = "E:\SIGNAL_FILES";
SIGNALFILE2_DIR = "E:\SIGNAL_FILES2";
SIGNALFILE3_DIR = "E:\SIGNAL_FILES2";
I came up with this regular expression that matches the single line definitions fine, but it will not match the multi-line definitions.
(\w+)\s*=\s*(.*);[\r\n]+
Does anyone know of a regular expression which will parse all lines in this file where the environmental variable name is in group 1 and the value (on right side of =) is in group 2? Even better would be if the multiple paths were in separate groups, but I can handle that part manually.
UPDATE:
Here is what I ended up implementing. The first pattern, "Pattern p", matches the individual environmental variable blocks. The second pattern, "Pattern valpattern", parses the one or more values of each environmental variable. Hope someone finds this useful.
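import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
private static void parse(File filename) {
// One NAME = ...; block; [\s\S] also matches line breaks, so multi-line values are included
Pattern p = Pattern.compile("(\\w+)\\s*=\\s*([\\s\\S]+?\";)");
// One quoted value inside a block; "." stops at line breaks, so each line yields one value
Pattern valpattern = Pattern.compile("\\s*\"(.+)\"\\s*");
try {
String str = readFile(filename, StandardCharsets.UTF_8);
Matcher matcher = p.matcher(str);
while(matcher.find()) {
String key = matcher.group(1);
Matcher valmatcher = valpattern.matcher(matcher.group(2));
System.out.println(key);
while(valmatcher.find()) {
System.out.println("\t" + valmatcher.group(1).replaceAll(System.getProperty("line.separator"), ""));
}
}
} catch (IOException e) {
System.err.println("Error: ProcessENV.parse -- problem parsing file: " + filename);
e.printStackTrace();
}
}
// Reads the whole file into one String so the patterns can match across lines
static String readFile(File file, Charset encoding) throws IOException {
byte[] encoded = Files.readAllBytes(file.toPath());
return new String(encoded, encoding);
}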

It is simpler to split on '=' and '";'.
[ c.strip().split(' = ') for c in s.split('";') ]
Or, with a nested comprehension, to get the individual paths:
[ [k.strip(), [x.strip().strip('"') for x in v.split('+')]] for k, v in (c.split('=', 1) for c in s.split('";') if '=' in c) ]
The split could also be done with re, adding \s* to remove the surrounding spaces:
r = re.split(r'\s*=\s*|";\s*', text)
The even elements r[::2] are the variable names and the odd elements r[1::2] are the values (r[::2] also ends with the empty string left after the last '";').
Then strip the leading quote and any extra whitespace from the values.
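For comparison, here is the same split in Java, since the question is about Java. This is a sketch, not the answer's own code: the class SplitEnvParser is hypothetical, and it assumes no value contains = or the sequence ";. Multi-line values come back as their raw text (inner quotes, + and line breaks included) and would still need the per-path cleanup described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SplitEnvParser {
    // Splits on "=" (with surrounding whitespace) or on '";' (plus trailing whitespace),
    // so the pieces alternate: name, value, name, value, ...
    static Map<String, String> parse(String text) {
        String[] r = text.split("\\s*=\\s*|\";\\s*");
        Map<String, String> vars = new LinkedHashMap<>();
        // Even indexes are names, odd indexes are values; the trailing empty piece
        // after the final '";' is dropped by String.split automatically.
        for (int i = 0; i + 1 < r.length; i += 2) {
            String name = r[i].trim();
            String value = r[i + 1].trim();
            // Remove the opening quote that the split leaves on each value
            if (value.startsWith("\"")) {
                value = value.substring(1);
            }
            vars.put(name, value);
        }
        return vars;
    }
}
```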

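You can use the following regex:
(\w+)\s*=\s*([\s\S]+?)";
It matches Group 1, one or more word characters (the name), then zero or more whitespace characters, an equals sign and zero or more whitespace characters, then Group 2, one or more characters of any kind (non-greedy; [\s\S] also matches line breaks), and finally a double quote followed by a semicolon. Group 2 therefore includes the opening quote of the value but not the closing one.
Because [\s\S] can cross line breaks and the non-greedy group stops at the first ";, this matches every definition, including the multi-line ones.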
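A minimal Java sketch of using this regex, which also splits each value into its separate quoted paths. The class and method names are hypothetical, and it assumes paths never contain a double quote; a later definition of the same name (such as the second TPS_LIB_DIR in the example) replaces the earlier one in the map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EnvRegexParser {
    // Group 1: the name; group 2: everything up to (not including) the first '";'
    private static final Pattern DEFINITION =
            Pattern.compile("(\\w+)\\s*=\\s*([\\s\\S]+?)\";");
    // One "..." piece of a value; group 1 is the text between the quotes
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> vars = new LinkedHashMap<>();
        Matcher m = DEFINITION.matcher(text);
        while (m.find()) {
            // Re-append the closing quote consumed by '";' so every piece is "..."
            Matcher piece = QUOTED.matcher(m.group(2) + "\"");
            List<String> paths = new ArrayList<>();
            while (piece.find()) {
                paths.add(piece.group(1));
            }
            vars.put(m.group(1), paths);
        }
        return vars;
    }
}
```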

Related

Two separate patterns and matchers (java)

I'm working on a simple bot for Discord. The first pattern (reading) works fine and I get the results I'm looking for, but the second one doesn't seem to work and I can't figure out why.
Any help would be appreciated.
public void onMessageReceived(MessageReceivedEvent event) {
if (event.getMessage().getContent().startsWith("!")) {
String output, newUrl;
String word, strippedWord;
String url = "http://jisho.org/api/v1/search/words?keyword=";
Pattern reading;
Matcher matcher;
word = event.getMessage().getContent();
strippedWord = word.replace("!", "");
newUrl = url + strippedWord;
//Output contains the raw text from jisho
output = getUrlContents(newUrl);
//Searching through the raw text to pull out the first "reading: "
reading = Pattern.compile("\"reading\":\"(.*?)\"");
matcher = reading.matcher(output);
//Searching through the raw text to pull out the first "english_definitions: "
Pattern def = Pattern.compile("\"english_definitions\":[\"(.*?)]");
Matcher matcher2 = def.matcher(output);
event.getTextChannel().sendMessage(matcher2.toString());
if (matcher.find() && matcher2.find()) {
event.getTextChannel().sendMessage("Reading: "+matcher.group(1)).queue();
event.getTextChannel().sendMessage("Definition: "+matcher2.group(1)).queue();
}
else {
event.getTextChannel().sendMessage("Word not found").queue();
}
}
}
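You have to escape the [ character as \\[ (one backslash for the Java string literal and one for the regex). Unescaped, [ starts a character class, so your pattern looks for a single one of the characters "(.*?) instead of a literal [, and it contains no capturing group at all. You also forgot the closing \".
The correct pattern looks like this:
Pattern def = Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");
In the output, you might want to re-add the \" at the start and end: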
event.getTextChannel().sendMessage("Definition: \""+matcher2.group(1) + "\"").queue();
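A self-contained check of the corrected pattern. The sample response in the test is made up for illustration (it is not real jisho.org output), and the class JishoRegexDemo is hypothetical. Note one behavior: when a word has several definitions, the lazy (.*?) stops at the first "], so group 1 holds all of them with the inner quotes, e.g. dog","hound.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JishoRegexDemo {
    // Pattern from the question that already works
    static final Pattern READING = Pattern.compile("\"reading\":\"(.*?)\"");
    // Corrected pattern: [ escaped as \\[ and the closing \" added before ]
    static final Pattern DEF = Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");

    // Returns group 1 of the first match, or null when the pattern does not match
    static String firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }
}
```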

complex regular expression in Java

I have a rather complex (at least it seems complex to me) problem that I'm using regular expressions in Java to solve:
I receive text strings that must have the format:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
I started with a regular expression for extracting the text between the M:/:D:/:C:/:Q: as:
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
And that works fine if the <either a url or string> is just an alphanumeric string. But it all falls apart when the embedded string is a url of the format:
tcp://someurl.something:port
Can anyone help me adjust the above regex so that it extracts the text after :D: whether it is a URL or an alphanumeric string?
Here's an example:
public static void main(String[] args) {
String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
boolean matchFound = false;
ArrayList<String> values = new ArrayList<>();
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
Matcher m3 = Pattern.compile(pattern2).matcher(name);
while (m3.find()) {
matchFound = true;
String m = m3.group(2);
System.out.println("regex found match: " + m);
values.add(m);
}
}
For the above example, the results I want are:
myString1
tcp://someurl.com:8989
myString2
1
Note that the strings can be of variable length and alphanumeric, but some other characters must be allowed too (such as the :// and the . and - characters in the URL format).
You mention that the format is constant:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
Capture groups can do this for you with the pattern:
"M:(.*):D:(.*):C:(.*):Q:(.*)"
Or you can use String.split() with the pattern "M:|:D:|:C:|:Q:". However, split() returns an empty element at index 0 (for the empty text before the leading M:); the other elements follow in order.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public static void main(String[] args) throws Exception {
System.out.println("Regex: ");
String data = "M:<some text>:D:tcp://someurl.something:port:C:<some more text>:Q:<a number>";
Matcher matcher = Pattern.compile("M:(.*):D:(.*):C:(.*):Q:(.*)").matcher(data);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
System.out.println();
System.out.println("String.split(): ");
String[] pieces = data.split("M:|:D:|:C:|:Q:");
for (String piece : pieces) {
System.out.println(piece);
}
}
Results:
Regex:
<some text>
tcp://someurl.something:port
<some more text>
<a number>
String.split():
<some text>
tcp://someurl.something:port
<some more text>
<a number>
To extract the URL/text part you don't need a regular expression. Use:
int startPos = input.indexOf(":D:")+":D:".length();
int endPos = input.indexOf(":C:", startPos);
String urlOrText = input.substring(startPos, endPos);
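The three lines above assume both markers are present: indexOf returns -1 when a marker is missing, which makes substring either start silently at the wrong place (missing :D:) or throw StringIndexOutOfBoundsException (missing :C:). A small sketch that checks for that; the class and method names are hypothetical:

```java
class SegmentExtractor {
    // Returns the text between ":D:" and the next ":C:", or null if either marker is missing
    static String urlOrText(String input) {
        int d = input.indexOf(":D:");
        if (d < 0) {
            return null;
        }
        int startPos = d + ":D:".length();
        int endPos = input.indexOf(":C:", startPos);
        if (endPos < 0) {
            return null;
        }
        return input.substring(startPos, endPos);
    }
}
```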
Assuming you need to do some validation along with the parsing, break the regex into different parts like this:
String m_regex = "[\\w.]+"; // in Java, a . inside [] is just a plain dot
String url_regex = "."; // placeholder: there are plenty of URL regexes online, pick your favorite ("." alone matches a single character)
String d_regex = "(?:" + url_regex + "|\\p{Alnum}+)"; // url or a sequence of alphanumeric characters
String c_regex = "[\\w.]+"; // but I'm assuming you want this to be a bit more restrictive; not sure
String q_regex = "\\d+"; // what sort of number exactly? assuming any string of digits here
String regex = "M:(?<M>" + m_regex + "):"
+ "D:(?<D>" + d_regex + "):"
+ "C:(?<C>" + c_regex + "):"
+ "Q:(?<Q>" + q_regex + ")";
Pattern p = Pattern.compile(regex); // each group name must be unique, or compile() throws PatternSyntaxException
It might be a good idea to keep the pattern in a static field and compile it in a static block, so that the temporary regex strings don't clutter the class with otherwise useless fields.
Then you can retrieve each part by its name:
Matcher m = p.matcher( input );
if (m.matches()) {
String m_part = m.group( "M" );
...
String q_part = m.group( "Q" );
}
You could go even a step further and create a RegexGroup interface whose implementing objects each represent one part of the regex, with a name and the actual regex. You do lose simplicity, though, and it becomes harder to understand at a glance. (I wouldn't do this; I'm just pointing out that it's possible and has its own benefits.)
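A complete, runnable version of the named-group approach. Assumptions: the class MdcqParser is hypothetical, and the URL part uses a deliberately simple scheme://host[:port] pattern of my own in place of "pick your favorite"; the other parts reuse the answer's sub-patterns, with a unique name for every group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MdcqParser {
    // Assumption: simple URL shape scheme://host[:port]; substitute a stricter one as needed
    private static final String URL_REGEX = "[a-zA-Z][a-zA-Z0-9+.-]*://[\\w.-]+(?::\\d+)?";

    private static final Pattern PATTERN = Pattern.compile(
            "M:(?<M>[\\w.]+):"
            + "D:(?<D>(?:" + URL_REGEX + "|\\p{Alnum}+)):"
            + "C:(?<C>[\\w.]+):"
            + "Q:(?<Q>\\d+)");

    // Returns {M, D, C, Q}, or null if the whole input does not have the expected shape
    static String[] parse(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group("M"), m.group("D"), m.group("C"), m.group("Q")};
    }
}
```

Because matches() must consume the whole input, a malformed string (for example a non-numeric Q part) yields null instead of a partial result.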

Java String tokens

I have a string:
String user_name = "id=123 user=aron name=aron app=application";
and a list that contains {user, cuser, suser}.
I have to get the user part from the string, so I have code like this:
List<String> userName = Config.getConfig().getList(Configuration.ATT_CEF_USER_NAME);
String result = null;
for (String param : user_name.split("\\s", 0)) {
for (String user : userName) {
String userParam = user.concat("=.*");
if (param.matches(userParam)) {
result = param.split("=")[1];
}
}
}
But if the user value contains spaces, it doesn't work.
For example:
String user_name = "id=123 user=aron nicols name=aron app=application";
Here user has the value aron nicols, which contains a space. How can I write code that gets me the exact user value, i.e. aron nicols?
If you want to split only on spaces that come right before tokens with an = in them, such as user=..., then you can add a look-ahead condition like
split("\\s(?=\\S*=)")
This regex splits on
\\s a space
(?=\\S*=) that is followed by zero or more (*) non-space (\\S) characters ending with =. The look-ahead (?=...) is a zero-length match: the text it checks is not consumed, so it is kept in the resulting tokens and split removes only the space.
Demo:
String user_name = "id=123 user=aron nicols name=aron app=application";
for (String s : user_name.split("\\s(?=\\S*=)"))
System.out.println(s);
output:
id=123
user=aron nicols
name=aron
app=application
From your comment on another answer, it seems that an = escaped with \ shouldn't be treated as the separator between key and value but as part of the value. In that case you can add a negative look-behind to check that there is no \ before the =: placing (?<!\\\\) right before = requires that the = is not preceded by \.
BTW, to create a regex that matches \ we need to write it as \\, but in a Java string literal each \ must itself be escaped, which is why we end up with \\\\.
So you can use
split("\\s(?=\\S*(?<!\\\\)=)")
Demo:
String user_name = "user=Dist\\=Name1, xyz src=activedirectorydomain ip=10.1.77.24";
for (String s : user_name.split("\\s(?=\\S*(?<!\\\\)=)"))
System.out.println(s);
output:
user=Dist\=Name1, xyz
src=activedirectorydomain
ip=10.1.77.24
Do it like this:
First, split the input string using this regex:
" +(?=\\w+(?<!\\\\)=)"
This will give you 4 name=value tokens like this:
id=123
user=aron nicols
name=aron
app=application
Now you can split each token on its first = (for example with split("=", 2)) to get the name and value parts.
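Putting both steps together, here is a sketch that builds a name-to-value map (the class KeyValueLine is hypothetical). It splits each token only on its first = that is not preceded by a backslash, so an escaped \= stays in the value:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueLine {
    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        // Step 1: split on spaces that come right before "name=" (= not preceded by \)
        for (String token : line.split(" +(?=\\w+(?<!\\\\)=)")) {
            // Step 2: split the token on its first unescaped "=" into name and value
            String[] parts = token.split("(?<!\\\\)=", 2);
            if (parts.length == 2) {
                result.put(parts[0], parts[1]);
            }
        }
        return result;
    }
}
```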
CODE FISH, this simple regex captures the user in Group 1: user=\s*(.*?)\s+name=
It will capture "Aron", "Aron Nichols", "Aron Nichols The Benevolent", and so on.
It relies on the knowledge that name= always follows user=.
However, if you're not sure that the token following user is name, you can use this:
user=\s*(.*?)(?=$|\s+\w+=)
Here is how to use the second expression (for the first, just change the string in Pattern.compile):
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;
String subjectString = "id=123 user=aron nicols name=aron app=application";
String resultString = null;
try {
Pattern regex = Pattern.compile("user=\\s*(.*?)(?=$|\\s+\\w+=)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
resultString = regexMatcher.group(1); // "aron nicols"
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}

String Parsing in Java and android

I want to parse the data below in Java. What approach should I follow?
I want to ignore the ; characters inside { }.
That is, I want Version, Content, Provide, UserConfig and Icon as names, each with its corresponding value.
Version:"1";
Content:2013091801;
Provide:"Airtel";
UserConfig :
{
Checksum = "sha1-234448e7e573b6dedd65f50a2da72245fd3b";
Source = "content\\user.ini";
};
Icon:
{
Checksum = "sha1-a99f835tytytyt3177674489770e613c89390a8c4";
Source = "content\\resept_ico.bmp";
};
Here I can't simply use String.split(";"), because of the ; characters inside the braces.
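One way to ignore the ; inside { } without regular expressions is to split only where the brace depth is 0. This is a sketch of that technique, not one of the answers below; the class TopLevelSplitter is hypothetical, and it assumes braces are balanced and never appear inside the quoted values. Each resulting part can then be split on its first : into a name and a value.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSplitter {
    // Splits on ';' only when it is outside every { } block (brace depth 0)
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : text.toCharArray()) {
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            }
            if (c == ';' && depth == 0) {
                // End of a top-level field: keep it if it is not just whitespace
                String part = current.toString().trim();
                if (!part.isEmpty()) {
                    parts.add(part);
                }
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String tail = current.toString().trim();
        if (!tail.isEmpty()) {
            parts.add(tail);
        }
        return parts;
    }
}
```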
Using regular expressions and then writing a method to extract the required fields would have been a lot more complex.
What I did instead was convert the input above to a JSON-compatible string and then use Google's Gson library to parse that string into my custom class:
import com.google.gson.Gson;
import java.util.regex.Pattern;
class MyVer
{
String Version;
long Content;
String Provide;
Config UserConfig;
Config Icon;
String Source;
}
class Config
{
String Checksum;
String Source;
}
public static void main(String[] args)
{
String s = "Version:\"1\";Content:2013091801;Provide:\"Airtel\";UserConfig :{ Checksum = \"sha1-234448e7e573b6dedd65f50a2da72245fd3b\"; Source = \"content\\user.ini\";};Icon:{ Checksum = \"sha1-a99f835tytytyt3177674489770e613c89390a8c4\"; Source = \"content\\resept_ico.bmp\";};";
String startingBracePattern = Pattern.quote("{");
String endBracePattern = Pattern.quote("}");
s = s.replaceAll(Pattern.quote("\\"), "\\\\\\\\"); // Replacing every single \ with \\ (a JSON-escaped backslash)
s = s.replaceAll("\\s*" + startingBracePattern + "\\s*", "\\{\""); // Replacing every `spaces { spaces` with `{"`
s = s.replaceAll(";\\s*" + endBracePattern + "\\s*;", "\\};"); // Replacing every `; spaces } spaces ;` with `};`
s = "{\"" + s.substring(0, s.length() - 1) + "}"; // Removing the last ; then prepending {" and appending }
s = s.replaceAll("\\s*:", "\":"); // Replacing every `spaces :` with `":`
s = s.replaceAll("\\s*;\\s*", ",\""); // Replacing every `spaces ; spaces` with `,"`
s = s.replaceAll("\\s*=\\s*", "\":"); // Replacing every `spaces = spaces` with `":`
Gson gson = new Gson();
MyVer newObj = gson.fromJson(s, MyVer.class);
}
This converts the input and gives you a MyVer object, through which you can access all the fields.
NOTE: you can alter the code slightly to remove any \r\n present in your input. For simplicity, I didn't include any and put the data from the question on a single line.
JSON sounds a lot easier in this case.
However, if you were to do this using regular expressions, one way would be:
for the simple cases (e.g. Version):
// look for Version: some stuff ;
Pattern versionPattern = Pattern.compile("Version\\s*:\\s*\"\\w+\"\\s*;");
// the whole big string you're looking in
String bigString = ...; // the entire string from before can go here
// create a matcher for the "version pattern"
Matcher versionMatcher = versionPattern.matcher(bigString);
// check if there's a match in the string
if(versionMatcher.find()) {
// get the matching substring
String matchingSubstring = bigString.substring(
versionMatcher.start(),
versionMatcher.end()
);
// we need the area between the quotes
String version = matchingSubstring.split("\"")[1];
// do something with it
...
}
for the harder (multi-line) cases (eg. UserConfig):
// look for UserConfig : { some stuff };
Pattern userconfigPattern = Pattern.compile("UserConfig\\s*:\\s*\\{[^}]*\\};", Pattern.DOTALL); // { and } must be escaped here, or compile() throws a PatternSyntaxException
// create a matcher for the "user config pattern"
Matcher userconfigMatcher = userconfigPattern.matcher(bigString);
// check if there's a match in the string
if(userconfigMatcher.find()) {
// get the matching substring
String matchingSubstring = bigString.substring(
userconfigMatcher.start(),
userconfigMatcher.end()
);
// we need the area between the curly braces
String userConfig = matchingSubstring.split("[{}]")[1];
// do something with it
...
}
EDIT: this is probably an easier way
// match each key-value pair (split() would throw away the very text its pattern matches)
Pattern pairPattern = Pattern.compile("\\s*([^:;{}]+?)\\s*:\\s*(\\{[^}]*\\}|[^{;]+);");
Matcher pairMatcher = pairPattern.matcher(bigString);
// for each key-value pair
while(pairMatcher.find()) {
// the key and value are the two capture groups
String key = pairMatcher.group(1);
String value = pairMatcher.group(2);
// do something with them, or add them to a map
...
}
This last way finds each key-value pair based on the assumption that each one consists of:
some (non-colon) characters at the start, followed by
a colon,
either
-> some characters that are not curly braces or semi-colons (for simple attributes), or
-> curly braces containing some characters that are not curly braces
a semi-colon
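A runnable sketch of the key-value structure described above, matched with Matcher.find() and collected into a map. (Passing a pattern that matches the pairs themselves to split() would discard them, since split() removes whatever its pattern matches.) The class PairMatcher is hypothetical, and values keep their quotes and braces:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairMatcher {
    // key: characters other than : ; { }; value: either a {...} block (which may
    // contain ';') or a run of characters without '{' or ';'; then a ';'
    private static final Pattern PAIR =
            Pattern.compile("\\s*([^:;{}]+?)\\s*:\\s*(\\{[^}]*\\}|[^{;]+);");

    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            fields.put(m.group(1), m.group(2).trim());
        }
        return fields;
    }
}
```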
Here is a JSON solution, using the org.json classes (JSONObject, JSONException):
str = "{" + str.substring(0, str.lastIndexOf(";")).replace(";\n}", "}") + "}";
try {
JSONObject json = new JSONObject(str);
String version = json.getString("Version");
JSONObject config = json.getJSONObject("UserConfig");
String source = config.getString("Source");
} catch (JSONException e) {
e.printStackTrace();
}
The replace() is needed because a ";" should not come directly before a "}", as it does at the end of each block:
Source = "content\\resept_ico.bmp";
}
so we remove those semicolons.

Replace String in Java with regex and replaceAll

Is there a simple solution to parse a String by using regex in Java?
I have to adapt an HTML page, so I have to parse several strings, e.g.:
href="/browse/PJBUGS-911"
=>
href="PJBUGS-911.html"
The strings differ only in the ID (e.g. 911). My first idea looks like this:
String input = "";
String output = input.replaceAll("href=\"/browse/PJBUGS\\-[0-9]*\"", "href=\"PJBUGS-???.html\"");
I want to replace everything except the ID. How can I do this?
Would be nice if someone can help me :)
You can capture substrings that were matched by your pattern, using parentheses. And then you can use the captured things in the replacement with $n where n is the number of the set of parentheses (counting opening parentheses from left to right). For your example:
String output = input.replaceAll("href=\"/browse/PJBUGS-([0-9]*)\"", "href=\"PJBUGS-$1.html\"");
Or if you want:
String output = input.replaceAll("href=\"/browse/(PJBUGS-[0-9]*)\"", "href=\"$1.html\"");
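A self-contained sketch of the first replaceAll above applied to a whole page (the class HrefRewriter is hypothetical). It uses [0-9]+ rather than [0-9]* so that an href with no ID at all is left alone, and links that don't point to /browse/PJBUGS-... are not touched:

```java
class HrefRewriter {
    // $1 in the replacement is the text captured by the first (...) group, e.g. PJBUGS-911
    static String rewrite(String html) {
        return html.replaceAll("href=\"/browse/(PJBUGS-[0-9]+)\"", "href=\"$1.html\"");
    }
}
```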
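This does not use a regular expression, but maybe it still solves your problem (it assumes input contains just the one href="..." attribute):
output = "href=\"" + input.substring(input.lastIndexOf('/') + 1, input.length() - 1) + ".html\"";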
This is how I would do it:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public static void main(String[] args)
{
String text = "href=\"/browse/PJBUGS-911\" blahblah href=\"/browse/PJBUGS-111\" " +
"blahblah href=\"/browse/PJBUGS-34234\"";
Pattern ptrn = Pattern.compile("href=\"/browse/(PJBUGS-[0-9]+?)\"");
Matcher mtchr = ptrn.matcher(text);
while(mtchr.find())
{
String match = mtchr.group(0);
String insMatch = mtchr.group(1);
String repl = "href=\"" + insMatch + ".html\""; // build the replacement directly from group 1
System.out.println("orig = <" + match + "> repl = <" + repl + ">");
}
}
This just shows the regex and replacements, not the final formatted text, which you can get by using Matcher.replaceAll:
String allRepl = mtchr.replaceAll("href=\"$1.html\"");
If you're just interested in replacing all matches, you don't need the loop; I used it only for debugging, to show how the regex does its business.
