I want to extract specific substrings from a string:
String source = "info1 info1ContentA info1ContentB info3 info3ContentA info3ContentB"+
"info2 info2ContentA";
The result should be:
String info1 ="info1ContentA info1ContentB";
String info2 ="info2ContentA";
String info3 ="info3ContentA info3ContentB";
It is very difficult for me to extract the information, because "info" is sometimes followed by one, two or more content items. Another problem is that info1, info2, etc. do not appear in sorted order, and the "real data" doesn't contain an ascending number.
My first idea was to add info1, info2, info3 etc to an ArrayList.
private ArrayList<String> arr = new ArrayList<String>();
arr.add("info1");
arr.add("info2");
arr.add("info3");
Now I want to extract the substring with the method StringUtils.substringBetween() from Apache Commons (https://mvnrepository.com/artifact/org.apache.commons/commons-lang3/3.4):
String result = StringUtils.substringBetween(source, arr.get(0), arr.get(1));
This works, if info1 is in the string before info2, but like I said the "real data" is not sorted.
Any idea how I can fix this?
Split the string on spaces, then use String's startsWith method to append each part to the matching result string.
Map<String, String> resultMap = new HashMap<String, String>();
String[] prefixes = new String[]{"info1", "info2", "info3"};
String source = "info1 info1ContentA info1ContentB info3 info3ContentA info3ContentB"+" info2 info2ContentA";
String[] parts = source.split(" ");
for(String part : parts) {
for(String prefix : prefixes) {
if (!part.equals(prefix) && part.startsWith(prefix)) { // skip the bare key token itself
String currentResult = (resultMap.containsKey(prefix) ? resultMap.get(prefix) + " " + part : part);
resultMap.put(prefix, currentResult);
}
}
}
Also consider using a StringBuilder instead of concatenating string parts.
If you cannot be sure that the parts are surrounded by spaces, you can first replace every part with <SPACE>part in your source string using String's replace method.
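The StringBuilder suggestion could be sketched roughly as follows. This is a variant, not the exact code above: instead of startsWith it treats a token as a key only when it equals one of the known keys, and assigns every following token to that key until the next key appears. That assumes the keys occur as standalone, space-separated tokens; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class KeyedTokenGrouper {

    // Hypothetical helper: groups the tokens that follow each known key,
    // regardless of the order in which the keys appear in the source.
    static Map<String, String> group(String source, Set<String> keys) {
        Map<String, StringBuilder> builders = new LinkedHashMap<>();
        StringBuilder current = null; // builder of the most recently seen key
        for (String token : source.trim().split("\\s+")) {
            if (keys.contains(token)) {
                // A key token starts (or resumes) the group for that key.
                current = builders.computeIfAbsent(token, k -> new StringBuilder());
            } else if (current != null) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(token);
            }
            // Tokens that appear before the first key are ignored.
        }
        Map<String, String> result = new LinkedHashMap<>();
        builders.forEach((key, value) -> result.put(key, value.toString()));
        return result;
    }

    public static void main(String[] args) {
        String source = "info1 info1ContentA info1ContentB info3 info3ContentA info3ContentB"
                + " info2 info2ContentA";
        Set<String> keys = new LinkedHashSet<>(Arrays.asList("info1", "info2", "info3"));
        // Prints {info1=info1ContentA info1ContentB, info3=info3ContentA info3ContentB, info2=info2ContentA}
        System.out.println(group(source, keys));
    }
}
```

Because the content tokens are never compared against a prefix, this should also work when the real content does not repeat the key name.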
You can use a regular expression, like this:
String source = "info1 info1ContentA info1ContentB info3 info3ContentA info3ContentB info2 info2ContentA";
for (int i = 1; i <= 3; i++) { // <= 3 so that info3 is processed as well
Pattern pattern = Pattern.compile("info" + i + "Content[A-Z]");
Matcher matcher = pattern.matcher(source);
List<String> matches = new ArrayList<>();
while (matcher.find()) {
matches.add(matcher.group());
}
// process the matches list
}
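If the real keys are not numbered, a single pattern built from the list of keys might be an alternative to the loop above. The sketch below is only one possible approach: it assumes the keys are known in advance, appear as whole words, and are each followed by at least one content token. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyedRegexExtractor {

    // Hypothetical helper: captures everything between one key and the next key (or the end).
    static Map<String, String> extract(String source, List<String> keys) {
        List<String> quoted = new ArrayList<>();
        for (String key : keys) {
            quoted.add(Pattern.quote(key)); // treat each key as a literal
        }
        String alternation = String.join("|", quoted);
        // Group 1: the key as a whole word. Group 2: the shortest text that is
        // followed by whitespace plus another key, or by the end of the input.
        Pattern pattern = Pattern.compile(
                "\\b(" + alternation + ")\\b\\s+(.*?)(?=\\s+\\b(?:" + alternation + ")\\b|$)");
        Matcher matcher = pattern.matcher(source);
        Map<String, String> result = new LinkedHashMap<>();
        while (matcher.find()) {
            // If a key occurs more than once, its contents are joined with a space.
            result.merge(matcher.group(1), matcher.group(2), (a, b) -> a + " " + b);
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "info1 info1ContentA info1ContentB info3 info3ContentA info3ContentB info2 info2ContentA";
        System.out.println(extract(source, Arrays.asList("info1", "info2", "info3")));
    }
}
```

The word boundaries keep info1 from matching inside info1ContentA, so the order of the keys in the source does not matter.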
Related
For example, from the line “int bot = 235;” in a text file, I want to extract only “bot” and “235” and store them in a HashMap in Java.
You could use regexp:
String detail = "int bot = 235";
String pattern = "(\\w+) = (\\w+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(detail);
HashMap<String, String> result = new HashMap<>();
while (m.find()) {
result.put(m.group(1), m.group(2));
}
System.out.println(result);
gives
{bot=235}
You can use the string function split, like this:
String[] parts = string.split("=");
String s1 = parts[0]; // "int bot "
String s2 = parts[1]; // " 235;"
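The split approach leaves the type, the surrounding spaces and the trailing semicolon in the two parts. A minimal sketch of the clean-up might look like this, assuming each line has the shape "type name = value;" (the class and method names are invented for the example):

```java
import java.util.HashMap;
import java.util.Map;

class AssignmentSplitter {

    // Hypothetical helper: returns {name, value}, or null if the line contains no '='.
    static String[] parse(String line) {
        String[] parts = line.split("=", 2); // limit 2: only split on the first '='
        if (parts.length < 2) {
            return null;
        }
        // The variable name is the last whitespace-separated token before '='.
        String[] left = parts[0].trim().split("\\s+");
        String name = left[left.length - 1];
        String value = parts[1].trim();
        if (value.endsWith(";")) {
            value = value.substring(0, value.length() - 1).trim();
        }
        return new String[] {name, value};
    }

    public static void main(String[] args) {
        Map<String, String> result = new HashMap<>();
        String[] pair = parse("int bot = 235;");
        if (pair != null) {
            result.put(pair[0], pair[1]);
        }
        System.out.println(result); // prints {bot=235}
    }
}
```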
I have a List of Strings containing names and surnames, and I have a free text.
List<String> names; // contains: "jon", "snow", "arya", "stark", ...
String text = "jon snow and stark arya";
I have to find all the names and surnames, possibly with a Java regex (so using Pattern and Matcher objects). So I want something like:
List<String> foundNames; // contains: "jon snow", "stark arya"
I have done this in two possible ways, but without using a regex. The methods are not static because they are part of a NameFinder class that has a list "names" containing all the names.
public List<String> findNamePairs(String text) {
List<String> foundNamePairs = new ArrayList<String>();
List<String> names = this.names;
text = text.toLowerCase();
for (String name : names) {
String nameToSearch = name + " ";
int index = text.indexOf(nameToSearch);
if (index != -1) {
String textSubstring = text.substring(index + nameToSearch.length());
for (String nameInner : names) {
if (!name.equals(nameInner) && textSubstring.startsWith(nameInner)) {
foundNamePairs.add(name + " " + nameInner);
}
}
}
}
removeDuplicateFromList(foundNamePairs);
return foundNamePairs;
}
or in a worse (very bad) way (creating all the possible pairs):
public List<String> findNamePairsInTextNotOpt(String text) {
List<String> foundNamePairs = new ArrayList<String>();
text = text.toLowerCase();
List<String> pairs = getNamePairs(this.names);
for (String name : pairs) {
if (text.contains(name)) {
foundNamePairs.add(name);
}
}
removeDuplicateFromList(foundNamePairs);
return foundNamePairs;
}
You can create a regex using the list of names and then use find to find the names. To ensure you don't have duplicates, you can check if the name is already in the list of found names. The code would look like this.
List<String> names = Arrays.asList("jon", "snow", "stark", "arya");
String text = "jon snow and Stark arya and again Jon Snow";
StringBuilder regexBuilder = new StringBuilder();
for (int i = 0; i < names.size(); i += 2) {
regexBuilder.append("(")
.append(names.get(i))
.append(" ")
.append(names.get(i + 1))
.append(")");
if (i != names.size() - 2) regexBuilder.append("|");
}
System.out.println(regexBuilder.toString());
Pattern compile = Pattern.compile(regexBuilder.toString(), Pattern.CASE_INSENSITIVE);
Matcher matcher = compile.matcher(text);
List<String> found = new ArrayList<>();
int start = 0;
while (matcher.find(start)) {
String match = matcher.group().toLowerCase();
if (!found.contains(match)) found.add(match);
start = matcher.end();
}
for (String s : found) System.out.println("found: " + s);
If you want to be case sensitive just remove the flag in Pattern.compile(). If all matches have the same capitalization you can omit the toLowerCase() in the while loop as well.
Make sure the list contains an even number of elements (name and surname pairs); otherwise the for loop will throw an IndexOutOfBoundsException. The order also matters in my code: it only finds name pairs in the order in which they occur in the list. If you want both orders, you can change the regex generation accordingly.
Edit: Since it is unknown which entries are first names and which are surnames, or which of them form a pair, the regex has to be generated differently.
StringBuilder regexBuilder = new StringBuilder("(");
for (int i = 0; i < names.size(); i++) {
regexBuilder.append("(")
.append(names.get(i))
.append(")");
if (i != names.size() - 1) regexBuilder.append("|");
}
regexBuilder.append(") ");
regexBuilder.append(regexBuilder);
regexBuilder.setLength(regexBuilder.length() - 1);
System.out.println(regexBuilder.toString());
This regex will match any of the given names followed by a space and then again any of the names.
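Put together, the "any name, a space, any name" idea could be sketched as a small helper. This is an illustration under assumptions rather than the answer's exact code: it quotes each name with Pattern.quote, adds word boundaries, and uses a LinkedHashSet to drop duplicates; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamePairFinder {

    // Hypothetical helper: finds "<name> <name>" pairs, case-insensitively, without duplicates.
    static List<String> findPairs(List<String> names, String text) {
        List<String> quoted = new ArrayList<>();
        for (String name : names) {
            quoted.add(Pattern.quote(name)); // names are matched literally
        }
        String anyName = "(?:" + String.join("|", quoted) + ")";
        // \b keeps the pattern from matching inside longer words.
        Pattern pattern = Pattern.compile(
                "\\b" + anyName + " " + anyName + "\\b", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        Set<String> found = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        while (matcher.find()) {
            found.add(matcher.group().toLowerCase());
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("jon", "snow", "stark", "arya");
        String text = "jon snow and Stark arya and again Jon Snow";
        System.out.println(findPairs(names, text)); // prints [jon snow, stark arya]
    }
}
```

Like the regex above, this also accepts the same name twice in a row (for example "jon jon"); filter such matches afterwards if that is not wanted.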
In debug mode I can see that locator of one of the element on the page is: By.name: NameOfMyElement_123.
The question is, how can I parse the following string (By.name: NameOfMyElement_123) in Java in order to have the type of my locator (name) and value (NameOfMyElement_123) ?
String[] split = "By.name: NameOfMyElement_123".split(" ");
or
Pattern p = Pattern.compile("([\\w.]*): ([\\w]*_[\\d]*)");
Matcher m = p.matcher("By.name: NameOfMyElement_123");
while (m.find()){
System.out.println(m.group(1));
System.out.println(m.group(2));
}
You could use split(). In this case, it's best to split with :
String[] splittedText = element.split(":"); // split() takes a String (a regex), not a char
String type = splittedText[0].trim();
String value = splittedText[1].trim();
Nothing fancy is necessary, two split() methods are enough:
String[] firstSplit = element.split(":");
String[] secondSplit = firstSplit[0].split("\\."); // split() takes a regex, so the dot must be escaped
String type = secondSplit[1].trim(); // will result in "name"
String value = firstSplit[1].trim(); // will result in "NameOfMyElement_123"
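Splitting on ":" can cut the value apart if the value itself contains a colon. A regex that only looks at the first "By.<type>:" prefix avoids that. The following is a sketch that assumes the string always has the form shown in the question; the class name and the second input in main are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LocatorParser {

    // "By.", then the locator type, a colon, optional spaces, and the rest as the value.
    private static final Pattern LOCATOR = Pattern.compile("^By\\.(\\w+):\\s*(.+)$");

    // Hypothetical helper: returns {type, value}.
    static String[] parse(String locator) {
        Matcher matcher = LOCATOR.matcher(locator.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unexpected locator format: " + locator);
        }
        return new String[] {matcher.group(1), matcher.group(2)};
    }

    public static void main(String[] args) {
        String[] parsed = parse("By.name: NameOfMyElement_123");
        System.out.println(parsed[0]); // prints name
        System.out.println(parsed[1]); // prints NameOfMyElement_123

        // Made-up input whose value contains a colon; it stays in one piece.
        String[] other = parse("By.name: part:one");
        System.out.println(other[1]); // prints part:one
    }
}
```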
I am writing a small programming language for a game I am making. It will let users define their own spells for the wizard entity outside the internal game code. I have the language written down, but I'm not entirely sure how to turn a string like
setSpellName("Fireball")
setSplashDamage(32,5)
into an array which would have the method name and the arguments after it, like
{"setSpellName","Fireball"}
{"setSplashDamage","32","5"}
How could I do this using Java's String.split or a regex?
Thanks in advance.
Since you're only interested in the function name and parameters I'd suggest scanning up to the first instance of ( and then to the last ) for the params, as so.
String input = "setSpellName(\"Fireball\")";
String functionName = input.substring(0, input.indexOf('('));
String[] params = input.substring(input.indexOf('(') + 1, input.lastIndexOf(')')).split(",");
To capture the String
setSpellName("Fireball")
Do something like this:
String[] line = argument.split("\\("); // split() takes a regex, so "(" must be escaped
This gives you "setSpellName" at line[0] and "Fireball") at line[1].
Get rid of the closing parenthesis like this:
String cleaned = line[1].replace(")", "").trim();
Build your JSON with the two "cleaned" Strings.
There's probably a better way with Regex, but this is the quick and dirty way.
With String.indexOf() and String.substring(), you can parse out the function name and parameters. Once you have parsed them out, wrap each of them in double quotes. Then combine them, delimited by commas and wrapped in curly braces.
public static void main(String[] args) throws Exception {
List<String> commands = Arrays.asList(
"setSpellName(\"Fireball\")",
"setSplashDamage(32,5)"
);
for (String command : commands) {
int openParen = command.indexOf("(");
String function = String.format("\"%s\"", command.substring(0, openParen));
String[] parameters = command.substring(openParen + 1, command.indexOf(")")).split(",");
for (int i = 0; i < parameters.length; i++) {
// Surround parameter with double quotes
if (!parameters[i].startsWith("\"")) {
parameters[i] = String.format("\"%s\"", parameters[i]);
}
}
String combine = String.format("{%s,%s}", function, String.join(",", parameters));
System.out.println(combine);
}
}
Results:
{"setSpellName","Fireball"}
{"setSplashDamage","32","5"}
This is a solution using a regex. Use the pattern "([\\w]+)\\(\"?([\\w,]+)\"?\\)" (the comma in the second character class lets it match argument lists such as 32,5):
String input = "setSpellName(\"Fireball\")";
String pattern = "([\\w]+)\\(\"?([\\w,]+)\"?\\)";
Pattern r = Pattern.compile(pattern);
String[] matches;
Matcher m = r.matcher(input);
if (m.find()) {
System.out.println("Found value: " + m.group(1));
System.out.println("Found value: " + m.group(2));
String[] params = m.group(2).split(",");
if (params.length > 1) {
matches = new String[params.length + 1];
matches[0] = m.group(1);
System.out.println(params.length);
for (int i = 0; i < params.length; i++) {
matches[i + 1] = params[i];
}
System.out.println(String.join(" :: ", matches));
} else {
matches = new String[2];
matches[0] = m.group(1);
matches[1] = m.group(2);
System.out.println(String.join(", ", matches));
}
}
([\\w]+) is the first group; it captures the function name.
The second group, between the escaped parentheses \\( and \\) and the optional double quotes \"?, captures the parameters.
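For the array form asked for in the question ({"setSpellName","Fireball"}), the regex and split ideas can be combined into one helper. This is a sketch, not a full parser: it assumes one call per line, and it does not handle commas inside quoted strings or nested calls. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpellCommandParser {

    // Function name, then everything between the first '(' and the last ')'.
    private static final Pattern CALL = Pattern.compile("^\\s*(\\w+)\\s*\\((.*)\\)\\s*$");

    // Hypothetical helper: returns {functionName, arg1, arg2, ...}.
    static String[] parse(String line) {
        Matcher matcher = CALL.matcher(line);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a call: " + line);
        }
        List<String> parts = new ArrayList<>();
        parts.add(matcher.group(1));
        String arguments = matcher.group(2).trim();
        if (!arguments.isEmpty()) {
            for (String argument : arguments.split(",")) {
                String value = argument.trim();
                // Strip one pair of surrounding double quotes, if present.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                parts.add(value);
            }
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("setSpellName(\"Fireball\")"))); // [setSpellName, Fireball]
        System.out.println(Arrays.toString(parse("setSplashDamage(32,5)")));       // [setSplashDamage, 32, 5]
    }
}
```

If the language grows beyond flat calls like these, a small tokenizer is likely to be easier to maintain than further regex tweaks.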
How would you split this String format into parts:
message_type={any_text}&message_number={digits}&code={digits}&id={digits}&message={any_text}&timestamp={digits_with_decimal}
In the message={any_text} part, {any_text} may contain & and =, so I cannot simply split the string on & or =.
The parts may also appear in a different order. I was thinking of extracting each value with a pattern like ={the_text_needed}&, but that would not work for the last part of the string, because there is no & at the end.
I hope this will work -
String originalString = "message_type={a&=b}&message_number={1}&code={2}&id={3}&message={a&=b}&timestamp={12}";
Map<String, String> resultMap = new HashMap<String, String>();
String[] splitted1 = originalString.split("&+(?![^{]*})");
for (String str : splitted1) {
String[] splitted2 = str.split("=+(?![^{]*})");
resultMap.put(splitted2[0], splitted2[1]);
splitted2 = null;
}
If the parameter values are not enclosed in curly braces, then it's really tough. I can think of a solution, but I don't know whether it could break in some situations:
String originalString = "message_type=msgTyp&message_number=1&code=2&message=a&=b&timestamp=12";
String[] predefinedParameters = {"message_type", "message_number", "code", "message", "timestamp"};
String delimeter = "###";
for (String str : predefinedParameters) {
originalString = originalString.replace(str+"=", delimeter+str+"=");
}
originalString = originalString.substring(delimeter.length());
String[] result = originalString.split("&"+delimeter);
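The same idea (using the known parameter names as anchors) can also be written as a single regex with a lookahead, which avoids inserting a temporary delimiter. The sketch below shares the limitation mentioned above: it can break if a value contains text that looks like "&<known name>=". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KnownKeyParser {

    // Hypothetical helper: a value runs until "&<known key>=" or the end of the input.
    static Map<String, String> parse(String input, List<String> keys) {
        List<String> quoted = new ArrayList<>();
        for (String key : keys) {
            quoted.add(Pattern.quote(key));
        }
        String alternation = String.join("|", quoted);
        Pattern pattern = Pattern.compile(
                "(?:^|&)(" + alternation + ")=(.*?)(?=&(?:" + alternation + ")=|$)",
                Pattern.DOTALL);
        Matcher matcher = pattern.matcher(input);
        Map<String, String> result = new LinkedHashMap<>();
        while (matcher.find()) {
            result.put(matcher.group(1), matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "message_type=msgTyp&message_number=1&code=2&message=a&=b&timestamp=12";
        List<String> keys = Arrays.asList(
                "message_type", "message_number", "code", "id", "message", "timestamp");
        // The value of "message" keeps its embedded & and =.
        System.out.println(parse(input, keys));
    }
}
```

Because each key is located by name, the order of the parameters in the input does not matter.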
Assuming that none of the fields contain & or =, you could:
String[] fields = message.split("&");
Map<String,String> fieldMap = new LinkedHashMap<>();
for (String field:fields)
{
String[] fieldParts = field.split("=");
fieldMap.put(fieldParts[0],fieldParts[1]);
}
and have a map of all your fields.
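One detail worth guarding against in the loop above: field.split("=") drops trailing empty strings, so a field such as "id=" yields a one-element array and fieldParts[1] throws an ArrayIndexOutOfBoundsException. A slightly more defensive sketch, under the same assumption that values contain no &, passes a limit of 2 to split (the class and method names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SimpleFieldParser {

    // Hypothetical helper: splits on '&', then on the first '=' only.
    static Map<String, String> parse(String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String field : message.split("&")) {
            if (field.isEmpty()) {
                continue; // skip empty segments such as those produced by "&&"
            }
            String[] keyValue = field.split("=", 2); // limit 2 keeps any later '=' in the value
            fields.put(keyValue[0], keyValue.length > 1 ? keyValue[1] : "");
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("code=2&id=&message=a=b")); // prints {code=2, id=, message=a=b}
    }
}
```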
What you are trying to do is parse a query string; you should check:
Query String Manipulation in Java