I have a string that contains a "table name" that I would like to extract out of this string. Basically from this string below, I would like to just grab "test_table". The String always designates "Table name=", but I am having trouble with walking this string and pulling out the table name that I need.
I need to grab each char until I hit the comma, but I am having trouble. An example string looks like this:
{newModel=Table name=test_table, nameInSource=null, uuid=tid:f1f46c57b618-b9a0d09f-00000001}model
Thanks in advance.
Use a regular expression with a capturing group around the part you want. The example below looks for the substring Table name= in the target string, then captures every character up to the next comma in group one. If the pattern is found, it returns the characters in group one; otherwise it returns null.
public static String parseTableName(String s) {
Pattern p = Pattern.compile("Table name=([^,]*)");
Matcher m = p.matcher(s);
return m.find() ? m.group(1) : null;
}
// ...
parseTableName(yourString); // => "test_table"
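If you would rather walk the string without a regex, as the question describes, the same extraction can be sketched with indexOf and substring. This is only a sketch: the class name TableNameIndexOf is made up for illustration, and it assumes the value ends at the next comma (or at the end of the string when there is no comma).

```java
final class TableNameIndexOf {
    private static final String KEY = "Table name=";

    // Returns the text between "Table name=" and the next comma, or null if the key is absent.
    static String parseTableName(String s) {
        int start = s.indexOf(KEY);
        if (start < 0) {
            return null; // key not present
        }
        start += KEY.length();
        int end = s.indexOf(',', start);
        if (end < 0) {
            end = s.length(); // no comma: take the rest of the string
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String input = "{newModel=Table name=test_table, nameInSource=null, "
                + "uuid=tid:f1f46c57b618-b9a0d09f-00000001}model";
        System.out.println(parseTableName(input)); // prints test_table
    }
}
```

The regex version is shorter, but this form makes the "stop at the comma" rule explicit and avoids compiling a pattern.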
What have you tried? There are a couple of approaches:
You could use a regex to match the key=value pattern to pull it out that way.
You could also just use JSON which is more standard to parse (plenty of libraries to do that).
You could strip out the characters { , } and do a string split on =
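The third option (strip the braces, then split) might look roughly like the sketch below. The class and method names are hypothetical, and it assumes that no value contains a comma and that the table name always sits inside the newModel value as "Table name=...".

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KeyValueSketch {
    // Splits "{k1=v1, k2=v2}trailer" into an ordered map; assumes values contain no commas.
    static Map<String, String> parse(String s) {
        Map<String, String> map = new LinkedHashMap<>();
        int open = s.indexOf('{');
        int close = s.lastIndexOf('}');
        if (open < 0 || close < open) {
            return map; // no braces: nothing to parse
        }
        for (String pair : s.substring(open + 1, close).split(",\\s*")) {
            int eq = pair.indexOf('='); // split on the FIRST '=' only
            if (eq > 0) {
                map.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return map;
    }

    // Pulls the table name out of the "newModel" value, or returns null if it is not there.
    static String tableName(String s) {
        String model = parse(s).get("newModel");
        String prefix = "Table name=";
        return model != null && model.startsWith(prefix)
                ? model.substring(prefix.length())
                : null;
    }

    public static void main(String[] args) {
        String input = "{newModel=Table name=test_table, nameInSource=null, "
                + "uuid=tid:f1f46c57b618-b9a0d09f-00000001}model";
        System.out.println(tableName(input)); // prints test_table
    }
}
```

Splitting each pair on the first '=' only is what keeps "Table name=test_table" together as the value of newModel.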
In your sample it actually looks like the key you're looking for is "name", not "Table name" (where "Table" is the value of the previous "newModel" key). In any case, there are many ways to do this in Java. Assuming you don't know the order of the keys/values in the string, I would use StringTokenizer to split it up by commas, cycle through the tokens to find the one that contains "name=", and then use substring to take everything after the equals sign:
String tableName = null;
StringTokenizer st = new StringTokenizer(in, ",");
while (st.hasMoreTokens()) {
    String token = st.nextToken().trim();
    // "nameInSource=..." does not contain "name=", so only the table-name token matches
    int pos = token.indexOf("name=");
    if (pos >= 0) {
        tableName = token.substring(pos + "name=".length());
        break; // stop at the first match
    }
}
I want to split the string
[{"starDate":"","endDate":"","relativeDays":,"cronExpression":""},{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}]
to
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}
Maybe this one-liner will work for you. Since it's not clear whether you want two strings or one string with explicit line breaks, it builds one string with each object on its own line:
String str = "[{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"},{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"}]";
str = String.join("}\n", str.replaceAll("^\\[|\\]$", "").split("},"));
You should ideally do this the proper way using JSON parsing if it applies.
Meanwhile, for this very specific case, you can use a Regular Expression to extract the parts you need (Example in Java) :
String input = "[{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"},{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"}]";
Matcher matcher = Pattern.compile("(\\{[^\\}]*\\})").matcher(input);
while( matcher.find() )
System.out.println(matcher.group(1));
This will result in :
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}
I have a String
String string = "-minY:50 -maxY:100 -minVein:8 -maxVein:10 -meta:0 perChunk:5;";
I want to get the -meta:0 out of it with a regex (that is, replace everything except -meta:0). I made a regex which deletes -meta:0, but I can't make it delete everything except -meta:0.
I tried some other regexes, but they ignored the whole line when it had -meta:[0-9] in it, and as you can see I have everything on one line.
This is how it has been deleting -meta:0 from the String:
String meta = string.replaceAll("( -meta:[0-9])", "");
System.out.println(meta);
I just somehow want to reverse that and delete everything except -meta:[0-9]
I couldn't find anything on the page about my issue because everything was ignoring whole line after it found the word, so sorry if there's something similar to this.
You should capture your match in a capturing group and use its reference in the replacement:
String meta = string.replaceAll("^.*(-meta:\\d+).*$", "$1");
System.out.println(meta);
//=> "-meta:0"
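Since the goal is really to extract -meta:N rather than to delete everything around it, another way to sketch this is to search for the token directly with Matcher.find(). The class name MetaExtractor is made up for illustration, and the sketch assumes you want the first occurrence only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetaExtractor {
    private static final Pattern META = Pattern.compile("-meta:\\d+");

    // Returns the first "-meta:<digits>" token in the input, or null if there is none.
    static String extractMeta(String input) {
        Matcher m = META.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String string = "-minY:50 -maxY:100 -minVein:8 -maxVein:10 -meta:0 perChunk:5;";
        System.out.println(extractMeta(string)); // prints -meta:0
    }
}
```

Unlike the replaceAll version, this returns null instead of the unchanged input when the string has no -meta token, which makes the "not found" case easier to detect.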
As I understand your requirement, you want to:
a) extract -meta:[0-9] from the string
b) replace everything else with ""
You could do something like:
String string = "-minY:50 -maxY:100 -minVein:8 -maxVein:10 -meta:0 perChunk:5;";
Pattern p = Pattern.compile(".*(-meta:[0-9]).*");
Matcher m = p.matcher(string);
if ( m.find() )
{
    // group(1) is the captured -meta:[0-9] text; assigning it drops everything else
    string = m.group(1);
    System.out.println("After keeping only -meta: " + string);
}
What this code does is find -meta:[0-9], keep it (capture group 1), and discard the rest of the string.
I want to split a string such as [AO_12345678, Real Estate] into AO_12345678 and Real Estate.
How can I do this in Java using regex?
The main issue I'm facing is avoiding the "[" and "]".
Please help.
Does it really have to be regex?
if not:
String s = "[AO_12345678, Real Estate]";
String[] split = s.substring(1, s.length()-1).split(", ");
I'd go the pragmatic way:
String org = "[AO_12345678, Real Estate]";
String plain = org;
// strip the leading bracket if present
if (plain.startsWith("[")) {
    plain = plain.substring(1);
}
// strip the trailing bracket if present
if (plain.endsWith("]")) {
    plain = plain.substring(0, plain.length() - 1);
}
// split on the comma and any whitespace that follows it
String[] result = plain.split(",\\s*");
If the string is always surrounded with '[]' you can just substring it without checking.
One easy way, assuming the format of all your inputs is consistent, is to ignore regex altogether and just split it. Something like the following would work:
String[] parts = input.split(","); // parts is ["[AO_12345678", "Real Estate]"]
String firstWithoutBrace = parts[0].substring(1);
String secondWithoutBrace = parts[1].substring(0, parts[1].length() - 1);
String first = firstWithoutBrace.trim();
String second = secondWithoutBrace.trim();
Of course you can tailor this as you wish - you might want to check whether the braces are present before removing them, for example. Or you might want to keep any spaces before the comma as part of the first string. This should give you a basis to modify to your specific requirements however.
And in a simple case like this I'd much prefer code like the above to a regex that extracted the two strings - I consider the former much clearer!
You can also use StringTokenizer. Here is the code:
String str = "[AO_12345678, Real Estate]";
StringTokenizer st = new StringTokenizer(str, "[],", false);
String s1 = st.nextToken();
String s2 = st.nextToken().trim(); // trim the space left after the comma
This gives:
s1=AO_12345678
s2=Real Estate
Refer to javadocs for reading about StringTokenizer
http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
Another option using regular expressions (RE) capturing groups:
private static void extract(String text) {
Pattern pattern = Pattern.compile("\\[(.*),\\s*(.*)\\]");
Matcher matcher = pattern.matcher(text);
if (matcher.find()) { // or .matches for matching the whole text
String id = matcher.group(1);
String name = matcher.group(2);
// do something with id and name
System.out.printf("ID: %s%nName: %s%n", id, name);
}
}
If speed/memory is a concern, the RE can be optimized to (using Possessive quantifiers instead of Greedy ones)
"\\[([^,]*+),\\s*+([^\\]]*+)\\]"
For example, I have a string in which I must find and replace multiple substrings, all of which start with #, contain 6 symbols, end with ' and should not contain ) ... what do you think would be the best way of achieving that?
Thanks!
Edit:
Just one more thing I forgot: to make the replacement, I need that substring, i.e. it gets replaced by a string generated from the substring being replaced.
yourNewText=yourOldText.replaceAll("#[^)]{6}'", "");
Or programmatically:
Matcher matcher = Pattern.compile("#[^)]{6}'").matcher(yourOldText);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    // implement your custom logic in someReplacement(); matcher.group() is the found String
    matcher.appendReplacement(sb, someReplacement(matcher.group()));
}
matcher.appendTail(sb);
String yourNewString = sb.toString();
Assuming you just know the substrings are formatted like you explained above, but not exactly which 6 characters, try the following:
String result = input.replaceAll("#[^\\)]{6}'", "replacement"); //pattern to replace is #+6 characters not being ) + '
You must use replaceAll with the right regular expression:
myString.replaceAll("#[^)]{6}'", "something")
If you need to replace with an extract of the matched string, use a capturing group, like this:
myString.replaceAll("#([^)]{6})'", "blah $1 blah")
The $1 in the second String refers to whatever the first parenthesized expression in the first String matched.
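When the replacement has to be computed from the matched text rather than written as a fixed template, Java 9 and later offer Matcher.replaceAll with a function argument. The sketch below assumes Java 9+; the class name, the someReplacement rule (upper-casing the six characters and wrapping them in angle brackets) and the sample input are all invented for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ComputedReplacement {
    // '#', then six characters that are not ')', then a single quote
    private static final Pattern TOKEN = Pattern.compile("#([^)]{6})'");

    // Hypothetical rule for illustration: upper-case the six characters and wrap them.
    static String someReplacement(String sixChars) {
        return "<" + sixChars.toUpperCase(Locale.ROOT) + ">";
    }

    static String replaceTokens(String text) {
        // quoteReplacement stops '$' and '\' in the generated text from being
        // interpreted as group references
        return TOKEN.matcher(text).replaceAll(
                match -> Matcher.quoteReplacement(someReplacement(match.group(1))));
    }

    public static void main(String[] args) {
        System.out.println(replaceTokens("a #abcdef' b #12)456' c"));
        // prints: a <ABCDEF> b #12)456' c
    }
}
```

The second token is left alone because it contains a ')' among its six characters. On Java 8 the appendReplacement/appendTail loop does the same job.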
this might not be the best way to do it but...
youstring = youstring.replace("#something'", "new stringx");
youstring = youstring.replace("#something2'", "new stringy");
youstring = youstring.replace("#something3'", "new stringz");
//edited after reading comments, thanks
Is there a nice way to extract tokens that start with a pre-defined string and end with a pre-defined string?
For example, let's say the starting string is "[" and the ending string is "]". If I have the following string:
"hello[world]this[[is]me"
The output should be:
token[0] = "world"
token[1] = "[is"
(Note: the second token has a 'start' string in it)
I think you can use the Apache Commons Lang feature that exists in StringUtils:
substringsBetween(java.lang.String str,
java.lang.String open,
java.lang.String close)
The API docs say it:
Searches a String for substrings
delimited by a start and end tag,
returning all matching substrings in
an array.
The Commons Lang substringsBetween API can be found here:
http://commons.apache.org/lang/apidocs/org/apache/commons/lang/StringUtils.html#substringsBetween(java.lang.String,%20java.lang.String,%20java.lang.String)
Here is the way I would go to avoid dependency on commons lang.
public static String escapeRegexp(String regexp){
String specChars = "\\$.*+?|()[]{}^";
String result = regexp;
for (int i=0;i<specChars.length();i++){
Character curChar = specChars.charAt(i);
result = result.replaceAll(
"\\"+curChar,
"\\\\" + (i<2?"\\":"") + curChar); // \ and $ must have special treatment
}
return result;
}
public static List<String> findGroup(String content, String pattern, int group) {
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(content);
List<String> result = new ArrayList<String>();
while (m.find()) {
result.add(m.group(group));
}
return result;
}
public static List<String> tokenize(String content, String firstToken, String lastToken){
String regexp = lastToken.length()>1
?escapeRegexp(firstToken) + "(.*?)"+ escapeRegexp(lastToken)
:escapeRegexp(firstToken) + "([^"+lastToken+"]*)"+ escapeRegexp(lastToken);
return findGroup(content, regexp, 1);
}
Use it like this :
String content = "hello[world]this[[is]me";
List<String> tokens = tokenize(content,"[","]");
StringTokenizer? Set the delimiter string to "[]" and the "return delimiters" flag (returnDelims) to false and I think you're set.
A normal StringTokenizer won't work for this requirement; you would have to tweak it or write your own.
There's one way you can do this. It isn't particularly pretty. It involves going through the string character by character. When you reach a "[", you start putting the characters into a new token. When you reach a "]", you stop. This is best done with a growable data structure such as a List rather than an array, since arrays have a fixed length.
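A minimal sketch of that character-by-character scan, assuming single-character delimiters (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

final class BracketScanner {
    // Collects the text between each 'open' and the next 'close'.
    // An 'open' seen while already inside a token is kept as ordinary text.
    static List<String> tokens(String text, char open, char close) {
        List<String> result = new ArrayList<>();
        StringBuilder current = null; // non-null only while inside a token
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (current == null) {
                if (c == open) {
                    current = new StringBuilder(); // start a new token
                }
            } else if (c == close) {
                result.add(current.toString()); // token finished
                current = null;
            } else {
                current.append(c);
            }
        }
        return result; // a token that is never closed is discarded
    }

    public static void main(String[] args) {
        System.out.println(tokens("hello[world]this[[is]me", '[', ']'));
        // prints: [world, [is]
    }
}
```

Multi-character start and end strings would need indexOf-based searching instead of a per-character comparison.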
Another solution, which may be possible, is to pass a regex to String's split method. The only problem I have is coming up with a regex which would split the way you want it to. What I can come up with is (]string of characters[) XOR (string of characters[) XOR (]string of characters). Each set of parentheses denotes a different regex. You should evaluate them in this order so you don't accidentally remove anything you want. I'm not familiar with regexes in Java, so I used "string of characters" to denote that there are characters in between the brackets.
Try a regular expression like:
(.*?\[(.*?)\])
The second capture should contain all of the information between the set of []. This will however not work properly if the string contains nested [].
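In Java, a pattern of this kind can be applied in a find() loop. The sketch below uses a simplified variant, \[(.*?)\], which drops the outer group and the leading .*? so the text between the brackets is group 1 rather than group 2; the class name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexTokens {
    // '[' followed by as few characters as possible up to the next ']'
    private static final Pattern BETWEEN = Pattern.compile("\\[(.*?)\\]");

    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BETWEEN.matcher(text);
        while (m.find()) {
            result.add(m.group(1)); // text between the brackets
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("hello[world]this[[is]me"));
        // prints: [world, [is]
    }
}
```

Because the match starts at the first "[" it encounters, the second token keeps the inner "[", which is the behaviour the question asks for.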
StringTokenizer won't cut it for the specified behavior. You'll need your own method. Something like:
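public List<String> extractTokens(String txt, String str, String end) {
    int so = 0, eo; // so: current search offset, eo: offset of the end string
    List<String> lst = new ArrayList<String>();
    while (so < txt.length() && (so = txt.indexOf(str, so)) != -1) {
        so += str.length(); // skip past the start string
        if (so < txt.length() && (eo = txt.indexOf(end, so)) != -1) {
            lst.add(txt.substring(so, eo));
            so = eo + end.length(); // resume searching after the end string
        }
    }
    return lst;
}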
The regular expression \\[[\\[\\w]+\\] gives us
[world] and
[[is]