Regex in java for the string - java

I am very new programmer to Java regular expressions. I do not want to use java split with delimiters and try getting the individual tokens. I don't feel its a neat way. I have the following string
"Some String lang:c,cpp,java file:build.java"
I want to break up this into three parts
1 part containing "Some String"
2 part containing "c,cpp,java"
3 String containing "build.java"
The lang: and file: can be placed any where and they are optional.

The lang: and file: can be placed any where and they are optional.
Try the following expressions to get the language list and the file:
String input = "Some String lang:c,cpp,java file:build.java";
String langExpression = "lang:([\\w,]*)";
String fileExpression = "file:([\w\.]*)";
Patter langPattern = Pattern.compile(langExpression);
Matcher langMatcher = langPattern.matcher(input);
if (langMatcher.matches()) {
String languageList = langMatcher.group(1);
}
Patter filePattern = Pattern.compile(fileExpression );
Matcher fileMatcher = filePattern.matcher(input);
if (fileMatcher .matches()) {
String filename= fileMatcher.group(1);
}
This should work with lang:xxx file:xxx as well as file:xxx lang:xxx as long as the language list or the filename don't contain whitespaces. This would also work if lang: and/or file: was missing.
Would you also expect a string like this: file:build.java Some String lang:c,cpp,java?

What is so "unmaintainable" about using split?
String str = "Some String lang:c,cpp,java file:build.java";
String[] s = str.split("(lang|file):");

While split can do what you want to achieve, you can write your own code using substring and indexOf methods. This will be far faster than using split in terms of performance.

Related

How to replace substrings that contain any integer in java?

How to replace a sub-string in java of form " :[number]: "
example:
string="Hello:6:World"
After replacement,
HelloWorld
ss="hello:909:world";
do as below:
String value = ss.replaceAll("[:]*[0-9]*[:]*","");
You can use a regex to define your desired pattern
String pattern = "(:\d+:)";
string EXAMPLE_TEST = ':12:'
System.out.println(EXAMPLE_TEST.replaceAll(pattern, "text to replace with"));
should work depending on what exactly you want to replace...
Do like this
String s = ":6:";
s = s.replaceAll(":", "");
Edit 1: After the question was changed, one should use
:\d+:
and within Java
:\\d+:
This is the answer for replacing :: as well.
This is the regexp you should use:
:\d*:
Debuggex Demo
And here is a running JavaCode snipped:
String str = "Hello :4: World";
String s = str.replaceAll(":\\d*:","");
System.out.println(s);
One problem with replaceAll is often, that the corrected String is returned. The string object from which replaceAll was called is not modified.

How to replace all numbers in java string

I have string like this String s="ram123",d="ram varma656887"
I want string like ram and ram varma so how to seperate string from combined string
I am trying using regex but it is not working
PersonName.setText(cursor.getString(cursor.getColumnIndex(cursor
.getColumnName(1))).replaceAll("[^0-9]+"));
The correct RegEx for selecting all numbers would be just [0-9], you can skip the +, since you use replaceAll.
However, your usage of replaceAll is wrong, it's defined as follows: replaceAll(String regex, String replacement). The correct code in your example would be: replaceAll("[0-9]", "").
You can use the following regex: \d for representing numbers. In the regex that you use, you have a ^ which will check for any characters other than the charset 0-9
String s="ram123";
System.out.println(s);
/* You don't need the + because you are using the replaceAll method */
s = s.replaceAll("\\d", ""); // or you can also use [0-9]
System.out.println(s);
To remove the numbers, following code will do the trick.
stringname.replaceAll("[0-9]","");
Please do as follows
String name = "ram varma656887";
name = name.replaceAll("[0-9]","");
System.out.println(name);//ram varma
alternatively you can do as
String name = "ram varma656887";
name = name.replaceAll("\\d","");
System.out.println(name);//ram varma
also something like given will work for you
String given = "ram varma656887";
String[] arr = given.split("\\d");
String data = new String();
for(String x : arr){
data = data+x;
}
System.out.println(data);//ram varma
i think you missed the second argument of replace all. You need to put a empty string as argument 2 instead of actually leaving it empty.
try
replaceAll(<your regexp>,"")
you can use Java - String replaceAll() Method.
This method replaces each substring of this string that matches the given regular expression with the given replacement.
Here is the syntax of this method:
public String replaceAll(String regex, String replacement)
Here is the detail of parameters:
regex -- the regular expression to which this string is to be matched.
replacement -- the string which would replace found expression.
Return Value:
This method returns the resulting String.
for your question use this
String s = "ram123", d = "ram varma656887";
System.out.println("s" + s.replaceAll("[0-9]", ""));
System.out.println("d" + d.replaceAll("[0-9]", ""));

Best way to get Tokens in java

I have files with some naming conventions -
Ex 1 - filename1.en.html.xslt
Ex 2 - filename2.de.text.xslt
where en/de - language, html/text - output
I need to read individual files and populate the java object accordingly.
Also, en should be converted to en-US etc, while populating the language field.
Format.java
private String language ;
private string output ;
What is the best way to do this? I know it can be done through plain indexOf or using string tokenizer or parsing thru regex.
If regex is better any code samples please?
It really doesn't matter how you parse the filename as long as it works for you. If you want to take the regex route, a Pattern like this will work:
Pattern p = Pattern.compile("([^.]+)\\.([^.]+)\\.([^.]+)\\.xslt");
The first capture group is the filename, the second is the language, and the third is the output.
That said, a regex does seem like overkill, so what's wrong with using String#split()?
You could do it with StringTokenizer, but String.split() should mostly do the trick.
String foo = "filename1.en.html.xslt"
String[] parts = foo.split("\\."); // regex: need to escape dot
System.out.println(parts[1]); // outputs "en"
With StringTokenizer you could do:
String foo = "filename1.en.html.xslt"
StringTokenizer tokenizer = new StringTokenizer(foo, ".");
List<String> parts = new ArrayList<String>();
while(tokenizer.hasMoreTokens()) {
String part = tokenizer.nextToken();
parts.add(part);
}
System.out.println(parts.get(1)); // "en"

regular expression to split the string in java

I want to split the string say [AO_12345678, Real Estate] into AO_12345678 and Real Estate
how can I do this in Java using regex?
main issue m facing is in avoiding "[" and "]"
please help
Does it really have to be regex?
if not:
String s = "[AO_12345678, Real Estate]";
String[] split = s.substring(1, s.length()-1).split(", ");
I'd go the pragmatic way:
String org = "[AO_12345678, Real Estate]";
String plain = null;
if(org.startsWith("[") {
if(org.endsWith("]") {
plain = org.subString(1, org.length());
} else {
plain = org.subString(1, org.length() + 1);
}
}
String[] result = org.split(",");
If the string is always surrounded with '[]' you can just substring it without checking.
One easy way, assuming the format of all your inputs is consistent, is to ignore regex altogether and just split it. Something like the following would work:
String[] parts = input.split(","); // parts is ["[AO_12345678", "Real Estate]"]
String firstWithoutBrace = parts[0].substring(1);
String secondWithoutBrace = parts[1].substring(0, parts[1].length() - 1);
String first = firstWithoutBrace.trim();
String second = secondWithoutBrace.trim();
Of course you can tailor this as you wish - you might want to check whether the braces are present before removing them, for example. Or you might want to keep any spaces before the comma as part of the first string. This should give you a basis to modify to your specific requirements however.
And in a simple case like this I'd much prefer code like the above to a regex that extracted the two strings - I consider the former much clearer!
you can also use StringTokenizer. Here is the code:
String str="[AO_12345678, Real Estate]"
StringTokenizer st=new StringTokenizer(str,"[],",false);
String s1 = st.nextToken();
String s2 = st.nextToken();
s1=AO_12345678
s1=Real Estate
Refer to javadocs for reading about StringTokenizer
http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
Another option using regular expressions (RE) capturing groups:
private static void extract(String text) {
Pattern pattern = Pattern.compile("\\[(.*),\\s*(.*)\\]");
Matcher matcher = pattern.matcher(text);
if (matcher.find()) { // or .matches for matching the whole text
String id = matcher.group(1);
String name = matcher.group(2);
// do something with id and name
System.out.printf("ID: %s%nName: %s%n", id, name);
}
}
If speed/memory is a concern, the RE can be optimized to (using Possessive quantifiers instead of Greedy ones)
"\\[([^,]*+),\\s*+([^\\]]*+)\\]"

Finding tokens in a Java String

Is there a nice way to extract tokens that start with a pre-defined string and end with a pre-defined string?
For example, let's say the starting string is "[" and the ending string is "]". If I have the following string:
"hello[world]this[[is]me"
The output should be:
token[0] = "world"
token[1] = "[is"
(Note: the second token has a 'start' string in it)
I think you can use the Apache Commons Lang feature that exists in StringUtils:
substringsBetween(java.lang.String str,
java.lang.String open,
java.lang.String close)
The API docs say it:
Searches a String for substrings
delimited by a start and end tag,
returning all matching substrings in
an array.
The Commons Lang substringsBetween API can be found here:
http://commons.apache.org/lang/apidocs/org/apache/commons/lang/StringUtils.html#substringsBetween(java.lang.String,%20java.lang.String,%20java.lang.String)
Here is the way I would go to avoid dependency on commons lang.
public static String escapeRegexp(String regexp){
String specChars = "\\$.*+?|()[]{}^";
String result = regexp;
for (int i=0;i<specChars.length();i++){
Character curChar = specChars.charAt(i);
result = result.replaceAll(
"\\"+curChar,
"\\\\" + (i<2?"\\":"") + curChar); // \ and $ must have special treatment
}
return result;
}
public static List<String> findGroup(String content, String pattern, int group) {
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(content);
List<String> result = new ArrayList<String>();
while (m.find()) {
result.add(m.group(group));
}
return result;
}
public static List<String> tokenize(String content, String firstToken, String lastToken){
String regexp = lastToken.length()>1
?escapeRegexp(firstToken) + "(.*?)"+ escapeRegexp(lastToken)
:escapeRegexp(firstToken) + "([^"+lastToken+"]*)"+ escapeRegexp(lastToken);
return findGroup(content, regexp, 1);
}
Use it like this :
String content = "hello[world]this[[is]me";
List<String> tokens = tokenize(content,"[","]");
StringTokenizer?Set the search string to "[]" and the "include tokens" flag to false and I think you're set.
Normal string tokenizer wont work for his requirement but you have to tweak it or write your own.
There's one way you can do this. It isn't particularly pretty. What it involves is going through the string character by character. When you reach a "[", you start putting the characters into a new token. When you reach a "]", you stop. This would be best done using a data structure not an array since arrays are of static length.
Another solution which may be possible, is to use regexes for the String's split split method. The only problem I have is coming up with a regex which would split the way you want it to. What I can come up with is {]string of characters[) XOR (string of characters[) XOR (]string of characters) Each set of parenthesis denotes a different regex. You should evaluate them in this order so you don't accidentally remove anything you want. I'm not familiar with regexes in Java, so I used "string of characters" to denote that there's characters in between the brackets.
Try a regular expression like:
(.*?\[(.*?)\])
The second capture should contain all of the information between the set of []. This will however not work properly if the string contains nested [].
StringTokenizer won't cut it for the specified behavior. You'll need your own method. Something like:
public List extractTokens(String txt, String str, String end) {
int so=0,eo;
List lst=new ArrayList();
while(so<txt.length() && (so=txt.indexOf(str,so))!=-1) {
so+=str.length();
if(so<txt.length() && (eo=txt.indexOf(end,so))!=-1) {
lst.add(txt.substring(so,eo);
so=eo+end.length();
}
}
return lst;
}
The regular expression \\[[\\[\\w]+\\] gives us
[world] and
[[is]

Categories