extra space after parsing a string with regular expression - java

I have the following simple code:
String d = "_|,|\\.";
String s1 = "b,_a_.";
Pattern p = Pattern.compile(d);
String[] ss = p.split(s1);
for (String str : ss){
System.out.println(str.trim());
}
The output gives
b
a
Where does the extra space come from between b and a?

You do not have an extra space; you get an empty element in the resulting array because your regex matches only one character at a time, so when several delimiter characters appear in a row, the string is split at each of them.
Thus, you should match as many of those characters as possible with the + (one or more) quantifier, either by placing the whole expression into a non-capturing group ((?:_|,|\\.)+) or, better, by using a character class [_,.]+:
String d = "(?:_|,|\\.)+"; // Or better: String d = "[_,.]+";
String s1 = "b,_a_.";
Pattern p = Pattern.compile(d);
String[] ss = p.split(s1);
for (String str : ss){
System.out.println(str.trim());
}
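To make the difference visible, here is a minimal sketch that splits the question's input both ways. The class name SplitRunsDemo and its two helper methods are made up for illustration; only String.split and Arrays.toString come from the JDK.

```java
import java.util.Arrays;

public class SplitRunsDemo {
    // Splits at every single delimiter character, so two adjacent
    // delimiters produce an empty string between them.
    public static String[] splitEach(String input) {
        return input.split("[_,.]");
    }

    // Treats a whole run of delimiter characters as one separator.
    public static String[] splitRuns(String input) {
        return input.split("[_,.]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEach("b,_a_."))); // [b, , a]
        System.out.println(Arrays.toString(splitRuns("b,_a_."))); // [b, a]
    }
}
```

Trailing empty strings are dropped by split, which is why the delimiters at the end of the input do not add further elements in either case.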

I was puzzled by this myself at first; maybe what you want is to change your regex to
String d = "[_,\\.]+";

Related

Split string without losing split character

I want to split a string in Java some string like this, normal split function splits the string while losing the split characters:
String str = "123{456]789[012*";
I want to split the string for {,[,],* character but don't want to lose them. I mean I want results like this:
part 1 = 123{
part 2 = 456]
part 3 = 789[
part 4 = 012*
Normally split function splits like this:
part 1 = 123
part 2 = 456
part 3 = 789
part 4 = 012
Is it possible?
You can use zero-width lookahead/behind expressions to define a regular expression that matches the zero-length string between one of your target characters and anything that is not one of your target characters:
(?<=[{\[\]*])(?=[^{\[\]*])
Pass this expression to String.split:
String[] parts = "123{456]789[012*".split("(?<=[{\\[\\]*])(?=[^{\\[\\]*])");
If you have a block of consecutive delimiter characters this will split once at the end of the whole block, i.e. the string "123{456][789[012*" would split into four blocks "123{", "456][", "789[", "012*". If you used just the first part (the look-behind)
(?<=[{\[\]*])
then you would get five parts "123{", "456]", "[", "789[", "012*"
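The two variants can be compared side by side with a short sketch like the one below. The class name KeepDelimiterSplit and the method names are hypothetical; the regular expressions are the ones given in this answer.

```java
import java.util.Arrays;

public class KeepDelimiterSplit {
    // Look-behind plus look-ahead: split only where a block of
    // delimiters ends and a non-delimiter character follows.
    public static String[] splitAfterRun(String input) {
        return input.split("(?<=[{\\[\\]*])(?=[^{\\[\\]*])");
    }

    // Look-behind only: split after every single delimiter character.
    public static String[] splitAfterEach(String input) {
        return input.split("(?<=[{\\[\\]*])");
    }

    public static void main(String[] args) {
        String input = "123{456][789[012*";
        System.out.println(Arrays.toString(splitAfterRun(input)));  // [123{, 456][, 789[, 012*]
        System.out.println(Arrays.toString(splitAfterEach(input))); // [123{, 456], [, 789[, 012*]
    }
}
```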
Using a positive lookbehind:
(?<=\{|\[|\]|\*)
String str = "123{456]789[012*";
String parts[] = str.split("(?<=\\{|\\[|\\]|\\*)");
System.out.println(Arrays.toString(parts));
Output:
[123{, 456], 789[, 012*]
I think you're looking for something like
String str = "123{456]789[012*";
String[] parts = new String[] {
str.substring(0,4), str.substring(4,8), str.substring(8,12),
str.substring(12)
};
System.out.println(Arrays.toString(parts));
Output is
[123{, 456], 789[, 012*]
You can use a Pattern and a Matcher to find the end index of each splitting character, so that the delimiter stays attached to the part before it.
public static List<String> split(String string, String splitRegex) {
List<String> result = new ArrayList<String>();
Pattern p = Pattern.compile(splitRegex);
Matcher m = p.matcher(string);
int index = 0;
while (index < string.length()) {
if (m.find()) {
// Cut at the end of the match so the delimiter is kept, whatever its length.
int splitIndex = m.end();
result.add(string.substring(index, splitIndex));
index = splitIndex;
} else {
// No delimiter left: add the tail and stop, otherwise the loop never ends.
result.add(string.substring(index));
break;
}
}
return result;
}
Example code:
public static void main(String[] args) {
System.out.println(split("123{456]789[012*","\\{|\\]|\\[|\\*"));
}
Output:
[123{, 456], 789[, 012*]

How to split a '*' String in Java

I have a problem splitting a string on '*split_*'; it seems Java (in NetBeans) can't split when '*split_*' is used.
Any idea how we can overcome this?
I referred to this solution, but it only works when the delimiter does not contain '*': How to split a string in Java
String echoPHP= "test*split_*test2";
String[] strArray = echoPHP.split("*split_*");
String part1 = strArray[0]; // test
String part2 = strArray[1]; // test2
System.out.println(strArray[0]);
System.out.println(strArray[1]);
error is:
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*split_*
output supposed to be:
test
test2
Use Pattern.quote() around your split string to ensure it's taken as a literal, not a regular expression:
String[] strArray = echoPHP.split(Pattern.quote("*split_*"));
You'll have difficulties otherwise, since * is a special character in regular expressions, used to match any number of occurrences of the character or group that preceded it.
Of course, you could manually escape all the special characters used in regular expressions using \, but this is both less clear and more error prone if you don't want to use any regular expression features.
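As a minimal sketch of the two options, the following compares Pattern.quote() with manual escaping and shows that the unescaped delimiter is rejected as a regex. The class name LiteralSplitDemo and its helper methods are invented for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LiteralSplitDemo {
    // Splits on the delimiter taken literally, whatever characters it contains.
    public static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    // Reports whether a string compiles as a regular expression.
    public static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String echoPHP = "test*split_*test2";
        System.out.println(Arrays.toString(splitLiteral(echoPHP, "*split_*")));  // [test, test2]
        System.out.println(Arrays.toString(echoPHP.split("\\*split_\\*")));      // [test, test2]
        System.out.println(isValidRegex("*split_*"));                            // false
    }
}
```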
try: echoPHP.split("\\*split_\\*")
The important thing to remember is that the String you pass to the split method is really a regular expression. Refer to the API for more details: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
Here are different cases to split string in java. You can use one which may fit in your application.
case 1 : Here is code to split string by a character "." :
String imageName = "picture1.jpg";
String [] imageNameArray = imageName.split("\\.");
for(int i =0; i< imageNameArray.length ; i++)
{
System.out.println(imageNameArray[i]);
}
And what if there are accidentally spaces before or after the "."? It is good practice to account for those spaces as well.
String imageName = "picture1 . jpg";
String [] imageNameArray = imageName.split("\\s*\\.\\s*");
for(int i =0; i< imageNameArray.length ; i++)
{
System.out.println(imageNameArray[i]);
}
Here, \\s* absorbs the spaces around the dot, so you get only the required split strings. Note that the dot itself must still be escaped as \\.
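A small sketch of why the dot has to stay escaped even when it is surrounded by \\s*. The class name DotWithSpacesSplit and the method splitOnDot are hypothetical.

```java
import java.util.Arrays;

public class DotWithSpacesSplit {
    // Splits on a literal dot and swallows any whitespace around it.
    public static String[] splitOnDot(String input) {
        return input.split("\\s*\\.\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDot("picture1 . jpg"))); // [picture1, jpg]
        System.out.println(Arrays.toString(splitOnDot("picture1.jpg")));   // [picture1, jpg]

        // With an unescaped dot every character is a delimiter, so only
        // empty strings remain and split returns an empty array.
        System.out.println("picture1 . jpg".split("\\s*.\\s*").length);    // 0
    }
}
```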
Now, suppose you have placed parameters between two special characters, like #parameter# or *parameter*, or even two different signs at a time, like *parameter#. We can list all the parameters between those signs with this code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Splitter {
public static void main(String[] args) {
String pattern1 = "#";
String pattern2 = "#";
String text = "(#n1_1#/#n2_2#)*2/#n1_1#*34/#n4_4#";
Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2));
Matcher m = p.matcher(text);
// Collect every captured parameter in one list declared outside the loop.
List<String> result = new ArrayList<>();
while (m.find()) {
result.add(m.group(1));
}
System.out.println(result);
}
}
Here the list result will contain the parameters n1_1, n2_2, n1_1 and n4_4 (n1_1 occurs twice in the text).
You can use split method like this
String[] strArray = echoPHP.split("\\*split_\\*");
The * character is a special character in regular expressions, so you should put "\\" in front of each * character.

Getting next two words from a given word in string with words containing non alphanumeric characters as well

I have a String as below:
String str = "This is something Total Toys (RED) 300,000.00 (49,999.00) This is something";
Input from user would be a keyword String viz. Total Toys (RED)
I can get the index of the keyword using str.indexOf(keyword);
I can also get the start of the next word by adding length of keyword String to above index.
However, how can I get the next two tokens after the keyword in given String which are the values I want?
if(str.contains(keyWord)){
String Value1 = // what should come here such that value1 is 300,000.00 which is first token after keyword string?
String Value2 = // what should come here such that value2 is (49,999.00) which is second token after keyword string?
}
Context : Read a PDF using PDFBox. The keyword above is the header in first column of a table in the PDF and the next two tokens I want to read are the values in the next two columns on the same row in this table.
You can use regular expressions to do this. This will work for all instances of the keyword that are followed by two tokens, if the keyword is not followed by two tokens, it won't match; however, this is easily adaptable, so please state if you want to match in cases where 0 or 1 tokens follow the keyword.
String regex = "(?i)%s\\s+([\\S]+)\\s+([\\S]+)";
Matcher m = Pattern.compile(String.format(regex, Pattern.quote(keyword))).matcher(str);
while (m.find())
{
System.out.println(m.group(1));
System.out.println(m.group(2));
}
In your example, %s in the regex would be replaced by the quoted form of "Total Toys", giving:
300,000.00 49,999.00
(?i) means case-insensitive
\\s means whitespace
\\S means non-whitespace
[...] is a character class
+ means 1 or more
(...) is a capturing group
EDIT: If you want to use a keyword that contains characters with a special meaning in regular expressions, then you need to use Pattern.quote(). For example, in regex, ( and ) are special characters, so a keyword containing them will result in an incorrect regex. Pattern.quote() wraps the keyword in \\Q and \\E so that every character in it is matched literally, which has the same effect as escaping them by hand as \\( and \\).
If you want three groups, use this:
String regex = "%s\\s+([\\S]+)\\s+([\\S]+)(?:\\s+([\\S]+))?";
NB: If only two groups follow, group(3) will be null.
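Put together, the approach could look like the sketch below, run against the string and keyword from the question. The class name TokensAfterKeyword and the method nextTwoTokens are hypothetical, and the regex is built by concatenation instead of String.format, which is only a stylistic choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokensAfterKeyword {
    // Returns the two whitespace-separated tokens that follow the first
    // occurrence of keyword, or an empty list if fewer than two follow.
    public static List<String> nextTwoTokens(String text, String keyword) {
        Pattern p = Pattern.compile("(?i)" + Pattern.quote(keyword) + "\\s+(\\S+)\\s+(\\S+)");
        Matcher m = p.matcher(text);
        List<String> tokens = new ArrayList<>();
        if (m.find()) {
            tokens.add(m.group(1));
            tokens.add(m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String str = "This is something Total Toys (RED) 300,000.00 (49,999.00) This is something";
        System.out.println(nextTwoTokens(str, "Total Toys (RED)")); // [300,000.00, (49,999.00)]
    }
}
```

Because the keyword is quoted, the parentheses in "Total Toys (RED)" are matched literally rather than being read as a capturing group.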
Something like this:
String remainingPart= str.substring(str.indexOf(keyWord)+keyWord.length());
StringTokenizer st=new StringTokenizer(remainingPart);
if(st.hasMoreTokens()){
Value1=st.nextToken();
}
if(st.hasMoreTokens()){
Value2=st.nextToken();
}
Try this,
String str = "This is something Total Toys 300,000.00 49,999.00 This is something";
if(str.contains(keyWord)) {
String splitLine = str.split(Pattern.quote(keyWord))[1];
String tokens[] = splitLine.split(" ");
String Value1 = tokens[1];
String Value2 = tokens[2];
}
Here is something that works given what you have provided:
public static void main(String[] args)
{
String search = "Total Toys";
String str = "This is something Total Toys 300,000.00 49,999.00 This is something";
int index = str.indexOf(search);
index += search.length();
String[] tokens = str.substring(index, str.length()).trim().split(" ");
String val1 = tokens[0];
String val2 = tokens[1];
System.out.println("Val1: " + val1 + ", Val2: " + val2);
}
Output:
Val1: 300,000.00, Val2: 49,999.00

Java - Split string

I have a string which is separated by ".". When I try to split it by the dot, it is not getting split.
Here is the exact code i have. Please let me know what could cause this not to split the string.
public class TestStringSplit {
public static void main(String[] args) {
String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";
String test[] = testStr.split(".");
for (String string : test) {
System.out.println("test : " + string);
}
System.out.println("Str Length : " + test.length);
}
}
I have to separate the above string and get only the last part. In the above case it is CreateRequisitionRO, not CreateRequisitionRO; please help me to get this.
You can split this string through StringTokenizer and get each word between dot
StringTokenizer tokenizer = new StringTokenizer(string, ".");
String firstToken = tokenizer.nextToken();
String secondToken = tokenizer.nextToken();
Since you are looking for the last word, CreateRequisitionRO, you can also use
String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";
String yourString = testStr.substring(testStr.lastIndexOf('.')+1, testStr.length()-1);
String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";
String test[] = testStr.split("\\.");
for (String string : test) {
System.out.println("test : " + string);
}
System.out.println("Str Length : " + test.length);
The "." is a regular expression wildcard, so you need to escape it.
Change String test[] = testStr.split("."); to String test[] = testStr.split("\\.");.
Since String.split takes a regex argument, you need to escape the dot character (which is a wildcard in regex).
Note that String.split takes a regular expression, and . has a special meaning in regular expressions (it matches any character except a line terminator), so you need to escape it:
String test[] = testStr.split("\\.");
Note that you escape the . at the level of regular expression once: \., and to specify \. in a string literal, \ needs to be escaped again. So the string to pass to String.split is "\\.".
Another way is to put it inside a character class, where . loses its special meaning:
String test[] = testStr.split("[.]");
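To see the escaping options next to each other, here is a minimal sketch using the string from the question. The class name DotSplitDemo and the helper lastSegment are made up for illustration; lastSegment also strips the trailing ";" that the question wants removed.

```java
import java.util.regex.Pattern;

public class DotSplitDemo {
    // Returns the part after the last dot, without a trailing ';' if present.
    public static String lastSegment(String className) {
        String[] parts = className.split("\\.");
        String last = parts[parts.length - 1];
        return last.endsWith(";") ? last.substring(0, last.length() - 1) : last;
    }

    public static void main(String[] args) {
        String testStr = "[Lcom.hexgen.ro.request.CreateRequisitionRO;";

        // Unescaped dot: every character is a delimiter, so nothing is left.
        System.out.println(testStr.split(".").length);                // 0

        // Three equivalent ways to split on a literal dot.
        System.out.println(testStr.split("\\.").length);              // 5
        System.out.println(testStr.split("[.]").length);              // 5
        System.out.println(testStr.split(Pattern.quote(".")).length); // 5

        System.out.println(lastSegment(testStr));                     // CreateRequisitionRO
    }
}
```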
You need to escape the . as it is a special character, a full list of these is available. Your split line needs to be:
String test[] = testStr.split("\\.");
Split takes a regular expression as a parameter. If you want to split by the literal ".", you need to escape the dot because that is a special character in a regular expression. Try putting 2 backslashes before your dot ("\\.") - hopefully that does what you are looking for.
String test[] = testStr.split("\\.");

Java split regular expression

If I have a string, e.g.
setting=value
How can I remove the '=' and turn that into two separate strings containing 'setting' and 'value' respectively?
Thanks very much!
Two options spring to mind.
The first split()s the String on =:
String[] pieces = s.split("=", 2);
String name = pieces[0];
String value = pieces.length > 1 ? pieces[1] : null;
The second uses regexes directly to parse the String:
Pattern p = Pattern.compile("(.*?)=(.*)");
Matcher m = p.matcher(s);
if (m.matches()) {
String name = m.group(1);
String value = m.group(2);
}
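The second gives you more power. For example, you can automatically strip surrounding whitespace if you change the pattern to the following (the value group is lazy here so that the final \\s* can absorb trailing whitespace):
Pattern p = Pattern.compile("\\s*(.*?)\\s*=\\s*(.*?)\\s*");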
You don't need a regular expression for this, just do:
String str = "setting=value";
String[] split = str.split("=");
// split[0] == "setting", split[1] == "value"
You might want to set a limit if value can have an = in it too; see the javadoc
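The effect of the limit can be sketched as follows. The class name KeyValueSplit and the method parse are hypothetical, and trimming the two pieces is an extra assumption that the question does not ask for.

```java
import java.util.Arrays;

public class KeyValueSplit {
    // Splits at the first '=' only, so the value may itself contain '='.
    // Returns {name, value}; value is null when the line has no '='.
    public static String[] parse(String line) {
        String[] pieces = line.split("=", 2);
        String name = pieces[0].trim();
        String value = pieces.length > 1 ? pieces[1].trim() : null;
        return new String[] {name, value};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("setting=value")));    // [setting, value]
        System.out.println(Arrays.toString(parse("url = a=b")));        // [url, a=b]
        System.out.println(Arrays.toString("url=a=b".split("=")));      // [url, a, b]
    }
}
```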
