How to split string using regex without consuming the splitter part? - java

How would I split a string without consuming the splitter part?
Something like this, but instead of : I'm using the regex #[a-fA-F0-9]{6}.
String from = "one:two:three";
String[] to = {"one", ":", "two", ":", "three"};
I already tried using commons lib since it has StringUtils.splitPreserveAllTokens() but it does not work with regex.
EDIT: I guess I should have been more specific, but this is more of what I was looking for.
String string = "Some text here #58a337test #a5fadbtest #123456test as well. #58a337Word#a5fadbwith#123456more hex codes.";
String[] parts = string.split("#[a-fA-F0-9]{6}");
/*Output: ["Some text here ","#58a337","test ","#a5fadb","test ","#123456","test as well. ",
"#58a337","Word","#a5fadb","with","#123456","more hex codes."]*/
EDIT 2: Solution!
final String string = "Some text here #58a337test #a5fadbtest #123456test as well. #58a337Word#a5fadbwith#123456more hex codes.";
String[] parts = string.split("(?=#.{6})|(?<=#.{6})");
for(String s: parts) {
System.out.println(s);
}
Output:
Some text here
#58a337
test
#a5fadb
test
#123456
test as well.
#58a337
Word
#a5fadb
with
#123456
more hex codes.

You could use \\b (a word boundary, with the backslash escaped for the Java string literal) to split in your case:
final String string = "one:two:three";
String[] parts = string.split("\\b");
for(String s: parts) {
System.out.println(s);
}

The answer given by @vrintle (+1) is probably the tightest code that can be written for your exact input. But if the input might contain non-word characters other than :, you could split more precisely using lookarounds:
String from = "one:two:three";
String[] parts = from.split("(?<=:)|(?=:)");
System.out.println(Arrays.toString(parts));
This prints:
[one, :, two, :, three]
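If the delimiter regex is too complex for a lookbehind (Java requires lookbehinds to have a bounded length), a Matcher loop is another option. The following is only a sketch: the class SplitKeepingDelimiters and the helper splitKeeping are hypothetical names, and it assumes the regex never matches the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitKeepingDelimiters {

    // Hypothetical helper: returns the text between matches and the matches
    // themselves, in the order they appear in the input.
    static List<String> splitKeeping(String input, String regex) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                parts.add(input.substring(last, m.start())); // text before the delimiter
            }
            parts.add(m.group()); // the delimiter itself
            last = m.end();
        }
        if (last < input.length()) {
            parts.add(input.substring(last)); // text after the last delimiter
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitKeeping("one:two:three", ":"));
        System.out.println(splitKeeping("a #58a337b#a5fadbc", "#[a-fA-F0-9]{6}"));
    }
}
```

Unlike the lookaround version, this never emits empty strings between two adjacent delimiters.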

Related

Using StringTokenizer with Comma delimiter while at the same time keeping commas preceded by a backslash [duplicate]

I'm trying to perform some super simple parsing of log files, so I'm using the String.split method like this:
String [] parts = input.split(",");
This works great for input like:
a,b,c
Or
type=simple, output=Hello, repeat=true
Just to say something.
How can I escape the comma, so it doesn't match intermediate commas?
For instance, if I want to include a comma in one of the parts:
type=simple, output=Hello, world, repeate=true
I was thinking in something like:
type=simple, output=Hello\, world, repeate=true
But I don't know how to create the split to avoid matching the comma.
I've tried:
String [] parts = input.split("[^\,],");
But, well, it's not working.
You can solve it using a negative look behind.
String[] parts = str.split("(?<!\\\\), ");
Basically it says: split on each ", " that is not preceded by a backslash.
String str = "type=simple, output=Hello\\, world, repeate=true";
String[] parts = str.split("(?<!\\\\), ");
for (String s : parts)
System.out.println(s);
Output:
type=simple
output=Hello\, world
repeate=true
If you happen to be stuck with the non-escaped comma-separated values, you could do the following (similar) hack:
String[] parts = str.split(", (?=\\w+=)");
Which says split on each ", " which is followed by some word-characters and an =
I'm afraid there's no perfect solution using String.split. Using a matcher for the three parts would work. In case the number of parts is not constant, I'd recommend a loop with matcher.find. Something like this, maybe:
final String s = "type=simple, output=Hello, world, repeat=true";
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,|$)");
final Matcher m = p.matcher(s);
while (m.find()) System.out.println(m.group(1));
You'll probably want to skip the spaces after the comma as well:
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,\\s*|$)");
It's not really complicated, just note that you need four backslashes in order to match one.
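As an alternative to the regex, the same parsing can be written as a small hand-rolled scanner. This swaps the matcher loop for a plain character loop; it is a sketch only, the names EscapedCommaScanner and splitEscaped are hypothetical, and it assumes a backslash escapes exactly the next character. It also avoids the extra empty match that, as far as I can tell, the matcher loop above reports at the very end of the input.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedCommaScanner {

    // Hypothetical helper: splits on unescaped commas, drops the escaping
    // backslash, and trims surrounding whitespace from each field.
    static List<String> splitEscaped(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                current.append(input.charAt(++i)); // keep the escaped character only
            } else if (c == ',') {
                fields.add(current.toString().trim()); // an unescaped comma ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim()); // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        String s = "type=simple, output=Hello\\, world, repeate=true";
        for (String field : splitEscaped(s)) {
            System.out.println("'" + field + "'");
        }
    }
}
```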
Escaping works with a negative lookbehind, the opposite of the construct in aioobe's original answer (update: aioobe now uses the same construct, but I didn't know that when I wrote this):
final String s = "type=simple, output=Hello\\, world, repeate=true";
final String[] tokens = s.split("(?<!\\\\),\\s*");
for(final String item : tokens){
System.out.println("'" + item.replace("\\,", ",") + "'");
}
Output:
'type=simple'
'output=Hello, world'
'repeate=true'
Reference:
Pattern: Special Constructs
I think
input.split("[^\\\\],");
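should work at first glance, since it only matches commas that are not preceded by a backslash. Note, however, that [^\\\\] is not a lookbehind: it matches, and therefore removes, the character in front of each comma, so every part that ends at a split loses its final character. The negative lookbehind (?<!\\\\), avoids that.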
BTW if you are working with Eclipse, I can recommend the QuickRex Plugin to test and debug Regexes.
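To see the difference between a consuming character class and a lookbehind, the two patterns can be compared side by side. This is a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class ConsumingVsLookbehind {

    // The character class matches the character before the comma,
    // so that character becomes part of the delimiter and is removed.
    static String[] consuming(String input) {
        return input.split("[^\\\\],");
    }

    // The lookbehind only checks the previous character; just the comma is removed.
    static String[] lookbehind(String input) {
        return input.split("(?<!\\\\),");
    }

    public static void main(String[] args) {
        String input = "one,two\\,three,four"; // actual text: one,two\,three,four
        System.out.println(Arrays.toString(consuming(input)));
        System.out.println(Arrays.toString(lookbehind(input)));
    }
}
```

With this input the first call yields on, two\,thre and four, while the second yields one, two\,three and four.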

When I split a string into multiple strings, how do I get a certain part of that string?

I am trying to find a way to have access to a specific string after a bigger string is broken down into smaller strings. Below is an example:
So now that there are two strings, how do I get the first or the second? Since there are brackets, I thought it was an array of strings, so I thought all I had to do was something like System.out.println(parts[0]); but that doesn't work.
String string = "hello ::= good morning";
String parts = Arrays.toString(string.split("::="));
System.out.println(parts);
the output should be [hello, good morning]
You need to put it in an array, like so:
String s = "I Like Apples.";
String[] parts = s.split(" ");
for(String a : parts)
System.out.println(a);
It works fine if you just remove Arrays.toString(...):
String string = "hello ::= good morning";
String parts[] = string.split("::=");
System.out.println(parts[0]);
Update:
To print the whole array, you can do this:
System.out.println(Arrays.toString(parts));
Also, to trim the spaces, you can change the split line to:
String parts[] = string.trim().split("\\s*::=\\s*");
This looks like a better use case for pattern grouping with a regular expression to me. I would group the left-hand side and right-hand side of ::= preceded and followed by optional white-space. For example,
String string = "hello ::= good morning";
// Lazy (.+?) so that group 1 does not swallow the whitespace before ::=
Pattern p = Pattern.compile("(.+?)\\s*::=\\s*(.+)");
Matcher m = p.matcher(string);
if (m.find()) {
System.out.println(m.group(2));
}
Which outputs
good morning
If you also wanted "hello" that would be m.group(1)
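When only the first ::= should act as the separator, split with a limit of 2 is a compact alternative to the matcher. A minimal sketch, where SplitOnce and splitRule are hypothetical names and the error handling is just one possible choice:

```java
class SplitOnce {

    // Hypothetical helper: splits "lhs ::= rhs" at the first ::= into two trimmed parts.
    static String[] splitRule(String rule) {
        // Limit 2 means at most one split, so a later ::= stays in the right-hand side.
        String[] parts = rule.trim().split("\\s*::=\\s*", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("no ::= found in: " + rule);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = splitRule("hello ::= good morning");
        System.out.println(parts[0]); // left-hand side
        System.out.println(parts[1]); // right-hand side
    }
}
```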

Java- Extract part of a string between two similar special characters

I want to extract the second number. Example:
String str = "1-10-251";
I want the result to be: 10
String str = "1-10-251";
String[] strArray = str.split("-");
System.out.println(strArray[1]);
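If only the middle part is needed, indexOf and substring avoid building an array at all. This is a sketch under the assumption that the input contains the delimiter at least twice; BetweenDelimiters and between are hypothetical names.

```java
class BetweenDelimiters {

    // Hypothetical helper: returns the text between the first and second
    // occurrence of the delimiter character.
    static String between(String s, char delimiter) {
        int first = s.indexOf(delimiter);
        int second = s.indexOf(delimiter, first + 1);
        if (first < 0 || second < 0) {
            throw new IllegalArgumentException("expected two '" + delimiter + "' in: " + s);
        }
        return s.substring(first + 1, second);
    }

    public static void main(String[] args) {
        System.out.println(between("1-10-251", '-'));
    }
}
```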

Split string against some characters except the # character

I want to split a string against the following characters
~!@$%^&*()_+=<>,.?/:;"'{}|[]\, \n, \t, space
I tried to use the \\s regex delimiter, but I don't want # to be treated as a split character, so a string like this is #funny should result in this, is and #funny as the resulting values.
I have tried the following, but it doesn't work:
"this is #funny".split("\\s")
Any ideas?
Just specify the characters you want in square brackets, which means "any one of these". Escape Java string characters once (like \") and regex special characters twice (like \\[):
@Test
public void testName() throws Exception
{
String[] split = "this is #funny".split("[~!@$%^&*()_+=<>,.?/:;\"'{}|\\[\\]\\\\ \\n\\t]");
for (String string : split)
{
logger.debug(string);
}
}
Use the replaceAll(String regex, String replacement) method from String.
String result = "this is #funny".replaceAll("[~!@$%^&*()_+=<>,.?/:;\"'{}|\\[\\]\\,\\n\\t]", "");
System.out.println(result);
You can try to implement this:
String[] split = "this&is%a#funny^string".split("[^#\\p{Alnum}]|\\s+");
for (String string : split){
System.out.println(string);
}
Also check the Java API (Pattern) for more information on how to process strings.
It looks like this will work for you:
String[] split = str.split("[^a-zA-Z&&[^#]]+");
This uses a character class subtraction to split on non-letter chars, except the hash.
Here's some test code:
String str = "this is #funny";
String[] split = str.split("[^a-zA-Z&&[^#]]+");
System.out.println(Arrays.toString(split));
Output:
[this, is, #funny]
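A simpler variant is to put the hash directly into the negated class, which sidesteps having to reason about how negation and && combine. This is a sketch only; SplitExceptHash and words are hypothetical names, and like the pattern above it treats digits as delimiters too.

```java
import java.util.Arrays;

class SplitExceptHash {

    // Hypothetical helper: splits on runs of characters that are neither
    // ASCII letters nor the hash sign.
    static String[] words(String s) {
        return s.split("[^a-zA-Z#]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("this is #funny")));
        System.out.println(Arrays.toString(words("this&is%a#funny^string")));
    }
}
```

If the input starts with a delimiter, the first array element is an empty string, as with any non-empty match at index 0 in String.split.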

Java StringTokenizer skips characters if they are part of the delimiter

I have an issue using Java's StringTokenizer:
String myString = "1||2||3|||4";
StringTokenizer stp = new StringTokenizer(myString, "||");
while (stp.hasMoreTokens()) {
System.out.println(stp.nextToken());
}
actual output : [1,2,3,4]
expected output : [1,2,3,'|4']
Could anyone help me with this?
Try this:
String myString = "1||2||3|||4";
String[] s=myString.split("\\|\\|");
for (String string : s) {
System.err.println(string);
}
I don't think you can make StringTokenizer do this, because that is simply how it works (you could set returnDelims to true and remove the delimiters manually, but that is sometimes harder than it looks).
String myString = "1||2||3|||4";
String[] tokens = myString.split("\\|\\|");
for(String token : tokens)
{
System.out.println(token);
}
You can use split which does what you want.
Output:
1
2
3
|4
It is recommended to use the split method of the String class for this, since StringTokenizer treats every character of the delimiter string as a separate delimiter, whereas split takes a regular expression that can match the two-character sequence. I would use this:
String[] splitStr = myString.split("[|]{2}");
This matches every time the regular expression [|] (a single pipe) is matched twice in a row.
You are maybe thinking of String.split, as this splits on a delimiter string.
A StringTokenizer takes the delimiter string and recognizes all characters in it as a delimiter. So in fact you redundantly specified the "|" character a second time.
Using the split function is what you maybe wanted:
System.out.println(Arrays.toString("1||2||3|||4".split("\\|\\|")));
This produces
[1, 2, 3, |4]
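The difference between the two APIs can be shown side by side. In this sketch, TokenizerVsSplit, tokenize and splitLiteral are hypothetical names; Pattern.quote is used so the delimiter is taken literally without manual escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

class TokenizerVsSplit {

    // StringTokenizer treats each character of delims as a delimiter of its own.
    static List<String> tokenize(String s, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // String.split takes a regex; Pattern.quote makes it match the delimiter literally.
    static List<String> splitLiteral(String s, String delimiter) {
        return Arrays.asList(s.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1||2||3|||4", "||"));     // [1, 2, 3, 4]
        System.out.println(splitLiteral("1||2||3|||4", "||")); // [1, 2, 3, |4]
    }
}
```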
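Take a look at this simple variant with a single-character delimiter (note that it still prints 1, 2, 3 and 4, without the leading | on the last token):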
StringTokenizer stp = new StringTokenizer(myString, "|");
while (stp.hasMoreTokens()) {
System.out.println(stp.nextToken());
}
