How to split this string in Java

Hi, in my program a string is generated like "1&area_id=54&cid=3": first an integer, then the string "&area_id=", then another integer, then the string "&cid=", and then the final integer. The two strings are always the same, but the integers change. How can I write a split call so that I get these three integers in three variables or in an array? I can separate them by looping, but I want to use the split function. Thanks.

How about
string.split("&\\w+=")
This works with your example:
System.out.println(Arrays.asList("1&area_id=54&cid=3".split("&\\w+=")));
outputs
[1, 54, 3]
The call to string.split("&\\w+=") reads in English: Split string on every match for the regular expression parameter, and then return all substrings in between the matched tokens as an array.
The regular expression reads: Match all substrings starting with "&", followed by at least ("+") one word character ("\\w", i.e. letters, digits, and the underscore, as in your example), followed by "=". For more details see the Javadoc for java.util.regex.Pattern
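As a minimal sketch of how the split result could be turned into three int values: the class name SplitIds and the helper parseIds are made up for illustration, and the input is assumed to always have the shape from the question (anything else would make Integer.parseInt throw).

```java
import java.util.Arrays;

public class SplitIds {
    // Hypothetical helper: splits on every "&name=" separator and parses each piece as an int.
    static int[] parseIds(String s) {
        String[] parts = s.split("&\\w+=");
        int[] ids = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            ids[i] = Integer.parseInt(parts[i]);
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = parseIds("1&area_id=54&cid=3");
        System.out.println(Arrays.toString(ids)); // [1, 54, 3]
    }
}
```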

Related

Count the number of words in a string

I want to count the number of words in a given String. For example, we are parsing a large text document.
I have used this method
noOfWords = countedStegoText.trim().split(" +").length;
But what if the text contains two kinds of spaces, for example U+0020 and U+205F? How can I count the number of words in that case?
.split(...) can take a regular expression, simply build one that includes all your matching characters.
For example (inside a character class no "|" is needed; it would be treated as a literal character):
"hello world-foo_bar".split("[ \\-_]")
Results in an array of length 4
["hello", "world", "foo", "bar"]
To use Unicode characters in a regex you can write \u####, so you're looking for something like:
countedStegoText.trim().split("[\u0020\u205F]+").length
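A rough sketch of a word counter along these lines follows. The names WordCount and countWords are invented for this example. It relies on two details: inside a character class "|" is a literal character, so it is left out, and String.trim() only strips characters up to U+0020, so a leading or trailing U+205F has to be stripped separately.

```java
public class WordCount {
    // Hypothetical helper: counts words separated by runs of U+0020 or U+205F.
    static int countWords(String text) {
        // Strip leading and trailing separators ourselves, because
        // String.trim() would leave a U+205F at either end in place.
        String stripped = text.replaceAll("^[\\u0020\\u205F]+|[\\u0020\\u205F]+$", "");
        if (stripped.isEmpty()) {
            return 0; // "".split(...) would return one empty element, so handle it here
        }
        return stripped.split("[\\u0020\\u205F]+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hello\u205Fworld  foo")); // 3
    }
}
```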

Where does the split() method in Java begin matching regex to a string?

I was messing around with the split() method in Java when I came across a problem which I couldn't seem to understand. I was curious as to where exactly the split method starts to search for regex matches: at the first character, before, or after?
Given String "test":
If the split method starts before the first character then there should be an empty string before the string "test", and splitting at an empty string should return an array of length 6, but it is of length 5.
System.out.println("test".split("",-1).length);
So clearly the split method does not start before the given string.
If the split method starts at the first character of the given string, then shouldn't splitting with a regex of "Z*" return an array of length 6 with a leading empty string, as the first character is indeed not Z (hence 0 or more times)? However, it returns an array of length 5.
System.out.println("test".split("Z*",-1).length);
So by elimination the split method starts after the first character...
but clearly it does not since the following code works as expected:
System.out.println("test".split("t",-1).length);
Output: 3
So where exactly does the split method start searching for regex matches? Or what exactly is the gap in my reasoning?
You can read the JDK source code online; the relevant code is split in OpenJDK 8.
String.split has a fast path for simple patterns such as a single character that is not a regex metacharacter, but most of the work is delegated to Pattern.split. Pattern.split has a special case for a zero-width match at the beginning of the string: such a match does not produce a leading empty substring.
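The three calls from the question can be compared side by side. This is only a small demonstration, assuming Java 8 or later (earlier versions keep the leading empty string for zero-width matches); SplitStart and parts are names chosen for this sketch.

```java
import java.util.Arrays;

public class SplitStart {
    // Hypothetical helper: splits with limit -1 so trailing empty strings stay visible.
    static String parts(String input, String regex) {
        return Arrays.toString(input.split(regex, -1));
    }

    public static void main(String[] args) {
        // A zero-width match at index 0 adds no leading empty string.
        System.out.println(parts("test", ""));   // [t, e, s, t, ]
        System.out.println(parts("test", "Z*")); // [t, e, s, t, ]
        // A match of width 1 at index 0 does add a leading empty string.
        System.out.println(parts("test", "t"));  // [, es, ]
    }
}
```

So matching starts at index 0 in every case; the difference is only in whether a match found there has zero width.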

Remove a numeric value along with the word after that numeric value in a String

I want to remove a numeric value plus the words after it.
I have some strings like the ones below. When I run my regex, it only removes the numeric value; the text after that value is not removed. I want the text after it to be deleted as well.
Before:
APPLEJUICE2.4L
GreenAppleJuice1L
HALVEDPEACHES415g
IceyChocIceCream60ml
After:
APPLEJUICE
GreenAppleJuice
HALVEDPEACHES
IceyChocIceCream
You can use this regex:
String repl = input.replaceFirst("\\d\\S*", "");
That is, find the first digit in the input and match every non-whitespace character after it.
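Applied to the sample strings from the question, the replacement could be checked like this. It is a small sketch; the class name StripQuantity and the method stripQuantity are placeholders.

```java
public class StripQuantity {
    // Hypothetical helper: removes the first digit and all non-whitespace characters after it.
    static String stripQuantity(String input) {
        return input.replaceFirst("\\d\\S*", "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "APPLEJUICE2.4L", "GreenAppleJuice1L", "HALVEDPEACHES415g", "IceyChocIceCream60ml"
        };
        for (String s : samples) {
            System.out.println(stripQuantity(s)); // e.g. APPLEJUICE for the first sample
        }
    }
}
```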

Java split by alphabetic char creates an empty value in array

I want to split my string on every occurrence of an alphabetic character.
For example:
"s1l1e13" to an array of: ["s1","l1","e13"]
When trying to use this simple split by regex I get some weird results:
testStr = "s1l1e13"
Arrays.toString(testStr.split("(?=[a-z])"))
gives me the array of:
["","s1","l1","e13"]
How can I create the split without the empty array element?
I tried a couple more things:
testStr = "s1"
Arrays.toString(testStr.split("(?=[a-z])"))
does return the correct array: ["s1"]
but when trying to use substring
testStr = "s1l1e13"
Arrays.toString(testStr.substring(1).split("(?=[a-z])"))
I get in return ["1","l1","e13"]
What am I missing?
Your lookahead marks each position before any character from a to z, i.e. the following positions:
s1 l1 e13
^  ^  ^
So by splitting using just the lookahead, it returns ["", "s1", "l1", "e13"]
You can use a negative lookbehind here. It looks behind to check that the position is not the beginning of the string.
String s = "s1l1e13";
String[] parts = s.split("(?<!\\A)(?=[a-z])");
System.out.println(Arrays.toString(parts)); //=> [s1, l1, e13]
Your problem is that (?=[a-z]) means "place before [a-z]" and in your text
s1l1e13
you have 3 such places. I will mark them with |
|s1|l1|e13
so split (unfortunately, correctly) produces "" "s1" "l1" "e13" and does not automatically remove the leading empty element for you.
To solve this problem you have at least two options:
make sure that there is something before the place you split on (i.e. it is not at the start of your string). You can use, for instance, (?<=\\d)(?=[a-z]) if you want to split after a digit but before a letter
(PREFERRED SOLUTION) start using Java 8, which omits the empty string at the start of the result array when the regex used in split has a zero-length match at the beginning of the input (look-arounds are zero-length).
The first match finds "" to be okay because it's looking ahead for any alphabetic character; this is a zero-width lookahead, so it doesn't need to consume anything. The "s" at the beginning is alphabetic, so the position before it matches.
If you want the regex to match something always, use ".+(?=[a-z])"
The problem is that the initial "s" counts as an alphabetic character. So, the regex is trying to split at s.
The issue is that there is nothing before the s, so split reports that nothing as an empty string (not null) at the start of the array. Trailing empty strings, by contrast, are dropped by default.
If this is the only string you're splitting, or if every array you had starts with a letter but does not end with one, just truncate the array to omit the first element. Otherwise, you'll probably need to loop through each array as you make it so that you can drop empty elements.
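Dropping the empty elements after the fact, as suggested above, might look like the following. This is a sketch that assumes Java 8 or later for the stream API; DropEmpty and dropEmpty are illustrative names.

```java
import java.util.Arrays;

public class DropEmpty {
    // Hypothetical helper: returns a copy of the array without empty strings.
    static String[] dropEmpty(String[] parts) {
        return Arrays.stream(parts)
                .filter(p -> !p.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // The array the question reported, including the leading empty element.
        String[] raw = {"", "s1", "l1", "e13"};
        System.out.println(Arrays.toString(dropEmpty(raw))); // [s1, l1, e13]
    }
}
```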
So it seems your tokens have the pattern x###, where x is a letter and # is a digit.
I'd use the following regex to match the tokens instead of splitting:
([a-z][0-9]+)
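One way this matching approach could be used is with a Matcher loop that collects every token. The sketch below uses the pattern without the capturing group, since group() already returns the whole match; TokenMatcher and tokens are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenMatcher {
    // One lowercase letter followed by one or more digits.
    private static final Pattern TOKEN = Pattern.compile("[a-z][0-9]+");

    // Hypothetical helper: collects every match instead of splitting between them.
    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("s1l1e13")); // [s1, l1, e13]
    }
}
```

Because nothing is split, no empty leading element can appear, regardless of the Java version.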

Java String split regex not working as expected

The following Java code will print "0". I would expect that this would print "4". According to the Java API String.split "Splits this string around matches of the given regular expression". And from the linked regular expression documentation:
Predefined character classes
. Any character (may or may not match line terminators)
Therefore I would expect "Test" to be split on each character. I am clearly misunderstanding something.
System.out.println("Test".split(".").length); //0
You're right: it is split on each character. However, the "split" character is not returned by this function, hence the resulting 0.
The important part in the javadoc: "Trailing empty strings are therefore not included in the resulting array. "
I think you want "Test".split(".", -1).length (length is a field on arrays, not a method), but this will return 5 and not 4 (there are 5 "spaces": one before T, one between T and e, others for e-s, s-t, and the last one after the final t.)
If you want to split on a literal dot instead, you have to escape it with two backslashes:
String[] parts = string.split("\\.",-1);
Everything is OK. "Test" is split on each character, and between the characters there is nothing but empty strings. If you want to iterate over each character of your string, you can use the charAt and length methods.
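The cases discussed in these answers can be put next to each other in one short program. It is just a sketch; DotSplit and count are names invented here, and "a.b.c" is a made-up input for the literal-dot case.

```java
public class DotSplit {
    // Hypothetical helper: number of elements split returns for the given limit.
    static int count(String input, String regex, int limit) {
        return input.split(regex, limit).length;
    }

    public static void main(String[] args) {
        System.out.println(count("Test", ".", 0));     // 0: five empty strings, all trailing, all dropped
        System.out.println(count("Test", ".", -1));    // 5: a negative limit keeps the empty strings
        System.out.println(count("a.b.c", "\\.", -1)); // 3: the escaped dot matches only a literal '.'

        // Visiting each character without split, using charAt and length.
        String s = "Test";
        for (int i = 0; i < s.length(); i++) {
            System.out.println(s.charAt(i));
        }
    }
}
```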
