Split a string in Java into two parts

I want to split a string based on a substring, and get the first part. Example below.
Input:
body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]
Output (split at [12]):
body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]
I wrote this code :
String path1 = "body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]";
String result;
if(path1.contains("[12]")){
System.out.println("yes");
result = path1.split("[12]")[0];
System.out.println(result);
}
but I got result like this :
body/div[

String result = path1.substring(0, path1.indexOf("li[12]") + 6);

The split method accepts regular expressions. The regular expression [12] is a character class that matches one character which is either 1 or 2, so the string is split at every 1 or 2. Element [0] is then everything before the first 2, which is body/div[. A better solution is to search for the occurrence of [12] directly:
int indexOf12 = path1.indexOf("[12]");
if(indexOf12 != -1)
{
System.out.println("yes");
String result = path1.substring(0, indexOf12 + 4);
System.out.println(result);
}
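As a rough sketch of the same idea in reusable form: the helper names below (prefixThrough, firstPartLiteralSplit) are made up for illustration and are not from the question. The first uses indexOf plus the marker's length instead of a hard-coded 4; the second shows that split can still be used if the marker is wrapped in Pattern.quote, so it is treated as literal text rather than a regex.

```java
import java.util.regex.Pattern;

final class PrefixThrough {
    // Hypothetical helper: returns s up to and including the first occurrence
    // of marker, or s unchanged when marker does not occur.
    static String prefixThrough(String s, String marker) {
        int i = s.indexOf(marker);
        return i < 0 ? s : s.substring(0, i + marker.length());
    }

    // Hypothetical helper: Pattern.quote makes split treat marker literally.
    // The marker itself is not part of the result.
    static String firstPartLiteralSplit(String s, String marker) {
        return s.split(Pattern.quote(marker), 2)[0];
    }

    public static void main(String[] args) {
        String path = "body/div[2]/ul/li[12]/div/div[2]";
        System.out.println(prefixThrough(path, "[12]"));         // body/div[2]/ul/li[12]
        System.out.println(firstPartLiteralSplit(path, "[12]")); // body/div[2]/ul/li
    }
}
```

The indexOf version avoids regex entirely, which is probably the simpler choice when the marker is a fixed string.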

The [ character is interpreted as a special regex character so you should escape it by adding \\
So replace
result = path1.split("[12]")[0];
By
result = path1.split("\\[12]")[0];
Output:
yes
body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li

indexOf returns the position where li[12] starts, so add its length (6) to keep li[12] in the result:
String result = path1.substring(0, path1.indexOf("li[12]")+6);

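This will solve your problem. The point is that split expects a regex, not a plain string, so the brackets have to be escaped. Since split drops the delimiter, [12] is appended again when printing.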
String path1 = "body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]";
String result;
if(path1.contains("[12]")){
System.out.println("yes");
result = path1.split("\\[12\\]")[0];
System.out.println(result+"[12]");
}

Here's an example of RegEx specific approach:
Matcher m = Pattern.compile("(.*\\[12\\])")
.matcher("body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]");
Output
body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]
Code
import java.util.regex.*;
import java.util.*;
public class HelloWorld {
public static void main(String[] args) {
List<String> allMatches = new ArrayList<String>();
Matcher m = Pattern.compile("(.*\\[12\\])")
.matcher("body/div[2]/div[3]/div/div[1]/div/div[2]/div[2]/ul/li[12]/div/div/div/div[2]/div[2]");
while (m.find())
allMatches.add(m.group(1));
for (String match: allMatches)
System.out.println(match);
}
}

Related

Remove all the leading zero from the number part of a string

I am trying to remove all the leading zeros from the number parts of a string. I have come up with the code below. It works for the first example, but when I add a '0' at the beginning it does not give the proper output. Does anybody know how to achieve this? Thanks in advance.
input: (2016)abc00701def00019z -> output: (2016)abc701def19z -> result: correct
input: 0(2016)abc00701def00019z -> output: (2016)abc71def19z -> result: wrong -> expected output: (2016)abc701def19z
EDIT: The string can contain other than english alphabet.
String localReference = "(2016)abc00701def00019z";
String localReference1 = localReference.replaceAll("[^0-9]+", " ");
List<String> lists = Arrays.asList(localReference1.trim().split(" "));
System.out.println(lists.toString());
String[] replacedString = new String[5];
String[] searchedString = new String[5];
int counter = 0;
for (String list : lists) {
String s = CharMatcher.is('0').trimLeadingFrom(list);
replacedString[counter] = s;
searchedString[counter++] = list;
System.out.println(String.format("Search: %s, replace: %s", list,s));
}
System.out.println(StringUtils.replaceEach(localReference, searchedString, replacedString));
str.replaceAll("(^|[^0-9])0+", "$1");
This removes any row of zeroes after non-digit characters and at the beginning of the string.
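Here is a small sketch to try that one-liner on the inputs from the question. The class and method names are invented for the example. One thing to be aware of: a number made up only of zeros is removed completely, which matches the expected output in the question (the leading 0 disappears) but may not be what you want elsewhere.

```java
final class LeadingZeros {
    // Same regex as above: a run of zeros is removed when it sits at the
    // start of the string or directly after a non-digit character.
    static String stripLeadingZeros(String s) {
        return s.replaceAll("(^|[^0-9])0+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingZeros("(2016)abc00701def00019z"));  // (2016)abc701def19z
        System.out.println(stripLeadingZeros("0(2016)abc00701def00019z")); // (2016)abc701def19z
        System.out.println(stripLeadingZeros("a0b"));                      // ab
    }
}
```

Zeros inside a number (the 0 in 2016 or in 701) are left alone because the character before them is a digit.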
Using regex, I was able to handle both test cases you gave. $1 and $2 in the code below refer to the parenthesized groups in the preceding regex.
Please find the code below:
public class Demo {
public static void main(String[] args) {
String str = "0(2016)abc00701def00019z";
/*Below line replaces all 0's which come after any a-z or A-Z and which have any number after them from 1-9. */
str = str.replaceAll("([a-zA-Z]+)0+([1-9]+)", "$1$2");
//Below line only replace the 0's coming in the start of the string
str = str.replaceAll("^0+","");
System.out.println(str);
}
}
Java has \P{Alpha}+, which matches a run of non-alphabetic characters. Within each such run, the leading zeros can then be removed with \b0+:
String stringToSearch = "0(2016)abc00701def00019z";
Pattern p1 = Pattern.compile("\\P{Alpha}+");
Matcher m = p1.matcher(stringToSearch);
StringBuffer sb = new StringBuffer();
while(m.find()){
m.appendReplacement(sb,m.group().replaceAll("\\b0+",""));
}
m.appendTail(sb);
System.out.println(sb.toString());
output:
(2016)abc701def19z

How to extract a number from a string in a particular format?

I have strings like the ones shown below. From each string I need to extract the number 123. It can be at any position, but there will be only one number in a string, and it will always be in the same format _number_.
text_data_123
text_data_123_abc_count
text_data_123_abc_pqr_count
text_tery_qwer_data_123
text_tery_qwer_data_123_count
text_tery_qwer_data_123_abc_pqr_count
Below is the code:
String value = "text_data_123_abc_count";
// this below code will not work as index 2 is not a number in some of the above example
int textId = Integer.parseInt(value.split("_")[2]);
What is the best way to do this?
With a little guava magic:
String value = "text_data_123_abc_count";
Integer id = Ints.tryParse(CharMatcher.inRange('0', '9').retainFrom(value));
see also CharMatcher doc
\\d+
this regex with find should do it for you.
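A minimal sketch of what "\\d+ with find" could look like, assuming (as the question states) that there is at most one number per string. The class name and the choice of OptionalInt for the "no number" case are my own; Integer.parseInt will still throw for a digit run too long for an int.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstNumber {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in value as an int, or empty if there is none.
    static OptionalInt firstNumber(String value) {
        Matcher m = DIGITS.matcher(value);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group()))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("text_data_123").getAsInt());                         // 123
        System.out.println(firstNumber("text_tery_qwer_data_123_abc_pqr_count").getAsInt()); // 123
        System.out.println(firstNumber("text_data").isPresent());                            // false
    }
}
```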
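Use lookbehind and lookahead assertions. The lookahead also accepts the end of the string, because the number can be the last part:
Matcher m = Pattern.compile("(?<=_)\\d+(?=_|$)").matcher(s);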
while(m.find())
{
System.out.println(m.group());
}
You can use replaceAll to remove all non-digits to leave only one number (since you say there will be only 1 number in the input string):
String s = "text_data_123_abc_count".replaceAll("[^0-9]", "");
See IDEONE demo
Instead of [^0-9] you can use \D (which also means non-digit):
String s = "text_data_123_abc_count".replaceAll("\\D", "");
Given current requirements and restrictions, the replaceAll solution seems the most convenient (no need to use Matcher directly).
You can split the string into its parts and compare each part with its uppercase form. A part made of digits is unchanged by toUpperCase(), so if the two are equal you can parse it as a number and save it:
public class Main {
public static void main(String[] args) {
String txt = "text_tery_qwer_data_123_abc_pqr_count";
String[] words = txt.split("_");
int num = 0;
for (String t : words) {
if (t.equals(t.toUpperCase())) // compare contents with equals(), not references with ==
num = Integer.parseInt(t);
}
System.out.println(num);
}
}

Get string within double quotes along with rest of the string

I have a case where I need to extract the string within double quotes in one var and the rest of the string in another var.
Two possibilities:
String: "Franklin B" Benjamin
Result:
var1 = Franklin B
var2 = Benjamin
String: Benjamin "Franklin B"
Result:
var1 = Benjamin
var2 = Franklin B
Regex/Without regex; I am open to any method.
Give this a try...
Basically you remove any leading delimiter in the string before you perform the split. This way you don't have to worry about a leading empty element.
public static void main(String[] args) {
String testString = "\"Franklin B\" Benjamin";
String testString2 = "Benjamin \"Franklin B\"";
displaySplitResults(mySplit(testString, "\""));
displaySplitResults(mySplit(testString2, "\""));
}
private static String[] mySplit(final String input, final String delim)
{
return input.replaceFirst("^" + delim, "").split(delim);
}
private static void displaySplitResults(String[] splitResults) {
if (splitResults.length == 2) {
String var1 = splitResults[0].trim();
String var2 = splitResults[1].trim();
System.out.println(var1);
System.out.println(var2);
}
}
Results:
Franklin B
Benjamin
Benjamin
Franklin B
A simple non-regex way to do it:
public static String[] split(String input) {
if (input.charAt(0) == '"') {
return input.substring(1).split("\" ");
} else {
return input.substring(0, input.length() - 1).split(" \"");
}
}
First check whether the first character is ". Then remove the quote from either beginning or the end and simply split it.
The following will get you a List with the values you want:
private List<String> getValues(String input) {
List<String> matchList = new ArrayList<>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(input);
while (regexMatcher.find()) {
matchList.add(regexMatcher.group());
}
return matchList;
}
Taken from Regex for splitting a string using space when not surrounded by single or double quotes
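Note that the pattern above returns quoted tokens with their quotes still attached. As a rough sketch for the double-quote case only, a capturing group can strip the quotes while matching. The class name, the simplified pattern, and the method name are my own assumptions, and an unquoted word that contains a quote character in the middle is not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedTokens {
    // Group 1: text inside double quotes. Group 2: a bare run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Returns the tokens in order of appearance, with surrounding quotes removed.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("\"Franklin B\" Benjamin")); // [Franklin B, Benjamin]
        System.out.println(tokens("Benjamin \"Franklin B\"")); // [Benjamin, Franklin B]
    }
}
```

With exactly two tokens, element 0 is var1 and element 1 is var2 for both inputs in the question.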
@Shar1er80 Nice piece of work without regex. Worked great.
I also tried with regex (this one is C#, not Java):
//Using regex to get values separated by whitespace but keeping values with double quotes
RegexOptions options = RegexOptions.None;
Regex regex = new Regex( @"((""((?<token>.*?)(?<!\\)"")|(?<token>[\w]+))(\s)*)", options );
string input = @" Here is ""my string"" it has "" six matches"" ";
var result = (from Match m in regex.Matches( input )
where m.Groups[ "token" ].Success
select m.Groups[ "token" ].Value).ToList();
Gave me exact result.

Split string without losing split character

I want to split a string like the one below in Java. The normal split function splits the string but loses the split characters:
String str = "123{456]789[012*";
I want to split the string at the {, [, ] and * characters without losing them. I mean I want results like this:
part 1 = 123{
part 2 = 456]
part 3 = 789[
part 4 = 012*
Normally split function splits like this:
part 1 = 123
part 2 = 456
part 3 = 789
part 4 = 012
Is it possible?
You can use zero-width lookahead/behind expressions to define a regular expression that matches the zero-length string between one of your target characters and anything that is not one of your target characters:
(?<=[{\[\]*])(?=[^{\[\]*])
Pass this expression to String.split:
String[] parts = "123{456]789[012*".split("(?<=[{\\[\\]*])(?=[^{\\[\\]*])");
If you have a block of consecutive delimiter characters this will split once at the end of the whole block, i.e. the string "123{456][789[012*" would split into four blocks "123{", "456][", "789[", "012*". If you used just the first part (the look-behind)
(?<=[{\[\]*])
then you would get five parts "123{", "456]", "[", "789[", "012*"
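To see the difference between the two expressions, here is a small sketch that runs both on the strings mentioned above. The class and method names are invented for the example.

```java
import java.util.Arrays;

final class KeepDelimiters {
    // Look-behind plus look-ahead: a run of consecutive delimiters stays together.
    static String[] splitAfterRun(String s) {
        return s.split("(?<=[{\\[\\]*])(?=[^{\\[\\]*])");
    }

    // Look-behind only: splits after every single delimiter character.
    static String[] splitAfterEach(String s) {
        return s.split("(?<=[{\\[\\]*])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAfterRun("123{456]789[012*")));
        // [123{, 456], 789[, 012*]
        System.out.println(Arrays.toString(splitAfterRun("123{456][789[012*")));
        // [123{, 456][, 789[, 012*]
        System.out.println(Arrays.toString(splitAfterEach("123{456][789[012*")));
        // [123{, 456], [, 789[, 012*]
    }
}
```

With the look-behind-only version there is also a zero-width match after the final *, but split discards trailing empty strings by default, so no empty element shows up.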
Using a positive lookbehind:
(?<={|\[|\]|\*)
String str = "123{456]789[012*";
String parts[] = str.split("(?<=\\{|\\[|\\]|\\*)");
System.out.println(Arrays.toString(parts));
Output:
[123{, 456], 789[, 012*]
I think you're looking for something like
String str = "123{456]789[012*";
String[] parts = new String[] {
str.substring(0,4), str.substring(4,8), str.substring(8,12),
str.substring(12)
};
System.out.println(Arrays.toString(parts));
Output is
[123{, 456], 789[, 012*]
You can use a Pattern and a Matcher to find each splitting character and the index right after it.
public static List<String> split(String string, String splitRegex) {
List<String> result = new ArrayList<String>();
Pattern p = Pattern.compile(splitRegex);
Matcher m = p.matcher(string);
int index = 0;
while (index < string.length()) {
if (m.find()) {
// Cut at the end of the match so the delimiter stays attached
int splitIndex = m.end();
result.add(string.substring(index, splitIndex));
index = splitIndex;
} else {
// No more delimiters: add the remainder and stop
result.add(string.substring(index));
break;
}
}
return result;
}
Example code:
public static void main(String[] args) {
System.out.println(split("123{456]789[012*","\\{|\\]|\\[|\\*"));
}
Output:
[123{, 456], 789[, 012*]

How to return the first chunk of either numerics or letters from a string?

For example, if I had (-> means return):
aBc123afa5 -> aBc
168dgFF9g -> 168
1GGGGG -> 1
How can I do this in Java? I assume it's something regex related but I'm not great with regex and so not too sure how to implement it (I could with some thought but I have a feeling it would be 5-10 lines long, and I think this could be done in a one-liner).
Thanks
String myString = "aBc123afa5";
String extracted = myString.replaceAll("^([A-Za-z]+|\\d+).*$", "$1");
View the regex demo and the live code demonstration!
To use Matcher.group() and reuse a Pattern for efficiency:
// Class
private static final Pattern pattern = Pattern.compile("^([A-Za-z]+|\\d+).*$");
// Your method
{
String myString = "aBc123afa5";
Matcher matcher = pattern.matcher(myString);
if(matcher.matches())
System.out.println(matcher.group(1));
}
Note: /^([A-Za-z]+|\d+).*$/ and /^([A-Za-z]+|\d+)/ perform similarly. You can compare the matcher debug logs on regex101 to confirm this.
Without using regex, you can do this:
String string = "168dgFF9g";
String chunk = "" + string.charAt(0);
boolean searchDigit = Character.isDigit(string.charAt(0));
for (int i = 1; i < string.length(); i++) {
boolean isDigit = Character.isDigit(string.charAt(i));
if (isDigit == searchDigit) {
chunk += string.charAt(i);
} else {
break;
}
}
System.out.println(chunk);
public static String prefix(String s) {
return s.replaceFirst("^(\\d+|\\pL+|).*$", "$1");
}
where
\\d = digit
\\pL = letter
postfix + = one or more
| = or
^ = begin of string
$ = end of string
$1 = first group `( ... )`
An empty alternative (last |) ensures that (...) is always matched, and always a replace happens. Otherwise the original string would be returned.
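If you would rather not rewrite the whole string with a replacement, a sketch along these lines should behave the same way. The class and method names are made up here. It uses Matcher.lookingAt(), which only matches at the start of the input, and returns an empty string when the input starts with neither a digit nor a letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstChunk {
    // A run of digits or a run of letters; lookingAt() anchors it at the start.
    private static final Pattern CHUNK = Pattern.compile("\\d+|\\p{L}+");

    static String firstChunk(String s) {
        Matcher m = CHUNK.matcher(s);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(firstChunk("aBc123afa5")); // aBc
        System.out.println(firstChunk("168dgFF9g"));  // 168
        System.out.println(firstChunk("1GGGGG"));     // 1
        System.out.println(firstChunk("#abc"));       // (empty line)
    }
}
```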
