String s = "jstm - John Stuart Mill";
String aFriendlyAssigneeName = s.substring(s.lastIndexOf('-')+1);
I'm currently able to remove "jstm - " from "jstm - John Stuart Mill", but I'm not sure how to remove everything after "John".
All data will be in the format "initials - First Middle Last". Basically, I just want to strip everything except First.
How can I accomplish this? Perhaps by removing everything after the third whitespace...
I'd just use this. It should be fast enough, and it's quite short:
String aFriendlyAssigneeName = s.split(" ")[2];
(Splits the string at the spaces in it, and takes the third member of the array, which should be the first name if they're all in that format.)
This should work:
String s = "jstm - John Stuart Mill";
String aFriendlyAssigneeName = s.substring(s.lastIndexOf('-') + 1).trim(); // "John Stuart Mill"
aFriendlyAssigneeName = aFriendlyAssigneeName.substring(0, aFriendlyAssigneeName.indexOf(' ')); // "John"
After you have removed the initials (and trimmed the leading space), the first name ends at the first blank.
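You are looking for the following method:
s.substring(startIndex, endIndex);
It takes a begin and an end index, which makes it easy to extract the middle of any String.
You can then find the end index with a bit of (I dare say) magic...
endIndex = s.indexOf(" ", startIndex);
where startIndex is
s.lastIndexOf('-') + 2
(+ 2 skips both the dash and the space after it. With + 1 the search would start on that space, so indexOf would return startIndex itself and the substring would be empty.)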
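Putting the two indexes together, here is a minimal sketch of the substring approach. It assumes every input has the "initials - First Middle Last" shape with exactly one space after the dash, so the first name starts two characters after the dash; the class and method names are hypothetical.

```java
final class FirstNameSubstring {

    // Assumes input shaped like "initials - First Middle Last", e.g. "jstm - John Stuart Mill".
    static String firstName(String s) {
        // The first name starts two characters after the dash, skipping "- ".
        int start = s.lastIndexOf('-') + 2;
        // It ends at the next space after that start position.
        int end = s.indexOf(' ', start);
        // If there is no further space, the first name runs to the end of the string.
        return end == -1 ? s.substring(start) : s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(firstName("jstm - John Stuart Mill")); // John
    }
}
```

Handling the "no further space" case keeps a single-word name such as "ab - Cher" from throwing an exception.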
Alternatively
If substring is not a hard requirement, try using
String[] words = s.split(" ");
This returns an array of all the space-separated values.
You can then just select the word by its index (in this case words[2]).
Why don't you take the substring up to the first space of the string you got after removing the initials?
aFriendlyAssigneeName = aFriendlyAssigneeName.trim();
aFriendlyAssigneeName = aFriendlyAssigneeName.substring(0, aFriendlyAssigneeName.indexOf(' '));
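In my opinion this is a job for a regex: .* - (\w+) .*
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String value = "jstm - John Stuart Mill";
final Matcher matcher = Pattern.compile(".* - (\\w+) .*").matcher(value);
if (matcher.matches()) { // group(1) is only available after a successful match
System.out.println(matcher.group(1)); // John
}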
In my opinion using a regex vs substring:
Pros:
Clearer about what you expect as input and what you intend to capture.
Easily modified/extended if input changes or you want to capture some other part.
Cons:
Regexes can look more cryptic to someone who isn't used to them.
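To illustrate the "easily modified/extended" point, here is a sketch that uses named groups, so capturing a different part of the name only means asking for a different group. The pattern assumes the "initials - First Middle Last" format from the question; the class, method and group names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameParts {

    // Compiled once and reused; named groups document what each part means.
    // Assumes the "initials - First Middle Last" format from the question.
    private static final Pattern NAME = Pattern.compile(
            "(?<initials>\\w+) - (?<first>\\w+) (?<middle>\\w+) (?<last>\\w+)");

    // Returns the requested part, or null when the input does not fit the format.
    static String part(String value, String groupName) {
        Matcher matcher = NAME.matcher(value);
        return matcher.matches() ? matcher.group(groupName) : null;
    }

    public static void main(String[] args) {
        System.out.println(part("jstm - John Stuart Mill", "first")); // John
        System.out.println(part("jstm - John Stuart Mill", "last"));  // Mill
    }
}
```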
Related
I have a string consisting of 18 characters, e.g. 'abcdefghijklmnopqr'. I need to add a blank space after the 5th character, then after the 9th character and after the 15th character, so that it looks like 'abcde fghi jklmno pqr'. Can I achieve this using a regular expression?
As regular expressions are not my cup of tea, I need help from the regex gurus out here. Any help is appreciated.
Thanks in advance
A regex only finds matches in a string; it doesn't perform a replacement by itself. You could use a regex to find a certain matching substring and replace that, but you would still need a separate replacement method (such as replaceAll), making it a two-step algorithm.
Since you're not looking for a pattern in your string, but rather just the n-th character, a regex wouldn't be of much use; it would make the solution unnecessarily complex.
Here are some ideas on how you could implement a solution:
Use an array of characters to avoid creating redundant strings: create a character array and copy the characters from the string before the given position, put the character at the position, copy the rest of the characters from the String,... and continue until you reach the end of the string. After that, construct the final string from that array.
Use the substring() method: concatenate the substring of the string before the position, the new character, the substring of the string after the position and before the next position,... and so on, until reaching the end of the original string.
Use a StringBuilder and its insert() method.
Note that:
The first idea might not be suitable for very large strings, since it needs an auxiliary array and therefore additional space.
The second idea creates redundant strings. Strings are immutable in Java, so every concatenation creates a new String object; creating many temporary strings should be avoided.
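A sketch of the third idea, StringBuilder.insert(). The positions are indexes into the original string; inserting from the rightmost position backwards keeps the remaining indexes valid, because each insertion only shifts characters to its right. The helper name is hypothetical.

```java
import java.util.Arrays;

final class InsertSpaces {

    // Inserts a space before each of the given character positions of the original string.
    static String insertSpaces(String str, int... positions) {
        int[] sorted = positions.clone();
        Arrays.sort(sorted);
        StringBuilder sb = new StringBuilder(str);
        // Work from the right so earlier insertions don't shift later positions.
        for (int i = sorted.length - 1; i >= 0; i--) {
            sb.insert(sorted[i], ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertSpaces("abcdefghijklmnopqr", 5, 9, 15)); // abcde fghi jklmno pqr
    }
}
```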
Yes, you can use regex groups to achieve that. Something like this:
final Pattern pattern = Pattern.compile("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})");
final Matcher matcher = pattern.matcher("abcdefghijklmnopqr");
if (matcher.matches()) {
String first = matcher.group(1); // group(0) would be the whole match
String second = matcher.group(2);
String third = matcher.group(3);
String fourth = matcher.group(4);
return first + " " + second + " " + third + " " + fourth;
} else {
throw new SomeException();
}
Note that the pattern should be a constant (a static final field); I used a local variable here to make it easier to read.
Compared to substrings, which would also achieve the desired result, a regex also lets you validate the format of your input data. In the example provided, it checks that the input is an 18-character string composed only of lowercase letters.
If you had a more interesting example, for instance a mix of letters and digits, you could use the regex to check that each group contains the correct type of data.
You can also do a simpler version where you just replace with:
"abcdefghijklmnopqr".replaceAll("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})", "$1 $2 $3 $4")
But you don't get the benefit of checking: if the string doesn't match the format, it is simply returned unchanged. This is also less efficient than using substrings.
Here is an example solution without a regex (appending characters to a StringBuilder), which would be more efficient if you don't care about checking:
final Set<Integer> breaks = Set.of(5, 9, 15);
final String str = "abcdefghijklmnopqr";
final StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
if (breaks.contains(i)) {
stringBuilder.append(' ');
}
stringBuilder.append(str.charAt(i));
}
return stringBuilder.toString();
How do I split this String in Java so that I get the text between the parentheses in a String array?
GivenString = "(1,2,3,4,#) (a,s,3,4,5) (22,324,#$%) (123,3def,f34rf,4fe) (32)"
String [] array = GivenString.split("");
Output must be:
array[0] = "1,2,3,4,#"
array[1] = "a,s,3,4,5"
array[2] = "22,324,#$%"
array[3] = "123,3def,f34rf,4fe"
array[4] = "32"
You can try to use:
Matcher mtc = Pattern.compile("\\((.*?)\\)").matcher(yourString);
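The matcher does nothing until you call find(), so here is a sketch of collecting every captured group into an array. The class and method names are hypothetical, and nested parentheses are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParenContents {

    // Matches "(", then as few characters as possible (lazy .*?), then ")".
    private static final Pattern PARENS = Pattern.compile("\\((.*?)\\)");

    static String[] contents(String input) {
        List<String> parts = new ArrayList<>();
        Matcher mtc = PARENS.matcher(input);
        while (mtc.find()) {
            parts.add(mtc.group(1)); // the text between the parentheses
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String givenString = "(1,2,3,4,#) (a,s,3,4,5) (22,324,#$%) (123,3def,f34rf,4fe) (32)";
        for (String part : contents(givenString)) {
            System.out.println(part);
        }
    }
}
```

The lazy `.*?` matters: a greedy `.*` would run from the first "(" to the last ")" and return a single element.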
The best solution is the answer by Rahul Tripathi, but your question said "How to split", so if you must use split() (e.g. this is an assignment), then this regex will do:
^\s*\(|\)\s*\(|\)\s*$
It says:
Match the open-parenthesis at the beginning
Match close-parenthesis followed by open-parenthesis
Match the close-parenthesis at the end
All three allow optional whitespace.
As a Java regex, that would mean:
str.split("^\\s*\\(|\\)\\s*\\(|\\)\\s*$")
See regex101 for a demo.
The problem with using split() is that the leading open-parenthesis causes a split before the first value, resulting in an empty value at the beginning:
array[0] = ""
array[1] = "1,2,3,4,#"
array[2] = "a,s,3,4,5"
array[3] = "22,324,#$%"
array[4] = "123,3def,f34rf,4fe"
array[5] = "32"
That is why Rahul's answer is better, because it won't see such an empty value.
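If you must use split() but want to avoid that empty first element, one option (an alternative to the regex above, not taken from either answer) is to strip the outer parentheses first and then split only on the ")(" boundaries between groups. This assumes the input starts with "(" and ends with ")"; the names are hypothetical.

```java
import java.util.Arrays;

final class ParenSplit {

    // Removes the outer "(" and ")" first, so split() has no leading delimiter to trip over.
    static String[] splitGroups(String str) {
        String inner = str.trim().replaceAll("^\\(|\\)$", "");
        // Split on ")" + optional whitespace + "(" between groups.
        return inner.split("\\)\\s*\\(");
    }

    public static void main(String[] args) {
        String givenString = "(1,2,3,4,#) (a,s,3,4,5) (22,324,#$%) (123,3def,f34rf,4fe) (32)";
        System.out.println(Arrays.toString(splitGroups(givenString)));
    }
}
```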
Usually, you would want to use the split() method, as it is the easiest way to split a string into an array of parts when the string is broken up by a delimiter character.
The main problem is that you need the information in between two characters. The easiest way to solve this would be to go through the string and get rid of every instance of '('. This leaves the string looking like
String = "1,2,3,4,#) a,s,3,4,5) 22,324,#$%) 123,3def,f34rf,4fe) 32)"
And this is perfect, as you can split on ')' without worrying about the other bracket interfering with the split. I suggest using replace(target, replacement), which replaces every occurrence of the first argument with the second (we can use "" to delete it).
Here is some example code that should work:
String a = "(1,2,3,4,#) (a,s,3,4,5) (22,324,#$%) (123,3def,f34rf,4fe) (32)";
a = a.replace("(", "");
//a is now equal to 1,2,3,4,#) a,s,3,4,5) 22,324,#$%) 123,3def,f34rf,4fe) 32)
String[] parts = a.split("\\)\\s*"); // also consume the space after each ')'
System.out.println(parts[0]); //this will print 1,2,3,4,#
Splitting on "\\)\\s*" rather than just "\\)" swallows the space after each ')', so the later parts don't start with a leading space. The trailing empty string after the last ')' is dropped by split() automatically.
You can then loop through parts[] and it should contain all of the parts you need!
I need to split a string based on a pattern and then merge part of it back together.
For example, below are the actual and expected strings:
String actualstr="abc.def.ghi.jkl.mno";
String expectedstr="abc.mno";
When I use the code below, I can store the parts in an array and iterate over it to get the result back. Is there any way to do this more simply and efficiently?
String[] splited = actualstr.split("[\\.\\.\\.\\.\\.\\s]+");
Though I can access the parts by index, is there any other way to do this easily? Please advise.
You do not understand how regexes work.
Here is your regex without the escapes: [\.\.\.\.\.\s]+
You have a character class ([]), which means there is no reason to have more than one . in it. You also don't need to escape . inside a character class.
Here is an equivalent regex to your regex: [.\s]+. As a Java String that's: "[.\\s]+".
You can do .split("regex") on your string to get an array. It's very simple to get a solution from that point.
I would use a replaceAll in this case
String actualstr="abc.def.ghi.jkl.mno";
String str = actualstr.replaceAll("\\..*\\.", ".");
This replaces everything from the first . to the last . (inclusive) with a single ., leaving "abc.mno".
You could also use split
String[] parts = actualstr.split("\\.");
String str = parts[0] + "." + parts[parts.length - 1]; // first and last word
public static String merge(String string, String delimiter, int... partnumbers)
{
String[] parts = string.split(delimiter);
String result = "";
for ( int x = 0 ; x < partnumbers.length ; x ++ )
{
result += result.length() > 0 ? delimiter.replaceAll("\\\\","") : "";
result += parts[partnumbers[x]];
}
return result;
}
and then use it like:
merge("abc.def.ghi.jkl.mno", "\\.", 0, 4);
I would do it this way
Pattern pattern = Pattern.compile("(\\w*\\.).*\\.(\\w*)");
Matcher matcher = pattern.matcher("abc.def.ghi.jkl.mno");
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
If you can cache the result of
Pattern.compile("(\\w*\\.).*\\.(\\w*)")
and reuse "pattern" over and over, this code will be very efficient, as compiling the pattern is the most expensive step. The java.lang.String.split() method that other answers suggest calls Pattern.compile() internally on every invocation, except for a fast path when the regex is a single non-metacharacter or a backslash followed by one non-alphanumeric character (such as "\\."). For anything more complex, it repeats the expensive Pattern compilation each time the method is called. See java.util.regex - importance of Pattern.compile()?. So it is much better to have the Pattern compiled, cached and reused.
matcher.group(1) refers to the first group of () which is "(\w*\.)"
matcher.group(2) refers to the second one which is "(\w*)"
Although we don't use it here, note that group(0) is the match for the whole regex.
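A sketch of the caching advice: keep the compiled Pattern in a static final field so it is compiled once and reused by every call. The class and method names are hypothetical, as is the choice to return the input unchanged when it does not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstAndLast {

    // Compiled once when the class is initialized, then reused by every call.
    private static final Pattern FIRST_AND_LAST = Pattern.compile("(\\w*\\.).*\\.(\\w*)");

    // Returns "first.last" for a dotted string, or the input unchanged if it doesn't match.
    static String firstAndLast(String input) {
        Matcher matcher = FIRST_AND_LAST.matcher(input);
        return matcher.matches() ? matcher.group(1) + matcher.group(2) : input;
    }

    public static void main(String[] args) {
        System.out.println(firstAndLast("abc.def.ghi.jkl.mno")); // abc.mno
    }
}
```

Pattern instances are immutable and safe to share between threads; only the Matcher created per call holds state.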
I have a string something like:
SKU: XP321654
Quantity: 1
Order date: 01/08/2016
The SKU length is not fixed, so my function sometimes also returns the first one or two characters of Quantity, which I don't want. I only want the SKU value.
My code:
int index = Content.indexOf("SKU:");
String SKU = Content.substring(index, index+15);
If the SKU has one or two more digits, it gets cut off, because I have specified a limit of 15. If I use index + 16 to get a long SKU, then for a short SKU it also returns some characters of Quantity.
How can I solve this? Is there anything I can use instead of a fixed character length as the limit?
The last character of my SKU will always be a digit. Is there anything I can use to get the SKU only up to its last digit?
Using .substring is simply not the way to process such things. What you need is a regex (or regular expression):
Pattern pat = Pattern.compile("SKU\\s*:\\s*(\\S+)");
String sku = null;
Matcher matcher = pat.matcher(Content);
if(matcher.find()) { //we've found a match
sku = matcher.group(1);
}
//do something with sku
Unescaped the regex is something like:
SKU\s*:\s*(\S+)
You are thus looking for a pattern that starts with SKU, followed by zero or more \s (whitespace characters such as space and tab), followed by a colon (:), then zero or more whitespace characters (\s) again, and finally the part you are interested in: one or more (that's the meaning of +) non-whitespace characters (\S). Putting these in parentheses makes them a capturing group. If the regex succeeds in finding the pattern (matcher.find()), you can extract the content of the capturing group with matcher.group(1) and store it in a string.
You can potentially improve the regex further if you know more about what a SKU looks like. For instance, if it consists only of uppercase letters and digits, you can replace \S by [0-9A-Z], so the pattern becomes:
Pattern pat = Pattern.compile("SKU\\s*:\\s*([0-9A-Z]+)");
EDIT: for the quantity data, you could use:
Pattern pat2 = Pattern.compile("Quantity\\s*:\\s*(\\d+)");
int qt = -1;
Matcher matcher2 = pat2.matcher(Content); // a new name, since matcher is already declared above
if(matcher2.find()) { //we've found a match
qt = Integer.parseInt(matcher2.group(1));
}
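Put together, and assuming SKUs consist only of uppercase letters and digits, the two patterns could be precompiled and wrapped like this. The class and method names, and the null/-1 "not found" values, are hypothetical choices.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OrderParser {

    // Compiled once; these are the two patterns from the answer above.
    private static final Pattern SKU = Pattern.compile("SKU\\s*:\\s*([0-9A-Z]+)");
    private static final Pattern QUANTITY = Pattern.compile("Quantity\\s*:\\s*(\\d+)");

    // Returns the SKU, or null if no "SKU:" entry is found.
    static String sku(String content) {
        Matcher matcher = SKU.matcher(content);
        return matcher.find() ? matcher.group(1) : null;
    }

    // Returns the quantity, or -1 if no "Quantity:" entry is found.
    static int quantity(String content) {
        Matcher matcher = QUANTITY.matcher(content);
        return matcher.find() ? Integer.parseInt(matcher.group(1)) : -1;
    }

    public static void main(String[] args) {
        String content = "SKU: XP321654\nQuantity: 1\nOrder date: 01/08/2016";
        System.out.println(sku(content));      // XP321654
        System.out.println(quantity(content)); // 1
    }
}
```

Because the capturing group stops at the first character outside [0-9A-Z], the SKU length no longer matters.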
You know you can just refer to the length of the string, right?
String s = "SKU: XP321654";
String sku = s.substring(4, s.length()).trim();
I think using a regex is clearly overkill in this case; it is much simpler than that. You could even split the expression, although that is a bit less efficient than the solution above. But please don't use a regex for this!
String sku = "SKU: XP321654".split(":")[1].trim();
1: You have to split your input into lines (for example, split on \n).
2: Once you have a line, search for : and take the remainder of the line (using the String's length, as mentioned in Dici's answer).
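The two steps above could be sketched as follows. The method name and the "key" parameter are hypothetical, and the split uses \R so it works whether lines end with \n or \r\n.

```java
final class LineLookup {

    // Returns the trimmed text after the first ':' on the first line starting with key, or null.
    static String valueOf(String content, String key) {
        // In a Java regex, \R matches any line break sequence (Java 8+).
        for (String line : content.split("\\R")) {
            if (line.startsWith(key)) {
                int colon = line.indexOf(':');
                if (colon >= 0) {
                    return line.substring(colon + 1).trim();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String content = "SKU: XP321654\r\nQuantity: 1\r\nOrder date: 01/08/2016";
        System.out.println(valueOf(content, "SKU"));        // XP321654
        System.out.println(valueOf(content, "Order date")); // 01/08/2016
    }
}
```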
Depending on how exactly the string contains new lines, you could do this:
public static void main(String[] args) {
String s = "SKU: XP321654\r\n" +
"Quantity: 1\r\n" +
"Order date: 01/08/2016";
System.out.println(s.substring(s.indexOf(": ") + 2, s.indexOf("\r\n")));
}
Just note that this one-liner has several restrictions:
The SKU property has to come first. If not, modify the start index appropriately to search for "SKU: ".
Line breaks might be encoded differently (for example \n alone); in a regex, \R matches any valid line-break sequence.
I'm trying to write a short method so that I can parse a string and extract the first word. I have been looking for the best way to do this.
I assume I would use str.split(","); however, I would like to grab just the first word from a string and save it in one variable, and put the rest of the tokens in another variable.
Is there a concise way of doing this?
The second parameter of the split method is optional; if specified, it limits the result to at most N elements, so the target string is split at most N - 1 times.
For example:
String mystring = "the quick brown fox";
String arr[] = mystring.split(" ", 2);
String firstWord = arr[0]; //the
String theRest = arr[1]; //quick brown fox
Alternatively you could use the substring method of String.
You should be doing this
String input = "hello world, this is a line of text";
int i = input.indexOf(' ');
String word = input.substring(0, i);
String rest = input.substring(i + 1); // skip the space itself
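This is likely the fastest way of doing this task, since it only scans up to the first space and creates just the two strings you need. Note that indexOf returns -1 when there is no space, in which case substring(0, i) throws an exception.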
To simplify the above:
text.substring(0, text.indexOf(' '));
Here is a ready function:
private String getFirstWord(String text) {
int index = text.indexOf(' ');
if (index > -1) { // Check if there is more than one word.
return text.substring(0, index).trim(); // Extract first word.
} else {
return text; // Text is the first word itself.
}
}
The simple approach I use is
str.contains(" ") ? str.split(" ")[0] : str
where str is your string or text. So:
if str is empty, it is returned as is;
if str is a single word, it is returned as is;
if str contains multiple words, the first word is extracted and returned.
Hope this is helpful.
import org.apache.commons.lang3.StringUtils;
...
StringUtils.substringBefore("Grigory Kislin", " ")
You can use String.split with a limit of 2.
String s = "Hello World, I'm the rest.";
String[] result = s.split(" ", 2);
String first = result[0];
String rest = result[1];
System.out.println("First: " + first);
System.out.println("Rest: " + rest);
// prints =>
// First: Hello
// Rest: World, I'm the rest.
API docs for: split
For those who are looking for Kotlin:
val delimiter = " "
val mFullname = "Mahendra Rajdhami"
val greetingName = mFullname.substringBefore(delimiter)
like this:
final String str = "This is a long sentence";
final String[] arr = str.split(" ", 2);
System.out.println(Arrays.toString(arr));
arr[0] is the first word, arr[1] is the rest
You could use a Scanner
http://download.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html
The scanner can also use delimiters other than whitespace. This example reads several items in from a string:
String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt());
System.out.println(s.nextInt());
System.out.println(s.next());
System.out.println(s.next());
s.close();
prints the following output:
1
2
red
blue
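For the original question (first word plus the rest), a Scanner-based sketch might look like this. It is an alternative to split(" ", 2), "rest" here only extends to the end of the first line, and the class and method names are hypothetical.

```java
import java.util.Scanner;

final class ScannerFirstWord {

    // Returns {firstWord, restOfFirstLine}; both are "" when the input has no tokens.
    static String[] firstAndRest(String input) {
        try (Scanner scanner = new Scanner(input)) {
            if (!scanner.hasNext()) {
                return new String[] {"", ""};
            }
            String first = scanner.next(); // default delimiter: whitespace
            String rest = scanner.hasNextLine() ? scanner.nextLine().trim() : "";
            return new String[] {first, rest};
        }
    }

    public static void main(String[] args) {
        String[] parts = firstAndRest("hello world, this is a line of text");
        System.out.println(parts[0]); // hello
        System.out.println(parts[1]); // world, this is a line of text
    }
}
```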
None of these answers appears to define what the OP might mean by a "word". As others have already said, a "word boundary" may be a comma, and certainly can't be counted on to be a space, or even "white space" (i.e. also tabs, newlines, etc.)
At the simplest, I'd say the word has to consist of any Unicode letters, and any digits. Even this may not be right: a String may not qualify as a word if it contains numbers, or starts with a number. Furthermore, what about hyphens, or apostrophes, of which there are presumably several variants in the whole of Unicode? All sorts of discussions of this kind and many others will apply not just to English but to all other languages, including non-human language, scientific notation, etc. It's a big topic.
But a start might be this (NB written in Groovy):
String givenString = "one two9 thr0ee four"
// String givenString = "oňňÜÐæne;:tŵo9===tĥr0eè? four!"
// String givenString = "mouse"
// String givenString = "&&^^^%"
String[] substrings = givenString.split( '[^\\p{L}^\\d]+' )
println "substrings |$substrings|"
println "first word |${substrings[0]}|"
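This works OK for the first, second and third givenStrings. For "&&^^^%" it says that the first "word" is a zero-length string, and the second is "^^^". The "^^^" survives because the second ^ inside the character class is a literal caret rather than a negation, so carets are not treated as delimiters. The leading zero-length token is String.split's way of saying "your given String starts not with a token but a delimiter".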
NB in regex \p{L} means "any Unicode letter". The parameter of String.split is of course what defines the "delimiter pattern"... i.e. a clump of characters which separates tokens.
NB2 Performance issues are irrelevant for a discussion like this, and almost certainly for most contexts.
NB3 My first port of call was Apache Commons' StringUtils package. They are likely to have the most effective and best engineered solutions for this sort of thing. But nothing jumped out... https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html ... although something of use may be lurking there.
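A Java take on the same idea, sketched here with find() instead of split(): searching for the first run of letters and digits means a leading delimiter does not produce an empty first token. The definition of a "word" (Unicode letters and digits only) is the same assumption as above; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstUnicodeWord {

    // A "word" here is assumed to be a run of Unicode letters and/or digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\d]+");

    // Returns the first such run, or null if the string contains none.
    static String firstWord(String s) {
        Matcher matcher = WORD.matcher(s);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstWord("one two9 thr0ee four")); // one
        System.out.println(firstWord("&&^^^%"));               // null
    }
}
```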
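You could also use http://download.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html (note that its Javadoc describes StringTokenizer as a legacy class whose use is discouraged in new code, recommending String.split or the java.util.regex package instead).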
I know this question has been answered already, but I have another solution (for those still searching for answers) which fits on one line.
It uses the split functionality but only gives you the first element.
String test = "123_456";
String value = test.split("_")[0];
System.out.println(value);
The output will show:
123
The easiest way I found is this (note that this snippet is Dart, not Java):
void main() {
String input = "hello world, this is a line of text";
print(input.split(" ").first);
}
Output: hello
Assuming the delimiter is a blank space here:
Before Java 8:
private String getFirstWord(String sentence){
String delimiter = " "; //Blank space is delimiter here
String[] words = sentence.split(delimiter);
return words[0];
}
Java 8 and later:
private String getFirstWord(String sentence){
String delimiter = " "; //Blank space is delimiter here
String firstWord = Arrays.stream(sentence.split(delimiter))
.findFirst()
.orElse("No word found");
return firstWord;
}
String anotherPalindrome = "Niagara. O roar again!";
String roar = anotherPalindrome.substring(11, 15);
You can also do it like this: