Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
First post. Be nice?
Learning Java.
I have a String object: "1 Book on wombats at 12.99"
I want to split this String into either a String[] or an ArrayList<String>, splitting on the first space and around the word " at ", so that the result has 3 Strings: "1", "Book on wombats" and "12.99".
My current solution is:
// private method call from my constructor method
ArrayList<String> fields = extractFields(item);
// private method
private ArrayList<String> extractFields (String item) {
ArrayList<String> parts = new ArrayList<String>();
String[] sliceQuanity = item.split(" ", 2);
parts.add(sliceQuanity[0]);
String[] slicePrice = sliceQuanity[1].split(" at ");
parts.add(slicePrice[0]);
parts.add(slicePrice[1]);
return parts;
}
So this works fine, but surely there is a more elegant way? Perhaps with regex, which is something that I'm still trying to get a good handle on.
Thank you!
You could use this pattern:
^(\S+)\s(.*?)\sat\s(.*)$
^ beginning of string
(\S+) capture one or more non-whitespace characters
\s a whitespace character
(.*?) capture as few characters as possible
\sat\s followed by a whitespace character, the word "at" and another whitespace character
(.*)$ then capture everything up to the end
This regex will return what you need: ^(\S+)\s(.*?)\sat\s(.*)$
Explanation:
^ asserts position at the start of a line.
\S+ matches one or more non-whitespace characters.
\s matches a single whitespace character.
.*? matches any character (except newline), as few times as possible.
\s again matches a single whitespace character.
at matches the characters "at" literally (case sensitive).
\s again matches a single whitespace character.
(.*)$ matches any character (except newline) as many times as possible, and asserts position at the end of a line.
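The lazy quantifier in the second group only makes a difference when the item name itself contains " at ". Here is a minimal sketch of that difference. The input "1 Cat at home at 9.99" and the class name LazyVsGreedy are made up for illustration and are not from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // Returns the three captured groups, or null if the input does not match.
    static String[] capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[]{m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String input = "1 Cat at home at 9.99"; // hypothetical input with two " at "
        // Lazy (.*?) stops at the first " at ".
        String[] lazy = capture("^(\\S+)\\s(.*?)\\sat\\s(.*)$", input);
        // Greedy (.*) runs to the last " at ".
        String[] greedy = capture("^(\\S+)\\s(.*)\\sat\\s(.*)$", input);
        System.out.println(String.join(" | ", lazy));   // 1 | Cat | home at 9.99
        System.out.println(String.join(" | ", greedy)); // 1 | Cat at home | 9.99
    }
}
```

If the price is always the last field, the greedy variant is probably the safer choice, but that depends on what the real data looks like.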
Well, it would be simpler to just call .split() on item.
Store the resulting array in a String[], then hardcode which indexes of that String[] go into the ArrayList that you are returning. The String.concat() method might help as well, for joining the words of the item name back together.
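One way to read the suggestion above, as a rough sketch: split on single spaces, take the first and last tokens as quantity and price, and rejoin everything in between. It uses String.join rather than String.concat, assumes the input always ends in "at <price>", and the class name SplitAndRejoin is made up:

```java
import java.util.Arrays;

public class SplitAndRejoin {
    // Assumes the format "<quantity> <name words...> at <price>".
    static String[] extractFields(String item) {
        String[] tokens = item.split(" ");
        String quantity = tokens[0];
        String price = tokens[tokens.length - 1];
        // The name is every token between the quantity and the trailing "at <price>".
        String name = String.join(" ", Arrays.copyOfRange(tokens, 1, tokens.length - 2));
        return new String[]{quantity, name, price};
    }

    public static void main(String[] args) {
        String[] fields = extractFields("1 Book on wombats at 12.99");
        System.out.println(Arrays.toString(fields)); // [1, Book on wombats, 12.99]
    }
}
```

This does no validation, so an input with fewer than four tokens would throw an exception.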
Here's a piece of code to arrive at the String[] result that you requested, using the regex suggested in the other answers.
^(\S+)\s(.*?)\sat\s(.*)$ is converted to a Java string by escaping each of the backslashes with another backslash, so they appear twice when creating the Pattern object.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String item = "1 Book on wombats at 12.99";
Pattern pattern = Pattern.compile("^(\\S+)\\s(.*?)\\sat\\s(.*)$");
Matcher matcher = pattern.matcher(item);
// group() throws IllegalStateException if no match was found, so check first
if (!matcher.find()) {
throw new IllegalArgumentException("Unexpected format: " + item);
}
String[] parts = new String[]{matcher.group(1), matcher.group(2), matcher.group(3)};
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have recently started learning regex and I am not quite sure how the following regex works:
str.replaceAll("(\\w)(\\w*)", "$2$1ay");
This allows us to do the following:
input string: "Hello World !"
return string: "elloHay orldWay !"
From what I know: \w is supposed to match all word characters, including 0-9 and underscore, and $ matches at the end of the string.
In the replaceAll method, the first parameter is a regex. Every substring that matches the regex is replaced with the second parameter.
In simple cases replaceAll works like this:
String str = "I,am,a,person";
str = str.replaceAll(",", " "); // "I am a person"
It matched all the commas and replaced them with a space.
In your case, the match is a single word character (\w), followed by zero or more word characters (\w*).
The () around \w and \w* group them. So you have two groups: the first letter and the remaining part. If you use regex101 or some similar website you can see a visualization of this.
Your replacement is $2 (the second group, i.e. the remaining part), followed by $1 (the first group, i.e. the first letter), followed by ay.
Hope this clears it up for you.
Enclosing part of a regex in parentheses () makes it a capturing group.
Here you have 2 capturing groups: (\w) captures a single word character, and (\w*) captures zero or more.
$1 and $2 are used to refer to the captured groups, first and second respectively.
Also, replaceAll applies the replacement to each match, so each word is handled individually.
So in this example, in 'Hello', 'H' is the first captured group and 'ello' is the second. It's replaced by a reordered version, $2$1, which basically swaps the captured groups.
So '$2$1ay' gives you 'elloHay'.
The same for the next word also.
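To see the two groups for each match, here is a small sketch that prints them before doing the replacement. The class and method names (PigLatinGroups, toPigLatin) are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PigLatinGroups {
    // Moves the first word character of every word to the end and appends "ay".
    static String toPigLatin(String s) {
        return s.replaceAll("(\\w)(\\w*)", "$2$1ay");
    }

    public static void main(String[] args) {
        String input = "Hello World !";
        Matcher m = Pattern.compile("(\\w)(\\w*)").matcher(input);
        while (m.find()) {
            // Prints "$1=H $2=ello" and then "$1=W $2=orld"
            System.out.println("$1=" + m.group(1) + " $2=" + m.group(2));
        }
        System.out.println(toPigLatin(input)); // elloHay orldWay !
    }
}
```

The "!" is left alone because it is not a word character, so the pattern never matches it.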
I have a String like the one below, and I want to get the substrings that follow a special character.
String myString="Regular $express&ions are <patterns <that can# be %matched against *strings";
I want output like below:
express
ions
patterns
that
matched
strings
Can anyone help me? Thanks in advance.
Note: as @MaxZoom pointed out, it seems that I didn't understand the OP's problem properly. The OP apparently does not want to split the string on special characters, but rather to keep the words starting with a special character. The former is addressed by my answer, the latter by @MaxZoom's answer.
You should take a look at the String.split() method.
Give it a regexp matching all the characters you want to split on, and you'll get an array of the strings between them. For instance:
String myString = "Regular $express&ions are <patterns <that can# be %matched against *strings";
String[] words = myString.split("[$&<#%*]");
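For comparison with what the OP asked for, here is a quick sketch of what this split actually returns. The class and method names are made up. Note that the pieces include the plain words and spaces that follow each special character, not just the word that starts with it:

```java
public class SplitOnSpecials {
    // Splits on any single one of the listed special characters.
    static String[] splitOnSpecials(String s) {
        return s.split("[$&<#%*]");
    }

    public static void main(String[] args) {
        String myString = "Regular $express&ions are <patterns <that can# be %matched against *strings";
        for (String part : splitOnSpecials(myString)) {
            // Brackets make leading and trailing spaces visible.
            System.out.println("[" + part + "]");
        }
        // Prints 8 pieces:
        // [Regular ] [express] [ions are ] [patterns ] [that can] [ be ] [matched against ] [strings]
    }
}
```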
This regex will select words that start with a special character:
[$&<%*](\w*)
explanation:
[$&<%*] match a single character present in the list below
$&<%* a single character in the list $&<%* literally (case sensitive)
1st Capturing group (\w*)
\w* match any word character [a-zA-Z0-9_]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
g modifier: global. All matches (don't return on first match)
DEMO
MATCH 1 [9-16] express
MATCH 2 [17-21] ions
MATCH 3 [27-35] patterns
MATCH 4 [37-41] that
MATCH 5 [51-58] matched
MATCH 6 [68-75] strings
Solution in Java code:
String str = "Regular $express&ions are <patterns <that can# be %matched against *strings";
Matcher matcher = Pattern.compile("[$&<%*](\\w*)").matcher(str);
List<String> words = new ArrayList<>();
while (matcher.find()) {
words.add(matcher.group(1));
}
System.out.println(words.toString());
// prints [express, ions, patterns, that, matched, strings]
I have a string with spaces, some non-informative characters, and substrings that need to be excluded, keeping only some important sections. I used split as below:
String myString[]={"01: Hi you look tired today? Can I help you?"};
myString=myString[0].split("[\\s+]");// Split based on any white spaces
for(int ii=0;ii<myString.length;ii++)
System.out.println(myString[ii]);
The result is :
01:
Hi
you
look
tired
today?
Can
I
help
you?
The spaces appeared as substrings after the split when the regex is “[\s+]”, but disappeared when the regex is "\s+". I am confused and was not able to find an answer in the related Stack Overflow pages. The regex Pattern link made me more confused.
Please help, I am new to Java.
19/1/2015: Edit
After your valuable advice, I reached a point in my program where a conditional statement needs to be decomposed and processed. The case I have is:
String s1="01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
String [] s2=s1.split(("[\\s\\&\\,]+"));
for(int ii=0;ii<s2.length;ii++)System.out.println(s2[ii]);
The result is fine till now as:
01:IF
rd.h
dq.L
o.LL
v.L
THEN
la.VHB
av.VHR
with
0.4610;
My next step is to add string "with" to the regex and get rid of this word while doing the split.
I tried it this way:
String s1="01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
String [] s2=s1.split(("[\\s\\&\\, with]+"));
for(int ii=0;ii<s2.length;ii++)System.out.println(s2[ii]);
The result is not perfect, because I got an unwanted extra split at every "h" letter:
01:IF
rd.
dq.L
o.LL
v.L
THEN
la.VHB
av.VHR
0.4610;
Any advice on how to specify string with mixed white spaces and separation marks?
Many thanks.
Inside square brackets, [\s+] is a character class containing the whitespace characters plus the plus sign. It matches only one character, so a sequence of spaces produces many empty strings in the split, as Todd noted, and + is also treated as a separator.
You should use \s+ (without brackets) as the separator. That means one or more whitespace characters.
myString=myString[0].split("\\s+");
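The difference is easy to see on a tiny input. This is a minimal sketch with made-up strings and a made-up class name:

```java
import java.util.Arrays;

public class ClassVsQuantifier {
    public static void main(String[] args) {
        String twoSpaces = "a  b"; // two spaces between a and b

        // Character class: each single space is its own separator,
        // so an empty string appears between the two spaces.
        System.out.println(Arrays.toString(twoSpaces.split("[\\s+]"))); // [a, , b]

        // Quantifier: the whole run of whitespace is one separator.
        System.out.println(Arrays.toString(twoSpaces.split("\\s+")));   // [a, b]

        // Inside the class, + is a literal plus sign and also splits.
        System.out.println(Arrays.toString("1+2".split("[\\s+]")));     // [1, 2]
    }
}
```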
Your biggest problem is not understanding enough about regular expressions to write them properly. One key point you don't comprehend is that [...] is a character class, which is a list of characters any one of which can match. For example:
[abc] matches either a, b or c (it does not match "abc")
[\\s+] matches any whitespace or "+" character
[with] matches a single character that is either w, i, t or h
[.$&^?] matches those literal characters - most characters lose their special regex meaning when in a character class
To split on any number of whitespace, comma and ampersand and consume "with" (if it appears), do this:
String [] s2 = s1.split("[\\s,&]+(with[\\s,&]+)?");
You can try it easily here Online Regex and get useful comments.
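As a quick check of the split above, here is a sketch that runs it on the rule string from the question. The class and method names are made up:

```java
import java.util.Arrays;

public class SplitRule {
    // Splits on runs of whitespace, commas and ampersands,
    // also swallowing a following "with" plus its trailing separators.
    static String[] tokens(String rule) {
        return rule.split("[\\s,&]+(with[\\s,&]+)?");
    }

    public static void main(String[] args) {
        String s1 = "01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
        System.out.println(Arrays.toString(tokens(s1)));
        // [01:IF, rd.h, dq.L, o.LL, v.L, THEN, la.VHB, av.VHR, 0.4610;]
    }
}
```

Because "with" must be followed by at least one separator to be consumed, a token such as "within" would be left intact, and the "h" in "rd.h" is no longer split off.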
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have some code:
firstWord = sentance.substring (0, sentance.indexOf(' '));
secondWord = sentance.substring(sentance.indexOf(' ') + 1);
The code is used for selecting the first word out of a string without the use of arrays.
However, I am wondering if I can further fool-proof my code by adding a safeguard so that if the user inputs two spaces, the code treats the run of white space as one unit.
Is this possible without the use of arrays or loops?
For example the user would input this:
"Hello 2spaces there"
The user accidentally entered two spaces, which I think will mess the program up when it tries to take the second word.
Replace multiple spaces with a single space:
String str="Hello world this is string";
str=str.replaceAll("\\s+", " ");
// ... do whatever you want
Your code fails to extract the first word if the first character of the string is a space, if what comes before the first space is not a word, or if there is no space at all. For example, " hello" gives "", "!@#! blah" gives "!@#!", and "asdasd" throws a StringIndexOutOfBoundsException, because indexOf returns -1 when there is no space.
y.indexOf(x) returns the index of the first occurrence of x in y.
Your solution is mostly foolproof, but it will fail to get the first word if there are spaces before it, or if there is no whitespace in the string, because indexOf then returns -1.
You should call the .trim() method on the string you want to get the first word of, which removes the whitespace around the string, and then add a single space character at the end of the string.
str = "Hello I'm your String";
String[] splited = str.split("\\s+");
You can use arrays; they are not that bad.
If you really must avoid using an array, you could use sentance.replaceAll("\\s+", " "); first to collapse all sequences of consecutive whitespace into singleton spaces.
(Similarly, you would want to trim() leading and trailing whitespace as well.)
If you just want to remove trailing and leading whitespace use .trim()
str = str.trim();
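Putting the suggestions from these answers together (trim, collapse whitespace, guard against indexOf returning -1), a sketch without arrays or loops could look like this. The class and method names are made up for illustration:

```java
public class FirstWord {
    // Trims the ends and collapses every run of whitespace to one space.
    static String normalize(String sentence) {
        return sentence.trim().replaceAll("\\s+", " ");
    }

    // Returns the first word, or the whole string if it has no space.
    static String firstWord(String sentence) {
        String s = normalize(sentence);
        int space = s.indexOf(' ');
        return space == -1 ? s : s.substring(0, space);
    }

    // Returns everything after the first word, or "" if there is nothing left.
    static String rest(String sentence) {
        String s = normalize(sentence);
        int space = s.indexOf(' ');
        return space == -1 ? "" : s.substring(space + 1);
    }

    public static void main(String[] args) {
        String sentence = "  Hello   there  world ";
        System.out.println(firstWord(sentence)); // Hello
        System.out.println(rest(sentence));      // there world
    }
}
```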
If the title is not clear, here is an example.
I want to remove all the special characters in a String, and also a single word character that follows a special character and is itself followed by a white space.
String str = "Here is the world's first movie. #movie";
With the above example, I need output like
"Here is the world first movie movie";
I tried many regexes to achieve this, but couldn't get it to work. I tried the following:
replaceAll("[^\\w]\\w{1}\\s", " ");
but it's not working. Can you tell me the regex for this, with an explanation?
Thanks in advance.
Edit: the requirement is that I want to remove the special character, as in #movie, and also remove a special character together with a single character followed by a space, as in world's favorite, where the final output should be world favorite.
Most probably the character after a special character will be 's', for example
world's, India's, john's
String input = "Here is the world's first movie. #movie";
String output = input.replaceAll("[!@#$%^&*][a-z]+", "");
System.out.println(output);
Regex - [!@#$%^&*][a-z]+
[!@#$%^&*] - tests for a special character as the first character
[a-z]+ - tests for a word
Finally I got it to work after much trial and error. The final regex string is [^a-zA-Z\\s]s?(\\s)?
str.replaceAll("[^a-zA-Z\\s]s?(\\s)?", "$1");
Here is the explanation:
[^a-zA-Z\\s]s? => find any special character (anything other than a letter or white space), followed by one or zero s
(\\s)? => immediately followed by one or zero white space characters.
I grouped (\\s) because I want to use it in the replaceAll() function's 2nd parameter as $1.
The answer is
String orig = "Here is the world's first movie. #movie";
System.out.println(orig.replaceAll("[^a-zA-Z\\s]s?(\\s)?", "$1"));
The final output will be: Here is the world first movie movie