For example I'm extracting a text String from a text file and I need those words to form an array. However, when I do all that some words end with comma (,) or a full stop (.) or even have brackets attached to them (which is all perfectly normal).
What I want to do is to get rid of those characters. I've been trying to do that using those predefined String methods in Java but I just can't get around it.
Reassign the variable to a substring:
s = s.substring(0, s.length() - 1)
An alternative way of solving your problem: consider using a StringTokenizer to read the file, and set the delimiters to be the characters you don't want to be part of words.
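A minimal sketch of the StringTokenizer idea, assuming the unwanted characters are whitespace, commas, full stops and round brackets. The class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerWords {
    // Hypothetical helper: every character in the delimiter string ends a token
    // and is not returned as part of any word.
    static List<String> words(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text, " \t\n\r,.()");
        List<String> result = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }
}
```

Calling TokenizerWords.words("Hello, world (again).") should give the three words without any punctuation attached. Extend the delimiter string if other characters need to be dropped.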
Use:
String str = "whatever";
str = str.replaceAll("[,.]", "");
replaceAll takes a regular expression. This:
[,.]
...looks for each comma and/or period.
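Putting that together with the original goal of getting an array of words, a rough sketch could look like this. It assumes only commas, full stops and round brackets need removing, and the names are invented for the example:

```java
import java.util.Arrays;
import java.util.List;

class PunctuationStripper {
    // Hypothetical helper: removes the listed punctuation, then splits on whitespace.
    static List<String> words(String text) {
        String cleaned = text.replaceAll("[,.()]", "");
        // trim() avoids an empty first element when the text starts with whitespace
        return Arrays.asList(cleaned.trim().split("\\s+"));
    }
}
```

Inside a character class the dot and the brackets are literal, so they do not need escaping here.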
To remove the last character do as Mark Byers said
s = s.substring(0, s.length() - 1);
Additionally, another way to remove the characters you don't want would be to use the .replace(oldCharacter, newCharacter) method.
as in:
s = s.replace(",","");
and
s = s.replace(".","");
You can't modify a String in Java; Strings are immutable. All you can do is create a new string that is a substring of the old string, minus the last character.
In some cases a StringBuffer (or the unsynchronized StringBuilder) might help you instead.
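As a sketch of the mutable-buffer route, here is one way it might look with a StringBuilder. The assumption (not from the question) is that every trailing character that is not a letter or digit should go:

```java
class TrailingPunctuation {
    // Hypothetical helper: drops trailing characters until a letter or digit is last.
    static String strip(String s) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() > 0 && !Character.isLetterOrDigit(sb.charAt(sb.length() - 1))) {
            sb.deleteCharAt(sb.length() - 1);
        }
        return sb.toString();
    }
}
```

The buffer is modified in place, so only one new String is created at the end, however many characters are removed.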
The best method is what Mark Byers explains:
s = s.substring(0, s.length() - 1)
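For example, replacing a backslash (\) with a space using replaceAll does not work when the pattern is written as "\\": the regex engine receives a single backslash, which is an incomplete escape, and throws a PatternSyntaxException.
s.replaceAll("\\", " "); // throws PatternSyntaxException
The backslash has to be escaped once for the Java string literal and once for the regex:
s = s.replaceAll("\\\\", " ");
The same double escaping applies to other regex metacharacters, such as $:
s = s.replaceAll("\\$", ""); // removes literal dollar signs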
Note that the word boundaries also depend on the Locale. I think the best way to do it is to use the standard java.text.BreakIterator. Here is an example from the java.sun.com tutorial.
import java.text.BreakIterator;
import java.util.Locale;

// The class name is arbitrary; it is only here so the example compiles.
public class WordExtractor {
    public static void main(String[] args) {
        String text = "\n" +
            "\n" +
            "For example I'm extracting a text String from a text file and I need those words to form an array. However, when I do all that some words end with comma (,) or a full stop (.) or even have brackets attached to them (which is all perfectly normal).\n" +
            "\n" +
            "What I want to do is to get rid of those characters. I've been trying to do that using those predefined String methods in Java but I just can't get around it.\n" +
            "\n" +
            "Every help appreciated. Thanx";
        BreakIterator wordIterator = BreakIterator.getWordInstance(Locale.getDefault());
        extractWords(text, wordIterator);
    }

    static void extractWords(String target, BreakIterator wordIterator) {
        wordIterator.setText(target);
        int start = wordIterator.first();
        int end = wordIterator.next();
        while (end != BreakIterator.DONE) {
            String word = target.substring(start, end);
            // Skip segments that are only punctuation or whitespace
            if (Character.isLetterOrDigit(word.charAt(0))) {
                System.out.println(word);
            }
            start = end;
            end = wordIterator.next();
        }
    }
}
Source: http://java.sun.com/docs/books/tutorial/i18n/text/word.html
You can use the replaceAll() method, once per character (assigning the result back, since strings are immutable):
str = str.replaceAll(",", "");
str = str.replaceAll("\\.", "");
str = str.replaceAll("\\(", "");
etc.
Related
I have a string consisting of 18 characters, e.g. 'abcdefghijklmnopqr'. I need to add a blank space after the 5th character, then after the 9th character, and after the 15th character, making it look like 'abcde fghi jklmno pqr'. Can I achieve this using a regular expression?
Regular expressions are not my cup of tea, hence I need help from the regex gurus out here. Any help is appreciated.
Thanks in advance
A regex by itself only finds a match in a string; it does not perform a replacement. You can use a regex to find a matching substring and replace it, but the replacement is a separate step (in Java, methods such as replaceAll combine the two).
Since you're not looking for a pattern in your string, but rather just the n-th char, regex wouldn't be of much use; it would make things unnecessarily complex.
Here are some ideas on how you could implement a solution:
Use an array of characters to avoid creating redundant strings: create a character array, copy the characters before the given position, put the new character at the position, copy the characters up to the next position, and so on until you reach the end of the string. Then construct the final string from that array.
Use the substring() method: concatenate the substring before the position, the new character, the substring between that position and the next one, and so on until you reach the end of the original string.
Use a StringBuilder and its insert() method.
Note that:
The first idea might not be suitable for very large strings: it needs an auxiliary array, which uses additional space.
The second idea creates redundant temporary strings. Strings are immutable in Java, so every substring() call and concatenation allocates a new object (only literals and interned strings are shared through the string pool).
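The StringBuilder idea could be sketched roughly as follows. The helper name and the varargs signature are made up for the example, and it assumes the positions are given in ascending order as offsets into the original string:

```java
class SpaceInserter {
    // Hypothetical helper: inserts a space at each given offset of the original string.
    static String insertSpaces(String s, int... positions) {
        StringBuilder sb = new StringBuilder(s);
        // Work from right to left so earlier insertions do not shift later offsets.
        for (int i = positions.length - 1; i >= 0; i--) {
            sb.insert(positions[i], ' ');
        }
        return sb.toString();
    }
}
```

For the string in the question, SpaceInserter.insertSpaces("abcdefghijklmnopqr", 5, 9, 15) gives "abcde fghi jklmno pqr".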
Yes, you can use regex groups to achieve that. Something like this:
final Pattern pattern = Pattern.compile("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})");
final Matcher matcher = pattern.matcher("abcdefghijklmnopqr");
if (matcher.matches()) {
// group(0) is the whole match; the capturing groups are numbered from 1
String first = matcher.group(1);
String second = matcher.group(2);
String third = matcher.group(3);
String fourth = matcher.group(4);
return first + " " + second + " " + third + " " + fourth;
} else {
throw new SomeException();
}
Note that pattern should be a constant, I used a local variable here to make it easier to read.
Compared to substrings, which would also achieve the desired result, a regex also lets you validate the format of your input data. In the provided example you check that it's an 18-character string composed only of lowercase letters.
If you had a more interesting example, for instance a mix of letters and digits, you could check with the regex that each group contains the correct type of data.
You can also do a simpler version where you just replace with:
"abcdefghijklmnopqr".replaceAll("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})", "$1 $2 $3 $4")
But you lose the benefit of checking: if the string doesn't match the format, it is simply not replaced. This is also less efficient than substrings.
Here is an example solution using a StringBuilder, which would be more efficient if you don't care about checking:
final Set<Integer> breaks = Set.of(5, 9, 15);
final String str = "abcdefghijklmnopqr";
final StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
if (breaks.contains(i)) {
stringBuilder.append(' ');
}
stringBuilder.append(str.charAt(i));
}
return stringBuilder.toString();
I have a link http://localhost:8080/reporting/pvsUsageAction.do?form_action=inline_audit_view&days=7&projectStatus=scheduled&justificationId=5&justificationName= No Technicians in Area in my Struts-based web application.
The justificationName variable in the URL has some spaces before its value, as shown. When I get the value using request.getParameter("justificationName"), it comes back with those spaces. I want to remove them. I tried trim() and str = str.replace(" ", ""); but neither removed those spaces. Can anyone suggest another way to remove them?
One more thing I noticed: when I right-click the link and open it in a new tab, the link looks like this:
http://localhost:8080/reporting/pvsUsageAction.do?form_action=inline_audit_view&days=7&projectStatus=scheduled&justificationId=5&justificationName=%A0%A0%A0%A0%A0%A0%A0%A0No%20Technicians%20in%20Area
Notably, the address bar shows %A0 for the leading white space and %20 for the other spaces. What is the difference, if anyone has an idea?
EDIT
Here is my code
String justificationCode = "";
if (request.getParameter("justificationName") != null) {
justificationCode = request.getParameter("justificationName");
}
justificationCode = justificationCode.replace(" ", "");
Note: the replace function removes the spaces inside the string but not the leading ones.
E.g. if my string is " This is string", after using replace it becomes " Thisisstring".
Thanks in advance
Strings are immutable in Java, so the method doesn't change the string you pass but returns a new one. You must use the returned value :
str = str.replace(" ", "");
Manual trim
You need to remove the spaces from the string. This will remove every run of consecutive spaces, anywhere in the string.
String trimmed = str.replaceAll(" +", "");
If you want to replace all whitespace characters:
String trimmed = str.replaceAll("\\s+", "");
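One thing worth checking, given the %A0 in the address bar: %20 is an ordinary space, while %A0 is byte 0xA0, which is the non-breaking space (U+00A0) in ISO-8859-1. trim() only strips characters up to U+0020, and \s does not match U+00A0 by default, which would explain why the leading characters survive. A minimal sketch, assuming the parameter really is decoded to U+00A0 characters (the class and method names are invented):

```java
class NbspTrimmer {
    // Hypothetical helper: strips leading and trailing whitespace,
    // including the non-breaking space (code point U+00A0).
    static String trimAll(String s) {
        // The doubled backslashes hand the escapes to the regex engine.
        return s.replaceAll("^[\\s\\u00A0]+|[\\s\\u00A0]+$", "");
    }
}
```

Spaces inside the value are left alone; only the two ends are cleaned. If the servlet container decodes the query string with a different charset, the leading characters may be something other than U+00A0, so it is worth printing their code points first.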
URL Encoding
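You could also use a URLEncoder on the parameter value when the link is built, which sounds like a more appropriate way to go. Encode only the value, not the whole URL, otherwise the ?, & and = separators get escaped as well:
import java.net.URLEncoder;
// This overload declares the checked UnsupportedEncodingException
String url = "http://localhost:8080/reporting/pvsUsageAction.do?form_action=inline_audit_view&days=7&projectStatus=scheduled&justificationId=5&justificationName=" + URLEncoder.encode(" No Technicians in Area", "ISO-8859-1");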
You have to assign the result of the replaceAll(String regex, String replacement) operation to another variable. See the Javadoc for the replaceAll(String regex, String replacement) method. It returns a brand new String object, because Strings in Java are immutable. Note that replace(CharSequence, CharSequence) treats its first argument as literal text, so a pattern such as \\s+ needs replaceAll. In your case, you can simply do the following
String noSpacesString = str.replaceAll("\\s+", "");
You can use replaceAll("\\s","") It will remove all white space.
If you are trying to remove the leading and trailing white space, then
s = s.trim();
Or if you want to remove all the spaces, then use:
s = s.replace(" ","");
There are two ways of doing this: one is based on a regular expression, the other is implementing the logic yourself.
replaceAll("\\s","")
or
if (text.contains(" ") || text.contains("\t") || text.contains("\r")
|| text.contains("\n"))
{
//code goes here
}
I'm reading line by line from a text file which contains a string followed by a white space followed by another string. It's the second string I want to use for my method.
Example of text file:
0h e3ne6t
ie 51b0x
6 8qlaqi
ty2 9j5dbb
nwz55 7lrwor
So I want 'e3ne6t', then '51b0x' etc.
I've tried using the .substring method and have tried using " " and "\s" as representations of white space.
Here's a snippet of code that should give you a good idea of what I'm trying to achieve.
while ((strLine = br.readLine()) != null) {
lineNumber++;
System.out.println("lineNumber = " + lineNumber);
int index = strLine.indexOf(" "); // tried \\s
System.out.println("index = " + index);
strLine.substring(index);
System.out.println(strLine);
if (myString.equals(strLine)) {
System.out.println("Match Found!");;
System.out.println("myString = " + myString );
System.out.println("strLine =" + strLine);
}
}
I even tried changing the white space to a "+" but it still wouldn't work.
Suggestions?
substring doesn't change the contents of the string you call it on - nothing does, as String is immutable in Java. Instead, it returns a new string which is the relevant substring. So you can use:
strLine = strLine.substring(index);
(The same is true for things like toUpperCase, trim, replace etc.)
Strings are immutable in Java. You need to reassign the value returned by substring to the actual variable.
strLine=strLine.substring(index);
Also note that indexOf(str) doesn't take regex, so indexOf("\\s") would give you nothing.
As others have mentioned, Strings are immutable in Java. The substring method returns a new String that is the substring. But if you pass index to substring, then you will get a substring starting with your space character, e.g. " e3ne6t". So I would use this:
strLine = strLine.substring(index + 1);
to get your second field, advancing past the space character, as long as index is not -1 (not found).
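A small sketch of that, with the -1 check included. The helper name is made up, and returning an empty string when there is no space is just one possible choice:

```java
class SecondField {
    // Hypothetical helper: returns everything after the first space in the line.
    static String of(String line) {
        int index = line.indexOf(' ');
        if (index == -1) {
            return ""; // assumption: a line without a space has no second field
        }
        return line.substring(index + 1);
    }
}
```

In the loop from the question this would be used as strLine = SecondField.of(strLine); before the equals comparison.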
I'm looking for the simplest way of tokenizing strings such as
INPUT OUTPUT
"hello %my% world" -> "hello ", "%my%", " world"
in Java. Is it possible to accomplish this with regex? I am basically looking for a String.split() that takes as separator something of the form "%*%" but that won't ignore it, as it seems to generally do.
Thanks
No, you can't do this the way you explained it. The reason is--it's ambiguous!
You give the example:
"hello %my% world" -> "hello ", "%my%", " world"
Should the % be attached to the string before it or after it?
Should the output be
"hello ", "%my", "% world"
Or, perhaps the output should be
"hello %", "my%", " world"
In your example you don't follow either of these rules. You come up with %my% which attaches the delimiter first to the string after it appears and then to the string before it appears.
Do you see the ambiguity?
So, you first need to come up with a clear set of rules about where you want the delimiter to be attached. Once you do this, one simple (although not particularly efficient, since Strings are immutable) way of achieving what you want is to:
Use String.split() to split the strings in the normal way
Follow your rule set to re-add the delimiter to where it should be in the string.
A simpler solution would be to just split the string by %s. That way, every other subsequence would have been between %s. All you have to do afterwards is iterate over the results, toggling a flag to know if the result is a regular string or one between %s.
Special attention has to be paid to how the split implementation handles empty subsequences. Some implementations discard empty subsequences at the beginning/end of the input, others discard all empty subsequences, and others discard none of them.
This would not result in the exact output that you want, since the %s would be gone. However you can easily add those back if there is an actual need for them (and I presume there isn't).
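A rough sketch of the split-and-toggle idea in Java. It assumes the % signs in the input are balanced, puts the % signs back around the delimited parts, and uses invented names:

```java
import java.util.ArrayList;
import java.util.List;

class PercentSplitter {
    // Hypothetical helper: splits on % and re-wraps every other part in % signs.
    static List<String> tokenize(String input) {
        // Limit -1 keeps trailing empty strings, so the toggle stays in step.
        String[] parts = input.split("%", -1);
        List<String> tokens = new ArrayList<>();
        boolean insideDelimiters = false;
        for (String part : parts) {
            if (insideDelimiters) {
                tokens.add("%" + part + "%");
            } else if (!part.isEmpty()) {
                tokens.add(part); // plain text between delimited parts
            }
            insideDelimiters = !insideDelimiters;
        }
        return tokens;
    }
}
```

With an odd number of % signs the last part would be wrapped even though it was never closed, so unbalanced input needs its own rule.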
Why not split by the spaces between your words? In that case you will get "hello", "%my%", "world".
If possible, use a simpler delimiter. And I'm okay with jury-rigging "%" as your delimiter, just so you can get String.split() instead of regexps. But if that's not possible...
Regexps! You can parse this using a Matcher. If you know there's one delimiter per line, you specify a pattern that eats the whole line:
String singleDelimRegexp = "(.*)(%[^%]*%)(.*)";
Pattern singleDelimPattern = Pattern.compile(singleDelimRegexp);
Matcher singleDelimMatcher = singleDelimPattern.matcher(input);
if (singleDelimMatcher.matches()) {
String before = singleDelimMatcher.group(1);
String delim = singleDelimMatcher.group(2);
String after = singleDelimMatcher.group(3);
System.out.println(before + "//" + delim + "//" + after);
}
If the input is long and you need a chain of results, you use Matcher in a loop:
String multiDelimRegexp = "%[^%]*%";
Pattern multiDelimPattern = Pattern.compile(multiDelimRegexp);
Matcher multiDelimMatcher = multiDelimPattern.matcher(input);
int lastEnd = 0;
while (multiDelimMatcher.find()) {
String data = input.substring(lastEnd, multiDelimMatcher.start());
String delim = multiDelimMatcher.group();
lastEnd = multiDelimMatcher.end();
System.out.println(data);
System.out.println(delim);
}
String lastData = input.substring(lastEnd);
System.out.println(lastData);
Add those to a data structure as you go, and you'll build the whole parsed input.
Running on input: http://ideone.com/s8FzeW
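For what it's worth, collecting the pieces into a list instead of printing them might look like the sketch below. The class name is invented, and skipping empty text segments is a choice made here, not something the question specifies:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterTokenizer {
    private static final Pattern DELIMITED = Pattern.compile("%[^%]*%");

    // Hypothetical helper: returns plain text and %...% parts in input order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = DELIMITED.matcher(input);
        int lastEnd = 0;
        while (matcher.find()) {
            if (matcher.start() > lastEnd) {
                tokens.add(input.substring(lastEnd, matcher.start())); // text before the match
            }
            tokens.add(matcher.group()); // the %...% part itself
            lastEnd = matcher.end();
        }
        if (lastEnd < input.length()) {
            tokens.add(input.substring(lastEnd)); // text after the last match
        }
        return tokens;
    }
}
```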
I'm trying to replace the last dot in a String using a regular expression.
Let's say I have the following String:
String string = "hello.world.how.are.you!";
I want to replace the last dot with an exclamation mark such that the result is:
"hello.world.how.are!you!"
I have tried various expressions using the method String.replaceAll(String, String) without any luck.
One way would be:
string = string.replaceAll("^(.*)\\.(.*)$","$1!$2");
Alternatively you can use negative lookahead as:
string = string.replaceAll("\\.(?!.*\\.)","!");
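To see why both work: in the first pattern the greedy (.*) grabs as much as it can and backtracks only as far as the last dot, and in the second the negative lookahead rejects any dot that still has another dot somewhere after it. A small sketch wrapping the two in methods (names invented) so they can be compared:

```java
class LastDotReplacer {
    // Greedy group: everything up to the last dot lands in $1, the rest in $2.
    static String viaGreedyGroup(String s) {
        return s.replaceAll("^(.*)\\.(.*)$", "$1!$2");
    }

    // Negative lookahead: match only a dot with no later dot after it.
    static String viaLookahead(String s) {
        return s.replaceAll("\\.(?!.*\\.)", "!");
    }
}
```

Both leave a string without dots unchanged. Since . does not match line terminators by default, multi-line input would need extra care.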
Although you can use a regex, it's sometimes best to step back and just do it the old-fashioned way. I've always been of the belief that, if you can't think of a regex to do it in about two minutes, it's probably not suited to a regex solution.
No doubt you'll get some wonderful regex answers here. Some of them may even be readable :-)
You can use lastIndexOf to get the last occurrence and substring to build a new string. This complete program shows how:
public class testprog {
public static String morph (String s) {
int pos = s.lastIndexOf(".");
if (pos >= 0)
return s.substring(0,pos) + "!" + s.substring(pos+1);
return s;
}
public static void main(String args[]) {
System.out.println (morph("hello.world.how.are.you!"));
System.out.println (morph("no dots in here"));
System.out.println (morph(". first"));
System.out.println (morph("last ."));
}
}
The output is:
hello.world.how.are!you!
no dots in here
! first
last !
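The regex you need is \\.(?=[^.]*$). The ?= is a lookahead assertion. Note that it has to be passed to replaceAll (or replaceFirst), because replace treats its argument as literal text:
"hello.world.how.are.you!".replaceAll("\\.(?=[^.]*$)", "!")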
Try this (note that it only removes a dot that is the very last character of the string):
string = string.replaceAll("[.]$", "");