Efficient Text Processing Java


I have created an application to process log files, but I am hitting a bottleneck once the number of files reaches about 20.
The issue comes from one particular method, which takes roughly a second on average to complete. As you can imagine, this isn't practical when it needs to be run more than 50 times.
private String getIdFromLine(String line){
String[] values = line.split("\t");
String newLine = substringBetween(values[4], "Some String : ", "Value=");
String[] split = newLine.split(" ");
return split[1].substring(4, split[1].length());
}
private String substringBetween(String str, String open, String close) {
if (str == null || open == null || close == null) {
return null;
}
int start = str.indexOf(open);
if (start != -1) {
int end = str.indexOf(close, start + open.length());
if (end != -1) {
return str.substring(start + open.length(), end);
}
}
return null;
}
A line comes from reading a file, which is very efficient, so I don't feel a need to post that code unless someone asks.
Is there any way to improve the performance of this at all?
Thanks for your time

A few things are likely problematic:
Whether or not you realized it, you are using regular expressions. The argument to String.split() is treated as a regex. Using String.indexOf() will almost certainly be a faster way to find the particular portion of the String that you want. As HRgiger points out, Guava's Splitter is a good choice because it does just that.
You're allocating a bunch of stuff you don't need. Depending on how long your lines are, you could be creating a ton of extra Strings and String[]s that you don't need (and then garbage collecting them). Another reason to avoid String.split().
I also recommend using String.startsWith() and String.endsWith() rather than all of this stuff that you're doing with indexOf(), if only because it would be easier to read.

I would try to use regular expressions.

One of the main problems in this code is the "split" method.
For example this one:
private String getIdFromLine3(String line) {
int t_index = -1;
for (int i = 0; i < 4; i++) { // values[4] starts after the fourth tab
t_index = line.indexOf("\t", t_index+1);
if (t_index == -1) return null;
}
//String[] values = line.split("\t");
String newLine = substringBetween(line.substring(t_index + 1), "Some String : ", "Value=");
// String[] split = newLine.split(" ");
int p_index = newLine.indexOf(" ");
if (p_index == -1) return null;
int p_index2 = newLine.indexOf(" ", p_index+1);
if (p_index2 == -1) return null;
String split = newLine.substring(p_index+1, p_index2);
// return split[1].substring(4, split[1].length());
return split.substring(4, split.length());
}
UPD: It could be 3 times faster.

I would recommend using VisualVM to find the bottleneck before optimising.
If you need performance in your application, you will need profiling anyway.
As an optimisation, I would write a custom loop to replace your substringBetween method and get rid of the multiple indexOf calls.

Google Guava's Splitter is pretty fast as well.

Could you try the regex anyway and post results please just for comparison:
Pattern p = Pattern.compile("(Some String : )(.*?)(Value=)"); // remove the first and last group if not needed (and adjust m.group(x) to match)
@Test
public void test2(){
String str = "Long java line with Some String : and some object with Value=154345 ";
System.out.println(substringBetween(str));
}
private String substringBetween(String str) {
Matcher m = p.matcher(str);
if(m.find()){
return m.group(2);
}else{
return null;
}
}
If this is faster, find a regex that combines both functions.

Related

Java efficiently replace unless matches complex regular expression

I have over a gigabyte of text that I need to go through and surround punctuation with spaces (tokenizing). I have a long regular expression (1818 characters, though that's mostly lists) that defines when punctuation should not be separated. Being long and complicated makes it hard to use groups with it, though I wouldn't leave that out as an option since I could make most groups non-capturing (?:).
Question: How can I efficiently replace certain characters that don't match a particular regular expression?
I've looked into using lookaheads or similar, and I haven't quite figured it out, but it seems to be terribly inefficient anyway. It would likely be better than using placeholders though.
I can't seem to find a good "replace with a bunch of different regular expressions for both finding and replacing in one pass" function.
Should I do this line by line instead of operating on the whole text?
String completeRegex = "[^\\w](("+protectedPrefixes+")|(("+protectedNumericOnly+")\\s*\\p{N}))|"+protectedRegex;
Matcher protectedM = Pattern.compile(completeRegex).matcher(s);
ArrayList<String> protectedStrs = new ArrayList<String>();
//Take note of the protected matches.
while (protectedM.find()) {
protectedStrs.add(protectedM.group());
}
//Replace protected matches.
String replaceStr = "<PROTECTED>";
s = protectedM.replaceAll(replaceStr);
//Now that it's safe, separate punctuation.
s = s.replaceAll("([^\\p{L}\\p{N}\\p{Mn}_\\-<>'])"," $1 ");
// These are for apostrophes. Can these be combined with either the protecting regular expression or the one above?
s = s.replaceAll("([\\p{N}\\p{L}])'(\\p{L})", "$1 '$2");
s = s.replaceAll("([^\\p{L}])'([^\\p{L}])", "$1 ' $2");
Note the two additional replacements for apostrophes. Using placeholders protects against those replacements as well, but I'm not really concerned with apostrophes or single quotes in my protecting regex anyway, so it's not a real concern.
I'm rewriting what I considered very inefficient Perl code with my own in Java, keeping track of speed, and things were going fine until I started replacing the placeholders with the original strings. With that addition it's too slow to be reasonable (I've never seen it get even close to finishing).
//Replace placeholders with original text.
String resultStr = "";
String currentStr = "";
int currentPos = 0;
int[] protectedArray = replaceStr.codePoints().toArray();
int protectedLen = protectedArray.length;
int[] strArray = s.codePoints().toArray();
int protectedCount = 0;
for (int i=0; i<strArray.length; i++) {
int pt = strArray[i];
// System.out.println("pt: "+pt+" symbol: "+String.valueOf(Character.toChars(pt)));
if (protectedArray[currentPos]==pt) {
if (currentPos == protectedLen - 1) {
resultStr += protectedStrs.get(protectedCount);
protectedCount++;
currentPos = 0;
} else {
currentPos++;
}
} else {
if (currentPos > 0) {
resultStr += replaceStr.substring(0, currentPos);
currentPos = 0;
currentStr = "";
}
resultStr += ParseUtils.getSymbol(pt);
}
}
s = resultStr;
This code may not be the most efficient way to return the protected matches. What is a better way? Or better yet, how can I replace punctuation without having to use placeholders?

I don't know exactly how big your in-between strings are, but I suspect that you can do somewhat better than using Matcher.replaceAll, speed-wise.
You're doing 3 passes across the string, each time creating a new Matcher instance, and then creating a new String; and because you're using + to concatenate the strings, you're creating a new string which is the concatenation of the in-between string and the protected group, and then another string when you concatenate this to the current result. You don't really need all of these extra instances.
Firstly, you should accumulate the resultStr in a StringBuilder, rather than via direct string concatenation. Then you can proceed something like:
StringBuilder resultStr = new StringBuilder();
int currIndex = 0;
while (protectedM.find()) {
protectedStrs.add(protectedM.group());
appendInBetween(resultStr, str, currIndex, protectedM.start());
resultStr.append(protectedM.group());
currIndex = protectedM.end();
}
resultStr.append(str, currIndex, str.length());
where appendInBetween is a method implementing the equivalent to the replacements, just in a single pass:
void appendInBetween(StringBuilder resultStr, String s, int start, int end) {
// Pass the whole input string and the bounds, rather than taking a substring.
// Allocate roughly enough space up-front.
resultStr.ensureCapacity(resultStr.length() + end - start);
for (int i = start; i < end; ++i) {
char c = s.charAt(i);
// Check if c matches "([^\\p{L}\\p{N}\\p{Mn}_\\-<>'])".
if (!(Character.isLetter(c)
|| Character.isDigit(c)
|| Character.getType(c) == Character.NON_SPACING_MARK
|| "_-<>'".indexOf(c) != -1)) {
resultStr.append(' ');
resultStr.append(c);
resultStr.append(' ');
} else if (c == '\'' && i > 0 && i + 1 < s.length()) {
// We have a quote that's not at the beginning or end.
// Call these 3 characters bcd, where c is the quote.
char b = s.charAt(i - 1);
char d = s.charAt(i + 1);
if ((Character.isDigit(b) || Character.isLetter(b)) && Character.isLetter(d)) {
// If the 3 chars match "([\\p{N}\\p{L}])'(\\p{L})"
resultStr.append(' ');
resultStr.append(c);
} else if (!Character.isLetter(b) && !Character.isLetter(d)) {
// If the 3 chars match "([^\\p{L}])'([^\\p{L}])"
resultStr.append(' ');
resultStr.append(c);
resultStr.append(' ');
} else {
resultStr.append(c);
}
} else {
// Everything else, just append.
resultStr.append(c);
}
}
}
Obviously, there is a maintenance cost associated with this code - it is undeniably more verbose. But the advantage of doing it explicitly like this (aside from the fact it is just a single pass) is that you can debug the code like any other - rather than it just being the black box that regexes are.
I'd be interested to know if this works any faster for you!

At first I thought that appendReplacement wasn't what I was looking for, but indeed it was. Since it's replacing the placeholders at the end that slowed things down, all I really needed was a way to dynamically replace matches:
StringBuffer replacedBuff = new StringBuffer();
Matcher replaceM = Pattern.compile(replaceStr).matcher(s);
int index = 0;
while (replaceM.find()) {
replaceM.appendReplacement(replacedBuff, "");
replacedBuff.append(protectedStrs.get(index));
index++;
}
replaceM.appendTail(replacedBuff);
s = replacedBuff.toString();
Reference: Second answer at this question.
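For anyone who wants to try the appendReplacement() approach in isolation, here is a minimal self-contained sketch. The class name PlaceholderRestore, the method restore and the sample strings are made up for illustration; only the Matcher calls come from the standard library. It uses Matcher.quoteReplacement() instead of appending separately, so that a '$' or '\' in the saved text is not interpreted as a group reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderRestore {
    // Replaces each occurrence of the placeholder with the next saved string, in order.
    static String restore(String text, String placeholder, List<String> saved) {
        // Pattern.quote treats the placeholder as a literal, not as a regex.
        Matcher m = Pattern.compile(Pattern.quote(placeholder)).matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        int index = 0;
        while (index < saved.size() && m.find()) {
            // quoteReplacement keeps '$' and '\' in the saved text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(saved.get(index)));
            index++;
        }
        // Copies everything after the last replaced placeholder unchanged.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> saved = new ArrayList<>();
        saved.add("U.S.");
        saved.add("$3.50");
        System.out.println(restore("The <PROTECTED> price is <PROTECTED> .", "<PROTECTED>", saved));
    }
}
```

Each find() resumes where the previous one stopped, so the whole restore is one pass over the text, which is presumably why it was so much faster than the character-by-character loop with += in the question.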
Another option to consider:
During the first pass through the String, to find the protected Strings, take the start and end indices of each match, replace the punctuation for everything outside of the match, add the matched String, and then keep going. This takes away the need to write a String with placeholders, and requires only one pass through the entire String. It does, however, require many separate small replacement operations. (By the way, be sure to compile the patterns before the loop, as opposed to using String.replaceAll()). A similar alternative is to add the unprotected substrings together, and then replace them all at the same time. However, the protected strings would then have to be added to the replaced string at the end, so I doubt this would save time.
int currIndex = 0;
while (protectedM.find()) {
protectedStrs.add(protectedM.group());
String substr = s.substring(currIndex,protectedM.start());
substr = p1.matcher(substr).replaceAll(" $1 ");
substr = p2.matcher(substr).replaceAll("$1 '$2");
substr = p3.matcher(substr).replaceAll("$1 ' $2");
resultStr += substr+protectedM.group();
currIndex = protectedM.end();
}
Speed comparison for 100,000 lines of text:
Original Perl script: 272.960579875 seconds
My first attempt: Too long to finish.
With appendReplacement(): 14.245160866 seconds
Replacing while finding protected: 68.691842962 seconds
Thank you, Java, for not letting me down.
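On the "without placeholders" part of the question, one further option (not one of the approaches timed above, so treat it as an untested idea speed-wise) is to put the protected regex and the punctuation class into a single alternation and decide per match what to emit. The sketch below is only an illustration under assumptions: PROTECTED is a tiny hypothetical stand-in for the real 1818-character expression, whitespace is excluded from the punctuation class so that spaces are not padded, and the two apostrophe rules are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassTokenizer {
    // Hypothetical stand-in for the question's long protected regex.
    static final String PROTECTED = "\\p{N}+\\.\\p{N}+|Mr\\.";

    // Group 1: protected text, tried first. Group 2: one punctuation character.
    static final Pattern P = Pattern.compile(
            "(" + PROTECTED + ")|([^\\p{L}\\p{N}\\p{Mn}_\\-<>'\\s])");

    static String tokenize(String s) {
        Matcher m = P.matcher(s);
        StringBuffer out = new StringBuffer(s.length() + 16);
        while (m.find()) {
            String replacement = (m.group(1) != null)
                    ? m.group(1)              // protected: copy through unchanged
                    : " " + m.group(2) + " "; // punctuation: surround with spaces
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + tokenize("Mr. Smith paid 3.50, ok?") + "]");
    }
}
```

Because the protected alternative is listed first, a protected match wins at any position where both could match, and the matcher then continues after it, so no placeholder string is ever written.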

What am I missing with this code? Google foo.bar

So recently I got invited to this google foo.bar challenge and I believe the code runs the way it should be. To be precise what I need to find is the number of occurrences of "abc" in a String. When I verify my code with them, I pass 3/10 test cases. I'm starting to feel bad because I don't know what I am doing wrong. I have written the code which I will share with you guys. Also the string needs to be less than 200 characters. When I run this from their website, I pass 3 tests and fail 7. Basically 7 things need to be right.
The actual question:
Write a function called answer(s) that, given a non-empty string less
than 200 characters in length describing the sequence of M&Ms. returns the maximum number of equal parts that can be cut from the cake without leaving any leftovers.
Example : Input : (string) s = "abccbaabccba"
output : (int) 2
Input: (string) s = "abcabcabcabc"
output : (int) 4
public static int answer(String s) {
int counter = 0;
int index;
String findWord ="ABC";
if(s!=null && s.length()<200){
s = s.toUpperCase();
while (s.contains(findWord))
{
index = s.indexOf(findWord);
s = s.substring(index + findWord.length(), s.length());
counter++;
}
}
return counter;
}

I see a couple of things in your code snippet:
1.
if(s.length()<200){
Why are you checking that the length is less than 200? Is that a requirement? If not, you can skip checking the length.
2.
String findWord ="abc";
...
s.contains(findWord)
Could the test program be checking for upper-case letters, for example "ABC"? If so, you might need to consider changing your logic for the s.contains() line.
Update:
You should also consider adding a null check for the input string. This will ensure that the test cases do not fail on null inputs.

The logic of your code is fine, but I noticed that you didn't check whether the input string is empty or null.
I believe that Google foo.bar wants to see sound logic and properly written code.
So don't feel bad.

I would go for a simpler approach
int beforeLen = s.length ();
String after = s.replace (findWord, "");
int afterLen = after.length ();
return (beforeLen - afterLen) / findWord.length ();

String pattern = "abc";
String line="<input text here>";
int i=0;
Pattern TokenPattern=Pattern.compile(pattern);
if(line!=null){
Matcher m=TokenPattern.matcher(line);
while(m.find()){
i++;
}}
System.out.println("No of occurences : "+ " "+i);

Move the declaration of index out of the while block; it is not good practice to re-declare the same variable on every iteration.
int index;
while (s.contains(findWord))
{
index = s.indexOf(findWord);
....
}
I hope this helps.
Update:
Try to compact your code:
public static int answer(String s) {
int counter = 0;
int index;
String findWord = "ABC";
if (s != null && s.length() < 200) {
s = s.toUpperCase();
while ((index = s.indexOf(findWord)) > -1) {
s = s.substring(index + findWord.length(), s.length());
counter++;
}
}
return counter;
}
Update:
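The logic seems good to me. I'm still trying to improve the performance; if you can, try this (note that index must be initialised to 0 before the loop, because it is now also the position the search starts from):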
while ((index = s.indexOf(findWord, index)) > -1) {
//s = s.substring(index + findWord.length(), s.length());
index+=findWord.length();
counter++;
}
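Putting the last update together into something that compiles on its own, a minimal sketch could look like the following. The class name CountOccurrences and the method name count are just for illustration, and the empty-word guard is my own addition.

```java
class CountOccurrences {
    // Counts non-overlapping occurrences of word in s without creating any substrings.
    static int count(String s, String word) {
        if (s == null || word == null || word.isEmpty()) {
            return 0;
        }
        int counter = 0;
        int index = 0; // search start position; must be initialised before the loop
        while ((index = s.indexOf(word, index)) != -1) {
            index += word.length(); // continue searching after this match
            counter++;
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(count("abccbaabccba", "abc")); // prints 2
        System.out.println(count("abcabcabcabc", "abc")); // prints 4
    }
}
```

The difference from the substring version is that the string is never copied; indexOf(word, index) simply resumes from where the previous match ended.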

Java codingbat help - withoutString

I'm using codingbat.com to get some java practice in. One of the String problems, 'withoutString' is as follows:
Given two strings, base and remove, return a version of the base string where all instances of the remove string have been removed (not case sensitive).
You may assume that the remove string is length 1 or more. Remove only non-overlapping instances, so with "xxx" removing "xx" leaves "x".
This problem can be found at: http://codingbat.com/prob/p192570
As you can see from the Dropbox-linked screenshot below, all of the runs pass except for three and a final one called "other tests." The thing is, even though they are marked as incorrect, my output matches exactly the expected output for the correct answer.
Here's a screenshot of my output:
And here's the code I'm using:
public String withoutString(String base, String remove) {
String result = "";
int i = 0;
for(; i < base.length()-remove.length();){
if(!(base.substring(i,i+remove.length()).equalsIgnoreCase(remove))){
result = result + base.substring(i,i+1);
i++;
}
else{
i = i + remove.length();
}
if(result.startsWith(" ")) result = result.substring(1);
if(result.endsWith(" ") && base.substring(i,i+1).equals(" ")) result = result.substring(0,result.length()-1);
}
if(base.length()-i <= remove.length() && !(base.substring(i).equalsIgnoreCase(remove))){
result = result + base.substring(i);
}
return result;
}

Your solution IS failing, AND there is a display bug in CodingBat.
The correct output should be (note the two spaces after "Th"):
withoutString("This is a FISH", "IS") -> "Th  a FH"
Yours is:
withoutString("This is a FISH", "IS") -> "Th a FH"
Yours fails because it removes spaces. On top of that, CodingBat does not display the expected and actual output strings correctly, because HTML collapses consecutive spaces.
This recursive solution passes all tests:
public String withoutString(String base, String remove) {
int remIdx = base.toLowerCase().indexOf(remove.toLowerCase());
if (remIdx == -1)
return base;
return base.substring(0, remIdx ) +
withoutString(base.substring(remIdx + remove.length()) , remove);
}
Here is an example of an optimal iterative solution. It has more code than the recursive solution but is faster since far fewer function calls are made.
public String withoutString(String base, String remove) {
int remIdx = 0;
int remLen = remove.length();
remove = remove.toLowerCase();
while (true) {
remIdx = base.toLowerCase().indexOf(remove);
if (remIdx == -1)
break;
base = base.substring(0, remIdx) + base.substring(remIdx + remLen);
}
return base;
}

I just ran your code in an IDE. It compiles correctly and matches all tests shown on codingbat. There must be some bug with codingbat's test cases.
If you are curious, this problem can be solved with a single line of code:
public String withoutString(String base, String remove) {
return base.replaceAll("(?i)" + remove, ""); //String#replaceAll(String, String) with case insensitive regex.
}
Regex explanation:
The first argument taken by String#replaceAll(String, String) is what is known as a Regular Expression or "regex" for short.
Regex is a powerful tool to perform pattern matching within Strings. In this case, the regular expression being used is (assuming that remove is equal to IS):
(?i)IS
This particular expression has two parts: (?i) and IS.
IS matches the string "IS" exactly, nothing more, nothing less.
(?i) is simply a flag to tell the regex engine to ignore case.
With (?i)IS, all of: IS, Is, iS and is will be matched.
As an addition, this is (almost) equivalent to the regular expressions: (IS|Is|iS|is), (I|i)(S|s) and [Ii][Ss].
EDIT
Turns out that your output is not correct and is failing as expected. See: dansalmo's answer.
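One caveat with the one-liner above: because remove is concatenated straight into the pattern, any regex metacharacter in it (a dot, a star, a bracket) is interpreted rather than matched literally. A hedged variant that quotes the input first might look like this; the class name WithoutStringQuoted is only for illustration.

```java
import java.util.regex.Pattern;

class WithoutStringQuoted {
    static String withoutString(String base, String remove) {
        // Pattern.quote makes remove a literal; CASE_INSENSITIVE plays the role of (?i).
        return Pattern.compile(Pattern.quote(remove), Pattern.CASE_INSENSITIVE)
                .matcher(base)
                .replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println("[" + withoutString("This is a FISH", "IS") + "]");
        // Without quoting, "." would match every character and the result would be empty.
        System.out.println("[" + withoutString("a.b.c", ".") + "]");
    }
}
```

For the CodingBat inputs this behaves the same as the unquoted version; the difference only shows up once remove contains characters that mean something to the regex engine.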

public String withoutString(String base, String remove) {
String temp = base.replaceAll(remove, "");
String temp2 = temp.replaceAll(remove.toLowerCase(), "");
return temp2.replaceAll(remove.toUpperCase(), "");
}

Please find below my solution
public String withoutString(String base, String remove) {
final int rLen=remove.length();
final int bLen=base.length();
String op="";
for(int i = 0; i < bLen;)
{
if(!(i + rLen > bLen) && base.substring(i, i + rLen).equalsIgnoreCase(remove))
{
i +=rLen;
continue;
}
op += base.substring(i, i + 1);
i++;
}
return op;
}
Sometimes things go really weird on CodingBat; this is just one of them.

I am adding to a previous solution, but using a StringBuilder for better practice. Most credit goes to Anirudh.
public String withoutString(String base, String remove) {
//create a constant integer the size of remove.length();
final int rLen=remove.length();
//create a constant integer the size of base.length();
final int bLen=base.length();
//Create an empty StringBuilder to accumulate the result.
StringBuilder op = new StringBuilder();
//Create the for loop.
for(int i = 0; i < bLen;)
{
//if the remove string still fits within the rest of the base string
// and the base substring at i equals the remove string (ignoring case).
if(!(i + rLen > bLen) && base.substring(i, i + rLen).equalsIgnoreCase(remove))
{
//Increment by the remove length, and skip adding it to the string.
i +=rLen;
continue;
}
//else, we add the character at i to the string builder.
op.append(base.charAt(i));
//and increment by one.
i++;
}
//We return the string.
return op.toString();
}
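The loop above still creates a temporary substring at every position. If that matters, String.regionMatches() with ignoreCase set to true does the same comparison in place and also returns false on its own when the region would run past the end of base. A sketch of that variation (the class name WithoutStringRegion is illustrative, and the empty-remove guard is an extra precaution even though the exercise promises length 1 or more):

```java
class WithoutStringRegion {
    static String withoutString(String base, String remove) {
        if (remove.isEmpty()) {
            return base; // nothing to remove; also avoids an endless loop below
        }
        int rLen = remove.length();
        StringBuilder out = new StringBuilder(base.length());
        int i = 0;
        while (i < base.length()) {
            // Compares base[i, i + rLen) with remove, ignoring case, without allocating.
            if (base.regionMatches(true, i, remove, 0, rLen)) {
                i += rLen; // skip the matched instance
            } else {
                out.append(base.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + withoutString("Hello there", "llo") + "]");
        System.out.println("[" + withoutString("xxx", "xx") + "]");
    }
}
```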

Taylor's solution is the most efficient one, however I have another solution that is a naive one and it works.
public String withoutString(String base, String remove) {
String returnString = base;
while(returnString.toLowerCase().indexOf(remove.toLowerCase())!=-1){
int start = returnString.toLowerCase().indexOf(remove.toLowerCase());
int end = remove.length();
returnString = returnString.substring(0, start) + returnString.substring(start+end);
}
return returnString;
}

@Daemon
your code works. Thanks for the regex explanation. Although dansalmo pointed out that CodingBat is displaying the intended output incorrectly, I threw in some extra lines to your code to (unnecessarily) account for the double spaces:
public String withoutString(String base, String remove){
String result = base.replaceAll("(?i)" + remove, "");
for(int i = 0; i < result.length()-1;){
if(result.substring(i,i+2).equals("  ")){
result = result.replace(result.substring(i,i+2), " ");
}
else i++;
}
if(result.startsWith(" ")) result = result.substring(1);
return result;
}

public String withoutString(String base, String remove){
return base.replace(remove,"");
}

String.split() Not Acting on Semicolon or Space Delimiters

This may be a simple question, but I have been Googling for over an hour and haven't found an answer yet.
I'm trying to simply use the String.split() method with a small Android application to split an input string. The input string will be something along the lines of: "Launch ip:192.168.1.101;port:5900". I'm doing this in two iterations to ensure that all of the required parameters are there. I'm first trying to do a split on spaces and semicolons to get the individual tokens sorted out. Next, I'm trying to split on colons in order to strip off the identification tags of each piece of information.
So, for example, I would expect the first round of split to give me the following data from the above example string:
(1) Launch
(2) ip:192.168.1.101
(3) port:5900
Then the second round would give me the following:
(1) 192.168.1.101
(2) 5900
However, the following code that I wrote doesn't give me what's expected:
private String[] splitString(String inputString)
{
String[] parsedString;
String[] orderedString = new String[SOSLauncherConstants.SOCKET_INPUT_STRING_PARSE_VALUE];
parsedString = inputString.trim().split("; ");
Log.i("info", "The parsed data is as follows for the initially parsed string of size " + parsedString.length + ": ");
for (int i = 0; i < parsedString.length; ++i)
{
Log.i("info", parsedString[i]);
}
for (int i = 0; i < parsedString.length; ++i )
{
if (parsedString[i].toLowerCase().contains(SOSLauncherConstants.PARSED_LAUNCH_COMMAND_VALUE))
{
orderedString[SOSLauncherConstants.PARSED_COMMAND_WORD] = parsedString[i];
}
if (parsedString[i].toLowerCase().contains("ip"))
{
orderedString[SOSLauncherConstants.PARSED_IP_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("port"))
{
orderedString[SOSLauncherConstants.PARSED_PORT_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("username"))
{
orderedString[SOSLauncherConstants.PARSED_USERNAME_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("password"))
{
orderedString[SOSLauncherConstants.PARSED_PASSWORD_VALUE] = parsedString[i].split(":")[1];
}
else if (parsedString[i].toLowerCase().contains("color"))
{
orderedString[SOSLauncherConstants.PARSED_COLOR_VALUE] = parsedString[i].split(":")[1];
}
}
Log.i("info", "The parsed data is as follows for the second parsed string of size " + orderedString.length + ": ");
for (int i = 0; i < orderedString.length; ++i)
{
Log.i("info", orderedString[i]);
}
return orderedString;
}
For a result, I'm getting the following:
The parsed data is as follows for the parsed string of size 1:
launch ip:192.168.1.106;port:5900
The parsed data is as follows for the second parsed string of size 6:
launch ip:192.168.1.106;port:5900
192.168.1.106;port
And then, of course, it crashes because the for loop runs into a null string.
Side Note:
The following snippet is from the constants class that defines all of the string indexes --
public static final int SOCKET_INPUT_STRING_PARSE_VALUE = 6;
public static final int PARSED_COMMAND_WORD = 0;
public static final String PARSED_LAUNCH_COMMAND_VALUE = "launch";
public static final int PARSED_IP_VALUE = 1;
public static final int PARSED_PORT_VALUE = 2;
public static final int PARSED_USERNAME_VALUE = 3;
public static final int PARSED_PASSWORD_VALUE = 4;
public static final int PARSED_COLOR_VALUE = 5;
I looked into needing a possible escape (by inserting a \\ before the semicolon) on the semicolon delimiter, and even tried using it, but that didn't work. The odd part is that neither the space nor the semicolon function as a delimiter, yet the colon works on the second time around. Does anybody have any ideas what would cause this?
Thanks for your time!
EDIT: I should also add that I'm receiving the string over a WiFi socket connection. I don't think this should make a difference, but I'd like you to have all of the information that you need.

String.split(String) takes a regex. Use "[; ]". eg:
"foo;bar baz".split("[; ]")
will return an array containing "foo", "bar" and "baz".
If you need groups of spaces to work as a single delimiter, you can use something like:
"foo;bar baz".split("(;| +)")

I believe String.split() treats its argument as a single regex that has to match as a whole, not as a set of individual delimiter characters. That is, split(";,") would not split "a;b,c" at all, but would split "a;,b".
You may have better luck with Guava's Splitter, which is meant to be slightly less unpredictable than java.lang.String.split.
I would write something like
Iterable<String> splits = Splitter.on(CharMatcher.anyOf("; ")).split(string);
but Splitter also provides fluent-style customization like "trim results" or "skip over empty strings."
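If you would rather stay with plain String.split(), here is a small runnable sketch of the two-stage parse from the question using a character class. The class and method names (SplitDemo, tokens, valueOf) are made up for the example; the input string is the one from the question.

```java
class SplitDemo {
    // Splits on any run of semicolons or spaces; the argument is a regex character class.
    static String[] tokens(String input) {
        return input.trim().split("[; ]+");
    }

    // Returns the part after the first colon, or the token itself if there is none.
    static String valueOf(String token) {
        int colon = token.indexOf(':');
        return colon == -1 ? token : token.substring(colon + 1);
    }

    public static void main(String[] args) {
        String input = "Launch ip:192.168.1.101;port:5900";
        // "; " only matches a semicolon followed by a space, which never occurs here,
        // so this prints 1: the whole string comes back as a single element.
        System.out.println(input.split("; ").length);
        for (String token : tokens(input)) {
            System.out.println(token + " -> " + valueOf(token));
        }
    }
}
```

That first println is the behaviour reported in the question: the delimiter "; " is one two-character sequence, not two alternative delimiters.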

Is there a reason why you are using String.split(), but not using Regular Expressions? This is a perfect candidate for regex'es, esp if the string format is consistent.
I'm not sure if your format is fixed, and if it is, then the following regex should break it down for you (am sure that someone can come up with an even more elegant regex). If you have several command strings that follow, then you can use a more flexible regex and loop over all the groups:
Pattern p = Pattern.compile("([\\w]*)[ ;](([\\w]*):([^ ;]*))*");
Matcher m = p.matcher( <input string> );
if( m.find() ) {
command = m.group(1);
do{
id = m.group(3);
value = m.group(4);
} while( m.find() );
}
A great place to test out regexes online is http://www.regexplanet.com/simple/index.html. It allows you to play with the regex without having to compile and launch your app every time if you just want to get the regex correct.

Replacing user portion of email address in java

You have
user.nick@domain.com
and the result should be:
******@domain.com
Currently I'm doing it this way:
public static String removeUserFromEmail(String email) {
StringBuffer sbEmail = new StringBuffer(email);
int start = sbEmail.indexOf("@");
sbEmail.delete(0, start);
return "******" + sbEmail.toString();
}
Is there something simpler or more elegant?

I would be inclined to run indexOf on the email string before putting it in the StringBuffer...
int start = email.indexOf( '@' );
if( start == -1 )
{
// handle invalid e-mail
}
else
{
return "*****" + email.substring( start );
}

Nothing wrong with that solution, although I have two suggestions:
1) Use StringBuilder instead of StringBuffer unless you need to synchronize access between multiple threads. There is a performance penalty associated with StringBuffer that for this application is likely unnecessary.
2) One of the benefits of StringBuilder/Buffer is avoiding excessive string concatenations.
Your return line converts the Buffer to a string, and then concatenates. I would probably do this instead:
int start = email.indexOf("@");
if (start < 0) {
return ""; // pick your poison for the error condition
}
StringBuilder sbEmail = new StringBuilder(email);
sbEmail.replace(0, start, "******");
return sbEmail.toString();
FYI - my solution is really just some thoughts on your current use of StringBuffer (which are hopefully helpful). I would recommend Konstantin's solution for this simple string exercise. Simple, readable, and it gives you the opportunity to handle the error condition.

"some.user@domain.com".replaceAll("^[^@]+", "******");

Looks OK. Better check if indexOf returns -1.
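For example, a minimal version with that check could look like the sketch below. It assumes the separator is '@', and returning the input unchanged when there is no '@' is just one possible choice for the error case; the class name MaskEmail and method name maskUser are illustrative.

```java
class MaskEmail {
    // Replaces everything before the '@' with six asterisks.
    static String maskUser(String email) {
        int at = email.indexOf('@');
        if (at == -1) {
            return email; // assumption: leave input without '@' untouched
        }
        return "******" + email.substring(at);
    }

    public static void main(String[] args) {
        System.out.println(maskUser("user.nick@domain.com")); // prints ******@domain.com
    }
}
```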

public static String removeUserFromEmail(String email) {
String[] pieces = email.split("@");
return (pieces.length > 1 ? "******" + pieces[1] : email);
}

You could use a regex, but your solution seems fine to me. Probably faster than the regex too.
