Given a word, I have to replace some specific letters with specific characters, such as 1 for a, 5 for b, etc. I'm using regex for this. I understand that StringBuilder is the best way to deal with this problem since I'm doing a lot of string manipulation. Here is what I'm doing:
String word = "foobooandfoo";
String converted = "";
converted = word.replaceAll("[ao]", "1");
converted = converted.replaceAll("[df]", "2");
converted = converted.replaceAll("[n]", "3");
My problem is how to rewrite this program using StringBuilder. I've tried everything but can't make it work. Or is using String just fine for this?
I think this is a case where clarity and performance happily coincide. I would use a lookup table to do the "translation".
public static void translate(StringBuilder str, char[] table)
{
for (int idx = 0; idx < str.length(); ++idx) {
char ch = str.charAt(idx);
if (ch < table.length) {
ch = table[ch];
str.setCharAt(idx, ch);
}
}
}
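The answer leaves building the table to the reader. Here is a minimal sketch of one way to do it, assuming ASCII input. The 128-entry size and the names `LookupTableTranslate`, `buildTable` and `translateWord` are my own, for illustration. The table starts as an identity mapping so that unmapped characters are written back unchanged. `translate` is repeated so the example compiles on its own.

```java
class LookupTableTranslate {
    // Same in-place translation as above: chars covered by the table are replaced.
    static void translate(StringBuilder str, char[] table) {
        for (int idx = 0; idx < str.length(); ++idx) {
            char ch = str.charAt(idx);
            if (ch < table.length) {
                str.setCharAt(idx, table[ch]);
            }
        }
    }

    // Hypothetical helper: ASCII-sized table where every char maps to itself...
    static char[] buildTable() {
        char[] table = new char[128];
        for (char c = 0; c < table.length; c++) {
            table[c] = c; // identity mapping, so unmapped chars stay as they are
        }
        // ...except the characters from the question.
        table['a'] = '1';
        table['o'] = '1';
        table['d'] = '2';
        table['f'] = '2';
        table['n'] = '3';
        return table;
    }

    static String translateWord(String word) {
        StringBuilder sb = new StringBuilder(word);
        translate(sb, buildTable());
        return sb.toString();
    }
}
```

For example, `LookupTableTranslate.translateWord("foobooandfoo")` gives `"211b11132211"`, the same result as the three replaceAll calls in the question.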
If you have a large alphabet for the str input, or your mappings are sparse, you could use a real map, like this:
public static void translate(StringBuilder str, Map<Character, Character> table)
{
for (int idx = 0; idx < str.length(); ++idx) {
char ch = str.charAt(idx);
Character conversion = table.get(ch);
if (conversion != null)
str.setCharAt(idx, conversion);
}
}
While these implementations work in-place, you could create a new StringBuilder instance (or append to one that's passed in).
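Here is a sketch of the non-in-place variant mentioned above. It assumes that appending to a caller-supplied builder is what you want, and the class and method names are hypothetical. It reads from any CharSequence (String, StringBuilder, ...) and returns the same builder, so calls can be chained.

```java
import java.util.HashMap;
import java.util.Map;

class AppendingTranslate {
    // Leaves the input untouched and appends the translated chars to `out`.
    static StringBuilder translate(CharSequence in, Map<Character, Character> table, StringBuilder out) {
        for (int idx = 0; idx < in.length(); ++idx) {
            char ch = in.charAt(idx);
            Character conversion = table.get(ch);
            out.append(conversion != null ? conversion.charValue() : ch);
        }
        return out;
    }

    // The mappings from the question, as a sparse map.
    static Map<Character, Character> questionTable() {
        Map<Character, Character> table = new HashMap<Character, Character>();
        table.put('a', '1');
        table.put('o', '1');
        table.put('d', '2');
        table.put('f', '2');
        table.put('n', '3');
        return table;
    }
}
```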
I'd actually say that the code is pretty OK in most applications although it's theoretically inferior to other methods. If you don't want to use the Matcher, try it like this:
StringBuilder result = new StringBuilder(word.length());
for (char c : word.toCharArray()) {
switch (c) {
case 'a': case 'o': result.append('1'); break;
case 'd': case 'f': result.append('2'); break;
case 'n': result.append('3'); break;
default: result.append(c); break;
}
}
I don't know if StringBuilder is the tool for you here. I'd consider looking at Matcher, which is part of the java.util.regex package and might be faster than your example above in case you really need the performance.
I don't believe you can. All the regex replace APIs return a String rather than working on a StringBuilder.
If you're basically converting each char into a different char, you could just do something like:
public String convert(String text)
{
char[] chars = new char[text.length()];
for (int i=0; i < text.length(); i++)
{
char c = text.charAt(i);
char converted;
switch (c)
{
case 'a': converted = '1'; break;
case 'o': converted = '1'; break;
case 'd': converted = '2'; break;
case 'f': converted = '2'; break;
case 'n': converted = '3'; break;
default : converted = c; break;
}
chars[i] = converted;
}
return new String(chars);
}
However, if you do any complex regular expressions, that obviously won't help much.
StringBuilder and StringBuffer can have a big performance difference in some programs. See: http://www.thectoblog.com/2011/01/stringbuilder-vs-stringbuffer-vs.html
Which would be a strong reason to want to hold onto it.
The original post asked for multiple characters to be replaced with a single character. This has a resize impact, which in turn could affect performance.
That said, the simplest way to do this is with a String, but if performance is a concern, take care where it is done so as to minimize GC and other effects.
I like P Arrayah's approach, but for a more generic answer it should use a LinkedHashMap or something that preserves order in case the replacements have a dependency.
// instead of
Map<String, String> replaceRules = new HashMap<String, String>();
// use
Map<String, String> replaceRules = new LinkedHashMap<String, String>();
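To see why order can matter, here is a small illustration with two hypothetical rules where the output of one rule is the input of the next ("a" to "b", then "b" to "c"). A LinkedHashMap iterates in insertion order, so the result depends only on the order in which you put the rules. The `OrderedRules` class and its methods are mine, and they use plain literal `String.replace`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OrderedRules {
    // Apply each rule in the map's iteration order (insertion order for LinkedHashMap).
    static String applyInOrder(String text, Map<String, String> rules) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            text = text.replace(rule.getKey(), rule.getValue()); // literal, not regex
        }
        return text;
    }

    // Same two rules, inserted in either order.
    static String run(boolean aFirst) {
        Map<String, String> rules = new LinkedHashMap<String, String>();
        if (aFirst) {
            rules.put("a", "b");
            rules.put("b", "c");
        } else {
            rules.put("b", "c");
            rules.put("a", "b");
        }
        return applyInOrder("ab", rules);
    }
}
```

With "a" first, "ab" becomes "cc". With "b" first, it becomes "bc". A plain HashMap gives no guarantee which of the two you get.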
I had a look at Matcher.replaceAll() and noticed that it returns a String. Therefore, I think that what you've got is going to be plenty fast. Regexes are easy to read and quick.
Remember the first rule of optimization: don't do it!
I understand that StringBuilder is the best way to deal with this problem as I'm doing a lot of string manipulations.
Who told you that? The best way is the one that is clearest to read, whether or not it uses StringBuilder. StringBuilder helps in some circumstances, but in many it does not provide a perceptible speed-up.
You shouldn't initialize "converted" if the value is always replaced.
You can remove some of the boilerplate to improve your code:
String word = "foobooandfoo";
String converted = word.replaceAll("[ao]", "1")
.replaceAll("[df]", "2")
.replaceAll("[n]", "3");
If you want to use StringBuilder, you could use this method
java.util.regex.Pattern#matcher(java.lang.CharSequence)
which accepts a CharSequence (implemented by StringBuilder).
See http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#matcher(java.lang.CharSequence).
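Here is a minimal sketch of that idea that reuses the question's three patterns. The class name and the precompiled constants are my additions. The first matcher reads the StringBuilder directly, without a toString() copy. Matcher.replaceAll still returns a String, though, so the later steps work on Strings.

```java
import java.util.regex.Pattern;

class MatcherOnBuilder {
    // Compile each pattern once instead of on every call, as String.replaceAll would.
    private static final Pattern AO = Pattern.compile("[ao]");
    private static final Pattern DF = Pattern.compile("[df]");
    private static final Pattern N = Pattern.compile("n");

    static String convert(StringBuilder word) {
        // Pattern.matcher accepts any CharSequence, so the builder is read in place.
        String converted = AO.matcher(word).replaceAll("1");
        converted = DF.matcher(converted).replaceAll("2");
        return N.matcher(converted).replaceAll("3");
    }
}
```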
StringBuilder vs. regex is a false dichotomy. The reason String#replaceAll() is the wrong tool is that each time you call it, you're compiling the regex and processing the whole string. You can avoid all that excess work by combining all the regexes into one and using the lower-level methods in Matcher instead of replaceAll(), like so:
String text = "foobooandfoo";
Pattern p = Pattern.compile("([ao])|([df])|n");
Matcher m = p.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find())
{
m.appendReplacement(sb, "");
sb.append(m.start(1) != -1 ? '1' :
m.start(2) != -1 ? '2' :
'3');
}
m.appendTail(sb);
System.out.println(sb.toString());
Of course, this is still overkill; for a job as simple as this one, I recommend erickson's approach.
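On Java 9 and later, Matcher also has a replaceAll overload that takes a Function<MatchResult, String>. This lets you use the same combined regex without the appendReplacement/appendTail loop. The sketch below assumes Java 9+, and the class name is hypothetical. The returned strings are still treated as replacement strings, so avoid $ and \ in them, or pass them through Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedRegex {
    // One combined pattern, compiled once; group 1 = [ao], group 2 = [df], otherwise n.
    private static final Pattern P = Pattern.compile("([ao])|([df])|n");

    static String convert(String text) {
        Matcher m = P.matcher(text);
        // Pick the replacement according to which group took part in the match.
        return m.replaceAll(mr -> mr.group(1) != null ? "1"
                : mr.group(2) != null ? "2"
                : "3");
    }
}
```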
I would NOT recommend using regex for this; regexes are actually painfully slow when you're doing simple operations. Instead, I'd recommend you start with something like this:
// usage:
Map<String, String> replaceRules = new HashMap<String, String>();
replaceRules.put("ao", "1");
replaceRules.put("df", "2");
replaceRules.put("n", "3");
String s = replacePartsOf("foobooandfoo", replaceRules);
// actual method: each key is a set of characters, and every one of them is replaced by the value
public String replacePartsOf(String thisString, Map<String, String> withThese) {
for (Map.Entry<String, String> rule : withThese.entrySet()) {
for (char c : rule.getKey().toCharArray()) {
// String.replace(CharSequence, CharSequence) is a literal replacement, no regex involved
thisString = thisString.replace(String.valueOf(c), rule.getValue());
}
}
return thisString;
}
And after you've got that working, refactor it to use character arrays instead. While I think what you want to do can be done with StringBuilder, it most likely won't be worth the effort.
Related
I have a large String in which I need to replace a few string patterns.
I have written the code below and I am getting very large memory usage in the deployed environment.
I have searched online for a more optimal way to rewrite this but could not find a conclusive answer.
Can anyone give any suggestions for the code below?
String response = "#very #large & string % STUFF";
System.out.println(response);
String[] SPECIAL_CHARACTERS = {"&","%","#","STUFF"};
for(int count = 0;count<SPECIAL_CHARACTERS.length;count++)
{
if(response.contains(SPECIAL_CHARACTERS[count]))
{
response = response.replace(SPECIAL_CHARACTERS[count],"");
}
}
System.out.println(response);
I have a large String in which I need to replace a few characters.
I would avoid getting into that situation. Stream your data! (And how come you have a "very large" string stored in the database? That doesn't sound good.)
If you can't, this is the most memory efficient way to do what you are doing:
int len = response.length();
for (int i = 0; i < len; i++) {
char ch = response.charAt(i);
switch (ch) {
case '&': case '%': case '#': // the multi-char pattern "STUFF" can't be matched one char at a time
break;
default:
System.out.print(ch);
}
}
System.out.println();
In some circumstances, it may be better to use a StringBuilder so that you can do a single write operation.
int len = response.length();
StringBuilder temp = new StringBuilder(len);
for (int i = 0; i < len; i++) {
char ch = response.charAt(i);
switch (ch) {
case '&': case '%': case '#': // the multi-char pattern "STUFF" can't be matched one char at a time
break;
default:
temp.append(ch);
}
}
System.out.println(temp.toString());
but that allocates more memory.
I would solve this kind of problem by reading packets from the input and appending them to the output. So, assuming that the very large String is called input, I would repeatedly read an easily manageable packet from input into temp, do the replacements on temp, and then append temp to output, which should be initialized as an empty String before your loop.
The problem you encounter is that you are always replacing your whole String. So, instead of replacing response each time, do all the replacements on smaller packets and add the result to a String variable.
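Here is a minimal sketch of the packet idea using a Reader and a Writer. In a real deployment these would wrap a file, socket or database stream, and the packet size is an arbitrary choice. The class name is hypothetical. It only handles the single-character patterns. A multi-character pattern such as "STUFF" could straddle two packets and would need extra boundary handling.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class ChunkedFilter {
    // Copies `in` to `out` one packet at a time, dropping '&', '%' and '#'.
    // Memory use is bounded by packetSize, not by the size of the whole input.
    static void filter(Reader in, Writer out, int packetSize) throws IOException {
        char[] buf = new char[packetSize];
        char[] kept = new char[packetSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            int k = 0;
            for (int i = 0; i < n; i++) {
                char ch = buf[i];
                if (ch != '&' && ch != '%' && ch != '#') {
                    kept[k++] = ch;
                }
            }
            out.write(kept, 0, k);
        }
    }

    // Convenience wrapper for in-memory strings.
    static String filterString(String s, int packetSize) {
        StringWriter out = new StringWriter();
        try {
            filter(new StringReader(s), out, packetSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with in-memory reader/writer
        }
        return out.toString();
    }
}
```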
I'm currently trying to loop through a String and identify a specific character within that string, then add a specific character directly after each occurrence of the originally identified character.
For example using the string: aaaabbbcbbcbb
And the character I want to identify being: c
So every time a c is detected a following c will be added to the string and the loop will continue.
Thus aaaabbbcbbcbb will become aaaabbbccbbccbb.
I've been trying to make use of indexOf(), substring() and charAt(), but I'm currently either overwriting other characters with a c or only detecting one c.
I know you've asked for a loop, but won't something as simple as a replace suffice?
String inputString = "aaaabbbcbbcbb";
String charToDouble = "c";
String result = inputString.replace(charToDouble, charToDouble+charToDouble);
// or `charToDouble+charToDouble` could be `charToDouble.repeat(2)` in JDK 11+
If you insist on using a loop however:
String inputString = "aaaabbbcbbcbb";
char charToDouble = 'c';
String result = "";
for(char c : inputString.toCharArray()){
result += c;
if(c == charToDouble){
result += c;
}
}
Iterate over all the characters. Add each one to a StringBuilder. If it matches the character you're looking for then add it again.
final String test = "aaaabbbcbbcbb";
final char searchChar = 'c';
final StringBuilder builder = new StringBuilder();
for (final char c : test.toCharArray())
{
builder.append(c);
if (c == searchChar)
{
builder.append(c);
}
}
System.out.println(builder.toString());
Output
aaaabbbccbbccbb
You are probably trying to modify a String in Java. Strings in Java are immutable and cannot be changed the way one might in C++.
You can use StringBuilder to insert characters, e.g.:
StringBuilder builder = new StringBuilder("acb");
builder.insert(1, 'c');
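Since the question mentions indexOf(), here is a sketch of that approach done on a StringBuilder instead of a String. The helper name is hypothetical. The important detail is resuming the search two positions after each match. Otherwise the search finds the freshly inserted c again and loops forever. Each insert shifts the rest of the buffer, so for long inputs this is slower than the append-based loop.

```java
class DoubleChar {
    // Insert a copy of `target` right after every occurrence of it.
    static String doubleChar(String input, char target) {
        StringBuilder builder = new StringBuilder(input);
        String t = String.valueOf(target);
        int idx = builder.indexOf(t);
        while (idx >= 0) {
            builder.insert(idx + 1, target);
            // Skip both the original char and the inserted copy before searching again.
            idx = builder.indexOf(t, idx + 2);
        }
        return builder.toString();
    }
}
```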
The previous answer suggesting String.replace is the best solution, but if you need to do it some other way (e.g. for an exercise), then here's a 'modern' solution:
// requires: import java.util.stream.IntStream;
public static void main(String[] args) {
final String inputString = "aaaabbbcbbcbb";
final int charToDouble = 'c'; // A Unicode codepoint
final String result = inputString.codePoints()
.flatMap(c -> c == charToDouble ? IntStream.of(c, c) : IntStream.of(c))
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
assert result.equals("aaaabbbccbbccbb");
}
This looks at each character in turn (in an IntStream). It doubles the character if it matches the target. It then accumulates each character in a StringBuilder.
A micro-optimization can be made to pre-allocate the StringBuilder's capacity. We know the maximum possible size of the new string is double the old string, so StringBuilder::new can be replaced by () -> new StringBuilder(inputString.length()*2). However, I'm not sure if it's worth the sacrifice in readability.
Basically, given an int, I need to generate a String of that length containing only the specified character. There's a related question here, but it relates to C# and it does matter what's in the String.
This question and my answer to it are why I am asking this one. I'm not sure what the best way to go about it is, performance-wise.
Example
Method signature:
String getPattern(int length, char character);
Usage:
//returns "zzzzzz"
getPattern(6, 'z');
What I've tried
String getPattern(int length, char character) {
String result = "";
for (int i = 0; i < length; i++) {
result += character;
}
return result;
}
Is this the best that I can do performance-wise?
You should use StringBuilder instead of concatenating chars this way. Use StringBuilder.append().
StringBuilder will give you better performance. The problem with concatenating the way you are doing it is that each time, a new String is created (String is immutable): the old string is copied, the new character is appended, and the old String is thrown away. It's a lot of extra work that, over time (like in a big for loop), will cause performance degradation.
StringUtils from commons-lang or Strings from Guava are your friends. As already stated, avoid String concatenation.
StringUtils.repeat("a", 3) // => "aaa"
Strings.repeat("hey", 3) // => "heyheyhey"
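If you can use Java 11 or later, the JDK has String.repeat(int) built in, so no third-party library is needed. Here is a sketch with the question's signature. The class name is mine, and it assumes JDK 11+.

```java
class RepeatPattern {
    // Java 11+: repeat the one-char string `length` times (throws if length < 0).
    static String getPattern(int length, char character) {
        return String.valueOf(character).repeat(length);
    }
}
```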
Use primitive char arrays and some standard utility classes like Arrays:
import java.util.Arrays;
public class Test {
static String getPattern(int length, char character) {
char[] cArray = new char[length];
Arrays.fill(cArray, character);
// return Arrays.toString(cArray);
return new String(cArray);
}
static String buildPattern(int length, char character) {
StringBuilder sb= new StringBuilder(length);
for (int i = 0; i < length; i++) {
sb.append(character);
}
return sb.toString();
}
public static void main(String args[]){
long time = System.currentTimeMillis();
getPattern(10000000,'c');
time = System.currentTimeMillis() - time;
System.out.println(time); //prints 93
time = System.currentTimeMillis();
buildPattern(10000000,'c');
time = System.currentTimeMillis() - time;
System.out.println(time); //prints 188
}
}
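EDIT: Arrays.toString() gave lower performance since it eventually uses a StringBuilder (and it returns a bracketed, comma-separated listing like "[c, c, c]" rather than the plain characters), but new String(cArray) did the magic.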
Yikes, no.
A String is immutable in Java; you can't change it. When you say:
result += character;
You're creating a new String every time.
You want to use a StringBuilder and append to it, then return a String with its toString() method.
I think it would be more efficient to do it like the following:
String getPattern(int length, char character)
{
char[] list = new char[length];
for(int i =0;i<length;i++)
{
list[i] = character;
}
return new String(list);
}
Concatenating Strings is never the most efficient approach, since String is immutable; for better performance you should use StringBuilder and append():
String getPattern(int length, char character) {
StringBuilder sb = new StringBuilder(length);
for (int i = 0; i < length; i++) {
sb.append(character);
}
return sb.toString();
}
Performance-wise, I think you'd get better results by creating a small String and concatenating it (using a StringBuilder, of course) until you reach the requested size: appending "zzz" to "zzz" probably performs better than appending 'z' three times (well, maybe not for such small numbers, but once you reach 100 or so chars, doing ten concatenations of 'z' followed by ten concatenations of "zzzzzzzzzz" is probably better than 100 concatenations of 'z').
Also, because you ask about GWT, results will vary a lot between DevMode (pure Java) and "production mode" (running as JS in the browser), and are likely to vary depending on the browser.
The only way to really know is to benchmark, everything else is pure speculation.
And possibly use deferred binding to use the most performing variant in each browser (that's exactly how StringBuilder is emulated in GWT).
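Here is a sketch of the "append bigger chunks" idea. It uses a doubling variant, which is my own adaptation and not exactly what the answer describes. The builder's current contents are appended to it again until the next doubling would overshoot, and then the builder is topped up with the missing characters. This takes only about log2(length) bulk appends. Only a benchmark can tell whether it actually beats a simple loop or Arrays.fill. The class name is hypothetical.

```java
class DoublingPattern {
    // Builds `length` copies of `character` by repeatedly doubling the contents.
    static String getPattern(int length, char character) {
        if (length <= 0) {
            return "";
        }
        StringBuilder sb = new StringBuilder(length);
        sb.append(character);
        // Written as len <= length - len to avoid int overflow in len * 2.
        while (sb.length() <= length - sb.length()) {
            sb.append(sb.toString()); // copy current contents once, doubling the length
        }
        // Top up with the characters still missing (fewer than sb.length()).
        sb.append(sb.substring(0, length - sb.length()));
        return sb.toString();
    }
}
```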
I get a set of chars, e.g. as a String containing all of them, and need a charclass Pattern matching any of them. For example:
for "abcde" I want "[a-e]"
for "[]^-" I want "[-^\\[\\]]"
How can I create a compact solution and how to handle border cases like empty set and set of all chars?
What chars need to be escaped?
Clarification
I want to create a charclass Pattern, i.e. something like "[...]", no repetitions and no such stuff. It must work for any input, that's why I'm interested in the corner cases, too.
Here's a start:
import java.util.*;
public class RegexUtils {
private static String encode(char c) {
switch (c) {
case '[':
case ']':
case '\\':
case '-':
case '^':
return "\\" + c;
default:
return String.valueOf(c);
}
}
public static String createCharClass(char[] chars) {
if (chars.length == 0) {
return "[^\\u0000-\\uFFFF]";
}
StringBuilder builder = new StringBuilder();
boolean includeCaret = false;
boolean includeMinus = false;
List<Character> set = new ArrayList<Character>(new TreeSet<Character>(toCharList(chars)));
if (set.size() == 1<<16) {
return "[\\w\\W]";
}
for (int i = 0; i < set.size(); i++) {
int rangeLength = discoverRange(i, set);
if (rangeLength > 2) {
builder.append(encode(set.get(i))).append('-').append(encode(set.get(i + rangeLength)));
i += rangeLength;
} else {
switch (set.get(i)) {
case '[':
case ']':
case '\\':
builder.append('\\').append(set.get(i));
break;
case '-':
includeMinus = true;
break;
case '^':
includeCaret = true;
break;
default:
builder.append(set.get(i));
break;
}
}
}
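if (includeCaret) {
// "^" is only special as the first char of a class, so escape it if nothing will precede it
builder.append(builder.length() == 0 && !includeMinus ? "\\^" : "^");
}
builder.insert(0, includeMinus ? "-" : "");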
return "[" + builder + "]";
}
private static List<Character> toCharList(char[] chars) {
List<Character> list = new ArrayList<Character>();
for (char c : chars) {
list.add(c);
}
return list;
}
private static int discoverRange(int index, List<Character> chars) {
int range = 0;
for (int i = index + 1; i < chars.size(); i++) {
if (chars.get(i) - chars.get(i - 1) != 1) break;
range++;
}
return range;
}
public static void main(String[] args) {
System.out.println(createCharClass("daecb".toCharArray()));
System.out.println(createCharClass("[]^-".toCharArray()));
System.out.println(createCharClass("".toCharArray()));
System.out.println(createCharClass("d1a3e5c55543b2000".toCharArray()));
System.out.println(createCharClass("!-./0".toCharArray()));
}
}
As you can see, the input:
"daecb".toCharArray()
"[]^-".toCharArray()
"".toCharArray()
"d1a3e5c55543b2000".toCharArray()
"!-./0".toCharArray()
prints:
[a-e]
[-\[\]^]
[^\u0000-\uFFFF]
[0-5a-e]
[!\--0]
The corner cases in a character class are:
\
[
]
which will need a \ to be escaped. The character ^ doesn't need an escape if it's not placed at the start of a character class, and the - does not need to be escaped when it's placed at the start or end of the character class (hence the boolean flags in my code).
The empty set is [^\u0000-\uFFFF], and the set of all the characters is [\u0000-\uFFFF]. Not sure what you need the former for as it won't match anything. I'd throw an IllegalArgumentException() on an empty string instead.
What chars need to be escaped?
-, ^, \, [ and ]: that's all of them; I've actually tested it. And unlike some other regex implementations, [ is considered a metacharacter inside a character class, possibly because inner character classes can be combined with operators.
The rest of the task sounds easy, but rather tedious. First you need to select unique characters. Then loop through them, appending to a StringBuilder and escaping where needed. If you want character ranges, you need to sort the characters first and select contiguous ranges while looping. If you want the - to be at the beginning of the class with no escaping, then set a flag, but don't append it. After the loop, if the flag is set, prepend - to the result before wrapping it in [].
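These steps can be sketched roughly as follows. This version skips range detection and throws IllegalArgumentException on an empty set, as suggested in another answer. It escapes \, [, ] and ^, and puts - unescaped at the start. The class and method names are hypothetical.

```java
import java.util.TreeSet;

class CharClassBuilder {
    // Builds "[...]" matching exactly the given characters (no ranges).
    static String charClass(String chars) {
        if (chars.isEmpty()) {
            throw new IllegalArgumentException("empty character set");
        }
        TreeSet<Character> unique = new TreeSet<Character>(); // unique and sorted
        for (char c : chars.toCharArray()) {
            unique.add(c);
        }
        StringBuilder sb = new StringBuilder();
        boolean hasMinus = false;
        for (char c : unique) {
            if (c == '-') {
                hasMinus = true; // prepended after the loop, where it needs no escaping
                continue;
            }
            if (c == '\\' || c == '[' || c == ']' || c == '^') {
                sb.append('\\');
            }
            sb.append(c);
        }
        if (hasMinus) {
            sb.insert(0, '-');
        }
        return "[" + sb + "]";
    }
}
```

For "[]^-" this produces [-\[\]\^], which matches each of the four characters and nothing else.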
Match all characters: ".*" (zero or more repetitions, *, of "any character", .).
Match a blank line: "^$" (match the start of a line, ^, and the end of a line, $; note the lack of anything to match in the middle of the line).
Not sure if the last pattern is exactly what you wanted, as there are different interpretations of "match nothing".
A quick, dirty, and almost-not-pseudo-code answer:
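// needs java.util.* and java.util.regex.Pattern
StringBuilder sb = new StringBuilder("[");
Set<Character> metaChars = new HashSet<Character>(Arrays.asList('\\', '[', ']', '^', '-'));
while (sourceString.length() != 0) {
char c = sourceString.charAt(0);
sb.append(metaChars.contains(c) ? "\\" + c : String.valueOf(c));
// Strings are immutable: keep the result, which drops every occurrence of c
sourceString = sourceString.replace(String.valueOf(c), "");
}
sb.append("]");
//...check the appropriate sb.length cases before compiling
// e.g., 2 = empty ("[]" does not compile), all chars equals the count of whatever set qualifies as all chars, etc
Pattern p = Pattern.compile(sb.toString());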
This gives you the unique string of chars you want to match, with metacharacters escaped. It will not convert things into ranges (which I think is fine; doing so smells like premature optimization to me). You can do some post-tests for simple set cases, like matching sb against digits, non-digits, etc., but unless you know that's going to buy you a lot of performance (or the simplification is the point of this program), I wouldn't bother.
If you really want to do ranges, you could instead call sourceString.toCharArray(), sort the result, and iterate over it, skipping repetitions, doing some sort of range check, and escaping metacharacters as you add the contents to the StringBuilder.
EDIT: I actually kind of liked the toCharArray version, so pseudo-coded it out as well:
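//...check for empty here, if not...
// escape(c) is a hypothetical helper returning c, backslash-escaped if it is one of the metaChars
char[] sourceC = sourceString.toCharArray();
Arrays.sort(sourceC);
char rangeStart = sourceC[0];
char lastC = sourceC[0];
StringBuilder sb = new StringBuilder("[");
for (int i = 1; i <= sourceC.length; i++) {
if (i < sourceC.length && lastC == sourceC[i]) continue; // skip repetitions
if (i < sourceC.length && sourceC[i] == lastC + 1) { // next char in sequence: extend the range
lastC = sourceC[i];
continue;
}
// range ended: append it as a single item, a pair, or a start-end range
sb.append(escape(rangeStart));
if (lastC - rangeStart >= 2) sb.append('-');
if (lastC != rangeStart) sb.append(escape(lastC));
if (i < sourceC.length) rangeStart = lastC = sourceC[i];
}
sb.append("]");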
I have a string which contains alphanumeric and special characters.
I need to replace each and every special char with some string.
For example:
Input string = "ja*va st&ri%n#&"
Expected output = "jaasteriskvaspacestandripercentagenatand"
* = "asterisk"
& = "and"
% = "percentage"
# = "at"
thanks,
Unless you're absolutely desperate for performance, I'd use a very simple approach:
String result = input.replace("*", "asterisk")
.replace("%", "percentage")
.replace("#", "at"); // Add more to taste :)
(Note that there's a big difference between replace and replaceAll - the latter takes a regular expression. It's easy to get the wrong one and see radically different effects!)
An alternative would be something like:
public static String replaceSpecial(String input)
{
// Output will be at least as long as input
StringBuilder builder = new StringBuilder(input.length());
for (int i = 0; i < input.length(); i++)
{
char c = input.charAt(i);
switch (c)
{
case '*': builder.append("asterisk"); break;
case '%': builder.append("percentage"); break;
case '#': builder.append("at"); break;
default: builder.append(c); break;
}
}
return builder.toString();
}
Take a look at the following java.lang.String methods:
replace()
replaceAll()
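If you want all of the question's replacements in a single pass without growing the switch, a lookup map also works. The sketch below assumes a Map<Character, String> is acceptable, and the class name is hypothetical. The ' ' to "space" entry is inferred from the expected output; it is not listed in the question.

```java
import java.util.HashMap;
import java.util.Map;

class SpecialCharReplacer {
    // Mapping taken from the question (' ' -> "space" inferred from the expected output).
    private static final Map<Character, String> NAMES = new HashMap<Character, String>();
    static {
        NAMES.put('*', "asterisk");
        NAMES.put('&', "and");
        NAMES.put('%', "percentage");
        NAMES.put('#', "at");
        NAMES.put(' ', "space");
    }

    static String replaceSpecial(String input) {
        StringBuilder builder = new StringBuilder(input.length()); // output is at least this long
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String name = NAMES.get(c);
            builder.append(name != null ? name : String.valueOf(c)); // unmapped chars pass through
        }
        return builder.toString();
    }
}
```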