The fact that the replace method returns a string object rather than replacing the contents of a given string is a little obtuse (but understandable when you know that strings are immutable in Java). I am taking a major performance hit by using a deeply nested replace in some code. Is there something I can replace it with that would make it faster?
This is what StringBuilder is meant for. If you're going to be doing a lot of manipulation, do it on a StringBuilder, then turn that into a String whenever you need to.
StringBuilder is described thus:
"A mutable sequence of characters. This class provides an API compatible with StringBuffer, but with no guarantee of synchronization".
It has replace (and append, insert, delete, etc.), and you can use toString to turn it into a real String.
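Here is a minimal sketch of that workflow. The class name and sample text are made up for illustration. All edits happen in one mutable buffer, and only the final toString() creates a String.

```java
public class BuilderReplaceDemo {
    public static String edit() {
        StringBuilder sb = new StringBuilder("Hello World");
        // replace(start, end, str) swaps the chars in [start, end) in place
        sb.replace(6, 11, "Java");   // "Hello Java"
        sb.insert(0, ">> ");         // ">> Hello Java"
        sb.append('!');              // ">> Hello Java!"
        sb.delete(0, 3);             // "Hello Java!"
        return sb.toString();        // a single String is created at the end
    }

    public static void main(String[] args) {
        System.out.println(edit());  // prints: Hello Java!
    }
}
```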
The previous posts are right, StringBuilder/StringBuffer are a solution.
But you also have to question whether it is a good idea to do the replace on big Strings in memory.
I often have String manipulations that are implemented as a stream: instead of replacing in the String and then sending it to an OutputStream, I do the replace at the moment I send the String to the OutputStream. That works much faster than any replace.
This works especially well if you want the replace to implement a template mechanism. Streaming is usually faster because you consume less memory, and if the client is slow, you only need to generate output at a slow pace, so it scales much better.
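Here is a minimal sketch of the idea. It assumes the destination is a java.io.Writer, and the method name and the ${name} placeholder are illustrative only. Each segment is written as soon as it is found, so the replaced text never exists as one big String. The input is still an in-memory String here; a fully streaming version would also read its input incrementally.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class StreamingReplace {
    // Writes str to out with every occurrence of target swapped for replacement,
    // without building the replaced String in memory first.
    public static void writeReplaced(Writer out, String str, String target, String replacement)
            throws IOException {
        if (target.isEmpty()) {
            out.write(str);
            return;
        }
        int from = 0;
        int idx;
        while ((idx = str.indexOf(target, from)) >= 0) {
            out.write(str, from, idx - from);  // text before the match
            out.write(replacement);            // replacement goes straight to the stream
            from = idx + target.length();
        }
        out.write(str, from, str.length() - from);  // remaining tail
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();  // stands in for e.g. a response Writer
        writeReplaced(sw, "Hello ${name}, bye ${name}", "${name}", "Bob");
        System.out.println(sw);  // prints: Hello Bob, bye Bob
    }
}
```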
The following code is approx. 30 times faster than String.replace if there is no match and 5 times faster if there is a match.
static String fastReplace( String str, String target, String replacement ) {
int targetLength = target.length();
if( targetLength == 0 ) {
return str;
}
int idx2 = str.indexOf( target );
if( idx2 < 0 ) {
return str;
}
StringBuilder buffer = new StringBuilder( targetLength > replacement.length() ? str.length() : str.length() * 2 );
int idx1 = 0;
do {
buffer.append( str, idx1, idx2 );
buffer.append( replacement );
idx1 = idx2 + targetLength;
idx2 = str.indexOf( target, idx1 );
} while( idx2 > 0 );
buffer.append( str, idx1, str.length() );
return buffer.toString();
}
Adding to @paxdiablo's answer, here's a sample implementation of replaceAll using StringBuilder that is reportedly ~3.7 times faster than String.replaceAll():
Code:
public static String replaceAll(final String str, final String searchChars, String replaceChars)
{
    if ("".equals(str) || "".equals(searchChars) || searchChars.equals(replaceChars))
    {
        return str;
    }
    if (replaceChars == null)
    {
        replaceChars = "";
    }
    final int searchCharsLength = searchChars.length();
    StringBuilder buf = new StringBuilder(str);
    boolean modified = false;
    // Resume each search just past the inserted text, so no match is skipped when
    // the buffer shrinks and the replacement itself is never searched again.
    int from = 0;
    int start;
    while ((start = buf.indexOf(searchChars, from)) != -1)
    {
        buf.replace(start, start + searchCharsLength, replaceChars);
        from = start + replaceChars.length();
        modified = true;
    }
    if (!modified)
    {
        return str;
    }
    else
    {
        return buf.toString();
    }
}
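Test case. The reported output was Delta1 = 1917009502; Delta2 = 7241000026. Note that these figures were produced with delta2 computed as System.nanoTime() - start1, so Delta2 also includes the time of the first loop; from these numbers the actual ratio is roughly (7241000026 - 1917009502) / 1917009502 ≈ 2.8 rather than 3.7.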
@Test
public void testReplaceAll()
{
String origStr = "1234567890-1234567890-";
String replacement1 = StringReplacer.replaceAll(origStr, "0", "a");
String expectedRep1 = "123456789a-123456789a-";
String replacement2 = StringReplacer.replaceAll(origStr, "0", "ab");
String expectedRep2 = "123456789ab-123456789ab-";
String replacement3 = StringReplacer.replaceAll(origStr, "0", "");
String expectedRep3 = "123456789-123456789-";
String replacement4 = StringReplacer.replaceAll(origStr, "012", "a");
String expectedRep4 = "1234567890-1234567890-";
String replacement5 = StringReplacer.replaceAll(origStr, "123", "ab");
String expectedRep5 = "ab4567890-ab4567890-";
String replacement6 = StringReplacer.replaceAll(origStr, "123", "abc");
String expectedRep6 = "abc4567890-abc4567890-";
String replacement7 = StringReplacer.replaceAll(origStr, "123", "abcdd");
String expectedRep7 = "abcdd4567890-abcdd4567890-";
String replacement8 = StringReplacer.replaceAll(origStr, "123", "");
String expectedRep8 = "4567890-4567890-";
String replacement9 = StringReplacer.replaceAll(origStr, "123", "");
String expectedRep9 = "4567890-4567890-";
assertEquals(expectedRep1, replacement1);
assertEquals(expectedRep2, replacement2);
assertEquals(expectedRep3, replacement3);
assertEquals(expectedRep4, replacement4);
assertEquals(expectedRep5, replacement5);
assertEquals(expectedRep6, replacement6);
assertEquals(expectedRep7, replacement7);
assertEquals(expectedRep8, replacement8);
assertEquals(expectedRep9, replacement9);
long start1 = System.nanoTime();
for (long i = 0; i < 10000000L; i++)
{
String rep = StringReplacer.replaceAll(origStr, "123", "abcdd");
}
long delta1 = System.nanoTime() -start1;
long start2= System.nanoTime();
for (long i = 0; i < 10000000L; i++)
{
String rep = origStr.replaceAll( "123", "abcdd");
}
long delta2 = System.nanoTime() - start2;
assertTrue(delta1 < delta2);
System.out.printf("Delta1 = %d; Delta2 =%d", delta1, delta2);
}
If you have a number of strings to replace (such as XML escape sequences), especially where the replacements differ in length from the pattern, an FSM (lexer-type) algorithm seems like it might be the most efficient. This is similar to the suggestion of processing in a stream fashion, where the output is built incrementally.
Perhaps a Matcher object could be used to do that efficiently.
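Here is a minimal sketch of that Matcher idea, using XML escaping as the example table. The map, class and method names are assumptions, not taken from the answer. One precompiled character-class pattern finds every key in a single left-to-right pass, and appendReplacement/appendTail build the output incrementally. It uses StringBuffer because the StringBuilder overloads of appendReplacement only exist from Java 9.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiReplace {
    // Example replacement table (XML escapes).
    private static final Map<String, String> ESCAPES = new HashMap<>();
    static {
        ESCAPES.put("&", "&amp;");
        ESCAPES.put("<", "&lt;");
        ESCAPES.put(">", "&gt;");
        ESCAPES.put("\"", "&quot;");
        ESCAPES.put("'", "&apos;");
    }
    // One precompiled pattern that matches any key in a single scan.
    private static final Pattern KEYS = Pattern.compile("[&<>\"']");

    public static String escapeXml(String input) {
        Matcher m = KEYS.matcher(input);
        StringBuffer out = new StringBuffer(input.length() + 16);
        while (m.find()) {
            // quoteReplacement keeps '$' or '\' in a value from being read as group syntax
            m.appendReplacement(out, Matcher.quoteReplacement(ESCAPES.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: a &lt; b &amp;&amp; c &gt; &quot;d&quot;
        System.out.println(escapeXml("a < b && c > \"d\""));
    }
}
```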
Just get the char[] of the String and iterate through it. Use a temporary StringBuilder.
While iterating, look for the pattern you want to replace. If you don't find it, write the characters you scanned to the StringBuilder; otherwise, write the replacement text to the StringBuilder.
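Here is a minimal sketch of that scan, assuming a plain literal target. The class and helper names are hypothetical. It uses a naive character-by-character comparison at each position, which is simple but not as fast as indexOf on long targets.

```java
public class CharScanReplace {
    // Single pass over the char[]: copy unmatched chars, emit the replacement on a match.
    public static String replace(String str, String target, String replacement) {
        if (target.isEmpty()) return str;
        char[] chars = str.toCharArray();
        char[] t = target.toCharArray();
        StringBuilder out = new StringBuilder(chars.length);
        int i = 0;
        while (i < chars.length) {
            if (matchesAt(chars, i, t)) {
                out.append(replacement);
                i += t.length;          // skip past the matched target
            } else {
                out.append(chars[i]);
                i++;
            }
        }
        return out.toString();
    }

    // True if t occurs in chars starting at pos.
    private static boolean matchesAt(char[] chars, int pos, char[] t) {
        if (pos + t.length > chars.length) return false;
        for (int j = 0; j < t.length; j++) {
            if (chars[pos + j] != t[j]) return false;
        }
        return true;
    }
}
```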
String manipulation that creates many intermediate String objects can be very slow. Consider using StringBuffer (or its unsynchronized counterpart, StringBuilder): it's not exactly like the String class, but it has a lot in common, and it's mutable.
When you're replacing single characters, consider iterating over the character array and replacing characters using a (pre-created) HashMap<Character, Character>.
I use this strategy to convert the digits of an integer exponent string into Unicode superscript characters.
It's about twice as fast as String.replace(char, char). Note that the time associated with creating the hash map isn't included in this comparison.
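Here is a sketch of the superscript case. The class and method names are hypothetical, and mapping '-' to the superscript minus (U+207B) is an added assumption. The map is built once in a static initializer, and characters without a mapping are left unchanged.

```java
import java.util.HashMap;
import java.util.Map;

public class Superscript {
    // Pre-created once: plain digit (or minus) -> Unicode superscript form.
    private static final Map<Character, Character> SUPERSCRIPTS = new HashMap<>();
    static {
        String plain = "0123456789-";
        String supers = "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079\u207B";
        for (int i = 0; i < plain.length(); i++) {
            SUPERSCRIPTS.put(plain.charAt(i), supers.charAt(i));
        }
    }

    public static String toSuperscript(String exponent) {
        char[] chars = exponent.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            Character mapped = SUPERSCRIPTS.get(chars[i]);
            if (mapped != null) chars[i] = mapped;  // unmapped chars stay as they are
        }
        return new String(chars);
    }
}
```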
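Because in Java 8 and earlier String.replace(CharSequence target, CharSequence replacement) calls Pattern.compile, matcher and replaceAll internally, you can optimize it slightly by using a precompiled target pattern constant, like this (since Java 9, String.replace no longer uses regular expressions internally, so measure before switching on newer JDKs):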
private static final Pattern COMMA_REGEX = Pattern.compile(",");
...
COMMA_REGEX.matcher(value).replaceAll(replacement);
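Here is a self-contained version of the same idea. The class and method names are hypothetical. Pattern.LITERAL and Matcher.quoteReplacement mirror how the Java 8 String.replace treated its arguments literally, so characters such as $ in the replacement are not interpreted as group references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommaReplace {
    // Compiled once and reused for every call.
    private static final Pattern COMMA_REGEX = Pattern.compile(",", Pattern.LITERAL);

    public static String replaceCommas(String value, String replacement) {
        return COMMA_REGEX.matcher(value).replaceAll(Matcher.quoteReplacement(replacement));
    }
}
```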
Apache Commons Lang also has a replace method, based on StringBuilder:
StringUtils.replace(String text, String searchString, String replacement)
Related
I want to match credit card numbers and mask them in 6*4 format, so that only the first 6 and last 4 characters are visible and the characters in between are '*'. I tried to do it with a MASK like:
private static final String MASK = "$1***$3";
matcher.replaceAll(MASK);
But I could not find a way to get back the same number of stars as there are characters in group $2.
Then I implemented the code below, and it works.
But is there a shorter or easier way to do this? Does anyone know?
private static final String HIDING_MASK = "**********";
private static final String REGEX = "\\b([0-9]{6})([0-9]{3,9})([0-9]{4})\\b";
private static final int groupToReplace = 2;
private String formatMessage(String message) throws NotMatchedException {
Matcher m = Pattern.compile(REGEX).matcher(message);
if (!m.find()) throw new NotMatchedException();
else {
StringBuilder maskedMessage = new StringBuilder(message);
do {
maskedMessage.replace(m.start(groupToReplace), m.end(groupToReplace),
HIDING_MASK.substring(0, (m.end(groupToReplace) - m.start(groupToReplace))));
} while(m.find(m.end()));
return maskedMessage.toString();
}
}
EDIT: Here is an example message to process.
"2017.08.26 20:51 [Thread-Name] [Class-Name] [MethodName] Credit card holder 12345678901234567 02/2022 123 ........."
You can do it simply with this code:
str.replaceAll( "(?<=\\d{6})\\d(?=\\d{4})", "*" );
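Here is that one-liner applied to the example message from the question. The wrapper class and method names are hypothetical. Because the pattern has no word boundaries, it also masks the middle of any run of 11 or more digits, not only 13 to 19 digit card numbers.

```java
public class LookaroundMask {
    // Masks every digit that has at least 6 digits before it and 4 digits after it.
    public static String mask(String str) {
        return str.replaceAll("(?<=\\d{6})\\d(?=\\d{4})", "*");
    }

    public static void main(String[] args) {
        // prints: Credit card holder 123456*******4567 02/2022 123
        System.out.println(mask("Credit card holder 12345678901234567 02/2022 123"));
    }
}
```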
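private String formatMessage(String message) throws NotMatchedException {
    // Only mask when the message contains a standalone 13-19 digit number.
    if (message.matches(".*\\b\\d{13,19}\\b.*")) {
        // \b is not allowed inside a character class, so plain lookarounds are used.
        return message.replaceAll("(?<=\\d{6})\\d(?=\\d{4})", "*");
    } else {
        throw new NotMatchedException();
    }
}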
Readable but uncool.
String in = "1234561231234";
String mask = in
.replaceFirst("^\\d{6}(\\d+)\\d{4}$", "$1")
.replaceAll("\\d", "\\*");
String out = in
.replaceFirst("^(\\d{6})\\d+(\\d{4})$", "$1" + mask + "$2");
You can use the following if your text contains multiple credit-card numbers with variable lengths:
str.replaceAll( "\\b(\\d{13,19})\\b", "\u0000$1\u0000" )
.replaceAll( "(?<=\\d{6})(?<=\u0000\\d{6,14})\\d(?=\\d{4,12}\u0000)(?=\\d{4})", "*" )
.replaceAll( "\u0000([\\d*]+)\u0000", "$1" );
Not really readable, though, but it's all in one go.
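An alternative sketch that avoids the \u0000 markers is to reuse the question's REGEX and rebuild each match with Matcher.appendReplacement, generating exactly as many '*' as the middle group has digits. Treating a card number as a standalone run of 13 to 19 digits is an assumption carried over from that regex, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CardMasker {
    // First 6 digits, 3-9 middle digits, last 4 digits: 13-19 digits in total.
    private static final Pattern CARD = Pattern.compile("\\b([0-9]{6})([0-9]{3,9})([0-9]{4})\\b");

    public static String mask(String message) {
        Matcher m = CARD.matcher(message);
        StringBuffer out = new StringBuffer(message.length());
        while (m.find()) {
            // A run of '*' exactly as long as the middle group.
            char[] stars = new char[m.group(2).length()];
            Arrays.fill(stars, '*');
            m.appendReplacement(out, m.group(1) + new String(stars) + m.group(3));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```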
A simple solution for a 16 char "number":
String masked = num.substring(0,6) + "******" + num.substring(12,16);
For a string of arbitrary length ( >10 ):
String masked = num.substring(0,6)
+ stars(num.length() - 10)
+ num.substring(num.length() - 4);
... where stars(int n) returns a String of n stars. See Simple way to repeat a String in java -- or if you don't mind a limit of 9 stars, "*********".substring(0,n)
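Here is a sketch of the stars(int n) helper mentioned above, built with Arrays.fill. The class name is hypothetical. The mask keeps the first 6 and the last 4 characters, so it assumes num.length() >= 10.

```java
import java.util.Arrays;

public class StarMask {
    // Returns a String of n '*' characters.
    static String stars(int n) {
        char[] c = new char[n];
        Arrays.fill(c, '*');
        return new String(c);
    }

    // Keeps the first 6 and last 4 characters; everything in between becomes '*'.
    static String mask(String num) {
        return num.substring(0, 6) + stars(num.length() - 10) + num.substring(num.length() - 4);
    }
}
```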
Use a StringBuffer and overwrite the desired characters:
StringBuffer buf = new StringBuffer(num);
for (int i = 6; i < buf.length() - 4; i++) {
buf.setCharAt(i, '*');
}
return buf.toString();
You could also use buf.replace(int start, int end, String str)
I need to mask PII data for my application. The PII data will be Strings of variable length, as it may include names, addresses, email IDs, etc.
So I need to mask this data before logging it. It should not be a full mask; instead, if the length of the string is less than or equal to 8 characters, then mask the first half with "XXX etc.."
If the length is more than 8, then mask the first and last portions of the string so that only the middle 5 characters are visible.
I know we can do this using Java substring and iterating over the string, but I want to know if there is any other simple solution to address this.
Thanks in advance
If you are using Apache Commons, you can do like
String maskChar = "*";
//number of characters to be masked
String maskString = StringUtils.repeat( maskChar, 4);
//string to be masked
String str = "FirstName";
//this will mask first 4 characters of the string
System.out.println( StringUtils.overlay(str, maskString, 0, 4) );
You can check the string length before generating maskString using if else statement.
You can use this function; change the logic of the halves as per your needs:
public static String maskedVariableString(String original)
{
    String maskedString;
    if (original.length() < 9)
    {
        int half = original.length() / 2;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < original.length() - half; i++)
        {
            sb.append("X");
        }
        // Keep only the last 'half' characters; everything before them becomes X.
        maskedString = original.replaceAll("(?s)^.*(.{" + half + "})$", sb.toString() + "$1");
    }
    else
    {
        int maskLength = original.length() - 5;
        int firstMaskLength = maskLength / 2;
        int secondMaskLength = maskLength - firstMaskLength;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < firstMaskLength; i++)
        {
            sb.append("X");
        }
        String firstMask = sb.toString();
        StringBuilder sb1 = new StringBuilder();
        for (int i = 0; i < secondMaskLength; i++)
        {
            sb1.append("X");
        }
        String secondMask = sb1.toString();
        // Keep only the middle 5 characters (group 2).
        maskedString = original.replaceAll("(?s)^(.{" + firstMaskLength + "})(.{5})(.{" + secondMaskLength + "})$", firstMask + "$2" + secondMask);
    }
    return maskedString;
}
Explanation:
() groups the regular expression, and we can use $ to access each group ($1, $2, $3).
^ and $ anchor the pattern to the whole string, . matches any character (not only digits, so names, addresses and email IDs are masked too), and (?s) makes . match line breaks as well.
(.{half}) captures the last half characters into group 1. The else branch works the same way, with the middle 5 characters in group 2.
Is there a nice way to create a string initialized with a number of characters, given an int (counter) and the character to set?
Simply put I would like a method that returns "#,#,#,#,#" when passed 5 and # as parameter.
Any ideas?
Using the StringUtils utility in the Apache Commons lang library:
String myString = StringUtils.repeat("#", ",", 5);
If you only want the characters (and not the comma separators), it is just:
String myString = StringUtils.repeat("#", 5);
It's pretty simple to write a method for this:
public static String createPlaceholderString(char placeholder, int count) {
StringBuilder builder = new StringBuilder(Math.max(0, count * 2 - 1)); // avoid a negative capacity when count is 0
for (int i = 0; i < count; i++) {
if (i!= 0) {
builder.append(',');
}
builder.append(placeholder);
}
return builder.toString();
}
(Note that we can initialize the builder with exactly the right size of buffer as we know how big it will be.)
You could use something like Strings.repeat from Guava:
String text = Strings.repeat("#,", count - 1) + "#";
Or even more esoterically:
String text = Joiner.on(',').join(Iterables.limit(Iterables.cycle("#"), count));
... but personally I'd probably stick with the method.
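For comparison, here is a JDK-only variant (Java 8+) that needs neither Guava nor a hand-written loop. It assumes the element is passed as a String, and the class and method names are hypothetical. A count of 0 yields an empty string.

```java
import java.util.Collections;

public class Placeholders {
    // count copies of element, joined with ','.
    public static String placeholders(String element, int count) {
        return String.join(",", Collections.nCopies(count, element));
    }
}
```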
Try:
public String buildString(int nbr, String repeat){
StringBuilder builder = new StringBuilder();
for(int i=0; i<nbr; i++){
builder.append(repeat);
if(i<(nbr-1))
builder.append(",");
}
return builder.toString();
}
If I have a string such as one of the following:
AlphaSuffix
BravoSuffix
CharlieSuffix
DeltaSuffix
What is the most concise Java syntax to transform AlphaSuffix into Alpha, BravoSuffix into Bravo, and so on?
Use a simple regexp to delete the suffix:
String myString = "AlphaSuffix";
String newString = myString.replaceFirst("Suffix$", "");
Chop it off.
String given = "AlphaSuffix";
String result = given.substring(0, given.length()-"Suffix".length());
To make it even more concise, create a utility method.
public static String chop(String value, String suffix){
if(value.endsWith(suffix)){
return value.substring(0, value.length() - suffix.length());
}
return value;
}
In the utility method, I've added a check to see if the suffix is actually at the end of the value.
Test:
String[] sufs = new String[] {
"AlphaSuffix",
"BravoSuffix",
"CharlieSuffix",
"DeltaSuffix"
};
for (int i = 0; i < sufs.length; i++) {
String s = chop(sufs[i], "Suffix");
System.out.println(s);
}
Gives:
Alpha
Bravo
Charlie
Delta
If the suffixes are all different or unknown (but the prefixes are known), you can use:
myString.replaceFirst("^(Alpha|Bravo|Charlie|Delta|...).*", "$1");
This question already has answers here:
Simple way to repeat a string
Closed 4 years ago.
I did check the other questions; this question focuses on solving this particular problem in the most efficient way.
Sometimes you want to create a new string with a specified length, and with a default character filling the entire string.
I.e., it would be cool if you could do new String(10, '*') and create a new String with a length of 10 characters, all of them '*'.
Because such a constructor does not exist, and you cannot extend from String, you have either to create a wrapper class or a method to do this for you.
At this moment I am using this:
protected String getStringWithLengthAndFilledWithCharacter(int length, char charToFill) {
char[] array = new char[length];
int pos = 0;
while (pos < length) {
array[pos] = charToFill;
pos++;
}
return new String(array);
}
It still lacks any checking (i.e., when length is 0 it will not work). I am constructing the array first because I believe it is faster than using string concatenation or a StringBuffer.
Does anyone else have a better solution?
Apache Commons Lang (probably useful enough to be on the classpath of any non-trivial project) has StringUtils.repeat():
String filled = StringUtils.repeat("*", 10);
Easy!
Simply use the StringUtils class from the Apache Commons Lang project. It has a leftPad method:
StringUtils.leftPad("foobar", 10, '*'); // Returns "****foobar"
No need to do the loop, and using just standard Java library classes:
protected String getStringWithLengthAndFilledWithCharacter(int length, char charToFill) {
if (length > 0) {
char[] array = new char[length];
Arrays.fill(array, charToFill);
return new String(array);
}
return "";
}
As you can see, I also added suitable code for the length == 0 case.
Some possible solutions.
This creates a String of length '0' characters and then replaces each '0' with charToFill (old school; this needs length >= 1).
String s = String.format("%0" + length + "d", 0).replace('0', charToFill);
This creates a List containing length copies of charToFill as a String and then joins the List into a String.
String s = String.join("", Collections.nCopies(length, String.valueOf(charToFill)));
This creates an unlimited Java 8 Stream of Strings containing charToFill, limits it to length elements and collects the results with a joining collector (new school).
String s = Stream.generate(() -> String.valueOf(charToFill)).limit(length).collect(Collectors.joining());
In Java 11 and later, String has a repeat method:
String s = String.valueOf(charToFill).repeat(length);
char[] chars = new char[10];
Arrays.fill(chars, '*');
String text = new String(chars);
To improve performance, you could use a single predefined string if you know the maximum length, like:
String template = "####################################";
And then simply perform a substring once you know the length.
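Here is a sketch of the template idea. It assumes a known upper bound on the length, and the class, method and bound check are hypothetical. Note that since Java 7u6 substring copies the characters instead of sharing the template's array, so this mainly saves the fill loop rather than the allocation.

```java
public class TemplateFill {
    // Longest string you expect to need; must be at least the largest requested length.
    private static final String TEMPLATE = "####################################";

    public static String hashes(int length) {
        if (length > TEMPLATE.length()) {
            throw new IllegalArgumentException("length exceeds template: " + length);
        }
        // The first 'length' characters of the template.
        return TEMPLATE.substring(0, length);
    }
}
```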
Solution using Google Guava
String filled = Strings.repeat("*", 10);
public static String fillString(int count,char c) {
StringBuilder sb = new StringBuilder( count );
for( int i=0; i<count; i++ ) {
sb.append( c );
}
return sb.toString();
}
What is wrong?
Using Dollar, it is simple:
String filled = $("=").repeat(10).toString(); // produces "=========="
Solution using Google Guava, since I prefer it to Apache Commons-Lang:
/**
* Returns a String with exactly the given length composed entirely of
* the given character.
 * @param length the length of the returned string
 * @param c the character to fill the String with
*/
public static String stringOfLength(final int length, final char c)
{
return Strings.padEnd("", length, c);
}
The above is fine. Do you mind if I ask you a question: is this causing you a problem? It seems to me you are optimizing before you know if you need to.
Now for my over-engineered solution. In many (though not all) cases you can use a CharSequence instead of a String.
public class OneCharSequence implements CharSequence {
private final char value;
private final int length;
public OneCharSequence(final char value, final int length) {
this.value = value;
this.length = length;
}
public char charAt(int index) {
if (index >= 0 && index < length) return value;
throw new IndexOutOfBoundsException();
}
public int length() {
return length;
}
public CharSequence subSequence(int start, int end) {
if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
return new OneCharSequence(value, end - start);
}
public String toString() {
char[] array = new char[length];
Arrays.fill(array, value);
return new String(array);
}
}
One extra note: it seems that every public way of creating a new String instance necessarily involves copying whatever buffer you are working with, be it a char[], a StringBuffer or a StringBuilder. From the String javadoc (and repeated in the toString methods of the other classes):
The contents of the character array are copied; subsequent modification of
the character array does not affect
the newly created string.
So you'll end up with a possibly big memory copy after the "fast filling" of the array. The only solution that may avoid this issue is the one from @mlk, if you can work directly with the proposed CharSequence implementation (which may be the case).
Try this, using the substring(int start, int end) method:
String myLongString = "abcdefghij";
String shortStr = myLongString;
if (myLongString.length() >= 10) {
    shortStr = myLongString.substring(0, 5) + "...";
}
This will produce "abcde...".
My solution:
String pw = "1321";
if (pw.length() < 16){
for(int x = pw.length() ; x < 16 ; x++){
pw += "*";
}
}
The output :
1321************
Try this jobber
String stringy =null;
byte[] buffer = new byte[100000];
for (int i = 0; i < buffer.length; i++) {
buffer[i] =0;
}
stringy =StringUtils.toAsciiString(buffer);