Trim a string based on the string length - java

I want to trim a string if the length exceeds 10 characters.
Suppose the string length is 12 (String s="abcdefghijkl"); then the new trimmed string should contain "abcdefgh..".
How can I achieve this?

s = s.substring(0, Math.min(s.length(), 10));
Using Math.min like this avoids an exception in the case where the string is already shorter than 10.
Notes:
The above does simple trimming. If you actually want to replace the last characters with three dots when the string is too long, use Apache Commons StringUtils.abbreviate; see @H6's solution. If you want to use the Unicode horizontal ellipsis character, see @Basil's solution.
For typical implementations of String, s.substring(0, s.length()) will return s rather than allocating a new String.
This may behave incorrectly [1] if your String contains Unicode code points outside of the BMP, e.g. emojis. For a (more complicated) solution that works correctly for all Unicode code points, see @sibnick's solution.
[1] A Unicode code point that is not on plane 0 (the BMP) is represented as a "surrogate pair" (i.e. two char values) in the String. By ignoring this, we might trim the string to fewer than 10 code points, or (worse) truncate it in the middle of a surrogate pair. On the other hand, String.length() is not a good measure of Unicode text length, so trimming based on that property may be the wrong thing to do.
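The surrogate-pair concern in the note above can be sketched in plain Java. This is a minimal illustration, not one of the linked solutions: the class and method names are hypothetical, and the limit is assumed to be measured in code points rather than char values.

```java
final class CodePointTruncate {
    /** Returns at most maxCodePoints code points of s, never cutting a surrogate pair in half. */
    static String truncate(String s, int maxCodePoints) {
        if (s == null) {
            return null;
        }
        if (maxCodePoints <= 0) {
            return "";
        }
        // Count code points, not char values.
        if (s.codePointCount(0, s.length()) <= maxCodePoints) {
            return s;
        }
        // offsetByCodePoints converts a code point count into a char index.
        int end = s.offsetByCodePoints(0, maxCodePoints);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // 9 letters, then FACE WITH TEARS OF JOY (one code point, two chars), then 5 letters.
        String s = "abcdafghi\uD83D\uDE02cdefg";
        String t = truncate(s, 10);
        System.out.println(t);          // the 9 letters followed by the whole emoji
        System.out.println(t.length()); // 11 chars, but 10 code points
    }
}
```

Whether code points are the right unit is still a judgment call: a user-perceived character can span several code points, which this sketch does not handle.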

StringUtils.abbreviate from Apache Commons Lang library could be your friend:
StringUtils.abbreviate("abcdefg", 6) = "abc..."
StringUtils.abbreviate("abcdefg", 7) = "abcdefg"
StringUtils.abbreviate("abcdefg", 8) = "abcdefg"
StringUtils.abbreviate("abcdefg", 4) = "a..."
Commons Lang3 even allows you to set a custom String as the replacement marker. With this you can, for example, use a single-character ellipsis.
StringUtils.abbreviate("abcdefg", "\u2026", 6) = "abcde…"

There is an Apache Commons StringUtils function which does this.
s = StringUtils.left(s, 10)
If len characters are not available, or the String is null, the String will be returned without an exception. An empty String is returned if len is negative.
StringUtils.left(null, *) = null
StringUtils.left(*, -ve) = ""
StringUtils.left("", *) = ""
StringUtils.left("abc", 0) = ""
StringUtils.left("abc", 2) = "ab"
StringUtils.left("abc", 4) = "abc"
StringUtils.Left JavaDocs
Courtesy:Steeve McCauley

As usual, nobody cares about UTF-16 surrogate pairs, not even the authors of org.apache.commons/commons-lang3. For background, see: What are the most common non-BMP Unicode characters in actual use?
You can see the difference between the correct code and the usual code in this sample:
public static void main(String[] args) {
//string with FACE WITH TEARS OF JOY symbol
String s = "abcdafghi\uD83D\uDE02cdefg";
int maxWidth = 10;
System.out.println(s);
//do not care about UTF-16 surrogate pairs
System.out.println(s.substring(0, Math.min(s.length(), maxWidth)));
//correctly process UTF-16 surrogate pairs
if(s.length()>maxWidth){
int correctedMaxWidth = (Character.isLowSurrogate(s.charAt(maxWidth)))&&maxWidth>0 ? maxWidth-1 : maxWidth;
System.out.println(s.substring(0, Math.min(s.length(), correctedMaxWidth)));
}
}

Or you can just use this method in case you don't have StringUtils on hand:
public static String abbreviateString(String input, int maxLength) {
if (input.length() <= maxLength)
return input;
else
return input.substring(0, maxLength-2) + "..";
}

s = s.length() > 10 ? s.substring(0, 10) : s;

Just in case you are looking for a way to trim and keep the LAST 10 characters of a string.
s = s.substring(Math.max(s.length(),10) - 10);

tl;dr
You seem to be asking for an ellipsis (…) character in the last place, when truncating. Here is a one-liner to manipulate your input string.
String input = "abcdefghijkl";
String output = ( input.length () > 10 ) ? input.substring ( 0 , 10 - 1 ).concat ( "…" ) : input;
See this code run live at IdeOne.com.
abcdefghi…
Ternary operator
We can make a one-liner by using the ternary operator.
String input = "abcdefghijkl" ;
String output =
( input.length() > 10 ) // If too long…
?
input
.substring( 0 , 10 - 1 ) // Take just the first part, adjusting by 1 to replace that last character with an ellipsis.
.concat( "…" ) // Add the ellipsis character.
: // Or, if not too long…
input // Just return original string.
;
See this code run live at IdeOne.com.
abcdefghi…
Java streams
The Java Streams facility makes this interesting, as of Java 9 and later. Interesting, but maybe not the best approach.
We use code points rather than char values. The char type is legacy, and is limited to a subset of all possible Unicode characters.
String input = "abcdefghijkl" ;
int limit = 10 ;
String output =
input
.codePoints()
.limit( limit )
.collect( // Collect the results of processing each code point.
StringBuilder::new, // Supplier<R> supplier
StringBuilder::appendCodePoint, // ObjIntConsumer<R> accumulator
StringBuilder::append // BiConsumer<R,R> combiner
)
.toString()
;
If we had excess characters truncated, replace the last character with an ellipsis.
if ( input.length () > limit )
{
output = output.substring ( 0 , output.length () - 1 ) + "…";
}
If only I could think of a way to put together the stream line with the "if over limit, do ellipsis" part.
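One possible way to fold the "if over limit, do ellipsis" check into the stream expression is sketched below. It is only an illustration under stated assumptions: the class and method names are hypothetical, the limit is counted in code points, and the ellipsis takes the place of the last permitted code point.

```java
final class EllipsisTruncate {
    /** Abbreviates input to at most limit code points, ending in an ellipsis when text was cut. */
    static String abbreviate(String input, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        return input.codePointCount(0, input.length()) <= limit
                ? input                                   // Short enough: return unchanged.
                : input.codePoints()
                       .limit(limit - 1)                  // Leave room for the ellipsis.
                       .collect(StringBuilder::new,
                                StringBuilder::appendCodePoint,
                                StringBuilder::append)
                       .append('\u2026')                  // HORIZONTAL ELLIPSIS
                       .toString();
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("abcdefghijkl", 10)); // abcdefghi…
        System.out.println(abbreviate("abc", 10));          // abc
    }
}
```

Comparing code point counts (rather than input.length()) keeps the length check consistent with what the stream actually limits.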

The question is asked on Java, but it was back in 2014.
In case you use Kotlin now, it is as simple as:
yourString.take(10)
Returns a string containing the first n characters from this string, or the entire string if this string is shorter.
Documentation

str==null ? str : str.substring(0, Math.min(str.length(), 10))
or,
str==null ? "" : str.substring(0, Math.min(str.length(), 10))
Works with null.

// this is how you shorten the length of the string with ..
// add following method to your class
private String abbreviate(String s){
if(s.length() <= 10) return s;
return s.substring(0, 8) + ".." ;
}

Related

How to remove leading 0 in the time timestamp 02:25PM using java? [duplicate]

I've seen questions on how to prefix zeros here in SO. But not the other way!
Can you guys suggest me how to remove the leading zeros in alphanumeric text? Are there any built-in APIs or do I need to write a method to trim the leading zeros?
Example:
01234 converts to 1234
0001234a converts to 1234a
001234-a converts to 1234-a
101234 remains as 101234
2509398 remains as 2509398
123z remains as 123z
000002829839 converts to 2829839
Regex is the best tool for the job; what it should be depends on the problem specification. The following removes leading zeroes, but leaves one if necessary (i.e. it wouldn't just turn "0" to a blank string).
s.replaceFirst("^0+(?!$)", "")
The ^ anchor will make sure that the 0+ being matched is at the beginning of the input. The (?!$) negative lookahead ensures that not the entire string will be matched.
Test harness:
String[] in = {
"01234", // "[1234]"
"0001234a", // "[1234a]"
"101234", // "[101234]"
"000002829839", // "[2829839]"
"0", // "[0]"
"0000000", // "[0]"
"0000009", // "[9]"
"000000z", // "[z]"
"000000.z", // "[.z]"
};
for (String s : in) {
System.out.println("[" + s.replaceFirst("^0+(?!$)", "") + "]");
}
See also
regular-expressions.info
repetitions, lookarounds, and anchors
String.replaceFirst(String regex)
You can use the StringUtils class from Apache Commons Lang like this:
StringUtils.stripStart(yourString,"0");
If you are using Kotlin, this is the only code that you need:
yourString.trimStart('0')
How about the regex way:
String s = "001234-a";
s = s.replaceFirst ("^0*", "");
The ^ anchors to the start of the string (I'm assuming from context your strings are not multi-line here, otherwise you may need to look into \A for start of input rather than start of line). The 0* means zero or more 0 characters (you could use 0+ as well). The replaceFirst just replaces all those 0 characters at the start with nothing.
And if, like Vadzim, your definition of leading zeros doesn't include turning "0" (or "000" or similar strings) into an empty string (a rational enough expectation), simply put it back if necessary:
String s = "00000000";
s = s.replaceFirst ("^0*", "");
if (s.isEmpty()) s = "0";
A clear way without any need of regExp and any external libraries.
public static String trimLeadingZeros(String source) {
for (int i = 0; i < source.length(); ++i) {
char c = source.charAt(i);
if (c != '0') {
return source.substring(i);
}
}
return ""; // or return "0";
}
To go with thelost's Apache Commons answer: using guava-libraries (Google's general-purpose Java utility library which I would argue should now be on the classpath of any non-trivial Java project), this would use CharMatcher:
CharMatcher.is('0').trimLeadingFrom(inputString);
You could just do:
String s = Integer.valueOf("0001007").toString();
Use this:
String x = "00123".replaceAll("^0*", ""); // -> 123
Use Apache Commons StringUtils class:
StringUtils.strip(String str, String stripChars);
Using Regexp with groups:
Pattern pattern = Pattern.compile("(0*)(.*)");
String result = "";
Matcher matcher = pattern.matcher(content);
if (matcher.matches())
{
// first group contains 0, second group the remaining characters
// 000abcd - > 000, abcd
result = matcher.group(2);
}
return result;
Using regex as some of the answers suggest is a good way to do that. If you don't want to use regex then you can use this code:
String s = "00a0a121";
while(s.length()>0 && s.charAt(0)=='0')
{
s = s.substring(1);
}
If you (like me) need to remove all the leading zeros from each "word" in a string, you can modify @polygenelubricants' answer to the following:
String s = "003 d0g 00ss 00 0 00";
s.replaceAll("\\b0+(?!\\b)", "");
which results in:
3 d0g ss 0 0 0
I think this is easy to do. You can just loop over the string from the start, skipping zeros until you find a non-zero char.
int lastLeadZeroIndex = -1; // -1 means no leading zero has been seen yet
for (int i = 0; i < str.length(); i++) {
char c = str.charAt(i);
if (c == '0') {
lastLeadZeroIndex = i;
} else {
break;
}
}
str = str.substring(lastLeadZeroIndex + 1);
Without using Regex or substring() function on String which will be inefficient -
public static String removeZero(String str){
StringBuffer sb = new StringBuffer(str);
while (sb.length()>1 && sb.charAt(0) == '0')
sb.deleteCharAt(0);
return sb.toString(); // return in String
}
Using kotlin it is easy
value.trimStart('0')
You could replace "^0*(.*)" to "$1" with regex
String s="0000000000046457657772752256266542=56256010000085100000";
String removeString="";
for(int i =0;i<s.length();i++){
if(s.charAt(i)=='0')
removeString=removeString+"0";
else
break;
}
System.out.println("original string - "+s);
System.out.println("after removing 0's -"+s.replaceFirst(removeString,""));
If you don't want to use regex or external library.
You can do with "for":
String input="0000008008451";
String output = input.trim();
for( ;output.length() > 1 && output.charAt(0) == '0'; output = output.substring(1));
System.out.println(output);//8008451
I made some benchmark tests and found, that the fastest way (by far) is this solution:
private static String removeLeadingZeros(String s) {
try {
Integer intVal = Integer.parseInt(s);
s = intVal.toString();
} catch (Exception ex) {
// whatever
}
return s;
}
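Regular expressions in particular are very slow over a long iteration. (I needed to find the fastest way for a batch job.) Note that this only strips zeros from strings that parse as an int; anything else is returned unchanged.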
And what about just searching for the first non-zero character?
[1-9]\d*
This regex finds the first digit between 1 and 9 followed by any number of digits (including none), so for "00012345" it returns "12345".
It can be easily adapted for alphanumeric strings.
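One way such an adaptation might look is sketched here. The pattern `[^0].*` ("first character that is not a zero, plus everything after it") and the names are assumptions chosen for illustration, not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstNonZero {
    // First char that is not '0', then the rest of the input (DOTALL so newlines are kept too).
    private static final Pattern FROM_FIRST_NON_ZERO = Pattern.compile("[^0].*", Pattern.DOTALL);

    /** Returns s from its first non-zero character on, or "" if s consists only of zeros. */
    static String stripLeadingZeros(String s) {
        Matcher m = FROM_FIRST_NON_ZERO.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingZeros("0001234a")); // 1234a
        System.out.println(stripLeadingZeros("001234-a")); // 1234-a
        System.out.println(stripLeadingZeros("101234"));   // 101234
    }
}
```

If an all-zero input should become "0" rather than an empty string, the fallback value in the ternary is the place to change.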

Regex to consolidate multiple rules

I'm looking at optimising my string manipulation code and consolidating all of my replaceAll's to just one pattern if possible
Rules -
strip all special chars except -
replace space with -
condense consecutive - 's to just one -
Remove leading and trailing -'s
My code -
public static String slugifyTitle(String value) {
String slugifiedVal = null;
if (StringUtils.isNotEmpty(value))
slugifiedVal = value
.replaceAll("[ ](?=[ ])|[^-A-Za-z0-9 ]+", "") // strips all special chars except -
.replaceAll("\\s+", "-") // converts spaces to -
.replaceAll("--+", "-"); // replaces consecutive -'s with just one -
slugifiedVal = StringUtils.stripStart(slugifiedVal, "-"); // strips leading -
slugifiedVal = StringUtils.stripEnd(slugifiedVal, "-"); // strips trailing -
return slugifiedVal;
}
Does the job but obviously looks shoddy.
My test assertions -
Heading with symbols *~!##$%^&()_+-=[]{};',.<>?/ ==> heading-with-symbols
Heading with an asterisk* ==> heading-with-an-asterisk
Custom-id-&-stuff ==> custom-id-stuff
--Custom-id-&-stuff-- ==> custom-id-stuff
Disclaimer: I don't think a regex approach to this problem is wrong, or that this is an objectively better approach. I am merely presenting an alternative approach as food for thought.
I have a tendency against regex approaches to problems where you have to ask how to solve with regex, because that implies you're going to struggle to maintain that solution in the future. There is an opacity to regexes where "just do this" is obvious, when you know just to do this.
Some problems typically solved with regex, like this one, can be solved using imperative code. It tends to be more verbose, but it uses simple, apparent, code constructs; it's easier to debug; and can be faster because it doesn't involve the full "machinery" of the regex engine.
static String slugifyTitle(String value) {
boolean appendHyphen = false;
StringBuilder sb = new StringBuilder(value.length());
// Go through value one character at a time...
for (int i = 0; i < value.length(); i++) {
char c = value.charAt(i);
if (isAppendable(c)) {
// We have found a character we want to include in the string.
if (appendHyphen) {
// We previously found character(s) that we want to append a single
// hyphen for.
sb.append('-');
appendHyphen = false;
}
sb.append(c);
} else if (requiresHyphen(c)) {
// We want to replace hyphens or spaces with a single hyphen.
// Only append a hyphen if it's not going to be the first thing in the output.
// Doesn't matter if this is set for trailing hyphen/whitespace,
// since we then never hit the "isAppendable" condition.
appendHyphen = sb.length() > 0;
} else {
// Other characters are simply ignored.
}
}
// You can lowercase when appending the character, but `Character.toLowerCase()`
// recommends using `String.toLowerCase` instead.
return sb.toString().toLowerCase(Locale.ROOT);
}
// Some predicate on characters you want to include in the output.
static boolean isAppendable(char c) {
return (c >= 'A' && c <= 'Z')
|| (c >= 'a' && c <= 'z')
|| (c >= '0' && c <= '9');
}
// Some predicate on characters you want to replace with a single '-'.
static boolean requiresHyphen(char c) {
return c == '-' || Character.isWhitespace(c);
}
(This code is wildly over-commented, for the purpose of explaining it in this answer. Strip out the comments and unnecessary things like the else, it's actually not super complicated).
Consider the following regex parts:
Any special chars other than -: [\p{S}\p{P}&&[^-]]+ (character class subtraction)
One or more whitespace characters or hyphens: [\s-]+ (this will be replaced with a single -)
You will still need to remove leading/trailing hyphens, it will be a separate post-processing step. If you wish, you can use a ^-+|-+$ regex.
So, you can only reduce this to three .replaceAll invocations keeping the code precise and readable:
public static String slugifyTitle(String value) {
String slugifiedVal = null;
if (value != null && !value.trim().isEmpty())
slugifiedVal = value.toLowerCase()
.replaceAll("[\\p{S}\\p{P}&&[^-]]+", "") // strips all special chars except -
.replaceAll("[\\s-]+", "-") // converts spaces/hyphens to -
.replaceAll("^-+|-+$", ""); // remove trailing/leading hyphens
return slugifiedVal;
}
See the Java demo:
List<String> strs = Arrays.asList("Heading with symbols *~!##$%^&()_+-=[]{};',.<>?/",
"Heading with an asterisk*",
"Custom-id-&-stuff",
"--Custom-id-&-stuff--");
for (String str : strs)
System.out.println("\"" + str + "\" => " + slugifyTitle(str));
Output:
"Heading with symbols *~!##$%^&()_+-=[]{};',.<>?/" => heading-with-symbols
"Heading with an asterisk*" => heading-with-an-asterisk
"Custom-id-&-stuff" => custom-id-stuff
"--Custom-id-&-stuff--" => custom-id-stuff
NOTE: if your strings can contain any Unicode whitespace, replace "[\\s-]+" with "(?U)[\\s-]+".
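To illustrate the note above, here is a small sketch comparing the two patterns on input containing an EM SPACE (U+2003) and a NO-BREAK SPACE (U+00A0). The class and method names are hypothetical.

```java
final class UnicodeWhitespaceDemo {
    /** Default \s only covers the ASCII whitespace characters. */
    static String collapseAscii(String s) {
        return s.replaceAll("[\\s-]+", "-");
    }

    /** With (?U), \s follows the Unicode White_Space property. */
    static String collapseUnicode(String s) {
        return s.replaceAll("(?U)[\\s-]+", "-");
    }

    public static void main(String[] args) {
        String s = "a\u2003b\u00A0c";
        System.out.println(collapseAscii(s).equals(s)); // true: nothing was replaced
        System.out.println(collapseUnicode(s));         // a-b-c
    }
}
```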

( Regex ) How can i convert a number to a roman form if the number is repetitive number

The output should be like;
Hans4444müller ---> HansIVmüller
Mary555kren ---> MaryVkren
Firstly I have tried to get all repetitive numbers from a word with that regex:
(\d)\1+ // and replace that with $1
After I get the repetitive number such as 4, I tried to change this number to IV but
unfortunately I can't find the correct regex for this.
What I think about this algorithm is if there is a repeating number, replace that number with the roman form.
Is there any possible way to do it with regex?
I don't know Java very well, but I do know regular expressions, C# and JavaScript. I am confident you can adapt one of my techniques to Java.
I have sample code with two different techniques.
The first invokes a function on every match to perform the replacement
The second iterates over the matches provided by your regular expression, converts each match into Roman numerals, then injects the result into your original text.
The link below illustrates technique 1 using DotNetFiddle. The replacement function takes a method name, and that method is invoked for every match. This technique requires very little code.
https://dotnetfiddle.net/o9gG28. If you're lucky, Java has a similar technique available.
Technique 2: a javascript version that loops through every match found by the regex:
https://jsfiddle.net/ActualRandy/rxnzoc3u/81/. The method does some string concatenation using the replacement value.
Here's some code for method 2 using .NET syntax, Java should be similar. The key methods are 'Match' and 'GetNextMatch'. Match uses your regex to get the first match.
private void btnRegexRep_Click(object sender, RoutedEventArgs e) {
string fixThis = @"Hans4444müller,Mary555kren";
var re = new Regex("\\d+");
string result = "";
int lastIndex = 0;
string lastMatch = "";
//Get the first match using the regular expression:
var m = re.Match(fixThis);
//Keep looping while we can match:
while (m.Success) {
//Get length of text between last match and current match:
int len = m.Index - (lastIndex + lastMatch.Length);
result += fixThis.Substring(lastIndex + lastMatch.Length, len) + GetRomanText(m);
//Save values for next iteration:
lastIndex = m.Index;
lastMatch = m.Value;
m = m.NextMatch();
}
//Append text after last match:
if (lastIndex > 0) {
result += fixThis.Substring(lastIndex + lastMatch.Length);
}
Console.WriteLine(result);
}
private string GetRomanText(Match m) {
string[] roman = new[] { "", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX" }; // index = digit value; 0 has no Roman numeral
string result = "";
// Get ASCII value of first digit from the match (remember, 48= ascii 0, 57=ascii 9):
char c = m.Value[0];
if (c >= 48 && c <= 57) {
int index = c - 48;
result = roman[index];
}
return result;
}
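Since the question is about Java, the match-by-match loop above might be adapted roughly as follows. This is a sketch under assumptions: it uses the question's `(\d)\1+` pattern so that only repeated digits are replaced, leaves runs of zeros untouched because 0 has no Roman numeral, and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedDigitToRoman {
    // Index i holds the Roman numeral for digit i; index 0 is a placeholder.
    private static final String[] ROMAN =
            {"", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"};
    // A digit followed by one or more repetitions of itself.
    private static final Pattern REPEATED_DIGIT = Pattern.compile("(\\d)\\1+");

    static String replaceRepeatedDigits(String text) {
        Matcher m = REPEATED_DIGIT.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int digit = m.group(1).charAt(0) - '0';
            // Runs of zeros are kept as they are.
            String replacement = digit == 0 ? m.group() : ROMAN[digit];
            // appendReplacement copies the text before the match, then the replacement.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // Copy whatever follows the last match.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceRepeatedDigits("Hans4444müller")); // HansIVmüller
        System.out.println(replaceRepeatedDigits("Mary555kren"));    // MaryVkren
    }
}
```

On Java 9 and later, Matcher.replaceAll with a function argument would shorten the loop; the appendReplacement form is used here because it also works on older versions.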

Replace digits in String with that amount of a certain character using regex one-liner

Is it possible to replace digits in a String with that amount of a certain character like 'X' using a regex? (I.e. replace "test3something8" with "testXXXsomethingXXXXXXXX")?
I know I could use a for-loop, but I'm looking for an as short-as-possible (regex) one-liner solution (regarding a code-golf challenge - and yes, I am aware of the codegolf-SE, but this is a general Java question instead of a codegolf question).
I know how to replace digits with one character:
String str = "test3something8".replaceAll("[1-9]", "X"); -> str = "testXsomethingX"
I know how to use groups, in case you want to use the match, or add something after every match instead of replacing it:
String str = "test3something8".replaceAll("([1-9])", "$1X"); -> str = "test3Xsomething8X"
I know how to create n amount of a certain character:
int n = 5; String str = new String(new char[n]).replace('\0', 'X'); -> str = "XXXXX"
Or like this:
int n = 5; String str = String.format("%1$"+n+"s", "").replace(' ', 'X'); -> str = "XXXXX";
What I want is something like this (the code below obviously doesn't work, but it should give an idea of what I somehow want to achieve - preferably even a lot shorter):
String str = "test3Xsomething8X"
.replaceAll("([1-9])", new String(new char[new Integer("$1")]).replace('\0', 'X')));
// Result I want it to have: str = "testXXXsomethingXXXXXXXX"
As I said, this above doesn't work because "$1" should be used directly, so now it's giving a
java.lang.NumberFormatException: For input string: "$1"
TL;DR: Does anyone know a one-liner to replace a digit in a String with that amount of a certain character?
If you really want to have it as a one-liner. A possible solution (see it more as a PoC) could be to use the Stream API.
String input = "test3something8";
input.chars()
.mapToObj(
i -> i >= '0' && i <= '9' ?
new String(new char[i-'0']).replace('\0', 'X')
: "" + ((char)i)
)
.forEach(System.out::print);
output
testXXXsomethingXXXXXXXX
note No investigation has been done for performance, scalability, to be GC friendly, etc.
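For comparison, a regex-based variant is possible on newer Java versions. The sketch below assumes Java 11 or later (Matcher.replaceAll with a function argument was added in Java 9 and String.repeat in Java 11); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class DigitsToRepeatedChar {
    /** Replaces every ASCII digit d with d copies of 'X' (so a 0 simply disappears). */
    static String expand(String s) {
        return Pattern.compile("[0-9]").matcher(s)
                .replaceAll(m -> "X".repeat(m.group().charAt(0) - '0'));
    }

    public static void main(String[] args) {
        System.out.println(expand("test3something8")); // testXXXsomethingXXXXXXXX
    }
}
```

The body of expand is a single expression, so it can be inlined as a one-liner where a helper method is not wanted.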

Is there a way to extract a value from a string which is separated by a certain character?

My app reads a value of a NFC tag which contains plain text, to then cut the read String up.
The string should be as follow:
"r=v;b=v;g=v;n=v;p=v;m=v;s=v"
I want to read the "v" characters, since they are divided by the ; character, and i remember there being a function that let me divide strings like this, how do i do it? The v value isn't constant, it could span 1 position like it could span 3 or 4. The app is for Android phones, written in Java on Android Studio.
You are asking about the String method .split().
It splits a string into an array. Since split accepts a regex, you can split on exactly the pattern you need (note that because the input starts with a match, the first array element is an empty string),
like this:
String givenString="r=v;b=v;g=v;n=v;p=v;m=v;s=v";
String[]vs=givenString.split("([;]?[a-z]{1}[=])");
for(String v: vs){System.out.println(v);}//prints all v
Regex explanation:
[;]? -> means may start with one semicolon or none
[a-z]{1} -> means one letter lower case only
[=] -> means equals sign
Edit: if you split on the semicolon only (as @cvester suggested), you get the whole entries, such as "r=v", "b=v", etc.
In this case you can iterate over all entries and then split each one again on the equals sign "=", like this:
String []entries=givenString.split(";");
for (String entry:entries){
String []vs=entry.split("=");
System.out.println(vs[1]);//prints all v
}
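If the keys are needed as well, the two-step split above could collect the pairs into a map. This is a minimal sketch: the class and method names are hypothetical, and entries without an "=" are assumed to be safe to skip.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KeyValueParser {
    /** Parses "k1=v1;k2=v2;..." into a map that keeps the original entry order. */
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : text.split(";")) {
            // Limit 2 so a value may itself contain '='.
            String[] parts = entry.split("=", 2);
            if (parts.length == 2) {
                result.put(parts[0].trim(), parts[1].trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = parse("r=255;b=0;g=12");
        System.out.println(values);          // {r=255, b=0, g=12}
        System.out.println(values.get("g")); // 12
    }
}
```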
Java has a split function on a String.
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
So you can just use split(";");
Use String#split() or you could create your own logic.
String string = "r=v;b=asfasv;g=v;n=asf;p=v;m=v;s=vassaf";
String word = "";
int i = 0;
ArrayList<String> list = new ArrayList<>();
while(i < string.length())
{
if(string.charAt(i) == '=')
{
i++;
while( i < string.length() && string.charAt(i) != ';' )
{
word += string.charAt(i);
i++;
}
list.add(word);
System.out.println(word);
word = "";
}
i ++;
}
if(!word.equals(""))
list.add(word);
System.out.println(list);
You could use regex to avoid looping
String input = "r=v;b=v;g=v;n=v;p=v;m=v;s=v";
String values = input.trim().replaceAll("([a-zA-Z]=)+","").replaceAll(";",""); // values run together: "vvvvvvv"
