Converting C++ std::string's find_*_of() methods to Java

When converting code from C++ to Java, what is an easy way to convert std::string methods like find_last_of(), find_last_not_of(), etc.?
These C++ methods find an index of any of a set of characters.
Java's String class provides indexOf() and lastIndexOf(), but these find an index of a character or a string, not any of a set of characters.
For example, the code below finds the last character that is not ASCII whitespace.
size_t pos = myString.find_last_not_of( " \t\n\r" );

One option is to use Guava's CharMatcher class.
Here are tested conversions for each of the single-argument find_*_of() methods.
public int findFirstOf( String sequence, String str ) {
return CharMatcher.anyOf( str ).indexIn( sequence );
}
public int findFirstNotOf( String sequence, String str ) {
return CharMatcher.anyOf( str ).negate().indexIn( sequence );
}
public int findLastOf( String sequence, String str ) {
return CharMatcher.anyOf( str ).lastIndexIn( sequence );
}
public int findLastNotOf( String sequence, String str ) {
return CharMatcher.anyOf( str ).negate().lastIndexIn( sequence );
}
Other answers are welcome. [I couldn't find anything for find_last_not_of() in Java when searching on Stack Overflow and elsewhere, and I missed CharMatcher the first time I searched through Guava for corresponding functionality. I'd like to document this easy conversion for future use.] Note that where C++ returns std::string::npos, these methods return -1 when nothing is found.
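If Guava is not available, the same four single-argument functions can be written with a plain JDK loop over charAt() and indexOf(). This is a minimal sketch, not a library API: the class name FindOf is hypothetical, -1 is assumed to stand in for npos (as with String.indexOf()), and characters are compared as UTF-16 chars, so an emoji counts as two chars.

```java
// Plain-JDK sketch of the single-argument find_*_of() functions.
// Assumption: -1 plays the role of C++'s std::string::npos, as in String.indexOf().
// Comparison is per UTF-16 char, so a supplementary character is two separate chars.
public final class FindOf {

    private FindOf() {}

    /** Index of the first char of sequence that occurs in chars, or -1. */
    static int findFirstOf(String sequence, String chars) {
        for (int i = 0; i < sequence.length(); i++) {
            if (chars.indexOf(sequence.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    /** Index of the first char of sequence that does NOT occur in chars, or -1. */
    static int findFirstNotOf(String sequence, String chars) {
        for (int i = 0; i < sequence.length(); i++) {
            if (chars.indexOf(sequence.charAt(i)) < 0) {
                return i;
            }
        }
        return -1;
    }

    /** Index of the last char of sequence that occurs in chars, or -1. */
    static int findLastOf(String sequence, String chars) {
        for (int i = sequence.length() - 1; i >= 0; i--) {
            if (chars.indexOf(sequence.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    /** Index of the last char of sequence that does NOT occur in chars, or -1. */
    static int findLastNotOf(String sequence, String chars) {
        for (int i = sequence.length() - 1; i >= 0; i--) {
            if (chars.indexOf(sequence.charAt(i)) < 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String myString = "hello world \t\n";
        // Java counterpart of: size_t pos = myString.find_last_not_of(" \t\n\r");
        int pos = findLastNotOf(myString, " \t\n\r");
        System.out.println(pos); // 10, the 'd' of "world"
    }
}
```

Each method is a single linear scan, so this is O(n * m) for a sequence of length n and a character set of length m; for large sets, a BitSet or boolean[] lookup table could replace chars.indexOf().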

If you like regex, you can give the equivalents below a shot. This might not be the most efficient method, but it is worth considering if you don't want to use any third-party library (given that there are no equivalent methods in Java's String class).
P.S.: If you are comfortable with a third-party library, I wouldn't suggest using regex for this task, as it can quickly become difficult to extend as requirements change.
So, this is just another option:
import java.util.regex.Pattern;
// \\z (absolute end of input) is used instead of $ below: in Java, $ also matches just
// before a final line terminator, which gives wrong indexes when the input ends with '\n'.
// Do not pass an empty str: it would produce an empty character class.
public int findFirstOf( String sequence, String str ) {
String regex = "^[^" + Pattern.quote(str) + "]*";
int index = sequence.length() - sequence.replaceAll(regex, "").length();
return index == sequence.length() ? -1 : index;
}
public int findFirstNotOf( String sequence, String str ) {
String regex = "^[" + Pattern.quote(str) + "]*";
int index = sequence.length() - sequence.replaceAll(regex, "").length();
return index == sequence.length() ? -1 : index;
}
public int findLastOf( String sequence, String str ) {
String regex = "[^" + Pattern.quote(str) + "]*\\z";
return sequence.replaceAll(regex, "").length() - 1;
}
public int findLastNotOf( String sequence, String str ) {
String regex = "[" + Pattern.quote(str) + "]*\\z";
return sequence.replaceAll(regex, "").length() - 1;
}
I haven't tested the methods above. You can test them, compare the results with the corresponding methods you already have, and see whether they work. Please let me know if they don't.
As far as third-party libraries are concerned, there is also the StringUtils class in Apache Commons Lang, with the following methods:
StringUtils#indexOfAny()
StringUtils#indexOfAnyBut()
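StringUtils#lastIndexOfAny() (note that this one takes an array of search strings rather than a set of characters, so each character has to be passed as its own string)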

// Returns the index of the first character at or after searchFrom that does not occur
// in searchFor, or searchIn.length() if there is none (standing in for npos here).
static int findFirstNotOf(String searchIn, String searchFor, int searchFrom) {
for (int i = searchFrom; i < searchIn.length(); i++) {
if (searchFor.indexOf(searchIn.charAt(i)) == -1) {
return i;
}
}
return searchIn.length();
}
// Returns the index of the last character at or before searchFrom that does not occur
// in searchFor, or -1 if there is none.
static int findLastNotOf(String searchIn, String searchFor, int searchFrom) {
for (int i = searchFrom; i >= 0; i--) {
if (searchFor.indexOf(searchIn.charAt(i)) == -1) {
return i;
}
}
return -1;
}
public static void main(String[] args){
String str = "look for non-alphabetic characters...";
int found = findFirstNotOf(str,"abcdefghijklmnopqrstuvwxyz ",0);
if (found!=str.length()) {
System.out.print("The first non-alphabetic character is " + str.charAt(found));
System.out.print(" at position " + found + '\n');
}
found = findLastNotOf(str,"abcdefghijklmnopqrstuvwxyz ",str.length()-1);
if (found>=0) {
System.out.print("The last non-alphabetic character is " + str.charAt(found));
System.out.print(" at position " + found + '\n');
}
str = "Please, erase trailing white-spaces \n";
String whitespaces = " \t\f\n\r";
found = findLastNotOf(str,whitespaces,str.length()-1);
if (found >= 0)
str = str.substring(0, found + 1);
else
str = ""; // str is all whitespace
System.out.println("[" + str + "]");
}

How to efficiently remove consecutive same characters in a string

I wrote a method to reduce a sequence of identical characters to a single character, as follows. Its logic seems correct, but according to my tutor there is room for improvement in terms of performance. Could anyone shed some light on this?
Comments on aspects other than performance are also really appreciated.
public class RemoveRepetitions {
public static String remove(String input) {
String ret = "";
String last = "";
String[] stringArray = input.split("");
for(int j=0; j < stringArray.length; j++) {
if (! last.equals(stringArray[j]) ) {
ret += stringArray[j];
}
last = stringArray[j];
}
return ret;
}
public static void main(String[] args) {
System.out.println(RemoveRepetitions.remove("foobaarrbuzz"));
}
}
We can improve performance by using a StringBuilder instead of String concatenation: each += creates a new String and copies everything accumulated so far. The split() call is also unnecessary (and makes the program slower as well).
Here is a way to solve this:
public static String remove(String input)
{
StringBuilder answer = new StringBuilder("");
int N = input.length();
int i = 0;
while (i < N)
{
char c = input.charAt(i);
answer.append( c );
while (i<N && input.charAt(i)==c)
++i;
}
return answer.toString();
}
The idea is to iterate over all characters of the input string and keep appending every new character to the answer and skip all the same consecutive characters.
Possible changes you could think about in your code:
Time complexity: your loop makes a single pass over the input, but each ret += ... copies the whole result built so far, so the worst case is O(n²). With a StringBuilder the single pass becomes O(n), which is the best possible, since every character has to be read at least once.
Space complexity: your code uses extra memory because of the splitting.
Question to ask: can you produce this output without the extra array you get from splitting the string? (Character-by-character traversal is possible directly on the string.)
I could provide the code here, but it would be great if you tried it on your own first. Once you are done with your attempts,
you can look up the best solution here (you are almost there):
https://www.geeksforgeeks.org/remove-consecutive-duplicates-string/
Good luck!
As mentioned before, it is much better to access the characters in the string using method String::charAt or at least by iterating a char array retrieved with String::toCharArray instead of splitting the input string into String array.
However, Java strings may contain characters outside the Basic Multilingual Plane of Unicode (e.g. emojis 😂😍😊 or some rare Chinese and Japanese ideographs). Such a character is stored as a surrogate pair of two chars, so String::codePointAt should be used. Correspondingly, Character.charCount should be used to calculate the appropriate offset while iterating over the input string.
Also, the input string should be checked for null or emptiness, so the resulting code may look like this:
public static String dedup(String str) {
if (null == str || str.isEmpty()) {
return str;
}
int prev = -1;
int n = str.length();
System.out.println("length = " + n + " of [" + str + "], real length: " + str.codePointCount(0, n));
StringBuilder sb = new StringBuilder(n);
for (int i = 0; i < n; ) {
int cp = str.codePointAt(i);
if (i == 0 || cp != prev) {
sb.appendCodePoint(cp);
}
prev = cp;
i += Character.charCount(cp); // for emojis it returns 2
}
return sb.toString();
}
A version with String::charAt may look like this:
public static String dedup2(String str) {
if (null == str || str.isEmpty()) {
return str;
}
int n = str.length();
StringBuilder sb = new StringBuilder(n);
sb.append(str.charAt(0));
for (int i = 1; i < n; i++) {
if (str.charAt(i) != str.charAt(i - 1)) {
sb.append(str.charAt(i));
}
}
return sb.toString();
}
The following test proves that charAt fails to deduplicate repeated emojis:
System.out.println("codePoint: " + dedup ("😂😂😍😍😊😊😂 hello"));
System.out.println("charAt: " + dedup2("😂😂😍😍😊😊😂 hello"));
Output:
length = 20 of [😂😂😍😍😊😊😂 hello], real length: 13
codePoint: 😂😍😊😂 helo
charAt: 😂😂😍😍😊😊😂 helo
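A variant of the code point approach, assuming Java 8 or later for String.codePoints(): convert the string to an int[] of code points once and compact that array in place, keeping a code point only if it differs from the last one kept. This is a sketch (the class name DedupCodePoints is hypothetical); it trades one int[] allocation for simpler indexing than the codePointAt/charCount loop.

```java
// Sketch assuming Java 8+: deduplicate consecutive code points via an int[] compacted in place.
public final class DedupCodePoints {

    static String dedup(String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        int[] cps = str.codePoints().toArray(); // one entry per code point, emojis included
        int n = 0;                               // number of code points kept so far
        for (int i = 0; i < cps.length; i++) {
            // keep cps[i] only if it differs from the last code point we kept
            if (n == 0 || cps[i] != cps[n - 1]) {
                cps[n++] = cps[i];
            }
        }
        return new String(cps, 0, n); // String(int[] codePoints, int offset, int count)
    }

    public static void main(String[] args) {
        System.out.println(dedup("foobaarrbuzz"));                  // fobarbuz
        System.out.println(dedup("\uD83D\uDE02\uD83D\uDE02 hello")); // one emoji, then " helo"
    }
}
```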

java: to retrieve a part of a string that begins and ends with a specific string

This is the program. I have a string strLineText from which I need to extract the words that contain target.
For example, in the string "random string with IWANTTHISABC-123 and more" I need to extract IWANTTHISABC-123. Similarly, if the string is "random string with IWANTTHISBBC-001", I need to extract IWANTTHISBBC-001. The prefix is fixed.
I have tried it with substring() (Method1), but the logic doesn't work for strings that end with this target word, i.e., nothing is output.
I tried split() (Method2), and it works for all four combinations.
Can you help me achieve this with substring() (Method1) for all four combinations?
public static void main(String[] args) throws IOException {
String target = "IWANTTHIS";
// Four possible inputs (uncomment one at a time; a variable can only be declared once)
String strLineText = "random string with IWANTTHISABC-123 and more"; //works
// String strLineText = "IWANTTHISCBC-45601 and more"; //works
// String strLineText = "IWANTTHISEBC-1"; //doesn't work
// String strLineText = "random string with IWANTTHISKBC-55545"; //doesn't work
//Method1
System.out.println("O/P 1:" + strLineText.substring(strLineText.indexOf(target),
strLineText.indexOf(target) + strLineText.substring(strLineText.indexOf(target)).indexOf(" ") + 1).trim());
//Method2
for (String s : strLineText.split(" "))
if (s.contains(target))
System.out.println("O/P 2:" + s.trim());
}
I think it is pretty straightforward, you just need to compute the end index starting from begin index. Here is the snippet that works for all cases.
int begin = strLineText.indexOf(target);
int end = strLineText.indexOf(" ", begin);
if(end == -1) end = strLineText.length();
System.out.println(strLineText.substring(begin, end));
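The same snippet can be wrapped in a method with a guard for a missing prefix: if indexOf() returns -1, the substring() call above would throw. This is a sketch; the method name extractWord is hypothetical, and returning null for "not found" is an assumption (an Optional<String> would work as well).

```java
// Sketch: indexOf/substring extraction with a not-found guard.
public final class ExtractWord {

    static String extractWord(String line, String prefix) {
        int begin = line.indexOf(prefix);
        if (begin == -1) {
            return null;                     // prefix not present (assumed convention)
        }
        int end = line.indexOf(' ', begin);  // first space after the prefix starts
        if (end == -1) {
            end = line.length();             // the word runs to the end of the line
        }
        return line.substring(begin, end);
    }

    public static void main(String[] args) {
        String target = "IWANTTHIS";
        String[] inputs = {
            "random string with IWANTTHISABC-123 and more",
            "IWANTTHISCBC-45601 and more",
            "IWANTTHISEBC-1",
            "random string with IWANTTHISKBC-55545"
        };
        for (String input : inputs) {
            System.out.println(extractWord(input, target));
        }
    }
}
```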
This assumes that your definition of "word" is a sequence of letters, excluding digits, symbols, etc. For other definitions of "word", the regular expression can be adjusted accordingly. If you want to include parts of the word before the target string, you can add a loop that counts backwards from startIndex, checking whether each character is a letter.
public class Foo
{
public static void main(String[] args)
{
String target = "IWANTTHIS";
// String candidate = "random string with IWANTTHISABC-123 and more";
String candidate = "IWANTTHISCBC-45601 and more";
// String candidate = "IWANTTHISEBC-1";
// String candidate = "random string with IWANTTHISKBC-55545";
int startIndex = -1;
int endIndex = -1;
if(candidate.contains(target))
{
System.out.println("Target located.");
startIndex = candidate.indexOf(target);
System.out.println("target starts at " + startIndex);
// endIndex is exclusive: keep extending it until the first non-alpha char
// or the end of the string
endIndex = startIndex + target.length();
while(endIndex < candidate.length()
&& Character.toString(candidate.charAt(endIndex)).matches("[a-zA-Z]"))
{
endIndex++;
}
String full = candidate.substring(startIndex, endIndex);
System.out.println("Full string = " + full);
}
else
{
System.out.println("No target located. Exiting.");
}
}
}
strLineText.substring(strLineText.indexOf(target)).indexOf(" ") will be -1 if strLineText contains no spaces after your target string. You could check if strLineText.substring(strLineText.indexOf(target)) contains spaces, and if not, take the substring until the end of strLineText:
//Method1
int beginIndex = strLineText.indexOf(target);
String substring = strLineText.substring(beginIndex);
int endIndex = substring.contains(" ") ? beginIndex + substring.indexOf(" ") : strLineText.length();
System.out.println("O/P 1:" + strLineText.substring(beginIndex, endIndex));

Read string format and fetch required irregular data

I have a string in the following format, which is the output of
new String(Files.readAllBytes(Paths.get(data)))
for a file:
a+2 b+3 c+33 d+88 ......
I want to get the value after c+. The position of c is not fixed, but c occurs only once and may occur anywhere. The value I need always comes right after c+, and its length (33 here) is not fixed either. Can someone help me with optimal code, please? I think collections need to be used here.
You can use this regex which will let you capture the data you want,
c\+(\d+)
Explanation:
c\+ matches a literal c character immediately followed by a literal + character (the + has to be escaped, since it is otherwise a quantifier).
(\d+) captures the next digit(s) which you are interested in capturing.
Demo, https://regex101.com/r/jfYUPG/1
Here is Java code demonstrating the same:
public static void main(String args[]) {
String s = "a+2 b+3 c+33 d+88 ";
Pattern p = Pattern.compile("c\\+(\\d+)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println("Data: " + m.group(1));
} else {
System.out.println("Input data doesn't match the regex");
}
}
This gives the following output:
Data: 33
This code is extracting the value right after c+ up to the next space, or to the end of the string if there is no space:
String str = "a+2 b+3 c+33 d+88 ";
String find = "c+";
int index = str.indexOf(" ", str.indexOf(find) + 2);
if (index == -1)
index = str.length();
String result = str.substring(str.indexOf(find) + 2, index);
System.out.println(result);
prints
33
or in a method:
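public static String getValue(String str, String find) {
int start = str.indexOf(find);
if (start == -1)
return null; // find does not occur in str
start += find.length(); // skip over the key itself, e.g. "c+"
int indexSpace = str.indexOf(" ", start);
if (indexSpace == -1)
indexSpace = str.length();
return str.substring(start, indexSpace);
}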
public static void main(String[] args) {
String str = "a+2 b+3 c+33 d+88 ";
String find = "c+";
System.out.println(getValue(str, find));
}
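Since the question mentions collections: if several values are needed, the line can be parsed once into a Map and then queried by key. This is a sketch assuming every token has the form key+value and tokens are separated by whitespace (the names PairParser and parsePairs are hypothetical); for a single lookup, the approaches above are simpler.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: parse "a+2 b+3 c+33 d+88" into {a=2, b=3, c=33, d=88}.
// Assumption: whitespace-separated "key+value" tokens; tokens without a key are skipped.
public final class PairParser {

    static Map<String, String> parsePairs(String text) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps the order from the file
        for (String token : text.trim().split("\\s+")) {
            int plus = token.indexOf('+');
            if (plus > 0) {
                pairs.put(token.substring(0, plus), token.substring(plus + 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = parsePairs("a+2 b+3 c+33 d+88 ");
        System.out.println(pairs);          // {a=2, b=3, c=33, d=88}
        System.out.println(pairs.get("c")); // 33
    }
}
```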

Removing contiguous spaces in a String without trim() and replaceAll()

I have to remove leading and trailing spaces from the given string as well as combine the contiguous spaces. For example,
String str = " this is a string containing numerous whitespaces ";
and I need to return it as:
"this is a string containing numerous whitespaces";
But the problem is I can't use String#trim(). (This is homework and I'm not allowed to use such methods.) I'm currently trying it by accessing each character one by one, but without much success.
I need an optimized code for this. Could anybody help? I need it to be done by today :(
EDIT: Answer posted before we were told we couldn't use replaceAll. I'm leaving it here on the grounds that it may well be useful to other readers, even if it's not useful to the OP.
I need an optimized code for this.
Do you really need it to be optimized? Have you identified this as a bottleneck?
This should do it:
str = str.replaceAll("\\s+", " ");
That's a regular expression meaning "replace any contiguous whitespace with a single space". It may not be the fastest possible approach, but I'd benchmark it before trying anything else. It does not remove the single leading or trailing space that remains, so trim() would still be needed for that.
Note that this will replace all whitespace with spaces, so if you have tabs or other whitespace characters, they will be replaced with spaces too.
I'm not permitted to use these methods. I have to do this with loops and all.
So I wrote a little snippet of code for you, in case you can't use a faster and more efficient way:
String str = " this is a string containing numerous whitespaces ";
StringBuilder buff = new StringBuilder();
boolean space = false;
for (int i = 0; i < str.length(); i++) {
char c = str.charAt(i);
if (c == ' ') {
// append only the first space of a run, and never a leading one
if (!space && buff.length() > 0) {
buff.append(c);
}
space = true;
}
else {
buff.append(c);
space = false;
}
}
// at most one trailing space can be left over; drop it
if (buff.length() > 0 && buff.charAt(buff.length() - 1) == ' ') {
buff.setLength(buff.length() - 1);
}
String correctedString = buff.toString();
System.out.println(correctedString);
Note:
This is "hardcoded" and only for learning.
A more efficient way is to use the approaches pointed out by @JonSkeet and @BrunoReis.
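For the loop-only constraint, the same idea can also be done in a single pass that writes into a char[] sized to the input, emitting at most one space between words and none at either end. This is a sketch, not the poster's code; the names CollapseSpaces and collapseSpaces are hypothetical, and only the space character ' ' is treated as whitespace.

```java
// Sketch: collapse runs of ' ' and drop leading/trailing spaces without trim()/replaceAll().
public final class CollapseSpaces {

    static String collapseSpaces(String s) {
        char[] out = new char[s.length()]; // the result can never be longer than the input
        int n = 0;                         // number of chars written to out
        boolean pendingSpace = false;      // a space was seen since the last written char
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ') {
                // only remember the gap once something has been written (drops leading spaces)
                pendingSpace = n > 0;
            } else {
                if (pendingSpace) {
                    out[n++] = ' ';        // a single space for the whole run
                    pendingSpace = false;
                }
                out[n++] = c;
            }
        }
        // a trailing run only sets pendingSpace, so it is never written
        return new String(out, 0, n);
    }

    public static void main(String[] args) {
        System.out.println("[" + collapseSpaces("   this  is   a   string   ") + "]"); // [this is a string]
    }
}
```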
What about str = str.replaceAll(" +", " ").trim();?
If you don't want to use trim() (and I really don't see a reason not to), replace it with:
str = str.replaceAll(" +", " ").replaceAll("^ ", "").replaceAll(" $", "");
Remove White Spaces without Using any inbuilt library Function
This is just a simple example; note that it removes every space, not only the repeated ones.
public class RemWhite{
public static void main(String args[]){
String s1=" world qwer ";
int count=0;
char q[]=new char[s1.length()]; // large enough for any input
char ch[]=s1.toCharArray();
System.out.println(ch);
for(int i=0;i<=ch.length-1;i++)
{
int j=ch[i];
if(j==32) // 32 is the code of ' '
{
continue;
}
q[count]=ch[i];
count++;
}
System.out.println(new String(q, 0, count)); // print only the filled part
}}
To remove single or repeated spaces:
public class RemoveSpace {
public static void main(String[] args) {
char space = ' ';
int ascii = (int) space;
String str = " this is a string containing numerous whitespaces ";
char c[] = str.toCharArray();
for (int i = 0; i < c.length; i++) {
if (c[i] == ascii) {
continue;
} else {
System.out.print(c[i]);
}
}
}
}
If you don't want to use any built-in methods, here's something you can refer to:
private static String trim(String s)
{
String s1="";boolean nonspace=false;
for(int i=0;i<s.length();i++)
{
if(s.charAt(i)!=' ' || nonspace)
{
s1 = s1+s.charAt(i);
nonspace = true;
}
}
nonspace = false;
s="";
for(int i=s1.length()-1;i>=0;i--)
{
if(s1.charAt(i)!=' ' || nonspace)
{
s = s1.charAt(i)+s;
nonspace = true;
}
}
return s;
}
package removespace;
import java.util.Scanner;
public class RemoveSpace {
public static void main(String[] args) {
Scanner scan= new Scanner(System.in);
System.out.println("Enter the string");
String str= scan.nextLine();
String str2="";
char []arr=str.toCharArray();
int i=0;
while(i<=arr.length-1)
{
if(arr[i]==' ')
{
i++;
}
else
{
str2= str2+arr[i];
i++;
}
}
System.out.println(str2);
}
}
This code removes all the spaces from the given string without using trim(). It reads a string from the user, converts it to a char array with toCharArray(), and compares each character with the space character ' '. A space is skipped by just incrementing the index i; any other character is appended to str2 in the else branch.
Try this code to solve your problem:
String name = " abc ";
System.out.println(name);
for (int i = 0; i < name.length(); i++) {
char ch = name.charAt(i);
if (ch == ' ') {
continue; // skip the space
} else {
System.out.print(name.charAt(i));
}
}

Generate fixed length Strings filled with whitespaces

I need to produce fixed-length strings to generate a character-position-based file. The missing characters must be filled with space characters.
As an example, the field CITY has a fixed length of 15 characters. For the inputs "Chicago" and "Rio de Janeiro" the outputs are
"        Chicago"
" Rio de Janeiro".
Since Java 1.5 we can use the method java.lang.String.format(String, Object...) with printf-like format strings.
The format string "%1$15s" does the job, where 1$ indicates the argument index, s indicates that the argument is a String, and 15 represents the minimum width of the String.
Putting it all together: "%1$15s".
For a general method we have:
public static String fixedLengthString(String string, int length) {
return String.format("%1$"+length+ "s", string);
}
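A usage sketch of the method above with the CITY example (width 15). It also shows that 15 is only a minimum width: a longer value is not truncated, which matters for a fixed-position file. Adding a precision, as in "%15.15s", would cut it to 15 characters. The class name FixedLengthDemo is hypothetical.

```java
// Usage sketch of fixedLengthString with the CITY field (width 15).
public final class FixedLengthDemo {

    public static String fixedLengthString(String string, int length) {
        return String.format("%1$" + length + "s", string);
    }

    public static void main(String[] args) {
        System.out.println("[" + fixedLengthString("Chicago", 15) + "]");        // 8 spaces, then Chicago
        System.out.println("[" + fixedLengthString("Rio de Janeiro", 15) + "]"); // 1 space, then Rio de Janeiro
        // the width is a minimum only: a 22-char value stays 22 chars long
        System.out.println(fixedLengthString("Sao Paulo Capital City", 15).length()); // 22
    }
}
```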
Maybe someone can suggest another format string to fill the empty spaces with a specific character?
Utilize String.format's padding with spaces and replace them with the desired char.
String toPad = "Apple";
String padded = String.format("%8s", toPad).replace(' ', '0');
System.out.println(padded);
Prints 000Apple.
Update: a more performant version (since it does not rely on String.format) that has no problem with spaces (thanks to Rafael Borja for the hint):
int width = 10;
char fill = '0';
String toPad = "New York";
String padded = new String(new char[width - toPad.length()]).replace('\0', fill) + toPad;
System.out.println(padded);
Prints 00New York.
But a check needs to be added to prevent an attempt to create a char array with a negative length, which happens when toPad is longer than width.
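A sketch of the char-array padding with that check added. What to do with text that is already too long is an assumption here: it is returned unchanged rather than truncated. The names PadLeft and padLeft are hypothetical.

```java
// Sketch: left-pad with a fill character, guarding against a negative array length.
public final class PadLeft {

    static String padLeft(String text, int width, char fill) {
        int missing = width - text.length();
        if (missing <= 0) {
            return text; // already wide enough; avoids new char[negative] (assumed: no truncation)
        }
        // new char[n] holds n '\0' chars; replace each with the fill character
        return new String(new char[missing]).replace('\0', fill) + text;
    }

    public static void main(String[] args) {
        System.out.println(padLeft("New York", 10, '0'));      // 00New York
        System.out.println(padLeft("Apple", 8, '0'));          // 000Apple
        System.out.println(padLeft("Rio de Janeiro", 5, ' ')); // Rio de Janeiro
    }
}
```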
This code will have exactly the given amount of characters; filled with spaces or truncated on the right side:
private String leftpad(String text, int length) {
return String.format("%" + length + "." + length + "s", text);
}
private String rightpad(String text, int length) {
return String.format("%-" + length + "." + length + "s", text);
}
For right padding you need String.format("%-15s", str),
i.e. the - flag pads on the "right", and without the - flag the string is padded on the "left".
See my example:
import java.util.Scanner;
public class Solution {
public static void main(String[] args) {
Scanner sc=new Scanner(System.in);
System.out.println("================================");
for(int i=0;i<3;i++)
{
String s1=sc.nextLine();
Scanner line = new Scanner( s1);
line=line.useDelimiter(" ");
String language = line.next();
int mark = line.nextInt();
System.out.printf("%s%03d\n", String.format("%-15s", language), mark);
}
System.out.println("================================");
}
}
The input must be a string followed by a number, for example: Google 1
String.format("%15s",s) // pads left
String.format("%-15s",s) // pads right
Great summary here
import org.apache.commons.lang3.StringUtils;
String stringToPad = "10";
int maxPadLength = 10;
String paddingCharacter = " ";
StringUtils.leftPad(stringToPad, maxPadLength, paddingCharacter)
Way better than Guava imo. Never seen a single enterprise Java project that uses Guava but Apache String Utils is incredibly common.
You can also write a simple method like below
public static String padString(String str, int leng) {
for (int i = str.length(); i < leng; i++)
str += " ";
return str;
}
The Guava Library has Strings.padStart that does exactly what you want, along with many other useful utilities.
Here's a neat trick:
// E.g pad("sss","00000000"); should deliver "00000sss".
public static String pad(String string, String pad) {
/*
* Add the pad to the left of string then take as many characters from the right
* that is the same length as the pad.
* This would normally mean starting my substring at
* pad.length() + string.length() - pad.length() but obviously the pad.length()'s
* cancel.
*
* 00000000sss
* ^ ----- Cut before this character - pos = 8 + 3 - 8 = 3
*/
return (pad + string).substring(string.length());
}
public static void main(String[] args) throws InterruptedException {
try {
System.out.println("Pad 'Hello' with ' ' produces: '"+pad("Hello"," ")+"'");
// Prints: Pad 'Hello' with ' ' produces: ' Hello'
} catch (Exception e) {
e.printStackTrace();
}
}
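One caveat with this trick: if string is longer than pad, (pad + string).substring(string.length()) cuts characters off the left of string; for example, pad("abcdef", "0000") would give "cdef". A sketch with a guard added (the class name PadTrick is hypothetical, and returning the string unchanged is an assumed choice):

```java
// Sketch: the (pad + string).substring(...) trick with a guard for long input.
public final class PadTrick {

    static String pad(String string, String pad) {
        if (string.length() >= pad.length()) {
            return string; // nothing to pad; return it unchanged instead of cutting it
        }
        // take the last pad.length() characters of pad + string
        return (pad + string).substring(string.length());
    }

    public static void main(String[] args) {
        System.out.println(pad("sss", "00000000")); // 00000sss
        System.out.println(pad("abcdef", "0000"));  // abcdef
    }
}
```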
Here is the code with test cases ;) :
import static org.junit.Assert.assertEquals;
import org.junit.Test;
@Test
public void testNullStringShouldReturnStringWithSpaces() throws Exception {
String fixedString = writeAtFixedLength(null, 5);
assertEquals("     ", fixedString);
}
@Test
public void testEmptyStringReturnStringWithSpaces() throws Exception {
String fixedString = writeAtFixedLength("", 5);
assertEquals("     ", fixedString);
}
@Test
public void testShortString_ReturnSameStringPlusSpaces() throws Exception {
String fixedString = writeAtFixedLength("aa", 5);
assertEquals("aa   ", fixedString);
}
@Test
public void testLongStringShouldBeCut() throws Exception {
String fixedString = writeAtFixedLength("aaaaaaaaaa", 5);
assertEquals("aaaaa", fixedString);
}
private String writeAtFixedLength(String pString, int length) {
if (pString != null && !pString.isEmpty()){
return getStringAtFixedLength(pString, length);
}else{
return completeWithWhiteSpaces("", length);
}
}
private String getStringAtFixedLength(String pString, int length) {
if(length < pString.length()){
return pString.substring(0, length);
}else{
return completeWithWhiteSpaces(pString, length - pString.length());
}
}
// appends `count` spaces to pString
private String completeWithWhiteSpaces(String pString, int count) {
for (int i=0; i<count; i++)
pString += " ";
return pString;
}
I like TDD ;)
The StringUtils class of Apache Commons Lang 3 exists to solve left/right padding.
Apache Commons Lang 3 provides the StringUtils class, where you can use the following method to left-pad with your preferred character:
StringUtils.leftPad(final String str, final int size, final char padChar);
This is a static method, and its parameters are:
str - the string to pad (can be null)
size - the size to pad to
padChar - the character to pad with
There are additional methods in the StringUtils class as well:
rightPad
repeat
different join methods
I'm adding the Gradle dependency here for your reference:
implementation 'org.apache.commons:commons-lang3:3.12.0'
https://mvnrepository.com/artifact/org.apache.commons/commons-lang3/3.12.0
Please see all the utility methods of this class:
https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html
Guava library dependency
This is from jricher's answer. The Guava library has Strings.padStart, which does exactly what you want, along with many other useful utilities.
This code works great.
String ItemNameSpacing = new String(new char[10 - masterPojos.get(i).getName().length()]).replace('\0', ' ');
printData += masterPojos.get(i).getName()+ "" + ItemNameSpacing + ": " + masterPojos.get(i).getItemQty() +" "+ masterPojos.get(i).getItemMeasure() + "\n";
Happy Coding!!
public static String padString(String word, int length) {
String newWord = word;
for(int count = word.length(); count < length; count++) {
newWord = " " + newWord;
}
return newWord;
}
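This simple function works for me (String.repeat() requires Java 11 or later):
public static String leftPad(String string, int length, String pad) {
// Math.max avoids the IllegalArgumentException that repeat() throws for a negative count
return pad.repeat(Math.max(0, length - string.length())) + string;
}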
Invocation:
String s = leftPad(myString, 10, "0");
import java.util.Scanner;
public class Solution {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
for (int i = 0; i < 3; i++) {
String s1 = sc.next();
int x = sc.nextInt();
System.out.printf("%-15s%03d\n", s1, x);
// %-15s -->pads right,%15s-->pads left
}
}
}
Use printf() to simply format output without using any library.
