Java: Efficient way to determine if a String meets several criteria? - java

I would like to find an efficient way (not scanning the String 10,000 times, or creating lots of intermediary Strings for holding temporary results, or string bashing, etc.) to write a method that accepts a String and determine if it meets the following criteria:
It is at least 2 characters in length
The first character is uppercased
The remaining substring after the first character contains at least 1 lowercased character
Here's my attempt so far:
private boolean isInProperForm(final String token) {
if(token.length() < 2)
return false;
char firstChar = token.charAt(0);
String restOfToken = token.substring(1);
String firstCharAsString = firstChar + "";
String firstCharStrToUpper = firstCharAsString.toUpperCase();
// TODO: Giving up because this already seems way too complicated/inefficient.
// Ignore the '&& true' clause - left it there as a placeholder so it wouldn't give a compile error.
if(firstCharStrToUpper.equals(firstCharAsString) && true)
return true;
// Presume false if we get here.
return false;
}
But as you can see I already have 1 char and 3 temp strings, and something just doesn't feel right. There's got to be a better way to write this. It's important because this method is going to get called thousands and thousands of times (for each tokenized word in a text document). So it really really needs to be efficient.
Thanks in advance!

This function should cover it. Each char is examined only once and no objects are created.
public static boolean validate(String token) {
if (token == null || token.length() < 2) return false;
if (!Character.isUpperCase(token.charAt(0)) return false;
for (int i = 1; i < token.length(); i++)
if (Character.isLowerCase(token.charAt(i)) return true;
return false;

The first criteria is simply the length - this data is cached in the string object and is not requiring traversing the string.
You can use Character.isUpperCase() to determine if the first char is upper case. No need as well to traverse the string.
The last criteria requires a single traversal on the string- and stop when you first find a lower case character.
P.S. An alternative for the 2+3 criteria combined is to use a regex (not more efficient - but more elegant):
return token.matches("[A-Z].*[a-z].*");
The regex is checking if the string starts with an upper case letter, and then followed by any sequence which contains at least one lower case character.

It is at least 2 characters in length
The first character is
uppercased
The remaining substring after the first character contains
at least 1 lowercased character
Code:
private boolean isInProperForm(final String token) {
if(token.length() < 2) return false;
if(!Character.isUpperCase(token.charAt(0)) return false;
for(int i = 1; i < token.length(); i++) {
if(Character.isLowerCase(token.charAt(i)) {
return true; // our last criteria, so we are free
// to return on a met condition
}
}
return false; // didn't meet the last criteria, so we return false
}
If you added more criteria, you'd have to revise the last condition.

What about:
return token.matches("[A-Z].*[a-z].*");
This regular expression starts with an uppercase letter and has at least one following lowercase letter and therefore meets your requirements.

To find if the first character is uppercase:
Character.isUpperCase(token.charAt(0))
To check if there is at least one lowercase:
if(Pattern.compile("[a-z]").matcher(token).find()) {
//At least one lowercase
}

To check if first char is uppercase you can use:
Character.isUpperCase(s.charAt(0))

return token.matches("[A-Z].[a-z].");

Related

Capitalize each word in a string, how does this code check the previous space

I recently came across this solution to writing a program that capitalizes each word in a string and I'm trying really hard to understand it, but I can't get over one hurdle.
public class test {
public static void main(String[] args) {
// create a string
String message = "everyone loves java";
// stores each characters to a char array
char[] charArray = message.toCharArray();
boolean foundSpace = true;
for(int i = 0; i < charArray.length; i++) {
// if the array element is a letter
if(Character.isLetter(charArray[i])) {
// check space is present before the letter
if(foundSpace) {
// change the letter into uppercase
charArray[i] = Character.toUpperCase(charArray[i]);
foundSpace = false;
}
}
else {
// if the new character is not character
foundSpace = true;
}
}
// convert the char array to the string
message = String.valueOf(charArray);
System.out.println("Message: " + message);
}
I understand everything in this code except for how it checks if the previous character was a space. The comment says it does this with the if(foundSpace) statement, but I don't see any reason why that would work since it's never specified that it's looking for a space. Any pointers in the right direction would be appreciated
Edit: Thanks for everyone’s answers, I think I finally get it now!
Look at the code inside the loop, stripped down a bit:
for(int i = 0; i < charArray.length; i++) {
if(Character.isLetter(charArray[i])) {
// check space is present before the letter
if(foundSpace) {
// Some code.
foundSpace = false;
}
// Here, foundSpace is false.
} else {
foundSpace = true;
}
}
The code doesn't care about what your variables or methods are called: it's checking if the character meets some criterion (Character.isLetter), and afterwards it has recorded whether or not the character met that criterion (in foundSpace).
This means that foundSpace contains whether the previous character met that criterion, and allows you to decide to do something on the current character based on what the previous one was: in your case, uppercase that character if the previous character didn't meet the criterion.
foundSpace is initially set to true, which means that the loop treats a string which starts with a letter as if there was a space before it, and so that will be capitalized.
Really, there is just a mismatch between the criterion used, and the name of the variable:
If the criterion should be "is it whitespace", use Character.isWhitespace as your criterion, and keep the flag name as foundSpace;
If the criterion really is "is it a letter", use Character.isLetter as your criterion, and name the flag foundLetter (and invert its sense, so check if (!foundLetter) etc, because it's easier to deal with "positive" checks (if (foundLetter)) rather than "negative" checks (if (!foundLetter))).
foundSpace is only set to false if the condition:
(Character.isLetter(charArray[i]))
is triggered.
If the condition is not triggered, aka if the character is not a letter, then foundSpace will be set to True.
This means that anything that is not a letter will set foundSpace equal to true.
As Silvio pointed out, this includes other non space characters such as 1, $, and ".
It would be better instead of checking if the character is a letter, to instead check to see if charArray[i] is equal to a space.
Like so:
boolean foundSpace = Character.isWhitespace(charArray[i])
More info about the Character class can be found below:
https://www.geeksforgeeks.org/character-equals-method-in-java-with-examples/
The code you've shown could have some undesirable side effects - especially if foundSpace is relied upon later on - because it could potentially lead you to believe that a space exists in a word, when it's not a space at all but rather a character that isn't a letter like "$".
In the context of your entire program, this looks like:
public class test {
public static void main(String[] args) {
// create a string
String message = "everyone loves java";
// stores each characters to a char array
char[] charArray = message.toCharArray();
boolean foundSpace = true;
for(int i = 0; i < charArray.length; i++) {
if(Character.isWhitespace(charArray[i])){
foundSpace = true;
}
else{
foundSpace = false;
}
// if the array element is a letter
if(Character.isLetter(charArray[i])) {
// check space is present before the letter
if(foundSpace) {
// change the letter into uppercase
charArray[i] = Character.toUpperCase(charArray[i]);
}
}
else {
// if the new character is not character
// delete this condition if you don't plan to use it
}
}
// convert the char array to the string
message = String.valueOf(charArray);
System.out.println("Message: " + message);
}
Some edits made to the code to fix compilation errors, thanks to those in the comments for your suggestions.

Count the Characters in a String Recursively & treat "eu" as a Single Character

I am new to Java, and I'm trying to figure out how to count Characters in the given string and threat a combination of two characters "eu" as a single character, and still count all other characters as one character.
And I want to do that using recursion.
Consider the following example.
Input:
"geugeu"
Desired output:
4 // g + eu + g + eu = 4
Current output:
2
I've been trying a lot and still can't seem to figure out how to implement it correctly.
My code:
public static int recursionCount(String str) {
if (str.length() == 1) {
return 0;
}
else {
String ch = str.substring(0, 2);
if (ch.equals("eu") {
return 1 + recursionCount(str.substring(1));
}
else {
return recursionCount(str.substring(1));
}
}
}
OP wants to count all characters in a string but adjacent characters "ae", "oe", "ue", and "eu" should be considered a single character and counted only once.
Below code does that:
public static int recursionCount(String str) {
int n;
n = str.length();
if(n <= 1) {
return n; // return 1 if one character left or 0 if empty string.
}
else {
String ch = str.substring(0, 2);
if(ch.equals("ae") || ch.equals("oe") || ch.equals("ue") || ch.equals("eu")) {
// consider as one character and skip next character
return 1 + recursionCount(str.substring(2));
}
else {
// don't skip next character
return 1 + recursionCount(str.substring(1));
}
}
}
Recursion explained
In order to address a particular task using Recursion, you need a firm understanding of how recursion works.
And the first thing you need to keep in mind is that every recursive solution should (either explicitly or implicitly) contain two parts: Base case and Recursive case.
Let's have a look at them closely:
Base case - a part that represents a simple edge-case (or a set of edge-cases), i.e. a situation in which recursion should terminate. The outcome for these edge-cases is known in advance. For this task, base case is when the given string is empty, and since there's nothing to count the return value should be 0. That is sufficient for the algorithm to work, outcomes for other inputs should be derived from the recursive case.
Recursive case - is the part of the method where recursive calls are made and where the main logic resides. Every recursive call eventually hits the base case and stars building its return value.
In the recursive case, we need to check whether the given string starts from a particular string like "eu". And for that we don't need to generate a substring (keep in mind that object creation is costful). instead we can use method String.startsWith() which checks if the bytes of the provided prefix string match the bytes at the beginning of this string which is chipper (reminder: starting from Java 9 String is backed by an array of bytes, and each character is represented either with one or two bytes depending on the character encoding) and we also don't bother about the length of the string because if the string is shorter than the prefix startsWith() will return false.
Implementation
That said, here's how an implementation might look:
public static int recursionCount(String str) {
if(str.isEmpty()) {
return 0;
}
return str.startsWith("eu") ?
1 + recursionCount(str.substring(2)) : 1 + recursionCount(str.substring(1));
}
Note: that besides from being able to implement a solution, you also need to evaluate it's Time and Space complexity.
In this case because we are creating a new string with every call time complexity is quadratic O(n^2) (reminder: creation of the new string requires allocating the memory to coping bytes of the original string). And worse case space complexity also would be O(n^2).
There's a way of solving this problem recursively in a linear time O(n) without generating a new string at every call. For that we need to introduce the second argument - current index, and each recursive call should advance this index either by 1 or by 2 (I'm not going to implement this solution and living it for OP/reader as an exercise).
In addition
In addition, here's a concise and simple non-recursive solution using String.replace():
public static int count(String str) {
return str.replace("eu", "_").length();
}
If you would need handle multiple combination of character (which were listed in the first version of the question) you can make use of the regular expressions with String.replaceAll():
public static int count(String str) {
return str.replaceAll("ue|au|oe|eu", "_").length();
}

finding the middle index of a substring when there are duplicates in the string

I was working on a Java coding problem and encountered the following issue.
Problem:
Given a string, does "xyz" appear in the middle of the string? To define middle, we'll say that the number of chars to the left and right of the "xyz" must differ by at most one
xyzMiddle("AAxyzBB") → true
xyzMiddle("AxyzBBB") → false
My Code:
public boolean xyzMiddle(String str) {
boolean result=false;
if(str.length()<3)result=false;
if(str.length()==3 && str.equals("xyz"))result=true;
for(int j=0;j<str.length()-3;j++){
if(str.substring(j,j+3).equals("xyz")){
String rightSide=str.substring(j+3,str.length());
int rightLength=rightSide.length();
String leftSide=str.substring(0,j);
int leftLength=leftSide.length();
int diff=Math.abs(rightLength-leftLength);
if(diff>=0 && diff<=1)result=true;
else result=false;
}
}
return result;
}
Output I am getting:
Running for most of the test cases but failing for certain edge cases involving more than once occurence of "xyz" in the string
Example:
xyzMiddle("xyzxyzAxyzBxyzxyz")
My present method is taking the "xyz" starting at the index 0. I understood the problem. I want a solution where the condition is using only string manipulation functions.
NOTE: I need to solve this using string manipulations like substrings. I am not considering using list, stringbuffer/builder etc. Would appreciate answers which can build up on my code.
There is no need to loop at all, because you only want to check if xyz is in the middle.
The string is of the form
prefix + "xyz" + suffix
The content of the prefix and suffix is irrelevant; the only thing that matters is they differ in length by at most 1.
Depending on the length of the string (and assuming it is at least 3):
Prefix and suffix must have the same length if the (string's length - the length of xyz) is even. In this case:
int prefixLen = (str.length()-3)/2;
result = str.substring(prefixLen, prefixLen+3).equals("xyz");
Otherwise, prefix and suffix differ in length by 1. In this case:
int minPrefixLen = (str.length()-3)/2;
int maxPrefixLen = minPrefixLen+1;
result = str.substring(minPrefixLen, minPrefixLen+3).equals("xyz") || str.substring(maxPrefixLen, maxPrefixLen+3).equals("xyz");
In fact, you don't even need the substring here. You can do it with str.regionMatches instead, and avoid creating the substrings, e.g. for the first case:
result = str.regionMatches(prefixLen, "xyz", 0, 3);
Super easy solution:
Use Apache StringUtils to split the string.
Specifically, splitByWholeSeparatorPreserveAllTokens.
Think about the problem.
Specifically, if the token is in the middle of the string then there must be an even number of tokens returned by the split call (see step 1 above).
Zero counts as an even number here.
If the number of tokens is even, add the lengths of the first group (first half of the tokens) and compare it to the lengths of the second group.
Pay attention to details,
an empty token indicates an occurrence of the token itself.
You can count this as zero length, count as the length of the token, or count it as literally any number as long as you always count it as the same number.
if (lengthFirstHalf == lengthSecondHalf) token is in middle.
Managing your code, I left unchanged the cases str.lengt<3 and str.lengt==3.
Taking inspiration from #Andy's answer, I considered the pattern
prefix+'xyz'+suffix
and, while looking for matches I controlled also if they respect the rule IsMiddle, as you defined it. If a match that respect the rule is found, the loop breaks and return a success, else the loop continue.
public boolean xyzMiddle(String str) {
boolean result=false;
if(str.length()<3)
result=false;
else if(str.length()==3 && str.equals("xyz"))
result=true;
else{
int preLen=-1;
int sufLen=-2;
int k=0;
while(k<str.lenght){
if(str.indexOf('xyz',k)!=-1){
count++;
k=str.indexOf('xyz',k);
//check if match is in the middle
preLen=str.substring(0,k).lenght;
sufLen=str.substring(k+3,str.lenght-1).lenght;
if(preLen==sufLen || preLen==sufLen-1 || preLen==sufLen+1){
result=true;
k=str.length; //breaks the while loop
}
else
result=false;
}
else
k++;
}
}
return result;
}

How to find duplicates inside a string?

I want to find out if a string that is comma separated contains only the same values:
test,asd,123,test
test,test,test
Here the 2nd string contains only the word "test". I'd like to identify these strings.
As I want to iterate over 100GB, performance matters a lot.
Which might be the fastest way of determining a boolean result if the string contains only one value repeatedly?
public static boolean stringHasOneValue(String string) {
String value = null;
for (split : string.split(",")) {
if (value == null) {
value = split;
} else {
if (!value.equals(split)) return false;
}
}
return true;
}
No need to split the string at all, in fact no need for any string manipulation.
Find the first word (indexOf comma).
Check the remaining string length is an exact multiple of that word+the separating comma. (i.e. length-1 % (foundLength+1)==0)
Loop through the remainder of the string checking the found word against each portion of the string. Just keep two indexes into the same string and move them both through it. Make sure you check the commas too (i.e. bob,bob,bob matches bob,bobabob does not).
As assylias pointed out there is no need to reset the pointers, just let them run through the String and compare the 1st with 2nd, 2nd with 3rd, etc.
Example loop, you will need to tweak the exact position of startPos to point to the first character after the first comma:
for (int i=startPos;i<str.length();i++) {
if (str.charAt(i) != str.charAt(i-startPos)) {
return false;
}
}
return true;
You won't be able to do it much faster than this given the format the incoming data is arriving in but you can do it with a single linear scan. The length check will eliminate a lot of mismatched cases immediately so is a simple optimization.
Calling split might be expensive - especially if it is 200 GB data.
Consider something like below (NOT tested and might require a bit of tweaking the index values, but I think you will get the idea) -
public static boolean stringHasOneValue(String string) {
String seperator = ",";
int firstSeparator = string.indexOf(seperator); //index of the first separator i.e. the comma
String firstValue = string.substring(0, firstSeparator); // first value of the comma separated string
int lengthOfIncrement = firstValue.length() + 1; // the string plus one to accommodate for the comma
for (int i = 0 ; i < string.length(); i += lengthOfIncrement) {
String currentValue = string.substring(i, firstValue.length());
if (!firstValue.equals(currentValue)) {
return false;
}
}
return true;
}
Complexity O(n) - assuming Java implementations of substring is efficient. If not - you can write your own substring method that takes the required no of characters from the String.
for a crack just a line code:
(#Tim answer is more efficient)
System.out.println((new HashSet<String>(Arrays.asList("test,test,test".split(","))).size()==1));

Java String.matches() regular expressions

I am trying to use the matches() function to check a user-entered password for certain conditions
"Contains six alphanumeric characters, at least one letter and one number"
Here is my current condition for checking for alphanumeric characters
pword.matches("([[a-zA-Z]&&0-9])*")
unfortunately in example using "rrrrZZ1" as the password this condition still returns false
What would be the correct regular expression? Thank you
Someone else here may prove me wrong, but this is going to be very difficult to do without an excessively complex regular expression. I'd just use a non-regular expression approach instead. Set 3 booleans for each of your conditions, loop through the characters and set each boolean as each condition is met, and if all 3 booleans don't equal true, then fail the verification.
You could use something as simple as this:
public boolean validatePassword(String password){
if(password.length() < 6){
return false;
}
boolean foundLetter = false;
boolean foundNumber = false;
for(int i=0; i < password.length(); i++){
char c = password.charAt(i);
if(Character.isLetter(c)){
foundLetter = true;
}else if(Character.isDigit(c)){
foundNumber = true;
}else{
// Isn't alpha-numeric.
return false;
}
}
return foundLetter && foundNumber;
}
I agree with ziesemer - use a validatePassword() method instead of cramming it into regex.
Much more readable for a developer to maintain.
If you do want to go down the regex path, it is achievable using zero width positive lookaheads.
Contains six characters, at least one letter and one number:
^.*(?=.{6,})(?=.*[a-zA-Z]).*$
I changed your six alphanumeric characters to just characters.
Supports more complex passwords :)
Great post on the topic:
http://www.zorched.net/2009/05/08/password-strength-validation-with-regular-expressions/
Bookmark this one too:
http://www.regular-expressions.info/

Categories