How to check if a String contains only ASCII? - java

The call Character.isLetter(c) returns true if the character is a letter. But is there a way to quickly find if a String only contains the base characters of ASCII?

From Guava 19.0 onward, you may use:
boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);
This uses the matchesAllOf(someString) method which relies on the factory method ascii() rather than the now deprecated ASCII singleton.
Here ASCII includes all ASCII characters including the non-printable characters lower than 0x20 (space) such as tabs, line-feed / return but also BEL with code 0x07 and DEL with code 0x7F.
This code works on chars (UTF-16 code units) rather than code points, even though the comments of earlier versions mention code points. Fortunately, a code point of U+10000 or above is encoded as two surrogate chars whose values lie outside the ASCII range, so the method still tests for ASCII correctly, even for strings containing emoji.
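The surrogate argument can be checked with the JDK alone. A minimal sketch, where the class and method names are made up for illustration, comparing a per-char test with a per-code-point test:

```java
class SurrogateAsciiDemo {
    // Per-char test: every UTF-16 code unit must be below 128.
    static boolean isAsciiByChar(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // Per-code-point test: every Unicode code point must be below 128.
    static boolean isAsciiByCodePoint(String s) {
        return s.codePoints().allMatch(cp -> cp < 128);
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600, stored as two surrogate chars
        String mixed = "ok " + emoji;
        // Both surrogate halves are far above 127, so both tests reject the string.
        System.out.println(isAsciiByChar(mixed));
        System.out.println(isAsciiByCodePoint(mixed));
        System.out.println(isAsciiByChar("plain text"));
    }
}
```

Both approaches should agree for any input, because no surrogate value falls in the range 0 to 127.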
For earlier Guava versions without the ascii() method you may write:
boolean isAscii = CharMatcher.ASCII.matchesAllOf(someString);

You can do it with java.nio.charset.Charset.
import java.nio.charset.Charset;
public class StringUtils {
public static boolean isPureAscii(String v) {
return Charset.forName("US-ASCII").newEncoder().canEncode(v);
// or "ISO-8859-1" for ISO Latin 1
// or StandardCharsets.US_ASCII with JDK1.7+
}
public static void main (String args[])
throws Exception {
String test = "Réal";
System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
test = "Real";
System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
/*
* output :
* Réal isPureAscii() : false
* Real isPureAscii() : true
*/
}
}
Detect non-ASCII character in a String

Here is another way not depending on a library but using a regex.
You can use this single line:
text.matches("\\A\\p{ASCII}*\\z")
Whole example program:
public class Main {
public static void main(String[] args) {
char nonAscii = 0x00FF;
String asciiText = "Hello";
String nonAsciiText = "Buy: " + nonAscii;
System.out.println(asciiText.matches("\\A\\p{ASCII}*\\z"));
System.out.println(nonAsciiText.matches("\\A\\p{ASCII}*\\z"));
}
}
Understanding the regex :
\\A : Beginning of input
\\p{ASCII} : Any ASCII character
* : zero or more repetitions
\\z : End of input
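If the check runs often, the same regex can be compiled once instead of on every call. A small sketch of that idea; the class name AsciiRegex and the constant name are assumptions, not part of the answer above:

```java
import java.util.regex.Pattern;

class AsciiRegex {
    // Compiled once; String.matches compiles its regex argument on each call.
    private static final Pattern ASCII_ONLY = Pattern.compile("\\A\\p{ASCII}*\\z");

    static boolean isAscii(String text) {
        return ASCII_ONLY.matcher(text).matches();
    }

    public static void main(String[] args) {
        char nonAscii = 0x00FF;
        System.out.println(isAscii("Hello"));
        System.out.println(isAscii("Buy: " + nonAscii));
    }
}
```

Note that the empty string is accepted, because * allows zero repetitions.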

Iterate through the string and make sure all the characters have a value less than 128.
Java Strings are conceptually encoded as UTF-16. In UTF-16, the ASCII character set is encoded as the values 0 - 127, and the encoding of any non-ASCII character (which may consist of more than one Java char) is guaranteed not to include the values 0 - 127.

Or you copy the code from the IDN class.
// to check if a string only contains US-ASCII code point
//
private static boolean isAllASCII(String input) {
boolean isASCII = true;
for (int i = 0; i < input.length(); i++) {
int c = input.charAt(i);
if (c > 0x7F) {
isASCII = false;
break;
}
}
return isASCII;
}

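commons-lang3 from Apache contains valuable utility/convenience methods for all kinds of 'problems', including this one. Note that isAsciiPrintable only accepts the printable range (32 to 126), so control characters such as tabs and line feeds are rejected.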
System.out.println(StringUtils.isAsciiPrintable("!#£$%^&!#£$%^"));

try this:
for (char c: string.toCharArray()){
if (((int)c)>127){
return false;
}
}
return true;

This will return true if the String only contains ASCII characters and false when it does not:
Charset.forName("US-ASCII").newEncoder().canEncode(str)
If you want to remove the non-ASCII characters, here is a snippet:
if(!Charset.forName("US-ASCII").newEncoder().canEncode(str)) {
str = str.replaceAll("[^\\p{ASCII}]", "");
}

In Java 8 and above, one can use String#codePoints in conjunction with IntStream#allMatch.
boolean allASCII = str.codePoints().allMatch(c -> c < 128);

In Kotlin:
fun String.isAsciiString() : Boolean =
this.toCharArray().none { it < ' ' || it > '~' }

Iterate through the string, and use charAt() to get the char. Then treat it as an int, and see if it has a unicode value (a superset of ASCII) which you like.
Break at the first you don't like.

private static boolean isASCII(String s)
{
for (int i = 0; i < s.length(); i++)
if (s.charAt(i) > 127)
return false;
return true;
}

It is possible. Nice problem.
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
public class EncodingTest {
static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII")
.newEncoder();
public static void main(String[] args) {
String testStr = "¤EÀsÆW°ê»Ú®i¶T¤¤¤ß3¼Ó®i¶TÆU2~~KITEC 3/F Rotunda 2";
String[] strArr = testStr.split("~~", 2);
int count = 0;
boolean encodeFlag = false;
do {
encodeFlag = asciiEncoderTest(strArr[count]);
System.out.println(encodeFlag);
count++;
} while (count < strArr.length);
}
public static boolean asciiEncoderTest(String test) {
boolean encodeFlag = false;
try {
encodeFlag = asciiEncoder.canEncode(new String(test
.getBytes("ISO8859_1"), "BIG5"));
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return encodeFlag;
}
}

// returns true if c is an uppercase or lowercase ASCII letter
public boolean isASCIILetter(char c) {
return (c > 64 && c < 91) || (c > 96 && c < 123);
}

Related

Check number is valid number or not [duplicate]

How would you check if a String was a number before parsing it?
This is generally done with a simple user-defined function (i.e. Roll-your-own "isNumeric" function).
Something like:
public static boolean isNumeric(String str) {
try {
Double.parseDouble(str);
return true;
} catch(NumberFormatException e){
return false;
}
}
However, if you're calling this function a lot, and you expect many of the checks to fail due to not being a number then performance of this mechanism will not be great, since you're relying upon exceptions being thrown for each failure, which is a fairly expensive operation.
An alternative approach may be to use a regular expression to check for validity of being a number:
public static boolean isNumeric(String str) {
return str.matches("-?\\d+(\\.\\d+)?"); //match a number with optional '-' and decimal.
}
Be careful with the above RegEx mechanism, though, as it will fail if you're using non-Arabic digits (i.e. numerals other than 0 through to 9). This is because the "\d" part of the RegEx will only match [0-9] and effectively isn't internationally numerically aware. (Thanks to OregonGhost for pointing this out!)
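To make the digit caveat concrete, here is a small sketch using only java.util.regex. The class name and the sample string are made up for illustration; Pattern.UNICODE_CHARACTER_CLASS is the standard flag that widens \d to Unicode digits:

```java
import java.util.regex.Pattern;

class UnicodeDigitDemo {
    // Default \d: ASCII digits 0-9 only.
    static final Pattern ASCII_DIGITS = Pattern.compile("\\d+");
    // With UNICODE_CHARACTER_CLASS, \d matches any Unicode decimal digit.
    static final Pattern UNICODE_DIGITS =
            Pattern.compile("\\d+", Pattern.UNICODE_CHARACTER_CLASS);

    public static void main(String[] args) {
        String arabicIndic = "\u0661\u0662\u0663"; // Arabic-Indic digits one, two, three
        System.out.println(ASCII_DIGITS.matcher(arabicIndic).matches());   // false
        System.out.println(UNICODE_DIGITS.matcher(arabicIndic).matches()); // true
        System.out.println(Character.isDigit(arabicIndic.charAt(0)));      // true
    }
}
```

Whether the wider match is desirable depends on what later consumes the string, so treat the flag as an option rather than a fix.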
Or even another alternative is to use Java's built-in java.text.NumberFormat object to see if, after parsing the string the parser position is at the end of the string. If it is, we can assume the entire string is numeric:
public static boolean isNumeric(String str) {
ParsePosition pos = new ParsePosition(0);
NumberFormat.getInstance().parse(str, pos);
return str.length() == pos.getIndex();
}
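One edge case of the ParsePosition approach: for an empty string the parser consumes nothing, so the index (0) equals the length (0) and the method reports true. A hedged variant with a guard and an explicit locale could look like this; the class name and the Locale parameter are assumptions added for the example:

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class NumberFormatCheck {
    static boolean isNumeric(String str, Locale locale) {
        if (str == null || str.isEmpty()) {
            return false; // without this guard "" would pass, since 0 == 0
        }
        ParsePosition pos = new ParsePosition(0);
        NumberFormat.getInstance(locale).parse(str, pos);
        // Numeric only if the parser consumed the whole string.
        return pos.getIndex() == str.length();
    }

    public static void main(String[] args) {
        System.out.println(isNumeric("12.5", Locale.US));
        System.out.println(isNumeric("12abc", Locale.US));
        System.out.println(isNumeric("12,5", Locale.GERMANY));
    }
}
```

Passing the locale explicitly keeps the result independent of the machine's default settings.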
With Apache Commons Lang 3.5 and above: NumberUtils.isCreatable or StringUtils.isNumeric.
With Apache Commons Lang 3.4 and below: NumberUtils.isNumber or StringUtils.isNumeric.
You can also use StringUtils.isNumericSpace which returns true for empty strings and ignores internal spaces in the string. Another way is to use NumberUtils.isParsable which basically checks the number is parsable according to Java. (The linked javadocs contain detailed examples for each method.)
Java 8 lambda expressions.
String someString = "123123";
boolean isNumeric = someString.chars().allMatch( Character::isDigit );
If you are on Android, then you should use:
android.text.TextUtils.isDigitsOnly(CharSequence str)
keep it simple. mostly everybody can "re-program" (the same thing).
As @CraigTP mentioned in his excellent answer, I also have similar performance concerns about using exceptions to test whether the string is numeric or not. So I ended up splitting the string and using java.lang.Character.isDigit().
public static boolean isNumeric(String str)
{
for (char c : str.toCharArray())
{
if (!Character.isDigit(c)) return false;
}
return true;
}
According to the Javadoc, Character.isDigit(char) correctly recognizes non-Latin digits. Performance-wise, I think a simple N number of comparisons, where N is the number of characters in the string, would be more computationally efficient than doing a regex match.
UPDATE: As pointed out by Jean-François Corbett in the comments, the above code only validates positive integers, which covers the majority of my use cases. Below is the updated code that correctly validates decimal numbers according to the default locale used in your system, with the assumption that the decimal separator only occurs once in the string.
public static boolean isStringNumeric( String str )
{
DecimalFormatSymbols currentLocaleSymbols = DecimalFormatSymbols.getInstance();
char localeMinusSign = currentLocaleSymbols.getMinusSign();
if ( !Character.isDigit( str.charAt( 0 ) ) && str.charAt( 0 ) != localeMinusSign ) return false;
boolean isDecimalSeparatorFound = false;
char localeDecimalSeparator = currentLocaleSymbols.getDecimalSeparator();
for ( char c : str.substring( 1 ).toCharArray() )
{
if ( !Character.isDigit( c ) )
{
if ( c == localeDecimalSeparator && !isDecimalSeparatorFound )
{
isDecimalSeparatorFound = true;
continue;
}
return false;
}
}
return true;
}
Google's Guava library provides a nice helper method to do this: Ints.tryParse. You use it like Integer.parseInt but it returns null rather than throw an Exception if the string does not parse to a valid integer. Note that it returns Integer, not int, so you have to convert/autobox it back to int.
Example:
String s1 = "22";
String s2 = "22.2";
Integer oInt1 = Ints.tryParse(s1);
Integer oInt2 = Ints.tryParse(s2);
int i1 = -1;
if (oInt1 != null) {
i1 = oInt1.intValue();
}
int i2 = -1;
if (oInt2 != null) {
i2 = oInt2.intValue();
}
System.out.println(i1); // prints 22
System.out.println(i2); // prints -1
However, as of the current release -- Guava r11 -- it is still marked @Beta.
I haven't benchmarked it. Looking at the source code there is some overhead from a lot of sanity checking but in the end they use Character.digit(string.charAt(idx)), similar to, but slightly different from, the answer from @Ibrahim above. There is no exception handling overhead under the covers in their implementation.
Do not use Exceptions to validate your values.
Use Util libs instead like apache NumberUtils:
NumberUtils.isNumber(myStringValue);
Edit:
Please note that if your string starts with a 0, NumberUtils will interpret your value as octal.
NumberUtils.isNumber("07") //true
NumberUtils.isNumber("08") //false
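The 07/08 pattern is what a leading-zero octal rule produces, since 8 is not an octal digit. The same convention can be observed with the JDK's Integer.decode. This is only an illustration of the convention, not the NumberUtils implementation, and the class and helper names are made up:

```java
class LeadingZeroDemo {
    // True when Integer.decode accepts the text (decimal, 0x hex, or leading-0 octal).
    static boolean decodes(String s) {
        try {
            Integer.decode(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.decode("010")); // octal 10, i.e. 8
        System.out.println(decodes("07"));         // true
        System.out.println(decodes("08"));         // false, 8 is not an octal digit
    }
}
```

If leading zeros are legitimate in your input (IDs, postal codes), strip them or use a plain digit check before calling a method that applies this rule.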
Why is everyone pushing for exception/regex solutions?
While I can understand most people are fine with using try/catch, if you want to do it frequently... it can be extremely taxing.
What I did here was take the regex, the parseNumber() methods, and the array searching method to see which was the most efficient. This time, I only looked at integer numbers.
public static boolean isNumericRegex(String str) {
if (str == null)
return false;
return str.matches("-?\\d+");
}
public static boolean isNumericArray(String str) {
if (str == null)
return false;
char[] data = str.toCharArray();
if (data.length <= 0)
return false;
int index = 0;
if (data[0] == '-' && data.length > 1)
index = 1;
for (; index < data.length; index++) {
if (data[index] < '0' || data[index] > '9') // Character.isDigit() can go here too.
return false;
}
return true;
}
public static boolean isNumericException(String str) {
if (str == null)
return false;
try {
/* int i = */ Integer.parseInt(str);
} catch (NumberFormatException nfe) {
return false;
}
return true;
}
The results in speed I got were:
Done with: for (int i = 0; i < 10000000; i++)...
With only valid numbers ("59815833" and "-59815833"):
Array numeric took 395.808192 ms [39.5808192 ns each]
Regex took 2609.262595 ms [260.9262595 ns each]
Exception numeric took 428.050207 ms [42.8050207 ns each]
// Negative sign
Array numeric took 355.788273 ms [35.5788273 ns each]
Regex took 2746.278466 ms [274.6278466 ns each]
Exception numeric took 518.989902 ms [51.8989902 ns each]
// Single value ("1")
Array numeric took 317.861267 ms [31.7861267 ns each]
Regex took 2505.313201 ms [250.5313201 ns each]
Exception numeric took 239.956955 ms [23.9956955 ns each]
// With Character.isDigit()
Array numeric took 400.734616 ms [40.0734616 ns each]
Regex took 2663.052417 ms [266.3052417 ns each]
Exception numeric took 401.235906 ms [40.1235906 ns each]
With invalid characters ("5981a5833" and "a"):
Array numeric took 343.205793 ms [34.3205793 ns each]
Regex took 2608.739933 ms [260.8739933 ns each]
Exception numeric took 7317.201775 ms [731.7201775 ns each]
// With a single character ("a")
Array numeric took 291.695519 ms [29.1695519 ns each]
Regex took 2287.25378 ms [228.725378 ns each]
Exception numeric took 7095.969481 ms [709.5969481 ns each]
With null:
Array numeric took 214.663834 ms [21.4663834 ns each]
Regex took 201.395992 ms [20.1395992 ns each]
Exception numeric took 233.049327 ms [23.3049327 ns each]
Exception numeric took 6603.669427 ms [660.3669427 ns each] if there is no if/null check
Disclaimer: I'm not claiming these methods are 100% optimized, they're just for demonstration of the data
Exceptions won if and only if the number is 4 characters or less, and every string is always a number... in which case, why even have a check?
In short, it is extremely painful if you run into invalid numbers frequently with the try/catch, which makes sense. An important rule I always follow is NEVER use try/catch for program flow. This is an example why.
Interestingly, the simple if char <0 || >9 was extremely simple to write, easy to remember (and should work in multiple languages) and wins almost all the test scenarios.
The only downside is that I'm guessing Integer.parseInt() might handle non ASCII numbers, whereas the array searching method does not.
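That guess can be checked directly. Integer.parseInt goes through Character.digit, which accepts Unicode decimal digits, while the range check only accepts '0' to '9'. A small sketch, where the class name and sample string are made up for illustration:

```java
class NonAsciiDigitDemo {
    // The ASCII-only range check, without sign handling.
    static boolean isAsciiDigits(String str) {
        if (str == null || str.isEmpty()) {
            return false;
        }
        for (char c : str.toCharArray()) {
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String arabicIndic = "\u0661\u0662\u0663"; // Arabic-Indic digits one, two, three
        System.out.println(Integer.parseInt(arabicIndic)); // 123
        System.out.println(isAsciiDigits(arabicIndic));    // false
    }
}
```

So the two methods disagree on such input; which behavior is right depends on what the rest of the program expects.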
For those wondering why I said it's easy to remember the character array one, if you know there's no negative signs, you can easily get away with something condensed as this:
public static boolean isNumericArray(String str) {
if (str == null)
return false;
for (char c : str.toCharArray())
if (c < '0' || c > '9')
return false;
return true;
}
Lastly, as a final note, I was curious about the assignment operator in the accepted example with all the upvotes. Adding in the assignment of
double d = Double.parseDouble(...)
is not only useless since you don't even use the value, but it wastes processing time and increased the runtime by a few nanoseconds (which led to a 100-200 ms increase in the tests). I can't see why anyone would do that since it actually is extra work to reduce performance.
You'd think that would be optimized out... though maybe I should check the bytecode and see what the compiler is doing. That doesn't explain why it always showed up as lengthier for me though if it somehow is optimized out... therefore I wonder what's going on. As a note: By lengthier, I mean running the test for 10000000 iterations, and running that program multiple times (10x+) always showed it to be slower.
EDIT: Updated a test for Character.isDigit()
public static boolean isNumeric(String str)
{
return str.matches("-?\\d+(.\\d+)?");
}
CraigTP's regular expression (shown above) produces some false positives. E.g. "23y4" will be counted as a number because an unescaped '.' matches any character, not just the decimal point.
It will also reject any number with a leading '+'.
An alternative which avoids these two minor problems is
public static boolean isNumeric(String str)
{
return str.matches("[+-]?\\d*(\\.\\d+)?");
}
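One thing to watch with this alternative: every part of the pattern is optional, so the empty string and a lone sign also match. A possible stricter variant is sketched below. The STRICT pattern is my assumption about the intent (at least one digit required), not something from the answer:

```java
import java.util.regex.Pattern;

class SignedDecimalRegex {
    // The pattern from the answer above: "" and "+" both match.
    static final Pattern LENIENT = Pattern.compile("[+-]?\\d*(\\.\\d+)?");
    // Stricter variant: digits with an optional fraction, or a bare fraction like ".5".
    static final Pattern STRICT = Pattern.compile("[+-]?(\\d+(\\.\\d+)?|\\.\\d+)");

    static boolean isNumeric(String str) {
        return str != null && STRICT.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(LENIENT.matcher("").matches()); // true
        System.out.println(isNumeric(""));                 // false
        System.out.println(isNumeric("+1.5"));             // true
        System.out.println(isNumeric("23y4"));             // false
    }
}
```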
We can try replacing all the digits in the given string with "" (the empty string); if the length of the string is then zero, we can say that the given string contains only digits.
Example:
boolean isNumber(String str){
if(str.length() == 0)
return false; //To check if string is empty
if(str.charAt(0) == '-')
str = str.replaceFirst("-","");// for handling -ve numbers
System.out.println(str);
str = str.replaceFirst("\\.",""); //to check if it contains more than one decimal points
if(str.length() == 0)
return false; // to check if it is empty string after removing -ve sign and decimal point
System.out.println(str);
return str.replaceAll("[0-9]","").length() == 0;
}
You can use NumberFormat#parse:
try
{
NumberFormat.getInstance().parse(value);
}
catch(ParseException e)
{
// Not a number.
}
If you are using Java to develop an Android app, you could use the TextUtils.isDigitsOnly function.
Here was my answer to the problem.
A catch-all convenience method which you can use to parse any String with any type of parser: isParsable(Object parser, String str). The parser can be a Class or an object. This also allows you to use custom parsers you've written and should work for every scenario, e.g.:
isParsable(Integer.class, "11");
isParsable(Double.class, "11.11");
Object dateFormater = new java.text.SimpleDateFormat("yyyy.MM.dd G 'at' HH:mm:ss z");
isParsable(dateFormater, "2001.07.04 AD at 12:08:56 PDT");
Here's my code complete with method descriptions.
import java.lang.reflect.*;
/**
* METHOD: isParsable<p><p>
*
* This method will look through the methods of the specified <code>from</code> parameter
* looking for a public method name starting with "parse" which has only one String
* parameter.<p>
*
* The <code>parser</code> parameter can be a class or an instantiated object, eg:
* <code>Integer.class</code> or <code>new Integer(1)</code>. If you use a
* <code>Class</code> type then only static methods are considered.<p>
*
* When looping through potential methods, it first looks at the <code>Class</code> associated
* with the <code>parser</code> parameter, then looks through the methods of the parent's class
* followed by subsequent ancestors, using the first method that matches the criteria specified
* above.<p>
*
* This method will hide any normal parse exceptions, but throws any exceptions due to
* programmatic errors, eg: NullPointerExceptions, etc. If you specify a <code>parser</code>
* parameter which has no matching parse methods, a NoSuchMethodException will be thrown
* embedded within a RuntimeException.<p><p>
*
* Example:<br>
* <code>isParsable(Boolean.class, "true");<br>
* isParsable(Integer.class, "11");<br>
* isParsable(Double.class, "11.11");<br>
* Object dateFormater = new java.text.SimpleDateFormat("yyyy.MM.dd G 'at' HH:mm:ss z");<br>
* isParsable(dateFormater, "2001.07.04 AD at 12:08:56 PDT");<br></code>
* <p>
*
* @param parser The Class type or instantiated Object to find a parse method in.
* @param str The String you want to parse
*
* @return true if a parse method was found and completed without exception
* @throws java.lang.NoSuchMethodException If no such method is accessible
*/
public static boolean isParsable(Object parser, String str) {
Class theClass = (parser instanceof Class? (Class)parser: parser.getClass());
boolean staticOnly = (parser == theClass), foundAtLeastOne = false;
Method[] methods = theClass.getMethods();
// Loop over methods
for (int index = 0; index < methods.length; index++) {
Method method = methods[index];
// If method starts with parse, is public and has one String parameter.
// If the parser parameter was a Class, then also ensure the method is static.
if(method.getName().startsWith("parse") &&
(!staticOnly || Modifier.isStatic(method.getModifiers())) &&
Modifier.isPublic(method.getModifiers()) &&
method.getGenericParameterTypes().length == 1 &&
method.getGenericParameterTypes()[0] == String.class)
{
try {
foundAtLeastOne = true;
method.invoke(parser, str);
return true; // Successfully parsed without exception
} catch (Exception exception) {
// If invoke problem, try a different method
/*if(!(exception instanceof IllegalArgumentException) &&
!(exception instanceof IllegalAccessException) &&
!(exception instanceof InvocationTargetException))
continue; // Look for other parse methods*/
// Parse method refuses to parse, look for another different method
continue; // Look for other parse methods
}
}
}
// No more accessible parse method could be found.
if(foundAtLeastOne) return false;
else throw new RuntimeException(new NoSuchMethodException());
}
/**
* METHOD: willParse<p><p>
*
* A convenience method which calls the isParsable method, but does not throw any exceptions
* which could be thrown through programmatic errors.<p>
*
* Use of {@link #isParsable(Object, String) isParsable} is recommended so programmatic
* errors can be caught in development, unless the value of the <code>parser</code> parameter is
* unpredictable, or normal programmatic exceptions should be ignored.<p>
*
* See {@link #isParsable(Object, String) isParsable} for full description of method
* usability.<p>
*
* @param parser The Class type or instantiated Object to find a parse method in.
* @param str The String you want to parse
*
* @return true if a parse method was found and completed without exception
* @see #isParsable(Object, String) for full description of method usability
*/
public static boolean willParse(Object parser, String str) {
try {
return isParsable(parser, str);
} catch(Throwable exception) {
return false;
}
}
To match only positive base-ten integers that contain only ASCII digits, use:
public static boolean isNumeric(String maybeNumeric) {
return maybeNumeric != null && maybeNumeric.matches("[0-9]+");
}
A well-performing approach avoiding try-catch and handling negative numbers and scientific notation.
private static final Pattern PATTERN = Pattern.compile( "^(-?0|-?[1-9]\\d*)(\\.\\d+)?(E\\d+)?$" );
public static boolean isNumeric( String value )
{
return value != null && PATTERN.matcher( value ).matches();
}
Regex Matching
Here is another example: CraigTP's regex, upgraded with more validations.
public static boolean isNumeric(String str)
{
return str.matches("^(?:(?:\\-{1})?\\d+(?:\\.{1}\\d+)?)$");
}
Only one negative sign - allowed and must be in beginning.
After negative sign there must be digit.
Only one decimal sign . allowed.
After decimal sign there must be digit.
Regex Test
1 -- **VALID**
1. -- INVALID
1.. -- INVALID
1.1 -- **VALID**
1.1.1 -- INVALID
-1 -- **VALID**
--1 -- INVALID
-1. -- INVALID
-1.1 -- **VALID**
-1.1.1 -- INVALID
Here is my class for checking if a string is numeric. It also fixes numerical strings:
Features:
Removes unnecessary zeros ["12.0000000" -> "12"]
Removes unnecessary zeros ["12.0580000" -> "12.058"]
Removes non numerical characters ["12.00sdfsdf00" -> "12"]
Handles negative string values ["-12,020000" -> "-12.02"]
Removes multiple dots ["-12.0.20.000" -> "-12.02"]
No extra libraries, just standard Java
Here you go...
public class NumUtils {
/**
* Transforms a string to an integer. If no numerical chars returns a String "0".
*
* @param str
* @return retStr
*/
static String makeToInteger(String str) {
String s = str;
double d;
d = Double.parseDouble(makeToDouble(s));
int i = (int) (d + 0.5D);
String retStr = String.valueOf(i);
System.out.printf(retStr + " ");
return retStr;
}
/**
* Transforms a string to an double. If no numerical chars returns a String "0".
*
* @param str
* @return retStr
*/
static String makeToDouble(String str) {
Boolean dotWasFound = false;
String orgStr = str;
String retStr;
int firstDotPos = 0;
Boolean negative = false;
//check if str is empty
if(str.length()==0){
str="0";
}
//check if first sign is "-"
if (str.charAt(0) == '-') {
negative = true;
}
//check if str contains any digit or else set the string to "0"
if (!str.matches(".*\\d+.*")) {
str = "0";
}
//Replace ',' with '.' (for some european users who use the ',' as decimal separator)
str = str.replaceAll(",", ".");
str = str.replaceAll("[^\\d.]", "");
//Removes every dot after the first one
for (int i_char = 0; i_char < str.length(); i_char++) {
if (str.charAt(i_char) == '.') {
dotWasFound = true;
firstDotPos = i_char;
break;
}
}
if (dotWasFound) {
String befDot = str.substring(0, firstDotPos + 1);
String aftDot = str.substring(firstDotPos + 1, str.length());
aftDot = aftDot.replaceAll("\\.", "");
str = befDot + aftDot;
}
//Removes zeros from the beginning
double uglyMethod = Double.parseDouble(str);
str = String.valueOf(uglyMethod);
//Removes the .0
str = str.replaceAll("([0-9])\\.0+([^0-9]|$)", "$1$2");
retStr = str;
if (negative) {
retStr = "-"+retStr;
}
return retStr;
}
static boolean isNumeric(String str) {
try {
double d = Double.parseDouble(str);
} catch (NumberFormatException nfe) {
return false;
}
return true;
}
}
Exceptions are expensive, but in this case the RegEx takes much longer. The code below shows a simple test of two functions -- one using exceptions and one using regex. On my machine the RegEx version is 10 times slower than the exception.
import java.util.Date;
public class IsNumeric {
public static boolean isNumericOne(String s) {
return s.matches("-?\\d+(\\.\\d+)?"); //match a number with optional '-' and decimal.
}
public static boolean isNumericTwo(String s) {
try {
Double.parseDouble(s);
return true;
} catch (Exception e) {
return false;
}
}
public static void main(String [] args) {
String test = "12345.F";
long before = new Date().getTime();
for(int x=0;x<1000000;++x) {
//isNumericTwo(test);
isNumericOne(test);
}
long after = new Date().getTime();
System.out.println(after-before);
}
}
// please check below code
public static boolean isDigitsOnly(CharSequence str) {
final int len = str.length();
for (int i = 0; i < len; i++) {
if (!Character.isDigit(str.charAt(i))) {
return false;
}
}
return true;
}
You can use the java.util.Scanner object.
public static boolean isNumeric(String inputData) {
Scanner sc = new Scanner(inputData);
return sc.hasNextInt();
}
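Note that hasNextInt() only looks at the first token, so an input like "42 abc" is reported as numeric. A hedged variant that also requires the input to end after that token is sketched here; the class and method names are made up for the example:

```java
import java.util.Scanner;

class ScannerCheck {
    static boolean isSingleInt(String input) {
        if (input == null) {
            return false;
        }
        try (Scanner sc = new Scanner(input)) {
            if (!sc.hasNextInt()) {
                return false;  // first token is missing or not an int
            }
            sc.nextInt();      // consume the int
            return !sc.hasNext(); // nothing but whitespace may follow
        }
    }

    public static void main(String[] args) {
        System.out.println(isSingleInt("42"));
        System.out.println(isSingleInt("42 abc"));
        System.out.println(isSingleInt("abc"));
    }
}
```

Surrounding whitespace is still tolerated, because Scanner treats it as a delimiter.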
// only int
public static boolean isNumber(int num)
{
return (num >= 48 && num <= 57); // 0 - 9
}
// is type of number including . - e E
public static boolean isNumber(String s)
{
boolean isNumber = true;
for(int i = 0; i < s.length() && isNumber; i++)
{
char c = s.charAt(i);
isNumber = isNumber & (
(c >= '0' && c <= '9') || (c == '.') || (c == 'e') || (c == 'E') || (c == '-')
);
}
return isNumber;
}
// is type of number
public static boolean isInteger(String s)
{
boolean isInteger = true;
for(int i = 0; i < s.length() && isInteger; i++)
{
char c = s.charAt(i);
isInteger = isInteger & ((c >= '0' && c <= '9'));
}
return isInteger;
}
public static boolean isNumeric(String s)
{
try
{
Double.parseDouble(s);
return true;
}
catch (Exception e)
{
return false;
}
}
This is a simple example for this check:
public static boolean isNumericString(String input) {
boolean result = false;
if(input != null && input.length() > 0) {
char[] charArray = input.toCharArray();
for(char c : charArray) {
if(c >= '0' && c <= '9') {
// it is a digit
result = true;
} else {
result = false;
break;
}
}
}
return result;
}
I have illustrated some conditions to check numbers and decimals without using any API,
Check Fix Length 1 digit number
Character.isDigit(char)
Check Fix Length number (Assume length is 6)
String number = "132452";
if(number.matches("([0-9]{6})"))
System.out.println("6 digits number identified");
Check Varying Length number between (Assume 4 to 6 length)
// {n,m} n <= length <= m
String number = "132452";
if(number.matches("([0-9]{4,6})"))
System.out.println("Number Identified between 4 to 6 length");
String number = "132";
if(!number.matches("([0-9]{4,6})"))
System.out.println("Number not in length range or different format");
Check Varying Length decimal number between (Assume 4 to 7 length)
// It will not count the '.' (Period) in length
String decimal = "132.45";
if(decimal.matches("(-?[0-9]+(\\.)?[0-9]*){4,6}"))
System.out.println("Numbers Identified between 4 to 7");
String decimal = "1.12";
if(decimal.matches("(-?[0-9]+(\\.)?[0-9]*){4,6}"))
System.out.println("Numbers Identified between 4 to 7");
String decimal = "1234";
if(decimal.matches("(-?[0-9]+(\\.)?[0-9]*){4,6}"))
System.out.println("Numbers Identified between 4 to 7");
String decimal = "-10.123";
if(decimal.matches("(-?[0-9]+(\\.)?[0-9]*){4,6}"))
System.out.println("Numbers Identified between 4 to 7");
String decimal = "123..4";
if(!decimal.matches("(-?[0-9]+(\\.)?[0-9]*){4,6}"))
System.out.println("Decimal not in range or different format");
String decimal = "132";
if(!decimal.matches("(-?[0-9]+(\\.)?[0-9]*){4,6}"))
System.out.println("Decimal not in range or different format");
String decimal = "1.1";
if(!decimal.matches("(-?[0-9]+(\\.)?[0-9]*){4,6}"))
System.out.println("Decimal not in range or different format");
Hope it helps.
Based off of other answers I wrote my own and it doesn't use patterns or parsing with exception checking.
It checks for a maximum of one minus sign and checks for a maximum of one decimal point.
Here are some examples and their results:
"1", "-1", "-1.5" and "-1.556" return true
"1..5", "1A.5", "1.5D", "-" and "--1" return false
Note: If needed you can modify this to accept a Locale parameter and pass that into the DecimalFormatSymbols.getInstance() calls to use a specific Locale instead of the current one.
public static boolean isNumeric(final String input) {
//Check for null or blank string
if(input == null || input.isBlank()) return false;
//Retrieve the minus sign and decimal separator characters from the current Locale
final var localeMinusSign = DecimalFormatSymbols.getInstance().getMinusSign();
final var localeDecimalSeparator = DecimalFormatSymbols.getInstance().getDecimalSeparator();
//Check if first character is a minus sign
final var isNegative = input.charAt(0) == localeMinusSign;
//Check if string is not just a minus sign
if (isNegative && input.length() == 1) return false;
var isDecimalSeparatorFound = false;
//If the string has a minus sign ignore the first character
final var startCharIndex = isNegative ? 1 : 0;
//Check if each character is a number or a decimal separator
//and make sure string only has a maximum of one decimal separator
for (var i = startCharIndex; i < input.length(); i++) {
if(!Character.isDigit(input.charAt(i))) {
if(input.charAt(i) == localeDecimalSeparator && !isDecimalSeparatorFound) {
isDecimalSeparatorFound = true;
} else return false;
}
}
return true;
}
For non-negative number use this
public boolean isNonNegativeNumber(String str) {
return str.matches("\\d+");
}
For any number use this
public boolean isNumber(String str) {
return str.matches("-?\\d+");
}
I modified CraigTP's solution to accept scientific notation and both dot and comma as decimal separators as well
^-?\d+([,\.]\d+)?([eE]-?\d+)?$
example
var re = /^-?\d+([,\.]\d+)?([eE]-?\d+)?$/;
re.test("-6546"); // true
re.test("-6546355e-4456"); // true
re.test("-6546.355e-4456"); // true, though debatable
re.test("-6546.35.5e-4456"); // false
re.test("-6546.35.5e-4456.6"); // false
That's why I like the Try* approach in .NET. In addition to the traditional Parse method that's like the Java one, you also have a TryParse method. I'm not good in Java syntax (out parameters?), so please treat the following as some kind of pseudo-code. It should make the concept clear though.
boolean parseInteger(String s, out int number)
{
try {
number = Integer.parseInt(s);
return true;
} catch(NumberFormatException e) {
return false;
}
}
Usage:
int num;
if (parseInteger("23", out num)) {
// Do something with num.
}
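Java has no out parameters, but a similar Try* shape can be approximated by returning an OptionalInt. A minimal sketch under that assumption; tryParseInt is a hypothetical helper, not a JDK method:

```java
import java.util.OptionalInt;

class TryParse {
    // Returns the parsed value, or an empty OptionalInt when s is not a valid int.
    static OptionalInt tryParseInt(String s) {
        try {
            return OptionalInt.of(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt num = tryParseInt("23");
        if (num.isPresent()) {
            System.out.println(num.getAsInt()); // do something with the value
        }
        System.out.println(tryParseInt("abc").isPresent());
    }
}
```

This still uses an exception internally, so it only tidies up the call site; it does not remove the cost of failed parses.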
Parse it (i.e. with Integer#parseInt ) and simply catch the exception. =)
To clarify: The parseInt function checks if it can parse the number in any case (obviously) and if you want to parse it anyway, you are not going to take any performance hit by actually doing the parsing.
If you would not want to parse it (or parse it very, very rarely) you might wish to do it differently of course.
You can use NumberUtils.isCreatable() from Apache Commons Lang.
NumberUtils.isNumber will be deprecated in 4.0, so use NumberUtils.isCreatable() instead.
Java 8 Stream, lambda expression, functional interface
All cases handled (string null, string empty etc)
String someString = null; // something="", something="123abc", something="123123"
boolean isNumeric = Stream.of(someString)
.filter(s -> s != null && !s.isEmpty())
.filter(Pattern.compile("\\D").asPredicate().negate())
.mapToLong(Long::valueOf)
.boxed()
.findAny()
.isPresent();

How to check if a String has full-width characters in Java

Can anyone suggest how to check whether a String contains full-width characters in Java? Full-width characters are special characters.
Full-width characters in a String:
ａｂｃ＠ｇｍａｉｌ.ｃｏｍ
Half-width characters in a String:
abc@gmail.com
I'm not sure if you are looking for any or all, so here are functions for both:
public static boolean isAllFullWidth(String str) {
for (char c : str.toCharArray())
if ((c & 0xff00) != 0xff00)
return false;
return true;
}
public static boolean areAnyFullWidth(String str) {
for (char c : str.toCharArray())
if ((c & 0xff00) == 0xff00)
return true;
return false;
}
As for your half-width '.' and possibly '_', strip them out first with a replace:
String str = "ａｂｃ＠ｇｍａｉｌ.ｃｏｍ";
if (isAllFullWidth(str.replaceAll("[._]", ""))) {
    // then apart from . and _, they are all full width
}
Regex
Alternatively if you want to use a regex to test, then this is the actual character range for full width:
[\uFF01-\uFF5E]
So the method then looks like:
public static boolean isAllFullWidth(String str) {
return str.matches("[\\uff01-\\uff5E]*");
}
You can add your other characters to it and so not need to strip them:
public static boolean isValidFullWidthEmail(String str) {
return str.matches("[\\uff01-\\uff5E._]*");
}
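If you want the "any" variant with the exact range from the regex rather than the 0xff00 mask, something like the following should work. It is a minimal sketch: FullWidth, isFullWidth and containsFullWidth are illustrative names, and the range U+FF01 to U+FF5E is taken from the regex above (it does not cover the ideographic space U+3000).

```java
final class FullWidth {
    // Range taken from the regex above: U+FF01 to U+FF5E.
    static boolean isFullWidth(int c) {
        return c >= 0xFF01 && c <= 0xFF5E;
    }

    // True if at least one character of str is in the full-width range.
    static boolean containsFullWidth(String str) {
        return str.chars().anyMatch(FullWidth::isFullWidth);
    }

    public static void main(String[] args) {
        // "\uFF41" is the full-width letter a.
        System.out.println(containsFullWidth("\uFF41bc@gmail.com")); // true
        System.out.println(containsFullWidth("abc@gmail.com"));      // false
    }
}
```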
You can compare the Unicode code points. The ASCII letters a-z have codes 97-122, while the full-width 'ａ' has code 65345, so you can easily differentiate between the two:
String str = "ａｂｃ＠ｇｍａｉｌ.ｃｏｍ";
System.out.println((int) str.charAt(0));
For the input
ａｂｃ＠ｇｍａｉｌ.ｃｏｍ
the output is
65345
You can try something like this:
public static final String FULL_WIDTH_CHARS = "ＡａＢｂＣｃＤｄＥｅＦｆＧｇＨｈＩｉＪｊ"
+ "ＫｋＬｌＭｍＮｎＯｏＰｐＱｑＲｒＳｓＴｔＵｕＶｖＷｗＸｘＹｙＺｚ";
public static boolean containsFullWidthChars(String str) {
for(int i = 0; i < FULL_WIDTH_CHARS.length(); i++) {
if(str.contains(String.valueOf(FULL_WIDTH_CHARS.charAt(i)))) {
return true;
}
}
return false;
}
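You can use a regular expression here.
\W matches non-word characters, that is, anything outside [a-zA-Z_0-9]. Full-width letters fall into that set, but so do spaces and half-width punctuation such as '@', so treat this only as a rough check.
str contains at least one such character if the following statement returns true:
boolean flag = Pattern.compile("\\W").matcher(str).find();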
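half-width: 1 byte
full-width: more than 1 byte (2, 3 or 4 bytes) in UTF-8
-> compare the String length with its byte length (note that any non-ASCII character, not only a full-width one, makes the two differ)
String str = "ａｂｃ＠ｇｍａｉｌ.ｃｏｍ";
if (str.length() != str.getBytes(StandardCharsets.UTF_8).length) {
    // contains full-width (or other non-ASCII) characters
} else {
    // is half width
}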

Does Java Have the Equivalent of Python's "string.ascii_uppercase"?

Python's string module has a few handy constants that give you certain character sets, such as all the uppercase characters. Is there anything similar for Java?
http://docs.python.org/2/library/string.html
string.ascii_lowercase
The lowercase letters 'abcdefghijklmnopqrstuvwxyz'.
string.ascii_uppercase
The uppercase letters 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
string.digits
The string '0123456789'.
string.punctuation
String of ASCII characters which are considered punctuation characters in the C locale.
No. You'd need to write a loop:
String result = "";
for (char c = 'A'; c <= 'Z'; c++) {
result += c;
}
You could create your own,
UPPERCASE
public String uppercase()
{
String answer = "";
char c;
for (c = 'A'; c <= 'Z'; c++)
{
answer += c;
}
return answer;
}
LOWERCASE
public String lowercase()
{
String answer = "";
char c;
for (c = 'a'; c <= 'z'; c++)
{
answer += c;
}
return answer;
}
NUMBERS
public String numbers()
{
String answer = "";
int i;
for (i = 0; i <= 9; i++)
{
answer += i;
}
return answer;
}
You can just do toLowerCase() and toUpperCase()
String abc = "aBc";
abc.toLowerCase(); // abc
abc.toUpperCase(); // ABC
EDIT: I misread the OP.
The OP doesn't want to convert strings, but to get collections of all uppercase letters, lowercase letters, punctuation and digits, using Java.
public static char[] getUpper() {
    char[] res = new char[26];
    for (int i = 0; i < 26; i++) {
        res[i] = (char) ('A' + i); // the int sum must be cast back to char
    }
    return res;
}
// Or lower-case the result: new String(getUpper()).toLowerCase().toCharArray()
public static char[] getLower() {
    char[] res = new char[26];
    for (int i = 0; i < 26; i++) {
        res[i] = (char) ('a' + i);
    }
    return res;
}
public static char[] getDigits() {
    char[] res = new char[10];
    for (int i = 0; i < 10; i++) {
        res[i] = (char) ('0' + i);
    }
    return res;
}
For punctuation I really don't know.
You can of course just use methods that return the values directly, since you know what the output should be.
public static String getUpper() {
    return "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
}
public static String getLower() {
    return "abcdefghijklmnopqrstuvwxyz";
}
public static String getDigits() {
    return "0123456789";
}
public static String getPunctuation() {
    return ".,"; // Don't really know what this should return
}
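For the punctuation case, Python's string.punctuation is the 32 printable ASCII characters that are neither letters nor digits, so one way to get the same set in Java is to filter the printable range. This is only a sketch; AsciiSets and punctuation() are made-up names.

```java
final class AsciiSets {
    // Collects the printable ASCII characters ('!' to '~')
    // that are neither letters nor digits.
    static String punctuation() {
        StringBuilder sb = new StringBuilder();
        for (char c = '!'; c <= '~'; c++) {
            if (!Character.isLetterOrDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(punctuation()); // !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    }
}
```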
I think that the simple answer is that there is no direct equivalent.
Obviously, if you simply want all of the uppercase letters in (say) ASCII, then it is a trivial task to implement. Java doesn't implement it in a standard API. I imagine that's because:
it is trivial to implement yourself,
they do not want APIs that mix up the distinction between encoded and decoded characters, and
they do not want to encourage code that hard-wires use of specific character sets / encodings.
Java's internal character representation is Unicode based. Unicode is essentially a catalog of character code points from many languages and writing systems. There is a subsystem for mapping between Java's internal Unicode-based text representation and external character encodings such as ASCII, LATIN-1 ... and UTF-8. Some of these support all Unicode code points, and others don't.
So if you wanted a general solution to the problem of "find me all uppercase letters in some character set XYZ", you'd need to do something like this:
Obtain the Java Charset object for "XYZ".
Enumerate all of the valid characters in the character set "XYZ".
Obtain the CharsetDecoder object for the character set, and use it to decode each valid character ... to a Unicode code point.
Use the Java Character class to test the code point's Unicode attributes; e.g. whether it is a "letter", whether it is "upper case" and so on. (Use Character.getType(codepoint))
I imagine this procedure (or similar) is implementable, but I'm not aware of any standard Java APIs or 3rd party Java APIs that do that.
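For what it's worth, the procedure above can be sketched fairly compactly if you restrict it to single-byte charsets. The following is an illustration under that assumption, not a general solution: CharsetLetters and upperCaseLetters are made-up names, and it decodes with new String(bytes, charset) instead of an explicit CharsetDecoder, relying on unmappable bytes being replaced with U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetLetters {
    // Assumes a single-byte charset, so every candidate is one byte 0x00..0xFF.
    static String upperCaseLetters(Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (int b = 0; b < 256; b++) {
            // Decode the single byte to Java's internal Unicode representation.
            String decoded = new String(new byte[] {(byte) b}, charset);
            // Unmappable bytes decode to U+FFFD, which is not uppercase, so they are skipped.
            if (decoded.length() == 1 && Character.isUpperCase(decoded.charAt(0))) {
                sb.append(decoded);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseLetters(StandardCharsets.US_ASCII)); // ABCDEFGHIJKLMNOPQRSTUVWXYZ
        // ISO-8859-1 additionally yields accented capitals.
        System.out.println(upperCaseLetters(StandardCharsets.ISO_8859_1));
    }
}
```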
try this method toUpperCase().

Is there any utility method in Java to find repeating duplicate characters?

Is there any utility method in java to find the repeating duplicate character in java?
e.g. "allowed" is not allowed as it has two repeating 'l' and "repeating" is allowed though it has two 'e'
I was looking at StringUtils, but it doesn't have anything for this. I am thinking of writing something like
for (each char in string) {
if (char at counter of loop == char at next counter) {
break;
}}
The loop approach is one solution or, if you want something fancy, you could use a regex approach, which would look like:
private static final Pattern repeatMatcher = Pattern.compile("^(?:(.)(?!\\1))*$");
public static boolean hasRepeatedCharacters(String input) {
return !repeatMatcher.matcher(input).matches();
}
But the basic approach with a loop is certainly more readable:
public static boolean hasRepeatedCharacters(String input) {
for (int i = 0; i < input.length() - 1; i++) {
if (input.charAt(i) == input.charAt(i + 1)) return true;
}
return false;
}
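To check that the regex and the loop agree on the examples from the question, both can be put side by side. This is just a small demo; the class name RepeatDemo and the two method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class RepeatDemo {
    // Matches only strings in which no character is immediately followed by itself.
    private static final Pattern NO_ADJACENT_REPEAT = Pattern.compile("^(?:(.)(?!\\1))*$");

    static boolean hasRepeatRegex(String input) {
        return !NO_ADJACENT_REPEAT.matcher(input).matches();
    }

    static boolean hasRepeatLoop(String input) {
        // Compare each character with its right-hand neighbour.
        for (int i = 0; i < input.length() - 1; i++) {
            if (input.charAt(i) == input.charAt(i + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"allowed", "repeating"}) {
            System.out.println(s + ": " + hasRepeatRegex(s) + " / " + hasRepeatLoop(s));
        }
        // prints "allowed: true / true" and "repeating: false / false"
    }
}
```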
There's no utility method for this, as I don't think the problem is common enough to deserve one. It is far too specific for any general use.
Make your own method just as you suggested; it seems fine.
Doesn't sound like a common use case for a utility. Your code logic seems good enough. Just handle the single-character case and make sure the check for the character at the next index doesn't run past the end of the string.
Try this:
Character last = null;
boolean allowed = true;
for (Character c : str.toCharArray()) {
if (c.equals(last)) {
allowed = false;
break;
}
last = c.charValue();
}
You can try this as well:
package com.stack.overflow.works.main;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
/**
* @author sarath_sivan
*/
public class DuplicatesFinder {
public static void findDuplicates(String inputString) {
Map<Character, Integer> duplicatesMap = new HashMap<Character, Integer>();
char[] charArray = inputString.toCharArray();
for (Character ch : charArray) {
if (duplicatesMap.containsKey(ch)) {
duplicatesMap.put(ch, duplicatesMap.get(ch) + 1);
} else {
duplicatesMap.put(ch, 1);
}
}
Set<Character> keySet = duplicatesMap.keySet();
for (Character ch: keySet) {
if (duplicatesMap.get(ch) > 1) {
System.out.println("[INFO: CHARACTER " + ch + " IS DUPLICATE, OCCURENCE: " + duplicatesMap.get(ch) + " TIMES]");
}
}
}
public static void main(String[] args) {
DuplicatesFinder.findDuplicates("sarath kumar sivan");
}
}
It will produce the simple test result for the input string "sarath kumar sivan" like this:
[INFO: CHARACTER IS DUPLICATE, OCCURENCE: 2 TIMES]
[INFO: CHARACTER s IS DUPLICATE, OCCURENCE: 2 TIMES]
[INFO: CHARACTER r IS DUPLICATE, OCCURENCE: 2 TIMES]
[INFO: CHARACTER a IS DUPLICATE, OCCURENCE: 4 TIMES]

Check if String contains only letters

The idea is to have a String read and to verify that it does not contain any numeric characters. So something like "smith23" would not be acceptable.
What do you want? Speed or simplicity? For speed, go for a loop based approach. For simplicity, go for a one liner RegEx based approach.
Speed
public boolean isAlpha(String name) {
char[] chars = name.toCharArray();
for (char c : chars) {
if(!Character.isLetter(c)) {
return false;
}
}
return true;
}
Simplicity
public boolean isAlpha(String name) {
return name.matches("[a-zA-Z]+");
}
Java 8 lambda expressions. Both fast and simple.
boolean allLetters = someString.chars().allMatch(Character::isLetter);
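Two things to watch with the one-liner above: allMatch returns true for an empty string, and chars() iterates UTF-16 code units, so letters outside the Basic Multilingual Plane are rejected. A minimal sketch that handles both, assuming you want null and empty strings to count as "not all letters" (Letters and isAllLetters are illustrative names):

```java
final class Letters {
    // Hypothetical helper: true only for non-empty strings made up entirely of letters.
    static boolean isAllLetters(String s) {
        return s != null
                && !s.isEmpty()
                // codePoints() yields whole code points, so surrogate pairs are tested as one letter.
                && s.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        System.out.println(isAllLetters("smith"));   // true
        System.out.println(isAllLetters("smith23")); // false
        System.out.println(isAllLetters(""));        // false
    }
}
```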
Or, if you are using Apache Commons Lang, StringUtils.isAlpha().
First import Pattern :
import java.util.regex.Pattern;
Then use this simple code:
String s = "smith23";
if (Pattern.matches("[a-zA-Z]+",s)) {
// Do something
System.out.println("Yes, string contains letters only");
}else{
System.out.println("Nope, Other characters detected");
}
This will output:
Nope, Other characters detected
I used the regular expression ".*[a-zA-Z]+.*". Combined with a negated if, it rejects every string that has a letter at the start, at the end or between other characters, so the check below returns true only when the string contains no letters at all.
String strWithLetters = "123AZ456";
if (!Pattern.matches(".*[a-zA-Z]+.*", strWithLetters))
return true;
else
return false;
A quick way to do it is by:
public boolean isStringAlpha(String aString) {
int charCount = 0;
String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
if (aString.length() == 0) {
return false; //zero length string ain't alpha
}
for (int i = 0; i < aString.length(); i++) {
for (int j = 0; j < alphabet.length(); j++) {
if (aString.substring(i, i + 1).equals(alphabet.substring(j, j + 1))
|| aString.substring(i, i + 1).equals(alphabet.substring(j, j + 1).toLowerCase())) {
charCount++;
}
}
if (charCount != (i + 1)) {
System.out.println("\n**Invalid input! Enter alpha values**\n");
return false;
}
}
return true;
}
It returns as soon as it finds a non-letter, so you don't have to scan the whole aString to find out that it isn't an alpha String.
private boolean isOnlyLetters(String s){
char c=' ';
boolean isGood=false, safe=isGood;
int failCount=0;
for(int i=0;i<s.length();i++){
c = s.charAt(i);
if(Character.isLetter(c))
isGood=true;
else{
isGood=false;
failCount+=1;
}
}
if(failCount==0 && s.length()>0)
safe=true;
else
safe=false;
return safe;
}
I know it's a bit crowded. I was using it in my program and wanted to share it. It tells you whether every character in a string is a letter. Use it if you want something that is easy to follow when you look back on it.
A faster way is below, assuming letters are only a-z and A-Z.
public static void main( String[] args ){
System.out.println(bestWay("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(isAlpha("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(bestWay("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(isAlpha("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
}
public static boolean bestWay(String name) {
char[] chars = name.toCharArray();
long startTimeOne = System.nanoTime();
for(char c : chars){
if(!(c>=65 && c<=90)&&!(c>=97 && c<=122) ){
System.out.println(System.nanoTime() - startTimeOne);
return false;
}
}
System.out.println(System.nanoTime() - startTimeOne);
return true;
}
public static boolean isAlpha(String name) {
char[] chars = name.toCharArray();
long startTimeOne = System.nanoTime();
for (char c : chars) {
if(!Character.isLetter(c)) {
System.out.println(System.nanoTime() - startTimeOne);
return false;
}
}
System.out.println(System.nanoTime() - startTimeOne);
return true;
}
Runtime is measured in nanoseconds. It will vary from system to system and from run to run, since a single unwarmed measurement is only a rough indication.
5748 // bestWay without numbers
true
89493 // isAlpha without numbers
true
3284 // bestWay with numbers
false
22989 // isAlpha with numbers
false
Check this; I think it will help you, because it works in my project:
if (!Pattern.matches("[a-zA-Z]+", str1))
{
    // the string does not contain only letters
}
else
{
    // the string contains only letters
}
String expression = "^[a-zA-Z]*$";
CharSequence inputStr = str;
Pattern pattern = Pattern.compile(expression);
Matcher matcher = pattern.matcher(inputStr);
if(matcher.matches())
{
//if pattern matches
}
else
{
//if pattern does not match
}
Try using regular expressions: String.matches
public boolean isAlpha(String name)
{
String s=name.toLowerCase();
for(int i=0; i<s.length();i++)
{
if((s.charAt(i)>='a' && s.charAt(i)<='z'))
{
continue;
}
else
{
return false;
}
}
return true;
}
It seems the need is to find out whether the characters are all alphabetic.
Here's how you can solve it:
Character.isAlphabetic(c)
checks whether a character of the string is alphabetic,
where c is
char c = s.charAt(elementIndex);
While there are many ways to skin this cat, I prefer to wrap such code into reusable extension methods that make it trivial to do going forward. When using extension methods, you can also avoid RegEx as it is slower than a direct character check. I like using the extensions in the Extensions.cs NuGet package. It makes this check as simple as:
Add the https://www.nuget.org/packages/Extensions.cs package to your project.
Add "using Extensions;" to the top of your code.
"smith23".IsAlphabetic() will return False whereas "john smith".IsAlphabetic() will return True. By default the .IsAlphabetic() method ignores spaces, but it can also be overridden such that "john smith".IsAlphabetic(false) will return False since the space is not considered part of the alphabet.
Every other check in the rest of the code is simply MyString.IsAlphabetic().
To allow only ASCII letters, the character class \p{Alpha} can be used. (This is equivalent to [\p{Lower}\p{Upper}] or [a-zA-Z].)
boolean allLettersASCII = str.matches("\\p{Alpha}*");
For allowing all Unicode letters, use the character class \p{L} (or equivalently, \p{IsL}).
boolean allLettersUnicode = str.matches("\\p{L}*");
See the Pattern documentation.
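To make the difference between the two character classes concrete, here is a small comparison. It is only a demo; AlphaVsL and its method names are made up, and "R\u00e9al" is "Réal" written with a Unicode escape.

```java
final class AlphaVsL {
    // ASCII letters only.
    static boolean asciiLetters(String s) {
        return s.matches("\\p{Alpha}*");
    }

    // Any Unicode letter.
    static boolean unicodeLetters(String s) {
        return s.matches("\\p{L}*");
    }

    public static void main(String[] args) {
        System.out.println(asciiLetters("Real") + " " + unicodeLetters("Real"));           // true true
        System.out.println(asciiLetters("R\u00e9al") + " " + unicodeLetters("R\u00e9al")); // false true
        System.out.println(asciiLetters("smith23") + " " + unicodeLetters("smith23"));     // false false
    }
}
```

Note that both patterns use *, so an empty string matches; use + instead if that should be rejected.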
I found an easy way of checking whether every character of a string is a letter.
public static boolean isStringLetter(String input) {
    for (int id = 0; id < input.length(); id++) {
        char c = input.charAt(id);
        boolean isLower = 'a' <= c && c <= 'z';
        boolean isUpper = 'A' <= c && c <= 'Z';
        if (!isLower && !isUpper) {
            // stop at the first non-letter instead of remembering only the last character
            return false;
        }
    }
    return !input.isEmpty(); // an empty string is not considered all letters
}
I hope it helps anyone who is looking for such a method.
Use StringUtils.isAlpha() method and it will make your life simple.
