Python's string module has a few handy constants that return certain character sets, such as all the uppercase characters. Is there anything similar for Java?
http://docs.python.org/2/library/string.html
string.ascii_lowercase
The lowercase letters 'abcdefghijklmnopqrstuvwxyz'.
string.ascii_uppercase
The uppercase letters 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
string.digits
The string '0123456789'.
string.punctuation
String of ASCII characters which are considered punctuation characters in the C locale.
No. You'd need to write a loop:
String result = "";
for (char c = 'A'; c <= 'Z'; c++) {
result += c;
}
You could create your own,
UPPERCASE
public String uppercase()
{
String answer = "";
char c;
for (c = 'A'; c <= 'Z'; c++)
{
answer += c;
}
return answer;
}
LOWERCASE
public String lowercase()
{
String answer = "";
char c;
for (c = 'a'; c <= 'z'; c++)
{
answer += c;
}
return answer;
}
NUMBERS
public String numbers()
{
String answer = "";
int i;
for (i = 0; i <= 9; i++)
{
answer += i;
}
return answer;
}
You can just do toLowerCase() and toUpperCase()
String abc = "aBc";
abc.toLowerCase(); // abc
abc.toUpperCase(); // ABC
EDIT: I misread the OP.
OP doesn't want to convert strings, but to get collections of all uppercase letters, lowercase letters, punctuation and digits, using Java.
public static char[] getUpper() {
char[] res = new char[26];
for (int i = 0; i < 26; i++) {
res[i] = (char) ('A' + i); // the int sum must be cast back to char
}
return res;
}
// Or just do new String(getUpper()).toLowerCase().toCharArray();
public static char[] getLower() {
char[] res = new char[26];
for (int i = 0; i < 26; i++) {
res[i] = (char) ('a' + i);
}
return res;
}
public static char[] getDigits() {
char[] res = new char[10];
for (int i = 0; i < 10; i++) {
res[i] = (char) ('0' + i);
}
return res;
}
For punctuation, I really don't know.
You can of course just use methods that return the strings directly, since you know what the output should be.
public static String getUpper() {
return "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
}
public static String getLower() {
return "abcdefghijklmnopqrstuvwxyz";
}
public static String getDigits() {
return "0123456789";
}
public static String getPunctuation() {
return ".,"; // Don't really know what this should return
}
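For reference, the four Python constants quoted in the question could be mirrored in Java roughly as follows. This is only a sketch: the class name AsciiSets, its field names and the range helper are hypothetical, and the punctuation string is copied from the value of Python's string.punctuation.

```java
// Hypothetical holder class mirroring Python's string module constants.
class AsciiSets {
    static final String LOWERCASE = range('a', 'z');
    static final String UPPERCASE = range('A', 'Z');
    static final String DIGITS = range('0', '9');
    // Same 32 characters as Python's string.punctuation.
    static final String PUNCTUATION = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    // Builds the string of all chars from first to last, inclusive.
    // Assumes last < Character.MAX_VALUE so the loop counter cannot wrap around.
    static String range(char first, char last) {
        StringBuilder sb = new StringBuilder(last - first + 1);
        for (char c = first; c <= last; c++) {
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Using a StringBuilder avoids creating a new String on every loop iteration, which the += versions above do.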
I think that the simple answer is that there is no direct equivalent.
Obviously, if you simply want all of the uppercase letters in (say) ASCII, then it is a trivial task to implement. Java doesn't implement it in a standard API. I imagine that's because:
it is trivial to implement yourself,
they do not want APIs that mix up the distinction between encoded and decoded characters, and
they do not want to encourage code that hard-wires use of specific character sets / encodings.
Java's internal character representation is Unicode based. Unicode is essentially a catalog of character code points from many languages and writing systems. There is a subsystem for mapping between Java's internal Unicode-based text representation and external character encodings such as ASCII, LATIN-1 ... and UTF-8. Some of these support all Unicode code points, and others don't.
So if you wanted a general solution to the problem of "find me all uppercase letters in some character set XYZ", you'd need to do something like this:
Obtain the Java Charset object for "XYZ".
Enumerate all of the valid characters in the character set "XYZ".
Obtain the CharsetDecoder object for the character set, and use it to decode each valid character ... to a Unicode code point.
Use the Java Character class to test the code point's Unicode attributes; e.g. whether it is a "letter", whether it is "upper case" and so on. (Use Character.getType(codepoint))
I imagine this procedure (or similar) is implementable, but I'm not aware of any standard Java APIs or 3rd party Java APIs that do that.
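A rough sketch of a simplified variant of that procedure. Instead of enumerating the charset's encoded characters and decoding them, it walks every Unicode code point and asks a CharsetEncoder whether the code point is encodable, which is a swapped-in shortcut rather than the exact steps listed above. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper: collects the uppercase letters a charset can represent.
class CharsetLetters {
    static String upperCaseLettersIn(Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder sb = new StringBuilder();
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            // Check the Unicode category first; only then ask the encoder.
            if (Character.getType(cp) == Character.UPPERCASE_LETTER
                    && encoder.canEncode(new String(Character.toChars(cp)))) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }
}
```

With StandardCharsets.US_ASCII this yields the 26 letters A to Z; with ISO-8859-1 it additionally includes accented capitals such as À.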
Try the toUpperCase() method.
Related
I have a string that I must filter, in some scenarios to only contain the characters 'a' through 'f' or the digits '0' through '9' and in other scenarios, just the digits '1' through '9'.
Since I am not looking to strip specific chars, but rather to preserve only specific chars, looping through every character in Unicode except those I'd like to preserve would seem to be slight overkill.
Here's the signature of the method I'm looking to write:
String stripExtras(CharSequence input, CharSequence legalChars){
}
And I'd use it like in this example:
String example = "aeiou456";
String output = stripExtras(example,"abcdef0123456789");
System.out.println(output);
where the output should be ae456.
I've seen a method in org.apache.commons.lang3.StringUtils called containsOnly that returns a boolean indicating whether the input contains only the specified chars, but the source is a bit beyond my grasp.
How do I go about filtering a string to allow only specific characters?
Try this.
static String stripExtras(CharSequence input, CharSequence legalChars){
return input.toString().replaceAll("[^" + legalChars + "]", "");
}
But with this version you cannot pass regex special characters ("]", "-", ...) in legalChars.
If this limitation matters to you, use the following instead:
static String stripExtras(CharSequence input, CharSequence legalChars){
Set<Integer> legalSet = legalChars.codePoints().boxed()
.collect(Collectors.toCollection(() -> new HashSet<>(legalChars.length())));
return input.codePoints()
.filter(legalSet::contains)
.collect(StringBuilder::new,
(sb, cp) -> sb.appendCodePoint(cp),
StringBuilder::append)
.toString();
}
Here is an implementation that works on Java 1.5 and later.
static String stripExtras(CharSequence input, CharSequence legalChars) {
StringBuilder output = new StringBuilder();
for (int i = 0; i < input.length(); i++) {
char ch = input.charAt(i);
if (contains(legalChars, ch))
output.append(ch);
}
return output.toString();
}
static boolean contains(CharSequence str, char ch) {
for (int i = 0; i < str.length(); i++)
if (str.charAt(i) == ch)
return true;
return false;
}
Test
String example = "aeiou456";
String output = stripExtras(example,"abcdef0123456789");
System.out.println(output);
Output
ae456
Try this. Works with ASCII or Unicode characters.
String example = "aeiou456";
String output = stripExtras(example, "abcdef0123456789");
System.out.println(output);
static String stripExtras(CharSequence input,
CharSequence legalChars) {
return input.codePoints()
.filter(a -> legalChars.toString().indexOf(a) >= 0)
.mapToObj(Character::toString)
.collect(Collectors.joining(""));
}
Prints
ae456
I want to write a library to generate a String from a given regex. However, I run into a problem, if the regex uses a negated character class, like [^a-z]. In this case, I have to place a character into the generated String that does not match [a-z]. Also, I want to be able to define a set of characters that are used preferably, e.g. the set of printable characters.
Question
How do I generate a random character that is not contained in a given array/collection? How can I prefer groups of characters in this process?
An existing function in the libraries would be great, however I wasn't able to find one.
Here is my approach to solve the problem, however I wonder if there is a better algorithm. Also, my algorithm does not prefer a given set of characters, mainly because I do not know how to check if a character is printable or how I get an array/collection/iterable of printable characters.
private void run() {
int i = 1024;
System.out.println(getFirstLegalChar(createExampleIllegalCharArray(i)));
System.out.println((char) i);
}
private char getFirstLegalChar(char[] illegalCharArray) {
for (int i = 0; true; i++) {
if (!contains(illegalCharArray, (char) i)) {
return (char) i;
}
}
}
private char[] createExampleIllegalCharArray(int size) {
char[] illegalCharArray = new char[size];
for (int i = 0; i < illegalCharArray.length; i++) {
illegalCharArray[i] = (char) i;
}
return illegalCharArray;
}
private boolean contains(char[] charArray, char c) {
for (int j = 0; j < charArray.length; j++) {
if (charArray[j] == c) {
return true;
}
}
return false;
}
You can check the list of printable and non-printable characters at Juniper.
I checked a few things and came up with one solution you can try:
public static void main(String[] args) {
final char RECORD_SEPARATOR = 0x1e;
final char END_OF_TEXT = 0x03;
System.out.println(isPrintableChar(RECORD_SEPARATOR));
System.out.println(isPrintableChar(END_OF_TEXT));
System.out.println(isPrintableChar('a'));
}
// KeyEvent below needs: import java.awt.event.KeyEvent;
public static boolean isPrintableChar( char c ) {
Character.UnicodeBlock block = Character.UnicodeBlock.of( c );
return (!Character.isISOControl(c)) &&
c != KeyEvent.CHAR_UNDEFINED &&
block != null &&
block != Character.UnicodeBlock.SPECIALS;
}
I got the output as
false
false
true
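To connect this back to the question, here is one possible sketch of picking a random character that is not in a given set while preferring printable characters. It assumes "preferred" means printable ASCII (0x20 to 0x7E) and only falls back to the rest of the Basic Multilingual Plane when every preferred character is excluded. The class name, method name and fallback policy are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper for choosing a character outside an excluded set.
class RandomLegalChar {
    static char randomLegalChar(Set<Character> illegal, Random rnd) {
        // Preferred pool: printable ASCII characters that are not excluded.
        List<Character> candidates = new ArrayList<>();
        for (char c = 0x20; c <= 0x7E; c++) {
            if (!illegal.contains(c)) {
                candidates.add(c);
            }
        }
        if (!candidates.isEmpty()) {
            return candidates.get(rnd.nextInt(candidates.size()));
        }
        // Fallback: first defined, non-control, non-surrogate char above ASCII.
        for (int i = 0x7F; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (Character.isDefined(c) && !Character.isISOControl(c)
                    && !Character.isSurrogate(c) && !illegal.contains(c)) {
                return c;
            }
        }
        throw new IllegalArgumentException("no legal character available");
    }
}
```

Passing the excluded characters as a Set keeps each membership test cheap, unlike the linear array scan in the question.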
As the title says: I want the input to be one or more symbols that are not in the union of letters, numbers and white space. So basically any of ~!@#, etc. I have
"^(?=.*[[^0-9][^\w]])(?=\\S+$)$"
I know I could negate the appropriate set, but I don't know how to create my super set to start with. Would the following do?
"^(?=.*[(_A-Za-z0-9-\\+)])(?=\\S+$)$"
Maybe you're looking for \p{Punct}, which matches any of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~.
String re = "\\p{Punct}+";
The class:
[^\w\s]
This will match any non-alphanumeric/non-whitespace character.
Java String:
String regex = "[^\\w\\s]";
To match a string of one or more characters that are not letters, numbers or white space, you could use the regex:
^(?:[^\w\s]|_)+$
You have to include the _ separately because the character class \w includes the _. The \w character class is equivalent to [a-zA-Z_0-9].
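As a small illustration, the pattern above can be wrapped in a Java helper like this. The class and method names are made up for the example; note that the backslashes have to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex ^(?:[^\w\s]|_)+$ from the answer.
class SymbolOnly {
    // Compiled once and reused; \w and \s are escaped for the Java literal.
    private static final Pattern SYMBOLS = Pattern.compile("^(?:[^\\w\\s]|_)+$");

    // True if s is non-empty and has no letters, digits or white space.
    static boolean isSymbolsOnly(String s) {
        return SYMBOLS.matcher(s).matches();
    }
}
```

For example, isSymbolsOnly("~!@#") and isSymbolsOnly("_") are true, while isSymbolsOnly("a!"), isSymbolsOnly("! !") and isSymbolsOnly("") are false.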
I would just use a Character object to keep it simple.
Something like this:
public String getSpecialSymbols(String s) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (!Character.isDigit(c) && !Character.isWhitespace(c) && !Character.isLetter(c)) {
sb.append(c);
}
}
return sb.toString();
}
This would be even more straightforward:
public String getSpecialSymbols(String s) {
String special = "!@#$%^&*()_+-=[]{}|'\";\\:/?.>,<~`";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
String c = s.substring(i, i + 1); // one-character substring at position i
if (special.contains(c)) {
sb.append(c);
}
}
return sb.toString();
}
The idea is to have a String read and to verify that it does not contain any numeric characters. So something like "smith23" would not be acceptable.
What do you want? Speed or simplicity? For speed, go for a loop based approach. For simplicity, go for a one liner RegEx based approach.
Speed
public boolean isAlpha(String name) {
char[] chars = name.toCharArray();
for (char c : chars) {
if(!Character.isLetter(c)) {
return false;
}
}
return true;
}
Simplicity
public boolean isAlpha(String name) {
return name.matches("[a-zA-Z]+");
}
Java 8 lambda expressions. Both fast and simple.
boolean allLetters = someString.chars().allMatch(Character::isLetter);
Or if you are using Apache Commons, StringUtils.isAlpha().
First import Pattern :
import java.util.regex.Pattern;
Then use this simple code:
String s = "smith23";
if (Pattern.matches("[a-zA-Z]+",s)) {
// Do something
System.out.println("Yes, string contains letters only");
}else{
System.out.println("Nope, Other characters detected");
}
This will output:
Nope, Other characters detected
I used the regular expression ".*[a-zA-Z]+.*". Negating the match rejects any string that has a letter at the start, at the end, or between other characters.
String strWithLetters = "123AZ456";
// true only if the string contains no letters at all
return !Pattern.matches(".*[a-zA-Z]+.*", strWithLetters);
A quick way to do it is by:
public boolean isStringAlpha(String aString) {
int charCount = 0;
String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
if (aString.length() == 0) {
return false; //zero length string ain't alpha
}
for (int i = 0; i < aString.length(); i++) {
for (int j = 0; j < alphabet.length(); j++) {
if (aString.substring(i, i + 1).equals(alphabet.substring(j, j + 1))
|| aString.substring(i, i + 1).equals(alphabet.substring(j, j + 1).toLowerCase())) {
charCount++;
}
}
if (charCount != (i + 1)) {
System.out.println("\n**Invalid input! Enter alpha values**\n");
return false;
}
}
return true;
}
Because you don't have to run the whole aString to check if it isn't an alpha String.
private boolean isOnlyLetters(String s){
char c=' ';
boolean isGood=false, safe=isGood;
int failCount=0;
for(int i=0;i<s.length();i++){
c = s.charAt(i);
if(Character.isLetter(c))
isGood=true;
else{
isGood=false;
failCount+=1;
}
}
if(failCount==0 && s.length()>0)
safe=true;
else
safe=false;
return safe;
}
I know it's a bit crowded. I was using it in my program and wanted to share it. It tells you whether every character in a string is a letter. Use it if you want something that is easy to read and come back to.
A faster way is below, assuming letters are only a-z and A-Z.
public static void main( String[] args ){
System.out.println(bestWay("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(isAlpha("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(bettertWay("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(isAlpha("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
}
public static boolean bettertWay(String name) {
char[] chars = name.toCharArray();
long startTimeOne = System.nanoTime();
for(char c : chars){
if(!(c>=65 && c<=90)&&!(c>=97 && c<=122) ){
System.out.println(System.nanoTime() - startTimeOne);
return false;
}
}
System.out.println(System.nanoTime() - startTimeOne);
return true;
}
public static boolean isAlpha(String name) {
char[] chars = name.toCharArray();
long startTimeOne = System.nanoTime();
for (char c : chars) {
if(!Character.isLetter(c)) {
System.out.println(System.nanoTime() - startTimeOne);
return false;
}
}
System.out.println(System.nanoTime() - startTimeOne);
return true;
}
Runtime is measured in nanoseconds. It may vary from system to system.
5748//bettertWay without numbers
true
89493 //isAlpha without numbers
true
3284 //bettertWay with numbers
false
22989 //isAlpha with numbers
false
Check this; I think it will help you, since it works in my project:
if (!Pattern.matches("[a-zA-Z]+", str1))
{
System.out.println("String does not contain only letters");
}
else
{
System.out.println("String contains only letters");
}
String expression = "^[a-zA-Z]*$";
CharSequence inputStr = str;
Pattern pattern = Pattern.compile(expression);
Matcher matcher = pattern.matcher(inputStr);
if(matcher.matches())
{
//if pattern matches
}
else
{
//if pattern does not match
}
Try using regular expressions: String.matches
public boolean isAlpha(String name)
{
String s=name.toLowerCase();
for(int i=0; i<s.length();i++)
{
if((s.charAt(i)>='a' && s.charAt(i)<='z'))
{
continue;
}
else
{
return false;
}
}
return true;
}
It seems the goal is to check whether the string contains only alphabetic characters.
Here's how you can solve it:
Character.isAlphabetic(c)
checks whether a character of the string is alphabetic,
where c is
char c = s.charAt(elementIndex);
While there are many ways to skin this cat, I prefer to wrap such code into reusable extension methods that make it trivial to do going forward. When using extension methods, you can also avoid RegEx as it is slower than a direct character check. I like using the extensions in the Extensions.cs NuGet package. It makes this check as simple as:
Add the https://www.nuget.org/packages/Extensions.cs package to your project.
Add "using Extensions;" to the top of your code.
"smith23".IsAlphabetic() will return False whereas "john smith".IsAlphabetic() will return True. By default the .IsAlphabetic() method ignores spaces, but it can also be overridden such that "john smith".IsAlphabetic(false) will return False since the space is not considered part of the alphabet.
Every other check in the rest of the code is simply MyString.IsAlphabetic().
To allow only ASCII letters, the character class \p{Alpha} can be used. (This is equivalent to [\p{Lower}\p{Upper}] or [a-zA-Z].)
boolean allLettersASCII = str.matches("\\p{Alpha}*");
For allowing all Unicode letters, use the character class \p{L} (or equivalently, \p{IsL}).
boolean allLettersUnicode = str.matches("\\p{L}*");
See the Pattern documentation.
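The difference between the two classes shows up with an accented letter. A minimal sketch, with a hypothetical class name and method names; the é is written as the escape \u00e9 so the example does not depend on the source file encoding:

```java
// Hypothetical helpers contrasting \p{Alpha} (ASCII letters) with \p{L} (Unicode letters).
class LetterClasses {
    // Matches only when every character is an ASCII letter (or the string is empty).
    static boolean asciiLettersOnly(String s) {
        return s.matches("\\p{Alpha}*");
    }

    // Matches when every character is a letter in any script (or the string is empty).
    static boolean unicodeLettersOnly(String s) {
        return s.matches("\\p{L}*");
    }
}
```

Here asciiLettersOnly("R\u00e9al") is false while unicodeLettersOnly("R\u00e9al") is true, and both reject "smith23". Because of the * quantifier, both accept the empty string; use + instead if an empty string should be rejected.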
I found an easy way of checking whether every character of a string is a letter.
public static boolean isStringLetter(String input) {
if (input.isEmpty()) {
return false; // an empty string is not treated as all letters
}
for (int id = 0; id < input.length(); id++) {
char ch = input.charAt(id);
boolean lower = 'a' <= ch && ch <= 'z';
boolean upper = 'A' <= ch && ch <= 'Z';
if (!lower && !upper) {
return false; // stop at the first non-letter
}
}
return true;
}
I hope it could help anyone who is looking for such method.
Use StringUtils.isAlpha() method and it will make your life simple.
The call Character.isLetter(c) returns true if the character is a letter. But is there a way to quickly find if a String only contains the base characters of ASCII?
From Guava 19.0 onward, you may use:
boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);
This uses the matchesAllOf(someString) method which relies on the factory method ascii() rather than the now deprecated ASCII singleton.
Here ASCII includes all ASCII characters including the non-printable characters lower than 0x20 (space) such as tabs, line-feed / return but also BEL with code 0x07 and DEL with code 0x7F.
This code works on chars rather than code points, even though code points are mentioned in the comments of earlier versions. Fortunately, a code point of U+010000 or above is represented by two surrogate chars, both of which have values outside the ASCII range. So the method still tests for ASCII correctly, even for strings containing emoji.
For earlier Guava versions without the ascii() method you may write:
boolean isAscii = CharMatcher.ASCII.matchesAllOf(someString);
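The surrogate argument above can be checked with plain JDK code. This sketch does not use Guava; the class and method names are hypothetical, and it simply compares a char-based test with a code-point-based test.

```java
// Hypothetical demo: char-based and code-point-based ASCII checks agree.
class AsciiCheckDemo {
    // Looks at UTF-16 code units (Java chars).
    static boolean asciiByChar(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // Looks at full Unicode code points.
    static boolean asciiByCodePoint(String s) {
        return s.codePoints().allMatch(cp -> cp < 128);
    }
}
```

For "hi \uD83D\uDE00" (an emoji stored as a surrogate pair) both methods return false, because each surrogate char is far above 127; for "hi" both return true.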
You can do it with java.nio.charset.Charset.
import java.nio.charset.Charset;
public class StringUtils {
public static boolean isPureAscii(String v) {
return Charset.forName("US-ASCII").newEncoder().canEncode(v);
// or "ISO-8859-1" for ISO Latin 1
// or StandardCharsets.US_ASCII with JDK1.7+
}
public static void main (String args[])
throws Exception {
String test = "Réal";
System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
test = "Real";
System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
/*
* output :
* Réal isPureAscii() : false
* Real isPureAscii() : true
*/
}
}
Detect non-ASCII character in a String
Here is another way not depending on a library but using a regex.
You can use this single line:
text.matches("\\A\\p{ASCII}*\\z")
Whole example program:
public class Main {
public static void main(String[] args) {
char nonAscii = 0x00FF;
String asciiText = "Hello";
String nonAsciiText = "Buy: " + nonAscii;
System.out.println(asciiText.matches("\\A\\p{ASCII}*\\z"));
System.out.println(nonAsciiText.matches("\\A\\p{ASCII}*\\z"));
}
}
Understanding the regex :
\\A : Beginning of input
\\p{ASCII} : Any ASCII character
* : all repetitions
\\z : End of input
Iterate through the string and make sure all the characters have a value less than 128.
Java Strings are conceptually encoded as UTF-16. In UTF-16, the ASCII character set is encoded as the values 0 - 127, and the encoding of any non-ASCII character (which may consist of more than one Java char) is guaranteed not to include the values 0 - 127.
Or you copy the code from the IDN class.
// to check if a string only contains US-ASCII code point
//
private static boolean isAllASCII(String input) {
boolean isASCII = true;
for (int i = 0; i < input.length(); i++) {
int c = input.charAt(i);
if (c > 0x7F) {
isASCII = false;
break;
}
}
return isASCII;
}
commons-lang3 from Apache contains valuable utility/convenience methods for all kinds of 'problems', including this one.
System.out.println(StringUtils.isAsciiPrintable("!@£$%^&!@£$%^"));
try this:
for (char c: string.toCharArray()){
if (((int)c)>127){
return false;
}
}
return true;
This will return true if String only contains ASCII characters and false when it does not
Charset.forName("US-ASCII").newEncoder().canEncode(str)
If you want to remove non-ASCII characters, here is the snippet:
if(!Charset.forName("US-ASCII").newEncoder().canEncode(str)) {
str = str.replaceAll("[^\\p{ASCII}]", "");
}
In Java 8 and above, one can use String#codePoints in conjunction with IntStream#allMatch.
boolean allASCII = str.codePoints().allMatch(c -> c < 128);
In Kotlin:
fun String.isAsciiString() : Boolean =
this.toCharArray().none { it < ' ' || it > '~' }
Iterate through the string, and use charAt() to get each char. Then treat it as an int, and see if it has a Unicode value (Unicode is a superset of ASCII) which you like.
Break at the first you don't like.
private static boolean isASCII(String s)
{
for (int i = 0; i < s.length(); i++)
if (s.charAt(i) > 127)
return false;
return true;
}
It is possible. Nice problem.
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
public class EncodingTest {
static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII")
.newEncoder();
public static void main(String[] args) {
String testStr = "¤EÀsÆW°ê»Ú®i¶T¤¤¤ß3¼Ó®i¶TÆU2~~KITEC 3/F Rotunda 2";
String[] strArr = testStr.split("~~", 2);
int count = 0;
boolean encodeFlag = false;
do {
encodeFlag = asciiEncoderTest(strArr[count]);
System.out.println(encodeFlag);
count++;
} while (count < strArr.length);
}
public static boolean asciiEncoderTest(String test) {
boolean encodeFlag = false;
try {
encodeFlag = asciiEncoder.canEncode(new String(test
.getBytes("ISO8859_1"), "BIG5"));
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return encodeFlag;
}
}
// Returns true if c is an uppercase or lowercase ASCII letter
public boolean isASCIILetter(char c) {
return (c > 64 && c < 91) || (c > 96 && c < 123);
}