How to check if a String has full-width characters in Java - java

Can anyone suggest how to check if a String contains full-width characters in Java? Characters having full width are special characters.
Full-width characters in a String:
ａｂｃ＠ｇｍａｉｌ.ｃｏｍ
Half-width characters in a String:
abc@gmail.com

I'm not sure if you are looking for any or all, so here are functions for both:
public static boolean isAllFullWidth(String str) {
for (char c : str.toCharArray())
if ((c & 0xff00) != 0xff00)
return false;
return true;
}
public static boolean areAnyFullWidth(String str) {
for (char c : str.toCharArray())
if ((c & 0xff00) == 0xff00)
return true;
return false;
}
As for your half-width '.' and possibly '_', you could strip them out first with a replace:
String str = "ａｂｃ＠ｇｍａｉｌ.ｃｏｍ";
if (isAllFullWidth(str.replaceAll("[._]", ""))) {
// then apart from . and _, they are all full width
}
Regex
Alternatively if you want to use a regex to test, then this is the actual character range for full width:
[\uFF01-\uFF5E]
So the method then looks like:
public static boolean isAllFullWidth(String str) {
return str.matches("[\\uff01-\\uff5E]*");
}
You can add your other characters to it and so not need to strip them:
public static boolean isValidFullWidthEmail(String str) {
return str.matches("[\\uff01-\\uff5E._]*");
}
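Note that the mask test `(c & 0xff00) == 0xff00` accepts every code unit from U+FF00 to U+FFFF, which also covers the half-width katakana at U+FF61 to U+FF9F, whereas the regex range is limited to the full-width variants of printable ASCII. A minimal sketch of the any/all checks built on that narrower range follows; the class and method names are illustrative and not part of the answer above, and "full width" is assumed to mean U+FF01 to U+FF5E.

```java
final class FullWidthRange {
    // Assumption: "full width" means the full-width forms of '!'..'~' (U+FF01..U+FF5E).
    static boolean isFullWidthAscii(int codePoint) {
        return codePoint >= 0xFF01 && codePoint <= 0xFF5E;
    }

    // True if at least one code point is in the full-width range.
    static boolean anyFullWidth(String s) {
        return s.codePoints().anyMatch(FullWidthRange::isFullWidthAscii);
    }

    // True if every code point is in the full-width range (also true for "").
    static boolean allFullWidth(String s) {
        return s.codePoints().allMatch(FullWidthRange::isFullWidthAscii);
    }

    public static void main(String[] args) {
        String mixed = "\uFF41\uFF42\uFF43@gmail.com"; // full-width "abc", ASCII rest
        System.out.println(anyFullWidth(mixed));
        System.out.println(allFullWidth(mixed));
        System.out.println(anyFullWidth("\uFF76")); // a half-width katakana letter
    }
}
```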

You can compare the Unicode values. Since the code points for the ASCII letters a-z are 97-122, you can easily differentiate between the two:
String str = "ａｂｃ＠ｇｍａｉｌ.ｃｏｍ";
System.out.println((int)str.charAt(0));
For the input
ａｂｃ＠ｇｍａｉｌ.ｃｏｍ
Output
65345
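
The value 65345 is U+FF41, which sits exactly 0xFEE0 above ASCII 'a' (97); the same offset holds for the whole printable ASCII range '!' to '~'. The sketch below uses that relationship to test a single character. The class, constant and method names are hypothetical.

```java
final class FullWidthOffset {
    // Distance between a full-width form and its ASCII counterpart (U+FF41 - U+0061).
    static final int OFFSET = 0xFEE0;

    // True if c is the full-width form of a printable ASCII character ('!'..'~').
    static boolean isFullWidthFormOfAscii(char c) {
        int ascii = c - OFFSET;
        return ascii >= '!' && ascii <= '~';
    }

    public static void main(String[] args) {
        char fullWidthA = '\uFF41';
        System.out.println((int) 'a');        // 97
        System.out.println((int) fullWidthA); // 65345
        System.out.println(isFullWidthFormOfAscii(fullWidthA));
        System.out.println(isFullWidthFormOfAscii('a'));
    }
}
```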

You can try something like this:
public static final String FULL_WIDTH_CHARS = "ＡａＢｂＣｃＤｄＥｅＦｆＧｇＨｈＩｉＪｊ"
+ "ＫｋＬｌＭｍＮｎＯｏＰｐＱｑＲｒＳｓＴｔＵｕＶｖＷｗＸｘＹｙＺｚ";
public static boolean containsFullWidthChars(String str) {
for(int i = 0; i < FULL_WIDTH_CHARS.length(); i++) {
if(str.contains(String.valueOf(FULL_WIDTH_CHARS.charAt(i)))) {
return true;
}
}
return false;
}

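You can use a regular expression here.
\W matches non-word characters, that is, anything outside [a-zA-Z_0-9] by default, so it matches full-width characters (but also ASCII punctuation such as '@' and '.').
str contains such a character if the following statement returns true; matches() has to match the whole string, hence the surrounding .*:
boolean flag = str.matches(".*\\W.*");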

half-width: 1 byte
full-width: more than 1 byte (2, 3 or 4 bytes)
-> compare the length of the String with its byte length
String strCheck = "ａｂｃ＠ｇｍａｉｌ.ｃｏｍ";
if (strCheck.length() != strCheck.getBytes(java.nio.charset.StandardCharsets.UTF_8).length) {
// contains full-width (multi-byte) characters
} else {
// is half width
}
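Comparing the character count with the byte count really separates ASCII from non-ASCII rather than half width from full width, and the byte count depends on the charset used. A small sketch, assuming UTF-8 is chosen explicitly, shows both effects; the helper name is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class ByteLengthCheck {
    // True if the UTF-8 encoding is longer than the char count,
    // which happens as soon as the string contains a non-ASCII character.
    static boolean hasMultiByteChars(String s) {
        return s.length() != s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(hasMultiByteChars("abc@gmail.com")); // plain ASCII
        System.out.println(hasMultiByteChars("\uFF41bc"));      // starts with full-width 'a'
        System.out.println(hasMultiByteChars("R\u00E9al"));     // accented letter, not full width
    }
}
```

The last call is the reason to treat this check with care: an accented Latin letter is reported exactly like a full-width character.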

Related

Find the number of occurrences of a character in a string recursively (Java)

I am trying to write a method which returns the number of times the first character of a string appears throughout the string. This is what I have so far,
public int numberOfFirstChar0(String str) {
char ch = str.charAt(0);
if (str.equals("")) {
return 0;
}
if ((str.substring(0, 1).equals(ch))) {
return 1 + numberOfFirstChar0(str.substring(1));
}
return numberOfFirstChar0(str);
}
however, it does not seem to work (does not return the correct result of how many occurrences there are in the string). Is there anything wrong with the code? Any help is appreciated.
This uses 2 functions, one which is recursive. We obtain the character at the first index and the character array from the String once instead of doing it over and over and concatenating the String. We then use recursion to continue going through the indices of the character array.
Why you would do this I have no idea. A simple for-loop would achieve this in a much easier fashion.
private static int numberOfFirstChar0(String str) {
if (str.isEmpty()) {
return 0;
}
char[] characters = str.toCharArray();
char character = characters[0];
return occurrences(characters, character, 0, 0);
}
private static int occurrences(char[] characters, char character, int index, int occurrences) {
if (index >= characters.length) {
return occurrences;
}
if (characters[index] == character) {
occurrences++;
}
return occurrences(characters, character, ++index, occurrences);
}
Java 8 Solution
private static long occurrencesOfFirst(String input) {
if (input.isEmpty()) {
return 0;
}
char characterAtIndexZero = input.charAt(0);
return input.chars()
.filter(character -> character == characterAtIndexZero)
.count();
}
Here is a simple example of what you are looking for.
Code
public static void main(String args[]) {
//the string we will use to count the occurrences of the first character
String countMe = "abcaabbbdc";
//the counter used
int charCount=0;
for(int i = 0;i<countMe.length();i++) {
if(countMe.charAt(i)==countMe.charAt(0)) {
//add to counter
charCount++;
}
}
//print results
System.out.println("The character '"+countMe.charAt(0)+"' appears "+ charCount+ " times");
}
Output
The character 'a' appears 3 times
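For comparison with the loop, a recursive version can keep the target character fixed and shrink the remaining text on each call. That avoids the problems in the question's code, which reads charAt(0) before the empty check, compares a String with a char via equals, and recurses on the unchanged string. This is only a sketch with hypothetical names.

```java
final class FirstCharCounter {
    // Entry point: counts how often the first character occurs in the whole string.
    static int countFirstChar(String s) {
        return s.isEmpty() ? 0 : count(s, s.charAt(0));
    }

    // Counts how often target occurs in rest, consuming one character per call.
    private static int count(String rest, char target) {
        if (rest.isEmpty()) {
            return 0;
        }
        int here = rest.charAt(0) == target ? 1 : 0;
        return here + count(rest.substring(1), target);
    }

    public static void main(String[] args) {
        System.out.println(countFirstChar("abcaabbbdc"));
    }
}
```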

java, how to determine if a string contains a sub string in a particular order

I have two strings to compare
String st1 = "database-2.0/version\"25-00\"";
String st2 = "database2.0version25";
I want to determine if st1 contains st2. In the example provided I expect to get Yes as the answer, because the characters of st2 appear in st1 in the same order and st2 is only missing some characters. Is there any function in the Java library to do such a comparison? I am aware of st1.indexOf(st2) and st1.contains(st2), but neither worked in this case: indexOf returned -1 and contains returned false.
Try this:
String regex = st2.chars()
.mapToObj(i -> String.valueOf((char) i))
.map(str -> ".*+?^${}()|[]\\".contains(str) ? "\\" + str : str)
.collect(Collectors.joining(".*", ".*", ".*"));
boolean contains = st1.matches(regex);
Here's a rundown:
Get a regex string of the shorter string (st2 in our case, hardcoded here, but you can of course automate this), adding .* in front, at the back, and between each character (.* matches 0 or more of any character).
String.chars() returns an IntStream; convert each element to a String with a type cast.
As @Robert suggested, escape special characters with a backslash.
Check whether the longer string matches, which effectively means it contains all characters of the short string in order, and maybe more.
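Instead of maintaining a list of characters to escape, each character can be passed through Pattern.quote, and Pattern.DOTALL lets `.` cross line breaks. The following sketch of the same idea uses an illustrative class and method name.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class SubsequenceRegex {
    // True if the code points of wanted occur in text in the same order,
    // possibly with other characters in between.
    static boolean containsInOrder(String text, String wanted) {
        String regex = wanted.codePoints()
                // quote each code point so regex metacharacters are taken literally
                .mapToObj(cp -> Pattern.quote(new String(Character.toChars(cp))))
                .collect(Collectors.joining(".*", ".*", ".*"));
        return Pattern.compile(regex, Pattern.DOTALL).matcher(text).matches();
    }

    public static void main(String[] args) {
        String st1 = "database-2.0/version\"25-00\"";
        String st2 = "database2.0version25";
        System.out.println(containsInOrder(st1, st2));
    }
}
```

For long inputs the chain of `.*` groups can backtrack heavily, so a plain loop is likely the safer choice when performance matters.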
What you are looking for is a subsequence, not a substring.
Here's a working solution I found on geeksforgeeks:
// Recursive Java program to check if a string
// is subsequence of another string
import java.io.*;
class SubSequence
{
// Returns true if str1[] is a subsequence of str2[]
// m is length of str1 and n is length of str2
static boolean isSubSequence(String str1, String str2, int m, int n)
{
// Base Cases
if (m == 0)
return true;
if (n == 0)
return false;
// If last characters of two strings are matching
if (str1.charAt(m-1) == str2.charAt(n-1))
return isSubSequence(str1, str2, m-1, n-1);
// If last characters are not matching
return isSubSequence(str1, str2, m, n-1);
}
// Driver program
public static void main (String[] args)
{
String str1 = "database2.0version25";
String str2 = "database-2.0/version\"25-00\"";
int m = str1.length();
int n = str2.length();
boolean res = isSubSequence(str1, str2, m, n);
if(res)
System.out.println("Yes");
else
System.out.println("No");
}
}
// Contributed by Pramod Kumar
You can find the subsequence needle in the string haystack by looking for needle's characters in order, starting from an index searchFrom that you update as you find each successive character.
In the following code, note that haystack.indexOf(needleChar, searchFrom) returns the index of the first occurrence of needleChar starting from index searchFrom in haystack.
boolean contains(String haystack, String needle) {
int searchFrom = 0;
for (char needleChar : needle.toCharArray()) {
int found = haystack.indexOf(needleChar, searchFrom);
if (found == -1) {
return false;
}
// continue after the match so that one haystack character is not matched twice
searchFrom = found + 1;
}
return true;
}

Add separator in string using regex in Java

I have a string (for example: "foo12"), and I want to add a delimiting character in between the letters and numbers (e.g. "foo|12"). However, I can't seem to figure out what the appropriate code is for doing this in Java. Should I use a regex + replace or do I need to use a matcher?
A regex replace would be just fine:
String result = subject.replaceAll("(?<=\\p{L})(?=\\p{N})", "|");
This looks for a position right after a letter and right before a digit (by using lookaround assertions). If you only want to look for ASCII letters/digits, use
String result = subject.replaceAll("(?i)(?<=[a-z])(?=[0-9])", "|");
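To see what the lookaround version does on a few inputs, here is a small runnable sketch. The class name, method name and sample strings are only for illustration.

```java
import java.util.regex.Matcher;

final class LetterDigitSeparator {
    // Inserts delimiter at every position that follows a letter and precedes a digit.
    static String separate(String subject, String delimiter) {
        // quoteReplacement keeps '$' and '\' in the delimiter from being interpreted
        return subject.replaceAll("(?<=\\p{L})(?=\\p{N})",
                Matcher.quoteReplacement(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(separate("foo12", "|"));      // foo|12
        System.out.println(separate("foo12bar34", "|")); // foo|12bar|34
        System.out.println(separate("12foo", "|"));      // 12foo (digit before letter: no match)
    }
}
```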
Split letters and numbers and concatenate with "|". Here is a one-liner:
String x = "foo12";
String result = x.replaceAll("[0-9]", "") + "|" + x.replaceAll("[a-zA-Z]", "");
Printing result will output: foo|12
Why even use regex? This isn't too hard to implement on your own:
public static String addDelimiter(String str, char delimiter) {
StringBuilder string = new StringBuilder(str);
boolean isLetter = false;
boolean isNumber = false;
for (int index = 0; index < string.length(); index++) {
isNumber = isNumber(string.charAt(index));
if (isLetter && isNumber) {
//the last char was a letter, and now we have a number
//so here we adjust the stringbuilder
string.insert(index, delimiter);
index++; //We just inserted the delimiter, get past the delimiter
}
isLetter = isLetter(string.charAt(index));
}
return string.toString();
}
public static boolean isLetter(char c) {
return 'A' <= c && c <= 'Z' || 'a' <= c && c <= 'z';
}
public static boolean isNumber(char c) {
return '0' <= c && c <= '9';
}
The advantage of this over regex is that regex can easily be slower. Additionally, it is easy to change the isLetter and isNumber methods to allow for inserting the delimiter in different places.

Check if String contains only letters

The idea is to have a String read and to verify that it does not contain any numeric characters. So something like "smith23" would not be acceptable.
What do you want? Speed or simplicity? For speed, go for a loop based approach. For simplicity, go for a one liner RegEx based approach.
Speed
public boolean isAlpha(String name) {
char[] chars = name.toCharArray();
for (char c : chars) {
if(!Character.isLetter(c)) {
return false;
}
}
return true;
}
Simplicity
public boolean isAlpha(String name) {
return name.matches("[a-zA-Z]+");
}
Java 8 lambda expressions. Both fast and simple.
boolean allLetters = someString.chars().allMatch(Character::isLetter);
Or if you are using Apache Commons, [StringUtils.isAlpha()].
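One caveat with chars(): it streams UTF-16 code units, so a letter outside the Basic Multilingual Plane arrives as two surrogates, neither of which is a letter, and allMatch on an empty string returns true. A sketch that uses codePoints() and rejects the empty string, with hypothetical names:

```java
final class LetterCheck {
    // True if s is non-empty and every code point is a letter.
    static boolean isAllLetters(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        // U+10400 is a letter that needs a surrogate pair in UTF-16.
        String supplementary = new String(Character.toChars(0x10400));
        System.out.println(supplementary.chars().allMatch(Character::isLetter));
        System.out.println(isAllLetters(supplementary));
        System.out.println(isAllLetters("smith23"));
    }
}
```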
First import Pattern :
import java.util.regex.Pattern;
Then use this simple code:
String s = "smith23";
if (Pattern.matches("[a-zA-Z]+",s)) {
// Do something
System.out.println("Yes, string contains letters only");
}else{
System.out.println("Nope, Other characters detected");
}
This will output:
Nope, Other characters detected
I used the regular expression ".*[a-zA-Z]+.*". Negated with !, it accepts only strings that contain no letter at all, whether at the start, at the end, or between other characters.
String strWithLetters = "123AZ456";
return !Pattern.matches(".*[a-zA-Z]+.*", strWithLetters);
A quick way to do it is by:
public boolean isStringAlpha(String aString) {
int charCount = 0;
String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
if (aString.length() == 0) {
return false; //zero length string ain't alpha
}
for (int i = 0; i < aString.length(); i++) {
for (int j = 0; j < alphabet.length(); j++) {
if (aString.substring(i, i + 1).equals(alphabet.substring(j, j + 1))
|| aString.substring(i, i + 1).equals(alphabet.substring(j, j + 1).toLowerCase())) {
charCount++;
}
}
if (charCount != (i + 1)) {
System.out.println("\n**Invalid input! Enter alpha values**\n");
return false;
}
}
return true;
}
Because you don't have to run the whole aString to check if it isn't an alpha String.
private boolean isOnlyLetters(String s){
char c=' ';
boolean isGood=false, safe=isGood;
int failCount=0;
for(int i=0;i<s.length();i++){
c = s.charAt(i);
if(Character.isLetter(c))
isGood=true;
else{
isGood=false;
failCount+=1;
}
}
if(failCount==0 && s.length()>0)
safe=true;
else
safe=false;
return safe;
}
I know it's a bit crowded. I was using it in my program and wanted to share it. It tells you whether every character in a string is a letter. Use it if you want something that is easy to follow when you look back at it.
A faster way is below, assuming letters are only a-z and A-Z.
public static void main( String[] args ){
System.out.println(bestWay("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(isAlpha("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(bestWay("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
System.out.println(isAlpha("azAZpratiyushkumarsinghjdnfkjsaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
}
public static boolean bestWay(String name) {
char[] chars = name.toCharArray();
long startTimeOne = System.nanoTime();
for(char c : chars){
if(!(c>=65 && c<=90)&&!(c>=97 && c<=122) ){
System.out.println(System.nanoTime() - startTimeOne);
return false;
}
}
System.out.println(System.nanoTime() - startTimeOne);
return true;
}
public static boolean isAlpha(String name) {
char[] chars = name.toCharArray();
long startTimeOne = System.nanoTime();
for (char c : chars) {
if(!Character.isLetter(c)) {
System.out.println(System.nanoTime() - startTimeOne);
return false;
}
}
System.out.println(System.nanoTime() - startTimeOne);
return true;
}
Runtime is measured in nanoseconds and may vary from system to system.
5748 //bestWay without numbers
true
89493 //isAlpha without numbers
true
3284 //bestWay with numbers
false
22989 //isAlpha with numbers
false
Check this; I think it will help you, because it works in my project:
if (!Pattern.matches("[a-zA-Z]+", str1))
{
// the string does not contain only letters
}
else
{
// the string contains only letters
}
String expression = "^[a-zA-Z]*$";
CharSequence inputStr = str;
Pattern pattern = Pattern.compile(expression);
Matcher matcher = pattern.matcher(inputStr);
if(matcher.matches())
{
//if pattern matches
}
else
{
//if pattern does not matches
}
Try using regular expressions: String.matches
public boolean isAlpha(String name)
{
String s=name.toLowerCase();
for(int i=0; i<s.length();i++)
{
if((s.charAt(i)>='a' && s.charAt(i)<='z'))
{
continue;
}
else
{
return false;
}
}
return true;
}
It seems the need is to find out whether the characters are all alphabetic.
Here's how you can solve it:
Character.isAlphabetic(c)
checks whether a character of the string is alphabetic,
where c is
char c = s.charAt(elementIndex);
While there are many ways to skin this cat, I prefer to wrap such code into reusable extension methods that make it trivial to do going forward. When using extension methods, you can also avoid RegEx as it is slower than a direct character check. I like using the extensions in the Extensions.cs NuGet package. It makes this check as simple as:
Add the https://www.nuget.org/packages/Extensions.cs package to your project.
Add "using Extensions;" to the top of your code.
"smith23".IsAlphabetic() will return False whereas "john smith".IsAlphabetic() will return True. By default the .IsAlphabetic() method ignores spaces, but it can also be overridden such that "john smith".IsAlphabetic(false) will return False since the space is not considered part of the alphabet.
Every other check in the rest of the code is simply MyString.IsAlphabetic().
To allow only ASCII letters, the character class \p{Alpha} can be used. (This is equivalent to [\p{Lower}\p{Upper}] or [a-zA-Z].)
boolean allLettersASCII = str.matches("\\p{Alpha}*");
For allowing all Unicode letters, use the character class \p{L} (or equivalently, \p{IsL}).
boolean allLettersUnicode = str.matches("\\p{L}*");
See the Pattern documentation.
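The difference between the two classes shows up as soon as an accented letter appears. A brief sketch with illustrative names follows; since `*` also accepts the empty string, it uses `+` on the assumption that empty input should be rejected.

```java
final class LetterClasses {
    // ASCII letters only, at least one character.
    static boolean asciiLettersOnly(String s) {
        return s.matches("\\p{Alpha}+");
    }

    // Any Unicode letters, at least one character.
    static boolean unicodeLettersOnly(String s) {
        return s.matches("\\p{L}+");
    }

    public static void main(String[] args) {
        String accented = "R\u00E9al"; // contains e with acute accent
        System.out.println(asciiLettersOnly("Real"));
        System.out.println(asciiLettersOnly(accented));
        System.out.println(unicodeLettersOnly(accented));
        System.out.println(unicodeLettersOnly("smith23"));
    }
}
```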
I found an easy way of checking whether every character of a string is a letter.
public static boolean isStringLetter(String input) {
for (int id = 0; id < input.length(); id++) {
char c = input.charAt(id);
boolean isLower = 'a' <= c && c <= 'z';
boolean isUpper = 'A' <= c && c <= 'Z';
if (!isLower && !isUpper) {
// stop at the first non-letter; a later letter must not reset the result
return false;
}
}
// an empty string contains no letters
return !input.isEmpty();
}
I hope it could help anyone who is looking for such method.
Use StringUtils.isAlpha() method and it will make your life simple.

How to check if a String contains only ASCII?

The call Character.isLetter(c) returns true if the character is a letter. But is there a way to quickly find if a String only contains the base characters of ASCII?
From Guava 19.0 onward, you may use:
boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);
This uses the matchesAllOf(someString) method which relies on the factory method ascii() rather than the now deprecated ASCII singleton.
Here ASCII includes all ASCII characters including the non-printable characters lower than 0x20 (space) such as tabs, line-feed / return but also BEL with code 0x07 and DEL with code 0x7F.
This code incorrectly uses characters rather than code points, even if code points are indicated in the comments of earlier versions. Fortunately, a code point with a value of U+10000 or above is encoded as two surrogate characters whose values lie outside the ASCII range, so the method still succeeds in testing for ASCII, even for strings containing emoji.
For earlier Guava versions without the ascii() method you may write:
boolean isAscii = CharMatcher.ASCII.matchesAllOf(someString);
You can do it with java.nio.charset.Charset.
import java.nio.charset.Charset;
public class StringUtils {
public static boolean isPureAscii(String v) {
return Charset.forName("US-ASCII").newEncoder().canEncode(v);
// or "ISO-8859-1" for ISO Latin 1
// or StandardCharsets.US_ASCII with JDK1.7+
}
public static void main (String args[])
throws Exception {
String test = "Réal";
System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
test = "Real";
System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
/*
* output :
* Réal isPureAscii() : false
* Real isPureAscii() : true
*/
}
}
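On Java 7 and later the charset lookup by name can be replaced by the StandardCharsets constant mentioned in the comment. A sketch of that variant follows; it creates a fresh encoder per call because CharsetEncoder instances are not safe for use by multiple concurrent threads. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class AsciiEncoderCheck {
    // True if every character of s can be encoded as US-ASCII.
    static boolean isPureAscii(String s) {
        // a new encoder per call avoids sharing encoder state between threads
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(isPureAscii("Real"));
        System.out.println(isPureAscii("R\u00E9al"));
    }
}
```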
Detect non-ASCII character in a String
Here is another way not depending on a library but using a regex.
You can use this single line:
text.matches("\\A\\p{ASCII}*\\z")
Whole example program:
public class Main {
public static void main(String[] args) {
char nonAscii = 0x00FF;
String asciiText = "Hello";
String nonAsciiText = "Buy: " + nonAscii;
System.out.println(asciiText.matches("\\A\\p{ASCII}*\\z"));
System.out.println(nonAsciiText.matches("\\A\\p{ASCII}*\\z"));
}
}
Understanding the regex :
\\A : Beginning of input
\\p{ASCII} : Any ASCII character
* : all repetitions
\\z : End of input
Iterate through the string and make sure all the characters have a value less than 128.
Java Strings are conceptually encoded as UTF-16. In UTF-16, the ASCII character set is encoded as the values 0 - 127 and the encoding for any non ASCII character (which may consist of more than one Java char) is guaranteed not to include the numbers 0 - 127
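The claim about multi-char encodings can be checked directly: a character beyond U+FFFF is stored as two char values in the range 0xD800 to 0xDFFF, so a per-char comparison against 128 never mistakes part of it for ASCII. A sketch with illustrative names:

```java
final class AsciiScan {
    // True if every UTF-16 code unit is below 128.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        // One code point above U+FFFF, stored as a surrogate pair.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());
        emoji.chars().forEach(c -> System.out.println(Integer.toHexString(c)));
        System.out.println(isAscii("Hello"));
        System.out.println(isAscii("Hello " + emoji));
    }
}
```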
Or you copy the code from the IDN class.
// to check if a string only contains US-ASCII code point
//
private static boolean isAllASCII(String input) {
boolean isASCII = true;
for (int i = 0; i < input.length(); i++) {
int c = input.charAt(i);
if (c > 0x7F) {
isASCII = false;
break;
}
}
return isASCII;
}
commons-lang3 from Apache contains valuable utility/convenience methods for all kinds of 'problems', including this one.
System.out.println(StringUtils.isAsciiPrintable("!#£$%^&!#£$%^"));
try this:
for (char c: string.toCharArray()){
if (((int)c)>127){
return false;
}
}
return true;
This will return true if String only contains ASCII characters and false when it does not
Charset.forName("US-ASCII").newEncoder().canEncode(str)
If you want to remove non-ASCII characters, here is a snippet:
if(!Charset.forName("US-ASCII").newEncoder().canEncode(str)) {
str = str.replaceAll("[^\\p{ASCII}]", "");
}
In Java 8 and above, one can use String#codePoints in conjunction with IntStream#allMatch.
boolean allASCII = str.codePoints().allMatch(c -> c < 128);
In Kotlin:
fun String.isAsciiString() : Boolean =
this.toCharArray().none { it < ' ' || it > '~' }
Iterate through the string, and use charAt() to get the char. Then treat it as an int, and see if it has a unicode value (a superset of ASCII) which you like.
Break at the first you don't like.
private static boolean isASCII(String s)
{
for (int i = 0; i < s.length(); i++)
if (s.charAt(i) > 127)
return false;
return true;
}
It is possible. Nice problem.
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
public class EncodingTest {
static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII")
.newEncoder();
public static void main(String[] args) {
String testStr = "¤EÀsÆW°ê»Ú®i¶T¤¤¤ß3¼Ó®i¶TÆU2~~KITEC 3/F Rotunda 2";
String[] strArr = testStr.split("~~", 2);
int count = 0;
boolean encodeFlag = false;
do {
encodeFlag = asciiEncoderTest(strArr[count]);
System.out.println(encodeFlag);
count++;
} while (count < strArr.length);
}
public static boolean asciiEncoderTest(String test) {
boolean encodeFlag = false;
try {
encodeFlag = asciiEncoder.canEncode(new String(test
.getBytes("ISO8859_1"), "BIG5"));
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return encodeFlag;
}
}
//return is uppercase or lowercase
public boolean isASCIILetter(char c) {
return (c > 64 && c < 91) || (c > 96 && c < 123);
}
