How to check whether a string contains only ASCII characters in Java

To check whether a string contains only ASCII characters, which of the following is the better choice?
java.nio.charset.Charset.forName("US-ASCII").newEncoder().canEncode("Desired character string to be checked")
or
Convert the String to a character array and use the
org.apache.commons.lang.CharUtils.isAscii() method to check each character.
What are the differences, and which one performs better? I know the second option has the additional step of converting the string to a character array and then checking each character.

You can use regex as a quick shortcut.
String asciiText = "Hello";
System.out.println(asciiText.matches("\\A\\p{ASCII}*\\z"));
This returns true only if every character in the string is ASCII.
Regards.
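For comparison, here is a minimal sketch of the two options from the question side by side. The class and method names are made up for illustration. The loop version relies on ASCII being the code range 0 to 127, which is the same test CharUtils.isAscii(char) performs, so no library and no array copy is needed. I have not benchmarked either one; the plain loop is likely the cheapest because it allocates nothing, but measure before relying on that.

```java
import java.nio.charset.StandardCharsets;

public class AsciiCheck {
    // Option 1 from the question: ask a US-ASCII encoder whether it can encode the text.
    static boolean isAsciiByEncoder(String s) {
        // A fresh encoder per call, because CharsetEncoder is not thread-safe.
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    // Option 2 without a library or an array copy: ASCII is the range 0..127.
    static boolean isAsciiByLoop(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String plain = "Hello";
        String accented = "h\u00e9llo"; // contains U+00E9, which is outside ASCII
        System.out.println(isAsciiByEncoder(plain) + " " + isAsciiByLoop(plain));
        System.out.println(isAsciiByEncoder(accented) + " " + isAsciiByLoop(accented));
    }
}
```

Both methods should agree on any input; the encoder version is mostly useful when the charset is not fixed to ASCII.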

Related

Regular expression matching Unicode and Number string

Problem Description
I have the string "Վիկտոր1 Ափոյան2", and using a regular expression I want to get the first letter of each word, so that the result is "ՎԱ". As the string is Unicode, I'm using the following regex:
"(\\p{L})\\p{L}*\\s(\\p{L})\\p{L}*"
This works fine if the string does not contain the numbers "1" and "2". To handle them I also tried the following regex:
"(\\p{L}\\p{N})\\p{L}\\p{N}*\\s(\\p{L}\\p{N})\\p{L}\\p{N}*"
but this does not work correctly.
Is there something like "\\p{LN}" which will check for Unicode letters and numbers at the same time, or does anyone know how I can solve this issue?
Is there something like "\p{LN}" which will check for Unicode letters and numbers at the same time?
Use a character class [\p{L}\p{N}], which matches either a Unicode letter or a Unicode number. Writing \p{L}\p{N} side by side, as in your second attempt, matches a letter followed by a number rather than either one.
Alternatively, use \p{Alnum} with the Pattern.UNICODE_CHARACTER_CLASS flag (or prepend the pattern with (?U)).
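A small sketch of how the character class could be applied to the string from the question. The class and method names are invented for the example, and the sample string is written with Unicode escapes only so the source file stays ASCII. I am assuming each word starts with a letter and that the words are separated by whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstLetters {
    // A leading letter, then any mix of letters and numbers, for two whitespace-separated words.
    private static final Pattern TWO_WORDS =
            Pattern.compile("(\\p{L})[\\p{L}\\p{N}]*\\s+(\\p{L})[\\p{L}\\p{N}]*");

    // Returns the first letters of the two words, or "" if the text does not match.
    static String initials(String text) {
        Matcher m = TWO_WORDS.matcher(text);
        return m.find() ? m.group(1) + m.group(2) : "";
    }

    public static void main(String[] args) {
        // The sample string from the question, written with Unicode escapes.
        String name = "\u054E\u056B\u056F\u057F\u0578\u05801 \u0531\u0583\u0578\u0575\u0561\u05762";
        System.out.println(initials(name));
    }
}
```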

Java - method equals and more possible letters

I am doing a cipher program, so I am replacing letters with other letters.
Here comes my problem:
I need to replace ONLY letters, not special symbols, so I am checking whether the selected character is a letter or not. I need to use the equals method with several possible letters to get "true".
I have:
if(pismenka[i].equals(["abcdefghijklmnopqrstuvwxyz"]))
but it doesn't work at all; it was only an idea. Do I have to use the || operator, or is there a cleaner solution?
Thank you, AliFox.
For identifying characters you can use the following (Java Character):
Character.isLetter(<target_char>)
And if a replacement is to be done, the following would help you replace the characters a-z and A-Z:
<target_string>.replaceAll("[a-zA-Z]",<replacement>)
Use the String's replaceAll method, that takes a regex as a first argument.
Assuming you want to replace all non-special characters with a String replacement, the command would be:
pismenka[i].replaceAll("[a-z]",replacement);
Then, you don't need another explicit check if your String matches the regex. It is done inside this method. If your String does not contain any non-special characters, it is left intact.
Character#equals(Object) checks whether or not your character is equal to the array itself (not to any of its elements). They are obviously not equal.
If you need to check that your character is a lower-case Latin letter, then use this check:
c >= 'a' && c <= 'z'
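To show the range check in context, here is a rough sketch of a cipher loop that changes only letters and copies everything else through. The question does not say which cipher is used, so the Caesar-style shift, the class name and the method name are all assumptions made for the example.

```java
public class LetterShift {
    // Shifts only ASCII letters by `shift` positions; every other character is copied unchanged.
    static String shiftLetters(String text, int shift) {
        StringBuilder out = new StringBuilder(text.length());
        int k = Math.floorMod(shift, 26); // keeps negative shifts in the range 0..25
        for (char c : text.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                out.append((char) ('a' + (c - 'a' + k) % 26));
            } else if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + (c - 'A' + k) % 26));
            } else {
                out.append(c); // digits, punctuation, spaces, non-Latin letters
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(shiftLetters("Hello, World!", 3)); // prints "Khoor, Zruog!"
    }
}
```

If non-Latin letters should be treated as letters too, Character.isLetter(c) is the check to use, but then the shift arithmetic above no longer applies as written.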

How do I get the 5th word in a string? Java

Say I have a string holding a text document and I want to save the 124th word of that string in another string. How would I do this? I assume each "word" is a run of text between spaces (including things like hyphens).
Edit:
Basically, what I'm doing right now is grabbing text off a webpage, and I want to get the number values next to certain words. For example, the webpage is saved in a string like .....health 78 mana 32..... or something of the sort, and I want to get the 78 and the 32 and save each as a variable.
If you have a string
String s = "...";
then you can get the word (separated by spaces) in the nth position using split(delimiter) which returns an array String[]:
String word = s.split("\\s+")[n-1];
Note:
The argument passed to split() is the delimiter. In this case, "\\s+" is a regular expression that matches one or more whitespace characters, so the string is split on each run of whitespace, even when several whitespace characters appear together.
Why not convert the String to a String array using StringName.split(" "), i.e. split the string on spaces? Then it is only a matter of retrieving the 124th element of the array (index 123).
For example, you have a string like this:
String a="Hello stackoverflow i am Gratin";
To get the 5th word, write:
System.out.println(a.split("\\s+")[4]);
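This is a different approach that returns a blank String if the first four words are not followed by a 5th one:
String word = input.replaceAll("^\\s*(\\S+\\s+){4}(\\S+)?.*", "$2");
The 5th word is captured by the second group, hence "$2". If the pattern does not match at all (fewer than four words that are each followed by whitespace), replaceAll returns the input unchanged.
Solutions that rely on split() would need an extra step of checking the resulting array length to prevent getting an ArrayIndexOutOfBoundsException if there are fewer than 5 words.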
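Putting the split() approach and the length check together, a sketch could look like the following. It also covers the "health 78 mana 32" case from the edit with a small regex lookup. The class name, the method names and the choice to return "" or an empty OptionalInt on a miss are my own assumptions, not something from the question.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordLookup {
    // Returns the n-th (1-based) whitespace-separated word, or "" if there are fewer than n words.
    static String nthWord(String text, int n) {
        String[] words = text.trim().split("\\s+");
        if (n < 1 || n > words.length) {
            return "";
        }
        return words[n - 1];
    }

    // Returns the integer that follows `label` (separated by whitespace), if there is one.
    static OptionalInt valueAfter(String text, String label) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(label) + "\\s+(\\d+)").matcher(text);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(nthWord("Hello stackoverflow i am Gratin", 5)); // prints "Gratin"
        String page = ".....health 78 mana 32.....";
        System.out.println(valueAfter(page, "health").getAsInt()); // prints 78
        System.out.println(valueAfter(page, "mana").getAsInt());   // prints 32
    }
}
```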

How to encode a string to replace all special characters

I have a string that contains special characters, and I need to convert it into a string without any. I tried Base64, but Base64 uses the equals sign (=), which is itself a special character. I want the result to contain only alphanumeric characters. I also can't simply remove the special characters; I have to replace them so that two different input strings still produce different outputs. Which encoding will help me achieve this?
The simplest option would be to encode the text to binary using UTF-8, and then convert the binary back to text as hex (two characters per byte). It won't be terribly efficient, but it will just be alphanumeric.
You could use base32 instead to be a bit more efficient, but that's likely to be significantly more work, unless you can find a library which supports it out of the box. (Libraries to perform hex encoding are very common.)
There are a number of variations of base64, some of which don't use padding. (You still have a couple of non-alphanumeric characters for characters 62 and 63.)
The Wikipedia page on base64 goes into the details, including the "standard" variations used for a number of common use-cases. (Does yours match one of those?)
If your strings have to be strictly alphanumeric, then you'll need to use hex encoding (one byte becomes 2 hex digits), or roll your own encoding scheme. Your stated requirements are rather unusual ...
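The hex route can be sketched in a few lines using only the standard library. The class and method names are placeholders; the output uses just 0-9 and a-f, so it is alphanumeric, and distinct inputs give distinct outputs because the encoding is reversible.

```java
import java.nio.charset.StandardCharsets;

public class HexCodec {
    // Encodes the UTF-8 bytes of the text as lower-case hex, two characters per byte.
    static String toHex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    // Reverses toHex; assumes an even-length string of valid hex digits.
    static String fromHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = toHex("a=b!");
        System.out.println(encoded);          // prints "613d6221"
        System.out.println(fromHex(encoded)); // prints "a=b!"
    }
}
```

The cost is that the output is twice as long as the UTF-8 input.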
Commons codec has a url safe version of base64, which emits - and _ instead of + and / characters
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html#encodeBase64URLSafe(byte[])
The easiest way would be to use a regular expression to match all nonalphanumeric characters and replace them with an empty string.
// This will remove all special characters except space.
var cleaned = stringToReplace.replace(/[^\w\s]/gm, '')
Adding any special characters to the above regex will skip that character.
// This will remove all special characters except space and period.
var cleaned = stringToReplace.replace(/[^\w\s.]/gm, '')
A working example.
const regex = /[^\w\s]/gm;
const str = `This is a text with many special characters.
Hello, user, your password is 543#!\$32=!`;
const subst = ``;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Regex explained.
/[^\w\s]/gm
Match a single character not present in the list below [^\w\s]
\w matches any word character (equivalent to [a-zA-Z0-9_])
\s matches any whitespace character (equivalent to [\r\n\t\f\v \u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff])
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
If you truly can only use alphanumeric characters, you will have to come up with an escaping scheme that uses one of those characters as the escape. For example, use 0 as the escape and encode each special character as the 2-digit hex code of its ASCII value. Use 000 to mean a literal 0.
e.g.
This is my special sentence with a 0.
encodes to:
This020is020my020special020sentence020with020a02000002e
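Here is one way the scheme above might be written out, as a sketch rather than a finished codec. The names are invented, and I am assuming the input is plain ASCII with no NUL character, because the code 000 is already taken by the literal 0. Anything outside that range is rejected instead of being guessed at.

```java
public class ZeroEscape {
    // Letters and the digits 1-9 pass through; 0 becomes "000"; other ASCII becomes "0" + 2 hex digits.
    static String encode(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '0') {
                sb.append("000");
            } else if ((c >= '1' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                sb.append(c);
            } else if (c >= 1 && c <= 0x7F) {
                sb.append('0').append(String.format("%02x", (int) c));
            } else {
                throw new IllegalArgumentException("Only non-NUL ASCII is supported: " + (int) c);
            }
        }
        return sb.toString();
    }

    // Reverses encode; assumes the input was produced by encode.
    static String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c != '0') {
                sb.append(c);
                i++;
                continue;
            }
            String code = encoded.substring(i + 1, i + 3); // the two characters after the escape
            sb.append(code.equals("00") ? '0' : (char) Integer.parseInt(code, 16));
            i += 3;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("This is my special sentence with a 0.");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

Compared with plain hex, this keeps letters readable and only grows the string where special characters or zeros occur.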

Regular Expression to match more than one occurrence of a character

I need help coming up with a regular expression that matches if a string has more than one occurrence of a character. I have already validated the lengths of the two strings, and they will always be equal. Here's what I mean: take the strings "aab" and "abb". Both should match the regular expression because they have repeating characters, the "aa" in the first string and the "bb" in the second.
Since you say "aba"-style repetition doesn't count, back-references should make this simple:
(.)\1+
This finds runs of the same character repeated back to back. Try it out:
java.util.regex.Pattern.compile("(.)\\1+").matcher("b").find(); // false
java.util.regex.Pattern.compile("(.)\\1+").matcher("bbb").find(); // true
If you're checking anagrams maybe a different algorithm would be better.
If you sort your strings (both the original and the candidate), checking for anagrams can be done with a string comparison.
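The sorting idea might look like this; it is only a sketch with a made-up class name, and it compares the strings char by char, so it is case-sensitive and treats spaces as ordinary characters.

```java
import java.util.Arrays;

public class AnagramCheck {
    // Two strings are anagrams when their sorted characters are identical.
    static boolean isAnagram(String a, String b) {
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y);
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("aab", "aba")); // prints "true"
        System.out.println(isAnagram("aab", "abb")); // prints "false"
    }
}
```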
static final String REGEX_MORE_THAN_ONE_OCCURANCE_OF_B = "([b])\\1{1,}";
static final String REGEX_MORE_THAN_ONE_OCCURANCE_OF_B_AS_PREFIX_TO_A = "(b)\\1+([a])";
