How do I remove illegal characters in a subdomain? - java

I'm using Java 6. Using an Amazon AWS library, I'm dynamically creating domains. However, I'm looking for a function that can strip out illegal characters from a subdomain. E.g. if my function were about to create
dave'ssite.mydomain.com
I would like to pass the string "dave'ssite" to some function, which would strip out the apostrophe, or whatever other illegal characters lurked in the subdomain.
How do I do that? More specifically, how do I identify which characters are illegal in a subdomain?

Subdomains follow the same rules as domains, so the allowed characters are most likely A-Z, a-z, 0-9 and -. Therefore you can use a regex.
...
String s = "dave's-site.mydomain.com";
//prints daves-sitemydomaincom
System.out.println(s.replaceAll("[^A-Za-z0-9\\-]",""));
...
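The regex approach above can be extended into a small helper that cleans a single label (the part between dots) rather than a whole hostname. The sketch below is illustrative only: the class and method names are made up, and it assumes the usual DNS hostname rules (letters, digits and hyphens; no leading or trailing hyphen; at most 63 characters per label). It sticks to APIs available in Java 6.

```java
import java.util.Locale;

class SubdomainSanitizer {
    // Assumed limit: a single DNS label may be at most 63 characters long.
    private static final int MAX_LABEL_LENGTH = 63;

    /** Returns a cleaned label, or an empty string if nothing valid remains. */
    static String sanitizeLabel(String raw) {
        // Lowercase first; DNS names are case-insensitive.
        String s = raw.toLowerCase(Locale.ROOT);
        // Drop everything that is not a letter, digit, or hyphen.
        s = s.replaceAll("[^a-z0-9-]", "");
        // Truncate before trimming hyphens, so a cut cannot leave a trailing hyphen.
        if (s.length() > MAX_LABEL_LENGTH) {
            s = s.substring(0, MAX_LABEL_LENGTH);
        }
        // A label may not start or end with a hyphen.
        s = s.replaceAll("^-+|-+$", "");
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sanitizeLabel("dave'ssite")); // prints davessite
        System.out.println(sanitizeLabel("-My Site!-")); // prints mysite
    }
}
```

The caller still has to decide what to do when the result is empty or collides with an existing subdomain.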

Here is some code I made for a game. It's basically the same thing, except I used it for removing invalid characters from a username.
// Put all valid characters in this array; for subdomains, all letters and digits
static final char[] validChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
public static String cleanString(String text){
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
        // Keep the character only if it appears in validChars
        for (int j = 0; j < validChars.length; j++) {
            if (validChars[j] == text.charAt(i)) {
                sb.append(text.charAt(i));
                break;
            }
        }
    }
    return sb.toString();
}
As the comment in the code says, the char array contains all the valid characters, and anything else is removed. Keep in mind that this method returns the cleaned string rather than modifying its argument.

I stumbled on this while looking for a C# solution to the same problem, and here it is. Look how elegant C# LINQ makes this ;)
if (model.UserName.All(char.IsLetterOrDigit) && !model.UserName.StartsWith("-"))
{
//oh yeah, valid subdomain
}
Not sure what the spec says about length though.
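On the length question: under the usual DNS hostname rules each label is limited to 63 characters. A validation check along the lines of the C# snippet can be sketched in Java with a single regex. The class and method names here are hypothetical; unlike the C# version, this sketch accepts hyphens inside the label and only ASCII letters and digits.

```java
import java.util.regex.Pattern;

class SubdomainValidator {
    // One label: 1 to 63 characters, ASCII letters/digits/hyphens,
    // starting and ending with a letter or digit.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    static boolean isValidLabel(String label) {
        return label != null && LABEL.matcher(label).matches();
    }
}
```

Validating (and rejecting) is often safer than silently stripping characters, since stripping can turn two different inputs into the same subdomain.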

Related

Storing words from a .txt file into a String array

I was going through the answers of this question asked by someone previously and I found them to be very helpful. However, I have a question about the highlighted answer but I wasn't sure if I should ask there since it's a 6 year old thread.
My question is about this snippet of code given in the answers:
private static boolean isAWord(String token)
{
//check if the token is a word
}
How would you check that the token is a word? Would you .contains("\\s+") the string and check to see if it contains characters between them? But what about when you encounter a paragraph? I'm not sure how to go about this.
EDIT: I think I should've elaborated a bit more. Usually, you'd think a word would be something surrounded by " " but, for example, if the file contains a hyphen (which is also surrounded by a blank space), you'd want the isAWord() method to return false. How can I verify that something is actually a word and not punctuation?
Since the question wasn't entirely clear, I made two methods. The first method, consistsOfLetters, goes through the whole string and returns false if it contains any digits or symbols. This should be enough to determine whether a token is a word (if you don't mind whether that word exists in a dictionary or not).
public static boolean consistsOfLetters(String string) {
for(int i=0; i<string.length(); i++) {
if(string.charAt(i) == '.' && (i+1) == string.length() && string.length() != 1) break; // if last char of string is ., it is still word
if((string.toLowerCase().charAt(i) < 'a' || string.toLowerCase().charAt(i) > 'z')) return false;
} // toLowerCase is used to avoid having to compare it to A and Z
return true;
}
The second method splits the original String (for example, a sentence of potential words) on the " " character. We then go through every element and check whether it is a word. If one of them is not a word, the method returns false and skips the rest. If everything is fine, it returns true.
public static boolean isThisAWord(String string) {
String[] array = string.split(" ");
for(int i = 0; i < array.length; i++) {
if(consistsOfLetters(array[i]) == false) return false;
}
return true;
}
Also, this might not work for English, since English has apostrophes in words like "don't", so a bit of further tinkering is needed.
The Scanner in Java splits strings using its WHITESPACE_PATTERN by default, so splitting a string like "He's my friend" would result in an array like ["He's", "my", "friend"].
If that is sufficient, just remove that if clause and don't use that method.
If you want to get "He", "is" instead of "He's", you need a different approach.
In short: the method works as a verification check. If the given token is not supposed to be in the result, it returns false; otherwise it returns true.
return token.matches("[\\pL\\pM]+('(s|t))?");
matches requires the entire string to match.
The pattern accepts letters (\pL) and zero-width combining diacritical marks (\pM, i.e. accents).
It also optionally accepts an English apostrophe suffix, in case you want to treat doesn't and let's as one term (for instance, for translation purposes).
You might also consider hyphens.
Be aware that Unicode has several different single-quote and dash characters.
Path path = Paths.get("..../x.txt");
Charset charset = Charset.defaultCharset();
String content = Files.readString(path, charset);
Pattern wordPattern = Pattern.compile("[\\pL\\pM]+");
Matcher m = wordPattern.matcher(content);
while (m.find()) {
String word = m.group(); ...
}
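The two ideas above (a whole-token check with matches and a scan with find) can be combined in one small class. This is only a sketch: the class name, method names and the exact pattern are assumptions, and the pattern treats an ASCII apostrophe between letters as part of the word, so "don't" and "He's" count as single words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // Letters and combining marks, optionally joined by ASCII apostrophes.
    private static final Pattern WORD =
            Pattern.compile("[\\pL\\pM]+(?:'[\\pL\\pM]+)*");

    /** True only if the entire token is a word. */
    static boolean isAWord(String token) {
        return WORD.matcher(token).matches();
    }

    /** Collects every word found anywhere in the text. */
    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<String>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [He's, my, friend, don't, go]
        System.out.println(extractWords("He's my friend - don't go."));
    }
}
```

A lone hyphen or other punctuation never matches, which covers the case raised in the question's edit.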

Missing characters in text Java

I have a small string like: "OT-0*02"
and I have a "database" where I have the full strings "OT-0502" and "OT-0602".
I need to locate these based on the user input, which looks like the first string above, the star marks the unknown character, and it could be anywhere.
How do I do this? I've tried fooling around with the Pattern.matches...stuff and the regex thingy but it doesn't seem to give me any solution.
This is what I've got so far
rsz="OT-0*02";
int cv = 0;
while (cv < jarmu.size()-1){
if (Pattern.matches(jarmu.get(cv).substring(9), rsz)) {
System.out.println("fasz");
}
cv++;
}
It sounds like you just want a simple wildcard search where * matches any char. To do that, change all * in the user input to ., and escape everything else. Then you can use it as a pattern.
String userPattern = "O*-0*02";
String[] parts = userPattern.split("\\*", -1);
for (int i = 0; i < parts.length; i++) parts[i] = "\\Q" + parts[i] + "\\E";
userPattern = String.join(".", parts);
Then use someString.matches(userPattern) to check whether a given string matches the pattern.
Basically, you want * to mean "match any single char" but in regex, . performs this function, so you replace *s with .s. However, you don't want anything else to be interpreted as a special character, so the string is broken up into parts at the *s, and each part is quoted using \Q and \E to remove any special meaning, and then put together, with .s where the *s used to be. The -1 means *s at the ends of the string won't be lost.
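Put together, the wildcard lookup might look like the sketch below. The class and method names are hypothetical, and it uses Pattern.quote instead of writing \Q and \E by hand, which has the same effect of removing any special meaning from the literal parts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WildcardSearch {
    /** Converts user input where * means "any single character" into a regex. */
    static Pattern toPattern(String userInput) {
        // The -1 limit keeps empty parts, so a * at either end is not lost.
        String[] parts = userInput.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append('.'); // each * becomes "any one character"
            }
            regex.append(Pattern.quote(parts[i])); // everything else is literal
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns all entries that match the wildcard input in full. */
    static List<String> find(List<String> database, String userInput) {
        Pattern p = toPattern(userInput);
        List<String> hits = new ArrayList<String>();
        for (String entry : database) {
            if (p.matcher(entry).matches()) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> db = Arrays.asList("OT-0502", "OT-0602", "OT-0512");
        System.out.println(find(db, "OT-0*02")); // prints [OT-0502, OT-0602]
    }
}
```

Compiling the pattern once and reusing it for every entry avoids recompiling the regex inside the loop.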

the best way for character replacement in String in java

I want to go through a string and, for each character, either replace it with another character or keep it. Because the string is long, the time this takes is important. Which of the following is the best way, or is there a better idea?
In all of them I append the result to a StringBuilder.
Check every character with a for loop and charAt.
Use a switch inside the same kind of loop.
Use replaceAll twice.
Also, if one of the first two methods is better, is there any way to check a character against a group of characters, like:
if (st.charAt(i)=='a'..'z') ....
Edit:
Please tell me which way takes the least time, and explain why. I already know all of the ways you mentioned!
If you want to replace a single character (or a single sequence), use replace(), as other answers have suggested.
If you want to replace several characters (e.g., 'a', 'b', and 'c') with a single substitute character or character sequence (e.g., "X"), you should use a regular expression replace:
String result = original.replaceAll("[abc]", "X");
If you want to replace several characters, each with a different replacement (e.g., 'a' with 'A', 'b' with 'B'), then looping through the string yourself and building the result in a StringBuilder will probably be the most efficient. This is because, as you point out in your question, you will be going through the string only once.
StringBuilder sb = new StringBuilder();
String targets = "abc";
String replacements = "ABC";
for (int i = 0; i < original.length(); ++i) {
char c = original.charAt(i);
int loc = targets.indexOf(c);
sb.append(loc >= 0 ? replacements.charAt(loc) : c);
}
String result = sb.toString();
Check the documentation and find some good methods:
char from = 'a';
char to = 'b';
str = str.replace(from, to);
String replaceSample = "This String replace Example shows how to replace one char from String";
String newString = replaceSample.replace('r', 't');
Output: This Stting teplace Example shows how to teplace one chat ftom Stting
Also, you could use contains:
str1.toLowerCase().contains(str2.toLowerCase())
To check if the substring str2 exists in str1
Edit:
I just read that the String comes from a file. You can use a regex for this; that would be the best method.
http://docs.oracle.com/javase/tutorial/essential/regex/literals.html
This is your comment:
I want to replace all of the uppercases to lower cases and replace all
of the characters except a-z with space.
You can do it like this:
str = str.toLowerCase().replaceAll("[^a-z]", " ");
Your requirement should be part of the question, not in comment #7 under a posted answer...
You should look into regular expressions in Java. You can match an entire set of characters. String has several methods you may find useful here: replace, replaceAll, and matches.
You can match the set of ASCII letters, for instance, using [a-zA-Z] (or [a-zA-Z0-9] for alphanumerics), which may be what you're looking for.
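For the requirement quoted above (lowercase everything, replace anything outside a-z with a space), a single pass with plain char comparisons is one way to express the 'a'..'z' range check the question asks about. This is a sketch with a hypothetical class name; for pure ASCII input it should give the same result as toLowerCase().replaceAll("[^a-z]", " "), and it is likely faster on long strings because it avoids the regex engine and an intermediate string, though that is worth measuring on real data.

```java
class SinglePassNormalizer {
    static String normalize(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                // ASCII uppercase: shift into the lowercase range.
                sb.append((char) (c + ('a' - 'A')));
            } else if (c >= 'a' && c <= 'z') {
                sb.append(c);
            } else {
                // Anything else, including non-ASCII letters, becomes a space.
                sb.append(' ');
            }
        }
        return sb.toString();
    }
}
```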

Java - charAt with multiple answers

I'm having some trouble with my code. I am trying to test whether the character at a given position in a string is a digit. The way I have it set up is like so:
for(int i = 0; i < str.length(); i++) {
if(str.charAt(i) == '[1234567890]') {
System.out.println(str);
}
}
However I'm getting the error "unclosed character literal" when I try to compile. Does anyone know why I'm getting an error, or can explain a more simple way to do this?
Try:
if ( Character.isDigit(str.charAt(i)) )
You have to check that every character is a digit to determine whether your string contains an integer.
'[1234567890]' is not a char. A char is a single character. This is why your code doesn't compile.
You seem to be trying to use regex notation inside a character literal. That won't work. If you want to use a regex, you can write:
if(str.substring(i, i+1).matches("[1234567890]")) {
but it's better/simpler/faster/clearer to write:
if(Character.isDigit(str.charAt(i))) {
On the other hand, even once you make this change, your code will print str several times if it contains several digits. Is that really what you want? I wonder if you want something more like this:
if(str.matches("\\d+"))
System.out.println(str);
which will print str once if all of its characters are digits.
Your code does not compile because in Java the ' character (single quote) delimits a single character. To define a string you should use double quotes (").
In your case, I believe you wanted to check whether your string contains only digits, and got confused by the regular expression syntax you tried to use.
You can either rewrite your if statement as follows:
char c = str.charAt(i);
if(c >= '0' && c <= '9') {
or use pattern matching, e.g.
Pattern.compile("\\d+").matcher(str).matches()
In this case you even do not need to implement any loop.
I think you are trying to write something simple like this:
for(int i = 0; i < str.length(); i++) {
//check str.charAt(i) is one of the chars in 1234567890
if("1234567890".indexOf(str.charAt(i))>=0) {
System.out.println(str.charAt(i));
}
}
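The answers above address two different questions: "does the string contain a digit?" and "is the whole string digits?". A sketch of both follows, with a hypothetical class and method names. One detail worth knowing: Character.isDigit also returns true for non-ASCII digits such as the Arabic-Indic digits, so the explicit '0'..'9' comparison is the stricter check.

```java
class DigitCheck {
    /** True if every character is an ASCII digit 0-9 (and the string is not empty). */
    static boolean isAllAsciiDigits(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    /** True if at least one character is a digit in any script. */
    static boolean containsDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```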

char[] to String sequence mismatching in Java for Unicode characters

I have a method like below (please ignore the code optimization issue.) This method replaces the Unicode character (Bengali characters)
static String swap(String temp, char c)
{
Integer length=temp.length();
char[] charArray = temp.toCharArray();
for(int u=0;u<length;u++)
{
if(charArray[u]==c)
{
char g=charArray[u];
charArray[u]=charArray[u-1];
charArray[u-1]=g;
}
}
String string2 = new String(charArray);
return string2;
}
While debugging, I got the values of charArray shown in the image below:
Please note that the characters are in the sequence I want. But after the statement executes, the value stored in the String variable is in a different order, as shown below:
I want to display the string as "রেরেরে" but it is displaying "েরেরের", which is not what I want. Please tell me what I am doing wrong.
Note - I don't know Bengali, but I know a bit (or a lot, depending on whom you ask) about Unicode and how Java supports it. The answer assumes knowledge of the latter and not the former.
Going by the Unicode 6.0 Bengali chart, রে is a combination of the dependent vowel sign ে (0x09C7) and the consonant র (0x09B0) and is represented as a sequence of two characters in the character array.
If you are getting the dependent vowel sign alone, in the resulting character sequence (and hence the string), then your optimization is likely to be kooky, as it appears to assume that Bengali characters in Unicode can be represented as a single Unicode codepoint or a single char variable in Java; this would result in the scenario where a consonant would be replaced by another consonant, but the dependent vowel preceding the consonant would never be replaced.
I think a correct optimization must therefore consider the presence of dependent vowels, and compare the following consonant in addition to the vowel , i.e. it must compare two characters in the character array, instead of comparing individual characters. This might also imply that your method signature must be changed to allow for a char[] to be passed, instead of a single char, so that Bengali characters can be replaced with the intended Bengali character, instead of replacing a Unicode codepoint with another, which is what is being done currently.
The notes in other answers on the ArrayIndexOutOfBoundsException are valid. The following example, which uses your character replacement algorithm, demonstrates that not only is your algorithm incorrect, but it is quite possible for the exception to be thrown:
class CodepointReplacer
{
public static void main(String[] args)
{
String str1 = "রেরেরে";
/*
* The following is a linguistically invalid sequence,
* but Java does not concern itself with linguistical correctness
* if the String or char sequence has been constructed incorrectly.
*/
String str2 = "েরেরের";
/*
* c is the character at index 1 of str1: the dependent vowel sign ে,
* not the consonant র and not the combination রে as one might anticipate.
*/
char c = str1.charAt(1);
optimizeKookily(str1, c);
optimizeKookily(str2, c);
}
private static void optimizeKookily(String temp, char c)
{
Integer length = temp.length();
char[] charArray = temp.toCharArray();
for (int u = 0; u < length; u++)
{
if (charArray[u] == c)
{
char g = charArray[u];
charArray[u] = charArray[u - 1]; //throws exception on second invocation of this method.
charArray[u - 1] = g;
}
}
}
}
A better character replacement strategy would therefore be to use the String.replace (the CharSequence variant) or String.replaceAll functions, assuming that you would know how to use these with Bengali characters.
The problem is in
for(int u=0;u<length;u++)
{
if(charArray[u]==c)
{
char g=charArray[u];
charArray[u]=charArray[u-1];
charArray[u-1]=g;
}
}
When u=0, charArray[u-1] refers to index -1, which does not exist. Either start your for loop at 1 or add a condition that skips u=0.
Your code will cause an ArrayIndexOutOfBoundsException.
When u=0, the index u-1 is -1.
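A bounds-safe version of the swap could look like the sketch below. The class and method names are made up, and it only addresses the exception: whether the swapped result displays as intended still depends on the order of the code units in the input, which the original post does not show. The example in main uses Unicode escapes so the stored order is unambiguous.

```java
class SafeSwap {
    /** Swaps every occurrence of c with the character just before it. */
    static String swapWithPrevious(String text, char c) {
        char[] chars = text.toCharArray();
        // Start at 1: index 0 has no previous element to swap with.
        for (int u = 1; u < chars.length; u++) {
            if (chars[u] == c) {
                char tmp = chars[u];
                chars[u] = chars[u - 1];
                chars[u - 1] = tmp;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Vowel sign (U+09C7) stored before the consonant (U+09B0):
        // moving the consonant forward gives consonant + vowel sign.
        String fixed = swapWithPrevious("\u09C7\u09B0", '\u09B0');
        System.out.println(fixed.equals("\u09B0\u09C7")); // prints true
    }
}
```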
