removing invalid XML characters from a string in java

Hi,
I would like to remove all invalid XML characters from a string.
I would like to use a regular expression with the String.replace method, like:
line.replace(regExp, "");
What is the right regExp to use?
An invalid XML character is anything outside these ranges:
[#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Thanks.

Java's regex supports supplementary characters, so you can specify those high ranges by writing each endpoint as a UTF-16 surrogate pair (two chars).
Here is the pattern for removing characters that are illegal in XML 1.0:
// XML 1.0
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
String xml10pattern = "[^"
+ "\u0009\r\n"
+ "\u0020-\uD7FF"
+ "\uE000-\uFFFD"
+ "\ud800\udc00-\udbff\udfff"
+ "]";
Most people will want the XML 1.0 version.
Here is the pattern for removing characters that are illegal in XML 1.1:
// XML 1.1
// [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
String xml11pattern = "[^"
+ "\u0001-\uD7FF"
+ "\uE000-\uFFFD"
+ "\ud800\udc00-\udbff\udfff"
+ "]+";
You will need to use String.replaceAll(...) and not String.replace(...), because replace treats its first argument as literal text rather than as a regex.
String illegal = "Hello, World!\0";
String legal = illegal.replaceAll(xml10pattern, "");
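For comparison, here is a minimal sketch of the same XML 1.0 filter written with regex escapes (\x{...} for code points) instead of literal surrogate characters, and compiled once for reuse. The class and method names are made up for illustration; only java.util.regex from the JDK is assumed.

```java
import java.util.regex.Pattern;

final class Xml10Sanitizer {
    // Matches any code point NOT allowed by the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // \x{...} is the java.util.regex escape for a full code point (Java 7+).
    private static final Pattern INVALID_XML10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Returns the text with every disallowed code point removed (null stays null).
    static String strip(String text) {
        return text == null ? null : INVALID_XML10.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // NUL and U+0001 are dropped; the emoji (a supplementary character) is kept.
        String dirty = "Hello,\u0000 World!\u0001 \uD83D\uDE00";
        System.out.println(strip(dirty));
    }
}
```

Compiling the pattern once avoids recompiling it on every replaceAll call, which may matter if many strings are cleaned.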

Should we consider surrogate characters? Otherwise '(current >= 0x10000) && (current <= 0x10FFFF)' will never be true, because a single char cannot hold a value above 0xFFFF.
In my tests the regex approach also seemed slower than the following loop.
if (null == text || text.isEmpty()) {
return text;
}
final int len = text.length();
char current = 0;
int codePoint = 0;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < len; i++) {
current = text.charAt(i);
boolean surrogate = false;
if (Character.isHighSurrogate(current)
&& i + 1 < len && Character.isLowSurrogate(text.charAt(i + 1))) {
surrogate = true;
codePoint = text.codePointAt(i++);
} else {
codePoint = current;
}
if ((codePoint == 0x9) || (codePoint == 0xA) || (codePoint == 0xD)
|| ((codePoint >= 0x20) && (codePoint <= 0xD7FF))
|| ((codePoint >= 0xE000) && (codePoint <= 0xFFFD))
|| ((codePoint >= 0x10000) && (codePoint <= 0x10FFFF))) {
sb.append(current);
if (surrogate) {
sb.append(text.charAt(i));
}
}
}

All these answers so far only replace the characters themselves. But sometimes an XML document contains invalid numeric character references, which also result in errors. For example, if you have &#2; in your XML, a Java XML parser will throw Illegal character entity: expansion character (code 0x2 at ....
Here is a simple java program that can replace those invalid entity sequences.
public final Pattern XML_ENTITY_PATTERN = Pattern.compile("\\&\\#(?:x([0-9a-fA-F]+)|([0-9]+))\\;");
/**
* Remove problematic xml entities from the xml string so that you can parse it with java DOM / SAX libraries.
*/
String getCleanedXml(String xmlString) {
Matcher m = XML_ENTITY_PATTERN.matcher(xmlString);
Set<String> replaceSet = new HashSet<>();
while (m.find()) {
String group = m.group(1);
int val;
if (group != null) {
val = Integer.parseInt(group, 16);
if (isInvalidXmlChar(val)) {
replaceSet.add("&#x" + group + ";");
}
} else if ((group = m.group(2)) != null) {
val = Integer.parseInt(group);
if (isInvalidXmlChar(val)) {
replaceSet.add("&#" + group + ";");
}
}
}
String cleanedXmlString = xmlString;
for (String replacer : replaceSet) {
cleanedXmlString = cleanedXmlString.replaceAll(replacer, "");
}
return cleanedXmlString;
}
private boolean isInvalidXmlChar(int val) {
if (val == 0x9 || val == 0xA || val == 0xD ||
val >= 0x20 && val <= 0xD7FF ||
val >= 0xE000 && val <= 0xFFFD ||
val >= 0x10000 && val <= 0x10FFFF) {
return false;
}
return true;
}
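A single-pass variant of the same idea can be sketched with Matcher.appendReplacement, which avoids collecting the matches in a set and then calling replaceAll once per entry. The class and method names here are hypothetical, and the validity check assumes the XML 1.0 Char ranges quoted earlier on this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlEntityCleaner {
    // Numeric character references: &#xHEX; or &#DECIMAL;
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    // XML 1.0 Char production.
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Removes references to invalid characters and keeps all other text as is.
    static String removeInvalidCharRefs(String xml) {
        Matcher m = NUMERIC_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean valid;
            try {
                int cp = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
                valid = isValidXml10Char(cp);
            } catch (NumberFormatException e) {
                valid = false; // too many digits to be a code point
            }
            // quoteReplacement keeps '$' and '\' in the match from being interpreted.
            m.appendReplacement(out, valid ? Matcher.quoteReplacement(m.group()) : "");
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, removeInvalidCharRefs("a&#2;b&#x41;c") drops the &#2; reference and leaves &#x41; untouched.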

Jun's solution, simplified. Using StringBuilder#appendCodePoint(int), I need neither char current nor String#charAt(int). I can tell a surrogate pair by checking whether codePoint is greater than 0xFFFF.
(It is not necessary to do the i++, since a low surrogate wouldn't pass the filter. But then one would re-use the code for different code points and it would fail. I prefer programming to hacking.)
StringBuilder sb = new StringBuilder();
for (int i = 0; i < text.length(); i++) {
int codePoint = text.codePointAt(i);
if (codePoint > 0xFFFF) {
i++;
}
if ((codePoint == 0x9) || (codePoint == 0xA) || (codePoint == 0xD)
|| ((codePoint >= 0x20) && (codePoint <= 0xD7FF))
|| ((codePoint >= 0xE000) && (codePoint <= 0xFFFD))
|| ((codePoint >= 0x10000) && (codePoint <= 0x10FFFF))) {
sb.appendCodePoint(codePoint);
}
}

String cleaned = xmlData.codePoints().filter(c -> isValidXMLChar(c)).collect(StringBuilder::new,
StringBuilder::appendCodePoint, StringBuilder::append).toString();
private boolean isValidXMLChar(int c) {
if((c == 0x9) ||
(c == 0xA) ||
(c == 0xD) ||
((c >= 0x20) && (c <= 0xD7FF)) ||
((c >= 0xE000) && (c <= 0xFFFD)) ||
((c >= 0x10000) && (c <= 0x10FFFF)))
{
return true;
}
return false;
}

From Mark McLaren's Weblog
/**
* This method ensures that the output String has only
* valid XML unicode characters as specified by the
* XML 1.0 standard. For reference, please see
* <a href="http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char">the
* standard</a>. This method will return an empty
* String if the input is null or empty.
*
* @param in The String whose non-valid characters we want to remove.
* @return The in String, stripped of non-valid characters.
*/
public static String stripNonValidXMLCharacters(String in) {
StringBuffer out = new StringBuffer(); // Used to hold the output.
char current; // Used to reference the current character.
if (in == null || ("".equals(in))) return ""; // vacancy test.
for (int i = 0; i < in.length(); i++) {
current = in.charAt(i); // NOTE: No IndexOutOfBoundsException caught here; it should not happen.
if ((current == 0x9) ||
(current == 0xA) ||
(current == 0xD) ||
((current >= 0x20) && (current <= 0xD7FF)) ||
((current >= 0xE000) && (current <= 0xFFFD)) ||
((current >= 0x10000) && (current <= 0x10FFFF)))
out.append(current);
}
return out.toString();
}
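One caveat, as an earlier answer points out: a char can never exceed 0xFFFF, so a char-by-char filter like the one above also drops every supplementary character (both surrogate halves fall outside the allowed ranges). The sketch below puts the two styles side by side; the class and method names are invented for the comparison.

```java
final class SupplementaryDemo {
    // XML 1.0 Char production, applied to a code point.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Char-by-char: each surrogate half is tested alone and rejected.
    static String stripByChar(String in) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (isValidXml10(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Code point by code point: surrogate pairs are read as one value.
    static String stripByCodePoint(String in) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance 1 or 2 chars
        }
        return out.toString();
    }
}
```

With the input "a\uD83D\uDE00b" (an emoji between two letters), stripByChar returns "ab" while stripByCodePoint returns the input unchanged.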

From Best way to encode text data for XML in Java?
String xmlEscapeText(String t) {
StringBuilder sb = new StringBuilder();
for(int i = 0; i < t.length(); i++){
char c = t.charAt(i);
switch(c){
case '<': sb.append("&lt;"); break;
case '>': sb.append("&gt;"); break;
case '\"': sb.append("&quot;"); break;
case '&': sb.append("&amp;"); break;
case '\'': sb.append("&apos;"); break;
default:
if(c>0x7e) {
sb.append("&#"+((int)c)+";");
}else
sb.append(c);
}
}
return sb.toString();
}
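Note that escaping char by char turns a supplementary character into two references to surrogate values, which are not valid XML characters. A possible variant that walks the string by code point is sketched below; the class name is hypothetical, and like the method above it only escapes and does not remove invalid control characters.

```java
final class XmlEscaper {
    // Escapes markup characters and writes every code point above 0x7E
    // as a decimal numeric character reference.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // 2 for a surrogate pair, else 1
            switch (cp) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '&':  sb.append("&amp;");  break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (cp > 0x7E) {
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }
}
```

For instance, the emoji U+1F600 becomes the single reference &#128512; instead of two surrogate references.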

If you want to store text elements with the forbidden characters in XML-like form, you can use XPL instead. The dev-kit provides concurrent XPL to XML and XML processing - which means no time cost to the translation from XPL to XML. Or, if you don't need the full power of XML (namespaces), you can just use XPL.
Web Page: HLL XPL

I believe that the following articles may help you.
http://commons.apache.org/lang/api-2.1/org/apache/commons/lang/StringEscapeUtils.html
http://www.javapractices.com/topic/TopicAction.do?Id=96
In short, try StringEscapeUtils from the Jakarta (now Apache) Commons Lang project.

Related

Java AWS Serverless Lambda [duplicate]

I am using Jackson version 2.4.3 to convert my complex Java object into a String. Below is the output I'm getting (FYI, I printed only part of the output):
"{\"FirstName\":\"John \",\"LastName\":cena,\"salary\":7500,\"skills\":[\"java\",\"python\"]}";
Here is my code (PaymentTnx is a complex Java object)
ObjectMapper mapper = new ObjectMapper();
mapper.setVisibility(PropertyAccessor.FIELD, Visibility.ANY);
String lpTransactionJSON = mapper.writeValueAsString(paymentTxn);
I don't want to see \ slashes in my JSON string. What do I need to do to get a string like below:
"{"FirstName":"John ","LastName":cena,"salary":7500,"skills":["java","python"]}";

String test = "{\"FirstName\":\"John \",\"LastName\":cena,\"salary\":7500,\"skills\":[\"java\",\"python\"]}";
System.out.println(StringEscapeUtils.unescapeJava(test));
This might help you.

I have not tried Jackson, but I had a similar situation.
I used org.apache.commons.text.StringEscapeUtils.unescapeJson, but it does not work for malformed JSON such as {\"name\": \"john\"}
So I used this class instead, and it works fine.
https://gist.githubusercontent.com/jjfiv/2ac5c081e088779f49aa/raw/8bda15d27c73047621a94359492a5a9433f497b2/JSONUtil.java
// BSD License (http://lemurproject.org/galago-license)
package org.lemurproject.galago.utility.json;
public class JSONUtil {
public static String escape(String input) {
StringBuilder output = new StringBuilder();
for(int i=0; i<input.length(); i++) {
char ch = input.charAt(i);
int chx = (int) ch;
// let's not put any nulls in our strings
assert(chx != 0);
if(ch == '\n') {
output.append("\\n");
} else if(ch == '\t') {
output.append("\\t");
} else if(ch == '\r') {
output.append("\\r");
} else if(ch == '\\') {
output.append("\\\\");
} else if(ch == '"') {
output.append("\\\"");
} else if(ch == '\b') {
output.append("\\b");
} else if(ch == '\f') {
output.append("\\f");
} else if(chx >= 0x10000) {
assert false : "Java stores as u16, so it should never give us a character that's bigger than 2 bytes. It literally can't.";
} else if(chx > 127) {
output.append(String.format("\\u%04x", chx));
} else {
output.append(ch);
}
}
return output.toString();
}
public static String unescape(String input) {
StringBuilder builder = new StringBuilder();
int i = 0;
while (i < input.length()) {
char delimiter = input.charAt(i); i++; // consume letter or backslash
if(delimiter == '\\' && i < input.length()) {
// consume first after backslash
char ch = input.charAt(i); i++;
if(ch == '\\' || ch == '/' || ch == '"' || ch == '\'') {
builder.append(ch);
}
else if(ch == 'n') builder.append('\n');
else if(ch == 'r') builder.append('\r');
else if(ch == 't') builder.append('\t');
else if(ch == 'b') builder.append('\b');
else if(ch == 'f') builder.append('\f');
else if(ch == 'u') {
StringBuilder hex = new StringBuilder();
// expect 4 digits
if (i+4 > input.length()) {
throw new RuntimeException("Not enough unicode digits! ");
}
for (char x : input.substring(i, i + 4).toCharArray()) {
if(!Character.isLetterOrDigit(x)) {
throw new RuntimeException("Bad character in unicode escape.");
}
hex.append(Character.toLowerCase(x));
}
i+=4; // consume those four digits.
int code = Integer.parseInt(hex.toString(), 16);
builder.append((char) code);
} else {
throw new RuntimeException("Illegal escape sequence: \\"+ch);
}
} else { // it's not a backslash, or it's the last character.
builder.append(delimiter);
}
}
return builder.toString();
}
}

With Jackson do:
toString(paymentTxn);
with
public String toString(Object obj) {
try (StringWriter w = new StringWriter();) {
new ObjectMapper().configure(SerializationFeature.INDENT_OUTPUT, true).writeValue(w, obj);
return w.toString();
} catch (Exception e) {
throw new RuntimeException(e);
}
}

This here is not valid JSON:
"{"FirstName":"John ","LastName":cena,"salary":7500,"skills":["java","python"]}";
This here is valid JSON, specifically a single string value:
"{\"FirstName\":\"John \",\"LastName\":cena,\"salary\":7500,\"skills\":[\"java\",\"python\"]}";
Given that you're calling writeValueAsString, this is the correct behaviour. I would suggest writeValue, perhaps?
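It may also be worth checking where the backslashes are actually seen. If they only appear in Java source code or in a debugger's string view, they are part of the literal notation and not of the string itself. The small sketch below (class name made up, shortened sample data) shows the difference using only the JDK.

```java
final class EscapeDemo {
    // The backslashes here exist only in the Java source literal;
    // the string held in memory contains plain double quotes.
    static final String JSON = "{\"FirstName\":\"John\",\"salary\":7500}";

    public static void main(String[] args) {
        System.out.println(JSON);                // prints {"FirstName":"John","salary":7500}
        System.out.println(JSON.contains("\\")); // prints false
    }
}
```

If the printed output itself contains backslashes, the value was most likely serialized to JSON twice, which is a different problem from the one sketched here.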

Why does this method always return false?

I would like to evaluate a phone number using the provided method. The phone number should always have a length of 10. However the following method always seems to return false. Why is that? Thanks.
public static boolean valPhoneNumber(String phonenumber){
boolean result= true;
if (phonenumber.length() > 10 || phonenumber.length() < 10){
result= false;
}else
phonenumber.length();
char a=phonenumber.charAt(0);
char b=phonenumber.charAt(1);
char d=phonenumber.charAt(3);
char e=phonenumber.charAt(4);
char f=phonenumber.charAt(5);
if (a<2 || a>9){
result = false;
}else if( b<0 || b>8){
result = false;
}else if (d<2 || d>9){
result = false;
}else if (e==1 && f==1){
result = false;
}
return result;
}

Your if/else ladder compares a char with a number. In that case the comparison uses the character's ASCII value, so '2' is treated as 50, not 2.
You can put the digits in single quotes to check the range:
if (a < '2' || a > '9') {
result = false;
} else if( b < '0' || b > '8') {
result = false;
} else if (d < '2' || d > '9') {
result = false;
} else if (e == '1' && f == '1') {
result = false;
}
One liner:
result = !((a < '2' || a > '9') || (b < '0' || b > '8') || (d < '2' || d > '9') || (e == '1' && f == '1'));

I think the problem is in how you use phonenumber.charAt(). It always returns a char, and when you compare a char with an integer, the char is converted to its character code (ASCII code). I think you should change your code to int a = Character.getNumericValue(phonenumber.charAt(0)); and so on.
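To make the char versus int point concrete, here is a small sketch. The class and the digitValue helper are hypothetical names; Character.getNumericValue is the real JDK method, but note that it also accepts letters ('a' gives 10), so an explicit range check may be safer for validation.

```java
final class CharDigitDemo {
    // Converts an ASCII digit char to its numeric value, rejecting anything else.
    static int digitValue(char c) {
        if (c < '0' || c > '9') {
            throw new IllegalArgumentException("not an ASCII digit: " + c);
        }
        return c - '0';
    }

    public static void main(String[] args) {
        char a = "2125551234".charAt(0);
        System.out.println((int) a);                      // prints 50, the code of '2'
        System.out.println(a < 2 || a > 9);               // prints true, so the original check fails
        System.out.println(digitValue(a));                // prints 2
        System.out.println(Character.getNumericValue(a)); // prints 2
    }
}
```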

I think an approach with regex here would be the cleanest and easiest solution.
public static boolean valPhoneNumber(String phonenumber){
String regex = "[2-9][0-8][0-9][2-9](?!11)[0-9]{6}";
return phonenumber.matches(regex);
}
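The rule from the question that the characters at index 4 and 5 must not both be '1' can be written as a negative lookahead. The sketch below assumes that reading of the rules (index 0 in 2-9, index 1 in 0-8, index 3 in 2-9, exactly 10 digits) and uses a hypothetical class name; the pattern is compiled once and null is treated as invalid.

```java
import java.util.regex.Pattern;

final class PhoneValidator {
    // 4 constrained digits, then 6 more digits that must not start with "11".
    private static final Pattern PHONE =
            Pattern.compile("[2-9][0-8][0-9][2-9](?!11)[0-9]{6}");

    static boolean isValid(String phoneNumber) {
        // matches() requires the whole string to match, so the length must be 10.
        return phoneNumber != null && PHONE.matcher(phoneNumber).matches();
    }
}
```

With this pattern "2345178901" is accepted (only one of the two positions is '1'), while "2345118901" is rejected.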

You should cast the char variables to integer.

you can try this:
int a = Integer.parseInt(phonenumber.substring(0,1));

I added single quotes to check the range. Thank you all.
public static boolean valPhoneNumber(String phonenumber) {
boolean result= true;
if (phonenumber.length() != 10) {
result = false;
} else {
//phonenumber.length();
char a = phonenumber.charAt(0);
char b = phonenumber.charAt(1);
char d = phonenumber.charAt(3);
char e = phonenumber.charAt(4);
char f = phonenumber.charAt(5);
if (a < '2' || a > '9') {
result = false;
} else if (b < '0' || b > '8') {
result = false;
} else if (d < '2' || d > '9') {
result = false;
} else if (e == '1' && f == '1') {
result = false;
}
}
return result;
}

Errors When Running Java from Command Prompt

I am having issues running a Java program from Command Prompt. I have a Java file called DataRecover and a second Java file called Triple. When I run javac Triple.java in Command Prompt, it does what it is supposed to. However, when I run javac DataRecover.java, I get these errors:
DataRecover.java:61: error: cannot find symbol
static Triple extractTriples(String str) {
^
symbol: class Triple
location: class DataRecover
DataRecover.java:30: error: cannot find symbol
Triple triples = extractTriples(line);
^
symbol: class Triple
location: class DataRecover
EDIT: I have included both classes. I have now been able to run the javac command, and there is a CLASS file for each in the proper folder. Now, I need to run the DataRecover file in Command Prompt. When I run "java DataRecover" I get the following error: "Exception in thread "main" java.lang.NoClassDefFoundError: DataRecover (wrong name: projectbeng\DataRecover)".
package projectbeng;
import java.util.Scanner;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.File;
import java.io.IOException;
public class DataRecover {
public static void main(String[] args) throws Exception {
//Create a Scanner for the user
Scanner sc = new Scanner(System.in);
System.out.print("Enter file name to process: ");
File fileName = new File(sc.nextLine() + ".txt"); //Do not include the .txt extension
if(!fileName.exists()){ //does not exist
throw new IOException("File \"" + fileName + "\" not found.");
}
System.out.println("\nProcessing file: " + fileName + "\n----------------------------------------");
BufferedReader br = new BufferedReader(new FileReader(fileName));
int lineCount = 0; //assumes file does not end with a new line character
int tripleLineCount = 0;
int tripleCount = 0;
String line = "";
//Read data from file
while((line = br.readLine()) != null){ //has another line in the file
lineCount++;
if(!line.equals("")) { //is not a blank line
Triple triples = extractTriples(line);
if(triples.getHasTriple()) { //line contains triples
System.out.println(triples.getTriples());
tripleLineCount++;
}
for(int j = 0; j < triples.getTriples().length(); j++) {
if(triples.getTriples().charAt(j) == '(') tripleCount++;
}
}
}
//prints out the summary of the file
System.out.println("\nSummary\n----------------------------------------");
System.out.println("Total lines: " + lineCount);
System.out.println("Lines containing triples: " + tripleLineCount);
System.out.println("Total number of triples: " + tripleCount);
}
/*Given a string, returns a Triple with a string containing the triples (if any) and a boolean stating whether
or not it contains a triple.
Assumptions:
1.) If a '-' sign is found, it has been added. If preceeding a number (for example -32), the number is 32 where
the '-' sign is simply garbage.
2.) If a '.' is found in a number (for example 2.32), the potential integers are 2 and 32 where the '.' is
garbage.
3.) For part c, if the first valid character found is a letter, this will always be the real triple. It does not
matter whether or not it is part of a word (for example, if it comes across "Dog a", 'D' will be the triple.)
4.) The strings "<null>", "<cr>", "<lf>", and "<eof>" as well as multi-digit numbers (ex. 32) count as single
characters. Thus, they cannot be broken up (no garbage in between the characters).
*/
static Triple extractTriples(String str) {
/*Grammar:
Triple is in form (a,b,c) where a is either a non-negative integer or the string "<null>", b is a
non-negative integer where b <= a (b must be 0 if a is <null>), and c is either an individual letter
(upper or lower case), period, colon, semicolon, or one of the three strings "<cr>", "<lf>", or "<eof>".
state == 0 ==> needs left parenthesis
state == 1 ==> needs right parenthesis
state == 2 ==> needs comma
state == 3 ==> needs a
state == 4 ==> needs b
state == 5 ==> needs c
*/
int state = 0;
int a = -1;
int b = -1;
String triples = "";
String tempTriples = "";
for(int i = 0; i < str.length(); i++) {
if(str.charAt(i) == '.' || str.charAt(i) == ':' || str.charAt(i) == ';' || str.charAt(i) == '<' ||
(str.charAt(i) >= 'a' && str.charAt(i) <= 'z') || (str.charAt(i) >= 'A' && str.charAt(i) <= 'Z')
|| (str.charAt(i) >= '0' && str.charAt(i) <= '9') || str.charAt(i) == ',' ||
str.charAt(i) == '(' || str.charAt(i) == ')') {
if(state == 0) {
if(str.charAt(i) == '(') {
tempTriples = str.substring(i, i+1);
state = 3;
}
}else if(state == 1) {
if(str.charAt(i) == ')') {
triples = triples + tempTriples + str.substring(i, i+1) + " ";
tempTriples = "";
state = 0;
a = -1;
b = -1;
}
}else if(state == 2) {
if(str.charAt(i) == ',') {
tempTriples = tempTriples + str.substring(i, i+1);
if(b != -1) state = 5;
else state = 4;
}
}else if(state == 3) {
if(str.charAt(i) >= '0' && str.charAt(i) <= '9') {
int j = i;
while(j < str.length() && str.charAt(j) >= '0' && str.charAt(j) <= '9') j++;
a = Integer.parseInt(str.substring(i, j));
i = j - 1;
tempTriples = tempTriples + a;
state = 2;
}else if(str.length() > i + 5 && str.substring(i, i+6).equals("<null>")) {
a = 0;
tempTriples = tempTriples + str.substring(i, str.indexOf(">", i)+1);
i = str.indexOf(">", i);
state = 2;
}
}else if(state == 4) {
if(str.charAt(i) >= '0' && str.charAt(i) <= '9') {
int j = i;
while(j < str.length() && str.charAt(j) >= '0' && str.charAt(j) <= '9') j++;
b = Integer.parseInt(str.substring(i, j));
i = j - 1;
if(b <= a) {
tempTriples = tempTriples + b;
state = 2;
}else b = -1;
}
}else if(state == 5) {
if(str.charAt(i) == '.' || str.charAt(i) == ':'||(str.charAt(i) <= 'z' && str.charAt(i) >= 'a')
|| str.charAt(i) == ';' || (str.charAt(i) <= 'Z' && str.charAt(i) >= 'A')) {
tempTriples = tempTriples + str.substring(i, i+1);
state = 1;
}else if((str.length() > i + 4 && str.substring(i, i+5).equals("<eof>")) ||
(str.length() > i + 3 && (str.substring(i, i+4).equals("<cr>") ||
str.substring(i, i+4).equals("<lf>")))) {
tempTriples = tempTriples + str.substring(i, str.indexOf(">", i)+1);
i = str.indexOf(">", i);
state = 1;
}else if(str.length() > i + 5 && str.substring(i, i+6).equals("<null>")) {
i = str.indexOf(">", i);
}
}
}
}
Triple triple = new Triple(true, triples);
if(triples.equals("")) triple.setHasTriple(false); //does not contain a triple
return triple;
}
}
package projectbeng;
class Triple {
boolean hasTriple;
String triple;
//creates a new Triple
Triple(boolean newHasTriple, String newTriple){
this.hasTriple = newHasTriple;
this.triple = newTriple;
}
//returns whether or not Triple contains any triples
boolean getHasTriple() {
return hasTriple;
}
//returns the triples in Triple
String getTriples() {
return triple;
}
//changes the state of whether a Triple contains triples
void setHasTriple(boolean newHasTriple){
this.hasTriple = newHasTriple;
}
}
What is the proper way to run the DataRecover file through Command Prompt?

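When a source file references classes in other source files, you have to compile all of those files together. In your case it should be:
javac Triple.java DataRecover.java
Because both classes declare package projectbeng, the program must then be started by its fully qualified class name, from the directory that contains the projectbeng folder:
java projectbeng.DataRecover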
Many modern Java projects use build tools to help with the management of source files. Two popular Java build tools are Gradle and Maven.

Incorrect logic comparing strings?

This program is supposed to compare "DNA" strings.
Input:
3
ATGC
TACG
ATGC
CGTA
AGQ
TCF
The first line gives how many times the program will run. Each time it runs, it compares two strings. A matches with T and vice versa; G matches with C and vice versa. So if the first letter of string 1 is A, the first letter of string 2 should be T. If the next one is T, the next one on the other string should be A, and so on. If a letter other than A, T, G, or C appears, it is a bad sample. If it's bad, print BAD; if it's good, print GOOD. I tried many different inputs and they all worked fine, but the program failed on the judge's test data (they have different input). Does anyone see anything wrong with this? I know it might not be the most efficient way of getting the job done, but it works, at least to my understanding.
Output:
GOOD
BAD
BAD
public class DNA
{
public static void main(String[] args) throws IOException
{
Scanner scan = new Scanner (new File ("dna.dat"));
int T = scan.nextInt();
scan.nextLine();
boolean valid = true;
for (int i = 0; i < T; i++)
{
String strand1 = scan.nextLine();
strand1 = strand1.toUpperCase();
String strand2 = scan.nextLine();
strand2 = strand2.toUpperCase();
for (int p = 0; p < strand1.length(); p++)
{
if (strand1.charAt(p) != 'A' && strand1.charAt(p) != 'T' && strand1.charAt(p) != 'G' && strand1.charAt(p) != 'C'
&& strand2.charAt(p) != 'A' && strand2.charAt(p) != 'T' && strand2.charAt(p) != 'G' && strand2.charAt(p) != 'C')
{
valid = false;
break;
}
if (strand1.length() != strand2.length())
{
valid = false;
break;
}
}
if (valid)
{
for (int p = 0; p < strand1.length(); p++)
{
if ((strand1.charAt(p) == 'A' && strand2.charAt(p) == 'T') || (strand1.charAt(p) == 'T' && strand2.charAt(p) == 'A')
|| (strand1.charAt(p) == 'G' && strand2.charAt(p) == 'C') || (strand1.charAt(p) == 'C' && strand2.charAt(p) == 'G'))
valid = true;
else
valid = false;
}
}
if (valid)
out.println("GOOD");
else
out.println("BAD");
valid = true;
}
}
}
I added the toUpperCase and compared the strings for equal length just as a last attempt to see if their data maybe had some lowercase letters or different length strings though they SHOULD all be the same length and uppercase. Nevertheless, the program was still rejected for "failing the judges test data."

You need a break in the second for loop when valid = false. For example if characters 1,2,3 are wrong but #4 is a match you will still end up with valid.
I would convert the strings to arrays to make things easier:
for (int i = 0; i < T; i++)
{
boolean valid = true;
String strand1 = scan.nextLine();
strand1 = strand1.toUpperCase();
String strand2 = scan.nextLine();
strand2 = strand2.toUpperCase();
if ( strand1.length() != strand2.length())
{
valid = false;
}
if (valid) {
char[] c1 = strand1.toCharArray();
char[] c2 = strand2.toCharArray();
for (int p = 0; p < c1.length; p++)
{
if (-1 == "ACTG".indexOf(c1[p]) || -1 == "ACTG".indexOf(c2[p]))
{
valid = false;
break;
}
}
if (valid)
{
for (int p = 0; p < c1.length; p++)
{
if (('A' == c1[p] && 'T' != c2[p]) ||
('T' == c1[p] && 'A' != c2[p]) ||
('C' == c1[p] && 'G' != c2[p]) ||
('G' == c1[p] && 'C' != c2[p])) {
valid = false;
break;
}
}
}
}
if (valid)
System.out.println("GOOD");
else
System.out.println("BAD");
}

Change every && in
if (strand1.charAt(p) != 'A' && strand1.charAt(p) != 'T' && strand1.charAt(p) != 'G' && strand1.charAt(p) != 'C' && strand2.charAt(p) != 'A' && strand2.charAt(p) != 'T' && strand2.charAt(p) != 'G' && strand2.charAt(p) != 'C')
that joins the strand1 checks to the strand2 checks to ||. If ANY character, not ALL of them, is something other than A, T, G, or C, then we exit the loop.
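One way to sidestep the long && / || chains entirely is to map each base to its expected partner and compare, so that an invalid letter and a mismatched pair are caught by the same test. This is only a sketch under the rules stated in the question; the class and method names are made up.

```java
final class DnaCheck {
    // Returns the complementary base, or '\0' if the letter is not A, T, G or C.
    static char complementOf(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'G': return 'C';
            case 'C': return 'G';
            default:  return '\0';
        }
    }

    // A sample is good only if both strands have equal length and every
    // position in strand2 is the complement of the base in strand1.
    static boolean isGoodSample(String strand1, String strand2) {
        if (strand1.length() != strand2.length()) {
            return false;
        }
        for (int i = 0; i < strand1.length(); i++) {
            char expected = complementOf(strand1.charAt(i));
            if (expected == '\0' || expected != strand2.charAt(i)) {
                return false; // stop at the first bad position
            }
        }
        return true;
    }
}
```

On the sample input from the question this gives GOOD, BAD, BAD for the three pairs.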

Using replace to take a character and change it to another in java

I am trying to figure out how to replace certain characters to turn l33t speak back into English. For example, I 54w 3 5hip5 would translate to I saw 3 ships. I need the 3 to stay a 3 here, but in N3v3r f0rg37 y0|_|r t0w31 I would need the 3's to become e's. My code follows. All the other characters translate correctly, but I can't figure out how to turn the 3's into e's.
My question is: what needs to be added so that the 3's become e's in some cases and stay 3's in others? Note that we aren't allowed to use regex, arrays, or StringBuilder for this.
The rules are: if a number is meant to be a number, it stays a number when translated from l33t to English; if the l33t number stands for a letter, it is replaced by the letter that corresponds to it.
I also have a different block of code that already handles the 3's to e's, but it adds two u's instead of one.
Here are the replacements for the letters: a = 4, b = 8, e = 3, l = 1, o = 0, s = 5, t = 7, u = |_|, z = 2.
I decided to go the route of mike's answer since I understand exactly what's going on.
Thanks to everyone for the help!

Input/Output examples
This following code translates
I 54w 3 5hip5
to
I saw 3 ships
and
3 5hip5 4r3 c0ming m3 w4y
to
3 ships are coming me way
Code
public static String translateToEnglish(String phrase) {
if (phrase == null)
return null;
boolean threeAtBeginning = false, threeAtEnd = false;
if (phrase.charAt(0) == '3' && phrase.charAt(1) == ' ')
threeAtBeginning = true;
int length = phrase.length();
if (phrase.charAt(length - 1) == '3' && phrase.charAt(length - 2) == ' ')
threeAtEnd = true;
String finished = phrase.replace('4', 'a') .replace('1', 'l') .replace('2', 'z') .replace('5', 's') .replace('8', 'b') .replace('0', 'o') .replace('7', 't') .replace("|_|", "u") .replace("3", "e");
finished = finished.replace(" e ", " 3 ");
if (threeAtBeginning)
finished = '3' + finished.substring(1);
if (threeAtEnd)
finished = finished.substring(0, finished.length() - 1) + '3';
return finished;
}

This is clearly homework, and the restrictions are clearly intended to prevent any sane solution, but here's an O(n^2) solution that seems to avoid the restrictions:
public class RemoveL33t {
public static void main(String[] args) {
System.out.println(removeL33t("I 54w 3 5hip5"));
System.out.println(removeL33t("I 54w 33 5hip5"));
System.out.println(removeL33t("I 54w 45 5hip5"));
System.out.println(removeL33t("N3v3r f0rg37 y0|_|r t0w31"));
}
public static String removeL33t(String s) {
String result = "";
for (int pos = 0;;) {
// Find the beginning of the next word.
int whitespaceBegin = pos;
while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) {
pos++;
}
// Add the whitespace to the result.
result += s.substring(whitespaceBegin, pos);
// If there is no next word, then we're done.
if (pos >= s.length()) {
return result;
}
// Find the end of the word. Determine if the word is entirely numbers.
int wordBegin = pos;
boolean nonNumber = false;
while (pos < s.length() && !Character.isWhitespace(s.charAt(pos))) {
nonNumber |= s.charAt(pos) < '0' || s.charAt(pos) > '9';
pos++;
}
// Append the word. Perform replacements if it contains a non-number.
if (nonNumber) {
result += s.substring(wordBegin, pos)
.replace('4', 'a')
.replace('8', 'b')
.replace('3', 'e')
.replace('1', 'l')
.replace('0', 'o')
.replace('5', 's')
.replace('7', 't')
.replace("|_|", "u")
.replace('2', 'z');
} else {
result += s.substring(wordBegin, pos);
}
}
}
}

I think this is it.
public static String translateToEnglish(String phrase) {
if (phrase == null) {
return null;
}
String finished = phrase.replace('4', 'a') .replace('1', 'l') .replace('2', 'z') .replace('5', 's') .replace('8', 'b') .replace('0', 'o') .replace('7', 't') .replace("|_|", "u") .replace("3", "e");
finished = finished.replace(" e ", " 3 ");
if(finished.startsWith("e ")){
finished = "3 " + finished.substring(2);
}
if(finished.endsWith(" e")){
finished = finished.substring(0, finished.length()-2) + " 3";
}
return finished;
}
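The replace-then-restore approach above handles a lone 3, but a multi-digit number such as 33 would still come out as ee. A possible word-by-word variant is sketched below. It assumes that any word made only of digits is meant as a number (my reading of the rules, not something the question states), and it sticks to String methods, with no regex, arrays, or StringBuilder. The names are hypothetical.

```java
final class LeetTranslator {
    // True if the word is non-empty and consists only of ASCII digits.
    static boolean isAllDigits(String word) {
        if (word.isEmpty()) {
            return false;
        }
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Pure numbers are kept; every other word gets the l33t replacements.
    static String translateWord(String word) {
        if (isAllDigits(word)) {
            return word;
        }
        return word.replace("|_|", "u")
                .replace('4', 'a').replace('8', 'b').replace('3', 'e')
                .replace('1', 'l').replace('0', 'o').replace('5', 's')
                .replace('7', 't').replace('2', 'z');
    }

    // Splits on single spaces with indexOf, so no arrays are needed.
    static String translate(String phrase) {
        String result = "";
        int start = 0;
        while (true) {
            int space = phrase.indexOf(' ', start);
            if (space < 0) {
                return result + translateWord(phrase.substring(start));
            }
            result += translateWord(phrase.substring(start, space)) + " ";
            start = space + 1;
        }
    }
}
```

Here translate("I 54w 33 5hip5") keeps 33 as a number, while translate("N3v3r f0rg37 y0|_|r t0w31") still turns every 3 into an e.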

I don't know if this is the answer, but it is the best I could think of:
public static void main (String[] args) throws java.lang.Exception
{
String c = "I 54w 45 5hip5";
for(String s: c.split(" ")){
try{
Integer.parseInt(s);
System.out.print(s + " ");
}
catch(NumberFormatException e){
s = s.replace('4', 'a').replace('1', 'l').replace('2', 'z').replace('5', 's').replace('8', 'b').replace('0', 'o').replace('7', 't').replace("|_|", "u").replace("3", "e");
System.out.print(s + " ");
}
}
}

This is for your "new" code that you decided to use, or this could just be an alternate solution. The input/output is identical to the samples I gave in my other answer:
public static String translateToEnglish(String phrase) {
if (phrase == null)
return null;
String finished = "";
for (int i = 0; i < phrase.length(); i++) {
char c = phrase.charAt(i);
if (c == '4')
finished += 'a';
else if (c == '3') {
if (i != phrase.length() - 1)
{
if (phrase.charAt(i + 1) == ' ') {
if (i == 0)
finished += c;
else
if (phrase.charAt(i - 1) == ' ')
finished += c;
else
finished += 'e';
}
else
finished += 'e';
}
else
{
if (phrase.charAt(i - 1) == ' ')
finished += c;
else
finished += 'e';
}
} else if (c == '1')
finished += 'l';
else if (c == '2')
finished += 'z';
else if (c == '5')
finished += 's';
else if (c == '7')
finished +='t';
else if (c == '8')
finished += 'b';
else if (c == '0')
finished += 'o';
else if (c == '|' && i + 2 < phrase.length() && phrase.charAt(i + 1) == '_' && phrase.charAt(i + 2) == '|') {
finished += 'u';
i += 2;
} else
finished += c;
}
return finished;
}
