String or number to Character with jflex

I'm using JFlex and I have to recognize character literals, which can be:
Normal chars, like 'a'
Numbers, like '\126'
I've written this regular expression (Integer is a macro already defined):
Character = (\'.\')|(\'\\{Integer}\')
I don't know if it's OK, but my real problem is that I don't know what code to write to turn both kinds of strings into Characters, because this doesn't work:
{Character} { this.yylval = new Character(yytext());
return Parser.CHARACTER; }
Any ideas?

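You have to write valid Java: the only constructor for Character is Character(char), but you are invoking Character(String).
You need to extract the character you want from yytext(): strip the surrounding quotes, then either take the single char or convert the number that follows the backslash.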
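A minimal sketch of that extraction, under stated assumptions: the class name CharLiteral and the method parse are hypothetical helpers (not part of JFlex), the lexeme always has the shape matched by the Character macro above, and the number after the backslash is read as a decimal character code (the question does not say whether it is decimal or octal).

```java
final class CharLiteral {
    /**
     * Converts a lexeme such as 'a' or '\126' (quotes included) to a Character.
     * Assumption: the digits after the backslash are a decimal character code.
     */
    static Character parse(String lexeme) {
        // Drop the opening and closing single quotes.
        String body = lexeme.substring(1, lexeme.length() - 1);
        if (body.length() == 1) {
            // Plain character, e.g. 'a'.
            return Character.valueOf(body.charAt(0));
        }
        // Numeric form: a backslash followed by digits, e.g. \126.
        int code = Integer.parseInt(body.substring(1));
        return Character.valueOf((char) code);
    }
}
```

With such a helper, the lexer action could be written as this.yylval = CharLiteral.parse(yytext()); followed by the existing return statement.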

Related

How to split string if splitting character is dynamic or unknown?

I want to write a Java program that takes a String as input. The string will contain two integers and the operation to be performed on them,
e.g. 25+85
or 15*78
The output will be the result of evaluating the string.
But I don't know how to split the string, because the operator is not known before execution.

Check which operator the input contains with String.contains("+"), and do the same for every other operator you want to support. Then split on that operator. Note that String.split takes a regular expression, so the plus sign must be escaped: input.split("\\+"), or input.split(Pattern.quote("+")). Parse each piece with Integer.parseInt(String s), apply the operation, and return the result. Pretty simple, good luck.

You can use the split() method of the String class to split the input at non-digit characters:
input.split("\\D");
This will give you an array containing only the numbers.
I guess you also want to get the operator somehow? Although it's not the most elegant way, you might want to start with input.replaceAll("[^\\*\\+\\-\\/]", "") to remove everything that's not an operator, but you will still have to do some careful input filtering. What if I type 5+4*6 or 2+hello?
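As an alternative to splitting, a single regular expression with capturing groups can pull out both numbers and the operator at once and reject anything else. This is only a sketch: the class name SimpleCalc and the method evaluate are made up for illustration, and it assumes non-negative integers and exactly one of the four operators + - * /.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleCalc {
    // group 1: left operand, group 2: operator, group 3: right operand
    private static final Pattern EXPR =
            Pattern.compile("\\s*(\\d+)\\s*([+\\-*/])\\s*(\\d+)\\s*");

    static int evaluate(String input) {
        Matcher m = EXPR.matcher(input);
        if (!m.matches()) {
            // Rejects inputs such as "5+4*6" or "2+hello".
            throw new IllegalArgumentException("Expected <int><op><int>: " + input);
        }
        int left = Integer.parseInt(m.group(1));
        int right = Integer.parseInt(m.group(3));
        switch (m.group(2).charAt(0)) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right; // integer division
            default: throw new IllegalStateException("Unreachable operator");
        }
    }
}
```

Because matches() must cover the whole input, malformed expressions fail fast instead of producing a partial result.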

Regex: Ignoring numbers

I am trying to write a regex that tries to match on a specific string, but ignores all numbers in the target string - So my regex could be 'MyDog', but it should match MyDog, as well as My11Dog and MyDog1 etc. I could write something like
M[^\d]*y[^\d]*D[^\d]*o[^\d]*g[^\d]*
But that is pretty painful. Any ideas out there? I am using Java, and cannot change what is in the string, because I need to retrieve it as is.

A regular expression can do this, but it is simpler to let Java strip the digits first and then compare (disclaimer: I'm not a Java programmer):
String s1 = "0My1D2og3";
String s2 = s1.replaceAll("\\d", ""); // remove every digit
if (s2.equals("MyDog")) {
    // Do something
}
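If the original string must stay untouched and a single pattern is preferred, the digit-tolerant regex can be generated from the target word instead of being typed by hand. The sketch below is one possible way to do it; the class DigitTolerantMatch and the method ignoringDigits are hypothetical names, and it allows optional digits (\d*) before, between, and after the target's characters.

```java
import java.util.regex.Pattern;

final class DigitTolerantMatch {
    /** Builds a pattern that matches target with any digits interleaved. */
    static Pattern ignoringDigits(String target) {
        StringBuilder regex = new StringBuilder("\\d*");
        for (char c : target.toCharArray()) {
            // Quote each character so regex metacharacters are taken literally.
            regex.append(Pattern.quote(String.valueOf(c))).append("\\d*");
        }
        return Pattern.compile(regex.toString());
    }
}
```

For example, DigitTolerantMatch.ignoringDigits("MyDog").matcher("My11Dog").matches() accepts the input while leaving the string itself unchanged.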

regex expression for any letter, number and "normal" characters

I receive bytes into a method and I want to send them over serial, but I only want to send valid bytes, (i.e. a-zA-Z0-9"!£$%^&*()-_=+), things like that, spaces, new lines etc. I just want to filter out any character like ones with accents or �, in any order and any number of times.
Would something like this including all characters with | work?
^[a-z|A-Z|0-9|\\s|-<other characters>]*
Or, what would be the correct expression?
So if a string contained "exit����", I would only want to send "exit", and never send characters that are not valid, but send everything else.
public void write(byte[] bytes, int offset, int count) {
String str;
try {
str = new String(bytes, "ASCII");
Log.d(TAG, "data received in write: " +str );
//^[a-z|A-Z|0-9|\s|-]*
//test here, call next line on any character that is valid
GraphicsTerminalActivity.sendOverSerial(str.getBytes("ASCII"));
} catch (UnsupportedEncodingException e) {
Log.d(TAG, "exception" );
e.printStackTrace();
}
// appendToEmulator(bytes, 0, bytes.length);
}
EDIT: I tried [^\x00-\x7F] which is the range of ascii characters....but then the � symbols still get through, weird.

Try using a pattern like [\x20-\x7E]. These are the ASCII codes of the printable characters.
By the way, I assume you are asking about ASCII, because that is how you decode the bytes in your question.

You want to do a search-replace:
String fixed = input.replaceAll("[^\\p{Print}\t\n]", "");
Rolf
Edit: Add references:
Pattern Javadoc -> scroll down to POSIX Character Classes (US-ASCII ONLY)
The pattern above matches every character that is not a printable ASCII character, a tab, or a newline, so the replacement removes them.

You may want to look into Java's Normalizer class if you haven't already. It would allow you to extract the "normal" character from its accented equivalent, as an alternative to throwing away the whole character.
I don't remember my exact source for this idea (I was trying to do accent-agnostic searching recently), but a quick search turned up this simple blog post that may offer a little more insight into how to use it.

The pipe is not the correct way to turn your list of characters into a regular expression. Put the characters in a character class with square brackets around it. All characters in a character class are ORed by default, so there is no need for pipes. Escaping every symbol that is not a letter or digit is not always required inside a class, but in Java it is harmless and avoids surprises with characters such as - and ^.
[a-zA-Z0-9\"\!\£\$\%\^\&\*\(\)\-\_\=\+]
And then if you want to put that into a Java string, you need to double the backslashes and also escape the double quote for the string literal:
Pattern p = Pattern.compile("[a-zA-Z0-9\\\"\\!\\£\\$\\%\\^\\&\\*\\(\\)\\-\\_\\=\\+]");
Keep in mind that the pound symbol (£) is not an ASCII character, so converting it to ASCII is not going to work.
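Putting the suggestions from this thread together, here is a hedged sketch of a filter for the write method in the question. The class AsciiFilter and the method keepPrintable are hypothetical names; it assumes "valid" means printable ASCII (0x20 to 0x7E) plus tab, carriage return, and newline, and it decodes only the count bytes starting at offset.

```java
import java.nio.charset.StandardCharsets;

final class AsciiFilter {
    /** Returns only the printable-ASCII and whitespace bytes of the given range. */
    static byte[] keepPrintable(byte[] bytes, int offset, int count) {
        // Bytes outside ASCII decode to the replacement character U+FFFD.
        String decoded = new String(bytes, offset, count, StandardCharsets.US_ASCII);
        // Remove everything except printable ASCII, tab, CR, and LF.
        String cleaned = decoded.replaceAll("[^\\x20-\\x7E\\t\\r\\n]", "");
        return cleaned.getBytes(StandardCharsets.US_ASCII);
    }
}
```

The result of keepPrintable could then be passed to sendOverSerial in place of str.getBytes("ASCII").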

Java string replace and the NUL (NULL, ASCII 0) character?

Testing out someone else's code, I noticed a few JSP pages printing funky non-ASCII characters. Taking a dip into the source I found this tidbit:
// remove any periods from first name e.g. Mr. John --> Mr John
firstName = firstName.trim().replace('.','\0');
Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a C-string. Would this be the culprit to the funky characters?

Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a c-string.
That depends on how you define what is working. Does it replace all occurrences of the target character with '\0'? Absolutely!
String s = "food".replace('o', '\0');
System.out.println(s.indexOf('\0')); // "1"
System.out.println(s.indexOf('d')); // "3"
System.out.println(s.length()); // "4"
System.out.println(s.hashCode() == 'f'*31*31*31 + 'd'); // "true"
Everything seems to work fine to me! indexOf can find it, it counts as part of the length, and its value for hash code calculation is 0; everything is as specified by the JLS/API.
It DOESN'T work if you expect replacing a character with the null character would somehow remove that character from the string. Of course it doesn't work like that. A null character is still a character!
String s = Character.toString('\0');
System.out.println(s.length()); // "1"
assert s.charAt(0) == 0;
It also DOESN'T work if you expect the null character to terminate a string. It's evident from the snippets above, but it's also clearly specified in JLS (10.9. An Array of Characters is Not a String):
In the Java programming language, unlike C, an array of char is not a String, and neither a String nor an array of char is terminated by '\u0000' (the NUL character).
Would this be the culprit to the funky characters?
Now we're talking about an entirely different thing, i.e. how the string is rendered on screen. Truth is, even "Hello world!" will look funky if you use a dingbats font. A Unicode string may look funky in one locale but not in another. Even a properly rendered Unicode string containing, say, Chinese characters, may still look funky to someone from, say, Greenland.
That said, the null character will probably look funky regardless; usually it's not a character that you want to display. Still, since the null character is not the string terminator, Java is more than capable of handling it one way or another.
Now to address what we assume is the intended effect, i.e. removing all periods from a string: the simplest solution is to use the replace(CharSequence, CharSequence) overload.
System.out.println("A.E.I.O.U".replace(".", "")); // AEIOU
The replaceAll solution is mentioned here too, but that works with regular expressions, which is why you need to escape the dot metacharacter, and it is likely to be slower.
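Applied to the snippet from the question, the fix could look like the sketch below. The class PeriodRemoval and the method stripPeriods are hypothetical names used only for illustration.

```java
final class PeriodRemoval {
    /** Trims the name and removes every period, e.g. "Mr. John" becomes "Mr John". */
    static String stripPeriods(String firstName) {
        // replace(CharSequence, CharSequence) treats "." literally, not as a regex.
        return firstName.trim().replace(".", "");
    }
}
```

Unlike replace('.', '\0'), this shortens the string instead of leaving a NUL character where each period used to be.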

It should probably be changed to
firstName = firstName.trim().replaceAll("\\.", "");

I think that is the cause. To remove the character, you should use replace(".", "") instead.

Does replacing a character in a String with a null character even work in Java?
No.
Would this be the culprit to the funky characters?
Quite likely.

This does cause "funky characters":
System.out.println( "Mr. Foo".trim().replace('.','\0'));
produces:
Mr[] Foo
in my Eclipse console, where the [] is shown as a square box. As others have posted, use String.replace(".", "") instead.

Print string literal unicode as the actual character

In my Java application I have been passed in a string that looks like this:
"\u00a5123"
When printing that string into the console, I get the same string as the output (as expected).
However, I want to print that out by having the unicode converted into the actual yen symbol (\u00a5 -> yen symbol) - how would I go about doing this?
i.e. so it looks like this: "[yen symbol]123"

I wrote a little program:
public static void main(String[] args) {
System.out.println("\u00a5123");
}
Its output:
¥123
i.e. it prints exactly what you say you want, so I suspect something else is going on. What version of Java are you using?
edit:
In response to your clarification, there are a couple of different techniques. The most straightforward is to look for a "\u" followed by 4 hex digits, extract that piece, and replace it with the character that has that code (using the Character class). This of course assumes the string will not have a \u in front of it.
I am not aware of any built-in facility for parsing a String as though it were an escaped Java string literal.

As has been mentioned before, these strings will have to be parsed to get the desired result.
Tokenize the string by using \u as separator. For example: \u63A5\u53D7 => { "63A5", "53D7" }
Process these strings as follows:
String hex = "63A5";
int intValue = Integer.parseInt(hex, 16);
System.out.println((char)intValue);

You're probably going to have to write a parser for these, unless you can find one in a third-party library. There is nothing in the JDK to parse these for you; I know because I fairly recently had an idea to use this kind of escape as a way to smuggle Unicode through a Latin-1-only database. (I ended up doing something else, by the way.)
I will tell you that java.util.Properties escapes and unescapes Unicode characters in this manner when reading and writing files (since the files have to be ASCII). The methods it uses for this are private, so you can't call them, but you could use the JDK source code to inspire your solution.
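One way such a parser could be sketched is with java.util.regex: find each backslash-u followed by four hex digits and substitute the corresponding char. The class UnicodeUnescape and the method unescape are hypothetical names, and the sketch deliberately handles only the \uXXXX form (no other escapes, no doubled backslashes).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescape {
    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Convert the four hex digits to the char with that code.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' in the result literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Text without any escape sequence passes through unchanged, and characters outside the Basic Multilingual Plane work as long as they arrive as two consecutive surrogate escapes.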

You could replace the above with this:
System.out.println((char)0x63A5);
Here is code that prints all of the box-drawing Unicode characters.
public static void printBox()
{
    // U+2500 to U+257F is the Box Drawing block
    for (int i=0x2500;i<=0x257F;i++)
    {
        System.out.printf("0x%x : %c\n",i,(char)i);
    }
}
