I'm working on a lexical analyzer program in Java. I've been searching for an answer to this problem, but so far I haven't found one. Here's my problem:
Input:
System.out.println ("Hello World");
Desired Output:
Lexeme----------------------Token
System [Key_Word]
. [Object_Accessor]
out [Key_Word]
. [Object_Accessor]
println [Key_Word]
( [left_Parenthesis]
"Hello World" [String_Literal]
) [right_Parenthesis]
; [statement_separator]
I'm still a beginner, so I hope you can help me with this. Thanks.
You need neither ANTLR nor the Dragon book to write a simple lexical analyzer by hand. Even lexical analyzers for fuller languages (like Java) aren't terribly complicated to write by hand. Obviously, if you have an industrial task you might want to consider industrial-strength tools like ANTLR or some lex variant, but for the sake of learning how lexical analysis works, writing one by hand would likely prove to be a useful exercise. I'm assuming that this is the case, since you said you're still a beginner.
Here's a simple lexical analyzer, written in Java, for a subset of a Scheme-like language, that I wrote after seeing this question. I think the code is relatively easy to understand even if you've never seen a lexer before, simply because breaking a stream of characters (in this case a String) into a stream of tokens (in this case a List<Token>) isn't that hard. If you have questions I can try to explain in more depth.
import java.util.List;
import java.util.ArrayList;
/*
* Lexical analyzer for Scheme-like minilanguage:
* (define (foo x) (bar (baz x)))
*/
public class Lexer {
public static enum Type {
// This Scheme-like language has three token types:
// open parens, close parens, and an "atom" type
LPAREN, RPAREN, ATOM;
}
public static class Token {
public final Type t;
public final String c; // contents mainly for atom tokens
// could have column and line number fields too, for reporting errors later
public Token(Type t, String c) {
this.t = t;
this.c = c;
}
public String toString() {
if(t == Type.ATOM) {
return "ATOM<" + c + ">";
}
return t.toString();
}
}
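/*
* Given a String, and an index, get the atom starting at that index.
* An atom runs until whitespace or a parenthesis. (Stopping at the first
* non-letter would return an empty atom for input like "(foo 1)", and
* lex() would then loop forever without advancing.)
*/
public static String getAtom(String s, int i) {
int j = i;
while (j < s.length()) {
char c = s.charAt(j);
if (Character.isWhitespace(c) || c == '(' || c == ')') {
break;
}
j++;
}
return s.substring(i, j);
}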
public static List<Token> lex(String input) {
List<Token> result = new ArrayList<Token>();
for(int i = 0; i < input.length(); ) {
switch(input.charAt(i)) {
case '(':
result.add(new Token(Type.LPAREN, "("));
i++;
break;
case ')':
result.add(new Token(Type.RPAREN, ")"));
i++;
break;
default:
if(Character.isWhitespace(input.charAt(i))) {
i++;
} else {
String atom = getAtom(input, i);
i += atom.length();
result.add(new Token(Type.ATOM, atom));
}
break;
}
}
return result;
}
public static void main(String[] args) {
if(args.length < 1) {
System.out.println("Usage: java Lexer \"((some Scheme) (code to) lex)\".");
return;
}
List<Token> tokens = lex(args[0]);
for(Token t : tokens) {
System.out.println(t);
}
}
}
Example use:
~/code/scratch $ java Lexer ""
~/code/scratch $ java Lexer "("
LPAREN
~/code/scratch $ java Lexer "()"
LPAREN
RPAREN
~/code/scratch $ java Lexer "(foo)"
LPAREN
ATOM<foo>
RPAREN
~/code/scratch $ java Lexer "(foo bar)"
LPAREN
ATOM<foo>
ATOM<bar>
RPAREN
~/code/scratch $ java Lexer "(foo (bar))"
LPAREN
ATOM<foo>
LPAREN
ATOM<bar>
RPAREN
RPAREN
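The lexer above handles a Scheme-like language; the same loop adapted to the statement in the question might look like the following sketch. The class name JavaStatementLexer is hypothetical, the token names are copied from the question's table, and it deliberately recognizes only identifiers, '.', parentheses, ';' and string literals without escape sequences.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a hand-written lexer for the statement in the question (hypothetical class).
// Token names follow the question's table; only identifiers, '.', '(', ')', ';' and
// double-quoted string literals WITHOUT escape sequences are recognized.
class JavaStatementLexer {
    enum Type { Key_Word, Object_Accessor, left_Parenthesis, right_Parenthesis,
                String_Literal, statement_separator }

    static final class Token {
        final Type type;
        final String lexeme;
        Token(Type type, String lexeme) { this.type = type; this.lexeme = lexeme; }
        @Override public String toString() { return lexeme + " [" + type + "]"; }
    }

    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                        // blanks only separate tokens
            } else if (Character.isJavaIdentifierStart(c)) {
                int j = i + 1;                              // read the whole identifier
                while (j < input.length() && Character.isJavaIdentifierPart(input.charAt(j))) j++;
                tokens.add(new Token(Type.Key_Word, input.substring(i, j)));
                i = j;
            } else if (c == '"') {
                int j = input.indexOf('"', i + 1);          // position of the closing quote
                if (j < 0) throw new IllegalArgumentException("Unterminated string at index " + i);
                tokens.add(new Token(Type.String_Literal, input.substring(i, j + 1)));
                i = j + 1;
            } else {
                Type t;
                switch (c) {
                    case '.': t = Type.Object_Accessor; break;
                    case '(': t = Type.left_Parenthesis; break;
                    case ')': t = Type.right_Parenthesis; break;
                    case ';': t = Type.statement_separator; break;
                    default: throw new IllegalArgumentException("Unexpected '" + c + "' at index " + i);
                }
                tokens.add(new Token(t, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : lex("System.out.println (\"Hello World\");")) {
            System.out.println(t);
        }
    }
}
```

Running main prints one line per token in the same lexeme [Token] form as the table in the question, from System [Key_Word] down to ; [statement_separator]. Each branch is chosen by the first character, and the branch then decides how far the token extends, just as in the Scheme lexer.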
Once you've written one or two simple lexers like this, you will get a pretty good idea of how this problem decomposes. Then it would be interesting to explore how to use automated tools like lex. The theory behind regular-expression-based matchers is not too difficult, but it does take a while to fully understand. I think writing lexers by hand motivates that study and helps you come to grips with the problem better than diving straight into the theory of converting regular expressions to finite automata (first regular expressions to NFAs, then NFAs to DFAs). Wading into that theory can be a lot to take in at once, and it is easy to get overwhelmed.
Personally, while the Dragon book is good and very thorough, the coverage might not be the easiest to understand because it aims to be complete, not necessarily accessible. You might want to try some other compiler texts before opening up the Dragon book. Here are a few free books, which have pretty good introductory coverage, IMHO:
http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf
http://www.diku.dk/~torbenm/Basics/
Some articles on the implementation of regular expressions (automated lexical analysis usually uses regular expressions):
http://swtch.com/~rsc/regexp/
ANTLR 4 will do exactly this with the Java.g4 reference grammar. You have two options depending on how closely you want the handling of Unicode escape sequences to follow the language specification.
https://github.com/antlr/grammars-v4/blob/master/java/Java.g4: This grammar only handles Unicode escape sequences as characters within a string or character literal.
https://github.com/antlr/antlr4/blob/master/tool/test/org/antlr/v4/test/Java-LR.g4 (must be renamed to Java.g4 before use): This grammar requires that you wrap your ANTLRInputStream in a JavaUnicodeInputStream, which processes Unicode escape sequences according to the JLS prior to feeding them to the lexer.
Edit: The names of the tokens produced by this grammar differ slightly from your table.
Your Key_Word token is Identifier
Your Object_Accessor token is DOT
Your left_Parenthesis token is LPAREN
Your String_Literal token is StringLiteral
Your right_Parenthesis token is RPAREN
Your statement_separator token is SEMI
Lexical analysis is a topic in itself that usually goes together with compiler design and analysis. You should read up on it before trying to code anything. My favourite book on this topic is the Dragon book, which gives a good introduction to compiler design and even provides pseudocode for all compiler phases that you can easily translate to Java and build on.
In short, the main idea is to scan the input and divide it into tokens that belong to certain classes (for example, parentheses or keywords in your desired output) using a finite state machine. Building the state machine is actually the only hard part of this analysis, and the Dragon book will give you great insight into it.
You can use tools such as Lex and Bison for C or ANTLR for Java. Lexical analysis can be done by building automata. Here's a small example:
Suppose you need to tokenize a string in a language whose only keywords are {'echo', '.', ' ', 'end'}; by keywords I mean that the language consists of these keywords and nothing else. So if I input
echo .
end .
My lexer should output
echo ECHO
SPACE
. DOT
end END
SPACE
. DOT
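To build an automaton for such a tokenizer, I can start with these transitions:
(S) --' '--> (SPACE), emit and go back to (S)
(S) --'.'--> (DOT), emit and go back to (S)
(S) --E--> (E) --C--> (C) --H--> (H) --O--> (ECHO), emit and go back to (S)
(E) --N--> (N) --D--> (END), emit and go back to (S)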
The diagram above is rough, but the idea is that you have a start state, S. When you consume E, you move to another state in which you expect either C (for ECHO) or N (for END). You keep consuming characters and moving between the states of this simple finite state machine until you reach an emit state; for example, after consuming E, N, D you reach the emit state for END, which emits the token and goes back to the start state. This cycle continues as long as characters keep arriving at your tokenizer. On an invalid character you can either throw an error or ignore it, depending on the design.
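The automaton above maps almost one-to-one onto code. A minimal Java sketch, assuming lowercase keywords and that newlines are simply skipped (the sample output has no token for them); the class name EchoEndLexer and the state names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Explicit state machine for the {echo, end, '.', ' '} language (hypothetical class).
// Assumptions: keywords are lowercase, '\n' is skipped, anything else is an error.
class EchoEndLexer {
    private enum State { START, E, EC, ECH, EN }

    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.START;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case START:
                    if (c == ' ') tokens.add("SPACE");          // emit, stay in START
                    else if (c == '.') tokens.add("DOT");       // emit, stay in START
                    else if (c == 'e') state = State.E;         // may become ECHO or END
                    else if (c != '\n') throw error(c, i);      // newlines are ignored
                    break;
                case E:
                    if (c == 'c') state = State.EC;
                    else if (c == 'n') state = State.EN;
                    else throw error(c, i);
                    break;
                case EC:
                    if (c != 'h') throw error(c, i);
                    state = State.ECH;
                    break;
                case ECH:
                    if (c != 'o') throw error(c, i);
                    tokens.add("ECHO");                         // emit state reached
                    state = State.START;                        // back to start
                    break;
                case EN:
                    if (c != 'd') throw error(c, i);
                    tokens.add("END");
                    state = State.START;
                    break;
            }
        }
        if (state != State.START) {
            throw new IllegalArgumentException("Input ends in the middle of a keyword");
        }
        return tokens;
    }

    private static IllegalArgumentException error(char c, int i) {
        return new IllegalArgumentException("Unexpected '" + c + "' at index " + i);
    }
}
```

EchoEndLexer.lex("echo .\nend .") returns [ECHO, SPACE, DOT, END, SPACE, DOT], matching the output listed above. Here invalid characters throw; ignoring them instead would only change the error branches.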
CookCC ( https://github.com/coconut2015/cookcc ) generates a very fast, small, zero-dependency lexer for Java.
Write a program that implements a simple lexical analyzer which builds a symbol table from a given stream of characters. Read a file named “input.txt” to collect all the characters. For simplicity, the input file will be a C/Java/Python program without headers and methods (the body of the main program). Then identify all the numerical values, identifiers, keywords, math operators, logical operators and others (distinct). See the example for more details. You can assume that there will be a space after each keyword.
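#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<ctype.h>
/* Prototypes, so main() and identifier_check() can call functions defined later. */
void keyword_check(void);
void identifier_check(void);
void math_operator_check(void);
void logical_operator_check(void);
void numerical_check(void);
void others_check(void);
int isKeyword(char string_input[]);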
int main(){
/* By Ashik Rabbani
Daffodil International University,CSE43 */
keyword_check();
identifier_check();
math_operator_check();
logical_operator_check();
numerical_check();
others_check();
return 0;
}
void math_operator_check()
{
int ch, i; /* int, not char, so the comparison with EOF works */
char operators[] = "+-*/%";
FILE *fp;
fp = fopen("input.txt","r");
if(fp == NULL){
printf("error while opening the file\n");
exit(EXIT_FAILURE);
}
printf("\nMath Operators : ");
while((ch = fgetc(fp)) != EOF){
for(i = 0; operators[i] != '\0'; ++i){
if(ch == operators[i])
printf("%c ", ch);
}
}
printf("\n");
fclose(fp);
}
void logical_operator_check()
{
int ch, i; /* int, not char, so the comparison with EOF works */
char operators[] = "&&||<>";
FILE *fp;
fp = fopen("input.txt","r");
if(fp == NULL){
printf("error while opening the file\n");
exit(EXIT_FAILURE);
}
printf("\nLogical Operators : ");
while((ch = fgetc(fp)) != EOF){
for(i = 0; operators[i] != '\0'; ++i){
if(ch == operators[i]){
printf("%c ", ch);
break; /* '&' and '|' appear twice in the list; print each input character once */
}
}
}
printf("\n");
fclose(fp);
}
void numerical_check()
{
int ch, i; /* int, not char, so the comparison with EOF works */
char digits[] = "0123456789"; /* the old loop bound (i < 6) only checked 0-5 */
FILE *fp;
fp = fopen("input.txt","r");
if(fp == NULL){
printf("error while opening the file\n");
exit(EXIT_FAILURE);
}
printf("\nNumerical Values : ");
while((ch = fgetc(fp)) != EOF){
for(i = 0; digits[i] != '\0'; ++i){
if(ch == digits[i])
printf("%c ", ch);
}
}
printf("\n");
fclose(fp);
}
void others_check()
{
int ch, i; /* int, not char, so the comparison with EOF works */
char symbols[] = "(){}[]";
FILE *fp;
fp = fopen("input.txt","r");
if(fp == NULL){
printf("error while opening the file\n");
exit(EXIT_FAILURE);
}
printf("\nOthers : ");
while((ch = fgetc(fp)) != EOF){
for(i = 0; symbols[i] != '\0'; ++i){
if(ch == symbols[i])
printf("%c ", ch);
}
}
printf("\n");
fclose(fp);
}
void identifier_check()
{
int ch; /* int, not char, so the comparison with EOF works */
char string_input[64];
FILE *fp;
int j=0;
fp = fopen("input.txt","r");
if(fp == NULL){
printf("error while opening the file\n");
exit(EXIT_FAILURE);
}
printf("\nIdentifiers : ");
while((ch = fgetc(fp)) != EOF){
if(isalnum(ch)){
if(j < (int)sizeof(string_input) - 1) /* never write past the buffer */
string_input[j++] = ch;
}
else if(j != 0){ /* any non-alphanumeric character ends the word, e.g. "x=5" */
string_input[j] = '\0';
j = 0;
/* skip keywords and words starting with a digit (those are numbers) */
if(isKeyword(string_input) == 0 && !isdigit((unsigned char)string_input[0]))
printf("%s ", string_input);
}
}
printf("\n");
fclose(fp);
}
int isKeyword(char string_input[]){
char keywords[32][10] = {"auto","break","case","char","const","continue","default",
"do","double","else","enum","extern","float","for","goto",
"if","int","long","register","return","short","signed",
"sizeof","static","struct","switch","typedef","union",
"unsigned","void","volatile","while"};
int i, flag = 0;
for(i = 0; i < 32; ++i){
if(strcmp(keywords[i], string_input) == 0){
flag = 1;
break;
}
}
return flag;
}
void keyword_check()
{
int ch; /* int, not char, so the comparison with EOF works */
char string_input[64];
FILE *fp;
int j=0;
printf(" Token Identification using C \n By Ashik-E-Rabbani \n 161-15-7093\n\n");
fp = fopen("input.txt","r");
if(fp == NULL){
printf("error while opening the file\n");
exit(EXIT_FAILURE);
}
printf("\nKeywords : ");
while((ch = fgetc(fp)) != EOF){
if(isalnum(ch)){
if(j < (int)sizeof(string_input) - 1) /* never write past the buffer */
string_input[j++] = ch;
}
else if(j != 0){ /* any non-alphanumeric character ends the word */
string_input[j] = '\0';
j = 0;
if(isKeyword(string_input) == 1)
printf("%s ", string_input);
}
}
printf("\n");
fclose(fp);
}
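The C program above reopens input.txt once per category. A possible single-pass alternative, sketched in Java to match the rest of this page: the class name SymbolTableSketch, the abbreviated keyword list, and the operator sets (comparisons such as >= and != are counted as logical operators) are all assumptions, and reading input.txt into a String (for example with java.nio.file.Files.readString) is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Single-pass sketch: read the source once and put each token into one category.
// The keyword list and operator sets below are illustrative assumptions.
class SymbolTableSketch {
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "int", "float", "char", "double", "if", "else", "for", "while", "return"));
    static final List<String> TWO_CHAR_LOGICAL = Arrays.asList("&&", "||", "==", "!=", "<=", ">=");

    static Map<String, Set<String>> classify(String src) {
        Map<String, Set<String>> table = new LinkedHashMap<>();
        for (String category : new String[] {"Keywords", "Identifiers", "Numerical Values",
                "Math Operators", "Logical Operators", "Others"}) {
            table.put(category, new LinkedHashSet<>()); // distinct entries, first-seen order
        }
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int j = i + 1;                              // end of the current token
            if (Character.isWhitespace(c)) {
                // whitespace only separates tokens
            } else if (Character.isLetter(c) || c == '_') {
                while (j < src.length()
                        && (Character.isLetterOrDigit(src.charAt(j)) || src.charAt(j) == '_')) j++;
                String word = src.substring(i, j);
                table.get(KEYWORDS.contains(word) ? "Keywords" : "Identifiers").add(word);
            } else if (Character.isDigit(c)) {
                while (j < src.length()
                        && (Character.isDigit(src.charAt(j)) || src.charAt(j) == '.')) j++;
                table.get("Numerical Values").add(src.substring(i, j));
            } else {
                String two = src.substring(i, Math.min(i + 2, src.length()));
                if (TWO_CHAR_LOGICAL.contains(two)) {
                    table.get("Logical Operators").add(two);
                    j = i + 2;
                } else if ("<>!".indexOf(c) >= 0) {
                    table.get("Logical Operators").add(String.valueOf(c));
                } else if ("+-*/%".indexOf(c) >= 0) {
                    table.get("Math Operators").add(String.valueOf(c));
                } else {
                    table.get("Others").add(String.valueOf(c)); // = ; , ( ) { } [ ] ...
                }
            }
            i = j;
        }
        return table;
    }
}
```

Printing each entry of classify(source) as key + " : " + String.join(" ", values) gives output in the same shape as the C program, but with whole numbers (15 rather than 1 5) and each value listed once.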
I have over a gigabyte of text that I need to go through and surround punctuation with spaces (tokenizing). I have a long regular expression (1818 characters, though that's mostly lists) that defines when punctuation should not be separated. Because it is long and complicated, using groups with it is awkward, though I wouldn't rule that out as an option, since I could make most groups non-capturing (?:).
Question: How can I efficiently replace certain characters that don't match a particular regular expression?
I've looked into using lookaheads or similar, and I haven't quite figured it out, but it seems to be terribly inefficient anyway. It would likely be better than using placeholders though.
I can't seem to find a good "replace with a bunch of different regular expressions for both finding and replacing in one pass" function.
Should I do this line by line instead of operating on the whole text?
String completeRegex = "[^\\w](("+protectedPrefixes+")|(("+protectedNumericOnly+")\\s*\\p{N}))|"+protectedRegex;
Matcher protectedM = Pattern.compile(completeRegex).matcher(s);
ArrayList<String> protectedStrs = new ArrayList<String>();
//Take note of the protected matches.
while (protectedM.find()) {
protectedStrs.add(protectedM.group());
}
//Replace protected matches.
String replaceStr = "<PROTECTED>";
s = protectedM.replaceAll(replaceStr);
//Now that it's safe, separate punctuation.
s = s.replaceAll("([^\\p{L}\\p{N}\\p{Mn}_\\-<>'])"," $1 ");
// These are for apostrophes. Can these be combined with either the protecting regular expression or the one above?
s = s.replaceAll("([\\p{N}\\p{L}])'(\\p{L})", "$1 '$2");
s = s.replaceAll("([^\\p{L}])'([^\\p{L}])", "$1 ' $2");
Note the two additional replacements for apostrophes. Using placeholders protects against those replacements as well, but I'm not really concerned with apostrophes or single quotes in my protecting regex anyway, so it's not a real concern.
I'm rewriting what I considered very inefficient Perl code with my own in Java, keeping track of speed, and things were going fine until I started replacing the placeholders with the original strings. With that addition it's too slow to be reasonable (I've never seen it get even close to finishing).
//Replace placeholders with original text.
String resultStr = "";
String currentStr = "";
int currentPos = 0;
int[] protectedArray = replaceStr.codePoints().toArray();
int protectedLen = protectedArray.length;
int[] strArray = s.codePoints().toArray();
int protectedCount = 0;
for (int i=0; i<strArray.length; i++) {
int pt = strArray[i];
// System.out.println("pt: "+pt+" symbol: "+String.valueOf(Character.toChars(pt)));
if (protectedArray[currentPos]==pt) {
if (currentPos == protectedLen - 1) {
resultStr += protectedStrs.get(protectedCount);
protectedCount++;
currentPos = 0;
} else {
currentPos++;
}
} else {
if (currentPos > 0) {
resultStr += replaceStr.substring(0, currentPos);
currentPos = 0;
currentStr = "";
}
resultStr += ParseUtils.getSymbol(pt);
}
}
s = resultStr;
This code may not be the most efficient way to return the protected matches. What is a better way? Or better yet, how can I replace punctuation without having to use placeholders?
I don't know exactly how big your in-between strings are, but I suspect that you can do somewhat better than using Matcher.replaceAll, speed-wise.
You're doing 3 passes across the string, each time creating a new Matcher instance, and then creating a new String; and because you're using + to concatenate the strings, you're creating a new string which is the concatenation of the in-between string and the protected group, and then another string when you concatenate this to the current result. You don't really need all of these extra instances.
Firstly, you should accumulate the resultStr in a StringBuilder, rather than via direct string concatenation. Then you can proceed something like:
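StringBuilder resultStr = new StringBuilder();
int currIndex = 0;
while (protectedM.find()) {
protectedStrs.add(protectedM.group());
// Process the unprotected text between the previous match and this one.
appendInBetween(resultStr, s, currIndex, protectedM.start());
// Copy the protected match through unchanged.
resultStr.append(protectedM.group());
currIndex = protectedM.end();
}
// The text after the last protected match needs the same treatment.
appendInBetween(resultStr, s, currIndex, s.length());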
where appendInBetween is a method implementing the equivalent to the replacements, just in a single pass:
void appendInBetween(StringBuilder resultStr, String s, int start, int end) {
// Pass the whole input string and the bounds, rather than taking a substring.
// Allocate roughly enough space up-front.
resultStr.ensureCapacity(resultStr.length() + end - start);
for (int i = start; i < end; ++i) {
char c = s.charAt(i);
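// Check if c matches "([^\\p{L}\\p{N}\\p{Mn}_\\-<>'])".
// (Character.isDigit covers \p{Nd}, which is a subset of \p{N}.)
if (!(Character.isLetter(c)
|| Character.isDigit(c)
|| Character.getType(c) == Character.NON_SPACING_MARK
|| "_-<>'".indexOf(c) != -1)) { // the regex's "\\-" is just a literal '-'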
resultStr.append(' ');
resultStr.append(c);
resultStr.append(' ');
} else if (c == '\'' && i > 0 && i + 1 < s.length()) {
// We have a quote that's not at the beginning or end.
// Call these 3 characters bcd, where c is the quote.
char b = s.charAt(i - 1);
char d = s.charAt(i + 1);
if ((Character.isDigit(b) || Character.isLetter(b)) && Character.isLetter(d)) {
// If the 3 chars match "([\\p{N}\\p{L}])'(\\p{L})"
resultStr.append(' ');
resultStr.append(c);
} else if (!Character.isLetter(b) && !Character.isLetter(d)) {
// If the 3 chars match "([^\\p{L}])'([^\\p{L}])"
resultStr.append(' ');
resultStr.append(c);
resultStr.append(' ');
} else {
resultStr.append(c);
}
} else {
// Everything else, just append.
resultStr.append(c);
}
}
}
Obviously, there is a maintenance cost associated with this code: it is undeniably more verbose. But the advantage of doing it explicitly like this (aside from the fact that it is just a single pass) is that you can debug the code like any other, rather than it being the black box that regexes are.
I'd be interested to know if this works any faster for you!
At first I thought that appendReplacement wasn't what I was looking for, but indeed it was. Since it's replacing the placeholders at the end that slowed things down, all I really needed was a way to dynamically replace matches:
StringBuffer replacedBuff = new StringBuffer();
Matcher replaceM = Pattern.compile(replaceStr).matcher(s);
int index = 0;
while (replaceM.find()) {
replaceM.appendReplacement(replacedBuff, "");
replacedBuff.append(protectedStrs.get(index));
index++;
}
replaceM.appendTail(replacedBuff);
s = replacedBuff.toString();
Reference: Second answer at this question.
Another option to consider:
During the first pass through the String, to find the protected Strings, take the start and end indices of each match, replace the punctuation for everything outside of the match, add the matched String, and then keep going. This takes away the need to write a String with placeholders, and requires only one pass through the entire String. It does, however, require many separate small replacement operations. (By the way, be sure to compile the patterns before the loop, as opposed to using String.replaceAll()). A similar alternative is to add the unprotected substrings together, and then replace them all at the same time. However, the protected strings would then have to be added to the replaced string at the end, so I doubt this would save time.
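// p1, p2 and p3 are the three punctuation patterns from above, compiled once before the loop.
Pattern p1 = Pattern.compile("([^\\p{L}\\p{N}\\p{Mn}_\\-<>'])");
Pattern p2 = Pattern.compile("([\\p{N}\\p{L}])'(\\p{L})");
Pattern p3 = Pattern.compile("([^\\p{L}])'([^\\p{L}])");
int currIndex = 0;
while (protectedM.find()) {
protectedStrs.add(protectedM.group());
String substr = s.substring(currIndex,protectedM.start());
substr = p1.matcher(substr).replaceAll(" $1 ");
substr = p2.matcher(substr).replaceAll("$1 '$2");
substr = p3.matcher(substr).replaceAll("$1 ' $2");
resultStr += substr+protectedM.group();
currIndex = protectedM.end();
}
// The text after the last match, s.substring(currIndex), still needs the same three replacements.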
Speed comparison for 100,000 lines of text:
Original Perl script: 272.960579875 seconds
My first attempt: Too long to finish.
With appendReplacement(): 14.245160866 seconds
Replacing while finding protected: 68.691842962 seconds
Thank you, Java, for not letting me down.
I've got a JSON file mapping all of the Unicode emojis to a colon-separated string representation of them (like Twitter uses). I've imported the file into an ArrayList of Pair<Character, String> and now need to scan a String message and replace any Unicode emojis with their string equivalents.
My code for conversion is the following:
public static String getStringFromUnicode(Context context, String m) {
ArrayList<Pair<Character, String>> list = loadEmojis(context);
String formattedString="";
for (Pair p : list) {
formattedString = message.replaceAll(String.valueOf(p.first), ":" + p.second + ":");
}
return formattedString;
}
but I always get the Unicode emoji representation when I send the message to a server.
Any help would be greatly appreciated, thanks!!
When in doubt go back to first principles.
You have a lot of stuff that is all nested together. I have found in such cases that your best approach to solving the problem is to pull it apart and look at what the different pieces are doing. This lets you take control of the problem, and place test code where needed to see what the data is doing.
My best guess is that replaceAll() is acting unpredictably: it treats its first argument as a regular expression (and $ and \ in its replacement string as special characters), so the emoji string may be misinterpreted as regular-expression syntax.
I would suggest replacing replaceAll() with a loop of your own that does the same thing. Since we are working with Unicode, I would go right down to the character level on this one. This little code sample does the same thing as replaceAll(), but because it addresses the string character by character, it should work no matter what funny control codes are in the string.
String message = "This :-) is a test :-) message";
String find = ":-)";
String replace = "!";
int pos = 0;
//Replicates function of replaceAll without the regular expression analysis
pos = subPos(message,find);
while (pos != -1)
{
String tmp = message.substring(0,pos);
tmp = tmp + replace;
tmp = tmp + message.substring(pos+find.length());
message = tmp;
pos = subPos(message,find);
}
System.out.println(message);
-- Snip --
//Replicates function of indexOf
public static int subPos(String str, String sub)
{
for (int i = 0; i < str.length() - (sub.length() - 1); i++)
{
int j;
for (j = 0; j < sub.length(); j++)
{
if (str.charAt(i + j) != sub.charAt(j))
break;
}
if (j == sub.length())
return i;
}
return -1;
}
I hope this helps. :-)
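Separately from regular expressions, the loop in the question has two issues worth checking: every iteration calls replaceAll on the original message (the parameter is actually named m), so only the last replacement survives, and a single Character cannot hold an emoji outside the Basic Multilingual Plane, such as U+1F600, which takes two Java chars. A minimal sketch, assuming the JSON is loaded into a Map<String, String> from emoji to name; EmojiReplacer and toShortcodes are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: replace each emoji with its ":name:" form (hypothetical class).
// Assumption: the JSON has been loaded into a Map<String, String> (emoji -> name).
// Keys are Strings, not Characters, because most emoji are outside the Basic
// Multilingual Plane and need two chars (a surrogate pair).
class EmojiReplacer {
    static String toShortcodes(String message, Map<String, String> emojiToName) {
        String result = message;                 // each pass works on the previous result
        for (Map.Entry<String, String> e : emojiToName.entrySet()) {
            // String.replace matches literally: no regex, no special "$" handling.
            result = result.replace(e.getKey(), ":" + e.getValue() + ":");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> emojis = new LinkedHashMap<>();
        emojis.put(new String(Character.toChars(0x1F600)), "grinning");
        System.out.println(toShortcodes("Hi " + new String(Character.toChars(0x1F600)) + "!", emojis));
    }
}
```

If the map also contains multi-code-point sequences (for example an emoji followed by a skin-tone modifier), those longer keys should be processed before their shorter prefixes, otherwise the shorter replacement splits them.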
I'm trying to take a string that represents a full algebraic expression, such as x = 15 * 6 / 3, and tokenize it into its individual components. So the first would be x, then =, then 15, then *, 6, / and finally 3.
The problem I am having is actually parsing through the string and looking at the individual characters. I can't think of a way to do this without a massive number of if statements. Surely there has to be a better way than specifically defining each individual case and testing for it.
For each type of token, you'll want to figure out how to identify:
when you're starting to read a particular token
if you're continuing to read the same token, or if you've started a different one
Let's take your example: x=15*6/3. If you could rely on there being spaces between tokens, this would be trivial: a new token would start whenever you reach a space. So let's assume that you cannot rely on that.
You can break down the character types into letters, digits, and symbols. Let's call the token types Variable, Operator, and Number.
A letter indicates a Variable token has started. It continues until you read a non-letter.
A symbol indicates the start of an Operator token. I only see single symbols, but you can have groups of symbols correspond to different Operator tokens.
A digit indicates the start of a Number token. (Let's assume integers for now.) The Number token continues until you read a non-digit.
Basically, that's how a simple symbolic parser works. Now, if you add in negative numbers (where the '-' symbol can have multiple meanings), or parentheses, or function names (like sin(x)) then things get more complicated, but it amounts to the same set of rules, now just with more choices.
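The rules above translate into a loop in which the first character picks the token type and the same kind of character extends it. A minimal sketch, assuming integers only and single-character operators; the class name and the Type(text) output format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the rules above (hypothetical class): the first character decides the
// token type, and the token continues while characters of the same kind follow.
// Assumptions: integers only, every other symbol is a one-character Operator.
class SimpleExprTokenizer {
    static List<String> tokenize(String exp) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < exp.length()) {
            char c = exp.charAt(i);
            int j = i + 1;                               // end of the current token
            if (Character.isWhitespace(c)) {
                // not a token; just move on
            } else if (Character.isLetter(c)) {          // Variable: letters until a non-letter
                while (j < exp.length() && Character.isLetter(exp.charAt(j))) j++;
                tokens.add("Variable(" + exp.substring(i, j) + ")");
            } else if (Character.isDigit(c)) {           // Number: digits until a non-digit
                while (j < exp.length() && Character.isDigit(exp.charAt(j))) j++;
                tokens.add("Number(" + exp.substring(i, j) + ")");
            } else {                                     // Operator: a single symbol
                tokens.add("Operator(" + c + ")");
            }
            i = j;
        }
        return tokens;
    }
}
```

tokenize("x=15*6/3") and tokenize("x = 15 * 6 / 3") both give [Variable(x), Operator(=), Number(15), Operator(*), Number(6), Operator(/), Number(3)]. Supporting negative numbers or function names would mean adding branches here, not rewriting the loop.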
1. Create a regular expression for each possible element: integer, variable, operator, parentheses.
2. Combine them with the | regular expression operator into one big regular expression, using capture groups to identify which one matched.
3. In a loop, match the head of the remaining string and break off the matched part as a token. The type of the token depends on which sub-expression matched, as described in step 2.
Alternatively, use a lexer library, such as the one in ANTLR or JavaCC.
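The three steps above might look like this in Java, using named groups (available since Java 7) to tell which sub-expression matched. The token set and the class name RegexTokenizer are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch (hypothetical class): one regex per element, combined with |, each in a
// named group so we can tell which alternative matched.
class RegexTokenizer {
    private static final Pattern TOKEN = Pattern.compile(
              "(?<WS>\\s+)"
            + "|(?<INTEGER>\\d+)"
            + "|(?<VARIABLE>[A-Za-z]\\w*)"
            + "|(?<OPERATOR>[-+*/=])"
            + "|(?<PAREN>[()])");
    private static final String[] TYPES = {"INTEGER", "VARIABLE", "OPERATOR", "PAREN"};

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {                         // must match at the head
                throw new IllegalArgumentException("Unexpected input at index " + pos);
            }
            for (String type : TYPES) {
                if (m.group(type) != null) {              // which alternative matched?
                    tokens.add(type + ":" + m.group(type));
                }
            }                                             // WS matches are dropped
            pos = m.end();                                // break off the matched part
        }
        return tokens;
    }
}
```

tokenize("x = 15 * 6 / 3") yields [VARIABLE:x, OPERATOR:=, INTEGER:15, OPERATOR:*, INTEGER:6, OPERATOR:/, INTEGER:3]. Because lookingAt() only matches at the start of the region, an unrecognized character raises an error instead of being silently skipped; when alternatives can overlap, their order in the pattern decides which one wins.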
This is from my early expression evaluator, which takes an infix expression like yours and turns it into postfix to evaluate it. There are helper methods for the parser, but I think they're pretty self-documenting. Mine uses symbol tables to check tokens against. It also allows for user-defined symbols, nested assignments and other things you may not need or want. But it shows how I handled your issue without niceties like regex, which would simplify this task tremendously. In addition, everything shown is my own implementation, including the stack and queue. So if anything looks abnormal (unlike the standard Java implementations), that's why.
This section of code isn't meant to answer your immediate question directly; it shows the work needed to determine the type of token you're dealing with. In my case I had three different types of operators and two different types of operands. Based on either the known rules or rules I chose to enforce (when appropriate), it was easy to know when something was a number (starts with a digit), a variable/user symbol/math function (starts with a letter), or a math operator (is /, *, -, +). Note that it only takes seeing the first character to know the correct extraction rules. From your example, if all your cases are as simple, you'd only have to handle two types, operator or operand. Nonetheless, the same logic applies.
protected Queue<Token> inToPostParse(String exp) {
// local vars
inputExp = exp;
offset = 0;
strLength = exp.length();
String tempHolder = "";
char c;
// the program runs in a loop so make sure you're dealing
// with an empty queue
q1.reset();
for (int i = offset; tempHolder != null && i < strLength; ++i) {
c = exp.charAt(i);
// Spaces are useless so skip them
if (c == ' ') { continue; }
// If c is a letter
if ((c >= 'A' && c <= 'Z')
|| (c >= 'a' && c <= 'z')) {
// Here we know it must be a user symbol possibly undefined
// at this point or an function like SIN, ABS, etc
// We extract, based on obvious rules, the op
tempHolder = extractPhrase(i); // Used to be append sequence
if (ut.isTrigOp(tempHolder) || ut.isAdditionalOp(tempHolder)) {
s1.push(new Operator(tempHolder, "Function"));
} else {
// If not some math function it is a user defined symbol
q1.insert(new Token(tempHolder, "User"));
}
i += tempHolder.length() - 1;
tempHolder = "";
// if c begins with a number
} else if (c >= '0' && c <= '9') {
try {
// Here we know that it must be a number
// so we extract until we reach a non number
tempHolder = extractNumber(i);
q1.insert(new Token(tempHolder, "Number"));
i += tempHolder.length() - 1;
tempHolder = "";
}
catch (NumberFormatException nfe) {
return null;
}
// if c is in the math symbol table
} else if (ut.isMathOp(String.valueOf(c))) {
String C = String.valueOf(c);
try {
// This is where the magic happens
// Here we determine the "intersection" of the
// current C and the top of the stack
// Based on the intersection we take action
// i.e., in math do you want to * or + first?
// Depending on the state you may have to move
// some tokens to the queue before pushing onto the stack
takeParseAction(C, ut.findIntersection
(C, s1.showTop().getSymbol()));
}
catch (NullPointerException npe) {
s1(C);
}
// it must be an invalid expression
} else {
return null;
}
}
u2();
s1.reset();
return q1;
}
Basically I have a stack (s1) and a queue (q1). All variables and numbers go into the queue. Any operators (trig, math, parentheses, etc.) go on the stack. If the current token is to be put on the stack, you have to check the state (the top of the stack) to determine what parsing action to take (i.e., what to do based on math precedence). Sorry if this seems like useless information. I imagine if you're parsing a math expression, it's because at some point you plan to evaluate it. IMHO, postfix is the easiest, so regardless of input format I convert to postfix and evaluate with one method. If your opinion differs, do what you like.
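The stack-and-queue approach described above is essentially Dijkstra's shunting-yard algorithm. A minimal, independent sketch (not the author's implementation, which uses its own stack, queue and symbol tables), assuming already-tokenized input with only left-associative + - * / and parentheses; InfixToPostfix is a made-up name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Shunting-yard sketch (hypothetical class): operands go straight to the output
// queue; operators wait on a stack until precedence says they can be emitted.
class InfixToPostfix {
    private static final Map<String, Integer> PRECEDENCE = new HashMap<>();
    static {
        PRECEDENCE.put("+", 1); PRECEDENCE.put("-", 1);
        PRECEDENCE.put("*", 2); PRECEDENCE.put("/", 2);
    }

    static List<String> toPostfix(List<String> tokens) {
        List<String> output = new ArrayList<>();        // the "queue"
        Deque<String> ops = new ArrayDeque<>();         // the "stack"
        for (String t : tokens) {
            if (PRECEDENCE.containsKey(t)) {
                // Emit stacked operators of higher or equal precedence (left-associative).
                while (!ops.isEmpty() && PRECEDENCE.containsKey(ops.peek())
                        && PRECEDENCE.get(ops.peek()) >= PRECEDENCE.get(t)) {
                    output.add(ops.pop());
                }
                ops.push(t);
            } else if (t.equals("(")) {
                ops.push(t);
            } else if (t.equals(")")) {
                while (!ops.isEmpty() && !ops.peek().equals("(")) output.add(ops.pop());
                if (ops.isEmpty()) throw new IllegalArgumentException("Unbalanced ')'");
                ops.pop();                              // discard the matching "("
            } else {
                output.add(t);                          // number or variable
            }
        }
        while (!ops.isEmpty()) {
            String op = ops.pop();
            if (op.equals("(")) throw new IllegalArgumentException("Unbalanced '('");
            output.add(op);
        }
        return output;
    }
}
```

For example, toPostfix(Arrays.asList("2", "+", "3", "*", "4")) returns [2, 3, 4, *, +]: * has higher precedence, so it is emitted before the + that was waiting on the stack.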
Edit: Implementations
The extract phrase and number methods, which you may be most interested in, are as follows:
protected String extractPhrase(int it) {
String phrase = new String();
char c;
for ( ; it < inputExp.length(); ++it) {
c = inputExp.charAt(it);
if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
|| (c >= '0' && c <= '9')) {
phrase += String.valueOf(c);
} else {
break;
}
}
return phrase;
}
protected String extractNumber(int it) throws NumberFormatException {
String number = new String();
int decimals = 0;
char c;
for ( ; it < strLength; ++it) {
c = inputExp.charAt(it);
if (c >= '0' && c <= '9') {
number += String.valueOf(c);
} else if (c == '.') {
++decimals;
if (decimals < 2) {
number += ".";
} else {
throw new NumberFormatException();
}
} else {
break;
}
}
return number;
}
Remember - By the time they enter these methods I've already been able to deduce what type it is. This allows you to avoid the seemingly endless while-if-else chain.
Are the components always separated by a space, as in your question? If so, use algebricExpression.split(" ") to get a String[] of components.
If no such restriction can be assumed, a possible solution is to iterate over the input and switch on Character.getType() of the character at the current index, something like this:
ArrayList<String> getExpressionComponents(String exp) {
ArrayList<String> components = new ArrayList<String>();
StringBuilder current = new StringBuilder();
int currentSequenceType = Character.UNASSIGNED;
for (int i = 0 ; i < exp.length() ; i++) {
char c = exp.charAt(i);
int type = Character.getType(c);
// A change of character type ends the current component.
if (currentSequenceType != type) {
if (current.length() > 0) components.add(current.toString());
current.setLength(0);
currentSequenceType = type;
}
switch (type) {
case Character.DECIMAL_DIGIT_NUMBER:
case Character.MATH_SYMBOL:       // e.g. + = < >
case Character.OTHER_PUNCTUATION: // e.g. * / % (not MATH_SYMBOL in Unicode)
case Character.DASH_PUNCTUATION:  // e.g. -
case Character.START_PUNCTUATION:
case Character.END_PUNCTUATION:
case Character.LOWERCASE_LETTER:
case Character.UPPERCASE_LETTER:
// add other required types
current.append(c);
break;
default:
// Unsupported characters (such as spaces) are dropped.
current.setLength(0);
currentSequenceType = Character.UNASSIGNED;
break;
}
}
// Don't lose the last component.
if (current.length() > 0) components.add(current.toString());
return components;
}
You can easily change the cases to meet other requirements, such as splitting non-digit characters into separate components.
I need to write a regular expression for a string read from a file:
apple,boy,cat,"dog,cat","time\" after\"noon"
I need to split it into
apple
boy
cat
dog,cat
time"after"noon
I tried using
Pattern pattern =
Pattern.compile("[\\\"]");
String items[]=pattern.split(match);
for the second part, but I could not get the right answer. Can you help me with this?
Since your question is more of a parsing problem than a regex problem, here's another solution that will work:
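import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
public class CsvReader {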
Reader r;
int row, col;
boolean endOfRow;
public CsvReader(Reader r){
this.r = r instanceof BufferedReader ? r : new BufferedReader(r);
this.row = -1;
this.col = 0;
this.endOfRow = true;
}
/**
* Returns the next string in the input stream, or null when no input is left
* @return the next field, or null at the end of the input
* @throws IOException if reading from the underlying Reader fails
*/
public String next() throws IOException {
int i = r.read();
if(i == -1)
return null;
if(this.endOfRow){
this.row++;
this.col = 0;
this.endOfRow = false;
} else {
this.col++;
}
StringBuilder b = new StringBuilder();
outerLoop:
while(true){
char c = (char) i;
if(i == -1)
break;
if(c == ','){
break;
} else if(c == '\n'){
endOfRow = true;
break;
} else if(c == '\\'){
i = r.read();
if(i == -1){
break;
} else {
b.append((char)i);
}
} else if(c == '"'){
while(true){
i = r.read();
if(i == -1){
break outerLoop;
}
c = (char)i;
if(c == '\\'){
i = r.read();
if(i == -1){
break outerLoop;
} else {
b.append((char)i);
}
} else if(c == '"'){
r.mark(2);
i = r.read();
if(i == '"'){
b.append('"');
} else {
r.reset();
break;
}
} else {
b.append(c);
}
}
} else {
b.append(c);
}
i = r.read();
}
return b.toString().trim();
}
public int getColNum(){
return col;
}
public int getRowNum(){
return row;
}
public static void main(String[] args){
try {
String input = "apple,boy,cat,\"dog,cat\",\"time\\\" after\\\"noon\"\nquick\"fix\" hello, \"\"\"who's there?\"";
System.out.println(input);
Reader r = new StringReader(input);
CsvReader csv = new CsvReader(r);
String s;
while((s = csv.next()) != null){
System.out.println("R" + csv.getRowNum() + "C" + csv.getColNum() + ": " + s);
}
} catch(IOException e){
e.printStackTrace();
}
}
}
Running this code, I get the output:
R0C0: apple
R0C1: boy
R0C2: cat
R0C3: dog,cat
R0C4: time" after"noon
R1C0: quickfix hello
R1C1: "who's there?
This should fit your needs pretty well.
A few disclaimers, though:
It won't catch errors in the syntax of the CSV format, such as an unescaped quotation mark in the middle of a value.
It won't perform any character conversion (such as converting "\n" to a newline character). Backslashes simply cause the following character to be treated literally, including other backslashes. (That should be easy enough to alter if you need additional functionality.)
Some CSV files escape quotes by doubling them rather than using a backslash; this code now looks for both.
Edit: I looked up the CSV format and discovered there's no real standard, but I updated my code to catch quotes escaped by doubling rather than by backslashes.
Edit 2: Fixed. Should work as advertised now. Also modified it to test the tracking of row and column numbers.
First thing: String.split() uses the regex to find the separators, not the substrings.
Edit: I'm not sure if this can be done with String.split(). I think the only way you could deal with the quotes while only matching the comma would be with lookahead and lookbehind, and that's going to break in quite a lot of cases.
Edit2: I'm pretty sure it can be done with a regular expression, and I'm sure this one case could be solved with String.split(), but a general solution wouldn't be simple.
Basically, you're looking for anything that isn't a comma ([^,]) as input, and you can handle quotes as a separate character. I've gotten most of the way there myself, and I'm getting this as output:
apple
boy
cat
dog
cat
time\" after\"noon
But I'm not sure why it has so many blank lines.
My complete code is:
String input = "apple,boy,cat,\"dog,cat\",\"time\\\" after\\\"noon\"";
Pattern pattern =
Pattern.compile("(\\s|[^,\"\\\\]|(\\\\.)||(\".*\"))*");
Matcher m = pattern.matcher(input);
while(m.find()){
System.out.println(m.group());
}
I think I'm almost there, though. Looking at it again, I see what's going on: because the whole pattern is wrapped in (...)*, it can also match the empty string, so find() reports an empty match at every comma. The stray || also adds an empty alternative that is tried before (".*"), so the quoted parts never get matched as a unit. I think I can fix that.
But I'm going to echo the guy above and say that if there's no requirement to use a regular expression, it's probably better to do it one character at a time and implement the logic manually. If your regex isn't picture-perfect, it could cause all kinds of unpredictable weirdness down the line.
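For comparison, here is a minimal sketch of the lookahead idea with String.split(), assuming fields are separated by commas and that every quoted field contains an even number of '"' characters (true for the sample line, where the escaped quotes come in pairs). QuoteAwareSplit and split are made-up names for illustration. A field with an odd number of backslash-escaped quotes would throw off the quote counting, which is exactly why a character-by-character parser is the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplit {
    // A comma counts as a separator only if an even number of '"' characters
    // follows it up to the end of the line, i.e. it is not inside a quoted field.
    private static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        // limit -1 keeps trailing empty fields such as "a,b,"
        for (String field : line.split(COMMA_OUTSIDE_QUOTES, -1)) {
            // drop the surrounding quotes of a quoted field
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            // turn backslash-escaped quotes (\") into plain quotes
            fields.add(field.replace("\\\"", "\""));
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "apple,boy,cat,\"dog,cat\",\"time\\\" after\\\"noon\"";
        for (String field : split(input)) {
            System.out.println(field);
        }
    }
}
```

For the sample line this prints apple, boy, cat, dog,cat and time" after"noon on separate lines.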
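I am not really sure about this, but you could have a go at Pattern.compile("[\\\\\"]");
\ is an escape character in both Java string literals and regular expressions, so to match a literal \ in the expression you need \\\\ in the Java source (and \" for the quote itself).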
A similar thing worked for me in another context and I hope it solves your problem too.
I know variants of this question have been asked frequently before (see here and here for instance), but this is not an exact duplicate of those.
I would like to check if a String is a number, and if so I would like to store it as a double. There are several ways to do this, but all of them seem inappropriate for my purposes.
One solution would be to use Double.parseDouble(s) or similarly new BigDecimal(s). However, those solutions don't work if there are commas present (so "1,234" would cause an exception). I could of course strip out all commas before using these techniques, but that would seem to pose loads of problems in other locales.
I looked at Apache Commons NumberUtils.isNumber(s), but that suffers from the same comma issue.
I considered NumberFormat or DecimalFormat, but those seemed far too lenient. For instance, "1A" is parsed as 1 instead of being reported as not a number. Furthermore, something like "127.0.0.1" will be parsed as the number 127 instead of being rejected.
I feel like my requirements aren't so exotic that I'm the first to do this, but none of the solutions does exactly what I need. I suppose even I don't know exactly what I need (otherwise I could write my own parser), but I know the above solutions do not work for the reasons indicated. Does any solution exist, or do I need to figure out precisely what I need and write my own code for it?
Sounds quite weird, but I would try to follow this answer and use java.util.Scanner.
Scanner scanner = new Scanner(input);
if (scanner.hasNextInt())
System.out.println(scanner.nextInt());
else if (scanner.hasNextDouble())
System.out.println(scanner.nextDouble());
else
System.out.println("Not a number");
For inputs such as 1A, 127.0.0.1, 1,234, 6.02e-23 I get the following output:
Not a number
Not a number
1234
6.02E-23
Scanner.useLocale can be used to change to the desired locale.
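One caveat with the snippet above: Scanner only looks at the next token, so an input like "1 A" would still report 1. Here is a minimal sketch that also requires the number to be the only token, assuming you pass the locale explicitly (ScannerNumberCheck and parseWhole are hypothetical names).

```java
import java.util.Locale;
import java.util.Scanner;

class ScannerNumberCheck {
    // Returns the value if the whole input is exactly one number in the given locale, else null.
    static Double parseWhole(String input, Locale locale) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.useLocale(locale);
            if (!scanner.hasNextDouble()) {
                return null; // first token is not a number, e.g. "1A" or "127.0.0.1"
            }
            double value = scanner.nextDouble();
            if (scanner.hasNext()) {
                return null; // something follows the number, e.g. "1 A"
            }
            return value;
        }
    }
}
```

With Locale.US, "1,234" gives 1234.0 and "6.02e-23" gives 6.02E-23, while "1A", "127.0.0.1" and "1 A" give null. With Locale.GERMANY, "1.234,5" is read as 1234.5.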
You can specify the Locale that you need:
NumberFormat nf = NumberFormat.getInstance(Locale.GERMAN);
double myNumber = nf.parse(myString).doubleValue();
This should work in your example, since the German locale uses a comma as the decimal separator.
You can use a ParsePosition to check that NumberFormat.parse consumed the entire string. If the whole string was consumed, you don't have a "1A" situation; if not, you do, and can behave accordingly. See here for a quick outline of the solution, and here for the related JDK bug, which was closed as "won't fix" because the ParsePosition option exists.
Unfortunately Double.parseDouble(s) or new BigDecimal(s) seem to be your best options.
You cite localisation concerns, but unfortunately there is no way to reliably support all locales without the user specifying one anyway. It is just impossible.
Sometimes you can reason about the scheme used by looking at whether commas or periods are used first, if both are used, but this isn't always possible, so why even try? Better to have a system which you know works reliably in certain situations than try to rely on one which may work in more situations but can also give bad results...
What does the number 123,456 represent? 123456 or 123.456?
Just strip commas, or spaces, or periods, depending on locale specified by user. Default to stripping spaces and commas. If you want to make it stricter, only strip commas OR spaces, not both, and only before the period if there is one. Also should be pretty easy to check manually if they are spaced properly in threes. In fact a custom parser might be easiest here.
Here is a bit of a proof of concept. It's a bit (very) messy but I reckon it works, and you get the idea anyways :).
public class StrictNumberParser {
public double parse(String numberString) throws NumberFormatException {
numberString = numberString.trim();
char[] numberChars = numberString.toCharArray();
Character separator = null;
int separatorCount = 0;
boolean noMoreSeparators = false;
for (int index = 1; index < numberChars.length; index++) {
char character = numberChars[index];
if (noMoreSeparators || separatorCount < 3) {
if (character == '.') {
if (separator != null) {
throw new NumberFormatException();
} else {
noMoreSeparators = true;
}
} else if (separator == null && (character == ',' || character == ' ')) {
if (noMoreSeparators) {
throw new NumberFormatException();
}
separator = Character.valueOf(character);
separatorCount = -1;
} else if (!Character.isDigit(character)) {
throw new NumberFormatException();
}
separatorCount++;
} else {
if (character == '.') {
noMoreSeparators = true;
} else if (separator == null) {
if (Character.isDigit(character)) {
noMoreSeparators = true;
} else if (character == ',' || character == ' ') {
separator = Character.valueOf(character);
} else {
throw new NumberFormatException();
}
} else if (!separator.equals(character)) {
throw new NumberFormatException();
}
separatorCount = 0;
}
}
if (separator != null) {
if (!noMoreSeparators && separatorCount != 3) {
throw new NumberFormatException();
}
numberString = numberString.replaceAll(separator.toString(), "");
}
return Double.parseDouble(numberString);
}
public void testParse(String testString) {
try {
System.out.println("result: " + parse(testString));
} catch (NumberFormatException e) {
System.out.println("Couldn't parse number!");
}
}
public static void main(String[] args) {
StrictNumberParser p = new StrictNumberParser();
p.testParse("123 45.6");
p.testParse("123 4567.8");
p.testParse("123 4567");
p.testParse("12 45");
p.testParse("123 456 45");
p.testParse("345.562,346");
p.testParse("123 456,789");
p.testParse("123,456,789");
p.testParse("123 456 789.52");
p.testParse("23,456,789");
p.testParse("3,456,789");
p.testParse("123 456.12");
p.testParse("1234567.8");
}
}
EDIT: obviously this would need to be extended for recognising scientific notation, but this should be simple enough, especially as you don't have to actually validate anything after the e, you can just let parseDouble fail if it is badly formed.
Also, it might be a good idea to properly extend NumberFormat with this: have a getSeparator() for parsed numbers and a setSeparator() for giving the desired output format. This sort of takes care of localisation, but again more work would need to be done to support ',' for decimals.
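The "spaced properly in threes" rule from this answer can also be expressed with regular expressions. This is a minimal sketch, not the parser above: it assumes '.' is the decimal mark and that a number uses either ',' or ' ' (never both) as its grouping separator. GroupedNumberCheck is a hypothetical name, and scientific notation is not handled.

```java
import java.util.regex.Pattern;

class GroupedNumberCheck {
    // No grouping at all: "12345", "-12.5"
    private static final Pattern UNGROUPED = Pattern.compile("-?\\d+(\\.\\d+)?");
    // 1-3 leading digits, then groups of exactly three separated by ',': "1,234,567.8"
    private static final Pattern COMMA_GROUPED = Pattern.compile("-?\\d{1,3}(,\\d{3})+(\\.\\d+)?");
    // Same, with ' ' as the grouping separator: "1 234 567.8"
    private static final Pattern SPACE_GROUPED = Pattern.compile("-?\\d{1,3}( \\d{3})+(\\.\\d+)?");

    static double parse(String input) {
        String s = input.trim();
        if (UNGROUPED.matcher(s).matches()) {
            return Double.parseDouble(s);
        }
        if (COMMA_GROUPED.matcher(s).matches()) {
            return Double.parseDouble(s.replace(",", ""));
        }
        if (SPACE_GROUPED.matcher(s).matches()) {
            return Double.parseDouble(s.replace(" ", ""));
        }
        throw new NumberFormatException("Not a strictly grouped number: " + input);
    }
}
```

Because the grouping is validated before any separators are stripped, inputs such as "1,23", "1A" or "127.0.0.1" are rejected instead of being silently reinterpreted.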
Not sure if it meets all your requirements, but the code found here might point you in the right direction?
From the article:
To summarize, the steps for proper input processing are:
Get an appropriate NumberFormat and define a ParsePosition variable.
Set the ParsePosition index to zero.
Parse the input value with parse(String source, ParsePosition parsePosition).
Perform error operations if the input length and ParsePosition index value don't match or if the parsed Number is null.
Otherwise, the value passed validation.
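The steps above can be sketched as follows. This assumes the locale's default NumberFormat is what you want; ParsePositionCheck and parseStrict are illustrative names, and returning null on failure is a design choice for the sketch, not something the article prescribes.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class ParsePositionCheck {
    // Returns the parsed value, or null if the input is not entirely a number.
    static Double parseStrict(String input, Locale locale) {
        NumberFormat format = NumberFormat.getInstance(locale); // step 1: pick a NumberFormat
        ParsePosition position = new ParsePosition(0);          // steps 1-2: start at index 0
        Number number = format.parse(input, position);          // step 3: parse from that position
        // step 4: fail if nothing was parsed or if parsing stopped before the end
        if (number == null || position.getIndex() != input.length()) {
            return null;
        }
        return number.doubleValue();                            // step 5: validated
    }
}
```

With Locale.US, "1,234" passes as 1234.0, while "1A" stops after "1" and "127.0.0.1" stops after "127.0", so both are rejected.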
This is an interesting problem. But perhaps it is a little open-ended? Are you looking specifically to identify base-10 numbers, or hex, or what? I'm assuming base-10. What about currency? Is that important? Or is it just numbers?
In any case, I think that you can use the deficiencies of NumberFormat to your advantage. Since you know that something like "1A" will be interpreted as 1, why not check the result by formatting it and comparing it against the original string?
public static boolean isNumber(String s){
try{
Locale l = Locale.getDefault();
DecimalFormat df = new DecimalFormat("###.##;-##.##");
Number n = df.parse(s);
String sb = df.format(n);
return sb.equals(s);
}
catch(Exception e){
return false;
}
}
What do you think?
This is really interesting, and I think people are trying to overcomplicate it. I would really just break this down by rules:
1) Check for scientific notation (does it match the pattern of being all numbers, commas, periods, -/+ and having an 'e' in it?) -- if so, parse however you want
2) Does it match the regexp for valid numeric characters (0-9 , . - +) (only 1 . - or + allowed)
if so, strip out everything that's not a digit and parse appropriately, otherwise fail.
I can't see a shortcut that's going to work here; just take the brute-force approach. Not everything in programming can be (or needs to be) completely elegant.
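A minimal sketch of these two rules, assuming ',' is only ever a grouping separator, '.' is the decimal mark, and a sign may only appear at the start (and right after the 'e'). RuleBasedParser is a hypothetical name. Instead of stripping every non-digit, it strips only the commas, so that the sign, '.' and exponent survive for Double.parseDouble. Like the answer, it deliberately does not check where the commas are, so "1,,2" would be accepted as 12.

```java
import java.util.regex.Pattern;

class RuleBasedParser {
    // Rule 1: scientific notation, e.g. "6.02e-23" or "1,000e3"
    private static final Pattern SCIENTIFIC =
            Pattern.compile("[-+]?[0-9][0-9,]*(\\.[0-9]+)?[eE][-+]?[0-9]+");
    // Rule 2: digits and commas, at most one '.', optional leading sign
    private static final Pattern PLAIN =
            Pattern.compile("[-+]?[0-9][0-9,]*(\\.[0-9]+)?");

    static double parse(String input) {
        String s = input.trim();
        if (SCIENTIFIC.matcher(s).matches() || PLAIN.matcher(s).matches()) {
            // commas are treated purely as grouping separators and dropped;
            // the sign, '.' and exponent are kept for Double.parseDouble
            return Double.parseDouble(s.replace(",", ""));
        }
        throw new NumberFormatException("Not a number: " + input);
    }
}
```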
My understanding is that you want to cover Western/Latin languages while retaining as much strict interpretation as possible. So what I'm doing here is asking DecimalFormatSymbols to tell me what the grouping, decimal, negative, and zero separators are, and swapping them out for symbols Double will recognize.
How does it perform?
In the US, it rejects: "1A", "127.100.100.100"
and accepts "1.47E-9"
In Germany it still rejects "1A"
It ACCEPTS "1,024.00" but interprets it correctly as 1.024. Likewise, it accepts "127.100.100.100" as 127100100100.0
In fact, the German locale correctly identifies and parses "1,47E-9"
Let me know if you have any trouble in a different locale.
import java.util.Locale;
import java.text.DecimalFormatSymbols;
public class StrictNumberFormat {
public static boolean isDouble(String s, Locale l) {
String clean = convertLocaleCharacters(s,l);
try {
Double.valueOf(clean);
return true;
} catch (NumberFormatException nfe) {
return false;
}
}
public static double doubleValue(String s, Locale l) {
return Double.valueOf(convertLocaleCharacters(s,l));
}
public static boolean isDouble(String s) {
return isDouble(s,Locale.getDefault());
}
public static double doubleValue(String s) {
return doubleValue(s,Locale.getDefault());
}
private static String convertLocaleCharacters(String number, Locale l) {
DecimalFormatSymbols symbols = new DecimalFormatSymbols(l);
String grouping = getUnicodeRepresentation( symbols.getGroupingSeparator() );
String decimal = getUnicodeRepresentation( symbols.getDecimalSeparator() );
String negative = getUnicodeRepresentation( symbols.getMinusSign() );
String zero = getUnicodeRepresentation( symbols.getZeroDigit() );
String clean = number.replaceAll(grouping, "");
clean = clean.replaceAll(decimal, ".");
clean = clean.replaceAll(negative, "-");
clean = clean.replaceAll(zero, "0");
return clean;
}
private static String getUnicodeRepresentation(char ch) {
String unicodeString = Integer.toHexString(ch); //ch implicitly promoted to int
while(unicodeString.length()<4) unicodeString = "0"+unicodeString;
return "\\u"+unicodeString;
}
}
You're best off doing it manually. Figure out what you can accept as a number and disregard everything else:
import java.lang.NumberFormatException;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class ParseDouble {
public static void main(String[] argv) {
String line = "$$$|%|#|1A|127.0.0.1|1,344|95|99.64";
for (String s : line.split("\\|")) {
try {
System.out.println("parsed: " +
any2double(s)
);
}catch (NumberFormatException ne) {
System.out.println(ne.getMessage());
}
}
}
public static double any2double(String input) throws NumberFormatException {
double out =0d;
Pattern special = Pattern.compile("[^a-zA-Z0-9\\.,]+");
Pattern letters = Pattern.compile("[a-zA-Z]+");
Pattern comma = Pattern.compile(",");
Pattern allDigits = Pattern.compile("^[0-9]+$");
Pattern singleDouble = Pattern.compile("^[0-9]+\\.[0-9]+$");
Matcher[] goodCases = new Matcher[]{
allDigits.matcher(input),
singleDouble.matcher(input)
};
Matcher[] nanCases = new Matcher[]{
special.matcher(input),
letters.matcher(input)
};
// maybe cases
if (comma.matcher(input).find()){
out = Double.parseDouble(
comma.matcher(input).replaceFirst("."));
return out;
}
for (Matcher m : nanCases) {
if (m.find()) {
throw new NumberFormatException("Bad input "+input);
}
}
for (Matcher m : goodCases) {
if (m.find()) {
try {
out = Double.parseDouble(input);
return out;
} catch (NumberFormatException ne){
System.out.println(ne.getMessage());
}
}
}
throw new NumberFormatException("Could not parse "+input);
}
}
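If you set your locale right, a locale-aware parser such as NumberFormat will handle commas; note that Double.parseDouble itself always expects '.' as the decimal separator, regardless of the default locale. Example is here.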
I think you've got a multi step process to handle here with a custom solution, if you're not willing to accept the results of DecimalFormat or the answers already linked.
1) Identify the decimal and grouping separators. You might need to identify other format symbols (such as scientific notation indicators).
http://download.oracle.com/javase/1.4.2/docs/api/java/text/DecimalFormat.html#getDecimalFormatSymbols()
2) Strip out all grouping symbols (or craft a regex, be careful of other symbols you accept such as the decimal if you do). Then strip out the first decimal symbol. Other symbols as needed.
3) Call parse or isNumber.
One easy hack would be to use replaceFirst on the String you get and check whether the new String is a double or not. If it is a double, convert it back (if needed).
If you want to convert a string holding a comma-separated decimal to a double, you could use DecimalFormat + DecimalFormatSymbols:
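A minimal sketch of that hack, assuming the first ',' is meant as the decimal mark (ReplaceFirstHack and tryParse are hypothetical names). Note that this turns "1,234" into 1.234, and that Double.parseDouble also accepts inputs such as "NaN", "Infinity" or "1d", so you may want extra checks on top.

```java
class ReplaceFirstHack {
    // Hypothetical helper: treat the first ',' as the decimal mark and
    // let Double.parseDouble decide whether the result is a valid double.
    static Double tryParse(String s) {
        String candidate = s.trim().replaceFirst(",", ".");
        try {
            return Double.parseDouble(candidate);
        } catch (NumberFormatException e) {
            return null; // still not a double after the replacement
        }
    }
}
```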
final double strToDouble(String str, char separator){
DecimalFormatSymbols s = new DecimalFormatSymbols();
s.setDecimalSeparator(separator);
DecimalFormat df = new DecimalFormat();
double num = 0;
df.setDecimalFormatSymbols(s);
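try{
// df.parse returns a Long for whole numbers (e.g. "1.2" with ',' as the separator
// stops at '.' and yields 1L), so this cast throws and num stays 0
num = ((Double) df.parse(str)).doubleValue();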
}catch(ClassCastException | ParseException ex){
// if you want, you could add something here to
// indicate the string is not double
}
return num;
}
Well, let's test it:
String a = "1.2";
String b = "2,3";
String c = "A1";
String d = "127.0.0.1";
System.out.println("\"" + a + "\" = " + strToDouble(a, ','));
System.out.println("\"" + a + "\" (with '.' as separator) = "
+ strToDouble(a, '.'));
System.out.println("\"" + b + "\" = " + strToDouble(b, ','));
System.out.println("\"" + c + "\" = " + strToDouble(c, ','));
System.out.println("\"" + d + "\" = " + strToDouble(d, ','));
If you run the above code, you'll see:
"1.2" = 0.0
"1.2" (with '.' as separator) = 1.2
"2,3" = 2.3
"A1" = 0.0
"127.0.0.1" = 0.0
This will take a string, count its decimals and commas, remove commas, conserve a valid decimal (note that this is based on US standardization - in order to handle 1.000.000,00 as 1 million this process would have to have the decimal and comma handling switched), determine if the structure is valid, and then return a double. Returns null if the string could not be converted. Edit: Added support for international or US. convertStoD(string,true) for US, convertStoD(string,false) for non US. Comments are now for US version.
public static Double convertStoD(String s, boolean isUS){
//s = "some string or number, something dynamic";
if(s == null || s.isEmpty())return null;
boolean isNegative = false;
if(s.charAt(0) == '-')
{
s = s.substring(1);
isNegative = true;
}
// US: ',' separates thousands and '.' marks decimals; swapped otherwise
char groupSep = isUS ? ',' : '.';
char decimalSep = isUS ? '.' : ',';
int length = s.length();
int currentCommas = 0;
int currentDecimals = 0;
for(int i = 0; i < length; i++){
char c = s.charAt(i);
if(c == groupSep)
{
currentCommas++;
continue;
}
if(c == decimalSep)
{
currentDecimals++;
continue;
}
if(c < '0' || c > '9')return null;//remove 1A
}
if(currentDecimals > 1)return null;//remove 1.00.00
String decimalValue = "";
if(currentDecimals > 0)
{
int index = s.indexOf(decimalSep);
String fraction = s.substring(index + 1);
if(fraction.indexOf(groupSep) != -1)return null;//remove 1.00,000
decimalValue = "." + fraction; // Double.parseDouble always expects '.'
s = s.substring(0, index);
}
int allowedCommas = (s.length() - 1) / 3;
if(currentCommas > allowedCommas)return null;//more separators than digit groups
// split on the grouping separator; -1 keeps trailing empty groups such as "1,000,"
String[] numberParser = s.split(groupSep == '.' ? "\\." : ",", -1);
StringBuilder returnString = new StringBuilder();
for(int i = 0; i < numberParser.length; i++)
{
if(i == 0)
{
if(numberParser.length > 1 && (numberParser[0].isEmpty() || numberParser[0].length() > 3))return null;//remove 1234,0,000 and ,000
returnString.append(numberParser[i]);
continue;
}
if(numberParser[i].length() != 3)return null;//ensure proper 1,000,000
returnString.append(numberParser[i]);
}
returnString.append(decimalValue);
try{
double answer = Double.parseDouble(returnString.toString());
return isNegative ? -answer : answer;
}catch(NumberFormatException e){
return null;//e.g. "" or "."
}
}
This code should handle most inputs, except IP addresses where all groups of digits are in threes (e.g. 255.255.255.255 is accepted, but not 255.1.255.255). It also doesn't support scientific notation.
It will work with most variants of separators (",", "." or space). If more than one separator is detected, the first is assumed to be the thousands separator, with additional checks (validity etc.).
Edit: prevDigits is used for checking that the number uses thousand separators correctly. If there is more than one group of thousands, all but the first one must be in groups of 3. I modified the code to make it clearer so that "3" is not a magic number but a constant.
Edit 2: I don't mind the down votes much, but can someone explain what the problem is?
/* A number using thousand separator must have
groups of 3 digits, except the first one.
Numbers following the decimal separator can
of course be unlimited. */
private final static int GROUP_SIZE=3;
public static boolean isNumber(String input) {
boolean inThousandSep = false;
boolean inDecimalSep = false;
boolean endsWithDigit = false;
char thousandSep = '\0';
int prevDigits = 0;
for(int i=0; i < input.length(); i++) {
char c = input.charAt(i);
switch(c) {
case ',':
case '.':
case ' ':
endsWithDigit = false;
if(inDecimalSep)
return false;
else if(inThousandSep) {
if(c != thousandSep)
inDecimalSep = true;
if(prevDigits != GROUP_SIZE)
return false; // Invalid use of separator
}
else {
if(prevDigits > GROUP_SIZE || prevDigits == 0)
return false;
thousandSep = c;
inThousandSep = true;
}
prevDigits = 0;
break;
default:
if(Character.isDigit(c)) {
prevDigits++;
endsWithDigit = true;
}
else {
return false;
}
}
}
return endsWithDigit;
}
Test code:
public static void main(String[] args) {
System.out.println(isNumber("100")); // true
System.out.println(isNumber("100.00")); // true
System.out.println(isNumber("1,5")); // true
System.out.println(isNumber("1,000,000.00.")); // false
System.out.println(isNumber("100,00,2")); // false
System.out.println(isNumber("123.123.23.123")); // false
System.out.println(isNumber("123.123.123.123")); // true
}