I am trying to convert a regex query to a keyword query, such that the keyword query gives me a superset of the regex query's results. For example:
"host." can convert to "host"
"host ((?10\.6\.2*)) ChuckN*" can convert to "host *", "10 6 ", "Chuck"
"host.* registered.+" can convert to "host*", "registered*"
"10\.64\.2*" can convert to "10 64 *"
For this I am looking for a regex tree whose leaf elements can be combined to get the keyword query. I am trying to access the data structure that the Pattern class in Java uses internally to store the regex. Please let me know how this could be done, or if there is some other way.
System.out.println(
( "host."
+ "\nhost ((?10\\.6\\.2*)) ChuckN*"
+ "\nhost.* registered.+"
+ "\n10\\.64\\.2*"
)
.replaceAll("(.?\\*)", "*")
.replaceAll("\\.\\+", "*")
.replaceAll("\\\\.", " ")
.replaceAll("(\\.?[^A-Za-z0-9 \\*\\n])", "")
);
output
host
host 10 6 * Chuck*
host* registered*
10 64 *
Edit : Last replaceAll line corrected.
You may have to complete the sString in last .replaceAll("sString", "").
I have a CSV file below from one of the system.
""demo"",""kkkk""
""demo " ","fg"
" " demo" "
"demo"
"value1","" frg" ","vaue5"
"val3",""tttyy " ",""hjhj","ghuy"
The objective is to remove every doubled pair of double quotes so that only one set of double quotes remains around each value, as shown below. The number of spaces between the sets of double quotes is not fixed. This has to be handled in a Java program using the replaceAll function.
"demo","kkkk"
"demo","fg"
"demo"
"demo"
"value1","frg","vaue5"
"val3","tttyy","hjhj","ghuy"
I tried this on regex101 with "[ ]*" and it works for PHP >= 7.3, but not in Java.
I also tried [\"][\"]|[^\"]\s+[\"] but am still not getting the desired output. Any suggestions for a regular expression that can be used in a Java program?
Based on shown sample data, you can use:
String repl = str.replaceAll("(?:\\h*\"){2}\\h*", "\"");
RegEx Details:
(?:\h*\"){2}: Match two double quotes, each optionally preceded by horizontal whitespace (spaces or tabs)
\h*: Match 0 or more trailing horizontal whitespace characters
Replacement is just a "
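As a quick check, the one-liner above can be run against the six sample rows from the question. This is only a sketch: the class name QuoteCollapser and the method collapseQuotes are made up for illustration, and the code assumes each row is processed as a separate string.

```java
public class QuoteCollapser {
    // Collapses two double quotes (with optional horizontal whitespace
    // before, between, and after them) into a single double quote.
    public static String collapseQuotes(String line) {
        return line.replaceAll("(?:\\h*\"){2}\\h*", "\"");
    }

    public static void main(String[] args) {
        // The six sample rows from the question, escaped as Java literals.
        String[] rows = {
            "\"\"demo\"\",\"\"kkkk\"\"",
            "\"\"demo \" \",\"fg\"",
            "\" \" demo\" \"",
            "\"demo\"",
            "\"value1\",\"\" frg\" \",\"vaue5\"",
            "\"val3\",\"\"tttyy \" \",\"\"hjhj\",\"ghuy\""
        };
        for (String row : rows) {
            System.out.println(collapseQuotes(row));
        }
    }
}
```

Note that a genuinely empty field written as "" would also be collapsed to a single quote, so this approach only suits data shaped like the sample.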
I need to parse a Python file that contains multiple SQL queries and find the start and end positions of each query, so that I can extract only the query part, using Java.
I am using the .contains function to check for sql(''' as the opening marker of a query, and ''') as the closing marker. However, in some cases ''') appears in the middle of the query when a variable is involved, and that should not be detected as the end of the query.
Something like this :
spark.sql(''' SELECT .......
FROM.....
WHERE xxx IN ('''+ Variable +''')
''')
Here the second-to-last line is also detected as the end of the query if I use line.contains(" ''') "), which is wrong.
All I can think of is to check for a newline character right after the closing marker, as each query is separated by two empty lines. So I tried if (line.contains(" ''')\n") and if (line.contains(" ''')\r\n"), but neither works for me.
Kindly let me know of any other way to do this.
Note that I do not have the privilege to change the query file.
Thanks
I believe simple contains won't solve this problem.
You will have to use Pattern if you are looking to match \n.
String query = "spark.sql(''' SELECT .......\n" +
"FROM..... \n" +
"WHERE xxx IN ('''+ Variable +''')\n" +
"''')";
Pattern pattern = Pattern.compile("^spark\\.sql\\('''(.*)'''\\)$", Pattern.DOTALL);
System.out.println(pattern.matcher(query).find());
Output:
true
Pattern.DOTALL tells Java to allow the dot to match newline characters, too.
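For a file that holds several queries, one possible sketch is to read the whole file into a single string and let a lazy match stop at the first ''') that is followed by a blank line or by the end of the input. This relies on the question's statement that queries are separated by empty lines, and it assumes a spliced-in variable line is never itself followed by a blank line. The class name SqlBlockExtractor and the method extractQueries are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SqlBlockExtractor {
    // Lazy body, then a closing ''') that must be followed by a blank line
    // or by the end of the input (assumption taken from the question).
    private static final Pattern SQL_CALL = Pattern.compile(
        "spark\\.sql\\('''(.*?)'''\\)\\h*(?=\\R\\h*\\R|\\R?\\z)",
        Pattern.DOTALL);

    public static List<String> extractQueries(String source) {
        List<String> queries = new ArrayList<>();
        Matcher m = SQL_CALL.matcher(source);
        while (m.find()) {
            // m.start(1) and m.end(1) give the positions of the query body.
            queries.add(m.group(1).trim());
        }
        return queries;
    }

    public static void main(String[] args) {
        String source =
            "spark.sql(''' SELECT a\n" +
            "FROM t\n" +
            "WHERE xxx IN ('''+ Variable +''')\n" +
            "''')\n\n\n" +
            "spark.sql(''' SELECT b FROM u ''')\n";
        for (String query : extractQueries(source)) {
            System.out.println("[" + query + "]");
        }
    }
}
```

The inner ''') on the WHERE line is skipped because the next line is not blank, which is the case that line-by-line contains checks cannot distinguish.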
I will split this problem up to make it easier for me.
For this expression:
"created":"589c8377576a33706397f3f4"
I wrote this regex:
output_row.json.replaceAll("\"created\":\"589c8377576a33706397f3f4\"","");
It works! Now I would like to use a dynamic token, e.g. [[:xdigit:]].
I tried this, but it didn't work:
output_row.json.replaceAll("\"created\":\"[[:xdigit:]]\"","");
Could you advise me, please?
[[:xdigit:]] is exactly one hex digit. Add the + quantifier to match 1 to n, or the * to match 0 to n hex digits.
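One thing to keep in mind for Java specifically: java.util.regex does not recognise the POSIX bracket form [[:xdigit:]]; it reads it as an ordinary character class containing the literal characters :, x, d, i, g and t. The Java spelling of the same class is \p{XDigit}. A minimal sketch, with a made-up class name HexValueStripper and method stripCreated, and a sample JSON string assumed for illustration:

```java
public class HexValueStripper {
    // Removes a "created":"<hex digits>" pair; \p{XDigit}+ matches one or
    // more hexadecimal digits (0-9, a-f, A-F).
    public static String stripCreated(String json) {
        return json.replaceAll("\"created\":\"\\p{XDigit}+\"", "");
    }

    public static void main(String[] args) {
        String json = "{\"created\":\"589c8377576a33706397f3f4\"}";
        System.out.println(stripCreated(json)); // prints {}
    }
}
```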
Finally I found the answer :
//replace the value of the key created
output_row.json = output_row.json.replaceAll("\"created\":\"[a-zA-Z0-9]+\"","\"created\":\"" + formatted + "\"");
I don't know why this class is not accepted in the Talend editor: [[:xdigit:]]. Perhaps it is not supported by Java?
Anyway the topic is closed for me !
Ale
I have a string which I want to parse via a Java or Python regexp:
something (\var1 \var2 \var3 $var4 #var5 $var6 *fdsfdsfd #uytuytuyt fdsgfdgfdgf aaabbccc)
The number of variables is unknown. Their exact names are unknown. Their names may or may not start with "\", "$", "*" or "#", and they're delimited by whitespace.
I'd like to parse them separately, that is, in capture groups, if possible. How can I do that? The output I want is a list of:
[\var1 , \var2 , \var3 , $var4 , #var5 , $var6 , *fdsfdsfd , #uytuytuyt , fdsgfdgfdgf , aaabbccc]
I don't need the java or python code, I just need the regexp. My incomplete one is:
something\s\(.+\)
something\s\((.+)\)
This regex captures the whole string containing all the variables in one group. Then split it on whitespace, since you are sure that they are delimited by whitespace.
import re

# Raw string so that \s and \( reach the regex engine unchanged.
m = re.search(r'something\s\((.+)\)', input_string)
if m:
    list_of_vars = m.group(1).split()
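The two-step approach is needed because a regex has a fixed number of capture groups, and a repeated group keeps only its last match, so a single pattern cannot return an unknown number of variables as separate groups. Since the question mentions Java as well, here is a rough Java equivalent of the same capture-then-split idea; the class name VarListParser and the method parse are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VarListParser {
    // Group 1 captures everything between the outer parentheses.
    private static final Pattern WRAPPER =
        Pattern.compile("something\\s\\((.+)\\)");

    public static List<String> parse(String input) {
        Matcher m = WRAPPER.matcher(input);
        if (!m.find()) {
            return Collections.emptyList();
        }
        // Split the captured text on runs of whitespace.
        return Arrays.asList(m.group(1).trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String input = "something (\\var1 \\var2 \\var3 $var4 #var5 $var6 "
            + "*fdsfdsfd #uytuytuyt fdsgfdgfdgf aaabbccc)";
        System.out.println(parse(input));
    }
}
```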
I have string like
order by o desc,b asc
Here I want to replace the o and b columns in this clause with table_o and table_b, so that the output is
order by table_o desc, table_b asc
I am using the replace function for that, but the output becomes
table_order table_by table_o desc,table_b asc
How to solve this problem using regular expression?
One more example
"order by orders desc, bye asc"
should be replaced as
"order by table_orders desc, table_bye asc"
Here is one possible solution. [You might have to tweak spaces around desc asc and , based on your actual SQL]
String str = "select a,b,c * from Table order by o desc,b asc,c,d";
System.out.println(str.replaceAll(
"(.*order by )?(\\w+)( desc| asc)?(,|$)", "$1table_$2$3$4"));
Result
select a,b,c * from Table order by table_o desc,table_b asc,table_c,table_d
Regex details
(.*order by )? => matches everything up to and including "order by " (here: select a,b,c * from Table order by ) => back ref $1
(\\w+) => matches a column name => back ref $2
( desc| asc)? => matches an optional desc or asc => back ref $3
(,|$) => matches a trailing comma or the end of the input => back ref $4
Please note: this solution only works with simple SQL queries, and would produce wrong results if the order by clause is part of an inner query in a complex SQL statement. Moreover, regex is not an ideal tool for parsing SQL syntax.
See this link: Regular expression to match common SQL syntax?
If full-fledged SQL parsing is required, it's better to use either SQL parsers or parser generators like ANTLR to parse SQL. See this link for a list of available ANTLR SQL grammars.
If you just want to replace text like that just use these regexes:
" o "
" b "
Perhaps you are looking for this: Regular Expressions in Java SE & EE. Have a look at the Regular Expressions chapter, which will do the job most of the time.
Simply use a space in the replace function (you do not need a regex).
Pseudo-code:
string = string_replace(string, " o ", " table_o ")
Edit:
After your example: you can put every valid boundary character between [ and ]. The regex will then match it. To get the original boundary back, put it between ( and ) and reinsert it in the replacement.
E.g.:
string = regex_replace(string, "([ \t])o([ \t,])", "\1table_o\2")
\1 and \2 might be different in your regex implementation.
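In Java, for instance, group references in the replacement string are written $1 and $2. The following is a sketch of the boundary-capturing idea in Java, not a general SQL rewriter. The class name ColumnPrefixer and the method prefix are hypothetical, and the boundary sets are an assumption: a leading comma and the end of the input are added to the pseudo-code's sets so that the b in "desc,b asc" is also caught.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColumnPrefixer {
    // Prefixes one column name with "table_" when it stands between
    // boundary characters; the boundaries are captured and put back.
    public static String prefix(String sql, String column) {
        String regex = "([ \\t,])" + Pattern.quote(column) + "([ \\t,]|$)";
        String replacement =
            "$1" + Matcher.quoteReplacement("table_" + column) + "$2";
        return sql.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        String sql = "order by o desc,b asc";
        sql = prefix(sql, "o");
        sql = prefix(sql, "b");
        System.out.println(sql); // prints: order by table_o desc,table_b asc
    }
}
```

Because the column name must be surrounded by boundary characters, the o in "order" and the b in "by" are left alone.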
Also I'd suggest clarifying your case so that it is clear what you really want to replace and also take a look at Truth's suggestion of the XY problem.
You can use code like this to convert your text:
String sql = "select o, b, c,d from Table order by orders ,b asc, c desc,d desc, e";
String text = sql.toLowerCase();
String orderBy = "order by ";
int start = text.indexOf(orderBy);
if (start >= 0) {
    // Only the text after "order by " is rewritten, so the select list stays untouched.
    String subtext = text.substring(start + orderBy.length());
    System.out.printf("Replaced: [%s%s%s]%n", text.substring(0, start), orderBy, subtext.replaceAll("(\\w+)(\\s+(?:asc|desc)?,?\\s*)?", "table_$1$2"));
}
OUTPUT:
Replaced: [select o, b, c,d from table order by table_orders ,table_b asc, table_c desc,table_d desc, table_e]