Extracting Operation(...); and sub Operation from String using REGEX


I have an issue with a regex in Java for Android.
I would like to retrieve the first operation (and each of its sub-operations), as in the following samples:
"OPERATION(ASYNC_OPERATION,_RFID_ITEM_SERIAL);"
"OPERATION(CONCAT,~1261,01,OPERATION(ASYNC_OPERATION,_RFID_ITEM_ID);,21,OPERATION(ASYNC_OPERATION,_RFID_ITEM_SERIAL););"
As you can see, each operation can contain sub-operations, and that is where I am running into problems.
Currently I am using this regex: ^\s*(OPERATION\s*\(\s*)(.*)(\);)
but the ");" it matches is always the last one in the string, which is wrong when a "main" operation contains two sub-operations.
private static Pattern operationPattern=Pattern.compile("^\\s*(OPERATION\\s*\\(\\s*)(.*)(\\);)",Pattern.CASE_INSENSITIVE);
public Operation(String text){
parseOperationText(text);
}
private void parseOperationText(String text){
String strText = text.replace("#,", "§");
Matcher matcher=operationPattern.matcher(strText);
if(matcher.find()) {
//This is an OPERATION
subOperations=new ArrayList<>();
String strChain = matcher.group(2);//Should only contain the text between "OPERATION(" and ");"
int commaIdx = strChain.indexOf(",");
if (commaIdx == -1) {
//Operation without parameter
operationType = strChain;
} else {
//Operation with parameters
operationType = strChain.substring(0, commaIdx);
strChain = strChain.substring(commaIdx + 1);
while (strChain.length()>0) {
matcher = operationPattern.matcher(strChain);
if (matcher.find()) {
String subOpText=matcher.group(0);
strChain=StringUtils.stripStart(strChain.substring(matcher.end())," ");
if(strChain.startsWith(",")){
strChain=strChain.substring(1);
}
subOperations.add(new Operation(subOpText));
}
else{
commaIdx = strChain.indexOf(",");
if(commaIdx==-1)
{
subOperations.add(new Operation(strChain));
strChain="";
}
else{
subOperations.add(new Operation(strChain.substring(0,commaIdx)));
strChain=strChain.substring(commaIdx+1);
}
}
}
}
}
else {
//Not an operation
//...
}
}
It works for sample 1, but for sample 2, after finding the "main" operation (CONCAT in the sample), the second match returns this:
OPERATION(ASYNC_OPERATION,_RFID_ITEM_ID);,21,OPERATION(ASYNC_OPERATION,_RFID_ITEM_SERIAL);
What I would like to retrieve is this:
"CONCAT,~1261,01,OPERATION(ASYNC_OPERATION,_RFID_ITEM_ID);,21,OPERATION(ASYNC_OPERATION,_RFID_ITEM_SERIAL);"
"ASYNC_OPERATION,_RFID_ITEM_ID"
"ASYNC_OPERATION,_RFID_ITEM_SERIAL"

You could use this
"(?s)(?=OPERATION\\s*\\()(?:(?=.*?OPERATION\\s*\\((?!.*?\\1)(.*\\)(?!.*\\2).*))(?=.*?\\)(?!.*?\\2)(.*)).)+?.*?(?=\\1)(?:(?!OPERATION\\s*\\().)*(?=\\2$)"
to find the balanced OPERATION( ) string in group 0.
https://regex101.com/r/EsaDtC/1
Then use this
(?s)^OPERATION\((.*?)\)$
on that last matched string to get the inner contents of the
operation, which is in group 1.
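The second pattern above uses a lazy quantifier, while the pattern in the question uses a greedy one. The following minimal sketch shows the difference on a shortened input. The class name QuantifierDemo and the sample strings are made up for illustration; only the greedy pattern is taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Greedy body, as in the question's pattern: (.*) runs to the LAST ");"
    static final String GREEDY = "^\\s*(OPERATION\\s*\\(\\s*)(.*)(\\);)";
    // Same pattern with a lazy body: (.*?) stops at the FIRST ");"
    static final String LAZY = "^\\s*(OPERATION\\s*\\(\\s*)(.*?)(\\);)";

    // Returns group 2 (the operation body) or null when nothing matches
    static String body(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() ? m.group(2) : null;
    }
}
```

With two sibling operations the greedy version swallows both, and with a nested operation the lazy version stops too early. Neither quantifier counts parentheses, which is why a plain greedy or lazy pattern alone cannot handle arbitrary nesting.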

In the end I am using two different regexes:
//First Regex catches main operation content (Group 2):
\s*(OPERATION\s*\(\s*)(.*)(\);)
//Second Regex catches next full sub "OPERATION(...);" (Group 0):
^(?:\s*(OPERATION\s*\(\s*))(.*)(?:\)\s*\;\s*)(?=\,)|^(?:\s*(OPERATION\s*\(\s*))(.*)(?:\)\s*\;\s*)$
I can then use the first regex to detect whether the text is an operation (matcher.find()) and capture its content in group 2. For each comma-separated parameter I check with the second regex whether it is a sub-operation; if it is, I recursively call the same function, which applies the first regex again, and so on.
private static Pattern operationPattern=Pattern.compile("^\\s*(OPERATION\\s*\\(\\s*)(.*)(\\);)",Pattern.CASE_INSENSITIVE);
private static Pattern subOperationPattern=Pattern.compile("^(?:\\s*(OPERATION\\s*\\(\\s*))(.*)(?:\\)\\s*\\;\\s*)(?=\\,)|^(?:\\s*(OPERATION\\s*\\(\\s*))(.*)(?:\\)\\s*\\;\\s*)$",Pattern.CASE_INSENSITIVE);
private void parseOperationText(String strText ){
Matcher matcher=operationPattern.matcher(strText);
if(matcher.find()) {
//This is an OPERATION
subOperations=new ArrayList<>();
String strChain = matcher.group(2);
int commaIdx = strChain.indexOf(",");
if (commaIdx == -1) {
//Operation without parameter
operationType = strChain;
} else {
//Operation with parameters
operationType = strChain.substring(0, commaIdx);
strChain = strChain.substring(commaIdx + 1);
while (strChain.length()>0) {
matcher = subOperationPattern.matcher(strChain);
if (matcher.find()) {
String subOpText=matcher.group(0);
strChain=StringUtils.stripStart(strChain.substring(matcher.end())," ");
if(strChain.startsWith(",")){
strChain=strChain.substring(1);
}
subOperations.add(new Operation(subOpText));
}
else{
commaIdx = strChain.indexOf(",");
if(commaIdx==-1)
{
subOperations.add(new Operation(strChain));
strChain="";
}
else{
subOperations.add(new Operation(strChain.substring(0,commaIdx)));
strChain=strChain.substring(commaIdx+1);
}
}
}
}
}
else {
//Fixed value: we store the value as is
fieldValue = strText;
operationType = OperationType.NONE;
}
}
public Operation(String text){
parseOperationText(text);
}
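As an alternative to the regex approaches above, nesting can also be tracked with a simple depth counter. This is only a sketch under assumptions: the class and method names (OperationSplitter, innerContent, splitTopLevel) are hypothetical, the keyword is matched case-sensitively, and parentheses are assumed to appear only as operation delimiters.

```java
import java.util.ArrayList;
import java.util.List;

class OperationSplitter {
    private static final String PREFIX = "OPERATION(";

    // Returns the text between the first "OPERATION(" and its matching ")",
    // or null if there is no operation or the parentheses are unbalanced.
    static String innerContent(String text) {
        int open = text.indexOf(PREFIX);
        if (open < 0) {
            return null;
        }
        int start = open + PREFIX.length();
        int depth = 1; // we are inside the first "("
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return text.substring(start, i);
            }
        }
        return null;
    }

    // Splits on commas that are not inside any parentheses.
    static List<String> splitTopLevel(String content) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int last = 0;
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(content.substring(last, i));
                last = i + 1;
            }
        }
        parts.add(content.substring(last));
        return parts;
    }
}
```

Calling innerContent on sample 2 yields the CONCAT body, and splitTopLevel on that body yields the operation type, the fixed values, and each complete sub "OPERATION(...);" text, which could then be parsed recursively in the same way.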

Related

How can I remove the subdomain part of a URL

I am trying to remove subdomain and leave only the domain name followed by the extension.
It is difficult to find the subdomain because I do not know how many dots to expect in a url. some urls end in .com some in .co.uk for example.
How can I remove the subdomain safely so that foo.bar.com becomes bar.com and foo.bar.co.uk becomes bar.co.uk
if(!rawUrl.startsWith("http://")&&!rawUrl.startsWith("https://")){
rawUrl = "http://"+rawUrl;
}
String url = new java.net.URL(rawUrl).getHost();
String urlWithoutSub = ???

What you need is a Public Suffix List, such as the one available at https://publicsuffix.org/. Basically, there is no algorithm that can tell you which suffixes are public, so you need a list, and you had better use one that is public and well maintained.
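To illustrate how such a list is used, here is a minimal sketch that keeps exactly one label in front of the longest matching public suffix. The class name RegistrableDomain and the three-entry suffix set are assumptions for the example; a real list is far larger and also has wildcard and exception rules, which this sketch ignores.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class RegistrableDomain {
    // Hypothetical, tiny stand-in for the real Public Suffix List
    static final Set<String> SUFFIXES = new HashSet<>(Arrays.asList("com", "uk", "co.uk"));

    static String strip(String host) {
        String[] labels = host.split("\\.");
        // Try candidates from longest to shortest, so "co.uk" wins over "uk"
        for (int i = 0; i < labels.length; i++) {
            String candidate = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (SUFFIXES.contains(candidate)) {
                // Keep one label to the left of the suffix, if there is one
                return i == 0 ? host : labels[i - 1] + "." + candidate;
            }
        }
        return host; // no known suffix: leave the host unchanged
    }
}
```

With this set, strip("foo.bar.com") gives bar.com and strip("foo.bar.co.uk") gives bar.co.uk, matching the two cases in the question.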

I just stumbled upon this question and decided to write the following function.
Example Input -> Output:
http://example.com -> http://example.com
http://www.example.com -> http://example.com
ftp://www.a.example.com -> ftp://example.com
SFTP://www.a.example.com -> SFTP://example.com
http://www.a.b.example.com -> http://example.com
http://www.a.c.d.example.com -> http://example.com
http://example.com/ -> http://example.com/
https://example.com/aaa -> https://example.com/aaa
http://www.example.com/aa/bb../d -> http://example.com/aa/bb../d
FILE://www.a.example.com/ddd/dd/../ff -> FILE://example.com/ddd/dd/../ff
HTTPS://www.a.b.example.com/index.html?param=value -> HTTPS://example.com/index.html?param=value
http://www.a.c.d.example.com/#yeah../..! -> http://example.com/#yeah../..!
The same goes for second level domains:
http://some.thing.co.uk/?ke -> http://thing.co.uk/?ke
something.co.uk/?ke -> something.co.uk/?ke
www.something.co.uk/?ke -> something.co.uk/?ke
www.something.co.uk -> something.co.uk
https://www.something.co.uk -> https://something.co.uk
Code:
public static String removeSubdomains(String url, ArrayList<String> secondLevelDomains) {
// We need our URL in three parts, protocol - domain - path
String protocol= getProtocol(url);
url = url.substring(protocol.length());
String urlDomain=url;
String path="";
if(urlDomain.contains("/")) {
int slashPos = urlDomain.indexOf("/");
path=urlDomain.substring(slashPos);
urlDomain=urlDomain.substring(0, slashPos);
}
// Done, now let us count the dots . .
int dotCount = Strng.countOccurrences(urlDomain, ".");
// example.com <-- nothing to cut
if(dotCount==1){
return protocol+url;
}
int dotOffset=2; // subdomain.example.com <-- default case, we want to remove everything before the 2nd last dot
// however, somebody had the glorious idea, to have second level domains, such as co.uk
for (String secondLevelDomain : secondLevelDomains) {
// we need to check if our domain ends with a second level domain
// example: something.co.uk we don't want to cut away "something", since it isn't a subdomain, but the actual domain
if(urlDomain.endsWith(secondLevelDomain)) {
// we increase the dot offset with the amount of dots in the second level domain (co.uk = +1)
dotOffset += Strng.countOccurrences(secondLevelDomain, ".");
break;
}
}
// if we have something.co.uk, we have a offset of 3, but only 2 dots, hence nothing to remove
if(dotOffset>dotCount) {
return protocol+urlDomain+path;
}
// if we have sub.something.co.uk, we have a offset of 3 and 3 dots, so we remove "sub"
int pos = Strng.nthLastIndexOf(dotOffset, ".", urlDomain)+1;
urlDomain = urlDomain.substring(pos);
return protocol+urlDomain+path;
}
public static String getProtocol(String url) {
String containsProtocolPattern = "^([a-zA-Z]*:\\/\\/)|^(\\/\\/)";
Pattern pattern = Pattern.compile(containsProtocolPattern);
Matcher m = pattern.matcher(url);
if (m.find()) {
return m.group();
}
return "";
}
public static ArrayList<String> getPublicSuffixList(boolean loadFromPublicSufficOrg) {
ArrayList<String> secondLevelDomains = new ArrayList<String>();
if(!loadFromPublicSufficOrg) {
// Offline fallback: a small hard-coded sample (needs java.util.Arrays)
secondLevelDomains.addAll(Arrays.asList(
"co.uk", "ac.uk", "gov.uk", "ltd.uk",
"co.at", "or.at", "ac.at", "gv.at",
"fed.us", "isa.us", "nsn.us", "dni.us",
"ac.ru", "com.ru", "edu.ru", "gov.ru", "int.ru", "mil.ru", "net.ru", "org.ru", "pp.ru",
"com.au", "net.au", "org.au", "edu.au", "gov.au"));
// Return here so the fallback is not followed by a download anyway
return secondLevelDomains;
}
try {
// URLHelpers.getHTTP is a helper that is not shown in this answer
String a = URLHelpers.getHTTP("https://publicsuffix.org/list/public_suffix_list.dat", false, true);
Scanner scanner = new Scanner(a);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
// skip comments, wildcard rules and single-label suffixes
if(!line.startsWith("//") && !line.startsWith("*") && line.contains(".")) {
secondLevelDomains.add(line);
}
}
scanner.close();
} catch (Exception e) {
e.printStackTrace();
}
return secondLevelDomains;
}
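The code above calls Strng.countOccurrences and Strng.nthLastIndexOf, but the Strng helper class is not included in the answer. The following is a guess at what those two methods might look like, inferred only from how they are called; the actual implementation may differ.

```java
class Strng {
    // Counts non-overlapping occurrences of needle in haystack.
    static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0; // avoid an endless loop on an empty needle
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Index of the n-th occurrence of needle counted from the end, or -1.
    // Argument order follows the call in removeSubdomains.
    static int nthLastIndexOf(int n, String needle, String haystack) {
        int pos = haystack.length();
        for (int i = 0; i < n; i++) {
            pos = haystack.lastIndexOf(needle, pos - 1);
            if (pos == -1) {
                return -1;
            }
        }
        return pos;
    }
}
```

For "sub.example.com" the second-to-last dot is at index 3, so removeSubdomains would cut at position 4 and keep "example.com".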

Check if String token contains string char and String number value

1) I need to check whether a String contains non-numeric characters. What is the correct way to do this?
2) Is there a way to correctly transform a String into a number and then compare the two numbers? For example, "House":1234 should be equal to "House":1234 but not to "house":123.
Preview:
String token = "123"; // false
String token = "ā123"; // or other characters: true, etc.
if(isChars(token)){
Long value = toLong(token);
}
THANKS!
//EDIT
public BigDecimal eval() {
Stack<BigDecimal> stack = new Stack<BigDecimal>();
for (String token : getRPN()) {
if (operators.containsKey(token)) {
BigDecimal v1 = stack.pop();
BigDecimal v2 = stack.pop();
stack.push(operators.get(token).eval(v2, v1));
} else if (variables.containsKey(token)) {
stack.push(variables.get(token).round(mc));
} else if (functions.containsKey(token.toUpperCase())) {
Function f = functions.get(token.toUpperCase());
ArrayList<BigDecimal> p = new ArrayList<BigDecimal>(f.getNumParams());
for (int i = 0; i < f.numParams; i++) {
p.add(0, stack.pop());
}
BigDecimal fResult = f.eval(p);
stack.push(fResult);
} else if (isDate(token)) {
Long date = null;
try {
date = SU.sdf.parse(token).getTime();
} catch (ParseException e) {/* IGNORE! */
}
// mylog.pl("LONG DATE : "+new BigDecimal(date, mc));
stack.push(new BigDecimal(date, mc));
}//TODO HERE
else if (isChar(token)){
Long cha = toLong(token);
stack.push(new BigDecimal(cha, mc));
//TODO ENDS HERE
}
else {
// mylog.pl("Token : "+ token);
stack.push(new BigDecimal(token, mc));
}
}
return stack.pop().stripTrailingZeros();
}

Another way to determine whether a string contains any non-numeric characters is the StringUtils class from the Apache Commons Lang library.
It contains several methods for analyzing a string's contents. In your case it seems you can use StringUtils.isAlphanumeric(CharSequence cs) or the negation of StringUtils.isNumeric(CharSequence cs).
As for the second part of your question, I do not see the need to extract numbers from the string. You can compare the strings "House":1234 and "house":123 using the standard String.equals() method.

Long l;
try{
l = Long.parseLong(token);
} catch(NumberFormatException e){
//contains non-numeric character(s)
}
As for "transforming varchar into Long": that sounds rather impossible, since there is no universally accepted way of doing that and you did not provide one. However, if I guess correctly that what you want is the number within the string, disregarding the other characters, you want regular expressions. The code could look like:
if (!StringUtils.isNumeric(token)){
String stripped = token.replaceAll("\\D","");
Long l = Long.parseLong(stripped);
}
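Note that the snippet above throws a NumberFormatException when the token contains no digits at all, because the stripped string is then empty. A minimal sketch without the Commons Lang dependency that also covers that case could look like this; the class and method names (TokenNumbers, isNumeric, digitsOf) are made up for the example.

```java
import java.util.OptionalLong;

class TokenNumbers {
    // True only for a non-empty string made entirely of digits
    static boolean isNumeric(String token) {
        return !token.isEmpty() && token.chars().allMatch(Character::isDigit);
    }

    // Removes every non-digit and parses what is left, if anything
    static OptionalLong digitsOf(String token) {
        String stripped = token.replaceAll("\\D", "");
        if (stripped.isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(stripped));
        } catch (NumberFormatException e) {
            // too many digits to fit into a long
            return OptionalLong.empty();
        }
    }
}
```

Returning an OptionalLong instead of a Long makes the "no number found" case explicit for the caller, which is a design choice of this sketch rather than something the answers prescribe.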

Fetch all the hyperlinks from a webpage and recursively doing that in java

1. Fetch all contents from a webpage.
2. Fetch the hyperlinks from that webpage.
3. Repeat steps 1 and 2 for each fetched hyperlink.
4. Repeat the process until 200 hyperlinks are registered or there are no more hyperlinks to fetch.
I wrote a sample program, but due to my poor understanding of recursion it ends up in an infinite loop.
Please suggest how to fix the code so that it meets these expectations.
import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Content
{
private static final String HTML_A_HREF_TAG_PATTERN =
"\\s*(?i)href\\s*=\\s*(\"([^\"]*\")|'[^']*'|([^'\">\\s]+))";
Pattern pattern;
public Content ()
{
pattern = Pattern.compile(HTML_A_HREF_TAG_PATTERN);
}
private void fetchContentFromURL(String strLink) {
String content = null;
URLConnection connection = null;
try {
connection = new URL(strLink).openConnection();
Scanner scanner = new Scanner(connection.getInputStream());
scanner.useDelimiter("\\Z");
content = scanner.next();
}catch ( Exception ex ) {
ex.printStackTrace();
return;
}
fetchURL(content);
}
private void fetchURL ( String content )
{
Matcher matcher = pattern.matcher( content );
while(matcher.find()) {
String group = matcher.group();
if(group.toLowerCase().contains( "http" ) || group.toLowerCase().contains( "https" )) {
group = group.substring( group.indexOf( "=" )+1 );
group = group.replaceAll( "'", "" );
group = group.replaceAll( "\"", "" );
System.out.println("lINK "+group);
fetchContentFromURL(group);
}
}
System.out.println("DONE");
}
/**
* @param args
*/
public static void main ( String[] args )
{
new Content().fetchContentFromURL( "http://www.google.co.in" );
}
}
I am open to any other solution as well, but I want to stick with the core Java API only, no third-party libraries.

One possible option here is to remember all visited links to avoid cyclic paths. Here's how to achieve it with an additional Set that stores the already visited links:
public class Content {
private static final String HTML_A_HREF_TAG_PATTERN =
"\\s*(?i)href\\s*=\\s*(\"([^\"]*\")|'[^']*'|([^'\">\\s]+))";
private Pattern pattern;
private Set<String> visitedUrls = new HashSet<String>();
public Content() {
pattern = Pattern.compile(HTML_A_HREF_TAG_PATTERN);
}
private void fetchContentFromURL(String strLink) {
String content = null;
URLConnection connection = null;
try {
connection = new URL(strLink).openConnection();
Scanner scanner = new Scanner(connection.getInputStream());
scanner.useDelimiter("\\Z");
if (scanner.hasNext()) {
content = scanner.next();
visitedUrls.add(strLink);
fetchURL(content);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
private void fetchURL(String content) {
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
String group = matcher.group();
if (group.toLowerCase().contains("http") || group.toLowerCase().contains("https")) {
group = group.substring(group.indexOf("=") + 1);
group = group.replaceAll("'", "");
group = group.replaceAll("\"", "");
System.out.println("lINK " + group);
if (!visitedUrls.contains(group) && visitedUrls.size() < 200) {
fetchContentFromURL(group);
}
}
}
System.out.println("DONE");
}
/**
* @param args
*/
public static void main(String[] args) {
new Content().fetchContentFromURL("http://www.google.co.in");
}
}
I also fixed some other issues in fetching logic, now it works as expected.

Inside the fetchContentFromURL method you should record which URL you are currently fetching, and skip it if it has already been fetched. Otherwise two pages A and B that link to each other will keep your code fetching forever.

In addition to JK1's answer: to achieve target 4 of your question, you might want to maintain the count of hyperlinks as an instance variable. Rough pseudocode might be (you can adjust the exact count; alternatively, you can use the size of the HashSet to know how many hyperlinks your program has parsed so far):
if (!visitedUrls.contains(group) && noOfHyperlinksVisited++ < 200) {
fetchContentFromURL(group);
}
However, I was not sure whether you want a total of 200 hyperlinks or want to traverse to a depth of 200 links from the starting page. If it is the latter, you might wish to explore breadth-first search, which lets you know when you have reached your target depth.
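A breadth-first traversal with both limits could be sketched as follows. To keep the example independent of the network, the link extraction is passed in as a function; the names BfsCrawl, crawl and linksOf are assumptions for this sketch, and in the real program linksOf would wrap the fetching and regex code from the question.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BfsCrawl {
    // Returns every registered URL together with its depth from the start page.
    static Map<String, Integer> crawl(String start, int maxLinks, int maxDepth,
                                      Function<String, List<String>> linksOf) {
        Map<String, Integer> depthOf = new LinkedHashMap<>(); // doubles as the visited set
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            int depth = depthOf.get(url);
            if (depth == maxDepth) {
                continue; // target depth reached: do not expand this page
            }
            for (String link : linksOf.apply(url)) {
                if (depthOf.size() >= maxLinks) {
                    return depthOf; // enough hyperlinks registered
                }
                if (!depthOf.containsKey(link)) {
                    depthOf.put(link, depth + 1);
                    queue.add(link);
                }
            }
        }
        return depthOf;
    }
}
```

Because a queue replaces the recursion, the call stack stays flat, and pages that link to each other are registered only once.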

Simplest way to strip an int out of a URL in Java?

I have a String containing a URL. I want to get just one piece of data out of it: an int that should be showing up in the query string.
So if the url is:
http://domain.tld/page.html?iVar=123
I want to get "123" into an int.
What's the most elegant way you know to do this?

You could try matching just that parameter in the URL string:
public static Integer getIVarParamValue(String urlStr) {
Pattern p = Pattern.compile("iVar=(\\d+)");
Matcher m = p.matcher(urlStr);
if (m.find()) {
return Integer.parseInt(m.group(1));
}
return null;
}

It seems you want to obtain the GET parameters and parse them. I have this method here (got it from somewhere on SO, I guess):
public static Map<String, List<String>> getQueryParams(String url) {
try {
Map<String, List<String>> params = new HashMap<String, List<String>>();
String[] urlParts = url.split("\\?");
if (urlParts.length > 1) {
String query = urlParts[1];
for (String param : query.split("&")) {
String[] pair = param.split("=");
String key = URLDecoder.decode(pair[0], "UTF-8");
String value = "";
if (pair.length > 1) {
value = URLDecoder.decode(pair[1], "UTF-8");
}
List<String> values = params.get(key);
if (values == null) {
values = new ArrayList<String>();
params.put(key, values);
}
values.add(value);
}
}
return params;
} catch (UnsupportedEncodingException ex) {
throw new AssertionError(ex);
}
}
So:
String var = WebUtils.getQueryParams(url).get("iVar").get(0); // get() returns a List<String>; take the first value
int intVar = Integer.parseInt(var);

You can use the URL class.
For example:
URL myUrl = new URL("http://domain.tld/page.html?iVar=123");
String query = myUrl.getQuery(); // this will return iVar=123
// at this point you can either parse it manually (e.g. with one of the regexes in the other suggestions) or use something like:
String[] parts = query.split("=");
String variable = parts[0];
String value = parts[1];
This will work only for this exact case, though; it won't work if you have additional params or no params.
There are a number of solutions online that split the query into a param map; you can see some here.
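A variant that tolerates additional or missing parameters could be sketched like this. It uses java.net.URI rather than URL; the names QueryInt and intParam are hypothetical, and the sketch assumes parameter names need no URL decoding.

```java
import java.net.URI;
import java.util.OptionalInt;

class QueryInt {
    // Looks up one query parameter and parses it as an int, if possible.
    // URI.create throws IllegalArgumentException for a malformed URL.
    static OptionalInt intParam(String url, String name) {
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return OptionalInt.empty(); // the URL has no query string
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                try {
                    return OptionalInt.of(Integer.parseInt(pair.substring(eq + 1)));
                } catch (NumberFormatException e) {
                    return OptionalInt.empty(); // present but not an int
                }
            }
        }
        return OptionalInt.empty();
    }
}
```

A caller can then write QueryInt.intParam(url, "iVar").orElse(0) to fall back to a default when the parameter is missing.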

If it's really as simple as you describe: There is only 1 int in your URL and all you want is that int, I'd go for a regular expression. If it is actually more complicated see the other answers.
Pattern p = Pattern.compile("[0-9]+");
Matcher m = p.matcher("http://domain.tld/page.html?iVar=123");
if (m.find())
System.out.println(m.group());

This could also do the job, if you are inside a servlet:
public static int getIntParam(HttpServletRequest request, String name, int defaultValue) {
String value = request.getParameter(name);
try {
if (value != null) {
return Integer.valueOf(value);
}
} catch (NumberFormatException e) {
}
return defaultValue;
}
Hope it helps!

If the query string part of the URL is always the same (so if it was always iVar) you could use urlAsString.indexOf("iVar=") to find iVar= and then knowing the number is after that, extract the number. That is admittedly not the least brittle approach.
But if you're looking for all the query strings then Bozho's answer is much better.
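The indexOf approach described above might look like the following sketch. The class and method names are made up, and the brittleness mentioned in the answer is real: a parameter such as "xiVar=" would also match.

```java
class IndexOfExtract {
    // Finds "key=" and parses the run of digits that follows it.
    static int extract(String url, String key, int defaultValue) {
        int start = url.indexOf(key + "=");
        if (start < 0) {
            return defaultValue; // key not present
        }
        start += key.length() + 1; // skip past "key="
        int end = start;
        while (end < url.length() && Character.isDigit(url.charAt(end))) {
            end++;
        }
        if (end == start) {
            return defaultValue; // no digits after "key="
        }
        try {
            return Integer.parseInt(url.substring(start, end));
        } catch (NumberFormatException e) {
            return defaultValue; // digits do not fit into an int
        }
    }
}
```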

Use JDT to get full method name

I am new to Eclipse plugin development and I am trying to convert an IMethod to a string representation of the full method name, i.e.
my.full.package.ClassName.methodName(int param, String string)
So far I have had to hand-roll my own solution. Is there a better way?
private static String getMethodFullName(IMethod iMethod)
{
String packageString = "[Default Package]";
try {
IPackageDeclaration[] declarations = iMethod.getCompilationUnit().getPackageDeclarations();
if(declarations.length > 0)
{
packageString = declarations[0].getElementName();
}
} catch (JavaModelException e) {
}
String classString = iMethod.getCompilationUnit().getElementName();
classString = classString.replace(".java", ""); // replace() is literal; in replaceAll() the "." would match any character
String methodString = iMethod.getElementName() + "(";
for (String type : iMethod.getParameterTypes()) {
methodString += type + ",";
}
methodString += ")";
return packageString + "." + classString + "." + methodString;
}

You can get the fully qualified name of the type using
method.getDeclaringType().getFullyQualifiedName();
This is probably easier than accessing the package from the compilation unit. The rest of your function looks correct.
One small point: you should use StringBuilder to build up the string instead of appending to a standard String. Strings are immutable, so each concatenation creates loads of unnecessary temporary objects.
private static String getMethodFullName(IMethod iMethod)
{
StringBuilder name = new StringBuilder();
name.append(iMethod.getDeclaringType().getFullyQualifiedName());
name.append(".");
name.append(iMethod.getElementName());
name.append("(");
String comma = "";
for (String type : iMethod.getParameterTypes()) {
name.append(comma);
comma = ", ";
name.append(type);
}
name.append(")");
return name.toString();
}

Thanks to iain and some more research I have come up with this solution. It seems like something like this should be built into the JDT....
import org.eclipse.jdt.core.Signature;
private static String getMethodFullName(IMethod iMethod)
{
StringBuilder name = new StringBuilder();
name.append(iMethod.getDeclaringType().getFullyQualifiedName());
name.append(".");
name.append(iMethod.getElementName());
name.append("(");
String comma = "";
String[] parameterTypes = iMethod.getParameterTypes();
try {
String[] parameterNames = iMethod.getParameterNames();
for (int i=0; i<iMethod.getParameterTypes().length; ++i) {
name.append(comma);
name.append(Signature.toString(parameterTypes[i]));
name.append(" ");
name.append(parameterNames[i]);
comma = ", ";
}
} catch (JavaModelException e) {
}
name.append(")");
return name.toString();
}

I am not sure it would take into account all cases (method within an internal class, an anonymous class, with generic parameters...)
When it comes to methods signatures, the classes to look into are:
org.eclipse.jdt.internal.corext.codemanipulation.AddUnimplementedMethodsOperation
org.eclipse.jdt.internal.corext.codemanipulation.StubUtility2
You need to get the jdt.core.dom.IMethodBinding, from which you can extract everything you need.
If you have a MethodInvocation, you can:
//MethodInvocation node
ITypeBinding type = node.getExpression().resolveTypeBinding();
IMethodBinding method=node.resolveMethodBinding();
