Regex - Find javascript methods and its variables in text


The best solution I have come up with so far is below. Given a block of text, it finds the methods that have parameters, but it also matches function definitions with a parameter, such as "get: function(key)".
using System.Collections.Generic;
using System.Dynamic;
using System.Text.RegularExpressions;

public class JavaScriptMethodFinder
{
    // Group 1 = one argument; "Begin" = the method name before "("; "IsEnd" = the closing ")".
    static readonly string pattern = @"(?<=\s(?<Begin>[a-zA-Z_][a-zA-Z0-9_]*?)\(|\G)\s*((['""]).+?(?<!\\)\2|\{[^}]+\}|[^,;'""(){}\)]+)\s*(?:,|(?<IsEnd>\)))";
    private static readonly Regex RegEx = new Regex(pattern, RegexOptions.Compiled);

    public IEnumerable<dynamic> Find(string text)
    {
        dynamic current = null;
        foreach (Match item in RegEx.Matches(text))
        {
            if (item.Groups["Begin"].Success)
            {
                // A method name starts a new result object.
                current = new ExpandoObject();
                current.MethodName = item.Groups["Begin"].Value;
                current.Parameters = new List<string>();
            }
            // Skip argument-like matches that have no preceding method name.
            if (current == null)
                continue;
            // Each match contributes exactly one argument.
            current.Parameters.Add(item.Groups[1].Value);
            if (item.Groups["IsEnd"].Success)
            {
                // ")" closes the argument list: emit the method and start over.
                yield return current;
                current = null;
            }
        }
    }
}
I want to find methods and their parameters. Here are two examples.
First Example
function loadMarkers(markers)
{
markers.push(
new Marker(
"Hdsf",
40.261330438503,
10.4877055287361,
"some text"
)
);
}
Second Example
var block = new AnotherMethod('literal', 'literal', {"key":0,"key":14962,"key":false,"key":2});
So far I have this, tested here: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
(?<=Marker\(|\G)\s*((?<name>['""]).+?(?<!\\)\2|\{[^}]+\}|[^,;'""(){}\)]+)\s*(?:,|\))
Found 5 matches:
"Hdsf", has 2 groups:
"Hdsf"
"
40.261330438503, has 2 groups:
40.261330438503
10.4877055287361, has 2 groups:
10.4877055287361
"some text" ) has 2 groups:
"some text"
"
) has 2 groups:
(?<=AnotherMethod\(|\G)\s*((?<name>['""]).+?(?<!\\)\2|\{[^}]+\}|[^,;'""(){}\)]+)\s*(?:,|\))
Found 3 matches:
'literal', has 2 groups:
'literal'
' (name)
'literal', has 2 groups:
'literal'
' (name)
{"key":0,"key":14962,"key":false,"key":2}) has 2 groups:
{"key":0,"key":14962,"key":false,"key":2}
(name)
I would like to combine these into one expression that produces:
Match<(methodname)>
Group : parameter
Group : parameter
Group : parameter
Match<(methodname)>
Group : parameter
Group : parameter
Group : parameter
so that when I scan a page containing both cases, I get two matches, each with the method name as the first capture, followed by the parameters.
I have been trying to modify what I already have, but the lookbehind part is too complex for me to understand.

Regexes are a very problematic approach for this type of project. Have you looked at using a genuine JavaScript parser/compiler such as Rhino? That will give you full awareness of JavaScript syntax "for free" and the ability to walk your source code meaningfully.
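Rhino is an external library, so here is something smaller that shows why a real parser helps: a hand-written scanner that tracks string literals and bracket depth instead of using lookbehinds. This is a sketch, not Rhino's API; CallScanner, scan, and the keyword skip list are hypothetical. It reports every identifier directly followed by "(", skipping words such as function and if, and splits each argument list on top-level commas only. It does not understand comments, regex literals, or template strings, and a declaration like function loadMarkers(markers) is reported like a call. Handling all of that is exactly the work a real parser does for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CallScanner {
    /** One call site: the identifier written before "(" and its top-level arguments. */
    public record Call(String name, List<String> args) {}

    // Assumed skip list: words that can be followed by "(" without being a call.
    private static final Set<String> KEYWORDS =
            Set.of("function", "if", "for", "while", "switch", "catch", "return");

    public static List<Call> scan(String src) {
        List<Call> calls = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                i = skipString(src, i);               // never look for calls inside string literals
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;                        // Java identifier rules approximate JavaScript's here
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) i++;
                String name = src.substring(start, i);
                int j = i;
                while (j < src.length() && Character.isWhitespace(src.charAt(j))) j++;
                if (j < src.length() && src.charAt(j) == '(' && !KEYWORDS.contains(name)) {
                    calls.add(new Call(name, readArgs(src, j)));
                    i = j + 1;                        // resume inside "(" so nested calls are found too
                }
            } else {
                i++;
            }
        }
        return calls;
    }

    // Splits the text between "(" at index open and its matching ")" on commas at nesting depth 0.
    private static List<String> readArgs(String src, int open) {
        List<String> args = new ArrayList<>();
        int depth = 0;
        int argStart = open + 1;
        int i = open + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                i = skipString(src, i);               // commas and brackets inside strings do not count
                continue;
            }
            if (c == '(' || c == '{' || c == '[') {
                depth++;
            } else if (c == ')' || c == '}' || c == ']') {
                if (depth == 0) {                     // the ")" that closes this call
                    String last = src.substring(argStart, i).trim();
                    if (!last.isEmpty() || !args.isEmpty()) args.add(last);
                    break;
                }
                depth--;
            } else if (c == ',' && depth == 0) {
                args.add(src.substring(argStart, i).trim());
                argStart = i + 1;
            }
            i++;
        }
        return args;
    }

    // Returns the index just past the closing quote, honouring backslash escapes.
    private static int skipString(String src, int open) {
        char quote = src.charAt(open);
        int i = open + 1;
        while (i < src.length() && src.charAt(i) != quote) {
            i += src.charAt(i) == '\\' ? 2 : 1;
        }
        return Math.min(i + 1, src.length());
    }

    public static void main(String[] args) {
        String js = "var block = new AnotherMethod('literal', 'literal', "
                + "{\"key\":0,\"key\":14962,\"key\":false,\"key\":2});";
        for (Call call : scan(js)) {
            System.out.println(call.name() + " " + call.args());
        }
    }
}
```

On the second example this reports AnotherMethod with three arguments: 'literal', 'literal', and the whole object literal. On the first example it reports loadMarkers, push, and Marker, with Marker's four arguments "Hdsf", 40.261330438503, 10.4877055287361, and "some text".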

Related

How to get certain characters through regex [duplicate]

I am trying to select just what comes after name= and before the & in:
"/pages/new?name=J&return_url=/page/new"
So far I have:
^name=(.*?).
In this case I am trying to return just the J, but the value is dynamic, so it could be several characters, letters, or numbers.
Ultimately I want to be able to run a replace on the dynamic value found by the regex.

/name=([^&]*)/
Remove the ^ anchor and stop the capture at the next &:
Example:
var str = "/pages/new?name=J&return_url=/page/new";
var matches = str.match(/name=([^&]*)/);
alert(matches[1]);
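The question's end goal was a replace, not just a lookup. In Java the same pattern can drive both steps; this is a minimal sketch, and NameParam, extract, and replaceValue are hypothetical names. Like /name=([^&]*)/, it would also match inside a longer key such as username=, so a [?&] prefix (as in later answers) is safer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameParam {
    // Same idea as /name=([^&]*)/: capture everything after "name=" up to the next "&".
    private static final Pattern NAME = Pattern.compile("name=([^&]*)");

    // Returns the captured value, or null when the string has no name= part.
    static String extract(String url) {
        Matcher m = NAME.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    // Swaps in a new value and keeps the "name=" prefix;
    // quoteReplacement stops "$" or "\" in newValue from being read as group references.
    static String replaceValue(String url, String newValue) {
        return NAME.matcher(url).replaceFirst("name=" + Matcher.quoteReplacement(newValue));
    }

    public static void main(String[] args) {
        String url = "/pages/new?name=J&return_url=/page/new";
        System.out.println(extract(url));              // J
        System.out.println(replaceValue(url, "Jane")); // /pages/new?name=Jane&return_url=/page/new
    }
}
```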
A better approach is to break down all the parameters (the function falls back to the current address when no string is passed):
function getParams (str) {
    var queryString = str || window.location.search || '';
    var keyValPairs = [];
    var params = {};
    // drop everything up to and including the first "?"
    queryString = queryString.replace(/.*?\?/, "");
    if (queryString.length)
    {
        keyValPairs = queryString.split('&');
        // index loop: for...in can also visit enumerable properties added to Array.prototype
        for (var pairNum = 0; pairNum < keyValPairs.length; pairNum++)
        {
            var key = keyValPairs[pairNum].split('=')[0];
            if (!key.length) continue;
            if (typeof params[key] === 'undefined')
                params[key] = [];
            // each key maps to an array because a key can appear more than once
            params[key].push(keyValPairs[pairNum].split('=')[1]);
        }
    }
    return params;
}
var url = "/pages/new?name=L&return_url=/page/new";
var params = getParams(url);
params['name']; // ["L"]
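For comparison, here is a Java sketch of the same break-everything-down approach (QueryParams and parse are hypothetical names). Unlike the JavaScript above, it splits each pair on the first = only, percent-decodes keys and values with URLDecoder (which also turns + into a space), ignores a trailing #fragment, and stores an empty string for a key that has no =.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryParams {
    // Like getParams: drop everything up to the first "?", split on "&",
    // and map each key to a list because a key may repeat.
    static Map<String, List<String>> parse(String url) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        int q = url.indexOf('?');
        String query = q >= 0 ? url.substring(q + 1) : url;
        int hash = query.indexOf('#');                 // ignore a trailing #fragment
        if (hash >= 0) query = query.substring(0, hash);
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');                // first "=" only, so values may contain "="
            String rawKey = eq >= 0 ? pair.substring(0, eq) : pair;
            String rawValue = eq >= 0 ? pair.substring(eq + 1) : "";
            String key = URLDecoder.decode(rawKey, StandardCharsets.UTF_8);
            if (key.isEmpty()) continue;
            String value = URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
            params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, List<String>> params = parse("/pages/new?name=L&return_url=/page/new");
        System.out.println(params.get("name")); // [L]
    }
}
```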
Update
Though still not supported in any version of IE, URLSearchParams provides a native way of retrieving values for other browsers.

The accepted answer includes the hash part if a hash comes right after the parameters. As @bishoy has in his function, the correct regex would be
/name=([^&#]*)/

Improving on previous answers:
/**
*
* @param {string} name
* @returns {string|null}
*/
function getQueryParam(name) {
var q = window.location.search.match(new RegExp('[?&]' + name + '=([^&#]*)'));
return q && q[1];
}
getQueryParam('a'); // returns '1' on page http://domain.com/page.html?a=1&b=2
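The same [?&] anchor and [^&#] stop set translate directly to java.util.regex. A minimal sketch, assuming the URL is passed in as a string (Java has no window.location); QueryParam and getQueryParam(url, name) are hypothetical. Pattern.quote keeps a parameter name such as a.b from being read as regex syntax, which the JavaScript version above does not guard against.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryParam {
    // The name must follow "?" or "&" (so "username=" is not taken for "name="),
    // and the value stops at "&" or "#".
    static String getQueryParam(String url, String name) {
        Matcher m = Pattern.compile("[?&]" + Pattern.quote(name) + "=([^&#]*)").matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://domain.com/page.html?a=1&b=2#top";
        System.out.println(getQueryParam(url, "a")); // 1
        System.out.println(getQueryParam(url, "b")); // 2
        System.out.println(getQueryParam(url, "c")); // null
    }
}
```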

Here is the full function (tested, and fixed to match the parameter name regardless of upper/lower case):
function getParameterByName (name)
{
    // escape [ and ] so they are matched literally inside the RegExp
    name = name.replace(/[\[]/, "\\[").replace(/[\]]/, "\\]");
    // the "i" flag matches the name in any case without lowercasing the returned value
    var regex = new RegExp("[\\?&]" + name + "=([^&#]*)", "i");
    var results = regex.exec(window.location.search);
    if (results == null)
        return "";
    else
        return decodeURIComponent(results[1].replace(/\+/g, " "));
}

The following should work:
\?name=(.*?)&

var myname = str.match(/\?name=([^&]+)&/)[1];
The [1] is because you apparently want the value of the group (the part of the regex in brackets).
var str = "/pages/new?name=reaojr&return_url=/page/new";
var matchobj = str.match(/\?name=([^&]+)&/)[1];
document.writeln(matchobj); // prints 'reaojr'

Here's a single line answer that prevents having to store a variable (if you can't use URLSearchParams because you still support IE)
(document.location.search.match(/[?&]name=([^&]+)/)||[null,null])[1]
By adding in the ||[null,null] and surrounding it in parentheses, you can safely index item 1 in the array without having to check if match came back with results. Of course, you can replace the [null,null] with whatever you'd like as a default.

You can get the same result with a simple .split() in JavaScript.
let value = url.split("name=")[1].split("&")[0];

This might work:
\??(.*=.+)*(&.*=.+)?

Issue with Java string search pattern (contains / matches)

I have a string that contains a couple of attribute values. When I verify whether the string contains specific attribute values using a simple regex, matches() always returns false.
I need the following behavior:
If the string contains \"import\", isExportSet should be set to true.
If the string contains \"path\" : true, isPathSet should be set to true.
I tried the code below, but it did not work:
public class DriverClass {
public static void main(String[] args) {
String str = "\"import\" : \"static\",\"path\" : true";
boolean isExportSet = str.matches("\\*+export\\*+");
boolean isPathSet = str.matches("\\*+multipath\\s+:\\s+true");
System.out.println("Export " + isExportSet);
System.out.println("Path " + isPathSet);
}
}

Please let me know whether the following code fulfills the problem definition.
import java.util.HashMap;
import java.util.Map;

public class AttributeMap {
    static String str = "\"import\" : \"static\",\"path\" : true";

    public static void main(String[] args) {
        test(str);
    }

    static void test(String str) {
        Map<String, String> map = new HashMap<String, String>();
        // split into alternating key / value parts on ":" and ","
        String[] parts = str.split("(:|,)");
        for (int i = 0; i < parts.length - 1; i += 2) {
            map.put(getUnquotedStr(parts[i]), getUnquotedStr(parts[i + 1]));
        }
        System.out.println(map.size() + " entries: " + map); // 2 entries: {path=true, import=static}
        boolean isExportSet = map.containsKey("import");
        boolean isPathSet = "true".equals(map.get("path"));
        System.out.println(isExportSet + " - " + isPathSet);
    }

    private static String getUnquotedStr(String str) {
        return str.replaceAll("\"", "").trim();
    }
}
This prints the following on the console:
2 entries: {path=true, import=static}
true - true

You can simply use str.contains("valueToSearch").

You can use
\"import\"
\"(path|multipath)\"
And please never chain a * with another quantifier; that leads to errors.
And since you want to check for the " characters literally, you have to include them in your expression.

Testing the string for containing \"import\" just means checking whether it contains "import", quotes included. The \ is only an escape character that lets Java put double quotes inside a string literal without ending it, so it is not part of the text you are searching for. You do, however, need to escape those quotes in the regex string literal as well.
Because matches() must cover the whole string, the check for "import" becomes str.matches(".*\"import\".*"). The same applies to the "path" string.
I find Free Formatter a handy tool for checking regexes.
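Putting the answers together: the original patterns fail because String.matches() only succeeds when the regex covers the entire string, and \\*+ means one or more literal * characters, which the input does not contain. A minimal sketch (AttributeChecks, isExportSet, and isPathSet are hypothetical names) uses contains() for fixed text and Matcher.find() where a regex is actually needed; str.matches(".*\"path\"\\s*:\\s*true.*") would be an equivalent whole-string form.

```java
import java.util.regex.Pattern;

public class AttributeChecks {
    // Plain substring test: the quotes are part of the text being searched for.
    static boolean isExportSet(String str) {
        return str.contains("\"import\"");
    }

    // find() searches anywhere in the string, unlike matches(), which must cover all of it.
    private static final Pattern PATH_TRUE = Pattern.compile("\"path\"\\s*:\\s*true");

    static boolean isPathSet(String str) {
        return PATH_TRUE.matcher(str).find();
    }

    public static void main(String[] args) {
        String str = "\"import\" : \"static\",\"path\" : true";
        System.out.println("Export " + isExportSet(str)); // Export true
        System.out.println("Path " + isPathSet(str));     // Path true
    }
}
```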

Java trim character and whitespaces

I am reading an annotation on top of a Java TestNG test. The annotation looks like this:
@TestInfo(id={ " C26603", " C10047" })
where TestInfo is an annotation interface whose id() element is a String array:
public String[] id() default {};
and C26603 and C10047 are just test ids that I assign.
Here is what a test looks like, for example:
CASE 1:
@TestInfo(id={ " C26603", " C10047" })
public void testDoSomething() {
Assert.assertTrue(false);
}
A cleaner case would be:
CASE 2:
@TestInfo(id={ "C26603", "C10047" })
Case 2 is cleaner than case 1 because its test ids contain no whitespace.
How do I fetch these ids and make sure they don't start with the C character, so that each one is a pure number? For example, I want just 26603 for the first id and 10047 for the second. Some ids have spaces inside the quotes, and I want to trim all of that and keep only the id. I currently use a for loop to process each id, and once I get the pure number I make a third-party API call (the API expects a pure number as input, so removing the leading C and any whitespace is important).
Here is what I have tried:
TestInfo annotation = method.getAnnotation(TestInfo.class);
if(annotation!=null) {
for(String test_id: annotation.id()) {
//check if id is null or empty
if (test_id !=null && !test_id.isEmpty()) {
//remove white spaces and check if id = "C1234" or id = "1234"
if(Character.isLetter(test_id.trim().charAt(0))) {
test_id = test_id.substring(1);
}
System.out.println(test_id);
System.out.println(test_id.trim());
}
}
}
The code above gives me C26603 instead of 26603 for case 1. It works for case 2.
CASE 3:
@TestInfo(id={ " 26603", " 10047" })
For this case there is no leading C, so the code should be smart enough to just trim the whitespace and go ahead.

The simplest approach would be to just remove everything that is not a digit, using the regular expression non-digit character class (\D):
test_id = test_id.replaceAll("\\D", "");
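Applied to all three cases from the question, a minimal sketch could look like this (TestIds and digitsOnly are hypothetical names). Note that \\D strips every non-digit anywhere in the string, so an id like C12-34 becomes 1234, and an id with no digits becomes empty, which the loop checks before parsing.

```java
public class TestIds {
    // Keeps only the digits: " C26603" -> "26603", "C10047" -> "10047", " 26603" -> "26603".
    static String digitsOnly(String rawId) {
        return rawId.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        String[] ids = {" C26603", "C10047", " 26603"};
        for (String raw : ids) {
            String id = digitsOnly(raw);
            if (id.isEmpty()) continue;           // nothing numeric left: skip the API call
            int numericId = Integer.parseInt(id); // the API in the question expects a pure number
            System.out.println(numericId);
        }
    }
}
```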

I highly encourage you to debug your method. You will learn a lot.
If you take a look at your if statement here:
if(Character.isLetter(test_id.trim().charAt(0))) {
test_id = test_id.substring(1);
}
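When test_id is " C1234", your condition is true because you trim before calling charAt(0). However, substring(1) then runs on the untrimmed string, so it removes the leading space instead of the C.
Answer: trim it first!
test_id = test_id.trim().substring(1);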
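Here is the trim-first fix inside the question's annotation loop. It is a sketch under assumptions: TestInfo is redeclared here with RUNTIME retention (Method.getAnnotation only sees annotations retained at run time), and TestIdReader and cleanIds are hypothetical names.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class TestIdReader {
    // Assumed shape of the question's annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @interface TestInfo {
        String[] id() default {};
    }

    @TestInfo(id = {" C26603", " C10047"})
    public void testDoSomething() {}

    // Trim first, then drop one leading letter if present; handles all three cases.
    static List<String> cleanIds(Method method) {
        List<String> ids = new ArrayList<>();
        TestInfo annotation = method.getAnnotation(TestInfo.class);
        if (annotation == null) return ids;
        for (String raw : annotation.id()) {
            String id = raw.trim();
            if (id.isEmpty()) continue;
            if (Character.isLetter(id.charAt(0))) {
                id = id.substring(1).trim();
            }
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method m = TestIdReader.class.getMethod("testDoSomething");
        System.out.println(cleanIds(m)); // [26603, 10047]
    }
}
```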

Regex: Find first occurrence and map to canonical value

I have some input data like this:
1996 caterpiller d6 dozer for sale (john deere and komatsu too!)
I want to match the first brand name found and map it to its canonical value.
Here's the map:
canonical regex
KOMATSU \bkomatsu\b
CAT \bcat(erpill[ae]r)?\b
DEERE \b(john )?deere?\b
I can easily test that a brand is in the string:
/\b(cat(erpill[ae]r)?|(john )?deere?|komatsu)\b/i.exec(...) != null
or what the first match was:
/\b(cat(erpill[ae]r)?|(john )?deere?|komatsu)\b/i.exec(...)[0]; //caterpiller
But is there a fast or convenient way to map the first match to the real value that I want?
caterpiller => CAT
Do I need to find the first match, then test against all patterns in the map?
I need to do 10,000+ inputs against 10,000+ brands :D
I could loop over the map, testing each pattern against the input, but that would find the first brand in the map's order, not the first one that appears in the input.

One idea is to associate the number of each capture group with an index in the canonical-name array, so each brand must have its own group:
var can = ['', 'KOMATSU', 'CAT', 'DEERE'];
// can[1], can[2], can[3] line up with capture groups 1, 2, 3 in the regex below
var re = /\b(?:(komatsu)|(cat(?:erpill[ae]r)?)|((?:john )?deere))\b/ig;
var text = '1996 caterpiller d6 dozer for sale (john deere and komatsu too!)';
var res;
while ((res = re.exec(text)) !== null) {
    for (var i = 1; i < 4; i++) { // test each group until one is defined
        if (res[i] !== undefined) {
            console.log(can[i] + "\t" + res[0]);
            break;
        }
    }
}
// result:
// CAT caterpiller
// DEERE john deere
// KOMATSU komatsu
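The same group-number-to-index idea carries over to java.util.regex, where Matcher.group(g) returns null for a group that did not take part in the match. A minimal sketch (BrandMatcher and firstBrand are hypothetical names, and the patterns are copied from the question's map); it stops after the first find(), which is the first brand in the input, as the question asks. With thousands of brands, giving each canonical name a single capturing group (its spellings inside it, non-capturing) keeps the mapping simple, but a very large alternation may be slow, so it is worth benchmarking on real data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BrandMatcher {
    // CANONICAL[g - 1] belongs to capture group g; the two must be kept in the same order.
    private static final String[] CANONICAL = {"KOMATSU", "CAT", "DEERE"};
    private static final Pattern BRANDS = Pattern.compile(
            "\\b(?:(komatsu)|(cat(?:erpill[ae]r)?)|((?:john )?deere?))\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns the canonical name of the first brand that appears in the text, or null.
    static String firstBrand(String text) {
        Matcher m = BRANDS.matcher(text);
        if (!m.find()) return null;
        for (int g = 1; g <= m.groupCount(); g++) {
            if (m.group(g) != null) return CANONICAL[g - 1];
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstBrand("1996 caterpiller d6 dozer for sale (john deere and komatsu too!)")); // CAT
    }
}
```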

How can I change my regex to correctly match float literals?

I am attempting to create a parser for Java expressions, but for some reason I am unable to match floating-point values. I am using a java.util.regex.Matcher obtained from
Matcher token = Pattern.compile(
"(\\w[\\w\\d]*+)|" + //identifiers as group 1
"((?:(?>[1-9][0-9]*+\\.?[0-9]*+)|(?>\\.[0-9]++))(?:[Ee][+-]?[0-9]++)?)|" + //literal numbers
"([^\\w\\d\\s]*+)" //operators as group 3
).matcher(expression);
This is intended to match an identifier, a floating-point value, or an operator (I will refine the operator part of the match later). However, it does not split numbers correctly.
Below is the code that uses that expression. It is intended to take all the identifiers, numbers, and operators, register every number in vars, and put the identifiers, each number's generated name, and the operators into tokens, in the same order as in the original string.
It does not succeed, however: for an input string like foo 34.78e5 bar -2.7 the resulting list is '[34, A, , bar, , -, 2, B, ]' with A=-78000.0 and B=-0.7. It is supposed to return '[foo, A, bar, B]' with A=3478000 and B=-2.7. I believe the regex may be failing to include both parts of the number in one match, but that may not be the case.
I have tried removing the atomic grouping and possessive quantifiers from the regex, but that did not change anything.
LinkedList<String> tokens = new LinkedList<String>();
HashMap<String, Double> vars = new HashMap<String, Double>();
VariableNamer varNamer = new VariableNamer();
for(Matcher token = Pattern.compile(
"(\\w[\\w\\d]*+)|" + //variable names as group 1
"((?:(?:[1-9][0-9]*+\\.?[0-9]*+)|(?:\\.[0-9]++))(?:[Ee][+-]?[0-9]++)?)|" +
//literal numbers as group 2
"([^\\w\\d\\s]*+)" //operators as group 3
).matcher(expression); token.find();){
if(token.group(2) != null) { //if its a literal number, register it in vars and substitute a string for it
String name = varNamer.next();
if (
tokens.size()>0 &&
tokens.get(tokens.size()-1).matches("[+-]") &&
tokens.size()>1?tokens.get(tokens.size()-2).matches("[^\\w\\d\\s]"):true
)
vars.put(name, tokens.pop().equals("+")?Double.parseDouble(token.group()):-Double.parseDouble(token.group()));
else
vars.put(name, Double.parseDouble((token.group())));
tokens.addLast(name);
} else {
tokens.addLast(token.group());
}
}
and here is VariableNamer:
import java.util.Iterator;
public class VariableNamer implements Iterator<String>{
StringBuffer next = new StringBuffer("A");
@Override
public boolean hasNext() {
return true;
}
@Override
public String next() {
try{
return next.toString();
}finally{
next.setCharAt(next.length()-1, (char) (next.charAt(next.length()-1) + 1));
for(int idx = next.length()-1; next.charAt(idx) + 1 > 'Z' && idx > 0; idx--){
next.setCharAt(idx, 'A');
next.setCharAt(idx - 1, (char) (next.charAt(idx - 1) + 1));
}
if (next.charAt(0) > 'Z'){
next.setCharAt(0, 'A');
next.insert(0, 'A');
}
}
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
}

Depending on details of your expression mini-language, it is either close to the limit on what is possible using regexes ... or beyond it. And even if you do succeed in "parsing", you will be left with the problem of mapping the "group" substrings into a meaningful expression.
My advice would be to take an entirely different approach. Either find and use an existing expression library, or implement expression parsing using a parser generator such as ANTLR or JavaCC.
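If you do stay with a regex tokenizer, several separate problems seem to combine into the output in the question: \w already includes digits, so group 1 grabs 34 as an identifier before the number alternative is tried; [1-9] rejects numbers with a leading 0 such as 0.5; the operator group [^\\w\\d\\s]*+ can match an empty string, so find() returns empty tokens; in the sign check, ?: binds more loosely than &&, so the whole && chain becomes the ternary's test and the condition is true whenever that chain is false; and LinkedList.pop() removes the first element, not the last. Below is a minimal rework under those assumptions, not a full expression parser. ExprTokenizer and tokenize are hypothetical names, and a plain A..Z counter stands in for VariableNamer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExprTokenizer {
    // Numbers first, then identifiers, then single operator characters; no alternative matches "".
    private static final Pattern TOKEN = Pattern.compile(
            "(?<num>(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][+-]?\\d+)?)"  // 34.78e5, 2.7, 0.5, .5
            + "|(?<id>[A-Za-z_]\\w*)"                            // identifiers cannot start with a digit
            + "|(?<op>[^\\w\\s])");                              // one operator character

    /** Replaces each number with a generated name stored in vars, folding a unary +/- into it. */
    public static List<String> tokenize(String expression, Map<String, Double> vars) {
        List<String> tokens = new ArrayList<>();
        char nextName = 'A';                      // simplified stand-in for VariableNamer: A..Z only
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            if (m.group("num") == null) {
                tokens.add(m.group());
                continue;
            }
            double value = Double.parseDouble(m.group("num"));
            int n = tokens.size();
            String last = n > 0 ? tokens.get(n - 1) : "";
            // The question's rule with explicit grouping: a sign is unary when it is
            // the first token or directly follows another operator.
            boolean unary = last.matches("[+-]") && (n == 1 || tokens.get(n - 2).matches("[^\\w\\s]"));
            if (unary) {
                tokens.remove(n - 1);             // remove the sign, i.e. the LAST token
                if (last.equals("-")) value = -value;
            }
            String name = String.valueOf(nextName++);
            vars.put(name, value);
            tokens.add(name);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, Double> vars = new LinkedHashMap<>();
        System.out.println(tokenize("foo * -2.7 + 34.78e5", vars) + " " + vars);
        // [foo, *, A, +, B] {A=-2.7, B=3478000.0}
    }
}
```

With this rule, foo 34.78e5 bar -2.7 gives [foo, A, bar, -, B] with B=2.7: the minus follows the identifier bar, so the question's own condition treats it as binary. Getting '[foo, A, bar, B]' would need a different rule for when a sign is unary, which is one more reason a real grammar is the better tool.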
