My String is like this.
{\\\"692950841314120\\\":[{\\\"type\\\":\\\"ads_management\\\",\\\"call_count\\\":3,\\\"total_cputime\\\":1,\\\"total_time\\\":5,\\\"estimated_time_to_regain_access\\\":0}]}
Since the key here is a variable value I am trying to replace this 692950841314120(or the values which I get from sever) with a constant like ID. My main goal is to parse this as POJO. I have tried using..
string.replaceAll("^[0-9]{15}$","ID")
but due to Slashes I think i am not able to get the desired value. Is there any better way to do this. I know I can do below Code but I don't want any ID123 if I added extra value and distort any other info in JSON.
string.replaceAll("[0-9]{15}","ID")
Strictly speaking, if you have a valid JSON string, you should parse it using something like GSON, rather than using regex. That being said, if you must use regex, you could try removing the starting and ending anchors:
string.replaceAll("[0-9]{15}", "ID")
Or maybe use double quotes instead:
string.replaceAll("\"[0-9]{15}\"", "ID")
It is safer to assume the value is inisde \" and \":.
You can then use
.replaceAll("(\\\\\")[0-9]{15}(\\\\\":)", "$1ID$2")
The regex is (\\")[0-9]{15}(\\":) and it means:
(\\") - match and capture \" substring into Group 1
[0-9]{15} - fifteen digits
(\\":) - Group 2: a \": substring.
The $1 and $2 are placeholders holding the Group 1 and 2 values.
You should use "A word boundary" \b.
Try this.
public static void main(String[] args) {
String input = "{\\\"692950841314120\\\":"
+ "[{\\\"type\\\":\\\"12345678901234567890\\\","
+ "\\\"call_count\\\":3,"
+ "\\\"total_cputime\\\":1,"
+ "\\\"total_time\\\":5,"
+ "\\\"estimated_time_to_regain_access\\\":0}]}";
System.out.println(input.replaceAll("\\b[0-9]{15}\\b", "ID"));
}
output:
{\"ID\":[{\"type\":\"12345678901234567890\",\"call_count\":3,\"total_cputime\":1,\"total_time\":5,\"estimated_time_to_regain_access\":0}]}
How can we create a Rest API (Spring controller) which allows multiple path variables to have query parameters?
Where
1) function is a path variable and id=functionname is query parameter
2) subfunction is a path variable and id=subfuntionname is query parameter
Request URL : /content/v1/clients/clientname/function?id=functionname&subfunction?id=subfunctionname
Update I am using matrix variations suggested by
/content/v1/clients/clientname/function;id=functionname/subfunction;id=subfunctionname
The method shown below is not working as expected.
What should the method definition look like?
public HashMap<String, List<Model>> getContent(
#PathVariable String clientname,
#MatrixVariable(name="id", pathVar="function") List<String> capabilitiesId,
#MatrixVariable(name="id", pathVar="subfunction") List<String> subcapabilitiesId) {
}
Error : Missing matrix variable 'id' for method parameter of type List
It's not possible.
In REST controller you have two type of parameters:
Path parameter: parameter usefull to select a resource. (a you class's method)
Query parameter: parameter useful to send other information.
In your case I think that is a good idea send all this informations inside payload, using POST or PUT http method.
If you can't use payload you can obtain the following solution:
Request URL : /content/v1/clients/clientname/function1/function2?id1=functionnamec&id2=subfunctionaname
In this way you can create your controller with 2 path parameters and 2 query parameters:
#GET
#Path("/basePath/{funct1}/{funct2}")
public Response <methodName>(#PathParam("funct1") String funct1, #PathParam("funct2") String funct2, #QueryParam("id1") String id1, #QueryParam("id2") String id2)
/content/v1/clients/clientname/function?id=functionnamec&subfunction?id=subfunctionaname
The parsing of URI is defined by RFC 3986. In particular, U+003F QUESTION MARK is a reserved character, the first instance of which serves a the delimiter between the relative-part and the query.
So your example breaks would parse as
path: /content/v1/clients/clientname/function
query: id=functionnamec&subfunction?id=subfunctionaname
And if we were to parse the query, as though it were an application/x-www-form-urlencoded value....
>>> import urllib.parse
>>> urllib.parse.parse_qs("id=functionnamec&subfunction?id=subfunctionaname")
{'id': ['functionnamec'], 'subfunction?id': ['subfunctionaname']}
We see that the second question mark becomes part of the parameter name.
In short, it's a perfectly valid URI, but it isn't likely to produce the results that you are hoping for.
/content/v1/clients/clientname/function/subfunction?id=functionnamec&id=subfunctionaname
This might be usable, but there's likely to be some confusion about the duplicate id query parameters
>>> urllib.parse.parse_qs("id=functionnamec&id=subfunctionaname")
{'id': ['functionnamec', 'subfunctionaname']}
/content/v1/clients/clientname/function/subfunction?function.id=functionnamec&subfunction.id=subfunctionaname
>>> urllib.parse.parse_qs("function.id=functionnamec&subfunction.id=subfunctionaname")
{'function.id': ['functionnamec'], 'subfunction.id': ['subfunctionaname']}
That might be easier.
I think it would be common to take the data out of the query and put it on the path instead
/content/v1/clients/clientname/function/functionname/subfunction/subfunctionaname
And then extract the path parameters you need.
I'm using jsoup to get all text from websites.
Document doc = Jsoup.connect("URL").get();
String allText doc.text().toLowerCase();
Then I'm using Hibernate to persist the object that holds all text to a MySQL DB:
...
#Column(name="all_text")
#Lob
private String allText = null;
...
Everything is good so far. Only that sometimes I get a MySQL error when I try to save the object with allText:
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\x8A s...' for column 'all_text' at row 1
Already looked this up and it's an encoding error. Probably have some special characters on their websites. I found a way to fix this by changing the encoding in the DB.
But my actual question is: what's the best way to filter and remove the special characters from the allText string and not persist them at all?
EDIT: To clarify, by special characters I mean Emoticons and all that stuff. Definitely anything that doesn't fit into UTF-8 encoding. I'm not concerned about ~ ^ etc...
Thanks in advance!
Just use regex:
allText.replaceAll("\\p{C}", "");
Don't forget to import java.util.regexPattern
This question already has answers here:
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 5 years ago.
im trying to build a regex where i try to filter urls and only those who are not given in my regex will result in a match.
If there is no test1.com or test2.com in the url it should result in a match.
The top level domains i want not to result in a match (test1.com and test2.com) are using always the https protocol, can contain subdomains and having paths after the top level domain ".com".
Last try was the following but still doesnt work...
https?://([a-z0-9]+[.])((test1)|(test2))[.](com)/.*
Result on regexplanet:
https://abc.test1.com/test.htm
==> MATCH
www.google.com
==> NO MATCH
https://123.test2.com/test.html
==> MATCH
https://test2.com/test.html
==> NO MATCH
Ho do i need to write the regex that everything which has not the test1.com and test2.com domain in its string will give a match?
This pattern should work:
^((?!test1\\.com|test2\\.com).)*$
Try out:
System.out.println(Pattern.matches("^((?!test1\\.com|test2\\.com).)*$", "https://abc.test1.com/test.htm"));
System.out.println(Pattern.matches("^((?!test1\\.com|test2\\.com).)*$", "www.google.com"));
System.out.println(Pattern.matches("^((?!test1\\.com|test2\\.com).)*$", "https://123.test2.com/test.html"));
System.out.println(Pattern.matches("^((?!test1\\.com|test2\\.com).)*$", "https://test2.com/test.html"));
Results:
false
true
false
false
I got to deal here with a problem, caused by a dirty design. I get a list of string and want to parse attributes out of it. Unfortunately, I can't change the source, where these String were created.
Example:
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false"
Now I want to extract the attributes type, languageCode, url, ref, info and deactivated.
The problem here is the field info, whose text is not limited by quote mark. Also commas may occur in this field, so I can't use the comma at the end of the string, to find out where is ends.
Additional, those strings not always contain all attributes. type, info and deactivated are always present, the rest is optional.
Any suggestions how I can solve this problem?
One possible solution is to search for = characters in the input and then take the single word immediately before it as the field name - it seems that all your field names are single words (no whitespace). If that's the case, you can then take everything after the = until the next field name (accounting for separating ,) as the value.
This assumes that the value cannot contain =.
Edit:
As a possible way to handle embedded =, you can see if the word in front of it is one your known field names - if not, you can possibly treat the = as an embedded character rather than an operator. This, however, assumes that you have a fixed set of known fields (some of which may not always appear). This assumption may be eased if you know that the field names are case-sensitive.
Assuming that order of elements is fixed you could write solution using regex like this one
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
String regex = //type, info and deactivated are always present
"type=(?<type>.*?)"
+ "(?:, languageCode=(?<languageCode>.*?))?"//optional group
+ "(?:, url=(?<url>.*?))?"//optional group
+ "(?:, ref=(?<rel>.*?))?"//optional group
+ ", info=(?<info>.*?)"
+ ", deactivated=(?<deactivated>.*?)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
if(m.matches()){
System.out.println("type -> "+m.group("type"));
System.out.println("languageCode -> "+m.group("languageCode"));
System.out.println("url -> "+m.group("url"));
System.out.println("rel -> "+m.group("rel"));
System.out.println("info -> "+m.group("info"));
System.out.println("deactivated -> "+m.group("deactivated"));
}
Output:
type -> INFO
languageCode -> EN-GB
url -> http://www.stackoverflow.com
rel -> 1
info -> Text, that may contain all kind of chars.
deactivated -> false
EDIT: Version2 regex searching for oneOfPossibleKeys=value where value ends with:
, oneOfPossibleKeys=
or has end of string after it (represented by $).
Code:
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
String[] possibleKeys = {"type","languageCode","url","ref","info","deactivated"};
String keysStrRegex = String.join("|", possibleKeys);
//above will contain type|languageCode|url|ref|info|deactivated
String regex = "(?<key>\\b(?:"+keysStrRegex+")\\b)=(?<value>.*?(?=, (?:"+keysStrRegex+")=|$))";
// (?<key>\b(?:type|languageCode|url|ref|info|deactivated)\b)
// =
// (?<value>.*?(?=, (?:type|languageCode|url|ref|info|deactivated)=|$))System.out.println(regex);
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while(m.find()){
System.out.println(m.group("key")+" -> "+m.group("value"));
}
Output:
type -> INFO
languageCode -> EN-GB
url -> http://www.stackoverflow.com
ref -> 1
info -> Text, that may contain all kind of chars.
deactivated -> false
You could use a regular expression, capturing all the "fixed" groups and using whatever remains for info. This should even work if the info part contains , or = characters. Here's some quick example (using Python, but that should not be a problem...).
>>> p = r"(type=[A-Z]+), (languageCode=[-A-Z]+), (url=[^,]+), (ref=\d), (info=.+?), (deactivated=(?:true|false))"
>>> s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars, even deactivated=true., deactivated=false"
>>> re.search(p, s).groups()
('type=INFO',
'languageCode=EN-GB',
'url=http://www.stackoverflow.com',
'ref=1',
'info=Text, that may contain all kind of chars, even deactivated=true.',
'deactivated=false')
If any of those elements are optional, you can put a ? after those groups, and make the comma optional. If the order can be different, then it's more complicated. In this case, instead of using one RegEx to capture everything at once, use several RegExes to capture the individual attributes and then remove (replace with '') those in the string before matching the next attribute. Finally, match info.
On further consideration, given that those attributes could have any order, it may be more promising to capture just everything spanning from one keyword to the next, regardless of its actual content, very similar to Pshemo's solution:
keys = "type|languageCode|url|ref|info|deactivated"
p = r"({0})=(.+?)(?=\, (?:{0})=|$)".format(keys)
matches = re.findall(p, s)
But this, too, might fail in some very obscure cases, e.g. if the info attribute contains something like ', ref=foo', including the comma. However, there seems to be no way around those ambiguities. If you had a string like info=in this string, ref=1, and in another, ref=2, ref=1, does it contain one ref attribute, or three, or none at all?