Why does @PathVariable trim data containing #? [duplicate]


This question already has an answer here:
How to escape hash character in URL
(1 answer)
Closed 2 years ago.
I have the following URI mapping in my custom controller:
http://localhost:8080/abc/{id}
For normal id values this works fine.
When id contains a #, however, the value gets truncated.
For example, for id = 123#qqq, @PathVariable gives me only 123.
How can I resolve this?

You basically have 2 options:
1) Keep id as a path variable and escape the #
As already mentioned in the comments, # is a special character in URIs. It marks the start of the fragment, which normally refers to an anchor position on the web page, and clients do not send the fragment to the server at all. If you want to use # inside a path variable, you have to percent-encode it as %23:
http://localhost:8080/abc/123%23qqq
This yields the desired id = 123#qqq.
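To see why the server never receives qqq, here is a small sketch that uses only java.net.URI, with no Spring involved. The class and method names are made up for illustration. Parsing the unescaped URL puts everything after # into the fragment. The five-argument URI constructor quotes characters that are illegal in a path, so it encodes # as %23. It does not escape /, so an id containing a slash would still need separate handling.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HashInPathDemo {

    // Path as parsed when '#' is left unescaped:
    // everything after the first '#' becomes the fragment.
    static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    static String fragmentOf(String url) {
        return URI.create(url).getFragment();
    }

    // Builds the request URL for an id; the multi-argument URI constructor
    // quotes characters that are not legal in a path, so '#' becomes %23.
    static String urlForId(String id) {
        try {
            return new URI("http", "localhost:8080", "/abc/" + id, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String unescaped = "http://localhost:8080/abc/123#qqq";
        System.out.println(pathOf(unescaped));     // /abc/123
        System.out.println(fragmentOf(unescaped)); // qqq
        System.out.println(urlForId("123#qqq"));   // http://localhost:8080/abc/123%23qqq
    }
}
```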
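2) Use a request parameter instead
This seems to me the cleaner solution. If the # has to be part of your id, you should probably pass it as a String in a request parameter:
public void fooBar(@RequestParam String id) {
// do stuff
}
The # still has to be percent-encoded in the query string (?id=123%23qqq), because an unescaped # starts the fragment there as well. However, request parameters are usually built by an HTML form or an HTTP client library that does this encoding for you, and the value is decoded again before Spring binds it to id.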
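Here is a sketch of the client side for the request-parameter variant. The endpoint http://localhost:8080/abc is a hypothetical URL. URLEncoder.encode turns # into %23, and java.net.URI shows that an unencoded # in the query string would still be cut off as a fragment. URLEncoder produces form encoding (for example, a space becomes +). That is appropriate for query parameters but not for path segments.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class HashInQueryDemo {

    // Form-encodes the value so that '#' is sent as %23.
    // URLEncoder.encode(String, Charset) requires Java 10 or later.
    static String queryFor(String id) {
        return "id=" + URLEncoder.encode(id, StandardCharsets.UTF_8);
    }

    // Decoded query component as parsed by java.net.URI.
    static String decodedQuery(String url) {
        return URI.create(url).getQuery();
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/abc?";
        System.out.println(decodedQuery(base + "id=123#qqq"));        // id=123 (qqq is the fragment)
        System.out.println(decodedQuery(base + queryFor("123#qqq"))); // id=123#qqq
    }
}
```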

Related

Any suggestions on how to create a regex for this in Java for String.replaceAll()?

My string looks like this:
{\\\"692950841314120\\\":[{\\\"type\\\":\\\"ads_management\\\",\\\"call_count\\\":3,\\\"total_cputime\\\":1,\\\"total_time\\\":5,\\\"estimated_time_to_regain_access\\\":0}]}
Since the key here is a variable value, I am trying to replace 692950841314120 (or whatever value I get from the server) with a constant like ID. My main goal is to parse this into a POJO. I have tried using
string.replaceAll("^[0-9]{15}$","ID")
but I think the backslashes prevent me from getting the desired result. Is there a better way to do this? I know I could use the code below, but I don't want to end up with something like ID123 if a longer number is present, or distort any other information in the JSON.
string.replaceAll("[0-9]{15}","ID")

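Strictly speaking, if you have a valid JSON string, you should parse it with a JSON library such as Gson rather than with a regex. That said, if you must use a regex, try removing the start and end anchors. ^ and $ make the pattern match only when the whole string consists of exactly 15 digits:
string.replaceAll("[0-9]{15}", "ID")
Or include the surrounding double quotes in the pattern (note that the quotes are then replaced along with the digits):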
string.replaceAll("\"[0-9]{15}\"", "ID")
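Here is a minimal sketch of the difference between the anchored and unanchored patterns. It runs on a shortened version of the question's string. The 20-digit value is invented to show the ID123-style problem the question mentions, and the class and method names are only for illustration.

```java
public class AnchorDemo {

    // Shortened JSON from the question; the content contains \" around keys and values.
    static final String INPUT =
            "{\\\"692950841314120\\\":[{\\\"type\\\":\\\"12345678901234567890\\\"}]}";

    // ^ and $ require the whole input to be 15 digits, so nothing is replaced.
    static String anchored(String s) {
        return s.replaceAll("^[0-9]{15}$", "ID");
    }

    // Without anchors, any run of 15 digits is replaced,
    // including the first 15 digits of a longer number.
    static String unanchored(String s) {
        return s.replaceAll("[0-9]{15}", "ID");
    }

    public static void main(String[] args) {
        System.out.println(anchored(INPUT));   // unchanged input
        System.out.println(unanchored(INPUT)); // {\"ID\":[{\"type\":\"ID67890\"}]}
    }
}
```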

It is safer to assume the value is inside \" and \":.
You can then use
.replaceAll("(\\\\\")[0-9]{15}(\\\\\":)", "$1ID$2")
The regex is (\\")[0-9]{15}(\\":) and it means:
(\\") - match and capture \" substring into Group 1
[0-9]{15} - fifteen digits
(\\":) - Group 2: a \": substring.
The $1 and $2 are placeholders holding the Group 1 and 2 values.
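Here is a quick way to check the capture-group version. The input is a shortened form of the question's string plus an invented 20-digit value, and the class name is hypothetical. Only the 15-digit key between \" and \": is replaced. The longer number is left alone because it is not followed by \":.

```java
public class KeyReplaceDemo {

    // Capture \" before and \": after the 15 digits,
    // then put them back around the constant via $1 and $2.
    static String replaceKey(String s) {
        return s.replaceAll("(\\\\\")[0-9]{15}(\\\\\":)", "$1ID$2");
    }

    public static void main(String[] args) {
        String input = "{\\\"692950841314120\\\":[{\\\"type\\\":\\\"12345678901234567890\\\"}]}";
        System.out.println(replaceKey(input));
        // {\"ID\":[{\"type\":\"12345678901234567890\"}]}
    }
}
```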

You should use a word boundary, \b.
Try this:
public static void main(String[] args) {
String input = "{\\\"692950841314120\\\":"
+ "[{\\\"type\\\":\\\"12345678901234567890\\\","
+ "\\\"call_count\\\":3,"
+ "\\\"total_cputime\\\":1,"
+ "\\\"total_time\\\":5,"
+ "\\\"estimated_time_to_regain_access\\\":0}]}";
System.out.println(input.replaceAll("\\b[0-9]{15}\\b", "ID"));
}
output:
{\"ID\":[{\"type\":\"12345678901234567890\",\"call_count\":3,\"total_cputime\":1,\"total_time\":5,\"estimated_time_to_regain_access\":0}]}

Rest controller request multiple path variables with multiple query parameters

How can we create a REST API (Spring controller) that allows multiple path variables, each with its own query parameter?
Where
1) function is a path variable and id=functionname is its query parameter
2) subfunction is a path variable and id=subfunctionname is its query parameter
Request URL: /content/v1/clients/clientname/function?id=functionname&subfunction?id=subfunctionname
Update: I am now using matrix variables as suggested by
/content/v1/clients/clientname/function;id=functionname/subfunction;id=subfunctionname
The method shown below is not working as expected.
What should the method definition look like?
public HashMap<String, List<Model>> getContent(
@PathVariable String clientname,
@MatrixVariable(name="id", pathVar="function") List<String> capabilitiesId,
@MatrixVariable(name="id", pathVar="subfunction") List<String> subcapabilitiesId) {
}
Error : Missing matrix variable 'id' for method parameter of type List

It's not possible.
In a REST controller you have two types of parameters:
Path parameters: used to select a resource (i.e. one of your class's methods).
Query parameters: used to send other information.
In your case I think it is a good idea to send all this information in the request body, using the POST or PUT HTTP method.
If you can't use a request body, you could use a URL like this instead:
Request URL : /content/v1/clients/clientname/function1/function2?id1=functionnamec&id2=subfunctionaname
This way you can create your controller with 2 path parameters and 2 query parameters. The example uses JAX-RS annotations; in Spring MVC the equivalents are @GetMapping, @PathVariable and @RequestParam:
@GET
@Path("/basePath/{funct1}/{funct2}")
public Response <methodName>(@PathParam("funct1") String funct1, @PathParam("funct2") String funct2, @QueryParam("id1") String id1, @QueryParam("id2") String id2)

/content/v1/clients/clientname/function?id=functionnamec&subfunction?id=subfunctionaname
The parsing of URIs is defined by RFC 3986. In particular, U+003F QUESTION MARK is a reserved character, and its first occurrence serves as the delimiter between the relative-part and the query.
So your example would parse as
path: /content/v1/clients/clientname/function
query: id=functionnamec&subfunction?id=subfunctionaname
And if we were to parse the query, as though it were an application/x-www-form-urlencoded value....
>>> import urllib.parse
>>> urllib.parse.parse_qs("id=functionnamec&subfunction?id=subfunctionaname")
{'id': ['functionnamec'], 'subfunction?id': ['subfunctionaname']}
We see that the second question mark becomes part of the parameter name.
In short, it's a perfectly valid URI, but it isn't likely to produce the results that you are hoping for.
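The same split can be reproduced in Java with java.net.URI. It follows the older RFC 2396 but likewise treats only the first ? as the query delimiter. The parseQuery helper is a deliberately minimal function written for this sketch, not a library method, and it does not percent-decode.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QuerySplitDemo {

    // Splits a raw query on '&', then each pair at its first '='.
    // No percent-decoding, which these examples do not need.
    static Map<String, List<String>> parseQuery(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        URI uri = URI.create(
                "/content/v1/clients/clientname/function?id=functionnamec&subfunction?id=subfunctionaname");
        System.out.println(uri.getPath()); // /content/v1/clients/clientname/function
        System.out.println(parseQuery(uri.getRawQuery()));
        // {id=[functionnamec], subfunction?id=[subfunctionaname]}
    }
}
```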
/content/v1/clients/clientname/function/subfunction?id=functionnamec&id=subfunctionaname
This might be usable, but there's likely to be some confusion about the duplicate id query parameters:
>>> urllib.parse.parse_qs("id=functionnamec&id=subfunctionaname")
{'id': ['functionnamec', 'subfunctionaname']}
/content/v1/clients/clientname/function/subfunction?function.id=functionnamec&subfunction.id=subfunctionaname
>>> urllib.parse.parse_qs("function.id=functionnamec&subfunction.id=subfunctionaname")
{'function.id': ['functionnamec'], 'subfunction.id': ['subfunctionaname']}
That might be easier.
I think it would be more common to take the data out of the query and put it in the path instead:
/content/v1/clients/clientname/function/functionname/subfunction/subfunctionaname
And then extract the path parameters you need.
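In Spring MVC this usually means a mapping such as @GetMapping("/content/v1/clients/{clientname}/function/{functionname}/subfunction/{subfunctionname}"), with one @PathVariable per placeholder. Outside the framework, the same extraction can be sketched with a named-group regex. The route pattern and group names below are illustrative assumptions, and a segment is taken to be any run of characters other than /.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathParamsDemo {

    // Hypothetical route: one named group per path segment to extract.
    private static final Pattern ROUTE = Pattern.compile(
            "/content/v1/clients/(?<client>[^/]+)/function/(?<function>[^/]+)/subfunction/(?<subfunction>[^/]+)");

    // Returns the extracted segments, or an empty map if the path does not fit the route.
    static Map<String, String> extract(String path) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ROUTE.matcher(path);
        if (m.matches()) {
            result.put("client", m.group("client"));
            result.put("function", m.group("function"));
            result.put("subfunction", m.group("subfunction"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract(
                "/content/v1/clients/clientname/function/functionname/subfunction/subfunctionaname"));
        // {client=clientname, function=functionname, subfunction=subfunctionaname}
    }
}
```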

Filter Special Characters in Spring / Java

I'm using jsoup to get all text from websites.
Document doc = Jsoup.connect("URL").get();
String allText = doc.text().toLowerCase();
Then I'm using Hibernate to persist the object that holds all text to a MySQL DB:
...
@Column(name="all_text")
@Lob
private String allText = null;
...
Everything works so far, except that sometimes I get a MySQL error when I try to save the object with allText:
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\x8A s...' for column 'all_text' at row 1
I've already looked this up, and it's an encoding error; the websites probably contain some special characters. I found a way to fix this by changing the encoding in the DB.
But my actual question is: what's the best way to filter out the special characters from the allText string so that they are not persisted at all?
EDIT: To clarify, by special characters I mean emoticons and the like. That is anything the column's encoding cannot store: MySQL's utf8 charset holds at most three bytes per character, while the \xF0\x9F\x98\x8A in the error is a four-byte UTF-8 sequence. I'm not concerned about ~, ^, etc.
Thanks in advance!

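Use a regex replacement, and assign the result back, since replaceAll returns a new string:
allText = allText.replaceAll("\\p{C}", "");
No import is needed for String.replaceAll; java.util.regex.Pattern is only required if you compile the pattern yourself.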
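Note that \p{C} covers Unicode's "Other" categories: control, format, private-use, surrogate and unassigned code points. An emoji such as U+1F60A, whose UTF-8 bytes are the \xF0\x9F\x98\x8A in the error, is classified as a symbol (So), so that pattern is unlikely to remove it. The sketch below takes a different approach from the answer above: it drops every code point above U+FFFF. Those are the characters that take four bytes in UTF-8, which MySQL's three-byte utf8 charset cannot store. The class and method names are only illustrative. Switching the column to utf8mb4 instead keeps the characters rather than discarding them.

```java
public class StripSupplementaryDemo {

    // Keeps only code points in the Basic Multilingual Plane (U+0000..U+FFFF);
    // everything above needs four bytes in UTF-8.
    static String keepBmp(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> cp <= 0xFFFF)
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Regex equivalent: Java regex matches by code point, so the negated
    // BMP range matches each supplementary character as a whole.
    static String keepBmpRegex(String s) {
        return s.replaceAll("[^\\u0000-\\uFFFF]", "");
    }

    public static void main(String[] args) {
        String text = "nice \uD83D\uDE0A smile"; // contains U+1F60A
        System.out.println(keepBmp(text));      // nice  smile
        System.out.println(keepBmpRegex(text)); // nice  smile
    }
}
```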

Java regex: How to match a string if it does not contain any of a list of certain top level domains [duplicate]

This question already has answers here:
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 5 years ago.
I'm trying to build a regex that filters URLs, so that only URLs not listed in my regex result in a match.
If there is no test1.com or test2.com in the URL, it should result in a match.
The domains that should not result in a match (test1.com and test2.com) always use the https protocol, can contain subdomains, and have paths after the ".com".
My last attempt was the following, but it still doesn't work:
https?://([a-z0-9]+[.])((test1)|(test2))[.](com)/.*
Results on RegexPlanet:
https://abc.test1.com/test.htm ==> MATCH
www.google.com ==> NO MATCH
https://123.test2.com/test.html ==> MATCH
https://test2.com/test.html ==> NO MATCH
How do I write the regex so that everything that does not contain the test1.com or test2.com domain gives a match?

This pattern should work:
^((?!test1\\.com|test2\\.com).)*$
Try it out:
System.out.println(Pattern.matches("^((?!test1\\.com|test2\\.com).)*$", "https://abc.test1.com/test.htm"));
System.out.println(Pattern.matches("^((?!test1\\.com|test2\\.com).)*$", "www.google.com"));
System.out.println(Pattern.matches("^((?!test1\\.com|test2\\.com).)*$", "https://123.test2.com/test.html"));
System.out.println(Pattern.matches("^((?!test1\\.com|test2\\.com).)*$", "https://test2.com/test.html"));
Results:
false
true
false
false
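The lookahead rejects any string that contains test1.com anywhere. It would therefore also reject, say, https://mytest1.com/ or a URL that only mentions test1.com in its query string. If that matters, the sketch below checks only the host part via java.net.URI instead. This is a different technique from the regex above. The deny list comes from the question, and the class name is hypothetical.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

public class BlockedDomainDemo {

    // Deny list from the question; List.of needs Java 9+.
    private static final List<String> BLOCKED = List.of("test1.com", "test2.com");

    // True unless the URL's host is a blocked domain or one of its subdomains.
    // Input without a scheme (e.g. "www.google.com") has no host and is allowed.
    static boolean isAllowed(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return true;
        }
        host = host.toLowerCase(Locale.ROOT);
        for (String domain : BLOCKED) {
            if (host.equals(domain) || host.endsWith("." + domain)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://abc.test1.com/test.htm",
            "www.google.com",
            "https://123.test2.com/test.html",
            "https://test2.com/test.html",
            "https://mytest1.com/test.html"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + isAllowed(url));
        }
    }
}
```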

Extract attributes from a string

I have to deal with a problem caused by a dirty design. I get a list of strings and want to parse attributes out of them. Unfortunately, I can't change the source where these strings are created.
Example:
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
Now I want to extract the attributes type, languageCode, url, ref, info and deactivated.
The problem here is the field info, whose text is not delimited by quotation marks. Commas may also occur in this field, so I can't use a comma to find out where it ends.
Additionally, these strings do not always contain all attributes. type, info and deactivated are always present; the rest are optional.
Any suggestions on how I can solve this problem?

One possible solution is to search for = characters in the input and then take the single word immediately before it as the field name - it seems that all your field names are single words (no whitespace). If that's the case, you can then take everything after the = until the next field name (accounting for separating ,) as the value.
This assumes that the value cannot contain =.
Edit:
As a possible way to handle an embedded =, you can check whether the word in front of it is one of your known field names. If not, you can treat the = as an embedded character rather than an operator. This, however, assumes that you have a fixed set of known fields (some of which may not always appear). This assumption may be eased if you know that the field names are case-sensitive.

Assuming that the order of elements is fixed, you could write a solution using a regex like this one:
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
String regex = //type, info and deactivated are always present
"type=(?<type>.*?)"
+ "(?:, languageCode=(?<languageCode>.*?))?"//optional group
+ "(?:, url=(?<url>.*?))?"//optional group
+ "(?:, ref=(?<rel>.*?))?"//optional group
+ ", info=(?<info>.*?)"
+ ", deactivated=(?<deactivated>.*?)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
if(m.matches()){
System.out.println("type -> "+m.group("type"));
System.out.println("languageCode -> "+m.group("languageCode"));
System.out.println("url -> "+m.group("url"));
System.out.println("rel -> "+m.group("rel"));
System.out.println("info -> "+m.group("info"));
System.out.println("deactivated -> "+m.group("deactivated"));
}
Output:
type -> INFO
languageCode -> EN-GB
url -> http://www.stackoverflow.com
rel -> 1
info -> Text, that may contain all kind of chars.
deactivated -> false
EDIT: Version 2, a regex that searches for oneOfPossibleKeys=value, where value ends right before
, oneOfPossibleKeys=
or at the end of the string (represented by $).
Code:
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
String[] possibleKeys = {"type","languageCode","url","ref","info","deactivated"};
String keysStrRegex = String.join("|", possibleKeys);
//above will contain type|languageCode|url|ref|info|deactivated
String regex = "(?<key>\\b(?:"+keysStrRegex+")\\b)=(?<value>.*?(?=, (?:"+keysStrRegex+")=|$))";
// (?<key>\b(?:type|languageCode|url|ref|info|deactivated)\b)
// =
// (?<value>.*?(?=, (?:type|languageCode|url|ref|info|deactivated)=|$))
// System.out.println(regex);
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while(m.find()){
System.out.println(m.group("key")+" -> "+m.group("value"));
}
Output:
type -> INFO
languageCode -> EN-GB
url -> http://www.stackoverflow.com
ref -> 1
info -> Text, that may contain all kind of chars.
deactivated -> false

You could use a regular expression, capturing all the "fixed" groups and using whatever remains for info. This should work even if the info part contains , or = characters. Here's a quick example (in Python, but that should not be a problem...):
>>> import re
>>> p = r"(type=[A-Z]+), (languageCode=[-A-Z]+), (url=[^,]+), (ref=\d), (info=.+?), (deactivated=(?:true|false))"
>>> s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars, even deactivated=true., deactivated=false"
>>> re.search(p, s).groups()
('type=INFO',
'languageCode=EN-GB',
'url=http://www.stackoverflow.com',
'ref=1',
'info=Text, that may contain all kind of chars, even deactivated=true.',
'deactivated=false')
If any of those elements are optional, you can put a ? after those groups and make the comma optional. If the order can differ, it's more complicated. In that case, instead of using one regex to capture everything at once, use several regexes to capture the individual attributes. Remove each one from the string (replace it with '') before matching the next attribute. Finally, match info.
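Here is a rough Java sketch of that several-regexes idea. Each fixed-format attribute is found with its own pattern and cut out of the string, and whatever remains is treated as info. The class name and the per-attribute value patterns are assumptions based only on the example string. An info value that itself contains something like ", ref=2" can still be misread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeExtractor {

    // Hypothetical patterns for attributes with a restricted value format.
    // Each must start the string or follow ", ", and be followed by ", " or the end.
    private static final Map<String, Pattern> FIXED = new LinkedHashMap<>();
    static {
        FIXED.put("type", Pattern.compile("(?:^|, )type=([A-Z]+)(?=, |$)"));
        FIXED.put("languageCode", Pattern.compile("(?:^|, )languageCode=([-A-Za-z]+)(?=, |$)"));
        FIXED.put("url", Pattern.compile("(?:^|, )url=([^,\\s]+)(?=, |$)"));
        FIXED.put("ref", Pattern.compile("(?:^|, )ref=(\\d+)(?=, |$)"));
        FIXED.put("deactivated", Pattern.compile("(?:^|, )deactivated=(true|false)(?=, |$)"));
    }

    static Map<String, String> extract(String s) {
        Map<String, String> result = new LinkedHashMap<>();
        String rest = s;
        for (Map.Entry<String, Pattern> e : FIXED.entrySet()) {
            Matcher m = e.getValue().matcher(rest);
            if (m.find()) {
                result.put(e.getKey(), m.group(1));
                // Cut the matched attribute (and its leading ", ") out of the string.
                rest = rest.substring(0, m.start()) + rest.substring(m.end());
            }
        }
        // Whatever is left should be the free-text info attribute.
        if (rest.startsWith(", ")) {
            rest = rest.substring(2);
        }
        if (rest.startsWith("info=")) {
            result.put("info", rest.substring("info=".length()));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, "
                + "info=Text, that may contain all kind of chars, even deactivated=true., deactivated=false";
        extract(s).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```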
On further consideration, given that those attributes could have any order, it may be more promising to capture just everything spanning from one keyword to the next, regardless of its actual content, very similar to Pshemo's solution:
keys = "type|languageCode|url|ref|info|deactivated"
p = r"({0})=(.+?)(?=\, (?:{0})=|$)".format(keys)
matches = re.findall(p, s)
But this, too, might fail in some very obscure cases, e.g. if the info attribute contains something like ', ref=foo', including the comma. However, there seems to be no way around those ambiguities. If you had a string like info=in this string, ref=1, and in another, ref=2, ref=1, does it contain one ref attribute, or three, or none at all?

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
