How to reduce double dots in path expressions using Java

I am in the middle of making a Java application which, in particular, takes a relative file path of the form
String path = "path/to/Plansystem/Xslt/omraade/../../../Kms/Xslt/Rense/Template.xslt";
and reduces / simplifies the path expression so that it yields an equivalent path without the double dots. That is, we should obtain this String:
String result = "path/to/Kms/Xslt/Rense/Template.xslt";
Currently, I have defined the following regular expression:
String parentDirectory = "/(?!\\.)([\\w,_-]*)\\.?([\\w,_-]*)/\\.\\./";
I then replace any match with a single slash. This approach seems to work, and I came up with the expression using Regexr.com, but it seems a little hacky to me, and I would be surprised if this specific functionality were not available in some well-tested, well-developed library. Is anyone familiar with such a library?
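To see why the regex route feels hacky: a single replaceAll pass only removes the innermost name/../ pair of each run, because the pairs that become adjacent after a replacement are only visible to the next pass. A minimal sketch, assuming the question's pattern written as a Java string literal and a loop that repeats the replacement until the string stops changing (RegexCollapse and collapse are hypothetical names):

```java
import java.util.regex.Pattern;

public class RegexCollapse {
    // The question's pattern: "/name/../" where name does not start with a dot
    // and contains at most one dot.
    private static final Pattern PARENT_DIRECTORY =
            Pattern.compile("/(?!\\.)([\\w,_-]*)\\.?([\\w,_-]*)/\\.\\./");

    // Hypothetical helper: repeat the replacement until nothing changes.
    // Each replacement shortens the string, so the loop terminates.
    static String collapse(String path) {
        String previous;
        do {
            previous = path;
            path = PARENT_DIRECTORY.matcher(path).replaceAll("/");
        } while (!path.equals(previous));
        return path;
    }

    public static void main(String[] args) {
        String path = "path/to/Plansystem/Xslt/omraade/../../../Kms/Xslt/Rense/Template.xslt";
        // One pass only removes "omraade/../":
        // path/to/Plansystem/Xslt/../../Kms/Xslt/Rense/Template.xslt
        System.out.println(PARENT_DIRECTORY.matcher(path).replaceAll("/"));
        // Repeated passes: path/to/Kms/Xslt/Rense/Template.xslt
        System.out.println(collapse(path));
    }
}
```

Note that a leading pair such as a/../b is left untouched, because the pattern needs a slash before the directory name; Path.normalize() handles that case.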
Edit:
Based on the responses by rzwitserloot and Andy Turner, I realized that the following method works for me:
import java.nio.file.Paths;
public static final String slash = "/";
public static final String backslashes = "\\\\+"; // regex: one or more backslashes
static String normalizePath(String first, String... more) {
    // normalize() removes "." segments and "name/.." pairs
    String pathToReturn = Paths.get(first, more).normalize().toString().replaceAll(backslashes, slash);
    return pathToReturn;
}
Note that the replacement I make at the end is only due to a specific need I have, where I want to preserve the unix notation (even when running on Windows).

No, don't bother with regular expressions. There's an API for this!
Basic 'dot' removal:
import java.nio.file.Paths;
Paths.get("/Users/Birdie/../../Users/Birdie/workspace/../workspace").normalize()
Will get you a path representing /Users/Birdie/workspace.
You can even go further and follow symbolic links:
Paths.get("/Users/Birdie/../../Users/Birdie/workspace/../workspace").toRealPath()
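Unlike normalize(), toRealPath() consults the file system: the path must exist (otherwise it throws an IOException, typically NoSuchFileException), and the result is an absolute path with symbolic links resolved. A minimal sketch, assuming a throwaway directory tree under the system temp directory; the class, method, and directory names are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RealPathDemo {
    // Builds <tmp>/Users/Birdie/workspace (not cleaned up, to keep the sketch short)
    // and checks that the messy path resolves to the same real directory.
    static boolean messyPathResolvesToWorkspace() {
        try {
            Path root = Files.createTempDirectory("realpath-demo");
            Path workspace = Files.createDirectories(root.resolve("Users/Birdie/workspace"));
            Path messy = root.resolve("Users/Birdie/../../Users/Birdie/workspace/../workspace");
            // toRealPath() asks the file system, so ".." and symbolic links are resolved for real
            return messy.toRealPath().equals(workspace.toRealPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // toRealPath() fails for a path that does not exist; normalize() would not.
    static boolean missingPathFails() {
        Path root;
        try {
            root = Files.createTempDirectory("realpath-demo");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            root.resolve("does/not/exist").toRealPath();
            return false;
        } catch (IOException e) {
            return true; // typically NoSuchFileException
        }
    }

    public static void main(String[] args) {
        System.out.println(messyPathResolvesToWorkspace()); // true
        System.out.println(missingPathFails());             // true
    }
}
```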

Use java.nio.file.Path:
Path path = Paths.get("path/to/Plansystem/Xslt/omraade/../../../Kms/Xslt/Rense/Template.xslt");
Path normalized = path.normalize();
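A runnable sketch with the question's path, assuming Unix-style output is wanted (so File.separatorChar is mapped back to /, as in the question's edit); NormalizeDemo is a hypothetical name. The second call shows that normalize() is purely syntactic: a leading .. of a relative path has no parent to cancel, so it is kept.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NormalizeDemo {
    // normalize() works on the path string only; it never touches the file system.
    // The separator swap keeps the output Unix-style on Windows too.
    static String normalize(String path) {
        Path normalized = Paths.get(path).normalize();
        return normalized.toString().replace(File.separatorChar, '/');
    }

    public static void main(String[] args) {
        // prints path/to/Kms/Xslt/Rense/Template.xslt
        System.out.println(normalize("path/to/Plansystem/Xslt/omraade/../../../Kms/Xslt/Rense/Template.xslt"));
        // prints ../b: the leading ".." has no parent to cancel
        System.out.println(normalize("../a/../b"));
    }
}
```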

Related

Regex pattern to split colon char with a condition

I have a string like this:
http://schemas/identity/claims/usertype:External
My goal is to split that string into two parts at the colon delimiter, but the regex must not split at the colon in "http://", so the string should be split into:
http://schemas/identity/claims/usertype
External
I have tried a regex like this:
(http:\/\/+schemas\/identity\/claims\/usertype)
So it will be:
http://schemas/identity/claims/usertype
:External
Then I replace the remaining colon with an empty string.
But I don't think this is best practice, because I rarely use regex.
Do you have any suggestion to simplify the regex?
Thanks in advance
This is an XY problem. Fortunately, you asked the question in a great way: you explained the underlying problem you are trying to solve (namely: pull some string out of a URL), then described the direction you've chosen to solve it (which is bad, see below), and then asked about a problem you have with this solution (which is irrelevant, as the entire solution is bad).
URLs aren't parsable like this. You shouldn't treat them as a string you can lop into pieces like this. For example, the server part can contain colons too, for the port number. In front of the server part there can be user info (credentials), which can also contain a colon. It's rarely used, of course.
Try this one, which shows the problem with your approach:
https://joe:joe@google.com:443/
That link just works. Port 443 was the default anyway, and Google ignores the authentication header that ends up being sent, but the point is, a URL may contain this stuff.
But rzwitserloot, it... won't! I know!
That's bad programming mindset. That mindset leads to security issues. Why go for a solution that burdens your codebase with unstated assumptions (assumption: The places that provide a URL to this code are under my control and will never send port or auth headers)? If the 'server' part is configurable in a config file, will you mention in said config file that you cannot add a port? Will you remember 4 years from now?
The solution that does it right isn't going to burden your code with all these unstated (or very unwieldy if stated) assumptions.
Okay, so what is the right way?
First, toss that string into the constructor of java.net.URI. Then, use the methods there to get what you actually want, which is the path part. That is a string you can pull apart:
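import java.net.URI;
// new URI(...) throws the checked URISyntaxException; declare or catch it in the enclosing method
URI uri = new URI("http://schemas/identity/claims/usertype:External");
String path = uri.getPath();                 // "/identity/claims/usertype:External"
String newPath = path.replaceAll(":.*", ""); // drop everything from the first colon on
String type = path.replaceAll(".*?:", "");   // keep only what follows the last colon
URI newUri = uri.resolve(newPath);           // reattach scheme and host
System.out.println(newUri);
System.out.println(type);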
prints:
http://schemas/identity/claims/usertype
External
NB: Toss some ports or auth stuff in there, or make it a relative URL - do whatever you like, this code is far more robust in the face of changing the base URL than any attempt to count colons is going to be.
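To back up the NB, here is the same code wrapped in a hypothetical split helper and fed a URL with user info and a port. It uses URI.create instead of new URI(...) only to avoid the checked URISyntaxException (URI.create throws IllegalArgumentException instead):

```java
import java.net.URI;

public class UriSplitDemo {
    // Same steps as above; returns { URI without the ":type" suffix, type }.
    static String[] split(String input) {
        URI uri = URI.create(input);
        String path = uri.getPath();
        String newPath = path.replaceAll(":.*", "");
        String type = path.replaceAll(".*?:", "");
        return new String[] { uri.resolve(newPath).toString(), type };
    }

    public static void main(String[] args) {
        // User info and port contain colons too, but they never reach getPath()
        String[] parts = split("http://joe:secret@schemas:8080/identity/claims/usertype:External");
        System.out.println(parts[0]); // http://joe:secret@schemas:8080/identity/claims/usertype
        System.out.println(parts[1]); // External
    }
}
```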
Use a negative lookbehind and split:
Regex:
"(?<!(http|https)):"
Regex in context:
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
public class SplitOnColon {
    public static void main(String[] args) {
        String input = "http://schemas/identity/claims/usertype:External";
        validateURI(input);
        // Split on a colon that is not preceded by "http" or "https"
        List<String> result = Arrays.asList(input.split("(?<!(http|https)):"));
        result.forEach(System.out::println);
    }
    private static void validateURI(String input) {
        try {
            new URI(input);
        } catch (URISyntaxException e) {
            System.out.println("Invalid URI!!!");
            e.printStackTrace();
        }
    }
}
Output:
http://schemas/identity/claims/usertype
External
I think this might help you:
public class Separator{
public static void main(String[] args) {
String input = "http://schemas/identity/claims/usertype:External";
String[] splitted = input.split("\\:");
System.out.println(splitted[splitted.length-1]);
}
}
Output
External

Trying to replace part of a string that starts with \x2D

In JMeter, I used a Regular Expression Extractor to extract part of an HTML response. I then passed that to a BeanShell PostProcessor. However, I am having trouble replacing \x2D with -. Is there a way to do this, or perhaps do I need to extract the response as
String yourvar = vars.get("accessToken");
String anotherVar = yourvar.replace("data.access_token = '","");
String finalAccessToken = anotherVar.replace("\x2D","-");
vars.put("finalAccessToken",finalAccessToken);
It does not like the "\x2D" part. It works if I find \x2D but the original string only has .
You need to escape your target String parameter.
final String finalAccessToken = anotherVar.replace("\\x2D", "-");
If it's not what you're asking for, add more info to the question. That's all I was able to understand.
It is recommended to use JMeter's built-in test elements where possible. In your case in particular, you might be interested in the __strReplace() custom JMeter function:
Install Custom JMeter Functions bundle using JMeter Plugins Manager
Use the following expression to make the replacement:
${__strReplace(${anotherVar},\\\x2D,-,)}
If you want to go for scripting - make sure to use JSR223 PostProcessor and Groovy language. Be aware that you will still need to escape backslash with another backslash like:
String finalAccessToken = anotherVar.replace("\\x2D","-");
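A plain-Java sketch of the escaping rules involved (BeanShell and Groovy double-quoted strings should escape the backslash the same way); the sample token and class name are made up. String.replace takes its target literally, while replaceAll takes a regex, where \x2D with a single backslash is the hex escape for - itself:

```java
public class EscapeDemo {
    // "\\x2D" in source code is the four characters \x2D, as found in the response.
    // String.replace matches that text literally.
    static String unescapeHyphen(String s) {
        return s.replace("\\x2D", "-");
    }

    // replaceAll takes a regex, so the backslash must be escaped for the regex too.
    static String unescapeHyphenRegex(String s) {
        return s.replaceAll("\\\\x2D", "-");
    }

    public static void main(String[] args) {
        String token = "abc\\x2Ddef\\x2D123"; // made-up sample value
        System.out.println(unescapeHyphen(token));      // abc-def-123
        System.out.println(unescapeHyphenRegex(token)); // abc-def-123
        // Pitfall: the regex \x2D is a hex escape for '-', so this matches real hyphens
        System.out.println("a-b".replaceAll("\\x2D", "_")); // a_b
    }
}
```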

Java Regexp to match domain of url

I would like to use a Java regex to match the domain of a URL. For example,
for www.table.google.com, I would like to get 'google' out of the URL, namely the second-to-last word in this URL string.
Any help will be appreciated!
It really depends on the complexity of your inputs...
Here is a pretty simple regex:
.+\\.(.+)\\..+
It captures what sits between two dots (\\. is an escaped dot in a Java string); because the leading .+ is greedy, those are the last two dots.
And here are some examples for that pattern: https://regex101.com/r/L52oz6/1.
As you can see, it works for simple inputs but not for complex urls.
But why reinvent the wheel? There are plenty of really good libraries that correctly parse any complex URL. But sure, for simple inputs a small regex is easily built. So if that does not solve the problem for your inputs, please comment and I will adjust the regex pattern.
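A minimal sketch of that pattern with java.util.regex, assuming the input is a bare host name (SecondLastWord is a hypothetical name). matches() requires the whole input to fit, so anything with fewer than two dots returns null, and the co.uk case shows the limitation mentioned above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondLastWord {
    // The greedy leading .+ pushes the match to the last two dots,
    // so group 1 is the second-to-last label.
    private static final Pattern SECOND_TO_LAST = Pattern.compile(".+\\.(.+)\\..+");

    // Returns null when the input has fewer than two dots.
    static String secondToLast(String host) {
        Matcher m = SECOND_TO_LAST.matcher(host);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(secondToLast("www.table.google.com"));    // google
        System.out.println(secondToLast("www.stackoverflow.co.uk")); // co, not stackoverflow
    }
}
```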
Note that you can also just use simple splitting like:
String[] elements = input.split("\\.");
String secondToLastElement = elements[elements.length - 2];
But don't forget the index-bound checking.
Or, if you are looking for a very quick solution, walk through the input starting from the last position. Work your way backwards until you find the first dot, then continue until you find the second dot. Then extract that part with input.substring(index1, index2);.
There is already a method for exactly that purpose, namely String#lastIndexOf (see the documentation).
Take a look at this code snippet:
String input = ...
int indexLastDot = input.lastIndexOf('.');
// fromIndex is inclusive, so start searching one position before the last dot
int indexSecondToLastDot = input.lastIndexOf('.', indexLastDot - 1);
String secondToLastWord = input.substring(indexSecondToLastDot + 1, indexLastDot);
You get the idea; just don't forget bounds checking (lastIndexOf returns -1 when no dot is found).
The advantage of this approach is that it is really fast: it scans the String in place and only allocates the final substring, instead of creating an array of all parts like split does.
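The same idea with the bounds checks filled in, as a sketch (the class and method names are hypothetical); it returns null when there are fewer than two dots:

```java
public class SecondLastWordIndex {
    // Returns the text between the second-to-last and the last dot,
    // or null if the input contains fewer than two dots.
    static String secondToLastWord(String input) {
        int indexLastDot = input.lastIndexOf('.');
        if (indexLastDot < 0) {
            return null;
        }
        // fromIndex is inclusive, so search from one position before the last dot;
        // a negative fromIndex simply yields -1
        int indexSecondToLastDot = input.lastIndexOf('.', indexLastDot - 1);
        if (indexSecondToLastDot < 0) {
            return null;
        }
        return input.substring(indexSecondToLastDot + 1, indexLastDot);
    }

    public static void main(String[] args) {
        System.out.println(secondToLastWord("www.table.google.com")); // google
        System.out.println(secondToLastWord("google.com"));           // null
    }
}
```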
My attempt:
(?<scheme>https?:\/\/)?(?<subdomain>\S*?)(?<domainword>[^.\s]+)(?<tld>\.[a-z]+|\.[a-z]{2,3}\.[a-z]{2,3})(?=\/|$)
Demo. Works correctly for:
http://www.foo.stackoverflow.com
http://www.stackoverflow.com
http://www.stackoverflow.com/
http://stackoverflow.com
https://www.stackoverflow.com
www.stackoverflow.com
stackoverflow.com
http://www.stackoverflow.co.uk
foo.www.stackoverflow.com
foo.www.stackoverflow.co.uk
foo.www.stackoverflow.co.uk/a/b/c
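A sketch of the same pattern in Java, reading the named group domainword; the escaped slashes (\/) are not needed in a Java regex and are written as plain / here, and the class name is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DomainWord {
    // The lazy subdomain group grows only until domainword + tld is followed by "/" or the end.
    private static final Pattern DOMAIN = Pattern.compile(
            "(?<scheme>https?://)?(?<subdomain>\\S*?)(?<domainword>[^.\\s]+)"
            + "(?<tld>\\.[a-z]+|\\.[a-z]{2,3}\\.[a-z]{2,3})(?=/|$)");

    // Returns the domain word, or null if the pattern does not match.
    static String domainWord(String url) {
        Matcher m = DOMAIN.matcher(url);
        return m.find() ? m.group("domainword") : null;
    }

    public static void main(String[] args) {
        System.out.println(domainWord("www.table.google.com"));              // google
        System.out.println(domainWord("foo.www.stackoverflow.co.uk/a/b/c")); // stackoverflow
    }
}
```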
import java.util.regex.Matcher;
import java.util.regex.Pattern;
private static final Pattern URL_MATCH_GET_SECOND_AND_LAST =
        Pattern.compile("www\\.(.*)\\.google\\.(.*)", Pattern.CASE_INSENSITIVE);
String sURL = "www.table.google.com";
Matcher matchURL = URL_MATCH_GET_SECOND_AND_LAST.matcher(sURL);
if (matchURL.find()) {
    String sFirst = matchURL.group(1);  // "table"
    String sSecond = matchURL.group(2); // "com"
}

How to replace double slash with single slash for an url

For a given URL like "http://google.com//view/All/builds", I want to replace the double slash with a single slash. For example, the above URL should become "http://google.com/view/All/builds".
I don't know regular expressions. Can anyone help me achieve this using regular expressions?
To avoid replacing the first // in http:// use the following regex :
String to = from.replaceAll("(?<!http:)//", "/");
PS: if you want to handle https use (?<!(http:|https:))// instead.
Is Regex the right approach?
In case you wanted this solution as part of an exercise to improve your regex skills, then fine. But what is it that you're really trying to achieve? You're probably trying to normalize a URL. Replacing // with / is one aspect of normalizing a URL. But what about other aspects, like removing redundant ./ and collapsing ../ with their parent directories? What about different protocols? What about ///? What about the // at the start? What about /// at the start in case of file:///?
If you want to write a generic, reusable piece of code, using a regular expression is probably not the best approach. And it's reinventing the wheel. Instead, consider java.net.URI.normalize().
java.net.URI.normalize()
java.lang.String
String inputUrl = "http://localhost:1234//foo//bar//buzz";
String normalizedUrl = new URI(inputUrl).normalize().toString();
java.net.URL
URL inputUrl = new URL("http://localhost:1234//foo//bar//buzz");
URL normalizedUrl = inputUrl.toURI().normalize().toURL();
java.net.URI
URI inputUri = new URI("http://localhost:1234//foo//bar//buzz");
URI normalizedUri = inputUri.normalize();
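A runnable version of the String variant, wrapped in a hypothetical helper and using URI.create to avoid the checked exception. The third input shows that the same call also drops ./ and collapses ../, which none of the regex answers here do:

```java
import java.net.URI;

public class UriNormalizeDemo {
    // URI.create throws IllegalArgumentException instead of the checked URISyntaxException.
    static String normalize(String url) {
        return URI.create(url).normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://localhost:1234//foo//bar//buzz"));  // http://localhost:1234/foo/bar/buzz
        System.out.println(normalize("http://google.com//view/All/builds"));     // http://google.com/view/All/builds
        System.out.println(normalize("http://google.com/view/./All/../builds")); // http://google.com/view/builds
    }
}
```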
Regex
In case you do want to use a regular expression, think of all possibilities. What if, in future, this should also process other protocols, like https, file, ftp, fish, and so on? So, think again, and probably use URI.normalize(). But if you insist on a regular expression, maybe use this one:
String normalizedUri = uri.replaceAll("(?<!\\w+:/?)//+", "/");
Compared to other solutions, this works with all URLs that look like HTTP URLs but use a different protocol instead of http, like https, file, ftp and so on, and it keeps the triple slash /// in case of file:///. But, unlike java.net.URI.normalize(), this does not remove redundant ./, it does not collapse ../ with their parent directories, it does not handle other aspects of URL normalization that you and I might have forgotten about, and it will not be updated automatically with newer RFCs about URLs, URIs, and such.
String to = from.replaceAll("(?<!(http:|https:))[//]+", "/");
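[//] is just a character class containing /, so [//]+ matches any run of one or more slashes that is not preceded by http: or https:, and replaces it with a single slash.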
Here is the regexp:
/(?<=[^:\s])(\/+\/)/g
It finds runs of multiple slashes in a URL while preserving the ones after the protocol, whatever the protocol is.
It also handles protocol-relative URLs, which start with //.
@Test
public void shouldReplaceMultipleSlashes() {
assertEquals("http://google.com/?q=hi", replaceMultipleSlashes("http://google.com///?q=hi"));
assertEquals("https://google.com/?q=hi", replaceMultipleSlashes("https:////google.com//?q=hi"));
assertEquals("//somecdn.com/foo/", replaceMultipleSlashes("//somecdn.com/foo///"));
}
private static String replaceMultipleSlashes(String url) {
return url.replaceAll("(?<=[^:\\s])(\\/+\\/)", "/");
}
Literally means:
(\/+\/) - find group: /+ one or more slashes followed by / slash
(?<=[^:\s]) - positive lookbehind: the group must be preceded by a character from the negated set [^:\s], i.e. anything except a : colon or \s whitespace
g - global search flag (JavaScript syntax; in Java, replaceAll already replaces every match)
I suggest you simply use String.replace, whose documentation is at http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence, java.lang.CharSequence)
Something like
myString.replace("//", "/");
Note that this also turns the // in http:// into a single slash. If you want to keep the first occurrence:
String[] parts = str.split("//", 2);
str = parts[0] + "//" + parts[1].replaceAll("//", "/");
This is the simplest way (without a regular expression). I don't know the corresponding regular expression; maybe an expert reading the thread does. ;)
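A sketch of the split-based idea with two gaps closed, still assuming the first // in the string is the one after the scheme: without a guard, parts[1] throws ArrayIndexOutOfBoundsException when the input contains no //, and a single replace pass turns /// into //. CollapseAfterScheme is a hypothetical name:

```java
public class CollapseAfterScheme {
    // Keeps the first "//" and collapses every later run of slashes to one "/".
    static String collapse(String str) {
        String[] parts = str.split("//", 2);
        if (parts.length < 2) {
            return str; // no "//" at all: nothing to collapse
        }
        String rest = parts[1];
        // Loop because one replace pass turns "///" into "//"
        while (rest.contains("//")) {
            rest = rest.replace("//", "/");
        }
        return parts[0] + "//" + rest;
    }

    public static void main(String[] args) {
        System.out.println(collapse("http://google.com//view/All/builds")); // http://google.com/view/All/builds
        System.out.println(collapse("http://google.com///view"));           // http://google.com/view
    }
}
```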

How to retrieve particular part of string

I have a directory listing as a String and I want to retrieve a particular part of the string. The catch is that, as this is a directory, it can change in length.
I want to retrieve the file name from the string
"C:\projects\Compiler\Compiler\src\JUnit\ExampleTest.java"
"C:\projects\ExampleTest.java"
So in these two cases I want to retrieve just ExampleTest (the file name can also change, so I need something like getting the text before the first . and after the last \). Is there a way to do this using something like a regex?
Why not use Apache Commons FilenameUtils rather than coding your own regular expressions? From the doc:
This class defines six components within a filename (example
C:\dev\project\file.txt):
the prefix - C:\
the path - dev\project\
the full path - C:\dev\project\
the name - file.txt
the base name - file
the extension - txt
You're a lot better off using this. It's geared directly towards filenames, dirs etc. and given that it's a commonly used, well-defined component, it'll have been tested extensively and edge cases ironed out etc.
new File(thePath).getName()
or
int pos = thePath.lastIndexOf("\\");
return pos >= 0? thePath.substring(pos+1): thePath;
File file = new File("C:\\projects\\ExampleTest.java");
System.out.println(file.getAbsoluteFile().getName());
Java code
String test = "C:\\projects\\Compiler\\Compiler\\src\\JUnit\\ExampleTest.java";
String arr[] = test.split("\\Q"+"\\");
System.out.println(arr[arr.length-1].split("\\.")[0]);
This is the regex in C#, and it works in Java too :P. Thanks to Perl. The file name is captured in group 1:
^.*\\(.*?)\..*?$
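The same regex as a Java string literal needs every backslash doubled once more. A sketch (BaseName is a hypothetical name) that returns group 1, i.e. for paths like the ones in the question, the text after the last \ up to the first following dot:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BaseName {
    // ^.*\\(.*?)\..*?$ as a Java string: the greedy ^.* runs to the last backslash,
    // and the lazy group stops at the first dot after it.
    private static final Pattern BASE_NAME = Pattern.compile("^.*\\\\(.*?)\\..*?$");

    static String baseName(String path) {
        Matcher m = BASE_NAME.matcher(path);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(baseName("C:\\projects\\Compiler\\Compiler\\src\\JUnit\\ExampleTest.java")); // ExampleTest
        System.out.println(baseName("C:\\projects\\ExampleTest.java"));                                 // ExampleTest
    }
}
```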
