How to elegantly parse a string to have exactly what you need? - java

I currently have a S3 bucket directory key like this:
String dir = "s3://mybucket/workflow/science/sweet-humoor/vars";
What I am trying to do is get the prefix of this S3 directory. The prefix is the key without s3://mybucket/, so what I want is workflow/science/sweet-humoor/vars
Now, what would be an elegant way to achieve this? I know the quickest way is substring(14), but that will break whenever the bucket name changes.
How would you handle this?

Use a regular expression with replaceAll:
String result = dir.replaceAll("s3://[^/]+/", "");
The regex here is:
s3://[^/]+/
It matches the part that you want to remove, which is s3:// followed by a bunch of non-slash characters, followed by a slash.
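A minimal sketch of the same idea, assuming you want the pattern anchored to the start of the string and compiled once for reuse. The class and method names (S3PrefixRegex, stripBucket) are made up for illustration:

```java
import java.util.regex.Pattern;

final class S3PrefixRegex {
    // ^ anchors the match, so only a leading "s3://bucket/" is removed.
    private static final Pattern BUCKET_PART = Pattern.compile("^s3://[^/]+/");

    // Hypothetical helper: returns the key without scheme and bucket.
    static String stripBucket(String key) {
        return BUCKET_PART.matcher(key).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripBucket("s3://mybucket/workflow/science/sweet-humoor/vars"));
        // prints workflow/science/sweet-humoor/vars
    }
}
```

Without the anchor, a later occurrence of s3://something/ inside the key would be removed as well.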

It's cleaner to use the Java library classes instead of handling the Strings directly. What you have is a URI, so parse it as one. (new URL(dir) would throw a MalformedURLException, because Java has no built-in handler for the s3 protocol, and Paths.get(uri) needs an installed file system provider for the scheme.)
URI uri = URI.create(dir);
String path = uri.getPath();
Now you have the path ("/workflow/science/sweet-humoor/vars"). The bucket name is not part of it, because URI treats mybucket as the host. You can drop the leading slash with
// start index 1 to skip the leading "/"
String prefix = path.substring(1);
If you need a Path for further processing, use Paths.get(prefix), but note that its toString() uses the platform's separator.
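Here is a self-contained sketch of the URI-based approach using only java.net.URI. The names S3Uri, prefixOf and bucketOf are hypothetical, and it assumes the key contains only characters that are legal in a URI (a raw space, for example, would make URI.create throw):

```java
import java.net.URI;

final class S3Uri {
    /** Returns the key prefix of an s3:// URI, without the leading slash. */
    static String prefixOf(String s3Uri) {
        URI uri = URI.create(s3Uri); // IllegalArgumentException on malformed input
        if (!"s3".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not an s3 URI: " + s3Uri);
        }
        String path = uri.getPath(); // null for opaque URIs such as "s3:foo"
        if (path == null) {
            return "";
        }
        return path.startsWith("/") ? path.substring(1) : path;
    }

    /** Returns the bucket name, which URI parses as the authority. */
    static String bucketOf(String s3Uri) {
        return URI.create(s3Uri).getAuthority();
    }

    public static void main(String[] args) {
        String dir = "s3://mybucket/workflow/science/sweet-humoor/vars";
        System.out.println(bucketOf(dir)); // prints mybucket
        System.out.println(prefixOf(dir)); // prints workflow/science/sweet-humoor/vars
    }
}
```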

The URIBuilder class from the org.apache.http.client.utils package can do that.
URIBuilder builder = new URIBuilder(dir);
String thePath = builder.getPath();
This extracts /workflow/science/sweet-humoor/vars (note the leading slash) as the path. The retrieved path does not include mybucket, because URIBuilder treats the first part immediately after the scheme (s3://) as the hostname.
Further processing can be done through Path p = Paths.get(thePath).

You can try this:
String dir2=dir.replaceAll("s3://"+dir.split("/")[2]+"/","");

String dir = "s3://mybucket/workflow/science/sweet-humoor/vars";
dir = dir.substring(dir.indexOf("/", dir.indexOf("//") + 2) + 1); // cut after the first "/" that follows "s3://"
System.out.println(dir); // prints workflow/science/sweet-humoor/vars

I would split the string on "/", take the elements from index 3 onward, and join them with "/". Sample code in Python:
input_string = "s3://mybucket/workflow/science/sweet-humoor/vars"
list1 = input_string.split("/")
print(list1)
print("/".join(list1[3:]))
Output:
['s3:', '', 'mybucket', 'workflow', 'science', 'sweet-humoor', 'vars']
workflow/science/sweet-humoor/vars
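Since the question is about Java, here is a rough Java equivalent of the split idea. It passes a limit to String.split so that no join is needed afterwards; the class and method names are made up for illustration:

```java
final class S3Split {
    // Hypothetical helper: everything after the third "/" of an s3:// key.
    static String prefixOf(String dir) {
        // With a limit of 4 the remainder stays in one piece:
        // ["s3:", "", "mybucket", "workflow/science/sweet-humoor/vars"]
        String[] parts = dir.split("/", 4);
        return parts.length == 4 ? parts[3] : "";
    }

    public static void main(String[] args) {
        System.out.println(prefixOf("s3://mybucket/workflow/science/sweet-humoor/vars"));
        // prints workflow/science/sweet-humoor/vars
    }
}
```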

Related

How to use split() to remove last destination in path?

I have a list of paths and I need to remove the last directory of each path.
Path : "I:\Directory_1\Directory_2\Directory_3-Sometext"
I used the split method to remove everything to the right of the '-'.
I've tried using split(), removing the parts one by one and then regrouping everything into one string.
I've tried splitting everything on ("\") and using length.
//Removes text after '-'
String [] parts = path.split("-");
String partsA = parts[0];
String [] newParts = partsA.split("\\\\");
String partsB = newParts[newParts.length-1];
partsA = partsA.substring(partsA.length()-partsB.length(),partsA.length()+partsB.length());
I expect the output to be
\Directory_1\Directory_2
without the last directory and the text
Instead of using string manipulation, you could use proper path/file objects, with the additional benefit that it can handle other types of paths (for example a unix path such as /home/directory1):
String f = "I:\\Directory_1\\Directory_2\\Directory_3-Sometext";
Path p = Paths.get(f);
Path parent = p.getParent();
System.out.println(parent.toString());
You could also use Java's File API:
new File("I:\\Directory_1\\Directory_2\\Directory_3-Sometext").getParent();
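This avoids hand-written string handling, but note that it follows the separator rules of the platform it runs on: the backslash-separated path above is only split correctly on Windows.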
Use:
String directory = "I:\\Directory_1\\Directory_2\\Directory_3-Sometext";
directory.substring(0, directory.lastIndexOf("\\"));
which outputs:
I:\Directory_1\Directory_2
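If the code has to run on any platform regardless of which separator the input uses, a small string-based sketch like the following may be enough. The names PathStrings and parentOf are hypothetical, and it returns an empty string when there is no separator at all:

```java
final class PathStrings {
    // Hypothetical helper: strips the last path element, accepting \ or /.
    static String parentOf(String path) {
        int cut = Math.max(path.lastIndexOf('\\'), path.lastIndexOf('/'));
        return cut < 0 ? "" : path.substring(0, cut);
    }

    public static void main(String[] args) {
        System.out.println(parentOf("I:\\Directory_1\\Directory_2\\Directory_3-Sometext"));
        // prints I:\Directory_1\Directory_2
    }
}
```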

Getting file extension from http url using Java

Now I know about FilenameUtils.getExtension() from Apache Commons IO.
But in my case I'm processing extensions from http(s) URLs, so in case I have something like
https://your_url/logo.svg?position=5
this method returns svg?position=5
Is there a better way to handle this situation, i.e. without writing the logic myself?
You can use the URL class from Java. It has a lot of utility methods for cases like this. You should do something like this:
String url = "https://your_url/logo.svg?position=5";
URL fileIneed = new URL(url);
Then you have a lot of getter methods on the fileIneed variable. In your case, getPath() returns this:
fileIneed.getPath() ---> "/logo.svg"
Then pass that to the Apache method you are already using, and you will get the "svg" String:
FilenameUtils.getExtension(fileIneed.getPath()) ---> "svg"
Java URL class docs:
https://docs.oracle.com/javase/7/docs/api/java/net/URL.html
If you want a brandname® solution, then consider using the Apache method after stripping off the query string, if it exists:
String url = "https://your_url/logo.svg?position=5";
url = url.replaceAll("\\?.*$", "");
String ext = FilenameUtils.getExtension(url);
System.out.println(ext);
If you want a one-liner which does not even require an external library, then consider this option using String#replaceAll:
String url = "https://your_url/logo.svg?position=5";
String ext = url.replaceAll(".*/[^.]+\\.([^?]+)\\??.*", "$1");
System.out.println(ext);
svg
Here is an explanation of the regex pattern used above:
.*/ match everything up to, and including, the last path separator
[^.]+ then match one or more non-dot characters, i.e. the file name
\. match a literal dot
([^?]+) match and capture one or more non-? characters, which is the extension
\??.* match an optional ? followed by the rest of the query string, if present
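For comparison, here is a sketch that needs neither a regex nor an external library: java.net.URI separates the query string from the path, and lastIndexOf finds the extension. The names UrlExtension and extensionOf are made up, example.com is a placeholder host, and it returns an empty string when the last path segment has no dot:

```java
import java.net.URI;

final class UrlExtension {
    // Hypothetical helper: extension of the last path segment, or "".
    static String extensionOf(String url) {
        String path = URI.create(url).getPath(); // query and fragment are excluded
        if (path == null) {
            return "";
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("https://example.com/logo.svg?position=5"));
        // prints svg
    }
}
```

Looking for the dot only in the last segment means a dot in a directory name (for example /v1.2/logo) is not mistaken for an extension.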

How do I split the rest of the URL from the last path of it

I have this file URL: http://xxx.xxx.xx.xx/resources/upload/2014/09/02/new sample.pdf which will be converted to http://xxx.xxx.xx.xx/resources/upload/2014/09/02/new%20sample.pdf later.
Now I can get the last path by:
public static String getLastPathFromUrl(String url) {
return url.replaceFirst(".*/([^/?]+).*", "$1");
}
which will give me new sample.pdf
but how do I get the rest of the URL, http://xxx.xxx.xx.xx/resources/upload/2014/09/02/ ?
An easier way to get the last path from a URL is to use String.split, like this:
String url = "http://xxx.xxx.xx.xx/resources/upload/2014/09/02/new sample.pdf";
String[] urlArray = url.split("/");
String lastPath = urlArray[urlArray.length-1];
This converts your URL into an array, which can then be used in many ways. There are various ways to get the URL without the last path: one is to join all but the last element of the array with "/". Another is to use lastIndexOf() and substring() like this:
String restOfUrl = url.substring(0, url.lastIndexOf("/") + 1); // + 1 keeps the trailing slash
PS: Although you can learn something by doing this, I think your best solution would be to replace the space with %20 in the complete URL string; that would be the fastest and make the most sense.
I am not sure I understood correctly, but when you say
I have this file URL: URL/new sample.pdf which will be converted to URL/new%20sample.pdf later.
it looks like you are trying to replace the space with %20, or in other words to escape characters that are not allowed in a URL. If that is what you need, use the built-in
URLEncoder.encode(String s, String enc) with "UTF-8" as the encoding. Note that URLEncoder implements form encoding: it turns a space into + rather than %20 and also escapes : and /, so apply it to individual values rather than to a whole URL.
http://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html
If you really need to split it, and assuming you are interested in the part after http://, remove http:// and store the remainder in a String variable called, say, remainingURL. Then use
List<String> myList = new ArrayList<>(Arrays.asList(remainingURL.split("/")));
You can iterate over myList to get the remaining URL fragments.
I've found it:
File file=new File("http://xxx.xxx.xx.xx/resources/upload/2014/09/02/new sample.pdf");
System.out.println(file.getPath().replaceAll(file.getName(),""));
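Output (as posted):
http://xxx.xxx.xx.xx/resources/upload/2014/09/02/
Note that java.io.File is meant for file system paths and normalizes the string it is given (on Unix-like systems, for example, it collapses the // after http: into a single /), and replaceAll treats the file name as a regular expression, so this is fragile for real URLs.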
Spring solution:
List<String> pathSegments = UriComponentsBuilder.fromUriString(url).build().getPathSegments();
String lastPath = pathSegments.get(pathSegments.size()-1);
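To get both halves in one go with plain String methods, something along these lines should work. It is only a sketch: UrlSplit and splitLastSegment are made-up names, and it assumes the URL has no query string or fragment containing a slash:

```java
final class UrlSplit {
    // Hypothetical helper: returns { everything up to and including the last "/", the rest }.
    static String[] splitLastSegment(String url) {
        int cut = url.lastIndexOf('/') + 1; // 0 when there is no slash at all
        return new String[] { url.substring(0, cut), url.substring(cut) };
    }

    public static void main(String[] args) {
        String url = "http://xxx.xxx.xx.xx/resources/upload/2014/09/02/new sample.pdf";
        String[] parts = splitLastSegment(url);
        System.out.println(parts[0]); // prints http://xxx.xxx.xx.xx/resources/upload/2014/09/02/
        System.out.println(parts[1]); // prints new sample.pdf
        // Escape only the space, as described in the question.
        System.out.println(parts[0] + parts[1].replace(" ", "%20"));
    }
}
```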

Concatenating to a file name before the ‘.’ Filename extension in Java

I can’t seem to find a way of concatenating to a file name before the “.” extension in Java and I’m not entirely sure how I would go about this.
I have already tried:
String s = r + "V1";
Where the variable r contains the value of myFile.txt and the output is: myFile.txtV1, but what I need to achieve is myFileV1.txt as I don’t want to overwrite the existing file with the same name but concatenate the V1 before the . filename extension when the file is written.
Thanks
In case the file name can contain more than one dot, like foo.bar.txt, you should find the index of the last dot (String#lastIndexOf(char) is useful here).
To get the file name without the extension (the foo.bar part), call substring(int, int) on the full file name from index 0 to the index of that last dot.
To get the extension (the .txt part, from the last dot to the end of the string), call substring(int) with the last dot's index.
So your code can look like:
int lastDotIndex = r.lastIndexOf('.');
String s = r.substring(0, lastDotIndex ) + "V1" + r.substring(lastDotIndex);
Another approach is to use Apache Commons IO's FilenameUtils class to get the file's base name and extension.
import org.apache.commons.io.FilenameUtils;
...
File file = ...
String filename = file.getName();
String base = FilenameUtils.removeExtension(filename);
String extension = FilenameUtils.getExtension(filename);
String result = base + "-something-here" + "." + extension;
Look at String.indexOf() and String.substring() to split the string up and rebuild your updated version.
Try this (assuming that you have only one '.' in the name of your file):
String[] x = r.split("\\.");
String s = x[0]+"V1."+x[1];
Another apache commons method based on StringUtils.substringBeforeLast() and StringUtils.substringAfterLast:
String newPath = StringUtils.substringBeforeLast(filePath, ".") +
"_updated." + StringUtils.substringAfterLast(filePath, ".");
NB: You still need to check if the file actually contains the dot character or otherwise the result won't be consistent.
String s = r.substring(0, r.indexOf(".")) + "V1" + r.substring(r.indexOf("."));
Reminder that extensions are technically platform specific. Also, you probably want to have separate variables for the name and extension and combine them together at the end. Last caveat is that this code will not work if there are multiple period symbols in the filename (e.g. hello.world.txt)
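Pulling the caveats from the answers together (several dots, no dot at all), a defensive version could look like the sketch below. The names FileNames and insertBeforeExtension are hypothetical, and treating a leading dot (as in .gitignore) as part of the name rather than as an extension is an assumption of this sketch:

```java
final class FileNames {
    // Hypothetical helper: inserts suffix before the last ".ext", if there is one.
    static String insertBeforeExtension(String fileName, String suffix) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) { // no dot, or only a leading dot as in ".gitignore"
            return fileName + suffix;
        }
        return fileName.substring(0, dot) + suffix + fileName.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(insertBeforeExtension("myFile.txt", "V1"));  // prints myFileV1.txt
        System.out.println(insertBeforeExtension("foo.bar.txt", "V1")); // prints foo.barV1.txt
    }
}
```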

How do I manipulate strings with regex?

I'm fairly new to java and I'm trying to get a part of a string:
Say I have a URL and I want a specific part of it, such as a filename:
String url = "http://example.com/filename02563.zip";
The 02563 will be generated at random every time and it's now always 5 characters long.
I want to have java find what's between "m/" (from .com/) to the end of the line to get the filename alone.
Now consider this example:
Say I have an html file that I want a snippet extracted from. Below would be the extracted example:
<applet name=someApplet id=game width="100%" height="100%" archive=someJarFile0456799.jar code=classInsideAJarFile.class mayscript>
I want to extract the jar filename, so I want to get the text between "ve=" and ".jar". The extension will always be ".jar", so including this is not important.
How would I do this? If possible, could you comment the code so I understand what's happening?
Use the Java URI class where you can access the individual elements.
URI uri = new URI("http://example.com/filename02563.zip");
String filename = uri.getPath().substring(1); // getPath() returns "/filename02563.zip"
Granted, this will need a little more work if the resource does not reside in the root path.
You can use the lastIndexOf() and substring() methods from the String class to extract a specific piece of a String:
String url = "http://example.com/filename02563.zip";
String filename = url.substring(url.lastIndexOf("/") + 1); //+1 skips ahead of the '/'
You have answers for your first question, so this is for the second one. Normally I would use an XML parser, but your example is not valid XML, so this is solved with a regex (as you wanted).
String url = "<applet name=someApplet id=game width=\"100%\" height=\"100%\" archive=someJarFile0456799.jar code=classInsideAJarFile.class mayscript>";
Pattern pattern= Pattern.compile("(?<=archive=).*?(?= )");
Matcher m=pattern.matcher(url);
if(m.find())
System.out.println(m.group());
output:
someJarFile0456799.jar
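Since the question asked for commented code and for the text between "ve=" and ".jar", here is a variant that uses a capturing group instead of lookarounds and leaves the extension out. It is a sketch under assumptions: the names AppletArchive and jarBaseName are made up, and it expects the attribute value to be unquoted or double-quoted:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AppletArchive {
    // archive=       the literal attribute name
    // "?             an optional opening quote
    // ([^\s">]+?)    group 1: shortest run of characters that are not space, quote or >
    // \.jar          the literal extension that must follow the captured name
    private static final Pattern ARCHIVE =
            Pattern.compile("archive=\"?([^\\s\">]+?)\\.jar");

    // Hypothetical helper: the jar name without ".jar", if the attribute is present.
    static Optional<String> jarBaseName(String html) {
        Matcher m = ARCHIVE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<applet name=someApplet id=game width=\"100%\" height=\"100%\" "
                + "archive=someJarFile0456799.jar code=classInsideAJarFile.class mayscript>";
        System.out.println(jarBaseName(html).orElse("not found"));
        // prints someJarFile0456799
    }
}
```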
