Extract prefix of a path of given length - java

Given an absolute path, how to extract the beginning part of this path of some given length? Effectively, the same value that I would get if I invoked getParent() the needed number of times.
I also need this to be filesystem-independent.
I see that there is Path.subpath, but it does not seem to be what I want: Path.of("/","a","b","c").subpath(0,2) gives a/b, but I need /a/b. Yes, I can get the root and then create a new path from the results of subpath() and the root, but then is it going to be system-independent?
Is there a simple way to achieve this?

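Two simple ways to make it platform independent are the ones you suggest: Path with getParent, or Path getRoot+subpath. These give identical results as long as parts does not exceed p.getNameCount(); beyond that, the getParent version returns the whole path while subpath throws IllegalArgumentException: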
public static Path subpath1(Path p, int parts) {
if (parts < 1) throw new IllegalArgumentException("parts must be positive");
// Go up from path appropriate number of times:
return Stream.iterate(p, Path::getParent)
.skip(Math.max(0, p.getNameCount() - parts))
.findFirst().get();
}
public static Path subpath2(Path p, int parts) {
if (parts < 1) throw new IllegalArgumentException("parts must be positive");
// Resolve the subpath from the root
Path sub = p.subpath(0, parts);
Path root = p.getRoot();
return root != null ? root.resolve(sub) : sub;
}
Note, however, that system independence depends on your input data: on Windows, Path.of("/","a","b","c") denotes the UNC pathname \\a\b\c, NOT the path \a\b\c relative to the root of your current drive. So if you use Path.of("/","a","b","c") on all platforms, you will see different results on Windows and Linux.
To be consistent, specify paths in the "/a/b/c" form rather than as "/","a","b","c" on both Windows and Linux. Here are some tests which show these differences:
private static void check(Path expected, Path input, int parts) {
assertEquals(expected, subpath1(input, parts));
assertEquals(expected, subpath2(input, parts));
}
@EnabledOnOs(OS.WINDOWS)
@Test void testSubpathsWindows() {
check(Path.of("C:/a/b"), Path.of("C:\\", "a", "b", "c", "d"), 2);
check(Path.of("C:\\a"), Path.of("C:\\", "a", "b", "c", "d"), 1);
// UNC paths - note that Path.of("/", "a", "b", ... ) is UNC \\a\b
check(Path.of("\\\\a\\b\\c"), Path.of("/", "a", "b", "c", "d"), 1);
check(Path.of("\\\\a\\b\\c"), Path.of("\\\\a\\b\\c\\d"), 1);
}
@EnabledOnOs(OS.LINUX)
@Test void testSubpathsLinux() {
check(Path.of("/", "a", "b"), Path.of("/", "a", "b", "c", "d"), 2);
check(Path.of("/a/b/c"), Path.of("/", "a", "b", "c", "d"), 3);
check(Path.of("a/b/c"), Path.of("a", "b", "c", "d"), 3);
}
@Test void testSubpathsAllOS() {
check(Path.of("/a/b"), Path.of("/a/b/c/d"), 2);
check(Path.of("/a/b/c"), Path.of("/a/b/c/d/e"), 3);
check(Path.of("a/b/c"), Path.of("a/b/c/d"), 3);
}
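As a complement to the two methods above, here is a minimal sketch of a variant that clamps parts to the number of name elements instead of throwing, and returns just the root for 0. The class name PathPrefix and the method name prefix are made up for this illustration; only standard java.nio.file.Path calls are used.

```java
import java.nio.file.Path;

class PathPrefix {
    /**
     * Hypothetical helper: returns the first `parts` name elements of p,
     * keeping the root (if any). `parts` is clamped to p.getNameCount().
     */
    public static Path prefix(Path p, int parts) {
        if (parts < 0) {
            throw new IllegalArgumentException("parts must not be negative");
        }
        int n = Math.min(parts, p.getNameCount());
        Path root = p.getRoot();
        if (n == 0) {
            // Only the root remains; for a relative path, fall back to the empty path.
            return root != null ? root : p.getFileSystem().getPath("");
        }
        Path sub = p.subpath(0, n);
        return root != null ? root.resolve(sub) : sub;
    }

    public static void main(String[] args) {
        Path p = Path.of("/a/b/c/d");
        System.out.println(prefix(p, 2));   // first two elements, with the root
        System.out.println(prefix(p, 10));  // clamped: the whole path
        System.out.println(prefix(p, 0));   // the root only
    }
}
```

Whether clamping is preferable to throwing depends on the caller; the original methods fail fast, which may be the safer default.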

I'm not exactly sure if this is what you need, but it does parse the supplied path string and return the number of segments desired. Read the comments in the code:
/**
* Returns the supplied path string at the desired path depth, for example:<pre>
*
* If a path string consisted of:
*
* "C:/a/b/c/d/e/f/g/h/i/j/k" OR "C:\\a\\b\\c\\d\\e\\f\\g\\h\\i\\j\\k";
*
* And we want the path from a depth of 6 then what is returned is:
*
* 1 2 3 4 5 6
* "C:\a\b\c\d\e"
*
* If a path string consisted of:
*
* "/a/b/c/d/e/f/g/h/i/j/k" OR "\\a\\b\\c\\d\\e\\f\\g\\h\\i\\j\\k";
*
* And we want the path from a depth of 6 then what is returned is:
*
* 1 2 3 4 5 6
* "\a\b\c\d\e\f"
*
* Note that the File System Separator is used on the returned String. This
* can be changed in code by replacing all instances of `File.separator`
* with whatever string character you want.</pre>
*
* @param absolutePath (String) The path string to acquire the path depth
* from.<br>
*
* @param desiredDepth (int) The desired path depth to retrieve. If a path
* depth is provided that exceeds the number of directories within the
* supplied path string then the depth value will be modified to the MAX
* depth of that supplied path string. If a depth of 0 is provided, then
* an empty String ("") is returned.<br>
*
* @return (String) The Path string at the specified depth.
*/
public String getPathFromDepth(String absolutePath, int desiredDepth) {
// Convert separators in supplied path string to forward slash (/)
if (absolutePath.contains("\\")) {
absolutePath = absolutePath.replace("\\", "/");
}
// Determine if the path string starts with a separator.
boolean separatorStart = false;
if (absolutePath.startsWith("/")) {
separatorStart = true; // Flag the fact
absolutePath = absolutePath.substring(1); // Remove the starting separator
}
// Split the supplied (and modified) path string into an array:
String[] pathParts = absolutePath.split("/");
// See if the supplied depth goes beyond limits...
if (desiredDepth > pathParts.length) {
// It does, so make it to Max Limit.
desiredDepth = pathParts.length;
}
// Prepare to build the new Path.
StringBuilder sb = new StringBuilder("");
// Iterate through the created array one element at a time:
for (int i = 0; i < desiredDepth; i++) {
/* If the StringBuilder object contains something
then append a System File Separator character. */
if (sb.length() > 0) {
sb.append(File.separator);
}
/* If the supplied path started with a Separator then
make sure the returned path does too. Append a File
Separator character to the build. */
if (separatorStart) {
sb.append(File.separator);
separatorStart = false; // Remove the flag so this doesn't get applied again.
}
// Append the current path segment from the String[] Array:
sb.append(pathParts[i]);
}
// Return the Build Path String:
return sb.toString();
}

Related

I need to parse integers after a specific character from list of strings

I have a problem here. I need to get all the numbers from each string in a list of strings.
Let's say one of the strings in the list is "Jhon [B] - 14, 15, 16".
The format of the strings is constant: every string has a maximum of 7 numbers in it, and the numbers are separated with ",". I want to get every number after the "-". I am really confused here; I tried everything I know of, but I am not even getting close.
public static List<String> readInput() {
final Scanner scan = new Scanner(System.in);
final List<String> items = new ArrayList<>();
while (scan.hasNextLine()) {
items.add(scan.nextLine());
}
return items;
}
public static void main(String[] args) {
final List<String> stats= readInput();
}
}
You could...
Just manually parse the String using things like String#indexOf and String#split (and String#trim)
String text = "Jhon [B] - 14, 15, 16";
int indexOfDash = text.indexOf("-");
if (indexOfDash < 0 || indexOfDash + 1 >= text.length()) {
return;
}
String trailingText = text.substring(indexOfDash + 1).trim();
String[] parts = trailingText.split(",");
// There's probably a really sweet and awesome
// way to use Streams, but the point is to try
// and keep it simple 😜
List<Integer> values = new ArrayList<>(parts.length);
for (int index = 0; index < parts.length; index++) {
values.add(Integer.parseInt(parts[index].trim()));
}
System.out.println(values);
which prints
[14, 15, 16]
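For completeness, the Streams variant hinted at in the code comment might look something like the sketch below. The class and method names (DashNumbers, numbersAfterDash) are made up for this example, and it assumes every token after the dash is a valid integer; anything else makes Integer.valueOf throw NumberFormatException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DashNumbers {
    /** Hypothetical helper: parses the comma-separated integers after the first '-'. */
    public static List<Integer> numbersAfterDash(String text) {
        int dash = text.indexOf('-');
        if (dash < 0) {
            return List.of(); // no dash, so nothing to parse
        }
        return Arrays.stream(text.substring(dash + 1).split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())   // tolerate a trailing comma or empty tail
                .map(Integer::valueOf)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(numbersAfterDash("Jhon [B] - 14, 15, 16")); // prints [14, 15, 16]
    }
}
```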
You could...
Make use of a custom delimiter for Scanner for example...
String text = "Jhon [B] - 14, 15, 16";
Scanner parser = new Scanner(text);
parser.useDelimiter(" - ");
if (!parser.hasNext()) {
// This is an error
return;
}
// We know that the string has leading text before the "-"
parser.next();
if (!parser.hasNext()) {
// This is an error
return;
}
String trailingText = parser.next();
parser = new Scanner(trailingText);
parser.useDelimiter(", ");
List<Integer> values = new ArrayList<>(8);
while (parser.hasNextInt()) {
values.add(parser.nextInt());
}
System.out.println(values);
which prints...
[14, 15, 16]
Or you could use a method that extracts signed or unsigned whole or floating point numbers from a string. The method below makes use of the String#replaceAll() method:
/**
* This method will extract all signed or unsigned Whole or floating point
* numbers from a supplied String. The numbers extracted are placed into a
* String[] array in the order of occurrence and returned.<br><br>
*
* It doesn't matter if the numbers within the supplied String have leading
* or trailing non-numerical (alpha) characters attached to them.<br><br>
*
* A Locale can also be optionally supplied so to use whatever decimal symbol
* that is desired otherwise, the decimal symbol for the system's current
* default locale is used.
*
* @param inputString (String) The supplied string to extract all the numbers
* from.<br>
*
* @param desiredLocale (Optional - Locale varArgs) If a locale is desired for a
* specific decimal symbol then that locale can be optionally
* supplied here. Only one Locale argument is expected and used
* if supplied.<br>
*
* @return (String[] Array) A String[] array is returned with each element of
* that array containing a number extracted from the supplied
* Input String in the order of occurrence.
*/
public static String[] getNumbersFromString(String inputString, java.util.Locale... desiredLocale) {
// Get the decimal symbol for the current system's locale.
char decimalSeparator = new java.text.DecimalFormatSymbols().getDecimalSeparator();
/* Is there a supplied Locale? If so, set the decimal
separator to that for the supplied locale */
if (desiredLocale != null && desiredLocale.length > 0) {
decimalSeparator = new java.text.DecimalFormatSymbols(desiredLocale[0]).getDecimalSeparator();
}
/* The first replaceAll() removes all dashes (-) that are preceded
or followed by whitespaces. The second replaceAll() removes all
periods from the input string except those that are part of a floating
point number. The third replaceAll() removes everything else except
the actual numbers. */
return inputString.replaceAll("\\s*\\-\\s{1,}","")
.replaceAll("\\.(?![\\d](\\.[\\d])?)", "")
.replaceAll("[^-?\\d+" + decimalSeparator + "\\d+]", " ")
.trim().split("\\s+");
}
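A different way to get the same kind of result is to search for the numbers directly with Pattern and Matcher rather than deleting everything else. The sketch below is not a drop-in replacement for the method above: NumberFinder and findNumbers are names invented here, the decimal symbol is hard-coded to '.', and a '-' directly in front of a digit is always read as a sign (so "10-12" yields 10 and -12).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberFinder {
    // Optional minus sign, one or more digits, optional fractional part using '.'.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?");

    /** Hypothetical helper: returns every number found, in order of occurrence. */
    public static List<String> findNumbers(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("Jhon [B] - 14, 15, 16")); // prints [14, 15, 16]
    }
}
```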

Retrieve Line Numbers from Diff Patch Match

I am working on a project that compares two large text file versions (around 5000+ lines of text). The newer version contains potentially new and removed content. It is intended to help detect early changes in text versions as a team receives information from that text.
To solve the problem, I use the diff-match-patch library, which allows me to identify removed and new content. In the first step I search for changes.
public void compareStrings(String oldText, String newText){
DiffMatchPatch dmp = new DiffMatchPatch();
// Store the result in the diffs field so that the filter methods can read it.
diffs = dmp.diffMain(oldText, newText, false);
}
Then I filter the list by the keywords INSERT/DELETE to get only the new/removed content.
public String showAddedElements(){
String insertions = "";
for(Diff elem: diffs){
if(elem.operation == Operation.INSERT){
insertions = insertions + elem.text + System.lineSeparator();
}
}
return insertions;
}
However, when I output the contents, I sometimes get only fragments, like (o, contr, ler), when only single characters were removed/added. Instead, I would like to output the whole sentence in which a change occurred.
Is there a way to also retrieve the line number from the DiffMatchPatch where the changes occurred?
I have found a solution by using another library for the line extraction. DiffUtils (the DiffUtils class by Dmitry Naumenko) helped me achieve the desired goal.
/**
* Converts a String to a list of lines by dividing the string at linebreaks.
* @param text The text to be converted to a line list
*/
private List<String> fileToLines(String text) {
List<String> lines = new LinkedList<String>();
Scanner scanner = new Scanner(text);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
lines.add(line);
}
scanner.close();
return lines;
}
/**
* Starts a line-by-line comparison between two strings. The results are stored
* in an internal list for further processing.
*
* @param firstText The first string to be compared
* @param secondText The second string to be compared
*/
public void startLineByLineComparison(String firstText, String secondText){
List<String> firstString = fileToLines(firstText);
List<String> secondString = fileToLines(secondText);
changes = DiffUtils.diff(firstString, secondString).getDeltas();
}
Once the list of changes has been populated, the removed content can be extracted with the following code, where elem.getType() gives the type of difference:
/**
* Returns a String filled with all removed content including line position
* @return String with removed content
*/
public String returnRemovedContent(){
String deletions = "";
for(Delta elem: changes){
if(elem.getType() == TYPE.DELETE){
deletions = deletions + appendLines(elem.getOriginal()) + System.lineSeparator();
}
}
return deletions;
}
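If you would rather stay with a character-level diff, one possible approach is to count newlines while walking the list of edits, which gives the line on which each inserted fragment starts. The sketch below only illustrates that idea: Op and Edit are hypothetical stand-ins for the library's operation and diff types so that the example compiles without any dependency, and DiffLines and insertedLineNumbers are invented names.

```java
import java.util.ArrayList;
import java.util.List;

class DiffLines {
    enum Op { EQUAL, INSERT, DELETE }

    // Hypothetical stand-in for a diff entry: an operation plus the text it covers.
    static final class Edit {
        final Op op;
        final String text;
        Edit(Op op, String text) { this.op = op; this.text = text; }
    }

    /** Returns the 1-based line numbers (in the NEW text) where inserted fragments start. */
    static List<Integer> insertedLineNumbers(List<Edit> edits) {
        List<Integer> lines = new ArrayList<>();
        int line = 1;
        for (Edit e : edits) {
            if (e.op == Op.INSERT) {
                lines.add(line);
            }
            if (e.op != Op.DELETE) {
                // EQUAL and INSERT text is part of the new text, so its newlines advance the counter.
                line += (int) e.text.chars().filter(c -> c == '\n').count();
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Edit> edits = List.of(
                new Edit(Op.EQUAL, "a\nb\n"),
                new Edit(Op.INSERT, "x"),
                new Edit(Op.EQUAL, "c\n"));
        System.out.println(insertedLineNumbers(edits)); // prints [3]
    }
}
```

With the line number in hand, the full line can be looked up in the new text after splitting it at line breaks.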

How to parse data from mongodb document

I am using a change stream to watch for changes in MongoDB. I retrieve the document in the format below; how do I parse it into strings? I need the values of $oid and name.
Full document is
{"_id": {"$oid": "5c60f87a9ea5deac53457e9c"}, "name": "freddy"}
I am using Java code
MongoCursor<ChangeStreamDocument<BasicDBObject>> cursor1 = collection.watch().iterator();
System.out.println("Connection Completely Established 4");
for(int i = 1; i <= 200; i++)
{
ChangeStreamDocument<BasicDBObject> next1 = cursor1.next();
System.out.println("Operation Type is " + next1.getOperationType());
System.out.println("Database Name is" + next1.getDatabaseName());
System.out.println("Full Document is " + next1.getFullDocument());
}
If you know the format of the document string then you can use a method to acquire the data needed. The getBetween() method below will retrieve the information you want from your provided document string, here is how it might be used to achieve this:
String docString = "Full Document is {\"_id\": {\"$oid\": \"5c60f87a9ea5deac53457e9c\"}, \"name\": \"freddy\"}";
String oid = getBetween(docString, "$oid\": \"", "\"}")[0];
String name = getBetween(docString, "name\": \"", "\"}")[0];
System.out.println(oid);
System.out.println(name);
The Console Window will display:
5c60f87a9ea5deac53457e9c
freddy
Here is the getBetween() method:
/**
* Retrieves any string data located between the supplied string leftString
* parameter and the supplied string rightString parameter.<br><br>
* <p>
* <p>
* This method will return all instances of a substring located between the
* supplied Left String and the supplied Right String which may be found
* within the supplied Input String.<br>
*
* @param inputString (String) The string to look for substring(s) in.
*
* @param leftString (String) What may be to the Left side of the substring
* we want within the main input string. Sometimes the
* substring you want may be contained at the very
* beginning of a string and therefore there is no
* Left-String available. In this case you would simply
* pass an empty String ("") to this parameter which
* basically informs the method of this fact. Null can
* not be supplied and will ultimately generate a
* NullPointerException.
*
* @param rightString (String) What may be to the Right side of the
* substring we want within the main input string.
* Sometimes the substring you want may be contained at
* the very end of a string and therefore there is no
* Right-String available. In this case you would simply
* pass an empty String ("") to this parameter which
* basically informs the method of this fact. Null can
* not be supplied and will ultimately generate a
* NullPointerException.
*
* @param options (Optional - Boolean - 2 Parameters):<pre>
*
* ignoreLetterCase - Default is false. This option works against the
* string supplied within the leftString parameter
* and the string supplied within the rightString
* parameter. If set to true then letter case is
* ignored when searching for strings supplied in
* these two parameters. If left at default false
* then letter case is not ignored.
*
* trimFound - Default is true. By default this method will trim
* off leading and trailing white-spaces from found
* sub-string items. General sentences which obviously
* contain spaces will almost always give you a white-
* space within an extracted sub-string. By setting
* this parameter to false, leading and trailing white-
* spaces are not trimmed off before they are placed
* into the returned Array.</pre>
*
* @return (1D String Array) Returns a Single Dimensional String Array
* containing all the sub-strings found within the supplied Input
* String which are between the supplied Left String and supplied
* Right String.
*/
public String[] getBetween(String inputString, String leftString, String rightString, boolean... options) {
// Return nothing if nothing was supplied.
if (inputString.equals("") || (leftString.equals("") && rightString.equals(""))) {
return null;
}
// Prepare optional parameters if any supplied.
// If none supplied then use Defaults...
boolean ignoreCase = false; // Default.
boolean trimFound = true; // Default.
if (options.length > 0) {
if (options.length >= 1) {
ignoreCase = options[0];
}
if (options.length >= 2) {
trimFound = options[1];
}
}
// Remove any ASCII control characters from the
// supplied string (if they exist).
String modString = inputString.replaceAll("\\p{Cntrl}", "");
// Establish a List String Array Object to hold
// our found substrings between the supplied Left
// String and supplied Right String.
List<String> list = new ArrayList<>();
// Use Pattern Matching to locate our possible
// substrings within the supplied Input String.
String regEx = Pattern.quote(leftString)
+ (!rightString.equals("") ? "(.*?)" : "(.*)?")
+ Pattern.quote(rightString);
if (ignoreCase) {
regEx = "(?i)" + regEx;
}
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(modString);
while (matcher.find()) {
// Add the found substrings into the List.
String found = matcher.group(1);
if (trimFound) {
found = found.trim();
}
list.add(found);
}
return list.toArray(new String[0]);
}

Finding the strings in a TreeSet that start with a given prefix

I'm trying to find the strings in a TreeSet<String> that start with a given prefix. I found a previous question asking for the same thing — Searching for a record in a TreeSet on the fly — but the answer given there doesn't work for me, because it assumes that the strings don't include Character.MAX_VALUE, and mine can.
(The answer there is to use treeSet.subSet(prefix, prefix + Character.MAX_VALUE), which gives all strings between prefix (inclusive) and prefix + Character.MAX_VALUE (exclusive), which comes out to all strings that start with prefix except those that start with prefix + Character.MAX_VALUE. But in my case I need to find all strings that start with prefix, including those that start with prefix + Character.MAX_VALUE.)
How can I do this?
To start with, I suggest re-examining your requirements. Character.MAX_VALUE is U+FFFF, which is not a valid Unicode character and never will be; so I can't think of a good reason why you would need to support it.
But if there's a good reason for that requirement, then — you need to "increment" your prefix to compute the least string that's greater than all strings starting with your prefix. For example, given "city", you need "citz". You can do that as follows:
/**
* @param prefix
* @return The least string that's greater than all strings starting with
* prefix, if one exists. Otherwise, returns Optional.empty().
* (Specifically, returns Optional.empty() if the prefix is the
* empty string, or is just a sequence of Character.MAX_VALUE-s.)
*/
private static Optional<String> incrementPrefix(final String prefix) {
final StringBuilder sb = new StringBuilder(prefix);
// remove any trailing occurrences of Character.MAX_VALUE:
while (sb.length() > 0 && sb.charAt(sb.length() - 1) == Character.MAX_VALUE) {
sb.setLength(sb.length() - 1);
}
// if the prefix is empty, then there's no upper bound:
if (sb.length() == 0) {
return Optional.empty();
}
// otherwise, increment the last character and return the result:
sb.setCharAt(sb.length() - 1, (char) (sb.charAt(sb.length() - 1) + 1));
return Optional.of(sb.toString());
}
To use it, you need to use subSet when the above method returns a string, and tailSet when it returns nothing:
/**
* @param allElements - a SortedSet of strings. This set must use the
* natural string ordering; otherwise this method
* may not behave as intended.
* @param prefix
* @return The subset of allElements containing the strings that start
* with prefix.
*/
private static SortedSet<String> getElementsWithPrefix(
final SortedSet<String> allElements, final String prefix) {
final Optional<String> endpoint = incrementPrefix(prefix);
if (endpoint.isPresent()) {
return allElements.subSet(prefix, endpoint.get());
} else {
return allElements.tailSet(prefix);
}
}
See it in action at: http://ideone.com/YvO4b3.
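In case the link above is unavailable, here is a compact, self-contained sketch of the same idea that can be run locally. It inlines the "increment the prefix" step using NavigableSet's inclusive/exclusive range methods; PrefixRangeDemo and withPrefix are names chosen for this example only, and the set is assumed to use natural string ordering.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

class PrefixRangeDemo {
    /** Hypothetical helper: view of the elements of set that start with prefix. */
    static NavigableSet<String> withPrefix(NavigableSet<String> set, String prefix) {
        // Ignore trailing Character.MAX_VALUE chars; they cannot be incremented.
        int end = prefix.length();
        while (end > 0 && prefix.charAt(end - 1) == Character.MAX_VALUE) {
            end--;
        }
        if (end == 0) {
            // No upper bound exists: everything from prefix onwards matches.
            return set.tailSet(prefix, true);
        }
        // Least string greater than every string that starts with prefix.
        String upper = prefix.substring(0, end - 1) + (char) (prefix.charAt(end - 1) + 1);
        return set.subSet(prefix, true, upper, false);
    }

    public static void main(String[] args) {
        NavigableSet<String> words = new TreeSet<>();
        words.add("cit");
        words.add("city");
        words.add("citys");
        words.add("citz");
        System.out.println(withPrefix(words, "city")); // prints [city, citys]
    }
}
```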
If anybody is looking for a shorter version of ruakh's answer:
The first element is set.ceiling(prefix); for the last one, increment the prefix and use set.lower(next_prefix). Note that, unlike ruakh's version, this does not handle an empty prefix or a prefix ending in Character.MAX_VALUE.
public NavigableSet<String> subSetWithPrefix(NavigableSet<String> set, String prefix) {
String first = set.ceiling(prefix);
char[] chars = prefix.toCharArray();
if(chars.length>0)
chars[chars.length-1] = (char) (chars[chars.length-1]+1);
// lower(), not floor(): the incremented prefix itself does not start with prefix.
String last = set.lower(new String(chars));
if(first==null || last==null || last.compareTo(first)<0)
return new TreeSet<>();
return set.subSet(first, true, last, true);
}

Java split the path..?

This is the input as string:
"C:\jdk1.6.0\bin\program1.java"
I need output as:
Path-->C:\jdk1.6.0\bin\
file--->program1.java
extension--->.java
Watch out for the "\" char. I easily got the output for "/".
The File class gives you everything you need:
File f = new File("C:\\jdk1.6.0\\bin\\program1.java");
System.out.println("Path-->" + f.getParent());
System.out.println("file--->" + f.getName());
int idx = f.getName().lastIndexOf('.');
System.out.println("extension--->" + ((idx > 0) ? f.getName().substring(idx) : "") );
EDIT: Thanks Dave for noting that String.lastIndexOf will return -1 if File.getName does not contain '.'.
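One caveat worth keeping in mind: java.io.File only treats "\" as a separator on Windows, so on other platforms the snippet above would see the whole string as a single file name. If the input always uses Windows-style paths regardless of where the code runs, a plain string split is one option. The following is a minimal sketch under that assumption; SplitPath and split are names invented for the example.

```java
class SplitPath {
    /**
     * Hypothetical helper: returns { directory (with trailing separator), file name, extension }.
     * Accepts both '\' and '/' as separators, independent of the host OS.
     */
    static String[] split(String path) {
        int sep = Math.max(path.lastIndexOf('\\'), path.lastIndexOf('/'));
        String dir = path.substring(0, sep + 1);   // empty when there is no separator
        String file = path.substring(sep + 1);
        int dot = file.lastIndexOf('.');
        String ext = dot > 0 ? file.substring(dot) : "";
        return new String[] { dir, file, ext };
    }

    public static void main(String[] args) {
        String[] parts = split("C:\\jdk1.6.0\\bin\\program1.java");
        System.out.println("Path-->" + parts[0]);
        System.out.println("file--->" + parts[1]);
        System.out.println("extension--->" + parts[2]);
    }
}
```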
Consider using an existing solution instead of rolling your own and introducing more code that needs to be tested. FilenameUtils from Apache Commons IO is one example:
http://commons.apache.org/proper/commons-io/javadocs/api-2.4/org/apache/commons/io/FilenameUtils.html
Since Java's File class does not support probing for the extension, I suggest you create a subclass of File that provides this ability:
package mypackage;
/**
* Enhances java.io.File functionality by adding extension awareness.
*/
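public class File extends java.io.File {
// java.io.File has no no-argument constructor, so one must be declared.
public File( String filename ) {
super( filename );
}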
/**
* Returns the extension, including the leading period.
*
* @return An empty string if there is no extension.
*/
public String getExtension() {
String name = getName();
String result = "";
int index = name.lastIndexOf( '.' );
if( index > 0 ) {
result = name.substring( index );
}
return result;
}
}
Now simply substitute your version of File for Java's version; combined with Kurt's answer, this gives you everything you need.
Notice that using a subclass is ideal because if you wanted to change the behaviour (due to a different operating system using a different extension delimiter token), you need only update a single method and your entire application continues to work. (Or if you need to fix a bug, such as trying to execute str.substring( -1 ).)
In other words, if you extract a file extension in more than one place in your code base, you have made a mistake.
Going further, if you wanted to completely abstract the knowledge of the file type (because some operating systems might not use the . separator), you could write:
/**
* Enhances java.io.File functionality by adding extension awareness.
*/
public class File extends java.io.File {
public File( String filename ) {
super( filename );
}
/**
* Returns true if the file type matches the given type.
*/
public boolean isType( String type ) {
return getExtension().equals( type );
}
/**
* Returns the characters after the last period.
*
* @return An empty string if there is no extension.
*/
private String getExtension() {
String name = getName();
String result = "";
int index = name.lastIndexOf( '.' );
if( index > 0 ) {
// Skip the period so that isType( "png" ) matches "image.png".
result = name.substring( index + 1 );
}
return result;
}
}
I would consider this a much more robust solution. This would seamlessly allow substituting a more advanced file type detection mechanism (analysis of file contents to determine the type), without having to change the calling code. Example:
File file = new File( "myfile.txt" );
if( file.isType( "png" ) ) {
System.out.println( "PNG image found!" );
}
If a user saved "myfile.png" as "myfile.txt", the image would still be processed because the advanced version (not shown here) would look for the "PNG" marker that starts every single PNG file in the (cyber) world.
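To give an idea of what such a content-based check could look like, here is a minimal sketch that compares the first bytes of some data against the 8-byte PNG signature. It is not the "advanced version" referred to above; PngSniffer and looksLikePng are hypothetical names, and a real implementation would read those bytes from the file.

```java
import java.util.Arrays;

class PngSniffer {
    // The 8-byte signature that starts every PNG file.
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A
    };

    /** Hypothetical helper: true if data begins with the PNG signature. */
    static boolean looksLikePng(byte[] data) {
        return data.length >= PNG_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(data, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }

    public static void main(String[] args) {
        byte[] plainText = "hello, world".getBytes();
        System.out.println(looksLikePng(plainText)); // prints false
    }
}
```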
You need to compensate for the double slashes returned in Path (if it has been programmatically generated).
//Considering that strPath holds the Path String
String[] strPathParts = strPath.split("\\\\");
//Now to check Windows Drive
System.out.println("Drive Name : "+strPathParts[0]);
