How to pattern match a complex file path

I am using Java 8. I have a scenario where a user can upload a document, and I have to check whether the uploaded path contains the following path format:
"/abc:doc_home{anyWord}/xyz:docFolder{anyWord}/[someWord]/def:library{anyWord}"
The curly braces must appear exactly where I have indicated above, and any word can appear inside them. Is it possible to do this with a regex?

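You have included anyWord more than once in the expression. If you intended these occurrences to be the same word, that cannot be expressed by a regular expression in the strict, formal sense. Java's regex engine does, however, support back-references such as \1, which require a later part of the input to repeat exactly what an earlier group captured.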
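A minimal Java sketch of both readings, assuming {anyWord} means one or more word characters (\w+) between literal braces, and [someWord] stands for any single path segment without a slash (not literal square brackets). The braces have to be escaped: an unescaped { after a character is parsed as the start of a quantifier, so Pattern.compile would throw a PatternSyntaxException. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class PathFormatCheck {
    // Assumption: {anyWord} = literal braces around \w+; [someWord] = any one segment without '/'.
    private static final Pattern PATH_FORMAT = Pattern.compile(
            "/abc:doc_home\\{\\w+\\}/xyz:docFolder\\{\\w+\\}/[^/]+/def:library\\{\\w+\\}");

    // Variant: all three braced words must be identical, enforced with the back-reference \1.
    private static final Pattern SAME_WORD_FORMAT = Pattern.compile(
            "/abc:doc_home\\{(\\w+)\\}/xyz:docFolder\\{\\1\\}/[^/]+/def:library\\{\\1\\}");

    // find() succeeds if the format appears anywhere inside the path.
    public static boolean containsFormat(String path) {
        return PATH_FORMAT.matcher(path).find();
    }

    public static boolean containsSameWordFormat(String path) {
        return SAME_WORD_FORMAT.matcher(path).find();
    }

    public static void main(String[] args) {
        String ok = "/root/abc:doc_home{a1}/xyz:docFolder{b2}/reports/def:library{c3}/file.pdf";
        String same = "/abc:doc_home{x}/xyz:docFolder{x}/misc/def:library{x}";
        String noBraces = "/abc:doc_home/xyz:docFolder{b2}/reports/def:library{c3}";
        System.out.println(containsFormat(ok));           // true
        System.out.println(containsFormat(noBraces));     // false
        System.out.println(containsSameWordFormat(ok));   // false
        System.out.println(containsSameWordFormat(same)); // true
    }
}
```

find() accepts the format anywhere inside a longer path, matching the question's "contains" wording; use matches() instead if the whole path must have exactly this shape.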

Related

Regex to check if file does not have an extension

I want to process files based on their extension. I need to process two kinds of files: ones with a .nc extension and ones with no extension at all.
The file name itself can be anything.
For the .nc extension I have the regex .*.nc, but I need a single regex that combines both cases. I searched but could not find anything. Could anyone help me with a regex that matches either condition?

You can use this pattern: (?(?=.*\.)^.*\.nc$|^.*$)
This is a conditional with a positive lookahead, which checks whether the string contains a dot (the (?=.*\.) part). If it does, the string must end in the .nc extension (^.*\.nc$); if not, the whole string is matched (^.*$). Note that this conditional syntax is supported by engines such as PCRE and .NET, but not by Java's java.util.regex.
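Since Java's regex engine has no conditional construct, one way to express the same logic in Java is two alternatives guarded by opposite lookaheads. This is a sketch; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class ConditionalEmulation {
    // Emulates (?(?=.*\.)^.*\.nc$|^.*$) without a conditional:
    //   (?=.*\.)  the name contains a dot  -> it must end in .nc
    //   (?!.*\.)  the name contains no dot -> accept it as is
    private static final Pattern NC_OR_NONE = Pattern.compile("(?=.*\\.).*\\.nc|(?!.*\\.).*");

    // matches() requires the entire name to match one of the two alternatives.
    public static boolean accepts(String name) {
        return NC_OR_NONE.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"data.nc", "README", "abc.rc"}) {
            System.out.println(name + " -> " + accepts(name));
        }
    }
}
```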

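You can use the regex (\w+\.nc\b|\b\w+\b[^.]). It is meant to match names like abc.nc and abc but not abc.rc, so it only accepts the required extension or no extension. Note that [^.] consumes one more character after the name, so a bare abc at the very end of the input is not matched.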

I think this would also work fine for your case:
^([^\s]+\.nc|[^\s.]+)$
^ and $ assert the position at the start and end of the line respectively. In between, the pattern matches either a run of non-whitespace characters ending in .nc, or a run of non-whitespace characters containing no dot (that is, no extension).
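A short Java sketch of the anchored pattern above, with the dot escaped so it only matches a literal '.'. With Matcher.matches() the whole input must match, so the ^ and $ anchors can be left out. The names here are illustrative.

```java
import java.util.regex.Pattern;

public class NcOrNoExtension {
    // Either a name ending in ".nc", or a name with no dot at all (no extension).
    // An unescaped "." would match any character, so it is written as \\. here.
    private static final Pattern NC_OR_NONE = Pattern.compile("[^\\s]+\\.nc|[^\\s.]+");

    // matches() must consume the entire string, so ^ and $ are implicit.
    public static boolean accepts(String fileName) {
        return NC_OR_NONE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"data.nc", "README", "abc.rc", "archive.nc.bak"}) {
            System.out.println(name + " -> " + accepts(name));
        }
    }
}
```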

How to validate String in Java by matches?

To validate a String in Java I can use String.matches(). I would like to validate against a simple pattern "*.txt", where "*" means anything. For example, test.txt is correct, but test.tt is not, because of ".tt". I tried matches("[*].txt"), but it doesn't work. How can I fix this call? Thanks.

Don't use code you don't understand!
For a problem this simple you can avoid regular expressions entirely and just use
yourString.endsWith(".txt")
and if you want the comparison to be case-insensitive (i.e. also allow ".TXT" or ".tXt"), use
yourString.toLowerCase().endsWith(".txt")
If you want to learn more about regular expressions in Java, I'd recommend a tutorial, for example this one.
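A minimal sketch of the non-regex approach. Passing Locale.ROOT to toLowerCase() is an addition to the answer above; it keeps the lower-casing independent of the JVM's default locale. The method names are illustrative.

```java
import java.util.Locale;

public class TxtSuffix {
    // Plain suffix check, no regex involved.
    public static boolean isTxt(String name) {
        return name.endsWith(".txt");
    }

    // Case-insensitive variant; Locale.ROOT avoids locale-specific lower-casing rules.
    public static boolean isTxtIgnoreCase(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".txt");
    }

    public static void main(String[] args) {
        System.out.println(isTxt("test.txt"));           // true
        System.out.println(isTxt("test.tt"));            // false
        System.out.println(isTxt("TEST.TXT"));           // false
        System.out.println(isTxtIgnoreCase("TEST.TXT")); // true
    }
}
```

Note that endsWith(".txt") also accepts the bare name ".txt".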

You may try this for txt files:
"file.txt".matches("^.*[.]txt$")
^ means the start of the string. .* matches any run of characters greedily, i.e. as many as possible while still letting the whole expression match. [.] matches a literal dot character. The suffix txt is just the literal text txt. Finally, $ anchors the end of the string, which ensures that nothing follows the suffix. (With String.matches() the ^ and $ are actually redundant, because matches() always has to match the entire string.)

Use .+, which means one or more of any character. This rejects input consisting only of .txt:
matches(".+[.]txt")
FYI: [.] simply matches with the dot character.
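A small comparison of the question's attempt and the two regexes above, using String.matches(). This is an illustrative sketch; the method names are hypothetical.

```java
public class TxtRegex {
    // [*] is a character class containing only '*', and the unescaped '.' matches any
    // character, so this only accepts strings like "*.txt", not "anything.txt".
    public static boolean questionAttempt(String s) {
        return s.matches("[*].txt");
    }

    // String.matches() must match the whole string, so ^ and $ are optional here.
    public static boolean zeroOrMore(String s) {
        return s.matches(".*[.]txt"); // also accepts ".txt"
    }

    public static boolean oneOrMore(String s) {
        return s.matches(".+[.]txt"); // needs at least one character before ".txt"
    }

    public static void main(String[] args) {
        for (String s : new String[] {"test.txt", "test.tt", ".txt"}) {
            System.out.println(s + ": [*].txt=" + questionAttempt(s)
                    + ", .*=" + zeroOrMore(s) + ", .+=" + oneOrMore(s));
        }
    }
}
```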

Undoing automatic linkification using Java and Regex

I am working with a database whose entries contain automatically generated HTML links: each URL was converted to
<a href="URL">URL</a>
I want to undo these links: the new software will generate the links on the fly. Is there a way in Java to use .replaceAll or a Regex method that will replace the fragments with just the URL (only for those cases where the URLs match)?
To clarify, based on the questions asked below: the existing entries contain one or more linkified URLs. Here is an example with just one:
I visited <a href="http://www.amazon.com/">http://www.amazon.com/</a> to buy a book.
should be replaced with
I visited http://www.amazon.com/ to buy a book.
If the URL in the href differs in any way from the link text, the replacement should not occur.

You can use this pattern with replaceAll method:
<a (?>[^h>]++|\Bh|h(?!ref\b))*href\s*=\s*["']?(http://)?([^\s"']++)["']?[^>]*>\s*+(?:http://)?\2\s*+<\/a\s*+>
replacement: $1$2
I wrote the pattern in raw form, so before using it in a Java string literal, remember to escape the double quotes and double every backslash.
The main advantage of this pattern is that URLs are compared without the http:// prefix, so more links are matched (for example, when only the href includes http://).
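A sketch of how the pattern above might be applied with replaceAll, written as a Java string literal (quotes escaped, backslashes doubled). The class and method names are illustrative, and the inputs are the question's example plus a link whose text differs from its href.

```java
import java.util.regex.Pattern;

public class Unlinkify {
    // The raw pattern from the answer, escaped for a Java string literal.
    private static final Pattern LINK = Pattern.compile(
            "<a (?>[^h>]++|\\Bh|h(?!ref\\b))*href\\s*=\\s*[\"']?(http://)?([^\\s\"']++)[\"']?[^>]*>"
            + "\\s*+(?:http://)?\\2\\s*+<\\/a\\s*+>");

    // $1 restores "http://" if the href had it (empty otherwise); $2 is the rest of the URL.
    public static String unlinkify(String html) {
        return LINK.matcher(html).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        // prints: I visited http://www.amazon.com/ to buy a book.
        System.out.println(unlinkify(
                "I visited <a href=\"http://www.amazon.com/\">http://www.amazon.com/</a> to buy a book."));
        // Link text differs from the href, so the input is printed unchanged.
        System.out.println(unlinkify("See <a href=\"http://a.com/\">http://b.com/</a>."));
    }
}
```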

First, a reminder that regular expressions are not great for parsing XML/HTML: this HTML should parse out the same as what you've got, but it's really hard to write a regex for it:
<
a
foo="bar"
href="URL">
<nothing/>URL
</a
>
That's why we say "don't use regular expressions to parse XML!"
But it's often a great shortcut. What you're looking for is a back-reference:
<a href="([^"]*)">\1</a>
This will match when the quoted string and the contents of the a-element are the same. The \1 matches whatever was captured in group 1. You can also use named capturing groups if you like a little more documentation in your regular expressions. See Pattern for more options.
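A minimal sketch of the back-reference idea, assuming the simple form <a href="...">...</a> with no other attributes or extra whitespace. It also shows the named-group variant: a group written (?<url>...) is referenced as \k<url> inside the pattern and as ${url} in the replacement string. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class BackReferenceDemo {
    // Group 1 captures the href value; \1 requires the link text to be exactly the same.
    private static final Pattern SAME_TEXT_LINK =
            Pattern.compile("<a href=\"([^\"]*)\">\\1</a>");

    // Same idea with a named group: \k<url> in the pattern, ${url} in the replacement.
    private static final Pattern NAMED =
            Pattern.compile("<a href=\"(?<url>[^\"]*)\">\\k<url></a>");

    public static String unlinkify(String html) {
        return SAME_TEXT_LINK.matcher(html).replaceAll("$1");
    }

    public static String unlinkifyNamed(String html) {
        return NAMED.matcher(html).replaceAll("${url}");
    }

    public static void main(String[] args) {
        String html = "I visited <a href=\"http://www.amazon.com/\">http://www.amazon.com/</a> to buy a book.";
        System.out.println(unlinkify(html));
        System.out.println(unlinkifyNamed(html));
    }
}
```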

Java Regex for Finding a Pattern and Getting Value in It?

I am working on a plugin that parses HTML files. I use a naming convention like this:
<!--$include="a.html" -->
or
<!--$include="a.html"-->
(both forms are treated the same).
I want to search an HTML file for this pattern (similar to server-side includes).
The question is:
Find that pattern and get its value (a.html in my example; it varies).
It should work like this:
while (!finishedWholeFile) {
    fileName = findPatternFunc(htmlFile);
    replaceFunc(fileName, something);
}
PS: I don't know which is better: using a regex in Java or implementing it differently (for example with .indexOf()). If a regex performs well in this situation, I want to use it.
Any ideas?

You mean like this?
<!--\$include="(?<htmlName>[a-z_-]*)\.html"\s?-->
(This is the raw regex; in a Java string literal, escape the quotes and double the backslashes.)
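A sketch of reading the named group with Matcher.group(String), using the pattern above as a Java string literal. The group captures only the base name, without .html. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedIncludeGroup {
    // The dot before "html" is escaped so it only matches a literal '.'.
    private static final Pattern INCLUDE =
            Pattern.compile("<!--\\$include=\"(?<htmlName>[a-z_-]*)\\.html\"\\s?-->");

    // Returns the base names (without ".html") of all include directives, in order.
    public static List<String> baseNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = INCLUDE.matcher(html);
        while (m.find()) {
            names.add(m.group("htmlName")); // named groups can be read by name
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [a, menu]
        System.out.println(baseNames("<!--$include=\"a.html\" --> <!--$include=\"menu.html\"-->"));
    }
}
```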

Read the file into a string, then
str = str.replaceAll("(?<=<!--\\$include=\")[^\"]+(?=\" ?-->)", something);
will replace the filenames with the string something; the string can then be written back to the file.
(Note: this replaces any text inside the double quotes, not just valid filenames.)
If you only want to replace filenames with the html extension, swap the [^\"]+ for [^.]+\\.html.
Using a regex for this task is fine performance-wise, but see e.g.
"How to use regular expressions to parse HTML in Java?" and "Java Regex performance".
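A fuller sketch of the loop from the question using Matcher.find(), assuming the included content is available in a map keyed by file name; the map and the method names are hypothetical stand-ins for the question's findPatternFunc and replaceFunc. Directives whose name is not in the map are left unchanged. StringBuffer is used because the appendReplacement overload that takes a StringBuilder only exists from Java 9 on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IncludeProcessor {
    // <!--$include="name"--> with an optional space before -->; group 1 is the file name.
    private static final Pattern INCLUDE = Pattern.compile("<!--\\$include=\"([^\"]+)\" ?-->");

    // Collect every included file name, in order of appearance.
    public static List<String> findIncludes(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = INCLUDE.matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Replace each directive with the text supplied for that file name (hypothetical map).
    public static String expand(String html, Map<String, String> contents) {
        Matcher m = INCLUDE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown names keep the original directive text.
            String replacement = contents.getOrDefault(m.group(1), m.group());
            // quoteReplacement stops '$' or '\' in the inserted text being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p><!--$include=\"a.html\" --></p><p><!--$include=\"b.html\"--></p>";
        System.out.println(findIncludes(html)); // [a.html, b.html]
        Map<String, String> contents = new HashMap<>();
        contents.put("a.html", "<b>A</b>");
        // prints <p><b>A</b></p><p><!--$include="b.html"--></p>
        System.out.println(expand(html, contents));
    }
}
```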

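I have used this pattern:
"<!--\\$include=\"(.+)(.)(html|htm)\"-->"
Note that (.) matches any character, not only a dot; \\. would match a literal dot. This pattern also does not allow a space before -->.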

Looking for a regex to test for filenames with multiple extensions

I am looking for a regular expression to test for files with more than one extension, e.g. test.1.log.old

If you are only testing file names, this will work:
^[^.]+(\.[^.]+){2,}$
It requires a base name without dots followed by at least two dot-separated extensions. You may need to modify the regex if you match against full paths rather than bare file names.
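A minimal Java sketch of a multiple-extension check, assuming a bare file name rather than a full path; names that start with a dot (such as .bashrc.old) are not counted. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class MultiExtension {
    // A base name with no dots, followed by at least two ".ext" parts.
    private static final Pattern MULTI_EXT = Pattern.compile("[^.]+(\\.[^.]+){2,}");

    // matches() anchors the pattern to the whole name.
    public static boolean hasMultipleExtensions(String fileName) {
        return MULTI_EXT.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"test.1.log.old", "a.b.c", "test.log", "noext"}) {
            System.out.println(name + " -> " + hasMultipleExtensions(name));
        }
    }
}
```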
