How to extract number suffix from a filename


In Java, I have a filename such as ABC.12.txt.gz and I want to extract the number 12 from it. Currently I am using lastIndexOf and taking substrings several times.

You could try using pattern matching
import java.util.regex.Pattern;
import java.util.regex.Matcher;
// ... other code
String fileName = "..."; // Filename with number extension
Pattern pattern = Pattern.compile("^.*?(\\d+).*$"); // Lazy .*? so group 1 captures the whole first run of digits (a greedy .* would leave only the last digit)
// Then try matching
Matcher matcher = pattern.matcher(fileName);
String numberExt = "";
if(matcher.matches()) {
numberExt = matcher.group(1);
} else {
// The filename has no numeric value in it.
}
// Use your numberExt here.

You can just separate every numeric part from alphanumeric ones by using a regular expression:
public static void main(String args[]) {
String str = "ABC.12.txt.gz";
String[] parts = str.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
// view the resulting parts
for (String s : parts) {
System.out.println(s);
}
// do what you want with those values...
}
This will output
ABC.
12
.txt.gz
Then take the parts you need and do what you have to do with them.

We can use something like this to extract the number from a string
String fileName="ABC.12.txt.gz";
String numberOnly= fileName.replaceAll("[^0-9]", "");
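Note that stripping every non-digit joins all digits found anywhere in the name, so a name like A1B.12.txt.gz would give 112. A minimal sketch of a stricter variant follows, assuming the wanted number always forms a dot-delimited segment of its own; the class and method names are made up for illustration.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberSegment {
    // Assumption: the number is a whole dot-delimited segment, e.g. ".12." or a trailing ".12"
    private static final Pattern SEGMENT = Pattern.compile("\\.(\\d+)(?=\\.|$)");

    // Returns the first all-digit segment, or empty if there is none.
    // Integer.parseInt would throw for numbers that do not fit in an int.
    static OptionalInt extract(String fileName) {
        Matcher m = SEGMENT.matcher(fileName);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("ABC.12.txt.gz")); // OptionalInt[12]
        System.out.println(extract("A1B.12.txt.gz")); // OptionalInt[12]
        System.out.println(extract("ABC.txt.gz"));    // OptionalInt.empty
    }
}
```

Returning OptionalInt rather than an empty string makes the "no number" case explicit for the caller.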

Related

Exclude/remove a string between two special characters using regex

I am trying to remove the branch prefix that sits between a - and a / (including the /).
Example:
String name = "Application-2.0.2-bug/TEST-1.0.0.zip";
Expected output:
Application-2.0.2-TEST-1.0.0.zip
I tried the regex below, but it does not work correctly.
String FILENAME = "2.2.1-Application-2.0.2-bug/TEST-1.0.0.zip";
println(FILENAME.replaceAll(".+/", ""))

There are many ways to do this; for example, you can replace every match of \w+\/ with "". Note that \w+ means one or more word characters.
Demo:
public class Main {
public static void main(String[] args) {
String FILENAME = "Application-2.0.2-bug/TEST-1.0.0.zip";
FILENAME = FILENAME.replaceAll("\\w+\\/", "");
System.out.println(FILENAME);
}
}
Output:
Application-2.0.2-TEST-1.0.0.zip

How to remove text between <script></script> tags

I want to remove the content between <script></script> tags. I'm manually checking for the pattern and iterating with a while loop, but I'm getting a StringIndexOutOfBoundsException at this line:
String script = source.substring(startIndex,endIndex-startIndex);
Below is the complete method:
public static String getHtmlWithoutScript(String source) {
String START_PATTERN = "<script>";
String END_PATTERN = " </script>";
while (source.contains(START_PATTERN)) {
int startIndex=source.lastIndexOf(START_PATTERN);
int endIndex=source.indexOf(END_PATTERN,startIndex);
String script=source.substring(startIndex,endIndex);
source.replace(script,"");
}
return source;
}
Am I doing anything wrong here? I'm also getting endIndex=-1. Can anyone help me identify why my code is breaking?

String text = "<script>This is dummy text to remove </script> dont remove this";
StringBuilder sb = new StringBuilder(text);
String startTag = "<script>";
String endTag = "</script>";
//removing the text between script
sb.replace(text.indexOf(startTag) + startTag.length(), text.indexOf(endTag), "");
System.out.println(sb.toString());
If you want to remove the script tags too, add the following line:
String withoutTags = sb.toString().replace(startTag, "").replace(endTag, "");
UPDATE:
If you don't want to use StringBuilder, you can do this:
String text = "<script>This is dummy text to remove </script> dont remove this";
String startTag = "<script>";
String endTag = "</script>";
//removing the text between script
String textToRemove = text.substring(text.indexOf(startTag) + startTag.length(), text.indexOf(endTag));
text = text.replace(textToRemove, "");
System.out.println(text);
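Going back to the loop in the question, two things appear to go wrong: END_PATTERN starts with a space, so indexOf returns -1 whenever the closing tag is not preceded by a space, and the result of source.replace(...) is discarded (strings are immutable), so the loop condition never changes. Below is a sketch of the same loop with both points addressed. It assumes plain <script> tags without attributes, and the class name is hypothetical.

```java
class ScriptStripper {
    private static final String START = "<script>";
    private static final String END = "</script>";

    // Removes every <script>...</script> block, tags included.
    static String strip(String source) {
        StringBuilder sb = new StringBuilder(source);
        int start;
        while ((start = sb.indexOf(START)) >= 0) {
            int end = sb.indexOf(END, start + START.length());
            if (end < 0) {
                break; // unterminated tag: leave the remainder untouched
            }
            sb.delete(start, end + END.length());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "abc"
        System.out.println(strip("a<script>x</script>b<script>y</script>c"));
    }
}
```

Checking for -1 before calling delete is what avoids the out-of-bounds exception.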

You can use a regex to remove the script tag content:
public String removeScriptContent(String html) {
if(html != null) {
String re = "<script>(.*)</script>";
Pattern pattern = Pattern.compile(re);
Matcher matcher = pattern.matcher(html);
if (matcher.find()) {
return html.replace(matcher.group(1), "");
}
}
return html; // no match (or null input): return the input unchanged
}
You have to add these two imports:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

I know I'm probably late to the party, but I would like to offer a tested regex solution.
Note that regex quantifiers are greedy by default. A pattern such as <script>(.*)</script> therefore matches from the first <script> up to the last </script> on the line (or in the whole input, depending on the regex options used), swallowing everything in between, including any text that sits between two separate script blocks.
To match each script block on its own, use a lazy (reluctant) quantifier instead.
Search with lazy matching
<script>(.*?)<\/script>
With that, each match stops at the nearest closing tag.
You can read more about lazy and greedy regex matching in this answer.
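To make the difference concrete, here is a small Java sketch that runs both quantifiers over the same input. The class and method names are illustrative only, and the (?s) flag in the lazy version is an addition so that scripts spanning several lines are also covered.

```java
class GreedyVsLazy {
    // Greedy: .* runs to the LAST </script> on the line
    static String stripGreedy(String html) {
        return html.replaceAll("<script>.*</script>", "");
    }

    // Lazy: .*? stops at the NEAREST </script>; (?s) lets '.' match line breaks too
    static String stripLazy(String html) {
        return html.replaceAll("(?s)<script>.*?</script>", "");
    }

    public static void main(String[] args) {
        String html = "a<script>one</script>b<script>two</script>c";
        System.out.println(stripGreedy(html)); // prints "ac"  (the "b" is lost)
        System.out.println(stripLazy(html));   // prints "abc"
    }
}
```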

This worked for me:
private static String removeScriptTags(String message) {
String scriptRegex = "<(/)?[ ]*script[^>]*>";
Pattern pattern2 = Pattern.compile(scriptRegex);
if(message != null) {
Matcher matcher2 = pattern2.matcher(message);
StringBuffer str = new StringBuffer(message.length());
while(matcher2.find()) {
matcher2.appendReplacement(str, Matcher.quoteReplacement(" "));
}
matcher2.appendTail(str);
message = str.toString();
}
return message;
}
Credit goes to nealvs: https://nealvs.wordpress.com/2010/06/01/removing-tags-from-a-string-in-java/

complex regular expression in Java

I have a rather complex (to me it seems rather complex) problem that I'm using regular expressions in Java for:
I can get any text string that must be of the format:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
I started with a regular expression for extracting the text between the M:/:D:/:C:/:Q: as:
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
And that works fine if the <either a url or string> is just an alphanumeric string. But it all falls apart when the embedded string is a url of the format:
tcp://someurl.something:port
Can anyone help me adjust the above regex to extract the text after :D: when it is either a URL or an alphanumeric string?
Here's an example:
public static void main(String[] args) {
String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
boolean matchFound = false;
ArrayList<String> values = new ArrayList<>();
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
Matcher m3 = Pattern.compile(pattern2).matcher(name);
while (m3.find()) {
matchFound = true;
String m = m3.group(2);
System.out.println("regex found match: " + m);
values.add(m);
}
}
In the above example, the results I want are:
myString1
tcp://someurl.com:8989
myString2
1
Note that the strings can be of variable length and alphanumeric, but some other characters are allowed (such as :// in the URL format, and/or the . and - characters).

You mention that the format is constant:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
Capture groups can do this for you with the pattern:
"M:(.*):D:(.*):C:(.*):Q:(.*)"
Or you can do a String.split() with a pattern of "M:|:D:|:C:|:Q:". However, the split will return an empty element at the first index. Everything else will follow.
public static void main(String[] args) throws Exception {
System.out.println("Regex: ");
String data = "M:<some text>:D:tcp://someurl.something:port:C:<some more text>:Q:<a number>";
Matcher matcher = Pattern.compile("M:(.*):D:(.*):C:(.*):Q:(.*)").matcher(data);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
System.out.println();
System.out.println("String.split(): ");
String[] pieces = data.split("M:|:D:|:C:|:Q:");
for (String piece : pieces) {
System.out.println(piece);
}
}
Results:
Regex:
<some text>
tcp://someurl.something:port
<some more text>
<a number>
String.split():
<some text>
tcp://someurl.something:port
<some more text>
<a number>

To extract the URL/text part you don't need the regular expression. Use
int startPos = input.indexOf(":D:")+":D:".length();
int endPos = input.indexOf(":C:", startPos);
String urlOrText = input.substring(startPos, endPos);
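The snippet above assumes both markers are present; if either is missing, indexOf returns -1 and substring may throw or return the wrong slice. A minimal sketch with those checks added (the class and method names are hypothetical):

```java
import java.util.Optional;

class DField {
    // Returns the text between ":D:" and the next ":C:", or empty if either marker is missing.
    static Optional<String> extractD(String input) {
        int marker = input.indexOf(":D:");
        if (marker < 0) {
            return Optional.empty();
        }
        int start = marker + ":D:".length();
        int end = input.indexOf(":C:", start);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(input.substring(start, end));
    }

    public static void main(String[] args) {
        String input = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
        System.out.println(extractD(input).orElse("<none>")); // prints tcp://someurl.com:8989
    }
}
```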

Assuming you need to do some validation along with the parsing, break the regex into parts like this:
String m_regex = "[\\w.]+"; // in Java, a . inside [] is just a plain dot
String url_regex = "."; // placeholder: there are plenty of URL regexes online, pick your favorite
String d_regex = "(?:" + url_regex + "|\\p{Alnum}+)"; // a URL or a sequence of alphanumeric characters
String c_regex = "[\\w.]+"; // I'm assuming you want this to be a bit more restrictive; not sure
String q_regex = "\\d+"; // what sort of number exactly? assuming any string of digits here
// Group names must be unique within a pattern, so each part gets its own name
String regex = "M:(?<M>" + m_regex + "):"
+ "D:(?<D>" + d_regex + "):"
+ "C:(?<C>" + c_regex + "):"
+ "Q:(?<Q>" + q_regex + ")";
Pattern p = Pattern.compile(regex);
Might be a good idea to keep the pattern as a static field somewhere and compile it in a static block so that the temporary regex strings don't overcrowd some class with basically useless fields.
Then you can retrieve each part by its name:
Matcher m = p.matcher( input );
if (m.matches()) {
String m_part = m.group( "M" );
...
String q_part = m.group( "Q" );
}
You can go a step further by making a RegexGroup interface whose implementing objects each represent one part of the regex, with a name and the actual pattern. You would lose the simplicity, though, and the code becomes harder to understand at a glance. (I wouldn't do this; I'm just pointing out that it's possible and has its own benefits.)
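For reference, here is a self-contained sketch of the named-group approach run against the example string from the question. The URL sub-pattern is a deliberately loose stand-in (scheme://host with an optional :port), not a validated URL regex, and the class and method names are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MessageParser {
    // Hypothetical, loose URL pattern: scheme://host with an optional :port
    private static final String URL = "[a-zA-Z][a-zA-Z0-9+.-]*://[\\w.-]+(?::\\d+)?";

    // Compiled once and kept in a static field, as suggested above
    private static final Pattern MESSAGE = Pattern.compile(
            "M:(?<M>[\\w.]+)"
            + ":D:(?<D>" + URL + "|\\p{Alnum}+)"
            + ":C:(?<C>[\\w.]+)"
            + ":Q:(?<Q>\\d+)");

    // Returns the M, D, C and Q parts in that order, or empty if the input does not match.
    static Optional<String[]> parse(String input) {
        Matcher m = MESSAGE.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {
                m.group("M"), m.group("D"), m.group("C"), m.group("Q") });
    }

    public static void main(String[] args) {
        String input = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
        // prints myString1, tcp://someurl.com:8989, myString2 and 1 on separate lines
        parse(input).ifPresent(parts -> {
            for (String part : parts) {
                System.out.println(part);
            }
        });
    }
}
```

Because the port is part of the URL sub-pattern, the colon in :8989 is not mistaken for a field separator.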

Java regex get exact token value

I have a string like the one below and want to get the value of cn=ADMIN, but I don't know how to do this efficiently with a regex.
group:192.168.133.205:387/cn=ADMIN,cn=groups,dc=mi,dc=com,dc=usa

well ... like this?
package test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexSample {
public static void main(String[] args) {
String str = "group:192.168.133.205:387/cn=ADMIN,cn=groups,dc=mi,dc=com,dc=usa";
Pattern pattern = Pattern.compile("^.*/(.*)$");
Matcher matcher = pattern.matcher(str);
if (matcher.matches()) {
String right = matcher.group(1);
String[] parts = right.split(",");
for (String part : parts) {
System.err.println("part: " + part);
}
}
}
}
Output is:
part: cn=ADMIN
part: cn=groups
part: dc=mi
part: dc=com
part: dc=usa

String bubba = "group:192.168.133.205:387/cn=ADMIN,cn=groups,dc=mi,dc=com,dc=usa";
String target = "cn=ADMIN";
for(String current: bubba.split("[/,]")){
if(current.equals(target)){
System.out.println("Got it");
}
}

Pattern for regex
cn=([a-zA-Z0-9]+?),
Your name will be in group 1 of matcher. You can extend character classes if you allow spaces etc.
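A short sketch of how that pattern could be used with Matcher.find(); the class and method names are illustrative. Note that group 1 holds only the value (ADMIN), and that the pattern as written needs a trailing comma, so a cn= entry at the very end of the string would not match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CnExtractor {
    private static final Pattern CN = Pattern.compile("cn=([a-zA-Z0-9]+?),");

    // Returns the value of the first cn= entry that is followed by a comma.
    static Optional<String> firstCn(String s) {
        Matcher m = CN.matcher(s);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "group:192.168.133.205:387/cn=ADMIN,cn=groups,dc=mi,dc=com,dc=usa";
        System.out.println(firstCn(s).orElse("<none>")); // prints ADMIN
    }
}
```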

How to parse this string in Java?

prefix/dir1/dir2/dir3/dir4/..
How to parse the dir1, dir2 values out of the above string in Java?
The prefix here can be:
/usr/local/apache2/resumes

If you want to split the String at the / character, the String.split method will work:
For example:
String s = "prefix/dir1/dir2/dir3/dir4";
String[] tokens = s.split("/");
for (String t : tokens)
System.out.println(t);
Output
prefix
dir1
dir2
dir3
dir4
Edit
Case with a / in the prefix, and we know what the prefix is:
String s = "slash/prefix/dir1/dir2/dir3/dir4";
String prefix = "slash/prefix/";
String noPrefixStr = s.substring(s.indexOf(prefix) + prefix.length());
String[] tokens = noPrefixStr.split("/");
for (String t : tokens)
System.out.println(t);
The substring without the prefix "slash/prefix/" is made by the substring method. That String is then run through split.
Output:
dir1
dir2
dir3
dir4
Edit again
If this String actually represents a file path, using the File class is probably preferable to string manipulation. Classes like File, which already take into account all the intricacies of dealing with file paths, are going to be more robust.

...
String str = "bla!/bla/bla/";
String parts[] = str.split("/");
// To get the first "bla!"
String dir1 = parts[0];

In this case, why not use new File("prefix/dir1/dir2/dir3/dir4") and go from there?

String str = "/usr/local/apache/resumes/dir1/dir2";
String prefix = "/usr/local/apache/resumes/";
if( str.startsWith(prefix) ) {
str = str.substring(prefix.length()); // drop the prefix, keeping "dir1/dir2"
String parts[] = str.split("/");
// dir1=parts[0];
// dir2=parts[1];
} else {
// It doesn't start with your prefix
}

String result;
String str = "/usr/local/apache2/resumes/dir1/dir2/dir3/dir4";
String regex ="(dir)+[\\d]";
Matcher matcher = Pattern.compile( regex ).matcher( str);
while (matcher.find( ))
{
result = matcher.group();
System.out.println(result);
}
output--
dir1
dir2
dir3
dir4

Using String.split method will surely work as told in other answers here.
Also, the StringTokenizer class can be used to parse the String using / as the delimiter.
import java.util.StringTokenizer;
public class Test
{
public static void main(String []args)
{
String s = "prefix/dir1/dir2/dir3/dir4/..";
StringTokenizer tokenizer = new StringTokenizer(s, "/");
String dir1 = tokenizer.nextToken();
String dir2 = tokenizer.nextToken();
System.out.println("Dir 1 : "+dir1);
System.out.println("Dir 2 : " + dir2);
}
}
Gives the output as :
Dir 1 : prefix
Dir 2 : dir1
Here you can find more about StringTokenizer.

If it's a File, you can get the parts by creating an instance of File and then asking for its segments.
This is good because it'll work regardless of the direction of the slashes; it's platform independent (except for the "drive letters" in windows...)
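One way to sketch this idea uses java.nio.file.Path, a swapped-in alternative to java.io.File that exposes the name elements of a path directly. The class and method names below are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PathSegments {
    // Returns the name elements of 'full' that come after 'prefix',
    // or an empty list when 'full' does not start with 'prefix'.
    static List<String> segmentsAfter(String prefix, String full) {
        Path base = Paths.get(prefix);
        Path path = Paths.get(full);
        if (!path.startsWith(base)) {
            return Collections.emptyList();
        }
        List<String> names = new ArrayList<>();
        for (Path name : base.relativize(path)) { // Path is Iterable<Path>
            names.add(name.toString());
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [dir1, dir2, dir3, dir4]
        System.out.println(segmentsAfter(
                "/usr/local/apache2/resumes",
                "/usr/local/apache2/resumes/dir1/dir2/dir3/dir4"));
    }
}
```

Path.startsWith compares whole name elements, so a prefix of /usr/local/apache2/res would not be treated as a match here.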

public class Test {
public static void main(String args[]) {
String s = "pre/fix/dir1/dir2/dir3/dir4/..";
String prefix = "pre/fix/"; // include the trailing slash so the first token is not empty
String[] tokens = s.substring(prefix.length()).split("/");
for (int i=0; i<tokens.length; i++) {
System.out.println(tokens[i]);
}
}
}

String.split(String regex) is convenient, but if you don't need regular expression handling, go with the substring(..) example, java.util.StringTokenizer, or Apache Commons Lang. Avoiding regular expressions can make this 1 to 2 orders of magnitude faster.

String s = "prefix/dir1/dir2/dir3/dir4";
String parts[] = s.split("/");
System.out.println(parts[0]); // "prefix"
System.out.println(parts[1]); // "dir1"
...
