Extracting substrings from a string in Java


I have a number of files and they are all called something like name_version_xyz.ext.
In my Java code I need to extract the name and the version part of the filename. I can accomplish this using lastIndexOf where I look for underscore, but I don't think that's the nicest solution. Can this be done with a regexp somehow?
Note that the "name" part can contain any number of underscores.

If you are guaranteed that the last part of your file names is _xyz.ext, then lastIndexOf really is the cleanest way to do it. (If you aren't guaranteed this, then you will need to figure out something else, of course.)
As the saying goes with regular expressions:
If you solve a problem with regular expressions, you now have two problems.
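For reference, the lastIndexOf approach recommended here could look roughly like this. It is a sketch that assumes the version and xyz parts never contain underscores (only "name" may); the class and method names are made up for illustration.

```java
public class FileNameParts {
    // Hypothetical helper: splits "name_version_xyz.ext" into {name, version}.
    // Assumes "name" may contain underscores, but version and xyz do not.
    static String[] nameAndVersion(String fileName) {
        int last = fileName.lastIndexOf('_');                 // underscore before xyz
        int secondLast = fileName.lastIndexOf('_', last - 1); // underscore before version
        if (last < 0 || secondLast < 0) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        String name = fileName.substring(0, secondLast);
        String version = fileName.substring(secondLast + 1, last);
        return new String[] { name, version };
    }

    public static void main(String[] args) {
        String[] parts = nameAndVersion("my_great_tool_1.2_xyz.ext");
        System.out.println("name = " + parts[0]);    // my_great_tool
        System.out.println("version = " + parts[1]); // 1.2
    }
}
```

Searching from the right is what lets the name keep any number of underscores.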

You could use a regex, but I think it is a bit of overkill in this case, so I would personally stick with your current solution.
It works, it is not too complicated, and that's why I don't see any reason to switch to another approach.

If you don't want to use a regular expression, I think the easiest solution is to take each file name without its extension and then split it (this assumes the literal text _version_ separates the name from the version):
String file = "blah_blah_version_123";
String[] tmp = file.split("_version_");
System.out.println("name = " + tmp[0]);
System.out.println("version = " + tmp[1]);
Output:
name = blah_blah
version = 123

Yes; as a Java string, the regexp would look something like this (untested):
(.+)_(\\d+)_([^_]+)\\.([^.]+)
"name" would be group(1), "version" group(2), xyz group(3), and ext group(4). The greedy (.+) backtracks only as far as needed, so the name can keep all of its underscores.
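A small sketch of using that pattern with Pattern and Matcher. It assumes the version is purely numeric (as \\d+ requires) and uses matches() so the whole file name has to fit; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameRegex {
    // Groups: 1 = name, 2 = version (digits only), 3 = xyz, 4 = extension.
    private static final Pattern FILE_NAME =
            Pattern.compile("(.+)_(\\d+)_([^_]+)\\.([^.]+)");

    // Hypothetical helper: returns {name, version, xyz, ext}, or null if the
    // file name does not have the expected shape.
    static String[] parse(String fileName) {
        Matcher m = FILE_NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = parse("my_great_tool_12_xyz.ext");
        System.out.println("name = " + parts[0]);    // my_great_tool
        System.out.println("version = " + parts[1]); // 12
    }
}
```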

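Use a StringTokenizer:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
(its Javadoc describes it as a legacy class and recommends String.split() or the java.util.regex package for new code)
Or alternatively, String.split():
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#split%28java.lang.String%29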

Related

Java Regex for Finding a Pattern and Getting Value in It?

I am working on a plugin that will parse HTML files. I have a naming convention like this:
<!--$include="a.html" -->
or
<!--$include="a.html"-->
(both forms should be recognized).
Following this pattern (similar to server-side includes), I want to search an HTML file.
The question is:
How do I find that pattern and get its value (a.html in my example; it varies)?
It should work like this:
while(!notFinishedWholeFile){
fileName = findPatternFunc(htmlFile)
replaceFunc(fileName,something)
}
PS: I don't know which is better: using a regex in Java or implementing it differently (e.g. with .indexOf()). If a regex performs well in this situation, I want to use it.
Any ideas?

You mean like this? (regex syntax; written as a Java string literal, every backslash must be doubled)
<!--\$include=\"(?<htmlName>[a-z_-]*)\.html\"\s?-->
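A sketch of the question's "find, then replace" loop using Matcher.find() with a named group (Java 7+). Unlike the pattern above, this assumed variant accepts any file name up to the closing quote ([^"]+) so that the captured value is the full a.html; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IncludeFinder {
    // \\s? allows both <!--$include="a.html" --> and <!--$include="a.html"-->.
    private static final Pattern INCLUDE =
            Pattern.compile("<!--\\$include=\"(?<htmlName>[^\"]+)\"\\s?-->");

    // Hypothetical helper: collects every included file name, in document order.
    static List<String> findIncludes(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = INCLUDE.matcher(html);
        while (m.find()) {                      // one iteration per include directive
            names.add(m.group("htmlName"));
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<p><!--$include=\"a.html\" --></p><!--$include=\"b.html\"-->";
        System.out.println(findIncludes(html)); // [a.html, b.html]
    }
}
```

A single pass of find() replaces the hand-written "while not finished" loop; the matcher keeps track of where the previous match ended.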

Read the file into a string, then
str = str.replaceAll("(?<=<!--\\$include=\")[^\"]+(?=\" ?-->)", something);
will replace the filenames with the string something, then the string can be written back to the file.
(Note: this replaces any text inside the double quotes, not just valid filenames.)
If you only want to replace file names with the .html extension, swap the [^\"]+ for [^.\"]+\\.html.
Using regex for this task is fine performance-wise, but see e.g.
How to use regular expressions to parse HTML in Java? and Java Regex performance etc.
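A sketch of the replaceAll approach from this answer, wrapped in a helper. Two additions are assumptions rather than part of the answer: Matcher.quoteReplacement, which stops a $ or \ in the new value from being read as a group reference, and the Java 11+ Files.readString/Files.writeString calls for the "read the file, write it back" step. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;

public class IncludeReplacer {
    // Lookbehind/lookahead: only the text between the quotes is matched,
    // so the surrounding <!--$include="..." --> syntax stays intact.
    private static final String INCLUDE_VALUE = "(?<=<!--\\$include=\")[^\"]+(?=\" ?-->)";

    static String replaceIncludes(String html, String replacement) {
        return html.replaceAll(INCLUDE_VALUE, Matcher.quoteReplacement(replacement));
    }

    // Hypothetical helper: rewrite a file in place (requires Java 11+).
    static void replaceIncludesInFile(Path file, String replacement) throws IOException {
        String html = Files.readString(file);
        Files.writeString(file, replaceIncludes(html, replacement));
    }

    public static void main(String[] args) {
        String html = "<!--$include=\"a.html\" --><!--$include=\"b.html\"-->";
        System.out.println(replaceIncludes(html, "c$1.html"));
    }
}
```

Without quoteReplacement, a replacement such as c$1.html would make replaceAll look for a capturing group 1 that the pattern does not have.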

I have used this pattern:
"<!--\\$include=\"([^\"]+)(\\.)(html|htm)\"\\s?-->"

Best way to detect logical connectors

I'm trying to detect logical connectors (AND, OR, NOT) in a string using Java. What I would like to do is:
Given a string (e.g. ((blue) AND (yellow) OR (pink))), separate each word and put them in a List. The result should be something like {"blue","yellow","pink"}.
I know that to match the connectors I need a regex like \b(AND|OR|NOT)\b.
But I don't know how to return each word before or after a connector.
Another question: is it useful to use a regex here, or should I use contains()?

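How about this?
import java.util.Arrays;
String s = "((blue) AND (yellow) OR (pink))";
s = s.replaceAll("\\(|\\)", "").trim();                // "blue AND yellow OR pink"
String[] words = s.split("\\s*\\b(AND|OR|NOT)\\b\\s*"); // also removes the spaces around each connector
System.out.println(Arrays.toString(words));
Output:
[blue, yellow, pink]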

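String s = "((blue) AND (yellow) OR (pink))";
// \\b, not \b: inside a Java string literal \b is a backspace character
String[] parts = s.split("\\bAND\\b|\\bNOT\\b|\\bOR\\b"); // parts still contain the parentheses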

You could try using string.split("AND|OR|NOT");.
EDIT: oops, forgot \b (written \\b inside a Java string literal):
string.split("\\b(AND|OR|NOT)\\b");

Parsing this kind of string is not a task for regular expressions: a regular expression corresponds to a finite automaton, which cannot keep track of arbitrarily deep nesting of parentheses.
You need some sort of pushdown automaton for this task.
http://en.wikipedia.org/wiki/Pushdown_automaton
The easiest way to do this is with a recursion that identifies a "logical string structure"
(...) AND (...), (...) OR (...), NOT (...), etc.,
strips the parentheses, and repeats until you find a string that doesn't match this kind of structure.
That string is what you're looking for.
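A recursive-descent parser is one concrete way to do this kind of recursion; the call stack plays the role of the pushdown automaton's stack. This is a sketch, not the answerer's code, and its grammar is an assumption: NOT applies to the following term, AND and OR are treated alike, and operands are runs of \w characters. The class name OperandExtractor is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperandExtractor {
    // Tokens are "(", ")" or a run of word characters (operands and connectors).
    private static final Pattern TOKEN = Pattern.compile("\\(|\\)|\\w+");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private OperandExtractor(String input) {
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    // Returns the operand words in order, e.g. [blue, yellow, pink].
    static List<String> operands(String input) {
        OperandExtractor p = new OperandExtractor(input);
        List<String> out = new ArrayList<>();
        p.parseExpression(out);
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return out;
    }

    // expression := term (("AND" | "OR") term)*
    private void parseExpression(List<String> out) {
        parseTerm(out);
        while (pos < tokens.size()
                && (tokens.get(pos).equals("AND") || tokens.get(pos).equals("OR"))) {
            pos++;                        // skip the connector
            parseTerm(out);
        }
    }

    // term := "NOT" term | "(" expression ")" | word
    private void parseTerm(List<String> out) {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String t = tokens.get(pos++);
        if (t.equals("NOT")) {
            parseTerm(out);
        } else if (t.equals("(")) {
            parseExpression(out);         // recurse into the nested structure
            if (pos >= tokens.size() || !tokens.get(pos).equals(")")) {
                throw new IllegalArgumentException("Missing ')'");
            }
            pos++;                        // strip the closing parenthesis
        } else if (t.equals(")")) {
            throw new IllegalArgumentException("Unexpected ')'");
        } else {
            out.add(t);                   // no further structure: this is an operand
        }
    }

    public static void main(String[] args) {
        System.out.println(operands("((blue) AND (yellow) OR (pink))"));    // [blue, yellow, pink]
        System.out.println(operands("NOT (red) AND ((green) OR (blue))")); // [red, green, blue]
    }
}
```

Because the parser tracks the nesting itself, unbalanced input such as ((blue) raises an exception instead of silently producing wrong words.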

Java String quote delimiter

Is there any way in Java to use a special delimiter at the start and the end of a String to avoid having to backslash all of the quotes within that String?
i.e. not have to do this:
String s = "Quote marks like this \" are just the best, here are a few more \" \" \"";

No, there is no such option. Sorry.

No, there's nothing like C#'s verbatim string literals or Groovy's slashy strings, for example.
On the other hand, it's the kind of feature which may be included in the future. It's not as if it would require any fundamental changes to the type system. I'd be hugely surprised if it made it into Java 7 this late in the day, though, and I haven't seen any suggestions that it'll be in Java 8... so you're in for a long wait :(

The only way to achieve this is to put your strings in some other file and read them from Java, for instance a resource bundle.

It's not possible as of now, and maybe not in the future either.
If you can tell us what you are looking for and why, we can definitely suggest some more alternatives.
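These answers predate Java 15, which added text blocks: a literal opened and closed with """ in which single " characters need no backslash. A minimal sketch (class and field names are illustrative; requires a Java 15+ compiler). Note the assumption that the closing delimiter sits on its own line, so the value ends with a newline, and the common indentation is stripped automatically.

```java
public class TextBlockDemo {
    // Text block (Java 15+): the " characters inside need no escaping.
    // The closing delimiter is on its own line, so QUOTED ends with "\n".
    static final String QUOTED = """
            Quote marks like this " are just the best, here are a few more " " "
            """;

    public static void main(String[] args) {
        String escaped = "Quote marks like this \" are just the best, here are a few more \" \" \"\n";
        System.out.println(QUOTED.equals(escaped)); // true
        System.out.print(QUOTED);
    }
}
```

Three consecutive quotes inside a text block still have to be escaped, since """ would end the literal.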

Java Regex - exclude empty tags from xml

Let's say I have three XML strings:
String logToSearch = "<abc><number>123456789012</number></abc>";
String logToSearch2 = "<abc><number xsi:type=\"soapenc:string\" /></abc>";
String logToSearch3 = "<abc><number /></abc>";
I need a pattern that finds the number tag only if the tag contains a value, i.e. a match should be found only in logToSearch.
I'm not looking for the number itself; rather, matcher.find() should return true only for the first string.
For now I have this:
Pattern pattern = Pattern.compile("<(" + patternString + ").*?>",
Pattern.CASE_INSENSITIVE);
where patternString is simply "number". I tried "<(" + patternString + ")[^/>].*?>", but it didn't work because in [^/>] each character is treated separately.
Thanks

This is absolutely the wrong way to parse XML. In fact, if you need more than just the basic example given here, there's provably no way to solve the more complex cases with regex.
Use a simple XML parser, like XOM. Then, using XPath, query for the elements and filter out those without data. I can only imagine that this question is a precursor to future headaches unless you modify your approach right now.
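A sketch of the parser-plus-XPath idea. It swaps in the JDK's built-in DOM parser and javax.xml.xpath instead of XOM, so the names here are JDK APIs, not XOM's. The parser is left non-namespace-aware (the default) because the question's second string uses an undeclared xsi: prefix; the tag name is concatenated into the XPath expression, so it is assumed to be a trusted, simple element name. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NonEmptyTagXPath {
    // True if at least one <tagName> element contains non-whitespace text.
    static boolean hasNonEmptyTag(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String expr = "boolean(//" + tagName + "[normalize-space(.) != ''])";
            return (Boolean) xpath.evaluate(expr, doc, XPathConstants.BOOLEAN);
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNonEmptyTag("<abc><number>123456789012</number></abc>", "number"));
        System.out.println(hasNonEmptyTag("<abc><number xsi:type=\"soapenc:string\" /></abc>", "number"));
        System.out.println(hasNonEmptyTag("<abc><number /></abc>", "number"));
    }
}
```

The XPath test looks at the element's text, so it does not care whether an empty element is written as <number/>, <number></number>, or carries attributes.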

So a search for "<number[^/>]*>" would find the opening tag. If you want to be sure it isn't empty, try "<number[^/>]*>[^<]" or "<number[^/>]*>[0-9]"
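Plugged into the question's setup, the second suggestion could look like this. It is a sketch that keeps the question's CASE_INSENSITIVE flag and assumes Pattern.quote around the tag name; the caveats are that an attribute value containing / also defeats [^/>]*, and whitespace-only content still counts as "not empty". The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class NonEmptyTagRegex {
    // Opening <tag ...> that is not self-closing ([^/>]* cannot consume the "/"),
    // immediately followed by a character other than '<' (i.e. some content).
    static Pattern nonEmptyTag(String tagName) {
        return Pattern.compile("<" + Pattern.quote(tagName) + "[^/>]*>[^<]",
                Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern p = nonEmptyTag("number");
        String[] logs = {
            "<abc><number>123456789012</number></abc>",
            "<abc><number xsi:type=\"soapenc:string\" /></abc>",
            "<abc><number /></abc>"
        };
        for (String log : logs) {
            System.out.println(p.matcher(log).find()); // true, false, false
        }
    }
}
```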

replacing regex in java string

I have this java string:
String bla = "<my:string>invalid_content</my:string>";
How can I replace the "invalid_content" piece?
I know I should use something like this:
bla.replaceAll(regex,"new_content");
in order to have:
"<my:string>new_content</my:string>";
but I can't discover how to create the correct regex
help please :)

You could do something like
String resultString = subjectString.replaceAll("(<my:string>)(.*)(</my:string>)", "$1whatever$3");

Mark's answer will work, but can be improved with two simple changes:
The central parentheses are redundant if you're not using that group.
Making it non-greedy will help if you have multiple my:string tags to match.
Giving:
String resultString = subjectString.replaceAll
( "(<my:string>).*?(</my:string>)" , "$1whatever$2" );
But that's still not how I'd write it: the replacement can be simplified using lookbehind and lookahead, and you can avoid repeating the tag name, like this:
String resultString = subjectString.replaceAll
( "(?<=<(my:string)>).*?(?=</\\1>)" , "whatever" );
Of course, this latter one may be less friendly to those who don't yet know regex; however, it is more maintainable/flexible, so it is worth using if you might need to match more than just my:string tags.
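To see why the non-greedy change matters, here is a small sketch comparing Mark's greedy pattern with the non-greedy one on an input that contains two my:string elements (the input string and the class/method names are made up for illustration).

```java
public class ReplaceTagContent {
    // Greedy: .* runs to the LAST </my:string>, so both elements collapse into one.
    static String greedy(String s) {
        return s.replaceAll("(<my:string>)(.*)(</my:string>)", "$1new_content$3");
    }

    // Non-greedy: .*? stops at the FIRST closing tag, so each element is replaced separately.
    static String lazy(String s) {
        return s.replaceAll("(<my:string>).*?(</my:string>)", "$1new_content$2");
    }

    public static void main(String[] args) {
        String bla = "<my:string>a</my:string><my:string>b</my:string>";
        System.out.println(greedy(bla)); // <my:string>new_content</my:string>
        System.out.println(lazy(bla));   // <my:string>new_content</my:string><my:string>new_content</my:string>
    }
}
```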

See Java regex tutorial and check out character classes and capturing groups.

For a simple substitution, the PCRE would just be:
/invalid_content/
What more do you want?

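Is invalid_content a fixed value? If so, you could simply replace it with your new content using:
bla = bla.replaceAll("invalid_content","new_content");
(bla.replace("invalid_content","new_content") works too, and treats the target as literal text rather than as a regex.)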
