I have a problem - I can't seem to be able to remove the new lines/spaces from the beginning/end of a string. I use \s in the beginning and end of the regex and even use .trim() after I get the string, but to no avail.
public void extractInfo(String mydata) {
// regex to extract the user name
Pattern pattern = Pattern.compile("user:\\s*(.*)\\s+branch");
Matcher matcher = pattern.matcher(mydata);
// regex to extract the branch name
Pattern pattern2 = Pattern.compile("branch:\\s*(.*)\\s+changed");
Matcher matcher2 = pattern2.matcher(mydata);
// regex to extract the comment and write it in a variable
comment = mydata.replaceAll("(?s)\\s.*java;[0-9,.]+|.*java;NONE\\s", "");
// put the author name in a variable
matcher.find();
author = matcher.group(1).toString();
// put the branch name in a variable
matcher2.find();
branch = matcher2.group(1).toString();
author.trim();
comment.trim();
branch.trim();
}
This is what I use to extract the info.
This is the output I get (lines kept), after I append the extracted information using StringBuilder:
git log --all -100 --before="2013-03-11" --branches=HEAD
--author="\(cholakov\)" --grep="^[#]*[0]*23922:[ ]*user:
Fixed the message for defaulted bonds " --pretty="%H - %s ; %ad"
The new line after user: is what causes the whole command to fail when I try to execute it in cmd, that's what I need fixed.
And this is my input (can't seem to be able to keep the formatting, DataObjectParser.java;1.94 is on a new line and there is no line skipped between each line):
user: cholakov
branch: HEAD
changed files:
DataObjectParser.java;1.94
Fixed the message for defaulted bonds
author.trim();
is a no-op since String is an immutable class. Use
author = author.trim();
Calling author.trim() returns a new String; it does not change the String you call it on.
The trim function returns a copy of the string, with leading and trailing whitespace omitted. You should do this instead:
author = author.trim();
comment = comment.trim();
branch = branch.trim();
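To see the difference for yourself, here is a minimal sketch. The class and method names are made up for illustration; it only compares discarding the result of trim() with assigning it back.

```java
public class TrimDemo {
    // Calls trim() but throws the result away, as in the question.
    static String discardResult(String s) {
        s.trim();   // returns a new String that nobody keeps
        return s;   // still has its original whitespace
    }

    // Assigns the trimmed copy back before returning it.
    static String keepResult(String s) {
        s = s.trim();
        return s;
    }

    public static void main(String[] args) {
        String raw = "\n cholakov \n";
        System.out.println("[" + discardResult(raw) + "]"); // whitespace is still there
        System.out.println("[" + keepResult(raw) + "]");    // prints "[cholakov]"
    }
}
```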
I think that you can completely remove the .trim() in the end if you change your regex a bit:
Pattern pattern = Pattern.compile("user:\\s*(.*?)\\s+branch");
Pattern pattern2 = Pattern.compile("branch:\\s*(.*?)\\s+changed");
comment = mydata.replaceAll("(?s)\\s*.*java;(?:[0-9,.]+|NONE)\\s*", "");
I tweaked your regex a little in each; namely made some (.*) into (.*?) so that you can remove all the trailing spaces and simplified your replace comment a bit. Try to see if that solves your issues ^^
EDIT:
Try running one last replace on the comment:
comment = comment.replaceAll("^\\s*|\\s*$", "");
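Putting the pieces together, a sketch of the whole extraction might look like the following. CommitInfo and parse are hypothetical names, and the input is assumed to have exactly the layout shown in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommitInfo {
    private static final Pattern USER = Pattern.compile("user:\\s*(.*?)\\s+branch");
    private static final Pattern BRANCH = Pattern.compile("branch:\\s*(.*?)\\s+changed");

    // Returns {author, branch, comment}.
    // author and branch are "" when their pattern does not match.
    static String[] parse(String data) {
        String author = firstGroup(USER, data);
        String branch = firstGroup(BRANCH, data);
        // Drop everything up to and including the "...java;<revision>" line,
        // then assign the trimmed copy (trim() alone would change nothing).
        String comment = data
                .replaceAll("(?s)\\s*.*java;(?:[0-9,.]+|NONE)\\s*", "")
                .trim();
        return new String[] { author, branch, comment };
    }

    private static String firstGroup(Pattern p, String data) {
        Matcher m = p.matcher(data);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        String data = "user: cholakov\nbranch: HEAD\nchanged files:\n"
                + "DataObjectParser.java;1.94\nFixed the message for defaulted bonds\n";
        for (String field : parse(data)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Each value is printed in brackets so that any leftover whitespace would be visible.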
Try running this:
public class ReadFile {
public static void main(String[] args) {
String line = "\n Java ";
System.out.println(line.trim());
}
}
Related
I have a file with a long string and I would like to split it by a specific item, i.e.
String line = "{{[Metadata{\"this, is my first, string\"}]},{[Metadata{\"this, is my second, string\"}]},{[Metadata{\"this, is my third string\"}]}}";
String[] tab = line.split("(?=\\bMetadata\\b)");
So now when I iterate my tab I will get lines starting from word: "Metadata" but I would like lines starting from:
"{[Metadata"
I've tried something like:
String[] tab = line.split("(?=\\b{[Metadata\\b)");
but it doesn't work.
Can anyone help me with how to do that, please?
You may use
(?=\{\[Metadata\b)
See a demo on regex101.com.
Note that the backslashes need to be escaped in Java so that it becomes
(?=\\{\\[Metadata\\b)
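As a quick check, here is a small sketch of that split applied to the string from the question (SplitDemo and splitAtMetadata are made-up names). Because the lookahead is zero-width, nothing is consumed and each piece keeps its "{[Metadata" prefix.

```java
public class SplitDemo {
    // Splits before every "{[Metadata" without removing it from the pieces.
    static String[] splitAtMetadata(String line) {
        return line.split("(?=\\{\\[Metadata\\b)");
    }

    public static void main(String[] args) {
        String line = "{{[Metadata{\"this, is my first, string\"}]},"
                + "{[Metadata{\"this, is my second, string\"}]},"
                + "{[Metadata{\"this, is my third string\"}]}}";
        for (String part : splitAtMetadata(line)) {
            System.out.println(part);
        }
    }
}
```

Note that the first element is the lone outer "{" that precedes the first marker, so you may want to skip index 0.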
Here is a solution using a formal pattern matcher. We can try matching your content using the following regex:
(?<=Metadata\\{\")[^\"]+
This uses a lookbehind to check for the Metadata marker, ending with a double quote. Then, it matches any content up to the closing double quote.
String line = "{{[Metadata{\"this, is my first, string\"}]},{[Metadata{\"this, is my second, string\"}]},{[Metadata{\"this, is my third string\"}]}}";
String pattern = "(?<=Metadata\\{\")[^\"]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find( )) {
System.out.println(m.group(0));
}
this, is my first, string
this, is my second, string
this, is my third string
I have a String
String string = "-minY:50 -maxY:100 -minVein:8 -maxVein:10 -meta:0 perChunk:5;";
And I want to somehow get the -meta:0 out of it with regex (replace everything except -meta:0). I made a regex which deletes -meta:0, but I can't make it delete everything except -meta:0.
I tried using some other regex but it was ignoring whole line when I had -meta:[0-9] in it, and like you can see I have one line for everything.
This is how it has been deleting -meta:0 from the String:
String meta = string.replaceAll("( -meta:[0-9])", "");
System.out.println(meta);
I just somehow want to reverse that and delete everything except -meta:[0-9]
I couldn't find anything on the page about my issue because everything was ignoring whole line after it found the word, so sorry if there's something similar to this.
You should capture your match in a capturing group and use its reference in the replacement:
String meta = string.replaceAll("^.*(-meta:\\d+).*$", "$1");
System.out.println(meta);
//=> "-meta:0"
As I understand your requirement, you want to:
a) extract -meta:* from the string
b) replace everything else with ""
You could do something like:
String string = "-minY:50 -maxY:100 -minVein:8 -maxVein:10 -meta:0 perChunk:5;";
Pattern p = Pattern.compile(".*(-meta:[0-9]).*");
Matcher m = p.matcher(string);
if ( m.find() )
{
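string = m.group(1); // keep only the captured -meta:N token
System.out.println("After keeping only meta* : " + string);
}
This code finds -meta:[0-9], captures it in group 1, and replaces the whole string with that group. Assigning m.group(1) directly avoids passing the matched text back to replaceAll, which would interpret it as a regex.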
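If all you need is the token itself, another option is to search for it directly instead of replacing around it. This is only a sketch; MetaExtract and extractMeta are illustrative names, and it assumes the value after -meta: is one or more digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaExtract {
    private static final Pattern META = Pattern.compile("-meta:\\d+");

    // Returns the first "-meta:<digits>" token, or "" when there is none.
    static String extractMeta(String s) {
        Matcher m = META.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String string = "-minY:50 -maxY:100 -minVein:8 -maxVein:10 -meta:0 perChunk:5;";
        System.out.println(extractMeta(string)); // prints "-meta:0"
    }
}
```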
I am exploring Regular expressions.
Problem statement : Replace String between # and # with the values provided in replacements map.
import java.util.regex.*;
import java.util.*;
public class RegExTest {
public static void main(String args[]){
HashMap<String,String> replacements = new HashMap<String,String>();
replacements.put("OldString1","NewString1");
replacements.put("OldString2","NewString2");
replacements.put("OldString3","NewString3");
String source = "#OldString1##OldString2#_ABCDEF_#OldString3#";
Pattern pattern = Pattern.compile("\\#(.+?)\\#");
//Pattern pattern = Pattern.compile("\\#\\#");
Matcher matcher = pattern.matcher(source);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(buffer, "");
buffer.append(replacements.get(matcher.group(1)));
}
matcher.appendTail(buffer);
System.out.println("OLD_String:"+source);
System.out.println("NEW_String:"+buffer.toString());
}
}
Output (this meets my requirement, but I don't understand how the group(1) call works):
OLD_String:#OldString1##OldString2#_ABCDEF_#OldString3#
NEW_String:NewString1NewString2_ABCDEF_NewString3
If I change the code as below
Pattern pattern = Pattern.compile("\\#(.+?)\\#");
with
Pattern pattern = Pattern.compile("\\#\\#");
I am getting below error:
Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 1
I did not understand difference between
"\\#(.+?)\\#" and `"\\#\\#"`
Can you explain the difference?
The difference is fairly straightforward - \\#(.+?)\\# will match two hashes with one or more chars between them, while \\#\\# will match two hashes next to each other.
A more interesting question, to my mind, is: what is the difference between \\#(.+?)\\# and \\#.+?\\# ?
In this case, what's different is what is (or isn't) getting captured. Parentheses in a regex indicate a capture group - basically, some substring you want to retrieve separately from the overall matched string. In this case, you're capturing the text in between the hashes - the first pattern will capture it and make it available separately, while the second will not. Try it yourself - asking for matcher.group(1) on the first will return that text, while the second will produce an exception, even though they both match the same text.
.+? tells it to match one or more of any character lazily, that is, as few characters as possible before the next # can match.
The \#\# pattern would match ##. It contains no parentheses, so a match has only a group 0 (the whole match) and no group 1, which is why matcher.group(1) throws the exception.
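The group numbering can be checked with a short sketch like this one (GroupDemo and its helper methods are made-up names). It reports how many capture groups each pattern defines and what happens when group(1) is requested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Number of capture groups in the pattern; group 0 is not counted.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns group 1 of the first match, or a note explaining why it is missing.
    static String firstGroupOne(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        try {
            return m.group(1);
        } catch (IndexOutOfBoundsException e) {
            return "no group 1";
        }
    }

    public static void main(String[] args) {
        String source = "#OldString1##OldString2#_ABCDEF_#OldString3#";
        for (String regex : new String[] { "\\#(.+?)\\#", "\\#.+?\\#", "\\#\\#" }) {
            System.out.println(regex + " -> groups: " + groupCount(regex)
                    + ", group(1): " + firstGroupOne(regex, source));
        }
    }
}
```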
I need a regex for this example:
//This is a comment and I need this \n position
String notwanted ="//I do not need this end of line position";
Try this regex:
(?<!")\/\/[^\n]+(\n)
You can use the Matcher method matcher.start(1) to get the index of the \n character, but it will not match where // is preceded by ". Example in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args){
String example = "//This is a comment and I need this \\n position\n" +
"String notwanted =\"//I do not need this end of line position\";";
Pattern regex = Pattern.compile("(?<!\")//[^\\n]+(\\n)");
Matcher matcher = regex.matcher(example);
while (matcher.find()) {
System.out.println(matcher.start(1));
}
}
}
however it would be enough to use:
(?<!")\/\/[^\n]+
and just use matcher.end(), to get start position of new line.
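A sketch of that shorter variant, assuming the same sample text as above (CommentEnd and commentEnds are illustrative names). Since the pattern stops right before the line break, matcher.end() is the index of the \n itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentEnd {
    // A "//" not preceded by a double quote, followed by the rest of the line.
    private static final Pattern COMMENT = Pattern.compile("(?<!\")//[^\\n]+");

    // Collects the index just past each matched comment.
    static List<Integer> commentEnds(String source) {
        List<Integer> ends = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            ends.add(m.end());
        }
        return ends;
    }

    public static void main(String[] args) {
        String example = "//This is a comment and I need this \\n position\n"
                + "String notwanted =\"//I do not need this end of line position\";";
        System.out.println(commentEnds(example));
        System.out.println(example.indexOf('\n')); // same index as the single entry above
    }
}
```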
Another case, if you would like to split a string using this position, you can also use this one:
example.split("(?<=^//[^\n]{0,1000})\n");
The (?<=^//[^\n]{0,1000}) means:
?<= - lookbehind,
^// - beginning of a line, followed by the // comment sign
[^\n]{0,1000} - multiple characters, but no new lines. Here is the tricky part: a lookbehind in Java needs a bounded length, so you cannot use quantifiers like * or +. This is why you need an interval, in this case from 0 to 1000 characters. Be aware that if your comment is longer than 1000 characters (unlikely, but still possible) it will not work, so set this number (1000 in this example) carefully
\n - new line you are looking for
but if you would like to split whole string in multiple places, you will need to add modifier (?m) - multiline match - on the beginning of regex:
(?m)(?<=^//[^\n]{0,1000})\n
but I'm not entirely sure
>>EDIT<< response to questions from comments
Try this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args){
String example =
"//This is a comment and I need this \\n position\n" +
"String notwanted =\"//I do not need this end of line position\";\n" +
"String a = aaa; //comment here";
Pattern regex = Pattern.compile("(?m)(?<=(^|;\\s{0,1000})//[^\n]{0,1000})(\n|$)");
Matcher matcher = regex.matcher(example);
while(matcher.find()){
System.out.println(matcher.start());
}
System.out.println(example.replaceAll("(?m)(?<=(^|;\\s{0,1000})//[^\n]{0,1000})(\n|$)", " (X)\n"));
}
}
Maybe this regex will fulfill your expectations. If not, please ask another question with more details, such as the input, the expected output, your current code, and your goal.
This should work for you. It's really really awful. Couldn't really think of a much better, versatile solution. I'm assuming you also wanted comments like this:
String myStr = "asasdasd"; //some comment here
^[^"\n]*?(?:[^"\n]*?"(?>\\"|[^"\n])*?"[^"\n]*?)*?[^"\n]*?\/\/.*?(\n)
It's basically about getting a string value between two characters. SO has many questions related to this, like:
How to get a part of a string in java?
How to get a string between two characters?
Extract string between two strings in java
and more.
But I found it quite confusing when dealing with multiple dots in the string and getting the value between two particular dots.
I have got the package name as :
au.com.newline.myact
I need to get the value between "com." and the next "dot(.)". In this case "newline". I tried
Pattern pattern = Pattern.compile("com.(.*).");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
int ct = matcher.group();
I tried using substrings and IndexOf also. But couldn't get the intended answer. Because the package name in android varies by different number of dots and characters, I cannot use fixed index. Please suggest any idea.
As you probably know (based on the .* part in your regex), the dot . is a special character in regular expressions representing any character (except line separators). So to make a dot represent only a dot you need to escape it. To do so you can place \ before it, or place it inside a character class [.].
Also, to get only the part matched by the parentheses (.*), you need to select it with the proper group index, which in your case is 1.
So try with
String beforeTask = "au.com.newline.myact";
Pattern pattern = Pattern.compile("com[.](.*)[.]");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
String ct = matcher.group(1);//remember that regex finds Strings, not int
System.out.println(ct);
}
Output: newline
If you want to get only one element before next . then you need to change greedy behaviour of * quantifier in .* to reluctant by adding ? after it like
Pattern pattern = Pattern.compile("com[.](.*?)[.]"); // note the ? added after .*
Another approach is instead of .* accepting only non-dot characters. They can be represented by negated character class: [^.]*
Pattern pattern = Pattern.compile("com[.]([^.]*)[.]");
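The difference between the three patterns only shows up when more than one dot follows the part you want. A small sketch to compare them, using a longer package name as an assumed input (GreedyDemo and between are made-up names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Returns group 1 of the first match, or "" when nothing matches.
    static String between(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String name = "au.com.newline.myact.modelact";
        System.out.println(between("com[.](.*)[.]", name));     // greedy: "newline.myact"
        System.out.println(between("com[.](.*?)[.]", name));    // reluctant: "newline"
        System.out.println(between("com[.]([^.]*)[.]", name));  // negated class: "newline"
    }
}
```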
If you don't want to use regex you can simply use indexOf method to locate positions of com. and next . after it. Then you can simply substring what you want.
String beforeTask = "au.com.newline.myact.modelact";
int start = beforeTask.indexOf("com.") + 4; // +4 since we also want to skip 'com.' part
int end = beforeTask.indexOf(".", start); //find next `.` after start index
String result = beforeTask.substring(start, end);
System.out.println(result);
You can use reflection to get the name of any class. For example:
If I have a class Runner in com.some.package, I can run
Runner.class.getName() // string is "com.some.package.Runner"
to get the full name of the class, which includes the package name.
To get something after 'com' you can use Runner.class.getName().split("\\.") and then iterate over the returned array with a boolean flag.
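One caveat when splitting on dots: String.split takes a regular expression, so an unescaped "." matches every character and the result is an empty array. A short sketch of the difference (the class and method names are made up):

```java
public class SplitDotDemo {
    // "." as a regex matches any character, so every piece is empty and gets dropped.
    static int partsWithRegexDot(String s) {
        return s.split(".").length;
    }

    // "\\." matches a literal dot.
    static String[] partsWithEscapedDot(String s) {
        return s.split("\\.");
    }

    public static void main(String[] args) {
        String name = "au.com.newline.myact";
        System.out.println(partsWithRegexDot(name));          // prints 0
        System.out.println(partsWithEscapedDot(name).length); // prints 4
    }
}
```

Pattern.quote(".") is an alternative to escaping the dot by hand.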
All you have to do is split the strings by "." and then iterate through them until you find one that equals "com". The next string in the array will be what you want.
So your code would look something like:
String[] parts = packageName.split("\\.");
int i = 0;
for (String part : parts) {
    if (part.equals("com")) {
        break;
    }
    ++i;
}
// Note: if "com" is missing or is the last part, parts[i+1] is out of bounds.
String result = parts[i+1];
private String getStringAfterComDot(String packageName) {
String strArr[] = packageName.split("\\.");
for(int i=0; i<strArr.length-1; i++){ // stop one early so strArr[i+1] stays in bounds
if(strArr[i].equals("com"))
return strArr[i+1];
}
return "";
}
I have done heaps of projects dealing with website scraping, and I
just had to create my own functions/utils to get the job done. Regex might
be overkill sometimes if you just want to extract a substring from
a given string like the one you have. Below is the function I normally
use to do this kind of task.
private String GetValueFromText(String sText, String sBefore, String sAfter)
{
String sRetValue = "";
int nPos = sText.indexOf(sBefore);
if ( nPos > -1 )
{
int nLast = sText.indexOf(sAfter,nPos+sBefore.length()); // search from the end of sBefore
if ( nLast > -1)
{
sRetValue = sText.substring(nPos+sBefore.length(),nLast);
}
}
return sRetValue;
}
To use it just do the following:
String sValue = GetValueFromText("au.com.newline.myact", ".com.", ".");