I have a pattern that finds the integers after a comma.
The problem is that the response I get back spans several lines, and the pattern only matches on the last line. How do I fix this? I want it to find the pattern on every line.
All help is appreciated:
url = new URL("https://test.com");
con = url.openConnection();
is = con.getInputStream();
br = new BufferedReader(new InputStreamReader(is));
while ((line = br.readLine()) != null) {
responseData = line; // responseData is declared earlier as a String
System.out.println(responseData);
}
pattern = "(?<=,)\\d+";
pr = Pattern.compile(pattern);
match = pr.matcher(responseData); // String responseData
System.out.println();
while (match.find()) {
System.out.println("Found: " + match.group());
}
Here is the response returned as a string:
test.test.test.test.test-test,0,0,0
test.test.test.test.test-test,2,0,0
test.test.test.test.test-test,0,0,3
Here is the printout:
Found: 0
Found: 0
Found: 0
The problem is in how you build your String: you're assigning only the last line from the BufferedReader:
responseData = line;
If you print responseData before you try to match, you'll see that it holds only one line, not what you expected.
Since you print each line with System.out.println inside the loop, you do see the whole result, but what ends up in responseData is only the last line.
You should use a StringBuilder to build the whole string:
StringBuilder str = new StringBuilder();
while ((line = br.readLine()) != null) {
str.append(line).append('\n'); // keep the line break so adjacent lines don't run together
}
responseData = str.toString();
// now responseData contains the whole String, as you expected
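A minimal self-contained sketch of the StringBuilder approach, assuming the whole response fits in memory. The class and method names (CommaNumbers, findAll) are made up for illustration, and a StringReader over the sample response stands in for the URL connection so it runs without network access:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommaNumbers {
    // Lookbehind: one or more digits immediately preceded by a comma
    private static final Pattern AFTER_COMMA = Pattern.compile("(?<=,)\\d+");

    // Reads every line into one String (keeping the line breaks), then matches over all of it
    static List<String> findAll(BufferedReader br) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append('\n');
        }
        List<String> found = new ArrayList<>();
        Matcher m = AFTER_COMMA.matcher(sb); // StringBuilder is a CharSequence
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String response = "test.test.test.test.test-test,0,0,0\n"
                + "test.test.test.test.test-test,2,0,0\n"
                + "test.test.test.test.test-test,0,0,3\n";
        // StringReader stands in for new InputStreamReader(con.getInputStream())
        for (String n : findAll(new BufferedReader(new StringReader(response)))) {
            System.out.println("Found: " + n);
        }
    }
}
```

With the sample response this prints nine "Found:" lines, one per number, instead of only the three from the last line.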
Tip: Use the debugger. It will help you understand your code better and find bugs much faster.
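You can use the Pattern.MULTILINE option when compiling your regex. Be aware that MULTILINE only changes ^ and $ so that they match at every line boundary instead of only at the start and end of the input; this pattern uses neither, so the flag by itself does not change what it finds, and responseData still has to contain all the lines: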
pattern = "(?<=,)\\d+";
pr = Pattern.compile(pattern, Pattern.MULTILINE);
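To see what Pattern.MULTILINE actually changes, here is a small sketch. The class and method names are hypothetical and the input is a shortened version of the sample response; it uses a $-anchored variant of the pattern, which is where the flag makes a difference:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineDemo {
    // Collects every match of the pattern in the input
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "a,0,0,0\nb,2,0,0\nc,0,0,3";
        // The number after the last comma, anchored with $
        String regex = "(?<=,)\\d+$";
        // Without MULTILINE, $ only matches at the end of the input
        System.out.println(findAll(Pattern.compile(regex), text));                    // [3]
        // With MULTILINE, $ also matches right before each line break
        System.out.println(findAll(Pattern.compile(regex, Pattern.MULTILINE), text)); // [0, 0, 3]
        // The original pattern has no ^ or $, so the flag leaves its result unchanged
        System.out.println(findAll(Pattern.compile("(?<=,)\\d+", Pattern.MULTILINE), text).size()); // 9
    }
}
```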
I am trying to remove specific lines from a text file using regex, but I am getting an IllegalStateException. I have only recently started using regex and have tried to use match.matches(), but that has not worked for me. Any advice on what I am doing wrong?
try {
BufferedReader br = new BufferedReader(new FileReader("TestFile.txt"));
//System.out.println(br.toString());
ArrayList<String> list = new ArrayList<String>();
String line= br.readLine() ;
while (br.readLine() != null ) {
//System.out.println(line);
//System.out.println("test1"); {
Pattern regex = Pattern.compile("[^\\s\"]+|\"[^\"]*\"");
Matcher regexMatcher = regex.matcher(line);
String match = regexMatcher.group();// here is where the illegalstateexception occurs
match = removeLeadingChar(match, "\"");
match = removeLeadingChar(match, "\"");
list.add(match);
// }
// br.close();
System.out.println(br);
Exception in thread "main" java.lang.IllegalStateException: No match found
at java.base/java.util.regex.Matcher.group(Unknown Source)
at java.base/java.util.regex.Matcher.group(Unknown Source)
Call Matcher.find() to check whether the pattern matches before calling group(): group() throws IllegalStateException when no match attempt has succeeded. You can step through the results of regexMatcher.find() in your IDE's debugger (e.g. IntelliJ).
// Compile the pattern once, outside the loop
Pattern regex = Pattern.compile("[^\\s\"]+|\"[^\"]*\"");
ArrayList<String> list = new ArrayList<>();
// try-with-resources closes the reader automatically, even if an exception is thrown
try (BufferedReader br = new BufferedReader(new FileReader("TestFile.txt"))) {
    String line;
    // Assign one line read from the file to a variable
    while ((line = br.readLine()) != null) {
        System.out.println(line);
        Matcher regexMatcher = regex.matcher(line);
        // find() returns true each time another match is found; only then is group() valid
        while (regexMatcher.find()) {
            String match = regexMatcher.group();
            match = removeLeadingChar(match, "\""); // removeLeadingChar is your own helper
            match = removeLeadingChar(match, "\"");
            list.add(match);
        }
    }
    // What is the purpose of this code? It prints the BufferedReader object, not the file contents.
    System.out.println(br);
} catch (IOException e) {
    // exception handling
    e.printStackTrace();
}
// If you want to output the string elements of the list
System.out.println(list.toString());
Your while loop was also wrong: line is read once before the loop and never updated, while the loop condition reads and throws away the remaining lines. Assign line in the loop condition instead, and only call group() after find() has succeeded:
try {
    BufferedReader br = new BufferedReader(new FileReader("TestFile.txt"));
    ArrayList<String> list = new ArrayList<String>();
    Pattern regex = Pattern.compile("[^\\s\"]+|\"[^\"]*\""); // compile once
    String line; // <--- FIXED
    while ((line = br.readLine()) != null) { // <--- FIXED
        Matcher regexMatcher = regex.matcher(line);
        // group() throws IllegalStateException unless find() has just returned true
        if (regexMatcher.find()) { // <--- FIXED
            String match = regexMatcher.group();
            match = removeLeadingChar(match, "\"");
            match = removeLeadingChar(match, "\"");
            list.add(match);
        }
    }
    br.close();
    System.out.println(list.toString());
} catch (IOException e) {
    e.printStackTrace();
}
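A self-contained sketch of the find()/group() loop, assuming the goal is to split each line into tokens and drop the surrounding quotes. The question's removeLeadingChar helper isn't shown, so a plain substring call stands in for it; the Tokenizer class and tokens method are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {
    // Same pattern as in the question: a run of non-space, non-quote chars, or a "quoted string"
    private static final Pattern TOKEN = Pattern.compile("[^\\s\"]+|\"[^\"]*\"");

    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        // group() is only valid after find() returns true; otherwise it throws IllegalStateException
        while (m.find()) {
            String token = m.group();
            // Strip the surrounding quotes (stand-in for the question's removeLeadingChar helper)
            if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
                token = token.substring(1, token.length() - 1);
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("say \"hello world\" now")); // [say, hello world, now]
    }
}
```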
{
"TEST":"189456",
"TEST1":"X_Y_Z",
"TEST2":"Y_Z_W",
"TEST3":"GGG ",
"TEST4":"32423423233322"
},
{
"TEST":"123456",
"TEST1":"X_E_Z",
"TEST2":"T_Z_W",
"TEST3":"EWE ",
"TEST4":"324234243234"
}
This is a .txt file. I want to read it and print only 189456 and 123456 from the above file. Can anyone help me do this? Please find the code for reference. Please post the simplest code.
Pattern p = Pattern.compile("\"Test\"\\s*:\\s*\"(.*)\"", Pattern.CASE_INSENSITIVE);
while ( (line = bf.readLine()) != null) {
linecount++;
Matcher m = p.matcher(line);
// indicate all matches on the line
while (m.find()) {
System.out.println(m.group(1));
}
}
Another way to do it:
while ((line = br.readLine()) != null) {
if (line.contains("\"TEST\":")) { // the quote closes TEST before the colon, as in the file
String[] lineValues = line.split(":");
System.out.println(lineValues[1].replace("\"", "").replace(",",""));
}
}
As for a regex solution:
(.*)"TEST":"(.*?)"
Note the ?: it makes .*? lazy, so the match stops at the first " instead of the last one. The value is in group 2.
To allow spaces around the colon:
(.*)"TEST"\s*:\s*"(.*?)"
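For reference, here is the lazy regex in runnable Java form, without the leading (.*), so the value is in group 1. It is a sketch: the TestValues class and extract method are illustrative names, and the lines are hard-coded from the sample file; for the real file you could pass java.nio.file.Files.readAllLines(Paths.get(...)) instead:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestValues {
    // "TEST" with its closing quote (which rules out TEST1..TEST4), optional spaces, then a lazy capture
    private static final Pattern TEST_VALUE = Pattern.compile("\"TEST\"\\s*:\\s*\"(.*?)\"");

    static List<String> extract(List<String> lines) {
        List<String> values = new ArrayList<>();
        for (String line : lines) {
            Matcher m = TEST_VALUE.matcher(line);
            while (m.find()) {
                values.add(m.group(1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "{", "\"TEST\":\"189456\",", "\"TEST1\":\"X_Y_Z\",", "},",
                "{", "\"TEST\":\"123456\",", "\"TEST4\":\"324234243234\"", "}");
        extract(lines).forEach(System.out::println); // 189456, then 123456
    }
}
```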
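With the provided input, you should read it as JSON (here with Jackson) instead of raw text. Note that as shown, the file is not valid JSON; the objects need to be wrapped in [ and ] to be read as a list:
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

ObjectMapper mapper = new ObjectMapper();
// A TypeReference keeps the element type; test.getClass() would only give a raw ArrayList of Maps
List<TestObj> test = mapper.readValue(new File("c:\\YourFile.txt"), new TypeReference<List<TestObj>>() {});
Where TestObj is something like this:
class TestObj {
    @JsonProperty("TEST") // the JSON field names don't match the Java field names, so map them explicitly
    String test;
    @JsonProperty("TEST1")
    String test1;
    ...
    // getter setter methods
}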
Hope I understood the question the right way :D
String saveData;
Pattern p = Pattern.compile("\"Test\"\\s*:\\s*\"(.*)\"", Pattern.CASE_INSENSITIVE);
while ( (line = bf.readLine()) != null) {
linecount++;
Matcher m = p.matcher(line);
// indicate all matches on the line
if(line.contains("189456") || line.contains("123456")) {
saveData = line;
}
}
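If the String you get from readLine() contains one of the searched values, it is saved in saveData. Note that saveData keeps only the last matching line, because each match overwrites the previous one.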
FileInputStream fstream = new FileInputStream("D:\\prac\\src\\test.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
while ((strLine = br.readLine()) != null) {
if(strLine.contains("\"TEST\":")){
System.out.println(strLine.split(":")[1].replaceAll("\"","").replace(",",""));
}
}
br.close();
}
Output:
189456
123456
I want to apply my regular expression to all lines of the text file together, not to each line separately.
Currently it matches only when the entire match is on one line. If the match continues on the next line, it doesn't match at all.
class Parser {
public static void main(String[] args) throws IOException {
Pattern patt = Pattern.compile("(include|"
+ "integrate|"
+ "driven based on|"
+ "facilitate through|"
+ "contain|"
+ "using|"
+ "equipped|" // the | was missing here, which glued "equipped" and "integrate" into one alternative
+ "integrate|"
+ "implement|"
+ "utilized to facilitate|"
+ "comprise){1}"
+ "[\\s\\w\\,\\(\\)\\;\\:]*\\."); //Regex
BufferedReader r = new BufferedReader(new FileReader("E:/test/test.txt")); // read the file
String line;
PrintWriter pWriter = null;
while ((line = r.readLine()) != null) {
Matcher matcher = patt.matcher(line);
while (matcher.find()) {
try{
pWriter = new PrintWriter(new BufferedWriter(new FileWriter("E:/test/test1.txt", true)));//append any given input
pWriter.println(matcher.group()); //write the result of matcher to the new file
} catch (IOException ioe) {
ioe.printStackTrace();
} finally {
if (pWriter != null){
pWriter.flush();
pWriter.close();
}
}
System.out.println(matcher.group());
}
}
}
}
Change while ((line = r.readLine()) != null) to this:
StringBuilder file = new StringBuilder(); // Basically, a conglomerate of all of the lines in the file
while ((line = r.readLine()) != null) {
    file.append(line).append('\n'); // Keep the line break so the last word of a line doesn't merge with the next
}
Matcher matcher = patt.matcher(file);
while (matcher.find()) {
/* Blah blah blah, your outputting goes here. */
}
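A minimal sketch of the difference between matching per line and matching the whole text. The pattern is a cut-down version of the question's (three keywords only), and the class, method names and sample text are invented for illustration. For a real file you would first read everything into one String, for example with java.nio.file.Files.readString on Java 11 or later:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeFileMatch {
    // Cut-down version of the question's pattern: keyword, then word chars, whitespace and ,();: up to a period
    private static final Pattern PATT = Pattern.compile("(include|contain|comprise)[\\s\\w,();:]*\\.");

    // Applies the pattern to each line separately, as the question's code does
    static List<String> matchPerLine(String text) {
        List<String> out = new ArrayList<>();
        for (String line : text.split("\n")) {
            Matcher m = PATT.matcher(line);
            while (m.find()) {
                out.add(m.group());
            }
        }
        return out;
    }

    // Applies the pattern to the whole text; \s in the class also matches the line break
    static List<String> matchWhole(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = PATT.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The kit may include a pump\nand a valve.";
        System.out.println(matchPerLine(text).size()); // 0: neither line contains a full match
        System.out.println(matchWhole(text).size());   // 1: the match spans both lines
    }
}
```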
This happens because you're processing each line individually. For what you want, you need to apply the regex to the whole file at once.
Currently the matcher is applied per line, it needs to be applied to the whole file to work as intended.
Regexes are greedy: the first match will run through the whole String unless it contains a . (or another character the class doesn't allow):
...
+ "comprise){1}"
+ "[\\s\\w\\,\\(\\)\\;\\:]*\\."); //Regex
On the last line you match any whitespace or word character (plus , ( ) ; and :), so pretty much anything except a period. Also, the {1} is superfluous, and so are most of the backslashes, because those characters need no escaping inside []:
...
+ "comprise)"
+ "[\\s\\w,();:]*\\."); //Regex
If you don't care about the newline characters, just remove them first and it should work (I see no way around this if you have something like "com\nprise" and want to match that):
s = s.replaceAll("\\n+", "");
String sCurrentLine;
br = new BufferedReader(new FileReader(path));
while ((sCurrentLine = br.readLine()) != null) {
Pattern pattern = Pattern.compile(".*?unregistKey\\(tvKey\\.(.*?)\\);");
Matcher m= pattern.matcher(sCurrentLine);
if(m.matches()) {
String abc = m.group(1) ;
System.out.println ("aaaaaaaaaaaaaa" + abc.toString());
}
}
Why is this code looping more than once?
I checked the calls to this code, and it is called only once.
The output repeats N times, like this:
aaaaaaaaaaaaaaKEY_1
aaaaaaaaaaaaaaKEY_2
aaaaaaaaaaaaaaKEY_3
aaaaaaaaaaaaaaKEY_CH_UP
aaaaaaaaaaaaaaKEY_PANEL_CH_UP
aaaaaaaaaaaaaaKEY_CH_DOWN
aaaaaaaaaaaaaaKEY_1
aaaaaaaaaaaaaaKEY_2
aaaaaaaaaaaaaaKEY_3
aaaaaaaaaaaaaaKEY_CH_UP
aaaaaaaaaaaaaaKEY_PANEL_CH_UP
aaaaaaaaaaaaaaKEY_CH_DOWN
You will see this output only when the input file contains the same pattern several times (i.e. several lines contain KEY_1, etc.).
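A small sketch that reproduces the effect described above; the KeyScan class, keys method and sample lines are invented for illustration. Each matching line produces one entry, so a repeated call shows up twice. If you only want each key once, collecting into a LinkedHashSet (which keeps first-seen order) is one option:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyScan {
    // Same pattern as the question, compiled once instead of once per line
    private static final Pattern KEY = Pattern.compile(".*?unregistKey\\(tvKey\\.(.*?)\\);");

    // One entry per matching line, so a key that appears on several lines appears several times
    static List<String> keys(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = KEY.matcher(line);
            if (m.matches()) { // matches() requires the whole line to fit the pattern
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "unregistKey(tvKey.KEY_1);",
                "unregistKey(tvKey.KEY_2);",
                "unregistKey(tvKey.KEY_1);"); // the same call appears a second time
        List<String> all = keys(lines);
        System.out.println(all);                      // [KEY_1, KEY_2, KEY_1]
        System.out.println(new LinkedHashSet<>(all)); // [KEY_1, KEY_2]
    }
}
```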
I need to read the HTML of a webpage, find the links and images, and then rename the links and images. Here is what I have done:
reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), 'UTF-8'));
String line;
while ((line = reader.readLine()) != null) {
regex = "<a[^>]*href=(\"([^\"]*)\"|\'([^\']*)\'|([^\\s>]*))[^>]*>(.*?)</a>";
final Pattern pa = Pattern.compile(regex, Pattern.DOTALL);
final Matcher ma = pa.matcher(s);
if(ma.find()){
string newlink=path+"1-2.html";
//replace the link in href with newlink, how can i do this?
}
html.append(line).append("/r/n");
}
How can I do the commented part?
Using regex for parsing HTML can be difficult and unreliable. It's better to use XPath and DOM manipulation for things like that.
Alternatives have been mentioned; nevertheless, some notes on the regex route:
- Matcher supports doing a "replace all" step by step into a StringBuffer.
- Parts of the matched text must be re-added as replacement text, so they all have to be captured in groups: ma.group(1), ma.group(2), ma.group(3), ...
- DOTALL lets . match line terminators; it isn't needed here because readLine() strips the line terminator.
- There could be more than one link per line, so loop with while instead of if.
- You had matcher(s) instead of matcher(line) in the example code.
So the code uses Matcher.appendReplacement and appendTail.
StringBuffer html = new StringBuffer();
reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), "UTF-8")); // a String, not a char literal
String line;
regex = "(<a[^>]*href=)(\"([^\"]*)\"|\'([^\']*)\'|([^\\s>]*))[^>]*>(.*?)(</a>)";
final Pattern pa = Pattern.compile(regex);
while ((line = reader.readLine()) != null) {
    final Matcher ma = pa.matcher(line);
    while (ma.find()) {
        String newlink = path + "1-2.html";
        // wrap literal text in Matcher.quoteReplacement() if it may contain $ or \
        ma.appendReplacement(html, ma.group(1) /* a href */ + ...);
    }
    ma.appendTail(html); // copies the rest of the line (or the whole line if nothing matched)
    html.append("\r\n"); // the line is already in html, so only add the line break
}
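A self-contained sketch of the appendReplacement/appendTail loop, with these assumptions: every quoted href value is replaced by one new link, unquoted href values are not handled, and the LinkRewriter class and rewriteLinks method are hypothetical names. As noted above, regexes on HTML are fragile, so treat this as an illustration of the Matcher API rather than a robust HTML rewriter:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRewriter {
    // Group 1: "<a ... href=" ; group 2: the quoted href value (double or single quotes)
    private static final Pattern HREF = Pattern.compile("(<a\\s[^>]*?href=)(\"[^\"]*\"|'[^']*')");

    static String rewriteLinks(String html, String newLink) {
        StringBuffer out = new StringBuffer();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Keep group 1, swap in the new quoted link; quoteReplacement stops $ and \ being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + "\"" + newLink + "\""));
        }
        m.appendTail(out); // copy everything after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "<p><a href=\"old.html\">x</a> and <a class=\"c\" href='b.html'>y</a></p>";
        System.out.println(rewriteLinks(line, "1-2.html"));
        // <p><a href="1-2.html">x</a> and <a class="c" href="1-2.html">y</a></p>
    }
}
```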