How to replace a string using Java regex

I have a file that contains self-closing anchor tags:
<p><a name="impact"/><span class="sectiontitle">Impact</span></p>
<p><a name="Summary"/><span class="sectiontitle">Summary</span></p>
I want to correct the tags as shown below:
<p><a name="impact"><span class="sectiontitle">Impact</span></a></p>
<p><a name="Summary"><span class="sectiontitle">Summary</span></a></p>
I've written this code to find and replace the incorrect anchor tags:
package mypack;
import java.io.*;
import java.util.regex.*;
public class AnchorIssue {
    static int count=0;
    public static void main(String[] args) throws IOException {
        Pattern pFinder = Pattern.compile("<a name=\\\".*\\\"(\\/)>(.*)(<)");
        BufferedReader r = new BufferedReader
            (new FileReader("D:/file.txt"));
        String line;
        while ((line =r.readLine()) != null) {
            Matcher m1= pFinder.matcher(line);
            while (m1.find()) {
                int start = m1.start(0);
                int end = m1.end(0);
                ++count;
                // Use CharacterIterator.substring(offset, end);
                String actual=line.substring(start, end);
                System.out.println(count+"."+"Actual String :-"+actual);
                actual.replace(m1.group(1),"");
                System.out.println(actual);
                actual.replaceAll(m1.group(3),"</a><");
                System.out.println(actual);
                // Use CharacterIterator.substring(offset, end);
                System.out.println(count+"."+"Replaced"+actual);
            }
        }
        r.close();
    }
}
The above code finds the correct number of self-closing anchor tags in the file, but the replacement code is not working properly.

Your problem is greediness: the .*" will match everything up to the last " in that line. There are two fixes for this.
Both fixes replace this line:
Pattern pFinder = Pattern.compile("<a name=\\\".*\\\"(\\/)>(.*)(<)");
Option one: use a negated character class:
Pattern pFinder = Pattern.compile("<a name=\\\"[^\\\"]*\\\"(\\/)>(.*)(<)");
Option two: use a lazy (reluctant) quantifier:
Pattern pFinder = Pattern.compile("<a name=\\\".*?\\\"(\\/)>(.*)(<)");
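As a hedged illustration (the class and method names are made up for this sketch), the snippet below shows the greedy/reluctant difference on a made-up line containing two anchors, and then does the whole fix in one replaceAll call with capture groups. Note that String.replace and String.replaceAll return a new string because Java strings are immutable, so the result has to be assigned or returned; the question's code discards it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorFix {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    // Rewrites <a name="x"/><span ...>...</span> as <a name="x"><span ...>...</span></a>.
    // [^"]* cannot run past the closing quote; .*? stops at the first </span>.
    static String fixAnchors(String line) {
        return line.replaceAll(
                "<a name=\"([^\"]*)\"/>(<span[^>]*>.*?</span>)",
                "<a name=\"$1\">$2</a>");
    }

    public static void main(String[] args) {
        String two = "<a name=\"a\"/><i>x</i><a name=\"b\"/>";
        System.out.println(firstMatch("<a name=\".*\"/>", two));   // greedy: spans both tags
        System.out.println(firstMatch("<a name=\".*?\"/>", two));  // reluctant: first tag only
        System.out.println(fixAnchors(
                "<p><a name=\"impact\"/><span class=\"sectiontitle\">Impact</span></p>"));
    }
}
```

The last call should print the corrected line from the question, `<p><a name="impact"><span class="sectiontitle">Impact</span></a></p>`.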

Since the file structure seems "constant", it might be better to reduce the problem to simple string replacements rather than complex HTML matching. It seems to me that you're not really interested in the content of the anchor tag, so just replace /><span with ><span and </span></p> with </span></a></p>.
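A minimal sketch of that idea, assuming every line that contains "/><span" has exactly the layout shown in the question (the class name is hypothetical). String.replace is literal, so no regex escaping is needed; the contains() guard keeps the second replacement from touching unrelated lines that merely end in </span></p>.

```java
class SimpleAnchorFix {
    // Literal (non-regex) replacements; assumes the question's fixed line layout.
    static String fix(String line) {
        if (!line.contains("/><span")) {
            return line; // leave lines without a self-closing anchor alone
        }
        return line.replace("/><span", "><span")
                   .replace("</span></p>", "</span></a></p>");
    }

    public static void main(String[] args) {
        System.out.println(fix(
                "<p><a name=\"Summary\"/><span class=\"sectiontitle\">Summary</span></p>"));
    }
}
```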

Using the code below, I'm able to find and replace all self-closing anchor tags.
package mypack;
import java.io.*;
import java.util.regex.*;
public class AnchorIssue {
    static int count = 0;
    public static void main(String[] args) throws IOException {
        // Reluctant .*? stops at the first closing quote; group 1 is "/><span", group 3 is "</span>"
        Pattern pFinder = Pattern.compile("<a name=\\\".*?\\\"(\\/><span)(.*)(<\\/span>)");
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader r = new BufferedReader(new FileReader("file.txt"))) {
            String line;
            while ((line = r.readLine()) != null) {
                Matcher m1 = pFinder.matcher(line);
                while (m1.find()) {
                    ++count;
                    String actual = line.substring(m1.start(0), m1.end(0));
                    System.out.println(count + "." + "Actual String : " + actual);
                    // replace() is literal; strings are immutable, so the result must be assigned
                    actual = actual.replace(m1.group(1), "><span");
                    System.out.println("\n");
                    actual = actual.replace(m1.group(3), "</span></a>");
                    System.out.println(count + "." + "Replaced : " + actual);
                    System.out.println("\n");
                    System.out.println("---------------------------------------------------");
                }
            }
        }
    }
}

Related

How to read text from a file between two given patterns in Java 8

I have a text file and need to read the contents between two given patterns.
For example, say I have a file named datafromOU.txt.
I need the data from pattern1 up to pattern2.
pattern1: "CREATE EXTR"
pattern2: ";"
The problem is that the file has multiple occurrences of pattern2. So my requirement is to look for pattern1 and then search for the first occurrence of pattern2 after it. I need to store this in one string and process it later.
Can you help me read the data from pattern1 to the next occurrence of pattern2 into a string variable using Java streams?
I use Java 8.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
public class Main {
    static String getPatternFromString(String startPattern, String endPattern, String data) {
        int startIndex = data.indexOf(startPattern);
        if (startIndex < 0) {
            return null; // start pattern not found
        }
        data = data.substring(startIndex);
        // data now begins with startPattern, so search for the end pattern right after it
        int endIndex = data.indexOf(endPattern, startPattern.length());
        if (endIndex < 0) {
            return null; // no end pattern after the start pattern
        }
        return data.substring(0, endIndex + endPattern.length());
    }
    public static void main(String[] args) throws IOException {
        File file1 = new File("C:\\Users\\amish\\Desktop\\partie.txt");
        String startPattern = "CREATE EXTR";
        String endPattern = ";";
        try (BufferedReader leBuffer1 = new BufferedReader(new FileReader(file1))) {
            StringBuilder everything = new StringBuilder();
            String line;
            while ((line = leBuffer1.readLine()) != null) {
                // keep the line break so words on adjacent lines are not glued together
                everything.append(line).append('\n');
            }
            String data = everything.toString().trim();
            System.out.println(data);
            System.out.println(getPatternFromString(startPattern, endPattern, data));
        } catch (FileNotFoundException exception) {
            System.out.println("File not found");
        }
    }
}
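Since the question asks for Java 8 streams, here is a hedged alternative sketch (class name and demo text are made up): Files.lines streams the file and Collectors.joining rebuilds it as one string, then a reluctant regex collects every block from "CREATE EXTR" to the first ";" after it, not just the first one. Pattern.DOTALL is assumed to be wanted so a block may span several lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ExtractBlocks {
    // Reluctant ".*?" stops at the first ';' after each "CREATE EXTR";
    // DOTALL lets '.' also match line breaks.
    private static final Pattern BLOCK =
            Pattern.compile(Pattern.quote("CREATE EXTR") + ".*?;", Pattern.DOTALL);

    // Returns every "CREATE EXTR ... ;" block found in text, in order.
    static List<String> extract(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        String text;
        if (args.length > 0) {
            // Java 8 stream over the file's lines, joined back with '\n'
            try (Stream<String> lines = Files.lines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
                text = lines.collect(Collectors.joining("\n"));
            }
        } else {
            text = "ADD TRAN;\nCREATE EXTR ext1,\n  TRANLOG;\nSTART;"; // made-up demo input
        }
        extract(text).forEach(System.out::println);
    }
}
```

Run it as `java ExtractBlocks datafromOU.txt` to read a real file; without arguments it uses the demo string.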

Java reading a file into a list

I need to figure out a way to turn an input file into a list of sentences that are delimited by more than one kind of character, specifically periods and exclamation points (! or .).
My input file has a layout similar to this:
Sample textfile!
A man, l, a ballot, a catnip, a pooh, a rail, a calamus, a dairyman, a bater, a canal - Panama!
This is a sentence! This one also.
Heres another one?
Yes another one.
How can I put that file into a list sentence by sentence?
Each sentence in my file ends with a ! or . character.
There are quite a few ways to accomplish what you are asking, but here is one way to read a file and split each line into a list by specific delimiters, while still keeping the delimiters in the sentence.
All of the functionality for turning a file into a list based on multiple delimiters can be found in the turnSentencesToList() method.
In my example below, I split by ! . and ?
import java.io.File;
import java.io.FileNotFoundException;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        LinkedList<String> list = turnSentencesToList("sampleFile.txt");
        for (String s : list)
            System.out.println(s);
    }
    private static LinkedList<String> turnSentencesToList(String fileName) {
        LinkedList<String> list = new LinkedList<>();
        String regex = "\\.|!|\\?";
        File file = new File(fileName);
        // try-with-resources closes the Scanner automatically
        try (Scanner scan = new Scanner(file)) {
            while (scan.hasNextLine()) {
                String line = scan.nextLine().trim();
                // we don't need empty lines
                if (!line.isEmpty()) {
                    // splits by . or ! or ?
                    String[] sentences = line.split(regex);
                    // gather delims because split() removes them
                    List<String> delims = getDelimiters(line, regex);
                    for (int count = 0; count < sentences.length; count++) {
                        // text after the last delimiter has no delimiter of its own
                        String delim = count < delims.size() ? delims.get(count) : "";
                        list.add(sentences[count].trim() + delim);
                    }
                }
            }
        } catch (FileNotFoundException e) {
            // return an empty list rather than null so the caller's loop does not fail
            e.printStackTrace();
        }
        return list;
    }
    private static List<String> getDelimiters(String line, String regex) {
        // this method is used to provide a list of all found delimiters in a line
        List<String> allDelims = new LinkedList<>();
        Matcher matcher = Pattern.compile(regex).matcher(line);
        while (matcher.find()) {
            allDelims.add(matcher.group());
        }
        return allDelims;
    }
}
Based on your example input file, the produced output would be:
Sample textfile!
A man, l, a ballot, a catnip, a pooh, a rail, a calamus, a dairyman, a bater, a canal - Panama!
This is a sentence!
This one also.
Heres another one?
Yes another one.
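A shorter alternative, sketched here under stated assumptions rather than taken from the answer above: split the whole text once with a zero-width lookbehind, (?<=[.!?])\s+, which breaks after each delimiter without removing it, so no separate delimiter list is needed. The class name is hypothetical, and it assumes each line ends with a delimiter; a line without one would be merged with the next.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SentenceSplit {
    // The lookbehind (?<=[.!?]) is zero-width, so the delimiter stays attached
    // to its sentence; \s+ consumes the spaces or line break that follow it.
    static List<String> sentences(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("(?<=[.!?])\\s+")));
    }

    public static void main(String[] args) throws IOException {
        String text = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "Sample textfile!\nThis is a sentence! This one also.\nHeres another one?";
        sentences(text).forEach(System.out::println);
    }
}
```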

Java Scanner cannot find word

I am sorry if there is something very simple I am missing.
I have the following code:
import java.util.Scanner;
import java.io.File;
import java.util.regex.Pattern;
public class UnJumble
{
    String[] ws;
    int ind=0;
    public static void main(String args[]) throws Exception
    {
        System.out.println("Enter a jumbled word");
        String w = new Scanner(System.in).next();
        UnJumble uj = new UnJumble();
        uj.ws = new String[uj.fact(w.length())];
        uj.makeWords("",w);
        int c=1;
        Scanner sc = new Scanner(new File("dict.txt"));
        for(int i=0; i<uj.ws.length; i++)
        {
            Pattern pat = Pattern.compile(uj.ws[i].toUpperCase());
            if(sc.hasNext(pat))
                System.out.println(c+++" : \'"+uj.ws[i]+"\'");
        }
        System.out.println("Search Completed.");
        if(c==1) System.out.println("No word found.");
    }
    public void makeWords(String p,String s)
    {
        if(s.length()==0)
            ws[ind++] = p;
        else
            for(int i=0; i<s.length(); i++)
                makeWords(p+s.charAt(i),s.substring(0,i)+s.substring(i+1));
    }
    public int fact(int n)
    {
        if(n==0) return 1;
        else return n*fact(n-1);
    }
}
The dict.txt file is the SOWPODS dictionary, which is the official Scrabble dictionary.
I want to take a jumbled word and rearrange it to check whether it is present in the dictionary. If it is, print it.
When I try tra as input, the output says No word found.
But the output should have the words tar, art and rat.
Please tell me where I am making a mistake. I apologize if I have made a very simple mistake, because this is the first time I am working with Pattern.
This is from the Javadoc of Scanner.hasNext(Pattern pattern); note the words "next complete token":
Returns true if the next complete token matches the specified pattern.
As your Scanner was initialized with the file dict.txt, it is positioned on the first word.
The first complete token in dict.txt does not match any of your scrambled words, so no match is found. And because hasNext() does not advance the Scanner, every permutation is tested against that same first token.
Note: this assumes you have one word per line.
I think you may want to change your code to search for your scrambled text anywhere in the dictionary file (bounded by a non-word character or the start/end of the input), resulting in the pattern "(^|\\W)"+uj.ws[i].toUpperCase()+"(\\W|$)" and something like:
String dictstring = ...; // your dictionary as one string
Pattern p = Pattern.compile("(^|\\W)" + uj.ws[i].toUpperCase() + "(\\W|$)");
Matcher m = p.matcher(dictstring);
if (m.find()) {
    ...
}
I recommend IOUtils.toString() from Apache Commons IO for reading your file, like this:
String dictstring = "";
try (InputStream is = new FileInputStream("dict.txt")) {
    dictstring = IOUtils.toString(is, StandardCharsets.UTF_8);
}
Here's a small example to get familiar with Pattern and Matcher:
String dictString = "ONE\r\nTWO\r\nTHREE";
Pattern p = Pattern.compile("(^|\\W)TWO(\\W|$)");
Matcher m = p.matcher(dictString);
if (m.find()) {
    System.out.println("MATCH: " + m.group());
}
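Putting the pieces together, here is a self-contained sketch of the regex approach described above, with a short made-up string standing in for dict.txt (class and method names are hypothetical). Pattern.quote is added as a precaution so the word is matched literally; "STAR" is included to show that the boundaries stop "TAR" from matching inside a longer word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DictionarySearch {
    // Returns each candidate that appears as a whole word in dict.
    static List<String> findWords(String dict, List<String> candidates) {
        List<String> found = new ArrayList<>();
        for (String word : candidates) {
            Pattern p = Pattern.compile(
                    "(^|\\W)" + Pattern.quote(word.toUpperCase()) + "(\\W|$)");
            Matcher m = p.matcher(dict);
            if (m.find() && !found.contains(word)) { // skip duplicate permutations
                found.add(word);
            }
        }
        return found;
    }

    // Recursively builds all permutations, like makeWords in the question.
    static void permutations(String prefix, String rest, List<String> out) {
        if (rest.isEmpty()) {
            out.add(prefix);
        } else {
            for (int i = 0; i < rest.length(); i++) {
                permutations(prefix + rest.charAt(i),
                        rest.substring(0, i) + rest.substring(i + 1), out);
            }
        }
    }

    public static void main(String[] args) {
        String dict = "ART\nRAT\nSTAR\nTAR\nZOO"; // made-up stand-in for dict.txt
        List<String> perms = new ArrayList<>();
        permutations("", "tra", perms);
        System.out.println(findWords(dict, perms)); // [tar, rat, art]
    }
}
```

For the full SOWPODS list, compiling one regex per permutation and scanning the whole dictionary each time will be slow; loading the words into a HashSet and calling contains() is a simpler swapped-in technique worth considering.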

Java replacing in a file regex

I want to put some regex expressions in a file, each followed by a semicolon (or some other separator) and a replacement expression, e.g.:
orderNumber:* completionStatus;orderNumber:X completionStatus
My log file will contain lines like:
.... orderNumber:123 completionStatus...
and I want them to look like:
.... orderNumber:X completionStatus...
How can I do this in Java?
I've tried creating a Map (key: the regex, value: the replacement), reading my log file, and trying to match the keys against each line, but my output looks the same as the input.
FileInputStream fstream = new FileInputStream(file);
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader( in ));
FileWriter fstreamError = new FileWriter(myFile.replace(".", "Replaced."));
BufferedWriter output = new BufferedWriter(fstreamError);
while ((strFile = br.readLine()) != null) {
    for (String clave: expressions.keySet()) {
        Pattern p = Pattern.compile(clave);
        Matcher m = p.matcher(strFile); // get a matcher object
        strFile = m.replaceAll(expressions.get(clave));
        System.out.println(strFile);
    }
}
Any thoughts on this?
It seems like you are on a good path. I would however suggest several things:
Do not compile the regex every time. You should have them all precompiled and just produce new matchers from them in your loop.
You aren't really using the map as a map, but as a collection of pairs. You could easily make a small class RegexReplacement and then just have a List<RegexReplacement> that you iterate over in the loop.
class RegexReplacement {
    final Pattern regex;
    final String replacement;
    RegexReplacement(String regex, String replacement) {
        this.regex = Pattern.compile(regex);
        this.replacement = replacement;
    }
    String replace(String in) { return regex.matcher(in).replaceAll(replacement); }
}
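One likely reason the output looks unchanged: in regex syntax, :* means "zero or more colons", so the key orderNumber:* completionStatus never matches orderNumber:123 completionStatus; a key such as orderNumber:\d+ completionStatus would. The sketch below wires the RegexReplacement idea into a List, assuming a hypothetical rule format of one "regex;replacement" pair per line, split at the first ';' (so the regex itself must not contain ';'). Remember that $ and \ are special in replacement strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RegexRules {
    // Same shape as the RegexReplacement class suggested above.
    static class RegexReplacement {
        final Pattern regex;
        final String replacement;

        RegexReplacement(String regex, String replacement) {
            this.regex = Pattern.compile(regex);
            this.replacement = replacement;
        }

        String replace(String in) {
            return regex.matcher(in).replaceAll(replacement);
        }
    }

    // Parses hypothetical "regex;replacement" lines into precompiled rules.
    static List<RegexReplacement> parseRules(List<String> ruleLines) {
        List<RegexReplacement> rules = new ArrayList<>();
        for (String rule : ruleLines) {
            int sep = rule.indexOf(';');
            if (sep < 0) {
                continue; // skip malformed lines
            }
            rules.add(new RegexReplacement(rule.substring(0, sep), rule.substring(sep + 1)));
        }
        return rules;
    }

    // Applies every rule, in order, to one log line.
    static String applyAll(List<RegexReplacement> rules, String line) {
        for (RegexReplacement r : rules) {
            line = r.replace(line);
        }
        return line;
    }

    public static void main(String[] args) {
        List<RegexReplacement> rules = parseRules(Arrays.asList(
                "orderNumber:\\d+ completionStatus;orderNumber:X completionStatus"));
        System.out.println(applyAll(rules, ".... orderNumber:123 completionStatus..."));
    }
}
```

In the rule file itself the key would be written with a single backslash, orderNumber:\d+ completionStatus; the doubled backslash above is only Java string-literal escaping.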
Is this what you are looking for?
import java.text.MessageFormat;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexpTests {
    public static void main(String[] args) {
        String text = "orderNumber:123 completionStatus";
        String regexp = "(.*):\\d+ (.*)";
        String msgFormat = "{0}:X {1}";
        Pattern p = Pattern.compile(regexp);
        Matcher m = p.matcher(text);
        MessageFormat mf = new MessageFormat(msgFormat);
        if (m.find()) {
            // copy the capture groups into the MessageFormat arguments {0}, {1}, ...
            String[] captures = new String[m.groupCount()];
            for (int i = 0; i < m.groupCount(); i++) {
                captures[i] = m.group(i + 1);
            }
            System.out.println(mf.format(captures)); // orderNumber:X completionStatus
        }
    }
}
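For comparison, a hedged sketch (hypothetical class name) of the same substitution without MessageFormat: group references $1 and $2 in the replacement string of String.replaceAll play the role of {0} and {1}. Because (.*) is greedy, only the last ":digits " on a line is masked.

```java
class GroupReplace {
    // "$1" and "$2" insert capture groups 1 and 2 into the replacement.
    static String mask(String text) {
        return text.replaceAll("(.*):\\d+ (.*)", "$1:X $2");
    }

    public static void main(String[] args) {
        System.out.println(mask("orderNumber:123 completionStatus"));
    }
}
```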

In Java, how to print the entire line in a file when a string match is found

I have a text file called "Sample.text" that contains multiple lines. I need to search it for a particular string, and if the string is found, print the entire line. The search string is in the middle of the line. I am also using a StringBuffer to append the string after reading it from the text file. The text file is very large, so I don't want to iterate line by line. How can I do this?
You could do it with FileUtils from Apache Commons IO.
Small sample:
StringBuffer myStringBuffer = new StringBuffer();
List<String> lines = FileUtils.readLines(new File("/tmp/myFile.txt"), "UTF-8");
for (String line : lines) {
    if (line.contains("something")) {
        // append the whole matching line, keeping lines separated
        myStringBuffer.append(line).append(System.lineSeparator());
    }
}
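FileUtils.readLines loads the whole file into memory, which matters for a very large file. As a hedged sketch (class name made up), Java 8's Files.lines reads the file lazily, so only the matching lines are kept. Any approach still has to scan every line; this just avoids holding the whole file at once.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MatchingLines {
    // Keeps only the lines that contain needle; the stream is consumed lazily.
    static List<String> linesContaining(Stream<String> lines, String needle) {
        return lines.filter(line -> line.contains(needle)).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // e.g. java MatchingLines Sample.text something
            try (Stream<String> lines = Files.lines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
                linesContaining(lines, args[1]).forEach(System.out::println);
            }
        } else {
            Stream<String> demo = Stream.of("first line", "a line with something in it", "last line");
            linesContaining(demo, "something").forEach(System.out::println);
        }
    }
}
```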
We can also use a regex for string or pattern matching in a file.
Sample code:
import java.util.regex.*;
import java.io.*;
/**
 * Print all the strings that match a given pattern from a file.
 */
public class ReaderIter {
    public static void main(String[] args) throws IOException {
        // The RE pattern
        Pattern patt = Pattern.compile("[A-Za-z][a-z]+");
        // A BufferedReader over the file; try-with-resources closes it
        try (BufferedReader r = new BufferedReader(new FileReader("file.txt"))) {
            // For each line of input, try matching in it.
            String line;
            while ((line = r.readLine()) != null) {
                // For each match in the line, extract and print it.
                Matcher m = patt.matcher(line);
                while (m.find()) {
                    // Simplest method:
                    // System.out.println(m.group(0));
                    // Get the starting position of the text
                    int start = m.start(0);
                    // Get ending position
                    int end = m.end(0);
                    // Print whatever matched.
                    System.out.println(line.substring(start, end));
                }
            }
        }
    }
}
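The sample above prints only the matched text. Since the question asks for the entire line, here is a hedged variation (class name and demo input are made up) that keeps the same BufferedReader loop but collects the whole line whenever Matcher.find() succeeds anywhere in it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RegexLineFilter {
    // Returns each whole line in which the pattern matches anywhere
    // (find() rather than matches(), so the match may sit mid-line).
    static List<String> matchingLines(BufferedReader reader, Pattern pattern) {
        List<String> hits = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (pattern.matcher(line).find()) {
                    hits.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String text = "first line\nan order 42 was placed\nlast line"; // made-up demo input
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            for (String line : matchingLines(r, Pattern.compile("\\d+"))) {
                System.out.println(line);
            }
        }
    }
}
```

To read the real file, pass `new BufferedReader(new FileReader("Sample.text"))` instead of the StringReader.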
