I want to store regular expressions in a file, each followed by a replacement expression separated by a semicolon (or some other delimiter), e.g.:
orderNumber:* completionStatus;orderNumber:X completionStatus
My log file will contain lines like:
.... orderNumber:123 completionStatus...
and I want them to look like:
.... orderNumber:X completionStatus...
How can I do this in Java?
I've tried creating a Map (key: the regex, value: the replacement), reading my log file, and for each line trying to match the keys, but my output looks the same as the input.
FileInputStream fstream = new FileInputStream(file);
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader( in ));
FileWriter fstreamError = new FileWriter(myFile.replace(".", "Replaced."));
BufferedWriter output = new BufferedWriter(fstreamError);
while ((strFile = br.readLine()) != null) {
for (String clave: expressions.keySet()) {
Pattern p = Pattern.compile(clave);
Matcher m = p.matcher(strFile); // get a matcher object
strFile = m.replaceAll(expressions.get(clave));
System.out.println(strFile);
}
}
Any thoughts on this?
It seems like you are on a good path. I would however suggest several things:
Do not compile the regex every time. You should have them all precompiled and just produce new matchers from them in your loop.
You aren't really using the map as a map, but as a collection of pairs. You could easily make a small class RegexReplacement and then just have a List<RegexReplacement> that you iterate over in the loop.
class RegexReplacement {
final Pattern regex;
final String replacement;
RegexReplacement(String regex, String replacement) {
this.regex = Pattern.compile(regex);
this.replacement = replacement;
}
String replace(String in) { return regex.matcher(in).replaceAll(replacement); }
}
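To tie the suggestions above back to the question, here is a minimal sketch that loads "regex;replacement" rules and applies them to a log line. The class and method names (LogRewriter, parseRules, rewrite) are hypothetical, and the rule format is an assumption based on the question. Note that the pattern orderNumber:* from the question matches "orderNumber" followed by zero or more colons, not digits, which alone would leave the output unchanged; the sketch assumes the intent was orderNumber:\d+.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; names and the "regex;replacement" rule format are assumptions.
class LogRewriter {
    // One precompiled pattern paired with its replacement text.
    static final class Rule {
        final Pattern regex;
        final String replacement;

        Rule(String regex, String replacement) {
            this.regex = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    // Parses lines of the form "regex;replacement", splitting on the first semicolon only.
    static List<Rule> parseRules(List<String> ruleLines) {
        List<Rule> rules = new ArrayList<>();
        for (String ruleLine : ruleLines) {
            int sep = ruleLine.indexOf(';');
            if (sep < 0) {
                continue; // skip lines without a separator
            }
            rules.add(new Rule(ruleLine.substring(0, sep), ruleLine.substring(sep + 1)));
        }
        return rules;
    }

    // Applies every rule, in order, to a single log line and returns the result.
    static String rewrite(String line, List<Rule> rules) {
        for (Rule rule : rules) {
            line = rule.regex.matcher(line).replaceAll(rule.replacement);
        }
        return line;
    }

    public static void main(String[] args) {
        // In the rules file this line would read: orderNumber:\d+ completionStatus;orderNumber:X completionStatus
        List<Rule> rules = parseRules(Arrays.asList(
                "orderNumber:\\d+ completionStatus;orderNumber:X completionStatus"));
        System.out.println(rewrite(".... orderNumber:123 completionStatus...", rules));
    }
}
```

Splitting on the first semicolon keeps semicolons inside the replacement intact; a regex that itself contains a semicolon would need a different delimiter.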
Is this what you are looking for?
import java.text.MessageFormat;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexpTests {
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
String text = "orderNumber:123 completionStatus";
String regexp = "(.*):\\d+ (.*)";
String msgFormat = "{0}:X {1}";
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(text);
MessageFormat mf = new MessageFormat(msgFormat);
if (m.find()) {
String[] captures = new String[m.groupCount()];
for (int i = 0; i < m.groupCount(); i++) {
captures[i] = m.group(i + 1);
}
// Use the already-parsed MessageFormat instance instead of re-parsing the pattern string.
System.out.println(mf.format(captures));
}
}
}
I am trying to write a program that counts all the words in a text file.
I put every word that matches the patterns into a TreeMap.
The path of the text file is passed in args[0].
For example, the text file contains this text: The Project Gutenberg EBook of The Complete Works of William Shakespeare
The check for whether the TreeMap already contains the word returns false for the second occurrence of the word "The", but returns true for the second occurrence of the word "of".
I don't understand why...
This is my code:
public class WordCount
{
public static void main(String[] args)
{
// Charset charset = Charset.forName("UTF-8");
// Locale locale = new Locale("en", "US");
Path p0 = Paths.get(args[0]);
Path p1 = Paths.get(args[1]);
Path p2 = Paths.get(args[2]);
Pattern pattern1 = Pattern.compile("[a-zA-Z]");
Matcher matcher;
Pattern pattern2 = Pattern.compile("'.");
Map<String, Integer> alphabetical = new TreeMap<String, Integer>();
try (BufferedReader reader = Files.newBufferedReader(p0))
{
String line = null;
while ((line = reader.readLine()) != null)
{
// System.out.println(line);
for (String word : line.split("\\s"))
{
boolean found = false;
matcher = pattern1.matcher(word);
while (matcher.find())
{
found = true;
}
if (found)
{
boolean check = alphabetical.containsKey(word.toLowerCase());
if (!alphabetical.containsKey(word.toLowerCase()))
alphabetical.put(word.toLowerCase(), 1);
else
alphabetical.put(word.toLowerCase(), alphabetical.get(word.toLowerCase()).intValue() + 1);
}
else
{
matcher = pattern2.matcher(word);
while (matcher.find())
{
found = true;
}
if (found)
{
if (!alphabetical.containsKey(word.substring(1, word.length())))
alphabetical.put(word.substring(1, word.length()).toLowerCase(), 1);
else
alphabetical.put(word.substring(1, word.length()).toLowerCase(), alphabetical.get(word).intValue() + 1);
}
}
}
}
}
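I've tested your code and it works. I think you need to check your file's encoding.
It is most likely saved as UTF-8 with a BOM. Java does not strip the BOM when decoding, so the first word is read as "\uFEFFThe" and is stored under a different key than the later "The". Save the file as "UTF-8 without BOM" and you'll be OK!
Edit:
If you can't change the encoding, you can remove the BOM manually. See this link: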
http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
Regards
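As a rough illustration of removing the BOM manually, the sketch below drops a leading U+FEFF before splitting a line into words. The class and method names (BomAwareWordCount, stripBom, count) are hypothetical and not taken from the linked page; the counting is simplified to whitespace splitting.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper illustrating manual BOM removal; not the code from the linked page.
class BomAwareWordCount {
    // Removes a leading U+FEFF (the decoded UTF-8 byte order mark), if present.
    static String stripBom(String line) {
        return line.startsWith("\uFEFF") ? line.substring(1) : line;
    }

    // Counts lower-cased, whitespace-separated words of the first line of a file.
    static Map<String, Integer> count(String firstLine) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : stripBom(firstLine).split("\\s+")) {
            if (word.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            counts.merge(word.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "\uFEFFThe Project Gutenberg EBook of The Complete Works of William Shakespeare";
        System.out.println(count(line));
    }
}
```

Only the first line read from the file can carry the BOM, so in a real reader loop stripBom would be applied to that line alone.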
I have a file that contains self-closing anchor tags:
<p><a name="impact"/><span class="sectiontitle">Impact</span></p>
<p><a name="Summary"/><span class="sectiontitle">Summary</span></p>
I want to correct the tags as shown below:
<p><a name="impact"><span class="sectiontitle">Impact</span></a></p>
<p><a name="Summary"><span class="sectiontitle">Summary</span></a></p>
I've written this code to find and replace incorrect anchor tags
package mypack;
import java.io.*;
import java.util.regex.*;
public class AnchorIssue {
static int count=0;
public static void main(String[] args) throws IOException {
Pattern pFinder = Pattern.compile("<a name=\\\".*\\\"(\\/)>(.*)(<)");
BufferedReader r = new BufferedReader
(new FileReader("D:/file.txt"));
String line;
while ((line =r.readLine()) != null) {
Matcher m1= pFinder.matcher(line);
while (m1.find()) {
int start = m1.start(0);
int end = m1.end(0);
++count;
// Use CharacterIterator.substring(offset, end);
String actual=line.substring(start, end);
System.out.println(count+"."+"Actual String :-"+actual);
actual.replace(m1.group(1),"");
System.out.println(actual);
actual.replaceAll(m1.group(3),"</a><");
System.out.println(actual);
// Use CharacterIterator.substring(offset, end);
System.out.println(count+"."+"Replaced"+actual);
}
}
r.close();
}
}
The above code returns the correct number of self-closing anchor tags in file but the replace code is not working properly.
Your problem is greediness. I.e. the .*" will match everything up to the last " in that line. There are two fixes for this.
Both fixes replace this line:
Pattern pFinder = Pattern.compile("<a name=\\\".*\\\"(\\/)>(.*)(<)");
Option one: use a negated character class:
Pattern pFinder = Pattern.compile("<a name=\\\"[^\\\"]*\\\"(\\/)>(.*)(<)");
Option two: use a lazy quantifier:
Pattern pFinder = Pattern.compile("<a name=\\\".*?\\\"(\\/)>(.*)(<)");
Since the file structure seems "constant", it might be better to simplify the problem to a matter of simple replaces as opposed to complex html matching. It seems to me that you're not really interested in the content of the anchor tag, so just replace /><span with ><span and </span></p> with </span></a></p>.
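The two ideas above (avoid greedy matching, keep the replacement simple) can be combined into a single replaceAll with back-references. This is only a sketch under the assumption that each self-closing anchor is immediately followed by exactly one span element, as in the sample lines; the class name AnchorFixer is hypothetical. It also sidesteps a second problem in the question's code: Java strings are immutable, so the results of actual.replace(...) and actual.replaceAll(...) are discarded unless they are assigned.

```java
// Hypothetical helper; assumes every <a name="..."/> is directly followed by one <span>...</span>.
class AnchorFixer {
    // Rewrites <a name="x"/><span ...>text</span> into <a name="x"><span ...>text</span></a>.
    static String fix(String line) {
        return line.replaceAll(
                "<a name=\"([^\"]*)\"/>(<span[^>]*>.*?</span>)", // group 1: name, group 2: the span
                "<a name=\"$1\">$2</a>");
    }

    public static void main(String[] args) {
        System.out.println(fix("<p><a name=\"impact\"/><span class=\"sectiontitle\">Impact</span></p>"));
    }
}
```

The negated class [^"]* and the lazy .*? keep each match inside one tag pair, so several anchors on the same line are handled independently.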
Using the code below, I am able to find and replace all self-closing anchor tags.
package mypack;
import java.io.*;
import java.util.regex.*;
public class AnchorIssue {
static int count=0;
public static void main(String[] args) throws IOException {
Pattern pFinder = Pattern.compile("<a name=\\\".*?\\\"(\\/><span)(.*)(<\\/span>)");
BufferedReader r = new BufferedReader
(new FileReader("file.txt"));
String line;
while ((line =r.readLine()) != null) {
Matcher m1= pFinder.matcher(line);
while (m1.find()) {
int start = m1.start(0);
int end = m1.end(0);
++count;
// Use CharacterIterator.substring(offset, end);
String actual=line.substring(start, end);
System.out.println(count+"."+"Actual String : "+actual);
actual= actual.replaceAll(m1.group(1),"><span");
System.out.println("\n");
actual= actual.replaceAll(m1.group(3),"</span></a>");
System.out.println(count+"."+"Replaced : "+actual);
System.out.println("\n");
System.out.println("---------------------------------------------------");
}
}
r.close();
}
}
My Java program needs to launch agrep.exe with parameters for every pair of elements in a big matrix and get the number of matching errors between the two strings. I've written the code below, but it runs very slowly. Can I speed up this part of the code? Or can you suggest a Java implementation of the agrep function?
public static double getSignatureDistance(String one, String two) throws IOException, InterruptedException {
String strReprOne = one.replace(".*","").replace("\\.",".");
String strReprTwo = two.replace(".*","").replace("\\.",".");
PrintWriter writer = new PrintWriter("tmp.txt", "UTF-8");
writer.print(strReprTwo);
writer.close();
List<String> cmd = new ArrayList<>();
cmd.add("agrep.exe");
cmd.add("-B");
cmd.add(one);
cmd.add("tmp.txt");
ProcessBuilder pb = new ProcessBuilder(cmd);
pb.redirectErrorStream(true);
Process proc = pb.start();
BufferedReader in = new BufferedReader(new InputStreamReader(proc.getInputStream()));
StringBuilder lineBuilder = new StringBuilder();
String line = "";
char[] buf = new char[2];
while (in.read(buf) == 2) {
lineBuilder.append(buf);
}
line = lineBuilder.toString();
Pattern p = Pattern.compile("(\\d+)\\serror");
Matcher m = p.matcher(line);
double agrep = 0;
if(m.find()) {
agrep = Double.valueOf(m.group(1));
}
in.close();
proc.destroy();
double length = strReprOne.length();
return agrep/length;
}
Can I use the FREJ library for this purpose? For example, match the strings, get the match result, and multiply it by the length of the matched region?
Nobody answered, so I ended up using the FREJ library.
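For readers who want to avoid starting an external process per pair, one in-process alternative is the classic dynamic-programming approach to approximate substring matching (Sellers' variant of edit distance). The sketch below is not agrep and not FREJ: the class name ApproxMatch is hypothetical, it handles plain strings only (no regex-like signatures), and it assumes unit cost for insertions, deletions, and substitutions, so its counts may differ from what agrep reports.

```java
// Hypothetical helper; a swapped-in technique, not a port of agrep or FREJ.
class ApproxMatch {
    // Minimum number of edits (insert, delete, substitute) needed to turn `pattern`
    // into some substring of `text`.
    static int minErrors(String pattern, String text) {
        int m = pattern.length();
        int[] prev = new int[m + 1]; // DP column for the previous text position
        int[] curr = new int[m + 1]; // DP column for the current text position
        for (int i = 0; i <= m; i++) {
            prev[i] = i; // against an empty text, i pattern characters must be deleted
        }
        int best = prev[m];
        for (int j = 1; j <= text.length(); j++) {
            curr[0] = 0; // a match may start at any text position at no cost
            for (int i = 1; i <= m; i++) {
                int cost = pattern.charAt(i - 1) == text.charAt(j - 1) ? 0 : 1;
                curr[i] = Math.min(prev[i - 1] + cost,          // match or substitute
                          Math.min(prev[i] + 1, curr[i - 1] + 1)); // extra text char, or skipped pattern char
            }
            best = Math.min(best, curr[m]); // a match may end at any text position
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(minErrors("abc", "xxabdxx"));
    }
}
```

The cost is proportional to pattern length times text length per pair, with no temporary file or process start-up.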
My program needs to add a newline after every 3rd element in the ArrayList. Here is my input file, which contains the following data:
934534519137441534534534366, 0796544345345345348965,
796345345345544894534565, 734534534596544534538965 ,
4058991374534534999999, 34534539624, 91953413789453450452,
9137534534482080, 9153453459137482080, 405899137999999,
9653453564564524, 91922734534779797, 0834534534980001528, 82342398534
6356343430001528, 405899137999999, 9191334534643534547423752,
3065345782642564522021, 826422205645345345645621,
40584564563499137999999, 953453345344624, 3063454564345347,
919242353463428434451, 09934634634604641264, 990434634634641264,
40346346345899137999999, 963445636534653452, 919234634643325857953,
91913453453437987385, 59049803463463453455421, 405899137534534999999,
9192273453453453453758434,
and it goes on to multiple lines.
Code:
public class MyFile {
private static final Pattern ISDN =Pattern.compile("\\s*ISDN=(.*)");
public List<String> getISDNsFromFile(final String fileName) throws IOException {
final Path path = Paths.get(fileName);
final List<String> ret = new ArrayList<>();
Matcher m;
String line;
int index = 0;
try (
final BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);) {
while ((line = reader.readLine()) != null) {
m = ISDN.matcher(line);
if (m.matches()) {
ret.add(m.group(1));
index++;
if(index%3==0){
if(ret.size() == index){
ret.add(m.group(1).concat("\n"));
}
}
}
}
return ret;
}
}
}
I changed your code to read using the "new" Java 7 files I/O and corrected the use of a line separator, along with some formatting. Here I iterate over the list after it was completed. If it is really long you can iterate over it while constructing it.
public class MyFile {
private static final Pattern ISDN = Pattern.compile("\\s*ISDN=(.*)");
private static final String LS = System.getProperty("line.separator");
public List<String> getISDNsFromFile(final String fileName) throws IOException {
List<String> ret = new ArrayList<>();
Matcher m;
List<String> lines = Files.readAllLines(Paths.get(fileName), StandardCharsets.UTF_8);
for (String line : lines) {
m = ISDN.matcher(line);
if (m.matches())
ret.add(m.group(1));
}
for (int i = 3; i < ret.size(); i+=4)
ret.add(i, LS);
return ret;
}
}
I think you do not need to compare the size of the ArrayList with index. Just remove that condition and try this:
if(index%3==0){
ret.add(System.getProperty("line.separator"));
}
Although I support the comment made by @Peter:
I would add them all to a List and only add the formatting when you print. Adding formatting in your data usually leads to confusion.
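A minimal sketch of that idea: keep the list as pure data and insert the line breaks only when building the output. The class and method names (GroupedPrinter, formatInGroups) and the ", " separator are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the list stays pure data and formatting happens at output time.
class GroupedPrinter {
    // Joins items with ", " and starts a new line after every `perLine` items.
    static String formatInGroups(List<String> items, int perLine) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            sb.append(items.get(i));
            if (i == items.size() - 1) {
                break; // no separator after the last item
            }
            sb.append((i + 1) % perLine == 0 ? System.lineSeparator() : ", ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> isdns = Arrays.asList("9137534534482080", "9153453459137482080",
                "405899137999999", "9653453564564524");
        System.out.println(formatInGroups(isdns, 3));
    }
}
```

Because the separators never enter the list, code that later searches or counts the ISDN values does not need to skip formatting entries.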
I have a text file called "Sample.text" that contains multiple lines. I need to search this file for a particular string, and if the string is found, print the entire line that contains it. The search string is in the middle of the line. I am also using a StringBuffer to append each string after reading it from the text file. The text file is very large, so I don't want to iterate line by line. How can I do this?
You could do it with FileUtils from Apache Commons IO.
Small sample:
StringBuffer myStringBuffer = new StringBuffer();
// readLines loads the whole file into memory as a list of lines.
List<String> lines = FileUtils.readLines(new File("/tmp/myFile.txt"), "UTF-8");
for (String line : lines) {
if (line.contains("something")) {
myStringBuffer.append(line);
}
}
We can also use a regex for string or pattern matching in a file.
Sample code:
import java.util.regex.*;
import java.io.*;
/**
* Print all the strings that match a given pattern from a file.
*/
public class ReaderIter {
public static void main(String[] args) throws IOException {
// The RE pattern
Pattern patt = Pattern.compile("[A-Za-z][a-z]+");
// A FileReader (see the I/O chapter)
BufferedReader r = new BufferedReader(new FileReader("file.txt"));
// For each line of input, try matching in it.
String line;
while ((line = r.readLine()) != null) {
// For each match in the line, extract and print it.
Matcher m = patt.matcher(line);
while (m.find()) {
// Simplest method:
// System.out.println(m.group(0));
// Get the starting position of the text
int start = m.start(0);
// Get ending position
int end = m.end(0);
// Print whatever matched.
// Use CharacterIterator.substring(offset, end);
System.out.println(line.substring(start, end));
}
}
}
}
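Finding a line that contains a given string still requires scanning the file, but it does not require holding the whole file in memory. As a sketch using only the JDK (Java 8 or later), Files.lines reads lazily and the filter keeps just the matching lines. The class and method names (LineSearch, linesContaining), the file name, and the search string are placeholders.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; file name and search string are placeholders.
class LineSearch {
    // Returns every line of the file that contains `needle` anywhere in it.
    static List<String> linesContaining(Path file, String needle) throws IOException {
        // Files.lines streams the file lazily; try-with-resources closes it afterwards.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(needle))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        for (String line : linesContaining(Paths.get("Sample.text"), "something")) {
            System.out.println(line);
        }
    }
}
```

If the number of matching lines is itself very large, the lines can be printed inside the stream with forEach instead of being collected into a list.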