Search for a String in a Text File using Java 8

I have a long text file that I want to read and extract some data from. Using JavaFX and FXML, I am using a FileChooser to load the file and get the file path.
My controller.java has the following:
private void handleButtonAction(ActionEvent event) throws IOException {
    FileChooser fileChooser = new FileChooser();
    FileChooser.ExtensionFilter extFilter =
            new FileChooser.ExtensionFilter("TXT files (*.txt)", "*.txt");
    fileChooser.getExtensionFilters().add(extFilter);
    stage = (Stage) button.getScene().getWindow(); // resolve the stage before showing the dialog
    File file = fileChooser.showOpenDialog(stage);
    System.out.println(file);
}
Sample of the text file (note that some of the file content is split across two physical lines; for example, -Ba \ and 10.10.10.3 below are parts of one logical line):
net ip-interface create 10.10.10.2 255.255.255.128 MGT-1 -Ba \
10.10.10.3
net ip-interface create 192.168.1.1 255.255.255.0 G-1 -Ba \
192.168.1.2
net route table create 10.10.10.5 255.255.255.255 10.10.10.1 -i \
MGT-1
net route table create 10.10.10.6 255.255.255.255 10.10.10.1 -i \
MGT-1
I am looking for a way to search this file and output the following:
MGT-1 ip-interface 10.10.10.2
MGT-1 Backup ip-interface 10.10.10.3
G-1 ip-interface 192.168.1.1
G-1 Backup ip-interface 192.168.1.2
MGT-1 route 10.10.10.5 DFG 10.10.10.1
MGT-1 route 10.10.10.6 DFG 10.10.10.1

Of course you can read the input file as a stream of lines using BufferedReader.lines or Files.lines. The tricky thing here is how to deal with the trailing "\". There are several possible solutions. You may write your own Reader which wraps an existing Reader and simply drops a backslash followed by EOL. Alternatively, you can write a custom Iterator or Spliterator which takes the BufferedReader.lines stream as input and handles this case. I'd suggest using my StreamEx library, which already has a method for such tasks, called collapse:
StreamEx.ofLines(reader).collapse((a, b) -> a.endsWith("\\"),
        (a, b) -> a.substring(0, a.length() - 1).concat(b));
The first argument is a predicate which is applied to two adjacent lines and should return true if the lines need to be merged. The second argument is a function which actually merges the two lines (we chop off the backslash via substring, then concatenate the next line).
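If you'd rather not add a dependency, the same merging step can be done in plain Java 8 before streaming. A minimal sketch (class and method names are mine, for illustration only):
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class MergeContinuationLines {
    // Joins every line ending in a backslash with the line that follows it.
    static List<String> mergedLines(BufferedReader reader) throws IOException {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.endsWith("\\")) {
                current.append(line, 0, line.length() - 1); // chop the backslash
            } else {
                current.append(line);
                result.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            result.add(current.toString()); // file ended with a dangling backslash
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(Paths.get("test.txt"))) {
            mergedLines(r).forEach(System.out::println);
        }
    }
}
The resulting list can then be streamed and processed exactly like the collapsed StreamEx stream above.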
Now you can just split each line on whitespace and convert it to one or two output lines according to your task. It's better to do this in a separate method. The whole code:
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.regex.Pattern;
import java.util.stream.Stream;

import one.util.streamex.StreamEx; // the package was javax.util.streamex in very old StreamEx releases

public class ParseFile {
    static Stream<String> convertLine(String[] fields) {
        switch (fields[1]) {
        case "ip-interface":
            return Stream.of(fields[5] + " " + fields[1] + " " + fields[3],
                    fields[5] + " Backup " + fields[1] + " " + fields[7]);
        case "route":
            return Stream.of(fields[8] + " route " + fields[4] + " DFG " + fields[6]);
        default:
            throw new IllegalArgumentException("Unrecognized input: "
                    + String.join(" ", fields));
        }
    }

    static Stream<String> convert(Reader reader) {
        return StreamEx.ofLines(reader)
                .collapse((a, b) -> a.endsWith("\\"),
                        (a, b) -> a.substring(0, a.length() - 1).concat(b))
                .map(Pattern.compile("\\s+")::split)
                .flatMap(ParseFile::convertLine);
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = new InputStreamReader(
                ParseFile.class.getResourceAsStream("test.txt"))) {
            convert(r).forEach(System.out::println);
        }
    }
}

Related

Beanshell sampler is not stopping after execution

I am printing the responses to a CSV file using a Beanshell sampler, but it does not stop after completion.
What can be done so that it stops after printing? Below is the sample code I have used; acctId is set in a PreProcessor from another thread group.
import java.io.FileWriter;
import java.util.Arrays;
import java.io.Writer;
import java.util.List;

char SEPARATOR = ',';

public void writeLine(FileWriter writer, String[] params, char separator) {
    boolean firstParam = true;
    StringBuilder stringBuilder = new StringBuilder();
    String param = "";
    for (int i = 0; i < params.length; i++) {
        param = params[i];
        log.info(param);
        if (!firstParam) {
            stringBuilder.append(separator);
        }
        stringBuilder.append(param);
        firstParam = false;
    }
    stringBuilder.append("\n");
    log.info(stringBuilder.toString());
    writer.append(stringBuilder.toString());
}

String csvFile = "D:/jmeter/test1/result.csv"; // for example '/User/Downloads/blabla.csv'
//String[] params = {"${acctId}", "${tranId}"};
String[] params = {"${acctId}"};

FileWriter fileWriter = new FileWriter(csvFile, true);
writeLine(fileWriter, params, SEPARATOR);
fileWriter.flush();
fileWriter.close();
You can use the "View Results Tree" listener or the "Simple Data Writer" to save the response messages. Just click "Configure", choose "Save As XML", and select "Save response data (XML)" along with the other required fields. Though, this is not recommended for load tests.
The recommended way of saving a variable into a CSV file is using the Sample Variables property:
Add the next line to the user.properties file (it lives in the "bin" folder of your JMeter installation):
sample_variables=acctId
Restart JMeter to pick the property up
That's it. Now, when you run your JMeter test in command-line non-GUI mode like:
jmeter -n -t test.jmx -l result.jtl
You will see an extra column in the result.jtl file holding the value of the acctId variable for each sampler.
Also be aware that starting from JMeter 3.1 it is recommended to use Groovy for any form of scripting. You will be able to replace your code with something like:
new File('D:/jmeter/test1/result.csv') << vars.get('acctId') << System.getProperty('line.separator')
If you don't like the Groovy syntax, be aware that you can use the FileUtils.writeStringToFile() function.
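For instance, a minimal sketch of that approach in Java-style script syntax (the path and variable name are taken from the question; vars is the standard JMeter script binding):
import java.io.File;
import org.apache.commons.io.FileUtils;

// Append the variable plus a line separator; the 'true' flag keeps existing content.
FileUtils.writeStringToFile(new File("D:/jmeter/test1/result.csv"),
        vars.get("acctId") + System.getProperty("line.separator"), true);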

Replace Pattern match with Preferred Text Java

Hey, I am trying to replace a regex pattern in a directory of files with the character 'X'. I started out trying to alter one file, but that is not working. I came up with the following code; any help would be appreciated.
My goal is to read all the file content, find the regex pattern, and replace it.
Also, this code is not working: it runs but does nothing to the text file.
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

public class DataChange {
    public static void main(String[] args) throws IOException {
        String absolutePathOne = "C:\\Users\\hoflerj\\Desktop\\After\\test.txt";
        String[] files = { "test.txt" };
        for (String file : files) {
            File f = new File(file);
            String content = FileUtils.readFileToString(new File(absolutePathOne));
            FileUtils.writeStringToFile(f, content.replaceAll("2018(.+)", "X"));
        }
    }
}
The file content is:
3-MAAAA2017/2/00346
I am trying to have it read through and replace 2017/2/00346 with X's.
My goal is to do this for about 3 files at one time as well.
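Two things stand out in the code above: it reads from absolutePathOne but writes to a different File("test.txt") resolved against the working directory, and the regex looks for 2018 while the data contains 2017. A sketch that rewrites each file in place (the paths and regex are illustrative assumptions):
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.FileUtils;

public class DataChangeSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical list of files to process; adjust to the real paths.
        String[] files = {
                "C:\\Users\\hoflerj\\Desktop\\After\\test.txt",
                "C:\\Users\\hoflerj\\Desktop\\After\\test2.txt",
                "C:\\Users\\hoflerj\\Desktop\\After\\test3.txt"
        };
        for (String path : files) {
            File f = new File(path);
            String content = FileUtils.readFileToString(f, StandardCharsets.UTF_8);
            // Replace "2017" and everything after it up to whitespace with X's.
            FileUtils.writeStringToFile(f, content.replaceAll("2017\\S*", "XXX"),
                    StandardCharsets.UTF_8);
        }
    }
}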

Java - Download sequence file in Hadoop

I have a problem copying binary files (which are stored as sequence files in Hadoop) to my local machine. The problem is that the binary file I downloaded from HDFS is not the original binary file I generated when running my map-reduce tasks. I googled similar problems, and I guess the issue is that when I copy the sequence file to my local machine, I get the header of the sequence file plus the original file.
My question is: is there any way to avoid downloading the header but still preserve my original binary file?
There are two ways I can think of:
I can transform the binary file into some other format, like Text, so that I can avoid using SequenceFile. After I do copyToLocal, I transform it back to a binary file.
I still use the sequence file. But when I generate the binary file, I also generate some meta information about the corresponding sequence file (e.g. the length of the header and the original length of the file). And after I do copyToLocal, I use the downloaded binary file (which contains the header, etc.) along with the meta information to recover my original binary file.
I don't know which one is feasible. Could anyone give me a solution? Could you also show me some sample code for the solution you give?
I highly appreciate your help.
I found a workaround for this question. Since downloading a sequence file gives you the header and other magic words in the binary file, the way I avoid this problem is to transform my original binary file into a Base64 string, store it as Text in HDFS, and, when downloading the encoded binary files, decode it back into my original binary file.
I know this takes extra time, but currently I haven't found any other solution to this problem. The hard part of directly removing the header and other magic words in the sequence file is that Hadoop may insert sync markers in between my binary file.
If anyone have a better solution to this problem, I'd be very happy to hear about that. :)
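For illustration, here is a minimal sketch of that Base64 round trip using java.util.Base64 (the file names are hypothetical):
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class Base64RoundTrip {
    public static void main(String[] args) throws Exception {
        // Encode: turn the binary file into one Base64 line that is safe to store as Text.
        byte[] original = Files.readAllBytes(Paths.get("original.bin"));
        String encoded = Base64.getEncoder().encodeToString(original);
        Files.write(Paths.get("encoded.b64"),
                encoded.getBytes(StandardCharsets.US_ASCII));

        // Decode: after copyToLocal, recover the original bytes.
        byte[] downloaded = Files.readAllBytes(Paths.get("encoded.b64"));
        Files.write(Paths.get("restored.bin"), Base64.getDecoder()
                .decode(new String(downloaded, StandardCharsets.US_ASCII)));
    }
}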
Use MapReduce code to read the SequenceFile, with SequenceFileInputFormat as the InputFormat for reading the sequence file in HDFS. This splits the file into key-value pairs, and the value contains only the binary file contents, which you can use to create your binary file.
Here is a code snippet that splits a sequence file made of multiple images into individual binary files and writes them out to the file system:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CreateOrgFilesFromSeqFile {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        if (args.length != 2) {
            System.out.println("Incorrect No of args (" + args.length + "). Expected 2 args: <seqFileInputPath> <outputPath>");
            System.exit(-1);
        }

        Path seqFileInputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "CreateSequenceFile");

        job.setJarByClass(CreateOrgFilesFromSeqFile.class);
        job.setMapperClass(CreateOrgFileFromSeqFileMapper.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, seqFileInputPath);
        FileOutputFormat.setOutputPath(job, outputPath);

        // Delete the existing output file
        outputPath.getFileSystem(conf).delete(outputPath, true);

        System.exit(job.waitForCompletion(true) ? 0 : -1);
    }
}
class CreateOrgFileFromSeqFileMapper extends Mapper<Text, BytesWritable, NullWritable, Text> {

    @Override
    public void map(Text key, BytesWritable value, Context context) throws IOException, InterruptedException {

        Path outputPath = FileOutputFormat.getOutputPath(context);
        FileSystem fs = outputPath.getFileSystem(context.getConfiguration());

        String[] filePathWords = key.toString().split("/");
        String fileName = filePathWords[filePathWords.length - 1];

        System.out.println("output file: " + outputPath.toString() + "/" + fileName + ", value length: " + value.getLength());

        // Write only the valid bytes of the BytesWritable to a new file
        try (FSDataOutputStream fdos = fs.create(new Path(outputPath.toString() + "/" + fileName))) {
            fdos.write(value.getBytes(), 0, value.getLength());
            fdos.flush();
        }

        context.write(NullWritable.get(), new Text(outputPath.toString() + "/" + fileName));
    }
}

Using Hadoop to find files that contain a particular string

I have around 1000 files, and each file is about 1 GB in size. I need to find a string in all these 1000 files, and also which files contain that particular string. I am working with the Hadoop File System, and all those 1000 files are in HDFS.
All the 1000 files are under a folder called real, so if I run the command below, I get all the 1000 files. I need to find which files under the real folder contain a particular string, hello.
bash-3.00$ hadoop fs -ls /technology/dps/real
And this is my data structure in HDFS:
row format delimited
fields terminated by '\29'
collection items terminated by ','
map keys terminated by ':'
stored as textfile
How can I write MapReduce jobs for this particular problem, so that I can find which files contain a particular string? Any simple example will be of great help to me.
Update:
With the use of grep in Unix I can solve the above problem scenario, but it is very, very slow and it takes a lot of time to get the actual output:
hadoop fs -ls /technology/dps/real | awk '{print $8}' | while read f; do hadoop fs -cat $f | grep cec7051a1380a47a4497a107fecb84c1 >/dev/null && echo $f; done
So that is the reason I was looking for some MapReduce job to do this kind of problem...
It sounds like you're looking for a grep-like program, which is easy to implement using Hadoop Streaming (the Hadoop Java API would work too):
First, write a mapper that outputs the name of the file being processed if the line being processed contains your search string. I used Python, but any language would work:
#!/usr/bin/env python
import os
import sys

SEARCH_STRING = os.environ["SEARCH_STRING"]

for line in sys.stdin:
    if SEARCH_STRING in line.split():
        print os.environ["map_input_file"]
This code reads the search string from the SEARCH_STRING environment variable. Here, I split the input line and check whether the search string matches any of the splits; you could change this to perform a substring search or use regular expressions to check for matches.
Next, run a Hadoop streaming job using this mapper and no reducers:
$ bin/hadoop jar contrib/streaming/hadoop-streaming-*.jar \
        -D mapred.reduce.tasks=0 \
        -input hdfs:///data \
        -mapper search.py \
        -file search.py \
        -output /search_results \
        -cmdenv SEARCH_STRING="Apache"
The output will be written in several parts; to obtain a list of matches, you can simply cat the files (provided they aren't too big):
$ bin/hadoop fs -cat /search_results/part-*
hdfs://localhost/data/CHANGES.txt
hdfs://localhost/data/CHANGES.txt
hdfs://localhost/data/ivy.xml
hdfs://localhost/data/README.txt
...
To get the filename you are currently processing, do:
((FileSplit) context.getInputSplit()).getPath().getName()
Then, as you search your file record by record, when you see hello, emit the above path (and maybe the line or anything else).
Set the number of reducers to 0; they aren't doing anything here.
Does 'row format delimited' mean that lines are delimited by a newline? In that case, TextInputFormat and LineRecordReader work fine here.
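A minimal mapper sketch along those lines (the class name and search string are my own illustration, not from the question):
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class GrepMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    private static final String SEARCH_STRING = "hello"; // assumed search term

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (value.toString().contains(SEARCH_STRING)) {
            // Emit the path of the file this split came from.
            Path path = ((FileSplit) context.getInputSplit()).getPath();
            context.write(new Text(path.toString()), NullWritable.get());
        }
    }
}
Each matching line emits the file path once, so the output may contain duplicates; they can be de-duplicated afterwards, or by adding a reducer that writes each key once.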
You can try something like this, though I'm not sure if it's an efficient way to go about it. Let me know if it works - I haven't tested it or anything.
You can use it like this: java SearchFiles /technology/dps/real hello, making sure you run it from the appropriate directory, of course.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;

public class SearchFiles {

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: [search-dir] [search-string]");
            return;
        }
        File searchDir = new File(args[0]);
        String searchString = args[1];
        ArrayList<File> matches = checkFiles(searchDir.listFiles(), searchString, new ArrayList<File>());
        System.out.println("These files contain '" + searchString + "':");
        for (File file : matches) {
            System.out.println(file.getPath());
        }
    }

    private static ArrayList<File> checkFiles(File[] files, String search, ArrayList<File> acc) throws IOException {
        for (File file : files) {
            if (file.isDirectory()) {
                checkFiles(file.listFiles(), search, acc);
            } else {
                if (fileContainsString(file, search)) {
                    acc.add(file);
                }
            }
        }
        return acc;
    }

    private static boolean fileContainsString(File file, String search) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(file));
        String line;
        while ((line = in.readLine()) != null) {
            if (line.contains(search)) {
                in.close();
                return true;
            }
        }
        in.close();
        return false;
    }
}

Reading Java Properties file without escaping values

My application needs to use a .properties file for configuration.
In the properties file, users are allowed to specify paths.
Problem
Properties files need values to be escaped, e.g.
dir = c:\\mydir
Needed
I need some way to accept a properties file where the values are not escaped, so that the users can specify:
dir = c:\mydir
Why not simply extend the Properties class to double the backslashes before the standard parsing happens? A nice feature of this is that through the rest of your program you can still use the original Properties API.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.Scanner;

public class PropertiesEx extends Properties {
    public void load(FileInputStream fis) throws IOException {
        Scanner in = new Scanner(fis);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (in.hasNext()) {
            // Double every backslash so the standard loader un-escapes it back to one.
            out.write(in.nextLine().replace("\\", "\\\\").getBytes());
            out.write("\n".getBytes());
        }
        InputStream is = new ByteArrayInputStream(out.toByteArray());
        super.load(is);
    }
}
Using the new class is as simple as:
PropertiesEx p = new PropertiesEx();
p.load(new FileInputStream("C:\\temp\\demo.properties"));
p.list(System.out);
The escaping code could also be improved upon, but the general principle is there.
Two options:
use the XML properties format instead
write your own parser for a modified .properties format without escapes
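For the first option, a minimal sketch (the file name is hypothetical; XML property values need no backslash escaping):
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class XmlPropertiesDemo {
    public static void main(String[] args) throws IOException {
        // config.xml contains e.g.: <entry key="dir">c:\mydir</entry>
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream("config.xml")) {
            props.loadFromXML(in);
        }
        System.out.println(props.getProperty("dir")); // prints c:\mydir
    }
}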
You can "preprocess" the file before loading the properties, for example:
public InputStream preprocessPropertiesFile(String myFile) throws IOException {
    Scanner in = new Scanner(new FileReader(myFile));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while (in.hasNext()) {
        out.write(in.nextLine().replace("\\", "\\\\").getBytes());
        out.write('\n'); // keep the line breaks, or all properties end up on one line
    }
    return new ByteArrayInputStream(out.toByteArray());
}
And your code could look this way:
Properties properties = new Properties();
properties.load(preprocessPropertiesFile("path/myfile.properties"));
Doing this, your .properties file would look the way you need, and you will have the property values ready to use.
I know there should be better ways to manipulate files, but I hope this helps.
The right way would be to provide your users with a property file editor (or a plugin for their favorite text editor) which allows them entering the text as pure text, and would save the file in the property file format.
If you don't want this, you are effectively defining a new format for the same (or a subset of the) content model as the property files have.
Go the whole way and actually specify your format, and then think about a way to either
transform the format to the canonical one, and then use this for loading the files, or
parse this format and populate a Properties object from it.
Both of these approaches will only work directly if you actually can control your property object's creation, otherwise you will have to store the transformed format with your application.
So, let's see how we can define this. The content model of normal property files is simple:
A map of string keys to string values, both allowing arbitrary Java strings.
The escaping which you want to avoid serves just to allow arbitrary Java strings, and not just a subset of these.
An often sufficient subset would be:
A map of string keys (not containing any whitespace, : or =) to string values (not containing any leading or trailing white space or line breaks).
In your example dir = c:\mydir, the key would be dir and the value c:\mydir.
If we want our keys and values to contain any Unicode character (other than the forbidden ones mentioned), we should use UTF-8 (or UTF-16) as the storage encoding - since we have no way to escape characters outside of the storage encoding. Otherwise, US-ASCII or ISO-8859-1 (as normal property files) or any other encoding supported by Java would be enough, but make sure to include this in your specification of the content model (and make sure to read it this way).
Since we restricted our content model so that all "dangerous" characters are out of the way, we can now define the file format simply as this:
<simplepropertyfile> ::= (<line> <line break> )*
<line> ::= <comment> | <empty> | <key-value>
<comment> ::= <space>* "#" < any text excluding line breaks >
<key-value> ::= <space>* <key> <space>* "=" <space>* <value> <space>*
<empty> ::= <space>*
<key> ::= < any text excluding ':', '=' and whitespace >
<value> ::= < any text starting and ending not with whitespace,
not including line breaks >
<space> ::= < any whitespace, but not a line break >
<line break> ::= < one of "\n", "\r", and "\r\n" >
Every \ occurring in either key or value now is a real backslash, not anything which escapes something else.
Thus, for transforming it into the original format, we simply need to double it, like Grekz proposed, for example in a filtering reader:
public class DoubleBackslashFilter extends FilterReader {
    private boolean bufferedBackslash = false;

    public DoubleBackslashFilter(Reader org) {
        super(org);
    }

    public int read() throws IOException {
        if (bufferedBackslash) {
            bufferedBackslash = false;
            return '\\';
        }
        int c = super.read();
        if (c == '\\')
            bufferedBackslash = true;
        return c;
    }

    public int read(char[] buf, int off, int len) throws IOException {
        int read = 0;
        if (bufferedBackslash) {
            buf[off] = '\\';
            read++;
            off++;
            len--;
            bufferedBackslash = false;
        }
        if (len > 1) {
            // read only half the buffer, so there is room to double each backslash
            int step = super.read(buf, off, len / 2);
            for (int i = 0; i < step; i++) {
                if (buf[off + i] == '\\') {
                    // shift everything from here one char to the right
                    System.arraycopy(buf, off + i, buf, off + i + 1, step - i);
                    // adjust parameters, skipping the duplicated backslash
                    step++;
                    i++;
                }
            }
            read += step;
        }
        return read;
    }
}
Then we would pass this Reader to our Properties object (or save the contents to a new file).
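For example, a minimal usage sketch (the file name is hypothetical):
Properties props = new Properties();
try (Reader in = new DoubleBackslashFilter(new FileReader("myfile.properties"))) {
    props.load(in); // Properties.load(Reader) un-escapes the doubled backslashes again
}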
Instead, we could simply parse this format ourselves.
public Properties parse(Reader in) throws IOException {
    BufferedReader r = new BufferedReader(in);
    Properties prop = new Properties();
    Pattern keyValPattern = Pattern.compile("\\s*=\\s*");
    String line;
    while ((line = r.readLine()) != null) {
        line = line.trim(); // remove leading and trailing space
        if (line.equals("") || line.startsWith("#")) {
            continue; // ignore empty and comment lines
        }
        // split on the first separator; the pattern also grabs space around it
        String[] kv = keyValPattern.split(line, 2);
        if (kv.length < 2) {
            // no key-value separator. TODO: throw exception or simply ignore this line?
            continue;
        }
        prop.setProperty(kv[0], kv[1]);
    }
    r.close();
    return prop;
}
Again, using Properties.store() after this, we can export it in the original format.
Based on @Ian Harrigan's answer, here is a complete solution for reading and writing Netbeans properties files (and other escaping properties files) as plain ASCII text files:
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/**
 * This class allows you to handle Netbeans properties files.
 * It is based on the work of: http://stackoverflow.com/questions/6233532/reading-java-properties-file-without-escaping-values.
 * It overrides both load methods in order to load a Netbeans property file, taking into account the \ that
 * would be escaped by the original java Properties load methods.
 * @author stephane
 */
public class NetbeansProperties extends Properties {
    @Override
    public synchronized void load(Reader reader) throws IOException {
        BufferedReader bfr = new BufferedReader(reader);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String readLine = null;
        while ((readLine = bfr.readLine()) != null) {
            out.write(readLine.replace("\\", "\\\\").getBytes());
            out.write("\n".getBytes());
        }//while
        InputStream is = new ByteArrayInputStream(out.toByteArray());
        super.load(is);
    }//met

    @Override
    public void load(InputStream is) throws IOException {
        load(new InputStreamReader(is));
    }//met

    @Override
    public void store(Writer writer, String comments) throws IOException {
        PrintWriter out = new PrintWriter(writer);
        if (comments != null) {
            out.print('#');
            out.println(comments);
        }//if
        List<String> listOrderedKey = new ArrayList<String>();
        listOrderedKey.addAll(this.stringPropertyNames());
        Collections.sort(listOrderedKey);
        for (String key : listOrderedKey) {
            String newValue = this.getProperty(key);
            out.println(key + "=" + newValue);
        }//for
        out.flush();
    }//met

    @Override
    public void store(OutputStream out, String comments) throws IOException {
        store(new OutputStreamWriter(out), comments);
    }//met
}//class
You could try using Guava's Splitter: split on '=' and build a map from the resulting Iterable.
The disadvantage of this solution is that it does not support comments.
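A minimal sketch of that Guava approach (the file name and encoding are assumptions; comment lines are not handled):
import com.google.common.base.Splitter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;

public class GuavaSplitterDemo {
    public static void main(String[] args) throws IOException {
        String content = new String(
                Files.readAllBytes(Paths.get("myfile.properties")),
                StandardCharsets.ISO_8859_1);
        // Split into lines, then each line on its first '='; no un-escaping happens.
        Map<String, String> map = Splitter.on('\n')
                .omitEmptyStrings()
                .trimResults()
                .withKeyValueSeparator(Splitter.on('=').trimResults().limit(2))
                .split(content);
        System.out.println(map.get("dir")); // c:\mydir
    }
}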
@pdeva: one more solution
// Reads the entire file into a String (Scanner is available since Java 1.5)
Scanner scan = new Scanner(new File("C:/workspace/Test/src/myfile.properties"));
scan.useDelimiter("\\Z");
String content = scan.next();

// Use the Apache Commons StringEscapeUtils.escapeJava() method to escape java characters
ByteArrayInputStream bi = new ByteArrayInputStream(StringEscapeUtils.escapeJava(content).getBytes());

// Load the properties file
Properties properties = new Properties();
properties.load(bi);
It's not an exact answer to your question, but a different solution that may be appropriate to your needs. In Java, you can use / as a path separator, and it'll work on Windows, Linux, and OSX. This is especially useful for relative paths.
In your example, you could use:
dir = c:/mydir
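For instance (a minimal sketch):
import java.io.File;

public class SlashDemo {
    public static void main(String[] args) {
        // java.io.File accepts forward slashes on Windows as well.
        File f = new File("c:/mydir/config.txt");
        System.out.println(f.getAbsolutePath()); // prints c:\mydir\config.txt on Windows
    }
}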
