How to mask the content of a text file in Java - java

I am looking to mask the content of a text file.
For example, the text file contains data like:
Peter|peter#gmail.com|312-445-9988|....|
John|john#gmail.com|123-457-6789|....|
Expected Output:
Peter|XXXXX#gmail.com|XXX-XXX-XXXX|....|
John|XXXX#gmail.com|XXX-XXX-XXXX|....|
I have to mask content such as the phone number and the mail ID, but only the part before #gmail.com (peter, not #gmail.com).
Here is the code I have tried. I got as far as reading the data from the text file; after that I have no idea how to proceed.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DataMasking {
    public static void main(String args[]) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("Filepath"));
        String str;
        List<String> parts = new ArrayList<String>();
        while ((str = in.readLine()) != null) {
            parts.add(str);
        }
        in.close();
        int size = parts.size();
        // Reduce the size by one because the first line is not counted (it only contains the file name and time stamp).
        size = size - 1;
        System.out.println("The Number of lines in the text file " + size);
    }
}
Any help is appreciated.

Maybe you want something like this. Try it:
BufferedReader in = new BufferedReader(new FileReader("filepath"));
String str;
List<String> parts = new ArrayList<String>();
while ((str = in.readLine()) != null) {
parts.add(str);
}
List<String> newList = new ArrayList<String>();
List<String> emailPart=new LinkedList<String>();
List<String> numberPart=new LinkedList<String>();
for(int i=1;i<parts.size();i++){
String[] strArr=parts.get(i).split("\\|");
for(int j=0;j<strArr.length;j++){
if(strArr[j].matches(".*#.*")){
int index=strArr[j].indexOf("#");
emailPart.add(strArr[j].substring(0, index).replaceAll("[A-Za-z0-9]", "X")+
strArr[j].substring(index, strArr[j].length()));
}
if(strArr[j].matches("[0-9\\-]+")){
numberPart.add(strArr[j].replaceAll("[0-9]", "X"));
}
}
newList.add(strArr[0]+"|"+emailPart.get(i-1)+"|"+numberPart.get(i-1));
}
System.out.println(newList);
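As a variation on the answer above, here is a minimal sketch that masks each record in place instead of collecting the e-mail and phone parts in parallel lists. The class name FieldMasker and its helper methods are hypothetical, the '#' separator is taken from the sample data in the question, and the phone pattern assumes the XXX-XXX-XXXX layout shown there.

```java
class FieldMasker {

    // Masks letters and digits before the '#' separator used in the sample data.
    public static String maskEmail(String field) {
        int sep = field.indexOf('#');
        return field.substring(0, sep).replaceAll("[A-Za-z0-9]", "X") + field.substring(sep);
    }

    // Masks every digit and keeps the dashes.
    public static String maskPhone(String field) {
        return field.replaceAll("[0-9]", "X");
    }

    // Masks the e-mail and phone fields of one pipe-delimited record.
    public static String maskLine(String line) {
        // A limit of -1 keeps trailing empty fields, so a trailing '|' survives the round trip.
        String[] fields = line.split("\\|", -1);
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].contains("#")) {
                fields[i] = maskEmail(fields[i]);
            } else if (fields[i].matches("[0-9]{3}-[0-9]{3}-[0-9]{4}")) {
                fields[i] = maskPhone(fields[i]);
            }
        }
        return String.join("|", fields);
    }

    public static void main(String[] args) {
        System.out.println(maskLine("Peter|peter#gmail.com|312-445-9988|"));
        System.out.println(maskLine("John|john#gmail.com|123-457-6789|"));
    }
}
```

Because every field keeps its position, any extra columns in the record pass through unchanged, and a line without an e-mail or phone field does not shift the indexes of later lines.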

Related

Trying to separate a txt file into two ArrayLists?

This image is a text file I need to separate by date and the digits alongside it
BufferedReader wordReader = new BufferedReader(new FileReader("\\Users\\rosha\\eclipse-workspace\\working\\src\\workingfix\\spx_data_five_years.txt"));
ArrayList<String> spxIndex = new ArrayList<>();
ArrayList<String> date = new ArrayList<>();
//populating the Array with the file
String line = wordReader.readLine();
while (line != null) {
date.add(line);
line = wordReader.readLine();
}
wordReader.close();
I would really love to understand how to separate this file into two arrays. I've been at it for a while, and some guidance in the right direction would be incredible. Apologies if it's a simple solution; for some reason I'm having trouble getting started.
Here is some of the text file; if I can get guidance on this I'll be in good shape.
1/4/2010 1132.99
1/5/2010 1136.52
1/6/2010 1137.14
1/7/2010 1141.69
1/8/2010 1144.98
1/11/2010 1146.98
1/12/2010 1136.22
1/13/2010 1145.68
Your two examples are a little bit different, in the Image, it seems like the entries are separated by a tab. In your text example, the entries are separated by a space. If they are separated by a space, a simple String[] splitter = line.split(" "); suffices. This gives you the result as an Array, which you can write in the ArrayLists.
Here is the solution, using split method
public static void main(String[] args) throws IOException {
ArrayList<String> spxIndex = new ArrayList<>();
ArrayList<String> date = new ArrayList<>();
String sCurrentLine;
BufferedReader br = new BufferedReader(new FileReader("\\Users\\rosha\\eclipse-workspace\\working\\src\\workingfix\\spx_data_five_years.txt"));
while ((sCurrentLine = br.readLine()) != null) {
String[] lineValues = sCurrentLine.split(" ");
date.add(lineValues[0]);
spxIndex.add(lineValues[1]);
}
br.close();
System.out.println(date);
System.out.println(spxIndex);
}
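Since the first answer points out that the separator may be a tab in one sample and a space in the other, here is a small sketch that splits on any run of whitespace, so both cases are handled. The class name ColumnSplitter and the method split are hypothetical, and the two input lines in main are made up from the sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnSplitter {

    // Splits each "date value" line on any run of whitespace (spaces or tabs).
    public static void split(List<String> lines, List<String> dates, List<String> values) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }
            dates.add(parts[0]);
            values.add(parts[1]);
        }
    }

    public static void main(String[] args) {
        // One line separated by a space, one by a tab
        List<String> lines = Arrays.asList("1/4/2010 1132.99", "1/5/2010\t1136.52");
        List<String> dates = new ArrayList<>();
        List<String> values = new ArrayList<>();
        split(lines, dates, values);
        System.out.println(dates);
        System.out.println(values);
    }
}
```

The length check also avoids an ArrayIndexOutOfBoundsException on an empty line at the end of the file.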

How to compare 2 files and remove non-existent lines?

I'm trying to remove non-existing lines from file 1 compared to file 2
Example:
Input
file 1
text
example
word
file 2
example
word
Output
file 1
example
word
My code is totally the opposite: it eliminates all duplicate words in the 2 files.
My actual output is:
file 1
text
Code
BufferedReader reader2 = new BufferedReader(new FileReader(file2));
Set<String> lines2 = new HashSet<String>(10000);
String line2;
while ((line2 = reader2.readLine()) != null) {
lines2.add(line2);
}
BufferedReader reader = new BufferedReader(new FileReader(file1));
Set<String> lines = new HashSet<String>(10000);
String line;
while ((line = reader.readLine()) != null) {
lines.add(line);
}
Set set3 = new HashSet(lines);
set3.removeAll(lines2);
You need the intersection of the two sets. Right now you are calculating the difference: the lines of file 1 that do not appear in file 2.
public static void main(String []args){
Set<String> file1 = new HashSet<>();
Set<String> file2 = new HashSet<>();
file1.add("text");
file1.add("example");
file1.add("word");
file2.add("example");
file2.add("word");
Set<String> intersection = new HashSet<>(file1);
intersection.retainAll(file2);
System.out.println(intersection);
}
Output:
[word, example]
You are almost there with your approach; all you are missing is one more line of code, where you call
lines.removeAll(set3);
Then the set lines holds the needed result.
In your original code, you read in file 2 then file 1 and just removed the words in file2 from file1, leaving the one different word.
Here I wrote out the code, and commented. You needed to have a set that then removed that one word from the complete list.
In my code I made a new set, just in case you want to rebuild the first set, and leave it as un-modified.
package scrapCompare;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
public class CompareLines {
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
//You create a set of words from file 1.
BufferedReader reader = new BufferedReader(new FileReader("file1"));
Set<String> lines = new HashSet<String>(10000);
String line;
while ((line = reader.readLine()) != null) {
lines.add(line);
}
//You create a set of words from file 2.
BufferedReader reader2 = new BufferedReader(new FileReader("file2"));
Set<String> lines2 = new HashSet<String>(10000);
String line2;
while ((line2 = reader2.readLine()) != null) {
lines2.add(line2);
}
//In your original code, you create a third set of words equal to file 1, and then delete all the words from file 2.
//It isolates the one different word, but you stopped there.
Set<String> set3 = new HashSet<String>(lines);
set3.removeAll(lines2);
//Removing that leftover word from file 1 leaves only the lines that both files share.
lines.removeAll(set3);
//the answer set is a copy, in case you want to rebuild the lines set.
Set<String> answer = new HashSet<String>(lines);
//iterator for printing to console.
Iterator<String> itr = answer.iterator();
//print the answer to console
while(itr.hasNext())
System.out.println(itr.next());
//close your readers
reader.close();
reader2.close();
}
}
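import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
public class RemoveLine {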
public static void main(String[] args) throws IOException {
String file = "../file.txt";
String file1 = "../file1.txt";
String file2 = "../file2.txt";
BufferedReader reader2 = new BufferedReader(new FileReader(file2));
Set<String> lines2 = new HashSet<String>(10000);
String line2;
while ((line2 = reader2.readLine()) != null) {
lines2.add(line2);
}
BufferedReader reader1 = new BufferedReader(new FileReader(file1));
Set<String> lines1 = new HashSet<String>(10000);
String line1;
while ((line1 = reader1.readLine()) != null) {
lines1.add(line1);
}
Set<String> outPut = lines1.stream().filter(l1 -> lines2.stream().anyMatch(l2 -> l2.equals(l1))).collect(Collectors.toSet());
Charset utf8 = StandardCharsets.UTF_8;
Files.write(Paths.get(file), outPut, utf8, StandardOpenOption.CREATE);
}
}

Scanning, splitting and assigning values from a text file

I'm having trouble scanning a given file for certain words and assigning them to variables. So far I've chosen to use Scanner over BufferedReader because it's more familiar. I'm given a text file, and for this particular part I'm trying to read the first two words of each line (potentially unlimited lines) and maybe add them to an array of sorts. This is what I have:
File file = new File("example.txt");
Scanner sc = new Scanner(file);
while (sc.hasNextLine()) {
String line = sc.nextLine();
String[] ary = line.split(",");
I know it's a fair distance off; however, I'm new to coding and cannot get past this wall...
An example input would be...
ExampleA ExampleAA, <other items separated by ",">
ExampleB ExampleBB, <other items separated by ",">
...
and the proposed output
VariableA = ExampleA ExampleAA
VariableB = ExampleB ExampleBB
...
You can try something like this
File file = new File("D:\\test.txt");
Scanner sc = new Scanner(file);
List<String> list =new ArrayList<>();
int i=0;
while (sc.hasNextLine()) {
list.add(sc.nextLine().split(",",2)[0]);
i++;
}
char point='A';
for(String str:list){
System.out.println("Variable"+point+" = "+str);
point++;
}
My input:
ExampleA ExampleAA, <other items separated by ",">
ExampleB ExampleBB, <other items separated by ",">
Output:
VariableA = ExampleA ExampleAA
VariableB = ExampleB ExampleBB
To rephrase, you are looking to read the first 2 words of a line (everything before the first comma) and store it in a variable to process further.
To do so, your current code looks fine, however, when you grab the line's data, use the substring function in conjunction with indexOf to just get the first part of the String before the comma. After that, you can do whatever processing you want to do with it.
In your current code, ary[0] should give you the first 2 words.
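The substring and indexOf approach described above could look like the following minimal sketch. The class name FirstField and the method beforeFirstComma are hypothetical, and the sample line in main is made up to match the input format from the question.

```java
class FirstField {

    // Returns the text before the first comma, or the whole line if there is no comma.
    public static String beforeFirstComma(String line) {
        int comma = line.indexOf(',');
        String head = (comma >= 0) ? line.substring(0, comma) : line;
        return head.trim();
    }

    public static void main(String[] args) {
        // prints "ExampleA ExampleAA"
        System.out.println(beforeFirstComma("ExampleA ExampleAA, item1, item2"));
    }
}
```

Checking the result of indexOf before calling substring matters, because indexOf returns -1 when the line contains no comma.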
public static void main(String[] args) throws IOException
{
    File file = new File("example.txt");
    BufferedReader br = new BufferedReader(new FileReader(file));
    String line;
    List<String[]> l = new ArrayList<String[]>();
    while ((line = br.readLine()) != null) {
        System.out.println(line);
        line = line.trim(); // remove leading and trailing whitespace
        String[] arr = line.split(",");
        String[] ary = arr[0].split(" ");
        if (ary.length < 2) {
            continue; // skip lines that do not start with two words
        }
        String[] firstTwoWords = { ary[0], ary[1] };
        l.add(firstTwoWords);
    }
    br.close();
    for (String[] firstTwoWords : l) {
        System.out.println(firstTwoWords[0] + " " + firstTwoWords[1]);
    }
}

Java - Create String Array from text file [duplicate]

This question already has answers here:
Java: Reading a file into an array
(5 answers)
Closed 1 year ago.
I have a text file like this :
abc def jhi
klm nop qrs
tuv wxy zzz
I want to have a string array like :
String[] arr = {"abc def jhi","klm nop qrs","tuv wxy zzz"}
I've tried :
try
{
FileInputStream fstream_school = new FileInputStream("text1.txt");
DataInputStream data_input = new DataInputStream(fstream_school);
BufferedReader buffer = new BufferedReader(new InputStreamReader(data_input));
String str_line;
while ((str_line = buffer.readLine()) != null)
{
str_line = str_line.trim();
if ((str_line.length()!=0))
{
String[] itemsSchool = str_line.split("\t");
}
}
}
catch (Exception e)
{
// Catch exception if any
System.err.println("Error: " + e.getMessage());
}
Can anyone help me, please?
All answers would be appreciated.
If you use Java 7 it can be done in two lines thanks to the Files#readAllLines method:
List<String> lines = Files.readAllLines(yourFile, charset);
String[] arr = lines.toArray(new String[lines.size()]);
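For completeness, here is a self-contained sketch of the same two lines. The class name LinesToArray and the method readLines are hypothetical, UTF-8 is an assumed charset, and main writes the sample data to a temporary file only so that the example runs without an existing text1.txt.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class LinesToArray {

    // Reads every line of the file and returns them as a String array.
    public static String[] readLines(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return lines.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file holding the sample data from the question
        Path file = Files.createTempFile("text1", ".txt");
        Files.write(file, Arrays.asList("abc def jhi", "klm nop qrs", "tuv wxy zzz"), StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(readLines(file)));
        Files.delete(file);
    }
}
```

Files.readAllLines loads the whole file into memory, so this suits small and medium files; for very large files a line-by-line reader is the safer choice.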
Use a BufferedReader to read the file, read each line using readLine as strings, and put them in an ArrayList on which you call toArray at end of loop.
Based on your input you are almost there. You missed the point in your loop where to keep each line read from the file. As you don't a priori know the total lines in the file, use a collection (dynamically allocated size) to get all the contents and then convert it to an array of String (as this is your desired output).
Something like this:
String[] arr= null;
List<String> itemsSchool = new ArrayList<String>();
try
{
FileInputStream fstream_school = new FileInputStream("text1.txt");
DataInputStream data_input = new DataInputStream(fstream_school);
BufferedReader buffer = new BufferedReader(new InputStreamReader(data_input));
String str_line;
while ((str_line = buffer.readLine()) != null)
{
str_line = str_line.trim();
if ((str_line.length()!=0))
{
itemsSchool.add(str_line);
}
}
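arr = itemsSchool.toArray(new String[itemsSchool.size()]);
}
catch (Exception e)
{
// Report any error raised while reading the file
System.err.println("Error: " + e.getMessage());
}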
Then the output (arr) would be:
{"abc def jhi","klm nop qrs","tuv wxy zzz"}
This is not the optimal solution. Other, more clever answers have already been given. This is only a solution for your current approach.
This is my code to generate random emails creating an array from a text file.
import java.io.*;
public class Generator {
public static void main(String[]args){
try {
long start = System.currentTimeMillis();
String[] firstNames = new String[4945];
String[] lastNames = new String[88799];
String[] emailProvider ={"google.com","yahoo.com","hotmail.com","onet.pl","outlook.com","aol.mail","proton.mail","icloud.com"};
String firstName;
String lastName;
int counter0 = 0;
int counter1 = 0;
int generate = 1000000;//number of emails to generate
BufferedReader firstReader = new BufferedReader(new FileReader("firstNames.txt"));
BufferedReader lastReader = new BufferedReader(new FileReader("lastNames.txt"));
PrintWriter write = new PrintWriter(new FileWriter("emails.txt", false));
while ((firstName = firstReader.readLine()) != null) {
firstName = firstName.toLowerCase();
firstNames[counter0] = firstName;
counter0++;
}
while((lastName= lastReader.readLine()) !=null){
lastName = lastName.toLowerCase();
lastNames[counter1]=lastName;
counter1++;
}
for(int i=0;i<generate;i++) {
write.println(firstNames[(int)(Math.random()*4945)]
+'.'+lastNames[(int)(Math.random()*88799)]+'#'+emailProvider[(int)(Math.random()*emailProvider.length)]);
}
write.close();
long end = System.currentTimeMillis();
long time = end-start;
System.out.println("it took "+time+"ms to generate "+generate+" unique emails");
}
catch(IOException ex){
System.out.println("Wrong input");
}
}
}
You can read the file line by line using an input stream or a Scanner and then store each line in a String array. Sample code:
File file = new File("data.txt");
try {
//
// Create a new Scanner object which will read the data
// from the file passed in. To check if there are more
// line to read from it we check by calling the
// scanner.hasNextLine() method. We then read line one
// by one till all line is read.
//
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
//store this line to string [] here
System.out.println(line);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
Scanner scanner = new Scanner(InputStream);//Get File Input stream here
StringBuilder builder = new StringBuilder();
while (scanner.hasNextLine()) {
builder.append(scanner.nextLine());
builder.append(" ");//Additional empty space needs to be added
}
String strings[] = builder.toString().split(" ");
System.out.println(Arrays.toString(strings));
Output :
[abc, def, jhi, klm, nop, qrs, tuv, wxy, zzz]
You can read more about scanner here
You can use the readLine function to read the lines in a file and add it to the array.
Example :
File file = new File("abc.txt");
// FileReader is a Reader, which is what BufferedReader expects
BufferedReader reader = new BufferedReader(new FileReader(file));
List<String> list = new ArrayList<String>();
String str;
while ((str = reader.readLine()) != null) {
    list.add(str);
}
reader.close();
//convert the list to String array
String[] strArr = list.toArray(new String[0]);
The above array contains your required output.

Compare values in two files

I have two files which should contain the same values in the first 10 characters of each line (substring 0 to 10), though not in the same order. I have managed to print the values from each file, but I need to know how to report when a value is in the first file and not in the second file, and vice versa. The files are in these formats.
6436346346....Other details
9348734873....Other details
9349839829....Other details
second file
8484545487....Other details
9348734873....Other details
9349839829....Other details
The first record in the first file does not appear in the second file and the first record in the second file does not appear in the first file. I need to be able to report this mismatch in this format:
Record 6436346346 is in the firstfile and not in the secondfile.
Record 8484545487 is in the secondfile and not in the firstfile.
Here is the code I currently have that gives me the required Output from the two files to compare.
package compare.numbers;
import java.io.*;
/**
*
* @author implvcb
*/
public class CompareNumbers {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
File f = new File("C:/Analysis/");
String line;
String line1;
try {
String firstfile = "C:/Analysis/RL001.TXT";
FileInputStream fs = new FileInputStream(firstfile);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
while ((line = br.readLine()) != null) {
String account = line.substring(0, 10);
System.out.println(account);
}
String secondfile = "C:/Analysis/RL003.TXT";
FileInputStream fs1 = new FileInputStream(secondfile);
BufferedReader br1 = new BufferedReader(new InputStreamReader(fs1));
while ((line1 = br1.readLine()) != null) {
String account1 = line1.substring(0, 10);
System.out.println(account1);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Please help me with how I can achieve this effectively.
I should add that I am new to Java and may not grasp the ideas that easily, but I am trying.
Here is the sample code to do that:
public static void eliminateCommon(String file1, String file2) throws IOException
{
List<String> lines1 = readLines(file1);
List<String> lines2 = readLines(file2);
Iterator<String> linesItr = lines1.iterator();
while (linesItr.hasNext()) {
String checkLine = linesItr.next();
if (lines2.contains(checkLine)) {
linesItr.remove();
lines2.remove(checkLine);
}
}
//now lines1 will contain string that are not present in lines2
//now lines2 will contain string that are not present in lines1
System.out.println(lines1);
System.out.println(lines2);
}
public static List<String> readLines(String fileName) throws IOException
{
List<String> lines = new ArrayList<String>();
FileInputStream fs = new FileInputStream(fileName);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
String line = null;
while ((line = br.readLine()) != null) {
String account = line.substring(0, 10);
lines.add(account);
}
return lines;
}
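Perhaps you are looking for something like this (FileUtils comes from Apache Commons IO). Note that it compares whole lines rather than only the first 10 characters: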
Set<String> set1 = new HashSet<>(FileUtils.readLines(new File("C:/Analysis/RL001.TXT")));
Set<String> set2 = new HashSet<>(FileUtils.readLines(new File("C:/Analysis/RL003.TXT")));
Set<String> onlyInSet1 = new HashSet<>(set1);
onlyInSet1.removeAll(set2);
Set<String> onlyInSet2 = new HashSet<>(set2);
onlyInSet2.removeAll(set1);
If you guarantee that the files will always be the same format, and each readLine() function is going to return a different number, why not have an array of strings, rather than a single string. You can then compare the outcome with greater ease.
Ok, first I would save the two sets of strings in to collections
Set<String> s1 = new HashSet<String>(), s2 = new HashSet<String>();
//...
while ((line = br.readLine()) != null) {
//...
s1.add(line);
}
Then you can compare those sets and find elements that do not appear in both sets. You can find some ideas on how to do that here.
If you need to know the line number as well, you could just create a String wrapper:
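class Element {
    public String str;
    public int lineNr;
    // A HashSet needs both equals(Object) and hashCode() to compare elements by their string.
    @Override
    public boolean equals(Object other) {
        return other instanceof Element && ((Element) other).str.equals(str);
    }
    @Override
    public int hashCode() {
        return str.hashCode();
    }
}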
Then you can just use Set<Element> instead.
Open two Scanners, then:
// Long is used because 10-digit values such as 6436346346 do not fit in an int
final TreeSet<Long> ts1 = new TreeSet<Long>();
final TreeSet<Long> ts2 = new TreeSet<Long>();
// Read each file fully; the files may differ in length
while (scan1.hasNextLine()) {
    ts1.add(Long.valueOf(scan1.nextLine().substring(0, 10)));
}
while (scan2.hasNextLine()) {
    ts2.add(Long.valueOf(scan2.nextLine().substring(0, 10)));
}
You can now compare ordered results of the two trees
EDIT
Modified with TreeSet
Put values from each file to two separate HashSets accordingly.
Iterate over one of the HashSets and check whether each value exists in the other HashSet. Report if not.
Iterate over other HashSet and do same thing for this.
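The steps above can be sketched as follows. The class name RecordDiff and the methods keys and missing are hypothetical, the sample lines in main come from the question, and LinkedHashSet is used in place of a plain HashSet only so that the report keeps the order of the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RecordDiff {

    // Collects the first 10 characters of each line as the record key.
    public static Set<String> keys(List<String> lines) {
        Set<String> keys = new LinkedHashSet<>();
        for (String line : lines) {
            if (line.length() >= 10) { // ignore lines too short to hold a key
                keys.add(line.substring(0, 10));
            }
        }
        return keys;
    }

    // Builds one report line for every key of 'from' that is missing in 'other'.
    public static List<String> missing(Set<String> from, Set<String> other,
                                       String fromName, String otherName) {
        List<String> report = new ArrayList<>();
        for (String key : from) {
            if (!other.contains(key)) {
                report.add("Record " + key + " is in the " + fromName
                        + " and not in the " + otherName + ".");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Set<String> first = keys(Arrays.asList(
                "6436346346....Other details", "9348734873....Other details", "9349839829....Other details"));
        Set<String> second = keys(Arrays.asList(
                "8484545487....Other details", "9348734873....Other details", "9349839829....Other details"));
        for (String msg : missing(first, second, "firstfile", "secondfile")) {
            System.out.println(msg);
        }
        for (String msg : missing(second, first, "secondfile", "firstfile")) {
            System.out.println(msg);
        }
    }
}
```

In a real program the two lists would come from reading RL001.TXT and RL003.TXT line by line; the lookup with contains is constant time on average, so the comparison stays fast even for large files.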
