Java CSV Regex matching per element for validation of each cell


OK, so I have an issue that I can't seem to wrap my head around.
What I'm trying to do is the following:
read in a CSV file line by line, split each line at the commas, pass the result into a HashMap, and then carry out some operations.
I'm effectively trying to replicate some of the behaviours of MapReduce in Java.
What I have so far is:
public class mapper {
public static void main(String[] args) {
//file reading - here.
Scanner filePathInput = new Scanner(System.in);
String filePath = filePathInput.nextLine();
File file = new File(filePath);
if (file.isFile()) {
Scanner fileInput = null;
try {
fileInput = new Scanner(file);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
return;
}
ArrayList<String> lineBuffer = new ArrayList<>();
while (fileInput.hasNextLine()) {
String line = fileInput.nextLine();
// char ch = line.charAt(0);
lineBuffer.add(line);
//String[] values = line.split(",");
// Map<String, Integer> reducer = new HashMap<String, Integer>();
// parse the line here
//System.out.println(values);
}
HashMap<String, ArrayList<FlightData>> test = mapper(lineBuffer);
}
}
I then have the mapper that fills the hash map:
public static HashMap<String, ArrayList<FlightData>> mapper(ArrayList<String> lineBuffer) {
HashMap<String, ArrayList<FlightData>> mapdata = new HashMap<>();
for (String flightData: lineBuffer) {
// Split on commas and drop the whitespace around each cell.
String[] str = flightData.trim().split("\\s*,\\s*");
// The sample data stores the departure time in epoch seconds; Date expects milliseconds.
FlightData flight = new FlightData(str[0], str[1], str[2].toCharArray(), str[3].toCharArray(), new Date(Long.parseLong(str[4]) * 1000L), Long.parseLong(str[5]));
// Create the list for this flight ID on first use, then append to it.
mapdata.computeIfAbsent(flight.getFlightID(), id -> new ArrayList<>()).add(flight);
}
System.out.println(mapdata);
return mapdata;
}
I have my flight data object defined here with the getters etc:
public class FlightData {
private String passengerID;
private String flightID;
private char[] fromID = new char[3];
private char[] tooID = new char[3];
public Date departTime;
public long flightTimeMins;
public Date arrivalTime;
// Constructor
public FlightData(String passengerID, String flightID, char[] fromID, char[] tooID, Date departTime, long flightTimeMins) {
setPassengerID(passengerID);
setFlightID(flightID);
setFromID(fromID);
setTooID(tooID);
setFlightTimeMins(flightTimeMins);
setDepartTime(departTime);
setArrivalTime(arrivalTime);
However, the issue I am having is how to carry out the validation.
Presumably I need to create a class that contains all of my patterns and all of the logic for them, and just call it when needed?
I have set up a basic class for this:
public class Validation {
public static void validate(String theReg, String str2Check) {
final Pattern PtnPassenger = Pattern.compile(theReg);
final Pattern PtnFlight = Pattern.compile(theReg);
final Pattern PtnFrom = Pattern.compile(theReg);
final Pattern PtnToo = Pattern.compile(theReg);
Matcher regexMatcher = PtnPassenger.matcher(str2Check);
while (regexMatcher.find()) {
if (regexMatcher.group().length() != 0) {
System.out.println(regexMatcher.group().trim());
}
}
}
But how do I do the following?
First, as each line is read, check whether it is empty.
Then, if it is not, check each "cell" against its pattern, move on to the next cell, and repeat from the first step for the next line.
For example, each line should contain the following comma-separated data:
PID, FID, FromID, TooID, time (Linux epoch), minutes, e.g.:
BWI0520BG0, MOO1786A, MAD, FRA, 1420563408, 184
So, for example, for the PID I would need a regex like this:
[A-Z]{3}[0-9]{4}[A-Z]{2}[0-9]{1}
But how do I check each element? Should I do this before I pass them into the hash map?
Any help would be great.
Cheers
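One possible way to structure the per-cell check, as a rough sketch and not a definitive answer: keep one precompiled pattern per column, split the line, and test each trimmed cell with matches() before any FlightData object is built. Note that matches() requires the whole cell to fit the pattern, whereas the find() loop in the Validation class above also accepts partial matches. Only the passenger ID pattern comes from the question; the class name RowValidator and the other five patterns are assumptions inferred from the single sample row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: one precompiled pattern per CSV column.
final class RowValidator {
    // Column order from the question: PID, FID, FromID, TooID, epoch, minutes.
    private static final Pattern[] COLUMN_PATTERNS = {
        Pattern.compile("[A-Z]{3}[0-9]{4}[A-Z]{2}[0-9]"), // passenger ID (from the question)
        Pattern.compile("[A-Z]{3}[0-9]{4}[A-Z]"),          // flight ID (assumed from the sample)
        Pattern.compile("[A-Z]{3}"),                        // departure airport (assumed)
        Pattern.compile("[A-Z]{3}"),                        // arrival airport (assumed)
        Pattern.compile("[0-9]{10}"),                       // epoch seconds (assumed)
        Pattern.compile("[0-9]{1,4}")                       // flight time in minutes (assumed)
    };

    /** Returns a list of problems; an empty list means the line passed every check. */
    static List<String> validate(String line) {
        List<String> errors = new ArrayList<>();
        if (line == null || line.trim().isEmpty()) {
            errors.add("line is empty");
            return errors;
        }
        // Limit -1 keeps trailing empty cells, so "a,b," still yields three cells.
        String[] cells = line.split(",", -1);
        if (cells.length != COLUMN_PATTERNS.length) {
            errors.add("expected " + COLUMN_PATTERNS.length + " cells but found " + cells.length);
            return errors;
        }
        for (int i = 0; i < cells.length; i++) {
            String cell = cells[i].trim();
            if (cell.isEmpty()) {
                errors.add("cell " + i + " is empty");
            } else if (!COLUMN_PATTERNS[i].matcher(cell).matches()) {
                errors.add("cell " + i + " is malformed: " + cell);
            }
        }
        return errors;
    }
}
```

Calling RowValidator.validate(line) inside the read loop and only adding lines with an empty error list to lineBuffer would keep malformed rows out of the hash map, which suggests validating before the mapper runs.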

Related

How to store substrings from .docx to java bean class?

I need to create a Java bean class from strings that are read from a .docx file. The docx file looks like this:
Comments:
20.6.2018 16:18-16:25 problem: first problem, action first action
20.6.2018 16:20-16:45 problem: second problem, action: second action
20.6.2018 16:25-16:30 problem: third problem, action: third action
Check list based on data from 24.6.2018.
I created a CheckList class to read the docx, where I used an instance of the FAData POJO to set its instance variables, following "How to parse a csv file and store datails in java bean class":
public class CheckList {
String Check;
ArrayList<String> ActionG = new ArrayList<String>();
ArrayList<String> ProblemG = new ArrayList<String>();
ArrayList<String> DateG = new ArrayList<String>();
ArrayList<String> BeginTimeG = new ArrayList<String>();
ArrayList<String> EndTimeG = new ArrayList<String>();
public void readDocxFile(String fileName) {
FAData data = new FAData(); // instance of FAData POJO
try {
File file = new File(fileName);
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
String Comments = "";
for (XWPFParagraph para : paragraphs) {
Comments = Comments.concat(para.getText() + "\n");
}
Check = Comments.substring(Comments.lastIndexOf("Check"), Comments.length()); // extract the last sentence
String CommentsG = Comments.substring(Comments.indexOf(":") + 1, Comments.lastIndexOf("Check")); // extract the sentences between "Comments:" and the last sentence
data.setCheck(Check); // set the Check instance variable on the FAData POJO
if (!CommentsG.equals("")) { // check whether sentences exist between "Comments:" and the last sentence
// add the substrings to the ArrayLists
String[] FAG = CommentsG.split("\n");
for (int i = 1; i < FAG.length; i++) {
ActionG.add(FAG[i].substring(FAG[i].indexOf("action") + 7, FAG[i].length()));
ProblemG.add(FAG[i].substring(FAG[i].indexOf("problem") + 9, FAG[i].indexOf("action") - 2));
DateG.add(FAG[i].substring(0, 10));
BeginTimeG.add(FAG[i].substring(FAG[i].indexOf(":") - 2, FAG[i].indexOf("-")));
EndTimeG.add(FAG[i].substring(FAG[i].indexOf("-") + 1, FAG[i].indexOf("-") + 6));
}
}
// set the ArrayList instance variables on the FAData POJO
data.setActionG(ActionG);
data.setProblemG(ProblemG);
data.setDateG(DateG);
data.setBeginTimeG(BeginTimeG);
data.setEndTimeG(EndTimeG);
fis.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
public class FAData {
String Check;
ArrayList<String> ActionG = new ArrayList<String>();
ArrayList<String> ProblemG = new ArrayList<String>();
ArrayList<String> DateG = new ArrayList<String>();
ArrayList<String> BeginTimeG = new ArrayList<String>();
ArrayList<String> EndTimeG = new ArrayList<String>();
public String getCheck() {
return Check;
}
public void setCheck(String Check) {
this.Check = Check;
}
public ArrayList<String> getActionG() {
return ActionG;
}
public void setActionG(ArrayList<String> ActionG) {
this.ActionG = ActionG;
}
....
//Getter and Setter for the rest variables
}
I tested if the variables are available from FAData.class:
public class test {
public static void main(String[] args) {
CheckList TC = new CheckList();
String fileName = "c:\\path\\to\\docx";
try {
TC.readDocxFile(fileName);
FAData datas = new FAData();
String testCheck = datas.getCheck();
ArrayList<String> testDate = datas.getDateG();
System.out.println(testCheck);
System.out.println(testDate);
} catch (Exception e) {
System.out.println(e);
}
}
}
but I got null values. I do not know what I did wrong or how I should test the FAData class. Could someone give me a suggestion?

In the meantime I figured out the solution. The test created a new, empty FAData instead of using the one that readDocxFile had filled. I declared List<FAData> datas = new ArrayList<FAData>(); inside the readDocxFile method, added the populated object to it, and returned the list:
datas.add(data);
If I create a new instance of CheckList in another class, I can get the data, e.g.:
CheckList TC = new CheckList();
List<FAData> D = TC.readDocxFile(fileName);
String DateG = D.get(0).getDateG().get(0);
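The idea of handing back the object that was actually filled can be sketched without Apache POI, assuming the paragraph text is already available as a list of lines. CommentData and CommentReader are hypothetical, trimmed-down stand-ins for FAData and CheckList, and the regular expression is an assumption based on the three sample lines (it tolerates the missing colon after "action" in the first one).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, trimmed-down stand-in for the FAData bean (two fields only).
final class CommentData {
    final List<String> dates = new ArrayList<>();
    final List<String> problems = new ArrayList<>();
}

final class CommentReader {
    // Assumed line shape: "<date> <begin>-<end> problem: <text>, action[:] <text>"
    private static final Pattern LINE = Pattern.compile(
        "(\\d{1,2}\\.\\d{1,2}\\.\\d{4})\\s+\\d{1,2}:\\d{2}-\\d{1,2}:\\d{2}"
        + "\\s+problem:\\s*(.*?),\\s*action:?\\s*(.*)");

    // Returns the object that was filled, so the caller never creates its own empty one.
    static CommentData read(List<String> lines) {
        CommentData data = new CommentData();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) { // lines such as "Comments:" simply do not match and are skipped
                data.dates.add(m.group(1));
                data.problems.add(m.group(2));
            }
        }
        return data;
    }
}
```

Matching the date as a group also sidesteps the fixed substring(0, 10) in the question, which takes one character too many for a date like 20.6.2018.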

Using a buffer reader to read blocks of data for a given input

So this is the structure of the file that I'm reading from:
[MESSAGE BEGIN]
uan:123
messageID: 111
[MESSAGE END]
[MESSAGE BEGIN]
uan:123
status:test
[MESSAGE END]
What I'm trying to do is, for a given uan, return all the details for it, whilst maintaining the block structure "MESSAGE BEGIN" "MESSAGE END".
This is the code I've written:
String startPattern = "uan:123";
String endPattern = "[MESSAGE END]";
System.out.println("Matching: " + this.getStartPattern());
List<String> desiredLines = new ArrayList<>();
try (BufferedReader buff = Files.newBufferedReader(getPath())) {
String line = "";
while ((line = buff.readLine()) != null) {
if (line.contains(this.getStartPattern())) {
desiredLines.add(line);
System.out.println(" \nMatch Found! ");
buff.lines().forEach(streamElement -> {
if (!streamElement.contains(this.getEndPattern())) {
desiredLines.add(streamElement);
} else if (streamElement.contains(this.getEndPattern())) {
throw new IndexOutOfBoundsException("Exit Status 0");
}
});
}
Now, the problem is, the while condition breaks when it sees the first "uan" and just captures the message ID. I want the code to also include "status" when I pass the uan.
Can anyone help with this?
EDIT
This is my expected output:
uan:123
messageID: 111
uan:123
status:test
All instances of uan:123 should be captured

How about creating, e.g., a Data class that holds all fields for a given uan? As I see it, you have an object with an id (i.e. the uan) and many messages for this object.
I suggest using this approach and collecting all related information (belonging to the same uan) in the same instance.
This is the Data class:
final class Data {
private String uan;
private final List<Map<String, String>> events = new LinkedList<>();
public Data(String uan) {
this.uan = uan;
}
public String getUan() {
return uan;
}
public boolean hasUan() {
return uan != null && !uan.isEmpty();
}
public void set(Data data) {
if (data != null)
events.addAll(data.events);
}
public void addEvent(String key, String value) {
if ("uan".equalsIgnoreCase(key))
uan = value;
else
events.add(Collections.singletonMap(key, value));
}
}
This is the method that reads the given file and returns a Map<String, Data>, where the key is the uan and the value holds all data for that object:
private static final String BEGIN = "[MESSAGE BEGIN]";
private static final String END = "[MESSAGE END]";
private static final Pattern KEY_VALUE_PATTERN = Pattern.compile("\\s*(?<key>[^:]+)\\s*:\\s*(?<value>[^:]+)\\s*");
private static Map<String, Data> readFile(Reader reader) throws IOException {
try (BufferedReader br = new BufferedReader(reader)) {
Data data = null;
Map<String, Data> map = new TreeMap<>();
for (String str; (str = br.readLine()) != null; ) {
if (str.equalsIgnoreCase(BEGIN))
data = new Data(null);
else if (str.equalsIgnoreCase(END)) {
if (data != null && data.hasUan()) {
String uan = data.getUan();
map.putIfAbsent(uan, new Data(uan));
map.get(uan).set(data);
}
data = null;
} else if (data != null) {
Matcher matcher = KEY_VALUE_PATTERN.matcher(str);
if (matcher.matches())
data.addEvent(matcher.group("key"), matcher.group("value"));
}
}
return map;
}
}
And finally, this is what the client looks like:
Map<String, Data> map = readFile(new FileReader("data.txt"));

Grouping filtered messages
Your general approach seems good. Instead of the nested loop I would break it down to a simpler and more straightforward logic like:
String needle = "uan:123";
String startPattern = "[MESSAGE BEGIN]";
String endPattern = "[MESSAGE END]";
List<List<String>> result = new ArrayList<>();
try (BufferedReader buff = Files.newBufferedReader(getPath())) {
// Lines and flag for current message
List<String> currentMessage = new ArrayList<>();
boolean messageContainedNeedle = false;
// Read all lines
while (true) {
String line = buff.readLine();
if (line == null) {
break;
}
// Collect current line to message, ignore indicator
if (!line.equals(endPattern) && !line.equals(startPattern)) {
currentMessage.add(line);
}
// Set flag if message contains needle
if (!messageContainedNeedle && line.equals(needle)) {
messageContainedNeedle = true;
}
// Message ends
if (line.equals(endPattern)) {
// Collect if needle was contained
if (messageContainedNeedle) {
result.add(currentMessage);
}
// Prepare for next message
messageContainedNeedle = false;
currentMessage = new ArrayList<>();
}
}
}
It's easier to read and understand, and it supports message items arriving in arbitrary order. Also, the result still groups messages in a List<List<String>>. You can easily flat-map that if you still want a List<String>.
The resulting structure is:
[
["uan:123", "messageID: 111"],
["uan:123", "status:test"]
]
Achieving exactly your desired output is simple now:
// Variant 1: Nested for-each
result.forEach(message -> message.forEach(System.out::println));
// Variant 2: Flat-map
result.stream().flatMap(List::stream).forEach(System.out::println);
// Variant 3: Without streams
for (List<String> message : result) {
for (String line : message) {
System.out.println(line);
}
}
Grouping all messages
If you leave out the flag-part you can parse all messages into that structure and then easily stream on them:
public static List<List<String>> parseMessages(Path path) throws IOException {
String startPattern = "[MESSAGE BEGIN]";
String endPattern = "[MESSAGE END]";
List<List<String>> result = new ArrayList<>();
try (BufferedReader buff = Files.newBufferedReader(path)) {
// Data for current message
List<String> currentMessage = new ArrayList<>();
// Read all lines
while (true) {
String line = buff.readLine();
if (line == null) {
break;
}
// Collect current line to message, ignore indicator
if (!line.equals(endPattern) && !line.equals(startPattern)) {
currentMessage.add(line);
}
// Message ends
if (line.equals(endPattern)) {
// Collect message
result.add(currentMessage);
// Prepare for next message
currentMessage = new ArrayList<>();
}
}
}
return result;
}
Usage is simple and straightforward. For example, filtering for messages with "uan:123":
List<List<String>> messages = parseMessages(getPath());
String needle = "uan:123";
List<List<String>> messagesWithNeedle = messages.stream()
.filter(message -> message.contains(needle))
.collect(Collectors.toList());
The resulting structure again is:
[
["uan:123", "messageID: 111"],
["uan:123", "status:test"]
]
Achieving your desired output can be made directly on the stream cascade:
messages.stream() // Stream<List<String>>
.filter(message -> message.contains(needle))
.flatMap(List::stream) // Stream<String>
.forEach(System.out::println);
Message Container
A natural idea would be to group the message data in a designated Message container class. Something like this:
public class Message {
private final Map<String, String> mProperties;
public Message() {
mProperties = new HashMap<>();
}
public String getValue(String key) {
return mProperties.get(key);
}
public void put(String key, String value) {
mProperties.put(key, value);
}
public static Message fromLines(List<String> lines) {
Message message = new Message();
for (String line : lines) {
String[] data = line.split(":");
message.put(data[0].trim(), data[1].trim());
}
return message;
}
// Other methods ...
}
Note the handy Message#fromLines method. Using that you get a List<Message> and working with the data is way more convenient.
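To illustrate how the grouped lines could be turned into such container objects and then filtered by key, here is a minimal sketch. MessageSketch is a hypothetical, reduced version of the Message class above, and withUan is a helper name made up for this example. It splits with a limit of 2 so that a value containing ':' stays intact, which is a small deviation from the split(":") shown above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, reduced version of the Message container.
final class MessageSketch {
    private final Map<String, String> properties = new HashMap<>();

    String getValue(String key) {
        return properties.get(key);
    }

    static MessageSketch fromLines(List<String> lines) {
        MessageSketch message = new MessageSketch();
        for (String line : lines) {
            // Limit 2: only the first ':' separates key and value.
            String[] data = line.split(":", 2);
            if (data.length == 2) { // ignore lines without any ':'
                message.properties.put(data[0].trim(), data[1].trim());
            }
        }
        return message;
    }

    // Converts grouped lines into messages and keeps those whose "uan" equals the given value.
    static List<MessageSketch> withUan(List<List<String>> grouped, String uan) {
        return grouped.stream()
            .map(MessageSketch::fromLines)
            .filter(m -> uan.equals(m.getValue("uan")))
            .collect(Collectors.toList());
    }
}
```

With the result of parseMessages as input, withUan(messages, "123") would yield one object per matching block, and each field can then be read by key instead of by searching strings.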

Just use simple parsing logic and only output data if you see the matching uan. I use a boolean variable to keep track of whether we have hit a matching uan inside a given block. If so, then we output all lines, otherwise we no-op and skip everything.
try (BufferedReader buff = Files.newBufferedReader(getPath())) {
String line = "";
String uan = "uan:123";
String begin = "[MESSAGE BEGIN]";
String end = "[MESSAGE END]";
boolean match = false;
while ((line = buff.readLine()) != null) {
if (uan.equals(line)) {
match = true;
System.out.println(line); // print the uan line itself so the output matches the expected result
}
else if (end.equals(line)) {
match = false;
}
else if (!begin.equals(line) && match) {
System.out.println(line);
}
}
}
Note that I don't do any validation to check if, for example, every BEGIN is mirrored by a proper closing END. If you need this you may add extra logic to the above code.
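If the BEGIN/END pairing does need checking, the extra logic could look roughly like the following sketch. It is a variation on the loop above, not the same code: it buffers each block so the uan line may appear anywhere inside it, and it throws on unbalanced markers. The class and method names are made up for illustration, and the choice of IllegalStateException is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class BlockFilter {
    private static final String BEGIN = "[MESSAGE BEGIN]";
    private static final String END = "[MESSAGE END]";

    /** Collects the lines of every well-formed block that contains the given uan line. */
    static List<String> linesFor(BufferedReader reader, String uanLine) {
        List<String> result = new ArrayList<>();
        List<String> block = null; // null means "currently outside a block"
        boolean match = false;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (BEGIN.equals(line)) {
                    if (block != null) {
                        throw new IllegalStateException(BEGIN + " inside an open block");
                    }
                    block = new ArrayList<>();
                    match = false;
                } else if (END.equals(line)) {
                    if (block == null) {
                        throw new IllegalStateException(END + " without " + BEGIN);
                    }
                    if (match) {
                        result.addAll(block);
                    }
                    block = null;
                } else if (block != null) {
                    block.add(line);
                    if (uanLine.equals(line)) {
                        match = true;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (block != null) {
            throw new IllegalStateException("input ended before " + END);
        }
        return result;
    }
}
```

Lines outside any block are silently ignored here; whether that should also count as an error depends on the file format.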

Scanner hasNextLine() doesn't access last line in Java

I am using a TreeMap in Java. The key contains the character ':' and the value is a list of things. The problem is that when I debug, the program stops at this line and does not continue:
if (!string.isEmpty()) {
string = jin.nextLine();
}
I really have no idea what the problem could be. My code is below. Data (where I keep the date variable) and ListOfBills (where I keep a list of objects of the Bill class) are two other classes.
public void read(InputStream in) throws ParseException {
Scanner jin = new Scanner(in);
TreeMap<Date, ListOfBills> tree = new TreeMap<Date, ListOfBills>();
ListOfBills obBill = new ListOfBills();
Data data;
String string = jin.nextLine();
while (jin.hasNextLine()) {
if (string.contains(":")) {
data = new Data((string));
string = jin.nextLine();
while (!string.contains(":")) {
String[] parts1 = string.split(" ");
obBill.listOfBills.add(new Bill(Integer.parseInt(parts1[0]), Float.parseFloat(parts1[2]),
parts1[3], Float.parseFloat(parts1[5])));
if (!string.isEmpty()) {
string = jin.nextLine();
}
}
tree.put(data.date1, obBill);
}
}
for (Date date : tree.keySet()) {
System.out.println(date + "\n");
}
jin.close();
}

It blocks because it is waiting for input.
You need to enter a value on stdin if you're testing from the console.
Hope it helps.
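Separately from the blocking on stdin, the inner loop in the question calls nextLine() without checking hasNextLine(), so it can run past the last line, and it keeps adding to the same ListOfBills for every date. A single loop avoids both. The sketch below is only an illustration under assumptions: it treats any line containing ':' as a header and stores the raw detail lines as strings, because the exact Bill format is not shown in the question. GroupedLines is a made-up name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

final class GroupedLines {
    /** Groups each non-header line under the most recent header line (a line containing ':'). */
    static Map<String, List<String>> read(Scanner in) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        List<String> current = null;
        // hasNextLine() guards every nextLine() call, so the last line is read
        // exactly once and nothing is requested past the end of the input.
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.isEmpty()) {
                continue; // skip blank lines instead of parsing them
            }
            if (line.contains(":")) {
                // A fresh list per header, so groups do not share their entries.
                current = groups.computeIfAbsent(line, key -> new ArrayList<>());
            } else if (current != null) {
                current.add(line);
            }
        }
        return groups;
    }
}
```

In the original code, the place where a line is added to the current list is where the Bill object would be built from the split parts.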

How to Add newline after every 3rd element in arraylist in java?

My program needs to add a newline after every 3rd element in the ArrayList. Here is my input file, which contains the following data:
934534519137441534534534366, 0796544345345345348965,
796345345345544894534565, 734534534596544534538965 ,
4058991374534534999999, 34534539624, 91953413789453450452,
9137534534482080, 9153453459137482080, 405899137999999,
9653453564564524, 91922734534779797, 0834534534980001528, 82342398534
6356343430001528, 405899137999999, 9191334534643534547423752,
3065345782642564522021, 826422205645345345645621,
40584564563499137999999, 953453345344624, 3063454564345347,
919242353463428434451, 09934634634604641264, 990434634634641264,
40346346345899137999999, 963445636534653452, 919234634643325857953,
91913453453437987385, 59049803463463453455421, 405899137534534999999,
9192273453453453453758434,
and it goes on to multiple lines.
Code:
public class MyFile {
private static final Pattern ISDN =Pattern.compile("\\s*ISDN=(.*)");
public List<String> getISDNsFromFile(final String fileName) throws IOException {
final Path path = Paths.get(fileName);
final List<String> ret = new ArrayList<>();
Matcher m;
String line;
int index = 0;
try (
final BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);) {
while ((line = reader.readLine()) != null) {
m = ISDN.matcher(line);
if (m.matches()) {
ret.add(m.group(1));
index++;
if(index%3==0){
if(ret.size() == index){
ret.add(m.group(1).concat("\n"));
}
}
}
}
return ret;
}
}
}

I changed your code to read using the "new" Java 7 files I/O and corrected the use of a line separator, along with some formatting. Here I iterate over the list after it was completed. If it is really long you can iterate over it while constructing it.
public class MyFile {
private static final Pattern ISDN = Pattern.compile("\\s*ISDN=(.*)");
private static final String LS = System.getProperty("line.separator");
public List<String> getISDNsFromFile(final String fileName) throws IOException {
List<String> ret = new ArrayList<>();
Matcher m;
List<String> lines = Files.readAllLines(Paths.get(fileName), StandardCharsets.UTF_8);
for (String line : lines) {
m = ISDN.matcher(line);
if (m.matches())
ret.add(m.group(1));
}
for (int i = 3; i < ret.size(); i+=4)
ret.add(i, LS);
return ret;
}
}

I think you do not need to compare the size of arraylist with index. Just remove that condition and try with this
if(index%3==0){
ret.add(System.getProperty("line.separator"));
}
Although I support the comment made by @Peter:
"I would add them all to a List and only add the formatting when you print. Adding formatting in your data usually leads to confusion."
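Keeping the list free of separators and formatting only at print time could look like this minimal sketch. RowFormatter and its perLine parameter are names made up for the example, and the ", " separator is an assumption about the desired output.

```java
import java.util.List;

final class RowFormatter {
    /** Joins the values into lines of at most perLine comma-separated entries. */
    static String format(List<String> values, int perLine) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            sb.append(values.get(i));
            boolean last = (i == values.size() - 1);
            if (!last) {
                // After every perLine-th entry start a new line, otherwise separate with a comma.
                sb.append((i + 1) % perLine == 0 ? System.lineSeparator() : ", ");
            }
        }
        return sb.toString();
    }
}
```

The list returned by getISDNsFromFile then contains only data, and System.out.println(RowFormatter.format(ret, 3)) produces the three-per-line layout.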

Scanner - managing input from text file

I’m working with java Scanner trying to extract product information from a text file called Inventory.txt.
This file contains data on products in this format:
“Danelectro|Bass|D56BASS-AQUA|336177|395.00Orange|Amplifier|BT1000-H|319578|899.00Planet Waves|Superpicks|1ORD2-5|301075|4.50Korg|X50 Music Synthesizer|X50|241473|735.00Alpine|Alto Sax|AAS143|198490|795.00”
I am trying to parse the strings and add them into an arraylist such that each element in the arraylist would look something like this:
"Danelectro|Bass|D56BASS-AQUA|336177|395.00"
"Orange|Amplifier|BT1000-H|319578|899.00"
"Planet Waves|Superpicks|1ORD2-5|301075|4.50"
"Korg|X50 Music Synthesizer|X50|241473|735.00"
"Alpine|Alto Sax|AAS143|198490|795.00"
Following is my code:
public class ItemDao {
public ItemDao() {
scanFile();
}
public void scanFile() {
Scanner scanner;
ArrayList <String> content = new ArrayList <String>();
try {
Pattern p1 = Pattern.compile("\\.[0-9]{2}$");
scanner = new Scanner(new File("Inventory.txt"));
while (scanner.hasNext(p1)) {
content.add(scanner.next(p1));
}
for (String item : content) {
System.out.println("Items:" + item);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
When I tested this code I found that the arraylist is empty. Any help would be much appreciated.
java -jar A00123456Lab5.jar
Create an ItemDAO class in a dao package
This class will contain an static inner class which implements Comparator
(DAO = Data Access Object)

You can define a Scanner on a String and set a delimiter.
Since | is the OR operator in a regex, you have to escape it with a double backslash:
Scanner sc = new java.util.Scanner("Danelectro|Bass|D56BASS-AQUA|336177|395.00");
sc.useDelimiter("\\|");
sc.useLocale(java.util.Locale.US); // nextDouble() is locale-sensitive; US expects '.' as the decimal separator
String name = sc.next();
// name: "Danelectro"
String type = sc.next();
// type: "Bass"
String model = sc.next();
// model: "D56BASS-AQUA"
int id = sc.nextInt();
// id: 336177
double price = sc.nextDouble();
// price: 395.0

I see you're using a pattern. Those can come in handy, but I'd just take each line and split it with substring:
while (scanner.hasNextLine()) {
    String temp = scanner.nextLine();
    int bar;
    while ((bar = temp.indexOf('|')) != -1) {
        content.add(temp.substring(0, bar)); // field before the next '|'
        temp = temp.substring(bar + 1);      // keep the rest; without this assignment the loop never ends
    }
    content.add(temp); // whatever follows the last '|'
}
Just a thought: it might be easier to debug this way.
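Neither suggestion above deals with the fact that the sample data has no separator between records: each price runs straight into the next product name. One way to recover whole records, sketched under the assumption that every record has exactly five fields and ends in a price with two decimals, is to let a regular expression find each record. InventorySplitter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InventorySplitter {
    // Four '|'-terminated fields followed by a price with exactly two decimals.
    private static final Pattern RECORD =
        Pattern.compile("[^|]+\\|[^|]+\\|[^|]+\\|[^|]+\\|\\d+\\.\\d{2}");

    /** Returns every complete record found in the concatenated content. */
    static List<String> split(String content) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(content);
        while (m.find()) { // each find() resumes right after the previous price
            records.add(m.group());
        }
        return records;
    }
}
```

The pattern in the question, "\\.[0-9]{2}$", only describes the end of a price, so a whitespace-delimited token such as "Danelectro|Bass|..." never matches it as a whole and hasNext(p1) returns false immediately, which would explain the empty list.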
