Can't implement RegEx on .dat file - java

I have to read through a .dat file with restaurant names, addresses, ratings etc. and display anything that isn't formatted correctly. The problem is not with the regular expression.
My problem is that I have no idea how to implement the regular expression so that it can read through the files and pick out any errors in the formatting of the above categories.
The contents of the file are not evenly spaced out so I can't just make a constructor that reads each substring. Is there any way I can use regular expressions to pull out the information I need from the file? Any help will be appreciated.

If you already have a regular expression, you can test every line against it and print the line if it does not match.
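A minimal sketch of that idea follows. The record pattern (name, address, and a 0 to 5 rating separated by semicolons), the class name FormatChecker, and the file name restaurants.dat are all assumptions made for illustration, since the question does not show the real format; substitute your own regular expression.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FormatChecker {
    // Hypothetical format: name;address;rating where rating is a single digit 0-5.
    static final Pattern RECORD = Pattern.compile("[^;]+;[^;]+;[0-5]");

    // Returns every line that does not match the pattern as a whole.
    static List<String> findMalformed(List<String> lines, Pattern format) {
        List<String> bad = new ArrayList<>();
        for (String line : lines) {
            // matches() requires the entire line to fit the pattern.
            if (!format.matcher(line).matches()) {
                bad.add(line);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // "restaurants.dat" is a placeholder file name.
        List<String> lines = Files.readAllLines(Paths.get("restaurants.dat"));
        for (String line : findMalformed(lines, RECORD)) {
            System.out.println("Badly formatted: " + line);
        }
    }
}
```

Using matches() rather than find() matters here: find() would accept a line that merely contains a well-formed record somewhere inside it.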

Related

Best way to store words for given scenario

I am working on a Java project (Maven).
I am confused about one point: I don't know which approach is logically correct.
The problem is as follows:
A sentence is given, and from it I have to extract some particular words.
Solution that I found
I made one regex and put it in a Constants class. Whenever I have to add more words, I simply append them to the regex.
This solves the problem.
Where I am confused
I am thinking of putting a number of text files in the resources folder, where each text file holds one regex.
REGEX = (?:A|B|C|D)
A, B, C, D = Word(String)
Is it a good idea? If not, please suggest another approach.
Why would you save regexes in a text file? The fact that you're using a regex seems like an implementation detail that you would want to encapsulate (unless you want the significantly greater functionality, but also the overhead, of supporting regexes).
Also, why do you need a new file for each word? It seems you could just have one file, with one word per line, containing all of the words you're interested in. This would be much simpler for a user to understand than 100 files with one regex per file.
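As a rough sketch of the one-word-per-line suggestion: the class below reads a word list and builds the alternation internally, so the regex stays an implementation detail. The class name KeywordMatcher and the whole-word matching via \b are assumptions for illustration, not something the question specifies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeywordMatcher {
    private final Pattern pattern;

    KeywordMatcher(List<String> words) {
        // Pattern.quote keeps characters such as '.' or '|' in a word literal.
        String alternation = words.stream()
                .map(String::trim)
                .filter(w -> !w.isEmpty())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        if (alternation.isEmpty()) {
            throw new IllegalArgumentException("No keywords given");
        }
        // Assumption: keywords should match whole words only.
        this.pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    // Loads the keywords from a text file with one word per line.
    static KeywordMatcher fromFile(Path file) throws IOException {
        return new KeywordMatcher(Files.readAllLines(file));
    }

    // Returns the keywords found in the sentence, in order of appearance.
    List<String> extract(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(sentence);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

Adding a keyword then means adding one line to the word file, with no change to the source code.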
As I understand it, you want to find some keywords in the input string, and the set of keywords can be extended as your requirements change.
Your current solution is to keep the regex (?:A|B|C|D) in your Constants class and add more keywords to it whenever required.
If my understanding is right, one suggestion is to put this regex in your properties file, like this:
REGEX = (?:city|Animal|plant|student)
If it gets too long, it can be split across lines like this:
REGEX = (?:city|Animal|plant|student|car|computer|clothes|\
furniture|others)
Your second idea, if I understand it correctly, is to use the keywords as file names and put those files in one resource folder, so that you can read the file names to compose the final regex. If your regex always has the fixed form (?:A|B|C|D), this solution is good and convenient: every time you add a new keyword file, you don't need to modify any source code or properties file.

Sorting an Array with .txt

Quick enquiry.
I have created an array that will be populated by a Scanner reading information from a .txt file. The .txt file has a specific structure:
<job role> <years of experience> <name>
(this is an example). More than one person will enter this, so there will be multiple such entries in the text file. I now need to find a way to gather them into an array in sorted order. The order should be based on the first letter of their job role. I was thinking about implementing a comparator; would this be possible/efficient to do?
So my idea would be to use a comparator specifically on the job role and compare it with all other job role entries.
Thanks, and sorry if this is brief or not very clear; I found it difficult to describe the situation.
In order to accurately read in the text file, you would need a delimiter. A delimiter is a character that tells the computer that you are moving to the next data entry.
I recommend the following: for a new person, use a newline (\n) as the delimiter. To separate the job, years, and name within an entry, I recommend a comma, ",", since the data set does not appear to use commas otherwise.
Once you have decided on a delimiter, I would recommend looking into the classes File, FileInputStream and FileOutputStream. File represents a path on disk, and the two stream classes read and write the contents of files saved on the computer.
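To address the comparator part of the question, here is a minimal sketch assuming one entry per line in the comma-delimited form suggested above (job role, years, name). The class name Applicant and its fields are hypothetical. It sorts by the whole job role, ignoring case, which also puts entries in order of their first letter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class Applicant {
    final String jobRole;
    final int years;
    final String name;

    Applicant(String jobRole, int years, String name) {
        this.jobRole = jobRole;
        this.years = years;
        this.name = name;
    }

    // Parses one line of the assumed form: jobRole,years,name
    static Applicant parse(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return new Applicant(parts[0].trim(),
                Integer.parseInt(parts[1].trim()),
                parts[2].trim());
    }

    // Parses all non-blank lines and sorts them by job role, ignoring case.
    static List<Applicant> parseAndSort(List<String> lines) {
        List<Applicant> result = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                result.add(parse(line));
            }
        }
        result.sort(Comparator.comparing((Applicant a) -> a.jobRole,
                String.CASE_INSENSITIVE_ORDER));
        return result;
    }
}
```

A comparator is both possible and efficient here: List.sort runs in O(n log n) comparisons, and the lines themselves can come from a Scanner or from Files.readAllLines.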

Suggested ways of reading a text file with inconsistent formatting

I'm trying to read a text file of numbers as a double array, and after trying various methods (usually resulting in an input format exception) I have come to the conclusion that the text file I am trying to read is inconsistent in its delimiting.
The majority of the text format is in the form "0.000,0.000" so I have been using a Scanner and the useDelimiter(",") to read in each value.
It turns out though (this is a big file of numbers) that some of the formatting is in the form "0.000 0.000" (at the end of a line I presume) which of course produces an input format exception.
This is an open question really. I'm a pretty basic Java programmer, so I would just like to see if there are any suggestions for doing this. Is Scanner the correct class to use for this?
Thank you for your time!
Read file as text line-by-line. Then split line into parts:
String[] parts = line.split("[ ,]");
Now iterate over the parts and call Double.parseDouble() for each part.
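The steps above can be sketched as follows. The class name NumberLineParser is hypothetical, and the delimiter "[,\\s]+" is a slight variation on "[ ,]" that is assumed here so that tabs and runs of several delimiters (such as ", ") are also treated as one separator.

```java
import java.util.Arrays;

class NumberLineParser {
    // Splits a line on commas and/or whitespace and parses each piece as a double.
    static double[] parseLine(String line) {
        String[] parts = line.trim().split("[,\\s]+");
        return Arrays.stream(parts)
                .filter(p -> !p.isEmpty())       // skip the empty piece a blank line or leading comma produces
                .mapToDouble(Double::parseDouble) // throws NumberFormatException on bad input
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseLine("0.000,0.000 1.500")));
    }
}
```

Calling parseLine on every line returned by a BufferedReader or Files.readAllLines then yields one double[] per line, regardless of which delimiter that line happens to use.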
Scanner allows any Java Regex Pattern to function as a delimiter. You should be able to use any number of delimiters by doing the following:
scanner.useDelimiter("[,\\s]"); // Will match commas and whitespace
I'd like to post this as a comment instead of making it a separate answer, but my reputation is too low. Apologies, Alex.
You mentioned having two different delimiter characters used in different instances, not a combination of the two as a single delimiter.
You can use the vertical bar as logical OR in a regular expression. Note that it only has that meaning outside a character class; inside square brackets it matches a literal | character.
scanner.useDelimiter(",|\\s"); // Will match a comma or a whitespace character, as appropriate
Line by line:
String[] parts = line.split(",|\\s");

Regex for lines from /etc/passwd and /etc/group

I've been working on a small Java problem set and have come across some trouble. I'm not very experienced writing regular expressions and could really use two for verifying line entries in /etc/group and /etc/passwd in Java.
I found Regex Verification of Line in /etc/passwd earlier and have yet to test it, but it looks adaptable for what I need. Could anyone else help in providing a regex string for either file?
I'm looking to verify user-entered passwd and group lines, in java, before writing them out to disk. If not, I'll likely end up tokenizing each piece and running various expensive operations.
Rather than writing a regex you should probably just read the files with Scanner and parse each line with String.split(":"). Then you can check that each part is valid without dealing with a complex expression to handle all cases. It'll probably be easier to write the code and easier to read it later.
Why do you want to use regular expressions? Just split the line on the colons and inspect the pieces.
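A minimal sketch of the split-and-inspect approach is shown below. The field counts follow the usual layouts (seven colon-separated fields in /etc/passwd, four in /etc/group), but the specific checks (non-empty name, numeric IDs, absolute home directory) are assumptions chosen for illustration; tighten them to match what your system actually requires. The class name PasswdLineValidator is hypothetical.

```java
class PasswdLineValidator {
    // name:password:UID:GID:GECOS:home:shell
    static boolean isValidPasswdLine(String line) {
        // The -1 limit keeps trailing empty fields, e.g. an empty shell.
        String[] f = line.split(":", -1);
        if (f.length != 7) {
            return false;
        }
        return !f[0].isEmpty()
                && isNumber(f[2])
                && isNumber(f[3])
                && f[5].startsWith("/");
    }

    // name:password:GID:member,member,...
    static boolean isValidGroupLine(String line) {
        String[] f = line.split(":", -1);
        return f.length == 4 && !f[0].isEmpty() && isNumber(f[2]);
    }

    // True if the string is a non-empty run of ASCII digits.
    private static boolean isNumber(String s) {
        return !s.isEmpty() && s.chars().allMatch(c -> c >= '0' && c <= '9');
    }
}
```

Without the -1 argument, String.split drops trailing empty strings, so a line ending in a colon would appear to have too few fields.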

Regarding Java Split Command CSV File Parsing

I have a CSV file in the format below. I get an issue if either one of the following CSV lines is read by the program:
"D",abc"def,"","0429"292"0","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
"D","abc"def","","04292920","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
The split command below is meant to ignore commas inside double quotes. I got it from an earlier post, whose link is pasted below the code.
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)",15);
System.out.println("items.length"+items.length);
Regarding Java Split Command Parsing Csv File
The items.length is printed as 14 instead of 15. The abc"def value is not recognized as an individual field and is incorrectly stored as
"D",abc"def in items[0]. I want it to be stored in the following way:
items[0] should be "D" and items[1] should be abc"def
The same issue happens when the value is "abc"def". I want it to be stored as
items[0] should be "D" and items[1] should be "abc"def"
This split command also works perfectly if the double quotes are doubled inside the quoted field (for example D,"abc""def",1).
How can I resolve this issue?
I think you would be much better off writing a parser for the CSV files rather than trying to use a regular expression. Once you start dealing with CSV files that have carriage returns within fields, the regex will probably fall apart. It wouldn't take much code to write a simple while loop that goes through all the characters and splits up the data. It would be a lot easier to deal with "non-standard"* CSV files such as yours when you have a parser rather than a regex.
*I say non-standard because there isn't really an official standard for CSV, and when you're dealing with CSV files from many different systems, you see lots of weird things, like the abc"def field as shown above.
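A character loop of that kind might look like the sketch below. It is only one possible heuristic, under these assumptions: a quote opens a field only at the start of the field, a quote closes the field only when it is followed by a comma or the end of the line, a doubled quote stands for one literal quote, and any other quote is kept as data. The outer quotes are stripped from the returned values. The class name LenientCsvParser is hypothetical, and a quoted field that genuinely contains the sequence quote-comma remains ambiguous under any such rule.

```java
import java.util.ArrayList;
import java.util.List;

class LenientCsvParser {
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (i <= n) {
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                while (i < n) {
                    char c = line.charAt(i);
                    if (c != '"') {
                        field.append(c);
                        i++;
                    } else if (i + 1 == n || line.charAt(i + 1) == ',') {
                        i++; // closing quote: followed by a comma or end of line
                        break;
                    } else if (line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote means one literal quote
                        i += 2;
                    } else {
                        field.append('"'); // stray quote inside the field: keep it
                        i++;
                    }
                }
            } else {
                // Unquoted field: read up to the next comma, quotes are plain data.
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i));
                    i++;
                }
            }
            fields.add(field.toString());
            i++; // skip the comma, or step past the end to finish
        }
        return fields;
    }
}
```

With these rules, the abc"def and "0429"292"0" values from the question each come back as a single field.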
opencsv is a great, simple and lightweight CSV parser for Java. It should handle your data easily.
If possible, changing your CSV format would make the solution very simple.
See the following for an overview of Delimiter Separated Values, a common format on Unix-based systems:
http://www.faqs.org/docs/artu/ch05s02.html#id2901882
Opencsv is a very simple and, in my view, the best API for CSV parsing. The file can also be preprocessed with the Linux sed command before it is handled in Java: if the file is not in a proper format, convert the column delimiter, which in your case is ",", into a pipe or another unique delimiter, so that Opencsv can easily tell quotes inside a field value apart from the column delimiter. Use the power of Linux alongside your Java code.
