Reading line by line from a file in Java

I have a problem with BufferedReader in Java. I am reading a large file line by line, parsing the lines and inserting them into a HashMap, but in the end only a few lines are in the HashMap:
Map<Integer, String> data = new HashMap<>(1000000);
int completedTestsCount = 0;
BufferedReader reader = new BufferedReader(new FileReader("file.txt"), 120000);
String line = null;
while ((line = reader.readLine()) != null) {
    if (line.contains("START executing FOR")) {
        String tempId = line.substring(42, line.length() - 38);
        int startId = Integer.parseInt(tempId);
        String dateTime = line.substring(6, 14);
        data.put(startId, dateTime);
    }
}
reader.close();
Here is an example of a line from the file that I want to parse: "INFO 00:00:09 - START executing FOR test3625 at Mon Sep 23 00:00:09 GMT+00:00 2013". The keys are the test ids.

HashMap stores data as key/value pairs, where each key is unique. In your case, the key comes from
String tempId = line.substring(42, line.length() - 38);
and since you are reading it from a file, it might not be unique. That is the problem: you have to make sure each key is unique.
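To see the overwrite happen, note that Map.put returns the value previously associated with the key, or null if there was none. A minimal sketch that counts how many puts replaced an existing entry; the ids and times are made-up sample values and DuplicateKeyDemo is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateKeyDemo {
    // Puts every (id, time) pair into the map and counts how many puts
    // replaced an existing entry: Map.put returns the previous value, or null.
    static int countOverwrites(int[] ids, String[] times, Map<Integer, String> data) {
        int overwritten = 0;
        for (int i = 0; i < ids.length; i++) {
            if (data.put(ids[i], times[i]) != null) {
                overwritten++;
            }
        }
        return overwritten;
    }

    public static void main(String[] args) {
        Map<Integer, String> data = new HashMap<>();
        int[] ids = {25, 25, 62};
        String[] times = {"00:00:09", "00:00:10", "00:00:11"};
        int overwritten = countOverwrites(ids, times, data);
        // prints "2 entries, 1 overwritten"
        System.out.println(data.size() + " entries, " + overwritten + " overwritten");
    }
}
```

If the count is close to the number of lines you read, duplicate keys explain the small map.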

The most likely explanation is that lots of lines in the file have the same startId. Each time you put a key/value pair with the same key, you will actually replace the previous map entry for that key. (A Map maps one key to exactly one value ...)
This might be because the ids are genuinely the same, or it might be that the way you are extracting the id from each line is incorrect; e.g. if the actual id doesn't always start at character 42.
By my counting, character 42 of your example line is the 2 in 3625 ... and that does not seem correct!
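One way to avoid hard-coded offsets is to match the parts of the line you need with a regular expression. This is a sketch assuming every relevant line follows the example format `INFO 00:00:09 - START executing FOR test3625 at ...`; the pattern and the class name StartLineParser are illustrative, not from the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartLineParser {
    // Assumed format: "<LEVEL> HH:mm:ss - START executing FOR test<digits> at ..."
    // Group 1 captures the time, group 2 the numeric part of the test id.
    private static final Pattern START_LINE =
            Pattern.compile("(\\d{2}:\\d{2}:\\d{2}) - START executing FOR test(\\d+)");

    // Returns the test id, or -1 if the line is not a START line.
    static int parseId(String line) {
        Matcher m = START_LINE.matcher(line);
        return m.find() ? Integer.parseInt(m.group(2)) : -1;
    }

    // Returns the HH:mm:ss time, or null if the line is not a START line.
    static String parseTime(String line) {
        Matcher m = START_LINE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sample = "INFO 00:00:09 - START executing FOR test3625 at Mon Sep 23 00:00:09 GMT+00:00 2013";
        System.out.println(parseId(sample));   // 3625
        System.out.println(parseTime(sample)); // 00:00:09
    }
}
```

Because find() only needs the fragment to appear somewhere in the line, extra padding after the log level (for example `INFO  00:00:09`) does not break it, which is exactly where fixed substring offsets are fragile.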

To use a HashMap, you need all the startId values to be unique.
In this case you should use a list instead of a map.
Define a custom KeyValuePair class and add its instances to the list:
class KeyValuePair {
    int startId;
    String dateTime;

    KeyValuePair(int startId, String dateTime) {
        this.startId = startId;
        this.dateTime = dateTime;
    }
}
List<KeyValuePair> data = new ArrayList<>();
// inside the read loop:
String tempId = line.substring(42, line.length() - 38);
int startId = Integer.parseInt(tempId);
String dateTime = line.substring(6, 14);
data.add(new KeyValuePair(startId, dateTime));
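If you still want to look entries up by id while keeping every occurrence, a common alternative (not part of the answer above) is to map each id to a list of values with Map.computeIfAbsent. A minimal sketch; the class and method names and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupById {
    // Appends dateTime to the list stored under startId, creating the list on first use,
    // so repeated ids accumulate values instead of overwriting each other.
    static void add(Map<Integer, List<String>> data, int startId, String dateTime) {
        data.computeIfAbsent(startId, id -> new ArrayList<>()).add(dateTime);
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> data = new HashMap<>();
        add(data, 3625, "00:00:09");
        add(data, 3625, "00:05:12");
        add(data, 3626, "00:10:00");
        System.out.println(data.get(3625)); // [00:00:09, 00:05:12]
        System.out.println(data.size());    // 2
    }
}
```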

Related

How to read from a .txt file into an array of objects

I have the following sample data in a .txt file:
111, Sybil, 21
112, Edith, 22
113, Mathew, 30
114, Mary, 25
The required output is:
[{"number":"111","name":"Sybil","age":"21"},
{"number":"112","name":"Edith","age":"22"},
{"number":"113","name":"Mathew","age":"30"},
{"number":"114","name":"Mary","age":"25"}]
Sadly, I have not got far, because I can't seem to get the values out of each line. Instead, this is what is displayed:
[one, two, three]
private void loadFile() throws FileNotFoundException, IOException {
    File txt = new File("Users.txt");
    try (Scanner scan = new Scanner(txt)) {
        ArrayList data = new ArrayList<>() ;
        while (scan.hasNextLine()) {
            data.add(scan.nextLine());
            System.out.print(scan.nextLine());
        }
        System.out.print(data);
    }
}
I would appreciate any help. Thank you.
I'm not too sure about the requirements. If you just need to know how to get the values out, use String.split() combined with Scanner.nextLine(), calling nextLine() only once per loop iteration.
Code below:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

private void loadFile() throws FileNotFoundException {
    File txt = new File("Users.txt");
    try (Scanner scan = new Scanner(txt)) {
        List<String> data = new ArrayList<>();
        while (scan.hasNextLine()) {
            // split the data by ", " and split at most (3-1) times
            String[] input = scan.nextLine().split(", ", 3);
            data.add(input[0]);
            data.add(input[1]);
            data.add(input[2]);
        }
        System.out.print(data);
    }
}
The output would be as below and you can further modify it yourself:
[111, Sybil, 21, 112, Edith, 22, 113, Mathew, 30, 114, Mary, 25]
However, if you also need the required format, the closest I can get is to put each line's values into a HashMap and add the maps to an ArrayList.
Code below:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

private void loadFile() throws FileNotFoundException {
    File txt = new File("Users.txt");
    try (Scanner scan = new Scanner(txt)) {
        List<Map<String, String>> data = new ArrayList<>();
        while (scan.hasNextLine()) {
            // Create a hashmap to store data in the correct format
            Map<String, String> info = new HashMap<>();
            String[] input = scan.nextLine().split(", ", 3);
            info.put("number", input[0]);
            info.put("name", input[1]);
            info.put("age", input[2]);
            // Put it inside the ArrayList
            data.add(info);
        }
        System.out.print(data);
    }
}
And the output would be:
[{number=111, name=Sybil, age=21}, {number=112, name=Edith, age=22}, {number=113, name=Mathew, age=30}, {number=114, name=Mary, age=25}]
Hope this helps.
Currently, you're skipping lines. A quote from the Scanner::nextLine documentation:
This method returns the rest of the current line, excluding any line separator at the end. The position is set to the beginning of the next line.
So you're adding one line to your list, and writing the next one to the console.
To get the data from each line, you can use the String::split method, which supports RegEx.
Example:
"line of my file".split(" ")
We can use streams to write some compact code.
First we define a record to hold our data.
Files.lines reads your file lazily, producing a stream of strings, one per line. The stream keeps the file open, so close it, for example with try-with-resources.
We call Stream#map to produce another stream, a series of string arrays. Each array has three elements, the three fields within each line.
We call map again, this time to produce a stream of Person objects. We construct each Person object by parsing each line's three fields and passing them to the constructor.
We call Stream#toList (Java 16 and later) to collect those Person objects into a list.
We call List#toString to generate text representing the contents of the list of Person objects.
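import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

record Person ( int id , String name , int age ) {}

// The enclosing method must declare or handle IOException.
String output;
try ( Stream < String > lines = Files.lines( Path.of( "/path/to/Users.txt" ) ) ) {
    output =
            lines
                    .map( line -> line.split( ", " ) )
                    .map( parts -> new Person(
                            Integer.parseInt( parts[ 0 ] ) ,
                            parts[ 1 ] ,
                            Integer.parseInt( parts[ 2 ] )
                    ) )
                    .toList()
                    .toString()
    ;
}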
If the format of the default Person#toString method does not suit you, add an override of that method to produce your desired output.
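For example, a sketch of such an override that prints each record in the JSON-like shape the question asks for. It assumes names contain no quotes or backslashes (nothing is escaped), so it is not a general JSON serializer; PersonFormatDemo is an illustrative wrapper class.

```java
import java.util.List;

public class PersonFormatDemo {
    record Person(int id, String name, int age) {
        // Replaces the record's generated toString with the format from the question.
        @Override
        public String toString() {
            return String.format("{\"number\":\"%d\",\"name\":\"%s\",\"age\":\"%d\"}", id, name, age);
        }
    }

    public static void main(String[] args) {
        List<Person> people = List.of(new Person(111, "Sybil", 21), new Person(112, "Edith", 22));
        // List#toString wraps the elements in [] and separates them with ", ":
        // [{"number":"111","name":"Sybil","age":"21"}, {"number":"112","name":"Edith","age":"22"}]
        System.out.println(people);
    }
}
```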

How to deal with NumberFormatException when reading from a csv file [duplicate]

This question already has answers here:
How can I prevent java.lang.NumberFormatException: For input string: "N/A"?
(6 answers)
Closed 9 months ago.
My task is to read values from a CSV file and import each line of information from this file into an object array. I think my issue is the blank data elements in my CSV file, which break my parsing from String to int, but I have found no way to deal with this. Here is my code:
fileStream = new FileInputStream(pFileName);
rdr = new InputStreamReader(fileStream);
bufRdr = new BufferedReader(rdr);
lineNum = 0;
while (line != null) {
lineNum++;
String[] Values = new String[13];
Values = line.split(",");
int cumulPos = Integer.parseInt(Values[6]);
int cumulDec = Integer.parseInt(Values[7]);
int cumuRec = Integer.parseInt(Values[8]);
int curPos = Integer.parseInt(Values[9]);
int hosp = Integer.parseInt(Values[10]);
int intenCar = Integer.parseInt(Values[11]);
double latitude = Double.parseDouble(Values[4]);
double longitude = Double.parseDouble(Values[5]);
covidrecordArray[lineNum] = new CovidRecord(Values[0], cumulPos, cumulDec, cumuRec, curPos, hosp,
intenCar, new Country(Values[1], Values[2], Values[3], Values[13], latitude, longitude));
If anyone could help it would be greatly appreciated.
As already suggested, use a proper CSV parser if you can, but if for some reason you can't, this is one way to do it. Be sure to read the comments in the code:
fileStream = new FileInputStream(pFileName);
rdr = new InputStreamReader(fileStream);
bufRdr = new BufferedReader(rdr);
String line;
// Skip the header line. Remove the following line if there is no header line in the CSV file.
bufRdr.readLine();
String csvFileDataDelimiter = ",";
List<CovidRecord> recordsList = new ArrayList<>();
// True value calculated later in code (read comments).
int expectedNumberOfElements = 0; // 0 is default
while ((line = bufRdr.readLine()) != null) {
line = line.trim();
// If for some crazy reason a blank line is encountered...skip it.
if (line.isEmpty()) {
continue;
}
/* Get the expected number of elements within each CSV File Data Line.
This is based off of the number of actual delimiters within a file
data line plus 1. This is only calculated from the very first data
line. */
if (expectedNumberOfElements == 0) {
expectedNumberOfElements = line.replaceAll("[^\\" + csvFileDataDelimiter + "]", "").length() + 1;
}
/* Create an array of the expected size for a CSV data line and fill
   it with empty strings (""). This is done because if a data line
   has nothing in its last data element(s), the array created by
   split() will be short by one or more elements, since split()
   drops trailing empty strings. Padding ensures that there will
   always be an empty string ("") wherever the CSV data line has
   no value. These empty strings are used in the data validations
   below to supply a default value (like 0). */
String[] csvLineElements = new String[expectedNumberOfElements];
Arrays.fill(csvLineElements, "");
/* Take the array from the split (values) and place the data into
the csvLineElements[] array. */
String[] values = line.split("\\s*,\\s*"); // Takes care of any comma/whitespace combinations (if any).
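// Math.min guards against a line that has more fields than expected.
for (int i = 0; i < Math.min(values.length, csvLineElements.length); i++) {
csvLineElements[i] = values[i];
}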
/* Is the csvLineElements[] element a String representation of a signed
or unsigned integer data type value ("-?\\d+"). If so, convert the
String array element into an Integer value. If not, provide a default
value of 0. */
int cumulPos = Integer.parseInt(csvLineElements[6].matches("-?\\d+") ? csvLineElements[6] : "0");
int cumulDec = Integer.parseInt(csvLineElements[7].matches("-?\\d+") ? csvLineElements[7] : "0");
int cumuRec = Integer.parseInt(csvLineElements[8].matches("-?\\d+") ? csvLineElements[8] : "0");
int curPos = Integer.parseInt(csvLineElements[9].matches("-?\\d+") ? csvLineElements[9] : "0");
int hosp = Integer.parseInt(csvLineElements[10].matches("-?\\d+") ? csvLineElements[10] : "0");
int intenCar = Integer.parseInt(csvLineElements[11].matches("-?\\d+") ? csvLineElements[11] : "0");
/* Is the csvLineElements[] element a String representation of a signed
or unsigned integer or floating point value ("-?\\d+(\\.\\d+)?").
If so, convert the String array element into an Double data type value.
If not, provide a default value of 0.0 */
double latitude = Double.parseDouble(csvLineElements[4]
.matches("-?\\d+(\\.\\d+)?") ? csvLineElements[4] : "0.0d");
double longitude = Double.parseDouble(csvLineElements[5]
.matches("-?\\d+(\\.\\d+)?") ? csvLineElements[5] : "0.0d");
/* Create an instance of Country to pass into the constructor of
CovidRecord below. */
Country country = new Country(csvLineElements[1], csvLineElements[2],
csvLineElements[3], csvLineElements[13],
latitude, longitude);
// Create and add an instance of CovidRecord to the recordsList List.
recordsList.add(new CovidRecord(csvLineElements[0], cumulPos, cumulDec,
cumuRec, curPos, hosp, intenCar, country));
// Do what you want with the recordList List....
}
For obvious reasons, the code above was not tested. If you have any problems with it then let me know.
You will also notice that instead of the covidrecordArray[] CovidRecord array, I opted to use a List named recordsList. A List can grow dynamically, whereas an array has a fixed size, which means you would need to know the number of data lines in the file before initializing the array. This is not required with a List.
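A related detail: the padding is needed because String.split(regex) discards trailing empty strings. Passing a negative limit, as in split(regex, -1), keeps them, so a line with empty trailing fields still yields one element per field. A small demonstration; the class and method names are illustrative.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Splits on commas (with optional surrounding whitespace) and keeps trailing empty fields.
    static String[] splitKeepingEmpties(String line) {
        return line.split("\\s*,\\s*", -1);
    }

    public static void main(String[] args) {
        String line = "a,b,,";
        System.out.println(Arrays.toString(line.split(",")));          // [a, b]
        System.out.println(Arrays.toString(splitKeepingEmpties(line))); // [a, b, , ]
    }
}
```

With a negative limit the array always has one element per delimiter plus one, so a missing value shows up as an empty string at its own index.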
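You can create one generic helper method that checks for null (or blank) values and returns a default that Integer.parseInt accepts, such as "0", based on your needs:
int hosp = Integer.parseInt(checkForNull(Values[10]));
public static String checkForNull(String val) {
    // Returning " " here would still throw NumberFormatException, so return "0"
    return (val == null || val.trim().isEmpty()) ? "0" : val.trim();
}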
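Rather than predicting every kind of bad input, another option is to attempt the parse and fall back to a default when it fails; this covers null, blank and non-numeric values such as N/A in one place. A minimal sketch; SafeParse and parseIntOrDefault are hypothetical names.

```java
public class SafeParse {
    // Returns the parsed int, or defaultValue when the text is null, blank or not a valid int.
    static int parseIntOrDefault(String text, int defaultValue) {
        if (text == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIntOrDefault("42", 0));   // 42
        System.out.println(parseIntOrDefault(" 7 ", 0));  // 7
        System.out.println(parseIntOrDefault("", 0));     // 0
        System.out.println(parseIntOrDefault("N/A", -1)); // -1
    }
}
```

Since split() may return a shorter array for lines with empty trailing fields, check the index first, for example `parseIntOrDefault(Values.length > 10 ? Values[10] : null, 0)`.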

Adding new key-value pair gets other keys' values replaced in HashMap

So, I have a HashMap<String, ArrayList<Resource>> that stores an ArrayList per String key. But when I add another pair with a new ArrayList value, the values of the other keys are replaced. Hence, all the values for the different keys end up the same.
public class Reports{
private ArrayList<Resource> resourceList;
private HashMap<String,ArrayList<Resource>> consolidatedAttendance = new HashMap<String,ArrayList<Resource>>();
public void readReport(String reportFile){
//initialized with resources from config file
ArrayList<Resource> repResourceList = new ArrayList<Resource>(getResourceList());
try (BufferedReader br = new BufferedReader(new FileReader(reportFile))) {
String line;
line = br.readLine(); // disregards first line (columns)
while ((line = br.readLine()) != null) {
if(line.length()==0){
break;
}
//store each resource status in resourceList
String[] values = line.split(",");
String resourceName = values[1], resourceStatus = values[2];
int resourceIndex = indexOfResource(resourceList, resourceName);
// to add validation
if(resourceIndex!=-1){
repResourceList.get(resourceIndex).setStatus(resourceStatus);
}
}
}catch(IOException e){
e.printStackTrace();
}
//get Date
String reportFilename = reportFile.substring(0, reportFile.indexOf("."));
String strDate = reportFilename.substring(reportFilename.length()-9);
consolidateRecords(strDate, new ArrayList<Resource>(repResourceList));
}
public void consolidateRecords(String strDate, ArrayList<Resource> repResourceList){
//consolidate records in hashmap
consolidatedAttendance.put(strDate, repResourceList);
// test print
for (String key: consolidatedAttendance.keySet()){
ArrayList<Resource> resources = consolidatedAttendance.get(key);
for(Resource resource: resources){
System.out.println(key+": "+resource.getNickname()+" "+resource.getEid()+" "+resource.getStatus());
}
}
}
}
So the output for the map when it is printed is:
First key added:
"21-Dec-20": John Working
"21-Dec-20": Alice Working
"21-Dec-20": Jess Working
For the second key, the list is different. But when the second key is added (after the put() method), the first key's values have been replaced:
"21-Dec-20": John SL
"21-Dec-20": Alice Working
"21-Dec-20": Jess SL
"28-Dec-20": John SL
"28-Dec-20": Alice Working
"28-Dec-20": Jess SL
The values of your Map are Lists whose elements are the same objects as the elements of the List returned by getResourceList(). The fact that you are creating a copy of that List (twice) doesn't change that: new ArrayList<>(list) copies the references, not the Resource objects.
If each call to getResourceList() returns a List containing the same instances, all the keys in your Map will be associated with different Lists that contain the same instances, so calling setStatus(...) while reading one report changes the status seen under every key.
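A sketch of one fix: a deep copy that creates new Resource objects for each report. The Resource class here is a hypothetical minimal stand-in with a copy constructor; the real class from the question would need an equivalent constructor or copy method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeepCopyDemo {
    // Hypothetical stand-in for the question's Resource class.
    static class Resource {
        private final String nickname;
        private String status;

        Resource(String nickname, String status) {
            this.nickname = nickname;
            this.status = status;
        }

        // Copy constructor: a new object with the same field values.
        Resource(Resource other) {
            this(other.nickname, other.status);
        }

        String getNickname() { return nickname; }
        String getStatus() { return status; }
        void setStatus(String status) { this.status = status; }
    }

    // Deep copy: a new list AND new Resource objects, so setStatus calls
    // on one report's list cannot affect another report's list.
    static List<Resource> deepCopy(List<Resource> source) {
        List<Resource> copy = new ArrayList<>(source.size());
        for (Resource r : source) {
            copy.add(new Resource(r));
        }
        return copy;
    }

    public static void main(String[] args) {
        List<Resource> config = List.of(new Resource("John", "Working"));
        Map<String, List<Resource>> byDate = new HashMap<>();

        byDate.put("21-Dec-20", deepCopy(config));

        List<Resource> week2 = deepCopy(config);
        week2.get(0).setStatus("SL");
        byDate.put("28-Dec-20", week2);

        System.out.println(byDate.get("21-Dec-20").get(0).getStatus()); // Working
        System.out.println(byDate.get("28-Dec-20").get(0).getStatus()); // SL
    }
}
```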

How to select random text value from specific row using java

I have three input fields.
First Name
Last Name
Date Of Birth
I would like to get random data for each input from a property file.
This is how the property file looks. Field name and = should be ignored.
- First Name= Robert, Brian, Shawn, Bay, John, Paul
- Last Name= Jerry, Adam ,Lu , Eric
- Date of Birth= 01/12/12,12/10/12,1/2/17
Example: For First Name, one name should be selected at random from the following names:
Robert, Brian, Shawn, Bay, John, Paul
Also, I need to ignore anything before the =.
FileInputStream objfile = new FileInputStream(System.getProperty("user.dir") + path);
in = new BufferedReader(new InputStreamReader(objfile ));
String line = in.readLine();
while (line != null && !line.trim().isEmpty()) {
String eachRecord[]=line.trim().split(",");
Random rand = new Random();
//I need to pick first name randomly from the file from row 1.
send(firstName,(eachRecord[0]));
If you know that you're always going to have just those 3 lines in your property file, I would put each value into a map with its index as the key, then randomly generate a key in the range of the map.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// your code here to read the file in
Map<Integer, String> firstNameMap = new HashMap<>();
Map<Integer, String> lastNameMap = new HashMap<>();
Map<Integer, String> dobMap = new HashMap<>();
String line;
while ((line = in.readLine()) != null) {
    String[] parts = line.split("=", 2);
    if (parts.length < 2) {
        continue; // skip lines without '='
    }
    String field = parts[0].trim();
    if (field.equals("First Name")) {
        String[] values = parts[1].split(",");
        for (int i = 0; i < values.length; ++i) {
            firstNameMap.put(i, values[i].trim());
        }
    }
    else if (field.equals("Last Name")) {
        // do the same as FN but for lastNameMap
    }
    else if (field.equals("Date of Birth")) {
        // do the same as FN but for dobMap
    }
}
// Now you can use the size of the map and a random number to get a value,
// first name for instance. nextInt(origin, bound) excludes the bound,
// so this yields a valid key from 0 to size - 1:
int randomNum = ThreadLocalRandom.current().nextInt(0, firstNameMap.size());
System.out.println("First Name: " + firstNameMap.get(randomNum));
// and you would do the same for the other fields
The code can easily be refactored with some helper methods to make it cleaner, we'll leave that as a HW assignment :)
This way you have a cache of all your values that you can call at any time to get a random value. I realize this isn't the optimal solution, with nested loops and 3 different maps, but if your input file only contains 3 lines and you're not expecting millions of inputs, it should be just fine.
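A more compact variant of the same idea keeps one List per field instead of index-keyed maps and picks a random index with Random.nextInt(bound) (swapped in here for ThreadLocalRandom so the generator can be seeded for repeatable tests). This is a sketch assuming each line has the form `Field Name= value1, value2, ...` with values possibly padded by spaces; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class RandomFieldPicker {
    // Parses lines like "First Name= Robert, Brian" into "First Name" -> [Robert, Brian].
    // Lines without '=' are skipped.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> fields = new HashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = line.substring(0, eq).trim();
            List<String> values = new ArrayList<>();
            for (String v : line.substring(eq + 1).split(",")) {
                if (!v.trim().isEmpty()) {
                    values.add(v.trim());
                }
            }
            fields.put(name, values);
        }
        return fields;
    }

    // Returns a random value for an existing field; nextInt(n) yields 0..n-1,
    // so the index is always valid.
    static String pick(Map<String, List<String>> fields, String name, Random random) {
        List<String> values = fields.get(name);
        return values.get(random.nextInt(values.size()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "First Name= Robert, Brian, Shawn, Bay, John, Paul",
                "Last Name= Jerry, Adam ,Lu , Eric",
                "Date of Birth= 01/12/12,12/10/12,1/2/17");
        Map<String, List<String>> fields = parse(lines);
        Random random = new Random();
        System.out.println("First Name: " + pick(fields, "First Name", random));
        System.out.println("Last Name: " + pick(fields, "Last Name", random));
        System.out.println("Date of Birth: " + pick(fields, "Date of Birth", random));
    }
}
```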
Haven't programmed stuff like this in a long time.
Feel free to test it, and let me know if it works.
The result of this code should be a HashMap object called values
You can then get the specific fields you want from it, using get(field_name)
For example, values.get("First Name"). Make sure to use the correct case, because "first name" won't work.
If you want it all to be lower case, you can just add .toLowerCase() at the end of the line that puts the field and value into the HashMap
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;

public class Test
{
    public static void main(String[] args) throws IOException
    {
        HashMap<String, String> values = new HashMap<String, String>();
        // "fields.txt" is a placeholder; point "in" at your own property file
        try (BufferedReader in = Files.newBufferedReader(Paths.get("fields.txt"))) {
            String line;
            while (((line = in.readLine()) != null) && !line.trim().isEmpty()) {
                if (!line.contains("=")) {
                    continue;
                }
                String[] lineParts = line.split("=", 2);
                String[] eachRecord = lineParts[1].split(",");
                System.out.println("adding value of field type = " + lineParts[0].trim());
                // now add the mapping to the values HashMap - values[field_name] = random_field_value
                values.put(lineParts[0].trim(), eachRecord[(int) (Math.random() * eachRecord.length)].trim());
            }
        }
        System.out.println("First Name = " + values.get("First Name"));
        System.out.println("Last Name = " + values.get("Last Name"));
        System.out.println("Date of Birth = " + values.get("Date of Birth"));
    }
}

Lucene indexing - lots of docs/phrases

What approach should I use in indexing following set of files.
Each file contains around 500k lines of characters (400MB). The characters are not words; let's say, for the sake of the question, that they are random characters without spaces.
I need to be able to find each line which contains a given 12-character string, for example:
line:
AXXXXXXXXXXXXJJJJKJIDJUD... etc., up to 200 chars
interesting part: XXXXXXXXXXXX
While searching, I'm only interested in the 12 characters at positions 1 to 12 (line.substring(1, 13), so XXXXXXXXXXXX). After the search I would like to be able to read the line containing XXXXXXXXXXXX without looping through the file.
I wrote the following PoC (simplified for the question):
Indexing:
while ( (line = br.readLine()) != null ) {
doc = new Document();
Field fileNameField = new StringField(FILE_NAME, file.getName(), Field.Store.YES);
doc.add(fileNameField);
Field characterOffset = new IntField(CHARACTER_OFFSET, charsRead, Field.Store.YES);
doc.add(characterOffset);
String id = "";
try {
id = line.substring(1, 13);
doc.add(new TextField(CONTENTS, id, Field.Store.YES));
writer.addDocument(doc);
} catch ( IndexOutOfBoundsException ior ) {
//cut off for sake of question
} finally {
//simplified snippet for the sake of the question. characterOffset is the number of chars to skip while reading the file (ultimately bytes read)
charsRead += line.length() + 2; // +2 assumes "\r\n" line endings
}
}
Searching:
RegexpQuery q = new RegexpQuery(new Term(CONTENTS, id), RegExp.NONE); //because id can be a regexp concerning the 12-char string
TopDocs results = searcher.search(q, Integer.MAX_VALUE);
ScoreDoc[] hits = results.scoreDocs;
int numTotalHits = results.totalHits;
Map<String, Set<Integer>> fileToOffsets = new HashMap<String, Set<Integer>>();
for ( int i = 0; i < numTotalHits; i++ ) {
Document doc = searcher.doc(hits[i].doc);
String fileName = doc.get(FILE_NAME);
if ( fileName != null ) {
String foundIds = doc.get(CONTENTS);
Set<Integer> offsets = fileToOffsets.get(fileName);
if ( offsets == null ) {
offsets = new HashSet<Integer>();
fileToOffsets.put(fileName, offsets);
}
String offset = doc.get(CHARACTER_OFFSET);
offsets.add(Integer.parseInt(offset));
}
}
The problem with this approach is that it creates one document per line.
Can you please give me hints on how to approach this problem with Lucene, and whether Lucene is the way to go here?
Instead of adding a new document for each iteration, use the same document and keep adding fields with the same name to it, something like:
Document doc = new Document();
Field fileNameField = new StringField(FILE_NAME, file.getName(), Field.Store.YES);
doc.add(fileNameField);
String id;
while ( (line = br.readLine()) != null ) {
id = "";
try {
id = line.substring(1, 13);
doc.add(new TextField(CONTENTS, id, Field.Store.YES));
//What is this (characteroffset) field for?
Field characterOffset = new IntField(CHARACTER_OFFSET, bytesRead, Field.Store.YES);
doc.add(characterOffset);
} catch ( IndexOutOfBoundsException ior ) {
//cut off
} finally {
if ( "".equals(line) ) {
bytesRead += 1;
} else {
bytesRead += line.length() + 2;
}
}
}
writer.addDocument(doc);
This will add the id from each line as a new term in the same field. The same query should continue to work.
I'm not really sure what to make of your use of the CharacterOffset field, though. Each value will, as with the ids, be appended to the end of the field as another term. It won't be directly associated with a particular term, aside from being, one would assume, the same number of tokens into the field. If you need to retrieve a particular line, rather than the contents of the whole file, your current approach of indexing line by line might be the most reasonable.
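With one document per line, the stored offset is what lets you read a hit without scanning the file. A sketch using java.io.RandomAccessFile, assuming the file is single-byte (ASCII) text so the character count accumulated in the question equals the byte offset, and that lines end in "\r\n" as the `+ 2` implies. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class OffsetLineReader {
    // Reads the line that starts at the given byte offset.
    // RandomAccessFile.readLine stops at \n, \r or \r\n and treats each byte as one char,
    // so this is only correct for single-byte encodings such as ASCII.
    static String readLineAt(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            return raf.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, "AXXXXXXXXXXXXJJJ\r\nBYYYYYYYYYYYYKKK\r\n".getBytes(StandardCharsets.US_ASCII));
        // The second line starts after 16 chars plus 2 for "\r\n".
        System.out.println(readLineAt(tmp, 18)); // BYYYYYYYYYYYYKKK
        Files.delete(tmp);
    }
}
```

For many lookups you would keep one RandomAccessFile open per file instead of reopening it for every hit.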
