I have a relatively inefficient CSV reader, shown below. It takes more than 30 seconds to read 30,000+ lines. How can I speed up the reading as much as possible?
public class DataReader {
private String csvFile;
private List<String> sub = new ArrayList<String>();
private List<List> master = new ArrayList<List>();
public void ReadFromCSV(String csvFile) {
String line = "";
String cvsSplitBy = ",";
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
System.out.println("Header " + br.readLine());
while ((line = br.readLine()) != null) {
// use comma as separator
String[] list = line.split(cvsSplitBy);
// System.out.println("the size is " + country[1]);
for (int i = 0; i < list.length; i++) {
sub.add(list[i]);
}
List<String> temp = (List<String>) ((ArrayList<String>) sub).clone();
// master.add(new ArrayList<String>(sub));
master.add(temp);
sub.removeAll(sub);
}
} catch (IOException e) {
e.printStackTrace();
}
System.out.println(master);
}
public List<List> getMaster() {
return master;
}
}
UPDATE: I have found that my code actually finishes reading in less than 1 second when run on its own. The DataReader is used by my simulation model to initialize the relevant properties, and it is the following part, which uses the imported data, that takes 40 seconds to finish. Could anyone help by looking at this code?
// add route network
Network<Object> net = (Network<Object>)context.getProjection("IntraCity Network");
IndexedIterable<Object> local_hubs = context.getObjects(LocalHub.class);
for (int i = 0; i <= CSV_reader_route.getMaster().size() - 1; i++) {
String source = (String) CSV_reader_route.getMaster().get(i).get(0);
String target = (String) CSV_reader_route.getMaster().get(i).get(3);
double dist = Double.parseDouble((String) CSV_reader_route.getMaster().get(i).get(6));
double time = Double.parseDouble((String) CSV_reader_route.getMaster().get(i).get(7));
Object source_hub = null;
Object target_hub = null;
Query<Object> source_query = new PropertyEquals<Object>(context, "hub_code", source);
for (Object o : source_query.query()) {
if (o instanceof LocalHub) {
source_hub = (LocalHub) o;
}
if (o instanceof GatewayHub) {
source_hub = (GatewayHub) o;
}
}
Query<Object> target_query = new PropertyEquals<Object>(context, "hub_code", target);
for (Object o : target_query.query()) {
if (o instanceof LocalHub) {
target_hub = (LocalHub) o;
}
if (o instanceof GatewayHub) {
target_hub = (GatewayHub) o;
}
}
// System.out.println(target_hub.getClass() + " " + time);
// Route this_route = (Route) net.addEdge(source_hub, target_hub);
// context.add(this_route);
// System.out.println(net.getEdge(source_hub, target_hub));
if (net.getEdge(source, target) == null) {
Route this_route = (Route) net.addEdge(source, target);
context.add(this_route);
// this_route.setDist(dist);
// this_route.setTime(time); }
}
}
Your code does many unnecessary write operations just to add the values of the current row to the master list. You can replace the existing code with the simpler version below.
Existing code:
String[] list = line.split(cvsSplitBy);
// System.out.println("the size is " + country[1]);
for (int i = 0; i < list.length; i++) {
sub.add(list[i]);
}
List<String> temp = (List<String>) ((ArrayList<String>) sub).clone();
// master.add(new ArrayList<String>(sub));
master.add(temp);
sub.removeAll(sub);
Suggested code:
master.add(Arrays.asList(line.split(cvsSplitBy)));
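Two caveats about this one-liner: `Arrays.asList` returns a fixed-size view of the array, and `split(",")` silently drops trailing empty fields. A minimal sketch, assuming rows should keep empty trailing columns and stay modifiable (the class and method names are illustrative only, and quoted commas are not handled):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RowSplitDemo {
    // Hypothetical helper: split one CSV line into a modifiable list,
    // keeping trailing empty fields (limit -1). Quoted commas are not handled.
    static List<String> splitRow(String line) {
        return new ArrayList<>(Arrays.asList(line.split(",", -1)));
    }

    public static void main(String[] args) {
        // Default split: trailing empty fields are dropped.
        System.out.println(Arrays.asList("a,b,,".split(",")).size());
        // With limit -1 they are kept.
        System.out.println(splitRow("a,b,,").size());
    }
}
```

If the rows are only read and never changed, the plain `Arrays.asList(line.split(cvsSplitBy))` from the suggestion is enough.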
I don't have a CSV that big, but you could try the following:
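public static void main(String[] args) throws IOException {
Path csvPath = Paths.get("path/to/file.csv");
List<List<String>> master;
// Files.lines keeps the file open, so close the stream when done
try (Stream<String> lines = Files.lines(csvPath)) {
master = lines
.skip(1) // skip the header row
.map(line -> Arrays.asList(line.split(",")))
.collect(Collectors.toList());
}
}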
EDIT: I tried it with a CSV sample with 50k entries and the code runs in less than one second.
Extending the answer of @Alex R, you can also process the lines in parallel like this:
public static void main(String[] args) throws IOException {
Path csvPath = Paths.get("path/to/file.csv");
List<List<String>> master = Files.lines(csvPath)
.skip(1).parallel()
.map(line -> Arrays.asList(line.split(",")))
.collect(Collectors.toList());
}
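On the UPDATE in the question: since reading is fast, the 40 seconds are presumably spent in the loop, which builds two `PropertyEquals` queries for every CSV row. If each query has to scan the objects in the context, the cost grows with rows × hubs. A common remedy is to index the hubs by code once and look them up in a `HashMap`. The sketch below is self-contained and does not use the simulation framework's API; `Hub`, `indexByCode` and the sample codes are stand-ins invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HubIndexDemo {
    // Stand-in for LocalHub / GatewayHub; only the code matters here.
    static final class Hub {
        final String hubCode;
        Hub(String hubCode) { this.hubCode = hubCode; }
    }

    // Build the lookup table once, before the route loop.
    static Map<String, Hub> indexByCode(List<Hub> hubs) {
        Map<String, Hub> byCode = new HashMap<>();
        for (Hub hub : hubs) {
            byCode.put(hub.hubCode, hub);
        }
        return byCode;
    }

    public static void main(String[] args) {
        Map<String, Hub> byCode = indexByCode(Arrays.asList(new Hub("A1"), new Hub("B2")));
        // Inside the route loop, each lookup is now a single hash probe.
        Hub source = byCode.get("A1");
        Hub target = byCode.get("B2");
        System.out.println(source.hubCode + " -> " + target.hubCode);
    }
}
```

Timing the loop before and after such a change would confirm whether the queries really were the bottleneck.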
I have a CSV file. After I overwrite one line with the Write method, every subsequent write is appended to the end of the file instead of going to the specific line.
using System.Collections;
using System.Collections.Generic;
using UnityEngine.UI;
using UnityEngine;
using System.Text;
using System.IO;
public class LoadQuestion : MonoBehaviour
{
int index;
string path;
FileStream file;
StreamReader reader;
StreamWriter writer;
public Text City;
public string[] allQuestion;
public string[] addedQuestion;
private void Start()
{
index = 0;
path = Application.dataPath + "/Files/Questions.csv";
allQuestion = File.ReadAllLines(path, Encoding.GetEncoding(1251));
file = new FileStream(path, FileMode.Open, FileAccess.ReadWrite);
writer = new StreamWriter(file, Encoding.GetEncoding(1251));
reader = new StreamReader(file, Encoding.GetEncoding(1251));
writer.AutoFlush = true;
List<string> _questions = new List<string>();
for (int i = 0; i < allQuestion.Length; i++)
{
char status = allQuestion[i][0];
if (status == '0')
{
_questions.Add(allQuestion[i]);
}
}
addedQuestion = _questions.ToArray();
City.text = ParseToCity(addedQuestion[0]);
}
private string ParseToCity(string current)
{
string _city = "";
string[] data = current.Split(';');
_city = data[2];
return _city;
}
private void OnApplicationQuit()
{
writer.Close();
reader.Close();
file.Close();
}
public void IKnow()
{
string[] quest = addedQuestion[index].Split(';');
int indexFromFile = int.Parse(quest[1]);
string questBeforeAnsver = "";
for (int i = 0; i < quest.Length; i++)
{
if (i == 0)
{
questBeforeAnsver += "1";
}
else
{
questBeforeAnsver += ";" + quest[i];
}
}
Debug.Log("indexFromFile : " + indexFromFile);
for (int i = 0; i < allQuestion.Length; i++)
{
if (i == indexFromFile)
{
writer.Write(questBeforeAnsver);
break;
}
else
{
reader.ReadLine();
}
}
reader.DiscardBufferedData();
reader.BaseStream.Seek(0, SeekOrigin.Begin);
if (index < addedQuestion.Length - 1)
{
index++;
}
City.text = ParseToCity(addedQuestion[index]);
}
}
The file contains lines of this form:
0;0;Africa
0;1;London
0;2;Paris
The point is that this is a game, and only the questions whose status is 0 (unanswered) are loaded from the file. If the user clicks that they know the answer during the game, the matching line in the file is overwritten with status 1 instead of 0, so that the question will not load when the game is played again.
What happens is that the first question is overwritten successfully, and all subsequent ones are simply appended to the end of the file:
1;0;Africa
0;1;London
0;2;Paris1;1;London1;2;Paris
What's wrong?
I'm trying to import a txt file with car info, separate the strings into arrays, and then display them. The number of doors is combined with the next number plate. I have tried a few ways to get rid of the whitespace characters, which I think are causing the issue, but have had no luck.
My code displays this result:
Number Plate : AG53DBO
Car Type : Mercedes
Engine Size : 1000
Colour : (255:0:0)
No. of Doors : 4
MD17WBW
Number Plate : 4
MD17WBW
Car Type : Volkswagen
Engine Size : 2300
Colour : (0:0:255)
No. of Doors : 5
ED03HSH
Code:
public class Application {
public static void main(String[] args) throws IOException {
///// ---- Import File ---- /////
String fileName =
"C:\\Users\\beng\\eclipse-workspace\\Assignment Trailblazer\\Car Data";
BufferedReader reader = new BufferedReader(new FileReader(fileName));
StringBuilder stringBuilder = new StringBuilder();
String line = null;
String ls = System.getProperty("line.separator");
while ((line = reader.readLine()) != null) {
stringBuilder.append(line);
stringBuilder.append(ls);
}
reader.close();
String content = stringBuilder.toString();
///// ---- Split file into array ---- /////
String[] dataList = content.split(",");
// Display array
for (String temp : dataList) {
// System.out.println(temp);
}
ArrayList<Car> carArray = new ArrayList();
// Loop variables
int listLength = 1;
int arrayPosition = 0;
// (dataList.length/5)
while (listLength < 5) {
Car y = new Car(dataList, arrayPosition);
carArray.add(y);
listLength++;
arrayPosition += 4;
}
for (Car temp : carArray) {
System.out.println(temp.displayCar());
}
}
}
And
public class Car {
String[] data;
private String modelUnpro;
private String engineSizeUnpro;
private String registrationUnpro;
private String colourUnpro;
private String doorNoUnpro;
// Constructor
public Car(String[] data, int arrayPosition) {
registrationUnpro = data[arrayPosition];
modelUnpro = data[arrayPosition + 1];
engineSizeUnpro = data[arrayPosition + 2];
colourUnpro = data[arrayPosition + 3];
doorNoUnpro = data[arrayPosition + 4];
}
// Getters
private String getModelUnpro() {
return modelUnpro;
}
private String getEngineSizeUnpro() {
return engineSizeUnpro;
}
private String getRegistrationUnpro() {
return registrationUnpro;
}
private String getColourUnpro() {
return colourUnpro;
}
private String getDoorNoUnpro() {
return doorNoUnpro;
}
public String displayCar() {
return "Number Plate : " + getRegistrationUnpro() + "\n Car Type : " + getModelUnpro() + "\n Engine Size : "
+ getEngineSizeUnpro() + "\n Colour : " + getColourUnpro() + "\n No. of Doors : " + getDoorNoUnpro() + "\n";
}
}
Text file:
AG53DBO,Mercedes,1000,(255:0:0),4
MD17WBW,Volkswagen,2300,(0:0:255),5
ED03HSH,Toyota,2000,(0:0:255),4
OH01AYO,Honda,1300,(0:255:0),3
WE07CND,Nissan,2000,(0:255:0),3
NF02FMC,Mercedes,1200,(0:0:255),5
PM16DNO,Volkswagen,1300,(255:0:0),5
MA53OKB,Honda,1400,(0:0:0),4
VV64BHH,Honda,1600,(0:0:255),5
ER53EVW,Ford,2000,(0:0:255),3
Replace the line separator in the while loop with a comma, so the last field of one line is not glued to the first field of the next, and advance by five fields per car.
String fileName = "D:\\Files\\a.txt";
BufferedReader reader = new BufferedReader(new FileReader(fileName));
StringBuilder stringBuilder = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
// a comma instead of the line separator keeps the fields apart
stringBuilder.append(line.trim()).append(",");
}
reader.close();
String content = stringBuilder.toString();
String[] dataList = content.split(",");
ArrayList<Car> carArray = new ArrayList<>();
// each car occupies five consecutive fields
for (int arrayPosition = 0; arrayPosition + 4 < dataList.length; arrayPosition += 5) {
carArray.add(new Car(dataList, arrayPosition));
}
for (Car temp : carArray) {
System.out.println(temp.displayCar());
}
In StringBuilder you collect all lines:
AG53DBO,Mercedes,1000,(255:0:0),4\r\nMD17WBW,Volkswagen,2300,(0:0:255),5\r\n...
This string should first be split on ls; then you have lines with fields separated by commas.
Splitting only by comma produces a fused array element such as 4\r\nMD17WBW.
Something like:
String fileName =
"C:\\Users\\beng\\eclipse-workspace\\Assignment Trailblazer\\Car Data";
Path path = Paths.get(fileName);
List<String> lines = Files.readAllLines(path); // Without line ending.
List<Car> cars = new ArrayList<>();
for (String line : lines) {
String[] data = line.split(",");
Car car = new Car(data);
cars.add(car);
}
Path, Paths and especially Files are very handy classes. With Java streams one can also abbreviate this:
String fileName =
"C:\\Users\\beng\\eclipse-workspace\\Assignment Trailblazer\\Car Data";
Path path = Paths.get(fileName);
List<Car> cars = Files.lines(path) // Stream<String>
.map(line -> line.split(",")) // Stream<String[]>
.map(Car::new) // Stream<Car>
.collect(Collectors.toList()); // List<Car>
Here .lines returns a Stream<String> (walking cursor) of lines in the file, without line separator.
Then .map(line -> line.split(",")) splits every line into a String[].
Then the Car(String[]) constructor is called on the string array.
Then the result is collected in a List.
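Both snippets assume a `Car(String[])` constructor, which the class in the question does not have (its constructor takes the whole array plus an offset). Here is a minimal sketch of such a constructor and of the per-line parsing, using an in-memory list of lines instead of the file. The `CarLine` class name and its field names are illustrative, not taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CarLine {
    final String registration;
    final String model;
    final String engineSize;
    final String colour;
    final String doors;

    // One line of the file, already split on commas: exactly five fields expected.
    CarLine(String[] data) {
        if (data.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields, got " + data.length);
        }
        registration = data[0].trim();
        model = data[1].trim();
        engineSize = data[2].trim();
        colour = data[3].trim();
        doors = data[4].trim();
    }

    // Turns each non-blank line into one CarLine.
    static List<CarLine> parse(List<String> lines) {
        List<CarLine> cars = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                cars.add(new CarLine(line.split(",")));
            }
        }
        return cars;
    }

    public static void main(String[] args) {
        List<CarLine> cars = parse(Arrays.asList(
                "AG53DBO,Mercedes,1000,(255:0:0),4",
                "MD17WBW,Volkswagen,2300,(0:0:255),5"));
        for (CarLine car : cars) {
            System.out.println(car.registration + " has " + car.doors + " doors");
        }
    }
}
```

Because every line is split on its own, the door count can no longer run into the next number plate.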
I am reading a file with disease names and their remedies. I want to save the name as the key and the remedies in a set as the value. How can I achieve that? There seem to be some problems in my code.
public static HashMap<String,Set<String>> disease = new HashMap <> ();
public static void main(String[] args) {
Scanner fin = null;
try {
fin = new Scanner (new File ("diseases.txt"));
while (fin.hasNextLine()) {
HashSet <String> remedies = null;
String [] parts = fin.nextLine().split(",");
int i = 1;
while (fin.hasNext()) {
remedies.add(parts[i].trim());
i++;
}
disease.put(parts[0],remedies);
}
fin.close();
}catch(Exception e) {
System.out.println("Error: " + e.getMessage());
}
finally {
try {fin.close();} catch(Exception e) {}
}
Set <String> result = disease.get("thrombosis");
display(result);
public static <T> void display (Set<T> items) {
if (items == null)
return;
int LEN = 80;
String line = "[";
for (T item:items) {
line+= item.toString() + ",";
if (line.length()> LEN) {
line = "";
}
}
System.out.println(line + "]");
}
here is my code
cancer,pain,swelling,bleeding,weight loss
gout,pain,swelling
hepatitis A,discoloration,malaise,tiredness
thrombosis,high heart rate
diabetes,frequent urination
and here is what the txt contains.
In your code you haven't initialized the remedies HashSet (that's why it throws a NullPointerException at line number 14).
The second issue: i is incremented by 1 on each pass, but you never check that it stays below the length of your parts array (i < parts.length).
I edited your code:
Scanner fin = null;
try {
fin = new Scanner(new File("diseases.txt"));
while (fin.hasNextLine()) {
HashSet<String> remedies = new HashSet<String>();
String[] parts = fin.nextLine().split(",");
int i = 1;
// bound the loop by the array only; fin.hasNext() is false on the last line and would skip its values
while (i < parts.length) {
remedies.add(parts[i].trim());
i++;
}
disease.put(parts[0], remedies);
}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Scanner;
import java.io.File;
import java.util.Set;
public class Solution {
public static HashMap<String, Set<String>> disease = new HashMap<>();
public static void main(String[] args) {
Scanner fin = null;
try {
fin = new Scanner (new File("diseases.txt"));
while (fin.hasNextLine()) {
HashSet <String> remedies = new HashSet<>();
String [] parts = fin.nextLine().split(",");
for (int i=1; i < parts.length; i++) {
remedies.add(parts[i].trim());
}
disease.put(parts[0],remedies);
}
fin.close();
}catch(Exception e) {
System.out.println("Error: " + e.getMessage());
}
finally {
try {fin.close();} catch(Exception e) {}
}
Set <String> result = disease.get("thrombosis");
display(result);
}
public static <T> void display(Set<T> items) {
if (items == null)
return;
int LEN = 80;
String line = "[";
for (T item : items) {
line += item.toString() + ",";
if (line.length() > LEN) {
line = "";
}
}
System.out.println(line + "]");
}
}
Here is the full working code. As @Pratik suggested, you forgot to initialize the HashSet, which is why the NullPointerException occurred.
You have a few issues here:
There is no need for the inner while loop (while (fin.hasNext()) {); use for (int i = 1; i < parts.length; i++) instead.
HashSet <String> remedies = null; means the set is not initialized, so items cannot be added to it. Change it to: HashSet<String> remedies = new HashSet<>();
It is better practice to close() the file in the finally part.
The 'display' method deletes the line (if it is longer than 80 characters) before printing it.
It is better to use a StringBuilder when appending strings.
So the corrected code would be:
import java.io.File;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
public class TestSOCode {
public static HashMap<String,Set<String>> disease = new HashMap<>();
private static int LINE_LENGTH = 80;
public static void main(String[] args) {
Scanner fin = null;
try {
fin = new Scanner(new File("diseases.txt"));
while (fin.hasNextLine()) {
HashSet<String> remedies = new HashSet<>();
String[] parts = fin.nextLine().split(",");
disease.put(parts[0], remedies);
for (int i = 1; i < parts.length; i++) {
remedies.add(parts[i].trim());
}
}
} catch (Exception e) {
System.out.println("Error: " + e.getMessage());
} finally {
try {
fin.close();
} catch (Exception e) {
System.out.println("Error when closing file: " + e.getMessage());
}
}
Set<String> result = disease.get("thrombosis");
display(result);
}
public static <T> void display (Set<T> items) {
if (items == null)
return;
StringBuilder line = new StringBuilder("[");
int currentLength = 1; // start from 1 because of the '[' char
for (T item:items) {
String itemStr = item.toString();
line.append(itemStr).append(",");
currentLength += itemStr.length() + 1; // itemStr length plus the ',' char
if (currentLength >= LINE_LENGTH) {
line.append("\n");
currentLength = 0;
}
}
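// replace the last ',' with ']', or just close the bracket when the set is empty
int last = line.lastIndexOf(",");
if (last >= 0) {
line.replace(last, last + 1, "]");
} else {
line.append("]");
}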
System.out.println(line.toString());
}
}
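A more compact variant of the same idea, as a sketch: try-with-resources closes the `Scanner` even when parsing fails, so the `finally` block and its null-prone `fin.close()` disappear. The example reads from an in-memory string so it runs without `diseases.txt`; `DiseaseParser` and `parse` are names made up for this example.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

final class DiseaseParser {
    // First field of each line is the key, the remaining fields go into the set.
    static Map<String, Set<String>> parse(Scanner fin) {
        Map<String, Set<String>> result = new HashMap<>();
        while (fin.hasNextLine()) {
            String[] parts = fin.nextLine().split(",");
            Set<String> remedies = new LinkedHashSet<>(); // keeps the order from the file
            for (int i = 1; i < parts.length; i++) {
                remedies.add(parts[i].trim());
            }
            result.put(parts[0].trim(), remedies);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "gout,pain,swelling\nthrombosis,high heart rate\n";
        // For the real file, use: new Scanner(new File("diseases.txt"))
        try (Scanner fin = new Scanner(sample)) {
            System.out.println(parse(fin).get("thrombosis"));
        }
    }
}
```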
I am still learning Java and have been trying to find a solution for my program for a few days, but I haven't gotten it fixed yet.
I have many text files (my program saves). The files look like this:
text (tab) number (tab) number (tab)...
text (tab) number (tab) number (tab)...
(tab) means that there is tabulation mark,
text means that is text (string),
number means that there is number (integer).
The number of files can be from 1 up to 32, with names like january1, january2, january3...
I need to read all of those files (ignore strings) and sum only numbers like so:
while ((line = br.readLine()) != null) {
counter=counter+1;
String[] info = line.split("\\s+");
for(int j = 2; j < 8; j++) {
int num = Integer.parseInt(info[j]);
data[j][counter]=data[j][counter]+num;
}
};
Simply put, I want to sum all those "tables" into an array of arrays (or a similar structure) and then display it as a table. If someone knows a solution or can link to a similar calculation, that would be awesome!
As I see it, you have four questions that need answering. This goes against the site etiquette of asking a single question, but I will give it a shot:
How to list a series of files, presumably using some kind of filter
How to read a file and process the data in some meaningful way
How to manage the data in data structure
Show the data in a JTable.
Listing files
Probably the simplest way to list files is to use File#listFiles and pass a FileFilter which meets your needs
File[] files = new File(".").listFiles(new FileFilter() {
@Override
public boolean accept(File pathname) {
// the files are named january1, january2, ...
return pathname.getName().toLowerCase().startsWith("january");
}
});
Now, I'd write a method which took a File object representing the directory you want to list and a FileFilter to use to search it...
public File[] listFiles(File dir, FileFilter filter) throws IOException {
if (dir.exists()) {
if (dir.isDirectory()) {
return dir.listFiles(filter);
} else {
throw new IOException(dir + " is not a valid directory");
}
} else {
throw new IOException(dir + " does not exist");
}
}
This way you could search for a number of different set of files based on different FileFilters.
Of course, you could also use the newer Paths/Files API to find files as well
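As a sketch of that newer API: `Files.newDirectoryStream` accepts a glob pattern, so no filter class is needed. The class name `JanuaryFiles` and the pattern `january*` are assumptions based on the file names in the question. The result is sorted lexicographically, so `january10` comes before `january2`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class JanuaryFiles {
    // Lists the regular files in dir whose names match the glob, sorted by path.
    static List<Path> list(Path dir, String glob) {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    found.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) {
        // Directory to search: first argument, or the working directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : list(dir, "january*")) {
            System.out.println(p.getFileName());
        }
    }
}
```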
Reading files...
Reading multiple files comes down to the same thing, reading a single file...
// BufferedReader has a nice readline method which makes
// it easier to read text with. You could use a Scanner
// but I prefer BufferedReader, but that's me...
try (BufferedReader br = new BufferedReader(new FileReader(new File("...")))) {
String line = null;
// Read each line
while ((line = br.readLine()) != null) {
// Split the line into individual parts, on the <tab> character
String parts[] = line.split("\t");
int sum = 0;
// Starting from the first number, sum the line...
for (int index = 1; index < parts.length; index++) {
sum += Integer.parseInt(parts[index].trim());
}
// Store the key/value pairs together some how
}
}
Now, we need some way to store the results of the calculations...
Have a look at Basic I/O for more details
Managing the data
Now, there are any number of ways you could do this, but since the amount of data is variable, you want a data structure that can grow dynamically.
My first thought would be to use a Map, but this assumes you want to combine rows with the same name. Otherwise you should just use a List within a List, where the outer List represents the rows and the inner List represents the column values...
Map<String, Integer> data = new HashMap<>(25);
File[] files = listFiles(someDir, januraryFilter);
for (File file : files) {
readData(file, data);
}
Where readData is basically the code from before
protected void readData(File file, Map<String, Integer> data) throws IOException {
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String line = null;
// Read each line
while ((line = br.readLine()) != null) {
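// Split the line on the <tab> character and sum the numbers, as before
String parts[] = line.split("\t");
int sum = 0;
for (int index = 1; index < parts.length; index++) {
sum += Integer.parseInt(parts[index].trim());
}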
// Store the key/value pairs together some how
String name = parts[0];
if (data.containsKey(name)) {
int previous = data.get(name);
sum += previous;
}
data.put(name, sum);
}
}
}
Have a look at the Collections Trail for more details
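The containsKey/get/put sequence above can also be written with `Map.merge`, which inserts the value when the key is new and otherwise combines it with the existing one. A minimal sketch, assuming tab-separated lines with the name in the first column; `SumByName` and `accumulate` are illustrative names, and the lines are passed in as a list to keep the example independent of any files.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SumByName {
    // Adds the numbers on each tab-separated line to the running total for its name.
    static Map<String, Integer> accumulate(List<String> lines, Map<String, Integer> data) {
        for (String line : lines) {
            String[] parts = line.split("\t");
            int sum = 0;
            for (int i = 1; i < parts.length; i++) {
                sum += Integer.parseInt(parts[i].trim());
            }
            data.merge(parts[0], sum, Integer::sum); // insert, or add to the existing total
        }
        return data;
    }

    public static void main(String[] args) {
        Map<String, Integer> data = new TreeMap<>();
        accumulate(Arrays.asList("apples\t1\t2", "pears\t5"), data); // e.g. lines of january1
        accumulate(Arrays.asList("apples\t10"), data);               // e.g. lines of january2
        System.out.println(data);
    }
}
```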
Showing the data
And finally, we need to show the data. You could simply use a DefaultTableModel, but you already have the data in structure, why not re-use it with a custom TableModel
public class SummaryTableModel extends AbstractTableModel {
private Map<String, Integer> data;
private List<String> keyMap;
public SummaryTableModel(Map<String, Integer> data) {
this.data = new HashMap<>(data);
keyMap = new ArrayList<>(data.keySet());
}
@Override
public int getRowCount() {
return data.size();
}
@Override
public int getColumnCount() {
return 2;
}
@Override
public Class<?> getColumnClass(int columnIndex) {
Class<?> type = Object.class;
switch (columnIndex) {
case 0:
type = String.class;
break;
case 1:
type = Integer.class;
break;
}
return type;
}
@Override
public Object getValueAt(int rowIndex, int columnIndex) {
Object value = null;
switch (columnIndex) {
case 0:
value = keyMap.get(rowIndex);
break;
case 1:
String key = keyMap.get(rowIndex);
value = data.get(key);
break;
}
return value;
}
}
Then you would simply apply it to a JTable...
add(new JScrollPane(new JTable(new SummaryTableModel(data))));
Take a look at How to Use Tables for more details
Conclusion
There are a lot of assumptions that have to be made which are missing from the context of the question; does the order of the files matter? Do you care about duplicate entries?
So it becomes near impossible to provide a single "answer" which will solve all of your problems
I took all the january1 january2... files from the location and used your same function to calculate the value to be stored.
Then I created a table with two headers, Day and Number. Then just added rows according to the values generated.
DefaultTableModel model = new DefaultTableModel();
JTable table = new JTable(model);
String line;
model.addColumn("Day");
model.addColumn("Number");
BufferedReader br = null;
model.addRow(new Object[]{"a","b"});
for(int i = 1; i < 32; i++)
{
try {
String sCurrentLine;
String filename = "january"+i;
br = new BufferedReader(new FileReader("C:\\january"+i+".txt"));
int counter = 0;
while ((sCurrentLine = br.readLine()) != null) {
counter=counter+1;
String[] info = sCurrentLine.split("\\s+");
int sum = 0;
for(int j = 2; j < 8; j++) {
int num = Integer.parseInt(info[j]);
sum += num;
}
model.addRow(new Object[]{filename, sum+""});
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (br != null)br.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
JFrame f = new JFrame();
f.setSize(300, 300);
f.add(new JScrollPane(table));
f.setVisible(true);
Use a labeled loop and try-catch. The piece below adds up all the numbers in a line.
You could get some hint from here:
String line = "text\t1\t2\t3\t4\tdel"; // tab-separated, so that split("\t") finds the fields
String splitLine[] = line.split("\t");
int sumLine = 0;
int i = 0;
contSum: for (; i < splitLine.length; i++) {
try {
sumLine += Integer.parseInt(splitLine[i]);
} catch (Exception e) {
continue contSum;
}
}
System.out.println(sumLine);
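The label is not strictly required, since an empty catch block already moves on to the next token. The same idea as a small reusable method, as a sketch: it splits on any whitespace (as the question's code does) and catches only `NumberFormatException`, so other failures are not hidden. `LineSum` and `sumNumbers` are hypothetical names.

```java
final class LineSum {
    // Sums every token that parses as an int and ignores the rest (such as the leading text).
    static int sumNumbers(String line) {
        int sum = 0;
        for (String token : line.trim().split("\\s+")) {
            try {
                sum += Integer.parseInt(token);
            } catch (NumberFormatException e) {
                // not a number: skip this token
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumNumbers("text\t1\t2\t3\t4\tdel"));
    }
}
```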
Here is another example using Vectors. In this example, directories are searched for ".txt" files and the results are added to a JTable.
The doIt method takes the folder where your text files are located.
It then looks for files in subfolders recursively.
Each file found is split and summed following your example file.
public class FileFolderReader
{
private Vector<Vector> rows = new Vector<Vector>();
public static void main(String[] args)
{
FileFolderReader fileFolderReader = new FileFolderReader();
fileFolderReader.doIt("D:\\folderoffiles");
}
private void doIt(String path)
{
System.out.println(findFile(new File(path)) + " in total");
JFrame frame = new JFrame();
frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
Vector<String> columnNames = new Vector<String>();
columnNames.addElement("File Name");
columnNames.addElement("Size");
JTable table = new JTable(rows, columnNames);
JScrollPane scrollPane = new JScrollPane(table);
frame.add(scrollPane, BorderLayout.CENTER);
frame.setSize(300, 150);
frame.setVisible(true);
}
private int findFile(File file)
{
int totalPerFile = 0;
int total = 0;
File[] list = file.listFiles(new FilenameFilter()
{
public boolean accept(File dir, String fileName)
{
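// also accept directories, otherwise the recursion below never runs
return fileName.endsWith(".txt") || new File(dir, fileName).isDirectory();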
}
});
if (list != null)
for (File textFile : list)
{
if (textFile.isDirectory())
{
total = total + findFile(textFile); // add the subfolder's sum instead of overwriting the total
}
else
{
totalPerFile = scanFile(textFile);
System.out.println(totalPerFile + " in " + textFile.getName());
Vector<String> rowItem = new Vector<String>();
rowItem.addElement(textFile.getName());
rowItem.addElement(Integer.toString(totalPerFile));
rows.addElement(rowItem);
total = total + totalPerFile;
}
}
return total;
}
public int scanFile(File file)
{
int sum = 0;
Scanner scanner = null;
try
{
scanner = new Scanner(file);
while (scanner.hasNextLine())
{
String line = scanner.nextLine();
String[] info = line.split("\\s+");
int count = 1;
for (String stingInt : info)
{
if (count != 1)
{
sum = sum + Integer.parseInt(stingInt);
}
count++;
}
}
scanner.close();
}
catch (FileNotFoundException e)
{
// you will need to handle this
// don't do this !
e.printStackTrace();
}
return sum;
}
}
I have written code to find the broken links on a website using Selenium WebDriver in Java. Links are added to a HashSet as different URLs are visited. When I read the added URLs from the HashSet, execution stops after some time. This happens because the iterator stays as it was even when new links are added to the HashSet. I want execution to continue for all links present in the HashSet.
[I have tried converting the Set to an array, but then duplicate links are processed multiple times.]
public Set<String> unique_links;
HashMap<String, String> result;
Set<String> finalLinkSet = new HashSet<>();
Set<String> hs = new HashSet<>();
Set<String> uniqueLinkSet = new HashSet<>();
// String[] finalLinkArray;
String[] finalLinkArray;
boolean isValid = false;
FileWriter fstream;
BufferedWriter out;
int count = 1;
int FC = 0;
Set<String> secondaryset = new HashSet<>();
// String Responsecode = null;
@Test
public void LinkTesting() throws IOException, RowsExceededException,
WriteException {
w.manage().deleteAllCookies();
unique_links = new HashSet<String>();
w.get("http://www.skyscape.com");
ArrayList<WebElement> urlList = new ArrayList<WebElement>();
urlList = (ArrayList<WebElement>) w.findElements(By.tagName("a"));
setFinalLinkSet(getUniqueList(urlList));
for(Iterator<String> i = finalLinkSet.iterator(); i.hasNext(); ) {
System.out.println(finalLinkSet.size());
String currenturl = (String) i.next();
if ((currenturl.length() > 0 && currenturl
.startsWith("http://www.skyscape.com"))) {
if (!currenturl.startsWith("http://www.skyscape.com/estore/")&&
(!currenturl.startsWith("http://www.skyscape.com/demos/"))) {
System.out.println(currenturl);
getResponseCode(currenturl);
}
}
}
writetoexcel();
}
public void setFinalLinkSet(Set<String> finalLinkSet) {
this.finalLinkSet = finalLinkSet;
}
// function to get link from page and return array list of links
public Set<String> getLinksOnPage(String url) {
ArrayList<WebElement> secondaryUrl = new ArrayList<WebElement>();
secondaryUrl = (ArrayList<WebElement>) w.findElements(By.tagName("a"));
for (int i = 0; i < secondaryUrl.size(); i++) {
secondaryset.add((secondaryUrl.get(i).getAttribute("href")
.toString()));
}
return secondaryset;
}
// function to fetch link from array list and store unique links in hashset
public Set<String> getUniqueList(ArrayList<WebElement> url_list) {
for (int i = 0; i < url_list.size(); i++) {
uniqueLinkSet.add(url_list.get(i).getAttribute("href").toString());
}
return uniqueLinkSet;
}
public boolean getResponseCode(String url) {
boolean isValid = false;
if (result == null) {
result = new HashMap<String, String>();
}
try {
URL u = new URL(url);
w.navigate().to(url);
HttpURLConnection h = (HttpURLConnection) u.openConnection();
h.setRequestMethod("GET");
h.connect();
System.out.println(h.getResponseCode());
if ((h.getResponseCode() != 500) && (h.getResponseCode() != 404)
&& (h.getResponseCode() != 403)
&& (h.getResponseCode() != 402)
&& (h.getResponseCode() != 400)
&& (h.getResponseCode() != 401)) {
// && (h.getResponseCode() != 302)) {
//getLinksOnPage(url);
Set<String> unique2 = getLinksOnPage(url);
setFinalLinkSet(unique2);
result.put(url.toString(), "" + h.getResponseCode());
} else {
result.put(url.toString(), "" + h.getResponseCode());
FC++;
}
return isValid;
} catch (Exception e) {
}
return isValid;
}
private void writetoexcel() throws IOException, RowsExceededException,
WriteException {
FileOutputStream fo = new FileOutputStream("OldLinks.xls");
WritableWorkbook wwb = Workbook.createWorkbook(fo);
WritableSheet ws = wwb.createSheet("Links", 0);
int recordsToPrint = result.size();
Label HeaderUrl = new Label(0, 0, "Urls");
ws.addCell(HeaderUrl);
Label HeaderCode = new Label(1, 0, "Response Code");
ws.addCell(HeaderCode);
Label HeaderStatus = new Label(2, 0, "Status");
ws.addCell(HeaderStatus);
Iterator<Entry<String, String>> it = result.entrySet().iterator();
while (it.hasNext() && count < recordsToPrint) {
String Responsecode = null;
Map.Entry<String, String> pairs = it.next();
System.out.println("Value is --" + pairs.getKey() + " - "
+ pairs.getValue() + "\n");
Label Urllink = new Label(0, count, pairs.getKey());
Label RespCode = new Label(1, count, pairs.getValue());
Responsecode = pairs.getValue();
System.out.println(Responsecode);
if ((Responsecode.equals("500")) || (Responsecode.equals("404"))
|| (Responsecode.equals("403"))
|| (Responsecode.equals("400"))
|| (Responsecode.equals("402"))
|| (Responsecode.equals("401"))) {
// || (Responsecode.equals("302"))) {
Label Status1 = new Label(2, count, "Fail");
ws.addCell(Status1);
} else {
Label Status2 = new Label(2, count, "Pass");
ws.addCell(Status2);
}
try {
ws.addCell(Urllink);
} catch (RowsExceededException e) {
e.printStackTrace();
} catch (WriteException e) {
e.printStackTrace();
}
ws.addCell(RespCode);
count++;
}
Label FCS = new Label(4, 1, "Fail Urls Count is = " + FC);
ws.addCell(FCS);
wwb.write();
wwb.close();
}
}
In short, as far as I understand the problem: You have (at least) two threads (although I couldn't find them in the too long code example), one is adding entries to the HashSet, and the other should continuously list elements as they are added to the HashSet.
1st: You should use a concurrent data structure for this, but not a simple HashSet.
2nd: Iterators of HashSet do not support concurrent modification, so you cannot have an iterator "waiting" for new entries to be added.
Best is to change your code to use some kind of event-message pattern (sometimes also called broadcaster/listener), where the finding of a new URL generates an event, that other parts of your code listen to and then write them to the file.
Your loop finishes (earlier than desired) for the following reasons:
The initialization part Iterator<String> i = finalLinkSet.iterator() of your for-loop
for(Iterator<String> i = finalLinkSet.iterator(); i.hasNext(); ) {
is evaluated once when the loop is started. Hence it will not react to changes to finalLinkSet even if there were some.
You are not making any changes to finalLinkSet. Instead you are overwriting it with a new set when calling
setFinalLinkSet(unique2);
So instead you should:
Use a list, so you have ordered elements. (Adding entries to an unordered set makes it impossible to know which ones you have already iterated over.) I therefore suggest an ArrayList<String>, which gives constant access time at the small cost of occasional resizing when new entries are added.
Modify your for-loop to use an index, so that evaluating the init part once is sufficient and the loop reacts to the changing size of the list:
for(int i = 0; i < finalLinkList.size(); i++) {
System.out.println(finalLinkList.size());
String currenturl = finalLinkList.get(i);
Then instead of overwriting the list you should:
// for both occurrences
addToFinalLinkList(...); // see new code below
and
public void addToFinalLinkList(Set<String> tempSet) {
for(String url: tempSet)
{
if(!finalLinkList.contains(url))
finalLinkList.add(url);
}
}
I know this is not best from the performance point of view, but since you are inside a test, this shouldn't be a problem from what I see...
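If the `contains` scan does become noticeable, one common alternative is a work queue plus a "seen" set: the queue holds URLs still to visit, and the set rejects duplicates in constant time. The sketch below replaces the Selenium lookup with a plain map from page to links so that it runs on its own. `LinkQueueDemo`, `visitAll` and the sample paths are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LinkQueueDemo {
    // 'linksOnPage' stands in for the Selenium lookup: page URL -> links found on it.
    static List<String> visitAll(String start, Map<String, List<String>> linksOnPage) {
        Deque<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();
        pending.add(start);
        seen.add(start);
        while (!pending.isEmpty()) {
            String url = pending.poll();
            visited.add(url); // the real code would check the response code here
            for (String link : linksOnPage.getOrDefault(url, Collections.emptyList())) {
                if (seen.add(link)) { // add() returns false for duplicates
                    pending.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = new HashMap<>();
        site.put("/", Arrays.asList("/a", "/b"));
        site.put("/a", Arrays.asList("/b", "/"));
        site.put("/b", Arrays.asList("/c"));
        System.out.println(visitAll("/", site));
    }
}
```

Every URL is visited exactly once, and links discovered while the loop is running are still processed because the loop condition looks at the queue rather than at an iterator.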