Java - Text File - Reading between certain Strings

I have been trying to figure out the following problem for the last 20 hours, so before I start thinking about jumping out of the window ;-), I'd better ask here for help:
I have a text file with the following content:
ID
1
Title
Men and mice
Content
Lenny loves kittens
ID
2
Title
Here is now only the Title of a Book
ID
3
Content
Here is now only the Content of a Book
The problem, as you can see, is that an ID is followed by both a title and content, by only a title, or by only content.
I want to create text files which contain an ID value (for example 1) and the corresponding title value and/or content value.
The best I achieved was three lists: one with ID values, one with title values and one with content values. But that is actually useless, because the association between ID, title and content is lost.
I would really appreciate your help.

So you want to populate a collection of a class with three fields.
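import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
class Data {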
int id;
String title;
String content;
// helper method to read a file and return a list.
public static List<Data> readAll(String filename) throws IOException {
// List we will return.
List<Data> ret = new ArrayList<Data>();
// last value we added.
Data last = null;
// Open a file as text so we can read the lines.
// use try-with-resources so the file is closed when we are done.
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
// declare a String and use it in a loop.
// read line and stop when we get a null
for (String line; (line = br.readLine()) != null; ) {
// look at the heading.
switch (line) {
case "ID":
// assume ID is always first
ret.add(last = new Data());
// read the next line and parse it as an integer
last.id = Integer.parseInt(br.readLine());
break;
case "Title":
// read the next line and save it as a title
last.title = br.readLine();
break;
case "Content":
// read the next line and save it as a content
last.content = br.readLine();
break;
}
}
}
return ret;
}
}
Note: the only field which matters is ID. Content and Title are optional.
To get from 20 hours down to 5 minutes, you need to practice, a lot.
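The question also asks for one text file per ID. Here is a minimal sketch of that last step, assuming one output file per record named `<id>.txt` that contains "Heading: value" lines. Both the file naming and the layout are assumptions, not something the question specifies. `RecordFiles`, `Rec`, `parse` and `writeAll` are hypothetical names. A missing title or content simply stays `null` and is skipped when writing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class RecordFiles {
    // One record: the ID is required, title and content stay null when absent.
    static final class Rec {
        int id;
        String title;
        String content;
    }

    // Walks the heading/value line pairs; every "ID" heading starts a new record.
    static List<Rec> parse(List<String> lines) {
        List<Rec> recs = new ArrayList<>();
        Rec cur = null;
        for (int i = 0; i + 1 < lines.size(); i++) {
            String heading = lines.get(i).trim();
            String value = lines.get(i + 1);
            if (heading.equals("ID")) {
                cur = new Rec();
                cur.id = Integer.parseInt(value.trim());
                recs.add(cur);
                i++; // skip the value line we just consumed
            } else if (cur != null && heading.equals("Title")) {
                cur.title = value;
                i++;
            } else if (cur != null && heading.equals("Content")) {
                cur.content = value;
                i++;
            }
            // any other line is ignored
        }
        return recs;
    }

    // Writes one file per record into dir, named after its ID (e.g. "1.txt").
    static void writeAll(List<Rec> recs, Path dir) throws IOException {
        for (Rec r : recs) {
            List<String> out = new ArrayList<>();
            out.add("ID: " + r.id);
            if (r.title != null) out.add("Title: " + r.title);
            if (r.content != null) out.add("Content: " + r.content);
            Files.write(dir.resolve(r.id + ".txt"), out, StandardCharsets.UTF_8);
        }
    }
}
```

Usage would look like `RecordFiles.writeAll(RecordFiles.parse(Files.readAllLines(Paths.get("books.txt"))), Paths.get("out"))`, where `books.txt` and `out` are placeholder paths.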

You can keep "the information between id, content and title" in your program if you create a Book class and then have a list of Book instances.
Book class:
public class Book {
private int id;
private String title;
private String content;
//...
//getters and setters
}
List of books:
private List<Book> books = new ArrayList<Book>();


Extract word document comments and the text they comment on

I need to extract Word document comments and the text they comment on. Below is my current solution, but it is not working as expected:
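import com.aspose.words.*;
import static com.aspose.words.NodeType.*;
import java.util.ArrayList;
import java.util.List;
public class Main {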
public static void main(String[] args) throws Exception {
var document = new Document("sample.docx");
NodeCollection<Paragraph> paragraphs = document.getChildNodes(PARAGRAPH, true);
List<MyComment> myComments = new ArrayList<>();
for (Paragraph paragraph : paragraphs) {
var comments = getComments(paragraph);
int commentIndex = 0;
if (comments.isEmpty()) continue;
for (Run run : paragraph.getRuns()) {
var runText = run.getText();
for (int i = commentIndex; i < comments.size(); i++) {
Comment comment = comments.get(i);
String commentText = comment.getText();
if (paragraph.getText().contains(runText + commentText)) {
myComments.add(new MyComment(runText, commentText));
commentIndex++;
break;
}
}
}
}
myComments.forEach(System.out::println);
}
private static List<Comment> getComments(Paragraph paragraph) {
@SuppressWarnings("unchecked")
NodeCollection<Comment> comments = paragraph.getChildNodes(COMMENT, false);
List<Comment> commentList = new ArrayList<>();
comments.forEach(commentList::add);
return commentList;
}
static class MyComment {
String text;
String commentText;
public MyComment(String text, String commentText) {
this.text = text;
this.commentText = commentText;
}
@Override
public String toString() {
return text + "-->" + commentText;
}
}
}
sample.docx contents are:
And the output is (which is incorrect):
factors-->This is word comment
%–10% of cancers are caused by inherited genetic defects from a person's parents.-->Second paragraph comment
Expected output is:
factors-->This is word comment
These factors act, at least partly, by changing the genes of a cell. Typically, many genetic changes are required before cancer develops. Approximately 5%–10% of cancers are caused by inherited genetic defects from a person's parents.-->Second paragraph comment
These factors act, at least partly, by changing the genes of a cell. Typically, many genetic changes are required before cancer develops. Approximately 5%–10% of cancers are caused by inherited genetic defects from a person's parents.-->First paragraph comment
Please help me find a better way of extracting Word document comments and the text they comment on. If you need additional details, let me know and I will provide them.
The commented text is marked by the special nodes CommentRangeStart and CommentRangeEnd. Both have an Id that corresponds to the id of the Comment the range is linked to, so you need to extract the content between the corresponding start and end nodes.
By the way, the code example in the Aspose.Words API reference shows how to print the contents of all comments and their comment ranges using a document visitor. It looks like exactly what you are looking for.
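To see why matching start and end nodes by id works where comparing run text does not, here is a library-free sketch of the idea. It walks the nodes in document order. When a range start with a given id appears, it starts collecting text for that id, and it stops at the range end with the same id. The `Node` class below is a hypothetical, simplified stand-in for Aspose's CommentRangeStart, CommentRangeEnd and Run nodes; it is not the Aspose API. Every open range receives each run's text, so ranges that overlap or nest are handled. An example is two comments attached to the same sentence.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CommentRanges {
    static final int START = 0, END = 1, TEXT = 2;

    // Hypothetical flattened node: a range start/end carrying a comment id, or a run of text.
    static final class Node {
        final int kind;
        final int id;      // comment id for START/END, unused for TEXT
        final String text; // run text for TEXT, unused otherwise
        Node(int kind, int id, String text) { this.kind = kind; this.id = id; this.text = text; }
        static Node start(int id) { return new Node(START, id, null); }
        static Node end(int id) { return new Node(END, id, null); }
        static Node text(String t) { return new Node(TEXT, -1, t); }
    }

    // Returns the commented text keyed by comment id, in the order the ranges close.
    static Map<Integer, String> extract(List<Node> nodes) {
        Map<Integer, StringBuilder> open = new LinkedHashMap<>();
        Map<Integer, String> result = new LinkedHashMap<>();
        for (Node n : nodes) {
            switch (n.kind) {
                case START:
                    open.put(n.id, new StringBuilder());
                    break;
                case END:
                    StringBuilder sb = open.remove(n.id);
                    if (sb != null) result.put(n.id, sb.toString());
                    break;
                default:
                    // a run belongs to every range that is currently open
                    for (StringBuilder b : open.values()) b.append(n.text);
            }
        }
        return result;
    }
}
```

With Aspose.Words itself, the same walk would go from the real CommentRangeStart to the matching CommentRangeEnd. The GitHub example referenced below shows one way to extract that content.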
EDIT: You can use code like the following to accomplish your task. I did not provide the full code for extracting content between nodes; it is available on GitHub:
Document doc = new Document("C:\\Temp\\in.docx");
// Get the comments in the document.
Iterable<Comment> comments = doc.getChildNodes(NodeType.COMMENT, true);
Iterable<CommentRangeStart> commentRangeStarts = doc.getChildNodes(NodeType.COMMENT_RANGE_START, true);
Iterable<CommentRangeEnd> commentRangeEnds = doc.getChildNodes(NodeType.COMMENT_RANGE_END, true);
for (Comment c : comments)
{
System.out.println(String.format("Comment %d : %s", c.getId(), c.toString(SaveFormat.TEXT)));
CommentRangeStart start = null;
CommentRangeEnd end = null;
// Search for an appropriate start and end.
for (CommentRangeStart s : commentRangeStarts)
{
if (c.getId() == s.getId())
{
start = s;
break;
}
}
for (CommentRangeEnd e : commentRangeEnds)
{
if (c.getId() == e.getId())
{
end = e;
break;
}
}
if (start != null && end != null)
{
// Extract content between the start and end nodes.
// Code example how to extract content between nodes is here
// https://github.com/aspose-words/Aspose.Words-for-Java/blob/master/Examples/src/main/java/com/aspose/words/examples/programming_documents/document/ExtractContentBetweenCommentRange.java
}
else
{
System.out.println(String.format("Comment %d does not have a comment range", c.getId()));
}
}

A Very Strange StringIndexOutOfBoundsException

Before asking this question, I spent around half an hour on Google, but since I didn't find a solution I thought I should ask here.
Basically, I'm using a Java Reader to read a text file, converting each line of information into an object that I called Nation (with a constructor, of course), and making an array out of all those objects.
The problem is that a single line in my text file is 75 characters long, but I get an error telling me that the length is only 68! Here's the part of the code where I read information from the file:
static int lireRemplir (String nomFichier, Nation[] nations)
throws IOException
{
boolean existeFichier = true;
int n =0;
FileReader fr = null;
try {
fr = new FileReader(nomFichier);
}
catch (java.io.FileNotFoundException erreur) {
System.out.println("Problem opening file " + nomFichier);
existeFichier = false;
}
if (existeFichier) {
BufferedReader entree = new BufferedReader(fr);
boolean finFichier = false;
while (!finFichier) {
String uneLigne = entree.readLine();
if (uneLigne == null) {
finFichier=true;
}
else {
nations[n] = new Nation(uneLigne.charAt(0),uneLigne.substring(55,63),
uneLigne.substring(64,74),uneLigne.substring(1,15),uneLigne.substring(36,54));
n++;
}
}
entree.close();
}
return n;
}
The error I get is:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException:
begin 64, end 74, length 68
Since I'm new here, I tried to post an image of my text but couldn't, so I'll just hand-write an example:
2ETATS-UNIS WASHINGTON 9629047 291289535
4CHINE PEKIN 9596960 1273111290
3JAPON KYOTO 377835 12761000
There is a lot of space between the words; it's laid out like an array!
If I change the 74 to 68, I get a result when I print my array, but some of the information is missing.
Here's my constructor:
public Nation(char codeContinent, String superficie, String population, String nom, String capitale) {
this.codeContinent = codeContinent;
this.superficie = superficie;
this.population = population;
this.nom = nom;
this.capitale = capitale;
}
I hope you can help me with this! If you need to know more about my code, let me know. Thank you very much.
To avoid runtime exceptions, you need to be careful with your code. When you are dealing with indexes into a String or an array, check that the length of the String is at least as large as the largest index you are using (the end index of substring() is exclusive, so substring(64,74) needs a length of at least 74). Enclose your code that is throwing the exception within:
if(uneLigne.length() >= 74) {
nations[n] = new Nation(uneLigne.charAt(0),uneLigne.substring(55,63),
uneLigne.substring(64,74),uneLigne.substring(1,15),uneLigne.substring(36,54));
} else {
//your logic to handle a line with fewer than 74 characters
}
This ensures your code does not break even if a line is shorter than expected.
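If you would rather keep the fixed column positions but never throw, one option is a small clamping helper. This is a sketch, assuming the column positions from the question are correct. `FixedWidth` and `field` are hypothetical names, and trimming the padding spaces is my own choice, so the values differ slightly from raw `substring()` results.

```java
final class FixedWidth {
    // Like String.substring(begin, end), but clamps both indexes to the line length,
    // so a short line gives a shorter (possibly empty) field instead of throwing
    // StringIndexOutOfBoundsException. The result is trimmed because fixed-width
    // columns are padded with spaces.
    static String field(String line, int begin, int end) {
        int len = line.length();
        int b = Math.min(Math.max(begin, 0), len);
        int e = Math.min(Math.max(end, b), len);
        return line.substring(b, e).trim();
    }
}
```

In the question's loop this would read `new Nation(uneLigne.charAt(0), FixedWidth.field(uneLigne, 55, 63), FixedWidth.field(uneLigne, 64, 74), FixedWidth.field(uneLigne, 1, 15), FixedWidth.field(uneLigne, 36, 54))`.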
______________________________________________________________________________
Another approach
Adding the comment as an answer:
The other way would be to use the split() method of the String class, or the StringTokenizer class, to get the array/tokens if the line is delimited by spaces or some other character. That way you don't need to break up the string with substring(), where you have to worry about the lengths and possible runtime exceptions.
Check the code snippet below, which uses the split() method; you would do something like this for each line you read from the file:
Nation nation = null;
String uneLigne = "2ETATS-UNIS WASHINGTON 9629047 291289535";
// split on runs of whitespace, because the columns are padded with several spaces
String[] strArray = uneLigne.trim().split("\\s+");
if(strArray.length >= 4) {
// the first token holds the continent code followed by the country name, e.g. "2ETATS-UNIS"
char codeContinent = strArray[0].charAt(0);
String nom = strArray[0].substring(1);
// constructor order: codeContinent, superficie, population, nom, capitale
nation = new Nation(codeContinent, strArray[2], strArray[3], nom, strArray[1]);
}
// note: this assumes country names and capitals contain no spaces
The simplest way to solve your problem is to replace the 74 with uneLigne.length(). Note that this still throws if a line is shorter than 64 characters.
Here's the new code:
nations[n] = new Nation(uneLigne.charAt(0),uneLigne.substring(55,63),
uneLigne.substring(64,uneLigne.length()),uneLigne.substring(1,15),uneLigne.substring(36,54));

How can I read and search from file in jTable?

What I want this code to do: when the search button is clicked, it reads a file, matches the search values against the data in the file, and shows the search results in the jTable.
Problems I am facing: if the GPA A+ is selected, it shows both A+ and A-, and when I press the search button again after entering another search value, the table just adds more data to it.
Solutions needed: I want to just read the file and show only the results in the jTable, without adding the results again and again. The search button should search in the GPA and Class columns only, and when a GPA of "A/B/C" plus "+" or "-" is selected, the search result should contain only the rows with that particular GPA.
NOTE: I don't want to change the search options.
I'm a total newbie in Java, so any kind of help would be appreciated! :)
Screenshot of the UI
private void srchBtnActionPerformed(java.awt.event.ActionEvent evt) {
//file read
String filepath = "E:\\Netbeans workspace\\modified\\Project\\Info.txt";
File file = new File(filepath);
try {
BufferedReader br = new BufferedReader(new FileReader(file));
model = (DefaultTableModel)jTable1.getModel();
Object[] tableLines = br.lines().toArray();
for (int i = 0; i < tableLines.length; i++){
String line = tableLines[i].toString().trim();
String[] dataRow = line.split("/");
model.addRow(dataRow);
}
} catch (Exception ex) {
Logger.getLogger(ReceiverF.class.getName()).log(Level.SEVERE, null, ex);
}
//search from file
String bGroupSrch = (String) jComboBoxBGroup.getSelectedItem();
if(positiveRBtn.isSelected())
bGroupSrch = bGroupSrch + "+";
else if(negativeRBtn.isSelected())
bGroupSrch = bGroupSrch + "-";
String areaSrch = (String)jComboBoxArea.getSelectedItem();
if (bgGroup.getSelection() != null) {
filter(bGroupSrch);
filter(areaSrch);
} else {
SrchEMsg sem = new SrchEMsg(this);
sem.setVisible(true);
sem.setDefaultCloseOperation(JDialog.DISPOSE_ON_CLOSE);
}
}
//Filter Method
private void filter(String query){
TableRowSorter<DefaultTableModel> tr= new TableRowSorter<DefaultTableModel>(model);
jTable1.setRowSorter(tr);
tr.setRowFilter(RowFilter.regexFilter(query));
}
the table just adds more data in it.
When you start the search, call:
model.setRowCount(0);
to clear the data in the table's model.
Or, the easier solution is to NOT reload the data every time. Instead, just change the filter that is used by the table.
Read the section from the Swing tutorial on Sorting and Filtering. The code there replaces the filter every time a character is typed.
Your code will change the filter when the search option is changed.
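Two details in the question's code also explain the A+/A- symptom. First, RowFilter.regexFilter treats its argument as a regular expression, so "A+" means "one or more A" and also matches "A-". Second, each filter(...) call installs a new TableRowSorter, so the second call discards the first filter. Below is a minimal sketch that quotes the search value, anchors it to the whole cell, and combines the two columns with RowFilter.andFilter. `SearchFilter` and `exactMatch` are hypothetical names. The column indices are assumptions, since the question does not show the table layout.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import javax.swing.RowFilter;
import javax.swing.table.DefaultTableModel;

final class SearchFilter {
    // Requires an exact match in both columns. Pattern.quote stops "+" in "A+"
    // from acting as a regex quantifier, and the ^...$ anchors stop "A" from
    // also matching "A+" or "A-".
    static RowFilter<DefaultTableModel, Integer> exactMatch(
            String gpa, int gpaColumn, String cls, int classColumn) {
        RowFilter<DefaultTableModel, Integer> gpaFilter =
                RowFilter.regexFilter("^" + Pattern.quote(gpa) + "$", gpaColumn);
        RowFilter<DefaultTableModel, Integer> classFilter =
                RowFilter.regexFilter("^" + Pattern.quote(cls) + "$", classColumn);
        // both conditions must hold for a row to stay visible
        return RowFilter.andFilter(Arrays.asList(gpaFilter, classFilter));
    }
}
```

In srchBtnActionPerformed, the two filter(...) calls would then become a single `sorter.setRowFilter(SearchFilter.exactMatch(bGroupSrch, gpaColumn, areaSrch, classColumn))` on one sorter created once. Replace `gpaColumn` and `classColumn` with your table's actual column indices.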

Java string array doesn't print correctly

I am currently working on a Java program that crawls a webpage and prints out some information from it.
There is one part that I can't figure out: when I try to print out one specific String array with some information in it, all it gives me is " ] " for that line. However, a few lines before, I print out another String array in exactly the same way and it prints out fine. When I test what is actually being passed to the "categories" variable, it's the correct information and can be printed out there.
public class Crawler {
private Document htmlDocument;
String [] keywords, categories;
public void printData(String urlToCrawl)
{
nextURL=urlToCrawl;
crawl();
//This does what it's supposed to do. (Print Statement 1)
System.out.print("Keywords: ");
for (String i :keywords) {System.out.print(i+", ");}
//This doesn't. (Print Statement 2)
System.out.print("Categories: ");
for (String b :categories) {System.out.print(b+", ");}
}
public void crawl()
{
//Gather Data
//open up JSOUP for HTTP parsing.
Connection connection = Jsoup.connect(nextURL).userAgent(USER_AGENT);
Document htmlDocument = connection.get();
this.htmlDocument=htmlDocument;
System.out.println("Received Webpage "+ nextURL);
int guacCounter = 0;
for(Element guac : htmlDocument.select("script"))
{
if(guacCounter==5)
{
//String concentratedGuac = guac.toString();
String[] items = guac.toString().split("\\n");
categories = processGuac(items);
break;
}
else if(guacCounter<5) {
guacCounter++;
}
}
}
public String[] processKeywords(String totalKeywords)
{
String [] separatedKeywords = totalKeywords.split(",");
//System.out.println(separatedKeywords.toString());
return separatedKeywords;
}
public String[] processGuac(String[] inputGuac)
{
int categoryIsOnLine = 6;
String categoryData = inputGuac[categoryIsOnLine-1];
categoryData = categoryData.replace(",","");
categoryData = categoryData.replace("'","");
categoryData = categoryData.replace("|",",");
categoryData = categoryData.split(":")[1];
//this prints out the list of categories in string form.(Print Statement 3)
System.out.println("Testing here: " + categoryData.toString());
String [] categoryList=categoryData.split(",");
//This prints out the list of categories in array form correctly.(Print statement 4)
System.out.println("Testing here too: " );
for(String a : categoryList) {System.out.println(a);}
return categoryList;
}
}
I cut out a lot of the irrelevant parts of my code so there might be some missing variables.
Here is what my printouts look like:
PS1:
Keywords: What makes a good friend, making friends, signs of a good friend, supporting friends, conflict management,
PS2:
]
PS3:
Testing here: wellbeing,friends-and-family,friendships
PS4:
Testing here too:
wellbeing
friends-and-family
friendships

Modifying complex csv files in java

I want to write a program that can print and modify irregular CSV files. The format is as follows:
1.date
2.organization name
3. student name, id number, residence
student name, id number, residence
student name, id number, residence
student name, id number, residence
student name, id number, residence
1.another date
2.another organization name
3. student name, id number, residence
student name, id number, residence
student name, id number, residence
..........
For instance, the data may be given as follows:
1. 10/09/2016
2. cycling club
3. sam, 1000, oklahoma
henry, 1001, california
bill, 1002, NY
1. 11/15/2016
2. swimming club
3. jane, 9001, georgia
elizabeth, 9002, lousiana
I am a beginner and I have not found any useful resource online that deals with this type of problem. My main concern is: how do I iterate through the lines, identify the date and the name of the club, and feed them into an array?
Please advise.
I think this should be helpful for you. Basically, there should be some pattern in your messy CSV. Below is my code to rearrange it:
import java.io.*;
public static void main(String[] args) {
// try-with-resources closes the writer and the reader automatically
try (PrintWriter writer = new PrintWriter("file.txt", "UTF-8");
BufferedReader bufferReader = new BufferedReader(new FileReader("csv.txt"))) {
//Variables to hold the current line and the current group
String line;
String date = ""; String org = ""; String student = "";
// Read file line by line and print on the console
while ((line = bufferReader.readLine()) != null) {
if (line.startsWith("1.")) {
// a new date starts a new group, so write out the previous group (if any);
// compare strings with isEmpty()/equals(), never with != ""
if (!date.isEmpty() || !org.isEmpty()) {
writer.println(date + "," + org + "," + student);
student = "";
}
date = line.substring(2);
} else if (line.startsWith("2.")) {
org = line.substring(2);
} else {
// every other line is a student row
line = "(" + line + ")";
student += line + ",";
}
System.out.println(line);
}
// write the last group, which is not followed by another date
writer.println(date + "," + org + "," + student);
} catch (IOException e) {
System.out.println("Error while reading file line by line: " + e.getMessage());
}
}
This is the output you will get:
10/09/2016, cycling club,(3. sam, 1000, oklahoma),( henry, 1001, california),( bill, 1002, NY),
11/15/2016, swimming club,(3. jane, 9001, georgia),( elizabeth, 9002, lousiana),
I am reading the file from csv.txt. The while loop goes through each line of the text file, and all the fields are stored in variables. When the next date comes, I write all of them to the output file. The last group is written to the file after the while loop terminates.
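The code above writes each group as one flat line. The question asks about getting the date, club name and members into an array, so here is a minimal sketch that keeps them in memory by starting a new group object at every "1." line. `ClubGroups` and `Group` are hypothetical names. The sketch assumes the layout shown in the question: a "1." date line, a "2." club line, then comma-separated member rows, the first of which is prefixed with "3.".

```java
import java.util.ArrayList;
import java.util.List;

final class ClubGroups {
    // Hypothetical holder for one group: date, club name and member rows.
    static final class Group {
        String date;
        String club;
        final List<String[]> members = new ArrayList<>();
    }

    // "1." begins a new group, "2." names the club, and every other non-blank
    // line is a member row (with its "3." prefix removed if present).
    static List<Group> parse(List<String> lines) {
        List<Group> groups = new ArrayList<>();
        Group cur = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith("1.")) {
                cur = new Group();
                cur.date = line.substring(2).trim();
                groups.add(cur);
            } else if (cur == null) {
                continue; // ignore anything before the first date
            } else if (line.startsWith("2.")) {
                cur.club = line.substring(2).trim();
            } else {
                if (line.startsWith("3.")) line = line.substring(2).trim();
                // name, id number, residence
                cur.members.add(line.split("\\s*,\\s*"));
            }
        }
        return groups;
    }
}
```

Each Group can then be printed, modified, or written back out in whatever format you need.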
Try uniVocity-parsers to handle this. For parsing this sort of format, you'll find a few examples here. For writing, look here and here.
Adapting from the examples I've given, you could write:
import com.univocity.parsers.common.processor.*;
import com.univocity.parsers.csv.*;
import java.io.File;
final ObjectRowListProcessor dateProcessor = new ObjectRowListProcessor();
final ObjectRowListProcessor clubProcessor = new ObjectRowListProcessor();
final ObjectRowListProcessor memberProcessor = new ObjectRowListProcessor();
// "switch" is a reserved word in Java, so the variable needs a different name
InputValueSwitch inputSwitch = new InputValueSwitch(0){
public void rowProcessorSwitched(RowProcessor from, RowProcessor to) {
//your custom logic here
if (to == dateProcessor) {
//processing dates.
}
if (to == clubProcessor) {
//processing clubs.
}
if (to == memberProcessor){
//processing members
}
}
};
inputSwitch.addSwitchForValue("1.", dateProcessor, 1); //getting values of column 1 and sending them to `dateProcessor`
inputSwitch.addSwitchForValue("2.", clubProcessor, 1); //getting values of column 1 and sending them to `clubProcessor`
inputSwitch.addSwitchForValue("3.", memberProcessor, 1, 2, 3); //getting values of columns 1, 2, and 3 and sending them to `memberProcessor`
inputSwitch.setDefaultSwitch(memberProcessor, 1, 2, 3); //Rows with blank value at column 0 are members. Also get columns 1, 2, and 3 and send them to `memberProcessor`
CsvParserSettings settings = new CsvParserSettings(); //many options here, check the tutorial and examples
// configure the parser to use the switch
settings.setRowProcessor(inputSwitch);
//creates a parser
CsvParser parser = new CsvParser(settings);
//parse everything. Rows will be sent to the RowProcessor of each switch, depending on the value at column 0.
parser.parse(new File("/path/to/file.csv"));
Disclaimer: I'm the author of this library, it's open-source and free (Apache 2.0 license)
