Parsing a file with specific file format - java

Hi, I'm relatively new to Java and I'm wondering how to go about parsing a certain file format into a 2D array.
The file format consists of comma-separated values, with < and /> delimiting each additional set of values.
<a,b,c/><x,y,z>
<...
<...
Each line should then be stored in an array[][], where the first set goes into the first column and the next set into the second.
The output should then look like this:
a, b, c
x ,y ,z
...
Any help would be great, thanks.
EDIT: This is what I have so far:
public static void main(String[] args)
{
//Open file, read to get number of lines of file = numLine
int[][] array = new int[numLine][numLine];
for (int i = 0; i < numLine; i++)
{
//Unsure how to write element/line split
array[i][i] = //input each element to array
}
}

You can modify this to suit your needs. I added some comments, so you might want to pay attention to them.
Scanner sc = new Scanner(file);
String[][] array = new String[numLine][numLine]; // declare the matrix
int r = 0, c = 0; // row and column indices into the matrix
while (sc.hasNextLine()) {
String line = sc.nextLine();
line = line.replaceAll("[<>]", ""); // remove < and >, leaving a,b,c/x,y,z
String[] col = line.split("/"); // split on /, giving a,b,c and x,y,z
for (String row : col) {
//a,b,c or x,y,z
String[] oneCol = row.split(",");
for (String oneRow : oneCol) {
if(c >= numLine){
c = 0;
break;
}
array[r][c] = oneRow;
c++;
}
r++;
//System.out.println();
}
c = 0;
}
sc.close();
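If you don't know the number of lines and values up front, a list can collect the rows first and be converted to a 2D array at the end. This is only a minimal sketch under some assumptions: the class name GroupParser and the method parseLine are made up for illustration, the two sample lines stand in for the file contents, and every group becomes its own row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupParser {
    // Splits one line such as "<a,b,c/><x,y,z>" into its groups,
    // returning one String[] of values per group.
    static List<String[]> parseLine(String line) {
        List<String[]> groups = new ArrayList<>();
        // Drop the angle brackets, leaving "a,b,c/x,y,z"
        String cleaned = line.replaceAll("[<>]", "");
        for (String group : cleaned.split("/")) {
            if (!group.isEmpty()) {
                groups.add(group.split(","));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample lines standing in for the file contents (assumption)
        List<String[]> rows = new ArrayList<>();
        for (String line : Arrays.asList("<a,b,c/><x,y,z>", "<d,e,f/><u,v,w>")) {
            rows.addAll(parseLine(line));
        }
        // Convert to a 2D array once the number of rows is known
        String[][] array = rows.toArray(new String[0][]);
        for (String[] row : array) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

Because the list grows as needed, there is no need to count the lines of the file before allocating the array.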

As @Young Millie pointed out, what have you attempted so far? That being said, there are several approaches you could take; here is one of them.
One valid approach is to read the file line by line and remove all occurrences of the bracket symbols using replaceAll(...) (explained further in the Java docs), for example:
String line = "<a,b,c/><x,y,z>";
line = line.replaceAll("[<>]", "");
System.out.println("1. " + line);
with a result of:
1. a,b,c/x,y,z
and then we split the string on "/", resulting in an array holding the two strings you need:
String[] lines = line.split("/");
System.out.println("1. " + lines[0]);
System.out.println("2. " + lines[1]);
with a result of:
1. a,b,c
2. x,y,z

Related

Set int position to the line after a string is read

Edit: As some have asked, I will try to make it more clear. The user inserts a value, any value, into a text box. This is saved as the result int. The problem is finding the right line to insert the strings to for every choice the user might make.
I am trying to insert strings through a loop in a file and as it is right now, I'm using a static declaration of the location (line number) through an int. The problem is that if the number of iterations changes, the strings are not inserted in the right location.
In the code below, result represents the number of strings to be inserted, as written by the user in a text box.
for (int a = result; a >= 1; a--) {
Path path = Paths.get("ScalabilityModel.bbt");
List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
int position = 7;
String extraLine = "AttackNode" + a;
lines.add(position, extraLine);
Files.write(path, lines, StandardCharsets.UTF_8);
}
I would like to change "int position = 7" to something like position = "begin attack nodes" + 1, so that the string is inserted on the line below the line that contains the string I'm looking for.
What's the easiest way to do this?
Judging from the comments on the question, I am assuming the user wants to add, for example, 2 lines when they enter '2' in the input box.
Please mention in a comment if I am missing something.
One way to do that:
public static void main(String[] args) throws IOException {
// Assuming the user input here
int result = 2;
for (int a = result; a >= 1; a--) {
Path path = Paths.get("ScalabilityModel.bbt");
List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
// Used CopyOnWriteArrayList to avoid ConcurrentModificationException
CopyOnWriteArrayList<String> myList = new CopyOnWriteArrayList<String>(lines);
// taking index to get the position of line when it matches the string
int index = 0;
for (String string : myList) {
index = index + 1;
if (string.equalsIgnoreCase("AttackNode")) {
myList.add(index, "AttackNode" + a);
}
}
Files.write(path, myList, StandardCharsets.UTF_8);
}
}
I moved the reading of the file to outside the loop and created a list of the lines to add. Since I wasn't sure what string you want to match with I added a variable searchString for this, so just replace it or assign the right value to it.
Path path = Paths.get("ScalabilityModel.bbt");
List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
String searchString = "abc";
List<String> newLines = new ArrayList<>();
for (int i = 0; i < result; i++) {
String extraLine = "AttackNode" + (result - i);
newLines.add(extraLine);
}
for (int i = 0; i < lines.size(); i++) {
if (lines.get(i).contains(searchString)) { // The check can be changed to equals, startsWith etc. depending on the search pattern
if (i + 1 < lines.size()) {
lines.addAll(i + 1, newLines);
} else {
lines.addAll(newLines);
}
break;
}
}
// Write the modified lines back to the file
Files.write(path, lines, StandardCharsets.UTF_8);
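To try the insertion logic without touching a real file, it can be pulled into a small method that works on a list of lines. This is a sketch, not a drop-in replacement: the class name MarkerInsert, the method insertAfter and the sample lines are assumptions for illustration, and "begin attack nodes" is the marker text taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MarkerInsert {
    // Returns a copy of lines with newLines inserted directly after the
    // first line containing marker; the copy is unchanged if no line matches.
    static List<String> insertAfter(List<String> lines, String marker, List<String> newLines) {
        List<String> result = new ArrayList<>(lines);
        for (int i = 0; i < result.size(); i++) {
            if (result.get(i).contains(marker)) {
                // addAll accepts an index equal to size(), so a marker on the last line works too
                result.addAll(i + 1, newLines);
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample content standing in for ScalabilityModel.bbt (assumption)
        List<String> lines = Arrays.asList("header", "begin attack nodes", "footer");
        List<String> extra = Arrays.asList("AttackNode1", "AttackNode2");
        System.out.println(insertAfter(lines, "begin attack nodes", extra));
    }
}
```

With a real file, the input list would come from Files.readAllLines(...) and the result would be passed to Files.write(...), as in the snippets above.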

using scanner to read file but skip blank lines into a 2d array

I am struggling to use scanner class to read in a text file while skipping the blank lines.
Any suggestions?
Scanner sc = new Scanner(new BufferedReader(new FileReader("training2.txt")));
trainingData = new double[48][2];
while(sc.hasNextLine()) {
for (int i=0; i<trainingData.length; i++) {
String[] line = sc.nextLine().trim().split(" ");
if(line.length==0)
{
sc.nextLine();
}else{
for (int j=0; j<line.length; j++) {
trainingData[i][j] = Double.parseDouble(line[j]);
}
}
}
}
if(sc.hasNextLine())
{
sc.nextLine();
}
sc.close();
I am currently trying to get it working like this. But it is not working
Scanner sc = new Scanner(new BufferedReader(new FileReader("training.txt")));
trainingData = new double[48][2];
while(sc.hasNextLine()) {
String line = sc.nextLine().trim();
if(line.length()!=0)
{
for (int i=0; i<trainingData.length; i++) {
String[] line2 = sc.nextLine().trim().split(" ");
for (int j=0; j<line2.length; j++) {
trainingData[i][j] = Double.parseDouble(line2[j]);
}
}
}
}
return trainingData;
while(sc.hasNextLine()) {
for (int i=0; i<trainingData.length; i++) {
String[] line = sc.nextLine().trim().split(" ");
You can't just check the scanner once to see if it has data and then use a loop to read the lines of data. You can't assume that you have 48 lines of data just because you define your array to hold 48 lines of data.
You need to go back to the basics and learn how to read data from a file one line at a time and then you process that data.
Here is a simple example to get you started:
import java.util.*;
public class ScannerTest2
{
public static void main(String args[])
throws Exception
{
String data = "1 2\n\n3 4\n\n5 6\n7 8";
// First attempt
System.out.println("Display All Lines");
Scanner s = new Scanner( data );
while (s.hasNextLine())
{
String line = s.nextLine();
System.out.println( line );
}
// Second attempt
System.out.println("Display non blank lines");
s = new Scanner( data );
while (s.hasNextLine())
{
String line = s.nextLine();
if (line.length() != 0)
{
System.out.println( line );
}
}
// Final attempt
String[][] values = new String[5][2];
int row = 0;
System.out.println("Add data to 2D Array");
s = new Scanner( data );
while (s.hasNextLine())
{
String line = s.nextLine();
if (line.length() != 0)
{
String[] digits = line.split(" ");
values[row] = digits;
row++;
}
}
for (int i = 0; i < values.length; i++)
System.out.println( Arrays.asList(values[i]) );
}
}
The example uses a String variable to simulate data from a file.
The first block of code shows how to read all lines of data from the file. The logic simply:
1. invokes the hasNextLine() method to see if there is data
2. invokes the nextLine() method to get the line of data
3. displays the data that was read
4. repeats steps 1-3 until there is no data.
The next block of code simply adds an "if condition" so that you only display non-blank data.
Finally, the 3rd block of code is closer to what you want. As it reads each line of data, it splits the data into an array and then adds this array to the 2D array.
This is the part of the code you will need to change. You will need to convert the String array to a double array before adding it to your 2D array. So change this code first to get it working. Then, once this works and you understand the concept, make the necessary changes to your real application.
Note in my code how the last row displays [null, null]. This is why it is not a good idea to use arrays: you never know how big the array should be. If you have fewer than 5 rows you get null values. If you have more than 5 you will get an out of bounds exception.
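The conversion to double values and the array sizing problem can both be handled by collecting rows in a list first. The following is a rough sketch under assumptions: the class name BlankLineSkipper and the method parse are invented for illustration, and the String data again simulates the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

class BlankLineSkipper {
    // Reads whitespace-separated numbers line by line, skipping blank lines,
    // and returns one double[] per non-blank line.
    static double[][] parse(String data) {
        List<double[]> rows = new ArrayList<>();
        try (Scanner s = new Scanner(data)) {
            while (s.hasNextLine()) {
                String line = s.nextLine().trim();
                if (line.isEmpty()) {
                    continue; // blank line: nothing to store
                }
                String[] tokens = line.split("\\s+");
                double[] row = new double[tokens.length];
                for (int j = 0; j < tokens.length; j++) {
                    row[j] = Double.parseDouble(tokens[j]);
                }
                rows.add(row);
            }
        }
        // The array is sized from the data, so there are no null rows
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        String data = "1 2\n\n3 4\n\n5 6\n7 8"; // simulates the file contents
        for (double[] row : parse(data)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

To read from an actual file, the Scanner would be created from a File or Reader instead of a String; the loop stays the same.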
Try adding this to your code:
sc.skip("(\r\n)");
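It skips a line break at the current position. Be aware that Scanner.skip() throws a NoSuchElementException if the pattern does not match there, so a pattern that can also match nothing, such as "(\r?\n)*", is safer. For more information: Scanner.skip()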

Compare content of two text files and split words java

I know this question has already been asked several times, but I can't find a way to apply the answers to my code.
So my goal is the following:
I have two files, griechenland_test.txt and outagain5.txt. I want to read them and then find out what percentage of outagain5.txt is contained in the other file.
Outagain5 has input like this:
mit dem 542824
und die 517126
And Griechenland is a normal Wikipedia article about that topic (so just normal text, without frequency counts).
1. Problem
- How can I split the input into bigrams? That is, every two words, but always overlapping with the previous word? So if I have the words A, B, C, D --> get AB, BC, CD?
I have this:
while ((sCurrentLine = in.readLine()) != null) {
// System.out.println(sCurrentLine);
arr = sCurrentLine.split(" ");
for (int i = 0; i < arr.length; i++) {
if (null == hash.get(arr[i])) {
hash.put(arr[i], 1);
} else {
int x = hash.get(arr[i]) + 1;
hash.put(arr[i], x);
}
}
Then I read the other file with this code (I just add the words, not the number; I split on 4 spaces, so the two words are at h[0]).
for (String line = br.readLine(); line != null; line = br.readLine()) {
String h[] = line.split(" ");
words.add(h[0]);
}
2. Problem
Now I compare the String x in hash with the String s in words. I added the else branch with System.out.println to see which words are not contained in outagain5.txt, but several words are printed which ARE contained in outagain5.txt. I don't understand why :D
So I think that the comparison doesn't work properly, or maybe this will be solved by fixing the first problem.
ArrayList<String> words = new ArrayList<String>();
ArrayList<String> neuS = new ArrayList<String>();
ArrayList<Long> neuZ = new ArrayList<Long>();
for (String x : hash.keySet()) {
summe = summe + hash.get(x);
long neu = hash.get(x);
for (String s : words) {
if (x.equals(s)) {
neuS.add(x);
neuZ.add(neu);
disc = disc + 1;
} else {
System.out.println(x);
break;
}
}
}
Hope I made my question clear, thanks a lot!!
public static List<String> ngrams(int n, String str) {
List<String> ngrams = new ArrayList<String>();
String[] words = str.split(" ");
for (int i = 0; i < words.length - n + 1; i++)
ngrams.add(concat(words, i, i+n));
return ngrams;
}
public static String concat(String[] words, int start, int end) {
StringBuilder sb = new StringBuilder();
for (int i = start; i < end; i++)
sb.append((i > start ? " " : "") + words[i]);
return sb.toString();
}
It is much easier to use the generic "n-gram" approach, so you can split every 2 or 3 words if you want. I grabbed the code above from a post on n-gram sequences (NGram Sequence), and I have used this exact code almost any time I need to split words in the (AB), (BC), (CD) format.
If I recall, String has a method split(regex, limit) that splits the string around matches of the pattern and lets you cap how many pieces it produces.
I am referencing this JavaDoc https://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String, int).
As for running the comparison between the two text files, I would recommend having your code read both of them, populate two separate collections, and then compare the entries of one against the other. Hope I helped.
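For the comparison itself, a set lookup avoids the nested loop from the question, where the break in the else branch stops checking after the first entry that does not match. Below is a minimal sketch with assumptions: the class name BigramOverlap and the methods bigrams and coverage are hypothetical, the reference set stands in for the bigrams read from outagain5.txt, and the sample sentence stands in for the article text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BigramOverlap {
    // Builds overlapping word pairs: "A B C D" -> "A B", "B C", "C D"
    static List<String> bigrams(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            result.add(words[i] + " " + words[i + 1]);
        }
        return result;
    }

    // Fraction of the reference bigrams that occur somewhere in the text
    static double coverage(Set<String> reference, String text) {
        if (reference.isEmpty()) {
            return 0.0;
        }
        Set<String> found = new HashSet<>(bigrams(text));
        found.retainAll(reference); // keep only bigrams that are also in the reference
        return (double) found.size() / reference.size();
    }

    public static void main(String[] args) {
        // Stand-ins for the two files (assumption)
        Set<String> reference = new HashSet<>(Arrays.asList("mit dem", "und die"));
        String article = "er kam mit dem Zug";
        System.out.println(bigrams("A B C D"));
        System.out.println(coverage(reference, article));
    }
}
```

Whether the percentage should be counted per distinct bigram, as here, or weighted by the frequency counts in outagain5.txt depends on what you want to measure.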

how to separate values in a string index that are char and int

Okay, basically I want to separate the elements in a string into int and char values while keeping them in the array. To be honest, that last part is not a requirement: if I need to separate the values into two different arrays then so be it, I'd just like to keep them together for neatness. This is my input:
5,4,A
6,3,A
8,7,B
7,6,B
5,2,A
9,7,B
The code I have so far does generally what I want, but not completely.
Here is the output I have managed to produce with my code, but this is where I'm stuck:
54A
63A
87B
76B
52A
97B
Here is the fun part: I need to take the numbers and the character values and separate them so I can use them in a comparison/math formula.
Basically I need this:
int 5, 4;
char 'A';
but of course stored in the array that they are in.
Here is the code i have come up with so far.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
public class dataminingp1
{
String[] data = new String[100];
String line;
public void readf() throws IOException
{
FileReader fr = new FileReader("C:\\input.txt");
BufferedReader br = new BufferedReader(fr);
int i = 0;
while ((line = br.readLine()) != null)
{
data[i] = line;
System.out.println(data[i]);
i++;
}
br.close();
System.out.println("Data length: "+data.length);
String[][] root;
List<String> lines = Files.readAllLines(Paths.get("input.txt"), StandardCharsets.UTF_8);
root = new String[lines.size()][];
lines.removeAll(Arrays.asList("", null)); // <- remove empty lines
for(int a =0; a<lines.size(); a++)
{
root[a] = lines.get(a).split(" ");
}
String changedlines;
for(int c = 0; c < lines.size(); c++)
{
changedlines = lines.get(c).replace(',', ' '); // remove all commas
lines.set(c, changedlines);// Set the 0th index in the lines with the changedLine
changedlines = lines.get(c).replaceAll(" ", ""); // remove all white/null spaces
lines.set(c, changedlines);
changedlines = lines.get(c).trim(); // remove all null spaces before and after the strings
lines.set(c, changedlines);
System.out.println(lines.get(c));
}
}
public static void main(String[] args) throws IOException
{
dataminingp1 sarray = new dataminingp1();
sarray.readf();
}
}
I would like to do this as simply as possible because I'm not incredibly far along with Java, but I am learning, so if need be I can manage with a more difficult process. Thank you in advance for any help you may give. I'm really starting to love Java as a language thanks to its simplicity.
This is an addition to my question to clear up any confusion.
What I want to do is take the values stored in the string array that I have in the code (from input.txt) and parse them into different data types, like char for characters and int for integers. I'm not sure how to do that currently, so what I'm asking is: is there a way to parse these values all at once without having to split them into different arrays? I'm not sure how I'd do that, since it would be crazy to go through the input file and find exactly where every char starts and every int starts. I hope this clears things up a bit.
Here is something you could do:
// Assumes list is a List<String>, and numbers and letters are List<String>
for (int i = 0; i < list.get(0).length(); i++) {
String ch = list.get(0).substring(i, i + 1);
try {
Integer.parseInt(ch);
// This is a number
numbers.add(ch);
} catch (NumberFormatException e) {
// This is not a number
letters.add(ch);
}
}
When the character is not a number, it will throw a NumberFormatException, so, you know it is a letter.
for (int c = 0; c < lines.size(); c++) {
String[] chars = lines.get(c).split(",");
// chars[0] and chars[1] are the numbers, chars[2] is the letter
String changedLines = "int " + chars[0] + ", " + chars[1] + ";\nchar '" + chars[2] + "';";
lines.set(c, changedLines);
System.out.println(lines.get(c));
}
It is very easy if your input format is standardized like this. As long as you don't specify more (for example, that a row can have more than 3 values, or that the char can be in any column and not just the third), the easiest approach is this:
String line = "5,4,A";
String[] array = line.split(",");
int a = Integer.valueOf(array[0]);
int b = Integer.valueOf(array[1]);
char c = array[2].charAt(0);
Maybe something like this will help?
List<Integer> getIntsFromArray(String[] tokens) {
List<Integer> ints = new ArrayList<Integer>();
for (String token : tokens) {
try {
ints.add(Integer.parseInt(token));
} catch (NumberFormatException nfe) {
// ...
}
}
return ints;
}
This will only grab the integers, but maybe you could hack it around a bit to do what you want :p
List<String> lines = Files.readAllLines(Paths.get("input.txt"), StandardCharsets.UTF_8);
String[][] root = new String[lines.size()][];
for (int a = 0; a < lines.size(); a++) {
root[a] = lines.get(a).split(","); // Just changed the split condition to split on comma
}
Your root array now has all the data in 2D array format, where each row represents one record/line from the input and each column holds the data required (see below).
5 4 A
6 3 A
8 7 B
7 6 B
5 2 A
9 7 B
You can now traverse the array where you know that first 2 columns of each row are the numbers you need and the last column is the character.
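To show what that traversal could look like, here is a small sketch that turns each line into two int values and one char. The class RecordParser, the nested Row holder and its field names are assumptions made up for this example, and it assumes every line has exactly the number,number,letter layout from the question.

```java
class RecordParser {
    // Hypothetical holder for one parsed line: two numbers and one label
    static final class Row {
        final int first;
        final int second;
        final char label;

        Row(int first, int second, char label) {
            this.first = first;
            this.second = second;
            this.label = label;
        }
    }

    // Parses a line such as "5,4,A"; assumes exactly three comma-separated fields
    static Row parse(String line) {
        String[] parts = line.split(",");
        return new Row(
                Integer.parseInt(parts[0].trim()),
                Integer.parseInt(parts[1].trim()),
                parts[2].trim().charAt(0));
    }

    public static void main(String[] args) {
        String[] input = {"5,4,A", "6,3,A", "8,7,B"}; // sample lines from the question
        for (String line : input) {
            Row row = parse(line);
            // The numbers are real ints now, so they can be used in arithmetic
            int sum = row.first + row.second;
            System.out.println(sum + " " + row.label);
        }
    }
}
```

Keeping the three values together in one object per line avoids having to maintain separate arrays for the numbers and the letters.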
Try it this way, using the getNumericValue() and isDigit() methods. This might also work:
String myStr = "54A";
boolean checkVal;
List<Integer> myInt = new ArrayList<Integer>();
List<Character> myChar = new ArrayList<Character>();
for (int i = 0; i < myStr.length(); i++) {
char c = myStr.charAt(i);
checkVal = Character.isDigit(c);
if(checkVal == true){
myInt.add(Character.getNumericValue(c));
}else{
myChar.add(c);
}
}
System.out.println(myInt);
System.out.println(myChar);
Also see: checking character properties

add elements into the list in Java

Here is the code:
class Test {
public static void main (String[] args) throws Exception {
java.io.File fail = new java.io.File("C:/Users/Student/Desktop/Morze.txt");
java.util.Scanner sc = new java.util.Scanner(fail);
while (sc.hasNextLine()) {
String line = sc.nextLine();
String[] lst = line.split(" ");
int[] letter = new int[26];
int[] sumbol = new int[26];
for (int i = 0; i < lst.length; i++)
System.out.print(lst[i] + " ");
System.out.println();
// How to add?
}
}
}
Please explain how I can add all the letters to the list Letter and the symbols to the list Sumbol.
Content of the file Morze.txt:
A .-
B -...
C -.-.
D -..
E .
F ..-.
G --.
H ....
I ..
J .---
K -.-
L .-..
M --
N -.
O ---
P .--.
Q --.-
R .-.
S ...
T -
U ..-
V ...-
W .--
X -..-
Y -.--
Z --..
Thanks!
You don't have a list, you have arrays. It appears you want to add the values to two arrays. However, you have some code in your loop which should not be in your loop.
Additionally, your data is text (String), not numbers (int values).
String[] letter = new String[26];
String[] symbol = new String[26];
int count = 0;
while (sc.hasNextLine()) {
String line = sc.nextLine();
String[] lst = line.split(" ");
letter[count] = lst[0];
symbol[count] = lst[1];
count++;
}
for (int i = 0; i < count; i++)
System.out.println(letter[i] + " " + symbol[i]);
I'm going to offer a solution that fixes your implementation because I think it might help you understand a few concepts. However, once you get it working, I would recommend that you go back and read about the Java List interface and rewrite your code. Lists are a much cleaner way of maintaining sequences that may grow or shrink in length and will greatly reduce the complexity of your code.
You should start by moving your letter and symbol array declarations out of your while loop. Variables declared within a block in Java are scoped to that block. In other words, no statement outside the while loop can see either array. Declaring them inside the loop also has the side effect of creating a new array for every line you parse using your scanner.
String[] letter = new String[26]; // String, not int: the file contains text
String[] sumbol = new String[26];
while (sc.hasNextLine()) {
String line = sc.nextLine();
String[] lst = line.split(" ");
Next you'll need to know where to put your current symbol/letter in the array, an index. So you'll want to keep a count of how many lines/symbols you've processed so far.
String[] letter = new String[26]; // String, not int: the file contains text
String[] sumbol = new String[26];
int numberOfSymbolsProcessed = 0;
while (sc.hasNextLine()) {
String line = sc.nextLine();
String[] lst = line.split(" ");
Now you have two arrays and an index into each, add the symbol and letter to the array as follows...
// The arrays hold text from the file, so they must be String[] rather than int[]
String[] letter = new String[26];
String[] sumbol = new String[26];
int numberOfSymbolsProcessed = 0;
while (sc.hasNextLine()) {
String line = sc.nextLine();
String[] lst = line.split(" ");
letter[numberOfSymbolsProcessed] = lst[0];
sumbol[numberOfSymbolsProcessed] = lst[1];
numberOfSymbolsProcessed = numberOfSymbolsProcessed + 1;
}
This would be an excellent use case for the List interface.
List<String> list = new LinkedList<String>();
while (sc.hasNextLine()) {
String line = sc.nextLine();
list.addAll(Arrays.asList(line.split(" ")));
}
If you know that your file will only contain letters or symbols, you can use the Pattern class with a regular expression such as
^[a-zA-Z]+$
to check whether the given string (in your case lst[i]) consists of one or more letters. The ^ at the beginning and the $ at the end ensure that the string contains only letters.
If the string matches the pattern, then you know that it is a letter, so you can add it to the Letter list. If it does not, you can add it to the symbol data structure.
I recommend that you do not use arrays, but rather dynamic data structures such as an ArrayList for your lists, since these grow dynamically as you add elements to them.
For more information regarding the Pattern class, you can check this tutorial.
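Put together, the pattern check and the dynamic lists could look roughly like this. It is only a sketch: the class name MorseSplitter and the method classify are assumptions for illustration, the pattern assumes plain ASCII letters, and the sample lines stand in for the contents of Morze.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MorseSplitter {
    // One or more ASCII letters and nothing else
    private static final Pattern LETTERS = Pattern.compile("^[a-zA-Z]+$");

    // Sorts every whitespace-separated token into letters or symbols
    static void classify(List<String> lines, List<String> letters, List<String> symbols) {
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank line
                }
                if (LETTERS.matcher(token).matches()) {
                    letters.add(token);
                } else {
                    symbols.add(token);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Sample lines standing in for Morze.txt (assumption)
        List<String> lines = Arrays.asList("A .-", "B -...", "C -.-.");
        List<String> letters = new ArrayList<>();
        List<String> symbols = new ArrayList<>();
        classify(lines, letters, symbols);
        System.out.println(letters);
        System.out.println(symbols);
    }
}
```

Since both lists are filled in file order, the letter at index i corresponds to the symbol at index i as long as every line contains exactly one of each.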
