Split file having Integers and string into strings only - java

I have a file in which each line is stored as "Integer -> \t (tab) -> String -> a couple of spaces".
Am I doing something wrong?
Here is what I am doing:
Trie t = new Trie();
BufferedReader bReader = new BufferedReader(new FileReader(
"H:\\100kfound.txt"));
String line;
String[] s = null;
while ((line = bReader.readLine()) != null) {
s = line.split("\t");
}
int i;
for (i = 0; i < s.length; i++) {
System.out.println(s[i]);
if (!(s[i].matches("\\d+"))) {
t.addWord(s[i]);
System.out.println(s[i]);
}
}
From debugging, I can see that everything works up to the end of the while loop, but the for loop only stores two strings and prints them.

You might want to write the expression as ^[0-9]+$ so it is explicit that you only want complete integers. Note that String.matches() already tests the whole string, so tt55gh would not match \\d+ either; the ^ and $ only change the result when you use Matcher.find().
if (!(s[i].matches("^[0-9]+$"))) {
}
Per the comment above you need to move the for loop inside the while loop.
while ((line = bReader.readLine()) != null) {
s = line.split("\t");
for (int i = 0; i < s.length; i++) {
System.out.println("Value "+i+": "+s[i]);
if (!(s[i].matches("^[0-9]+$"))) {
t.addWord(s[i]);
System.out.println("Word "+i+": "+s[i]);
}
}
}
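A minimal sketch of the same per-line filtering, pulled into a helper so it can be tried without the file or the Trie. The class name TabFieldFilter and the method extractWords are made up for illustration, and trimming the trailing spaces is an assumption based on the file layout described in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class TabFieldFilter {
    // Returns the non-numeric fields of one tab-separated line, trimmed.
    static List<String> extractWords(String line) {
        List<String> words = new ArrayList<>();
        for (String field : line.split("\t")) {
            String trimmed = field.trim();
            // String.matches tests the whole string, so no ^ or $ is needed.
            if (!trimmed.isEmpty() && !trimmed.matches("\\d+")) {
                words.add(trimmed);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // One line in the "Integer, tab, String, spaces" layout.
        System.out.println(extractWords("42\thello   "));
    }
}
```

Inside the while loop you would then call t.addWord(...) for each element of the returned list.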

Related

Doesn't save the words in array

I've got a probably simple question. I am trying to read a file and add each single word to my array "phrase". The problem occurs in the for loop: I get the exception "index 0 out of bounds for length 0".
Can you please help me with that?
String [] tokens;
String line;
String hash = " ";
int n = 0;
String [] phrase = new String [n];
public void loadFile()
{
try
{
@SuppressWarnings("resource")
BufferedReader br = new BufferedReader(new FileReader("z3data1.txt"));
while((line = br.readLine()) != null)
{
tokens = line.split("[ ]");
n += tokens.length;
}
for(int j = 0; j<tokens.length; j++)
{
phrase[j] = tokens[j];
}
}
catch(IOException ex)
{
ex.printStackTrace();
}
}
A couple observations.
You are getting the error because your array has length 0, so even index 0 is out of bounds.
You keep overwriting tokens in the while loop. The while loop needs to encompass the copying of the tokens to the phrase array.
So try the following:
while((line = br.readLine()) != null) {
tokens = line.split("[ ]");
n += tokens.length; // don't really need this.
//starting offset to write into phrase
int len = phrase.length;
phrase = Arrays.copyOf(phrase,phrase.length + tokens.length);
for(int j = 0; j<tokens.length; j++) {
phrase[j + len] = tokens[j];
}
}
This statement
phrase = Arrays.copyOf(phrase,phrase.length + tokens.length)
Copies the contents of phrase and increases the array size to handle the writing of tokens.
Another (and probably preferred) alternative is to use a List<String> which grows as you need it.
List<String> phrase = new ArrayList<>();
for(int j = 0; j<tokens.length; j++) {
phrase.add(tokens[j]);
}
// or skip the loop and just do
Collections.addAll(phrase,tokens);
One observation. I don't know what you are splitting on but your split statement looks suspicious.
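To make that concern concrete, here is a small sketch comparing split("[ ]") with a whitespace split. The class and method names are hypothetical, and whether the file really contains tabs or repeated spaces is an assumption.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits on every single space, as in the question.
    static String[] splitOnSingleSpace(String line) {
        return line.split("[ ]");
    }

    // Splits on runs of any whitespace (spaces, tabs), ignoring the line's edges.
    static String[] splitOnWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String line = "a  b\tc"; // two spaces, then a tab
        // The single-space split yields an empty token and leaves the tab inside a token.
        System.out.println(Arrays.toString(splitOnSingleSpace(line)));
        System.out.println(Arrays.toString(splitOnWhitespace(line)));
    }
}
```

If the words in the file are always separated by exactly one space, both versions give the same result.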
You're setting n to 0, so phrase is also of length 0 when you say String[] phrase = new String[n]. Therefore, you can't add anything to it.
If you want something of variable length, you can use an ArrayList. In the code below, you can directly use Collections.addAll to split up the line and put everything into the phrase ArrayList.
String line;
//Note that you can get rid of tokens here, since it's being inlined below
ArrayList<String> phrase = new ArrayList<>();
public void loadFile()
{
try
{
@SuppressWarnings("resource")
BufferedReader br = new BufferedReader(new FileReader("z3data1.txt"));
while((line = br.readLine()) != null)
{
//No need for a for-loop below, you can do everything in one line
Collections.addAll(phrase, line.split("[ ]"));
}
}
catch(IOException ex)
{
ex.printStackTrace();
}
}

Procedure to bold Strings in a text file

So, I'm working on a procedure that takes a txt file called orders, which specifies the number of words to bold and which words must be bolded. I've managed to do it for one word, but when I try with two words the output gets doubled. For example:
Input:
2
Ophelia
him
Output:
ACT I
ACT I
SCENE I. Elsinore. A platform before the castle.
SCENE I. Elsinore. A platform before the castle.
FRANCISCO at his post. Enter to him BERNARDO
FRANCISCO at his post. Enter to *him* BERNARDO
Here's my code, can anyone help me? PS: Ignore the boolean I guess.
static void bold(char bold, BufferedReader orders, BufferedReader in, BufferedWriter out) throws IOException
{
String linha = in.readLine();
boolean encontrou = false;
String[] palavras = new String[Integer.parseInt(orders.readLine())];
for (int i = 0; i < palavras.length; i++)
{
palavras[i] = orders.readLine();
}
while (linha != null)
{
StringBuilder str = new StringBuilder(linha);
for (int i = 0; i < palavras.length && !encontrou; i++)
{
if (linha.toLowerCase().indexOf(palavras[i]) != -1)
{
str.insert((linha.toLowerCase().indexOf(palavras[i])), bold);
str.insert((linha.toLowerCase().indexOf(palavras[i])) + palavras[i].length() + 1, bold);
out.write(str.toString());
out.newLine();
}
else
{
out.write(linha);
out.newLine();
}
}
linha = in.readLine();
}
}
This merits a regular expression replace of WORD-BOUNDARY + ALTERNATIVES + WORD-BOUNDARY.
String linha; // Current line of the text; assigned after the word list is read.
// Read the number of words to be bolded, then the words themselves.
String[] palavras = new String[Integer.parseInt(orders.readLine())];
for(int i = 0; i < palavras.length; i++){
palavras[i]=orders.readLine();
}
// We make a regular expression Pattern.
// Like "\\b(him|her|it)\\b" where \\b is a word-boundary.
// This prevents mangling "shimmer".
StringBuilder regex = new StringBuilder("\\b(");
for (int i = 0; i < palavras.length; i++) {
if (i != 0) {
regex.append('|');
}
regex.append(Pattern.quote(palavras[i]));
}
regex.append(")\\b");
Pattern pattern = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
boolean encontrou = false;
linha = in.readLine(); // Read first line.
while(linha != null){
Matcher m = pattern.matcher(linha);
String linha2 = m.replaceAll("*$1*"); // Matcher.replaceAll takes only the replacement.
if (!linha2.equals(linha)) {
encontrou = true; // Found a replacement.
}
out.write(linha2);
out.newLine();
linha = in.readLine(); // Read next line.
}
A replaceAll (instead of replaceFirst) then replaces all occurrences.
It's writing out twice because you output your StringBuilder (out.write(str.toString())) for the line (linha) every time you iterate through it, which will be at least the number of words in the lookup list.
Move the out.write() statements outside the loop and you should be fine.
Note this will only find one match in each line for each word. If you need to find more than one, the code is a little more complicated. You need to introduce a while loop instead of your if test for matching, or you could consider using replaceAll() with a regular expression based on your word palavras[i]. Preserving the capitalisation of the original is not simple with that approach, but it is possible.
Fixed version
static void bold(char bold, BufferedReader orders, BufferedReader in, BufferedWriter out)
throws IOException
{
String linha = in.readLine();
boolean encontrou = false;
String[] palavras = new String[Integer.parseInt(orders.readLine())];
for (int i = 0; i < palavras.length; i++)
{
palavras[i] = orders.readLine();
}
while (linha != null)
{
StringBuilder str = new StringBuilder(linha);
for (int i = 0; i < palavras.length && !encontrou; i++)
{
if (linha.toLowerCase().indexOf(palavras[i]) != -1)
{
str.insert((linha.toLowerCase().indexOf(palavras[i])), bold);
str.insert(
(linha.toLowerCase().indexOf(palavras[i])) + palavras[i].length() + 1,
bold);
}
}
out.write(str.toString());
out.newLine();
linha = in.readLine();
}
}
With replaceAll
static void bold(char bold, BufferedReader orders, BufferedReader in, BufferedWriter out)
throws IOException
{
String linha = in.readLine();
boolean encontrou = false;
String[] palavras = new String[Integer.parseInt(orders.readLine())];
for (int i = 0; i < palavras.length; i++)
{
palavras[i] = orders.readLine();
}
while (linha != null)
{
for (int i = 0; i < palavras.length && !encontrou; i++)
{
String regEx = "\\b("+palavras[i]+")\\b";
linha = linha.replaceAll(regEx, bold + "$1"+bold);
}
out.write(linha);
out.newLine();
linha = in.readLine();
}
}
P.S. I've left the found boolean (encontrou) in, although it is not doing anything at the moment.
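As a sketch of how the replaceAll idea could be hardened: the version below quotes each word with Pattern.quote and the marker with Matcher.quoteReplacement, so words or markers containing regex characters do not break the pattern. The class BoldWords and the method boldLine are hypothetical names; case-insensitive matching is an assumption based on the toLowerCase() calls in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoldWords {
    // Wraps every whole-word occurrence of any word in the marker character.
    static String boldLine(String line, String[] words, char marker) {
        if (words.length == 0) {
            return line; // Nothing to bold; avoids building an empty alternation.
        }
        StringBuilder alternatives = new StringBuilder();
        for (String word : words) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(word)); // Treat the word literally.
        }
        Pattern pattern = Pattern.compile("\\b(" + alternatives + ")\\b",
                Pattern.CASE_INSENSITIVE);
        // Escape the marker in case it is '$' or '\\', which are special in replacements.
        String m = Matcher.quoteReplacement(String.valueOf(marker));
        return pattern.matcher(line).replaceAll(m + "$1" + m);
    }

    public static void main(String[] args) {
        String[] words = {"Ophelia", "him"};
        System.out.println(boldLine("FRANCISCO at his post. Enter to him BERNARDO", words, '*'));
    }
}
```

Because $1 re-inserts the matched text itself, the original capitalisation is kept even though the match is case-insensitive.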

Making bufferedreader from loop, to next line?

Sorry about the poor title, I didn't quite know what to call it!
Basically, I made the loop below to show the first six lines from the file, and that part works. When it comes to showing the six lines, though, I'm not sure how to get each one to appear on its own line. The closest I got was putting the JOptionPane inside the loop, which shows one line per dialog. The second JOptionPane at the bottom shows all the lines, but run together on a single line. How can I make each line appear on a new line? \n doesn't seem to work.
private static void doOptionTwo(int balance) throws IOException {
JOptionPane.showMessageDialog(null, "Option two selected ");
String sum = null;
BufferedReader br = null;
br = new BufferedReader(new FileReader("file1.txt"));
for (int i = 1; i <= 6; i++){
String line1 = br.readLine();
//JOptionPane.showMessageDialog(null, line1);
sum = sum + line1;
}
if (br != null)br.close();
String log = sum;
JOptionPane.showMessageDialog(null, log);
}
Use a StringBuilder instead of a String that is initialized as null (concatenating onto null puts the text "null" at the start of the result). You could do what you want with the following code:
StringBuilder stringBuilder = new StringBuilder();
String newLineCharacter = System.getProperty("line.separator");
for (int i = 1; i <= 6; i++){
stringBuilder.append(br.readLine());
stringBuilder.append(newLineCharacter);//note: will add new line at end as well..
}
Just insert a line break each time in your String:
for (int i = 1; i <= 6; i++) {
String line1 = br.readLine();
sum += line1 + "\n";
}
You can just add "\n" between the lines.
String sum = "";
for (int i = 1; i <= 6; i++){
String line1 = br.readLine();
sum += line1 + "\n";
}
Or more appropriately, use a StringBuilder.
StringBuilder sb = new StringBuilder();
for (int i = 1; i <= 6; i++){
String line1 = br.readLine();
if (sb.length() > 0) {
sb.append('\n');
}
sb.append(line1);
}
String sum = sb.toString();
This is more efficient :
JOptionPane.showMessageDialog(null, "Option two selected ");
StringBuilder build = new StringBuilder();
BufferedReader br = null;
br = new BufferedReader(new FileReader("file1.txt"));
for (int i = 1; i <= 6; i++){
String line1 = br.readLine();
//JOptionPane.showMessageDialog(null, line1);
build.append(line1).append("\n");
}
if (br != null)br.close();
System.out.println(build.toString());
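One more variation, as a sketch only: the loops above append the text "null" if the file has fewer than six lines, because readLine() returns null at end of file. The hypothetical helper firstLines below stops early instead, and the main method uses try-with-resources so the reader is always closed. A StringReader stands in for file1.txt here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.stream.Collectors;

public class FirstLines {
    // Joins at most maxLines lines with '\n'; stops early at end of input.
    static String firstLines(BufferedReader reader, int maxLines) {
        return reader.lines()
                .limit(maxLines)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) throws IOException {
        // StringReader is a stand-in; use new FileReader("file1.txt") for the real file.
        try (BufferedReader br = new BufferedReader(new StringReader("one\ntwo\nthree"))) {
            System.out.println(firstLines(br, 6));
        }
    }
}
```

The returned String can be passed straight to JOptionPane.showMessageDialog(null, ...).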

Java Reading 2D array in from file, numbers separated by comma

This is some code that I found to help with reading in a 2D Array, but the problem I am having is this will only work when reading a list of number structured like:
73
56
30
75
80
etc.
What I want is to be able to read multiple lines that are structured like this:
1,0,1,1,0,1,0,1,0,1
1,0,0,1,0,0,0,1,0,1
1,1,0,1,0,1,0,1,1,1
I just want to essentially import each line as an array, while structuring them like an array in the text file.
Everything I have read says to use scan.useDelimiter(","), but wherever I try to use it the program jumps straight to the catch block that prints "Error converting number". If anyone can help I would greatly appreciate it. I also saw some information about using split with the BufferedReader, but I don't know which would be better to use, or why, or how.
String filename = "res/test.txt"; // Finds the file you want to test.
try{
FileReader ConnectionToFile = new FileReader(filename);
BufferedReader read = new BufferedReader(ConnectionToFile);
Scanner scan = new Scanner(read);
int[][] Spaces = new int[10][10];
int counter = 0;
try{
while(scan.hasNext() && counter < 10)
{
for(int i = 0; i < 10; i++)
{
counter = counter + 1;
for(int m = 0; m < 10; m++)
{
Spaces[i][m] = scan.nextInt();
}
}
}
for(int i = 0; i < 10; i++)
{
//Prints out Arrays to the Console, (not needed in final)
System.out.println("Array" + (i + 1) + " is: " + Spaces[i][0] + ", " + Spaces[i][1] + ", " + Spaces[i][2] + ", " + Spaces[i][3] + ", " + Spaces[i][4] + ", " + Spaces[i][5] + ", " + Spaces[i][6]+ ", " + Spaces[i][7]+ ", " + Spaces[i][8]+ ", " + Spaces[i][9]);
}
}
catch(InputMismatchException e)
{
System.out.println("Error converting number");
}
scan.close();
read.close();
}
catch (IOException e)
{
System.out.println("IO-Error open/close of file" + filename);
}
}
I provide my code here.
public static int[][] readArray(String path) throws IOException {
//1,0,1,1,0,1,0,1,0,1
int[][] result = new int[3][10];
BufferedReader reader = new BufferedReader(new FileReader(path));
String line = null;
Scanner scanner = null;
line = reader.readLine();
if(line == null) {
return result;
}
String pattern = createPattern(line);
int lineNumber = 0;
MatchResult temp = null;
while(line != null) {
scanner = new Scanner(line);
scanner.findInLine(pattern);
temp = scanner.match();
int count = temp.groupCount();
for(int i=1;i<=count;i++) {
result[lineNumber][i-1] = Integer.parseInt(temp.group(i));
}
lineNumber++;
scanner.close();
line = reader.readLine();
}
return result;
}
public static String createPattern(String line) {
char[] chars = line.toCharArray();
StringBuilder pattern = new StringBuilder();
for(char c : chars) {
if(',' == c) {
pattern.append(',');
} else {
pattern.append("(\\d+)");
}
}
return pattern.toString();
}
The following code snippet might be helpful. The basic idea is to read each line and parse it as CSV. Be advised that CSV parsing is hard in general and usually calls for a specialized library (such as CSVReader). However, the issue at hand is relatively straightforward.
try {
String line = "";
int rowNumber = 0;
while(scan.hasNextLine()) {
line = scan.nextLine();
String[] elements = line.split(","); // split takes a String regex, not a char
int elementCount = 0;
for(String element : elements) {
int elementValue = Integer.parseInt(element);
spaces[rowNumber][elementCount] = elementValue;
elementCount++;
}
rowNumber++;
}
} // you know what goes afterwards
Since the file is read line by line, parse each line using "," as the delimiter.
So here you just create a new Scanner object for each line, using the delimiter ",".
The code looks like this, in the first for loop:
for(int i = 0; i < 10; i++)
{
Scanner newScan=new Scanner(scan.nextLine()).useDelimiter(",");
counter = counter + 1;
for(int m = 0; m < 10; m++)
{
Spaces[i][m] = newScan.nextInt();
}
}
Use the useDelimiter method in Scanner to set the delimiter to "," instead of the default whitespace delimiter.
In the sample input, each row of the 2D array begins on a new line, so "," alone is not enough; multiple delimiters have to be specified.
Example:
scan.useDelimiter(",|\\r\\n");
This sets the delimiter to both "," and carriage return + new line characters.
Why use a scanner for a file? You already have a BufferedReader:
FileReader fileReader = new FileReader(filename);
BufferedReader reader = new BufferedReader(fileReader);
Now you can read the file line by line. The tricky bit is you want an array of int
int[][] spaces = new int[10][10];
String line = null;
int row = 0;
while ((line = reader.readLine()) != null)
{
String[] array = line.split(",");
for (int i = 0; i < array.length; i++)
{
spaces[row][i] = Integer.parseInt(array[i]);
}
row++;
}
The other approach is using a Scanner for the individual lines:
while ((line = reader.readLine()) != null)
{
Scanner s = new Scanner(line).useDelimiter(","); // useDelimiter takes a String or Pattern
int col = 0;
while (s.hasNextInt())
{
spaces[row][col] = s.nextInt();
col++;
}
row++;
}
The other thing worth noting is that you're using an int[10][10]; this requires you to know the length of the file in advance. A List<int[]> would remove this requirement.
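A rough sketch of that List<int[]> variant, assuming every non-blank line holds comma-separated integers. The names CsvRows, readRows and parseRow are invented for this example, and a StringReader stands in for res/test.txt.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvRows {
    // Parses one line such as "1,0,1" into an int array.
    static int[] parseRow(String line) {
        String[] parts = line.split(",");
        int[] row = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            row[i] = Integer.parseInt(parts[i].trim());
        }
        return row;
    }

    // Reads every non-blank line; the number of rows need not be known in advance.
    static List<int[]> readRows(BufferedReader reader) {
        return reader.lines()
                .filter(line -> !line.trim().isEmpty())
                .map(CsvRows::parseRow)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // StringReader is a stand-in; use new FileReader(filename) for a real file.
        BufferedReader reader = new BufferedReader(
                new StringReader("1,0,1,1\n1,0,0,1\n1,1,0,1\n"));
        for (int[] row : readRows(reader)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Integer.parseInt throws NumberFormatException on a malformed field, so a real program may want to catch that and report the offending line.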

Java HashMap size limit? Some keys are disappearing in Bigram Frequency Count

I am writing a simple bigram frequency count algorithm in Java and encountering a problem I don't know how to fix.
My source file is a 9MB .txt file with random words, separated by spaces.
When I run the script limiting the input to the first 100 lines, I get a value of 1 for the frequency of the bigram "hey there".
But when I remove the restriction to only scan the first 100 lines and instead scan the entire file, I get a value of null for the same bigram search. The key/value pair in the HashMap is now null.
I am storing all the bigrams in a HashMap, and using a BufferedReader to read the text file.
What is causing the bigram (key) to be removed from or overwritten in the HashMap? It shouldn't matter if I am reading the entire file or just the first part of it.
public class WordCount {
public static ArrayList<String> words = new ArrayList<String>();
public static Map<String, Integer> bi_count = new HashMap<String, Integer>();
public static void main(String[] args) {
BufferedReader br = null;
try {
String sCurrentLine;
br = new BufferedReader(new FileReader(args[0]));
System.out.println("\nProcessing file...");
while (br.readLine() != null) {
// for (int i = 0; i < 53; i++ ) {
sCurrentLine = br.readLine();
if (sCurrentLine != null) {
String[] input_words = sCurrentLine.split("\\s+");
for (int j = 0; j < input_words.length; j++) {
words.add(input_words[j]);
}
}
}
}
catch (IOException e) {
e.printStackTrace();
}
finally {
try {
if (br != null)br.close();
countWords();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
private static void countWords() {
for (int k = 0; k < words.size(); k++) {
String word = words.get(k);
String next = "";
if (k != words.size() - 1) {
next = words.get(k+1);
}
String two_word = word + " " + next;
if (bi_count.containsKey(two_word)) {
int current_count = bi_count.get(two_word);
bi_count.put (two_word, current_count + 1);
}
else {
bi_count.put( two_word, 1);
}
}
System.out.println("File processed successfully.\n");
}
I'm not totally confident this is the cause of your problem, but you are not reading all lines of your input file.
while (br.readLine() != null) {
sCurrentLine = br.readLine();
The line read in the while() condition is not being processed at all, so you are skipping every other line.
Instead try this:
while ((sCurrentLine = br.readLine()) != null) {
//now use sCurrentLine...
}
This block of code is wrong because readline is called twice:
while (br.readLine() != null) {
// for (int i = 0; i < 53; i++ ) {
sCurrentLine = br.readLine();
if (sCurrentLine != null) {
String[] input_words = sCurrentLine.split("\\s+");
for (int j = 0; j < input_words.length; j++) {
words.add(input_words[j]);
}
}
}
I would suggest:
while ((sCurrentLine = br.readLine()) != null) {
String[] input_words = sCurrentLine.split("\\s+");
for (int j = 0; j < input_words.length; j++) {
words.add(input_words[j]);
}
}
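Once every line is read, the counting step itself can be written more compactly. This is only a sketch with hypothetical names (BigramCount, countBigrams); unlike the question's countWords, it does not pair the last word with an empty string.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BigramCount {
    // Counts each pair of adjacent words, keyed as "first second".
    static Map<String, Integer> countBigrams(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (int k = 0; k + 1 < words.size(); k++) {
            String bigram = words.get(k) + " " + words.get(k + 1);
            // merge inserts 1 for a new key, otherwise adds 1 to the old count.
            counts.merge(bigram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hey", "there", "hey", "there");
        System.out.println(countBigrams(words).get("hey there"));
    }
}
```

A HashMap has no practical size limit for a 9MB input, so a missing key here points to the input lines being skipped rather than entries being dropped.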
