I'm writing stop-word removal code for data cleaning. I followed a tutorial on YouTube: https://www.youtube.com/watch?v=ckQUlI7x7hI. His code works and shows output, but mine doesn't.
I'm using English stop words; examples are "a", "an", "away", "keeps". For the input "An apple a day keeps the doctor away", the output should be "apple day the doctor".
This is the content of my file: https://ufile.io/gikev
Here is the code:
import java.io.FileInputStream;
import java.util.ArrayList;
public class DataCleaning {
public static void main(String[] args) {
ArrayList sw = new ArrayList<>();
try{
FileInputStream x = new FileInputStream("/Users/Dan/Desktop/DATA/stopwords.txt");
byte b[] = new byte[x.available()];
x.read(b);
x.close();
String data[] = new String(b).split("\n");
for(int i = 0; i < data.length; i++)
{
sw.add(data[i].trim());
}
FileInputStream xx = new FileInputStream("/Users/Dan/Desktop/DATA/cleandata.txt");
byte bb[] = new byte[xx.available()];
xx.read(bb);
xx.close();
String dataa[] = new String(bb).split("\n");
for(int i = 0; i < dataa.length; i++)
{
String file = "";
String s[] = dataa[i].split("\\s");
for(int j = 0; j < s.length; i++)
{
if(sw.contains(s[j].trim().toLowerCase()))
{
file=file + s[j] + " ";
}
}
System.out.println(file + "\n");
}
} catch(Exception a){
a.printStackTrace();
}
}
}
When I run mine, it never prints any output.
What should I do?
There are three issues with your code:
1) You are incrementing the wrong variable in the innermost loop, which results in an infinite loop: j is never incremented, so it always stays less than s.length. Change this line:
for (int j = 0; j < s.length; i++) {
to
for (int j = 0; j < s.length; j++) {
2) To print the words that are not stop words, you need to negate your if condition as follows:
if (!sw.contains(s[j].trim().toLowerCase()))
3) Also, make sure the entries in stopwords.txt are separated by \n (newline), one per line, because that is what you split on; the file in the link you shared is not formatted that way.
I recommend indenting your code and giving your variables meaningful names. Debugging issues like this will be much simpler.
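Putting the three fixes together, the filtering step might look like the sketch below. It is not the tutorial's code: the class and method names (StopWordFilter, removeStopWords) are made up for illustration, and the stop words are passed in as a set instead of being read from stopwords.txt, so the example runs without any files.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not taken from the tutorial.
class StopWordFilter {

    // Returns the line with every stop word removed, comparing case-insensitively.
    static String removeStopWords(String line, Set<String> stopWords) {
        StringJoiner kept = new StringJoiner(" ");
        for (String word : line.trim().split("\\s+")) {
            // Keep the word only if it is NOT a stop word (note the negation).
            if (!word.isEmpty() && !stopWords.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // Assumption: the stop words are already loaded, one entry per word.
        Set<String> stopWords = new HashSet<>(Arrays.asList("a", "an", "away", "keeps"));
        // Prints: apple day the doctor
        System.out.println(removeStopWords("An apple a day keeps the doctor away", stopWords));
    }
}
```

Using a Set instead of an ArrayList for the stop words also makes each contains call a constant-time lookup on average.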
This is my first post, so I'm sorry if I didn't do it correctly.
I'm trying to solve this USACO problem, but my code throws an error every time on this particular test case, and it seems to be related to the .equals line.
I know it's a lot of code, but it's a really simple problem.
Here's the code:
public class gift1 {
public static void main(String[] Args) throws IOException {
Scanner sc = new Scanner(new File("gift1.in"));
int peeps = sc.nextInt();
String[][] chart = new String[2][peeps];
sc.nextLine();
for(int i = 0; i < peeps; i++) {
chart[0][i] = sc.nextLine();
chart[1][i] = "0";
}
while(sc.hasNextLine()) {
String giver = sc.next(); //we need to find giver
int indexOfgiver = -1;
for(int i = 0; i < peeps; i++) { //finds indexOfgiver
if(giver.equals(chart[0][i])) {
indexOfgiver = i;
break;
}
}
int moneyTogive = sc.nextInt();
chart[1][indexOfgiver] = Integer.toString(Integer.parseInt(chart[1][indexOfgiver]) - moneyTogive);
int numReceivers = sc.nextInt();
if(numReceivers == 0) {
chart[1][indexOfgiver] = Integer.toString( Integer.parseInt(chart[1][indexOfgiver]) );
}
else {
chart[1][indexOfgiver] = Integer.toString( Integer.parseInt(chart[1][indexOfgiver]) + (int) Math.floor(moneyTogive%numReceivers) );
}
String[] receivers = new String[numReceivers];
for(int i = 0; i < numReceivers; i++) { //list the receivers' names in an array
receivers[i] = sc.next();
}
for(int i = 0; i < numReceivers; i++) { //give money to the receivers
for(int j = 0; j < peeps; j++) {
if(chart[0][j].equals(receivers[i])) {
chart[1][j] = Integer.toString( Integer.parseInt(chart[1][j]) + (int) Math.floor(moneyTogive/numReceivers));
}
}
}
}
PrintWriter fW = new PrintWriter("gift1.out");
for(int i = 0; i < peeps; i++)
System.out.println(chart[0][i] + " " + chart[1][i]);
}
}
The error occurs on line 31 (the ugly one that starts with chart[1][indexOfgiver]), and it says it's an ArrayIndexOutOfBoundsException, which means that the if statement that is supposed to change the value of indexOfgiver isn't working for some reason, even though the file is correct.
Here's the file("gift1.in") I'm reading from with the scanner:
10
mitnik
Poulsen
Tanner
Stallman
Ritchie
Baran
Spafford
Farmer
Venema
Linus
mitnik
300 3
Poulsen
Tanner
Baran
Poulsen
1000 1
Tanner
Spafford
2000 9
mitnik
Poulsen
Tanner
Stallman
Ritchie
Baran
Farmer
Venema
Linus
Tanner
Even the debugger shows that during the first iteration of the while loop, giver is equal to "mitnik" and so is chart[0][0], but the loop isn't setting indexOfgiver to i. What exactly is happening?
Some names in the input file are followed by a space, so the entry in the chart array is "Spafford " instead of "Spafford", which is the value you are trying to match.
Since it doesn't match, indexOfgiver remains -1, and using it as an array index causes the ArrayIndexOutOfBoundsException.
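One way to make the lookup robust against such stray whitespace is to trim each name as it is read. The sketch below is only an illustration under that assumption: the class and method names (TrimNamesDemo, readNames, indexOf) are hypothetical, and the input comes from an in-memory string rather than gift1.in.

```java
import java.util.Scanner;

// Hypothetical demo; the input is an in-memory string instead of gift1.in.
class TrimNamesDemo {

    // Reads the count and then that many names, trimming each line.
    static String[] readNames(Scanner sc) {
        int count = Integer.parseInt(sc.nextLine().trim());
        String[] names = new String[count];
        for (int i = 0; i < count; i++) {
            names[i] = sc.nextLine().trim(); // drops stray spaces around the name
        }
        return names;
    }

    // Returns the index of name in names, or -1 if it is absent.
    static int indexOf(String[] names, String name) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(name)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "Spafford " carries a trailing space, as described above.
        Scanner sc = new Scanner("2\nmitnik\nSpafford \nSpafford\n");
        String[] names = readNames(sc);
        String giver = sc.next(); // next() never returns surrounding whitespace
        System.out.println(indexOf(names, giver)); // prints 1
    }
}
```

Checking the returned index for -1 before using it would also turn the exception into a clearer error message.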
This is the initial challenge I'm trying to address: write a method that
1) takes two arguments: a "source" English word as a string, and an English dictionary supplied as an array
2) returns a list of English words as an array
The words returned are those from the dictionary that have four or more consecutive letters in common with the "source" word. For example, the word MATTER has the four letters in a row "ATTE" in common with ATTEND.
However, the code gives me errors at the substring call.
Below is the code for your reference.
public class FourLetterInCommon {
static String wrd = "split";
static String[] d = new String[]{"SPLITS", "SPLITTED", "SPLITTER", "SPLITTERS", "SPLITTING", "SPLITTINGS", "SPLITTISM", "SPLITTISMS", "SPLITTIST", "SPLITTISTS"};
public static void main(String[] args){
System.out.println(fourletters (wrd, d));
}
public static List<String> fourletters (String word, String[] dict){
int dictsize = dict.length;
int wordlength = word.length();
List<String> Commonletters = new ArrayList<String>();
for(int i = 0; i<=dictsize; i++) {
for (int j=0; j<=wordlength;) {
if(dict[i].contains(word.substring(i, 5)))
{
Commonletters.add(dict[i]);
}
break;
}
}
return Commonletters;
}
}
This is the error message I get:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(Unknown Source)
at FourLetterInCommon.fourletters(FourLetterInCommon.java:22)
at FourLetterInCommon.main(FourLetterInCommon.java:10)
What does the error mean? Apologies, but I'm a bit clueless at this stage.
Quite a few issues here.
1)
for(int i = 0; i<=dictsize; i++) {
should be
for(int i = 0; i<dictsize; i++) {
2)
for (int j=0; j<=wordlength;) {
should be
for (int j=0; j<=wordlength-4; j++) {
3)
if(dict[i].contains(word.substring(i, 5)))
should be
if(dict[i].contains(word.substring(j, j+4)))
4)
I don't believe you really want to break; there. I'm guessing you want to break when you find a match, so it should be inside the if statement.
Corrected code:
import java.util.ArrayList;
import java.util.List;
public class FourLetterInCommon
{
static String wrd = "SPLIT";
static String[] d = new String[] { "SPLITS", "SPLITTED", "SPLITTER", "SPLITTERS", "SPLITTING", "SPLITTINGS",
"SPLITTISM", "SPLITTISMS", "SPLITTIST", "SPLITTISTS" };
public static void main(String[] args)
{
System.out.println(fourletters(wrd, d));
}
public static List<String> fourletters(String word, String[] dict)
{
int dictsize = dict.length;
int wordlength = word.length();
List<String> Commonletters = new ArrayList<String>();
for (int i = 0; i < dictsize; i++)
{
for (int j = 0; j <= wordlength - 4; j++)
{
if (dict[i].contains(word.substring(j, j + 4)))
{
Commonletters.add(dict[i]);
break;
}
}
}
return Commonletters;
}
}
Output:
[SPLITS, SPLITTED, SPLITTER, SPLITTERS, SPLITTING, SPLITTINGS, SPLITTISM, SPLITTISMS, SPLITTIST, SPLITTISTS]
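Note that the corrected code also changes wrd from "split" to "SPLIT". String.contains is case-sensitive, so a lower-case source word would not match the upper-case dictionary. The following sketch normalizes both sides before comparing; it is one possible approach, and the names CommonRunDemo and sharingFourLetters are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical variant that ignores case; not part of the original answer.
class CommonRunDemo {

    // Returns the dictionary entries sharing at least four consecutive letters with word.
    static List<String> sharingFourLetters(String word, String[] dict) {
        String source = word.toUpperCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String entry : dict) {
            String candidate = entry.toUpperCase(Locale.ROOT);
            // Slide a four-letter window over the source word; the loop body
            // never runs when the word has fewer than four letters.
            for (int j = 0; j + 4 <= source.length(); j++) {
                if (candidate.contains(source.substring(j, j + 4))) {
                    matches.add(entry);
                    break; // one shared run is enough for this entry
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] dict = {"SPLITS", "ATTEND", "Matter"};
        System.out.println(sharingFourLetters("split", dict));  // [SPLITS]
        System.out.println(sharingFourLetters("matter", dict)); // [ATTEND, Matter]
    }
}
```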
The following line would throw an Exception:
if(dict[i].contains(word.substring(i, 5)))
Here, i ranges from 0 to dict.length, which is 10 in this case, while word contains only 5 characters. As soon as i exceeds 5, the begin index is greater than the end index, so substring throws a StringIndexOutOfBoundsException (the -1 in the message is 5 - 6).
If you want to check for a fixed number of characters (four here), you should index into the word with its own loop variable:
for (int j = 0; j + 4 <= wordlength; j++) {
if(dict[i].contains(word.substring(j, j + 4)))
package com.cp.javapractice;
import java.util.ArrayList;
import java.util.Scanner;
public class Cp {
public static void main(String args[]) {
ArrayList al = new ArrayList();
Scanner s = new Scanner(System.in);
String str = null;
str = new String();
System.out.println("Enter the string which you want to remove the duplicates");
str = s.nextLine();
String arr[] = str.split(" ");
for (int k = 0; k < arr.length; k++) {
al.add(arr[k]);
}
try {
for (int i = 0; i < arr.length; i++) {
for (int j = i + 1; j < arr.length; j++) {
if (arr[i].equalsIgnoreCase(arr[j])) {
al.remove(j);
}
}
}
System.out.println(al);
}
catch (Exception e) {
System.out.println(e);
}
}
}
I want to remove the repeated words from a string entered by the user. So I split the string on spaces using the split method and put the words into both an array and an ArrayList.
I then iterate through the array and, whenever two words are equal, remove the duplicate from the ArrayList. But during removal it throws an IndexOutOfBoundsException.
The code works for small inputs but throws the exception for larger ones.
For example, I get the problem with a string of 13 words.
My full code is shown above.
for (int i = 0; i < al.size(); i++) {
for (int j = i + 1; j < al.size(); j++) {
if (al.get(i).equals(al.get(j))) {
al.remove(j);
j--; // the next element has shifted into index j, so check it again
}
}
}
The exception occurs because you are using arr.length instead of al.size(). With every removal, the size of the ArrayList al decreases, so the loop bounds must use the current size of the list rather than the length of the array.
for (int i = 0; i < al.size(); i++) { // change arr.length to al.size()
for (int j = i + 1; j < al.size(); j++) { // change arr.length to al.size()
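if (al.get(i).toString().equalsIgnoreCase(al.get(j).toString())) { // compare list elements; arr indices no longer line up after a removal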
al.remove(j);
}
}
}
I would recommend checking out HashSet and TreeSet, which reduce the effort of removing duplicates.
An implementation using HashSet:
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
public class Cp {
public static void main(String args[]) {
Scanner s = new Scanner(System.in);
String str = null;
str = new String();
System.out.println("Enter the string which you want to remove the duplicates");
str = s.nextLine();
String arr[] = str.split(" ");
Set<String> ts = new HashSet<String>(Arrays.asList(arr)); // -> added only this line
System.out.println(ts);
}
}
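Keep in mind that a plain HashSet does not preserve the order of the words and compares them case-sensitively, whereas the question uses equalsIgnoreCase. If both properties matter, something along these lines may work; it is a sketch with invented names (DedupDemo, removeDuplicates), and it takes the sentence as a parameter instead of reading it from System.in.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: keeps the first occurrence of each word, ignoring case.
class DedupDemo {

    static List<String> removeDuplicates(String sentence) {
        Set<String> seen = new HashSet<>();      // lower-cased words met so far
        List<String> result = new ArrayList<>(); // words in their original order
        for (String word : sentence.trim().split("\\s+")) {
            // Set.add returns false when the lower-cased word was already present.
            if (!word.isEmpty() && seen.add(word.toLowerCase(Locale.ROOT))) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [the, cat, dog, bird]
        System.out.println(removeDuplicates("the cat The dog the cat bird"));
    }
}
```

Because nothing is removed from a list while iterating over it, the index problems discussed above cannot occur here.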
The problem is your second loop. It starts at i+1, but i runs from 0 to length-1, so on the last pass j starts at length-1+1, which is beyond the end of the array.
So change the first for loop to:
for (int i = 0; i < arr.length - 1; i++)
I want to read a file and store its content in a 2D array, but I don't know the size of the file in advance, because the program should read different files. So the first problem is right after "new char". I searched for the problem and found that "matrix[x][y]=zeile.charAt(x);"
should be right, but that throws a NullPointerException when I write any number into the first brackets of new char.
Could somebody explain and give some ideas or solutions? Thank you :)
import java.io.*;
class Unbenannt
{
public static void main(String[] args) throws IOException
{
FileReader fr = new FileReader("Level4.txt");
BufferedReader br = new BufferedReader(fr);
String zeile = br.readLine();
char [][] matrix = new char [][];
while(zeile != null )
{
int y = 0;
for(int x = 0; x < zeile.length(); x++) {
matrix[x][y] = zeile.charAt(x);
}
y++;
} System.out.print(matrix);
br.close();
}
}
Arrays are stored as contiguous blocks in memory in order to achieve O(1) indexed access, which is why you need to specify their size when you create them. If you insist on arrays (rather than a dynamic ADT such as List), you'll need to know the dimensions in advance.
What you could do is store the file lines temporarily in a list and find out the maximum line length, i.e.:
List<String> lines = new ArrayList<String>();
String zeile = null;
int max = 0;
while ((zeile = br.readLine()) != null) {
lines.add(zeile);
if (zeile.length() > max)
max = zeile.length();
}
char[][] matrix = new char[lines.length()][max];
// populate the matrix:
for (int i = 0; i < lines.length(); i++) {
String line = lines.get(i);
for (int j = 0; j < line.length(); j++) {
matrix[i][j] = line.charAt(j);
}
}
Note that since char is a primitive, every cell of the inner arrays is initialized with the default value 0 (the character '\u0000', not the digit '0'), so rows for lines that are shorter than the longest one will have trailing zero characters.
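If those trailing zero characters are a problem, each row can be filled with a padding character before the line is copied in. The sketch below assumes a space is an acceptable filler; PaddedMatrixDemo and toMatrix are hypothetical names, and the lines are passed in as a list instead of being read from a file.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds a rectangular char matrix padded with spaces.
class PaddedMatrixDemo {

    static char[][] toMatrix(List<String> lines) {
        int max = 0;
        for (String line : lines) {
            max = Math.max(max, line.length());
        }
        char[][] matrix = new char[lines.size()][max];
        for (int i = 0; i < lines.size(); i++) {
            Arrays.fill(matrix[i], ' ');                   // replace the default '\u0000' cells
            String line = lines.get(i);
            line.getChars(0, line.length(), matrix[i], 0); // copy the line into the row
        }
        return matrix;
    }

    public static void main(String[] args) {
        char[][] matrix = toMatrix(Arrays.asList("xxx", "x", "xx"));
        for (char[] row : matrix) {
            // The brackets make the padding visible: [xxx], [x  ], [xx ]
            System.out.println("[" + new String(row) + "]");
        }
    }
}
```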
You initialize the outer array (char[][]) but never initialize any of the inner arrays. This leads to the NullPointerException.
In addition, your while loop looks wrong: you read only the first line of the file before the loop and never read another one inside it, so zeile never becomes null and the loop processes the first line over and over without terminating.
Thank you all! It works! But there is still one problem. I changed lines.length() to lines.size(), because List has no length() method. The remaining problem is the output: it shows, for example, xxxx xxxx on a single line instead of "xxx", "x x" and "xxx" below one another.
How can I add a line break?
My program code is:
import java.io.*;
import java.util.ArrayList;
class Unbenannt
{
public static void main(String[] args) throws IOException
{
FileReader fr = new FileReader("Level4.txt");
BufferedReader br = new BufferedReader(fr);
ArrayList<String> lines = new ArrayList<String>();
String zeile = null;
int max = 0;
while ((zeile = br.readLine()) != null) {
lines.add(zeile);
if (zeile.length() > max)
max = zeile.length();
}
char [][] matrix = new char[lines.size()][max];
for(int i = 0; i < lines.size(); i++) {
String line = lines.get(i);
for(int j = 0; j < line.length(); j++) {
matrix[i][j] = line.charAt(j);
System.out.print(matrix[i][j]);
}
}
br.close();
}
}
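A likely fix for the missing line breaks is to emit a newline once per row, after the inner loop has finished and before the outer loop moves on. The sketch below shows the idea on a small hard-coded matrix; MatrixPrintDemo and render are illustrative names, and the text is built into a String only so that it is easy to check.

```java
// Hypothetical demo: one output line per matrix row.
class MatrixPrintDemo {

    static String render(char[][] matrix) {
        StringBuilder out = new StringBuilder();
        for (char[] row : matrix) {
            for (char c : row) {
                out.append(c);
            }
            out.append('\n'); // line break after each row, outside the inner loop
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char[][] matrix = {
            {'x', 'x', 'x'},
            {'x', ' ', 'x'},
            {'x', 'x', 'x'}
        };
        System.out.print(render(matrix));
    }
}
```

In the program above, the equivalent change would be a System.out.println(); placed directly after the closing brace of the inner for (int j ...) loop.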
I'm supposed to sort a char array and print it in descending order. It should be simple, but I'm not getting the output I want. Every solution I find online tells me Arrays.sort is okay to use, but am I supposed to use another method? Or am I overlooking something?
public class mainClassTextFile {
public static void main(String[] args) throws IOException {
FileReader fileReader = new FileReader("file.txt");
String fileContents = "";
int i;
int loopcount = 0;
int count = 0;
while((i = fileReader.read())!=-1){
char ch = (char)i;
fileContents = fileContents + ch;
}
char[] ch=fileContents.toCharArray();
for(int n = 0; n < ch.length; n++) {
boolean addCharacter=true;
for(int t = 0; t < n; t++) {
if (ch[n] == (fileContents.charAt(t)))
addCharacter=false;
}
if (addCharacter) {
for(int j = 0; j < fileContents.length(); j++) {
if(ch[n]==fileContents.charAt(j))
count=count+1;
}
Arrays.sort(ch);
System.out.print(ch[n] + ": "+(count));
System.out.println();
count=0;
loopcount++;
}
}
}
}
The output is supposed to be all the characters in the text, counted and sorted, however this is the result:
: 3339
X: 4
X: 4
X: 4
X: 4
[: 2
]: 2
If I comment out Arrays.sort(), all the characters in the text file are counted correctly, but they are neither sorted nor in descending order!
Your for loop doesn't really make sense to me, but if you want to read a string from a file, convert it to a char array, and sort it, you can do it this way:
FileReader fileReader = new FileReader("yourFile.txt");
StringBuilder br = new StringBuilder();
while(true){
int ch = fileReader.read();
if(ch==-1)
break;
char chArr = (char)ch;
br.append(chArr);
}
char[] ch=br.toString().replaceAll("\\s+", "").toCharArray(); //removed all spaces
System.out.println(Arrays.toString(ch));
Arrays.sort(ch);
System.out.println("After sorting : "+Arrays.toString(ch)); // ascending order
for(int i = ch.length - 1; i >= 0; i--)
System.out.println(ch[i]); //descending order
Your code had a lot of mistakes, including conceptual ones, so instead of going into detailed explanations of how it could be corrected, I am submitting code which works and which I tried to keep as similar to yours as possible:
public static void main(String[] args) throws IOException {
final String fileContents;
try (Scanner sc = new Scanner(new File("file.txt"))) {
fileContents = sc.useDelimiter("\\Z").next();
}
final char[] ch = fileContents.toCharArray();
Arrays.sort(ch);
int prevChar = -1, count = 0;
for (int i = 0; i < ch.length; i++) {
if (ch[i] != prevChar) {
if (count > 0) System.out.println((char)prevChar + ": "+count);
count = 1;
prevChar = ch[i];
} else count++;
}
if (count > 0) System.out.println((char)prevChar + ": "+count);
}
Note that I took the liberty to completely change the routine to load the whole file as a String. This is because I regard file reading as a side concern here.
The loop works by going through the sorted array and emitting count each time it encounters a character different from the previous one. In the end we have a duplicated line of code which prints the final character.
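The code above prints the counts in ascending character order. Since the question asks for descending order, one option is to walk the sorted array from the end. This is a sketch under the assumption that "descending" refers to the character code and not to the count; CharCountDemo and countDescending are made-up names, and the text is passed in as a String rather than read from file.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical variant: character counts, highest character code first.
class CharCountDemo {

    static List<String> countDescending(String text) {
        char[] ch = text.toCharArray();
        Arrays.sort(ch); // ascending; the loop below walks it backwards
        List<String> lines = new ArrayList<>();
        int i = ch.length - 1;
        while (i >= 0) {
            char current = ch[i];
            int count = 0;
            // Consume the whole run of equal characters.
            while (i >= 0 && ch[i] == current) {
                count++;
                i--;
            }
            lines.add(current + ": " + count);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints "c: 2", "b: 2" and "a: 3", one per line.
        for (String line : countDescending("abcabca")) {
            System.out.println(line);
        }
    }
}
```

Handling each run in an inner loop also avoids the duplicated print statement after the loop.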