Counting lines, words, characters and top ten words? - java

Hi I'm pretty new to Stack Overflow so I hope that I'm doing this correctly and that someone out there has the answer I need.
I'm currently coding a program in Java with the Eclipse IDE, and my question is this:
I need a snippet of code that does the following:
It's supposed to read a .txt file containing text and, from that file,
count the number of rows and print it,
count the number of words and print it,
count the number of characters and print it.
And finally build a list of the top 10 most used words and print that.
All the printing is done with System.out.println.
I'm pretty new to Java and am having some difficulties.
Is there anyone out there who can provide me with these lines of code, or who knows where I can find them? I want to study the code provided; that's how I learn best. =)
Thanks to all
Didn't find the edit button, sorry...
I added this to my question:
Hehe, it's an assignment but not a homework assignment. OK, I see. Well, I could provide what I've done so far. I think I'm pretty close, but it's not working for me. Is there anything I have missed?
// Class Tip
import java.io.*;
import java.util.*;

class Tip {
    public static void main(String[] args) throws Exception {
        String root = System.getProperty("user.dir");
        InputStream is = new FileInputStream(root + "\\tip.txt");
        Scanner scan = new Scanner(is);
        String tempString = "";
        int lines = 0;
        int words = 0;
        Vector<Integer> wordLength = new Vector<Integer>();
        int avarageWordLength = 0;
        while (scan.hasNextLine()) {
            tempString = scan.nextLine();
            lines++;
        }
        is.close();
        is = new FileInputStream(root);
        scan = new Scanner(is);
        while (scan.hasNext()) {
            tempString = scan.next();
            wordLength.add(tempString.length());
            words++;
        }
        for (Integer i : wordLength) {
            avarageWordLength += i;
        }
        avarageWordLength /= wordLength.size();
        System.out.println("Lines : " + lines);
        System.out.println("Words : " + words);
        System.out.println("Words Avarage Length : " + avarageWordLength);
        is.close();
    }
}

This sounds a bit too much like a homework assignment to warrant providing a full answer, but I'll give you some tips on where to look in the Java API:
FileReader and BufferedReader for getting the data in.
Collections API for storing your data
A custom data structure for storing your list of words and occurrence counts
Comparator or Comparable for sorting your data structure to get the top 10 list out
Once you've started work and have something functioning and need specific help, come back here with specific questions and then we'll do our best to help you.
Good luck!
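Following those hints, one possible sketch (not the only way; it assumes a "word" is a whitespace-separated token, lower-cased for counting, and the class name is made up for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TextStats {
    // Count lines, words and characters, printing each, and collect word frequencies.
    public static Map<String, Integer> count(BufferedReader in) throws IOException {
        Map<String, Integer> freq = new HashMap<>();
        int lines = 0, words = 0, chars = 0;
        String line;
        while ((line = in.readLine()) != null) {
            lines++;
            chars += line.length();
            for (String w : line.toLowerCase().split("\\s+")) {
                if (w.isEmpty()) continue;
                words++;
                freq.merge(w, 1, Integer::sum); // add 1, or start at 1
            }
        }
        System.out.println("Lines: " + lines);
        System.out.println("Words: " + words);
        System.out.println("Chars: " + chars);
        return freq;
    }

    // Top ten words: sort the map entries by descending count and take the first ten.
    public static List<String> topTen(Map<String, Integer> freq) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> b.getValue() - a.getValue());
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(10, entries.size()); i++)
            top.add(entries.get(i).getKey());
        return top;
    }
}
```

Feed it a BufferedReader over a FileReader for the real file; the StringReader import is only there so the sketch can be exercised without a file on disk.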

Typing "java count words example" into Google came up with a few suggestions.
This link looks to be a decent starting point.
This simple example from here might also give you some ideas:
public class WordCount {
    public static void main(String[] args) {
        System.out.println(java.util.regex.Pattern.compile("[\\w]+").split(args[0].trim()).length);
    }
}

Here's a solution:
public static void main(String[] args) {
    int nRows = 0;
    int nChars = 0;
    int nWords = 0;
    final HashMap<String, Integer> map = new HashMap<String, Integer>();
    try {
        BufferedReader input = new BufferedReader(new FileReader("c:\\test.txt"));
        try {
            String line = null;
            Pattern p = Pattern.compile("[^\\w]+");
            while ((line = input.readLine()) != null) {
                nChars += line.length();
                nRows++;
                String[] words = p.split(line);
                nWords += words.length;
                for (String w : words) {
                    String word = w.toLowerCase();
                    Integer n = map.get(word);
                    if (null == n)
                        map.put(word, 1);
                    else
                        map.put(word, n.intValue() + 1);
                }
            }
            TreeMap<String, Integer> treeMap = new TreeMap<String, Integer>(new Comparator<String>() {
                @Override
                public int compare(String o1, String o2) {
                    if (map.get(o1) > map.get(o2))
                        return -1;
                    else if (map.get(o1) < map.get(o2))
                        return 1;
                    else
                        return o1.compareTo(o2);
                }
            });
            treeMap.putAll(map);
            System.out.println("N.º Rows: " + nRows);
            System.out.println("N.º Words: " + nWords);
            System.out.println("N.º Chars: " + nChars);
            System.out.println();
            System.out.println("Top 10 Words:");
            for (int i = 0; i < 10; i++) {
                Entry<String, Integer> e = treeMap.pollFirstEntry();
                System.out.println("Word: " + e.getKey() + " Count: " + e.getValue());
            }
        } finally {
            input.close();
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}

Not a complete answer, but I'd recommend looking at Sun's Java I/O tutorials. They deal with reading and writing files, especially the tutorial on Scanners and Formatters.
Here is the summary of the tutorial from the website:
Programming I/O often involves translating to and from the neatly formatted data humans like to work with. To assist you with these chores, the Java platform provides two APIs. The scanner API breaks input into individual tokens associated with bits of data. The formatting API assembles data into nicely formatted, human-readable form.
So it looks to me like these are exactly the APIs you are asking about.
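For instance, a Scanner can break input into whitespace-separated tokens directly; a minimal sketch (the class and method names here are made up for illustration):

```java
import java.util.Scanner;

public class ScanDemo {
    // Count whitespace-separated tokens using Scanner's default delimiter.
    public static int countTokens(String text) {
        int n = 0;
        Scanner sc = new Scanner(text);
        while (sc.hasNext()) { // hasNext()/next() walk the tokens one by one
            sc.next();
            n++;
        }
        sc.close();
        return n;
    }
}
```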

You might get some leverage out of using Apache Commons Lang, which has a handy utility class called WordUtils that does some simple things with sentences and words.


Why is my radix sorting algorithm returning a partially sorted list?

First off I want to point out that this assignment is homework, *but* I am not looking for a direct answer, rather a hint or some insight as to why my implementation is not working.
Here is the given: we are provided with a list of words, each 7 characters long, and are asked to sort them using the radix sort algorithm while using queues.
EDIT 1: Updated Code
Here is my code:
import java.util.*;
import java.io.File;

public class RadixSort {
    public void radixSort() {
        ArrayList<LinkedQueue> arrayOfBins = new ArrayList<LinkedQueue>();
        LinkedQueue<String> masterQueue = new LinkedQueue<String>();
        LinkedQueue<String> studentQueue = new LinkedQueue<String>();
        // Creating the bins
        for (int i = 0; i < 26; i++) {
            arrayOfBins.add(new LinkedQueue<String>());
        }
        // Getting the file name and reading the lines from it
        try {
            Scanner input = new Scanner(System.in);
            System.out.print("Enter the file name with its extension: ");
            File file = new File(input.nextLine());
            input = new Scanner(file);
            while (input.hasNextLine()) {
                String line = input.nextLine();
                masterQueue.enqueue(line);
            }
            input.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        for (int p = 6; p >= 0; p--) {
            for (LinkedQueue queue : arrayOfBins) {
                queue.clear();
            }
            while (masterQueue.isEmpty() == false) {
                String s = (String) masterQueue.dequeue();
                char c = s.charAt(p);
                arrayOfBins.get(c - 'a').enqueue(s);
            }
            for (LinkedQueue queue : arrayOfBins) {
                studentQueue.append(queue);
            }
        }
        masterQueue = studentQueue;
        System.out.println(masterQueue.size());
        System.out.println(masterQueue.dequeue());
    }

    public static void main(String[] args) {
        RadixSort sort = new RadixSort();
        sort.radixSort();
    }
}
I can see so many problems, I'm not sure how you get an answer at all.
Why do you have two nested outermost loops from 0 to 6?
Why don't you ever clear studentQueue?
The j loop doesn't execute as many times as you think it does.
Aside from definite bugs, the program doesn't output anything -- are you just looking at the result in the debugger? Also are you actually allowed to assume that the words will contain no characters besides lowercase letters?
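For comparison (not a drop-in fix for the code above, since the question's LinkedQueue class isn't shown), the same LSD radix-sort idea can be sketched with the standard ArrayDeque standing in for the custom queue. It assumes equal-length, all-lowercase words:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class RadixSketch {
    // LSD radix sort for equal-length lowercase words, using 26 bucket queues.
    public static List<String> sort(List<String> words, int length) {
        Queue<String> master = new ArrayDeque<>(words);
        List<Queue<String>> bins = new ArrayList<>();
        for (int i = 0; i < 26; i++) bins.add(new ArrayDeque<>());
        for (int p = length - 1; p >= 0; p--) {
            // Distribute every word into the bin for its character at position p...
            while (!master.isEmpty()) {
                String s = master.remove();
                bins.get(s.charAt(p) - 'a').add(s);
            }
            // ...then collect the bins back into the master queue, in order.
            // Emptying each bin here replaces the separate clear() pass.
            for (Queue<String> bin : bins)
                while (!bin.isEmpty()) master.add(bin.remove());
        }
        return new ArrayList<>(master);
    }
}
```

Note that the collect step drains each bin back into the master queue every pass, which is exactly the invariant the posted code breaks by appending into studentQueue without ever clearing it.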

Compare content of two text files and split words java

I know this question has already been asked several times, but I can't find a way to apply it to my code.
So my purpose is the following:
I have two files, griechenland_test.txt and outagain5.txt. I want to read them and then find out what percentage of outagain5.txt is inside the other file.
Outagain5 has input like this:
mit dem 542824
und die 517126
And Griechenland is a normal article from Wikipedia about that topic (so normal text, without frequency counts).
1. Problem
- How can I split the input into bigrams? Like every two words, but always overlapping with the one before? So if I have words A, B, C, D --> get AB, BC, CD?
I have this:
while ((sCurrentLine = in.readLine()) != null) {
    // System.out.println(sCurrentLine);
    arr = sCurrentLine.split(" ");
    for (int i = 0; i < arr.length; i++) {
        if (null == hash.get(arr[i])) {
            hash.put(arr[i], 1);
        } else {
            int x = hash.get(arr[i]) + 1;
            hash.put(arr[i], x);
        }
    }
}
Then I read the other file with this code (I just add the words, not the number; I split on four spaces, so the two words end up in h[0]).
for (String line = br.readLine(); line != null; line = br.readLine()) {
    String h[] = line.split("    ");
    words.add(h[0]);
}
2. Problem
Now I make the comparison between the String x in hash and the String s in words. I put in the else System.out.println to see which words are not contained in outagain5.txt, but several words that ARE contained in outagain5.txt get printed out. I don't understand why :D
So I think the comparison doesn't work well, or maybe this will be solved by fixing the first problem.
ArrayList<String> words = new ArrayList<String>();
ArrayList<String> neuS = new ArrayList<String>();
ArrayList<Long> neuZ = new ArrayList<Long>();
for (String x : hash.keySet()) {
    summe = summe + hash.get(x);
    long neu = hash.get(x);
    for (String s : words) {
        if (x.equals(s)) {
            neuS.add(x);
            neuZ.add(neu);
            disc = disc + 1;
        } else {
            System.out.println(x);
            break;
        }
    }
}
Hope I made my question clear, thanks a lot!!
public static List<String> ngrams(int n, String str) {
    List<String> ngrams = new ArrayList<String>();
    String[] words = str.split(" ");
    for (int i = 0; i < words.length - n + 1; i++)
        ngrams.add(concat(words, i, i + n));
    return ngrams;
}

public static String concat(String[] words, int start, int end) {
    StringBuilder sb = new StringBuilder();
    for (int i = start; i < end; i++)
        sb.append((i > start ? " " : "") + words[i]);
    return sb.toString();
}
It is much easier to use the generic "n-gram" approach, so you can split on every 2 or 3 words if you want. Here is the link I grabbed the code from; I have used this exact code almost any time I need to split words into the (AB), (BC), (CD) format: NGram Sequence.
If I recall correctly, String has a method split(regex, limit) that splits the string around matches of the regex, where the limit caps how many substrings are produced.
I am referencing this Javadoc: https://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String, int).
And for running a comparison between two text files, I would recommend having your code read both of them, populate two arrays, and then compare the entries. Hope I helped.
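As a quick sanity check of the n-gram approach, here is a self-contained copy of the splitter (same logic as the ngrams/concat pair above, folded into one method) producing exactly the AB, BC, CD pairs asked about:

```java
import java.util.ArrayList;
import java.util.List;

public class BigramDemo {
    // n-gram splitter: each n-word window, overlapping by n-1 words.
    public static List<String> ngrams(int n, String str) {
        List<String> out = new ArrayList<>();
        String[] words = str.split(" ");
        for (int i = 0; i <= words.length - n; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = i; j < i + n; j++)
                sb.append(j > i ? " " : "").append(words[j]);
            out.add(sb.toString());
        }
        return out;
    }
}
```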

List names and number of occurrences

So the task was to read a file with the following names:
Alice
Bob
James
Richard
Bob
Alice
Alice
Alice
James
Richard
Bob
Richard
Bob
Stephan
Michael
Henry
And print out each name with its value of occurrence e.g "Alice - <4>".
I got it working, basically. The only problem I have is that the last name (Stephan - <1>) is missing from my output and I can't get it to work properly. It's probably because I used [i-1], but as I said, I'm not getting to the right solution here.
Well, here's my code..
package Assignment4;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.BufferedReader;
import java.util.Arrays;

public class ReportUniqueNames {
    public static void main(String[] args) {
        System.out.println("This program counts words, characters and lines!\n");
        System.out.println("Please enter the name of the .txt file:");
        BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
        BufferedReader read = null;
        String file = "";
        String text = "";
        String line = "";
        boolean unique = true;
        int nameCounter = 1;
        try {
            file = input.readLine();
            read = new BufferedReader(new FileReader(file));
            while ((line = read.readLine()) != null) {
                text += line.trim() + " ";
            }
        } catch (FileNotFoundException e) {
            System.out.println("File was not found.");
        } catch (IOException e) {
            System.out.println("An error has occurred.");
        }
        String textarray[] = text.split(" ");
        Arrays.sort(textarray);
        for (int i = 0; i < textarray.length; i++) {
            if (i > 0 && textarray[i].equals(textarray[i - 1])) {
                nameCounter++;
                unique = false;
            }
            if (i > 0 && !textarray[i].equals(textarray[i - 1]) && !unique) {
                System.out.println("<" + textarray[i - 1] + "> - <" + nameCounter + ">");
                nameCounter = 1;
                unique = true;
            } else if (i > 0 && !textarray[i].equals(textarray[i - 1]) && unique) {
                System.out.println("<" + textarray[i - 1] + "> - <" + nameCounter + ">");
            }
        }
    }
}
So that's it.. Hopefully one of you could help me out.
EDIT: Wow, so many different approaches. First of all thanks for all of your help. I'll look through your suggested solutions and maybe restart from the bottom ;). Will give you a heads up when I'm done.
You could simply use a Map (that emulates a "Multiset") for the purpose of counting words:
String textarray[] = text.split(" ");
// TreeMap gives sorting by alphabetical order "for free"
Map<String, Integer> wordCounts = new TreeMap<>();
for (int i = 0; i < textarray.length; i++) {
    Integer count = wordCounts.get(textarray[i]);
    wordCounts.put(textarray[i], count != null ? count + 1 : 1);
}
for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
    System.out.println("<" + e.getKey() + "> - <" + e.getValue() + ">");
}
You can use Scanner to read your input file (whose location is denoted by "filepath") using the new line character as your delimiter and add the words directly to an ArrayList<String>.
Then, iterate the ArrayList<String> and count the frequency of each word in your original file in a HashMap<String, Integer>.
Full Working Code:
Scanner s = new Scanner(new File("filepath")).useDelimiter("\n");
List<String> list = new ArrayList<>();
while (s.hasNext()) {
    list.add(s.next());
}
s.close();

Map<String, Integer> wordFrequency = new HashMap<>();
for (String str : list) {
    if (wordFrequency.containsKey(str))
        wordFrequency.put(str, wordFrequency.get(str) + 1); // increment the frequency by 1
    else
        wordFrequency.put(str, 1);
}

// Print the frequency:
for (String str : list) {
    System.out.println(str + ": " + wordFrequency.get(str));
}
EDIT:
Alternatively, you can read the entire file into a single String and then split the contents of the String using \n as delimiter into a list. The code is shorter than the first option:
String fileContents = new Scanner(new File("filepath")).useDelimiter("\\Z").next(); // \Z is the end-of-input anchor, so the entire file is read in one call to next()
List<String> list = Arrays.asList(fileContents.split("\\s*\\n\\s*")); // split on newlines (trimming surrounding whitespace), one name per entry
I'd do it like this:
Map<String, Integer> occurs = new HashMap<String, Integer>();
int number;
for (int i = 0; i < textarray.length; i++) {
    if (occurs.containsKey(textarray[i])) {
        number = occurs.get(textarray[i]);
        occurs.put(textarray[i], number + 1);
    } else {
        occurs.put(textarray[i], 1);
    }
}
for (Map.Entry<String, Integer> entry : occurs.entrySet()) {
    System.out.println("<" + entry.getKey() + "> - " + entry.getValue());
}
System.out.println("<" + textarray[textarray.length - 1] + "> - <" + nameCounter + ">");
You need this after your loop, because you only print up to i-1 even though your loop runs the correct number of times.
But using a map is a better choice.
That happens because your code prints the results for a name only once the next name differs, so the print for the last entry never happens. To solve this you can add another if statement at the end of your loop that checks whether this is the loop's last iteration. The if statement would look like this:
if (i == textarray.length - 1) {
    System.out.println("<" + textarray[i] + "> - <" + nameCounter + ">");
}
Now the loop will look like this:
for (int i = 1; i < textarray.length; i++) {
    if (i > 0 && textarray[i].equals(textarray[i - 1])) {
        nameCounter++;
        unique = false;
    }
    if (i > 0 && !textarray[i].equals(textarray[i - 1]) && !unique) {
        System.out.println("<" + textarray[i - 1] + "> - <" + nameCounter + ">");
        nameCounter = 1;
        unique = true;
    } else if (i > 0 && !textarray[i].equals(textarray[i - 1]) && unique) {
        System.out.println("<" + textarray[i - 1] + "> - <" + nameCounter + ">");
    }
    if (i == textarray.length - 1) {
        System.out.println("<" + textarray[i] + "> - <" + nameCounter + ">");
    }
}
And now the loop will also print the results for the last entry in the list.
I hope this helps :)
P.S. some of the other solutions here are far more efficient but this is a solution for your current approach.
I want to discuss the logic you used initially to solve the problem of uniqueness of values in an array of strings.
You compared two adjacent cells of the array, assuming that if they are not equal, the name at textarray[i] is unique.
That is false, because the same name can occur again later, while your "unique" boolean variable was already set to true.
Example:
john | luke | john | charlotte
Comparing the first and second cells says john and luke are unequal, and as the loop's i advances, comparing the second and third cells says the same again, but that is not the truth: john is not unique.
So let's imagine we had no Map in Java. How would we solve this with plain algorithms?
I will help you with an idea:
1. Create a function that takes as parameters the string you want to verify and the table.
2. Loop over the whole table, testing whether the string appears in any other cell; if yes, return null or -1.
3. If you finish looping over the table to the last cell, that means your string is unique, so just print it on screen.
4. Call this function textarray.length times,
and you will have only the unique names on your screen.
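A minimal sketch of that idea, with hypothetical names (isUnique counts the occurrences and reports unique only when the name occurs exactly once):

```java
public class UniqueNames {
    // Returns true if name occurs exactly once in the array.
    public static boolean isUnique(String name, String[] names) {
        int count = 0;
        for (String n : names)
            if (n.equals(name)) count++;
        return count == 1;
    }

    // Call the check once per cell, printing only the unique names.
    public static void printUnique(String[] names) {
        for (String n : names)
            if (isUnique(n, names)) System.out.println(n);
    }
}
```

Note this is O(n²), which is exactly why the map-based answers above are the better choice for anything but tiny inputs.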
First things first:
You won't need your text variable, since we will replace it with a more appropriate data structure: one that stores each name found so far in the file along with an integer count of its occurrences.
Like Dmitry said, the best data structure you can use for this particular case is Hashtable or HashMap.
Assuming that the file structure is a single name per line without any punctuation or spaces, your code would look something like this:
try {
    Hashtable<String, Integer> table = new Hashtable<String, Integer>();
    file = input.readLine();
    read = new BufferedReader(new FileReader(file));
    while ((line = read.readLine()) != null) {
        line = line.trim(); // trim() returns a new string, so its result must be assigned
        if (table.containsKey(line))
            table.put(line, table.get(line) + 1);
        else
            table.put(line, 1);
    }
    System.out.println(table); // looks pretty good and compact on the console... :)
} catch (FileNotFoundException e) {
    System.out.println("File was not found.");
} catch (IOException e) {
    System.out.println("An error has occurred.");
}

counting the duplicates in the arraylist [duplicate]

This question already has answers here:
How to remove duplicates from a list?
(15 answers)
Closed 8 years ago.
I have a text file that contains:
File1.txt
File2.doc
File3.out
File4.txt
File5.so
File6.dll
I'm trying to get the output to return just the extension name and how many times it occurred in the text file.
So for this specific file the output should return:
txt 2
doc 1
out 1
so 1
dll 1
The output needs to appear in the Java console output, not in the file itself.
I have this so far:
import java.util.*;
import java.io.*;

public class prob03 {
    public static void main(String[] args) throws Exception {
        Scanner input = new Scanner(new File("prob03.txt"));
        ArrayList<String> extensionsArray = new ArrayList<String>();
        while (input.hasNextLine()) {
            String line = input.nextLine();
            String[] parts = line.split("\\.");
            String part1 = parts[0];
            String extension = parts[1];
            extensionsArray.add(extension);
        }
        for (int i = 0; i < extensionsArray.size(); i++) {
            for (int j = 0; j < i; j++) {
                if (extensionsArray.get(i).equals(extensionsArray.get(j))) {
                }
            }
        }
    }
}
Please help me figure out this problem. I'm not trying to copy; this is not my homework or any kind of assignment. Let me know if I'm going in the right direction, please. Thank you.
If you really want to be really lazy, you can do something like:
int dupeCount = extensionsArray.size() - new HashSet<String>(extensionsArray).size();
Which basically subtracts the number of unique elements (Sets don't allow duplicates) from the total number of elements, giving you the number of elements that were duplicates.
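For instance, wrapped in a tiny helper (hypothetical name) for a list of extensions like the one above:

```java
import java.util.HashSet;
import java.util.List;

public class DupeCount {
    // Total elements minus distinct elements = number of duplicate entries.
    public static int dupes(List<String> items) {
        return items.size() - new HashSet<>(items).size();
    }
}
```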
You can use a HashMap, where the key is the file extension and the value is the count of times you found it.
HashMap<String, Integer> extensionsCount = new HashMap<>();
for (String extension : extensionsArray) {
    Integer count = extensionsCount.get(extension);
    if (count == null) {
        count = 0;
    }
    count++;
    extensionsCount.put(extension, count);
}
Use a HashMap
ArrayList<String> files; // <-- your list of filenames
HashMap<String, Integer> extensions = new HashMap<String, Integer>();
for (String filename : files) {
    String ext = filename.split("\\.")[1]; // the dot must be escaped: split() takes a regex
    if (extensions.containsKey(ext)) {
        Integer count = extensions.get(ext);
        count++;
        extensions.put(ext, count);
    } else {
        extensions.put(ext, new Integer(1));
    }
}
for (String ext : extensions.keySet()) {
    System.out.println(ext + ", " + extensions.get(ext));
}
This will give you the counts of ALL file extensions.

How to Check for Deleted Words Between 2 Sentences in Java

What's the best approach in Java if you want to check for words that were deleted from sentence A in sentence B? For example:
Sentence A: I want to delete unnecessary words on this simple sentence.
Sentence B: I want to delete words on this sentence.
Output: I want to delete (unnecessary) words on this (simple) sentence.
where the words inside the parenthesis are the ones that were deleted from sentence A.
Assuming order doesn't matter: use commons-collections.
Use String.split() to split both sentences into arrays of words.
Use commons-collections' CollectionUtils.addAll to add each array into an empty Set.
Use commons-collections' CollectionUtils.subtract method to get A-B.
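If you'd rather avoid the commons-collections dependency, the same A-B subtraction can be sketched with plain java.util sets (note that any set-based approach loses duplicates, ordering, and position, as the next answer points out):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SetDiff {
    // Words of A that do not appear in B, order-insensitive; plain java.util
    // standing in for commons-collections' CollectionUtils.subtract.
    public static Set<String> deleted(String a, String b) {
        Set<String> wordsA = new HashSet<>(Arrays.asList(a.split("\\s+")));
        wordsA.removeAll(Arrays.asList(b.split("\\s+"))); // A - B
        return wordsA;
    }
}
```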
Assuming order and position matters, this looks like it would be a variation of the Longest Common Subsequence problem, a dynamic programming solution.
Wikipedia has a great page on the topic; there's really too much for me to outline here:
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
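For reference, the word-level LCS length itself is only a few lines of dynamic programming (a sketch, not tied to any particular article's code):

```java
public class Lcs {
    // Classic DP: dp[i][j] = length of the LCS of a[0..i) and b[0..j).
    public static int lcsLength(String[] a, String[] b) {
        int[][] dp = new int[a.length + 1][b.length + 1];
        for (int i = 1; i <= a.length; i++)
            for (int j = 1; j <= b.length; j++)
                dp[i][j] = a[i - 1].equals(b[j - 1])
                         ? dp[i - 1][j - 1] + 1        // words match: extend the LCS
                         : Math.max(dp[i - 1][j], dp[i][j - 1]); // skip a word on one side
        return dp[a.length][b.length];
    }
}
```

When sentence B is a pure deletion of sentence A, the LCS is all of B, and the words of A outside the LCS are exactly the deleted ones.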
Everyone else is using really heavy-weight algorithms for what is actually a very simple problem. It could be solved using longest common subsequence, but it's a very constrained version of that. It's not a full diff; it only includes deletes. No need for dynamic programming or anything like that. Here's a 20-line implementation:
private static String deletedWords(String s1, String s2) {
    StringBuilder sb = new StringBuilder();
    String[] words1 = s1.split("\\s+");
    String[] words2 = s2.split("\\s+");
    int i1 = 0, i2 = 0;
    while (i1 < words1.length) {
        // Guard i2 so that once s2 is exhausted, the remaining s1 words count as deleted.
        if (i2 < words2.length && words1[i1].equals(words2[i2])) {
            sb.append(words1[i1]);
            i2++;
        } else {
            sb.append("(" + words1[i1] + ")");
        }
        if (i1 < words1.length - 1) {
            sb.append(" ");
        }
        i1++;
    }
    return sb.toString();
}
When the inputs are the ones in the question, the output matches exactly.
Granted, I understand that for some inputs there are multiple solutions. For example:
a b a
a
could be either a (b) (a) or (a) (b) a and maybe for some versions of this problem, one of these solutions is more likely to be the "actual" solution than the other, and for those you need some recursive or dynamic programming approach... but let's not make it too much more complicated than what Israel Sato originally asked for!
String a = "I want to delete unnecessary words on this simple sentence.";
String b = "I want to delete words on this sentence.";
String[] aWords = a.split(" ");
String[] bWords = b.split(" ");
List<String> missingWords = new ArrayList<String>();
int x = 0;
for (int i = 0; i < aWords.length; i++) {
    String aWord = aWords[i];
    if (x < bWords.length) {
        String bWord = bWords[x];
        if (aWord.equals(bWord)) {
            x++;
        } else {
            missingWords.add(aWord);
        }
    } else {
        missingWords.add(aWord);
    }
}
This works well, also for updated strings; updated strings are enclosed in square brackets.
import java.util.*;

class Sample {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String str1 = sc.nextLine();
        String str2 = sc.nextLine();
        List<String> flist = Arrays.asList(str1.split("\\s+"));
        List<String> slist = Arrays.asList(str2.split("\\s+"));
        List<String> completedString = new ArrayList<String>();
        String result = "";
        String updatedString = "";
        String deletedString = "";
        int i = 0;
        int startIndex = 0;
        int endIndex = 0;
        for (String word : slist) {
            if (flist.contains(word)) {
                endIndex = flist.indexOf(word);
                if (!completedString.contains(word)) {
                    if (deletedString.isEmpty()) {
                        for (int j = startIndex; j < endIndex; j++) {
                            deletedString += flist.get(j) + " ";
                        }
                    }
                }
                startIndex = endIndex + 1;
                if (!deletedString.isEmpty()) {
                    result += "(" + deletedString.substring(0, deletedString.length() - 1) + ") ";
                    deletedString = "";
                }
                if (!updatedString.isEmpty()) {
                    result += "[" + updatedString.substring(0, updatedString.length() - 1) + "] ";
                    updatedString = "";
                }
                result += word + " ";
                completedString.add(word);
                if (i == slist.size() - 1) {
                    endIndex = flist.size();
                    for (int j = startIndex; j < endIndex; j++) {
                        deletedString += flist.get(j) + " ";
                    }
                    startIndex = endIndex + 1;
                }
            } else {
                if (i == 0) {
                    boolean boundaryCheck = false;
                    for (int j = i + 1; j < slist.size(); j++) {
                        if (flist.contains(slist.get(j))) {
                            endIndex = flist.indexOf(slist.get(j));
                            boundaryCheck = true;
                            break;
                        }
                    }
                    if (!boundaryCheck) {
                        endIndex = flist.size();
                    }
                    if (!completedString.contains(word)) {
                        for (int j = startIndex; j < endIndex; j++) {
                            deletedString += flist.get(j) + " ";
                        }
                    }
                    startIndex = endIndex + 1;
                } else if (i == slist.size() - 1) {
                    endIndex = flist.size();
                    if (!completedString.contains(word)) {
                        for (int j = startIndex; j < endIndex; j++) {
                            deletedString += flist.get(j) + " ";
                        }
                    }
                    startIndex = endIndex + 1;
                }
                updatedString += word + " ";
                completedString.add(word);
            }
            i++;
        }
        if (!deletedString.isEmpty()) {
            result += "(" + deletedString.substring(0, deletedString.length() - 1) + ") ";
        }
        if (!updatedString.isEmpty()) {
            result += "[" + updatedString.substring(0, updatedString.length() - 1) + "] ";
        }
        System.out.println(result);
    }
}
This is basically a differ, take a look at this:
diff
and the root algorithm:
Longest common subsequence problem
Here's a sample Java implementation:
http://introcs.cs.princeton.edu/java/96optimization/Diff.java.html
which compares lines. The only thing you need to do is split by word instead of by line, or alternatively put each word of both sentences on a separate line.
If you're on Linux, for example, you can actually see the result of the latter option using the diff program itself, before you even write any code. Try this:
$ echo "I want to delete unnecessary words on this simple sentence."|tr " " "\n" > 1
$ echo "I want to delete words on this sentence."|tr " " "\n" > 2
$ diff -uN 1 2
--- 1	2012-10-01 19:40:51.998853057 -0400
+++ 2	2012-10-01 19:40:51.998853057 -0400
@@ -2,9 +2,7 @@
 want
 to
 delete
-unnecessary
 words
 on
 this
-simple
 sentence.
The lines with - in front were removed (similarly, lines with + in front would be words added in sentence B that were not in sentence A). Try it out to see if that fits your problem.
Hope this helps.
