Invert Concordance Using Java

Today I'm working with a client that creates a concordance from a text file using Java. All I need to do is invert the concordance to recreate the text from start to finish. The issue I'm having is knowing where to start and how to do each step. So far I have tried to create an array of words, iterate through my symbol table and assign each key to the array, but that only gives me the list of distinct words in the concordance. This problem makes me feel stupid because it seems like it should have a simple solution, yet I can't think of a valid way to start recreating the story. I have included the source here:
public class InvertedConcordance {
public static ST<String, SET<Integer>> createConcordance (String[] words) {
ST<String, SET<Integer>> st = new ST<String, SET<Integer>>();
for (int i = 0; i < words.length; i++) {
String s = words[i];
if (!st.contains(s)) {
st.put(s, new SET<Integer>());
}
SET<Integer> set = st.get(s);
set.add(i);
}
return st;
}
public static String[] invertConcordance (ST<String, SET<Integer>> st) {
// This is what I have so far, and it doesn't work:
for(String key : st.keys())
{
inv[i++] = key;
}
for(int z = 0; z< inv.length; z++)
{
System.out.println(inv[z]);
}
String[]inv = new String[st.size()];
return inv;
}
private static void saveWords (String fileName, String[] words) {
int MAX_LENGTH = 70;
Out out = new Out (fileName);
int length = 0;
for (String word : words) {
length += word.length ();
if (length > MAX_LENGTH) {
out.println ();
length = word.length ();
}
out.print (word);
out.print (" ");
length++;
}
out.close ();
}
public static void main(String[] args) {
String fileName = "data/tale.txt";
In in = new In (fileName);
String[] words = in.readAll().split("\\s+");
ST<String, SET<Integer>> st = createConcordance (words);
StdOut.println("Finished building concordance");
// write to a file and read back in (to check that serialization works)
//serialize ("data/concordance-tale.txt", st);
//st = deserialize ("data/concordance-tale.txt");
words = invertConcordance (st);
saveWords ("data/reconstructed-tale.txt", words);
}
}

First of all, why are you using nonstandard classes like SET and ST instead of Java's built-in Set and Map, which are all you need here?
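For comparison, here is a minimal sketch of the same createConcordance written only with java.util, assuming a TreeMap (sorted keys, similar to ST) and a TreeSet of positions can stand in for ST and SET. The class name ConcordanceSketch is hypothetical:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ConcordanceSketch {
    // Maps each word to the sorted set of 0-based positions where it occurs.
    static Map<String, Set<Integer>> createConcordance(String[] words) {
        Map<String, Set<Integer>> concordance = new TreeMap<>();
        for (int i = 0; i < words.length; i++) {
            // computeIfAbsent creates the position set the first time a word is seen
            concordance.computeIfAbsent(words[i], w -> new TreeSet<>()).add(i);
        }
        return concordance;
    }

    public static void main(String[] args) {
        String[] words = "it was the best of times it was the worst of times".split("\\s+");
        // prints {best=[3], it=[0, 6], of=[4, 10], the=[2, 8], times=[5, 11], was=[1, 7], worst=[9]}
        System.out.println(createConcordance(words));
    }
}
```

Here computeIfAbsent replaces the contains/put/get sequence from the question with a single call.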
As for your problem, your code should not compile at all, since you use the variable inv before declaring it (and the counter i is never declared):
public static String[] invertConcordance (ST<String, SET<Integer>> st) {
// This is what I have so far, and it doesn't work:
for(String key : st.keys())
{
inv[i++] = key;
}
for(int z = 0; z< inv.length; z++)
{
System.out.println(inv[z]);
}
String[]inv = new String[st.size()];
return inv;
}
If I understand your idea correctly, the concordance simply maps each word to the set of indices at which it was found. If that interpretation is correct, the inverse operation would be:
public static String[] invertConcordance (ST<String, SET<Integer>> st) {
//First, figure out the length of the document, which is given by the maximum index in the concordance
int document_length = 0;
for(String key : st.keys()){
for(Integer i : st.get(key)){
if(i>document_length){
document_length=i;
}
}
}
//Create the document
String[] document = new String[document_length+1];
//Reconstruct
for(String key : st.keys()){
for(Integer i : st.get(key)){
document[i] = key;
}
}
return document;
}
I assumed that the indices are numbered from 0 to the document's length - 1. If they are actually stored from 1 to the document's length, you should change the line
String[] document = new String[document_length+1];
to
String[] document = new String[document_length];
and
document[i] = key;
to
document[i-1] = key;
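A minimal sketch of the same inversion using java.util types instead of ST and SET. The extra firstIndex parameter is a hypothetical addition that covers both the 0-based and the 1-based case described above, so the array size and the index shift do not have to be edited by hand; the class name InvertSketch is also hypothetical:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertSketch {
    // Rebuilds the word sequence from a word -> positions concordance.
    // firstIndex is 0 if positions were recorded 0-based, 1 if they were recorded 1-based.
    static String[] invertConcordance(Map<String, Set<Integer>> concordance, int firstIndex) {
        // The largest recorded position determines the document length.
        int maxIndex = firstIndex - 1;
        for (Set<Integer> positions : concordance.values()) {
            for (int p : positions) {
                maxIndex = Math.max(maxIndex, p);
            }
        }
        String[] document = new String[maxIndex - firstIndex + 1];
        // Place each word at every position it was recorded at.
        for (Map.Entry<String, Set<Integer>> entry : concordance.entrySet()) {
            for (int p : entry.getValue()) {
                document[p - firstIndex] = entry.getKey();
            }
        }
        return document;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> concordance = new TreeMap<>();
        concordance.put("to", new TreeSet<>(Set.of(0, 4)));
        concordance.put("be", new TreeSet<>(Set.of(1, 5)));
        concordance.put("or", new TreeSet<>(Set.of(2)));
        concordance.put("not", new TreeSet<>(Set.of(3)));
        System.out.println(String.join(" ", invertConcordance(concordance, 0))); // to be or not to be
    }
}
```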

Related

How can I double the size of the array without getting a NullPointerException?

First, for quick context, here's my post from yesterday:
How to work around a NullPointerException in Java?
So I'm getting this NullPointerException, which I now believe is occurring before I try to find the index of the first duplicate in the array of strings. Before I search for the index of the first duplicate, I double the size of the string array using this method:
static String[] upSizeArr( String[] fullArr )
{
int size = fullArr.length;
String[] newSizeArr = new String[(2 * size)];
for (int a = 0; a < size; a++) {
newSizeArr[a] = fullArr[a];
}
return newSizeArr;
}
and I then use that method in the context of this while loop:
static final int CAPACITY = 10;
int wordCount = 0;
BufferedReader wordFile = new BufferedReader( new FileReader(args[1]) );
String[] wordList = new String[CAPACITY];
while ( wordFile.ready() )
{ if ( wordCount == wordList.length )
wordList = upSizeArr( wordList );
wordList[wordCount++] = wordFile.readLine();
}
wordFile.close();
Is there any possible workaround for this using the upSizeArr method? I would prefer a basic solution that uses only arrays and no other data structures. I am new to programming and am really trying to get a grasp of the fundamentals; I've been looking for a solution to this NullPointerException for about a week now.
Here is the code in its entirety:
import java.io.*;
import java.util.*;
public class Practice
{
static final int CAPACITY = 10;
static final int NOT_FOUND = -1;
public static void main (String[] args) throws Exception
{
if (args.length < 1 )
{
System.out.println("\nusage: C:\\> java Practice <words filename>\n\n"); // i.e. C:\> java Lab2 10Kints.txt 172822words.txt
System.exit(0);
}
String[] wordList = new String[CAPACITY];
int wordCount = 0;
BufferedReader wordFile = new BufferedReader( new FileReader(args[0]) );
while ( wordFile.ready() ) // i.e. while there is another line (word) in the file
{ if ( wordCount == wordList.length )
wordList = upSizeArr( wordList );
wordList[wordCount++] = wordFile.readLine();
} //END WHILE wordFile
wordFile.close();
System.out.format( "%s loaded into word array. size=%d, count=%d\n",args[0],wordList.length,wordCount );
int dupeIndex = indexOfFirstDupe( wordList, wordCount );
if ( dupeIndex == NOT_FOUND )
System.out.format("No duplicate values found in wordList\n");
else
System.out.format("First duplicate value in wordList found at index %d\n",dupeIndex);
} // END OF MAIN
// TWO METHODS
static String[] upSizeArr( String[] fullArr )
{
int size = fullArr.length; //find the length of the arrays
String[] newSizeArr = new String[(2 * size)]; // creates new array, doubled in size
for (int a = 0; a < size; a++) {
newSizeArr[a] = fullArr[a];
}
return newSizeArr;
}
static int indexOfFirstDupe( String[] arr, int count )
{
Arrays.sort(arr);
int size = arr.length;
int index = NOT_FOUND;
for (int x = 0; x < size; x++) {
for (int y = x + 1; y < size; y++) {
if (arr[x].equals(arr[y])) {
index = x;
break;
}
}
}
return index;
}
} // END OF PROGRAM
Also, the file that's being used as the argument is a txt file of strings.
I'm not sure if it's the cause of your problem, but it is very suspicious...
while ( wordFile.ready() ) {
//...
}
is not how you should be reading the file: ready() only tells you whether the next read is guaranteed not to block, not whether more data remains. Instead, check the return value of readLine, which returns null when it reaches the end of the file.
Maybe something more like....
String[] wordList = new String[CAPACITY];
int wordCount = 0;
try (BufferedReader wordFile = new BufferedReader(new FileReader(args[0]))) {
String text = null;
while ((text = wordFile.readLine()) != null) {
if (wordCount == wordList.length) {
wordList = upSizeArr(wordList);
}
wordList[wordCount++] = text;
}
} catch (IOException ex) {
ex.printStackTrace();
}
Your code also runs the risk of leaving the file resource open. The above example makes use of the try-with-resources statement to ensure that it is closed properly, regardless of the success of the operation.
Take a look at The try-with-resources Statement for more details.
Unless it's a specific requirement, I would also recommend using an ArrayList or System.arraycopy over rolling your own solution like this.
Maybe have a look at List Implementations for some more details.
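A minimal sketch of what the library route could look like, assuming the rest of the program stays the same: Arrays.copyOf performs the same copy-and-double as upSizeArr (it also pads the new slots with null), and an ArrayList removes the need for manual resizing altogether. The class name GrowSketch is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GrowSketch {
    // Same contract as upSizeArr: a new array twice as long, with the old contents copied in.
    // Arrays.copyOf pads the extra slots with null, exactly like the hand-written loop.
    static String[] upSizeArr(String[] fullArr) {
        return Arrays.copyOf(fullArr, 2 * fullArr.length);
    }

    public static void main(String[] args) {
        String[] grown = upSizeArr(new String[] {"a", "b"});
        System.out.println(Arrays.toString(grown)); // [a, b, null, null]

        // Alternative: ArrayList grows itself, and size() counts only the elements you added.
        List<String> words = new ArrayList<>();
        words.add("a");
        words.add("b");
        System.out.println(words.size()); // 2
    }
}
```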
Update from runnable example...
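After playing with a runnable example of the code: when upSizeArr creates a new array, the new elements default to null, which is expected. The catch is that Arrays.sort(Object[]) compares elements through their natural ordering (compareTo), so it throws a NullPointerException as soon as it meets one of those null slots.
One solution is to fill the unused space with a non-null value, such as an empty string. Be aware that the padding strings are equal to each other, so indexOfFirstDupe (which scans the whole array) will report them as duplicates, and that the slots of the initial CAPACITY-sized array stay null if the file never triggers a resize: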
static String[] upSizeArr(String[] fullArr) {
int size = fullArr.length; //find the length of the arrays
String[] newSizeArr = new String[(2 * size)]; // creates new array, doubled in size
for (int a = 0; a < size; a++) {
newSizeArr[a] = fullArr[a];
}
for (int a = size; a < newSizeArr.length; a++) {
newSizeArr[a] = "";
}
return newSizeArr;
}
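Rather than padding, the nulls can also be avoided entirely. Here is a sketch of a hypothetical replacement for indexOfFirstDupe that uses the count parameter the original ignores: it sorts only the filled prefix with the Arrays.sort(array, fromIndex, toIndex) overload, then compares neighbours, since equal strings end up adjacent after sorting. Like the original, it sorts in place, so the returned index refers to the sorted order:

```java
import java.util.Arrays;

class DupeSketch {
    static final int NOT_FOUND = -1;

    // Sorts only arr[0..count) and returns the index (in sorted order) of the first
    // element that equals its right-hand neighbour, or NOT_FOUND if there is none.
    static int indexOfFirstDupe(String[] arr, int count) {
        Arrays.sort(arr, 0, count); // the null slots from count onwards are never compared
        for (int i = 0; i + 1 < count; i++) {
            // after sorting, equal strings are adjacent, so one pass is enough
            if (arr[i].equals(arr[i + 1])) {
                return i;
            }
        }
        return NOT_FOUND;
    }

    public static void main(String[] args) {
        String[] wordList = {"pear", "apple", "fig", "apple", null, null};
        System.out.println(indexOfFirstDupe(wordList, 4)); // 0
    }
}
```

A single pass over the sorted prefix also replaces the nested loops, so the check is O(n log n) instead of O(n^2).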
"Another" solution might be to "downsize" the array to fit the available data...
static String[] downsizeToCapacity(String[] fullArr) {
int lastIndex = 0;
while (lastIndex < fullArr.length && fullArr[lastIndex] != null) {
lastIndex++;
}
if (lastIndex >= fullArr.length) {
return fullArr;
}
String[] downSized = new String[lastIndex];
System.arraycopy(fullArr, 0, downSized, 0, lastIndex);
return downSized;
}
All this does is create a new array just large enough to hold all the non-null values, and return it.
You could then use it something like this:
System.out.format("%s loaded into word array. size=%d, count=%d\n", "words.txt", wordList.length, wordCount);
wordList = downsizeToCapacity(wordList);
System.out.format("%s loaded into word array. size=%d, count=%d\n", "words.txt", wordList.length, wordCount);
int dupeIndex = indexOfFirstDupe(wordList, wordCount);
Which, in my testing, outputs
words.txt loaded into word array. size=160, count=99
words.txt loaded into word array. size=99, count=99
No duplicate values found in wordList
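Because the loading loop already tracks wordCount, the trimming step can also be sketched with Arrays.copyOf, which copies exactly the filled prefix without scanning for the first null. The name trimToCount is hypothetical, and the sketch assumes the first wordCount slots are the filled ones, as they are with the reading loop in the question:

```java
import java.util.Arrays;

class TrimSketch {
    // Returns a copy holding only the first count elements (the slots actually filled).
    static String[] trimToCount(String[] arr, int count) {
        return Arrays.copyOf(arr, count);
    }

    public static void main(String[] args) {
        String[] wordList = {"alpha", "beta", "gamma", null, null, null};
        String[] trimmed = trimToCount(wordList, 3);
        System.out.println(trimmed.length + " " + Arrays.toString(trimmed)); // 3 [alpha, beta, gamma]
    }
}
```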

Reversing a HashMap storing words and their line numbers

I have a HashMap
HashMap<String, LinkedList<Integer>> indexMap;
which is storing all words in a file and their corresponding line numbers where they appear.
Example -
This is just an example
to demonstrate what I am saying an is
Would display
This [1]
demonstrate [2]
an [1 2]
is [1 2]
...
....
And so on. I want to reverse this HashMap so that it displays the words stored at each line number.
For the particular example above, it should display
1 [This, an, just, example, is]
2 [demonstrate, what, to, I, am, saying, is, an]
For this particular task, this is what I have done -
import java.io.FileReader;
import java.io.LineNumberReader;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
public class ReverseIndex {
private static Map<String, LinkedList<Integer>> indexMap = new HashMap<String, LinkedList<Integer>>();
public static LinkedList<Integer> getIndex(String word) {
return indexMap.get(word);
}
public static void main(String[] args) {
try {
LineNumberReader rdr = new LineNumberReader(
new FileReader(
args[0]));
String line = "";
int lineNumber = 0;
//CREATING THE INITIAL HASHMAP WHICH WE WANT TO REVERSE
while ((line = rdr.readLine()) != null) {
lineNumber++;
String[] words = line.split("\\s+");
for (int i = 0; i < words.length; i++) {
LinkedList<Integer> temp = new LinkedList<Integer>();
if (getIndex(words[i]) != null)
temp = getIndex(words[i]);
temp.add(lineNumber);
indexMap.put(words[i], temp);
}
}
//FINISHED CREATION
Map<Integer, LinkedList<String>> myNewHashMap = new HashMap<Integer, LinkedList<String>>();
for(Map.Entry<String, LinkedList<Integer>> entry : indexMap.entrySet()){
LinkedList<Integer> values = entry.getValue();
String key = entry.getKey();
LinkedList<String> temp = new LinkedList<String>();
for(int i = 0; i <= lineNumber; i++) {
if(values.contains(i)) {
if(!temp.contains(key))
temp.add(key);
myNewHashMap.put(i, temp);
}
}
}
for(Map.Entry<Integer, LinkedList<String>> entry : myNewHashMap.entrySet()){
Integer tester = entry.getKey();
LinkedList<String> temp2 = new LinkedList<String>();
temp2 = entry.getValue();
System.out.print(tester + " ");
for(int i = 0; i < temp2.size(); i++) {
System.out.print(temp2.get(i) + " ");
}
System.out.println();
}
rdr.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
However, the problem is that for the example above, it prints:
1 example
2 an
How can I fix it so that it produces the expected output?
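Just replace the loop that builds myNewHashMap (the for loop right after //FINISHED CREATION) in your main with the code below. Your version creates a fresh list for every word and puts it under each of that word's line numbers, replacing whatever list was stored there before, so each line ends up holding only the last word processed for it. I have made a few changes to your original code: I moved the variable declarations out of the loop and changed the logic so that it checks whether a LinkedList<String> already exists for the line number. If so, the word is added to that list; otherwise a new LinkedList<String> is created and stored, and the word is added to it.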
LinkedList<Integer> values = null;
String key = null;
LinkedList<String> temp = null;
for(Map.Entry<String, LinkedList<Integer>> entry : indexMap.entrySet())
{
values = entry.getValue();
key = entry.getKey();
for(int value : values)
{
temp = myNewHashMap.get(value);
if(temp == null )
{
temp = new LinkedList<String>();
myNewHashMap.put(value,temp);
}
temp.add(key);
}
}
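For reference, here is a compact sketch of the same inversion using computeIfAbsent (Java 8+), which folds the "get, create if missing, put" steps into one call. It swaps in a TreeMap so the line numbers print in ascending order and uses List instead of LinkedList; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReverseIndexSketch {
    // Turns word -> line numbers into line number -> words.
    // TreeMap keeps the line numbers in ascending order when printed.
    static Map<Integer, List<String>> reverse(Map<String, List<Integer>> indexMap) {
        Map<Integer, List<String>> byLine = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : indexMap.entrySet()) {
            for (int line : entry.getValue()) {
                // one shared list per line, created the first time that line is seen
                byLine.computeIfAbsent(line, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return byLine;
    }

    public static void main(String[] args) {
        String[] lines = {"This is just an example", "to demonstrate what I am saying an is"};
        Map<String, List<Integer>> indexMap = new HashMap<>();
        for (int i = 0; i < lines.length; i++) {
            for (String word : lines[i].split("\\s+")) {
                indexMap.computeIfAbsent(word, k -> new ArrayList<>()).add(i + 1);
            }
        }
        // word order within a line follows HashMap iteration order, as in the question
        System.out.println(reverse(indexMap));
    }
}
```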

Duplicates in Array even though a Set was used

For a class project, we have to take a string (a paragraph), split it into an array of individual words, and then turn those words into Word objects stored in an array. The words cannot repeat, so I used a Set to get only the unique values, but certain words still repeat! Here is the code for the method. Sorry for the vague description.
private void processDocument()
{
String r = docReader.getLine();
lines++;
while(docReader.hasLines()==true)
{
r= r+" " +docReader.getLine();
lines++;
}
r = r.trim();
String[] linewords = r.split(" ");
while(linewords.length>words.length)
{
this.expandWords();
}
String[] newWord = new String[linewords.length];
for(int i=0;i<linewords.length;i++)
{
newWord[i] = (this.stripPunctuation(linewords[i]));
}
Set<String> set = new HashSet<String>(Arrays.asList(newWord));
Object[]newArray = set.toArray();
words = new Word[set.size()-1];
String newString = null;
for(int i =0;i<set.size();i++)
{
if(i==0)
{
newString = newArray[i].toString() + "";
}
else
{
newString = newString+newArray[i].toString()+" ";
}
}
newString = newString.trim();
String[] newWord2 = newString.split(" ");
for(int j=0;j<set.size()-1;j++)
{
Word newWordz = new Word(newWord2[j].toLowerCase());
words[j] = newWordz;
}
}
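I believe the problem is that when you put the words into the HashSet, some of them are capitalized differently: "The" and "the" are not equal strings, so the Set keeps both, and they only become duplicates when you lower-case them later while creating the Word objects. Convert everything to lowercase the moment you read it from the file and it should work: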
newWord[i] = (this.stripPunctuation(linewords[i])).toLowerCase();
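A small sketch of why the case matters: a HashSet compares strings with equals, and "The" and "the" are not equal, so both survive deduplication unless they are lower-cased first. The helper distinctCount is hypothetical:

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CaseSketch {
    // Counts distinct words, optionally lower-casing each one before the Set sees it.
    static int distinctCount(String[] words, boolean lowerCaseFirst) {
        Set<String> seen = new HashSet<>();
        for (String word : words) {
            // Locale.ROOT avoids locale-specific rules such as the Turkish dotless i
            seen.add(lowerCaseFirst ? word.toLowerCase(Locale.ROOT) : word);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String[] raw = {"The", "cat", "saw", "the", "dog"};
        System.out.println(distinctCount(raw, false)); // 5: "The" and "the" are different strings
        System.out.println(distinctCount(raw, true));  // 4
    }
}
```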
Try this:
public String[] unique(String[] array) {
// toArray() with no argument returns Object[]; pass a typed array to get a String[]
return new HashSet<String>(Arrays.asList(array)).toArray(new String[0]);
}
Shamelessly copied from Bohemian's answer.
Also, as noted by @Brinnis, make sure that the words are trimmed and in the same case.
for(int i = 0; i < linewords.length; i++) {
newWord[i] = this.stripPunctuation(linewords[i]).toLowerCase();
}
String[] newArray = unique(newWord);
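Putting the two answers together, here is a sketch of the whole normalize-then-deduplicate step. It assumes a regex-based stand-in for the question's stripPunctuation (the real one is not shown), uses a LinkedHashSet to keep the words in reading order, and skips the question's join-and-split round trip, which glues the first two words together because the first word is appended without a trailing space. All names here are hypothetical:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class UniqueWordsSketch {
    // Hypothetical stand-in for the question's stripPunctuation: keeps letters, digits and apostrophes.
    static String stripPunctuation(String token) {
        return token.replaceAll("[^\\p{L}\\p{N}']", "");
    }

    // Normalizes every token, then keeps the first occurrence of each word in reading order.
    static String[] uniqueWords(String paragraph) {
        Set<String> unique = new LinkedHashSet<>();
        for (String token : paragraph.trim().split("\\s+")) {
            String word = stripPunctuation(token).toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) { // a token made only of punctuation leaves nothing behind
                unique.add(word);
            }
        }
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "The cat saw the dog. The dog saw the cat!";
        System.out.println(Arrays.toString(uniqueWords(text))); // [the, cat, saw, dog]
    }
}
```

Each resulting string could then be wrapped in a Word object, with the array sized from the result's length instead of set.size() - 1.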

LZW decoding misses the first code entry

I followed the Rosetta Code Java implementation.
I tried to do the same LZW coding with my own dictionary instead of the ASCII dictionary it uses.
With my own dictionary there is a problem with decoding: the result is wrong, because the first 'a' of each decoded word is missing.
The result should be 'abraca abrac abra', not 'braca brac bra'.
I see the problem in the decode() method at String act = "" + (char)(int)compressed.remove(0); because this line loses every leading 'a'.
But I don't have any idea how to modify this line.
For example, if I use String act = ""; instead of the line above, the decoding is even more wrong. I don't know how to solve this little problem, or maybe I am looking for the solution in the wrong place.
public class LZW {
public static List<Integer> encode(String uncompressed) {
Map<String,Integer> dictionary = DictionaryInitStringInt();
int dictSize = dictionary.size();
String act = "";
List<Integer> result = new ArrayList<Integer>();
for (char c : uncompressed.toCharArray()) {
String next = act + c;
if (dictionary.containsKey(next))
act = next;
else {
result.add(dictionary.get(act));
// Add next to the dictionary.
dictionary.put(next, dictSize++);
act = "" + c;
}
}
// Output the code for act.
if (!act.equals(""))
result.add(dictionary.get(act));
return result;
}
public static String decode(List<Integer> compressed) {
Map<Integer,String> dictionary = DictionaryInitIntString();
int dictSize = dictionary.size();
String act = "" + (char)(int)compressed.remove(0);
//String act = "";
String result = act;
for (int k : compressed) {
String entry;
if (dictionary.containsKey(k))
entry = dictionary.get(k);
else if (k == dictSize)
entry = act + act.charAt(0);
else
throw new IllegalArgumentException("No such key: " + k);
result += entry;
dictionary.put(dictSize++, act + entry.charAt(0));
act = entry;
}
return result;
}
public static Map<String,Integer> DictionaryInitStringInt()
{
char[] characters = {'a','b','c','d','e','f','g','h','i','j', 'k','l','m','n',
'o','p','q','r','s','t','u','v','w','x','y','z',' ','!',
'?','.',','};
int charactersLength = characters.length;
Map<String,Integer> dictionary = new HashMap<String,Integer>();
for (int i = 0; i < charactersLength; i++)
dictionary.put("" + characters[i], i);
return dictionary;
}
public static Map<Integer,String> DictionaryInitIntString()
{
char[] characters = {'a','b','c','d','e','f','g','h','i','j', 'k','l','m','n',
'o','p','q','r','s','t','u','v','w','x','y','z',' ','!',
'?','.',','};
int charactersLength = characters.length;
Map<Integer,String> dictionary = new HashMap<Integer,String>();
for (int i = 0; i < charactersLength; i++)
dictionary.put(i,"" + characters[i]);
return dictionary;
}
public static void main(String[] args) {
List<Integer> compressed = encode("abraca abrac abra");
System.out.println(compressed);
String decodeed = decode(compressed);
// decodeed will be 'braca brac bra'
System.out.println(decodeed);
}
}
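The Rosetta Code example uses
"" + (char) (int) compressed.remove(0);
because the first 256 entries of its dictionary map exactly to the char values. In your dictionary, code 0 means 'a', but (char) 0 is the invisible NUL character, so the first 'a' is decoded as NUL; the dictionary entries built from it then turn the other word-initial 'a's into NUL as well, which is why they seem to disappear.
With a custom dictionary, this line should be: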
String act = dictionary.get(compressed.remove(0));
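Here is a self-contained sketch of the corrected decoder with the question's 31-character alphabet, assuming the input contains only those characters. It reads the first code with get(0) instead of remove(0) so the caller's list is left untouched; the class name LzwSketch is hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LzwSketch {
    // The question's custom alphabet: codes 0..30 map to these characters, in this order.
    static final char[] ALPHABET = "abcdefghijklmnopqrstuvwxyz !?.,".toCharArray();

    // Assumes every input character is in ALPHABET.
    static List<Integer> encode(String uncompressed) {
        Map<String, Integer> dictionary = new HashMap<>();
        for (int i = 0; i < ALPHABET.length; i++) {
            dictionary.put(String.valueOf(ALPHABET[i]), i);
        }
        int dictSize = dictionary.size();
        String act = "";
        List<Integer> result = new ArrayList<>();
        for (char c : uncompressed.toCharArray()) {
            String next = act + c;
            if (dictionary.containsKey(next)) {
                act = next;
            } else {
                result.add(dictionary.get(act));
                dictionary.put(next, dictSize++);
                act = String.valueOf(c);
            }
        }
        if (!act.isEmpty()) {
            result.add(dictionary.get(act));
        }
        return result;
    }

    static String decode(List<Integer> compressed) {
        if (compressed.isEmpty()) {
            return "";
        }
        Map<Integer, String> dictionary = new HashMap<>();
        for (int i = 0; i < ALPHABET.length; i++) {
            dictionary.put(i, String.valueOf(ALPHABET[i]));
        }
        int dictSize = dictionary.size();
        // The fix: look the first code up in the dictionary instead of casting it to a char.
        String act = dictionary.get(compressed.get(0));
        StringBuilder result = new StringBuilder(act);
        for (int k : compressed.subList(1, compressed.size())) {
            String entry;
            if (dictionary.containsKey(k)) {
                entry = dictionary.get(k);
            } else if (k == dictSize) {
                entry = act + act.charAt(0); // code refers to the entry being built right now
            } else {
                throw new IllegalArgumentException("No such key: " + k);
            }
            result.append(entry);
            dictionary.put(dictSize++, act + entry.charAt(0));
            act = entry;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        List<Integer> compressed = encode("abraca abrac abra");
        System.out.println(compressed);
        System.out.println(decode(compressed)); // abraca abrac abra
    }
}
```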

Java: delete reversed strings in a list

I have an array or List of Strings like:
{ "A.B", "B.A", "A.C", "C.A" }
and I want to delete the reversed strings from the list so that it ends up with only:
{ "A.B", "A.C" }
Which type should I use for the strings, and how do I delete the reversed ones?
To reverse a string I recommend using a StringBuffer.
String sample = "ABC";
String reversed_sample = new StringBuffer(sample).reverse().toString();
To delete an object from your ArrayList, use the remove method.
String sample = "ABC";
String to_remove = "ADS";
ArrayList<String> list = new ArrayList<String>();
list.add(to_remove);
list.add(sample);
list.remove(to_remove);
You can use a HashMap to determine whether a string is the reversed version of another string in the list. You will also need a utility function to reverse a given string. Take a look at these snippets:
String[] input = { "A.B", "B.A", "A.C", "C.A" };
HashMap<String, String> map = new HashMap<String, String>();
String[] output = new String[input.length];
int index = 0;
for (int i = 0; i < input.length; i++) {
if (!map.containsKey(input[i])) {
map.put(reverse(input[i]), "default");
output[index++] = input[i];
}
}
A sample String-reversing method could be like this:
public static String reverse(String str) {
String output = "";
int size = str.length();
for (int i = size - 1; i >= 0; i--)
output += str.charAt(i) + "";
return output;
}
Output:
The output array will contain these elements => [A.B, A.C, null, null]
Code is worth a thousand words:
import java.util.ArrayList;

public class Tes {
public static void main(String[] args) {
ArrayList<String> arr = new ArrayList<String>();
arr.add("A.B");
arr.add("B.A");
arr.add("A.C");
arr.add("C.A");
System.out.println(arr);
for (int i = 0; i < arr.size(); i++) {
StringBuilder str = new StringBuilder(arr.get(i));
String revStr = str.reverse().toString();
// Remove the reversed partner only if it appears later in the list, so the first
// of each pair is kept and no element is skipped when the indices shift
int j = arr.lastIndexOf(revStr);
if (j > i) {
arr.remove(j);
}
}
System.out.println(arr); // [A.B, A.C]
}
}
You can do this very simply in O(n^2) time. Pseudocode:
For every element1 in the list:
For every element2 in the list after element1:
if reverse(element2).equals(element1)
list.remove(element2)
To make your life easier and to prevent a ConcurrentModificationException, use an Iterator. I won't give you the code, because this is a good exercise in learning how to use iterators properly in Java.
Reverse method:
public String reverse(String toReverse) {
return new StringBuilder(toReverse).reverse().toString();
}
Edit: another reverse method:
public String reverse(String toReverse) {
if (toReverse == null || toReverse.isEmpty()) {
return toReverse;
}
// Reverses the dot-separated parts ("AB.C" -> "C.AB"); split takes a regex, so the dot must be escaped
String[] elems = toReverse.split("\\.");
StringBuilder reversedString = new StringBuilder();
for (int i = elems.length - 1; i >= 0; i--) {
reversedString.append(elems[i]);
if (i > 0) {
reversedString.append(".");
}
}
return reversedString.toString();
}
Check this
public static void main(String arg[]){
String str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
List<String> strList = new ArrayList<String>();
strList.add("A.B");
strList.add("B.A");
strList.add("A.C");
strList.add("C.A");
Iterator<String> itr = strList.iterator();
while(itr.hasNext()){
String [] split = itr.next().toUpperCase().split("\\.");
if(str.indexOf(split[0])>str.indexOf(split[1])){
itr.remove();
}
}
System.out.println(strList);
}
output is
[A.B, A.C]
You can iterate the list while maintaining a Set<String> of elements in it.
While you do it, create a new list (which will be the output) and:
if (!set.contains(current.reverse())) {
outputList.append(current)
set.add(current)
}
This solution is O(n*|S|) on average, where n is the number of elements and |S| is the average string length.
Java Code:
private static String reverse(String s) {
StringBuilder sb = new StringBuilder();
for (int i = s.length()-1 ; i >=0 ; i--) {
sb.append(s.charAt(i));
}
return sb.toString();
}
private static List<String> removeReverses(List<String> arr) {
Set<String> set = new HashSet<String>();
List<String> res = new ArrayList<String>();
for (String s : arr) {
if (!set.contains(reverse(s))) {
res.add(s);
set.add(s);
}
}
return res;
}
public static void main(String[]args){
String[] arr = { "a.c" , "b.c", "c.a", "c.b" };
// removeReverses takes a List, so wrap the array (requires java.util.Arrays)
System.out.println(removeReverses(Arrays.asList(arr)));
}
will yield:
[a.c, b.c]
