So I'm still fairly new to Java and Android programming. I have designed a simple Text Lingo Android app. Everything works well, but I was just wondering if there is an easier way to create my own "dictionary" of words. Currently, my code involves about 80 lines of:
HashMap<String, String> words = new HashMap<>(); // new HashMap
words.put("Lol", "laugh out loud");
words.put(someKey, someValue); // repeat for 80 different words and counting...
I don't know much about databases, and I'm not sure one would really make this easier. Just wondering. Thanks.
The method you have shown looks pretty easy but time-consuming. The only other idea I have: if you have the words in a .csv file, you could read the file line by line, split each line on the delimiter (usually a comma), and then iterate over the resulting string array to fill the map.
That's only faster if you already have a file with the words in it.
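A minimal sketch of the .csv idea, assuming one "abbreviation,meaning" pair per line. The class name LingoLoader and the file layout are made up for illustration; they are not from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

class LingoLoader {
    // Reads "abbreviation,meaning" lines into a map.
    // Blank lines and lines without a comma are skipped.
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> words = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A limit of 2 keeps any further commas inside the meaning.
                String[] parts = line.split(",", 2);
                if (parts.length == 2 && !parts[0].trim().isEmpty()) {
                    words.put(parts[0].trim(), parts[1].trim());
                }
            }
        }
        return words;
    }
}
```

On Android the Reader could, for example, wrap a stream opened from the app's assets; with plain Java a FileReader would do. Adding a word then means adding a line to the file rather than another put call.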
There is indeed no shorter way. It is possible, though, to store the data online in an XML file and load and parse it only when you need it, but you'll have to make a network connection.
I'm starting to work on a new Java desktop app that should help me and my colleagues learn vocabulary. It will contain around 700 words, some texts (that point to the words contained in them) and maybe some images (not sure about that part yet). The data will never change and I want the program to be able to run offline.
The question is: Should I use a database, a text file, or serialize the data into a file? Or is there another option I don't know about? I would be glad if you could explain your choice in detail.
If the data never changes and is only 700 words it would probably be easiest to use a file.
If your data were a bit more complex, had many fields, and was being constantly updated, a database would be preferable, but a csv file could still be used.
Since you want to access this data offline and the data never changes, I think the best option would be to just use a text file, which will be more efficient in terms of access and speed.
Keep all the data in memory as Serializable Java objects, and store them serialized when your application is not running. Evaluate airomem, a really nice solution that would work perfectly for you.
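A rough sketch of the "store them serialized" part using plain Java serialization. This is not airomem's API; the class name VocabularyStore and the choice of a HashMap as the data model are assumptions for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

class VocabularyStore {
    // Writes the whole map to a file; HashMap and String are both Serializable.
    static void save(HashMap<String, String> words, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(words);
        }
    }

    // Reads the map back; the cast is unchecked because the type is not stored in a checkable form.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (HashMap<String, String>) in.readObject();
        }
    }
}
```

The app would call load once at startup and work on the in-memory map from then on. Since the data never changes, save is only needed when the file is first built.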
When I google "how to make a dictionary", it gives me a great many explanations of "make", which is very helpful. But I need something else, so I put this question here.
I want to make a small project. I want to make a dictionary in Java or for Android. But I don't know how I should organize the words. I have considered a JSON file, an XML file, or I can also simply output all the words as objects into a file. Could anyone please give me some advice?
Assuming that you want to be able to read values from your dictionary quickly, and maybe update values or create new ones, I suggest that you store your dictionary in a database. For a simple Java database I suggest an embedded Derby database.
See http://db.apache.org/derby/
I'm writing a tool to analyze stock market data. For this I download data and then save all the data corresponding to a stock as a double[][] array of size 20*100000 in a data.bin file on my hard drive. I know I should put it in some database, but performance-wise this is simply the best method.
Now here is my problem: I need to do updates and searches on the data:
Updates: I have to append new data to the end of the array as time progresses.
Search: I want to iterate over different data files to find a minimum or calculate moving averages etc.
I could do both by reading the whole file in and then either writing the updated data back or searching a specific area, but this is overkill since I don't need all of the data.
So my question is: Is there a library (in Java) or something similar to open/read/change parts of the binary file without having to open the whole file? Or to search through the file starting at a specific point?
RandomAccessFile allows seeking to a particular position in a file and updating parts of the file or adding new data to the end without rewriting everything. See the tutorial here: http://docs.oracle.com/javase/tutorial/essential/io/rafs.html
You could try looking at Random Access Files:
Tutorial: http://docs.oracle.com/javase/tutorial/essential/io/rafs.html
API: http://docs.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html
... but you will still need to figure out the exact positions you want to read in a binary file.
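Working out the positions is straightforward if every record has a fixed size. A sketch, assuming the file stores one record of 20 doubles per time step, back to back (the question does not say how data.bin is laid out, so this layout and the StockFile class are assumptions):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class StockFile {
    static final int FIELDS = 20;                          // assumed: 20 values per time step
    static final int RECORD_BYTES = FIELDS * Double.BYTES; // 8 bytes per double

    // Appends one record at the end without touching the existing data.
    static void append(String path, double[] record) throws IOException {
        if (record.length != FIELDS) {
            throw new IllegalArgumentException("expected " + FIELDS + " values");
        }
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            file.seek(file.length());
            for (double value : record) {
                file.writeDouble(value);
            }
        }
    }

    // Reads the record with the given index by seeking straight to it.
    static double[] read(String path, long index) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(index * RECORD_BYTES);
            double[] record = new double[FIELDS];
            for (int i = 0; i < FIELDS; i++) {
                record[i] = file.readDouble();
            }
            return record;
        }
    }

    // Minimum of one field over records from (inclusive) to to (exclusive).
    static double min(String path, int field, long from, long to) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            double min = Double.POSITIVE_INFINITY;
            for (long i = from; i < to; i++) {
                file.seek(i * RECORD_BYTES + (long) field * Double.BYTES);
                min = Math.min(min, file.readDouble());
            }
            return min;
        }
    }
}
```

Each readDouble and writeDouble call here goes to the file unbuffered, which is simple but slow for large ranges. Reading a whole block into a byte array and decoding it from there would be the obvious next step.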
You might want to consider moving to a database, maybe a small embedded one like H2 (http://www.h2database.com)
I want to build an XML file as a datastore. It should look something like this:
<datastore>
<item>
<subitem></subitem>
...
<subitem></subitem>
</item>
....
<item>
<subitem></subitem>
...
<subitem></subitem>
</item>
</datastore>
At runtime I may need to add items to the datastore. The number of items may be high, so I don't want to hold the whole document in memory and can't use DOM. I just want to write the part where a change occurs. Or does DOM support this?
I had a first look at StAX, but I am not sure if it does what I want.
Wouldn't it be best to remember a cursor position at the end of the file, right before the root element is closed? That is always the position where new items will be added. So if I remember that position and keep it up to date during changes, I could add a new item at the end without iterating through the whole file.
Maybe a second cursor could be used independently of the first one to iterate over the document just for reading purposes.
I can't see that StAX supports any of this, does it?
Isn't there a block-based API for files instead of a stream-based one? Aren't files and filesystems typical examples of block "devices"? And if there is such an API, does it help me with my problem?
Thanks in advance.
Updating XML is basically impossible because there's no "cheap" way to insert data.
Appending XML is not so bad. All you need to do there is seek to the end of the file, then GO BACK over the "end tag" (</datastore> in this case), and then just start writing. This is a cheap operation all told, but none of the frameworks really support this as they're all mostly designed to work with well formed, full boat XML documents, as a whole, not in pieces.
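A sketch of that seek-back-and-append step with RandomAccessFile. It assumes the file is UTF-8 (or another ASCII-compatible encoding), that </datastore> sits within the last 256 bytes, and that the caller supplies a well-formed item; the XmlAppender name is made up.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class XmlAppender {
    private static final String END_TAG = "</datastore>";

    // Overwrites the closing root tag with the new item, then writes the tag again.
    static void appendItem(String path, String itemXml) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            // Only the tail of the file is read, never the whole document.
            int tailSize = (int) Math.min(file.length(), 256);
            long tailStart = file.length() - tailSize;
            byte[] tail = new byte[tailSize];
            file.seek(tailStart);
            file.readFully(tail);

            // ISO-8859-1 maps one byte to one char, so the index is also a byte offset.
            int offset = new String(tail, StandardCharsets.ISO_8859_1).lastIndexOf(END_TAG);
            if (offset < 0) {
                throw new IOException("closing tag not found near end of file");
            }

            file.seek(tailStart + offset);
            file.write((itemXml + "\n" + END_TAG + "\n").getBytes(StandardCharsets.UTF_8));
            // Drop any leftover bytes, e.g. old trailing whitespace.
            file.setLength(file.getFilePointer());
        }
    }
}
```

No XML framework is involved, so nothing checks that itemXml is valid; that is the price of skipping the parser.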
You could use a StAX-like thing, but in this case, StAX isn't aware of the <datastore> tag; rather, it's just aware of the <item> tags and their elements. Then you create Items and start writing, over and over and over, to the same OutputStream that you have set up.
That's the best way to do this.
But if you need to delete or change data, then you get to rewrite stuff, or do hacks, such as marking elements as "inactive": hunting them down in the XML file, seeking to the 'active="Y"' attribute, and then changing the Y to N in place. It can be done, and it will be mostly efficient, but it's far and away outside what the normal XML processing frameworks let you do. If I were to do that, I'd read the entire file once, keep track of those entries, and note their locations within it so that later I could seek to them and change them efficiently.
Then when you update something, you "inactivate" the old one and "append" the new one. Eventually you get to GC the file by rewriting it all and throwing out the old, "inactive" entries.
As a rule of thumb, XML files aren't very efficient as datastores, not for the record-based data you seem to want to use them for.
But if you've already got the file and absolutely can't do anything about it, you can use StAX XMLEventReaders and XMLEventWriters to read through a file quickly and insert/modify elements in it.
But when I say quickly, what I mean is more quickly than DOM would be, but nowhere near as effective as any relational DB.
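A sketch of what that looks like with the javax.xml.stream event API: every event is copied from the reader to the writer, and a new item is slipped in just before the closing </datastore>. The StaxAppender class and the single-subitem shape of the new item are assumptions for illustration. Note that this streams through and rewrites the whole document; it just never holds it in memory.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxAppender {
    // Copies the document event by event and adds one <item> before </datastore>.
    static void copyWithNewItem(Reader in, Writer out, String subitemText)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isEndElement()
                    && "datastore".equals(event.asEndElement().getName().getLocalPart())) {
                // Insert the new item right before the root element closes.
                writer.add(events.createStartElement("", "", "item"));
                writer.add(events.createStartElement("", "", "subitem"));
                writer.add(events.createCharacters(subitemText));
                writer.add(events.createEndElement("", "", "subitem"));
                writer.add(events.createEndElement("", "", "item"));
            }
            writer.add(event);
        }
        writer.close();
        reader.close();
    }
}
```

In practice the Reader and Writer would point at the old file and a temporary new file, which then replaces the old one.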
Update: Another option you can consider is vtd-xml. I haven't tried it in real projects, but it looks pretty decent.
If you always want to append items at the end, then the best way to handle this is to have two XML files. The outer one, datastore.xml, is simply a wrapper, and looks like this:
<!DOCTYPE datastore [
<!ENTITY e SYSTEM "items.xml">
]>
<datastore>&e;</datastore>
The file items.xml looks like this:
<item>....</item>
<item>....</item>
<item>....</item>
with no wrapper element.
When you want to append data, you can open items.xml and write to the end of it. When you want to read data, open datastore.xml with an XML parser.
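In Java this could be sketched roughly as below. The EntityDatastore class is made up, and the code assumes the XML parser is allowed to resolve external entities; hardened parser configurations often switch that off, in which case &e; will not be expanded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EntityDatastore {
    // Appends one item to items.xml; the wrapper file is never modified.
    static void appendItem(Path itemsFile, String itemXml) throws IOException {
        Files.write(itemsFile, (itemXml + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Parses the wrapper; the parser pulls in items.xml through the entity reference.
    static int countItems(Path wrapperFile) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(wrapperFile.toFile());
        return doc.getElementsByTagName("item").getLength();
    }
}
```

DOM is used for reading here only to keep the sketch short. The relative name items.xml in the entity declaration is resolved against the location of the wrapper file, so both files need to live in the same directory.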
Of course, once your data grows beyond 20 MB or so, it may well be better to use an XML database. But I've been using this approach for years for records of Saxon orders, with files that are currently about 8 MB, and it works fine.
It's not very easy or efficient to partially update an XML file so you won't find much support for it as a use case.
Really it sounds like you need a proper database, perhaps with a tool to export the data as XML.
If you don't want to use a DB and insist on storing the data purely as XML you might consider keeping all your items in memory as objects. Whenever a new one is added you can write all of them out to XML. It might seem inefficient, but depending on your data size might still be good enough.
If you choose this path, you might want to check out the XStream library to make this quite easy; see the XStream tutorial for a quick example.
I need to test my data structure (in Java), which is like a dictionary: it holds a key/value map. I would like to know how you test your data structure. I would like to insert real words into my data structure and then find them. I am wondering if there is a way to download all the English words so that I can read that file and populate my structure. Once it is populated, I can perform many searches and produce some real statistics on how long a search takes.
There are indeed several open-source dictionaries for the English language, e.g. the WordNet file.
That said, I must insist that the English language is not a “closed” language, nor does it have one true official definition. As such, there is no dictionary that contains “all the English words” and such a dictionary can never exist: English words are made up all the time, and once enough people use them, they become part of the English language. Case in point: “to google.”
Perhaps Project Gutenberg would be helpful. I've used them on past CS projects. They provide plain text files (e.g. The Valley of Fear), which should be easy to process. You may want to skip over the headers to avoid skewing the results.
This will let you test your dictionary by keeping e.g. a word->count mapping (e.g. Map<String, Integer>) of the words in the file.
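A small sketch of that word->count mapping. The tokenizing rule (lowercase, split on anything that is not a letter or apostrophe) is an assumption, as is the WordCounter name; swap the HashMap for your own structure to test it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class WordCounter {
    // Counts how often each word occurs in the text.
    static Map<String, Integer> count(Reader source) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Anything that is not a-z or an apostrophe separates words.
                for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
                    if (!token.isEmpty()) {
                        counts.merge(token, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }
}
```

Comparing the counts from your structure against the ones from a HashMap on the same text is an easy correctness check before you start timing anything.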
If you're on Linux, you could use the contents of /usr/share/dict/words; there's also WordNet, an English word database.
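A rough sketch for loading such a word list and timing lookups, assuming one word per line in UTF-8. LookupBenchmark is a made-up name, and the Map parameter stands in for whatever structure is being tested.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LookupBenchmark {
    // Reads one word per line, skipping blank lines.
    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Returns { elapsed nanoseconds, number of queries found }.
    static long[] timeLookups(Map<String, ?> dictionary, List<String> queries) {
        long hits = 0;
        long start = System.nanoTime();
        for (String query : queries) {
            if (dictionary.containsKey(query)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        return new long[] { elapsed, hits };
    }
}
```

A single timed pass like this is only a ballpark figure, since JIT warm-up and garbage collection distort it. Repeating the run several times, or using a benchmarking harness such as JMH, gives more trustworthy numbers.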
If you have a key-value structure, you probably don't want a simple list of words; you want a mapping from words to definitions or to words in other languages.
If you don't mind parsing a text file, IDP has a bunch of files for download royalty free.