Here is a piece of data I am working with:
snmp-server view DenyAll iso excluded
snmp-server view iso_view iso included
snmp-server view Cust_View interfaces included
snmp-server view Cust_View ifMIB included
I am attempting to get it into a YML format as seen below:
snmp-server:
  view:
    Cust_View:
      - "interfaces included"
      - "ifMIB included"
      - "etc etc etc"
    DenyAll: "iso excluded"
    iso_view: "iso included"
I've tried to iterate through the data set, split each line on spaces, and use the first two elements of the resulting list as the "key" in the YML file and the remaining elements as the values.
However, this doesn't fit any other data set that I might want to format in the same way.
I am not looking for the code to be written for me. I am looking for ideas on how to go about this and output it in the structure I'd like. I'm perfectly fine writing to a YML file; the only part I'm struggling with is the formatting of the data.
You need to use a Trie (prefix tree) for your task. Read each line, split it into words on spaces, and insert the sequence of words into the trie. Then start from the root of the trie and print the elements in a pre-order traversal, using tabs (or spaces) for indentation at each level.
It also looks like you need the data to be printed in alphabetical order. You can achieve this by keeping the children of each trie node in sorted order.
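The trie idea above can be sketched roughly as follows. This is only an illustration under assumptions: the class name ConfigTrie and its methods are made up for the example, one word is stored per level, and a TreeMap keeps each node's children sorted.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical word-level trie; names are illustrative, not from the original post.
class ConfigTrie {
    // TreeMap keeps children sorted, so traversal is alphabetical.
    private final Map<String, ConfigTrie> children = new TreeMap<>();

    // Insert one line, one word per trie level.
    void insert(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return; // ignore blank lines
        }
        ConfigTrie node = this;
        for (String word : trimmed.split("\\s+")) {
            node = node.children.computeIfAbsent(word, k -> new ConfigTrie());
        }
    }

    // Pre-order traversal, two spaces of indentation per level.
    String render() {
        StringBuilder out = new StringBuilder();
        render(out, 0);
        return out.toString();
    }

    private void render(StringBuilder out, int depth) {
        for (Map.Entry<String, ConfigTrie> entry : children.entrySet()) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(entry.getKey());
            if (!entry.getValue().children.isEmpty()) {
                out.append(':'); // inner nodes become keys
            }
            out.append('\n');
            entry.getValue().render(out, depth + 1);
        }
    }
}
```

Feeding the four snmp-server lines into insert() and calling render() gives a fully nested tree with one word per level (snmp-server, view, Cust_View, ifMIB, included, and so on). Collapsing the trailing words into a single quoted value, as in the desired output, would be an extra step on top of this.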
I'm completely new to programming and to Java in particular, and I am trying to determine which data structure to use for a specific situation. Since I'm not familiar with data structures in general, I have no idea what each structure does or what its limitations are.
So I have a CSV file with a bunch of items in it, let's say characters and matching numbers. So my list looks like this:
A,1,B,2,B,3,C,4,D,5,E,6,E,7,E,8,E,9,F,10......etc.
I need to be able to read this in, and then:
1)display just the letters or just the numbers sorted alphabetically or numerically
2)search to see if an element is contained in either list.
3)search to see if an element pair (for example A - 1 or B-10) is contained in the matching list.
Think of it as an excel spreadsheet with two columns. I need to be able to sort by either column while maintaining the relationship and I need to be able to do an IF column A = some variable AND the corresponding column B contains some other variable, then do such and such.
I need to also be able to insert a pair into the original list at any location. So insert A into list 1 and insert 10 into list 2 but make sure they retain the relationship A-10.
I hope this makes sense, and thank you for any help! I am working on purchasing a Data Structures in Java book to work through and trying to sign up for the class at our local college, but it's only offered every spring...
You could use two sorted Maps such as TreeMap.
One would map Characters to numbers (Map<Character,Number> or something similar). The other would perform the reverse mapping (Map<Number, Character>)
Let's look at your requirements:
1)display just the letters or just the numbers sorted alphabetically or numerically
Just iterate over one of the maps. The iteration will be ordered.
2)search to see if an element is contained in either list.
Just check the corresponding map. Looking for a number? Check the Map whose keys are numbers.
3)search to see if an element pair (for example A - 1 or B-10) is contained in the matching list.
Just get() the value for A from the Character map, and check whether that value is 10. If so, then A-10 exists. If there's no value, or the value is not 10, then A-10 doesn't exist.
When adding or removing elements you'd need to take care to modify both maps to keep them in sync.
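A rough sketch of the two-map idea follows. The class PairIndex and its method names are invented for illustration. One deliberate difference from the answer as written: the sample data repeats letters (B-2, B-3), so each key here maps to a sorted set of values rather than a single value, which is an assumption about what the asker needs.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical wrapper that keeps both directions of the mapping in sync.
class PairIndex {
    private final Map<Character, Set<Integer>> byLetter = new TreeMap<>();
    private final Map<Integer, Set<Character>> byNumber = new TreeMap<>();

    // Always update both maps together so they cannot drift apart.
    void add(char letter, int number) {
        byLetter.computeIfAbsent(letter, k -> new TreeSet<>()).add(number);
        byNumber.computeIfAbsent(number, k -> new TreeSet<>()).add(letter);
    }

    boolean containsLetter(char letter) {
        return byLetter.containsKey(letter);
    }

    boolean containsNumber(int number) {
        return byNumber.containsKey(number);
    }

    // True only if this exact letter-number pair was added.
    boolean containsPair(char letter, int number) {
        return byLetter.getOrDefault(letter, Collections.emptySet()).contains(number);
    }

    // Key sets of a TreeMap iterate in sorted order.
    Set<Character> letters() {
        return byLetter.keySet();
    }

    Set<Integer> numbers() {
        return byNumber.keySet();
    }
}
```

Because both maps are TreeMaps, iterating letters() or numbers() already gives the sorted display from requirement 1, and there is no notion of inserting "at a location": the position follows from the sort order.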
I have been researching different methods for saving and loading configuration settings for my application. I've looked into Preferences, JSON, Properties and XML but I think I've settled on using the Properties method for most of my application settings.
However, I'm not able to find any information on how best to save and load an ArrayList from that file. It seems only individual key/value string pairs are possible.
So my question is basically, is there a better way to do this? I have an ArrayList of Strings in my application that I need to be able to save and load. Can this be done with Properties or do I need to use a separate file just to hold this list and then read it in as an ArrayList (per line, perhaps)?
EDIT: I should mention, I would like to keep all config files as readable text so I am avoiding using Serialization.
You can use commas to place multiple values on the same key.
key:value1,value2,value3
After reading the value back in, split it with the String split() method, which gives you a String[] array. That array can be turned into a list via Arrays.asList() (wrap the result in new ArrayList<>(...) if you need a resizable ArrayList).
Here's a partial MCVE:
Properties prop = new Properties();
ArrayList<String> al = new ArrayList<>();
al.add("value1");
al.add("value2");
al.add("value3");
String values = al.toString();
// substring() strips the surrounding "[" and "]"; note that toString() separates elements with ", "
prop.setProperty("name", values.substring(1, values.length() - 1));
I found that using the following combination worked perfectly in my case.
Save:
String csv = String.join(",", arrayList);
props.setProperty("list", csv);
This will create a String containing each element of the ArrayList, separated with a comma.
Load:
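arrayList = new ArrayList<>(Arrays.asList(csv.split(",")));
Takes the csv String (as read back with props.getProperty("list")) and splits it at each comma, copying the elements into a new ArrayList. Arrays.asList() on its own returns a fixed-size List, not an ArrayList.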
I've seen two approaches for writing lists to a Properties file. One is to store each element of the list as a separate entry by adding the index to the name of the property—something like "mylist.1", "mylist.2". The other is to make a single value of the elements, separated by a delimiter.
The advantage of the first method is that you can handle any value without worrying about what to do if the value contains the delimiter. The advantage of the second is that you can retrieve the whole list without iterating over all entries in the Properties.
In either case, you probably want to write a wrapper (or find a library) around the Properties object that adds methods to store and retrieve lists using whichever scheme you choose. Often these wrappers have methods to validate and convert other common data types, like numbers and URLs.
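As an illustration of the first scheme (one entry per element, with the index appended to the property name), here is a minimal sketch of such a wrapper. The class ListProperties and its method names are hypothetical, and numbering from 1 simply follows the "mylist.1", "mylist.2" example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for storing a list as name.1, name.2, ... entries.
class ListProperties {
    // Write each element under its own indexed key, then drop stale entries
    // left over from a previously stored, longer list.
    static void putList(Properties props, String name, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            props.setProperty(name + "." + (i + 1), values.get(i));
        }
        int stale = values.size() + 1;
        while (props.remove(name + "." + stale) != null) {
            stale++;
        }
    }

    // Read indexed keys until the first missing index.
    static List<String> getList(Properties props, String name) {
        List<String> result = new ArrayList<>();
        for (int i = 1; ; i++) {
            String value = props.getProperty(name + "." + i);
            if (value == null) {
                break;
            }
            result.add(value);
        }
        return result;
    }
}
```

With this scheme a value such as "a,b" round-trips unchanged, at the cost of one property per element in the saved file.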
Question: I have two files: one with a list of serial numbers, items, prices, and locations, and another with items only. I would like to compare the two files and print the number of times each item is repeated in file1, along with its serial number.
Text1 file will have
Text2 file will have
Output should be
So file1 is not formatted in any particular order, while file2 is in order (line by line).
Since you have no apparent code or effort put into this, I'll only hint/guide you to some tools you can use.
For parsing strings: http://docs.oracle.com/javase/6/docs/api/java/lang/String.html
For reading in from a file: http://www.roseindia.net/java/beginners/java-read-file-line-by-line.shtml
And I would recommend reading file #2 first and saving those values to an arraylist, perhaps, so you can iterate through them later on when you do your searching.
Okay, my approach to this would be:
1. Read file1 and file2 into strings.
2. "Split" the strings from file1 and file2 on "," if that is the delimiter being used.
3. Check for the item in every 3rd element, so the iteration would step by 3 each time (you might need to sort both of these if they are not in order).
4. If found, store it in an array, ArrayList, etc. Go back to step 3 if more items are present; otherwise stop.
Even though your file1 is not well formatted, its content has some patterns which you can use to read it successfully.
Each item has all the information (i.e. serial number, name, price, location) but not in a fixed order. So you have to pay attention to, and make use of, the following patterns while you read each item from file1:
Serial number is always a plain integer.
Price contains the $ and . characters.
Location is 2 characters long, all capitals.
Name is a string that cannot be any of the above.
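The four patterns above could be turned into a small classifier along these lines. This is a sketch only: the file contents are not shown in the question, so the exact regular expressions (for example, a price written as $ followed by digits, a dot, and more digits) are assumptions, as are the names FieldClassifier and Kind.

```java
// Hypothetical classifier for the tokens of one item in file1.
class FieldClassifier {
    enum Kind { SERIAL, PRICE, LOCATION, NAME }

    static Kind classify(String token) {
        String t = token.trim();
        if (t.matches("\\d+")) {
            return Kind.SERIAL;     // plain integer
        }
        if (t.matches("\\$\\d+\\.\\d+")) {
            return Kind.PRICE;      // assumed shape: $12.50
        }
        if (t.matches("[A-Z]{2}")) {
            return Kind.LOCATION;   // two capital letters
        }
        return Kind.NAME;           // anything else
    }
}
```

One caveat with this approach: an item name that happens to be two capital letters would be misread as a location, so the real data should be checked for such cases.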
Such problems are not best solved by monolithic Java code. If you are not constrained in your choice of tools, the recommended way is to import the data from file1 into a database table and then run queries from your program to fetch whatever information you like. You can easily select serial numbers by item and group them to get counts per location.
This approach will ensure that you can keep up with changing requirements and if your files are huge you will have good performance.
I hope you are well versed with SQL and DB tools, so I have not posted any details on them.
Use regex.
Step one, tracing and splitting at [\d,], store results in map
Step two, read in the word from the second file. Say it's "pen".
Step three, do a regex search for "pen" on each string within the map.
Step four, if the above returns true, apply something like ([A-Z][A-Z],) to each string within the map.
I have a List in C#. This string list contains paragraphs that are read from an MS Word file. For example:
list 0-> The picture above shows the main report which will be used for many of the markup samples in this chapter. There are several interesting elements in this sample document. First there rae the basic text elements, the primary building blocks for your document. Next up is the table at the bottom of the report which will be discussed in full, including the handy styling effects such as row-banding. Finally the image displayed in the header will be added to finalize the report.
list 1->The picture above shows the main report which will be used for many of the markup samples in this chapter. There are several interesting elements in this sample document. First there rae the basic text elements, the primary building blocks for your document. Various other elements of WordprocessingML will also be handled. By moving the formatting information into styles a higher degree of re-use is made possible. The document will be marked using custom XML tags and the insertion of other advanced elements such as a table of contents is discussed. But before all the advanced features can be added, the base of the document needs to be built.
Something like that.
Now my search string is:
The picture above shows the main report which will be used for many of the markup samples in this chapter. There are several interesting elements in this sample document. First there rae the basic text elements, the primary building blocks for your document. Next up is the table at the bottom of the report which will be discussed in full, including the handy styling effects such as row-banding. Before going over all the elements which make up the sample documents a basic document structure needs to be laid out. When you take a WordprocessingML document and use the Windows Explorer shell to rename the docx extension to zip you will find many different elements, especially in larger documents.
I want to check my search string against those list elements.
My criterion is: "If a list element is an 85% match or an exact match of the search string, then we want to retrieve that list element."
In our case,
list 0 -> matches my search string more closely.
list 1 -> also matches some of the text, but I think it falls below my criterion...
How do I do this kind of criteria-based search on strings?
I am also still somewhat confused about my problem.
Your ideas and thoughts are welcome...
The keyword is DISTANCE, or "string distance", and also "paragraph similarity".
You seek to implement a function which expresses as a scalar, say a percentage as suggested in the question, how similar one string is to another.
Plain string distance functions such as Hamming or Levenshtein may not be appropriate, for they work at the character level rather than at the word level, but generally these algorithms convey the idea of what is needed.
Working at the word level, you'll probably also want to take into account some common NLP features, for example ignoring (or giving less weight to) very common words (such as 'the', 'in', 'of', etc.) and maybe allowing for some form of stemming. The order of the words, or at least their proximity, may also matter.
One key factor to remember is that even with relatively short strings, many distance functions can be quite expensive, computationally speaking. Before selecting one particular algorithm you'll need to get an idea of the general parameters of the problem:
How many strings would have to be compared? (on average, maximum)
How many words/tokens do the strings contain? (on average, maximum)
Is it possible to introduce a simple (quick) filter to reduce the number of strings to be compared?
How fancy do we need to get with linguistic features?
Is it possible to pre-process the strings?
Are all the records in a single language?
Comparing Methods for Single Paragraph Similarity Analysis, a scholarly paper, provides a survey of relevant techniques and considerations.
In a nutshell, the amount of design time and run time one can apply to this relatively open problem varies greatly, and is typically a compromise between the level of precision desired on the one hand and the run-time resources and overall solution complexity that are acceptable on the other.
In its simplest form, when the order of the words matters little, computing the sum of factors based on the TF-IDF values of the words which match may be a very acceptable solution.
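As a rough illustration of that simplest form, the sketch below scores a candidate paragraph by the weighted share of the search string's distinct words that it contains. It is a simplification rather than full TF-IDF: term frequency is ignored and only a smoothed inverse document frequency is used as the weight. The class name IdfOverlap, the tokenization rule, and the smoothing formula are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical IDF-weighted word overlap between a query and a paragraph.
class IdfOverlap {
    private final Map<String, Double> idf = new HashMap<>();
    private final double unseenWeight; // weight for words absent from the corpus

    IdfOverlap(List<String> corpus) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : corpus) {
            for (String word : tokens(doc)) {
                docFreq.merge(word, 1, Integer::sum);
            }
        }
        int n = corpus.size();
        // Smoothed IDF: rare words weigh more, very common words weigh less.
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            idf.put(e.getKey(), Math.log((n + 1.0) / (e.getValue() + 1.0)) + 1.0);
        }
        unseenWeight = Math.log(n + 1.0) + 1.0;
    }

    // Distinct lower-cased words; everything that is not a-z or 0-9 separates words.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                out.add(word);
            }
        }
        return out;
    }

    // Fraction (0.0 to 1.0) of the query's total word weight found in doc.
    double score(String query, String doc) {
        Set<String> docWords = tokens(doc);
        double matched = 0.0;
        double total = 0.0;
        for (String word : tokens(query)) {
            double weight = idf.getOrDefault(word, unseenWeight);
            total += weight;
            if (docWords.contains(word)) {
                matched += weight;
            }
        }
        return total == 0.0 ? 0.0 : matched / total;
    }
}
```

Building an IdfOverlap from the list of paragraphs and keeping the elements whose score against the search string is at least 0.85 would be one way to approximate the "85% match" criterion, though word order is ignored entirely.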
Fancier solutions may introduce a pipeline of processes borrowed from NLP, for example part-of-speech tagging (say, to avoid false positives such as confusing "saw" the noun, a tool for cutting wood, with "saw" the past tense of the verb "to see", or more likely to filter out some words outright based on their grammatical function), stemming, and possibly semantic substitutions, concept extraction, or latent semantic analysis.
You may want to look into Lucene for Java or Lucene.NET for C#. I don't think it will handle your percentage requirement out of the box, but it's a great tool for text matching.
You could maybe run a separate query for each word, and then work out for yourself the percentage of words that matched.
Here's an idea (and not a solution by any means but something to get started with)
private IEnumerable<string> SearchList = GetAllItems(); // load your list

void Search(string searchPara)
{
    char[] delimiters = new char[] { ' ', '.', ',' };
    var wordsInSearchPara = searchPara.Split(delimiters, StringSplitOptions.RemoveEmptyEntries).Select(a => a.ToLower()).OrderBy(a => a);
    foreach (var item in SearchList)
    {
        var wordsInItem = item.Split(delimiters, StringSplitOptions.RemoveEmptyEntries).Select(a => a.ToLower()).OrderBy(a => a);
        var common = wordsInItem.Intersect(wordsInSearchPara);
        // now that you know the common items, you can get the differential
    }
}
I'm a second-year MCS student. I'm doing a project in Java in which I have different images. To store the description of, say, IMAGE-1, I have an ArrayList named IMAGE-1; similarly, for IMAGE-2 there is an ArrayList IMAGE-2, and so on.
Now I need to develop a search engine in which I need to find all images whose description matches a word entered in the search engine.
For example, if I enter "computer" then I should be able to find all images whose description contains "computer".
So my question is...
How should I do this efficiently?
How should I maintain all those ArrayLists, since I can have hundreds of them? Or should I use another data structure instead of ArrayList?
A simple implementation is to tokenize the description and use a Map<String, Collection<Item>> to store all items for a token.
Building:
for (String token : tokenize(description)) map.computeIfAbsent(token, k -> new ArrayList<>()).add(item);
(A collection is needed because multiple items can be found for a token; computeIfAbsent() creates the collection the first time a token is seen.)
Use:
Collection<Item> result = map.get("Computer");
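Put together, the build and lookup steps above might look like the following sketch. The class ImageIndex is hypothetical, image names stand in for the Item type, and the tokenizer (lower-casing and splitting on anything that is not a letter or digit) is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical inverted index: word -> names of images whose description contains it.
class ImageIndex {
    private final Map<String, Set<String>> index = new HashMap<>();

    // Tokenize the description and register the image under every word.
    void add(String imageName, String description) {
        for (String token : description.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            index.computeIfAbsent(token, k -> new TreeSet<>()).add(imageName);
        }
    }

    // Whole-word, case-insensitive lookup; empty set when nothing matches.
    Set<String> search(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

A lookup is then a single map access regardless of how many images are stored, but it only finds whole words: searching for "comp" will not match "computer" with this structure.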
The general-purpose HashMap implementation is not the most efficient in this case. When you start running into memory problems, you can look into a tree implementation that is more efficient (like radix trees).
The next step could be to use some (in-memory) database. These could be relational (HSQL) or not (Berkeley DB).
If you have a small number of images and short descriptions (< 1000 characters), load them into an array and search for words using String.indexOf() (i.e. one entry in the array == one complete image description). This is efficient enough for, say, fewer than 10,000 images.
Use toLowerCase() to fold the case of the characters (so users will find "Computer" when they type "computer"). String.indexOf() will also work for short words (using "comp" to find "Computer" or "compare").
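The array scan described above could look like this minimal sketch. The class and method names are hypothetical, and the method returns the array positions of the matching descriptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical linear scan over an array of image descriptions.
class LinearSearch {
    // Returns the indexes of all descriptions containing the query,
    // ignoring case; partial words such as "comp" also match.
    static List<Integer> find(String[] descriptions, String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < descriptions.length; i++) {
            if (descriptions[i].toLowerCase(Locale.ROOT).indexOf(needle) >= 0) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

Lower-casing every description on each search is wasteful; storing the descriptions already lower-cased would avoid that if it ever becomes noticeable.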
If you have lots of images and long descriptions and/or you want to give your users some comforts for the search (like Google does), then use Lucene.
There is no simple, easy-to-use data structure that supports efficient fulltext search.
But do you actually need efficiency? Is this a desktop app or a web app? In the former case, don't worry about efficiency, a modern CPU can search through megabytes of text in fractions of a second - simply look through all your descriptions using String.contains() (or a regexp to allow more flexible searches).
If you really need efficiency (such as for a webapp where many people could do searches at the same time), look into Apache Lucene.
As for your ArrayLists, it seems strange to use one for the description of a single image. Why a list, what does the index represent? Lines? If so, and unless you actually need to access lines directly, replace the lists with a simple String - it can contain newline characters just fine.
I would suggest using the Hashtable class, or organizing your content into a tree, to optimize searching.