Using a Markov Model to analyze a text input in Java

I'm very new to Java and I'm required to use a Markov Model to analyze a text (String) input. I will be honest: this is for an assignment. I am looking to learn how to answer it, not just copy and paste code.
The code I am working with (this is given and cannot be altered) is below:
public class MarkovModel
{
/** Markov model order parameter */
int k;
/** ngram model of order k */
NgramAnalyser ngram;
/** ngram model of order k+1 */
NgramAnalyser n1gram;
/**
* Construct an order-k Markov model from string s
* @param k int order of the Markov model
* @param s String input to be modelled
*/
public MarkovModel(int k, String s)
{
//TODO replace this line with your code
}
The exact question is: Complete the code for the constructor, which takes an integer k and an input string s, and initializes any data structures needed for a k-th order Markov model.
We are also using an n-gram analyser, and the text should wrap around (for example, with "abbbc", the 3-grams would be abb, bbb, bbc, bca, and cab). Please let me know if you need any more information! Again, I am not looking to copy and paste code, just want a little bit of help understanding how to solve it.
Thanks in advance.
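The wrap-around rule described in the question can be sketched in a few lines. This is only an illustration: WrapAroundNgrams and its count method are hypothetical names, not the assignment's NgramAnalyser, whose constructor and methods are not shown in the question. For a k-th order model, the constructor would presumably store k and build one analyser of order k and one of order k+1 from s, but that depends on the NgramAnalyser API you were given.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT the assignment's NgramAnalyser: counts every
// n-gram of s, treating s as circular so windows at the end wrap to the start.
class WrapAroundNgrams {
    static Map<String, Integer> count(String s, int n) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        if (s.isEmpty() || n <= 0) {
            return counts;
        }
        // Append the first n-1 characters so the last windows can wrap around.
        StringBuilder extended = new StringBuilder(s);
        for (int i = 0; i < n - 1; i++) {
            extended.append(s.charAt(i % s.length()));
        }
        // One window per starting position in the original string.
        for (int i = 0; i < s.length(); i++) {
            counts.merge(extended.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {abb=1, bbb=1, bbc=1, bca=1, cab=1}
        System.out.println(count("abbbc", 3));
    }
}
```

With counts of order k and k+1 available, the estimated probability that a character c follows a k-character context is the count of context + c divided by the count of context, which is why the class keeps two analysers.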

Related

Using Text data type in JavaRDD and returning void in FlatMap

I am trying to migrate Hadoop code to Spark. I already have some predefined functions that I should be able to reuse in Spark, as they are plain Java code without much Hadoop dependency. I have a function that accepts input (spatial data: longitude, latitude) in Text format and converts it into a shape (Polygons, linestream etc). When I read the data in Spark, I first read each line of the file as a String and then convert it to Text so that I can use my existing function. But I have two doubts. First, it seems that JavaRDD doesn't work with Text, and I am running into problems because of that. Second, the function that converts Text to a shape doesn't return anything, and I am not able to use flatMap or any other mapping technique with it. I am not even sure whether my approach is correct.
Here is my code model:
/*function for converting Text to Shape*/
public interface TextSerializable {
public Text toText(Text text);
/**
* Retrieve information from the given text.
* @param text The text to parse
*/
public void fromText(Text text);
}
/*Shape Class looks something like this*/
public interface Shape extends Writable, Cloneable, TextSerializable {
/**
* Returns minimum bounding rectangle for this shape.
* @return The minimum bounding rectangle for this shape
*/
public Rectangle getMBR();
/**
* Gets the distance of this shape to the given point.
* @param x The x-coordinate of the point to compute the distance to
* @param y The y-coordinate of the point to compute the distance to
* @return The Euclidean distance between this object and the given point
*/
......
......
......*/
/*My code structure*/
SparkConf conf = new SparkConf().setAppName("XYZ").setMaster("local");
JavaSparkContext sc =new JavaSparkContext(conf);
final Text text=new Text();
JavaRDD<String> lines = sc.textFile("ABC.csv");
lines.foreach(new VoidFunction<String>(){
public void call(String lines){
text.set(lines);
System.out.println(text);
}
});
/*Problem*/
text.flatMap(new FlatMapFunction<Text>(){
public Iterable<Shape> call(Shape s){
s.fromText(text);
//return void;
}
The last part of the code is wrong, but I don't know how to fix it. As far as I know, JavaRDD can be used with user-defined classes. I am not even sure whether the way I converted the String lines to Text text is allowed in an RDD. I am completely new to Spark. Any kind of help would be great.

Your approach is off conceptually. First, you cannot call functions like map or flatMap on an arbitrary object; they can be called only on a JavaRDD, and Text is not a JavaRDD. Spark does support Text, but not in the way you used it.
Now, coming to your question: since you want to convert a String to Text, use something like this:
SparkConf conf = new SparkConf().setAppName("Name of Application");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> logData = sc.textFile("replace with address of file");
/* This map function takes a String as input because we call it on the JavaRDD logData, whose elements are Strings. It gives Text as output.
You can replace the return statement with the logic of your toText function (new Text(s) is also a way to convert a String into Text), but remember that a return value is mandatory, so apply your logic accordingly.
*/
JavaRDD<Text> rddone = logData.map(new Function<String,Text>(){
public Text call(String s)
{// type logic of your toText() function here
return new Text(s);}});
Now, when we call flatMap on the JavaRDD rddone, it takes Text as input, since the elements of rddone are Text, and it can output whatever you want.
/* This flatMap function takes Text as input and returns an iterator over objects */
JavaRDD <Object> empty = rddone.flatMap(new FlatMapFunction<Text,Object>(){
public Iterator<Object> call(Text te)
{
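// here you can call your fromText(te) method.
// Return an iterator (an empty one if there is nothing to emit) rather than null.
return java.util.Collections.emptyIterator();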
}
});
Also refer to these links for more details: http://spark.apache.org/docs/latest/programming-guide.html
http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/api/java/JavaRDD.html

Java 3.0 get distinct values from a column mongodb

I am really struggling here and have looked at other questions, but I just can't seem to get the answer I need.
What I am trying to do is pull all the unique values of a column, then iterate through them and add them to an array, so that the array ends up holding one of each value that exists in that column rather than the duplicates it currently has.
Every time I try to call .distinct, it asks me for the return class. I have tried many different classes, but it just doesn't seem to work. The code is below; any help would be appreciated.
public static void MediaInteraction() {
//Storing data from MediaInteraction in MediaArray
//BasicDBObject Query = new BasicDBObject();
//Query.put("consumerid", "");
MongoCursor<Document> cursormedia = collectionmedia.distinct("consumerid", (What do I put here?)).iterator();
while (cursormedia.hasNext()) {
System.out.println(cursormedia.next());
MediasessionID.add(cursormedia.next());
}
System.out.println("Media Array Complete");
System.out.println(MediasessionID.size());
}

The change you probably want to make looks something like this:
MongoCursor<Document> cursormedia = collectionmedia.distinct("consumerid",
<ConsumerId-DataType>.class).iterator(); //please replace the consumerId field's datatype here
Also from the docs -
/**
* Gets the distinct values of the specified field name.
*
* @param fieldName the field name
* @param resultClass the class to cast any distinct items into.
* @param <TResult> the target type of the iterable.
* @return an iterable of distinct values
* @mongodb.driver.manual reference/command/distinct/ Distinct
*/
<TResult> DistinctIterable<TResult> distinct(String fieldName, Class<TResult> resultClass);
So in your example, if you want a cursor of Document, you probably want to use Document.class in the code suggested above.
Edit: also note that because you call cursormedia.next() twice per loop iteration, only every second value is added, so the size of MediasessionID is roughly halved. I suggest calling next() once per iteration, storing the result in a variable, and using that variable for both the print and the add.
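The effect of the double next() call can be reproduced without a database, because MongoCursor is an Iterator. The sketch below uses a plain java.util.Iterator over made-up consumer ids; CursorLoopDemo and its method names are hypothetical and only illustrate the loop shape.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class: shows why calling next() twice per pass drops values.
class CursorLoopDemo {
    // Loop shape from the question: one next() for printing, another for adding.
    static List<String> collectCallingNextTwice(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            it.next();              // this value is consumed and never stored
            if (it.hasNext()) {     // guard added here so odd sizes do not throw
                out.add(it.next());
            }
        }
        return out;
    }

    // Corrected shape: read once, reuse the variable.
    static List<String> collectCallingNextOnce(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            String value = it.next();
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("c1", "c2", "c3", "c4");
        System.out.println(collectCallingNextTwice(ids.iterator())); // [c2, c4]
        System.out.println(collectCallingNextOnce(ids.iterator()));  // [c1, c2, c3, c4]
    }
}
```

Without the extra hasNext() guard, the original loop would also throw NoSuchElementException whenever the number of distinct values is odd.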

Searching for suitable data structure in Java [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
We are working on a game and looking to develop a functionality which will allow us to mix various items in a manner similar to "alchemy" game. The main idea is that we have a number of elements, which can be split into three groups: basic, intermediate and final. Basic resources can be merged together and make an intermediate resource, intermediate resources can be merged with intermediate and basic resources and make final and so on.
So, we are thinking about having two HashMaps: one would indicate what each resource can be combined with, and the second would map what each resource is made of. Is there a better way to do this? Is there any data structure that we are not aware of?
Thanks

Just write your own Datastructure like this
public class Element {
enum Type{BASIC, INTERMEDIATE, FINAL};
private Type type;
private String name;
private List<Element> combinable;
}

What you want is an enum containing all your elements, with a couple of methods. Here is an example; feel free to use it if it suits your needs.
If desired, you can also make a second enum for Type (as Templar suggested) and add it as a field in your Element enum.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public enum Element {
//Example instances, replace with what is appropriate for your game.
WATER, // Basic
WOOD, // Basic
IRON, // Basic
STONE, // Basic
FIRE, // Basic
CARBON(WOOD, FIRE), //Intermediate
FORGE(STONE, IRON), // Intermediate
STEEL(FORGE, IRON); // Final
private Element[] parts;
private Element() {
//instantiates parts to prevent NullPointerException
this.parts = new Element[0];
}
private Element(Element... parts) {
this.parts = parts;
}
/**
* Return all the parts of this Element.
* @return the parts this Element is made of (empty for basic elements)
*/
public List<Element> getParts() {
return Arrays.asList(parts);
}
/**
* Find all elements that have this Element listed as one of their parts.
*
* @return the elements that list this Element as a part
*/
public List<Element> getComposites() {
List<Element> composites = new ArrayList<Element>();
// Iterate through all Elements
for (Element composite : Element.values()) {
// Iterate through each Element's parts
for (Element part : composite.parts) {
// If the composite has a part equal to this element,
// add the composite to the list of composites.
if (part == this) {
composites.add(composite);
}
}
}
return composites;
}
}

You should use the Composite design pattern.
In your case BasicResource is a leaf class.
intermediate and final are composites.
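A minimal sketch of how the Composite pattern could look here, assuming hypothetical names (Resource, BasicResource, CompositeResource) that do not come from the question: a basic resource is a leaf, and intermediate or final resources are composites built from other resources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Common interface for leaves and composites (hypothetical names throughout).
interface Resource {
    String name();
    // Names of the basic resources this resource is ultimately made of.
    List<String> basicIngredients();
}

// Leaf: a basic resource is made only of itself.
class BasicResource implements Resource {
    private final String name;
    BasicResource(String name) { this.name = name; }
    @Override public String name() { return name; }
    @Override public List<String> basicIngredients() {
        return Collections.singletonList(name);
    }
}

// Composite: an intermediate or final resource made of other resources.
class CompositeResource implements Resource {
    private final String name;
    private final List<Resource> parts;
    CompositeResource(String name, Resource... parts) {
        this.name = name;
        this.parts = Arrays.asList(parts);
    }
    @Override public String name() { return name; }
    @Override public List<String> basicIngredients() {
        // Recurse into each part and flatten the result.
        List<String> result = new ArrayList<>();
        for (Resource part : parts) {
            result.addAll(part.basicIngredients());
        }
        return result;
    }
}
```

For example, with stone and iron as BasicResource instances, forge = new CompositeResource("forge", stone, iron) and steel = new CompositeResource("steel", forge, iron), calling steel.basicIngredients() walks the tree and returns stone, iron, iron.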

I would actually separate elements and their combinability - if each element has to contain a list of elements it's combinable with, whenever you want to add a new element you have to go back and add it to all old elements you want it to combine with.
I'd separate the concept out into something like "Elements" and "Formulas" -
class Element {
enum Type{NULL, BASIC, INTERMEDIATE, FINAL};
private Type type;
private String name;
// ....
}
class Formula {
private List<Element> requires = new ArrayList<Element>();
private Element produces;
public Formula(List<Element> requires, Element produces) {
this.requires.addAll(requires); // copy the input; Collections.copy(dest, src) had its arguments reversed and needs a pre-sized dest
this.produces = produces;
}
public final List<Element> requiredElements() {
return Collections.unmodifiableList(requires);
}
public final boolean applyFormula(List<Element> ingredients) {
for (Element e : requires) {
if (!ingredients.contains(e)) {
// ingredients doesn't contain a required element - return early.
return false;
}
}
for (Element e : requires) {
ingredients.remove(e);
}
ingredients.add(produces);
return true;
}
}

If you're creating a game, having this data hard-coded in your Java source code is going to make things a pain. You'll have to recompile every time you want to add a new element, change a relationship (what is composed of what), etc.
Instead, I'd recommend storing all of your element information in an external source and then reading it in / accessing it from your program. You could do this with a database, but I have a feeling that's a little bit overkill (at least for now). Instead, you could use a nice readable, plain-text, standardized format like JSON to define your elements and their relationships externally, and then import all the data using a library (I'd suggest Gson) for easy access in your program.
As for the data structure, I think your choice of HashMaps would work just fine. Since JSON is built on two basic types of data structures—lists [] and maps {}—that's what Gson would convert things to anyway. Here's a very simple example of how I'd envision your element specification:
{
"elements" : {
"iron" : "basic",
"carbon" : "basic",
"steel" : "intermediate"
},
"formulae" : {
"steel" : [ "iron", "carbon" ]
}
}
You could read that in with Gson (or whatever JSON library you choose), and then build whatever other data structures you need from that. If you can figure out how to get Gson to create the data structures you want directly (I know this is possible, but I don't remember how hard it is to do the configuration), then that would be even better. For example, if you could turn the "formulae" value into a BidiMap (the Apache Commons bi-directional map) then that might be very useful (but you'd also need to turn the components list into a set to keep it order-agnostic, e.g. iron+carbon is the same as carbon+iron).
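The order-agnostic lookup mentioned above can be sketched with the standard library alone, keying a map by the set of ingredients. FormulaBook, addFormula and combine are hypothetical names, and the JSON parsing step is left out; this assumes the formulae have already been read into memory by whatever JSON library is used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical lookup table: set of ingredient names -> product name.
class FormulaBook {
    private final Map<Set<String>, String> productByIngredients = new HashMap<>();

    // Register a formula, e.g. addFormula("steel", "iron", "carbon").
    void addFormula(String product, String... ingredients) {
        productByIngredients.put(new HashSet<>(Arrays.asList(ingredients)), product);
    }

    // Returns the product for these ingredients in any order, or null if none matches.
    String combine(String... ingredients) {
        return productByIngredients.get(new HashSet<>(Arrays.asList(ingredients)));
    }

    public static void main(String[] args) {
        FormulaBook book = new FormulaBook();
        book.addFormula("steel", "iron", "carbon");
        System.out.println(book.combine("carbon", "iron")); // steel
        System.out.println(book.combine("iron", "water"));  // null
    }
}
```

Because HashSet equality depends only on contents, iron+carbon and carbon+iron hit the same key. One limitation of this sketch is that a set drops duplicates, so a recipe that needs two of the same ingredient would need a different key type.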
For even more dynamic behavior, you could add a feature into your program to allow you to reload all of your elements data while your game is still running! (This might make debugging easier.)
I know this isn't exactly what you were asking, but I hope you find my suggestions helpful anyway!

Char[] to Byte[] for output optimize in web (java)

I just found this in an experience-sharing presentation from InfoQ. It claims that if you convert the String to byte[] in a servlet, it will increase the QPS (queries per second?).
The code example shows the comparison:
Before
private static String content = "…94k…";
protected doGet(…){
response.getWriter().print(content);
}
After
private static String content = "…94k…";
private static byte[] bytes = content.getBytes();
protected doGet(…){
response.getOutputStream().write(bytes);
}
Result before
page size（K）94
max QPS 1800
Result after
page size（K）94
max QPS 3500
Can anyone explain why this is an optimization? I trust the result to be true.
UPDATE
In case I caused any confusion: the original presentation only uses this as an example. They actually refactored the Velocity engine in this way, but that source code is a bit long.
The presentation didn't actually describe how they did it in detail, but I found a lead.
In ASTText.java, they cached byte[] ctext instead of char[] ctext, which boosts the performance a lot.
It is just like the approach above. It makes a lot of sense, right?
(But they would definitely also have to refactor the Node interface. A Writer cannot write byte[], which means using an OutputStream instead!)
As Perception advised, a Writer ultimately delegates to a StreamEncoder. StreamEncoder's write first converts the char[] into a byte[] and then delegates to the OutputStream to do the real write. You can easily check the source code to confirm this.
Considering that the render method is called each time the page is shown, the cost saving will be considerable.
StreamEncoder.class
public class ASTText extends SimpleNode {
private char[] ctext;
/**
* @param id
*/
public ASTText(int id) {
super (id);
}
/**
* @param p
* @param id
*/
public ASTText(Parser p, int id) {
super (p, id);
}
/**
* @see org.apache.velocity.runtime.parser.node.SimpleNode#jjtAccept(org.apache.velocity.runtime.parser.node.ParserVisitor, java.lang.Object)
*/
public Object jjtAccept(ParserVisitor visitor, Object data) {
return visitor.visit(this , data);
}
/**
* @see org.apache.velocity.runtime.parser.node.SimpleNode#init(org.apache.velocity.context.InternalContextAdapter, java.lang.Object)
*/
public Object init(InternalContextAdapter context, Object data)
throws TemplateInitException {
Token t = getFirstToken();
String text = NodeUtils.tokenLiteral(t);
ctext = text.toCharArray();
return data;
}
/**
* @see org.apache.velocity.runtime.parser.node.SimpleNode#render(org.apache.velocity.context.InternalContextAdapter, java.io.Writer)
*/
public boolean render(InternalContextAdapter context, Writer writer)
throws IOException {
if (context.getAllowRendering()) {
writer.write(ctext);
}
return true;
}
}

Apart from the fact that you aren't calling the same output methods, in your second example you avoid the overhead of converting the String to bytes on every request: the conversion happens once, when the class is initialized, instead of each time the content is written to the output stream. These scenarios are not very realistic, though; the dynamic nature of web applications precludes pre-converting all your data models into byte streams. And there are no serious architectures out there now in which you write directly to the HTTP output stream like this.
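The difference between the two paths can be sketched without a servlet container. In the sketch below, PreEncodedContent is a hypothetical class and a ByteArrayOutputStream stands in for the HTTP response stream: the Writer path encodes chars to bytes on every call, while the stream path reuses bytes that were encoded once. Both produce the same output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the servlet example; no servlet API is used.
class PreEncodedContent {
    static final String CONTENT = "<html>hello</html>";
    // Encoded once, when the class is initialized.
    static final byte[] BYTES = CONTENT.getBytes(StandardCharsets.UTF_8);

    // "Before": every call converts the String's chars to bytes again.
    static byte[] writeViaWriter() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(CONTENT);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // "After": the pre-encoded bytes are copied straight to the stream.
    static byte[] writeViaStream() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(BYTES, 0, BYTES.length);
        return out.toByteArray();
    }
}
```

The sketch names the charset explicitly so that both paths are guaranteed to agree; the snippet in the question uses getBytes() with the platform default, which only matches the response's encoding if the two happen to be the same.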

Better way to represent array in java properties file

I'm currently making a .properties file that needs to be loaded and transformed into an array. But anywhere from 0 to 25 of each of the property keys may exist. I tried a few implementations, but I'm stuck on doing this cleanly. Does anyone have any ideas?
foo.1.filename=foo.txt
foo.1.expire=200
foo.2.filename=foo2.txt
foo.2.expire=10
etc more foo's
bar.1.filename=bar.txt
bar.1.expire=100
where I'll assemble the filename/expire pairings into a data object, as part of an array for each parent property element like foo[myobject]
Formatting of the properties file can change, I'm open to ideas.

I can suggest using delimiters and using the
String.split(delimiter)
Example properties file:
MON=0800#Something#Something1, Something2
prop.load(new FileInputStream("\\\\Myseccretnetwork\\Project\\props.properties"));
String[]values = prop.get("MON").toString().split("#");
Hope that helps

I didn't exactly get your intent.
Do check the Apache Commons Configuration library: http://commons.apache.org/configuration/
You can have multiple values against a key, as in
key=value1,value2
and you can read this into an array with configuration.getStringArray("key")

Either define a delimiter that will not be a potential value or learn to use XML.
If you still insist on using properties, use one of the methods that returns a list of all keys. Your key appears to have three parts: a group identifier (foo, bar), an index (1, 2), and an element name (filename, expire). Get all the keys and break them into their component parts. Create a List for each group identifier and, when processing the keys, use the identifier to determine which List to add to. Create your paired elements as you said and simply add them to the list! If the index order is important, either add the index as a field to your paired elements or sort the keys before processing.
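The key-splitting approach described above could look roughly like this. It is a sketch under the assumption that every relevant key has exactly the form group.index.element, as in the question's example; GroupedProperties and group are hypothetical names, and nested maps stand in for the asker's own data object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: groups keys like "foo.1.filename" by group, then by index.
class GroupedProperties {
    // Result shape: group -> (index -> (element name -> value)), indexes sorted numerically.
    static Map<String, SortedMap<Integer, Map<String, String>>> group(Properties props) {
        Map<String, SortedMap<Integer, Map<String, String>>> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            String[] parts = key.split("\\.");
            if (parts.length != 3) {
                continue; // not of the form group.index.element
            }
            int index;
            try {
                index = Integer.parseInt(parts[1]);
            } catch (NumberFormatException e) {
                continue; // middle part is not a numeric index
            }
            result.computeIfAbsent(parts[0], g -> new TreeMap<>())
                  .computeIfAbsent(index, i -> new HashMap<>())
                  .put(parts[2], props.getProperty(key));
        }
        return result;
    }
}
```

After loading the file with Properties.load, group(props).get("foo") would hold one entry per index, each mapping filename and expire to their values; turning each inner map into a data object is then a simple loop.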

Use YAML files for properties, this supports properties as an array.
Quick glance about YAML:
A superset of JSON, it can do everything JSON can + more
Simple to read
Long properties into multiline values
Supports comments
Properties as Array
YAML Validation

I have custom loading. Properties must be defined as:
key.0=value0
key.1=value1
...
Custom loading:
/** Return array from properties file. Array must be defined as "key.0=value0", "key.1=value1", ... */
public List<String> getSystemStringProperties(String key) {
// result list
List<String> result = new LinkedList<>();
// defining variable for assignment in loop condition part
String value;
// next value loading defined in condition part
for(int i = 0; (value = YOUR_PROPERTY_OBJECT.getProperty(key + "." + i)) != null; i++) {
result.add(value);
}
// return
return result;
}

I highly recommend using Apache Commons (http://commons.apache.org/configuration/). It has the ability to use an XML file as a configuration file. Using an XML structure makes it easy to represent arrays as lists of values rather than specially numbered properties.

Here is another way to do it, by implementing the mechanism yourself.
Here we assume that the array starts at 0 and has no holes between indices.
/**
* get a string property's value
* @param propKey property key
* @param defaultValue default value if the property is not found
* @return value
*/
public static String getSystemStringProperty(String propKey,
String defaultValue) {
String strProp = System.getProperty(propKey);
if (strProp == null) {
strProp = defaultValue;
}
return strProp;
}
/**
* internal recursive method to get string properties (array)
* @param curResult current result
* @param paramName property key prefix
* @param i current index
* @return list of property values
*/
private static List<String> getSystemStringProperties(List<String> curResult, String paramName, int i) {
String paramIValue = getSystemStringProperty(paramName + "." + String.valueOf(i), null);
if (paramIValue == null) {
return curResult;
}
curResult.add(paramIValue);
return getSystemStringProperties(curResult, paramName, i+1);
}
/**
* get the values from a property key prefix
* @param paramName property key prefix
* @return string array of values
*/
public static String[] getSystemStringProperties(
String paramName) {
List<String> stringProperties = getSystemStringProperties(new ArrayList<String>(), paramName, 0);
return stringProperties.toArray(new String[stringProperties.size()]);
}
Here is a way to test :
@Test
public void should_be_able_to_get_array_of_properties() {
System.setProperty("my.parameter.0", "ooO");
System.setProperty("my.parameter.1", "oO");
System.setProperty("my.parameter.2", "boo");
// WHEN
String[] pluginParams = PropertiesHelper.getSystemStringProperties("my.parameter");
// THEN
assertThat(pluginParams).isNotNull();
assertThat(pluginParams).containsExactly("ooO","oO","boo");
System.out.println(pluginParams[0].toString());
}
hope this helps
and all remarks are welcome..

As user 'Skip Head' already pointed out, CSV or any table file format would be a better fit in your case.
If it is an option for you, maybe this Table implementation might interest you.

I'd suggest having the properties file reference a CSV file.
Then parse the CSV file into a collection, array, etc. instead.
A properties file seems like the wrong fit for this kind of data.

Actually all answers are wrong
Easy: foo.[0]filename
