Ignite Cache Sum of values - java

I am using the Ignite tutorial code (link below), but I want to modify it so that it operates on a different type of data and the counts are computed differently: rather than incrementing a counter by 1, I want to add the current value.
So let's assume I have the number of occurrences of a certain word in different documents, something like this:
'the' 6586
'the' 925
So I want the cache to hold
'the' 7511
So given this:
try (Stream<String> lines = Files.lines(path)) {
    lines.forEach(line -> {
        Stream<String> words = Stream.of(line.split(" "));
        List<String> tokens = words.collect(Collectors.toList());
        // this is just to emphasize that I want to pass the value
        Long value = Long.parseLong(tokens.get(1));
        stmr.addData(tokens.get(0), value);
    });
}
I would like this value to be passed to the stmr.receiver() method, so I can add it to val.
I have even tried creating a class variable in StreamWords to store the value, but it does not get updated, and in stmr.receiver() it is still 0 (as initialized).
Link to tutorial:
Word Count Ignite Example

I managed to figure it out. In stmr.receiver(), arg is actually the value that I want to insert, so just cast it to the type you need and you can read the value.
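For reference, a sketch of how the receiver from the tutorial could be adapted; it assumes stmr is an IgniteDataStreamer<String, Long> and that the streamed value arrives as arg[0]:
stmr.allowOverwrite(true);
stmr.receiver(StreamTransformer.from((e, arg) -> {
    // arg carries the value that was passed to addData(word, value)
    Long add = (Long) arg[0];
    Long val = e.getValue();
    // add the streamed occurrence count instead of incrementing by 1
    e.setValue(val == null ? add : val + add);
    return null;
}));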

Most efficient way to check string array and then write it into file

I need to check whether a List of Strings contains certain predefined strings and, if all of those predefined strings are contained in the list, write the list to a file.
As a first approach I thought of doing something like
if (doesTheListContainPredefinedStrings(list))
    writeListIntoFile(list);
where doesTheListContainPredefinedStrings and writeListIntoFile run loops to check for the predefinedStrings and to write every element of the list to a file, respectively.
But, since in this case I have to worry about performance, I wanted to leverage the fact that in the doesTheListContainPredefinedStrings method I'm already iterating over the elements of the list once.
I also thought about something like
String[] predefinedStrings = {...};
...
PrintWriter pw = new PrintWriter(new FileWriter("fileName"));
int predefinedStringsFound = 0;
for (String string : list)
{
    if (Arrays.asList(predefinedStrings).contains(string))
        predefinedStringsFound++;
    pw.println(string);
}
if (predefinedStringsFound == predefinedStrings.length)
    pw.close();
Since I observed that, at least on the system where I'm developing (Ubuntu 19.04), if I don't close the stream the strings aren't written to the file.
Nevertheless, this solution seems really bad, and the file would still be created, so if the list didn't pass the check I would have to delete it (which requires another access to the storage) anyway.
Could someone suggest a better (or the best) approach and explain why it is better?
Check the reverse case: is any string from predefs missing from the list of strings to check?
Collection<String> predefs; // Your certain predefined strings
List<String> list; // Your list of strings to check
if (!predefs.parallelStream().anyMatch(s -> !list.contains(s)))
writeListIntoFile(list);
The above stream expression stops as soon as the first string from predefs can't be found in the list to check and returns true; in that case you must not write the file.
It does not check whether the list to check contains additional strings that are not among the predefs strings.
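Putting the check and the write together, a minimal sketch; Files.write and the Path parameter are my choice here, not part of the original post (needs java.nio.file.Files, java.nio.file.Path, java.util.Collection, java.util.List and java.io.IOException):
static void writeIfComplete(Collection<String> predefs, List<String> list, Path out) throws IOException {
    // Only touch the file system once we know every predefined string is present;
    // this is the same test as above, written with allMatch instead of !anyMatch.
    if (predefs.stream().allMatch(list::contains)) {
        Files.write(out, list);   // writes each element of the list on its own line
    }
}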

Total number of non-repeated words in each tweet

I'm new to Java and Trident. I imported a project for fetching tweets, but I want to understand how this code gets more than one tweet; as far as I can tell, tuple.getValue(0) means the first tweet only.
My problem is collecting all tweets in a HashSet or HashMap so I can get the total number of distinct words in each tweet.
public void execute(TridentTuple tuple, TridentCollector collector) {
    // this method is where the computations on a tweet are executed
}

public Values getValues(Tweet tweet, String[] words) {
}
This code gets the first tweet, then gets its body and converts it to an array of strings. I know what I need to solve, but I couldn't write it well.
My idea: make a for loop like
for (int i = 0; i < 10; i++)
{
    Tweet tweet = (Tweet) tuple.getValue(i);
}
For each tweet:
    For each word in the tweet:
        Try adding the word to a set.
        If the word already exists in the set, remove it from the set.
    Count the size of the set of words for that tweet.
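A small Java sketch of that idea (the tweet body and the whitespace split are assumptions; needs java.util.HashSet and java.util.Set):
// Counts a tweet's words using the add/remove trick described above.
// Strictly, this keeps words that occur an odd number of times; if a word
// can appear three or more times, track an extra "seen" set instead.
static int countNonRepeatedWords(String tweetBody) {
    Set<String> words = new HashSet<>();
    for (String w : tweetBody.toLowerCase().split("\\s+")) {
        if (!words.add(w)) {      // add() returns false if the word was already present
            words.remove(w);
        }
    }
    return words.size();
}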
The "problem" is a miss-match between "get the count of distinct words over all tweets" and Strom as a stream processor. The query you want to answer can only be computed on a finite set of Tweets. However, in stream processing you process an potential infinite stream of input data.
If you have a finite set of Tweets, you might want to use a batch processing framework such as Flink, Spark, or MapReduce. If you indeed have an infinite number of Tweets, you must rephrase your question...
As you mentioned already, you actually want to "loop over all Tweets". As you so stream processing, there is no such concept. You have an infinite number of input tuples, and Storm applies execute() on each of those (ie, you can think of it as if Storm "loops over the input" automatically -- even in "looping" is not the correct term for it). As your computation is "over all Tweets" you would need to maintain a state in your Bolt code, such that you can update this state for each Tweet. The simples form of a state in Storm would be member variable in your Bolt class.
public class MyBolt implements ??? {
    // this is your "state" variable
    private final Set<String> allWords = new HashSet<String>();

    public void execute(TridentTuple tuple, TridentCollector collector) {
        Tweet tweet = (Tweet) tuple.getValue(0);
        String tweetBody = tweet.getBody();
        String[] words = tweetBody.toLowerCase().split(regex);
        for (String w : words) {
            // as allWords is a set, you cannot add the same word twice;
            // the second "add" call on the same word is simply ignored,
            // thus allWords will contain each word exactly once
            this.allWords.add(w);
        }
    }
}
Right now, this code does not emit anything, because it is unclear what you actually want to emit. As there is no end in stream processing, you cannot say "emit the final count of words contained in allWords". What you could do is emit the current count after each update; for this, add collector.emit(new Values(this.allWords.size())); at the end of execute().
Furthermore, the presented solution only works correctly if no parallelism is applied to MyBolt; otherwise, the sets in different instances might contain the same word. To resolve this, you would tokenize each tweet into its words in a stateless Bolt and feed this stream of words into an adapted MyBolt that uses an internal Set as state. MyBolt must also receive the data via fieldsGrouping to ensure disjoint sets of words on each instance.
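A rough sketch of that two-bolt wiring; TweetSpout, TokenizerBolt and WordSetBolt are hypothetical names, not classes from the original project:
// Hypothetical wiring: the stateless tokenizer emits one word per tuple, and
// fieldsGrouping on the "word" field routes all occurrences of the same word
// to the same WordSetBolt instance, so the per-instance sets stay disjoint.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("tweets", new TweetSpout());
builder.setBolt("tokenize", new TokenizerBolt(), 4)
       .shuffleGrouping("tweets");
builder.setBolt("count", new WordSetBolt(), 4)
       .fieldsGrouping("tokenize", new Fields("word"));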

How do you recursively replace occurrences of a string in an array

So consider a class A with two String fields, "name" and "value".
Class B contains a variable which is a Set of A:
Set<A> allVariables
The set would look like this:
A.name="$var1"
A.value = "x+10>2"
A.name="$var2"
A.value="11+y%10==0"
A.name="$var3"
A.value="$var1 && $var2"
What I need to do is evaluate these expressions. I'm using JEXL for this. I need to iterate through the Set and replace the variable names with their respective values.
In this case, the value of the object with name $var3 needs to become "x+10>2 && 11+y%10==0".
How do I do this?
You create two HashMaps, translated and toTranslate.
You parse your Set.
For each A in your Set, you look at value. If value contains any $element (a token starting with a $ sign), you look for this $element among the keys of your translated HashMap.
If it's there, you replace the occurrences of $element with the value found in your translated HashMap.
You do this for each different $element you find in your A object.
If all $elements have been translated, you add your object A to the translated HashMap (key = name, value = value).
Else, you add it to your toTranslate HashMap.
Once your whole Set has been parsed, you have two HashMaps.
You create a while loop: while the toTranslate HashMap is not empty, you take each value and try to translate the $elements within it using the ones in your translated HashMap.
Be careful, you may end up with an infinite loop. A good safeguard is to make sure that each time you loop over the toTranslate HashMap, its number of elements decreases; if not, you're in an infinite loop.
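A rough Java sketch of that two-map pass (direct field access on A and plain substring replacement are simplifying assumptions; needs java.util.HashMap, Map, Set and Iterator):
static Map<String, String> translateAll(Set<A> allVariables) {
    Map<String, String> translated = new HashMap<>();
    Map<String, String> toTranslate = new HashMap<>();
    // First pass: values without any $element are already fully translated.
    for (A a : allVariables) {
        if (a.value.contains("$"))
            toTranslate.put(a.name, a.value);
        else
            translated.put(a.name, a.value);
    }
    // Keep substituting already-translated values until nothing is pending.
    while (!toTranslate.isEmpty()) {
        int before = toTranslate.size();
        Iterator<Map.Entry<String, String>> it = toTranslate.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            String v = e.getValue();
            for (Map.Entry<String, String> done : translated.entrySet())
                v = v.replace(done.getKey(), done.getValue());
            if (v.contains("$")) {
                e.setValue(v);            // still references an untranslated variable
            } else {
                translated.put(e.getKey(), v);
                it.remove();              // fully translated now
            }
        }
        if (toTranslate.size() == before)
            break;                        // no progress: cyclic references, bail out
    }
    return translated;
}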
I don't think it needs to be recursive. I think just this would work:
boolean madeReplacement;
do:
    madeReplacement = false
    For each member of the set, X:
        For each other member of the set, Y:
            Replace all instances of Y.name with Y.value in X.value. If you replaced anything, set madeReplacement = true.
while (madeReplacement)
Example:
$var1 is value 1
$var2 is value $var1
$var3 is value $var2 + 2
$var3.value contains $var2, replace $var2 with $var1 -> $var1 + 2
$var2.value contains $var1, replace $var1 with 1 -> 1
$var3.value contains $var1, replace $var1 with 1 -> 1 + 2
No value contains any other name, execution finished.
Even though we 'evaluated out of order', we eventually got the right answer anyway. However, this algorithm can be O(n^3) in the worst case (imagine n variables that reference each other in a long chain, and you start the replacement at the wrong end). One way to avoid that would be: when X.value contains Y.name, first evaluate Y.value recursively (by doing the same loop over the rest of the set). This makes it O(n^2) in the worst case, so your suspicion that a recursive approach is appropriate may be correct ;)
(I wasn't sure whether variable names are guaranteed to start with $, so I wrote it so that it doesn't matter.)
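In Java, that do/while loop could look roughly like this (assuming name and value are accessible String fields on A, and that there are no cyclic references):
// Repeatedly substitute every variable's value for its name in all other
// values until a full pass makes no replacement.
static void resolve(Set<A> allVariables) {
    boolean madeReplacement;
    do {
        madeReplacement = false;
        for (A x : allVariables) {
            for (A y : allVariables) {
                if (x == y || !x.value.contains(y.name))
                    continue;
                x.value = x.value.replace(y.name, y.value);
                madeReplacement = true;
            }
        }
    } while (madeReplacement);
}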

Using Java, how do I set up arrays to store the next user input in the next index?

What I'm trying to do is set up 14 arrays, of both String and double types, that accept input into the first index; the next time the user enters information, I don't want to put it into an index that has already been given a value, but into the next index of that array. Do I just use ++Array[index]? I'm also trying to set up the program so the user can reach one of three different if statements at a time to input values. If they enter input into an array in if-statement 1, then leave it, go to if-statement 2 to input something, and then come back to if-statement 1 to input more data, will the program know to add the newest input at the next index of the array by using the ++Array[index] indicator, or do I have to do something else? How do I accomplish this?
Try looking into the ArrayList class; it should work for what you are looking for, and it has a nice .add() method to append to the end of the list.
List<String> l1 = new ArrayList<String>();
l1.add("String");
l1.add("Another");
for (String str : l1) {
    System.out.println(str);
}
This will add the strings and let you iterate over them.
If you want another list holding Doubles, just change the generic type; or you can use a raw list without generics that just holds Objects, and then your for loop would traverse Objects instead.
You can't use ++Array[index] to add an element to an array in Java. As in other statically typed languages, arrays are given a size when they are created and can't hold more values than that initial size. So, for instance, new int[10] can hold 10 integer values. If you want to set the values in that array, use something like this:
int[] numbers = new int[10];
for (int i = 0; i < 10; i++) {
    numbers[i] = i + 1;
}
With regards to the user input you describe, you would need to use some form of looping, maybe a while loop?
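For example, a minimal sketch of that combination, using a Scanner loop and ArrayLists (the "name "/number input convention is made up purely for illustration):
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class InputDemo {
    public static void main(String[] args) {
        // One growable list per kind of data; add() always appends at the next free index.
        List<String> names = new ArrayList<>();
        List<Double> amounts = new ArrayList<>();
        Scanner in = new Scanner(System.in);
        while (in.hasNextLine()) {
            String line = in.nextLine();
            if (line.equals("quit"))
                break;
            if (line.startsWith("name ")) {              // branch 1: store a String
                names.add(line.substring(5));
            } else {                                     // branch 2: store a double
                amounts.add(Double.parseDouble(line));
            }
        }
        System.out.println(names + " " + amounts);
    }
}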

Help me understand a question related to HashMap in Java

I'm given a task which I am a little confused about. Here is the problem statement:
The following program should read a file and store all its tokens in a member variable.
Your task is to write a single method that returns the number of items in tokenMap, the average length (as double value) of the elements in tokenMap, and the number of tokens starting with character "a".
Here the tokenMap is an object of type HashMap<String, Integer>;
I do have some idea about HashMap, but what I want to know is whether the key required for the HashMap (i.e. what I should store in tokenMap) is a single character or the whole word.
Also, how can I compute the average length?
Looks like you have to use the entire word as the key.
The average length of tokens can be computed by summing the lengths of each token and dividing by the number of tokens.
In Java, you can find the number of tokens in the HashMap by tokenMap.size().
You can write loops that visit each key of the map like this:
for (String t : tokenMap.keySet()) {
    // t is a token
}
and if you look up String in the Java API docs you will see that it is easy to find the length of a String.
To compute the average length of the items in a hash map, you'll have to iterate over them all and count the length and calculate the average.
As for your other question about what to use for a key, how are we supposed to know? A hashmap can use practically any* value for a key.
*The value must be hashable, which is defined differently for different languages.
Reading the question closely, it seems that you have to read a file, extract each word and use it as the key, and store the length of each key as the integer value:
an example line
leads to a HashMap like this
an : 2
example : 7
line : 4
After you've built your map (made of keys mapping to values, the "elements" the question mentions), you'll need to run some statistics over it to find
the number of keys (look at HashMap)
the average length of all keys (again, simple enough)
the number beginning with "a" (just look at the String)
Then make a value object containing these values and return it from the method that does the statistics.
I know I've given more information than you require, but someone else may benefit from a little extra help.
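A short sketch of such a statistics method; returning the three numbers in an Object array is just one option, a small value object would work equally well (needs java.util.Map):
// Returns { number of tokens, average key length, tokens starting with "a" }.
static Object[] tokenStats(Map<String, Integer> tokenMap) {
    long totalLength = 0;
    int startingWithA = 0;
    for (String token : tokenMap.keySet()) {
        totalLength += token.length();
        if (token.startsWith("a"))
            startingWithA++;
    }
    double averageLength =
            tokenMap.isEmpty() ? 0.0 : (double) totalLength / tokenMap.size();
    return new Object[] { tokenMap.size(), averageLength, startingWithA };
}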
Guys, there is some confusion. I'm not asking for a solution. I'm just confused about one thing.
For the time being, I'm going to use String as the key type.
The only confusion I have is: once I read the file line by line, should I split it into words or into single characters, so that the key is a single-character String or the String of a whole word?
If you go through the problem statement, what do you suggest? That's all I'm asking.
should I split it based upon words or based upon each character
The requirement is to make tokens, so you should split them based on words. Each word becomes a unique String key. It would make sense for the value to be the count of each token.
If the file you are reading has these three lines:
int alpha;
int beta;
float delta;
Then you should have something like
<"int", 2>
<";", 3>
<"alpha", 1>
<"beta", 1>
<"float", 1>
<"delta", 1>
(The semicolon may or may not be considered a token.)
Your average token length would be (3 + 1 + 5 + 4 + 5 + 5) / 6 ≈ 3.83.
The number of tokens starting with "a" would be 1 (just "alpha").
Look elsewhere on this forum for keySet and you should be good to go.
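For completeness, a sketch of how such a <token, count> map could be built while reading the file (the whitespace tokenization and the throws clause are assumptions):
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class TokenReader {
    // Builds a <token, count> map: each distinct word is a key,
    // its value is how many times that word occurred in the file.
    static Map<String, Integer> readTokens(String fileName) throws FileNotFoundException {
        Map<String, Integer> tokenMap = new HashMap<>();
        try (Scanner sc = new Scanner(new File(fileName))) {
            while (sc.hasNext()) {
                tokenMap.merge(sc.next(), 1, Integer::sum);
            }
        }
        return tokenMap;
    }
}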
