Spark reduceByKey function seems not to work with a single key - java

I have 5 rows of records in MySQL, like:
sku:001 seller:A stock:UK margin:10
sku:002 seller:B stock:US margin:5
sku:001 seller:A stock:UK margin:10
sku:001 seller:A stock:UK margin:3
sku:001 seller:A stock:UK margin:7
I read these rows into Spark and transformed them into
JavaPairRDD<Tuple3<String,String,String>, Map>(<sku,seller,stock>, Map<margin,xxx>).
That seems to work fine so far.
However, when I used the reduceByKey function to sum the margins into this structure:
JavaPairRDD<Tuple3<String,String,String>, Map>(<sku,seller,stock>, Map<marginSummary, xxx>).
the final result had 2 elements:
JavaPairRDD<Tuple3<String,String,String>, Map>(<sku,seller,stock>, Map<margin,xxx>).
JavaPairRDD<Tuple3<String,String,String>, Map>(<sku,seller,stock>, Map<marginSummary, xxx>).
It seems that row 2 (sku:002) never entered the reduceByKey function body. Why is that?

This is the expected outcome. func is called only when values for the same key are merged. If a key has only one value, there is nothing to merge, so there is no reason to call it.
Unfortunately, it looks like you have a bigger problem, which can be inferred from your question: you are trying to change the type of the value in reduceByKey. In general that shouldn't even compile, as reduceByKey takes a Function2<V,V,V>: input and output types have to be identical (here it compiles only because both values are a raw Map).
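To see this without a cluster, here is a minimal plain-Java model of the reduceByKey contract. It is not Spark's implementation: the Tuple3 key is flattened to a hypothetical "sku|seller|stock" string, the margin is a plain Integer, and the class name ReduceByKeyModel is made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java model of the reduceByKey contract (not Spark itself).
final class ReduceByKeyModel {

    // Folds values per key. Map.merge stores the first value for a key unchanged
    // and calls func only when a second value for the same key arrives.
    static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> records, BinaryOperator<V> func) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> record : records) {
            result.merge(record.getKey(), record.getValue(), func);
        }
        return result;
    }

    static <K, V> Map.Entry<K, V> entry(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // The five rows from the question, keyed by a flattened "sku|seller|stock" string.
    static List<Map.Entry<String, Integer>> sampleRows() {
        return Arrays.asList(
                entry("001|A|UK", 10),
                entry("002|B|US", 5),
                entry("001|A|UK", 10),
                entry("001|A|UK", 3),
                entry("001|A|UK", 7));
    }

    public static void main(String[] args) {
        Map<String, Integer> sums = reduceByKey(sampleRows(), (a, b) -> {
            System.out.println("func(" + a + ", " + b + ")");
            return a + b;
        });
        System.out.println(sums);
    }
}
```

Running main prints three func calls, all for key 001, and the value 5 for key 002 is copied to the result without ever being passed to func. Spark additionally merges partial results across partitions, but the rule is the same: a key with a single value never reaches func.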
If you want to change the value type, you should use either combineByKey:
public <C> JavaPairRDD<K,C> combineByKey(Function<V,C> createCombiner,
                                         Function2<C,V,C> mergeValue,
                                         Function2<C,C,C> mergeCombiners)
or aggregateByKey:
public <U> JavaPairRDD<K,U> aggregateByKey(U zeroValue,
                                           Function2<U,V,U> seqFunc,
                                           Function2<U,U,U> combFunc)
Both can change the value type and should fix your current problem. Please refer to the Java test suite for examples: 1 and 2.
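The reason aggregateByKey fixes the single-value case can be modelled in plain Java. This is a sketch of the semantics, not Spark code: partitions are plain lists, a Supplier stands in for Spark giving each key its own copy of zeroValue, and the names AggregateByKeyModel, margin() and summary() are hypothetical. Because every value is folded into an accumulator that starts from the zero value, seqFunc runs even for sku 002, so its Map<margin,...> also becomes a Map<marginSummary,...>.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

// Plain-Java model of aggregateByKey semantics (not Spark itself).
final class AggregateByKeyModel {

    static <K, V, U> Map<K, U> aggregateByKey(List<List<Map.Entry<K, V>>> partitions,
                                              Supplier<U> zeroValue,
                                              BiFunction<U, V, U> seqFunc,
                                              BinaryOperator<U> combFunc) {
        Map<K, U> result = new LinkedHashMap<>();
        for (List<Map.Entry<K, V>> partition : partitions) {
            // Inside one partition every key starts from a fresh zero value,
            // so seqFunc runs for every value, even for a key seen only once.
            Map<K, U> local = new LinkedHashMap<>();
            for (Map.Entry<K, V> record : partition) {
                U acc = local.containsKey(record.getKey()) ? local.get(record.getKey()) : zeroValue.get();
                local.put(record.getKey(), seqFunc.apply(acc, record.getValue()));
            }
            // Accumulators for the same key from different partitions are merged with combFunc.
            for (Map.Entry<K, U> partial : local.entrySet()) {
                result.merge(partial.getKey(), partial.getValue(), combFunc);
            }
        }
        return result;
    }

    static <K, V> Map.Entry<K, V> entry(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Input value type V: a map holding a single "margin" entry.
    static Map<String, Integer> margin(int m) {
        Map<String, Integer> v = new HashMap<>();
        v.put("margin", m);
        return v;
    }

    // Accumulator type U: a map holding a single "marginSummary" entry.
    static Map<String, Integer> summary(int total) {
        Map<String, Integer> u = new HashMap<>();
        u.put("marginSummary", total);
        return u;
    }

    // The question's rows split over two hypothetical partitions.
    static List<List<Map.Entry<String, Map<String, Integer>>>> samplePartitions() {
        List<Map.Entry<String, Map<String, Integer>>> p1 = Arrays.asList(
                entry("001|A|UK", margin(10)),
                entry("002|B|US", margin(5)),
                entry("001|A|UK", margin(10)));
        List<Map.Entry<String, Map<String, Integer>>> p2 = Arrays.asList(
                entry("001|A|UK", margin(3)),
                entry("001|A|UK", margin(7)));
        return Arrays.asList(p1, p2);
    }

    static Map<String, Map<String, Integer>> sumMargins() {
        return aggregateByKey(samplePartitions(),
                () -> summary(0),
                (acc, value) -> summary(acc.get("marginSummary") + value.get("margin")),
                (a, b) -> summary(a.get("marginSummary") + b.get("marginSummary")));
    }

    public static void main(String[] args) {
        System.out.println(sumMargins());
    }
}
```

In real Spark code the three lambdas map onto the zeroValue, seqFunc and combFunc arguments of the aggregateByKey signature quoted above.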

Related

Substitute ints into Dataflow via Cloudbuild yaml

I've got a streaming Dataflow pipeline, written in Java with Beam 2.35. It commits data to BigQuery via StorageWriteApi. Initially the code looked like this:
BigQueryIO.writeTableRows()
    .withTimePartitioning(/* some column */)
    .withClustering(/* another column */)
    .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
    .withTriggeringFrequency(Duration.standardSeconds(30))
    .withNumStorageWriteApiStreams(20) // want to make this dynamic
This code runs in different environments, e.g. Dev and Prod. When I deploy in Dev I want 2 StorageWriteApiStreams, in Prod I want 20, and I'm trying to pass/resolve these values at deploy time with Cloudbuild.
The cloudbuild-dev.yaml looks like this:
steps:
  - lots-of-steps
    args:
      --numStorageWriteApiStreams=${_NUM_STORAGEWRITEAPI_STREAMS}
substitutions:
  _PROJECT: dev-project
  _NUM_STORAGEWRITEAPI_STREAMS: '2'
I expose the substitution in the job code with an interface:
ValueProvider<String> getNumStorageWriteApiStreams();
void setNumStorageWriteApiStreams(ValueProvider<String> numStorageWriteApiStreams);
I then refactor the writeTableRows() call to invoke getNumStorageWriteApiStreams():
BigQueryIO.writeTableRows()
    .withTimePartitioning(/* some column */)
    .withClustering(/* another column */)
    .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
    .withTriggeringFrequency(Duration.standardSeconds(30))
    .withNumStorageWriteApiStreams(Integer.parseInt(String.valueOf(options.getNumStorageWriteApiStreams())))
Now it's dynamic but I get a build failure on account of java.lang.IllegalArgumentException: methods with same signature getNumStorageWriteApiStreams() but incompatible return types: [class java.lang.Integer, interface org.apache.beam.sdk.options.ValueProvider]
My understanding was that Integer.parseInt returns an int, which is what I want, so I can pass it to withNumStorageWriteApiStreams(), which requires an int.
I'd appreciate any help I can get here. Thanks!
It turns out BigQueryOptions.java already has a method getNumStorageWriteApiStreams() that returns an Integer. I was unknowingly trying to redeclare it with a different return type, oops.
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java#L95-L98

Solr sorting by custom function query

I'm running into some issues developing a custom function query using Solr 3.6.2.
My goal is to be able to implement a custom sorting technique.
I have a field called daily_prices_str, which is a single-valued str field.
Example:
<str name="daily_prices_str">
2014-05-01:130 2014-05-02:130 2014-05-03:130 2014-05-04:130 2014-05-05:130 2014-05-06:130 2014-05-07:130 2014-05-08:130 2014-05-09:130 2014-05-10:130 2014-05-11:130 2014-05-12:130 2014-05-13:130 2014-05-14:130 2014-05-15:130 2014-05-16:130 2014-05-17:130 2014-05-18:130 2014-05-19:130 2014-05-20:130 2014-05-21:130 2014-05-22:130 2014-05-23:130 2014-05-24:130 2014-05-25:130 2014-05-26:130 2014-05-27:130 2014-05-28:130 2014-05-29:130 2014-05-30:130 2014-05-31:130 2014-06-01:130 2014-06-02:130 2014-06-03:130 2014-06-04:130 2014-06-05:130 2014-06-06:130 2014-06-07:130 2014-06-08:130 2014-06-09:130 2014-06-10:130 2014-06-11:130 2014-06-12:130 2014-06-13:130 2014-06-14:130 2014-06-15:130 2014-06-16:130 2014-06-17:130 2014-06-18:130 2014-06-19:130 2014-06-20:130 2014-06-21:130 2014-06-22:130 2014-06-23:130 2014-06-24:130 2014-06-25:130 2014-06-26:130 2014-06-27:130 2014-06-28:130 2014-06-29:130 2014-06-30:130 2014-07-01:130 2014-07-02:130 2014-07-03:130 2014-07-04:130 2014-07-05:130 2014-07-06:130 2014-07-07:130 2014-07-08:130 2014-07-09:130 2014-07-10:130 2014-07-11:130 2014-07-12:130 2014-07-13:130 2014-07-14:130 2014-07-15:130 2014-07-16:130 2014-07-17:130 2014-07-18:130 2014-07-19:170 2014-07-20:170 2014-07-21:170 2014-07-22:170 2014-07-23:170 2014-07-24:170 2014-07-25:170 2014-07-26:170 2014-07-27:170 2014-07-28:170 2014-07-29:170 2014-07-30:170 2014-07-31:170 2014-08-01:170 2014-08-02:170 2014-08-03:170 2014-08-04:170 2014-08-05:170 2014-08-06:170 2014-08-07:170 2014-08-08:170 2014-08-09:170 2014-08-10:170 2014-08-11:170 2014-08-12:170 2014-08-13:170 2014-08-14:170 2014-08-15:170 2014-08-16:170 2014-08-17:170 2014-08-18:170 2014-08-19:170 2014-08-20:170 2014-08-21:170 2014-08-22:170 2014-08-23:170 2014-08-24:170 2014-08-25:170 2014-08-26:170 2014-08-27:170 2014-08-28:170 2014-08-29:170 2014-08-30:170
</str>
As you can see the structure of the string is date:price.
Basically, I would like to parse the string to get the price for a particular period and sort by that price.
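A minimal plain-Java sketch of that parsing step, independent of Solr's function-query API (the plugin classes are not shown in the question, so none are used here). It assumes that "the price for a period" means the sum of the daily prices from the start date up to but not including the end date; that rule, the class name DailyPrices and the -1 result for missing days are hypothetical. It uses java.time, so it needs Java 8 or later.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the parsing a custom price(field, from, to) function could do.
final class DailyPrices {

    // Parses whitespace-separated "yyyy-MM-dd:price" tokens into a date -> price map.
    static Map<LocalDate, Integer> parse(String raw) {
        Map<LocalDate, Integer> prices = new HashMap<>();
        for (String token : raw.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // only happens when raw is blank
            }
            int colon = token.lastIndexOf(':');
            prices.put(LocalDate.parse(token.substring(0, colon)),
                    Integer.parseInt(token.substring(colon + 1)));
        }
        return prices;
    }

    // Hypothetical rule: total of the daily prices from 'from' (inclusive) to 'to' (exclusive).
    // Returns -1 when any day in the period has no price.
    static int priceForPeriod(String raw, LocalDate from, LocalDate to) {
        Map<LocalDate, Integer> prices = parse(raw);
        int total = 0;
        for (LocalDate day = from; day.isBefore(to); day = day.plusDays(1)) {
            Integer price = prices.get(day);
            if (price == null) {
                return -1;
            }
            total += price;
        }
        return total;
    }

    public static void main(String[] args) {
        String raw = "2014-07-18:130 2014-07-19:170 2014-07-20:170";
        System.out.println(priceForPeriod(raw, LocalDate.of(2014, 7, 18), LocalDate.of(2014, 7, 20))); // 300
    }
}
```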
I’ve already developed the java plugin for the custom function query and I’m at the point where my code compiles, runs, executes, etc. Solr is happy with my code.
Example:
price(daily_prices_str,2015-01-01,2015-01-03)
If I run this query I can see the correct price in the score field:
/select?price=price(daily_prices_str,2015-01-01,2015-01-03)&q={!func}$price
One of the problems is that I cannot sort by function result.
If I run this query:
/select?price=price(daily_prices_str,2015-01-01,2015-01-03)&q={!func}$price&sort=$price+asc
I get a 404 saying that "sort param could not be parsed as a query, and is not a field that exists in the index: $price"
But it works with a workaround:
/select?price=sum(0,price(daily_prices_str,2015-01-01,2015-01-03))&q={!func}$price&sort=$price+asc
The main problem is that I cannot filter by range:
/select?price=sum(0,price(daily_prices_str,2015-1-1,2015-1-3))&q={!frange l=100 u=400}$price
Maybe I'm going about this totally incorrectly?
Instead of passing the newly created "price" parameter to "sort", can you pass the function itself, with its arguments, like so?
q=*:*&sort=price(daily_prices_str,2015-01-01,2015-01-03) ...

What is this error and how do I prevent it? The bucket expression values are not comparable and no comparator specified

I'm using JasperReports with DynamicReports and want to build a crosstab report. So far I have figured out that this error happens when I add numeric columns to rowGroups or columnGroups. I don't know why it happens or how to solve it.
The error is:
The bucket expression values are not comparable and no comparator specified
My code is:
CrosstabValues crosstabValues = report.getCrosstab().getCrosstabValues();
Collection<CrosstabRowGroupBuilder> rowGroup = generateRowGroup(crosstabValues);
Collection<CrosstabColumnGroupBuilder> columnGroup = generateColumnGroup(crosstabValues);
Collection<CrosstabMeasureBuilder> measures = generateMeasures(crosstabValues);
CrosstabBuilder crosstab = ctab.crosstab();
for (CrosstabRowGroupBuilder row : rowGroup)
    crosstab.addRowGroup(row);
for (CrosstabColumnGroupBuilder columnGroupBuilder : columnGroup)
    crosstab.addColumnGroup(columnGroupBuilder);
for (CrosstabMeasureBuilder measure : measures)
    crosstab.addMeasure(measure);
crosstab.headerCell(cmp.text(crosstabValues.getHeader())
    .setStyle(getCrosstabHeaderCellStyle(report.getTemplate().getReportTemplateValues())));
The problem was the class I was passing to this method:
CrosstabRowGroupBuilder cTabRow = ctab.rowGroup(column.getName()
, getColumnTypeClass(column));
I was using the Number class for all numeric data. The funny thing is that it worked for measures but not for rowGroup or columnGroup, which is why I got confused.
Now, with Integer.class or Long.class, it works fine.
The crosstab must know in which order to display the row and column headers, and in which cell of the crosstab to put each measure. That is possible only if the crosstab can compare the rowGroup (and columnGroup) values.
Classes used in rowGroup and columnGroup must therefore implement the Comparable interface (or, as the error message suggests, a comparator must be specified).
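A small plain-Java check of the Comparable point. It does not call JasperReports or DynamicReports, and whatever check those libraries perform internally is not reproduced here; it only shows that Number does not implement Comparable while Integer and Long do. When the declared type has to stay Number, an explicit comparator (here a hypothetical one comparing doubleValue()) is the kind of alternative the error message hints at.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ComparableCheck {

    // True if values declared with this class have a natural ordering.
    static boolean isNaturallyComparable(Class<?> type) {
        return Comparable.class.isAssignableFrom(type);
    }

    // Orders mixed Number values with an explicit comparator instead of natural ordering.
    static List<Number> sortNumbers(List<Number> values) {
        List<Number> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingDouble(Number::doubleValue));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(isNaturallyComparable(Number.class));  // false
        System.out.println(isNaturallyComparable(Integer.class)); // true
        System.out.println(isNaturallyComparable(Long.class));    // true
        System.out.println(sortNumbers(Arrays.asList(3L, 1, 2.5))); // [1, 2.5, 3]
    }
}
```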

MongoDB can't find by national regex or query

It looks like I'm missing something important.
I have some records in MongoDB that contain fields with national characters. There is no problem inserting them into the DB or finding them, and all values look fine.
But if I try to find a particular one with "find()" or "regex()", nothing is returned. For example:
DBObject query = new BasicDBObject();
query.put("position", Pattern.compile(".*forsøg.*"));
--or--
query.put("position","forsøg");
System.out.println(collection.find(query).count()); // prints 0
In the log:
query={ "position" : { "$regex" : ".*������.*"}}
--or---
query={ "position" : "������"}
The field value for "position" is "forsøg", of course, and Pattern.matches(".*forsøg.*", "forsøg") returns true.
If I replace the pattern with one containing only ASCII characters (".abc." for example), all methods work as expected. Collection.findAll() returns all saved instances with readable and correct values.
Versions: MongoDB 2.0.6 64bit, mongo-java-driver 2.8.0, Java 7. I tried the same with spring-data-mongodb 1.0.2.RELEASE but removed it.
It looks like I ran into a strange bug related to Maven + TestNG. The same code executed from the .war and from the test suite produces totally different results in the database.
The difference is easy to see if you point your browser to
http://127.0.0.1:28017/baseName/collectionName/
and look at the values after each execution.

Hibernate multiple parameters setString generated Java

I'm using hibernate, and trying to do a LIKE on certain fields.
I'm splitting a string and then generating the HQL:
table.entry LIKE :argsearch_0 OR table.entry LIKE :argsearch_0 OR
table.entry LIKE :argsearch_1 OR table.entry LIKE :argsearch_1
(0 and 1 are in fact generated by an incrementing counter.)
But I get:
Not all named parameters have been set: [argsearch_0]
First question:
Can I use 2 named parameters and call setParameter (or setString) only once, as in:
String nameParam = "argsearch_"+i;
q.setParameter(nameParam, "%"+args[i]+"%");
Second question:
Why are my parameters not working?
It depends on what you mean by using 2 named parameters and calling setParameter only once.
In your original query you have 2 named parameters ('argsearch_0' and 'argsearch_1'), and each is used twice in the query. So you have to call set for both 'argsearch_0' and 'argsearch_1', but you only need to call it once for each. (You can actually call set multiple times for a parameter if you really want, but only the last value is used.)
As for your second question, as someone already pointed out, it's because you have a bug in your code: you are not setting the value for the 'argsearch_0' parameter.
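One way to avoid that kind of bug is to generate the WHERE clause and the parameter bindings in the same loop, so every placeholder written into the HQL gets exactly one binding. The sketch below is plain Java with no Hibernate dependency; LikeQueryBuilder, LikeClause and the example column names are hypothetical. Each term's parameter is reused for several columns but bound once, which is the point made above.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LikeQueryBuilder {

    // WHERE-clause text plus one binding per generated parameter name.
    static final class LikeClause {
        final String where;
        final Map<String, Object> params;

        LikeClause(String where, Map<String, Object> params) {
            this.where = where;
            this.params = params;
        }
    }

    private static final Pattern NAMED_PARAM = Pattern.compile(":(\\w+)");

    // For each search term, emits "column LIKE :argsearch_i" for every column,
    // and records a single binding for argsearch_i.
    static LikeClause build(String[] columns, String[] terms) {
        StringBuilder where = new StringBuilder();
        Map<String, Object> params = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            String name = "argsearch_" + i;
            for (String column : columns) {
                if (where.length() > 0) {
                    where.append(" OR ");
                }
                where.append(column).append(" LIKE :").append(name);
            }
            params.put(name, "%" + terms[i] + "%");
        }
        return new LikeClause(where.toString(), params);
    }

    // Collects every ":name" placeholder that appears in the query text.
    static Set<String> placeholders(String hql) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = NAMED_PARAM.matcher(hql);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        LikeClause clause = build(new String[] {"table.entry", "table.title"}, "foo bar".split(" "));
        System.out.println(clause.where);
        System.out.println(clause.params);
    }
}
```

With Hibernate, you would then loop over clause.params and call q.setParameter(name, value) for each entry; checking that placeholders(clause.where) equals clause.params.keySet() before running the query catches any unset parameter early.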
You can try this:
**Step 1:** Put however many parameters you need into a HashMap
-------------------------------------------------------------------
Map<String, Object> param_List = new HashMap<String, Object>();
param_List.put("contactNo", 22);
**Step 2:** Pass your query
-------------------------------------------------------------------
Query query1 = session.createQuery("select c from emailTemplate c where c.contactNo = :contactNo");
**Step 3:** Bind every parameter; the data type doesn't matter.
-------------------------------------------------------------------
for (Map.Entry<String, Object> param : param_List.entrySet())
{
    query1.setParameter(param.getKey(), param.getValue());
}
**Step 4:** Read the result
-------------------------------------------------------------------
String finalResult = query1.getSingleResult().toString();
