A while ago, I asked how to get the body from a Result in Play 2.5.0 Java.
The answer was basically to use play.core.j.JavaResultExtractor. I am now upgrading to 2.6, and JavaResultExtractor no longer exists (or at least is not public).
How does one do this in Play 2.6?
I did find Result.body().consumeData which seems like it might work, but also comes with the worrisome warning:
This method should be used carefully, since if the source represents an ephemeral stream, then the entity may not be usable after this method is invoked.
I suppose that, since I am doing this in an action, I could call consumeData to get all of the data into a local variable, process that and then return a new result with the stored data. That only fails in the case where the data is too big to fit into memory, which is not something I am currently expecting.
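The buffering idea can be illustrated without Play at all. The following is only a plain-Java analogy, not Play's API: a one-shot InputStream stands in for the ephemeral body, and the class and method names are hypothetical. The stream is drained once into a byte array, which can then be processed and reused to build a fresh response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Plain-Java analogy (not Play's API): drain a one-shot stream into memory once.
class BufferedBodyDemo {

    // Reads the whole stream into a byte array; the stream is exhausted afterwards.
    static byte[] drain(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical body content, used only for the demonstration.
        InputStream body = new ByteArrayInputStream("{\"ok\":true}".getBytes(StandardCharsets.UTF_8));
        byte[] stored = drain(body);
        // The stored copy can be read as often as needed.
        System.out.println(new String(stored, StandardCharsets.UTF_8));
        // A second drain of the same stream finds nothing left to read.
        System.out.println(drain(body).length);
    }
}
```

As noted in the question, this only works while the whole body fits into memory.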
In Play 2.6 it is still possible to re-implement the 2.5 functionality. Have a look at this example, which gets the JSON body from a Result:
public static JsonNode resultToJsonNode(final Result result, final long timeout, final Materializer mat)
        throws Exception {
    // Bridge Play's CompletionStage to a Scala Future and block for at most `timeout` ms.
    FiniteDuration finiteDuration = Duration.create(timeout, TimeUnit.MILLISECONDS);
    byte[] body = Await.result(
            FutureConverters.toScala(result.body().consumeData(mat)), finiteDuration).toArray();
    // Parse the buffered bytes as JSON.
    ObjectMapper om = new ObjectMapper();
    final ObjectReader reader = om.reader();
    return reader.readTree(new ByteArrayInputStream(body));
}
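Since consumeData hands back a CompletionStage, the blocking wait could probably also be written with java.util.concurrent alone, without the Scala bridge. The sketch below is an assumption-laden illustration, not tested against Play: a CompletionStage of byte[] stands in for the stage of ByteString that Play returns, and BlockingBodyReader and await are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: block on a CompletionStage with a timeout using only java.util.concurrent.
class BlockingBodyReader {

    // Waits up to timeoutMillis for the bytes; wraps failures in an unchecked exception.
    static byte[] await(CompletionStage<byte[]> stage, long timeoutMillis) {
        try {
            return stage.toCompletableFuture().get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for body", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("body was not available", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for result.body().consumeData(mat); the data is hypothetical.
        CompletionStage<byte[]> stage =
                CompletableFuture.supplyAsync(() -> "{\"id\":1}".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(await(stage, 1000), StandardCharsets.UTF_8));
    }
}
```

With Play, the awaited value would still need converting to a byte array (the example above uses toArray()) before handing it to Jackson.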
We need to read data from our checkpoints manually for various reasons (for example, we need to change the structure of our state object/class, so we want to read the old state and copy its data into a new type of object).
Reading itself works fine, but when we try to keep the data in memory after deploying to a Flink cluster, we get an empty list/map. In the log we can see that all of our data is read and added to the list/map properly, but as soon as the method completes, the data is gone and the list/map is empty :(
val env = ExecutionEnvironment.getExecutionEnvironment();
val savepoint = Savepoint.load(env, checkpointSavepointLocation, new HashMapStateBackend());
private List<KeyedAssetTagWithConfig> keyedAssetsTagWithConfigs = new ArrayList<>();
val keyedStateReaderFunction = new KeyedStateReaderFunctionImpl();
savepoint.readKeyedState("my-uuid", keyedStateReaderFunction)
    .setParallelism(1)
    .output(new MyLocalCollectionOutputFormat<>(keyedAssetsTagWithConfigs));
env.execute("MyJobName");
private static class KeyedStateReaderFunctionImpl extends KeyedStateReaderFunction<String, KeyedAssetTagWithConfig> {
    private MapState<String, KeyedAssetTagWithConfig> liveTagsValues;
    private Map<String, KeyedAssetTagWithConfig> keyToValues = new ConcurrentHashMap<>();

    @Override
    public void open(final Configuration parameters) throws Exception {
        liveTagsValues = getRuntimeContext().getMapState(ExpressionsProcessor.liveTagsValuesStateDescriptor);
    }

    @Override
    public void readKey(final String key, final Context ctx, final Collector<KeyedAssetTagWithConfig> out) throws Exception {
        liveTagsValues.iterator().forEachRemaining(entry -> {
            keyToValues.put(entry.getKey(), entry.getValue());
            log.info("key {} -> {} val", entry.getKey(), entry.getValue());
            out.collect(entry.getValue());
        });
    }

    public Map<String, KeyedAssetTagWithConfig> getKeyToValues() {
        return keyToValues;
    }
}
As soon as this code executes, I expect the map returned by keyedStateReaderFunction.getKeyToValues() to contain all the values. But it returns an empty map, even though the logs show that everything is read properly. The keyedAssetsTagWithConfigs list, which collects the output, is empty as well.
If anyone has an idea, it would be very helpful, because I am lost; I have never before put data into a map and then lost it :) When I serialize my map or list to a text file and then deserialize it from there (using Jackson), I can see that my data exists, but that is a workaround rather than a solution.
Thanks in advance
The code you show creates and submits a Flink job to be executed in its own environment orchestrated by the Flink framework: https://nightlies.apache.org/flink/flink-docs-stable/docs/concepts/flink-architecture/#flink-application-execution
The job runs independently of the code that builds and submits it, so when you call keyedStateReaderFunction.getKeyToValues(), you are calling the method on the object that was used to build the job, not on the object that actually ran in the Flink execution environment.
Your workaround seems like a valid option to me. You can then submit the file with your savepoint contents to your new job to recreate its state as you'd like.
You have an instance of KeyedStateReaderFunctionImpl in the Flink client which gets serialized and sent to each task manager. Each task manager then deserializes a copy of that KeyedStateReaderFunctionImpl and calls its open and readKey methods, and gradually builds up a private Map containing its share of the data extracted from the savepoint/checkpoint.
Meanwhile the original KeyedStateReaderFunctionImpl back in the Flink client has never had its open or readKey methods called, and doesn't hold any data.
In your case the parallelism is one, so there is only one task manager, but in general you will need to collect the output from each task manager and assemble the complete results from these pieces. These results are not available in the Flink client process because the work hasn't been done there.
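The effect described above can be reproduced in plain Java without Flink. This is only an illustrative sketch: ReaderFunction and shipToWorker are hypothetical stand-ins, and the round trip through Java serialization imitates shipping the function object to another JVM. The deserialized copy accumulates data while the original stays empty.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration (no Flink): a serialized copy does not share state with the original.
class SerializedCopyDemo {

    // Hypothetical stand-in for a reader function that accumulates values in a field.
    static class ReaderFunction implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, String> seen = new HashMap<>();

        void readKey(String key, String value) {
            seen.put(key, value);
        }

        Map<String, String> getSeen() {
            return seen;
        }
    }

    // Round-trips the object through Java serialization, imitating a transfer to a worker.
    static ReaderFunction shipToWorker(ReaderFunction original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ReaderFunction) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ReaderFunction clientSide = new ReaderFunction();
        ReaderFunction workerSide = shipToWorker(clientSide);
        // Only the worker-side copy ever sees data.
        workerSide.readKey("tag-1", "value-1");
        System.out.println("worker copy: " + workerSide.getSeen());
        System.out.println("client original: " + clientSide.getSeen());
    }
}
```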
I found a solution: start the job in attached mode and collect the results in the main thread.
val env = ExecutionEnvironment.getExecutionEnvironment();
val configuration = env.getConfiguration();
configuration
.setBoolean(DeploymentOptions.ATTACHED, true);
...
val myresults = dataSource.collect();
I hope this helps somebody else, because I wasted a couple of days trying to find a solution.
Here is a non Singleton class which is used to send a payload to an API...
class MyApiClient {
    String url = "http://www.yankeeService.com"
    int playerId = 99
    String playerFirstName = "Aaron"
    String playerLastName = "Judge"

    public void sendPayload(String content) {
        CloseableHttpClient client = HttpClients.createDefault();
        HttpPost httpPost = new HttpPost(url); // post to the configured URL
        String jsonPayload = """ "{"id":"$playerId","name":"$playerLastName","dailyReport":"$content"}" """ ;
        StringEntity entity = new StringEntity(jsonPayload);
        httpPost.setEntity(entity);
        CloseableHttpResponse response = client.execute(httpPost);
        assertThat(response.getStatusLine().getStatusCode(), equalTo(200));
        client.close();
    }
}
Would there be any problem if multiple threads were to enter that sendPayload method?
I think it would be fine because none of the global variables are modified in any way (they are read only and used to facilitate the API call).
Also the jsonPayload is a local variable so each thread would get their own version of it and there would be no chance for one thread to grab the payload content of another right?
Multi-threading issues come into play when threads read and write shared data in an uncontrolled manner.
Meaning:
when all your threads are only invoking the send method, then you do not have a problem - because all threads are reading and using the same data
but when these threads change the content of any of the fields - then all bets are off.
And the thing is: your fields have package visibility, which means it is very simple to update them from "outside". An object of MyApiClient would not even notice if the content of a field changed.
Thus:
first of all, make these fields private to hide such details from the outside
consider making them final as well
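A minimal sketch of that advice in plain Java, leaving out the HTTP call: the class name, the buildPayload helper and the buildConcurrently driver are hypothetical, introduced only to show that private final fields plus method-local state let many threads call the method safely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: immutable fields plus method-local state make concurrent calls safe.
final class ImmutablePayloadBuilder {
    private final int playerId;
    private final String playerLastName;

    ImmutablePayloadBuilder(int playerId, String playerLastName) {
        this.playerId = playerId;
        this.playerLastName = playerLastName;
    }

    // Each call builds its own local String; no shared state is written.
    // Note: content is not JSON-escaped here, which a real client would need.
    String buildPayload(String content) {
        return "{\"id\":\"" + playerId + "\",\"name\":\"" + playerLastName
                + "\",\"dailyReport\":\"" + content + "\"}";
    }

    // Builds payloads on several threads and returns them in submission order.
    static List<String> buildConcurrently(ImmutablePayloadBuilder builder, List<String> contents) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String content : contents) {
                Callable<String> task = () -> builder.buildPayload(content);
                futures.add(pool.submit(task));
            }
            List<String> payloads = new ArrayList<>();
            for (Future<String> future : futures) {
                payloads.add(future.get());
            }
            return payloads;
        } catch (Exception e) {
            throw new IllegalStateException("payload task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ImmutablePayloadBuilder builder = new ImmutablePayloadBuilder(99, "Judge");
        for (String payload : buildConcurrently(builder, Arrays.asList("report A", "report B"))) {
            System.out.println(payload);
        }
    }
}
```

Because the fields are final, any later attempt to reassign them fails at compile time instead of silently introducing a race.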
Yes, it is thread safe. You are posting something to a remote location, and it seems you are not worried about callers overwriting the content there (if you are, thread-safe logic alone will not help you).
Your reasoning, "I think it would be fine because none of the global variables are modified in any way (they are read only and used to facilitate the API call)," is correct.
For readability and by convention, I would suggest declaring the attributes final.
While there are multiple problems outside the scope of the question with the method you've proposed (what are """, do you really want to be crafting JSON objects by hand, and you don't handle exceptions) your assessment of concurrency appears to be correct.
You may want to ensure that, though, perhaps by making your variables final if they aren't supposed to ever be changed. This way if a future code modification does cause them to be changed, you'll know at compile time that there's a mistake. Or maybe it's not a mistake and those variables need to change... but you'll know you have to revisit your concurrency issue.
You can use http-request, which is built on the Apache HTTP API; see its documentation.
class MyApiClient {
    private static final HttpRequest<?> HTTP_REQUEST =
        HttpRequestBuilder.createGet("http://www.yankeeService.com")
            .addContentType(ContentType.APPLICATION_JSON)
            .build();

    int playerId = 99
    String playerFirstName = "Aaron"
    String playerLastName = "Judge"

    public void sendPayload(String content) {
        String jsonPayload = """ "{"id":"$playerId","name":"$playerLastName","dailyReport":"$content"}" """ ;
        assertThat(HTTP_REQUEST.executeWithBody(jsonPayload).getStatusCode(), equalTo(200));
    }
}
HTTP_REQUEST is Thread Safe
I have a scheduler that collects our cluster metrics and writes the data to an HDFS file using an older version of the Cloudera API. We recently updated our JARs, and the original code now fails with an exception.
java.lang.ClassCastException: org.apache.hadoop.io.ArrayWritable cannot be cast to org.apache.hadoop.hive.serde2.io.ParquetHiveRecord
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
at parquet.hadoop.ParquetWriter.write(ParquetWriter.java:324)
I need help using the ParquetHiveRecord class to write the data (which are POJOs) in Parquet format.
Code sample below:
Writable[] values = new Writable[20];
... // populate values with all values
ArrayWritable value = new ArrayWritable(Writable.class, values);
writer.write(value); // <-- Getting exception here
Details of "writer" (of type ParquetWriter):
MessageType schema = MessageTypeParser.parseMessageType(SCHEMA); // SCHEMA is a string with our schema definition
ParquetWriter<ArrayWritable> writer = new ParquetWriter<ArrayWritable>(fileName,
    new DataWritableWriteSupport() {
        @Override
        public WriteContext init(Configuration conf) {
            if (conf.get(DataWritableWriteSupport.PARQUET_HIVE_SCHEMA) == null)
                conf.set(DataWritableWriteSupport.PARQUET_HIVE_SCHEMA, schema.toString());
            return super.init(conf); // assumed: the snippet as posted omitted the return
        }
    });
Also, we were using CDH and CM 5.5.1 before, now using 5.8.3
Thanks!
I think you need to use DataWritableWriter rather than ParquetWriter. The class cast exception indicates the write support class is expecting an instance of ParquetHiveRecord instead of ArrayWritable. DataWritableWriter likely breaks down the individual records in ArrayWritable to individual messages in the form of ParquetHiveRecord and sends each to the write support.
Parquet is sort of mind bending at times. :)
Looking at the code of the DataWritableWriteSupport class:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
You can see that it already uses DataWritableWriter, so you do not need to create an instance of DataWritableWriter yourself. The idea of a write support is that it lets you write different formats to Parquet.
What you do need to do is wrap your Writables in a ParquetHiveRecord.
I am implementing a REST service with RESTlet. It is an amazing framework for building RESTful web services: it is easy to learn and its syntax is compact. However, I have found that when a person or program accesses a resource, it often takes a long time to output the XML. I use JaxbRepresentation. Here is my code:
@Override
@Get
public Representation toXml() throws IOException {
    if (this.requireAuthentication) {
        if (!this.app.authenticate(getRequest(), getResponse())) {
            return new EmptyRepresentation();
        }
    }
    // check whether this representation has been requested before
    // and the data is therefore already in the cache
    Object dataInCache = this.app.getCachedData().get(getURI);
    if (dataInCache != null) {
        System.out.println("Representing from Cache");
        // this produces an unchecked-cast warning; unless we can check that
        // dataInCache is of type T, we cannot get rid of it
        this.dataToBeRepresented = (T) dataInCache;
    } else {
        System.out.println("NOT IN CACHE");
        this.dataToBeRepresented = whenDataIsNotInCache();
        // automatically add data to cache
        this.app.getCachedData().put(getURI, this.dataToBeRepresented, cached_duration);
    }
    // now represent it (unless the EmptyRepresentation was returned earlier)
    JaxbRepresentation<T> jaxb = new JaxbRepresentation<T>(dataToBeRepresented);
    jaxb.setFormattedOutput(true);
    return jaxb;
}
As you can see, and as you might have guessed, I am implementing a cache with Kitty-Cache. So if some XML is expensive to produce and looks like it will not change for seven decades, I cache it. I also use the cache for data that is likely static. The maximum time an entry remains in memory is one hour.
Even when I cache the output, it is sometimes unresponsive: it hangs, prints partially, and takes a while before it prints the rest of the document. The XML document is accessed via GET, both through a browser and programmatically.
What is actually the problem? I would also humbly like to hear from the RESTlet developers, if possible. Thanks
Short question:
I'm looking for a way (java) to intercept a query to Solr and inject a few extra filtering parameters provided by my business logic. What structures should I use?
Context:
First of all, a little confession: I'm quite a rookie regarding Solr. For me, setting up a server, defining a schema, coding a functional indexmanager and afterwards actually seeing the server return the right results - exactly as intended! - was already quite an achievement in itself. Yay me!
However I'm currently working in an enterprise project that requires a little more than that. Roughly speaking, the solr instance is to be queried by several thousands of users through the very same requestHandler, being that the documents returned are automatically filtered according to a user's permission level. For example, if both the user A and the super-user B tried the very same search parameters (even the very same url), the user B would get all of user A's files and then some more. In order to accomplish this the documents are already indexed with the necessary permission level information.
Well, with this in mind and making use of Solr's extensive documentation for newbie developers, I tried to come up with a simple custom requestHandler that overrides the handleRequest function in order to inject the necessary extra parameters into the SolrQueryRequest. All is fine and dandy, except that I never see any difference at all in the QueryResponse; the server rudely ignores my little manipulation. After a couple of days searching the web without so much as a hint as to whether this is the best approach, I finally decided to come and bother the fine folks here at StackOverflow.
So, in short, my questions are:
Is this a correct approach? Are there other alternatives? I can already grasp some of Solr's concepts, but admittedly there is much I lack, and it's entirely possible that I am missing something.
If so, after modifying the query parameters is there anything I should do to force the QueryResponse to be updated? As far as I can tell these are merely encapsulating http requests, and I fail to sniff anything querying the server after the modifications are made.
Thanks in advance and so very sorry for the long post!
UPDATE
After a lot of reading of the APIs and especially much trial and error, I've managed to get a functional solution. However, I still fail to understand much of Solr's internals, so I would still appreciate some enlightenment. Feel free to bash at will; I am still very aware of my rookiness.
The relevant part of the solution is this function, which is called from my overridden handleRequestBody:
private void SearchDocumentsTypeII(SolrDocumentList results,
        SolrIndexSearcher searcher, String q,
        UserPermissions up, int ndocs, SolrQueryRequest req,
        Map<String, SchemaField> fields, Set<Integer> alreadyFound)
        throws IOException, ParseException {
    // Build a filter that restricts hits to the user's permission level.
    BooleanQuery bq = new BooleanQuery();
    String permLvl = "PermissionLevel:" + up.getPermissionLevel();
    QParser parser = QParser.getParser(permLvl, null, req);
    bq.add(parser.getQuery(), Occur.MUST);
    Filter filter = new CachingWrapperFilter(new QueryWrapperFilter(bq));
    // Parse the user's query unchanged and search with the filter applied.
    QueryParser qp = new QueryParser(q, new StandardAnalyzer());
    Query query = qp.parse(q);
    append(results, searcher.search(
            query, filter, 50).scoreDocs,
            alreadyFound, fields, new HashMap<String,Object>(), 0,
            searcher.getReader(), true);
}
Basically the search query is not modified in any way, and instead a filter is applied containing the PermissionLevel of the user. Even so, why doesn't the following alternative work? The search query works perfectly when applied in the standard requestHandler, while in this case it simply doesn't hit any document.
private void SearchDocumentsTypeII(SolrDocumentList results,
        SolrIndexSearcher searcher, String q,
        UserPermissions up, int ndocs, SolrQueryRequest req,
        Map<String, SchemaField> fields, Set<Integer> alreadyFound)
        throws IOException, ParseException {
    String qFiltered = q + " AND " + "PermissionLevel:" + up.getPermissionLevel();
    QueryParser qp = new QueryParser(qFiltered, new StandardAnalyzer());
    Query query = qp.parse(qFiltered);
    append(results, searcher.search(
            query, null, 50).scoreDocs,
            alreadyFound, fields, new HashMap<String,Object>(), 0,
            searcher.getReader(), true);
}
Good news: you don't need to write any code to do that, you just have to configure Solr properly. The superuser would hit the standard request handler while the regular user would hit another request handler (also a solr.StandardRequestHandler) configured with an invariant with the filter query you want to force upon them.
See also http://wiki.apache.org/solr/SolrRequestHandler
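A rough idea of what such a configuration-only setup might look like in solrconfig.xml. The handler name and the permission value are placeholders chosen for illustration, not taken from the question; parameters listed under invariants cannot be overridden by the request.

```xml
<!-- Hypothetical handler for regular users; the name and filter value are placeholders. -->
<requestHandler name="/select-restricted" class="solr.StandardRequestHandler">
  <lst name="invariants">
    <!-- Forced filter query: request parameters cannot replace it. -->
    <str name="fq">PermissionLevel:1</str>
  </lst>
</requestHandler>
```

One such handler per permission level would be needed, and the application would have to route each user to the matching handler.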
Oh well. As previously stated, here is the answer that worked for me. Feel free to comment or bash!
private void SearchDocumentsTypeII(SolrDocumentList results,
        SolrIndexSearcher searcher, String q,
        UserPermissions up, int ndocs, SolrQueryRequest req,
        Map<String, SchemaField> fields, Set<Integer> alreadyFound)
        throws IOException, ParseException {
    // Build a filter that restricts hits to the user's permission level.
    BooleanQuery bq = new BooleanQuery();
    String permLvl = "PermissionLevel:" + up.getPermissionLevel();
    QParser parser = QParser.getParser(permLvl, null, req);
    bq.add(parser.getQuery(), Occur.MUST);
    Filter filter = new CachingWrapperFilter(new QueryWrapperFilter(bq));
    // Parse the user's query unchanged and search with the filter applied.
    QueryParser qp = new QueryParser(q, new StandardAnalyzer());
    Query query = qp.parse(q);
    append(results, searcher.search(
            query, filter, 50).scoreDocs,
            alreadyFound, fields, new HashMap<String,Object>(), 0,
            searcher.getReader(), true);
}
Take a look at this Solr wiki page: it says we should first consider using the Apache Manifold Framework, and only write our own requestHandler if that doesn't suit our needs.
I had the exact same requirement. In case anyone comes across this, here is my solution.
Request handler
public class SearchRequestHanlder extends SearchHandler {
    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
        var map = req.getParams();
        if (map instanceof MultiMapSolrParams) {
            MultiMapSolrParams m = (MultiMapSolrParams) map;
            MultiMapSolrParams.addParam("bq", "category:film^220", m.getMap());
        }
        super.handleRequestBody(req, rsp);
    }

    @Override
    public String getDescription() {
        return "Custom SearchRequestHanlder";
    }
}
solrconfig.xml
<lib dir="/opt/solr/data/cores/movies/lib" regex=".*\.jar" />
<!-- Make sure following line is after existing default <requestHandler name="/select" -->
<requestHandler name="/select" class="com.solrplugin.SearchRequestHanlder" />