It's been a while since I've worked with Java, especially exceptions. I'm in the process of adding Ektorp CouchDB integration into something I'm working on, but I'm encountering "Content has been consumed" exceptions.
The program in question uses twitter4j; I'm getting my statuses and writing them to a CouchDB instance.
public void putTweet(Status status)
{
    Map<String, Object> newTweetDoc = new HashMap<String, Object>();
    String docname = status.getUser().getName() + " "
            + status.getCreatedAt().toString();
    newTweetDoc.put("_id", docname);
    newTweetDoc.put("User", status.getUser().getName());
    newTweetDoc.put("Contents", status.getText());
    newTweetDoc.put("Created", status.getCreatedAt().toString());
    newTweetDoc.put("RetweetCount", status.getRetweetCount());
    UserMentionEntity[] mentions = status.getUserMentionEntities();
    Map<String, HashMap<String, String>> formattedMentions = formatMentions(mentions);
    newTweetDoc.put("Mentions", formattedMentions);
    db.addToBulkBuffer(newTweetDoc);
}
At first I tried db.create(newTweetDoc) as well. Does the CouchDbConnector need to be recreated every time I try this?
db is a global CouchDbConnector:
public CouchDbConnector db = null;

public CouchTwitter()
{
    //create the db connection etc
}
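For reference, db is created once in the constructor, roughly along the lines of the standard Ektorp setup below (the URL and database name here are placeholders, not my actual values):

try {
    // Rough sketch of the connection setup (placeholder URL and database name):
    HttpClient httpClient = new StdHttpClient.Builder()
            .url("http://localhost:5984")            // CouchDB instance
            .build();
    CouchDbInstance dbInstance = new StdCouchDbInstance(httpClient);
    db = dbInstance.createConnector("tweets", true); // create the db if it doesn't exist
} catch (MalformedURLException e) {
    throw new IllegalStateException("Bad CouchDB URL", e);
}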
It's the db.create(doc) or flushBulkBuffer that results in the error. Here is the stacktrace:
Exception in thread "main" java.lang.IllegalStateException: Content has been consumed
at org.apache.http.entity.BasicHttpEntity.getContent(BasicHttpEntity.java:84)
at org.apache.http.conn.BasicManagedEntity.getContent(BasicManagedEntity.java:88)
at org.ektorp.http.StdHttpResponse.releaseConnection(StdHttpResponse.java:82)
at org.ektorp.http.RestTemplate.handleResponse(RestTemplate.java:111)
at org.ektorp.http.RestTemplate.post(RestTemplate.java:66)
at org.ektorp.impl.StdCouchDbConnector.executeBulk(StdCouchDbConnector.java:638)
at org.ektorp.impl.StdCouchDbConnector.executeBulk(StdCouchDbConnector.java:596)
at org.ektorp.impl.StdCouchDbConnector.flushBulkBuffer(StdCouchDbConnector.java:617)
I see in the above that two separate Entity classes both call .getContent(). I've been playing around with my referenced libraries recently; is it possible that it's pulling in an old Apache HTTP library as well as the current one?
CouchDbConnector is thread safe, so you don't need to recreate it for each operation.
I have never encountered your problem; your use case is pretty simple, and there should not be any problem saving a basic doc.
Verify that httpclient-4.1.1 or above is on the classpath.
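If you suspect an old HttpClient jar is also being picked up, one quick check (a small sketch, not from the original post) is to print where the class in the stack trace is loaded from:

import org.apache.http.entity.BasicHttpEntity;

public class HttpClientLocationCheck {
    public static void main(String[] args) {
        // Prints the jar this class was loaded from; if it points at an old
        // httpclient jar, that is the stale dependency to remove.
        System.out.println(BasicHttpEntity.class.getProtectionDomain()
                .getCodeSource().getLocation());
    }
}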
Related
I have an Apache Beam pipeline that reads data from DynamoDB. To read the data I use the Apache Beam DynamoDBIO SDK. I need to read specific/filtered data in my use case, meaning I have to use a filterExpression in DynamoDBIO. My current code is as follows:
Map<String, AttributeValue> expressionAttributeValues = new HashMap<>();
expressionAttributeValues.put(":message", AttributeValue.builder().s("Ping").build());

pipeline
    .apply(DynamoDBIO.<List<Map<String, AttributeValue>>>read()
        .withClientConfiguration(DynamoDBConfig.CLIENT_CONFIGURATION)
        .withScanRequestFn(input -> ScanRequest.builder().tableName("SiteProductCache").totalSegments(1)
            .filterExpression("KafkaEventMessage = :message")
            .expressionAttributeValues(expressionAttributeValues)
            .projectionExpression("key, KafkaEventMessage")
            .build())
        .withScanResponseMapperFn(new ResponseMapper())
        .withCoder(ListCoder.of(MapCoder.of(StringUtf8Coder.of(), AttributeValueCoder.of())))
    )
    .apply(...)
----
static final class ResponseMapper implements SerializableFunction<ScanResponse, List<Map<String, AttributeValue>>> {
    @Override
    public List<Map<String, AttributeValue>> apply(ScanResponse input) {
        if (input == null) {
            return Collections.emptyList();
        }
        return input.items();
    }
}
When executing the code, I get the error below:
Exception in thread "main" java.lang.IllegalArgumentException: Forbidden IOException when writing to OutputStream
at org.apache.beam.sdk.util.CoderUtils.encodeToSafeStream(CoderUtils.java:89)
at org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:70)
at org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:55)
at org.apache.beam.sdk.transforms.Create$Values$CreateSource.fromIterable(Create.java:413)
at org.apache.beam.sdk.transforms.Create$Values.expand(Create.java:370)
at org.apache.beam.sdk.transforms.Create$Values.expand(Create.java:277)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:548)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:499)
at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:56)
at org.apache.beam.sdk.io.aws2.dynamodb.DynamoDBIO$Read.expand(DynamoDBIO.java:301)
at org.apache.beam.sdk.io.aws2.dynamodb.DynamoDBIO$Read.expand(DynamoDBIO.java:172)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:548)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:482)
at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:44)
at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:177)
at some_package.beam_state_storage.dynamodb.DynamoDBPipelineDefinition.run(DynamoDBPipelineDefinition.java:40)
at some_package.beam_state_storage.dynamodb.DynamoDBPipelineDefinition.main(DynamoDBPipelineDefinition.java:28)
Caused by: java.io.NotSerializableException: software.amazon.awssdk.core.util.DefaultSdkAutoConstructList
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1197)
at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1582)
at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1539)
at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1448)
Caused by: java.io.NotSerializableException: software.amazon.awssdk.core.util.DefaultSdkAutoConstructList
Does anyone have an idea how to solve this, or what the correct way to read and filter the data is? I am a bit new to Apache Beam and appreciate any guidance.
I believe the problem here is that you are using outside members inside a lambda; for that to happen the parent instance needs to be serialized, but it has members that do not implement Serializable (similar to Apache Beam: Unable to serialize DoFnWithExecutionInformation because of PipelineOptions not serializable).
Maybe expressionAttributeValues itself is causing the issue; I am not sure what in your post refers to DefaultSdkAutoConstructList.
Try replacing the lambda with a well-scoped static class, or, if possible, initialize expressionAttributeValues inside the lambda itself instead of carrying it through to the DoFn; see the sketch below.
This documentation will help in understanding the underlying issue here: https://beam.apache.org/documentation/programming-guide/#user-code-serializability.
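As a rough illustration of the static-class suggestion (a sketch only; FilteredScanRequestFn is a made-up name, and the table/attribute names are copied from your question), building the map inside the function means nothing non-serializable is captured from the enclosing scope:

static final class FilteredScanRequestFn implements SerializableFunction<Void, ScanRequest> {
    @Override
    public ScanRequest apply(Void unused) {
        // Created here, at execution time, so it is never serialized along with the function.
        Map<String, AttributeValue> values = new HashMap<>();
        values.put(":message", AttributeValue.builder().s("Ping").build());
        return ScanRequest.builder()
                .tableName("SiteProductCache")
                .totalSegments(1)
                .filterExpression("KafkaEventMessage = :message")
                .expressionAttributeValues(values)
                .projectionExpression("key, KafkaEventMessage")
                .build();
    }
}

You would then plug it in with .withScanRequestFn(new FilteredScanRequestFn()) instead of the lambda.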
We need to read data from our checkpoints manually for different reasons (let's say we need to change our state object/class structure, so we want to read, restore, and copy the data to a new type of object).
While we are reading, everything is good, but when we try to keep/store it in memory and deploy to the Flink cluster, we get an empty list/map. In the logs we see that we are reading and adding all our data properly to the list/map, but as soon as our method completes its work we lose the data; the list/map is empty :(
val env = ExecutionEnvironment.getExecutionEnvironment();
val savepoint = Savepoint.load(env, checkpointSavepointLocation, new HashMapStateBackend());
private List<KeyedAssetTagWithConfig> keyedAssetsTagWithConfigs = new ArrayList<>();
val keyedStateReaderFunction = new KeyedStateReaderFunctionImpl();
savepoint.readKeyedState("my-uuid", keyedStateReaderFunction)
.setParallelism(1)
.output(new MyLocalCollectionOutputFormat<>(keyedAssetsTagWithConfigs));
env.execute("MyJobName");
private static class KeyedStateReaderFunctionImpl extends KeyedStateReaderFunction<String, KeyedAssetTagWithConfig> {

    private MapState<String, KeyedAssetTagWithConfig> liveTagsValues;
    private Map<String, KeyedAssetTagWithConfig> keyToValues = new ConcurrentHashMap<>();

    @Override
    public void open(final Configuration parameters) throws Exception {
        liveTagsValues = getRuntimeContext().getMapState(ExpressionsProcessor.liveTagsValuesStateDescriptor);
    }

    @Override
    public void readKey(final String key, final Context ctx, final Collector<KeyedAssetTagWithConfig> out) throws Exception {
        liveTagsValues.iterator().forEachRemaining(entry -> {
            keyToValues.put(entry.getKey(), entry.getValue());
            log.info("key {} -> {} val", entry.getKey(), entry.getValue());
            out.collect(entry.getValue());
        });
    }

    public Map<String, KeyedAssetTagWithConfig> getKeyToValues() {
        return keyToValues;
    }
}
As soon as this code executes, I expect to have all the values inside the map I get from keyedStateReaderFunction.getKeyToValues(), but it returns an empty map. However, I see in the logs that we are reading all of them properly. The data is even empty inside the keyedAssetsTagWithConfigs list that we read the output into.
If anyone has any idea it will be very helpful, because I am lost; I have never had the experience of putting data into a map and then losing it :) When I serialize my map or list and write it to a text file and then deserialize it from there (using Jackson), I see my data exists, but this is not a solution, just a kind of workaround.
Thanks in advance
The code you show creates and submits a Flink job to be executed in its own environment, orchestrated by the Flink framework: https://nightlies.apache.org/flink/flink-docs-stable/docs/concepts/flink-architecture/#flink-application-execution
The job runs independently of the code that builds and submits it, so when you call keyedStateReaderFunction.getKeyToValues() you are calling the method of the object that was used to build the job, not of the actual object that ran in the Flink execution environment.
Your workaround seems like a valid option to me. You can then submit the file with your savepoint contents to your new job to recreate its state as you'd like.
You have an instance of KeyedStateReaderFunctionImpl in the Flink client, which gets serialized and sent to each task manager. Each task manager then deserializes a copy of that KeyedStateReaderFunctionImpl and calls its open and readKey methods, gradually building up a private Map containing its share of the data extracted from the savepoint/checkpoint.
Meanwhile, the original KeyedStateReaderFunctionImpl back in the Flink client has never had its open or readKey methods called, and doesn't hold any data.
In your case the parallelism is one, so there is only one task manager, but in general you will need to collect the output from each task manager and assemble the complete results from those pieces. These results are not available in the Flink client process because the work hasn't been done there.
I found a solution: start the job in attached mode and collect the results in the main thread.
val env = ExecutionEnvironment.getExecutionEnvironment();
val configuration = env.getConfiguration();
configuration.setBoolean(DeploymentOptions.ATTACHED, true);
...
val myresults = dataSource.collect();
Hope this helps somebody else, because I wasted a couple of days trying to find a solution.
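For completeness, a minimal sketch of the same idea with the reader from the question (assuming the DataSet-based State Processor API used above): calling collect() triggers execution and returns the reader's output to the client process, so no output format or shared field is needed.

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
ExistingSavepoint savepoint =
        Savepoint.load(env, checkpointSavepointLocation, new HashMapStateBackend());

// collect() runs the job and hands the results back to the client process.
List<KeyedAssetTagWithConfig> values = savepoint
        .readKeyedState("my-uuid", new KeyedStateReaderFunctionImpl())
        .setParallelism(1)
        .collect();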
I'm using Logback and I need to log all the data queried by clients to a log file. All the data queried by clients needs to be logged to the same file. The logging process simply looks like this:
private static final Logger OUTPUTFILELOGGER = Logger.getLogger(...);

String outputString = null;
try {
    Map<String, Object> outputMap = doService(); // queries data requested by clients.
    ....                                         // do something after business logic..
    outputString = outputMap.toString();         // critical!!
} catch (Throwable e) {
    // handle the exception
} finally {
    OUTPUTFILELOGGER.info(outputString);
}
It usually works fine, but sometimes it raises an OutOfMemoryError on the call to toString on the outputMap variable, when the requested data is too big to build the string.
So I want to do this in a streaming way, without hurting performance, and I don't know how to do it effectively and gracefully.
Any idea?
Loop through the map so that you're only working with a small part at a time:
LOGGER.info("Map contains:")
map.forEach( (key, value) -> LOGGER.info("{}: {}", key, value));
(Assumes Java 8 and SLF4J)
However if the map is big enough for the code you've given to generate OOMs, you should probably consider whether it's appropriate to log it in such detail -- or whether your service ought to be capping the response size.
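If capping makes sense for you, a small sketch along the same lines (SLF4J again; MAX_LOGGED_ENTRIES is an illustrative constant, not something from your code):

private static final int MAX_LOGGED_ENTRIES = 1_000; // tune to whatever is acceptable

static void logCapped(Logger logger, Map<String, Object> map) {
    logger.info("Map contains {} entries (logging at most {}):", map.size(), MAX_LOGGED_ENTRIES);
    map.entrySet().stream()
            .limit(MAX_LOGGED_ENTRIES)
            .forEach(e -> logger.info("{}: {}", e.getKey(), e.getValue()));
}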
I am having some problems developing a Maven project with Eclipse. I tried to search the net, but there is nothing similar.
To sum up, I am using the WFSDataStore (GeoTools) to get the collection of features from an XML document and then adding them to a database.
There are two different behaviours:
When I do Run As > Java Application, everything is correct and the code works.
When I do Run As > Maven (clean install tomcat:run-war), there is an error on the line dataStoreBD = DataStoreFinder.getDataStore(params);, where dataStore is null:
(If you want to check the getCapabilities parameter)
public static void dataAccess(String getCapabilities, WFSDataStoreFactory dsf) throws Exception {
    // Variables
    // WFS Connection
    Map<String, Object> connectionParameters = new HashMap<>();
    connectionParameters.put(WFSDataStoreFactory.URL.key, getCapabilities);
    connectionParameters.put(WFSDataStoreFactory.PROTOCOL.key, false);
    connectionParameters.put(WFSDataStoreFactory.LENIENT.key, true);
    connectionParameters.put(WFSDataStoreFactory.MAXFEATURES.key, "5");
    connectionParameters.put(WFSDataStoreFactory.TIMEOUT.key, 600000);

    // Database Connection
    DataStore dataStoreBD = null;
    Transaction transaction = null;
    Filter filter = null;

    Map<String, Object> params = new HashMap<>();
    params.put("dbtype", configTypeDatabase);
    params.put("host", configIp);
    params.put("port", configPort);
    params.put("schema", configUser);
    params.put("database", configDatabase);
    params.put("user", configUser);
    params.put("passwd", configPassword);
    params.put("accessToUnderlyingConnectionAllowed", true);

    dataStoreBD = DataStoreFinder.getDataStore(params);

    // Etc.
}
The parameters are correct. I am getting them from a configuration file stored on my computer, and I have debugged like a thousand times to see what is really happening, but maybe there is a problem that I cannot see.
After this code I have another piece:
SimpleFeatureSource initialBDFeatureSource = dataStoreBD.getFeatureSource(configDatesTable);
FeatureIterator<SimpleFeature> ifs = initialBDFeatureSource.getFeatures().features();
The first line ends the program with this error:
java.lang.NullPointerException
    at com.sitep.imi.acefat.server.daemon.InsertarBBDDDaemon.dataAccess(InsertarBBDDDaemon.java:972)
First of all, I deployed the project to try something else, using Tomcat 7 and adding the information from tomcat.xml (the project path from Eclipse's workspace) to context.xml (Tomcat 7 path: C:\Program Files\Apache Software Foundation\Tomcat 7.0\conf).
Then I studied the connections (pool, JNDI and JDBC), because I was not able to connect properly to my database. That is why I ended up changing the JDBC connection (a general-purpose interface to relational databases) to a JNDI connection (a general-purpose interface to naming systems), like the following:
Map<String, String> params = new HashMap<String, String>();
params.put("dbtype", configTypeDatabase);
params.put("jndiReferenceName", "java:comp/env/jdbc/DBName");
params.put("accessToUnderlyingConnectionAllowed", true);
params.put("schema", configUser);
I omit some parameters because they are not significant for the JNDI connection. The reason why is a little bit tricky, because even after resolving it I can't quite explain it. For whatever reason, when I tried to run the Java application (option 1) locally it always worked, but I had defined the connection as a lookup (<jee:jndi-lookup id="dataSource" jndi-name="java:comp/env/jdbc/DBName"/>), so it only works with JNDI when you run or deploy it with Maven.
If I find more information I will update my answer to clarify or improve it.
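Update: if anyone else hits the same null DataStore, one sanity check worth running (a small sketch using GeoTools' DataStoreFinder, not part of my original code) is to list which DataStore factories are visible at runtime, since a missing datastore plugin jar in the Maven-built war makes getDataStore(params) return null even with correct parameters:

Iterator<DataStoreFactorySpi> it = DataStoreFinder.getAvailableDataStores();
while (it.hasNext()) {
    // If the factory for your dbtype does not show up here, the plugin jar is missing
    // from the runtime classpath and getDataStore(params) will return null.
    System.out.println(it.next().getDisplayName());
}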
Short question:
I'm looking for a way (java) to intercept a query to Solr and inject a few extra filtering parameters provided by my business logic. What structures should I use?
Context:
First of all, a little confession: I'm such a rookie regarding Solr. For me, setting up a server, defining a schema, coding a functional index manager, and afterwards actually seeing the server return the right results - exactly as intended! - was already quite an achievement in itself. Yay me!
However, I'm currently working on an enterprise project that requires a little more than that. Roughly speaking, the Solr instance is to be queried by several thousand users through the very same requestHandler, with the documents returned being automatically filtered according to a user's permission level. For example, if both user A and super-user B tried the very same search parameters (even the very same URL), user B would get all of user A's files and then some more. In order to accomplish this, the documents are already indexed with the necessary permission level information.
Well, with this in mind and making use of Solr's extensive documentation for newbie developers, I tried to come up with a simple custom requestHandler that overrides the handleRequest function in order to inject the necessary extra parameters into the SolrQueryRequest. All is fine and dandy - except that I never see any difference at all in the QueryResponse, the server rudely ignoring my little manipulation. After a couple of days searching the web without so much as a hint of whether this is the best approach, I finally decided to come up and bother the fine folks here at StackOverflow.
So, in short, my questions are:
Is this a correct approach? Are there other alternatives? I can already grasp some of Solr's concepts, but admittedly there is much lacking and it's entirely possible that I am missing something.
If so, after modifying the query parameters, is there anything I should do to force the QueryResponse to be updated? As far as I can tell these merely encapsulate HTTP requests, and I fail to sniff anything querying the server after the modifications are made.
Thanks in advance and so very sorry for the long post!
UPDATE
After a lot of reading APIs and especially much trial and error, I've managed to get a functional solution. However, I still fail to understand much of Solr's internals, so I would still appreciate some enlightenment. Feel free to bash at will, I am still very aware of my rookieness.
The relevant part of the solution is this function, which is called by the overridden handleRequestBody:
private void SearchDocumentsTypeII(SolrDocumentList results,
        SolrIndexSearcher searcher, String q,
        UserPermissions up, int ndocs, SolrQueryRequest req,
        Map<String, SchemaField> fields, Set<Integer> alreadyFound)
        throws IOException, ParseException {

    BooleanQuery bq = new BooleanQuery();
    String permLvl = "PermissionLevel:" + up.getPermissionLevel();
    QParser parser = QParser.getParser(permLvl, null, req);
    bq.add(parser.getQuery(), Occur.MUST);
    Filter filter = new CachingWrapperFilter(new QueryWrapperFilter(bq));

    QueryParser qp = new QueryParser(q, new StandardAnalyzer());
    Query query = qp.parse(q);

    append(results, searcher.search(query, filter, 50).scoreDocs,
            alreadyFound, fields, new HashMap<String, Object>(), 0,
            searcher.getReader(), true);
}
Basically the search query is not modified in any way; instead, a filter is applied containing the PermissionLevel of the user. Even so, why doesn't the following alternative work? The search query works perfectly when applied in the standard requestHandler, while in this case it simply doesn't hit any document.
private void SearchDocumentsTypeII(SolrDocumentList results,
        SolrIndexSearcher searcher, String q,
        UserPermissions up, int ndocs, SolrQueryRequest req,
        Map<String, SchemaField> fields, Set<Integer> alreadyFound)
        throws IOException, ParseException {

    String qFiltered = q + " AND " + "PermissionLevel:" + up.getPermissionLevel();

    QueryParser qp = new QueryParser(qFiltered, new StandardAnalyzer());
    Query query = qp.parse(qFiltered);

    append(results, searcher.search(query, null, 50).scoreDocs,
            alreadyFound, fields, new HashMap<String, Object>(), 0,
            searcher.getReader(), true);
}
Good news: you don't need to write any code to do that; you just have to configure Solr properly. The superuser would hit the standard request handler, while the regular user would hit another request handler (also a solr.StandardRequestHandler) configured with an invariant containing the filter query you want to force upon them.
See also http://wiki.apache.org/solr/SolrRequestHandler
Oh well. As previously stated, the answer that worked for me. Feel free to comment or bash!
private void SearchDocumentsTypeII(SolrDocumentList results,
        SolrIndexSearcher searcher, String q,
        UserPermissions up, int ndocs, SolrQueryRequest req,
        Map<String, SchemaField> fields, Set<Integer> alreadyFound)
        throws IOException, ParseException {

    BooleanQuery bq = new BooleanQuery();
    String permLvl = "PermissionLevel:" + up.getPermissionLevel();
    QParser parser = QParser.getParser(permLvl, null, req);
    bq.add(parser.getQuery(), Occur.MUST);
    Filter filter = new CachingWrapperFilter(new QueryWrapperFilter(bq));

    QueryParser qp = new QueryParser(q, new StandardAnalyzer());
    Query query = qp.parse(q);

    append(results, searcher.search(query, filter, 50).scoreDocs,
            alreadyFound, fields, new HashMap<String, Object>(), 0,
            searcher.getReader(), true);
}
Take a look at this Solr wiki page; it says we should first consider using the Apache Manifold framework, and only if it doesn't suit your needs should you write your own requestHandler.
I had the exact same requirement. In case anyone else comes across this, here is my solution.
Request handler
public class SearchRequestHandler extends SearchHandler {

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
        var map = req.getParams();
        if (map instanceof MultiMapSolrParams) {
            MultiMapSolrParams m = (MultiMapSolrParams) map;
            MultiMapSolrParams.addParam("bq", "category:film^220", m.getMap());
        }
        super.handleRequestBody(req, rsp);
    }

    @Override
    public String getDescription() {
        return "Custom SearchRequestHandler";
    }
}
solrconfig.xml
<lib dir="/opt/solr/data/cores/movies/lib" regex=".*\.jar" />
<!-- Make sure the following line comes after the existing default <requestHandler name="/select"> definition -->
<requestHandler name="/select" class="com.solrplugin.SearchRequestHandler" />