I am trying to run a Bulk Request through JEST and want to append my data (say "bills") one at a time and then execute them all at once. However, when I run the following code on 10 bills, only the last bill gets indexed. Can someone please correct this code so that all 10 bills are executed in a single Bulk Request (i.e. executed outside the for loop)?
for (JSONObject bill : bills) {
    bulkRequest = new Bulk.Builder()
            .addAction(new Index.Builder(bill.toString()).index(index).type(type).id(id).build())
            .build();
}
bulkResponse = Client.execute(bulkRequest);
You need to build the Bulk.Builder outside of the loop and then use it to add all the bills:
Bulk.Builder bulkRequestBuilder = new Bulk.Builder();
for (JSONObject bill : bills) {
    bulkRequestBuilder.addAction(
            new Index.Builder(bill.toString()).index(index).type(type).id(id).build());
}
bulkResponse = Client.execute(bulkRequestBuilder.build());
I know it's an old question, but in case someone stumbles across this, here is a Java 8 (lambda) way of doing the same thing.
Client.execute(new Bulk.Builder()
        .addAction(
                bills.stream()
                        .map(bill -> new Index.Builder(bill.toString())
                                .index(index).type(type).id(id).build())
                        .collect(Collectors.toList()))
        .build());
I need to work on an ajax response, that is, one of the responses received upon visiting a page. I use Selenium DevTools with Java. I create a listener that intercepts a specific request, and then I want to work on the response it brings. However, I need to set up a static wait, or else Selenium doesn't have time to save the RequestId. I have read the Chrome DevTools documentation, but it's a new thing for me. I wonder if there is a method that would let me wait for this call to be completed, other than the static wait.
Here is my code:
#Test(groups = "test")
public void x() throws InterruptedException, JsonProcessingException {
User user = User.builder();
ManageAccountStep manageAccountStep = new ManageAccountStep(getDriver());
DashboardPO dashboardPO = new DashboardPO(getDriver());
manageAccountStep.login(user);
DevTools devTools = ((HasDevTools) getDriver()).maybeGetDevTools().orElseThrow();
devTools.createSessionIfThereIsNotOne();
devTools.send(Network.enable(Optional.empty(), Optional.empty(), Optional.empty()));
// end of boilerplate
final RequestId[] id = new RequestId[1];
devTools.addListener(Network.responseReceived(), response -> {
log.info(response.getResponse().getUrl());
if (response.getResponse().getUrl().contains(DESIRED_URL)){
id[0] = response.getRequestId();
}
});
dashboardPO
.clickLink(); // here is when my DESIRED_URL happens
Utils.sleep(5000); // Something like Thread.sleep(5000)
String responseBody = devTools.send(Network.getResponseBody(id[0])).getBody();
// some operations on responseBody
devTools.clearListeners();
devTools.disconnectSession();
}
If I don't use the 5-second wait, the id variable never gets assigned and I get a NullPointerException ("requestId is required"). During those 5 seconds, log.info prints all the API calls that are happening, and it almost always finds my id. I would like to avoid the static wait, though. I am thinking about something similar to jQuery.active() == 0, but my page doesn't use jQuery.
You may try an explicit wait with a custom condition. Something like this:
public String getResponseBody(WebDriver driver, DevTools devTools) {
    return new WebDriverWait(driver, 5)
            .ignoring(NullPointerException.class)
            .until(d -> devTools.send(Network.getResponseBody(id[0])).getBody());
}
So it won't wait for the full 5 seconds; the moment it gets the data, it comes out of the until method. Also add whichever exception was coming up to the ignoring call.
These lines were put in a separate method because the devTools object is defined locally. In order to use it inside the lambda, it has to be final or effectively final.
I seem to run into this issue when running tests in parallel (and headless) and trying to capture the requests and responses; I get:
{"No data found for resource with given identifier"},"sessionId" ...
However, .until now seems to only take an ExpectedCondition.
So a similar solution to the accepted answer, but without using WebDriverWait.until, is the one I use:
public static String getResponseBody(DevTools devTools, RequestId id) throws InterruptedException {
    String requestPostData = "";
    LocalDateTime then = LocalDateTime.now();
    String err = "";
    Integer it = 0;
    while (true) {
        err = "";
        try {
            requestPostData = devTools.send(Network.getResponseBody(id)).getBody();
        } catch (Exception e) {
            err = e.getMessage();
        }
        if (requestPostData != null && !requestPostData.equals("")) { break; }
        if (err.equals("")) { break; } // if we don't have an error message, it's quite possible the responseBody really is an empty string
        long timeTaken = ChronoUnit.SECONDS.between(then, LocalDateTime.now());
        if (timeTaken >= 5) { requestPostData = err + ", timeTaken:" + timeTaken; break; }
        if (it > 0) { TimeUnit.SECONDS.sleep(it); } // I prefer waiting longer and longer, avoiding tight retry loops
        it++;
    }
    return requestPostData;
}
It just loops until it doesn't error, and returns the string as soon as it can (although I actually set timeTaken >= 60 because of the many parallel requests).
I need to periodically clear the data in multiple sheets and re-populate them with data (via the Google Sheets API v4). To do this, I'm executing 2 separate requests for each sheet (1 clear and 1 update). This is a fairly slow process while the user sits there waiting for it, and each new request seems to add significantly to the completion time. If I could wrap all of these into a single batch request, it might help a lot.
I'm currently doing this for each sheet...
service.spreadsheets()
        .values()
        .clear(idSpreadsheet, sheetTitle + "!$A$1:$Y", new ClearValuesRequest())
        .execute();

service.spreadsheets()
        .values()
        .update(idSpreadsheet, range, new ValueRange().setValues(values))
        .setValueInputOption("USER_ENTERED")
        .execute();
I don't see a way to just wrap a bunch of generic commands into a single batch request. I see that DeleteDimensionRequest and AppendCellsRequest can be wrapped into a batch, but I can't really find a good AppendCellsRequest example (and it seems that people recommend my current values().update() method anyway).
Can anyone recommend a good way to streamline this? Or am I already doing it the best way?
I still don't know if I'm doing it the BEST way, but I was able to accomplish my goal of clearing and populating multiple sheets with data in a single batch request. The trick was not to use the clear() method, but instead to overwrite the sheet with blank data using a RepeatCellRequest. I also now use AppendCellsRequest instead of update(). These two requests can be wrapped in a batch request.
My early tests with 3 sheets show about a 25% performance improvement. Not spectacular, but it helps.
List<Request> requests = new ArrayList<Request>();
for (SheetData mySheet : sheetDatas)
{
    List<List<Object>> values = mySheet.getValues();

    Request clearSheetRequest = new Request()
            .setRepeatCell(new RepeatCellRequest()
                    .setRange(new GridRange()
                            .setSheetId(mySheet.getSheetId())
                    )
                    .setFields("*")
                    .setCell(new CellData())
            );

    List<RowData> preppedRows = new ArrayList<RowData>();
    for (List<Object> row : values)
    {
        RowData preppedRow = new RowData();
        List<CellData> cells = new ArrayList<CellData>();
        for (Object value : row)
        {
            CellData cell = new CellData();
            ExtendedValue userEnteredValue = new ExtendedValue();
            if (value instanceof String)
            {
                userEnteredValue.setStringValue((String) value);
            }
            else if (value instanceof Double)
            {
                userEnteredValue.setNumberValue((Double) value);
            }
            else if (value instanceof Integer)
            {
                userEnteredValue.setNumberValue(Double.valueOf((Integer) value).doubleValue());
            }
            else if (value instanceof Boolean)
            {
                userEnteredValue.setBoolValue((Boolean) value);
            }
            cell.setUserEnteredValue(userEnteredValue);
            cells.add(cell);
        }
        preppedRow.setValues(cells);
        preppedRows.add(preppedRow);
    }

    Request appendCellsRequest = new Request().setAppendCells(
            new AppendCellsRequest()
                    .setSheetId(mySheet.getSheetId())
                    .setRows(preppedRows)
                    .setFields("*")
    );

    requests.add(clearSheetRequest);
    requests.add(appendCellsRequest);
}

BatchUpdateSpreadsheetRequest batch = new BatchUpdateSpreadsheetRequest().setRequests(requests);
BatchUpdateSpreadsheetResponse batchResponse = service.spreadsheets().batchUpdate(idSpreadsheet, batch).execute();
I'm trying to index some data in ES and I'm receiving the out of memory exception below:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.common.jackson.core.util.BufferRecycler.balloc(BufferRecycler.java:155)
at org.elasticsearch.common.jackson.core.util.BufferRecycler.allocByteBuffer(BufferRecycler.java:96)
at org.elasticsearch.common.jackson.core.util.BufferRecycler.allocByteBuffer(BufferRecycler.java:86)
at org.elasticsearch.common.jackson.core.io.IOContext.allocWriteEncodingBuffer(IOContext.java:152)
at org.elasticsearch.common.jackson.core.json.UTF8JsonGenerator.<init>(UTF8JsonGenerator.java:123)
at org.elasticsearch.common.jackson.core.JsonFactory._createUTF8Generator(JsonFactory.java:1284)
at org.elasticsearch.common.jackson.core.JsonFactory.createGenerator(JsonFactory.java:1016)
at org.elasticsearch.common.xcontent.json.JsonXContent.createGenerator(JsonXContent.java:68)
at org.elasticsearch.common.xcontent.XContentBuilder.<init>(XContentBuilder.java:96)
at org.elasticsearch.common.xcontent.XContentBuilder.builder(XContentBuilder.java:77)
at org.elasticsearch.common.xcontent.json.JsonXContent.contentBuilder(JsonXContent.java:38)
at org.elasticsearch.common.xcontent.XContentFactory.contentBuilder(XContentFactory.java:122)
at org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder(XContentFactory.java:49)
at EsController.importProductEs(EsController.java:60)
at Parser.fromCsvToJson(Parser.java:120)
at CsvToJsonParser.parseProductFeeds(CsvToJsonParser.java:43)
at MainParser.main(MainParser.java:49)
This is how I instantiate the ES client:
System.out.println("Elastic search client is instantiated");
Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch_brew").build();
client = new TransportClient(settings);
String hostname = "localhost";
int port = 9300;
((TransportClient) client).addTransportAddress(new InetSocketTransportAddress(hostname, port));
bulkRequest = client.prepareBulk();
and then I run the bulk request:
// for each product in the list, we need to include the fields in the bulk request
for (HashMap<String, String> productfields : products) {
    try {
        bulkRequest.add(client.prepareIndex(index, type, productfields.get("Product_Id"))
                .setSource(jsonBuilder()
                        .startObject()
                        .field("Name", productfields.get("Name"))
                        .field("Quantity", productfields.get("Quantity"))
                        .field("Make", productfields.get("Make"))
                        .field("Price", productfields.get("Price"))
                        .endObject()
                )
        );
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

// execute the bulk request
BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
    // process failures by iterating through each bulk response item
}
I am trying to index products from various shops. Each shop is a different index. When I reach the 6th shop, containing around 60,000 products, I get the above exception. I already split the bulk request into chunks of 10,000, trying to avoid the out of memory problem.
I can't understand where exactly the bottleneck is. Would it help if I somehow flushed the bulk request or restarted the client?
I've seen similar posts, but none of them works for me.
EDIT
When I instantiate a new client every time I process a new bulk request, I don't get the out of memory exception. But instantiating a new client each time doesn't seem right..
Thank you
So I figured out what was wrong.
Every new bulk request was being added on top of the previous one, and eventually this led to the out of memory error.
So now, before I start a new bulk request, I run
bulkRequest = client.prepareBulk();
which gives me a fresh bulk request instead of accumulating onto the previous one.
Thank you guys for your comments
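To make that concrete, here is a minimal sketch of the chunked indexing loop with that fix applied, reusing the fields and the 10,000-document chunk size from the question; the chunkSize variable and the failure handling are illustrative assumptions, not part of the original code:
// Minimal sketch (assumes the same client, index, type and products list as above).
BulkRequestBuilder bulkRequest = client.prepareBulk();
int chunkSize = 10000; // chunk size mentioned in the question

for (HashMap<String, String> productfields : products) {
    try {
        bulkRequest.add(client.prepareIndex(index, type, productfields.get("Product_Id"))
                .setSource(jsonBuilder()
                        .startObject()
                        .field("Name", productfields.get("Name"))
                        .field("Quantity", productfields.get("Quantity"))
                        .field("Make", productfields.get("Make"))
                        .field("Price", productfields.get("Price"))
                        .endObject()));
    } catch (IOException e) {
        e.printStackTrace();
    }

    if (bulkRequest.numberOfActions() >= chunkSize) {
        BulkResponse response = bulkRequest.execute().actionGet();
        if (response.hasFailures()) {
            // handle failures
        }
        // start a fresh bulk request so the previous chunk can be garbage collected
        bulkRequest = client.prepareBulk();
    }
}

// send whatever is left in the last (partial) chunk
if (bulkRequest.numberOfActions() > 0) {
    bulkRequest.execute().actionGet();
}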
I am using the Java EMR API to run a Pig job on an EMR cluster. I am using the following code to add steps to the job flow:
String jobFlowId = "j-assdasd";
AmazonElasticMapReduceClient client = new AmazonElasticMapReduceClient(
credentials);
StepFactory stepFactory = new StepFactory();
StepConfig executePig = new StepConfig()
.withName("Execute Pig")
.withActionOnFailure(ActionOnFailure.CANCEL_AND_WAIT)
.withHadoopJarStep(
stepFactory
.newRunPigScriptStep("s3://bucket/script/load.pig"));
AddJobFlowStepsRequest pig = new AddJobFlowStepsRequest(jobFlowId)
.withSteps( executePig);
AddJobFlowStepsResult result = client.addJobFlowSteps(pig);
How can I get the status of the "Execute Pig" step? I want to make the program wait until the step finishes on EMR.
I found a way to do it in Java:
List<String> id = result.getStepIds();
DescribeStepResult res = client.describeStep(new DescribeStepRequest().withStepId(id.get(0)));
StepStatus status = res.getStep().getStatus();
String stas = status.getState();
But here we need to loop on the status until it returns COMPLETED.
As Ajay mentioned in his own answer, you need a loop that repeatedly checks the statuses of the cluster, bootstrap actions, and steps, and keeps the program inside it until a certain status is reached.
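For illustration, here is a minimal sketch of such a polling loop built on the describeStep call from the answer above. The terminal state names, the 30-second polling interval, and the withClusterId(jobFlowId) argument are assumptions based on the standard EMR step API, not details from the original post:
// Minimal sketch: poll the step status until it reaches a terminal state.
// Assumes the same client, jobFlowId and result objects as above; the enclosing
// method must handle or declare InterruptedException because of the sleep.
String stepId = result.getStepIds().get(0);

while (true) {
    DescribeStepResult res = client.describeStep(
            new DescribeStepRequest().withClusterId(jobFlowId).withStepId(stepId));
    String state = res.getStep().getStatus().getState();

    if ("COMPLETED".equals(state)) {
        break; // the step finished successfully
    }
    if ("FAILED".equals(state) || "CANCELLED".equals(state)) {
        throw new RuntimeException("Step " + stepId + " ended in state " + state);
    }

    TimeUnit.SECONDS.sleep(30); // wait before polling again
}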
I'm using twitter4j version 2.2.5. setPage() doesn't seem to work if I use it with setSince() and setUntil(). See the following code:
Twitter twitter = new TwitterFactory().getInstance();
twitter.setOAuthConsumer(CONSUMER_KEY, CONSUMER_SECRET);
AccessToken accessToken = new AccessToken(ACCESS_TOKEN, ACCESS_TOKEN_SECRET);
twitter.setOAuthAccessToken(accessToken);

int page = 1;
while (true) {
    Query query = new Query("from:someUser");
    query.setSince("2012-01-01");
    query.setUntil("2012-07-05");
    query.setRpp(100);
    query.setPage(page++);

    QueryResult qr = twitter.search(query);
    List<twitter4j.Tweet> qrTweets = qr.getTweets();
    if (qrTweets.size() == 0) break;

    for (twitter4j.Tweet t : qrTweets) {
        System.out.println(t.getId() + " " + t.getText());
    }
}
The code inside the loop is only executed once if I use the setSince() and setUntil() methods. But without them, setPage() seems to work and I get more tweet results.
Any ideas why this is happening?
Your code appears to be working for me. It only returns tweets from the past nine days, but that's expected (approximately) according to your comment.
You mentioned in the question that the loop only ran once with rpp=100, and you said in a comment that all 80 of the user's tweets were returned when rpp=40. I think that would indicate that the code is working as expected. If the user only has 80 tweets, the loop only should run once when rpp=100, and it should execute twice and display all their tweets when rpp=40. Try displaying the page number as soon as you enter the while loop so you can see how many times the loop runs.
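As an illustration, a minimal sketch of that logging change to the loop from the question might look like this (the currentPage counter is added for illustration; all other names come from the question itself):
int page = 1;
while (true) {
    int currentPage = page;
    System.out.println("Requesting page " + currentPage); // shows how many times the loop actually runs

    Query query = new Query("from:someUser");
    query.setSince("2012-01-01");
    query.setUntil("2012-07-05");
    query.setRpp(100);
    query.setPage(page++);

    QueryResult qr = twitter.search(query);
    List<twitter4j.Tweet> qrTweets = qr.getTweets();
    System.out.println("Page " + currentPage + " returned " + qrTweets.size() + " tweets");
    if (qrTweets.size() == 0) break;

    for (twitter4j.Tweet t : qrTweets) {
        System.out.println(t.getId() + " " + t.getText());
    }
}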