Split grouped, timestamped log data using the timestamp itself as the splitter? - java

I wanted to write code which could read a log file and split it into multiple events, using the timestamp as the splitter (since every log entry begins with a timestamp). A sample of the logs I want to split is given below. I'd also like to keep the timestamp itself.
So if this is my input:
01 Aug 2016 04:48:13,311 ERROR [pool-2-thread-12436] com.orders.queue.OrdersQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Exception while calling /purchases: x-company-status : UNKNOWN_ERROR response status: Internal Server Error
01 Aug 2016 04:48:13,311 WARN [pool-2-thread-12436] com.orders.queue.OrdersQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Failed to process order, will be re-tried: ADD2500051FR
01 Aug 2016 04:48:13,332 INFO [pool-2-thread-12436] com.delegate - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Getting Email from Primary email
01 Aug 2016 04:48:13,363 WARN [pool-2-thread-12436] com.queue.SQSQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Message processing failed QueueMessage [payload=ADD2500051FR, delaySeconds=0, sqsId=51f70e3f-554a-463b-8384-0b2c25a90450, stringAttributes={features=adac2911-0578-4bcd-b8c3-783481a48e1d, accept-language=FR_FR, request-id=836ac8b6-515d-4414-b4c6-ddd8a52ef497}]
com.orders.exception.orderserviceException: Error in Calling PUT purchase from main service
at com.OrderServiceDelegate.handleInternalServerErrors(OrderServiceDelegate.java:352)
at com.OrderServiceDelegate.sendOrderForProcessing_aroundBody0(OrderServiceDelegate.java:113)
at com.OrderServiceDelegate.sendOrderForProcessing_aroundBody1$advice(OrderServiceDelegate.java:37)
at com.OrderServiceDelegate.sendOrderForProcessing(OrderServiceDelegate.java:1)
at com.orders.queue.OrdersQueueWorker.doWork(OrdersQueueWorker.java:168)
at com.queue.SQSQueueWorker.lambda$0(SQSQueueWorker.java:149)
at com.queue.SQSQueueWorker.dt_access$492(SQSQueueWorker.java)
at com.queue.SQSQueueWorker$$dtt$$Lambda$8/852112146.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
01 Aug 2016 04:48:13,365 INFO [pool-2-thread-12436] com.queue.SQSQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Order will be re-tried after 300 seconds: ADD2500051FR
01 Aug 2016 04:48:15,600 INFO [myScheduler-3] com.queue.SQSQueueWorker - x-company-requestid=sqs-worker service_name=orders_v3 Processing messages message_number=1
01 Aug 2016 04:48:15,600 INFO [pool-2-thread-12436] com.queue.SQSQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Received msg from SQS:QueueMessage [payload=428CB476547214700268914651663, delaySeconds=0, sqsId=7f4dcbbe-90c4-4e56-b4ab-50332597b5d8, stringAttributes={features=FIS-JEM, accept-language=EN_US, request-id=a2c31da4-517f-40ec-8587-624f97393659}]
Then my output should be as follows (each //---- line marks where one entry ends and the next begins):
01 Aug 2016 04:48:13,311 ERROR [pool-2-thread-12436] com.orders.queue.OrdersQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Exception while calling /purchases: x-company-status : UNKNOWN_ERROR response status: Internal Server Error
//--------------------------------------------
01 Aug 2016 04:48:13,311 WARN [pool-2-thread-12436] com.orders.queue.OrdersQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Failed to process order, will be re-tried: ADD2500051FR
//--------------------------------------------
01 Aug 2016 04:48:13,332 INFO [pool-2-thread-12436] com.delegate - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Getting Email from Primary email
//--------------------------------------------
01 Aug 2016 04:48:13,363 WARN [pool-2-thread-12436] com.queue.SQSQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Message processing failed QueueMessage [payload=ADD2500051FR, delaySeconds=0, sqsId=51f70e3f-554a-463b-8384-0b2c25a90450, stringAttributes={features=adac2911-0578-4bcd-b8c3-783481a48e1d, accept-language=FR_FR, request-id=836ac8b6-515d-4414-b4c6-ddd8a52ef497}]
com.orders.exception.orderserviceException: Error in Calling PUT purchase from main service
at com.OrderServiceDelegate.handleInternalServerErrors(OrderServiceDelegate.java:352)
at com.OrderServiceDelegate.sendOrderForProcessing_aroundBody0(OrderServiceDelegate.java:113)
at com.OrderServiceDelegate.sendOrderForProcessing_aroundBody1$advice(OrderServiceDelegate.java:37)
at com.OrderServiceDelegate.sendOrderForProcessing(OrderServiceDelegate.java:1)
at com.orders.queue.OrdersQueueWorker.doWork(OrdersQueueWorker.java:168)
at com.queue.SQSQueueWorker.lambda$0(SQSQueueWorker.java:149)
at com.queue.SQSQueueWorker.dt_access$492(SQSQueueWorker.java)
at com.queue.SQSQueueWorker$$dtt$$Lambda$8/852112146.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
//--------------------------------------------
01 Aug 2016 04:48:13,365 INFO [pool-2-thread-12436] com.queue.SQSQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Order will be re-tried after 300 seconds: ADD2500051FR
//--------------------------------------------
01 Aug 2016 04:48:15,600 INFO [myScheduler-3] com.queue.SQSQueueWorker - x-company-requestid=sqs-worker service_name=orders_v3 Processing messages message_number=1
//--------------------------------------------
01 Aug 2016 04:48:15,600 INFO [pool-2-thread-12436] com.queue.SQSQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Received msg from SQS:QueueMessage [payload=428CB476547214700268914651663, delaySeconds=0, sqsId=7f4dcbbe-90c4-4e56-b4ab-50332597b5d8, stringAttributes={features=FIS-JEM, accept-language=EN_US, request-id=a2c31da4-517f-40ec-8587-624f97393659}]
I imagine some sort of regex would be needed, but I have no experience using regex to do such a split in Java.
I also found the following related question, but I didn't understand the solution suggested there:
java regex: capture multiline sequence between tokens

The following code uses the log you supplied and produces your desired output, except that it marks the boundary between entries with a blank line instead of a //---- line. I've included comments in the code that explain the program flow.
Code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Pattern;
public class Driver {
    private static final String LINE_BREAK = System.lineSeparator();
    // path to the log file
    private static final String LOG_FILE = "xin/xin.log";
    // The regex below matches lines that begin with "00 Mth 0000 00:00:00,000".
    private static final Pattern PATTERN = Pattern.compile("^\\d{2}\\s[A-Z][a-z]{2}\\s\\d{4}\\s\\d{2}:\\d{2}:\\d{2},\\d{3}");
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        boolean firstRun = true;
        // try-with-resources closes the reader even if reading fails; a missing
        // file (FileNotFoundException) is caught below as an IOException
        try (BufferedReader reader = new BufferedReader(new FileReader(LOG_FILE))) {
            String line;
            // loop until all lines are read from the log file
            while ((line = reader.readLine()) != null) {
                if (firstRun) {
                    // we don't want a line break before the first line
                    firstRun = false;
                } else {
                    // end the previous line before appending the next one
                    sb.append(LINE_BREAK);
                    // if the line starts with a timestamp it begins a new entry,
                    // so append another line break to leave a blank line
                    if (PATTERN.matcher(line).find()) {
                        sb.append(LINE_BREAK);
                    }
                }
                // add the line to the string builder
                sb.append(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }
        // print the string builder contents to standard out
        System.out.println(sb);
    }
}
Output:
01 Aug 2016 04:48:13,311 ERROR [pool-2-thread-12436] com.orders.queue.OrdersQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Exception while calling /purchases: x-company-status : UNKNOWN_ERROR response status: Internal Server Error

01 Aug 2016 04:48:13,311 WARN [pool-2-thread-12436] com.orders.queue.OrdersQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Failed to process order, will be re-tried: ADD2500051FR

01 Aug 2016 04:48:13,332 INFO [pool-2-thread-12436] com.delegate - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Getting Email from Primary email

01 Aug 2016 04:48:13,363 WARN [pool-2-thread-12436] com.queue.SQSQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Message processing failed QueueMessage [payload=ADD2500051FR, delaySeconds=0, sqsId=51f70e3f-554a-463b-8384-0b2c25a90450, stringAttributes={features=adac2911-0578-4bcd-b8c3-783481a48e1d, accept-language=FR_FR, request-id=836ac8b6-515d-4414-b4c6-ddd8a52ef497}]
com.orders.exception.orderserviceException: Error in Calling PUT purchase from main service
at com.OrderServiceDelegate.handleInternalServerErrors(OrderServiceDelegate.java:352)
at com.OrderServiceDelegate.sendOrderForProcessing_aroundBody0(OrderServiceDelegate.java:113)
at com.OrderServiceDelegate.sendOrderForProcessing_aroundBody1$advice(OrderServiceDelegate.java:37)
at com.OrderServiceDelegate.sendOrderForProcessing(OrderServiceDelegate.java:1)
at com.orders.queue.OrdersQueueWorker.doWork(OrdersQueueWorker.java:168)
at com.queue.SQSQueueWorker.lambda$0(SQSQueueWorker.java:149)
at com.queue.SQSQueueWorker.dt_access$492(SQSQueueWorker.java)
at com.queue.SQSQueueWorker$$dtt$$Lambda$8/852112146.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

01 Aug 2016 04:48:13,365 INFO [pool-2-thread-12436] com.queue.SQSQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Order will be re-tried after 300 seconds: ADD2500051FR

01 Aug 2016 04:48:15,600 INFO [myScheduler-3] com.queue.SQSQueueWorker - x-company-requestid=sqs-worker service_name=orders_v3 Processing messages message_number=1

01 Aug 2016 04:48:15,600 INFO [pool-2-thread-12436] com.queue.SQSQueueWorker - x-company-requestid=836ac8b6-515d-4414-b4c6-ddd8a52ef497-sqs service_name=orders_v3 Received msg from SQS:QueueMessage [payload=428CB476547214700268914651663, delaySeconds=0, sqsId=7f4dcbbe-90c4-4e56-b4ab-50332597b5d8, stringAttributes={features=FIS-JEM, accept-language=EN_US, request-id=a2c31da4-517f-40ec-8587-624f97393659}]
If you need any additional help, then just leave a comment on this answer and I'll try my best to help.
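The approach above can be changed to use the timestamp itself as the splitter, which is closer to what the question asks. The split pattern must be a zero-width lookahead. It then matches the empty position just before each timestamp, so the timestamp is not consumed and stays at the start of its entry.

Below is a minimal sketch of that idea. It assumes two things:
- the whole log fits in memory as one string;
- every entry starts with a timestamp in the "01 Aug 2016 04:48:13,311" format.

The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LogSplitter {
    // (?m) lets ^ match at the start of every line; (?=...) is a lookahead, so the
    // split happens just before the timestamp without removing it.
    private static final Pattern ENTRY_START = Pattern.compile(
            "(?m)(?=^\\d{2} [A-Z][a-z]{2} \\d{4} \\d{2}:\\d{2}:\\d{2},\\d{3})");

    static List<String> splitEntries(String log) {
        List<String> entries = new ArrayList<>();
        for (String chunk : ENTRY_START.split(log)) {
            String entry = chunk.trim(); // drop the line break that ends each entry
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String log = "01 Aug 2016 04:48:13,311 ERROR first entry\n"
                + "01 Aug 2016 04:48:13,363 WARN second entry\n"
                + "com.orders.exception.orderserviceException: Error in Calling PUT purchase from main service\n"
                + "at com.OrderServiceDelegate.handleInternalServerErrors(OrderServiceDelegate.java:352)\n"
                + "01 Aug 2016 04:48:15,600 INFO third entry\n";
        for (String entry : splitEntries(log)) {
            System.out.println(entry);
            System.out.println("//--------------------------------------------");
        }
    }
}
```

A zero-width match at the very beginning of the input does not produce an empty leading element in `Pattern.split`. The `isEmpty()` check is only a safeguard. For a file too large to hold in memory, the line-by-line approach above is the safer choice.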

Related

jHipster: Search criteria "GreaterThan,LessThan" for date time

In my JHipster project, I want to search rows by a datetime range.
The relevant fragments of my code are:
product-inquiry-view.component.html
<div class="input-group date" id='datetimepicker'>
<input id="field_dateStart" #dateStart="ngbDatepicker" [(ngModel)]="criteriaByStartDate"/>
<input id="field_dateEnd" #dateEnd="ngbDatepicker" [(ngModel)]="criteriaByEndDate"/>
<button (click)="findByCriteria()">
</button>
</div>
Component class:
product-inquiry-view.component.ts
export class ProductInquiryViewComponent implements OnInit, OnDestroy {
criteriaByStartDate: string;
criteriaByEndDate: string;
..........
findByCriteria() {
criteriaByStartDate: Date;
criteriaByEndDate: Date;
let criteriaByStartDate = this.criteriaByStartDate;
let criteriaByEndDate = this.criteriaByEndDate
this.productInquiryViewService
.query({
'piDate.greaterThan': criteriaByStartDate,
'piDate.lessThan': criteriaByEndDate,
'piDocnum.contains': this.criteriaByDocnum
}).subscribe(
(res: HttpResponse<IProductInquiryView[]>) => {
this.productInquiryViews = res.body;
},
(res: HttpErrorResponse) => this.onError(res.message)
);
}
Controller:
ProductInquiryViewResource.java
@GetMapping("/product-inquiry-views")
@Timed
public ResponseEntity<List<ProductInquiryViewDTO>> getAllProductInquiryViews(ProductInquiryViewCriteria criteria) {
log.debug("REST request to get ProductInquiryViews by criteria: {}", criteria);
List<ProductInquiryViewDTO> entityList = productInquiryViewQueryService.findByCriteria(criteria);
return ResponseEntity.ok().body(entityList);
}
And I got this exception:
2019-04-15 12:48:45.910 WARN 24330 --- [ XNIO-2 task-1] .m.m.a.ExceptionHandlerExceptionResolver : Resolved exception caused by handler execution: org.springframework.validation.BindException:
org.springframework.validation.BeanPropertyBindingResult: 2 errors
Field error in object 'productInquiryViewCriteria' on field 'piDate.greaterThan': rejected value [Mon Apr 01 2019 00:00:00 GMT 0300]; codes
[typeMismatch.productInquiryViewCriteria.piDate.greaterThan,typeMismatch.piDate.greaterThan,typeMismatch.greaterThan,typeMismatch.java.time.ZonedDateTime,typeMismatch]; arguments
[org.springframework.context.support.DefaultMessageSourceResolvable: codes [productInquiryViewCriteria.piDate.greaterThan,piDate.greaterThan]; arguments [];
default message [piDate.greaterThan]]; default message [Failed to convert property value of type 'java.lang.String' to required type 'java.time.ZonedDateTime'
for property 'piDate.greaterThan'; nested exception is org.springframework.core.convert.ConversionFailedException: Failed to convert from type [java.lang.String]
to type [java.time.ZonedDateTime] for value 'Mon Apr 01 2019 00:00:00 GMT 0300'; nested exception is java.lang.IllegalArgumentException: Parse attempt failed for value [Mon Apr 01 2019 00:00:00 GMT 0300]]
I see that "Mon Apr 01 2019 00:00:00 GMT 0300" is not a correct string for parsing to ZonedDateTime, so I tried sending the criteria in this format:
"2019-04-01T00:15:30+03:00[Europe/Moscow]";
Same result.
Convert the dates with toISOString() before sending them:
this.productInquiryViewService
.query({
'piDate.greaterThan': criteriaByStartDate.toISOString(),
'piDate.lessThan': criteriaByEndDate.toISOString(),
...
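The sketch below shows why the converted value helps. It feeds the strings from the question to java.time's ZonedDateTime.parse, which expects ISO-8601 text such as the "2019-03-31T21:00:00.000Z" shape that toISOString() produces.

It assumes the server-side conversion behaves like ZonedDateTime.parse. JHipster's actual conversion may be configured differently.

The last case is an assumption about why the "+03:00" attempt also failed. In a URL query string, an unencoded "+" is decoded as a space. The log hints at this: the rejected value shows "GMT 0300" where Date.toString() normally prints "GMT+0300". toISOString() always emits UTC with a "Z" suffix, so no "+" is involved.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeParseException;

public class IsoDateCheck {
    // true if the value parses as an ISO-8601 zoned date-time
    static boolean parsesAsZonedDateTime(String value) {
        try {
            ZonedDateTime.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "2019-03-31T21:00:00.000Z",                 // shape produced by Date.toISOString()
            "Mon Apr 01 2019 00:00:00 GMT 0300",        // the rejected value from the log
            "2019-04-01T00:15:30+03:00[Europe/Moscow]", // offset and zone intact
            "2019-04-01T00:15:30 03:00[Europe/Moscow]"  // same value with '+' decoded to a space
        };
        for (String candidate : candidates) {
            System.out.println(parsesAsZonedDateTime(candidate) + " <- " + candidate);
        }
    }
}
```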

How can I decode Unicode and Base64 header fields in Java IMAP programming?

I am writing a demo to learn IMAP commands. When I fetch the header, the server returns the "Date" and "From" fields to my input stream. But when I check the result in the terminal, I find that some header fields are Unicode (like =?gb2312?B?zfjS19PDu6fW0NDE?=) or Base64 strings. How can I convert them to a standard, readable encoding?
This is my code:
import com.sun.xml.internal.ws.policy.privateutil.PolicyUtils;
import java.io.*;
import java.net.Socket;
/**
* Created by joelchen on 2016/12/7.
*/
public class APP {
public static Socket Client = null;
public static BufferedReader inFormServer = null;
public static DataOutputStream toServer = null;
public static void connect() throws IOException{
Client = new Socket(host,143);
inFormServer = new BufferedReader(new InputStreamReader(Client.getInputStream()));
toServer = new DataOutputStream(Client.getOutputStream());
//str = inFormServer.readLine();
//System.out.println(str);
if(Client != null && inFormServer != null && toServer != null){
toServer.writeBytes("a001 login user pass\n");
toServer.writeBytes("a002 select inbox\n");
//toServer.writeBytes("a003 SEARCH UNSEEN UNDELETED\n");
toServer.writeBytes("A654 FETCH 1:10 (FLAGS BODY[HEADER.FIELDS (DATE FROM)])\n");
toServer.writeBytes("a005 LOGOUT\n");
String answer;
while((answer = inFormServer.readLine()) != null){
System.out.println("Server :" + answer);
//*if(answer.indexOf("OK") != -1){
//break;
}
}
}
}
The result I got:
Server :* FLAGS (\Answered \Seen \Deleted \Draft \Flagged)
Server :* OK [PERMANENTFLAGS (\Answered \Seen \Deleted \Draft \Flagged)] Limited
Server :a002 OK [READ-WRITE] SELECT completed
Server :* 1 FETCH (FLAGS (\Seen) BODY[HEADER.FIELDS (DATE FROM)] {104}
Server :Date: Tue, 8 Nov 2016 21:10:27 +0800
Server :From: =?UTF-8?B?5paw5rWq5b6u5Y2a?= <message#service.weibo.com>
Server :
Server :)
Server :* 2 FETCH (FLAGS (\Seen) BODY[HEADER.FIELDS (DATE FROM)] {114}
Server :Date: Wed, 9 Nov 2016 22:53:38 +0800 (CST)
Server :From: =?gb2312?B?zfjS19PDu6fW0NDE?= <passport#service.netease.com>
Server :
Server :)
Server :* 3 FETCH (FLAGS (\Seen) BODY[HEADER.FIELDS (DATE FROM)] {104}
Server :Date: Fri, 11 Nov 2016 05:07:05 -0000
Server :From: "PlayStation" <Sony#email.sonyentertainmentnetwork.com>
Server :
Server :)
Server :* 4 FETCH (FLAGS (\Seen) BODY[HEADER.FIELDS (DATE FROM)] {104}
Server :Date: Fri, 11 Nov 2016 05:28:26 -0000
Server :From: "PlayStation" <Sony#email.sonyentertainmentnetwork.com>
Server :
Server :)
Server :* 5 FETCH (FLAGS () BODY[HEADER.FIELDS (DATE FROM)] {105}
Server :Date: Sat, 12 Nov 2016 02:44:12 +0800
Server :From: =?UTF-8?B?5paw5rWq5b6u5Y2a?= <message#service.weibo.com>
Server :
Server :)
Server :* 6 FETCH (FLAGS () BODY[HEADER.FIELDS (DATE FROM)] {105}
Server :Date: Sun, 13 Nov 2016 18:18:02 +0800
Server :From: =?UTF-8?B?5paw5rWq5b6u5Y2a?= <message#service.weibo.com>
Server :
Server :)
Server :* 7 FETCH (FLAGS () BODY[HEADER.FIELDS (DATE FROM)] {105}
Server :Date: Mon, 14 Nov 2016 12:55:08 +0800
Server :From: =?UTF-8?B?5paw5rWq5b6u5Y2a?= <message#service.weibo.com>
Server :
Server :)
Server :* 8 FETCH (FLAGS () BODY[HEADER.FIELDS (DATE FROM)] {105}
Server :Date: Tue, 15 Nov 2016 16:35:52 +0800
Server :From: =?UTF-8?B?5paw5rWq5b6u5Y2a?= <message#service.weibo.com>
Server :
Server :)
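Strings like =?UTF-8?B?5paw5rWq5b6u5Y2a?= are RFC 2047 "encoded-words" of the form =?charset?encoding?text?=. Encoding B means Base64 and Q means a quoted-printable variant. If JavaMail is available, javax.mail.internet.MimeUtility.decodeText(String) decodes them.

The following is a minimal, dependency-free sketch that handles only the B encoding. It does not handle Q encoding. It also does not apply the RFC 2047 rule that whitespace between adjacent encoded-words is dropped. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedWordDecoder {
    // RFC 2047 encoded-word with Base64 ("B") encoding: =?charset?B?base64-text?=
    private static final Pattern B_ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?[Bb]\\?([^?]*)\\?=");

    static String decode(String header) {
        Matcher m = B_ENCODED_WORD.matcher(header);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Charset charset = Charset.forName(m.group(1)); // e.g. UTF-8 or gb2312
            byte[] bytes = Base64.getDecoder().decode(m.group(2));
            // quoteReplacement stops '$' or '\' in the decoded text being treated as group references
            m.appendReplacement(out, Matcher.quoteReplacement(new String(bytes, charset)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("From: =?UTF-8?B?5paw5rWq5b6u5Y2a?="));
        System.out.println(decode("From: =?gb2312?B?zfjS19PDu6fW0NDE?="));
    }
}
```

The UTF-8 header decodes to 新浪微博 (Sina Weibo). Each decoded line from the server can be passed through decode() before printing.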

Crawler4j authentication not working

I'm trying to use FormAuthInfo authentication from crawler4j to crawl a specific LinkedIn page. This page is only rendered when I am correctly logged in.
This is my Controller with the access URLs:
public class Controller {
public static void main(String[] args) throws Exception {
String crawlStorageFolder = "/data/";
int numberOfCrawlers = 1;
CrawlConfig config = new CrawlConfig();
config.setCrawlStorageFolder(crawlStorageFolder);
PageFetcher pageFetcher = new PageFetcher(config);
RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
String formUsername = "session_key";
String formPassword = "session_password";
String session_user = "email#email.com";
String session_password = "myPasswordHere";
String urlLogin = "https://www.linkedin.com/uas/login";
AuthInfo formAuthInfo = new FormAuthInfo(session_password, session_user, urlLogin, formUsername, formPassword);
config.addAuthInfo(formAuthInfo);
config.setMaxDepthOfCrawling(0);
controller.addSeed("https://www.linkedin.com/vsearch/f?keywords=java");
controller.start(Crawler.class, numberOfCrawlers);
controller.shutdown();
}
}
And this is my Crawler class:
public class Crawler extends WebCrawler {
private final static Pattern FILTERS = Pattern.compile(".*(\\.(css|js|gif|jpg" + "|png|mp3|mp3|zip|gz))$");
@Override
public boolean shouldVisit(Page referringPage, WebURL url) {
String href = url.getURL().toLowerCase();
return !FILTERS.matcher(href).matches() && href.startsWith("https://www.linkedin.com");
}
@Override
public void visit(Page page) {
String url = page.getWebURL().getURL();
System.out.println("URL: " + url);
if (page.getParseData() instanceof HtmlParseData) {
HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
String text = htmlParseData.getText();
String html = htmlParseData.getHtml();
System.out.println(html);
Set<WebURL> links = htmlParseData.getOutgoingUrls();
System.out.println("Text length: " + text.length());
System.out.println("Html length: " + html.length());
System.out.println("Number of outgoing links: " + links.size());
}
}
}
When I run this app with form authentication, I get these errors:
WARNING: Cookie rejected [JSESSIONID="ajax:3637761943332982524", version:1, domain:.www.linkedin.com, path:/, expiry:null] Illegal domain attribute ".www.linkedin.com". Domain of origin: "www.linkedin.com"
Jun 22, 2016 10:59:14 AM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Cookie rejected [lang="v=2&lang=en-us", version:1, domain:linkedin.com, path:/, expiry:null] Domain attribute "linkedin.com" violates RFC 2109: domain must start with a dot
Jun 22, 2016 10:59:14 AM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Invalid cookie header: "Set-Cookie: lidc="b=TGST09:g=87:u=1:i=1466603959:t=1466690359:s=AQEc3R_6kIhooZN1RsDNkO2DaYEqzUWp"; Expires=Thu, 23 Jun 2016 13:59:19 GMT; domain=.linkedin.com; Path=/". Invalid 'expires' attribute: Thu, 23 Jun 2016 13:59:19 GMT
Jun 22, 2016 10:59:14 AM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Cookie rejected [JSESSIONID="ajax:4912042947175739413", version:1, domain:.www.linkedin.com, path:/, expiry:null] Illegal domain attribute ".www.linkedin.com". Domain of origin: "www.linkedin.com"
Jun 22, 2016 10:59:14 AM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Cookie rejected [lang="v=2&lang=en-us", version:1, domain:linkedin.com, path:/, expiry:null] Domain attribute "linkedin.com" violates RFC 2109: domain must start with a dot
Jun 22, 2016 10:59:14 AM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Invalid cookie header: "Set-Cookie: lidc="b=TGST09:g=87:u=1:i=1466603960:t=1466690360:s=AQE100NLG_uPIcJSJ7GLtRVkH7j_Ylu9"; Expires=Thu, 23 Jun 2016 13:59:20 GMT; domain=.linkedin.com; Path=/". Invalid 'expires' attribute: Thu, 23 Jun 2016 13:59:20 GMT
Jun 22, 2016 10:59:14 AM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Invalid cookie header: "Set-Cookie: lidc="b=TGST09:g=87:u=1:i=1466603960:t=1466690360:s=AQE100NLG_uPIcJSJ7GLtRVkH7j_Ylu9"; Expires=Thu, 23 Jun 2016 13:59:20 GMT; domain=.linkedin.com; Path=/". Invalid 'expires' attribute: Thu, 23 Jun 2016 13:59:20 GMT
Is this related to the way my HTTP client deals with the cookies returned by LinkedIn?
Any suggestions?
Thanks!
First of all, this is not a crawler4j problem. It is a LinkedIn problem, which, according to the latest Google results, they have not fixed for a long time.
However, your approach will not work anyway, because crawler4j respects crawler ethics, that is, it obeys robots.txt.
If you look at LinkedIn's robots.txt, you will see that the crawler is not allowed to crawl anything.
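The robots.txt point can be illustrated with a deliberately simplified checker. It is hypothetical and is not crawler4j's implementation. It collects the Disallow prefixes from the User-agent: * group and refuses any path that starts with one of them. With Disallow: / every path is refused, which is why a crawler that honors robots.txt fetches nothing.

The sample file below is illustrative and is not LinkedIn's actual robots.txt. Real parsers also handle Allow lines, wildcards, and agent-specific groups.

```java
import java.util.ArrayList;
import java.util.List;

public class RobotsCheck {
    // false if the path starts with a Disallow prefix listed under "User-agent: *"
    // (simplified: Allow lines, wildcards and agent-specific groups are ignored)
    static boolean isAllowed(String robotsTxt, String path) {
        List<String> disallowed = new ArrayList<>();
        boolean inStarGroup = false;
        boolean previousWasAgent = false;
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            String line = rawLine.replaceFirst("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // consecutive User-agent lines share one group of rules
                inStarGroup = (previousWasAgent && inStarGroup) || value.equals("*");
                previousWasAgent = true;
            } else {
                previousWasAgent = false;
                // an empty Disallow value allows everything, so it adds no prefix
                if (inStarGroup && field.equals("disallow") && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // illustrative file: one named bot gets partial access, everyone else is blocked
        String robots = "User-agent: ExampleBot\n"
                + "Disallow: /private\n"
                + "\n"
                + "User-agent: *\n"
                + "Disallow: /\n";
        System.out.println(isAllowed(robots, "/vsearch/f?keywords=java"));
    }
}
```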

How to fix "com.hazelcast.core.OperationTimeoutException: No response for 120000 ms" exceptions in hazelcast?

I wrote a Hazelcast application that reads 100M records from a file and writes them into a MultiMap. I have 10 members, all running locally. After about 70 million nodes, I see this in my application log:
ERROR 2014-02-19 03:03:56,168 [pool-1-thread-1] HazelcastMaster:190 - Could not finish calculations No response for 120000 ms. Aborting invocation! InvocationFuture{invocation=InvocationImpl{ serviceName='hz:impl:multiMapService', op=com.hazelcast.multimap.operations.PutOperation#d8b737, partitionId=179, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[144.76.62.99]:5704}, done=false}
com.hazelcast.core.OperationTimeoutException: No response for 120000 ms. Aborting invocation! InvocationFuture{invocation=InvocationImpl{ serviceName='hz:impl:multiMapService', op=com.hazelcast.multimap.operations.PutOperation#d8b737, partitionId=179, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[144.76.62.99]:5704}, done=false}
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.waitForResponse(InvocationImpl.java:382)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:294)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:286)
at com.hazelcast.multimap.MultiMapProxySupport.invoke(MultiMapProxySupport.java:276)
at com.hazelcast.multimap.MultiMapProxySupport.putInternal(MultiMapProxySupport.java:52)
at com.hazelcast.multimap.ObjectMultiMapProxy.put(ObjectMultiMapProxy.java:81)
at de.komoot.spock.executor.HazelcastMultiMapStorage.add(HazelcastMultiMapStorage.java:31)
at de.komoot.spock.impl.OsmNodesRepositoryImpl.add(OsmNodesRepositoryImpl.java:22)
at de.komoot.spock.executor.HazelcastMaster$1.process(HazelcastMaster.java:96)
at org.openstreetmap.osmosis.pbf2.v0_6.impl.PbfDecoder.sendResultsToSink(PbfDecoder.java:106)
at org.openstreetmap.osmosis.pbf2.v0_6.impl.PbfDecoder.processBlobs(PbfDecoder.java:163)
at org.openstreetmap.osmosis.pbf2.v0_6.impl.PbfDecoder.run(PbfDecoder.java:175)
at de.komoot.spock.impl.PbfUrlReader.run(PbfUrlReader.java:63)
at de.komoot.spock.executor.HazelcastMaster.call(HazelcastMaster.java:146)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
And this was sent to STDERR:
INFO: [144.76.62.99]:5710 [dev] memory.used=4.3G, memory.free=1.2G, memory.total=5.5G, memory.max=7.0G, memory.used/total=78.83%, memory.used/max=62.20%, load.process=0.00%, load.system=77.00%, load.systemAverage=1144.00%, thread.count=59, thread.peakCount=87, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.operation.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operation.size=0, executor.q.response.size=0, operations.remote.size=0, operations.running.size=0, proxy.count=2, clientEndpoint.count=0, connection.active.count=9, connection.count=9
Feb 19, 2014 3:03:46 AM com.hazelcast.spi.Invocation
WARNING: [144.76.62.99]:5701 [dev] No response for 120000 ms. InvocationFuture{invocation=InvocationImpl{ serviceName='hz:impl:multiMapService', op=com.hazelcast.multimap.operations.PutOperation#d8b737, partitionId=179, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[144.76.62.99]:5704}, done=false}
Feb 19, 2014 3:03:47 AM com.hazelcast.spi.Invocation
WARNING: [144.76.62.99]:5701 [dev] Asking if operation execution has been started: InvocationImpl{ serviceName='hz:impl:multiMapService', op=com.hazelcast.multimap.operations.PutOperation#d8b737, partitionId=179, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[144.76.62.99]:5704}
Feb 19, 2014 3:03:52 AM com.hazelcast.spi.Invocation
WARNING: [144.76.62.99]:5701 [dev] While asking 'is-executing': InvocationImpl{ serviceName='hz:impl:multiMapService', op=com.hazelcast.multimap.operations.PutOperation#d8b737, partitionId=179, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[144.76.62.99]:5704}
java.util.concurrent.TimeoutException
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.resolveResponse(InvocationImpl.java:432)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:294)
at com.hazelcast.spi.impl.InvocationImpl.isOperationExecuting(InvocationImpl.java:476)
at com.hazelcast.spi.impl.InvocationImpl.access$1300(InvocationImpl.java:36)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.waitForResponse(InvocationImpl.java:376)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:294)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:286)
at com.hazelcast.multimap.MultiMapProxySupport.invoke(MultiMapProxySupport.java:276)
at com.hazelcast.multimap.MultiMapProxySupport.putInternal(MultiMapProxySupport.java:52)
at com.hazelcast.multimap.ObjectMultiMapProxy.put(ObjectMultiMapProxy.java:81)
(I redirected the output of all instances into one file).
My question now is: how can I start finding the cause? It worked with a very simple dataset on my dev machine, but now that I've made a first, bigger attempt, I'm lost. What can I do to find the cause of the timeout?
EDIT: Here is an example of what I do:
MultiMap<String, Node> data = instance.getMultiMap("MyMap");
RunnableSource reader = new PbfUrlReader(input.openStream(), 4);
Sink master = new Sink() {
#Override
public void process(EntityContainer entityContainer) {
Node node = ... //something made of entityContainer
String key = ... //key generated from entityContainer
data.put(key, node);
}
};
reader.setSink(master);
reader.run();
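One way to start narrowing the cause down is to measure how put latency and throughput change as the load grows. This shows whether the cluster slows down gradually or stalls suddenly.

Below is a minimal sketch that does not depend on Hazelcast. A hypothetical TimedPutter wraps any put function, such as data::put for the MultiMap above; a put that returns a value still fits a BiConsumer. It counts calls slower than a threshold and prints throughput every N puts. The threshold and reporting interval are arbitrary assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class TimedPutter<K, V> {
    private final BiConsumer<K, V> putFunction; // e.g. data::put (assumption)
    private final long slowThresholdNanos;
    private final int reportEvery;              // must be positive
    private long count;
    private long slowCount;
    private long windowStartNanos = System.nanoTime();

    public TimedPutter(BiConsumer<K, V> putFunction, long slowThresholdMillis, int reportEvery) {
        this.putFunction = putFunction;
        this.slowThresholdNanos = slowThresholdMillis * 1_000_000L;
        this.reportEvery = reportEvery;
    }

    public void put(K key, V value) {
        long start = System.nanoTime();
        putFunction.accept(key, value);
        long elapsed = System.nanoTime() - start;
        count++;
        if (elapsed > slowThresholdNanos) {
            slowCount++;
        }
        if (count % reportEvery == 0) {
            long now = System.nanoTime();
            double seconds = (now - windowStartNanos) / 1e9;
            // throughput of the last window plus running totals
            System.out.printf("%d puts total, %.0f puts/s in last %d, %d slower than threshold%n",
                    count, reportEvery / seconds, reportEvery, slowCount);
            windowStartNanos = now;
        }
    }

    public long count() { return count; }

    public long slowCount() { return slowCount; }

    public static void main(String[] args) {
        // stand-in for the MultiMap so the sketch runs on its own
        Map<String, Integer> map = new HashMap<>();
        TimedPutter<String, Integer> putter = new TimedPutter<>(map::put, 50, 100_000);
        for (int i = 0; i < 300_000; i++) {
            putter.put("key" + i, i);
        }
    }
}
```

In the Sink above, data.put(key, node) would become putter.put(key, node), with the putter created once from data::put.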

MongoDb Mass Saving "isOK() checkWriteError" Exception

I'm trying to write a parser, and I'm using MongoDB as the database. Essentially, it goes through the input, creates objects, and then saves them, several times a second. After around 164 objects are saved, it crashes with this error:
com.mongodb.MongoException: isOk()
at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:130)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:142)
at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:141)
at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:97)
at com.mongodb.DBCollection.insert(DBCollection.java:61)
at com.mongodb.DBCollection.save(DBCollection.java:547)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:638)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:685)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:679)
at com.soleo.internal.releasenotes.orm.Storage.save(Storage.java:764)
at com.soleo.internal.releasenotes.page.MainPage$2.onSubmit(MainPage.java:256)
At one point I had over 1000 objects in this same database; I just didn't insert them all at once, so it can't be a disk space issue. I can't find any documentation of this error online. Oddly, it only happens when I save THIS object: if I save Object B after the crash, it saves just fine. It only crashes on Object A, the one I mass-saved.
I ran the test multiple times, and it failed in the same place each time. I used random values to prove it wasn't caused by specific field values:
FIRST TRY:
==============================
CREATING RELEASE #162
Component: iHateYou
Location: 250344
Version: 8.8.1.5-2
Date: Sun Feb 07 00:00:00 EST 3188 (02-07-3188)
SAVING.............
SUCCESS.
==============================
CREATING RELEASE #163
Component: iHateYou
Location: 227407
Version: 5.5.7.6-7
Date: Sat Mar 04 00:00:00 EST 439 (03-04-439)
SAVING.............
SUCCESS.
==============================
CREATING RELEASE #164
Component: iHateYou
Location: 38694
Version: 3.5.4.7-7
Date: Mon Jan 03 00:00:00 EST 158 (01-03-158)
SAVING.............
Oct 28, 2011 11:17:11 AM org.apache.wicket.RequestCycle logRuntimeException
SEVERE: isOk()
com.mongodb.MongoException: isOk()
at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:130)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:142)
SECOND TRY:
==============================
CREATING RELEASE #162
Component: iHateYou
Location: 64717
Version: 0.1.0.4-8
Date: Sun May 07 00:00:00 EST 971 (05-07-971)
SAVING.............
SUCCESS.
==============================
CREATING RELEASE #163
Component: iHateYou
Location: 19360
Version: 4.5.8.1-3
Date: Wed Aug 04 00:00:00 EST 1339 (08-04-1339)
SAVING.............
SUCCESS.
==============================
CREATING RELEASE #164
Component: iHateYou
Location: 115518
Version: 0.0.8.0-2
Date: Sat Apr 07 00:00:00 EST 143 (04-07-143)
SAVING.............
Oct 28, 2011 11:15:28 AM org.apache.wicket.RequestCycle logRuntimeException
SEVERE: isOk()
com.mongodb.MongoException: isOk()
at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:130)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:142)
Here's some partially obfuscated code:
Random blank = new Random();
ObjectRef blah = new ObjectRef("iHateYou");
storage.save(blah);
for(int i = 0; i < 300; i++)
{
System.out.println("==============================\nCREATING OBJECT #" + i);
ObjectA saveMe = new ObjectA();
saveMe.setRef(storage.getRefByName("iHateYou"));
System.out.println("Component: " + saveMe.getRef().getName());
saveMe.setLocation(blank.nextInt(300000) + "");
System.out.println("Location: " + saveMe.getLocation());
saveMe.setVersion(new Version(blank.nextInt(9) + "." + blank.nextInt(9) + "." + blank.nextInt(9) + "." + blank.nextInt(9) + "-" + blank.nextInt(9)));
System.out.println("Version: " + saveMe.getVersion());
try
{
String randomDate = "0" + blank.nextInt(9) + "-0" + blank.nextInt(9) + "-" + blank.nextInt(4000);
saveMe.setReleaseDate(new SimpleDateFormat("MM-dd-yyyy").parse(randomDate));
System.out.println("Date: " + saveMe.getReleaseDate() + " (" + randomDate + ") ");
}
catch (ParseException e)
{
e.printStackTrace();
}
System.out.println("SAVING.............");
storage.save(saveMe);
System.out.println("SUCCESS.");
}
Sounds like it could be a variation of this bug: https://jira.mongodb.org/browse/RUBY-324
Are you using sharding? In that case your config DB might be corrupt. The driver is probably not expecting to receive the message "isOk" back.
Please tell us more about your environment: MongoDB version, using sharding or not, driver version, etc.
