log4j: Standard way to prevent repetitive log messages? - java

Our production application logs an error when it fails to establish a TCP/IP connection. Since it is constantly retrying the connection, it logs the same error message over and over. And similarly, other running components in the application can get into an error loop if some realtime resource is unavailable for a period of time.
Is there any standard approach to controlling the number of times the same error gets logged? (We are using log4j, so if there is any extension for log4j to handle this, it would be perfect.)

I just created a Java class that solves this exact problem using log4j. When I want to log a message, I just do something like this:
LogConsolidated.log(logger, Level.WARN, 5000, "File: " + f + " not found.", e);
Instead of:
logger.warn("File: " + f + " not found.", e);
Which makes it log a maximum of once every 5 seconds, and prints how many times it would otherwise have logged (e.g. |x53|). Obviously, you can reduce the number of parameters, or pull the level out by offering log.warn-style methods, but this works for my use case.
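import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
public class LogConsolidated {
// Concurrent map, because log() may be called from several threads at once.
private static final Map<String, TimeAndCount> lastLoggedTime = new ConcurrentHashMap<>();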
/**
* Logs given <code>message</code> to given <code>logger</code> as long as:
* <ul>
* <li>A message (from same class and line number) has not already been logged within the past <code>timeBetweenLogs</code>.</li>
* <li>The given <code>level</code> is active for given <code>logger</code>.</li>
* </ul>
* Note: If messages are skipped, they are counted. When <code>timeBetweenLogs</code> has passed, and a repeat message is logged,
* the count will be displayed.
* @param logger Where to log.
* @param level Level to log.
* @param timeBetweenLogs Milliseconds to wait between similar log messages.
* @param message The actual message to log.
* @param t Can be null. Will log stack trace if not null.
*/
public static void log(Logger logger, Level level, long timeBetweenLogs, String message, Throwable t) {
if (logger.isEnabledFor(level)) {
String uniqueIdentifier = getFileAndLine();
TimeAndCount lastTimeAndCount = lastLoggedTime.get(uniqueIdentifier);
if (lastTimeAndCount != null) {
synchronized (lastTimeAndCount) {
long now = System.currentTimeMillis();
if (now - lastTimeAndCount.time < timeBetweenLogs) {
lastTimeAndCount.count++;
return;
} else {
log(logger, level, "|x" + lastTimeAndCount.count + "| " + message, t);
}
}
} else {
log(logger, level, message, t);
}
lastLoggedTime.put(uniqueIdentifier, new TimeAndCount());
}
}
private static String getFileAndLine() {
StackTraceElement[] stackTrace = Thread.currentThread().getStackTrace();
boolean enteredLogConsolidated = false;
for (StackTraceElement ste : stackTrace) {
if (ste.getClassName().equals(LogConsolidated.class.getName())) {
enteredLogConsolidated = true;
} else if (enteredLogConsolidated) {
// This is the first frame outside LogConsolidated, i.e. the caller's file/line.
return ste.getFileName() + ":" + ste.getLineNumber();
}
}
return "?";
}
private static void log(Logger logger, Level level, String message, Throwable t) {
if (t == null) {
logger.log(level, message);
} else {
logger.log(level, message, t);
}
}
private static class TimeAndCount {
long time;
int count;
TimeAndCount() {
this.time = System.currentTimeMillis();
this.count = 0;
}
}
}

It would be fairly simple to control this by recording a timestamp each time you log the error, and then only logging it next time if a certain period has elapsed.
Ideally this would be a feature within log4j, but coding it within your app isn't too bad, and you could encapsulate it within a helper class to avoid boilerplate throughout your code.
Clearly, each repetitive log statement would need some kind of unique ID so that you could merge statements from the same source.
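The timestamp idea described above can be sketched as a small helper. This is only a minimal sketch: the class name LogThrottle, its method shouldLog and the explicit string ID are hypothetical names chosen for illustration, not part of log4j.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper: decides whether a message with a given ID may be logged now. */
final class LogThrottle {
    private final long minIntervalMillis;
    // Last time (in milliseconds) each ID was allowed through.
    private final ConcurrentMap<String, Long> lastLogged = new ConcurrentHashMap<>();

    LogThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true if no message with this ID was allowed within the last interval. */
    boolean shouldLog(String id) {
        return shouldLog(id, System.currentTimeMillis());
    }

    /** Same check with an explicit clock value, which keeps the logic easy to test. */
    boolean shouldLog(String id, long nowMillis) {
        boolean[] allowed = {false};
        // compute() runs atomically per key, so two threads cannot both pass the check.
        lastLogged.compute(id, (key, previous) -> {
            if (previous == null || nowMillis - previous >= minIntervalMillis) {
                allowed[0] = true;
                return nowMillis; // record this emission
            }
            return previous; // still inside the quiet period
        });
        return allowed[0];
    }
}
```

A call site would then guard the log statement, for example: if (throttle.shouldLog("tcp-connect")) logger.error("Connection failed", e);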

Related

How expensive would this function be to call with every log message in a Java/Kotlin application?

The code is
private fun inferTag(): String =
Throwable().stackTrace
.getOrElse(CALL_STACK_INDEX) {
throw IllegalStateException("Synthetic stacktrace didn't have enough elements")
}
.let {
val className =
ANONYMOUS_CLASS_PATTERN.matcher(it.className)
.takeIf(Matcher::find)
?.replaceAll("")
?: it.className
"[${Thread.currentThread().name}] " +
"${className.substringAfterLast(".")}#${it.methodName}:${it.lineNumber}"
}
The idea is to capture the caller's origin with every log call, so that we do not need to tag our logs with the calling class. How expensive would this be with, say, 1000 log calls?

Logging with optional parameters

I have method where I want to add specific logging:
@Slf4j
@Service
public class SomethingService {
public void doSomething(Something data, String comment, Integer limit) {
Long id = saveSomethingToDatabase(data, comment);
boolean sentNotification = doSomething(id);
// ...
// Log what was done.
// Variables that always have important data: data.getName(), id
// Variables that are optional: sentNotification, comment, limit
// (optional means they aren't mandatory, rarely contains essential data, often null, false or empty string).
}
}
I can simply log all:
log.info("Done something '{}' and saved (id {}, sentNotification={}) with comment '{}' and limit {}",
data.getName(), id, sentNotification, comment, limit);
// Done something 'Name of data' and saved (id 23, sentNotification=true) with comment 'Comment about something' and limit 2
But most of the time most of the parameters are irrelevant. With the above I get logs like:
// Done something 'Name of data' and saved (id 23, sentNotification=false) with comment 'null' and limit null
That makes logs hard to read, long and unnecessarily complicated (in most cases other parameters aren't present).
I want to handle all cases while keeping only the essential data. Examples:
// Done something 'Name of data' and saved (id 23)
// Done something 'Name of data' and saved (id 23) with comment 'Comment about something'
// Done something 'Name of data' and saved (id 23) with limit 2
// Done something 'Name of data' and saved (id 23) with comment 'Comment about something' and limit 2
// Done something 'Name of data' and saved (id 23, sent notification)
// Done something 'Name of data' and saved (id 23, sent notification) with limit 2
// Done something 'Name of data' and saved (id 23, sent notification) with comment 'Comment about something'
// Done something 'Name of data' and saved (id 23, sent notification) with comment 'Comment about something' and limit 2
I can code it by hand:
String notificationMessage = sentNotification ? ", sent notification" : "";
String commentMessage = comment != null ? String.format(" with comment '%s'", comment) : "";
String limitMessage = "";
if (limit != null) {
limitMessage = String.format("limit %s", limit);
limitMessage = comment != null ? String.format(" and %s", limitMessage) : String.format(" with %s", limitMessage);
}
log.info("Done something '{}' and saved (id {}{}){}{}",
data.getName(), id, notificationMessage, commentMessage, limitMessage);
But it's hard to write, hard to read, complicated and error-prone.
I would like something that lets me specify optional parts of a log message.
Example pseudocode:
log.info("Done something '{}' and saved (id {} $notification) $parameters",
something.getName(), id,
$notification: sentNotification ? "sent notification" : "",
$parameters: [comment, limit]);
It should support optional parameters, replace a boolean/condition with a given string, and handle the separating spaces, commas and words such as "with" and "and".
Is there an existing library for this? Or is there at least a simpler way to code it?
If not, I have no choice but to write my own library for building log messages. As a bonus, such a library would keep all logs consistent.
If you don't see a problem with three optional parameters, just imagine there are more (and you can't always pack them into a class; another class layer only for parameter logging causes even more complications).
Finally, I know I can log each action separately. But then I get many more logs and I won't have the most important information in one place. Other logs are at the debug level, not info.
Both of these are possible. You can either:
register a component with the Logger to do the work for you
write a wrapper class for your logger to use
I will demonstrate both and explain why I think the second is the better choice. Let's start with that:
Instead of having the Logger own the knowledge of how to format your specific properties, let your code own this responsibility.
For example, rather than logging each parameter, collect them and define their logging separately. See this code:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class LoggingExample {
private static final Logger LOGGER = LoggerFactory.getLogger(LoggingExample.class);
public static void main(String[] args) {
LogObject o = new LogObject();
LOGGER.info("{}", o);
o.first = "hello";
LOGGER.info("{}", o);
o.second = "World";
LOGGER.info("{}", o);
o.last = "And finally";
LOGGER.info("{}", o);
}
public static class LogObject {
String first;
String second;
String last;
@Override
public String toString() {
StringBuilder buffer = new StringBuilder();
buffer.append("Log Object: ");
if (first != null) {
buffer.append("First: " + first + " ");
}
if (second != null) {
buffer.append("Second: " + second + " ");
}
if (last != null) {
buffer.append("Last: " + last + " ");
}
return buffer.toString();
}
}
}
We define LogObject as a container that implements toString(). Loggers call toString() on their arguments when formatting a message, which is how they decide what to print (unless a special formatter is applied).
With this, the log statements print:
11:04:12.465 [main] INFO LoggingExample - Log Object:
11:04:12.467 [main] INFO LoggingExample - Log Object: First: hello
11:04:12.467 [main] INFO LoggingExample - Log Object: First: hello Second: World
11:04:12.467 [main] INFO LoggingExample - Log Object: First: hello Second: World Last: And finally
Advantages:
this works with any Logger. You won't have to implement specifics depending on what you want to use
the knowledge is encapsulated in 1 object that can be easily tested. This should mitigate the error prone formatting problem you stated.
no need for a complex formatter library or implementation
It will make the logging look much nicer and compact in the end. log.info("{}", object);
Disadvantage:
You are required to write the Bean.
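Applied to the fields from the question, such a container might look like the sketch below. The class name DoneSomethingLog and its constructor are hypothetical; only the message wording is taken from the examples in the question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical container for the question's fields; the text is built only when toString() is called. */
final class DoneSomethingLog {
    private final String name;
    private final long id;
    private final boolean sentNotification;
    private final String comment; // optional, may be null or empty
    private final Integer limit;  // optional, may be null

    DoneSomethingLog(String name, long id, boolean sentNotification, String comment, Integer limit) {
        this.name = name;
        this.id = id;
        this.sentNotification = sentNotification;
        this.comment = comment;
        this.limit = limit;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("Done something '").append(name)
                .append("' and saved (id ").append(id);
        if (sentNotification) {
            sb.append(", sent notification");
        }
        sb.append(')');
        // Collect only the optional parts that are actually present.
        List<String> extras = new ArrayList<>();
        if (comment != null && !comment.isEmpty()) {
            extras.add("comment '" + comment + "'");
        }
        if (limit != null) {
            extras.add("limit " + limit);
        }
        if (!extras.isEmpty()) {
            // "with" introduces the first part, "and" separates the rest.
            sb.append(" with ").append(String.join(" and ", extras));
        }
        return sb.toString();
    }
}
```

The call site then stays short, for example: log.info("{}", new DoneSomethingLog(data.getName(), id, sentNotification, comment, limit));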
Now the same can be achieved using for example a custom Layout. I am using logback, so this is an example for logback.
We may define a Layout that owns the knowledge of what to do with your custom formatting instructions.
import org.slf4j.LoggerFactory;
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.encoder.PatternLayoutEncoder;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.ConsoleAppender;
import ch.qos.logback.core.LayoutBase;
public class LoggingExample2 {
private static final Logger CUSTOM_LOGGER = createLoggerFor("test");
public static void main(String[] args) {
LogObject o = new LogObject();
CUSTOM_LOGGER.info("{}", o);
o.first = "hello";
CUSTOM_LOGGER.info("{}", o);
o.second = "World";
CUSTOM_LOGGER.info("{}", o);
o.last = "And finally";
CUSTOM_LOGGER.info("{}", o);
}
public static class LogObject {
String first;
String second;
String last;
@Override
public String toString() {
StringBuilder buffer = new StringBuilder();
buffer.append("Log Object: ");
if (first != null) {
buffer.append("First: " + first + " ");
}
if (second != null) {
buffer.append("Second: " + second + " ");
}
if (last != null) {
buffer.append("Last: " + last + " ");
}
return buffer.toString();
}
}
public static class ModifyLogLayout extends LayoutBase<ILoggingEvent> {
@Override
public String doLayout(ILoggingEvent event) {
String formattedMessage = event.getFormattedMessage() + "\n";
Object[] args = event.getArgumentArray();
return String.format(formattedMessage, args);
}
}
private static Logger createLoggerFor(String string) {
LoggerContext lc = (LoggerContext) LoggerFactory.getILoggerFactory();
PatternLayoutEncoder ple = new PatternLayoutEncoder();
ple.setPattern("%date %level [%thread] %logger{10} [%file:%line] %msg%n");
ple.setContext(lc);
ple.start();
ConsoleAppender<ILoggingEvent> consoleAppender = new ConsoleAppender<ILoggingEvent>();
consoleAppender.setEncoder(ple);
consoleAppender.setLayout(new ModifyLogLayout());
consoleAppender.setContext(lc);
consoleAppender.start();
Logger logger = (Logger) LoggerFactory.getLogger(string);
logger.addAppender(consoleAppender);
logger.setLevel(Level.DEBUG);
logger.setAdditive(false); /* set to true if root should log too */
return logger;
}
}
I borrowed the Logger instantiation from: Programmatically configure LogBack appender
Note that I have not found a library that can parse the complex expressions that you have listed. I think you may have to write your own implementation.
In my example, I only illustrate how to intercept and modify the message based on the arguments.
Why I would not recommend this unless it is really needed:
the implementation is specific to logback
writing correct formatting is hard ... it will produce more errors than creating a custom object to format
It is harder to test because you literally have unlimited objects that may pass through this (and formatting). Your code must be resilient to this now, and in the future since any developer may add the weirdest things at any time.
The last (unasked) answer:
Why don't you use a JSON encoder? And then use something like Logstash to aggregate (or CloudWatch, or anything else).
This should solve all your problems.
This is what I have done in the past:
Define one bean that you would like to log "differently". I call it metadata. This bean can be, for example:
public class MetaHolder {
// map holding key/values
}
This basically just stores all your variables with a key. It allows you to effectively search on these keys, sink them into databases, etc. etc.
In your log, you simply do:
MetaHolder meta = new MetaHolder(); // create the holder
meta.put("comment", comment);
// put other properties here
log.info("formatted string", formattedArguments, meta); // meta is always the last arg
In the Layout this can then be converted quite nicely. Because you are no longer logging "human language", there are no "withs" and "in" to replace. Your log will simply be:
{
"time" : "...",
"message" : "...",
"meta" : {
"comment" : "this is a comment"
// no other variables set, so this was it
}
}
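A minimal sketch of such a holder, assuming unset (null) values should simply be left out of the output. The put and toJson methods are hypothetical names for illustration, and the hand-written escaping only stands in for what a real JSON encoder or library would do.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical metadata holder: keeps insertion order and skips unset values. */
final class MetaHolder {
    private final Map<String, Object> values = new LinkedHashMap<>();

    /** Stores the value under the key; null values are ignored so they never reach the log. */
    MetaHolder put(String key, Object value) {
        if (value != null) {
            values.put(key, value);
        }
        return this;
    }

    /** Renders a flat JSON object; numbers and booleans are unquoted, everything else becomes a string. */
    String toJson() {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> e : values.entrySet()) {
            Object v = e.getValue();
            String rendered = (v instanceof Number || v instanceof Boolean)
                    ? v.toString()
                    : quote(v.toString());
            json.add(quote(e.getKey()) + ":" + rendered);
        }
        return json.toString();
    }

    /** Minimal escaping of quotes, backslashes and control characters. */
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Because null values are dropped in put, a call such as meta.put("limit", limit) is safe even when limit is not set.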
And one last (last) one in just pure java, if you wanted that. You could write:
public static void main(String[] args) {
String comment = null;
String limit = "test";
String id = "id";
LOGGER.info(
"{} {} {}",
Optional.ofNullable(comment).map(s -> "The comment " + s).orElse(""),
Optional.ofNullable(limit).map(s -> "The Limit " + s).orElse(""),
Optional.ofNullable(id).map(s -> "The id " + s).orElse(""));
}
Which effectively moves the conditional logic you want in your formatting into Java's Optional.
I find that this is also hard to read and test, and would still recommend the first solution.

Java: Azure Service Bus Queue Receiving messages with sessions

I'm writing code in Java (using the Azure SDK for Java). I have a Service Bus queue that contains sessionful messages, and I want to receive those messages and pass them on to another place.
I make a connection to the Queue by using QueueClient, and then I use registerSessionHandler to process through the messages (code below).
The problem is that whenever a message is received, I can print all details about it including the content, but it is printed 10 times and after each time it prints an Exception.
(printing 10 times: I understand that this is because there is a 10 times retry policy before it throws the message to the Dead letter queue and goes to the next message.)
The Exception says
> USERCALLBACK-Receiver not created. Registering a MessageHandler creates a receiver.
The output with the Exception
But I'm sure that the SessionHandler does the same thing as MessageHandler but includes support for sessions, so it should create a receiver since it receives messages. I have tried to use MessageHandler but it won't even work and stops the whole program because it doesn't support sessionful messages, and the ones I receive have sessions.
My problem is understanding what the Exception wants me to do, and how can I fix the code so it won't give me any exceptions? Does anyone have suggestions on how to improve the code? or other methods that do the same thing?
QueueClient qc = new QueueClient(
new ConnectionStringBuilder(connectionString),
ReceiveMode.PEEKLOCK);
qc.registerSessionHandler(
new ISessionHandler() {
@Override
public CompletableFuture<Void> onMessageAsync(IMessageSession messageSession, IMessage message) {
System.out.printf(
"\nMessage received: " +
"\n --> MessageId = %s " +
"\n --> SessionId = %s" +
"\n --> Content Type = %s" +
"\n --> Content = \n\t\t %s",
message.getMessageId(),
messageSession.getSessionId(),
message.getContentType(),
getMessageContent(message)
);
return qc.completeAsync(message.getLockToken());
}
@Override
public CompletableFuture<Void> OnCloseSessionAsync(IMessageSession iMessageSession) {
return CompletableFuture.completedFuture(null);
}
@Override
public void notifyException(Throwable throwable, ExceptionPhase exceptionPhase) {
System.out.println("\n Exception " + exceptionPhase + "-" + throwable.getMessage());
}
},
new SessionHandlerOptions(1, true, Duration.ofMinutes(1)),
Executors.newSingleThreadExecutor()
);
(The getMessageContent(message) method is a separate method, for those interested:)
public String getMessageContent(IMessage message){
List<byte[]> content = message.getMessageBody().getBinaryData();
StringBuilder sb = new StringBuilder();
for (byte[] b : content) {
sb.append(new String(b)
);
}
return sb.toString();
}
For those who wonder, I managed to solve the problem!
I simply used the Azure Functions ServiceBusQueueTrigger, which listens to the Service Bus queue and processes the messages. With isSessionsEnabled set to true, it accepts sessionful messages as I wanted :)
So instead of writing more than 100 lines of code, the code looks like this now:
public class Function {
@FunctionName("QueueFunction")
public void run(
@ServiceBusQueueTrigger(
name = "TriggerName", //Any name you choose
queueName = "queueName", //QueueName from the portal
connection = "ConnectionString", //ConnectionString from the portal
isSessionsEnabled = true
) String message,
ExecutionContext context
) {
// Write the code you want to do with the message here
// Using the variable message which contains the messageContent, messageId, sessionId etc.
}
}

PUSH Notifications for >1000 devices through GCM Server(Java)

I have a GCM-backend Java server and I'm trying to send to all users a notification msg. Is my approach right? To just split them into 1000 each time before giving the send request? Or is there a better approach?
public void sendMessage(@Named("message") String message) throws IOException {
int count = ofy().load().type(RegistrationRecord.class).count();
if(count<=1000) {
List<RegistrationRecord> records = ofy().load().type(RegistrationRecord.class).limit(count).list();
sendMsg(records,message);
}else
{
int msgsDone=0;
List<RegistrationRecord> records = ofy().load().type(RegistrationRecord.class).list();
do {
List<RegistrationRecord> regIdsParts = regIdTrim(records, msgsDone);
msgsDone+=1000;
sendMsg(regIdsParts,message);
}while(msgsDone<count);
}
}
The regIdTrim method
private List<RegistrationRecord> regIdTrim(List<RegistrationRecord> wholeList, final int start) {
List<RegistrationRecord> parts = wholeList.subList(start,(start+1000)> wholeList.size()? wholeList.size() : start+1000);
return parts;
}
The sendMsg method
private void sendMsg(List<RegistrationRecord> records, @Named("message") String message) throws IOException {
if (message == null || message.trim().length() == 0) {
log.warning("Not sending message because it is empty");
return;
}
Sender sender = new Sender(API_KEY);
// crop longer messages before building the payload, so the cropped text is what gets sent
if (message.length() > 1000) {
message = message.substring(0, 1000) + "[...]";
}
Message msg = new Message.Builder().addData("message", message).build();
for (RegistrationRecord record : records) {
Result result = sender.send(msg, record.getRegId(), 5);
if (result.getMessageId() != null) {
log.info("Message sent to " + record.getRegId());
String canonicalRegId = result.getCanonicalRegistrationId();
if (canonicalRegId != null) {
// if the regId changed, we have to update the datastore
log.info("Registration Id changed for " + record.getRegId() + " updating to " + canonicalRegId);
record.setRegId(canonicalRegId);
ofy().save().entity(record).now();
}
} else {
String error = result.getErrorCodeName();
if (error.equals(Constants.ERROR_NOT_REGISTERED)) {
log.warning("Registration Id " + record.getRegId() + " no longer registered with GCM, removing from datastore");
// if the device is no longer registered with Gcm, remove it from the datastore
ofy().delete().entity(record).now();
} else {
log.warning("Error when sending message : " + error);
}
}
}
}
Quoting from Google Docs:
GCM is support for up to 1,000 recipients for a single message. This capability makes it much easier to send out important messages to your entire user base. For instance, let's say you had a message that needed to be sent to 1,000,000 of your users, and your server could handle sending out about 500 messages per second. If you send each message with only a single recipient, it would take 1,000,000/500 = 2,000 seconds, or around half an hour. However, attaching 1,000 recipients to each message, the total time required to send a message out to 1,000,000 recipients becomes (1,000,000/1,000) / 500 = 2 seconds. This is not only useful, but important for timely data, such as natural disaster alerts or sports scores, where a 30 minute interval might render the information useless.
Taking advantage of this functionality is easy. If you're using the GCM helper library for Java, simply provide a List collection of registration IDs to the send or sendNoRetry method, instead of a single registration ID.
We cannot send a push notification to more than 1000 recipients at a time. I searched a lot but found nothing better, so I used the same approach: split the whole list into sublists of 1000 items and send the push notification to each sublist.
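The splitting step can be sketched independently of GCM. In the sketch below, the class name Batches and the method partition are hypothetical; each resulting chunk of registration IDs would then be handed to the helper library's list-accepting send method mentioned in the quote above, rather than sending to one ID at a time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical utility for splitting a recipient list into fixed-size chunks. */
final class Batches {
    private Batches() {
    }

    /** Splits the list into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each chunk is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 2,500 registration IDs and a batch size of 1000, this yields three chunks of 1000, 1000 and 500 entries.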

accumulo - batchscanner: one result per range

So my general question is "Is it possible to have an Accumulo BatchScanner only pull back the first result per Range I give it?"
Now some details about my use case as there may be a better way to approach this anyway. I have data that represent messages from different systems. There can be different types of messages. My users want to be able to ask the system questions, such as "give me the most recent message of a certain type as of a certain time for all these systems".
My table layout looks like this
rowid: system_name, family: message_type, qualifier: masked_timestamp, value: message_text
The idea is that the user gives me a list of systems they care about, the type of message, and a certain timestamp. I used masked timestamp so that the table sorts most recent first. That way when I scan for a timestamp, the first result is the most recent prior to that time. I am using a BatchScanner because I have multiple systems I am searching for per query. Can I make the BatchScanner only fetch the first result for each Range? I can't specify a specific key because the most recent may not match the datetime given by the user.
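To illustrate the masked-timestamp ordering outside Accumulo, here is a plain-Java sketch that uses a TreeMap as a stand-in for one sorted system/type column. The mask formula Long.MAX_VALUE - timestamp is an assumption about how the masking is done; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for one system/type column, sorted by masked timestamp. */
final class MaskedTimestampDemo {
    private MaskedTimestampDemo() {
    }

    /** Masking inverts the order: newer (larger) timestamps get smaller keys. */
    static long mask(long timestamp) {
        return Long.MAX_VALUE - timestamp;
    }

    /** Returns the most recent message at or before asOf, or null if there is none. */
    static String mostRecentAsOf(TreeMap<Long, String> messagesByMaskedTime, long asOf) {
        // The first key >= mask(asOf) belongs to the newest timestamp <= asOf.
        Map.Entry<Long, String> first = messagesByMaskedTime.ceilingEntry(mask(asOf));
        return first == null ? null : first.getValue();
    }
}
```

This is the same "first entry at or after the start key" lookup that the scan performs per range, which is why only the first result of each range is of interest.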
Currently, I am using the BatchScanner and ignoring all but the first result per Key. It works right now, but it seems like a waste to pull back all the data for a specific system/type over the network when I only care about the first result per system/type.
EDIT
My attempt using the FirstEntryInRowIterator
@Test
public void testFirstEntryIterator() throws Exception
{
Connector connector = new MockInstance("inst").getConnector("user", new PasswordToken("password"));
connector.tableOperations().create("testing");
BatchWriter writer = writer(connector, "testing");
writer.addMutation(mutation("row", "fam", "qual1", "val1"));
writer.addMutation(mutation("row", "fam", "qual2", "val2"));
writer.addMutation(mutation("row", "fam", "qual3", "val3"));
writer.close();
Scanner scanner = connector.createScanner("testing", new Authorizations());
scanner.addScanIterator(new IteratorSetting(50, FirstEntryInRowIterator.class));
Key begin = new Key("row", "fam", "qual2");
scanner.setRange(new Range(begin, begin.followingKey(PartialKey.ROW_COLFAM_COLQUAL)));
int numResults = 0;
for (Map.Entry<Key, Value> entry : scanner)
{
Assert.assertEquals("qual2", entry.getKey().getColumnQualifier().toString());
numResults++;
}
Assert.assertEquals(1, numResults);
}
My goal is that the returned entry will be the ("row", "fam", "qual2", "val2") but I get 0 results. It almost seems like the Iterator is being applied before the Range maybe? I haven't dug into this yet.
This sounds like a good use case for using one of Accumulo's SortedKeyValueIterators, specifically the FirstEntryInRowIterator (contained in the accumulo-core artifact).
Create an IteratorSetting with the FirstEntryInRowIterator and add it to your BatchScanner. This will return the first Key/Value in that system_name and then stop, avoiding the overhead of your client ignoring all other results.
A quick modification of the FirstEntryInRowIterator might get you what you want:
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.accumulo.core.iterators;
import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.PartialKey;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;
public class FirstEntryInRangeIterator extends SkippingIterator implements OptionDescriber {
// options
static final String NUM_SCANS_STRING_NAME = "scansBeforeSeek";
// iterator predecessor seek options to pass through
private Range latestRange;
private Collection<ByteSequence> latestColumnFamilies;
private boolean latestInclusive;
// private fields
private Text lastRowFound;
private int numscans;
/**
* convenience method to set the option to optimize the frequency of scans vs. seeks
*/
public static void setNumScansBeforeSeek(IteratorSetting cfg, int num) {
cfg.addOption(NUM_SCANS_STRING_NAME, Integer.toString(num));
}
// this must be public for OptionsDescriber
public FirstEntryInRangeIterator() {
super();
}
public FirstEntryInRangeIterator(FirstEntryInRangeIterator other, IteratorEnvironment env) {
super();
setSource(other.getSource().deepCopy(env));
}
@Override
public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
return new FirstEntryInRangeIterator(this, env);
}
@Override
public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options, IteratorEnvironment env) throws IOException {
super.init(source, options, env);
String o = options.get(NUM_SCANS_STRING_NAME);
numscans = o == null ? 10 : Integer.parseInt(o);
}
// this is only ever called immediately after getting "next" entry
@Override
protected void consume() throws IOException {
if (finished == true || lastRowFound == null)
return;
int count = 0;
while (getSource().hasTop() && lastRowFound.equals(getSource().getTopKey().getRow())) {
// try to efficiently jump to the next matching key
if (count < numscans) {
++count;
getSource().next(); // scan
} else {
// too many scans, just seek
count = 0;
// determine where to seek to, but don't go beyond the user-specified range
Key nextKey = getSource().getTopKey().followingKey(PartialKey.ROW);
if (!latestRange.afterEndKey(nextKey))
getSource().seek(new Range(nextKey, true, latestRange.getEndKey(), latestRange.isEndKeyInclusive()), latestColumnFamilies, latestInclusive);
else {
finished = true;
break;
}
}
}
lastRowFound = getSource().hasTop() ? getSource().getTopKey().getRow(lastRowFound) : null;
}
private boolean finished = true;
@Override
public boolean hasTop() {
return !finished && getSource().hasTop();
}
@Override
public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException {
// save parameters for future internal seeks
latestRange = range;
latestColumnFamilies = columnFamilies;
latestInclusive = inclusive;
lastRowFound = null;
super.seek(range, columnFamilies, inclusive);
finished = false;
if (getSource().hasTop()) {
lastRowFound = getSource().getTopKey().getRow();
if (range.beforeStartKey(getSource().getTopKey()))
consume();
}
}
@Override
public IteratorOptions describeOptions() {
String name = "firstEntry";
String desc = "Only allows iteration over the first entry per range";
HashMap<String,String> namedOptions = new HashMap<String,String>();
namedOptions.put(NUM_SCANS_STRING_NAME, "Number of scans to try before seeking [10]");
return new IteratorOptions(name, desc, namedOptions, null);
}
@Override
public boolean validateOptions(Map<String,String> options) {
try {
String o = options.get(NUM_SCANS_STRING_NAME);
if (o != null)
Integer.parseInt(o);
} catch (Exception e) {
throw new IllegalArgumentException("bad integer " + NUM_SCANS_STRING_NAME + ":" + options.get(NUM_SCANS_STRING_NAME), e);
}
return true;
}
}
