Regarding a data structure for O(1) get on prefixes - java

So I am trying to write a little utility in Scala that constantly listens on a bunch of directories for file system changes (deletes, creates, modifications, etc.) and immediately rsyncs them across to a remote server. (https://github.com/Khalian/LockStep)
My configurations are stored in JSON as follows:
{
  "localToRemoteDirectories": {
    "/workplace/arunavs/third_party": {
      "remoteDir": "/remoteworkplace/arunavs/third_party",
      "remoteServerAddr": "some Remote server address"
    }
  }
}
This configuration is stored in a Scala Map (key = localDir, value = (remoteDir, remoteServerAddr)). The tuple is represented as a case class
sealed case class RemoteLocation(remoteDir:String, remoteServerAddr:String)
I am using an actor from a third party (https://github.com/lloydmeta/schwatcher/blob/master/src/main/scala/com/beachape/filemanagement/FileSystemWatchMessageForwardingActor.scala) that listens on these directories (e.g. /workplace/arunavs/third_party) and then outputs a Java 7 WatchEvent.Kind event (ENTRY_CREATE, ENTRY_MODIFY, etc.). The problem is that the events carry absolute paths (for instance, if I create a file helloworld in the third_party dir, the message sent by the actor is (ENTRY_CREATE, /workplace/arunavs/third_party/helloworld)).
I need a way to write a getter that gets the nearest prefix from the configuration map stored above. The obvious way to do it is to filter on the map:
def getRootDirsAndRemoteAddrs(localDir: String): Map[String, RemoteLocation] =
  localToRemoteDirectories.filter(e => localDir.startsWith(e._1))
This simply returns the subset of keys that are a prefix of localDir (in the above example this method is called with localDir = /workplace/arunavs/third_party/helloworld). While this works, this implementation is O(n) where n is the number of items in my configuration. I am looking for better computational complexity (I looked at radix and Patricia tries, but they don't cut it, since I am feeding a string and trying to get keys which are prefixes of it; tries solve the opposite problem).
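One workaround, since the keys are plain directory paths: keep the configuration in a hash map keyed by path and walk the incoming path up through its parents, probing the map at each level. The cost is then proportional to the path depth rather than to the number of configured directories. A minimal Java sketch (class and method names are illustrative; RemoteLocation mirrors the case class above):

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class RemoteLocation {
    final String remoteDir;
    final String remoteServerAddr;
    RemoteLocation(String remoteDir, String remoteServerAddr) {
        this.remoteDir = remoteDir;
        this.remoteServerAddr = remoteServerAddr;
    }
}

final class PrefixLookup {
    private final Map<Path, RemoteLocation> roots = new HashMap<>();

    void register(String localDir, RemoteLocation remote) {
        roots.put(Paths.get(localDir).normalize(), remote);
    }

    // Nearest configured ancestor of the given path, or empty if none is registered.
    // Cost is O(depth of the path), independent of the number of configured roots.
    Optional<RemoteLocation> nearestRoot(String localPath) {
        Path p = Paths.get(localPath).normalize();
        while (p != null) {
            RemoteLocation hit = roots.get(p);
            if (hit != null) {
                return Optional.of(hit);
            }
            p = p.getParent();
        }
        return Optional.empty();
    }
}

With the configuration above registered, nearestRoot("/workplace/arunavs/third_party/helloworld") would return the entry registered for /workplace/arunavs/third_party.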

How do I make "picklists" or "enums" in bulk from one source system to a destination system using Zapier or Code?

I am by no means experienced in programming, but I can trial-and-error my way through some basic code. I am currently working on integrating a couple of my business applications via Zapier or Make (Integromat), and I always get stuck on field mapping in bulk.
Right now, I am trying to create a two-way sync between my "loan origination system" and "Monday.com" (project management system) and I am using "AirTable" as a mid point to store/hold the data.
The flow looks like this: Loan Origination System -> Airtable -> Monday.com.
In reverse, the flow is: Monday.com -> Airtable -> Loan Origination System.
I always get stuck when I have "picklists" or "enums" - basically drop down fields that have different values in both systems.
I know I can use a lookup table in Zapier, but there are so many fields that it would be nearly impossible and also impractical to add that many lookup-table steps to get transformed values.
Here is an example of what is being passed from the loan origination system -
Field: "propertyType" (Display Name is "Property Type")
Potential Values:
SINGLE_FAMILY_DETACHED,
SINGLE_FAMILY_ATTACHED,
TWO_UNIT,
THREE_UNIT,
FOUR_UNIT,
MANUFACTURED_SINGLE_WIDE,
MANUFACTURED_DOUBLE_WIDE
Now in Monday.com, I have the same field listed as a status field called "Property Type", and the values are the normalized/display names for these values, i.e. the following:
Monday.com Property Type Status Field Options
"Single Family Detached",
"Single Family Attached",
"Two Unit",
"Three Unit",
"Four Unit",
"Manufactured Single Wide",
"Manufactured Double Wide"
Is there any good way to transform the values for all the possible inputs/outputs for either direction without having to make a million zap steps for all the fields that function this way?
It is a systems-integration problem, and I don't have a ton of money to buy a fancy tool. We have Zapier, Airtable, Integromat, etc., and the tools do not have open APIs at the moment, so I have to work through these tools.
Any help or guidance is super appreciated!
When using Code by Zapier (JavaScript), it is usually a good idea to put some guarantees in your code right up front. This piece of code sets defaults for the inputData keys:
class DefaultKeys {
  constructor(keys = {}) {
    Object.assign(this, {
      propertyType: 'default_value',
      loanType: 'default_value'
    }, keys);
  }
}

let values = new DefaultKeys(inputData);
Now to your actual question: you can use a switch statement to do what Zapier's lookup tables do:
let newPropertyType;
switch (values.propertyType) {
  case 'SINGLE_FAMILY_DETACHED':
    newPropertyType = 'Single Family Detached'; // a function could also make this transformation
    break;
  case 'SINGLE_FAMILY_ATTACHED':
    newPropertyType = 'Single Family Attached';
    break;
  default:
    newPropertyType = 'default_value';
}
So then copy that section for each of your different 'keys'.
Lastly, output your results for the next Zapier actions to access:
output = {propertyType: newPropertyType, loanType: newLoanType};

How to obtain PayloadSize from Genicam reference implementation?

I'm trying to access a GigE camera using the GenICam reference implementation, by looking at the online resources and existing implementations (Aravis, Harvesters) and following the GenTL standard together with the SFNC, which every GenICam-compatible camera supports. The producer I'm currently using is from Basler, since the camera I have here is from them.
/* I wrapped the Genicam classes with my own. Here are the relevant parts */
tl = new GenicamTransportlayer("/opt/pylon/lib/gentlproducer/gtl/ProducerGEV.cti");
if0 = tl.getFirstInterface();
dev0 = if0.getFirstDevice();
ds = dev0.getFirstDataStream();
I'm able to connect to the System, Interface, Device, and DataStream modules, connect the node maps, and am now trying to set up the buffers for acquisition. To do so I need to get the maximum payload size from the camera. The GenTL standard document says I need to query it from the DataStream module using
boolean definesPayloadSize = ds.getInfoBool8(StreamInfoCommand.STREAM_INFO_DEFINES_PAYLOADSIZE);
which gives me 0, i.e. false. The producer MAY provide a PayloadSize feature, which can be queried using
ds.getInfoSizet(StreamInfoCommand.STREAM_INFO_PAYLOAD_SIZE);
which is obviously also 0, and since it is only a MAY I cannot rely on it. The standard further tells me that if both fail, I need to inquire via the remote device's node map to read the PayloadSize:
long payloadSizeFromRemoteMap = dev0.remoteMap.getIntegerNode("PayloadSize").getValue();
This gives me 0 too. The standard goes on to say that if the producer does not implement an interface standard (whatever that means?), the required payload size has to be queried from the producer using the StreamInfo commands, which also fails (GenTL maps the constant STREAM_INFO_PAYLOAD_SIZE to 7, which produces a BufferTooSmallException on the System port).
At this point I'm confused about what to do. Most of my nodes are locked (I can overwrite TLParamsLocked but still cannot change parameters, e.g. execute a load of the default parameter set), so I cannot set Width/Height/ImageFormat to infer the PayloadSize:
/* Trying to set a default configuration fails */
IEnumeration userSetSelector = dev0.remoteMap.getEnumerationNode("UserSetSelector");
log.debug("Loading Feature set: " + userSetSelector.getEntries().get(0).getName());
// Prints: Loading Feature set: EnumEntry_UserSetSelector_Default
userSetSelector.setValue("Default");
dev0.remoteMap.getCommandNode("UserSetLoad").execute();
// AccessException: Node is not writable. : AccessException thrown in node 'UserSetLoad' while calling 'UserSetLoad.Execute()' - Node is not writable.
Without knowing the size of the buffers I cannot continue. How can I infer the PayloadSize to set them up?
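For reference, a minimal sketch of the fallback chain described above, reusing the wrapper methods shown in this question (getInfoBool8, getInfoSizet, getIntegerNode are the question's own wrappers, not an official API); the Width/Height node names come from the SFNC, and the last line assumes an 8-bit mono pixel format, so it would need scaling for other PixelFormat values:

// Assumed consolidation of the query order described above; ds and dev0 are the
// DataStream and Device objects created in the earlier snippet.
long resolvePayloadSize() {
    if (ds.getInfoBool8(StreamInfoCommand.STREAM_INFO_DEFINES_PAYLOADSIZE)) {
        return ds.getInfoSizet(StreamInfoCommand.STREAM_INFO_PAYLOAD_SIZE);
    }
    long fromRemote = dev0.remoteMap.getIntegerNode("PayloadSize").getValue();
    if (fromRemote > 0) {
        return fromRemote;
    }
    // Last resort: read (not set) the image geometry and compute the size.
    long width  = dev0.remoteMap.getIntegerNode("Width").getValue();
    long height = dev0.remoteMap.getIntegerNode("Height").getValue();
    return width * height; // bytes for Mono8; scale by bits-per-pixel otherwise
}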

(Too) complex configuration management (Java properties)

I'm working at a company that has a (too) complex configuration management process:
In each module there is an application.properties file. There are properties for the developers like: database.host = localhost
Properties which change in other environments are maintained in an application.properties file in an override-properties folder (for each module) like: database.host=#dbhost#
There is a default-deployment.properties file with default values for other environments like: database.HOST=noValueConfigured.DB_HOST
A postconfigure.properties file with DATABASE_HOST=#configure.DB_HOST#
Those files are only needed if a property value depends on the environment (i.e. differs between development, testing, and live).
Finally there is an Excel document with a sheet for every environment and a row like: configure.DB_HOST - a comment ... - 127.0.0.1 (just as an example). The Excel file is responsible for generating the correct property files for the RPM packages.
This process is not only complex but also error prone.
How could it be simplified/improved?
The approach should be compatible with Spring DI.
I would start with a master configuration file and generate the properties files to start with.
Ultimately you could have a set of properties files which can be deployed in all environments, e.g.
database.host = localhost
database.host.prod = proddb1
database.host.uat = uatdb1
i.e. use the environment/host/region/service at the end as a search path. This has the advantage that you can see the variations between environments.
You can implement this lookup like this:
import java.util.List;
import java.util.Properties;

public class SearchProperties extends Properties {
    private final List<String> searchList;

    public SearchProperties(List<String> searchList) {
        this.searchList = searchList;
    }

    @Override
    public String getProperty(String key) {
        for (String s : searchList) {
            String property = super.getProperty(key + "." + s);
            if (property != null)
                return property;
        }
        return super.getProperty(key);
    }
}
You might construct this like
Properties prop = new SearchProperties(Arrays.asList(serverName, environment));
This way, if there is a match for that server, it overrides the environment value, which in turn overrides the default.
In Java 8 you can do
public String getProperty(String key) {
    return searchList.stream()
                     .map(s -> key + "." + s)
                     .map(super::getProperty)
                     .filter(s -> s != null)
                     .findFirst()
                     .orElseGet(() -> super.getProperty(key));
}
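A hypothetical usage sketch of the class above; the file name, server name, and environment suffix are assumptions that match the property names used earlier:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

public class ConfigExample {
    public static void main(String[] args) throws IOException {
        // Search path: most specific suffix first (server), then environment.
        SearchProperties props = new SearchProperties(Arrays.asList("server42", "prod"));
        try (FileInputStream in = new FileInputStream("application.properties")) {
            props.load(in); // fills the underlying Properties table
        }
        // With database.host=localhost and database.host.prod=proddb1 in the file,
        // this prints "proddb1"; without the prod override it falls back to localhost.
        System.out.println(props.getProperty("database.host"));
    }
}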
There should be only one file, even if it has a lot of properties. Also, there should be only one property for each piece of functionality, like database.host, not both database.host and database_host, or anything similar.
You need to create a hierarchy for every property in order to know which value will be used. For example, if there is a global value for database.host, use it for that property. If not, check the next level in the hierarchy, such as environment-specific values (e.g. the production value). If that does not exist, check the next level, such as the local or test level. At the bottom level, have a default value. This way you have two dimensions for resolving properties (the property itself and the level it comes from), which decreases the chance of error dramatically.
In one company I used to work for, we had an automated deployer which handled this level setup: we just set a variable on its website for the level we wanted, and it went from top to bottom and set them. We never had problems with that setup, and we had more than 50 variables in the app.properties file.
If you don't want to take on all the redesign approaches mentioned in the previous answers, you can wrap all the complexity into the Tomtit task manager, which is good at these types of tasks.
Just create properties-file templates and populate them per environment.

Mapreduce - sequence jobs?

I am using MapReduce (just map, really) to do a data processing task in four phases. Each phase is one MapReduce job. I need them to run in sequence, that is, don't start phase 2 until phase 1 is done, etc. Does anyone have experience doing this that can share?
Ideally we'd do this 4-job sequence overnight, so making it
cron-able would be a fine thing as well.
thank you
As Daniel mentions, the appengine-pipeline library is meant to solve this problem. I go over chaining mapreduce jobs together in this blog post, under the section "Implementing your own Pipeline jobs".
For convenience, I'll paste the relevant section here:
Now that we know how to launch the predefined MapreducePipeline, let’s take a look at implementing and running our own custom pipeline jobs. The pipeline library provides a low-level library for launching arbitrary distributed computing jobs within appengine, but, for now, we’ll talk specifically about how we can use this to help us chain mapreduce jobs together. Let’s extend our previous example to also output a reverse index of characters and IDs.
First, we define the parent pipeline job.
class ChainMapReducePipeline(mapreduce.base_handler.PipelineBase):
    def run(self):
        deduped_blob_key = (
            yield mapreduce.mapreduce_pipeline.MapreducePipeline(
                "test_combiner",
                "main.map",
                "main.reduce",
                "mapreduce.input_readers.RandomStringInputReader",
                "mapreduce.output_writers.BlobstoreOutputWriter",
                combiner_spec="main.combine",
                mapper_params={
                    "string_length": 1,
                    "count": 500,
                },
                reducer_params={
                    "mime_type": "text/plain",
                },
                shards=16))

        char_to_id_index_blob_key = (
            yield mapreduce.mapreduce_pipeline.MapreducePipeline(
                "test_chain",
                "main.map2",
                "main.reduce2",
                "mapreduce.input_readers.BlobstoreLineInputReader",
                "mapreduce.output_writers.BlobstoreOutputWriter",
                # Pass output from first job as input to second job
                mapper_params=(yield BlobKeys(deduped_blob_key)),
                reducer_params={
                    "mime_type": "text/plain",
                },
                shards=4))
This launches the same job as the first example, takes the output from that job, and feeds it into the second job, which reverses each entry. Notice that the result of the first pipeline yield is passed in to mapper_params of the second job. The pipeline library uses magic to detect that the second pipeline depends on the first one finishing and does not launch it until the deduped_blob_key has resolved.
Next, I had to create the BlobKeys helper class. At first, I didn’t think this was necessary, since I could just do:
mapper_params={"blob_keys": deduped_blob_key},
But, this didn’t work for two reasons. The first is that “generator pipelines cannot directly access the outputs of the child Pipelines that it yields”. The code above would require the generator pipeline to create a temporary dict object with the output of the first job, which is not allowed. The second is that the string returned by BlobstoreOutputWriter is of the format “/blobstore/<key>”, but BlobstoreLineInputReader expects simply “<key>”. To solve these problems, I made a little helper BlobKeys class. You’ll find yourself doing this for many jobs, and the pipeline library even includes a set of common wrappers, but they do not work within the MapreducePipeline framework, which I discuss at the bottom of this section.
class BlobKeys(third_party.mapreduce.base_handler.PipelineBase):
    """Returns a dictionary with the supplied keyword arguments."""
    def run(self, keys):
        # Remove the key from a string in this format:
        # /blobstore/<key>
        return {
            "blob_keys": [k.split("/")[-1] for k in keys]
        }
Here is the code for the map2 and reduce2 functions:
def map2(data):
    # BlobstoreLineInputReader.next() returns a tuple
    start_position, line = data
    # Split input based on previous reduce() output format
    elements = line.split(" - ")
    random_id = elements[0]
    char = elements[1]
    # Swap 'em
    yield (char, random_id)

def reduce2(key, values):
    # Create the reverse index entry
    yield "%s - %s\n" % (key, ",".join(values))
I'm unfamiliar with Google App Engine, but couldn't you put all of the job configurations in a single main program and then run them in sequence, something like the following? I think this works in normal MapReduce programs, so if Google App Engine code isn't too different it should work fine.
Configuration conf1 = getConf();
Configuration conf2 = getConf();
Configuration conf3 = getConf();
Configuration conf4 = getConf();
//whatever configuration you do for the jobs
Job job1 = new Job(conf1,"name1");
Job job2 = new Job(conf2,"name2");
Job job3 = new Job(conf3,"name3");
Job job4 = new Job(conf4,"name4");
//setup for the jobs here
job1.waitForCompletion(true);
job2.waitForCompletion(true);
job3.waitForCompletion(true);
job4.waitForCompletion(true);
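A hedged extension of the sketch above for a cron-driven run: check each waitForCompletion result and stop with a non-zero exit code on failure, so a later phase never starts on a failed predecessor and cron/monitoring can flag the run. The class and job names are placeholders, and the per-phase mapper/input/output setup is elided:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FourPhaseDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] phases = {"phase1", "phase2", "phase3", "phase4"};
        for (String name : phases) {
            Job job = Job.getInstance(conf, name);
            // ... set mapper, input and output paths for this phase here ...
            if (!job.waitForCompletion(true)) {
                System.err.println(name + " failed, aborting the chain");
                System.exit(1); // non-zero exit lets cron/monitoring notice the failure
            }
        }
        System.exit(0);
    }
}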
You need the appengine-pipeline project, which is meant for exactly this.

Upsert for LDAP directory in Java

I'm attempting to execute an Upsert using the Novell JLDAP library, unfortunately, I'm having trouble finding an example of this. Currently, I have to:
public EObject put(EObject eObject) {
    Subject s = (Subject) eObject;
    // Query and grab attributes from the subject
    LDAPAttributes attr = resultsToAttributes(getLDAPConnection().get(s));
    // No modification needed - return
    if (s.getAttributes().equals(attr)) {
        return eObject;
    } else {
        // Keys: LDAPModification.REPLACE, ADD or DELETE - depending on which
        // attributes are present in the maps, I choose the operation to use
        Map<Integer, LDAPAttribute> operationalMap =
                figureOutWhichAttributesArePresent(s.getAttributes(), attr);
        // Add the modifications to a modification list
        List<LDAPModification> modList = new ArrayList<LDAPModification>();
        for (Map.Entry<Integer, LDAPAttribute> entry : operationalMap.entrySet()) {
            // The key says whether it is an update, delete, or insert
            modList.add(new LDAPModification(entry.getKey(), entry.getValue()));
        }
        // commit
        connection.modify("directorypathhere",
                modList.toArray(new LDAPModification[modList.size()]));
        return eObject;
    }
}
I'd prefer not to have to query the customer first, which also means cycling through the subject's attributes. Is anyone aware whether JNDI or another library is able to execute an update/insert without running multiple statements against LDAP?
Petesh was correct - the abstraction is implemented within the Novell library (as well as the UnboundID library). I was able to "upsert" values using the Modify.REPLACE parameter for every attribute that came in, passing in null for empty values. This effectively created, updated, and deleted the attributes without having to parse them first.
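A minimal sketch of that REPLACE-only approach using the Novell JLDAP types already referenced in the question (the DN and the attribute map are placeholders). In LDAP, a replace with an empty value set removes the attribute, which is what makes the "null means delete" behaviour work without reading the entry first:

import com.novell.ldap.LDAPAttribute;
import com.novell.ldap.LDAPConnection;
import com.novell.ldap.LDAPException;
import com.novell.ldap.LDAPModification;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LdapUpsertSketch {
    // Replace every attribute unconditionally: added if absent, overwritten if
    // present, removed when the value is null (empty value set).
    public static void upsert(LDAPConnection connection, String dn,
                              Map<String, String> attributes) throws LDAPException {
        List<LDAPModification> mods = new ArrayList<LDAPModification>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            LDAPAttribute attr = (e.getValue() == null)
                    ? new LDAPAttribute(e.getKey())                // no values -> attribute deleted
                    : new LDAPAttribute(e.getKey(), e.getValue());
            mods.add(new LDAPModification(LDAPModification.REPLACE, attr));
        }
        connection.modify(dn, mods.toArray(new LDAPModification[mods.size()]));
    }
}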
In LDAP, via LDIF files, an upsert would be a single modify entry with two steps: a delete and an add of the value. This is denoted by a single dash on a line between the delete and the add.
I am not sure how you would do it in this library. I would try modList.remove and then modList.add one after another and see if that works.
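For illustration, a hypothetical LDIF snippet of that two-step change (the DN and attribute values are made up):

dn: cn=example,ou=people,dc=example,dc=com
changetype: modify
delete: mail
mail: old@example.com
-
add: mail
mail: new@example.com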
