Check multiple IP addresses efficiently against one address - Java

I have a list of IP addresses as Strings, and an address in String format to check against that list. My goal is to find out whether my address is in the list or not. To complicate matters, the list entries are not full addresses but regular expressions, e.g. 10\.25\.3
What is the most efficient way to run hundreds of regex patterns against a single String? Piping them? A search tree? A specific Java container that can help me?
Do you have any suggestions?
Edit:
I can transform the list of addresses and store it in any form before processing; that does not matter.

You can try using this library that was originally part of the RIPE whois server to quickly find matches. It allows you to build a tree of ranges and navigates the tree in an extremely efficient manner.
Example:
// create & populate whitelist/blacklist tree
NestedIntervalMap<Ipv4Resource, Boolean> map = new NestedIntervalMap<>();
map.put(Ipv4Interval.parse("192.168/19"), true);
map.put(Ipv4Interval.parse("192.168.52.1"), false);
map.put(Ipv4Interval.parse("0/0"), false);
// lookup if incoming IP is allowed to connect
boolean allow = map.findFirstLessSpecific(new Ipv4Interval(incomingSocket.getInetAddress()));
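If adding a library is not an option, the same idea can be sketched with the standard library alone. This is a minimal sketch, assuming the regex fragments in the list can be converted up front into CIDR prefixes (for example, 10\.25\.3 becoming 10.25.3.0/24) and that only IPv4 is needed. The class name Ipv4PrefixSet and its methods are hypothetical and not part of any library mentioned here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: stores IPv4 prefixes grouped by prefix length.
class Ipv4PrefixSet {
    // prefix length (0..32) -> network addresses already masked to that length
    private final Map<Integer, Set<Integer>> networksByLength = new HashMap<>();

    // Parses dotted-quad text such as "10.25.3.7" into a 32-bit int.
    static int toInt(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Mask with the top 'length' bits set. Length 0 is special-cased
    // because shifting an int by 32 is a no-op in Java.
    static int mask(int length) {
        return length == 0 ? 0 : -1 << (32 - length);
    }

    // Adds a prefix in CIDR notation, e.g. "10.25.3.0/24".
    // A bare address is treated as a /32.
    void add(String cidr) {
        String[] parts = cidr.split("/");
        int length = parts.length == 2 ? Integer.parseInt(parts[1]) : 32;
        if (length < 0 || length > 32) {
            throw new IllegalArgumentException("Bad prefix length: " + cidr);
        }
        int network = toInt(parts[0]) & mask(length);
        networksByLength.computeIfAbsent(length, k -> new HashSet<>()).add(network);
    }

    // At most 33 hash lookups, however many prefixes are stored.
    boolean contains(String ip) {
        int address = toInt(ip);
        for (Map.Entry<Integer, Set<Integer>> entry : networksByLength.entrySet()) {
            if (entry.getValue().contains(address & mask(entry.getKey()))) {
                return true;
            }
        }
        return false;
    }
}
```

The lookup cost depends on the number of distinct prefix lengths rather than on the number of entries, which is why this tends to scale better than trying hundreds of regular expressions in turn.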

Related

Find difference of two Strings using java-diff-utils, or other?

I am looking for an example of java-diff-utils finding the difference between two strings, e.g.
String s1 = "{\"type\": \"message_typing\",\"requestId\": \"requestid\",\"clientMutationId\": \"mutationid\",\"chatRoomId\": \"roomid\",\"conversationId\": \"conversationid\"}";
String s2 = "{\"type\": \"type_2\",\"requestId\": \"1\",\"clientMutationId\": \"2\",\"chatRoomId\": \"dev/test\",\"conversationId\": \"aa2344ceDsea1\"}";
The first is the base message; the second is the one I would like to compare against the base message to get the differing values (e.g. type_2, 1, 2, dev/test, aa2344ceDsea1). However, I would also like to be able to reassemble the second string correctly when given only the base message and the diff values.
I can only find examples online that use two files and none that use two strings, so a code example would be very helpful. I have already done it using google-diff-match-patch, but the returned diffs are too large for what I need. I will be sending the diffs over MQTT and want to keep the payload size down, so I need something that can extract the values and still reassemble the message on the other side from the values and the base message.

How do you convert a java String to a mailing address object?

As input I am getting an address as a String. It may say something like "123 Fake Street\nLos Angeles, CA 99988". How can I convert this into an object with fields like this:
Address1
Address2
City
State
Zip Code
Or something similar to this? If there is a java library that can do this, all the better.
Unfortunately, I don't have a choice about the String as input. It's part of a specification I'm trying to implement.
The input is not going to be very well structured so the code will need to be very fault tolerant. Also, the addresses could be from all over the world, but 99 out of 100 are probably in the US.
You can use JGeocoder
public static void main(String[] args) {
Map<AddressComponent, String> parsedAddr = AddressParser.parseAddress("Google Inc, 1600 Amphitheatre Parkway, Mountain View, CA 94043");
System.out.println(parsedAddr);
Map<AddressComponent, String> normalizedAddr = AddressStandardizer.normalizeParsedAddress(parsedAddr);
System.out.println(normalizedAddr);
}
Output will be:
{street=Amphitheatre, city=Mountain View, number=1600, zip=94043, state=CA, name=Google Inc, type=Parkway}
{street=AMPHITHEATRE, city=MOUNTAIN VIEW, number=1600, zip=94043, state=CA, name=GOOGLE INC, type=PKWY}
There is another library, International Address Parser; you can check its trial version. It supports country as well.
AddressParser addressParser = AddressParser.getInstance();
AddressStandardizer standardizer = AddressStandardizer.getInstance();//if enabled
AddressFormater formater = AddressFormater.getInstance();
String rawAddress = "101 Avenue des Champs-Elysées 75008 Paris";
//you can try to detect the country
CountryDetector detector = CountryDetector.getInstance();
String countryCode = detector.getCountryCode("7580 Commerce Center Dr ALABAMA");
System.out.println("detected country=" + countryCode);
Also, please check Implemented Countries in this library.
Cheers !!
I work at SmartyStreets where we develop address parsing and extraction algorithms.
It's hard.
If most of your addresses are in the US, you can use an address verification service to provide guaranteed accurate parse results (since the addresses are checked against a master list).
There are several providers out there, so take a look around and find one that suits you. Since you probably won't be able to install the database locally (not without a big fee, because address data is licensed by the USPS), look for one that offers a REST endpoint so you can just make an HTTP request. Since it sounds like you have a lot of addresses, make sure the API is high-performing and lets you do batch requests.
For example, with ours:
Input:
13001 Point Richmond Dr NW, Gig Harbor WA
Output:
Or the more specific breakdown of components, if needed:
If the input is even messier, there are a few address extraction services available that can handle a little bit of noise within an address and parse addresses out of text and turn them into their components. (SmartyStreets offers this also, as a beta API. I believe some other NLP services do similar things too.)
Granted, this only works for US addresses. I'm not as expert on UK or Canadian addresses, but I believe they may be slightly simpler in general.
(Beyond a small handful of well-developed countries, international data is really hit-and-miss. Reliable data sets are hard to obtain or don't exist. But if you're on a really tight budget you could write your own parser for all the address formats.)
If you are sure on the format, you can use regular expressions to get the address out of the string. For the example you provided something like this:
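String address = "123 Fake Street\nLos Angeles, CA 99988";
// groups: 1 = street line, 2 = city, 3 = state, 4 = zip (needs java.util.regex.Pattern and Matcher)
Matcher m = Pattern.compile("(.*)\\n(.*), ([A-Z]{2}) ([0-9]{5})").matcher(address);
String[] parts = m.matches() ? new String[] {m.group(1), m.group(2), m.group(3), m.group(4)} : new String[0];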
I assume the order of the information is always the same, i.e. the user will never enter the postal code before the state. If I understood your question correctly, you need logic to process an address that may be incomplete (for example, missing a portion).
One way to do it is to look for portions of the string that you know are correct. You can treat the known parts of the address as separators. You will need city and state names and address words (such as "Street", "Avenue", "Road", etc.) in an array.
1. Perform indexOf with the cities, states and address words, and store the results.
2. Substring and cut out the first line of the address (from the start to the index of the address word plus its length).
3. Check the index of the city name (found in step 1). If it's -1, skip this step. If it's 0, take it out (0 also means address line 2 is not in the string). If it's more than 0, substring and cut out everything from the start of the string to the index of the city name as the second line of the address.
4. Check the index of the state name. Once again, if it's -1, skip this step. If it's 0, substring and cut it out as the state name.
5. Whatever remains is your postal code.
6. Check the strings you just extracted for leftover separators (commas, dots, new lines, etc.) and remove them.
If the address is missing both state and city, you would actually need a list of zip codes too, so better ensure the user enters at least one of them.
It's not impossible to implement what you need, but you probably don't want to spend all that time doing it. It's easier to just ensure the user enters everything correctly.
Maybe you can use Regular Expression

Java Library to validate and compare MAC address

We have a Java web service that gets a String representing a MAC address. We want to validate if the given String actually matches the required format. Further we want to create a normalized form to make them comparable.
I searched quite a while but found only some "loose regular expressions". We would really prefer to have a library that can parse different formats and return a normalized (String) representation (i.e. 01-23-45-67-89-ab and 01:23:45:67:89:ab would return the same representation and be comparable).
I expected to find some mature and well tested library, which could do that kind of task. Can anyone please point me to it? I just cannot believe that it doesn't exist yet.
I would be very thankful to not see any RegExes as possible solutions (we know how to do that if necessary).
The IPAddress Java library will do it. The javadoc is available at the link. Disclaimer: I am the project manager.
The library will read various common formats for MAC addresses, like aa:bb:cc:dd:ee:ff, aa-bb-cc-dd-ee-ff and aabb.ccdd.eeff. It supports addresses that are 48 or 64 bits, and also allows you to specify ranges of addresses like aa-ff:bb:cc:*:ee:ff
Verify if an address is valid:
String str = "aa:bb:cc:dd:ee:ff";
MACAddressString addrString = new MACAddressString(str);
try {
MACAddress addr = addrString.toAddress();
...
} catch(AddressStringException e) {
//e.getMessage provides validation issue
}
The library is well tested, it has a test suite with thousands of tests.
"mature and well tested library"
To verify MAC addresses? It's 6 bytes in hex optionally separated by a delimiter. It's a homework assignment or light interview question, no need to write a library. My solution is 10 lines, and it's more paranoid than necessary...
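For illustration, a hand-rolled version along those lines might look like the sketch below. It is not the code the answer refers to (which is not shown), and the class name MacAddresses is hypothetical. It assumes 48-bit addresses only, uses no regular expressions, and is deliberately lenient: the delimiters ':', '-' and '.' are skipped wherever they appear, so it does not check that they sit in the usual positions.

```java
// Hypothetical helper: validates a 48-bit MAC address and normalizes it.
class MacAddresses {
    // Returns the address as lower-case "aa:bb:cc:dd:ee:ff".
    // Throws IllegalArgumentException if the input does not contain
    // exactly 12 hex digits once delimiters are removed.
    static String normalize(String input) {
        StringBuilder hex = new StringBuilder(12);
        for (char c : input.trim().toCharArray()) {
            if (c == ':' || c == '-' || c == '.') {
                continue; // delimiters are skipped wherever they appear
            }
            boolean isHex = (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'f')
                    || (c >= 'A' && c <= 'F');
            if (!isHex) {
                throw new IllegalArgumentException("Not a hex digit: " + c);
            }
            hex.append(Character.toLowerCase(c));
        }
        if (hex.length() != 12) {
            throw new IllegalArgumentException("Expected 12 hex digits: " + input);
        }
        // Re-group the digits as six colon-separated octets.
        StringBuilder out = new StringBuilder(17);
        for (int i = 0; i < 12; i += 2) {
            if (i > 0) {
                out.append(':');
            }
            out.append(hex, i, i + 2);
        }
        return out.toString();
    }
}
```

Because every accepted input is mapped to the same representation, two addresses can then be compared with String.equals on the normalized form.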

Optimized way of doing string.endsWith() work.

I need to inspect all web requests received by the application server to check whether the URL has an extension like .css, .gif, etc.
I looked at how Tomcat handles every request and picks the right configured Servlet to serve it:
CharChunk, MessageBytes, Mapper
Here is my idea for the implementation:
1. Load all the extensions we want to compare against and get their byte representation.
2. Get a unique value for each extension by summing up the bytes in the byte array // e.g. "css".getBytes()
3. Add the resulting value to a sorted list
4. Whenever we receive a request, get the byte representation of the URL // e.g. "flipkart.com/eshopping/images/theme.css".getBytes()
5. Start summing the bytes from the byte array's last index and stop when we encounter the "." byte value
6. Search the sorted list for the summed value // use binary search here
Kindly give your feedback about the implementation and any issues.
-With thanks, Krishna
This sounds way more complicated than it needs to be.
Use String.lastIndexOf to find the last dot in the URL
Use String.substring to get the extension based on that
Have a HashSet<String> for a set of supported extensions, or a HashMap<String, Whatever> if you want to map the extension to something else
I would be absolutely shocked to discover that this simple approach turned out to be a performance bottleneck - and indeed I suspect it would be more efficient than the approach you suggested, given that it doesn't require the entire URL to be converted into a byte array... (It's not clear why your approach uses byte arrays anyway instead of forming the hash from char values.)
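A minimal sketch of that simple approach follows. The class name StaticResourceFilter and the sample extension list are illustrative assumptions, and the sketch assumes the URL carries no query string or fragment after the file name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper illustrating the lastIndexOf + substring + HashSet approach.
class StaticResourceFilter {
    // Sample extensions only; load the real list from configuration.
    private static final Set<String> EXTENSIONS =
            new HashSet<>(Arrays.asList("css", "gif", "js", "png"));

    static boolean isStaticResource(String url) {
        int dot = url.lastIndexOf('.');
        int slash = url.lastIndexOf('/');
        // No dot at all, or the last dot belongs to the host name or a
        // directory rather than to the final path segment.
        if (dot < 0 || dot < slash) {
            return false;
        }
        String extension = url.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(extension);
    }
}
```

For example, isStaticResource("flipkart.com/eshopping/images/theme.css") is true, while a URL whose last segment has no dot is rejected without any set lookup.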
Fundamentally, my preferred approach to performance is:
- Do up-front design and testing around things which are hard to change later, architecturally
- For everything else:
  - Determine the performance criteria first so you know when you can stop
  - Write the simplest code that works
  - Test it with realistic data
  - If it doesn't perform well enough, use profilers (etc) to work out where the bottleneck is, and optimize that, making sure that you can prove the benefits using your existing tests

Represent short form of IPV6 Address

I have an IPv6 address string: 2001:1:0:0:10:0:10:10
I want to represent it as a short form of IPV6 string: 2001:1::10:0:10:10
Does anyone know a Java method to do this?
Since an address can be shortened in several different ways in some cases, there is probably no such function in the Java API. You can do it manually:
InetAddress.getByName("1080::8:800:200C:417A").getHostAddress().replaceFirst("(:0)+:", "::");
but I didn't test it very well. There might be some cases where this code is wrong.
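A more careful standard-library sketch is shown below. It works on the eight 16-bit groups instead of on the text, and compresses the longest run of two or more zero groups (the leftmost run on a tie), which is the rule RFC 5952 describes. The class name Ipv6Shortener is hypothetical, and the sketch assumes the input is an IPv6 literal rather than a host name; IPv4-mapped literals are rejected because InetAddress turns them into IPv4 addresses.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: shortens an IPv6 literal by compressing zero groups.
class Ipv6Shortener {
    static String compress(String ipv6) {
        byte[] bytes;
        try {
            bytes = InetAddress.getByName(ipv6).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not an IP literal: " + ipv6, e);
        }
        if (bytes.length != 16) {
            throw new IllegalArgumentException("Not an IPv6 address: " + ipv6);
        }
        // Combine the 16 bytes into eight 16-bit groups.
        int[] groups = new int[8];
        for (int i = 0; i < 8; i++) {
            groups[i] = ((bytes[2 * i] & 0xff) << 8) | (bytes[2 * i + 1] & 0xff);
        }
        // Find the longest run of zero groups; the leftmost run wins ties.
        int bestStart = -1;
        int bestLen = 0;
        for (int i = 0; i < 8; ) {
            if (groups[i] != 0) {
                i++;
                continue;
            }
            int j = i;
            while (j < 8 && groups[j] == 0) {
                j++;
            }
            if (j - i > bestLen) {
                bestStart = i;
                bestLen = j - i;
            }
            i = j;
        }
        if (bestLen < 2) {
            bestStart = -1; // a single zero group is left as "0"
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i == bestStart) {
                sb.append("::");
                i += bestLen - 1; // skip the rest of the compressed run
                continue;
            }
            if (sb.length() > 0 && sb.charAt(sb.length() - 1) != ':') {
                sb.append(':');
            }
            sb.append(Integer.toHexString(groups[i]));
        }
        return sb.toString();
    }
}
```

With the address from the question, compress("2001:1:0:0:10:0:10:10") yields 2001:1::10:0:10:10.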
The open-source IPAddress Java library provides numerous ways of producing strings for IPv4 and/or IPv6, including the canonical string for IPv6 matching RFC 5952. Disclaimer: I am the project manager of that library.
The method toCanonicalString() produces the canonical string, there is also a method toCompressedString() that is slightly different. With the canonical string a single segment of zero is not compressed, but toCompressedString() will compress such a segment. The method toNormalizedString() will not compress at all.
Using your example 2001:1:0:0:10:0:10:10 and one other address, here is sample code:
IPAddress addr = new IPAddressString("2001:1:0:0:10:0:10:10").getAddress();
System.out.println(addr.toNormalizedString());
System.out.println(addr.toCanonicalString());
System.out.println(addr.toCompressedString());
System.out.println();
addr = new IPAddressString("2001:db8:0:1:1:1:1:1").getAddress();
System.out.println(addr.toNormalizedString());
System.out.println(addr.toCanonicalString());
System.out.println(addr.toCompressedString());
Output:
2001:1:0:0:10:0:10:10
2001:1::10:0:10:10
2001:1::10:0:10:10
2001:db8:0:1:1:1:1:1
2001:db8:0:1:1:1:1:1
2001:db8::1:1:1:1:1
