We have a Java web service that receives a String representing a MAC address. We want to validate whether the given String actually matches the required format. Furthermore, we want to create a normalized form so that addresses are comparable.
I searched quite a while but found only some "loose regular expressions". We would really prefer to have a library that can parse different formats and return a normalized (String) representation (i.e. 01-23-45-67-89-ab and 01:23:45:67:89:ab would return the same representation and be comparable).
I expected to find some mature and well tested library, which could do that kind of task. Can anyone please point me to it? I just cannot believe that it doesn't exist yet.
I would be very thankful to not see any RegExes as possible solutions (we know how to do that if necessary).
The IPAddress Java library will do it. The javadoc is available at the link. Disclaimer: I am the project manager.
The library reads various common MAC address formats, such as aa:bb:cc:dd:ee:ff, aa-bb-cc-dd-ee-ff, and aabb.ccdd.eeff. It supports addresses that are 48 or 64 bits, and also lets you specify ranges of addresses, like aa-ff:bb:cc:*:ee:ff
Verify if an address is valid:
String str = "aa:bb:cc:dd:ee:ff";
MACAddressString addrString = new MACAddressString(str);
try {
    MACAddress addr = addrString.toAddress();
    ...
} catch(AddressStringException e) {
    // e.getMessage() describes the validation issue
}
The library is well tested, it has a test suite with thousands of tests.
mature and well tested library
To verify MAC addresses? It's 6 bytes in hex optionally separated by a delimiter. It's a homework assignment or light interview question, no need to write a library. My solution is 10 lines, and it's more paranoid than necessary...
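The ten-line solution mentioned above is not shown, so the following is only a sketch of what a hand-written, regex-free normalizer could look like. The class name MacNormalizer and the choice of lower-case, colon-separated output are assumptions made for illustration. It handles 48-bit addresses only and accepts exactly three layouts: 12 bare hex digits, pairs separated consistently by ':' or '-', and dotted groups of four.

```java
final class MacNormalizer {
    // Returns the address as lower-case hex pairs joined by ':'.
    // Throws IllegalArgumentException if the input is not a 48-bit MAC
    // in one of the three accepted layouts.
    static String normalize(String s) {
        if (s == null) {
            throw new IllegalArgumentException("MAC address is null");
        }
        int group; // number of hex digits between delimiters
        switch (s.length()) {
            case 12: group = 12; break; // 0123456789ab
            case 14: group = 4;  break; // 0123.4567.89ab
            case 17: group = 2;  break; // 01:23:45:67:89:ab or 01-23-45-67-89-ab
            default:
                throw new IllegalArgumentException("Unexpected length: " + s.length());
        }
        char delimiter = 0; // first delimiter seen; later ones must match it
        int digits = 0;
        StringBuilder out = new StringBuilder(17);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (i % (group + 1) == group) { // this position must hold a delimiter
                boolean allowed = (group == 4) ? c == '.' : (c == ':' || c == '-');
                if (!allowed || (delimiter != 0 && c != delimiter)) {
                    throw new IllegalArgumentException("Bad delimiter at index " + i);
                }
                delimiter = c;
                continue;
            }
            int value = Character.digit(c, 16);
            if (c > 'f' || value < 0) { // reject non-hex and non-ASCII digits
                throw new IllegalArgumentException("Bad hex digit at index " + i);
            }
            if (digits > 0 && digits % 2 == 0) {
                out.append(':');
            }
            out.append(Character.forDigit(value, 16)); // forDigit yields lower case
            digits++;
        }
        return out.toString();
    }
}
```

With this sketch, 01-23-45-67-89-AB and 01:23:45:67:89:ab both normalize to the same String, so equals() can compare them. It does not cover 64-bit addresses or ranges, which is where a library earns its keep.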
Looking around I was not able to find a good way to use libsvm with Java and I still have some open questions:
1) Is it possible to use only libsvm, or do I also have to use weka? If so, what's the difference?
2) When using String type data how can I pass the training set as Strings? I was using matlab for a similar problem for proteins classification and there I just gave the strings to the machine without problem. Is there a way to do this in Java?
Here is an incomplete example of what I did in matlab (it works):
[~,posTrain] = fastaread('dataset/1.25.1.3_d1ilk__.pos-train.seq');
[~,posTest] = fastaread('dataset/1.25.1.3_d1ilk__.pos-test.seq');
trainKernel = spectrumKernel(trainData,k);
testKernel = spectrumKernel(testData,k);
trainKf =[(1:length(trainData))', trainKernel];
testKf = [(1:length(testData))', testKernel];
disp('custom');
model = libsvmtrain(trainLabel,trainKf,'-t 4');
[~, accuracy, ~] = libsvmpredict(testLabel,testKf,model)
As you can see, I read the files in FASTA format and feed them to libsvm, but libsvm for Java looks like it wants something called Node that is made of doubles. What I did was take the byte[] from each String and then transform it into Doubles. Is that correct?
3) How to use a custom kernel? I've found this line of code
KernelManager.setCustomKernel(custom_kernel);
but I can't find it in my libsvm.jar. Which library do I have to use?
Sorry for the multiple questions, I hope you will give me a brief overview of what is going on here.
Thanks.
Please note that I've used LIBSVM for MATLAB, but not for Java. I can only really answer question 1, but hopefully this still helps:
It definitely is possible to use libsvm only, and the code is located here: https://www.csie.ntu.edu.tw/~cjlin/libsvm/. Note that jlibsvm is a port of libsvm, and it seems to be easier to use and more optimized for Java. As far as I can tell, weka just has a wrapper class that runs libsvm anyways (it even requires the libsvm.jar), though I mainly based it off of this: https://weka.wikispaces.com/LibSVM.
I have an IPv6 address string: 2001:1:0:0:10:0:10:10
I want to represent it as a short form of IPV6 string: 2001:1::10:0:10:10
Does any one know the java methods to do this?
Since an address can be shortened in several different ways in some cases, there is probably no such function in the Java API. You can do it manually:
Inet6Address.getByName("1080::8:800:200C:417A").getHostAddress().replaceFirst("(:0)+:", "::");
but I didn't test it very well. There might be some cases where this code is wrong.
The open-source IPAddress Java library provides numerous ways of producing strings for IPv4 and/or IPv6, including the canonical string for IPv6 matching RFC 5952. Disclaimer: I am the project manager of that library.
The method toCanonicalString() produces the canonical string, there is also a method toCompressedString() that is slightly different. With the canonical string a single segment of zero is not compressed, but toCompressedString() will compress such a segment. The method toNormalizedString() will not compress at all.
Here is sample code using your example, 2001:1:0:0:10:0:10:10, and one other address:
IPAddress addr = new IPAddressString("2001:1:0:0:10:0:10:10").getAddress();
System.out.println(addr.toNormalizedString());
System.out.println(addr.toCanonicalString());
System.out.println(addr.toCompressedString());
System.out.println();
addr = new IPAddressString("2001:db8:0:1:1:1:1:1").getAddress();
System.out.println(addr.toNormalizedString());
System.out.println(addr.toCanonicalString());
System.out.println(addr.toCompressedString());
Output:
2001:1:0:0:10:0:10:10
2001:1::10:0:10:10
2001:1::10:0:10:10
2001:db8:0:1:1:1:1:1
2001:db8:0:1:1:1:1:1
2001:db8::1:1:1:1:1
Are there any Java API(s) which will provide plural form of English words (e.g. cacti for cactus)?
Check Evo Inflector, which implements an English pluralization algorithm based on Damian Conway's paper "An Algorithmic Approach to English Pluralization".
The library is tested against data from Wiktionary and reports 100% success rate for 1000 most used English words and 70% success rate for all the words listed in Wiktionary.
If you want even more accuracy, you can take a Wiktionary dump and parse it to create a database of singular-to-plural mappings. Take into account that, due to the open nature of Wiktionary, some data there might be incorrect.
Example Usage:
English.plural("Facility", 1); // == "Facility"
English.plural("Facility", 2); // == "Facilities"
jibx-tools provides a convenient pluralizer/depluralizer.
Groovy test:
NameConverter nameTools = new DefaultNameConverter();
assert nameTools.depluralize("apples") == "apple"
assert nameTools.pluralize("apple") == "apples"
I know there is simple pluralize() function in Ruby on Rails, maybe you could get that through JRuby. The problem really isn't easy, I saw pages of rules on how to pluralize and it wasn't even complete. Some rules are not algorithmic - they depend on stem origin etc. which isn't easily obtained. So you have to decide how perfect you want to be.
For Java, have a look at ModeShape's Inflector class in the package org.modeshape.common.text. Or google for "inflector" and "randall hauch".
It's hard to find this kind of API. Instead, you may need to find a web service that can serve your purpose. Check this. I am not sure if this can help you.
(I tried to put word cacti and got cactus somewhere in the response).
If you can harness javascript, I created a lightweight (7.19 KB) javascript for this. Or you could port my script over to Java. Very easy to use:
pluralizer.run('goose') --> 'geese'
pluralizer.run('deer') --> 'deer'
pluralizer.run('can') --> 'cans'
https://github.com/rhroyston/pluralizer-js
BTW: It looks like cacti to cactus is a super special conversion (most ppl are going to say '1 cactus' anyway). Easy to add that if you want to. The source code is easy to read / update.
Wolfram|Alpha returns a list of inflected forms for a given word.
See this as an example:
http://www.wolframalpha.com/input/?i=word+cactus+inflected+forms
And here is their API:
http://products.wolframalpha.com/api/
I want to create an encryption scheme with Java. Is there any way to get the CPU ID, or anything else that is unique to a PC, such as the BIOS or ...
for example System.getCpuId(); it is just an example 😉
Thanks a lot ...
So you want a unique number (or string?) that identifies the user's computer? Or at least unique enough that the chance of a duplicate is very low, right?
You can get the Mac address of the network interface. This is making many assumptions, but it may be good enough for your needs:
final byte[] address = NetworkInterface.getNetworkInterfaces().nextElement().getHardwareAddress();
System.out.println("address = " + Arrays.toString(address));
This gives you an array of bytes. You can convert that to an id in several ways... like as a hex string.
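As a sketch of the hex-string conversion mentioned above: the class name MachineId and both method names are made up for illustration. It scans every interface instead of taking only the first one, because getHardwareAddress() returns null for interfaces such as loopback or when access is denied.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

final class MachineId {
    // Formats raw bytes as lower-case hex, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Returns the hex form of the first hardware address that is visible,
    // or null if no interface exposes one.
    static String firstHardwareAddress() throws SocketException {
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return null; // some platforms report no interfaces this way
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            byte[] mac = nic.getHardwareAddress(); // may be null
            if (mac != null && mac.length > 0) {
                return toHex(mac);
            }
        }
        return null;
    }
}
```

The order in which interfaces are enumerated is not guaranteed to be stable, so the "first" address may differ between runs on a machine with several adapters.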
Expect support requests, though, when people replace bits of hardware in their computer.
I think such OS specific command is not available in Java.
This link shows a way to run it on windows.
You can't (reliably) get hardware information in pure Java. You would have to use JNA or JNI. Can you clarify what kind of encryption system you're building, and why you need the hardware info?
EDIT: Steve McLeod has noted that Java has a NetworkInterface.getHardwareAddress() method. However, there are serious caveats, including the fact that not all Java implementations allow access to it, and MAC addresses can be trivially forged.
You should also consider that a machine can have more than one CPU/NIC/whatever and thus more than one ID.
If you need a unique ID, you can use a UUID:
import java.util.UUID;
public class GenerateUUID {
    public static final void main(String... aArgs){
        //generate random UUIDs
        UUID idOne = UUID.randomUUID();
        UUID idTwo = UUID.randomUUID();
        log("UUID One: " + idOne);
        log("UUID Two: " + idTwo);
    }
    private static void log(Object aObject){
        System.out.println( String.valueOf(aObject) );
    }
}
Example run :
>java -cp . GenerateUUID
UUID One: 067e6162-3b6f-4ae2-a171-2470b63dff00
UUID Two: 54947df8-0e9e-4471-a2f9-9af509fb5889
There's no way to get hardware information directly with Java without some JNA/JNI library. That said, you can get "somewhat unique, system-specific values" with System.getenv(). For instance,
System.getenv("COMPUTERNAME")
should return the computer's name on a Windows system. This is, of course, highly unportable. And the values can change with time on the same system. Or be the same on different systems. Oh, well...
What you are really looking for is a good entropy source, but I would actually suggest you investigate the Java Cryptography Architecture, as it provides a framework for this, so you can concentrate on your actual algorithm.
http://java.sun.com/javase/6/docs/technotes/guides/security/crypto/CryptoSpec.html
I want to come up with a binary format for passing data between application instances in a form of POFs (Plain Old Files ;)).
Prerequisites:
should be cross-platform
information to be persisted includes a single POJO & arbitrary byte[]s (files actually; the POJO stores their names in a String[])
only sequential access is required
should be a way to check data consistency
should be small and fast
should prevent an average user with archiver + notepad from modifying the data
Currently I'm using DeflaterOutputStream + OutputStreamWriter together with InflaterInputStream + InputStreamReader to save/restore objects serialized with XStream, one object per file. Readers/Writers use UTF8.
Now I need to extend this to support the requirements described above.
My idea of format:
{serialized to XML object}
{delimiter}
{String file name}{delimiter}{byte[] file data}
{delimiter}
{another String file name}{delimiter}{another byte[] file data}
...
{delimiter}
{delimiter}
{MD5 hash for the entire file}
Does this look sane?
What would you use for a delimiter and how would you determine it?
The right way to calculate MD5 in this case?
What would you suggest to read on the subject?
TIA.
It looks INsane.
why invent a new file format?
why try to prevent only stupid users from changing file?
why use a binary format (hard to compress)?
why use a format that cannot be parsed while being received? (receiver has to receive entire file before being able to act on the file. )
XML is already a serialization format that is compressible. So you are serializing a serialized format.
Would serialization of the model (if you are into MVC) not be another way? I'd prefer to use things in the language (or standard libraries) rather than roll my own if possible. The only issue I can see with that is that the file size may be larger than you want.
1) Does this look sane?
It looks fairly sane. However, if you are going to invent your own format rather than just using Java serialization then you should have a good reason. Do you have any good reasons (they do exist in some cases)? One of the standard reasons for using XStream is to make the result human readable, which a binary format immediately loses. Do you have a good reason for a binary format rather than a human readable one? See this question for why human readable is good (and bad).
Wouldn't it be easier just to put everything in a signed jar? There are already standard Java libraries and tools to do this, and you get compression and verification provided.
2) What would you use for a delimiter and how determine it?
Rather than a delimiter I'd explicitly store the length of each block before the block. It's just as easy, and prevents you having to escape the delimiter if it comes up on its own.
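A minimal sketch of the length-prefix idea, using DataOutputStream and DataInputStream from the standard library. The class name BlockFormat and the exact layout (a block count, then for each block a name, a length, and the raw bytes) are assumptions for illustration, not a format taken from the question.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

final class BlockFormat {
    // Layout: int blockCount, then per block: UTF name, int length, raw bytes.
    static void write(OutputStream raw, Map<String, byte[]> files) throws IOException {
        DataOutputStream out = new DataOutputStream(raw);
        out.writeInt(files.size());
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            out.writeUTF(e.getKey());          // writeUTF stores its own 2-byte length
            out.writeInt(e.getValue().length); // explicit length instead of a delimiter
            out.write(e.getValue());
        }
        out.flush();
    }

    // Reads the blocks back sequentially, preserving their order.
    static Map<String, byte[]> read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int count = in.readInt();
        Map<String, byte[]> files = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String name = in.readUTF();
            byte[] data = new byte[in.readInt()];
            in.readFully(data);                // blocks until the whole payload is read
            files.put(name, data);
        }
        return files;
    }
}
```

Because the reader always knows how many bytes to consume, the payload may contain any byte sequence without escaping. A real reader should sanity-check the lengths before allocating, since a corrupted file could claim a huge block.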
3) The right way to calculate MD5 in this case?
There is example code here which looks sensible.
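The linked example is not reproduced here, so the following is a separate sketch using java.security.MessageDigest. The class name Md5Trailer and the methods seal and verify are invented for illustration. It assumes the digest covers every byte that precedes it and is appended as the last 16 bytes, since a hash cannot cover itself.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class Md5Trailer {
    private static final int MD5_LENGTH = 16; // an MD5 digest is always 16 bytes

    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must support
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns payload followed by the MD5 digest of the payload.
    static byte[] seal(byte[] payload) {
        byte[] file = Arrays.copyOf(payload, payload.length + MD5_LENGTH);
        System.arraycopy(md5(payload), 0, file, payload.length, MD5_LENGTH);
        return file;
    }

    // Recomputes the digest over everything before the trailer and compares.
    static boolean verify(byte[] file) {
        if (file.length < MD5_LENGTH) {
            return false;
        }
        int payloadLength = file.length - MD5_LENGTH;
        byte[] expected = md5(Arrays.copyOf(file, payloadLength));
        byte[] actual = Arrays.copyOfRange(file, payloadLength, file.length);
        return MessageDigest.isEqual(expected, actual);
    }
}
```

For large files, the same digest can be computed incrementally with MessageDigest.update() or by wrapping the stream in java.security.DigestOutputStream, so the payload never has to sit in memory as one array.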
4) What would you suggest to read on the subject?
On the subject of serialization? I'd read about the Java serialization, JSON, and XStream serialization so I understood the pros and cons of each, especially the benefits of human readable files. I'd also look at a classic file format, for example from Microsoft, to understand possible design decisions from back in the days that every byte mattered, and how these have been extended. For example: The WAV file format.
Let's see, this should be pretty straightforward.
Prerequisites:
0. should be cross-platform
1. information to be persisted includes a single POJO & arbitrary byte[]s (files actually, the POJO stores it's names in a String[])
2. only sequential access is required
3. should be a way to check data consistency
4. should be small and fast
5. should prevent an average user with archiver + notepad from modifying the data
Well, guess what: you pretty much have it already. It's built into the platform: Object Serialization.
If you need to reduce the amount of data sent over the wire, you can provide custom serialization (for instance, you can send only 1, 2, 3 for a given object, without attribute names or anything similar, and read them back in the same sequence) using this somewhat "hidden feature".
If you really need it as plain text, you can also encode it; it takes almost the same number of bytes.
For instance this bean:
import java.io.*;
public class SimpleBean implements Serializable {
    private String website = "http://stackoverflow.com";
    public String toString() {
        return website;
    }
}
Could be represented like this:
rO0ABXNyAApTaW1wbGVCZWFuPB4W2ZRCqRICAAFMAAd3ZWJzaXRldAASTGphdmEvbGFuZy9TdHJpbmc7eHB0ABhodHRwOi8vc3RhY2tvdmVyZmxvdy5jb20=
See this answer
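A text form like the one above can be produced by serializing the object to a byte array and Base64-encoding the result. This is a sketch assuming Java 8 or later for java.util.Base64; the class name SerializedText and its two methods are made up for illustration and are not taken from the linked answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

final class SerializedText {
    // Serializes the object with Java serialization, then Base64-encodes the bytes.
    static String encode(Serializable object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverses encode(): Base64-decodes the text and deserializes the object.
    static Object decode(String text) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

Every Java serialization stream begins with the same header bytes, which is why such strings start with "rO0AB". Deserializing data from an untrusted source is risky, so decode() should only be fed files the application wrote itself.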
Additionally, if you need a sound protocol, you can also check out Protobuf, Google's internal exchange format.
You could use a zip (rar / 7z / tar.gz / ...) library. Many exist, most are well tested, and it'll likely save you some time.
Possibly not as much fun though.
I agree in that it doesn't really sound like you need a new format, or a binary one.
If you truly want a binary format, why not consider one of these first:
Binary XML (fast infoset, Bnux)
Hessian
Google Protocol Buffers
But besides that, many textual formats should work just fine (or perhaps better) too; easier to debug, extensive tool support, compresses to about same size as binary (binary compresses poorly, and information theory suggests that for same effective information, same compression rate is achieved -- and this has been true in my testing).
So perhaps also consider:
Json works well; binary support via base64 (with, say, http://jackson.codehaus.org/)
XML not too bad either; efficient streaming parsers, some with base64 support (http://woodstox.codehaus.org/, "typed access API" under 'org.codehaus.stax2.typed.TypedXMLStreamReader').
So it kind of sounds like you just want to build something of your own. Nothing wrong with that, as a hobby, but if so you need to consider it as such.
It likely is not a requirement for the system you are building.
Perhaps you could explain how this is better than using an existing file format such as JAR.
Most standard file formats of this type just use a CRC, as it's faster to calculate. MD5 is more appropriate if you want to prevent deliberate modification.
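For comparison, a CRC is a one-liner with the standard library. This sketch uses java.util.zip.CRC32; the wrapper class Crc is only there for illustration.

```java
import java.util.zip.CRC32;

final class Crc {
    // Returns the CRC-32 of the data as an unsigned 32-bit value in a long.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }
}
```

The same CRC32 object can be fed chunk by chunk with repeated update() calls, or attached to a stream with java.util.zip.CheckedOutputStream, which suits the sequential-access requirement.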
Bencode could be the way to go.
Here's an excellent implementation by Daniel Spiewak.
Unfortunately, bencode spec doesn't support utf8 which is a showstopper for me.
Might come to this later but currently xml seems like a better choice (with blobs serialized as a Map).