I have a Jersey OAuth provider that uses HmacSHA1 for signing/verifying requests. This works on my development & test platforms, where the client & server are separate physical systems. However, when I move to a production platform, the HmacSHA1 algorithm on the provider side returns a different value than the HmacSHA1 algorithm on the client side using the same params & secret, and my OAuth validation fails.
The JDK (1.6.x) is the same exact version on both the provider and client for all platforms.
When I switched my oauth provider & client to use the PLAINTEXT signature method (bad for security, I know), it works on all platforms.
When I dug into the jersey OAuthSignature.verify() method, it calls the signature method's (HmacSHA1 or PLAINTEXT) verify function, which simply signs the oauth elements with the secret and compares the value against the signature passed in.
For HmacSHA1, the method calls the Base64.encode() method to generate the signature, but for PLAINTEXT no encoding is done (as expected).
What could be causing the Base64.encode() of the HmacSHA1 signature to produce different results on the two systems when the same params & secret are used?
Thanks in advance!
--TK
One educated guess: platform default encodings differ (quite common; some platforms use ISO-8859-1, others UTF-8, Windows maybe CP-1250 or whatever). If the OAuth library in question has the newbie bug of not specifying an encoding when converting between byte[] and String, and the data contains characters that encode differently under different encodings (usually anything outside the 7-bit ASCII range, characters 0 - 127), you will end up with different signatures.
So: check what the platform default encoding is, and force it to be the same on both sides first. If this solves the issue, I would consider reporting it as a bug to the OAuth lib (or the framework that bundles it) author(s), or at least asking on the mailing lists.
I have seen such bugs (calling String.getBytes() without an encoding argument) VERY often -- it is one of the most common Java anti-patterns in existence. The worst part is that it only causes issues under specific circumstances, so people are not bitten badly enough to fix it.
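A minimal sketch of what to check and of the anti-pattern versus the fix (the class name and sample string are just for illustration; this also runs on a 1.6 JDK):

import java.nio.charset.Charset;

public class CharsetCheck {
    public static void main(String[] args) throws Exception {
        // See what the platform default actually is on each machine
        System.out.println("default charset: " + Charset.defaultCharset());

        String baseString = "caf\u00e9"; // 'é' is outside the 7-bit ASCII range

        // Anti-pattern: the result depends on the platform default encoding
        byte[] platformDependent = baseString.getBytes();

        // Fix: always name the encoding explicitly
        byte[] stable = baseString.getBytes("UTF-8");

        System.out.println("default-encoded length: " + platformDependent.length);
        System.out.println("UTF-8-encoded length:   " + stable.length);
    }
}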
Another potential issue is URL encoding -- handling of certain characters (space, %, +) can differ between implementations due to subtle encoding/decoding bugs. So check whether the content you are passing has 'special' characters; see if eliminating them (for testing) makes a difference, and zero in on what triggers it.
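One concrete example of such a difference (not necessarily the bug in your library, just an illustration): java.net.URLEncoder does form-style encoding, where a space becomes '+', while OAuth expects RFC 3986 percent-encoding, where a space becomes '%20':

import java.net.URLEncoder;

public class EncodingDiff {
    public static void main(String[] args) throws Exception {
        String param = "hello world+100%";

        // Form-style encoding: space -> '+', '+' -> '%2B', '%' -> '%25'
        String formStyle = URLEncoder.encode(param, "UTF-8");

        // Rough RFC 3986 style expected by OAuth: space -> '%20'
        // (only an illustration; '*' and '~' are also handled differently)
        String oauthStyle = formStyle.replace("+", "%20");

        System.out.println(formStyle);   // hello+world%2B100%25
        System.out.println(oauthStyle);  // hello%20world%2B100%25
    }
}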
Context
I am dealing with a PHP system that uses an updated bcrypt algorithm (as there's been a known vulnerability in the underlying implementation).
So PHP's password_hash function now generates hashes prefixed with $2y$, as the old ones (prefixed with $2a$) came from the vulnerable implementation.
Spring Security's BCrypt that I use in another Java system generates the original $2a$ format hashes, as its underlying implementation (jBCrypt instead of C BCrypt as mentioned in this SO post) wasn't vulnerable to the same attack.
Problem
Checking PHP-generated hashes in Spring Security doesn't work. Is there a way to check PHP-generated hashes using Spring Security?
Example
php > $pwd = password_hash('foo', PASSWORD_BCRYPT, ['cost' => 12]);
php > echo $pwd;
$2y$12$TRc5ZjcmDJ8oFaoR1g7LD.RCxBTUZnGXB66EN9h9rKtNWg.hd7ExK
then using Java + Spring Security:
@Test
public void decryptsPhpHash() {
    boolean result = BCrypt.checkpw("foo", "$2y$12$TRc5ZjcmDJ8oFaoR1g7LD.RCxBTUZnGXB66EN9h9rKtNWg.hd7ExK");
    assertThat(result).isTrue();
}
throws the following error:
java.lang.IllegalArgumentException: Invalid salt revision
As far as I know, PHP just changed the character a to y to distinguish its fixed implementation. Only PHP made this prefix change, so maybe just changing the y back to an a solves this issue.
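A minimal sketch of that prefix swap against Spring Security's BCrypt (the hash is the one from the example above; note that this only works because the password here is plain ASCII, which is unaffected by the original crypt_blowfish bug):

import org.springframework.security.crypto.bcrypt.BCrypt;

public class PhpHashCheck {
    public static void main(String[] args) {
        String phpHash = "$2y$12$TRc5ZjcmDJ8oFaoR1g7LD.RCxBTUZnGXB66EN9h9rKtNWg.hd7ExK";

        // jBCrypt only recognizes the $2a$ revision, so rewrite the prefix
        String rewritten = "$2a$" + phpHash.substring(4);

        System.out.println(BCrypt.checkpw("foo", rewritten)); // expected: true
    }
}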
In June 2011, a bug was discovered in crypt_blowfish, the BCrypt implementation used by PHP. It was mis-handling characters with the 8th bit set. They suggested that system administrators update their existing password databases, replacing $2a$ with $2x$, to indicate that those hashes are bad (and need to use the old broken algorithm). They also suggested having crypt_blowfish emit $2y$ for hashes generated by the fixed algorithm.
Nobody else, including canonical OpenBSD, adopted the idea of 2x/2y. This version marker change was limited to crypt_blowfish.
https://en.wikipedia.org/wiki/Bcrypt
I am developing an application in which clients (written in multiple languages - Go, C++, Python, C#, Java, Perl and possibly more in the future) submit protobuf (and in some cases, JSON) messages to SQS. At the other end, the messages are read and decoded by Python and Go clients - depending on the message type. Boto seems to automatically encode the messages into base64, but other language libraries don't seem to do so. Or maybe there are some other rules?
Boto does have an option to submit raw messages.
What is the expected behavior here? Am I supposed to encode messages into base64 on my own - which makes boto an odd case - or am I missing something?
This has caused some subtle bugs in my application because of an extra layer of base64 encoding or decoding. As far as I know, there is no idiomatic way to detect whether a message is base64 encoded or not. The best option is to try to decode and see if it throws an exception - something I don't really like.
I tried to look for some documentation, but couldn't find anything with clear guidelines. Maybe I was looking at the wrong places?
Thanks in advance for any pointers.
You probably want to encode your messages in some form, because SQS does not accept every possible byte combination in the message payload at the API level. Only valid UTF-8 text restricted to the character ranges quoted below (tab, newline, carriage return, and most printable characters) is supported.
Important
The following list shows the characters (in Unicode) allowed in your message, according to the W3C XML specification. For more information, go to http://www.w3.org/TR/REC-xml/#charsets If you send any characters not included in the list, your request will be rejected.
#x9 | #xA | #xD | [#x20 to #xD7FF] | [#xE000 to #xFFFD] | [#x10000 to #x10FFFF]
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_SendMessage.html
The base64 alphabet clearly falls within this range, so a base64-encoded message can never be rejected as invalid. Of course, it also bloats your payload, since base64 expands every 3 bytes of the original message into 4 bytes of output (with 64 symbols, each output byte carries only 6 bits of usable information: 3 x 8 → 4 x 6).
Presumably boto automatically base64-encodes and decodes messages for you in order to be "helpful."
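If you do settle on base64 for binary payloads (protobuf, say), doing the encoding explicitly on both ends keeps every client on the same convention regardless of SDK; a minimal Java sketch using java.util.Base64 (Java 8+), independent of any particular SQS API call:

import java.util.Base64;

public class SqsPayloadCodec {
    // Producer side: turn arbitrary bytes (e.g. a serialized protobuf) into
    // a message body containing only characters SQS will accept.
    static String encodeBody(byte[] rawMessage) {
        return Base64.getEncoder().encodeToString(rawMessage);
    }

    // Consumer side: reverse the same convention.
    static byte[] decodeBody(String messageBody) {
        return Base64.getDecoder().decode(messageBody);
    }

    public static void main(String[] args) {
        byte[] fakeProtobuf = {0x08, (byte) 0x96, 0x01}; // arbitrary binary bytes
        String body = encodeBody(fakeProtobuf);
        System.out.println(body);                    // CJYB
        System.out.println(decodeBody(body).length); // 3
    }
}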
But there is no reason why base64 has to be used at all.
An example that comes to mind... valid JSON would also comply with the restricted character ranges supported by SQS payloads. (Theoretically, I guess, JSON could be argued not to be an "encoding," but that would be a bit pedantic).
There is no clean way to determine whether a message needs to be decoded more than once, other than the sketchy one you proposed, but the argument could be made that if you are in a situation where the need to decode is ambiguous, then that ambiguity is what should be eliminated.
If boto's behavior weren't documented and there were no way to make it behave otherwise, I'd say it is wrong behavior. But, as it is, I'll have to relent a bit and say it's just unusual.
This is a fundamental question about how Java works, so I don't have any code to support it.
I am new to Java development and want to know how the different number systems and character sets like UTF-8 and Unicode come together in Java.
Let's say a user creates a new string and an int with the same value:
int i=100;
String S="100";
The hardware of a computer understands only zeros and ones, so these have to be converted to binary (correct me if I'm wrong). This conversion should be done by the JVM (correct me if I'm wrong)? And to represent characters of languages beyond what can be typed on an (English) keyboard, encodings such as UTF-8 are used (correction needed)?
Now how does this whole flow fit into the bigger picture of running a Java web application?
How does a string/int get converted to binary for the machine's hardware to understand?
How does it get converted to UTF-8 for a browser to understand?
And what are the default number format and character set in Java? If I'm reading the contents of a file, will it be read as binary or UTF-8?
All computers run in binary. The conversion is done by the JVM and the computer that you have; you shouldn't worry about converting the code into the corresponding 1's and 0's. The browser has its own conversion code to change the universal 1's and 0's (used by all programs and computer software) into however it decides to display the given information. All languages are just a translation guide for the user to "speak" with the computer, and vice versa. Hope this helps, though I don't think I really answered anything.
How Java represents any data type in memory is the choice of the actual JVM. In practice, the JVM will choose the format native to the processor (e.g. choose between little/big endian for int), simply because it offers the best performance on that platform.
Basically, the JLS makes certain guarantees (like that a byte has 8 bits and the values range from -128 to 127) - the VM just maps that to the platform as it deems suitable (the JLS was specified to match common computing technology closely, so there is usually no magic needed to guess how primitive types map to the platform).
You should never care how the VM represents data in memory, java does not offer any legal way to access the data in a manner where you would need to know (bypassing most of the VM's logic by using sun.misc.Unsafe is not considered legal).
If you care for educational purposes, learn what binary representations the underlying platform (e.g. x86) uses and take a look at the VM. It has little to do with Java really; it's all VM- and platform-specific.
For java.lang.String, it's the implementation of the class that defines how the String is stored internally - it went through quite some changes over major Java versions - but what that String exposes is quite narrowly defined (see the JDK javadoc for String.length(), String.charAt()).
As for how user input is translated to Java standard types, that's actually platform-specific. The JVM selects the default encoding (e.g. String.getBytes() can return quite different results for the same string, depending on the platform - that's why it's recommended to explicitly specify the desired encoding). The same goes for many other things (time zone, number format etc.).
CharSets and Formats are the building blocks a program wires up to translate data from the outside world (file, HTTP or user input) into Java's representation of data (or vice versa). For example, a web application will use the encoding from an HTTP header to determine which CharSet to use when interpreting the contents (the HTTP headers themselves are defined by the spec to be US-ASCII).
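As a small illustration of that wiring: when reading a file, the bytes on disk are just bytes, and it is only the charset you pass to the decoding step that turns them into characters. A rough sketch (the file name is a placeholder):

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadWithCharset {
    public static void main(String[] args) throws Exception {
        Path file = Paths.get("input.txt"); // placeholder path

        // Bytes as stored on disk - no character set involved yet
        byte[] raw = Files.readAllBytes(file);

        // Decoding those bytes into text requires choosing a charset;
        // relying on the platform default is what causes surprises.
        String decodedExplicitly = new String(raw, StandardCharsets.UTF_8);

        // Equivalent, reading line by line with an explicit charset
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            System.out.println(reader.readLine());
        }

        System.out.println(decodedExplicitly.length());
    }
}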
I would like to know if there is a multi-language library or something that permits me to get the following result:
I have a String = "Abcde12345" in Java.
We will suppose its hashcode in Java is "78911".
I have a String = "Abcde12345" in a C program.
What I'd like to know is: how can I easily get the hashcode 78911 in my C program?
Since each language can provide its own hash algorithm for a String, how can I handle that?
I'm asking this in the context of using Distributed Hash Tables (datagrids, distributed caches, NoSQL...). I'm planning to create something similar to a very simple client in C for a Java proprietary datagrid.
This is my use case for now, but for my project I will need a hash algorithm compatible across multiple languages:
- Java hash algorithm in Ruby
- C# hash algorithm in Java
- C++ hash algorithm in Java
- Java hash algorithm in C++
- Java hash algorithm in Erlang
In any case, the algorithm, implemented in both languages, will need to produce the exact same hash value.
And if possible, I'd like to extend the concept to primitive types and "simple structures", not just Strings.
Does anyone know of any tool to handle my use case?
Edit: for Jim Balter
My use case is:
I have a proprietary partitioning/datagrid technology called GemFire, written in Java.
It acts as a distributed hashmap.
The number of buckets in the hashmap is fixed.
For each map key, it computes the key's hashcode and applies a modulo, so that it knows which bucket each key belongs to.
For example, if I have 113 buckets (which is the default number of buckets in GemFire), and my map key is the String "Key":
"Key".hashCode() % 113 = 69
Thus GemFire knows "Key" belongs to the 69th bucket.
Now i have a C application:
This application is already aware of the number of buckets used by Gemfire (113).
This application needs to be able to compute, for any random key, the bucket number in which GemFire would put that random key.
This application needs to be able to compute it fast; we can't use a webservice.
This application should be easy to deploy, and I don't want any bridge technology between C/Java that would require a JVM to be installed to run the C application.
So if you know how to do that without having to write/use a Java hashcode port in C, please tell me.
Edit: to avoid confusion: I'm not looking for anything else, but Jim Balter, you suggested I do not need what I claim to need, so tell me if you see any other solution, apart from using, as you said, a custom or popular hash algorithm.
And in the future I may need to do the same for an Erlang partitioning application with a C# client application, and other languages!
Edit: I would like to avoid using a non-Java hash algo (as someone suggested using md5/sha1 or any faster non-security-oriented hash algo). This is because my solution aims to be deployed on legacy distributed systems, often written in Java, which already contain a lot of data, and any change in the hash algorithm would require a heavy migration process for that data.
However, I keep this solution in mind since it could be a sweet second option for people starting a new distributed system from scratch or ready to do their data migration.
So in the end, what I am looking for is not for people to tell me to implement the Java String hash algorithm in C; I already know I can do that, thanks! I want to know if someone already did it - and not only ports of the Java primitive hash algorithms in C, but also in other languages, and from other languages! I'm looking for a multi-language library that provides, for each language, ports of the other languages' hash algorithms.
Thus, if there were only 3 languages on earth (C, Java and Python), my question is: is there any polyglot library that provides:
A port of Java hash in C
A port of Java hash in Python
A port of C hash in Java
A port of C hash in Python
A port of Python hash in Java
A port of Python hash in C
For all available primitive types, and possibly basic structures as well.
If for a given language there is no "default hash algorithm" then the most widely used can be considered as the language algorithm.
You see what I mean?
I want to know if there is a LIBRARY! I know I can look in the JDK or the specification and implement it on my own, but as I'm targeting a large number of languages and I don't know how to code in every language, I'd like someone to have done it for me and made it available as an open-source, free-to-use project!
I would add that you can browse the source code of OpenJDK and see the hashCode implementation. However, bear in mind that, as noted in the comment by Jim Garrison, different classes may override hashCode, so you will have to follow the implementation. For hashing Strings I would suggest using well-known hash functions such as SHA-1 or maybe MD5 - you can find implementations in Java, C/C++ and other programming languages.
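If a well-known hash is acceptable, every language then only has to agree on the input encoding; a rough sketch of deriving a bucket number from SHA-1 in Java (the bucket count and folding scheme are arbitrary choices for illustration, not anything GemFire does):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class PortableHash {
    static int bucketFor(String key, int bucketCount) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest(key.getBytes(StandardCharsets.UTF_8));

        // Fold the first four digest bytes into a non-negative int
        int folded = ((digest[0] & 0xFF) << 24)
                   | ((digest[1] & 0xFF) << 16)
                   | ((digest[2] & 0xFF) << 8)
                   |  (digest[3] & 0xFF);
        return (folded & 0x7FFFFFFF) % bucketCount;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(bucketFor("Key", 113));
    }
}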
The algorithm for calculating the hash code of a Java string is quite simple and is documented as part of the public specification: http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#hashCode()
The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
Note also that String is a final class so its methods cannot be overridden; thus, you are guaranteed that the given algorithm is correct for any Java String.
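For reference, a minimal Java sketch of that formula written out explicitly, so it can be ported line-for-line to C or any other language, together with the modulo from the question (whether GemFire uses exactly this bucket computation is the asker's claim, not something verified here):

public class JavaStringHash {
    // Same formula as java.lang.String.hashCode(): h = s[0]*31^(n-1) + ... + s[n-1],
    // using 32-bit signed int arithmetic over the UTF-16 code units of the string.
    static int javaStringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String key = "Key";
        System.out.println(javaStringHash(key));      // 75327
        System.out.println(key.hashCode());           // 75327, identical
        System.out.println(javaStringHash(key) % 113); // 69, the bucket from the question
    }
}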
For languages other than Java, if the language does not specify the hash algorithm (and Java is unusual in doing so), then you cannot be sure that the hash algorithm won't change, even if you can ascertain it. I suspect that you do not actually need what you claim you need, but you would have to say more about your requirements (as opposed to what you think would address them).
I'm developing an HTTP API that requires encryption. I have tried to use AES to get compatibility between Java, PHP and JavaScript, but so far I have managed to get Java<->PHP and then Java<->JavaScript working, but not both PHP and JavaScript at the same time.
Has anyone had any experience with achieving interoperability between these languages and more?
Any advice would be much appreciated.
Thanks
To get AES to work across different systems, you have to make sure that everything is the same on all systems. That means not relying on system defaults for anything - defaults can differ between systems. You need to explicitly specify everything (see the sketch after this list):
specify the mode; use CBC or CTR.
specify the IV. You can prepend it to the cyphertext.
specify the padding; for AES use PKCS7.
if your key is a text string then specify the character encoding used to convert it to bytes.
if your plaintext is a text string then specify the character encoding used to convert it to bytes.
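A minimal Java sketch with every one of those choices spelled out (the key and plaintext are placeholders; a real system would derive the key with a proper KDF rather than from a literal string):

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class ExplicitAes {
    public static void main(String[] args) throws Exception {
        // Everything is spelled out: key bytes, charset, mode, padding, IV.
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8); // 16 bytes = AES-128, demo key only
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        // Java's "PKCS5Padding" is what other libraries usually call PKCS7 for AES
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");

        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal("hello interop".getBytes(StandardCharsets.UTF_8));

        // Prepend the IV to the ciphertext so the other side can read it back
        byte[] message = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, message, 0, iv.length);
        System.arraycopy(ciphertext, 0, message, iv.length, ciphertext.length);

        // Decrypt side: split the IV off again and use the exact same parameters
        byte[] ivOut = Arrays.copyOfRange(message, 0, 16);
        byte[] body = Arrays.copyOfRange(message, 16, message.length);
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(ivOut));
        System.out.println(new String(cipher.doFinal(body), StandardCharsets.UTF_8));
    }
}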
AES is a standard (defined here). No matter which programming language you use, the result has to be the same.
Check some test vectors either from the official definition or - if you've already implemented a block mode of operation - from here.
If your implementation produces a different result, it might work, but it won't be AES...