Exposing Entity IDs of Google Datastore data - java

Is it safe to expose the entity IDs of data stored in Google Datastore?
For example, in my code I have an entity with this id:
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
@Extension(vendorName="datanucleus", key="gae.encoded-pk", value="true")
private String id;
The id is going to be similar to this: agptZeERtzaWYvSQadLEgZDdRsUYRs
Can anyone extract a password, the application URL, or any other information from this string? What is the meaning of that string?

That entity ID contains the object id, application id, and object class name. It's just an encoded string, so it is not really any sort of security risk.

You can use KeyFactory to convert between a Key and its encoded string with keyToString() and stringToKey(). As far as I know, the ID is a unique id for the data stored in Google App Engine. As the Google App Engine documentation puts it:
Key instances can be converted to and from the encoded string representation using the KeyFactory methods keyToString() and stringToKey(). When using encoded key strings, you can provide access to an object's string or numeric ID with an additional field.
I hope it helps.
Tiger.

If you navigate to localhost:4321/_ah/admin, you can take advantage of the SDK datastore viewer, where you will see that every kind of entity has a KEY field and a NAME/ID field.
Whether you use long, String or Key as your @PrimaryKey, there will be an ID/Name column with a string or number, and a KEY column with the encoded key for that ID. As mentioned in other posts, this encoding combines your appspot application id, the class name of the data object, and whatever you specify as the @PrimaryKey. Treat it as a reversible encoding rather than a one-way hash such as MD5.
The only time you will want direct access to this field is when you don't care what the data is named {your program needs to find it, but humans won't be searching for it by typing guesses into a text box}, or when you WANT to have multiple objects of the same type and name {maybe distinguished by a version int?}; in those cases you should use the encoded key syntax. Both KEY and ID are present in the db whether or not you put a field in your class; using the encoded key syntax just gives you access to this value.
Also, there is a speed bonus available to applications that use encoded keys. There are only two types of queries: SELECT * and SELECT __key__ {two underscores on each side}. For large data sets in AJAX apps, the only efficient way to paginate data is to select all the keys, send them to the client, and have the client ask for records 0->X, build links for the other X->Y results, and query the server with the first set of encoded keys for full data. The client then parses the response into nice little lists, and you avoid loading 397 server data objects that aren't immediately useful.
Sending encoded keys up and down the wire might take a little more bandwidth than unencoded keys {unless you're as long winded at naming things as I am!}, but it shaves CPU cycles on App Engine, makes your quotas happier, and everybody's app runs just a fraction faster!
This key, even if decoded, will only expose data as sensitive as whatever you make a PrimaryKey. Your app password is not involved, nor will user passwords be in any sane data model. About the only thing that might {BIG might} leak is a user email address, if you use the provided User class for authentication, or the class names you use in your source.
...Basically, only information already available in watching a firebug request or two could possibly be exposed.

Related

FHIR Resource logical id generation using SHA256

I am trying to implement code that generates a FHIR message from some type of input message. When I create each FHIR resource, I need to create a resource logical id that is unique and can be generated repeatedly.
In Microsoft's FHIR-Converter GitHub repository, I found that they use SHA256 to hash the input string value to generate some type of 64 character id. I used the same approach to generate a UUID in Java. Here is the code from Microsoft FHIR-Converter in .NET:
public static string GenerateUUID(string input)
{
    if (string.IsNullOrWhiteSpace(input))
    {
        return null;
    }

    var bytes = Encoding.UTF8.GetBytes(input);
    var algorithm = SHA256.Create();
    var hash = algorithm.ComputeHash(bytes);
    var guid = new byte[16];
    Array.Copy(hash, 0, guid, 0, 16);
    return new Guid(guid).ToString();
}
It generates uuid like this: e40b96a6-e62e-a67e-3ac7-69a099830e1c
My questions are:
To repeatedly generate the same id, must the input string also be the same? Meaning, if I have an input of 123, will it generate e40b96a6-e62e-a67e-3ac7-69a099830e1c every time?
If I HAVE to use a unique id in order to generate this uuid, what is the advantage of this extra step? If my input always has a unique id for each resource, can I just assign the id to be (Resource name)-(id)?
Is there a way to generate an id without having a unique id? I have some resources that do not have anything unique. Are there other techniques where I can generate a unique input that can be repeated on different platforms? I don't see how I can do this without a unique id from the input.
A given string will always generate the same id. A different string should generate a different id, though there's a very slim chance of two strings generating the same hash.
There are rules for the format of the id (only certain characters permitted, maximum length allowed), but other than that, no obvious benefit I can see. It's fine to use your 'native' identifier as the resource id. (That said, resource ids generally shouldn't be real-world identifiers like social security numbers, license numbers, etc. as that can leak protected information.)
The expectation in FHIR is that a unique resource id corresponds to a unique real-world object. If you don't have a real identifier on the object, there's a possibility you could have multiple instances that correspond to distinct real-world objects. E.g. with multiple Practitioner instances where all you have is a name of "A. Smith", it would not be appropriate to presume they are always the same individual. If you have no 'identity', you might be better off using the 'contained' mechanism rather than generating an id just from the content.
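Since the question mentions porting the approach to Java, here is a minimal sketch of the same idea: hash the input with SHA-256 and build a UUID from the first 16 bytes. The class and method names are made up for illustration. One caveat: .NET's Guid(byte[]) constructor reads the first three groups in little-endian order, so this Java version is deterministic but should not be expected to print the same string as the .NET code for the same input unless the bytes are reordered.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper; not part of FHIR-Converter.
final class DeterministicIds {

    /** Builds a UUID string from the first 16 bytes of SHA-256(input); null for blank input. */
    static String generateUuid(String input) {
        if (input == null || input.trim().isEmpty()) {
            return null;
        }
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] hash = sha256.digest(input.getBytes(StandardCharsets.UTF_8));
            // Read two big-endian longs from bytes 0..15 of the hash.
            ByteBuffer first16 = ByteBuffer.wrap(hash, 0, 16);
            long mostSignificant = first16.getLong();
            long leastSignificant = first16.getLong();
            return new UUID(mostSignificant, leastSignificant).toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same input yields the same id on every run and every machine.
        System.out.println(generateUuid("123"));
        System.out.println(generateUuid("123"));
        System.out.println(generateUuid("Patient-123"));
    }
}
```

The output only looks like a UUID: the version and variant bits are whatever the hash happens to contain, which is also true of the .NET original.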

Convert string to specific int

I am creating a foreground notification with an ID like so:
startForeground(1, notification)
When initialising the service I send it a string (e.g. Hello). I want the service and notification to be bound to this string, so I want to use it as my id. How can I convert a string to a unique ID? For example, the word "Hello" will always generate 123 and the word "Bye" will always generate 456.
That sounds like you want a "Hash Code"; a value derived from some other information that is (hopefully, but not always) unique.
There are a lot of different algorithms available to do this, and if you search for "hash code" you will find lots of them (especially in the security domain: SHA, MD5, etc.).
However, it sounds like you may not really need to get that complex (some of the more secure and "unique" hash code algorithms can be slow to calculate).
Is there any reason why you can't use the string itself?
String comparison may be slow, but maybe not as slow as a good hash. Also, you might be able to use a hash table (e.g. a HashMap) if you need a faster lookup.
Anyway, if you really do need a hash code from a string, a quick search found this (which looks reasonable): Sam Clarke; Kotlin Hash Strings
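As a minimal sketch of the simplest option, the built-in String.hashCode() already gives a repeatable int, because its formula is fixed by the String API (Kotlin strings on the JVM use the same method). The class and method names below are hypothetical. Two caveats: different strings can collide, and startForeground() does not accept 0 as an id, so the sketch remaps that one value.

```java
// Hypothetical helper for deriving a notification id from a string.
final class NotificationIds {

    /** Maps a string to a stable, non-zero int. Distinct strings may still collide. */
    static int fromString(String key) {
        int hash = key.hashCode();   // same value on every run for the same string
        return hash != 0 ? hash : 1; // avoid 0, which startForeground() rejects
    }

    public static void main(String[] args) {
        System.out.println(fromString("Hello"));
        System.out.println(fromString("Bye"));
    }
}
```

If collisions between strings would be a real problem, keeping your own map from string to an incrementing counter avoids them, at the cost of having to persist that map.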

Appengine ID/Name vs WebSafeKey

When writing the endpoints in java, for finding items by their keys, should I use the Id or the webSafeString of the key? In what situations does this matter?
It's up to you.
Do the entities have parents? Then you probably want to use the urlsafe representation as a single string will contain the full path to the entity. If you used an ID instead - you would somehow need to manually include the IDs of all parents up to the root.
No parents & IDs are numeric / alphanumeric? Then just use the IDs as they look cleaner (again, this is not a rule and is completely up to you).
No parents but IDs have special characters in them? Use the urlsafe representation as you might have issues with not being able to use some special characters without encoding them in HTTP.
Note #1: the urlsafe representation includes the entity names in an encoded form that can be easily decoded. This is unlikely to be a privacy issue, but you should still be aware of it. The actual data (IDs) are also simply encoded and can be easily decoded, so be careful when you use personal information such as emails as IDs: they are not protected by the urlsafe representation.
Note #2: if you decide to change the structure of your data in the future (parents <-> children), you might get stuck with urlsafe strings you have already issued to users, who are not aware of the changes you have made.
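To make Note #1 concrete, here is a small sketch showing that URL-safe Base64 is an encoding, not encryption: anyone can reverse it without a secret. The payload string is a hypothetical stand-in for a kind plus an ID; it is not the real layout of a Datastore key, and the class name is invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only: this is NOT the actual Datastore key format.
final class UrlSafeDemo {

    /** Encodes text with the URL-safe Base64 alphabet, without padding. */
    static String encode(String raw) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(); no key or password is needed. */
    static String decode(String token) {
        return new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical payload standing in for "kind + id".
        String token = encode("User/alice@example.com");
        System.out.println(token);         // looks opaque
        System.out.println(decode(token)); // but is trivially recovered
    }
}
```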

Alternatives for @Formula annotation or sql decrypt function

I have an issue with the @Formula annotation in Hibernate when I'm trying to decrypt a password column (PWD_COL) using a key (which is retrieved from a properties file).
The code:
@Formula("decrypt(PWD_COL, '" + MyKeys.DECRYPT_KEY + "')")
private String myPwd;
I am trying to get the DECRYPT_KEY from another property file.
I am getting an error:
The value for annotation attribute Formula.value must be a constant expression
Alternately, is there a way to mimic the SQL decrypt function in Java?
Note: Please read the password field as just another value. The eventual purpose of this exercise is for something far less important but nevertheless needs to be encrypted.
I know that this is not what you are looking for, but let me give you some advice about storing passwords in a database; maybe you should change your mindset about how to work with passwords.
You should not decrypt passwords from the database, for security reasons. If someone loses their password, they should create a new one.
To validate a login and related tasks, you should take the password from the form, encrypt (or better, hash) it the same way, and compare it with the stored value in the database.
If you really want to keep doing it this way, use @Formula with valid SQL values.
The value of a @Formula annotation has to be valid SQL since it is passed more or less directly to the underlying DB.
This also explains why your idea won't work: the DB will have no notion of the MyKeys class.
You could insert the key in a DB table and select it from there in the @Formula, but security-wise that might not be a particularly sane idea...
What you really should be doing (or actually not doing) is to avoid storing passwords, and instead store hashes of passwords and then compare those hashes with the hash of whatever credentials your user presents. That moves encryption/hashing to Java/memory and avoids the embarrassment when somebody steals your database, guesses the weak password or bruteforces the encryption, and posts it all on pastebin!
Cheers,
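For the "store hashes, then compare hashes" advice above, here is a minimal sketch using PBKDF2 from the standard Java crypto API. The class name, salt size, iteration count and key length are assumptions chosen for illustration, not values taken from the answers; tune them for your own environment. The salt would be stored next to the hash in the database.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical helper; parameters below are illustrative assumptions.
final class PasswordHasher {
    private static final int ITERATIONS = 120_000; // assumed work factor
    private static final int KEY_BITS = 256;       // assumed output size

    /** Creates a fresh random salt to store alongside the hash. */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Derives a one-way hash from the password and salt using PBKDF2-HMAC-SHA256. */
    static byte[] hash(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2WithHmacSHA256 unavailable", e);
        }
    }

    /** Re-hashes the candidate and compares it with the stored hash in constant time. */
    static boolean verify(char[] candidate, byte[] salt, byte[] storedHash) {
        return MessageDigest.isEqual(hash(candidate, salt), storedHash);
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        byte[] stored = hash("s3cret".toCharArray(), salt);
        System.out.println(verify("s3cret".toCharArray(), salt, stored));
        System.out.println(verify("wrong".toCharArray(), salt, stored));
    }
}
```

With this shape, nothing in the database can be decrypted back into the password, so no key has to reach the SQL layer at all.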

Unique serial number in a java web application

I've been wondering what's the correct practice for generating unique ids? The thing is in my web app I'll have a plugin system, when a user registers a plugin I want to generate a unique serial ID for it. I've been thinking about storing all numbers in a DB or a file on the server, generating a random number and checking whether it already exists in the DB/file, but that doesn't seem that good. Are there other ways to do it? Would using the UUID be the preferred way to go?
If the ids are user-facing, which it seems they are, then you want them to be difficult to guess. Use the built-in UUID class, which generates random ids for you and can format them nicely for you. Extract:
import java.util.UUID;

UUID idOne = UUID.randomUUID();
UUID idTwo = UUID.randomUUID();
System.out.println("UUID One: " + idOne);
System.out.println("UUID Two: " + idTwo);
Example output:
UUID One: 067e6162-3b6f-4ae2-a171-2470b63dff00
UUID Two: 54947df8-0e9e-4471-a2f9-9af509fb5889
There are other solutions in the link provided. I think it compares the methods quite well, so choose the one which best suits your needs.
Another interesting method is the one MongoDB uses, but this is possibly overkill for your needs:
A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON.
If they weren't user facing, then you could just leave it to the database to do an auto-incrementing id: 1, 2, 3, etc.
Why not go for a static (or, in this case, a context-scoped) AtomicInteger that would be incremented when a plugin is registered?
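A minimal sketch of that counter idea follows; the class and method names are hypothetical. Note the limits of this approach: the counter restarts when the application restarts and is not shared between servers, so it only fits a single long-running instance unless the last value is persisted somewhere.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical registry holding one counter per application context.
final class PluginSerials {
    private final AtomicInteger next = new AtomicInteger(0);

    /** Returns the next serial number; safe to call from several threads at once. */
    int register() {
        return next.incrementAndGet();
    }

    public static void main(String[] args) {
        PluginSerials serials = new PluginSerials();
        System.out.println(serials.register());
        System.out.println(serials.register());
    }
}
```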
You could base it on user ID and timestamp, ensuring uniqueness but also providing a certain "readability"? E.g. User ID = 101, timestamp = 23 Dec 2010 13:54, you could give it ID:
201012231354101
or
101201012231354
The alternative, a UUID, is for all practical purposes unique (a collision between random UUIDs is extremely unlikely) but takes up a lot of DB space and is quite unwieldy to work with.
A final idea is to ask your central DB for a unique ID, e.g. Oracle has sequences, or MySQL uses AUTO_INCREMENT fields, to assign unique integers.
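The user-ID-plus-timestamp layout from this answer can be sketched as follows, assuming a numeric user id and minute resolution as in the example (the class and method names are made up). Keep in mind that with minute resolution the same user registering twice within one minute would get the same ID, so a finer timestamp or an extra counter would be needed to rule that out.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper reproducing the two layouts shown in the answer.
final class ReadableIds {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Timestamp followed by user id, e.g. 2010-12-23 13:54 and user 101. */
    static String timestampFirst(long userId, LocalDateTime when) {
        return when.format(STAMP) + userId;
    }

    /** User id followed by timestamp. */
    static String userFirst(long userId, LocalDateTime when) {
        return userId + when.format(STAMP);
    }

    public static void main(String[] args) {
        LocalDateTime when = LocalDateTime.of(2010, 12, 23, 13, 54);
        System.out.println(timestampFirst(101, when)); // 201012231354101
        System.out.println(userFirst(101, when));      // 101201012231354
    }
}
```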
