I am in need of a transaction log library with the following features:
Maximum performance: no force (flush); let the OS write out its buffers at its own discretion. The file grows in big chunks to minimize metadata modifications. I don't care if some of the last records are lost.
Reading records in backward order (most recent first).
The problem is: how do I find the last valid record when reading the log file? What techniques can be used, or is there a ready-made open-source library?
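One common way to solve the "find the last valid record" problem is to frame every record with its length and a checksum, so a reader scanning from the end of the file can detect and skip a torn tail. A purely illustrative sketch (not from any particular library; the [length][payload][CRC32][length] layout and the class name are assumptions):

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical framing for an append-only log: [length][payload][CRC32][length].
// The trailing length lets a reader walk backward; the CRC lets it stop at the
// first record that was only partially written before a crash.
public final class LogRecordCodec {

    public static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length + 8 + 4);
        buf.putInt(payload.length);
        buf.put(payload);
        buf.putLong(crc.getValue());
        buf.putInt(payload.length); // trailer length, read when scanning backward
        return buf.array();
    }

    /** Returns the payload if the framed record is intact, or null if it is torn or corrupt. */
    public static byte[] decodeIfValid(ByteBuffer framed) {
        if (framed.remaining() < 16) {
            return null;
        }
        int length = framed.getInt();
        if (length < 0 || framed.remaining() < length + 12) {
            return null;
        }
        byte[] payload = new byte[length];
        framed.get(payload);
        long storedCrc = framed.getLong();
        int trailerLength = framed.getInt();
        CRC32 crc = new CRC32();
        crc.update(payload);
        return (crc.getValue() == storedCrc && trailerLength == length) ? payload : null;
    }
}
```

A backward reader starts at the end of the written region, reads the trailer length, slices the candidate record, and keeps moving toward the start of the file until decodeIfValid returns a non-null payload; everything after that point is a lost tail, which matches the "I don't care if some of the last records are lost" requirement.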
Did you check whether HOWL - High-speed ObjectWeb Logger matches your requirements? It is rather out of date and does not seem to allow random access or reading backward. However, it supports setting a mark and replaying events from a mark onward. Because it is open source, it might be adaptable to your needs.
You may also investigate whether the logging part of JBoss Transaction is suitable.
Please specify what you mean by "reading backwards" through a transaction log. A transaction log may contain records from multiple transactions, each consisting of a sequence of events.
More information on transaction logging can be found here (and on the web of course):
Java Transaction Processing: Design and Implementation (ISBN 978-0130352903)
Fundamentals of Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery (ISBN 978-1558605084)
Principles of Transaction Processing (ISBN 978-1558606234)
and in various books about database system concepts
Hope this helps you come a bit closer to your goal.
Michael
Most of the well-known logging frameworks (like log4j and the Apache ones) support different kinds of logging mechanisms; you just have to configure them correctly. But if you want to log backward, it is really resource-consuming, because streams are sequential and you would have to push each new record in front of all the other records. You will also probably have to write most of the logging code yourself.
I'm so worried about people logging confidential information to server logs.
I have seen server logs in production. Some developers are accidentally logging security-related information like passwords, clientId, clientSecret, etc.
Is there any way, like Eclipse plugin or any tool, to warn developers while writing their code?
`ex: log.info("username = " + username + " password = " + password);` // Warn that confidential info is being logged.
I have done some research... I have seen tools like SonarLint and FindBugs, but those plugins are unable to solve my problem.
SonarLint offers the rule S2068: Credentials should not be hard-coded, which targets the use of hard-coded credentials. It seems close to what you are trying to achieve, though it may not be enough for your needs.
As stated in other answers, however, identifying such security holes can ultimately be hard, and strong code reviews are certainly a good way to reduce the risks.
Now, if you are really worried about usages of loggers, already know the potential issues, and know what data could leak, I would suggest writing your own custom Java rule for SonarQube.
Custom rules are supported by SonarLint and can be applied at enterprise level once the custom plugin containing them is deployed on a SonarQube server. This solution allows you to explicitly define what you want to target and to fine-tune the rule to your needs and enterprise specifics. Writing such rules is not hard and is documented in the following tutorial: Custom rules for Java.
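To give an idea of what such a custom rule could look like, here is a hypothetical sketch written against the SonarJava check API (IssuableSubscriptionVisitor); the rule key, the logger method names, and the keyword list are all assumptions you would tune to your codebase:

```java
import java.util.Collections;
import java.util.List;
import org.sonar.check.Rule;
import org.sonar.plugins.java.api.IssuableSubscriptionVisitor;
import org.sonar.plugins.java.api.tree.BaseTreeVisitor;
import org.sonar.plugins.java.api.tree.IdentifierTree;
import org.sonar.plugins.java.api.tree.MemberSelectExpressionTree;
import org.sonar.plugins.java.api.tree.MethodInvocationTree;
import org.sonar.plugins.java.api.tree.Tree;

// Hypothetical rule: flag logger calls whose arguments reference identifiers
// that look like credentials.
@Rule(key = "NoCredentialsInLogs")
public class NoCredentialsInLogsRule extends IssuableSubscriptionVisitor {

  private static final List<String> LOG_METHODS = List.of("trace", "debug", "info", "warn", "error");
  private static final List<String> SUSPICIOUS = List.of("password", "passwd", "secret", "clientsecret");

  @Override
  public List<Tree.Kind> nodesToVisit() {
    return Collections.singletonList(Tree.Kind.METHOD_INVOCATION);
  }

  @Override
  public void visitNode(Tree tree) {
    MethodInvocationTree invocation = (MethodInvocationTree) tree;
    if (!invocation.methodSelect().is(Tree.Kind.MEMBER_SELECT)) {
      return;
    }
    String methodName = ((MemberSelectExpressionTree) invocation.methodSelect()).identifier().name();
    if (!LOG_METHODS.contains(methodName)) {
      return;
    }
    // Scan the arguments for identifiers whose names look like credentials.
    invocation.arguments().forEach(arg -> arg.accept(new BaseTreeVisitor() {
      @Override
      public void visitIdentifier(IdentifierTree identifier) {
        String name = identifier.name().toLowerCase();
        if (SUSPICIOUS.stream().anyMatch(name::contains)) {
          reportIssue(identifier, "Confidential data may be written to the log.");
        }
      }
    }));
  }
}
```

This only catches the obvious cases (identifiers with tell-tale names), but that is usually exactly the accidental leak the question describes.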
There are many different ways in which security holes can appear. Logging data to the browser console is only one of them.
And to my knowledge, there is no tool that can detect those security issues automatically. It is the responsibility of the programmer to not expose private user information on a page.
In this case the advice is: never log passwords (especially unencrypted ones) to the browser console! Instead, store passwords in the database only as one-way hashes that cannot be reversed.
Another approach is to create a custom log appender that looks for certain tell-tale patterns (e.g. words like "password" and "passwd") and obliterates the messages, or throws an error.
However, this could be dangerous. If the bad guys knew you were doing this, they might try to exploit it to cover their tracks or even crash your server.
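As an illustration of that idea, here is a hedged sketch using a Logback TurboFilter rather than a full appender; the keyword list is an assumption, and a real implementation might mask the offending parts instead of silently dropping the whole message:

```java
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.turbo.TurboFilter;
import ch.qos.logback.core.spi.FilterReply;
import org.slf4j.Marker;

// Drops log messages whose format string contains tell-tale credential keywords.
// Registered in logback.xml via a <turboFilter> element.
public class CredentialTurboFilter extends TurboFilter {

    private static final String[] KEYWORDS = {"password", "passwd", "clientsecret"};

    @Override
    public FilterReply decide(Marker marker, Logger logger, Level level,
                              String format, Object[] params, Throwable t) {
        if (format != null) {
            String lower = format.toLowerCase();
            for (String keyword : KEYWORDS) {
                if (lower.contains(keyword)) {
                    return FilterReply.DENY; // obliterate the suspicious message
                }
            }
        }
        return FilterReply.NEUTRAL; // let everything else through
    }
}
```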
I wouldn't hold my breath for some out-of-the-box solution on this one. Beyond your own logging, you also have to be concerned about the logging done by your dependencies. That said, you have two areas to work on: what goes into the logs and who has access to the logs.
As far as what goes into the logs, your best tools for combating this problem are education and collaboration (including the aforementioned code reviews). Start by writing a list of non-functional requirements for logging that addresses security: what to log and how to log it (markers, levels, sensitive parameters, etc.). I recommend working with colleagues on defining this list so it doesn't become known as "Ravi's logging crusade" instead of "something we really need to do".
Once that list is defined and you have your colleagues' and/or management's buy-in, you can write wrappers for logging implementations that support the list of non-functional logging requirements you assembled. If it is really necessary to log sensitive parameters, provide a way for the parameters to be asymmetrically encrypted for later retrieval by a root account, for example with the decryption key stored in a file accessible only by root or the container. For management, you might have to spend some time writing up a value proposition describing why your initiative is valuable to the company.
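For the asymmetric-encryption idea, a rough sketch of a hypothetical helper (the class name, the ENC[...] wrapping, and the key handling are all assumptions; RSA-OAEP is only suitable for short values):

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.util.Base64;
import javax.crypto.Cipher;

// Hypothetical helper: encrypts a sensitive value with an ops-team RSA public key
// before it is handed to the logger; only the matching private key (held by a
// root-only account) can recover the value later.
public final class SensitiveLogValue {

    private final PublicKey opsPublicKey; // e.g. loaded from a keystore at startup

    public SensitiveLogValue(PublicKey opsPublicKey) {
        this.opsPublicKey = opsPublicKey;
    }

    public String encryptForLog(String plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            cipher.init(Cipher.ENCRYPT_MODE, opsPublicKey);
            byte[] encrypted = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return "ENC[" + Base64.getEncoder().encodeToString(encrypted) + "]";
        } catch (GeneralSecurityException e) {
            return "ENC[unavailable]"; // never fall back to logging the plaintext
        }
    }
}
```

Your logging wrapper would call something like encryptForLog() on any parameter flagged as sensitive before the message ever reaches the underlying framework.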
Next, work with whoever defines your SDLC and make sure the change to it is communicated outward. Have them create a Secure Coding checklist for your company with one item on it that says: all logging is implemented using OurCompanySecureLogger. Now you can start enforcing the initiative. I recommend writing a check on the build server that inspects dependencies and fails the build if it finds a direct reference to log4j, slf4j, logback, etc.
Regarding the other half of the problem, work with your SysOps team to define segregation-of-duties rules. That is, software engineers shouldn't have access to the servers where logging is performed. If you're not staffed well enough to support this at the moment, you might have to get creative.
Maybe you should try the Contrast tool. It's a good one and we have been using it for a long time.
It takes care of all the current OWASP Top 10 issues.
It is quite good at finding security holes in enterprise applications.
Their support is also good.
I want to send logs from a Java app to ElasticSearch, and the conventional approach seems to be to set up Logstash on the server running the app and have Logstash parse the log files (with regex...!) and load them into ElasticSearch.
Is there a reason it's done this way, rather than just setting up log4j (or logback) to log in the desired format directly to a log collector that can then ship the logs to ElasticSearch asynchronously? It seems crazy to me to have to fiddle with grok filters to deal with multiline stack traces (and burn CPU cycles on log parsing) when the app itself could just log in the desired format in the first place.
On a tangentially related note, for apps running in a Docker container, is it best practice to log directly to ElasticSearch, given the need to run only one process?
If you really want to go down that path, the idea would be to use something like an Elasticsearch appender (or this one or this other one) which would ship your logs directly to your ES cluster.
However, I'd advise against it for the same reasons mentioned by @Vineeth Mohan. You'd also need to ask yourself a couple of questions, but mainly: what would happen if your ES cluster goes down for any reason (OOM, network down, ES upgrade, etc.)?
There are many reasons why asynchronicity exists, one of which is robustness of your architecture, and most of the time that's much more important than burning a few more CPU cycles on log parsing.
Also note that there is an ongoing discussion about this very subject in the official ES discussion forum.
I think it's usually ill-advised to log directly to Elasticsearch from a Log4j/Logback/whatever appender, but I agree that writing Logstash filters to parse a "normal" human-readable Java log is a bad idea too. I use https://github.com/logstash/log4j-jsonevent-layout everywhere I can to have Log4j's regular file appenders produce JSON logs that don't require any further parsing by Logstash.
There is also https://github.com/elastic/java-ecs-logging which provides a layout for log4j, log4j2 and Logback. It's quite efficient and the Filebeat configuration is very minimal.
Disclaimer: I'm the author of this library.
If you need a quick solution, I've written this Log4J2 Elastic REST Appender that you can use. It can buffer log events based on time and/or number of events before sending them to Elastic (using the _bulk API so that it sends them all in one go). It has been published to Maven Central, so it's pretty straightforward to add.
As the other folks have already mentioned, the best way to do it would be to save the logs to a file and then ship them to ES separately. However, I think there is value in this approach if you need to get something running quickly until you have the time/resources to implement the optimal way.
It seems to be possible to implement transactions on top of normal file systems using techniques like write-ahead logging, two-phase commit, shadow paging, etc.
Indeed, it must have been possible because a transactional database engine like InnoDB can be deployed on top of a normal file system. There are also libraries like XADisk.
However, the Apache Commons Transaction project states:
...we are convinced that the main advertised feature transactional file access can not be implemented reliably. We are convinced that no such implementation can be possible on top of an ordinary file system. ...
Why did Apache Commons Transaction claim that implementing transactions on top of normal file systems is impossible?
Is it impossible to do transactions on top of normal file systems?
Windows offers transactions on top of NTFS. See the description here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb968806%28v=vs.85%29.aspx
It's not recommended for use at the moment and there's an extensive discussion of alternative scenarios right in MSDN: http://msdn.microsoft.com/en-us/library/windows/desktop/hh802690%28v=vs.85%29.aspx .
Also, if you take the definition of a filesystem, a DBMS is itself a kind of filesystem, and a filesystem (like NTFS or ext3) can be implemented on top of (or inside) a DBMS as well. So Apache's statement is a bit, hmm, incorrect.
This answer is pure speculation, but you may be comparing apples and oranges. Or perhaps more accurately, milk and dairy products.
When a database uses a file system, it is only using a small handful of predefined files on the system (per database). These include data files and log files. The one operation that is absolutely necessary for ACID-compliant transactions is the ability to force a write to permanent memory (either disk or static RAM). And, I think most file systems provide this capability.
With this mechanism, the database can maintain locks on objects in the database as well as control access to all objects. Happily, the database has layers of memory/page management built on top of the file system. The "database" itself is written in terms of things like pages, tables, and indexes, not files, directories, and disk blocks.
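For reference, the "force a write to permanent memory" primitive mentioned above maps in Java to something like the following minimal sketch (the file layout and method name are illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// The durability primitive a database's log relies on: append a record,
// then force it to the device before acknowledging the transaction.
public class ForcedWrite {

    public static void appendAndForce(Path logFile, byte[] record) throws IOException {
        try (FileChannel channel = FileChannel.open(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            channel.write(ByteBuffer.wrap(record));
            channel.force(true); // flush file data and metadata to stable storage
        }
    }
}
```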
A more generic transactional system has other challenges. It would need, for instance, atomic actions for more things: e.g. if you "transactionally" delete 10 files, all of them would have to disappear at the same time. I don't think "traditional" file systems have this capability.
In the database world, the equivalent would be deleting 10 tables. Well, you essentially create new versions of the system tables without those tables, within a transaction, while the old tables are still being used. Then you put a full lock on the system tables (preventing reads and writes) and wait until they are available. Then you swap in the new table definitions (i.e. without the deleted tables), unlock the tables, and clean up the data. (This is intended as an intuitive view of the locking mechanism in this case, not a 100% accurate description.)
So, notice that locking and transactions are deeply embedded in what the database does. I suspect that the authors of this module came to realize that they would basically have to re-implement all existing file system functionality to support their transactions, and that was a bit too much scope to take on.
I don't want to persist any data, but I still want to use Neo4j for its graph traversal and algorithm capabilities. With an embedded database, I've configured cache_type = strong, and after all the writes I mark the transaction as failed. But my write speeds (node and relationship creation) are slow, and this is becoming a big bottleneck in my process.
So the question is: can Neo4j be run without any persistence at all, just as a pure in-memory API? I tried alternatives like JGraphT, but those don't have traversal mechanisms like the ones Neo4j provides.
As far as I know, Neo4j data storage and Lucene indexes are always written to files. On Linux, at least, you could set up a ramfs file system to hold the files in memory.
See also:
Loading all Neo4J db to RAM
How many changes do you group in each transaction? You should try to group up to thousands of changes in each transaction since committing a transaction forces the logical log to disk.
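For illustration, batching with the embedded API of that era (Neo4j 1.x style, hence tx.finish() rather than the newer try-with-resources close(); the names and batch size are arbitrary) could look like this:

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

// Groups many writes into each transaction so the logical log is forced
// to disk once per batch instead of once per change.
public class BatchedInsert {

    private static final int BATCH_SIZE = 10000;

    public static void insertNodes(GraphDatabaseService db, int total) {
        int done = 0;
        while (done < total) {
            Transaction tx = db.beginTx();
            try {
                for (int i = 0; i < BATCH_SIZE && done < total; i++, done++) {
                    Node node = db.createNode();
                    node.setProperty("id", done);
                }
                tx.success();
            } finally {
                tx.finish(); // commits here; one log force per batch
            }
        }
    }
}
```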
However, in your case you could instead begin your transactions with:
db.tx().unforced().begin();
Instead of:
db.beginTx();
This makes the transaction not wait for the logical log to be forced to disk, which makes small transactions much faster, but a power outage could potentially lose you the last couple of seconds of data.
The tx() method sits on GraphDatabaseAPI, which for example EmbeddedGraphDatabase implements.
You can try a virtual drive. It would make Neo4j persist to the drive, but it would all happen in memory.
https://thelinuxexperiment.com/create-a-virtual-hard-drive-volume-within-a-file-in-linux/
I would like to monitor Hibernate activity.
I have seen the zentracker monitor solution on the internet, which permits monitoring a lot of Hibernate activity.
But is it compatible with the latest version of Hibernate, 3.5.x?
If it's not, do you have a solution to monitor query execution times, sessionFactory openings, persistent objects created, ...?
Thank you in advance for your help.
Best regards,
Florent
P.S.: I'm French, sorry for my English.
I have seen the zentracker monitor solution on the internet, which permits monitoring a lot of Hibernate activity. But is it compatible with the latest version of Hibernate, 3.5.x?
Why don't you get the sources and recompile the project with a more recent version of Hibernate Core? Well, I did, because I was curious, and it doesn't compile; there are a few API changes that require some modifications. But nothing overcomplicated. And since the project doesn't seem to be very active, your best option would be to make the changes yourself.
If it's not, do you have a solution to monitor query execution times, sessionFactory openings, persistent objects created, ...?
Well, as I said, you can make it compatible...
I personally gather Statistics via JMX and use a custom tool. From the documentation:
20.6.2. Metrics
Hibernate provides a number of metrics, from basic information to more specialized information that is only relevant in certain scenarios. All available counters are described in the Statistics interface API, in three categories:
Metrics related to the general Session usage, such as number of open sessions, retrieved JDBC connections, etc.
Metrics related to the entities, collections, queries, and caches as a whole (aka global metrics).
Detailed metrics related to a particular entity, collection, query or cache region.
For example, you can check the cache hit, miss, and put ratio of entities, collections and queries, and the average time a query needs. Be aware that the number of milliseconds is subject to approximation in Java. Hibernate is tied to the JVM precision and on some platforms this might only be accurate to 10 seconds.
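For what it's worth, here is a minimal sketch of the JMX/statistics approach I use (Hibernate 3.x API; the ObjectName and the particular counters shown are just examples):

```java
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;
import org.hibernate.SessionFactory;
import org.hibernate.jmx.StatisticsService;
import org.hibernate.stat.Statistics;

// Enables Hibernate statistics, exposes them over JMX, and reads a few counters directly.
public class HibernateMonitoring {

    public static void exposeOverJmx(SessionFactory sessionFactory) throws Exception {
        StatisticsService statsMBean = new StatisticsService();
        statsMBean.setSessionFactory(sessionFactory);
        statsMBean.setStatisticsEnabled(true);
        ManagementFactory.getPlatformMBeanServer()
                .registerMBean(statsMBean, new ObjectName("Hibernate:type=statistics"));
    }

    public static void printSomeMetrics(SessionFactory sessionFactory) {
        Statistics stats = sessionFactory.getStatistics();
        stats.setStatisticsEnabled(true);
        System.out.println("Open sessions:        " + stats.getSessionOpenCount());
        System.out.println("Entity inserts:       " + stats.getEntityInsertCount());
        System.out.println("Slowest query (ms):   " + stats.getQueryExecutionMaxTime());
        System.out.println("Slowest query string: " + stats.getQueryExecutionMaxTimeQueryString());
        System.out.println("2nd-level cache hits: " + stats.getSecondLevelCacheHitCount());
    }
}
```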
Have a look at Performance Monitoring using Hibernate for more inspiration.
See also
Hibernate Profiler: a commercial tool
Related questions
Tool for monitoring Hibernate cache usage
References
Hibernate Core Reference Guide, 20.6. Monitoring performance