I want to build a system that gathers some data about the JVM and notifies me when certain thresholds are reached. JConsole has this data, but I have no idea how it collects it.
Does someone know a way to gather this data programmatically with Java?
As described in the docs for JConsole, it "is a monitoring tool that complies to the Java Management Extensions (JMX) specification" and "uses the extensive instrumentation of the Java Virtual Machine" to gather data.
In other words, it uses the Java Management Extensions (JMX).
JMX provides a ton of data from various sources within the JVM, and new data sources can be defined by libraries. These are advertised using managed beans (JMX MBeans), which use a particular set of interface definitions or annotations to indicate that they provide data or operations, and to describe what they provide and how.
With the JMX 2.0 spec, you can use a set of annotations to mark your beans, making it fairly easy to provide data. Depending on your container, adding new beans may be trivial. JMX can act as both a data source and a way to invoke methods from the console or another client, actually allowing you to perform (supported) operations on the VM being watched.
Various containers (such as Tomcat) and libraries (such as C3P0) provide additional metrics, in addition to the slew of beans the JVM provides. These expose such things as memory usage (one of the more popular).
These beans are exposed by the JVM over a pair of ports using a domain:key-property-list naming convention. Each bean and property exposes some relevant information, which clients like JConsole use to build the tree of available beans and the screens of counters and buttons.
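For example, you can read the same data in-process through the platform MXBeans that JConsole itself displays. A minimal sketch (the 80% threshold and the println notification are placeholders for whatever alerting you actually want):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapWatcher {
    public static void main(String[] args) {
        // The platform MXBeans are the same beans JConsole reads over JMX.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        long max = heap.getMax(); // may be -1 if the maximum is undefined
        if (max > 0 && (double) heap.getUsed() / max > 0.8) {
            System.out.println("Heap usage above 80%: " + heap);
        }
    }
}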
I am new to the project, and I am trying to create a connector between Dataflow and a database.
The documentation clearly states that I should use a Source and a Sink but I see a lot of people using directly a PTransform associated with a PInput or a PDone.
The Source/Sink API is still experimental (which explains why all the examples use PTransform), but it seems easier to integrate with a custom runner (e.g. Spark).
Looking at the code, both approaches are used, and I cannot see any use case where it would be more interesting to use the PTransform API.
Is the Source/Sink API supposed to replace the PTransform API?
Did I miss something that clearly differentiates the two methods?
Is the Source/Sink API stable enough to be considered the right way to code inputs and outputs?
Thanks for your advice!
The philosophy of Dataflow is that PTransform is the main unit of abstraction and composability, i.e., any self-contained data processing task should be encapsulated as a PTransform. This includes the task of connecting to a third-party storage system: ingesting data from somewhere or exporting it to somewhere.
Take, for example, Google Cloud Datastore. In the code snippet:
PCollection<Entity> entities =
    p.apply(DatastoreIO.readFrom(dataset, query));
...
p.apply(some processing)
    .apply(DatastoreIO.writeTo(dataset));
the return type of DatastoreIO.readFrom(dataset, query) is a subclass of PTransform<PBegin, PCollection<Entity>>, and the type of DatastoreIO.writeTo(dataset) is a subclass of PTransform<PCollection<Entity>, PDone>.
It is true that these functions are under the hood implemented using the Source and Sink classes, but to a user who just wants to read or write something to Datastore, that's an implementation detail that usually should not matter (however, see the note at the end of this answer about exposing the Source or Sink class). Any connector, or for that matter, any other data processing task is a PTransform.
Note: Currently connectors that read from somewhere tend to be PTransform<PBegin, PCollection<T>>, and connectors that write to somewhere tend to be PTransform<PCollection<T>, PDone>, but we are considering options to make it easier to use connectors in more flexible ways (for example, reading from a PCollection of filenames).
However, of course, this detail matters to somebody who wants to implement a new connector. In particular, you may ask:
Q: Why do I need the Source and Sink classes at all, if I could just implement my connector as a PTransform?
A: If you can implement your connector by just using the built-in transforms (such as ParDo, GroupByKey etc.), that's a perfectly valid way to develop a connector. However, the Source and Sink classes provide some low-level capabilities that, in case you need them, would be cumbersome or impossible to develop yourself.
For example, BoundedSource and UnboundedSource provide hooks for controlling how parallelization happens (both initial and dynamic work rebalancing - BoundedSource.splitIntoBundles, BoundedReader.splitAtFraction), while these hooks are not currently exposed for arbitrary DoFns.
You could technically implement a parser for a file format by writing a DoFn<FilePath, SomeRecord> that takes the filename as input, reads the file and emits SomeRecord, but this DoFn would not be able to dynamically parallelize reading parts of the file onto multiple workers in case the file turned out to be very large at runtime. On the other hand, FileBasedSource has this capability built-in, as well as handling of glob filepatterns and such.
Likewise, you could try implementing a connector to a streaming system by implementing a DoFn that takes a dummy element as input, establishes a connection and streams all elements into ProcessContext.output(), but DoFns currently don't support writing unbounded amounts of output from a single bundle, nor do they explicitly support the checkpointing and deduplication machinery needed for the strong consistency guarantees Dataflow gives to streaming pipelines. UnboundedSource, on the other hand, supports all this.
Sink (more precisely, the Write.to() PTransform) is also interesting: it is just a composite transform that you could write yourself if you wanted to (i.e. it has no hard-coded support in the Dataflow runner or backend), but it was developed with consideration for typical distributed fault tolerance issues that arise when writing data to a storage system in parallel, and it provides hooks that force you to keep those issues in mind: e.g., because bundles of data are written in parallel, and some bundles may be retried or duplicated for fault tolerance, there is a hook for "committing" just the results of the successfully completed bundles (WriteOperation.finalize).
To summarize: using Source or Sink APIs to develop a connector helps you structure your code in a way that will work well in a distributed processing setting, and the source APIs give you access to advanced capabilities of the framework. But if your connector is a very simple one that needs neither, then you are free to just assemble your connector from other built-in transforms.
Q: Suppose I decide to make use of Source and Sink. Then how do I package my connector as a library: should I just provide the Source or Sink class, or should I wrap it into a PTransform?
A: Your connector should ultimately be packaged as a PTransform, so that the user can just p.apply() it in their pipeline. However, under the hood your transform can use Source and Sink classes.
A common pattern is to expose the Source and Sink classes as well, making use of the Fluent Builder pattern, and letting the user wrap them into a Read.from() or Write.to() transform themselves, but this is not a strict requirement.
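As a rough illustration of that packaging pattern (written against the Dataflow SDK 1.x classes; MyRecord and MyRecordSource are hypothetical, and builder options and validation are omitted):

import com.google.cloud.dataflow.sdk.transforms.PTransform;
import com.google.cloud.dataflow.sdk.values.PBegin;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class MyRecordIO {
    // Users just write p.apply(MyRecordIO.read()) without ever touching the Source class.
    public static Read read() {
        return new Read();
    }

    public static class Read extends PTransform<PBegin, PCollection<MyRecord>> {
        @Override
        public PCollection<MyRecord> apply(PBegin input) {
            // Under the hood, wrap the low-level BoundedSource in the built-in Read transform.
            return input.apply(
                com.google.cloud.dataflow.sdk.io.Read.from(new MyRecordSource()));
        }
    }
}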
Just curious: what is the use of the below imports in Java? I imported them by mistake while doing Hibernate work, and they are not compatible with Hibernate.
import javax.management.Query;
import javax.management.QueryExp;
I went through the API and found that they can fire queries on the beans.
Can I use them on my Hibernate POJOs (to avoid some memory use), or have I understood this the wrong way?
Any ideas about them?
I went through the API and found that they can fire queries on the beans.
Not exactly. The API page states:
The MBean Server can be queried for MBeans that meet a particular condition, using its queryNames or queryMBeans method
So it's not exactly about regular POJOs. An MBean, or managed bean, is one of the concepts introduced by the Java Management Extensions (JMX) technology. As the JMX Technology Overview states:
The Java objects that implement resources and their instrumentation are called managed beans, or MBeans. MBeans must follow the design patterns and interfaces defined in the JMX specification (JSR 3). This ensures that all MBeans provide the instrumentation of managed resources in a standardized way.
Basically, MBeans are used to extend the standard JVM management functionality. Developers can thus plug application-specific options into standard monitoring tools (such as JConsole) and thereby simplify and standardize resource administration.
Query is just a utility class that provides several methods used to build QueryExps. QueryExp objects are used to query the MBeanServer.
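For example, a small sketch that queries the platform MBean server (the ThreadCount attribute and the threshold of 50 are only illustrative):

import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.Query;
import javax.management.QueryExp;

public class MBeanQueryExample {
    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // Match MBeans in the java.lang domain whose ThreadCount attribute exceeds 50.
        QueryExp exp = Query.gt(Query.attr("ThreadCount"), Query.value(50));
        Set<ObjectName> names = mbs.queryNames(new ObjectName("java.lang:*"), exp);
        for (ObjectName name : names) {
            System.out.println(name);
        }
    }
}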
Can I use them on my Hibernate POJOs (to avoid some memory use)?
Well, they aren't meant to be used that way, so using them for such purposes will just introduce confusion.
If you are looking for a way to query your POJOs (though I don't understand how that helps with memory use), check out these questions:
How do you query object collections in Java (Criteria/SQL-like)?
What is the Java equivalent for LINQ?
They are part of the JMX Framework. I don't think using them without the framework would make sense.
I have a Java EE app in which I want to persist a small amount of data to disk, e.g. just usernames and passwords.
I don't want to go through the hassle of integrating a full database for such a small amount of data.
Is there a standard way to access the file system and a standard folder where web applications can store their data on disk, other than using a database?
Note:
I am not using EJBs. It's a web application using Servlets.
You could consider using the preferences API to store this data - it's available on Java EE as well.
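A minimal sketch using java.util.prefs (the node path and key names are assumptions):

import java.util.prefs.Preferences;

public class PrefsExample {
    public static void main(String[] args) throws Exception {
        Preferences prefs = Preferences.userRoot().node("com/example/myapp");
        prefs.put("admin.passwordHash", "hash goes here");   // store a value
        String hash = prefs.get("admin.passwordHash", null); // read it back
        prefs.flush(); // push changes to the backing store
        System.out.println(hash);
    }
}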
Use a simple Java based database, like HSqlDB or h2. The setup won't be that complicated compared to a heavyweight DB. The key advantage this will give you is managing concurrent updates, which you would have to code yourself if you use direct file access.
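A rough sketch with embedded H2 over plain JDBC (the JDBC URL, table and credentials are assumptions; the h2 jar must be on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class UserStore {
    public static void main(String[] args) throws Exception {
        // The file-based URL creates a database file under ./data/ on first use.
        try (Connection con = DriverManager.getConnection("jdbc:h2:./data/users", "sa", "")) {
            try (Statement st = con.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS users(name VARCHAR PRIMARY KEY, pw_hash VARCHAR)");
            }
            // MERGE is H2's insert-or-update; concurrent updates are handled by the database.
            try (PreparedStatement ps = con.prepareStatement("MERGE INTO users KEY(name) VALUES(?, ?)")) {
                ps.setString(1, "alice");
                ps.setString(2, "hashed password here");
                ps.executeUpdate();
            }
        }
    }
}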
File access has always been a controversial activity within EJB-based applications because of the restrictions placed upon bean providers by the EJB specification. The part of the specification relevant here is under the section entitled Programming Restrictions, and it states the following about accessing the filing system.
An enterprise bean must not use the java.io package to attempt to access files and directories in the file system.
This is a fairly specific statement, and is followed up by a short explanation of why this is the case.
The file system APIs are not well-suited for business components to access data. Business components should use a resource manager API, such as JDBC, to store data.
While this explanation highlights a key reason for not using file I/O, I think that there is much more to this. However, although this is a well-known restriction, actually finding more information on it is a time-consuming task. So, in the quest for knowledge, I did some digging and came up with the following reasons why file I/O is "a bad thing"™.
The WORA mantra of Java and J2EE means that there might not actually be a filing system to access. I've seen various comments saying that the J2EE server might be running on some device that doesn't have a filing system, or that the application server doesn't have access to the filing system because it's deployed in, for example, a database server. Although this is a valid reason, I don't think that it applies to most projects.
Access to files isn't transactional. Typically, files aren't transactional resources, and when building enterprise systems you usually want to be sure that some information has been correctly and accurately stored, hence the use of relational databases and the like.
Accessing file systems is a potential security hole. If we look at how other resources (e.g. JDBC DataSources, JMS Topics, etc) are accessed, it's usually through JNDI. To ensure that only authorised parties can access these, we typically have such resources protected by some sort of authentication mechanism, be it a username/password combination, or an SSL certificate. The problem with filing systems is that they are much more open and it's harder to control access. One solution is to lock file access via the operating system, and another is to use the Java security model to restrict access to only a specific part of the disk. If you are going to access the filing system from your business components, then locking down access will help to make the system more secure and resilient to attacks.
So then, how are we supposed to access files from EJB? Many people advocate the use of an intermediary Java class to wrap up the file access, believing that the EJB specification only disallows access from the bean class itself. Is this true? I'm not convinced, because all the same reasons apply. The specification itself presents an answer, and that answer is to use a resource manager so that we can treat file access as a secure, transactional, pooled resource. One such implementation is a J2EE Connector Architecture (JCA) adapter that you write, deploy and configure to access your filing system. In fact, some vendors have already built JCA adapters that access flat files, and these are particularly useful if you have to access the outputs of legacy, mainframe systems.
Of course, many types of file access can be worked around. For example, configuration information can be placed in LDAP, JNDI, a database, or even properties files delivered inside your JAR files that get loaded as a resource through the classloader. In those circumstances where accessing files is a requirement, then other solutions include loading the file through the servlet container, having it sent to the EJB tier via messaging, downloading the file from a webserver through a socket connection and so on.
These are all workarounds for the programming restriction but at the end of the day I think you have to be pragmatic. Many projects do utilise file access from within the EJB tier and their solutions work. Although the EJB specification imposes a restriction, in reality many vendors choose not to enforce this, meaning that using the java.io package for accessing files is possible. Whatever solution you come up with, you should ideally keep the specification in mind. It's there to help you build portable and upgradable applications, but pragmatism should be employed. Hopefully a future version of the EJB specification will address this issue in more detail and this controversy will become a thing of the past.
Credit for above goes to: Simon Brown
In addition, it is never a good idea to keep passwords themselves on your site: if somebody breaks into it, they could retrieve all the passwords for all users. Most security references tell you it is OK to keep only a hash of each password that you can check against.
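For instance, a hashing sketch using PBKDF2 from the standard library (Java 8+; the iteration count and salt handling are simplified assumptions):

import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class PasswordHasher {
    public static String hash(char[] password) throws Exception {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        PBEKeySpec spec = new PBEKeySpec(password, salt, 65536, 256);
        byte[] hash = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        // Store salt and hash together so the same derivation can be repeated at login.
        return Base64.getEncoder().encodeToString(salt) + ":"
                + Base64.getEncoder().encodeToString(hash);
    }
}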
Is there a standard way to access the file system and a standard folder where web applications can store their data on disk, other than using a database?
There's the java.io.File API, but the Servlet or the Java EE specifications do not provide a standard directory where you may persist files for a lengthy duration. At the very best, the javax.servlet.context.tempdir attribute may be used to locate the location of the temporary files directory, from the ServletContext.
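A small servlet-side sketch of reading that attribute (the file name is an assumption; the directory's contents may be wiped on redeploy or restart):

import java.io.File;
import javax.servlet.http.HttpServlet;

public class ScratchFileServlet extends HttpServlet {
    @Override
    public void init() {
        // The attribute value is a java.io.File pointing at the container's temp directory.
        File tempDir = (File) getServletContext()
                .getAttribute("javax.servlet.context.tempdir");
        File scratch = new File(tempDir, "users.properties");
        // ... read/write scratch here, treating it strictly as scratch space
    }
}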
The rationale for not providing standard file directories is the inability to predict whether the container can access a file system at all. Your container might be running on an embedded device, or it might rely on a SAN or another remote file system for persisting files.
In addition, using the File API in EJBs is frowned upon, due to the non-transactional nature of file systems, so there is no similar concept of working directories and files.
Is it against the Java EE spec to create a new class loader from within the code flow of an application?
I want to load classes at runtime into a separate class loader that will be created from the application.
It is definitely a violation of the spec. See here for example:
Attempting to create or obtain a class loader, set or create a new security manager, stop the JVM, change the input, output, and error streams. That restriction enforces security and maintains the EJB container's ability to manage the runtime environment.
There are two ways to respond to your underlying need. One is that if you envision deploying internally on a specific appserver, it doesn't matter - as long as you know that it works. You are most likely to mess up hot deployment, so that is where you should test.
The other is to see what Java EE or your specific app server give you. Weblogic, for example, allows you to configure the class loading hierarchy of your ear. At this point Java EE is mature enough that if you have a legitimate need, you can almost certainly get it done. It may not be nice, pretty, comfortable or as easy as doing your custom class loader, and may be app server dependent but it would likely be able to be made to work.
I am not 100% sure if it applies to all parts of a Java EE application (e.g. web apps), but from EJBs, you are not allowed to create new class loaders:
From the JSR 220 (EJB 3.0, Core Contracts and Requirements):
"21.1.2 Programming Restrictions
...
The enterprise bean must not attempt to create a class loader; obtain the current class loader; set the context class loader; set security manager; create a new security manager; stop the JVM; or change the input, output, and error streams."
I'm writing an HTTP cache library for Java, and I'm using that library in an application that is started twice, so two instances of the same application are running. I want to be able to share the cache between those instances.
What is the best solution for this? I also want to be able to write to that same storage, and it should be available for both instances.
Now I have a memory-based index of the files available to the cache, and this is not shareable over multiple VMs. It is serialized between startups, but this won't work for a shared cache.
According to the HTTP Spec, I can't just map files to URIs as there might be a variation of the same payload based on the request. I might, for instance, have a request that varies on the 'accept-language' header: In that case I would have a different file for each subsequent request which specifies a different language.
Any ideas?
First, are you sure you want to write your own cache when there are several around? Things like:
ehcache
jboss cache
memcached
The first two are written in Java and the third can be accessed from Java. The first two also handle distributed caching, which is the general case of what you are asking for, I think. When they start up, they look to connect to other members so that they maintain a consistent cache across instances. Changes to one are reflected across instances. They can be set up to connect via multicast or with specific lists of servers specified.
Memcached typically works in a slightly different manner in that it is running externally to the Java processes you are running, so that all Java instances that start up will be talking to a common service. You can set up memcached to work in a distributed manner, but it does so by hashing keys so that the server you want to connect to can be determined by what it is you are looking for.
Doing a true distributed cache with consistent content is very hard to do well, which is why I suggest looking at an existing library. If you want to do it yourself, it would still help to look at those listed to see how they go about it and consider using something like JGroups as your underlying mechanism.
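To make that concrete, a rough sketch against the Ehcache 2.x API (the cache name is an assumption, and a replicated/distributed cache must be configured in ehcache.xml for the two instances to actually see each other's entries):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class SharedHttpCacheSketch {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create(); // reads ehcache.xml from the classpath
        Cache cache = manager.getCache("httpCache");
        // Key on URI plus the varying headers, per the Vary handling described in the question.
        String key = "http://example.org/|accept-language=en";
        cache.put(new Element(key, "cached response body"));
        Element hit = cache.get(key);
        if (hit != null) {
            System.out.println(hit.getObjectValue());
        }
        manager.shutdown();
    }
}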
I think you should have a look at the WebDAV specification. It's an HTTP extension for sharing/editing/storing/versioning resources on a server. There is an implementation as an Apache module, which allows you a swift start using it.
So instead of implementing your own cache server, you might be better off with a local Apache + mod_dav instance that is available to both of your applications.
Extra bonus: since WebDAV is a specified protocol, you get interoperability with lots of tools for free.