I'm creating a program in Java that uses scripting. I'm just wondering if I should split my scripts into one file for each script (more realistically every type of script like "math scripts" and "account scripts" etc.), or if I should use one clumped file for all scripts.
I'm looking for an answer from more of a technical viewpoint rather than a practical viewpoint if possible, since this question kind of already explained the practical side (separate often modified scripts and large scripts).
In terms of technical performance impact, one could argue that using a single Globals instance is actually more efficient, since any libraries are loaded only once instead of multiple times. The question of multiple files, however, is largely independent of that: multiple physical Lua files can be loaded into the same Globals, or a single file can be loaded into it; either way the Globals table contains the same amount of data in the end, regardless of whether it was loaded from one file or many. Only if you use a separate Globals for each file does this stop being true.
Questions like this really depend on what you intend to use Lua for. Using a single Globals instance will use RAM more efficiently, but beyond that it will not really give any performance increase. Loading multiple files rather than a single file may take slightly longer, due to the time spent opening and closing file handles, but this is such a micro-optimization that it seriously isn't worth the hassle of writing all the code in a single file, not to mention how hard it would be to organize efficiently.
There are a few advantages to using multiple Globals as well, however: each Globals instance has its own global storage, so changes such as overloading operators on an object's metatable or overriding functions don't carry over to other instances. If this isn't a problem for you, then my suggestion would be to write the code in multiple files and load them all with a single Globals instance (see the sketch below). If you do this, though, be careful to structure all your files properly: if you use the global scope a lot, you may find that keeping track of object names becomes difficult, and you are prone to accidentally modifying values from other files by giving things the same name. To avoid this, each file can define all of its functionality in its own table; these tables then work as individual modules, where you can select features based on the table, almost like choosing from a specific file.
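A minimal sketch of that single-Globals suggestion, assuming the LuaJ library (the script file names and the compute_interest function are hypothetical):

    import org.luaj.vm2.Globals;
    import org.luaj.vm2.lib.jse.JsePlatform;

    public class ScriptLoader {
        public static void main(String[] args) {
            // Standard libraries are initialized once, for this one instance.
            Globals globals = JsePlatform.standardGlobals();

            // Each file is compiled and run as a chunk against the same
            // globals, so all scripts share one global scope.
            globals.loadfile("math_scripts.lua").call();
            globals.loadfile("account_scripts.lua").call();

            // A function defined in any of the files is now reachable here.
            globals.get("compute_interest").call();
        }
    }

On the Lua side, each file can put its functions into its own table (math_scripts = { ... }) so that names from different files don't collide.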
In the end it really doesn't make much of a difference, but depending on which you choose you may need to take care to ensure good organization of the code.
Using multiple Globals takes more RAM, but allows each file to have its own customized libraries without affecting others; the cost is that more structural management is required on the Java end of your software to keep all the files organized.
Using a single Globals takes less RAM, but all files share the same global scope, making customized versions of libraries more difficult and requiring more structural organization on the Lua end of the software to prevent names and other functionality from conflicting.
If you intend other users to use your Lua API to extend your software, through an addon system for example, you may wish to use multiple instances of Globals, because making the user who creates an addon responsible for ensuring their code won't conflict with other addons is not only dangerous but also a burden that doesn't need to exist. An inexperienced user comes along trying to make an addon, doesn't organize it properly, and may mess up parts of the software or of other addons.
There is an application that will need something like a lookup table. This application can be started many times with different configurations. Is there a way to share a data structure across JVMs? static would be valid within a single JVM. Having a database would solve the issue, but is there something simpler and faster?
You might use a file: write the object to a file. There is no such thing as an object shared across JVMs, because the life cycle of an object is defined for and within a single JVM.
File IO is usually faster than DB operations and simpler as well. On the downside, ACID properties are not guaranteed by files, and there can be inconsistencies if multiple processes try to read/write the same file.
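A minimal sketch of the file-based approach, using plain Java serialization (the file name and map type are hypothetical, and this does nothing about concurrent access):

    import java.io.*;
    import java.util.Map;

    public class SharedTable {
        private static final File FILE = new File("lookup-table.ser");

        // One JVM writes the table...
        static void save(Map<String, String> table) throws IOException {
            try (ObjectOutputStream out =
                    new ObjectOutputStream(new FileOutputStream(FILE))) {
                out.writeObject(table);
            }
        }

        // ...and any other JVM can read it back.
        @SuppressWarnings("unchecked")
        static Map<String, String> load() throws IOException, ClassNotFoundException {
            try (ObjectInputStream in =
                    new ObjectInputStream(new FileInputStream(FILE))) {
                return (Map<String, String>) in.readObject();
            }
        }
    }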
It seems to be possible to implement transactions on top of normal file systems using techniques like write-ahead logging, two-phase commit, and shadow-paging etc.
Indeed, it must have been possible because a transactional database engine like InnoDB can be deployed on top of a normal file system. There are also libraries like XADisk.
However, the Apache Commons Transaction developers state:
...we are convinced that the main advertised feature transactional file access can not be implemented reliably. We are convinced that no such implementation can be possible on top of an ordinary file system. ...
Why did the Apache Commons Transaction developers claim that implementing transactions on top of normal file systems is impossible?
Is it impossible to do transactions on top of normal file systems?
Windows offers transactions on top of NTFS. See the description here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb968806%28v=vs.85%29.aspx
It's not recommended for use at the moment and there's an extensive discussion of alternative scenarios right in MSDN: http://msdn.microsoft.com/en-us/library/windows/desktop/hh802690%28v=vs.85%29.aspx .
Also, if you take the definition of a filesystem, a DBMS is itself a kind of filesystem, and a filesystem (like NTFS or ext3) can be implemented on top of (or in) a DBMS as well. So Apache's statement is a bit, hmm, incorrect.
This answer is pure speculation, but you may be comparing apples and oranges. Or perhaps more accurately, milk and dairy products.
When a database uses a file system, it is only using a small handful of predefined files on the system (per database). These include data files and log files. The one operation that is absolutely necessary for ACID-compliant transactions is the ability to force a write to permanent storage (either disk or static RAM), and I think most file systems provide this capability.
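For what it's worth, Java exposes that forced-write primitive directly. Here is a rough sketch of how a write-ahead log record might be appended and flushed before a transaction is considered committed (the record format and file layout are left out):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class WalAppend {
        public static void append(Path log, byte[] record) throws IOException {
            try (FileChannel ch = FileChannel.open(log,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE,
                    StandardOpenOption.APPEND)) {
                ch.write(ByteBuffer.wrap(record));
                // Block until data and metadata reach stable storage;
                // only then may the transaction be reported as committed.
                ch.force(true);
            }
        }
    }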
With this mechanism, the database can maintain locks on objects in the database as well as control access to all objects. Happily, the database has layers of memory/page management built on top of the file system. The "database" itself is written in terms of things like pages, tables, and indexes, not files, directories, and disk blocks.
A more generic transactional system has other challenges. It would need, for instance, atomic actions over more kinds of things; e.g., if you "transactionally" delete 10 files, all of them would have to disappear at the same time. I don't think "traditional" file systems have this capability.
In the database world, the equivalent would be deleting 10 tables. Well, you essentially create new versions of the system tables, without those tables, within a transaction, while the old tables are still being used. Then you put a full lock on the system tables (preventing reads and writes), waiting until they are available. Then you swap in the new table definitions (i.e. without the tables), unlock the tables, and clean up the data. (This is intended as an intuitive view of the locking mechanism in this case, not a 100% accurate description.)
So, notice that locking and transactions are deeply embedded in the actions the database is doing. I suspect that the authors of this module came to realize that they would basically have to re-implement all existing file system functionality to support their transactions, and that was a bit too much scope to take on.
I have an array of configs that may possibly change in the future, but in all likelihood never will.
If any are missing or incorrect then a certain feature of my system will not work correctly.
Should these still be retrieved via some sort of config (XML, database, etc.) and made available to the end user to change, or is this a situation where it makes more sense to hard-code them in the class that uses them?
I have spent a long time changing my mind over and over on this.
A designer's estimate of the likelihood of something needing to change is not a reliable criterion for making this decision, because real-world use of our programs has its peculiar ways of proving us wrong.
Instead of asking yourself "how likely is something to change?", ask yourself "does it make sense for an end-user to make a change?" If the answer is "yes", make it user-changeable; otherwise, make it changeable only through your code.
The particular mechanism through which you make something changeable (a database, a configuration file, a custom XML file, and so on) does not matter much. What is important is to have good defaults for settings that are missing, so that your end users would have a harder time breaking your system by supplying partial configurations.
Best practice is to use some kind of config or properties file, with default values as a failsafe in case the file is damaged or missing. This approach has the following advantages:
it can easily be recognised as a config file, meaning another dev would not need to dive through your classes to change a parameter
property files can be written by build tools like Ant, so if you have e.g. a test server address and a production server address, the build task can change the content accordingly
thanks to the default values, it still works even if the file is missing
Disadvantage is the added complexity.
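A minimal sketch of that defaults-plus-file pattern (the file name and keys are hypothetical):

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Properties;

    public class Config {
        private static final Properties DEFAULTS = new Properties();
        static {
            DEFAULTS.setProperty("server.port", "8080");
            DEFAULTS.setProperty("feature.enabled", "true");
        }

        public static Properties load() {
            // Keys missing from the file fall back to DEFAULTS automatically.
            Properties props = new Properties(DEFAULTS);
            try (InputStream in = Config.class.getResourceAsStream("/app.properties")) {
                if (in != null) {
                    props.load(in);
                }
            } catch (IOException e) {
                // Damaged file: keep running on the defaults.
            }
            return props;
        }
    }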
Yes, it's almost certainly a bad idea to hard-code them; if nothing else, it can make testing (whether automated or manual) a lot more difficult than it needs to be. It's easy to include a .properties file in your jar with the usual defaults, and changing them in the future would just require overriding them at runtime. Dependency injection is usually an even better choice if you have the flexibility to arrange it.
If the configs are really never going to change, as you said, then it's fine to declare those properties as constants in an interface or a separate class and use them throughout the program.
Separate property files are needed only when some property values are not fixed and depend on the environment, like database name, username, password, etc. Other properties are fixed and do not depend on the environment in which the application is going to be deployed, like port numbers or table names, if any.
It depends on your application. As a baseline, it's good design to use static variables to hold data that your program will need, instead of hard-coding strings and integers all over the place; this means any future change (e.g. an application-wide font color) only requires a single edit and a compile cycle, and you're good to go.
However, if these settings are user configurable, then they cannot be hard coded, but instead need to be read from an external source, and where you do it, is a matter of design, complexity and security.
Plain text files are good for a small application where security requirements are lax. The Sublime Text and Notepad++ editors do this for their theme settings, and it works well. (I believe it was plain text; perhaps they have moved to XML now.)
A better option is XML, as it is structured and easier to read/parse/write. Lots of projects use this option. One thing to look out for is corrupt files, e.g. if the user closes the program or the JVM exits for whatever reason while you are writing to them; you might want to look at things like buffers. Also deal with FileNotFoundExceptions, in case the text/XML file is missing.
Another option is a database file of some sort. It's a bit more secure, you can add application-level encryption, and you have a multitude of options. Large programs that already use a DB backend, like MySQL, have a database at hand anyway, so they can create a new table and store the config there. Small applications can look at SQLite as an option.
Never ever hard-code things that "might" change, or you might be sorry later and make others mad (very likely in big and/or open-source projects). If the config will never change, it is not a config any more but a constant.
Only use hard coding when experimenting with code.
If you want to save simple values, you can use Java properties.
Look HERE for an example.
good luck.
There are some properties you can change without having to retest the software: properties you have tested for a range of values, or that you are sure are safe to change with at most a restart. These properties can be made configurable.
There are other properties which you cannot assume will just work without retesting the software. In that case it is better to hard-code them, IMHO. This encourages you to go through the release process when you change such a value. Values which you never expect to change are good candidates for this.
Say there's a legacy Java project A. That project for whatever reason has some things in it that are confidential (e.g. passwords, encryption keys, emails) and / or environment-specific (e.g. hard-coded paths, server names, emails). Due to the complexities involved, it doesn't seem possible to change the project to not contain the information in the source code.
At some point, a new outsourcing team joins the development. Given the above situation, the outsourcing team cannot get access to the project source verbatim. They have a separate development environment, so it's possible to make a separate copy of the project in their VCS that has the problems addressed (i.e. all the things needed are cleaned / updated as necessary to work in their environment). Let's call that version A2.
The workflow would generally include two things related to A and A2:
The code can change at both sides (i.e. both A and A2 can change, A being changed by the original team and A2 by the outsourcing team), including having source code change conflicts
There's a need to keep the two projects in sync. It's not required to have them in sync all the time, but it's important to have a relatively painless way to do that. It's assumed this must be a manual process when there are conflicts to be resolved
This workflow can be achieved by manually keeping two projects and merging between them.
Related questions:
How would one go about managing the two versions with git, i.e. what are the options compared to manual merging?
Is this the best setup or is there a better option?
For new projects, what is the preferred way (in the sense - what do you do if you have similar situation?) to keep the confidential / environment-specific things out of source control? Is that a good thing anyway?
This approach is going to cause you pain. What you should do instead is use git filter-branch to strip the server names and passwords out of history and replace them with a non-working general form - i.e., the result should not run anywhere!
Next, set up smudge/clean scripts to rewrite the files that contain that information, populating the values with what they need to be for your solution to run on that local system only. There will be different parameters in your production environment than in your development environment; the key is to have this information abstracted.
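As a rough illustration of the smudge/clean idea (the file path, filter name, and scripts are hypothetical placeholders):

    # .gitattributes: run a filter over the file that carries secrets
    src/main/resources/app.properties filter=secrets

    # one-time setup in each clone: "clean" scrubs values on the way
    # into the repo, "smudge" injects the local values on checkout
    git config filter.secrets.clean  "sed 's/^password=.*/password=PLACEHOLDER/'"
    git config filter.secrets.smudge /path/to/inject-local-secrets.sh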
Now you should have no issue sharing the same repository with the outsourced team. Managing branches in one repo is way easier than scrubbing commits between two repos.
@icyrock.com: that seems like a recipe for disaster.
My suggestion is to separate the source code from the sensitive data.
Note that this is a more general suggestion: you probably want to keep that sensitive data safely stored and with limited access anyway.
Steps:
1. remove all sensitive data from the source code
2. create a new git repository that contains that sensitive data
3. reference the sensitive data from the original source code (how depends on the programming language; Java is not my field of expertise, but see the sketch below)
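One hypothetical way to do step 3 in Java is to resolve each value at runtime from the environment (or from an untracked file that lives in the separate repo):

    public class Secrets {
        // The variable name is illustrative; real values come from the
        // restricted-access repo and are set outside of version control.
        public static String dbPassword() {
            String value = System.getenv("APP_DB_PASSWORD");
            if (value == null) {
                throw new IllegalStateException("APP_DB_PASSWORD is not set");
            }
            return value;
        }
    }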
At this point the "cleaned" source code can be safely shared with the outsourcing team. They will not have access to the sensitive-data repo, but they can have a similar repo with their own version of that data (i.e. "demo" or "trial" or non-production paths, server names, emails).
Of course the above is needed if the outsourcing team should be put in a position to test their changes in a test environment, which I strongly assume as a MUST have. They ARE doing tests, aren't they?
This will drastically reduce, if not eliminate entirely, any problem related to big messy merges between two copies of the same code being actively developed in parallel.
I have some Java programs, and I want to find out whether they are modular or not, and if so, to what extent, because modularity can never be a binary property, i.e. 0 or 1.
How do I decide to what extent particular code is modular? And how can I make code much more modular?
Some Benchmarks for modularity:
How many times are you rewriting similar code for doing a particular task?
How much do you have to refactor your code when you change some part of your program?
Are the files small and easy to navigate through?
Are the application modules performing adequately and independently as and when required?
Is your code minimally disastrous? Does all hell break loose when you delete just one function or variable? Do you get 20-odd errors upon renaming a class? (To examine this, you can implement a stacking mechanism to keep a trace of all the hops in your application.)
How near is the code to natural-language usage? (i.e. do modules and their subcomponents represent real-world objects, without much concern for net source file size?)
For more ideas check out this blurb about modularity and this one on software quality
As for your concern on making your code more modular first you should ask yourself the above questions, obtain specific answers for them and then have a look at this.
The basic philosophy is to break your application down into code fragments that are as small as possible, arranged neatly across a multitude of easily understandable and accessible directory layouts.
Each method in your application must do no more than the minimum quanta of processing needed. Combining these methods into more and more macro level methods should lead you back to your application.
Key points are
Separation of concerns
Cohesion
Encapsulation (communicates via interface)
Substitutability
Reusability
A good example of such module system is standard car parts like disk brakes and car stereo.
You don't want to build a car stereo from scratch when you are building cars; you'd rather buy it and plug it in. You also don't want the braking system affecting the car stereo, or worse, the car stereo affecting the brake system.
To answer your question, "How do I decide that particular code is modular up to this much extent," we can form questions to test the modularity. Can you easily substitute your modules with something else without affecting other parts of your application?
XML parsers could be another example. Once you obtain the DOM interface, you really don't care which implementation of XML parser is used underneath (e.g. Apache Xerces or JAXP).
In Java, another question may be: is all functionality accessible via interfaces? Interfaces pretty much take care of low coupling.
Also, can you describe each module in your system with one sentence? For example, a car stereo plays music and radio. Disk brakes decelerate the vehicle safely.
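A small sketch of what that substitutability looks like in Java (the names are illustrative, not from any real API):

    interface Brakes {
        void decelerate(double targetSpeed);
    }

    class DiskBrakes implements Brakes {
        @Override
        public void decelerate(double targetSpeed) {
            // hydraulic disk-brake logic goes here
        }
    }

    class Car {
        private final Brakes brakes;

        // The car only knows the interface, so any Brakes implementation
        // can be swapped in without touching Car.
        Car(Brakes brakes) {
            this.brakes = brakes;
        }

        void slowDown() {
            brakes.decelerate(30.0);
        }
    }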
(Here's what I wrote in answer to "What is component driven development?")
According to Wikipedia, Component-Based Development is an alias for Component-based software engineering (CBSE).
[It] is a branch of software engineering, the priority of which is the separation of concerns in respect of the wide-ranging functionality available throughout a given software system.
This is somewhat vague, so let's look at more details.
An individual component is a software package, or a module, that encapsulates a set of related functions (or data).
All system processes are placed into separate components so that all of the data and functions inside each component are semantically related (just as with the contents of classes). Because of this principle, it is often said that components are modular and cohesive.
So, according to this definition, a component can be anything as long as it does one thing really well and only one thing.
With regards to system-wide co-ordination, components communicate with each other via interfaces. [...]
This principle results in components referred to as encapsulated.
So this is sounding more and more like what we think of good API or SOA should look like.
The provided interfaces are represented by a lollipop and required interfaces are represented by an open socket symbol attached to the outer edge of the component in UML.
Another important attribute of components is that they are substitutable, so that a component could be replaced by another (at design time or run-time), if the requirements of the initial component (expressed via the interfaces) are met by the successor component.
Reusability is an important characteristic of a high quality software component. A software component should be designed and implemented so that it can be reused in many different programs.
Substitutability and reusability is what makes a component a component.
So what's the difference between this and Object-Oriented Programming?
The idea in object-oriented programming (OOP) is that software should be written according to a mental model of the actual or imagined objects it represents. [...]
Component-based software engineering, by contrast, makes no such assumptions, and instead states that software should be developed by gluing prefabricated components together, much like in the field of electronics or mechanics.
To answer your specific question of how to make the code more modular, a couple of approaches are:
One of the best tools for modularization is spotting code reuse. If you find that your code does the exact same (or very similar) thing in more than one place, it's a good candidate for modularizing away.
Determine which pieces of logic can be made independent, in the sense that other logic can use them without needing to know how they are built. This is somewhat similar to what you do in OO design, although a module/component does not necessarily need to correspond to a modeled object as in OO.
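A tiny illustration of the first approach (the names are made up): the same validation once lived in two places and is pulled out into one reusable spot.

    final class Emails {
        private Emails() {
        }

        // Both the signup form and the import job call this one method
        // instead of carrying their own copies of the check.
        static boolean isValid(String address) {
            return address != null
                    && address.contains("@")
                    && !address.endsWith("@");
        }
    }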
Hi,
See, "How to encapsulate software (Part 1)," here:
http://www.edmundkirwan.com/encap/overview/paper7.html
Regards,
Ed.
Since this has been tagged with 'osgi', I can throw in an OSGi-related perspective.
The short answer is that it is possible to go from complete spaghetti code to modular in small steps; it doesn't have to be a big bang. For example, even spaghetti code depends on some kind of bolognaise logging library, so in some sense it's already modular, just with One Very Big Meatball (sorry, module) in it.
The trick is to break the big meatball into one smaller chunk and a slightly less big meatball, and then recurse. It doesn't all have to be done in one go either; simply chip off a bit more each time until there is nothing left to remove.
As for OSGi, it's still possible to put an uber-jar into a bundle. In fact, you can do this without changing the bits: either by modifying the Manifest.MF in place, or by wrapping the JAR in another JAR and specifying Bundle-ClassPath: meatball.jar in the manifest.
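A sketch of what such a wrapper bundle's MANIFEST.MF might contain (the symbolic name and exported package are hypothetical):

    Manifest-Version: 1.0
    Bundle-ManifestVersion: 2
    Bundle-SymbolicName: com.example.meatball
    Bundle-Version: 1.0.0
    Bundle-ClassPath: meatball.jar
    Export-Package: com.example.api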
Failing that, tools like BND can help generate the right data you'd need, and then it can be dropped in an OSGi runtime easily enough. But beware of overly coupled code, and stuff that mucks around with classloaders - those will trip you up.
Assuming I understand your question, you want to know what it is that makes code modular, since code modules will obviously need some dependencies on each other to work at all. This is my answer:
If you can break your system down into modules, and you can test those modules in isolation, that is a good indication that a system is modular.
As you say, modularity is not a binary thing, so it depends on your relative definition.
I would say: can you use a given method in any program where you need to perform that function? Is it a "black box", where you don't need to know what it is doing under the hood? If the answer is no, i.e. the method would only work properly in that one program, then it is not truly modular.
Modularity is relative to whoever is developing the code. But I think the general consensus is that modular code is code with portions that can easily be swapped out without changing most of the original code.
IMHO, if you have three modules A, B, and C, and you want to change or replace module C completely, and it is a SIMPLE task to do so, then you have modular code.
You can use a code analysis tool such as CAP to analyse the dependencies between types and packages. They'll help you find and remove any cyclic dependencies, which are often a problem when trying to develop modular code.
If there are no cyclic dependencies, you can start separating your code into discrete jars.
In general it is good practice to code to interfaces if you can; this generally means your code can more easily be refactored and/or used in different contexts.
Dependency injection frameworks such as Spring can also help with the modularity of your design. As types are injected with their dependencies by some external configuration process they don't need a direct dependency on an implementation.
The package-by-feature idea helps to make code more modular.
Many examples seen on the web divide applications first into layers, not features:
models
data access
user interface
It seems better, however, to divide applications up using top-level packages that align with features, not layers.
Here is an example of a web app that uses package-by-feature. Note the names of the top-level packages, which read as a list of actual features in the application. Note as well how each package contains all items related to a feature - the items aren't spread out all over the place; most of the time, they are all in a single package/directory.
Usually, deleting a feature in such an app can be implemented as a single operation: deleting a single directory.
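A hypothetical top-level layout along those lines, where each package owns everything for its feature:

    com.example.app.signup    // signup UI actions, model, and DAO together
    com.example.app.billing   // invoices, payment logic, billing screens
    com.example.app.reports   // report queries, models, and views
    com.example.app.util      // the few genuinely shared helpers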