AEM: After deleting user groups, rep:policy nodes remain intact - java

I'm quite stunned at what I found while tinkering with AEM (I don't think it matters, but for accuracy I'm on 6.1) trying to automate my group permission creation. I have a group called aem-tools-readonly that has a specific set of permissions. No problem there. The thing that surprises me is this: if I delete that group, the corresponding rep:policy nodes are not deleted. So if I re-create aem-tools-readonly it picks up the same configuration as before. I am wondering a couple of things.
Should I be concerned, security-wise, about creating holes in my permission scheme if groups get deleted as I move along with my projects?
Why aren't these rep:policy nodes getting deleted? Is there a valid reason?
How can I easily delete all rep:policy entries for, say, my aem-tools-readonly group?
Any information/thoughts are welcome...
Thanks

As far as I know this has always been this way.
This is how the ACL implementation works in CRX.
To fix it, prior to deleting a group you could clear all of its access entries, probably by deleting the corresponding entries lying under any rep:policy node.
There is no easy (automatic) way to do that, just code. It should be quite easy, though, to find any descendant of any rep:policy node that has your group name in it, as in the sketch below.
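Something like the following (untested, written against the plain JCR query API) should get you started; rep:ACE and rep:principalName are the node type and property that Oak/CRX use for ACL entries, and the method name and the way you obtain the admin session are just illustrative:

    import javax.jcr.Node;
    import javax.jcr.NodeIterator;
    import javax.jcr.Session;
    import javax.jcr.query.Query;
    import javax.jcr.query.QueryManager;

    // Remove every ACE that references the given principal, e.g. "aem-tools-readonly".
    // Call this before (or after) deleting the group itself.
    public static void removeAcesForPrincipal(Session session, String principalName) throws Exception {
        QueryManager qm = session.getWorkspace().getQueryManager();
        // rep:ACE nodes live under rep:policy and store the principal in rep:principalName
        Query query = qm.createQuery(
                "SELECT * FROM [rep:ACE] WHERE [rep:principalName] = '" + principalName + "'",
                Query.JCR_SQL2);
        NodeIterator nodes = query.execute().getNodes();
        while (nodes.hasNext()) {
            Node ace = nodes.nextNode();
            ace.remove();             // removes the individual allow/deny entry
        }
        session.save();               // persist the removals
    }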

Related

Is there a tree-like structure where nodes can appear multiple times and even be ancestors of themselves?

I'm crawling some web pages, recursively getting all the existing links, and I would like to preserve in some kind of structure the history of links I've had to visit to get to each link. This means it's possible to find the same link multiple times in the process.
I already store the links in a Set to make sure I don't visit the same link more than once, but this is probably not the right kind of structure for keeping the history.
As commented by John Bollinger
Any way around it, the generalization you're looking for is a graph, presumably a directed one to match the directed nature of hyperlinks. You might need to augment that with, say, an index.
That's what I was looking for at the time of the question. It's not what I used in the end, though. It turns out that keeping all the links, info, and references can be quite heavy on memory.
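For anyone who still wants the directed graph John describes, here is a minimal sketch using an adjacency map keyed by URL (the class and method names are just illustrative):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Directed graph of crawled links: each URL maps to the set of URLs it links to.
    class LinkGraph {
        private final Map<String, Set<String>> outgoing = new HashMap<>();

        // Record that 'from' links to 'to'; registers both nodes if not seen before.
        void addLink(String from, String to) {
            outgoing.computeIfAbsent(from, k -> new HashSet<>()).add(to);
            outgoing.computeIfAbsent(to, k -> new HashSet<>());
        }

        // True if the URL has already been seen, so the crawler can skip it.
        boolean contains(String url) {
            return outgoing.containsKey(url);
        }

        // All links reachable directly from the given URL (the "history" paths
        // to any link can be recovered by walking these edges backwards or forwards).
        Set<String> linksFrom(String url) {
            return outgoing.getOrDefault(url, Collections.emptySet());
        }
    }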

Work out Analyzer, Version, etc. from Lucene index files?

Just double-checking on this: I assume this is not possible and that if you want to keep such info somehow bundled up with the index files in your index directory you have to work out a way to do it yourself.
Obviously you might be using different Analyzers for different directories, and 99% of the time it is pretty important to use the right one when constructing a QueryParser: if your QP has a different one all sorts of inaccuracies might crop up in the results.
Equally, getting the wrong Version of the index files might, for all I know, not result in a complete failure: again, you might instead get inaccurate results.
I wonder whether the Lucene people have ever considered bundling up this sort of info with the index files? Equally I wonder if anyone knows whether any of the Lucene derivative apps, like Elasticsearch, maybe do incorporate such a mechanism?
Actually, just looking inside the "_0" files (_0.cfe, _0.cfs and _0.si) of an index, all 3 do actually contain the word "Lucene" seemingly followed by version info. Hmmm...
PS other related thoughts which occur: say you are indexing a text document of some kind (or 1000 documents)... and you want to keep your index up-to-date each time it is opened. One obvious way to do this would be to compare the last-modified date of individual files with the last time the index was updated: any documents which are now out-of-date would need to have info pertaining to them removed from the index, and then have to be re-indexed.
This need must occur all the time in connection with Lucene indices. How is it generally tackled in the absence of helpful "meta info" included in with the index files proper?
Anyone interested in this issue:
It does appear, from what I noted above, that the Version is contained in the index files. I looked at the CheckIndex class and the various info you can get from it, e.g. CheckIndex.Status.SegmentInfoStatus, without finding a way to obtain the Version. I'm starting to assume this is deliberate, and that the idea is just to let Lucene handle the updating of the index as required. Not an entirely satisfactory state of affairs, if so...
As for other things, such as the Analyzer class, it appears you have to implement this sort of "metadata" yourself if you want it... this could be done by just including a text file alongside the other index files, or alternatively it appears you can use the index commit user data. Of course your Version could also be stored this way.
For writing such info, see IndexWriter.setCommitData().
For retrieving such info, you have to use one of several (?) subclasses of IndexReader, such as DirectoryReader.
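To make that concrete, here is a rough, untested sketch against approximately the Lucene 5.x API (where IndexWriter.setCommitData() exists; later releases renamed it setLiveCommitData()). The index path and the keys/values stored are just examples:

    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class CommitDataDemo {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(Paths.get("/path/to/index"));

            // Writing: attach your own metadata (analyzer class, app-level version, ...) to the commit.
            IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
            Map<String, String> commitData = new HashMap<>();
            commitData.put("analyzer", StandardAnalyzer.class.getName());
            commitData.put("appIndexVersion", "1.0");
            writer.setCommitData(commitData);
            writer.commit();
            writer.close();

            // Reading: the user data travels with the index files themselves.
            DirectoryReader reader = DirectoryReader.open(dir);
            Map<String, String> userData = reader.getIndexCommit().getUserData();
            System.out.println("analyzer used at index time: " + userData.get("analyzer"));
            reader.close();
        }
    }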

Find and delete duplicates in a Lotus Notes database

I am very new to Lotus Notes. Recently my teammates were facing a problem with duplicates in Lotus Notes, as shown below in Case A and Case B.
So we bought an app named scanEZ (link about scanEZ). Using this tool we can remove either the first occurrence or the second occurrence. In Case A and Case B the second items are considered redundant because they do not have children, so we can remove all the second items as shown below and thus remove the duplicates.
But in Case C the order is changed: the child item comes first and the parent item comes second, so I am unable to use the scanEZ app.
Is there any better way, software, or script to accomplish my task? As I am new to this field I have no idea. Kindly help me.
Thanks in advance.
Probably the easiest way to approach this would be to force the view to always display documents with children first. That way the tool you have purchased will behave consistently for you. You would do this by adding a hidden sorted column to the right of the column that you have circled. The formula in this column would be @DocChildren, and the sort option for the column would be set to 'Descending'. (Note that if you are uncomfortable making changes in this view, you can make a copy of it, make your changes in the copy, and run ScanEZ against the copy. You can also do all of this in a local replica of the database, and only replicate it back to the server when you are satisfied that you have the right results.)
The other way would be to write your own code in LotusScript or Java, using the Notes classes. There are many different ways you could write that code.
I agree with Richard's answer. If you want more detail on how to go through the document collection, you could isolate the documents into a view that shows only the duplicates. Then write an agent that looks at the UNID of the document, the date modified, and other such data elements to ensure that you are keeping the last updated document. I would add a field to the document, such as FLAG='keep'. Then delete the documents that don't have your flag with a second agent, as in the sketch below. If you take this approach you can often reuse the same agents in other databases.
Since you are new to Notes, keep in mind that Notes is a document database. There are several different kinds of conflicts, such as save conflicts and replication conflicts. You also need to look at the database settings for how duplicates are handled. I would read up on these topics just so you can explain them to your co-workers/project manager.
Eventually, in your heavily travelled databases, you might be able to automate this process once you have tracked down the source of the duplicates.
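As a rough, untested sketch of that agent approach in Java using the lotus.domino classes (the view name 'Duplicates', the key field 'Subject', and the flag field 'FLAG' are assumptions for illustration):

    import java.util.HashMap;
    import java.util.Map;
    import lotus.domino.AgentBase;
    import lotus.domino.Database;
    import lotus.domino.Document;
    import lotus.domino.Session;
    import lotus.domino.View;

    public class FlagAndDeleteDuplicates extends AgentBase {
        public void NotesMain() {
            try {
                Session session = getSession();
                Database db = session.getAgentContext().getCurrentDatabase();
                View view = db.getView("Duplicates");        // view showing only the duplicates

                // Pass 1: for each key, flag the most recently modified document as the keeper.
                Map<String, Document> keepers = new HashMap<String, Document>();
                Document doc = view.getFirstDocument();
                while (doc != null) {
                    String key = doc.getItemValueString("Subject");
                    Document best = keepers.get(key);
                    if (best == null
                            || doc.getLastModified().toJavaDate()
                                  .after(best.getLastModified().toJavaDate())) {
                        keepers.put(key, doc);
                    }
                    doc = view.getNextDocument(doc);
                }
                for (Document keep : keepers.values()) {
                    keep.replaceItemValue("FLAG", "keep");
                    keep.save(true, false);
                }

                // Pass 2: delete everything in the view that was not flagged.
                doc = view.getFirstDocument();
                while (doc != null) {
                    Document next = view.getNextDocument(doc);
                    if (!"keep".equals(doc.getItemValueString("FLAG"))) {
                        doc.remove(true);                    // hard-delete the duplicate
                    }
                    doc = next;
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

For brevity this combines the flagging and the deletion in one agent; as described above, you could just as well split them into two separate agents.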
These are clearly not duplicates.
The definition of duplicate is that they are identical and so it does not matter which one is kept and which one is removed. To you, the fact that one has children makes it more important, which means that they are not pure duplicates.
What you have not stated is what you want to do if multiple documents with similar dates/subjects have children (a case D if you will).
To me this appears to be three separate problems.
The first problem is to sort out the cases where more than one document in a set has children.
Then sort out the cases where only one document in a set has children.
Then sort out the cases where none of the documents in a set has children.
The approach in each case will be different. The article from Ytria only really covers the last of these cases.

Find out the SHA1 of a file from the index

Short story:
Like the title says, how do I get the SHA1 (hash) of a checked-out file from the index using JGit?
Long story:
I am writing a GUI application and using JGit for revisioning of just one file. So a user is able to open a window which contains all of the revisions of this file in a nice table.
User can make his changes and commit them. Also, user can go back in time and choose an older revision from a table to work on.
This workflow is very simple. Internally, with JGit, I use only one branch (the master branch). HEAD always points to this branch and the tip of this branch is always the newest commit. When a user chooses an older revision I simply instantiate the CheckoutCommand class, set the path to the file using addPath(), and use the master branch via setName().
The above results in HEAD pointing to master branch which in turn points to the newest revision (not the user chosen revision). But the index and the working directory itself are now at the revision chosen by the user.
So, finally, I want to be able to show the user which of the revisions in the table is currently checked out (or activated, or whatever you want to call it). This revision would then be highlighted, as in the screenshot below. But I cannot use the tip of the master branch for this purpose. I need to somehow get the SHA1 from the index.
There is a question posted that asks exactly what I want, but not in the context of JGit (the author of that question uses git directly).
EDIT: After a little more analysis I found that I could use JGit's DirCache to access the contents of the index. Using the DirCache class I am able to get the SHA1 of the file in the index, just like in that question. But now I see that this hash is not the same as the hash of the revision from which I checked out. This means I cannot use this method to determine which revision from the table is checked out.
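For reference, the index lookup described above looks roughly like this (a minimal sketch; the file path is a placeholder, and note the result is the blob id of the file content, not a commit id):

    import org.eclipse.jgit.dircache.DirCache;
    import org.eclipse.jgit.dircache.DirCacheEntry;
    import org.eclipse.jgit.lib.ObjectId;
    import org.eclipse.jgit.lib.Repository;

    // Read the index and look up the blob id of the file as staged in the index.
    static ObjectId indexBlobId(Repository repository, String path) throws java.io.IOException {
        DirCache dirCache = repository.readDirCache();
        DirCacheEntry entry = dirCache.getEntry(path);          // e.g. "path/to/my-file.txt"
        return entry == null ? null : entry.getObjectId();      // blob SHA1, not a commit SHA1
    }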
So, is there any other way, using my workflow as described, to determine which revision the user has chosen to work on? Or maybe someone can propose a different approach.
My current approach to this problem is to use JGit's AddNoteCommand. When the user checks out a revision I simply add a note to that revision with some "key: value" pair, where the key indicates whether the revision is checked out or not. Anyone with a better suggestion?
So first of all, sorry to say it, but I think it's dangerous and unintuitive to do what you are doing. Git is built so that you use branches. I think what you are doing is called detached-HEAD manipulation, and it's not recommended, even though JGit allows you to do many things.
But if you are very careful, well, you can go on.
Second, the DirCache (previously the index) object has been very mysterious to me, and I think (though I am not sure) the JGit team is still working on it.
Finally, to actually answer the question: I think you should use the LogCommand with its addPath(...) method. You will get a list of RevCommit objects, from which you can determine the SHA1. I don't precisely remember how you get the SHA1; I think you call getName() when you have a Ref object. I guess you'll find it on StackOverflow.
However, I would recommend using branches (depending on what operation you want to perform on your commit), based on the SHA1 you got: you create a branch from the SHA1 you just found and can then safely perform any operation you want. Then you either delete the branch if you don't want to commit anything, or you merge it later.
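A minimal, untested sketch of the LogCommand approach (repository location and file path are placeholders):

    import java.io.File;
    import org.eclipse.jgit.api.Git;
    import org.eclipse.jgit.revwalk.RevCommit;

    // Print the SHA1 of every commit that touched the tracked file.
    static void printFileHistory() throws Exception {
        try (Git git = Git.open(new File("/path/to/repo"))) {
            Iterable<RevCommit> commits = git.log().addPath("path/to/my-file.txt").call();
            for (RevCommit commit : commits) {
                String sha1 = commit.getName();   // full 40-character SHA1 of the commit
                System.out.println(sha1 + " " + commit.getShortMessage());
            }
        }
    }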

How to handle a large number of concurrent write operations in Oracle?

I am maintaining a lottery website with millions of users. Some active users (perhaps more than 30,000) will buy more than 1,000 lottery tickets within one second.
The current logic uses SELECT ... FOR UPDATE to verify the account balance, but meanwhile the database server is overloaded and very slow to respond. We have to process these purchases in real time.
Has anyone encountered a similar scenario before?
First, you need to design a transactional system that satisfies your business rules. For the moment, forget about disk and memory, and what goes where. Try to design a system that is as lightweight as possible, that does the minimum required amount of locking, that satisfies your business rules.
Now, run the system, what happens? If performance is acceptable, congratulations, you're done.
If performance is not acceptable, avoid the temptation to guess at the problem, and start making adjustments. You need to profile the system. You need to understand where the most time is being spent, so that you know what areas to focus your tuning efforts on. The easiest way to do this, is to trace it, using SQL_TRACE. You've not made any mention of Oracle edition, version, or platform. So, I'll assume you're at least on some version of 10gR2. So, use DBMS_MONITOR to start/end traces. Now, scoping is important here. What I mean is, it's critically important that you start the trace, run the code that you want to profile and then immediately shut off the trace. This way, you trace only what you're interested in, and the profile won't contain any extraneous information. Once you have the trace file, you need to process it. There are several tools. The most common is TkProf, which is provided by Oracle, but really doesn't do a very good job. The best free profiler that I'm aware of, is OraSRP. Download a copy of OraSRP, and check your results. The data in the report should point you in the right direction.
Once you've done all that, if you still have questions, ask a new question here, and I'm sure we can help you interpret the output of OraSRP, to help you understand where your bottlenecks are.
Hope that helps.
Personally, I would lock/update the accounts in memory and update the database as a background task. Using this approach you can easily support thousands of updates and accounts.
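That answer is light on detail, so here is one way such an in-memory approach might be sketched (untested; the account/balance model, the accounts table, and the one-second flush interval are all assumptions, and you still need to think hard about durability if the process dies before a flush):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.sql.DataSource;

    // Keeps account balances in memory and writes them back to the database in the background.
    class AccountCache {
        private final Map<Long, AtomicLong> balances = new ConcurrentHashMap<>();
        private final DataSource dataSource;

        AccountCache(DataSource dataSource) {
            this.dataSource = dataSource;
            ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();
            flusher.scheduleAtFixedRate(this::flush, 1, 1, TimeUnit.SECONDS);
        }

        // Atomically debit an account; returns false if funds are insufficient.
        boolean tryDebit(long accountId, long amount) {
            AtomicLong balance =
                    balances.computeIfAbsent(accountId, id -> new AtomicLong(loadBalance(id)));
            while (true) {
                long current = balance.get();
                if (current < amount) {
                    return false;
                }
                if (balance.compareAndSet(current, current - amount)) {
                    return true;
                }
            }
        }

        // Background task: push the current in-memory balances back to the database in one batch.
        private void flush() {
            String sql = "UPDATE accounts SET balance = ? WHERE id = ?";   // assumed schema
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                for (Map.Entry<Long, AtomicLong> e : balances.entrySet()) {
                    ps.setLong(1, e.getValue().get());
                    ps.setLong(2, e.getKey());
                    ps.addBatch();
                }
                ps.executeBatch();
            } catch (Exception e) {
                e.printStackTrace();   // in real code: retry, alert, or fail over
            }
        }

        // Placeholder: initial read of the balance from the database.
        private long loadBalance(long accountId) {
            return 0L;
        }
    }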
A. Speed up things without modifying the code:
1 - You can keep the table entirely in memory (that is, in the SGA; it also remains on disk):
alter table t storage ( buffer_pool keep )
(discuss this with your DBA before doing it)
2 - If the table is too big and you update the same rows again and again, it is probably sufficient to use the cache attribute:
alter table t cache
This command puts the blocks of your table, when they are used, at the highest priority in the LRU list, so there is less chance of them being aged out of the SGA.
Here is a discussion about the differences: Ask Tom
3 - Another, more advanced solution that needs more analysis and resources is TimesTen.
B. Speed up your database operations:
Identify your top queries and:
create indexes where you update or select only one row or a small set of rows.
partition large tables that are scanned for only a segment of the data.
Have you identified a top query?
