Get similar part of string array items - java

I have an array of Strings:
qTrees[0] = "023012311312201123123130110332";
qTrees[1] = "023012311130023103123130110332";
qTrees[2] = "023013200020123103123130110333";
qTrees[3] = "023013200202301123123130110333";
Using this loop I'm trying to retrieve the part they have in common:
String similarPart = "";
for (int i = 0; i < qTrees[0].length(); i++){
if (qTrees[0].charAt(i) == qTrees[1].charAt(i) &&
qTrees[1].charAt(i) == qTrees[2].charAt(i) &&
qTrees[2].charAt(i) == qTrees[3].charAt(i) ){
similarPart += qTrees[0].charAt(i);
} else {
break;
}
}
But this is wrong. As you can see, it returns only "02301", although the strings have more in common further along.
Please suggest a better way to do it. Thanks.

You need to better define what you are trying to achieve. Do you want to:
find the longest common starting sequence between any two entries in the array;
find the longest common starting sequence across all of the entries in the array;
find the longest common sequence (i.e. same characters in same position) between any two entries;
find the longest common sequence across all entries in the array.
Each of these calls for a slightly different approach, but they all boil down to correctly using break and continue in your loops.
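To make two of those readings concrete, here is a minimal sketch. It assumes you want either the common starting sequence across all entries, or the longest run of positions where all entries agree. The class and method names (CommonParts, longestCommonPrefix, longestCommonRun) are made up for illustration.

```java
class CommonParts {
    // Longest sequence that every entry starts with.
    static String longestCommonPrefix(String[] items) {
        if (items.length == 0) return "";
        String first = items[0];
        for (int i = 0; i < first.length(); i++) {
            char c = first.charAt(i);
            for (String s : items) {
                // Stop at the first position where any entry is shorter or differs.
                if (i >= s.length() || s.charAt(i) != c) return first.substring(0, i);
            }
        }
        return first;
    }

    // Longest run of consecutive positions at which all entries hold the same character.
    static String longestCommonRun(String[] items) {
        if (items.length == 0) return "";
        int minLen = items[0].length();
        for (String s : items) minLen = Math.min(minLen, s.length());
        int bestStart = 0, bestLen = 0, runStart = 0;
        for (int i = 0; i < minLen; i++) {
            char c = items[0].charAt(i);
            boolean same = true;
            for (String s : items) {
                if (s.charAt(i) != c) { same = false; break; }
            }
            if (same) {
                int len = i - runStart + 1;
                if (len > bestLen) { bestLen = len; bestStart = runStart; }
            } else {
                runStart = i + 1; // a mismatch ends the current run
            }
        }
        return items[0].substring(bestStart, bestStart + bestLen);
    }
}
```

With the four qTrees strings from the question, longestCommonPrefix gives "02301" and longestCommonRun gives "312313011033" (positions 17 to 28).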

Remove the else branch from your code. The loop will then check every position up to the end of the string instead of stopping at the first mismatch.
The code:
for (int i = 0; i < qTrees[0].length(); i++){
if (qTrees[0].charAt(i) == qTrees[1].charAt(i) &&
qTrees[1].charAt(i) == qTrees[2].charAt(i) &&
qTrees[2].charAt(i) == qTrees[3].charAt(i) ){
similarPart += qTrees[0].charAt(i);
}
}


Efficient alternative to nested For Loop

I am writing a profanity filter. I have two nested for loops, as shown below. Is there a better way that avoids the nested loop and improves the time complexity?
boolean isProfane = false;
final String phraseInLowerCase = phrase.toLowerCase();
for (int start = 0; start < phraseInLowerCase.length(); start++) {
if (isProfane) {
break;
}
for (int offset = 1; offset < (phraseInLowerCase.length() - start + 1 ); offset++) {
String subGeneratedCode = phraseInLowerCase.substring(start, start + offset);
//BlacklistPhraseSet is a HashSet which contains all profane words
if (blacklistPhraseSet.contains(subGeneratedCode)) {
isProfane=true;
break;
}
}
}
Consider a Java 8 version of @Mad Physicist's implementation:
boolean isProfane = Stream.of(phrase.split("\\s+"))
.map(String::toLowerCase)
.anyMatch(w -> blacklistPhraseSet.contains(w));
or
boolean isProfane = Stream.of(phrase
.toLowerCase()
.split("\\s+"))
.anyMatch(w -> blacklistPhraseSet.contains(w));
If you want to check every possible combination of consecutive characters, then your algorithm is O(n^2), assuming that you use a Set with O(1) lookup characteristics, like a HashSet. You would probably be able to reduce this by breaking the data and the blacklist into Trie structures and walking along each possibility that way.
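As a rough sketch of the trie idea, assuming only the blacklist is put into a trie and the phrase is scanned once from each start position: the class name ProfanityTrie and its methods are hypothetical, not part of the original code.

```java
import java.util.HashMap;
import java.util.Map;

class ProfanityTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a blacklisted word ends at this node
    }

    private final Node root = new Node();

    ProfanityTrie(Iterable<String> blacklist) {
        for (String word : blacklist) {
            if (word.isEmpty()) continue; // an empty entry would match everything
            Node node = root;
            for (char c : word.toLowerCase().toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.terminal = true;
        }
    }

    // True if any blacklisted word occurs anywhere in the phrase.
    boolean containsProfanity(String phrase) {
        String text = phrase.toLowerCase();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.children.get(text.charAt(i));
                if (node == null) break;         // no blacklisted word continues this way
                if (node.terminal) return true;  // a full blacklisted word was matched
            }
        }
        return false;
    }
}
```

The inner loop stops as soon as no blacklisted word can continue, so the work per start position is bounded by the length of the longest blacklisted word, and no substrings are allocated.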
A simpler approach might be to use a heuristic like "profanity always starts and ends at a word boundary". Then you can do
isProfane = false;
for(String word: phrase.toLowerCase().split("\\s+")) {
if(blacklistPhraseSet.contains(word)) {
isProfane = true;
break;
}
}
You won't improve a lot on time complexity, because those use iterations under the hood but you could split the phrase on spaces and iterate over the array of words from your phrase.
Something like:
String[] arrayWords = phrase.toLowerCase().split(" ");
for(String word:arrayWords){
if(blacklistPhraseSet.contains(word)){
isProfane = true;
break;
}
}
The problem with this code is that it won't match profanity embedded in a compound word, whereas your code, as I understand it, will. The word "f**k" in the blacklist won't match "f**kwit" in my code, but it will in yours.

How to find a root word in an ArrayList

I'm working on an NLP project and trying to match a given input against a root in an ArrayList.
For example, the user enters لاعبون and the code should find the word لعب in the ArrayList, but when I run my code it gives me more than one root.
for(String dbData : rootList) {
//System.out.println(dbData);
// if(dbData.contains(x)) {
// System.out.println(dbData);
// }
for (int i = 0; i < dbData.length(); i++) {
c = dbData.charAt(i);
for (int j = 0; i < x.length(); i++) {
d = x.charAt(i);
if (c == d && m != rootList.size()) {
match = true;
//System.out.println(dbData);
} else {
++m;
match = false;
//System.out.println("لا يوجد تطابق");
}
if(match) {
System.out.println(dbData);
container = dbData;
}
}
}
}
This does not seem like the right approach to stemming. Try the steps below, which are a simple way to find stems in Arabic.
First, you need a list of stems, and you evidently have that.
Then you need to write down the Arabic morphological rules and forms that reduce a word to a stem.
Finally, convert those rules to Java regexes.
For example, to get لعب from لاعبون, first remove ون, since it only marks person and number, then check whether لاعب is derived from one of the stems. Since لاعب is the فاعل form of لعب, you should choose لعب.
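The example above could be sketched like this. It covers only the two rules mentioned (the ون suffix and the فاعل form) and is nowhere near a complete stemmer; ArabicRootFinder and findRoot are hypothetical names.

```java
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArabicRootFinder {
    // Rule 1 (illustrative): strip the plural suffix ون from the end of the word.
    private static final Pattern PLURAL_SUFFIX = Pattern.compile("ون$");
    // Rule 2 (illustrative): the فاعل form is root letter 1, alef, root letter 2, root letter 3.
    private static final Pattern FAAIL_FORM = Pattern.compile("^(.)ا(.)(.)$");

    static Optional<String> findRoot(String word, Set<String> roots) {
        String stem = PLURAL_SUFFIX.matcher(word).replaceFirst("");
        if (roots.contains(stem)) return Optional.of(stem);
        Matcher m = FAAIL_FORM.matcher(stem);
        if (m.matches()) {
            // Drop the alef to recover the three root letters.
            String candidate = m.group(1) + m.group(2) + m.group(3);
            if (roots.contains(candidate)) return Optional.of(candidate);
        }
        return Optional.empty(); // no rule produced a known root
    }
}
```

Because a candidate is only accepted when it is in the root list, each word yields at most one root, which avoids the multiple matches produced by comparing character by character.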

Java/Angularjs - convert variable names to normal English conventions

My goal here is to retrieve the attribute names from a class, which I have already done using Java reflection. But I want to be able to transform the variable naming convention, e.g. firstName to First Name.
My current idea is to use .split(), convert position 0 (usually lower case) to upper case, then loop until I find subsequent upper-case letters and insert a blank space before each. Is there a better way to do this?
EDIT: This is my current method, if any of you are interested:
public List<String> getProfileConstraintTemplateEnglish() {
//what I want to return
List<String> transformedList = new ArrayList<>();
//The reflection that I'm getting
List<ResultProfileConstraintTemplate> tmp = constraintService.getProfileCTml();
//loop each obj in reflection list
for (ResultProfileConstraintTemplate r : tmp) {
//get the letters first from the title in obj
String[] field = r.getTitle().split("");
//this is the transformed string in each tmp.
String transformed = "";
//converting the array to a list for simpler addition.
List<String> fieldString = Arrays.asList(field);
//adding a counter to know which is the "first" position.
int counter = 0;
for (String s : fieldString) {
//first letter
if (counter == 0) {
transformed += s.toUpperCase();
}
//everything else
if (counter != 0 && s.equals(s.toUpperCase())) {
transformed+= " ";
transformed+=s;
}
else if(counter != 0 && s.equals(s.toLowerCase())){
transformed+=s;
}
//increment counter
counter++;
}
//add the transformed word to list.
transformedList.add(transformed);
}
return transformedList;
}
Result:
I think your way is the only way. If you post your code, maybe we can shed more light on the matter.
You can use the Character.isUpperCase() method: whenever it returns true, insert a space before that letter, and always convert the first letter, i.e. charAt(0), with toUpperCase().
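A minimal sketch of that suggestion, assuming plain camelCase names without acronyms or digits (the class name CamelCaseHumanizer and method humanize are made up):

```java
class CamelCaseHumanizer {
    // Turns "firstName" into "First Name".
    static String humanize(String name) {
        if (name == null || name.isEmpty()) return name;
        StringBuilder sb = new StringBuilder(name.length() + 4);
        sb.append(Character.toUpperCase(name.charAt(0))); // first letter is always capitalised
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isUpperCase(c)) sb.append(' '); // a new word starts at each upper-case letter
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Using a StringBuilder and charAt avoids splitting the name into one-character strings. Note that a name like "userID" would come out as "User I D", so acronyms need an extra rule.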

Java for loop - code efficiency

I have an array of 10,000 elements and I want to loop through it to find a particular number, say '6573'.
Example
for(int i=0;i<=user.length;i++)
{
if(user[i]=="6573")
System.out.println("Found it!!!");
}
Could you please suggest a way to improve the performance of this code?
Thanks
Use .equals instead of == to compare strings.
Break when a match is found.
The end condition must be i < user.length to prevent an ArrayIndexOutOfBoundsException:
for(int i = 0; i < user.length; i++) {
if("6573".equals(user[i])) {
System.out.println("Found it!!!");
break;
}
}
Note that I inverted the .equals() call to prevent NullPointerException if the array contains null values.
If you need to do it once then that's about it. If you try to find several users in that list then you could use a set for O(1) search:
Set<String> set = new HashSet<>(Arrays.asList(user));
if(set.contains("6573"))
System.out.println("Found it!!!");
And it may actually make sense to store your users directly in that set in the first place instead of using an array.
If the array elements are in order (sorted), then you can use binary search, which will improve the performance of your program.
Sort the array using sort() (O(n log n)) and then use binary search (O(log n) per lookup). For a single lookup the sort costs more than one linear O(n) scan, so this only pays off when the same array is searched many times. Just giving details of @abhi's answer.
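A short sketch of the sort-then-search approach, using java.util.Arrays. The class and method names (SortedLookup, sortedCopy, containsSorted) are illustrative only.

```java
import java.util.Arrays;

class SortedLookup {
    // Sort once, up front: O(n log n). A copy is made so the caller's array keeps its order.
    static String[] sortedCopy(String[] users) {
        String[] copy = Arrays.copyOf(users, users.length);
        Arrays.sort(copy);
        return copy;
    }

    // Each lookup on the sorted array is O(log n).
    // The array must contain no nulls and must already be sorted.
    static boolean containsSorted(String[] sortedUsers, String target) {
        return Arrays.binarySearch(sortedUsers, target) >= 0;
    }
}
```

Call sortedCopy once and reuse its result for every later lookup; sorting again before each search would be slower than the plain loop.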
Try it like this:
int index = -1;
boolean found = false;
for(int i = 0; i < array.length; i++)
{
if(array[i].equalsIgnoreCase(userInput))
{
index = i;
found = true;
break;
}
}
You could use multi-threading with, for example, two pointers: one travelling from the start of the array and one from the end, each stopping at the middle of the array.
It will use more processing power but may take less time.
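If you want to try a multi-threaded search without managing threads by hand, one option is a parallel stream. This swaps the two-pointer idea for Java's built-in work splitting, so treat it as a sketch of a different technique; ParallelLookup is a made-up name.

```java
import java.util.Arrays;

class ParallelLookup {
    // Splits the array across the common fork-join pool and stops once any chunk finds a match.
    // target must not be null; null array elements are simply not matched.
    static boolean containsParallel(String[] users, String target) {
        return Arrays.stream(users).parallel().anyMatch(target::equals);
    }
}
```

For an array of only 10,000 strings, the cost of coordinating threads may well outweigh any gain, so measure before adopting this.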

Java regex: How to replace all character inside a bracket?

How can I replace:
((90+1)%(100-4)) + ((90+1)%(100-4/(6-4))) - (var1%(var2%var3(var4-var5)))
with
XYZ((90+1),(100-4)) + XYZ((90+1),100-4/(6-4)) - XYZ(var1,XYZ(var2,var3(var4-var5)))
with regex?
Thanks,
J
This doesn't really look like a good job for a regex. It looks like you might want to write a quick recursive descent parser instead. If I understand you correctly, you want to replace the infix operator % with a function name XYZ?
So (expression % expression) becomes XYZ(expression, expression)
This looks like a good resource to study: http://www.cs.uky.edu/~lewis/essays/compilers/rec-des.html
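Here is a small recursive scanner in the spirit of that suggestion. It is not a full expression parser: it assumes every % is the only operator at the top level of its enclosing parentheses, as in the example, and it ignores operator precedence. The class name ModuloRewriter is made up.

```java
class ModuloRewriter {
    // Rewritten text of one nesting level, plus the index of its first '%' (or -1).
    private static final class Body {
        final String text;
        final int percentAt;
        Body(String text, int percentAt) { this.text = text; this.percentAt = percentAt; }
    }

    private final String src;
    private int pos = 0;

    private ModuloRewriter(String src) { this.src = src; }

    static String rewrite(String expr) {
        ModuloRewriter r = new ModuloRewriter(expr);
        String out = r.parseBody().text;
        if (r.pos != expr.length()) {
            throw new IllegalArgumentException("Unmatched ')' at index " + r.pos);
        }
        return out;
    }

    // Reads up to the next unmatched ')' or the end of input, recursing into each '('.
    private Body parseBody() {
        StringBuilder out = new StringBuilder();
        int percentAt = -1;
        while (pos < src.length() && src.charAt(pos) != ')') {
            char c = src.charAt(pos++);
            if (c == '(') {
                out.append(parseGroup());
            } else {
                if (c == '%' && percentAt < 0) percentAt = out.length();
                out.append(c);
            }
        }
        return new Body(out.toString(), percentAt);
    }

    // Called just after '(' was consumed; emits either "(...)" or "XYZ(left,right)".
    private String parseGroup() {
        Body body = parseBody();
        if (pos >= src.length()) throw new IllegalArgumentException("Missing ')'");
        pos++; // consume ')'
        if (body.percentAt < 0) return "(" + body.text + ")";
        return "XYZ(" + body.text.substring(0, body.percentAt) + ","
                + body.text.substring(body.percentAt + 1) + ")";
    }
}
```

For the expression in the question, ModuloRewriter.rewrite returns XYZ((90+1),(100-4)) + XYZ((90+1),(100-4/(6-4))) - XYZ(var1,XYZ(var2,var3(var4-var5))).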
I don't know much about regex, but try looking at this, especially 9 and 10:
http://www.mkyong.com/regular-expressions/10-java-regular-expression-examples-you-should-know/
And of course:
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
You could at least check them out until an in depth answer comes along.
See this code:
String input = "((90+1)%(100-4)) + ((90+1)%(100-4/(6-4))) - (var1%(var2%var3(var4-var5)))";
input = input.replaceAll("%", ",");
int level = 0;
List<Integer> targetStack = new ArrayList<Integer>();
List<Integer> splitIndices = new ArrayList<Integer>();
// add the index of last character as default checkpoint
splitIndices.add(input.length());
for (int i = input.length() - 1; i >= 0; i--) {
if (input.charAt(i) == ',') {
targetStack.add(level - 1);
} else if (input.charAt(i) == ')') {
level++;
}
else if (input.charAt(i) == '(') {
level--;
if (!targetStack.isEmpty() && level == targetStack.get(targetStack.size() - 1)) {
splitIndices.add(i);
}
}
}
Collections.reverse(splitIndices); // reversing the indices so that they are in increasing order
StringBuilder result = new StringBuilder();
for (int i = 1; i < splitIndices.size(); i++) {
result.append("XYZ");
result.append(input.substring(splitIndices.get(i - 1), splitIndices.get(i)));
}
System.out.println(result);
The output is as you expect it:
XYZ((90+1),(100-4)) + XYZ((90+1),(100-4/(6-4))) - XYZ(var1,XYZ(var2,var3(var4-var5)))
However, keep in mind that it is a bit hacky and might not work exactly as you expect in every case. By the way, I had to change the expected output slightly by adding a pair of brackets, XYZ((90+1),(100-4/(6-4))), because otherwise you were not following your own conventions. Hopefully this code helps you. For me it was a good exercise at least.
Would it satisfy your requirements to do the following?
Find ( at the first position or preceded by a space and replace it with XYZ(
Find % and replace it with ,
If those two instructions are sufficient and satisfactory, then you could transform the original string in three "moves":
Replace ^\( with XYZ(
Replace \( preceded by a space with XYZ( (keeping the space)
Replace % with ,
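Those moves can be sketched with String.replaceAll, folding the first two into one pattern. NaiveRegexRewrite is a made-up name.

```java
class NaiveRegexRewrite {
    static String rewrite(String expr) {
        return expr
                // "(" at the start of the string or after whitespace becomes "XYZ("
                .replaceAll("(^|\\s)\\(", "$1XYZ(")
                // every % becomes a comma
                .replace("%", ",");
    }
}
```

On the expression from the question this yields XYZ((90+1),(100-4)) + XYZ((90+1),(100-4/(6-4))) - XYZ(var1,(var2,var3(var4-var5))). The inner (var2%...) group is not preceded by a space, so it does not get its own XYZ; nested % operators need something that tracks bracket depth.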
