Java Get the position of a value using an arraylist - java

I'm trying to get the position of the STX (0x02) in the byte array message below. The message contains 0x02 in several places, but the only position I want is that of the STX. I've been looping through it backwards using a for loop (I have to loop backwards). I've tried a number of approaches without success. One of them: wherever a 0x02 occurs with three or more elements between it and a following ETX (0x03), take that position as the STX. I'm doing something wrong, though, because I keep getting an error that I can't resolve. Can you please help?
EDIT: If there is a better way than my logic to find the STX position and distinguish it from the other 0x02 bytes, please share it.
EDIT: I have to loop backwards because that is required by the instructions given to me.
EDIT: Here is the code:
//Test 3:
public String Test3(List<Byte> byteList) {
//checking positions of the STX and ETX
//looping through the array backwards
for (int i = byteList.size() - 1; i >= 0; i--) {
if (byteList.get(i) == STX && (i >= 3 && byteList.get(i) == ETX)) {
STXpos = i;
}
}
return STXpos;
}
byte[] validMsgWithRandomData = {0x32,0x32,0x32, //Random data
0x02, // STX
0x31,0x32,0x10,0x02,0x33, // Data 31 32 02 33
0x03, // ETX
0x31^0x32^0x02^0x33^0x03,// LRC calculated from the data (with the DLE removed) plus the ETX
0x2,0x3,0x00,0x02 //Random data
};

My first attempt with backward loop and in O(n) complexity.
EDIT : getting rid of candidate for STX.
EDIT 2 : This solution works at least for a few cases including OP's one (but it has not been tested extensively...).
final int NOTFOUND = -1;
final int ETX = 0x03;
final int STX = 0x02;
int stxpos = NOTFOUND;
int etxpos = NOTFOUND;
int etxcandidatepos = NOTFOUND;
for (int i = validMsgWithRandomData.length - 1; i >=0; --i)
{
if (ETX == validMsgWithRandomData[i])
{
etxcandidatepos = i;
if (NOTFOUND == etxpos)
{
etxpos = i;
stxpos = NOTFOUND;
}
}
else if (STX == validMsgWithRandomData[i])
{
if (NOTFOUND != etxpos)
{
stxpos = i;
if (NOTFOUND != etxcandidatepos)
{
etxpos = etxcandidatepos;
etxcandidatepos = NOTFOUND;
}
}
}
}

Since the number of elements between STX and ETX is not constant, I'd search in forward order and look for an ETX after finding an STX:
public int Test3(List<Byte> byteList) {
// find potential STX
for (int i = 0; i < byteList.size(); ++i) {
if (byteList.get(i) == STX) {
// make sure matching ETX exists
for (int j = i + 1; j < byteList.size(); ++j) {
if (byteList.get(j) == ETX) {
return i;
}
}
}
}
return -1; // no STX followed by an ETX was found
}
You can also do it in reverse order if you really want:
public int Test3(List<Byte> byteList) {
// find potential ETX (an ETX at index 0 cannot have an STX before it)
for (int i = byteList.size() - 1; i > 0; --i) {
if (byteList.get(i) == ETX) {
// make sure matching STX exists; include index 0 in the search
for (int j = i - 1; j >= 0; --j) {
if (byteList.get(j) == STX) {
return j;
}
}
}
}
return -1; // no ETX preceded by an STX was found
}
By the way, if you want to enforce a minimum number of elements between STX and ETX, you can do it by changing j's initial value.
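As for the "better way" the question asks about, one option is to use two hints that the sample message itself carries: the 0x02 inside the data is preceded by 0x10 (DLE), and the byte after the ETX is an LRC over the data plus the ETX. The sketch below assumes exactly that framing (a DLE escapes the next data byte, and the LRC is a plain XOR). The class and method names are made up for illustration. It scans backwards, as required, and only accepts an STX whose frame checksum matches.

```java
final class FrameFinder {
    static final byte STX = 0x02;
    static final byte ETX = 0x03;
    static final byte DLE = 0x10; // assumption: escapes the following data byte

    /** Scans backwards and returns the index of an STX whose frame has a matching LRC, or -1. */
    static int findStx(byte[] msg) {
        // An ETX must be followed by one LRC byte, so start at length - 2.
        for (int etx = msg.length - 2; etx >= 1; etx--) {
            if (msg[etx] != ETX) {
                continue;
            }
            for (int stx = etx - 1; stx >= 0; stx--) {
                if (msg[stx] != STX) {
                    continue;
                }
                if (stx > 0 && msg[stx - 1] == DLE) {
                    continue; // escaped 0x02: this is data, not a frame start
                }
                if (lrc(msg, stx + 1, etx) == msg[etx + 1]) {
                    return stx;
                }
            }
        }
        return -1;
    }

    /** XOR of the unescaped data in [from, etx) plus the ETX itself. */
    static byte lrc(byte[] msg, int from, int etx) {
        int x = ETX;
        for (int i = from; i < etx; i++) {
            if (msg[i] == DLE && i + 1 < etx) {
                i++; // drop the DLE, keep the byte it escapes
            }
            x ^= msg[i];
        }
        return (byte) x;
    }
}
```

For the message in the question, FrameFinder.findStx(validMsgWithRandomData) returns 3. Random bytes can still pass an XOR check by chance, so treat this as a heuristic rather than a guarantee.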


Java - String frequency with huge data

I am trying to find the frequency of the most common substring of a given length in a huge string.
The 'huge string' can be up to 2M characters long and contains only a-z.
The 'substring' may be between 100k and 2M characters long.
The 'substring' is never longer than the 'huge string'.
Currently I am using the following method, which I wrote:
public static int[] countSubstringOccurence(String input, int substringLength) {
// input = from 100 000 to 2 000 000 alphanumeric characters long string;
// substringLength = from 100 000 to 2 000 000, always smaller than input
LinkedHashMap < String, Integer > substringOccurence = new LinkedHashMap < > ();
int l;
for (int i = 0; i < (input.length() - substringLength) + 1; i++) {
String substring = input.substring(i, i + substringLength);
if (substringOccurence.containsKey(substring)) {
l = substringOccurence.get(substring);
substringOccurence.put(substring, ++l);
} else {
substringOccurence.put(substring, 1);
}
}
List < Integer > substringOccurenceList = new ArrayList < > (substringOccurence.values());
int numberOfUniqueSubstrings = substringOccurenceList.size();
int numberOfOccurenciesOfMostCommonSubstrings = 0;
int numberOfSubstringsOfMostCommonSubstring = 0;
for (int i: substringOccurenceList) {
if (i > numberOfOccurenciesOfMostCommonSubstrings) {
numberOfOccurenciesOfMostCommonSubstrings = i;
numberOfSubstringsOfMostCommonSubstring = 1;
} else if (i == numberOfOccurenciesOfMostCommonSubstrings) {
numberOfSubstringsOfMostCommonSubstring++;
}
}
return new int[] {
numberOfUniqueSubstrings,
numberOfOccurenciesOfMostCommonSubstrings,
numberOfSubstringsOfMostCommonSubstring
};
}
Later I convert this to an ArrayList and iterate through the whole list to find how many distinct substrings there are and how many times each one occurs.
But after around 4,000 to 8,000 iterations I get a java.lang.OutOfMemoryError. I expected this, since the process takes over 2GB of memory at that point (I know, storing this many strings in memory can take up to 2TB in edge cases). I tried using a SHA-1 hash as the key, which works, but it takes far more time and collisions are possible. I think there might be a better way to do this, but I can't think of any "better" optimization.
Thank you for any kind of help.
EDIT
Here are some example inputs and outputs:
f("abcabc", 3) => 3 2 1
f("abcdefghijklmnopqrstuvwqyzab", 3) => 26 1 26
f("abcdefghijklmnopqrstuvwqyzab", 2) => 26 2 1
I've changed the code to this:
public static int[] countSubstringOccurence(String text, int substringLength) {
int textLength = text.length();
int numberOfUniqueSubstrings = 0;
List<Integer> substrIndexes = new ArrayList<>();
for (int i = 0; i < (textLength - substringLength) + 1; i++) {
boolean doesNotExists = true;
for (int j = i + 1; j < (textLength - substringLength) + 1; j++) {
String actualSubstr = text.substring(i, i + substringLength);
String indexSubstr = text.substring(j, j + substringLength);
if (actualSubstr.equals(indexSubstr)) {
doesNotExists = false;
substrIndexes.add(j);
}
}
if (doesNotExists) {
numberOfUniqueSubstrings++;
substrIndexes.add(i);
}
}
LinkedHashMap<Integer, Integer> substrCountMap = new LinkedHashMap<>();
for (int i : substrIndexes) {
String substr = text.substring(i, i + substringLength);
int lastIndex = 0;
int count = 0;
while (lastIndex != -1) {
lastIndex = text.indexOf(substr, lastIndex);
if (lastIndex != -1) {
count++;
lastIndex += substr.length();
}
}
substrCountMap.put(i, count);
}
List<Integer> substrCountList = new ArrayList<>(substrCountMap.values());
int numberOfOccurenciesOfMostCommonSubstrings = 0;
int numberOfSubstringsOfMostCommonSubstring = 0;
for (int count : substrCountList) {
if (count > numberOfOccurenciesOfMostCommonSubstrings) {
numberOfOccurenciesOfMostCommonSubstrings = count;
numberOfSubstringsOfMostCommonSubstring = 1;
} else if (count == numberOfOccurenciesOfMostCommonSubstrings) {
numberOfSubstringsOfMostCommonSubstring++;
}
}
return new int[] {
numberOfUniqueSubstrings,
numberOfOccurenciesOfMostCommonSubstrings,
numberOfSubstringsOfMostCommonSubstring
};
}
This code does not crash, it's just really, really slow (I guess it's at least O(n^2) substring comparisons). Can anyone think of a faster way?
It would be great if it could fit under 1GB of RAM and finish in under 15 minutes on a CPU equivalent to an i3-3xxx. I am done for today.
Run it on Java 6. Not kidding!
In Java 6, substring does NOT copy the characters; the new String shares the original character array and stores only a reference to it, an offset and a length.
Just use the StringTokenizer class to extract each word, then store the words in a String array whose size is given by <object name>.countTokens().
You can then easily calculate the frequency of a given word.
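Neither answer above comes with code. The sketch below is one possible approach and is not taken from the answers: it never creates substring objects. It sorts the start indexes of all windows by comparing the windows in place, then counts runs of equal neighbours. Memory is proportional to the number of windows. The class and helper names are hypothetical, and on highly repetitive input the comparisons can still be slow, in which case a suffix array would be the heavier-duty tool.

```java
import java.util.Arrays;

final class SubstringFrequency {

    /** Returns {unique substrings, highest occurrence count, substrings reaching that count}. */
    static int[] countSubstringOccurrence(String text, int len) {
        int n = text.length() - len + 1;
        if (len <= 0 || n <= 0) {
            return new int[] {0, 0, 0};
        }
        Integer[] starts = new Integer[n];
        for (int i = 0; i < n; i++) {
            starts[i] = i;
        }
        // Sort window start positions by window content; nothing is copied.
        Arrays.sort(starts, (a, b) -> compareWindows(text, a, b, len));

        int unique = 0, maxCount = 0, numWithMax = 0;
        int i = 0;
        while (i < n) {
            // Equal windows are adjacent after sorting: measure the run.
            int j = i + 1;
            while (j < n && compareWindows(text, starts[i], starts[j], len) == 0) {
                j++;
            }
            int run = j - i;
            unique++;
            if (run > maxCount) {
                maxCount = run;
                numWithMax = 1;
            } else if (run == maxCount) {
                numWithMax++;
            }
            i = j;
        }
        return new int[] {unique, maxCount, numWithMax};
    }

    /** Lexicographic comparison of text[a, a+len) and text[b, b+len). */
    static int compareWindows(String s, int a, int b, int len) {
        for (int k = 0; k < len; k++) {
            int d = s.charAt(a + k) - s.charAt(b + k);
            if (d != 0) {
                return d;
            }
        }
        return 0;
    }
}
```

For the examples in the question, countSubstringOccurrence("abcabc", 3) gives 3 2 1.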

Resolving method calls from generic object in enhanced for loop

This is more than likely a simple question for someone who is more familiar with Java than I am. Here's the gist of my issue:
I have a function that generates the possible combinations of the objects contained within an ArrayList. Since I have multiple object types that need to use this function, it is screaming at me to be made generic. The issue I'm encountering, though, is that an enhanced for-loop is unable to resolve method calls on the generic loop variable. I understand why this is happening, but I'm not familiar enough with Java to know how to resolve it. In any case, here is my code:
private <T> ArrayList<T> determineIdealOrderCombination(ArrayList<T> orders, int position){
// Local Variable Declarations
List<ArrayList<T>> subsets = new ArrayList<>();
int k = orders.size()+1; // Add one due to the do-while loop
int theoreticalQuantity;
int indexOfMaxProfit;
double maxProfit;
int[] s; // Here we'll keep indices pointing to elements in input array
double[] profits; // Here we'll keep track of the profit of each combination
// Begin searching for valid combinations
do {
// Setup
k--;
s = new int[k];
profits = new double[k];
// Generate combinations
if ( (k <= orders.size()) && (k > 0) ) {
// Set the first index sequence: 0, 1, 2,...
for (int i = 0; (s[i] = i) < k - 1; i++) ;
subsets.add(getSubset(orders, s));
for (; ; ) {
int i;
// Find position of item that can be incremented
for (i = k - 1; i >= 0 && s[i] == orders.size() - k + i; i--) ;
if (i < 0) {
break;
} else {
s[i]++; // increment this item
for (++i; i < k; i++) { // fill up remaining items
s[i] = s[i - 1] + 1;
}
subsets.add(getSubset(orders, s));
}
}
// All combinations have been evaluated, now throw away invalid combinations that violate the upper limit
// and calculate the valid combinations profits.
for (int i = 0; i < subsets.size(); i++) {
// Calculate the final position
theoreticalQuantity = position;
profits[i] = 0;
for (T t : subsets.get(i)) {
theoreticalQuantity += t.getQuantity(); // <-- THE PROBLEM
profits[i] += calculateProjectedProfit(t.getSecurity(), t.getQuantity(), t.getPrice()); // <-- THE PROBLEM
}
if(theoreticalQuantity > _MAX_POSITION_PER_ASSET){
// Negate profits if final position violates the position limit on an asset
profits[i] = Double.MIN_VALUE;
}
}
}
else{
break;
}
}
while( (subsets.size() == 0) );
// Verify that the subset array is not zero - it should never be zero
if(subsets.size() == 0){
return new ArrayList<>();
}
// Return the most profitable combination, if any.
indexOfMaxProfit = -1;
maxProfit = Double.MIN_VALUE;
for(int i = 0; i < profits.length; i++){
if(profits[i] != Double.MIN_VALUE){
if(profits[i] > maxProfit){
maxProfit = profits[i];
indexOfMaxProfit = i;
}
}
}
if( (maxProfit > 0) && (indexOfMaxProfit != -1) ){
return subsets.get(indexOfMaxProfit);
}
else{
return new ArrayList<>();
}
}
Any help would be appreciated.
This is how you tell the compiler that the incoming objects have the relevant methods:
public interface MyCommonInterface {
public int getQuantity();
// also declare getSecurity() and getPrice() here, since the loop calls them too
}
private <T extends MyCommonInterface> ArrayList<T> determineIdealOrderCombination(ArrayList<T> orders, int position) {
As an additional note, I would read some tutorials on generics before attempting to use them. They are a little tricky to get the hang of initially. However, once you put in a little effort to learn the basics, you should be in a much better place to actually use them.
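To see the bounded type parameter working end to end, here is a minimal, self-contained sketch. HasQuantity, BuyOrder and totalQuantity are hypothetical names invented for this example; they stand in for the common interface, one of the order classes and the part of the loop that sums quantities.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical common interface: everything the generic method needs to call.
interface HasQuantity {
    int getQuantity();
}

// Hypothetical order type implementing the interface.
final class BuyOrder implements HasQuantity {
    private final int quantity;

    BuyOrder(int quantity) {
        this.quantity = quantity;
    }

    @Override
    public int getQuantity() {
        return quantity;
    }
}

final class BoundedGenericDemo {

    // "T extends HasQuantity" lets the compiler resolve t.getQuantity().
    static <T extends HasQuantity> int totalQuantity(List<T> orders, int position) {
        int total = position;
        for (T t : orders) {
            total += t.getQuantity();
        }
        return total;
    }

    public static void main(String[] args) {
        List<BuyOrder> orders = Arrays.asList(new BuyOrder(3), new BuyOrder(4));
        System.out.println(totalQuantity(orders, 10)); // prints 17
    }
}
```

With a plain, unbounded T the compiler only knows that t is an Object, which is why the calls in the question do not resolve.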

Getting a list of binary numbers composing a number

In Java, given a number like 0b1010, I would like to get the list of numbers "composing" it, one for each set bit: 0b1000 and 0b0010 in this example.
I'm not sure what the best way to get it is. Do you have any ideas?
Use a BitSet!
long x = 0b101011;
BitSet bs = BitSet.valueOf(new long[]{x});
for (int i = bs.nextSetBit(0); i >=0 ; i = bs.nextSetBit(i+1)) {
System.out.println(1L << i); // 1L so that bits above position 31 of the long also work
}
Output:
1
2
8
32
If you really want them printed out as binary strings, here's a little hack on the above method:
long x = 0b101011;
BitSet bs = BitSet.valueOf(new long[]{x}); // must be created before bs.length() is used
char[] cs = new char[bs.length()];
Arrays.fill(cs, '0');
for (int i = bs.nextSetBit(0); i >=0 ; i = bs.nextSetBit(i+1)) {
cs[bs.length()-i-1] = '1';
System.out.println(new String(cs)); // or whatever you want to do with this String
cs[bs.length()-i-1] = '0';
}
Output:
000001
000010
001000
100000
Scan through the bits one by one using an AND operation. This tells you whether the bit at a given position is set (https://en.wikipedia.org/wiki/Bitwise_operation#AND). Once you have determined that the i-th bit is set, build a string for it and print it:
public static void PrintAllSubbitstrings(int number)
{
for(int i=0; i < 32; i++) //32 bits maximum for an int
{
if( (number & (1 << i)) != 0) //the i'th bit is set; the parentheses are needed because != binds tighter than &
{
//Make up a bitstring: one 1 on the left, then i zeroes to the right
String bitString = "1";
for(int j=0; j < i; j++) bitString += "0";
System.out.println(bitString);
}
}
}
Here is a little test that works for me
public static void main(String[] args) {
int num = 0b1010;
int testNum = 0b1;
while(testNum > 0 && testNum <= num) { // <= so the highest bit of num is tested too; > 0 stops after overflow
if((testNum & num) >0) {
System.out.println(testNum + " Passes");
}
testNum *= 2;
}
}
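Another way to collect the values instead of printing them is Integer.lowestOneBit, which isolates the lowest set bit of an int. The sketch below repeatedly extracts and clears that bit; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class BitComponents {

    /** Returns one power of two per set bit of n, lowest bit first. */
    static List<Integer> components(int n) {
        List<Integer> out = new ArrayList<>();
        while (n != 0) {
            int low = Integer.lowestOneBit(n); // e.g. 0b1010 -> 0b0010
            out.add(low);
            n ^= low; // clear that bit and continue with the rest
        }
        return out;
    }

    public static void main(String[] args) {
        for (int part : components(0b1010)) {
            System.out.println(Integer.toBinaryString(part)); // prints 10, then 1000
        }
    }
}
```

For 0b1010 the list is [2, 8], that is 0b0010 and 0b1000.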

Adding 1 to binary byte array

I am trying to add 1 to a byte array containing a binary number. It works for some cases and not for others. I cannot convert my array to an integer and add one to it; I am trying to do the addition directly on the digits in the array. Could someone please point out where I am going wrong?
Test cases that have worked: 1111, 0, 11
EDIT: With everyone's help I now understand how to do it! I was also wondering what happens if the binary number has its least significant bit at the first position of the array.
Example: 1101 would be stored as [1,0,1,1]. How could I modify my code to account for that?
public static byte[] addOne(byte[] A)
{
//copy A into new array-size+1 in case of carry
byte[] copyA = new byte[A.length+1];
//array that returns if it is empty
byte [] copyB = new byte [1];
//copy A into new array with length+1
for(byte i =0; i <copyA.length&& i<A.length; i ++)
{
copyA[i]=A[i];
}
//if there is nothing in array: return 1;
if(copyA.length == 0)
{
//it will return 1 bc 0+1=1
copyB[0]=1;
return copyB;
}
//if first slot in array is 1(copyA) when you hit zero you dont have to carry anything. Go until you see zero
if(copyA[0] ==1 )
{
//loops through the copyA array to check if the position 0 is 1 or 0
for(byte i =0; i<copyA.length; i ++)
{
if(copyA[i] == 0)//if it hits 0
{
copyA[i]=1;//change to one
break;//break out of for loop
}
else{
copyA[i]=0;
}
}
return copyA;
}
else if (copyA[0]==0)
{
copyA[0]=1;
}
return copyA;
}
The idea:
100010001 + 1 = 100010010
1000000 + 1 = 1000001
1111111 + 1 = (1)0000000
I designed the operation the way you would do it on paper.
As with decimal addition, you start from the right (least significant digit) and move left (towards the most significant digit).
Note that 0 + 1 = 1, at which point the addition is finished and I can exit.
1 + 1 = 10 in binary, however, so I write 0 at the current position and carry 1 over to the next digit. I then move one position left and repeat the same operation.
I hope this helps in understanding it.
It is a simple algorithm:
Set position to the last byte.
If the current byte is 0, change it to 1 and exit.
If the current byte is 1, change it to 0 and move one position left.
public static byte[] addOne(byte[] A) {
int lastPosition = A.length - 1;
// Looping from right to left
for (int i = lastPosition; i >= 0; i--) {
if (A[i] == 0) {
A[i] = 1; // If current digit is 0 I change it to 1
return A; // I can exit because there is no carry
}
A[i] = 0; // If current digit is 1 I change it to 0
// and go to the next position (one position left)
}
return A; // I return the modified array
}
If the starting array is [1,0,1,1,1,1,1,0,0] the resulting array will be [1,0,1,1,1,1,1,0,1].
If the starting array is [1,0,1,1,1,1,1,1,1] the resulting array will be [1,1,0,0,0,0,0,0,0].
If the starting array is [1,1,1,1,1,1,1,1,1] the resulting array will be [0,0,0,0,0,0,0,0,0].
Note: if you need to handle this last situation (overflow) differently, you can try one of the following:
throw an exception
grow the array by one element so that the result is [1,0,0,0,0,0,0,0,0,0]
Here is code for each of the two options:
Throwing exception:
public static byte[] addOne(byte[] A) throws Exception {
for (int i = A.length - 1; i >= 0; i--) {
if (A[i] == 0) {
A[i] = 1;
return A;
}
A[i] = 0;
if (i == 0) {
throw new Exception("Overflow");
}
}
return A;
}
Enlarging array:
public static byte[] addOne(byte[] A) {
for (int i = A.length - 1; i >= 0; i--) {
if (A[i] == 0) {
A[i] = 1;
return A;
}
A[i] = 0;
if (i == 0) {
A = new byte[A.length + 1];
Arrays.fill(A, (byte) 0); // Added cast to byte
A[0] = 1;
}
}
return A;
}
I suspect it works in some cases but not others because your code is too complicated. This version treats index 0 as the least significant bit:
static byte[] increment(byte[] bits) {
byte[] ret = new byte[bits.length+1];
int carry = 1, i = 0;
for(byte b: bits) {
// low bit of an add;
ret[i++] = (byte) (b ^ carry);
// high bit of an add.
carry &= b;
}
if (carry == 0)
return Arrays.copyOf(ret, bits.length);
ret[i] = 1;
return ret;
}
For an array bits containing the binary number (most significant bit first), the algorithm for adding 1 is:
boolean carried = true;
for(int i = bits.length-1; i>=0; i--) {
if(bits[i] == 1 && carried) {
carried = true;
bits[i] = 0;
}
else if (bits[i] == 0 && carried) {
carried = false;
bits[i] = 1;
}
}
if(carried)
throw new Exception("Overflow");
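For the layout asked about in the question's EDIT (least significant bit at index 0, so 1101 is stored as [1,0,1,1]), the same carry loop can simply run from index 0 upwards. The following is a minimal sketch under that assumption; the class name is made up, and unlike the in-place versions above it returns a new array and grows it by one element on overflow.

```java
import java.util.Arrays;

final class LsbFirstIncrement {

    /** Adds one to a binary number stored least significant bit first; the input is not modified. */
    static byte[] addOne(byte[] bits) {
        byte[] out = Arrays.copyOf(bits, bits.length);
        for (int i = 0; i < out.length; i++) {
            if (out[i] == 0) {
                out[i] = 1; // 0 + carry = 1, nothing left to carry
                return out;
            }
            out[i] = 0; // 1 + carry = 10: write 0 and keep carrying
        }
        // Every digit was 1 (or the array was empty): the carry needs a new top digit.
        byte[] grown = new byte[out.length + 1]; // all zeros by default
        grown[out.length] = 1;
        return grown;
    }

    public static void main(String[] args) {
        // 1101 stored LSB first is [1,0,1,1]; 1101 + 1 = 1110, stored as [0,1,1,1]
        System.out.println(Arrays.toString(addOne(new byte[] {1, 0, 1, 1})));
    }
}
```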

About the CharMatcher.WHITESPACE implementation

While looking at the implementation of CharMatcher, I noticed the field WHITESPACE_MULTIPLIER=1682554634. I changed this value to 1582554634 and ran the test case CharMatcherTest#testWhitespaceBreakingWhitespaceSubset; of course it failed.
After that I changed testWhitespaceBreakingWhitespaceSubset to only invoke WHITESPACE.apply((char)c) without asserting, and printed the index computed in WHITESPACE.matches:
int index = (WHITESPACE_MULTIPLIER * c) >>> WHITESPACE_SHIFT;
I found that the indexes collide after changing WHITESPACE_MULTIPLIER from 1682554634 to 1582554634.
No doubt 1682554634 is well designed. My question is: how can I derive this "magic number"?
Following Martin Grajcar's suggestion, I wrote the "magic number generator" below, and it works:
char[] charsReq = WHITESPACE_TABLE.toCharArray();
Arrays.sort(charsReq);
OUTER:
for (int WHITESPACE_MULTIPLIER_WANTTED = 1682553701; WHITESPACE_MULTIPLIER_WANTTED <= 1682554834; WHITESPACE_MULTIPLIER_WANTTED++) {
int matchCnt = 0;
for (int c = 0; c <= Character.MAX_VALUE; c++) {
int position = Arrays.binarySearch(charsReq, (char) c);
char index = WHITESPACE_TABLE.charAt((WHITESPACE_MULTIPLIER_WANTTED * c) >>> WHITESPACE_SHIFT);
if (position >= 0 && index == c) {
matchCnt++;
} else if (position < 0 && index != c) {
matchCnt++;
} else {
continue OUTER;
}
}
// all valid
if ((matchCnt - 1) == (int) (Character.MAX_VALUE)) {
System.out.println(WHITESPACE_MULTIPLIER_WANTTED);
}
}
If I change the order of the characters in WHITESPACE_TABLE (swapping \u2001 and \u2002), the algorithm finds no solution (even after changing the loop's end condition to Integer.MAX_VALUE).
The IntMath.gcd implementation refers to http://en.wikipedia.org/wiki/Binary_GCD_algorithm.
My question is: where can I find similar material on the CharMatcher.WHITESPACE.matches implementation?
I'm not sure if the generator still exists somewhere, but it can be recreated easily. The class Result contains the data used in the implementation of CharMatcher.WHITESPACE:
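static class Result {
private final int shift;
private final int multiplier;
private final String table;
Result(int shift, int multiplier, String table) { // used by generate() below
this.shift = shift;
this.multiplier = multiplier;
this.table = table;
}
}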
// No duplicates allowed.
private final String allMatchingString = "\u2002\r\u0085\u200A\u2005\u2000"
+ "\u2029\u000B\u2008\u2003\u205F\u1680"
+ "\u0009\u0020\u2006\u2001\u202F\u00A0\u000C\u2009"
+ "\u2004\u2028\n\u2007\u3000";
public Result generate(String allMatchingString) {
final char[] allMatching = allMatchingString.toCharArray();
final char filler = allMatching[allMatching.length - 1];
final int shift = Integer.numberOfLeadingZeros(allMatching.length);
final char[] table = new char[1 << (32 - shift)];
OUTER: for (int i=0; i>=0; ++i) {
final int multiplier = 123456789 * i; // Jumping a bit makes the search faster.
Arrays.fill(table, filler);
for (final char c : allMatching) {
final int index = (multiplier * c) >>> shift;
if (table[index] != filler) continue OUTER; // Conflict found.
table[index] = c;
}
return new Result(shift, multiplier, new String(table));
}
return null; // No solution exists.
}
It generates a different multiplier, but this doesn't matter.
In case no solution for a given allMatchingString exists, you can decrement shift and try again.
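To make the lookup side concrete, here is a small self-contained sketch of the same idea on a four-character set. It searches for a multiplier as the generator above does and then tests membership with the index expression quoted in the question. The class name and the sample character set are assumptions made for this example, and this is not Guava's actual source.

```java
import java.util.Arrays;

final class MultiplierTableSketch {
    final int shift;
    final int multiplier;
    final char[] table;

    private MultiplierTableSketch(int shift, int multiplier, char[] table) {
        this.shift = shift;
        this.multiplier = multiplier;
        this.table = table;
    }

    /** Builds a collision-free table; matching must be non-empty and contain no duplicates. */
    static MultiplierTableSketch build(String matching) {
        char[] all = matching.toCharArray();
        char filler = all[all.length - 1]; // unused slots hold a matching char, as in the generator above
        int shift = Integer.numberOfLeadingZeros(all.length);
        char[] table = new char[1 << (32 - shift)];
        for (int i = 1; i > 0; i++) {
            int multiplier = 123456789 * i; // wraps around on overflow, which is fine here
            if (fill(table, all, filler, multiplier, shift)) {
                return new MultiplierTableSketch(shift, multiplier, table);
            }
        }
        return null; // no multiplier worked for this table size
    }

    private static boolean fill(char[] table, char[] all, char filler, int multiplier, int shift) {
        Arrays.fill(table, filler);
        for (char c : all) {
            int index = (multiplier * c) >>> shift;
            if (table[index] != filler) {
                return false; // two characters landed in the same slot
            }
            table[index] = c;
        }
        return true;
    }

    /** A character matches when the slot it hashes to holds that same character. */
    boolean matches(char c) {
        return table[(multiplier * c) >>> shift] == c;
    }

    public static void main(String[] args) {
        MultiplierTableSketch ws = build(" \t\n\r");
        System.out.println(ws.matches('\t')); // true
        System.out.println(ws.matches('x'));  // false
    }
}
```

Because every slot holds either the one matching character that hashes there or the filler (itself a matching character), a non-matching character can never compare equal to its slot, so a single multiply, shift and compare decides membership.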
