Java - String frequency with huge data

I am trying to find the frequency of the most common substring of a given length in a huge string.
The 'huge string' can be up to 2M characters long and contains only a-z.
The 'substring' length may be between 100k and 2M characters.
The 'substring' is always the same size as or smaller than the 'huge string'.
Currently, I am using the following method, which I wrote:
public static int[] countSubstringOccurence(String input, int substringLength) {
// input = from 100 000 to 2 000 000 alphanumeric characters long string;
// substringLength = from 100 000 to 2 000 000, always smaller than input
LinkedHashMap < String, Integer > substringOccurence = new LinkedHashMap < > ();
int l;
for (int i = 0; i < (input.length() - substringLength) + 1; i++) {
String substring = input.substring(i, i + substringLength);
if (substringOccurence.containsKey(substring)) {
l = substringOccurence.get(substring);
substringOccurence.put(substring, ++l);
} else {
substringOccurence.put(substring, 1);
}
}
List < Integer > substringOccurenceList = new ArrayList < > (substringOccurence.values());
int numberOfUniqueSubstrings = substringOccurenceList.size();
int numberOfOccurenciesOfMostCommonSubstrings = 0;
int numberOfSubstringsOfMostCommonSubstring = 0;
for (int i: substringOccurenceList) {
if (i > numberOfOccurenciesOfMostCommonSubstrings) {
numberOfOccurenciesOfMostCommonSubstrings = i;
numberOfSubstringsOfMostCommonSubstring = 1;
} else if (i == numberOfOccurenciesOfMostCommonSubstrings) {
numberOfSubstringsOfMostCommonSubstring++;
}
}
return new int[] {
numberOfUniqueSubstrings,
numberOfOccurenciesOfMostCommonSubstrings,
numberOfSubstringsOfMostCommonSubstring
};
}
Later I convert this to an ArrayList and iterate through the whole list to find how many distinct substrings there are and how many times each of them occurs.
But after around 4,000 to 8,000 iterations I get a java.lang.OutOfMemoryError, which I expected, since the process takes over 2GB of memory at that point (I know that storing this many strings in memory can take up to 2TB in edge cases). I tried using a SHA-1 hash as the key, which works, but it takes far more time and collisions are possible. I think there might be a better way to do this, but I can't think of any better optimization.
Thank you for any kind of help.
EDIT
Here are some example inputs and outputs:
f("abcabc", 3) => 3 2 1
f("abcdefghijklmnopqrstuvwqyzab", 3) => 26 1 26
f("abcdefghijklmnopqrstuvwqyzab", 2) => 26 2 1
I've changed the code to this:
public static int[] countSubstringOccurence(String text, int substringLength) {
int textLength = text.length();
int numberOfUniqueSubstrings = 0;
List<Integer> substrIndexes = new ArrayList<>();
for (int i = 0; i < (textLength - substringLength) + 1; i++) {
boolean doesNotExists = true;
for (int j = i + 1; j < (textLength - substringLength) + 1; j++) {
String actualSubstr = text.substring(i, i + substringLength);
String indexSubstr = text.substring(j, j + substringLength);
if (actualSubstr.equals(indexSubstr)) {
doesNotExists = false;
substrIndexes.add(j);
}
}
if (doesNotExists) {
numberOfUniqueSubstrings++;
substrIndexes.add(i);
}
}
LinkedHashMap<Integer, Integer> substrCountMap = new LinkedHashMap<>();
for (int i : substrIndexes) {
String substr = text.substring(i, i + substringLength);
int lastIndex = 0;
int count = 0;
while (lastIndex != -1) {
lastIndex = text.indexOf(substr, lastIndex);
if (lastIndex != -1) {
count++;
lastIndex += substr.length();
}
}
substrCountMap.put(i, count);
}
List<Integer> substrCountList = new ArrayList<>(substrCountMap.values());
int numberOfOccurenciesOfMostCommonSubstrings = 0;
int numberOfSubstringsOfMostCommonSubstring = 0;
for (int count : substrCountList) {
if (count > numberOfOccurenciesOfMostCommonSubstrings) {
numberOfOccurenciesOfMostCommonSubstrings = count;
numberOfSubstringsOfMostCommonSubstring = 1;
} else if (count == numberOfOccurenciesOfMostCommonSubstrings) {
numberOfSubstringsOfMostCommonSubstring++;
}
}
return new int[] {
numberOfUniqueSubstrings,
numberOfOccurenciesOfMostCommonSubstrings,
numberOfSubstringsOfMostCommonSubstring
};
}
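This code does not crash, it is just really, really slow. I guess it needs at least O(n^2) substring comparisons, and each comparison can itself cost up to substringLength character checks. Can anyone think of a faster way?
It would be great if it could fit under 1GB of RAM and finish in under 15 minutes on a CPU comparable to an i3-3xxx. I am done for today.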

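Run it on Java 6. Not kidding!
In Java 6, substring does NOT copy the characters; the new String shares the original's character array and stores only the reference, the offset and the length. (From Java 7 update 6 onward, substring copies the characters.)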

Just use the StringTokenizer class and extract each word. Then store each word in a String array whose size is given by <object name>.countTokens();
then you can easily calculate the frequency of a given word.
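
A different technique that neither answer above mentions is a rolling hash (as used in Rabin-Karp): instead of storing every substring, store one 64-bit fingerprint per window and update it in constant time as the window slides. The sketch below is only an illustration under stated assumptions. The class name, the base 131 and the two moduli are arbitrary choices of mine, and two different substrings could in principle share a fingerprint, so the counts are exact only if no collision occurs.

```java
import java.util.HashMap;
import java.util.Map;

class RollingHashCount {
    // Hypothetical replacement for countSubstringOccurence: returns
    // {distinct windows, highest count, number of windows with that count}.
    // Assumption: no fingerprint collisions (not verified here).
    static int[] count(String text, int len) {
        final long M1 = 1_000_000_007L, M2 = 998_244_353L, B = 131L;
        long p1 = 1, p2 = 1; // B^(len-1) modulo each modulus
        for (int i = 1; i < len; i++) {
            p1 = p1 * B % M1;
            p2 = p2 * B % M2;
        }
        Map<Long, Integer> counts = new HashMap<>();
        long h1 = 0, h2 = 0;
        for (int i = 0; i < text.length(); i++) {
            if (i >= len) {
                // Remove the character that just left the window.
                char out = text.charAt(i - len);
                h1 = (h1 - out * p1 % M1 + M1) % M1;
                h2 = (h2 - out * p2 % M2 + M2) % M2;
            }
            char in = text.charAt(i);
            h1 = (h1 * B + in) % M1;
            h2 = (h2 * B + in) % M2;
            if (i >= len - 1) {
                // Combine both hashes into a single 64-bit key.
                counts.merge((h1 << 32) | h2, 1, Integer::sum);
            }
        }
        int best = 0, ties = 0;
        for (int c : counts.values()) {
            if (c > best) {
                best = c;
                ties = 1;
            } else if (c == best) {
                ties++;
            }
        }
        return new int[] {counts.size(), best, ties};
    }

    public static void main(String[] args) {
        int[] r = count("abcabc", 3);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // 3 2 1
    }
}
```

The map holds at most one Long/Integer pair per window position, so memory grows with the number of windows rather than with the number of windows times the substring length.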

Related

Algorithm to create all permutations and lengths

I am looking to create an algorithm, preferably in Java. I would like to go through the following char array and create every possible permutation of every length out of it.
For example, loop and print the following:
a
aa
aaaa
aaaaa
.... keep going ....
aaaaaaaaaaaaaaaaa ....
ab
aba
abaa .............
Till I hit all possible lengths and permutations from my array.
private void method(){
char[] data = "abcdefghiABCDEFGHI0123456789".toCharArray();
// loop and print each time
}
I think it would be silly to write tens of nested for loops for this. I am guessing some form of recursion would help here, but I can't get my head around how to even start. Could I get some help with this, please? Even a pointer to a starting point, a blog or something would help. I have been Googling and looking around, and many permutation examples exist, but they all keep to a fixed maximum length. None seems to cover multiple lengths plus permutations. Please advise. Thanks.

Another way to do it is this:
public class HelloWorld{
public static String[] method(char[] arr, int length) {
if(length == arr.length - 1) {
String[] strArr = new String[arr.length];
for(int i = 0; i < arr.length; i ++) {
strArr[i] = String.valueOf(arr[i]);
}
return strArr;
}
String[] before = method(arr, length + 1);
String[] newArr = new String[arr.length * before.length];
for(int i = 0; i < arr.length; i ++) {
for(int j = 0; j < before.length; j ++) {
if(i == 0)
System.out.println(before[j]);
newArr[i * before.length + j] = (arr[i] + before[j]);
}
}
return newArr;
}
public static void main(String []args){
String[] all = method("abcde".toCharArray(), 0);
for(int i = 0; i < all.length; i ++) {
System.out.println(all[i]);
}
}
}
However, be careful: you'll probably run out of memory, or the program will take a looooong time to run, if it finishes at all. You are trying to print about 3.437313508041091e+40 strings, which is a 41-digit number.
Here's the solution in JavaScript as well. It does start running, but it needs 4 seconds to get to 4-character permutations; to reach 5-character permutations it will need about 28 times as long, for 6 characters about 4 * 28 * 28 seconds, and so on.
const method = (arr, length) => {
if(length === arr.length - 1)
return arr;
const hm = [];
const before = method(arr, length + 1);
for(let i = 0; i < arr.length; i ++) {
for(let j = 0; j < before.length; j ++) {
if(i === 0)
console.log(before[j]);
hm.push(arr[i] + before[j]);
}
}
return hm;
};
method('abcdefghiABCDEFGHI0123456789'.split(''), 0).forEach(a => console.log(a));

private void method(){
char[] data = "abcdefghiABCDEFGHI0123456789".toCharArray();
// loop and print each time
}
With your given input there are about 3.43731350804 × 10^40 combinations. If I remember correctly, the maths is the geometric series
1 + x + x^2 + x^3 + x^4 + ... + x^n = (1 - x^(n+1)) / (1 - x)
in your case, without the leading 1,
28 + 28^2 + 28^3 + ... + 28^28
because you will have
28 combinations for strings with length one
28*28 combinations for strings with length two
28*28*28 combinations for strings with length three
...
28^28 combinations for strings with length 28
It will take a while to print them all.
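
To check the count without floating-point rounding, the sum can be evaluated exactly with BigInteger. This is just a small sketch; the class and method names are made up for illustration.

```java
import java.math.BigInteger;

class CombinationCount {
    // Hypothetical helper: number of strings of length 1..maxLength
    // over an alphabet of alphabetSize characters, i.e. x + x^2 + ... + x^n.
    static BigInteger countStrings(int alphabetSize, int maxLength) {
        BigInteger x = BigInteger.valueOf(alphabetSize);
        BigInteger power = BigInteger.ONE;
        BigInteger total = BigInteger.ZERO;
        for (int len = 1; len <= maxLength; len++) {
            power = power.multiply(x); // x^len
            total = total.add(power);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countStrings(5, 5));   // 3905
        System.out.println(countStrings(28, 28)); // a 41-digit number
    }
}
```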
One way I can think of is to use the Generex library, a Java library for generating String that match a given regular expression.
Generex github. Look at their page for more info.
Generex maven repo. Download the jar or add dependency.
Using Generex is straightforward if you are somewhat familiar with regex.
Example using only the first 5 chars, which has 3905 possible combinations:
public static void main(String[] args) {
Generex generex = new Generex("[a-e]{1,5}");
System.out.println(generex.getAllMatchedStrings().size());
Iterator iterator = generex.iterator();
while (iterator.hasNext()) {
System.out.println(iterator.next());
}
}
Meaning of [a-e]{1,5}: any combination of the chars a, b, c, d, e with a min length of 1 and a max length of 5.
output
a
aa
aaa
aaaa
aaaaa
aaaab
aaaac
aaaad
aaaae
aaab
aaaba
aaabb
aaabc
aaabd
aaabe
aaac
....
eeee
eeeea
eeeeb
eeeec
eeeed
eeeee

You can have a for loop that starts from 1 and ends at array.length and in each iteration call a function that prints all the permutations for that length.
public void printPermutations(char[] array, int length) {
/*
* Create all permutations with length = length and print them
*/
}
public void method() {
char[] data = "abcdefghiABCDEFGHI0123456789".toCharArray(); // toCharArray() returns char[], not char
for(int i = 1; i <= data.length; i ++) {
printPermutations(data, i);
}
}
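
The body of printPermutations is left open above. One possible way to fill it in, shown here as a sketch only, is a short recursion that fixes one position at a time. The class and method names are hypothetical, and the helper returns a list instead of printing so that it can be checked on small inputs.

```java
import java.util.ArrayList;
import java.util.List;

class FixedLengthStrings {
    // Hypothetical helper: every string of exactly `length` characters
    // drawn from `alphabet`, with repetition allowed.
    static List<String> stringsOfLength(char[] alphabet, int length) {
        List<String> out = new ArrayList<>();
        build(alphabet, new char[length], 0, out);
        return out;
    }

    private static void build(char[] alphabet, char[] buf, int pos, List<String> out) {
        if (pos == buf.length) { // all positions filled
            out.add(new String(buf));
            return;
        }
        for (char c : alphabet) {
            buf[pos] = c;
            build(alphabet, buf, pos + 1, out);
        }
    }

    public static void main(String[] args) {
        char[] data = "ab".toCharArray();
        for (int len = 1; len <= data.length; len++) {
            System.out.println(stringsOfLength(data, len)); // [a, b] then [aa, ab, ba, bb]
        }
    }
}
```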

I think the following recursion could solve your problem:
public static void main(String[] args) {
final String[] data = {"a", "b", "c"};
sampleWithReplacement(data, "", 1, 5);
}
private static void sampleWithReplacement(
final String[] letters,
final String prefix,
final int currentLength,
final int maxLength
) {
if (currentLength <= maxLength) {
for (String letter : letters) {
final String newPrefix = prefix + letter;
System.out.println(newPrefix);
sampleWithReplacement(letters, newPrefix, currentLength + 1, maxLength);
}
}
}
where data specifies your possible characters to sample from.

Is this what you're talking about?
public class PrintPermutations
{
public static String stream = "";
public static void printPermutations (char[] set, int count, int length)
{
if (count < length)
for (int i = 0; i < set.length; ++i)
{
stream += set[i];
System.out.println (stream);
printPermutations (set, count + 1, length);
stream = stream.substring (0, stream.length() - 1);
}
}
public static void main (String[] args)
{
char[] set = "abcdefghiABCDEFGHI0123456789".toCharArray();
printPermutations (set, 0, set.length);
}
}
Test it using a smaller string first.

On an input string 28 characters long this method is never going to end, but for smaller inputs it will generate all permutations up to length n - 1, where n is the number of characters (change the loop condition to i <= arr.length to include length n as well). It first prints all permutations of length 1, then all of length 2 and so on, which is different from your example, but hopefully the order doesn't matter.
static void permutations(char[] arr)
{
int[] idx = new int[arr.length];
char[] perm = new char[arr.length];
Arrays.fill(perm, arr[0]);
for (int i = 1; i < arr.length; i++)
{
while (true)
{
System.out.println(new String(perm, 0, i));
int k = i - 1;
for (; k >= 0; k--)
{
idx[k] += 1;
if (idx[k] < arr.length)
{
perm[k] = arr[idx[k]];
break;
}
idx[k] = 0;
perm[k] = arr[idx[k]];
}
if (k < 0)
break;
}
}
}
Test:
permutations("abc".toCharArray());
Output:
a
b
c
aa
ab
ac
ba
bb
bc
ca
cb
cc

First Longest Increasing Subsequence

The longest increasing subsequence is a well-known problem, and I have a solution using the patience algorithm.
The problem is that my solution gives me the "best" longest increasing subsequence instead of the first longest increasing subsequence that appears.
The difference is that some of the members of the first sequence are larger numbers (but the sequence length is exactly the same).
Getting the first sequence is turning out to be quite harder than expected, because having the best sequence doesn't easily translate into having the first sequence.
I've thought of doing my algorithm then finding the first sequence of length N, but not sure how to.
So, how would you find the First longest increasing subsequence from a sequence of random integers?
My code snippet:
public static void main (String[] args) throws java.lang.Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
int inputInt;
int[] intArr;
try {
String input = br.readLine().trim();
inputInt = Integer.parseInt(input);
String inputArr = br.readLine().trim();
intArr = Arrays.stream(inputArr.split(" ")).mapToInt(Integer::parseInt).toArray();
} catch (NumberFormatException e) {
System.out.println("Could not parse integers.");
return;
}
if(inputInt != intArr.length) {
System.out.println("Invalid number of arguments.");
return;
}
ArrayList<ArrayList<Integer>> sequences = new ArrayList<ArrayList<Integer>>();
int sequenceCount = 1;
sequences.add(new ArrayList<Integer>());
sequences.get(0).add(0);
for(int i = 1; i < intArr.length; i++) {
for(int j = 0; j < sequenceCount; j++) {
if(intArr[i] <= intArr[sequences.get(j).get(sequences.get(j).size() - 1)]) {
sequences.get(j).remove(sequences.get(j).size() - 1);
sequences.get(j).add(i);
break;
} else if (j + 1 == sequenceCount) {
sequences.add(new ArrayList<Integer>(sequences.get(j)));
sequences.get(j + 1).add(i);
sequenceCount++;
break; //increasing sequenceCount causes infinite loop
} else if(intArr[i] < intArr[sequences.get(j + 1).get(sequences.get(j + 1).size() - 1)]) {
sequences.set(j+ 1, new ArrayList<Integer>(sequences.get(j)));
sequences.get(j+ 1).add(i);
break;
}
}
}
int bestSequenceLength = sequenceCount;
ArrayList<Integer> bestIndexes = new ArrayList<Integer>(sequences.get(bestSequenceLength - 1));
//build bestSequence, then after it I'm supposed to find the first one instead
int[] bestSequence = Arrays.stream(bestIndexes.toArray()).mapToInt(x -> intArr[(int) x]).toArray();
StringBuilder output = new StringBuilder("");
for(Integer x : bestSequence) {
output.append(x + " ");
}
System.out.println(output.toString().trim());
}
I'm storing indexes instead in preparation for having to access the original array again. Since it's easier to go from indexes to values than vice versa.
Example:
3 6 1 2 8
My code returns: 1 2 8
First sequence is: 3 6 8
Another Example:
1 5 2 3
My code correctly returns: 1 2 3
Basically, my code works as long as the first longest sequence is the same as the best longest sequence. But when you have a bunch of longest sequences of the same length, it grabs the best one not the first one.

Code is self-explanatory. (Have added comments, let me know if you need something extra).
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class Solution {
public static void main(String[] args) {
int[] arr = {3,6,1,2,8};
System.out.println(solve(arr).toString());
}
private static List<Integer> solve(int[] arr){
int[][] data = new int[arr.length][2];
int max_length = 0;
// first location for previous element index (for backtracing to print list) and second for longest series length for the element
for(int i=0;i<arr.length;++i){
data[i][0] = -1; //none should point to anything at first
data[i][1] = 1;
for(int j=i-1;j>=0;--j){
if(arr[i] > arr[j]){
if(data[i][1] <= data[j][1] + 1){ // <= instead of < because we are aiming for the first longest sequence
data[i][1] = data[j][1] + 1;
data[i][0] = j;
}
}
}
max_length = Math.max(max_length,data[i][1]);
}
List<Integer> ans = new ArrayList<>();
for(int i=0;i<arr.length;++i){
if(data[i][1] == max_length){
int curr = i;
while(curr != -1){
ans.add(arr[curr]);
curr = data[curr][0];
}
break;
}
}
Collections.reverse(ans);// since there were added in reverse order in the above while loop
return ans;
}
}
Output:
[3, 6, 8]

Getting a list of binary numbers composing a number

In Java, having a number like 0b1010, I would like to get a list of numbers "composing" this one: 0b1000 and 0b0010 in this example: one number for each bit set.
I'm not sure what the best way to get it is. Do you have any suggestions?

Use a BitSet!
long x = 0b101011;
BitSet bs = BitSet.valueOf(new long[]{x});
for (int i = bs.nextSetBit(0); i >=0 ; i = bs.nextSetBit(i+1)) {
System.out.println(1L << i); // 1L: x is a long, so an int shift would be wrong for bits 32..63
}
Output:
1
2
8
32
If you really want them printed out as binary strings, here's a little hack on the above method:
long x = 0b101011;
BitSet bs = BitSet.valueOf(new long[]{x}); // must be declared before bs.length() is used
char[] cs = new char[bs.length()];
Arrays.fill(cs, '0');
for (int i = bs.nextSetBit(0); i >=0 ; i = bs.nextSetBit(i+1)) {
cs[bs.length()-i-1] = '1';
System.out.println(new String(cs)); // or whatever you want to do with this String
cs[bs.length()-i-1] = '0';
}
Output:
000001
000010
001000
100000

Scan through the bits one by one using an AND operation. This tells you whether the bit at a given position is set (https://en.wikipedia.org/wiki/Bitwise_operation#AND). Once you have determined that the i-th bit is set, build a string and print it:
public static void PrintAllSubbitstrings(int number)
{
for(int i=0; i < 32; i++) //32 bits maximum for an int
{
if((number & (1 << i)) != 0) //the i'th bit is set; the extra parentheses are needed because != binds tighter than &
{
//Build a bitstring with one 1 on the left followed by i zeroes
String bitString = "1";
for(int j=0; j < i; j++) bitString += "0";
System.out.println(bitString);
}
}
}

Here is a little test that works for me
public static void main(String[] args) {
int num = 0b1010;
int testNum = 0b1;
while(testNum > 0 && testNum <= num) { // <= so the highest set bit is tested too; > 0 stops once testNum overflows
if((testNum & num) >0) {
System.out.println(testNum + " Passes");
}
testNum *= 2;
}
}
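
Another option, not used in the answers above, is Integer.lowestOneBit from the standard library, which isolates the lowest set bit directly. The following is a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BitComponents {
    // Hypothetical helper: one number per set bit, lowest bit first.
    static List<Integer> components(int number) {
        List<Integer> parts = new ArrayList<>();
        while (number != 0) {
            int lowest = Integer.lowestOneBit(number); // e.g. 0b1010 -> 0b0010
            parts.add(lowest);
            number ^= lowest; // clear that bit and continue
        }
        return parts;
    }

    public static void main(String[] args) {
        for (int part : components(0b1010)) {
            System.out.println(Integer.toBinaryString(part)); // prints 10, then 1000
        }
    }
}
```

The loop runs once per set bit rather than once per bit position, and it also terminates for negative inputs.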

Incrementing characters past 'Z' in Java like a Spreadsheet

I started programming not too long ago, and I currently need a method that produces an array in which each element is the label that comes after the previous one. It should start with 'A' at index 0, then 'B' at index 1, and so on. The hard part is making 'AA' come after 'Z'.
What I came up with:
public static String[] charArray(int length)
{
String[] res = new String[length];
for(int i = 0; i < length; i++)
{
String name = "";
int colNumber = i;
while(colNumber > 0)
{
char c = (char) ('A' + (colNumber % 26));
name = c + name;
colNumber = colNumber / 26;
}
res[i] = name;
}
return res;
}
This works fine for the first 26 letters of the alphabet, but it produces "... Y, Z, BA, BB, BC..." instead of "... Y, Z, AA, AB, AC..."
What's wrong? Or are there any more efficient or easier ways to do this?
Thanks in advance!

You had a nice start. Instead of running through the while loop this example basically calculates the value of C based on the number % 26
Then the letter is added (concatenated) to the value within the array at the position: (index / 26) - 1 which ensures it's keeping up with the changes over time.
When iterating through on the first go, you'll have only one letter in each slot in the array A B C etc.
Once you've run through the alphabet, you'll then have an index that looks backwards and adds the current letter to that value.
You'll eventually get into AAA AAB AAC etc. or even more.
public static String[] colArray(int length) {
String[] result = new String[length];
String colName = "";
for(int i = 0; i < length; i++) {
char c = (char)('A' + (i % 26));
colName = c + "";
if(i > 25){
colName = result[(i / 26) - 1] + "" + c;
}
result[i] = colName;
}
return result;
}

Try like this:
public static String[] charArray(int length)
{
String[] res = new String[length];
int counter = 0;
for(int i = 0; counter < length; i++)
{
String name = "";
int colNumber = i;
while(colNumber > 0 && colNumber % 27 != 0)
{
char c = (char) ('A' + ((colNumber) % 27) - 1);
name = c + name;
colNumber = colNumber / 27;
}
res[counter] = name;
if (i % 27 != 0) {
counter++;
}
}
return res;
}
Basically, your algorithm skipped all elements starting with an A (A, AA, AB, ...), because an A is created when colNumber is 0, but this never happens since your while loop terminates in that case. Taking the value modulo 27 and then subtracting 1 from the char fixes this issue. We then use counter as the index, as otherwise we would end up with some empty elements in the array (the ones where i % 27 == 0). Note that this only produces correct names up to ZZ (the first 702 entries): for i = 730 the inner loop stops early and yields "A" again.

This solution works for me. With 26 letters in the alphabet, and knowing that 65 is the code of 'A' in the ASCII table, we can do the incrementing with this recursive method (note that it appends a "." after each letter):
private fun indexLetters(index: Int, prefix: String = "") : String {
val indexSuffix:Int = index.rem(26)
val suffix = (65 + indexSuffix).toChar().toString().plus(".")
val newPrefix = suffix.plus(prefix)
val indexPrefix: Int = index / 26
return if (indexPrefix > 0) {
indexLetters(indexPrefix - 1, newPrefix)
} else {
newPrefix
}
}
You can call this Kotlin method like
indexLetters(0) //To get "A."
indexLetters(25) //To get "Z."
indexLetters(26) //To get "A.A."
and so on,
from an array iteration, depending on your requirements.
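
For reference, the usual way to produce spreadsheet-style labels in Java is bijective base 26, where each digit runs from 1 to 26 instead of 0 to 25. This is a minimal sketch and not taken from any of the answers above; the class and method names are hypothetical.

```java
class ColumnNames {
    // Hypothetical helper: 0 -> "A", 25 -> "Z", 26 -> "AA", 701 -> "ZZ", 702 -> "AAA".
    static String columnName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index + 1; // switch to 1-based numbering
        while (n > 0) {
            n--; // shift so that the remainder 0..25 maps to 'A'..'Z'
            sb.append((char) ('A' + n % 26));
            n /= 26;
        }
        return sb.reverse().toString(); // digits were produced last-first
    }

    static String[] charArray(int length) {
        String[] res = new String[length];
        for (int i = 0; i < length; i++) {
            res[i] = columnName(i);
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", charArray(28))); // A, B, ..., Z, AA, AB
    }
}
```

The decrement before taking the remainder is what distinguishes this from ordinary base 26, where 'A' would act as a zero digit and names such as "AA" could never appear.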

java - how to reduce execution time for this program [closed]

Closed 8 years ago.
int n, k;
int count = 0, diff;
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String[] input;
input = br.readLine().split(" ");
n = Integer.parseInt(input[0]);
int[] a = new int[n];
k = Integer.parseInt(input[1]);
input = br.readLine().split(" ");
for (int i = 0; i < n; i++) {
a[i] = Integer.parseInt(input[i]);
for (int j = 0; j < i; j++) {
diff = a[j] - a[i];
if (diff == k || -diff == k) {
count++;
}
}
}
System.out.print(count);
This is a sample program that prints the number of pairs whose difference is k, where n <= 100000.
The problem is to reduce this program's execution time. How can I improve it to reduce the running time?
Thanks in advance for any suggestions.

Read the numbers and put them in a Map (numbers as keys, their frequencies as values). Iterate over the keys once, and for each number check whether the map contains that number with k added. If so, increase your counter. With a HashMap this is O(n) on average, instead of your algorithm's O(n^2).
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
int k = Integer.parseInt(br.readLine().split(" ")[1]);
Map<Integer, Integer> readNumbers = new HashMap<Integer, Integer>();
for (String aNumber : br.readLine().split(" ")) {
Integer num = Integer.parseInt(aNumber);
Integer freq = readNumbers.get(num);
readNumbers.put(num, freq == null ? 1 : freq + 1);
}
long count = 0; // long: with n = 100000 the number of pairs can exceed Integer.MAX_VALUE
for (Integer aNumber : readNumbers.keySet()) {
long freq = readNumbers.get(aNumber);
if (k == 0) {
count += freq * (freq - 1) / 2;
} else if (readNumbers.containsKey(aNumber + k)) {
count += freq * readNumbers.get(aNumber + k);
}
}
System.out.print(count);
EDIT fixed for duplicates and k = 0

Here is a comparison of @Socha23's solution using a HashMap, the same idea using Trove's TIntIntHashMap, and the original solution.
For 100,000 numbers I got the following (without the reading and parsing)
For 100 unique values, k=10
Set: 89,699,743 took 0.036 ms
Trove Set: 89,699,743 took 0.017 ms
Loops: 89,699,743 took 3623.2 ms
For 1000 unique values, k=10
Set: 9,896,049 took 0.187 ms
Trove Set: 9,896,049 took 0.193 ms
Loops: 9,896,049 took 2855.7 ms
The code
import gnu.trove.TIntIntHashMap;
import gnu.trove.TIntIntProcedure;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
class Main {
public static void main(String... args) throws Exception {
Random random = new Random(1);
int[] a = new int[100 * 1000];
int k = 10;
for (int i = 0; i < a.length; i++)
a[i] = random.nextInt(100);
for (int i = 0; i < 5; i++) {
testSet(a, k);
testTroveSet(a, k);
testLoops(a, k);
}
}
private static void testSet(int[] a, int k) {
Map<Integer, Integer> readNumbers = new HashMap<Integer, Integer>();
for (int num : a) {
Integer freq = readNumbers.get(num);
readNumbers.put(num, freq == null ? 1 : freq + 1);
}
long start = System.nanoTime();
int count = 0;
for (Integer aNumber : readNumbers.keySet()) {
if (readNumbers.containsKey(aNumber + k)) {
count += (readNumbers.get(aNumber) * readNumbers.get(aNumber + k));
}
}
long time = System.nanoTime() - start;
System.out.printf("Set: %,d took %.3f ms%n", count, time / 1e6);
}
private static void testTroveSet(int[] a, final int k) {
final TIntIntHashMap readNumbers = new TIntIntHashMap();
for (int num : a)
readNumbers.adjustOrPutValue(num, 1,1);
long start = System.nanoTime();
final int[] count = { 0 };
readNumbers.forEachEntry(new TIntIntProcedure() {
@Override
public boolean execute(int key, int keyCount) {
count[0] += readNumbers.get(key + k) * keyCount;
return true;
}
});
long time = System.nanoTime() - start;
System.out.printf("Trove Set: %,d took %.3f ms%n", count[0], time / 1e6);
}
private static void testLoops(int[] a, int k) {
long start = System.nanoTime();
int count = 0;
for (int i = 0; i < a.length; i++) {
for (int j = 0; j < i; j++) {
int diff = a[j] - a[i];
if (diff == k || -diff == k) {
count++;
}
}
}
long time = System.nanoTime() - start;
System.out.printf("Loops: %,d took %.1f ms%n", count, time / 1e6);
}
private static long free() {
return Runtime.getRuntime().freeMemory();
}
}

Since split() uses regular expressions to split a string, you should measure whether StringTokenizer would speed things up.

You are trying to find elements which have difference k. Try this:
Sort the array.
You can do it in one pass after sorting by having two pointers and adjusting one of them depending on if the difference is bigger or smaller than k
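
The sort-then-two-pointers idea can be sketched as follows. This is one possible reading of the suggestion, not the answerer's own code: the names are hypothetical, equal values are grouped into runs so that duplicates are counted correctly, and a negative k is treated as its absolute value, as in the question's loop.

```java
import java.util.Arrays;

class DifferencePairs {
    // Hypothetical helper: number of index pairs (i, j), i < j,
    // with |values[i] - values[j]| == |k|.
    static long countPairs(int[] values, int k) {
        int[] s = values.clone();
        Arrays.sort(s);
        long target = Math.abs((long) k);
        long count = 0;
        int n = s.length, i = 0, j = 0;
        while (i < n) {
            int iEnd = i;
            while (iEnd < n && s[iEnd] == s[i]) iEnd++; // run of equal values
            long run = iEnd - i;
            if (target == 0) {
                count += run * (run - 1) / 2; // pairs inside one run
            } else {
                long want = s[i] + target;
                while (j < n && s[j] < want) j++; // second pointer only moves forward
                int jEnd = j;
                while (jEnd < n && s[jEnd] == want) jEnd++;
                count += run * (jEnd - j);
            }
            i = iEnd;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPairs(new int[] {1, 5, 3, 4, 2}, 2)); // 3
    }
}
```

Sorting dominates the cost at O(n log n); the sweep afterwards is linear because neither pointer ever moves backwards.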

Use a sparse map from each value to its frequency of occurrence.
SortedMap<Integer, Integer> freqs = new TreeMap<Integer, Integer>();
for (int i = 0; i < n; ++i) {
int value = Integer.parseInt(input[i]); // input is the String[] read in the question
Integer old = freqs.put(value, 1);
if (old != null) {
freqs.put(value, old.intValue() + 1);
}
}
long count = 0;
for (Map.Entry<Integer, Integer> entry : freqs.entrySet()) {
Integer freq = freqs.get(entry.getKey() + k); // null if no value lies k further on
if (freq != null) {
count += (long) entry.getValue() * freq; // N values x M values further on (assumes k > 0).
}
}
This is O(n log n) with a TreeMap; a HashMap would make it O(n) on average.
Should this be too costly, you could sort the input array and do something similar.

I don't understand why you have one loop inside another. It's O(n^2) that way.
You also mingle reading in this array of ints with getting this count. I'd separate the two - read the whole thing in and then sweep through and get the difference count.
Perhaps I'm misunderstanding what you're doing, but it feels like you're re-doing a lot of work in that inner loop.

Why not use the java.util.Scanner class instead of BufferedReader?
For example:
Scanner sc = new Scanner(System.in);
int number = sc.nextInt();
This may work faster as there are fewer wrappers involved, but measure it, since Scanner tokenizes with regular expressions internally. See this link

Use sets and maps, as other users have already explained; I won't repeat their suggestions.
I will suggest something else.
Stop using String.split. It compiles and uses a regular expression.
String.split has this line in it: Pattern.compile(expr).split(this).
If you want to split along a single character, you could write your own function and it would be much faster. I believe Guava (ex-Google collections API) has String split function which splits on characters without using a regular expression.
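
A hand-written splitter of the kind described above could look like this. It is a sketch only, with hypothetical names, and it differs from String.split in one respect: trailing empty pieces are kept rather than dropped.

```java
import java.util.ArrayList;
import java.util.List;

class CharSplitter {
    // Hypothetical helper: splits on a single character without any regex.
    static List<String> splitOnChar(String s, char sep) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = s.indexOf(sep); i >= 0; i = s.indexOf(sep, start)) {
            parts.add(s.substring(start, i));
            start = i + 1; // continue after the separator
        }
        parts.add(s.substring(start)); // remainder after the last separator
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnChar("10 20 30", ' ')); // [10, 20, 30]
    }
}
```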

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
