I'm trying to figure out how to implement a method that does the following:
The method takes in a text file and counts the number of words up until a specified character "#".
An array is passed as a parameter through the method, this contains a list of sequential numbers. A number in an array corresponds to the position the word occurs in the text file (disregarding #'s) so the 4th word will correspond to a value of 3 (n-1).
The method will count the number of times the word before a # occurs in the array and divide it by the total number of entries between #'s it will then take the average of each time this is done.
So an example to make this clear:
Say you have the text file containing :
Hi my name # is something #
A corresponding array would be:
0,0,1,1,1,2,2,2,2,3,4,4 (a number for each letter in sequence)
The first hash would occur between the 2 and 3. So the 2's represent the word occuring before the #. So we would calculate (total number of 2's)/total number of 0's, 1's and 2's. This would be 4/9.
We would then calculate the same between the two hashes # is something #. 'something' corresponds to a 4, so we would have (total number of 4's)/total number of 3 and 4's.
This would be 2/3.
We would then take the average of 2/3 and 4/9
I hope this is clear, let me know if you need any clarifications.
I'd split() the string for any whitespace chars. Now I have every word in an array, with each cell's index representing the corresponding num, and the cell's content(the word) length is the 'how many 0's or 1's or ..' I have.
That should solve the first part of your problem.
Then you need to find where each # is, its offset that is, but that offset you want to represent in words, not chars. So I'd iterate through the previously created array, and check if the word I stored is a #. If it is I'd update a marker variable (this should hold the previous position/index of the last seen #), and calculate the division you want (4/8 , 2/3 w/e). That is the length of the previous cell's content divided by the sum of length's from the marker until the current index-1.
I think that's about it; the logic. It's not that hard to implement. Just don't forget to check the bounds.
Related
i'm working on an assignment that i have to find the most four frequent characters in a string.
i write this so far.
import java.util.Scanner;
public class FindTheMostOccur
{
public static void main (String[] args)
{
String input;
String example = "how to find the most frequent character in a big string using java?";
String[] array = new String[example.length()];
for (int i = 97; i < 123; i++)
{
int mostFrequent =0;
for( int j = 0; j < example.length(); j++)
{
if(example.charAt(j) == i)
{
++mostFrequent;
}
}
System.out.println( (char)i + " is showing " + mostFrequent+ " times ");
}
}
}
Here is the output for this example.
a is showing 5 times
b is showing 1 times
c is showing 2 times
d is showing 1 times
e is showing 4 times
f is showing 2 times
g is showing 3 times
h is showing 3 times
i is showing 5 times
j is showing 1 times
k is showing 0 times
l is showing 0 times
m is showing 1 times
n is showing 5 times
o is showing 3 times
p is showing 0 times
q is showing 1 times
r is showing 4 times
s is showing 3 times
t is showing 6 times
u is showing 2 times
v is showing 1 times
w is showing 1 times
x is showing 0 times
y is showing 0 times
in this examle : t, a,i,n
I DON"T NEED SOMEONE TO COMPLETE THE PROGRAM FOR ME, however i need some ideas how to find the most four frequent character in this example.
The simplest algorithm I can think of off hand is that instead of doing multiple passes do a single pass.
Create a HashMap mapping from character to count.
Each time you find a character if it is not in the map, add it with value 1. If it is in the map increment the count.
At the end of the loop your HashMap now contains the count for every character in the String.
Take the EntrySet from the HashMap and dump it into a new ArrayList.
Sort the ArrayList with a custom comparator that uses entry.getValue() to compare by.
The first (or last depending on the sort direction) values in the ArrayList will be the ones you want.
How about this?
int count[] = new int[1000];// all with zero
Then for each character from the string, do count[]++ like this way
count['c']++;
count['A']++;
At the end, find out which index holds the maximum value. Then just print the ascii of that index.
Ok, here're some ideas:
To find the 4 most frequent characters, you must first know the frequency for all the characters. So, you would need to store the frequency somewhere. Currently, you are just printing the frequency of each character. You can't compare that way. So, you need a data structure.
Here we are talking about mapping from a chararacter to its count. Possibly you need a hash table. Look out for the hash table implementation in Java. You will find HashMap, LinkedHashMap, TreeMap.
Since you want the 4 most frequent characters, you need to order the characters by their frequency. Find out what kind of map stores the elements in sorted order. Note: You need to sort the map by values, not keys. And sort it in descending order, which is obvious.
Using Array:
There is another approach you can follow. You can create an array of size 26, assuming you only want to count frequency of alphabetic characters.
int[] frequency = new int[26];
Each index of that array correspond to the frequency of a single character. Iterate over the string, and increment the index corresponding to the current character by 1.
You can do character to index mapping like this:
int index = ch - 'a';
So, for character 'a', the index will be 0, for character 'b', index will be 1, so on. You might also want to take care of case sensitivity.
Problem with the array approach is, once you sort the array to get 4 most frequent character, you've lost the character to index mapping. So, you would need to have some way to sort the indices along with the frequency at those indices. Yes you're right. You need another array here.
But, will having 2 arrays make your problem easy? No. Because it's difficult to sort 2 arrays in synchronization. So what to do? You can create a class storing index and character. Maintain an array of that class reference, of size 26. Then find your way out to port the array approach to this array.
What data structures are you allowed to use?
I'm thinking that you can use a priority queue and increment the priority of the node containing the character that is being repeated. And when you are done, the first index would contain the node with the most repeated character
I got another idea that you can do it using an array only.
Well you have 26 letters in the Alphabet. The ASCII code of the lowercase letters range from 97 to 122(you can make every letter a lower case using .lowercase() or something similar).
So for each character, get its ASCII code, do the ASCII code -97 and increment it.
Simple,and only need an array.
Hi this a java exercise on hashing. First we have an array of N strings (1<=N<=100000), the program will find the minimum length of the consecutive subseries which contains all distinct strings which present in the original array.
For example, original array is {apple,orange,orange pear,pear apple,pear}
the consecutive subarrays can be {orange, pear, pear, apple}
so answer is 19
I've written a code which visit every element in the array and create a new hash table to find the length of the subarray which contain all the distinct strings. It becomes very very slow once N is larger than 1000. So I hope there is a faster algorithm. Thank you!
Pass through the array once, using a hash to keep track of whether you've seen a word before or not. Count the distinct words in the array by adding to your count only when you're seeing a word for the first time.
Pass through the array a second time, using a hash to keep track of the number of times you've seen each word. Also keep track of the sum of the lengths of all the words you've seen. Keep going until you have seen all words at least once.
Now move the start of the range forward as long as you can do so without reducing a word's count to zero. Remember to adjust your hash and letter count accordingly. This gives you the first range which includes every word at least once, and can't be reduced without excluding a word.
Repeatedly do the following: Move the left end of your range forward by one, and then move the right end forward until you find another instance of the word that you just booted from the left end. Each time you do this, you have another minimal range that includes each word once.
While doing steps 3 and 4, keep track of the minimum length so far, and the start and end of the associated range. You're done when you need to move the right end of your range past the end of the array. At this point you have the right minimum length, and the range that achieves it.
This runs in linear time.
Had a question regarding generating a list of 10-digit phone numbers on a PhonePad, given a set of possible moves and a starting number.
The PhonePad:
1 2 3
4 5 6
7 8 9
* 0 #
Possible moves:
The same number of moves a Queen in chess can make (so north, south, east, west, north-east, north-west, south-east, south-west... n-spaces per each orientation)
Starting number: 5
So far I have implemented the PhonePad as a 2-dimensional char array, implemented the possible moves a Queen can make in a HashMap (using offsets of x and y), and I can make the Queen move one square using one of the possible moves.
My next step is to figure out an algorithm that would give me all 10-digit permutations (phone numbers), using the possible moves in my HasMap. Repetition of a number is allowed. * and # are not allowed in the list of phone numbers returned.
I would imagine starting out with
- 5555555555, 5555555551, 5555555552... and so on up to 0,
- 5555555515, 5555555155, 5555551555.. 5155555555.. and with the numbers 2 upto 0
- 5555555151, 5555551515, 5555515155.. 5151555555.. and with numbers 2 upto 0
... and so on for a two digit combination
Any suggestions on a systematic approach generating 10-digit combinations? Even a pseudocode algorithm is appreciated! Let me know if further clarification is required.
Thanks in advance! :)
In more detail, the simplest approach would be a recursive method, roughly like:
It accepts a prefix string initially empty, a current digit (initially '5'), and a number of digits to generate (initially 10).
If the number of digits is 1, it will simply output the prefix concatenated with the current digit.
If the number of digits is greater than 1, then it will make a list of all possible next digits and call itself recursively with (prefix + (current digit), next digit, (number of digits)-1 ) as the arguments.
Other approaches, and refinements to this one, are possible of course. The "output" action could be writing to a file, adding to a field in the current class or object, or adding to a local variable collection (List or Set) that will be returned as a result. In that last case, the (ndigits>1) logic would have to combine results from multiple recursive calls to get a single return value.
I have a n-digit number composed of 1's only. I want to replace 1's with 0's in all possible combinations and store the combinations in an array.
How do I find all combinations?
I was thinking of starting with one zero and then increasing the number if zeroes that will replace 1's.
If there are 2 zeroes, for example, then i will keep the position of one zero fixed and move the other, till it reaches the end then i'd do the same for the other zero. But then i'll have to root out the repeated combinations.
Basically, this is getting complicated. I want to know a better way to find combinations!
You are simply trying to generate n-digit binary numbers. It means that you can generate 2^n different numbers. So here you go:
Firstly, calculate 2^n.
Implement a loop, starting from 0 and ending at 2^n.
For every loop variable, take the variable as a decimal and convert
it to binary and append some leading zeros (if necessary) to make it
length of n.
Add the binary to your permutations array.
I have a system that generates values in a text file which contains values as below
Line 1 : Total value possible
Line 2 : No of elements in the array
Line 3(extra lines if required) : The numbers themselves
I am now thinking of an approach where I can subtract the total value from the first integer in the array and then searching the array for the remainder and then doing the same until the pair is found.
The other approach is to add the two integers in the array on a permutation and combination basis and finding the pair.
As per my analysis the first solution is better since it cuts down on the number of iterations.Is my analysis correct here and is there any other better approach?
Edit :
I'll give a sample here to make it more clear
Line 1 : 200
Line 2=10
Line 3 : 10 20 80 78 19 25 198 120 12 65
Now the valid pair here is 80,120 since it sums up to 200 (represented in line one as Total Value possible in the input file) and their positions in the array would be 3,8.So find to this pair I listed out my approach where I take the first element and I subtract it with the Total value possible and searching the other element through basic search algorithms.
Using the example here I first take 10 and subtract it with 200 which gives 190,then I search for 190,if it is found then the pair is found otherwise continue the same process.
Your problem is vague, but if you are looking for a pair in the array that is summed to a certain number, it can be done in O(n) on average using hash tables.
Iterate the array, and for each element:
(1) Check if it is in the table. If it is - stop and return there is such a pair.
(2) Else: insert num-element to the hash table.
If your iteration terminated without finding a match - there is no such pair.
pseudo code:
checkIfPairExists(arr,num):
set <- new empty hash set
for each element in arr:
if set.contains(element):
return true
else:
set.add(num-element)
return false
The general problem of "is there a subset that sums to a certain number" is NP-Hard, and is known as the subset-sum problem, so there is no known polynomial solution to it.
If you're trying to find a pair (2) numbers which sum to a third number, in general you'll have something like:
for(i=0;i<N;i++)
for(j=i+1;j<N;j++)
if(numbers[i]+numbers[j]==result)
The answer is <i,j>
end
which is O(n^2). However, it is possible to do better.
If the list of numbers is sorted (which takes O(n log n) time) then you can try:
for(i=0;i<N;i++)
binary_search 'numbers[i+1:N]' for result-numbers[i]
if search succeeds:
The answer is <i, search_result_index>
end
That is you can step through each number and then do a binary search on the remaining list for its companion number. This takes O(n log n) time. You may need to implement the search function above yourself as built-in functions may just walk down the list in O(n) time leading to an O(n^2) result.
For both methods, you'll want to check to for the special case that the current number is equal to your result.
Both algorithms use no more space than is taken by the array itself.
Apologies for the coding style, I'm not terribly familiar with Java and it's the ideas here which are important.