How To Mark a String in a File? - java

I have a text file. It is laid out as follows:
#1{1,12,345,867}
#2{123, 3243534, 2132131231}
#3{234, 35345}
#4{}
...
(each entry ends with "\n")
That is just an example. In practice my lines of the form #number{number,number,...,number} can be really long.
Here is a template of a constructor of a class which works with this file:
public Submatrix(String matrixFilePath, int startPos, int endPos) throws FileNotFoundException{
}
As you can see, a submatrix is determined by startPos and endPos, the line numbers of its first and last rows in the matrix file.
My question is: how can I count lines to reach the right one?
My file can contain billions of lines. Should I call LineNumberReader.readLine() billions of times?

I would be tempted to read each line sequentially until I reached the desired line. However, since the lines are numbered in the file and delimited with newlines, you can treat the file as random access and employ various strategies. For example, you can use a variant of binary search to quickly find the starting line. You can estimate the average line length from the first N lines and then make a more accurate guess at the starting location, and so on.

I think the answer is yes, you read billions of lines using readLine, unless you think it's worth the trouble to use either
the strategy outlined by GregS, that is, estimating the line length and using that to start reading somewhere near the correct line, or
a separate index, either at the start of the file or in a separate file, with a strictly predictable layout, something like
0000001 000000001024
0000002 000000001064
0000003 000000002010
That is, the line number and the starting position of that line in bytes, in a strictly defined fixed-width format, which makes it possible to compute the position of an index entry like this:
I want to read line 3, so I find the index entry for line 3 by going to position (3-1) * 20 (assuming 20-byte entries; add 1 per entry if each entry ends with a newline),
and read 0000003 000000002010, parse that and know that line 3 is at byte position 2010, seek to that position and start reading.
Calculating or maintaining the index might not be easy if it's in the main data file, as it would mean that you precalculate positions before you actually write the file. I think I would use a separate index file and either calculate the offsets while writing, or have a separate utility that creates an index file from a data file.
EDIT Added example code to demonstrate my proposal
I have made a smallish Python script which reads a data file and creates an index file. The index file contains the byte position of each line in the data file and is designed to be easily searchable.
This example script formats each index entry as 06d. Because the entries are byte offsets, that is only good enough for data files of up to 999999 bytes; for you it will have to be widened (don't forget INDEX_LENGTH). The script creates an index file and then uses that index file to read a given line out of the data file (for demonstration purposes; you would use Java for that part).
The script is called like:
python create_index.py data.txt data.idx 3
my example data file is:
#1{1,12,345,867}
#2{123, 3243534, 2132131231}
#3{234, 35345}
#4{}
and the script itself is:
import sys

# Usage: python this_script.py datafile indexfile lineno
# indexfile will be overwritten
# lineno is the data line which will be printed using the
# index file, as a demonstration
datafilename = sys.argv[1]
indexfilename = sys.argv[2]
lineno = int(sys.argv[3])

# Each index entry is a byte offset, so this format
# supports data files of up to 999999 bytes
index_format = b"%06d\n"
INDEX_LENGTH = 6 + 1  # +1 for newline


def create_indexfile():
    # Binary mode keeps every index entry exactly INDEX_LENGTH bytes
    with open(indexfilename, "wb") as indexfile, open(datafilename, "rb") as f:
        # Offset of the first line
        indexfile.write(index_format % 0)
        line = f.readline()
        while len(line) > 0:
            # f.tell() is now the offset of the next line
            indexfile.write(index_format % f.tell())
            line = f.readline()


# Retrieve the data of 1 line in the data file
# using the index file
def get_line():
    linepos = INDEX_LENGTH * (lineno - 1)
    with open(indexfilename, "rb") as indexfile:
        indexfile.seek(linepos)
        datapos = int(indexfile.readline())
    with open(datafilename, "rb") as datafile:
        datafile.seek(datapos)
        print(datafile.readline().decode(), end="")


if __name__ == '__main__':
    create_indexfile()
    get_line()
The index file needs to be rebuilt after any change to the data file. You can verify that you read the right data by comparing the line number in the data you read (#3{...}) with the requested line number, so it's fairly safe.
Whether you choose to use it or not, I think the example is pretty clear and easy.
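The lookup half of the idea above could be sketched in Java roughly as follows. This is only a sketch under assumptions: the class name IndexedLineReader, the 12-digit entry width, and the "one offset plus newline per entry" layout are all hypothetical and must match whatever wrote the index file. RandomAccessFile.readLine() maps each byte to one character, which is fine for the ASCII data shown in the question.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads one line of the data file via a fixed-width index file.
class IndexedLineReader {
    // Assumed layout: INDEX_DIGITS ASCII digits followed by '\n' per entry.
    static final int INDEX_DIGITS = 12;
    static final int RECORD_LENGTH = INDEX_DIGITS + 1;

    /** Returns line lineNo (1-based) of dataPath, or null if there is no such line. */
    static String readLine(String dataPath, String indexPath, long lineNo) throws IOException {
        try (RandomAccessFile index = new RandomAccessFile(indexPath, "r");
             RandomAccessFile data = new RandomAccessFile(dataPath, "r")) {
            long recordPos = (lineNo - 1) * RECORD_LENGTH;
            if (lineNo < 1 || recordPos + INDEX_DIGITS > index.length()) {
                return null; // no index entry for this line
            }
            byte[] record = new byte[INDEX_DIGITS];
            index.seek(recordPos);
            index.readFully(record);
            long dataPos = Long.parseLong(new String(record, StandardCharsets.US_ASCII));
            data.seek(dataPos);
            return data.readLine(); // null if the offset points at end of file
        }
    }
}
```

A Submatrix constructor could call this once for startPos and then keep reading sequentially up to endPos, so only one seek is needed per submatrix.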

@extraneon
This is the class I want to use to represent a string #number{number, number,...}
package logic;

public class DenominatedBinaryRow {
    private int sn;
    private BinaryRow row;

    public DenominatedBinaryRow(int sn, BinaryRow row) {
        this.sn = sn;
        this.row = row;
    }

    public DenominatedBinaryRow plus(int sn, DenominatedBinaryRow addend) {
        return new DenominatedBinaryRow(sn, this.row.plus(addend.row));
    }

    public int getSn() {
        return this.sn;
    }

    public BinaryRow getRow() {
        return this.row;
    }

    public boolean equals(Object obj) {
        DenominatedBinaryRow res = (DenominatedBinaryRow) obj;
        if (this.getSn() == res.getSn() && this.getRow().equals(res.getRow())) {
            return true;
        }
        return false;
    }
}
Maybe it would be more efficient to serialize it instead of converting the BinaryRow (its implementation is below) to a string?
If I serialize many instances of it to a file, how will I deserialize just the instance I need? (I hope I understood your question correctly.)
package logic;

import java.util.*;

public class BinaryRow {
    private List<Integer> row;

    public BinaryRow() {
        this.row = new ArrayList<Integer>();
    }

    public List<Integer> getRow() {
        return this.row;
    }

    public void add(Integer arg) {
        this.getRow().add(arg);
    }

    public Integer get(int index) {
        return this.getRow().get(index);
    }

    public int size() {
        return this.getRow().size();
    }

    public BinaryRow plus(BinaryRow addend) {
        BinaryRow result = new BinaryRow();
        // suppose rows are already sorted (ascending order)
        int i = this.size();
        int j = addend.size();
        while (i > 0 && j > 0) {
            if (this.get(this.size() - i) < addend.get(addend.size() - j)) {
                result.add(this.get(this.size() - i));
                i--;
            } else if (this.get(this.size() - i) > addend.get(addend.size() - j)) {
                result.add(addend.get(addend.size() - j));
                j--;
            } else {
                result.add(this.get(this.size() - i));
                i--;
                j--;
            }
        }
        if (i > 0) {
            for (int k = this.size() - i; k < this.size(); k++)
                result.add(this.get(k));
        }
        if (j > 0) {
            for (int k = addend.size() - j; k < addend.size(); k++)
                result.add(addend.get(k));
        }
        return result;
    }

    public boolean equals(Object obj) {
        BinaryRow binRow = (BinaryRow) obj;
        if (this.size() == binRow.size()) {
            for (int i = 0; i < this.size(); i++) {
                // compare Integer values with equals(); != would compare references
                if (!this.getRow().get(i).equals(binRow.getRow().get(i))) return false;
            }
            return true;
        }
        return false;
    }

    public long convertToDec() {
        long result = 0;
        for (Integer next : this.getRow()) {
            result += Math.pow(2, next);
        }
        return result;
    }
}

I am afraid that to get to the x-th line you will have to call readLine() x times.
This means reading all the data until you reach this line. Every character could be a line end, so there is no way to get to the x-th line without reading every character before it.
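For completeness, the purely sequential approach described here might look like the sketch below. The class and method names (LineRange, read) are made up for illustration; it takes any Reader so that it can be tried on a string as well as on a FileReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects lines startPos..endPos by reading from the beginning.
class LineRange {
    /** Returns lines startPos..endPos (1-based, inclusive); stops early at end of input. */
    static List<String> read(Reader source, long startPos, long endPos) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            long lineNo = 0;
            // Every line before startPos is still read, just not kept.
            while (lineNo < endPos && (line = reader.readLine()) != null) {
                lineNo++;
                if (lineNo >= startPos) {
                    result.add(line);
                }
            }
        }
        return result;
    }
}
```

The cost is proportional to endPos, which is why the index-based answers are worth considering when the same file is queried repeatedly.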

Related

Reading a text file into an array and performing a sort in Java

I have a homework question I need help with
We have been given a text file containing one word per line, of a story.
We need to read this file into an array, perform a sort on the array and then perform a binary search.
The task also says I'll need to use an overload method, but I'm unsure where
I have a bubble sort, that I've tested on a small array of characters which works
public static void bubbleV1String(String[] numbers)
{
    for (int i = 0; i < numbers.length - 1; i++)
    {
        for (int j = 0; j < numbers.length - 1; j++)
        {
            if (numbers[j].compareTo(numbers[j + 1]) > 0)
            {
                String temp = numbers[j + 1];
                numbers[j + 1] = numbers[j];
                numbers[j] = temp;
            }
        }
    }
}
And my binary search which I've tested on the same small array
public static String binarySearch(int[] numbers, int wanted)
{
    ArrayUtilities.bucketSort(numbers);
    int left = 0;
    int right = numbers.length - 1;
    while (left <= right)
    {
        int middle = (left + right) / 2;
        if (numbers[middle] == wanted)
        {
            return (wanted + " was found at position " + middle);
        }
        else if (numbers[middle] > wanted)
        {
            right = middle - 1;
        }
        else
        {
            left = middle + 1;
        }
    }
    return wanted + " was not found";
}
Here is my code in an app class to read in a file and sort it
String[] myArray = new String[100000];
int index = 0;
File text = new File("threebears.txt");
try {
    Scanner scan = new Scanner(text);
    while (scan.hasNextLine() && index < 100000)
    {
        myArray[index] = scan.nextLine();
        index++;
    }
    scan.close();
} catch (IOException e) {
    System.out.println("Problem with file");
    e.printStackTrace();
}
ArrayUtilities.bubbleV1String(myArray);
try {
    FileWriter outFile = new FileWriter("sorted1.txt");
    PrintWriter out = new PrintWriter(outFile);
    for (String item : myArray)
    {
        out.println(item);
    }
    out.close();
} catch (IOException e) {
    e.printStackTrace();
}
When I go to run the code, I get a null pointer exception and the following message
Exception in thread "main" java.lang.NullPointerException
at java.base/java.lang.String.compareTo(Unknown Source)
at parrayutilities.ArrayUtilities.bubbleV1String(ArrayUtilities.java:129)
at parrayutilities.binarySearchApp.main(binarySearchApp.java:32)
Line 129 refers to this line of code of my bubblesort
if(numbers[j] .compareTo(numbers[j+1])>0)
And line 32 refers to the piece of code where I call the bubblesort
ArrayUtilities.bubbleV1String(myArray);
Does anyone know why I'm getting a null pointer exception when I've tested the bubblesort on a small string array? I'm thinking possibly something to do with the overloaded method mentioned earlier but I'm not sure
Thanks
You are creating an array of length 100000 and filling it with lines as they are read. Initially all elements are null, and after reading the file many of them are likely to still be null. Thus when you sort the array, numbers[j] will eventually be a null element, and calling compareTo(...) on it throws a NullPointerException.
To fix that you need to know where in the array the non-null part ends. You are already tracking the number of lines read in index, so after reading the file that is the index of the first null element.
Now you basically have 2 options:
1. Pass index to bubbleV1String() and do for(int i = 0; i < index-1; i++) etc.
2. Make a copy of the array after reading the lines and before sorting it:
    String[] copy = new String[index];
    System.arraycopy(myArray, 0, copy, 0, index);
    // optional, but it can make the rest of the code easier to handle: replace myArray with copy
    myArray = copy;
Finally you could also use a List<String> which would be better than using arrays but I assume that's covered by a future lesson.
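The first option also happens to be a natural place for an overloaded method, which may be what the assignment is hinting at (that is a guess, not something the task description confirms). A minimal sketch, using a hypothetical class name ArrayUtilitiesSketch so it does not clash with the real ArrayUtilities:

```java
// Hypothetical sketch: two methods with the same name but different parameter lists.
class ArrayUtilitiesSketch {
    /** Sorts the whole array; simply delegates to the overload below. */
    static void bubbleV1String(String[] words) {
        bubbleV1String(words, words.length);
    }

    /** Overload: sorts only the first count elements and never touches the rest. */
    static void bubbleV1String(String[] words, int count) {
        for (int i = 0; i < count - 1; i++) {
            // After pass i the last i elements of the prefix are already in place.
            for (int j = 0; j < count - 1 - i; j++) {
                if (words[j].compareTo(words[j + 1]) > 0) {
                    String temp = words[j + 1];
                    words[j + 1] = words[j];
                    words[j] = temp;
                }
            }
        }
    }
}
```

With this in place the call in the app class would become ArrayUtilitiesSketch.bubbleV1String(myArray, index), and the loop that writes the output file should also stop at index rather than print the trailing nulls.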
It seems that you have some null values in your numbers array. Try to debug your code (or just print array's content) and verify what you have there. Hard to tell anything not knowing what is in your input file.
Method overloading is when multiple functions have the same name but different parameters.
e.g. (taken from wikipedia - function overloading)
// volume of a cube
int volume(const int s)
{
    return s*s*s;
}
// volume of a cylinder
double volume(const double r, const int h)
{
    return 3.1415926*r*r*static_cast<double>(h);
}
Regarding your null pointer exception, you've created an array of size 100000, but it's likely you haven't read in enough information to fill that size. Therefore some of the array is empty when you try to access it. There are multiple ways you can go about this, off the top of my head that includes array lists, dynamic arrays or even moving the contents of the array to another one, once you know the size of the contents (however this is inefficient).

search a line in preprocessed big text file

I have a data file which contains 100,000+ lines. Each line contains just two fields, a key and a value separated by a comma, and all the keys are unique. I want to look up a value by key in this file. Loading it into a map is out of the question, as that consumes too much memory (the code will run on an embedded device), and I don't want a DB involved. What I do so far is preprocess the file on my PC, i.e. sort the lines, and then use a binary search like the one below on the preprocessed file:
public long findKeyOffset(RandomAccessFile raf, String key)
        throws IOException {
    int blockSize = 8192;
    long fileSize = raf.length();
    long min = 0;
    long max = (long) fileSize / blockSize;
    long mid;
    String line;
    while (max - min > 1) {
        mid = min + (long) ((max - min) / 2);
        raf.seek(mid * blockSize);
        if (mid > 0)
            line = raf.readLine(); // probably a partial line
        line = raf.readLine();
        String[] parts = line.split(",");
        if (key.compareTo(parts[0]) > 0) {
            min = mid;
        } else {
            max = mid;
        }
    }
    // find the right line
    min = min * blockSize;
    raf.seek(min);
    if (min > 0)
        line = raf.readLine();
    while (true) {
        min = raf.getFilePointer();
        line = raf.readLine();
        if (line == null)
            break;
        String[] parts = line.split(",");
        if (line.compareTo(parts[0]) >= 0)
            break;
    }
    raf.seek(min);
    return min;
}
I think there are better solutions than this. Can anyone give me some enlightenment?
Data is immutable and keys are unique (as mentioned in the comments on the question).
A simple solution: write your own hashing code to map each key to a line number.
This means dropping the sorting and instead writing your data to the file in the order that your hashing algorithm dictates.
When a key is queried, you hash the key, get the specific line number, and then read the value.
In theory, you have an O(1) solution to your problem.
Make sure the hashing algorithm produces few collisions, although depending on your exact case a few collisions should be fine. Example: 3 keys map to the same line number, so you write all three of them on the same line, and when any of the colliding keys is searched you read all 3 entries from that line. Then do a linear search (O(3), i.e. constant time in this case) over the entire line.
An easy algorithm to optimise performance for your specific constraints:
Let n be the number of lines in the original, immutable, sorted file.
Let k < n be a number (the ideal value is discussed below).
Divide the file into k files with approximately equal numbers of lines (so each file has about n/k lines). The files will be referred to as F1...Fk. If you prefer to keep the original file intact, just treat F1...Fk as line ranges within the file, cutting it into segments.
Create a new file called P with k lines, where line i is the first key of Fi.
When looking for a key, first binary-search P in O(log k) to find which file/segment (F1...Fk) you need. Then go to that file/segment and search within it.
If k is big enough, then Fi (n/k lines) will be small enough to load into a HashMap and retrieve the key in O(1). If that is still not practical, do a binary search in O(log(n/k)).
The total search is O(log k) + O(log(n/k)). Asymptotically that is still O(log n), the same as your original solution; the practical gain comes when P or a single Fi is small enough to hold in memory, so that part of the search needs no disk seeks.
I would suggest finding a k big enough to allow you to load a single Fi file/segment into a HashMap, but not so big that P fills up space on your device. The most balanced k is sqrt(n), which makes each of the two searches O(log(sqrt(n))), but that may give quite a large P file. If you find a k which allows you to load both P and Fi into a HashMap for O(1) retrieval, that would be the best solution.
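The segment scheme above can be sketched in Java as follows. To stay self-contained the sketch keeps the sorted lines in a List as a stand-in for the data file; on the device, the scan of one segment would instead be a seek plus a few readLine() calls. SegmentIndex and its members are hypothetical names, and the segment size corresponds to n/k.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the "P file" idea: a small sorted list of first keys per segment.
class SegmentIndex {
    private final List<String> firstKeys = new ArrayList<>(); // plays the role of P
    private final List<String> lines;                         // stand-in for the sorted file
    private final int segmentSize;                            // roughly n / k

    SegmentIndex(List<String> sortedLines, int segmentSize) {
        this.lines = sortedLines;
        this.segmentSize = segmentSize;
        for (int i = 0; i < sortedLines.size(); i += segmentSize) {
            firstKeys.add(keyOf(sortedLines.get(i)));
        }
    }

    private static String keyOf(String line) {
        return line.substring(0, line.indexOf(','));
    }

    /** Returns the value stored for key, or null if the key is absent. */
    String find(String key) {
        int pos = Collections.binarySearch(firstKeys, key);
        // Not found: pos == -(insertionPoint) - 1, and the wanted segment is insertionPoint - 1.
        int segment = pos >= 0 ? pos : -pos - 2;
        if (segment < 0) {
            return null; // key sorts before the first line
        }
        int from = segment * segmentSize;
        int to = Math.min(from + segmentSize, lines.size());
        for (int i = from; i < to; i++) { // only one segment is ever scanned
            String line = lines.get(i);
            if (keyOf(line).equals(key)) {
                return line.substring(line.indexOf(',') + 1);
            }
        }
        return null;
    }
}
```

Storing a byte offset next to each first key (instead of relying on list positions) would let the same lookup drive RandomAccessFile.seek() directly.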
What about this?
#include <iostream>
#include <fstream>
#include <boost/algorithm/string.hpp>
#include <vector>
using namespace std;

int main(int argc, char *argv[])
{
    ifstream f(argv[1], ios::ate);
    if (!f.is_open())
        return 0;
    string key(argv[2]), value;
    int max = f.tellg();
    int min = 0, mid = 0;
    string s;
    while (max - min > 1)
    {
        mid = min + (max - min) / 2;
        f.seekg(mid);
        f >> s;
        std::vector<std::string> strs;
        if (!f)
        {
            break;
        }
        if (mid)
        {
            f >> s;
        }
        boost::split(strs, s, boost::is_any_of(","));
        int comp = key.compare(strs[0]);
        if (comp < 0)
        {
            max = mid;
        }
        else if (comp > 0)
        {
            min = mid;
        }
        else
        {
            value = strs[1];
            break;
        }
    }
    cout << "key " << key;
    if (!value.empty())
    {
        cout << " found! value = " << value << endl;
    }
    else
    {
        cout << " not found..." << endl;
    }
    f.close();
    return 0;
}

Recursive backtracking in Java for solving a crossword

I need to solve a crossword given the initial grid and the words (words can be used more than once or not at all).
The initial grid looks like that:
++_+++
+____+
___+__
+_++_+
+____+
++_+++
Here is an example word list:
pain
nice
pal
id
The task is to fill the placeholders (horizontal or vertical having length > 1) like that:
++p+++
+pain+
pal+id
+i++c+
+nice+
++d+++
Any correct solution is acceptable, and it's guaranteed that there's a solution.
In order to start to solve the problem, I store the grid in 2-dim. char array and I store the words by their length in the list of sets: List<Set<String>> words, so that e.g. the words of length 4 could be accessed by words.get(4)
Then I extract the location of all placeholders from the grid and add them to the list (stack) of placeholders:
class Placeholder {
    int x, y; // coordinates
    int l; // the length
    boolean h; // horizontal or not

    public Placeholder(int x, int y, int l, boolean h) {
        this.x = x;
        this.y = y;
        this.l = l;
        this.h = h;
    }
}
The main part of the algorithm is the solve() method:
char[][] solve(char[][] c, Stack<Placeholder> placeholders) {
    if (placeholders.isEmpty())
        return c;
    Placeholder pl = placeholders.pop();
    for (String word : words.get(pl.l)) {
        char[][] possibleC = fill(c, word, pl); // description below
        if (possibleC != null) {
            char[][] ret = solve(possibleC, placeholders);
            if (ret != null)
                return ret;
        }
    }
    return null;
}
Function fill(c, word, pl) just returns a new crossword with the current word written on the current placeholder pl. If word is incompatible with pl, then function returns null.
char[][] fill(char[][] c, String word, Placeholder pl) {
    if (pl.h) {
        for (int i = pl.x; i < pl.x + pl.l; i++)
            if (c[pl.y][i] != '_' && c[pl.y][i] != word.charAt(i - pl.x))
                return null;
        for (int i = pl.x; i < pl.x + pl.l; i++)
            c[pl.y][i] = word.charAt(i - pl.x);
        return c;
    } else {
        for (int i = pl.y; i < pl.y + pl.l; i++)
            if (c[i][pl.x] != '_' && c[i][pl.x] != word.charAt(i - pl.y))
                return null;
        for (int i = pl.y; i < pl.y + pl.l; i++)
            c[i][pl.x] = word.charAt(i - pl.y);
        return c;
    }
}
Here is the full code on Rextester.
The problem is that my backtracking algorithm doesn't work well. Let's say this is my initial grid:
++++++
+____+
++++_+
++++_+
++++_+
++++++
And this is the list of words:
pain
nice
My algorithm will put the word pain vertically, but then when realizing that it was a wrong choice it will backtrack, but by that time the initial grid will be already changed and the number of placeholders will be reduced. How do you think the algorithm can be fixed?
This can be solved in 2 ways:
1. Create a deep copy of the matrix at the start of fill, modify and return that (leaving the original intact).
Given that you already pass around the matrix, this wouldn't require any other changes.
This is simple but fairly inefficient as it requires copying the matrix every time you try to fill in a word.
2. Create an unfill method, which reverts the changes made in fill, to be called at the end of each for loop iteration.
    for (String word : words.get(pl.l)) {
        if (fill(c, word, pl)) {
            ...
            unfill(c, word, pl);
        }
    }
Note: I changed fill a bit as per my note below.
Of course, just erasing all the letters of a word may also erase letters that belong to other placed words. To fix this, we can keep a count of how many words each letter is a part of.
More specifically, have an int[][] counts (which will also need to be passed around or be otherwise accessible), and whenever you update c[x][y], also increment counts[x][y]. To revert a placement, decrease the count of each letter in that placement by 1 and only remove letters whose count has reached 0.
This is somewhat more complex, but much more efficient than the above approach.
In terms of code, you might put something like this in fill:
(in the first part, the second is similar)
    for (int i = pl.x; i < pl.x + pl.l; i++)
        counts[pl.y][i]++;
And unfill would look something like this: (again for just the first part)
    for (int i = pl.x; i < pl.x + pl.l; i++)
        counts[pl.y][i]--;
    for (int i = pl.x; i < pl.x + pl.l; i++)
        if (counts[pl.y][i] == 0)
            c[pl.y][i] = '_';
    // can also just use a single loop with "if (--counts[pl.y][i] == 0)"
Note that, if going for the second approach above, it might make more sense to simply have fill return a boolean (true if successful) and just pass c down to the recursive call of solve. unfill can return void, since it can't fail, unless you have a bug.
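Putting the fragments of the second approach together, a complete fill/unfill pair might look like the sketch below. CrosswordCells is a hypothetical wrapper introduced only so that the grid and the counts live in one place; the methods take the start coordinates and orientation directly rather than a Placeholder, purely to keep the example self-contained.

```java
// Hypothetical sketch: fill/unfill with per-cell usage counts.
class CrosswordCells {
    final char[][] grid;
    final int[][] counts; // how many placed words currently cover each cell

    CrosswordCells(char[][] grid) {
        this.grid = grid;
        this.counts = new int[grid.length][grid[0].length];
    }

    /** Writes word starting at column x, row y; returns false and changes nothing on a conflict. */
    boolean fill(String word, int x, int y, boolean horizontal) {
        for (int i = 0; i < word.length(); i++) {
            char cell = horizontal ? grid[y][x + i] : grid[y + i][x];
            if (cell != '_' && cell != word.charAt(i)) {
                return false;
            }
        }
        for (int i = 0; i < word.length(); i++) {
            int row = horizontal ? y : y + i;
            int col = horizontal ? x + i : x;
            grid[row][col] = word.charAt(i);
            counts[row][col]++;
        }
        return true;
    }

    /** Reverts one successful fill; a cell is blanked only when no other word still uses it. */
    void unfill(String word, int x, int y, boolean horizontal) {
        for (int i = 0; i < word.length(); i++) {
            int row = horizontal ? y : y + i;
            int col = horizontal ? x + i : x;
            if (--counts[row][col] == 0) {
                grid[row][col] = '_';
            }
        }
    }
}
```

Note that the solve() shown in the question also pops a placeholder from a stack that is shared between recursive calls, so it would additionally need to push pl back before returning null; otherwise the grid is restored but the placeholder is still lost.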
There is only a single array that you're passing around in your code, all you're doing is changing its name.
See also Is Java "pass-by-reference" or "pass-by-value"?
You identified it yourself:
it will backtrack, but by that time the initial grid will be already
changed
That grid should be a local matrix, not a global one. That way, when you back up with a return of null, the grid from the parent call is still intact, ready to try the next word in the for loop.
Your termination logic is correct: when you find a solution, immediately pass that grid back up the stack.

Java - Return random index of specific character in string

So given a string such as 0100101, I want to return a random index of one of the positions of a 1 (1, 4, or 6, counting from 0).
So far I'm using:
protected int getRandomBirthIndex(String s) {
    ArrayList<Integer> birthIndicies = new ArrayList<Integer>();
    for (int i = 0; i < s.length(); i++) {
        if ((s.charAt(i) == '1')) {
            birthIndicies.add(i);
        }
    }
    return birthIndicies.get(Randomizer.nextInt(birthIndicies.size()));
}
However, it's causing a bottle-neck on my code (45% of CPU time is in this method), as the strings are over 4000 characters long. Can anyone think of a more efficient way to do this?
If you're interested in a single index of one of the positions with 1, and assuming there is at least one 1 in your input, you can just do this:
String input = "0100101";
final int n = input.length();
Random generator = new Random();
char c = 0;
int i = 0;
do {
    i = generator.nextInt(n);
    c = input.charAt(i);
} while (c != '1');
System.out.println(i);
This solution is fast and does not consume much memory when, for example, 1s and 0s are distributed uniformly. As highlighted by @paxdiablo, it can perform poorly in some cases, for example when 1s are scarce.
You could use String.indexOf(int) to find each 1 (instead of iterating every character). I would also prefer to program to the List interface and to use the diamond operator <>. Something like,
private static Random rand = new Random();

protected int getRandomBirthIndex(String s) {
    List<Integer> birthIndicies = new ArrayList<>();
    int index = s.indexOf('1');
    while (index > -1) {
        birthIndicies.add(index);
        index = s.indexOf('1', index + 1);
    }
    return birthIndicies.get(rand.nextInt(birthIndicies.size()));
}
Finally, if you need to do this many times, save the List as a field and re-use it (instead of calculating the indices every time). For example with memoization,
private static Random rand = new Random();
private static Map<String, List<Integer>> memo = new HashMap<>();

protected int getRandomBirthIndex(String s) {
    List<Integer> birthIndicies;
    if (!memo.containsKey(s)) {
        birthIndicies = new ArrayList<>();
        int index = s.indexOf('1');
        while (index > -1) {
            birthIndicies.add(index);
            index = s.indexOf('1', index + 1);
        }
        memo.put(s, birthIndicies);
    } else {
        birthIndicies = memo.get(s);
    }
    return birthIndicies.get(rand.nextInt(birthIndicies.size()));
}
Well, one way would be to remove the creation of the list each time, by caching the list based on the string itself, assuming the strings are used more often than they're changed. If they're not, then caching methods won't help.
The caching method involves, rather than having just a string, have an object consisting of:
current string;
cached string; and
list based on the cached string.
You can provide a function to the clients to create such an object from a given string and it would set the string and the cached string to whatever was passed in, then calculate the list. Another function would be used to change the current string to something else.
The getRandomBirthIndex() function then receives this structure (rather than the string) and follows the rule set:
if the current and cached strings are different, set the cached string to be the same as the current string, then recalculate the list based on that.
in any case, return a random element from the list.
That way, if the list changes rarely, you avoid the expensive recalculation where it's not necessary.
In pseudo-code, something like this should suffice:
# Constructs fastie from string.
# Sets cached string to something other than
# that passed in (lazy list creation).
def fastie.constructor(string s):
    me.current = s
    me.cached = s + "!"

# Changes current string in fastie. No list update in
# case you change it again before needing an element.
def fastie.changeString(string s):
    me.current = s

# Get a random index, will recalculate list first but
# only if necessary. Empty list returns index of -1.
def fastie.getRandomBirthIndex():
    me.recalcListFromCached()
    if me.list.size() == 0:
        return -1
    return me.list[random(me.list.size())]

# Recalculates the list from the current string.
# Done on an as-needed basis.
def fastie.recalcListFromCached():
    if me.current != me.cached:
        me.cached = me.current
        me.list = empty
        for idx = 0 to me.cached.length() - 1 inclusive:
            if me.cached[idx] == '1':
                me.list.append(idx)
You also have the option of speeding up the actual search for the 1 characters by, for example, using indexOf() to locate them with the underlying Java libraries rather than checking each character individually in your own code (again, pseudo-code):
def fastie.recalcListFromCached():
    if me.current != me.cached:
        me.cached = me.current
        me.list = empty
        idx = me.cached.indexOf('1')
        while idx != -1:
            me.list.append(idx)
            idx = me.cached.indexOf('1', idx + 1)
This method can be used even if you don't cache the values. It's likely to be faster using Java's probably-optimised string search code than doing it yourself.
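In actual Java, the caching object from the pseudo-code could be sketched along these lines. The class name BirthIndexCache is invented here, and instead of the "s + !" trick the sketch uses a null cached string to mean "list not built yet"; otherwise it follows the same rule set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical Java rendering of the "fastie" pseudo-code.
class BirthIndexCache {
    private final Random random = new Random();
    private String current;
    private String cached; // null means the list has not been built yet
    private List<Integer> ones = new ArrayList<>();

    BirthIndexCache(String s) {
        this.current = s;
    }

    /** Changes the current string; the list is rebuilt lazily on the next lookup. */
    void changeString(String s) {
        this.current = s;
    }

    /** Returns a random index of a '1' in the current string, or -1 if there is none. */
    int getRandomBirthIndex() {
        recalcIfStale();
        if (ones.isEmpty()) {
            return -1;
        }
        return ones.get(random.nextInt(ones.size()));
    }

    private void recalcIfStale() {
        if (current.equals(cached)) {
            return; // list still matches the current string
        }
        cached = current;
        ones = new ArrayList<>();
        for (int idx = cached.indexOf('1'); idx != -1; idx = cached.indexOf('1', idx + 1)) {
            ones.add(idx);
        }
    }
}
```

Whether this helps depends on how often each string is queried between changes; if every string is used only once, the list is rebuilt on every call and nothing is saved.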
However, you should keep in mind that your supposed problem of spending 45% of time in that code may not be an issue at all. It's not so much the proportion of time spent there as it is the absolute amount of time.
By that, I mean it probably makes no difference what percentage of the time being spent in that function if it finishes in 0.001 seconds (and you're not wanting to process thousands of strings per second). You should only really become concerned if the effects become noticeable to the user of your software somehow. Otherwise, optimisation is pretty much wasted effort.
You can also try this. Its best-case complexity is O(1), its worst case may be O(n), and in the very worst case it may never terminate, since that depends entirely on the random number generator you are using.
private static Random rand = new Random();

protected int getRandomBirthIndex(String s) {
    List<Integer> birthIndicies = new ArrayList<>();
    int index = s.indexOf('1');
    while (index > -1) {
        birthIndicies.add(index);
        index = s.indexOf('1', index + 1);
    }
    return birthIndicies.get(rand.nextInt(birthIndicies.size()));
}
If your Strings are very long and you're sure they contain a lot of 1s (or whatever String you're looking for), it's probably faster to randomly "poke around" in the String until you find what you are looking for. That way you save the time spent iterating over the String:
String s = "0100101";
int index = ThreadLocalRandom.current().nextInt(s.length());
while (s.charAt(index) != '1') {
    System.out.println("got not a 1, trying again");
    index = ThreadLocalRandom.current().nextInt(s.length());
}
System.out.println("found: " + index + " - " + s.charAt(index));
I'm not sure about the statistics, but in rare cases this solution may take much longer than the iterating solution. One such case is a long String with only very few occurrences of the search string.
If the source String doesn't contain the search String at all, this code will run forever!
One possibility is to use a short-circuited Fisher-Yates style shuffle. Create an array of the indices and start shuffling it. As soon as the next shuffled element points to a one, return that index. If you find you've iterated through indices without finding a one, then this string contains only zeros so return -1.
If the length of the strings is always the same, the array indices can be static as shown below, and doesn't need reinitializing on new invocations. If not, you'll have to move the declaration of indices into the method and initialize it each time with the correct index set. The code below was written for strings of length 7, such as your example of 0100101.
// delete this and uncomment below if string lengths vary
private static int[] indices = { 0, 1, 2, 3, 4, 5, 6 };

protected int getRandomBirthIndex(String s) {
    int tmp;
    /*
     * int[] indices = new int[s.length()];
     * for (int i = 0; i < s.length(); ++i) indices[i] = i;
     */
    for (int i = 0; i < s.length(); i++) {
        int j = randomizer.nextInt(indices.length - i) + i;
        if (j != i) { // swap to shuffle
            tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }
        if ((s.charAt(indices[i]) == '1')) {
            return indices[i];
        }
    }
    return -1;
}
This approach terminates quickly if 1's are dense, guarantees termination after s.length() iterations even if there aren't any 1's, and the locations returned are uniform across the set of 1's.

checking if my array elements meet requirements

I need to create a method which checks each element in my array against a condition. Each element holds several values (mass, formula, area, etc.) for one compound, and there are 30 compounds in total (so the array has 30 elements). I need an algorithm that tests, for each one, whether mass < 50 and area > 5 is true.
My properties class looks like:
public void addProperty(TheProperties pro)
{
    if (listSize >= listlength)
    {
        listlength = 2 * listlength;
        TheProperties[] newList = new TheProperties[listlength];
        System.arraycopy(proList, 0, newList, 0, proList.length);
        proList = newList;
    }
    // add new property object in the next position
    proList[listSize] = pro;
    listSize++;
}

public int getSize()
{
    return listSize;
}

// returns properties at a particular position in list numbered from 0
public TheProperties getProperties(int pos)
{
    return proList[pos];
}
}
and after using my getters/setters from TheProperties I put all the information in the array using the following:
TheProperties tp = new TheProperties();
String i = tp.getMass();
String y = tp.getArea();
//etc
theList.addProperty(tp);
I then used the following to save the output to a file:
StringBuilder builder = new StringBuilder();
for (int i = 0; i < theList.getSize(); i++)
{
    if (theList.getProperties(i).getFormatted() != null)
    {
        builder.append(theList.getProperties(i).getFormatted());
        builder.append("\n");
    }
}
SaveFile sf = new SaveFile(this, builder.toString());
I just can't work out how to interrogate each compound individually to see whether it meets the requirements. Reading the file in, storing a value for each compound, and saving the output all work, and I can write an if statement for the requirements, but how do I actually check that the elements for each compound match them? I am trying to word this as best I can; I am still working on my fairly poor Java skills.
Not entirely sure what you are after, I found your description quite hard to understand, but if you want to see if the mass is less than 50 and the area is greater than 5, a simple if statement, like so, will do.
if (tp.getMass() < 50 && tp.getArea() > 5) {}
Although, you will again, have to instantiate tp and ensure it has been given its attributes through some sort of constructor.
Lots of ways to do this, which makes it hard to answer.
You could check at creation time, and just not even add the invalid ones to the list. That would mean you only have to loop once.
If you just want to save the output to the file, and not do anything else, I suggest you combine the reading and writing into one function.
Open up the read and the write file
while (read from file) {
    check value is ok
    write to file
}
close both files
The advantages of doing it this way are:
You only loop through once, not three times, so it is faster
You never have to store the whole list in memory, so you can handle really large files, with thousands of elements.
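The read-check-write loop above could be sketched in Java like this. The input layout is an assumption made purely for illustration (one compound per line as formula,mass,area), since the question does not show the file format; CompoundFilter and copyMatching are likewise hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical sketch: stream compounds from input to output, keeping only the valid ones.
class CompoundFilter {
    /** Copies lines with mass < 50 and area > 5; returns how many lines were written. */
    static int copyMatching(Reader in, Writer out) throws IOException {
        int written = 0;
        try (BufferedReader reader = new BufferedReader(in);
             PrintWriter writer = new PrintWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(","); // assumed layout: formula,mass,area
                if (fields.length < 3) {
                    continue; // skip lines that do not have all three fields
                }
                double mass = Double.parseDouble(fields[1].trim());
                double area = Double.parseDouble(fields[2].trim());
                if (mass < 50 && area > 5) {
                    writer.println(line);
                    written++;
                }
            }
        }
        return written;
    }
}
```

Passing a FileReader and a FileWriter gives the file-to-file version; only one line is held in memory at a time.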
In case the requirements change, you can write a method that takes a Predicate<T>, a functional interface designed for such cases (functional interfaces were introduced in Java 8):
import java.util.function.Predicate;

// check each element of the list by custom condition (predicate)
public static void checkProperties(TheList list, Predicate<TheProperties> criteria) {
    for (int i = 0; i < list.getSize(); i++) {
        TheProperties tp = list.getProperties(i);
        if (!criteria.test(tp)) {
            throw new IllegalArgumentException(
                "TheProperty at index " + i + " does not meet the specified criteria");
        }
    }
}
If you want to check if mass < 50 and area > 5, you would write:
checkProperties(theList, new Predicate<TheProperties>() {
    @Override
    public boolean test(TheProperties tp) {
        return tp.getMass() < 50 && tp.getArea() > 5;
    }
});
This can be shortened by using a lambda expression:
checkProperties(theList, (TheProperties tp) -> {
    return tp.getMass() < 50 && tp.getArea() > 5;
});
