I have a method that should read a file and detect whether the text after the # symbol is a number or a word. If it is a number, I want to delete the symbol in front of it, translate the number into binary, and replace it in the file. If it is a word, I want to assign it the number 16 the first time; each new word after that gets the next number (17, 18, ...), while a word that was already used keeps the number it was given.
Here's the input that I'm using:
Here's my method:
try {
ReadFile files = new ReadFile(file.getPath());
String[] anyLines = files.OpenFile();
int i;
int wordValue = 16;
// to keep track words that are already used
Map<String, Integer> wordValueMap = new HashMap<String, Integer>();
for (String line : anyLines) {
if (!line.startsWith("#")) {
continue;
}
line = line.substring(1);
Integer binaryValue = null;
if (line.matches("\\d+")) {
binaryValue = Integer.parseInt(line);
}
else if (line.matches("\\w+")) {
binaryValue = wordValueMap.get(line);
// if the map doesn't contain the word value, then assign and store it
if (binaryValue == null) {
binaryValue = wordValue;
wordValueMap.put(line, binaryValue);
++wordValue;
}
}
// --> I want to replace with this
System.out.println(Integer.toBinaryString(binaryValue));
}
for (i=0; i<anyLines.length; i++) {
// --> Here are a bunch of instructions that replace certain strings - they are the lines after # symbols <--
// --> I'm not going to list them ... <--
System.out.println(anyLines[i]);
So the question is: how do I replace the lines that start with "#", line by line and in order?
I basically want the output to look like this:
101
1110110000010000
10000
1110001100001000
10001
1110101010001000
10001
1111000010001000
10000
1110001110001000
10010
1110001100000110
10011
1110101010000111
10010
1110101010000111
I don't quite understand the logic. If you are simply trying to replace all the # symbols in order, why not read all the numbers into a List, in order, until you see a # symbol? Then you can start replacing them in order from that List (or a Queue, since you want first in, first out). Does that satisfy your requirements?
If you must keep the wordValueMap, the code below should loop through the lines after you have populated the wordValueMap and write them to the console. It uses the same logic that you used to populate the map in the first place and outputs the values that should be replaced.
boolean foundAt = false;
for (i=0; i<anyLines.length; i++) {
// --> Here are a bunch of instructions that replace certain strings - they are the lines after # symbols <--
// --> I'm not going to list them ... <--
if (anyLines[i].startsWith("#")) {
foundAt = true;
String theLine = anyLines[i].substring(1);
Integer theInt = null;
if (theLine.matches("\\d+")) {
theInt = Integer.parseInt(theLine);
}
else {
theInt = wordValueMap.get(anyLines[i].substring(1));
}
if(theInt!=null) {
System.out.println(Integer.toBinaryString(theInt));
}
else {
//ERROR
}
}
else if(foundAt) {
System.out.println(anyLines[i]);
}
}
When I run this loop, I get the output you were looking for from your question:
101
1110110000010000
10000
1110001100001000
10001
1110101010001000
10001
1111000010001000
10000
1110001110001000
10010
1110001100000110
10011
1110101010000111
10010
1110101010000111
I hope this helps, but take a look at my question above to see if you can do this in a more straightforward manner.
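If the goal is to put the binary strings back into the array rather than only print them, a single pass may be enough. The following is just a sketch under assumptions: the class name SymbolReplacer, the method replaceSymbols, and the sample lines in the usage note are made up for illustration, and it returns a modified copy of the lines instead of writing anything to the file.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolReplacer {
    // Returns a copy of the lines in which every line starting with '#'
    // is replaced by the binary form of its number or of its word value.
    static String[] replaceSymbols(String[] lines) {
        Map<String, Integer> wordValueMap = new HashMap<>();
        int wordValue = 16; // first value handed out to a word, as in the question
        String[] out = lines.clone();
        for (int i = 0; i < out.length; i++) {
            if (!out[i].startsWith("#")) {
                continue; // ordinary line, keep it untouched
            }
            String token = out[i].substring(1);
            int value;
            if (token.matches("\\d+")) {
                value = Integer.parseInt(token);
            } else {
                Integer known = wordValueMap.get(token);
                if (known == null) {
                    // new word: assign the next free value and remember it
                    known = wordValue++;
                    wordValueMap.put(token, known);
                }
                value = known;
            }
            out[i] = Integer.toBinaryString(value);
        }
        return out;
    }
}
```

For example, calling replaceSymbols with the hypothetical lines "#5", "D=A", "#sum", "#i", "#sum" gives 101, D=A, 10000, 10001, 10000: the repeated word keeps the value it received the first time.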
I am working on an exercise with the following criteria:
"The input consists of pairs of tokens where each pair begins with the type of ticket that the person bought ("coach", "firstclass", or "discount", case-sensitively) and is followed by the number of miles of the flight."
The list can be paired -- coach 1500 firstclass 2000 discount 900 coach 3500 -- and this currently works great. However, when the String and int value are split like so:
firstclass 5000 coach 1500 coach
100 firstclass
2000 discount 300
it breaks entirely. I am almost certain that it has something to do with me using this format (not full)
while(fileScanner.hasNextLine())
{
StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ");
while(token.hasMoreTokens())
{
String ticketClass = token.nextToken().toLowerCase();
int count = Integer.parseInt(token.nextToken());
...
}
}
because it will always read the first value as a String and the second value as an integer. I am very lost on how to keep track of one or the other while going to read the next line. Any help is truly appreciated.
Similar (I think) problems:
Efficient reading/writing of key/value pairs to file in Java
Java-Read pairs of large numbers from file and represent them with linked list, get the sum and product of each pair
Reading multiple values in multiple lines from file (Java)
If you can afford to read the text file in all at once as a very long String, simply use the built-in String.split() with the regex \\s+, like so
String[] tokens = fileAsString.split("\\s+");
This will split the input file into tokens, assuming the tokens are separated by one or more whitespace characters (a whitespace character covers newline, space, tab, and carriage return). Even and odd tokens are ticket types and mile counts, respectively.
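As a rough illustration of the split approach, here is a sketch that pairs up the tokens once the whole file is in one String. The class and method names are hypothetical, and the scoring (2 points per mile for firstclass, 1 for coach, 0 for anything else) is only an assumption so that there is something to compute.

```java
class PairedTokens {
    // Splits on any run of whitespace (including newlines) and walks the
    // tokens two at a time: ticket type first, mile count second.
    static int points(String fileAsString) {
        // trim() avoids an empty first token when the text starts with whitespace
        String[] tokens = fileAsString.trim().split("\\s+");
        int points = 0;
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            String ticketClass = tokens[i];
            int miles = Integer.parseInt(tokens[i + 1]);
            switch (ticketClass) {
                case "firstclass":
                    points += 2 * miles;
                    break;
                case "coach":
                    points += miles;
                    break;
                default:
                    // assumed: other ticket types earn nothing
                    break;
            }
        }
        return points;
    }
}
```

With the three-line input from the question ("firstclass 5000 coach 1500 coach", "100 firstclass", "2000 discount 300" joined by newlines), points returns 15600, because the line breaks are treated like any other whitespace.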
If you absolutely have to read line by line and use StringTokenizer, a solution is to count the number of tokens in the previous line. If this number is odd, the first token in the current line is of a different type than the first token in the previous line. Once you know the starting type of the current line, simply alternate types from there.
int tokenCount = 0;
boolean startingType = true; // true for String, false for integer
boolean currentType;
while(fileScanner.hasNextLine())
{
StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ");
startingType = startingType ^ (tokenCount % 2 == 1); // if tokenCount is odd, the XOR ^ operator will flip the starting type of this line
tokenCount = 0;
while(token.hasMoreTokens())
{
tokenCount++;
currentType = startingType ^ (tokenCount % 2 == 0); // alternating between types in current line
if (currentType) {
String ticketClass = token.nextToken().toLowerCase();
// do something with ticketClass here
} else {
int mileCount = Integer.parseInt(token.nextToken());
// do something with mileCount here
}
...
}
}
I found another way to do this problem without using either the StringTokenizer or the regex...admittedly I had trouble with the regular expressions haha.
I declare these outside of the try-catch block because I want to use them in both my finally statement and return the points:
int points = 0;
ArrayList<String> classNames = new ArrayList<>();
ArrayList<Integer> classTickets = new ArrayList<>();
Then inside my try statement, I declare the index variable because I won't need it outside of this block. That variable increases each time a new element is read. Elements at even indexes (0, 2, ...) are read as ticket classes and elements at odd indexes (1, 3, ...) are read as the numbers that follow them:
try
{
int index = 0;
// read till the file is empty
while(fileScanner.hasNext())
{
// first entry is the ticket type
if(index % 2 == 0)
classNames.add(fileScanner.next());
// second entry is the number of points
else
classTickets.add(Integer.parseInt(fileScanner.next()));
index++;
}
}
You can either catch it here like this or add throws NoSuchElementException to your method declaration, as long as you catch it where the method is called:
catch(NoSuchElementException noElement)
{
System.out.println("<###-NoSuchElementException-###>");
}
Then down here, loop through the number of elements. See which flight class it is and multiply the ticket count respectively and return the points outside of the block:
finally
{
for(int i = 0; i < classNames.size(); i++)
{
switch(classNames.get(i).toLowerCase())
{
case "firstclass": // 2 points for first
points += 2 * classTickets.get(i);
break;
case "coach": // 1 point for coach
points += classTickets.get(i);
break;
default:
// discount gets nothing
}
}
}
return points;
The regex seems like the most convenient way, but this was more intuitive to me for some reason. Either way, I hope the variety will help out.
simply use the built-in String.split() - @bui
I was finally able to wrap my head around regular expressions, but \s+ was not being recognized for some reason. It kept giving me this error message:
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )Java(1610612990)
So when I went through with those characters instead, I was able to write this:
int points = 0, multiplier = 0, tracker = 0;
while(fileScanner.hasNext())
{
String read = fileScanner.next().split(
"[\b \t \n \f \r \" \' \\ ]")[0];
if(tracker % 2 == 0)
{
if(read.toLowerCase().equals("firstclass"))
multiplier = 2;
else if(read.toLowerCase().equals("coach"))
multiplier = 1;
else
multiplier = 0;
}else
{
points += multiplier * Integer.parseInt(read);
}
tracker++;
}
This code goes one entry at a time instead of reading a whole array with the whitespace removed, as a workaround for the error message I was getting. If you could show me what the code would look like with String[] tokens = fileAsString.split("\s+"); instead, I would really appreciate it :)
You need to add another "\" before "\s" to escape the backslash before "s" itself – @bui
I have code to remove duplicate words from a string. Let's say I have:
This is serious serious work. I apply the code and get: This is serious work
This is the code:
return Arrays.stream(input.split(" ")).distinct().collect(Collectors.joining(" "));
Now I want to add a new constraint: if the string/line is longer than 78 characters, break and indent it where it makes sense, so that the line does not run longer than 78 characters. Example:
This one is a very long line that runs off the right side because it is longer than 78 characters long
It should then be
This one is a very long line that runs off the right side because it is longer
than 78 characters long
I can't find a solution to this. It was brought to my attention that there is a possible duplicate of my question, but I can't find my answer there. I need to be able to indent.
You could create a StringBuilder off of the String and then insert a newline and tab at the last word break after 78 characters. You can find the last word break to insert the newline/tab by getting the substring of the first 78 characters, and then finding the index of the last space:
StringBuilder sb = new StringBuilder(Arrays.stream(input.split(" ")).distinct().collect(Collectors.joining(" ")));
if(sb.length() > 78) {
int lastWordBreak = sb.substring(0, 78).lastIndexOf(" ");
sb.insert(lastWordBreak , "\n\t");
}
return sb.toString();
Output:
This one is a very long line that runs off the right side because it longer
than 78 characters
Also, your Stream does not do what you want it to. Yes, it removes duplicate words, but it removes every duplicate word, not just consecutive ones. So for the String:
This is a great sentence. It is a great example.
It would remove the duplicate is, great and a, and return
This is a great sentence. It example.
To only remove consecutive duplicate words you can look at the following solution:
Removing consecutive duplicates words out of text using Regex and displaying the new text
Alternatively, you could write your own by splitting the text into words and comparing each element to the one after it, removing only the consecutive duplicate words.
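A hand-written version of that idea might look roughly like the sketch below. The class and method names are hypothetical, and it assumes that words are separated by spaces and that two words count as duplicates only when they are exactly equal (so "serious" and "serious." are different).

```java
import java.util.ArrayList;
import java.util.List;

class ConsecutiveDuplicates {
    // Keeps a word only if it differs from the word kept just before it.
    static String removeConsecutiveDuplicates(String input) {
        List<String> kept = new ArrayList<>();
        for (String word : input.trim().split(" +")) {
            if (kept.isEmpty() || !kept.get(kept.size() - 1).equals(word)) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }
}
```

For "This is serious serious work." this returns "This is serious work.", while "This is a great sentence. It is a great example." comes back unchanged because none of its repeated words are adjacent.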
Instead of using
Collectors.joining(" ")
it is possible to write a custom collector that adds new lines and indentation at proper places.
Let's introduce a LineWrapper class, which contains indent and limit fields:
public class LineWrapper {
private final int limit;
private final String indent;
The default constructor sets the fields to reasonable default values.
Note how the indent starts with a new line character.
public LineWrapper() {
limit = 78;
indent = "\n ";
}
A custom constructor allows the client to specify limit and indent:
public LineWrapper(int limit, String indent) {
if (limit <= 0) {
throw new IllegalArgumentException("limit");
}
if (indent == null || !indent.matches("\\n *")) {
throw new IllegalArgumentException("indent");
}
this.limit = limit;
this.indent = indent;
}
Following is a regex used to split the input around one or more spaces. This makes sure that the split will not produce empty Strings:
private static final String SPACES = " +";
The apply method splits the input and collects the words into lines of the specified maximum length, indents the lines and removes duplicate consecutive words. Note how duplicates are not removed using the Stream.distinct method, since it also removes duplicates that are not consecutive.
public String apply(String input) {
return Arrays.stream(input.split(SPACES)).collect(toWrappedString());
}
The toWrappedString method returns a collector that accumulates the words in a new ArrayList, and uses the following methods:
addIfDistinct: to add the words to the ArrayList
combine: to merge two array lists
wrap: to split and indent the lines
Collector<String, ArrayList<String>, String> toWrappedString() {
return Collector.of(ArrayList::new,
this::addIfDistinct,
this::combine,
this::wrap);
}
The addIfDistinct adds the word to the accumulator ArrayList if it is different than the previous word.
void addIfDistinct(ArrayList<String> accumulator, String word) {
if (!accumulator.isEmpty()) {
String lastWord = accumulator.get(accumulator.size() - 1);
if (!lastWord.equals(word)) {
accumulator.add(word);
}
} else {
accumulator.add(word);
}
}
The combine method adds all words from the second ArrayList to the first one. It also makes sure that the first word of the second ArrayList does not duplicate the last word of the first ArrayList.
ArrayList<String> combine(ArrayList<String> words,
ArrayList<String> moreWords) {
List<String> other = moreWords;
if (!words.isEmpty() && !other.isEmpty()) {
String lastWord = words.get(words.size() - 1);
if (lastWord.equals(other.get(0))) {
other = other.subList(1, other.size());
}
}
words.addAll(other);
return words;
}
Finally, the wrap method appends all words to a StringBuilder, inserting the indent when the line length limit is reached:
String wrap(ArrayList<String> words) {
StringBuilder result = new StringBuilder();
if (!words.isEmpty()) {
String firstWord = words.get(0);
result.append(firstWord);
int lineLength = firstWord.length();
for (String word : words.subList(1, words.size())) {
//add 1 to the word length,
//to account for the space character
int len = word.length() + 1;
if (lineLength + len <= limit) {
result.append(' ');
result.append(word);
lineLength += len;
} else {
result.append(indent);
result.append(word);
//subtract 1 from the indent length,
//because the new line does not count
lineLength = indent.length() - 1 + word.length();
}
}
}
return result.toString();
}
I am working on a project where I have to parse a text file and divide the strings into substrings of a length that the user specifies. Then I need to detect the duplicates in the results.
So the original file would look like this:
ORIGIN
1 gatccaccca tctcggtctc ccaaagtgct aggattgcag gcctgagcca ccgcgcccag
61 ctgccttgtg cttttaatcc cagcactttc agaggccaag gcaggcgatc agctgaggtc
121 aggagttcaa gaccagcctg gccaacatgg tgaaacccca tctctaatac aaatacaaaa
181 aaaaaacaaa aaacgttagc caggaatgag gcccggtgct tgtaatccta aggaaggaga
241 ccaccactcc tcctgctgcc cttcccttcc ccacaccgct tccttagttt ataaaacagg
301 gaaaaaggga gaaagcaaaa agcttaaaaa aaaaaaaaaa cagaagtaag ataaatagct
I loop over the file and generate a line of the strings then use line.toCharArray() to slide over the resulting line and divide according to the user specification. So if the substrings are of length 4 the result would look like this:
GATC
ATCC
TCCA
CCAC
CACC
ACCC
CCCA
CCAT
CATC
ATCT
TCTC
CTCG
TCGG
CGGT
GGTC
GTCT
TCTC
CTCC
TCCC
CCCA
CCAA
Here is my code for splitting:
try {
scanner = new Scanner(toSplit);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
char[] chars = line.toCharArray();
for (int i = 0; i < chars.length - (k - 1); i++) {
String s = "";
for(int j = i; j < i + k; j++) {
s += chars[j];
}
if (!s.contains("N")) {
System.out.println(s);
}
}
}
}
My question is: given that the input file can be huge, how can I detect duplicates in the results?
If you want to check for duplicates, a Set would be a good choice to hold and test the data. Please tell us in which context you want to detect the duplicates: words, lines, or "output chars".
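Assuming the duplicates of interest are the length-k substrings themselves, a Set-based check could be sketched as follows. The class and method names are made up, the sequence is assumed to be already joined into one uppercase String, and the sketch keeps every distinct substring in memory, which may or may not be acceptable for a huge file.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class KmerDuplicates {
    // Slides a window of length k over the sequence and reports every
    // substring that has been seen before, in order of first repetition.
    static Set<String> duplicates(String sequence, int k) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (int i = 0; i + k <= sequence.length(); i++) {
            String kmer = sequence.substring(i, i + k);
            if (kmer.contains("N")) {
                continue; // skipped, as in the question's code
            }
            if (!seen.add(kmer)) {
                // add() returns false when the substring was already present
                duplicates.add(kmer);
            }
        }
        return duplicates;
    }
}
```

For the first 24 bases of the sample, duplicates("GATCCACCCATCTCGGTCTCCCAA", 4) yields TCTC and CCCA, which are the two entries that appear twice in the list shown in the question.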
You can use a bloom filter or a table of hashes to detect possible duplicates and then make a second pass over the file to check if those "duplicate candidates" are true duplicates or not.
Example with hash tables:
// First we build a table of candidates by counting how many times each hash bucket is hit
int hashSpace = 65536;
int[] substringHashes = new int[hashSpace];
for (String s : tokens) {
    // hashCode() can be negative, so floorMod is used to get a valid array index
    substringHashes[Math.floorMod(s.hashCode(), hashSpace)]++;
}
// Then we look at the strings whose bucket was hit more than once and check whether they are really repeated. We use a set, but only for the candidates, so we save a lot of memory
Set<String> set = new HashSet<String>();
for (String s : tokens) {
    if (substringHashes[Math.floorMod(s.hashCode(), hashSpace)] > 1) {
        boolean repeated = !set.add(s);
        if (repeated) {
            // TODO whatever
        }
    }
}
You could do something like this:
Map<String, Integer> substringMap = new HashMap<>();
int index = 0;
Set<String> duplicates = new HashSet<>();
For each substring you pull out of the file, add it to substringMap only if it's not a duplicate (or if it is a duplicate, add it to duplicates):
if (substringMap.putIfAbsent(substring, index) == null) {
++index;
} else {
duplicates.add(substring);
}
You can then pull out all the substrings with ease:
String[] substringArray = new String[substringMap.size()];
for (Map.Entry<String, Integer> substringEntry : substringMap.entrySet()) {
substringArray[substringEntry.getValue()] = substringEntry.getKey();
}
And voila! An array of output in the original order with no duplicates, plus a set of all the substrings that were duplicates, with very nice performance.
for(int j = 1;j<fileArray.size();j++) {
if(str.contains(fileArray.get(end+j))) {
}
}
(assume end is some number such as 30).
The goal of this part is: given a window length of 30 and a fileArray size > 30, check whether there's anything after index 30 that matches whatever is inside the window.
ex: "i like to eat piesss aaaabbbbpiesssbbbb"
Starting from the beginning of the string, add the first 17 characters to an ArrayList called window. Then I check the rest of the string, starting right after the window, to see if there's anything that matches. The space doesn't match, so you add it to the output. Keep checking, and then you see that "piesss" matches. Then I replace the second "piesss" with wherever the first "piesss" occurs.
So right now I'm using fileArray.get(end+j) to check if there's anything that matches within my string (str), except this doesn't really work. Is there a way I could fix this code segment?
The replacement part of your question is still unclear, as is the reason for using an ArrayList. I've written some code that does a 5-character window search for a match after splitting the string you provided. Note that with the 30 and 17 values you gave, nothing is ever matched (see the commented-out code). However, with tweaked values some matches can be found.
public static void main(String[] args) {
//          1         2         3
//012345678901234567890123456789012345678 <- shows the index
String test = "i like to eat piesss aaaabbbbpiesssbbbb";
// int first = 17;
// int end = 30;
int first = 20;
int end = 37;
String firstHalf = test.substring(0, first);
String secondHalf = test.substring(first, end);
int matchSize = 5;
for (int i = 0; i + matchSize <= secondHalf.length(); i++) // <= so the last full window is checked too
{
String window = secondHalf.substring(i, i + matchSize);
if ( firstHalf.contains(window) )
{
System.out.println(window);
}
}
System.out.println("Done searching.");
}
Displays:
piess
iesss
Done searching.
If this isn't what you meant PLEASE edit your question to make your needs clear.
I'm having a little difficulty figuring out the best way to go about generating combinations for specific letters recursively.
Presently, I have a method which takes a String and alters certain characters to create a single substitution of a word.
However, this doesn't cover all the different combinations for a word. For example, take the word kjng and commonly mistaken printer characters such as:
[j=>i, i=>j, v=>u, u=>v, s=>f, f=>s, uu=>w, vv=>w] (map lookup, "=>" this is symbolic for key, value representation to make it extra clear)
Based on this method, the word would then become king. That's fine for a word with only one possibility. However along comes murdir which should generate the following:
murdir
mvrdjr
mvrdir
murdjr
A little advice on this would be great; presently I'm unsure how best to manage this scenario. For instance, how do I keep track of the changes, and should I do it in chunks of characters (1, then 2, then 3, etc.)?
One changes a word at some position with some rule, and then recurses further. If the new word was already found, stop for that case.
So basically you iterate over wordIndex and ruleIndex. A recursive formulation is easiest, and can later be changed to an iterative one. You could make two levels of recursion: walk the rules, walk inside the word.
Okay, in Java:
import java.util.Set;
import java.util.TreeSet;

public class Solver {
public static void main(String[] args) {
System.out.println("Solver");
Solver solver = new Solver("j=>i", "i=>j", "v=>u", "u=>v", "s=>f",
"f=>s", "uu=>w", "vv=>w");
//Set<String> words = solver.determineAllWords("murdir");
Set<String> words = solver.determineAllWords("gigi");
words.forEach(System.out::println);
System.out.println("Done");
}
static class Rule {
String from;
String to;
public Rule(String from, String to) {
this.from = from;
this.to = to;
}
}
private final Rule[] rules;
public Solver(String... tofroms) {
this.rules = new Rule[tofroms.length];
for (int i = 0; i < rules.length; ++i) {
String[] tofrom = tofroms[i].split("=>", 2);
rules[i] = new Rule(tofrom[0], tofrom[1]);
}
}
public Set<String> determineAllWords(String word) {
Set<String> solutionWords = new TreeSet<String>(); // Could be a field too.
solutionWords.add(word);
int ruleIndex = 0;
int wordIndex = 0;
solveTryingRules(solutionWords, word, wordIndex, ruleIndex);
return solutionWords;
}
private void solveTryingRules(Set<String> solutionWords,
String word, int wordIndex, int ruleIndex) {
if (ruleIndex >= rules.length) {
return;
}
Rule rule = rules[ruleIndex];
int wordIndexFound = word.indexOf(rule.from, wordIndex);
if (wordIndexFound == -1) {
// Next rule:
solveTryingRules(solutionWords, word, 0, ruleIndex + 1);
} else {
// Keep at same rule,
// Not applying rule to found word position:
solveTryingRules(solutionWords, word, wordIndexFound + 1, ruleIndex);
// Applying rule to found word position:
String nextWord = word.substring(0, wordIndexFound)
+ rule.to
+ word.substring(wordIndexFound + rule.from.length());
boolean added = solutionWords.add(nextWord);
if (added) {
solveTryingRules(solutionWords, nextWord, 0, 0);
}
}
}
}
If you need to do it recursively, how about this? This is Python-like pseudocode.
#
# make a list of locations of all possible typos
#
s = [] # list
for i in range(0, len(source)):
    if source[i] might be typo:
        s.append(i)
#
# and do the recursion to find all combinations
#
print(do_recurse(source, s))
#
# method that returns the correct char corresponding to the typo
#
def correction(char):
    # you should implement
#
# the actual recursion method
#
def do_recurse(chars, locations):
    '''
    return the list of all combinations
    '''
    if len(locations) <= 0:
        # no typo locations left: the current string is one combination
        return [''.join(chars)]
    loc, rest = locations[0], locations[1:]
    # do the recursion with the character at loc left unchanged
    ret = do_recurse(chars, rest)
    # do the recursion with the character at loc corrected
    # (work on a copy, because a string cannot be modified in place)
    modified = list(chars)
    modified[loc] = correction(chars[loc])
    ret.extend(do_recurse(modified, rest))
    return ret
If your problem is just tracking the changes, and you want to be sure you generate every combination but none of them twice, all you need is an ordering of the set of all possible combinations.
I'd map your list of possible replacements to a sequence of bits, like this:
j=>i ~ bit 0
i=>j ~ bit 1
v=>u ~ bit 2
u=>v ~ bit 3
s=>f ~ bit 4
f=>s ~ bit 5
uu=>w ~ bit 6
vv=>w ~ bit 7
76543210
00101011 means you replace j=>i, i=>j, u=>v, f=>s
11000001 means you replace j=>i, uu=>w, vv=>w
Then implement some kind of binary counting (0, 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011, 1100, ...) and generate the combinations based on the number.
I do not mean they have to be literally bits of a single long/integer variable, but this is the idea of the ordering. Of course, if you do not have more than 64 replacements, then one single long variable is good:
String input = ...
List<...> replacements = ...
List<String> combinations = new ArrayList<>();
for (long count = 0; count < ...; count++) {
String output = input;
for (int bit = 0; bit < 64; bit++) {
if ((count & (1L << bit)) != 0) { // the bit is set
// replace the characters based on replacements.get(bit)
// output = ...
}
}
combinations.add(output);
}
If you want it to work for an unlimited number of replacements (> 64), you can use the same idea of ordering the set of all combinations and implement it based on the ideas from Variable Number of Nested For Loops.
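To make the counting idea concrete, here is a small sketch of one possible variant. Note the assumptions: it assigns one bit per replaceable position in the word rather than one bit per rule, it only handles the single-character rules (uu=>w and vv=>w are left out), the names BitmaskCombinations, SWAPS and combinations are hypothetical, and Map.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BitmaskCombinations {
    // Assumed single-character substitution table taken from the question's rules.
    static final Map<Character, Character> SWAPS =
            Map.of('j', 'i', 'i', 'j', 'v', 'u', 'u', 'v', 's', 'f', 'f', 's');

    static List<String> combinations(String input) {
        // Collect the positions whose character has a substitution.
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            if (SWAPS.containsKey(input.charAt(i))) {
                positions.add(i);
            }
        }
        if (positions.size() > 62) {
            throw new IllegalArgumentException("too many positions for a long counter");
        }
        List<String> result = new ArrayList<>();
        long total = 1L << positions.size();
        // Every value of count is one distinct choice of positions to swap.
        for (long count = 0; count < total; count++) {
            char[] chars = input.toCharArray();
            for (int bit = 0; bit < positions.size(); bit++) {
                if ((count & (1L << bit)) != 0) { // the bit is set: swap this position
                    int pos = positions.get(bit);
                    chars[pos] = SWAPS.get(chars[pos]);
                }
            }
            result.add(new String(chars));
        }
        return result;
    }
}
```

Calling combinations("murdir") produces murdir, mvrdir, murdjr, mvrdjr, in that order. Because each counter value maps to exactly one set of swapped positions, no combination is generated twice.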