I am using Java 1.8. I have a large amount of text in a buffer. The text has some occurrences like the following:
"... {NAME} is going to {PLACE}...", blah blah blah.
Then I have two arrays: "{NAME};{PLACE}" and "Mick Jagger;A Gogo", etc. (These are just examples).
I build a Map named replacements from these, such as:
{NAME};Mick Jagger
{PLACE};A Gogo
So I want to do all the replacements. In this case there are only 2, so it is not so cumbersome. Say my original text is in txt:
for (Map.Entry<String, String> entry : replacements.entrySet()) {
txt = txt.replace(entry.getKey(), entry.getValue());
}
You can imagine that if there are a lot of replacements, this could take a long time.
Is there some better way to make all the replacements, or is this basically what you would do?
To avoid calling String.replace many times, you can use a regex which matches every replacement-key. You can then iteratively scan for the next replacement-key in the input string using a loop, and find its substitute using Map.get:
import java.util.Collection;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
public class ReplaceMap {
private final Pattern p;
public ReplaceMap(Collection<String> keys) {
this.p = Pattern.compile(keys.stream()
.map(ReplaceMap::escapeRegex)
.collect(Collectors.joining("|")));
}
public String replace(String input, Map<String,String> subs) {
Matcher m = p.matcher(input);
StringBuilder out = new StringBuilder();
int i = 0;
while(m.find()) {
out.append(input.substring(i, m.start()));
String key = m.group(0);
out.append(subs.get(key));
i = m.end();
}
out.append(input.substring(i));
return out.toString();
}
// from https://stackoverflow.com/a/25853507/12299000
private static final Pattern SPECIAL_REGEX_CHARS = Pattern.compile("[{}()\\[\\].+*?^$\\\\|]");
private static String escapeRegex(String s) {
return SPECIAL_REGEX_CHARS.matcher(s).replaceAll("\\\\$0");
}
}
Usage:
> Map<String,String> subs = new HashMap<>();
> subs.put("{NAME}", "Alice");
> subs.put("{PLACE}", "Wonderland");
> ReplaceMap r = new ReplaceMap(subs.keySet());
> r.replace("Hello, {NAME} in {PLACE}.", subs)
"Hello, Alice in Wonderland." (String)
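This solution scans the input only once and builds the result in a single StringBuilder, so it should scale much better than one String.replace call per key/value pair (each of which copies the whole text). It is not entirely independent of the size of the subs map, though: Java's regex engine tries the alternatives in turn at each position, so the matching cost still grows with the number of keys.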
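If you prefer to let the Matcher do the copying, Java 8's Matcher.appendReplacement/appendTail (which take a StringBuffer before Java 9) give the same single pass. The sketch below is a variation, not the answer's code: the class name AppendReplaceMap is hypothetical, it uses Pattern.quote and Matcher.quoteReplacement instead of a hand-written escaper, and it sorts the keys longest-first so that if one key is a prefix of another (say AB and ABC) the longer one wins. It assumes at least one non-empty key and that every key has a non-null value in subs.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class AppendReplaceMap {
    private final Pattern pattern;

    AppendReplaceMap(Collection<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        // Longest keys first, so the alternation prefers "ABC" over its prefix "AB".
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
        this.pattern = Pattern.compile(sorted.stream()
                .map(Pattern::quote)               // match each key literally
                .collect(Collectors.joining("|")));
    }

    String replace(String input, Map<String, String> subs) {
        Matcher m = pattern.matcher(input);
        StringBuffer out = new StringBuffer();     // Java 8 appendReplacement requires StringBuffer
        while (m.find()) {
            // quoteReplacement stops '$' and '\' in values being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(subs.get(m.group())));
        }
        m.appendTail(out);                         // copy the text after the last match
        return out.toString();
    }
}
```

Usage is the same as for ReplaceMap: new AppendReplaceMap(subs.keySet()).replace(text, subs).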
I'd suggest reading the file line by line (using NIO) and, for each line, iterating over your map and applying any replacements you have. That way you only need to go over your big data once.
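A minimal sketch of that suggestion, assuming the placeholders never span a line break and the file is UTF-8; the class and method names are hypothetical. Files.newBufferedReader and Files.newBufferedWriter come from java.nio.file. Each line is still scanned once per map entry; what you save is keeping the whole text in memory and building a full-size String for every replacement.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

final class LineReplacer {
    // Hypothetical helper: copies 'in' to 'out', applying every replacement line by line.
    static void replaceFile(Path in, Path out, Map<String, String> subs) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (Map.Entry<String, String> e : subs.entrySet()) {
                    line = line.replace(e.getKey(), e.getValue());
                }
                writer.write(line);
                writer.newLine(); // platform line separator
            }
        }
    }
}
```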
replace or remove special char from List java
List<String> somestring = ['%french',
'#spanish',
'!latin'];
How can I remove the special characters and replace them with a space?
somestring.replaceall('%','');
How can I get this as the result?
List<String> somestring = ['french',
'spanish',
'latin'];
First, never use a raw List; you have a List<String>. Second, a String literal in Java is surrounded by double quotes ("), not single quotes. Third, you can stream your List<String>, map each element with a regular expression, and collect the results into a new List<String>. For example:
List<String> somestring = Arrays.asList("%french", "#spanish", "!latin");
somestring = somestring.stream().map(s -> s.replaceAll("\\W", ""))
.collect(Collectors.toList());
System.out.println(somestring);
Outputs (as requested)
[french, spanish, latin]
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
private static final String REGEX = "[!%#]"; // character class of the special characters to remove
private static final String INPUT = "The %dog% says !meow. " + "!All #dogs #say meow.";
private static final String REPLACE = ""; // replacement string
public static void main(String[] args) {
Pattern p = Pattern.compile(REGEX);
// get a matcher object for the input
Matcher m = p.matcher(INPUT);
String output = m.replaceAll(REPLACE);
System.out.println(output); // The dog says meow. All dogs say meow.
}
}
This snippet shows the idea on a single String; to handle your list, apply the same replaceAll to each element of the collection.
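One way to extend this to a collection in Java 8 is List.replaceAll(UnaryOperator), which rewrites each element in place. This sketch (class and method names hypothetical) matches the same three characters as REGEX, written as a character class, where they need no escaping. The list returned by Arrays.asList supports this because it allows set, even though it cannot grow or shrink.

```java
import java.util.List;
import java.util.regex.Pattern;

final class StripSpecial {
    // The same characters as REGEX, as a character class; compiled once and reused.
    private static final Pattern SPECIAL = Pattern.compile("[!%#]");

    // Removes the special characters from every element, modifying the list in place.
    static void stripAll(List<String> items) {
        items.replaceAll(s -> SPECIAL.matcher(s).replaceAll(""));
    }
}
```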
I need to perform multiple replaceAll commands on a string, and I wonder if there is a clean way to do it. This is how it is currently:
newString = oldString.replaceAll("α","a").replaceAll("β","b").replace("c","σ") /* This goes on for over 60 replacements*/;
I have implemented a specialized solution if you only want to replace a single Character with a single Character or another String:
private static Map<Character, Character> REPLACEMENTS = new HashMap<>();
static {
REPLACEMENTS.put('α','a');
REPLACEMENTS.put('β','b');
}
public static String replaceChars(String input) {
StringBuilder sb = new StringBuilder(input.length());
for(int i = 0;i<input.length();++i) {
char currentChar = input.charAt(i);
sb.append(REPLACEMENTS.getOrDefault(currentChar, currentChar));
}
return sb.toString();
}
This implementation avoids excessive string copies and complex regexes, so it should perform well compared to an implementation that uses either replace or replaceAll. You can also change the replacement value to a String, but replacing whole Strings instead of single Characters is more complicated; for that I would prefer a regex.
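For the Character-to-String case, a minimal sketch is the same loop with a Map<Character, String>; characters without an entry are copied unchanged. The class name and the map contents (U+00DF to "ss", U+00E6 to "ae") are only examples, written as Unicode escapes to keep the source ASCII.

```java
import java.util.HashMap;
import java.util.Map;

final class CharToString {
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u00DF', "ss"); // example entry: sharp s
        REPLACEMENTS.put('\u00E6', "ae"); // example entry: ae ligature
    }

    static String replaceChars(String input) {
        // Replacements may be longer than one char, so the capacity is only a starting guess.
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ++i) {
            char c = input.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```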
EDIT:
Here is a solution for whole Strings in the above style, but I would recommend looking into other solutions such as a regex: its performance is not as good as the Character version above, and it is more complex and error-prone (although a simple test showed it working correctly). It still avoids the string copies, so it may be preferable in performance-sensitive scenarios. Note that if one key is a prefix of another (for example "a" and "aa"), which one wins depends on the HashMap's iteration order.
private static Map<String, String> REPLACEMENTS = new HashMap<>();
static {
REPLACEMENTS.put("aa","AA");
REPLACEMENTS.put("bb","BB");
}
public static String replace(String input) {
StringBuilder sb = new StringBuilder(input.length());
for (int i = 0; i < input.length(); ++i) {
i += replaceFrom(input, i, sb);
}
return sb.toString();
}
private static int replaceFrom(String input, int startIndex, StringBuilder sb) {
for (Map.Entry<String, String> replacement : REPLACEMENTS.entrySet()) {
String toMatch = replacement.getKey();
if (input.startsWith(toMatch, startIndex)) {
sb.append(replacement.getValue());
//we matched the whole word skip all matched characters
//not just the first
return toMatch.length() - 1;
}
}
sb.append(input.charAt(startIndex));
return 0;
}
You can do something like this. The Map contains the mappings, and all you have to do is loop through them and call replace.
public static void main(String[] args) {
// your input
String old = "something";
// the mappings
Map<Character, Character> mappings = new HashMap<>();
mappings.put('α','a');
// loop through the mappings and perform the action
for (Map.Entry<Character, Character> entry : mappings.entrySet()) {
old = old.replace(entry.getKey(), entry.getValue());
}
}
I am a beginner in Java. Basically, I have loaded each text document and stored each individual word of the document in a hashmap. After that, I stored all the hashmaps in an ArrayList. Now I am stuck on how to retrieve all the words in the hashmaps that are in the ArrayList!
private static long numOfWords = 0;
private String userInputString;
private static long wordCount(String data) {
long words = 0;
int index = 0;
boolean prevWhiteSpace = true;
while (index < data.length()) {
//Intialise character variable that will be checked.
char c = data.charAt(index++);
//Determine whether it is a space.
boolean currWhiteSpace = Character.isWhitespace(c);
//If previous is a space and character checked is not a space,
if (prevWhiteSpace && !currWhiteSpace) {
words++;
}
//Assign current character's determination of whether it is a spacing as previous.
prevWhiteSpace = currWhiteSpace;
}
return words;
} //
public static ArrayList StoreLoadedFiles()throws Exception{
final File f1 = new File ("C:/Users/Admin/Desktop/dataFiles/"); //specify the directory to load files
String data=""; //reset the words stored
ArrayList<HashMap> hmArr = new ArrayList<HashMap>(); //array of hashmap
for (final File fileEntry : f1.listFiles()) {
Scanner input = new Scanner(fileEntry); //load files
while (input.hasNext()) { //while there are still words in the document, continue to load all the words in a file
data += input.next();
input.useDelimiter("\t"); //similar to split function
} //while loop
String textWords = data.replaceAll("\\s+", " "); //remove all found whitespaces
HashMap<String, Integer> hm = new HashMap<String, Integer>(); //Creates a Hashmap that would be renewed when next document is loaded.
String[] words = textWords.split(" "); //store individual words into a String array
for (int j = 0; j < numOfWords; j++) {
int wordAppearCount = 0;
if (hm.containsKey(words[j].toLowerCase().replaceAll("\\W", ""))) { //replace non-word characters
wordAppearCount = hm.get(words[j].toLowerCase().replaceAll("\\W", "")); //remove non-word character and retrieve the index of the word
}
if (!words[j].toLowerCase().replaceAll("\\W", "").equals("")) {
//Words stored in hashmap are in lower case and have special characters removed.
hm.put(words[j].toLowerCase().replaceAll("\\W", ""), ++wordAppearCount);//index of word and string word stored in hashmap
}
}
hmArr.add(hm);//stores every single hashmap inside an ArrayList of hashmap
} //end of for loop
return hmArr; //return hashmap ArrayList
}
public static void LoadAllHashmapWords(ArrayList m){
for(int i=0;i<m.size();i++){
m.get(i); //stuck here!
}
Firstly, your logic won't work correctly. In the StoreLoadedFiles() method you iterate through the words with for (int j = 0; j < numOfWords; j++) {. The numOfWords field is initialized to zero, so this loop won't execute at all. You should use the length of the words array instead.
Having said that, to retrieve the values from a list of hashmaps, first iterate through the list, and for each hashmap take its entry set. A Map.Entry is basically a key/value pair stored in the hashmap, so map.entrySet() returns a java.util.Set<Map.Entry<Key, Value>>. A Set is returned because the keys are unique.
So a complete program could look like this:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map.Entry;
import java.util.Scanner;
public class FileWordCounter {
public static List<HashMap<String, Integer>> storeLoadedFiles() {
final File directory = new File("C:/Users/Admin/Desktop/dataFiles/");
List<HashMap<String, Integer>> listOfWordCountMap = new ArrayList<HashMap<String, Integer>>();
Scanner input = null;
StringBuilder data;
try {
for (final File fileEntry : directory.listFiles()) {
input = new Scanner(fileEntry);
input.useDelimiter("\t");
data = new StringBuilder();
while (input.hasNext()) {
data.append(input.next()).append(' '); // keep a separator so words on either side of a tab don't merge
}
input.close();
String wordsInFile = data.toString().replaceAll("\\s+", " ");
HashMap<String, Integer> wordCountMap = new HashMap<String, Integer>();
for(String word : wordsInFile.split(" ")){
String strippedWord = word.toLowerCase().replaceAll("\\W", "");
int wordAppearCount = 0;
if(strippedWord.length() > 0){
if(wordCountMap.containsKey(strippedWord)){
wordAppearCount = wordCountMap.get(strippedWord);
}
wordCountMap.put(strippedWord, ++wordAppearCount);
}
}
listOfWordCountMap.add(wordCountMap);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} finally {
if(input != null) {
input.close();
}
}
return listOfWordCountMap;
}
public static void loadAllHashmapWords(List<HashMap<String, Integer>> listOfWordCountMap) {
for(HashMap<String, Integer> wordCountMap : listOfWordCountMap){
for(Entry<String, Integer> wordCountEntry : wordCountMap.entrySet()){
System.out.println(wordCountEntry.getKey() + " - " + wordCountEntry.getValue());
}
}
}
public static void main(String[] args) {
List<HashMap<String, Integer>> listOfWordCountMap = storeLoadedFiles();
loadAllHashmapWords(listOfWordCountMap);
}
}
Since you are a beginner in Java programming, I would like to point out a few best practices that you could start using from the beginning.
Closing resources: In your while loop that reads from the files you open a Scanner with Scanner input = new Scanner(fileEntry);, but you never close it. This leaks resources such as open file handles. You should always close resources, for example in the finally block of a try-catch-finally (or, since Java 7, with try-with-resources).
Avoid unnecessary redundant calls: If an operation is the same on every iteration of a loop, try moving it outside the loop. In your case, for example, setting the scanner delimiter with input.useDelimiter("\t"); is a one-time operation after the scanner is created, so you can move it outside the while loop.
Use StringBuilder instead of String: Repeated string manipulation, such as concatenation in a loop, should be done with a StringBuilder (or StringBuffer when you need synchronization) instead of += or +. String is immutable, meaning its value cannot be changed, so each concatenation creates a new String object, which leaves a lot of unused instances in memory. StringBuilder, by contrast, is mutable, so its contents can be changed in place.
Naming convention: The usual naming convention for Java methods is camelCase: start with a lower-case letter and capitalize the first letter of each following word. So it's standard practice to name a method storeLoadedFiles rather than StoreLoadedFiles. (This could be opinion based ;))
Give descriptive names: It's good practice to give descriptive names, which helps with later maintenance. For example, wordCountMap is a better name than hm; anyone reading your code in the future will understand it faster. Again, opinion based.
Use generics as much as possible: This avoids explicit casts and lets the compiler catch type errors.
Avoid repetition: Similar to point 2, if an operation produces the same result and is needed several times, store the result in a variable and reuse it. In your case you called words[j].toLowerCase().replaceAll("\\W", "") multiple times; the result is always the same, but each call creates unnecessary instances. So you could store it in a String and use that String everywhere.
Use the for-each loop wherever possible: This relieves you from keeping track of indexes.
These are just suggestions. I tried to include most of them in my code, but I won't say it's perfect. Since you are a beginner, if you adopt these best practices now, they will become ingrained. Happy coding.. :)
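Two of these points have a shorter form in Java 7 and 8, sketched here as a variation of the counting loop above rather than a drop-in replacement: try-with-resources closes the Scanner automatically (instead of a finally block), and Map.merge replaces the containsKey/get/put sequence. The class and method names are hypothetical; the method takes any Readable so it can be tried on a String, and for a file you could pass new Scanner(file) logic the same way.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

final class WordCounts {
    // Counts lower-cased words with non-word characters stripped, as in the answer above.
    static Map<String, Integer> countWords(Readable source) {
        Map<String, Integer> counts = new HashMap<>();
        // try-with-resources closes the Scanner even if an exception is thrown
        try (Scanner in = new Scanner(source)) {
            while (in.hasNext()) {                      // default delimiter: any whitespace
                String word = in.next().toLowerCase().replaceAll("\\W", "");
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
                }
            }
        }
        return counts;
    }
}
```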
for (HashMap<String, Integer> map : m) {
for(Entry<String,Integer> e:map.entrySet()){
//your code here
}
}
Or, if you are using Java 8, you can use lambdas:
m.stream().forEach((map) -> {
map.entrySet().stream().forEach((e) -> {
//your code here
});
});
But first you have to change the method signature to public static void LoadAllHashmapWords(List<HashMap<String,Integer>> m); otherwise you would have to use a cast.
P.S. Are you sure your extraction method works? I tested it a bit and got a list of empty hashmaps every time.
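With that signature, Java 8 also offers Iterable.forEach and Map.forEach(BiConsumer), which hand you the key and value directly without going through entrySet(). A minimal sketch; the class name is hypothetical and printing is just a placeholder for your own code.

```java
import java.util.HashMap;
import java.util.List;

final class PrintAll {
    static void loadAllHashmapWords(List<HashMap<String, Integer>> m) {
        // Outer forEach visits each map; Map.forEach passes each key and value
        m.forEach(map -> map.forEach((word, count) ->
                System.out.println(word + " - " + count)));
    }
}
```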
I am trying to create a program that detects if multiple words are in a string as fast as possible, and if so, executes a behavior. Preferably, I would like it to detect the order of these words too but only if this can be done fast. So far, this is what I have done:
if (input.contains("adsf") && input.contains("qwer")) {
execute();
}
As you can see, doing this for multiple words would become tiresome. Is this the only way or is there a better way of detecting multiple substrings? And is there any way of detecting order?
I'd create a regular expression from the words:
Pattern pattern = Pattern.compile("(?=.*adsf)(?=.*qwer)");
if (pattern.matcher(input).find()) {
execute();
}
For more details, see this answer: https://stackoverflow.com/a/470602/660143
Editor's note: Despite being heavily upvoted and accepted, this does not function the same as the code in the question. execute is called on the first match, like a logical OR.
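A sketch of building that lookahead pattern from an arbitrary list of words, so the condition does not have to be written out by hand. Two assumptions go beyond the answer: each word is passed through Pattern.quote so characters such as . or * are taken literally, and Pattern.DOTALL is set so that .* can also run across line breaks (by default . does not match line terminators). The class and method names are hypothetical; the test below checks that, in this sketch, find() succeeds only when every word is present.

```java
import java.util.List;
import java.util.regex.Pattern;

final class AllWords {
    // Builds one lookahead per required word, e.g. (?=.*\Qadsf\E)(?=.*\Qqwer\E)
    static Pattern allOf(List<String> words) {
        StringBuilder regex = new StringBuilder();
        for (String w : words) {
            regex.append("(?=.*").append(Pattern.quote(w)).append(')');
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }
}
```

Usage: if (AllWords.allOf(Arrays.asList("adsf", "qwer")).matcher(input).find()) execute();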
You could use an array:
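String[] matches = new String[] {"adsf", "qwer"};
boolean allFound = true;
for (String s : matches)
{
if (!input.contains(s))
{
allFound = false; // one missing word is enough, like the && chain in the question
break;
}
}
if (allFound)
{
execute();
}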
This is as efficient as the code you posted, but more maintainable. Looking for a more efficient solution sounds like a micro-optimization that should be ignored until it is proven to be a bottleneck in your code; in any case, with a huge set of strings, the solution could be a trie.
In Java 8 you could do
public static boolean containsWords(String input, String[] words) {
return Arrays.stream(words).allMatch(input::contains);
}
Sample usage:
String input = "hello, world!";
String[] words = {"hello", "world"};
if (containsWords(input, words)) System.out.println("Match");
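The question also asks about order. One way to extend this, sketched under the assumption that "in order" means each word starts after the end of the previous one, is to search with indexOf(str, fromIndex) from the end of the previous match. The class and method names are hypothetical.

```java
final class OrderedWords {
    // True if every word occurs in input, each one after the end of the previous match.
    static boolean containsWordsInOrder(String input, String[] words) {
        int from = 0;
        for (String w : words) {
            int idx = input.indexOf(w, from);
            if (idx < 0) {
                return false;        // missing, or only present before the previous word
            }
            from = idx + w.length(); // the next word must start after this one ends
        }
        return true;
    }
}
```

Taking the earliest occurrence of each word is safe here: it leaves the most room for the words that follow.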
This is a classical interview and CS problem.
The Rabin-Karp algorithm is usually what people first talk about in interviews. The basic idea is that as you go through the string, you update a rolling hash of the current window: add the incoming character and remove the one that drops out. If the hash matches the hash of one of your match strings, you might have a match, which you then confirm with a direct comparison. This avoids having to scan back and forth in your match strings.
https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm
Other typical topics for that interview question are using a trie to speed up the lookup. With a large set of match strings, checking each of them at every position of the input is expensive; a trie lets you check all of them at once while walking forward from a position.
https://en.wikipedia.org/wiki/Trie
Additional algorithms are:
- Aho–Corasick https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm
- Commentz-Walter https://en.wikipedia.org/wiki/Commentz-Walter_algorithm
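A compact sketch of the Rabin-Karp idea for the "does any of these words occur?" question. The rolling hash only works for a fixed window length, so this version groups the words by length and makes one pass per distinct length; every hash hit is confirmed with a real comparison because different strings can share a hash. The class name, base and modulus are arbitrary choices, not from any library.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RabinKarp {
    private static final long BASE = 256;           // arbitrary choice
    private static final long MOD = 1_000_000_007L; // arbitrary large prime

    // Polynomial hash of s[0, len) modulo MOD.
    private static long hash(String s, int len) {
        long h = 0;
        for (int i = 0; i < len; i++) {
            h = (h * BASE + s.charAt(i)) % MOD;
        }
        return h;
    }

    // Returns true if any of the words occurs in text.
    static boolean containsAny(String text, List<String> words) {
        Map<Integer, Set<String>> byLength = new HashMap<>();
        for (String w : words) {
            byLength.computeIfAbsent(w.length(), k -> new HashSet<>()).add(w);
        }
        for (Map.Entry<Integer, Set<String>> group : byLength.entrySet()) {
            int len = group.getKey();
            if (len > text.length()) {
                continue;
            }
            Set<Long> targets = new HashSet<>();
            for (String w : group.getValue()) {
                targets.add(hash(w, len));
            }
            long pow = 1;                     // BASE^(len-1) mod MOD: weight of the outgoing char
            for (int k = 1; k < len; k++) {
                pow = pow * BASE % MOD;
            }
            long h = hash(text, len);         // hash of the first window
            for (int i = 0; ; i++) {
                // A hash hit is only a candidate; confirm it with a real comparison
                if (targets.contains(h) && group.getValue().contains(text.substring(i, i + len))) {
                    return true;
                }
                if (i + len >= text.length()) {
                    break;
                }
                // Roll the window: drop text[i], append text[i + len]
                h = (h - text.charAt(i) * pow % MOD + MOD) % MOD;
                h = (h * BASE + text.charAt(i + len)) % MOD;
            }
        }
        return false;
    }
}
```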
If you have a lot of substrings to look up, a regular expression probably isn't going to help much, so you're better off putting the substrings in a list, iterating over them, and calling input.indexOf(substring) on each one. This returns the int index at which the substring was first found. If you put each result (except -1, which means the substring wasn't found) into a TreeMap (with the index as the key and the substring as the value), you can retrieve them in order by iterating over the map's keySet().
Map<Integer, String> substringIndices = new TreeMap<Integer, String>();
List<String> substrings = new ArrayList<String>();
substrings.add("asdf");
// etc.
for (String substring : substrings) {
int index = input.indexOf(substring);
if (index != -1) {
substringIndices.put(index, substring);
}
}
for (Integer index : substringIndices.keySet()) {
System.out.println(substringIndices.get(index));
}
Use a tree structure to hold the substrings per codepoint. This eliminates the need to
Note that this is efficient only if the needle set is almost constant. Individual additions or removals of substrings are cheap, but rebuilding the tree from a large set of strings before every search would definitely slow it down.
StringSearcher:
import java.util.List;
class StringSearcher{
private NeedleTree needles = new NeedleTree(-1);
private boolean caseSensitive;
private int maxLength;
public StringSearcher(List<String> inputs, boolean caseSensitive){
this.caseSensitive = caseSensitive;
for(String input : inputs){
maxLength = Math.max(maxLength, input.length());
NeedleTree tree = needles;
for(int i = 0; i < input.length(); ){
int codePoint = input.codePointAt(i);
tree = tree.child(caseSensitive ? codePoint : Character.toLowerCase(codePoint));
i += Character.charCount(codePoint); // step over a surrogate pair as one code point
}
tree.markSelfSet();
}
}
public boolean matches(String haystack){
for(int i = 0; i < haystack.length(); i++){
// walk the tree directly over the haystack; no substring copy is needed
NeedleTree tree = needles;
int end = Math.min(haystack.length(), i + maxLength);
for(int j = i; j < end; ){
int raw = haystack.codePointAt(j);
// fold case per code point, exactly as the needles were folded
int codePoint = caseSensitive ? raw : Character.toLowerCase(raw);
tree = tree.childOrNull(codePoint);
if(tree == null){
break;
}
if(tree.isSelfSet()){
return true;
}
j += Character.charCount(raw);
}
}
return false;
}
}
NeedleTree.java:
import java.util.HashMap;
import java.util.Map;
class NeedleTree{
private int codePoint;
private boolean selfSet;
private Map<Integer, NeedleTree> children = new HashMap<>();
public NeedleTree(int codePoint){
this.codePoint = codePoint;
}
public NeedleTree childOrNull(int codePoint){
return children.get(codePoint);
}
public NeedleTree child(int codePoint){
NeedleTree child = children.get(codePoint);
if(child == null){
// Map.put returns the previous value (null here), so keep our own reference
child = new NeedleTree(codePoint);
children.put(codePoint, child);
}
return child;
}
public boolean isSelfSet(){
return selfSet;
}
public void markSelfSet(){
selfSet = true;
}
}
I think a simpler approach is something like this: keep the values in one string and use the indexOf function to check the returned index, which is -1 when the value is not present:
String s = "123";
System.out.println(s.indexOf("1")); // 0
System.out.println(s.indexOf("2")); // 1
System.out.println(s.indexOf("5")); // -1
I have this input:
5
it
your
reality
real
our
The first line is the number of strings that follow, and I should store them this way (pseudocode):
associative_array = [ 2 => ['it'], 3 => ['our'], 4 => ['real', 'your'], 7 => ['reality']]
As you can see the keys of associative array are the length of strings stored in inner array.
So how can I do this in Java? I come from the PHP world, so a comparison with PHP would be very helpful.
MultiMap<Integer, String> m = new MultiHashMap<Integer, String>();
for(String item : originalCollection) {
m.put(item.length(), item);
}
djechlin already posted a better version, but here's a complete standalone example using just JDK classes:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
public class Main {
public static void main(String[] args) throws Exception{
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
String firstLine = reader.readLine();
int numOfRowsToFollow = Integer.parseInt(firstLine);
Map<Integer,Set<String>> stringsByLength = new HashMap<>(numOfRowsToFollow); //worst-case size
for (int i=0; i<numOfRowsToFollow; i++) {
String line = reader.readLine();
int length = line.length();
Set<String> alreadyUnderThatLength = stringsByLength.get(length); //int boxed to Integer
if (alreadyUnderThatLength==null) {
alreadyUnderThatLength = new HashSet<>();
stringsByLength.put(length, alreadyUnderThatLength);
}
alreadyUnderThatLength.add(line);
}
System.out.println("results: "+stringsByLength);
}
}
Its output looks like this:
3
bob
bart
brett
results: {4=[bart], 5=[brett], 3=[bob]}
Java doesn't have associative arrays, but it does have HashMaps, which mostly accomplish the same goal. In your case, you can have multiple values for any given key, so what you could do is make each entry in the HashMap an array or a collection of some kind. ArrayList is a likely choice. That is:
HashMap<Integer,ArrayList<String>> words=new HashMap<Integer,ArrayList<String>>();
I'm not going to go through the code to read your list from a file or whatever, that's a different question. But just to give you the idea of how the structure would work, suppose we could hard-code the list. We could do it something like this:
ArrayList<String> set=new ArrayList<String>();
set.add("it");
words.put(Integer.valueOf(2), set);
set=new ArrayList<String>(); // a new list: set.clear() would also empty the list already stored under key 2
set.add("your");
set.add("real");
words.put(Integer.valueOf(4), set);
Etc.
In practice, you probably would regularly be adding words to an existing set. I often do that like this:
void addWord(String word)
{
Integer key=Integer.valueOf(word.length());
ArrayList<String> set=words.get(key);
if (set==null)
{
set=new ArrayList<String>();
words.put(key,set);
}
// either way we now have a set
set.add(word);
}
Side note: I often see programmers end a block like this by putting "set" back into the Hashmap, i.e. "words.put(key,set)" at the end. This is unnecessary: it's already there. When you get "set" from the Hashmap, you're getting a reference, not a copy, so any updates you make are just "there", you don't have to put it back.
Disclaimer: This code is off the top of my head. No warranties expressed or implied. I haven't written any Java in a while so I may have syntax errors or wrong function names. :-)
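On Java 8 and later, the get/null-check/put sequence in addWord can be collapsed with Map.computeIfAbsent, which creates and stores the list only when the key is missing and returns the list held in the map either way. A minimal sketch; the class name is hypothetical, and the map is declared with the interface types Map and List.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WordsByLength {
    private final Map<Integer, List<String>> words = new HashMap<>();

    void addWord(String word) {
        // Creates the list on first use of this length; either way returns the stored list
        words.computeIfAbsent(word.length(), k -> new ArrayList<>()).add(word);
    }

    Map<Integer, List<String>> words() {
        return words;
    }
}
```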
Since your key appears to be a small integer, you could use a list of lists. Otherwise, the simplest solution is a multimap, for example a Map of Sets:
Map<Integer, Set<String>> stringsByLength = new LinkedHashMap<>();
for(String s: strings) {
Integer len = s.length();
Set<String> set = stringsByLength.get(len);
if(set == null)
stringsByLength.put(len, set = new LinkedHashSet<>());
set.add(s);
}
private HashMap<Integer, List<String>> map = new HashMap<Integer, List<String>>();
void addStringToMap(String s) {
int length = s.length();
if (map.get(length) == null) {
map.put(length, new ArrayList<String>());
}
map.get(length).add(s);
}
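If all the strings are already in a collection, Java 8 streams can build the whole structure in one expression with Collectors.groupingBy. Passing TreeMap::new as the map factory is an optional choice that keeps the lengths sorted, which is closest to the PHP array in the question; within each length, the strings keep their input order. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class GroupByLength {
    // length -> strings of that length, with keys in ascending order
    static Map<Integer, List<String>> group(List<String> strings) {
        return strings.stream()
                .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }
}
```

For the question's input (it, your, reality, real, our) this yields {2=[it], 3=[our], 4=[your, real], 7=[reality]}.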