I have the below file:
name = David
city = sydney
COuntry = Australia
I am trying to create a hash map in Groovy: split each line at "=" and store the pieces in an array so that part[0] contains the text before the equals sign and part[1] the text after it, then build a map from those parts.
Desired output:
def mappedData = [name: 'David', city: 'sydney', country: 'Australia']
My try:
String s = ""
def myfile = new File("C:/Users/.............")
BufferedReader br = new BufferedReader(new FileReader(myfile))
Map<String, String> map = new HashMap<String, String>()
while ((s = br.readLine()) != null) {
    if (!s.startsWith("#")) {
        StringTokenizer st = new StringTokenizer(s, "=")
        while (st.hasMoreElements()) {
            String line = st.nextElement().toString().trim()
            print line
        }
    }
}
If you want to create a map from a file in Groovy, you can use java.util.Properties for that. Here is an example:
def file = new File("C:\\stackoverflow\\props.properties")
def props = new Properties()
file.withInputStream { stream ->
    props.load(stream)
}
println(props)
This prints out:
[key1:value1, key2:value2]
The props.properties file contains this:
# Stackoverflow test
key1 = value1
key2 = value2
Try this code:
def map = [:]
new File("file.txt").eachLine { line ->
    if (line.contains('=') && !line.startsWith("#")) {
        // split on the first '=' only, and trim whitespace around key and value
        def parts = line.split('=', 2)
        map[parts[0].trim()] = parts[1].trim()
    }
}
println map
Here is a one-liner that does what you want:
new File(/C:\Users\.............\input.txt/).readLines().collectEntries { it.trim().split(/\s*=\s*/) as List }
Related
I have 3 different types of CSV files, each with different headers. I currently use a MultiResourceItemReader and delegate the reading to a FlatFileItemReader as follows:
@Bean
@StepScope
public MultiResourceItemReader<Model> multiResourceItemReader() {
    MultiResourceItemReader<Model> resourceItemReader = new MultiResourceItemReader<>();
    resourceItemReader.setResources(getInputResources());
    resourceItemReader.setDelegate(reader());
    return resourceItemReader;
}
@Bean
@StepScope
public FlatFileItemReader reader() {
    log.debug("Header : {}", extraInfoHolder.getHeader());
    return new FlatFileItemReaderBuilder<Model>()
            .skippedLinesCallback(line -> {
                String rsrc = multiResourceItemReader().getCurrentResource().toString();
                log.debug("Current Resource : {}", rsrc);
                // Verify file header is what we expect
                if (!StringUtils.equals(line, extraInfoHolder.getHeader())) {
                    throw new IllegalArgumentException("Invalid Header in " + rsrc);
                }
            })
            .name("myReader")
            .linesToSkip(HEADER_ROW)
            .lineMapper(new DefaultLineMapper() {
                {
                    setLineTokenizer(getDelimitedLineTokenizer());
                    setFieldSetMapper(getBeanWrapperFieldSetMapper());
                }
            })
            .build();
}
However, I'd like to read the CSV file into a HashMap instead of a Model POJO. For example, if the file is formatted as follows:
First Name, Last Name, Age
Doug, Jones, 57
Sam, Reed, 39
I'd like to read each line into a map where the key is the header token and the value is the file value:
Map 1: First Name -> Doug, Last Name -> Jones, Age -> 57
Map 2: First Name -> Sam, Last Name -> Reed, Age -> 39
In classic Spring Batch fashion, I'd like to read one row, convert it into a map, process + write it, then read the next row. How can I achieve this?
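One hedged way to do this inside Spring Batch itself is a FieldSetMapper that returns a Map per row instead of a POJO. The sketch below is an illustration, not the asker's code: the class name is made up, and it assumes the DelimitedLineTokenizer has been configured with the header names via setNames(...):
public class MapFieldSetMapper implements FieldSetMapper<Map<String, String>> {
    @Override
    public Map<String, String> mapFieldSet(FieldSet fieldSet) {
        // Assumes the tokenizer was given the column names, e.g.
        // tokenizer.setNames("First Name", "Last Name", "Age");
        Map<String, String> row = new LinkedHashMap<>();
        for (String name : fieldSet.getNames()) {
            row.put(name, fieldSet.readString(name));
        }
        return row;
    }
}
The FlatFileItemReaderBuilder would then be parameterized as FlatFileItemReaderBuilder<Map<String, String>> with this mapper in place of the BeanWrapperFieldSetMapper.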
This will return the maps that you want:
private static List<Map<String, Object>> getMapsFrom(String file) throws IOException {
    List<Map<String, Object>> maps = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        int index = 0;
        String line;
        String[] keys = new String[3];
        while ((line = br.readLine()) != null) {
            if (index++ == 0) {
                // First line: the header tokens become the map keys.
                keys = line.split(",");
                for (int i = 0; i < keys.length; i++) {
                    keys[i] = keys[i].trim();
                }
            } else {
                // Data line: trim each value and map it to its header key.
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim();
                }
                Map<String, Object> map = new HashMap<>();
                map.put(keys[0], values[0]);
                map.put(keys[1], values[1]);
                map.put(keys[2], Integer.parseInt(values[2]));
                maps.add(map);
            }
        }
    }
    return maps;
}
assuming your CSV file is always in the form of:
First Name, Last Name, Age
Doug, Jones, 57
Sam, Reed, 39
For the sample above, the returned list contains two maps: {First Name=Doug, Last Name=Jones, Age=57} and {First Name=Sam, Last Name=Reed, Age=39} (key order may vary, since HashMap does not preserve insertion order).
I currently have a TreeMap of the form TreeMap<String, List<List<String>>>.
I'm trying to write my tree map to an output file so that the inner values of each inner list are all separated by a colon.
Do I need a second for loop to loop through each inner list and format it using String.join(":", elements)?
Or is there a more concise way to keep it all in a single for-loop statement?
I've tried a few things and my current code is:
new File(outFolder).mkdir();
File dir = new File(outFolder);
// get the file we're writing to
File outFile = new File(dir, "javaoutput.txt");
// create a writer
try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outFile), "utf-8"))) {
    for (Map.Entry<String, List<List<String>>> entry : allResults.entrySet()) {
        writer.write(entry.getKey() + ", " + entry.getValue().toString().replace("null", ""));
        writer.newLine();
    }
}
Current output:
ANY, [[469, 470], [206, 1013, 1014], [2607, 2608]]
Desired output:
ANY, 469:470, 206:1013:1014, 2607:2608
Any suggestions would be greatly appreciated.
String.join(":", arr) can be used to take the String array and return a colon-separated String. This can then be used with Streams with a Collector to join these strings with a comma-separator, so :
TreeMap<String, String[]> allResults = new TreeMap<>();
allResults.put("a", new String[]{"469", "470"});
allResults.put("b", new String[]{"206", "1013", "1014"});
allResults.put("c", new String[]{"2607", "2608"});
String result = allResults.entrySet().stream()
        .map(e -> String.join(":", e.getValue()))
        .collect(Collectors.joining(", "));
System.out.println(result);
produces:
469:470, 206:1013:1014, 2607:2608
With a List<List<String>>, you need a stream within a stream, so:
TreeMap<String, List<List<String>>> allResults = new TreeMap<>();
allResults.put("a", Arrays.asList(Arrays.asList("469", "470"), Arrays.asList("206", "1013", "1014"), Arrays.asList("2607", "2608")));
allResults.put("b", Arrays.asList(Arrays.asList("169", "470")));
allResults.put("c", Arrays.asList(Arrays.asList("269", "470")));
String result = allResults.entrySet().stream()
        .map(i -> i.getKey() + "," + i.getValue().stream()
                .map(elements -> String.join(":", elements))
                .collect(Collectors.joining(", ")))
        .collect(Collectors.joining("\n"));
System.out.println(result);
which produces:
a,469:470, 206:1013:1014, 2607:2608
b,169:470
c,269:470
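To produce the desired file output from the original question, the same joining logic can be reused inside the asker's try-with-resources writer loop (a sketch, using the writer and allResults from the question):
for (Map.Entry<String, List<List<String>>> entry : allResults.entrySet()) {
    // join each inner list with ':' and the joined groups with ', '
    String joined = entry.getValue().stream()
            .map(inner -> String.join(":", inner))
            .collect(Collectors.joining(", "));
    writer.write(entry.getKey() + ", " + joined);
    writer.newLine();
}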
I have a set of strings like this:
A_2007-04, A_2007-09, A_Agent, A_Daily, A_Execute, A_Exec, B_Action, B_HealthCheck
I want output as:
Key = A, Value = [2007-04,2007-09,Agent,Execute,Exec]
Key = B, Value = [Action,HealthCheck]
I'm using a HashMap to do this, with:
pckg: {A, B}
count: total number of strings
reports: set of strings
The logic I used is a nested loop:
for (String l : reports[i]) {
    for (String r : pckg) {
        String[] g = l.split("_");
        if (g[0].equalsIgnoreCase(r)) {
            report.add(g[1]);
            dirFiles.put(g[0], report);
        } else {
            break;
        }
    }
}
I'm getting output as:
Key = A, Value = [2007-04,2007-09,Agent,Execute,Exec]
How do I get the second key? Can someone suggest logic for this?
Assuming that you use Java 8, this can be done with computeIfAbsent, which initializes the List of values whenever a new key is encountered, as follows:
List<String> tokens = Arrays.asList(
        "A_2007-04", "A_2007-09", "A_Agent", "A_Daily", "A_Execute",
        "A_Exec", "P_Action", "P_HealthCheck"
);
Map<String, List<String>> map = new HashMap<>();
for (String token : tokens) {
    String[] g = token.split("_");
    map.computeIfAbsent(g[0], key -> new ArrayList<>()).add(g[1]);
}
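The same grouping can also be expressed with the Streams API; this is an equivalent sketch using Collectors.groupingBy and Collectors.mapping (not part of the original answer):
Map<String, List<String>> grouped = tokens.stream()
        .map(token -> token.split("_", 2)) // split on the first '_' only
        .collect(Collectors.groupingBy(g -> g[0], // key: the prefix, e.g. "A"
                Collectors.mapping(g -> g[1], Collectors.toList()))); // values: the suffixes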
In terms of raw code this should do what I think you are trying to achieve:
// Create a collection of String any way you like, but for testing
// I've simply split a flat string into an array.
String flatString = "A_2007-04,A_2007-09,A_Agent,A_Daily,A_Execute,A_Exec,"
        + "P_Action,P_HealthCheck";
String[] reports = flatString.split(",");
Map<String, List<String>> mapFromReportKeyToValues = new HashMap<>();
for (String report : reports) {
    int underscoreIndex = report.indexOf("_");
    String key = report.substring(0, underscoreIndex);
    String newValue = report.substring(underscoreIndex + 1);
    List<String> existingValues = mapFromReportKeyToValues.get(key);
    if (existingValues == null) {
        // This key hasn't been seen before, so create a new list
        // to contain values which belong under this key.
        existingValues = new ArrayList<>();
        mapFromReportKeyToValues.put(key, existingValues);
    }
    existingValues.add(newValue);
}
System.out.println("Generated map:\n" + mapFromReportKeyToValues);
Though I recommend tidying it up and organising it into a method or methods to fit your project code.
Doing this with Map<String, ArrayList<String>> is another good approach, I think:
String reports[] = {"A_2007-04", "A_2007-09", "A_Agent", "A_Daily",
        "A_Execute", "A_Exec", "P_Action", "P_HealthCheck"};
Map<String, ArrayList<String>> map = new HashMap<>();
for (String rep : reports) {
    String s[] = rep.split("_");
    String prefix = s[0], suffix = s[1];
    ArrayList<String> list = new ArrayList<>();
    if (map.containsKey(prefix)) {
        list = map.get(prefix);
    }
    list.add(suffix);
    map.put(prefix, list);
}
// Print
for (Map.Entry<String, ArrayList<String>> entry : map.entrySet()) {
    String key = entry.getKey();
    ArrayList<String> valueList = entry.getValue();
    System.out.println(key + " " + valueList);
}
for (String l : reports[i]) {
    String[] g = l.split("_");
    for (String r : pckg) {
        if (g[0].equalsIgnoreCase(r)) {
            report = dirFiles.get(g[0]);
            if (report == null) {
                report = new ArrayList<String>(); // create new report list
            }
            report.add(g[1]);
            dirFiles.put(g[0], report);
        }
    }
}
Removed the else part of the if condition: the break there exits the inner loop, so you never get to evaluate the keys beyond the first one.
Added a check for existing values, as suggested by Orin2005.
I have also moved the statement String[] g = l.split("_") outside the inner loop so that it isn't executed multiple times.
I have two files, each having the same format and approximately 100,000 lines. For each line in file one, I extract the second component (column), and if I find a match in the second column of the second file, I extract their third components, combine them, and store or output the result.
Though my implementation works, the program runs extremely slowly; it takes more than an hour to iterate over the files, compare them, and output all the results.
I read and store the data of both files in ArrayLists, then iterate over those lists and do the comparison. Below is my code; is there any performance-related glitch, or is this just normal for such an operation?
Note: I was using String.split(), but I understand from other posts that StringTokenizer is faster.
public ArrayList<String> match(String file1, String file2) throws IOException {
    ArrayList<String> finalOut = new ArrayList<>();
    try {
        ArrayList<String> data = readGenreDataIntoMemory(file1);
        ArrayList<String> data1 = readGenreDataIntoMemory(file2);
        StringTokenizer st = null;
        for (String line : data) {
            HashSet<String> genres = new HashSet<>();
            boolean sameMovie = false;
            String movie2 = "";
            st = new StringTokenizer(line, "|");
            //String line[] = fline.split("\\|");
            String ratingInfo = st.nextToken();
            String movie1 = st.nextToken();
            String genreInfo = st.nextToken();
            if (!genreInfo.equals("null")) {
                for (String s : genreInfo.split(",")) {
                    genres.add(s);
                }
            }
            StringTokenizer st1 = null;
            for (String line1 : data1) {
                st1 = new StringTokenizer(line1, "|");
                st1.nextToken();
                movie2 = st1.nextToken();
                String genreInfo2 = st1.nextToken();
                // If the movie names are the same then they should have the
                // same genres, so merge the two genre sets.
                if (!genreInfo2.equals("null") && movie1.equals(movie2)) {
                    for (String s : genreInfo2.split(",")) {
                        genres.add(s);
                    }
                    sameMovie = true;
                    break;
                }
            }
            if (sameMovie) {
                finalOut.add(ratingInfo + "|" + movie1 + "|" + genres.toString() + "\n");
            } else {
                finalOut.add(line);
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    return finalOut;
}
I would use the Streams API:
String file1 = "files1.txt";
String file2 = "files2.txt";
// get all the lines by movie name for each file.
Map<String, List<String[]>> map = Stream.of(Files.lines(Paths.get(file1)),
                                            Files.lines(Paths.get(file2)))
        .flatMap(p -> p)
        .parallel()
        .map(s -> s.split("[|]", 3))
        .collect(Collectors.groupingByConcurrent(sa -> sa[1], Collectors.toList()));
// merge all the genres for each movie.
map.forEach((movie, lines) -> {
    Set<String> genres = lines.stream()
            .flatMap(l -> Stream.of(l[2].split(",")))
            .collect(Collectors.toSet());
    System.out.println("movie: " + movie + " genres: " + genres);
});
This has the advantage of being O(n) instead of O(n^2), and it's multi-threaded.
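If the merged lines need to be collected (as in the original match method) rather than printed, the forEach body can build a list instead; a sketch, assuming the rating field of the first grouped line is the one to keep:
List<String> finalOut = new ArrayList<>();
map.forEach((movie, lines) -> {
    Set<String> genres = lines.stream()
            .flatMap(l -> Stream.of(l[2].split(",")))
            .collect(Collectors.toSet());
    // lines.get(0)[0] is the rating field of the first line grouped under this
    // movie; with a parallel stream that choice is not deterministic.
    finalOut.add(lines.get(0)[0] + "|" + movie + "|" + genres);
});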
Do a hash join.
As of now you are doing an outer-loop join, which is O(n^2); a hash join will be amortized O(n).
Put the contents of each file in a hash map, with the field you want (the second field) as the key.
Map<String, String> map1 = new HashMap<>();
// build the map from file1
Map<String, String> map2 = new HashMap<>();
// build the map from file2
Then do the hash join:
for (String key1 : map1.keySet()) {
    if (map2.containsKey(key1)) {
        // do your thing, you found the match
    }
}
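A fuller sketch of the hash join for the pipe-delimited format in the question (buildIndex and the field positions are illustrative assumptions: the second field is the movie name, the third the genre list):
// Build an index from movie name -> genre string for one file.
static Map<String, String> buildIndex(String file) throws IOException {
    Map<String, String> index = new HashMap<>();
    for (String line : Files.readAllLines(Paths.get(file))) {
        String[] parts = line.split("\\|", 3);
        index.put(parts[1], parts[2]); // key: movie name, value: genres
    }
    return index;
}

// One pass over file1, with O(1) lookups into file2's index.
Map<String, String> index2 = buildIndex(file2);
for (String line : Files.readAllLines(Paths.get(file1))) {
    String[] parts = line.split("\\|", 3);
    String genres2 = index2.get(parts[1]);
    if (genres2 != null) {
        // found a match: combine parts[2] with genres2 here
    }
}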
I have a big CSV file, thousands of rows, and I want to aggregate some columns using Java code.
The file is in the form:
1,2012,T1
2,2015,T2
3,2013,T1
4,2012,T1
The results should be:
T, Year, Count
T1,2012, 2
T1,2013, 1
T2,2015, 1
Put your data into a Map-like structure, and add 1 to the stored value each time a key (in your case "" + T + year) is found.
You can use a map like:
Map<String, Integer> rowMap = new HashMap<>();
rowMap.put("T1" + "2012", 2);
rowMap.put("T1" + "2013", 1);
rowMap.put("T2" + "2015", 1);
Or you can define your own class with T and Year fields, overriding the hashCode and equals methods. Then you can use:
Map<YourClass, Integer> map= new HashMap<>();
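A minimal sketch of such a key class (the name TYear and its fields are illustrative, not from the original answer):
// Key class combining T and Year; equals/hashCode make it usable as a HashMap key.
final class TYear {
    final String t;
    final int year;

    TYear(String t, int year) {
        this.t = t;
        this.year = year;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TYear)) return false;
        TYear other = (TYear) o;
        return year == other.year && t.equals(other.t);
    }

    @Override
    public int hashCode() {
        return 31 * t.hashCode() + year;
    }
}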
String csv =
"1,2012,T1\n"
+ "2,2015,T2\n"
+ "3,2013,T1\n"
+ "4,2012,T1\n";
Map<String, Integer> map = new TreeMap<>();
BufferedReader reader = new BufferedReader(new StringReader(csv));
String line;
while ((line = reader.readLine()) != null) {
    String[] fields = line.split(",");
    String key = fields[2] + "," + fields[1]; // e.g. "T1,2012"
    Integer value = map.get(key);
    if (value == null)
        value = 0;
    map.put(key, value + 1);
}
System.out.println(map);
// -> {T1,2012=2, T1,2013=1, T2,2015=1}
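On Java 8 and later, the get/null-check/put sequence can be collapsed into a single merge call (an equivalent alternative, not in the original answer):
map.merge(key, 1, Integer::sum); // inserts 1 if the key is absent, otherwise adds 1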
Use uniVocity-parsers for the best performance. It should take 1 second to process 1 million rows.
CsvParserSettings settings = new CsvParserSettings();
settings.selectIndexes(1, 2); // select the columns we are going to read
final Map<List<String>, Integer> results = new LinkedHashMap<List<String>, Integer>(); // stores the results here

// Use a custom implementation of RowProcessor
settings.setRowProcessor(new AbstractRowProcessor() {
    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        // converts the input array to a List - lists implement hashCode and equals
        // based on their values so they can be used as keys on your map.
        List<String> key = Arrays.asList(row);
        Integer count = results.get(key);
        if (count == null) {
            count = 0;
        }
        results.put(key, count + 1);
    }
});

// creates a parser with the above configuration and RowProcessor
CsvParser parser = new CsvParser(settings);

String input = "1,2012,T1"
        + "\n2,2015,T2"
        + "\n3,2013,T1"
        + "\n4,2012,T1";

// the parse() method will parse and submit all rows to your RowProcessor -
// use a FileReader to read a file instead of the String used as example here.
parser.parse(new StringReader(input));

// Here are the results:
for (Entry<List<String>, Integer> entry : results.entrySet()) {
    System.out.println(entry.getKey() + " -> " + entry.getValue());
}
Output:
[2012, T1] -> 2
[2015, T2] -> 1
[2013, T1] -> 1
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).