read/write files in Java

How can I easily read a file in Java if it has the following format:
a|dip
a|dop
c|nap
a|dip
b|dop
b|sip
a|tang
c|dig
c|nap
I want to get all words that belong to "a", "b", and "c". What data structure can I use to read and store this information?
You can also suggest some good (two-column) file formats that are easy to read and write in Java.
Some of you may be wondering what the real problem is that I want to solve: I have some complex employee-related data. The current (poor) system generates some files, and I am trying to process them and add them to a database. The current files' format is a bit complex (and private), so I cannot paste it here.

If you can use Google Guava (http://code.google.com/p/guava-libraries/) then you'll get a few handy classes (you can use some or all of these):
com.google.common.io.Files
com.google.common.io.LineProcessor<T>
com.google.common.base.Charsets
com.google.common.collect.Multimap<K,V>
com.google.common.collect.ArrayListMultimap<K,V>
For example you could write:
LineProcessor<Multimap<String, String>> processor =
        new LineProcessor<Multimap<String, String>>() {
            private final Multimap<String, String> processed = ArrayListMultimap.create();

            @Override
            public boolean processLine(String line) {
                String[] parts = line.split("\\|", 2); // the limit of 2 keeps any | in the rest of the line
                processed.put(parts[0], parts[1]);
                return true; // keep going
            }

            @Override
            public Multimap<String, String> getResult() {
                return processed;
            }
        };

Multimap<String, String> result = Files.readLines(
        new File("filename.txt"), Charsets.UTF_8, processor);

You can use Scanner to read the text file one line at a time and then you can use String.split("\\|") to separate the parts on that line. For storing the information, a Map<String,List<String>> might work.

I'd use this data structure:
Map<String, List<String>> map = new HashMap<String, List<String>>();
And parse the file like this:
File file = new File("words.txt");
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
    String[] parts = scanner.nextLine().split("\\|");
    String group = parts[0];
    String word = parts[1];
    List<String> list = map.get(group);
    if (list == null) {
        list = new ArrayList<String>();
        map.put(group, list);
    }
    list.add(word);
}
So you could get the list of words for "a" like so:
for (String word : map.get("a")) {
    System.out.println(word);
}
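On Java 8+, the explicit null check can be collapsed with Map.computeIfAbsent. A minimal sketch of the same grouping (the file name is assumed, as above):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class WordGroups {
    public static void main(String[] args) throws FileNotFoundException {
        Map<String, List<String>> map = new HashMap<>();
        try (Scanner scanner = new Scanner(new File("words.txt"))) {
            while (scanner.hasNextLine()) {
                String[] parts = scanner.nextLine().split("\\|");
                // computeIfAbsent creates the list the first time a group is seen
                map.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
            }
        }
        System.out.println(map.get("a")); // e.g. [dip, dop, dip, tang] for the sample file
    }
}
```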


Using Jackson to convert CSV to JSON - How to remove newlines embedded in CSV column header

After some quick Googling, I found an easy way to read and parse a CSV file to JSON using the Jackson library. All well and good, except ... some of the CSV header column names have embedded newlines. The program handles it, but I'm left with JSON keys with newlines embedded within. I'd like to remove these (or replace them with a space).
Here is the simple program I found:
import java.io.File;
import java.util.List;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
public class CSVToJSON {
    public static void main(String[] args) throws Exception {
        File input = new File("PDM_BOM.csv");
        File output = new File("output.json");

        CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();
        CsvMapper csvMapper = new CsvMapper();

        // Read data from the CSV file
        List<Object> readAll = csvMapper.readerFor(Map.class).with(csvSchema)
                .readValues(input).readAll();

        ObjectMapper mapper = new ObjectMapper();
        // Write JSON-formatted data to output.json
        mapper.writerWithDefaultPrettyPrinter().writeValue(output, readAll);
        // Write JSON-formatted data to stdout
        System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readAll));
    }
}
So, as an example:
PARENT\nITEM\nNUMBER
Here's an example of what is produced:
"PARENT\nITEM\nNUMBER" : "208E8840040",
I need this to be:
"PARENT ITEM NUMBER" : "208E8840040",
Is there a configuration setting on the Jackson mapper that can handle this? Or, do I need to provide some sort of custom "handler" to the mapper?
Special cases
To add some complexity, there are cases where just replacing the newline with a space will not always yield what is needed.
Example 1:
Sometimes there is a column header like this:
QTY\nORDER/\nTRANSACTION
In this case, I need the newline removed and replaced with nothing, so that the result is:
QTY ORDER/TRANSACTION
, not
QTY ORDER/ TRANSACTION
Example 2:
Sometimes, for whatever reason, a column header has a space before the newline:
EFFECTIVE \nTHRU DATE
This needs to come out as:
EFFECTIVE THRU DATE
, not
EFFECTIVE  THRU DATE (with a double space)
Any ideas on how to handle at least the main issue would be very much appreciated.
You can use the String replaceAll() method to replace all runs of whitespace (including newlines) with single spaces. Note that the regex class \s must be written \\s inside a Java string literal:
String str = mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readAll);
str = str.trim().replaceAll("\\s+", " ");
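As a quick illustration on one of the headers from the question (this snippet is mine, not part of the answer):

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        String key = "PARENT\nITEM\nNUMBER";
        // \\s+ matches any whitespace run, including the embedded newlines
        String cleaned = key.trim().replaceAll("\\s+", " ");
        System.out.println(cleaned); // PARENT ITEM NUMBER
    }
}
```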
OK, came up with a solution. It's ugly, but it works. Basically, after the CsvMapper finishes, I go through the giant ugly collection that's produced and do a String.replaceAll (thanks to https://stackoverflow.com/users/4402505/prem-kurian-philip for that suggestion) to remove the unwanted characters and then rebuild the map.
In any case here's the new code:
public class CSVToJSON {
    public static void main(String[] args) throws Exception {
        File input = new File("PDM_BOM.csv");
        File output = new File("output.json");

        CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();
        CsvMapper csvMapper = new CsvMapper();

        // Read data from the CSV file
        List<Object> readData = csvMapper.readerFor(Map.class).with(csvSchema)
                .readValues(input).readAll();

        for (Object mapObj : readData) {
            LinkedHashMap<String, String> map = (LinkedHashMap<String, String>) mapObj;
            List<String> deleteList = new ArrayList<>();
            LinkedHashMap<String, String> insertMap = new LinkedHashMap<>();
            for (Entry<String, String> entry : map.entrySet()) {
                String oldKey = entry.getKey();
                // \s must be escaped as \\s in a Java string literal
                String newKey = oldKey.replaceAll("\\s+", " ");
                deleteList.add(oldKey);
                insertMap.put(newKey, entry.getValue());
            }
            // Delete the old keys ...
            for (String oldKey : deleteList) {
                map.remove(oldKey);
            }
            // ... and bring in the new
            map.putAll(insertMap);
        }

        ObjectMapper mapper = new ObjectMapper();
        // Write JSON-formatted data to output.json
        mapper.writerWithDefaultPrettyPrinter().writeValue(output, readData);
        // Write JSON-formatted data to stdout
        System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readData));
    }
}
It seems like there should be a better way to achieve this.
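For the special cases described in the question, a small normalization helper might be cleaner than a blanket replaceAll. A sketch (the method and its rules are my own, not from the answers): first strip a newline that directly follows '/', then collapse every remaining newline, together with any surrounding spaces, into a single space.

```java
public class HeaderNormalizer {
    // Normalize a CSV header: drop a newline that follows '/',
    // then collapse any remaining newline (plus surrounding spaces) to one space.
    static String normalize(String header) {
        return header
                .replaceAll("/\n", "/")        // "QTY\nORDER/\nTRANSACTION": no space after '/'
                .replaceAll("\\s*\n\\s*", " ") // remaining newlines become a single space
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("PARENT\nITEM\nNUMBER"));     // PARENT ITEM NUMBER
        System.out.println(normalize("QTY\nORDER/\nTRANSACTION")); // QTY ORDER/TRANSACTION
        System.out.println(normalize("EFFECTIVE \nTHRU DATE"));    // EFFECTIVE THRU DATE
    }
}
```

The helper would be applied to each key while rebuilding the map, instead of the plain replaceAll call.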

Decorating Java 8 Files.lines() stream with filename

Files.lines() returns a Stream<String> of each line in the file. What I want is a Stream of Map, with the key the line, and the value the filename. This must be an intermediate, not a terminal, result in a pipeline.
What is the best way of accomplishing this?
You can try the following:
final String fileName = ""; // the file name
List<String> lines = new ArrayList<>(); // the lines
Stream<Map<String, String>> stream = lines.stream()
        .map(l -> {
            Map<String, String> map = new HashMap<>();
            map.put(l, fileName);
            return map;
        });
You can use Google Guava's ImmutableMap class to create a map (for simplified version), e.g.
final String fileName = "";
List<String> lines = new ArrayList<>();
Stream<ImmutableMap<String, String>> stream = lines.stream()
        .map(l -> ImmutableMap.of(l, fileName));
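Since each map holds exactly one pair, a Map.Entry per line is lighter than a full Map, and it avoids the third-party dependency. A sketch that also wires in Files.lines() itself (Java 8+; the file path is assumed):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.Map;
import java.util.stream.Stream;

public class LinesWithFilename {
    // Pair each line of the file with the file's name; still an intermediate stream
    static Stream<Map.Entry<String, String>> linesWithName(Path path) throws IOException {
        String name = path.getFileName().toString();
        return Files.lines(path)
                .map(line -> new AbstractMap.SimpleEntry<>(line, name));
    }

    public static void main(String[] args) throws IOException {
        linesWithName(Paths.get("data.txt")) // hypothetical file
                .forEach(e -> System.out.println(e.getValue() + ": " + e.getKey()));
    }
}
```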

How to read from particular header in opencsv?

I have a CSV file and want to extract a particular column from it. For example, say I have this CSV:
id1,caste1,salary,name1
63,Graham,101153.06,Abraham
103,Joseph,122451.02,Charlie
63,Webster,127965.91,Violet
76,Smith,156150.62,Eric
97,Moreno,55867.74,Mia
65,Reynolds,106918.14,Richard
How can I use opencsv to read only the data under the caste1 header?
Magnilex and Sparky are right in that CSVReader does not support reading values by column name. But that being said there are two ways you can do this.
Given that you have the column names, and that the default CSVReader reads the header, you can search the header for the column's position and use that from there on out:
private int getHeaderLocation(String[] headers, String columnName) {
    return Arrays.asList(headers).indexOf(columnName);
}
So your method would look like this (leaving out a lot of the error checking you will need to add):
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
int columnPosition;

nextLine = reader.readNext();
columnPosition = getHeaderLocation(nextLine, "caste1");
while ((nextLine = reader.readNext()) != null && columnPosition > -1) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[columnPosition]);
}
I would only do the above if you were pressed for time and it was only one column you cared about. That is because openCSV can convert directly to an object that has the variables the same as the header column names using the CsvToBean class and the HeaderColumnNameMappingStrategy.
So first you would define a class that has the fields (and really you only need to put in the fields you want - extras are ignored and missing ones are null or default values).
public class CasteDTO {
    private int id1;
    private String caste1;
    private double salary;
    private String name1;

    // have all the getters and setters here....
}
Then your code would look like this:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
HeaderColumnNameMappingStrategy<CasteDTO> casteStrategy = new HeaderColumnNameMappingStrategy<>();
casteStrategy.setType(CasteDTO.class);
CsvToBean<CasteDTO> csvToBean = new CsvToBean<>();

List<CasteDTO> casteList = csvToBean.parse(casteStrategy, reader);
for (CasteDTO dto : casteList) {
    System.out.println(dto.getCaste1());
}
There is no built in functionality in opencsv for reading from a column by name.
The official FAQ example has the following example on how to read from a file:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
You simply fetch the value in the second column for each row by accessing it with nextLine[1] (remember, array indices are zero-based).
So, in your case, you could simply read from the second column:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    System.out.println(nextLine[1]);
}
For a more sophisticated way of determining the column index from its header, refer to the answer from Scott Conway.
From the opencsv docs:
Starting with version 4.2, there’s another handy way of reading CSV files that doesn’t even require creating special classes. If your CSV file has headers, you can just initialize a CSVReaderHeaderAware and start reading the values out as a map:
reader = new CSVReaderHeaderAware(new FileReader("yourfile.csv"));
record = reader.readMap();
.readMap() will return a single record. You need to call .readMap() repeatedly to get all the records until you get null when it runs to the end (or to the first empty line), e.g.:
Map<String, String> values;
while ((values = reader.readMap()) != null) {
    // consume the values here
}
The class also has another constructor which allows more customization, e.g.:
CSVReaderHeaderAware reader = new CSVReaderHeaderAware(
        new InputStreamReader(inputStream),
        0,      // skipLines
        parser, // custom parser
        false,  // keep end of lines
        true,   // verify reader
        0,      // multiline limit
        null    // null for the default locale
);
One downside I have found: since the reader is lazy, it does not offer a record count. If you need to know the total (for example, to display correct progress information), you'll need a separate reader just for counting lines.
There is also a CSVReaderHeaderAwareBuilder available.
I had a task to remove several columns from an existing CSV. Example CSV:
FirstName, LastName, City, County, Zip
Steve,Hopkins,London,Greater London,15554
James,Bond,Vilnius,Vilniaus,03250
I needed only the FirstName and LastName columns (with their values), and it was very important that the order stay the same - the default rd.readMap() does not preserve order. Code for this task:
String[] COLUMN_NAMES_TO_REMOVE = new String[]{"", "City", "County", "Zip"};

CSVReaderHeaderAware rd = new CSVReaderHeaderAware(new FileReader("old.csv"));
CSVWriter writer = new CSVWriter(new FileWriter("new.csv"),
        CSVWriter.DEFAULT_SEPARATOR, CSVWriter.NO_QUOTE_CHARACTER,
        CSVWriter.NO_ESCAPE_CHARACTER, CSVWriter.DEFAULT_LINE_END);

// let's get the private headerIndex field
Field privateField = CSVReaderHeaderAware.class.getDeclaredField("headerIndex");
privateField.setAccessible(true);
Map<String, Integer> headerIndex = (Map<String, Integer>) privateField.get(rd);

// do the ordering in natural order - 0, 1, 2 ... n
Map<String, Integer> sortedInNaturalOrder = headerIndex.entrySet().stream()
        .sorted(Map.Entry.comparingByValue(Comparator.naturalOrder()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                (oldValue, newValue) -> oldValue, LinkedHashMap::new));

// let's get the headers in natural order
List<String> headers = new ArrayList<>(sortedInNaturalOrder.keySet());

// let's remove the unwanted headers
headers.removeAll(Arrays.asList(COLUMN_NAMES_TO_REMOVE));

// save the column names
writer.writeNext(headers.toArray(new String[0]));

List<String> keys = new ArrayList<>();
Map<String, String> values;
while ((values = rd.readMap()) != null) {
    for (String key : headers) {
        keys.add(values.get(key));
        if (keys.size() == headers.size()) {
            // save the values
            writer.writeNext(keys.toArray(new String[0]));
            keys.clear();
        }
    }
}
writer.flush();
Output:
FirstName, LastName
Steve,Hopkins
James,Bond
Looking at the javadoc:
If you create a CSVReader object, you can use its .readAll method to pull in the entire file. It returns a List of String[], with each String[] representing a line of the file. The lines have already been split on the delimiters for you, and on each line you only want the second element, so:
public static void main(String[] args) {
    String data = "63,Graham,101153.06,Abraham";
    String[] result = data.split(",");
    System.out.print(result[1]);
}

How to get the data after "=" sign from a txt file?

I have a program, in Java, that saves data to a .txt file like this:
intdata=2
stringdata=hello
How can I read the data from the text file and pick out the value I need? Let's say I want the intdata value; I need the part after the equals sign. How do I return only that part?
If you don't care about the key it's attached to: use String.split.
Scanner scan = new Scanner(new File("file.txt"));
String value = scan.nextLine().split("=")[1];
If you do care about the key it's attached to: use String.split in conjunction with a Map.
Scanner scan = new Scanner(new File("file.txt"));
Map<String, String> values = new HashMap<>();
while (scan.hasNextLine()) {
    String[] line = scan.nextLine().split("=");
    values.put(line[0], line[1]);
}
Alternatively, if it's a proper .properties file, you could elect to use Properties instead:
Properties propertiesFile = new Properties();
propertiesFile.load(new FileInputStream("file.txt"));
// use it like a regular Properties object now.
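Given the sample file from the question, the Properties route might look like this (the file name is assumed):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class ReadProps {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Properties parses key=value lines for you
        try (FileInputStream in = new FileInputStream("file.txt")) {
            props.load(in);
        }
        // For the sample file, "intdata=2" yields the string "2"
        int intdata = Integer.parseInt(props.getProperty("intdata"));
        System.out.println(intdata);
        System.out.println(props.getProperty("stringdata"));
    }
}
```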

Storing Elements From A Text Document Into A Map Java

So I have a text document called stock.txt which contains the following:
AAPL; Apple Inc.
IBM; International Business Machines Corp.
KO; The Coca-Cola Company
FB; Facebook Inc.
SBUX; Starbucks Corp.
And I want to store each element in a HashMap with the stock code as the key and the company name as the value. I originally tried storing it all in an ArrayList; however, when I wanted to print out one line, for example:
AAPL;Apple Inc.
I would do:
System.out.println(array.get(0));
and it would give me the output:
APPL;Apple
and printing array.get(1) would give me the "Inc." part.
So my overarching question is: how do I store these properly in a HashMap so that the whole string "Apple Inc." ends up in one part of the Map?
Thanks!
You can try following:
InputStream stream = new FileInputStream(new File("path"));
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
String line;
String[] tok;
Map<String, String> map = new HashMap<String, String>();
while ((line = reader.readLine()) != null) {
    tok = line.split(";");
    map.put(tok[0].trim(), tok[1].trim());
}
System.out.println(map);
The code above reads a file from the given path, splits each line on the ; character, and stores the two parts in the map.
Hope it helps.
There is no reason why you couldn't store the information into an ArrayList. The missing data has more to do with how you are reading and processing the file than the structure in which you are storing it. Having said that, using a HashMap will allow you to store the two parts of the line separately while maintaining the link between them. The ArrayList approach does not preserve that link - it is simply an ordered list.
Here's what I would do, which is similar to Darshan's approach (you will need Java 1.7+):
public HashMap<String, String> readStockFile(Path filePath, Charset charset)
{
    HashMap<String, String> map = new HashMap<>();

    try (BufferedReader fileReader =
            Files.newBufferedReader(filePath, charset))
    {
        final int STOCK_CODE_GROUP = 1;
        final int STOCK_NAME_GROUP = 2;

        /*
         * Regular expression - everything up to ';' goes in the stock code group,
         * everything after in the stock name group, ignoring any whitespace after ';'.
         */
        final Pattern STOCK_PATTERN = Pattern.compile("^([^;]+);\\s*(.+)$");

        String nextLine = null;
        Matcher stockMatcher = null;

        while ((nextLine = fileReader.readLine()) != null)
        {
            stockMatcher = STOCK_PATTERN.matcher(nextLine.trim());

            if (stockMatcher.find(0))
                if (!map.containsKey(stockMatcher.group(STOCK_CODE_GROUP)))
                    map.put(stockMatcher.group(STOCK_CODE_GROUP),
                            stockMatcher.group(STOCK_NAME_GROUP));
        }
    }
    catch (IOException ioEx)
    {
        ioEx.printStackTrace(System.err); // Do something useful here.
    }

    return map;
}
If you wish to retain insertion order in the map, substitute a LinkedHashMap for the HashMap. The regular expression classes (Pattern and Matcher) belong to the java.util.regex package, while Path and Charset are in java.nio.file and java.nio.charset respectively. You'll need to use Path.getRoot(), Path.resolve(String filePath) and Charset.forName(String charset) to set up your arguments properly.
You may also want to consider what to do if you encounter a line that is not properly formatted or if a stock appears in the file twice. These will form 'else' clauses to the two 'ifs'.
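For comparison, the same two-column parse can be sketched with streams (Java 8+; the file name is assumed from the question). A LinkedHashMap collector preserves insertion order, and the merge function keeps the first entry when a stock appears twice:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StockParser {
    // Parse "CODE; Company Name" lines into an insertion-ordered map
    static Map<String, String> parse(Stream<String> lines) {
        return lines
                .map(line -> line.split(";", 2))    // limit 2 keeps any ';' in company names
                .filter(parts -> parts.length == 2) // skip malformed lines
                .collect(Collectors.toMap(
                        parts -> parts[0].trim(),
                        parts -> parts[1].trim(),
                        (first, dup) -> first,      // keep the first entry on duplicates
                        LinkedHashMap::new));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parse(Files.lines(Paths.get("stock.txt"))));
    }
}
```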
