How to add a new column when creating a CSV file in Scala - java

val csv_writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(dest+"whatever"+".csv")))
for (x <-storeTimestamp ) {
csv_writer.write(x + "\n")
}
csv_writer.close()
Is there any other way to write 2 lists of strings to a csv file?
For now, my code only writes 1 list of strings to the csv file. How can I add a new column from another list?
Is it OK to use BufferedWriter?

If I understand your question, you have some CSV data and you want to add a new column of data to the end of each line before writing it to a file.
Here's how I might go about it.
// some pretend data
val origCSV = Seq("a,b,c", "d,e,f", "x,y,z")
val newCLMN = Seq("4X", "2W", "9A")
// put them together
val allData = origCSV.zip(newCLMN).map{case (a,b) => s"$a,$b\n"}
Note: zip will only zip the two collections together until it runs out of one or the other. If there's data left in the larger collection, it is ignored. If that's not desirable, you might try zipAll.
On to the file writing.
import java.io.{File, FileWriter}
import util.Try
val writer = Try(new FileWriter(new File("filename.csv")))
writer.map{w => w.write(allData.mkString); w}
.recoverWith{case e => e.printStackTrace(); writer}
.map(_.close())
And the result is...
>> cat filename.csv
a,b,c,4X
d,e,f,2W
x,y,z,9A
>>
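Since the question is also tagged java, here is a rough Java sketch of the same idea: pair two lists row by row and write them with a BufferedWriter. The class name TwoColumnCsv, the helper names, and the sample data are made up for illustration. It assumes the values contain no commas, and it pads the shorter list with empty fields (similar in spirit to zipAll) rather than dropping rows.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class TwoColumnCsv {

    // Pairs the two lists row by row; a missing value becomes an empty field
    // (this padding choice is an assumption, not something the question asked for).
    static List<String> zipColumns(List<String> left, List<String> right) {
        int rows = Math.max(left.size(), right.size());
        List<String> lines = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            String a = i < left.size() ? left.get(i) : "";
            String b = i < right.size() ? right.get(i) : "";
            lines.add(a + "," + b);
        }
        return lines;
    }

    // Writes one line per row; try-with-resources closes the writer even on failure.
    static void writeLines(Path target, List<String> lines) {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> timestamps = List.of("10:00", "10:05", "10:10"); // hypothetical data
        List<String> labels = List.of("start", "stop");               // hypothetical data
        Path out = Files.createTempFile("two-columns", ".csv");
        writeLines(out, zipColumns(timestamps, labels));
        System.out.println(Files.readAllLines(out));
    }
}
```

So yes, a BufferedWriter is fine for this; the main thing to get right is closing it reliably, which try-with-resources handles here.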

Related

BufferedReader - Output columns in different order JAVA

I have 2 csv files with columns 'car', 'bike', 'tractor' etc.
The code below prints out data from the csv, which works fine. However, csv 1 prints out in a different order to csv 2, so I want to arrange the columns in a different order.
From this code, how can I organise the data to print out in the order I choose (which column first, which second, etc.)?
BufferedReader r = new BufferedReader(new InputStreamReader(str));
Stream lines = r.lines().skip(1);
lines.forEachOrdered(
line -> {
line = ((String) line).replace("\"", "");
ret.add((String) line);
});
The columns print out like this:
csv 1
Car, Bike, Tractor, Plane, Train
csv 2
Bike, Plane, Tractor, Train, Car,
but I want to manipulate the code so the two csv files print out in the same order, like:
Bike, Plane, Tractor, Train, Car
I can't use the likes of col[1], col[3], as the two files are in a different order, so I would need to call them by column name in the csv file, e.g. col["Truck"] etc.
Or is there another way, like creating a new list from the csv 1 output and rearranging it?
I haven't used BufferedReader much, so I'm not sure if this is a silly question and there's a simple solution.
A BufferedReader reads lines, and does not care for the content of those lines. So this code will simply save lines into ret as it is reading them:
List<String> ret = new ArrayList<>();
try (BufferedReader r = new BufferedReader(new InputStreamReader(str))) {
r.lines().skip(1).forEachOrdered(l -> ret.add(l.replace("\"", "")));
}
// now ret contains one string per CSV line, excluding the 1st
(This is somewhat better than your code in that it is guaranteed to close the reader correctly, and does not require any casts to string).
If your CSV lines do not contain any , characters that are not separators, you can modify the above code to split lines into columns; which you can then reorder:
List<String[]> ret = new ArrayList<>(); // list of string arrays
try (BufferedReader r = new BufferedReader(new InputStreamReader(str))) {
r.lines().skip(1).forEachOrdered(l ->
ret.add(l.replace("\"", "").split(","))); // splits by ','
}
// now ret contains a String[] per CSV line, skipping the 1st;
// with ret.get(0)[1] being the 2nd column of the 1st non-skipped line
// this will output all lines, reversing the order of columns 1 and 2:
for (String[] line : ret) {
System.out.print(line[1] + ", " + line[0]);
for (int i=2; i<line.length; i++) System.out.print(", " + line[i]);
System.out.println();
}
If your CSV lines can contain ,s that are not delimiters, you will need to learn how to correctly parse (=read) CSVs, and that requires significantly more than a BufferedReader. I would recommend using an external library to handle this correctly (for there are many types of CSVs in the wild). In particular, using Apache Commons CSV, things are relatively straightforward:
try (Reader in = new FileReader("path/to/file.csv")) {
Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
for (CSVRecord record : records) {
String columnOne = record.get(0);
String columnTwo = record.get(1);
}
}
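The question asks for columns to be picked by name rather than by position. One way to get there with the plain split approach is to read the header line instead of skipping it, map each column name to its index in that file, and then emit every row in the wanted order. The sketch below is illustrative only: ColumnReorder and reorder are hypothetical names, and it assumes no field contains an embedded comma.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ColumnReorder {

    // Rearranges every line (header included) into the order given by wantedOrder.
    // Assumes fields contain no embedded commas, so a plain split is safe.
    static List<String> reorder(List<String> csvLines, List<String> wantedOrder) {
        List<String> result = new ArrayList<>();
        if (csvLines.isEmpty()) {
            return result;
        }
        // Map each header name to its position in this particular file.
        String[] header = csvLines.get(0).replace("\"", "").split(",", -1);
        Map<String, Integer> indexByName = new HashMap<>();
        for (int i = 0; i < header.length; i++) {
            indexByName.put(header[i].trim(), i);
        }
        for (String line : csvLines) {
            String[] cells = line.replace("\"", "").split(",", -1);
            List<String> ordered = new ArrayList<>();
            for (String name : wantedOrder) {
                Integer idx = indexByName.get(name);
                // A column missing from this file or row becomes an empty field.
                ordered.add(idx != null && idx < cells.length ? cells[idx].trim() : "");
            }
            result.add(String.join(", ", ordered));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> wanted = List.of("Bike", "Plane", "Tractor", "Train", "Car");
        // Hypothetical sample rows in the column order of csv 1.
        List<String> csv1 = List.of("Car,Bike,Tractor,Plane,Train", "c1,b1,t1,p1,r1");
        reorder(csv1, wanted).forEach(System.out::println);
    }
}
```

Running the same reorder call over both files with the same wanted list should give both outputs the same column order, regardless of how each file arranges its header.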

Best way to populate a user defined object using the values of string array

I am reading two different csv files and populating two different objects from them. I split each line of the csv file on a regex (the regex is different for the two files) and populate the object from the resulting array, as shown below:
public static <T> List<T> readCsv(String filePath, String type) {
List<T> list = new ArrayList<T>();
try {
File file = new File(filePath);
FileInputStream fileInputStream = new FileInputStream(file);
InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
list = bufferedReader.lines().skip(1).map(line -> {
T obj = null;
String[] data = null;
if (type.equalsIgnoreCase("Student")) {
data = line.split(",");
ABC abc = new ABC();
abc.setName(data[0]);
abc.setRollNo(data[1]);
abc.setMobileNo(data[2]);
obj = (T)abc;
} else if (type.equalsIgnoreCase("Employee")) {
data = line.split("\\|");
XYZ xyz = new XYZ();
xyz.setName(Integer.parseInt(data[0]));
xyz.setCity(data[1]);
xyz.setEmployer(data[2]);
xyz.setDesignation(data[3]);
obj = (T)xyz;
}
return obj;
}).collect(Collectors.toList());
} catch (Exception e) {
}
return list;
}
csv files are as below:
i. csv file to populate ABC object:
Name,rollNo,mobileNo
Test1,1000,8888888888
Test2,1001,9999999990
ii. csv file to populate XYZ object
Name|City|Employer|Designation
Test1|City1|Emp1|SSE
Test2|City2|Emp2|
The issue is that there can be missing data for any of the above columns in the csv file, as shown in the second csv file. In that case, I get an ArrayIndexOutOfBoundsException.
Can anyone let me know what is the best way to populate the object using the data of the string array?
Thanks in advance.
In addition to the other mistakes you made that were pointed out to you in the comments, your actual problem is caused by line.split("\\|") calling line.split("\\|", 0), which discards trailing empty strings. You need to call line.split("\\|", -1) instead, and it will work.
The problem appears to be that one or more of the last values on any given CSV line may be empty. In that case, you run into the fact that String.split(String) suppresses trailing empty strings.
Supposing that you can rely on all the fields in fact being present, even if empty, you can simply use the two-arg form of split():
data = line.split(",", -1);
You can find details in that method's API docs.
If you cannot be confident that the fields will be present at all, then you can force them to be by adding delimiters to the end of the input string:
data = (line + ",,").split(",", -1);
Since you only use the first few values, any extra trailing values introduced by the extra delimiters would be ignored.
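To make the difference concrete, here is a small sketch using the second sample row from the question. SplitLimitDemo, splitKeepingEmpties and fieldOrEmpty are hypothetical names. The fieldOrEmpty helper is an alternative to appending extra delimiters and is not part of either answer above.

```java
class SplitLimitDemo {

    // Splits on a literal '|' and keeps trailing empty fields (limit -1).
    static String[] splitKeepingEmpties(String line) {
        return line.split("\\|", -1);
    }

    // Returns the field at the given index, or "" when the line has fewer fields.
    static String fieldOrEmpty(String[] fields, int index) {
        return index < fields.length ? fields[index] : "";
    }

    public static void main(String[] args) {
        String line = "Test2|City2|Emp2|";

        // One-argument split drops the trailing empty field.
        System.out.println(line.split("\\|").length);         // prints 3

        // Limit -1 keeps it, so index 3 exists and holds "".
        System.out.println(splitKeepingEmpties(line).length); // prints 4

        // A row that lacks the delimiters entirely is still read safely.
        String[] shortRow = splitKeepingEmpties("Test3|City3");
        System.out.println("[" + fieldOrEmpty(shortRow, 3) + "]"); // prints []
    }
}
```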

Write List into .csv file

I have a list and want to write the data into .csv file.
This is an example of what my list looks like:
Each element contains the values from a database table. I deleted the information here. For example, List[0] contains id = 1, name = Test, date = 02.02.2016, etc.
This is my code so far, but I have no idea how to continue.
List<String> lines = Arrays.asList("ColumnHeader", "ColumnHeader", " ColumnHeader","ColumnHeader","ColumnHeader");
Path file = Paths.get("test.csv");
Files.write(file, lines, Charset.forName("Windows-1252"));
Take a look at OpenCSV. This library contains tools for writing List and array objects to csv files, and even database result sets can be written directly to a file.
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"), '#');
java.sql.ResultSet myResultSet = ....
writer.writeAll(myResultSet, includeHeaders);
writer.close();
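If adding a library is not an option, the Files.write approach from the question can be continued with the JDK alone by turning each record into one CSV line first. The sketch below is only an illustration under assumptions: ListToCsv, escape and toCsvLine are hypothetical names, each record is assumed to already be a list of strings, and the quoting follows the usual RFC 4180 convention of doubling embedded quotes.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ListToCsv {

    // Quotes a field only when it contains the separator, a quote, or a line break;
    // embedded quotes are doubled.
    static String escape(String field, char separator) {
        boolean needsQuotes = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Joins the escaped fields of one record into a single CSV line.
    static String toCsvLine(List<String> fields, char separator) {
        List<String> escaped = new ArrayList<>();
        for (String field : fields) {
            escaped.add(escape(field, separator));
        }
        return String.join(String.valueOf(separator), escaped);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical records; the first row acts as the header.
        List<List<String>> rows = List.of(
                List.of("id", "name", "date"),
                List.of("1", "Test", "02.02.2016"));
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            lines.add(toCsvLine(row, ','));
        }
        Path file = Files.createTempFile("test", ".csv");
        Files.write(file, lines, Charset.forName("Windows-1252"));
        System.out.println(Files.readAllLines(file, Charset.forName("Windows-1252")));
    }
}
```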

Write to separate files in Apache Spark (with Java)

I am reading my data as whole text files. My object is of type Article which I defined. Here's the reading and processing of the data:
JavaPairRDD<String, String> filesRDD = context.wholeTextFiles(inputDataPath);
JavaRDD<Article> processingFiles = filesRDD.map(fileNameContent -> {
String content = fileNameContent._2();
Article a = new Article(content);
return a;
});
Now, once every file has been processed separately, I would like to write each result to HDFS as a separate file too, not with saveAsTextFile. I know that I probably have to do it with foreach, so:
processingFiles.foreach(a -> {
// Here is a pseudo code of how I want to do this
String fileName = here_is_full_file_name_to_write_to_hdfs;
writeToDisk(fileName, a); // This could be a simple text file
});
Any ideas how to do this in Java?

How to chain multiple different InputStreams into one InputStream

I'm wondering if there is any idiomatic way to chain multiple InputStreams into one continual InputStream in Java (or Scala).
What I need it for is to parse flat files that I load over the network from an FTP-Server. What I want to do is to take file[1..N], open up streams and then combine them into one stream. So when file1 comes to an end, I want to start reading from file2 and so on, until I reach the end of fileN.
I need to read these files in a specific order. The data comes from a legacy system that produces files in batches, so data in one file depends on data in another, but I would like to handle them as one continual stream to simplify my domain logic interface.
I searched around and found PipedInputStream, but I'm not positive that is what I need. An example would be helpful.
It's right there in JDK! Quoting JavaDoc of SequenceInputStream:
A SequenceInputStream represents the logical concatenation of other input streams. It starts out with an ordered collection of input streams and reads from the first one until end of file is reached, whereupon it reads from the second one, and so on, until end of file is reached on the last of the contained input streams.
You want to concatenate an arbitrary number of InputStreams, while the two-argument constructor of SequenceInputStream accepts only two. But since SequenceInputStream is also an InputStream, you can apply it recursively (nest them):
new SequenceInputStream(
new SequenceInputStream(
new SequenceInputStream(file1, file2),
file3
),
file4
);
...you get the idea.
See also
How do you merge two input streams in Java? (dup?)
This is done using SequenceInputStream, which is straightforward in Java, as Tomasz Nurkiewicz's answer shows. I had to do this repeatedly in a project recently, so I added some Scala-y goodness via the "pimp my library" pattern.
object StreamUtils {
implicit def toRichInputStream(str: InputStream) = new RichInputStream(str)
class RichInputStream(str: InputStream) {
// a bunch of other handy Stream functionality, deleted
def ++(str2: InputStream): InputStream = new SequenceInputStream(str, str2)
}
}
With that, I can do stream sequencing as follows
val mergedStream = stream1++stream2++stream3
or even
val streamList = //some arbitrary-length list of streams, non-empty
val mergedStream = streamList.reduceLeft(_++_)
Another solution: first create a list of input streams and then create the sequence of input streams:
List<InputStream> iss = Files.list(Paths.get("/your/path"))
.filter(Files::isRegularFile)
.map(f -> {
try {
return new FileInputStream(f.toString());
} catch (Exception e) {
throw new RuntimeException(e);
}
}).collect(Collectors.toList());
new SequenceInputStream(Collections.enumeration(iss));
Here is a more elegant solution using Vector. This example is for Android specifically, but a Vector can be used the same way in any Java code.
AssetManager am = getAssets();
Vector<InputStream> v = new Vector<>(Constant.PAGES);
for (int i = 0; i < Constant.PAGES; i++) {
String fileName = "file" + i + ".txt";
InputStream is = am.open(fileName);
v.add(is);
}
Enumeration<InputStream> e = v.elements();
SequenceInputStream sis = new SequenceInputStream(e);
InputStreamReader isr = new InputStreamReader(sis);
Scanner scanner = new Scanner(isr); // or use bufferedReader
Here's a simple Scala version that concatenates an Iterator[InputStream]:
import java.io.{InputStream, SequenceInputStream}
import scala.collection.JavaConverters._
def concatInputStreams(streams: Iterator[InputStream]): InputStream =
new SequenceInputStream(streams.asJavaEnumeration)
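For a quick check of the ordering guarantee without touching the file system, the Enumeration-based constructor can be exercised with in-memory streams. This is a minimal sketch. ConcatStreams, concat and readAll are hypothetical names, and readAllBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

class ConcatStreams {

    // Chains the streams in list order using the Enumeration constructor.
    static InputStream concat(List<? extends InputStream> streams) {
        return new SequenceInputStream(Collections.enumeration(streams));
    }

    // Drains the stream into a UTF-8 string and closes it afterwards.
    static String readAll(InputStream in) {
        try (InputStream closing = in) {
            return new String(closing.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small helper that stands in for a real file or FTP stream.
    static InputStream of(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        InputStream merged = concat(List.of(of("file1;"), of("file2;"), of("file3;")));
        System.out.println(readAll(merged)); // prints file1;file2;file3;
    }
}
```

Closing the merged stream also closes whichever underlying streams have not yet been read to the end, so a single try-with-resources around the SequenceInputStream is usually enough.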
