Here's the deal:
I was asked to develop a Java program that reorganises .tsv files (moving cells to do a kind of transposition).
So I tried to do it cleanly and now have 3 different packages.
Only tsvExceptions and tsvTranspositer are needed to make the main (TSVTransposer.java) work.
Yesterday I learned that I would have to implement it myself in Talend, which I had never heard of.
So, by searching, I came across this Stack Overflow topic. I followed the steps: I created a routine, copy/pasted my main into it (changing the package to "routines") and added the required external libraries (my two packages exported as jar files, plus openCSV). Now, when I open the routine, no error is shown, but I can't drag & drop it into the job I created!
Nothing happens. It just opens the component info, as shown, with "Properties not available."
package routines;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import tsvExceptions.ArgsExceptions;
import tsvExceptions.EmptyArgsException;
import tsvExceptions.OutOfBordersArgsException;
import tsvTranspositer.CommonLine;
import tsvTranspositer.HeadOfValuesHandler;
import tsvTranspositer.InputFile;
import tsvTranspositer.OutputFile;
public class tsvRoutine {
public static void main(String[] args) throws ArgsExceptions {
// Boolean set to true while everything is good
Boolean everythingOk = true;
String inputFile = null; // Name of the entry file to be transposed.
String outputFile = null; // Name of the output file.
int serieNb = 1 ; // Number of columns before the actual values in the input file. Can be columns describing the product as well as empty columns before the values.
int linesToCopy = 0; // Number of lines composing the header of the file (those lines will be copy/pasted in the output)
/*
* Handling the arguments first.
*/
try {
switch (args.length) {
case 0:
throw new EmptyArgsException();
case 1:
inputFile = args[0];
String[] parts = inputFile.split("\\.");
// If no outPutFile name is given, will add "Transposed" to the inputFile Name
outputFile = parts[0] + "Transposed." + parts[1];
break;
case 2:
inputFile = args[0];
outputFile = args[1];
break;
case 3:
inputFile = args[0];
outputFile = args[1];
serieNb = Integer.parseInt(args[2]);
break;
case 4:
inputFile = args[0];
outputFile = args[1];
serieNb = Integer.parseInt(args[2]);
linesToCopy = Integer.parseInt(args[3]);
break;
default:
inputFile = args[0];
outputFile = args[1];
serieNb = Integer.parseInt(args[2]);
linesToCopy = Integer.parseInt(args[3]);
throw new OutOfBordersArgsException();
}
}
catch (ArgsExceptions a) {
a.notOk(everythingOk);
}
catch (NumberFormatException n) {
System.out.println("Arguments 3 & 4 should be numbers."
+ " Number 3 is the Number of columns before the actual values in the input file. \n"
+ "(Can be columns describing the product as well as empty columns before the values. (1 by default)) \n"
+ "Number 4 is the number of lines to copy/pasta. (0 by default) \n"
+ "Please try again.");
everythingOk = false;
}
// Creating an InputFile and an OutputFile
InputFile ex1 = new InputFile(inputFile, linesToCopy);
OutputFile ex2 = new OutputFile(outputFile);
if (everythingOk) {
try ( FileReader fr = new FileReader(inputFile);
CSVReader reader = new CSVReader(fr, '\t', '\'', 0);
FileWriter fw = new FileWriter(outputFile);
CSVWriter writer = new CSVWriter(fw, '\t', CSVWriter.NO_QUOTE_CHARACTER))
{
ex1.setReader(reader);
ex2.setWriter(writer);
// Reading the header of the file
ex1.readHead();
// Writing the header of the file (copy/pasta)
ex2.write(ex1.getHeadFile());
// Handling the line containing the columns names
HeadOfValuesHandler handler = new HeadOfValuesHandler(ex1.readLine(), serieNb);
ex2.writeLine(handler.createOutputHOV());
// Each line will be read and written (as multiple lines) one after the other.
String[] row;
CommonLine cl1;
// If the period is monthly
if (handler.isMonthly()) {
while (!ex1.isAllDone()) {
row = ex1.readLine();
if (!ex1.isAllDone()) {
cl1 = new CommonLine(row, handler.getYears(), handler.getMonths(), serieNb);
ex2.write(cl1.exportOutputLines());
}
}
}
// If the period is yearly
else {
while (!ex1.isAllDone()) {
row = ex1.readLine();
if (!ex1.isAllDone()) {
cl1 = new CommonLine(row, handler.getYears(), serieNb);
ex2.write(cl1.exportOutputLines());
}
}
}
}
catch (FileNotFoundException f) {
System.out.println(inputFile + " can't be found. Cancelling...");
}
catch (IOException e) {
System.out.println("Unknown exception raised.");
e.printStackTrace();
}
}
}
}
I know the exceptions aren't handled correctly yet, but they are in a hurry for it to work in some way.
Another problem that will come up later is that I have no idea how to pass the required arguments to the program.
Anyway, thanks for reading this post!
You cannot add routines to a job by drag and drop. You need to access the routine's functions through components.
For example, you would start with a tFileListInput to get all the files you need. Then you could add a tFileInputDelimited where you describe all the fields of your input. After this, with e.g. a tJavaRow component, you can write some code which accesses your routine.
NOTE: Keep in mind that Talend usually works row-wise. This means that your routines should handle data in a row-wise manner, which may also mean that your code has to be refactored accordingly. A main function won't work; it has to become at least a class which can be instantiated or which has static functions.
If you want to handle everything on your own, instead of a tJavaRow component you might use a tJava component which adds more flexibility.
Still, it won't be as easy as simply adding the routine and everything will work.
In fact, the whole code can become a job on its own. Talend generates the whole Java code for you:
The parameters can become Context variables.
The check that the numbers really are numbers could be done in several ways, for example with a tPreJob and a tJava.
The input file could be read with a tFileInputDelimited with a tab separator (the input is a .tsv file).
Then, every row will be processed with either a tJavaRow with your custom code or with a tMap if it's not too complex.
Afterwards, you can write the file with a tFileOutputDelimited component
Everything will get connected via right click / main to iterate over the rows
All exception handling is done by Talend. If you want to react to exceptions, you can use a component like tLogRow.
Hope this helps a bit to set the direction.
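As a rough illustration of the "static functions" point above, a routine could look like the sketch below. The class and method names (TsvRoutineSketch, defaultOutputName, parseCountOrDefault) are made up for this example and are not part of Talend or of the original program; they only lift the argument handling out of the main method into static helpers. In Talend the class would additionally be declared public and placed in the routines package.

```java
// Hypothetical routine: static helpers that a tJava / tJavaRow component could call.
// In Talend this would start with "package routines;" and the class would be public.
final class TsvRoutineSketch {

    private TsvRoutineSketch() {
        // static helpers only, no instances
    }

    // Derives "nameTransposed.ext" from "name.ext". Unlike the original
    // split("\\.") approach, this also copes with names that have no dot
    // or more than one dot.
    static String defaultOutputName(String inputFile) {
        int dot = inputFile.lastIndexOf('.');
        if (dot < 0) {
            return inputFile + "Transposed";
        }
        return inputFile.substring(0, dot) + "Transposed" + inputFile.substring(dot);
    }

    // Parses a numeric parameter and falls back to a default instead of throwing.
    static int parseCountOrDefault(String raw, int fallback) {
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

Inside a tJavaRow, such a helper could then be called once per row, for example output_row.outputName = TsvRoutineSketch.defaultOutputName(input_row.fileName); where fileName and outputName are hypothetical schema columns.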
So I'm currently taking AP Comp Sci A and am trying to learn Java, and while developing a small program to keep up with my teacher I've run into a couple issues.
The program is intended to take entries and log them in a journal. I'd eventually like to have it stored in an HTML format and then be able to email my logs to a teacher in an HTML table, but this issue is preventing that.
Basically, with my catch, I'm trying to create the file and then write the starting HTML code (<html>, <body>, and then the necessary tags for the HTML table), but even if the file doesn't exist the catch isn't running correctly, presumably because of the 'throws IOException' Eclipse had me add.
I also attempted to add commands to my program, but nothing happens when used. No exceptions thrown, nothing printed, etc.
Here's my code:
import java.util.Scanner;
import java.util.Calendar;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
public class LogSend {
/**
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
System.out.println("");
System.out.println("Type '!help' for commands");
Scanner cmd = new Scanner(System.in);
String initialCmd = cmd.nextLine();
if (initialCmd == "!help") {
System.out.println("The commands are:");
System.out.println("'!help' for commands");
System.out.println("'!log' to log an entry");
} else if (initialCmd == "!log") {
String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(Calendar.getInstance().getTime());
System.out.println(timeStamp);
Scanner entry = new Scanner(System.in);
String journalEntry = entry.nextLine();
try (PrintWriter saveLog = new PrintWriter(new FileWriter("log.html", true))) {
saveLog.println(timeStamp+":"+"<b>"+journalEntry+"</b>"+"<br>");
}
}
/*String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(Calendar.getInstance().getTime());
System.out.println(timeStamp);
Scanner entry = new Scanner(System.in);
String journalEntry = entry.nextLine();
try (PrintWriter saveLog = new PrintWriter(new FileWriter("log.html", true))) {
saveLog.println(timeStamp+":"+"<b>"+journalEntry+"</b>"+"<br>");
} /*catch(FileNotFoundException e) {
FileWriter fileWriter = new FileWriter("log.html");
PrintWriter saveLog = new PrintWriter(fileWriter);
saveLog.println("<html>");
saveLog.println("<body>");
saveLog.println(timeStamp+":"+journalEntry);
saveLog.println("</body>");
saveLog.println("</html>");
}*/
}
}
Sorry if this is all a bit stupid, I'm brand new to Java and find I learn best through just making programs. I appreciate the help.
NOTE: It's worth noting I've commented out the catch because it's simply not working, but that's the code I used.
First, String comparison is done with the equals method, not with ==:
initialCmd == "!help" // compares object references, not contents
initialCmd.equals("!help") // Or equalsIgnoreCase
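To make the difference concrete, here is a minimal sketch of the command check rewritten with equalsIgnoreCase. The class name CommandMatcher and the returned labels are invented for this example; only the "!help" and "!log" commands come from the question.

```java
// Hypothetical helper: maps raw console input to a command label.
final class CommandMatcher {

    private CommandMatcher() {
        // static helper only
    }

    static String describe(String input) {
        // Trim so that stray spaces around the command do not break the match
        String cmd = (input == null) ? "" : input.trim();
        // Calling equalsIgnoreCase on the literal avoids a NullPointerException
        if ("!help".equalsIgnoreCase(cmd)) {
            return "help";
        }
        if ("!log".equalsIgnoreCase(cmd)) {
            return "log";
        }
        return "unknown";
    }
}
```

A string read from a Scanner is a different object from the "!help" literal, which is why == was false in the original program even when the text matched.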
Then
try (PrintWriter saveLog = new PrintWriter(new FileWriter("log.html", true))) {
saveLog.println(timeStamp+":"+"<b>"+journalEntry+"</b>"+"<br>");
}
is okay, as it writes to the file in append mode (the true) and will almost never throw a FileNotFoundException (=could not create file).
You may do:
try (...
...
} catch (IOException e) {
e.printStackTrace();
}
This works because FileNotFoundException is also an IOException, and it additionally covers cases such as running out of disk space, missing permissions, or a wrong directory path.
One remark: FileWriter will use the default (=platform) encoding. For
the full Unicode range that a String is capable of, you could use UTF-8:
try (PrintWriter saveLog = new PrintWriter(
Files.newBufferedWriter(Paths.get("log.html"),
StandardCharsets.UTF_8,
StandardOpenOption.APPEND,
StandardOpenOption.CREATE))) {
saveLog.printf("%s:<b>%s</b><br>%n", timeStamp, journalEntry);
}
Also note that one cause of an exception might be two threads logging to the file at the same time.
It would not be wrong to invest time in the java.util.logging framework, which can be customized to provide about the same functionality, and more.
The code above does not throw an exception for a non-existing file; it simply creates the file and writes the same content to it.
For HTML one would like to write a beginning. One can do that by including the log file in a real HTML file:
<!DOCTYPE html [
<!ENTITY log SYSTEM "log.html">
]>
<html>
<head>
<title>Logs</title>
<meta charset="UTF-8">
<meta http-equiv="refresh" content="5"> <!-- reload every 5s -->
</head>
<body>
&log;
</body>
</html>
@index.php For clarification, passing true as the 2nd argument of FileWriter (as @JoopEggen suggests) causes the file to be created if it doesn't exist and to be opened for appending if it does exist. Therefore you won't hit the catch, as you'll create a new file log.html and append to it once created. Isn't that what you're trying to do anyway? So if the catch is only there to create the file when it doesn't exist, it is no longer required.
On the other hand, if you're trying to log the fact the file never existed you can use java.io.File and do something like:
File f = new File(filePathString);
// exists() also returns true for directories, so a directory counts as "no log file"
if (!f.exists() || f.isDirectory()) {
    // log that file didn't exist
    // create file and append custom error
}
Though with this method you need to be careful not to run into race conditions.
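One way to sidestep that race is to let the file system perform the check and the creation in a single step. The sketch below is only an illustration under that assumption: HtmlLogSketch and append are invented names, and the opening tags written are the ones from the question's commented-out catch block. It opens the file with StandardOpenOption.CREATE_NEW, which fails with FileAlreadyExistsException when the file is already there, and writes the HTML opening tags only when the file was really created by this call.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for the journal log described in the question.
final class HtmlLogSketch {

    private HtmlLogSketch() {
        // static helper only
    }

    // Appends one entry; returns true if this call created the file.
    static boolean append(Path log, String timeStamp, String entry) {
        boolean created = false;
        try {
            // CREATE_NEW checks for existence and creates the file atomically
            try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(log,
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE))) {
                out.println("<html>");
                out.println("<body>");
                created = true;
            } catch (FileAlreadyExistsException e) {
                // The file was already there, so the header has been written before
            }
            // In both cases the entry itself is appended
            try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(log,
                    StandardCharsets.UTF_8,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND))) {
                out.printf("%s:<b>%s</b><br>%n", timeStamp, entry);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return created;
    }
}
```

Note that this sketch never writes the closing </body> and </html> tags, since the file is always open for further entries.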
Hope this helps, let me know how you get on!
I have, for example, 1000 images whose names are all very similar; they differ only in the number: "ImageNmbr0001", "ImageNmbr0002", ..., "ImageNmbr1000".
I would like to get every image and store them into an ImageProcessor Array.
So, for example, if I use a method on an element of this array, then this method is applied to the picture, for example to count the black pixels in it.
I can use a for loop to get the numbers from 1 to 1000, turn them into strings, attach each one to the common part of the file name, and load the image with that name.
However, I would still have to turn it somehow into an element I can store in an array, and I don't yet know a method that receives a string (in fact the file path) and returns the respective ImageProcessor stored at that path.
Also, my approach at the moment seems rather clumsy and not too elegant. So I would be very happy if someone could show me a better way to do that using methods from these packages:
import ij.ImagePlus;
import ij.plugin.filter.PlugInFilter;
import ij.process.ImageProcessor;
I think I found a solution:
import ij.io.Opener;
Opener opener = new Opener();
String imageFilePath = "somePath";
ImagePlus imp = opener.openImage(imageFilePath);
ImageProcessor ip = imp.getProcessor();
That does the job, but thank you for your time and effort.
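For the file-name part of the problem, a possible sketch is shown below. It only builds the zero-padded paths in plain Java; NumberedImagePaths, build and the directory and extension parameters are assumptions made for this example, since the question does not say where the images live or which format they use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: generates paths such as "imgs/ImageNmbr0001.png".
final class NumberedImagePaths {

    private NumberedImagePaths() {
        // static helper only
    }

    static List<String> build(String directory, String prefix, String extension,
                              int first, int last) {
        List<String> paths = new ArrayList<>();
        for (int i = first; i <= last; i++) {
            // %04d pads the number with leading zeros to four digits
            paths.add(String.format(Locale.ROOT, "%s/%s%04d%s",
                    directory, prefix, i, extension));
        }
        return paths;
    }
}
```

Each generated path could then be passed to opener.openImage(path), and the result of getProcessor() stored at the matching index of an ImageProcessor[].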
I'm not sure I understand exactly what you want, but I definitely would not save the information for each image in a separate file, for 2 reasons:
- It's slower to save and read the content of multiple files compared with 1 medium-sized file
- Each file adds overhead (files need a path, a minimum size on disk, etc.)
If you want performance, group multiple image descriptions in a single description file.
If you don't want to make a binary description file, you can always use a database, which is built for this and performs well on reads and normally on saves.
I don't know exactly what your needs are, but I guess you can try making a binary file with fixed-size data and reading it later.
Example:
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public static void main(String[] args) throws IOException {
    // Write 1000 fixed-size (4-byte) integers; try-with-resources closes the stream
    try (DataOutputStream dout = new DataOutputStream(new FileOutputStream("description.bin"))) {
        for (int x = 0; x < 1000; x++) {
            dout.writeInt(10); // Write Int data
        }
    }
    // Read the same 1000 integers back in order
    try (DataInputStream din = new DataInputStream(new FileInputStream("description.bin"))) {
        for (int x = 0; x < 1000; x++) {
            System.out.println(din.readInt()); // Read Int data
        }
    }
}
In this example, the code writes integers to the "description.bin" file and then reads them back.
This is pretty fast in Java, since Java uses "channels" for files by default
Problem: I want to read a section of a file from HDFS and return it, such as lines 101-120 from a file of 1000 lines.
I don't want to use seek because I have read that it is expensive.
I have log files which I am using PIG to process down into meaningful sets of data. I've been writing an API to return the data for consumption and display by a front end. Those processed data sets can be large enough that I don't want to read the entire file out of Hadoop in one slurp to save wire time and bandwidth. (Let's say 5 - 10MB)
Currently I am using a BufferedReader to return small summary files which is working fine
ArrayList lines = new ArrayList();
...
for (FileStatus item: items) {
// ignoring files like _SUCCESS
if(item.getPath().getName().startsWith("_")) {
continue;
}
in = fs.open(item.getPath());
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String line;
line = br.readLine();
while (line != null) {
line = line.replaceAll("(\\r|\\n)", "");
lines.add(line.split("\t"));
line = br.readLine();
}
}
I've poked around the interwebs quite a bit as well as Stack but haven't found exactly what I need.
Perhaps this is completely the wrong way to go about doing it and I need a completely separate set of code and different functions to manage this. Open to any suggestions.
Thanks!
As an added note, based on research from the discussions below:
How does Hadoop process records split across block boundaries?
Hadoop FileSplit Reading
I think seek is the best option for reading large files. It did not cause me any problems, and the volume of data I was reading was in the range of 2 - 3 GB. I have not encountered any issues so far, but we did use file splitting to handle the large data set. Below is code you can use for reading and testing.
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.json.JSONObject;

public class HDFSClientTesting {
    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            //System.loadLibrary("libhadoop.so");
            Configuration conf = new Configuration();
            // Load the site configuration before creating the FileSystem so it takes effect
            conf.addResource(new Path("core-site.xml"));
            FileSystem fs = FileSystem.get(conf);
            String filename = "/dir/00000027";
            long byteOffset = 3185041;
            // try-with-resources closes the reader when done
            try (SequenceFile.Reader rdr = new SequenceFile.Reader(fs, new Path(filename), conf)) {
                Text key = new Text();
                Text value = new Text();
                rdr.seek(byteOffset); // jump straight to the record at this offset
                rdr.next(key, value);
                // Plain text
                JSONObject jso = new JSONObject(value.toString());
                String content = jso.getString("body");
                System.out.println("\n\n\n" + content + "\n\n\n");
            }
            File file = new File("test.gz");
            file.createNewFile();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
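For comparison, when the target is a range of line numbers rather than a known byte offset, a seek-free sketch is possible with a plain reader. This is not the approach of the answer above, and LineRangeSketch and readLines are invented names. It assumes the stream returned by fs.open(...) in the question is wrapped in an InputStreamReader and passed in as the source. The skipped lines are still read from the stream, so this only saves memory and post-processing, not transfer.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: returns lines firstLine..lastLine (1-based, inclusive).
final class LineRangeSketch {

    private LineRangeSketch() {
        // static helper only
    }

    static List<String> readLines(Reader source, long firstLine, long lastLine) {
        BufferedReader br = new BufferedReader(source);
        // skip() discards the leading lines, limit() stops reading after the range.
        // The caller remains responsible for closing the source.
        return br.lines()
                .skip(firstLine - 1)
                .limit(lastLine - firstLine + 1)
                .collect(Collectors.toList());
    }
}
```

For the example in the question, readLines(reader, 101, 120) would return the 20 requested lines of the 1000-line file.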
I have strings which look like this -
String text = "item1, item2, item3, item4 etc..."
I made java code to write these strings to a text file which will be converted to csv by simply changing the extension. The logic is - print a string, then move to new line and print another string.
Output in text file was perfect when test strings had only 10-20 items.
BUT, my real strings have about 3000 unique items each. There are about 20,000 such strings.
When I write all these strings to the text file, it gets messed up.
I see 3000 rows instead of 20,000 rows.
I think there is no need for code for this problem because it's been done and tested.
I only need to be able to format my data properly.
For those who want to see the code -
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
public class Texty {
public static void main(String[] args) {
System.out.println("start");
String str = "";
String enter = System.getProperty( "line.separator" );
for(int i = 0; i< 5; i++){
str = str + i + ",";
}
str = str + 5;
System.out.println(str);
FileWriter fw = null;
File newTextFile = new File("C:\\filez\\output.txt");
try {
fw = new FileWriter(newTextFile);
} catch (IOException e) {
e.printStackTrace();
}
try {
for(int i = 0; i < 10; i++){
fw.write(str + enter);
}
fw.close();
} catch (IOException iox) {
//do stuff with exception
iox.printStackTrace();
}
System.out.println("stop");
}
}
You are right that there is no difference between 10 columns and 3000 columns; you just have longer lines.
Also, there is no difference between 10 rows and 20,000 rows; you just have more lines.
While you can have much, much larger files in Java or on your file system, some old versions of Excel could not load so many columns (they had a limit of 256 columns) or such large files (they had a limit of about 1 GB of raw data).
I would check that the file is correct in another program, e.g. one you wrote, and you might find all the data is there.
If the data is not there, you have a bug. There is no limitation in Java, Windows or Linux which would explain the behaviour you are seeing.
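A small checker along the following lines could serve as that "other program". It is only a sketch with invented names (CsvShapeSketch, measure), and it assumes plain comma-separated lines without quoted fields, as in the question's sample strings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical checker: counts rows and the widest row of a comma-separated file.
final class CsvShapeSketch {

    private CsvShapeSketch() {
        // static helper only
    }

    // Returns {rowCount, maxColumnCount}.
    static int[] measure(Reader source) {
        int rows = 0;
        int maxCols = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                rows++;
                // The -1 limit keeps trailing empty fields in the count
                maxCols = Math.max(maxCols, line.split(",", -1).length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {rows, maxCols};
    }
}
```

Passing a FileReader for the generated output file should report about 20,000 rows and 3000 columns if the writer is working; a result of 3000 rows would point to a bug in how the strings are assembled.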
I'm making a simple paint program and am stuck with getting a certain part of a string.
Here's the trouble - When I save the 9-panel image, it stores the RGB values of each panel to a .txt file. Example:
java.awt.Color[r=0,g=0,b=0]
java.awt.Color[r=255,g=255,b=255]
java.awt.Color[r=255,g=0,b=0]
java.awt.Color[r=0,g=0,b=255]
java.awt.Color[r=0,g=0,b=0]
java.awt.Color[r=255,g=255,b=0]
java.awt.Color[r=255,g=255,b=0]
java.awt.Color[r=255,g=0,b=0]
java.awt.Color[r=0,g=0,b=255]
From here, I call a scanner to read the lines of our file. I just need to find the best way to extract the values inside the [ ] to a String. I've tried using a tokenizer to no avail, still being stuck with excess Strings. I've tried manipulating characters but again failed. What would be the best way to go about extracting the data from our brackets? AND would it be easier to store the individual r=xxx, g=xxx, b=xxx values to a String[]? Thanks, and here is the source I have so far:
import java.awt.Color;
import java.io.*;
import java.lang.*;
import java.util.*;
//when finished, organize imports (narrow down what imports were used)
public class SaveLoad {
private boolean tryPassed, tryPassed2;
private Formatter x;
//final String[] rawData; will be where the rgb raws are stored
private Scanner xReader;
public void save(Color[] c, String s) {
//s is the filename
int counter = c.length;
//Tries to create a file and, if it does, adds the data to it.
try{
x = new Formatter(s+".txt");
tryPassed = true;
while(counter>0) {
x.format("%s. %s\n", (c.length-(counter-1)), c[counter-1]);
counter--;
}
x.close();
}catch (Exception e){
e.printStackTrace();
tryPassed = false;
}
}
//load will take parameters of a filename (String); NOTE: make the file loaded specify an extension (e.g. .pixmap)
//MAYBE add a load interface with a jDropdownmenu for the filetype? add parameter String filetype.
public void load(String s, String filetype) {
//loads the file and, if successful, attempts to read it.
try{
xReader = new Scanner(new File(s+filetype));
tryPassed2 = true;
}catch(Exception e){
e.printStackTrace();
tryPassed2 = false;
System.out.println(s+filetype+" is not a valid file");
}
while(xReader.hasNext()&&tryPassed2==true) {
String inBrackets = xReader.next().substring(17);
System.out.println(inBrackets);
}
}
}
Also, ignore my messy notations.
The best way is to change the storage format. At least two options:
- Comma-separated values. Store r,g,b on each line, for example 215,222,213. Then you can call line.split(",") to obtain a String[] of the values.
- Serialize the whole Color array using ObjectOutputStream.
I would advise changing the format. But if you insist on your own, use a regex:
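import java.util.regex.Matcher;
import java.util.regex.Pattern;

String st = "java.awt.Color[r=0,g=0,b=0]";
// Escape the dots and brackets, and capture only digits for each channel
Pattern p = Pattern.compile("java\\.awt\\.Color\\[r=(\\d+),g=(\\d+),b=(\\d+)\\]");
Matcher m = p.matcher(st);
if (m.matches()) {
    System.out.println("r=" + m.group(1));
    System.out.println("g=" + m.group(2));
    System.out.println("b=" + m.group(3));
}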
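For the comma-separated option recommended above, a minimal sketch of the round trip might look like this. ColorLineSketch, toLine and fromLine are invented names for the example; the format is simply "r,g,b" per line, as in the 215,222,213 example.

```java
import java.awt.Color;

// Hypothetical helper: converts a Color to an "r,g,b" line and back.
final class ColorLineSketch {

    private ColorLineSketch() {
        // static helper only
    }

    static String toLine(Color c) {
        return c.getRed() + "," + c.getGreen() + "," + c.getBlue();
    }

    static Color fromLine(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected r,g,b but got: " + line);
        }
        // Integer.parseInt rejects anything that is not a plain number
        return new Color(Integer.parseInt(parts[0].trim()),
                Integer.parseInt(parts[1].trim()),
                Integer.parseInt(parts[2].trim()));
    }
}
```

With this format, save would write toLine(c) for each panel and load would call fromLine on each line the Scanner returns from nextLine(), so no substring offsets are needed.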