I have strings which look like this:
String text = "item1, item2, item3, item4 etc...";
I wrote Java code to write these strings to a text file, which will be converted to CSV by simply changing the extension. The logic is: print a string, then move to a new line and print another string.
The output in the text file was perfect when the test strings had only 10-20 items.
But my real strings have about 3000 unique items each, and there are about 20,000 such strings.
When I write all these strings to the text file, it gets messed up.
I see 3000 rows instead of 20,000 rows.
I think there is no need for code for this problem because it's been done and tested.
I only need to be able to format my data properly.
For those who want to see the code -
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
public class Texty {
public static void main(String[] args) {
System.out.println("start");
String str = "";
String enter = System.getProperty( "line.separator" );
for(int i = 0; i< 5; i++){
str = str + i + ",";
}
str = str + 5;
System.out.println(str);
FileWriter fw = null;
File newTextFile = new File("C:\\filez\\output.txt");
try {
fw = new FileWriter(newTextFile);
} catch (IOException e) {
e.printStackTrace();
}
try {
for(int i = 0; i < 10; i++){
fw.write(str + enter);
}
fw.close();
} catch (IOException iox) {
//do stuff with exception
iox.printStackTrace();
}
System.out.println("stop");
}
}
You are right that there is no difference between 10 columns and 3000 columns; you just have longer lines.
Also there is no difference between 10 rows and 20,000 rows; you just have more lines.
While you can have much, much larger files in Java or on your file system, some old versions of Excel could not load so many columns (they had a limit of 256 columns) or such large files (they had a limit of about 1 GB of raw data).
I would check that the file is correct in another program, e.g. one you wrote, and you might find all the data is there.
If the data is not there, you have a bug. There is no limitation in Java, Windows or Linux which would explain the behaviour you are seeing.
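One way to do that check without a spreadsheet is a small program that counts the rows and the fields per row. The following is only a sketch: the class name CsvShapeCheck and its shape method are made up for illustration, the default path is the one from the question, and it assumes that no item contains a comma or a line break.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports how many rows a file has and how many
// comma-separated fields each row contains.
class CsvShapeCheck {

    // Returns {rowCount, minFieldsPerRow, maxFieldsPerRow} for the given file.
    static int[] shape(Path file) throws IOException {
        int rows = 0;
        int min = Integer.MAX_VALUE;
        int max = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows++;
                // limit -1 keeps trailing empty fields so they are counted too
                int fields = line.split(",", -1).length;
                min = Math.min(min, fields);
                max = Math.max(max, fields);
            }
        }
        return new int[] { rows, rows == 0 ? 0 : min, max };
    }

    public static void main(String[] args) throws IOException {
        // Path from the question; pass a different path as the first argument if needed.
        Path file = Paths.get(args.length > 0 ? args[0] : "C:\\filez\\output.txt");
        int[] s = shape(file);
        System.out.println("rows=" + s[0] + ", fields per row: min=" + s[1] + ", max=" + s[2]);
    }
}
```

If this reports 20,000 rows with about 3000 fields each, the file itself is fine and the problem is in the program used to view it.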
I'm fairly new at programming and programming in Java. Currently I'm learning about Exceptions and how to handle them. In this one lab I am given multiple text files containing numbers in a grid like manner. I was told to use the command-line arguments to run all the files at once.
The way the text files look (screenshot)
All files are named valid or invalid. The invalid files each have some unique error in them that I have to check for. The first line of each text file gives the total number of rows and columns. I'm stuck trying to figure out how I can check the validity of three text files: invalid2, invalid5, and invalid7. I can write some bad code that catches the errors in files invalid5 and invalid7 (which I don't want to do), but invalid2 is really giving me a hard time. The problem with invalid2 is that the first line in the file tells me that the grid has 3 rows and 4 columns, but the actual grid has 4 rows and 3 columns.
invalid2 text file screenshot
I am currently trying to break the grids into individual rows. My logic (and here is where I want to know if it's faulty) is that I use a while loop that checks whether there is a next line in the text file. If there is, I increase my row counter (even though I already know how many rows the grid has). I then check whether the row counter is bigger than the number of rows given, in which case I throw an exception saying that there are too many rows. If not, I move on to parsing that row of the grid. I use a String "nextLine" that contains all the numbers in that one row of the grid and parse that string with the Scanner "parse". I use a for loop that runs as many times as there are columns (the number given by the file). I then set the variable "num" of type double to the next double in that string. At this point my program breaks down: I don't know why I'm getting a NoSuchElementException at that line in my code. I don't really need to do anything with the information I'm reading from the files, so the "num" variable is just there to move on to the next number in the row.
I have two questions
Why am I getting the NoSuchElementException in that part of my code?
Is my logic correct in how I'm trying to check for the validity of each text file?
The output to the console should look something like this:
valid1.dat
VALID
invalid1.dat
java.lang.NumberFormatException: For input string: "X"
INVALID
etc...
This is my code:
public class FormatChecker {
public static void main(String[] args) {
for(int n = 3; n < args.length; n++) { // Loops through command-line arguments
String fileName = args[n]; // Sets specific file
try {
File file = new File(fileName); // new File from specified file name
Scanner read = new Scanner(file); // Scanner object to read the new file
String firstLine = read.nextLine(); // String to hold the first line of the file
Scanner parse = new Scanner(firstLine); // Scanner object to parse the first line (reads first line)
int row = parse.nextInt(); // Gets first token from Scanner object 'parse' to get # of rows
int col = parse.nextInt(); // Gets second token from Scanner object 'parse' to get # of columns
if(parse.hasNext()) { // Checks if there are more than two tokens (for invalid4.dat)
throw new GridSizeException("Too many dimensions"); // If yes, throws created exception "GridSizeException"
} else { // HERE IS WHERE I NEED HELP W/ MY LOGIC!
int rowCount = 0; // Row counter even though we already know the # of rows
while(read.hasNextLine()) { // While loop that loops until the end of the file
rowCount++; // row count is increased
if(rowCount > row) { // if row counter is > than our given # of rows, we will throw an exception
throw new GridSizeException("Too many rows");
}
String nextLine = read.nextLine(); // String to hold the next line of the file
parse = new Scanner(nextLine); // For parsing that next line
for(int j = 0; j < col; j++) { // Loops the same # of times as the # of columns
double num = parse.nextDouble(); // Move through the String
if(j == col-1 && parse.hasNext()) { // Checks if there are more numbers in that row than expected.
throw new GridSizeException("Too many columns");
}
}
}
System.out.println(fileName);
System.out.println("VALID");
System.out.println();
}
} catch (FileNotFoundException e){
System.out.println(fileName);
System.out.println(e.toString() + " for input string: " + fileName);
System.out.println("INVALID\n");
} catch (NumberFormatException e) {
System.out.println(fileName);
System.out.println(e.toString() + " for input string: " + fileName);
System.out.println("INVALID\n");
} catch (InputMismatchException e) {
System.out.println(fileName);
System.out.println(e.toString() + " for input string: " + fileName);
System.out.println("INVALID\n");
} catch (GridSizeException e) {
System.out.println(fileName);
System.out.println(e.toString() + " for input string: " + fileName);
System.out.println("INVALID\n");
}
}
}
}
When you have read a line with the scanner, use string.split(" ") (or whatever separates the values).
The result of split() is an array of strings, and you can:
check the number of values with array.length (and decide whether there are enough values at all, or there is an error)
convert the individual strings to numbers and catch the exception (if a number is not valid)
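The two checks above can be sketched roughly as follows. This is an illustration only: GridRowCheck and parseRow are made-up names, and the sketch assumes the values in a row are separated by whitespace. It also hints at the likely source of the NoSuchElementException in the question: Scanner.nextDouble() throws it when the row runs out of tokens, which happens when a row has fewer values than the declared column count. Comparing the token count first avoids that.

```java
// Hypothetical helper illustrating the split-and-validate approach.
class GridRowCheck {

    // Parses one grid row. Throws IllegalArgumentException if the row does not
    // contain exactly expectedCols values, and NumberFormatException (a subclass
    // of IllegalArgumentException) if a value is not a valid number.
    static double[] parseRow(String line, int expectedCols) {
        // Assumption: values are separated by one or more whitespace characters
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != expectedCols) {
            throw new IllegalArgumentException(
                    "Expected " + expectedCols + " values but found " + tokens.length);
        }
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseRow("1.5 2 3", 3).length);
        try {
            parseRow("1 2 3", 4); // row is shorter than the declared column count
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the question's code, the count check would take the place of the inner for loop, with GridSizeException thrown instead of IllegalArgumentException.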
Here's the deal :
I was asked to develop a Java program that reorganises .tsv files (moving cells to do a kind of transposition).
So I tried to do it cleanly and now have 3 different packages.
Only tsvExceptions and tsvTranspositer are needed to make the main (TSVTransposer.java) work.
Yesterday I learned that I would have to implement it myself in Talend, which I had never heard of.
So, by searching, I came across this Stack Overflow topic. I followed the steps: creating a routine, copy/pasting my main inside it (changing the package to "routines") and adding the needed external libraries to it (my two packages exported as jar files, and openCSV). Now, when I open the routine, no error is shown, but I can't drag and drop it into the job I created!
Nothing happens. It just opens the component info, as shown, with "Properties not available."
package routines;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import tsvExceptions.ArgsExceptions;
import tsvExceptions.EmptyArgsException;
import tsvExceptions.OutOfBordersArgsException;
import tsvTranspositer.CommonLine;
import tsvTranspositer.HeadOfValuesHandler;
import tsvTranspositer.InputFile;
import tsvTranspositer.OutputFile;
public class tsvRoutine {
public static void main(String[] args) throws ArgsExceptions {
// Boolean set to true while everything is good
Boolean everythingOk = true;
String inputFile = null; // Name of the entry file to be transposed.
String outputFile = null; // Name of the output file.
int serieNb = 1 ; // Number of columns before the actual values in the input file. Can be columns describing the product as well as empty columns before the values.
int linesToCopy = 0; // Number of lines composing the header of the file (those lines will be copy/pasted in the output)
/*
* Handling the arguments first.
*/
try {
switch (args.length) {
case 0:
throw new EmptyArgsException();
case 1:
inputFile = args[0];
String[] parts = inputFile.split("\\.");
// If no outPutFile name is given, will add "Transposed" to the inputFile Name
outputFile = parts[0] + "Transposed." + parts[1];
break;
case 2:
inputFile = args[0];
outputFile = args[1];
break;
case 3:
inputFile = args[0];
outputFile = args[1];
serieNb = Integer.parseInt(args[2]);
break;
case 4:
inputFile = args[0];
outputFile = args[1];
serieNb = Integer.parseInt(args[2]);
linesToCopy = Integer.parseInt(args[3]);
break;
default:
inputFile = args[0];
outputFile = args[1];
serieNb = Integer.parseInt(args[2]);
linesToCopy = Integer.parseInt(args[3]);
throw new OutOfBordersArgsException();
}
}
catch (ArgsExceptions a) {
a.notOk(everythingOk);
}
catch (NumberFormatException n) {
System.out.println("Arguments 3 & 4 should be numbers."
+ " Number 3 is the Number of columns before the actual values in the input file. \n"
+ "(Can be columns describing the product as well as empty columns before the values. (1 by default)) \n"
+ "Number 4 is the number of lines to copy/pasta. (0 by default) \n"
+ "Please try again.");
everythingOk = false;
}
// Creating an InputFile and an OutputFile
InputFile ex1 = new InputFile(inputFile, linesToCopy);
OutputFile ex2 = new OutputFile(outputFile);
if (everythingOk) {
try ( FileReader fr = new FileReader(inputFile);
CSVReader reader = new CSVReader(fr, '\t', '\'', 0);
FileWriter fw = new FileWriter(outputFile);
CSVWriter writer = new CSVWriter(fw, '\t', CSVWriter.NO_QUOTE_CHARACTER))
{
ex1.setReader(reader);
ex2.setWriter(writer);
// Reading the header of the file
ex1.readHead();
// Writing the header of the file (copy/pasta)
ex2.write(ex1.getHeadFile());
// Handling the line containing the columns names
HeadOfValuesHandler handler = new HeadOfValuesHandler(ex1.readLine(), serieNb);
ex2.writeLine(handler.createOutputHOV());
// Each line will be read and written (in multiple lines) one after the other.
String[] row;
CommonLine cl1;
// If the period is monthly
if (handler.isMonthly()) {
while (!ex1.isAllDone()) {
row = ex1.readLine();
if (!ex1.isAllDone()) {
cl1 = new CommonLine(row, handler.getYears(), handler.getMonths(), serieNb);
ex2.write(cl1.exportOutputLines());
}
}
}
// If the period is yearly
else {
while (!ex1.isAllDone()) {
row = ex1.readLine();
if (!ex1.isAllDone()) {
cl1 = new CommonLine(row, handler.getYears(), serieNb);
ex2.write(cl1.exportOutputLines());
}
}
}
}
catch (FileNotFoundException f) {
System.out.println(inputFile + " can't be found. Cancelling...");
}
catch (IOException e) {
System.out.println("Unknown exception raised.");
e.printStackTrace();
}
}
}
}
I know the exceptions aren't handled correctly yet, but they are in something of a hurry for it to work in some way.
Another problem that will come up later is that I have no idea how to pass the required arguments to the program.
Anyway, thanks for reading this post!
You cannot add routines to a job by drag and drop. You will need to access the routine's functions through components.
For example, you would start with a tFileListInput to get all the files you need. Then you could add a tFileInputDelimited in which you describe all the fields of your input. After this, with e.g. a tJavaRow component, you can write some code which accesses your routine.
NOTE: Keep in mind that Talend usually works row-wise. This means that your routines should handle things in a row-wise manner. This could also mean that your code has to be refactored accordingly. A main function won't work; it has to become at least a class which can be instantiated or which has static functions.
If you want to handle everything on your own, instead of a tJavaRow component you might use a tJava component, which gives more flexibility.
Still, it won't be as easy as simply adding the routine and having everything work.
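To give a rough idea of what "a class with static functions" could look like, here is a minimal sketch that moves one small piece of the question's main (deriving the default output file name) into a static method. The class and method names are made up for illustration, and in Talend the class would normally live in the routines package, which is omitted here so the snippet stands alone.

```java
// Hypothetical routine: a plain class exposing static helper functions
// instead of a main method.
class TsvRoutineSketch {

    // Derives the default output name by inserting "Transposed" before the
    // file extension, as the question's main does when only one argument is given.
    static String transposedName(String inputFile) {
        int dot = inputFile.lastIndexOf('.');
        if (dot < 0) {
            // No extension: just append the suffix
            return inputFile + "Transposed";
        }
        return inputFile.substring(0, dot) + "Transposed" + inputFile.substring(dot);
    }

    public static void main(String[] args) {
        // Quick manual check outside Talend
        System.out.println(transposedName("data.tsv"));
    }
}
```

Inside a tJavaRow, such a function would then be called once per row, along the lines of output_row.outputFile = TsvRoutineSketch.transposedName(input_row.inputFile); where the column names inputFile and outputFile are assumptions that depend on the schema defined in the job.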
In fact, the whole code can become a job on its own. Talend generates the whole Java code for you:
The parameters can become Context variables.
The check if numbers are numbers could be done several ways, for example with a tPreJob and a tJava
Input file could be connected with a tFileInputDelimited with a dot separator
Then, every row will be processed with either a tJavaRow with your custom code or with a tMap if it's not too complex.
Afterwards, you can write the file with a tFileOutputDelimited component
Everything will get connected via right click / main to iterate over the rows
All exception handling is done by Talend. If you want to react to exceptions, you can use a component like tLogRow.
Hope this helps a bit to set the direction.
I am reading in a CSV file and putting each delimited element into a two-dimensional array. The code looks like this:
public DataProcess(String filename, String[][] contents, int n) {//n is 6 for contents, 5 for fiveMinContents
Scanner fileReader = null;
try {
fileReader = new Scanner(new File(filename));
} catch (FileNotFoundException ex) {
System.out.println(ex + " FILE NOT FOUND ");
}
fileReader.useDelimiter(",");
int rowIndex = 0;
while (fileReader.hasNext()) {
for (int j = 0; j < n; j++) {
contents[rowIndex][j] = fileReader.next();
System.out.println("At (" + rowIndex +", "+j+"): " +
contents[rowIndex][j]);
}
rowIndex++;
fileReader.nextLine();
}
}
I am not sure why it reads every other line of this particular CSV file because this is file 2/2 that is being read in this manner. The first one reads fine, but now this one skips every other line. Why would it work for one but not the other? I am running this on Eclipse's latest update.
I also checked out this answer and it did not help.
Because the last line of your loop reads a line and discards it. You need something like,
while (fileReader.hasNextLine()) {
String line = fileReader.nextLine();
contents[rowIndex] = line.split(",\\s*");
System.out.println("At (" + rowIndex + "): "
+ Arrays.toString(contents[rowIndex]));
rowIndex++;
}
You could also print the multi-dimensional array with one call like
System.out.println(Arrays.deepToString(contents));
While the approach may work for you, it's not optimal. There are premade CSV readers for Java. One example is commons-csv:
Reader in = new FileReader("path/to/file.csv");
Iterable<CSVRecord> records = CSVFormat.EXCEL.parse(in);
for (CSVRecord record : records) {
String date = record.get(1);
String time = record.get(2);
// and so on, so forth
}
There are a small number of dependencies that have to be on your classpath. Hope that helps.
I found the cause of this problem.
First, I recommend using the external library that was suggested.
The issue was that the second file was being read to the very end of each row, whereas the first CSV file had an extra column at the end of each row that I was not reading. There must be something about how a CSV file is structured where the end of a row has a different delimiter, or something along those lines; I am not sure. To fix the issue, I just added an extra column to the second file which I do not read in; it is just there.
In short, use an external CSV-reader library. If you don't want to do that, then just add a column directly after the last column in the file and do not read it.
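A possible explanation for this behaviour, offered as a sketch and not as a diagnosis of the original files: with useDelimiter(","), a line break is not a delimiter, so the last field of one row and the first field of the next row come back as a single token. After that token is read, the scanner is already inside the next row, and the nextLine() call at the end of the loop throws that row away. The class name CommaDelimiterDemo and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo: lists the tokens a Scanner produces when only "," is the delimiter.
class CommaDelimiterDemo {

    static List<String> tokens(String csv) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(csv)) {
            sc.useDelimiter(",");
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two rows of three fields, no trailing comma
        for (String token : tokens("a,b,c\nd,e,f")) {
            // Make the embedded line break visible in the output
            System.out.println("[" + token.replace("\n", "\\n") + "]");
        }
    }
}
```

The third token here contains both "c" and "d" with the line break between them. When every row ends with an extra comma, the token "c" ends at that comma, which matches the observation that adding an unread trailing column made the problem go away.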
I have, for example, 1000 images whose names are all very similar and differ only in the number: "ImageNmbr0001", "ImageNmbr0002", ..., "ImageNmbr1000".
I would like to load every image and store them in an ImageProcessor array.
Then, for example, if I call a method on an element of this array, the method is applied to that picture, e.g. counting the black pixels in it.
I can use a for loop to get the numbers from 1 to 1000, turn them into strings, and attach them to the common part of the file name to build the name of each image to load.
However, I would still have to turn that into an element I can store in an array, and I don't have a method yet that receives a string (in fact the file path) and returns the ImageProcessor for the image stored at that path.
Also, my approach at the moment seems rather clumsy and not too elegant. So I would be very happy if someone could show me a better way to do this using methods from these packages:
import ij.ImagePlus;
import ij.plugin.filter.PlugInFilter;
import ij.process.ImageProcessor;
I think I found a solution (Opener is in the ij.io package):
Opener opener = new Opener();
String imageFilePath = "somePath";
ImagePlus imp = opener.openImage(imageFilePath);
ImageProcessor ip = imp.getProcessor();
That does the job, but thank you for your time and effort.
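For the numbered file names, String.format with a zero-padded width avoids the substring juggling described in the question. The sketch below only builds the names, so it runs without ImageJ; the class and method names are made up, and the directory and file extension are not given in the original, so they are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds names such as ImageNmbr0001 ... ImageNmbr1000.
class ImageNames {

    static List<String> numberedNames(String prefix, int count) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            // %04d pads the number with leading zeros to at least four digits
            names.add(String.format("%s%04d", prefix, i));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = numberedNames("ImageNmbr", 1000);
        System.out.println(names.get(0));
        System.out.println(names.get(names.size() - 1));
    }
}
```

Each name, combined with the real directory and extension, can then be passed to opener.openImage(...) as in the snippet above, and the result of imp.getProcessor() stored at the matching index of an ImageProcessor[] of length 1000.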
I'm not sure I understand exactly what you want, but I definitely would not save the information for each image in separate files, for 2 reasons:
- It's slower to save and read the contents of multiple files compared with 1 medium-sized file
- Each file adds overhead (files need a path, a minimum size on disk, etc.)
If you want performance, group multiple image descriptions into a single description file.
If you don't want to make a binary description file, you can always use a database, which is built for this and performs well on reads and normally on writes.
I don't know exactly what your needs are, but I guess you could try making a binary file with fixed-size data and reading it later.
Example:
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class DescriptionDemo {
public static void main(String[] args) throws IOException {
// Write 1000 fixed-size int records; try-with-resources closes the stream
try (DataOutputStream dout = new DataOutputStream(new FileOutputStream("description.bin"))) {
for (int x = 0; x < 1000; x++) {
dout.writeInt(10); // Write int data
}
}
// Read the same 1000 ints back; errors are propagated instead of being swallowed
try (DataInputStream din = new DataInputStream(new FileInputStream("description.bin"))) {
for (int x = 0; x < 1000; x++) {
System.out.println(din.readInt()); // Read int data
}
}
}
}
In this example, the code writes integers to the "description.bin" file and then reads them back.
This is pretty fast in Java; wrapping the streams in BufferedOutputStream and BufferedInputStream reduces the number of system calls further.
I'm making a simple paint program and am stuck with getting a certain part of a string.
Here's the trouble: when I save the 9-panel image, it stores the RGB values of each panel in a .txt file. Example:
java.awt.Color[r=0,g=0,b=0]
java.awt.Color[r=255,g=255,b=255]
java.awt.Color[r=255,g=0,b=0]
java.awt.Color[r=0,g=0,b=255]
java.awt.Color[r=0,g=0,b=0]
java.awt.Color[r=255,g=255,b=0]
java.awt.Color[r=255,g=255,b=0]
java.awt.Color[r=255,g=0,b=0]
java.awt.Color[r=0,g=0,b=255]
From here, I call a scanner to read the lines of the file. I just need to find the best way to extract the values inside the [ ] to a String. I've tried using a tokenizer to no avail, still being stuck with excess Strings. I've tried manipulating characters but again failed. What would be the best way to go about extracting the data from the brackets? And would it be easier to store the individual r=xxx, g=xxx, b=xxx values in a String[]? Thanks, and here is the source I have so far:
import java.awt.Color;
import java.io.*;
import java.lang.*;
import java.util.*;
//when finished, organize imports (narrow down what imports were used)
public class SaveLoad {
private boolean tryPassed, tryPassed2;
private Formatter x;
//final String[] rawData; will be where the rgb raws are stored
private Scanner xReader;
public void save(Color[] c, String s) {
//s is the filename
int counter = c.length;
//Tries to create a file and, if it does, adds the data to it.
try{
x = new Formatter(s+".txt");
tryPassed = true;
while(counter>0) {
x.format("%s. %s\n", (c.length-(counter-1)), c[counter-1]);
counter--;
}
x.close();
}catch (Exception e){
e.printStackTrace();
tryPassed = false;
}
}
//load will take paramaters of a filename(string); NOTE:::: make the file loaded specify an appendix (ex] .pixmap)
//MAYBE add a load interface with a jDropdownmenu for the filetype? add parameter String filetype.
public void load(String s, String filetype) {
//loads the file and, if successful, attempts to read it.
try{
xReader = new Scanner(new File(s+filetype));
tryPassed2 = true;
}catch(Exception e){
e.printStackTrace();
tryPassed2 = false;
System.out.println(s+filetype+" is not a valid file");
}
while(xReader.hasNext()&&tryPassed2==true) {
String inBrackets = xReader.next().substring(17);
System.out.println(inBrackets);
}
}
}
Also, ignore my messy notations.
The best way is to change the storage format. There are at least two options:
comma-separated values: store r,g,b on each line, for example 215,222,213. Then you can call line.split(",") to obtain a String[] of the values
serialize the whole Color array using ObjectOutputStream
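As a rough sketch of the first option (the class and method names are made up for illustration), each colour is written as one "r,g,b" line and read back with split. The sketch works on plain ints so that it stands alone; in the paint program the components would come from c.getRed(), c.getGreen() and c.getBlue(), and the parsed values would go into new Color(r, g, b).

```java
// Hypothetical helper for the comma-separated storage format.
class RgbLine {

    // Formats one colour as a line such as "215,222,213".
    static String format(int r, int g, int b) {
        return r + "," + g + "," + b;
    }

    // Parses a line such as "215,222,213" back into {r, g, b}.
    static int[] parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected r,g,b but got: " + line);
        }
        int[] rgb = new int[3];
        for (int i = 0; i < 3; i++) {
            rgb[i] = Integer.parseInt(parts[i].trim());
        }
        return rgb;
    }

    public static void main(String[] args) {
        int[] rgb = parse(format(215, 222, 213));
        System.out.println(rgb[0] + " " + rgb[1] + " " + rgb[2]);
    }
}
```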
I would advise changing the format. But if you insist on keeping yours, use a regex:
String st = "java.awt.Color[r=0,g=0,b=0]";
Pattern p = Pattern.compile("java\\.awt\\.Color\\[r=(\\d+),g=(\\d+),b=(\\d+)\\]");
Matcher m = p.matcher(st);
if (m.matches()) {
System.out.println("r=" + m.group(1));
System.out.println("g=" + m.group(2));
System.out.println("b=" + m.group(3));
}