Replace text inside a PDF file using iText


I'm using the iText (5.5.13) library to read a PDF and replace a pattern inside the file. The problem is that the pattern is not found, because unexpected characters appear when the library reads the PDF.
For example, the sentence:
"This is a test in order to see if the"
becomes this when I try to read it:
[(This is a )9(te)-3(st)9( in o)-4(rd)15(er )-2(t)9(o)-5( s)8(ee)7( if t)-3(h)3(e )]
So if I try to find and replace "test", no "test" is found in the PDF and nothing is replaced.
Here is the code I'm using:
public void processPDF(String src, String dest) {
try {
PdfReader reader = new PdfReader(src);
PdfArray refs = null;
PRIndirectReference reference = null;
int nPages = reader.getNumberOfPages();
for (int i = 1; i <= nPages; i++) {
PdfDictionary dict = reader.getPageN(i);
PdfObject object = dict.getDirectObject(PdfName.CONTENTS);
if (object.isArray()) {
refs = dict.getAsArray(PdfName.CONTENTS);
ArrayList<PdfObject> references = refs.getArrayList();
for (PdfObject r : references) {
reference = (PRIndirectReference) r;
PRStream stream = (PRStream) PdfReader.getPdfObject(reference);
byte[] data = PdfReader.getStreamBytes(stream);
String dd = new String(data, "UTF-8");
dd = dd.replaceAll("#pattern_1234", "trueValue");
dd = dd.replaceAll("test", "tested");
stream.setData(dd.getBytes());
}
}
if (object instanceof PRStream) {
PRStream stream = (PRStream) object;
byte[] data = PdfReader.getStreamBytes(stream);
String dd = new String(data, "UTF-8");
System.out.println("content---->" + dd);
dd = dd.replaceAll("#pattern_1234", "trueValue");
dd = dd.replaceAll("This", "FIRST");
stream.setData(dd.getBytes(StandardCharsets.UTF_8));
}
}
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
}
catch (Exception e) {
}
}

As has already been mentioned in comments and answers, PDF is not a format meant for text editing. It is a final format, and information on the flow of text, its layout, and even its mapping to Unicode is optional.
Thus, even assuming the optional information on mapping glyphs to Unicode is present, the approach to this task with iText might look a bit unsatisfying: first one determines the position of the text in question using a custom text extraction strategy, then removes the current contents of everything at that position using the PdfCleanUpProcessor, and finally draws the replacement text into the gap.
In this answer I present a helper class that combines the first two steps, finding and removing the existing text, with the advantage that only the text is removed, not also any background graphics etc. as in the case of PdfCleanUpProcessor redaction. The helper furthermore returns the positions of the removed text, so that replacement text can be stamped there.
The helper class is based on the PdfContentStreamEditor presented in this earlier answer. Please use the version of this class on GitHub, though, as the original class has been enhanced a bit since its conception.
The SimpleTextRemover helper class illustrates what is necessary to properly remove text from a PDF. Actually it is limited in a few aspects:
It only replaces text in the actual page content streams.
To also replace text in embedded XObjects, one has to iterate through the XObject resources of the respective page in question recursively and also apply the editor to them.
It is "simple" in the same way the SimpleTextExtractionStrategy is: It assumes the text showing instructions to appear in the content in reading order.
To also work with content streams in which the order is different, the instructions must be sorted. This implies that all incoming instructions and relevant render information must be cached until the end of the page, not merely a few instructions at a time. Then the render information can be sorted, sections to remove can be identified in the sorted render information, the associated instructions can be manipulated, and the instructions can eventually be stored.
It does not try to identify gaps between glyphs that visually represent a white space while there actually is no glyph at all.
To identify gaps the code must be extended to check whether two consecutive glyphs exactly follow one another or whether there is a gap or a line jump.
When calculating the gap to leave where a glyph is removed, it does not yet take the character and word spacing into account.
To improve this, the glyph width calculation must be improved.
Considering the example excerpt from your content stream, though, these restrictions probably won't hinder you.
public class SimpleTextRemover extends PdfContentStreamEditor {
public SimpleTextRemover() {
super (new SimpleTextRemoverListener());
((SimpleTextRemoverListener)getRenderListener()).simpleTextRemover = this;
}
/**
* <p>Removes the string to remove from the given page of the
* document in the PDF reader the given PDF stamper works on.</p>
* <p>The result is a list of glyph lists each of which represents
* a match and can be queried for position information.</p>
*/
public List<List<Glyph>> remove(PdfStamper pdfStamper, int pageNum, String toRemove) throws IOException {
if (toRemove.length() == 0)
return Collections.emptyList();
this.toRemove = toRemove;
cachedOperations.clear();
elementNumber = -1;
pendingMatch.clear();
matches.clear();
allMatches.clear();
editPage(pdfStamper, pageNum);
return allMatches;
}
/**
* Adds the given operation to the cached operations and checks
* whether some cached operations can meanwhile be processed and
* written to the result content stream.
*/
@Override
protected void write(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException {
cachedOperations.add(new ArrayList<>(operands));
while (process(processor)) {
cachedOperations.remove(0);
}
}
/**
* Removes any started match and sends all remaining cached
* operations for processing.
*/
@Override
public void finalizeContent() {
pendingMatch.clear();
try {
while (!cachedOperations.isEmpty()) {
if (!process(this)) {
// TODO: Should not happen, so warn
System.err.printf("Failure flushing operation %s; dropping.\n", cachedOperations.get(0));
}
cachedOperations.remove(0);
}
} catch (IOException e) {
throw new ExceptionConverter(e);
}
}
/**
* Tries to process the first cached operation. Returns whether
* it could be processed.
*/
boolean process(PdfContentStreamProcessor processor) throws IOException {
if (cachedOperations.isEmpty())
return false;
List<PdfObject> operands = cachedOperations.get(0);
PdfLiteral operator = (PdfLiteral) operands.get(operands.size() - 1);
String operatorString = operator.toString();
if (TEXT_SHOWING_OPERATORS.contains(operatorString))
return processTextShowingOp(processor, operator, operands);
super.write(processor, operator, operands);
return true;
}
/**
* Tries to process a text showing operation. Unless a match
* is pending and starts before the end of the argument of this
* instruction, it can be processed. If the instruction contains
* a part of a match, it is transformed to a TJ operation and
* the glyphs in question are replaced by text position adjustments.
* If the original operation had a side effect (jump to next line
* or spacing adjustment), this side effect is explicitly added.
*/
boolean processTextShowingOp(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException {
PdfObject object = operands.get(operands.size() - 2);
boolean isArray = object instanceof PdfArray;
PdfArray array = isArray ? (PdfArray) object : new PdfArray(object);
int elementCount = countStrings(object);
// Currently pending glyph intersects parameter of this operation -> cannot yet process
if (!pendingMatch.isEmpty() && pendingMatch.get(0).elementNumber < processedElements + elementCount)
return false;
// The parameter of this operation is not subject to a match -> copy as is
if (matches.size() == 0 || processedElements + elementCount <= matches.get(0).get(0).elementNumber || elementCount == 0) {
super.write(processor, operator, operands);
processedElements += elementCount;
return true;
}
// The parameter of this operation contains glyphs of a match -> manipulate
PdfArray newArray = new PdfArray();
for (int arrayIndex = 0; arrayIndex < array.size(); arrayIndex++) {
PdfObject entry = array.getPdfObject(arrayIndex);
if (!(entry instanceof PdfString)) {
newArray.add(entry);
} else {
PdfString entryString = (PdfString) entry;
byte[] entryBytes = entryString.getBytes();
for (int index = 0; index < entryBytes.length; ) {
List<Glyph> match = matches.size() == 0 ? null : matches.get(0);
Glyph glyph = match == null ? null : match.get(0);
if (glyph == null || processedElements < glyph.elementNumber) {
newArray.add(new PdfString(Arrays.copyOfRange(entryBytes, index, entryBytes.length)));
break;
}
if (index < glyph.index) {
newArray.add(new PdfString(Arrays.copyOfRange(entryBytes, index, glyph.index)));
index = glyph.index;
continue;
}
newArray.add(new PdfNumber(-glyph.width));
index++;
match.remove(0);
if (match.isEmpty())
matches.remove(0);
}
processedElements++;
}
}
writeSideEffect(processor, operator, operands);
writeTJ(processor, newArray);
return true;
}
/**
* Counts the strings in the given argument, itself a string or
* an array containing strings and non-strings.
*/
int countStrings(PdfObject textArgument) {
if (textArgument instanceof PdfArray) {
int result = 0;
for (PdfObject object : (PdfArray)textArgument) {
if (object instanceof PdfString)
result++;
}
return result;
} else
return textArgument instanceof PdfString ? 1 : 0;
}
/**
* Writes side effects of a text showing operation which is going to be
* replaced by a TJ operation. Side effects are line jumps and changes
* of character or word spacing.
*/
void writeSideEffect(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException {
switch (operator.toString()) {
case "\"":
super.write(processor, OPERATOR_Tw, Arrays.asList(operands.get(0), OPERATOR_Tw));
super.write(processor, OPERATOR_Tc, Arrays.asList(operands.get(1), OPERATOR_Tc));
// intentional fall-through: the " operator also moves to the next line
case "'":
super.write(processor, OPERATOR_Tasterisk, Collections.singletonList(OPERATOR_Tasterisk));
}
}
/**
* Writes a TJ operation with the given array unless array is empty.
*/
void writeTJ(PdfContentStreamProcessor processor, PdfArray array) throws IOException {
if (!array.isEmpty()) {
List<PdfObject> operands = Arrays.asList(array, OPERATOR_TJ);
super.write(processor, OPERATOR_TJ, operands);
}
}
/**
* Analyzes the given text render info whether it starts a new match or
* finishes / continues / breaks a pending match. This method is called
* by the {@link SimpleTextRemoverListener} registered as render listener
* of the underlying content stream processor.
*/
void renderText(TextRenderInfo renderInfo) {
elementNumber++;
int index = 0;
for (TextRenderInfo info : renderInfo.getCharacterRenderInfos()) {
int matchPosition = pendingMatch.size();
pendingMatch.add(new Glyph(info, elementNumber, index));
if (!toRemove.substring(matchPosition, matchPosition + info.getText().length()).equals(info.getText())) {
reduceToPartialMatch();
}
if (pendingMatch.size() == toRemove.length()) {
matches.add(new ArrayList<>(pendingMatch));
allMatches.add(new ArrayList<>(pendingMatch));
pendingMatch.clear();
}
index++;
}
}
/**
* Reduces the current pending match to an actual (partial) match
* after the addition of the next glyph has invalidated it as a
* whole match.
*/
void reduceToPartialMatch() {
outer:
while (!pendingMatch.isEmpty()) {
pendingMatch.remove(0);
int index = 0;
for (Glyph glyph : pendingMatch) {
if (!toRemove.substring(index, index + glyph.text.length()).equals(glyph.text)) {
continue outer;
}
index++;
}
break;
}
}
String toRemove = null;
final List<List<PdfObject>> cachedOperations = new LinkedList<>();
int elementNumber = -1;
int processedElements = 0;
final List<Glyph> pendingMatch = new ArrayList<>();
final List<List<Glyph>> matches = new ArrayList<>();
final List<List<Glyph>> allMatches = new ArrayList<>();
/**
* Render listener class used by {@link SimpleTextRemover} as listener
* of its content stream processor ancestor. Essentially it forwards
* {@link TextRenderInfo} events and ignores all else.
*/
static class SimpleTextRemoverListener implements RenderListener {
@Override
public void beginTextBlock() { }
@Override
public void renderText(TextRenderInfo renderInfo) {
simpleTextRemover.renderText(renderInfo);
}
@Override
public void endTextBlock() { }
@Override
public void renderImage(ImageRenderInfo renderInfo) { }
SimpleTextRemover simpleTextRemover = null;
}
/**
* Value class representing a glyph with information on
* the displayed text and its position, the overall number
* of the string argument of a text showing instruction
* it is in and the index at which it can be found therein,
* and the width to use as text position adjustment when
* replacing it. Beware, the width does not yet consider
* character and word spacing!
*/
public static class Glyph {
public Glyph(TextRenderInfo info, int elementNumber, int index) {
text = info.getText();
ascent = info.getAscentLine();
base = info.getBaseline();
descent = info.getDescentLine();
this.elementNumber = elementNumber;
this.index = index;
this.width = info.getFont().getWidth(text);
}
public final String text;
public final LineSegment ascent;
public final LineSegment base;
public final LineSegment descent;
final int elementNumber;
final int index;
final float width;
}
final PdfLiteral OPERATOR_Tasterisk = new PdfLiteral("T*");
final PdfLiteral OPERATOR_Tc = new PdfLiteral("Tc");
final PdfLiteral OPERATOR_Tw = new PdfLiteral("Tw");
final PdfLiteral OPERATOR_Tj = new PdfLiteral("Tj");
final PdfLiteral OPERATOR_TJ = new PdfLiteral("TJ");
final static List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
final static Glyph[] EMPTY_GLYPH_ARRAY = new Glyph[0];
}
(SimpleTextRemover helper class)
You can use it like this:
PdfReader pdfReader = new PdfReader(SOURCE);
PdfStamper pdfStamper = new PdfStamper(pdfReader, RESULT_STREAM);
SimpleTextRemover remover = new SimpleTextRemover();
System.out.printf("\ntest.pdf - Test\n");
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++)
{
System.out.printf("Page %d:\n", i);
List<List<Glyph>> matches = remover.remove(pdfStamper, i, "Test");
for (List<Glyph> match : matches) {
Glyph first = match.get(0);
Vector baseStart = first.base.getStartPoint();
Glyph last = match.get(match.size()-1);
Vector baseEnd = last.base.getEndPoint();
System.out.printf(" Match from (%3.1f %3.1f) to (%3.1f %3.1f)\n", baseStart.get(I1), baseStart.get(I2), baseEnd.get(I1), baseEnd.get(I2));
}
}
pdfStamper.close();
(RemovePageTextContent test testRemoveTestFromTest)
with the following console output for my test file:
test.pdf - Test
Page 1:
Match from (134,8 666,9) to (177,8 666,9)
Match from (134,8 642,0) to (153,4 642,0)
Match from (172,8 642,0) to (191,4 642,0)
and the occurrences of "Test" missing at those positions in the output PDF.
Instead of outputting the match coordinates, you can use them to draw replacement text at the position in question.

A PDF file is not a word-processing file. What you see is the explicit placement of characters that are kerned together, among many other things. Your dream of "replacing" text in such a way is not possible or, better said, unlikely if not impossible.
A PDF is a binary file with byte offsets. It has many parts, along the lines of: this object is at this byte offset, read it, then go to that byte offset and read that.
You cannot just replace "foo" with "foobar" and expect it to work. It would shift all byte offsets and break the file completely.
Try it yourself before even asking.
In the example you posted above, open the file in some editor and change the string from this:
This is a
to this:
WOW Let me change this data around for the content "This is a"
Save that file and try to open it. Even that, a fixed string of content that does not cross the boundaries you identified, will not work, because it is not a word-processing file. It is not a text file. It is a binary file that you cannot manipulate the way you think you can.
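To make the kerning point concrete with the array quoted in the question: the word "test" is stored as two separate string fragments, (te) and (st), with a position adjustment between them, so a plain search of the stream never sees it. Below is a minimal sketch in plain Java (no iText involved) that joins the fragments back together. The class and method names are made up for illustration, and it assumes a simple single-byte encoding where the bytes read as text and no nested parentheses inside the strings; real content streams often violate both assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TjArrayText {
    // Matches one literal string "( ... )" of a TJ array.
    // A backslash followed by any character is treated as part of the string.
    private static final Pattern LITERAL =
            Pattern.compile("\\(((?:\\\\.|[^\\\\)])*)\\)");

    /** Concatenates the string fragments of a TJ array, dropping the numeric adjustments. */
    static String joinFragments(String tjArray) {
        StringBuilder text = new StringBuilder();
        Matcher m = LITERAL.matcher(tjArray);
        while (m.find()) {
            text.append(m.group(1));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String raw = "[(This is a )9(te)-3(st)9( in o)-4(rd)15(er )-2(t)9(o)-5( s)8(ee)7( if t)-3(h)3(e )]";
        // The raw stream data does not contain the word as one piece
        System.out.println("raw contains \"test\": " + raw.contains("test"));
        // The joined fragments do
        System.out.println("joined: " + joinFragments(raw));
    }
}
```

This only shows why the search fails. It is not a way to edit the stream: writing a replacement back would still have to respect the font encoding and the position adjustments.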

Related

Breadth-first traversal of a tree of slider puzzle configurations java

I need help with my PuzzleSolver class. I get a notice saying
Note: SliderPuzzleSolver.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
I don't understand what this means and also when I run my tester I get this
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException:
Index 0 out of bounds for length 0 at
SPS_Tester.main(SPS_Tester.java:23)
If anyone would have some guesses about any of these please shoot them my way.
public class SliderPuzzleSolver {
/* Returns a String encoding a shortest sequence of moves leading from
** the given starting configuration to the standard goal configuration,
** or null if there is no such sequence.
** pre: !start.equals( the standard goal configuration )
*/
public static String solution(SliderPuzzleConfig start) {
int numRows = start.numRows();
int numCols = start.numColumns();
SliderPuzzleConfig standardGoal = new SliderPuzzleConfig(numRows, numCols);
return solution(start, standardGoal);
}
public static String solution(SliderPuzzleConfig start,
SliderPuzzleConfig goal) {
SliderPuzzleSolverNode treeNode = new SliderPuzzleSolverNode(start, "");
QueueViaLink1<SliderPuzzleSolverNode> toBeExplored = new QueueViaLink1();
toBeExplored.enqueue(treeNode);
while (!toBeExplored.isEmpty()) {
treeNode = toBeExplored.dequeue();
SliderPuzzleConfig config = treeNode.config;
char[] direction = { 'U', 'D', 'L', 'R' };
for(int i = 0; i <= 4; i++) {
if(config.canMove(direction[i])) {
SliderPuzzleConfig newConfig = config.clone();
newConfig.move(direction[i]);
if(newConfig != goal) { //newConfig was not discovered earlier
String moveSeq = treeNode.moveSeq + direction;
SliderPuzzleSolverNode newTreeNode = new SliderPuzzleSolverNode(newConfig, moveSeq);
toBeExplored.enqueue(newTreeNode);
if (newConfig == goal) {
return moveSeq;
}
}
}
}
}
return null;
}
}
public class SPS_Tester {
/* Provided with command line arguments (or what jGrasp refers to as
** "run arguments") indicating the dimensions of a puzzle, a seed
** for a pseudo-random number generator, and a number of pseudo-random
** moves to make --starting with the standard goal configuration-- to
** obtain an initial configuration, this method uses the solution()
** method in the SliderPuzzleSolver class to find a minimum-length
** sequence of moves that transforms that initial configuration into
** the standard goal configuration. This method then displays the
** configurations along that path of moves.
*/
public static void main(String[] args) {
int numRows = Integer.parseInt(args[0]);
int numCols = Integer.parseInt(args[1]);
int seed = Integer.parseInt(args[2]);
Random rand = new Random(seed);
int numMoves = Integer.parseInt(args[3]);
SliderPuzzleConfig start =
new SliderPuzzleConfig(numRows, numCols, rand, numMoves);
System.out.println("Starting configuration is");
start.display();
System.out.println();
String solution = SliderPuzzleSolver.solution(start);
if (solution == null) {
System.out.println("No solution was found.");
}
else {
System.out.printf("Path of length %d found:\n\n", solution.length());
displayPath(start, solution);
}
System.out.println("\nGoodbye.");
}
/* Displays the sequence of configurations starting with the one
** given and proceeding according to the given sequence of moves.
*/
public static void displayPath(SliderPuzzleConfig config, String moveSeq) {
SliderPuzzleConfig spc = config.clone();
spc.display();
for (int i = 0; i != moveSeq.length(); i++) {
char dir = moveSeq.charAt(i);
spc.move(dir);
System.out.printf("\nAfter moving %c:\n", dir);
spc.display();
}
}
}

Note: SliderPuzzleSolver.java uses unchecked or unsafe operations
It means you are not using a generic type somewhere that you could or should. This is just a warning. It looks like the only place is
QueueViaLink1<SliderPuzzleSolverNode> toBeExplored = new QueueViaLink1();
It should be new QueueViaLink1<SliderPuzzleSolverNode>() or, in recent versions of Java (7 and later), you can also use new QueueViaLink1<>(). You can use the compiler settings in jGRASP, or whatever IDE you're using, to add the suggested compiler flag if you want the specific warning messages.
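As a small illustration of the difference, here is a sketch that uses java.util.ArrayDeque as a stand-in, since the source of QueueViaLink1 isn't shown. The class and method names are only for this example.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class DiamondDemo {
    static Queue<String> newQueue() {
        // Writing "new ArrayDeque()" here (a raw type) would still compile,
        // but it produces the "unchecked or unsafe operations" note.
        // With the diamond, the compiler infers <String> from the declared type.
        Queue<String> toBeExplored = new ArrayDeque<>();
        toBeExplored.add("start");
        return toBeExplored;
    }

    public static void main(String[] args) {
        System.out.println(newQueue().peek());
    }
}
```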
java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for
length 0
It appears you are not passing any command line arguments. Your program should test the length of args[] before accessing the elements, and fail with an error message if there are not enough.
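A minimal sketch of such a check, assuming the four integer arguments your main method reads (rows, columns, seed, number of moves). The class name and the usage text are placeholders:

```java
public class ArgsCheck {
    /** Parses the four expected integers, or throws if there are too few arguments. */
    static int[] parseArgs(String[] args) {
        if (args.length < 4) {
            throw new IllegalArgumentException(
                    "Usage: java SPS_Tester <numRows> <numCols> <seed> <numMoves>");
        }
        int[] values = new int[4];
        for (int i = 0; i < 4; i++) {
            // Integer.parseInt throws NumberFormatException for non-numeric input
            values[i] = Integer.parseInt(args[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        try {
            int[] v = parseArgs(args);
            System.out.printf("rows=%d cols=%d seed=%d moves=%d%n", v[0], v[1], v[2], v[3]);
        } catch (IllegalArgumentException e) {
            // NumberFormatException is a subclass, so bad numbers end up here too
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }
}
```

In jGRASP the values go into the "run arguments" field mentioned in your tester's comment.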

java-Assigning values to arrays in processing

Table table;
class AirPollution{
String place;
float NO2;
float CO;
int AP;
AirPollution(String p,float x, float y, int c){
place=p;
NO2=x;
CO=y;
AP=c;
}
}
void loadData(){
String[] data=loadStrings("airPollution.txt");
AirPollution[] AP=new AirPollution[5];
for(int i=0;i<data.length;i++){
//Part I don't know
}
}
To be precise, I am learning Processing.
Below is the contents of the txt file
place,NO2(ppm),CO(ppm),Air pollution
Nowon-gu,0.024,0.6,26
Dobong-gu,0.02,0.4,12
Seocho-gu,0.018,0.3,18
Gwanak-gu,0.022,0.5,20
Guro-gu,0.017,0.3,21
Given this data, I am trying to read in the values starting from the second line, skipping the first (header) line.
For example
AP[0].place="Nowon-gu"
AP[0].NO2=0.024
AP[0].CO=0.6
AP[0].AP=26
AP[1].place="Dobong-gu"
I'd like it to work this way. In this case, should I fill in each value without using the split function?
Or should I use another method?

I see no reason to not use split.
In your loop, I would first split the current data item on a comma (","), then pass each part to an AirPollution object, which finally gets added to the AP array.
Note that in my solution, I initialised your for loop at index 1 as opposed to 0. I'm making the assumption here that loadStrings produces the output that you included, which includes the header row, and I'm assuming that you don't actually want this header row in the AP array.
Note that now you might want to put some sort of check to ensure that data.length is greater than 1 before entering the for loop.
Note that I've also changed your hard coded array size from 5 to data.length - 1. The assumption here is that the array size shouldn't always be 5, there will always be a header row, and you don't want to include the header row in the array.
The below code should do exactly as you've asked, but see my comment after the code block for some extra points:
void loadData() {
String[] data = loadStrings("airPollution.txt");
AirPollution[] AP = new AirPollution[data.length - 1];
for (int i = 1; i < data.length; i++) {
String[] dataParts = data[i].split(",");
AirPollution airPollution = new AirPollution(
dataParts[0],
Float.parseFloat(dataParts[1]),
Float.parseFloat(dataParts[2]),
Integer.parseInt(dataParts[3])
);
AP[i - 1] = airPollution;
}
}
BUT, from your code, I see that this loadData method isn't actually defined in the AirPollution class, it returns void, and it doesn't take any arguments. This means you can populate the AP array just fine with the above code, but the rest of your code can't actually do anything with it, because AP is a local variable.
Looking at the code context, I assume that you want the loadData function to return the AirPollution array so that you can use it in whatever code wraps the AirPollution class. I would make a slight tweak to the above code to reflect this (add a return statement and change the return type):
AirPollution[] loadData() {
String[] data = loadStrings("airPollution.txt");
AirPollution[] AP = new AirPollution[data.length - 1];
for (int i = 1; i < data.length; i++) {
String[] dataParts = data[i].split(",");
AirPollution airPollution = new AirPollution(
dataParts[0],
Float.parseFloat(dataParts[1]),
Float.parseFloat(dataParts[2]),
Integer.parseInt(dataParts[3])
);
AP[i - 1] = airPollution;
}
return AP;
}
Let me know if anything is unclear!

In addition to Ibz' detailed answer (+1) explaining manual parsing, here's a version using Processing's Table class which can be loaded using loadTable():
Table table;
void setup() {
// load table and specify the first row is a header
table = loadTable("Airpollution.csv", "header");
int rowCount = table.getRowCount();
// for each row
for (int i = 0; i < rowCount; i++) {
// access the row
TableRow row = table.getRow(i);
// access each piece of data by column name...
String place = row.getString("place");
float no2 = row.getFloat("NO2(ppm)");
float co = row.getFloat("CO(ppm)");
// ...or column index
int airPollution = row.getInt(3);
// print the data
println(i, place, no2, co, airPollution);
}
}
Now that you can access the parsed data, you can plug it into AirPollution instances:
Table table;
void setup() {
// load table and specify the first row is the CSV header
table = loadTable("Airpollution.csv", "header");
int rowCount = table.getRowCount();
// use row count (as .csv data could change)
AirPollution[] AP = new AirPollution[rowCount];
// for each row
for (int i = 0; i < rowCount; i++) {
// access the row
TableRow row = table.getRow(i);
// initialize AirPollution data with row data
AP[i] = new AirPollution(row.getString("place"),
row.getFloat("NO2(ppm)"),
row.getFloat("CO(ppm)"),
row.getInt(3));
// print the data
println("row index",i,"data",AP[i]);
}
}
class AirPollution {
String place;
float NO2;
float CO;
int AP;
AirPollution(String p, float x, float y, int c) {
place=p;
NO2=x;
CO=y;
AP=c;
}
// display nicely when passing this instance to print/println
String toString(){
// %.3f = floating point value with 3 decimal places
return String.format("{ place=%s, NO2=%.3f, CO=%.3f, AP=%d} ", place, NO2, CO, AP);
}
}
Notice I've renamed the file from .txt to .csv: this extension might help preview/test the data using OpenOffice Calc, Excel, Google Sheets, etc.
Have fun visualising the data.
If you want to learn more about the Table class, other than the reference, also check out Processing > Examples > Topics > Advanced Data > LoadSaveTable

checking if my array elements meet requirements

I need to create a method which checks each element in my array to see if it is true or false, each element holds several values such as mass, formula, area etc for one compound, and in total there are 30 compounds (so the array has 30 elements). I need an algorithm to ask if mass < 50 and area > 5 = true .
My properties class looks like:
public void addProperty (Properties pro )
{
if (listSize >=listlength)
{
listlength = 2 * listlength;
TheProperties [] newList = new TheProperties [listlength];
System.arraycopy (proList, 0, newList, 0, proList.length);
proList = newList;
}
//add new property object in the next position
proList[listSize] = pro;
listSize++;
}
public int getSize()
{
return listSize;
}
//returns properties at a paticular position in list numbered from 0
public TheProperties getProperties (int pos)
{
return proList[pos];
}
}
and after using my getters/setters from TheProperties I put all the information in the array using the following;
TheProperties tp = new properties();
string i = tp.getMass();
String y = tp.getArea();
//etc
theList.addProperty(tp);
I then used the following to save an output of the file;
StringBuilder builder = new StringBuilder();
for (int i=0; i<theList.getSize(); i++)
{
if(theList.getProperties(i).getFormatted() != null)
{
builder.append(theList.getProperties(i).getFormatted());
builder.append("\n");
}
}
SaveFile sf = new SaveFile(this, builder.toString());
I just can't work out how to interrogate each compound individually to see whether it meets the requirements. Reading a file in and saving a value for each compound works, and I can write an if statement for the requirements, but how do I actually check that the elements of each compound match them? I am trying to word this as best I can; I am still working on my fairly poor Java skills.

Not entirely sure what you are after, I found your description quite hard to understand, but if you want to see if the mass is less than 50 and the area is greater than 5, a simple if statement, like so, will do.
if (tp.getMass() < 50 && tp.getArea() > 5) {}
Again, though, you will have to instantiate tp and ensure it has been given its attributes through some sort of constructor.
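To apply that check to every element rather than a single tp, loop over the list and test each one in turn. Here is a rough sketch; Compound is a hypothetical stand-in for your TheProperties class, I'm assuming the getters return numbers (if yours return Strings, parse them first with Double.parseDouble), and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class CompoundCheck {
    /** Hypothetical stand-in for TheProperties, with numeric getters. */
    static class Compound {
        private final String formula;
        private final double mass;
        private final double area;

        Compound(String formula, double mass, double area) {
            this.formula = formula;
            this.mass = mass;
            this.area = area;
        }

        String getFormula() { return formula; }
        double getMass() { return mass; }
        double getArea() { return area; }
    }

    /** The condition from the question: mass < 50 and area > 5. */
    static boolean meetsRequirements(Compound c) {
        return c.getMass() < 50 && c.getArea() > 5;
    }

    /** Checks every element in turn and returns the formulas of those that pass. */
    static List<String> passingFormulas(Compound[] compounds) {
        List<String> passing = new ArrayList<>();
        for (Compound c : compounds) {
            // unused slots of a grown array are null, so skip them
            if (c != null && meetsRequirements(c)) {
                passing.add(c.getFormula());
            }
        }
        return passing;
    }

    public static void main(String[] args) {
        Compound[] compounds = {
            new Compound("H2O", 18.0, 6.5),      // made-up example values
            new Compound("C6H12O6", 180.2, 9.1),
            new Compound("CH4", 16.0, 3.0)
        };
        System.out.println(passingFormulas(compounds));
    }
}
```

With your own list class, the loop would run from 0 to theList.getSize() and call theList.getProperties(i) instead of iterating over an array.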

Lots of ways to do this, which makes it hard to answer.
You could check at creation time, and just not even add the invalid ones to the list. That would mean you only have to loop once.
If you just want to save the output to the file, and not do anything else, I suggest you combine the reading and writing into one function.
Open up the read and the write file
while(read from file){
check value is ok
write to file
}
close both files
The advantages of doing it this way are:
You only loop through once, not three times, so it is faster
You never have to store the whole list in memory, so you can handle really large files, with thousands of elements.
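In Java, the read-check-write loop above could look roughly like this. It is only a sketch: I'm assuming one compound per line in the form formula,mass,area, which may not match your file, and the class and method names are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class FilterWhileCopying {
    /**
     * Reads lines of the assumed form "formula,mass,area" and writes only those
     * with mass < 50 and area > 5. Returns the number of lines written.
     */
    static int copyMatching(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int written = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.split(",");
            if (parts.length < 3) {
                continue; // skip blank or malformed lines
            }
            double mass = Double.parseDouble(parts[1].trim());
            double area = Double.parseDouble(parts[2].trim());
            if (mass < 50 && area > 5) {
                out.write(line);
                out.write("\n");
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // In-memory input and output keep the sketch self-contained
        Reader in = new StringReader("H2O,18.0,6.5\nC6H12O6,180.2,9.1\nCH4,16.0,3.0\n");
        Writer out = new StringWriter();
        int count = copyMatching(in, out);
        System.out.println(count + " line(s) kept:");
        System.out.print(out);
    }
}
```

For real files you would pass in Files.newBufferedReader(...) and Files.newBufferedWriter(...) inside a try-with-resources block, so that both files are closed at the end.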

In case the requirements change, you can write a method that uses Predicate<T>, a functional interface designed for such cases (java.util.function.Predicate was introduced in Java 8):
// check each element of the list against a custom condition (predicate)
public static void checkProperties(TheList list, Predicate<TheProperties> criteria) {
for (int i=0; i < list.getSize(); i++) {
TheProperties tp = list.getProperties(i);
if (!criteria.test(tp)) {
throw new IllegalArgumentException(
"TheProperty at index " + i + " does not meet the specified criteria");
}
}
}
If you want to check if mass < 50 and area > 5, you would write:
checkProperties(theList, new Predicate<TheProperties> () {
@Override
public boolean test(TheProperties tp) {
return tp.getMass() < 50 && tp.getArea() > 5;
}
});
This can be shortened by using a lambda expression:
checkProperties(theList, (TheProperties tp) -> {
return tp.getMass() < 50 && tp.getArea() > 5;
});

Convert for loop to forEach Lambda

I am learning about Lambdas and am having a little difficulty in a conversion. I need to introduce a List into which the array supplied by the values method of the Field class is copied, using the asList method of the class Arrays. Then I need to convert the for loop with a forEach internal loop using a lambda expression as its parameter. The body of the lambda expression will be the code that is the current body of the for loop. I believe I have the List syntax correct ( List list = Arrays.asList(data); ), but I am having a hard time on figuring out what to do with the for loop, or even where to start with it. Any guidance would be greatly appreciated. Thanks
public AreaData(String... data)
{
List<String> list = Arrays.asList(data);
/* Assert to check that the data is of the expected number of items. */
assert data.length == Field.values().length : "Incorrect number of fields";
for( Field field : Field.values() )
{
int width;
String formatString;
if( field == NAME )
{
/* Get the name value and store it away. */
String value = data[field.position()];
strings.put(field, value);
/* Get the needed width of the field to hold the name. */
width = max(value.length(), field.getFieldHeading().length());
formatString = "s";
} else
{
/* If the value is of the wrong form, allow the NumberFormatException
to be thrown. */
Double value = Double.parseDouble(data[field.position()]);
/* Assertion to check value given is positive. */
assert value.compareTo(0.0) >= 0 :
"invalid " + field.name() + " value=" + value.toString();
/* Get the field value and store it away. */
doubles.put(field, value);
/* Get needed width of the field to hold the heading or value. */
width = max((int) log10(value) + MINIMUM,
field.getFieldHeading().length() + HEADING_SEPARATION);
formatString = ".2f";
}
/* Keep the widest value seen, and record the corresponding format. */
if( width > WIDTHS.get(field) )
{
WIDTHS.put(field, width);
FORMATS.put(field, "%" + width + formatString);
}
}
/* Optimization: to avoid doing this every time a comparison is made. */
this.nameCaseless = strings.get(NAME).toUpperCase().toLowerCase();
}

Stream.of(Field.values()).forEach() should do the trick. Note that width and formatString have to be declared inside the lambda body: a lambda can only capture local variables of the enclosing method that are effectively final, so assigning to them from inside the lambda would not compile.
public AreaData(String... data) {
    List<String> list = Arrays.asList(data);
    /* Assert to check that the data is of the expected number of items. */
    assert data.length == Field.values().length : "Incorrect number of fields";
    Stream.of(Field.values()).forEach(
        field -> {
            /* Declared here, not in the constructor, so the lambda may assign them. */
            int width;
            String formatString;
            if (field == NAME) {
                /* Get the name value and store it away. */
                String value = data[field.position()];
                strings.put(field, value);
                /* Get the needed width of the field to hold the name. */
                width = max(value.length(), field.getFieldHeading().length());
                formatString = "s";
            } else {
                /* If the value is of the wrong form, allow the NumberFormatException
                   to be thrown. */
                Double value = Double.parseDouble(data[field.position()]);
                /* Assertion to check value given is positive. */
                assert value.compareTo(0.0) >= 0 :
                    "invalid " + field.name() + " value=" + value.toString();
                /* Get the field value and store it away. */
                doubles.put(field, value);
                /* Get needed width of the field to hold the heading or value. */
                width = max((int) log10(value) + MINIMUM,
                    field.getFieldHeading().length() + HEADING_SEPARATION);
                formatString = ".2f";
            }
            /* Keep the widest value seen, and record the corresponding format. */
            if (width > WIDTHS.get(field)) {
                WIDTHS.put(field, width);
                FORMATS.put(field, "%" + width + formatString);
            }
        });
    /* Optimization: to avoid doing this every time a comparison is made. */
    this.nameCaseless = strings.get(NAME).toUpperCase().toLowerCase();
}
That said, you should consider the following rule of thumb:
A lambda expression should ideally be at most 3 lines of code, and in no
case more than 5 lines!
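For what it's worth, here is a minimal, self-contained sketch of the list-based forEach the question asks for. The Field enum, the widths method and the sample data are stand-ins invented for illustration (they are not the original AreaData code); the point is only that the working variable is declared inside the lambda body, since locals captured from the enclosing method must be effectively final.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class ForEachSketch {
    // Hypothetical stand-in for the question's Field enum.
    enum Field { NAME, POPULATION, AREA }

    // Computes, per field, the width needed to print either the heading or the value.
    static Map<Field, Integer> widths(String... data) {
        Map<Field, Integer> widths = new EnumMap<>(Field.class);
        // Copy the array returned by values() into a List, as the exercise requires.
        List<Field> fields = Arrays.asList(Field.values());
        fields.forEach(field -> {
            // Declared inside the lambda: a local of the enclosing method
            // could be read here, but not assigned.
            int width = Math.max(data[field.ordinal()].length(), field.name().length());
            widths.put(field, width);
        });
        return widths;
    }

    public static void main(String[] args) {
        System.out.println(widths("Iceland", "372520", "103000"));
        // prints {NAME=7, POPULATION=10, AREA=6}
    }
}
```

The map and the data array can be used inside the lambda because they are never reassigned; only the per-iteration variable had to move into the lambda body.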

If you particularly want to convert this to streams and lambdas, then I feel you should also take the opportunity to refactor it in line with the intent of these tools. That means using filters, collectors etc. rather than just converting all your code into a single lambda.
For example, something like:
Arrays.stream(Field.values())
.peek(field -> field.storeValue(data))
.filter(field -> field.getWidth(data) > widths.get(field))
.forEach(field -> storeWidthAndFormat(field, data, widths, formats));
This assumes you encapsulate the logic associated with NAME inside the Field enum (which is what I would recommend).
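A rough sketch of what such a refactoring might look like, assuming each Field constant can compute the width its own value needs. The enum below, its width method and the widen helper are hypothetical and only illustrate the filter/forEach shape; they are not the asker's actual classes.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

public class FieldStreamSketch {
    // Hypothetical enum: every constant knows how wide its own value must be printed.
    enum Field {
        NAME {
            @Override
            int width(String raw) {
                return raw.length();
            }
        },
        AREA {
            @Override
            int width(String raw) {
                // Numeric fields are printed with two decimals.
                return String.format(Locale.ROOT, "%.2f", Double.parseDouble(raw)).length();
            }
        };

        abstract int width(String raw);
    }

    // Hypothetical helper: records, per field, the widest value seen so far.
    static Map<Field, Integer> widen(Map<Field, Integer> widths, String... data) {
        Arrays.stream(Field.values())
                .filter(field -> field.width(data[field.ordinal()]) > widths.getOrDefault(field, 0))
                .forEach(field -> widths.put(field, field.width(data[field.ordinal()])));
        return widths;
    }

    public static void main(String[] args) {
        Map<Field, Integer> widths = new EnumMap<>(Field.class);
        widen(widths, "Iceland", "103000");
        widen(widths, "Liechtenstein", "160");
        System.out.println(widths); // prints {NAME=13, AREA=9}
    }
}
```

With the NAME special case moved into the enum, the if/else from the original loop disappears and each lambda stays a one-liner.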

Optimising Code an Array of Strings for a History in Java

I am seeking guidance on optimising code. The code I have written is for a text-based game in which you type commands into a command bar. One feature I wished to incorporate into my interface was the ability to scroll through a history of one's last 100 commands using the up and down arrow keys, so that it would be more convenient for the user to play the game.
I have designed a class which uses a String[] that stores each new entry in the second position (Array[1]) and moves all entries back one position, while the first position of the array (Array[0]) is just a blank, empty string. The code initialises the array to have 101 values to compensate for the first position being a blank line.
When a user inputs 0 - 100 in that order, it should then give me the reverse of the order (almost like a last in, first out kind of situation, but storing the last 100 values as opposed to removing them once they are accessed), and since 0 - 100 is 101 values, the last value will be overwritten.
Thus, scrolling up through the history, it would give me 100, 99, 98, ..., 2, 1. If I were to select 50 from the list, it would then be 50, 100, 99, ..., 3, 2. The code indeed does this.
The code is listed below:
public class CommandHistory {
private String[] history;
private final int firstIndex = 1;
private static int currentIndex = 0;
/**
* Default constructor, stores last 100 entries of commands plus the blank
* entry at the first index
*/
public CommandHistory() {
history = new String[101];
}
/**
* Constructor with a capacity, stores the last (capacity) entries of
* commands plus the blank entry at the first index
*
* @param capacity
* Capacity of the commands history list
*/
public CommandHistory(int capacity) {
history = new String[capacity + 1];
}
/**
* Returns the size (length) of the history list
*
* @return The size (length) of the history list
*/
private int size() {
return history.length;
}
/**
* Adds a command to the command history log
*
* @param command
* Command to be added to the history log
*/
public void add(String command) {
history[0] = "";
if (!command.equals("")) {
for (int i = firstIndex; i < size();) {
if (history[i] == null) {
history[i] = command;
break;
} else {
for (int j = size() - 1; j > firstIndex; j--) {
history[j] = history[j - 1];
}
history[firstIndex] = command;
break;
}
}
currentIndex = 0;
}
}
/**
* Gets the previous command in the history list
*
* @return The previous command from the history list
*/
public String previous() {
if (currentIndex > 0) {
currentIndex--;
}
return history[currentIndex];
}
/**
* Gets the next command in the history list
*
* @return The next command from the history list
*/
public String next() {
if (currentIndex >= 0 && (history[currentIndex + 1] != null)) {
currentIndex++;
}
return history[currentIndex];
}
/**
* Clears the command history list
*/
public void clear() {
for (int i = firstIndex; i < size(); i++) {
history[i] = null;
}
currentIndex = 0;
}
/**
* Returns the entire command history log
*/
public String toString() {
String history = "";
for (int i = 0; i < size(); i++) {
history += this.history[i];
}
return history;
}
}
In my interface class, once the user types something into the command bar and hits enter, it gets the text currently stored in the bar, uses the add method to add it to the history, parses the command via another class, and then sets the text in the bar to blank.
Pressing the up arrow calls the next method which scrolls up the list, and the down arrow calls the previous method which scrolls down the list.
It seems to work in every way I wish it to, but I was wondering if there was some way to optimise this code or perhaps even code it in a completely different way. I am making this game to keep myself practiced in Java and also to learn new and more advanced things, so I'd love to hear any suggestions on how to do so.

The comments to your question have already pointed out that you are somehow trying to reinvent the wheel by implementing functionality that the standard Java class library already provides to some extent (see LinkedList/Queue and ArrayList). But since you say you want to keep yourself practiced in Java, I guess it is perfectly fine if you try to implement your own command history from scratch.
Here are some of my observations/suggestions:
1) It is not necessary and very counter-intuitive to declare a final first index of 1. It would be easy to start with a default index of 0 and add corresponding checks where necessary.
2) Forget about your private size() method - it is just returning the length of the internal array anyway (i.e. the initial capacity+1). Instead consider adding a public size() method that returns the actual number of added commands and internally update the actual size when adding new commands (see e.g. java.util.ArrayList for reference).
3) At the moment every call to add(String command) will set history[0] = "", which is not necessary. If you want the first index to be "", set it in the constructor. This is also a clear sign that it would perhaps be better to start with an initial index of 0 instead of 1.
4) A minor issue: "if (!command.equals(""))" in your add method is perhaps OK for such a specialized class, but it should definitely be mentioned in the documentation of the method. Personally I would always let the calling class decide if an empty "" command is considered valid or not. Also, this method will throw an undocumented NullPointerException when null is used as an argument. Consider changing this to "if (!"".equals(command))" or throwing an IllegalArgumentException if null is added. Note that "".equals(null) is simply false, so the first variant lets null through; add an explicit null check if null should not end up in the history.
5) "if (history[i] == null)" in the add method is completely unnecessary if you internally keep a pointer to the actual size of the commands. It is actually a special case that will only be true when the very first command is added to the command history (i.e. when its actual size == 0).
6) Having two nested for loops in your add method implementation is also unnecessary if you keep a pointer to the actual size (see example below).
7) I would reconsider if it is necessary to keep a pointer to the current index in the command history. Personally I would avoid storing such a pointer and leave these details to the calling class, i.e. remove the previous and next methods and provide a forward/backward Iterator and/or random access to the index of the available commands. Interestingly, when this functionality is removed from your command history class, it actually comes down to an implementation of either a LinkedList or an ArrayList, whichever way you go. So in the end using one of the built-in Java collections would actually be the way to go.
8) Last but not least, I would reconsider if it is useful to insert added commands at the beginning of the list. I believe it would be more natural to append them to the end, as e.g. ArrayList does. Adding the commands to the end would make the shifting of all current commands during each call to add() unnecessary...
Here are some of the suggested changes to your class (not really tested...)
public class CommandHistory {
private String[] history;
private int size;
/* Per-instance cursor; a static field would be shared by every CommandHistory. */
private int currentIndex = 0;
/**
* Default constructor, stores the last 100 commands
*/
public CommandHistory() {
this(100);
}
/**
* Constructor with a capacity, stores the last (capacity) commands
*
* @param capacity
* Capacity of the commands history list
*/
public CommandHistory(int capacity) {
history = new String[capacity];
}
/**
* Returns the number of commands currently stored in the history list
*
* @return The number of commands currently stored in the history list
*/
public int size() {
return size;
}
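/**
* Adds a command to the command history log; null and empty commands
* are ignored
*
* @param command
* Command to be added to the history log
*/
public void add(String command) {
if (command != null && !"".equals(command)) {
if (this.size < history.length) {
this.size++;
}
/* Shift the existing entries back by one to make room at index 0. */
for (int i = size-1; i >0; i--) {
history[i] = history[i-1];
}
history[0] = command;
/* -1 stands for the blank line in front of the newest command, so the
first call to previous() returns the command that was just added. */
currentIndex = -1;
}
}
/**
* Gets the previous (older) command in the history list
*
* @return The previous command, the oldest one if there is none older,
* or "" if the history is empty
*/
public String previous() {
if (currentIndex < size-1) {
currentIndex++;
}
return (currentIndex >= 0 && currentIndex < size) ? history[currentIndex] : "";
}
/**
* Gets the next (newer) command in the history list
*
* @return The next command, or "" once the newest command has been passed
*/
public String next() {
if (currentIndex >= 0) {
currentIndex--;
}
return (currentIndex >= 0 && currentIndex < size) ? history[currentIndex] : "";
}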
/**
* Clears the command history list
*/
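public void clear() {
for (int i = 0; i < size; i++) {
history[i] = null;
}
/* Reset the element count too, so size() and toString() no longer see the cleared slots. */
size = 0;
currentIndex = 0;
}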
/**
* Returns the entire command history log
*/
public String toString() {
String history = "";
for (int i = 0; i < size; i++) {
history += this.history[i] + ", ";
}
return history;
}
}
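To illustrate points 7) and 8), here is a minimal sketch built on java.util.ArrayList that appends new commands at the end instead of shifting the array. The class name ListCommandHistory and the cursor convention (cursor == size means the blank line after the newest command) are my own assumptions, not part of the original class.

```java
import java.util.ArrayList;
import java.util.List;

public class ListCommandHistory {
    private final List<String> commands = new ArrayList<>();
    private final int capacity;
    // Position of the entry currently shown; commands.size() means the blank line.
    private int cursor = 0;

    public ListCommandHistory(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Appends a command; null and empty commands are ignored. */
    public void add(String command) {
        if (command == null || command.isEmpty()) {
            return;
        }
        if (commands.size() == capacity) {
            commands.remove(0); // drop the oldest entry
        }
        commands.add(command);
        cursor = commands.size(); // back to the blank line
    }

    /** Moves towards older commands (up arrow); stops at the oldest one. */
    public String previous() {
        if (cursor > 0) {
            cursor--;
        }
        return commands.isEmpty() ? "" : commands.get(cursor);
    }

    /** Moves towards newer commands (down arrow); "" once past the newest one. */
    public String next() {
        if (cursor < commands.size()) {
            cursor++;
        }
        return cursor < commands.size() ? commands.get(cursor) : "";
    }

    public int size() {
        return commands.size();
    }

    public static void main(String[] args) {
        ListCommandHistory history = new ListCommandHistory(3);
        for (String c : new String[] {"north", "south", "east", "west"}) {
            history.add(c);
        }
        System.out.println(history.previous()); // west
        System.out.println(history.previous()); // east
        System.out.println(history.next());     // west
        System.out.println(history.next().isEmpty()); // true
    }
}
```

Removing the oldest entry with remove(0) still shifts the list once it is full; if that ever matters, a ring buffer over a fixed array would avoid it, but for 100 short strings it is unlikely to be noticeable.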
Well, I guess I have invested far too much time for this, but I learned quite a bit myself on the way - so thanks ;-)
Hope some of this is useful for you.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Replace text inside a PDF file using iText - java

Related

Breadth-first traversal of a tree of slider puzzle configurations java

java-Assigning values to arrays in processing

checking if my array elements meet requirements

Convert for loop to forEach Lambda

Optimising Code an Array of Strings for a History in Java

Categories

Resources