Java PDFBox, extract data from a column of a table

I would like to find out how to extract data from this PDF (example image): http://postimg.org/image/ypebht5dx/
For example, I want to extract only the values in the column "TENSIONE[V]", and when a cell is blank I want to write the letter "X" in the output.
How can I do this?
This is the code I used:
PDDocument p=PDDocument.load(new File("a.pdf"));
PDFTextStripper t=new PDFTextStripper();
System.out.println(t.getText(p));
and I get this output:
http://s23.postimg.org/wbhcrw03v/Immagine.png

These are only guidelines, so adapt them to your needs. The code is untested, but it should help you solve your issue. If you have any questions, let me know.
String text = t.getText(p);
String[] lines = text.split("\\r?\\n"); // all the lines, split on line breaks
String[] cols = lines[0].split("\\s+"); // the first line, split on whitespace
// cols[0] contains pins
// cols[1] contains TENSIONE[V]
// cols[2] contains TOLLRENZA; if that cell is blank, the array is shorter and cols[2] does not exist
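The guidelines above can be sketched as a small helper that works on the text returned by PDFTextStripper.getText. This is only a sketch under assumptions: the names ColumnExtractor and extractColumn are hypothetical, the sample text is made up, each table row is assumed to come out as one line of whitespace-separated tokens, and a blank cell is only detected when it leaves the line with too few tokens. A blank cell in the middle of a row shifts the later values left, which whitespace splitting alone cannot detect.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnExtractor {

    // Returns the token at colIndex of every non-empty line, or the placeholder
    // when the line has too few tokens (assumed here to mean a blank cell).
    public static List<String> extractColumn(String text, int colIndex, String placeholder) {
        List<String> values = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines between rows
            }
            String[] cols = trimmed.split("\\s+");
            values.add(colIndex < cols.length ? cols[colIndex] : placeholder);
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the result of t.getText(p)
        String text = "1 12.5 0.1\n2\n3 9.0\n";
        System.out.println(extractColumn(text, 1, "X")); // prints [12.5, X, 9.0]
    }
}
```

If the extracted text starts with a header line, its token at that index comes back as the first element, so it may need to be dropped by the caller.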

Related

I am new to Selenium. How should I check that a list of WebElements (which are numbers) are masked after the 4th character?

How can I use Selenium with Java to verify that the values in a data table are masked after the 4th character?
I have used the code below to extract the WebElements from the GUI.
List<WebElement> IDds = driver.findElements(By.xpath("//tbody/tr/td[1]"));
Example of data:
1111$$$$
2222$$

Find all the elements you need in a loop, or use a CSS selector that returns a list of web elements:
WebElement myElm = driver.findElement(By.cssSelector("CSSExpression"));
String text = myElm.getText();
// or, for input fields:
// String text = myElm.getAttribute("value");
char c = text.charAt(4); // index 4 is the 5th character, the first one that should be masked
if (c == '$') {
    // Good
} else {
    // bad
}
You can also find the first position in the string where $ appears and make sure every character from there on is also $.
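The second idea, finding the first $ and checking everything after it, can be sketched in plain Java without a browser. The class and method names (MaskCheck, isMaskedAfter) are hypothetical, and the sketch assumes that $ is the only mask character and that at least one masked character must be present.

```java
public class MaskCheck {

    // True when the first '$' sits exactly at index `visible` and every
    // character from there to the end of the string is also '$'.
    public static boolean isMaskedAfter(String text, int visible) {
        if (text == null || text.length() <= visible) {
            return false; // too short: nothing is masked
        }
        if (text.indexOf('$') != visible) {
            return false; // masking starts too early, too late, or not at all
        }
        for (int i = visible; i < text.length(); i++) {
            if (text.charAt(i) != '$') {
                return false; // an unmasked character after the mask started
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMaskedAfter("1111$$$$", 4)); // true
        System.out.println(isMaskedAfter("2222$$", 4));   // true
        System.out.println(isMaskedAfter("22$2$$", 4));   // false
    }
}
```

With Selenium, the same check could be applied to the getText() value of each element returned by driver.findElements(...).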

Reading a file character by character in LibGDX (Eclipse)?

I have a .txt file which contains following text:
111000111001
x00000010001
111110000001
I want to put this content into a String, so I use this method:
public void read() {
FileHandle file = Gdx.files.internal("map.txt");
String text = file.readString();
System.out.println(text.charAt(12)); // Here is the problem: it shows an empty character instead of x
}
When I try to get the character at index 12 (the x on the 2nd line), I get an empty character instead. I think the problem is related to the line break, but I don't know how to solve it. Can you help me, please?

Every line in a text file ends with a line terminator: a line feed (\n) on Unix-like systems, or a carriage return followed by a line feed (\r\n) on Windows. These characters are part of the string returned by readString(), so index 12 lands on the terminator after the first line rather than on the x. To keep them out of your way, you can remove them:
text = text.replaceAll("(?:\\n|\\r)", "");
Now the character at index 12 is the "x" you wanted:
System.out.println(text.charAt(12)); // Prints x
Here's more info about the replaceAll() method:
Java API: String.replaceAll()
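If the row and column of each character matter (as they usually do for a map), an alternative is to split the text into rows instead of deleting the line breaks. This is a minimal sketch in plain Java; MapReader, toRows, and cellAt are hypothetical names, and the string in main stands in for the result of file.readString().

```java
public class MapReader {

    // Splits the file content into rows, accepting \r\n, \r, or \n line endings.
    public static String[] toRows(String text) {
        return text.split("\\r\\n|\\r|\\n");
    }

    // Returns the character at the given zero-based row and column.
    public static char cellAt(String text, int row, int col) {
        return toRows(text)[row].charAt(col);
    }

    public static void main(String[] args) {
        // Stand-in for file.readString() on a file with Windows line endings
        String text = "111000111001\r\nx00000010001\r\n111110000001";
        System.out.println(cellAt(text, 1, 0)); // prints x
    }
}
```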

How to replace a word found with the contains() method in a JTextPane

I'm trying to replace a word using the contains() method:
String z = tfB.getText();
String show = textPane.getText();
if (show.contains(z)) {
    // how do I target the word that was found and change it
    // without affecting anything else in that line?
}
What I mean by that:
I'm trying to get a value from the user, then search for it and, if it is found, replace it with something. For example:
String x = "one two three four five";
It should set the textPane to "one two 3 four five"
or
"one two 3-three-3 four five"
Could anyone please tell me how to do it?
Thank you

What I'm trying to do is get the value from the user. then search if it found replace it with something.
Don't use the contains() method, because you would need to search the text twice:
once to see if the text is found in the string
again to replace the text with a new string.
Instead, use the String.indexOf(...) method. It returns the index of the text if it is found in the String, or -1 if it is not.
Then you should replace the text directly in the Document of the text pane, not in the String itself. So the code would be something like:
int length = textPane.getDocument().getLength();
String text = textPane.getDocument().getText(0, length); // can throw BadLocationException
String search = "abc...";
int offset = text.indexOf(search);
if (offset != -1)
{
    textPane.setSelectionStart(offset);
    textPane.setSelectionEnd(offset + search.length());
    textPane.replaceSelection("123...");
}
Also, note that you get the text from the Document, not the text pane. This is to make sure the offsets are correct when you replace the text in the Document. Check out Text and New Lines for more information on why this is important.
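The same idea can be sketched against the Document interface alone, which makes it easy to try without building a GUI. This is an illustration under assumptions, not the only way to do it: DocumentReplace and replaceFirst are hypothetical names, a PlainDocument stands in for textPane.getDocument(), and the sketch uses Document.remove plus Document.insertString instead of the selection-based calls.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;

public class DocumentReplace {

    // Replaces the first occurrence of `search` inside the document.
    // Returns false when the text is not found.
    public static boolean replaceFirst(Document doc, String search, String replacement)
            throws BadLocationException {
        String text = doc.getText(0, doc.getLength());
        int offset = text.indexOf(search);
        if (offset == -1) {
            return false;
        }
        doc.remove(offset, search.length());          // delete the old word
        doc.insertString(offset, replacement, null);  // insert the new text in its place
        return true;
    }

    public static void main(String[] args) throws BadLocationException {
        Document doc = new PlainDocument(); // stand-in for textPane.getDocument()
        doc.insertString(0, "one two three four five", null);
        replaceFirst(doc, "three", "3");
        System.out.println(doc.getText(0, doc.getLength())); // prints one two 3 four five
    }
}
```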

Regex to parse multiline data

I have the following data from a file, and I would like to see if I can parse it with a regex:
Name (First Name) City Zip
John (retired) 10007
Mark Baltimore 21268
....
....
Avg Salary
70000 100%
It's not a big file, and the entire content is available in a String object with newline characters (\n) (String data = "data from the file").
I am trying to get the name, city, and zip, and then the salary and percentage details.
Data inside () is considered part of the Name field.
Spaces are valid in the Name field; the other fields contain no spaces.
'Avg Salary' appears only at the end of the file.
Would it be easy to do this via regex parsing in Java?

If the text file is space-aligned, you can (and probably should) extract the fields based on the number of characters. So, you take the first n characters in each line as first name, the next m characters as City, and so on.
Here is one way to do that. It calculates the field offsets automatically from the header line, assuming we know the header names.
String data = "data from the file";
// This is just to ensure we have enough space in the array
int numNewLines = data.length()-data.replace("\n","").length();
String[][] result = new String[numNewLines][3];
String[] lines = data.split("\n");
int avgSalary = 0;
int secondFieldStart = lines[0].indexOf("City");
int thirdFieldStart = lines[0].indexOf("Zip");
for(int i=1; i<lines.length; i++){
String line = lines[i].trim();
if(line.equals("Avg Salary")){
avgSalary = Integer.parseInt(lines[i+1].trim().split("\\s+")[0]); // first token of the line after the marker; avoids substring past the end of a short line
break;
}
result[i-1][0] = line.substring(0,secondFieldStart).trim(); // First Name
result[i-1][1] = line.substring(secondFieldStart,thirdFieldStart).trim(); // City
result[i-1][2] = line.substring(thirdFieldStart).trim(); // Zip
}
Using a regex is possible, but it is more complicated. A regex also cannot tell a person's name from a city's name anyway.
Consider this case:
John Long-name Joe New York 21003
How would you know the name is John Long-name Joe rather than John Long-name Joe New, unless you know that the first field is at most 20 characters long? (Note that John Long-name Joe is 18 characters, so it fits within that field, while John Long-name Joe New does not.)
Of course, if your fields are separated by another character (like the tab character \t), you can split each line on that instead, and it's easy to modify the code above to accommodate that =)
Since the solution I proposed above is simpler, you might want to try it instead =)
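For the tab-separated case mentioned above, a rough sketch could look like this. It assumes the fields really are separated by single tab characters, which the question does not confirm; TabSeparatedParser and parseRows are hypothetical names, and the sample data is adapted from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class TabSeparatedParser {

    // Parses the rows between the header line and the "Avg Salary" marker.
    public static List<String[]> parseRows(String data) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = data.split("\\r?\\n");
        for (int i = 1; i < lines.length; i++) { // start at 1 to skip the header
            String line = lines[i];
            if (line.trim().equals("Avg Salary")) {
                break; // the summary section starts here
            }
            if (line.trim().isEmpty()) {
                continue;
            }
            // A limit of -1 keeps empty fields, so a missing City stays in place
            String[] fields = line.split("\t", -1);
            for (int j = 0; j < fields.length; j++) {
                fields[j] = fields[j].trim();
            }
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) {
        String data = "Name\tCity\tZip\n"
                + "John (retired)\t\t10007\n"
                + "Mark\tBaltimore\t21268\n"
                + "Avg Salary\n"
                + "70000\t100%";
        for (String[] row : parseRows(data)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```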

Read a file using Scanner and save it in a String, including white space

I have a text file that I want to read. I have no problem doing that.
My problem is that I need to check whether a cell in a line is blank or not.
For example, let's assume the text below is my file. I want to save "Something" in a new String, and if there is nothing in that column and row, I want to save " " as the String.
Column1 Column2 Column3
Row1 Something Something Something
Row2 Something Something
Row3 Something
I tried to read the file with a Scanner and save each line in a new String, but I have no clue how to get the white space from the String. I'm not sure whether this method will work.
Any suggestions?
Thanks

If you want to check for white space, just use:
String out;
if(inputLine.indexOf(" ")!=-1){
// Whitespace exists in string
out = "something";
} else {
// No whitespace exists
out = " ";
}
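If the file is column-aligned with spaces, another option is to cut each line at fixed offsets and turn empty cells into " ". This is only a sketch under that assumption; FixedWidthRow and cells are hypothetical names, and the offsets in main are made up for the example rather than taken from the real file.

```java
import java.util.Arrays;

public class FixedWidthRow {

    // Cuts one line into cells at the given start offsets.
    // A cell that is empty or missing (short line) becomes " ".
    public static String[] cells(String line, int[] starts) {
        String[] out = new String[starts.length];
        for (int i = 0; i < starts.length; i++) {
            int from = Math.min(starts[i], line.length());
            int to = (i + 1 < starts.length)
                    ? Math.min(starts[i + 1], line.length())
                    : line.length();
            String cell = line.substring(from, to).trim();
            out[i] = cell.isEmpty() ? " " : cell;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] starts = {0, 6, 18, 30}; // assumed column offsets
        // Build an aligned row whose second data column is blank
        String line = String.format("%-6s%-12s%-12s%s", "Row2", "Something", "", "Something");
        System.out.println(Arrays.toString(cells(line, starts)));
    }
}
```

Each line read with scanner.nextLine() could be passed to cells in the same way.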
