Find Text in Microsoft Word And Replace With Java String Array - java

I am attempting to find the placeholder $beta in my Word document and replace it with an array of data from my Java program. The data I want to insert is:
.......
Blue - 33 - 100
Blue - 28 - 75
Blue - 30 - 90
I verified that the data is correct using the print statement in the code below. However, when I open the Word document after the code saves it, ONLY the last value, Blue - 30 - 90, is in the document, not all three stacked on separate lines as shown above.
I want the values to appear in the Word document exactly as in my example above when $beta is replaced. How should the code read to make that happen?
public static void Test() {
String valuetowrite = null;
for (SPData data : qryresults) {
valuetowrite = String.join("\r\n", data.toString());
System.out.println(valuetowrite);
}
try {
XWPFDocument doc = new XWPFDocument(OPCPackage.open(SOURCE_FILE));
for (XWPFParagraph p : doc.getParagraphs()) {
List<XWPFRun> runs = p.getRuns();
if (runs != null) {
for (XWPFRun r : runs) {
String text = r.getText(0);
if (text != null) {
if (text.contains("$beta")) {
text = text.replace("$beta", valuetowrite);
r.setText(text, 0);
}
}
}
}
}
doc.write(new FileOutputStream(OUTPUT_FILE));
} catch (Exception ex) {
ex.printStackTrace();
}
}
EDIT
I followed the suggestion in the answer and used the code below. On the Java side the data prints as desired, but once it is in Word, all the data is on one line instead of each value on its own line:
String valuetowrite = "";
for (SPData data : qryresults) {
valuetowrite = valuetowrite + String.join("\r\n", data.toString());
}
System.out.println(valuetowrite);

Try the code below. It uses r.addBreak(), which inserts a line break into the run between the values, as you want.
private static void WriteToWordWithLineBreak() {
ArrayList<String> datatowrite = new ArrayList<String>();
for (SPData data : qryresults) {
datatowrite.add(data.toString());
}
try {
XWPFDocument doc = new XWPFDocument(OPCPackage.open(SOURCE_FILE));
for (XWPFParagraph p : doc.getParagraphs()) {
System.out.println("Found paragraph "+p);
List<XWPFRun> runs = p.getRuns();
if (runs != null) {
for (XWPFRun r : runs) {
String text = r.getText(0);
if (text != null) {
if (text.contains("$beta")) {
r.setText(datatowrite.get(0), 0);
for (int i=1; i < datatowrite.size(); i++){
r.addBreak();
r.setText(datatowrite.get(i));
}
}
}
}
}
}
doc.write(new FileOutputStream(OUTPUT_FILE));
} catch (Exception ex) {
ex.printStackTrace();
}
}

Try this:
String valuetowrite = "";
for (SPData data : qryresults) {
valuetowrite = valuetowrite + String.join("\r\n", data.toString());
}
System.out.println(valuetowrite);
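One thing to check with the snippet above: String.join only inserts the delimiter between elements, so joining a single data.toString() adds no line break at all. A minimal sketch of joining the whole list instead (JoinDemo and joinRows are made-up names for illustration, and plain strings stand in for SPData):

```java
import java.util.Arrays;
import java.util.List;

final class JoinDemo {
    // Joins all rows with CRLF between them; a single row comes back unchanged.
    static String joinRows(List<String> rows) {
        return String.join("\r\n", rows);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("Blue - 33 - 100", "Blue - 28 - 75", "Blue - 30 - 90");
        System.out.println(joinRows(rows));
    }
}
```

Even with the "\r\n" in place, Word may still render the run on a single line, as the question's EDIT reports, which is why the other answer inserts explicit breaks with r.addBreak().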

Related

Reading text between quotation marks

Here's a piece of text I'm trying to work with:
lat="52.336575" lon="6.381008">< time>2016-12-19T12:12:27Z< /time>< name>Foto 8 </name>< desc>Dag 4 E&F
Geb 1.4
Hakhoutstoof < /desc>< /wpt>
I'm trying to extract the coordinates between the quotation marks and put those values into a string, but I can't get it to work.
Here's my code (so far):
public void openFile() {
Chooser = new JFileChooser("C:\\Users\\danie\\Desktop\\");
Chooser.setAcceptAllFileFilterUsed(false);
Chooser.setDialogTitle("Open file");
Chooser.addChoosableFileFilter(new FileNameExtensionFilter("*.gpx",
"gpx"));
int returnVal = Chooser.showOpenDialog(null);
try {
Dummy = new Scanner(Chooser.getSelectedFile());
} catch (FileNotFoundException E) {
System.out.println("Error: " + E);
}
}
public void createDummy() {
Dummy.useDelimiter("<wpt");
if (Dummy.hasNext()) {
String Meta = Dummy.next();
}
Dummy.useDelimiter("\\s[<wpt]\\s|\\s[</wpt>]\\s");
try {
while (Dummy.hasNext()) {
String Test = Dummy.next();
DummyFile = new File("Dummy.txt");
Output = new PrintWriter(DummyFile);
Output.print(Test);
Output.println();
Output.flush();
Output.close();
}
Reader = new FileReader(DummyFile);
Buffer = new BufferedReader(Reader);
TestFile = new File("C:\\Users\\danie\\Desktop\\Test.txt");
Writer = new PrintWriter(TestFile);
String Final;
while ((Final = Buffer.readLine()) != null) {
String WPTS[] = Final.split("<wpt");
for (String STD:WPTS) {
Writer.println(STD);
Writer.flush();
Writer.close();
}
}
} catch (IOException EXE) {
System.out.println("Error: " + EXE);
}
Dummy.close();
}
}
I'm really new to Java :(
I think the following code will do the trick.
The variable "string" is only there to test the regex:
final String string = "lat=\"52.336575\" lon=\"6.381008\">< time>2016-12-19T12:12:27Z< /time>< name>Foto 8 </name>< desc>Dag 4 E&F \nGeb 1.4 \n" + "Hakhoutstoof < /desc>< /wpt>";
final String latitudeRegex = "(?<=lat=\")[0-9]+\\.[0-9]*";
final Pattern latitudePattern = Pattern.compile(latitudeRegex);
final Matcher latitudeMatcher = latitudePattern.matcher(string);
// find() advances to the next (in this case the first) subsequence matching the regex
if (latitudeMatcher.find()) {
// group() returns the text matched by the preceding find()
String latitudeString = latitudeMatcher.group();
double lat = Double.parseDouble(latitudeString);
System.out.println("lat: " + lat);
}
To get the longitude, just replace lat with lon in the regex.
This site is very useful for building and testing a regex:
https://regex101.com/
It can even generate the Java code for you.
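As a variation on the approach above, both values can be pulled out in one pass with capturing groups instead of a lookbehind. This is only a sketch: the class and method names (WaypointCoordinates, parse) are invented for illustration, and it assumes lat always comes directly before lon, as in the sample text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WaypointCoordinates {
    // Group 1 captures the latitude, group 2 the longitude (optional sign and fraction).
    private static final Pattern LAT_LON = Pattern.compile(
            "lat=\"(-?[0-9]+(?:\\.[0-9]+)?)\"\\s+lon=\"(-?[0-9]+(?:\\.[0-9]+)?)\"");

    /** Returns {lat, lon}, or null when the text contains no coordinate pair. */
    static double[] parse(String text) {
        Matcher m = LAT_LON.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }

    public static void main(String[] args) {
        double[] c = parse("lat=\"52.336575\" lon=\"6.381008\">< time>2016-12-19T12:12:27Z< /time>");
        if (c != null) {
            System.out.println("lat: " + c[0] + ", lon: " + c[1]);
        }
    }
}
```

Calling find() repeatedly on the same Matcher would walk through every waypoint in a file that has been read into one string.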

Java parsing alternative to current solution

I have a text file to parse that requires different logic depending on certain conditions. Below is my current solution, which works. However, I find it very clunky. I have been looking into alternatives such as StringTokenizer or the Pattern class and am wondering whether I could implement this more elegantly using them.
Do let me know if I should move this to the Code Review forum--I have not initially put it there, as I am unable to implement the other mentioned solutions.
File file = fileChooser.getSelectedFile();
java.io.BufferedReader reader = new java.io.BufferedReader(new java.io.FileReader(file));
memoryMap = new HashMap<Integer, Integer>();
registerMap = new HashMap<Integer, Integer>();
String line = reader.readLine();
while (line != null) {
if (line.contains("#")) {
System.out.println(line);
line = reader.readLine();
}
if (!Character.isDigit(line.charAt(0))) {
System.out.println(line);
String[] setFirstSplit = line.split(":");
if (setFirstSplit[0].equals("M")) {
boolean isFirst = true;
for (String setFirstSegment : setFirstSplit) {
if (!isFirst) {
String[] setSecondSplit = setFirstSegment.split(",");
for (String setSecondSegment : setSecondSplit) {
String[] setThirdSplit = setSecondSegment.split("=");
for (String setThirdSegment : setThirdSplit) {
System.out.println(setThirdSegment);
memoryMap.put(Integer.parseInt(setThirdSplit[0]), Integer.parseInt(setThirdSplit[1]));
System.out.println("Memory Set Result: " + memoryMap);
}
}
} else {
isFirst = false;
}
}
}
if (setFirstSplit[0].equals("R")) {
boolean isFirst = true;
for (String setFirstSegment : setFirstSplit) {
if (!isFirst) {
String[] setSecondSplit = setFirstSegment.split(",");
for (String setSecondSegment : setSecondSplit) {
String[] setThirdSplit = setSecondSegment.split("=");
for (String setThirdSegment : setThirdSplit) {
System.out.println(setThirdSegment);
registerMap.put(Integer.parseInt(setThirdSplit[0]), Integer.parseInt(setThirdSplit[1]));
System.out.println("Register Set Result: " + registerMap);
}
}
} else {
isFirst = false;
}
}
}
line = reader.readLine();
} else {
System.out.println(line);
String[] actionFirstSplit = line.split(" ");
if (actionFirstSplit[1].equals("LOAD")) {
String[] actionSecondSplit = actionFirstSplit[2].split(",");
LoadStep action = new LoadStep();
action.executeStep(Integer.parseInt(actionSecondSplit[0]), Integer.parseInt(actionSecondSplit[1]));
System.out.println("Memory Action Result: " + memoryMap);
System.out.println("Register Action Result: " + registerMap);
}
else {
System.out.println(line);
}
line = reader.readLine();
}
}
reader.close();
The text file looks like this:
# sets the memory address 0 to store the value 1. M stands for memory.
M:0=1,1=11
# All programs starts with an initial setup of values in memory such as the example shown above
0 LOAD 1,3
1 LOAD 0,2
2 ADD 1,2
3 ADD 0,1
4 LSS 1,3,2
5 STOR 62,1
6 STOP
Write it top-down.
String line = reader.readLine();
while (line != null) {
if (parsedComment(line)) {
} else if (parsedMemory(line)) {
} else if (parsedInstruction(line)) {
} else {
error(...);
}
line = reader.readLine();
}
Parse functions may use fields to pass results, like those maps, or have extra parameters.
(If you have multi-line syntax, the reader might be better placed in a field, and disappear as parameter. You can then read a line ahead in the field, and check on that.)
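To make the top-down skeleton concrete, here is a rough sketch of what the parse functions could look like for the sample file. Everything beyond the names parsedComment, parsedMemory and parsedInstruction is an assumption: the class name, the parsedRegister and parsedAssignments helpers, and the instructionCount field that stands in for actually executing steps such as LOAD.

```java
import java.util.HashMap;
import java.util.Map;

final class TopDownParser {
    final Map<Integer, Integer> memoryMap = new HashMap<>();
    final Map<Integer, Integer> registerMap = new HashMap<>();
    int instructionCount = 0; // placeholder for real instruction handling

    void parseAll(String text) {
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty() || parsedComment(line)) {
                // blank lines and comments carry no data
            } else if (parsedMemory(line) || parsedRegister(line)) {
                // values were stored in the matching map
            } else if (parsedInstruction(line)) {
                // instruction was recorded
            } else {
                throw new IllegalArgumentException("Unrecognized line: " + line);
            }
        }
    }

    private boolean parsedComment(String line) {
        return line.startsWith("#");
    }

    private boolean parsedMemory(String line) {
        return parsedAssignments(line, "M:", memoryMap);
    }

    private boolean parsedRegister(String line) {
        return parsedAssignments(line, "R:", registerMap);
    }

    // Handles lines such as "M:0=1,1=11": a prefix followed by comma-separated key=value pairs.
    private boolean parsedAssignments(String line, String prefix, Map<Integer, Integer> target) {
        if (!line.startsWith(prefix)) {
            return false;
        }
        for (String pair : line.substring(prefix.length()).split(",")) {
            String[] kv = pair.split("=");
            target.put(Integer.parseInt(kv[0].trim()), Integer.parseInt(kv[1].trim()));
        }
        return true;
    }

    // Handles lines such as "0 LOAD 1,3" or "6 STOP": a numeric label, then an opcode.
    private boolean parsedInstruction(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2 || !Character.isDigit(parts[0].charAt(0))) {
            return false;
        }
        instructionCount++; // real code would dispatch on parts[1] here
        return true;
    }
}
```

Each parse function both tests whether the line is its kind and consumes it, so the main loop stays a flat chain and the M and R branches share one helper instead of duplicating the nested loops.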
You could use a parser generator like ANTLR http://www.antlr.org/

apply extraction information with java

I am trying to apply a dictionary (a file of words) to a text (a file of text):
For each line of the text, we test whether a dictionary word occurs in it; if so, we print the line. Every word of the dictionary is tested against every line of the text.
I used regular expressions (Pattern + Matcher), but the problem is the running time: the operation takes 5 hours.
The two files are 3330 KB and 55 KB.
My question is: is there another way to do this, like UNITEX, but in Java?
public class Tratemant_Dic extends Thread {
Tratemant_Dic() {
}
public void run() {
try {
BufferedReader file_corpus = new BufferedReader(
new InputStreamReader(new FileInputStream(
"corpus-medical.TXT"), "UTF-16LE"));
PrintWriter ecrire = new PrintWriter("sort.html");
String line;
String nom = null;
ecrire.write("<mot><span style=\"color:red\">startsss</span></mot></br><ligne>start\n");
while ((line = file_corpus.readLine()) != null) {
BufferedReader file_nom = new BufferedReader(
new InputStreamReader(new FileInputStream(
"Fichie_sorte.DIC"), "UTF-16LE"));
while ((nom = file_nom.readLine()) != null) {
nom = nom.substring(0, nom.length() - 3);
Pattern p = Pattern.compile("(.*)\\W+" + nom + "\\b.*",
Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(line);
if (m.find()) {
System.out.println(nom + "==>" + line);
ecrire.write("<mot><span style=\"color:red\">" + nom
+ "</span></mot></br><ligne>" + line + "\n");
}
}
file_nom.close();
}
ecrire.close();
System.out.println("FIN");
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
If I understand what you are trying to do correctly, I would not use regular expressions to do it. They're slow and you do not need them.
This is really a string matching problem. Your dictionary should probably be stored in a hash table, using the hashCode() method to get a key for the string. You then search in your dictionary for each word as you read it ( calculating the appropriate hash code as you read it ) from the text. Properly done that should be as fast as it gets.
Remember that hash codes are not guaranteed to be unique, so always make sure the actual strings match even if the hash code is found in the table.
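A minimal sketch of the hash-table idea, assuming every dictionary entry is a single word (multi-word entries would need a different approach). The class and method names are made up for illustration. java.util.HashSet already calls hashCode() and then equals() internally, so the collision check mentioned above comes for free.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class DictionaryMatcher {
    private final Set<String> words = new HashSet<>();

    // Load the dictionary once, lower-cased so lookups are case-insensitive.
    DictionaryMatcher(List<String> dictionary) {
        for (String word : dictionary) {
            words.add(word.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns the dictionary words found in the line, in order of appearance. */
    List<String> matches(String line) {
        List<String> found = new ArrayList<>();
        // Split on anything that is not a letter or digit (Unicode-aware).
        for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (words.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }
}
```

With this layout each line costs one set lookup per token, rather than one regex compilation and match per dictionary entry.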
I would start by timing each of the "things" your application does and then target the slowest one (as Jay mentioned in a comment, one issue in your case is that you reload the dictionary for every line), rather than basing the improvements on a guess about what is wrong (the regex being slow).
You can use System.nanoTime() or one of the many stopwatches to do this. I normally use guava.
Instead of
Pattern p = Pattern.compile("(.*)\\W+" + nom + "\\b.*",
Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(line);
if (m.find()) {
...
why not just use
if(line.indexOf(nom) > -1) {
...
?
Update: if you need word-boundary checks, use something like this:
String lineToLowerCase = line.toLowerCase(); // before second while
...
int index = lineToLowerCase.indexOf(nom.toLowerCase());
if(index > -1) {
if(index ==0 || Character.isWhitespace(lineToLowerCase.charAt(index-1))) {
int indexEnd = index + nom.length();
if (indexEnd >= lineToLowerCase.length() || !Character.isAlphabetic(lineToLowerCase.charAt(indexEnd))) {
...
For testing:
public static void main(String[] s) {
check("skdc s dcd dsf", "dcd"); // print true
check("skdc sdcd dsf", "dcd"); // print false
check("dcd dsf", "dcd"); // print true
check("afasa dcd", "dcd"); // print true
check("afasa dCD11", "dcD"); // print true
check("skdc s dcda dsf", "dcd"); // print false
}
public static void check(String line, String nom) {
String lineToLowerCase = line.toLowerCase();
int index = lineToLowerCase.indexOf(nom.toLowerCase());
if(index > -1) {
if(index ==0 || Character.isWhitespace(lineToLowerCase.charAt(index-1))) {
int indexEnd = index + nom.length();
if (indexEnd >= lineToLowerCase.length() || !Character.isAlphabetic(lineToLowerCase.charAt(indexEnd))) {
System.out.println("true");
return;
}
}
}
System.out.println("false");
}
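Note that check only inspects the first occurrence of nom, so a line like "sdcd dcd" would be reported as false even though the word appears on its own later. A possible variant that keeps scanning, using the same boundary rules (WordFinder and containsWord are hypothetical names):

```java
import java.util.Locale;

final class WordFinder {
    // True if word occurs in line preceded by whitespace (or line start)
    // and not followed by an alphabetic character.
    static boolean containsWord(String line, String word) {
        String haystack = line.toLowerCase(Locale.ROOT);
        String needle = word.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return false;
        }
        int index = haystack.indexOf(needle);
        while (index >= 0) {
            int end = index + needle.length();
            boolean startOk = index == 0
                    || Character.isWhitespace(haystack.charAt(index - 1));
            boolean endOk = end >= haystack.length()
                    || !Character.isAlphabetic(haystack.charAt(end));
            if (startOk && endOk) {
                return true;
            }
            // Try the next occurrence instead of giving up.
            index = haystack.indexOf(needle, index + 1);
        }
        return false;
    }
}
```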

Handle Empty lines in Java

I am facing a problem with the following code. When I run the program, it terminates as soon as it hits an empty value in my input. How else should I approach this?
try {
BufferedReader sc = new BufferedReader(new FileReader("text.txt"));
ArrayList<String> name = new ArrayList<>();
ArrayList<String> id = new ArrayList<>();
ArrayList<String> place = new ArrayList<>();
ArrayList<String> details = new ArrayList<>();
String line = null;
while ((line = sc.readLine()) !=null) {
if (!line.trim().equals("")) {
System.out.println(line);
if (line.toLowerCase().contains("name")) {
name.add(line.split("=")[1].trim());
}
if (line.toLowerCase().contains("id")) {
id.add(line.split("=")[1].trim());
}
if (line.toLowerCase().contains("location")) {
place.add(line.split("=")[1].trim());
}
if (line.toLowerCase().contains("details")) {
details.add(line.split("=")[1].trim());
}
}
}
PrintWriter pr = new PrintWriter(new File("text.csv"));
pr.println("Name;Id;;Location;Details");
for (int i = 0; i < name.size(); i++) {
pr.println(name.get(i) + ";" + id.get(i) + ";" + place.get(i) + ";" + details.get(i));
}
pr.close();
sc.close();
} catch (Exception e) {
e.printStackTrace();
} }
My Input looks like
name = abc
id = 123
place = xyz
details = hsdyhuslkjaldhaadj
name = ert
id = 7872
place =
details = shahkjdhksdhsala
name = sfd
id = 4343
place = ksjks
Details = kljhaljs
When I run the program on the input above, it terminates at the line place = because there is no value there. I need an empty field to be written for place in that case, with the rest printed as usual to the .csv file.
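When you process the place line, line.split("=")[1] throws an ArrayIndexOutOfBoundsException if nothing follows the =, because split drops trailing empty strings and the resulting array has only one element. (split never returns null elements, so the trim() call itself is not where it fails.)
You can avoid this by checking your parsed result.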
Instead of place.add(line.split("=")[1].trim());, do place.add(parseContentDefaultEmpty(line));, with:
private String parseContentDefaultEmpty(final String line) {
final String[] result = line.split("=");
if(result.length <= 1) {
return "";
}
final String content = line.split("=")[1];
return content != null ? content.trim() : "";
}
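A slightly shorter variant of the same idea, as a sketch: passing a limit of 2 to split keeps the trailing empty string, so a line like "place =" yields two elements. The class and method names here are hypothetical.

```java
final class KeyValueLine {
    // Returns the trimmed text after the first '=', or "" when there is none.
    static String valueOf(String line) {
        // With limit 2, "place =" splits into ["place ", ""] instead of ["place "].
        String[] parts = line.split("=", 2);
        return parts.length == 2 ? parts[1].trim() : "";
    }

    public static void main(String[] args) {
        System.out.println("[" + valueOf("place =") + "]");
        System.out.println("[" + valueOf("name = abc") + "]");
    }
}
```

The limit also means a value that itself contains an = sign is kept whole rather than cut at the second =.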
First, there is an issue: your input file uses the key "place", but your code looks for the word "location":
if (line.toLowerCase().contains("location")) { //this must be changed to place
place.add(line.split("=")[1].trim());
}
I modified the code snippet as shown below; please check it:
while ((line = sc.readLine()) != null) {
if (!line.trim().equals("")) {
System.out.println(line);
if (line.toLowerCase().contains("name")) {
name.add(line.split("=")[1].trim());
}
if (line.toLowerCase().contains("id")) {
id.add(line.split("=")[1].trim());
}
if (line.toLowerCase().contains("place")) {
// change done here to add space if no value
place.add(line.split("=").length > 1 ? line.split("=")[1]
.trim() : " ");
}
if (line.toLowerCase().contains("details")) {
details.add(line.split("=")[1].trim());
}
}
}
Setting question to line doesn't appear to change what line is read later (if you're wanting the line to advance before it hits the while loop).

Translate words in a string using BufferedReader (Java)

I've been working on this for a few days now and I just can't make any headway. I've tried using Scanner and BufferedReader and had no luck.
Basically, I have a working method (shortenWord) that takes a String and shortens it according to a text file formatted like this:
hello,lo
any,ne
anyone,ne1
thanks,thx
It also accounts for punctuation so 'hello?' becomes 'lo?' etc.
I need to be able to read in a String and translate each word individually, so "hello? any anyone thanks!" will become "lo? ne ne1 thx!", basically using the method I already have on each word in the String. The code I have will translate the first word but then does nothing to the rest. I think it's something to do with how my BufferedReader is working.
import java.io.*;
public class Shortener {
private FileReader in ;
/*
* Default constructor that will load a default abbreviations text file.
*/
public Shortener() {
try {
in = new FileReader( "abbreviations.txt" );
}
catch ( Exception e ) {
System.out.println( e );
}
}
public String shortenWord( String inWord ) {
String punc = new String(",?.!;") ;
char finalchar = inWord.charAt(inWord.length()-1) ;
String outWord = new String() ;
BufferedReader abrv = new BufferedReader(in) ;
// ends in punctuation
if (punc.indexOf(finalchar) != -1 ) {
String sub = inWord.substring(0, inWord.length()-1) ;
outWord = sub + finalchar ;
try {
String line;
while ( (line = abrv.readLine()) != null ) {
String[] lineArray = line.split(",") ;
if ( line.contains(sub) ) {
outWord = lineArray[1] + finalchar ;
}
}
}
catch (IOException e) {
System.out.println(e) ;
}
}
// no punctuation
else {
outWord = inWord ;
try {
String line;
while( (line = abrv.readLine()) != null) {
String[] lineArray = line.split(",") ;
if ( line.contains(inWord) ) {
outWord = lineArray[1] ;
}
}
}
catch (IOException ioe) {
System.out.println(ioe) ;
}
}
return outWord;
}
public void shortenMessage( String inMessage ) {
String[] messageArray = inMessage.split("\\s+") ;
for (String word : messageArray) {
System.out.println(shortenWord(word));
}
}
}
Any help, or even a nudge in the right direction would be so much appreciated.
Edit: I've tried closing the BufferedReader at the end of the shortenWord method and it just results in me getting an error on every word in the String after the first one saying that the BufferedReader is closed.
I took a look at this. First of all, if you have the option to change the format of your text file, I would change it to something like this (or XML):
key1=value1
key2=value2
By doing this you could later use Java's Properties.load(Reader), which would remove the need for any manual parsing of the file.
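For illustration, loading such a file with java.util.Properties could look roughly like this. The sketch reads from a string so it runs on its own; in practice you would pass a FileReader for the abbreviations file. AbbreviationProperties is a made-up name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class AbbreviationProperties {
    // Parses key=value lines into a Properties object.
    static Properties load(String text) {
        Properties props = new Properties();
        try (StringReader reader = new StringReader(text)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load("hello=lo\nany=ne\nanyone=ne1\nthanks=thx\n");
        System.out.println(props.getProperty("anyone"));
    }
}
```

getProperty returns null for a word that has no abbreviation, which makes the "leave unchanged" case easy to detect.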
If by any chance you don't have the option to change the format, then you'll have to parse it yourself. Something like the code below would do that and put the results into a Map called shortningRules, which can then be used later.
private void parseInput(FileReader reader) {
try (BufferedReader br = new BufferedReader(reader)) {
String line;
while ((line = br.readLine()) != null) {
String[] lineComponents = line.split(",");
this.shortningRules.put(lineComponents[0], lineComponents[1]);
}
} catch (IOException e) {
e.printStackTrace();
}
}
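When it comes to actually shortening a message, I would probably opt for a regex approach, e.g. \\bKEY\\b, where KEY is the word you want shortened. \\b is a regex anchor for a word boundary: it matches the position between a word character and a non-word character, so the key is only replaced when it stands alone as a whole word, and adjacent spaces or punctuation are left untouched.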
The whole code for doing the shortening would then become something like this:
public void shortenMessage(String message) {
for (Entry<String, String> entry : shortningRules.entrySet()) {
message = message.replaceAll("\\b" + entry.getKey() + "\\b", entry.getValue());
}
System.out.println(message); //This should probably be a return statement instead of a sysout.
}
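Here is a self-contained sketch of that regex approach, returning the result instead of printing it. The class name and the addRule method are invented for illustration; Pattern.quote and Matcher.quoteReplacement are added as a precaution in case a key or value ever contains regex metacharacters such as $ or a backslash.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexShortener {
    // LinkedHashMap keeps the rules in insertion order, so results are reproducible.
    private final Map<String, String> rules = new LinkedHashMap<>();

    void addRule(String word, String abbreviation) {
        rules.put(word, abbreviation);
    }

    String shorten(String message) {
        for (Map.Entry<String, String> entry : rules.entrySet()) {
            // \b on both sides: "any" must not match inside "anyone".
            String regex = "\\b" + Pattern.quote(entry.getKey()) + "\\b";
            message = message.replaceAll(regex, Matcher.quoteReplacement(entry.getValue()));
        }
        return message;
    }

    public static void main(String[] args) {
        RegexShortener shortener = new RegexShortener();
        shortener.addRule("hello", "lo");
        shortener.addRule("any", "ne");
        shortener.addRule("anyone", "ne1");
        shortener.addRule("thanks", "thx");
        System.out.println(shortener.shorten("hello? any anyone thanks!"));
    }
}
```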
Putting it all together will give you something like this; I've added a main for testing purposes.
I think you can have a simpler solution using a HashMap. Read all the abbreviations into the map when the Shortener object is created, and just reference it once you have a word. The word will be the key and the abbreviation the value. Like this:
public class Shortener {
private FileReader in;
//the map
private HashMap<String, String> abbreviations;
/*
* Default constructor that will load a default abbreviations text file.
*/
public Shortener() {
//initialize the map
this.abbreviations = new HashMap<>();
try {
in = new FileReader("abbreviations.txt" );
BufferedReader abrv = new BufferedReader(in) ;
String line;
while ((line = abrv.readLine()) != null) {
String [] abv = line.split(",");
//If there is not two items in the file, the file is malformed
if (abv.length != 2) {
throw new IllegalArgumentException("Malformed abbreviation file");
}
//populate the map with the word as key and abbreviation as value
abbreviations.put(abv[0], abv[1]);
}
}
catch ( Exception e ) {
System.out.println( e );
}
}
public String shortenWord( String inWord ) {
String punc = new String(",?.!;") ;
char finalchar = inWord.charAt(inWord.length()-1) ;
// ends in punctuation
if (punc.indexOf(finalchar) != -1) {
String sub = inWord.substring(0, inWord.length() - 1);
//Reference map
String abv = abbreviations.get(sub);
if (abv == null)
return inWord;
return new StringBuilder(abv).append(finalchar).toString();
}
// no punctuation
else {
//Reference map
String abv = abbreviations.get(inWord);
if (abv == null)
return inWord;
return abv;
}
}
public void shortenMessage( String inMessage ) {
String[] messageArray = inMessage.split("\\s+") ;
for (String word : messageArray) {
System.out.println(shortenWord(word));
}
}
public static void main (String [] args) {
Shortener s = new Shortener();
s.shortenMessage("hello? any anyone thanks!");
}
}
Output:
lo?
ne
ne1
thx!
Edit:
Following atomman's answer, you can essentially drop the shortenWord method by modifying the shortenMessage method like this:
public void shortenMessage(String inMessage) {
for (Entry<String, String> entry:this.abbreviations.entrySet())
// \\b on both sides keeps "any" from being replaced inside "anyone"
inMessage = inMessage.replaceAll("\\b" + entry.getKey() + "\\b", entry.getValue());
System.out.println(inMessage);
}
