I have a bit of code that fetches a web page, extracts its text, and displays it as HTML in a JLabel. I want to change the color of a specific word in that text (for example, every occurrence of the word "cow" should be green). Here is the code:
public void code() throws IOException
{
Document document = Jsoup.connect("http://www.nbcwashington.com/weather/school-closings/").get();
Elements tags = document.select("p");
String txt = "<html>";
for (Element tag : tags) {
txt += tag.text() + "<br/>";
}
txt += "</html>";
output.setText(txt);
}
I have a bit of code that goes to a website, finds text, and prints it out html style in a JLabel
I find working with JTextPane and style attributes easier than working with HTML.
Just add the text to a text pane as normal text, then you can search the text and change the attributes as required:
Untested code would be something like:
JTextPane textPane = new JTextPane();
textPane.setText(...);
SimpleAttributeSet keyword = new SimpleAttributeSet();
StyleConstants.setForeground(keyword, Color.GREEN);
StyledDocument doc = textPane.getStyledDocument();
int length = doc.getLength();
String text = doc.getText(0, length); // throws BadLocationException
String search = "cow";
int offset = 0;
while ((offset = text.indexOf(search, offset)) != -1)
{
doc.setCharacterAttributes(offset, search.length(), keyword, false);
offset += search.length();
}
You can also make the JTextPane look like a JLabel by using:
textPane.setOpaque( false );
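The search loop above can be wrapped in a reusable method that works on any StyledDocument, including the one returned by textPane.getStyledDocument(). This is a minimal sketch: the class name KeywordColorizer and the method colorize are hypothetical, and the match is a plain case-sensitive substring search, so "cow" would also match inside "coward".

```java
import java.awt.Color;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

// Hypothetical helper, not part of Swing.
final class KeywordColorizer {

    /** Colors every occurrence of word in doc and returns how many were found. */
    static int colorize(StyledDocument doc, String word, Color color)
            throws BadLocationException {
        if (word.isEmpty()) {
            return 0; // an empty search string would never advance the offset
        }
        SimpleAttributeSet keyword = new SimpleAttributeSet();
        StyleConstants.setForeground(keyword, color);

        String text = doc.getText(0, doc.getLength());
        int count = 0;
        int offset = 0;
        while ((offset = text.indexOf(word, offset)) != -1) {
            // replace=false merges the color into whatever attributes are already set
            doc.setCharacterAttributes(offset, word.length(), keyword, false);
            offset += word.length();
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws BadLocationException {
        StyledDocument doc = new DefaultStyledDocument();
        doc.insertString(0, "The cow saw another cow.", null);
        System.out.println("matches: " + colorize(doc, "cow", Color.GREEN));
    }
}
```

With a JTextPane, the call would be KeywordColorizer.colorize(textPane.getStyledDocument(), "cow", Color.GREEN) after setText(...).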
You can check the value with the .equals method and use a span tag to color it. Note that this colors a paragraph only when its entire text is exactly "cow":
public void code() throws IOException
{
Document document = Jsoup.connect("http://www.nbcwashington.com/weather/school-closings/").get();
Elements tags = document.select("p");
String txt = "<html>";
for (Element tag : tags) {
if(tag.text().equals("cow")){
txt += "<span style=\"color:#00FF00\">"+tag.text()+"</span><br/>";
}else{
txt += tag.text() + "<br/>";
}
}
txt += "</html>";
output.setText(txt);
}
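To color every occurrence of the word inside a longer paragraph rather than only whole-paragraph matches, each match can be wrapped in its own span. The following is a minimal sketch using only the standard library; the class HtmlHighlighter and its methods are hypothetical names, the match is a case-sensitive substring search, and the small escaping step is an added precaution so that characters such as < in the scraped text are not interpreted as markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for building JLabel HTML.
final class HtmlHighlighter {

    /** Escapes the characters that would otherwise be read as HTML markup. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Returns text as HTML with every occurrence of word wrapped in a colored span. */
    static String highlight(String text, String word, String cssColor) {
        String escapedWord = escapeHtml(word);
        String replacement =
                "<span style=\"color:" + cssColor + "\">" + escapedWord + "</span>";
        // Pattern.quote and quoteReplacement keep regex metacharacters literal
        return Pattern.compile(Pattern.quote(escapedWord))
                .matcher(escapeHtml(text))
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "<html>"
                + highlight("The cow saw another cow.", "cow", "#00FF00")
                + "<br/></html>";
        System.out.println(html);
    }
}
```

In the loop above, this would replace the if/else with txt += HtmlHighlighter.highlight(tag.text(), "cow", "#00FF00") + "<br/>";.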
My goal is to transfer textual content from a PDF to a new PDF while preserving the formatting of the font. (e.g. Bold, Italic, underlined..).
I try to use the TextPosition List from the existing PDF and write a new PDF from it.
For this I get from the TextPosition List the Font and FontSize of the current entry and set them in a contentStream to write the upcoming text through contentStream.showText().
After 137 successful loop iterations, this error is thrown:
Exception in thread "main" java.lang.IllegalArgumentException: No glyph for U+00AD in font VVHOEY+FrutigerLT-BoldCn
at org.apache.pdfbox.pdmodel.font.PDType1CFont.encode(PDType1CFont.java:357)
at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:333)
at org.apache.pdfbox.pdmodel.PDPageContentStream.showTextInternal(PDPageContentStream.java:514)
at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:476)
at haupt.PageTest.printPdf(PageTest.java:294)
at haupt.MyTestPDF.main(MyTestPDF.java:54)
This is my code up to this step:
public void printPdf() throws IOException {
TextPosition tpInfo = null;
String pdfFileInText = null;
int charIDindex = 0;
int pageIndex = 0;
try (PDDocument pdfDocument = PDDocument.load(new File(srcFile))) {
if (!pdfDocument.isEncrypted()) {
MyPdfTextStripper myStripper = new MyPdfTextStripper();
var articlesByPage = myStripper.getCharactersByArticleByPage(pdfDocument);
createDirectory();
String newFileString = (srcErledigt + "Test.pdf");
File input = new File(newFileString);
input.createNewFile();
PDDocument document = new PDDocument();
// For Pages
for (Iterator<List<List<TextPosition>>> pageIterator = articlesByPage.iterator(); pageIterator.hasNext();) {
List<List<TextPosition>> pageList = pageIterator.next();
PDPage newPage = new PDPage();
document.addPage(newPage);
PDPageContentStream contentStream = new PDPageContentStream(document, newPage);
contentStream.beginText();
pageIndex++;
// For Articles
for (Iterator<List<TextPosition>> articleIterator = pageList.iterator(); articleIterator.hasNext();) {
List<TextPosition> articleList = articleIterator.next();
// For Text
for (Iterator<TextPosition> tpIterator = articleList.iterator(); tpIterator.hasNext();) {
tpCharID = charIDindex;
tpInfo = tpIterator.next();
System.out.println(tpCharID + ". charID: " + tpInfo);
PDFont tpFont = tpInfo.getFont();
float tpFontSize = tpInfo.getFontSize();
pdfFileInText = tpInfo.toString();
contentStream.setFont(tpFont, tpFontSize);
contentStream.newLineAtOffset(50, 700);
contentStream.showText(pdfFileInText);
charIDindex++;
}
}
contentStream.endText();
contentStream.close();
}
} else {
System.out.println("pdf Encrypted");
}
}
}
MyPdfTextStripper:
public class MyPdfTextStripper extends PDFTextStripper {
public MyPdfTextStripper() throws IOException {
super();
setSortByPosition(true);
}
@Override
public List<List<TextPosition>> getCharactersByArticle() {
return super.getCharactersByArticle();
}
// Add Pages to CharactersByArticle List
public List<List<List<TextPosition>>> getCharactersByArticleByPage(PDDocument doc) throws IOException {
final int maxPageNr = doc.getNumberOfPages();
List<List<List<TextPosition>>> byPageList = new ArrayList<>(maxPageNr);
for (int pageNr = 1; pageNr <= maxPageNr; pageNr++) {
setStartPage(pageNr);
setEndPage(pageNr);
getText(doc);
byPageList.add(List.copyOf(getCharactersByArticle()));
}
return byPageList;
}
}
Additional Info:
There are seven fonts in my document, all of which are set as subsets.
I need to write the Text given with the corresponding Font given.
All glyphs that should be written already exist in the original document, where I get my TextPositionList from.
All fonts are subtype 1 or 0
There is no AcroForm defined
Thanks in advance
Edit 30.08.2022:
I fixed the issue by manually replacing this particular Unicode character (U+00AD, the soft hyphen) with a placeholder in the string before trying to write it.
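A more general variant of that replacement is to filter the string before it reaches showText(). The sketch below uses only the standard library and drops every Unicode format character (general category Cf, which includes U+00AD). The class and method names are hypothetical, and dropping the character instead of substituting a placeholder is a choice made for illustration. It does not help with glyphs that are missing from a subset font for other reasons.

```java
// Hypothetical helper; intended to be applied to the string passed to showText().
final class ShowTextSanitizer {

    /** Returns s without invisible format characters such as the soft hyphen U+00AD. */
    static String stripFormatChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
                // Character.FORMAT is the Unicode general category "Cf"
                .filter(cp -> Character.getType(cp) != Character.FORMAT)
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "Bei\u00ADspiel"; // contains a soft hyphen
        String clean = stripFormatChars(raw);
        System.out.println(raw.length() + " -> " + clean.length());
    }
}
```

In printPdf(), the call would be contentStream.showText(ShowTextSanitizer.stripFormatChars(pdfFileInText)).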
Now I ran into this open ToDo:
org.apache.pdfbox.pdmodel.font.PDCIDFontType0.encode(int)
@Override
public byte[] encode(int unicode)
{
// todo: we can use a known character collection CMap for a CIDFont
// and an Encoding for Type 1-equivalent
throw new UnsupportedOperationException();
}
Does anyone have suggestions or workarounds for this?
Edit 01.09.2022
I tried to replace occurrences of that font with an alternative font from the source file, but this opens another problem: a COSStream is "randomly" closed, so the new document cannot be saved after I write my text with a contentStream.
Using a standard font such as PDType1Font.HELVETICA instead does work, though.
I need help replacing an image with another image in a Word document using Apache POI or any other library that might do the job. I know how to replace a word using Apache POI, but I can't figure out how to replace an image.
public static void main(String[] args) throws FileNotFoundException {
String c22 = "OTHER WORD";
try {
XWPFDocument doc = new XWPFDocument(OPCPackage.open("imagine.docx"));
for (XWPFParagraph p : doc.getParagraphs()) {
List<XWPFRun> runs = p.getRuns();
if (runs != null) {
for (XWPFRun r : runs) {
String text = r.getText(0);
if (text != null ) {
String imgFile = "imaginedeschis.jpg";
try (FileInputStream is = new FileInputStream(imgFile)) {
r.addPicture(is, XWPFDocument.PICTURE_TYPE_JPEG, imgFile,
Units.toEMU(200), Units.toEMU(200)); // 200x200 points (Units.toEMU converts points to EMU)
text = text.replace("1ST WORD", c22);
}
r.setText(text, 0);
}
}
}
}
doc.write(new FileOutputStream("output.docx"));
} catch (InvalidFormatException | IOException m) {
m.printStackTrace(); // do not swallow the exception silently
}
}
I am using the Java code below to replace an image in a Word document (*.docx). Please share if anyone has a better approach.
public XWPFDocument replaceImage(XWPFDocument document, String imageOldName, String imagePathNew, int newImageWidth, int newImageHeight) throws Exception {
try {
LOG.info("replaceImage: old=" + imageOldName + ", new=" + imagePathNew);
int imageParagraphPos = -1;
XWPFParagraph imageParagraph = null;
List<IBodyElement> documentElements = document.getBodyElements();
for(IBodyElement documentElement : documentElements){
imageParagraphPos ++;
if(documentElement instanceof XWPFParagraph){
imageParagraph = (XWPFParagraph) documentElement;
if(imageParagraph != null && imageParagraph.getCTP() != null && imageParagraph.getCTP().toString().trim().indexOf(imageOldName) != -1) {
break;
}
}
}
if (imageParagraph == null) {
throw new Exception("Unable to replace image data due to the exception:\n"
+ "'" + imageOldName + "' not found in document.");
}
ParagraphAlignment oldImageAlignment = imageParagraph.getAlignment();
// remove old image
document.removeBodyElement(imageParagraphPos);
// now add new image
// BELOW LINE WILL CREATE AN IMAGE
// PARAGRAPH AT THE END OF THE DOCUMENT.
// REMOVE THIS IMAGE PARAGRAPH AFTER
// SETTING THE NEW IMAGE AT THE OLD IMAGE POSITION
XWPFParagraph newImageParagraph = document.createParagraph();
XWPFRun newImageRun = newImageParagraph.createRun();
//newImageRun.setText(newImageText);
newImageParagraph.setAlignment(oldImageAlignment);
try (FileInputStream is = new FileInputStream(imagePathNew)) {
newImageRun.addPicture(is, XWPFDocument.PICTURE_TYPE_JPEG, imagePathNew,
Units.toEMU(newImageWidth), Units.toEMU(newImageHeight));
}
// set new image at the old image position
document.setParagraph(newImageParagraph, imageParagraphPos);
// NOW REMOVE REDUNDANT IMAGE FROM THE END OF DOCUMENT
document.removeBodyElement(document.getBodyElements().size() - 1);
return document;
} catch (Exception e) {
throw new Exception("Unable to replace image '" + imageOldName + "' due to the exception:\n" + e);
} finally {
// cleanup code
}
}
Please visit https://bitbucket.org/wishcoder/java-poi-word-document/wiki/Home for more examples like:
Open existing Microsoft Word Document (*.docx)
Clone Table in Word Document and add new data to cloned table
Update existing Table->Cell data in document
Update existing Hyper Link in document
Replace existing Image in document
Save updated Microsoft Word Document (*.docx)
I recommend using transparent tables to track images. The following code replaces the picture in the cell at row 0, column 1 of the table.
List<XWPFParagraph> paragraphs = table.getRow(0).getCell(1).getParagraphs();
for (XWPFParagraph para: paragraphs) {
for (XWPFRun r : para.getRuns()) {
CTR ctr = r.getCTR();
List<CTDrawing> drawings = ctr.getDrawingList();
// Iterate backwards: removing by index shifts the remaining drawings down
for (int i = drawings.size() - 1; i >= 0; i--) {
ctr.removeDrawing(i);
}
}
}
XWPFParagraph paragraph = table.getRow(0).getCell(1).addParagraph();
XWPFRun run = paragraph.createRun();
FileInputStream fis = new FileInputStream("filepath");
run.addPicture(fis, XWPFDocument.PICTURE_TYPE_PNG, "filename", Units.toEMU(200), Units.toEMU(60));
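When elements are removed by index inside a loop, as with the drawings above, the loop direction matters. The sketch below shows the effect on a plain java.util.List. It assumes the drawing list shrinks as entries are removed (an assumption about the generated CTR API that is not verified here), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo using a plain list in place of the drawing list.
final class IndexRemovalDemo {

    /** Removes by index counting up; every second element is skipped. */
    static List<String> removeForward(List<String> source) {
        List<String> items = new ArrayList<>(source);
        for (int i = 0; i < items.size(); i++) {
            items.remove(i); // the elements after i shift down by one
        }
        return items;
    }

    /** Removes by index counting down; nothing is skipped. */
    static List<String> removeBackward(List<String> source) {
        List<String> items = new ArrayList<>(source);
        for (int i = items.size() - 1; i >= 0; i--) {
            items.remove(i);
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> drawings = List.of("a", "b", "c", "d");
        System.out.println(removeForward(drawings));  // prints [b, d]
        System.out.println(removeBackward(drawings)); // prints []
    }
}
```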
I'm working with a JTextPane.
JTextPane pane = new JTextPane();
String content = "I'm a line of text that will be displayed in the JTextPane";
StyledDocument doc = pane.getStyledDocument();
SimpleAttributeSet aSet = new SimpleAttributeSet();
If I add this aSet to the textpane's document like this:
doc.setParagraphAttributes(0, content.length(), aSet, false);
Nothing visible happens. No big surprise since I haven't set any custom attributes for aSet. However, if I allow aSet to replace the current ParagraphAttributes of doc like this:
doc.setParagraphAttributes(0, content.length(), aSet, true);
A lot of things happen. How can I get information on those default values of the JTextPane document? In particular, my problem is that when I define a custom Font for aSet and set it to replace the current attributes, the font is displayed as if it were bold. StyleConstants.setBold(aSet, false); doesn't help.
I have looked at the source code to see what data structures are holding the information that you want. This is a modification of that code that prints the attributes for each paragraph.
int offset, length; //The value of the first 2 parameters in the setParagraphAttributes() call
Element section = doc.getDefaultRootElement();
int index0 = section.getElementIndex(offset);
int index1 = section.getElementIndex(offset + ((length > 0) ? length - 1 : 0));
for (int i = index0; i <= index1; i++)
{
Element paragraph = section.getElement(i);
AttributeSet attributeSet = paragraph.getAttributes();
Enumeration keys = attributeSet.getAttributeNames();
while (keys.hasMoreElements())
{
Object key = keys.nextElement();
Object attribute = attributeSet.getAttribute(key);
//System.out.println("key = " + key); //For other AttributeSet classes this line is useful because it shows the actual parameter, like "Bold"
System.out.println(attribute.getClass());
System.out.println(attribute);
}
}
The output for a simple textPane with some text added through the setText() method gives:
class javax.swing.text.StyleContext$NamedStyle
NamedStyle:default {foreground=sun.swing.PrintColorUIResource[r=51,g=51,b=51],size=12,italic=false,name=default,bold=false,FONT_ATTRIBUTE_KEY=javax.swing.plaf.FontUIResource[family=Dialog,name=Dialog,style=plain,size=12],family=Dialog,}
About your particular problem, looking at a related SO question I have been able to set the text of a paragraph to bold with:
StyleContext sc = StyleContext.getDefaultStyleContext();
AttributeSet aSet = sc.addAttribute(SimpleAttributeSet.EMPTY, StyleConstants.Bold, true);
In this case the class of aSet is javax.swing.text.StyleContext$SmallAttributeSet, which is not mutable (it does not implement MutableAttributeSet). In your case, where aSet is a mutable SimpleAttributeSet, something along these lines:
aSet.addAttribute(StyleConstants.Bold, true);
should work.
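One thing the printed output suggests is that the defaults are not stored on the paragraph directly: the paragraph only holds a link (the "resolver" attribute) to the NamedStyle called default. The sketch below, run against a bare DefaultStyledDocument rather than a JTextPane, checks whether that link survives a call to setParagraphAttributes with replace set to false and to true. The class and method names are hypothetical, and the behavior is inferred from reading the JDK source, so treat it as an illustration rather than a specification.

```java
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Element;
import javax.swing.text.SimpleAttributeSet;

// Hypothetical demo class.
final class ParagraphDefaultsDemo {

    /** True if the paragraph at offset still carries its own link to a logical style. */
    static boolean hasLogicalStyle(DefaultStyledDocument doc, int offset) {
        Element paragraph = doc.getParagraphElement(offset);
        return paragraph.getAttributes().isDefined(AttributeSet.ResolveAttribute);
    }

    /** Creates a one-paragraph document and applies an empty attribute set to it. */
    static DefaultStyledDocument applyEmptySet(String text, boolean replace)
            throws BadLocationException {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        doc.insertString(0, text, null);
        doc.setParagraphAttributes(0, text.length(), new SimpleAttributeSet(), replace);
        return doc;
    }

    public static void main(String[] args) throws BadLocationException {
        String content = "I'm a line of text";
        System.out.println("replace=false keeps the link: "
                + hasLogicalStyle(applyEmptySet(content, false), 0));
        System.out.println("replace=true keeps the link: "
                + hasLogicalStyle(applyEmptySet(content, true), 0));
    }
}
```

If the link is what gets lost, one option that may help is to call aSet.setResolveParent(doc.getLogicalStyle(0)) before replacing, so the new set still resolves to the default style.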
I want to append HTML content to a JEditorPane, but when I append it this way, a line break is automatically inserted at the end of the existing text. How can I avoid this?
JEditorPane pn = new JEditorPane();
pn.setContentType("text/html");
pn.setText("This is line 1");
...
//after some time
HTMLDocument doc = (HTMLDocument) pn.getDocument();
HTMLEditorKit kit = (HTMLEditorKit) pn.getEditorKit();
kit.insertHTML(doc, doc.getLength(), "<b>Hello</b>", 0, 0, null);
kit.insertHTML(doc, doc.getLength(), "World", 0, 0, null);
It places a line break at the end of the existing text every time insertHTML() is called.
Is this the default behaviour?
If so, how can I handle it?
HTMLDocument has these methods:
public void insertAfterStart(Element elem, String htmlText)
public void insertBeforeEnd(Element elem, String htmlText)
public void insertBeforeStart(Element elem, String htmlText)
public void insertAfterEnd(Element elem, String htmlText)
You can pass a paragraph or character (leaf) element together with the HTML to be inserted.
Probably not the best way, but you might try this:
pn.setText(pn.getText() + "your text to add here");
When I open my file in my text editor, I only get the file's location in the text pane. Am I making a simple mistake somewhere, or is there a better way to do this? Should I use an ArrayList to store the image locations?
Example of what is happening: I have a file that has two lines...
C:\...\pic.png
(picture description)
When I try to open up the file (after I save it in the text editor) it shows the actual location of the picture. I want to be able to use BufferedImage to get the directory and add the image to the JTextPane. Otherwise (if the text isn't a location), simply add the text to the text pane.
FYI: textArea is of type JTextPane
Code that opens my file
// sb is my StringBuffer
try
{
b = new BufferedReader(new FileReader(filename));
String line;
while((line=b.readLine())!=null)
{
if (line.contains("C:\\...\\Pictures\\"))
{
BufferedImage image = ImageIO.read(new File(line));
ImageIcon selectedPicture = new ImageIcon(image);
textArea.insertIcon(selectedPicture);
}
sb.append(line + "\n");
textArea.setText(sb.toString());
}
b.close();
}
If you have any questions about this code or need clarification, don't hesitate to ask.
OK. The way you are setting content on the JTextPane is incorrect.
The basic trick is to get the StyledDocument out of the JTextPane and then set a Style on the document. A style describes how the content should be rendered: for example, text formatting, image icons, spacing, etc.
The following code should get you started.
JTextPane textPane = new JTextPane();
try {
BufferedReader b = new BufferedReader(
new FileReader("inputfile.txt"));
String line;
StyledDocument doc = (StyledDocument) textPane.getDocument();
while ((line = b.readLine()) != null) {
if (line.contains("/home/user/pictures")) {
Style style = doc.addStyle("StyleName", null);
StyleConstants.setIcon(style, new ImageIcon(line));
doc.insertString(doc.getLength(), "ignore", style);
} else {
Style textStyle = doc.addStyle("StyleName", null);
//work on textStyle object to get required color/formatting.
doc.insertString(doc.getLength(), "\n" + line, textStyle);
}
}
b.close();
} catch (Exception e) {
e.printStackTrace();
}
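The same idea can be reduced to two small helpers, one per kind of line. This is a minimal sketch: the class IconLineInserter and its methods are hypothetical names, the icon is attached to a single placeholder character, and the in-memory BufferedImage in main stands in for ImageIO.read(new File(line)) so the example does not depend on a file on disk.

```java
import java.awt.image.BufferedImage;
import javax.swing.Icon;
import javax.swing.ImageIcon;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

// Hypothetical helper for filling a StyledDocument line by line.
final class IconLineInserter {

    /** Appends one placeholder character that is rendered as the given icon. */
    static void appendIcon(StyledDocument doc, Icon icon) throws BadLocationException {
        SimpleAttributeSet attrs = new SimpleAttributeSet();
        StyleConstants.setIcon(attrs, icon);
        doc.insertString(doc.getLength(), " ", attrs);
    }

    /** Appends an ordinary text line followed by a newline. */
    static void appendText(StyledDocument doc, String line) throws BadLocationException {
        doc.insertString(doc.getLength(), line + "\n", null);
    }

    public static void main(String[] args) throws BadLocationException {
        StyledDocument doc = new DefaultStyledDocument();
        // Stand-in for an image that would normally be read from the path in the file
        BufferedImage image = new BufferedImage(16, 16, BufferedImage.TYPE_INT_ARGB);
        appendIcon(doc, new ImageIcon(image));
        appendText(doc, "(picture description)");
        System.out.println("document length: " + doc.getLength());
    }
}
```

Because everything goes through insertString on the document, nothing calls setText afterwards, which would otherwise replace the inserted icons with plain text.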