Reading text from swf with StuartMacKay's transform-swf library

I need to extract all the text from some SWF files. I'm using Java because I already have many modules written in it.
So I searched the Web for free Java libraries that handle SWF files.
I eventually found the library developed by StuartMacKay. It is named transform-swf and is available on GitHub.
The question is: once I extract the GlyphIndexes from a TextSpan, how can I convert the glyphs into characters?
Please provide a complete, working, and tested example. No theoretical answers will be accepted, nor answers like "it cannot be done", "it ain't possible", etc.
What I know and what I did
I know that the GlyphIndexes are built using a TextTable, which is constructed from an integer that represents the font size and a font description provided by a DefineFont2 object. However, when I decode the DefineFont2 objects, every one of them has a zero-length list of advances.
Here follows what I did.
//Creating a Movie object from an swf file.
Movie movie = new Movie();
movie.decodeFromFile(new File(out));
//The loops below iterate over the decoded tags.
List<MovieTag> list = movie.getObjects();
//Saving all the decoded DefineFont2 objects.
Map<Integer,DefineFont2> fonts = new HashMap<>();
for (MovieTag object : list) {
if (object instanceof DefineFont2) {
DefineFont2 df2 = (DefineFont2) object;
fonts.put(df2.getIdentifier(), df2);
}
}
//Now I retrieve all the texts
for (MovieTag object : list) {
if (object instanceof DefineText2) {
DefineText2 dt2 = (DefineText2) object;
for (TextSpan ts : dt2.getSpans()) {
Integer fontIdentifier = ts.getIdentifier();
if (fontIdentifier != null) {
int fontSize = ts.getHeight();
// Here I try to create an object that should
// reverse the process done by a TextTable
ReverseTextTable rtt =
new ReverseTextTable(fonts.get(fontIdentifier), fontSize);
System.out.println(rtt.charactersForText(ts.getCharacters()));
}
}
}
}
The class ReverseTextTable follows here:
public final class ReverseTextTable {
private final transient Map<Character, GlyphIndex> characters;
private final transient Map<GlyphIndex, Character> glyphs;
public ReverseTextTable(final DefineFont2 font, final int fontSize) {
characters = new LinkedHashMap<>();
glyphs = new LinkedHashMap<>();
final List<Integer> codes = font.getCodes();
final List<Integer> advances = font.getAdvances();
final float scale = fontSize / EMSQUARE;
final int count = codes.size();
for (int i = 0; i < count; i++) {
characters.put((char) codes.get(i).intValue(), new GlyphIndex(i,
(int) (advances.get(i) * scale)));
glyphs.put(new GlyphIndex(i,
(int) (advances.get(i) * scale)), (char) codes.get(i).intValue());
}
}
//This method should reverse from a list of GlyphIndexes to a String
public String charactersForText(final List<GlyphIndex> list) {
String text="";
for(GlyphIndex gi: list){
text+=glyphs.get(gi);
}
return text;
}
}
Unfortunately, the list of advances from DefineFont2 is empty, so the constructor of ReverseTextTable throws an ArrayIndexOutOfBoundsException.

Honestly, I don't know how to do that in Java. I'm not claiming that it is impossible; I believe there is a way. However, you said that many libraries can do this, and you also suggested one, namely swftools. So I suggest using that tool to extract the text from a Flash file. To do so, you can use Runtime.exec() to run it as a command-line program.
Personally, I prefer Apache Commons Exec to the process API that ships with the JDK. Let me show you how to do it. The executable you need is "swfstrings.exe". Suppose it is located in "C:\", and that the same folder contains a Flash file, e.g. page.swf. I then tried the following code (it works fine):
Path pathToSwfFile = Paths.get("C:\\", "page.swf");
CommandLine commandLine = CommandLine.parse("C:\\swfstrings.exe");
commandLine.addArgument("\"" + pathToSwfFile.toString() + "\"");
DefaultExecutor executor = new DefaultExecutor();
// Notice that swfstrings.exe returns 1 for success,
// 0 for file not found, -1 for error
executor.setExitValues(new int[]{0, 1});
ByteArrayOutputStream stdout = new ByteArrayOutputStream();
PumpStreamHandler psh = new PumpStreamHandler(stdout);
executor.setStreamHandler(psh);
int exitValue = -1; // stays -1 if the execution throws
try {
    exitValue = executor.execute(commandLine);
} catch (org.apache.commons.exec.ExecuteException ex) {
    psh.stop();
}
if (!executor.isFailure(exitValue)) {
    String out = stdout.toString("UTF-8"); // here you have the extracted text
}
I know, this is not exactly the answer that you requested, but it works fine.
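For comparison, here is a minimal sketch of the same idea using only the JDK's ProcessBuilder instead of Commons Exec. The class and method names (ExternalTool, run, Result) are made up for illustration, and I have not run it against swfstrings.exe itself; it simply starts a command, merges stderr into stdout, and returns the exit code together with the captured text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper: runs an external command and captures what it prints. */
final class ExternalTool {

    /** Exit code plus everything the process wrote to stdout and stderr. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();

        // Drain the output before waiting, so a chatty process cannot block on a full pipe.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        }
        int exitCode = process.waitFor();
        // UTF-8 is an assumption here, mirroring the toString("UTF-8") call above.
        return new Result(exitCode, new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

With the paths assumed above, a call would look like ExternalTool.run(Arrays.asList("C:\\swfstrings.exe", "C:\\page.swf")). Passing the arguments as a list means no manual quoting of the file name is needed. Check the returned exitCode against the values the tool actually reports on your machine.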

I happened to be working on decompiling an SWF in Java and came across this question while figuring out how to reverse engineer the original text.
After looking at the source code, I realised it's really straightforward. Each font has an assigned sequence of characters that can be retrieved by calling DefineFont2.getCodes(), and the glyphIndex is the index of the matching character in DefineFont2.getCodes().
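To make the lookup concrete, here is a minimal sketch that leaves the library out entirely: the code table is a plain List<Integer> standing in for what DefineFont2.getCodes() returns, and the glyph indexes are plain integers standing in for GlyphIndex.getGlyphIndex(). The class name GlyphDecoder and the sample table are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: decodes glyph indexes by looking each one up in a font's code table. */
final class GlyphDecoder {

    /**
     * @param codes        character codes in glyph order (stand-in for DefineFont2.getCodes())
     * @param glyphIndexes glyph indexes taken from the text spans
     */
    static String decode(List<Integer> codes, List<Integer> glyphIndexes) {
        StringBuilder sb = new StringBuilder();
        for (int glyphIndex : glyphIndexes) {
            if (glyphIndex >= 0 && glyphIndex < codes.size()) {
                sb.appendCodePoint(codes.get(glyphIndex));
            } else {
                sb.append('\uFFFD'); // index not covered by this font's table
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up code table for a font that embeds only the glyphs H, e, l, o.
        List<Integer> codes = Arrays.asList((int) 'H', (int) 'e', (int) 'l', (int) 'o');
        System.out.println(decode(codes, Arrays.asList(0, 1, 2, 2, 3))); // prints "Hello"
    }
}
```

The bounds check matters when the wrong font is tried: a glyph index beyond the table is a strong hint that the text does not belong to that font.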
However, when multiple fonts are used in a single SWF file, it is difficult to match each DefineText to the corresponding DefineFont2, because no attribute identifies the DefineFont2 used for each DefineText.
To work around this issue, I came up with a self-learning algorithm that attempts to guess the right DefineFont2 for each DefineText and hence derive the original text correctly.
To reverse engineer the original text, I created a class called FontLearner:
public class FontLearner {
private final ArrayList<DefineFont2> fonts = new ArrayList<DefineFont2>();
private final HashMap<Integer, HashMap<Character, Integer>> advancesMap = new HashMap<Integer, HashMap<Character, Integer>>();
/**
* The same characters from the same font will have similar advance values.
* This constant defines the allowed difference between two advance values
* before they are treated as the same character
*/
private static final int ADVANCE_THRESHOLD = 10;
/**
* Some characters have outlier advance values despite being compared
* to the same character
* This constant defines the minimum accuracy level for each String
* before it is associated with the given font
*/
private static final double ACCURACY_THRESHOLD = 0.9;
/**
* This method adds a DefineFont2 to the learner, and a DefineText
* associated with the font to teach the learner about the given font.
*
* @param font The font to add to the learner
* @param text The text associated with the font
*/
public void addFont(DefineFont2 font, DefineText text) {
fonts.add(font);
HashMap<Character, Integer> advances = new HashMap<Character, Integer>();
advancesMap.put(font.getIdentifier(), advances);
List<Integer> codes = font.getCodes();
List<TextSpan> spans = text.getSpans();
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
char c = (char) (int) codes.get(glyphIndex);
int advance = character.getAdvance();
advances.put(c, advance);
}
}
}
/**
*
* @param text The DefineText to retrieve the original String from
* @return The String retrieved from the given DefineText
*/
public String getString(DefineText text) {
StringBuilder sb = new StringBuilder();
List<TextSpan> spans = text.getSpans();
DefineFont2 font = null;
for (DefineFont2 getFont : fonts) {
List<Integer> codes = getFont.getCodes();
HashMap<Character, Integer> advances = advancesMap.get(getFont.getIdentifier());
if (advances == null) {
advances = new HashMap<Character, Integer>();
advancesMap.put(getFont.getIdentifier(), advances);
}
boolean notFound = true;
int totalMisses = 0;
int totalCount = 0;
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
totalCount += characters.size();
int misses = 0;
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
if (codes.size() > glyphIndex) {
char c = (char) (int) codes.get(glyphIndex);
Integer getAdvance = advances.get(c);
if (getAdvance != null) {
notFound = false;
if (Math.abs(character.getAdvance() - getAdvance) > ADVANCE_THRESHOLD) {
misses += 1;
}
}
} else {
notFound = false;
misses = characters.size();
break;
}
}
totalMisses += misses;
}
double accuracy = (totalCount - totalMisses) * 1.0 / totalCount;
if (accuracy > ACCURACY_THRESHOLD && !notFound) {
font = getFont;
// teach this DefineText to the FontLearner if there are
// any new characters
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
char c = (char) (int) codes.get(glyphIndex);
int advance = character.getAdvance();
if (advances.get(c) == null) {
advances.put(c, advance);
}
}
}
break;
}
}
if (font != null) {
List<Integer> codes = font.getCodes();
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
char c = (char) (int) codes.get(glyphIndex);
sb.append(c);
}
sb = new StringBuilder(sb.toString().trim());
sb.append(" ");
}
}
return sb.toString().trim();
}
}
Usage:
Movie movie = new Movie();
movie.decodeFromStream(response.getEntity().getContent());
FontLearner learner = new FontLearner();
DefineFont2 font = null;
List<MovieTag> objects = movie.getObjects();
for (MovieTag object : objects) {
if (object instanceof DefineFont2) {
font = (DefineFont2) object;
} else if (object instanceof DefineText) {
DefineText text = (DefineText) object;
if (font != null) {
learner.addFont(font, text);
font = null;
}
String line = learner.getString(text); // reverse engineers the line
}
}
I am happy to say that this method has given me 100% accuracy in reverse engineering the original String using StuartMacKay's transform-swf library.

What you are trying to achieve seems difficult. You are trying to decompile the file, but I am sorry to say that, as far as I know, it is not possible. What I would suggest is converting it to a bitmap (if possible) and then reading the characters with OCR, or trying some other method.
There is software that does this, and you can also check forums on the subject, because working with a compiled SWF is very difficult (and not possible as far as I know). You can check this decompiler if you want, or try another language, as in the project here.

I had a similar problem with long strings using transform-swf library.
Got the source code and debugged it.
I believe there was a small bug in class com.flagstone.transform.coder.SWFDecoder.
Line 540 (applicable to version 3.0.2), change
dest += length;
with
dest += count;
That should do it for you (it's about extracting strings).
I notified Stuart as well. The problem appears only if your strings are very large.

I know this isn't what you asked, but I needed to pull text from SWF recently using Java and found the ffdec library much better than transform-swf.
Comment if anyone needs sample code.

Related

Replace or remove text from PDF with PDFbox in Java

I'm trying to use PDFBox 2.0 to blank out or delete a text pattern (in my case I want to remove all "[QR]" words from every PDF), but I can't find anything that works for me.
I tried iText as well, with the same result: nothing works.
The "[QR]" strings in my PDF were edited in after the PDF was created; maybe that's why they don't appear as Tj operators?
My main:
replaceText(documentoPDF, "[QR]", "");
My method (I printed the Tj values and my pattern doesn't appear there):
public void replaceText(PDDocument documentoPDF, String searchString, String replacement) throws IOException{
for ( PDPage page : documentoPDF.getPages()){
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
List<?> tokens = parser.getTokens();
for (int j = 0; j < tokens.size(); j++){
Object next = tokens.get(j);
if (next instanceof Operator){
Operator op = (Operator) next;
String pstring = "";
int prej = 0;
//Tj and TJ are the two operators that display strings in a PDF
if (op.getName().equals("Tj"))
{
// Tj takes one operand, the string to display, so let's update that operand
COSString previous = (COSString) tokens.get(j - 1);
String string = previous.getString();
string = string.replaceFirst(searchString, replacement);
previous.setValue(string.getBytes());
} else
if (op.getName().equals("TJ"))
{
COSArray previous = (COSArray) tokens.get(j - 1);
for (int k = 0; k < previous.size(); k++)
{
Object arrElement = previous.getObject(k);
if (arrElement instanceof COSString)
{
COSString cosString = (COSString) arrElement;
String string = cosString.getString();
if (j == prej) {
pstring += string;
} else {
prej = j;
pstring = string;
}
}
}
System.out.println(pstring.trim());
if (searchString.equals(pstring.trim()))
{
COSString cosString2 = (COSString) previous.getObject(0);
cosString2.setValue(replacement.getBytes());
int total = previous.size()-1;
for (int k = total; k > 0; k--) {
previous.remove(k);
}
}
}
}
}
// now that the tokens are updated we will replace the page content stream.
PDStream updatedStream = new PDStream(documentoPDF);
OutputStream out = updatedStream.createOutputStream(COSName.FLATE_DECODE);
ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
tokenWriter.writeTokens(tokens);
out.close();
page.setContents(updatedStream);
}
documentoPDF.save("resources\\resultado\\nuevo.pdf");
}
This is an example of a PDF with some [QR] patterns: http://www.mediafire.com/file/9w3kkc4yozwsfms/file
If someone can help, I would appreciate it.
I can upload my entire project if you need it.
Thanks in advance.

As already mentioned in comments, the reason why your code doesn't work is simple: you completely ignore the encoding of the font of that text. In the content stream there actually are [( >) ( 4) ( 5) ( @) ] TJ instructions (the "spaces" before '>', '4', '5', and '@' actually are zero bytes, 0x00). Thus, apparently the encoding is some 16-bit encoding which additionally does not have ASCII naturally embedded.
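To see why a plain string search cannot find the text, consider this small stand-alone sketch. It does not use PDFBox. The byte sequence imitates two-byte codes like those described above, and the code-to-character table is an assumption chosen for illustration (each code is the ASCII value minus 29); it is not read from the actual PDF, where the mapping would come from the font.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical demo: text stored as 2-byte codes is invisible to a naive string search. */
final class TwoByteCodes {

    /** Maps each big-endian 2-byte code to a character through the given table. */
    static String decode(byte[] raw, Map<Integer, Character> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < raw.length; i += 2) {
            int code = ((raw[i] & 0xFF) << 8) | (raw[i + 1] & 0xFF);
            Character c = toUnicode.get(code);
            sb.append(c != null ? c : '\uFFFD'); // unknown code
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Four 2-byte codes as they might sit in a string operand.
        byte[] raw = {0x00, 0x3E, 0x00, 0x34, 0x00, 0x35, 0x00, 0x40};

        // Assumed table; a real one would be derived from the font's encoding.
        Map<Integer, Character> toUnicode = new HashMap<>();
        toUnicode.put(0x3E, '[');
        toUnicode.put(0x34, 'Q');
        toUnicode.put(0x35, 'R');
        toUnicode.put(0x40, ']');

        String naive = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(naive.contains("[QR]")); // prints "false"
        System.out.println(decode(raw, toUnicode)); // prints "[QR]"
    }
}
```

The naive search fails twice over: the zero bytes interleave the characters, and the code values themselves are not the ASCII values of the visible text.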
To properly take the font into account one has to keep track of the current font. This means parsing the whole content stream and analyzing text font setting calls, save graphics state calls, and restore graphics state calls. Then you have to retrieve the proper font object from the correct resources.
All this actually is already done by the PDFBox content parsing framework used for e.g. text extraction. Thus, we can create a content stream editor around this framework.
Actually, this also has already been done, see the PdfContentStreamEditor from this answer.
As in case of your document the text pieces to delete are drawn by a single text drawing instruction each and each of these instructions draws only a text piece to remove, we can simply look at the text the current instruction draws and then decide whether to keep the instruction or not:
PDDocument document = ...;
for (PDPage page : document.getDocumentCatalog().getPages()) {
PdfContentStreamEditor editor = new PdfContentStreamEditor(document, page) {
final StringBuilder recentChars = new StringBuilder();
@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, Vector displacement)
throws IOException {
String string = font.toUnicode(code);
if (string != null)
recentChars.append(string);
super.showGlyph(textRenderingMatrix, font, code, displacement);
}
@Override
protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
String recentText = recentChars.toString();
recentChars.setLength(0);
String operatorString = operator.getName();
if (TEXT_SHOWING_OPERATORS.contains(operatorString) && "[QR]".equals(recentText))
{
return;
}
super.write(contentStreamWriter, operator, operands);
}
final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
};
editor.processPage(page);
}
document.save("nuevo-noQrText.pdf");
(EditPageContent test testRemoveQrTextNuevo)
Depending on your PDFBox version the showGlyph method to override may have a fifth parameter; thus, please check the showGlyph signature of your PDFBox copy and adapt if this code does not work. Thanks to @DanielNorberg for the hint!
In the result, the "[QR]" texts underneath the QR codes have vanished.

How to replace text in a pdf with correct encoding using Itext

I created a Java program for translating PDFs. I am using the Google API for translation. The translation is correct on my Eclipse IDE console, but when I check the newly created PDF, either it is not translated and is copied as is, or only a few words are translated, or the new PDF comes out empty and sometimes corrupted.
I suppose it has something to do with encoding and font types.
I have already gone through the iText page and all the related questions, but none worked for my case. I am trying to translate Portuguese, Spanish, Finnish, French, Hungarian, etc. into English.
Here is my code:
public static final String SRC = "5587309Finnish.pdf";
public static final String DEST = "changed.pdf";
public static void main(String[] args) throws java.io.IOException, DocumentException {
Translate translate = TranslateOptions.getDefaultInstance().getService();
PdfReader reader = new PdfReader(SRC);
int pages = reader.getNumberOfPages();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(DEST));
for(int i=1;i<=pages;i++) {
PdfDictionary dict = reader.getPageN(i);
PdfObject object = dict.getDirectObject(PdfName.CONTENTS);
if (object instanceof PRStream) {
String pageContent =
PdfTextExtractor.getTextFromPage(reader, i);
String[] word = pageContent.split(" ");
PRStream stream = (PRStream) object;
byte[] data = PdfReader.getStreamBytes(stream);
String dd = new String(data, BaseFont.CP1252);
for (int j=0; j < word.length; j++)
{
Translation translation = translate.translate(word[j],Translate.TranslateOption.sourceLanguage("fi"),
Translate.TranslateOption.targetLanguage("en"));
System.out.println(word[j]+"-->>"+translation.getTranslatedText());//here i can check the translation is correct.
dd = dd.replace(word[j],translation.getTranslatedText());
}
stream.setData(dd.getBytes());
}
}
stamper.close();
reader.close();
}
Please help.

According to a comment you have improved your code and are
getting the update dd(i.e. content stream which I am printing) correctly with the replaced text. I don't know why I am getting a blank pdf
Thus, I assume that your (hopefully representative) test PDFs have all their fonts of interest encoded in ANSI'ish encodings and the text arguments of the text drawing instructions contain whole words or even phrases which can properly be processed because otherwise text replacement would not have been possible.
Thus, here is an example of how one can replace text pieces with similarly long ones under such benign circumstances without breaking the content stream syntax. In this example I simply use a Map containing replacement strings. You can do your translation there.
First a frame loading the source, creating a stamper, iterating over the pages, and calling a helper to create a content stream replacement:
Map<String, String> replacements = new HashMap<>();
replacements.put("Förfallodatum", "Ablaufdatum");
try ( InputStream resource = SOURCE_INPUTSTREAM;
OutputStream result = new FileOutputStream(RESULT_FILE) ) {
PdfReader pdfReader = new PdfReader(resource);
PdfStamper pdfStamper = new PdfStamper(pdfReader, result);
for (int pageNum = 1; pageNum <= pdfReader.getNumberOfPages(); pageNum++) {
PdfDictionary page = pdfReader.getPageN(pageNum);
byte[] pageContentInput = ContentByteUtils.getContentBytesForPage(pdfReader, pageNum);
page.remove(PdfName.CONTENTS);
replaceInStringArguments(pageContentInput, pdfStamper.getUnderContent(pageNum), replacements);
}
pdfStamper.close();
}
(EditPageContentSimple test testReplaceInStringArgumentsForklaringAvFakturan)
The method replaceInStringArguments now parses the instructions in the given content stream, isolates string arguments, and calls another helper for each string argument doing the replacement.
void replaceInStringArguments(byte[] contentBytesBefore, PdfContentByte canvas, Map<String, String> replacements) throws IOException {
PRTokeniser tokeniser = new PRTokeniser(new RandomAccessFileOrArray(new RandomAccessSourceFactory().createSource(contentBytesBefore)));
PdfContentParser ps = new PdfContentParser(tokeniser);
ArrayList<PdfObject> operands = new ArrayList<PdfObject>();
while (ps.parse(operands).size() > 0){
for (int i = 0; i < operands.size(); i++) {
PdfObject pdfObject = operands.get(i);
if (pdfObject instanceof PdfString) {
operands.set(i, replaceInString((PdfString)pdfObject, replacements));
} else if (pdfObject instanceof PdfArray) {
PdfArray pdfArray = (PdfArray) pdfObject;
for (int j = 0; j < pdfArray.size(); j++) {
PdfObject arrayObject = pdfArray.getPdfObject(j);
if (arrayObject instanceof PdfString) {
pdfArray.set(j, replaceInString((PdfString)arrayObject, replacements));
}
}
}
}
for (PdfObject object : operands)
{
object.toPdf(canvas.getPdfWriter(), canvas.getInternalBuffer());
canvas.getInternalBuffer().append((byte) ' ');
}
canvas.getInternalBuffer().append((byte) '\n');
}
}
(EditPageContentSimple helper method)
The method replaceInString in turn retrieves a single string operand (a PdfString instance), manipulates it, and returns the manipulated string version:
PdfString replaceInString(PdfString string, Map<String, String> replacements) {
String value = PdfEncodings.convertToString(string.getBytes(), PdfObject.TEXT_PDFDOCENCODING);
for (Map.Entry<String, String> entry : replacements.entrySet()) {
value = value.replace(entry.getKey(), entry.getValue());
}
return new PdfString(PdfEncodings.convertToBytes(value, PdfObject.TEXT_PDFDOCENCODING));
}
(EditPageContentSimple helper method)
Instead of that for loop here you would call your translation routine and translate value.
As has been mentioned before, this code only works under certain benign circumstances. Don't expect it to work for arbitrary documents from the wild, in particular not for documents with other than Western European glyphs.

Implementing save/open with RichTextFX?

Here is my code:
private void save(File file) {
StyledDocument<ParStyle, Either<StyledText<TextStyle>, LinkedImage<TextStyle>>, TextStyle> doc = textarea.getDocument();
// Use the Codec to save the document in a binary format
textarea.getStyleCodecs().ifPresent(codecs -> {
Codec<StyledDocument<ParStyle, Either<StyledText<TextStyle>, LinkedImage<TextStyle>>, TextStyle>> codec
= ReadOnlyStyledDocument.codec(codecs._1, codecs._2, textarea.getSegOps());
try {
FileOutputStream fos = new FileOutputStream(file);
DataOutputStream dos = new DataOutputStream(fos);
codec.encode(dos, doc);
fos.close();
} catch (IOException fnfe) {
fnfe.printStackTrace();
}
});
}
I am trying to implement the save/loading from the demo from here on the RichTextFX GitHub.
I am getting errors in the following lines:
StyledDocument<ParStyle, Either<StyledText<TextStyle>, LinkedImage<TextStyle>>, TextStyle> doc = textarea.getDocument();
error: incompatible types:
StyledDocument<Collection<String>,StyledText<Collection<String>>,Collection<String>>
cannot be converted to
StyledDocument<ParStyle,Either<StyledText<TextStyle>,LinkedImage<TextStyle>>,TextStyle>
and
= ReadOnlyStyledDocument.codec(codecs._1, codecs._2, textarea.getSegOps());
error: incompatible types: inferred type does not conform to equality
constraint(s) inferred: ParStyle
equality constraints(s): ParStyle,Collection<String>
I have added all the required .java files and imported them into my main code. I thought it would be relatively trivial to implement this demo but it has been nothing but headaches.
If this cannot be resolved, does anyone know an alternative way to save the text with formatting from RichTextFX?
Thank you

This question is quite old, but since I ran into the same problem I figured a solution might be useful to others as well.
In the demo whose code you use, ParStyle and TextStyle (custom types) define how information about the style is stored.
The error messages you get pretty much just tell you that your way of storing the style information (in your case, in a String) is not compatible with the way it is done in the demo.
If you want to store the style in a String, which I did as well, you need to implement some way of serializing and deserializing the information yourself.
You can do that, for example (I used an InlineCssTextArea), in the following way:
public class SerializeManager {
public static final String PAR_REGEX = "#!par!#";
public static final String PAR_CONTENT_REGEX = "#!pcr!#";
public static final String SEG_REGEX = "#!seg!#";
public static final String SEG_CONTENT_REGEX = "#!scr!#";
public static String serialized(InlineCssTextArea textArea) {
StringBuilder builder = new StringBuilder();
textArea.getDocument().getParagraphs().forEach(par -> {
builder.append(par.getParagraphStyle());
builder.append(PAR_CONTENT_REGEX);
par.getStyledSegments().forEach(seg -> builder
.append(
seg.getSegment()
.replaceAll(PAR_REGEX, "")
.replaceAll(PAR_CONTENT_REGEX, "")
.replaceAll(SEG_REGEX, "")
.replaceAll(SEG_CONTENT_REGEX, "")
)
.append(SEG_CONTENT_REGEX)
.append(seg.getStyle())
.append(SEG_REGEX)
);
builder.append(PAR_REGEX);
});
String textAreaSerialized = builder.toString();
return textAreaSerialized;
}
public static InlineCssTextArea fromSerialized(String string) {
InlineCssTextArea textArea = new InlineCssTextArea();
ReadOnlyStyledDocumentBuilder<String, String, String> builder = new ReadOnlyStyledDocumentBuilder<>(
SegmentOps.styledTextOps(),
""
);
if (string.contains(PAR_REGEX)) {
String[] parsSerialized = string.split(PAR_REGEX);
for (int i = 0; i < parsSerialized.length; i++) {
String par = parsSerialized[i];
String[] parContent = par.split(PAR_CONTENT_REGEX);
String parStyle = parContent[0];
List<String> segments = new ArrayList<>();
StyleSpansBuilder<String> spansBuilder = new StyleSpansBuilder<>();
String styleSegments = parContent[1];
Arrays.stream(styleSegments.split(SEG_REGEX)).forEach(seg -> {
String[] segContent = seg.split(SEG_CONTENT_REGEX);
segments.add(segContent[0]);
if (segContent.length > 1) {
spansBuilder.add(segContent[1], segContent[0].length());
} else {
spansBuilder.add("", segContent[0].length());
}
});
StyleSpans<String> spans = spansBuilder.create();
builder.addParagraph(segments, spans, parStyle);
}
textArea.append(builder.build());
}
return textArea;
}
}
You can then take the serialized InlineCssTextArea, write the resulting String to a file, and load and deserialize it.
As you can see in the code, I made up some Strings as regexes which will be removed in the serialization process (we don't want our serializer to be injectable, do we ;)).
You can change these to whatever you like; just note they will be removed if used in the text of the TextArea, so they should be something users won't miss in their TextArea.
Also note that this solution serializes the style of the text, the text itself, and the paragraph style, BUT not inserted images or parameters of the TextArea (such as width and height), just the text content of the TextArea with its style.
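The delimiter scheme can be tried out without RichTextFX. The following sketch keeps only the segment level: each segment is a {text, style} pair in a String[] (my own simplification, not a RichTextFX type), the class name SegmentSerializer is made up, and the delimiters are the ones from the code above. Unlike the code above, it splits with Pattern.quote, so the delimiters could later be changed to strings containing regex metacharacters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical round-trip demo for delimiter-based serialization of styled segments. */
final class SegmentSerializer {
    static final String SEG = "#!seg!#";         // terminates a segment
    static final String SEG_CONTENT = "#!scr!#"; // separates text from style

    /** Each element is {text, style}. Delimiters occurring in the text are dropped. */
    static String serialize(List<String[]> segments) {
        StringBuilder sb = new StringBuilder();
        for (String[] seg : segments) {
            // Single removal pass, mirroring the approach of the code above.
            String text = seg[0].replace(SEG, "").replace(SEG_CONTENT, "");
            sb.append(text).append(SEG_CONTENT).append(seg[1]).append(SEG);
        }
        return sb.toString();
    }

    static List<String[]> deserialize(String data) {
        List<String[]> segments = new ArrayList<>();
        if (data.isEmpty()) {
            return segments;
        }
        for (String seg : data.split(Pattern.quote(SEG))) {
            // Limit -1 keeps a trailing empty style instead of dropping it.
            String[] parts = seg.split(Pattern.quote(SEG_CONTENT), -1);
            segments.add(new String[] {parts[0], parts.length > 1 ? parts[1] : ""});
        }
        return segments;
    }
}
```

Serializing the pairs {"Hello ", "-fx-font-weight: bold;"} and {"world", ""} and deserializing the result gives the same two pairs back. The same pattern extends one level up for paragraphs.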
This issue on GitHub really helped me, by the way.

Error or exceptions saving a .txt file (Java)

I run the following piece of Processing (Java) code inside a bigger loop. These lines save a string to a temporary file called kinectDEM0.tmp. The old kinectDEM0.txt is then renamed to kinectDEM1.txt, and the new file (kinectDEM0.tmp) is renamed to kinectDEM0.txt.
It works fine, but sometimes it gets stuck and the kinectDEM1.txt file disappears; the code keeps running but no longer saves the .txt files. No error message appears.
Is there something wrong with saving .txt files this way?
Here's the code:
import java.io.File;
import SimpleOpenNI.*;
import java.util.*;
SimpleOpenNI kinect;
List<int[]> previousKinectValues = new LinkedList<int[]>();
int numPreviousToConsider = 60;
void setup()
{
size(640, 480);
kinect = new SimpleOpenNI(this);
kinect.enableDepth();
frameRate(60);
}
int precedente = millis();
void draw()
{
kinect.update();
PImage depthImage = kinect.depthImage();
image(depthImage, 0, 0);
int[] newDepthValues = kinect.depthMap();
previousKinectValues.add(newDepthValues);
if (previousKinectValues.size() > numPreviousToConsider) {
previousKinectValues.remove(0);
}
int[] depthValues = average(previousKinectValues);
depthValues = reverse(depthValues);
StringBuilder sb = new StringBuilder();
Deque<Integer> row = new LinkedList<Integer>();
int kinectheight = 770; // kinect distance from the baselevel [mm]
int scaleFactor = 1;
int pixelsPerRow = 640;
int pixelsToSkip = 40;
int rowNum = 0;
for (int i = 0; i < depthValues.length; i++) {
if (i > 0 && i == (rowNum + 1) * pixelsPerRow) {
fillStringBuilder(sb, row);
rowNum++;
sb.append("\n");
row = new LinkedList<Integer>();
}
if (i < ((rowNum+1) * pixelsPerRow) - pixelsToSkip) {
//if (i >= (rowNum * pixelsPerRow) + pixelsToSkip) {
row.addFirst((kinectheight - depthValues[i]) * scaleFactor);
}
}
fillStringBuilder(sb, row);
String kinectDEM = sb.toString();
final String[] txt= new String[1]; //creates a String array with one element
int savingtimestep = 2000; // time step in millisec between each saving
if (millis() > precedente + savingtimestep) {
txt[0] = "ncols 600\nnrows 480\nxllcorner 0\nyllcorner 0\ncellsize 1\nNODATA_value 10\n" +kinectDEM;
saveStrings("kinectDEM0.tmp", txt);
precedente = millis();
// delete the old .txt file, from kinectDEM1 to kinectDEMtrash
File f = new File(sketchPath("kinectDEM1.txt"));
boolean success = f.delete();
// rename the old .txt file, from kinectDEM0 to kinectDEM1
File oldName1 = new File(sketchPath("kinectDEM0.txt"));
File newName1 = new File(sketchPath("kinectDEM1.txt"));
oldName1.renameTo(newName1);
// rename kinectDEM0.tmp file to kinectDEM0.txt
File oldName2 = new File(sketchPath("kinectDEM0.tmp"));
File newName2 = new File(sketchPath("kinectDEM0.txt"));
oldName2.renameTo(newName2);
}
}
void fillStringBuilder(StringBuilder sb, Deque<Integer> row) {
boolean emptyRow = false;
while (!emptyRow) {
Integer val = row.pollFirst();
if (val == null) {
emptyRow = true;
} else {
sb.append(val);
val = row.peekFirst();
if (val != null) {
sb.append(" ");
}
}
}
}
int[] average(List<int[]> previousKinectValues) {
if (previousKinectValues.size() > 0) {
int[] first = previousKinectValues.get(0);
int[] avg = new int[first.length];
for (int[] prev : previousKinectValues) {
for (int i = 0; i < prev.length; i++) {
avg[i] += prev[i];
}
}
int num = previousKinectValues.size();
for (int i = 0; i < avg.length; i++) {
avg[i] /= num;
}
return avg;
}
return new int[0];
}

You can simplify your problem into a much smaller example sketch:
saveStrings("one.txt", new String[]{"ABC"});
File oldName = new File(sketchPath("one.txt"));
File newName = new File(sketchPath("two.txt"));
boolean renamed = oldName.renameTo(newName);
println(renamed);
In the future, please try to narrow your problem down to an MCVE like this.
Anyway, this program saves the text ABC to a file named one.txt. It then tries to rename one.txt to two.txt. This is exactly what you're trying to do, just without all that extra kinect code, which doesn't really have anything to do with your problem.
Please run this little example program and then view your sketch directory (Sketch > Show Sketch Folder). You'll see the two.txt file, and that file will contain the text ABC on a single line. Also notice that the program prints true to the console, indicating that the rename was successful. This is exactly what you'd expect.
Now, change the first line to this:
saveStrings("one.txt", new String[]{"XYZ"});
Run the program again. First notice that it prints out false to the console, indicating that the rename was not successful. Then view the sketch folder, and you'll see two text files: one.txt which contains XYZ and two.txt which contains ABC. This is not what we expect, and this is what's happening in your code as well.
So, what's happening is this:
We run our code, create one.txt containing ABC, then rename it to two.txt. We then run the code again, creating one.txt containing XYZ. We then try to rename that new file to two.txt, but we can't.
From the Java API for the File#renameTo() function, emphasis mine:
Many aspects of the behavior of this method are inherently platform-dependent: The rename operation might not be able to move a file from one filesystem to another, it might not be atomic, and it might not succeed if a file with the destination abstract pathname already exists. The return value should always be checked to make sure that the rename operation was successful.
Note that the Files class defines the move method to move or rename a file in a platform independent manner.
So it looks like the rename step is failing because you can't rename a file to overwrite an existing file. Instead, we can use the Files#move() function, which allows us to specify overwrite options:
Files.move(oldName.toPath(), newName.toPath(), StandardCopyOption.REPLACE_EXISTING);
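Put together, the write-then-replace pattern might look like the sketch below. It is plain Java rather than a Processing sketch, and the class name ReplaceOnSave and the ".tmp" suffix are my own choices for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: writes to a temporary file, then moves it over the target. */
final class ReplaceOnSave {

    static void save(Path target, String content) throws IOException {
        // Write next to the target so the move stays within one directory.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));

        // Unlike File#renameTo(), this replaces an existing target
        // and throws an IOException if the move fails.
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Calling save(path, "ABC") and then save(path, "XYZ") leaves a single file containing XYZ, which is the behaviour the two-run experiment above was missing. A failure now surfaces as an exception instead of a silently ignored false return value.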

How do I scan a folder in Java?

I need to scan a particular folder in Java, and be able to return the integer number of files of a particular type (based on not only extension but also naming convention.) For example, I want to know how many JPG files there are in the \src folder that have a simple integer filename (say, 1.JPG through 30.JPG). Can anyone point me in the right direction? Thx

java.io.File.list(FilenameFilter) is the method you're looking for.
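A minimal sketch of how that method could be used for the case in the question, assuming "simple integer filename" means one or more digits followed by .jpg in any letter case. The class name FolderScan and the pattern are my own choices.

```java
import java.io.File;
import java.util.regex.Pattern;

/** Hypothetical helper: counts files such as 1.JPG ... 30.JPG in a folder. */
final class FolderScan {
    // Digits only, then ".jpg"; case-insensitive so 7.JPG and 7.jpg both match.
    private static final Pattern NUMBERED_JPG =
            Pattern.compile("\\d+\\.jpg", Pattern.CASE_INSENSITIVE);

    static int countNumberedJpgs(File folder) {
        // The lambda implements FilenameFilter.accept(File dir, String name).
        String[] names = folder.list((dir, name) ->
                NUMBERED_JPG.matcher(name).matches() && new File(dir, name).isFile());
        // list() returns null if the folder does not exist or is not a directory.
        return names == null ? 0 : names.length;
    }
}
```

For example, FolderScan.countNumberedJpgs(new File("src")) returns the number of matching files directly inside src; subfolders are not searched.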

I have a method that uses a regex pattern for a rather complicated file structure. Something like that could be used, although I'm sure it could be written more concisely than my example (edited for security).
/**
* Get all non-directory filenames from a given foo/flat directory
*
* @param network
* @param typeRegex
* @param locationRegex
* @return
*/
public List<String> getFilteredFilenames(String network, String typeRegex, String locationRegex) {
String regex = null;
List<String> filenames = new ArrayList<String>();
String directory;
// Look at the something network
if (network.equalsIgnoreCase("foo")) {
// Get the foo files first
directory = this.pathname + "/" + "foo/filtered/flat";
File[] foofiles = getFilenames(directory);
// run the regex if need be.
if (locationRegex != null && typeRegex != null ) {
regex = typeRegex + "." + locationRegex;
//System.out.println(regex);
}
for (int i = 0; i < foofiles.length; i++) {
if (foofiles[i].isFile()) {
String file = foofiles[i].getName();
if (regex == null) {
filenames.add(file);
}
else {
if (file.matches(regex)) {
filenames.add(file);
}
}
}
}
}
return filenames;
}
