Email Reading in Java by keeping the text positions

I am trying to read emails using Java. I can retrieve the inbox mails correctly.
But the problem is that the body text comes out with one table cell per line. I need the body text laid out as it is shown in the mail, i.e. the output should keep the same tabular alignment as in the mail.
This is the code I use to get the body text from the Message object:
private static String getTextFromMessage(Message message) throws MessagingException, IOException
{
String result = "";
if (message.isMimeType("text/plain"))
{
result = message.getContent().toString();
}
else if (message.isMimeType("multipart/*"))
{
MimeMultipart mimeMultipart = (MimeMultipart) message.getContent();
result = getTextFromMimeMultipart(mimeMultipart);
}
return result;
}
private static String getTextFromMimeMultipart(MimeMultipart mimeMultipart) throws MessagingException, IOException
{
String result = "";
int count = mimeMultipart.getCount();
for (int i = 0; i < count; i++)
{
BodyPart bodyPart = mimeMultipart.getBodyPart(i);
if (bodyPart.isMimeType("text/plain"))
{
result = result + "\n" + bodyPart.getContent();
break;
}
else if (bodyPart.isMimeType("text/html"))
{
String html = (String) bodyPart.getContent();
result = result + "\n" + org.jsoup.Jsoup.parse(html).text();
}
else if (bodyPart.getContent() instanceof MimeMultipart)
{
result = result + getTextFromMimeMultipart((MimeMultipart) bodyPart.getContent());
}
}
return result;
}
For example, this is the mail content:
[image of the mail content not preserved]
I need the output as:
Beschreibung Stückpreis Anzahl Betrag
22545047 106,56 EUR 1 €106,56 EUR
as it is shown in the mail.
But I got this output:
Beschreibung
Stückpreis
Anzahl
Betrag
22545047
106,56 EUR
1
€106,56 EUR
Can anyone please help me solve this issue?
Thanks in advance.
By the way, the strange words are German for "description", "price per piece", "number of pieces", "total price for this kind". I.e. they form a bill and are irrelevant for the problem.

It seems that you do not like the newlines which you explicitly insert in some of your "rendering" methods.
In order to get rid of them, delete all occurrences of
+ "\n"
in your code.
Then consider adding a single + "\n" at the end of the output.
In case the text you are outputting is the result of an HTML-to-plain-text conversion, you lose the tabular alignment created by the HTML rendering. There are no "ten spaces". To get the alignment translated into ASCII art (spaces that align the columns), you would have to analyze the HTML markup in some depth and derive an appropriate number of spaces to insert.
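The column analysis described above could be sketched roughly as follows. This is only an illustration under assumptions, not part of the original answer: it assumes the table cells have already been pulled out of the HTML into rows of strings (for example with jsoup's select("tr") and select("td, th"), which presumes the mail really uses a plain HTML table), and the class name TableAligner and the two-space gap are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: pads already-extracted table cells into aligned columns.
final class TableAligner {

    // Assumed gap between columns; any positive value works.
    private static final int GAP = 2;

    static String align(List<List<String>> rows) {
        // First pass: find the widest cell in each column.
        List<Integer> widths = new ArrayList<>();
        for (List<String> row : rows) {
            for (int c = 0; c < row.size(); c++) {
                int len = row.get(c).length();
                if (c == widths.size()) {
                    widths.add(len);
                } else if (len > widths.get(c)) {
                    widths.set(c, len);
                }
            }
        }
        // Second pass: append each cell, padding all but the last one in a row.
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            for (int c = 0; c < row.size(); c++) {
                String cell = row.get(c);
                out.append(cell);
                if (c < row.size() - 1) {
                    for (int i = cell.length(); i < widths.get(c) + GAP; i++) {
                        out.append(' ');
                    }
                }
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Beschreibung", "Anzahl", "Betrag"),
                Arrays.asList("22545047", "1", "106,56 EUR"));
        System.out.print(align(rows));
    }
}
```

Padding by String.length() only lines up in a monospaced font, and it ignores nested tables, colspan and similar markup, so real mails may need more work than this.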

Related

JavaMail - how to read actual message content, instead of "javax.mail.internet.MimeMultipart"

I have created a basic email client, which fetches email from the server, and displays it to the user.
I have more code than this, but this is the main body of it that prints the messages:
// retrieve the messages from the folder in an array and print it
Message[] messages = emailFolder.getMessages();
System.out.println("messages.length---" + messages.length);
for (int i = 0, n = messages.length; i < n; i++) {
String content= messages[i].getContent().toString();
Message message = messages[i];
System.out.println("---------------------------------");
System.out.println("Email Number " + (i + 1));
System.out.println("Subject: " + message.getSubject());
System.out.println("From: " + message.getFrom()[0]);
System.out.println("Text: " + message.getContent().toString());
}
However, when I run the program, I get the following output:
messages.length---4
---------------------------------
Email Number 1
Subject: Access for less secure apps has been turned on
From: Google <no-reply@accounts.google.com>
Text: javax.mail.internet.MimeMultipart@69ea3742
---------------------------------
Email Number 2
Subject: Three tips to get the most out of Gmail
From: Gmail Team <mail-noreply@google.com>
Text: javax.mail.internet.MimeMultipart@71318ec4
---------------------------------
Email Number 3
Subject: Stay more organized with Gmail's inbox
From: Gmail Team <mail-noreply@google.com>
Text: javax.mail.internet.MimeMultipart@21213b92
---------------------------------
Email Number 4
Subject: The best of Gmail, wherever you are
From: Gmail Team <mail-noreply@google.com>
Text: javax.mail.internet.MimeMultipart@a67c67e
Process finished with exit code 0
Is there any way for the actual message to be displayed, instead of the mime multipart? For example, if an email read "Hello World!" I would like it to print this after "Text:" for the relevant email message.
Any help on this would be highly appreciated!!

The JavaMail FAQ has the following example code, but it's not going to make any sense to you unless you understand how MIME messages work.
private boolean textIsHtml = false;
/**
* Return the primary text content of the message.
*/
private String getText(Part p) throws
MessagingException, IOException {
if (p.isMimeType("text/*")) {
String s = (String)p.getContent();
textIsHtml = p.isMimeType("text/html");
return s;
}
if (p.isMimeType("multipart/alternative")) {
// prefer html text over plain text
Multipart mp = (Multipart)p.getContent();
String text = null;
for (int i = 0; i < mp.getCount(); i++) {
Part bp = mp.getBodyPart(i);
if (bp.isMimeType("text/plain")) {
if (text == null)
text = getText(bp);
continue;
} else if (bp.isMimeType("text/html")) {
String s = getText(bp);
if (s != null)
return s;
} else {
return getText(bp);
}
}
return text;
} else if (p.isMimeType("multipart/*")) {
Multipart mp = (Multipart)p.getContent();
for (int i = 0; i < mp.getCount(); i++) {
String s = getText(mp.getBodyPart(i));
if (s != null)
return s;
}
}
return null;
}

The content of a MimeMultipart itself can be of type MimeMultipart.
In such cases, you will need to write a recursive parsing method until the entire body has been parsed.
if (bodyPart.getContent() instanceof MimeMultipart){
//Parse body again
}

Thanks for the answers guys!
I found a solution - and I won't post the exact code here, but I'll say what I used in case anyone in future comes across this thread.
I basically created a Multipart object, and used the getContent() method as suggested.
I also created a BodyPart object, branching from this Multipart.
Then it was a simple case of just printing this to the system output.

Not able to replace a text in PDF using PDFBox 2.0.2

My requirements
1) I need to identify a particular text pattern.
2) Then replace that text pattern with a pre-defined text value, keeping the format of the text pattern, such as font, font colour, bold …
3) I am able to identify the text and replace it with predefined values, but writing to PDF is failing.
I tried the following 2 approaches to write to PDF:
1) By Overriding writeString(String string, List textPositions) of PDFTextStripper
2) By using cosArray.add(new COSString(replacedField)); or cosArray.set(…)
Results for approach 1 - By Overriding writeString
The file generated by this code cannot be opened as a PDF. I am able to open it in Word, but the format of the original text is lost.
Results for approach 2 - By using cosArray.add or cosArray.set(…)
I see only boxes in the generated PDF.
Code for approach 1 - By Overriding writeString
public void rewrite(String templatePDFPath) throws IOException {
PDDocument document = null;
Writer pdfWriter = null;
try {
File templateFile = new File(templatePDFPath);
document = PDDocument.load(templateFile);
this.setSortByPosition(true);
this.setStartPage(0);
this.setEndPage(document.getNumberOfPages());
pdfWriter = new PrintWriter(Utils.getFilePathWithTimeStamp(templatePDFPath).toString());
this.writeText(document, pdfWriter);
} finally {
if (document != null) {
document.close();
}
if (null != pdfWriter)
pdfWriter.close();
// if (null != pdfWriter)
// pdfWriter.close();
}
}
protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
for (int i = 0; i < textPositions.size(); i++) {
TextPosition text = textPositions.get(i);
String currentCharcter = text.getUnicode();
// System.out.println("String[" + text.getXDirAdj() + "," + //
// text.getYDirAdj() + " fs=" + text.getFontSize() // + " xscale=" +
// text.getXScale() + " height=" + // text.getHeightDir() + "
// space=" // +
// text.getWidthOfSpace() + " width=" + text.getWidthDirAdj() + //
// "]" +
// currentCharcter);
}
String replacedString = replaceFields(string.trim());
if (!(string.equals(replacedString))) {
System.out.println("Field " + string + " is replaced by value " + replacedString);
// super.writeString(replacedString, textPositions);
super.writeString(replacedString);
}
}
Code for approach 2 - By using cosArray.add or cosArray.set(…)
public List<String> replaceFieldsInCosArray(COSArray cosArray) {
List<String> replacedStrings = new ArrayList<String>();
String stringsOfCOSArray = "";
for (int cosArrayIndex = 0; cosArrayIndex < cosArray.size(); cosArrayIndex++) {
Object cosObject = cosArray.get(cosArrayIndex);
if (cosObject instanceof COSString) {
COSString cosString = (COSString) cosObject;
stringsOfCOSArray += cosString.getString();
}
}
stringsOfCOSArray = stringsOfCOSArray.trim();
//cosArray.clear();
String replacedField = this.replaceFields(stringsOfCOSArray);
System.out.println("cosText:" + stringsOfCOSArray + ":replacedField:" + replacedField);
cosArray.add(new COSString(replacedField));
if (!stringsOfCOSArray.equals(replacedField)) {
replacedStrings.add(replacedField);
}

1) By Overriding writeString(String string, List textPositions) of PDFTextStripper
PDFTextStripper is a tool for extracting plain text. Thus, it is not surprising that your output cannot be opened as a PDF. Furthermore, Word can open it because Word recognises it as plain text and opens it as such.
2) By using cosArray.add(new COSString(replacedField)); or cosArray.set(…)
It is not really clear what you mean here. In particular, which cosArray are you talking about?
One might assume you mean the parameter of the TJ operator, but there are multiple reasons against that assumption:
the TJ operator is but one of the many text showing operators and the only one accepting an array argument; thus, you would look at only a few of the operators in question;
your code would assume that the whole text pattern you try to identify is drawn by the same operation; why should it?
you seem to assume that cosString.getString() returns something intelligible; unfortunately that is not the case in general, merely if the fonts in question use some standard encoding, which has been becoming less and less common;
furthermore, you assume that the glyphs for your replacement text are contained in the font of the replaced text. Why should they? Embedded font subsets have become more and more common...
Thus, what do you actually mean here?
That all being said, if you happen to work merely with naively built PDFs, you might want to look at the answer to the question @Tilmann pointed you to. There is a small set of PDFs that code may work for.
If your PDFs happen to be more sophisticated, though, even describing the approach would be beyond the scope of a single Stack Overflow answer.
By the way, your requirements are not well defined, in particular
replace that text pattern with pre-defined text-value with the same format of text pattern, such as font, font colour, bold …
If the predefined text has three letters, the replacement has two letters, and the found occurrence has the first glyph in red, the second in green, and the third in blue, how should the two replacement glyphs be drawn using those three colors?

Cannot get values from splitted Array String into a String

I am trying to get the values out of String[] value; into String lastName;, but I get an error: java.lang.ArrayIndexOutOfBoundsException: 2
at arduinojava.OpenFile.openCsv(OpenFile.java:51) (lastName = value[2];). Here is my code, but I am not sure whether it goes wrong at the split(), at declaring the variables, or at getting the data into another variable.
Also, I am calling input.next(); three times to ignore the first row, because otherwise "of study" from "Field of study" would also be printed out.
The rows I am trying to read are in a .csv file:
University Firstname Lastname Field of study
Karlsruhe Jerone L Software Engineering
Amsterdam Shahin S Software Engineering
Mannheim Saman K Artificial Intelligence
Furtwangen Omid K Technical Computing
Esslingen Cherelle P Technical Computing
Here's my code:
// Declare Variable
JFileChooser fileChooser = new JFileChooser();
StringBuilder sb = new StringBuilder();
// StringBuilder data = new StringBuilder();
String data = "";
int rowCounter = 0;
String delimiter = ";";
String[] value;
String lastName = "";
/**
* Opens and reads a CSV (comma-separated values) file
*/
public void openCsv() throws Exception {
if (fileChooser.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
// Get file
File file = fileChooser.getSelectedFile();
// Create a scanner for the file
Scanner input = new Scanner(file);
// Ignore first row
input.next();
input.next();
input.next();
// Read from input
while (input.hasNext()) {
// Gets whole row
// data.append(rowCounter + " " + input.nextLine() + "\n");
data = input.nextLine();
// Split row data
value = data.split(String.valueOf(delimiter));
lastName = value[2];
rowCounter++;
System.out.println(rowCounter + " " + data + "Lastname: " + lastName);
}
input.close();
} else {
sb.append("No file was selected");
}
}

The values in your sample are separated by spaces, not by semicolons. Try splitting on one or more whitespace characters instead:
data.split("\\s+");
Change the delimiter as shown below:
String delimiter = "\\s+";
EDIT
The CSV file should be in this format: all values enclosed in double quotes, with a valid separator such as a comma, space or semicolon between them.
"University" "Firstname" "Lastname" "Field of study"
"Karlsruhe" "Jerone" "L" "Software Engineering"
"Amsterdam" "Shahin" "S" "Software Engineering"

Please check whether your file uses ';' as the delimiter. If not, add it and try again; it should work!

Use the OpenCSV library to read CSV files. Here is a detailed example of reading and writing CSV files in Java by Viral Patel.
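For the sample rows exactly as posted (space-separated, no quotes), the exception can be explained like this: the three input.next() calls consume "University", "Firstname" and "Lastname", so the first nextLine() returns the rest of the header, " Field of study". That string contains no ';', split(";") returns a single element, and value[2] is out of bounds. A minimal sketch of one way around it, assuming the file really is whitespace-separated, is to skip the header with nextLine() and limit the split to four parts so that the last field keeps its spaces. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical sketch: extracts the third column from whitespace-separated rows.
final class CsvLastNames {

    static List<String> lastNames(String content) {
        List<String> result = new ArrayList<>();
        try (Scanner input = new Scanner(content)) {
            // Skip the whole header row in one call instead of three next() calls.
            if (input.hasNextLine()) {
                input.nextLine();
            }
            while (input.hasNextLine()) {
                String line = input.nextLine().trim();
                if (line.isEmpty()) {
                    continue;
                }
                // Limit of 4 keeps "Software Engineering" together in value[3].
                String[] value = line.split("\\s+", 4);
                if (value.length > 2) {
                    result.add(value[2]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "University Firstname Lastname Field of study\n"
                + "Karlsruhe Jerone L Software Engineering\n"
                + "Amsterdam Shahin S Software Engineering\n";
        System.out.println(lastNames(sample));
    }
}
```

The length check before value[2] avoids the ArrayIndexOutOfBoundsException on short or malformed rows. For a real file, new Scanner(file) can be used in place of new Scanner(content).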

Birt Report Indian Rupee formatting

I want to format numbers in the Indian Rupee/number format (basically the comma placement) in BIRT through scripting (for some conditional reasons).
If I use:
this.getStyle().numberFormat="#,##,##,##0.000";
it still adds a comma after every 3 digits, as in 12,345,678.000, but I want the number in the format 1,23,45,678.000.
Can you please advise?
EDIT: Bug with BIRT raised as : https://bugs.eclipse.org/bugs/show_bug.cgi?id=432211

EDIT: set a custom format number
Here is a possible workaround, forcing BIRT to make use of the com.ibm.icu.text.DecimalFormat class. I don't know why the Indian format is not natively supported; you could report this in the Bugzilla of the eclipse.org site.
Edit your dataset
Create a new computed column, select "String" datatype
Enter as expression: (in the first line, replace "value" with the actual name of the numeric column containing values)
var columnvalue=row["value"], customformat="#,##,##,##0.000"; //we can add here a test for conditional formatting
if (columnvalue!=null){
var symbols=Packages.com.ibm.icu.text.DecimalFormatSymbols(new Packages.java.util.Locale("en","IN"));
var formatter=Packages.com.ibm.icu.text.DecimalFormat(customformat,symbols);
var value=new Packages.java.math.BigDecimal(columnvalue.toString());
formatter.format(value);
}else{
"-"
}
Click "Preview results" in the dataset editor, a new column should be added at the end with the expected format.

You can use NumberFormat with the locale set to India.
Locale locale = new Locale("en","IN");
String str = NumberFormat.getNumberInstance(locale).format(<your number>);
That's if you are looking for Java code to resolve your problem.
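One caveat, as far as I know: the java.text.DecimalFormat javadoc says that when a pattern has several grouping separators, only the interval between the last one and the end of the integer is used, so the JDK formatter may still print 12,345,678 even with an Indian locale. If ICU4J is not an option, the 3-then-2 grouping can be done by hand. The following is a minimal sketch under that assumption; the class name IndianGrouping and the HALF_UP rounding are hypothetical choices, not something from the question.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical sketch: groups the last three integer digits, then pairs of two.
final class IndianGrouping {

    static String format(BigDecimal amount, int decimals) {
        BigDecimal scaled = amount.setScale(decimals, RoundingMode.HALF_UP);
        String plain = scaled.abs().toPlainString();
        int dot = plain.indexOf('.');
        String integerPart = dot < 0 ? plain : plain.substring(0, dot);
        String fraction = dot < 0 ? "" : plain.substring(dot);

        StringBuilder grouped = new StringBuilder();
        int len = integerPart.length();
        if (len > 3) {
            // Everything left of the last three digits is grouped in twos.
            String head = integerPart.substring(0, len - 3);
            for (int i = 0; i < head.length(); i++) {
                grouped.append(head.charAt(i));
                int remaining = head.length() - 1 - i;
                if (remaining % 2 == 0) {
                    grouped.append(',');
                }
            }
            grouped.append(integerPart.substring(len - 3));
        } else {
            grouped.append(integerPart);
        }
        return (scaled.signum() < 0 ? "-" : "") + grouped + fraction;
    }

    public static void main(String[] args) {
        System.out.println(format(new BigDecimal("12345678"), 3));
    }
}
```

BigDecimal is used instead of double so that large amounts are not rewritten in scientific notation or rounded unexpectedly.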

You can use the following JavaScript currency-format function and call it from BIRT.
function getSouthAsianCurrencyFormat(amount)
{
var l,ftemp,temp,camount,k,adecimal;
var decimals=2;
var ptrn="##,##,###,##,##,###.##";
var ptrnLength=0;
var adecimal=0;
var counts = {};
var ch, index, len, count;
amount= Number(Math.round(amount+'e'+decimals)+'e-'+decimals);
amount=amount.toFixed( decimals );
for (index = 0, len = ptrn.length; index < len; ++index) {
ch = ptrn.charAt(index);
count = counts[ch];
counts[ch] = count ? count + 1 : 1;
}
for (ch in counts) {
if(ch=="#"){
ptrnLength=counts[ch];
console.log(ch + " count: " + ptrnLength+"("+ptrn.length+")");
console.log( "amount length: " + amount.toString().length);
//console.log("decimalLength: "+decimalLength.toString().length);
}
}
if(counts['.']=0){
amount=amount+".00";
}
k=ptrn.toString().length;
l=amount.toString().length;
ftemp=amount.toString();
temp="";
camount="";
if(ptrnLength<(amount.toString().length-1)) return 0;
else {
k=k-1;
l=l-1;
for(i=l;i>-1;i--){
if(ptrn.charAt(k)=="#" || ptrn.charAt(k)=="." ){
camount=ftemp.charAt(i)+camount;
}
else{
camount=ptrn.charAt(k)+camount;
k=k-1;
if(ptrn.charAt(k)=="#"){
camount=ftemp.charAt(i)+camount;
}
}
k=k-1;
}
return (camount);
}
}

Filter words from string

I want to filter a string.
Basically when someone types a message, I want certain words to be filtered out, like this:
User types: hey guys lol omg -omg mkdj*Omg*ndid
I want the filter to run and:
Output: hey guys lol - mkdjndid
And I need the filtered words to be loaded from an ArrayList that contains several words to filter out. Now at the moment I am doing if(message.contains(omg)) but that doesn't work if someone types zomg or -omg or similar.

Use replaceAll with a regex built from the bad word:
message = message.replaceAll("(?i)\\b[^\\w -]*" + badWord + "[^\\w -]*\\b", "");
This passes your test case:
public static void main( String[] args ) {
List<String> badWords = Arrays.asList( "omg", "black", "white" );
String message = "hey guys lol omg -omg mkdj*Omg*ndid";
for ( String badWord : badWords ) {
message = message.replaceAll("(?i)\\b[^\\w -]*" + badWord + "[^\\w -]*\\b", "");
}
System.out.println( message );
}
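Since the words come from a list, it may be worth compiling one pattern for the whole list and quoting each word, so that a word containing regex metacharacters cannot break the expression. The sketch below is one possible variation of the answer above, not the original author's code: the class name WordFilter is hypothetical, and the final step that collapses repeated spaces is an added assumption about the desired output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: removes listed words (plus adjacent symbols) in one pass.
final class WordFilter {

    static String filter(String message, List<String> badWords) {
        if (badWords.isEmpty()) {
            return message;
        }
        // Build word1|word2|... with every word quoted as a literal.
        StringBuilder alternation = new StringBuilder();
        for (String word : badWords) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(word));
        }
        // Same shape as the regex in the answer, applied to the whole list.
        Pattern pattern = Pattern.compile(
                "\\b[^\\w -]*(?:" + alternation + ")[^\\w -]*\\b",
                Pattern.CASE_INSENSITIVE);
        String filtered = pattern.matcher(message).replaceAll("");
        // Collapse the double spaces left behind by removed words.
        return filtered.replaceAll(" {2,}", " ");
    }

    public static void main(String[] args) {
        List<String> badWords = Arrays.asList("omg", "black", "white");
        System.out.println(filter("hey guys lol omg -omg mkdj*Omg*ndid", badWords));
    }
}
```

Because of the leading \b, a listed word is only removed where it starts at a word boundary, so a word such as "classic" is left alone when "ass" is on the list. The flip side is that "zomg" is not caught either.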

try:
input.replaceAll("(\\*?)[oO][mM][gG](\\*?)", "").split(" ")

Dave gave you the answer already, but I will emphasize the statement here. You will face a problem if you implement your algorithm with a simple for-loop that just replaces every occurrence of the filtered word. As an example, if you filter the word ass in the word 'classic' and replace it with 'butt', the resulting word will be 'clbuttic', which doesn't make any sense. Thus, I would suggest using a word list, like the ones stored on Linux under the /usr/share/dict/ directory, to check whether the word is valid or needs filtering.
I don't quite get what you are trying to do.

I ran into this same problem and solved it in the following way:
1) Have a google spreadsheet with all words that I want to filter out
2) Directly download the google spreadsheet into my code with the loadConfigs method (see below)
3) Replace all l33tsp33k characters with their respective alphabet letter
4) Replace all special characters but letters from the sentence
5) Run an algorithm that efficiently checks all possible combinations of words within a string against the list. Note that this part is key: you don't want to loop over your ENTIRE list every time to see if your word is in the list. In my case, I found every combination within the string input and checked it against a HashMap (O(1) lookup). This way the runtime grows with the string input, not with the list input.
6) Check that the word is not used in combination with a good word (e.g. bass contains *ss). This list is also loaded through the spreadsheet.
7) In our case we are also posting the filtered words to Slack, but you can obviously remove that line.
We are using this in our own games and it's working like a charm. Hope you guys enjoy.
https://pimdewitte.me/2016/05/28/filtering-combinations-of-bad-words-out-of-string-inputs/
public static HashMap<String, String[]> words = new HashMap<String, String[]>();
public static void loadConfigs() {
try {
BufferedReader reader = new BufferedReader(new InputStreamReader(new URL("https://docs.google.com/spreadsheets/d/1hIEi2YG3ydav1E06Bzf2mQbGZ12kh2fe4ISgLg_UBuM/export?format=csv").openConnection().getInputStream()));
String line = "";
int counter = 0;
while((line = reader.readLine()) != null) {
counter++;
String[] content = null;
try {
content = line.split(",");
if(content.length == 0) {
continue;
}
String word = content[0];
String[] ignore_in_combination_with_words = new String[]{};
if(content.length > 1) {
ignore_in_combination_with_words = content[1].split("_");
}
words.put(word.replaceAll(" ", ""), ignore_in_combination_with_words);
} catch(Exception e) {
e.printStackTrace();
}
}
System.out.println("Loaded " + counter + " words to filter out");
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* Iterates over a String input and checks whether a cuss word was found in a list, then checks if the word should be ignored (e.g. bass contains the word *ss).
* @param input
* @return
*/
public static ArrayList<String> badWordsFound(String input) {
if(input == null) {
return new ArrayList<>();
}
// remove leetspeak
input = input.replaceAll("1","i");
input = input.replaceAll("!","i");
input = input.replaceAll("3","e");
input = input.replaceAll("4","a");
input = input.replaceAll("@","a");
input = input.replaceAll("5","s");
input = input.replaceAll("7","t");
input = input.replaceAll("0","o");
ArrayList<String> badWords = new ArrayList<>();
input = input.toLowerCase().replaceAll("[^a-zA-Z]", "");
for(int i = 0; i < input.length(); i++) {
for(int fromIOffset = 1; fromIOffset < (input.length()+1 - i); fromIOffset++) {
String wordToCheck = input.substring(i, i + fromIOffset);
if(words.containsKey(wordToCheck)) {
// for example, if you want to say the word bass, that should be possible.
String[] ignoreCheck = words.get(wordToCheck);
boolean ignore = false;
for(int s = 0; s < ignoreCheck.length; s++ ) {
if(input.contains(ignoreCheck[s])) {
ignore = true;
break;
}
}
if(!ignore) {
badWords.add(wordToCheck);
}
}
}
}
for(String s: badWords) {
Server.getSlackManager().queue(s + " qualified as a bad word in a username");
}
return badWords;
}
