Remove a character followed by whitespace each newline of a string


I am writing a program to edit an RTF file. The RTF file will always come in the same format:
Q XXXXXXXXXXXX
A YYYYYYYYYYYY
Q XXXXXXXXXXXX
A YYYYYYYYYYYY
I want to remove the Q / A + whitespace and leave just the X's and Y's on each line. My first idea is to split the string into a new string for each line and edit it from there using str.split like so:
private void countLines(String str){
String[] lines = str.split("\r\n|\r|\n");
linesInDoc = lines;
}
From here my idea is to take each even array value and get rid of Q + whitespace, and take each odd array value and get rid of A + whitespace. Is there a better way to do this? Note: The first line sometimes contains a ~6 digit alphanumeric. I think an if statement checking for 2 non-whitespace chars would solve this.
Here is the rest of the code:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import javax.swing.JEditorPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.EditorKit;
public class StringEditing {
String[] linesInDoc;
private String readRTF(File file){
String documentText = "";
try{
JEditorPane p = new JEditorPane();
p.setContentType("text/rtf");
EditorKit rtfKit = p.getEditorKitForContentType("text/rtf");
rtfKit.read(new FileReader(file), p.getDocument(), 0);
rtfKit = null;
EditorKit txtKit = p.getEditorKitForContentType("text/plain");
Writer writer = new StringWriter();
txtKit.write(writer, p.getDocument(), 0, p.getDocument().getLength());
documentText = writer.toString();
}
catch( FileNotFoundException e )
{
System.out.println( "File not found" );
}
catch( IOException e )
{
System.out.println( "I/O error" );
}
catch( BadLocationException e )
{
}
return documentText;
}
public void editDocument(File file){
String plaintext = readRTF(file);
System.out.println(plaintext);
fixString(plaintext);
System.out.println(plaintext);
}

Unless I'm missing something, you could use String.substring(int) like
String lines = "Q XXXXXXXXXXXX\n" //
+ "A YYYYYYYYYYYY\n" //
+ "Q XXXXXXXXXXXX\n" //
+ "A YYYYYYYYYYYY\n";
for (String line : lines.split("\n")) {
System.out.println(line.substring(2)); // skip the marker and the single space after it
}
Output is
XXXXXXXXXXXX
YYYYYYYYYYYY
XXXXXXXXXXXX
YYYYYYYYYYYY
If your format should be more general, you might prefer
System.out.println(line.substring(1).trim());

BufferedReader.readLine() handles the line breaks for you: it returns each line without its terminator.
You can use a regex match to validate that the line is in the desired format.
If the prefix has a fixed length, simply use substring:
final String bodyPattern = "\\w{1,1}[ \\w]{5,5}\\d{12,12}";
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
String line;
while ((line = br.readLine()) != null) {
if (line.matches(bodyPattern)) {
// drop the 6-character prefix that the pattern above validated
myString = line.substring(6);
}
}
}
//catch Block
You can adjust the regex pattern to your specific requirements.

This is easy to do with a regex (assuming 'fileText' is your whole file's content):
removedPrefix = fileText.replaceAll("(?m)^(A|Q) +", "");
The regex means: at the start of a line ((?m) makes ^ match after every line break), a Q or A followed by one or more spaces. Only that prefix is removed, so the rest of each line and its line ending stay untouched. This doesn't do anything to the first line with the digits, as long as it does not start with "Q " or "A ". There are easier ways if you know the exact number of spaces before your needed text, but this works for any number of spaces and is very flexible.
If you process line by line it's
removedPrefix = currentLine.replaceAll("^(A|Q) +", "");
As simple as that.
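A minimal runnable sketch of the regex approach, assuming every Q/A line starts with the marker letter followed by at least one space or tab. The class name PrefixStripper and the method strip are made up for illustration, and the sample ID line "A1B2C3" is invented to show the first-line case from the question.

```java
import java.util.regex.Pattern;

public class PrefixStripper {
    // (?m) makes ^ match at the start of every line, not only at the start of the input.
    // [ \t]+ requires at least one space or tab after the marker, so an ID line such as
    // "A1B2C3" (no whitespace after the first character) is left alone.
    private static final Pattern PREFIX = Pattern.compile("(?m)^[QA][ \\t]+");

    static String strip(String text) {
        return PREFIX.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "A1B2C3\nQ XXXXXXXXXXXX\nA YYYYYYYYYYYY\n";
        System.out.print(strip(text));
    }
}
```

Because only the prefix is matched, the line endings (\n, \r\n or \r) are never touched, so the result can be written back as it is.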

Related

outputstream writer java for integers

I am quite new to Java. I need to save XML to CSV using Java, but the problem is that I cannot use CSVWriter because the XML also contains UTF-8 encoded data.
Therefore I found out it is possible to use OutputStreamWriter, which can encode as UTF-8.
For strings everything is OK, but for an integer I cannot get the correct number.
Sample code:
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.*;
public class UTF8WriterDemo {
public static void main(String[] args) {
Writer out = null;
try {
out = new BufferedWriter(
new OutputStreamWriter(new FileOutputStream("c://java2//file.csv"), "windows-1250"));
//for (int i=0; i<4; i++ ) {
String text = "This tečt will be added to File !!";
int hu = 4;
out.write('\ufeff');
out.write(text+ '\n');
out.write(hu+ '\n');
//}
out.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
I get a strange symbol instead of a number.
I suppose it's because:
An OutputStreamWriter is a bridge from character streams to byte streams: Characters written to it are encoded into bytes using a specified charset. The charset that it uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.
And that's why it's not displayed correctly.
Therefore I would like to ask: is there any way to write integers correctly using OutputStreamWriter?
If not, how can I convert XML data into CSV using Java for UTF-8 encoded characters?
Thank you

Java distinguishes between double quotes and single quotes.
"foo" is a String.
'f' is a char (or Character).
'foo' will not compile, because a char literal can hold only one character.
'\n' is also one character, specifically the newline character, which has the numeric value 10. Adding an int and a char is integer arithmetic, so hu + '\n' is the int 14, and Writer.write(int) writes the single character with that code (a control character) instead of the text "4".
Using double quotes, as in hu + "\n", turns it into a String concatenation and should fix your issue.
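To make the difference visible, here is a small sketch that writes the same int both ways. A StringWriter stands in for the file writer so the result can be inspected in memory; the class and method names are invented for the example.

```java
import java.io.StringWriter;

public class CharPlusIntDemo {
    // int + String is concatenation: the writer receives the text "4\n".
    static String writeAsText(int hu) {
        StringWriter out = new StringWriter();
        out.write(hu + "\n");
        return out.toString();
    }

    // int + char is integer arithmetic: 4 + 10 = 14, and write(int)
    // emits the single character whose code is 14.
    static String writeAsCharCode(int hu) {
        StringWriter out = new StringWriter();
        out.write(hu + '\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeAsText(4).equals("4\n"));       // true
        System.out.println((int) writeAsCharCode(4).charAt(0)); // 14
    }
}
```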

import java.io.*;
public class UTF8WriterDemo {
public static void main(String[] args) {
Writer out = null;
try {
out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("file.csv"), "windows-1250"));
//for (int i = 0; i < 4; i++) {
String text = "This text will be added to File !!";
int hu = 4;
String text2 = String.valueOf(hu); // convert the int to its text form before writing
out.write('\ufeff');
out.write(text + '\n');
out.write(text2 + '\n');
// }
out.close();
} catch (Exception e) {
e.printStackTrace();
} finally {
System.out.println("The process is completed.");
}
}
}

Actually, I need to rewrite this construction:
FileWriter fileWriter = new
FileWriter("C:\\java\\test\\EEexample3.csv");
CSVWriter csvWriter = new CSVWriter(fileWriter);
csvWriter.writeNext(new String[] {
..
..
..
..
}
..code.. code..
String homeCurrencyPriceString = iit.getHomeCurrency().getPrice()!=null?iit.getHomeCurrency().getPrice().toString():"";
String headerDateString = invoiceHeaderType.getDateTax()!=null?invoiceHeaderType.getDateTax().toString():"";
String invoiceTypeString = invoiceHeaderType.getInvoiceType()!=null?invoiceHeaderType.getInvoiceType().value():"";
String headeraccountno= invoiceHeaderType.getAccount().getAccountNo()!=null?invoiceHeaderType.getAccount().getAccountNo().toString():"";
String headertext = invoiceHeaderType.getText()!=null?invoiceHeaderType.getText():"";
String invoiceitemtext= iit.getText()!=null?iit.getText():"";
String headericdph = invoiceHeaderType.getPartnerIdentity().getAddress().getIcDph()!=null?invoiceHeaderType.getPartnerIdentity().getAddress().getIcDph():"";
String symVar = invoiceHeaderType.getSymVar()!=null?invoiceHeaderType.getSymVar():"";
csvWriter.writeNext(new String[] {
invoiceHeaderType.getPartnerIdentity().getAddress().getIco(), headericdph, invoiceHeaderType.getPartnerIdentity().getAddress().getCompany(),symVar, invoiceHeaderType.getId().toString(), iit.getId().toString(), homeCurrencyPriceString, detailcentreString,headercentreString, headerDateString, invoiceTypeString,headeraccountno, headertext,invoiceitemtext
});
(where the objects are filled from the XML)
into an OutputStreamWriter construction.
So first I am trying OutputStreamWriter with simple code, to be sure it's working; once it works, I want to rewrite the whole code.
Using CSVWriter everything worked smoothly, but now texts encoded in UTF-8/windows-1250 have been added :( Therefore I need to fix the construction of the code.
Even number objects like price are converted using .toString(), so maybe it works without int.
I hope the writer of OutputStreamWriter is able to do what is necessary.
I am going to try.
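A rough sketch of that plan using only the standard library: every field is converted to a String first, and the row is pushed through an OutputStreamWriter that encodes as UTF-8. The class name, the writeRow helper and the sample values are assumptions for illustration, and no CSV quoting or escaping is done here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8CsvSketch {
    // Writes one comma-separated row, encoded as UTF-8, to any OutputStream.
    // Fields containing commas, quotes or line breaks are NOT escaped here.
    static void writeRow(OutputStream target, String... fields) {
        try {
            Writer out = new OutputStreamWriter(target, StandardCharsets.UTF_8);
            out.write(String.join(",", fields));
            out.write("\n");
            out.flush(); // push the encoded bytes through to the underlying stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int price = 4;
        // \u010D is the letter "č" from the sample text in the question
        writeRow(bytes, "te\u010Dt", String.valueOf(price));
        System.out.print(new String(bytes.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

For a real file, a FileOutputStream would take the place of the ByteArrayOutputStream. Since the CSVWriter in the code above is constructed from a Writer, passing it an OutputStreamWriter instead of the FileWriter may also be enough to control the encoding.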

Splitting a text file into multiple files by specific character sequence

I have a file with the following format.
.I 1
.T
experimental investigation of the aerodynamics of a
wing in a slipstream . 1989
.A
brenckman,m.
.B
experimental investigation of the aerodynamics of a
wing in a slipstream .
.I 2
.T
simple shear flow past a flat plate in an incompressible fluid of small
viscosity .
.A
ting-yili
.B
some texts...
some more text....
.I 3
...
".I 1" indicates the beginning of the chunk of text corresponding to doc ID 1 and ".I 2" indicates the beginning of the chunk of text corresponding to doc ID 2.
What I need is to read the text between ".I 1" and ".I 2" and save it as a separate file like "DOC_ID_1.txt", then read the text between ".I 2" and ".I 3"
and save it as a separate file like "DOC_ID_2.txt", and so on. Let's assume that the number of .I # entries is not known.
I have tried this but cannot finish it. Any help will be appreciated.
String inputDocFile="C:\\Dropbox\\Data\\cran.all.1400";
try {
File inputFile = new File(inputDocFile);
FileReader fileReader = new FileReader(inputFile);
BufferedReader bufferedReader = new BufferedReader(fileReader);
String line=null;
String outputDocFileSeperatedByID="DOC_ID_";
//Pattern docHeaderPattern = Pattern.compile(".I ", Pattern.MULTILINE | Pattern.COMMENTS);
ArrayList<ArrayList<String>> result = new ArrayList<> ();
int docID =0;
try {
StringBuilder sb = new StringBuilder();
line = bufferedReader.readLine();
while (line != null) {
if (line.startsWith(".I"))
{
result.add(new ArrayList<String>());
result.get(docID).add(".I");
line = bufferedReader.readLine();
while(line != null && !line.startsWith(".I")){
line = bufferedReader.readLine();
}
++docID;
}
else line = bufferedReader.readLine();
}
} finally {
bufferedReader.close();
}
} catch (IOException ex) {
Logger.getLogger(ReadFile.class.getName()).log(Level.SEVERE, null, ex);
}

You want to find the lines which match ".I n".
The regex you need is : ^\.I \d+$
^ indicates the beginning of the line. Hence, if there is whitespace or text before .I, the line will not match the regex.
\. matches a literal dot (an unescaped . would match any character).
\d+ indicates one or more digits, so IDs such as 10 or 1400 match as well.
$ indicates the end of the line. Hence, if there are some characters after the digits, the line will not match the expression.
Now, you need to read the file line by line and keep a reference to the file in which you write the current line.
Reading a file line by line is much easier in Java 8 with Files.lines();
private String currentFile = "root.txt";
public static final String REGEX = "^\\.I \\d+$";
public void foo() throws Exception{
Path path = Paths.get("path/to/your/input/file.txt");
// try-with-resources closes the underlying file when the stream is done
try (Stream<String> lines = Files.lines(path)) {
lines.forEach(line -> {
if(line.matches(REGEX)) {
//Extract the ID and update currentFile
currentFile = "DOC_ID_"+line.substring(3)+".txt";
System.out.println("Current file is now : " + currentFile);
} else {
System.out.println("Writing this line to "+currentFile + " :" + line);
//Files.write(...);
}
});
}
}
Note : In order to extract the ID, I use a raw String.substring(), which is brittle but easy to understand. You can do it in a better way with a Pattern and a Matcher :
With this regex : "^\\.I (\\d+)$". (The dot is escaped, and the parentheses indicate what you want to capture). Then :
Pattern pattern = Pattern.compile("^\\.I (\\d+)$");
Matcher matcher = pattern.matcher(".I 3");
if(matcher.find()) {
System.out.println(matcher.group(1));//display "3"
}
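Putting the header regex and the capture group together, the splitting step could be sketched as follows. The lines are grouped in memory under the target file name. The class name DocSplitter, the method split and the sample lines are assumptions made for the example, and lines that appear before the first header are simply dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocSplitter {
    // A header line is exactly ".I " followed by one or more digits.
    private static final Pattern HEADER = Pattern.compile("^\\.I (\\d+)$");

    // Groups lines under the file name derived from the most recent header.
    static Map<String, List<String>> split(List<String> lines) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (m.matches()) {
                // start a new document named after the captured ID
                current = new ArrayList<>();
                docs.put("DOC_ID_" + m.group(1) + ".txt", current);
            } else if (current != null) {
                current.add(line);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                ".I 1", ".T", "first title", ".I 2", ".T", "second title");
        split(lines).forEach((name, body) -> System.out.println(name + " -> " + body));
    }
}
```

Each entry can then be written out with Files.write(Paths.get(name), body), which writes one list element per line.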

Look up regex; Java has built-in libraries for this.
https://docs.oracle.com/javase/tutorial/essential/regex/
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
These links will give you a starting point. Effectively, you can use a counter while pattern matching against the text and store everything between the first pattern match and the second pattern match. This information can be output to a separate file using the Formatter class.
Found here:
http://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
public class Test {
/**
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException {
String inputFile="C:\\logs\\test.txt";
StringBuilder sb = new StringBuilder();
int count=1;
// try-with-resources closes the reader even if an exception is thrown
try (BufferedReader br = new BufferedReader(new FileReader(new File(inputFile)))) {
String line;
while((line = br.readLine()) != null){
if(line.startsWith(".I")){
// a new header: flush the previous document, if any
if(sb.length()!=0){
writeDoc(sb, count);
count++;
}
continue;
}
// readLine() strips the line break, so add it back
sb.append(line).append(System.lineSeparator());
}
// the last document has no header after it, so write it here
if(sb.length()!=0){
writeDoc(sb, count);
}
}
}
// writes the buffered text to DOC_ID_<count>.txt and empties the buffer
private static void writeDoc(StringBuilder sb, int count) throws IOException {
File file = new File("C:\\logs\\DOC_ID_"+count+".txt");
try (PrintWriter writer = new PrintWriter(file, "UTF-8")) {
writer.print(sb.toString());
}
sb.setLength(0);
}
}

How to split string by new lines in JAVA?

I want to split a string by new lines in Java. I am using the following regex -
str.split("\\r|\\n|\\r\\n");
But it is still not splitting the string by new lines.
Input -
0
0
0
0
Output = String [] array = {"0000"} instead I want = String [] array = {"0","0","0","0"}.
I have read various solutions on stack overflow but nothing works for me.
Code is -
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.DecimalFormat;
public class Input {
public static void main(String[] args) {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line;
String text = "";
try {
while((line=br.readLine())!=null){
text = text + line;
}
} catch (IOException e) {
e.printStackTrace();
}
String [] textarray = text.trim().split("[\\r\\n]+");
for(int j=0;j<textarray.length;j++)
System.out.println(textarray[j]);
// System.out.print("");
// for(int i=((textarray.length)-1);i>=0;i--){
// long k = Long.valueOf(textarray[i]).longValue();
// System.out.println(k);
//// double sqrt = Math.sqrt(k);
//// double value = Double.parseDouble(new DecimalFormat("##.####").format(sqrt));
//// System.out.println(value);
////
//// }
}
}

When you call br.readLine(), the newline characters are stripped from the end of the string. So if you type 0 + ENTER four times and concatenate the results, you are trying to split the string "0000".
You would be better off reading the items from stdin and storing them in an expandable data structure, such as a List<String>. There is no need to split things if you've already read them separately.
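A short sketch of that suggestion, assuming the input arrives one value per line. A StringReader stands in for new InputStreamReader(System.in) so the example runs without keyboard input; the class and method names are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineCollector {
    // Collects every line from the reader; readLine() already removes the line break.
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines(new StringReader("0\n0\n0\n0\n"))); // [0, 0, 0, 0]
    }
}
```

If an array is really needed afterwards, lines.toArray(new String[0]) converts the list.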

Updated Answer:
If you are reading the input stream from the keyboard, the \n may not end up in the data at all. In that case, you may want to choose a different sentinel value.
Original Answer:
I believe you need a sentinel value. So if \n is your sentinel value, you could do something like this:
Load the input stream into a String variable.
Go character by character through the String, checking whether the current character is \n (you could use a for loop with charAt(i) or substring(i, i+1)).
Each time it is found, add the text since the previous \n to an array.

How can I parse through a file for a string matching a generated string?

My bad for the title, I am usually not good at making those.
I have a programme that generates all permutations of an inputted word and is supposed to check whether those are words (by checking a dictionary) and output the ones that are. Really I just need the last part, and I cannot figure out how to parse through a file.
I took out what was there (now displaying the "String words =") because it really made things worse (it was an if statement). Right now, all it will do is output all permutations.
Edit: I should add that the try/catch was added when I tried turning the file into a list (as opposed to the string format which it is currently in). So right now it does nothing.
One more thing: is it possible (well, how, really) to also display permutations with fewer characters than entered? Sorry for the bad wording; for example, if I enter five characters, show all five-character permutations, and four, and three, and two, and one.
import java.util.List;
import java.util.Scanner;
import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import static java.lang.System.out;
public class Permutations
{
public static void main(String[] args) throws Exception
{
out.println("Enter anything to get permutations: ");
Scanner scan = new Scanner(System.in);
String io = scan.nextLine();
String str = io;
StringBuffer strBuf = new StringBuffer(str);
mutate(strBuf,str.length());
}
private static void mutate(StringBuffer str, int index)
{
try
{
String words = FileUtils.readFileToString(new File("wordsEn.txt"));
if(index <= 0)
{
out.println(str);
}
else
{
mutate(str, index - 1);
int currLoc = str.length()-index;
for (int i = currLoc + 1; i < str.length(); i++)
{
change(str, currLoc, i);
mutate(str, index - 1);
change(str, i, currLoc);
}
}
}
catch(IOException e)
{
out.println("Your search found no results");
}
}
private static void change(StringBuffer str, int loc1, int loc2)
{
char t1 = str.charAt(loc1);
str.setCharAt(loc1, str.charAt(loc2));
str.setCharAt(loc2, t1);
}
}

If each word in your file is actually on a different line, maybe you can try this:
BufferedReader br = new BufferedReader(new FileReader(file));
String line = null;
while ((line = br.readLine()) != null)
{
... // check and print here
}
Or if you want to try something else, the Apache Commons IO library has something called LineIterator.
An Iterator over the lines in a Reader.
LineIterator holds a reference to an open Reader. When you have finished with the iterator you should close the reader to free internal resources. This can be done by closing the reader directly, or by calling the close() or closeQuietly(LineIterator) method on the iterator.
The recommended usage pattern is:
LineIterator it = FileUtils.lineIterator(file, "UTF-8");
try {
while (it.hasNext()) {
String line = it.nextLine();
// do something with line
}
} finally {
it.close();
}
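Once the words are read, keeping them in a Set makes each "is this a word?" check a single lookup. The sketch below is one possible way to combine that with the permutations, including the shorter ones asked about in the question; it is not the asker's mutate method, and the class name, method names and the tiny sample dictionary are invented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class PermutationWords {
    // Returns every arrangement of 1..n of the given letters that is in the dictionary.
    static Set<String> findWords(String letters, Set<String> dictionary) {
        Set<String> found = new TreeSet<>(); // sorted, and duplicates collapse
        permute("", letters, dictionary, found);
        return found;
    }

    private static void permute(String prefix, String remaining,
                                Set<String> dictionary, Set<String> found) {
        // Every prefix is itself a shorter permutation, so check it too.
        if (!prefix.isEmpty() && dictionary.contains(prefix)) {
            found.add(prefix);
        }
        for (int i = 0; i < remaining.length(); i++) {
            // move character i from "remaining" to the end of the prefix
            permute(prefix + remaining.charAt(i),
                    remaining.substring(0, i) + remaining.substring(i + 1),
                    dictionary, found);
        }
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("a", "at", "cat", "act", "dog"));
        System.out.println(findWords("cat", dictionary)); // [a, act, at, cat]
    }
}
```

With a real word list, the set would be built once before the recursion starts, for example with new HashSet<>(Files.readAllLines(Paths.get("wordsEn.txt"))), rather than re-reading the file in every recursive call.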

Java Regex to remove all words after a key till end key

Can anyone please help me?
I have a file containing important information, but it also contains irrelevant information. The irrelevant information is enclosed in curly
brackets, for example:
Function blah blah 1+2 {unwanted information} something+2
What I wish to do is remove the unwanted information and display the output like this:
Function blah blah 1+2 something+2
Can someone please give me the regex code for this?
I have partial code for this:
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.BufferedReader;
public class SimpleReader{
public static void main( String a[] )
{
String source = readFile("source.java");
}
static String readFile(String fileName) {
File file = new File(fileName);
char[] buffer = null;
try {
BufferedReader bufferedReader = new BufferedReader( new FileReader(file));
buffer = new char[(int)file.length()];
int i = 0;
int c = bufferedReader.read();
while (c != -1) {
buffer[i++] = (char)c;
c = bufferedReader.read();
}
} catch (IOException e) {
e.printStackTrace();
}
return new String(buffer);
}
}
Thanks in advance.

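newstr = str.replaceAll("\\s*\\{[^}]*\\}", "");
The opening brace must be escaped because { starts a quantifier in a Java regex; left unescaped at this position it causes a PatternSyntaxException. The leading \\s* also removes the space in front of the block, so no double space is left behind.
Modified the answer from this question: How to remove entire substring from '<' to '>' in Java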
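A small runnable check of the brace-removal regex on the sample line from the question. The class and method names are made up, and nested braces are not handled: the match ends at the first closing brace.

```java
public class BraceRemover {
    // Removes each {...} block together with the whitespace directly in front of it.
    static String removeBraced(String text) {
        return text.replaceAll("\\s*\\{[^}]*\\}", "");
    }

    public static void main(String[] args) {
        String line = "Function blah blah 1+2 {unwanted information} something+2";
        System.out.println(removeBraced(line)); // Function blah blah 1+2 something+2
    }
}
```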
