I have a problem with some simple code and don't know how to do it.
I have 3 txt files.
First txt file looks like this:
1 2 3 4 5 4.5 4,6 6.8 8,9
1 3 4 5 8 9,2 6,3 6,7 8.9
I would like to read numbers from this txt file and save integers to one txt file and floats to another.
You can do it with the following easy steps:
When you read a line, split it on whitespace and get an array of tokens.
While processing each token,
Trim any leading and trailing whitespace and then replace , with .
First check if the token can be parsed into an int. If yes, write it into outInt (the writer for integers). Otherwise, check if the token can be parsed into float. If yes, write it into outFloat (the writer for floats). Otherwise, ignore it.
Demo:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
public class Main {
public static void main(String[] args) throws FileNotFoundException, IOException {
BufferedReader in = new BufferedReader(new FileReader("t.txt"));
BufferedWriter outInt = new BufferedWriter(new FileWriter("t2.txt"));
BufferedWriter outFloat = new BufferedWriter(new FileWriter("t3.txt"));
String line = "";
while ((line = in.readLine()) != null) {// Read until EOF is reached
// Split the line on whitespace and get an array of tokens
String[] tokens = line.split("\\s+");
// Process each token
for (String s : tokens) {
// Trim any leading and trailing whitespace and then replace , with .
s = s.trim().replace(',', '.');
// First check if the token can be parsed into an int
try {
Integer.parseInt(s);
// If yes, write it into outInt
outInt.write(s + " ");
} catch (NumberFormatException e) {
// Otherwise, check if token can be parsed into float
try {
Float.parseFloat(s);
// If yes, write it into outFloat
outFloat.write(s + " ");
} catch (NumberFormatException ex) {
// Otherwise, ignore it
}
}
}
}
in.close();
outInt.close();
outFloat.close();
}
}
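One caveat with the demo above: Float.parseFloat accepts more than plain decimals (for example "NaN", "1e3" or "2f"), and an integer too large for int fails Integer.parseInt and then falls through to the float branch. If that matters for your data, a stricter check can be sketched with regular expressions. The class and method names below (TokenClassifier, classify) are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper: classifies a token as INT, FLOAT or OTHER using strict patterns.
final class TokenClassifier {
    enum Kind { INT, FLOAT, OTHER }

    // Optional sign followed by digits only (no range check, unlike Integer.parseInt).
    private static final Pattern INT_PATTERN = Pattern.compile("[+-]?\\d+");
    // Optional sign, digits, a dot, digits (checked after ',' has been replaced by '.').
    private static final Pattern FLOAT_PATTERN = Pattern.compile("[+-]?\\d+\\.\\d+");

    static Kind classify(String token) {
        // Same normalisation as in the demo: trim, then unify the decimal separator.
        String s = token.trim().replace(',', '.');
        if (INT_PATTERN.matcher(s).matches()) {
            return Kind.INT;
        }
        if (FLOAT_PATTERN.matcher(s).matches()) {
            return Kind.FLOAT;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        for (String token : "1 2 4.5 4,6 NaN abc".split("\\s+")) {
            System.out.println(token + " -> " + classify(token));
        }
    }
}
```

In the demo, the two try/catch blocks could then be replaced by a switch on the result of classify, writing to outInt or outFloat as before.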
Assuming that , is also used as a decimal separator alongside ., the two characters can be unified (replace , with .).
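// Requires imports: java.io.FileWriter, java.io.IOException, java.nio.file.Files, java.nio.file.Paths, java.util.Arrays
static void readAndWriteNumbers(String inputFile, String intNums, String dblNums) throws IOException {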
// Use StringBuilder to collect the int and double numbers separately
StringBuilder ints = new StringBuilder();
StringBuilder dbls = new StringBuilder();
Files.lines(Paths.get(inputFile)) // stream of string
.map(str -> str.replace(',', '.')) // unify decimal separators
.map(str -> {
Arrays.stream(str.split("\\s+")).forEach(v -> { // split each line into tokens
if (v.contains(".")) {
if (dbls.length() > 0 && !dbls.toString().endsWith(System.lineSeparator())) {
dbls.append(" ");
}
dbls.append(v);
}
else {
if (ints.length() > 0 && !ints.toString().endsWith(System.lineSeparator())) {
ints.append(" ");
}
ints.append(v);
}
});
return System.lineSeparator(); // return new-line
})
.forEach(s -> { ints.append(s); dbls.append(s); }); // keep lines in the results
// write the files using the contents from the string builders
try (
FileWriter intWriter = new FileWriter(intNums);
FileWriter dblWriter = new FileWriter(dblNums);
) {
intWriter.write(ints.toString());
dblWriter.write(dbls.toString());
}
}
// test
readAndWriteNumbers("test.dat", "ints.dat", "dbls.dat");
Output
//ints.dat
1 2 3 4 5
1 3 4 5 8
// dbls.dat
4.5 4.6 6.8 8.9
9.2 6.3 6.7 8.9
Related
I have to read a flat file which is not properly structured and I need to read it by the size of the indent in a line.
Element TestData*
Content Particle Particle_3*
Element TestData1*
Content Particle Particle_62*
Above is my structure of the flat file. I need to read the empty leading spaces before the text.
The expected result to be:
Length of Empty space of 1st line = 2
Length of Empty space of 2nd line = 5
Length of Empty space of 3rd line = 8
Length of Empty space of 4th line = 11
Any help would be great...!!!
Thanks.
Something like this might work:
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
public class Main {
public static void main(String[] args){
try (BufferedReader reader = Files.newBufferedReader(Paths.get("./testfile.txt"), StandardCharsets.UTF_8)){
int lineNr = 0;
String line;
while((line = reader.readLine()) != null){
lineNr++;
int spaces = 0;
for (int i=0;i<line.length();i++){
if (line.charAt(i) == ' '){
spaces++;
}
else{
break;
}
}
System.out.println("line "+lineNr+" has "+spaces+" leading spaces:"+line);
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
Output:
line 1 has 2 leading spaces: Element TestData*
line 2 has 5 leading spaces: Content Particle Particle_3*
line 3 has 8 leading spaces: Element TestData1*
line 4 has 11 leading spaces: Content Particle Particle_62*
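The counting loop can also be pulled out into a small method so it can be reused and tested on its own. This is only a sketch: the names Indent and countLeadingSpaces are made up here, and only the space character is counted (a leading tab stops the count, as in the program above).

```java
// Hypothetical helper: counts the ' ' characters at the start of a line.
final class Indent {
    static int countLeadingSpaces(String line) {
        int i = 0;
        // Advance while we are still inside the line and still looking at a space.
        while (i < line.length() && line.charAt(i) == ' ') {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(countLeadingSpaces("  Element TestData*"));
        System.out.println(countLeadingSpaces("     Content Particle Particle_3*"));
    }
}
```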
Hi, I'm reading a file (please use the link to see the file) that contains these rows:
U+0000
U+0001
U+0002
U+0003
U+0004
U+0005
using this code
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
public class fgenerator {
public static void main(String[] args) {
try(BufferedReader br = new BufferedReader(new FileReader(new File("C:\\UNCDUNCD.txt")))){
String line;
String[] splited;
while ((line = br.readLine()) != null){
splited = line.split(" ");
System.out.println(splited[0]);
}
}catch(Exception e) {
e.printStackTrace();
}
}
}
but output is
U+D01C
U+D01D
U+D01E
U+D01F
U+D020
U+D021
Why does this happen?
How do I get the character for its code?
Change the data type of line to char; if that doesn't work, then try String.getBytes().
I am assuming that you want to take the Unicode representation that is on each line of the file and output the actual Unicode character which the code represents.
If we start with your loop that reads each line from the file...
while ((line = br.readLine()) != null){
System.out.println( line );
}
... then what we want to do is convert the input line to the character, and print that ...
while ((line = br.readLine()) != null){
System.out.println( convert(line) ); // I just put in a method call to "convert()"
}
So, how do you convert(line) into a character before printing it?
As my earlier comment suggested, you want to take the numeric string that follows the U+ and convert it to an actual numeric value. That, then, is the character value you want to print.
The following is a complete program — essentially like yours but I take the filename as an argument rather than hard-coding it. I've also added skipping blank lines, and rejecting invalid strings -- printing a blank space instead.
Reject the line if it does not match the U+nnnn form of a Unicode representation — match against "(?i)U\\+[0-9A-F]{4}", which means:
(?i) - ignore case
U\\+ - match U+, where the + has to be escaped to be a literal plus
[0-9A-F] - match any character 0-9 or A-F (ignoring case)
{4} - exactly 4 times
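As a quick way to try out the pattern described above, the sketch below validates a line and converts the hex digits into the character they stand for. The class and method names (CodePointLine, toCharacter) are assumptions made for this example. It returns a String built with Character.toChars instead of casting to char, which would also cover code points above U+FFFF if the pattern were widened to more than four hex digits.

```java
import java.util.regex.Pattern;

// Hypothetical helper: turns a line such as "U+0041" into the character it names.
final class CodePointLine {
    // Case-insensitive "U+" followed by exactly four hex digits.
    private static final Pattern FORM = Pattern.compile("(?i)U\\+[0-9A-F]{4}");

    // Returns the character as a String, or null if the line has another form.
    static String toCharacter(String line) {
        String s = line.trim();
        if (!FORM.matcher(s).matches()) {
            return null;
        }
        // Skip the "U+" prefix and parse the rest as a base-16 number.
        int codePoint = Integer.parseInt(s.substring(2), 16);
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(toCharacter("U+0041"));   // prints A
        System.out.println(toCharacter("u+0073"));   // prints s
        System.out.println(toCharacter("bad line")); // prints null
    }
}
```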
With your update that includes a linked sample file, which includes # comments, I have modified my original program (below) so it will now strip comments and then convert the remaining representation.
This is a complete program that can be run as:
javac Reader2.java
java Reader2 inputfile.txt
I tested it with a subset of your file, starting inputfile.txt at line 1 with U+0000 and ending at line 312 with U+0138
import java.io.*;
public class Reader2
{
public static void main(String... args)
{
final String filename = args[0];
try (BufferedReader br = new BufferedReader(
new FileReader(new File( filename ))
)
)
{
String line;
while ((line = br.readLine()) != null) {
if (line.trim().length() > 0) { // skip blank lines
//System.out.println( convert(line) );
final Character c = convert(line);
if (Character.isValidCodePoint(c)) {
System.out.print ( c );
}
}
}
System.out.println();
}
catch(Exception e) {
e.printStackTrace();
}
}
private static char convert(final String input)
{
//System.out.println("Working on line: " + input);
if (! input.matches("(?i)U\\+[0-9A-F]{4}(\\s+#.*)?")) { // the trailing "# comment" part is optional
System.err.println("Rejecting line: " + input);
return ' ';
}
else {
//System.out.println("Accepting line: " + input);
}
// else
final String stripped = input.replaceFirst("\\s+#.*$", "");
final Integer cval = Integer.parseInt(stripped.substring(2), 16);
//System.out.println("cval = " + cval);
return (char) cval.intValue();
}
}
Original program that assumed a line consisted only of U+nnnn is here.
You would run this as:
javac Reader.java
java Reader input.txt
import java.io.*;
public class Reader
{
public static void main(String... args)
{
final String filename = args[0];
try (BufferedReader br = new BufferedReader(
new FileReader(new File( filename ))
)
)
{
String line;
while ((line = br.readLine()) != null) {
if (line.trim().length() > 0) { // skip blank lines
//System.out.println( line );
// Write all chars on one line rather than one char per line
System.out.print ( convert(line) );
}
}
System.out.println(); // Print a newline after all chars are printed
}
catch(Exception e) { // don't catch plain `Exception` IRL
e.printStackTrace(); // don't just print a stack trace IRL
}
}
private static char convert(final String input)
{
// Reject any line that doesn't match U+nnnn
if (! input.matches("(?i)U\\+[0-9A-F]{4}")) {
System.err.println("Rejecting line: " + input);
return ' ';
}
// else convert the line to the character
final Integer cval = Integer.parseInt(input.substring(2), 16);
//System.out.println("cval = " + cval);
return (char) cval.intValue();
}
}
Try it using this as your input file:
U+0041
bad line
U+2718
U+00E9
u+0073
Redirect standard error when you run it (java Reader input.txt 2> /dev/null), or comment out the line System.err.println...
You should get this output: A ✘és
I have the following Text:
1
(some text)
/
2
(some text)
/
.
.
/
8519
(some text)
and I want to split this text into several text-files where each file has the name of the number before the text i.e. (1.txt, 2.txt) and so on, and the content of this file will be the text.
I tried this code
BufferedReader br = new BufferedReader(new FileReader("(Path)\\doc.txt"));
try {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
// sb.append(System.lineSeparator());
line = br.readLine();
}
String str = sb.toString();
String[] arrOfStr = str.split("/");
for (int i = 0; i < arrOfStr.length; i++) {
PrintWriter writer = new PrintWriter("(Path)" + arrOfStr[i].charAt(0) + ".txt", "UTF-8");
writer.println(arrOfStr[i].substring(1));
writer.close();
}
System.out.println("Done");
} finally {
br.close();
}
This code works for files 1-9. However, things go wrong for files 10-8519, since I only take the first character of the string (arrOfStr[i].charAt(0)). I know my solution is insufficient. Any suggestions?
In addition to my comment, considering there isn't a space between the leading integer and the first word, the substring at the first space doesn't work.
This question/answer has a few options that should help, the one using regex (\d+) being the simplest one imo, and copied below.
Matcher matcher = Pattern.compile("\\d+").matcher(arrOfStr[i]);
matcher.find();
int yourNumber = Integer.valueOf(matcher.group());
Given a string find the first embedded occurrence of an integer
As you mentioned, the problem is that you only take the first digit. You could iterate over the leading characters until you find a non-digit character ( arrOfStr[i].charAt(j) < '0' || arrOfStr[i].charAt(j) > '9' ), but it should be easier to use a Scanner and an appropriate regexp.
int index = new Scanner(arrOfStr[i]).useDelimiter("\\D+").nextInt();
The delimiter is precisely any group of non-digit characters.
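Both suggestions above pull the number out of each chunk; the remaining text still has to be separated from it. The following is a minimal sketch, assuming each chunk starts with its number followed directly by the text (as happens in the question's code, which drops the line breaks while reading). The names ChunkParser, fileNameFor and bodyOf are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: separates a chunk such as "8519(some text)" into number and text.
final class ChunkParser {
    // Leading digits (group 1), then everything else (group 2); DOTALL lets '.' span line breaks.
    private static final Pattern CHUNK = Pattern.compile("\\s*(\\d+)(.*)", Pattern.DOTALL);

    private static Matcher match(String chunk) {
        Matcher m = CHUNK.matcher(chunk);
        if (!m.matches()) {
            throw new IllegalArgumentException("Chunk does not start with a number: " + chunk);
        }
        return m;
    }

    // File name built from all leading digits, not just the first one.
    static String fileNameFor(String chunk) {
        return match(chunk).group(1) + ".txt";
    }

    // The text that follows the number, without surrounding whitespace.
    static String bodyOf(String chunk) {
        return match(chunk).group(2).trim();
    }

    public static void main(String[] args) {
        String text = "1(some text)/2(some text)/8519(some text)";
        for (String chunk : text.split("/")) {
            System.out.println(fileNameFor(chunk) + " <- " + bodyOf(chunk));
        }
    }
}
```

Note that if the text itself begins with a digit, it would be glued to the number. Keeping the line separator when building the string (the line that is commented out in the question) avoids that ambiguity.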
Here is a quick solution for the given problem. You can test and do proper exception handling.
package practice;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;
public class FileNioTest {
public static void main(String[] args) {
Path path = Paths.get("C:/Temp/readme.txt");
try {
List<String> contents = Files.readAllLines(path);
StringBuffer sb = new StringBuffer();
String folderName = "C:/Temp/";
String fileName = null;
String previousFileName = null;
// Read from the stream
for (String content : contents) {// for each line of content in contents
if (content.matches("-?\\d+")) { // check if it is a number (based on your requirement)
fileName = folderName + content + ".txt"; // create a file name with path
if (sb != null && sb.length() > 0) { // this means if content present to write in the file
writeToFile(previousFileName, sb); // write to file
sb.setLength(0); // clearing buffer
}
createFile(fileName); // create a new file if number found in the line
previousFileName = fileName; // store the name to write content in previous opened file.
} else {
sb.append(content); // keep storing the content to write in the file.
}
System.out.println(content);// print the line
}
if (sb != null && sb.length() > 0) {
writeToFile(fileName, sb);
sb.setLength(0);
}
} catch (IOException ex) {
ex.printStackTrace();// handle exception here
}
}
private static void createFile (String fileName) {
Path newFilePath = Paths.get(fileName);
if (!Files.exists(newFilePath)) {
try {
Files.createFile(newFilePath);
} catch (IOException e) {
System.err.println(e);
}
}
}
private static void writeToFile (String fileName, StringBuffer sb) {
try {
Files.write(Paths.get(fileName), sb.toString().getBytes(), StandardOpenOption.APPEND);
}catch (IOException e) {
System.err.println(e);
}
}
}
I have a file with the following format.
.I 1
.T
experimental investigation of the aerodynamics of a
wing in a slipstream . 1989
.A
brenckman,m.
.B
experimental investigation of the aerodynamics of a
wing in a slipstream .
.I 2
.T
simple shear flow past a flat plate in an incompressible fluid of small
viscosity .
.A
ting-yili
.B
some texts...
some more text....
.I 3
...
".I 1" indicate the beginning of chunk of text corresponding to doc ID1 and ".I 2" indicates the beginning of chunk of text corresponding to doc ID2.
What I need is to read the text between ".I 1" and ".I 2" and save it as a separate file like "DOC_ID_1.txt", then read the text between ".I 2" and ".I 3"
and save it as a separate file like "DOC_ID_2.txt", and so on. Let's assume that the number of .I # lines is not known.
I have tried this but cannot finish it. Any help will be appreciated.
String inputDocFile="C:\\Dropbox\\Data\\cran.all.1400";
try {
File inputFile = new File(inputDocFile);
FileReader fileReader = new FileReader(inputFile);
BufferedReader bufferedReader = new BufferedReader(fileReader);
String line=null;
String outputDocFileSeperatedByID="DOC_ID_";
//Pattern docHeaderPattern = Pattern.compile(".I ", Pattern.MULTILINE | Pattern.COMMENTS);
ArrayList<ArrayList<String>> result = new ArrayList<> ();
int docID =0;
try {
StringBuilder sb = new StringBuilder();
line = bufferedReader.readLine();
while (line != null) {
if (line.startsWith(".I"))
{
result.add(new ArrayList<String>());
result.get(docID).add(".I");
line = bufferedReader.readLine();
while(line != null && !line.startsWith(".I")){
line = bufferedReader.readLine();
}
++docID;
}
else line = bufferedReader.readLine();
}
} finally {
bufferedReader.close();
}
} catch (IOException ex) {
Logger.getLogger(ReadFile.class.getName()).log(Level.SEVERE, null, ex);
}
You want to find the lines which match ".I n".
The regex you need is: ^\.I \d$
^ indicates the beginning of the line. Hence, if there is any whitespace or text before the dot, the line will not match the regex.
\. matches a literal dot. An unescaped . would match any character.
\d indicates any digit. For the sake of simplicity, I allow only one digit in this regex.
$ indicates the end of the line. Hence, if there are some characters after the digit, the line will not match the expression.
Now, you need to read the file line by line and keep a reference to the file in which you write the current line.
Reading a file line by line is much easier in Java 8 with Files.lines():
private String currentFile = "root.txt";
public static final String REGEX = "^\\.I \\d$";
public void foo() throws Exception{
Path path = Paths.get("path/to/your/input/file.txt");
Files.lines(path).forEach(line -> {
if(line.matches(REGEX)) {
//Extract the digit and update currentFile
currentFile = "File DOC_ID_"+line.substring(3, line.length())+".txt";
System.out.println("Current file is now : " + currentFile);
} else {
System.out.println("Writing this line to "+currentFile + " :" + line);
//Files.write(...);
}
});
}
Note : In order to extract the digit, I use a raw "".substring() which I consider as evil but it is easier to understand. You can do it in a better way with a Pattern and a Matcher :
With this regex: "\\.I (\\d)" (essentially the same as before, with a literal dot and with parentheses around the part you want to capture). Then:
Pattern pattern = Pattern.compile("\\.I (\\d)");
Matcher matcher = pattern.matcher(".I 3");
if(matcher.find()) {
System.out.println(matcher.group(1));//display "3"
}
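Putting the pieces together, the header match and the capture group can drive the grouping of lines by document ID. The following is a sketch only: it works on lines already in memory and returns a map instead of writing files, it uses \d+ so that multi-digit IDs are accepted (the regex above allows a single digit), and the names DocSplitter and splitByDocId are hypothetical. Lines that appear before the first header are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups lines under the ".I n" header that precedes them.
final class DocSplitter {
    // Literal dot, "I", a space, then the document ID captured as group 1.
    private static final Pattern HEADER = Pattern.compile("^\\.I (\\d+)$");

    // Maps an output file name to the lines belonging to that document, in input order.
    static Map<String, List<String>> splitByDocId(List<String> lines) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        List<String> current = null; // lines of the document being read, if any
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (m.matches()) {
                // A new header starts a new document.
                current = new ArrayList<>();
                docs.put("DOC_ID_" + m.group(1) + ".txt", current);
            } else if (current != null) {
                current.add(line);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                ".I 1", ".T", "wing in a slipstream .", ".I 2", ".T", "simple shear flow");
        splitByDocId(lines).forEach((name, body) -> System.out.println(name + " <- " + body));
    }
}
```

Each entry could then be written out with Files.write(Paths.get(name), body), which writes one list element per line.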
Look up regex, Java has inbuilt libraries for this.
https://docs.oracle.com/javase/tutorial/essential/regex/
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
These links will give you a starting point, effectively you can use counter to perform a pattern match against the string and store anything between the first pattern match and the second pattern match. This information can be output to a separate file using the Formatter class.
Found here:-
http://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
public class Test {
/**
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
String inputFile="C:\\logs\\test.txt";
BufferedReader br = new BufferedReader(new FileReader(new File(inputFile)));
String line=null;
StringBuilder sb = new StringBuilder();
int count=1;
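try {
while((line = br.readLine()) != null){
if(line.startsWith(".I")){
if(sb.length()!=0){
File file = new File("C:\\logs\\DOC_ID_"+count+".txt");
PrintWriter writer = new PrintWriter(file, "UTF-8");
writer.print(sb.toString());
writer.close();
sb.delete(0, sb.length());
count++;
}
continue;
}
// Keep the line breaks of the original text
sb.append(line).append(System.lineSeparator());
}
// Write the last document, which is not followed by another ".I" line
if(sb.length()!=0){
File file = new File("C:\\logs\\DOC_ID_"+count+".txt");
PrintWriter writer = new PrintWriter(file, "UTF-8");
writer.print(sb.toString());
writer.close();
}
} catch (Exception ex) {
ex.printStackTrace();
}
finally {
br.close();
}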
}
}
I am writing a program to edit a rtf file. The rtf file will always come in the same format with
Q XXXXXXXXXXXX
A YYYYYYYYYYYY
Q XXXXXXXXXXXX
A YYYYYYYYYYYY
I want to remove the Q / A + whitespace and leave just the X's and Y's on each line. My first idea is to split the string into a new string for each line and edit it from there using str.split like so:
private void countLines(String str){
String[] lines = str.split("\r\n|\r|\n");
linesInDoc = lines;
}
From here my idea is to take each even array value and get rid of Q + whitespace, and take each odd array value and get rid of A + whitespace. Is there a better way to do this? Note: The first line sometimes contains a ~6 digit alphanumeric. I think an if statement checking for 2 non-whitespace chars would solve this.
Here is the rest of the code:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import javax.swing.JEditorPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.EditorKit;
public class StringEditing {
String[] linesInDoc;
private String readRTF(File file){
String documentText = "";
try{
JEditorPane p = new JEditorPane();
p.setContentType("text/rtf");
EditorKit rtfKit = p.getEditorKitForContentType("text/rtf");
rtfKit.read(new FileReader(file), p.getDocument(), 0);
rtfKit = null;
EditorKit txtKit = p.getEditorKitForContentType("text/plain");
Writer writer = new StringWriter();
txtKit.write(writer, p.getDocument(), 0, p.getDocument().getLength());
documentText = writer.toString();
}
catch( FileNotFoundException e )
{
System.out.println( "File not found" );
}
catch( IOException e )
{
System.out.println( "I/O error" );
}
catch( BadLocationException e )
{
}
return documentText;
}
public void editDocument(File file){
String plaintext = readRTF(file);
System.out.println(plaintext);
fixString(plaintext);
System.out.println(plaintext);
}
Unless I'm missing something, you could use String.substring(int) like
// Q / A followed by five spaces, so the text starts at index 6
String lines = "Q     XXXXXXXXXXXX\n" //
+ "A     YYYYYYYYYYYY\n" //
+ "Q     XXXXXXXXXXXX\n" //
+ "A     YYYYYYYYYYYY\n";
for (String line : lines.split("\n")) {
System.out.println(line.substring(6));
}
Output is
XXXXXXXXXXXX
YYYYYYYYYYYY
XXXXXXXXXXXX
YYYYYYYYYYYY
If your format should be more general, you might prefer
System.out.println(line.substring(1).trim());
A BufferedReader will handle the newline \n for you.
You can use a matcher to validate that the line is in the desired format.
If the line is fixed length, simply use the substring
final String bodyPattern = "\\w{1,1}[ \\w]{5,5}\\d{12,12}";
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
String line;
while ((line = br.readLine()) != null) {
if (line.matches(bodyPattern)) {
//
myString = line.substring(6);
}
}
}
//catch Block
You can adjust the regex pattern to your specific requirements
This is easily doable with a regex (assuming 'fileText' is your whole file's content):
removedPrefix = fileText.replaceAll("(A|Q) *(.+)\\r", "$2\r");
The regex means: a Q or A at the start, then any number of spaces, then anything (captured as group 2), and a closing carriage return. In the replacement, \r has to be a real carriage return character, because a backslash in a replacement string only escapes the character after it. This leaves the first line with the digits alone as long as it contains no A or Q. The result is the file content without the Q/A and the spaces. There are easier ways if you know the exact number of spaces before the text you need, but this works for any number of spaces and is very flexible.
If you process line by line it's
removedPrefix = currentLine.replaceAll("(A|Q) *(.+)", "$2");
As simple as that
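When processing line by line, anchoring the pattern at the start of the line avoids touching a Q or A that appears later in the text, or a first line that merely begins with one of those letters. A sketch, assuming the prefix is a single Q or A followed by at least one whitespace character; the names PrefixStripper and strip are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes a leading "Q" or "A" plus the whitespace after it.
final class PrefixStripper {
    // ^ anchors at the start; \s+ requires at least one whitespace character after Q or A.
    private static final Pattern PREFIX = Pattern.compile("^[QA]\\s+");

    static String strip(String line) {
        // Lines without the prefix are returned unchanged.
        return PREFIX.matcher(line).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(strip("Q     XXXXXXXXXXXX"));
        System.out.println(strip("A     YYYYYYYYYYYY"));
        System.out.println(strip("AB1234"));
    }
}
```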