Processing text files and hyphenating strings line by line in Java

I have a .txt file with 8,000 rows in a single column. Each line contains either an alphanumeric or a number like this:
0219381A
10101298
32192017
1720291C
04041009
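I'd like to read this file and, for each line, prepend a 0 (zero), insert a hyphen after the third character of the result (that is, after the second character of the original line), keep the next two characters, and drop the rest, writing the results to an output file like this: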
002-19
010-10
032-19
017-20
004-04
I'm able to read from and write to a file or insert a hyphen when done separately but can't get the pieces working together:
public static void main(String[] args) throws FileNotFoundException{
// TODO Auto-generated method stub
Scanner in = new Scanner(new File("file.txt"));
PrintWriter out = new PrintWriter("file1.txt");
while(in.hasNextLine())
{
StringBuilder builder = new StringBuilder(in.nextLine());
builder.insert(0, "0");
builder.insert(3, "-");
String hyph = builder.toString();
out.printf(hyph);
}
in.close();
out.close();
How can I get these pieces working together/is there another approach?

Try this:
while (in.hasNextLine()) {
String line = in.nextLine();
if (!line.isEmpty()) {
line = "0" + line.substring(0, 2) + "-" + line.substring(2, 4);
}
out.println(line);
}

Your code looks fine. If you make this change, I think you should be good:
StringBuilder builder = new StringBuilder(in.nextLine().substring(0,4));
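Putting the two answers together, a minimal sketch might look like the following. The class name `Hyphenate` and the helper `format` are made up for illustration; the file names are the ones from the question. It assumes that lines shorter than four characters should be passed through unchanged. It also uses `println` rather than `printf(hyph)`, because `printf` writes no line break and would interpret any `%` in the data as a format specifier.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;

class Hyphenate {
    // Hypothetical helper: "0219381A" becomes "002-19".
    static String format(String line) {
        if (line.length() < 4) {
            return line; // assumption: short or empty lines are left as they are
        }
        // leading zero + first two chars + hyphen + next two chars
        return "0" + line.substring(0, 2) + "-" + line.substring(2, 4);
    }

    public static void main(String[] args) throws FileNotFoundException {
        // try-with-resources closes both files even if something goes wrong
        try (Scanner in = new Scanner(new File("file.txt"));
             PrintWriter out = new PrintWriter("file1.txt")) {
            while (in.hasNextLine()) {
                out.println(format(in.nextLine()));
            }
        }
    }
}
```

Keeping the string logic in its own method makes it easy to check against the sample rows without touching any files.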

Related

Reading a File without line breaks using Buffered reader

I am reading a file of comma-separated values; each line, when split into an array, should yield 10 values. I expected the file to have line breaks so that
line = bReader.readLine()
will give me each line. But my file doesn't have line breaks. Instead, after the first set of values there are lots of spaces (465, to be precise) and then the next "line" begins.
So my readLine() call above reads the entire file in one go, as there are no line breaks. Please suggest how best to tackle this scenario efficiently.
One way is to replace each run of 465 spaces in your text with a newline character "\n" before iterating over it.
I second Ninan's answer: replace the 465 spaces with a newline, then run the function you were planning on running earlier.
For aesthetics and readability I would suggest using Regex's Pattern to replace the spaces instead of a long unreadable String.replace(" ").
Your code could look like the example below, but replace 6 with 465:
public static void main(String[] args)
{
// six spaces separate the two records in this sample
String content = "DOG,CAT      MOUSE,CHEESE";
Pattern p = Pattern.compile("[ ]{6}",
Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
String newString = p.matcher(content).replaceAll("\n");
System.out.println(newString);
}
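The same regex idea can also split the records directly, without inserting newlines first. This is a sketch only: the class `SpaceRecords` and method `parse` are hypothetical names, and the threshold of 6 matches the sample string rather than the real file (where it would be 465, or some smaller lower bound if the run length varies).

```java
import java.util.ArrayList;
import java.util.List;

class SpaceRecords {
    // Split the content into records on runs of at least minSpaces spaces,
    // then split each record into its comma-separated fields.
    static List<String[]> parse(String content, int minSpaces) {
        List<String[]> records = new ArrayList<>();
        for (String record : content.split(" {" + minSpaces + ",}")) {
            String trimmed = record.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed.split("\\s*,\\s*"));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String gap = String.format("%6s", ""); // six spaces
        String content = "DOG,CAT" + gap + "MOUSE,CHEESE";
        for (String[] fields : parse(content, 6)) {
            System.out.println(String.join(" | ", fields));
        }
    }
}
```

This still reads the whole file into memory as one string, so it suits files of moderate size.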
My suggestion is to read file f1.txt and write to another file f2.txt, removing all empty lines and surrounding spaces, then read f2.txt. Something like:
FileReader fr = new FileReader("f1.txt");
BufferedReader br = new BufferedReader(fr);
FileWriter fw = new FileWriter("f2.txt");
String line;
while((line = br.readLine()) != null)
{
line = line.trim(); // remove leading and trailing whitespace
if (!line.equals("")) // don't write out blank lines
{
fw.write(line, 0, line.length());
}
}
Then try using your code.
You might create your own subclass of FilterInputStream or PushbackInputStream and pass that to an InputStreamReader. You would override int read().
Such a class unfortunately needs a bit of typing. (A nice exercise, so to speak.)
private static final int NO_CHAR = -2;
private boolean fromCache;
private int cachedSpaces;
private int cachedNonSpaceChar = NO_CHAR;
int read() throws IOException {
if (fromCache) {
if (cachedSpaces > 0) ...
if (cachedNonSpaceChar != NO_CHAR) ...
...
}
int ch = super.read();
if (ch != -1) {
...
}
return ch;
}
The idea is to cache spaces until a non-space character arrives. read() then either serves from the cache, returns \n in place of a long run of spaces, or calls super.read() when the cache is empty, reading on recursively while it keeps seeing spaces.
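One way the caching idea could be fleshed out is sketched below. Note the swap: it extends FilterReader (characters) rather than FilterInputStream (bytes), and it loops instead of recursing. The class name `SpaceRunToNewlineReader`, the `minRun` parameter, and the `lines` helper are all assumptions made for this example, not part of the answer above.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SpaceRunToNewlineReader extends FilterReader {
    private static final int NO_CHAR = -2;
    private final int minRun;           // runs of at least this many spaces become '\n'
    private int cachedSpaces = 0;       // spaces of a short run still to be returned
    private int cachedChar = NO_CHAR;   // char (or -1 for EOF) read past the run

    SpaceRunToNewlineReader(Reader in, int minRun) {
        super(in);
        this.minRun = minRun;
    }

    @Override
    public int read() throws IOException {
        if (cachedSpaces > 0) { cachedSpaces--; return ' '; }
        if (cachedChar != NO_CHAR) { int c = cachedChar; cachedChar = NO_CHAR; return c; }
        int ch = super.read();
        if (ch != ' ') return ch;
        int run = 1;
        while ((ch = super.read()) == ' ') run++;
        cachedChar = ch;                // first non-space char after the run, or -1
        if (run >= minRun) return '\n'; // long run: collapse to one newline
        cachedSpaces = run - 1;         // short run: replay it unchanged
        return ' ';
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        while (n < len) {
            int ch = read();
            if (ch == -1) break;
            buf[off + n++] = (char) ch;
        }
        return n == 0 ? -1 : n;
    }

    // Demo helper: wrap a string and read it line by line.
    static List<String> lines(String text, int minRun) {
        List<String> result = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new SpaceRunToNewlineReader(new StringReader(text), minRun))) {
            String line;
            while ((line = br.readLine()) != null) result.add(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

With a real file, the StringReader would be replaced by a FileReader (or an InputStreamReader with an explicit charset), and the existing readLine() loop could stay as it is.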
My understanding is that you have a flat CSV file without proper line breaks, which is supposed to have 10 values on each line.
Updated:
1. (Recommended) You can use Scanner class with useDelimiter to parse csv effectively, assuming you are trying to store 10 values from a line:
public static void parseCsvWithScanner() throws IOException {
Scanner scanner = new Scanner(new File("test.csv"));
// set your delimiter for scanner, "," for csv
scanner.useDelimiter(",");
// storing 10 values as a "line"
int LINE_LIMIT = 10;
// implement your own data structure to store each value of CSV
int[] tempLineArray = new int[LINE_LIMIT];
int lineBreakCount = 0;
while(scanner.hasNext()) {
// trim start and end spaces if there is any
String temp = scanner.next().trim();
tempLineArray[lineBreakCount++] = Integer.parseInt(temp);
if (lineBreakCount == LINE_LIMIT) {
// replace your own logic for handling the full array
for(int i=0; i<tempLineArray.length; i++) {
System.out.print(tempLineArray[i]);
} // end replace
// resetting array and counter
tempLineArray = new int[LINE_LIMIT];
lineBreakCount = 0;
}
}
scanner.close();
}
Or use a BufferedReader.
If memory is a concern, you may not need the ArrayList that stores all the values: replace it with your own logic for handling each line.
public static void parseCsv() throws IOException {
BufferedReader br = new BufferedReader(new FileReader(file));
// your delimiter
char TOKEN = ',';
// your requirement of storing 10 values for each "line"
int LINE_LIMIT = 10;
// tmp for storing from BufferedReader.read()
int tmp;
// a counter for line break
int lineBreakCount = 0;
// array for storing 10 values, assuming the values of CSV are integers
int[] tempArray = new int[LINE_LIMIT];
// storing tempArray of each line to ArrayList
ArrayList<int[]> lineList = new ArrayList<>();
StringBuilder sb = new StringBuilder();
while((tmp = br.read()) != -1) {
if ((char)tmp == TOKEN) {
if (lineBreakCount == LINE_LIMIT) {
// your logic to handle the current "line" here.
lineList.add(tempArray);
// new "line"
tempArray = new int[LINE_LIMIT];
lineBreakCount = 0;
}
// storing current value from buffer with trim of spaces
tempArray[lineBreakCount] =
Integer.parseInt(sb.toString().trim());
lineBreakCount++;
// clear the buffer
sb.delete(0, sb.length());
}
else {
// add current char from BufferedReader if not delimiter
sb.append((char)tmp);
}
}
br.close();
}

Java letter replacement in file

Here is what I have done so far. My program works for, say, turning numbers like 123... into letters like abc...
But I can't make it work with special characters such as č, ć, đ. When I run it with special characters, the contents of my file just get deleted.
Edit: I forgot to mention that I'm working with .srt files. Adding UTF-8 to the Scanner worked for .txt files, but when I tried it with an .srt file it just deleted the file's entire contents.
The code:
LinkedList<String> lines = new LinkedList<String>();
// Opening the file
Scanner input = new Scanner(new File("input.srt"), "UTF-8");
while (input.hasNextLine()) {
String line = input.nextLine();
lines.add(replaceLetters(line));
}
input.close();
// Saving the new edited version file
PrintWriter writer = new PrintWriter("input.srt", "UTF-8");
for (String line: lines) {
writer.println(line);
}
writer.close();
The replace method:
public static String replaceLetters(String orig) {
String fixed = "";
// Go through each letter and replace with new letter
for (int i = 0; i < orig.length(); i++) {
// Get the letter
String chr = orig.substring(i, i + 1);
// Replace letter if nessesary
if (chr.equals("a")) {
chr = "1";
} else if (chr.equals("b")) {
chr = "2";
} else if (chr.equals("c")) {
chr = "3";
}
// Add the new letter to the end of fixed
fixed += chr;
}
return fixed;
}
Turn your
Scanner input = new Scanner(new File("input.txt"));
into
Scanner input = new Scanner(new File("input.txt"), "UTF-8");
You save in UTF-8, but read in a default charset.
Also, next time, use try-catch statements properly and include them in your post.
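One possible cause of the emptied file, which the question does not confirm, is that the .srt file is not actually valid UTF-8. Scanner does not throw on I/O or decoding errors (they are only visible through its ioException() method), so the loop can end with no lines read, and the same file is then overwritten with nothing. The sketch below reads with Files.readAllLines, which throws on malformed input instead, and writes to a separate file so the original survives. The class name, the output file name and the particular letter mapping are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ReplaceInFile {
    // Illustrative mapping only: c with caron and c with acute become c,
    // d with stroke becomes d. Unicode escapes keep the source ASCII-only.
    static String replaceLetters(String orig) {
        StringBuilder fixed = new StringBuilder(orig.length());
        for (int i = 0; i < orig.length(); i++) {
            char c = orig.charAt(i);
            switch (c) {
                case '\u010D':
                case '\u0107':
                    fixed.append('c');
                    break;
                case '\u0111':
                    fixed.append('d');
                    break;
                default:
                    fixed.append(c);
            }
        }
        return fixed.toString();
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get("input.srt");
        Path target = Paths.get("output.srt"); // hypothetical name; not the input file
        List<String> fixed = new ArrayList<>();
        // Throws an exception if the bytes are not valid UTF-8
        for (String line : Files.readAllLines(source, StandardCharsets.UTF_8)) {
            fixed.add(replaceLetters(line));
        }
        Files.write(target, fixed, StandardCharsets.UTF_8);
    }
}
```

If reading fails with a decoding error, the file is probably in another encoding, and the matching Charset would have to be passed to readAllLines instead.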

How to remove a particular string in a text file using java?

My input file has numerous records and for sample, let us say it has (here line numbers are just for your reference)
1. end
2. endline
3. endofstory
I expect my output as:
1.
2. endline
3. endofstory
But when I use this code:
import java.io.*;
public class DeleteTest {
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
File file = new File("D:/mypath/file.txt");
File temp = File.createTempFile("file1", ".txt", file.getParentFile());
String charset = "UTF-8";
String delete = "end";
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), charset));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp), charset));
for (String line; (line = reader.readLine()) != null;) {
line = line.replace(delete, "");
writer.println(line);
}
reader.close();
writer.close();
}
catch (Exception e) {
System.out.println("Something went Wrong");
}
}
}
I get my output as:
1.
2. line
3. ofstory
Can you guys help me out with what I expect as output?
First, you'll need to replace the line with the new string List item, not an empty string. You can do that using line = line.replace(delete, "List item"); but since you want to replace end only when it is the only string on a line, you'll have to use something like this:
line = line.replaceAll("^"+delete+"$", "List item");
Based on your edits, it seems that you do want to replace a line that contains only end with an empty string. You can do that using something like this:
line = line.replaceAll("^"+delete+"$", "");
Here, the first parameter of replaceAll is a regular expression, ^ means the start of the string and $ the end. This will replace end only if it is the only thing on that line.
You can also check if the current line is the line you want to delete and just write an empty line to the file.
Eg:
if(line.equals(delete)){
writer.println();
}else{
writer.println(line);
}
And to do this process for multiple strings you can use something like this:
Set<String> toDelete = new HashSet<>();
toDelete.add("end");
toDelete.add("something");
toDelete.add("another thing");
if(toDelete.contains(line)){
writer.println();
}else{
writer.println(line);
}
Here I'm using a set of strings I want to delete and then check if the current line is one of those strings.
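Both variants can be pulled into small helpers, as in the sketch below. The class and method names are made up for illustration. The regex variant wraps the search string in Pattern.quote, an addition not in the answer above, so that characters such as `.` or `*` in the string to delete are matched literally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class LineDeleter {
    // Set-based variant: blank the line if it equals one of the listed strings.
    static String blankIfListed(String line, Set<String> toDelete) {
        return toDelete.contains(line) ? "" : line;
    }

    // Regex variant: blank the line only if it consists of exactly 'delete'.
    static String blankIfExact(String line, String delete) {
        return line.replaceAll("^" + Pattern.quote(delete) + "$", "");
    }

    public static void main(String[] args) {
        Set<String> toDelete = new HashSet<>(Arrays.asList("end", "something"));
        for (String line : new String[] {"end", "endline", "endofstory"}) {
            System.out.println("[" + blankIfListed(line, toDelete) + "]");
        }
    }
}
```

In the original program, either helper would be called on each line just before writer.println.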

How to break a file into tokens based on regex using Java

I have a file in the following format, records are separated by newline but some records have line feed in them, like below. I need to get each record and process them separately. The file could be a few Mb in size.
<?aaaaa>
<?bbbb
bb>
<?cccccc>
I have the code:
FileInputStream fs = new FileInputStream(FILE_PATH_NAME);
Scanner scanner = new Scanner(fs);
scanner.useDelimiter(Pattern.compile("<\\?"));
while (scanner.hasNext()) {
String line = scanner.next();
System.out.println(line);
}
scanner.close();
But the result I got has the beginning <? removed:
aaaaa>
bbbb
bb>
cccccc>
I know the Scanner consumes any input that matches the delimiter pattern. All I can think of is to add the delimiter pattern back to each record manually.
Is there a way to NOT have the delimiter pattern removed?
Break on a newline only when preceded by a ">" char:
scanner.useDelimiter("(?<=>)\\R"); // Note you can pass a string directly
\R is a system independent newline
(?<=>) is a look behind that asserts (without consuming) that the previous char is a >
Plus it's cool because <=> looks like Darth Vader's TIE fighter.
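A small self-contained check of the look-behind delimiter might look like this. The class `RecordSplitter` and method `records` are hypothetical names, and the Scanner reads from a string here rather than from the FileInputStream in the question. \R requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class RecordSplitter {
    // Split on a line break only when the character before it is '>'.
    // The look-behind consumes nothing, so each record keeps its "<?" and ">".
    static List<String> records(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("(?<=>)\\R");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "<?aaaaa>\n<?bbbb\nbb>\n<?cccccc>\n";
        for (String record : records(sample)) {
            System.out.println("record: " + record);
        }
    }
}
```

The second record still contains its embedded line break; whether to strip it is up to the processing step.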
I'm assuming you want to ignore the newline character '\n' everywhere.
I would read the whole file into a String and then remove all of the '\n's in the String. The part of the code this question is about looks like this:
String fileString = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
fileString = fileString.replace("\n", "");
Scanner scanner = new Scanner(fileString);
... //your code
Feel free to ask any further questions you might have!
Here is one way of doing it by using a StringBuilder:
public static void main(String[] args) throws FileNotFoundException {
Scanner in = new Scanner(new File("C:\\test.txt"));
StringBuilder builder = new StringBuilder();
String input = null;
while (in.hasNextLine() && null != (input = in.nextLine())) {
for (int x = 0; x < input.length(); x++) {
builder.append(input.charAt(x));
if (input.charAt(x) == '>') {
System.out.println(builder.toString());
builder = new StringBuilder();
}
}
}
in.close();
}
Input:
<?aaaaa>
<?bbbb
bb>
<?cccccc>
Output:
<?aaaaa>
<?bbbbbb>
<?cccccc>

reading from text file to string array

I can search for a string in my text file; however, I want to sort the data within this ArrayList and implement an algorithm. Is it possible to read from a text file and store the values (Strings) in a String[] array?
Also is it possible to separate the Strings? So instead of my Array having:
[Alice was beginning to get very tired of sitting by her sister on the, bank, and of having nothing to do:]
is it possible to have an array like:
["Alice", "was", "beginning", "to", "get", ...]
public static void main(String[]args) throws IOException
{
Scanner scan = new Scanner(System.in);
String stringSearch = scan.nextLine();
BufferedReader reader = new BufferedReader(new FileReader("File1.txt"));
List<String> words = new ArrayList<String>();
String line;
while ((line = reader.readLine()) != null) {
words.add(line);
}
for(String sLine : words)
{
if (sLine.contains(stringSearch))
{
int index = words.indexOf(sLine);
System.out.println("Got a match at line " + index);
}
}
//Collections.sort(words);
//for (String str: words)
// System.out.println(str);
int size = words.size();
System.out.println("There are " + size + " Lines of text in this text file.");
reader.close();
System.out.println(words);
}
To split a line into an array of words, use this:
String[] words = sentence.split("[^\\w']+");
The regex [^\w'] means "not a word char or an apostrophe"
This will capture words with embedded apostrophes like "can't" and skip over all punctuation.
Edit:
A comment has raised the edge case of parsing a quoted word such as 'this' as this.
Here's the solution for that - you have to first remove wrapping quotes:
String[] words = input.replaceAll("(^|\\s)'([\\w']+)'(\\s|$)", "$1$2$3").split("[^\\w']+");
Here's some test code with edge and corner cases:
public static void main(String[] args) throws Exception {
String input = "'I', ie \"me\", can't extract 'can't' or 'can't'";
String[] words = input.replaceAll("(^|[^\\w'])'([\\w']+)'([^\\w']|$)", "$1$2$3").split("[^\\w']+");
System.out.println(Arrays.toString(words));
}
Output:
[I, ie, me, can't, extract, can't, or, can't]
Also is it possible to separate the Strings?
Yes, you can split a string on whitespace and common punctuation like this:
String[] strSplit;
String str = "This is test for split";
strSplit = str.split("[\\s,;!?\"]+");
See String API
Moreover you can also read a text file word by word.
Scanner scan = null;
try {
scan = new Scanner(new BufferedReader(new FileReader("Your File Path")));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
while(scan.hasNext()){
System.out.println( scan.next() );
}
See Scanner API
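Tying this back to the question, the sketch below reads a whole file into a String[] of words and sorts it. The class name `WordArray` and the helper `toWords` are assumptions, the file is assumed to be UTF-8, and the split pattern is the `[^\w']+` one from the first answer. Leading punctuation is stripped first because split would otherwise produce an empty first element.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class WordArray {
    // Split text into words, keeping embedded apostrophes ("can't").
    static String[] toWords(String text) {
        // Drop leading separators so the result has no empty first element
        String stripped = text.replaceFirst("^[^\\w']+", "");
        if (stripped.isEmpty()) {
            return new String[0];
        }
        return stripped.split("[^\\w']+");
    }

    public static void main(String[] args) throws IOException {
        // File name taken from the question; UTF-8 is an assumption
        String text = new String(
                Files.readAllBytes(Paths.get("File1.txt")), StandardCharsets.UTF_8);
        String[] words = toWords(text);
        Arrays.sort(words, String.CASE_INSENSITIVE_ORDER);
        System.out.println(Arrays.toString(words));
    }
}
```

Line breaks count as separators too, so the words come out as one flat array regardless of how the file is wrapped.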
