Java letter replacement in file - java

This is what I have so far. My program works for simple cases, for example turning numbers 123... into letters like abc...
But I can't make it work with special characters like č, ć, đ. When I run it on a file containing special characters, the file's content just gets deleted.
Edit: I forgot to mention that I'm working with .srt files. Adding "UTF-8" to the Scanner worked for .txt files, but when I tried it with an .srt file, it just deleted the full content of the file.
The code:
LinkedList<String> lines = new LinkedList<String>();
// Opening the file
Scanner input = new Scanner(new File("input.srt"), "UTF-8");
while (input.hasNextLine()) {
String line = input.nextLine();
lines.add(replaceLetters(line));
}
input.close();
// Saving the new edited version file
PrintWriter writer = new PrintWriter("input.srt", "UTF-8");
for (String line: lines) {
writer.println(line);
}
writer.close();
The replace method:
public static String replaceLetters(String orig) {
String fixed = "";
// Go through each letter and replace with new letter
for (int i = 0; i < orig.length(); i++) {
// Get the letter
String chr = orig.substring(i, i + 1);
// Replace letter if necessary
if (chr.equals("a")) {
chr = "1";
} else if (chr.equals("b")) {
chr = "2";
} else if (chr.equals("c")) {
chr = "3";
}
// Add the new letter to the end of fixed
fixed += chr;
}
return fixed;
}

Turn your
Scanner input = new Scanner(new File("input.txt"));
into
Scanner input = new Scanner(new File("input.txt"), "UTF-8");
You save in UTF-8, but read in a default charset.
Also, next time, use try-catch statements properly and include them in your post.
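A minimal sketch of the whole read, replace, write cycle with an explicit charset on both sides, assuming Java 9+ (for Map.of) and that the subtitle file really is UTF-8. The class name SubtitleRewriter, the mapping table, and the replacement letters are hypothetical. Scanner swallows I/O errors internally (they are only visible through Scanner.ioException()), whereas Files.readAllLines throws a MalformedInputException when the bytes are not valid UTF-8. Many .srt files are saved in a legacy encoding such as windows-1250, which may be why the Scanner loop read nothing and the file was then overwritten with no content.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SubtitleRewriter {
    // Hypothetical mapping: c-caron (U+010D) -> c, c-acute (U+0107) -> c,
    // d-stroke (U+0111) -> d. Escapes keep the source file pure ASCII.
    private static final Map<Character, Character> REPLACEMENTS = Map.of(
            '\u010D', 'c',
            '\u0107', 'c',
            '\u0111', 'd');

    static String replaceLetters(String orig) {
        StringBuilder fixed = new StringBuilder(orig.length());
        for (int i = 0; i < orig.length(); i++) {
            char c = orig.charAt(i);
            // Characters without an entry in the table are copied unchanged.
            char mapped = REPLACEMENTS.getOrDefault(c, c);
            fixed.append(mapped);
        }
        return fixed.toString();
    }

    static void rewrite(Path file) throws IOException {
        // Read everything first: a decoding problem surfaces here as an
        // exception, before the original file is touched.
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        List<String> fixed = new ArrayList<>(lines.size());
        for (String line : lines) {
            fixed.add(replaceLetters(line));
        }
        // Write to a temporary sibling file, then move it over the original.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, fixed, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            rewrite(Paths.get(args[0]));
        }
    }
}
```

If readAllLines throws, the file is probably not UTF-8, and the charset passed to both the read and the write would need to match the file's real encoding.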

Related

How read data from file that is separated by a blank line in Java

For example I have a file "input.txt" :
This is the
first data

This is the second
data

This is the last data
on the last line
And I want to store this data in an ArrayList in this form:
[This is the first data, This is the second data, This is the last data on the last line]
Note: Every record in the file is separated by a blank line. How do I skip this blank line?
I tried this code, but it doesn't work correctly:
List<String> list = new ArrayList<>();
File file = new File("input.txt");
StringBuilder stringBuilder = new StringBuilder();
try (Scanner in = new Scanner(file)) {
while (in.hasNext()) {
String line = in.nextLine();
if (!line.trim().isEmpty())
stringBuilder.append(line).append(" ");
else {
list.add(stringBuilder.toString());
stringBuilder = new StringBuilder();
}
}
} catch (FileNotFoundException e) {
System.out.println("Not found file: " + file);
}
Blank lines are not really blank. End-of-line character(s) terminate each line, so an apparently empty line means you have a pair of end-of-line sequences abutting.
Search for that pair, and break your input where it is found, for example with String::split.
For example, suppose we have a file with the words this and that.
this

that
Let's visualize this file, showing the LINE FEED (LF) character (Unicode code point 10 decimal) used to terminate each line as <LF>.
this<LF>
<LF>
that<LF>
To the computer, there are no “lines”, so the text appears to Java like this:
this<LF><LF>that<LF>
Now you can see more clearly how pairs of LINE FEED (LF) characters delimit each record. Search for instances of that pairing to parse your text.
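The idea of breaking the text on pairs of line terminators can be sketched as follows. This is an illustration rather than a definitive implementation: the class and method names are hypothetical, Windows line endings are normalized to LF first, and line breaks inside a record are replaced by single spaces to match the output the question asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParagraphSplitter {
    static List<String> splitParagraphs(String text) {
        // Normalize CRLF to LF so that only one kind of line ending remains.
        String normalized = text.replace("\r\n", "\n");
        List<String> paragraphs = new ArrayList<>();
        // Two or more line feeds (each optionally followed by spaces or tabs)
        // mark the boundary between records.
        for (String block : normalized.split("(?:\n[ \t]*){2,}")) {
            // Join the lines of one record with single spaces.
            String joined = block.trim().replaceAll("\\s*\n\\s*", " ");
            if (!joined.isEmpty()) {
                paragraphs.add(joined);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String text = "This is the\nfirst data\n\nThis is the second\ndata\n";
        // prints [This is the first data, This is the second data]
        System.out.println(splitParagraphs(text));
    }
}
```

For a file, the text could be obtained with new String(Files.readAllBytes(path), charset) before calling the method.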
You are actually almost there. What you missed is that the last 2 lines need to be handled differently, as there is NO empty-string line at the bottom of the file.
try (Scanner in = new Scanner(file)) {
while (in.hasNext()) {
String line = in.nextLine();
//System.out.println(line);
if (!line.trim().isEmpty())
stringBuilder.append(line).append(" ");
else { //this is where new line happens -> store the combined string to arrayList
list.add(stringBuilder.toString());
stringBuilder = new StringBuilder();
}
}
//Below is to handle the last line, as after the last line there is NO empty line
if (stringBuilder.length() != 0) {
list.add(stringBuilder.toString());
} //end if
for (int i=0; i< list.size(); i++) {
System.out.println(list.get(i));
} //end for
} catch (FileNotFoundException e) {
System.out.println("Not found file: " + file);
}
Output of above:
This is the first data
This is the second data
This is the last data on the last line
I added an if condition right after the while loop in your code, and it worked:
List<String> list = new ArrayList<>();
File file = new File("input.txt");
StringBuilder stringBuilder = new StringBuilder();
try (Scanner in = new Scanner(file)) {
while (in.hasNext()) {
String line = in.nextLine();
if (!line.trim().isEmpty()) {
stringBuilder.append(line).append(" ");
}
else {
list.add(stringBuilder.toString());
stringBuilder = new StringBuilder();
}
}
if (stringBuilder.toString().length() != 0) {
list.add(stringBuilder.toString());
}
} catch (FileNotFoundException e) {
System.out.println("Not found file: " + file);
}
System.out.println(list.toString());
I got the following output:
[This is the first data , This is the second data , This is the last data on the last line ]
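The output above still carries a trailing space in every element, because a space is appended after each line. A small variation, sketched here with hypothetical names, collects the lines of one record in a StringJoiner so that the separator only appears between lines. It also skips empty records when several blank lines follow each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class RecordGrouper {
    static List<String> group(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringJoiner current = new StringJoiner(" ");
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // A blank line closes the current record, if there is one.
                if (current.length() > 0) {
                    records.add(current.toString());
                }
                current = new StringJoiner(" ");
            } else {
                current.add(line.trim());
            }
        }
        // The last record is not followed by a blank line, so flush it here.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "This is the", "first data", "", "This is the second", "data");
        // prints [This is the first data, This is the second data]
        System.out.println(group(lines));
    }
}
```

The lines could come from Files.readAllLines or from the Scanner loop shown in the answers.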

Java using \034 as delimiter in a string

I am trying to use the '\034' field separator character as a delimiter in a string.
The issue is that when I hardcode "\034"+opField and write it to a file, it works, but if the "\034" character is read from a file, the output contains the literal text "col1\034col2".
I tried using StringBuilder but it escapes the \034 to "\\034".
I am using the following code to read the character from the file:
try (BufferedReader br = new BufferedReader(new FileReader(fConfig))){
int lc = 1;
for(String line;(line = br.readLine())!=null;){
String[] rowList = line.split(delim);
int row_len = rowList.length;
if (row_len<2){
System.out.println("Incorrect dictionary file row:"+fConfig.getAbsolutePath()+"\nNot enough values found at row:"+line);
}else{
String key = rowList[0];
String value = rowList[1];
dictKV.put(key, value);
}
lc++;
}
}catch(Exception e){
throw e;
}
Any help is welcome...
[update]: The same thing happens with the '\t' character: if it is hardcoded it works fine, but if it is read from a file it gets appended as literal characters, e.g. "col0\tcol1".
if(colAl.toLowerCase().contains(" as ")){
String temp = colAl.replaceAll("[ ]+as[ ]+"," | ");
ArrayList<String> tempA = this.brittle_delim(temp,'|');
colAl = tempA.get(tempA.size()-1);
colAl = colAl.trim();
}else {
ArrayList<String> tempA = this.brittle_delim(colAl,' ');
colAl = tempA.get(tempA.size()-1);
colAl = colAl.trim();
}
if(i==0){
sb.append(colAl);
headerCols+=colAl.trim();
}else{
headerCols+= this.output_field_delim + colAl;
sb.append(this.output_field_delim);
sb.append(colAl);
}
}
}
System.out.println("SB Header Cols:"+sb.toString());
System.out.println("Header Cols:"+headerCols);
Output:
SB Header Cols:
SPRN_CO_ID\034FISC_YR_MTH_DSPLY_CD\034CST_OBJ_CD\034PRFT_CTR_CD\034LEGL_CO_CD\034HEAD_CT_TYPE_ID\034FIN_OWN_CD\034FUNC_AREA_CD\034HEAD_CT_NR
Header Cols:
SPRN_CO_ID\034FISC_YR_MTH_DSPLY_CD\034CST_OBJ_CD\034PRFT_CTR_CD\034LEGL_CO_CD\034HEAD_CT_TYPE_ID\034FIN_OWN_CD\034FUNC_AREA_CD\034HEAD_CT_NR
In the above code if I do the following I am getting correct results:
headerCols+= "\034"+ colAl;
output:
SPRN_CO_IDFISC_YR_MTH_DSPLY_CDCST_OBJ_CDPRFT_CTR_CDLEGL_CO_CDHEAD_CT_TYPE_IDFIN_OWN_CDFUNC_AREA_CDHEAD_CT_NR
The FS characters are there, even though they are not visible in the output shown here.
You should provide an example demonstrating your problem, not just incomplete code snippets.
The following runnable snippet does what you explained.
// create a file with one line
byte[] bytes = "foo bar".getBytes(StandardCharsets.ISO_8859_1);
String fileName = "/tmp/foobar";
Files.write(Paths.get(fileName), bytes);
String headerCols = "";
String outputFieldDelim = "\034";
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
// read the line from the file and split by blank character
String[] cols = br.readLine().split(" ");
// concatenate the values with "\034"
// but ... for your code ...
// don't concatenate String objects in a loop like below
// use a StringBuilder or StringJoiner instead
headerCols += outputFieldDelim + cols[0];
headerCols += outputFieldDelim + cols[1];
}
// output with the "\034" character
System.out.println(headerCols);
I guess this is where I found my solution and the actual words for my Question.
How to unescape string literals in java
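The distinction behind that linked question: "\034" in Java source is compiled to the single character U+001C, while the same four characters read from a configuration file remain a backslash followed by three digits. A minimal sketch of decoding such text by hand, assuming only octal escapes and \t, \n, \r, \\ need to be supported. The class and method names are hypothetical, and the octal handling is simplified compared with the Java language rules (up to three octal digits are always consumed). On Java 15 and later, String.translateEscapes() covers the same ground.

```java
class EscapeDecoder {
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            // Ordinary character, or a lone backslash at the very end.
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c);
                i++;
                continue;
            }
            char next = s.charAt(i + 1);
            if (next >= '0' && next <= '7') {
                // Octal escape: consume up to three octal digits.
                int j = i + 1;
                int value = 0;
                while (j < s.length() && j < i + 4
                        && s.charAt(j) >= '0' && s.charAt(j) <= '7') {
                    value = value * 8 + (s.charAt(j) - '0');
                    j++;
                }
                out.append((char) value);
                i = j;
            } else {
                switch (next) {
                    case 't': out.append('\t'); break;
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    case '\\': out.append('\\'); break;
                    default: out.append(c).append(next); // unknown escape: keep as-is
                }
                i += 2;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The argument is the 4-character text backslash-0-3-4, as it would
        // arrive from a config file; the result is one character (U+001C).
        String delim = unescape("\\034");
        System.out.println(delim.length()); // prints 1
    }
}
```

Applied to the question's code, the delimiter read from the configuration file would be passed through such a function once, before it is used for joining the columns.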

How to break a file into tokens based on regex using Java

I have a file in the following format, records are separated by newline but some records have line feed in them, like below. I need to get each record and process them separately. The file could be a few Mb in size.
<?aaaaa>
<?bbbb
bb>
<?cccccc>
I have the code:
FileInputStream fs = new FileInputStream(FILE_PATH_NAME);
Scanner scanner = new Scanner(fs);
scanner.useDelimiter(Pattern.compile("<\\?"));
if (scanner.hasNext()) {
String line = scanner.next();
System.out.println(line);
}
scanner.close();
But the records I got have the leading <? removed:
aaaaa>
bbbb
bb>
cccccc>
I know the Scanner consumes any input that matches the delimiter pattern. All I can think of is to add the delimiter pattern back to each record manually.
Is there a way to NOT have the delimiter pattern removed?
Break on a newline only when preceded by a ">" char:
scanner.useDelimiter("(?<=>)\\R"); // Note you can pass a string directly
\R is a system independent newline
(?<=>) is a look behind that asserts (without consuming) that the previous char is a >
Plus it's cool because <=> looks like Darth Vader's TIE fighter.
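A short sketch of that delimiter in use, reading from a String for brevity (a file-based Scanner is configured the same way). The class and method names are hypothetical. Because the look-behind consumes nothing, each record keeps its <? and >, and a record that spans two lines keeps its embedded line break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class RecordScanner {
    static List<String> readRecords(String text) {
        List<String> records = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // Split only on a line break that directly follows '>'.
            scanner.useDelimiter("(?<=>)\\R");
            while (scanner.hasNext()) {
                records.add(scanner.next());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "<?aaaaa>\n<?bbbb\nbb>\n<?cccccc>\n";
        for (String record : readRecords(text)) {
            // Collapse the embedded line break for display only.
            System.out.println(record.replaceAll("\\R", ""));
        }
    }
}
```

Note the while loop: the question's code calls hasNext() in an if, which would process only the first record.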
I'm assuming you want to ignore the newline character '\n' everywhere.
I would read the whole file into a String and then remove all of the '\n's in the String. The part of the code this question is about looks like this:
String fileString = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
fileString = fileString.replace("\n", "");
Scanner scanner = new Scanner(fileString);
... //your code
Feel free to ask any further questions you might have!
Here is one way of doing it by using a StringBuilder:
public static void main(String[] args) throws FileNotFoundException {
Scanner in = new Scanner(new File("C:\\test.txt"));
StringBuilder builder = new StringBuilder();
String input = null;
while (in.hasNextLine() && null != (input = in.nextLine())) {
for (int x = 0; x < input.length(); x++) {
builder.append(input.charAt(x));
if (input.charAt(x) == '>') {
System.out.println(builder.toString());
builder = new StringBuilder();
}
}
}
in.close();
}
Input:
<?aaaaa>
<?bbbb
bb>
<?cccccc>
Output:
<?aaaaa>
<?bbbb bb>
<?cccccc>

Processing text files and hyphenating strings line by line in Java

I have a .txt file with 8,000 rows in a single column. Each line contains either an alphanumeric or a number like this:
0219381A
10101298
32192017
1720291C
04041009
I'd like to read this file, insert a 0 (zero) before each beginning digit, a hyphen in between digits 3 and 4, and then remove the remaining digits to an output file like this:
002-19
010-10
032-19
017-20
004-04
I'm able to read from and write to a file or insert a hyphen when done separately but can't get the pieces working together:
public static void main(String[] args) throws FileNotFoundException{
// TODO Auto-generated method stub
Scanner in = new Scanner(new File("file.txt"));
PrintWriter out = new PrintWriter("file1.txt");
while(in.hasNextLine())
{
StringBuilder builder = new StringBuilder(in.nextLine());
builder.insert(0, "0");
builder.insert(3, "-");
String hyph = builder.toString();
out.printf(hyph);
}
in.close();
out.close();
How can I get these pieces working together/is there another approach?
try this
while (in.hasNextLine()) {
String line = in.nextLine();
if (!line.isEmpty()) {
line = "0" + line.substring(0, 2) + "-" + line.substring(2, 4);
}
out.println(line);
}
Your code looks fine. If you make this change, I think you should be good:
StringBuilder builder = new StringBuilder(in.nextLine().substring(0,4));
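One detail in the question's code that the answers do not spell out: out.printf(hyph) treats the text as a format string and writes no line terminator, so all results end up on a single line, whereas out.println writes one result per line. The transformation itself can be isolated in a small helper, sketched here with hypothetical names and the assumption that lines shorter than four characters are passed through unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CodeFormatter {
    static String format(String line) {
        if (line.length() < 4) {
            return line; // too short to reformat; assumption: leave unchanged
        }
        // Leading zero, first two characters, hyphen, next two characters.
        return "0" + line.substring(0, 2) + "-" + line.substring(2, 4);
    }

    static List<String> formatAll(List<String> lines) {
        List<String> result = new ArrayList<>(lines.size());
        for (String line : lines) {
            result.add(format(line));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [002-19, 010-10, 017-20]
        System.out.println(formatAll(Arrays.asList("0219381A", "10101298", "1720291C")));
    }
}
```

Inside the question's loop this would be used as out.println(format(in.nextLine())).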

removeAll operation on arraylist makes program hang

I'm trying to read in from two files and store them in two separate arraylists. The files consist of words which are either alone on a line or multiple words on a line separated by commas.
I read each file with the following code (not complete):
ArrayList<String> temp = new ArrayList<>();
FileInputStream fis;
fis = new FileInputStream(fileName);
Scanner scan = new Scanner(fis);
while (scan.hasNextLine()) {
Scanner input = new Scanner(scan.nextLine());
input.useDelimiter(",");
while (scan.hasNext()) {
String md5 = scan.next();
temp.add(md5);
}
}
scan.close();
return temp;
Each file contains almost 1 million words (I don't know the exact number), so I'm not entirely sure that the above code works correctly - but it seems to.
I now want to find out how many words are exclusive to the first file/arraylist. To do so I planned on using list1.removeAll(list2) and then checking the size of list1 - but for some reason this is not working. The code:
public static ArrayList differentWords(String fileName1, String fileName2) {
ArrayList<String> file1 = readFile(fileName1);
ArrayList<String> file2 = readFile(fileName2);
file1.removeAll(file2);
return file1;
}
My main method contains a few different calls and everything works fine until I reach the above code, which just causes the program to hang (in netbeans it's just "running").
Any idea why this is happening?
You are not using input in
while (scan.hasNextLine()) {
Scanner input = new Scanner(scan.nextLine());
input.useDelimiter(",");
while (scan.hasNext()) {
String md5 = scan.next();
temp.add(md5);
}
}
I think you meant to do this:
while (scan.hasNextLine()) {
Scanner input = new Scanner(scan.nextLine());
input.useDelimiter(",");
while (input.hasNext()) {
String md5 = input.next();
temp.add(md5);
}
}
but that said you should look into String#split() that will probably save you some time:
while (scan.hasNextLine()) {
String line = scan.nextLine();
String[] tokens = line.split(",");
for (String token: tokens) {
temp.add(token);
}
}
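Try this:
for (Iterator<String> it = file1.iterator(); it.hasNext(); ) {
String s1 = it.next();
for (String s2 : file2) {
// Remove via the iterator: file1.remove(s1) inside a for-each over file1 would throw ConcurrentModificationException
if (s1.equals(s2)) { it.remove(); break; }
}
}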
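On the apparent hang itself: ArrayList.removeAll(c) calls c.contains for every element of the list, and contains on an ArrayList is a linear scan. With roughly a million words in each list that is on the order of 10^12 comparisons, which probably looks like a program that never finishes. A hedged sketch of the usual workaround, copying the second list into a HashSet so each lookup takes constant time on average (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WordDifference {
    static List<String> exclusiveTo(List<String> first, List<String> second) {
        // Hash-based lookup instead of scanning the second list each time.
        Set<String> lookup = new HashSet<>(second);
        List<String> result = new ArrayList<>(first);
        result.removeAll(lookup);
        return result;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("a", "b", "c", "b");
        List<String> file2 = Arrays.asList("b", "d");
        // prints [a, c]
        System.out.println(exclusiveTo(file1, file2));
    }
}
```

The result keeps the order and any duplicates of the first list; if only the count of distinct exclusive words is needed, both lists could be held as sets from the start.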
