I've got an oddball problem here. I have a little Java program that filters Minecraft log files to make them easier to read. Each line of these logs usually contains multiple instances of the character "§", which comes back with a hex value of FFFD when my program reads it.
I am filtering out this character (as well as the character following it) using:
currentLine = currentLine.replaceAll("\uFFFD.", "");
Now, when I run the program through NetBeans, it works swell. My lines come out looking like this:
CxndyAnnie: Mhm
CxndyAnnie: Sorry
But when I build the .jar file and wrap it into a .exe file using JSmooth, that character no longer gets filtered out when I run the .exe, and my lines come out looking like this:
§e§7[§f$65§7] §1§nCxndyAnnie§e: Mhm
§e§7[§f$65§7] §1§nCxndyAnnie§e: Sorry
(note: the additional square brackets and $65 show up because their filtering depends on the special character and its following character being removed first)
Any ideas why this would no longer work after putting it through JSmooth? Is there a different way to do the text replace that would preserve its function?
By the way, I also attempted to remove this character using
currentLine = currentLine.replaceAll("§.", "");
but that didn't work in NetBeans or as a .exe.
I'll go ahead and paste the full method below:
public static String[] filterLines(String[] allLines, String filterType, Boolean timeStamps) throws IOException {
    String currentLine = null;
    FileWriter saveFile = new FileWriter("readable.txt");
    String heading;
    String string1 = "[L]";
    String string2 = "[A]";
    String string3 = "[G]";
    if (filterType.equals(string1)) {
        heading = "LOCAL CHAT LOGS ONLY \r\n\r\n";
    }
    else if (filterType.equals(string2)) {
        heading = "ADVERTISING CHAT LOGS ONLY \r\n\r\n";
    }
    else if (filterType.equals(string3)) {
        heading = "GLOBAL CHAT LOGS ONLY \r\n\r\n";
    }
    else {
        heading = "CHAT LINES CONTAINING \"" + filterType + "\" \r\n\r\n";
    }
    saveFile.write(heading);
    for (int i = 0; i < allLines.length; i++) {
        if ((allLines[i] != null) && (allLines[i].contains(filterType))) {
            currentLine = allLines[i];
            if (!timeStamps) {
                currentLine = currentLine.replaceAll("\\[..:..:..\\].", "");
            }
            currentLine = currentLine.replaceAll("\\[Client thread/INFO\\]:.", "");
            currentLine = currentLine.replaceAll("\\[CHAT\\].", "");
            currentLine = currentLine.replaceAll("\uFFFD.", "");
            currentLine = currentLine.replaceAll("\\[A\\].", "");
            currentLine = currentLine.replaceAll("\\[L\\].", "");
            currentLine = currentLine.replaceAll("\\[G\\].", "");
            currentLine = currentLine.replaceAll("\\[\\$..\\].", "");
            currentLine = currentLine.replaceAll(".>", ":");
            currentLine = currentLine.replaceAll("\\[\\$100\\].", "");
            saveFile.write(currentLine + "\r\n");
            //System.out.println(currentLine);
        }
    }
    saveFile.close();
    ProcessBuilder openFile = new ProcessBuilder("Notepad.exe", "readable.txt");
    openFile.start();
    return allLines;
}
FINAL EDIT
Just in case anyone stumbles across this and needs to know what finally worked, here's the snippet of code where I pull the lines from the file and re-encode them so that it works:
BufferedReader fileLines = new BufferedReader(new FileReader(file));
String[] allLines = new String[numLines];
String line;
int i = 0;
while ((line = fileLines.readLine()) != null) {
    byte[] bLine = line.getBytes();
    String convLine = new String(bLine, Charset.forName("UTF-8"));
    allLines[i] = convLine;
    i++;
}
I had a problem like this in the past with Minecraft logs. I don't remember the exact details, but it came down to a file encoding issue: reading the file as UTF-8 worked correctly, while other encodings, including the system default, did not.
First:
Make sure that you specify UTF-8 encoding when decoding the byte array read from the file, so that allLines contains the correct text, like so:
Path fileLocation = Paths.get("C:/myFileLocation/logs.txt");
byte[] data = Files.readAllBytes(fileLocation);
String allLines = new String(data , Charset.forName("UTF-8"));
Second:
Using \uFFFD is not going to work, because \uFFFD is only used to replace an incoming character whose value is unknown or unrepresentable in Unicode.
However, if you use the correct encoding (shown in my first point), then \uFFFD is not necessary, because the character § is known in Unicode, so you can simply use
currentLine.replaceAll("§", "");
or specifically use the actual Unicode escape for that character, U+00A7, like so
currentLine.replaceAll("\u00A7", "");
or just use both those lines in your code.
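Putting the two points together, here is a minimal sketch, assuming the log really is UTF-8 encoded and that you still want to drop the formatting code that follows each §, as in the original question. The class name and the path are just placeholders, not anything from the original code:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class StripColorCodes {
    public static void main(String[] args) throws IOException {
        // Placeholder path; point this at the real log file.
        Path log = Paths.get("C:/myFileLocation/logs.txt");

        // Decode explicitly as UTF-8 so § arrives as U+00A7 rather than U+FFFD.
        List<String> lines = Files.readAllLines(log, StandardCharsets.UTF_8);

        for (String line : lines) {
            // Strip each § together with the single formatting code that follows it.
            System.out.println(line.replaceAll("\u00A7.", ""));
        }
    }
}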
Related
I am currently trying to read multiple files (UTF-8) within a directory and store each element in each text file into an array.
I am able to get the text to print to the console; however, it shows some funny characters I can't seem to rid myself of (see image; what it should look like is displayed on the right).
Currently, I have a method that builds an array with all the file names in that directory; then, using a for loop, I send each of these file names to a read method, which puts the file's contents into a string.
The below method writes these file names to an array.
public static ArrayList<String> readModelFilesInModelDir() {
    File folder = new File("Models/");
    File[] listOfFiles = folder.listFiles();
    String random = "";
    assert listOfFiles != null;
    ArrayList<String> listOfModelFiles = new ArrayList<>();
    for (int i = 0; i < listOfFiles.length; i++) {
        if (listOfFiles[i].isFile()) {
            //System.out.println("File " + listOfFiles[i].getName());
            listOfModelFiles.add(listOfFiles[i].getName());
        } else if (listOfFiles[i].isDirectory()) {
            System.out.println("Directory " + listOfFiles[i].getName());
        }
    }
    System.out.println(listOfModelFiles);
    return listOfModelFiles;
}
The below for loop then sends these file names to the read method.
ArrayList<String> modelFiles = readModelFilesInModelDir();
for (int i = 0; i < modelFiles.size(); i++) {
String thisString = readModelFileIntoArray(modelFiles.get(i));
System.out.println(thisString);
}
The below method then reads each file into a string, which is producing the output shown in the image.
public static String readModelFileIntoArray(String modelFilePath) {
    StringBuilder fileHasBeenRead = new StringBuilder();
    try {
        Reader reader = new InputStreamReader(new FileInputStream("Models/" + modelFilePath), StandardCharsets.UTF_8);
        BufferedReader bufferedReader = new BufferedReader(reader);
        String s;
        while ((s = bufferedReader.readLine()) != null) {
            fileHasBeenRead.append(s).append("\n");
        }
        reader.close();
    } catch (Exception e) {
        System.out.print(e);
    }
    return fileHasBeenRead.toString().trim();
}
Finally, how would I fix this output issue, as well as store each of these files that have been read into a separate array that I can use elsewhere? Thanks!
I agree with Johnny Mopp: your file is encoded in UTF-16, not in UTF-8. The two �� at the beginning of your output look like a byte order mark (BOM). In UTF-16, each character is encoded as two bytes. Since your text only contains characters in the ASCII range, it means that each first byte is always 0x00. This is why you're seeing all these ▯: they correspond to the non-printable character 0x00. I would even say that since the two characters following �� are ▯ and a, in this order, your file is using big-endian UTF-16.
Instead of UTF-8, use StandardCharsets.UTF_16. It will also take the BOM into account and use the appropriate endianness.
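Concretely, the only change needed in readModelFileIntoArray is the charset handed to the InputStreamReader. A sketch of the method as posted, with just that argument swapped (StandardCharsets.UTF_16 comes from java.nio.charset):

public static String readModelFileIntoArray(String modelFilePath) {
    StringBuilder fileHasBeenRead = new StringBuilder();
    try {
        // UTF_16 honours the BOM and picks the correct endianness automatically.
        Reader reader = new InputStreamReader(
                new FileInputStream("Models/" + modelFilePath), StandardCharsets.UTF_16);
        BufferedReader bufferedReader = new BufferedReader(reader);
        String s;
        while ((s = bufferedReader.readLine()) != null) {
            fileHasBeenRead.append(s).append("\n");
        }
        reader.close();
    } catch (Exception e) {
        System.out.print(e);
    }
    return fileHasBeenRead.toString().trim();
}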
It's much easier (and usually better) to use existing libraries for common stuff. There is FileUtils from Apache commons-io, which provides this functionality out of the box, reducing your file-reading code to a one-liner (note that readFileToString takes a File, not a String path):
String thisString = FileUtils.readFileToString(new File("Models/" + modelFilePath), StandardCharsets.UTF_8);
... or whatever charset your file is using...
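Given the UTF-16 diagnosis above, a sketch of how the question's loop could look with this approach. FileUtils is org.apache.commons.io.FileUtils, so commons-io must be on the classpath; readFileToString throws IOException, so this has to sit in a method that throws or catches it; the extra modelFileContents list is my addition to cover the "store each file for use elsewhere" part of the question:

ArrayList<String> modelFiles = readModelFilesInModelDir();
ArrayList<String> modelFileContents = new ArrayList<>();
for (int i = 0; i < modelFiles.size(); i++) {
    // readFileToString takes a File and a Charset and returns the whole file as one String.
    String thisString = FileUtils.readFileToString(
            new File("Models/" + modelFiles.get(i)), StandardCharsets.UTF_16);
    modelFileContents.add(thisString); // keep each file's text for later use
    System.out.println(thisString);
}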
Basically I've got an assignment which reads multiple lines from a .txt file.
There are 4 values in the text file per line and each value is separated by 2 spaces.
There are about 10 lines of data in the file.
After taking the input from the file the program then puts it onto a Database. The database connection functionality works fine.
My issue now is with reading from the file using a BufferedReader.
The issue is that if I uncomment any 1 of the 3 lines at the bottom, the BufferedReader reads every other line. And if I don't use them, there's an exception because the next input is of type String.
I have contemplated using a Scanner with the .hasNextLine() method.
Any thoughts on what could be the problem and how to fix it?
Thanks.
File file = new File(FILE_INPUT_NAME);
FileReader fr = new FileReader(file);
BufferedReader readFile = new BufferedReader(fr);
String line = null;
while ((line = readFile.readLine()) != null) {
    String[] split = line.split("  ", 4);
    String id = split[0];
    nameFromFile = split[1];
    String year = split[2];
    String mark = split[3];
    idFromFile = Integer.parseInt(id);
    yearOfStudyFromFile = Integer.parseInt(year);
    markFromFile = Integer.parseInt(mark);
    //line = readFile.readLine();
    //readFile.readLine();
    //System.out.println(readFile.readLine());
}
Edit: There was an error in the formatting of the .txt file: a missing value.
But now I get an ArrayIndexOutOfBoundsException.
Edit edit: Another error in the .txt file! Turns out there was a single space where there should have been a double space. It seems to be working now. But any advice on how to deal with file errors like this in the future?
The issue is that if I uncomment any 1 of the 3 lines at the bottom the BufferedReader reads every other line.
Correct. If you put any of those lines of code in, the line of text they read will be thrown away and not processed. You're already reading in the while condition; you don't need another read.
A compilable version of the code posted could be
public void read() throws IOException {
File file = new File(FILE_INPUT_NAME);
FileReader fr = new FileReader(file);
BufferedReader readFile = new BufferedReader(fr);
String line;
while ((line = readFile.readLine()) != null) {
String[] split = line.split(" ", 4);
if (split.length != 4) { // Not enough tokens (e.g., empty line) read
continue;
}
String id = split[0];
String nameFromFile = split[1];
String year = split[2];
String mark = split[3];
int idFromFile = Integer.parseInt(id);
int yearOfStudyFromFile = Integer.parseInt(year);
int markFromFile = Integer.parseInt(mark);
//line = readFile.readLine();
//readFile.readLine();
//System.out.println(readFile.readLine());
}
}
The above splits on a single space (" ") instead of the original two spaces ("  "). To split on any run of whitespace, a regular expression can be used, e.g. "\\s+". Of course, exactly two spaces can also be used, if that reflects the structure of the input data.
What the method should do with the extracted values (e.g., returning them in an object of some type, or saving them to a database directly), is up to the application using it.
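For example, here is a sketch of the "return them in an object" route. The StudentRecord name and its fields are my own invention based on the variable names in the question, records require Java 16 or newer, and the "\\s+" split is the whitespace variant mentioned above:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StudentFileReader {

    // Hypothetical holder for one parsed line; the name and fields are assumptions.
    public record StudentRecord(int id, String name, int yearOfStudy, int mark) {}

    public static List<StudentRecord> read(String fileName) throws IOException {
        List<StudentRecord> records = new ArrayList<>();
        try (BufferedReader readFile = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = readFile.readLine()) != null) {
                String[] split = line.split("\\s+", 4);
                if (split.length != 4) {
                    continue; // skip malformed or empty lines
                }
                records.add(new StudentRecord(
                        Integer.parseInt(split[0]),
                        split[1],
                        Integer.parseInt(split[2]),
                        Integer.parseInt(split[3])));
            }
        }
        return records;
    }
}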
I need to process a big text file. There are almost 400 columns in each line and almost 800,000 lines in the file; the format of each line is like:
340,9,2,3........5,2,LA
What I want to do is: for each line, if the last column is LA, print the first column of that line.
I wrote a simple program to do it:
BufferedReader bufr = new BufferedReader(new FileReader("A.txt"));
BufferedWriter bufw = new BufferedWriter(new FileWriter("LA.txt"));
String line = null;
while ((line = bufr.readLine()) != null) {
    String[] text = new String[388];
    text = line.split(",");
    if (text[387] == args[2]) {
        bufw.write(text[0]);
        bufw.newLine();
        bufw.flush();
    }
}
bufw.close();
bufr.close();
But it seems the length of an array can't be that big; I received a java.lang.ArrayIndexOutOfBoundsException.
Since I'm using split(",") in order to get the last column of a line, and it goes out of the array bounds, what can I do about it? Thanks.
text does not need to be initialized; String.split will create a correctly sized array:
String[] text = line.split(",");
You're also comparing Strings using reference equality (==). You should be using .equals():
if (text[387].equals(args[2])) { ... }
You're probably getting java.lang.ArrayIndexOutOfBoundsException because the index 387 is too big. If you want to get the last element, use this:
text[text.length - 1]
Modify and try this
String[] text = line.split(",");
if (text[text.length - 1].equals(args[2])) {
    bufw.write(text[0]);
    bufw.newLine();
    bufw.flush();
}
Assuming args[2] is LA.
String[] text;
Change your code to this. You don't need to initialize a size; when the String.split method executes, it will automatically create an array of the correct size.
If you just need the first and the last column, then there is no need to create an array out of the current line.
You could do something like this:
final String test = "340,9,2,354,63,5,5,45,634,5,5,2,LA";
final char delimiter = ',';
final String lastColumn = test.substring(test.lastIndexOf(delimiter) + 1);
if (lastColumn.equals("LA")) {
    final String firstColumn = test.substring(0, test.indexOf(delimiter));
    System.out.println(firstColumn);
}
This code extracts the last column first and tests it. If it matches "LA", then it extracts the first column. It will ignore the remaining content of the line.
Your code would be:
final char delimiter = ',';
BufferedReader bufr = new BufferedReader(new FileReader("A.txt"));
BufferedWriter bufw = new BufferedWriter(new FileWriter("LA.txt"));
String line = null;
while ((line = bufr.readLine()) != null) {
    final String lastColumn = line.substring(line.lastIndexOf(delimiter) + 1);
    if (lastColumn.equals(args[2])) {
        bufw.write(line.substring(0, line.indexOf(delimiter)));
        bufw.newLine();
        bufw.flush();
    }
}
bufw.close();
bufr.close();
(this code is not tested yet, but you get the idea :))
I'm having a problem reading UTF-8 characters in my code (running on Eclipse).
I have a text file with a few lines in it, for example:
אך 1234
NOTE: There is a \t before the word, and the word should appear on the left, the number on the right... I don't know how to reverse them here, sorry.
That is, a Hebrew word and then a number.
I need to separate the word from the number somehow. I tried this:
BufferedReader br = new BufferedReader(new FileReader(text));
String content;
while ((content = br.readLine()) != null)
{
    String delims = "[ ]+";
    String[] tokens = content.split(delims);
}
The problem is that for some reason, the code reads content (the first line in the file) as follows:
אך\t1234
...meaning that the space isn't in its correct place.
I suppose I could tokenize the text using the \t, but I'm not sure I should do it, as the file isn't being read correctly...
Does anyone have any idea why this happens?
Thanks so much :-)
I think you are matching a space when there actually is a tab there?
Can you try this:
BufferedReader br = new BufferedReader(new FileReader(text));
String content;
while ((content = br.readLine()) != null)
{
    String delims = "\\s";
    String[] tokens = content.split(delims);
}
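Separately, since the question mentions UTF-8: FileReader decodes with the platform default charset, so if the Hebrew characters themselves ever come out wrong, it may also be worth decoding the file explicitly. A sketch of the same loop, assuming text is the file or its path (InputStreamReader and StandardCharsets.UTF_8 are in the standard library):

BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(text), StandardCharsets.UTF_8));
String content;
while ((content = br.readLine()) != null)
{
    // \t counts as whitespace, so \s+ splits on the tab as well as on spaces
    String delims = "\\s+";
    String[] tokens = content.split(delims);
}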
My program needs to read from a multi-line .ini file. I've got it to the point where it reads every line that starts with a # and prints it, but I only want to record the value after the = sign. Here's what the file looks like:
#music=true
#Volume=100
#Full-Screen=false
#Update=true
This is what I want it to print:
true
100
false
true
This is the code I'm currently using:
@SuppressWarnings("resource")
public void getSettings() {
    try {
        BufferedReader br = new BufferedReader(new FileReader(new File("FileIO Plug-Ins/Game/game.ini")));
        String input = "";
        String output = "";
        while ((input = br.readLine()) != null) {
            String temp = input.trim();
            temp = temp.replaceAll("#", "");
            temp = temp.replaceAll("[*=]", "");
            output += temp + "\n";
        }
        System.out.println(output);
    } catch (IOException ex) {}
}
I'm not sure if replaceAll("[*=]", ""); really means anything at all, or if it's just searching for each of those characters individually. Any help is appreciated!
Try the following:
if (temp.startsWith("#")) {
    String[] splitted = temp.split("=");
    output += splitted[1] + "\n";
}
Explanation:
To process only lines starting with the desired character, use the String#startsWith method. Once you have the string to extract values from, String#split will split the text on the character you pass as the method argument. So in your case, the text before the = character will be in the array at position 0, and the text you want to print will be at position 1.
Also note that if your file contains many lines starting with #, it would be wise not to concatenate strings with +=, but to use a StringBuilder / StringBuffer to add the strings together.
Hope it helps.
Better to use a StringBuffer instead of += with a String, as shown below. Also, avoid declaring variables inside the loop; see how I've done it outside the loop. It's the best practice as far as I know.
StringBuffer outputBuffer = new StringBuffer();
String[] fields;
String temp;
while ((input = br.readLine()) != null)
{
    temp = input.trim();
    if (temp.startsWith("#"))
    {
        fields = temp.split("=");
        outputBuffer.append(fields[1]).append("\n");
    }
}
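Putting both answers together, a minimal sketch of getSettings() (assuming the same game.ini path; the two-argument split("=", 2) and the length check are my additions so that values containing '=' survive intact and lines without '=' are skipped; it needs BufferedReader, FileReader, and IOException from java.io):

public void getSettings() {
    StringBuilder output = new StringBuilder();
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader br = new BufferedReader(new FileReader("FileIO Plug-Ins/Game/game.ini"))) {
        String input;
        while ((input = br.readLine()) != null) {
            String temp = input.trim();
            if (temp.startsWith("#")) {
                // limit of 2 keeps any '=' that appears inside the value itself
                String[] fields = temp.split("=", 2);
                if (fields.length == 2) {
                    output.append(fields[1]).append("\n");
                }
            }
        }
        System.out.println(output);
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}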