How to set next encoding to a file in Java

Here is my code, which reads a file and replaces the text on a specific line. When the lines are read (with the readAllLines method) and the file contains a symbol that doesn't match the specified Charset, it throws a MalformedInputException.
For example: I'm reading a text file with the UTF_8 charset, but the file contains the symbol "†", and it throws a MalformedInputException.
I would like to ask how, in the following code, I can catch the MalformedInputException and try the next encoding. For example, when the encoding is UTF_8, try the next one (UTF_16, etc.), and when one matches, read the file properly.
public boolean replaceTextInSpecificLine(String fileName, int lineNumber, String content, Charset cs)
{
    try
    {
        scan = new Scanner(System.in);
        File filePath = readFile(fileName, true);
        List<String> lines = null;
        if (filePath != null)
        {
            lines = Files.readAllLines(filePath.toPath(), cs);
            // lineNumber is 1-based, so the valid range is 1..lines.size()
            while (lineNumber < 1 || lineNumber > lines.size())
            {
                System.out.print("Wrong line number or the file is empty! Enter another line: ");
                lineNumber = scan.nextInt();
                scan.nextLine();
            }
            lines.set(lineNumber - 1, content);
            Files.write(filePath.toPath(), lines, cs);
            System.out.println("Successfully saved!");
            return true;
        }
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    finally
    {
        close(scan);
    }
    return false;
}

I would avoid switching encodings while reading the file and simply reread the file with the next encoding. Something like this would be sufficient:
List<String> getAllLines(File file, Charset... charsets) throws IOException {
    for (Charset cs : charsets) {
        try {
            return Files.readAllLines(file.toPath(), cs);
        } catch (MalformedInputException e) {
            // this charset could not decode the file; try the next one
        } catch (IOException e) {
            // a real I/O problem, not an encoding mismatch, so don't retry
            throw e;
        }
    }
    // none of the supplied charsets could decode the file
    throw new IOException("Could not decode " + file + " with any of the given charsets");
}
(this is just an example, your arguments may vary based on need)
If you switched encodings partway through reading the document, you could end up interpreting some bytes as valid UTF-8 characters when in fact they were ISO-8859-1 characters.
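As a usage sketch (not from the original answer), the helper could replace the single readAllLines call inside replaceTextInSpecificLine; the charset order here is only an example:
// hypothetical call site; filePath comes from the question's readFile(fileName, true)
List<String> lines = getAllLines(filePath,
        StandardCharsets.UTF_8,       // try the expected encoding first
        StandardCharsets.UTF_16,      // then fall back
        StandardCharsets.ISO_8859_1);
Note that the Files.write call at the end of the method should then use whichever charset actually succeeded, so in practice the helper may need to return that charset alongside the lines.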

Related

Java - Printing unicode from text file doesn't output corresponding UTF-8 character

I have this text file with numerous unicode values and I'm trying to print the corresponding UTF-8 characters to the console, but all it prints is the hex string. If I copy any of the values and paste them into a System.out call it works fine, but not when reading them from the text file.
The following is my code for reading the file, which contains lines of values like \u00C0, \u00C1, \u00C2, \u00C3; these are printed to the console literally instead of the characters I want.
private void printFileContents() throws IOException {
    Path encoding = Paths.get("unicode.txt");
    try (Stream<String> stream = Files.lines(encoding)) {
        stream.forEach(v -> { System.out.println(v); });
    } catch (IOException e) {
        e.printStackTrace();
    }
}
This is the method I used to parse the HTML that contained the unicode values in the first place.
private void parseGermanEncoding() {
    try
    {
        File encoding = new File("encoding.html");
        Document document = Jsoup.parse(encoding, "UTF-8", "http://example.com/");
        Element table = document.getElementsByClass("codetable").first();
        Path f = Paths.get("unicode.txt");
        try (BufferedWriter wr = new BufferedWriter(new FileWriter(f.toFile())))
        {
            for (Element row : table.select("tr"))
            {
                Elements tds = row.select("td");
                String unicode = tds.get(0).text();
                if (unicode.startsWith("U+"))
                {
                    unicode = unicode.substring(2);
                }
                wr.write("\\u" + unicode);
                wr.newLine();
            }
            wr.flush();
            wr.close();
        }
    } catch (IOException e)
    {
        e.printStackTrace();
    }
}
You will need to convert the string from a unicode-encoded string to a UTF-8-encoded string. You could follow these steps: 1. convert the string to a byte array using myString.getBytes("UTF-8"), and 2. build the UTF-8-encoded string using new String(byteArray, "UTF-8"). The code block needs to be surrounded with a try/catch for UnsupportedEncodingException.
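A literal rendering of the two steps described above, just as a sketch (the sample value is hypothetical, not from the thread):
String myString = "\\u00C0"; // e.g. one line read from unicode.txt
try {
    byte[] byteArray = myString.getBytes("UTF-8");      // step 1: string to bytes
    String utf8String = new String(byteArray, "UTF-8"); // step 2: bytes back to a string
    System.out.println(utf8String);
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}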
Thanks to OTM's comment above I was able to get a working solution for this. You take the unicode string, parse its hex digits with Integer.parseInt(), and finally cast the result to char to get the actual character. This solution is based on this post provided by OTM - How to convert a string with Unicode encoding to a string of letters
private void printFileContents() throws IOException {
    Path encoding = Paths.get("unicode.txt");
    try (Stream<String> stream = Files.lines(encoding)) {
        stream.forEach(v ->
        {
            String output = "";
            // Strip the leading "\u" and parse the remaining hex digits as a code point
            int parse = Integer.parseInt(v.replace("\\u", ""), 16);
            // Cast the code point to char to get the actual character
            output += (char) parse;
            System.out.println(output);
        });
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Program only reads the last line in a .txt file in Java

I have a problem and don't know what to do. This method is supposed to read all the text in a .txt document. My problem is that when the document contains more than one line of text, the program only reads the last line. The program doesn't need to worry about signs like . , : or spaces, but it has to read all the letters. Can anybody help me?
example text
hello my name is
(returns the right result)
hello my
name is
(returns only name is)
private Scanner x;
String readFile(String fileName)
{
    try {
        x = new Scanner(new File(fileName + (".txt")));
    }
    catch (Exception e) {
        System.out.println("cant open file");
    }
    while (x.hasNext()) {
        read = x.next();
    }
    return read;
}
It's because when you use read = x.next(), the string in read is replaced by the next token from the file on every iteration, so only the last one survives. Use read += x.next() or read = read.concat(x.next()); instead.
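Applied to the loop from the question, the minimal change would look something like this (a space is appended because next() strips the whitespace between tokens):
String read = "";
while (x.hasNext()) {
    read += x.next() + " "; // accumulate tokens instead of overwriting
}
return read;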
You overwrite read with each call to next(). Also, you didn't close() your Scanner. I would use a try-with-resources and something like,
String readFile(String fileName)
{
    String read = "";
    try (Scanner x = new Scanner(new File(fileName + (".txt")))) {
        while (x.hasNextLine()) {
            read += x.nextLine() + System.lineSeparator(); // <-- +=
        }
    } catch (Exception e) {
        System.out.println("cant open file");
    }
    return read;
}

How to deal with cp1252 encoding while splitting a String containing Japanese characters

I am reading a CSV file line by line and delimiting each line on the basis of a tab delimiter. There is a line in the file like:
"\"255U-2968RYE\" \"Organization\" \"SMBC日興証券 株式会社\" \"2968RYE\"
I used the split call split("\t(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1). In my view the split function should return a String array of size 4, but it is not working as I am expecting. I tried to change the encoding type in Eclipse and used different ones like UTF-8 and ISO-8859-1, but that did not work. The file is in CSV format and I am reading it using an open source CSV API, as shown below.
String read[];
int row = 0;
int columns = 0;
int totalElement = 0;
String msg = null;
int cols = (int) Double.parseDouble(totalColumns);
try
{
    CSVReader reader = new CSVReader(new FileReader(filePath), '\n');
    if (titleAllowed.equalsIgnoreCase("yes"))
        reader.readNext();
    while ((read = (reader.readNext())) != null)
    {
        row++;
        if (row == 1)
        {
            columns = (read[0].split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1)).length;
            if (cols != columns)
            {
                JOptionPane.showMessageDialog(null, "Columns not matched as mentioned in checkList");
                return false;
            }
        }
        if ((columns != 0) && (columns != read[0].split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1).length))
        {
            msg = "Error exists in line " + row;
            return false;
        }
        if (!read[0].equals(""))
            totalElement += (read[0].split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1)).length;
    }
    reader.close();
} catch (Exception e)
{
    e.printStackTrace();
}
if (null != msg)
{
    JOptionPane.showMessageDialog(null, msg);
}
return (totalElement == row * columns);
Correct answer is highly appreciated. Thanks in advance
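For illustration only (this is a sketch, not an answer from the thread), the snippet below shows the quote-aware split applied to the sample row, and a reader opened with an explicit charset. The UTF-8 assumption and the file name data.csv are mine; cp1252 cannot represent Japanese characters, so the file's real encoding needs to be confirmed first.
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class SplitSketch {
    public static void main(String[] args) throws IOException {
        // Tab-separated sample row similar to the one in the question
        String line = "\"255U-2968RYE\"\t\"Organization\"\t\"SMBC日興証券 株式会社\"\t\"2968RYE\"";
        // Split on tabs that are followed by an even number of quotes, i.e. tabs outside quoted fields
        String[] fields = line.split("\t(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
        System.out.println(fields.length); // prints 4

        // FileReader uses the platform default charset (often cp1252 on Windows);
        // an explicit charset avoids corrupting the Japanese characters before the split runs.
        try (Reader in = new InputStreamReader(new FileInputStream("data.csv"), StandardCharsets.UTF_8)) {
            // pass 'in' to the CSV reader instead of new FileReader(filePath)
        }
    }
}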

java RandomAccessFile parameter

I am trying to follow the example below, found here: Java: Find if the last line of a file is empty, to determine whether a file ends with CRLF (an empty line). However, when I pass a String to the method, RandomAccessFile says the file is not found. The problem is that I can't feed it the file path, but I do have the contents of the file as a String, so I tried to create a file using File f = new File(myString); and then pass the created file to the method, but it didn't work and gave me the same error (File not Found); it considers the first line of the contents as the path!
How can I create a file accepted by RandomAccessFile from my String that contains the contents of the file I want to check for a trailing CRLF?
Hope I was clear.
public static boolean lastLineisCRLF(String filename) {
    RandomAccessFile raf = null;
    try {
        raf = new RandomAccessFile(filename, "r");
        long pos = raf.length() - 2;
        if (pos < 0) return false; // too short
        raf.seek(pos);
        return raf.read() == '\r' && raf.read() == '\n';
    } catch (IOException e) {
        return false;
    } finally {
        if (raf != null) try {
            raf.close();
        } catch (IOException ignored) {
        }
    }
}
If you have the file contents already in memory as a string, you don't need to write it to a file again to determine if the last line is empty. Just split the contents by an end-of-line character and then trim whitespace off the last line and see if anything is left:
String fileContent = "line1\nline2\nline3\nline4\n";
// -1 limit tells split to keep empty fields
String[] fileLines = fileContent.split("\n", -1);
String lastLine = fileLines[fileLines.length - 1];
boolean lastLineIsEmpty = false;
if (lastLine.trim().isEmpty())
{
    lastLineIsEmpty = true;
}
// prints true: line4 is followed by a line terminator but there is no line 5
System.out.println("lastLineEmpty: " + lastLineIsEmpty);
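A shorter variant of the same idea, assuming the in-memory string still contains the original line terminators:
boolean endsWithCrlf = fileContent.endsWith("\r\n"); // strict CRLF check
boolean endsWithNewline = fileContent.endsWith("\n"); // any trailing line break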

Get the offset of previous line in a file

I'm extracting data from a file line by line into a database, and I can't figure out a proper way to flag lines that I've already read into my database.
I have the following code that I use to iterate through the file's lines; I attempt to verify that a line already has my flag, or else I try to append the flag to the line:
List<String> fileLines = new ArrayList<String>();
File logFile = new File("C:\\MyStuff\\SyslogCatchAllCopy.txt");
try {
    RandomAccessFile raf = new RandomAccessFile(logFile, "rw");
    String line = "";
    String doneReadingFlag = "##";
    Scanner fileScanner = new Scanner(logFile);
    while ((line = raf.readLine()) != null && !line.contains(doneReadingFlag)) {
        Scanner s = new Scanner(line);
        String temp = "";
        if (!s.hasNext(doneReadingFlag)) {
            fileLines.add(line);
            // note: in write(byte[], int, int) the second argument is an offset into the
            // byte array, not a position in the file; the bytes are written at the current file pointer
            raf.write(doneReadingFlag.getBytes(), (int) raf.getFilePointer(),
                    doneReadingFlag.getBytes().length);
        } else {
            System.err.println("Already read");
        }
    }
} catch (FileNotFoundException e) {
    System.out.println("File not found" + e);
} catch (IOException e) {
    System.out.println("Exception while reading the file ");
}
// return fileLines;
// MoreProccessing(fileLines);
This code appends the flag to the next line, and it overwrites the characters in that position.
Any help?
When you write to a file, it doesn't insert, so you should expect it to replace the characters.
You need to reserve space in the file for information you want to change, or you can add the information to another file.
Or, instead of marking each line, you can store somewhere the line number (or, better, the character position) you have read up to.
If you are not restarting your process, you can have the process read the file as it is appended (meaning you might not need to store where you are up to anywhere).
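A minimal sketch of the "store the character position you have read up to" idea from this answer; the side file offset.txt, the log file path, and the class name are assumptions, not from the thread:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OffsetTracker {
    public static void main(String[] args) throws IOException {
        Path offsetFile = Paths.get("offset.txt"); // hypothetical side file holding the last read position
        long start = 0L;
        if (Files.exists(offsetFile)) {
            start = Long.parseLong(new String(Files.readAllBytes(offsetFile), StandardCharsets.US_ASCII).trim());
        }
        try (RandomAccessFile raf = new RandomAccessFile("C:\\MyStuff\\MyFile.txt", "r")) {
            raf.seek(start); // resume where the previous run left off
            String line;
            while ((line = raf.readLine()) != null) {
                System.out.println(line); // process the line
            }
            // remember how far we got for the next run
            Files.write(offsetFile, Long.toString(raf.getFilePointer()).getBytes(StandardCharsets.US_ASCII));
        }
    }
}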
@Peter Lawrey I did as you said and it worked for me, as follows:
ArrayList<String> fileLines = new ArrayList<String>();
File logFile = new File("C:\\MyStuff\\MyFile.txt");
RandomAccessFile raf = new RandomAccessFile(logFile, "rw");
String line = "";
String doneReadingFlag = "#";
long oldOffset = raf.getFilePointer();
long newOffset = oldOffset;
while ((line = raf.readLine()) != null)
{
    newOffset = raf.getFilePointer();
    if (!line.contains(doneReadingFlag))
    {
        fileLines.add(line);
        raf.seek(oldOffset);             // jump back to the start of the line just read
        raf.writeChars(doneReadingFlag); // write the flag over the beginning of the line
        raf.seek(newOffset);             // continue from the start of the next line
        System.out.println("Line added and flagged");
    }
    else
    {
        System.err.println("Already read");
    }
    oldOffset = newOffset;
}
