Special characters from txt file - java

I am downloading a text file from FTP with a common FTP library.
The problem is that when I read the file into an array line by line, it does not keep characters such as æøå. Instead it just shows the "?" character.
Here is my code:
FileInputStream fstream = openFileInput("name of text file");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream, "UTF-8"));
String strLine;
ArrayList<String> lines = new ArrayList<String>();
while ((strLine = br.readLine()) != null) {
    lines.add(strLine);
}
String[] linjer = lines.toArray(new String[0]);
ArrayList<String> imei = new ArrayList<String>();
for (int o = 0; o < linjer.length; o++) {
    String[] holder = linjer[o].split(" - ");
    imei.add(holder[0] + " - " + holder[2]);
}
String[] imeinr = imei.toArray(new String[0]);
I have tried to put UTF-8 in my InputStreamReader, and I have tried a UnicodeReader class, but with no success.
I am fairly new to Java, so this might just be a stupid question, but I hope you can help. :)

There is no reason to use a DataInputStream. The DataInputStream and DataOutputStream classes are for reading and writing primitive Java data types in a binary format. You are just reading the contents of a text file line by line, so the use of DataInputStream is unnecessary and may produce incorrect results.
FileInputStream fstream = openFileInput("name of text file");
//DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(fstream, "UTF-8"));
Professional Java Programmer Tip: The for-each loop, added in Java 5, allows the programmer to iterate through the contents of an array without needing to define a loop counter. This simplifies your code, making it easier to read and maintain over time.
for (String line : linjer) {
    String[] holder = line.split(" - ");
    imei.add(holder[0] + " - " + holder[2]);
}
Note: Foreach loops can also be used with List objects.
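For example, the ArrayList from the question could be iterated directly, without converting it to an array first (a small sketch reusing the question's variable names):
// 'lines' and 'imei' are the ArrayList<String> instances from the question above
for (String line : lines) {
    String[] holder = line.split(" - ");
    imei.add(holder[0] + " - " + holder[2]);
}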

I would suggest that the file may not be in UTF-8. It could be in CP1252 or something, especially if you're using Windows.
Try downloading the file and running your code on the local copy to see if that works.
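If that turns out to be the case, a minimal sketch of reading the same file as Windows-1252 instead of UTF-8 (only an assumption - verify the file's real encoding first) would be:
// assumes the file is actually encoded as Windows-1252 (CP1252); needs java.nio.charset.Charset
FileInputStream fstream = openFileInput("name of text file");
BufferedReader br = new BufferedReader(
        new InputStreamReader(fstream, Charset.forName("windows-1252")));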

FTP has two transfer modes: binary and ASCII. Make sure you are using the correct one. Look here for details: http://www.rhinosoft.com/newsletter/NewsL2008-03-18.asp
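The question mentions a "common ftp library"; if that happens to be Apache Commons Net, a rough sketch of forcing binary mode before the download (so the file's bytes arrive untouched) might look like this - the host, credentials and remote file name are placeholders:
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

public class FtpBinaryDownload {
    public static void main(String[] args) throws Exception {
        FTPClient ftp = new FTPClient();
        ftp.connect("ftp.example.com");         // placeholder host
        ftp.login("user", "password");          // placeholder credentials
        ftp.setFileType(FTP.BINARY_FILE_TYPE);  // binary mode: no line-ending or charset translation
        try (OutputStream out = new FileOutputStream("name of text file")) {
            ftp.retrieveFile("remote.txt", out); // placeholder remote file name
        }
        ftp.logout();
        ftp.disconnect();
    }
}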

Related

How to split a byte array that contains multiple "lines" in Java?

Say we have a file like so:
one
two
three
(but this file got encrypted)
My crypto method returns the whole file in memory, as a byte[] type.
I know byte arrays don't have a concept of "lines", that's something a Scanner (for example) could have.
I would like to traverse each line, convert it to a String and perform my operation on it, but I don't know how to:
Find lines in a byte array
Slice the original byte array to "lines" (I would convert those slices to String, to send to my other methods)
Correctly traverse a byte array, where each iteration is a new "line"
Also: do I need to consider the different OS the file might have been composed in? I know that there is some difference between new lines in Windows and Linux and I don't want my method to work only with one format.
Edit: Following some tips from answers here, I was able to write some code that gets the job done. I still wonder whether this code is worth keeping or whether I am doing something that can fail in the future:
byte[] decryptedBytes = doMyCrypto(fileName, accessKey);
ByteArrayInputStream byteArrInStrm = new ByteArrayInputStream(decryptedBytes);
InputStreamReader inStrmReader = new InputStreamReader(byteArrInStrm);
BufferedReader buffReader = new BufferedReader(inStrmReader);
String delimRegex = ",";
String line;
String[] values = null;
while ((line = buffReader.readLine()) != null) {
    values = line.split(delimRegex);
    if (Objects.equals(values[0], tableKey)) {
        return values;
    }
}
System.out.println(String.format("No entry with key %s in %s", tableKey, fileName));
return values;
In particular, I was advised to explicitly set the encoding, but I was unable to see exactly where.
If you want to stream this, I'd suggest:
Create a ByteArrayInputStream to wrap your array
Wrap that in an InputStreamReader to convert binary data to text - I suggest you explicitly specify the text encoding being used
Create a BufferedReader around that to read a line at a time
Then you can just use:
String line;
while ((line = bufferedReader.readLine()) != null)
{
    // Do something with the line
}
BufferedReader handles line breaks from all operating systems.
So something like this:
byte[] data = ...;
ByteArrayInputStream stream = new ByteArrayInputStream(data);
InputStreamReader streamReader = new InputStreamReader(stream, StandardCharsets.UTF_8);
BufferedReader bufferedReader = new BufferedReader(streamReader);
String line;
while ((line = bufferedReader.readLine()) != null)
{
    System.out.println(line);
}
Note that in general you'd want to use try-with-resources blocks for the streams and readers - but it doesn't matter in this case, because it's just in memory.
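For reference, a minimal sketch of the same snippet wrapped in try-with-resources (reusing doMyCrypto from the question's edit; the enclosing method would need to declare IOException):
byte[] data = doMyCrypto(fileName, accessKey);
try (BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
    }
}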
As Scott states, I would like to see what you came up with so we can help you alter it to fit your needs.
Regarding your last comment about the OS: if you want to support multiple file types, you should consider writing several functions that support those different file extensions. As far as I know, you do need to specify which file, and what type of file, you are reading in your code.

Java - open txt file and clear all multiple spaces

I have a txt file and what I am trying to do is open it and collapse all runs of multiple spaces into a single one. I use:
br = new BufferedReader(new FileReader("C:\\Users\\Chris\\Desktop\\file_two.txt"));
bw = new BufferedWriter(new FileWriter("C:\\Users\\Chris\\Desktop\\file_two.txt"));
while ((current_line = br.readLine()) != null) {
    //System.out.println("Here.");
    current_line = current_line.replaceAll("\\s+", " ");
    bw.write(current_line);
}
br.close();
bw.close();
However, even though it seems correct to me at least, nothing is written to the file. If I use a System.out.println command, it is not printed, meaning that execution never enters the while loop... What am I doing wrong? Thanks
You are reading the file and writing to it at the same time, which is not allowed.
The better way is to read the file first, store the processed text in another file, and finally replace the original file with the new one. Try this:
br = new BufferedReader(new FileReader("C:\\Users\\Chris\\Desktop\\file_two.txt"));
bw = new BufferedWriter(new FileWriter("C:\\Users\\Chris\\Desktop\\file_two_copy.txt"));
String current_line;
while ((current_line = br.readLine()) != null) {
    //System.out.println("Here.");
    current_line = current_line.replaceAll("\\s+", " ");
    bw.write(current_line);
    bw.newLine();
}
br.close();
bw.close();
File copyFile = new File("C:\\Users\\Chris\\Desktop\\file_two_copy.txt");
File originalFile = new File("C:\\Users\\Chris\\Desktop\\file_two.txt");
originalFile.delete();
copyFile.renameTo(originalFile);
It may help.
There are a few problems with your approach:
The main one is that you are trying to read and write to the same file at the same time.
Another is that new FileWriter(..) always creates a new, empty file, which prevents the FileReader from reading anything from your file.
You should read the content from file1 and write its modified version to file2. After that, replace file1 with file2.
Your code could look more or less like this:
Path input = Paths.get("input.txt");
Path output = Paths.get("output.txt");
List<String> lines = Files.readAllLines(input);
lines.replaceAll(line -> line.replaceAll("\\s+", " "));
Files.write(output, lines);
Files.move(output, input, StandardCopyOption.REPLACE_EXISTING);
You must read first, then write; you are not allowed to read and write to the same file at the same time. You would need to use RandomAccessFile to do that.
If you don't want to learn a new technique, you will need to either write to a separate file, or cache all lines in memory (i.e. an ArrayList), but you must close the BufferedReader before you initialize your BufferedWriter, or you will get a file access error.
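A rough sketch of that second option (read everything first, close the reader, and only then open the writer on the same file), using the file name from the question:
List<String> lines = new ArrayList<>();
BufferedReader br = new BufferedReader(
        new FileReader("C:\\Users\\Chris\\Desktop\\file_two.txt"));
String current_line;
while ((current_line = br.readLine()) != null) {
    lines.add(current_line.replaceAll("\\s+", " "));
}
br.close(); // must be closed before the writer is opened on the same file

BufferedWriter bw = new BufferedWriter(
        new FileWriter("C:\\Users\\Chris\\Desktop\\file_two.txt"));
for (String line : lines) {
    bw.write(line);
    bw.newLine();
}
bw.close();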
Edit:
In case you want to look into it, here is a RandomAccessFile use case example for your intended use. It is worth pointing out that this will only work if the final line length is less than or equal to the original, because this technique is basically overwriting the existing text; but it should be very fast, with a small memory overhead, and would work on extremely large files:
public static void readWrite(File file) throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "rw");
    String newLine = System.getProperty("line.separator");
    String line = null;
    int write_pos = 0;
    while ((line = raf.readLine()) != null) {
        line = line.replaceAll("\\s+", " ") + newLine;
        byte[] bytes = line.getBytes();
        long read_pos = raf.getFilePointer();
        raf.seek(write_pos);
        raf.write(bytes, 0, bytes.length);
        write_pos += bytes.length;
        raf.seek(read_pos);
    }
    raf.setLength(write_pos);
    raf.close();
}

how to read file from last line to first using java

FileInputStream fstream = new FileInputStream("\\file path");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
while (br.ready()) {
    line = br.readLine();
}
Please let me know how to read a file from the last line to the first, given that the number of rows is not fixed and varies over time. I know the above is useful for reading it from the first row...
This might be helpful for you: http://mattfleming.com/node/11
Read the file into a list, and process that list backwards...
Files and streams are usually designed to be read forward, so doing this directly with streams might turn out a little awkward. That is only advisable when the files are really huge...
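A minimal sketch of that approach (Files.readAllLines reads the whole file into memory, assuming UTF-8; the path is the placeholder from the question):
List<String> lines = Files.readAllLines(Paths.get("\\file path"));
for (int i = lines.size() - 1; i >= 0; i--) {
    System.out.println(lines.get(i)); // process each line, last line first
}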
You cannot read a buffer backwards; you can, however, count the lines of your buffer as explained in the link below:
http://www.java2s.com/Code/Java/File-Input-Output/Countthenumberoflinesinthebuffer.htm
And afterwards select your line using this code:
FileInputStream fs = new FileInputStream("someFile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
for (int i = 0; i < 30; ++i)
    br.readLine();
String lineIWant = br.readLine();
As you can see, you iterate, reading each line (and doing nothing with it) before you get to the one you want (here 30 lines are skipped and line #31 is the one read). If your file is huge this will take a lot of time.
Another way to do this is to put everything into a List; then, with size() and a for loop, you can select whatever you want.
If you know the length of each line then you can work out how many lines there are by looking at the size of the file and dividing by the length of each line. (This of course ignores any possible metadata in the file.)
You can then use some maths to get the start byte of the last line. Once you have that, you can open a RandomAccessFile on the file and use seek to go to that point. Then, using readLine, you can read the last line.
This does assume though that the lines are all the same length.
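A sketch of that idea, under the stated assumption that every line (including its line separator) is exactly the same number of bytes; the LINE_LENGTH constant and file name here are made up for illustration:
import java.io.RandomAccessFile;

public class LastLineBySeek {
    // assumption: every line, including its line separator, is exactly this many bytes
    private static final int LINE_LENGTH = 80;

    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("fixed_width.txt", "r")) {
            long lineCount = raf.length() / LINE_LENGTH;
            raf.seek((lineCount - 1) * LINE_LENGTH); // start byte of the last line
            System.out.println(raf.readLine());
        }
    }
}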
You can use FileUtils and its method:
static List<String> readLines(File file)
Reads the contents of a file line by line into a List of Strings using the default encoding for the VM.
This will return a List; then use Collections.reverse() and simply iterate it to get the file lines in reverse order.
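A small sketch of that, assuming FileUtils refers to Apache Commons IO (org.apache.commons.io.FileUtils) and using a made-up file name:
List<String> lines = FileUtils.readLines(new File("example.txt")); // default VM encoding
Collections.reverse(lines);
for (String line : lines) {
    System.out.println(line); // lines now come out last first
}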
Just save the info backwards, that's all I did. Just read it prior to saving, and use \n.
You can save the lines in a list (in my code an ArrayList) and "read" the lines backwards from the ArrayList:
try
{
    FileInputStream fstream = new FileInputStream("\\file path");
    DataInputStream in = new DataInputStream(fstream);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String line = "";
    ArrayList<String> lines = new ArrayList<String>();
    // Read lines and save in ArrayList
    while (br.ready())
    {
        lines.add(br.readLine());
    }
    // Go backwards through the ArrayList (start at size() - 1 to avoid an IndexOutOfBoundsException)
    for (int i = lines.size() - 1; i >= 0; i--)
    {
        line = lines.get(i);
    }
}
catch (Exception e)
{
    e.printStackTrace();
}

Check line for unprintable characters while reading text file

My program must read text files - line by line.
Files in UTF-8.
I am not sure that the files are correct - they may contain unprintable characters.
Is it possible to check for this without going down to the byte level?
Thanks.
Open the file with a FileInputStream, then use an InputStreamReader with the UTF-8 Charset to read characters from the stream, and use a BufferedReader to read lines, e.g. via BufferedReader#readLine, which will give you a string. Once you have the string, you can check for characters that aren't what you consider to be printable.
E.g. (without error checking), using try-with-resources (which is available in any vaguely modern Java version):
String line;
try (
    InputStream fis = new FileInputStream("the_file_name");
    InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
    BufferedReader br = new BufferedReader(isr);
) {
    while ((line = br.readLine()) != null) {
        // Deal with the line
    }
}
While it's not hard to do this manually using BufferedReader and InputStreamReader, I'd use Guava:
List<String> lines = Files.readLines(file, Charsets.UTF_8);
You can then do whatever you like with those lines.
EDIT: Note that this will read the whole file into memory in one go. In most cases that's actually fine - and it's certainly simpler than reading it line by line, processing each line as you read it. If it's an enormous file, you may need to do it that way as per T.J. Crowder's answer.
Just found out that with the Java NIO (java.nio.file.*) you can easily write:
List<String> lines = Files.readAllLines(Paths.get("/tmp/test.csv"), StandardCharsets.UTF_8);
for (String line : lines) {
    System.out.println(line);
}
instead of dealing with FileInputStreams and BufferedReaders...
If you want to check whether a string has unprintable characters, you can use a regular expression:
[^\p{Print}]
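For example, a quick check per line could look like this (line being the string returned by readLine above); note that \p{Print} only covers printable ASCII unless the UNICODE_CHARACTER_CLASS flag is set:
import java.util.regex.Pattern;

// matches any character outside the printable range; the flag extends \p{Print} beyond ASCII
Pattern unprintable = Pattern.compile("[^\\p{Print}]", Pattern.UNICODE_CHARACTER_CLASS);

if (unprintable.matcher(line).find()) {
    System.out.println("Unprintable character found in: " + line);
}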
How about the below:
FileReader fileReader = new FileReader(new File("test.txt")); // note: FileReader uses the platform default charset, not UTF-8
BufferedReader br = new BufferedReader(fileReader);
String line = null;
// if there are no more lines, readLine() returns null
while ((line = br.readLine()) != null) {
    // reading lines until the end of the file
}
Source: http://devmain.blogspot.co.uk/2013/10/java-quick-way-to-read-or-write-to-file.html
I can find the following ways to do it:
private static final String fileName = "C:/Input.txt";

public static void main(String[] args) throws IOException {
    // 1. Stream the lines and collect them into an array
    Stream<String> lines = Files.lines(Paths.get(fileName));
    lines.toArray(String[]::new);

    // 2. Read all lines into a List
    List<String> readAllLines = Files.readAllLines(Paths.get(fileName));
    readAllLines.forEach(s -> System.out.println(s));

    // 3. Scanner
    File file = new File(fileName);
    Scanner scanner = new Scanner(file);
    while (scanner.hasNext()) {
        System.out.println(scanner.next());
    }
    scanner.close();
}
The answer by @T.J.Crowder is Java 6 - in Java 7 the valid answer is the one by @McIntosh - though its use of Charset.forName for UTF-8 is discouraged:
List<String> lines = Files.readAllLines(Paths.get("/tmp/test.csv"),
StandardCharsets.UTF_8);
for(String line: lines){ /* DO */ }
This reminds a lot of the Guava way posted by Skeet above - and of course the same caveats apply. That is, for big files (Java 7):
BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
for (String line = reader.readLine(); line != null; line = reader.readLine()) {}
If every char in the file is properly encoded in UTF-8, you won't have any problem reading it using a reader with the UTF-8 encoding. It is up to you to check every char of the file and see whether you consider it printable or not.
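One simple way to do that per-character check, for example treating ISO control characters as "unprintable" (what counts as printable is ultimately your decision):
// flags a line that contains ISO control characters; adjust the test to your own definition
static boolean isPrintable(String line) {
    for (int i = 0; i < line.length(); i++) {
        if (Character.isISOControl(line.charAt(i))) {
            return false;
        }
    }
    return true;
}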

Java create strings from Buffered Reader and compare Strings

I am using Java + Selenium 1 to test a web application.
I have to read through a text file line by line using BufferedReader.readLine and compare the data that was found to another String.
Is there a way to assign each line to a unique string? I think it would be something like this:
FileInputStream fstream = new FileInputStream("C:\\write.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
String[] strArray = null;
int p = 0;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
    strArray[p] = strLine;
    assertTrue(strArray[p].equals(someString));
    p = p + 1;
}
The problem with this is that you don't know how many lines there are, so you can't size your array correctly. Use a List<String> instead.
In order of decreasing importance,
You don't need to store the Strings in an array at all, as pointed out by Perception.
You don't know how many lines there are, so as pointed out by Qwerky, if you do need to store them you should use a resizeable collection like ArrayList.
DataInputStream is not needed: you can just wrap your FileInputStream directly in an InputStreamReader.
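Putting those three points together, a sketch of the loop might look like this (someString stands for whatever you are comparing against):
FileInputStream fstream = new FileInputStream("C:\\write.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
List<String> lines = new ArrayList<>();
String strLine;
while ((strLine = br.readLine()) != null) {
    lines.add(strLine);                     // resizeable, no need to know the line count up front
    assertTrue(strLine.equals(someString)); // compare as you go
}
br.close();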
You may want to try something like:
public final static String someString = "someString";

public boolean isMyFileOk(String filename) throws FileNotFoundException {
    // note: new Scanner(filename) would scan the string itself, so wrap the name in a File
    Scanner sc = new Scanner(new File(filename));
    boolean fileOk = true;
    while (sc.hasNext() && fileOk) {
        String line = sc.nextLine();
        fileOk = isMyLineOk(line);
    }
    sc.close();
    return fileOk;
}

public boolean isMyLineOk(String line) {
    return line.equals(someString);
}
The Scanner class is usually a great class to read files :)
And as suggested, you may check one line at a time instead of loading them all into memory before processing them. This may not be an issue if your file is relatively small, but you might as well keep your code scalable, especially when it takes the exact same amount of code :)
