I want to read a text file and store its contents in an array where each element of the array holds up to 500 characters from the file (i.e. keep reading 500 characters at a time until there are no more characters to read).
I'm struggling with this because I don't understand the differences between the various ways to do IO in Java, and I can't find one that performs the task I want.
And will I need to use an ArrayList, since I don't know in advance how many items the array will hold?
It would be hard to avoid using ArrayList or something similar. If you know the file is ASCII, you could do
int partSize = 500;
File f = new File("file.txt");
String[] parts = new String[(int) ((f.length() + partSize - 1) / partSize)]; // length() returns a long, so cast
But if the file uses a variable-width encoding like UTF-8, this won't work, because the file's length in bytes no longer equals its number of characters. The following code handles that case.
static String[] readFileInParts(String fname) throws IOException {
    int partSize = 500;
    List<String> parts = new ArrayList<String>();
    char[] buf = new char[partSize];
    int pos = 0;
    // try-with-resources closes the reader even if read() throws
    try (FileReader fr = new FileReader(fname)) {
        for (;;) {
            // read() may return fewer chars than requested, so keep filling buf
            int nRead = fr.read(buf, pos, partSize - pos);
            if (nRead == -1) {
                // end of file: flush the final, possibly shorter, part
                if (pos > 0)
                    parts.add(new String(buf, 0, pos));
                break;
            }
            pos += nRead;
            if (pos == partSize) {
                parts.add(new String(buf));
                pos = 0;
            }
        }
    }
    return parts.toArray(new String[parts.size()]);
}
Note that FileReader uses the platform default encoding. To specify an encoding, replace it with new InputStreamReader(new FileInputStream(fname), charSet). It's a bit ugly, but that's the best way to do it.
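The encoding note above can be sketched roughly as follows. This is only an illustration, not a definitive implementation: the class name ChunkReader, the method readInParts and its parameters are made up for the example, and it returns a List rather than an array. Note that the parts are measured in Java chars (UTF-16 code units), so a character outside the Basic Multilingual Plane could still be split across two parts.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class ChunkReader {
    // Hypothetical helper: reads fname using the given charset and splits the
    // text into parts of at most partSize chars (partSize must be positive).
    static List<String> readInParts(String fname, Charset charset, int partSize) throws IOException {
        List<String> parts = new ArrayList<>();
        char[] buf = new char[partSize];
        int pos = 0;
        try (Reader reader = new InputStreamReader(new FileInputStream(fname), charset)) {
            int nRead;
            // read() may deliver fewer chars than requested, so fill buf gradually
            while ((nRead = reader.read(buf, pos, partSize - pos)) != -1) {
                pos += nRead;
                if (pos == partSize) {
                    parts.add(new String(buf, 0, pos));
                    pos = 0;
                }
            }
        }
        // the last part may be shorter than partSize
        if (pos > 0) {
            parts.add(new String(buf, 0, pos));
        }
        return parts;
    }
}
```

A call such as ChunkReader.readInParts("file.txt", StandardCharsets.UTF_8, 500) would then return the 500-character parts the question asks for.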
An ArrayList will definitely be more suitable as you don't know how many elements you're going to have.
There are many ways to read a file, but since you want to count characters in order to collect 500 of them, you could use the read() method of a Reader, which reads one character at a time. Once you have collected the 500 characters you need (in a String, I guess), just add them to your ArrayList (all of that in a loop, of course).
The Reader variable needs to be initialized with an object that extends Reader, like an InputStreamReader (this one takes an InputStream implementation as its parameter, such as a FileInputStream when the input is a file).
Not sure if this will work, but you might want to try something like this (Caution: untested code):
private void doStuff() {
    ArrayList<String> stringList = new ArrayList<String>();
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader("file.txt"));
        String str;
        StringBuilder temp = new StringBuilder();
        // note: readLine() strips line terminators, so they are not counted
        while ((str = in.readLine()) != null) {
            for (int i = 0; i < str.length(); i++) { // '<', not '<=', to stay in bounds
                temp.append(str.charAt(i));
                if (temp.length() == 500) {
                    stringList.add(temp.toString());
                    temp.setLength(0);
                }
            }
        }
        // keep the final chunk, which may hold fewer than 500 characters
        if (temp.length() > 0) {
            stringList.add(temp.toString());
        }
    } catch (IOException e) {
        // handle
    } finally {
        try {
            if (in != null) { // the constructor may have failed
                in.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
I typed 3 names in the file, and I wanted to write code to count how many times each name is repeated (for example, Alex appears in the file 3 times, and so on). The code I wrote counts each name only once, which is wrong because the names appear more than once. Can you help me find the part that causes this problem?
public class MainClass {
public static void readFile() throws IOException {
//File file;
FileWriter writer=null;
String name, line;
List <String> list = new ArrayList <>();
int countM = 0, countAl = 0, countAh = 0;
try
{
File file = new File("\\Users\\Admin\\Desktop\\namesList.txt");
Scanner scan = new Scanner(file);
while(scan.hasNextLine()) {
line = scan.nextLine();
list.add(line);
}
for (int i=0; i<list.size(); i++)
{
name=list.get(i);
if (name.equals("Ali"))
{
countAl= +1;
}
if (name.equals("Ahmed"))
{
countAh= +1;
}
if (name.equals("Muhammad"))
{
countM = +1;
}
}
Collections.sort(list);
writer = new FileWriter("\\Users\\Admin\\Desktop\\newNameList");
for(int i=0; i<list.size(); i++)
{
name = list.get(i);
writer.write(name +"\n");
}
writer.close();
System.out.println("How many times is the name (Ali) in the file? " + countAl);
System.out.println("How many times is the name (Ahmed) in the file? " + countAh);
System.out.println("How many times is the name (Muhammad) in the file? " + countM);
}
catch(IOException e) {
System.out.println(e.toString());
}
}
public static void main(String[] args) throws IOException {
readFile();
}
}
You can do this much more simply:
//Open a reader, this is autoclosed so you don't need to worry about closing it
try (BufferedReader reader = new BufferedReader(new FileReader("path to file"))) {
//Create a map to hold the counts
Map<String, Integer> nameCountMap = new HashMap<>();
//read all of the names, this assumes 1 name per line
for (String name = reader.readLine(); name != null; name = reader.readLine()) {
//merge the value into the count map
nameCountMap.merge(name, 1, (o, n) -> o+n);
}
//Print out the map
System.out.println(nameCountMap);
} catch (IOException e) {
e.printStackTrace();
}
try:
for (int i=0; i<list.size(); i++)
{
name=list.get(i);
if (name.equals("Ali"))
{
countAl += 1;
}
if (name.equals("Ahmed"))
{
countAh += 1;
}
if (name.equals("Muhammad"))
{
countM += 1;
}
}
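This works for me.
+= is not the same as =+. Writing countAl = +1 assigns the value +1 every time, whereas countAl += 1 adds 1 to the current value.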
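To make the difference concrete, here is a small sketch that runs both operators over the same list. The class and method names are invented for the illustration and do not come from the question.

```java
class PlusEqualsDemo {
    // Mirrors the bug: "count = +1" assigns positive one on every match.
    static int countWithAssignPlus(String[] names, String target) {
        int count = 0;
        for (String name : names) {
            if (name.equals(target)) {
                count = +1;
            }
        }
        return count;
    }

    // The fix: "count += 1" adds one to the running total.
    static int countWithPlusAssign(String[] names, String target) {
        int count = 0;
        for (String name : names) {
            if (name.equals(target)) {
                count += 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = {"Ali", "Ahmed", "Ali", "Ali"};
        System.out.println(countWithAssignPlus(names, "Ali")); // prints 1
        System.out.println(countWithPlusAssign(names, "Ali")); // prints 3
    }
}
```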
You need to process each line while bearing in mind that the file may be very large in some cases. Better safe than sorry: consider a solution that does not use so many resources.
Streaming Through the File
I'm going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:
FileInputStream inputStream = null;
Scanner sc = null;
try {
inputStream = new FileInputStream(file_path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
// System.out.println(line);
}
// note that Scanner suppresses exceptions
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (sc != null) {
sc.close();
}
}
This solution iterates through all the lines in the file, allowing each line to be processed without keeping references to them and, as a result, without keeping them all in memory.
Streaming With Apache Commons IO
The same can be achieved using the Commons IO library as well, by using the custom LineIterator provided by the library:
LineIterator it = FileUtils.lineIterator(your_file, "UTF-8");
try {
while (it.hasNext()) {
String line = it.nextLine();
// do something with line
}
} finally {
LineIterator.closeQuietly(it);
}
Since the entire file is not fully in memory – this will also result in pretty conservative memory consumption numbers.
BufferedReader
try (BufferedReader br = Files.newBufferedReader(Paths.get("file_name"), StandardCharsets.UTF_8)) {
for (String line = null; (line = br.readLine()) != null;) {
// Do something with the line
}
}
ByteBuffer
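try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
    ByteBuffer bb = ByteBuffer.allocateDirect(1000);
    // read() returns -1 at end of file; without this check the loop never ends
    while (ch.read(bb) != -1) {
        bb.flip(); // switch the buffer from being written to being read
        // Do something with the bytes between bb.position() and bb.limit();
        // splitting them into lines and decoding them is left to the caller
        bb.clear(); // make the buffer writable again for the next read
    }
}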
The above examples process the lines of a large file iteratively, without exhausting the available memory, which proves quite useful when working with large files.
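For comparison, Files.lines (Java 8 and later) gives the same lazy, line-at-a-time behaviour as a stream. The sketch below is only an illustration under the assumption that the file is UTF-8; the class LineStreamer and the method countNonBlankLines are hypothetical names, and counting non-blank lines merely stands in for whatever per-line processing is needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class LineStreamer {
    // Streams the file lazily and counts the lines that are not blank.
    // The stream must be closed to release the file, hence try-with-resources.
    static long countNonBlankLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }
}
```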
I have this piece of code. I can read all the lines from the file, but I want to read every character separately and put it into an array. How can I do that?
For example: the file contains the digits 00010 and I want to put them into an array like this: array[0,0,0,1,0]
public void readTest()
{
try
{
InputStream is = getResources().getAssets().open("test.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String st = "";
StringBuilder sb = new StringBuilder();
while ((st=br.readLine())!=null)
{
sb.append(st);
}
br.close();
}catch (IOException e)
{
Log.d(TAG, "Error: " + e);
}
}
Use br.read(). It returns the character as an int, or -1 at the end of the stream.
ArrayList<Character> charArray = new ArrayList<>(); // generics need the wrapper type Character, not char
int i;
while ((i = br.read()) != -1) {
char c = (char) i;
charArray.add(c);
}
Straight from the JavaDoc:
public int read()
throws IOException -
Reads a single character.
You should read every line and add its letters to the array by iterating over the line, like this:
while ((st=br.readLine())!=null) {
sb.append(st);
for (int i = 0; i < st.length(); i++) {
char c = st.charAt(i);
yourArray.add(c);
}
}
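If the goal is an array of numeric values, as in the 00010 example from the question, a minimal sketch could look like the following. It assumes the text contains only decimal digits, and DigitArray and toDigits are hypothetical names chosen for the illustration; the input would be something like sb.toString() from the question's code.

```java
class DigitArray {
    // Converts each character to its numeric value, e.g. "00010" -> [0, 0, 0, 1, 0].
    static int[] toDigits(String text) {
        int[] digits = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            int d = Character.digit(text.charAt(i), 10); // -1 if not a decimal digit
            if (d < 0) {
                throw new IllegalArgumentException("Not a digit at index " + i);
            }
            digits[i] = d;
        }
        return digits;
    }
}
```

If plain characters are enough, text.toCharArray() returns them as a char[] without any loop.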
I have a text file (XML created with XStream) which is 63000 lines (3.5 MB) long. I'm trying to read it using Buffered reader:
BufferedReader br = new BufferedReader(new FileReader(file));
try {
String s = "";
String tempString;
int i = 0;
while ((tempString = br.readLine()) != null) {
s = s.concat(tempString);
// s=s+tempString;
i = i + 1;
if (i % 1000 == 0) {
System.out.println(Integer.toString(i));
}
}
br.close();
Here you can see my attempt to measure the reading speed, and it is very low: after the first 10,000 lines it takes seconds to read each further 1,000 lines. I'm clearly doing something wrong, but I can't work out what. Thanks in advance for your help.
@PaulGrime is right. You are copying the string each time the loop reads a line. Once the string gets big (say 10,000 lines big), it is doing a lot of work to do that copying.
Try this:
StringBuilder sb = new StringBuilder();
while ((tempString = br.readLine()) != null) {
    // readLine() strips the line terminator, so add a newline back
    sb.append(tempString).append('\n');
}
s = sb.toString();
Note: read Paul's answer below on why stripping newlines makes this a bad way to read in a file. Also, as mentioned in the question comments, XStream provides a way to read the file and even if it had not, IOUtils.toString(reader) would be a safer way to read a file.
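If pulling in Commons IO just for IOUtils is not an option, a core-Java sketch that keeps the line endings intact could look like this. It assumes the file is UTF-8 and small enough to fit in memory (3.5 MB easily is); WholeFileReader and readAll are hypothetical names for the illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WholeFileReader {
    // Reads the whole file in one call; line terminators are preserved
    // because the bytes are decoded as-is rather than line by line.
    static String readAll(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```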
Some immediate improvements you can make:
Use a StringBuilder instead of concat and +. Using + and concat can really hurt performance, especially inside loops.
Reduce the number of disk accesses. You can do this by using a large buffer:
BufferedReader br = new BufferedReader(new FileReader("someFile.txt"), SIZE);
You should use a StringBuilder: concatenating Strings in a loop copies the whole accumulated string on every iteration, so it gets slower and slower as the string grows.
Further, try using NIO rather than a BufferedReader.
public static void main(String[] args) throws IOException {
final File file = //some file
try (final FileChannel fileChannel = new RandomAccessFile(file, "r").getChannel()) {
final StringBuilder stringBuilder = new StringBuilder();
final ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
final CharsetDecoder charsetDecoder = Charset.forName("UTF-8").newDecoder();
while (fileChannel.read(byteBuffer) > 0) {
byteBuffer.flip();
stringBuilder.append(charsetDecoder.decode(byteBuffer));
byteBuffer.clear();
}
}
}
You can tune the buffer size if it's still too slow; which buffer size works best is heavily system dependent. For me it makes very little difference whether the buffer is 1K or 4K, but on other systems I have known that change to increase speed by an order of magnitude.
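One caveat with decoding each 1024-byte chunk on its own: a multi-byte UTF-8 character that straddles a chunk boundary may be reported as malformed input. A hedged alternative that stays with channels but lets the library carry partial characters across reads is Channels.newReader. The sketch below assumes a UTF-8 file; ChannelTextReader and readAll are hypothetical names for the illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelTextReader {
    // Wraps the channel in a Reader that decodes UTF-8 incrementally, so
    // characters split across internal buffer refills are handled correctly.
    static String readAll(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ);
             Reader reader = Channels.newReader(channel, "UTF-8")) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n); // append only the chars actually read
            }
        }
        return sb.toString();
    }
}
```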
In addition to what has already been said, depending on your use of the XML, your code is potentially incorrect as it discards line endings. For example, this code:
package temp.stackoverflow.q15849706;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import com.thoughtworks.xstream.XStream;
public class ReadXmlLines {
public String read1(BufferedReader br) throws IOException {
try {
String s = "";
String tempString;
int i = 0;
while ((tempString = br.readLine()) != null) {
s = s.concat(tempString);
// s=s+tempString;
i = i + 1;
if (i % 1000 == 0) {
System.out.println(Integer.toString(i));
}
}
return s;
} finally {
br.close();
}
}
public static void main(String[] args) throws IOException {
ReadXmlLines r = new ReadXmlLines();
URL url = ReadXmlLines.class.getResource("xml.xml");
String xmlStr = r.read1(new BufferedReader(new InputStreamReader(url
.openStream())));
Object ob = null;
XStream xs = new XStream();
xs.alias("root", Root.class);
// This is incorrectly read/parsed, as the line endings are not
// preserved.
System.out.println("----------1");
System.out.println(xmlStr);
ob = xs.fromXML(xmlStr);
System.out.println(ob);
// This is correctly read/parsed, when passing in the URL directly
ob = xs.fromXML(url);
System.out.println("----------2");
System.out.println(ob);
// This is correctly read/parsed, when passing in the InputStream
// directly
ob = xs.fromXML(url.openStream());
System.out.println("----------3");
System.out.println(ob);
}
public static class Root {
public String script;
public String toString() {
return script;
}
}
}
and this xml.xml file on the classpath (in the same package as the class):
<root>
<script>
<![CDATA[
// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
{
return 1;
}
else
{
return 0;
}
}
]]>
</script>
</root>
produces the following output. The first two lines show that the line endings have been removed, which makes the JavaScript in the CDATA section invalid (the first JS comment now comments out the whole script, because the JS lines have been merged).
----------1
<root> <script><![CDATA[// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then { return 1; }else { return 0; }}]]> </script></root>
// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then { return 1; }else { return 0; }}
----------2
// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
{
return 1;
}
else
{
return 0;
}
}
...
private void readIncomingMessage() {
try {
StringBuilder builder = new StringBuilder();
InputStream is = socket.getInputStream();
int length = 1024;
byte[] array = new byte[length];
int n = 0;
while ((n = is.read(array, n, 100)) != -1) {
builder.append(new String(array));
if (checkIfComplete(builder.toString())) {
buildListItems(builder.toString(), null);
builder = new StringBuilder();
}
}
} catch (IOException e) {
Log.e("TCPclient", "Something went wrong while reading the socket");
}
}
Hi,
I want to read the stream in blocks of 100 bytes, convert those bytes into a string and then see if that string fits certain conditions.
But when I debug I see that builder has a count of 3072.
And I see a string like (text, , , , , , , , , , text , , , , , , , , , text)
How can I just add the text to the stringbuilder?
thx :)
private void readIncomingMessage() {
try {
StringBuilder builder = new StringBuilder();
InputStream is = socket.getInputStream();
int length = 100;
byte[] array = new byte[length];
int n = 0;
while ((n = is.read(array, 0, length)) != -1) {
builder.append(new String(array, 0, n));
if (checkIfComplete(builder.toString())) {
buildListItems(builder.toString(), null);
builder = new StringBuilder();
}
}
} catch (IOException e) {
Log.e("TCPclient", "Something went wrong while reading the socket");
}
}
this solution did the trick for me.
any drawbacks with this solution?
2 problems:
you need to use the 'n' value when converting the bytes to a String. Specifically, use this String constructor String(byte[] bytes, int offset, int length)
when converting bytes to strings on arbitrary boundaries, like you are doing, you have the potential to corrupt multi-byte characters. You'd be better off putting an InputStreamReader on top of 'is' and reading characters from that.
For more information read the documentation for read(byte[], int, int), new String(byte[]) and new String(byte[], int, int)
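The second point could be sketched like this. It is an illustration only: it assumes the peer sends UTF-8 text, it leaves out the checkIfComplete and buildListItems calls from the question, and MessageReader and readAll are hypothetical names. Any InputStream works as the source, so socket.getInputStream() could be passed in directly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class MessageReader {
    // Reads characters (not bytes) in blocks of up to 100, so a multi-byte
    // character is never cut in half, and appends only what was actually read.
    static String readAll(InputStream is) throws IOException {
        StringBuilder builder = new StringBuilder();
        Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
        char[] block = new char[100];
        int n;
        while ((n = reader.read(block, 0, block.length)) != -1) {
            builder.append(block, 0, n);
        }
        return builder.toString();
    }
}
```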
n will hold the number of bytes read in the last read operation - not the total number of bytes read. If you only want to read up to 100 bytes at a time, there is no need for a byte array of size 1024, 100 will do. When you create a String from a byte array, it uses the entire array (even if only half was able to be filled by reading), unless you tell it which parts of the array you want to use. Something like this should work, but there are still improvements you could make:
private void readIncomingMessage() {
    try {
        StringBuilder builder = new StringBuilder();
        InputStream is = socket.getInputStream();
        int length = 100;
        byte[] array = new byte[length];
        int pos = 0;
        int n = 0;
        while ((n = is.read(array, pos, length - pos)) != -1) {
            // convert only the n bytes that this read() call delivered
            builder.append(new String(array, pos, n));
            pos += n;
            if (pos == length) {
                pos = 0; // buffer is full and already appended, so reuse it
            }
            if (checkIfComplete(builder.toString())) {
                buildListItems(builder.toString(), null);
                builder = new StringBuilder();
                pos = 0;
            }
        }
    } catch (IOException e) {
        Log.e("TCPclient", "Something went wrong while reading the socket");
    }
}
I currently have the following code:
public class Count {
public static void countChar() throws FileNotFoundException {
Scanner scannerFile = null;
try {
scannerFile = new Scanner(new File("file"));
} catch (FileNotFoundException e) {
}
int starNumber = 0; // number of *'s
while (scannerFile.hasNext()) {
String character = scannerFile.next();
int index =0;
char star = '*';
while(index<character.length()) {
if(character.charAt(index)==star){
starNumber++;
}
index++;
}
System.out.println(starNumber);
}
}
I'm trying to find out how many times a * occurs in a text file. For example, given a text file containing
Hi * My * name *
the method should return 3.
Currently, with the above example, the method prints:
0
1
1
2
2
3
Thanks in advance.
Use Apache commons-io to read the file into a String
String org.apache.commons.io.FileUtils.readFileToString(File file);
And then, use Apache commons-lang to count the matches of *:
int org.apache.commons.lang.StringUtils.countMatches(String str, String sub)
Result:
int count = StringUtils.countMatches(FileUtils.readFileToString(file), "*");
http://commons.apache.org/io/
http://commons.apache.org/lang/
Everything in your method works fine, except that you print the running count inside the loop, once for every token that next() returns, instead of once after the loop:
while (scannerFile.hasNext()) {
String character = scannerFile.next();
int index =0;
char star = '*';
while(index<character.length()) {
if(character.charAt(index)==star){
starNumber++;
}
index++;
}
/* PRINTS the running count for every token; move this after the loop */
System.out.println(starNumber);
}
int countStars(String fileName) throws IOException {
    int n = 0;
    // try-with-resources closes the reader even if read() throws
    try (FileReader fileReader = new FileReader(fileName)) {
        int c;
        // read() returns the next char as an int, or -1 at end of file
        while ((c = fileReader.read()) != -1) {
            if (c == '*') {
                n++;
            }
        }
    }
    return n;
}
I would stick to the Java libraries at this point, then use other libraries (such as the commons libraries) as you become more familiar with the core Java API. This is off the top of my head, might need to be tweaked to run.
StringBuilder sb = new StringBuilder();
FileReader fr = new FileReader(file);
BufferedReader br = new BufferedReader(fr);
String s = br.readLine();
while (s != null)
{
sb.append(s);
s = br.readLine();
}
br.close(); // this closes the underlying reader so no need for fr.close()
String fileAsStr = sb.toString();
int count = 0;
int idx = fileAsStr.indexOf('*');
while (idx > -1)
{
count++;
idx = fileAsStr.indexOf('*', idx + 1); // continue searching after the last match
}