Java http - read large file - java

I am reading a big file in a Java program over HTTP. I read the whole stream, and then I apply some criteria to it. Would it be possible to apply the criteria while reading the stream, so that I only keep a small result (the files are large)?
Here is my code for reading the file:
public String getMyFileContent(URLConnection uc){
    String myresult = null;
    try {
        InputStream is = uc.getInputStream();
        InputStreamReader isr = new InputStreamReader(is);
        int numCharsRead;
        char[] charArray = new char[1024];
        StringBuffer sb = new StringBuffer();
        // Accumulates the whole response in memory
        while ((numCharsRead = isr.read(charArray)) > 0) {
            sb.append(charArray, 0, numCharsRead);
        }
        myresult = sb.toString();
    }
    catch (MalformedURLException e) {
        e.printStackTrace();
    }
    catch (IOException e) {
        e.printStackTrace();
    }
    return myresult;
}
In another method, I then apply the criteria (to parse the content).
What I could not manage to do is something like this:
public String getMyFileContent(URLConnection uc){
    String myresult = null;
    try {
        InputStream is = uc.getInputStream();
        InputStreamReader isr = new InputStreamReader(is);
        int numCharsRead;
        char[] charArray = new char[1024];
        StringBuffer sb = new StringBuffer();
        while ((numCharsRead = isr.read(charArray)) > 0) {
            sb.append(charArray, 0, numCharsRead);
            //Apply my criteria here on the stream ??? Is it possible ???
        }
        myresult = sb.toString();
    }
    catch (MalformedURLException e) {
        e.printStackTrace();
    }
    catch (IOException e) {
        e.printStackTrace();
    }
    return myresult;
}

The template I would use is:
InputStreamReader isr = new InputStreamReader(uc.getInputStream());
int numCharsRead;
char[] charArray = new char[1024];
while ((numCharsRead = isr.read(charArray)) > 0) {
    //Apply my criteria here on the stream
}
However, since it is text, this might be more useful:
BufferedReader br = new BufferedReader(new InputStreamReader(uc.getInputStream()));
String line;
while ((line = br.readLine()) != null) {
    //Apply my criteria here on each line
}
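As a rough illustration of the line-by-line template, here is a minimal sketch that keeps only the lines matching some criteria, so the full content is never held in memory. The class name `LineFilter`, the method `matchingLines`, and the "starts with ERROR" criterion are hypothetical; a `StringReader` stands in for the reader built from `uc.getInputStream()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of the original question or answer.
final class LineFilter {

    // Reads the text line by line and keeps only the lines accepted by the criteria.
    static List<String> matchingLines(Reader source, Predicate<String> criteria) throws IOException {
        List<String> kept = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (criteria.test(line)) {
                    kept.add(line);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for: new InputStreamReader(uc.getInputStream(), StandardCharsets.UTF_8)
        Reader source = new StringReader("alpha\nERROR one\nbeta\nERROR two\n");
        System.out.println(matchingLines(source, line -> line.startsWith("ERROR")));
        // prints [ERROR one, ERROR two]
    }
}
```

Only the matching lines are retained, so memory use depends on the size of the result rather than the size of the file.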

Related

I'm confused about PDF files

If I have my code this way round, the PDF viewer says the file is invalid and cannot be opened, but if I swap the two blocks and run B before A, it works fine. Why is this, and what would I have to do to get it working? TIA
InputStream in = new BufferedInputStream(conn.getInputStream());
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
//A
String line = "";
StringBuilder builder = new StringBuilder();
try {
    while ((line = reader.readLine()) != null) {
        builder.append(line);
    }
} catch (IOException e) {
    e.printStackTrace();
}
//B
File directory = Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DOWNLOADS);
File outputFile = new File(directory, "goo.pdf");
FileOutputStream fos = null;
try {
    fos = new FileOutputStream(outputFile);
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
byte[] buffer = new byte[1024];
int len1 = 0;//init length
while (true) {
    try {
        if ((len1 = in.read(buffer)) == -1) break;
    } catch (IOException e) {
        e.printStackTrace();
    }
    try {
        fos.write(buffer, 0, len1);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
An InputStream can only be read once.
In 'A', the stream is read and the contents are put in a StringBuilder.
In 'B', the stream (now empty) is read and piped to a file.
By having A first, the output file will always be empty.
Simply remove A as it's not doing anything for you here.
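The "read once" behaviour can be seen in a small sketch. It assumes an in-memory `ByteArrayInputStream` as a stand-in for the connection's stream; the class name `ReadOnceDemo` and the method `drain` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
final class ReadOnceDemo {

    // Reads the stream to its end and returns how many bytes were consumed.
    static int drain(InputStream in) throws IOException {
        byte[] buffer = new byte[1024];
        int total = 0;
        int len;
        while ((len = in.read(buffer)) != -1) {
            total += len;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for conn.getInputStream(); 8 bytes of sample data.
        InputStream in = new ByteArrayInputStream("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        System.out.println(drain(in)); // prints 8: the first pass consumes everything
        System.out.println(drain(in)); // prints 0: nothing is left for a second pass
    }
}
```

The second call plays the role of block B running after block A: the data has already been consumed, so nothing reaches the file.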

How do I display UTF-8 characters in google app engine's logs?

I'm currently using Java, and I print my string using System.out.println(myString);
However, when I look at my server logs on the Google App Engine dashboard, I see a bunch of question marks (???) where the special characters (in my particular case, emoticons) should be.
The string is obtained directly from the payload of the request.
The payload of the request is being read as:
StringBuilder stringBuilder = new StringBuilder();
BufferedReader bufferedReader = null;
try {
    InputStream inputStream = request.getInputStream();
    if (inputStream != null) {
        bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
        char[] charBuffer = new char[128];
        int bytesRead = -1;
        while ((bytesRead = bufferedReader.read(charBuffer)) > 0) {
            stringBuilder.append(charBuffer, 0, bytesRead);
        }
    } else {
        stringBuilder.append("");
    }
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (bufferedReader != null) {
        try {
            bufferedReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
String body = stringBuilder.toString();

java.io.IOException: Error: End-of-File, expected line Issue with PDFBox

I am trying to read the text of a PDF that is opened in the browser.
After clicking the 'Print' button, the URL below opens in a new tab.
https://myappurl.com/employees/2Jb_rpRC710XGvs8xHSOmHE9_LGkL97j/details/listprint.pdf?ids%5B%5D=2Jb_rpRC711lmIvMaBdxnzJj_ZfipcXW
I have run the same program against other web URLs and it works fine. I have used the same code that is used here (Extract PDF text).
I am using the following versions of PDFBox and FontBox.
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>1.8.9</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>fontbox</artifactId>
    <version>1.8.9</version>
</dependency>
Below is the code that works fine with other URLs:
public boolean verifyPDFContent(String strURL, String reqTextInPDF) {
    boolean flag = false;
    PDFTextStripper pdfStripper = null;
    PDDocument pdDoc = null;
    COSDocument cosDoc = null;
    String parsedText = null;
    try {
        URL url = new URL(strURL);
        BufferedInputStream file = new BufferedInputStream(url.openStream());
        PDFParser parser = new PDFParser(file);
        parser.parse();
        cosDoc = parser.getDocument();
        pdfStripper = new PDFTextStripper();
        pdfStripper.setStartPage(1);
        pdfStripper.setEndPage(1);
        pdDoc = new PDDocument(cosDoc);
        parsedText = pdfStripper.getText(pdDoc);
    } catch (MalformedURLException e2) {
        System.err.println("URL string could not be parsed "+e2.getMessage());
    } catch (IOException e) {
        System.err.println("Unable to open PDF Parser. " + e.getMessage());
        try {
            if (cosDoc != null)
                cosDoc.close();
            if (pdDoc != null)
                pdDoc.close();
        } catch (Exception e1) {
            e1.printStackTrace();
        }
    }
    System.out.println("+++++++++++++++++");
    System.out.println(parsedText);
    System.out.println("+++++++++++++++++");
    // parsedText stays null when parsing failed, so guard against a NullPointerException
    if (parsedText != null && parsedText.contains(reqTextInPDF)) {
        flag = true;
    }
    return flag;
}
And below is the stack trace of the exception that I'm getting:
java.io.IOException: Error: End-of-File, expected line
at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1517)
at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:372)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:186)
at com.kareo.utils.PDFManager.getPDFContent(PDFManager.java:26)
Please help me out. Is this something to do with 'https'?
A stream is like a pipe: once the data has flowed past, it cannot be read again. So you can do one of the following:
1. Convert the input stream to a file.
public void useInputStreamTwiceBySaveToDisk(InputStream inputStream) {
    String desPath = "test001.bin";
    // Copy the stream to disk once
    try (BufferedInputStream is = new BufferedInputStream(inputStream);
         BufferedOutputStream os = new BufferedOutputStream(new FileOutputStream(desPath))) {
        int len;
        byte[] buffer = new byte[1024];
        while ((len = is.read(buffer)) != -1) {
            os.write(buffer, 0, len);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    // The file can now be read as many times as needed
    File file = new File(desPath);
    StringBuilder sb = new StringBuilder();
    try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file))) {
        int len;
        byte[] buffer = new byte[1024];
        while ((len = is.read(buffer)) != -1) {
            sb.append(new String(buffer, 0, len));
        }
        System.out.println(sb.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
2. Convert the input stream to an in-memory byte array.
public void useInputStreamTwiceSaveToByteArrayOutputStream(InputStream inputStream) {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    try {
        byte[] buffer = new byte[1024];
        int len;
        while ((len = inputStream.read(buffer)) != -1) {
            outputStream.write(buffer, 0, len);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    // first read InputStream
    InputStream inputStream1 = new ByteArrayInputStream(outputStream.toByteArray());
    printInputStreamData(inputStream1);
    // second read InputStream
    InputStream inputStream2 = new ByteArrayInputStream(outputStream.toByteArray());
    printInputStreamData(inputStream2);
}
3. Mark and reset the input stream.
public void useInputStreamTwiceByUseMarkAndReset(InputStream inputStream) {
    StringBuilder sb = new StringBuilder();
    try (BufferedInputStream bufferedInputStream = new BufferedInputStream(inputStream, 10)) {
        byte[] buffer = new byte[1024];
        // Mark the current position. The read limit is Integer.MAX_VALUE so the mark
        // stays valid however much is read before reset(); the buffer grows to hold
        // everything read, so this only suits streams that fit in memory.
        bufferedInputStream.mark(Integer.MAX_VALUE);
        int len;
        while ((len = bufferedInputStream.read(buffer)) != -1) {
            sb.append(new String(buffer, 0, len));
        }
        System.out.println(sb.toString());
        // After the first pass, explicitly call reset() to return to the mark
        bufferedInputStream.reset();
        // Read the stream a second time
        sb = new StringBuilder();
        int len1;
        while ((len1 = bufferedInputStream.read(buffer)) != -1) {
            sb.append(new String(buffer, 0, len1));
        }
        System.out.println(sb.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Then you can repeat the read operation on the same input data as many times as you need.
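Option 2 can also be used to inspect what the server actually returned before handing the bytes to a parser. The sketch below buffers the stream into a byte array and checks for the `%PDF-` marker that a well-formed PDF file starts with. It is only a diagnostic idea under assumptions, not a confirmed cause of the exception in the question; the class name `PdfHeaderCheck` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
final class PdfHeaderCheck {

    // Buffers the whole stream in memory so the bytes can be reused.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }
        return out.toByteArray();
    }

    // Returns true when the data begins with the "%PDF-" header marker.
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for url.openStream(): one PDF-like body, one HTML body.
        byte[] pdf = readFully(new ByteArrayInputStream("%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII)));
        byte[] html = readFully(new ByteArrayInputStream("<html>login</html>".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(looksLikePdf(pdf));  // prints true
        System.out.println(looksLikePdf(html)); // prints false
    }
}
```

If the check passes, a fresh `ByteArrayInputStream` over the same array can be given to the parser, so the network stream is still read only once.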

Arabic RSS can't be read in java it gives me symbols

I'm trying to read an RSS feed from a URL in Java, but instead of the Arabic text I get a bunch of symbols. Below is sample code; it works with English but not with Arabic.
I tried a couple of examples from the web and could not solve it.
    public static void main(String[] args) {
        try {
            URL cali = new URL(
                    "http://services.explorecalifornia.org/rss/tours.php");
            URL aljazera = new URL(
                    "http://www.aljazeera.net/aljazeerarss/3c66e3fb-a5e0-4790-91be-ddb05ec17198/4e9f594a-03af-4696-ab98-880c58cd6718");
            InputStream stream = aljazera.openStream();
            BufferedInputStream buf = new BufferedInputStream(stream);
            StringBuilder sb = new StringBuilder();
            while (true) {
                int data = buf.read();
                if (data == -1) {
                    break;
                } else {
                    sb.append((char) data);
                }
            }
            System.out.println(sb);
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Try specifying the character encoding when reading the InputStream:
InputStreamReader isr = new InputStreamReader(stream, "UTF-8");
StringBuilder sb = new StringBuilder();
while (true)
{
    // read() now returns a decoded character, not a single raw byte
    int data = isr.read();
    if (data == -1) {
        break;
    } else {
        sb.append((char) data);
    }
}
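To see why the encoding matters, here is a small sketch that contrasts casting each raw byte to a `char` (as in the question) with decoding through an `InputStreamReader`. It assumes the feed is UTF-8 encoded; the class name `DecodeDemo` and its methods are hypothetical, and an in-memory stream stands in for the URL stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
final class DecodeDemo {

    // Treats every byte as one character, which breaks multi-byte UTF-8 sequences.
    static String castBytes(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int data;
        while ((data = in.read()) != -1) {
            sb.append((char) data);
        }
        return sb.toString();
    }

    // Decodes the bytes as UTF-8 before building the string.
    static String decodeUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Five Arabic letters, written as Unicode escapes to avoid source-encoding issues.
        String text = "\u0645\u0631\u062D\u0628\u0627";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(castBytes(new ByteArrayInputStream(bytes)).equals(text));  // prints false
        System.out.println(decodeUtf8(new ByteArrayInputStream(bytes)).equals(text)); // prints true
    }
}
```

Each of these Arabic letters occupies two bytes in UTF-8, so the byte-by-byte cast yields ten unrelated characters instead of five letters.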

How to parse xml from NOT resource file

My app works with data and saves it in the file [root]/data/data/appName/files/list.xml
I know how to parse the XML, like this:
XmlResourceParser parser = getResources().getXml(R.xml.list);
but because my file is not in the res directory, I need to find another way.
I know how to get my file as a string, like this:
FileInputStream fIn = openFileInput("samplefile.txt");
InputStreamReader isr = new InputStreamReader(fIn);
char[] inputBuffer = new char[TESTSTRING.length()];
isr.read(inputBuffer);
String readString = new String(inputBuffer);
It is important to be able to specify the name of file.
Also, when I save the file with:
FileOutputStream fOut = openFileOutput("list1.xml", MODE_WORLD_READABLE);
the compiler flags "MODE_WORLD_READABLE" with the message
"This constant was deprecated in API level 17".
But it works. What does it mean for me?
Read the XML file from a path:
public boolean ReadXmlFile(String filePath)
{
    try {
        String Data="";
        File fIN = new File(filePath);
        if (fIN.exists())
        {
            // Read the whole file into a string
            StringBuffer fileData = new StringBuffer(1000);
            BufferedReader reader = new BufferedReader(
                    new FileReader(filePath));
            char[] buf = new char[1024];
            int numRead=0;
            while((numRead=reader.read(buf)) != -1){
                String readData = String.valueOf(buf, 0, numRead);
                fileData.append(readData);
                buf = new char[1024];
            }
            reader.close();
            Data= fileData.toString();
        }
        else
        {
            return false;
        }
        // docData is a field declared elsewhere (db.parse returns an org.w3c.dom.Document)
        docData = null;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try
        {
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource();
            is.setCharacterStream(new StringReader(Data));
            docData = db.parse(is);
        } catch (ParserConfigurationException e) {
            return false;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            return false;
        }
        return true;
    } catch (Exception e) {
        return false;
    }
}
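As an alternative sketch, `DocumentBuilder.parse` also accepts an `InputStream` directly, which avoids building the intermediate string and lets the parser pick up the encoding from the XML declaration. On Android the stream could presumably come from `openFileInput("list.xml")`; here an in-memory stream stands in for it. The class name `XmlFromStream`, the method `parse`, and the sample XML are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original answer.
final class XmlFromStream {

    // Parses an XML document straight from a byte stream.
    static Document parse(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a file stream such as openFileInput("list.xml") on Android.
        String xml = "<list><item>one</item><item>two</item></list>";
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            Document doc = parse(in);
            System.out.println(doc.getDocumentElement().getTagName());       // prints list
            System.out.println(doc.getElementsByTagName("item").getLength()); // prints 2
        }
    }
}
```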
