Consider the following code:
private String url = "https://celestrak.org/NORAD/elements/resource.txt";
@Override
public Boolean crawl() {
try {
// Timeout is set to 20s
Connection connection = Jsoup.connect(url).userAgent(USER_AGENT).timeout(20 * 1000);
Document htmlDocument = connection.get();
// 200 is the HTTP OK status code
if (connection.response().statusCode() == 200) {
System.out.println("\n**Visiting** Received web page at " + url);
} else {
System.out.println("\n**Failure** Web page not received at " + url);
return Boolean.FALSE;
}
if (!connection.response().contentType().contains("text/plain")) {
System.out.println("**Failure** Retrieved something other than plain text");
return Boolean.FALSE;
}
System.out.println(htmlDocument.text()); // This prints the whole text file on one line
} catch (IOException ioe) {
// We were not successful in our HTTP request
System.err.println(ioe);
return Boolean.FALSE;
}
return Boolean.TRUE;
}
Output
SCD 1 1 22490U 93009B 16329.83043855 .00000228 00000-0 12801-4 0 9993 2 22490 24.9691 122.2579 0043025 337.9285 169.5838 14.44465946256021 TECHSAT 1B (GO-32) 1 25397U ....
I am trying to read an online text file (from https://celestrak.org/NORAD/elements/resource.txt). The problem is that when I print or save the body's text, the whole file comes out on one line. I want it split on \n so that I can read it line by line. Am I making a mistake in how I read the file?
I am using Jsoup.
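You can do it without Jsoup. Jsoup parses the response as HTML, and Document.text() normalizes whitespace, which is why the line breaks disappear. Reading the raw response avoids that: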
public static void main(String[] args) {
String data;
try {
data = IOUtils.toString(new URL("https://celestrak.com/NORAD/elements/resource.txt"));
for (String line : data.split("\n")) {
System.out.println(line);
}
} catch (IOException e1) {
e1.printStackTrace();
}
}
The code above uses org.apache.commons.io.IOUtils.
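One caveat with split("\n"): if the server sends Windows-style line endings, each piece keeps a trailing \r. Below is a minimal sketch, assuming Java 8 or later, that splits on the regex \R (any line break) instead. The class and method names are made up for illustration, and the sample text is only a stand-in for the downloaded file.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Commons IO or Jsoup.
final class LineSplitSketch {

    // Splits text on any line terminator (\n, \r\n, \r, ...) using the \R regex construct.
    static List<String> splitLines(String data) {
        return Arrays.asList(data.split("\\R"));
    }

    public static void main(String[] args) {
        // Stand-in for the downloaded text, with mixed line endings.
        String sample = "SCD 1\r\n1 22490U 93009B\n2 22490 24.9691";
        for (String line : splitLines(sample)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

The brackets in the demo output make a stray \r visible if one ever survives the split.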
If adding the Commons library is an issue, you can use the code below:
public static void main(String[] args) {
BufferedReader bufferedReader;
try {
// The JDK has no URLReader class; wrap the URL's input stream in an InputStreamReader instead
bufferedReader = new BufferedReader(new InputStreamReader(
new URL("https://celestrak.com/NORAD/elements/resource.txt").openStream()));
String sCurrentLine;
while ((sCurrentLine = bufferedReader.readLine()) != null) {
System.out.println(sCurrentLine);
}
bufferedReader.close();
} catch (MalformedURLException e1) {
e1.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
Since the file is already delimited by line separators, we can simply take the input stream from the URL and read its contents:
String url = "https://celestrak.com/NORAD/elements/resource.txt";
List<String> text = new BufferedReader(new InputStreamReader(new URL(url).openStream())).lines().collect(Collectors.toList());
To convert to a String:
String content = new BufferedReader(new InputStreamReader(new URL(url).openStream())).lines()
.collect(Collectors.joining(System.getProperty("line.separator")));
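Note that the one-liners above never close the reader. Here is a sketch of the same idea with try-with-resources and an explicit charset. UTF-8 is an assumption about the file, readAllLines is a hypothetical helper name, and the demo feeds it an in-memory stream where real code would pass new URL(url).openStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration.
final class ReadLinesSketch {

    // Reads every line from the stream and closes the stream when done.
    static List<String> readAllLines(InputStream in) {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            // Only close() can throw the checked exception here; rethrow it unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for new URL(url).openStream().
        InputStream in = new ByteArrayInputStream(
            "line one\nline two\n".getBytes(StandardCharsets.UTF_8));
        readAllLines(in).forEach(System.out::println);
    }
}
```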
Related
What I want to do is open a web page in my browser (Chrome) and get the HTML source of that page from my Java application.
I don't want to fetch the source of a URL; I want a program that connects to the browser and gets the HTML of the page that is currently open.
For example, if I open YouTube in my browser, I want my application to get the current page's HTML (in this case, YouTube's). Sorry if my English is not very good.
You can do this:
import java.io.*;
import java.net.*;
import java.util.*;
public static void main(String[] args) {
Scanner input = new Scanner(System.in);
URL url;
InputStream is = null;
BufferedReader br;
String line;
try {
String urlInput = input.nextLine();
url = new URL(urlInput);
is = url.openStream(); // throws an IOException
br = new BufferedReader(new InputStreamReader(is));
while ((line = br.readLine()) != null) {
System.out.println(line);
}
} catch (MalformedURLException mue) {
mue.printStackTrace();
} catch (IOException ioe) {
ioe.printStackTrace();
} finally {
try {
if (is != null) is.close();
} catch (IOException ioe) {
// nothing to see here
}
}
}
I got this from here: How do you Programmatically Download a Webpage in Java
Try this out:
You must pass in the URL as the argument and you'll have the HTML code
public static void main(String[] args) throws IOException {
// MalformedURLException is an IOException, so a bad URL propagates instead of leaving u null
URL u = new URL(args[0]);
try (BufferedReader in = new BufferedReader(new InputStreamReader(u.openStream()))) {
String line;
while ((line = in.readLine()) != null) {
// println restores the line breaks that readLine() strips
System.out.println(line);
}
}
}
My question is how to change the address in the URL (http://localhost:8080/HELLO_WORLD). I want to change HELLO_WORLD to a word of my choice.
@Override
public Response serve(IHTTPSession session) {
String answer = "";
BufferedReader reader = null;
try {
reader = new BufferedReader(
new InputStreamReader(appContext.getAssets().open("block.html")));
// do reading, usually loop until end of file reading
String mLine;
while ((mLine = reader.readLine()) != null) {
//process line
answer += mLine;
}
} catch (IOException e) {
//log the exception
} finally {
if (reader != null) {
try {
reader.close();
} catch (IOException e) {
//log the exception
Log.d("BABAR", "EXception occured in serve()");
}
}
}
return newFixedLengthResponse(answer);
}
Please suggest how to change it.
I don't know if this is what you want, but you can try it.
Follow these steps:
1 - Create a local folder to store your server files;
2 - Then change the response in the class that implements the NanoHTTPD server to something like this:
@Override
public Response serve(IHTTPSession session) {
String answer = "";
try{
FileReader filereader = new FileReader(contextoMain.local(localyourstorethefiles)+"/yourfolder/yourfile.html");
BufferedReader reader = new BufferedReader(filereader);
String line = "";
while ((line = reader.readLine()) != null) {
answer += line;
}
reader.close();
}catch(IOException ioe) {
Log.w("Httpd", ioe.toString());
}
return newFixedLengthResponse(answer);
}
3 - Then call localhost:8080 without appending /yourfolder/yourfile to the address.
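If the goal is to serve a different page depending on the word after the port, one option is to look at the request path inside serve(); in NanoHTTPD that path is available from session.getUri(). The sketch below shows only the path-to-file lookup, in plain Java so that it runs without the library. The route table and the about.html file name are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical route table for illustration; not part of NanoHTTPD.
final class RouteSketch {

    private static final Map<String, String> ROUTES = new HashMap<>();

    static {
        ROUTES.put("/HELLO_WORLD", "block.html");
        ROUTES.put("/ABOUT", "about.html"); // made-up second route
    }

    // Returns the asset name mapped to a request path, or null when the path is unknown.
    static String assetFor(String uri) {
        return ROUTES.get(uri);
    }

    public static void main(String[] args) {
        System.out.println(assetFor("/HELLO_WORLD"));
        System.out.println(assetFor("/missing"));
    }
}
```

Inside serve(), the idea would be to call assetFor(session.getUri()), open that asset instead of a hard-coded file name, and return an error response when the lookup yields null.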
I've been working on a personal app and Stack Overflow has helped a bit so far, but I've now run into another issue. I'm attempting to read a basic text file stored in my source code and output it to an alert dialog. My code does this, but the dialog does not display any of my new lines.
displayChangelogDialog method
private void displayChangelogDialog() {
Context context = this;
AssetManager am = context.getAssets();
InputStream is;
// ensure that changelog is available
try {
is = am.open("changelog");
// changelog dialog
new AlertDialog.Builder(this)
.setTitle("Changelog")
.setMessage(getStringFromInputStream(is)) // convert changelog to string
.setPositiveButton(android.R.string.yes, new DialogInterface.OnClickListener() {
public void onClick(DialogInterface dialog, int which) {
// do nothing
}
})
.show();
} catch (IOException e) {
Toast.makeText(MainActivity.this, "Error", Toast.LENGTH_SHORT).show();
e.printStackTrace();
}
}
getStringFromInputStream method
private static String getStringFromInputStream(InputStream is) {
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
String line;
try {
br = new BufferedReader(new InputStreamReader(is));
while ((line = br.readLine()) != null) {
sb.append(line);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return sb.toString();
}
changelog text file
v0.0.3
- Update PPS rate for recent difficulty increase
v0.0.2
- Calculate DGM based on PPS rate
I have attempted to add "\n" to the end of each line, but it does not work and the characters "\n" are simply displayed. Thanks in advance everyone.
There is an easy, if hacky, way to read the whole input stream into a String, line breaks included, without reading it line by line:
Scanner scanner = new Scanner(inputStream).useDelimiter("\\A");
String string = scanner.hasNext() ? scanner.next() : null;
scanner.close();
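For reference, here is the same trick as a small self-contained sketch. The delimiter \A matches only at the start of the input, so the first token is the entire stream. The helper name, the UTF-8 charset and the empty-string fallback (instead of null) are choices made here, not part of the original snippet.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper for illustration.
final class ScannerReadAllSketch {

    // Reads the whole stream as one token; line breaks inside the text are preserved.
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the changelog asset stream.
        String changelog = "v0.0.3\n- Update PPS rate\nv0.0.2\n- Calculate DGM\n";
        InputStream in = new ByteArrayInputStream(changelog.getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(in));
    }
}
```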
readLine() will read up to a linefeed, but not include the linefeed. Also, there is no reason to use a string builder here. Change to this:
String result = "";
String line;
try {
br = new BufferedReader(new InputStreamReader(is));
while ((line = br.readLine()) != null) {
result += line + "\n";
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return result;
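If you would rather keep the StringBuilder from the question, the same fix applies: append a line break after each line that readLine() returns. Below is a minimal sketch with try-with-resources; the class and method names are made up, and UTF-8 is an assumption about the changelog file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration.
final class ChangelogReaderSketch {

    // Reads the stream line by line and puts back the '\n' that readLine() removes.
    static String readKeepingNewlines(InputStream is) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br =
                 new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In-memory stand-in for am.open("changelog").
        InputStream is = new ByteArrayInputStream(
            "v0.0.3\n- Update PPS rate".getBytes(StandardCharsets.UTF_8));
        System.out.print(readKeepingNewlines(is));
    }
}
```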
I am trying to develop a small web crawler that downloads web pages and searches for links in a specific section. But when I run this code, the links in the href attributes come back shortened, like this:
original link : "/kids-toys-action-figures-accessories/b/ref=toys_hp_catblock_actnfigs?ie=UTF8&node=165993011&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=merchandised-search-4&pf_rd_r=267646F4BB25430BAD0D&pf_rd_t=101&pf_rd_p=1582921042&pf_rd_i=165793011"
turned into : "/kids-toys-action-figures-accessories/b?ie=UTF8&node=165993011"
Can anybody help me, please? Below is my code:
package test;
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.*;
public class myFirstWebCrawler {
public static void main(String[] args) {
String strTemp = "";
String dir="d:/files/";
String filename="hello.txt";
String fullname=dir+filename;
try {
URL my_url = new URL("http://www.amazon.com/s/ref=lp_165993011_ex_n_1?rh=n%3A165793011&bbn=165793011&ie=UTF8&qid=1376550433");
BufferedReader br = new BufferedReader(new InputStreamReader(my_url.openStream(),"utf-8"));
createdir(dir);
while(null != (strTemp = br.readLine())){
writetofile(fullname,strTemp);
System.out.println(strTemp);
}
System.out.println("index of feature category : " + readfromfile(fullname,"Featured Categories"));
} catch (Exception ex) {
ex.printStackTrace();
}
}
public static void createdir(String dirname)
{ File d= new File(dirname);
d.mkdirs();
}
public static void writetofile(String path, String bbyte)
{
try
{
FileWriter filewriter = new FileWriter(path,true);
BufferedWriter bufferedWriter = new BufferedWriter(filewriter);
bufferedWriter.write(bbyte);
bufferedWriter.newLine();
bufferedWriter.close();
}
catch(IOException e)
{System.out.println("Error");}
}
public static int readfromfile(String path, String key)
{
String dir="d:/files/";
String filename="hello1.txt";
String fullname=dir+filename;
linksAndAt[] linksat=new linksAndAt[10];
BufferedReader bf = null;
try {
bf = new BufferedReader(new FileReader(path));
} catch (FileNotFoundException e1) {
e1.printStackTrace();
}
String currentLine;
int index =-1;
try{
Runtime.getRuntime().exec("cls");
while((currentLine = bf.readLine()) != null)
{
index=currentLine.indexOf(key);
if(index>0)
{
writetofile(fullname,currentLine);
int count=0;
int lastIndex=0;
while(lastIndex != -1)
{
lastIndex=currentLine.indexOf("href=\"",lastIndex);
if(lastIndex != -1)
{
lastIndex+="href=\"".length();
StringBuilder sb = new StringBuilder();
while(currentLine.charAt(lastIndex) != '\"')
{
sb.append(Character.toString(currentLine.charAt(lastIndex)));
lastIndex++;
}
count++;
System.out.println(sb);
}
}
System.out.println("\n count : " + count);
return index;
}
}
}
catch(FileNotFoundException f)
{
f.printStackTrace();
System.out.println("Error");
}
catch(IOException e)
{try {
bf.close();
} catch (IOException e1) {
e1.printStackTrace();
}}
return index;}
}
This feels to me like a situation where the server app is responding differently to requests from your desktop browser and your Java-based crawler. That could be because your browser is passing cookies in its requests which your Java-based crawler is not (such as session-persisting cookies), or it could be because your desktop browser passes a different User-Agent header than your crawler does, or it could be because other request headers are different between your desktop browser and your Java crawler.
When writing crawling apps, this is one of the biggest issues one runs into: it's easy to forget that the same URL requested by different clients won't always respond with the same code. Not sure if that's what's happening to you here, but it's very common.
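One way to test this theory is to make the crawler's request look more like the browser's and compare the results. Below is a minimal sketch using HttpURLConnection from the JDK. The User-Agent string and the Accept-Language value are placeholders, not values taken from the question; in practice you would copy the headers your own browser sends.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration.
final class BrowserLikeRequestSketch {

    // Placeholder value; replace with the User-Agent your browser actually sends.
    static final String USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleCrawler/0.1";

    // Prepares a connection with browser-like headers. Nothing is sent over the
    // network until connect() or getInputStream() is called on the result.
    static HttpURLConnection prepare(String url) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", USER_AGENT);
            conn.setRequestProperty("Accept-Language", "en-US,en;q=0.8");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://www.amazon.com/");
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

If the links still differ after the headers match, cookies are the next thing to compare.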
Good day. I have just switched from Objective-C to Java and am trying to read a URL's contents into a string. I have read tons of posts and it still gives garbage.
public class TableMain {
/**
* @param args
*/
@SuppressWarnings("deprecation")
public static void main(String[] args) throws Exception {
URL url = null;
URLConnection urlConn = null;
try {
url = new URL("http://svo.aero/timetable/today/");
} catch (MalformedURLException err) {
err.printStackTrace();
}
try {
urlConn = url.openConnection();
} catch (IOException e) {
e.printStackTrace();
}
try {
BufferedReader input = new BufferedReader(new InputStreamReader(
urlConn.getInputStream(), "UTF-8"));
StringBuilder strB = new StringBuilder();
String str;
while (null != (str = input.readLine())) {
strB.append(str).append("\r\n");
System.out.println(str);
}
input.close();
} catch (IOException err) {
err.printStackTrace();
}
}
}
What's wrong? I get something like this:
??y??'??)j1???-?q?E?|V??,??< 9??d?Bw(?э?n?v?)i?x?????Z????q?MM3~??????G??љ??l?U3"Y?]????zxxDx????t^???5???j??k??u?q?j6?^t???????W??????????~?????????o6/?|?8??{???O????0?M>Z{srs??K???XV??4Z??'??n/??^??4????w+?????e???????[?{/??,??WO???????????.?.?x???????^?rax??]?xb??& ??8;?????}???h????H5????v?e?0?????-?????g?vN
Here is a method using HttpClient:
public HttpResponse getResponse(String url) throws IOException {
httpClient.getParams().setParameter("http.protocol.content-charset", "UTF-8");
return httpClient.execute(new HttpGet(url));
}
public String getSource(String url) throws IOException {
StringBuilder sb = new StringBuilder();
HttpResponse response = getResponse(url);
if (response.getEntity() == null) {
throw new IOException("Response entity not set");
}
BufferedReader contentReader = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
String line = contentReader.readLine();
while ( line != null ){
sb.append(line)
.append(NEW_LINE);
line = contentReader.readLine();
}
return sb.toString();
}
Edit: I edited the response to ensure it uses utf-8.
This is a result of:
You are fetching data that is UTF-8 encoded
You didn't specify, but I surmise you are printing it to the console on a Windows system
The data is being received and stored correctly, but when you print it the destination is incapable of rendering the Russian text. You will not be able to just "print" the text to stdout unless the ultimate display handler is capable of rendering the characters involved.
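If the console is the limiting factor, one thing to try is writing through a PrintStream that encodes as UTF-8 instead of the platform default. This is only a sketch: the helper name is made up, and whether the text then renders still depends on the terminal's code page and font, which this code cannot control.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper for illustration.
final class Utf8ConsoleSketch {

    // Wraps a stream in a PrintStream that encodes characters as UTF-8.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // The word "Moscow" in Russian, written with escapes so the source stays ASCII.
        out.println("\u041c\u043e\u0441\u043a\u0432\u0430");
    }
}
```

Writing the same string to a file and opening it in a UTF-8 aware editor is a quick way to confirm that the data itself is intact.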