Using a delimiter to extract a hyperlink - Java

I'm new to Java and I'm working on a project that scans the source code of a website and extracts all the hyperlinks it contains.
So far, my project scans every 'word' of the source code using a Scanner (in.next()).
However, I've been told to use delimiters to extract the hyperlinks, and I can barely find any information on how to use them.
Could someone explain delimiters and how I could use them in this project? It would be really appreciated.
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Scanner;
public class HyperlinkMain {
    public static void main(String[] args) {
        try {
            // Read the address of the page to scan from standard input.
            Scanner in = new Scanner(System.in);
            String address = in.next();
            URL website = new URL(address);
            Scanner inWebsite = new Scanner(website.openStream());
            while (inWebsite.hasNext()) {
                // Process each 'word'.
                System.out.println(inWebsite.next());
            }
            inWebsite.close();
            in.close();
        } catch (MalformedURLException me) {
            System.out.println(me);
        } catch (IOException ioe) {
            System.out.println(ioe);
        }
    }
}

You could apply a regular expression to the page source. An existing Stack Overflow question covers this topic:
How to use regular expressions to parse HTML in Java?
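Since the question asks about delimiters specifically: Scanner.useDelimiter takes a regular expression and splits the input wherever it matches, instead of on whitespace. The sketch below is one possible approach, not a robust HTML parser. It assumes every href value is wrapped in double quotes and written as href= with no spaces around the equals sign. The class name HyperlinkDelimiterSketch and the sample HTML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class HyperlinkDelimiterSketch {
    // Splits the input on double quotes. Whenever the token before a quote
    // ends with href=, the token after that quote is taken as a link.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        try (Scanner in = new Scanner(html)) {
            in.useDelimiter("\"");
            String previous = "";
            while (in.hasNext()) {
                String token = in.next();
                if (previous.trim().endsWith("href=")) {
                    links.add(token);
                }
                previous = token;
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"https://example.com/a\">A</a>"
                + " and <a class=\"x\" href=\"/b\">B</a></p>";
        System.out.println(extractLinks(html)); // prints [https://example.com/a, /b]
    }
}
```

In the original program the same loop could read from new Scanner(website.openStream()) instead of a String.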

Related

Create a String variable from a URL using JSoup and Regex in Java?

I am trying to write a program that retrieves the iframe tag from a website, opens the link, and downloads the video. Currently it retrieves the iframe tag, but I can't figure out how to get just the URL without the surrounding tag. I am fairly sure I can use .split(), but I don't know how to write a regex that pulls only the data from inside the quotes. I also tried JSoup's .html(), but it just printed a blank line. Here is what I have (it mostly splits correctly, except that the URL contains "id=...", which causes it to split again):
package com.trentmenard;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class Main {
    public static void main(String[] args) {
        Document website;
        try {
            website = Jsoup.connect("https://swordartonlineepisode.com/sword-art-online-season-3-episode-1-english-dubbed-watch-online/").get();
            System.out.println("Website Found! Title: " + website.title());
            Element videoLink = website.select("iframe").first();
            System.out.println("Found Video Link: " + videoLink);
            videoLink.removeAttr("width");
            videoLink.removeAttr("height");
            videoLink.removeAttr("scrolling");
            videoLink.removeAttr("allowfullscreen");
            System.out.println("Modified: " + videoLink);
            String link = videoLink.toString();
            String[] stringArray = link.split("=");
            for (String a : stringArray) {
                System.out.println(a);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output: https://i.stack.imgur.com/ZXTiV.png
Thanks in advance!
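Splitting on "=" breaks because the URL itself contains "=". With jsoup, reading the attribute directly (videoLink.attr("src")) is probably the simplest route. If a regex is still wanted, a capturing group can pull out only the text between the quotes. The sketch below uses only java.util.regex and assumes the src value is double-quoted. The class name IframeSrcSketch and the sample tag are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IframeSrcSketch {
    // Whitespace, then src="...". Group 1 captures everything between the quotes.
    private static final Pattern SRC = Pattern.compile("\\ssrc\\s*=\\s*\"([^\"]*)\"");

    static Optional<String> extractSrc(String tag) {
        Matcher matcher = SRC.matcher(tag);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical tag; the real page's markup may differ.
        String tag = "<iframe src=\"https://example.com/embed?id=123&autoplay=1\""
                + " frameborder=\"0\"></iframe>";
        System.out.println(extractSrc(tag).orElse("no src found"));
        // prints https://example.com/embed?id=123&autoplay=1
    }
}
```

Because the group stops at the closing quote rather than at "=", the "id=..." part of the URL stays intact.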

Jama Matrix printwriter error

I am using the JAMA matrix library in my project and need to write a Jama matrix to a text file. I wrote this code for that:
package Xdata;
import Jama.Matrix;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
public class File_r {
public static void main(String args[]) {
Matrix A = new Matrix(10, 10);
try {
PrintWriter write1 = new PrintWriter(new File("/home/robotics//IdeaProjects/Data_arrange/src/Xdata/mu_X.txt"));
A.print(PrintWriter write1,9,6);// error in this line
}
catch(FileNotFoundException ex) {
System.out.println(ex);
}
}
}
But it throws errors:
/home/robotics/IdeaProjects/Data_arrange/src/Xdata/File_r.java
Error:(13, 32) java: ')' expected
Error:(13, 33) java: not a statement
Error:(13, 39) java: ';' expected
I wrote this code in IntelliJ IDEA. Can anyone tell me why I get this error?
I checked the Jama API for Matrix.java. You are calling the three-parameter print method, but the first argument in your snippet is written as a declaration (PrintWriter write1) instead of just the variable name, which is not valid Java.
Fix it as below:
A.print(write1,9,6); // pass the variable only, without its type
I solved this problem. I think it will be helpful for those who are new to Jama and face a similar problem. Here is my solution:
package Xdata;
import Jama.Matrix;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
public class File_r {
    public static void main(String args[]) {
        Matrix A = new Matrix(10, 10);
        PrintWriter writer = null;
        try {
            // Changed: pass the file name directly to PrintWriter.
            writer = new PrintWriter("/home/robotics//IdeaProjects/Data_arrange/src/Xdata/mu_X.txt");
            // Changed: pass the variable only, without its type.
            A.print(writer, 2, 2);
            // Added: close the writer so the buffered output is written to the file.
            writer.close();
        } catch (FileNotFoundException ex) {
            System.out.println(ex);
        }
    }
}
This solved my problem. Since there is very little documentation for Jama, I think this will be helpful for readers.
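The writer.close() call matters because PrintWriter buffers its output, so nothing is guaranteed to reach the destination until the writer is flushed or closed. A try-with-resources block closes it automatically, even if an exception is thrown. The sketch below illustrates this without Jama: it formats a plain double[][] with a column width and a number of decimals, and it does not reproduce Jama's exact output layout. The class name MatrixWriterSketch and the method format are hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Locale;

class MatrixWriterSketch {
    // Writes each value right-aligned in `width` columns with `decimals`
    // digits after the decimal point, one matrix row per line.
    static String format(double[][] matrix, int width, int decimals) {
        StringWriter buffer = new StringWriter();
        // try-with-resources closes (and therefore flushes) the writer.
        try (PrintWriter out = new PrintWriter(buffer)) {
            for (double[] row : matrix) {
                for (double value : row) {
                    out.printf(Locale.ROOT, "%" + width + "." + decimals + "f", value);
                }
                out.println();
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        double[][] sample = {{1.0, 2.5}, {3.0, 4.0}};
        System.out.print(format(sample, 6, 2));
    }
}
```

Writing to a file works the same way: replace the StringWriter-backed writer with new PrintWriter("mu_X.txt") inside the try-with-resources header and handle FileNotFoundException.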

Please help me with JUnit test cases for the code below

I would like JUnit test cases for the following program. I have not included the main method here; I want test cases for the url() method. The code reads HTML from a website and saves it to a file on the local machine.
package Java3;
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.logging.Level;
import java.util.logging.Logger;
public class Urltohtml
{
private String str;
public void url() throws IOException
{
try
{
FileOutputStream f=new FileOutputStream("D:/File1.txt");
PrintStream p=new PrintStream(f);
URL u=new URL("http://www.google.com");
BufferedReader br=new BufferedReader(new InputStreamReader(u.openStream()));
//str=br.readLine();
while((str=br.readLine())!=null)
{
System.out.println(str+"\n");
p.println(str);
}
}
catch (MalformedURLException ex)
{
Logger.getLogger(Urltohtml.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
I would rename that class to UrlToHtml and write a single JUnit test class UrlToHtmlTest.
Part of the reason why you're having problems testing this is that the class is poorly designed and implemented:
You should pass in the URL you want to scrape, not hard code it.
You should return the content as a String or List, not print it to a file.
You might want to throw that exception rather than catch it. Your logging isn't exactly "handling" the exceptional situation. Let it bubble out and have clients log if they wish.
You don't need that private data member; return the contents. That lets you make this method static.
Good names matter. I don't like what you have for the class or the method.
Why are you writing this when you could use a library to do it?
Here's what the test class might look like:
// JUnit 4 is assumed here (org.junit.Test and org.junit.Assert).
import org.junit.Assert;
import org.junit.Test;
public class UrlToHtmlTest {
    @Test
    public void testUrlToHtml() {
        try {
            String testUrl = "http://www.google.com";
            String expected = "";
            String actual = UrlToHtml.url(testUrl);
            Assert.assertEquals(expected, actual);
        } catch (Exception e) {
            e.printStackTrace();
            Assert.fail();
        }
    }
}
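The design changes listed above (pass the URL in, return the content, let the exception propagate, make the method static) could look roughly like this. It is a sketch, not the answerer's actual code. It assumes the page is UTF-8 encoded and normalizes line endings to "\n".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class UrlToHtml {
    // Reads the resource at `address` and returns its content as one String.
    // IOException (including MalformedURLException) is left to the caller.
    static String url(String address) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new URL(address).openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }
}
```

Because the address is a parameter, a test can point the method at a local file: URL with known content instead of a live website, which gives a stable expected value.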

Java space at start of text file

I have a program that asks the user which application they want to open.
This is how the program works:
The user types which application to open into an input dialog, for example "Open application Notepad".
The program looks for the word "application" in the text file to make sure the user wanted to open an application.
Both the "Open application" phrase and the application name are stored in a text file.
The program then removes "Open application" from the text file, so that only the application name remains.
But a space always appears in front of the application name. Please help me remove the space in front of the application name!
Here is my code:
package Test_Code;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.swing.JOptionPane;
public class New_Loader_3 {
public static void main(String[]args) throws IOException{
String Test = JOptionPane.showInputDialog("Test");
BufferedWriter writer = new BufferedWriter(new FileWriter("/Applications/Userdata/tmp/Application.txt"));
writer.write(Test);
writer.close();
int tokencount;
FileReader fr=new FileReader("/Applications/Userdata/tmp/Application.txt");
BufferedReader br=new BufferedReader(fr);
String s1;
int linecount=0;
String line;
String words[]=new String[500];
while ((s1=br.readLine())!=null)
{
linecount++;
int indexfound=s1.indexOf("application");
if (indexfound>-1)
{
FileInputStream fstream1121221 = new FileInputStream("/Applications/Userdata/tmp/Application.txt");
DataInputStream in1121211 = new DataInputStream(fstream1121221);
BufferedReader br1112211 = new BufferedReader(new InputStreamReader(in1121211));
String Name12122131;
while ((Name12122131 = br1112211.readLine()) != null) {
if (Name12122131.startsWith(" "))
{
System.out.println("Name12122131");
}
}
String mega = Test.replaceAll("Open application","");
System.out.println(mega);
BufferedWriter Update_Catch = new BufferedWriter(new FileWriter("/Applications/Userdata/tmp/Application.txt"));
Update_Catch.write(mega);
Update_Catch.close();
}
}
System.out.println("Done");
}
}
It's because the user types Open<space>application<space>Notepad. When you replace Open<space>application, the space before Notepad is left behind. So use this instead:
String mega = Test.replaceAll("Open application ","");
Adding a <space> at the end of Open<space>application removes that space too, so mega will be Notepad.
Alternatively, keep what you are already using and then call mega.trim().
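A small sketch of both options, runnable on its own. The class name StripPrefixSketch and the method applicationName are made up for illustration. It uses String.replace rather than replaceAll because the prefix is a literal string, not a regular expression.

```java
class StripPrefixSketch {
    // Removes the literal prefix, then trims any leftover surrounding spaces.
    static String applicationName(String input) {
        return input.replace("Open application", "").trim();
    }

    public static void main(String[] args) {
        String input = "Open application Notepad";
        // Original approach: the space before the name is left behind.
        System.out.println("[" + input.replaceAll("Open application", "") + "]"); // prints [ Notepad]
        // Option 1: include the trailing space in the text to remove.
        System.out.println("[" + input.replace("Open application ", "") + "]");   // prints [Notepad]
        // Option 2: remove the prefix, then trim.
        System.out.println("[" + applicationName(input) + "]");                   // prints [Notepad]
    }
}
```

The trim() variant also copes with extra spaces typed before or after the application name.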

JSoup error in data types

I have the following code that is supposed to extract data from an HTML document. I am using Eclipse. It gives me two errors, even though the code is copied from the tutorial on the JSoup site. The errors are on 1) File and 2) Elements. I can't see any problem with these two types.
import java.io.IOException;
import java.net.MalformedURLException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class TestClass
{
public static void main(String args[]) throws IOException
{
try{
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
String linkHref = link.attr("href");
String linkText = link.text();
}
}//try
catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}//catch
}
}
You forgot to import them.
import java.io.File;
import org.jsoup.select.Elements;
See also:
Java tutorial - Using package members
Hint: read the "Quick Fix" options suggested by Eclipse. It's already the 1st option for File.
