How can I write my exported XML to an XML File? - java

My task is to search for a book on WikiBooks and export the corresponding XML file, which is meant to be edited later. However, the problem occurs before that step.
My idea was to read the page and write it line by line to an XML file. Here is my code for it:
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
System.out.println("Which book do you want to search for?");
String book = reader.readLine();
book = replaceSpace(book);
URL url = new URL("https://de.wikibooks.org/wiki/Spezial:Exportieren/" + book);
URLConnection uc = url.openConnection();
uc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");
uc.connect();
BufferedReader xmlReader = new BufferedReader(new InputStreamReader(uc.getInputStream()));
File file = new File("wiki.xml");
FileWriter writer = new FileWriter(file);
String inputLine;
while ((inputLine = xmlReader.readLine()) != null) {
    writer.write(inputLine + "\n");
}
xmlReader.close();
xmlReader.close();
The code is executed without an error message, but the saved file ends in the middle of a word and is therefore incomplete.
How can I work around this problem?

As the comment suggested, the problem is that the content of the stream is never flushed to the file. If you call close() on your writer, the remaining content is automatically flushed to the file.
Here is your code with the added statement in the end:
BufferedReader xmlReader = new BufferedReader(new InputStreamReader(uc.getInputStream()));
File file = new File("wiki.xml");
FileWriter writer = new FileWriter(file);
String inputLine;
while ((inputLine = xmlReader.readLine()) != null) {
    writer.write(inputLine + "\n");
}
writer.close();
xmlReader.close();
A much easier solution built into Java is the Files class. My suggestion is to replace the above code with the following single statement, which copies your InputStream straight into a file. Note that Files.copy does not close the input stream for you, so it is still best wrapped in try-with-resources.
Files.copy(uc.getInputStream(), Paths.get("wiki.xml"), StandardCopyOption.REPLACE_EXISTING);
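For completeness, here is a minimal sketch of that approach with the input stream wrapped in try-with-resources, reusing the uc connection from the question:
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// uc is the URLConnection opened earlier
try (InputStream in = uc.getInputStream()) {
    // streams every byte of the response into wiki.xml, replacing any existing file
    Files.copy(in, Paths.get("wiki.xml"), StandardCopyOption.REPLACE_EXISTING);
}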

Related

Having trouble reading in content of url using InputStream

So I run the code below and it prints "!DOCTYPE html". How do I get the full content of the URL, like the HTML for instance?
public static void main(String[] args) throws IOException {
    URL u = new URL("https://www.whitehouse.gov/");
    InputStream ins = u.openStream();
    InputStreamReader isr = new InputStreamReader(ins);
    BufferedReader websiteText = new BufferedReader(isr);
    System.out.println(websiteText.readLine());
}
According to the Java tutorial (https://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html): "When you run the program, you should see, scrolling by in your command window, the HTML commands and textual content from the HTML file located at ...". Why am I not getting that?
In your program, you did not use a while loop:
URL u = new URL("https://www.whitehouse.gov/");
InputStream ins = u.openStream();
InputStreamReader isr = new InputStreamReader(ins);
BufferedReader websiteText = new BufferedReader(isr);
String inputLine;
while ((inputLine = websiteText.readLine()) != null) {
    System.out.println(inputLine);
}
websiteText.close();
You are only reading one line of the text.
Try this and you will see that you get two lines:
System.out.println(websiteText.readLine());
System.out.println(websiteText.readLine());
Try reading it in a loop to get all the text.
Since Java 8, BufferedReader has a lines() method whose return type is Stream<String>. To read an entire site you could do something like this:
String htmlText = websiteText.lines()
        .reduce((text, nextLine) -> text + "\n" + nextLine)
        .orElse(null);
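A note on that reduce: concatenating strings line by line is quadratic on large pages. Collectors.joining does the same thing with a StringBuilder under the hood, which is the more idiomatic Stream way (a sketch, same websiteText reader as above):
import java.util.stream.Collectors;

// joins all lines with "\n"; returns an empty String (not null) for an empty stream
String htmlText = websiteText.lines()
        .collect(Collectors.joining("\n"));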

Java Writer stopping after single write

I'm trying to read the contents of a URL and write them to a file. This works as expected, except that it only writes once, even though the program console shows multiple lines.
Code:
PrintWriter writer = new PrintWriter("the-file-name.txt", "UTF-8");
while (true) {
    URL oracle = new URL("https://linkToData.com");
    BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        writer.println(inputLine);
        System.out.println(inputLine);
    }
    writer.close();
}
The data at the URL refreshes constantly, so there should be different data each time, as the console output shows, but only the first batch is written to the file.
The key is writer.close()! Once the writer is closed inside the loop, nothing more can be written (PrintWriter even swallows the resulting errors silently). If you want to write the file anew on every iteration, you have to reopen the writer each time. If you want to append to the file on each iteration instead, flush the writer inside the loop and close it once at the end:
PrintWriter writer = new PrintWriter("the-file-name.txt", "UTF-8");
while (condition) {
    URL oracle = new URL("https://linkToData.com");
    BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        writer.println(inputLine);
        System.out.println(inputLine);
    }
    in.close();     // close this fetch's reader before the next iteration
    writer.flush(); // push this batch to disk without closing the writer
}
writer.close();
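If you prefer not to manage flush and close by hand, the same append-per-fetch idea can be sketched with try-with-resources. The helper name appendFeedToFile is mine; the file name and URL are taken from the question:
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.URL;
import java.nio.charset.StandardCharsets;

static void appendFeedToFile(String fileName, String feedUrl) throws IOException {
    // the "true" flag opens the file in append mode, so repeated calls add to it
    try (PrintWriter writer = new PrintWriter(new OutputStreamWriter(
             new FileOutputStream(fileName, true), StandardCharsets.UTF_8));
         BufferedReader in = new BufferedReader(new InputStreamReader(
             new URL(feedUrl).openStream()))) {
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            writer.println(inputLine);
        }
    } // both resources are flushed and closed here, even on exceptions
}
Call it from your loop, e.g. appendFeedToFile("the-file-name.txt", "https://linkToData.com");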

How to get all the data from Solr

I have to write some logic in Java that retrieves all the index data from Solr.
As of now I am doing it like this:
String confSolrUrl = "http://localhost/solr/master/select?q=*%3A*&wt=json&indent=true";
LOG.info(confSolrUrl);
URL url = new URL(confSolrUrl);
URLConnection conn = url.openConnection();
BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String inputLine;
// save to this filename
String fileName = "/qwertyuiop.html";
File file = new File(fileName);
if (!file.exists()) {
    file.createNewFile();
}
FileWriter fw = new FileWriter(file.getAbsoluteFile());
BufferedWriter bw = new BufferedWriter(fw);
while ((inputLine = br.readLine()) != null) {
    bw.write(inputLine);
}
bw.close();
br.close();
System.out.println("Done");
In the file I get the whole response, which I can then parse to extract my JSON.
Is there a better way to do this than fetching the resource from the URL and parsing it?
I just wrote an application to do this; take a look at it on GitHub: https://github.com/freedev/solr-import-export-json
If you want to read all the data from a Solr collection, the first problem you face is pagination, in this case so-called deep paging. A direct HTTP request like yours returns a relatively small number of documents, while a Solr collection can hold millions or even billions of them. So you should use the proper API, i.e. SolrJ, which is exactly what that project does.
I would also suggest this reading:
https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
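To make the deep-paging approach concrete, here is a minimal SolrJ sketch using cursorMark, the cursor-based iteration the article above describes (the core URL and the uniqueKey field id are assumptions, and checked-exception handling is elided; adjust for your setup):
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

try (HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/master").build()) {
    SolrQuery query = new SolrQuery("*:*");
    query.setRows(500);                            // documents fetched per round trip
    query.setSort(SolrQuery.SortClause.asc("id")); // cursors require a sort on the uniqueKey
    String cursorMark = CursorMarkParams.CURSOR_MARK_START;
    boolean done = false;
    while (!done) {
        query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
        QueryResponse response = client.query(query);
        for (SolrDocument doc : response.getResults()) {
            // persist or export each document here
        }
        String nextCursorMark = response.getNextCursorMark();
        done = cursorMark.equals(nextCursorMark);  // unchanged cursor means no more results
        cursorMark = nextCursorMark;
    }
}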

reading remote csv file without downloading

I have a requirement to read a remote big CSV file line by line (basically streaming). After each read I want to persist the record in a DB. Currently I achieve this with the code below, but I am not sure whether it downloads the complete file and keeps it in JVM memory; I assume it does not. Can I write this code in a better way using some Java 8 stream features?
URL url = new URL(baseurl);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
if (connection.getResponseCode() == 200) {
    BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    String current;
    while ((current = in.readLine()) != null) {
        persist(current);
    }
}
First, you should use a try-with-resources statement to automatically close your streams when reading is done.
Next, BufferedReader has a method BufferedReader::lines which returns a Stream<String>.
Your code should then look like this:
URL url = new URL(baseurl);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
if (connection.getResponseCode() == 200) {
    try (InputStreamReader streamReader = new InputStreamReader(connection.getInputStream());
         BufferedReader br = new BufferedReader(streamReader);
         Stream<String> lines = br.lines()) {
        lines.forEach(s -> persist(s)); // could be a method reference
    }
}
Now it's up to you to decide if the code is better. Your assumption is correct, by the way: BufferedReader holds only a small buffer (8192 chars by default), so the response is streamed line by line and the whole file is never kept in JVM memory.

Character looks like "?" when reading the content of an uploaded file

I have a client that uploads a vcf file; on the server side I read its contents and save them to a txt file. But there is a character problem when I read it: Turkish characters come out as "?". My reading code is here:
FileItemStream item = null;
ServletFileUpload upload = new ServletFileUpload();
FileItemIterator iterator = upload.getItemIterator(request);
String encoding = null;
while (iterator.hasNext()) {
    item = iterator.next();
    if ("fileUpload".equals(item.getFieldName())) {
        InputStreamReader isr = new InputStreamReader(item.openStream(), "UTF-8");
        String str = "";
        String temp = "";
        BufferedReader br = new BufferedReader(isr);
        while ((temp = br.readLine()) != null) {
            str += temp;
        }
        br.close();
        File f = new File("C:/sedat.txt");
        BufferedWriter buf = new BufferedWriter(new FileWriter(f));
        buf.write(str);
        buf.close();
    }
}
The problem is the FileWriter: it always writes with the platform default encoding, so the UTF-8 content you read gets mangled on the way out. Use a writer with an explicit UTF-8 encoding instead:
BufferedWriter buf = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f), "UTF-8"));
If this is production code, I would recommend writing the output straight to the file rather than accumulating it in a String first. You could also avoid any potential encoding issues by reading the source as an InputStream and writing it as an OutputStream, skipping the conversion to characters entirely.
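A minimal sketch of that byte-for-byte suggestion, assuming the same FileItemStream item and target path as above (InputStream.transferTo requires Java 9+; on older versions, copy with a byte[] buffer loop instead):
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// copy the uploaded bytes verbatim; with no Reader/Writer involved,
// no charset conversion can corrupt the Turkish characters
try (InputStream in = item.openStream();
     OutputStream out = new FileOutputStream("C:/sedat.txt")) {
    in.transferTo(out);
}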
