Programmatically reading contents of text file stored in HDFS using Java

How do I run this simple Java program to read bytes from a text file stored in the /words directory in HDFS? Do I need to create a JAR file for this?
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
public class filesystemhdfs
{
    public static void main(String args[]) throws MalformedURLException, IOException
    {
        // Allocate a buffer first; in.read(null) would throw a NullPointerException
        byte[] b = new byte[1024];
        try (InputStream in = new URL("hdfs://localhost/words/file").openStream()) {
            int n = in.read(b); // number of bytes actually read, or -1 at end of stream
            for (int i = 0; i < n; i++)
            {
                System.out.printf("b[%d]=%d%n", i, b[i]);
                System.out.println((char) b[i]);
            }
        }
    }
}

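You can use the HDFS API; this can be run from a local machine:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://namenode:8020");
FileSystem fs = FileSystem.get(configuration);
Path filePath = new Path("hdfs://namenode:8020/PATH");
// Returns a stream positioned at the start of the file
FSDataInputStream fsDataInputStream = fs.open(filePath);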
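FSDataInputStream extends java.io.DataInputStream, so once fs.open(...) has returned it, reading is plain Java I/O. A minimal sketch, assuming the file is UTF-8 text; the class and method names (HdfsTextReader, readLines) are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class HdfsTextReader {
    // Hypothetical helper: reads every line of a text stream, for example the
    // FSDataInputStream returned by fs.open(filePath), assuming UTF-8 content.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // Closing the reader also closes the wrapped stream
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

With the snippet above, usage would be List<String> lines = HdfsTextReader.readLines(fs.open(filePath)); the helper itself does not depend on Hadoop, so it can be tried against any InputStream.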

First, you need to register the hdfs URL scheme with the JVM so that URL objects know how to open it. This is done via:
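// Call once per JVM (e.g. in a static block); needs org.apache.hadoop.fs.FsUrlStreamHandlerFactory
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());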
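To see what registering a factory does, here is a JDK-only sketch of the same mechanism for a made-up in-memory "mem" scheme (an assumption for illustration; FsUrlStreamHandlerFactory plays this role for hdfs). The factory returns a handler for its own scheme and null for everything else, so the JDK keeps its built-in handlers for http, file, and so on. MemUrlStreamHandlerFactory is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical factory for an in-memory "mem" scheme: teaches java.net.URL
// how to open streams for a scheme the JDK does not know.
class MemUrlStreamHandlerFactory implements URLStreamHandlerFactory {
    private final Map<String, String> files; // URL path -> file content

    MemUrlStreamHandlerFactory(Map<String, String> files) {
        this.files = files;
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if (!"mem".equals(protocol)) {
            return null; // null lets the JDK fall back to its built-in handlers
        }
        return new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL u) {
                return new URLConnection(u) {
                    @Override
                    public void connect() {
                        connected = true;
                    }

                    @Override
                    public InputStream getInputStream() throws IOException {
                        String content = files.get(u.getPath());
                        if (content == null) {
                            throw new FileNotFoundException(u.toString());
                        }
                        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
                    }
                };
            }
        };
    }
}
```

After URL.setURLStreamHandlerFactory(new MemUrlStreamHandlerFactory(Map.of("/words/file", "hello"))), new URL("mem://localhost/words/file").openStream() returns the stored bytes. As with the Hadoop factory, a second setURLStreamHandlerFactory call in the same JVM throws an Error.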
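After compiling your Java class, run it with the hadoop command, which puts the Hadoop libraries and configuration on the classpath (the directory containing your compiled class must be on HADOOP_CLASSPATH):
hadoop filesystemhdfs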
Hadoop also ships a convenient utility class, org.apache.hadoop.io.IOUtils, whose copyBytes method takes care of the read/write loop for you.
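IOUtils.copyBytes(in, out, buffSize, close) runs the loop that the question's code was missing: read into a buffer until read returns -1, writing only the bytes actually read each time. A plain-JDK sketch of that loop, not Hadoop's implementation; StreamCopy and copyAll are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    // Copies everything from in to out through a fixed-size buffer and returns
    // the number of bytes copied. read() may fill only part of the buffer, so
    // only the first n bytes of each chunk are written.
    static long copyAll(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

With Hadoop on the classpath, the one-liner equivalent is IOUtils.copyBytes(in, System.out, 4096, false); on Java 9+ the JDK's own in.transferTo(out) does the same job.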

You cannot read a file from HDFS through the regular file system APIs that Java supports; you need to use the HDFS Java API for this.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class filesystemhdfs {
    public static void main(String a[]) {
        UserGroupInformation ugi
            = UserGroupInformation.createRemoteUser("root");
        try {
            ugi.doAs(new PrivilegedExceptionAction<Void>() {
                public Void run() throws Exception {
                    Configuration conf = new Configuration();
                    // fs.default.name (fs.defaultFS in newer Hadoop versions) should match
                    // the corresponding value in your core-site.xml in the Hadoop cluster
                    conf.set("fs.default.name", "hdfs://hostname:9000");
                    conf.set("hadoop.job.ugi", "root");
                    readFile("words/file", conf);
                    return null;
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void readFile(String file, Configuration conf) throws IOException {
        FileSystem fileSystem = FileSystem.get(conf);
        Path path = new Path(file);
        if (!ifExists(path, conf)) {
            System.out.println("File " + file + " does not exist");
            return;
        }
        // Closing the reader also closes the underlying FSDataInputStream
        try (BufferedReader br = new BufferedReader(new InputStreamReader(fileSystem.open(path)))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
        fileSystem.close();
    }

    // Takes conf as a parameter so it checks the same file system as readFile
    public static boolean ifExists(Path source, Configuration conf) throws IOException {
        FileSystem hdfs = FileSystem.get(conf);
        boolean isExists = hdfs.exists(source);
        System.out.println(isExists);
        return isExists;
    }
}
Here I am connecting from a remote machine, which is why I use UserGroupInformation and put the code in the run method of a PrivilegedExceptionAction. If you are running on the cluster's local system, you may not need it. HTH!

It's a bit late to reply, but this should help future readers. The code iterates over your HDFS directory and reads the content of each file.
It uses only Java and the Hadoop client, plus IOUtils.toString from Apache Commons IO.
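import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
Configuration conf = new Configuration();
conf.addResource(new Path("/your/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/your/hadoop/conf/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
FileStatus[] status = fs.listStatus(new Path("hdfs://path/to/your/hdfs/directory"));
for (FileStatus fileStatus : status) {
    // try-with-resources closes each stream after its content has been read
    try (FSDataInputStream inputStream = fs.open(fileStatus.getPath())) {
        String content = IOUtils.toString(inputStream, "UTF-8");
        System.out.println(content);
    }
}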
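The same list-then-open pattern can be tried with only the JDK against a local directory instead of HDFS (an assumption, useful for testing the logic without a cluster): Files.newDirectoryStream stands in for fs.listStatus, and Files.readString for fs.open plus IOUtils.toString. DirectoryReader and readAll are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class DirectoryReader {
    // Reads every regular file directly inside dir as UTF-8 text and returns
    // file name -> content, sorted by file name. Subdirectories are skipped.
    static Map<String, String> readAll(Path dir) throws IOException {
        Map<String, String> contents = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    contents.put(entry.getFileName().toString(),
                                 Files.readString(entry, StandardCharsets.UTF_8));
                }
            }
        }
        return contents;
    }
}
```

Skipping non-regular entries mirrors a check worth adding on HDFS too, since fs.listStatus also returns subdirectories, which fs.open cannot read.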

Related

Java opens URL to save file to specific folder

I'm trying to automate a manual operation. Manually, I have a URL, and when I paste it into the Chrome browser, the browser automatically downloads the PDF file from that URL and saves it to the "download" folder without prompting for any user input. With code, I can accomplish the same thing. However, I would like the code to save the PDF into a specific folder instead of the default "download" folder. Is it possible to do that?
public static void browseURL() {
try {
String url ="mycompanyURL";
System.out.println("url " + url );
Desktop desktop = Desktop.getDesktop();
URI uri = new URI (url);
desktop.browse(uri);
}catch(Exception err) {
System.out.println("exception " + err.getMessage());
}
}
When I had to do that in old versions of Java, I used the following snippet (pure Java, source: Baeldung).
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;

public void streamFromUrl(String downloadUrl, String filePath) throws IOException {
    File file = new File(filePath);
    try (BufferedInputStream in = new BufferedInputStream(new URL(downloadUrl).openStream());
         FileOutputStream fileOutputStream = new FileOutputStream(file)) {
        byte[] dataBuffer = new byte[1024];
        int bytesRead;
        // read returns -1 at end of stream; write only the bytes actually read
        while ((bytesRead = in.read(dataBuffer, 0, 1024)) != -1) {
            fileOutputStream.write(dataBuffer, 0, bytesRead);
        }
    }
}
The code above opens an input stream on the URL and writes the bytes of that stream to a file output stream, so the file can be placed wherever you wish.
Alternatively, many libraries do this in one or two lines (the article I linked shows some of them).
Also, more recent versions of Java (7 and later) offer shorter options:
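import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.*;
public void streamFromUrl(String downloadUrl, String filePath) throws IOException {
    try (InputStream in = new URL(downloadUrl).openStream()) {
        Files.copy(in, Paths.get(filePath), StandardCopyOption.REPLACE_EXISTING); // Paths.get takes the path string, not a File
    }
}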
Depending on your Java version, you can pick either one. Generally, I suggest reading through the Baeldung article and choosing the option that best suits you.
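To address the original question directly (saving into a folder of your choice), the Files.copy variant can resolve the file name against a target directory and create that directory if needed. A sketch under stated assumptions: FolderDownloader and downloadTo are hypothetical names, and the file name is taken from the last segment of the URL path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class FolderDownloader {
    // Downloads url into targetDir, naming the file after the last segment of
    // the URL path, and returns the saved file's absolute path.
    static Path downloadTo(URL url, Path targetDir) throws IOException {
        Files.createDirectories(targetDir); // does nothing if the folder already exists
        String path = url.getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        if (fileName.isEmpty()) {
            throw new IllegalArgumentException("URL has no file name: " + url);
        }
        Path target = targetDir.resolve(fileName);
        try (InputStream in = url.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target.toAbsolutePath();
    }
}
```

For example, downloadTo(new URL("https://example.com/report.pdf"), Path.of("C:/reports")) (a hypothetical URL and folder) would save C:/reports/report.pdf. Unlike Desktop.browse, this never involves the browser, so its download settings do not apply.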
Here you go. It handles redirects, and you can use and modify it as you wish. Have fun with it. It is all native Java (the java.net.http client, Java 11+). I wrote it to download media easily, so it can also download images, videos and documents.
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.Builder;
import java.net.http.HttpResponse.BodyHandlers;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class Downloader {
    public static void download(String url) {
        final HttpClient hc = HttpClient.newHttpClient();
        final Builder requestBuilder = HttpRequest.newBuilder().version(HttpClient.Version.HTTP_1_1);
        // Existing target directory; the file is saved as myfilepath/myfile.pdf
        Path path = Path.of("myfilepath");
        handleGet(hc, "myfile.pdf", url, path, requestBuilder);
    }

    private static void handleGet(
            final HttpClient hc,
            final String fileName,
            final String url,
            final Path filePath,
            final Builder requestBuilder
    ) {
        final HttpRequest request = requestBuilder.uri(URI.create(url)).build();
        hc.sendAsync(request, BodyHandlers.ofInputStream())
            .thenApply(resp -> {
                int sc = resp.statusCode();
                System.out.println("STATUSCODE: " + sc + " for url '" + url + "'");
                if (sc == 302) {
                    System.out.println("Handling 302...");
                    // The Location header may be relative, so resolve it against the current URL
                    String location = resp.headers().firstValue("location").get();
                    String newUrl = URI.create(url).resolve(location).toString();
                    handleGet(hc, fileName, newUrl, filePath, requestBuilder);
                }
                return resp;
            })
            .thenAccept(resp -> {
                int sc = resp.statusCode();
                if (sc >= 200 && sc < 300) {
                    try {
                        // Save as <directory>/<fileName>, overwriting an existing file
                        Files.copy(resp.body(), filePath.resolve(fileName), StandardCopyOption.REPLACE_EXISTING);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                } else if (sc != 302) { // a 302 has already been followed above
                    System.err.println("STATUSCODE: " + sc + " for file " + fileName);
                }
            }).join();
    }
}
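HttpClient can also follow redirects by itself when built with followRedirects(HttpClient.Redirect.NORMAL), which removes the manual 302 handling (NORMAL follows redirects except from HTTPS to HTTP). A synchronous sketch under that assumption; RedirectingDownloader and fetchToFile are hypothetical names, and the caller chooses the full target file path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class RedirectingDownloader {
    // The client follows 3xx redirects such as 301/302 itself, so no manual
    // Location handling is needed. HTTP/1.1 matches the original answer.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Saves the final (post-redirect) response body to target and returns the
    // status code; for non-2xx responses nothing is written.
    static int fetchToFile(String url, Path target) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<InputStream> resp = CLIENT.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (InputStream body = resp.body()) {
            int sc = resp.statusCode();
            if (sc >= 200 && sc < 300) {
                Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
            } else {
                System.err.println("STATUSCODE: " + sc + " for url '" + url + "'");
            }
            return sc;
        }
    }
}
```

Checking the status before copying keeps an error page from being saved under the PDF's name, which is the same reason the original only copies on 2xx.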

How can I read a path from an .ini file

I would like to read a file path in Java. Since the path can change, I want to keep it outside the code so that it can be changed. I have read that an INI file can solve this, so I tried it with the following Java code:
import java.util.*;
import java.io.*;
class readIni {
public static void main(String args[]) {
readIni ini = new readIni();
ini.doit();
}
public void doit() {
try{
Properties p = new Properties();
p.load(new FileInputStream("user.ini"));
p.list(System.out);
}
catch (Exception e) {
System.out.println(e);
}
}
}
My INI file:
file = H:/
Now, the console shows exactly the contents of the INI file and not the contents of the directory. What is wrong?
If you just want to save a file path, consider the following code:
import java.io.*;
File file = new File("H:/whatever.txt"); // "H:\whatever.txt" would not compile: \w is not a valid escape
// Write to the file
try (FileWriter fw = new FileWriter(file)) {
    fw.write("your path goes here");
}
// Read from the file
String path;
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    path = br.readLine();
}
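The question's code only calls p.list(System.out), which prints the loaded properties themselves. To act on the path, read the value with getProperty and then list that directory. A sketch, assuming the key is file as in the question's user.ini; IniDirectoryLister and listFromIni are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

class IniDirectoryLister {
    // Loads key=value pairs (e.g. "file = H:/"), looks up the "file" key and
    // returns the sorted names of the entries in that directory.
    static List<String> listFromIni(Reader ini) throws IOException {
        Properties p = new Properties();
        p.load(ini);
        String dir = p.getProperty("file");
        if (dir == null) {
            throw new IOException("No 'file' entry in the INI file");
        }
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(Path.of(dir.trim()))) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Usage would be try (Reader r = Files.newBufferedReader(Path.of("user.ini"))) { System.out.println(IniDirectoryLister.listFromIni(r)); }. Properties.load treats a backslash as an escape character, so write Windows paths in the file with forward slashes (H:/) or doubled backslashes (H:\\).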

create a temporary file with a specified name in java

I have a byte[] array whose content I want to put into a temporary file.
I have tried to do it like this:
try {
tempFile = File.createTempFile("tmp", null);
FileOutputStream fos = new FileOutputStream(tempFile);
fos.write(sCourrier.getBody());
} catch (IOException e) {
e.printStackTrace();
}
but I want to specify the file name myself instead of having it generated by the JVM.
You can give the location and file name directly, or you can look up the temp directory of the local file system:
String tempDir=System.getProperty("java.io.tmpdir");
Then combine the temp directory with your custom file name:
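import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public static void main(String[] args) {
    String tempDir = System.getProperty("java.io.tmpdir");
    String sCourrier = "sahu";
    // new File(parent, child) adds the separator; java.io.tmpdir may not end with one
    File file = new File(tempDir, "newfile.txt");
    try (FileOutputStream fos = new FileOutputStream(file)) {
        fos.write(sCourrier.getBytes());
    } catch (IOException e) {
        e.printStackTrace();
    }
}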
You can use Guava Files.createTempDir():
File file = new File(Files.createTempDir(), "fileName.txt"); // Files here is com.google.common.io.Files
However, that API is deprecated, and Guava recommends NIO's java.nio.file.Files instead, which takes more parameters:
Path createTempDirectory(String prefix, FileAttribute<?>... attrs)
So it may be better to write your own method:
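import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

File createTempFile(String fileName, String content) throws IOException {
    String dir = System.getProperty("java.io.tmpdir");
    // new File(dir, fileName) adds the separator that java.io.tmpdir may lack
    File file = new File(dir, fileName);
    try (FileOutputStream fos = new FileOutputStream(file)) {
        fos.write(content.getBytes(StandardCharsets.UTF_8));
    }
    return file;
}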
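Following the NIO recommendation, a sketch that creates a fresh, uniquely named directory with Files.createTempDirectory and writes the bytes into a file whose name you choose inside it. Because the directory is new each time, the chosen name cannot collide with an existing file. NamedTempFiles and createNamedTempFile are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class NamedTempFiles {
    // Creates a new uniquely named directory under java.io.tmpdir and writes
    // content to a file with exactly the given name inside it.
    static Path createNamedTempFile(String fileName, byte[] content) throws IOException {
        Path dir = Files.createTempDirectory("named");
        Path file = dir.resolve(fileName);
        Files.write(file, content);
        return file;
    }
}
```

With the question's data this would be NamedTempFiles.createNamedTempFile("courrier.txt", sCourrier.getBody()), where the file name is only an example.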

How to download a file and get the path location locally

I have a URL, e.g. http://downloadplugins.verify.com/Windows/SubAngle.exe.
If I paste it into a browser tab and press Enter, the file (SubAngle.exe) is downloaded and saved in the download folder. This is a manual process, but it can be done with Java code.
I wrote code to get the absolute path from the file name, i.e. SubAngle.exe.
Requirement: download the file from the URL, verify that it has been downloaded, and return the absolute path of the file.
Here, locfile is "http://downloadplugins.verify.com/Windows/SubAngle.exe".
public String downloadAndVerifyFile(String locfile) throws FileNotFoundException {
    File fileLocation = new File(locfile);
    File fileLocation1 = new File(fileLocation.getName());
    String fileLocationPath = null;
    if (fileLocation.exists()) {
        fileLocationPath = fileLocation1.getAbsolutePath();
    }
    else {
        throw new FileNotFoundException("File with name " + locfile + " may not exist at the location");
    }
    return fileLocationPath;
}
An easy, general function that I'm using (Apache Commons IO):
import org.apache.commons.io.FileUtils;
public static void downLoadFile(String fromFile, String toFile) throws MalformedURLException, IOException {
try {
FileUtils.copyURLToFile(new URL(fromFile), new File(toFile), 60000, 60000);
} catch (FileNotFoundException e) {
e.printStackTrace();
System.out.println("exception on: downLoadFile() function: " + e.getMessage());
}
}
Instead of writing all this code, use Apache Commons IO.
Try this:
URL ipURL = new URL("inputURL");
File opFile = new File("outputFile");
FileUtils.copyURLToFile(ipURL, opFile);
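Without Commons IO, the question's full requirement (download with connect and read timeouts, verify the file, return its absolute path) can be sketched with URLConnection and Files.copy. VerifiedDownload and downloadAndVerify are hypothetical names; treating an empty file as a failed download is an assumption, and the timeout plays the role of the 60000 ms values passed to FileUtils.copyURLToFile earlier.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class VerifiedDownload {
    // Downloads url to target with connect/read timeouts (milliseconds),
    // checks that a non-empty regular file now exists and returns its
    // absolute path.
    static String downloadAndVerify(URL url, Path target, int timeoutMillis) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        if (!Files.isRegularFile(target) || Files.size(target) == 0) {
            throw new FileNotFoundException("Download failed or file is empty: " + target);
        }
        return target.toAbsolutePath().toString();
    }
}
```

For the question's file this would be downloadAndVerify(new URL("http://downloadplugins.verify.com/Windows/SubAngle.exe"), Path.of("SubAngle.exe"), 60000).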
Code to download a file from a URL:
import java.net.*;
import java.io.*;
public class DownloadFile {
public static void main(String[] args) throws IOException {
InputStream in = null;
FileOutputStream out = null;
try {
// URL("http://downloadplugins.verify.com/Windows/SubAngle.exe");
System.out.println("Starting download");
long t1 = System.currentTimeMillis();
URL url = new URL(args[0]);
// Open the input and out files for the streams
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
in = conn.getInputStream();
out = new FileOutputStream("YourFile.exe");
// Read data into buffer and then write to the output file
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = in.read(buffer)) != -1) {
out.write(buffer, 0, bytesRead);
}
long t2 = System.currentTimeMillis();
System.out.println("Time for download & save file in millis:"+(t2-t1));
} catch (Exception e) {
// Display or throw the error
System.out.println("Error while executing the program: "
+ e.getMessage());
} finally {
// Close the resources correctly
if (in != null) {
in.close();
}
if (out != null) {
out.close();
}
}
}
}
Change the output file name passed to FileOutputStream ("YourFile.exe") to control where the file is stored.
Source: http://www.devmanuals.com/tutorials/java/corejava/files/java-read-large-file-efficiently.html
The source was modified to read from an HTTP URL instead of a local file.
Output:
java DownloadFile http://download.springsource.com/release/TOOLS/update/3.7.1.RELEASE/e4.5/springsource-tool-suite-3.7.1.RELEASE-e4.5.1-updatesite.zip
Starting download
Time for download & save file in millis:100184

Read created at runtime file with java

I create and write a file in one Java method, and then I want to read this file at runtime in another Java method. But it throws a java.io.FileNotFoundException.
How could I fix this error?
Writer output=null;
File file = new File("train.txt");
output = new BufferedWriter(new FileWriter(file));
output.write(trainVal[0] + "\n");
And the read code:
FileInputStream fstreamItem = new FileInputStream("train.tx");
DataInputStream inItem = new DataInputStream(fstreamItem);
BufferedReader brItem = new BufferedReader(new InputStreamReader(inItem));
String phraseItem;
ArrayList<Double> qiF = new ArrayList<Double>();
while ((phrase = br.readLine()) != null) {
//doing somethinh here
}
Use the correct file name, including the path to the file. Also make sure that nobody deleted or renamed the file between those two calls.
The following is a convenient way to read a file; consider it instead of the traditional stream-based approach.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;

final public class Main
{
    public static void main(String... args)
    {
        File file = new File("G:/myFile.txt"); //Mention your absolute file path here.
        StringBuilder fileContents = new StringBuilder((int)file.length());
        String lineSeparator = System.getProperty("line.separator");
        // try-with-resources closes the Scanner; a missing file is logged instead of causing a NullPointerException
        try (Scanner scanner = new Scanner(file))
        {
            while (scanner.hasNextLine())
            {
                fileContents.append(scanner.nextLine()).append(lineSeparator);
            }
        }
        catch (FileNotFoundException ex)
        {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }
        System.out.println(fileContents); //Displays the file contents directly, no need to loop through.
    }
}
You have a mistake in the file extension in your code.
FileInputStream fstreamItem = new FileInputStream("train.tx");
Should have been
FileInputStream fstreamItem = new FileInputStream("train.txt");
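Besides the typo, the writer in the question's snippet is never closed, so buffered text may not have reached the disk when the file is read. A sketch that keeps the name in one constant so the writer and reader cannot disagree, and closes the writer before reading. TrainFile, TRAIN_FILE, write and read are hypothetical names; parsing to Double mirrors the question's ArrayList<Double>, assuming one number per line.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class TrainFile {
    // One shared name for writer and reader avoids "train.txt" vs "train.tx" typos
    static final Path TRAIN_FILE = Path.of("train.txt");

    static void write(Path file, double... values) throws IOException {
        // try-with-resources flushes and closes the writer before anyone reads the file
        try (BufferedWriter out = Files.newBufferedWriter(file)) {
            for (double v : values) {
                out.write(v + "\n");
            }
        }
    }

    static List<Double> read(Path file) throws IOException {
        List<Double> values = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file)) {
            String line;
            while ((line = in.readLine()) != null) {
                values.add(Double.parseDouble(line.trim()));
            }
        }
        return values;
    }
}
```

Calling TrainFile.write(TrainFile.TRAIN_FILE, trainVal[0]) and later TrainFile.read(TrainFile.TRAIN_FILE) resolves both against the same working directory, so the two methods always refer to the same file.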
