I am not able to get the properties from an ontology file. I am new to ontologies and Apache Jena, and I am also not sure how to use 'getProperty' correctly.
With the code below I am able to get the classes, but I don't know how to use 'getProperty' and 'listObjectsOfProperty' to get the properties.
package onto1;

import java.io.InputStream;

import org.semarglproject.vocab.OWL;

import com.hp.hpl.jena.ontology.OntClass;
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.ontology.OntModelSpec;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.util.FileManager;
import com.hp.hpl.jena.util.iterator.ExtendedIterator;

public class ontolo {
    public static void main(String[] args) {
        OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        String inputFileName = "file:///F:/apache-jena-2.12.1/travel.owl";
        InputStream in = FileManager.get().open(inputFileName);
        if (in == null) {
            throw new IllegalArgumentException("File: " + inputFileName + " not found");
        }

        //Property hasTime = model.createProperty("file:///F:/apache-jena-2.12.1/shopping.owl#");
        //Property getProperty(inputFileName, hastime);

        model.read(in, null);

        com.hp.hpl.jena.rdf.model.Property irrr = model.getProperty(OWL.ON_PROPERTIES);
        com.hp.hpl.jena.rdf.model.NodeIterator iter1 = model.listObjectsOfProperty(irrr);
        com.hp.hpl.jena.rdf.model.ResIterator i = model.listSubjectsWithProperty(irrr);

        //com.hp.hpl.jena.rdf.model.Statement iir = model.getRequiredProperty(inputFileName, irrr);
        //com.hp.hpl.jena.rdf.model.NodeIterator iter2 = model.listObjectsOfProperty(inputFileName.subClassOf);

        ExtendedIterator<OntClass> iter = model.listClasses();
        while (iter.hasNext()) {
            System.out.println(iter.next().toString());
        }

        // write it to standard out
        model.write(System.out);
    }
}
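For reference, a minimal sketch of how the properties themselves can be listed with the same Jena API (listAllOntProperties() is an OntModel method and listObjectsOfProperty() is inherited from Model; this is meant to be dropped into main() after model.read(...)):

// enumerate every ontology property the model knows about,
// and the objects of the statements that use it
ExtendedIterator<com.hp.hpl.jena.ontology.OntProperty> props = model.listAllOntProperties();
while (props.hasNext()) {
    com.hp.hpl.jena.ontology.OntProperty prop = props.next();
    System.out.println("property: " + prop);
    com.hp.hpl.jena.rdf.model.NodeIterator values = model.listObjectsOfProperty(prop);
    while (values.hasNext()) {
        System.out.println("  object: " + values.next());
    }
}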
Using Java, I need to read a WARC archive file, filter it depending on the content of the HTML pages, and write a new archive file.
The following code reads the archive. How do I reconstruct an org.archive.io.warc.WARCRecordInfo from an org.archive.io.ArchiveRecord?
import org.apache.commons.io.IOUtils;
import org.archive.io.ArchiveRecord;
import org.archive.io.warc.*;
import org.archive.wayback.resourcestore.resourcefile.WarcResource;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;

public class Test126b {
    public static void main(String[] args) throws Exception {
        File out = new File("out.warc.gz");
        OutputStream bos = new BufferedOutputStream(new FileOutputStream(out));
        WARCWriterPoolSettings settings = ...
        WARCWriter writer = new WARCWriter(new AtomicInteger(), bos, out, settings);

        File in = new File("in.warc.gz");
        WARCReader reader = WARCReaderFactory.get(in);
        Iterator<ArchiveRecord> it = reader.iterator();
        while (it.hasNext()) {
            ArchiveRecord archiveRecord = it.next();
            // compare strings with equals(), not ==
            if ("response".equals(archiveRecord.getHeader().getHeaderValue("WARC-Type"))) {
                WARCRecord warcRecord = (WARCRecord) archiveRecord;
                WarcResource warcResource = new WarcResource(warcRecord, reader);
                warcResource.parseHeaders();
                String url = warcResource.getWarcHeaders().getUrl();
                System.out.println("+++ url: " + url);
                byte[] content = IOUtils.toByteArray(warcResource);
                String htmlPage = new String(content);
                if (htmlPage.contains("hello world")) {
                    writer.writeRecord(warcRecordInfo); // how to reconstruct the WARCRecordInfo?
                }
            }
        }
        reader.close();
        writer.close();
    }
}
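For what it's worth, a sketch of how the replacement record might be built inside the if block. The setters below (setType, setUrl, setMimetype, setRecordId, setContentStream, setContentLength) are from memory of the webarchive-commons WARCRecordInfo API, so treat this as an assumption to verify against the version you use:

// ASSUMPTION: WARCRecordType is the nested enum in org.archive.format.warc.WARCConstants,
// and ArchiveRecordHeader exposes getMimetype()/getRecordIdentifier()
WARCRecordInfo info = new WARCRecordInfo();
info.setType(WARCConstants.WARCRecordType.response);
info.setUrl(url);
info.setMimetype(warcRecord.getHeader().getMimetype());
info.setRecordId(new java.net.URI(warcRecord.getHeader().getRecordIdentifier()));
info.setContentStream(new java.io.ByteArrayInputStream(content));
info.setContentLength(content.length);
writer.writeRecord(info);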
First, I will explain what I am trying to do. I feed one input file (the first CSV) into the MapReduce job, and the other CSV file is read inside the mapper class. But here is the thing: the code in the mapper class does not work properly, specifically the excerpt shown below. I want to combine the two CSV files so that I can use several columns from each.
For example, file 1 has BibNum (user account), checkoutdatetime (book checkout datetime), and itemtype (book item type), while the second CSV file has BibNum (user account), Title (book title), Itemtype, and so on. I want to find out which books are likely to be borrowed in the coming month. I would really appreciate it if you could show me a way to combine the two CSV files. If anything in my code is unclear, just let me know and I will try to clarify it.
Path p = new Path("hdfs://0.0.0.0:8020/user/training/Inventory_Sample");
FileSystem fs = FileSystem.get(conf);
BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(p)));
try {
    String BibNum = "Test";
    //System.out.print("test");
    while (br.readLine() != null) {
        //System.out.print("test");
        if (!br.readLine().startsWith("BibNumber")) {
            String subject[] = br.readLine().split(",");
            BibNum = subject[0];
        }
    }

The full mapper class:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StubMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text outkey = new Text();
    //private MinMaxCountTuple outTuple = new MinMaxCountTuple();
    //String csvFile = "hdfs://user/training/Inventory_Sample";

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        Configuration conf = context.getConfiguration();
        //conf.addResource("/etc/hadoop/conf/core-site.xml");
        //conf.addResource("/etc/hadoop/conf/hdfs-site.xml");

        Path p = new Path("hdfs://0.0.0.0:8020/user/training/Inventory_Sample");
        FileSystem fs = FileSystem.get(conf);
        BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(p)));
        try {
            String BibNum = "Test";
            //System.out.print("test");
            // note: every call to readLine() consumes a line, so this loop
            // reads up to three different lines per iteration
            while (br.readLine() != null) {
                //System.out.print("test");
                if (!br.readLine().startsWith("BibNumber")) {
                    String subject[] = br.readLine().split(",");
                    BibNum = subject[0];
                }
            }
            if (value.toString().startsWith("BibNumber")) {
                return;
            }
            String data[] = value.toString().split(",");
            String BookType = data[2];
            String DateTime = data[5];
            SimpleDateFormat frmt = new SimpleDateFormat("MM/dd/yyyy hh:mm:ss a");
            Date creationDate = frmt.parse(DateTime);
            frmt.applyPattern("dd-MM-yyyy");
            String dateTime = frmt.format(creationDate);
            //outkey.set(BookType + " " + dateTime);
            outkey.set(BibNum + " " + BookType + " " + dateTime);
            //outUserId.set(userId);
            context.write(outkey, new IntWritable(1));
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } finally {
            br.close();
        }
    }
}
You are reading the CSV file in the mapper code.
If you are opening the file by its path in the mapper, I guess you are using the Distributed Cache; only then would the file be shipped with the jar to each node where the map tasks run.
There is a way to combine the two files, but not in the mapper.
You can try the approach below:
1) Write two separate mappers for the two different files.
2) Send only the fields you need from the mappers to the reducer.
3) Combine the results in the reducer (since you want to join on a specific key).
You can check out MultipleInputs examples for more.
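A minimal driver sketch of that approach, using Hadoop's MultipleInputs (CheckoutMapper, InventoryMapper, JoinReducer and the paths are illustrative names for pieces you would write yourself):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JoinDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "csv join");
        job.setJarByClass(JoinDriver.class);
        // one mapper per CSV; both emit (BibNumber, tagged record)
        MultipleInputs.addInputPath(job, new Path(args[0]),
            TextInputFormat.class, CheckoutMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
            TextInputFormat.class, InventoryMapper.class);
        // the reducer joins the two record types on the shared BibNumber key
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}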
I have the following three classes:
I tried making routines out of 1) and 2) and used tJava to call the main class and the method from them, but I am unable to call those methods.
1)
package page_scraper;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.FrameWindow;
import com.gargoylesoftware.htmlunit.html.HtmlButtonInput;
import com.gargoylesoftware.htmlunit.html.HtmlOption;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSelect;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import java.io.FileWriter;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;

public class PageScraper {
    public void Scrape() throws IOException {
        try {
            UnitArray object = new UnitArray();
            ArrayList<String> unitList = object.getUnitArray();
            WebClient webClient = new WebClient();
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
            HtmlPage page = (HtmlPage) webClient.getPage("http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMUData");
            List<FrameWindow> frames = page.getFrames();
            HtmlPage page1 = (HtmlPage) frames.get(0).getEnclosedPage();
            HtmlTextInput settlementDay = (HtmlTextInput) page1.getHtmlElementById("param5");
            HtmlSelect period = (HtmlSelect) page1.getHtmlElementById("param6");
            HtmlOption periodOption = period.getOption(1);
            HtmlTextInput unitId = (HtmlTextInput) page1.getHtmlElementById("param1");
            HtmlButtonInput button = (HtmlButtonInput) page1.getHtmlElementById("go_button");
            String outputLocation = String.valueOf(System.getProperty("user.home")) + "/Documents/output.csv";
            FileWriter fileWriter = new FileWriter(outputLocation);
            String errorLocation = String.valueOf(System.getProperty("user.home")) + "/Documents/error.csv";
            FileWriter errorWriter = new FileWriter(errorLocation);
            int i = 0;
            while (i < unitList.size()) {
                int x = 0;
                while (x < 365) {
                    String errorData;
                    SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
                    Calendar cal = Calendar.getInstance();
                    cal.add(Calendar.DAY_OF_MONTH, -x);
                    String dateValue = dateFormat.format(cal.getTime());
                    System.out.println(dateValue);
                    settlementDay.setValueAttribute(dateValue);
                    period.setSelectedAttribute(periodOption, true);
                    unitId.setValueAttribute(unitList.get(i));
                    System.out.println(unitList.get(i));
                    try {
                        button.click();
                        HtmlPage page2 = (HtmlPage) frames.get(1).getEnclosedPage();
                        String pageSource = page2.asXml();
                        int firstIndex = pageSource.indexOf("csv=") + 38;
                        int secondIndex = pageSource.indexOf("n\"") + 1;
                        String csvData = pageSource.substring(firstIndex, secondIndex);
                        fileWriter.append(csvData);
                    } catch (ClassCastException e) {
                        errorData = String.valueOf(dateValue) + " " + unitList.get(i) + System.getProperty("line.separator");
                        System.out.println(errorData);
                        errorWriter.append(errorData);
                        ++x; // advance before continue, otherwise the same day is retried forever
                        continue;
                    } catch (StringIndexOutOfBoundsException e) {
                        errorData = String.valueOf(dateValue) + " " + unitList.get(i) + System.getProperty("line.separator");
                        System.out.println(errorData);
                        errorWriter.append(errorData);
                        ++x; // advance before continue, otherwise the same day is retried forever
                        continue;
                    }
                    ++x;
                }
                ++i;
            }
            webClient.close();
            fileWriter.close();
            errorWriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
2)
package page_scraper;

import java.util.ArrayList;
import java.util.Arrays;

public class UnitArray {
    public ArrayList<String> getUnitArray() {
        String csvList = "abc,xyz";
        ArrayList<String> list = new ArrayList<String>(Arrays.asList(csvList.split(",")));
        return list;
    }
}
3)
package page_scraper;

public class main {
    public static void main(String[] args) throws Exception {
        PageScraper test = new PageScraper();
        test.Scrape();
    }
}
I made routines for the above code (1) and 2)) in Talend and then used tJava to call the method, but I was unable to do so. I also tried using a tJava for each and connected them with OnSubjobOk links.
How can I load these classes in Talend and call the method?
Firstly, routine classes in Talend need to be in the routines package:
package routines;

public class PageScraper {
    public void Scrape() {
        System.out.println("PageScraper.Scrape");
    }
}
Secondly, to use it in a job you need to drag and drop the routine onto the open job area.
Then you can use your class this way.
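For example, in a tJava component (a minimal sketch, assuming the PageScraper routine above has been added to the job):

// code of the tJava component: instantiate the routine class and call its method
routines.PageScraper scraper = new routines.PageScraper();
scraper.Scrape();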
You can easily make a jar file that contains the three classes, then load the jar using tLibraryLoad, or include the jar in your routine if you want more reusability.
As suggested in the other answers, you need to define the classes under the routines package.
In case you are using Talend 7.3 and above, right-click on your routine and add it as a dependent package.
Get the routines as a jar, and if you use them in Big Data jobs, you may need tLibraryLoad to package the jar together with the other dependencies.
I have a jar file in which I need to replace a few classes. The problem is that they're located in a directory with a name that is too long for both Arch and Windows to handle.
(So the layout inside the jar file contains directories with impossibly long names.)
I therefore can't physically create the same directory structure and use "jar uf modded.jar com/".
Is there any way/program/command/trick to get those files to end up at that location in that jar file?
You can use the java.util.zip package to read and write a zip file. Since the zip entry names are just strings, the file system's limit on name length is not an obstacle:
import java.io.InputStream;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.LinkedHashMap;
import java.util.Enumeration;
import java.util.Objects;
import java.util.logging.Logger;
import java.util.logging.Level;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
import java.util.zip.ZipException;

public class ZipUpdater {

    private static final Logger logger =
        Logger.getLogger(ZipUpdater.class.getName());

    private final Map<String, Path> changes;

    public ZipUpdater(Map<String, Path> changes) {
        this.changes = new LinkedHashMap<>(
            Objects.requireNonNull(changes, "Change list cannot be null"));
    }

    public void update(Path zipFile)
            throws IOException, ZipException {
        Objects.requireNonNull(zipFile, "Zip file cannot be null");

        Map<String, Path> changesNeeded = new LinkedHashMap<>(changes);

        Path newZipFilePath = Files.createTempFile(null, ".zip");
        try (ZipFile oldZipFile = new ZipFile(zipFile.toFile());
             ZipOutputStream newZipFile = new ZipOutputStream(
                 new BufferedOutputStream(
                     Files.newOutputStream(newZipFilePath)))) {

            String comment = oldZipFile.getComment();
            if (comment != null) {
                newZipFile.setComment(comment);
            }

            Enumeration<? extends ZipEntry> oldEntries = oldZipFile.entries();
            while (oldEntries.hasMoreElements()) {
                ZipEntry entry = oldEntries.nextElement();
                String entryName = entry.getName();
                Path source = changesNeeded.remove(entryName);
                if (source != null) {
                    ZipEntry newEntry = new ZipEntry(entryName);
                    newEntry.setMethod(entry.getMethod());
                    newEntry.setTime(entry.getTime());
                    newEntry.setComment(entry.getComment());
                    newEntry.setExtra(entry.getExtra());
                    newZipFile.putNextEntry(newEntry);

                    logger.log(Level.INFO,
                        "Replacing entry \"{0}\" with contents of file \"{1}\"",
                        new Object[] { entryName, source });
                    Files.copy(source, newZipFile);
                } else {
                    ZipEntry newEntry = new ZipEntry(entry);
                    newZipFile.putNextEntry(newEntry);

                    logger.log(Level.FINE, "Copying entry {0}", entryName);
                    try (InputStream entryData = new BufferedInputStream(
                            oldZipFile.getInputStream(entry))) {
                        int b;
                        while ((b = entryData.read()) >= 0) {
                            newZipFile.write(b);
                        }
                    }
                }
                newZipFile.closeEntry();
            }
        }

        if (!changesNeeded.isEmpty()) {
            throw new IOException("The following entries were not found"
                + " in '" + zipFile + "': " + changesNeeded.keySet());
        }

        Files.move(zipFile, Paths.get(zipFile + ".old"),
            StandardCopyOption.REPLACE_EXISTING);
        Files.move(newZipFilePath, zipFile);
    }

    public static void main(String[] args)
            throws IOException, ZipException {
        if (args.length < 3 || (args.length % 2) != 1 ||
                args[0].equals("-?") ||
                args[0].equalsIgnoreCase("-h") ||
                args[0].equalsIgnoreCase("--help")) {
            System.err.println("Usage:");
            System.err.println(
                "java " + ZipUpdater.class.getName() + " <zipfile>"
                + " <zip-entry> <replacement-file>"
                + " [ <zip-entry> <replacement-file> ] ...");
            System.exit(2);
        }

        Path zipFile = Paths.get(args[0]);

        int argCount = args.length;
        Map<String, Path> changes = new LinkedHashMap<>(argCount / 2);
        for (int i = 1; i < argCount; i += 2) {
            String entry = args[i];
            Path replacement = Paths.get(args[i + 1]);
            changes.put(entry, replacement);
        }

        ZipUpdater updater = new ZipUpdater(changes);
        updater.update(zipFile);
    }
}
There is a text file we read from, and after some small changes we want to write it to another text file. The question is: why do we get different results when we use System.out.println versus pwPaperAuthor.println?
The code is like this:
package cn.com.author;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.StringTokenizer;

//input:  "IndexAuthors1997-2010.txt"
//output: "PaperAuthor1997-2010.txt"
public class PaperAuthors {

    public static void main(String[] args) {
        BufferedReader brIndexAuthors = null;
        BufferedWriter bw = null;
        PrintWriter pwPaperAuthor = null;
        try {
            brIndexAuthors = new BufferedReader(new InputStreamReader(
                new FileInputStream("IndexAuthors1997-2010.txt")));
            // note: bw and pwPaperAuthor are two separate streams opened on
            // the same output file
            bw = new BufferedWriter(new FileWriter(new File(
                "PaperAuthor1997-2010.txt")));
            pwPaperAuthor = new PrintWriter(new OutputStreamWriter(
                new FileOutputStream("PaperAuthor1997-2010.txt")));
            /*
             * line = brIndexAuthors.readLine();
             *
             * element=line.split("#"); String author=null; StringTokenizer st =
             * new StringTokenizer(element[1],","); while(st.hasMoreTokens()) {
             * author = st.nextToken(); pwPaperAuthor.println(element[0] + "+" +
             * author); //~i++; }
             */
            String line = null;
            String element[] = new String[3];
            String author = null;
            int i = 0;
            while ((line = brIndexAuthors.readLine()) != null) {
                element = line.split("##");
                StringTokenizer st = new StringTokenizer(element[1], ",");
                int num = st.countTokens();
                while (st.hasMoreTokens()) {
                    author = st.nextToken();
                    pwPaperAuthor.println(element[0] + "#" + author + "#" + element[2]);
                    bw.write(element[0] + "#" + author + "#" + element[2]);
                    bw.newLine();
                    System.out.println(element[0] + "#" + author + "#" + element[2]);
                    i++;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // note: none of the streams are flushed or closed here
        }
    }
}
Output:
if System.out.println(element[0]+"#"+author+"#"+element[2]); ------> 620850#Henk Ern
if pwPaperAuthor.println(element[0]+"#"+author+"#"+element[2]); -----> 620850#Henk Ernstblock#2001
There's no way you can read a file and write to it in the same loop using the stream-based API. You will have to create a new file and copy everything that stays the same, adding what's new. What you are doing now has unpredictable behavior. If you still want to read and write at the same time, you'll have to use RandomAccessFile, but that's quite a bit more complicated.
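A minimal sketch of that copy approach, with illustrative file names (try-with-resources also guarantees the PrintWriter is flushed and closed, which the code in the question never does):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class CopyWithChanges {
    public static void main(String[] args) throws IOException {
        // read from one file, write the (possibly modified) lines to another
        try (BufferedReader in = new BufferedReader(new FileReader("input.txt"));
             PrintWriter out = new PrintWriter(new FileWriter("output.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line.replace("old", "new")); // whatever change you need
            }
        } // both streams are closed (and the writer flushed) here
    }
}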