I use PlanBuilder.ModifyPlan to retrieve the contents and the results are in StringHandle().
I see the PlanBuilderBase.ExportablePlanBase but there is no reference as how to use its exportAs method.
This method should be sth like:
ExportablePlan ep = plan.exportAs(String);
Typically, an application wouldn't call exportAs().
Instead, an application would pass the plan to methods of the RowManager class. Internally, the implementation of such methods export the plan for sending to the server.
In particular, the following RowManager methods take a plan and get its result rows or an explanation of the query preparation:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/expression/class-use/PlanBuilder.Plan.html#com.marklogic.client.row
Here is an example of getting result rows:
http://docs.marklogic.com/guide/java/OpticJava#id_93678
RowManager also provides methods for binding parameters of the plan to literal values before sending the plan to the server:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/expression/class-use/PlanBuilder.Plan.html#com.marklogic.client.expression
Examples of edge cases where an application might want to export a plan include:
logging
inserting into a JSON document so an enode script could import a plan without receiving the plan from the client
The exported plan is a JSON document (represented as a String, if the exportAs() method is used). After exporting the plan, the application could process the JSON document in the same way as any other JSON document. For instance, the application could use JSONDocumentManager to write the plan as a document in the content database.
Hoping that helps,
Eurostat data can be downloaded via a REST API. The response format of the API is a XML file formatted according to the SDMX-ML standard. With SAS, very conveniently, one can access XML files with the libname statement and the XML or XMLv2 engine.
Currently, I am using the xmlv2 engine together with the automap= option to generate an xmlmap to access the data. It works. But the resulting SAS data sets are very unstructured, and for another data set to be downloaded the data structure might change. Also the request might depend on the DSD-file that Eurostat provides for each database item within a different XML file.
Here comes the code:
%let path = /your/working/directory/;
filename map "&path.map.txt";
filename resp "&path.resp.txt";
proc http
URL="http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/cdh_e_fos/..PC.FOS1.BE/?startperiod=2005&endPeriod=2011"
METHOD="GET"
OUT=resp;
run;quit;
libname resp XMLv2 automap=REPLACE xmlmap=map;
proc datasets;
copy out=WORK in=resp;
run;quit;
With the code above, you can view all downloaded data in your WORK library. Its a mess.
To download another time series change parameters of the URL according to Eurostat's description.
So here is my question
Is there a way to easily generate a xmlmap from a call to the DSD file so that the data are stored in a well structured way?
As the SDMX-ML standard is widely used in public institutions such as the ECB, Eurostat, OECD... I am wondering if somebody has implemented requests to the databases, already. I know about the tool from Banca Italia which uses a javaObject. However, I was wondering if there might be a solution without the javaObject.
I have about 3200 URLs to small XML files which have some data in the form of strings(obviously).The XML files are displayed(not downloaded) when I go to the URLs. So I need to extract some data from all those XMLs and save it in a single .txt file or XML file or whatever. How can I automate this process?
*Note: This is what the files look like. I need to copy the 'location' and 'title' from all of them and put them in one single file. Using what methodology can this be achieved?
<?xml version="1.0"?>
-<playlist xmlns="http://xspf.org/ns/0/" version="1">
-<tracklist>
<location>http://radiotool.com/fransn.mp3</location>
<title>France, Paris radio 104.5</title>
</tracklist>
</playlist>
*edit: Fixed XML.
It's easy enough with XQuery or XSLT, though the details will depend on how the URLs are held. If they're in a Java List, then (with Saxon at least) you can supply this list as a parameter to the following query:
declare variable urls as xs:string* external;
<data>{
for $u in $urls return doc($u)//*:tracklist
}</data>
The Java code would be something like:
Processor proc = new Processor();
XQueryCompiler c = proc.newXQueryCompiler();
XQueryEvaluator q = c.compile($query).load();
List<XdmItem> urls = new ArrayList();
for (url : inputUrls) {
urls.append(new XdmAtomicValue(url);
}
q.setExternalVariable(new QName("urls"), new XdmValue(urls));
q.setDestination(...)
run();
Have a look at the JSoup library here: http://jsoup.org/
It has facilities for pulling and fixing the contents of a URL, it is intended for HTML though, so I'm not sure it will be good for XML, but it is worth a look.
I have really big JSON file for parsing and managing. My JSON file contains structure like this
[
{"id": "11040548","key1":"keyValue1","key2":"keyValue2","key3":"keyValue3","key4":"keyValue4","key5":"keyValue5","key6":"keyValue6","key7":"keyValue7","key8":"keyValue8","key9":"keyValue9","key10":"keyValue10","key11":"keyValue11","key12":"keyValue12","key13":"keyValue13","key14":"keyValue14","key15":"keyValue15"
},
{"id": "11040549","key1":"keyValue1","key2":"keyValue2","key3":"keyValue3","key4":"keyValue4","key5":"keyValue5","key6":"keyValue6","key7":"keyValue7","key8":"keyValue8","key9":"keyValue9","key10":"keyValue10","key11":"keyValue11","key12":"keyValue12","key13":"keyValue13","key14":"keyValue14","key15":"keyValue15"
},
....
{"id": "11040548","key1":"keyValue1","key2":"keyValue2","key3":"keyValue3","key4":"keyValue4","key5":"keyValue5","key6":"keyValue6","key7":"keyValue7","key8":"keyValue8","key9":"keyValue9","key10":"keyValue10","key11":"keyValue11","key12":"keyValue12","key13":"keyValue13","key14":"keyValue14","key15":"keyValue15"
}
]
My JSON file contains data about topics from news website and practically every day this JSON file will be increased dramatically.
For parsing of that file I use
URL urlLinkSource = new URL(OUTBOX_URL);
urlLinkSourceReader = new BufferedReader(new InputStreamReader(
urlLinkSource.openStream(), "UTF-8"));
ObjectMapper mapper = new ObjectMapper();
List<DataContainerList> DataContainerListData = mapper.readValue(urlLinkSourceReader,new TypeReference<List<DataContainerList>>() { }); //DataContainerList contains id, key1, key2, key3..key15
My problem is that I want to load in this line
List<DataContainerList> DataContainerListData = mapper.readValue(urlLinkSourceReader,new TypeReference<List<DataContainerList>>() { });
only range of JSON object - just first ten object, just second ten object - because I need to display in my app just 10 news in paging mode (all the time I know the index of which 10 I need to display). It totally stuped to load 10 000 objects and to iterate just first 10 of them. So my question is how I can load
in similar way like this one:
List<DataContainerList> DataContainerListData = mapper.readValue(urlLinkSourceReader,new TypeReference<List<DataContainerList>>() { });
only objects with indexes FROM -TO (for example from 30 to 40) without loading of all objects in the entire JSON file?
Regards
It depends of what you mean by "load object with indexes from to", if you want to
Read everything but bind only a sublist
The solution in that case is to read the full stream and only bind values within those indexes.
You can use jacksons streaming api and do it yourself. Parse the stream use a counter to keep track of actual index and then bind to POJOs only what you need.
However this is not a good solution if your file is large and its done in real time.
Read only the data between those indexes
You should do that if your file is big and performance matters. Instead of having a single big file, do the pagination by splitting your json array into multiple files matching your ranges, and then just deserialize the specific file content into your array.
Hope this helps...
Is it possible to get URLs into Nutch directly from a database or a service etc. I'm not interested in the ways which data is taken from the database or service and written to seed.txt.
No. This cannot be done directly with the default nutch codebase. You need to modify Injector.java to achieve that.
EDIT:
Try using DBInputFormat : an InputFormat that reads input data from an SQL table. You need to modify the Inject code here (line 3 in snippet below):
JobConf sortJob = new NutchJob(getConf());
sortJob.setJobName("inject " + urlDir);
FileInputFormat.addInputPath(sortJob, urlDir);
sortJob.setMapperClass(InjectMapper.class);