Custom SnakeYAML dump styles - java

I want to use custom dump styles in different cases. For example, I have this sample code:
DumperOptions options = new DumperOptions();
options.setDefaultFlowStyle(DumperOptions.FlowStyle.BLOCK);
options.setDefaultScalarStyle(DumperOptions.ScalarStyle.PLAIN);
Yaml yaml = new Yaml(options);
Map<Object, Object> map = new LinkedHashMap<>();
map.put("list", new ArrayList<>(Arrays.asList("entry1", "entry2")));
map.put("multiline", "line 1\nline 2\nline 3");
map.put("oneline", "line");
map.put("oneline-special", "line with #");
map.put("oneline-special #", "line with #");
yaml.dump(map, fileWriter);
Dump result is:
list:
- entry1
- entry2
multiline: |-
  line 1
  line 2
  line 3
oneline: line
oneline-special: 'line with #'
'oneline-special #': 'line with #'
Problem:
I want string values to be double-quoted in every case (key: "value"), and keys to be double-quoted only when necessary ("key": "value"). I also need to keep DumperOptions.ScalarStyle.PLAIN so that multiline strings are still emitted in the pretty block style.
I tried to find anything related to this and found a little information about extending Representer, but it seems that cannot solve my problem of mixed explicit styles (no quotes on the key, but double quotes on the value). I thought about extending Emitter, but it's a final class, so I can't use it without rewriting part of the library.
So, my final result should be:
list:
- "entry1"
- "entry2"
multiline: |-
  line 1
  line 2
  line 3
oneline: "line"
oneline-special: "line with #"
"oneline-special #": "line with #"
number: 512
Any solutions? Need your help. Thanks in advance.

As no other solution was provided, I solved it by directly changing the processScalar() method in the Emitter class. First, I added a check to force double quoting when the scalar is not a key and not multiline (because I want plain style for multiline strings):
if (!simpleKeyContext && !analysis.multiline) {
style = ScalarStyle.DOUBLE_QUOTED;
}
Then changed switch case logic, where in case of SINGLE_QUOTED ScalarStyle we write as double, so, if needed, the key will be written in double quoted style.
I ran JUnit tests with simple key/value pairs and different styles, plus the multiline and list cases. All is right and shiny.

Related

Renaming .fromFilePairs with regex capture group in closure

I'm new to Nextflow/Groovy/Java and I'm running into some difficulty with a simple regular expression task.
I'm trying to alter the labels of some file pairs.
It is my understanding that fromFilePairs returns a data structure of the form:
[
[common_prefix, [file1, file2]],
[common_prefix, [file3, file4]]
]
I further thought that:
The .name method, when invoked on an item from this list, will give the name, which I have labelled above as common_prefix
The value returned by a closure used with fromFilePairs sets the names of the file pairs.
The value of it in a closure used with fromFilePairs is a single item from the list of file pairs.
However, I have tried many variants of the following without success:
params.fastq = "$baseDir/data/fastqs/*_{1,2}_*.fq.gz"
Channel
.fromFilePairs(params.fastq, checkIfExists:true) {
file ->
// println file.name // returned the common file prefix as I expected
mt = file.name =~ /(common)_(prefix)/
// println mt
// # java.util.regex.Matcher[pattern=(common)_(prefix) region=0,47 lastmatch=]
// match objects appear empty despite testing with regexs I know to work correctly including simple stuff like (.*) to rule out issues with my regex
// println mt.group(0) // #No match found
mt.group(0) // or a composition like mt.group(0) + "-" + mt.group(1)
}
.view()
I've also tried some variants of this using the replaceAll method.
I've consulted the documentation for Nextflow, Groovy, and Java, and I still can't figure out what I'm missing. I expect it's some small syntactic thing or a misunderstanding of the data structure, but I'm tired of banging my head against it when it's probably obvious to someone who knows the language better. I'd appreciate anyone who can enlighten me on how this works.
A closure can be provided to the fromFilePairs operator to implement a custom file-pair grouping strategy. It takes a file and should return the grouping key. The example in the docs just groups the files by their file extension:
Channel
.fromFilePairs('/some/data/*', size: -1) { file -> file.extension }
.view { ext, files -> "Files with the extension $ext are $files" }
This isn't necessary if all you want to do is alter the labels of some file pairs. You can use the map operator for this. The fromFilePairs op emits tuples in which the first element is the 'grouping key' of the matching pair and the second element is the 'list of files' (sorted lexicographically):
Channel
.fromFilePairs(params.fastq, checkIfExists:true) \
.map { group_key, files ->
tuple( group_key.replaceAll(/common_prefix/, ""), files )
} \
.view()
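As an aside, the empty-looking matcher in the original question is standard java.util.regex behavior rather than a Nextflow quirk: the Groovy =~ operator creates a lazy Matcher, and group() throws IllegalStateException: No match found until find() (or matches()) has been called and has succeeded. A minimal Java illustration (the file name here is made up):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Matcher mt = Pattern.compile("(common)_(prefix)").matcher("common_prefix_1.fq.gz");
// calling mt.group(0) here would throw IllegalStateException: No match found
if (mt.find()) {                     // run the match first
    System.out.println(mt.group(0)); // common_prefix
    System.out.println(mt.group(1)); // common
}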

How to use Cross Validation in Spark's MLlib in a Java Project with logistic regression?

I want to implement k-fold cross validation for my Java Spark project, which uses MLlib, so that I can calculate the F score. (Here is a link to a pastebin code.) Consider a set of labeled points
JavaRDD<LabeledPoint> allData = ... // some labeled points
where each point is labeled "0" or "1", so it might look like {[1,(2,3)],[0,(4,6)],.... }. I managed to split my data into two parts, training and validation. I created a LogisticRegressionWithLBFGS object that gives me the model
LogisticRegressionModel model = logisticRegression.run(trainingData.rdd());
I assume that before I build the model I need to do cross validation, but I am not quite sure how it is implemented. Conceptually I understand cross validation: it is a method that trains my classifier on the data divided into k parts in order to find the best model.
For the F score I did the following
JavaRDD<Tuple2<Object, Object>> predict = validationData.map(new Function<LabeledPoint, Tuple2<Object, Object>>() {
public Tuple2<Object, Object> call(LabeledPoint point) {
Double prediction = model.predict(point.features());
return new Tuple2<Object, Object>(prediction, point.label());
}
});
BinaryClassificationMetrics metrics = new BinaryClassificationMetrics(predict.rdd());
JavaRDD<Tuple2<Object, Object>> f1Score = metrics.fMeasureByThreshold().toJavaRDD();
But the F score is always split: I get a value for the label "1" and a value for the label "0".
How can I use the cross validation from Mllib? In addition how can I calculate the f score correctly?
Use the CrossValidator in your Spark pipeline: https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaModelSelectionViaCrossValidationExample.java
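A minimal sketch of that approach, adapted from the linked example. Note that CrossValidator belongs to the newer DataFrame-based spark.ml API, so the JavaRDD<LabeledPoint> would first need to become a Dataset<Row> with "label" and "features" columns; the training variable below is assumed to be such a Dataset:
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;

LogisticRegression lr = new LogisticRegression();
Pipeline pipeline = new Pipeline().setStages(new PipelineStage[] { lr });

// hyperparameter candidates tried on every fold
ParamMap[] paramGrid = new ParamGridBuilder()
    .addGrid(lr.regParam(), new double[] { 0.1, 0.01 })
    .build();

CrossValidator cv = new CrossValidator()
    .setEstimator(pipeline)
    .setEvaluator(new BinaryClassificationEvaluator()) // areaUnderROC by default
    .setEstimatorParamMaps(paramGrid)
    .setNumFolds(3); // k = 3 folds

CrossValidatorModel cvModel = cv.fit(training); // trains k times per candidate, keeps the best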

Java associative array sort of thing with more than one data type

imagine this:
DataTypeForConfigs config
with
String keys, but values of either String, Integer, or Boolean,
in Java. JSON can do that, but I'm making a format that goes along the lines of:
number "coolness" is 9001 means
int coolness = 9001;
Its method is: read a line, read each word, work out what to make of it, and set it to a variable within its reach.
Also: what would happen if another thing had its own place to put config? Would a null be read? WHY? Does the constructor think the file has null in it? Rage face.
Say... should I make a class called SettingVar that, when given a getValue() call, would say what it is?
SO:
config["Coolness"].getValue();
returns 9001
WAIT:
How on earth would I make the getValue() method? HOW? THE RETURN VALUE WON'T LIKE THIS!! OH CRAP!
Solution:
Another data type comes in and checks its 'gender' (String, Bool, Int) and then checks its value for that 'gender' (strVal, boolVar, intVar).
Return values are a big problem when dealing with this. I need a stress-free version, so maybe I can have a void-returning method that runs another method based on what data type it is said to hold! Am I right?
I have a temporary solution: setVar works, and getVar is get*Var, where * is Str, Bool or Int.
Sadly, I haven't yet been able to properly read it from a file; the method I made to read from a file is not working. It builds a Map<String, SettingVar> using a HashMap constructor and returns that map, but it seems that whenever I try to access a variable from it, that variable is null. It is probably because of IOExceptions and FileNotFoundExceptions. FileNotFound? Why? It shouldn't be running until called. Oh, and also NullPointerExceptions. Please help!
SUBQUESTION: what happens when you call MapVariable.put({NAME HERE}, varToPutIn) many times in a for loop? What about MapVariable.put({NAME HERE}, new ...)?
My code in links:
https://gist.github.com/anonymous/66c4d1c2d2718a4cc9b9
because I don't have enough reputation
P.S: OK! I've made the config reader work now, and SettingVar, and SettingContainer, and I'm working on ConfigWriter, which is good. Now I'm working on a prototype for a Java command-prompt-like thing, and soon a WHOLE OS!! Wait... Java is an OS. That's why it's the Java Virtual Machine... oh. Well, how can I close this question and turn the outcome into a revolutionary new thingy for kids who want to learn to code Java *cough cough*, especially ones with higher learning ability than social ability... who like to hang around with mature people who don't bully them like all the kids in their school. (Wow, that was specific.)
I would use a Plain Old Java Object which you can read from JSON.
class Config {
int coolness = 9001;
String hello = "world";
boolean cool = true;
}
This way you can have fields with a variety of types.
The type you're looking for is Map<String,Object>, but it is not type-safe and you'll have to do a bunch of casting:
Map<String,Object> config = new HashMap<>();
config.put("coolness",9001);
config.put("hello","world");
config.put("cool", true);
boolean cool = (Boolean) config.get("cool");
String hello = (String) config.get("hello");
int coolness = (Integer) config.get("coolness");
Generally, I'd recommend creating a dedicated class for holding your configuration (each field = one property), which is strongly typed and doesn't require casting, and then use something like Jackson to serialize/deserialize it from json, yaml, or xml.
This provides structure to your configuration, and will cause any issues with malformed configurations to show up when you start your application/load your configuration, and not in the middle of your application.
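A minimal sketch of that Jackson route, assuming jackson-databind is on the classpath, a config.json file such as {"coolness": 9001, "hello": "world", "cool": true}, and public fields (or getters) on Config so Jackson can bind them:
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;

ObjectMapper mapper = new ObjectMapper();
Config config = mapper.readValue(new File("config.json"), Config.class);
System.out.println(config.coolness); // 9001, taken from config.json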
SUBQUESTION: what happens when you call MapVariable.put({NAME HERE}, varToPutIn) many times in a for loop?
A Map represents a mapping. If you do this:
Map<String, Object> map = new HashMap<>();
for (int i = 0; i < 10; i++) {
map.put("myKey", Integer.valueOf(i));
}
what happens is that you add a mapping from "myKey" to zero, then update it to one, two, three and so on. When the loop ends, "myKey" will map to nine.
In short, the map entry for "myKey" is behaving like a variable of type Integer that you assign to repeatedly.
I'm afraid your Gists are telling me that you simply didn't take on board what @Darth Android wrote. Rather than hashing through your code, here's a simple way to parse your config file syntax (more or less) and load it into a Map<String, Object>:
Note: I have not compiled or tested this code. It is written to be read and understood, rather than borrowed.
Map<String, Object> config = new HashMap<>();
try (Scanner s = new Scanner(new FileReader(someFile))) {
    while (s.hasNextLine()) {
        // Syntax is '<type> <name> is <value>'
        String[] words = s.nextLine().split("\\s+");
        if (words.length != 4 || !words[2].equals("is")) {
            throw new MySyntaxException("unrecognizable config");
        }
        String type = words[0];
        String name = words[1];
        String val = words[3];
        switch (type) {
        case "number":
            config.put(name, Integer.valueOf(val));
            break;
        case "boolean":
            config.put(name, Boolean.valueOf(val));
            break;
        case "string":
            config.put(name, val);
            break;
        default:
            throw new MySyntaxException("unknown type");
        }
    }
} catch (NumberFormatException ex) {
    throw new MySyntaxException("invalid number");
}
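For example, given a hypothetical someFile with unquoted names (the split above would keep quote characters as part of the name), the loop leaves these entries in config:
// someFile contents:
//   number coolness is 9001
//   string hello is world
//   boolean cool is true
System.out.println(config.get("coolness")); // 9001 (an Integer)
System.out.println(config.get("hello"));    // world (a String)
System.out.println(config.get("cool"));     // true (a Boolean)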

HashMap single key holding a class: count the key and retrieve counter

I am working on a personal database project. I have an input file obtained from: http://ir.dcs.gla.ac.uk/resources/test_collections/cran/
After processing it into 1400 separate files (each named 00001.txt, ..., 01400.txt) and applying stemming, I store them separately in a specific folder, let's call it StemmedFolder, in the following format:
in StemmedFolder: 00001.txt includes:
investig
aerodynam
wing
slipstream
brenckman
experiment
investig
aerodynam
wing
in StemmedFolder: 00756.txt includes:
remark
eddi
viscos
compress
mix
flow
lu
ting
And so on....
I wrote the code that does the following:
Get the StemmedFolder, count the unique words
Sort alphabetically
Add the ID of the document
Save each to a new file, 00001.txt to 01400.txt, as described below
(I can provide my code for these 4 steps in case somebody needs to see the implementation or suggest an edit.)
The output for each file is written to a separate file (1400 files, each named 00001.txt, 00002.txt, ...) in a specific folder, let's call it FrequencyFolder, with the following format:
in FrequencyFolder: 00001.txt includes:
00001,aerodynam,2
00001,agre,3
00001,angl,1
00001,attack,7
00001,basi,4
....
in FrequencyFolder: 00999.txt includes:
00999,aerodynam,5
00999,evalu,1
00999,lift,3
00999,ratio,2
00999,result,9
....
in FrequencyFolder: 01400.txt includes:
01400,subtract,1
01400,support,1
01400,theoret,1
01400,theori,1
01400,.....
______________
Now my question:
I need to combine these 1400 files again to output a txt file that looks like this format with some calculation:
'aerodynam' totalFrequency=3docs: [[Doc_00001,5],[Doc_01344,4],[Doc_00123,3]]
'book' totalFrequency=2docs: [[Doc_00562,6],[Doc_01111,1]]
....
....
'result' totalFrequency=1doc: [[Doc_00010,5]]
....
....
'zzzz' totalFrequency=1doc: [[Doc_01235,1]]
Thanks for spending time reading this long post
You can use a Map of Lists.
Map<String, List<FileInformation>> statistics = new HashMap<>();
In the above map, the key will be the word and the value will be a List<FileInformation> object describing the statistics of individual files containing the word. The FileInformation class can be declared as follows :
class FileInformation {
int occurrenceCount;
String fileName;
//getters and setters
}
To populate the above Map, use the following steps :
Read each file in the FrequencyFolder
When you come across a word for the first time, put it as a key in the Map.
Create a FileInformation object and set the occurrenceCount to the number of occurrences found and set the fileName to the name of the file it was found in. Add this object in the List<FileInformation> corresponding to the key created in step 2.
The next time you come across the same word in another file, create a new FileInformation object and add it to the List<FileInformation> corresponding to the entry in the map for the word.
Once you have the Map populated, printing the statistics should be a piece of cake.
for(String word : statistics.keySet()) {
List<FileInformation> fileInfos = statistics.get(word);
for(FileInformation fileInfo : fileInfos) {
// sum up the occurrenceCount for the word to get the total frequency
}
}
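And a sketch of the populate steps, assuming each line in a FrequencyFolder file has the form 00001,aerodynam,2 (docId,word,count) and that FileInformation has the usual setters (setter names here are hypothetical):
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;

Map<String, List<FileInformation>> statistics = new HashMap<>();
for (File file : new File("FrequencyFolder").listFiles()) {
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.split(",");         // [docId, word, count]
            FileInformation info = new FileInformation();
            info.setFileName("Doc_" + parts[0]);
            info.setOccurrenceCount(Integer.parseInt(parts[2]));
            // create the list the first time a word is seen, then append
            statistics.computeIfAbsent(parts[1], k -> new ArrayList<>()).add(info);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}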

Eclipse formatter putting method on new line despite enough space in line width

I'm using Eclipse Luna 4.4.0 and Eclipse formatter takes this code:
users = getSingleColUserList(new XSSFWorkbook(fileInputStream),
userId, profCol);
and drops the method call onto a new line:
users =
getSingleColUserList(new XSSFWorkbook(fileInputStream),
userId, profCol);
As you can see, the line width is not the issue. It's not at all obvious what setting in the formatter dialog I need to change.
[UPDATED after Seelenvirtuose's answer]
I can set Eclipse's formatter to Line Wrapping -> Assignments -> Do not wrap. However, that raises another issue: lines then don't get wrapped when they go over the line width:
List<Map<String, Object>> emailMap = jdbcTemplate.queryForList(DBQueries.LOAD_EMAILS);
The line width is 80, which falls at either the s or the . of DBQueries, so it should be:
List<Map<String, Object>> emailMap = jdbcTemplate.queryForList(
DBQueries.LOAD_EMAILS);
None of the settings that I have tested under Line Wrapping -> Function Calls -> Arguments have fixed this.
It's cute that my browser is currently displaying a scrollbar under the unwrapped code!
It is the formatter's setting for "Line Wrapping -> Assignments". Set it to "Do not wrap".
