How to parse data from mongodb document

I am using a change stream to watch for changes in MongoDB. I retrieve the document in the format below; how can I parse it into strings? I need the values of $oid and name.
Full document is
{"_id": {"$oid": "5c60f87a9ea5deac53457e9c"}, "name": "freddy"}
I am using the following Java code:
MongoCursor<ChangeStreamDocument<BasicDBObject>> cursor1 = collection.watch().iterator();
System.out.println("Connection Completely Established 4");
for(int i = 1; i <= 200; i++)
{
ChangeStreamDocument<BasicDBObject> next1 = cursor1.next();
System.out.println("Operation Type is " + next1.getOperationType());
System.out.println("Database Name is " + next1.getDatabaseName());
System.out.println("Full Document is " + next1.getFullDocument());
}

If you know the format of the document string, you can extract the data with a helper method. The getBetween() method below retrieves the values you want from the document string you provided. Here is how it might be used:
String docString = "Full Document is {\"_id\": {\"$oid\": \"5c60f87a9ea5deac53457e9c\"}, \"name\": \"freddy\"}";
String oid = getBetween(docString, "$oid\": \"", "\"}")[0];
String name = getBetween(docString, "name\": \"", "\"}")[0];
System.out.println(oid);
System.out.println(name);
The Console Window will display:
5c60f87a9ea5deac53457e9c
freddy
Here is the getBetween() method:
/**
* Retrieves any string data located between the supplied string leftString
* parameter and the supplied string rightString parameter.<br><br>
* <p>
* <p>
* This method will return all instances of a substring located between the
* supplied Left String and the supplied Right String which may be found
* within the supplied Input String.<br>
*
* @param inputString (String) The string to look for substring(s) in.
*
* @param leftString (String) What may be to the Left side of the substring
* we want within the main input string. Sometimes the
* substring you want may be contained at the very
* beginning of a string and therefore there is no
* Left-String available. In this case you would simply
* pass a Null String ("") to this parameter which
* basically informs the method of this fact. Null can
* not be supplied and will ultimately generate a
* NullPointerException.
*
* @param rightString (String) What may be to the Right side of the
* substring we want within the main input string.
* Sometimes the substring you want may be contained at
* the very end of a string and therefore there is no
* Right-String available. In this case you would simply
* pass a Null String ("") to this parameter which
* basically informs the method of this fact. Null can
* not be supplied and will ultimately generate a
* NullPointerException.
*
* @param options (Optional - Boolean - 2 Parameters):<pre>
*
* ignoreLetterCase - Default is false. This option works against the
* string supplied within the leftString parameter
* and the string supplied within the rightString
* parameter. If set to true then letter case is
* ignored when searching for strings supplied in
* these two parameters. If left at default false
* then letter case is not ignored.
*
* trimFound - Default is true. By default this method will trim
* off leading and trailing white-spaces from found
* sub-string items. General sentences which obviously
* contain spaces will almost always give you a white-
* space within an extracted sub-string. By setting
* this parameter to false, leading and trailing white-
* spaces are not trimmed off before they are placed
* into the returned Array.</pre>
*
* @return (1D String Array) Returns a Single Dimensional String Array
* containing all the sub-strings found within the supplied Input
* String which are between the supplied Left String and supplied
* Right String.
*/
public String[] getBetween(String inputString, String leftString, String rightString, boolean... options) {
// Return nothing if nothing was supplied.
if (inputString.equals("") || (leftString.equals("") && rightString.equals(""))) {
return null;
}
// Prepare optional parameters if any supplied.
// If none supplied then use Defaults...
boolean ignoreCase = false; // Default.
boolean trimFound = true; // Default.
if (options.length > 0) {
if (options.length >= 1) {
ignoreCase = options[0];
}
if (options.length >= 2) {
trimFound = options[1];
}
}
// Remove any ASCII control characters from the
// supplied string (if they exist).
String modString = inputString.replaceAll("\\p{Cntrl}", "");
// Establish a List String Array Object to hold
// our found substrings between the supplied Left
// String and supplied Right String.
List<String> list = new ArrayList<>();
// Use Pattern Matching to locate our possible
// substrings within the supplied Input String.
String regEx = Pattern.quote(leftString)
+ (!rightString.equals("") ? "(.*?)" : "(.*)?")
+ Pattern.quote(rightString);
if (ignoreCase) {
regEx = "(?i)" + regEx;
}
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(modString);
while (matcher.find()) {
// Add the found substrings into the List.
String found = matcher.group(1);
if (trimFound) {
found = found.trim();
}
list.add(found);
}
return list.toArray(new String[0]);
}
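
As a possible alternative to a general-purpose helper, the two fields can also be pulled out with two small regular expressions. This is only a sketch: the class name ChangeStreamFields and its methods are made up for illustration, it assumes the document prints exactly in the format shown in the question, and it does not handle escaped quotes inside name. If the driver's document object offers typed getters, reading the fields from it directly is likely more robust than parsing its string form.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any MongoDB API.
final class ChangeStreamFields {

    // Matches  "$oid": "<24 hex digits>"  and captures the hex digits.
    private static final Pattern OID =
            Pattern.compile("\"\\$oid\"\\s*:\\s*\"([0-9a-fA-F]{24})\"");

    // Matches  "name": "<anything without a double quote>"  and captures the value.
    private static final Pattern NAME =
            Pattern.compile("\"name\"\\s*:\\s*\"([^\"]*)\"");

    static Optional<String> oid(String json) {
        return firstGroup(OID, json);
    }

    static Optional<String> name(String json) {
        return firstGroup(NAME, json);
    }

    // Returns capture group 1 of the first match, or empty if nothing matches.
    private static Optional<String> firstGroup(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

Calling ChangeStreamFields.oid(...) and ChangeStreamFields.name(...) on the document string from the question yields the two values wrapped in an Optional, so a missing field shows up as an empty result rather than an exception.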

Related

I need to parse integers after a specific character from list of strings

I've got a problem here. I need to get all the numbers from each string in a list of strings.
Let's say one of the strings in the list is "Jhon [B] - 14, 15, 16",
and the format of the strings is constant: every string has a maximum of 7 numbers in it, and the numbers are separated by ",". I want to get every number after the "-". I am really confused here; I tried everything I know of, but I am not getting even close.
public static List<String> readInput() {
final Scanner scan = new Scanner(System.in);
final List<String> items = new ArrayList<>();
while (scan.hasNextLine()) {
items.add(scan.nextLine());
}
return items;
}
public static void main(String[] args) {
final List<String> stats= readInput();
}
}

You could...
Just manually parse the String using things like String#indexOf and String#split (and String#trim)
String text = "Jhon [B] - 14, 15, 16";
int indexOfDash = text.indexOf("-");
if (indexOfDash < 0 || indexOfDash + 1 >= text.length()) {
return;
}
String trailingText = text.substring(indexOfDash + 1).trim();
String[] parts = trailingText.split(",");
// There's probably a really sweet and awesome
// way to use Streams, but the point is to try
// and keep it simple 😜
List<Integer> values = new ArrayList<>(parts.length);
for (int index = 0; index < parts.length; index++) {
values.add(Integer.parseInt(parts[index].trim()));
}
System.out.println(values);
which prints
[14, 15, 16]
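
For comparison, here is roughly what a Streams-based version of the same manual parsing could look like. It is a sketch under the same assumptions as the loop above (the first "-" separates the label from the numbers, and every part is a valid integer); the class name DashNumbers is invented for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the split-and-trim approach with Streams.
final class DashNumbers {

    // Returns the integers listed after the first '-' in text,
    // or an empty list when there is no dash or nothing follows it.
    static List<Integer> parse(String text) {
        int indexOfDash = text.indexOf('-');
        if (indexOfDash < 0) {
            return Collections.emptyList();
        }
        return Arrays.stream(text.substring(indexOfDash + 1).split(","))
                .map(String::trim)              // drop the spaces around each part
                .filter(part -> !part.isEmpty()) // ignore empty parts such as a trailing comma
                .map(Integer::valueOf)          // throws NumberFormatException on non-numeric parts
                .collect(Collectors.toList());
    }
}
```

Because it looks for the first dash, a label that itself contains a "-" would need a different split point, for example the " - " sequence.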
You could...
Make use of a custom delimiter for Scanner for example...
String text = "Jhon [B] - 14, 15, 16";
Scanner parser = new Scanner(text);
parser.useDelimiter(" - ");
if (!parser.hasNext()) {
// This is an error
return;
}
// We know that the string has leading text before the "-"
parser.next();
if (!parser.hasNext()) {
// This is an error
return;
}
String trailingText = parser.next();
parser = new Scanner(trailingText);
parser.useDelimiter(", ");
List<Integer> values = new ArrayList<>(8);
while (parser.hasNextInt()) {
values.add(parser.nextInt());
}
System.out.println(values);
which prints...
[14, 15, 16]

Or you could use a method that extracts signed or unsigned whole or floating-point numbers from a string. The method below makes use of the String#replaceAll() method:
/**
* This method will extract all signed or unsigned Whole or floating point
* numbers from a supplied String. The numbers extracted are placed into a
* String[] array in the order of occurrence and returned.<br><br>
*
* It doesn't matter if the numbers within the supplied String have leading
* or trailing non-numerical (alpha) characters attached to them.<br><br>
*
* A Locale can also be optionally supplied so to use whatever decimal symbol
* that is desired otherwise, the decimal symbol for the system's current
* default locale is used.
*
* @param inputString (String) The supplied string to extract all the numbers
* from.<br>
*
* @param desiredLocale (Optional - Locale varArgs) If a locale is desired for a
* specific decimal symbol then that locale can be optionally
* supplied here. Only one Locale argument is expected and used
* if supplied.<br>
*
* @return (String[] Array) A String[] array is returned with each element of
* that array containing a number extracted from the supplied
* Input String in the order of occurrence.
*/
public static String[] getNumbersFromString(String inputString, java.util.Locale... desiredLocale) {
// Get the decimal symbol for the current system's locale.
char decimalSeparator = new java.text.DecimalFormatSymbols().getDecimalSeparator();
/* Is there a supplied Locale? If so, set the decimal
separator to that for the supplied locale */
if (desiredLocale != null && desiredLocale.length > 0) {
decimalSeparator = new java.text.DecimalFormatSymbols(desiredLocale[0]).getDecimalSeparator();
}
/* The first replaceAll() removes every dash (-) that is followed by
whitespace, together with the whitespace around it. The second
replaceAll() removes all periods from the input string except those
that are part of a floating point number. The third replaceAll()
replaces everything else except the actual numbers with a space. */
return inputString.replaceAll("\\s*\\-\\s{1,}","")
.replaceAll("\\.(?![\\d](\\.[\\d])?)", "")
.replaceAll("[^-?\\d+" + decimalSeparator + "\\d+]", " ")
.trim().split("\\s+");
}
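
Since the method above relies on the locale's decimal separator, a locale that uses a comma may leave commas attached to the extracted values. When only whole numbers are needed, a locale-independent variant can be sketched with a Matcher instead. The class name IntegerFinder and the marker parameter are assumptions made for this example, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects whole numbers that appear after a marker string.
final class IntegerFinder {

    // An optional minus sign followed by one or more digits.
    private static final Pattern INTEGER = Pattern.compile("-?\\d+");

    // Returns every integer found after the first occurrence of marker,
    // in order of appearance; empty if the marker is absent.
    static List<Integer> after(String text, String marker) {
        List<Integer> values = new ArrayList<>();
        int at = text.indexOf(marker);
        if (at < 0) {
            return values;
        }
        Matcher matcher = INTEGER.matcher(text.substring(at + marker.length()));
        while (matcher.find()) {
            // parseInt throws NumberFormatException if the digits overflow an int
            values.add(Integer.parseInt(matcher.group()));
        }
        return values;
    }
}
```

Passing " - " as the marker skips the label part of a line such as "Jhon [B] - 14, 15, 16", so digits inside the label are never picked up.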

Retrieve Line Numbers from Diff Patch Match

I am working on a project that compares two large text file versions (around 5000+ lines of text). The newer version contains potentially new and removed content. It is intended to help detect early changes in text versions as a team receives information from that text.
To solve the problem, I use the diff-match-patch library, which allows me to identify already removed and new content. In the first step I search for changes.
public void compareStrings(String oldText, String newText){
DiffMatchPatch dmp = new DiffMatchPatch();
// Store the result in the diffs field so showAddedElements() can read it.
diffs = dmp.diffMain(oldText, newText, false);
}
Then I filter the list by the keywords INSERT/DELETE to get only the new/removed content.
public String showAddedElements(){
String insertions = "";
for(Diff elem: diffs){
if(elem.operation == Operation.INSERT){
insertions = insertions + elem.text + System.lineSeparator();
}
}
return insertions;
}
However, when I output the contents, I sometimes get only fragments, like (o, contr, ler), when only single characters were removed/added. Instead, I would like to output the whole sentence in which a change occurred.
Is there a way to also retrieve the line number from the DiffMatchPatch where the changes occurred?

I have found a solution by using another library for the line extraction. DiffUtils (the DiffUtils class from Dmitry Naumenko's java-diff-utils) helped me achieve the desired goal.
/**
* Converts a String to a list of lines by dividing the string at linebreaks.
* @param text The text to be converted to a line list
*/
private List<String> fileToLines(String text) {
List<String> lines = new LinkedList<String>();
Scanner scanner = new Scanner(text);
// hasNextLine() pairs with nextLine(); hasNext() would drop trailing blank lines.
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
lines.add(line);
}
scanner.close();
return lines;
}
/**
* Starts a line-by-line comparison between two strings. The results are stored
* in an internal list for further processing.
*
* @param firstText The first string to be compared
* @param secondText The second string to be compared
*/
public void startLineByLineComparison(String firstText, String secondText){
List<String> firstString = fileToLines(firstText);
List<String> secondString = fileToLines(secondText);
changes = DiffUtils.diff(firstString, secondString).getDeltas();
}
After the comparison, the changes can be extracted from the list with the following code, where elem.getType() represents the type of difference between the texts:
/**
* Returns a String filled with all removed content including line position
* @return String with removed content
*/
public String returnRemovedContent(){
String deletions = "";
for(Delta elem: changes){
if(elem.getType() == TYPE.DELETE){
deletions = deletions + appendLines(elem.getOriginal()) + System.lineSeparator();
}
}
return deletions;
}
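
If you would rather stay with diff-match-patch, one possible approach is to keep a running character offset while walking the diffs (advancing it by the length of each EQUAL and INSERT text when tracking positions in the new version) and to translate that offset into a line. The sketch below covers only the translation step; the class LineLocator and its method names are hypothetical and belong to neither library.

```java
// Hypothetical helper: maps a character offset in a text to its line.
final class LineLocator {

    // Returns the 1-based number of the line that contains the given offset.
    static int lineOf(String text, int offset) {
        checkOffset(text, offset);
        int line = 1;
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    // Returns the whole line around the given offset, without its terminator.
    static String lineAt(String text, int offset) {
        checkOffset(text, offset);
        int start = text.lastIndexOf('\n', offset - 1) + 1; // 0 when no earlier newline exists
        int end = text.indexOf('\n', offset);
        if (end < 0) {
            end = text.length();
        }
        String line = text.substring(start, end);
        // Strip a carriage return left over from Windows line endings.
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    private static void checkOffset(String text, int offset) {
        if (offset < 0 || offset > text.length()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
    }
}
```

With this, a fragment such as "contr" found at some offset can be reported as its full line plus line number instead of the bare fragment.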

Finding the strings in a TreeSet that start with a given prefix

I'm trying to find the strings in a TreeSet<String> that start with a given prefix. I found a previous question asking for the same thing — Searching for a record in a TreeSet on the fly — but the answer given there doesn't work for me, because it assumes that the strings don't include Character.MAX_VALUE, and mine can.
(The answer there is to use treeSet.subSet(prefix, prefix + Character.MAX_VALUE), which gives all strings between prefix (inclusive) and prefix + Character.MAX_VALUE (exclusive), which comes out to all strings that start with prefix except those that start with prefix + Character.MAX_VALUE. But in my case I need to find all strings that start with prefix, including those that start with prefix + Character.MAX_VALUE.)
How can I do this?

To start with, I suggest re-examining your requirements. Character.MAX_VALUE is U+FFFF, which is not a valid Unicode character and never will be; so I can't think of a good reason why you would need to support it.
But if there's a good reason for that requirement, then — you need to "increment" your prefix to compute the least string that's greater than all strings starting with your prefix. For example, given "city", you need "citz". You can do that as follows:
/**
* @param prefix
* @return The least string that's greater than all strings starting with
* prefix, if one exists. Otherwise, returns Optional.empty().
* (Specifically, returns Optional.empty() if the prefix is the
* empty string, or is just a sequence of Character.MAX_VALUE-s.)
*/
private static Optional<String> incrementPrefix(final String prefix) {
final StringBuilder sb = new StringBuilder(prefix);
// remove any trailing occurrences of Character.MAX_VALUE:
while (sb.length() > 0 && sb.charAt(sb.length() - 1) == Character.MAX_VALUE) {
sb.setLength(sb.length() - 1);
}
// if the prefix is empty, then there's no upper bound:
if (sb.length() == 0) {
return Optional.empty();
}
// otherwise, increment the last character and return the result:
sb.setCharAt(sb.length() - 1, (char) (sb.charAt(sb.length() - 1) + 1));
return Optional.of(sb.toString());
}
To use it, you need to use subSet when the above method returns a string, and tailSet when it returns nothing:
/**
* @param allElements - a SortedSet of strings. This set must use the
* natural string ordering; otherwise this method
* may not behave as intended.
* @param prefix
* @return The subset of allElements containing the strings that start
* with prefix.
*/
private static SortedSet<String> getElementsWithPrefix(
final SortedSet<String> allElements, final String prefix) {
final Optional<String> endpoint = incrementPrefix(prefix);
if (endpoint.isPresent()) {
return allElements.subSet(prefix, endpoint.get());
} else {
return allElements.tailSet(prefix);
}
}
See it in action at: http://ideone.com/YvO4b3.

If anybody is looking for a shorter version of ruakh's answer:
The first element is set.ceiling(prefix); for the last one, increment the prefix and use set.lower(next_prefix), so that next_prefix itself is excluded if it happens to be in the set:
public NavigableSet<String> subSetWithPrefix(NavigableSet<String> set, String prefix) {
String first = set.ceiling(prefix);
char[] chars = prefix.toCharArray();
// Note: this increment wraps around when the last char is Character.MAX_VALUE.
if(chars.length>0)
chars[chars.length-1] = (char) (chars[chars.length-1]+1);
// lower() is strict, so a stored element equal to the incremented prefix is not returned.
String last = set.lower(new String(chars));
if(first==null || last==null || last.compareTo(first)<0)
return new TreeSet<>();
return set.subSet(first, true, last, true);
}
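
Another option that sidesteps the increment entirely is to start iterating at the prefix and stop at the first element that no longer matches. In the natural string ordering all strings sharing a prefix are contiguous and begin at the prefix itself, so this also covers strings containing Character.MAX_VALUE. The sketch below assumes the set uses natural ordering and returns a copy rather than a live view; the class name PrefixScan is made up for the example.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: prefix lookup by scanning from the prefix onwards.
final class PrefixScan {

    // Returns a new set holding every element of all that starts with prefix.
    // Assumes all is sorted by the natural String ordering.
    static SortedSet<String> withPrefix(SortedSet<String> all, String prefix) {
        SortedSet<String> matches = new TreeSet<>();
        // tailSet(prefix) starts at the smallest element >= prefix.
        for (String candidate : all.tailSet(prefix)) {
            if (!candidate.startsWith(prefix)) {
                break; // matches are contiguous, so nothing later can match
            }
            matches.add(candidate);
        }
        return matches;
    }
}
```

The cost is proportional to the number of matches plus the lookup of the starting point, which is usually acceptable unless a live view of the underlying set is required.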

unix 'ls' command with wildcard input - equivalent in java 6

I need an equivalent behaviour in java for this command in Unix cli:
ls /data/archive/users/*/*.xml
Which outputs me:
/data/archive/users/2012/user1.xml
/data/archive/users/2013/user2.xml
Is there a simple equivalent implementation for Java 6?

Get user input using java.util.Scanner and use java.io.File.listFiles(FilenameFilter) method to get the list of files in the folder with specific filter.

Yes, there is, and it's called the list method of the File class. See its Javadoc for details.
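
To make the two suggestions above more concrete, here is a minimal sketch of the two-level listing using only java.io.File and FilenameFilter, written without lambdas or the diamond operator so that it should compile on Java 6. The class name TwoLevelGlob and the suffix parameter are assumptions for illustration; unlike the shell, this version does not skip hidden files.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that approximates a "base / * / *suffix" listing.
final class TwoLevelGlob {

    // Returns the files ending with suffix found in the direct
    // subdirectories of base, sorted by path within each directory.
    static List<File> list(File base, final String suffix) {
        List<File> result = new ArrayList<File>();
        File[] dirs = base.listFiles();
        if (dirs == null) {
            return result; // base does not exist or is not a directory
        }
        Arrays.sort(dirs);
        for (File dir : dirs) {
            if (!dir.isDirectory()) {
                continue;
            }
            File[] matches = dir.listFiles(new FilenameFilter() {
                public boolean accept(File parent, String name) {
                    return name.endsWith(suffix);
                }
            });
            if (matches == null) {
                continue; // directory could not be read
            }
            Arrays.sort(matches);
            result.addAll(Arrays.asList(matches));
        }
        return result;
    }
}
```

For the layout in the question, TwoLevelGlob.list(new File("/data/archive/users"), ".xml") would be the corresponding call.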

I forget where this came from but this should be a good start. There are many more available via Google.
public class RegexFilenameFilter implements FilenameFilter {
/**
* Only file names that match this regex are accepted by this filter
*/
String regex = null; // setting the filter regex to null causes any name to be accepted (same as ".*")
public RegexFilenameFilter() {
}
public RegexFilenameFilter(String filter) {
setWildcard(filter);
}
/**
* Set the filter from a wildcard expression as known from the windows command line
* ("?" = "any character", "*" = "zero or more occurrences of any character")
*
* @param sWild the wildcard pattern
*
* @return this
*/
public RegexFilenameFilter setWildcard(String sWild) {
regex = wildcardToRegex(sWild);
// throw PatternSyntaxException if the pattern is not valid
// this should never happen if wildcardToRegex works as intended,
// so this method does not declare PatternSyntaxException to be thrown
Pattern.compile(regex);
return this;
}
/**
* Set the regular expression of the filter
*
* @param regex the regular expression of the filter
*
* @return this
*/
public RegexFilenameFilter setRegex(String regex) throws java.util.regex.PatternSyntaxException {
this.regex = regex;
// throw PatternSyntaxException if the pattern is not valid
Pattern.compile(regex);
return this;
}
/**
* Tests if a specified file should be included in a file list.
*
* @param dir the directory in which the file was found.
*
* @param name the name of the file.
*
* @return true if and only if the name should be included in the file list; false otherwise.
*/
public boolean accept(File dir, String name) {
boolean bAccept = false;
if (regex == null) {
bAccept = true;
} else {
bAccept = name.toLowerCase().matches(regex);
}
return bAccept;
}
/**
* Converts a windows wildcard pattern to a regex pattern
*
* @param wild - Wildcard pattern containing * and ?
*
* @return - a regex pattern that is equivalent to the windows wildcard pattern
*/
private static String wildcardToRegex(String wild) {
if (wild == null) {
return null;
}
StringBuilder buffer = new StringBuilder();
char[] chars = wild.toLowerCase().toCharArray();
for (int i = 0; i < chars.length; ++i) {
if (chars[i] == '*') {
buffer.append(".*");
} else if (chars[i] == '?') {
buffer.append('.');
} else if (chars[i] == ';') {
buffer.append('|');
} else if ("+()^$.{}[]|\\".indexOf(chars[i]) != -1) {
buffer.append('\\').append(chars[i]); // prefix all metacharacters with backslash
} else {
buffer.append(chars[i]);
}
}
return buffer.toString();
}
}

Here is the code I used, it works with relative and absolute paths:
DirectoryScanner scanner = new DirectoryScanner();
if (!inputPath.startsWith("/") || inputPath.startsWith(".")) {
scanner.setBasedir(".");
}
scanner.setIncludes(new String[]{inputPath});
scanner.setCaseSensitive(false);
scanner.scan();
String[] foundFiles = scanner.getIncludedFiles();
(DirectoryScanner from org.apache.tools.ant)

Why do Strings start with a "" in Java? [duplicate]

Possible Duplicate:
Why does “abcd”.StartsWith(“”) return true?
While debugging some code, I found that a particular piece of my validation was using the .startsWith() method of the String class to check whether a String started with a blank character.
Consider the following:
public static void main(String args[])
{
String s = "Hello";
if (s.startsWith(""))
{
System.out.println("It does");
}
}
It prints out It does
My question is, why do Strings start off with a blank character? I'm presuming that under the hood Strings are essentially character arrays, but in this case I would have thought the first character would be H
Can anyone explain please?

"" is an empty string containing no characters. There is no "empty character", unless you mean a space or the null character, neither of which are empty strings.
You can think of a string as starting with an infinite number of empty strings, just like you can think of a number as starting with an infinite number of leading zeros without any change to the meaning.
1 = ...00001
"foo" = ... + "" + "" + "" + "foo"
Strings also end with an infinite number of empty strings (as do decimal numbers with zeros):
1 = 001.000000...
"foo" = "foo" + "" + "" + "" + ...

It seems there is a misunderstanding in your code. Your statement s.startsWith("") checks whether the string starts with an empty string (not a blank character). It may seem a weird implementation choice, but that is how it is: every string reports that it starts with the empty string.
Also note that a blank character would be the " " string, as opposed to your empty string "".

"Hello" starts with "" and it also starts with "H" and it also starts with "He" and it also starts with "Hel" ... do you see?

That "" is not a blank; it's an empty string. I guess the API is asking the question "is this a substring of that?", and the zero-length empty string is a substring of everything.

The empty String ("") is a prefix of every string. In your example, Java forwards the call
s.startsWith("");
to
s.startsWith("", 0);
which returns true because an empty prefix leaves nothing to compare: the comparison loop in the source below never executes, and the method falls through to return true.
From String.java
/**
* Tests if the substring of this string beginning at the
* specified index starts with the specified prefix.
*
* @param prefix the prefix.
* @param toffset where to begin looking in this string.
* @return <code>true</code> if the character sequence represented by the
* argument is a prefix of the substring of this object starting
* at index <code>toffset</code>; <code>false</code> otherwise.
* The result is <code>false</code> if <code>toffset</code> is
* negative or greater than the length of this
* <code>String</code> object; otherwise the result is the same
* as the result of the expression
* <pre>
* this.substring(toffset).startsWith(prefix)
* </pre>
*/
public boolean startsWith(String prefix, int toffset) {
char ta[] = value;
int to = offset + toffset;
char pa[] = prefix.value;
int po = prefix.offset;
int pc = prefix.count;
// Note: toffset might be near -1>>>1.
if ((toffset < 0) || (toffset > count - pc)) {
return false;
}
while (--pc >= 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;
}

For folks who have taken automata theory, this makes sense because the empty string ε is a substring of any string and is also the identity element for concatenation, i.e.:
for all strings x, ε + x = x, and x + ε = x
So yes, every string "startsWith" the empty string. Also note (as many others have said) that the empty string is different from a blank or null character.
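
The identity property is easy to check in code. The following sketch (the class name EmptyPrefixDemo and its methods are invented for this illustration) tests that every leading substring of a string, starting with the zero-length one, is accepted by startsWith, and that concatenating "" on either side changes nothing.

```java
// Hypothetical demo class; only standard String methods are used.
final class EmptyPrefixDemo {

    // True if s.startsWith(p) holds for every prefix p of s,
    // from the empty string (length 0) up to s itself.
    static boolean everyPrefixMatches(String s) {
        for (int len = 0; len <= s.length(); len++) {
            if (!s.startsWith(s.substring(0, len))) {
                return false;
            }
        }
        return true;
    }

    // True if the empty string acts as an identity for concatenation with s.
    static boolean isConcatIdentity(String s) {
        return ("" + s).equals(s) && (s + "").equals(s);
    }
}
```

By contrast, "Hello".startsWith(" ") is false, because a blank is a real character that "Hello" does not begin with.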

A blank is (" "), that's different from an empty string (""). A blank space is a character, the empty string is the absence of any character.

An empty string is not a blank character. Assuming your question is about the empty string, I guess they decided to leave it that way, but it does seem odd. They could have checked the length, but they didn't.
