How to compare two .csv files?

/**
* 5 points
*
* Return the price of the stock at the given date as a double. Each line
* of the file contains 3 comma-separated values "date,price,volume" in the
* format "2016-03-23,106.129997,25703500", where the date is YYYY-MM-DD,
* the price is given in USD and the volume is the number of shares traded
* throughout the day.
*
* Note: You don't have to interpret dates for this assignment and you can use
* the String's .equals method to compare dates whenever date comparisons are
* needed.
*
* @param stockFileName The filename containing the prices for a stock for each
* day in 2016
* @param date The date to lookup given in YYYY-MM-DD format
* @return The price of the stock represented in stockFileName on the given date
*/
public static double getPrice(String stockFileName, String date) {
try {
BufferedReader br = new BufferedReader(new FileReader(stockFileName));
String line = "";
String unparsedFile = "";
Double Price;
while ((line = br.readLine()) != null) {
String[] Ans = unparsedFile.split(",");
for (String item : Ans){
if(Ans[1].equals(date)){
double aDouble = Double.parseDouble(Ans[2]);
return aDouble;
}
}
}
br.close();
} catch (IOException ex) {
System.out.println("Error");
}
return Ans;
}
The code I have now will, I assume, compare only one column of the .csv file to the date parameter. How do I make the code look at each individual line, compare [1] of that line to the date parameter, and return [2] of that line?

I think that your code is close to being (functionally) correct. The mistake that you have made is that Java arrays are indexed from zero, not from one. So Ans[1] is actually giving you the second element of the array ... not the first one.
Solution: obvious ... assuming that you understand what I just wrote above!
Once you have fixed the bug(s), you should fix the style issues:
Always start the names of variables with a lower case letter.
Use 4 spaces as your indentation level.
One space after if.
One space between ) and {.
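Putting those fixes together, a corrected version might look like the sketch below. Besides the zero-based indexes, it assumes three further changes that the answer does not spell out: it splits `line` rather than the always-empty `unparsedFile`, it uses try-with-resources so the reader is closed even on an early return, and it returns `Double.NaN` when the date is not found. That sentinel and the class name `StockPrices` are assumptions made for illustration; the assignment may specify something else.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

final class StockPrices {

    /** Returns the price on the given date, or NaN if the date is missing or the file is unreadable. */
    static double getPrice(String stockFileName, String date) {
        // try-with-resources closes the reader on every exit path
        try (BufferedReader br = new BufferedReader(new FileReader(stockFileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Each line is "date,price,volume"
                String[] fields = line.split(",");
                if (fields.length >= 2 && fields[0].equals(date)) {
                    return Double.parseDouble(fields[1]);
                }
            }
        } catch (IOException ex) {
            System.out.println("Error reading " + stockFileName);
        }
        return Double.NaN; // assumption: sentinel meaning "not found"
    }
}
```

The inner for loop from the question is gone because each line yields exactly one date to compare.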

Related

I need to parse integers after a specific character from a list of strings

I have a problem here. I need to get all the numbers from each string in a list of strings.
Let's say one of the strings in the list is "Jhon [B] - 14, 15, 16".
The format of the strings is constant: every string has a maximum of 7 numbers in it, and the numbers are separated by ",". I want to get every number after the "-". I am really confused here; I have tried everything I know of, but I am not getting even close.
public static List<String> readInput() {
final Scanner scan = new Scanner(System.in);
final List<String> items = new ArrayList<>();
while (scan.hasNextLine()) {
items.add(scan.nextLine());
}
return items;
}
public static void main(String[] args) {
final List<String> stats= readInput();
}
}

You could...
Just manually parse the String using things like String#indexOf and String#split (and String#trim)
String text = "Jhon [B] - 14, 15, 16";
int indexOfDash = text.indexOf("-");
// Bail out if there is no dash, or nothing after it
if (indexOfDash < 0 || indexOfDash + 1 >= text.length()) {
return;
}
String trailingText = text.substring(indexOfDash + 1).trim();
String[] parts = trailingText.split(",");
// There's probably a really sweet and awesome
// way to use Streams, but the point is to try
// and keep it simple 😜
List<Integer> values = new ArrayList<>(parts.length);
for (int index = 0; index < parts.length; index++) {
values.add(Integer.parseInt(parts[index].trim()));
}
System.out.println(values);
which prints
[14, 15, 16]
You could...
Make use of a custom delimiter for Scanner for example...
String text = "Jhon [B] - 14, 15, 16";
Scanner parser = new Scanner(text);
parser.useDelimiter(" - ");
if (!parser.hasNext()) {
// This is an error
return;
}
// We know that the string has leading text before the "-"
parser.next();
if (!parser.hasNext()) {
// This is an error
return;
}
String trailingText = parser.next();
parser = new Scanner(trailingText);
parser.useDelimiter(", ");
List<Integer> values = new ArrayList<>(8);
while (parser.hasNextInt()) {
values.add(parser.nextInt());
}
System.out.println(values);
which prints...
[14, 15, 16]
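The first option above mentions that Streams could do the same job. A minimal sketch of that variant follows; the class name `DashNumbers` and method name `numbersAfterDash` are hypothetical, and returning an empty list when there is no dash is an assumption rather than something the question requires.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class DashNumbers {

    /** Parses the comma-separated integers that follow the first '-' in text. */
    static List<Integer> numbersAfterDash(String text) {
        int indexOfDash = text.indexOf('-');
        if (indexOfDash < 0) {
            return Collections.emptyList(); // no dash, so nothing to parse
        }
        return Arrays.stream(text.substring(indexOfDash + 1).split(","))
                .map(String::trim)
                .filter(part -> !part.isEmpty()) // tolerate a dash with no numbers after it
                .map(Integer::valueOf)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(numbersAfterDash("Jhon [B] - 14, 15, 16")); // prints [14, 15, 16]
    }
}
```

Like the loop version, this throws a NumberFormatException if one of the parts is not an integer.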

Or you could use a method that extracts signed or unsigned whole or floating-point numbers from a string. The method below makes use of the String#replaceAll() method:
/**
* This method will extract all signed or unsigned Whole or floating point
* numbers from a supplied String. The numbers extracted are placed into a
* String[] array in the order of occurrence and returned.<br><br>
*
* It doesn't matter if the numbers within the supplied String have leading
* or trailing non-numerical (alpha) characters attached to them.<br><br>
*
* A Locale can also be optionally supplied so to use whatever decimal symbol
* that is desired otherwise, the decimal symbol for the system's current
* default locale is used.
*
* @param inputString (String) The supplied string to extract all the numbers
* from.<br>
*
* @param desiredLocale (Optional - Locale varArgs) If a locale is desired for a
* specific decimal symbol then that locale can be optionally
* supplied here. Only one Locale argument is expected and used
* if supplied.<br>
*
* @return (String[] Array) A String[] array is returned with each element of
* that array containing a number extracted from the supplied
* Input String in the order of occurrence.
*/
public static String[] getNumbersFromString(String inputString, java.util.Locale... desiredLocale) {
// Get the decimal symbol for the current system's locale.
char decimalSeparator = new java.text.DecimalFormatSymbols().getDecimalSeparator();
/* Is there a supplied Locale? If so, set the decimal
separator to that for the supplied locale */
if (desiredLocale != null && desiredLocale.length > 0) {
decimalSeparator = new java.text.DecimalFormatSymbols(desiredLocale[0]).getDecimalSeparator();
}
/* The first replaceAll() replaces every dash (-) that is followed by
whitespace, together with any whitespace before it, with a single
space so that neighbouring numbers are not merged. The second
replaceAll() removes all periods from the input string except those
that are part of a floating point number. The third replaceAll()
replaces every remaining character that is not a digit, a minus sign
or the decimal separator with a space. */
return inputString.replaceAll("\\s*\\-\\s{1,}"," ")
.replaceAll("\\.(?![\\d](\\.[\\d])?)", "")
.replaceAll("[^\\-\\d" + decimalSeparator + "]", " ")
.trim().split("\\s+");
}

Retrieve Line Numbers from Diff Patch Match

I am working on a project that compares two large text file versions (around 5000+ lines of text). The newer version contains potentially new and removed content. It is intended to help detect early changes in text versions as a team receives information from that text.
To solve the problem, I use the diff-match-patch library, which allows me to identify removed and new content. In the first step I search for changes.
private LinkedList<Diff> diffs;

public void compareStrings(String oldText, String newText){
DiffMatchPatch dmp = new DiffMatchPatch();
// Keep the result in a field so the filter methods below can read it
diffs = dmp.diffMain(oldText, newText, false);
}
Then I filter the list by the keywords INSERT/DELETE to get only the new/removed content.
public String showAddedElements(){
String insertions = "";
for(Diff elem: diffs){
if(elem.operation == Operation.INSERT){
insertions = insertions + elem.text + System.lineSeparator();
}
}
return insertions;
}
However, when I output the contents, I sometimes get only fragments such as (o, contr, ler) when only single characters were removed or added. Instead, I would like to output the whole sentence in which a change occurred.
Is there a way to also retrieve the line number from the DiffMatchPatch where the changes occurred?

I have found a solution by using another library for the line extraction. DiffUtils (the DiffUtils class by Dmitry Naumenko) helped me achieve the desired goal.
/**
* Converts a String to a list of lines by dividing the string at linebreaks.
* @param text The text to be converted to a line list
*/
private List<String> fileToLines(String text) {
List<String> lines = new LinkedList<String>();
Scanner scanner = new Scanner(text);
// hasNextLine() pairs with nextLine(); hasNext() would drop trailing blank lines
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
lines.add(line);
}
scanner.close();
return lines;
}
/**
* Starts a line-by-line comparison between two strings. The results are stored
* in an internal list for further processing.
*
* @param firstText The first string to be compared
* @param secondText The second string to be compared
*/
public void startLineByLineComparison(String firstText, String secondText){
List<String> firstString = fileToLines(firstText);
List<String> secondString = fileToLines(secondText);
changes = DiffUtils.diff(firstString, secondString).getDeltas();
}
After that, the changes can be extracted from the list with the following code, where elem.getType() represents the type of difference between the texts (appendLines is a helper method that is not shown here):
/**
* Returns a String filled with all removed content including line position
* @return String with removed content
*/
public String returnRemovedContent(){
String deletions = "";
for(Delta elem: changes){
if(elem.getType() == TYPE.DELETE){
deletions = deletions + appendLines(elem.getOriginal()) + System.lineSeparator();
}
}
return deletions;
}
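If you would rather stay with diff-match-patch, line numbers can in principle be derived by counting line breaks while walking the diff list in order. The sketch below is only an illustration of that idea: `Op` and `Change` are hypothetical stand-ins for the library's Operation enum and Diff class (which expose `operation` and `text` as in the question), and it assumes lines are separated by "\n".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DiffLineNumbers {

    // Hypothetical stand-ins for the library's Operation enum and Diff class.
    enum Op { DELETE, INSERT, EQUAL }

    static final class Change {
        final Op operation;
        final String text;

        Change(Op operation, String text) {
            this.operation = operation;
            this.text = text;
        }
    }

    /** Returns the 1-based line numbers in the NEW text on which inserted text starts. */
    static List<Integer> insertedLineNumbers(List<Change> diffs) {
        List<Integer> lines = new ArrayList<>();
        int line = 1; // current line in the new text
        for (Change c : diffs) {
            if (c.operation == Op.DELETE) {
                continue; // deleted text is not part of the new text
            }
            if (c.operation == Op.INSERT && !lines.contains(line)) {
                lines.add(line);
            }
            // Advance the line counter past this chunk of the new text.
            for (int i = 0; i < c.text.length(); i++) {
                if (c.text.charAt(i) == '\n') {
                    line++;
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Change> diffs = Arrays.asList(
                new Change(Op.EQUAL, "first line\nthe contr"),
                new Change(Op.INSERT, "ol"),
                new Change(Op.EQUAL, "ler\nlast line\n"));
        System.out.println(insertedLineNumbers(diffs)); // prints [2]
    }
}
```

With the line number in hand, the whole changed line can be looked up in the new text split into lines, instead of printing fragments such as "ol".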

How to parse data from mongodb document

I am using a change stream to watch for changes in MongoDB. I retrieve the document in the format below; how do I parse it into strings? I need the values of $oid and name.
The full document is
{"_id": {"$oid": "5c60f87a9ea5deac53457e9c"}, "name": "freddy"}
I am using this Java code:
MongoCursor<ChangeStreamDocument<BasicDBObject>> cursor1 = collection.watch().iterator();
System.out.println("Connection Completely Established 4");
for(int i = 1; i <= 200; i++)
{
ChangeStreamDocument<BasicDBObject> next1 = cursor1.next();
System.out.println("Operation Type is " + next1.getOperationType());
System.out.println("Database Name is " + next1.getDatabaseName());
System.out.println("Full Document is " + next1.getFullDocument());
}

If you know the format of the document string, you can use a helper method to acquire the data you need. The getBetween() method below will retrieve the information you want from the document string you provided. Here is how it might be used:
String docString = "Full Document is {\"_id\": {\"$oid\": \"5c60f87a9ea5deac53457e9c\"}, \"name\": \"freddy\"}";
String oid = getBetween(docString, "$oid\": \"", "\"}")[0];
String name = getBetween(docString, "name\": \"", "\"}")[0];
System.out.println(oid);
System.out.println(name);
The Console Window will display:
5c60f87a9ea5deac53457e9c
freddy
Here is the getBetween() method:
/**
* Retrieves any string data located between the supplied leftString
* parameter and the supplied rightString parameter.<br><br>
* <p>
* This method will return all instances of a substring located between the
* supplied Left String and the supplied Right String which may be found
* within the supplied Input String.<br>
*
* @param inputString (String) The string to look for substring(s) in.
*
* @param leftString (String) What may be to the left side of the substring
* we want within the main input string. Sometimes the
* substring you want may be at the very beginning of a
* string, so there is no Left-String available. In this
* case you would simply pass an empty string ("") to
* this parameter, which informs the method of this
* fact. Null cannot be supplied and will ultimately
* generate a NullPointerException.
*
* @param rightString (String) What may be to the right side of the
* substring we want within the main input string.
* Sometimes the substring you want may be at the very
* end of a string, so there is no Right-String
* available. In this case you would simply pass an
* empty string ("") to this parameter, which informs
* the method of this fact. Null cannot be supplied and
* will ultimately generate a NullPointerException.
*
* @param options (Optional - Boolean - 2 Parameters):<pre>
*
* ignoreLetterCase - Default is false. This option works against the
* string supplied within the leftString parameter
* and the string supplied within the rightString
* parameter. If set to true then letter case is
* ignored when searching for strings supplied in
* these two parameters. If left at default false
* then letter case is not ignored.
*
* trimFound - Default is true. By default this method will trim
* off leading and trailing white-spaces from found
* sub-string items. General sentences which obviously
* contain spaces will almost always give you a white-
* space within an extracted sub-string. By setting
* this parameter to false, leading and trailing white-
* spaces are not trimmed off before they are placed
* into the returned Array.</pre>
*
* @return (1D String Array) Returns a Single Dimensional String Array
* containing all the sub-strings found within the supplied Input
* String which are between the supplied Left String and supplied
* Right String.
*/
public String[] getBetween(String inputString, String leftString, String rightString, boolean... options) {
// Return an empty array (rather than null) if nothing was supplied,
// so callers can safely check the length of the result.
if (inputString.equals("") || (leftString.equals("") && rightString.equals(""))) {
return new String[0];
}
// Prepare optional parameters if any supplied.
// If none supplied then use Defaults...
boolean ignoreCase = false; // Default.
boolean trimFound = true; // Default.
if (options.length > 0) {
if (options.length >= 1) {
ignoreCase = options[0];
}
if (options.length >= 2) {
trimFound = options[1];
}
}
// Remove any ASCII control characters from the
// supplied string (if they exist).
String modString = inputString.replaceAll("\\p{Cntrl}", "");
// Establish a List String Array Object to hold
// our found substrings between the supplied Left
// String and supplied Right String.
List<String> list = new ArrayList<>();
// Use Pattern Matching to locate our possible
// substrings within the supplied Input String.
String regEx = Pattern.quote(leftString)
+ (!rightString.equals("") ? "(.*?)" : "(.*)?")
+ Pattern.quote(rightString);
if (ignoreCase) {
regEx = "(?i)" + regEx;
}
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(modString);
while (matcher.find()) {
// Add the found substrings into the List.
String found = matcher.group(1);
if (trimFound) {
found = found.trim();
}
list.add(found);
}
return list.toArray(new String[0]);
}

Finding the strings in a TreeSet that start with a given prefix

I'm trying to find the strings in a TreeSet<String> that start with a given prefix. I found a previous question asking for the same thing — Searching for a record in a TreeSet on the fly — but the answer given there doesn't work for me, because it assumes that the strings don't include Character.MAX_VALUE, and mine can.
(The answer there is to use treeSet.subSet(prefix, prefix + Character.MAX_VALUE), which gives all strings between prefix (inclusive) and prefix + Character.MAX_VALUE (exclusive), which comes out to all strings that start with prefix except those that start with prefix + Character.MAX_VALUE. But in my case I need to find all strings that start with prefix, including those that start with prefix + Character.MAX_VALUE.)
How can I do this?

To start with, I suggest re-examining your requirements. Character.MAX_VALUE is U+FFFF, which is not a valid Unicode character and never will be; so I can't think of a good reason why you would need to support it.
But if there's a good reason for that requirement, then — you need to "increment" your prefix to compute the least string that's greater than all strings starting with your prefix. For example, given "city", you need "citz". You can do that as follows:
/**
* @param prefix
* @return The least string that's greater than all strings starting with
* prefix, if one exists. Otherwise, returns Optional.empty().
* (Specifically, returns Optional.empty() if the prefix is the
* empty string, or is just a sequence of Character.MAX_VALUE-s.)
*/
private static Optional<String> incrementPrefix(final String prefix) {
final StringBuilder sb = new StringBuilder(prefix);
// remove any trailing occurrences of Character.MAX_VALUE:
while (sb.length() > 0 && sb.charAt(sb.length() - 1) == Character.MAX_VALUE) {
sb.setLength(sb.length() - 1);
}
// if the prefix is empty, then there's no upper bound:
if (sb.length() == 0) {
return Optional.empty();
}
// otherwise, increment the last character and return the result:
sb.setCharAt(sb.length() - 1, (char) (sb.charAt(sb.length() - 1) + 1));
return Optional.of(sb.toString());
}
To use it, you need to use subSet when the above method returns a string, and tailSet when it returns nothing:
/**
* @param allElements - a SortedSet of strings. This set must use the
* natural string ordering; otherwise this method
* may not behave as intended.
* @param prefix
* @return The subset of allElements containing the strings that start
* with prefix.
*/
private static SortedSet<String> getElementsWithPrefix(
final SortedSet<String> allElements, final String prefix) {
final Optional<String> endpoint = incrementPrefix(prefix);
if (endpoint.isPresent()) {
return allElements.subSet(prefix, endpoint.get());
} else {
return allElements.tailSet(prefix);
}
}
See it in action at: http://ideone.com/YvO4b3.
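To see why the choice of upper bound matters, here is a small self-contained comparison of the two bounds on a sample set. The class name `PrefixDemo` and the sample strings are made up for illustration; only `TreeSet.subSet` from the standard library is used.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

final class PrefixDemo {

    /** Sample data: three strings start with "city", two do not. */
    static TreeSet<String> sample() {
        return new TreeSet<>(Arrays.asList(
                "cit", "city", "city hall", "city\uFFFFx", "citz"));
    }

    public static void main(String[] args) {
        TreeSet<String> set = sample();

        // Upper bound prefix + Character.MAX_VALUE misses "city\uFFFFx".
        SortedSet<String> naive = set.subSet("city", "city" + Character.MAX_VALUE);
        System.out.println(naive.size()); // prints 2

        // Upper bound "citz" (the incremented prefix) is exclusive,
        // so "citz" itself is left out while "city\uFFFFx" is kept.
        SortedSet<String> all = set.subSet("city", "citz");
        System.out.println(all.size()); // prints 3
    }
}
```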

If anybody is looking for a shorter version of ruakh's answer:
The first element is set.ceiling(prefix). For the last element, increment the prefix and use set.lower(nextPrefix), which returns the greatest element strictly below the incremented prefix (set.floor would also return the incremented prefix itself if the set contains it).
public NavigableSet<String> subSetWithPrefix(NavigableSet<String> set, String prefix) {
String first = set.ceiling(prefix);
char[] chars = prefix.toCharArray();
// Note: this does not handle an empty prefix or one ending in Character.MAX_VALUE
if(chars.length>0)
chars[chars.length-1] = (char) (chars[chars.length-1]+1);
String last = set.lower(new String(chars));
if(first==null || last==null || last.compareTo(first)<0)
return new TreeSet<>();
return set.subSet(first, true, last, true);
}

Java - Most efficient way to convert string to double

Hi, I am reading from a text file and saving each line (split by a comma) into an array. The only problem is that most of the elements in the array are double values, whereas two elements are strings. Because of this I had to make the array a String[]. As a result, whenever I want to perform calculations on the double values in the array, I first have to parse them as doubles. I am running 1000+ iterations of these calculations, so my code is constantly parsing the strings into doubles. This is costly and is slowing down my program. Is there a better way to convert the values from the string array to double values, or is there a better approach I should take when saving the lines from the text file? Thanks
Here is what one of the arrays looks like after I have read from the text file:
String[] details = {"24.9", "100.0", "19.2" , "82.0", "Harry", "Smith", "45.0"};
I now need to multiply the first 2 elements and add that to the sum of the 3rd, 4th and 7th elements. In other words, I am only using the numerical elements (which are of course saved as strings):
double score = (Double.parseDouble(details[0]) * Double.parseDouble(details[1])) + Double.parseDouble(details[2]) + Double.parseDouble(details[3]) + Double.parseDouble(details[6]);
I have to do this for every single line in the text file (1000+ lines). As a result, my program is running very slowly. Is there a better way to convert the string values into doubles, or is there a better way I should go about storing them in the first place?
EDIT: I have used a profiler to check which part of the code is the slowest, and it is indeed the code shown above.

Here's an example of generating an input file like the one you describe that's 10000 lines long, then reading it back in and doing the calculation you posted and printing the result to stdout. I specifically disable any buffering when reading the file in order to get the worst possible read performance. I'm also not doing any caching at all, as others have suggested. The entire process, including generating the file, doing the calculation, and printing the results, consistently takes around 520-550 ms. That's hardly "slow", unless you're repeating this same process for hundreds or thousands of files. If you see drastically different performance from this, then maybe it's a hardware problem. A failing hard disk can drop read performance to nearly nothing.
import java.io.*;
import java.util.Random;
public class ReadingDoublesFromFileEfficiency {
private static Random random = new Random();
public static void main(String[] args) throws IOException {
long start = System.currentTimeMillis();
String filePath = createInputFile();
BufferedReader reader = new BufferedReader(new FileReader(filePath), 1);
String line;
while ((line = reader.readLine()) != null) {
String[] details = line.split(",");
double score = (Double.parseDouble(details[0]) * Double.parseDouble(details[1])) + Double.parseDouble(details[2]) + Double.parseDouble(details[3]) + Double.parseDouble(details[6]);
System.out.println(score);
}
reader.close();
long elapsed = System.currentTimeMillis() - start;
System.out.println("Took " + elapsed + " ms");
}
private static String createInputFile() throws IOException {
File file = File.createTempFile("testbed", null);
PrintWriter writer = new PrintWriter(new FileWriter(file));
for (int i = 0; i < 10000; i++) {
writer.println(randomLine());
}
writer.close();
return file.getAbsolutePath();
}
private static String randomLine() {
return String.format("%f,%f,%f,%f,%s,%s,%f",
score(), score(), score(), score(), name(), name(), score());
}
private static String name() {
String name = "";
for (int i = 0; i < 10; i++) {
name += (char) (random.nextInt(26) + 97);
}
return name;
}
private static double score() {
return random.nextDouble() * 100;
}
}

You'd do better to create a proper object and store the values in that - this gives you two major benefits, 1) your code will be faster since you avoid needlessly recomputing double values and 2) your code will be clearer, since the fields will be named rather than making calls like details[0] where it's completely unclear what [0] is referring to.
Due to 2) I don't know what the fields are supposed to be, so obviously your class will look different, but the idea's the same:
public class PersonScore {
private double[] multipliers = new double[2];
private double[] summers = new double[3];
private String first;
private String last;
// expects a parsed CSV String
public PersonScore(String[] arr) {
if(arr.length != 7)
throw new IllegalArgumentException("Must pass exactly 7 fields");
multipliers[0] = Double.parseDouble(arr[0]);
multipliers[1] = Double.parseDouble(arr[1]);
// Each summand gets its own slot
summers[0] = Double.parseDouble(arr[2]);
summers[1] = Double.parseDouble(arr[3]);
summers[2] = Double.parseDouble(arr[6]);
first = arr[4];
last = arr[5];
}
public double score() {
double ret = 1;
for(double mult : multipliers)
ret *= mult;
for(double sum : summers)
ret += sum;
return ret;
}
public String toString() {
return first+" "+last+": "+score();
}
}
Notice there's an additional benefit, that the score method is now more robust. Your implementation above hard-coded the fields we wanted to use, but by parsing and storing the fields as structure content, we're able to implement a more readable, more scalable score calculation method.
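If a full class feels heavy, the same parse-once idea can be sketched with plain arrays. The names `ParseOnce`, `parseNumbers` and `score` below are hypothetical, and the column positions are taken from the sample row in the question.

```java
final class ParseOnce {

    /** Parses the numeric columns (0 to 3 and 6) of one CSV row a single time. */
    static double[] parseNumbers(String[] details) {
        int[] numericColumns = {0, 1, 2, 3, 6};
        double[] values = new double[numericColumns.length];
        for (int i = 0; i < numericColumns.length; i++) {
            values[i] = Double.parseDouble(details[numericColumns[i]]);
        }
        return values;
    }

    /** The question's formula, applied to already-parsed values. */
    static double score(double[] v) {
        return v[0] * v[1] + v[2] + v[3] + v[4];
    }

    public static void main(String[] args) {
        String[] details = {"24.9", "100.0", "19.2", "82.0", "Harry", "Smith", "45.0"};
        double[] parsed = parseNumbers(details); // parse once, up front
        double total = 0;
        for (int i = 0; i < 1000; i++) {
            total += score(parsed); // no parsing inside the loop
        }
        System.out.println(total);
    }
}
```

Whether this makes a measurable difference depends on how often each row is reused; for a single pass over the file, as the timing example above suggests, parsing is unlikely to dominate.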
