I've been given the unenviable task of updating someone else's four-year-old, half-finished code in a language that I largely don't know.
Previously, a PowerShell script uploaded all records from Microsoft Access to Salesforce, which required juggling various config files. This was done once a month.
I have changed it to upload all records from the previous two days, and scheduled it to run once a day. Now it only uploads some of the data (the shortfall varies between 60 and 100 records, depending on the table).
I am currently painstakingly comparing the values manually to see whether anything is common between the Microsoft Access data and the Salesforce data.
This is the bit that does the uploading:
CMD /c "call ..\Java\bin\java.exe -Xms1024m -Xmx1256m -cp ..\dataloader-28.0.2-uber.jar -Dsalesforce.config.dir=.\config com.salesforce.dataloader.process.ProcessRunner process.name=Patients"
This is the bit that extracts the Access data to CSV:
Get-AccessData -sql "select * from $tables where DateReceived >= `#$firstDate`#" -connection $db | Export-Csv -Path $outcsv -notype
Also, the script doesn't work if I run it through the PowerShell editor, but it does work if I just run it through the batch file my predecessor has kindly left.
I am currently training to be an application developer and am in my second month.
Now I have been given a task that feels impossible to me.
I have been googling since yesterday but have not found anything.
I have to export the data from a database that I access with SQL Explorer into a .csv file and save it on my machine.
It should eventually run automatically, but for testing it doesn't have to.
I use Eclipse and program in Java. I have seen on the Internet that this works with MySQL, but the database connected in SQL Explorer is DB2.
My plan is to program a solution in Java that updates the CSV every day; that's basically my task.
Sorry if this doesn't fit here on Stack Overflow; I'm totally lost because I'm still in my trial period.
Greetings from Germany
See IBM Data Server Client and driver types.
The IBM Data Server Driver Package, for example, contains the Command Line Processor Plus (clpplus) utility, which can run various Db2 commands, statements and scripts, including the Db2 EXPORT command.
Usage:
From the OS command line:
clpplus -nw user/password@host:port/database @s1.sql
The contents of the s1.sql file used as a parameter above:
SET ECHO ON;
EXPORT TO "full\path\to\my_file.csv" of del
SELECT *
FROM MYTABLE;
EXIT
You may use any valid SELECT statement in the EXPORT command.
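If you would rather do the export from Java directly (the original question mentions Eclipse and Java), a plain JDBC approach works too. Here is a minimal sketch, assuming the IBM Data Server JDBC driver (db2jcc4.jar) is on the classpath; the host, port, database, credentials, table name and output path below are placeholders:

import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class ExportToCsv {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details - replace with your own DB2 host, port, database and login.
        String url = "jdbc:db2://dbhost:50000/MYDB";
        Connection con = DriverManager.getConnection(url, "user", "password");
        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery("SELECT * FROM MYTABLE");
        ResultSetMetaData meta = rs.getMetaData();
        int cols = meta.getColumnCount();

        PrintWriter out = new PrintWriter(new FileWriter("my_file.csv"));
        // Header row with the column names.
        StringBuilder header = new StringBuilder();
        for (int i = 1; i <= cols; i++) {
            if (i > 1) header.append(',');
            header.append(meta.getColumnName(i));
        }
        out.println(header);
        // Data rows; no quoting or escaping here, which is only fine for simple values.
        while (rs.next()) {
            StringBuilder row = new StringBuilder();
            for (int i = 1; i <= cols; i++) {
                if (i > 1) row.append(',');
                String value = rs.getString(i);
                row.append(value == null ? "" : value);
            }
            out.println(row);
        }
        out.close();
        rs.close();
        st.close();
        con.close();
    }
}

Running it once a day is then a job for the Windows Task Scheduler or cron rather than for the Java code itself.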
(Please don't suggest a Hadoop or MapReduce solution, even though it sounds like logically the same thing.)
I have a big file, 70 GB of raw HTML documents, and I need to parse it to get the information I need.
I have dealt with a 10 GB file successfully before, using standard I/O:
cat input_file | python parse.py > output_file
My Python script reads each HTML document (one per line) from standard input and writes the result back to standard output.
from bs4 import BeautifulSoup
import sys

for line in sys.stdin:
    soup = BeautifulSoup(line, "html.parser")
    # extract whatever fields are needed from the parsed document,
    # then write one result line to stdout
    print(soup.get_text())
The code is very simple, but the file I am dealing with now is horribly slow to process on one node. I have a cluster of about 20 nodes, and I am wondering how I could easily distribute this work.
What I have done so far:
split -l 5000 input_file.all input_file_ # I have 60K lines in total in that 70G file
Now the big file has been split into several smaller files:
input_file_aa
input_file_ab
input_file_ac
...
Then I have no problem working with each one of them:
cat input_file_aa | python parser.py > output_file_aa
What I would probably do is scp the input files to each node, do the parsing, and then scp the results back, but there are 10+ nodes and it is tedious to do that manually.
I am wondering how I could easily distribute these files to the other nodes, do the parsing, and move the results back.
I am open to basic shell, Java, or Python solutions. Thanks a lot in advance, and let me know if you need more explanation.
Note: I do have a folder called /bigShare/ that is accessible on every node; its contents are synchronized and stay the same. I don't know how the architect implemented that (NFS? I don't know how to check), but I could put my input file and Python script there, so what remains is how to easily log into those nodes and execute the command.
Btw, I am on Red Hat.
Execute the command remotely, with the piping done on the remote side, then redirect the output of the local ssh command to a local file.
Example:
ssh yourUserName@node1 "cat input_file_node1 | python parser.py" >output_file_node1
If the files have not been copied to the different nodes, then:
ssh yourUserName@node1 "python parser.py" <input_file_node1 >output_file_node1
This assumes that yourUserName has been configured with key-based authentication. Otherwise, you will need to enter your password manually (20 times! :-( ). To avoid this you can use expect, but I strongly suggest setting up key-based authentication. You can do the latter using expect too.
Assuming you want to process each piece of the file on a host of its own: first copy the Python script to the remote hosts, then loop over the remote hosts:
for x in aa ab ac ...; do
  ssh user@remote-$x python yourscript.py <input_file_$x >output_file_$x &
done
If the processing nodes don't have names that are easy to generate, you can create aliases for them in your .ssh/config, for example:
Host remote-aa
    Hostname alpha.int.yourcompany
Host remote-ab
    Hostname beta.int.yourcompany
Host remote-ac
    Hostname gamma.int.yourcompany
This particular use case could be more easily solved by editing /etc/hosts though.
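Since the question also lists Java as an option and mentions the shared /bigShare/ folder, the same fan-out could be driven from Java with ProcessBuilder. This is only a rough sketch under those assumptions; the user name, node names and paths are placeholders:

import java.util.ArrayList;
import java.util.List;

public class DistributeParse {
    public static void main(String[] args) throws Exception {
        // Placeholder node names and chunk suffixes - adjust to your cluster and split files.
        String[] nodes  = {"node1", "node2", "node3"};
        String[] chunks = {"aa", "ab", "ac"};

        List<Process> procs = new ArrayList<Process>();
        for (int i = 0; i < nodes.length; i++) {
            // Each node reads its chunk from the shared /bigShare folder and writes
            // its output there too, so nothing needs to be scp'd back and forth.
            String remoteCmd = "python /bigShare/parser.py"
                    + " < /bigShare/input_file_" + chunks[i]
                    + " > /bigShare/output_file_" + chunks[i];
            ProcessBuilder pb = new ProcessBuilder("ssh", "user@" + nodes[i], remoteCmd);
            pb.inheritIO();
            procs.add(pb.start());
        }
        // Wait for every node to finish.
        for (Process p : procs) {
            p.waitFor();
        }
    }
}

The same caveat about key-based ssh authentication from the first answer applies here as well.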
How can I get the Manufacturer and the Model Number of an XP Home computer? I asked a similar question 3 months ago here. The answers were very helpful, but Windows XP Home Premium Edition does not have wmic or systeminfo. I looked in the registry on a few machines and did not find any consistent patterns.
Do you have any ideas? I'd like to stick with Java and the Command Line.
REG QUERY HKLM\HARDWARE\DESCRIPTION\System\BIOS -v SomeValueName gives you some info about the system, depending on what you use for SomeValueName.
SystemProductName returns the model of my laptop. BaseBoardProduct has the same value, but it's entirely possible that the two will differ on some machines. One of those should give you a model number.
SystemManufacturer and BaseBoardManufacturer have the name of my laptop's manufacturer. Again, the two might differ.
You might be able to get the info by querying HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\OEMInformation, namely the "Model" and "Manufacturer" values. But that looks like info that'd be stored during an OEM install (like when you use Dell's install disc to reinstall Windows on your machine), and may not be present (or may be useless) on home-built systems.
Note, the stuff returned by REG QUERY is in a particular format that you may need to parse. It's not complex, but REG QUERY /? doesn't seem to mention a way to get rid of the headers and the REG_SZ and such that get returned.
(Also note: This is probably obvious to you...but the instant you use Runtime.exec to execute programs to query the Windows registry, you tie yourself to Windows.)
Use Runtime.getRuntime().exec() to execute the appropriate Windows command that inspects the registry, capture the output, and parse it for the information you need.
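For example, a rough Java sketch of that idea (the value names follow the REG QUERY answer above, and the parsing of the "Name REG_SZ Value" output lines is only illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class BiosInfo {
    public static void main(String[] args) throws Exception {
        System.out.println("Manufacturer: " + queryBios("SystemManufacturer"));
        System.out.println("Model:        " + queryBios("SystemProductName"));
    }

    // Runs REG QUERY for one value under the BIOS key and pulls the value out of
    // the "Name    REG_SZ    Value" line. Returns null if the value is not found.
    static String queryBios(String valueName) throws Exception {
        Process p = Runtime.getRuntime().exec(new String[] {
                "reg", "query", "HKLM\\HARDWARE\\DESCRIPTION\\System\\BIOS", "/v", valueName});
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String result = null;
        String line;
        while ((line = r.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.startsWith(valueName)) {
                String[] parts = trimmed.split("\\s+", 3);
                if (parts.length == 3) {
                    result = parts[2];
                }
            }
        }
        r.close();
        p.waitFor();
        return result;
    }
}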
I'm testing a schema change over two versions of my app. I used version 1 to generate test data, and now I'd like to take that data into version 2 to run and test the converter. This is easy enough to do live on App Engine, since the datastore stays persistent between versions, but I'm finding that local_db.bin does not survive from one version to the next (maybe this is because the version of the SDK also changes between versions).
I'd like to use appcfg.py to download_data from dev_appserver and then upload_data to the new version, but it seems to be asking me to download each kind of entity individually ("Kind stats are not available on dev_appserver.").
I can write a script that iterates through all of my kinds to use download_ and upload_data. Is there an easier way to transfer data between instances of the dev server?
One inelegant solution:
A bash script to pump the data out:
KINDS="Assessment AssessmentScore Course GradingPeriod GradingPolicy OverallGradeDefinition Standard StandardTag User"
for KIND in $KINDS
do
echo "ugh" | appcfg.py download_data --filename=$KIND --kind=$KIND -email=blagh --url=http://localhost:8888/remote_api --passin --application=myapp
sleep 5
done
And a corresponding script with upload_data to pump it back in. It gets pretty kludgy when you're using bash to drive Python to drive HTTP requests to your Java app!
I've run into a peculiar problem where an HQL query works as expected on Windows but does not on Linux.
Here is the query:
select distinct resource
from Resource resource, ResourceOrganization ro
where (resource.active=true)
  and (resource.published=true)
  and ((resource.resourcePublic=true)
       or ((ro.resource.id=resource.id and ro.organization.id=2)
           and ((ro.resource.id=resource.id and ro.forever=true)
                or (ro.resource.id=resource.id
                    and current_date between ro.startDate and ro.endDate))))
Explanation: I'm fetching resources from the database that are active, published, and either public or shared with an organization, such that the sharing is either forever or between two dates.
I have the same data in both the databases (exported from Linux and imported in Windows).
On Windows I get
Result size = 275
and on Linux I get
Result size = 0
I've looked at the data on Linux, and I can see that I should get a non-zero result size.
Windows has Java 1.5 whereas Linux has Java 1.6
Any suggestions on where I should look to address this problem?
Thanks!
In a SQL command-line tool, enter the SQL one phrase at a time and see when the Linux version goes awry. For best results, do the same thing on Windows.
Make sure the SQL generated is the same on Windows and Linux.
And are you sure they are referring to exactly the same database, and using the same login? (Edit: I re-read and saw "I have the same data". Are you suuuuuure?)
And finally, I see this: and ro.organization.id=2. Are you sure the ID is 2 on both systems? You could get tripped up by the sequence numbers/autokey IDs being different.
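One way to compare the generated SQL on the two machines is to turn on Hibernate's SQL logging. A minimal sketch, assuming you build the SessionFactory from a Configuration object yourself (the property names are standard Hibernate settings):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class SessionFactoryBuilder {
    public static SessionFactory build() {
        Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml
        // Print every SQL statement Hibernate issues, nicely formatted, so the
        // Windows and Linux output can be diffed directly.
        cfg.setProperty("hibernate.show_sql", "true");
        cfg.setProperty("hibernate.format_sql", "true");
        return cfg.buildSessionFactory();
    }
}

If the statements match on both machines, the difference is almost certainly in the data itself (the organization ID, or the dates being compared against current_date).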