August 2007, Bill Wyatt

ALL PROGRAMS ARE IN /home/wyatt/progs/Sloan, unless a starbase program or
otherwise mentioned. Most are ksh93 scripts, with a few sh and C programs
as well.

1) Transfer the data from Fermilab:

   A) Check at sdss.org for the URL of the data directory. Look for a link
      "Data Access", then under that, "http access to DAS", then "imaging",
      then "inchunk_best". For DR6, this is
          http://das.sdss.org/DR6/data/imaging/inchunk_best

   B) Edit that URL into the script "wgetObj.command" - the 'http' variable.

   C) Make a directory somewhere with at least 1.5 TB of space. Then cd into
      it and run "wgetObj.command". That gets all the tsObj*.fit and *.par
      files from the subdirectories on the data server. I have been using
      "/pool/megascr1/wyatt/sdss/DR6/Photo.data". This will take days for the
      entire dataset; for DR6 there are 326,189 FITS files taking 973 GB of
      space.

2) First stage processing

   A) Run "do_tsObjtodb" in the same directory as (1C) above. It is OK to run
      it on another computer, and also to run it before the wget in (1) is
      finished, as it waits for all the data in a stripe to be transferred
      before it processes that stripe. However, if it waits one hour with no
      new stripes completed, it exits and must be restarted.

   B) This program makes a list of all the *.fit files in a directory, then
      calls "tsObjtodb" to create a starbase db file for each FITS file in
      the list. It runs two processes in parallel, each processing one half
      of a stripe. "do_tsObjtodb" runs some sanity checks on the data; for
      example, the number of files in the subdirectories 1, 2, 3, etc. must
      agree with the numbers in the stripe's tsChunk*.par file. The FITS
      binary-table files have names of the form "tsObj*.fit", and, using the
      "fundisp" program from the FUNTOOLS package, "tsObjtodb" converts the
      fields into a starbase db file with the same base name and a ".db"
      extension. Not all the fields in the FITS file are transferred, and
      some extra information is added, e.g.
      the objID as used by SDSS in their database. The resulting db files
      are typically 20% the size of the FITS files.

      N.B.: The unique objID assigned by Sloan is

          (((( skyver * 2048 + rerun) * 65536 + run) * 8 + camcol) * 8192 + field) * 65536 + obj

      The tsObjtodb script uses the C program "llprint" to calculate this
      64-bit value for addition into the starbase table as a string.

3) Second stage processing

   In the directory of stripes, run "dbassemble". This takes each stripe,
   makes a union of each of the 1/, 2/, etc. tables into a master, then
   makes a union of those, named the stripe name plus a .db extension. This
   staging is done because there are so many tables that the command line
   might become too long. Besides, when errors are discovered, it's easier
   to fix a small piece than a large one.

4) Index by objID (physical sort) - requires TMPDIR to be set to a directory
   with ~215 GB of space, e.g. /pool/tdc3.

       index -mb -n PhotoPrim.db dec%0.1 ra

   Then, sort it on a secondary index of objID:

       index -mi PhotoPrim.db objID

5) If you haven't already, make the SpecObjAll.db database (see notes in
   AAAreadme.spectro), physically sorted (indexed) by dec%0.1 and ra, with a
   secondary index by specObjID.

6) Match PhotoPrim.db against SpecObjAll.db, by ra and dec, to extract the
   objID <--> specObjID correspondences:

       ~/progs/Sloan/crossmatch &

   This gives a table, ObjSpecMatch.db, which is further reduced to
   eliminate duplicates, defined as objects with the same objID or
   specObjID within 0.8 arc-sec of each other. The resulting table is
   ObjSpecMatch_nodups.db, used in (7) and (8). See the scripts
   'proc_objdups' and 'proc_specdups' to see how the higher-quality object
   or spectrum is chosen between duplicates.

7) Run edittable on PhotoPrim.db, using the duplicate-pruned
   ObjSpecMatch_nodups.db database, to replace the placeholder
   "000000000000" for specObjID:

       edittable -v PhotoPrim.db objID specObjID