WFCAM SCIENCE ARCHIVE

Before contacting wsa-support@roe.ac.uk if you have a query about accessing the WSA please see if it has already been covered in the Q&A listed below. (note: please do not contact individuals in the Wide-Field Astronomy Unit, use wsa-support@roe.ac.uk).

General look-ups between tables *Source and *Detection
Default detection rows and objID or (multiframeID,extNum,seqNum) for look-ups between *Source and *Detection?
Coverting fluxes in WFCAM flat-file FITS catalogues to calibrated magnitudes
Trade-off between completeness and reliability in point source sample selections
Why is my CMD/two-colour diagram so messy?
Issues relating to the use of the ZPN World Co-ordinate System
Compressed image format (CFITSIO RICE compression)
Querying date/time quantities
Querying string quantities
Querying cross-matched catalogue data (e.g. WFCAM v. SDSS)
Coordinates - decimal degrees, sexagesimal, radians
Viewing the ingest history
Obtaining observation dates for catalogue selections
How to compute the median in SQL
Connection problems - port 8080 (wireless network)

General look-ups between tables *Source and *Detection.
In general when querying the *Source table, you may find that certain catalogue quantities that you need are not present. This is because the table design in the WSA is to write only a subset of the most useful attributes available from *Detection into *Source (for efficiency reasons). For example, CASU standard source extraction produces 13 aperture magnitudes for every detection in every passband and these are of course present in *Detection, but in *Source you will only see a subset of 3 per passband - aperMags 1, 3 and 4 for example. How do you query if you are interested in an aperture magnitude other than 1, 3 or 4 - say aperMag2 in the J band for a GPS query? The answer is to do a join query between *Source and *Detection, noting that each row of *Source is a member of a frame set the details of which are stored in *MergeLog (hence the join has to include *MergeLog as well). The SQL syntax is best illustrated by an example:
SELECT ra, dec, jd.aperMag2 as jAperMag2
FROM gpsSource AS s, gpsMergeLog AS l, gpsDetection as jd
WHERE
/* Join source to merge log for the frame set: */
s.frameSetID=l.frameSetID AND
/* Join merge log to detection for the J frame: */
l.jmfID = jd.multiframeID AND l.jeNum = jd.extNum AND
/* Join source to detection to look up the detection required: */
s.jSeqNum = jd.seqNum
Note that the final predicate could be done via objID: ... AND s.jobjID = jd.objID but this is most inadvisable for efficiency reasons (the attribute combination multiframe number, extension number, sequence number is the primary key in *Detection so look-ups are very fast on that combination) and furthermore for the more subtle reasons detailed below.

Default detection rows and objID or (multiframeID,extNum,seqNum) for look-ups between *Source and *Detection.
In the example *Source-to-*Detection join query given above, one might be tempted to use objID as the look-up into *Detection in place of the more long-winded predicate combination involving multiframeID, extNum and seqNum since both the former attribute and the latter 3-attribute combination are unique. Users should be aware, however, that there are default rows in the *Detection table to allow joins between *Source and *Detection that allow selection of *Source rows that have incomplete passband coverage when querying that table in conjunction with a join with *Detection. Since there is now (i.e. from DR4 onwards) a default row in *Detection for every (non-default) multiframe/extension number combination, looking up a row in *Detection via objID alone when that attribute is a default produces many matches for all those default rows in *Detection having that same default value in that table. Those rows could be excluded by an addition WHERE ... AND objID > 0 but this is very inefficient and will be inappropriate if you are trying to select source rows that may have non-detections (and therefore default objIDs) in some passbands. Hence, ALWAYS use the correct relational association of the combination of multiframeID, extNum and seqNum when joining *Source to *Detection, as in the example query given above.

Trade-off between completeness and reliability in point source sample selections
There is always a trade-off between completeness and reliability when using catalogue data, since no image analysis software is perfect. The morphological classification attributes class and classStat allow you to make a choice about how to maximise completeness, or conversely minimise contamination, at query time - for examples of this, consult the SQL scripts defined under "views" in the schema browser. Note that although the integer classification scheme is based on the N(0,1) measure of stellarness (classStat) it has several overrides built in to attempt to make it more reliable. ClassStat is equivalent to a straight likelihood measure based directly on the curve-of-growth of the aperture fluxes. But because the statistic is not strictly speaking a Gaussian distribution and has much broader tails, the Class parameter has assorted overrides invoked based on saturation, ellipticity as a function of magnitude and so on. The most reliable way of getting a reasonably complete stellar-like sample is to require class of -1 and -2 on the bands in question and then if you are concerned about completeness add in any extra sources with ClassStat in the range 0 to +n-sigma; where n is between 3 and 5 depending on how non-stellar you want to push things.

Why is my CMD/two-colour diagram so messy?
When selecting samples of merged sources and displaying in colour-magnitude or multi-colour space, simply selecting all sources from the merged source tables will not yield clean diagrams. As always with survey datasets, there is a balance to be struck between reliability and completeness, and that balance can only be made by the end-user for a given science application. Hence, for example, the pairing radius used in the source merging is very generous in terms of the RMS astrometric precision of WFCAM detections (this is to pull in high proper motion stars, and very low s/n objects with poorly defined centroids). In general, you should always refine sample selections from the merged source lists appropriately for your science usage. So if, for example, you are interested in a highly reliable set of point sources from lasSource that you know are not moving (e.g. QSOs), then you should filter your selection by using the pair proximity offset attributes *Xi and *Eta in your WHERE clause. Examples of this are given in the SQL queries that define the reliable source "views" in the schema browser.

Issues relating to the use of the ZPN World Co-ordinate System
CASU pipeline processed images from WFCAM come with accurate (good to around 0.1 arcsec) astrometry based on 2MASS secondary astrometric standards. The astrometric information is encoded in the FITS file as World Co-ordinate System (WCS) keywords: the WCS type used is the zenithal polynomial (ZPN) projection. Note that some older image display utilities may not compute correctly cursor co-ordinates for this projection, so if there is any suspicion that this may be the case, do not rely on the real-time cursor readout from such software.
Information on the fix for ZPN in IRAF
Terapix Software ZPN howto

Compressed image format (CFITSIO RICE compression)
CASU pipeline processed images are supplied in a compact, compressed format to reduce online storage and network transfer overheads. The format employed is known as Rice tile compression, and is implemented in the CFITSIO library that is used by the core processing software. By default, images are delivered from the WSA in this format: network transfer will be a factor 3 to 4 times quicker, and end-user storage requirements a factor 3 to 4 times smaller, if you download the compressed image files (the compression ratio depends on the noise properties of the image in question; these figures are typical for short WFCAM exposures, as in the UKIDSS Large Area Survey). Hence it makes a great deal of sense to stick to using Rice compressed products as opposed to selecting standard decompressed images (which are also available). The only disadvantage is that the Rice compressd format is not universally supported in applications software at present; note however that it is easy to download and install an image copy utility (imcopy) that can decompress to standard FITS (the Rice compressed images are actually stored as FITS binary tables and not standard FITS images - this is why some image display utilities will not handle them correctly). On a system with Starlink you can compile imcopy using

gcc -I/star/include -L/star/lib -lm imcopy.c -lcfitsio -o imcopy

The WSA imcopy page gives a step by step guide to downloading and installation. CFITSIO Quick Start Guide includes additional informational on installation.

When trying to view or uncompress our FITS files, in some rare cases you may come across one or both of the following error messages:

FITSIO status = 1: non-CFITSIO program error 

decompression error: hit end of compressed byte stream

This is because our FITS file images have been compressed using a newer version of fpack & CFITSIO. The only solution is to first uncompress the FITS file using a version of CFITSIO >= 3.1 (fpack >= 1.0) prior to viewing.

Querying date/time quantities
All dates and time stamps used in the data flow system (i.e. from the telescope/instrument, through the pipeline processing and at the archive end) are universal time co-ordinate (UTC) quantities; the SQL data type for these quantities in the archive is the MS SQL Server datetime type. To query a datetime type, e.g. compare a datetime quantity against a literal constant, simply express that constant as a string in double quotes in the following format:
SELECT count(*)
FROM Multiframe
WHERE dateObs > "2005-Apr-18 14:00:00.0"
will return a count of the number of frames in the archive that have observation dates and times since 14:00:00.0 18th April 2005 (UTC)

Querying string quantities
String quantities (SQL data types char or varchar) can be queried in the WHERE clause of an SQL statement and compared using certain comparators (see the Cookbook) in the same way as numerical quantities. Perhaps the most useful comparator, however, is LIKE. For example, you can select all archived FITS file observations from a given date string in the filename (say 1st April 2005) by the following statement input into the freeform SQL box:
SELECT fileName
FROM Multiframe
WHERE fileName LIKE "%20050401%"
Note the use of double quotes for string literals, and the use of the wildcard percent character to match any string.

Querying cross-matched catalogue data (e.g. WFCAM v. SDSS)
WSA catalogue tables (i.e. those created for the UKIDSS surveys) are automatically cross-matched to a number of external survey catalogue datasets held locally, e.g. SDSS Data Releases; FIRST, IRAS, ROSAT and 2MASS catalogues; and legacy photographic catalogues like USNO-B and the SuperCOSMOS Sky Surveys held within the SuperCOSMOS Science Archive (for a complete list, click on "Browser" on a navigation bar). Rather than prejoining any two datasets to create a static, merged table of what are assumed to be associated sources on the basis of a fixed joining algorithm, the WSA philosophy is to create a table of pointers between any two matched datasets. This means that any externally catalogued source co-incident or nearby a WFCAM source is readily available, out to some predefined maximum angular search radius, and all sorts of science usages are possible because the exact matching criteria (e.g. maximum allowable displacement, consistency in morphological parameters or classification, or even the number of possible external matches for a given WFCAM source) can be tuned at query time by simple expressions in SQL. Furthermore, all attributes of both datasets are available to querying, because no decision has been made as to which attributes to propagate into a static merged set. The flip side to this flexibility is the rather strange overall syntax for querying cross-matched data in SQL: instead of querying a single merged table, e.g. SELECT * FROM MergedTable WHERE ..., in general you have to query three tables: the two cross-matched tables and the table of pointers, e.g.
SELECT *
FROM Table1, ExternalDB..Table2, CrossNeighboursTable2
WHERE ...
A complete tutorial, and example queries employing cross-matched data, are available in Section 4 of the WSA Cookbook.

Coordinates - decimal degrees, sexagesimal, radians:
Coordinates in the WSA are primarily stored as decimal degrees. The region search page does accept sexagesimal format but the results are returned as decimal degrees. Elsewhere e.g. entering constraints in the menu query or freeform SQL pages values must be entered as decimal degrees. Note: RA and Dec values are returned as decimal degrees except when written to FITS files where radians are used

. Viewing the ingest history:
All ingest and curation events that occur during operation of the WFCAM Science Archive are logged in database table ArchiveCurationHistory. If you wish to see, for example, the ingest time stamps and number of images ingested at those times, use the following free-form SQL query:
SELECT timeStamp, framesIngested
FROM ArchiveCurationHistory as A, (SELECT cuEventID, COUNT(*) AS framesIngested FROM Multiframe GROUP BY cuEventID) AS B
WHERE A.cuEventID=B.cuEventID
ORDER BY timeStamp
Note that you will be querying the ingest history of a static, released database product, not the database that is accumulating on a day-to-day basis as a result of routine WSA operations. Hence, for example, data that has been recently transfered from the pipeline processing centre at CASU does not appear immediately in the online accessible database.

Obtaining observation dates for catalogue selections
Observation dates are available for each image in table Multiframe in attribute dateObs. Since every single passband detection originates in an image and each merged source originates from a set of single passband detections, and furthermore the relationship between images, detections and sources is preserved in the relational arrangement of data in the archive, observation dates for catalogue selections are readily available via simple table join expressions in SQL. For example, to select the observation date as an output attribute when querying the lasDetection table, use the following statement:
SELECT ra,dec,dateObs,...
FROM lasDetection AS d, Multiframe AS m
WHERE d.multiframeID=m.multiframeID
The required SQL is a little more complicated in the case of obtaining the observation dates for merged sources, since each source is made up from several detections that have, in general, different observations times (and possibly dates). For example, consider lasSource: the survey design for the UKIDSS LAS is that merged sources are made up from YJHK observations (ignoring the complicating factor of the second epoch J observations for the sake of simplicity). The table lasMergeLog stores information about the image that each individual passband detection comes from in the merged source, and the source table and this merge log are connected via the frameSetID attribute in both. Suppose we are interested in the Z and J observation dates for an LAS source selection consisting of merged sources detected in at least the J and Y passbands - the following SQL will do the trick:
SELECT s.ra,s.dec,my.dateObs AS yDateObs, mj.dateObs AS jDateObs,...
FROM lasSource AS s, lasMergeLog AS l, Multiframe AS mj, Multiframe AS my
WHERE s.frameSetID=l.frameSetID AND ymfID=my.multiframeID AND j_1mfID=mj.multiframeID AND ymfID > 0 AND j_1mfID > 0
Furthermore, you can select a sample based on date differences (for example, you may want to exclude the possibility of variability screwing up colour selection in a sample by insisting on near-simultaneity of the times of observation in the different filters) using the MS SQL Server DATEDIFF function, which has arguments time units, date1 and date2:
SELECT COUNT(*)
FROM lasSource AS s, lasMergeLog AS l, Multiframe AS mj, Multiframe AS my
WHERE s.frameSetID=l.frameSetID AND ymfID=my.multiframeID AND j_1mfID=mj.multiframeID AND ymfID > 0 AND j_1mfID > 0
AND DATEDIFF(mi, mj.dateObs, my.dateObs) BETWEEN -60 AND +60
This query counts the number of merged LAS sources where the Y and J observations were taken within one hour (60 minutes) of each other.

How to compute the median in SQL:
Microsoft Transact-SQL provides useful extensions to the SQL standard that enable easy computation of the median value of an attribute: one simply selects the middle value from an ordered list, generated by a nested query, of the attribute in question as in the following example which selects the median ellipticty of all detector frames:
SELECT MAX(T.avStellarEll) FROM
(SELECT TOP 50 PERCENT avStellarEll FROM MultiframeDetector
WHERE avStellarEll > -0.9E9 ORDER BY avStellarEll ASC) AS T
Note the exclusion of any default values of the attribute in the query. The query can be generalised easily to produce the N^th percentile by changing the number in "TOP 50 PERCENT".

Connection problems - port 8080 (wireless network):
WSA access is carried on port 8080. Please make sure your network/firewall has this port open (this was first reported by a user trying to access the SSA from a wireless network.

WFAU, Institute for Astronomy,
Royal Observatory, Blackford Hill
Edinburgh, EH9 3HJ, UK

wsa-support@roe.ac.uk
24/1/2014