In this document we provide a brief introduction to the use of Structured Query Language (SQL) for accessing data in the VISTA Science Archive (VSA). Sections included in this document are a Primer aimed at users new both to the VSA and to SQL and a Reference which should be of use more experienced and returning users. Readers wanting a fuller introduction to SQL should consult an online tutorial or one of the legion of SQL books available: O'Reilly's SQL in a nutshell is a good introduction. The document concludes with some examples of cross-querying VISTA and external survey data, which we expect to be a common usage of the archive.

The VSA and its prototype, the SuperCOSMOS Science Archive (SSA) are large databases so, for test purposes, we have produced the "Personal SSA" (PSSA), which is a small subset of the SSA, containing solely the data in the region of the sky with 184 < RA (deg) < 186 and -1.25 < Dec. (deg) < 1.25, which is the same area of sky as the "Personal SkyServer" produced for the Early Data Release (EDR)of the Sloan Digital Sky Survey (SDSS). The PSSA may be downloaded from here (as a .zip file with installation instructions included) or can be queried using a web interface of the same form as for the full SSA. Similarly, the initial release of the VSA (which contains a limited amount of commissioning data) should be used if you are unfamiliar with SQL but wish to exercise queries on that database. VSA and SSA users are strongly encouraged to use these small subsets for developing and debugging queries: with databases as large as the VSA and full SSA it can take a long time to find out that the query you wrote does not do what you intended!

This VSA Cookbook is based on the SSA version. Currently, the latter provides a more comprehensive introduction to SQL and in particular gives real-world examples of astronomy application queries that were used in the construction of the SSA. Novice users are strongly recommended to familiarise themselves with the PSSA and the information given in the SSA Cookbook before attempting real science with the VSA.

2. Primer

2.1. Relational databases

The VSA is a relational database, which means that it stores data in tables composed of rows and columns. Each row comprises the information stored for one data entry – e.g.. a celestial object in the case of the VSA – and there is one column for each of the attributes recorded for that entry – e.g. RA, Dec, ellipticity, etc. The different tables comprising a database may be linked (or related), if they each have columns representing the same data value (integrity constraints can be included in the table definitions which ensure consistency between two related tables, e.g. by preventing the deletion of only one of a pair of rows in different tables thus linked). For ease of use, it is possible to define virtual tables - called views - which are subsets of the data in one or more tables and which can be queried using the same syntax as ordinary tables (which are sometimes called base tables, to distinguish them from these virtual tables). In addition to tables and views, the major constituents of a relational database are indexes (the database community prefer that spelling to "indices"), which can speed up the identification of records which satisfy the particular condition expressed in a query, and various stored procedures and functions which extend the range of operations which can be performed on data held in the tables. The collection of definitions of columns, tables, views, indexes, stored procedures and functions in a database is called its schema.

2.2. The VSA as a relational database

The VSA schema is described in detail elsewhere, but we recap here the basic features which we shall use later. For each VISTA survey programme in the VSA (e.g. the VISTA Hemisphere Survey) two or more (depending on numbers of epochs and data complexity) major catalogue tables called, for example, vhsDetection and vhsSource are available. The columns in *Detection are basically the attributes derived by running the pipeline image analyser over single passband frames, and these single-frame detections are then merged into multi-colour records for individual celestial objects, which are stored in *Source. Other major catalogue tables include *Variability which contains statistical information from multiple-epochs and *SynopticSource which includes single-frame detections merged at multiple epochs to give contemporary colours. In addition to these major tables, there are also a number of metadata tables, which store ancillary information describing the processes involved in obtaining VISTA imaging data, and which enable the provenance of data values in *Source, *Detection etc. to be traced all the way back to a given image. To aid spatial matching of objects within the VSA and between the VSA and external catalogue datasets (such as the 2MASS Point Source Catalogue and SDSS Data Release catalogues), there are also "neighbour" and "cross-neighbour" tables which record pairs of sources within 10 to 30 arcsec of one another, depending on the catalogues joined.

2.3. Structured Query Language (SQL)

SQL is the standard language for accessing and manipulating data stored in a relational database. In fact, several versions of the SQL standard exist, and most database management systems (DBMSs) actually support a subset of standard SQL, with some vendor-specific additions. The VSA is currently implemented in Microsoft's SQL Server 2012 DBMS, so VSA users will employ its SQL dialect (known as Transact-SQL, or T-SQL), although we have tried to restrict the use of vendor-specific features to a minimum. A fuller reference on T-SQL dialect than presented here is available online here.

The first thing to understand about SQL is that it is a set-based language, not a procedural language, like Fortran or C. A user submitting an SQL query to a relational database is defining the set of properties of the records to be returned from the database, not specifying the list of operations which will lead to their delivery; this latter is the responsibility of the DBMS engine, which will decide the best way to execute a given query from a set of possible execution plans. Many database vendors are adding procedural capabilities to the SQL dialects they support, and these constitute one of the main areas of difference between those dialects. These extensions will not be discussed here, as we shall concentrate on the basics of standard SQL.

2.4. The basic structure of a VSA SQL statement

For security reasons, the VSA does not allow users to execute queries which affect the basic structure and contents of the database, only those which can extract data from it. In SQL terms, this means that only SELECT statements are allowed: N.B. in this tutorial we write all SQL keywords in upper case italics and some column names in mixed case, both for clarity, although the VSA's SQL dialect is case insensitive by default. There are three basic classes of SELECT statement:

2.4.1 Projections

A projection is the retrieval of a set of full columns from a table. To retrieve the nominal RAs and Decs of the centres of all VISTA Hemisphere Survey fields in the VSA, one would type:

SELECT ra, dec FROM vhsMergeLog

where vhsMergeLog is the name of the VSA table which records information about VHS fields, and ra and dec are the names of the relevant columns in that table.

2.4.2 Selections

A selection is the retrieval of the data values in particular columns for those rows in a table which satisfy certain criteria. So, if one were interested only in fields whose nominal centres lie in a 1 degree strip on the celestial equator, the appropriate SQL query would be:

SELECT ra, dec
FROM vhsMergeLog
WHERE dec BETWEEN -0.5 AND +0.5

In this example the SQL statement has been split into three lines to emphasise the SELECT…FROM…WHERE syntax, but this is still one SQL statement. The SQL Query Form in the VSA interface ignores the whitespace at the end of each line of text and generates a single query string from valid multi-line text like this. (Note that this means that users should not used double hyphens to indicate comments in multi-line queries, since this will result in all text after the first comment being ignored.)
Multiple constraints can be included in the WHERE clause of a selection, so, for example, the query above could be rewritten as:

SELECT ra, dec FROM vhsMergeLog WHERE (dec >= -0.5) AND (dec <= +0.5)

while the centres of all other fields could be selected using the following statement:

SELECT ra, dec FROM vhsMergeLog WHERE (dec < -0.5) OR (dec > +0.5)

The parentheses in these examples have been included for clarity – they are only required when needed to avoid ambiguity, and when necessary to over-rule the standard order of precedence amongst operators, outlined in Section 3.3.6 (users should note that the accidental omission of the WHERE clause from a selection turns it not into an invalid query, but into the projection of the columns contained in its SELECT clause, which, for tables as large as the *Source and *Detection tables of the VSA will return a lot of data). Note the units of spherical coordinates in the VSA: these are almost always decimal degrees for both RA and Dec (a notable exception to this rule is attribute raBase in table Multiframe, where the units are hours for consistency with the FITS files ingested from the processing centre. You can easily convert units in selections or WHERE clauses: e.g. SELECT raBase*15.0 ..., or WHERE (ra/15.0) > 23.0 ... etc.

Note that users should check which attributes in which tables have been indexed in the VSA, since the performance of queries that can make use of them should be significantly better than for those which do not: this information is presented in the Schema Browser as highlighted rows in the lists of table attributes.

2.4.3 Joins

A join is the retrieval of data entries from one or more tables in a database matched under some criterion. Extending our example above, a user may be interested in the dates on which the Y band images in a strip were taken. The Multiframe table in the VSA has an attribute called mjdObs, which records the Modified Julian Date of the observation. The *MergeLog and Multiframe tables are linked by having the common attribute multiframeID, which is a unique identifier for each FITS file ingested into the archive. The SQL query retrieving the desired dates here would be:

SELECT mjdObs, ra, dec
FROM vhsMergeLog, Multiframe
WHERE (dec BETWEEN -0.5 AND +0.5)
AND (ymfID = multiframeID) AND (ymfID > 0)

In this query, records in the Multiframe table and the vhsMergeLog table are being joined on condition that their values for the multiframeID attribute are equal. Furthermore, we select only those rows for which there is a Y multiframe identifier, i.e. we exclude any rows where there is no Y multiframe in the merged frame set. Such a situation is indicated by a default value for the attribute (see Section 2.5.). Note also that in general, FITS files produced by the pipeline are multi-extension files containing a paw-print of 4 individual images, and fields in the merge logs are made up from sets of individual images in different passbands and at different epochs. The observation dates of the four paw-print images will be nearly the same, so the results set should contain sets of four repeats of the observation dates matched with the different centres of the device images that comprise the multiframe.

The AS keyword can be used to rename the attributes in the SELECT clause so that their names in the output result set differs from those by which they are known in the database table. For example, a user thinking that the column names referred to in the query above are a little cryptic could rewrite it as follows:

SELECT mjdObs AS yModifiedJulianDate, ra AS fieldCentreRA, dec AS fieldCentreDec
FROM vhsMergeLog, Multiframe
WHERE (dec BETWEEN -0.5 AND +0.5)
AND (ymfID = multiframeID) AND (ymfID > 0)

and the columns returned in the output result set will be headed yModifiedJulianDate, fieldCentreRA, fieldCentreDec. This useful facility can be misused by the unwary or the very stupid. For example, it would be possible to have a query which started "SELECT ra AS dec, dec AS ra", which could confuse the subsequent analysis of the extracted result set. In the particular case of extracting data in VOTable format from the VSA, the assignment of Unified Content Descriptors (UCDs) to columns - i.e. setting the values of the ucd attributes to <FIELD> elements in the metadata section at the start of the VOTable document - is performed on the basis of the column name in the output result set, so a user swapping ra and dec, as in the example above, would end up with the wrong UCD for both those two columns, causing problems were that VOTable to be used in conjunction with other Virtual Observatory tools reliant on UCDs. For the moment, users wishing to have UCDs included in VOTable output from the VSA must not rename table columns, while, more generally, it is left to the user to ensure that any renaming they do perform is sensible (note: UCDs in the SSA and VSA have been initially assigned according to the UCD1 definitions).

It is also possible to write join queries in a way that makes it more explicit that they are joining the two tables, i.e. the example above becomes:

SELECT mjdObs, ra, dec
FROM Multiframe
JOIN vhsMergeLog ON (ymfID = multiframeID)
WHERE (dec BETWEEN -0.5 AND +0.5) AND (ymfID > 0)

This is an inner join, meaning it only returns the (mjdObs,ra,dec) triplets for matched rows, but there are other types of join, which return different combinations of data (see the SSA Cookbook for more details).

2.4.4 Subqueries

The SQL Server dialect of SQL allows the construction of nested SELECT statements, in which the WHERE clause includes a subquery which is itself a valid SELECT statement. For example, the join on vhsMergeLog and Multiframe in the previous Section could be rewritten in the following way:

SELECT mjdObs
FROM Multiframe
WHERE multiframeID IN
(SELECT ymfID FROM vhsMergeLog WHERE (dec BETWEEN -0.5 AND +0.5) AND (ymfID > 0))

Note that the ra and dec attributes from the vhsMergeLog table have now been removed from the SELECT clause of the main query. This is because the vhsMergeLog table is only present in the subquery, and its attributes are no longer accessible to the main query. The one exception to this is its ymfID attribute, but that is only accessible by value - i.e. the subquery generates a list of ymfID values and matches between this and the multiframeID column of the Multiframe table are made by use of the IN operator. Note also that now the results set only contains the distinct values of the observation dates, i.e. we get one observation date per multiframe, as opposed to previously where the results sets contained repeated observation dates for the distinct device image centres.

This query could also be written using a second logical operator, ANY, as follows:

SELECT mjdObs
FROM Multiframe
WHERE multiframeID = ANY
(SELECT ymfID FROM vhsMergeLog WHERE (dec BETWEEN -0.5 AND +0.5) AND (ymfID > 0))

where the ANY operator is used to match rows in the Multiframe table with any of the rows in the output result set from the subquery which have the same ymfID value. Note that in both these subqueries formulations the list of attributes in the SELECT clause of the subquery must be consistent with the rest of the WHERE clause in the main query, since they are to be matched by value.

These last two queries illustrate the equivalence of IN and = ANY, but care must be taken if the logical operator NOT is added to the outer query, so that it seeks matches with the complement of the set for which matches are sought in the original query. The operator which is equivalent to NOT IN is not < > ANY, as one might initially expect, but rather < > ALL - where ALL is another logical operator, which evaluates to TRUE if all of a set of comparisons are TRUE - which is clear when one thinks through the logic of that query, but perhaps not immediately apparent.

The most common use for subqueries is to express complicated joins in a simpler fashion. Up to 32 levels of nesting are allowed, in principle, although memory restrictions may prevent that from being achieved in practice. To evaluate a complicated join as a nested series of queries would often be much slower, but the database engine should recognise the equivalence of the two forms of the query and recast it in the most efficient form as part of its query optimisation process. So, there should be no difference in the performance of queries submitted as complicated joins or as nested subqueries, and the latter are to be prefered if they help ensure that the query that is executed really is the one that the user wanted to run.

2.5. Null and default values in the VSA

Our discussion to this point has implicitly assumed that the values of the attributes corresponding to each column in every row in a database table are known. This need not always be the case, as a simple example from the VSA illustrates. The *Source tables in the VSA merges information about detections made in the individual frame set images. A very blue object may well not be detected in a K band image, so what should be written for such an object in the column of the *Source table which records, say, the ellipicity of the K band detection? One answer would be a null value, which is a special type of entry to be included in a table if the value of a given attribute is not known (or is indeterminate or is not applicable) for a particular row. In designing the VSA we have decided not to use nulls in these cases, but to define default values for use in these situations instead: e.g. in the example above, we would set the K band ellipticity in the *Source table of a celestial object undetected in that band to be a recognisably meaningless value; in the case of floating point numbers, -0.9999995e9. Nulls and default values are semantically different: the query processor in a database management system (DBMS) recognises that a value marked as null is unknown or indeterminate and will not include it in, say, the computation of the mean value of all entries in a column, while, to the query processor, a default value is like any other value, and will include it, unless the user explicitly excludes it - e.g. by computing the mean magnitude only for those objects with magnitudes brighter than -0.9999995e9, in this case. Other default values in the VSA include -99999999 for 4- and 8-byte integer attributes, -9999 for 2-byte integers, and 0 for 1-byte (unsigned) integers. The schema browser generally indicates the default value for many of the table attributes, and they have all been chosen to lie well beyond the range of legitimate values found in the VSA, so it is simple to exclude them:

SELECT AVG(jAperMag3)
FROM vhsSource
WHERE jAperMag3 > 0.0

As a result of defining default values for some columns, there have to be dummy rows in some tables (i.e. rows for which every attribute takes the appropriate default value). The reason for this is illustrated by consideration of the *MergeLog and Multiframe tables in the VSA. If one of the multiframe UIDs (e.g. the Y image attribute, ymfID) in one of the *MergeLog tables has a default value because that particular frame set does not yet have a Y-band image, and if the attribute ymfID references the attribute multiframeID in table Multiframe, then table Multiframe needs an entire row of defaults for multiframeID=-99999999 in order to maintain the referential integrity of the database.

It is the responsibility of the user to ensure that the presence of default values and dummy rows will not corrupt the results of queries run on the VSA, but our decision to use them, rather than nulls greatly simplifies the logic involved in all queries run on the database. The inclusion of null values for an attribute means that an expression involving it can evaluate to TRUE, FALSE or UNKNOWN, and we believe that the simplification for users in avoiding this three-valued logic greatly outweighs the burden of having to remember that default values exist for some columns in the VSA.

3. Reference: additional options in SELECT statements

Previous sections have described the basic SELECT…FROM…WHERE… structure of an SQL query. This is the basic syntax to be employed for querying the VSA, but there are some additional options in the SELECT clause which users may find useful. Once again, a fuller reference than is presented below is available online here.

3.1 Aggregate Functions

SQL offers a number of useful aggregate functions, which can be used for deriving summary information on columns or selections from them.

3.1.1 Mathematical aggregate functions

The meanings of those mathematical aggregate functions which apply only to numeric columns are very straightforward: AVG, MAX, MIN, SUM, STDEV, STDEVP, VAR, and VARP return, respectively, the mean, maximum, minimum, sum, standard deviation, population standard variation, variance and population variance of all values in the specified column. They can be used in conjunction with a WHERE clause or not, i.e.

SELECT max(dec) FROM vhsMergeLog

will return the maximum value for the dec column found in the vhsMergeLog table, while

SELECT max(dec) FROM vhsMergeLog WHERE (ra BETWEEN 180 AND 360)

returns the maximum value found for fields within one hemisphere of the sky.

N.B. Strictly speaking, these functions only apply to non-null values found within the particular column. As discussed above, the VSA contains no null values by design, but uses default values instead. These will not be ignored in the computation of aggregate functions. For example, the SQL query

SELECT min(mjdObs) FROM Multiframe

returns the value -0.9999995e9, which is clearly nonsensical astronomically, and just illustrates the unintentional selection of the dummy row in the Multiframe table.

3.1.2 COUNT

There are additional aggregate functions which can be run on columns of all types. The most useful of these is COUNT, which can be used in a number of ways, as shown in the following example. The programmeID attribute in the ProgrammeFrame table identifies the multiframes which belong to that programme in the archive: since the VSA combines data from several programmes (e.g. the UKIDSS subsurveys), there are several different values of programmeID. To find out how many, one could use the following query:

SELECT COUNT (DISTINCT programmeID) FROM ProgrammeFrame

In practice, this is best discovered from querying the VSA's Programme table. Note that the inclusion of the DISTINCT keyword means that only the number of distinct programmeID values was returned. If it had been omitted, then the returned value would be the total number of entries in the programmeID column, which is simply equal to the number of rows in the table, since there are no null values for programmeID in the ProgrammeFrame table.
The total number of rows in the table can be also obtained using the following syntax:

SELECT COUNT(*) FROM ProgrammeFrame

Using COUNT(*) like this is a very good way of ensuring that a query is sensible before getting data returned by running it, and users are strongly encouraged to use this method to develop and debug SQL queries before running them on the full VSA. The performance overhead involved in this two-query process is not as high as it might seem, since, depending on how heavily the database is being used, some of the data from the COUNT(*) query will still be in cache when the query is run a second time to extract the desired attributes. Users should note that COUNT should be replaced by COUNT_BIG if there is likely to be more than 2x10⁹ records to be counted: this circumvents an SQL Server bug.

3.1.3 TOP

An interesting function specific to SQL Server’s SQL dialect is TOP, which is illustrated as follows. The query

SELECT TOP 10 mjdObs FROM Multiframe

would return the MJDs for ten rows in the Multiframe table. This will generally not be the ten highest (i.e. most recent) values in the table; remember that SQL is a set-based language, so a query yields the set of rows satisfying the criteria specified in the query, but with, by default, no particular ordering within that set. The ten highest area values can be obtained using TOP, however, with the addition an ORDER BY clause, which does impose an ordering of the rows in the result set: i.e.:

SELECT TOP 10 mjdObs FROM Multiframe ORDER BY mjdObs DESC

Note the presence of the DESC (for descending) keyword in the ORDER BY clause, which is required because the default behviour is for that clause to list rows in ascending order.

3.1.4 GROUP BY and HAVING

The GROUP BY clause allows aggregate functions to return more than a single value. For example, the user running the "SELECT COUNT(DISTINCT programmeID) FROM ProgrammeFrame" query above might want to know how many VSA multiframes are associated with each programme. That information is returned by the following query:

SELECT programmeID, COUNT(*) FROM ProgrammeFrame GROUP BY programmeID

where the rows in ProgrammeFrame are grouped by their programmeID value and separate counts are made of the number of rows in each group.

3.2 Mathematical Functions

The following mathematical functions are supported by SQL Server's SQL dialect.

ABS(x): returns the absolute value of x.
ACOS(x): returns the angle, in radians and in the range 0 to π, whose cosine is x.
ASIN(x): returns the angle, in radians and in the range -π/2 to π/2, whose sine is x.
ATAN(x): returns the angle, in radians and in the range -π/2 to π/2, whose tangent is x.
ATN2(y,x): returns the angle, in radians, whose tangent is y/x.
CEILING(x): returns the smallest integer greater than, or equal to, x.
COS(x): returns the cosine of angle x in radians.
COT(x): returns the cotangent of angle x in radians.
DEGREES(x): returns the angle in degrees corresponding to angle x in radians.
EXP(x): returns the exponential of x.
FLOOR(x): returns the largest integer less than or equal to x.
LOG(x): returns the natural logarithm of x.
LOG10(x): returns the base 10 logarithm of x.
PI(): returns the value of π.
POWER(x,y): returns the value of x raised to the power y.
RADIANS(x): returns the angle in radians corresponding to angle x in degrees.
RAND(x): returns a random number between 0 and 1, using an optional integer seed x if supplied.
ROUND(x,n,m): returns the value of x rounded to a specified length or precision. If n is a positive integer, then x is rounded to the number of decimal places specified by n. If n is a negative integer, then x is rounded n places to the left of the decimal point. ROUND can also be used to truncate x, through use of the optional argument m. If m is a positive integer, then x is truncated to the precision specified by n, rather than rounded to it.
SIGN(x): returns the value 1, 0, -1, depending whether x is positive, zero or negative, respectively.
SIN(x): returns the sine of the angle x in radians.
SQUARE(x): returns the square of x.
SQRT(x): returns the positive square root of x.
TAN(x): returns the tangent of angle x in radians.

Arithmetic functions (such as ABS, CEILING, DEGREES, FLOOR, POWER, RADIANS, and SIGN) return a value having the same data type as the input value, while trigonometric functions and others (such as EXP, LOG, LOG10, SQUARE, and SQRT), cast their input values to float and return a float value; this probably is of no concern to the average VSA user. All mathematical functions, except for RAND, are deterministic functions - i.e. they return the same results each time they are called with a specific set of input values - and RAND is deterministic only when a seed parameter is specified.

3.3 Operators

An operator in SQL is a symbol specifying an action that is performed on one or more expressions. For the present purposes, their major use is to provide greater flexibility in the possible forms of WHERE clauses of queries and in the columns of the result sets they can produce, which need not simply be columns drawn from the table(s) being queried. There are several classes of operator to consider.

3.3.1 Arithmetic operators

SQL Server's SQL dialect supports five arithmetic operators. The four basic ones - Addition (+), Subtraction (-), Multiplication (*) and Division (/) - plus the Modulo operation, (%), which returns the remainder of dividing one integer by another, and is used in the format "dividend%divisor". The use of the four basic arithmetic operators follows straightforwardly from their definitions.

3.3.2 Comparison operators

The Boolean comparison operators are used most frequently to filter rows via the WHERE clause of a SQL query. The most simple comparison operators ( <, >, =) were used above without introduction, but there a total of nine comparison operators which can be applied to pairs of expressions in the SQL Server dialect of SQL: = (Equal to); > (Greater than); < (Less than); >= (Greater than or equal to); <= (Less than or equal to); <>(Not equal to); != (Not equal to); !> (Not greater than); and !< (Not less than).

3.3.3 Logical operators

In a similar fashion, we have used a number of the logical operators (e.g. AND, BETWEEN, etc) above without introduction, but the following is the full list of logical operators supported by SQL Server:

ALL: TRUE if all of a set of comparisons are TRUE.
AND: TRUE if both of two Boolean expressions are TRUE.
ANY: TRUE if at least one of a set of comparisons is TRUE.
BETWEEN: TRUE if the operand is within a specified range.
EXISTS: TRUE if a subquery contains any rows.
IN: TRUE if the operand is equal to one of a list of expressions.
LIKE: TRUE if the operand matches a pattern.
NOT: Reverses the value of any other Boolean expression.
OR: TRUE if either of a pair of Boolean expressions is TRUE.
SOME: TRUE if at least one of a set of comparisons is TRUE.

The use of some of these operators (e.g. AND, BETWEEN, OR) has been illustrated above, but it is worth making a few comments on the remaining ones.

The LIKE operator is used for pattern matching. This is most commonly used for string matching. For example, a user interested in knowing how many VHS multiframes are present in the VSA could issue the query:

SELECT COUNT(*) FROM Multiframe WHERE project LIKE "U/UKIDSS/VHS%"

Note the use of the percentage sign wildcard character which matches all the UKIRT OMP project strings defined for the VHS. Several other wildcard characters can be used in conjunction with LIKE; these are described further in the SSA Cookbook, as are further examples of logical operators.

3.3.4 String concatenation operator

The plus sign, +, is used as a string concatenation operator in the SQL Server dialect of SQL. This is most likely to be of use to VSA users in the formatting of result sets - i.e. in the definition of SELECT clauses.

3.3.5 Unary operators

The SQL Server dialect of SQL defines three unary operators - i.e. operators which have only one operand - although none of these are likely to be of much use to most VSA users. The first of these is the positive unary operator, +, which is used principally in SQL statements not allowed by the VSA query interface. The second, -, is the negative unary operator, which returns the negative value of a numeric expression, as shown in the following query:

SELECT -MIN(decBase) FROM Multiframe WHERE MultiframeID > 0

The final unary operator, ~, the Bitwise NOT operator, converts each bit in a numeric expression of integer data type into its 1s complement (i.e. 0s are changed to 1 and vice versa). This might be of use in queries involving the quality flags in the *Source and *Detection tables.

3.3.6 Operator precedence

The operators described in this subsection have the following descending levels of precedence:

+ (Positive Unary Operator), - (Negative Unitary Operator), ~ (Bitwise NOT)
* (Multiply), / (Division), % (Modulo)
+ (Add), + (Concatenate), - (Subtract), & (Bitwise AND)
= , > , < , > = ,< = , < > , ! = , ! > , ! < (Comparison operators)
^ (Bitwise Exclusive OR), | (Bitwise OR)
NOT
AND
ALL, ANY, BETWEEN, IN, LIKE, OR, SOME

When two operators in an expression have the same operator precedence level, they are evaluated left to right based on their position in the expression.

4. How to query cross-matched data in the archive

Archive catalogue tables are automatically cross-matched to themselves and to a number of external survey catalogue datasets held locally, e.g. SDSS Data Releases; FIRST, IRAS, ROSAT and 2MASS catalogues; and legacy photographic catalogues like USNO-B (for a complete list, click on "Browser" on a navigation bar or see below). Rather than prejoining any two datasets to create a static, merged table of what are assumed to be associated sources on the basis of a fixed joining algorithm, and choosing a subset of what are assumed to be useful attributes from each of the two catalogue tables to propagate into that merged table, the philosophy is to create a table of pointers between any two matched datasets. This means, for example, that any externally catalogued source co-incident or nearby an archive source is readily available, out to some predefined maximum angular search radius, and all sorts of science usages are possible because the exact matching criteria (e.g. maximum allowable displacement, consistency in morphological parameters or classification, or even the number of possible external matches for a given archive source) can be tuned at query time by simple expressions in SQL. Furthermore, all attributes of both datasets are available to querying, because no decision has been made as to which attributes to propagate into a static merged set. The flip side to this flexibility is the rather opaque syntax for querying cross-matched data in SQL: instead of querying a single merged table, e.g. SELECT * FROM MergedTable WHERE ..., in general you have to query three tables: the two cross-matched tables and the table of pointers, e.g. SELECT * FROM Table1, ExternalDB..Table2, CrossNeighboursTable2 WHERE ..., using the joining techniques discussed above to select out the rows that you require. Some real-world examples best illustrate how to query cross-matched data in the VSA, but first we discuss the naming of VSA objects (databases, tables and attributes) that pertain to catalogue joining.

4.1 Naming convention for tables and attributes

In VSA parlance, a table of pointers that associates sources within one table (i.e. that provides a list of all nearby sources for each source within one table) is called a neighbour table, while a table of pointers that associates a set of externally catalogued sources to a table of VISTA sources is called a cross-neighbour table. Cross-neighbour tables are likely to be used the most, but neighbour tables have their uses (for example, you may wish to check the internal consistency of VISTA photometry and astrometry by selecting a set of measurements of the same sources in overlap regions of adjacent frame sets - this is most easily done using a neighbour table).

A list of all available neighbour/cross-neighbour tables is most easily obtained as follows:

SELECT neighbourTable FROM RequiredNeighbours

The naming convention is simple: neighbour tables are named by appending the word Neighbours to the source table that they point to, e.g. vhsSource has a neighbour table called vhsSourceNeighbours; while cross-neighbour tables are named by concatenating the two associated table names with an X, e.g. the cross-neighbour table between the VHS source table and the 2MASS point source catalogue is called vhsSourceXtwomass_psc. Within these tables, there will always be the following three attributes: masterObjID, which points to the source for which neighbours have been created (the central source of the neighbourhood, if you like); slaveObjID, which points to the neighbouring sources (sources lying within the neighbourhood); and finally distanceMins which is the angular separation between the central source and the neighbour, in units of arcminutes. Depending on the external catalogue joined, there may be other attributes in cross-neighbour tables (use the Browser to examine these).

At the time of writing, external catalogue databases held within the VSA consist of:

External survey	Catalogue product	SQL database name	Table name	Description
Sloan Digital Sky Survey	EDR	SkyServerV3	PhotoObj	SDSS Early Data Release (as released in the SkyServer RDBMS implementation, not the original SX OODBMS)
	DR5 (3,2)	BestDR5 (3,2)	PhotoObj	SDSS Data Release 5 (3,2)
2-Micron All-Sky Survey	Point Source Catalogue	TWOMASS	twomass_psc	2MASS point source catalogue
	Extended Source Catalogue	TWOMASS	twomass_xsc	2MASS extended source catalogue
ROSAT All Sky Survey	Bright Source Catalogue	ROSAT	rosat_bsc	ROSAT all-sky survey bright source catalogue
	Faint Source Catalogue	ROSAT	rosat_fsc	ROSAT all-sky survey faint source catalogue
IRAS	PSC	IRAS	iras_psc	Infra-Red Astronomical Satellite point source catalogue
VLA-FIRST	FIRST catalogue	FIRST	firstSource	Faint Images of the Radio Sky at Twenty cm from the VLA survey
SSA	Southern hemisphere survey	SSA	Source	SuperCOSMOS Sky Survey as ingested into the SuperCOSMOS Science Archive
Millenium Galaxy Catalogue	MGC	MGC	mgcDetection	37.5 square degree, medium-deep B-band imaging survey
NRAO VLA Sky Survey	NVSS	NVSS	nvssSource	A 1.4 GHz continuum survey covering the entire sky north of -40 deg declination.

Alternatively, you can use the VSA Browser to look at all the neighbour tables, their attributes and also the external databases that are available in the VSA, or use the following SQL query:

SELECT surveyName, databaseName, extTableName, description
FROM ExternalSurvey AS s, ExternalSurveyTable AS t
WHERE s.surveyID=t.surveyID AND s.surveyID>0

4.2 Selecting nearby cross-matches

Suppose a user wishes to select the identifiers and co-ordinates of all Faint Source Catalogue sources from the ROSAT All-Sky Survey that are within 6 arcsec of a VHS source. The SQL to do this is as follows:

SELECT vhs.sourceID, vhs.ra, vhs.dec, fsc.seqNo, fsc.ra, fsc.dec
FROM vhsSource AS vhs, ROSAT..rosat_fsc AS fsc, vhsSourceXrosat_fsc AS x
WHERE x.masterObjID=vhs.sourceID AND x.slaveObjID=fsc.seqNo AND x.distanceMins<0.1

Note: i) the use of table aliases vhs and fsc which are a convenient short-hand; ii) the use of attribute prefixes (e.g. vhs.ra) to distinguish between attributes in different tables that happen to have the same name; iii) the table joining condition in the WHERE clause, which selects associated rows (if you omit this condition, you will get all rows of each table joined to all other rows, i.e. an extremely large and useless dataset!); and finally iv) the specification of a maximum radius of 0.1 arcmin (=6 arcsec) for this query, where the maximum available for ROSAT joins is 30 arcsec. You can see the maximum joining radii (in arcsec) available to you in each neighbour/cross-neighbour table by issuing the following query:

SELECT neighbourTable, joinCriterion*3600.0 as maxRadius
FROM RequiredNeighbours

When selecting nearby cross-matches, users should note that one or more than one rows may result for each master object, since there may be more than one slave neighbour within the neighbourhood defined by the maximum join criterion specified. If you want the nearest match only, then the next section explains how to do this.

4.3 Selecting the nearest cross-match

Suppose a user wishes to select the infrared and optical point source photometry for a sample (say the first 50 rows for speed) in the VHS and SDSS DR2 overlap, taking the nearest SDSS source to each VISTA source as being the most likely match, where the maximum positional tolerance is set to 2 arcsec. The following query will do the job:

SELECT TOP 50 vhs.ra, vhs.dec, yAperMag3, jAperMag3, hAperMag3, ksAperMag3, psfMag_u, psfMag_g, psfMag_r, psfMag_i, psfMag_z
FROM vhsSource AS vhs, BestDR2..PhotoObj AS dr2, vhsSourceXDR2PhotoObj AS x
WHERE masterObjID=vhs.sourceID AND slaveObjID=dr2.ObjID AND distanceMins<0.033333 AND sdssPrimary=1 AND distanceMins IN (
SELECT MIN(distanceMins)
FROM vhsSourceXDR2PhotoObj
WHERE masterObjID=x.masterObjID AND sdssPrimary=1)

Note the following: i) here, we have used table aliases as attribute prefixes only where necessary (i.e. where attributes are not unambiguously identified by their names alone); ii) we select "primary" objects only from the SDSS, as we are interested in a data set with no duplicates; iii) a subquery is used to select the closest neighbour in each case, and the main query selects this nearest object if it is within the specified 2 arcsec maximum radial tolerance.

4.4 Selecting neighbours within one table

For VISTA survey tables that have neighbour tables defined, you can examine the neighbours of each source in the table. For example, suppose we want to check the internal consistency of VSA photometry and astrometry by selecting a sample of duplicate source measurements from overlap regions between adjacent frame sets in the VHS. The following query will do the job:

SELECT (s1.ra-s2.ra)*3600.0*COS(RADIANS(s1.dec)) as deltaRA,
               (s1.dec-s2.dec)*3600.0 as deltaDec,
               (s1.yAperMag3-s2.yAperMag3) as dy,
               (s1.jAperMag3-s2.jAperMag3) as dj1,
               (s1.hAperMag3-s2.hAperMag3) as dh,
               (s1.ksAperMag3-s2.ksAperMag3) as dk
FROM vhsSource AS s1, vhsSource AS s2, vhsSourceNeighbours AS x
WHERE s1.yAperMag3 > 0 AND s2.yAperMag3 > 0 AND s1.jAperMag3 > 0 AND s2.jAperMag3 > 0
               AND s1.hAperMag3 > 0 AND s2.hAperMag3 > 0 AND s1.ksAperMag3 > 0 AND s2.ksAperMag3 > 0
               AND masterObjID=s1.sourceID AND slaveObjID=s2.sourceID AND s1.priOrSec=s1.frameSetID AND distanceMins<0.01 AND distanceMins IN (
SELECT MIN(distanceMins)
FROM vhsSourceNeighbours
WHERE masterObjID=x.masterObjID)

Note that in this case, a stringent pairing tolerance of 0.6 arcsec has been set (this corresponds roughly to a 3-sigma chop given the typical positional errors in VISTA astrometry). Also, we have used the useful feature of user-defined computed columns to create a tabular dataset of attributes that are most useful to us (i.e. differences in positions in arcsec and differences in magnitudes in each available passband).

5. Refining sample selection using quality bit flags

Within the detection and source tables there are quality bit flags, labelled ppErrBits in the detection tables and *ppErrBits in the source tables, where * is the filter name e.g. yppErrBits. These attributes encode the quality issues associated with a given detection such that detections subject to more serious quality issues have higher values. For a detection with no known quality issues the attribute's value is zero. The bit flag value encodes up to 32 distinct quality issues, which are divided into four different classes according to severity. This allows you to select a sample of sources where the quality is better than a certain decimal threshold value corresponding to each warning classification listed in this table:

Warning Class	Threshold
Information	256
Warning	65536
Important	16777216
Severe

So, to select a sample that consists of sources with no known quality issues or just minor, informative, warnings we would add ppErrBits < 256 to the WHERE clause of our SQL query. This is partly how the *PointSource views generate a good quality view of the *Source tables for each programme. The reliable*PointSource view selects only those sources with no known quality issues. To consider the quality information from every passband in your selection, you can merge the source table *ppErrBits attributes as: (ksppErrBits | hppErrBits | jppErrBits) < 256 for as many passbands as exist in that programme's source table. As an example of such source selections in practice, suppose we want to produce a colour magnitude plot for galaxies in the VIDEO source table which have no important quality issues:

SELECT ksPetroMag, jmksExt
FROM videosSource WHERE mergedClass=1 AND ksPetroMag>-9.99995e+8 AND jmksExt>-9.99995e+8
AND (jppErrBits | ksppErrBits) < 65536

Initially one would wish to select just those sources with no known quality issues, and then widen the search to include more and more potentially spurious sources. To further refine our samples we can select or deselect certain quality issues according to their associated bit mask, the full list of which may be found in the glossary entries for the ppErrBits attributes. To select sources marked with a certain quality issue we would add ppErrBits & bitMask != 0 to the WHERE clause of our SQL query, where bitMask is the bit mask value (decimal or hexadecimal) assigned to that quality issue as listed in the ppErrBits glossary entry. Conversely, to exclude a certain quality issue we would add ppErrBits & bitMask = 0 to the WHERE clause of our SQL query.

For example, having initially excluded all sources with important quality issues, we might wish to include a sample that are within the ears of tiles. To do this we just extend the SQL query as follows:

SELECT ksPetroMag,jmksExt
FROM videoSource WHERE mergedClass=1 AND ksPetroMag>-9.99995e+8 AND jmksExt>-9.99995e+8
AND ((jppErrBits | ksppErrBits) < 65536 OR (jppErrBits | ksppErrBits) & 0x00800000 != 0)

where the 0x00800000 is the bit mask, in hexadecimal notation, that represents the "source in underexposed tile ear" quality issue.

Finally, to select a sample of galaxies that do not lie within a dither offset of the stacked J-band frame boundary (and so contains complete data), but can be affected by any other quality issue:

SELECT ksPetroMag,jmksExt
FROM vhsSource WHERE mergedClass=1 AND ksPetroMag>-9.99995e+8 AND jmksExt>-9.99995e+8
AND jppErrBits & 0x00400000 = 0

where the 0x00400000 is the bit mask, in hexadecimal notation, that represents the "source within a dither offset of the stacked frame boundary" quality issue.

Link to a fully detailed guide to the design of the quality bit flag attribute, ppErrBits, together with the complete list of quality issues with full descriptions and associated bit masks.

6. Refining sample selection for filter coverage

When making selections from the survey merged source tables *Source, users are reminded to think carefully about their requirements as regards filter coverage in the sample. Because releases of survey data are made prior to full completion of the surveys (obviously to expedite science exploitation), selections will be incomplete in various subtle ways depending on exactly how the sample predicates are specified.

7. Using multi-epoch data: light curves etc

7.1 Selecting variable stars

SELECT v.sourceID
FROM vvvSource AS s,vvvVariability AS v
WHERE s.sourceID=v.sourceID AND s.mergedClass=-1 AND v.variableClass=1

This will select the stars which have been classified as variable using the current algorithm in the Schema Browser glossary.

Other selections using the statistics calculated in the Variability table can be used, e.g. selecting bright objects with high rms in J and Ks and large absolute skew in each band.

SELECT v.sourceID
FROM videoSource AS s,videoVariability AS v,videoVarFrameSetInfo AS i
WHERE s.sourceID=v.sourceID AND s.mergedClass=-1 AND v.frameSetID=i.frameSetID AND v.ksMeanMag<(i.ksexpML-3.) AND v.ksMeanMag>0. AND v.jMeanMag<(i.jexpML-3.) AND v.jMeanMag>0. AND v.jMagRms>(5.*v.jExpRms) AND v.ksMagRms>(5.*v.ksExpRms) AND abs(v.jskewness)>10. AND abs(v.ksskewness)>10.

7.2 Producing light curves

SELECT d.mjd,d.aperMag3,d.aperMag3Err,d.ppErrBits,d.seqNum,x.flag
FROM videoSourceXDetectionBestMatch AS x,videoDetection AS d
WHERE x.sourceID=446677514563 AND x.multiframeID=d.multiframeID AND x.extNum=d.extNum AND x.seqNum=d.seqNum AND d.filterID=5
ORDER BY d.mjd

This gives the necessary data for the ks-band light curve for sourceID=446677514563 in VIDEO. Objects with d.ppErrBits>0 are flagged and objects with d.seqNum<0 are non-detections. Objects with x.flag=1 are also assigned to another source and those with x.flag=2 are non-detections within one dither offset of the frame edge.

7.3 Producing light curves from sources selected in the variability table

The first example gives a single light-curve from a selection on the VIDEO survey.

SELECT d.mjd,d.aperMag3,v.ksexpRms,d.ppErrBits,d.seqNum,x.flag
FROM videoSourceXDetectionBestMatch AS x,videoDetection AS d,videoVariability AS v
WHERE x.multiframeID=d.multiframeID AND x.extNum=d.extNum AND x.seqNum=d.seqNum AND d.filterID=5 AND v.sourceID=x.sourceID AND x.sourceID IN
(SELECT TOP 1 v.sourceID
FROM videoVariability AS v,videoSource AS s, videoVarFrameSetInfo AS i
WHERE v.sourceID=s.sourceID AND v.frameSetID=i.frameSetID AND v.ksMeanMag>0 AND v.ksMeanMag<(i.ksExpML-3.) AND v.jMeanMag>0 AND v.jMeanMag<(i.jExpML-3.) AND v.ksSkewness>2. AND s.mergedClass IN (-1,-2) AND v.ksnGoodObs>10 AND v.variableClass=1)
ORDER BY d.mjd

This gives the necessary data for the ks-band light curve for the first source selected which meets the following criteria: 0<Ks<(KsML-3.); 0<J<(JML-3.); Ksskew>2.; star-like; number of ks-band observations > 10; overall classification is a variable. In this case the error bars are the expected rms of a non-variable object with the mean magnitude of this source.

A second example, this time from the VVV, selects many light-curves which have similar variability properties. In this case, we include the sourceID in the output, so each light curve can be separated from the others:

SELECT v.sourceID,d.mjd,d.aperMag3,d.aperMag3Err,d.ppErrBits,d.seqNum,b.flag
FROM vvvSourceXDetectionBestMatch AS b,vvvDetection AS d,vvvVariability AS v
WHERE b.multiframeID=d.multiframeID AND b.extNum=d.extNum AND b.seqNum=d.seqNum AND d.filterID=5 AND v.sourceID=b.sourceID AND v.ksMeanMag>12. AND v.ksMeanMag<16. AND v.ksnGoodObs>=50 AND v.variableClass=1 AND v.ksskewness BETWEEN 1. AND 2. AND ((v.ksMaxMag-v.ksMinMag) BETWEEN 0.3 AND 0.7)
ORDER BY v.sourceID,d.mjd

This gives the necessary data for the ks-band light curves for all sources which meet the following criteria: 12.<Ks<16.; 1.<Ksskew<2.; 0.3<KsAmplitude<0.7 number of ks-band observations > 50; overall classification is a variable. In this case the error bars are the measured rms from each observation.

7.4 Selecting light curves from correlated filter programmes

SELECT ss.ksMjd,ss.ksaperMag3,v.ksexpRms,ss.ksppErrBits,ss.ksseqNum,ss.synSeqNum,b.flag
FROM vmcSourceXSynopticSourceBestMatch AS b,vmcSynopticSource AS ss,vmcVariability AS v
WHERE b.synframeSetID=ss.synframeSetID AND b.synSeqNum=ss.synSeqNum AND v.sourceID=b.sourceID AND b.sourceID IN
(SELECT TOP 1 v.sourceID
FROM vmcVariability AS v,vmcSource AS s, vmcVarFrameSetInfo AS i
WHERE v.sourceID=s.sourceID AND v.frameSetID=i.frameSetID AND v.ksMeanMag>0 AND v.ksMeanMag<(i.ksExpML-3.) AND v.jMeanMag>0 AND v.jMeanMag<(i.jExpML-3.) AND v.ksSkewness>2. AND s.mergedClass IN (-1,-2) AND v.ksnGoodObs>10 AND v.variableClass=1) AND v.jksiWS>10000.
ORDER BY ss.ksMjd

Some programmes, such as the standard star (CAL) or VMC observe all filters over a short duration to get colours that are correlated and are independent of variability. In these data sets, we put the colour information at each epoch into a SynopticSource table and match this to the Source table. This selection is the same as 7.3, except we have included a selection on the strength of the correlation between the J and Ks bands: hksiWS. The magnitudes come from the vmcSynopticSource table, which is the table of matched filters at a specific epoch, specified in the vmcSynopticMergeLog table, by the meanMjdObs attribute. The matching is done using the synFrameSetID and synSeqNum, but users must be aware that a synoptic frameset in SynopticMergeLog may not include all filters, if some are deprecated. The flag is also more complicated, see the Glossary for details.

7.5 Selecting colour-light curves from correlated filter programmes

SELECT ss.hMjd,ss.ksMjd,ss.hmksPnt,ss.ksppErrBits,s.ksseqNum,ss.synSeqNum,b.flag
FROM calSourceXSynopticSourceBestMatch AS b,calSynopticSource AS ss,calVariability AS v
WHERE b.synframeSetID=ss.synframeSetID AND b.synSeqNum=ss.synSeqNum AND v.sourceID=b.sourceID AND b.sourceID IN
(SELECT TOP 1 v.sourceID
FROM calVariability AS v,calSource AS s, calVarFrameSetInfo AS i
WHERE v.sourceID=s.sourceID AND v.frameSetID=i.frameSetID AND v.ksMeanMag>0 AND v.ksMeanMag<(i.ksExpML-3.) AND v.hMeanMag>0 AND v.hMeanMag<(i.hExpML-3.) AND v.ksSkewness>2. AND s.mergedClass IN (-1,-2) AND v.ksnGoodObs>10 AND v.variableClass=1) AND v.hksiWS>10000.
ORDER BY ss.ksMjd

This is similar to 7.4, but the magnitude has been replaced with the H-K colour, so in this case the user can see how the colour varies over time.

8. Matching Tiles to Pawprints

We have produced two new tables to match tiles to their constituent pawprints: xxxTileSet and xxxTilePawPrints and these are similar in design (and are created using much of the same code) as xxxMergeLog and xxxSource. Like xxxMergeLog, xxxTileSet logs all the frames, in the frameset: the tile frame (tlmfID) and the 6 pawprint frames (o1mfID -- o6mfID) where o1 is the pawprint with offsetID=1. xxxTilePawPrints then contains the matched detections between the 7 frames, just like xxxSource would have the matched detections between all the different bands. However, unlike xxxSource, xxxTilePawPrints does not contain any additional information beyond the basic matching: for each frame there is only extNum, seqNum, xi, eta. This reduces the width and size, making lookups much more efficient. These tables are very flexible and can make querying between the tiles and pawprint detections extremely efficient. For instance, if you wanted just to find which tile detections had pawprint detections in a VIDEO tile with multiframeID=401475.

SELECT COUNT(*)
FROM videoTileSet AS s, videoTilePawPrints AS t
WHERE t.tileSetID=s.tileSetID AND tlmfID=401475

It is easy to select detections that are from offset 1,

SELECT COUNT(*)
FROM videoTileSet AS s, videoTilePawPrints AS t
WHERE t.tileSetID=s.tileSetID AND tlmfID=401475 AND o1seqNum>0

or from extNum 5 of offset 5,

SELECT COUNT(*)
FROM videoTileSet AS s, videoTilePawPrints AS t
WHERE t.tileSetID=s.tileSetID AND tlmfID=401475 AND o5seqNum>0 AND o5extNum=5

or detections that are matched on all six offsets,

SELECT COUNT(*)
FROM videoTileSet AS s, videoTilePawPrints AS t
WHERE t.tileSetID=s.tileSetID AND tlmfID=401475 AND o1seqNum>0 AND o2seqNum>0 AND o3seqNum>0 AND o4seqNum>0 AND o5seqNum>0 AND o6seqNum>0

or from offset 1 only,

SELECT COUNT(*)
FROM videoTileSet AS s, videoTilePawPrints AS t
WHERE t.tileSetID=s.tileSetID AND tlmfID=401475 AND o1seqNum>0 AND o2seqNum<0 AND o3seqNum<0 AND o4seqNum<0 AND o5seqNum<0 AND o6seqNum<0

In each case setting the relevant seqNum>0 to select detections in that frame or <0 to exclude detections in that frame is all that is needed. To compare the photometry between the tiles and pawprints is also very easy:

SELECT AVG(td.aperMag3-o1d.aperMag3 AS meanDeltaMag, STDEV(td.aperMag3-o1d.aperMag3) AS stDevDlMag
FROM videoTileSet AS s, videoTilePawPrints AS t, videoDetection AS td,videoDetection AS o1d
WHERE t.tileSetID=s.tileSetID AND tlmfID=401475 AND o1seqNum>0 AND o2seqNum<0 AND o3seqNum<0 AND o4seqNum<0 AND o5seqNum<0 AND o6seqNum<0 AND tlseqNum>0 AND tlmfID=td.multiframeID AND tlextNum=td.extNum AND tlseqNum=td.seqNum AND o1mfID=o1d.multiframeID AND o1extNum=o1d.extNum AND o1seqNum=o1d.seqNum AND td.ppErrBits=0 AND o1d.ppErrBits=0

In this case we match the photometry between the tile and offset1 only, and only for those detections where both the tile and pawprint have a detection (tlseqNum>0 AND o1seqNum>0) and the detections are not flagged (td.ppErrBits=0 AND o1d.ppErrBits=0). Any attriutes in the detection tables can be compared in this way, although it is necessary to match a new detection table for every frame in the table. Since this is done via the primary key (multiframeID, extNum, seqNum), the matching is very fast. Unfortunately this table cannot be directly connected to the Source table or to the BestMatch table, This is caused by the fact that the TilePawPrints table contains rows where there are no tile Detections but there are pawprint Detections, so any match to the Source table which often has default rows when there is a non-detection in a particular filter would have multiple choices for a "default" tile detection. To avoid this, add the following constraint on the TilePawPrints: (tlSeqNum>0 OR tileSetSeqNum<0). To select all pawprint and tile aperMag3 values for all filters in a VMC frameset, use the following query:

SELECT s.sourceID,s.ra,s.dec, dty.apermag3 AS yTileMag3, do1y.aperMag3 AS yOff1Mag3, do2y.aperMag3 AS yOff2Mag3, do3y.aperMag3 AS yOff3Mag3, do4y.aperMag3 AS yOff4Mag3, do5y.aperMag3 AS yOff5Mag3, do6y.aperMag3 AS yOff6Mag3, dtj.apermag3 AS jTileMag3, do1j.aperMag3 AS jOff1Mag3, do2j.aperMag3 AS jOff2Mag3, do3j.aperMag3 AS jOff3Mag3, do4j.aperMag3 AS jOff4Mag3, do5j.aperMag3 AS jOff5Mag3, do6j.aperMag3 AS jOff6Mag3, dtks.aperMag3 AS ksTileMag3, do1ks.aperMag3 AS ksOff1Mag3, do2ks.aperMag3 AS ksOff2Mag3, do3ks.aperMag3 AS ksOff3Mag3, do4ks.aperMag3 AS ksOff4Mag3, do5ks.aperMag3 AS ksOff5Mag3, do6ks.aperMag3 AS ksOff6Mag3
FROM vmcSource AS s, vmcMergeLog AS l, vmcDetection AS dty, vmcDetection AS dtj, vmcDetection AS dtks, vmcTileSet AS sy, vmcTilePawPrints AS py, vmcTileSet AS sj, vmcTilePawPrints AS pj, vmcTileSet AS sks, vmcTilePawPrints AS pks, vmcDetection AS do1y, vmcDetection AS do2y, vmcDetection AS do3y, vmcDetection AS do4y, vmcDetection AS do5y, vmcDetection AS do6y, vmcDetection AS do1j, vmcDetection AS do2j, vmcDetection AS do3j, vmcDetection AS do4j, vmcDetection AS do5j, vmcDetection AS do6j, vmcDetection AS do1ks, vmcDetection AS do2ks, vmcDetection AS do3ks, vmcDetection AS do4ks, vmcDetection AS do5ks, vmcDetection AS do6ks
WHERE l.frameSetID=s.frameSetID AND l.frameSetID=558345748481 AND dty.multiframeID=l.ymfID AND dty.extNum=l.yeNum AND dty.seqNum=s.ySeqNum AND sy.tlmfID=l.ymfID AND (py.tlSeqNum>0 OR py.tileSetSeqNum<0) AND py.tileSetID=sy.tileSetID AND py.tlextNum=l.yeNum AND py.tlseqNum=s.yseqNum AND sy.o1mfID=do1y.multiframeID AND py.o1ExtNum=do1y.extNum AND py.o1SeqNum=do1y.seqNum AND sy.o2mfID=do2y.multiframeID AND py.o2ExtNum=do2y.extNum AND py.o2SeqNum=do2y.seqNum AND sy.o3mfID=do3y.multiframeID AND py.o3ExtNum=do3y.extNum AND py.o3SeqNum=do3y.seqNum AND sy.o4mfID=do4y.multiframeID AND py.o4ExtNum=do4y.extNum AND py.o4SeqNum=do4y.seqNum AND sy.o5mfID=do5y.multiframeID AND py.o5ExtNum=do5y.extNum AND py.o5SeqNum=do5y.seqNum AND sy.o6mfID=do6y.multiframeID AND py.o6ExtNum=do6y.extNum AND py.o6SeqNum=do6y.seqNum AND dtj.multiframeID=l.jmfID AND dtj.extNum=l.jeNum AND dtj.seqNum=s.jSeqNum AND sj.tlmfID=l.jmfID AND pj.tileSetID=sj.tileSetID AND pj.tlextNum=l.jeNum AND (pj.tlSeqNum>0 OR pj.tileSetSeqNum<0) AND pj.tlseqNum=s.jseqNum AND sj.o1mfID=do1j.multiframeID AND pj.o1ExtNum=do1j.extNum AND pj.o1SeqNum=do1j.seqNum AND sj.o2mfID=do2j.multiframeID AND pj.o2ExtNum=do2j.extNum AND pj.o2SeqNum=do2j.seqNum AND sj.o3mfID=do3j.multiframeID AND pj.o3ExtNum=do3j.extNum AND pj.o3SeqNum=do3j.seqNum AND sj.o4mfID=do4j.multiframeID AND pj.o4ExtNum=do4j.extNum AND pj.o4SeqNum=do4j.seqNum AND sj.o5mfID=do5j.multiframeID AND pj.o5ExtNum=do5j.extNum AND pj.o5SeqNum=do5j.seqNum AND sj.o6mfID=do6j.multiframeID AND pj.o6ExtNum=do6j.extNum AND pj.o6SeqNum=do6j.seqNum AND dtks.multiframeID=l.ksmfID AND dtks.extNum=l.kseNum AND dtks.seqNum=s.ksSeqNum AND sks.tlmfID=l.ksmfID AND pks.tileSetID=sks.tileSetID AND (pks.tlSeqNum>0 OR pks.tileSetSeqNum<0) AND pks.tlextNum=l.kseNum AND pks.tlseqNum=s.ksseqNum AND sks.o1mfID=do1ks.multiframeID AND pks.o1ExtNum=do1ks.extNum AND pks.o1SeqNum=do1ks.seqNum AND sks.o2mfID=do2ks.multiframeID AND pks.o2ExtNum=do2ks.extNum AND pks.o2SeqNum=do2ks.seqNum AND sks.o3mfID=do3ks.multiframeID AND pks.o3ExtNum=do3ks.extNum AND pks.o3SeqNum=do3ks.seqNum AND sks.o4mfID=do4ks.multiframeID AND pks.o4ExtNum=do4ks.extNum AND pks.o4SeqNum=do4ks.seqNum AND sks.o5mfID=do5ks.multiframeID AND pks.o5ExtNum=do5ks.extNum AND pks.o5SeqNum=do5ks.seqNum AND sks.o6mfID=do6ks.multiframeID AND pks.o6ExtNum=do6ks.extNum AND pks.o6SeqNum=do6ks.seqNum
OPTION (MAXDOP 1)

This looks complicated, but it is similar to the previous query, but joining vmcTileSet and vmcTilePawPrints to vmcMergeLog and vmcSource. There are a lot of links to be made, but they are quite repetitive. Adding OPTION (MAXDOP 1) prevents SQL server using more than one processor. If it does, each process tends to get in the way of the others. In this case, the data has been selected for one frameSet only, frameSetID=558345748481.
To avoid loading so many tables at once, which can cause MS SQL to launch multiple sub-processes which get in the way of each other, it may be better to break queries like the one above into 2 parts, using the ENHANCED VERSION of the FREEFORM SQL interface. The first part is to match the *Detection* tables to the *TilePawPrints*. This will be faster if any selection on position is applied here, although it is ESSENTIAL that the default row (tileSetID<0) is also selected. Add in whichever attributes are needed at this stage. In the query below, we have selected two different aperture magnitudes (3 and 9) and the filename for the tiles and each pawprint.

SELECT s.tlmfID,p.tlExtNum,p.tlSeqNum,dt.aperMag3 AS tileMag3, dt.aperMag9 AS tileMag9,mt.fileName AS tileFile, do1.aperMag3 AS o1Mag3, do1.aperMag9 AS o1Mag9, mo1.fileName AS o1File, do2.aperMag3 AS o2Mag3, do2.aperMag9 AS o2Mag9, mo2.fileName AS o2File, do3.aperMag3 AS o3Mag3, do3.aperMag9 AS o3Mag9, mo3.fileName AS o3File, do4.aperMag3 AS o4Mag3, do4.aperMag9 AS o4Mag9, mo4.fileName AS o4File, do5.aperMag3 AS o5Mag3, do5.aperMag9 AS o5Mag9, mo5.fileName AS o5File, do6.aperMag3 AS o6Mag3, do6.aperMag9 AS o6Mag9, mo6.fileName AS o6File
FROM vhsTileSet AS s, vhsTilePawPrints AS p, vhsDetection AS dt, Multiframe AS mt, vhsDetection AS do1, Multiframe AS mo1, vhsDetection AS do2, Multiframe AS mo2, vhsDetection AS do3, Multiframe AS mo3, vhsDetection AS do4, Multiframe AS mo4, vhsDetection AS do5, Multiframe AS mo5, vhsDetection AS do6, Multiframe AS mo6
WHERE s.tileSetID=p.tileSetID AND ((s.ra>115 AND s.ra<125 AND s.dec>-65. AND s.dec<-55.) OR s.tileSetID<0) AND s.tlmfID=dt.multiframeID AND (p.tlSeqNum>0 OR p.tileSetSeqNum<0) AND p.tlExtNum=dt.extNum AND p.tlSeqNum=dt.seqNum AND s.tlmfID=mt.multiframeID AND s.o1mfID=do1.multiframeID AND p.o1ExtNum=do1.extNum AND p.o1SeqNum=do1.seqNum AND s.o1mfID=mo1.multiframeID AND s.o2mfID=do2.multiframeID AND p.o2ExtNum=do2.extNum AND p.o2SeqNum=do2.seqNum AND s.o2mfID=mo2.multiframeID AND s.o3mfID=do3.multiframeID AND p.o3ExtNum=do3.extNum AND p.o3SeqNum=do3.seqNum AND s.o3mfID=mo3.multiframeID AND s.o4mfID=do4.multiframeID AND p.o4ExtNum=do4.extNum AND p.o4SeqNum=do4.seqNum AND s.o4mfID=mo4.multiframeID AND s.o5mfID=do5.multiframeID AND p.o5ExtNum=do5.extNum AND p.o5SeqNum=do5.seqNum AND s.o5mfID=mo5.multiframeID AND s.o6mfID=do6.multiframeID AND p.o6ExtNum=do6.extNum AND p.o6SeqNum=do6.seqNum AND s.o6mfID=mo6.multiframeID

Download this file as a FITS or VOTABLE.
Then using the ENHANCED VERSION, use multiple instances of this table in the following query. In the query below we have compared all the aperMag3 values for tiles and pawprints in all 4 VHS filters. We could have included the aperMag9 values and the filenames too.

SELECT s.sourceID,s.ra,s.dec,ymatch.tileMag3 AS yTileMag3, ymatch.o1Mag3 AS yOff1Mag3, ymatch.o2Mag3 AS yOff2Mag3, ymatch.o3Mag3 AS yOff3Mag3, ymatch.o4Mag3 AS yOff4Mag3, ymatch.o5Mag3 AS yOff5Mag3, ymatch.o6Mag3 AS yOff6Mag3, jmatch.tileMag3 AS jTileMag3, jmatch.o1Mag3 AS jOff1Mag3, jmatch.o2Mag3 AS jOff2Mag3, jmatch.o3Mag3 AS jOff3Mag3, jmatch.o4Mag3 AS jOff4Mag3, jmatch.o5Mag3 AS jOff5Mag3, jmatch.o6Mag3 AS jOff6Mag3, hmatch.tileMag3 AS hTileMag3, hmatch.o1Mag3 AS hOff1Mag3, hmatch.o2Mag3 AS hOff2Mag3, hmatch.o3Mag3 AS hOff3Mag3, hmatch.o4Mag3 AS hOff4Mag3, hmatch.o5Mag3 AS hOff5Mag3, hmatch.o6Mag3 AS hOff6Mag3, ksmatch.tileMag3 AS ksTileMag3, ksmatch.o1Mag3 AS ksOff1Mag3, ksmatch.o2Mag3 AS ksOff2Mag3, ksmatch.o3Mag3 AS ksOff3Mag3, ksmatch.o4Mag3 AS ksOff4Mag3, ksmatch.o5Mag3 AS ksOff5Mag3, ksmatch.o6Mag3 AS ksOff6Mag3
FROM vhsSource AS s,vhsMergeLog AS l, #userTable AS ymatch, #userTable AS jmatch, #userTable AS hmatch,#userTable AS ksmatch
WHERE l.frameSetID=s.frameSetID AND l.ra>115. AND l.ra<125 AND l.dec>-65. AND l.dec<-55. AND l.jmfID=jmatch.tlmfID AND l.jeNum=jmatch.tlExtNum AND s.jseqNum=jmatch.tlSeqNum AND l.ymfID=ymatch.tlmfID AND l.yeNum=ymatch.tlExtNum AND s.yseqNum=ymatch.tlSeqNum AND l.hmfID=hmatch.tlmfID AND l.heNum=hmatch.tlExtNum AND s.hseqNum=hmatch.tlSeqNum AND l.ksmfID=ksmatch.tlmfID AND l.kseNum=ksmatch.tlExtNum AND s.ksseqNum=ksmatch.tlSeqNum

Here each #userTable is an instance of the saved table above, one for each filter. It is important to do any selection in ra/dec here too, especially if you do not query all possible filters, since you may return some all default rows too.
For the VHS, a single epoch survey, we have also included a simple neighbour table to match the vhsSource to pawprint detections: vhsSourceXPawPrints. Selections such as the following can be used to compare the tile and pawprint photometry:

SELECT sourceID,yAperMag3,jAperMag3,hAperMag3,ksAperMag3,d.filterID, d.aperMag3,m.fileName,m.offsetX,m.offsetY,d.extNum
FROM vhsSourceXPawPrints AS x,vhsSource AS s,vhsDetection AS d,Multiframe AS m,vhsMergeLog AS l
WHERE s.sourceID=x.masterObjID AND x.slaveObjID=d.objID AND d.multiframeID=m.multiframeID AND s.frameSetID=l.frameSetID AND l.frameSetID IN (select top 10 frameSetID from vhsMergeLog)
ORDER BY sourceID

This returns all the pawprint detections matched within 1" of each source along with information about which offset and extension and which filter the detection came from.
9. Contemporary colours
Some surveys such as the VVV and VMC have multiple epochs, but also have multiple filters observed close enough together in time to give near simultaneous measurements and therefore a useful colour of a possibly variable object. In these cases, the deep data is stored in the Source table, with colours averaged over all the epochs, giving a better signal-to-noise for non-variables. The contemporary colours are stored in the SynopticSource table. For some programmes, such as the standard star programme (CAL), there are contemporary colours for each epoch.
In the case of the VVV, there is only one epoch of ZYJH, but they are not observed within a well defined period, because the OBs are ZY and JHKs. A new option for creating framesets was designed after VVVDR1, for VVVDR2 and later releases and for Directors Descretionary Time programme 284.C-5034(A), which has a single epoch in ZYHKs and many epochs in J. In this option, set by the "synopticSetup" parameter in the "Programme" table, a single epoch can be forced. In programme 284.C-5034(A), the mean time of the single epoch bands is found, and then the nearest J-band observation is selected. In the case of the VVV, the selection is slightly more complicated: the ZY and JHKs epochs may be months apart. It is preferred to use the Ks taken at the same time as the JH, if possible. If there was a problem with the OB, then the next option is to use a 80s Ks, rather than a 16s and finally, the last option is to take the nearest 16s Ks band. A new parameter "mergeDuration" has been added into "SynopticMergeLog" table so that users can select only those pointings when all 5 bands were taken contemporarily.
To select data with contempory colours in the VVV, with an ID from the main vvvSource table (for joining to other tables (e.g. vvvVariability, or external surveys):
SELECT s.sourceID,ss.synopticID,ss.zmyPnt,ss.zmyPntErr,ss.ymjPnt, ss.ymjPntErr,ss.jmhPnt,ss.jmhPntErr,ss.hmksPnt,ss.hmksPntErr
FROM vvvSource AS s,vvvSourceXSynopticSource AS x, vvvSynopticSource AS ss, (SELECT l.frameSetID,sl.synFrameSetID, sl.meanMjdObs,sl.mergeDuration FROM vvvSynopticMergeLog AS sl, vvvMergeLog AS l WHERE dbo.fGreatCircleDist(l.ra,l.dec,sl.ra,sl.dec)<1) AS ML
WHERE (s.priOrSec=0 OR s.priOrSec=s.frameSetID) AND x.masterObjID=s.sourceID AND x.distanceMins<0.033 AND x.distanceMins IN (SELECT MIN(distanceMins) FROM vvvSourceXSynopticSource WHERE masterObjID=x.masterObjID) AND ss.synopticID=x.slaveObjID AND ML.frameSetID=s.frameSetID AND ML.synFrameSetID=ss.synFrameSetID

The query within the FROM string creates a temporary match between the vvvMergeLog (list of frame sets used to create vvvSource) and vvvSynopticMergeLog (list of frame sets used to create vvvSynopticSource), within 1 arcminute, so that repeated matches in overlap regions aren't returned. If you prefer all the overlapping contemporary colours, exclude this and all parts of the query with the ML alias. The next query adds in a selection for only the frames where contemporary colours were taken within a day (mergeDuration<1.)

SELECT s.sourceID,ss.synopticID,ss.zmyPnt,ss.zmyPntErr,ss.ymjPnt, ss.ymjPntErr,ss.jmhPnt,ss.jmhPntErr,ss.hmksPnt,ss.hmksPntErr
FROM vvvSource AS s,vvvSourceXSynopticSource AS x, vvvSynopticSource AS ss, (SELECT l.frameSetID,sl.synFrameSetID, sl.meanMjdObs,sl.mergeDuration FROM vvvSynopticMergeLog AS sl, vvvMergeLog AS l WHERE dbo.fGreatCircleDist(l.ra,l.dec,sl.ra,sl.dec)<1 AND mergeDuration<1.) AS ML
WHERE (s.priOrSec=0 OR s.priOrSec=s.frameSetID) AND x.masterObjID=s.sourceID AND x.distanceMins<0.033 AND x.distanceMins IN (SELECT MIN(distanceMins) FROM vvvSourceXSynopticSource WHERE masterObjID=x.masterObjID) AND ss.synopticID=x.slaveObjID AND ML.frameSetID=s.frameSetID AND ML.synFrameSetID=ss.synFrameSetID

10. 3D Extinction Maps
We have started incorporating 3D Extinction Maps into the WFAU Archives. We have described the details behind the database design here.
. We incorporate the maps into two tables, a pixel map and an extinction as a function of pixel and distance. We also include a table of extinction coefficients for a range of passbands and extinction laws, matching tables to overlapping source tables and a function to do efficient matches for other data. All 3D extinction maps are stored in the EXTINCT database (see the SchemaBrowser). Below we give a few basic examples: Example 1: Ks-band extincion corrected magnitude for a particular VVV source (in the bulge) at a distance of 5kpc. The VVV is linked to a 3D map of the bulge (Chen et al. 2013, ) which is mapID=2 in the EXTINCT..ThreeDimExtinctionMaps table.

SELECT s.sourceID, Ext.r, s.ksAperMag3, Ext.ejks, Ext.aKsCard, (s.ksAperMag3-Ext.aKsCard) AS ksAperMag3ExtCor
FROM vvvSource AS s, vvvSourceExtinction AS c, (SELECT e.*, (e.ejks*fks.aEJKsCard) AS aKsCard FROM EXTINCT..vvvBulge3DExtinctVals AS e, EXTINCT..FilterExtinctionCoefficients AS fks
WHERE fks.filterID=9) AS Ext WHERE s.sourceID=515488609610 AND s.sourceID=c.sourceID AND Ext.pixelID=c.extPixelID AND c.extMapID=2 AND Ext.r=5.

The extinction values are stored in this map as (J-Ks) and (H-Ks) colour excesses and must be converted to extinction values in the x^th band using
A_x=E(J-Ks)\times\frac{A_x}{E(J-Ks)}
coefficients in the FilterExtinctionCoefficients table. There are values for both the Cardelli and the Nishiyama extinction laws. In this case it is recommended by the creators of the 3D extinction map that users use Nishiyama for
\|b\|<4
and Cardelli for $\|b\|\geq4$. We have calculated Cardelli coefficiencts for all passbands and Nishiyama for
1.2<\lambda<8.0\mu\,m
over the wavelength range for which the law was measured.
A second example selects a large sample of objects to produce an absolute colour magnitude diagram:

SELECT s.sourceID, corR, (jAperMag3-5.*log10(corR)-10.-aJ) AS absJ, (zAperMag3-aZ-(yAperMag3-aY)) AS zmyPntCor, sqrt(jAperMag3Err*jAperMag3Err+aJErr*aJErr+(2.17*0.25/corR)* (2.17*0.25/corR)) AS absJErr, sqrt(zAperMag3Err*zAperMag3Err+aZErr*aZErr+ yAperMag3Err*yAperMag3Err+aYErr*aYErr) AS zmyPntCorErr
FROM vvvSourceExtinction AS c, vvvSource AS s, (select pixelID,(r+0.005) AS corR,ejks,ejksErr, (ejks*fz.aEJKsCard) AS aZ, (ejksErr*fz.aEJKsCard) AS aZErr, (ejks*fy.aEJKsCard) AS aY, (ejksErr*fy.aEJKsCard) AS aYErr, (ejks*fj.aEJKsCard) AS aJ, (ejksErr*fj.aEJKsCard) AS aJErr, (ejks*fh.aEJKsCard) AS aH, (ejksErr*fh.aEJKsCard) AS aHErr, (ejks*fks.aEJKsCard) AS aKs, (ejksErr*fks.aEJKsCard) AS aKsErr from EXTINCT..vvvBulge3DExtinctVals AS e, EXTINCT..FilterExtinctionCoefficients AS fz, EXTINCT..FilterExtinctionCoefficients AS fy, EXTINCT..FilterExtinctionCoefficients AS fj, EXTINCT..FilterExtinctionCoefficients AS fh, EXTINCT..FilterExtinctionCoefficients AS fks where fz.filterID=5 and fy.filterID=6 and fj.filterID = 7 and fh.filterID=8 and fks.filterID=9) AS Ext
WHERE c.extPixelid=Ext.pixelid AND c.sourceID=s.sourceID AND c.extMapID=2 AND s.jAperMag3<16. AND yAperMag3>0. AND jAperMag3>0. AND Ext.corR>0.25 AND s.l<2. AND abs(s.b)<1.

This gives the (Z-Y) vs $M_{J}$ values at a set of distances from 0.25kpc to 10kpc. An offset of 0.005kpc was added so that if calculated at a distance of 0, the distance modulus calculations return floating point values. This query combines the colour excesses and filter extinction coefficients into a temporary table in memory Ext.
More complex selections can be made with all the other VVV tables and link to the 3D maps through vvvSourceExtinction.
If you want to get extinction values for a different survey, or a random position, we have a function fgetPixelID(ra,dec) that can be used efficiently for a few thousand objects. fgetPixelID should be used with the CROSS APPLY function, using table CROSS APPLY EXTINCT.dbo.fgetPixelID(ra,dec). This can be done with any table that contains right-ascension and declination values.
The following example gives the extinction corrected spectral energy distribution for a star as a function of distance in VVV and Glimpse bands:

SELECT s.sourceID, Ext.r, (s.zAperMag3-Ext.aZCard) AS zAperMag3ExtCor, (s.yAperMag3-Ext.aYCard) AS yAperMag3ExtCor, (s.jAperMag3-Ext.aJCard) AS jAperMag3ExtCor, (s.hAperMag3-Ext.ahCard) AS hAperMag3ExtCor, (s.ksAperMag3-Ext.aKsCard) AS ksAperMag3ExtCor, (g.mag1-Ext.a34Card) AS mag1ExtCor, (g.mag2-Ext.a45Card) AS mag2ExtCor, (g.mag3-Ext.a58Card) AS mag3ExtCor, (g.mag4-Ext.a80Card) AS mag4ExtCor
FROM vvvSource AS s CROSS APPLY EXTINCT.dbo.fgetPixelID(ra,dec) AS c, (SELECT e.*, (e.ejks*fz.aEJKsCard) AS aZCard, (e.ejks*fy.aEJKsCard) AS aYCard, (e.ejks*fj.aEJKsCard) AS aJCard, (e.ejks*fh.aEJKsCard) AS aHCard, (e.ejks*fks.aEJKsCard) AS aKsCard, (e.ejks*f34.aEJKsCard) AS a34Card, (e.ejks*f45.aEJKsCard) AS a45Card, (e.ejks*f58.aEJKsCard) AS a58Card, (e.ejks*f80.aEJKsCard) AS a80Card FROM EXTINCT..vvvBulge3DExtinctVals AS e, EXTINCT..FilterExtinctionCoefficients AS fz, EXTINCT..FilterExtinctionCoefficients AS fy, EXTINCT..FilterExtinctionCoefficients AS fj, EXTINCT..FilterExtinctionCoefficients AS fh, EXTINCT..FilterExtinctionCoefficients AS fks, EXTINCT..FilterExtinctionCoefficients AS f34, EXTINCT..FilterExtinctionCoefficients AS f45, EXTINCT..FilterExtinctionCoefficients AS f58, EXTINCT..FilterExtinctionCoefficients AS f80 WHERE fz.filterID=5 AND fy.filterID=6 AND fj.filterID=7 AND fh.filterID=8 AND fks.filterID=9 AND f34.filterID=20 AND f45.filterID=21 AND f58.filterID=22 AND f80.filterID=23) AS Ext, GLIMPSE..glimpse_hrc_inter AS g
WHERE s.sourceID=515488609610 AND g.seqNo=406 AND Ext.pixelID=c.pixelID AND Ext.r>5.

10. Matched Aperture Photometry
We store matched aperture photometry for several projects in a set of specially designed tables. The main specifications for each matched aperture product are stored in the table RequiredMatchedApertureProduct. For example, for UltraVISTA, we have produced two products: dual image mode processing of the deep mosaics, and a list-based remeasurement of sources at each epoch. Each product has a separate requirement in the table:

SELECT *
FROM RequiredMatchedApertureProduct
WHERE programmeID=160

This returns:
The requirements include the extractor to use, e.g. SExtractor (SEX) or casu_imcorelist (CASU), the name, which is used as the suffix for the table of apertures, the selection method for generating the list of apertures (dual image mode, SQL selection on existing tables, or a file of inputs) and the mapType (0,1,2). 0 is a band-merged catalogue within the programme, 1, is a band-merged catalogue across archives, e.g. VISTA-VHS and VST-ATLAS and 2 is a multi-epoch catalogue. A second table that includes important specifications is the RequiredMapAverages table, which includes details of averaging schemes for multi-epoch tables.

SELECT *
FROM RequiredMapAverages
WHERE programmeID=160

This returns:
As you can see, there are only averages for mapID=2, the multi-epoch map specified above, but 3 sets of averages are designed, with averaging over 3 months, 6 months and one year respectively.
To link the matched aperture photometry to other data in the release depends on the set up.
For a dual image mode setup, the apertures are defined by detections in a specific catalogue, extracted from the master image(s). For instance in ULTRAVISTA, this is a single Ks image, but the same query would apply to data extracted from multiple pointings. In the detection tables, objID is a unique identifier, so linking the matched aperture 'apertureID' back to the main band-merged sourceID can be done as below.

SELECT ma.apertureID,s.sourceID
FROM ultravistaSource AS s,ultravistaMergeLog AS l,ultravistaDetection AS d, MapApertureIDsultravistaDual AS ma
WHERE s.frameSetID=l.frameSetID AND l.ksmfID=d.multiframeID AND l.kseNum=d.extNum AND s.ksseqNum=d.seqNum AND d.objID=ma.objectID

Where the unique Ks identifier (objID) is linked to the MapApertureIDsultravistaDual objectID. More complex scenarios where a dual image master image is created by combining multiple bands is also possible, but then there may not be a simple link to the original band-merged catalogue.
For SQL selections, the SQL given in the selection points to which tables can be joined to. For instance, in a special VIKING release, VIKINGZYSELJv20170124, uses a selection of J-band VIKING sources with measurements in both Z and Y to define apertures:

SELECT s.sourceID,s.ra,s.dec
FROM vikingSource AS s,vikingMergeLog AS l
WHERE l.frameSetID=s.frameSetID AND l.zmfID>0 AND l.ymfID>0 AND s.jAperMag3>0. AND (s.priOrSec=0 OR s.priOrSec=s.frameSetID)

Since the selection is a subset of the vikingSource catalogue with the same unique identifier, linking these two is straightforward

SELECT ma.apertureID,s.sourceID
FROM vikingSource AS s, VIKINGZYSELJv20170124..MapApertureIDsvikingZY_selJ AS ma
WHERE s.sourceID=ma.objectID
/* ramses9.. */

Adding VIKINGZYSELJv20170124.. and /* ramses9.. */ is necessary for this release
Getting photometry.
Photometry for matched aperture products is stored in the xxxMapRemeasurement table, e.g. ultravistaMapRemeasurement, which has much in common with the xxxDetection tables in terms of attributes, but the primary key is apertureID,catalogueID. Catalogues are defined in the table MapFrameStatus which links them to images. Images can have multiple matched aperture product catalogues, defined by different sets of apertures.
For instance, selecting how many apertures are in each matched aperture catalogue is done as

SELECT m.multiframeID,m.fileName,mfs.catName,COUNT(apertureID) as nApertures
FROM MapFrameStatus as mfs,ultravistaMapRemeasurement as mr, Multiframe as m
WHERE mr.catalogueID=mfs.catalogueID and m.multiframeID=mfs.multiframeID
GROUP BY m.multiframeID,m.fileName,mfs.catName

Forced photometry is different to the original source extraction, in that a measurement can be made anywhere, not only where there are connected pixels above a threshold, so for instance a high redshift quasar may drop out in some shorter wavelength bands, and the measurement may be equivalent to sky. Converting almost zero flux (possibly even negative) to a magnitude and calculating a sensible error doesn't really work, so as well as magnitudes, we calculate luptitudes and calibrated fluxes (calibrated to the Jansky using the AB system). The following gives the calibrated flux, luptitude and standard magnitude in an aperture corrected 1-arcsec radius aperture for objects in the Ultravista Y-band mosaic extracted in dual image mode based on the Ks-band detections, where the Y-band magnitude is default.

SELECT mr.apertureID,mr.x,mr.y,mr.ra,mr.dec,mr.aperJky3,mr.aperLup3,mr.aperMag3,mr.ppErrBits
FROM ultravistaMapRemeasurement as mr
WHERE mr.mapID=1 and mr.aperMag3<0 and mr.filterID=2

The forced photometry products can be grouped together, much like the standard independent detections. A band merged catalogue can be formed, much like the xxxSource table, we produce a xxxSourceRemeasurement. This is simpler, as grouping does not have to be done by ra and dec, simply using the apertureID. Colours are calculated in luptitudes and calibrated flux too. The query below select very red objects (Y-J)>3 (in luptitudes), (J-H)>1 and faint (Ks petro mag >23), which may be very high redshift objects in UltraVISTA.

SELECT sr.apertureID,sr.ra,sr.dec,sr.yAperLup3,sr.yAperLup3Err,sr.jAperLup3,sr.jAperLup3Err, sr.hAperLup3,sr.hAperLup3Err,sr.ksAperLup3,sr.ksAperLup3Err,sr.ksPetroMag
FROM ultravistaSourceRemeasurement as sr
WHERE sr.ymjPntLup>3.0 and sr.jmhPntLup>1.0 and sr.ksPetroMag>23. and sr.ksppErrBits=0 and sr.yppErrBits=0 and sr.jppErrBits=0 and sr.hppErrBits=0

Multi-epoch catalogues do not require a BestMatch table since all epochs can be linked using the apertureID.
A single light curve for a given apertureID can be selected as follows:

SELECT mr.apertureID,m.mjdObs,m.filterID,mr.aperMag3,mr.aperMag3Err,mr.aperJky3, mr.aperJky3Err,mr.ppErrBits
FROM ultravistaMapRemeasurement as mr, Multiframe as m
WHERE m.multiframeID=mr.multiframeID and mapID=2 and apertureID=1000
ORDER BY m.mjdObs

Currently there are no variability tables for matched-aperture multi-epoch tables. To get the matched aperture light-curve in UltraVISTA for objects select using the standard detection table, a match between the sourceID and apertureID, see examples above is necessary. In ULTRAVISTADR4, a mistake occurred so specified selection was not used but the apertures defined by the dual image mode were used instead. The following selects faint (Ks>20. mag) variables (variableClass=1) with at least 50 good Ks band detections, and range in magnitude of >1.2 mag, and then returns the Ks-band matched-aperture light-curve.
SELECT v.sourceID,m.mjdObs,mr.aperMag3,mr.aperMag3Err,mr.ppErrBits
FROM ultravistaVariability as v,ultravistaSource as s,ultravistaMergeLog as l,ultravistaDetection as d, MapApertureIDsultravistaMapLc as ma,ultravistaMapRemeasurement as mr, Multiframe as m
WHERE v.variableClass=1 and v.ksnGoodObs>=50 and v.ksMeanMag>20. and (v.ksMaxMag-v.ksMinMag)>1.2 and v.sourceID=s.sourceID and s.frameSetID=l.frameSetID and l.ksmfID=d.multiframeID and l.kseNum=d.extNum and s.ksseqNum=d.seqNum and d.objID=ma.objectID and ma.apertureID=mr.apertureID and mr.multiframeID=m.multiframeID and mr.filterID=5 and mr.aperMag3>0.

For faint objects like the one above, individual measurements may have low signal to noise, so fainter variables can be analysed at higher signal-to-noise, but at lower temporal resolution by averaging the data. While the user can do this whichever way they like, we provide some already calculated in the table xxxMapRemeasAver, averaged as determined using the specifications in RequiredMapAverages. Below we select the same light-curves but average over a 3-month interval (avSetupID=0) using the MapProvenance table that links the average photometry catalogueID to the individual remeasurements.

SELECT v.sourceID,AVG(m.mjdObs)as avMjdObs, (MAX(m.mjdObs)-MIN(mjdObs)) as interval, mra.aperMag3,mra.aperMag3Err,mra.ppErrBits
FROM ultravistaVariability as v,ultravistaSource as s,ultravistaMergeLog as l,ultravistaDetection as d, MapApertureIDsultravistaMapLc as ma,ultravistaMapRemeasAver as mra, MapProvenance as mv, Multiframe as m,MapFrameStatus as mfs
WHERE v.variableClass=1 and v.ksnGoodObs>=50 and v.ksMeanMag>20. and (v.ksMaxMag-v.ksMinMag)>1.2 and v.sourceID=s.sourceID and s.frameSetID=l.frameSetID and l.ksmfID=d.multiframeID and l.kseNum=d.extNum and s.ksseqNum=d.seqNum and d.objID=ma.objectID and ma.apertureID=mra.apertureID and mra.filterID=5 and mra.aperMag3>0. and mra.catalogueID=mv.combiCatID and mv.avSetupID=0 and mv.catalogueID=mfs.catalogueID and mfs.multiframeID=m.multiframeID
GROUP BY v.sourceID,mra.aperMag3,mra.aperMag3Err,mra.ppErrBits
ORDER BY v.sourceID,AVG(m.mjdObs)

One thing to note, the averaging in time intervals is usually done in pawprint pointings, so an object in a VISTA tile/mosaic will probably be on 2 or more pointings, so will have overlapping intervals. In some cases we have specified to sum over overlaps (see RequiredMapAverages, but usually in special cases with limited numbers of objects.

Home | Overview | Browser | Access | Login | Cookbook
Listing | FreeSQL
Links | Credits

WFAU, Institute for Astronomy,
Royal Observatory, Blackford Hill
Edinburgh, EH9 3HJ, UK

vsa-support@roe.ac.uk
23/5/2020

An Introductory SQL Tutorial for VSA Users

Contents

1. Introduction