View on GitHub

Oak Ridges Moraine Groundwater Program - Database Manual

ORMGP Database Manual

Section 3.3.4 Frequently Asked Questions (FAQ)

This section outlines various suggestions regarding database use that don’t easily fit into previous categories. These include:

Are database tables like spreadsheets?

Cells (i.e. row and column intersections) are treated differently in a spreadsheet versus in relational database software. In the former, a cell can contain information OR it can contain formulas and/or formatting directives; in the latter, only information (i.e. a value) is found. An exception to this could occur in views where a calculation could create the values in a particular field. Spreadsheet cells can also be dependent upon values or calculations in other cells, whereas in databases, cells (values) are independent (but logically related (through the use of primary and foreign keys - these are columns/values that occur across tables but are equivalent, allowing the tables to be related based upon them).

Collation - what is it and how can I work around it?

Collation is concerned with how character (i.e. text) data is interpreted by SQL Server (this includes any of the ‘char()’, ‘varchar()’, ‘text’, ‘nchar()’, ‘nvarchar()’ and ‘ntext’ field types). SQL Server has a number of different collations available - when comparing text fields (using, for example, an ‘=’ sign), if they are set to different collations, an error will be returned, along the lines of

Msg 468, Level 16, State 9, Line 9 Cannot resolve the collation conflict between “Latin1_General_CI_AS” and “SQL_Latin1_General_CP1_CI_AS” in the equal to operation.

In general, this should only happen when comparing text data across multiple databases. In order to perform this comparison, then, the text fields being compared must be assigned (temporarily) the same collation. This can be done, for example, as follows

SELECT
dloc.*
FROM
[OAK_20120615_Master].dbo.D_LOCATION as dloc
INNER JOIN 
[MOE_201304].dbo.TBLBORE_HOLE as moe
ON
dloc.LOC_ORIGINAL_NAME COLLATE DATABASE_DEFAULT = moe.WELL_ID COLLATE DATABASE_DEFAULT

Note that we’re re-assigning the two text fields (one from the ‘OAK_20120615_MASTER’ database and the other from the ‘MOE_201304’ database; the latter being the April cut of the MOE water well database, distributed in Microsoft Access format) to the default collation assigned to this (current) version of the database. Instead of specifying ‘DATABASE_DEFAULT’, users could choose any of the collation types (for example, ‘SQL_Latin1_General_CP1_CI_AS’ as found in the error message, above). Alternative methods are available for ‘permanently’ assigning/changing the collation (these are not further commented upon).

For general interest, in this particular case, we’re comparing the LOC_ORIGINAL_NAME from OAK_20120615_MASTER against the WELL_ID in MOE_201304 where the former (for MOE wells) contains the original name as applied by the MOE. This allows the data to be compared between the source and current databases.

Depth to Elevation conversion - SiteFX

This description is also found in Section 2.1 (describing the R_UNIT_CONV table).

For conversion of depths to elevations, SiteFX does not use the above table. Instead, any of the following units

when encountered (in D_INTERVAL_TEMPORAL_2) are automatically converted (to masl) if a ‘reference elevation’ (in D_INTERVAL_REF_ELEV) is specified for the interval.

Did the well reach bedrock?

The view V_GEN_BOREHOLE_BEDROCK returns all those borehole locations which reach bedrock along with a number of identification fields regarding the particular location. The bedrock elevation has been included as a field in the D_BOREHOLE table (BH_BEDROCK_ELEV) and is populated automatically (refer to Appendix G.32 for details). All boreholes where this field is not null would be considered bedrock wells.

Using either this view (V_GEN_BOREHOLE_BEDROCK) or the BH_BEDROCK_ELEV in the D_BOREHOLE table, users can create their own query (using techniques outlined in the ‘Training’ sections) to extract information for bedrock wells.

For what location types can geologic information be found?

Geologic information could be found within any of the following (location) types (as found in LOC_TYPE_CODE in D_LOCATION)

LOC_TYPE_CODE’s are referenced in the R_LOC_TYPE_CODE look-up table.

How do I find wells that were previously referred to as Golden Spikes?

The term “Golden Spike” was coined by the Geological Survey of Canada to refer to wells that were cored and properly logged by a geoscientist and that had screened pipes installed to record water levels and to obtain samples for water quality analyses. The problem arose as to those wells that may have been rotary drilled or augered and sampled rather than cored, but that were also logged by a geoscientist. Were these then ‘Golden Spikes’? Or if wells were not logged but had screened intervals and had many high quality water level measurements and water samples from a known depth. Were these to be considered ‘Golden Spikes’ from a hydrogeology perspective?

Rather than grading wells as ‘Golden Spikes’ of one form or another it is now left to the user to determine the information that they want to extract from the database. To find higher quality information from the database the following tips are provided.

DRILL_METHOD_CODE

Wells that have been cored tend to have better geological information. Within the D_BOREHOLE table wells that have a DRILL_METHOD_CODE of ‘42’ (‘PQ Cored’) or ‘14’ (‘HQ Cored’) would have a better geological record. Following on the cored boreholes, those with a DRILL_METHOD_CODE of ‘13’ (‘Hollow Stem Auger’) would also be overseen by a consultant and would also tend to have better geological records. Note that many, if not all, of the UGAIS boreholes are flagged as ‘Hollow Stem Auger’ drilled - it is unknown if this was indeed the case.

LOC_STUDY

For wells that are more important from the perspective of one of the partner agencies or from an overall ORMGP perspective the Loc_Study field is generally populated to flag details regarding a particular study the borehole may be associated with. Querying on the Loc_Study field (key words that may yield better quality wells include IWA/GSC/OGS/MOE OWRC, TTC) - or even filtering for wells where the Loc_Study field is “not null” will generally turn up wells that may have higher quality information associated with them. This is currently a “rule of thumb” users are advised that it could be that some high quality boreholes do not have a populated Loc_Study field and it may also be possible that some poorer quality wells do have a populated Loc_Study field as a result of their being part of a group of wells.

D_LOCATION_PURPOSE

At times, key wells can be associated with a primary or secondary purpose. For example, if one is looking for municipal wells, a query that selects wells with a PRIMARY_PURPOSE_CODE of ‘1’ (‘Water Supply’) coupled with a SECONDARY_PURPOSE_CODE of ‘22’ (‘Municipal Supply’), ‘58’ (‘Municipal Monitoring’) or ‘59’ (‘Municipal Exploration’) would return all of the wells in the database that are known to be associated with any municipality’s groundwater supply.

V_GEN_LAB_SUMMARY_SAMPLE_COUNT and V_GEN_FIELD_SUMMARY

These views summarizes the number of temporal measurements that are in the database for any particular location (by interval). However, depending on any filtering applied, these views can take some time to run. However, if a user is looking for wells that have many temporal measurements associated with them (e.g. water levels, pumping data, chemistry, etc.) these views are a good way to evaluate these locations.

V_GEN_HYDROGEOLOGY

This view summarizes the water level information for the wells within the database. If one is looking for wells with many water level measurements, to plot a hydrograph for instance, the summary in this table will indicate the best wells. Note that this view can take a few minutes to load depending on the size of your agency’s database.

How do I access soil/rock sample information

An existing field, SAM_TYPE_CODE, would seemingly provide this information. However, this is a general field (tied to R_SAM_TYPE_CODE) listing whether the sample taken is, for example, a ‘Duplicate’, ‘Regular Sample’, ‘Lab Replicate’, etc?

Instead, the user can access the D_INTERVAL table (as a first step) to determine which intervals are given a ‘Soil or Rock’ designation (and INT_TYPE_CODE of ‘29’). The INT_ID provided here allows a filter to be applied to D_INTERVAL_TEMPORAL_1A determining which samples are related to a ‘Soil or Rock’ interval. The SAM_ID from the D_INTERVAL_TEMPORAL_1A table can then be used to extract the actual parameter information from D_INTERVAL_TEMPORAL_1B.

The SQL code example is shown (note that insets are used to separate the ‘sections’ of the SQL statement):

select
dit1b.*
from 
[OAK_20120615_CLOCA].dbo.D_INTERVAL_TEMPORAL_1B as dit1b
where 
dit1b.SAM_ID
in
(
    select
    dit1a.SAM_ID
    from 
    [OAK_20120615_CLOCA].dbo.D_INTERVAL_TEMPORAL_1A as dit1a
    where 
    dit1a.INT_ID
    in
    (
        select
        dint.INT_ID
        from 
        [OAK_20120615_CLOCA].dbo.D_INTERVAL as dint
        where
        dint.INT_TYPE_CODE=29
    )
)

How do I assign a row number to a randomly created identifier

In many cases, when assembling data in tables for eventual import into the ORMGP database, it is desirable to create a series of 32-bit (4 byte) identifiers - these are commonly used within various tables as a primary key. The ORMGP database contains a number of views (V_SYS_RANDOM_ID_*) which return rows containing a single internally non-duplicated value (usually to a maximum of one million rows). This value would then need to be checked against the identifier values currently found within the table under examination and a way of relating the random-table to the database-table specified. One of the easiest is through assignment of a ‘row-count’ field for both tables - allowing the tables to be related based upon this (otherwise unnecessary) numeric value. A way to accomplish this is shown, following:

select
 ycb.LOC_ID
,vr2.NEW_ID as [INT_ID]
from 
[MOE_20210901].dbo.YC_20210901_BHID as ycb
inner join 
(
    select 
     vr1.NEW_ID
    ,row_number() over (order by vr1.NEW_ID) as RNUM
    from 
	(
		select 
		NEW_ID
		from 
		OAK_20160831_MASTER.dbo.V_SYS_RANDOM_ID_CLOCA 
		where 
		NEW_ID
		not in
		(
		   select INT_ID from OAK_20160831_MASTER.dbo.D_INTERVAL
		)
	) as vr1
) as vr2
on
ycb.RNUM=vr2.RNUM

This determines random INT_ID’s that do not already exist in the D_INTERVAL table. Note that the ‘row-count’ field (RNUM) is created after determining whether a particular value is already present in the table being examined.

How do I connect two instances of SQL Server

When working with the database it is, at times, useful to be able to connect to a separate database running under another instance of SQL Server. For example, the partner version of the ORMGP database may be running on the main server in the IT department while the user is working with a temporary database located on their local machine (this second database would contain additional queries useful to the user but not available in the main database). SQL Server has a built-in ‘stored procedure’ available for linking the two instances, ‘sp_addlinkedserver’, accessed (for example) as follows (as an SQL statement)

sp_addlinkedserver 'MAIN\PARTNER',@srvproduct='SQL Server'

In this example it is assumed the user is already logged into an instance of SQL Server (other than the one listed). Using this command, we’re linking to another instance of SQL Server named ‘MAIN\PARTNER’. Accessing information, then, from that server would then entail listing the name of the server in-addition to the database, database object and table name, as follows

SELECT \* FROM [MAIN\PARTNER].OAK_20160831_CLOCA.dbo.D_LOCATION

Note that the user must have adequate permissions (granted by the database administrator) to be able to access this ‘stored procedure’. Once the user is finished connecting to the additional instance it can be removed using another stored procedure ‘sp_dropserver’, as follows

sp_dropserver 'MAIN\PARTNER'

If windows authentication is not being used, the stored procedure ‘sp_addlinkedsrvlogin’ is used to specify the login name and password required to access the remote server. Note that the user should check with the network/computer administrator regarding security concerns when using this particular stored procedure. The command itself is of the form

sp_addlinkedsrvlogin
@rmtsrvname='<server name>'
,@useself='<TRUE|FALSE|NULL>'
,@locallogin='<local login account>'
,@rmtuser='<remote user name>'
,@rmtpassword='<remote password>'

where

This command must be run after the ‘sp_addlinkedserver’ command. A further command ‘sp_droplinkedsrvlogin’ must be run (as well) when disconnecting from the remote server. This is of the form

sp_droplinkedsrvlogin 
@rmtsrvname='<server name>'
,@locallogin='<local login account>'

Alternatively, if working outside of ‘Microsoft SQL Server Management Studio’, packages such as Microsoft Access/Excel (or almost any that allow connection to databases through ODBC) allow tables to be imported/linked from any valid DSN source (refer to Section 3.1).

How do I copy a backup of the database (and overwrite my original)

For stand-alone copies of the database, updates are performed through a restore-from-backup procedure. The user will need to copy the file to an accessible location available to the SQL Server (where the user’s database currently is present; for example, C:\TEMP). Then, to ‘restore’ (i.e. replace) your local database, select by ‘right-clicking’ on the database name within ‘SQL Management Studio’. Then:

Note that depending upon the size of the database to be restored, the actual restore process may take some time. This should not be done against a replicating database.

How do I create random integers in the database

The view (found in the supplementary database OAK_SUP, i.e. not in the ‘Master’) ‘V_Random_ID_Creator’ will create up-to 1,000,000 random NEW_ID integers that can be used for other tables. It runs the following (example) code, internally (note the limiting 1000000 value).

SELECT     
NEW_ID
FROM        
(SELECT  NEW_ID, COUNT(*) AS rcount
FROM
(SELECT TOP (1000000) CONVERT([int], CONVERT([varbinary](4000), NEWID(), 0), 0) 
AS NEW_ID
FROM 
OAK_20160831_MASTER.dbo.D_INTERVAL_TEMPORAL_2) AS nid
GROUP BY NEW_ID) AS nidnd
WHERE (rcount = 1)

This returns a single value (for each row) that does not duplicate itself within the query. Note the use of the D_INTERVAL_TEMPORAL_2 table - the number of rows returned by the query is dependent upon the table chose; the use of this table avoids this limitation.

Users should compare the returned numbers against a list (from the requisite table) of IDs. Refer to ‘How do I assign a row number to a randomly created identifier*’ (above) for further details.

How do I know if a surface water station is a long term gauge versus a spot flow location

The SW_COMMENT field in D_SURFACEWATER can be examined for text starting with ‘Local Station’ which identifies local spot flow stations. Also, the SW_SUBTYPE_CODE of ‘3’ (‘Local Station - Spotflow’) can be used to access these locations (through their LOC_ID) as well.

SW stations are also flagged in the LOC_STUDY field in the D_LOCATION table as being ‘Spot Flow’ or ‘Environment Canada Station’. A filter on the LOC_STUDY field for ‘spot flow’ would yield all of the spot flow locations (and also allows partitioning of the results by the conservation authority that originally provided the information).

The view V_GEN_STATION_SURFACEWATER provides information on surface water stations as well as summarizing the number of flow measurements available (including the time period of measurement).

How is geologic information interpreted and to what tables in the database is this information assigned

All of the geology information is stored in the D_GOLOGY_LAYER table. Given an example geologic description ‘Brown fine to medium sand, with gravel, some silt and trace clay, dense, moist, trace organic material’, the field breakdown would be (refer to the look-up tables for the numeric values)

Note that the term ‘Dense’ can be attributed in each of the GEOL_MAT1-4_CODE fields as well as in the consistency codes. If there is room (i.e. the geological description is short, e.g. ‘Till, Silty, Dense’) then the ‘Dense’ descriptor may be found in the GEOL_MAT3_CODE field (a value of ‘66’). The entire description would be stored in the GEOL_DESCRIPTION field.

Is the Well Flowing?

The field MON_FLOWING can be examined in the D_INTERVAL_MONITOR table; a non-null or non-zero value indicates a flowing condition. A value greater than zero (and not a null) in the FLOWING_RATE_IGPM field in the D_PUMPTEST table indicates a flowing condition as well.

Temperatures for inclusion as temporal data

Users should not input ‘Temperature’ records without further selecting the appropriate temperature reading name based upon the interval type (e.g. climate stations should be meteorological, screens should be logger related and spot flows should be field related). Care should be taken when inputting these values. Don’t forget that the RD_NAME_OUOM must be specified correctly as well (otherwise SiteFX, if used, may overwrite the otherwise correctly coded RD_NAME_CODE).

UGAIS wells - how can we find them

UGAIS (Urban Geology Analysis Information System) wells have their LOC_NAME in D_LOCATION pre-pended with a ‘UGAIS-‘ tag. Finding the UGAIS locations is accomplished by:

SELECT
*
FROM
D_LOCATION
WHERE
LOC_NAME LIKE 'UGAIS%'

Under Microsoft Access, the ‘%’ sign (used in SQL Server to denote a wildcard) is replaced with a ‘*’. Along with other geotechnical wells, the UGAIS wells are also flagged with a PURPOSE_PRIMARY_CODE of ‘3’ (‘Engineering) and a PURPOSE_SECONDARY_CODE of ‘47’ (‘Geotechnical Testhole’) - these wells can also be found using these codes.

Use of SYS_TEMP1 and SYS_TEMP2 as temporary markers

SYS_TEMP1 and SYS_TEMP2 are found in most data (D_*) tables throughout the database. It is important to know that the contents of both of these fields are not guaranteed to remain constant over time - i.e. they can be reassigned by any user at any time. As such, they can be used to mark a subset of records within the database, temporarily, for further analysis/investigation. This can be done by assigning a value to either field that would be totally different from any other value present - note that SYS_TEMP1 is a text field and SYS_TEMP2 is an integer (whole numbers only) field. For example, assigning the current date in the form ‘20120829’ to the SYS_TEMP2 field for a subset of records in the D_LOCATION table allows queries to be run against D_LOCATION looking for that particular value in that particular field. The fields can be useful for flagging a set of locations that are required by a consultant working on a small part of your agency’s jurisdiction. The ‘Consultant Views’ can then be run to only extract the information from the wells flagged through one of these fields.

What is the use of the look-up (R_*) tables

The ‘R_*’ tables enable the database to store data efficiently using a process of normalization. This involves creating additional tables each with less-and-less data but with a defined ‘relationship’ allowing the re-assembly of the information as required. In so doing, the opportunity for blank fields is reduced and the database has improved data integrity (i.e. reduction in data errors). If the process is used to an extreme, however, it can lead to slow response times since long, involved queries are necessary for any request. Thus, a compromise is necessary relating data integrity, database response times and database usability. An example of this is the use of the views within the ORMGP database. In some cases, the slower running time as the information is extracted from multiple tables, converted from a ‘numeric code’ to a ‘text description’ and incorporated in the results balances with the complexity of the query (hidden from the user) necessary to extract the information.

The MOE WWIS typically provides two pumping rates for every well: the first is the actual pumping rate that was used to test the well during the initial testing; and the second is the pumping rate that the driller recommends be used for the well based on the response of the well to the initial pumping.

Where is the MOE Static Water Level?

The MOE water levels (including static) are stored in the D_INTERVAL_TEMPORAL_2 table. The MOE water levels have been split into those that reflect the static condition (i.e. an RD_NAME_CODE of ‘628’, ‘Water Level - Manual - Static’) versus those that may reflect pumping conditions (i.e. an RD_NAME_CODE of ‘70899’, ‘Water Level - Manual - Other’). In addition, an RD_TYPE_CODE of ‘0’ (‘MOE Well Record - Static’), ‘64’ (‘MOE Well Record - Recovery’), ‘31’ (‘MOE Well Record - Pumping or Recovery’) or ‘65’ (‘MOE Well Record - Pumping’) allows the identification of specific measurements that come from the MOE.