Appendix G.27
G.27 York Database - Incorporation of Temporal Data
- Tables
- D_INTERVAL_ALIAS
- D_INTERVAL_TEMPORAL_1A
- D_INTERVAL_TEMPORAL_1B
- D_INTERVAL_TEMPORAL_2
- Estimated Recurrence Time: Based upon availability
In the absence of the SiteFX (EarthFX) merge tool, a methodology was developed and applied to incorporate temporal data found within the York database (YKDB) into the ‘master’ database (MADB). Between the two databases, common locations and intervals have differing LOC_IDs and INT_IDs. This has been examined (on dates leading up to, for example, each of ‘2015-09-30’, ‘2015-10-02’ and ‘2017-11-16’) and the results incorporated into the D_INTERVAL_ALIAS table of the MADB (i.e. the INT_NAME_ALIAS will be the INT_ID of the interval within the YKDB) and given an INT_ALIAS_TYPE_CODE of ‘1’ (‘Matching INT_ID in the York database’) or ‘2’ (‘Matching INT_ID in the York database - Not used’). These INT_IDs are then used (in the former case) as a basis for review of new temporal data.
Tables containing records to be copied into the MADB (matching, approximately, the format of D_INTERVAL_TEMPORAL_1A, D_INTERVAL_TEMPORAL_1B and D_INTERVAL_TEMPORAL_2) are created within the YKDB and then copied across to the ‘temphold’ database (TMPDB) for subsequent incorporation. These are created (within the YKDB) through the use of a series of views which extract information from the various temporal tables as found in the YKDB. All imports are provided with a distinctive DATA_ID (in the MADB D_DATA_SOURCE table) linking the imported information to a specific version/date of the YKDB.
These series of views created within the copy of the YKDB include, for D_INTERVAL_TEMPORAL_1A:
create view v_tr_dit1a as
select
t.int_id as ypdt_int_id
,t.york_int_id
,t2.ypdt_sam_id
,dit1a.sam_id as sam_id
,dit1a.sam_sample_name
,t2.ypdt_sam_sample_name
,dit1a.sam_sample_date
,dit1a.sam_analysis_date
,dit1a.sam_lab_sample_id
,t2.ypdt_sam_lab_sample_id
,dit1a.sam_lab_job_number
,t2.ypdt_sam_lab_job_number
,dit1a.sam_internal_id
,dit1a.sam_sample_name_ouom
,dit1a.sam_type_code
,t2.ypdt_sam_type_code
,dit1a.sam_sample_date_ouom
,dit1a.sam_comment
,dit1a.sam_data_file
,dit1a.sys_time_stamp
,dit1a.data_id
,row_number() over (order by dit1a.sam_id) as rkey
from
yorkdb_20210503.dbo.d_interval_temporal_1a as dit1a
inner join
-- this is the york-to-ypdt int_id conversion
(
select
dia.int_id
,cast(dia.int_name_alias as int) as york_int_id
from
oak_20160831_master.dbo.d_interval_alias as dia
where dia.int_alias_type_code=1
) as t
on dit1a.int_id=t.york_int_id
left outer join
-- this is the current contents of the master db
(
select
dit1a.int_id
,dit1a.sam_id as ypdt_sam_id
,dit1a.sam_sample_date
,dit1a.sam_sample_name as ypdt_sam_sample_name
,dit1a.sam_lab_sample_id as ypdt_sam_lab_sample_id
,dit1a.sam_lab_job_number as ypdt_sam_lab_job_number
,dit1a.sam_type_code as ypdt_sam_type_code
from
oak_20160831_master.dbo.d_interval_temporal_1a as dit1a
inner join oak_20160831_master.dbo.d_interval_alias as dia
on dit1a.int_id=dia.int_id
where
dia.int_alias_type_code=1
) as t2
on t.int_id=t2.int_id and t2.sam_sample_date=dit1a.sam_sample_date and dit1a.sam_type_code=t2.ypdt_sam_type_code
Note that this extracts ‘all’ data from the YKDB; those not present in the MADB will have a NULL YPDT_SAM_ID.
For D_INTERVAL_TEMPORAL_1B, there is an issue matching the RD_NAME_CODEs between the databases. As such, a view was created comparing the RD_NAME_OUOM to RD_NAME_DESCRIPTION (the latter is found in R_RD_NAME_CODE).
create view v_tr_rd_name_code as
select
t.rd_name_ouom
,case
when rrnc.rd_name_code is not null then rrnc.rd_name_code
when rrna.rd_name_code is not null then rrna.rd_name_code
else -9999
end as ypdt_rd_name_code
,case
when rrnc.rd_name_code is not null then rrnc.rd_name_description
when rrna.rd_name_code is not null then rrna.reading_name_alias
else '-9999'
end as ypdt_rd_name_description
from
(
select
distinct( dit1b.rd_name_ouom ) as rd_name_ouom
from
yorkdb_20210503.dbo.d_interval_temporal_1b as dit1b
inner join yorkdb_20210503.dbo.v_tr_dit1a as v
on dit1b.sam_id=v.sam_id
) as t
left outer join oak_20160831_master.dbo.r_rd_name_code as rrnc
on t.rd_name_ouom like rrnc.rd_name_description
left outer join oak_20160831_master.dbo.r_reading_name_alias as rrna
on t.rd_name_ouom like rrna.reading_name_alias
Unmatched names can either be added to R_READING_ALIAS (where the parameter already exists) or R_RD_NAME_CODE (if it does not); the tag ‘-9999’ is used to signify this. Similarly, the UNIT_CODE is also examined.
create view v_tr_unit_code as
select
t.rd_unit_ouom
,t.rcount
,case
when ruc.unit_code is not null then ruc.unit_code
else
case
when t.rd_unit_ouom = 'pH' then 32
when t.rd_unit_ouom = 'mV' then 106
end
end as ypdt_unit_code
,case
when ruc.unit_code is not null then ruc.unit_description
else
case
when t.rd_unit_ouom = 'pH' then 'pH Units'
when t.rd_unit_ouom = 'mV' then 'millivolts'
end
end as ypdt_unit_description
from
(
select
rd_unit_ouom
,count(*) as rcount
from
yorkdb_20210503.dbo.d_interval_temporal_1b as d1b
inner join yorkdb_20210503.dbo.v_tr_dit1a as v
on d1b.sam_id=v.sam_id
group by
rd_unit_ouom
) as t
left outer join oak_20160831_master.dbo.r_unit_code as ruc
on t.rd_unit_ouom = ruc.unit_description
The view for extracting information from D_INTERVAL_TEMPORAL_1B can now be created.
create view v_tr_dit1b as
select
d1b.sam_id
,v.ypdt_sam_id
,d1b.rd_value_qualifier
,d1b.rd_value
,vrnc.ypdt_rd_name_code
,vuc.ypdt_unit_code
,d1b.rd_mdl
,d1b.rd_rdl
,vrnc.ypdt_rd_name_description
,d1b.rd_value_ouom
,vuc.ypdt_unit_description
,d1b.rd_mdl_ouom
,d1b.rd_rdl_ouom
,d1b.rd_comment
,d1b.sys_record_id
,cast(null as integer) as ypdt_sys_record_id
,d1b.sys_time_stamp
,row_number() over (order by d1b.sys_record_id) as rkey
from
yorkdb_20210503.dbo.d_interval_temporal_1b as d1b
inner join yorkdb_20210503.dbo.v_tr_dit1a as v
on d1b.sam_id=v.sam_id
inner join yorkdb_20210503.dbo.v_tr_rd_name_code as vrnc
on d1b.rd_name_ouom=vrnc.rd_name_ouom
inner join yorkdb_20210503.dbo.v_tr_unit_code as vuc
on d1b.rd_unit_ouom=vuc.rd_unit_ouom
Four equivalent D_INTERVAL_TEMPORAL_2 tables reside in the YKDB (as of 20210503); a view to extract information must be created for each.
These include D_INTERVAL_TEMPORAL_2, D_INTERVAL_TEMPORAL_EXT, D_INTERVAL_TEMPORAL_YORK and D_INTVL_TEMP2). The basic structure for each are the same (the corresponding names are: V_TR_DIT2; V_TR_DITEXT; V_TR_DITYORK; and V_TR_DITVL2).
create view v_tr_dit2 as
select
t.int_id as ypdt_int_id
,t.york_int_id
,d.rd_date
,case
when d.rd_name_code=707 then 67
when d.rd_name_code=711 then 85
else null
end as rd_type_code
,case
when d.rd_name_code=163 then 71037
when d.rd_name_code=369 then 70871
when d.rd_name_code=699 then 447
when d.rd_name_code=700 then 629
when d.rd_name_code=701 then 611
when d.rd_name_code=702 then 612
when d.rd_name_code=703 then 628
when d.rd_name_code=706 then 70899
when d.rd_name_code=707 then 70899
when d.rd_name_code=711 then 629
else null
end as rd_name_code
,case
when d.unit_code=45 then d.rd_value*0.001
else d.rd_value
end as rd_value
,case
when unit_code=45 then 74
when unit_code=6 then 6
when unit_code=3 then 3
else null
end as unit_code
,d.rd_name_ouom
,d.rd_value_ouom
,d.rd_unit_ouom
,d.sys_record_id
,cast(null as integer) as ypdt_sys_record_id
--,row_number() over (order by d2.sys_record_id) as rkey
from
yorkdb_20210503.dbo.d_interval_temporal_2 as d
inner join
(
select
dia.int_id
,cast(dia.int_name_alias as int) as york_int_id
from
oak_20160831_master.dbo.d_interval_alias as dia
where dia.int_alias_type_code=1
) as t
on d.int_id=t.york_int_id
where
d.rd_name_code in (163,369,699,700,701,702,703,706,707)
Note that the RD_NAME_CODE and UNIT_CODE conversions are taking place entirely within the view itself. The RD_TYPE_CODE field is unused in the YKDB but must be specified for the MADB to distinguish pump-on and pump-off conditions.
Due to the number of records (see below), the output from the views were written to a temporary database before determination of their new identifiers (to be used within the MADB) and final insertion.
For combining the three views for D_INTERVAL_TEMPORAL_2, an INTO statement and two insertions were used (with an increment for the row number in RKEY for each of the subsequent sources of information). Note that in addition to checking whether there is ‘new’ data to be loaded, differences in ‘existing’ data (i.e. present in both the YKDB and MADB for the particular interval) are being included (to be incorporated as an update process). For the latter, a range of values is used for comparison purposes.
D_INTERVAL_TEMPORAL_2
select
v.ypdt_int_id
,v.york_int_id
,v.sys_record_id
,v.ypdt_sys_record_id as ormgp_add_sys_record_id
,d.sys_record_id as ormgp_cur_sys_record_id
,v.rd_date
,v.rd_type_code
,v.rd_name_code
,v.rd_value
,d.rd_value as ormgp_cur_rd_value
,v.unit_code
,v.rd_name_ouom
,v.rd_value_ouom
,v.rd_unit_ouom
,row_number() over (order by v.sys_record_id) as rkey
into yorkdb_20210503.dbo.dit2
from
yorkdb_20210503.dbo.v_tr_dit2 as v
left outer join oak_20160831_master.dbo.d_interval_temporal_2 as d
on v.ypdt_int_id=d.int_id and v.rd_date=d.rd_date and v.rd_name_code=d.rd_name_code
where
v.rd_value is not null
and ( d.sys_record_id is null or (d.sys_record_id is not null and not( v.rd_value between (d.rd_value-0.001) and (d.rd_value+0.001) ) ) )
D_INTERVAL_TEMPORAL_EXT
insert into yorkdb_20210503.dbo.dit2
(
ypdt_int_id
,york_int_id
,sys_record_id
,ormgp_add_sys_record_id
,ormgp_cur_sys_record_id
,rd_date
,rd_type_code
,rd_name_code
,rd_value
,ormgp_cur_rd_value
,unit_code
,rd_name_ouom
,rd_value_ouom
,rd_unit_ouom
,rkey
)
select
v.ypdt_int_id
,v.york_int_id
,v.sys_record_id
,v.ypdt_sys_record_id as ormgp_add_sys_record_id
,d.sys_record_id as ormgp_cur_sys_record_id
,v.rd_date
,v.rd_type_code
,v.rd_name_code
,v.rd_value
,d.rd_value as ormgp_cur_rd_value
,v.unit_code
,v.rd_name_ouom
,v.rd_value_ouom
,v.rd_unit_ouom
,( row_number() over (order by v.sys_record_id) ) + 9645129 as rkey
from
yorkdb_20210503.dbo.v_tr_ditext as v
left outer join oak_20160831_master.dbo.d_interval_temporal_2 as d
on v.ypdt_int_id=d.int_id and v.rd_date=d.rd_date and v.rd_name_code=d.rd_name_code
where
v.rd_value is not null
and ( d.sys_record_id is null or (d.sys_record_id is not null and not( v.rd_value between (d.rd_value-0.001) and (d.rd_value+0.001) ) ) )
Note the modification of the ‘rkey’ field - this is to allow each row to be identified as a separate record.
D_INTERVAL_TEMPORAL_YORK
insert into yorkdb_20210503.dbo.dit2
(
ypdt_int_id
,york_int_id
,sys_record_id
,ormgp_add_sys_record_id
,ormgp_cur_sys_record_id
,rd_date
,rd_type_code
,rd_name_code
,rd_value
,ormgp_cur_rd_value
,unit_code
,rd_name_ouom
,rd_value_ouom
,rd_unit_ouom
,rkey
)
select
v.ypdt_int_id
,v.york_int_id
,v.sys_record_id
,v.ypdt_sys_record_id as ormgp_add_sys_record_id
,d.sys_record_id as ormgp_cur_sys_record_id
,v.rd_date
,v.rd_type_code
,v.rd_name_code
,v.rd_value
,d.rd_value as ormgp_cur_rd_value
,v.unit_code
,v.rd_name_ouom
,v.rd_value_ouom
,v.rd_unit_ouom
,( row_number() over (order by v.sys_record_id) ) + 9664584 as rkey
from
yorkdb_20210503.dbo.v_tr_dityork as v
left outer join oak_20160831_master.dbo.d_interval_temporal_2 as d
on v.ypdt_int_id=d.int_id and v.rd_date=d.rd_date and v.rd_name_code=d.rd_name_code
where
v.rd_value is not null
and ( d.sys_record_id is null or (d.sys_record_id is not null and not( v.rd_value between (d.rd_value-0.001) and (d.rd_value+0.001) ) ) )
D_INTVL_TEMP2
insert into yordb_20210503.dbo.dit2
(
ypdt_int_id
,york_int_id
,sys_record_id
,ormgp_add_sys_record_id
,ormgp_cur_sys_record_id
,rd_date
,rd_type_code
,rd_name_code
,rd_value
,ormgp_cur_rd_value
,unit_code
,rd_name_ouom
,rd_value_ouom
,rd_unit_ouom
,rkey
)
select
v.ypdt_int_id
,v.york_int_id
,v.sys_record_id
,v.ypdt_sys_record_id as ormgp_add_sys_record_id
,d.sys_record_id as ormgp_cur_sys_record_id
,v.rd_date
,v.rd_type_code
,v.rd_name_code
,v.rd_value
,d.rd_value as ormgp_cur_rd_value
,v.unit_code
,v.rd_name_ouom
,v.rd_value_ouom
,v.rd_unit_ouom
,( row_number() over (order by v.sys_record_id) ) + 9664584 as rkey
from
yorkdb_20210503.dbo.v_tr_ditvl2 as v
left outer join oak_20160831_master.dbo.d_interval_temporal_2 as d
on v.ypdt_int_id=d.int_id and v.rd_date=d.rd_date and v.rd_name_code=d.rd_name_code
where
v.rd_value is not null
and ( d.sys_record_id is null or (d.sys_record_id is not null and not( v.rd_value between (d.rd_value-0.001) and (d.rd_value+0.001) ) ) )
The D_INTERVAL_TEMPORAL_1A/_1B tables are created using INTO statements.
D_INTERVAL_TEMPORAL_1A
SELECT
ypdt_int_id,
york_int_id,
ypdt_sam_id,
sam_id,
sam_sample_name,
ypdt_sam_sample_name,
sam_sample_date,
sam_analysis_date,
sam_lab_sample_id,
ypdt_sam_lab_sample_id,
sam_lab_job_number,
ypdt_sam_lab_job_number,
sam_internal_id,
sam_sample_name_ouom,
sam_type_code,
ypdt_sam_type_code,
sam_sample_date_ouom,
sam_comment,
sam_data_file,
sys_time_stamp,
data_id,
rkey
into yorkdb_20210503.dbo.dit1a
FROM
yorkdb_20210503.dbo.v_tr_dit1a
where
ypdt_sam_id is null
D_INTERVAL_TEMPORAL_1B
SELECT
sam_id,
ypdt_sam_id,
rd_value_qualifier,
rd_value,
ypdt_rd_name_code,
ypdt_unit_code,
rd_mdl,
rd_rdl,
ypdt_rd_name_description,
rd_value_ouom,
ypdt_unit_description,
rd_mdl_ouom,
rd_rdl_ouom,
rd_comment,
sys_record_id,
ypdt_sys_record_id,
sys_time_stamp,
rkey
into yorkdb_20210503.dbo.dit1b
FROM
yorkdb_20210503.dbo.v_tr_dit1b
where
ypdt_sam_id is null
The row counts for the 20171124 YKDB import were as follows:
- D_INTERVAL_TEMPORAL_1A - 4865
- D_INTERVAL_TEMPORAL_1B - 51119
- D_INTERVAL_TEMPORAL_2 - 15871104
The row counts for the 20181008 YKDB import were as follows:
- D_INTERVAL_TEMPORAL_1A - 679
- D_INTERVAL_TEMPORAL_1B - 8451
- D_INTERVAL_TEMPORAL_2 (update) - 3891217
- D_INTERVAL_TEMPORAL_2 (new) - 7435230
These tables can now be processed for new identifiers and then incorporated in the MADB.