Incrementally updating SQL rows

I'm currently dealing with two tables.
One table contains a set of columns like ID, NAME, AGE, TEAM, SCHOOL, and so forth in a table called PRIMARY_TABLE
And I also have an audit table called SECONDARY_TABLE that records updates in the aforementioned values over time.
I have ATTRIBUTE, PREV_VALUES and RECORD_ID columns in this table. It has the following attributes:
the RECORD_ID column corresponds to the ID column of PRIMARY_TABLE
the ATTRIBUTE column will store the column of the PRIMARY_TABLE that is being altered.
For example, if I have
132 NIKO 18 LANCERS JESUIT
143 KEENAN 25 RAIDERS ROCKLAND
in my first table and
132 'AGE' 22
132 'NAME' STEVAN
in my second,
then I want a combined table that has
132 NIKO 18 LANCERS JESUIT
132 NIKO 22 LANCERS JESUIT
132 STEVAN 22 LANCERS JESUIT
143 KEENAN 25 RAIDERS ROCKLAND .
The issue I have a hard time getting around is preserving the values in the unaffected rows. It seems like any idea I have for joining the two tables together won't work for this reason.
Any thoughts? I think the only solution is to create a stored procedure for this. If you need clarification, let me know as well.
EDIT
One more thing: the audit table also has a "time_of_change" column. If multiple audit rows share the same time of change for an ID, the result should contain only one additional row for them rather than one row per change.
For example, if our audit table had
132 'AGE' 22 1:00
132 'NAME' STEVAN 1:00
Then instead of having
132 STEVAN 18 LANCERS JESUIT
132 NIKO 22 LANCERS JESUIT
added, there should only be one added row of
132 STEVAN 22 LANCERS JESUIT.
I can't think of any possible way to do this either.

UPDATE2 If you have a column holding the datetime of each update in the secondary_table (let's call it updated_at), then you can order the result set appropriately.
SELECT id, name, age, team, school, GETDATE() updated_at
FROM primary_table
UNION ALL
SELECT p.id,
CASE WHEN s.attribute = 'NAME'
THEN s.prev_values ELSE p.name END name,
CASE WHEN s.attribute = 'AGE'
THEN s.prev_values ELSE p.age END age,
CASE WHEN s.attribute = 'TEAM'
THEN s.prev_values ELSE p.team END team,
CASE WHEN s.attribute = 'SCHOOL'
THEN s.prev_values ELSE p.school END school,
updated_at
FROM primary_table p JOIN secondary_table s
ON p.ID = s.record_id
ORDER BY id, updated_at DESC
Here is SQLFiddle demo
UPDATE1 A version with one UNION and conditional output with CASE
SELECT *
FROM primary_table
UNION ALL
SELECT p.id,
CASE WHEN s.attribute = 'NAME'
THEN s.prev_values ELSE p.name END name,
CASE WHEN s.attribute = 'AGE'
THEN s.prev_values ELSE p.age END age,
CASE WHEN s.attribute = 'TEAM'
THEN s.prev_values ELSE p.team END team,
CASE WHEN s.attribute = 'SCHOOL'
THEN s.prev_values ELSE p.school END school
FROM primary_table p JOIN secondary_table s
ON p.ID = s.record_id
ORDER BY id
Here is SQLFiddle demo
Original version with UNIONs
SELECT *
FROM primary_table
UNION ALL
SELECT p.id, s.prev_values, p.age, p.team, p.school
FROM primary_table p JOIN secondary_table s
ON p.ID = s.record_id
AND s.attribute = 'NAME'
UNION ALL
SELECT p.id, p.name, s.prev_values, p.team, p.school
FROM primary_table p JOIN secondary_table s
ON p.ID = s.record_id
AND s.attribute = 'AGE'
ORDER BY id
Output:
| ID | NAME | AGE | TEAM | SCHOOL |
-------------------------------------------
| 132 | NIKO | 18 | LANCERS | JESUIT |
| 132 | STEVAN | 18 | LANCERS | JESUIT |
| 132 | NIKO | 22 | LANCERS | JESUIT |
| 143 | KEENAN | 25 | RAIDERS | ROCKLAND |
Here is SQLFiddle demo
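The queries above emit one row per audit record. For the EDIT in the question (changes that share a time_of_change should produce a single row), one possible sketch is to group the audit rows by record and timestamp and overlay them on the primary row with conditional aggregation. The snippet runs the idea in SQLite through Python so it can be tried locally. The table layout and the '1:00' timestamps are taken from the question; the type of time_of_change (plain text here) is an assumption, and each overlay is applied to the current primary values rather than accumulated across timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE primary_table (id INTEGER, name TEXT, age INTEGER, team TEXT, school TEXT);
CREATE TABLE secondary_table (record_id INTEGER, attribute TEXT,
                              prev_values TEXT, time_of_change TEXT);
INSERT INTO primary_table VALUES
  (132, 'NIKO', 18, 'LANCERS', 'JESUIT'),
  (143, 'KEENAN', 25, 'RAIDERS', 'ROCKLAND');
INSERT INTO secondary_table VALUES
  (132, 'AGE', '22', '1:00'),
  (132, 'NAME', 'STEVAN', '1:00');
""")

# One output row per (record, time_of_change): MAX(CASE ...) picks the audited
# value for an attribute if one exists at that timestamp, and COALESCE falls
# back to the value from primary_table otherwise.
collapsed_sql = """
SELECT p.id,
       COALESCE(MAX(CASE WHEN s.attribute = 'NAME' THEN s.prev_values END), p.name) AS name,
       COALESCE(MAX(CASE WHEN s.attribute = 'AGE'
                         THEN CAST(s.prev_values AS INTEGER) END), p.age) AS age,
       COALESCE(MAX(CASE WHEN s.attribute = 'TEAM' THEN s.prev_values END), p.team) AS team,
       COALESCE(MAX(CASE WHEN s.attribute = 'SCHOOL' THEN s.prev_values END), p.school) AS school
FROM primary_table p
JOIN secondary_table s ON s.record_id = p.id
GROUP BY p.id, p.name, p.age, p.team, p.school, s.time_of_change
ORDER BY p.id, s.time_of_change
"""
rows = conn.execute(collapsed_sql).fetchall()
print(rows)  # [(132, 'STEVAN', 22, 'LANCERS', 'JESUIT')]
```

These collapsed rows can then be appended to the rows of primary_table with UNION ALL, as in the queries above.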

This is not perfect but try this one:
SELECT * FROM Primary_Table
UNION ALL
SELECT * FROM
(SELECT A.ID, A.Name, B.GetAge as Age, A.Team, A.School
FROM Primary_Table A INNER JOIN (SELECT Record_id, CONVERT(int,Prev_Values) as GetAge FROM Secondary_Table
WHERE Attribute='AGE' ) B
ON A.ID = B.Record_id) UAge
UNION ALL
SELECT * FROM
(SELECT A.ID, B.GetName as Name, A.Age, A.Team, A.School
FROM Primary_Table A INNER JOIN (SELECT Record_id, Prev_Values as GetName FROM Secondary_Table
WHERE Attribute='NAME' ) B
ON A.ID = B.Record_id) UName
ORDER BY ID

Here's some script to get you started. You have a big job ahead of you. Things are made quite a bit more difficult if you don't have the current values in the audit table (which it seems like, so I built my script with that assumption).
WITH Curr AS (
SELECT
Record_ID = T.ID,
V.*,
Time_Of_Change = Convert(datetime, 0)
FROM
dbo.Team T
CROSS APPLY (VALUES
('Name', Convert(varchar(20), T.Name)),
('Age', Convert(varchar(20), T.Age)),
('Team', Convert(varchar(20), T.Team)),
('School', Convert(varchar(20), T.School))
) V (Attribute, Prev_Values)
),
Data AS (
SELECT
H.Record_ID,
H.Time_Of_Change,
C.Attribute,
A.Prev_Values
FROM
(
SELECT DISTINCT Record_ID, Time_Of_Change
FROM dbo.Audit
WHERE TableName = 'Team'
UNION ALL
SELECT DISTINCT ID, GetDate()
FROM dbo.Team
) H
CROSS JOIN (VALUES
('Name'), ('Age'), ('Team'), ('School')
) C (Attribute)
CROSS APPLY (
SELECT TOP 1 *
FROM (
SELECT Record_ID, Attribute, Prev_Values, Time_Of_Change
FROM dbo.Audit
WHERE TableName = 'Team'
UNION ALL
SELECT *
FROM Curr
) A
WHERE
H.Record_ID = A.Record_ID
AND H.Time_Of_Change >= A.Time_Of_Change
AND C.Attribute = A.Attribute
ORDER BY
A.Time_Of_Change DESC
) A
)
SELECT *
FROM
Data
PIVOT (Max(Prev_Values) FOR Attribute IN (Name, Age, Team, School)) P
;
See a Live Demo at SQL Fiddle
You can very easily create a view for each table you want to reconstruct the history of. Build a stored procedure that queries the INFORMATION_SCHEMA.COLUMNS view and builds something similar to what I've given you. Run the SP once after any table change, and it will update the views.
I noticed that my script is not quite right--it's showing 2 rows instead of 3 for the edits. Plus, there's no valid time for the rows for which there are no audit rows. But that part makes sense. In any case, you have a big job ahead of you...
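As a rough illustration of the "generate the query from column metadata" idea, here is a minimal sketch in Python against SQLite. It is not the stored procedure described above: SQLite has no INFORMATION_SCHEMA, so PRAGMA table_info stands in for it, and the helper name build_history_sql, the upper-case attribute naming, and the record_id/attribute/prev_values audit layout (borrowed from the question) are all assumptions.

```python
import sqlite3

def build_history_sql(conn, table, audit_table, key="id"):
    """Hypothetical helper: build a one-row-per-audit-record query for `table`."""
    # PRAGMA table_info plays the role of INFORMATION_SCHEMA.COLUMNS here;
    # the second field of each returned row is the column name.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    cases = [
        f"CASE WHEN s.attribute = '{c.upper()}' THEN s.prev_values ELSE p.{c} END AS {c}"
        for c in cols
        if c != key
    ]
    return (
        f"SELECT p.{key}, " + ", ".join(cases)
        + f" FROM {table} p JOIN {audit_table} s ON p.{key} = s.record_id"
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE primary_table (id INTEGER, name TEXT, age INTEGER, team TEXT, school TEXT);
CREATE TABLE secondary_table (record_id INTEGER, attribute TEXT, prev_values TEXT);
INSERT INTO primary_table VALUES (132, 'NIKO', 18, 'LANCERS', 'JESUIT');
INSERT INTO secondary_table VALUES (132, 'AGE', '22'), (132, 'NAME', 'STEVAN');
""")

history_sql = build_history_sql(conn, "primary_table", "secondary_table")
history_rows = conn.execute(history_sql).fetchall()
print(history_sql)
print(history_rows)
```

Audited values come back as text because prev_values is a text column; a real generator would also need to cast per column type and quote identifiers properly.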

Related

Get most recent records where one field is not null

I'm looking to narrow down my database to only the most recent records. The most recent records need to have a value in a specific field.
ID Account_nbr Date Name
1 622 7/10/2018 Stu
2 622 7/24/2018
3 151 7/18/2018 Taylor
4 151 7/24/2018 Taylor
This is an example of the database.
I want the code to do this:
ID Account_nbr Date Name
1 622 7/10/2018 Stu
4 151 7/24/2018 Taylor
I have tried the following code:
Select m.*
FROM [table] m
INNER JOIN
(
SELECT last(Date) as LatestDate
,account_nbr
FROM [table]
WHERE Name IS NOT NULL
GROUP BY account_nbr
) b
ON m.Date = b.LatestDate
AND m.account_nbr = b.account_nbr
The output only included the most recent date and did not take into account records that were null in the name field.
I would do:
select t.*
from [table] as t
where t.name is not null and
t.date = (select max(t1.date)
from [table] as t1
where t1.account_nbr = t.account_nbr
and t1.name is not null
);
Try this:
Select
m.*
From
[table] As m
Where
m.[Date] In
(Select Max([Date])
From [table] As T
Where T.[Name] Is Not Null
And T.account_nbr = m.account_nbr)
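To check the correlated-subquery approach against the sample data, here is a small sketch that runs it in SQLite from Python. The table name accounts is made up for the example, and the dates are stored as ISO strings (an assumption) so that MAX compares them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER, account_nbr INTEGER, date TEXT, name TEXT);
INSERT INTO accounts VALUES
  (1, 622, '2018-07-10', 'Stu'),
  (2, 622, '2018-07-24', NULL),
  (3, 151, '2018-07-18', 'Taylor'),
  (4, 151, '2018-07-24', 'Taylor');
""")

# The NOT NULL filter must sit inside the subquery as well: otherwise the
# latest date for account 622 is the row with the empty name, and the
# account drops out of the result entirely.
latest_named = conn.execute("""
SELECT m.id, m.account_nbr, m.date, m.name
FROM accounts AS m
WHERE m.name IS NOT NULL
  AND m.date = (SELECT MAX(t.date)
                FROM accounts AS t
                WHERE t.name IS NOT NULL
                  AND t.account_nbr = m.account_nbr)
ORDER BY m.id
""").fetchall()
print(latest_named)
# [(1, 622, '2018-07-10', 'Stu'), (4, 151, '2018-07-24', 'Taylor')]
```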

Oracle SQL: Retrieving a record more than once

I'm using Oracle 11 and would like to be able to retrieve a record more than once in a query, which would be a convenient saving for the next part of my code.
Let's consider this SQL statement:
SELECT ID, NAME FROM PEOPLE WHERE NAME IN ('Alice', 'Bob', 'Alice');
It returns this data:
| 1 | Alice |
| 2 | Bob |
What I'd really like to do is to un-uniquify that list and return the records with duplicates, in the order given. So the above statement would be:
| 1 | Alice |
| 2 | Bob |
| 1 | Alice |
I appreciate that Oracle is optimized to remove repetition like this, and I could re-use the data afterwards, keep it in a store object and retrieve by name etc. I was just wondering if there was a way to make this happen on the database itself.
Oracle has a couple of handy built-in functions that return lists of arguments that you can then transform to a table and join on it. In your case, odcivarchar2list can be used to return a list of varchar2s:
SELECT p.*
FROM TABLE(sys.odcivarchar2list('Alice', 'Bob', 'Alice')) dups
JOIN people p ON p.name = dups.column_value
The query below returns the record with duplicates:
select x.id,x.name from (
select a.id,a.name from people a where a.name in ('Alice')
union all
select a.id,a.name from people a where a.name in ('Bob')
union all
select a.id,a.name from people a where a.name in ('Alice')
) x
Late to the party but just wanted to add you can use a traditional table expression:
select p.id, p.name
from (
select 'Alice' as name from dual
union all select 'Bob' from dual
union all select 'Alice' from dual
) s
join people p on p.name = s.name;
Here's another idea:
WITH cteNumbers as (SELECT LEVEL AS N
FROM DUAL
CONNECT BY LEVEL <= 2),
PEOPLE AS (SELECT 'Bob' AS NAME, 111 AS EMPID FROM DUAL UNION ALL
SELECT 'Carol' AS NAME, 222 AS EMPID FROM DUAL UNION ALL
SELECT 'Ted' AS NAME, 333 AS EMPID FROM DUAL UNION ALL
SELECT 'Alice' AS NAME, 444 AS EMPID FROM DUAL)
SELECT *
FROM PEOPLE p
CROSS JOIN cteNumbers
WHERE 1 = CASE
WHEN NAME = 'Alice' THEN 1
WHEN NAME = 'Bob' AND N = 1 THEN 1
WHEN NAME = 'Ted' AND N < 4 THEN 1
WHEN NAME = 'Carol' AND N = 3 THEN 1
ELSE 0
END
ORDER BY NAME, N
Basically, use cteNumbers to generate a list of numbers (in this case, from 1 to 2; adjust the CONNECT BY LEVEL condition to control how many numbers are generated), then use the CASE expression in the WHERE clause to control the circumstances under which a particular record's repetitions are selected.
SQLFiddle here
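None of the joins above guarantees that the rows come back "in the order given". One way to get that, sketched here in SQLite via Python rather than Oracle, is to carry an explicit position alongside each searched name and sort by it. The wanted CTE and its pos column are names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER, name TEXT);
INSERT INTO people VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
""")

# Each searched name gets a position; joining on the name repeats the
# matching record once per occurrence, and ORDER BY pos restores the
# order in which the names were listed.
repeated = conn.execute("""
WITH wanted(pos, name) AS (
  VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Alice')
)
SELECT p.id, p.name
FROM wanted w
JOIN people p ON p.name = w.name
ORDER BY w.pos
""").fetchall()
print(repeated)  # [(1, 'Alice'), (2, 'Bob'), (1, 'Alice')]
```

In Oracle the same shape could be written with the UNION ALL ... FROM dual subquery shown earlier, with a position column added to each branch.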

join two SQL rows in a single one

I have three tables in PostgreSQL, for a biological classification system.
table lang (languages)
id name
1 português
2 english
-------------------------------
table taxon (biological groups)
id name
...
101 Mammalia
-------------------------------
table pop (popular names)
id tax lang pop
...
94 101 1 mamíferos
95 101 2 mammals
I want to get
id name namePT nameEN
101 Mammalia mamíferos mammals
but my join is giving me
id name pop
101 Mammalia mamíferos
101 Mammalia mammals
select t.id,name,pop from taxon t
left join pop p on p.tax = t.id
where t.id = 101
How can I get the desired result in a single row?
If you are happy to change the query every time you add a new language, then this query will do the trick:
select t.id,name,pe.pop as eng_pop, pp.pop as port_pop
from taxon t
left join pop pe on pe.tax = t.id and pe.lang = 2
left join pop pp on pp.tax = t.id and pp.lang = 1
where t.id = 101
You could use this
SELECT t.id, t.name,
MAX(CASE WHEN p.lang = 1 THEN p.pop END) AS namePT,
MAX(CASE WHEN p.lang = 2 THEN p.pop END) AS nameEN
FROM taxon t
LEFT JOIN pop p
ON p.tax = t.id
GROUP BY t.id, t.name;
Here's how I got the results:
with base as (
select t.id, t.name,
case when lang = 1 then 'mamiferos' else null end as namePT,
case when lang = 2 then 'mamals' else null end as nameEN
from taxon t
left join pop p on t.id = p.tax
group by 1,2,3, p.lang
)
select
distinct id,
name,
coalesce(namept,'mamiferos',null) as namept,
coalesce(nameen,'mamals',null) as nameen
from base
where id = 101
group by id, name, namept, nameen;
id | name | namept | nameen
-----+----------+-----------+--------
101 | Mammalia | mamiferos | mamals
(1 row)
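For comparison, here is a sketch of the conditional-aggregation pivot run against the sample rows from the question, reading the popular names from the pop table instead of typing them into the query. It uses SQLite through Python purely so it can be executed locally; the MAX(CASE ...) pattern itself is the one from the second answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lang  (id INTEGER, name TEXT);
CREATE TABLE taxon (id INTEGER, name TEXT);
CREATE TABLE pop   (id INTEGER, tax INTEGER, lang INTEGER, pop TEXT);
INSERT INTO lang  VALUES (1, 'português'), (2, 'english');
INSERT INTO taxon VALUES (101, 'Mammalia');
INSERT INTO pop   VALUES (94, 101, 1, 'mamíferos'), (95, 101, 2, 'mammals');
""")

# Each CASE keeps the popular name for one language and NULL otherwise;
# MAX then collapses the two joined rows per taxon into a single row.
pivoted = conn.execute("""
SELECT t.id, t.name,
       MAX(CASE WHEN p.lang = 1 THEN p.pop END) AS namePT,
       MAX(CASE WHEN p.lang = 2 THEN p.pop END) AS nameEN
FROM taxon t
LEFT JOIN pop p ON p.tax = t.id
WHERE t.id = 101
GROUP BY t.id, t.name
""").fetchall()
print(pivoted)  # [(101, 'Mammalia', 'mamíferos', 'mammals')]
```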

cross joining two tables

I have a table that looks like this; let's call this table B.
id boardid schoolid subject cnt1 cnt2 cnt3 ....
=================================================================
1 20 21 f
2 20 21 r
3 20 21 w
4 20 21 m
5 20 30 r
6 20 30 w
7 20 30 m
Suppose the counts are just integers. Notice that there is no subject = f for schoolid = 30. Similarly, for most schools, some subject doesn't exist. You might have a schoolid that has just r, w or some that are just r, m, f.
So what I want to do is have 4 consistent rows for each school, with dummy values in any row that doesn't exist. I thought about creating a secondary table
drop table #A
Select * into #A FROM
(
select [subject_s] = 'r', orderNo = 1
union all
select [subject_s] = 'w', orderNo = 2
union all
select [subject_s] = 'm', orderNo = 3
union all
select [subject_s] = 'f', orderNo = 4
) z
and doing some joins on them, but I've gotten nowhere. I've tried inner join, left outer, cross join, everything. I've even tried to make a Cartesian product. I think my Cartesian product messes up because I have orderNo in there, so it makes 16 rows per row in the main table. Actually, typing this out, I realize that if I remove orderNo, apply the Cartesian product and then add orderNo back in later, it might work, but I am interested to see what you guys can come up with. I am stumped.
End result
id boardid schoolid subject cnt1 cnt2 cnt3 ....
=================================================================
1 20 21 r
2 20 21 w
3 20 21 m
4 20 21 f
5 20 30 r
6 20 30 w
7 20 30 m
7 20 30 f
Try the following:
SELECT S.boardid, S.schoolid, A.[subject_s] AS [subject], B.cnt1, B.cnt2, B.cnt3
FROM (SELECT DISTINCT boardid, schoolid FROM YourTable) S
CROSS JOIN #A A
LEFT JOIN YourTable B
ON B.boardid = S.boardid AND B.schoolid = S.schoolid
AND A.[subject_s] = B.[subject]
Since I do not know which RDBMS you are using I tried the following with sqlite and a simpler table:
sqlite> create table schools (name varchar, subject varchar, teacher varchar);
sqlite> select * from schools;
School1|Maths|Mr Smith
School2|English|Jack
School3|English|Jimmy
School3|Maths|Jane
School4|Computer Science|Bob
sqlite> select
schoolnames.name,
subjects.subject,
ifnull(teachers.teacher, 'Unknown')
from (select distinct name from schools) schoolnames
join (select distinct subject from schools) subjects
left join schools teachers
on schoolnames.name = teachers.name
and subjects.subject = teachers.subject;
School1|Maths|Mr Smith
School1|English|Unknown
School1|Computer Science|Unknown
School2|Maths|Unknown
School2|English|Jack
School2|Computer Science|Unknown
School3|Maths|Jane
School3|English|Jimmy
School3|Computer Science|Unknown
School4|Maths|Unknown
School4|English|Unknown
School4|Computer Science|Bob
I'd use:
SELECT
sg.boardid, sg.schoolid, sg.dist_subject, b.id, b.cnt1, ...
FROM
(SELECT
g.boardid, g.schoolid, s.dist_subject
FROM
(SELECT
DISTINCT subject AS dist_subject
FROM b ) s CROSS JOIN
(SELECT
boardid, schoolid
FROM b
GROUP BY
boardid, schoolid ) g ) sg LEFT OUTER JOIN
b ON
sg.boardid = b.boardid AND
sg.schoolid = b.schoolid AND
sg.dist_subject = b.subject
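The approaches above share one shape: build every (school, subject) combination with a cross join, then left join the real rows onto it. Below is a minimal sketch of that shape in SQLite via Python, using the asker's subject list with its orderNo column for sorting. The cnt1 numbers are invented sample data, the temp table is called subjects here instead of #A, and 0 is an assumed dummy value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE b (id INTEGER, boardid INTEGER, schoolid INTEGER, subject TEXT, cnt1 INTEGER);
INSERT INTO b VALUES
  (1, 20, 21, 'f', 10), (2, 20, 21, 'r', 11), (3, 20, 21, 'w', 12), (4, 20, 21, 'm', 13),
  (5, 20, 30, 'r', 14), (6, 20, 30, 'w', 15), (7, 20, 30, 'm', 16);
CREATE TABLE subjects (subject_s TEXT, orderNo INTEGER);
INSERT INTO subjects VALUES ('r', 1), ('w', 2), ('m', 3), ('f', 4);
""")

# Distinct schools x fixed subject list gives 4 rows per school; the LEFT JOIN
# fills in real counts where they exist and COALESCE supplies the dummy value.
filled = conn.execute("""
SELECT s.boardid, s.schoolid, a.subject_s AS subject, COALESCE(b.cnt1, 0) AS cnt1
FROM (SELECT DISTINCT boardid, schoolid FROM b) s
CROSS JOIN subjects a
LEFT JOIN b
       ON b.boardid = s.boardid
      AND b.schoolid = s.schoolid
      AND b.subject = a.subject_s
ORDER BY s.schoolid, a.orderNo
""").fetchall()
for row in filled:
    print(row)
```

Because the cross join uses only the distinct (boardid, schoolid) pairs, keeping orderNo in the subject table does not multiply the rows; the 16-rows effect described in the question comes from cross joining against the full table.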

how to get biggest result from a sql result in postgresql

I am using PostgreSQL 8.3 and I have a simple SQL query:
SELECT a.id,a.bpm_process_instance_id,a.actor_id
FROM bpm_task_instance a
WHERE a.bpm_process_instance_id IN
(
SELECT bpm_process_instance_id
FROM incident_info
WHERE status = 12
AND registrant = 23
)
so, I got a result set like this:
id instance_id actor_id
150 53 24
147 53 26
148 53 25
161 57 26
160 57 26
158 57 24
165 58 23
166 58 24
167 58 24
Now, I want to get the max id for each instance_id, so that the result looks like below:
id instance_id actor_id
150 53 24
161 57 26
167 58 23
How can I get this result? I used the following SQL, but got an error:
ERROR: relation "x" does not exist
SELECT *
FROM (SELECT a.id,a.bpm_process_instance_id,a.actor_id
FROM bpm_task_instance a
WHERE a.bpm_process_instance_id IN
(
SELECT bpm_process_instance_id
FROM incident_info
WHERE status = 12
AND registrant = 23
)
) AS x
WHERE x.id = (
SELECT max(id)
FROM x
WHERE bpm_process_instance_id = x.bpm_process_instance_id
)
Can anyone help me? Thanks a lot!
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE the_table
( id INTEGER NOT NULL
, instance_id INTEGER NOT NULL
, actor_id INTEGER NOT NULL
);
INSERT INTO the_table(id, instance_id, actor_id) VALUES
(150,53,24) ,(147,53,26) ,(148,53,25)
,(161,57,26) ,(160,57,26) ,(158,57,24)
,(165,58,23) ,(166,58,24) ,(167,58,24)
;
SELECT id, instance_id, actor_id
FROM the_table dt
WHERE NOT EXISTS (
SELECT *
FROM the_table nx
WHERE nx.instance_id = dt.instance_id
AND nx.id > dt.id
);
Result (note: the last row differs!):
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 9
id | instance_id | actor_id
-----+-------------+----------
150 | 53 | 24
161 | 57 | 26
167 | 58 | 24
(3 rows)
UPDATE: this is the query including the other subquery and the missing table, and the original (ugly) column names, all packed into a CTE:
WITH zcte AS (
SELECT ti.id AS id
, ti.bpm_process_instance_id AS instance_id
, ti.actor_id AS actor_id
FROM bpm_task_instance ti
WHERE EXISTS ( SELECT * FROM incident_info ii
WHERE ii.bpm_process_instance_id = ti.bpm_process_instance_id
AND ii.status = 12
AND ii.registrant = 23
)
)
SELECT id, instance_id, actor_id
FROM zcte dt
WHERE NOT EXISTS (
SELECT *
FROM zcte nx
WHERE nx.instance_id = dt.instance_id
AND nx.id > dt.id
);
UPDATE addendum:
Oops, the bad news is that 8.3 did not have CTEs yet (think about upgrading). The good news is that, as a workaround, you could create zcte as a (temporary) VIEW and refer to that instead.
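A sketch of that workaround follows, run in SQLite through Python so it can be tried without a PostgreSQL 8.3 server. The view body is the CTE body from above; the rows in incident_info and the extra task row for instance 99 are invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bpm_task_instance (id INTEGER, bpm_process_instance_id INTEGER, actor_id INTEGER);
CREATE TABLE incident_info (bpm_process_instance_id INTEGER, status INTEGER, registrant INTEGER);
INSERT INTO bpm_task_instance VALUES
  (150, 53, 24), (147, 53, 26), (148, 53, 25),
  (161, 57, 26), (160, 57, 26), (158, 57, 24),
  (165, 58, 23), (166, 58, 24), (167, 58, 24),
  (170, 99, 24);
INSERT INTO incident_info VALUES (53, 12, 23), (57, 12, 23), (58, 12, 23), (99, 11, 23);

-- The temporary view replaces the CTE: same body, but it can be referenced
-- twice in the final query without repeating the subquery text.
CREATE TEMP VIEW zcte AS
SELECT ti.id AS id, ti.bpm_process_instance_id AS instance_id, ti.actor_id AS actor_id
FROM bpm_task_instance ti
WHERE EXISTS (SELECT * FROM incident_info ii
              WHERE ii.bpm_process_instance_id = ti.bpm_process_instance_id
                AND ii.status = 12
                AND ii.registrant = 23);
""")

latest = conn.execute("""
SELECT id, instance_id, actor_id
FROM zcte dt
WHERE NOT EXISTS (SELECT * FROM zcte nx
                  WHERE nx.instance_id = dt.instance_id
                    AND nx.id > dt.id)
ORDER BY id
""").fetchall()
print(latest)  # [(150, 53, 24), (161, 57, 26), (167, 58, 24)]
```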
try this:
select a.id,a.bpm_process_instance_id,a.actor_id
from bpm_task_instance A
inner join
(select max(a.id) as id,a.bpm_process_instance_id
from bpm_task_instance a
where a.bpm_process_instance_id in
( select bpm_process_instance_id
from incident_info
where status = 12 and registrant = 23
)
group by a.bpm_process_instance_id)B
on A.bpm_process_instance_id=B.bpm_process_instance_id
and A.id=B.id
@wildplasser
SELECT dt.* FROM
(
SELECT id,bpm_process_instance_id,actor_id
FROM bpm_task_instance WHERE bpm_process_instance_id in
(
SELECT bpm_process_instance_id FROM incident_info
WHERE status = 12 and registrant = 23
)
) as dt
WHERE NOT EXISTS (
SELECT *
FROM bpm_task_instance nx
WHERE nx.bpm_process_instance_id = dt.bpm_process_instance_id
AND nx.id > dt.id
)
ORDER BY id asc
At large scale, the DISTINCT ON syntax is sometimes faster than the perfectly valid answers already given.
SELECT DISTINCT ON (instance_id)
id, instance_id, actor_id
FROM the_table dt
ORDER BY instance_id, id DESC;
Once you get used to this syntax, you may find it easier to read than the alternatives. Inside the parentheses in the DISTINCT ON clause you put the list of columns which should be unique, and the ORDER BY clause must start with matching columns and continue with enough columns to ensure that the one you want to keep comes first.
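DISTINCT ON is specific to PostgreSQL. The same "sort each group, keep the first row" idea can be written portably with ROW_NUMBER(), sketched below in SQLite via Python. This assumes a SQLite build with window functions (3.25 or later); in PostgreSQL, window functions arrived in 8.4, so this form would not run on the asker's 8.3 either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (id INTEGER NOT NULL, instance_id INTEGER NOT NULL,
                        actor_id INTEGER NOT NULL);
INSERT INTO the_table (id, instance_id, actor_id) VALUES
  (150, 53, 24), (147, 53, 26), (148, 53, 25),
  (161, 57, 26), (160, 57, 26), (158, 57, 24),
  (165, 58, 23), (166, 58, 24), (167, 58, 24);
""")

# PARTITION BY plays the role of the DISTINCT ON column list, and the
# window's ORDER BY decides which row of each group is numbered 1 and kept.
top_rows = conn.execute("""
SELECT id, instance_id, actor_id
FROM (SELECT id, instance_id, actor_id,
             ROW_NUMBER() OVER (PARTITION BY instance_id ORDER BY id DESC) AS rn
      FROM the_table) ranked
WHERE rn = 1
ORDER BY instance_id
""").fetchall()
print(top_rows)  # [(150, 53, 24), (161, 57, 26), (167, 58, 24)]
```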