SQL Dynamically Joining Tables on Various Columns

First time posting!
Have a use case where we want to join some sales data to a master agreement table to determine the applicable fees at a transactional level.
The hard part is that the agreement table has VARIOUS possibilities, and in the worst case at least a "catch all".
We would want to start at the *most granular* level. So the purple line matches on all possible values.
However, a record like the blue sales row does not match the master on any value except supplier, so in that case it falls back to the catch all.
I've thought of concatenating the columns of each master row into a key, but then I'd need to find a way of joining that to sales, and a simple concat would not successfully match the blue row example. So it's like the join would have to dynamically choose which columns to compare.
By chance, would anyone have some ideas on how to achieve this?
Thanks!
(Code for tables)
create TABLE T_TEST_AGREEMENT (
SUPPLIER VARCHAR(254),
ITEM VARCHAR(254),
PROGRAM INT,
RXDA VARCHAR(254),
CTRCT INT,
FEE INT
);
create TABLE T_TEST_AGREEMENT_SALES (
SUPPLIER VARCHAR(254),
ITEM VARCHAR(254),
PROGRAM INT,
RXDA VARCHAR(254),
CTRCT INT
);
INSERT INTO T_TEST_AGREEMENT values
(123, 'A', 60, 'Y', 4, 1),
(123, 'A', 61, 'N', 4, 2),
(123, 'B', 62, null, 5, 3),
(123, 'C', null, 'Y', 6, 4),
(123, null, 63, null, null, 5),
(123, null, null, 'Y', null, 6),
(123, null, null, null, null, 7);
INSERT INTO T_TEST_AGREEMENT_SALES values
(123, 'D', 63, null, null),
(123, 'F', null, null, null),
(123, 'A', 61, 'N', 4),
(123, 'C', null, 'Y', 6);

You can use a correlated subquery:
select st.*,
       (select m.fee
        from T_TEST_AGREEMENT m
        where m.supplier = st.supplier and
              (m.item is null or m.item = st.item) and
              (m.program is null or m.program = st.program) and
              (m.rxda is null or m.rxda = st.rxda) and
              (m.ctrct is null or m.ctrct = st.ctrct)
        order by ( (case when m.item = st.item then 1 else 0 end) +
                   (case when m.program = st.program then 1 else 0 end) +
                   (case when m.rxda = st.rxda then 1 else 0 end) +
                   (case when m.ctrct = st.ctrct then 1 else 0 end)
                 ) desc
        fetch first 1 row only
       ) as fee
from T_TEST_AGREEMENT_SALES st;
This uses standard SQL syntax; the exact syntax for limiting to one row might vary depending on your database.
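On SQL Server, the same most-granular-match lookup can be written with OUTER APPLY and TOP (1); this is only a sketch of the same idea against the sample tables above (with the sample data, the four sales rows should pick fees 5, 7, 2 and 4 respectively):
select st.*, m.fee
from T_TEST_AGREEMENT_SALES st
outer apply (
    select top (1) a.fee
    from T_TEST_AGREEMENT a
    where a.supplier = st.supplier
      and (a.item is null or a.item = st.item)
      and (a.program is null or a.program = st.program)
      and (a.rxda is null or a.rxda = st.rxda)
      and (a.ctrct is null or a.ctrct = st.ctrct)
    -- rank candidate agreements by how many optional columns they match,
    -- so the most specific row wins and the all-NULL row acts as the catch all
    order by (case when a.item = st.item then 1 else 0 end)
           + (case when a.program = st.program then 1 else 0 end)
           + (case when a.rxda = st.rxda then 1 else 0 end)
           + (case when a.ctrct = st.ctrct then 1 else 0 end) desc
) m;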

Related

Most efficient way to update table column based on sum

I am looking for the most efficient / minimal code way to update a table column based on the sum of another value in the same table. A method which works and the temp table are shown below.
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))
INSERT into #t1 VALUES
(100, 'CA', '0123', 123.01, null, null),
(100, 'CA', '0124', 0.00, null, null),
(100, 'PA', '0256', 12.10, null, null),
(200, 'MA', '0452', 145.00, null, null),
(300, 'MA', '0578', 134.23, null, null),
(400, 'CA', '1111', 94.12, null, null),
(600, 'CA', '0000', 86.34, null, null),
(500, 'CO', '1111', 0.00, null, null);
update t1
set total_id_price_bystate = sum_price_bystate
from #t1 t1
inner join (
select t2_in.Id,
t2_in.astate,
sum(t2_in.price) as sum_price_bystate
from #t1 t2_in
group by t2_in.id, t2_in.astate
) t2
on t1.id = t2.id
and t1.astate = t2.astate
update t1
set total_id_price = sum_price
from #t1 t1
inner join (
select t3_in.Id,
sum(t3_in.price) as sum_price
from #t1 t3_in
group by t3_in.id
) t3
on t1.id = t3.id
select * from #t1
The main thing I don't like about my method is that it requires an inner join to a subquery over the same table. I am looking for a way to avoid this, although I don't think the method I have is overly complicated; maybe there isn't anything much more efficient.
To add, I am wondering what the best way would be to combine the two updates, since they are very similar and only differ by the GROUP BY clause.
As pointed out in the comments, this is not a good way to store data as it violates the basic principles of normalisation:
- you are storing data that you can compute
- you are storing the same data multiple times, i.e. duplicates
- you need to re-calculate the totals whenever any individual value changes
- it's possible to update a single row and create a data contradiction
It's not a bad thing to pre-calculate aggregations, especially in a data warehouse scenario, but even then you would still only store each value once per unique key.
Normalisation prevents these issues.
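For example, a pre-calculated total would normally live in its own summary table keyed on the grouping columns, so each value is stored exactly once. A sketch of that idea (the table name here is hypothetical):
-- hypothetical summary table: one row per (id, astate), each total stored once
CREATE TABLE IdStateTotals (
    id          nvarchar(100)  NOT NULL,  -- shortened from nvarchar(max), which cannot be part of a key
    astate      varchar(16)    NOT NULL,
    total_price decimal(16,2)  NOT NULL,
    CONSTRAINT PK_IdStateTotals PRIMARY KEY (id, astate)
);

INSERT INTO IdStateTotals (id, astate, total_price)
SELECT id, astate, SUM(price)
FROM #t1
GROUP BY id, astate;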
That said, you can utilise analytic window functions to compute your values in a single pass over the table:
select *,
Sum(price) over(partition by id, astate) total_id_price_bystate,
Sum(price) over(partition by id) total_id_price
from #t1;
If you really want the data in this format you could create a view and query it (note that a view cannot reference a temp table, so this assumes the data lives in a permanent table t1 with the same columns as #t1):
create view Totals as
select id, astate, code, price, total_id_price_bystate, total_id_price,
       Sum(price) over(partition by id, astate) total_bystate,
       Sum(price) over(partition by id) total
from t1;
select *
from Totals where id = 100;
And to answer your specific question: a view (or a CTE) that touches a single base table is updatable, so you can accomplish both updates in one statement like so:
drop view Totals;
create view Totals as
select id, astate, code, price, total_id_price_bystate, total_id_price,
Sum(price) over(partition by id, astate) total_bystate,
Sum(price) over(partition by id) total
from t1;
update totals set
total_id_price_bystate = total_bystate,
total_id_price = total;
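As mentioned above, a CTE works too. A sketch of the same single-pass update done through an updatable CTE directly against the temp table (valid because only one base table is touched):
;with totals as (
    select total_id_price_bystate,
           total_id_price,
           sum(price) over(partition by id, astate) as total_bystate,
           sum(price) over(partition by id) as total
    from #t1
)
update totals
set total_id_price_bystate = total_bystate,
    total_id_price = total;

select * from #t1;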
You can use PARTITION BY to get the two different aggregated values:
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))
INSERT into #t1 VALUES
(100, 'CA', '0123', 123.01, null, null),
(100, 'CA', '0124', 0.00, null, null),
(100, 'PA', '0256', 12.10, null, null),
(200, 'MA', '0452', 145.00, null, null),
(300, 'MA', '0578', 134.23, null, null),
(400, 'CA', '1111', 94.12, null, null),
(600, 'CA', '0000', 86.34, null, null),
(500, 'CO', '1111', 0.00, null, null);
update t1
set total_id_price_bystate = sum_price_bystate,total_id_price=sum_price
from #t1 t1
inner join (
select t2_in.Id,
t2_in.astate,
sum(t2_in.price) over(partition by t2_in.id, t2_in.astate) as sum_price_bystate,
sum(t2_in.price) over(partition by t2_in.id) as sum_price
from #t1 t2_in
) t2
on t1.id = t2.id
and t1.astate = t2.astate
select * from #t1

Create SQL view to show mark for each subject and total of marks for each student

I need to create a view from these 3 tables to show data like in this screenshot:
Create these 3 tables:
CREATE TABLE Student
(
idStu int PRIMARY KEY,
name varchar(30)
)
CREATE TABLE Subjects
(
idSub int PRIMARY KEY,
subjName varchar(30)
)
CREATE TABLE Exam
(
idStu int REFERENCES Student,
idSub int REFERENCES Subjects,
Mark float,
CONSTRAINT idStu_idSub PRIMARY KEY(idStu, idSub)
)
Then I'm inserting some values:
INSERT INTO Student
VALUES (1, 'Jacob'), (2, 'Amilee')
INSERT INTO Subjects
VALUES (1, 'Mathematics'), (2, 'Science'), (3, 'English')
INSERT INTO Exam
VALUES (1, 1, 10), (1, 2, 9), (1, 3, 8),
(2, 1, 9), (2, 2, 10), (2, 3, 7)
You need conditional aggregation (aka PIVOT)
CREATE VIEW View_MyHomeworkThatICantBeBotheredToDo
AS
SELECT
Student = stu.name,
Mathematics = SUM(CASE WHEN sub.subjName = 'Mathematics' THEN e.Mark END),
Science = SUM(CASE WHEN sub.subjName = 'Science' THEN e.Mark END),
English = SUM(CASE WHEN sub.subjName = 'English' THEN e.Mark END),
Total = SUM(e.Mark)
FROM Student stu
JOIN Exam e ON e.idStu = stu.idStu
JOIN Subjects sub ON sub.idSub = e.idSub
GROUP BY stu.idStu, stu.name;
GO
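Querying the view then gives one row per student; with the sample data above, the totals work out as shown in the comments (a quick check against the inserts above):
SELECT *
FROM View_MyHomeworkThatICantBeBotheredToDo
ORDER BY Student;
-- Amilee: Mathematics 9,  Science 10, English 7, Total 26
-- Jacob:  Mathematics 10, Science 9,  English 8, Total 27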

SQL Pivot Half of table

I have a table that consists of time information. It's basically:
Employee, Date, Seq, Time In, Time Out.
They can clock in and out multiple times a day, so I'm trying to get all of the clock-ins and clock-outs for a day on one row. My result would be something like:
Employee, Date, TimeIn1, TimeOut1, TimeIn2, TimeOut2, TimeIn3, TimeOut3....
Where the 1, 2, and 3 are the sequence numbers. I know I could just do a bunch of left joins to the table itself based on employee=employee, date=date, and seq=seq+1, but is there a way to do it in a pivot? I don't want to pivot the employee and date fields, just the time in and time out.
The short answer is: Yes, it's possible.
The exact code will be updated if/when you provide sample data to clarify some points, but you can absolutely pivot the times out while leaving the employee/work date alone.
Sorry for the wall of code; none of the fiddle sites are working from my current computer
declare @test table (
    pk int,
    workdate date,
    seq int,
    tIN time,
    tOUT time
)
insert into @test values
(1, '2020-11-25', 1, '08:00', null),
(1, '2020-11-25', 2, null, '11:00'),
(1, '2020-11-25', 3, '11:32', null),
(1, '2020-11-25', 4, null, '17:00'),
(2, '2020-11-25', 5, '08:00', null),
(2, '2020-11-25', 6, null, '09:00'),
(2, '2020-11-25', 7, '09:15', null),
-- new date
(1, '2020-11-27', 8, '08:00', null),
(1, '2020-11-27', 9, null, '08:22'),
(1, '2020-11-27', 10, '09:14', null),
(1, '2020-11-27', 11, null, '12:08'),
(1, '2020-11-27', 12, '01:08', null),
(1, '2020-11-27', 13, null, '14:40'),
(1, '2020-11-27', 14, '14:55', null),
(1, '2020-11-27', 15, null, '17:00')
select *
from (
/* this just sets the column header names and condenses their values */
select
pk,
workdate,
colName = case when tin is not null then 'TimeIn' + cast(empDaySEQ as varchar) else 'TimeOut' + cast(empDaySEQ as varchar) end,
colValue = coalesce(tin, tout)
from (
/* main query */
select
pk,
workdate,
/* grab what pair # this clock in or out is; reset by employee & date */
empDaySEQ = (row_number() over (partition by pk, workdate order by seq) / 2) + (row_number() over (partition by pk, workdate order by seq) % 2),
tin,
tout
from @test
) i
) a
PIVOT (
max(colValue)
for colName
IN ( /* replace w/ dynamic if you don't know upper boundary of max in/out pairs */
[TimeIn1],
[TimeOut1],
[TimeIn2],
[TimeOut2],
[TimeIn3],
[TimeOut3],
[TimeIn4],
[TimeOut4]
)
) mypivotTable
This generates one row per employee per work date, with each clock in/out pair spread into its own TimeIn/TimeOut column pair.
(I would provide a fiddle demo but they're not working for me today)
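As the inline comment above says, the IN (...) list needs to become dynamic if you don't know the maximum number of in/out pairs. A sketch of that, assuming SQL Server 2017+ for STRING_AGG, and copying the rows into a temp table because dynamic SQL run with sp_executesql cannot see a table variable from the calling batch:
if object_id('tempdb..#test') is not null drop table #test;
select * into #test from @test;   -- copy the sample rows so the dynamic SQL can see them

declare @cols nvarchar(max), @sql nvarchar(max);

-- build [TimeIn1],[TimeOut1],...,[TimeInN],[TimeOutN] where N is the largest
-- number of in/out pairs any employee has on a single date
select @cols = string_agg('[TimeIn' + cast(n as varchar(10)) + '],[TimeOut' + cast(n as varchar(10)) + ']', ',')
               within group (order by n)
from (
    -- (rn + 1) / 2 is equivalent to the (rn / 2) + (rn % 2) pair number used above
    select distinct (row_number() over (partition by pk, workdate order by seq) + 1) / 2 as n
    from #test
) pairs;

set @sql = N'
select *
from (
    select pk, workdate,
           colName  = case when tin is not null then ''TimeIn'' + cast(empDaySEQ as varchar(10))
                           else ''TimeOut'' + cast(empDaySEQ as varchar(10)) end,
           colValue = coalesce(tin, tout)
    from (
        select pk, workdate, tin, tout,
               empDaySEQ = (row_number() over (partition by pk, workdate order by seq) + 1) / 2
        from #test
    ) i
) a
pivot ( max(colValue) for colName in (' + @cols + N') ) p;';

exec sp_executesql @sql;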

SQL select items between LAG and LEAD using as range

Is it possible to select and sum items from one table, using LAG and LEAD over another table as the range, as below?
SELECT @Last = MAX(ID) from [dbo].[#Temp]
select opl.Name as [Age Categories] ,
( SELECT count([dbo].udfCalculateAge([BirthDate],GETDATE()))
FROM [dbo].[tblEmployeeDetail] ed
inner join [dbo].[tblEmployee] e
on ed.EmployeeID = e.ID
where convert(int,[dbo].udfCalculateAge(e.[BirthDate],GETDATE()))
between LAG(opl.Name) OVER (ORDER BY opl.id)
and (CASE opl.ID WHEN @Last THEN '100' ELSE opl.Name End )
) as Total
FROM [dbo].[#Temp] opl
tblEmployee contains the employees and their dates of birth
INSERT INTO #tblEmployees VALUES
(1, 'A', 'A1', 'A', '1983/01/02'),
(2, 'B', 'B1', 'BC', '1982/01/02'),
(3, 'C', 'C1', 'JR2', '1982/10/11'),
(4, 'V', 'V1', 'G', '1990/07/12'),
(5, 'VV', 'VV1', 'J', '1992/06/02'),
(6, 'R', 'A', 'D', '1982/05/15'),
(7, 'C', 'Ma', 'C', '1984/09/29')
The next table is a temp table created from the ages entered by the user; e.g. "20;30;50;60" generates the temp table below, using the function Split:
select * FROM [dbo].[Split](';','20;30;50;60')
Temp Table
pn s
1 20
2 30
3 50
4 60
Desired output is as below; the Age Categories column can be renamed in a DataTable in C#, but I need the Total column to be accurate for the ranges.
Age Categories Total
up to 20 0
21 - 30 2
31 - 50 5
51 - 60 0
Something along these lines should work for you:
declare @tblEmployees table(
    ID int,
    FirstNames varchar(20),
    Surname varchar(20),
    Initial varchar(3),
    BirthDate date)
INSERT INTO @tblEmployees VALUES
(1, 'A', 'A1', 'A', '1983/01/02'),
(2, 'B', 'B1', 'BC', '1982/01/02'),
(3, 'C', 'C1', 'JR2', '1982/10/11'),
(4, 'V', 'V1', 'G', '1990/07/12'),
(5, 'VV', 'VV1', 'J', '1992/06/02'),
(6, 'R', 'A', 'D', '1982/05/15'),
(7, 'C', 'Ma', 'C', '1984/09/29')
declare @temp table
(id int identity,
 age int)
INSERT INTO @temp
SELECT cast(item as int) FROM dbo.fnSplit(';','20;30;50;60')
declare @today date = GetDate()
declare @minBirthCutOff date = (SELECT DATEADD(yy, -MAX(age), @today) FROM @temp)
declare @minBirth date = (SELECT Min(birthdate) from @tblEmployees)
IF @minBirth < @minBirthCutOff
BEGIN
    INSERT INTO @temp VALUES (100)
END
SELECT COALESCE(CAST((LAG(t.age) OVER(ORDER BY t.age) + 1) as varchar(3)) + ' - ', 'Up to ')
       + CAST(t.age AS varchar(3)) AS [Age Categories],
       COUNT(e.id) AS [Total]
FROM @temp t
LEFT JOIN
    (SELECT te.id,
            te.age,
            (SELECT MIN(age) FROM @temp t WHERE t.age > te.age) AS agebucket
     FROM (select id,
                  dbo.udfCalculateAge(birthdate, @today) age from @tblEmployees) te) e
    ON e.agebucket = t.age
GROUP BY t.age ORDER BY t.age
Result set looks like this:
Age Categories Total
Up to 20 0
21 - 30 2
31 - 50 5
51 - 60 0
For future reference, particularly when asking SQL questions, you will get a far faster and better response if you provide much of the work that I have done here, i.e. CREATE statements for the tables concerned and INSERT statements to supply the sample data. It is much easier for you to do this than for us (we have to copy, paste and re-format, etc.), and you should be able to generate it with a few choice SELECT statements.
Note also that I handled the case where a birthdate falls outside the given range rather differently. It is a bit more efficient to do a single check via MAX than to complicate your SELECT statement, and it also makes it much more readable.
Thanks to HABO for the suggestion on GetDate().
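The code above assumes the question's split function (called dbo.fnSplit here) and dbo.udfCalculateAge already exist. If you need something to run the example against, a minimal whole-years age function could look like this (a sketch only; the real udfCalculateAge may differ):
create function dbo.udfCalculateAge (@birthdate date, @asof date)
returns int
as
begin
    -- whole years between @birthdate and @asof, subtracting one if the
    -- birthday has not yet occurred in the @asof year
    return datediff(year, @birthdate, @asof)
           - case when dateadd(year, datediff(year, @birthdate, @asof), @birthdate) > @asof
                  then 1 else 0 end;
end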

Import wizard with subqueries

I want to import 100k+ rows into a SQL Server table.
I have my insert like this (observe the 6th value, which is a subquery):
INSERT INTO BD_S3I.dbo.AGENDA
(COD_UNDFBR, COD_DCPLNA, COD_TECNCA, COD_ATVIDE, DAT_PROGM_AGENDA, NUM_SQNCL_AGENDA, DAT_FINAL_AGENDA, COD_OCORR, COD_ROTA, NUM_SEMAN_PRGINS, NUM_DIAIN_PRGINS, DAT_INIC_PRGINS, MRC_SITUA_AGENDA, DAT_SUSPN_AGENDA, DAT_CONCL_AGENDA, DAT_REPRG_AGENDA, DCR_SITUA_AGENDA, DCR_AGENDA, MRC_AVISO_AGENDA, MRC_NEGLG_AGENDA, NUM_PERIO_PRGINS, DAT_DIAIN_PRGINS, DAT_JUSTN_AGENDA, COD_MTVNVS, MRC_ERP_AGENDA, COD_USUS3I_JUSTN)
VALUES
(1, 290, 2, 6, '2017-09-11 00:00:00.000', (SELECT CASE WHEN MAX(AGENDA.NUM_SQNCL_AGENDA) + 1 IS NULL THEN 1 ELSE MAX(AGENDA.NUM_SQNCL_AGENDA) + 1 END FROM AGENDA WHERE AGENDA.COD_UNDFBR = 1 AND AGENDA.COD_DCPLNA = 290 AND AGENDA.COD_TECNCA = 2 AND AGENDA.COD_ATVIDE = 6 AND AGENDA.DAT_PROGM_AGENDA = '2017-09-11 00:00:00.000'), '2017-09-17 00:00:00.000', NULL, 492, NULL, NULL, '2017-07-24 08:30:00.000', 'P', NULL, NULL, NULL, NULL, NULL, 'S', 'S', 7, '2017-07-24 00:00:00.000', NULL, NULL, 'N', NULL);
I put all the 100k inserts below each other and start the import. It works, but it takes too much time to execute all the 100k+ rows.
I was thinking of using the import wizard (would that be faster?).
The problem is that when I choose the Excel file with my data, the import wizard does not understand the subquery in the value; it treats it as longtext.
Select at least one named return value to be inserted into the 6th column, e.g.
( SELECT x = CASE .... )
or give the subquery's return value an alias at the end.
Simply convert your INSERT...VALUES to INSERT...SELECT, which works since all the other values are scalars and can be included inline in the subquery's SELECT statement:
INSERT INTO BD_S3I.dbo.AGENDA (COD_UNDFBR, COD_DCPLNA, COD_TECNCA, COD_ATVIDE,
DAT_PROGM_AGENDA, NUM_SQNCL_AGENDA, DAT_FINAL_AGENDA,
COD_OCORR, COD_ROTA, NUM_SEMAN_PRGINS, NUM_DIAIN_PRGINS,
DAT_INIC_PRGINS, MRC_SITUA_AGENDA, DAT_SUSPN_AGENDA,
DAT_CONCL_AGENDA, DAT_REPRG_AGENDA, DCR_SITUA_AGENDA,
DCR_AGENDA, MRC_AVISO_AGENDA, MRC_NEGLG_AGENDA,
NUM_PERIO_PRGINS, DAT_DIAIN_PRGINS, DAT_JUSTN_AGENDA,
COD_MTVNVS, MRC_ERP_AGENDA, COD_USUS3I_JUSTN)
SELECT 1, 290, 2, 6, '2017-09-11 00:00:00.000',
CASE WHEN MAX(AGENDA.NUM_SQNCL_AGENDA) + 1 IS NULL
THEN 1
ELSE MAX(AGENDA.NUM_SQNCL_AGENDA) + 1
END,
'2017-09-17 00:00:00.000', NULL, 492, NULL,
NULL, '2017-07-24 08:30:00.000', 'P',
NULL, NULL, NULL, NULL, NULL, 'S', 'S', 7,
'2017-07-24 00:00:00.000', NULL, NULL, 'N', NULL
FROM AGENDA
WHERE AGENDA.COD_UNDFBR = 1 AND AGENDA.COD_DCPLNA = 290
AND AGENDA.COD_TECNCA = 2 AND AGENDA.COD_ATVIDE = 6
AND AGENDA.DAT_PROGM_AGENDA = '2017-09-11 00:00:00.000';
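As a side note, the CASE WHEN MAX(...) + 1 IS NULL THEN 1 ELSE MAX(...) + 1 END expression can be written more compactly as COALESCE(MAX(AGENDA.NUM_SQNCL_AGENDA), 0) + 1; the behaviour is the same.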