Replace hard-coded values with data from a table - SQL

Currently, I have 3 affiliations hard-coded in a query. They serve as a hierarchy: 1 = Faculty, 2 = Staff, 3 = Student. If a user from the affiliations_tbl table has more than one affiliation (for example, a Staff member who is also a Student), the query uses their Staff affiliation, since it is higher in the hierarchy that is defined by the PARTITION BY and DECODE().
SELECT x2.emplid,
       scc_afl_code
  FROM (SELECT x.emplid,
               scc_afl_code,
               row_number() over(PARTITION BY x.emplid ORDER BY x.affil_order) r
          FROM (SELECT t.emplid,
                       scc_afl_code,
                       DECODE(scc_afl_code,
                              'FACULTY', 1,
                              'STAFF', 2,
                              'STUDENT', 3,
                              999) affil_order
                  FROM affiliations_tbl t
                 WHERE t.scc_afl_code IN
                       (SELECT a.scc_afl_code
                          FROM affiliation_groups_tbl a
                         WHERE a.group = 'COLLEGE')) x) x2
 WHERE x2.r = 1;
I have created a table, affiliation_groups_tbl, that stores affiliation groups so I can scale this by adding data to the table rather than changing hard-coded values in this query. For example, instead of adding 'CONSULTANT', 4 to the decode() list, I would add it to the table and wouldn't have to modify the SQL.
scc_afl_code | group   | group_name | sort_order
-------------+---------+------------+-----------
FACULTY      | COLLEGE | Faculty    | 1
STAFF        | COLLEGE | Staff      | 2
STUDENT      | COLLEGE | Student    | 3
I've already updated the latter half of the query to only select scc_afl_code values that are in the COLLEGE group. How can I properly update the first part of the query to use the table as a hierarchy?

Try the piece of code below instead of DECODE in the SELECT clause of your statement:
coalesce((select g.sort_order
            from affiliation_groups_tbl g
           where g.scc_afl_code = t.scc_afl_code), 999)
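For context, here is a sketch of the whole query with that scalar subquery substituted for the DECODE. It assumes the table and column names from the question; the extra g.group = 'COLLEGE' condition is an assumption, added only in case the same code could belong to more than one group:
SELECT x2.emplid,
       scc_afl_code
  FROM (SELECT x.emplid,
               scc_afl_code,
               row_number() over(PARTITION BY x.emplid ORDER BY x.affil_order) r
          FROM (SELECT t.emplid,
                       scc_afl_code,
                       coalesce((SELECT g.sort_order
                                   FROM affiliation_groups_tbl g
                                  WHERE g.scc_afl_code = t.scc_afl_code
                                    AND g.group = 'COLLEGE'), 999) affil_order
                  FROM affiliations_tbl t
                 WHERE t.scc_afl_code IN
                       (SELECT a.scc_afl_code
                          FROM affiliation_groups_tbl a
                         WHERE a.group = 'COLLEGE')) x) x2
 WHERE x2.r = 1;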

You can try something like this:
create table dictionary (
  id   number,
  code varchar2(32),
  name varchar2(32),
  sort number);

insert into dictionary (id, code, name, sort) values (16, 'B', 'B name', 1);
insert into dictionary (id, code, name, sort) values (23, 'A', 'A name', 2);
insert into dictionary (id, code, name, sort) values (15, 'C', 'C name', 4);
insert into dictionary (id, code, name, sort) values (22, 'D', 'D name', 3);

select partition,
       string,
       decode(string, 'B', 1, 'A', 2, 'D', 3, 'C', 4, 999) decode,
       row_number() over(partition by partition order by decode(string, 'B', 1, 'A', 2, 'D', 3, 'C', 4, 999)) ordering
  from (select mod(level, 3) partition, chr(65 + mod(level, 5)) string
          from dual
       connect by level <= 8)
minus
-- Alternate --
select partition,
       string,
       nvl(t.sort, 999) nvl,
       row_number() over(partition by partition order by nvl(t.sort, 999)) ordering
  from (select mod(level, 3) partition, chr(65 + mod(level, 5)) string
          from dual
       connect by level <= 8) r
  left join dictionary t
    on t.code = r.string;
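Applied to the tables in the question, the left-join/NVL variant might look roughly like this (a sketch only, assuming the table and column names shown in the question):
SELECT emplid,
       scc_afl_code
  FROM (SELECT t.emplid,
               t.scc_afl_code,
               row_number() over(PARTITION BY t.emplid
                                 ORDER BY nvl(g.sort_order, 999)) r
          FROM affiliations_tbl t
          LEFT JOIN affiliation_groups_tbl g
            ON g.scc_afl_code = t.scc_afl_code
           AND g.group = 'COLLEGE'
         WHERE t.scc_afl_code IN
               (SELECT a.scc_afl_code
                  FROM affiliation_groups_tbl a
                 WHERE a.group = 'COLLEGE')) x
 WHERE x.r = 1;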


Rule based select with source tracking

Here is an example of the sample dataset for which I am having trouble writing efficient SQL:
There is a target table T1 with 5 columns: ID (primary key), NAME, CATEGORY, HEIGHT, LINEAGE.
T1 gets data from 3 sources - source1, source2, source3.
A map table defines the rule as to which column has to be picked, in what order, from which source.
If a source has a NULL value for a column, then check the next source to get the value - that's the rule.
So the values for the target table columns based on the rules are as below for ID = 1:
NAME: A12, CATEGORY: T1, HEIGHT: 4, LINEAGE: S3-S1-S1
The values for the target table columns based on the rules are as below for ID = 2:
NAME: B, CATEGORY: T22, HEIGHT: 5, LINEAGE: S3-S2-S1
The logic to merge into target should look like this:
Merge into Target T
using (select statement with rank based rules from 3 source tables) on
when matched then
when not matched then
Question: any suggestions on writing this MERGE in an efficient way that also updates the LINEAGE column in the merge?
First, the MAP table must have a column that gives a priority to the mapping.
Then you should PIVOT this table.
The next step is to combine all the source tables with UNION ALL.
And finally, we can join everything and select our values with the FIRST_VALUE function.
Having such a result, you can substitute it into the MERGE (a sketch of the MERGE is shown after the result below).
Structure and sample data for testing:
CREATE OR REPLACE TABLE SOURCE1 (
ID int,
NAME string,
CATEGORY string,
HEIGHT numeric);
CREATE OR REPLACE TABLE SOURCE2 (
ID int,
NAME string,
CATEGORY string,
HEIGHT numeric);
CREATE OR REPLACE TABLE SOURCE3 (
ID int,
NAME string,
CATEGORY string,
HEIGHT numeric);
CREATE OR REPLACE TABLE MAP (
PRIORITY int,
SOURCE_COLUMN string,
SOURCE_TABLE string);
INSERT INTO SOURCE1 (ID, NAME, CATEGORY, HEIGHT)
VALUES (1, 'A', 'T1', 4),
(2, 'B', 'T2', 5),
(3, 'C', 'T3', 6);
INSERT INTO SOURCE2 (ID, NAME, CATEGORY, HEIGHT)
VALUES (1, 'A1', 'T1', 4.4),
(2, 'B1', 'T22', 6),
(3, NULL, 'T3', 7.2);
INSERT INTO SOURCE3 (ID, NAME, CATEGORY, HEIGHT)
VALUES (1, 'A12', 'T21', NULL),
(2, 'B', NULL, 6),
(3, 'C3', 'T3', NULL);
INSERT INTO MAP (PRIORITY, SOURCE_COLUMN, SOURCE_TABLE)
VALUES (1, 'NAME', 'SOURCE3'),
(2, 'NAME', 'SOURCE1'),
(3, 'NAME', 'SOURCE2'),
(1, 'CATEGORY', 'SOURCE2'),
(2, 'CATEGORY', 'SOURCE3'),
(3, 'CATEGORY', 'SOURCE1'),
(1, 'HEIGHT', 'SOURCE1'),
(2, 'HEIGHT', 'SOURCE2'),
(3, 'HEIGHT', 'SOURCE3');
And my suggestion for a solution:
WITH _MAP AS (
SELECT *
FROM MAP
PIVOT (MAX(SOURCE_TABLE) FOR SOURCE_COLUMN IN ('NAME', 'CATEGORY', 'HEIGHT')) AS p(PRIORITY, NAME, CATEGORY, HEIGHT)
), _SRC AS (
SELECT 'SOURCE1' AS SOURCE_TABLE, ID, NAME, CATEGORY, HEIGHT FROM SOURCE1
UNION ALL
SELECT 'SOURCE2' AS SOURCE_TABLE, ID, NAME, CATEGORY, HEIGHT FROM SOURCE2
UNION ALL
SELECT 'SOURCE3' AS SOURCE_TABLE, ID, NAME, CATEGORY, HEIGHT FROM SOURCE3
)
SELECT DISTINCT _SRC.ID,
FIRST_VALUE(_SRC.NAME) OVER(PARTITION BY _SRC.ID ORDER BY MN.PRIORITY) AS NAME,
FIRST_VALUE(_SRC.CATEGORY) OVER(PARTITION BY _SRC.ID ORDER BY MC.PRIORITY) AS CATEGORY,
FIRST_VALUE(_SRC.HEIGHT) OVER(PARTITION BY _SRC.ID ORDER BY MH.PRIORITY) AS HEIGHT,
REPLACE(FIRST_VALUE(_SRC.SOURCE_TABLE) OVER(PARTITION BY _SRC.ID ORDER BY MN.PRIORITY) || '-' ||
FIRST_VALUE(_SRC.SOURCE_TABLE) OVER(PARTITION BY _SRC.ID ORDER BY MC.PRIORITY) || '-' ||
FIRST_VALUE(_SRC.SOURCE_TABLE) OVER(PARTITION BY _SRC.ID ORDER BY MH.PRIORITY), 'SOURCE', 'S') AS LINEAGE
FROM _SRC
LEFT JOIN _MAP AS MN ON _SRC.SOURCE_TABLE = MN.NAME AND _SRC.NAME IS NOT NULL
LEFT JOIN _MAP AS MC ON _SRC.SOURCE_TABLE = MC.CATEGORY AND _SRC.CATEGORY IS NOT NULL
LEFT JOIN _MAP AS MH ON _SRC.SOURCE_TABLE = MH.HEIGHT AND _SRC.HEIGHT IS NOT NULL;
Result:
+----+------+----------+--------+----------+
| ID | NAME | CATEGORY | HEIGHT | LINEAGE  |
+----+------+----------+--------+----------+
| 1  | A12  | T1       | 4      | S3-S2-S1 |
| 2  | B    | T22      | 5      | S3-S2-S1 |
| 3  | C3   | T3       | 6      | S3-S2-S1 |
+----+------+----------+--------+----------+
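The MERGE itself is not shown above; as a rough sketch, assuming the SELECT above is wrapped in a view with the hypothetical name RULE_BASED_SRC and the target table is T1 with the columns described in the question, it could look like this:
MERGE INTO T1 T
USING RULE_BASED_SRC S           -- hypothetical view over the SELECT above
   ON T.ID = S.ID
WHEN MATCHED THEN UPDATE SET
     T.NAME     = S.NAME,
     T.CATEGORY = S.CATEGORY,
     T.HEIGHT   = S.HEIGHT,
     T.LINEAGE  = S.LINEAGE
WHEN NOT MATCHED THEN INSERT (ID, NAME, CATEGORY, HEIGHT, LINEAGE)
     VALUES (S.ID, S.NAME, S.CATEGORY, S.HEIGHT, S.LINEAGE);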

1 distinct row having max value

This is the data I have
I need one row per unique ID, with max(Price). So the output would be:
I have tried the following
select * from table a
join (select b.id,max(b.price) from table b
group by b.id) c on c.id=a.id;
This gives the data shown in the question as output, because there is no key. I did try the other WHERE condition as well, which gives the original table as output.
You could try something like this in SQL Server:
Table
create table ex1 (
id int,
item char(1),
price int,
qty int,
usr char(2)
);
Data
insert into ex1 values
(1, 'a', 7, 1, 'ab'),
(1, 'a', 7, 2, 'ac'),
(2, 'b', 6, 1, 'ab'),
(2, 'b', 6, 1, 'av'),
(2, 'b', 5, 1, 'ab'),
(3, 'c', 5, 2, 'ab'),
(4, 'd', 4, 2, 'ac'),
(4, 'd', 3, 1, 'av');
Query
select a.* from ex1 a
join (
select id, max(price) as maxprice, min(usr) as minuser
from ex1
group by id
) c
on c.id = a.id
and a.price = c.maxprice
and a.usr = c.minuser
order by a.id, a.usr;
Result
id  item  price  qty  usr
1   a     7      1    ab
2   b     6      1    ab
3   c     5      2    ab
4   d     4      2    ac
Explanation
In your dataset, ID 1 has 2 records with the same price. You have to decide which one you want. So, in the above example, I am showing a single record for the user whose name is lowest alphabetically.
Alternate method
SQL Server also has the ranking function ROW_NUMBER() OVER(), which can be used as well:
select * from (
select row_number() over( partition by id order by id, price desc, usr) as sr, *
from ex1
) c where sr = 1;
The subquery says: give me all records from the table and give each row a serial number, starting at 1, unique within each ID. The rows are sorted by ID first, then price descending, and then usr. The outer query picks out the records with sr = 1.
Example here: https://rextester.com/KZCZ25396

Select rows matching the pattern: greater_than, less_than, greater_than

I've got a database with entries indicating units earned by staff. I'm trying to find a query that can select the entries where the units_earned by the staff follow this pattern: >30, then <30, and then >30.
In this SQL Fiddle, I would expect the query to return:
For John, Rows:
2, 4, 6
9, 10, 11
For Jane, Rows:
3, 5, 8
12, 13, 14
Here is the relevant SQL:
CREATE TABLE staff_units(
id integer,
staff_number integer,
first_name varchar(50),
month_name varchar(3),
units_earned integer,
PRIMARY KEY(id)
);
INSERT INTO staff_units VALUES (1, 101, 'john', 'jan', 32);
INSERT INTO staff_units VALUES (2, 101, 'john', 'jan', 33);
INSERT INTO staff_units VALUES (3, 102, 'jane', 'jan', 39);
INSERT INTO staff_units VALUES (4, 101, 'john', 'feb', 28);
INSERT INTO staff_units VALUES (5, 102, 'jane', 'feb', 28);
INSERT INTO staff_units VALUES (6, 101, 'john', 'mar', 39);
INSERT INTO staff_units VALUES (7, 101, 'john', 'mar', 34);
INSERT INTO staff_units VALUES (8, 102, 'jane', 'mar', 40);
INSERT INTO staff_units VALUES (9, 101, 'john', 'mar', 36);
INSERT INTO staff_units VALUES (10, 101, 'john', 'apr', 18);
INSERT INTO staff_units VALUES (11, 101, 'john', 'may', 32);
INSERT INTO staff_units VALUES (12, 102, 'jane', 'jun', 31);
INSERT INTO staff_units VALUES (13, 102, 'jane', 'jun', 28);
INSERT INTO staff_units VALUES (14, 102, 'jane', 'jun', 32);
Using the window function lead() you can refer to the two records following the current record and then compare the three against your desired pattern.
with staff_units_with_leading as (
select id, staff_number, first_name, units_earned,
lead(units_earned) over w units_earned_off1, -- units_earned from record with offset 1
lead(units_earned, 2) over w units_earned_off2, -- units_earned from record with offset 2
lead(id) over w id_off1, -- id from record with offset 1
lead(id, 2) over w id_off2 -- id from record with offset 2
from staff_units
window w as (partition by first_name order by id)
)
, ids_wanted as (
select unnest(array[id, id_off1, id_off2]) id --
from staff_units_with_leading
where
id_off1 is not null -- Discard records with no two leading records
and id_off2 is not null -- Discard records with no two leading records
and units_earned > 30 -- Match desired pattern
and units_earned_off1 < 30 -- Match desired pattern
and units_earned_off2 > 30 -- Match desired pattern
)
select * from staff_units
where id in (select id from ids_wanted)
order by staff_number, id;
To generate the trigrams, just get rid of the unnest:
with staff_units_with_leading as (
select id, staff_number, first_name, units_earned,
lead(units_earned) over w units_earned_off1, -- units_earned from record with offset 1
lead(units_earned, 2) over w units_earned_off2, -- units_earned from record with offset 2
lead(id) over w id_off1, -- id from record with offset 1
lead(id, 2) over w id_off2 -- id from record with offset 2
from staff_units
window w as (partition by first_name order by id)
)
select staff_number, array[id, id_off1, id_off2] id, array[units_earned , units_earned_off1 , units_earned_off2 ] units_earned --
from staff_units_with_leading
where
id_off1 is not null -- Discard records with no two leading records
and id_off2 is not null -- Discard records with no two leading records
and units_earned > 30 -- Match desired pattern
and units_earned_off1 < 30 -- Match desired pattern
and units_earned_off2 > 30 -- Match desired pattern
I took cachique's answer (with the excellent idea to use lead()) and reformatted and extended it to generate the 3-grams you originally wanted:
with staff_units_with_leading as (
select
id, staff_number, first_name, units_earned,
lead(units_earned) over w units_earned_off1, -- units_earned from record with offset 1
lead(units_earned, 2) over w units_earned_off2, -- units_earned from record with offset 2
lead(id) over w id_off1, -- id from record with offset 1
lead(id, 2) over w id_off2 -- id from record with offset 2
from staff_units
window w as (partition by staff_number order by id)
), ids_wanted as (
select
id_off1, -- keep this to group 3-grams later
unnest(array[id, id_off1, id_off2]) id
from staff_units_with_leading
where
id_off1 is not null -- Discard records with no two leading records
and id_off2 is not null -- Discard records with no two leading records
and units_earned > 30 -- Match desired pattern
and units_earned_off1 < 30 -- Match desired pattern
and units_earned_off2 > 30 -- Match desired pattern
), res as (
select su.*, iw.id_off1
from staff_units su
join ids_wanted iw on su.id = iw.id
order by su.staff_number, su.id
)
select
staff_number,
array_agg(units_earned order by id) as values,
array_agg(id order by id) as ids
from res
group by staff_number, id_off1
order by 1
;
The result will be:
staff_number | values | ids
--------------+------------+------------
101 | {33,28,39} | {2,4,6}
101 | {36,18,32} | {9,10,11}
102 | {39,28,40} | {3,5,8}
102 | {31,28,32} | {12,13,14}
(4 rows)
The problem you're trying to solve is a bit complicated. It is probably easier to solve if you use PL/pgSQL and play with integer arrays inside a PL/pgSQL function, or perhaps with JSON/JSONB.
But it can also be solved in plain SQL, although such SQL is pretty advanced.
with rows_numbered as (
select
*, row_number() over (partition by staff_number order by id) as row_num
from staff_units
order by staff_number
), sequences (staff_number, seq) as (
select
staff_number,
json_agg(json_build_object('row_num', row_num, 'id', id, 'units_earned', units_earned) order by id)
from rows_numbered
group by 1
)
select
s1.staff_number,
(s1.chunk->>'id')::int as id1,
(s2.chunk->>'id')::int as id2,
(s3.chunk->>'id')::int as id3
from (select staff_number, json_array_elements(seq) as chunk from sequences) as s1
, lateral (
select *
from (select staff_number, json_array_elements(seq) as chunk from sequences) _
where
(s1.chunk->>'row_num')::int + 1 = (_.chunk->>'row_num')::int
and (_.chunk->>'units_earned')::int < 30
and s1.staff_number = _.staff_number
) as s2
, lateral (
select *
from (select staff_number, json_array_elements(seq) as chunk from sequences) _
where
(s2.chunk->>'row_num')::int + 1 = (_.chunk->>'row_num')::int
and (_.chunk->>'units_earned')::int > 30
and s2.staff_number = _.staff_number
) as s3
where (s1.chunk->>'units_earned')::int > 30
order by 1, 2;
I used several advanced SQL features:
CTE
JSON
LATERAL
window functions.

Check duplicates in sql table and replace the duplicates ID in another table

I have a table with duplicate entries (I forgot to make the NAME column unique).
So I now have this duplicate entry table called 'table 1':
ID NAME
1 John F Smith
2 Sam G Davies
3 Tom W Mack
4 Bob W E Jone
5 Tom W Mack
i.e. IDs 3 and 5 are duplicates.
Table 2
ID NAMEID ORDERS
1 2 item4
2 1 item5
3 4 item6
4 3 item23
5 5 item34
NAMEID values are IDs from table 1. For table 2 IDs 4 and 5, I want them to have a NAMEID of 3 (Tom W Mack's orders), like so:
Table 2 (correct version)
ID NAMEID ORDERS
1 2 item4
2 1 item5
3 4 item6
4 3 item23
5 3 item34
Is there an easy way to find and update the duplicate NAMEIDs in table 2 and then remove the duplicates from table 1?
In this case, the first thing to do is find out how many duplicate records you have. Group by NAME only (ID is unique, so grouping by it would never show duplicates) and keep the names with a count greater than 1:
SELECT NAME, COUNT(*) AS CNT FROM TABLE1 GROUP BY NAME HAVING COUNT(*) > 1
This will give you the count and all the duplicate records, so you can delete them manually.
Don't forget to alter your table after removing all the duplicate records.
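For that "alter your table" step, once the duplicates are removed you could add a unique constraint so they cannot reappear. A minimal sketch, assuming your table and column are actually named TABLE1 and NAME:
ALTER TABLE TABLE1 ADD CONSTRAINT uq_table1_name UNIQUE (NAME);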
Here's how you can do it:
-- set up the environment
create table #t (ID int, NAME varchar(50))
insert #t values
(1, 'John F Smith'),
(2, 'Sam G Davies'),
(3, 'Tom W Mack'),
(4, 'Bob W E Jone'),
(5, 'Tom W Mack')
create table #t2 (ID int, NAMEID int, ORDERS varchar(10))
insert #t2 values
(1, 2, 'item4'),
(2, 1, 'item5'),
(3, 4, 'item6'),
(4, 3, 'item23'),
(5, 5, 'item34')
go
-- update the referencing table first
;with x as (
select id,
first_value(id) over(partition by name order by id) replace_with
from #t
),
y as (
select #t2.nameid, x.replace_with
FROM #t2
join x on #t2.nameid = x.id
where #t2.nameid <> x.replace_with
)
update y set nameid = replace_with
-- delete duplicates from referenced table
;with x as (
select *, row_number() over(partition by name order by id) rn
from #t
)
delete x where rn > 1
select * from #t
select * from #t2
Please test first for performance and validity.
Let's use the example data
INSERT INTO TableA
(`ID`, `NAME`)
VALUES
(1, 'NameA'),
(2, 'NameB'),
(3, 'NameA'),
(4, 'NameC'),
(5, 'NameB'),
(6, 'NameD')
and
INSERT INTO TableB
(`ID`, `NAMEID`, `ORDERS`)
VALUES
(1, 2, 'itemB1'),
(2, 1, 'itemA1'),
(3, 4, 'itemC1'),
(4, 3, 'itemA2'),
(5, 5, 'itemB2'),
(5, 6, 'itemD1')
(makes it a bit easier to spot the duplicates and check the result)
Let's start with a simple query to get the smallest ID for a given NAME
SELECT
NAME, min(ID)
FROM
tableA
GROUP BY
NAME
And the result is [NameA,1], [NameB,2], [NameC,4], [NameD,6]
Now if you use that as an uncorrelated subquery for a JOIN with the base table, like this:
SELECT
keep.kid, dup.id
FROM
tableA as dup
JOIN
(
SELECT
NAME, min(ID) as kid
FROM
tableA
GROUP BY
NAME
) as keep
ON
keep.NAME=dup.NAME
AND keep.kid<dup.id
It finds all duplicates that have the same name as in the result of the subquery but a different id, and it also gives you the id of the "original", i.e. the smallest id for that name.
For the example it's [1,3], [2,5]
Now you can use that in an UPDATE query like
UPDATE
TableB as b
JOIN
tableA as dup
JOIN
(
SELECT
NAME, min(ID) as kid
FROM
tableA
GROUP BY
NAME
) as keep
ON
keep.NAME=dup.NAME
AND keep.kid<dup.id
SET
b.NAMEID=keep.kid
WHERE
b.NAMEID=dup.id
And the result is
ID,NAMEID,ORDERS
1, 2, itemB1
2, 1, itemA1
3, 4, itemC1
4, 1, itemA2 <- now has NAMEID=1
5, 2, itemB2 <- now has NAMEID=2
5, 6, itemD1
To eliminate the duplicates from tableA you can use the first query again, for example:
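A sketch of that delete, reusing the keep/dup join from above (this assumes MySQL's multi-table DELETE syntax, in line with the backticks in the INSERT statements, and that the NAMEID references in TableB have already been updated):
DELETE dup
FROM TableA AS dup
JOIN (
    SELECT NAME, MIN(ID) AS kid
    FROM TableA
    GROUP BY NAME
) AS keep
  ON keep.NAME = dup.NAME
 AND keep.kid < dup.ID;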

query to count number of unique relations

I have 3 tables:
t_user (id, name)
t_user_deal (id, user_id, deal_id)
t_deal (id, title)
Multiple users can be linked to the same deal. (I'm using Oracle, but it should be similar; I can adapt it.)
How can I get all the users (by name) with the number of unique users each one has made a deal with?
Let me explain with some data:
t_user:
id, name
1, joe
2, mike
3, John
t_deal:
id, title
1, deal number 1
2, deal number 2
t_user_deal:
id, user_id, deal_id
1, 1, 1
2, 2, 1
3, 1, 2
4, 3, 2
the result I expect:
user_name, number of unique users they made a deal with
Joe, 2
Mike, 1
John, 1
I've tried this but I didn't get the expected result:
SELECT tu.name,
count(tu.id) AS nbRelations
FROM t_user tu
INNER JOIN t_user_deal tud ON tu.id = tud.user_id
INNER JOIN t_deal td ON tud.deal_id = td.id
WHERE
(
td.id IN
(
SELECT DISTINCT td.id
FROM t_user_deal tud2
INNER JOIN t_deal td2 ON tud2.deal_id = td2.id
WHERE tud.id <> tud2.user_id
)
)
GROUP BY tu.id
ORDER BY nbRelations DESC
Thanks for your help.
This should get you the result:
SELECT id1, count(id2), name
  FROM (SELECT DISTINCT tud1.user_id id1, tud2.user_id id2
          FROM t_user_deal tud1, t_user_deal tud2
         WHERE tud1.deal_id = tud2.deal_id
           AND tud1.user_id <> tud2.user_id) tab,
       t_user tu
 WHERE tu.id = id1
 GROUP BY id1, name
Something like:
select name, NVL(i.ud, 0) ud
  from t_user
  join (select user_id, count(*) ud
          from t_user_deal
         group by user_id) i
    on t_user.id = i.user_id
 where i.ud > 0
Unless I'm missing something here, it actually sounds like your question references having a second user in the t_user_deal table. The model you've described here doesn't include that.
PostgreSQL example:
create table t_user (id int, name varchar(255)) ;
create table t_deal (id int, title varchar(255)) ;
create table t_user_deal (id int, user_id int, deal_id int) ;
insert into t_user values (1, 'joe'), (2, 'mike'), (3, 'john') ;
insert into t_deal values (1, 'deal 1'), (2, 'deal 2') ;
insert into t_user_deal values (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 3, 2) ;
And the query:
SELECT
name, COUNT(DISTINCT deal_id)
FROM
t_user INNER JOIN t_user_deal ON (t_user.id = t_user_deal.user_id)
GROUP BY
user_id, name ;
The DISTINCT (in the COUNT(), that is) might not be necessary; it depends on how clean your data is (e.g., no duplicate rows).
Here's the result in PostgreSQL:
name | count
------+-------
joe | 2
mike | 1
john | 1
(3 rows)