I have a database with entries recording the units earned by staff. I'm trying to find a query that selects the entries where a staff member's units_earned follow this pattern: >30, then <30, then >30.
In this SQL Fiddle, I would expect the query to return:
For John, Rows:
2, 4, 6
9, 10, 11
For Jane, Rows:
3, 5, 8
12, 13, 14
Here is the relevant SQL:
CREATE TABLE staff_units(
id integer,
staff_number integer,
first_name varchar(50),
month_name varchar(3),
units_earned integer,
PRIMARY KEY(id)
);
INSERT INTO staff_units VALUES (1, 101, 'john', 'jan', 32);
INSERT INTO staff_units VALUES (2, 101, 'john', 'jan', 33);
INSERT INTO staff_units VALUES (3, 102, 'jane', 'jan', 39);
INSERT INTO staff_units VALUES (4, 101, 'john', 'feb', 28);
INSERT INTO staff_units VALUES (5, 102, 'jane', 'feb', 28);
INSERT INTO staff_units VALUES (6, 101, 'john', 'mar', 39);
INSERT INTO staff_units VALUES (7, 101, 'john', 'mar', 34);
INSERT INTO staff_units VALUES (8, 102, 'jane', 'mar', 40);
INSERT INTO staff_units VALUES (9, 101, 'john', 'mar', 36);
INSERT INTO staff_units VALUES (10, 101, 'john', 'apr', 18);
INSERT INTO staff_units VALUES (11, 101, 'john', 'may', 32);
INSERT INTO staff_units VALUES (12, 102, 'jane', 'jun', 31);
INSERT INTO staff_units VALUES (13, 102, 'jane', 'jun', 28);
INSERT INTO staff_units VALUES (14, 102, 'jane', 'jun', 32);
Using the window function lead(), you can refer to the two records that follow the current record and then compare all three against your desired pattern.
with staff_units_with_leading as (
select id, staff_number, first_name, units_earned,
lead(units_earned) over w units_earned_off1, -- units_earned from record with offset 1
lead(units_earned, 2) over w units_earned_off2, -- units_earned from record with offset 2
lead(id) over w id_off1, -- id from record with offset 1
lead(id, 2) over w id_off2 -- id from record with offset 2
from staff_units
window w as (partition by first_name order by id)
)
, ids_wanted as (
select unnest(array[id, id_off1, id_off2]) id -- one output row per id in the match
from staff_units_with_leading
where
id_off1 is not null -- Discard records with no two leading records
and id_off2 is not null -- Discard records with no two leading records
and units_earned > 30 -- Match desired pattern
and units_earned_off1 < 30 -- Match desired pattern
and units_earned_off2 > 30 -- Match desired pattern
)
select * from staff_units
where id in (select id from ids_wanted)
order by staff_number, id;
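A minimal sketch of the same lead()-based check, run on SQLite through Python's built-in sqlite3 module so it can be tried locally without PostgreSQL. It assumes an SQLite build with window functions (3.25 or later), partitions by staff_number instead of first_name (equivalent for this data), and uses a hypothetical helper name, find_pattern_trigrams. The final set comprehension plays the role of unnest().

```python
import sqlite3

# The question's sample rows: (id, staff_number, first_name, month_name, units_earned)
ROWS = [
    (1, 101, 'john', 'jan', 32), (2, 101, 'john', 'jan', 33),
    (3, 102, 'jane', 'jan', 39), (4, 101, 'john', 'feb', 28),
    (5, 102, 'jane', 'feb', 28), (6, 101, 'john', 'mar', 39),
    (7, 101, 'john', 'mar', 34), (8, 102, 'jane', 'mar', 40),
    (9, 101, 'john', 'mar', 36), (10, 101, 'john', 'apr', 18),
    (11, 101, 'john', 'may', 32), (12, 102, 'jane', 'jun', 31),
    (13, 102, 'jane', 'jun', 28), (14, 102, 'jane', 'jun', 32),
]

QUERY = """
WITH w AS (
    SELECT id, staff_number, units_earned,
           lead(units_earned, 1) OVER (PARTITION BY staff_number ORDER BY id) AS u1,
           lead(units_earned, 2) OVER (PARTITION BY staff_number ORDER BY id) AS u2,
           lead(id, 1) OVER (PARTITION BY staff_number ORDER BY id) AS id1,
           lead(id, 2) OVER (PARTITION BY staff_number ORDER BY id) AS id2
    FROM staff_units
)
SELECT staff_number, id, id1, id2
FROM w
-- comparisons with NULL are never true, so rows without two successors drop out
WHERE units_earned > 30 AND u1 < 30 AND u2 > 30
ORDER BY staff_number, id
"""


def find_pattern_trigrams(conn):
    """Return (staff_number, id, id1, id2) for every >30, <30, >30 run."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff_units (id INTEGER PRIMARY KEY, staff_number INTEGER, "
             "first_name TEXT, month_name TEXT, units_earned INTEGER)")
conn.executemany("INSERT INTO staff_units VALUES (?, ?, ?, ?, ?)", ROWS)

trigrams = find_pattern_trigrams(conn)
# Flatten to the individual ids, like the unnest() version
ids = sorted({i for t in trigrams for i in t[1:]})
print(trigrams)
print(ids)
```

Keeping the matches as trigrams first and flattening afterwards makes it easy to switch between the two output shapes discussed in this thread.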
To generate trigrams (one row per match instead of one row per id), just get rid of the unnest:
with staff_units_with_leading as (
select id, staff_number, first_name, units_earned,
lead(units_earned) over w units_earned_off1, -- units_earned from record with offset 1
lead(units_earned, 2) over w units_earned_off2, -- units_earned from record with offset 2
lead(id) over w id_off1, -- id from record with offset 1
lead(id, 2) over w id_off2 -- id from record with offset 2
from staff_units
window w as (partition by first_name order by id)
)
select staff_number, array[id, id_off1, id_off2] id, array[units_earned, units_earned_off1, units_earned_off2] units_earned
from staff_units_with_leading
where
id_off1 is not null -- Discard records with no two leading records
and id_off2 is not null -- Discard records with no two leading records
and units_earned > 30 -- Match desired pattern
and units_earned_off1 < 30 -- Match desired pattern
and units_earned_off2 > 30 -- Match desired pattern
I took cachique's answer (with its excellent idea of using lead()), reformatted it, and extended it to generate 3-grams as you originally wanted:
with staff_units_with_leading as (
select
id, staff_number, first_name, units_earned,
lead(units_earned) over w units_earned_off1, -- units_earned from record with offset 1
lead(units_earned, 2) over w units_earned_off2, -- units_earned from record with offset 2
lead(id) over w id_off1, -- id from record with offset 1
lead(id, 2) over w id_off2 -- id from record with offset 2
from staff_units
window w as (partition by staff_number order by id)
), ids_wanted as (
select
id_off1, -- keep this to group 3-grams later
unnest(array[id, id_off1, id_off2]) id
from staff_units_with_leading
where
id_off1 is not null -- Discard records with no two leading records
and id_off2 is not null -- Discard records with no two leading records
and units_earned > 30 -- Match desired pattern
and units_earned_off1 < 30 -- Match desired pattern
and units_earned_off2 > 30 -- Match desired pattern
), res as (
select su.*, iw.id_off1
from staff_units su
join ids_wanted iw on su.id = iw.id
order by su.staff_number, su.id
)
select
staff_number,
array_agg(units_earned order by id) as values,
array_agg(id order by id) as ids
from res
group by staff_number, id_off1
order by 1
;
The result will be:
staff_number | values | ids
--------------+------------+------------
101 | {33,28,39} | {2,4,6}
101 | {36,18,32} | {9,10,11}
102 | {39,28,40} | {3,5,8}
102 | {31,28,32} | {12,13,14}
(4 rows)
The problem you're trying to solve is a bit complicated. It is probably easier to solve with PL/pgSQL, working with integer arrays inside a PL/pgSQL function, or perhaps with JSON/JSONB.
It can also be solved in plain SQL, but such SQL is fairly advanced:
with rows_numbered as (
select
*, row_number() over (partition by staff_number order by id) as row_num
from staff_units
order by staff_number
), sequences (staff_number, seq) as (
select
staff_number,
json_agg(json_build_object('row_num', row_num, 'id', id, 'units_earned', units_earned) order by id)
from rows_numbered
group by 1
)
select
s1.staff_number,
(s1.chunk->>'id')::int as id1,
(s2.chunk->>'id')::int as id2,
(s3.chunk->>'id')::int as id3
from (select staff_number, json_array_elements(seq) as chunk from sequences) as s1
, lateral (
select *
from (select staff_number, json_array_elements(seq) as chunk from sequences) _
where
(s1.chunk->>'row_num')::int + 1 = (_.chunk->>'row_num')::int
and (_.chunk->>'units_earned')::int < 30
and s1.staff_number = _.staff_number
) as s2
, lateral (
select *
from (select staff_number, json_array_elements(seq) as chunk from sequences) _
where
(s2.chunk->>'row_num')::int + 1 = (_.chunk->>'row_num')::int
and (_.chunk->>'units_earned')::int > 30
and s2.staff_number = _.staff_number
) as s3
where (s1.chunk->>'units_earned')::int > 30
order by 1, 2;
I used several advanced SQL features:
CTE
JSON
LATERAL
window functions.
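The JSON and LATERAL steps are one way to reach neighbouring rows. A plainer variant of the same row-numbering idea is a self-join on row_num. This is a minimal sketch in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later), with the table trimmed to the columns the query uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff_units (id INTEGER PRIMARY KEY, staff_number INTEGER, units_earned INTEGER)")
conn.executemany("INSERT INTO staff_units VALUES (?, ?, ?)", [
    (1, 101, 32), (2, 101, 33), (3, 102, 39), (4, 101, 28), (5, 102, 28),
    (6, 101, 39), (7, 101, 34), (8, 102, 40), (9, 101, 36), (10, 101, 18),
    (11, 101, 32), (12, 102, 31), (13, 102, 28), (14, 102, 32),
])

rows = conn.execute("""
WITH numbered AS (
    -- consecutive numbering per staff member, like rows_numbered above
    SELECT id, staff_number, units_earned,
           row_number() OVER (PARTITION BY staff_number ORDER BY id) AS row_num
    FROM staff_units
)
SELECT s1.staff_number, s1.id AS id1, s2.id AS id2, s3.id AS id3
FROM numbered s1
JOIN numbered s2 ON s2.staff_number = s1.staff_number AND s2.row_num = s1.row_num + 1
JOIN numbered s3 ON s3.staff_number = s1.staff_number AND s3.row_num = s1.row_num + 2
WHERE s1.units_earned > 30 AND s2.units_earned < 30 AND s3.units_earned > 30
ORDER BY s1.staff_number, s1.id
""").fetchall()

for staff_number, id1, id2, id3 in rows:
    print(staff_number, id1, id2, id3)
```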
This is the data I have
I need one row per unique ID, with max(Price). So the output would be:
I have tried the following
select * from table a
join (select b.id,max(b.price) from table b
group by b.id) c on c.id=a.id;
This gives the same table as in the question as output, because there is no unique key. I also tried another WHERE condition, which returns the original table.
You could try something like this in SQL Server:
Table
create table ex1 (
id int,
item char(1),
price int,
qty int,
usr char(2)
);
Data
insert into ex1 values
(1, 'a', 7, 1, 'ab'),
(1, 'a', 7, 2, 'ac'),
(2, 'b', 6, 1, 'ab'),
(2, 'b', 6, 1, 'av'),
(2, 'b', 5, 1, 'ab'),
(3, 'c', 5, 2, 'ab'),
(4, 'd', 4, 2, 'ac'),
(4, 'd', 3, 1, 'av');
Query
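select a.* from ex1 a
join (
select p.id, p.maxprice, min(e.usr) as minuser
from (select id, max(price) as maxprice from ex1 group by id) p
join ex1 e on e.id = p.id and e.price = p.maxprice -- min user among max-price rows only
group by p.id, p.maxprice
) c
on c.id = a.id
and a.price = c.maxprice
and a.usr = c.minuser
order by a.id, a.usr;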
Result
id item price qty usr
1 a 7 1 ab
2 b 6 1 ab
3 c 5 2 ab
4 d 4 2 ac
Explanation
In your dataset, ID 1 has 2 records with the same price, so you have to decide which one you want. In the example above, I show a single record: the one whose user name is lowest alphabetically.
Alternate method
SQL Server also has the ranking function ROW_NUMBER() OVER(), which can be used as well:
select * from (
select row_number() over( partition by id order by id, price desc, usr) as sr, *
from ex1
) c where sr = 1;
The subquery returns all records from the table and gives each row a serial number, starting at 1 for each ID. The rows are sorted by ID first, then by price descending, then by usr. The outer query picks out the records with sr = 1.
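To try the ROW_NUMBER() approach without a SQL Server instance, here is a minimal sketch of the same query shape on SQLite (3.25+ for window functions) through Python's sqlite3. It selects ex1.* instead of a bare * and drops id from the ORDER BY, since ordering by id inside a partition by id has no effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ex1 (id INTEGER, item TEXT, price INTEGER, qty INTEGER, usr TEXT)")
conn.executemany("INSERT INTO ex1 VALUES (?, ?, ?, ?, ?)", [
    (1, 'a', 7, 1, 'ab'), (1, 'a', 7, 2, 'ac'),
    (2, 'b', 6, 1, 'ab'), (2, 'b', 6, 1, 'av'), (2, 'b', 5, 1, 'ab'),
    (3, 'c', 5, 2, 'ab'),
    (4, 'd', 4, 2, 'ac'), (4, 'd', 3, 1, 'av'),
])

# Number rows within each id: highest price first, ties broken by usr; keep row 1.
best = conn.execute("""
SELECT id, item, price, qty, usr
FROM (
    SELECT ex1.*,
           row_number() OVER (PARTITION BY id ORDER BY price DESC, usr) AS sr
    FROM ex1
) AS c
WHERE sr = 1
ORDER BY id
""").fetchall()

for row in best:
    print(*row)
```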
Example here: https://rextester.com/KZCZ25396
Currently, I have 3 affiliations hard-coded in a query. They serve as a hierarchy: 1 = Faculty, 2 = Staff, 3 = Student. If a user from the affiliations_tbl table has more than one affiliation (example: a Staff member who is also a Student), the query uses their Staff affiliation, since it is higher in the hierarchy defined with the PARTITION BY and DECODE().
SELECT x2.emplid,
scc_afl_code
FROM (SELECT x.emplid,
scc_afl_code,
row_number() over(partition BY x.emplid ORDER BY x.affil_order) r
FROM (SELECT t.emplid,
scc_afl_code,
DECODE(scc_afl_code,
'FACULTY',
1,
'STAFF',
2,
'STUDENT',
3,
999) affil_order
FROM affiliations_tbl t
WHERE t.scc_afl_code IN
(SELECT a.scc_afl_code
FROM affiliation_groups_tbl a
WHERE a.group = 'COLLEGE')) x) x2
WHERE x2.r = 1;
I have created a table, affiliation_groups_tbl, to store affiliation groups, so I can scale this by adding data to the table rather than changing the hard-coded values in this query. Example: instead of adding 'CONSULTANT', 4 to the DECODE() list, I would add a row to the table, so I wouldn't have to modify the SQL.
scc_afl_code | group | group_name | sort_order
-------------+---------+------------+-----------
FACULTY | COLLEGE | Faculty | 1
STAFF | COLLEGE | Staff | 2
STUDENT | COLLEGE | Student | 3
I've already updated the latter half of the query to select only the scc_afl_code values in the COLLEGE group. How can I properly update the first part of the query to use the table as a hierarchy?
Try using the following expression instead of DECODE in the SELECT clause of your statement:
coalesce((
select g.sort_order
from affiliation_groups_tbl g
where g.scc_afl_code = t.scc_afl_code ), 999)
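A minimal sketch of the whole query with DECODE replaced by that lookup expression, keeping the question's nesting. It runs on SQLite through Python's sqlite3 purely for convenience (this is not Oracle). The affiliations_tbl rows and emplid values are hypothetical sample data, and "group" is quoted because GROUP is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE affiliation_groups_tbl (
    scc_afl_code TEXT, "group" TEXT, group_name TEXT, sort_order INTEGER);
INSERT INTO affiliation_groups_tbl VALUES
    ('FACULTY', 'COLLEGE', 'Faculty', 1),
    ('STAFF',   'COLLEGE', 'Staff',   2),
    ('STUDENT', 'COLLEGE', 'Student', 3);

-- Hypothetical sample users: E1 is both Staff and Student.
CREATE TABLE affiliations_tbl (emplid TEXT, scc_afl_code TEXT);
INSERT INTO affiliations_tbl VALUES
    ('E1', 'STUDENT'), ('E1', 'STAFF'), ('E2', 'STUDENT'), ('E3', 'FACULTY');
""")

top = conn.execute("""
SELECT x2.emplid, x2.scc_afl_code
FROM (
    SELECT x.emplid, x.scc_afl_code,
           row_number() OVER (PARTITION BY x.emplid ORDER BY x.affil_order) AS r
    FROM (
        SELECT t.emplid, t.scc_afl_code,
               -- hierarchy position from the lookup table; unknown codes sort last
               coalesce((SELECT g.sort_order
                         FROM affiliation_groups_tbl g
                         WHERE g.scc_afl_code = t.scc_afl_code), 999) AS affil_order
        FROM affiliations_tbl t
        WHERE t.scc_afl_code IN (SELECT a.scc_afl_code
                                 FROM affiliation_groups_tbl a
                                 WHERE a."group" = 'COLLEGE')
    ) AS x
) AS x2
WHERE x2.r = 1
ORDER BY x2.emplid
""").fetchall()

print(top)
```

Adding a 'CONSULTANT' row to affiliation_groups_tbl would then change the ordering without touching the query text.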
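You can try something like this. The query below compares the hard-coded DECODE ordering with an ordering taken from a lookup table (dictionary) through a LEFT JOIN and NVL; because both produce the same numbering, the MINUS returns no rows: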
create table dictionary
(id number,
code varchar2(32),
name varchar2(32),
sort number);
insert into dictionary (id, code, name, sort) values (16, 'B', 'B name', 1);
insert into dictionary (id, code, name, sort) values (23, 'A', 'A name', 2);
insert into dictionary (id, code, name, sort) values (15, 'C', 'C name', 4);
insert into dictionary (id, code, name, sort) values (22, 'D', 'D name', 3);
select partition,
string,
decode(string, 'B', 1, 'A', 2, 'D', 3, 'C', 4, 999) decode,
row_number() over(partition by partition order by decode(string, 'B', 1, 'A', 2, 'D', 3, 'C', 4, 999)) ordering
from (select mod(level, 3) partition, chr(65 + mod(level, 5)) string
from dual
connect by level <= 8)
minus
-- Alternate --
select partition,
string,
nvl(t.sort, 999) nvl,
row_number() over(partition by partition order by nvl(t.sort, 999)) ordering
from (select mod(level, 3) partition, chr(65 + mod(level, 5)) string
from dual
connect by level <= 8) r
left join dictionary t
on t.code = r.string;
I'm writing a query that has to select some information from the table below:
ID ID-Toner Quantity Location Order_date Send_date
1 2 1 55 20.01.2015 26.01.2015
2 2 1 41 22.02.2015 26.02.2015
3 2 1 35 23.02.2015 26.02.2015
4 5 1 77 25.02.2015 25.02.2015
5 2 1 55 25.02.2015 26.02.2015
I need to select all columns plus an additional column with the number of days between two dates: the Order_date and the previous Order_date for the same location (e.g. 55).
Sample result should look like:
ID ID-Toner Quantity Location Order_date Send_date Number_of_days
1 2 1 55 20.01.2015 26.01.2015 0
5 2 1 55 25.02.2015 26.02.2015 36
How can I write such a query?
Updated after clarifications from the OP.
The approach is a kind of ranking: the table's rows are numbered in succession, and each row is then joined to the row numbered immediately before it.
In our case, the order is given by location and then by order date.
This is a fairly cross-DBMS solution (the date fields are assumed to be of a date/datetime type, DATEDIFF is a MySQL function, and the row numbering uses MySQL user variables), so I think you can adapt it to your DBMS quite easily.
You can try the sql on Sql Fiddle at http://sqlfiddle.com/#!9/290e9
Table
CREATE TABLE Orders
(`ID` int, `IDToner` int, `Quantity` int, `Location` int, `Order_date` Date, `Send_date` Date)
;
INSERT INTO Orders
(`ID`, `IDToner`, `Quantity`, `Location`, `Order_date`, `Send_date`)
VALUES
(1, 2, 1, 55, STR_TO_DATE('20.01.2015','%d.%m.%Y'), STR_TO_DATE('26.01.2015','%d.%m.%Y')),
(2, 2, 1, 41, STR_TO_DATE('22.02.2015','%d.%m.%Y'), STR_TO_DATE('26.02.2015','%d.%m.%Y')),
(3, 2, 1, 35, STR_TO_DATE('23.02.2015','%d.%m.%Y'), STR_TO_DATE('26.02.2015','%d.%m.%Y')),
(4, 5, 1, 77, STR_TO_DATE('25.02.2015','%d.%m.%Y'), STR_TO_DATE('25.02.2015','%d.%m.%Y')),
(5, 5, 1, 77, STR_TO_DATE('25.04.2015','%d.%m.%Y'), STR_TO_DATE('25.04.2015','%d.%m.%Y')),
(6, 5, 1, 77, STR_TO_DATE('25.06.2015','%d.%m.%Y'), STR_TO_DATE('25.06.2015','%d.%m.%Y')),
(7, 5, 1, 77, STR_TO_DATE('25.08.2015','%d.%m.%Y'), STR_TO_DATE('25.08.2015','%d.%m.%Y')),
(8, 2, 1, 55, STR_TO_DATE('25.02.2015','%d.%m.%Y'), STR_TO_DATE('26.02.2015','%d.%m.%Y'))
;
Query
SELECT
ID,
ID_Toner,
Quantity,
Location,
Order_date,
Send_date,
days_from_previous_order
FROM(
SELECT
current_ID AS ID,
current_IDToner AS ID_Toner,
current_Quantity AS Quantity,
current_Location AS Location,
current_Send_Date AS Send_date,
current_Order_Date AS Order_date,
previous_Order_Date,
COALESCE(DATEDIFF(current_Order_Date, previous_Order_Date),0) AS days_from_previous_order
FROM(
SELECT
TabOrdersRanking_currents.ID AS current_ID,
TabOrdersRanking_currents.IDToner AS current_IDToner,
TabOrdersRanking_currents.Quantity AS current_Quantity,
TabOrdersRanking_currents.Location AS current_Location,
TabOrdersRanking_currents.Send_Date AS current_Send_Date,
TabOrdersRanking_currents.Order_Date AS current_Order_Date,
TabOrdersRanking_previous.Order_Date AS previous_Order_Date
FROM(
-- @ user variables number the rows; rnk avoids RANK, a reserved word in MySQL 8.0
SELECT Orders.*, @rank1 := @rank1 + 1 rnk
FROM Orders
,(Select @rank1 := 0) r1
order by location, order_date
) TabOrdersRanking_currents
LEFT JOIN(
SELECT Orders.*, @rank2 := @rank2 + 1 rnk
FROM Orders
,(Select @rank2 := 0) r2
order by location, order_date
) TabOrdersRanking_previous
on TabOrdersRanking_currents.Location = TabOrdersRanking_previous.Location
and TabOrdersRanking_currents.rnk - TabOrdersRanking_previous.rnk = 1
) TabOrdersSuccessionRanking
) TabWithDaysFromPrevious;
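On databases with window functions (for example MySQL 8.0 or SQLite 3.25+), LAG() can replace the two ranked copies of the table. A minimal sketch on SQLite via Python's sqlite3, with dates stored as ISO strings and julianday() standing in for MySQL's DATEDIFF (an SQLite-specific substitution); only the columns the calculation needs are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (ID INTEGER, Location INTEGER, Order_date TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1, 55, '2015-01-20'), (2, 41, '2015-02-22'), (3, 35, '2015-02-23'),
    (4, 77, '2015-02-25'), (5, 77, '2015-04-25'), (6, 77, '2015-06-25'),
    (7, 77, '2015-08-25'), (8, 55, '2015-02-25'),
])

rows = conn.execute("""
SELECT ID, Location, Order_date,
       -- days since the previous order at the same location; 0 for the first one
       CAST(coalesce(julianday(Order_date)
                     - julianday(lag(Order_date) OVER (PARTITION BY Location
                                                       ORDER BY Order_date)),
                     0) AS INTEGER) AS days_from_previous_order
FROM Orders
ORDER BY Location, Order_date
""").fetchall()

for row in rows:
    print(*row)

days = {row[0]: row[3] for row in rows}  # ID -> days_from_previous_order
```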
I have a table with duplicate entries (I forgot to make the NAME column unique).
So I now have this table with duplicate entries, called 'table 1':
ID NAME
1 John F Smith
2 Sam G Davies
3 Tom W Mack
4 Bob W E Jone
5 Tom W Mack
i.e. IDs 3 and 5 are duplicates.
Table 2
ID NAMEID ORDERS
1 2 item4
2 1 item5
3 4 item6
4 3 item23
5 5 item34
NAMEID values are IDs from table 1. I want the table 2 rows with ID 4 and 5 to have a NAMEID of 3 (Tom W Mack's orders), like so:
Table 2 (correct version)
ID NAMEID ORDERS
1 2 item4
2 1 item5
3 4 item6
4 3 item23
5 3 item34
Is there an easy way to find and update the duplicate NAMEIDs in table 2, then remove the duplicates from table 1?
In this case, you can first find out how many duplicate records you have.
To find the duplicate records, you can use:
SELECT NAME, COUNT(*) AS CNT FROM TABLE1 GROUP BY NAME HAVING COUNT(*) > 1
This gives you the count for each duplicated name, so you can find all the duplicate records
and delete them manually.
Don't forget to alter your table (e.g. add a unique constraint on NAME) after removing all the duplicate records.
Here's how you can do it:
-- set up the environment
create table #t (ID int, NAME varchar(50))
insert #t values
(1, 'John F Smith'),
(2, 'Sam G Davies'),
(3, 'Tom W Mack'),
(4, 'Bob W E Jone'),
(5, 'Tom W Mack')
create table #t2 (ID int, NAMEID int, ORDERS varchar(10))
insert #t2 values
(1, 2, 'item4'),
(2, 1, 'item5'),
(3, 4, 'item6'),
(4, 3, 'item23'),
(5, 5, 'item34')
go
-- update the referencing table first
;with x as (
select id,
first_value(id) over(partition by name order by id) replace_with
from #t
),
y as (
select #t2.nameid, x.replace_with
FROM #t2
join x on #t2.nameid = x.id
where #t2.nameid <> x.replace_with
)
update y set nameid = replace_with
-- delete duplicates from referenced table
;with x as (
select *, row_number() over(partition by name order by id) rn
from #t
)
delete x where rn > 1
select * from #t
select * from #t2
Please test it first for performance and correctness.
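The same two steps (repoint the referencing rows first, then delete the duplicates) can be sketched without temp tables or an updatable CTE. This is a minimal SQLite sketch through Python's sqlite3, using correlated subqueries in place of first_value() and assuming the lowest ID per NAME is the one to keep; the table names t1 and t2 are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ID INTEGER, NAME TEXT);
INSERT INTO t1 VALUES (1, 'John F Smith'), (2, 'Sam G Davies'),
    (3, 'Tom W Mack'), (4, 'Bob W E Jone'), (5, 'Tom W Mack');
CREATE TABLE t2 (ID INTEGER, NAMEID INTEGER, ORDERS TEXT);
INSERT INTO t2 VALUES (1, 2, 'item4'), (2, 1, 'item5'), (3, 4, 'item6'),
    (4, 3, 'item23'), (5, 5, 'item34');

-- 1. Point every order at the smallest ID that has the same NAME.
UPDATE t2
SET NAMEID = (SELECT MIN(k.ID)
              FROM t1 AS k JOIN t1 AS d ON d.NAME = k.NAME
              WHERE d.ID = t2.NAMEID)
WHERE NAMEID IN (SELECT ID FROM t1);  -- leave orphaned NAMEIDs untouched

-- 2. Delete every t1 row that is not the smallest ID for its NAME.
DELETE FROM t1 WHERE ID NOT IN (SELECT MIN(ID) FROM t1 GROUP BY NAME);
""")

orders = conn.execute("SELECT ID, NAMEID, ORDERS FROM t2 ORDER BY ID").fetchall()
names = conn.execute("SELECT ID, NAME FROM t1 ORDER BY ID").fetchall()
print(orders)
print(names)
```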
Let's use this example data:
INSERT INTO TableA
(`ID`, `NAME`)
VALUES
(1, 'NameA'),
(2, 'NameB'),
(3, 'NameA'),
(4, 'NameC'),
(5, 'NameB'),
(6, 'NameD')
and
INSERT INTO TableB
(`ID`, `NAMEID`, `ORDERS`)
VALUES
(1, 2, 'itemB1'),
(2, 1, 'itemA1'),
(3, 4, 'itemC1'),
(4, 3, 'itemA2'),
(5, 5, 'itemB2'),
(5, 6, 'itemD1')
(makes it a bit easier to spot the duplicates and check the result)
Let's start with a simple query to get the smallest ID for each NAME:
SELECT
NAME, min(ID)
FROM
tableA
GROUP BY
NAME
And the result is [NameA,1], [NameB,2], [NameC,4], [NameD,6]
Now, if you use that as an uncorrelated subquery in a JOIN with the base table, like this:
SELECT
keep.kid, dup.id
FROM
tableA as dup
JOIN
(
SELECT
NAME, min(ID) as kid
FROM
tableA
GROUP BY
NAME
) as keep
ON
keep.NAME=dup.NAME
AND keep.kid<dup.id
It finds all duplicates that have the same name as a row in the subquery result but a higher id, and it also gives you the id of the "original", i.e. the smallest id for that name.
For the example it's [1,3], [2,5]
Now you can use that in an UPDATE query like this:
UPDATE
TableB as b
JOIN
tableA as dup
JOIN
(
SELECT
NAME, min(ID) as kid
FROM
tableA
GROUP BY
NAME
) as keep
ON
keep.NAME=dup.NAME
AND keep.kid<dup.id
SET
b.NAMEID=keep.kid
WHERE
b.NAMEID=dup.id
And the result is
ID,NAMEID,ORDERS
1, 2, itemB1
2, 1, itemA1
3, 4, itemC1
4, 1, itemA2 <- now has NAMEID=1
5, 2, itemB2 <- now has NAMEID=2
5, 6, itemD1
To eliminate the duplicates from tableA, you can use the first query again.
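One way to finish the job, sketched on SQLite through Python's sqlite3: keep the smallest ID per NAME (the same MIN(ID) ... GROUP BY NAME query) and delete the rest. In MySQL, a DELETE whose subquery reads the table being deleted from raises error 1093, so this exact statement is written for SQLite and may need a JOIN-based form there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (ID INTEGER, NAME TEXT)")
conn.executemany("INSERT INTO tableA VALUES (?, ?)", [
    (1, 'NameA'), (2, 'NameB'), (3, 'NameA'),
    (4, 'NameC'), (5, 'NameB'), (6, 'NameD'),
])

# Keep only the smallest ID for each NAME; run this after TableB has been repointed.
conn.execute("DELETE FROM tableA WHERE ID NOT IN (SELECT MIN(ID) FROM tableA GROUP BY NAME)")

remaining = conn.execute("SELECT ID, NAME FROM tableA ORDER BY ID").fetchall()
print(remaining)
```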
Using Oracle 10.2.0.
I have a table that consists of a line number, an indent level, and text. I need to write a routine to 'natural' sort the text within an indent level [that is a child of a lower indent level]. I have limited experience with analytic routines and connect by/prior, but from what I've read here and elsewhere, it seems like they could be put to use to help my cause, but I can't figure out how.
CREATE TABLE t (ord NUMBER(5), indent NUMBER(3), text VARCHAR2(254));
INSERT INTO t (ord, indent, text) VALUES (10, 0, 'A');
INSERT INTO t (ord, indent, text) VALUES (20, 1, 'B');
INSERT INTO t (ord, indent, text) VALUES (30, 1, 'C');
INSERT INTO t (ord, indent, text) VALUES (40, 2, 'D');
INSERT INTO t (ord, indent, text) VALUES (50, 2, 'Z');
INSERT INTO t (ord, indent, text) VALUES (60, 2, 'E');
INSERT INTO t (ord, indent, text) VALUES (70, 1, 'F');
INSERT INTO t (ord, indent, text) VALUES (80, 2, 'H');
INSERT INTO t (ord, indent, text) VALUES (90, 2, 'G');
INSERT INTO t (ord, indent, text) VALUES (100, 3, 'J');
INSERT INTO t (ord, indent, text) VALUES (110, 3, 'H');
This:
SELECT ord, indent, LPAD(' ', indent, ' ') || text txt FROM t;
...returns:
ORD INDENT TXT
---------- ---------- ----------------------------------------------
10 0 A
20 1 B
30 1 C
40 2 D
50 2 Z
60 2 E
70 1 F
80 2 H
90 2 G
100 3 J
110 3 H
11 rows selected.
In the case I've defined for you, I need my routine to set ORD 60 = 50 and ORD 50 = 60 [flip them] because E is after D and before Z.
Same with ORD 80 and 90 [with 90 bringing 100 and 110 with it because they belong to it], 100 and 110. The final output should be:
ORD INDENT TXT
10 0 A
20 1 B
30 1 C
40 2 D
50 2 E
60 2 Z
70 1 F
80 2 G
90 3 H
100 3 J
110 2 H
The result is that each indent level is sorted alphabetically, within its indent level, within the parent indent level.
Here's what I got to work. No idea how efficient it might be on larger sets. The hard part for me was identifying the "parent" for a given row based solely on indent and original order.
WITH
a AS (
SELECT
t.*,
( SELECT MAX( ord )
FROM t t2
WHERE t2.ord < t.ord AND t2.indent = t.indent-1
) AS parent_ord
FROM
t
)
SELECT
ROWNUM*10 AS ord,
indent,
rpad( ' ', LEVEL-1, ' ' ) || text
FROM
a
CONNECT BY
PRIOR ord = parent_ord
START WITH
parent_ord IS NULL
ORDER SIBLINGS BY
text
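For databases without CONNECT BY, the same idea (derive each row's parent from indent and original order, then sort siblings by text) can be sketched with a recursive CTE that builds a zero-padded sort path. This is a minimal sketch on SQLite via Python's sqlite3, not Oracle 10g. The path encoding with printf('%05d', ...) assumes fewer than 100000 siblings per parent, and ties on text are broken by the original ord.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ord INTEGER, indent INTEGER, text TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (10, 0, 'A'), (20, 1, 'B'), (30, 1, 'C'), (40, 2, 'D'), (50, 2, 'Z'),
    (60, 2, 'E'), (70, 1, 'F'), (80, 2, 'H'), (90, 2, 'G'), (100, 3, 'J'),
    (110, 3, 'H'),
])

result = conn.execute("""
WITH RECURSIVE
a AS (  -- parent = nearest earlier row exactly one indent level up
    SELECT t.ord, t.indent, t.text,
           (SELECT MAX(t2.ord) FROM t AS t2
            WHERE t2.ord < t.ord AND t2.indent = t.indent - 1) AS parent_ord
    FROM t
),
s AS (  -- position of each row among its siblings, alphabetically
    SELECT a.*, row_number() OVER (PARTITION BY parent_ord ORDER BY text, ord) AS sib
    FROM a
),
tree(ord, indent, text, path) AS (  -- path = parent's path + own sibling position
    SELECT ord, indent, text, printf('%05d', sib) FROM s WHERE parent_ord IS NULL
    UNION ALL
    SELECT s.ord, s.indent, s.text, tree.path || printf('%05d', s.sib)
    FROM s JOIN tree ON s.parent_ord = tree.ord
)
SELECT row_number() OVER (ORDER BY path) * 10 AS new_ord, indent, text
FROM tree
ORDER BY path
""").fetchall()

for new_ord, indent, text in result:
    print(new_ord, indent, ' ' * indent + text)
```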
Okay, here you go. The hard part in your data structure is that the parent is not (explicitly) known, so the first part of the query does nothing but identify the parent according to the rules (for each node, it collects all subnodes one level deeper, stopping as soon as the indentation is smaller than or equal to that of the start node).
The rest is easy, basically just some recursion with connect by to get the items in the order you want them (renumbering them dynamically).
WITH OrdWithParentInfo AS
(SELECT ID,
INDENT,
TEXT,
MIN(ParentID) ParentID
FROM (SELECT O.*,
CASE
WHEN (CONNECT_BY_ROOT ID = ID) THEN
NULL
ELSE
CONNECT_BY_ROOT ID
END ParentID
FROM (SELECT ROWNUM ID,
INDENT,
TEXT
FROM T
ORDER BY ORD) O
WHERE (INDENT = CONNECT_BY_ROOT INDENT + 1)
OR (CONNECT_BY_ROOT ID = ID)
CONNECT BY ((ID = PRIOR ID + 1) AND (INDENT > CONNECT_BY_ROOT INDENT)))
GROUP BY ID,
INDENT,
TEXT)
SELECT ROWNUM * 10 ORD, O.INDENT, O.TEXT
FROM OrdWithParentInfo O
START WITH O.ParentID IS NULL
CONNECT BY O.ParentID = PRIOR ID
ORDER SIBLINGS BY O.Text;