SQL: Update with all possible combinations

I have a relation
+-----+----+
| seq | id |
+-----+----+
| 1 | A1 |
| 2 | B1 |
| 3 | C1 |
| 4 | D1 |
+-----+----+
and want to join it in PostgreSQL with
+----+-------+
| id | alter |
+----+-------+
| B1 | B2 |
| D1 | D2 |
+----+-------+
so I get all possible combinations of replacement (i.e., more or less the Cartesian product of the replacements). So group 1 has no update, group 2 only B2, group 3 only D2, and group 4 both B2 and D2.
The end result should look like this, but should stay open to more replacements (like an extra D3 for D1):
+-------+-----+----+
| group | seq | id |
+-------+-----+----+
| 1 | 1 | A1 |
| 1 | 2 | B1 |
| 1 | 3 | C1 |
| 1 | 4 | D1 |
| 2 | 1 | A1 |
| 2 | 2 | B2 |
| 2 | 3 | C1 |
| 2 | 4 | D1 |
| 3 | 1 | A1 |
| 3 | 2 | B1 |
| 3 | 3 | C1 |
| 3 | 4 | D2 |
| 4 | 1 | A1 |
| 4 | 2 | B2 |
| 4 | 3 | C1 |
| 4 | 4 | D2 |
+-------+-----+----+
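For intuition, the expansion can be modeled outside SQL as a Cartesian product of per-row candidate lists. This is an illustrative Python sketch of the combinatorics only, not the SQL solution; all names are made up:

```python
from itertools import product

# Data mirrors the question's tables.
rows = [(1, 'A1'), (2, 'B1'), (3, 'C1'), (4, 'D1')]
repl = {'B1': ['B2'], 'D1': ['D2']}

# For each position: the original id plus any replacements for it.
choices = [[ident] + repl.get(ident, []) for _, ident in rows]

# One tuple per group; 1 * 2 * 1 * 2 = 4 groups here.
groups = list(product(*choices))
print(len(groups))  # 4
```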
EDIT:
Another possible replacement table could be
+----+-------+
| id | alter |
+----+-------+
| B1 | B2 |
| D1 | D2 |
| D1 | D3 |
+----+-------+
which should result in 6 groups (I hope I haven't forgotten a case):
+-------+-----+----+
| group | seq | id |
+-------+-----+----+
| 1 | 1 | A1 |
| 1 | 2 | B1 |
| 1 | 3 | C1 |
| 1 | 4 | D1 |
| 2 | 1 | A1 |
| 2 | 2 | B2 |
| 2 | 3 | C1 |
| 2 | 4 | D1 |
| 3 | 1 | A1 |
| 3 | 2 | B2 |
| 3 | 3 | C1 |
| 3 | 4 | D2 |
| 4 | 1 | A1 |
| 4 | 2 | B2 |
| 4 | 3 | C1 |
| 4 | 4 | D3 |
| 5 | 1 | A1 |
| 5 | 2 | B1 |
| 5 | 3 | C1 |
| 5 | 4 | D2 |
| 6 | 1 | A1 |
| 6 | 2 | B1 |
| 6 | 3 | C1 |
| 6 | 4 | D3 |
+-------+-----+----+
If you instead have three replacements like
+----+-------+
| id | alter |
+----+-------+
| B1 | B2 |
| C1 | C2 |
| D1 | D3 |
+----+-------+
It'll result in 8 groups.
What I tried so far was not really helpful:
WITH a as (SELECT * FROM (values (1,'A1'),(2,'B1'), (3,'C1'), (4,'D1') ) as a1(seq, id) )
, b as (SELECT * FROM (values ('B1','B2'), ('D1','D2')) as b1(id,alter) )
---------
SELECT row_number() OVER (PARTITION BY a.id) as g, * FROM
a
CROSS JOIN b as b1
CROSS JOIN b as b2
LEFT JOIN b as b3 ON a.id=b3.id
ORDER by g,seq;

Since the logic of this statement is not stored in a table, I decided to move that logic into CTE c, which adds three new columns to the existing table a, one for each selection of fields to be considered.
WITH a as (SELECT * FROM (values (1,'A1'),(2,'B1'), (3,'C1'), (4,'D1') ) as a1(seq, id) )
, b as (SELECT * FROM (values ('B1','B2'), ('D1','D2')) as b1(id,alter) )
, c as (
SELECT a.seq, a.id,
COALESCE(b1.alter,a.id) as id2,
COALESCE(b2.alter,a.id) as id3,
COALESCE(b3.alter,a.id) as id4
FROM a
LEFT JOIN (SELECT * FROM b WHERE b.alter='B2') b1 ON a.id = b1.id
LEFT JOIN (SELECT * FROM b WHERE b.alter='D2') b2 ON a.id = b2.id
LEFT JOIN (SELECT * FROM b WHERE b.alter IN ('B2','D2')) b3 ON a.id = b3.id)
, d as (SELECT * FROM (values (1),(2), (3), (4) ) as d1(gr) )
SELECT d.gr,
CASE d.gr
WHEN 1 THEN c.id
WHEN 2 THEN c.id2
WHEN 3 THEN c.id3
WHEN 4 THEN c.id4 END as id
FROM d
CROSS JOIN c
ORDER by d.gr, c.seq

What you need
After additional info from your comments, it seems like this is your case:
You have toll stations with a given number of booths:
CREATE TABLE station (
station text PRIMARY KEY
, booths int NOT NULL -- number of cashiers in station
);
INSERT INTO station VALUES
('A', 1)
, ('B', 2)
, ('C', 1)
, ('D', 3);
For a given route, say A --> B --> C --> D you want to generate all possible paths, taking booth numbers into account. I suggest an SQL function with a recursive CTE like:
CREATE OR REPLACE FUNCTION f_pathfinder(_route text[])
RETURNS TABLE (grp int, path text[]) LANGUAGE sql STABLE PARALLEL SAFE AS
$func$
WITH RECURSIVE rcte AS (
SELECT cardinality($1) AS hops, 1 AS hop, ARRAY[s.station || booth] AS path
FROM station s, generate_series(1, s.booths) booth
WHERE s.station = $1[1]
UNION ALL
SELECT r.hops, r.hop + 1, r.path || (s.station || booth)
FROM rcte r
JOIN station s ON s.station = _route[r.hop + 1], generate_series(1, s.booths) booth
WHERE r.hop < r.hops
)
SELECT row_number() OVER ()::int AS grp, path
FROM rcte r
WHERE r.hop = r.hops;
$func$;
Simple call:
SELECT * FROM f_pathfinder('{A,B,C,D}'::text[]);
Result:
grp | path
--: | :-------------
1 | {A1,B1,C1,D1}
2 | {A1,B1,C1,D2}
3 | {A1,B1,C1,D3}
4 | {A1,B2,C1,D1}
5 | {A1,B2,C1,D2}
6 | {A1,B2,C1,D3}
Or with unnested arrays (result like you show in question):
SELECT grp, seq, booth
FROM f_pathfinder('{A,B,C,D}'::text[])
, unnest(path) WITH ORDINALITY AS x(booth, seq); -- ①
Result:
grp | seq | booth
--: | --: | :----
1 | 1 | A1
1 | 2 | B1
1 | 3 | C1
1 | 4 | D1
2 | 1 | A1
2 | 2 | B1
2 | 3 | C1
2 | 4 | D2
3 | 1 | A1
3 | 2 | B1
3 | 3 | C1
3 | 4 | D3
4 | 1 | A1
4 | 2 | B2
4 | 3 | C1
4 | 4 | D1
5 | 1 | A1
5 | 2 | B2
5 | 3 | C1
5 | 4 | D2
6 | 1 | A1
6 | 2 | B2
6 | 3 | C1
6 | 4 | D3
db<>fiddle here
The number of variants grows quickly with the number of stops in your route. It's M1*M2* .. *Mn, with Mi being the number of booths at the i-th station.
① About ORDINALITY:
PostgreSQL unnest() with element number
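The growth formula can be checked with a short Python sketch using the booth counts from the example (illustration only; itertools.product enumerates the same paths as the recursive CTE):

```python
from itertools import product

booths = {'A': 1, 'B': 2, 'C': 1, 'D': 3}
route = ['A', 'B', 'C', 'D']

# Each stop contributes booths[s] labelled choices, e.g. 'B1', 'B2'.
labels = [[f'{s}{i}' for i in range(1, booths[s] + 1)] for s in route]
paths = list(product(*labels))

print(len(paths))  # 6 == 1 * 2 * 1 * 3
```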
What you asked (originally)
Seems like you want to apply all possible combinations from the set of changes listed in the replacement table rpl to the target table tbl.
With just two rows, forming the 4 (2^n) possible combinations is simple. For a general solution, I suggest a basic combinatorics function to generate all combinations. There are innumerable ways. Here is a pure SQL function:
CREATE OR REPLACE FUNCTION f_allcombos(_len int)
RETURNS SETOF bool[] LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
WITH RECURSIVE
tf(b) AS (VALUES (false), (true))
, rcte AS (
SELECT 1 AS lvl, ARRAY[b] AS arr
FROM tf
UNION ALL
SELECT r.lvl + 1, r.arr || tf.b
FROM rcte r, tf
WHERE lvl < _len
)
SELECT arr
FROM rcte
WHERE lvl = _len;
$func$;
Similar to what's discussed here:
Generate all combinations of given list of elements, sorted
Example for just 2 replacement rows:
SELECT * FROM f_allcombos(2);
{f,f}
{t,f}
{f,t}
{t,t}
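For comparison, the same true/false combinations can be generated in Python with itertools.product. Note that the SQL function's row order is not guaranteed, so only the set of combinations is comparable:

```python
from itertools import product

def all_combos(n):
    # All 2**n boolean vectors of length n, analogous to f_allcombos(n).
    return [list(bits) for bits in product([False, True], repeat=n)]

combos = all_combos(2)
print(len(combos))  # 4
```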
Query
WITH effective_rpl AS ( -- ①
SELECT *, count(alter) OVER (ORDER BY seq) AS idx -- ②
FROM tbl LEFT JOIN rpl USING (id)
)
SELECT c.grp, e.seq
, CASE WHEN alter IS NOT NULL AND c.arr[e.idx] THEN e.alter -- ③
ELSE e.id END AS id
FROM effective_rpl e
, f_allcombos((SELECT count(alter)::int FROM effective_rpl)) -- ④
WITH ORDINALITY AS c(arr, grp); -- ⑤
Produces your desired result exactly.
db<>fiddle here
① Some of the replacements may have no match in the target table; so determine effective replacements to begin with.
② count() only counts non-null values, so this can serve as index for the 1-based array returned from f_allcombos().
③ Only replace when a replacement is available, and the boolean array has true for the given index idx.
④ The CROSS JOIN multiplies the set of rows in the target table with the number of possible replacement combinations
⑤ I use WITH ORDINALITY to generate "group numbers". See:
PostgreSQL unnest() with element number
We might wire that into the function directly, but I'd rather keep it generic.
Aside: "alter" is non-reserved in Postgres but a reserved word in standard SQL.

answer updated after edit to question
The tricky part of this problem is generating the powerset of the replacements. Luckily, Postgres supports recursive queries, and powersets can be computed recursively, so we can build a general solution that works regardless of the size of your replacement set.
Let's call the first table source and the second table replacements, and I'll swap the unsavory name alter for something else:
CREATE TABLE source (seq, id) as (
VALUES (1, 'A1'), (2, 'B1'), (3, 'C1'), (4, 'D1')
);
CREATE TABLE replacements (id, sub) as (
VALUES ('B1', 'B2'), ('D1', 'D2')
);
First, the powerset of the ids to be replaced needs to be generated. The null set may be omitted, since it won't work with joins anyhow; at the end, the source table can be UNIONed to the intermediate result to produce the same output.
In the recursive step, the JOIN condition rec.id > repl.id ensures that each id is present only once in each generated subset.
In the final step:
the cross join fans out the source N times, where N is the number of non-empty combinations of replacements (with variations);
the group names are generated using a filtered running sum on seq;
the subsets are unnested and the ids replaced using COALESCE where the replacement id equals the source id.
WITH RECURSIVE rec AS (
SELECT ARRAY[(id, sub)] subset, id FROM replacements
UNION ALL
SELECT subset || (repl.id, sub), repl.id
FROM replacements repl
JOIN rec ON rec.id > repl.id
)
SELECT NULL subset, 0 set_name, seq, id FROM source
UNION ALL
SELECT subset
, SUM(seq) FILTER (WHERE seq = 1) OVER (ORDER BY subset, seq) set_name
, seq
, COALESCE(sub, source.id) id
FROM rec
CROSS JOIN source
LEFT JOIN LATERAL (
SELECT id, sub
FROM unnest(subset) x(id TEXT, sub TEXT)
) x ON source.id = x.id;
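The subset generation above can be sketched in Python, under the same rule the rec.id > repl.id condition enforces: each subset picks at most one substitution per replaced id. This is an illustrative model, not the SQL itself:

```python
from itertools import combinations, product

replacements = [('B1', 'B2'), ('D1', 'D2'), ('D1', 'D3')]

# Group the substitutions by the id they replace.
by_id = {}
for ident, sub in replacements:
    by_id.setdefault(ident, []).append(sub)

# Non-empty subsets of replaced ids, then one substitution per chosen id.
subsets = []
ids = sorted(by_id)
for r in range(1, len(ids) + 1):
    for chosen in combinations(ids, r):
        for subs in product(*[by_id[i] for i in chosen]):
            subsets.append(dict(zip(chosen, subs)))

print(len(subsets))  # 5 non-empty variants; plus the unchanged group = 6
```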
Tests
With replacement values ('B1', 'B2'), ('D1', 'D2'), the query returns 4 groups.
subset | set_name | seq | id
-----------------------+----------+-----+----
| 0 | 1 | A1
| 0 | 2 | B1
| 0 | 3 | C1
| 0 | 4 | D1
{"(B1,B2)"} | 1 | 1 | A1
{"(B1,B2)"} | 1 | 2 | B2
{"(B1,B2)"} | 1 | 3 | C1
{"(B1,B2)"} | 1 | 4 | D1
{"(D1,D2)"} | 2 | 1 | A1
{"(D1,D2)"} | 2 | 2 | B1
{"(D1,D2)"} | 2 | 3 | C1
{"(D1,D2)"} | 2 | 4 | D2
{"(D1,D2)","(B1,B2)"} | 3 | 1 | A1
{"(D1,D2)","(B1,B2)"} | 3 | 2 | B2
{"(D1,D2)","(B1,B2)"} | 3 | 3 | C1
{"(D1,D2)","(B1,B2)"} | 3 | 4 | D2
(16 rows)
With replacement values ('B1', 'B2'), ('D1', 'D2'), ('D1', 'D3'), the query returns 6 groups:
subset | set_name | seq | id
-----------------------+----------+-----+----
| 0 | 1 | A1
| 0 | 2 | B1
| 0 | 3 | C1
| 0 | 4 | D1
{"(B1,B2)"} | 1 | 1 | A1
{"(B1,B2)"} | 1 | 2 | B2
{"(B1,B2)"} | 1 | 3 | C1
{"(B1,B2)"} | 1 | 4 | D1
{"(D1,D2)"} | 2 | 1 | A1
{"(D1,D2)"} | 2 | 2 | B1
{"(D1,D2)"} | 2 | 3 | C1
{"(D1,D2)"} | 2 | 4 | D2
{"(D1,D2)","(B1,B2)"} | 3 | 1 | A1
{"(D1,D2)","(B1,B2)"} | 3 | 2 | B2
{"(D1,D2)","(B1,B2)"} | 3 | 3 | C1
{"(D1,D2)","(B1,B2)"} | 3 | 4 | D2
{"(D1,D3)"} | 4 | 1 | A1
{"(D1,D3)"} | 4 | 2 | B1
{"(D1,D3)"} | 4 | 3 | C1
{"(D1,D3)"} | 4 | 4 | D3
{"(D1,D3)","(B1,B2)"} | 5 | 1 | A1
{"(D1,D3)","(B1,B2)"} | 5 | 2 | B2
{"(D1,D3)","(B1,B2)"} | 5 | 3 | C1
{"(D1,D3)","(B1,B2)"} | 5 | 4 | D3
(24 rows)
With replacement values ('B1', 'B2'), ('C1', 'C2'), ('D1', 'D2'), the query returns 8 groups:
subset | set_name | seq | id
---------------------------------+----------+-----+----
| 0 | 1 | A1
| 0 | 2 | B1
| 0 | 3 | C1
| 0 | 4 | D1
{"(B1,B2)"} | 1 | 1 | A1
{"(B1,B2)"} | 1 | 2 | B2
{"(B1,B2)"} | 1 | 3 | C1
{"(B1,B2)"} | 1 | 4 | D1
{"(C1,C2)"} | 2 | 1 | A1
{"(C1,C2)"} | 2 | 2 | B1
{"(C1,C2)"} | 2 | 3 | C2
{"(C1,C2)"} | 2 | 4 | D1
{"(C1,C2)","(B1,B2)"} | 3 | 1 | A1
{"(C1,C2)","(B1,B2)"} | 3 | 2 | B2
{"(C1,C2)","(B1,B2)"} | 3 | 3 | C2
{"(C1,C2)","(B1,B2)"} | 3 | 4 | D1
{"(D1,D2)"} | 4 | 1 | A1
{"(D1,D2)"} | 4 | 2 | B1
{"(D1,D2)"} | 4 | 3 | C1
{"(D1,D2)"} | 4 | 4 | D2
{"(D1,D2)","(B1,B2)"} | 5 | 1 | A1
{"(D1,D2)","(B1,B2)"} | 5 | 2 | B2
{"(D1,D2)","(B1,B2)"} | 5 | 3 | C1
{"(D1,D2)","(B1,B2)"} | 5 | 4 | D2
{"(D1,D2)","(C1,C2)"} | 6 | 1 | A1
{"(D1,D2)","(C1,C2)"} | 6 | 2 | B1
{"(D1,D2)","(C1,C2)"} | 6 | 3 | C2
{"(D1,D2)","(C1,C2)"} | 6 | 4 | D2
{"(D1,D2)","(C1,C2)","(B1,B2)"} | 7 | 1 | A1
{"(D1,D2)","(C1,C2)","(B1,B2)"} | 7 | 2 | B2
{"(D1,D2)","(C1,C2)","(B1,B2)"} | 7 | 3 | C2
{"(D1,D2)","(C1,C2)","(B1,B2)"} | 7 | 4 | D2
(32 rows)

I can only think of a brute-force approach: enumerate the groups and multiply the second table, so there is one set of rows per group.
The following then uses bit manipulation to choose which value applies:
WITH a as (
SELECT * FROM (values (1,'A1'),(2,'B1'), (3,'C1'), (4,'D1') ) as a1(seq, id)
),
b as (
SELECT * FROM (values ('B1','B2'), ('D1','D2')) as b1(id,alter)
),
bgroups as (
SELECT b.*, grp - 1 as grp, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY id) - 1 as seqnum
FROM b CROSS JOIN
GENERATE_SERIES(1, (SELECT POWER(2, COUNT(*))::int FROM b)) gs(grp)
)
SELECT bg.grp, a.seq,
COALESCE(MAX(CASE WHEN a.id = bg.id AND (POWER(2, bg.seqnum)::int & bg.grp) > 0 THEN bg.alter END),
MAX(a.id)
) as id
FROM a CROSS JOIN
bgroups bg
GROUP BY bg.grp, a.seq
ORDER BY bg.grp, a.seq;
Here is a db<>fiddle.
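The bitmask trick can be modeled in Python: each group number doubles as a mask whose set bits say which replacements to apply. An illustrative sketch of the logic, not the SQL:

```python
rows = [(1, 'A1'), (2, 'B1'), (3, 'C1'), (4, 'D1')]
replacements = [('B1', 'B2'), ('D1', 'D2')]

result = []
for grp in range(2 ** len(replacements)):  # one group per bitmask
    for seq, ident in rows:
        new_id = ident
        for bit, (old, new) in enumerate(replacements):
            # Apply a replacement only when its bit is set in the group mask.
            if ident == old and grp & (1 << bit):
                new_id = new
        result.append((grp, seq, new_id))

print(len(result))  # 16 rows: 4 groups x 4 rows
```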

Related

Count multiple Columns in Oracle SQL

I am looking to count the occurrence of IDs in 3 different columns using SQL. The raw table would look like:
id | A | B | C
------------------
1 | A1 | B1 | C1
2 | A1 | B2 | C1
3 | A1 | B1 | C2
4 | A2 | B1 | C2
5 | A1 | B2 | C2
The desired Table should look like:
id | A | count(A) | B | count(B) | C | count(C)
--------------------------------------------------
1 | A1 | 4 | B1 | 3 | C1 | 2
2 | A1 | 4 | B2 | 2 | C1 | 2
3 | A1 | 4 | B1 | 3 | C2 | 3
4 | A2 | 1 | B1 | 3 | C2 | 3
5 | A1 | 4 | B2 | 2 | C2 | 3
I tried the below query for a single column but didn't quite get the desired results:
SELECT A, COUNT(A) from table_name GROUP BY A;
But I am unable to do the same for 3 columns.
Use COUNT as an analytic function:
SELECT
id,
A,
COUNT(*) OVER (PARTITION BY A) cnt_A,
B,
COUNT(*) OVER (PARTITION BY B) cnt_B,
C,
COUNT(*) OVER (PARTITION BY C) cnt_C
FROM yourTable
ORDER BY id;
Demo
You don't want GROUP BY here, at least not by itself, because that aggregates the original records all of which you want to include in the output.
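Conceptually, the analytic COUNT(*) OVER (PARTITION BY col) is a per-column frequency lookup attached to every row; a rough Python model using collections.Counter (illustration of the idea, not Oracle's implementation):

```python
from collections import Counter

rows = [(1, 'A1', 'B1', 'C1'), (2, 'A1', 'B2', 'C1'), (3, 'A1', 'B1', 'C2'),
        (4, 'A2', 'B1', 'C2'), (5, 'A1', 'B2', 'C2')]

# One Counter per column mirrors one window per PARTITION BY clause.
cnt_a = Counter(r[1] for r in rows)
cnt_b = Counter(r[2] for r in rows)
cnt_c = Counter(r[3] for r in rows)

for rid, a, b, c in rows:
    # Every original row is kept, each annotated with its per-value counts.
    print(rid, a, cnt_a[a], b, cnt_b[b], c, cnt_c[c])
```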

Oracle SQL - Mapping attributes from two tables

I have two tables T1 and T2.
T1 -> Contains the main data
T2 -> Configuration table
T1:
FILE | ATTRB1 | ATTRB2 | ATTRB3 | ATTRB4 |
F1 | 0 | 2 | 4 | 6 |
F1 | 1 | 3 | 5 | 7 |
F2 | 7 | 8 | 9 | 0 |
F2 | 1 | 2 | 3 | 4 |
F3 | 0 | 2 | 0 | 4 |
F3 | 1 | 0 | 3 | 0 |
F4 | 3 | 6 | 9 | 0 |
F4 | 4 | 8 | 1 | 2 |
T2:
ATTRB_ID | ATTRB_NAME | COLUMN | TABLE_REF |
1 | WORDS | ATTRB1 | T1 |
2 | CHARS | ATTRB2 | T1 |
3 | MATCH | ATTRB3 | T1 |
4 | SPACES | ATTRB4 | T1 |
Note: Mapping between T1 and T2 is using COLUMN attribute and TABLE_REF. If TABLE_REF is T1 then the records in it refer to T1.
Result Table:
FILE | WORDS | CHARS | MATCH | SPACES |
F1 | 0 | 2 | 4 | 6 |
F1 | 1 | 3 | 5 | 7 |
F2 | 7 | 8 | 9 | 0 |
F2 | 1 | 2 | 3 | 4 |
F3 | 0 | 2 | 0 | 4 |
F3 | 1 | 0 | 3 | 0 |
F4 | 3 | 6 | 9 | 0 |
F4 | 4 | 8 | 1 | 2 |
How can we achieve the above result using Oracle SQL?
Any help on this topic is much appreciated.
Unpivot data from T1, join with T2, pivot again:
select *
from (
select rn, file_, val, attrb_name
from (select * from (select rownum rn, t1.* from t1)
unpivot (val for attr in (ATTRB1, ATTRB2, ATTRB3, ATTRB4))) t1
left join t2 on t2.col = t1.attr)
pivot (max(val) for attrb_name in ('WORDS', 'CHARS', 'MATCH', 'SPACES'))
order by rn
dbfiddle demo
You can create a function that returns a cursor
create or replace function query2 (p_table varchar2)
return sys_refcursor
is
rf_cur sys_refcursor;
sql_stm varchar2(4000);
begin
select 'SELECT ' || listagg('"'||column_name||'"' || nvl2(attrb_name,' as "'||attrb_name||'"',''),',') within group (order by column_id) ||' FROM '||p_table into sql_stm
from user_tab_columns
left join t2 on column_name = "COLUMN"
where table_name = p_table;
dbms_output.put_line(sql_stm);
open rf_cur for sql_stm;
return rf_cur;
end query2;
Usage: select query2('T1') from dual;

Oracle SQL - Generate aggregate rows for certain rows using select

I have a table like below.
|FILE| ID |PARENTID|SHOWCHILD|CAT1|CAT2|CAT3|TOTAL|
|F1 | A1 | P1 | N | 3 | 2 | 6 | 11 |
|F2 | A2 | P2 | N | 4 | 7 | 3 | 14 |
|F3 | A3 | P1 | N | 3 | 1 | 1 | 5 |
|F4 | LG1| | Y | 6 | 3 | 7 | 16 |
|F5 | LG2| | Y | 4 | 7 | 3 | 14 |
Now, is it possible to find the total, i.e. the aggregate of cat1, cat2, cat3 & total, only for rows which have showChild = 'Y', and add that as a row to the result set?
|Tot| Res | Res | N | 10 | 10 | 10 | 30 |
Expected final output:
|FILE| ID |PARENTID|SHOWCHILD|CAT1|CAT2|CAT3|TOTAL|
|F1 | A1 | P1 | N | 3 | 2 | 6 | 11 |
|F2 | A2 | P2 | N | 4 | 7 | 3 | 14 |
|F3 | A3 | P1 | N | 3 | 1 | 1 | 5 |
|F4 | LG1| | Y | 6 | 3 | 7 | 16 |
|F5 | LG2| | Y | 4 | 7 | 3 | 14 |
|Tot | Res| Res | N | 10 | 10 | 10 | 30 |
Here I have added the Tot row (last row) after considering only the rows which have showchild = 'Y', and appended it to the result set.
I am trying for a solution without using UNION.
Any help on achieving the above results is highly appreciated.
Thank you.
One approach would be to use a union:
WITH cte AS (
SELECT "FILE", ID, PARENTID, SHOWCHILD, CAT1, CAT2, CAT3, TOTAL, 1 AS position
FROM yourTable
UNION ALL
SELECT 'Tot', 'Res', 'Res', 'N', SUM(CAT1), SUM(CAT2), SUM(CAT3), SUM(TOTAL), 2
FROM yourTable
WHERE SHOWCHILD = 'Y'
)
SELECT "FILE", ID, PARENTID, SHOWCHILD, CAT1, CAT2, CAT3, TOTAL
FROM cte
ORDER BY
position,
"FILE";
Demo
You can try using UNION
select FILE,ID ,PARENTID,SHOWCHILD,CAT1,CAT2,CAT3,TOTAL from table1
union
select 'Tot','Res','Res','N',sum(cat1), sum(cat2),sum(cat3), sum(total)
from table1 where SHOWCHILD='Y'
I see you already accepted an answer, but you did ask for a solution that did not involve UNION. One such solution would be to use GROUPING SETS.
GROUPING SETS allow you to specify different grouping levels in your query and generate aggregates at each of those levels. You can use it to generate an output row for each record plus a single "total" row, as per your requirements. The function GROUPING can be used in expressions to identify whether each output row is in one group or the other.
Example, with test data:
with input_data ("FILE", "ID", PARENTID, SHOWCHILD, CAT1, CAT2, CAT3, TOTAL ) AS (
SELECT 'F1','A1','P1','N',3,2,6,11 FROM DUAL UNION ALL
SELECT 'F2','A2','P2','N',4,7,3,14 FROM DUAL UNION ALL
SELECT 'F3','A3','P1','N',3,1,1,5 FROM DUAL UNION ALL
SELECT 'F4','LG1','','Y',6,3,7,16 FROM DUAL UNION ALL
SELECT 'F5','LG2','','Y',4,7,3,14 FROM DUAL )
SELECT decode(grouping("FILE"),1,'Tot',"FILE") "FILE",
decode(grouping("ID"),1,'Res',"ID") "ID",
decode(grouping(parentid),1, 'Res',parentid) parentid,
decode(grouping(showchild),1, 'N',showchild) showchild,
decode(grouping("FILE"),1,sum(decode(showchild,'Y',cat1,0)),sum(cat1)) cat1,
decode(grouping("FILE"),1,sum(decode(showchild,'Y',cat2,0)),sum(cat2)) cat2,
decode(grouping("FILE"),1,sum(decode(showchild,'Y',cat3,0)),sum(cat3)) cat3,
decode(grouping("FILE"),1,sum(decode(showchild,'Y',total,0)),sum(total)) total
from input_data
group by grouping sets (("FILE", "ID", parentid, showchild), ())
+------+-----+----------+-----------+------+------+------+-------+
| FILE | ID  | PARENTID | SHOWCHILD | CAT1 | CAT2 | CAT3 | TOTAL |
+------+-----+----------+-----------+------+------+------+-------+
| F1   | A1  | P1       | N         | 3    | 2    | 6    | 11    |
| F2   | A2  | P2       | N         | 4    | 7    | 3    | 14    |
| F3   | A3  | P1       | N         | 3    | 1    | 1    | 5     |
| F4   | LG1 | -        | Y         | 6    | 3    | 7    | 16    |
| F5   | LG2 | -        | Y         | 4    | 7    | 3    | 14    |
| Tot  | Res | Res      | N         | 10   | 10   | 10   | 30    |
+------+-----+----------+-----------+------+------+------+-------+

TSQL - Referencing a changed value from previous row

I am trying to do a row calculation whereby the larger value is carried forward to subsequent rows until an even larger value is encountered. It is done by comparing the current value to the previous row using the LAG() function.
Code
DECLARE #TAB TABLE (id varchar(1),d1 INT , d2 INT)
INSERT INTO #TAB (id,d1,d2)
VALUES ('A',0,5)
,('A',1,2)
,('A',2,4)
,('A',3,6)
,('B',0,4)
,('B',2,3)
,('B',3,2)
,('B',4,5)
SELECT id
,d1
,d2 = CASE WHEN id <> (LAG(id,1,0) OVER (ORDER BY id,d1)) THEN d2
WHEN d2 < (LAG(d2,1,0) OVER (ORDER BY id,d1)) THEN (LAG(d2,1,0) OVER (ORDER BY id,d1))
ELSE d2 END
Output (column od2 added for clarity; it shows the original d2 values)
+----+----+----+ +----+
| id | d1 | d2 | | od2|
+----+----+----+ +----+
| A | 0 | 5 | | 5 |
| A | 1 | 5 | | 2 |
| A | 2 | 4 | | 4 |
| A | 3 | 6 | | 6 |
| B | 0 | 4 | | 4 |
| B | 2 | 4 | | 3 |
| B | 3 | 3 | | 2 |
| B | 4 | 5 | | 5 |
+----+----+----+ +----+
As you can see from the output, the LAG function references the original value of the previous row rather than the new value. Is there any way to achieve this?
Desired Output
+----+----+----+ +----+
| id | d1 | d2 | | od2|
+----+----+----+ +----+
| A | 0 | 5 | | 5 |
| A | 1 | 5 | | 2 |
| A | 2 | 5 | | 4 |
| A | 3 | 6 | | 6 |
| B | 0 | 4 | | 4 |
| B | 2 | 4 | | 3 |
| B | 3 | 4 | | 2 |
| B | 4 | 5 | | 5 |
+----+----+----+ +----+
Try this:
SELECT id
,d1
,d2
,MAX(d2) OVER (PARTITION BY ID ORDER BY d1)
FROM #TAB
The idea is to use MAX as a window function to get the maximum value from the beginning of each partition up to the current row.
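The running maximum can be sketched in plain Python, partitioning by id and carrying the largest d2 seen so far (a model of the window function using the question's data, not T-SQL):

```python
from itertools import groupby

rows = [('A', 0, 5), ('A', 1, 2), ('A', 2, 4), ('A', 3, 6),
        ('B', 0, 4), ('B', 2, 3), ('B', 3, 2), ('B', 4, 5)]

out = []
for _, part in groupby(rows, key=lambda r: r[0]):  # PARTITION BY id
    running = None
    for ident, d1, d2 in part:                     # rows already ordered by d1
        running = d2 if running is None else max(running, d2)
        out.append((ident, d1, running))

print(out[:4])  # [('A', 0, 5), ('A', 1, 5), ('A', 2, 5), ('A', 3, 6)]
```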
Thanks for providing the DDL scripts and the DML.
One way of doing it would be using a recursive CTE, as follows:
1. First rank all the records according to id, d1 and d2 (the cte block).
2. In the recursive CTE, get the first elements using rnk = 1.
3. The field compared_val is checked against the value from the previous rnk; if the previous value is greater than the current one, it is carried forward.
DECLARE #TAB TABLE (id varchar(1),d1 INT , d2 INT)
INSERT INTO #TAB (id,d1,d2)
VALUES ('A',0,5)
,('A',1,2)
,('A',2,4)
,('A',3,6)
,('B',0,4)
,('B',2,3)
,('B',3,2)
,('B',4,5)
;with cte
as (select row_number() over(partition by id order by d1,d2) as rnk
,id,d1,d2
from #TAB
)
,data(rnk,id,d1,d2,compared_val)
as (select rnk,id,d1,d2,d2 as compared_val
from cte
where rnk=1
union all
select a.rnk,a.id,a.d1,a.d2,case when b.compared_val > a.d2 then
b.compared_val
else a.d2
end
from cte a
join data b
on a.id=b.id
and a.rnk=b.rnk+1
)
select * from data order by id,d1,d2

Spark dataframe add Missing Values

I have a dataframe of the following format. I want to add empty rows for missing time stamps for each customer.
+-------------+----------+------+----+----+
| Customer_ID | TimeSlot | A1 | A2 | An |
+-------------+----------+------+----+----+
| c1 | 1 | 10.0 | 2 | 3 |
| c1 | 2 | 11 | 2 | 4 |
| c1 | 4 | 12 | 3 | 5 |
| c2 | 2 | 13 | 2 | 7 |
| c2 | 3 | 11 | 2 | 2 |
+-------------+----------+------+----+----+
The resulting table should be of the format
+-------------+----------+------+------+------+
| Customer_ID | TimeSlot | A1 | A2 | An |
+-------------+----------+------+------+------+
| c1 | 1 | 10.0 | 2 | 3 |
| c1 | 2 | 11 | 2 | 4 |
| c1 | 3 | null | null | null |
| c1 | 4 | 12 | 3 | 5 |
| c2 | 1 | null | null | null |
| c2 | 2 | 13 | 2 | 7 |
| c2 | 3 | 11 | 2 | 2 |
| c2 | 4 | null | null | null |
+-------------+----------+------+------+------+
I have 1 million customers and 360 time slots (in the above example only 4 are depicted).
I figured out a way to create a DataFrame with 2 columns (Customer_id, Timeslot) with 1M x 360 rows and do a left outer join with the original DataFrame.
Is there a better way to do this?
You can express this as a SQL query:
select df.customerid, t.timeslot,
t.A1, t.A2, t.An
from (select distinct customerid from df) c cross join
(select distinct timeslot from df) t left join
df
on df.customerid = c.customerid and df.timeslot = t.timeslot;
Notes:
You should probably put this into another dataframe.
You might have tables with the available customers and/or timeslots. Use those instead of the subqueries.
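The grid-plus-left-join idea can be modeled in plain Python on toy data (hypothetical in-memory structures; in Spark the grid would itself be a DataFrame that you left-join against):

```python
# Observed rows keyed by (customer, timeslot); values are (A1, A2, An).
observed = {('c1', 1): (10.0, 2, 3), ('c1', 2): (11, 2, 4), ('c1', 4): (12, 3, 5),
            ('c2', 2): (13, 2, 7), ('c2', 3): (11, 2, 2)}
customers = ['c1', 'c2']
timeslots = range(1, 5)

# Cross join customers x timeslots, left-joined against the observed rows;
# missing combinations get nulls (None).
full = [(c, t) + observed.get((c, t), (None, None, None))
        for c in customers for t in timeslots]

print(len(full))  # 8 rows: 2 customers x 4 slots
```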
I think you can use the answer of Gordon Linoff, but since you have millions of customers and are joining on them, you can add the following.
Use a tally table for TimeSlot, because it might give better performance. For more on tally tables, please refer to the following link:
http://www.sqlservercentral.com/articles/T-SQL/62867/
I also think you could use a partition or row-number function to divide the customerid column and select customers based on some partition value, e.g. select the row-number values and then cross join with the tally table. It can improve your performance.