Grouping over the subquery in SQL on unique id - sql

I have a query that gets its results from a temp table. It has aggregate columns derived from the temp table:
SELECT
DISTINCT
SUM(a),
SUM(b),
c,
d,
id1
FROM
#tmpTable
.
.
.
join with many other tables
I now want to get the SUM of columns c and d returned by the query, along with all the other columns, grouped by id1. The current result looks something like:
+--------+--------+---+---+-----+
| Sum(A) | Sum(B) | C | D | id1 |
+--------+--------+---+---+-----+
| 12     | 34     | 1 | 3 | 1   |
| 22     | 37     | 2 | 4 | 2   |
| 33     | 55     | 3 | 5 | 1   |
| 44     | 25     | 5 | 6 | 2   |
+--------+--------+---+---+-----+
Final result should be this:
+--------+--------+--------+--------+-----+
| Sum(A) | Sum(B) | Sum(C) | Sum(D) | id1 |
+--------+--------+--------+--------+-----+
| 12     | 34     | 4      | 8      | 1   |
| 22     | 37     | 7      | 10     | 2   |
| 33     | 55     | 4      | 8      | 1   |
| 44     | 25     | 7      | 10     | 2   |
+--------+--------+--------+--------+-----+

select
x.sum_a,
x.sum_b,
x.sum_c,
x.sum_d,
t.id1
from
tmpTable t
join
(
select
id1,
sum(A) as sum_a,
sum(B) as sum_b,
sum(C) as sum_c,
sum(D) as sum_d
from
tmpTable
group by
id1
) x on t.id1 = x.id1

Seeing as you have different grouping criteria for A and B, you can group them separately from C and D. The query below (using a common table expression) might start you on the right track:
; with SummaryValues AS
(
select id1, sum(C) as SumC, SUM(D) as SumD
from #SourceTable
group by id1
)
select SUM(st.A), SUM(st.b), sv.SumC, sv.SumD, st.id1
from #SourceTable st
inner join SummaryValues sv
on st.id1 = sv.id1
group by <whatever grouping you are using>

If your current real query is summing up a and b the way you want and generating that first sample output, maybe something like:
SELECT DISTINCT
SUM(a),
SUM(b),
SUM(c) OVER (PARTITION BY id1),
SUM(d) OVER (PARTITION BY id1),
id1
FROM
#tmpTable
.
.
.
join with many other tables
to get the second one.
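To check the window-function idea against the sample data, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table name grouped is hypothetical and stands in for whatever the real query already returns (the first sample table); window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the rows the existing query already produces
# (the first sample table in the question).
conn.executescript("""
CREATE TABLE grouped (sum_a INTEGER, sum_b INTEGER, c INTEGER, d INTEGER, id1 INTEGER);
INSERT INTO grouped VALUES
  (12, 34, 1, 3, 1),
  (22, 37, 2, 4, 2),
  (33, 55, 3, 5, 1),
  (44, 25, 5, 6, 2);
""")

# SUM(...) OVER (PARTITION BY id1) keeps every row and attaches the
# per-id1 total of c and d to each of them.
rows = conn.execute("""
SELECT sum_a,
       sum_b,
       SUM(c) OVER (PARTITION BY id1) AS sum_c,
       SUM(d) OVER (PARTITION BY id1) AS sum_d,
       id1
FROM grouped
ORDER BY sum_a
""").fetchall()

for row in rows:
    print(row)
```

With this data the rows for id1 = 1 get 4 and 8, and the rows for id1 = 2 get 7 and 10, which matches the desired final result.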

Related

Bigquery: Joining 2 tables one having repeated records and one with count ()

I want to join two tables after unnesting the arrays in Table 1, but the records are duplicated after the join because of the unnest.
Table:1
| a | d.b | d.c |
-----------------
| 1 | 5 | 2 |
- -------------
| | 3 | 1 |
-----------------
| 2 | 2 | 1 |
Table:2
| a | c | f |
-----------------
| 1 | 12 | 13 |
-----------------
| 2 | 14 | 15 |
I want to join tables 1 and 2 on a, but I also need the output to look like this:
| a | d.b | d.c | f | h | Sum(count(a))
---------------------------------------------
| 1 | 5 | 2 | 13 | 12 |
- ------------- - - 1
| | 3 | 1 | | |
---------------------------------------------
| 2 | 2 | 1 | 15 | 14 | 1
a can be repeated in table 2, so I need COUNT(a) and then the sum of those counts after the join.
My problem is that after joining I need the nested and repeated record to look the same as in the first table, but when I aggregate to get the sum I can't group by structs or arrays. So I UNNEST the records first and then use the ARRAY_AGG function, but the sum is still wrong.
SELECT
t1.a,
t2.f,
t2.h,
ARRAY_AGG(DISTINCT(t1.db)) as db,
ARRAY_AGG(DISTINCT(t1.dc)) as dc,
SUM(t2.total) AS total
FROM (
SELECT
a,
d.b as db,
d.c as dc
FROM
`table1`,
UNNEST(d) AS d,
) AS t1
LEFT JOIN (
SELECT
a,
f,
h,
COUNT(*) AS total,
FROM
`table2`
GROUP BY
a,f,h) AS t2
ON
t1.a = t2.a
GROUP BY
1,
2,
3
Note: the error is in the total after the sum, which is much higher than expected. All other data is correct.
I guess column a in your table 2 is not unique.
Let's assume that table 2 looks like this:
| a | c   | f   |
-----------------
| 1 | 12  | 13  |
| 2 | 14  | 15  |
| 1 | 100 | 101 |
There are two rows where a is 1. Since c and f differ between them, the grouping (GROUP BY a,f,h) AS t2 does not merge them, and COUNT(*) AS total is 1 for each row:
| a | c   | f   | total |
-------------------------
| 1 | 12  | 13  | 1     |
| 2 | 14  | 15  | 1     |
| 1 | 100 | 101 | 1     |
In the next step you join this table to your table 1. The rows of table 1 with value 1 in column a are duplicated, because table 2 has two matching entries. This makes the sum too high.
Instead of unnesting the tables, I recommend the following approach:
-- Creating of sample data as given:
with tbl_A as (select 1 a, [struct(5 as b,2 as c),struct(3,1)] d union all select 2,[struct(2,1)] union all select null,[struct(50,51)]),
tbl_B as (select 1 as a,12 b, 13 f union all select 2,14,15 union all select 1,100,101 union all select null,500,501)
-- Query:
select *
from tbl_A A
left join
(Select a,array_agg(struct(b,f)) as B, count(1) as counts from tbl_B group by 1) B
on ifnull(A.a,-9)=ifnull(B.a,-9)
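The fan-out described above can be reproduced with a small sketch in Python's built-in sqlite3 module. This is not BigQuery: SQLite has no arrays, so the hypothetical table t1_unnested holds table 1 after UNNEST, and the fix shown is only the "aggregate table 2 by a alone before joining" part of the approach, under the assumption that the sample rows above are representative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# t1_unnested: table 1 after UNNEST(d), one row per array element.
# t2: table 2 with a repeated value in column a.
conn.executescript("""
CREATE TABLE t1_unnested (a INTEGER, db INTEGER, dc INTEGER);
INSERT INTO t1_unnested VALUES (1, 5, 2), (1, 3, 1), (2, 2, 1);
CREATE TABLE t2 (a INTEGER, c INTEGER, f INTEGER);
INSERT INTO t2 VALUES (1, 12, 13), (2, 14, 15), (1, 100, 101);
""")

# Problem: 2 unnested rows for a = 1 meet 2 grouped rows from t2,
# so the join yields 4 rows and SUM(total) is inflated.
fan_out = conn.execute("""
SELECT t1.a, SUM(x.total) AS total
FROM t1_unnested t1
LEFT JOIN (SELECT a, c, f, COUNT(*) AS total
           FROM t2
           GROUP BY a, c, f) x ON t1.a = x.a
GROUP BY t1.a
ORDER BY t1.a
""").fetchall()

# Fix: collapse both sides to one row per a before joining.
fixed = conn.execute("""
SELECT k.a, x.total
FROM (SELECT DISTINCT a FROM t1_unnested) k
LEFT JOIN (SELECT a, COUNT(*) AS total
           FROM t2
           GROUP BY a) x ON k.a = x.a
ORDER BY k.a
""").fetchall()

print("inflated:", fan_out)
print("per key :", fixed)
```

In the inflated version a = 1 reports 4 even though table 2 has only two rows for it; once each side has one row per a, the count is 2.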

SQL - GROUP BY - Dynamic Columns

How can we achieve this?
Actual Table:
.-------.---------.-------.------.---------.
| EmpId | Project | Title | Role | Values |
|-------|---------|-------|----- |---------|
| 1 | aaa |xxx | A| 100|
| 1 | aaa |yyy | B| 120|
| 1 | aaa |zzz | C| 90|
.-------.---------.-------.------.---------.
Target 1:
.-------.---------.-------.----.----.----.
| EmpId | Project | Title | A | B | C |
|-------|---------|-------|--- |----|----|
| 1 | aaa |xxx | 100|null|null|
| 1 | aaa |yyy |null| 120|null|
| 1 | aaa |zzz |null|null| 90|
.-------.---------.-------.----.----.----.
Target 2:
.-------.---------.----.----.----.
| EmpId | Project | A | B | C |
|-------|---------|--- |----|----|
| 1 | aaa | 100| 120| 90|
.-------.---------.----.----.----.
Conditions:
In Target 1, columns A/B/C are generated dynamically (pivoted, so the column names change constantly).
The columns are not actually A/B/C. They are the result of a pivot table or stored procedure and could be A/B/C/D, or M/N, or X/Y/Z.
Column Title is not important at all in Target 2.
Just use aggregation:
select EmpId, Project, max(A) as a, max(B) as b, max(C) as c
from t
group by EmpId, Project;
Use aggregation. MAX() ignores NULL values:
SELECT empid, project, MAX(A) as A, MAX(B) as B, MAX(C) as C
FROM mytable
GROUP BY empid, project
Demo on DB Fiddle:
| empid | project | A | B | C |
| ----- | ------- | --- | --- | --- |
| 1 | aaa | 100 | 120 | 90 |
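The claim that MAX() skips NULLs is easy to verify. Below is a small sketch using Python's built-in sqlite3 module; the table name target1 is hypothetical and holds the Target 1 rows from the question with fixed column names A, B, C (the dynamic-column part of the question is not addressed here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table holding the "Target 1" rows from the question.
conn.executescript("""
CREATE TABLE target1 (EmpId INTEGER, Project TEXT, Title TEXT,
                      A INTEGER, B INTEGER, C INTEGER);
INSERT INTO target1 VALUES
  (1, 'aaa', 'xxx', 100,  NULL, NULL),
  (1, 'aaa', 'yyy', NULL, 120,  NULL),
  (1, 'aaa', 'zzz', NULL, NULL, 90);
""")

# Each group has exactly one non-NULL value per column, and MAX()
# ignores the NULLs, so the three rows collapse into one.
collapsed = conn.execute("""
SELECT EmpId, Project, MAX(A) AS A, MAX(B) AS B, MAX(C) AS C
FROM target1
GROUP BY EmpId, Project
""").fetchall()

print(collapsed)
```

Dropping Title from both the select list and the GROUP BY is what lets the rows merge; keeping it would leave three groups.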
SELECT
T1.EmpId,
T1.Project,
T2.A,
T3.B,
T4.C
FROM
Table T1
LEFT JOIN
Table T2 ON
T2.EmpId=T1.EmpId
AND T2.Project=T1.Project
AND T2.A IS NOT NULL
LEFT JOIN
Table T3 ON
T3.EmpId=T1.EmpId
AND T3.Project=T1.Project
AND T3.B IS NOT NULL
LEFT JOIN
Table T4 ON
T4.EmpId=T1.EmpId
AND T4.Project=T1.Project
AND T4.C IS NOT NULL
with cte (id,pro,title,rol,val) as (
select 1,'aaa','xxx','A',100 union all
select 1,'aaa','yyy','B',120 union all
select 1,'aaa','zzz','C',90)
select id,pro,title,[a],[b],[c] from (
select * from cte ) a
pivot
(max(val) for rol in ([a],[b],[c])) aa
with cte (id,pro,title,rol,val) as (
select 1,'aaa','xxx','A',100 union all
select 1,'aaa','yyy','B',120 union all
select 1,'aaa','zzz','C',90)
select id,pro,max([a]) A,max([b]) B,max([c]) C from (
select * from cte ) a
pivot
(max(val) for rol in ([a],[b],[c])) aa
group by id,pro

Impala - Does impala allow multi GROUP_CONCAT in one query

For example, I have a table below
+-----------+-------+------------+
| Id | a| b|
+-----------+-------+------------+
| 1 | 6 | 20 |
| 1 | 4 | 55 |
| 1 | 9 | 56 |
| 1 | 2 | 67 |
| 1 | 7 | 80 |
| 1 | 5 | 66 |
| 1 | 3 | 33 |
| 1 | 8 | 34 |
| 1 | 1 | 52 |
I want the output to look like the following, using Impala:
+-----------+-------------------+-----------------------------+
| Id | a | b |
+-----------+-------------------+-----------------------------+
| 1 | 6,4,9,2,7,5,3,8,1 | 20,55,56,67,80,66,33,34,52 |
+-----------+-------------------+-----------------------------+
In Impala, I used:
SELECT Id,
group_concat(DISTINCT a) AS a,
group_concat(DISTINCT b) AS b
FROM table GROUP BY Id
This always fails with a syntax error. Is it not allowed to use multiple group_concat calls in one query in Impala, or is it multiple DISTINCTs in one query that are not allowed?
From the documentation for GROUP_CONCAT:
You cannot apply the DISTINCT operator to the argument of this function.
But, as a workaround, we can use two separate subqueries to find the distinct values:
WITH cte1 AS (
SELECT Id, GROUP_CONCAT(a) AS a
FROM (SELECT DISTINCT Id, a FROM yourTable) t
GROUP BY Id
),
cte2 AS (
SELECT Id, GROUP_CONCAT(b) AS b
FROM (SELECT DISTINCT Id, b FROM yourTable) t
GROUP BY Id
)
SELECT
t1.Id,
t1.a,
t2.b
FROM cte1 t1
INNER JOIN cte2 t2
ON t1.Id = t2.Id;
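The shape of this workaround can be tried out locally. The sketch below runs the same two-CTE query in Python's built-in sqlite3 module, which is only a stand-in for Impala (SQLite's own group_concat behaves differently with DISTINCT, so this checks the query structure, not Impala's restriction). The sample rows are made up and contain duplicates on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Made-up sample rows with duplicate values in both a and b.
conn.executescript("""
CREATE TABLE yourTable (Id INTEGER, a INTEGER, b INTEGER);
INSERT INTO yourTable VALUES (1, 6, 20), (1, 4, 20), (1, 6, 55);
""")

# De-duplicate each column in its own subquery, concatenate, then join.
row = conn.execute("""
WITH cte1 AS (
    SELECT Id, GROUP_CONCAT(a) AS a
    FROM (SELECT DISTINCT Id, a FROM yourTable) t
    GROUP BY Id
),
cte2 AS (
    SELECT Id, GROUP_CONCAT(b) AS b
    FROM (SELECT DISTINCT Id, b FROM yourTable) t
    GROUP BY Id
)
SELECT t1.Id, t1.a, t2.b
FROM cte1 t1
INNER JOIN cte2 t2 ON t1.Id = t2.Id
""").fetchone()

ident, a_list, b_list = row
# Concatenation order is not guaranteed, so compare sorted values.
a_values = sorted(a_list.split(","))
b_values = sorted(b_list.split(","))
print(ident, a_values, b_values)
```

Each list contains every distinct value once. The order inside the concatenated string is not guaranteed, so the sketch sorts before comparing.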

Efficient query to Group by column name in SQL or hive

Imagine I have a table with 2 columns m_1 and m_2:
m1 | m2
3 | 17
3 | 18
4 | 17
9 | 9
I would like to get a table with 3 columns:
m is the index of the column (in my example 1 or 2).
d is the data contained in the table.
count is the number of occurrences of each value, grouped by value and index.
In the example, the result is:
m | d | count
m_1 | 3 | 2
m_1 | 4 | 1
m_1 | 9 | 1
m_2 | 17| 2
m_2 | 18| 1
m_2 | 9 | 1
The first line must be read as 'data 3 occurs 2 times in column m_1'.
A naive solution is to execute two times a parametric query like this:
for (i in 1 .. 2)
SELECT CONCAT('m_', i), m_i, count(*) FROM table GROUP BY m_i
But this algorithm scans my table two times. This is a problem since I have 255 m columns and billions of rows.
Would the solution become easier if I used Hive instead of a relational database?
You can write this using union all and group by:
select colname, d, count(*)
from ((select 'm_1' as colname, m1 as d from t) union all
(select 'm_2' as colname, m2 as d from t)
) m12
group by colname, d;
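As a quick check of the UNION ALL version against the sample data, here is a sketch using Python's built-in sqlite3 module. It only confirms the result shape on four rows; whether a given engine reads the table once per branch is not something this shows, and with 255 columns that may still matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample data from the question.
conn.executescript("""
CREATE TABLE t (m1 INTEGER, m2 INTEGER);
INSERT INTO t VALUES (3, 17), (3, 18), (4, 17), (9, 9);
""")

# Stack the two columns into (colname, d) pairs, then count per pair.
counts = conn.execute("""
SELECT colname, d, COUNT(*) AS cnt
FROM (SELECT 'm_1' AS colname, m1 AS d FROM t
      UNION ALL
      SELECT 'm_2' AS colname, m2 AS d FROM t) m12
GROUP BY colname, d
ORDER BY colname, d
""").fetchall()

for row in counts:
    print(row)
```

The value 9 is counted once under m_1 and once under m_2, because the column name is part of the grouping key.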
In Hive, use posexplode(array(m1,m2)):
select concat('m_',cast(pe.pos+1 as string)) as m
,pe.val as d
,count(*) as `count`
from mytable t
lateral view posexplode(array(m1,m2)) pe
group by pos
,val
;
+------+-----+--------+
| m | d | count |
+------+-----+--------+
| m_1 | 3 | 2 |
| m_1 | 4 | 1 |
| m_1 | 9 | 1 |
| m_2 | 9 | 1 |
| m_2 | 17 | 2 |
| m_2 | 18 | 1 |
+------+-----+--------+

SQL Server Pivot Query add a total

Below is a SQL Server pivot query that gives output like:
Semester| StudentDesc | [A]| [B] |[C] |[D]
-----------------------------------------
| 2 | Term1 | 20 | NULL| 5 | 10
------------------------------------------
| 3 | Term2 | 10 | 2 | 2 | 1
-----------------------------------------
I would like the output to include a total (TotalSessions) of A, B, C, D, such as:
Semester| StudentDesc | [A]| [B] |[C] |[D] | TotalSessions
---------------------------------------------------------
| 2 | Term1 | 20 | NULL| 5 | 10 | 35
--------------------------------------------------------
| 3 | Term2 | 10 | 2 | 2 | 1 | 15
-------------------------------------------------------
I assume that would be the column called
Count(Stats.SessionNumber) AS TotalSessions in the query
The query I have is:
SELECT Semester, StudentDesc, [A],[B],[C],[D]
FROM
(
SELECT
Semesters.Semester, Options.StudentDesc,
/*TotalSessions */ Count(Stats.SessionNumber) AS
TotalSessions, TrainerList.ShortName
FROM Semesters, (StudentList_tbl
INNER JOIN
((RegistrarSemestersAndTerms
INNER JOIN
Stats ON (StudentSemestersAndTerms.Semester = Stats.Semester)
AND (StudentSemestersAndTerms.StudentID = Stats.StudentID))
INNER JOIN
Options ON StudentSemestersAndTerms.Q3 = Options.TermID)
ON StudentList_tbl.StudentID = StudentSemestersAndTerms.StudentID)
INNER JOIN
TrainerList ON StudentList_tbl.RTP = TrainerList.TrainerID
GROUP BY Semesters.Semester, Options.StudentDesc, TrainerList.ShortName
) as base_query
PIVOT
(
Sum(TotalSessions) FOR ShortName IN ([A],[B],[C],[D])
) as pivot_query;
thanks
Simply write it as:
SELECT Semester, StudentDesc, [A],[B],[C],[D],
isnull([A],0)+isnull([B],0)+isnull([C],0)+isnull([D],0) as TotalSessions
FROM
--... rest of the query
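The reason for wrapping each column is that NULL plus anything is NULL, so a plain A + B + C + D would lose the total for Term1. A minimal sketch in Python's built-in sqlite3 module shows the difference; it uses the standard COALESCE in place of SQL Server's ISNULL, and the table name pivoted is a hypothetical stand-in for the pivot output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table holding the pivot output from the question.
conn.executescript("""
CREATE TABLE pivoted (Semester INTEGER, StudentDesc TEXT,
                      A INTEGER, B INTEGER, C INTEGER, D INTEGER);
INSERT INTO pivoted VALUES
  (2, 'Term1', 20, NULL, 5, 10),
  (3, 'Term2', 10, 2,    2, 1);
""")

# naive_total becomes NULL as soon as one column is NULL;
# COALESCE(col, 0) treats a missing count as zero instead.
totals = conn.execute("""
SELECT Semester,
       A + B + C + D AS naive_total,
       COALESCE(A, 0) + COALESCE(B, 0) + COALESCE(C, 0) + COALESCE(D, 0)
           AS TotalSessions
FROM pivoted
ORDER BY Semester
""").fetchall()

for row in totals:
    print(row)
```

For Term1 the naive sum comes back as NULL (None in Python) while the wrapped sum is 35; for Term2 both are 15.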