I have the following data and need to group, sum, and pivot it:
AA  BB  date
a   1   01/01/2020
a   2   01/01/2020
b   5   01/01/2020
b   1   01/01/2020
c   5   01/01/2020
d   1   01/01/2020
d   8   02/01/2020
e   1   01/01/2020
What I obtain with my SQL code:
date        a  b  c  d  e
01/01/2020  3  6  5  1  1
02/01/2020  /  /  /  8  /
What I need to obtain, with a and d grouped as f, c and e grouped as g, and b kept separate:
date        b  f  g
01/01/2020  6  4  6
02/01/2020  /  8  /
I have the following SQL, but I can't work out how to do the group summing. Should it be done before pivoting or after?
SELECT * FROM (
    -- BB must be selected so that PIVOT can aggregate it
    SELECT AA, BB, [Date]
    FROM [dbo].[Data]
) AS SourceTable
PIVOT (
    SUM([BB])
    FOR [AA] IN ([a], [b], [c], [d], [e])
) AS PivotTable
If I try this, it doesn't work:
SELECT * FROM (
    SELECT AA, BB, [Date]
    FROM [dbo].[Data]
) AS SourceTable
PIVOT (
    SUM([BB])
    FOR [AA] IN ([a]+[d], [b], [c]+[e])
) AS PivotTable
Use conditional aggregation as follows:
select date,
       sum(case when aa in ('a','d') then BB end) as f,
       sum(case when aa in ('c','e') then BB end) as g,
       sum(case when aa = 'b' then BB end) as b
from table_name
group by date
I find that this is simpler done with conditional aggregation:
select
date,
sum(case when d.aa = 'b' then bb else 0 end) as b,
sum(case when d.aa in ('a', 'd') then bb else 0 end) as f,
sum(case when d.aa in ('c', 'e') then bb else 0 end) as g
from data d
group by date
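To check the conditional-aggregation approach against the sample data, here is a minimal sketch that runs the same kind of query in an in-memory SQLite database through Python's sqlite3 module. The use of SQLite rather than SQL Server, and the table name data, are assumptions made only so the example is self-contained. The CASE expressions assign each AA value to its group before SUM runs, so the grouping and the pivot happen in a single pass. The version without ELSE 0 is used, so empty cells come back as NULL (None in Python), matching the "/" cells in the desired output.

```python
import sqlite3

# In-memory SQLite database standing in for the original table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (aa TEXT, bb INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("a", 1, "01/01/2020"), ("a", 2, "01/01/2020"),
        ("b", 5, "01/01/2020"), ("b", 1, "01/01/2020"),
        ("c", 5, "01/01/2020"), ("d", 1, "01/01/2020"),
        ("d", 8, "02/01/2020"), ("e", 1, "01/01/2020"),
    ],
)

# One output column per group; each CASE keeps only the rows of that group.
query = """
SELECT date,
       SUM(CASE WHEN aa = 'b' THEN bb END)         AS b,
       SUM(CASE WHEN aa IN ('a', 'd') THEN bb END) AS f,
       SUM(CASE WHEN aa IN ('c', 'e') THEN bb END) AS g
FROM data
GROUP BY date
ORDER BY date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Adding ELSE 0 to each CASE, as in the second answer, would return 0 instead of NULL for groups with no rows on a given date.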
I'm trying to sum a column based on a condition in another column, using PARTITION BY in SQL, but it's not working. I hope somebody can help me with this.
My table is like this:
Group_1  Group_2  Date        Value
A        D        01/01/2021  1
A        D        01/02/2021  3
A        E        01/03/2021  5
B        D        01/01/2021  7
B        D        01/02/2021  9
B        E        01/03/2021  11
B        D        01/05/2021  17
B        D        01/03/2021  13
B        E        01/04/2021  13
C        D        01/01/2021  7
C        D        01/02/2021  10
For each value of [Group_1], I need to sum [Value] over all rows that have 'D' in [Group_2] and a date earlier than the first 'E' in the same [Group_1] group (if one exists).
And the result should be like:
Group_1  Group_2  Sum
A        D        4
B        D        16
C        D        17
Does anybody know how I can solve this kind of problem?
Try the following aggregation with NOT EXISTS:
SELECT Group_1, Group_2, SUM(Value) AS Value_Sum
FROM table_name T
WHERE Group_2 <> 'E'
  AND NOT EXISTS (
        SELECT 1
        FROM table_name D
        WHERE D.Group_1 = T.Group_1
          AND D.Group_2 = 'E'
          AND D.Date <= T.Date
      )
GROUP BY Group_1, Group_2
ORDER BY Group_1, Group_2
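As a quick check of the NOT EXISTS approach, the following sketch loads the sample rows into an in-memory SQLite database via Python's sqlite3 module and runs an equivalent query. SQLite, the table name t, and the ISO-8601 date strings are assumptions: the source dates are read here as MM/DD/YYYY and rewritten so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (group_1 TEXT, group_2 TEXT, date TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("A", "D", "2021-01-01", 1), ("A", "D", "2021-01-02", 3),
        ("A", "E", "2021-01-03", 5),
        ("B", "D", "2021-01-01", 7), ("B", "D", "2021-01-02", 9),
        ("B", "E", "2021-01-03", 11), ("B", "D", "2021-01-05", 17),
        ("B", "D", "2021-01-03", 13), ("B", "E", "2021-01-04", 13),
        ("C", "D", "2021-01-01", 7), ("C", "D", "2021-01-02", 10),
    ],
)

# A 'D' row counts only if no 'E' row in the same group_1 is dated
# on or before it.
query = """
SELECT group_1, group_2, SUM(value) AS value_sum
FROM t
WHERE group_2 = 'D'
  AND NOT EXISTS (
        SELECT 1
        FROM t AS e
        WHERE e.group_1 = t.group_1
          AND e.group_2 = 'E'
          AND e.date <= t.date
      )
GROUP BY group_1, group_2
ORDER BY group_1
"""
sums = conn.execute(query).fetchall()
for row in sums:
    print(row)
```

Group C has no 'E' row at all, so the subquery never matches and every 'D' row in that group is summed.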
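Another option is a running conditional count that blanks out every row dated on or after the first 'E' in its group:
select group_1,
       group_2,
       sum(value)
from (
    select group_1,
           group_2,
           -- keep value only while no 'E' has been seen up to this date
           case when count(case when group_2 = 'E' then 1 end)
                         over (partition by group_1 order by date) = 0
                then value
           end as value
    from t
) t
group by group_1, group_2
having group_2 = 'D'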
group_1  group_2  sum
A        D        4
B        D        16
C        D        17
Table 1 -
ID VehicleID
1 A
2 A
3 A
1 B
1 C
4 C
2 D
Table 2-
ID VehicleID VehicleNo
1 A AA
2 A AA
3 A
1 B BB
1 C CC
4 C CC
2 D DD
Output-
VehicleId VehicleNo
A AA
B BB
C CC
D DD
This is how I understood it; read comments within code.
with
-- calculate "RN" (so that you'd have something to match rows on)
a as
  (select vehicleid,
          row_number() over (order by vehicleid) rn
   from (select distinct vehicleid from tab1)
  ),
b as
  (select vehicleno,
          row_number() over (order by vehicleno) rn
   from (select distinct vehicleno from tab2)
  )
-- final query
select a.vehicleid, b.vehicleno
from a left join b on a.rn = b.rn;
VEHICLEID  VEHICLENO
---------- ----------
A          AA
B          BB
C          CC
D          DD
One simple method is aggregation:
select VehicleId, max(VehicleNo) as VehicleNo
from table2
group by VehicleId;
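The aggregation works because MAX skips NULLs, so the row with a missing VehicleNo does not affect the result. A small sketch of this, using an in-memory SQLite database through Python's sqlite3 module; it assumes the blank VehicleNo in Table 2 is stored as NULL (an empty string would also sort below 'AA', but that is not what is modelled here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (id INTEGER, vehicleid TEXT, vehicleno TEXT)")
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [
        (1, "A", "AA"), (2, "A", "AA"),
        (3, "A", None),  # blank VehicleNo, assumed to be NULL
        (1, "B", "BB"), (1, "C", "CC"), (4, "C", "CC"), (2, "D", "DD"),
    ],
)

# MAX ignores NULL, so each vehicle gets its one non-NULL number.
query = """
SELECT vehicleid, MAX(vehicleno) AS vehicleno
FROM table2
GROUP BY vehicleid
ORDER BY vehicleid
"""
vehicles = conn.execute(query).fetchall()
for row in vehicles:
    print(row)
```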
I need to copy some columns of certain rows from Table1 to Table2, filtering the rows on a condition.
Let's say Table1 looks like the table below; it has many columns.
Table1
A B C D E F G H ...
1 24-OCT-20 08.22.57.642000000 AM 100 xyz 1 1
2 24-OCT-20 08.22.57.642000000 AM 100 xyz 1 0
13 25-OCT-20 05.47.52.733000000 PM 100 xyz 1 0
34 26-OCT-20 09.22.57.642000000 AM 100 xyz 1 0
25 26-OCT-20 09.25.57.642000000 AM 101 xyz 1 0
26 26-OCT-20 09.25.57.642000000 AM 101 xyz 1 1
6 26-OCT-20 09.25.57.642000000 AM 101 abc 1 1
10 26-OCT-20 09.25.57.642000000 AM 101 xyz 0 1
17 26-OCT-20 04.22.57.642000000 AM 100 xyz 1 0
18 26-OCT-20 06.22.57.642000000 AM 105 xyz 1 1
19 26-OCT-20 06.22.57.642000000 AM 105 xyz 1 0
Rows from Table1 should be inserted into Table2 as follows:
First, select A, B, C, D, E, F from Table1 where D='xyz' and E=1; then apply the following condition to the result of that query to filter out unwanted rows:
Condition: when two different rows have the same values in columns B, C, D and E but different values in column F, keep only the row with the greater value in column A.
So desired output in Table2 is shown as below:
Table2
A B C D E F
2 24-OCT-20 08.22.57.642000000 AM 100 xyz 1 0
13 25-OCT-20 05.47.52.733000000 PM 100 xyz 1 0
34 26-OCT-20 09.22.57.642000000 AM 100 xyz 1 0
26 26-OCT-20 09.25.57.642000000 AM 101 xyz 1 1
17 26-OCT-20 04.22.57.642000000 AM 100 xyz 1 0
19 26-OCT-20 06.22.57.642000000 AM 105 xyz 1 0
How can this be achieved through the simplest and most efficient SQL query?
Any help will be appreciated.
You can use window functions:
insert into table2 (a, b, c, d, e, f)
select a, b, c, d, e, f
from (
select t1.*,
row_number() over(partition by b, c, d, e order by a desc) rn
from table1 t1
where d = 'xyz' and e = 1
) t1
where rn = 1
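The window-function insert can be tried on the sample rows with the sketch below, which uses an in-memory SQLite database through Python's sqlite3 module. Several things here are assumptions for illustration: SQLite instead of Oracle (window functions need SQLite 3.25 or later), timestamps stored as shortened text strings, and only columns A to F being modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cols = "(a INTEGER, b TEXT, c INTEGER, d TEXT, e INTEGER, f INTEGER)"
conn.execute("CREATE TABLE table1 " + cols)
conn.execute("CREATE TABLE table2 " + cols)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "24-OCT-20 08.22.57 AM", 100, "xyz", 1, 1),
        (2, "24-OCT-20 08.22.57 AM", 100, "xyz", 1, 0),
        (13, "25-OCT-20 05.47.52 PM", 100, "xyz", 1, 0),
        (34, "26-OCT-20 09.22.57 AM", 100, "xyz", 1, 0),
        (25, "26-OCT-20 09.25.57 AM", 101, "xyz", 1, 0),
        (26, "26-OCT-20 09.25.57 AM", 101, "xyz", 1, 1),
        (6, "26-OCT-20 09.25.57 AM", 101, "abc", 1, 1),
        (10, "26-OCT-20 09.25.57 AM", 101, "xyz", 0, 1),
        (17, "26-OCT-20 04.22.57 AM", 100, "xyz", 1, 0),
        (18, "26-OCT-20 06.22.57 AM", 105, "xyz", 1, 1),
        (19, "26-OCT-20 06.22.57 AM", 105, "xyz", 1, 0),
    ],
)

# Rank rows within each (b, c, d, e) group by a descending and keep
# only the top-ranked row, i.e. the one with the greatest a.
conn.execute("""
INSERT INTO table2 (a, b, c, d, e, f)
SELECT a, b, c, d, e, f
FROM (
    SELECT t1.*,
           ROW_NUMBER() OVER (PARTITION BY b, c, d, e ORDER BY a DESC) AS rn
    FROM table1 AS t1
    WHERE d = 'xyz' AND e = 1
) AS ranked
WHERE rn = 1
""")

kept = conn.execute("SELECT a, f FROM table2 ORDER BY a").fetchall()
print(kept)
```

Note that this keeps the row with the greatest A in every group, whether or not F differs within the group; for the sample data that gives the same rows as the desired Table2.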
This can be achieved using GROUP BY with the KEEP clause as follows:
select max(t.a) as a, t.b, t.c, t.d, t.e,
       max(t.f) keep (dense_rank last order by t.a) as f
from table1 t
where t.d = 'xyz' and t.e = 1
group by t.b, t.c, t.d, t.e
I have a table like the one below:
email table_name
a@mail.com a1
a@mail.com b2
b@mail.com a1
c@mail.com c1
d@mail.com d1
e@mail.com e
g@mail.com e
g@mail.com e
e@mail.com f
g@mail.com g
From this, how can I calculate the duplicate-email percentage for each table?
table_name total_email duplicate_email duplicate_percentage
a1 2 1 50%
b2 1 1 100%
c1 1 0 0
d1 1 0 0
e 2 2 100%
f 1 1 100%
g 1 1 100%
Here's my try. Setup:
DECLARE @Test TABLE (
    email VARCHAR(100),
    table_name VARCHAR(100))
INSERT INTO @Test (
    email,
    table_name)
VALUES
    ('a@mail.com', 'a1'),
    ('a@mail.com', 'b2'),
    ('b@mail.com', 'a1'),
    ('c@mail.com', 'c1'),
    ('d@mail.com', 'd1'),
    ('e@mail.com', 'e'),
    ('g@mail.com', 'e'),
    ('g@mail.com', 'e'),
    ('e@mail.com', 'f'),
    ('g@mail.com', 'g')
Solution:
;WITH DupDetail AS
(
    SELECT
        T.email,
        T.table_name,
        IsDup = CASE WHEN COUNT(*) OVER (PARTITION BY T.email) > 1 THEN 1 ELSE 0 END
    FROM
        @Test AS T
),
DupStats AS
(
SELECT
T.table_name,
total_email = COUNT(DISTINCT(T.email)),
duplicate_email = COUNT(DISTINCT(CASE WHEN T.IsDup = 1 THEN T.email END))
FROM
DupDetail AS T
GROUP BY
T.table_name
)
SELECT
D.table_name,
D.total_email,
D.duplicate_email,
duplicate_percentage = CONVERT(
DECIMAL(5,2),
D.duplicate_email * 100.0 / D.total_email)
FROM
DupStats AS D
The IsDup column marks an email as 1 if it appears more than once anywhere in the data; duplicate_email is then a COUNT DISTINCT of the flagged emails, grouped by table name.
Result:
table_name total_email duplicate_email duplicate_percentage
a1 2 1 50.00
b2 1 1 100.00
c1 1 0 0.00
d1 1 0 0.00
e 2 2 100.00
f 1 1 100.00
g 1 1 100.00
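The same two-step logic (flag repeated emails, then count distinct emails per table) can be reproduced outside SQL Server. The sketch below is an adaptation for an in-memory SQLite database via Python's sqlite3 module, not the answer's exact T-SQL: the table name test, the snake_case column names, and ROUND in place of CONVERT(DECIMAL(5,2), ...) are assumptions, and SQLite 3.25 or later is needed for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (email TEXT, table_name TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [
        ("a@mail.com", "a1"), ("a@mail.com", "b2"), ("b@mail.com", "a1"),
        ("c@mail.com", "c1"), ("d@mail.com", "d1"), ("e@mail.com", "e"),
        ("g@mail.com", "e"), ("g@mail.com", "e"), ("e@mail.com", "f"),
        ("g@mail.com", "g"),
    ],
)

query = """
WITH dup_detail AS (
    -- is_dup = 1 when the email occurs more than once anywhere
    SELECT email,
           table_name,
           CASE WHEN COUNT(*) OVER (PARTITION BY email) > 1
                THEN 1 ELSE 0 END AS is_dup
    FROM test
),
dup_stats AS (
    -- distinct emails per table, and how many of them are flagged
    SELECT table_name,
           COUNT(DISTINCT email) AS total_email,
           COUNT(DISTINCT CASE WHEN is_dup = 1 THEN email END) AS duplicate_email
    FROM dup_detail
    GROUP BY table_name
)
SELECT table_name,
       total_email,
       duplicate_email,
       ROUND(duplicate_email * 100.0 / total_email, 2) AS duplicate_percentage
FROM dup_stats
ORDER BY table_name
"""
stats = conn.execute(query).fetchall()
for row in stats:
    print(row)
```

Multiplying by 100.0 rather than 100 forces floating-point division, so 1 duplicate out of 2 emails yields 50.0 instead of being truncated to an integer.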
You can use window functions, then aggregation:
select
table_name,
sum(cnt1) duplicate,
sum(1.0 * cnt1) / cnt2 percentage
from (
select
t.*,
count(*) over(partition by email) - 1 cnt1,
count(*) over(partition by table_name) cnt2
from mytable t
) t
group by table_name, cnt2
order by table_name
Demo on DB Fiddle:
table_name | duplicate | percentage
:--------- | --------: | :---------
a1 | 1 | 0.500000
b2 | 1 | 1.000000
c1 | 0 | 0.000000
d1 | 0 | 0.000000
Table ta has four columns (SQL Server; column D is a date):
A | B | C|D
1 |11| 0|10-MAY-2019
1 |12| 0|10-MAY-2019
1 |13| 0|null
2 |33| 5|null
2 |34| 10|null
2 |35| 78|null
5 |45| 0|10-MAY-2019
5 |49| 0|10-MAY-2019
5 |51| 0|10-MAY-2019
8 |10| 0|1-MAY-2018
8 |14| 0|1-MAY-2018
8 |34| 0|1-MAY-2018
I am looking for a SQL query that fetches the distinct A values for which C is zero for every B (i.e. SUM(ABS(C)) = 0) and every D value is non-null and > GETDATE() - 90 (i.e. any day within the last 90 days).
From the table above I should only get the A value '5'.
Try this:
SELECT * FROM
(
SELECT A
FROM your_table
WHERE D > CAST(DATEADD(DD,-90,GETDATE()) AS DATE)
GROUP BY A
HAVING COUNT(A) = SUM(CASE WHEN C= 0 THEN 1 ELSE 0 END)
)A
WHERE NOT EXISTS
(
SELECT 1 FROM your_table B WHERE A.A = B.A
AND D IS NULL
)
You can use aggregation. I think this is the logic you describe:
select a
from t
where d > dateadd(day, -90, getdate()) or d is null
group by a
having max(c) = 0 and
count(*) = count(d); -- no NULL d values
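One point worth checking with either answer is that a WHERE filter on D removes old rows before grouping, so a group that mixes old and recent dates could still pass. The sketch below is a variant, not either answer's exact query: it moves all three checks into HAVING so that a single old or NULL date disqualifies the whole group. It runs in an in-memory SQLite database via Python's sqlite3 module, and it assumes ISO-8601 date strings and a fixed reference date of 2019-06-01 in place of GETDATE(), so that the sample rows dated 10-MAY-2019 fall inside the 90-day window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ta (a INTEGER, b INTEGER, c INTEGER, d TEXT)")
conn.executemany(
    "INSERT INTO ta VALUES (?, ?, ?, ?)",
    [
        (1, 11, 0, "2019-05-10"), (1, 12, 0, "2019-05-10"), (1, 13, 0, None),
        (2, 33, 5, None), (2, 34, 10, None), (2, 35, 78, None),
        (5, 45, 0, "2019-05-10"), (5, 49, 0, "2019-05-10"),
        (5, 51, 0, "2019-05-10"),
        (8, 10, 0, "2018-05-01"), (8, 14, 0, "2018-05-01"),
        (8, 34, 0, "2018-05-01"),
    ],
)

today = "2019-06-01"  # hypothetical reference date standing in for GETDATE()

query = """
SELECT a
FROM ta
GROUP BY a
HAVING SUM(ABS(c)) = 0                   -- every c in the group is zero
   AND COUNT(*) = COUNT(d)               -- no NULL d in the group
   AND MIN(d) > date(?, '-90 days')      -- even the oldest d is recent
ORDER BY a
"""
recent_zero_groups = [row[0] for row in conn.execute(query, (today,))]
print(recent_zero_groups)
```

In SQL Server the last condition would be written with DATEADD(day, -90, GETDATE()) as in the answers above.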