Need some sort of "conditional grouping" in MySQL

I have an Article table:
id | type | date
-----------------------
1 | A | 2010-01-01
2 | A | 2010-01-01
3 | B | 2010-01-01
The type field can be A, B, or C.
I need to run a report that returns how many articles of each type there are per day, like this:
date | count(type="A") | count(type="B") | count(type="C")
-----------------------------------------------------
2010-01-01 | 2 | 1 | 0
2010-01-02 | 5 | 6 | 7
Currently I run three queries, one per type, and then merge the results manually:
select date, count(id) from article where type="A" group by date
Is it possible to do this in one query, in pure SQL (no stored procedures or anything like that)?
Thanks

A combination of SUM and CASE should do it:
select date
, sum(case when type ='A' then 1 else 0 end) as count_type_a
, sum(case when type ='B' then 1 else 0 end) as count_type_b
, sum(case when type ='C' then 1 else 0 end) as count_type_c
from article group by date
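To see the SUM/CASE query work end to end, here is a minimal sketch that runs it from Python against an in-memory SQLite database. SQLite stands in for MySQL here (an assumption for portability; the query text needs no changes), and the rows are the three sample rows from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: same SQL runs unchanged).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER, type TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO article VALUES (?, ?, ?)",
    [(1, "A", "2010-01-01"), (2, "A", "2010-01-01"), (3, "B", "2010-01-01")],
)

# One pass over the table: each CASE maps a row to 1 or 0 for its column,
# and SUM adds those up per date.
rows = conn.execute(
    """
    SELECT date,
           SUM(CASE WHEN type = 'A' THEN 1 ELSE 0 END) AS count_type_a,
           SUM(CASE WHEN type = 'B' THEN 1 ELSE 0 END) AS count_type_b,
           SUM(CASE WHEN type = 'C' THEN 1 ELSE 0 END) AS count_type_c
    FROM article
    GROUP BY date
    """
).fetchall()
print(rows)  # [('2010-01-01', 2, 1, 0)]
```

Because of the ELSE 0 branch, a type that never occurs on a given day (C here) comes out as 0 rather than NULL.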

EDIT: Alex's answer above uses a better approach than the one in this answer. I'm leaving this one here only because it also answers the question, in an alternative way:
You should be able to use subqueries, as follows:
SELECT DATE(a.date) as date,
(SELECT COUNT(a1.id) FROM articles a1 WHERE a1.type = 'A' AND a1.date = a.date) count_a,
(SELECT COUNT(a2.id) FROM articles a2 WHERE a2.type = 'B' AND a2.date = a.date) count_b,
(SELECT COUNT(a3.id) FROM articles a3 WHERE a3.type = 'C' AND a3.date = a.date) count_c
FROM articles a
GROUP BY a.date;
Test Case:
CREATE TABLE articles (id int, type char(1), date datetime);
INSERT INTO articles VALUES (1, 'A', '2010-01-01');
INSERT INTO articles VALUES (2, 'A', '2010-01-01');
INSERT INTO articles VALUES (3, 'B', '2010-01-01');
INSERT INTO articles VALUES (4, 'B', '2010-01-02');
INSERT INTO articles VALUES (5, 'B', '2010-01-02');
INSERT INTO articles VALUES (6, 'B', '2010-01-03');
INSERT INTO articles VALUES (7, 'B', '2010-01-01');
INSERT INTO articles VALUES (8, 'C', '2010-01-05');
Result:
+------------+---------+---------+---------+
| date | count_a | count_b | count_c |
+------------+---------+---------+---------+
| 2010-01-01 | 2 | 2 | 0 |
| 2010-01-02 | 0 | 2 | 0 |
| 2010-01-03 | 0 | 1 | 0 |
| 2010-01-05 | 0 | 0 | 1 |
+------------+---------+---------+---------+
4 rows in set (0.00 sec)
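The test case above can be replayed from Python with SQLite to check that the correlated subqueries produce the result table shown. SQLite replaces MySQL here and dates are stored as TEXT (both assumptions); the ORDER BY is an addition so the row order is guaranteed. Each scalar subquery is evaluated for every output row, which is one reason the single-pass SUM(CASE ...) version is usually preferred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Test-case data from the answer; dates kept as ISO TEXT (assumption for SQLite).
conn.execute("CREATE TABLE articles (id INTEGER, type TEXT, date TEXT)")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)", [
    (1, "A", "2010-01-01"), (2, "A", "2010-01-01"), (3, "B", "2010-01-01"),
    (4, "B", "2010-01-02"), (5, "B", "2010-01-02"), (6, "B", "2010-01-03"),
    (7, "B", "2010-01-01"), (8, "C", "2010-01-05"),
])

# Each scalar subquery is correlated on a.date and counts one type for that date.
rows = conn.execute("""
    SELECT DATE(a.date) AS date,
           (SELECT COUNT(a1.id) FROM articles a1 WHERE a1.type = 'A' AND a1.date = a.date) AS count_a,
           (SELECT COUNT(a2.id) FROM articles a2 WHERE a2.type = 'B' AND a2.date = a.date) AS count_b,
           (SELECT COUNT(a3.id) FROM articles a3 WHERE a3.type = 'C' AND a3.date = a.date) AS count_c
    FROM articles a
    GROUP BY a.date
    ORDER BY a.date
""").fetchall()
for row in rows:
    print(row)
```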


Select aggregate ignores where clause

I'm trying to transform an existing view into a format I can work with.
The view vw_temp_appHoursLastTwoEntries looks like this:
RowNumber | PersNr | Client | Localtion | Agent | Date | Calweek | Year
----------+--------+--------+-----------+-------+------------+---------+------
1 | 123 | 1 | 1 | ag-01 | 2020-01-01 | 1 | 2021
2 | 123 | 1 | 1 | ag-01 | 2020-01-03 | 1 | 2021
1 | 9999 | 1 | 4 | ag-01 | 2020-01-01 | 1 | 2021
2 | 9999 | 1 | 4 | ag-01 | 2020-01-07 | 1 | 2021
I need this data in a different format that would look like this:
PersNr | Client | Localtion | Agent | minDate | MaxDate | DateDiff | Calweek | Year
-------+--------+-----------+-------+------------+------------+----------+---------+-------
123 | 1 | 1 | ag-01 | 2020-01-01 | 2020-01-03 | 3 | 1 | 2021
9999 | 1 | 4 | ag-01 | 2020-01-01 | 2020-01-07 | 7 | 1 | 2021
In the original format, each person has only two rows (RowNumber 1 and 2). I'd like to match each column and get the min and max date, as well as their difference, in a new view.
My code:
select a.persnr, a.client, a.location, a.agent, a.calweek, a.year,
max(a.date) as maxdate, min(b.date) as mindate
, DATEDIFF(day,a.date,b.date) as dDiff
from vw_temp_appHoursLastTwoEntries a
left join vw_temp_appHoursLastTwoEntries b on
a.persnr = b.persnr and a.client = b.client and
a.agent = b.agent and a.date = b.date
where a.date != b.date and DATEDIFF(day,a.date,b.date) != 0
or (a.date is not null and b.date is not null)
group by a.persnr, a.client, a.location, a.agent, a.calweek, a.year, DATEDIFF(day,a.date,b.date)
The issue:
I'm currently getting back values where it seems like the WHERE clause does not take effect, but I don't understand why.
a.date != b.date should not return rows where the min and max dates are the same. DATEDIFF never returns anything other than 0, even when the min and max dates differ.
Pretty sure this is what you want:
declare @Test table (RowNumber int, PersNr int, Client int, Localtion int, Agent varchar(5), [Date] date, Calweek int, [Year] int);
insert into @Test (RowNumber, PersNr, Client, Localtion, Agent, [Date], Calweek, [Year])
values
(1, 123, 1, 1, 'ag-01', '2020-01-01', 1, 2021),
(2, 123, 1, 1, 'ag-01', '2020-01-03', 1, 2021),
(1, 9999, 1, 4, 'ag-01', '2020-01-01', 1, 2021),
(2, 9999, 1, 4, 'ag-01', '2020-01-07', 1, 2021);
select a.PersNr, a.Client, a.Localtion, a.Agent, a.Calweek, a.[Year]
, max(a.[date]) as maxdate
, min(b.[date]) as mindate
, abs(datediff(day,a.[date],b.[date])) as dDiff
from @Test a
left join @Test b on
a.persnr = b.persnr and a.client = b.client and
a.agent = b.agent --and a.[date] = b.[date]
where (/*a.[date] != b.[date] and*/ datediff(day,a.[date],b.[date]) != 0)
and /* not OR */ (a.[date] is not null and b.[date] is not null)
group by a.persnr, a.client, a.Localtion, a.agent, a.calweek, a.[Year], abs(datediff(day,a.[date],b.[date]));
Returns:
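PersNr | Client | Localtion | Agent | Calweek | Year | maxdate    | mindate    | dDiff
-------+--------+-----------+-------+---------+------+------------+------------+------
123    | 1      | 1         | ag-01 | 1       | 2021 | 2020-01-03 | 2020-01-01 | 2
9999   | 1      | 4         | ag-01 | 1       | 2021 | 2020-01-07 | 2020-01-01 | 6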
As Giorgos points out, you don't want to join on a.[date] = b.[date] because your where clause specifically filters that condition out.
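The main issue was using OR instead of AND. Because AND binds more tightly than OR, your WHERE clause was evaluated as (a.date != b.date and DATEDIFF(...) != 0) or (a.date is not null and b.date is not null), so any row with two non-null dates passed the filter. You want both date values to be non-null in addition to the other condition, so that is an AND.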
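The precedence effect can be reproduced in a few lines. This is a simplified, hypothetical version of the self-join (one person, two dates, SQLite instead of SQL Server): the OR form keeps every pairing, including each row joined to itself, while the AND form keeps only pairs with different dates.

```python
import sqlite3

# Simplified, hypothetical reproduction: one person with two different dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (persnr INTEGER, d TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(123, "2020-01-01"), (123, "2020-01-03")])

base = "SELECT a.d, b.d FROM t a JOIN t b ON a.persnr = b.persnr WHERE "

# AND binds tighter than OR, so this reads as (a.d != b.d) OR (both dates not null):
# all 4 self-join pairs pass, including the ones where a row is paired with itself.
with_or = conn.execute(
    base + "a.d != b.d OR (a.d IS NOT NULL AND b.d IS NOT NULL)").fetchall()

# With AND, both conditions must hold, so the equal-date pairs are filtered out.
with_and = conn.execute(
    base + "a.d != b.d AND (a.d IS NOT NULL AND b.d IS NOT NULL)").fetchall()

print(len(with_or), len(with_and))  # 4 2
```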
I am also assuming that dDiff is only for debugging. Without abs, datediff(day,a.[date],b.[date]) is positive for one pairing of a person's two rows and negative for the other, so those rows would not group together; grouping by the absolute value (abs) merges them.
You also don't need to test a.[date] != b.[date] because that is already true by virtue of datediff(day,a.[date],b.[date]) != 0.
Please use this form of DDL+DML (or a temp table) in future to provide sample data for us to work with. It also gives you a minimal reproducible example, which is never a bad thing; I picked up a number of typos in your query while copying it.
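For readers without SQL Server at hand, here is a sketch of the corrected self-join ported to SQLite. Two assumptions: the view is reduced to the columns the join and output need, and julianday(b) - julianday(a) stands in for SQL Server's datediff(day, a, b), which SQLite does not have. It reproduces the dDiff values 2 and 6 from the result above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced copy of the view: only the columns the join and output need (assumption).
conn.execute("CREATE TABLE test (persnr INTEGER, client INTEGER, agent TEXT, d TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", [
    (123, 1, "ag-01", "2020-01-01"), (123, 1, "ag-01", "2020-01-03"),
    (9999, 1, "ag-01", "2020-01-01"), (9999, 1, "ag-01", "2020-01-07"),
])

# julianday(b.d) - julianday(a.d) plays the role of datediff(day, a.d, b.d).
rows = conn.execute("""
    SELECT a.persnr,
           MIN(b.d) AS mindate,
           MAX(a.d) AS maxdate,
           CAST(ABS(julianday(b.d) - julianday(a.d)) AS INTEGER) AS ddiff
    FROM test a
    LEFT JOIN test b
      ON a.persnr = b.persnr AND a.client = b.client AND a.agent = b.agent
    WHERE julianday(b.d) - julianday(a.d) != 0      -- drop rows paired with themselves
      AND a.d IS NOT NULL AND b.d IS NOT NULL       -- AND, not OR
    GROUP BY a.persnr, CAST(ABS(julianday(b.d) - julianday(a.d)) AS INTEGER)
    ORDER BY a.persnr
""").fetchall()
print(rows)
```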

Possible to use a column name in a UDF in SQL?

I have a query in which a series of steps is repeated constantly over different columns, for example:
SELECT DISTINCT
MAX (
CASE
WHEN table_2."GRP1_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP1_MINIMUM_DATE",
MAX (
CASE
WHEN table_2."GRP2_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP2_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
I was considering writing a function to accomplish this as doing so would save on space in my query. I have been reading a bit about UDF in SQL but don't yet understand if it is possible to pass a column name in as a parameter (i.e. simply switch out "GRP1_MINIMUM_DATE" for "GRP2_MINIMUM_DATE" etc.). What I would like is a query which looks like this
SELECT DISTINCT
FUNCTION(table_2."GRP1_MINIMUM_DATE") AS "GRP1_MINIMUM_DATE",
FUNCTION(table_2."GRP2_MINIMUM_DATE") AS "GRP2_MINIMUM_DATE",
FUNCTION(table_2."GRP3_MINIMUM_DATE") AS "GRP3_MINIMUM_DATE",
FUNCTION(table_2."GRP4_MINIMUM_DATE") AS "GRP4_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
Can anyone tell me if this is possible/point me to some resource that might help me out here?
Thanks!
There is no direct way to do this, as @Tejash already stated, but it looks like your data model is not ideal: it would be better to have a table with USER_ID and GRP_ID as keys and MINIMUM_DATE as a separate field.
Without changing the table structure, you can use UNPIVOT query to mimic this design:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4))
Result:
| USER_ID | GRP_ID | MINIMUM_DATE |
|---------|--------|--------------|
| 1 | 1 | 09/09/19 |
| 1 | 2 | 09/09/19 |
| 1 | 3 | 09/09/19 |
| 1 | 4 | 09/09/19 |
| 2 | 1 | 09/08/19 |
| 2 | 2 | 09/07/19 |
| 2 | 3 | 09/06/19 |
| 2 | 4 | 09/05/19 |
With this you can write your query without further code duplication and, if needed, use PIVOT syntax to get one row per USER_ID.
The final query could then look like this:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
, INPUT_COHORT(USER_ID, ANCHOR_DATE)
AS (SELECT 1, SYSDATE-1 FROM dual UNION ALL
SELECT 2, SYSDATE-2 FROM dual UNION ALL
SELECT 3, SYSDATE-3 FROM dual)
-- Above is sample data; the query starts here:
, unpiv AS (SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4)))
SELECT qcsj_c000000001000000 user_id, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE
FROM INPUT_COHORT cohort
LEFT JOIN unpiv table_2
ON cohort.USER_ID = table_2.USER_ID
pivot (MAX(CASE WHEN minimum_date <= cohort."ANCHOR_DATE" THEN 1 ELSE 0 END) AS MINIMUM_DATE
FOR grp_id IN (1 AS GRP1,2 AS GRP2,3 AS GRP3,4 AS GRP4))
Result:
| USER_ID | GRP1_MINIMUM_DATE | GRP2_MINIMUM_DATE | GRP3_MINIMUM_DATE | GRP4_MINIMUM_DATE |
|---------|-------------------|-------------------|-------------------|-------------------|
| 3 | | | | |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 1 | 1 |
This way you only have to write your calculation logic once (see the line starting with pivot).
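SQLite has no UNPIVOT or PIVOT operators, so the following sketch emulates both to show the same idea outside Oracle: UNION ALL stands in for UNPIVOT, and one MAX(CASE WHEN grp_id = n ...) per group stands in for PIVOT. The date comparison against ANCHOR_DATE is still written only once, in the flags CTE. Fixed dates replace SYSDATE so the output is deterministic (an assumption; the relative offsets match the sample data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Wide table as in the question; fixed dates replace the SYSDATE-based sample data.
conn.execute("""CREATE TABLE involve_ever (user_id INTEGER, grp1_minimum_date TEXT,
                grp2_minimum_date TEXT, grp3_minimum_date TEXT, grp4_minimum_date TEXT)""")
conn.executemany("INSERT INTO involve_ever VALUES (?, ?, ?, ?, ?)", [
    (1, "2019-09-09", "2019-09-09", "2019-09-09", "2019-09-09"),
    (2, "2019-09-08", "2019-09-07", "2019-09-06", "2019-09-05"),
])
conn.execute("CREATE TABLE input_cohort (user_id INTEGER, anchor_date TEXT)")
conn.executemany("INSERT INTO input_cohort VALUES (?, ?)",
                 [(1, "2019-09-08"), (2, "2019-09-07"), (3, "2019-09-06")])

rows = conn.execute("""
    WITH unpiv AS (   -- emulates UNPIVOT: one row per (user_id, grp_id)
        SELECT user_id, 1 AS grp_id, grp1_minimum_date AS minimum_date FROM involve_ever
        UNION ALL SELECT user_id, 2, grp2_minimum_date FROM involve_ever
        UNION ALL SELECT user_id, 3, grp3_minimum_date FROM involve_ever
        UNION ALL SELECT user_id, 4, grp4_minimum_date FROM involve_ever
    ),
    flags AS (        -- the comparison logic, written once
        SELECT c.user_id, u.grp_id,
               CASE WHEN u.minimum_date <= c.anchor_date THEN 1 ELSE 0 END AS flag
        FROM input_cohort c LEFT JOIN unpiv u ON c.user_id = u.user_id
    )
    SELECT user_id,   -- emulates PIVOT: one column per grp_id
           MAX(CASE WHEN grp_id = 1 THEN flag END) AS grp1_minimum_date,
           MAX(CASE WHEN grp_id = 2 THEN flag END) AS grp2_minimum_date,
           MAX(CASE WHEN grp_id = 3 THEN flag END) AS grp3_minimum_date,
           MAX(CASE WHEN grp_id = 4 THEN flag END) AS grp4_minimum_date
    FROM flags
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
for row in rows:
    print(row)
```

User 3 has no INVOLVE_EVER rows, so its grp_id is NULL after the LEFT JOIN and every pivoted column is NULL, matching the blank row in the Oracle result.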

Sql query to partition and sum the records grouping by their bill number and Product code

Below are two tables. Bills 1, 4, and 8 are parent bills: their ref is NULL. Each parent bill is referenced by one or more child bills; for example, parent bill 1 is referenced by child bills 2, 3, and 6.
Table B has the bill no column plus a prod code, which is either an actual service (ST values) or an associated service (SV values). SVs are additional costs on top of an ST.
The same ST may occur in multiple bills; only the bill number is unique.
For example, ST1 appears in bills 1 and 8. Likewise, the same SV may reference the same or a different ST.
SV1, SV2, and SV3 reference ST1 (bill no. 1), and SV2 and SV4 reference ST2 (bill no. 4).
How can we get below expected output?
Table A:
| bill no | ref |
+----------------------------------------+
| 1 | |
| 2 | 1 |
| 3 | 1 |
| 4 | |
| 5 | 4 |
| 6 | 1 |
| 7 | 4 |
| 8 | |
| 9 | 8 |
Table B:
| bill no | Prod code | cost |
+-----------------------------------------------------+
| 1 | ST1 | 10
| 2 | SV1 | 20
| 3 | SV2 | 30
| 4 | ST2 | 10
| 5 | SV2 | 20
| 6 | SV3 | 30
| 7 | SV4 | 40
| 8 | ST1 | 50
| 9 | SV1 | 10
Expected output:
| bill no | Prod code | ST_cost | SV1 | SV2 | SV3 |
+---------------------------------------------------------------------------------------------+
| 1 | ST1 | 10 | 20 | 30 | 30 |
| 4 | ST2 | 10 | 20 | 40 | |
| 8 | ST1 | 50 | 10 | | |
Here's a script that should get you there:
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.TableA;
CREATE TABLE dbo.TableA
(
BillNumber int NOT NULL PRIMARY KEY,
Reference int NULL
);
GO
INSERT dbo.TableA (BillNumber, Reference)
SELECT *
FROM (VALUES (1,NULL),
(2,1),
(3,1),
(4,NULL),
(5,4),
(6,1),
(7,4),
(8,NULL),
(9,8)) AS a(BillNumber, Reference);
GO
DROP TABLE IF EXISTS dbo.TableB;
CREATE TABLE dbo.TableB
(
BillNumber int NOT NULL PRIMARY KEY,
ProductCode varchar(10) NOT NULL,
Cost int NOT NULL
);
GO
INSERT dbo.TableB (BillNumber, ProductCode, Cost)
SELECT BillNumber, ProductCode, Cost
FROM (VALUES (1, 'ST1', 10),
(2, 'SV1', 20),
(3, 'SV2', 30),
(4, 'ST2', 10),
(5, 'SV2', 20),
(6, 'SV3', 30),
(7, 'SV4', 40),
(8, 'ST1', 50),
(9, 'SV1', 10)) AS b(BillNumber, ProductCode, Cost);
GO
WITH ParentBills
AS
(
SELECT b.BillNumber, b.ProductCode, b.Cost AS STCost
FROM dbo.TableB AS b
INNER JOIN dbo.TableA AS a
ON b.BillNumber = a.BillNumber
WHERE a.Reference IS NULL
),
SubBills
AS
(
SELECT pb.BillNumber, pb.ProductCode, pb.STCost,
b.ProductCode AS ChildProduct, b.Cost AS ChildCost
FROM ParentBills AS pb
INNER JOIN dbo.TableA AS a
ON a.Reference = pb.BillNumber
INNER JOIN dbo.TableB AS b
ON b.BillNumber = a.BillNumber
)
SELECT sb.BillNumber, sb.ProductCode, sb.STCost,
MAX(CASE WHEN sb.ChildProduct = 'SV1' THEN sb.ChildCost END) AS [SV1],
MAX(CASE WHEN sb.ChildProduct = 'SV2' THEN sb.ChildCost END) AS [SV2],
MAX(CASE WHEN sb.ChildProduct = 'SV3' THEN sb.ChildCost END) AS [SV3]
FROM SubBills AS sb
GROUP BY sb.BillNumber, sb.ProductCode, sb.STCost
ORDER BY sb.BillNumber;
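Here is the same ParentBills/SubBills approach as a sketch ported to SQLite from Python (an assumption: dbo schemas, GO batches and bracketed aliases are dropped because SQLite does not need them). It shows what the script actually returns for the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (BillNumber INTEGER PRIMARY KEY, Reference INTEGER)")
conn.execute("""CREATE TABLE TableB (BillNumber INTEGER PRIMARY KEY,
                ProductCode TEXT NOT NULL, Cost INTEGER NOT NULL)""")
conn.executemany("INSERT INTO TableA VALUES (?, ?)", [
    (1, None), (2, 1), (3, 1), (4, None), (5, 4), (6, 1), (7, 4), (8, None), (9, 8)])
conn.executemany("INSERT INTO TableB VALUES (?, ?, ?)", [
    (1, "ST1", 10), (2, "SV1", 20), (3, "SV2", 30), (4, "ST2", 10), (5, "SV2", 20),
    (6, "SV3", 30), (7, "SV4", 40), (8, "ST1", 50), (9, "SV1", 10)])

rows = conn.execute("""
    WITH ParentBills AS (   -- bills with no reference are the parents (ST rows)
        SELECT b.BillNumber, b.ProductCode, b.Cost AS STCost
        FROM TableB AS b JOIN TableA AS a ON b.BillNumber = a.BillNumber
        WHERE a.Reference IS NULL
    ),
    SubBills AS (           -- attach each child bill's product and cost to its parent
        SELECT pb.BillNumber, pb.ProductCode, pb.STCost,
               b.ProductCode AS ChildProduct, b.Cost AS ChildCost
        FROM ParentBills AS pb
        JOIN TableA AS a ON a.Reference = pb.BillNumber
        JOIN TableB AS b ON b.BillNumber = a.BillNumber
    )
    SELECT BillNumber, ProductCode, STCost,
           MAX(CASE WHEN ChildProduct = 'SV1' THEN ChildCost END) AS SV1,
           MAX(CASE WHEN ChildProduct = 'SV2' THEN ChildCost END) AS SV2,
           MAX(CASE WHEN ChildProduct = 'SV3' THEN ChildCost END) AS SV3
    FROM SubBills
    GROUP BY BillNumber, ProductCode, STCost
    ORDER BY BillNumber
""").fetchall()
for row in rows:
    print(row)
```

For bill 4 this yields SV1 = NULL and SV2 = 20, and bill 7's SV4 cost does not appear because there is no SV4 column; that differs from the question's expected row for bill 4.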
You could write a function that builds your query based on the SV numbers,
then use EXECUTE IMMEDIATE to execute the query string and PIPE ROW to return the result rows.
Check this PIPE ROW example.
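The suggestion above is Oracle dynamic SQL (EXECUTE IMMEDIATE plus a pipelined function). As a language-neutral sketch of the same idea, the code below builds one pivot column per SV code found in the data and then runs the generated query. It uses Python and SQLite rather than PL/SQL; the helper build_pivot_sql is hypothetical, and treating codes that start with "SV" as associated services is an assumption.

```python
import sqlite3


def build_pivot_sql(codes):
    """Hypothetical helper: generate one MAX(CASE ...) column per SV code."""
    cols = []
    for code in codes:
        literal = code.replace("'", "''")            # escape as a SQL string literal
        alias = '"' + code.replace('"', '""') + '"'  # quote as a SQL identifier
        cols.append(f"MAX(CASE WHEN c.ProductCode = '{literal}' THEN c.Cost END) AS {alias}")
    return (
        "SELECT p.BillNumber AS BillNumber, p.ProductCode AS ProductCode, p.Cost AS ST_cost, "
        + ", ".join(cols)
        + " FROM TableA pa"
        " JOIN TableB p ON p.BillNumber = pa.BillNumber"   # parent's own product
        " JOIN TableA ca ON ca.Reference = pa.BillNumber"  # its child bills
        " JOIN TableB c ON c.BillNumber = ca.BillNumber"   # children's products
        " WHERE pa.Reference IS NULL"
        " GROUP BY p.BillNumber, p.ProductCode, p.Cost"
        " ORDER BY p.BillNumber"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (BillNumber INTEGER PRIMARY KEY, Reference INTEGER)")
conn.execute("CREATE TABLE TableB (BillNumber INTEGER PRIMARY KEY, ProductCode TEXT, Cost INTEGER)")
conn.executemany("INSERT INTO TableA VALUES (?, ?)", [
    (1, None), (2, 1), (3, 1), (4, None), (5, 4), (6, 1), (7, 4), (8, None), (9, 8)])
conn.executemany("INSERT INTO TableB VALUES (?, ?, ?)", [
    (1, "ST1", 10), (2, "SV1", 20), (3, "SV2", 30), (4, "ST2", 10), (5, "SV2", 20),
    (6, "SV3", 30), (7, "SV4", 40), (8, "ST1", 50), (9, "SV1", 10)])

# Discover the SV codes at run time, so new codes get a column without editing SQL.
codes = [r[0] for r in conn.execute(
    "SELECT DISTINCT ProductCode FROM TableB WHERE ProductCode LIKE 'SV%' ORDER BY ProductCode")]
cur = conn.execute(build_pivot_sql(codes))
header = [d[0] for d in cur.description]
rows = cur.fetchall()
print(header)
for row in rows:
    print(row)
```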
I don't understand where the "SV1" value comes from on the second row.
But your problem is basically conditional aggregation:
with ab as (
select a.*, b.productcode, b.cost,
coalesce(a.reference, a.billnumber) as parent_billnumber
from a join
b
on b.billnumber = a.billnumber
)
select parent_billnumber,
max(case when reference is null then productcode end) as st,
sum(case when reference is null then cost end) as st_cost,
sum(case when productcode = 'SV1' then cost end) as sv1,
sum(case when productcode = 'SV2' then cost end) as sv2,
sum(case when productcode = 'SV3' then cost end) as sv3
from ab
group by parent_billnumber
order by parent_billnumber;
Here is a db<>fiddle.
Note that this works because you have only one level of child relationships. If there are more levels, recursive CTEs are needed; if that is possible in your data, I would recommend asking a new question.
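For the multi-level case, here is a minimal sketch of the recursive-CTE idea in SQLite: each bill is resolved to its top-level (root) bill, and that root replaces coalesce(a.reference, a.billnumber) as the grouping key in the same conditional aggregation. Bill 10 (SV3, cost 5) referencing child bill 2 is a hypothetical grandchild added to show the extra level. The recursion assumes the references contain no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (billnumber INTEGER PRIMARY KEY, reference INTEGER)")
conn.execute("CREATE TABLE b (billnumber INTEGER PRIMARY KEY, productcode TEXT, cost INTEGER)")
# Question data plus a hypothetical grandchild: bill 10 (SV3, cost 5) references child bill 2.
conn.executemany("INSERT INTO a VALUES (?, ?)", [
    (1, None), (2, 1), (3, 1), (4, None), (5, 4), (6, 1), (7, 4), (8, None), (9, 8), (10, 2)])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)", [
    (1, "ST1", 10), (2, "SV1", 20), (3, "SV2", 30), (4, "ST2", 10), (5, "SV2", 20),
    (6, "SV3", 30), (7, "SV4", 40), (8, "ST1", 50), (9, "SV1", 10), (10, "SV3", 5)])

rows = conn.execute("""
    WITH RECURSIVE roots(billnumber, root) AS (
        -- anchor: every top-level bill is its own root
        SELECT billnumber, billnumber FROM a WHERE reference IS NULL
        UNION ALL
        -- recursive step: a child inherits the root of the bill it references
        SELECT a.billnumber, r.root
        FROM a JOIN roots r ON a.reference = r.billnumber
    )
    SELECT r.root AS parent_billnumber,
           MAX(CASE WHEN a.reference IS NULL THEN b.productcode END) AS st,
           SUM(CASE WHEN a.reference IS NULL THEN b.cost END) AS st_cost,
           SUM(CASE WHEN b.productcode = 'SV1' THEN b.cost END) AS sv1,
           SUM(CASE WHEN b.productcode = 'SV2' THEN b.cost END) AS sv2,
           SUM(CASE WHEN b.productcode = 'SV3' THEN b.cost END) AS sv3
    FROM roots r
    JOIN a ON a.billnumber = r.billnumber
    JOIN b ON b.billnumber = r.billnumber
    GROUP BY r.root
    ORDER BY r.root
""").fetchall()
for row in rows:
    print(row)
```

The grandchild's cost lands under its root bill 1, so sv3 there becomes 30 + 5 = 35.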
The CTE doesn't actually add much to the query, so you can also write:
select coalesce(a.reference, a.billnumber) as parent_billnumber ,
max(case when a.reference is null then productcode end) as st,
sum(case when a.reference is null then b.cost end) as st_cost,
sum(case when b.productcode = 'SV1' then b.cost end) as sv1,
sum(case when b.productcode = 'SV2' then b.cost end) as sv2,
sum(case when b.productcode = 'SV3' then b.cost end) as sv3
from a join
b
on b.billnumber = a.billnumber
group by coalesce(a.reference, a.billnumber)
order by parent_billnumber;

Grouping by column and rows

I have a table like this:
+----+--------------+--------+----------+
| id | name | weight | some_key |
+----+--------------+--------+----------+
| 1 | strawberries | 12 | 1 |
| 2 | blueberries | 7 | 1 |
| 3 | elderberries | 0 | 1 |
| 4 | cranberries | 8 | 2 |
| 5 | raspberries | 18 | 2 |
+----+--------------+--------+----------+
I'm looking for a generic query that returns all berries where there are three entries with the same some_key and one of those three entries has weight = 0.
For the sample table, the expected output would be:
1 strawberries
2 blueberries
3 elderberries
As you want to include non-grouped columns, I would approach this with window functions:
select id, name
from (
select id,
name,
count(*) over w as key_count,
count(*) filter (where weight = 0) over w as num_zero_weight
from fruits
window w as (partition by some_key)
) x
where x.key_count = 3
and x.num_zero_weight >= 1
The count(*) over w counts the number of rows in that group (= partition) and the count(*) filter (where weight = 0) over w counts how many of those have a weight of zero.
The window w as ... avoids repeating the same partition by clause for the window functions.
Online example: https://rextester.com/SGWFI49589
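The window-function query can also be tried from Python with SQLite, which supports named WINDOW definitions and FILTER on aggregate window functions (this assumes SQLite 3.25 or later; the answer itself targets PostgreSQL). The table name fruits is taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("CREATE TABLE fruits (id INTEGER, name TEXT, weight INTEGER, some_key INTEGER)")
conn.executemany("INSERT INTO fruits VALUES (?, ?, ?, ?)", [
    (1, "strawberries", 12, 1), (2, "blueberries", 7, 1), (3, "elderberries", 0, 1),
    (4, "cranberries", 8, 2), (5, "raspberries", 18, 2),
])

rows = conn.execute("""
    SELECT id, name
    FROM (
        SELECT id, name,
               COUNT(*) OVER w AS key_count,                               -- rows in this some_key
               COUNT(*) FILTER (WHERE weight = 0) OVER w AS num_zero_weight  -- zero-weight rows
        FROM fruits
        WINDOW w AS (PARTITION BY some_key)
    ) x
    WHERE x.key_count = 3
      AND x.num_zero_weight >= 1
    ORDER BY id
""").fetchall()
print(rows)
```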
Try this:
SELECT some_key,
SUM(weight) --Sample aggregations on column
FROM your_table
GROUP BY some_key
HAVING COUNT(*) = 3 -- If you want at least 3, use >= 3
AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
Based on your edited question, you can try this:
SELECT id, name
FROM your_table
WHERE some_key IN (
SELECT some_key
FROM your_table
GROUP BY some_key
HAVING COUNT(*) = 3 -- If you want at least 3, use >= 3
AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
)
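A quick check of the IN/HAVING version, run from Python against SQLite with the sample rows (SQLite is a stand-in for the asker's database, an assumption). The HAVING clause keeps keys that have three rows and at least one zero weight; the outer query then returns every row for those keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT, weight INTEGER, some_key INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", [
    (1, "strawberries", 12, 1), (2, "blueberries", 7, 1), (3, "elderberries", 0, 1),
    (4, "cranberries", 8, 2), (5, "raspberries", 18, 2),
])

rows = conn.execute("""
    SELECT id, name
    FROM your_table
    WHERE some_key IN (
        SELECT some_key
        FROM your_table
        GROUP BY some_key
        HAVING COUNT(*) = 3                                     -- three rows for this key
           AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1 -- at least one zero weight
    )
    ORDER BY id
""").fetchall()
print(rows)
```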
Try doing this.
Table structure and sample data
CREATE TABLE tmp (
id int,
name varchar(50),
weight int,
some_key int
);
INSERT INTO tmp
VALUES
('1', 'strawberries', '12', '1'),
('2', 'blueberries', '7', '1'),
('3', 'elderberries', '0', '1'),
('4', 'cranberries', '8', '2'),
('5', 'raspberries', '18', '2');
Query
SELECT t1.*
FROM tmp t1
INNER JOIN (SELECT some_key
FROM tmp
GROUP BY some_key
HAVING Count(some_key) >= 3
AND Min(Abs(weight)) = 0) t2
ON t1.some_key = t2.some_key;
Output
+-----+---------------+---------+----------+
| id | name | weight | some_key |
+-----+---------------+---------+----------+
| 1 | strawberries | 12 | 1 |
| 2 | blueberries | 7 | 1 |
| 3 | elderberries | 0 | 1 |
+-----+---------------+---------+----------+
Online Demo: http://sqlfiddle.com/#!15/70cca/26/0
Thank you, @mkRabbani, for reminding me about negative values.
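Why ABS matters can be shown with one extra group. In this sketch (SQLite from Python; the berries in group 3 and their weights are hypothetical), group 3 contains both -5 and 0: MIN(weight) is -5, so the zero is hidden, while MIN(ABS(weight)) is 0 and the group is found.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (id INTEGER, name TEXT, weight INTEGER, some_key INTEGER)")
conn.executemany("INSERT INTO tmp VALUES (?, ?, ?, ?)", [
    (1, "strawberries", 12, 1), (2, "blueberries", 7, 1), (3, "elderberries", 0, 1),
    (4, "cranberries", 8, 2), (5, "raspberries", 18, 2),
    # Hypothetical extra group containing a negative weight.
    (6, "gooseberries", -5, 3), (7, "mulberries", 0, 3), (8, "currants", 4, 3),
])


def keys_with(min_expr):
    """Keys with at least three rows where MIN(min_expr) = 0 (min_expr is a fixed string)."""
    sql = ("SELECT some_key FROM tmp GROUP BY some_key "
           f"HAVING COUNT(some_key) >= 3 AND MIN({min_expr}) = 0 ORDER BY some_key")
    return [k for (k,) in conn.execute(sql)]


plain = keys_with("weight")          # group 3 is missed: its minimum is -5
with_abs = keys_with("ABS(weight)")  # group 3 is found: the smallest absolute value is 0
print(plain, with_abs)
```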

SQL query for sales report by date

I have a table of sales leads:
CREATE TABLE "lead" (
"id" serial NOT NULL PRIMARY KEY,
"marketer" varchar(500) NOT NULL,
"date_set" varchar(500) NOT NULL
)
;
INSERT INTO lead VALUES (1, 'Joe', '05/01/13');
INSERT INTO lead VALUES (2, 'Joe', '05/02/13');
INSERT INTO lead VALUES (3, 'Joe', '05/03/13');
INSERT INTO lead VALUES (4, 'Sally', '05/03/13');
INSERT INTO lead VALUES (5, 'Sally', '05/03/13');
INSERT INTO lead VALUES (6, 'Andrew', '05/04/13');
I want to produce a report that summarizes the number of records each marketer has for each day. It should look like this:
| MARKETER | 05/01/13 | 05/02/13 | 05/03/13 | 05/04/13 |
--------------------------------------------------------
| Joe | 1 | 1 | 1 | 0 |
| Sally | 0 | 0 | 2 | 0 |
| Andrew | 0 | 0 | 0 | 1 |
What's the SQL query to produce this?
I have this example set up on SQL Fiddle: http://sqlfiddle.com/#!12/eb27a/1
Pure SQL cannot produce such a structure (it is two-dimensional, but SQL returns a flat list of records).
You could write a query like this:
select marketer, date_set, count(id)
from lead
group by marketer, date_set;
Then visualize this data with your reporting system.
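As a sketch of that split between SQL and the reporting side, the code below runs the flat GROUP BY query in SQLite and then pivots the rows into a marketer-by-date matrix in Python. SQLite replaces PostgreSQL here (an assumption), and sorting the date_set strings works only because every sample date shares the same month and year.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lead (id INTEGER PRIMARY KEY, marketer TEXT NOT NULL, date_set TEXT NOT NULL)")
conn.executemany("INSERT INTO lead VALUES (?, ?, ?)", [
    (1, "Joe", "05/01/13"), (2, "Joe", "05/02/13"), (3, "Joe", "05/03/13"),
    (4, "Sally", "05/03/13"), (5, "Sally", "05/03/13"), (6, "Andrew", "05/04/13"),
])

# Step 1: the flat GROUP BY query from the answer.
flat = conn.execute(
    "SELECT marketer, date_set, COUNT(id) FROM lead GROUP BY marketer, date_set").fetchall()

# Step 2: pivot in application code; dates become columns, missing cells default to 0.
dates = sorted({date_set for _, date_set, _ in flat})
counts = defaultdict(dict)
for marketer, date_set, n in flat:
    counts[marketer][date_set] = n
matrix = {m: [cells.get(d, 0) for d in dates] for m, cells in counts.items()}

print("MARKETER", dates)
for marketer, row in sorted(matrix.items()):
    print(marketer, row)
```

Because the columns are derived from the data, new dates appear without changing the SQL, which is the advantage of pivoting outside the query.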
You can do it like this:
select
marketer,
count(case when date_set = '05/01/13' then 1 else null end) as "05/01/13",
count(case when date_set = '05/02/13' then 1 else null end) as "05/02/13",
count(case when date_set = '05/03/13' then 1 else null end) as "05/03/13",
count(case when date_set = '05/04/13' then 1 else null end) as "05/04/13"
from lead
group by marketer
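The COUNT(CASE ...) query can be checked from Python with SQLite as a stand-in for PostgreSQL (an assumption; the query text is unchanged apart from an added ORDER BY). COUNT skips the NULLs produced by the ELSE branch, so each column counts only that day's leads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lead (id INTEGER PRIMARY KEY, marketer TEXT NOT NULL, date_set TEXT NOT NULL)")
conn.executemany("INSERT INTO lead VALUES (?, ?, ?)", [
    (1, "Joe", "05/01/13"), (2, "Joe", "05/02/13"), (3, "Joe", "05/03/13"),
    (4, "Sally", "05/03/13"), (5, "Sally", "05/03/13"), (6, "Andrew", "05/04/13"),
])

rows = conn.execute("""
    SELECT marketer,
           COUNT(CASE WHEN date_set = '05/01/13' THEN 1 ELSE NULL END) AS "05/01/13",
           COUNT(CASE WHEN date_set = '05/02/13' THEN 1 ELSE NULL END) AS "05/02/13",
           COUNT(CASE WHEN date_set = '05/03/13' THEN 1 ELSE NULL END) AS "05/03/13",
           COUNT(CASE WHEN date_set = '05/04/13' THEN 1 ELSE NULL END) AS "05/04/13"
    FROM lead
    GROUP BY marketer
    ORDER BY marketer
""").fetchall()
for row in rows:
    print(row)
```

The trade-off compared with pivoting in the reporting layer is that every date column must be written into the query in advance.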