SQL GROUP BY (DATEPART(), field1): show zero counts instead of missing rows

I want to aggregate counts, grouped by a datepart and column.
For example, consider a table with 3 columns (id, name, date), where each row represents a unique event.
I want to select total counts grouped by name and hour, with zeros where there are no events. If I were only grouping by name, I could join with a table of every name. If I were only grouping by hour, I could do something similar.
How would I handle the case of grouping by both without having a table with a row for every name+hour combination?

The following is a MySQL solution:
create table hours (hour int);
insert into hours (hour) values (0), (1), ..., (23);
select hours.hour, event.name, sum(case when event.name is null then 0 else 1 end) as cnt
from hours left outer join
event on (hour(event.date) = hours.hour)
group by hours.hour, event.name;
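The sum(case when name is null then 0 else 1 end) expression handles hours that have no events: the left outer join returns a single row with a NULL name for such an hour, so its count shows as 0. Every other matching row contributes 1 to the sum. Note that this only fills in completely empty hours. A name that has no events in an hour where other names do have events gets no row at all, which is the case the cross-join answer below addresses.
For SQL Server, use datepart(hour, event.date) instead of hour(event.date). The rest should be similar.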
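To see what the hours-table query actually returns, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is only a stand-in: it has no hour() or datepart(), so strftime('%H', ...) cast to an integer is used instead, and the sample rows are hypothetical. Empty hours come back as one row with a NULL name and a count of 0, while bob, who has no event at 11:00 even though alice does, gets no row for that hour.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL / SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER, name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?)",
    [(1, "alice", "2024-01-01 09:15:00"),
     (2, "bob", "2024-01-01 09:45:00"),
     (3, "alice", "2024-01-01 11:05:00")],
)

# Hours table with one row per hour of the day, 0 through 23.
conn.execute("CREATE TABLE hours (hour INTEGER)")
conn.executemany("INSERT INTO hours VALUES (?)", [(h,) for h in range(24)])

# strftime('%H', ...) plays the role of MySQL's hour().
rows = conn.execute("""
    SELECT hours.hour, event.name,
           SUM(CASE WHEN event.name IS NULL THEN 0 ELSE 1 END) AS cnt
    FROM hours
    LEFT OUTER JOIN event
      ON CAST(strftime('%H', event.date) AS INTEGER) = hours.hour
    GROUP BY hours.hour, event.name
    ORDER BY hours.hour, event.name
""").fetchall()

for row in rows:
    print(row)
```

The 22 empty hours each yield a single (hour, None, 0) row, and hour 9 yields one row per name.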

You can use a cross join to generate every hour/name combination, and then a left join with count() to fill in the values:
select h.hour, n.name, count(a.name) as cnt
from (select distinct hour(date) as hour from atable) h cross join
(select distinct name from atable) n left join
atable a
on hour(a.date) = h.hour and a.name = n.name
group by h.hour, n.name;
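A minimal sketch of the cross-join query on the same kind of hypothetical sample data, again using SQLite through Python's sqlite3 module, with strftime('%H', ...) standing in for MySQL's hour(). Unlike the hours-table query, every hour/name pair gets a row, so bob shows a count of 0 at hour 11.

```python
import sqlite3

# SQLite stand-in (assumption): strftime('%H', ...) replaces MySQL's hour().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (id INTEGER, name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO atable VALUES (?, ?, ?)",
    [(1, "alice", "2024-01-01 09:15:00"),
     (2, "bob", "2024-01-01 09:45:00"),
     (3, "alice", "2024-01-01 11:05:00")],
)

# h = hours seen in the data, n = names seen in the data; the cross join
# builds every (hour, name) pair, and the left join counts matching events.
rows = conn.execute("""
    SELECT h.hour, n.name, COUNT(a.name) AS cnt
    FROM (SELECT DISTINCT CAST(strftime('%H', date) AS INTEGER) AS hour
          FROM atable) h
    CROSS JOIN (SELECT DISTINCT name FROM atable) n
    LEFT JOIN atable a
      ON CAST(strftime('%H', a.date) AS INTEGER) = h.hour
     AND a.name = n.name
    GROUP BY h.hour, n.name
    ORDER BY h.hour, n.name
""").fetchall()

print(rows)
```

Only hours that occur somewhere in the data appear (10:00 is absent here). To cover every hour of the day, the hours table from the first answer could replace the h subquery.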

Related

Hive SQL: nested queries with a common column

I have a query that includes two subqueries with a common column, 'day'. I would like to show the values in the following way:
day cnt1 cnt_total
But the query I have does not recognize that the day column is shared: it pairs every row of the first subquery with every row of the second.
Is there a way to make it match rows on the day column?
The query looks as follows:
SELECT p1.day, p1.count AS cnt1, p2.count AS cnt_total
FROM
(
SELECT day, COUNT(DISTINCT id) AS count FROM table
WHERE 1=1
AND service="service"
AND action="action"
AND path LIKE "%search%"
AND year="2021"
GROUP BY day
) p1,
(
SELECT day, COUNT(DISTINCT id) AS count FROM table
WHERE 1=1
AND service="service"
AND action="action"
AND year="2021"
GROUP BY day
) p2;
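The comma between the two subqueries is a cross join with no join condition, which is why every row of p1 is paired with every row of p2. You could write an explicit JOIN ... ON p1.day = p2.day, but you should be able to do this with conditional aggregation, so only one SELECT is needed: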
SELECT day,
COUNT(DISTINCT CASE WHEN action = 'mousedown' AND data["path"] LIKE '%go-to-latest-search%' THEN gsid END) AS count,
COUNT(DISTINCT CASE WHEN action = 'impress' THEN gsid END) as cnt_total
FROM hit
WHERE service = 'sauto' AND
year = '2021' AND
month = '07'
GROUP BY day
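Applied to the columns in the question's own query, the same idea folds both subqueries into one pass: the filters shared by p1 and p2 stay in the WHERE clause, and the extra path filter of p1 moves into a CASE inside COUNT(DISTINCT ...). The sketch below runs this on SQLite through Python's sqlite3 module; the table name events and the sample rows are hypothetical stand-ins for the question's table. In Hive the query text should carry over apart from the table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "events" standing in for the question's table.
conn.execute(
    "CREATE TABLE events (id INTEGER, day TEXT, service TEXT, "
    "action TEXT, path TEXT, year TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "01", "service", "action", "/search?q=a", "2021"),
     (1, "01", "service", "action", "/search?q=b", "2021"),
     (2, "01", "service", "action", "/home", "2021"),
     (3, "02", "service", "action", "/home", "2021"),
     (4, "02", "other", "action", "/search", "2021")],
)

rows = conn.execute("""
    SELECT day,
           -- replaces subquery p1: path filter applied inside the CASE
           COUNT(DISTINCT CASE WHEN path LIKE '%search%' THEN id END) AS cnt1,
           -- replaces subquery p2: no path filter
           COUNT(DISTINCT id) AS cnt_total
    FROM events
    WHERE service = 'service'
      AND action = 'action'
      AND year = '2021'
    GROUP BY day
    ORDER BY day
""").fetchall()

print(rows)
```

Because the CASE yields NULL for non-matching rows and COUNT(DISTINCT ...) ignores NULLs, a day with no search hits (day 02 here) still appears, with cnt1 = 0.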

Can I Select DISTINCT on 2 columns and Sum grouped by 1 column in one query?

Is it possible to write one query that groups by 2 columns in a table to get the count of total members, and also gets the sum of one column in that same table grouped by just one of those columns?
For example, the data looks like this
I want to get a count on distinct combinations of columns "OHID" and "MemID" and get the SUM of the "Amount" column grouped by OHID. The result is supposed to look like this
I was able to get the count right using the query below:
SELECT count(*) as TotCount
from (Select DISTINCT OHID, MemID
from #temp) AS TotMembers
However, when I use the query below to get all the results together, I get a count of 15 and a completely different total sum.
SELECT t.OHID,
count(TotMembers.MemID) as TotCount,
sum(t.Amount) as TotalAmount
from (Select DISTINCT OHID, MemID
from #temp) AS TotMembers
join #temp t on t.OHID = TotMembers.OHID
GROUP by t.OHID
If I understand correctly, you want to consider NULL as a valid value. The rest is just aggregation:
select t.ohid,
(count(distinct t.memid) +
(case when count(*) <> count(t.memid) then 1 else 0 end)
) as num_memid,
sum(t.amount) as total_amount
from #temp t
group by t.ohid;
The case logic might be a bit off-putting. It just adds 1 if any memid values in the group are NULL, because count(distinct) ignores NULLs.
You might find this easier to follow with two levels of aggregation:
select t.ohid, count(*) as num_memid, sum(amount) as total_amount
from (select t.ohid, t.memid, sum(t.amount) as amount
from #temp t
group by t.ohid, t.memid
) t
group by t.ohid
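To check that the two queries agree, here is a sketch that runs both on SQLite through Python's sqlite3 module. The table name temp_data stands in for the SQL Server temp table #temp, and the sample rows are hypothetical. They include a NULL memid and repeated (ohid, memid) pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the SQL Server temp table #temp.
conn.execute("CREATE TABLE temp_data (ohid INTEGER, memid INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO temp_data VALUES (?, ?, ?)",
    [(1, 10, 5.0), (1, 10, 2.5), (1, 11, 1.0), (1, None, 4.0),
     (2, 20, 3.0), (2, 20, 3.0)],
)

# One level: COUNT(DISTINCT ...) ignores NULL, so add 1 when any memid is NULL.
one_level = conn.execute("""
    SELECT t.ohid,
           COUNT(DISTINCT t.memid)
             + CASE WHEN COUNT(*) <> COUNT(t.memid) THEN 1 ELSE 0 END AS num_memid,
           SUM(t.amount) AS total_amount
    FROM temp_data t
    GROUP BY t.ohid
    ORDER BY t.ohid
""").fetchall()

# Two levels: the inner GROUP BY puts all NULL memids in one group.
two_level = conn.execute("""
    SELECT t.ohid, COUNT(*) AS num_memid, SUM(t.amount) AS total_amount
    FROM (SELECT ohid, memid, SUM(amount) AS amount
          FROM temp_data
          GROUP BY ohid, memid) t
    GROUP BY t.ohid
    ORDER BY t.ohid
""").fetchall()

print(one_level)
print(two_level)
```

In both results, ohid 1 has three members (10, 11 and the NULL one) and a total of 12.5.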

Showing zeroes in sql count

I'm using Redshift and trying to count different things by day, but a day doesn't show up when its count in table2 is zero. How can I make it show a count of zero?
SELECT TO_CHAR(date1,'dd') AS day,
COUNT(*) as Volume,sum(CASE WHEN status = 'ANSWERED' THEN 1 ELSE 0 END )as ANSWERED , t2.Volume AS TRANSFERS
FROM table1 t1
RIGHT JOIN (SELECT TO_CHAR(date2,'dd') AS day,
COUNT(*) as Volume
FROM table2
WHERE TO_CHAR(date2,'yyyy_MM') IN (SELECT DISTINCT TO_CHAR(date2,'yyyy_MM')
FROM table2
WHERE date2 BETWEEN DATE ('2016-11-01') AND DATE ('2016-12-30'))
AND type = 'Active'
GROUP BY day) t2 ON TO_CHAR(date1,'dd') = day
WHERE TO_CHAR(date1,'yyyy_MM') IN (SELECT DISTINCT TO_CHAR(date1,'yyyy_MM')
FROM table1
WHERE date1 BETWEEN DATE ('2016-11-01') AND DATE ('2016-12-30'))
GROUP BY 1,4
ORDER BY 1
Notice that you used a RIGHT JOIN between the tables. This means that any row from the first table that doesn't have a matching day in the second table will not be displayed.
If you're new to SQL joins, you can refer to this image that explains them.
If your first (left) table contains all of the unique days that should show up in the result, just switch the RIGHT join to a LEFT join. Days with no matching rows in table2 will then have a NULL t2.Volume, so wrap it as COALESCE(t2.Volume, 0) to show 0 instead of NULL.
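A minimal sketch of the LEFT JOIN plus COALESCE fix, run on SQLite through Python's sqlite3 module. It is a simplified version of the question's query: the month-filter subqueries are dropped, strftime('%d', ...) stands in for Redshift's TO_CHAR(..., 'dd'), and the sample rows are hypothetical. Day 02 has calls in table1 but no Active rows in table2, and it still appears, with TRANSFERS = 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (date1 TEXT, status TEXT)")
conn.execute("CREATE TABLE table2 (date2 TEXT, type TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    ("2016-11-01", "ANSWERED"), ("2016-11-01", "MISSED"),
    ("2016-11-02", "ANSWERED"),
])
# No Active transfer on 2016-11-02.
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [("2016-11-01", "Active")])

# LEFT JOIN keeps every day from table1; COALESCE turns the missing
# transfer count (NULL) into 0.
rows = conn.execute("""
    SELECT strftime('%d', t1.date1) AS day,
           COUNT(*) AS volume,
           SUM(CASE WHEN t1.status = 'ANSWERED' THEN 1 ELSE 0 END) AS answered,
           COALESCE(t2.volume, 0) AS transfers
    FROM table1 t1
    LEFT JOIN (SELECT strftime('%d', date2) AS day, COUNT(*) AS volume
               FROM table2
               WHERE type = 'Active'
               GROUP BY strftime('%d', date2)) t2
      ON strftime('%d', t1.date1) = t2.day
    GROUP BY strftime('%d', t1.date1), t2.volume
    ORDER BY 1
""").fetchall()

print(rows)
```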

SQL Percentage Count Query By Date

I am able to calculate the percentage count on a particular date in a Microsoft Access 2007 SQL query using:
SELECT Date, Val, (Count(Val) / (SELECT Count(*) From Table HAVING Date=#7/31/2012#)) as PercentVal
FROM Table
GROUP BY Date, Val
HAVING Date=#7/31/2012#
However, I would like to make this same calculation for every date, using each date's count total. For instance, the query:
SELECT Date, Val, Count(*) AS CountVal
FROM Table
GROUP BY Date, Val
finds the counts in every period. I would like to add a column with the percent counts, but I can't figure out how to calculate the count percentage for every period without repeating the query above for each individual period.
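You can join to a subquery that computes each date's total count, like this (Table1 and ADate stand in for the question's Table and Date):
SELECT A.ADate, A.Val, COUNT(A.Val) / B.DateCount AS PercentVal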
FROM Table1 AS A
INNER JOIN (
SELECT C.ADate, COUNT(*) AS DateCount
FROM Table1 C
GROUP BY C.ADate
) AS B ON A.ADate = B.ADate
GROUP BY A.ADate, A.Val, B.DateCount
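A sketch of the same join-to-totals query on SQLite through Python's sqlite3 module, with hypothetical sample rows. One assumption-driven change: SQLite divides two integers as integers, so the count is multiplied by 1.0 first. Access's / operator already returns a floating-point result, so the query above does not need this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ADate TEXT, Val TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [
    ("2012-07-31", "x"), ("2012-07-31", "x"), ("2012-07-31", "y"),
    ("2012-07-31", "z"), ("2012-08-01", "x"), ("2012-08-01", "y"),
])

# B holds one total per date; each (date, value) count is divided by it.
# The 1.0 * factor avoids SQLite's integer division.
rows = conn.execute("""
    SELECT A.ADate, A.Val,
           1.0 * COUNT(A.Val) / B.DateCount AS PercentVal
    FROM Table1 AS A
    INNER JOIN (SELECT C.ADate, COUNT(*) AS DateCount
                FROM Table1 C
                GROUP BY C.ADate) AS B
      ON A.ADate = B.ADate
    GROUP BY A.ADate, A.Val, B.DateCount
    ORDER BY A.ADate, A.Val
""").fetchall()

print(rows)
```

The fractions for each date add up to 1. Multiply by 100 if a percentage rather than a fraction is wanted.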

Hive SQL: merge multiple aggregate queries into one

I have a series of SQL queries like:
select count(distinct userId) from table where hour >= 0 and hour <= 0;
select count(distinct userId) from table where hour >= 0 and hour <= 1;
select count(distinct userId) from table where hour >= 0 and hour <= 2;
...
select count(distinct userId) from table where hour >= 0 and hour <= 14;
Is there a way to merge them into one query?
It looks like you are trying to keep a cumulative count, bracketed by the hour. To do that, you can use a window function, like this:
SELECT DISTINCT
A.hour AS hour,
SUM(COALESCE(M.include, 0)) OVER (ORDER BY A.hour) AS cumulative_count
FROM ( -- get each distinct (name, hour) pair, with 0 for include
SELECT DISTINCT
name,
hour,
0 AS include
FROM
table
) A
LEFT JOIN
( -- get the lowest `hour` for each `name`, with 1 for include
SELECT
name,
MIN(hour) AS hour,
1 AS include
FROM
table
GROUP BY
name
) M
ON M.name = A.name
AND M.hour = A.hour
;
There might be a simpler way, but this should yield the correct answer in general.
Explanation:
This uses 2 subqueries against the same input table, with a derived field called include that tracks which records should contribute to the running total for each hour. (Here `name` plays the role of `userId` from the question.) The first subquery takes every distinct (name, hour) pair in the table and assigns 0 AS include. The DISTINCT matters: without it, a name that appears several times in its first hour would match the second subquery several times and be counted more than once. The second subquery finds each unique name and the lowest hour in which it appears, and assigns 1 AS include. The enclosing query LEFT JOINs the two subqueries.
The outermost query uses COALESCE(M.include, 0) to fill in the NULLs produced by the LEFT JOIN, and those 1s and 0s are summed by a window ordered by hour, so each hour's value counts every name whose first appearance is at or before that hour. This needs to be a SELECT DISTINCT rather than a GROUP BY: because include appears unaggregated inside the window function, a GROUP BY would have to list both hour and include, and grouping that way collapses all the include=1 rows of an hour into a single row, which is then summed as 1. DISTINCT is applied after the window SUM, so it only removes the duplicate (hour, cumulative_count) output rows without discarding any input rows.
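As a sanity check, the sketch below runs the window-function approach on SQLite through Python's sqlite3 module and compares it with the question's original series of queries, one per upper bound on hour. SQLite needs version 3.25 or later for window functions. The table name logs and the sample rows are hypothetical. Subquery A selects DISTINCT (name, hour) pairs so that a user seen more than once in their first hour (u1 in the sample) contributes 1, not 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "logs"; `table` itself is a reserved word in SQLite.
conn.execute("CREATE TABLE logs (name TEXT, hour INTEGER)")
conn.executemany("INSERT INTO logs VALUES (?, ?)", [
    ("u1", 0), ("u1", 0), ("u2", 0),   # u1 appears twice in its first hour
    ("u1", 1), ("u3", 1),
    ("u2", 2),
])

# Running count of names by the hour in which each name first appears.
cumulative = conn.execute("""
    SELECT DISTINCT
        A.hour AS hour,
        SUM(COALESCE(M.include, 0)) OVER (ORDER BY A.hour) AS cumulative_count
    FROM (SELECT DISTINCT name, hour, 0 AS include FROM logs) A
    LEFT JOIN (SELECT name, MIN(hour) AS hour, 1 AS include
               FROM logs GROUP BY name) M
      ON M.name = A.name AND M.hour = A.hour
    ORDER BY 1
""").fetchall()

# Reference: the question's queries, one COUNT(DISTINCT ...) per upper bound.
expected = [
    (h, conn.execute(
        "SELECT COUNT(DISTINCT name) FROM logs WHERE hour >= 0 AND hour <= ?",
        (h,)).fetchone()[0])
    for h in range(3)
]

print(cumulative)
print(expected)
```

The default window frame with ORDER BY includes all rows that share the current hour, which is why every row of a given hour carries the same running total and SELECT DISTINCT can reduce them to one row per hour.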