SQLite: Multiple aggregate columns - sql

I'm fairly new to the SQL world and still learning the ins and outs of the language.
I have a table with an id, dayOfWeek, and a count for each day. So any given id might appear in the table up to seven times, with a count of events for each day for each id. I'd like to restructure the table to have a single row for each id with a column for each day of the week, something like the following obviously incorrect query:
SELECT id, sum(numEvents where dayOfWeek = 0), sum(numEvents where dayOfWeek = 1) ... from t;
Is there a solid way to approach this?
EDIT:
I'm worried I may not have been very clear. The table would ideally be structured something like this:
id | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday
0 | 13 | 45 | 142 | 3 | 36 | 63 | 15336
1 | 17 | 25 | 45 | 364 | 37 | 540 | 0
So event 0 occurred 13 times on Sunday, 45 on Monday, etc... My current table looks like this:
id | dayOfWeek | count
0 | 0 | 13
0 | 1 | 45
0 | 2 | 142
0 | 3 | 3
0 | 4 | 36
0 | 5 | 63
0 | 6 | 15336
1 | 0 | 17
1 | 1 | 25
...
Hope that helps clear up what I'm after.

The following is verbose, but should work (shown on a generic ideone SQL demo because, unfortunately, SQLite on SQL Fiddle is down at the moment):
SELECT id,
       SUM(CASE WHEN dayOfWeek = 0 THEN numEvents ELSE 0 END) AS Day0Events, -- Sunday
       SUM(CASE WHEN dayOfWeek = 1 THEN numEvents ELSE 0 END) AS Day1Events, -- Monday
       SUM(CASE WHEN dayOfWeek = 2 THEN numEvents ELSE 0 END) AS Day2Events  -- Tuesday
       --, etc. through dayOfWeek = 6
FROM EventTable
GROUP BY id;
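To try the conditional-aggregation approach end to end, here is a minimal sketch that runs the full seven-column query against an in-memory SQLite database through Python's sqlite3 module. It assumes the table is called t with columns id, dayOfWeek and numEvents, as in the question's pseudo-query (the sample table labels the last column count), and that dayOfWeek runs from 0 (Sunday) to 6 (Saturday). The function name pivot_by_day is made up for the illustration.

```python
import sqlite3

# Assumed schema: t(id, dayOfWeek, numEvents), with dayOfWeek 0 = Sunday .. 6 = Saturday.
PIVOT_SQL = """
SELECT id,
       SUM(CASE WHEN dayOfWeek = 0 THEN numEvents ELSE 0 END) AS Sunday,
       SUM(CASE WHEN dayOfWeek = 1 THEN numEvents ELSE 0 END) AS Monday,
       SUM(CASE WHEN dayOfWeek = 2 THEN numEvents ELSE 0 END) AS Tuesday,
       SUM(CASE WHEN dayOfWeek = 3 THEN numEvents ELSE 0 END) AS Wednesday,
       SUM(CASE WHEN dayOfWeek = 4 THEN numEvents ELSE 0 END) AS Thursday,
       SUM(CASE WHEN dayOfWeek = 5 THEN numEvents ELSE 0 END) AS Friday,
       SUM(CASE WHEN dayOfWeek = 6 THEN numEvents ELSE 0 END) AS Saturday
FROM t
GROUP BY id
ORDER BY id
"""


def pivot_by_day(rows):
    """Load (id, dayOfWeek, numEvents) rows and return one pivoted row per id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER, dayOfWeek INTEGER, numEvents INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return conn.execute(PIVOT_SQL).fetchall()
    finally:
        conn.close()


# Sample rows taken from the question (id 1 only has data for two days).
sample = [
    (0, 0, 13), (0, 1, 45), (0, 2, 142), (0, 3, 3),
    (0, 4, 36), (0, 5, 63), (0, 6, 15336),
    (1, 0, 17), (1, 1, 25),
]

for row in pivot_by_day(sample):
    print(row)
```

Days with no row for a given id come out as 0 because of the ELSE 0 branch; without it, SUM would return NULL for those columns.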

SELECT dayOfWeek, sum(numEvents) as numberOfEvents
FROM t
GROUP BY dayOfWeek;

Related

SQL "Group" and "Count" categories

Edit: This is a follow-up to another question. To simplify the question, assume a table:
date | id | type
01/01 | 1 | F
02/01 | 1 | F
02/01 | 1 | F
03/01 | 1 | S
03/01 | 1 | S
04/01 | 1 | F
04/01 | 1 | S
05/01 | 1 | S
I am looking for a way to summarise the above table by combination of transaction types per day. If a person (id) has only one transaction per day it counts as a Single type. If they have more than one it counts as a Multiple one. I've done that with my original query and it works. The output from the above table would be:
date | Single | Multiple
01/01 | 1 | 0
02/01 | 0 | 1
03/01 | 0 | 1
04/01 | 0 | 1
05/01 | 1 | 0
I got that far and it works. What I'm struggling with (i.e. I don't have a clue how to start) is how to set up a query that shows all possible combinations of type (SS, FF, FS) instead of just counting the multiple transactions. The desired output would be:
date | Single | # FF | # FS | # SS
01/01 | 1 | 0 | 0 | 0
02/01 | 0 | 1 | 0 | 0
03/01 | 0 | 0 | 0 | 1
04/01 | 0 | 0 | 1 | 0
05/01 | 1 | 0 | 0 | 0
Any constructive hints or ideas will be much appreciated.
This assumes that you have at most two types per date.
You can use a CASE WHEN expression with MIN() and MAX() to check for the combinations FF, FS, or SS:
select [date],
       case when count(*) = 1 then 1 else 0 end as Single,
       case when count(*) >= 2
             and min([type]) = 'F'
             and max([type]) = 'F'
            then 1
            else 0
       end as [# FF],
       case when count(*) >= 2
             and min([type]) = 'F'
             and max([type]) = 'S'
            then 1
            else 0
       end as [# FS],
       case when count(*) >= 2
             and min([type]) = 'S'
             and max([type]) = 'S'
            then 1
            else 0
       end as [# SS]
from yourtable
group by [date]
EDIT :
For more than two transactions per date, use count(*) >= 2 instead of count(*) = 2 (as the query above already does). This works as long as the types are only F or S.
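The MIN()/MAX() trick can be checked against the question's sample data with a small sketch. This one uses SQLite through Python's sqlite3 module rather than the bracket-quoted dialect of the answer; the table name tx, the plain column aliases and the function name combos_per_date are assumptions made for the illustration. Like the answer, it groups by date only.

```python
import sqlite3

# With only two possible types, MIN and MAX identify the combination:
# MIN = MAX = 'F' means all F, MIN = MAX = 'S' means all S,
# MIN = 'F' and MAX = 'S' means a mix of both.
COMBO_SQL = """
SELECT date,
       CASE WHEN COUNT(*) = 1 THEN 1 ELSE 0 END AS single,
       CASE WHEN COUNT(*) >= 2 AND MIN(type) = 'F' AND MAX(type) = 'F'
            THEN 1 ELSE 0 END AS ff,
       CASE WHEN COUNT(*) >= 2 AND MIN(type) = 'F' AND MAX(type) = 'S'
            THEN 1 ELSE 0 END AS fs,
       CASE WHEN COUNT(*) >= 2 AND MIN(type) = 'S' AND MAX(type) = 'S'
            THEN 1 ELSE 0 END AS ss
FROM tx
GROUP BY date
ORDER BY date
"""


def combos_per_date(rows):
    """Load (date, id, type) rows and return one flag row per date."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tx (date TEXT, id INTEGER, type TEXT)")
        conn.executemany("INSERT INTO tx VALUES (?, ?, ?)", rows)
        return conn.execute(COMBO_SQL).fetchall()
    finally:
        conn.close()


# Sample rows from the question.
sample = [
    ("01/01", 1, "F"),
    ("02/01", 1, "F"), ("02/01", 1, "F"),
    ("03/01", 1, "S"), ("03/01", 1, "S"),
    ("04/01", 1, "F"), ("04/01", 1, "S"),
    ("05/01", 1, "S"),
]

for row in combos_per_date(sample):
    print(row)
```

If several ids can share a date, grouping by date and id would likely be needed first, which the answer does not cover.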

Creating columns by subaggregating by condition in Snowflake SQL

I have following table:
id1 | id2 | n_products | daydiff
a | 1 | 12 | 12
a | 1 | 11 | 13
a | 1 | 90 | 46
a | 2 | 5 | 5
b | 2 | 15 | 15
b | 2 | 15 | 21
c | 3 | 90 | 7
I need to aggregate this table by id1 and id2, splitting the rows on daydiff in the following manner:
if daydiff is less than 14
if daydiff is between 14 and 28
if daydiff is more than 28.
Each group should be aggregated using the mean.
The result should be:
id1 | id2 | sub 14 | 14_28 | 28+
a | 1 | 11.5 | 0 | 46
a | 2 | 5 | 0 | 0
b | 2 | 0 | 15 | 0
c | 3 | 7 | 0 | 0
How can I achieve this? I guess this would involve some GROUP BY statements, but I am not sure how they should be applied.
Use conditional aggregation:
select id1, id2,
       avg(case when daydiff < 14 then n_products end) as avg_lt14,
       avg(case when daydiff >= 14 and daydiff <= 28 then n_products end) as avg_14_28,
       avg(case when daydiff > 28 then n_products end) as avg_29pl
from t
group by id1, id2;
Some databases calculate the averages of integers as an integer. I don't know if Snowflake does this. If so, then change n_products to n_products * 1.0.
Gordon's answer is correct across platforms, but I prefer Snowflake's IFF syntax:
SELECT id1, id2,
       AVG(IFF(daydiff < 14, n_products, NULL)) as avg_lt14,
       AVG(IFF(daydiff >= 14 and daydiff <= 28, n_products, NULL)) as avg_14_28,
       AVG(IFF(daydiff > 28, n_products, NULL)) as avg_29pl
FROM t
GROUP BY id1, id2;
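A bucket with no matching rows averages to NULL in both queries, whereas the question's expected table shows 0. The sketch below shows one way to handle that with COALESCE. It runs on SQLite through Python's sqlite3 module, not on Snowflake, so it uses CASE rather than IFF; it averages n_products per bucket as the answers do, and the function name bucket_averages is made up for the illustration.

```python
import sqlite3

# CASE without ELSE yields NULL for non-matching rows, and AVG ignores NULLs.
# COALESCE turns an empty bucket (AVG = NULL) into 0.
BUCKET_SQL = """
SELECT id1, id2,
       COALESCE(AVG(CASE WHEN daydiff < 14 THEN n_products END), 0) AS avg_lt14,
       COALESCE(AVG(CASE WHEN daydiff BETWEEN 14 AND 28 THEN n_products END), 0) AS avg_14_28,
       COALESCE(AVG(CASE WHEN daydiff > 28 THEN n_products END), 0) AS avg_gt28
FROM t
GROUP BY id1, id2
ORDER BY id1, id2
"""


def bucket_averages(rows):
    """Load (id1, id2, n_products, daydiff) rows and return bucketed means."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t (id1 TEXT, id2 INTEGER, n_products INTEGER, daydiff INTEGER)"
        )
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        return conn.execute(BUCKET_SQL).fetchall()
    finally:
        conn.close()


# Sample rows from the question.
sample = [
    ("a", 1, 12, 12),
    ("a", 1, 11, 13),
    ("a", 1, 90, 46),
    ("a", 2, 5, 5),
    ("b", 2, 15, 15),
    ("b", 2, 15, 21),
    ("c", 3, 90, 7),
]

for row in bucket_averages(sample):
    print(row)
```

SQLite's AVG always returns a floating-point value, so the integer-average concern raised above does not arise in this sketch; other engines may behave differently.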

Showing date even zero value SQL

I have SQL Query:
SELECT Date, Hours, Counts FROM TRANSACTION_DATE
Example Output:
Date | Hours | Counts
----------------------------------
01-Feb-2018 | 20 | 5
03-Feb-2018 | 25 | 3
04-Feb-2018 | 22 | 3
05-Feb-2018 | 21 | 2
07-Feb-2018 | 28 | 1
10-Feb-2018 | 23 | 1
As you can see, some days are missing because there is no data for them, but I want the missing days to be shown with a value of zero:
Date | Hours | Counts
----------------------------------
01-Feb-2018 | 20 | 5
02-Feb-2018 | 0 | 0
03-Feb-2018 | 25 | 3
04-Feb-2018 | 22 | 3
05-Feb-2018 | 21 | 2
06-Feb-2018 | 0 | 0
07-Feb-2018 | 28 | 1
08-Feb-2018 | 0 | 0
09-Feb-2018 | 0 | 0
10-Feb-2018 | 23 | 1
Thank you in advance.
You need to generate a sequence of dates. If there are not too many, a recursive CTE is an easy method:
with dates as (
      select min(date) as dte, max(date) as last_date
      from transaction_date td
      union all
      select dateadd(day, 1, dte), last_date
      from dates
      where dte < last_date
     )
select d.dte as date, coalesce(td.hours, 0) as hours, coalesce(td.counts, 0) as counts
from dates d left join
     transaction_date td
     on d.dte = td.date;
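The same idea can be tried out with a small sketch on SQLite through Python's sqlite3 module. Two assumptions differ from the query above: dates are stored as ISO 'YYYY-MM-DD' text rather than in the 01-Feb-2018 display format, and SQLite's date(x, '+1 day') stands in for dateadd. The function name fill_missing_days is made up for the illustration.

```python
import sqlite3

# Recursive CTE: start at the earliest date and add one day at a time
# until the latest date is reached, then LEFT JOIN the real data onto it.
FILL_SQL = """
WITH RECURSIVE dates(dte, last_date) AS (
    SELECT MIN(date), MAX(date) FROM transaction_date
    UNION ALL
    SELECT date(dte, '+1 day'), last_date
    FROM dates
    WHERE dte < last_date
)
SELECT d.dte,
       COALESCE(td.hours, 0)  AS hours,
       COALESCE(td.counts, 0) AS counts
FROM dates d
LEFT JOIN transaction_date td ON td.date = d.dte
ORDER BY d.dte
"""


def fill_missing_days(rows):
    """Load (date, hours, counts) rows and return one row per calendar day."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE transaction_date (date TEXT, hours INTEGER, counts INTEGER)")
        conn.executemany("INSERT INTO transaction_date VALUES (?, ?, ?)", rows)
        return conn.execute(FILL_SQL).fetchall()
    finally:
        conn.close()


# Sample rows from the question, with dates rewritten in ISO format.
sample = [
    ("2018-02-01", 20, 5),
    ("2018-02-03", 25, 3),
    ("2018-02-04", 22, 3),
    ("2018-02-05", 21, 2),
    ("2018-02-07", 28, 1),
    ("2018-02-10", 23, 1),
]

for row in fill_missing_days(sample):
    print(row)
```

ISO-formatted text dates compare correctly as strings, which is what makes the dte < last_date stop condition work here.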

What is the most performant way to rewrite a correlated subquery in the SELECT clause?

I am trying to count whether a user has visited a site in three time ranges:
last 30 days
between 31 and 60 days
between 61 and 90 days
I am using Netezza, which does NOT support correlated subqueries in the SELECT clause. See Rextester for successful query that must be re-written to NOT use a correlated subquery: http://rextester.com/JGR62033
Sample Data:
| user_id | last_visit | num_days_since_2017117 |
|---------|------------|------------------------|
| 1234 | 2017-11-02 | 15.6 |
| 1234 | 2017-09-30 | 48.6 |
| 1234 | 2017-09-03 | 75.0 |
| 1234 | 2017-08-21 | 88.0 |
| 9876 | 2017-10-03 | 45.0 |
| 9876 | 2017-07-20 | 120.0 |
| 5545 | 2017-09-15 | 63.0 |
Desired Output:
| user_id | last_30 | btwn_31_60 | btwn_61_90 |
|---------|---------|------------|------------|
| 1234 | 1 | 1 | 1 |
| 5545 | 0 | 0 | 1 |
| 9876 | 0 | 1 | 0 |
Here is one way with conditional aggregation (demo on Rextester):
select
    user_id
   ,MAX(case when '2017-11-17'-visit_date <=30
             then 1
             else 0
        end) as last_30
   ,MAX(case when '2017-11-17'-visit_date >=31
              and '2017-11-17'-visit_date <=60
             then 1
             else 0
        end) as btwn_31_60
   ,MAX(case when '2017-11-17'-visit_date >=61
              and '2017-11-17'-visit_date <=90
             then 1
             else 0
        end) as btwn_61_90
from
    visits
group by user_id
order by user_id
I don't know the specific DBMS you're using, but if it supports CASE or an equivalent you don't need a correlated sub-query; you can do it with a combination of SUM() and CASE.
Untested in your DBMS, of course, but it should give you a starting point:
SELECT
    user_id,
    SUM(CASE WHEN num_days <= 30 then 1 else 0 end) as last_30,
    SUM(CASE WHEN num_days > 30 AND num_days < 61 then 1 else 0 end) as btwn_31_60,
    SUM(CASE WHEN num_days >= 61 AND num_days <= 90 then 1 else 0 end) as btwn_61_90
FROM
    YourTableName -- You didn't provide a tablename
GROUP BY
    user_id
Since your values are floating point and not integer, you may need to adjust the values used for the day ranges to work with your specific requirements.
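The two answers differ in one respect worth seeing side by side: MAX over a 0/1 CASE gives a visited-or-not flag, while SUM gives the number of visits in the range. Here is a minimal sketch on SQLite through Python's sqlite3 module (not Netezza). The table name visits, the shortened column name num_days, the extra visits_61_90 column and the choice of half-open ranges for the fractional day values are all assumptions made for the illustration.

```python
import sqlite3

# MAX(CASE ...) yields a 0/1 flag per range; SUM(CASE ...) counts the visits.
# Ranges are written as (30, 60] and (60, 90] so fractional values such as
# 60.5 do not fall into a gap between buckets.
FLAGS_SQL = """
SELECT user_id,
       MAX(CASE WHEN num_days <= 30 THEN 1 ELSE 0 END) AS last_30,
       MAX(CASE WHEN num_days > 30 AND num_days <= 60 THEN 1 ELSE 0 END) AS btwn_31_60,
       MAX(CASE WHEN num_days > 60 AND num_days <= 90 THEN 1 ELSE 0 END) AS btwn_61_90,
       SUM(CASE WHEN num_days > 60 AND num_days <= 90 THEN 1 ELSE 0 END) AS visits_61_90
FROM visits
GROUP BY user_id
ORDER BY user_id
"""


def visit_flags(rows):
    """Load (user_id, num_days) rows and return per-user range flags."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE visits (user_id INTEGER, num_days REAL)")
        conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
        return conn.execute(FLAGS_SQL).fetchall()
    finally:
        conn.close()


# Sample rows from the question (user_id, days since the reference date).
sample = [
    (1234, 15.6), (1234, 48.6), (1234, 75.0), (1234, 88.0),
    (9876, 45.0), (9876, 120.0),
    (5545, 63.0),
]

for row in visit_flags(sample):
    print(row)
```

For user 1234 the flag column btwn_61_90 is 1 while visits_61_90 is 2, which is the difference between the MAX and SUM variants.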

How to insert additional values in between a GROUP BY

I am currently making a monthly report using MySQL. I have a table named "monthly" that looks something like this:
id | date | amount
10 | 2009-12-01 22:10:08 | 7
9 | 2009-11-01 22:10:08 | 78
8 | 2009-10-01 23:10:08 | 5
7 | 2009-07-01 21:10:08 | 54
6 | 2009-03-01 04:10:08 | 3
5 | 2009-02-01 09:10:08 | 456
4 | 2009-02-01 14:10:08 | 4
3 | 2009-01-01 20:10:08 | 20
2 | 2009-01-01 13:10:15 | 10
1 | 2008-12-01 10:10:10 | 5
Then, when I make a monthly report (grouped by month of each year), I get something like this:
yearmonth | total
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-07 | 54
2009-10 | 5
2009-11 | 78
2009-12 | 7
I used this query to achieve the result:
SELECT substring( date, 1, 7 ) AS yearmonth, sum( amount ) AS total
FROM monthly
GROUP BY substring( date, 1, 7 )
But I need something like this:
yearmonth | total
2008-01 | 0
2008-02 | 0
2008-03 | 0
2008-04 | 0
2008-05 | 0
2008-06 | 0
2008-07 | 0
2008-08 | 0
2008-09 | 0
2008-10 | 0
2008-11 | 0
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-04 | 0
2009-05 | 0
2009-06 | 0
2009-07 | 54
2009-08 | 0
2009-09 | 0
2009-10 | 5
2009-11 | 78
2009-12 | 7
Something that would display zeroes for the months that don't have any value. Is it even possible to do that in a MySQL query?
You should generate a dummy rowsource and LEFT JOIN with it:
SELECT year, month, COALESCE(SUM(m.amount), 0) AS total
FROM (
        SELECT 1 AS month
        UNION ALL
        SELECT 2
        …
        UNION ALL
        SELECT 12
     ) months
CROSS JOIN
     (
        SELECT 2008 AS year
        UNION ALL
        SELECT 2009 AS year
     ) years
LEFT JOIN
     monthly m
ON   m.date >= CONCAT_WS('.', year, month, 1)
     AND m.date < CONCAT_WS('.', year, month, 1) + INTERVAL 1 MONTH
GROUP BY
     year, month
You can create these as tables on disk rather than generate them each time.
MySQL is the only system of the major four that does not have an easy way to generate arbitrary resultsets.
Oracle, SQL Server and PostgreSQL do have those (CONNECT BY, recursive CTEs and generate_series, respectively).
Quassnoi is right, and I'll add a comment about how to recognize when you need something like this:
You want '2008-01' in your result, yet nothing in the source table has a date in January, 2008. Result sets have to come from the tables you query, so the obvious conclusion is that you need an additional table - one that contains each month you want as part of your result.
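As a rough illustration of the "additional table" idea, the sketch below builds a small calendar table of year-month values and LEFT JOINs the data onto it. It runs on SQLite through Python's sqlite3 module rather than MySQL; the calendar table name months, its column yearmonth and the function name monthly_totals are assumptions made for the illustration.

```python
import sqlite3

# Every month we want in the result lives in the calendar table; the LEFT JOIN
# keeps months with no data and COALESCE turns their NULL sums into 0.
REPORT_SQL = """
SELECT c.yearmonth, COALESCE(SUM(m.amount), 0) AS total
FROM months c
LEFT JOIN monthly m ON substr(m.date, 1, 7) = c.yearmonth
GROUP BY c.yearmonth
ORDER BY c.yearmonth
"""


def monthly_totals(rows, years):
    """Load (id, date, amount) rows and return a total for every month of `years`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE monthly (id INTEGER, date TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO monthly VALUES (?, ?, ?)", rows)
        # Calendar table: one 'YYYY-MM' row per month of each requested year.
        conn.execute("CREATE TABLE months (yearmonth TEXT PRIMARY KEY)")
        conn.executemany(
            "INSERT INTO months VALUES (?)",
            [(f"{y:04d}-{m:02d}",) for y in years for m in range(1, 13)],
        )
        return conn.execute(REPORT_SQL).fetchall()
    finally:
        conn.close()


# Sample rows from the question.
sample = [
    (10, "2009-12-01 22:10:08", 7),
    (9, "2009-11-01 22:10:08", 78),
    (8, "2009-10-01 23:10:08", 5),
    (7, "2009-07-01 21:10:08", 54),
    (6, "2009-03-01 04:10:08", 3),
    (5, "2009-02-01 09:10:08", 456),
    (4, "2009-02-01 14:10:08", 4),
    (3, "2009-01-01 20:10:08", 20),
    (2, "2009-01-01 13:10:15", 10),
    (1, "2008-12-01 10:10:10", 5),
]

for yearmonth, total in monthly_totals(sample, [2008, 2009]):
    print(yearmonth, total)
```

Joining on substr(date, 1, 7) keeps the sketch short but would prevent an index on date from being used; the range comparison in the answer above avoids that.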