Creating columns by subaggregating by condition in Snowflake SQL


I have the following table:
id1 | id2 | n_products | daydiff
a | 1 | 12 | 12
a | 1 | 11 | 13
a | 1 | 90 | 46
a | 2 | 5 | 5
b | 2 | 15 | 15
b | 2 | 15 | 21
c | 3 | 90 | 7
I need to aggregate this table by id1 and id2, splitting the rows into buckets by daydiff as follows:
daydiff less than 14
daydiff between 14 and 28
daydiff more than 28
Each bucket should be aggregated using the mean.
The result should be:
id1 | id2 | sub 14 | 14_28 | 28+
a | 1 | 11.5 | 0 | 46
a | 2 | 5 | 0 | 0
b | 2 | 0 | 15 | 0
c | 3 | 7 | 0 | 0
How can I achieve this? I guess this would involve some GROUP BY statements, but I am not sure how they should be applied.

Use conditional aggregation:
select id1, id2,
avg(case when daydiff < 14 then n_products end) as avg_lt14,
avg(case when daydiff >= 14 and daydiff <= 28 then n_products end) as avg_14_28,
avg(case when daydiff > 28 then n_products end) as avg_gt28
from t
group by id1, id2;
Some databases calculate the averages of integers as an integer. I don't know if Snowflake does this. If so, then change n_products to n_products * 1.0.

Gordon's answer is correct across platforms, but I prefer Snowflake's IFF syntax:
SELECT id1, id2,
AVG(IFF(daydiff < 14, n_products, NULL)) as avg_lt14,
AVG(IFF(daydiff >= 14 and daydiff <= 28, n_products, NULL)) as avg_14_28,
AVG(IFF(daydiff > 28, n_products, NULL)) as avg_gt28
FROM t
GROUP BY id1, id2;
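To check the conditional aggregation against the sample rows, here is a minimal sketch that runs the CASE version through Python's built-in sqlite3 module. SQLite is only a stand-in for Snowflake here, and the table name t is an assumption. The sketch averages n_products per bucket, as both answers do, and wraps each AVG in COALESCE so that empty buckets show 0 rather than NULL, as in the requested output.

```python
import sqlite3

# SQLite is a stand-in for Snowflake; the table name "t" is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id1 TEXT, id2 INTEGER, n_products INTEGER, daydiff INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("a", 1, 12, 12), ("a", 1, 11, 13), ("a", 1, 90, 46),
        ("a", 2, 5, 5), ("b", 2, 15, 15), ("b", 2, 15, 21),
        ("c", 3, 90, 7),
    ],
)

# A CASE without ELSE yields NULL, and AVG ignores NULLs, so each AVG only
# sees the rows of its own bucket. COALESCE turns an empty bucket into 0.
query = """
SELECT id1, id2,
       COALESCE(AVG(CASE WHEN daydiff < 14 THEN n_products END), 0) AS avg_lt14,
       COALESCE(AVG(CASE WHEN daydiff BETWEEN 14 AND 28 THEN n_products END), 0) AS avg_14_28,
       COALESCE(AVG(CASE WHEN daydiff > 28 THEN n_products END), 0) AS avg_gt28
FROM t
GROUP BY id1, id2
ORDER BY id1, id2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In Snowflake the same query should work unchanged; the IFF form is only a shorter spelling of the CASE expression.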

Related

Query to get the count of data for particular customer with all other data from table

My table structure is as follows:
group_id | cust_id | ticket_num
------------------------------
60 | 12 | 1
60 | 12 | 2
60 | 12 | 3
60 | 12 | 4
60 | 30 | 5
60 | 30 | 6
60 | 31 | 7
60 | 31 | 8
65 | 02 | 1
I want to fetch all the data for group_id=60 and find the count of ticket_num for each customer in that group. My output should be like this:
cust_id | ticket_count | ticket_num
------------------------------
12 | 4 | 1
12 | | 2
12 | | 3
12 | | 4
30 | 2 | 5
30 | | 6
31 | 2 | 7
31 | | 8
I tried this query:
SELECT gd.cust_id, Count(gd.cust_id),gd.ticket_num
FROM Group_details gd
WHERE gd.group_id = 65
GROUP BY gd.cust_id;
But this query is not working.

You appear to want the ANSI/ISO standard row_number() function, plus count() as a window function:
select gd.cust_id, count(*) over (partition by gd.cust_id) as num_tickets,
row_number() over (order by gd.cust_id) as ticket_seqnum
from group_details gd
where gd.group_id = 60;

Use an aggregate subquery and join it back to the table:
select t2.*, t1.ticket_num from Group_details t1
inner join
(
SELECT gd.cust_id, Count(gd.ticket_num) as ticket_count
FROM Group_details gd where gd.group_id = 60
GROUP BY gd.cust_id
) t2 on t1.cust_id=t2.cust_id
where t1.group_id = 60
http://sqlfiddle.com/#!9/dd718b/1
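For comparison, a window-function version that keeps the original ticket_num column can be sketched as below. It runs on SQLite through Python's sqlite3 module purely for illustration (window functions need SQLite 3.25 or later), and the column types are assumptions. Note that the count is repeated on every row of a customer; blanking it on all but the first row, as in the requested output, is usually left to the presentation layer.

```python
import sqlite3

# In-memory copy of the sample table; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE group_details (group_id INTEGER, cust_id INTEGER, ticket_num INTEGER)"
)
conn.executemany(
    "INSERT INTO group_details VALUES (?, ?, ?)",
    [
        (60, 12, 1), (60, 12, 2), (60, 12, 3), (60, 12, 4),
        (60, 30, 5), (60, 30, 6), (60, 31, 7), (60, 31, 8),
        (65, 2, 1),
    ],
)

# COUNT(*) OVER (PARTITION BY ...) attaches the per-customer count to every
# row without collapsing the rows the way GROUP BY would.
query = """
SELECT cust_id,
       COUNT(*) OVER (PARTITION BY cust_id) AS ticket_count,
       ticket_num
FROM group_details
WHERE group_id = 60
ORDER BY cust_id, ticket_num
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```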

Showing date even zero value SQL

I have SQL Query:
SELECT Date, Hours, Counts FROM TRANSACTION_DATE
Example Output:
Date | Hours | Counts
----------------------------------
01-Feb-2018 | 20 | 5
03-Feb-2018 | 25 | 3
04-Feb-2018 | 22 | 3
05-Feb-2018 | 21 | 2
07-Feb-2018 | 28 | 1
10-Feb-2018 | 23 | 1
As you can see, some days are missing because there is no data for them. I want the missing days to be shown with a value of zero:
Date | Hours | Counts
----------------------------------
01-Feb-2018 | 20 | 5
02-Feb-2018 | 0 | 0
03-Feb-2018 | 25 | 3
04-Feb-2018 | 22 | 3
05-Feb-2018 | 21 | 2
06-Feb-2018 | 0 | 0
07-Feb-2018 | 28 | 1
08-Feb-2018 | 0 | 0
09-Feb-2018 | 0 | 0
10-Feb-2018 | 23 | 1
Thank you in advance.

You need to generate a sequence of dates. If there are not too many, a recursive CTE is an easy method:
with dates as (
select min(date) as dte, max(date) as last_date
from transaction_date td
union all
select dateadd(day, 1, dte), last_date
from dates
where dte < last_date
)
select d.dte as date, coalesce(td.hours, 0) as hours, coalesce(td.counts, 0) as counts
from dates d left join
transaction_date td
on d.dte = td.date;
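The same idea can be tried out locally with the sketch below, which uses SQLite through Python's sqlite3 module. This is an illustration under assumptions: the dates are stored as ISO-8601 text, and SQLite's date(x, '+1 day') replaces the dateadd call, so the syntax will differ on other systems.

```python
import sqlite3

# Sample data with dates stored as ISO-8601 text (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transaction_date (date TEXT, hours INTEGER, counts INTEGER)")
conn.executemany(
    "INSERT INTO transaction_date VALUES (?, ?, ?)",
    [
        ("2018-02-01", 20, 5), ("2018-02-03", 25, 3), ("2018-02-04", 22, 3),
        ("2018-02-05", 21, 2), ("2018-02-07", 28, 1), ("2018-02-10", 23, 1),
    ],
)

# The recursive CTE starts at the earliest date and adds one day per step
# until it reaches the latest date; the LEFT JOIN then fills gaps with 0.
query = """
WITH RECURSIVE dates(dte, last_date) AS (
    SELECT MIN(date), MAX(date) FROM transaction_date
    UNION ALL
    SELECT date(dte, '+1 day'), last_date FROM dates WHERE dte < last_date
)
SELECT d.dte,
       COALESCE(td.hours, 0) AS hours,
       COALESCE(td.counts, 0) AS counts
FROM dates d
LEFT JOIN transaction_date td ON td.date = d.dte
ORDER BY d.dte
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```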

What is the most performant way to rewrite a correlated subquery in the SELECT clause?

I am trying to count whether a user has visited a site in three time ranges:
last 30 days
between 31 and 60 days
between 61 and 90 days
I am using Netezza, which does NOT support correlated subqueries in the SELECT clause. See Rextester for successful query that must be re-written to NOT use a correlated subquery: http://rextester.com/JGR62033
Sample Data:
| user_id | last_visit | num_days_since_2017117 |
|---------|------------|------------------------|
| 1234 | 2017-11-02 | 15.6 |
| 1234 | 2017-09-30 | 48.6 |
| 1234 | 2017-09-03 | 75.0 |
| 1234 | 2017-08-21 | 88.0 |
| 9876 | 2017-10-03 | 45.0 |
| 9876 | 2017-07-20 | 120.0 |
| 5545 | 2017-09-15 | 63.0 |
Desired Output:
| user_id | last_30 | btwn_31_60 | btwn_61_90 |
|---------|---------|------------|------------|
| 1234 | 1 | 1 | 1 |
| 5545 | 0 | 0 | 1 |
| 9876 | 0 | 1 | 0 |

Here is one way with conditional aggregation, Rextester:
select
user_id
,MAX(case when '2017-11-17'-visit_date <=30
then 1
else 0
end) as last_30
,MAX(case when '2017-11-17'-visit_date >=31
and '2017-11-17'-visit_date <=60
then 1
else 0
end) as between_31_60
,MAX(case when '2017-11-17'-visit_date >=61
and '2017-11-17'-visit_date <=90
then 1
else 0
end) as between_61_90
from
visits
group by user_id
order by user_id

I don't know the specific DBMS you're using, but if it supports CASE or an equivalent you don't need a correlated sub-query; you can do it with a combination of SUM() and CASE.
Untested in your DBMS, of course, but it should give you a starting point:
SELECT
user_id,
SUM(CASE WHEN num_days <= 30 then 1 else 0 end) as last_30,
SUM(CASE WHEN num_days > 30 AND num_days < 61 then 1 else 0 end) as btwn_31_60,
SUM(CASE WHEN num_days >= 61 AND num_days <= 90 then 1 else 0 end) as btwn_61_90
FROM
YourTableName -- You didn't provide a tablename
GROUP BY
user_id
Since your values are floating point and not integer, you may need to adjust the values used for the day ranges to work with your specific requirements.
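One difference between the two answers is worth seeing on the sample data: MAX over the CASE gives a 0/1 flag, while SUM gives the number of visits in the range. The sketch below runs on SQLite through Python's sqlite3 module as a stand-in for Netezza. The table name visits, the column name num_days, and the bucket boundaries (greater than the lower bound, up to and including the upper bound, so fractional values such as 30.5 fall into exactly one bucket) are assumptions.

```python
import sqlite3

# Table and column names are assumptions based on the sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER, last_visit TEXT, num_days REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1234, "2017-11-02", 15.6), (1234, "2017-09-30", 48.6),
        (1234, "2017-09-03", 75.0), (1234, "2017-08-21", 88.0),
        (9876, "2017-10-03", 45.0), (9876, "2017-07-20", 120.0),
        (5545, "2017-09-15", 63.0),
    ],
)

# MAX(...) answers "did the user visit at all in this range?" (0 or 1);
# SUM(...) answers "how many visits fell in this range?".
query = """
SELECT user_id,
       MAX(CASE WHEN num_days <= 30 THEN 1 ELSE 0 END) AS last_30,
       MAX(CASE WHEN num_days > 30 AND num_days <= 60 THEN 1 ELSE 0 END) AS btwn_31_60,
       MAX(CASE WHEN num_days > 60 AND num_days <= 90 THEN 1 ELSE 0 END) AS btwn_61_90,
       SUM(CASE WHEN num_days > 60 AND num_days <= 90 THEN 1 ELSE 0 END) AS visits_61_90
FROM visits
GROUP BY user_id
ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first three result columns match the desired output in the question; the last column shows that user 1234 has two visits in the 61 to 90 day range, which is what SUM would report there.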

SQLite: Multiple aggregate columns

I'm fairly new to the SQL world and still learning the ins and outs of the language.
I have a table with an id, dayOfWeek, and a count for each day. So any given id might appear in the table up to seven times, with a count of events for each day for each id. I'd like to restructure the table to have a single row for each id with a column for each day of the week, something like the following obviously incorrect query:
SELECT id, sum(numEvents where dayOfWeek = 0), sum(numEvents where dayOfWeek = 1) ... from t;
Is there a solid way to approach this?
EDIT:
I'm worried I may not have been very clear. The table would ideally be structured something like this:
id | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday
0 | 13 | 45 | 142 | 3 | 36 | 63 | 15336
1 | 17 | 25 | 45 | 364 | 37 | 540 | 0
So event 0 occurred 13 times on Sunday, 45 on Monday, etc... My current table looks like this:
id | dayOfWeek | count
0 | 0 | 13
0 | 1 | 45
0 | 2 | 142
0 | 3 | 3
0 | 4 | 36
0 | 5 | 63
0 | 6 | 15336
1 | 0 | 17
1 | 1 | 25
...
Hope that helps clear up what I'm after.

The following is verbose, but should work (generic ideone SQL demo; unfortunately SQLite on SqlFiddle is down at the moment):
SELECT id,
SUM(case when dayofweek = 0 then numevents else 0 end) as Day0Events,
SUM(case when dayofweek = 1 then numevents else 0 end) as Day1Events,
SUM(case when dayofweek = 2 then numevents else 0 end) as Day2Events
--, etc...
FROM EventTable
GROUP BY ID;

SELECT dayOfWeek, sum(numEvents) as numberOfEvents
FROM t
GROUP BY dayOfWeek;
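To get one row per id with a column per weekday, the conditional SUM pattern can be written out for all seven days. Here is a minimal sketch using Python's sqlite3 module. The table name t and the column name numEvents follow the question's pseudo-query, the numbering 0 = Sunday through 6 = Saturday is taken from the sample table, and only the rows actually shown in the question are loaded, so the missing days of id 1 come out as 0.

```python
import sqlite3

# Only the rows shown in the question are loaded; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dayOfWeek INTEGER, numEvents INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (0, 0, 13), (0, 1, 45), (0, 2, 142), (0, 3, 3),
        (0, 4, 36), (0, 5, 63), (0, 6, 15336),
        (1, 0, 17), (1, 1, 25),
    ],
)

# Build one conditional SUM per weekday instead of typing seven near-identical lines.
days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
day_columns = ",\n       ".join(
    f"SUM(CASE WHEN dayOfWeek = {n} THEN numEvents ELSE 0 END) AS {name}"
    for n, name in enumerate(days)
)
query = f"SELECT id,\n       {day_columns}\nFROM t\nGROUP BY id\nORDER BY id"

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```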

How to insert additional values in between a GROUP BY

I am currently making a monthly report using MySQL. I have a table named "monthly" that looks something like this:
id | date | amount
10 | 2009-12-01 22:10:08 | 7
9 | 2009-11-01 22:10:08 | 78
8 | 2009-10-01 23:10:08 | 5
7 | 2009-07-01 21:10:08 | 54
6 | 2009-03-01 04:10:08 | 3
5 | 2009-02-01 09:10:08 | 456
4 | 2009-02-01 14:10:08 | 4
3 | 2009-01-01 20:10:08 | 20
2 | 2009-01-01 13:10:15 | 10
1 | 2008-12-01 10:10:10 | 5
Then, when I make a monthly report (grouped by month of each year), I get something like this:
yearmonth | total
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-07 | 54
2009-10 | 5
2009-11 | 78
2009-12 | 7
I used this query to achieve the result:
SELECT substring( date, 1, 7 ) AS yearmonth, sum( amount ) AS total
FROM monthly
GROUP BY substring( date, 1, 7 )
But I need something like this:
yearmonth | total
2008-01 | 0
2008-02 | 0
2008-03 | 0
2008-04 | 0
2008-05 | 0
2008-06 | 0
2008-07 | 0
2008-08 | 0
2008-09 | 0
2008-10 | 0
2008-11 | 0
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-04 | 0
2009-05 | 0
2009-06 | 0
2009-07 | 54
2009-08 | 0
2009-09 | 0
2009-10 | 5
2009-11 | 78
2009-12 | 7
Something that would display zeroes for the months that don't have any value. Is it even possible to do that in a MySQL query?

You should generate a dummy rowsource and LEFT JOIN with it:
SELECT *
FROM (
SELECT 1 AS month
UNION ALL
SELECT 2
…
UNION ALL
SELECT 12
) months
CROSS JOIN
(
SELECT 2008 AS year
UNION ALL
SELECT 2009 AS year
) years
LEFT JOIN
mydata m
ON m.date >= CONCAT_WS('.', year, month, 1)
AND m.date < CONCAT_WS('.', year, month, 1) + INTERVAL 1 MONTH
GROUP BY
year, month
You can create these as tables on disk rather than generate them each time.
MySQL is the only system of the major four that does not offer an easy way to generate arbitrary resultsets.
Oracle, SQL Server and PostgreSQL do have those (CONNECT BY, recursive CTEs and generate_series, respectively).

Quassnoi is right, and I'll add a comment about how to recognize when you need something like this:
You want '2008-01' in your result, yet nothing in the source table has a date in January, 2008. Result sets have to come from the tables you query, so the obvious conclusion is that you need an additional table - one that contains each month you want as part of your result.
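The "additional table of months" idea can be sketched as follows. This version runs on SQLite through Python's sqlite3 module and generates the months with a recursive CTE rather than the UNION ALL lists above, so it is an illustration of the principle, not the MySQL query itself. The range 2008-01 to 2009-12 is taken from the desired output in the question.

```python
import sqlite3

# Sample data from the question, with dates stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (id INTEGER, date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?, ?)",
    [
        (10, "2009-12-01 22:10:08", 7), (9, "2009-11-01 22:10:08", 78),
        (8, "2009-10-01 23:10:08", 5), (7, "2009-07-01 21:10:08", 54),
        (6, "2009-03-01 04:10:08", 3), (5, "2009-02-01 09:10:08", 456),
        (4, "2009-02-01 14:10:08", 4), (3, "2009-01-01 20:10:08", 20),
        (2, "2009-01-01 13:10:15", 10), (1, "2008-12-01 10:10:10", 5),
    ],
)

# "months" plays the role of the calendar table: one row per month in the
# report range. The LEFT JOIN keeps months without data, and COALESCE
# turns their NULL sum into 0.
query = """
WITH RECURSIVE months(month_start) AS (
    SELECT '2008-01-01'
    UNION ALL
    SELECT date(month_start, '+1 month') FROM months
    WHERE month_start < '2009-12-01'
)
SELECT substr(m.month_start, 1, 7) AS yearmonth,
       COALESCE(SUM(t.amount), 0) AS total
FROM months m
LEFT JOIN monthly t ON substr(t.date, 1, 7) = substr(m.month_start, 1, 7)
GROUP BY m.month_start
ORDER BY m.month_start
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If the report is run often, storing the months in a permanent table, as suggested above, avoids regenerating them on every query.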
