I am trying to use the avg() function in an outer query after using a count in the inner query but I cannot seem to get it to work in SQL - sql

My table is named CustomerDetails and it has the following columns:
customer_id, login_id, session_id, login_date
I am trying to write a query that calculates the average number of customers logging in per day.
I tried this:
select avg(session_id)
from CustomerDetails
where exists (select count(session_id) from CustomerDetails as 'no_of_entries')
But then I realized it was going straight to the column and just calculating the average of session_id, which is not what I want. Can someone help me?
Thanks

The first thing you need to do is get logins per day:
SELECT login_date, COUNT(*) AS loginsPerDay
FROM CustomerDetails
GROUP BY login_date
Then you can use that to get average logins per day:
SELECT AVG(loginsPerDay)
FROM (
SELECT login_date, COUNT(*) AS loginsPerDay
FROM CustomerDetails
GROUP BY login_date
) t
If your login_date is a DATE type you're all set. If it has a time component, you'll need to truncate it to the date only:
SELECT AVG(loginsPerDay)
FROM (
SELECT CAST(login_date AS DATE) AS login_day, COUNT(*) AS loginsPerDay
FROM CustomerDetails
GROUP BY CAST(login_date AS DATE)
) t
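To check the two-level aggregation on a few rows, here is a minimal sketch using Python's built-in sqlite3 module. It rests on some assumptions: SQLite stands in for your actual database, DATE() plays the role of CAST(... AS DATE), and the helper name average_logins_per_day and the sample rows are made up for illustration.

```python
import sqlite3

def average_logins_per_day(rows):
    """rows: iterable of (customer_id, login_id, session_id, login_date) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CustomerDetails ("
        "customer_id INTEGER, login_id INTEGER, session_id INTEGER, login_date TEXT)"
    )
    conn.executemany("INSERT INTO CustomerDetails VALUES (?, ?, ?, ?)", rows)
    # Inner query: one row per calendar day. Outer query: average of those counts.
    # DATE() strips the time part in SQLite (assumption: ISO-8601 text timestamps).
    (avg,) = conn.execute(
        """
        SELECT AVG(loginsPerDay)
        FROM (
            SELECT DATE(login_date) AS login_day, COUNT(*) AS loginsPerDay
            FROM CustomerDetails
            GROUP BY DATE(login_date)
        ) t
        """
    ).fetchone()
    conn.close()
    return avg

# Invented sample data: three logins on the first day, one on the second.
sample_rows = [
    (1, 10, 100, "2024-01-01 08:00:00"),
    (2, 11, 101, "2024-01-01 09:30:00"),
    (1, 12, 102, "2024-01-01 17:45:00"),
    (3, 13, 103, "2024-01-02 10:15:00"),
]
print(average_logins_per_day(sample_rows))
```

With an empty table the function returns None, because AVG over zero rows is NULL.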

I am trying to write a query that calculates the average number of customers logging in per day.
Count the number of logins. Divide by the number of days. I think that is:
select count(*) * 1.0 / count(distinct cast(login_date as date))
from customerdetails;
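As a sanity check, dividing the total by the number of distinct days should give the same number as averaging per-day counts. The sketch below compares the two on invented rows. It assumes SQLite via Python's sqlite3 module as a stand-in database, with DATE() replacing the cast to date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customerdetails (customer_id INTEGER, login_date TEXT)")
conn.executemany(
    "INSERT INTO customerdetails VALUES (?, ?)",
    [
        (1, "2024-03-01 08:00:00"),
        (2, "2024-03-01 12:00:00"),
        (1, "2024-03-01 18:00:00"),
        (3, "2024-03-02 09:00:00"),
        (2, "2024-03-04 11:00:00"),  # note: nobody logged in on 2024-03-03
    ],
)

# Single pass: total logins divided by the number of distinct days.
(ratio,) = conn.execute(
    "SELECT COUNT(*) * 1.0 / COUNT(DISTINCT DATE(login_date)) FROM customerdetails"
).fetchone()

# Two levels: count per day, then average the daily counts.
(two_level,) = conn.execute(
    "SELECT AVG(n) FROM ("
    "SELECT COUNT(*) AS n FROM customerdetails GROUP BY DATE(login_date)) t"
).fetchone()

print(ratio, two_level)
```

Both forms only see days that have at least one row, so a day with no logins at all (2024-03-03 here) does not pull the average down.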

I understand that you want to count the number of visitors per day, not the number of visits. So if a customer logged in twice on the same day, you want to count that customer only once.
If so, you can use distinct and two levels of aggregation, like so:
select avg(cnt_visitors) avg_cnt_visitors_per_day
from (
select count(distinct customer_id) cnt_visitors
from CustomerDetails
group by cast(login_date as date)
) t
The inner query computes the count of distinct customers for each day; the outer query gives you the overall average.
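To see the effect of DISTINCT, here is a small sketch run against SQLite through Python's sqlite3 module. The assumptions are that SQLite substitutes for your database, DATE() substitutes for the cast, the table uses the CustomerDetails name from the question, and the function name and rows are hypothetical.

```python
import sqlite3

def average_visitors_per_day(logins):
    """logins: iterable of (customer_id, login_date) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CustomerDetails (customer_id INTEGER, login_date TEXT)")
    conn.executemany("INSERT INTO CustomerDetails VALUES (?, ?)", logins)
    # Inner query: distinct customers per day. Outer query: average over days.
    (avg,) = conn.execute(
        """
        SELECT AVG(cnt_visitors)
        FROM (
            SELECT COUNT(DISTINCT customer_id) AS cnt_visitors
            FROM CustomerDetails
            GROUP BY DATE(login_date)
        ) t
        """
    ).fetchone()
    conn.close()
    return avg

logins = [
    (1, "2024-03-01 08:00:00"),
    (1, "2024-03-01 18:00:00"),  # same customer, same day: counted once
    (2, "2024-03-01 12:00:00"),
    (3, "2024-03-02 09:00:00"),
]
print(average_visitors_per_day(logins))  # two visitors on day one, one on day two
```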

Related

SQL count of distinct values over two columns

I have the following query that allows me to aggregate the number of unique sellers/buyers for every single day from the Flipside API:
SELECT
date_trunc('day', block_timestamp) AS date,
COUNT(DISTINCT(seller_address)) AS unique_sellers,
COUNT(DISTINCT(buyer_address)) AS unique_buyers
FROM ethereum.core.ez_nft_sales
GROUP BY date
Now, I've been trying a lot of different things, but I can't for the life of me figure out how it would be possible to get the number of unique active addresses on a given day as I would need to somehow merge the sellers and buyers and then count the unique addresses. I would greatly appreciate any kind of help. Thanks in advance!
This is how I managed to solve the issue by using a separate query for the unique_active and merging them:
WITH
other_values AS (
SELECT
date_trunc('day', block_timestamp) AS date,
COUNT(DISTINCT seller_address) AS unique_sellers,
COUNT(DISTINCT buyer_address) AS unique_buyers
FROM ethereum.core.ez_nft_sales
GROUP BY date
),
unique_addresses AS (
SELECT
date,
COUNT(*) as unique_active
FROM (
SELECT
date_trunc('day', block_timestamp) as date,
seller_address as address
FROM ethereum.core.ez_nft_sales
GROUP BY date, seller_address
UNION
SELECT
date_trunc('day', block_timestamp) as date,
buyer_address as address
FROM ethereum.core.ez_nft_sales
GROUP BY date, buyer_address
)
GROUP BY date
)
SELECT * FROM other_values
LEFT JOIN unique_addresses
ON other_values.date = unique_addresses.date
ORDER BY other_values.date DESC
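The core of the unique_addresses step can be tried on a toy table. The sketch below is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module instead of the Flipside warehouse, DATE() instead of date_trunc('day', ...), a local table named ez_nft_sales, and invented addresses. Since UNION (as opposed to UNION ALL) already removes duplicate (day, address) pairs, the sketch leaves out the inner GROUP BY clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ez_nft_sales ("
    "block_timestamp TEXT, seller_address TEXT, buyer_address TEXT)"
)
conn.executemany(
    "INSERT INTO ez_nft_sales VALUES (?, ?, ?)",
    [
        ("2023-05-01 10:00:00", "0xA", "0xB"),
        ("2023-05-01 11:00:00", "0xB", "0xC"),
        ("2023-05-01 12:00:00", "0xA", "0xC"),
        ("2023-05-02 09:00:00", "0xD", "0xA"),
    ],
)

# Stack sellers and buyers into one (day, address) list; UNION deduplicates it,
# so COUNT(*) per day is the number of distinct active addresses.
unique_active = conn.execute(
    """
    SELECT day, COUNT(*) AS unique_active
    FROM (
        SELECT DATE(block_timestamp) AS day, seller_address AS address FROM ez_nft_sales
        UNION
        SELECT DATE(block_timestamp) AS day, buyer_address AS address FROM ez_nft_sales
    ) addresses
    GROUP BY day
    ORDER BY day
    """
).fetchall()
print(unique_active)
```

On the first day there are two distinct sellers and two distinct buyers but only three distinct addresses, which is why the two per-role counts cannot simply be added.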

How to reference fields from tables created in sub-queries of a large JOIN

I am writing a large query with many JOINs (shortened in the example here) and I am trying to reference values from other sub-queries but can't figure out how.
This is my example query:
DROP TABLE IF EXISTS breakdown;
CREATE TEMP TABLE breakdown AS
SELECT * FROM
(
SELECT COUNT(DISTINCT s_id) AS before, date_trunc('day', time) AS day FROM table_a
WHERE date_trunc('sec',earliest) < date_trunc('sec',time) GROUP BY day
)
JOIN
(
SELECT ROUND(before * 100.0 / total, 1) AS Percent_1, day
FROM breakdown
GROUP BY day
) USING (day)
JOIN
(
SELECT COUNT(DISTINCT s_id) AS equal, date_trunc('day', time) AS day FROM table_a
WHERE date_trunc('sec',earliest) = date_trunc('sec',time) GROUP BY day
) USING (day)
JOIN
(
SELECT COUNT(DISTINCT s_id) AS after, date_trunc('day', time) AS day FROM table_a
WHERE date_trunc('sec',earliest) > date_trunc('sec',time) GROUP BY day
) USING (day)
JOIN
(
SELECT COUNT(DISTINCT s_id) AS total, date_trunc('day', earliest) AS day
FROM first
GROUP BY 2
) USING (day)
ORDER BY day;
SELECT * FROM breakdown ORDER BY day;
The last subquery gives me the total, and for each of the previous subqueries I want to get the percentage as well.
I found the code for getting the percentage (the second JOIN), but I don't know how to reference the values from the other subqueries.
E.g. to get the percentage for the first subquery, I want to take its COUNT, which I aliased as before, and divide it by the COUNT of the last subquery, which I aliased as total. (If there is an easier way to get the percentage for each of the subqueries, please let me know.) But I can't seem to find how to reference them. I tried adding AS x to the end of each subquery and referencing it that way (x.total), as well as referencing via the parent table (breakdown.total), but neither worked.
How can I do this without changing my query too much? It is long and has a lot of subqueries.
This is what my table looks like; I would like to add a percentage for each column.
I'm using Redshift, BTW.
Thanks
I'm a little confused by all that is going on: you drop the table breakdown, and then the second subquery of the CREATE TABLE references breakdown. I suspect that there are some issues in the provided SQL sample. Please update it if so.
For a number of these subqueries it looks like you are using a subquery where a CASE expression will do. In Redshift you don't want to scan the same table over and over if you can avoid it. For example, the 3rd and 4th subqueries can be replaced with one query. In simple cases like these I also like to use the DECODE() function rather than CASE, since it is more readable.
(
SELECT COUNT(DISTINCT s_id) AS equal, date_trunc('day', time) AS day
FROM table_a
WHERE date_trunc('sec',earliest) = date_trunc('sec',time)
GROUP BY day
) USING (day)
JOIN
(
SELECT COUNT(DISTINCT s_id) AS after, date_trunc('day', time) AS day
FROM table_a
WHERE date_trunc('sec',earliest) > date_trunc('sec',time)
GROUP BY day
)
Becomes:
(
SELECT COUNT(DISTINCT DECODE(date_trunc('sec',earliest) = date_trunc('sec',time), true, s_id, NULL)) AS equal,
COUNT(DISTINCT DECODE(date_trunc('sec',earliest) > date_trunc('sec',time), true, s_id, NULL)) AS after,
date_trunc('day', time) AS day
FROM table_a
GROUP BY day
)
Read each table once (if at all possible) and calculate the desired results. Then you will have all your values in one layer of query and can reference these new values. This will be faster (especially on Redshift).
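The single-scan idea can be sketched with portable CASE expressions. This is an illustration under assumptions, not the Redshift query itself: it runs on SQLite through Python's sqlite3 module, DATE() replaces date_trunc('day', ...), timestamps are ISO-8601 text already at second precision (so no date_trunc('sec', ...) is needed), and the aliases before_cnt, equal_cnt and after_cnt are hypothetical names chosen to avoid keyword clashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (s_id INTEGER, earliest TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 09:00:00", "2024-01-01 10:00:00"),  # earliest < time
        (1, "2024-01-01 09:00:00", "2024-01-01 15:00:00"),  # same s_id again
        (2, "2024-01-01 10:00:00", "2024-01-01 10:00:00"),  # earliest = time
        (3, "2024-01-01 12:00:00", "2024-01-01 11:00:00"),  # earliest > time
        (4, "2024-01-02 08:00:00", "2024-01-02 08:00:00"),  # earliest = time
    ],
)

# One pass over table_a: each CASE yields s_id when its condition holds and
# NULL otherwise, and COUNT(DISTINCT ...) ignores the NULLs.
breakdown = conn.execute(
    """
    SELECT DATE(time) AS day,
           COUNT(DISTINCT CASE WHEN earliest < time THEN s_id END) AS before_cnt,
           COUNT(DISTINCT CASE WHEN earliest = time THEN s_id END) AS equal_cnt,
           COUNT(DISTINCT CASE WHEN earliest > time THEN s_id END) AS after_cnt
    FROM table_a
    GROUP BY DATE(time)
    ORDER BY day
    """
).fetchall()
print(breakdown)
```

A day with no matching rows for one of the conditions gets a 0 in that column, whereas the original chain of inner JOINs would drop such a day from the result entirely.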
=============================
Expanding based on comment made by poster.
It appears that using DECODE() and referencing derived columns in a single query can produce what you want. I don't have your data so I cannot test this but here is what I'd want to move to:
SELECT
date_trunc('day', time) AS day,
COUNT(DISTINCT DECODE(date_trunc('sec',earliest) < date_trunc('sec',time), true, s_id)) AS before,
ROUND(before * 100.0 / total, 1) AS Percent_1,
COUNT(DISTINCT DECODE(date_trunc('sec',earliest) = date_trunc('sec',time), true, s_id)) AS equal,
COUNT(DISTINCT DECODE(date_trunc('sec',earliest) > date_trunc('sec',time), true, s_id)) AS after,
COUNT(DISTINCT s_id) AS total
FROM table_a
GROUP BY date_trunc('day', time);
This should be a complete replacement for the SELECT currently inside your CREATE TEMP TABLE. However, I don't have sample data so this is untested.
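If referencing an alias inside the same SELECT list is not available on your engine, one portable variant is to compute the counts in a derived table and the percentages one level up. The sketch below shows that shape under stated assumptions: SQLite via Python's sqlite3 module instead of Redshift, CASE instead of DECODE(), DATE() instead of date_trunc(), total taken from table_a as in the query above (the question took it from another table), and hypothetical column and function names.

```python
import sqlite3

def breakdown_with_percentages(rows):
    """rows: iterable of (s_id, earliest, time) tuples with ISO-8601 timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (s_id INTEGER, earliest TEXT, time TEXT)")
    conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT day,
               before_cnt, ROUND(before_cnt * 100.0 / total_cnt, 1) AS percent_before,
               equal_cnt,  ROUND(equal_cnt  * 100.0 / total_cnt, 1) AS percent_equal,
               after_cnt,  ROUND(after_cnt  * 100.0 / total_cnt, 1) AS percent_after,
               total_cnt
        FROM (
            -- Inner level: all counts from a single scan of table_a.
            SELECT DATE(time) AS day,
                   COUNT(DISTINCT CASE WHEN earliest < time THEN s_id END) AS before_cnt,
                   COUNT(DISTINCT CASE WHEN earliest = time THEN s_id END) AS equal_cnt,
                   COUNT(DISTINCT CASE WHEN earliest > time THEN s_id END) AS after_cnt,
                   COUNT(DISTINCT s_id) AS total_cnt
            FROM table_a
            GROUP BY DATE(time)
        ) counts
        ORDER BY day
        """
    ).fetchall()
    conn.close()
    return result

# Invented sample data.
events = [
    (1, "2024-01-01 09:00:00", "2024-01-01 10:00:00"),
    (2, "2024-01-01 09:30:00", "2024-01-01 10:30:00"),
    (3, "2024-01-01 11:00:00", "2024-01-01 11:00:00"),
    (4, "2024-01-01 13:00:00", "2024-01-01 12:00:00"),
    (5, "2024-01-02 08:00:00", "2024-01-02 08:00:00"),
]
for row in breakdown_with_percentages(events):
    print(row)
```

The outer level only ever sees the aliases of the inner level, so every percentage can reference total_cnt without repeating the aggregate.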

Using Subquery in Sequence function PrestoSQL

Use case -
I am trying to find the weekly frequency of a customer from a dataset. Not all customers have "events" in every week, so I need to fill the missing weeks in with zero values for the "count" column.
I was trying to do this using the sequence function of PrestoSQL. However, this requires getting the max week from the customer's orders itself (I don't want to hardcode it, since the result goes into a BI tool and I don't want to update it manually every week).
with all_orders_2020 as (select customer, cast(date_parse(orderdate, '%Y-%m-%d') as date) as order_date
from orders
where orderdate > '2020-01-01' and customer in (select customer from some_customers)),
orders_with_week_number as (select *, week(order_date) as week_number from all_orders_2020),
weekly_count as (select customer, week_number, count(*) as ride_count from orders_with_week_number
where customer = {{some_customer}} group by customer, week_number)
SELECT
week_number
FROM
(VALUES
(SEQUENCE(1,(select max(week_number) from weekly_count)))
) AS t1(week_array)
CROSS JOIN
UNNEST(week_array) AS t2(week_number)
Presto complains about this, saying:
Unexpected subquery expression in logical plan: (SELECT "max"(week_number)
FROM
weekly_count
)
Any clues as to how this can be done?
Had a similar use case and followed the example from here: https://docs.aws.amazon.com/athena/latest/ug/flattening-arrays.html
Bring the SEQUENCE out and define the subquery using a WITH clause:
WITH dataset AS (
SELECT SEQUENCE(1, (SELECT MAX(week_number) FROM weekly_count)) AS week_array
)
SELECT week_number FROM dataset
CROSS JOIN UNNEST(week_array) as t(week_number)
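To show where the generated week numbers end up, here is a sketch of the full gap-filling step. It deliberately swaps techniques: SQLite (driven from Python's sqlite3 module) has no SEQUENCE or UNNEST, so a recursive CTE generates 1..MAX(week_number) instead. The weekly_count table is assumed to be materialized with just week_number and ride_count, and the function name and sample rows are hypothetical.

```python
import sqlite3

def weekly_counts_with_gaps_filled(weekly_count_rows):
    """weekly_count_rows: iterable of (week_number, ride_count) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weekly_count (week_number INTEGER, ride_count INTEGER)")
    conn.executemany("INSERT INTO weekly_count VALUES (?, ?)", weekly_count_rows)
    rows = conn.execute(
        """
        WITH RECURSIVE weeks(week_number) AS (
            -- Stand-in for SEQUENCE(1, max) + UNNEST: one row per week number.
            SELECT 1
            UNION ALL
            SELECT week_number + 1
            FROM weeks
            WHERE week_number < (SELECT MAX(week_number) FROM weekly_count)
        )
        SELECT w.week_number, COALESCE(c.ride_count, 0) AS ride_count
        FROM weeks w
        LEFT JOIN weekly_count c ON c.week_number = w.week_number
        ORDER BY w.week_number
        """
    ).fetchall()
    conn.close()
    return rows

# Invented sample: the customer has rides in weeks 1, 3 and 6 only.
print(weekly_counts_with_gaps_filled([(1, 2), (3, 1), (6, 4)]))
```

The LEFT JOIN keeps every generated week, and COALESCE turns the missing counts into zeros. In Presto the same LEFT JOIN would be applied to the week_number rows produced by the UNNEST.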

Number of Unique Visits in SQL

I am using Postgres 9.1 to count the number of unique patient visits over a given period of time, using the invoices entered for each patient.
I have two columns, transactions.ptnumber and transactions.dateofservice, and can calculate the patient visits in the following manner:
select count(*)
from transactions
where transactions.dateofservice between '2012-01-01' and '2013-12-31'
The problem is that sometimes one patient might get two invoices for the same day, but that should be counted as only one patient visit.
If I use SELECT DISTINCT or GROUP BY on the column transactions.ptnumber, that would count the number of patients who were seen (but not the number of times they were seen).
If I use SELECT DISTINCT or GROUP BY on the column transactions.dateofservice, that would count the number of days that had an invoice.
Not sure how to approach this.
This will return unique patients per day.
select count(distinct transactions.ptnumber) as cnt
from transactions
where transactions.dateofservice between '2012-01-01' and '2013-12-31'
group by transactions.dateofservice
You can sum them up to get the number of unique visits for the whole period:
select sum(cnt) from (
select count(distinct transactions.ptnumber) as cnt
from transactions
where transactions.dateofservice between '2012-01-01' and '2013-12-31'
group by transactions.dateofservice
) t
You might use a subselect. Consider:
Select count(*)
from (select ptnumber, dateofservice
from transactions
where dateofservice between '2012-01-01' and '2013-12-31'
group by ptnumber, dateofservice
) t
You may also want to make this a stored procedure so you can pass in the date range.
There are multiple ways to achieve this, but you could use a WITH clause to build a temporary result set containing the unique visits, then count the rows:
WITH UniqueVisits AS
(SELECT DISTINCT transactions.ptnumber, transactions.dateofservice
FROM transactions
WHERE transactions.dateofservice between '2012-01-01' and '2013-12-31')
SELECT COUNT(*) FROM UniqueVisits
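The approaches above should agree with each other. The sketch below checks that on a handful of invented invoices, assuming SQLite through Python's sqlite3 module in place of Postgres, dates stored as ISO-8601 text, and the date range passed in as query parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (ptnumber INTEGER, dateofservice TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        (1, "2012-03-01"),
        (1, "2012-03-01"),  # second invoice, same patient and day
        (1, "2012-04-10"),
        (2, "2012-03-01"),
        (3, "2014-01-05"),  # outside the date range
    ],
)
date_range = ("2012-01-01", "2013-12-31")

# Raw invoice count: double-counts the patient with two invoices on one day.
(invoices,) = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE dateofservice BETWEEN ? AND ?",
    date_range,
).fetchone()

# Unique visits as distinct (patient, day) pairs.
(visits_from_pairs,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM (
        SELECT DISTINCT ptnumber, dateofservice
        FROM transactions
        WHERE dateofservice BETWEEN ? AND ?
    ) t
    """,
    date_range,
).fetchone()

# Unique visits as the sum of distinct patients per day.
(visits_from_daily_sums,) = conn.execute(
    """
    SELECT SUM(cnt)
    FROM (
        SELECT COUNT(DISTINCT ptnumber) AS cnt
        FROM transactions
        WHERE dateofservice BETWEEN ? AND ?
        GROUP BY dateofservice
    ) t
    """,
    date_range,
).fetchone()

print(invoices, visits_from_pairs, visits_from_daily_sums)
```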

SQL: Need to SUM on results that meet a HAVING statement

I have a table where we record per user values like money_spent, money_spent_on_candy and the date.
So the columns in this table (let's call it MoneyTable) would be:
UserId
Money_Spent
Money_Spent_On_Candy
Date
My goal is to SUM the total amount of money_spent -- but only for those users where they have spent more than 10% of their total money spent for the date range on candy.
What would that query be?
I know how to select the Users that have this -- and then I can output the data and sum that by hand but I would like to do this in one single query.
Here would be the query to pull the sum of Spend per user for only the users that have spent > 10% of their money on candy.
SELECT
UserId,
SUM(Money_Spent),
SUM(Money_Spent_On_Candy) / SUM(Money_Spent) AS PercentCandySpend
FROM MoneyTable
WHERE DATE >= '2010-01-01'
GROUP BY UserId
HAVING PercentCandySpend > 0.1;
You couldn't do this with a single flat query. You'd need a query that could reach back in time and retroactively filter the source table to handle only users with 10% candy spending. Luckily, that's kind of what sub-queries do:
SELECT SUM(spent) FROM (
SELECT SUM(Money_Spent) AS spent
FROM MoneyTable
WHERE (DATE >= '2010-01-01')
GROUP BY UserID
HAVING (SUM(Money_Spent_On_Candy)/SUM(Money_Spent)) > 0.1
) per_user;
The inner query does the heavy lifting of figuring out what the "10%" users spent, and then the outer query uses the sub-query as a virtual table to sum up the per-user Money_Spent sums.
Of course, this only works if you need ONLY the global total Money_Spent. If you end up needing the per-user sums as well, then you'd be better off just running the inner query and doing the global total in your application.
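Here is a small runnable sketch of the filter-then-total pattern, assuming SQLite through Python's sqlite3 module rather than your actual database. The function name, the sample rows and the REAL column types are assumptions. The `* 1.0` guards against integer division, which would silently turn the ratio into 0 on engines that divide integers as integers.

```python
import sqlite3

def total_spent_by_candy_fans(rows, since, threshold=0.1):
    """rows: iterable of (UserId, Money_Spent, Money_Spent_On_Candy, Date) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MoneyTable ("
        "UserId INTEGER, Money_Spent REAL, Money_Spent_On_Candy REAL, Date TEXT)"
    )
    conn.executemany("INSERT INTO MoneyTable VALUES (?, ?, ?, ?)", rows)
    (total,) = conn.execute(
        """
        SELECT SUM(spent)
        FROM (
            -- Per-user totals, keeping only users above the candy threshold.
            SELECT SUM(Money_Spent) AS spent
            FROM MoneyTable
            WHERE Date >= ?
            GROUP BY UserId
            HAVING SUM(Money_Spent_On_Candy) * 1.0 / SUM(Money_Spent) > ?
        ) per_user
        """,
        (since, threshold),
    ).fetchone()
    conn.close()
    return total

# Invented sample data.
spending = [
    (1, 100, 20, "2010-02-01"),
    (1, 50, 0, "2010-03-01"),   # user 1 overall: 20 / 150, above 10%
    (2, 200, 10, "2010-02-15"),  # user 2: 10 / 200, below 10%
    (2, 10, 10, "2009-12-31"),   # before the date range, ignored
    (3, 40, 10, "2010-05-05"),   # user 3: 10 / 40, above 10%
]
print(total_spent_by_candy_fans(spending, "2010-01-01"))
```

The result is None when no user passes the threshold, because SUM over zero rows is NULL.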
You can use common table expressions. Like this:
WITH temp AS (SELECT
UserId,
SUM(Money_Spent) AS MoneySpent,
SUM(Money_Spent_On_Candy)/SUM(Money_Spent) AS PercentCandySpend
FROM MoneyTable
WHERE DATE >= '2010-01-01'
GROUP BY UserId
HAVING SUM(Money_Spent_On_Candy)/SUM(Money_Spent) > 0.1)
SELECT
SUM(MoneySpent)
FROM temp
Or you can use a derived table:
SELECT SUM(Total_Money_Spent)
FROM ( SELECT UserId, SUM(Money_Spent) AS Total_Money_Spent, SUM(Money_Spent_On_Candy)/SUM(Money_Spent) AS PercentCandySpend
FROM MoneyTable
WHERE DATE >= '2010-01-01'
GROUP BY UserId
HAVING SUM(Money_Spent_On_Candy)/SUM(Money_Spent) > 0.1 ) x;