SQL count of distinct values over two columns

I have the following query that allows me to aggregate the number of unique sellers/buyers for every single day from the Flipside API:
SELECT
date_trunc('day', block_timestamp) AS date,
COUNT(DISTINCT(seller_address)) AS unique_sellers,
COUNT(DISTINCT(buyer_address)) AS unique_buyers
FROM ethereum.core.ez_nft_sales
GROUP BY date
Now, I've been trying a lot of different things, but I can't for the life of me figure out how to get the number of unique active addresses on a given day: I would need to somehow merge the sellers and buyers and then count the unique addresses. I would greatly appreciate any kind of help. Thanks in advance!

This is how I managed to solve the issue, using a separate CTE for unique_active and joining the two results:
WITH
other_values AS (
SELECT
date_trunc('day', block_timestamp) AS date,
COUNT(DISTINCT seller_address) AS unique_sellers,
COUNT(DISTINCT buyer_address) AS unique_buyers
FROM ethereum.core.ez_nft_sales
GROUP BY date
),
unique_addresses AS (
SELECT
date,
COUNT(*) as unique_active
FROM (
SELECT
date_trunc('day', block_timestamp) as date,
seller_address as address
FROM ethereum.core.ez_nft_sales
GROUP BY date, seller_address
UNION
SELECT
date_trunc('day', block_timestamp) as date,
buyer_address as address
FROM ethereum.core.ez_nft_sales
GROUP BY date, buyer_address
)
GROUP BY date
)
SELECT * FROM other_values
LEFT JOIN unique_addresses
ON other_values.date = unique_addresses.date
ORDER BY other_values.date DESC
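The key point in the solution above is that UNION (without ALL) deduplicates, so an address that both bought and sold on the same day is counted once. Here's a quick sanity check of that logic using SQLite from Python, with a toy table and invented addresses standing in for ez_nft_sales:

```python
import sqlite3

# Toy data: (day, seller, buyer) rows standing in for ez_nft_sales.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, seller TEXT, buyer TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2023-01-01", "a", "b"),
        ("2023-01-01", "b", "c"),  # b is both a seller and a buyer that day
        ("2023-01-02", "a", "a"),  # self-trade: one active address
    ],
)

# UNION (not UNION ALL) deduplicates (day, address) pairs before counting.
rows = conn.execute("""
    SELECT day, COUNT(*) AS unique_active FROM (
        SELECT day, seller AS address FROM sales
        UNION
        SELECT day, buyer AS address FROM sales
    ) t
    GROUP BY day
    ORDER BY day
""").fetchall()
print(rows)  # [('2023-01-01', 3), ('2023-01-02', 1)]
```

Note that on 2023-01-01 the sellers are {a, b} and the buyers are {b, c}, so the union counts three addresses, not four.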


Using Subquery in Sequence function PrestoSQL

Use case -
I am trying to find the weekly frequency of a customer from a dataset. Now, not all customers have "events" happening in all of the weeks, and I would need to fill those weeks in with zero values for the "count" column.
I was trying to do this using the SEQUENCE function of PrestoSQL. However, this needs the value of the max week from the customer's orders itself (I don't want to hardcode this, since the result feeds a BI tool and I don't want to update it manually every week).
with all_orders_2020 as (
    select customer, cast(date_parse(orderdate, '%Y-%m-%d') as date) as order_date
    from orders
    where orderdate > '2020-01-01' and customer in (select customer from some_customers)
),
orders_with_week_number as (
    select *, week(order_date) as week_number
    from all_orders_2020
),
weekly_count as (
    select customer, week_number, count(*) as ride_count
    from orders_with_week_number
    where customer = {{some_customer}}
    group by customer, week_number
)
SELECT
week_number
FROM
(VALUES
(SEQUENCE(1,(select max(week_number) from weekly_count)))
) AS t1(week_array)
CROSS JOIN
UNNEST(week_array) AS t2(week_number)
Presto complains about this, saying:
Unexpected subquery expression in logical plan: (SELECT "max"(week_number)
FROM
weekly_count
)
Any clues how this can be done?
Had a similar use case and followed the example from here: https://docs.aws.amazon.com/athena/latest/ug/flattening-arrays.html
Bring the SEQUENCE out and define the subquery using a WITH clause:
WITH dataset AS (
SELECT SEQUENCE(1, (SELECT MAX(week_number) FROM weekly_count)) AS week_array
)
SELECT week_number FROM dataset
CROSS JOIN UNNEST(week_array) as t(week_number)
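SQLite has no SEQUENCE or UNNEST, but the same gap-filling idea can be sketched with a recursive CTE that generates every week up to the max and LEFT JOINs the counts back in. A minimal runnable version in Python, with made-up customer data (week 2 deliberately has no events):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly_count (customer TEXT, week_number INTEGER, ride_count INTEGER)"
)
conn.executemany(
    "INSERT INTO weekly_count VALUES (?, ?, ?)",
    [("c1", 1, 4), ("c1", 3, 2)],  # week 2 has no events
)

# Recursive CTE plays the role of SEQUENCE + UNNEST: enumerate weeks
# 1..MAX(week_number), then fill the gaps with COALESCE(..., 0).
rows = conn.execute("""
    WITH RECURSIVE weeks(week_number) AS (
        SELECT 1
        UNION ALL
        SELECT week_number + 1 FROM weeks
        WHERE week_number < (SELECT MAX(week_number) FROM weekly_count)
    )
    SELECT w.week_number, COALESCE(c.ride_count, 0) AS ride_count
    FROM weeks w
    LEFT JOIN weekly_count c ON c.week_number = w.week_number
    ORDER BY w.week_number
""").fetchall()
print(rows)  # [(1, 4), (2, 0), (3, 2)]
```

The max week is read from the data itself, so nothing needs to be hardcoded as the weeks roll forward.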

Adding all columns from multiple tables

I have a simple question.
I need to count all records from multiple tables with day and hour and add all of them together in a single final table.
So the query for each table is something like this
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_1
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_2
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_3
and so on so forth
I would like to combine all the results showing number of total records for each day and hour from these tables.
Expected results will be like this
date, hour, number of records of table 1, number of records of table 2, number of records of table 3 ........
What would be the optimal SQL query for this?
Probably the simplest way is to union them together and then aggregate:
select timestamp_trunc(timestamp, hour) as hh,
countif(which = 1) as num_1,
countif(which = 2) as num_2
from ((select timestamp, 1 as which
from table_1
) union all
(select timestamp, 2 as which
from table_2
) union all
. . .
) t
group by hh
order by hh;
You are using timestamp_trunc(). It returns a timestamp truncated to the hour -- there is no need to also include the date.
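The tag-and-pivot pattern above (tag each row with its source table, UNION ALL, then conditional counts) can be sanity-checked in SQLite, where SUM over a boolean expression stands in for BigQuery's COUNTIF. A small runnable sketch with invented timestamps:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("table_1", "table_2"):
    conn.execute(f"CREATE TABLE {name} (ts TEXT)")
conn.executemany("INSERT INTO table_1 VALUES (?)",
                 [("2023-01-01 10:05",), ("2023-01-01 10:40",)])
conn.executemany("INSERT INTO table_2 VALUES (?)",
                 [("2023-01-01 10:15",)])

# strftime truncates to the hour; SUM(which = n) counts rows from table n.
rows = conn.execute("""
    SELECT strftime('%Y-%m-%d %H:00', ts) AS hh,
           SUM(which = 1) AS num_1,
           SUM(which = 2) AS num_2
    FROM (
        SELECT ts, 1 AS which FROM table_1
        UNION ALL
        SELECT ts, 2 AS which FROM table_2
    ) t
    GROUP BY hh
    ORDER BY hh
""").fetchall()
print(rows)  # [('2023-01-01 10:00', 2, 1)]
```

Each source table becomes one column in the result, which matches the "date, hour, count per table" shape the question asks for.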
Below is for BigQuery Standard SQL
#standardSQL
SELECT
TIMESTAMP_TRUNC(TIMESTAMP, DAY) day,
EXTRACT(HOUR FROM TIMESTAMP) hour,
COUNT(*) cnt,
_TABLE_SUFFIX AS table
FROM `project.dataset.table_*`
GROUP BY day, hour, table

I am trying to use the AVG() function in a subquery after using a COUNT in the inner query but I cannot seem to get it to work in SQL

My table is named CustomerDetails and it has the following columns:
customer_id, login_id, session_id, login_date
I am trying to write a query that calculates the average number of customers logging in per day.
I tried this:
select avg(session_id)
from CustomerDetails
where exists (select count(session_id) from CustomerDetails as 'no_of_entries')
But then I realized it was going straight to the column and just calculating the average of that column, which is not what I want. Can someone help me?
Thanks
The first thing you need to do is get logins per day:
SELECT login_date, COUNT(*) AS loginsPerDay
FROM CustomerDetails
GROUP BY login_date
Then you can use that to get average logins per day:
SELECT AVG(loginsPerDay)
FROM (
SELECT login_date, COUNT(*) AS loginsPerDay
FROM CustomerDetails
GROUP BY login_date
) t
If your login_date is a DATE type you're all set. If it has a time component then you'll need to truncate it to date only (and keep the loginsPerDay alias so the outer AVG can reference it):
SELECT AVG(loginsPerDay)
FROM (
SELECT CAST(login_date AS DATE) AS login_day, COUNT(*) AS loginsPerDay
FROM CustomerDetails
GROUP BY CAST(login_date AS DATE)
) t
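The two-level pattern (inner query counts per day, outer query averages those counts) can be checked end to end with SQLite in Python, using a few invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerDetails (customer_id INTEGER, login_date TEXT)")
conn.executemany(
    "INSERT INTO CustomerDetails VALUES (?, ?)",
    [(1, "2023-01-01"), (2, "2023-01-01"), (3, "2023-01-01"),
     (1, "2023-01-02")],
)

# Inner query: logins per day; outer query: average of those daily counts.
avg = conn.execute("""
    SELECT AVG(loginsPerDay) FROM (
        SELECT login_date, COUNT(*) AS loginsPerDay
        FROM CustomerDetails
        GROUP BY login_date
    ) t
""").fetchone()[0]
print(avg)  # 2.0, i.e. (3 + 1) / 2 days
```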
I am trying to write a query that calculates the average number of customers logging in per day.
Count the number of customers. Divide by the number of days. I think that is:
select count(*) * 1.0 / count(distinct cast(login_date as date))
from customerdetails;
I understand that you want to count the number of visitors per day, not the number of visits. So if a customer logged in twice on the same day, you want to count him only once.
If so, you can use distinct and two levels of aggregation, like so:
select avg(cnt_visitors) avg_cnt_vistors_per_day
from (
select count(distinct customer_id) cnt_visitors
from customer_details
group by cast(login_date as date)
) t
The inner query computes the count of distinct customers for each day; the outer query gives you the overall average.

Looking to create a query in SQL that states

I am relatively new to SQL and I'm looking to create a query that states how many records were created by users other than a certain "good" group of userids, grouped by month if possible. Any suggestions? I have some basic logic set out below.
Table is called newcompanies
SELECT COUNT(record_num), userid
FROM Newcompanies
WHERE userID <> (certain group of userIds)
GROUP BY Month
Will I be required to create a second table where the group of "good" userids is held?
There are a few ways to do this. Without knowing your exact columns, this will be a rough estimate.
SELECT id,
DATEPART(MONTH, created_date) AS created_month,
COUNT(*)
FROM your_table
WHERE id NOT IN(
--hardcode userID's here
)
GROUP BY
id,
DATEPART(MONTH, created_date)
Or you could have a table with your good id's and then exclude those.
SELECT id,
DATEPART(MONTH, created_date) AS created_month,
COUNT(*)
FROM your_table
WHERE id NOT IN(
SELECT id
from your_good_id_table
)
GROUP BY
id,
DATEPART(MONTH, created_date)
-- If month is not a column in the table you will have to parse it out; the
-- function depends on the SQL database you are using. In MS SQL you can do Month(datefield).
SELECT COUNT(record_num), userid, Month
FROM Newcompanies
WHERE userID NOT IN (
Select UserID
from ExcludeTheseUserIDs
)
GROUP BY Month, userid
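The lookup-table variant above can be demonstrated in SQLite from Python; strftime('%m', ...) stands in for Month()/DATEPART, and all table names and rows here are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Newcompanies (record_num INTEGER, userid TEXT, created_date TEXT)"
)
conn.execute("CREATE TABLE good_users (userid TEXT)")
conn.executemany(
    "INSERT INTO Newcompanies VALUES (?, ?, ?)",
    [(1, "alice", "2023-01-05"), (2, "bob", "2023-01-09"),
     (3, "bob", "2023-02-02")],
)
conn.execute("INSERT INTO good_users VALUES ('alice')")

# Exclude the "good" userids via NOT IN against the lookup table,
# then count the remaining records per month and user.
rows = conn.execute("""
    SELECT strftime('%m', created_date) AS month, userid, COUNT(record_num)
    FROM Newcompanies
    WHERE userid NOT IN (SELECT userid FROM good_users)
    GROUP BY month, userid
    ORDER BY month
""").fetchall()
print(rows)  # [('01', 'bob', 1), ('02', 'bob', 1)]
```

One caveat worth knowing: if the exclusion list can contain NULLs, NOT IN matches no rows at all; NOT EXISTS avoids that trap.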

Number of Unique Visits in SQL

I am using Postgres 9.1 to count the number of unique patient visits over a given period of time, using the invoices entered for each patient.
I have two columns, transactions.ptnumber and transactions.dateofservice, and can calculate the patient visits in the following manner:
select count(*)
from transactions
where transactions.dateofservice between '2012-01-01' and '2013-12-31'
The problem is that sometimes one patient might get two invoices for the same day, but that should be counted as only one patient visit.
If I use SELECT DISTINCT or GROUP BY on the column transactions.ptnumber, that would count the number of patients who were seen (but not the number of times they were seen).
If I use SELECT DISTINCT or GROUP BY on the column transactions.dateofservice, that would count the number of days that had an invoice.
Not sure how to approach this.
This will return unique patients per day.
select count(distinct transactions.ptnumber) as cnt
from transactions
where transactions.dateofservice between '2012-01-01' and '2013-12-31'
group by transactions.dateofservice
You can sum the daily counts to get the total number of visits for the whole period:
select sum(cnt) from (
select count(distinct transactions.ptnumber) as cnt
from transactions
where transactions.dateofservice between '2012-01-01' and '2013-12-31'
group by transactions.dateofservice
) t
You might use a subselect; consider:
Select count(*)
from (select ptnumber, dateofservice
from transactions
where dateofservice between '2012-01-01' and '2013-12-31'
group by ptnumber, dateofservice
) t
You may also want to make this a stored procedure so you can pass in the date range.
There are multiple ways to achieve this, but you could use the WITH clause to construct a temporary result set that contains the unique visits, then count the results!
WITH UniqueVisits AS
(SELECT DISTINCT transactions.ptnumber, transactions.dateofservice
FROM transactions
WHERE transactions.dateofservice between '2012-01-01' and '2013-12-31')
SELECT COUNT(*) FROM UniqueVisits
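The idea in all three answers is that a visit is a distinct (ptnumber, dateofservice) pair, so DISTINCT over both columns collapses same-day duplicate invoices before counting. A runnable check in SQLite with invented rows (patient 1 has two invoices on one day, which should count as one visit):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (ptnumber INTEGER, dateofservice TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, "2012-03-01"), (1, "2012-03-01"),  # two invoices, one visit
     (1, "2012-03-02"), (2, "2012-03-01")],
)

# DISTINCT over (patient, day) pairs collapses duplicate invoices,
# then the outer COUNT tallies the visits.
visits = conn.execute("""
    WITH UniqueVisits AS (
        SELECT DISTINCT ptnumber, dateofservice
        FROM transactions
        WHERE dateofservice BETWEEN '2012-01-01' AND '2013-12-31'
    )
    SELECT COUNT(*) FROM UniqueVisits
""").fetchone()[0]
print(visits)  # 3
```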