SQL to find unique counts between two date fields - sql

I was reading this but can't manage to hack it to work on my own problem.
My data has the following fields, in a single table in Postgres:
Seller_id (varchar) (contains duplicates).
SKU (varchar) (contains duplicates).
selling_starts (datetime).
selling_ends (datetime).
I want to query it so I get the count of unique SKUs on sale, per seller, per day. If there are any null days I don't need these.
I've tried before querying it by using another table to generate a list of unique "filler" dates and then joining it to where the date is more than the selling_starts and less than the selling_ends fields. However, this is so computationally expensive that I get timeout errors.
I'm vaguely aware there are probably more efficient ways of doing this via with statements to create CTEs or some sort of recursive function, but I don't have any experience of this.
Any help much appreciated!

Try this:
WITH list AS
( SELECT generate_series(date_trunc('day', min(selling_starts)), max(selling_ends), '1 day') AS ref_date
FROM your_table
)
SELECT seller_id
, l.ref_date
, count(DISTINCT sku) AS sku_count
FROM your_table AS t
INNER JOIN list AS l
ON t.selling_starts <= l.ref_date
AND t.selling_ends > l.ref_date
GROUP BY seller_id, l.ref_date
If your_table is large, you should create indexes to accelerate the query.
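To see what the join against a generated calendar produces, here is a minimal sketch that runs the same idea on a tiny in-memory SQLite database from Python. It is not the Postgres query itself: SQLite has no built-in generate_series, so a recursive CTE stands in for it, the dates are stored as ISO text, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (seller_id TEXT, sku TEXT, selling_starts TEXT, selling_ends TEXT);
INSERT INTO your_table VALUES
  ('s1', 'A', '2021-01-01', '2021-01-03'),
  ('s1', 'A', '2021-01-02', '2021-01-04'),  -- same SKU listed twice, overlapping
  ('s1', 'B', '2021-01-02', '2021-01-03'),
  ('s2', 'C', '2021-01-03', '2021-01-05');
""")

query = """
WITH RECURSIVE list(ref_date) AS (
    -- recursive CTE standing in for Postgres generate_series
    SELECT date(min(selling_starts)) FROM your_table
    UNION ALL
    SELECT date(ref_date, '+1 day') FROM list
    WHERE ref_date < (SELECT date(max(selling_ends)) FROM your_table)
)
SELECT t.seller_id, l.ref_date, count(DISTINCT t.sku) AS sku_count
FROM your_table AS t
INNER JOIN list AS l
        ON date(t.selling_starts) <= l.ref_date
       AND date(t.selling_ends)   >  l.ref_date
GROUP BY t.seller_id, l.ref_date
ORDER BY t.seller_id, l.ref_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because of the inner join, days on which a seller has nothing on sale simply do not appear, which matches the "no null days" requirement. The overlapping listings of SKU A are counted once per day thanks to count(DISTINCT ...).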

Related

Delete duplicates using dense rank

I have a sales data table with cust_ids and their transaction dates.
I want to create a table that stores, for every customer, their cust_id, their last purchased date (on the basis of transaction dates) and the count of times they have purchased.
I wrote this code:
SELECT
cust_xref_id, txn_ts,
DENSE_RANK() OVER (PARTITION BY cust_xref_id ORDER BY CAST(txn_ts as timestamp) DESC) AS rank,
COUNT(txn_ts)
FROM
sales_data_table
But I understand that the above code would give an output like this (attached example picture)
How do I modify the code to get an output like :
I am a beginner in SQL queries and would really appreciate any help! :)
This would be an aggregation query, which changes the table key from (customer_id, date) to (customer_id):
SELECT
cust_xref_id,
MAX(txn_ts) as last_purchase_date,
COUNT(txn_ts) as count_purchase_dates
FROM
sales_data_table
GROUP BY
cust_xref_id
You are looking for the last purchase date and the count of distinct transaction dates (if a person buys twice on the same date, it should be counted once).
Although you said you want the count of dates, your sample data suggests you want the count of distinct dates: customer 284214 transacted 9 times, but counting distinct dates gives 7.
So, here is SQL you can use to get that result:
SELECT
cust_xref_id,
MAX(txn_ts) as last_purchase_date,
COUNT(DISTINCT txn_ts) as count_purchase_dates -- DISTINCT counts each date only once
FROM sales_data_table
GROUP BY 1
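The difference between the two answers is only COUNT versus COUNT(DISTINCT). A small sketch, assuming an in-memory SQLite table with made-up rows (customer 1 buys twice on the same date), puts both counts side by side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_data_table (cust_xref_id INTEGER, txn_ts TEXT);
INSERT INTO sales_data_table VALUES
  (1, '2021-03-01'), (1, '2021-03-01'), (1, '2021-03-05'),
  (2, '2021-02-10');
""")

rows = conn.execute("""
SELECT cust_xref_id,
       MAX(txn_ts)            AS last_purchase_date,
       COUNT(txn_ts)          AS count_purchases,       -- every transaction row
       COUNT(DISTINCT txn_ts) AS count_purchase_dates   -- each date once
FROM sales_data_table
GROUP BY cust_xref_id
ORDER BY cust_xref_id
""").fetchall()
for row in rows:
    print(row)
```

For customer 1 the plain count is 3 and the distinct count is 2, so which one you want depends on whether repeat purchases on one date should count separately.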

Joining 2 tables with a timestamp of YEAR for the first item purchased by customer

I have 2 tables "items" and "customers".
In both tables, "customer_id" is present and a single customer can have more than 1 item. In the "items" table there is also a timestamp field called "date_created" when an item was purchased.
I want to construct a query that can return each customer_id and item_id associated with the first item each customer bought in a specific year, let's say 2020.
My approach was
SELECT customer_id, items
INNER JOIN items ON items.customer_id=customers.customer_id
and then try to use the EXTRACT function to take care of the first item each customer bought in 2020 but I can't seem to extract the first item only for the specific year. I would really appreciate some help.
I am using PostgreSQL. Thank you.
Just use distinct on:
select distinct on (customer_id) i.*
from items i
where date_created >= date '2020-01-01' and
date_created < date '2021-01-01'
order by customer_id, date_created;
First, note the use of direct date comparisons. This makes it easier for the optimizer to choose the best execution plan.
distinct on is a handy Postgres extension that returns the first row encountered for the keys in parentheses, which must be the first keys in the order by. "First" is based on the subsequent order by keys.
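Since distinct on is Postgres-only, it can help to see the same "first row per key" logic written with ROW_NUMBER(). The sketch below is an assumption-laden stand-in, not the query above: it uses SQLite (3.25 or later, for window functions) from Python, text timestamps, and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER, customer_id INTEGER, date_created TEXT);
INSERT INTO items VALUES
  (10, 1, '2019-12-30 09:00:00'),
  (11, 1, '2020-02-01 12:00:00'),
  (12, 1, '2020-01-15 08:30:00'),
  (13, 2, '2020-06-01 10:00:00'),
  (14, 3, '2021-01-01 00:00:00');
""")

rows = conn.execute("""
SELECT customer_id, item_id, date_created
FROM (
    SELECT i.*,
           -- number each customer's 2020 items from earliest to latest
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY date_created) AS rn
    FROM items AS i
    WHERE date_created >= '2020-01-01'
      AND date_created <  '2021-01-01'
) AS ranked
WHERE rn = 1
ORDER BY customer_id
""").fetchall()
for row in rows:
    print(row)
```

Customer 3 has no purchase in 2020 and so does not appear, and customer 1's 2019 purchase is ignored by the half-open date range.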

Bigquery - how to aggregate data based on conditions

I have a simple table like the following, which has product, price, cost and category. price and cost can be null.
This table is updated from time to time. I want a daily summary of its content grouped by category, showing for each category how many products have no price, how many have a price, and how many have a price that is higher than the cost, so the result table would look like the following:
I think I can get a query running everyday by setting up query re-run schedule in bigQuery, so I can have three rows of data appended to the result table everyday.
But the problem is, how can I get those three rows? I know I can group by, but how do I get the count with those conditions like not null, larger than, etc.
You seem to want window functions:
select t.*,
countif(price is null) over (partition by date) as products_no_price,
countif(price <= cost) over (partition by date) as products_price_lower_than_cost
from t;
You can run this code on a table that has a date column. In fact, you don't need to store the last two columns.
If you want to insert the first table into the second, then there is no date and you can simply use:
select t.*,
countif(price is null) over () as products_no_price,
countif(price <= cost) over () as products_price_lower_than_cost
from t;
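If one summary row per category is what you are after, conditional aggregation with GROUP BY is another way to get the counts. The sketch below runs on SQLite from Python, so SUM(CASE WHEN ...) stands in for BigQuery's COUNTIF; the table layout follows the question, but the sample rows and the output column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (product TEXT, price REAL, cost REAL, category TEXT);
INSERT INTO t VALUES
  ('p1', NULL, 5.0,  'toys'),
  ('p2', 10.0, 5.0,  'toys'),
  ('p3', 4.0,  5.0,  'toys'),
  ('p4', 7.0,  NULL, 'books'),
  ('p5', NULL, NULL, 'books');
""")

rows = conn.execute("""
SELECT category,
       SUM(CASE WHEN price IS NULL     THEN 1 ELSE 0 END) AS products_no_price,
       SUM(CASE WHEN price IS NOT NULL THEN 1 ELSE 0 END) AS products_with_price,
       -- a NULL price or cost makes the comparison NULL, so the row counts as 0
       SUM(CASE WHEN price > cost      THEN 1 ELSE 0 END) AS products_price_higher_than_cost
FROM t
GROUP BY category
ORDER BY category
""").fetchall()
for row in rows:
    print(row)
```

In BigQuery each SUM(CASE ...) could presumably be written as COUNTIF(condition) with the same GROUP BY, and a scheduled query could append the result with the run date.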

Make table ID appear as a column and select across all tables

I've been requested by my superiors to write a query that will search every table in a database (each representative of a road and their total counts of traffic) and take the total counts by hour of motorcycles. Here's what I have so far whilst testing on one table:
WITH
totalCount AS
(
SELECT DATEDIFF(dd,0,event_time) AS DaySerial,
DATEPART(dd,event_time) AS theDay,
DATEDIFF(mm,0,event_time) AS MonthSerial,
DATEPART(mm,event_time) AS MonthofYear,
DATEDIFF(hh,0,event_time) AS HourSerial,
DATEPART(hh,event_time) AS Hour,
COUNT(*) AS HourlyCount,
DATEDIFF(yy,0,event_time) AS YearSerial,
DATEPART(yy,event_time) AS theYear
FROM [RUD].dbo.[10011E]
WHERE length <='1.7'
GROUP BY DATEDIFF(hh,0,event_time),
DATEPART(hh,event_time),
DATEDIFF(dd,0,event_time),
DATEPART(dd,event_time),
DATEDIFF(mm,0,event_time),
DATEPART(mm,event_time),
DATEDIFF(yy,0,event_time),
DATEPART(yy,event_time)
)
SELECT
theYear,
MonthofYear,
theDay,
Hour,
AVG(HourlyCount) AS Avg_Count
FROM
totalCount
GROUP BY
theYear,
MonthofYear,
theDay,
Hour
ORDER BY
theYear,
MonthofYear,
theDay,
Hour
Now I'm sure some of this is redundant or not needed, that's ok for now (I'm new to SQL btw, which is why some of this will be redundant). Basically as it stands, I list the year, month, date, hour and hourly count of motorcycles for one road. Now my two questions:
How do I take this query and make it so that it searches across every single table in the RUD database? Do I just need to list them all and UNION them, or is there a quicker way?
I realise if I search through every table gathering only the above (year, month, day, hour, hourly count) I will end up with the right data but with no way to distinguish which road all the counts are coming from. Is there a way to select the table ID (in this example, 10011E is the ID, and is the assigned name for a specific road) and place it in a column next to the rows that were selected from it?
If anyone needs clarification on what I mean, please let me know! Thanks!
One option would be to use UNION ALL and add an extra column that identifies the source table. You'll have to write out each of your tables in this case, but it's perhaps your fastest option:
SELECT ID, 'YourTable' TableName
FROM YourTable
UNION ALL
SELECT ID, 'YourOtherTable'
FROM YourOtherTable
....
Alternatively, dynamic SQL could produce the same results. You might not have to type out all your table names, but it comes with a performance hit.
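As a rough illustration of the dynamic approach, the sketch below builds the UNION ALL from the list of tables and tags each row with its table name. It uses SQLite from Python rather than SQL Server, so sqlite_master and strftime stand in for the SQL Server catalog views and DATEPART; the second table name, the helper functions and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "10011E" (event_time TEXT, length REAL);
CREATE TABLE "10012W" (event_time TEXT, length REAL);
INSERT INTO "10011E" VALUES
  ('2020-01-01 08:05:00', 1.5),
  ('2020-01-01 08:40:00', 1.6),
  ('2020-01-01 09:10:00', 4.2);   -- too long to be a motorcycle
INSERT INTO "10012W" VALUES ('2020-01-01 08:15:00', 1.2);
""")

def quote_ident(name):
    """Quote a table name for use as an identifier."""
    return '"' + name.replace('"', '""') + '"'

def quote_literal(text):
    """Quote a string for use as a SQL literal."""
    return "'" + text.replace("'", "''") + "'"

# One road per table: read the table names from the catalog.
table_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# One SELECT per table, each carrying its table name as road_id.
parts = [
    f"SELECT {quote_literal(name)} AS road_id, "
    f"strftime('%Y-%m-%d %H', event_time) AS hour_bucket, "
    f"COUNT(*) AS hourly_count "
    f"FROM {quote_ident(name)} WHERE length <= 1.7 GROUP BY hour_bucket"
    for name in table_names
]
query = " UNION ALL ".join(parts) + " ORDER BY road_id, hour_bucket"
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The road_id column answers the second question: every row says which table it came from, so the combined result can still be filtered or grouped per road.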

How to produce a distinct count of records that are stored by day by month

I have a table with several "ticket" records in it. Each ticket is stored by day (e.g. 2011-07-30 00:00:00.000). I would like to count the unique records in each month by year. I have used the following SQL statement:
SELECT DISTINCT
YEAR(TICKETDATE) as TICKETYEAR,
MONTH(TICKETDATE) AS TICKETMONTH,
COUNT(DISTINCT TICKETID) AS DAILYTICKETCOUNT
FROM
NAT_JOBLINE
GROUP BY
YEAR(TICKETDATE),
MONTH(TICKETDATE)
ORDER BY
YEAR(TICKETDATE),
MONTH(TICKETDATE)
This does produce a count but it is wrong as it picks up the unique tickets for every day. I just want a unique count by month.
Try combining Year and Month into one field, and grouping on that new field.
You may have to cast them to varchar to ensure that they don't simply get added together. Or you could multiply the year by 100 before adding the month:
SELECT
(YEAR(TICKETDATE) * 100) + MONTH(TICKETDATE),
count(*) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE GROUP BY
(YEAR(TICKETDATE) * 100) + MONTH(TICKETDATE)
Presuming that TICKETID is not a primary or unique key, but does appear multiple times in table NAT_JOBLINE, that query should work. If it is unique (does not occur in more than one row per value), you will need to select on a different column, one that uniquely identifies the "entity" that you want to count, rather than each occurrence/instance/reference of that entity.
(As ever, it is hard to tell without working with the actual data.)
I think you need to remove the first DISTINCT. You already have the GROUP BY. If I were the first DISTINCT, I would be confused as to what I was supposed to do.
SELECT
YEAR(TICKETDATE) as TICKETYEAR,
MONTH(TICKETDATE) AS TICKETMONTH,
COUNT(DISTINCT TICKETID) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE
GROUP BY YEAR(TICKETDATE), MONTH(TICKETDATE)
ORDER BY YEAR(TICKETDATE), MONTH(TICKETDATE)
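To check what the grouped query counts, here is a small sketch on SQLite from Python. It assumes TICKETID repeats across rows and days, uses strftime in place of YEAR() and MONTH(), and the sample rows and the JOBLINES/MONTHLYTICKETCOUNT column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NAT_JOBLINE (TICKETID INTEGER, TICKETDATE TEXT);
INSERT INTO NAT_JOBLINE VALUES
  (1, '2011-07-30 00:00:00'),
  (1, '2011-07-30 00:00:00'),
  (1, '2011-07-31 00:00:00'),
  (2, '2011-07-31 00:00:00'),
  (2, '2011-08-01 00:00:00');
""")

rows = conn.execute("""
SELECT CAST(strftime('%Y', TICKETDATE) AS INTEGER) AS TICKETYEAR,
       CAST(strftime('%m', TICKETDATE) AS INTEGER) AS TICKETMONTH,
       COUNT(*)                 AS JOBLINES,            -- every row in the month
       COUNT(DISTINCT TICKETID) AS MONTHLYTICKETCOUNT   -- each ticket once per month
FROM NAT_JOBLINE
GROUP BY TICKETYEAR, TICKETMONTH
ORDER BY TICKETYEAR, TICKETMONTH
""").fetchall()
for row in rows:
    print(row)
```

In July the table holds four rows but only two distinct tickets, which is the gap between COUNT(*) and COUNT(DISTINCT TICKETID) when grouping by month.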
From what I understand from your comments to Phillip Kelley's solution:
SELECT TICKETDATE, COUNT(*) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE
GROUP BY TICKETDATE
should do the trick, but I suggest you update your question.