How to show only records that have occurred after an event - Redshift/PostgreSQL - sql

This may sound a little convoluted, so please bear with me.
I have two tables.
One is my sales table, which has the columns:
ordernumber, storenumber, customerid, ordervalue, purchasedate
The other is my marketing events table, which has many columns, but I am using:
eventtype, subscriberkey and eventdate
I have unioned the tables; this is their current structure:
marketing.eventtype = sales.ordernumber, marketing.subscriberkey = sales.customerid and sales.purchasedate = marketing.eventdate. Order value and store number are NULL when empty (that is, on marketing rows).
I only want to show the sales that come directly after an eventtype or several event types, so that a sale is shown as the last record once the list is ordered by customerid/subscriberkey.
I also want to limit it to marketing events between certain dates and allow a conversion window, in days, for the sale.
SELECT ordernumber,
storenumber,
customerid,
ordervalue,
purchasedate FROM table.sales oh
WHERE purchasedate > '2016-01-16'
AND purchasedate < '2016-01-24'
AND ordervalue > 0
AND customerid IN (SELECT subscriberkey
FROM table.marketing
WHERE eventtype = 'Open'
AND eventdate < '2016-01-19'
AND eventdate > '2016-01-15')
UNION
SELECT eventtype AS ordernumber,
NULL AS storenumber,
subscriberkey AS customerid,
NULL AS ordervalue,
eventdate AS purchasedate
FROM table.marketing
WHERE eventtype = 'Open'
AND eventdate < '2016-01-19'
AND eventdate > '2016-01-15'
AND subscriberkey IN (SELECT customerid
FROM table.sales oh
WHERE purchasedate > '2016-01-16'
AND purchasedate < '2016-01-24'
)
ORDER BY customerid,
purchasedate ASC

Let's call the result of your existing query "events" (it will be used as a subquery). Showing only the sales that came after a marketing event can then be done with a window function.
First, we copy data from the following row into the current row. Because the rows are ordered in reverse, this copies marketing data into the sales row:
SELECT events.*,
LEAD(events.ordernumber, 1) OVER (PARTITION BY customerid ORDER BY purchasedate DESC) AS following_ordernumber,
LEAD(events.purchasedate, 1) OVER (PARTITION BY customerid ORDER BY purchasedate DESC) AS following_purchasedate
FROM events
By partitioning by customer we isolate each customer's events, and by ordering in reverse by date we can use LEAD (which only looks forward) to pull the marketing data into the sales rows. Let's call this new query "enriched_sales".
Then, we filter out everything but sales events:
WHERE enriched_sales.ordernumber != 'Open' and enriched_sales.following_ordernumber = 'Open'
In reverse order, we want the sales rows that are immediately followed by a marketing row. After the copy, the ordernumber of such a sales row is not 'Open', while the copied following_ordernumber is 'Open'.
Finally, we use the dates to compute the conversion window:
SELECT datediff(days, following_purchasedate, purchasedate) ...
If you post the table definitions and input/output data I can produce the whole query, but hopefully you get the idea.
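The steps above can be put together in a small runnable sketch. It uses Python's built-in sqlite3 module rather than Redshift, so the table contents are made-up sample data, "events" is a plain table standing in for the unioned query, and julianday() arithmetic stands in for Redshift's datediff. Treat it as an illustration of the idea under those assumptions, not as the finished Redshift query.

```python
import sqlite3

# Assumption: "events" holds the result of the UNION query from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (ordernumber TEXT, customerid INTEGER,
                     ordervalue REAL, purchasedate TEXT);
INSERT INTO events VALUES
  ('Open', 1, NULL, '2016-01-16'),
  ('Open', 1, NULL, '2016-01-17'),
  ('A100', 1, 50.0, '2016-01-20'),
  ('A101', 1, 20.0, '2016-01-22'),
  ('B200', 2, 30.0, '2016-01-17'),
  ('Open', 2, NULL, '2016-01-18');
""")

query = """
WITH enriched_sales AS (
  SELECT e.*,
         -- In descending date order, the "next" row is the chronologically previous event.
         LEAD(ordernumber)  OVER (PARTITION BY customerid ORDER BY purchasedate DESC)
             AS following_ordernumber,
         LEAD(purchasedate) OVER (PARTITION BY customerid ORDER BY purchasedate DESC)
             AS following_purchasedate
  FROM events e
)
SELECT customerid, ordernumber, purchasedate,
       -- SQLite stand-in for datediff(days, following_purchasedate, purchasedate)
       CAST(julianday(purchasedate) - julianday(following_purchasedate) AS INTEGER)
           AS days_to_convert
FROM enriched_sales
WHERE ordernumber != 'Open' AND following_ordernumber = 'Open'
ORDER BY customerid
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this sample data only order A100 is kept: A101 follows another sale rather than an 'Open' event, and B200 was placed before customer 2's 'Open' event.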

Related

BigQuery: Returning Records for a list of specific customers

I have a list of CustomerId in an Excel sheet that I want to use as a filter in BigQuery.
For example:
SELECT CustomerId, Status, OrderTotal, StoreCode, PaymentAmount FROM Orders
WHERE OrderPlacedTime > '2022-01-01'
AND CustomerId = '1,2,3 ... 1000'
Is there an easier way to input all these CustomerId values? Or do I need to transpose the IDs and separate them with a comma in order for the query to run?
Use the query below as a starting point:
SELECT CustomerId, Status, OrderTotal, StoreCode, PaymentAmount
FROM Orders
WHERE OrderPlacedTime > '2022-01-01'
AND CAST(CustomerId AS STRING) IN UNNEST(SPLIT('1,2,3 ... 1000'))

PARTITION BY first use of a particular product

I'm trying to produce a table that lists the month, account and product name from our billing database. However, I also want to understand (for subsequent cohort analysis) what the earliest use is of "Product A" for each line item too. I was hoping I could do the following:
SELECT
Month,
AccountID,
ProductName,
SUM(NetRevenue) AS NetRevenue,
MIN(Month) OVER(PARTITION BY AccountID, 'Product A') AS EarliestUse
FROM
<<my-billing-table>>
WHERE
NetRevenue > 0
AND AccountID IN (
SELECT DISTINCT AccountID
FROM <<my-billing-table>>
WHERE ProductName = 'Product A' AND NetRevenue > 0
)
GROUP BY 1,2,3
...but it seems that just using "Product A" within the OVER clause does not have the desired effect (it seems to just return the first month for AccountID).
While the syntax is fine and the query runs, I'm obviously missing something regarding PARTITIONing the OVER clause. Any help much appreciated!
I think you want conditional aggregation along with a window function:
SELECT Month, AccountID, ProductName,
SUM(NetRevenue) AS NetRevenue,
MIN(MIN(CASE WHEN ProductName = 'Product A' THEN month END)) OVER (PARTITION BY AccountID) AS EarliestUse
FROM <<my-billing-table>>
WHERE NetRevenue > 0 AND
AccountID IN (SELECT AccountID
FROM <<my-billing-table>>
WHERE ProductName = 'Product A' AND NetRevenue > 0
)
GROUP BY 1,2,3;
The key expression here is an aggregation function nested inside a window function. The aggregation function is MIN(CASE WHEN ProductName = 'Product A' THEN month END). This calculates the earliest month for the specified product within each group. This could be a column in the result set on its own, and you would see the minimum value only on the 'Product A' rows.
The window function then "spreads" this value over all rows for a given AccountID.
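To see the nested aggregate at work, here is a minimal sketch using Python's sqlite3 module. The "billing" table and its rows are invented for illustration, and the AccountID IN (...) filter from the question is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for <<my-billing-table>>
CREATE TABLE billing (Month TEXT, AccountID INTEGER, ProductName TEXT, NetRevenue REAL);
INSERT INTO billing VALUES
  ('2020-01', 1, 'Product B', 10),
  ('2020-02', 1, 'Product A', 5),
  ('2020-02', 1, 'Product A', 7),
  ('2020-03', 1, 'Product A', 4),
  ('2020-03', 2, 'Product A', 9);
""")

query = """
SELECT Month, AccountID, ProductName,
       SUM(NetRevenue) AS NetRevenue,
       -- Inner MIN: per-group earliest 'Product A' month (NULL for other products).
       -- Outer MIN ... OVER: spreads that value across every row of the account.
       MIN(MIN(CASE WHEN ProductName = 'Product A' THEN Month END))
           OVER (PARTITION BY AccountID) AS EarliestUse
FROM billing
WHERE NetRevenue > 0
GROUP BY Month, AccountID, ProductName
ORDER BY AccountID, Month, ProductName
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The 'Product B' row for account 1 gets EarliestUse '2020-02' even though its own group contains no 'Product A' rows, which is the "spreading" described above.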
You are using a constant in the PARTITION BY clause, which has no effect on the result. Use the ProductName column in the partition instead to get the earliest use of each product:
SELECT
Month,
AccountID,
ProductName,
SUM(NetRevenue) AS NetRevenue,
MIN(Month) OVER(PARTITION BY AccountID, ProductName) AS EarliestUse
FROM
<<my-billing-table>>
WHERE
NetRevenue > 0
AND AccountID IN (
SELECT DISTINCT AccountID
FROM <<my-billing-table>>
WHERE ProductName = 'Product A' AND NetRevenue > 0
)
GROUP BY 1,2,3

Tweaking a Query - looking for duplicates within a certain day range

I posted a question similar to this, and got an answer, but the answer isn't configurable - my fault I should have been more clear, so I'll try again.
I have a table, TABLENAME, with the following columns: OrderDate, OrderNumber, CustomerID, ProductSKU, ProductName. The table holds invoice lines, so an order has one row for every item in the order.
I want to know which customers have ordered the same item more than once, where the order is within 90 days of any other order of that same product by that customer, after a specific date. The same product appearing more than once in the same order number does not count. The catch is that I want "more than once" to be configurable, so that I can adjust it to see 3 or more, or 4 or more, AND I want to see the counts. Here's the query I have so far, which I think gives me the items and the counts, but not the 90-day part:
EDITED: I don't think the former version gave me the right counts
SELECT customerid, productsku, productname, count(distinct ordernumber) FROM tablename
WHERE orderdate >'2017-11-01'
GROUP BY customerid, productsku, productname
HAVING COUNT(distinct ordernumber) > 2
Try this; it goes back 90 days from the given date:
declare @date date = '2017-11-01'
SELECT customerid, productsku, productname, count(distinct ordernumber) FROM tablename
WHERE orderdate >= dateadd(DD,-90,@date) and orderdate <= @date
GROUP BY customerid, productsku, productname
HAVING COUNT(distinct ordernumber) > 1
Yes, that is what I was doing in the first query. This may not be the most elegant way of doing it, and it was hard to write without seeing any data, but this query also gives you the order dates. Hope it helps.
WITH DupsWithin90Days (customerid,productsku,productname,orderdate,num)
as
(
select customerid,productsku,productname,orderdate ,count(*) num from (
SELECT X.customerid, X.productsku, X.productname,X.ORDERDATE,ROW_NUMBER() OVER (partition by x.customerid,x.orderdate order by x.orderdate) rownum
FROM
(
SELECT T1.customerid, T1.productsku, T1.productname,T1.ORDERDATE
FROM TABLENAME1 T1
) X
JOIN
(
SELECT T2.customerid, T2.productsku, T2.productname,T2.ORDERDATE
FROM
TABLENAME1 T2
) Y
ON X.customerid = Y.customerid AND X.orderdate >= dateadd(DD,-90,Y.orderdate)
) dup
where rownum > 1
group by customerid,productsku,productname,orderdate
)
select customerid,productsku,productname,orderdate
from DupsWithin90Days
order by customerid ,orderdate desc
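As an alternative sketch of what the question asks for (not the approach in the answers above), a self-join on customer and product can pick out orders that have another order of the same product within the window, and a HAVING threshold makes "more than once" configurable. The example runs on Python's sqlite3 with invented sample rows; repeat_buyers, window_days and min_orders are hypothetical names, and julianday() replaces SQL Server's date functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (orderdate TEXT, ordernumber TEXT, customerid INTEGER,
                        productsku TEXT, productname TEXT);
INSERT INTO tablename VALUES
  ('2017-11-05', 'O1', 1, 'X', 'Widget'),
  ('2017-11-05', 'O1', 1, 'X', 'Widget'),  -- same order twice: must not count
  ('2017-12-01', 'O2', 1, 'X', 'Widget'),
  ('2018-06-01', 'O3', 1, 'X', 'Widget'),
  ('2017-11-10', 'O4', 2, 'X', 'Widget'),
  ('2018-05-01', 'O5', 2, 'X', 'Widget'),
  ('2017-10-01', 'O6', 3, 'Y', 'Gadget'),  -- before the start date
  ('2017-11-15', 'O7', 3, 'Y', 'Gadget');
""")

QUERY = """
WITH orders AS (            -- one row per customer/product/order after the start date
  SELECT DISTINCT customerid, productsku, ordernumber, orderdate
  FROM tablename
  WHERE orderdate > :start_date
),
near AS (                   -- orders with another order of the same product nearby
  SELECT DISTINCT a.customerid, a.productsku, a.ordernumber
  FROM orders a
  JOIN orders b
    ON a.customerid = b.customerid
   AND a.productsku = b.productsku
   AND a.ordernumber <> b.ordernumber
   AND ABS(julianday(a.orderdate) - julianday(b.orderdate)) <= :window_days
)
SELECT customerid, productsku, COUNT(*) AS order_count
FROM near
GROUP BY customerid, productsku
HAVING COUNT(*) >= :min_orders
ORDER BY customerid, productsku
"""

def repeat_buyers(conn, start_date, window_days=90, min_orders=2):
    """Return (customerid, productsku, order_count) rows meeting the threshold."""
    params = {"start_date": start_date, "window_days": window_days,
              "min_orders": min_orders}
    return conn.execute(QUERY, params).fetchall()

print(repeat_buyers(conn, "2017-11-01", 90, 2))
```

Raising min_orders to 3 or 4 changes the threshold without touching the query text. Note that this counts orders that each have at least one neighbour within the window, which is one possible reading of the requirement.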

Summing a column over a date range in a CTE?

I'm trying to sum a certain column over a certain date range. The kicker is that I want this to be a CTE, because I'll have to use it multiple times as part of a larger query. Since it's a CTE, it has to have the date column as well as the sum and ID columns, meaning I have to group by date AND ID. That will cause my results to be grouped by ID and date, giving me not a single sum over the date range, but a bunch of sums, one for each day.
To make it simple, say we have:
create table orders (
id int primary key,
itemID int foreign key references items.id,
datePlaced datetime,
salesRep int foreign key references salesReps.id,
price int,
amountShipped int);
Now, we want to get the total money a given sales rep made during a fiscal year, broken down by item. That is, ignoring the fiscal year bit:
select itemName, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
group by itemName
Simple enough. But when you add anything else, even the price, the query spits out way more rows than you wanted.
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
group by itemName, price
Now, each group is (name, price) instead of just (name). This is kind of pseudocode, but in my database, just this change causes my result set to jump from 13 to 32 rows. Add to that the date range, and you really have a problem:
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
and orderDate between 150101 and 151231
group by itemName, price
This is identical to the last example. The trouble is making it a CTE:
with totals as (
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped, orderDate as startDate, orderDate as endDate
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
and orderDate between startDate and endDate
group by itemName, price, startDate, endDate
)
select totals_2015.itemName as itemName_2015, totals_2015.price as price_2015, ...
totals_2016.itemName as itemName_2016, ...
from (
select * from totals
where startDate = 150101 and endDate = 151231
) totals_2015
join (
select *
from totals
where startDate = 160101 and endDate = 160412
) totals_2016
on totals_2015.itemName = totals_2016.itemName
Now the grouping in the CTE is way off, more than adding the price made it. I've thought about breaking the price query into its own subquery inside the CTE, but I can't escape needing to group by the dates in order to get the date range. Can anyone see a way around this? I hope I've made things clear enough. This is running against an IBM iSeries machine. Thank you!
Depending on what you are looking for, this might be a better approach:
select 'by sales rep' breakdown
, salesRep
, '' year
, sum(price * amountShipped) amount
from etc
group by salesRep
union
select 'by sales rep and year' breakdown
, salesRep
, convert(char(4),orderDate, 120) year
, sum(price * amountShipped) amount
from etc
group by salesRep, convert(char(4),orderDate, 120)
etc
Where possible, group by the ID columns or foreign keys: those columns are usually indexed already, so grouping on them tends to be faster.
with cte as (
select id,rep, sum(sales) sls, count(distinct itemid) did, count(*) cnt from somewhere
where date between x and y
group by id,rep
) select * from cte order by rep
Or, joining back to the reps table:
with cte as (
select id,rep, sum(sales) sls, count(distinct itemid) did, count(*) cnt from somewhere
where date between x and y
group by id,rep
) select * from cte join reps on cte.rep = reps.rep order by sls desc
I eventually found a solution, and it doesn't need a CTE at all. I wanted the CTE to avoid code duplication, but this works almost as well. Here's a thread explaining summing conditionally that does exactly what I was looking for.
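The linked thread is not reproduced here, but a common form of conditional summing puts each date range inside its own SUM(CASE ...), so that the dates never need to appear in the GROUP BY. Below is a minimal sketch on Python's sqlite3 with invented rows; it uses ISO date strings and the datePlaced column from the table definition, both of which are assumptions about the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, itemName TEXT);
CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  itemID INTEGER REFERENCES items(id),
  datePlaced TEXT,
  salesRep INTEGER,
  price INTEGER,
  amountShipped INTEGER);
INSERT INTO items VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO orders VALUES
  (1, 1, '2015-03-01', 1234, 10, 1),
  (2, 1, '2015-09-15', 1234, 15, 2),
  (3, 1, '2016-02-01', 1234, 20, 1),
  (4, 2, '2016-03-10', 1234, 7, 3),
  (5, 2, '2015-05-05', 9999, 100, 1);  -- different sales rep, filtered out
""")

query = """
SELECT i.itemName,
       -- Each date range gets its own conditional sum; rows outside it add 0.
       SUM(CASE WHEN o.datePlaced BETWEEN '2015-01-01' AND '2015-12-31'
                THEN o.price ELSE 0 END) AS totalSales_2015,
       SUM(CASE WHEN o.datePlaced BETWEEN '2016-01-01' AND '2016-04-12'
                THEN o.price ELSE 0 END) AS totalSales_2016
FROM orders o
JOIN items i ON i.id = o.itemID
WHERE o.salesRep = 1234
GROUP BY i.itemName
ORDER BY i.itemName
"""
rows = conn.execute(query).fetchall()
print(rows)
```

This yields one row per item with both periods side by side, so no self-join of a CTE is needed.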

SQL: Need help with query construction

I am relatively new with sql and I need some help with some basic query construction.
Problem: To retrieve the number of orders and the customer id from a table based on a set of parameters.
I want to write a query to figure out the number of orders under each customer (Column: Customerid) along with the CustomerID where the number of orders should be greater or equal to 10 and the status of the order should be Active. Moreover, I also want to know the first transaction date of an order belonging to each customerid.
Table Description:
product_orders
Orderid CustomerId Transaction_date Status
------- ---------- ---------------- -------
1 23 2-2-10 Active
2 22 2-3-10 Active
3 23 2-3-10 Deleted
4 23 2-3-10 Active
Query that I have written:
select count(*), customerid
from product_orders
where status = 'Active'
GROUP BY customerid
ORDER BY customerid;
The above statement gives me the total number of orders under each customer id, but it does not fulfil the condition of at least 10 orders.
I also do not know how to display the first transaction date along with the orders under a customerid (the status of that first order could be Active or Deleted; it doesn't matter).
Ideal solutions should look like:
Total Orders CustomerID Transaction Date (the first transaction date)
------------ ---------- ----------------
11 23 1-2-10
Thanks in advance. I hope you guys would be kind enough to stop by and help me out.
Cheers,
Leonidas
SELECT
COUNT(*) AS [Total Orders],
CustomerID,
MIN(Transaction_date) AS [Transaction Date]
FROM product_orders
WHERE product_orders.Status = 'Active'
GROUP BY
CustomerId
HAVING COUNT(*) >= 10
HAVING lets you filter on aggregates such as COUNT(), and MIN() returns the first date.
select
count(*),
customerid,
MIN(Transaction_date)
from product_orders
where status = 'Active'
GROUP BY customerid
HAVING COUNT(*) >= 10
ORDER BY customerid
If you want the earliest date irrespective of status you can sub-query for it
select
count(*),
customerid,
(SELECT min(Transaction_date) FROM product_orders WHERE product_orders.customerid = p.customerid) AS FirstDate
from product_orders P
where status = 'Active'
GROUP BY customerid
HAVING COUNT(*) >= 10
ORDER BY customerid
This query should give you the total active orders for each customer that has 10 or more active orders. It will also display the first active order date.
Select Count(OrderId) as TotalOrders,
CustomerId,
Min(Transaction_Date) as FirstActiveOrder
From Product_Orders
Where [Status] = 'Active'
Group By CustomerId
Having Count(OrderId) >= 10
select count(*), customerid, MIN(Transaction_date) from product_orders
where status = 'Active'
GROUP BY customerid having count(*) >= 10
ORDER BY customerid
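The answers above can be checked end to end with a small sketch on Python's sqlite3. The rows are invented, dates are written in ISO format so that MIN() orders them correctly as text (an assumption, since the question shows dates like 2-2-10), and min_orders is a hypothetical parameter name. The query returns both the first Active date and the first date irrespective of status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product_orders (
  Orderid INTEGER PRIMARY KEY,
  CustomerId INTEGER,
  Transaction_date TEXT,
  Status TEXT)
""")

# Customer 23: one early Deleted order plus 11 Active orders; customer 22: one Active order.
sample = [(23, "2010-02-01", "Deleted")]
sample += [(23, f"2010-02-{day:02d}", "Active") for day in range(2, 13)]
sample += [(22, "2010-02-03", "Active")]
conn.executemany(
    "INSERT INTO product_orders (CustomerId, Transaction_date, Status) VALUES (?, ?, ?)",
    sample,
)

query = """
SELECT COUNT(*) AS total_orders,
       p.CustomerId,
       MIN(p.Transaction_date) AS first_active_date,
       -- Correlated subquery: earliest date regardless of Status
       (SELECT MIN(a.Transaction_date)
        FROM product_orders a
        WHERE a.CustomerId = p.CustomerId) AS first_any_date
FROM product_orders p
WHERE p.Status = 'Active'
GROUP BY p.CustomerId
HAVING COUNT(*) >= :min_orders   -- filter on the aggregate, after grouping
ORDER BY p.CustomerId
"""
result = conn.execute(query, {"min_orders": 10}).fetchall()
print(result)
```

Customer 22 is dropped by the HAVING clause, while customer 23 shows 11 Active orders, with the first Active date one day later than the first date of any status.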