How to get the discount number of customers in prior period? - sql

I have a requirement where I supposed to roll customer data in the prior period of 365 days.
Table:
CREATE TABLE orders (
persistent_key_str character varying,
ord_id character varying(50),
ord_submitted_date date,
item_sku_id character varying(50),
item_extended_actual_price_amt numeric(18,2)
);
Sample data:
INSERT INTO orders VALUES
('01120736182','ORD6266073' ,'2010-12-08','100856-01',39.90),
('01120736182','ORD33997609' ,'2011-11-23','100265-01',49.99),
('01120736182','ORD33997609' ,'2011-11-23','200020-01',29.99),
('01120736182','ORD33997609' ,'2011-11-23','100817-01',44.99),
('01120736182','ORD89267964' ,'2012-12-05','200251-01',79.99),
('01120736182','ORD89267964' ,'2012-12-05','200269-01',59.99),
('01011679971','ORD89332495' ,'2012-12-05','200102-01',169.99),
('01120736182','ORD89267964' ,'2012-12-05','100907-01',89.99),
('01120736182','ORD89267964' ,'2012-12-05','200840-01',129.99),
('01120736182','ORD125155068','2013-07-27','201443-01',199.99),
('01120736182','ORD167230815','2014-06-05','200141-01',59.99),
('01011679971','ORD174927624','2014-08-16','201395-01',89.99),
('01000217334','ORD92524479' ,'2012-12-20','200021-01',29.99),
('01000217334','ORD95698491' ,'2013-01-08','200021-01',19.99),
('01000217334','ORD90683621' ,'2012-12-12','200021-01',29.990),
('01000217334','ORD92524479' ,'2012-12-20','200560-01',29.99),
('01000217334','ORD145035525','2013-12-09','200972-01',49.99),
('01000217334','ORD145035525','2013-12-09','100436-01',39.99),
('01000217334','ORD90683374' ,'2012-12-12','200284-01',39.99),
('01000217334','ORD139437285','2013-11-07','201794-01',134.99),
('01000827006','W02238550001','2010-06-11','HL 101077',349.000),
('01000827006','W01738200001','2009-12-10','EL 100310 BLK',119.96),
('01000954259','P00444170001','2009-12-03','PC 100455 BRN',389.99),
('01002319116','W02242430001','2010-06-12','TR 100966',35.99),
('01002319116','W02242430002','2010-06-12','EL 100985',99.99),
('01002319116','P00532470001','2010-05-04','HO 100482',49.99);
Using the query below I am trying to get the number of distinct customers by order_submitted_date:
select
g.order_date as "Ordered",
count(distinct o.persistent_key_str) as "customers"
from
generate_series(
(select min(ord_submitted_date) from orders),
(select max(ord_submitted_date) from orders),
'1 day'
) g (order_date)
left join
orders o on o.ord_submitted_date between g.order_date - interval '364 days'
and g.order_date
WHERE extract(year from ord_submitted_date) <= 2009
group by 1
order by 1
This is the output I expected.
Ordered Customers
2009-12-03 1
2009-12-10 1
When I execute the query above I get incorrect results.
How can I make this right?

To get your expected output ("the number of distinct customers") - only days with actual orders 2009:
SELECT ord_submitted_date, count(DISTINCT persistent_key_str) AS customers
FROM orders
WHERE ord_submitted_date >= '2009-1-1'
AND ord_submitted_date < '2010-1-1'
GROUP BY 1
ORDER BY 1;
Formulate the WHERE conditions this way to make the query sargable, and input easy.
If you want one row per day (from the earliest entry up to the latest in orders) - within 2009:
SELECT ord_submitted_date AS ordered
, count(DISTINCT o.persistent_key_str) AS customers
FROM (SELECT generate_series(min(ord_submitted_date) -- single query ...
, max(ord_submitted_date) -- ... to get min / max
, '1d')::date FROM orders) g (ord_submitted_date)
LEFT join orders o USING (ord_submitted_date)
WHERE ord_submitted_date >= '2009-1-1'
AND ord_submitted_date < '2010-1-1'
GROUP BY 1
ORDER BY 1;
SQL Fiddle.
Distinct customers per year
SELECT extract(year from ord_submitted_date) AS year
, count(DISTINCT persistent_key_str) AS customers
FROM orders
GROUP BY 1
ORDER BY 1;
SQL Fiddle.

Related

Getting the number of users for this year and last year in SQL query

My table structure like this
root_tstamp
userId
2022-01-26T00:13:24.725+00:00
d2212
2022-01-26T00:13:24.669+00:00
ad323
2022-01-26T00:13:24.629+00:00
adfae
2022-01-26T00:13:24.573+00:00
adfa3
2022-01-26T00:13:24.552+00:00
adfef
...
...
2021-01-26T00:12:24.725+00:00
d2212
2021-01-26T00:15:24.669+00:00
daddfe
2021-01-26T00:14:24.629+00:00
adfda
2021-01-26T00:12:24.573+00:00
466eff
2021-01-26T00:12:24.552+00:00
adfafe
I want to get the number of users in the current year and in previous year like below using SQL.
Date
Users
previous_year
2022-01-01
10
5
2022-01-02
20
15
and the query I have used is:
with base as (
select
date(root_tstamp) as current_date
, count(distinct userid) as signup_counts
from table1
group by 1
)
select
t1.current_date
, t1.signup_counts as signups_this_year
, t2.signup_counts as signups_last_year
, t1.signup_counts - t2.signups_counts as difference
from base t1
left join base t2 on t1.current_date = t2.current_date + interval '1 year'
group by t1.current_date
order by t1.current_date Desc
But I getting this error:
ERROR: column t2.signups_counts does not exist
It's because you have t2.signup_counts is misspelled as t2.signups_counts.
Another note is that your query only has a GROUP BY on current_date and since the other columns are not aggregates you've to include these columns too.
Here is the modified query:
with base as (
select
date(root_tstamp) as current_date
, count(distinct userid) as signup_counts
from table1
group by 1
)
select
t1.current_date
, t1.signup_counts as signups_this_year
, t2.signup_counts as signups_last_year
, t1.signup_counts - t2.signup_counts as difference
from base t1
left join base t2 on t1.current_date = t2.current_date - interval '1 year'
group by t1.current_date, t2.signup_counts, t1.signup_counts
order by t1.current_date Desc

How ovoid duplicate code in sql query subQuery

I'm new in SQL. Need some help to improve my query to ovoid duplicate code.
SELECT customers.name, orders.price
FROM customers
JOIN orders ON orders.id = customers.order_id
WHERE customers.order_id IN (
SELECT orders.id
FROM orders
WHERE orders.price = (
SELECT orders.price
FROM orders
WHERE orders.order_date BETWEEN
(SELECT MIN(orders.order_date) FROM orders)
AND
(SELECT DATE_ADD(MIN(orders.order_date), INTERVAL 10 year)FROM orders)
ORDER BY orders.price DESC LIMIT 1
)
AND orders.order_date BETWEEN
(SELECT MIN(orders.order_date) FROM orders)
AND
(SELECT DATE_ADD(MIN(orders.order_date), INTERVAL 10 year)FROM orders)
)
I would like ovoid duplicate code here
SELECT MIN(orders.order_date) FROM orders
and
SELECT DATE_ADD(MIN(orders.order_date), INTERVAL 10 year)FROM orders
You can use WITH to get first 10 years orders. By defitinion there exists no orders with the date < min(date), so you needn't between, just <= .
firstOrders as (
SELECT *
FROM orders
WHERE order_date <=
(SELECT DATE_ADD(MIN(o.order_date), INTERVAL 10 year)
FROM orders o)
)
SELECT customers.name, orders.price
FROM customers
JOIN FirsrOrders orders ON orders.id = customers.order_id
AND orders.price = (
select price
from firstOrders
order py price desc
limit 1
)
You want orders from the first ten years where the price was equal to the maximum price among those orders. So rank by price and grab those holding the #1 spot.
with data as (
select *,
date_add(min(order_date) over (), interval 10 year) as max_date,
rank() over (order by price desc) as price_rank
from orders
)
select *
from data
where order_date <= max_date and price_rank = 1;

how to filter data in sql based on percentile

I have 2 tables, the first one is contain customer information such as id,age, and name . the second table is contain their id, information of product they purchase, and the purchase_date (the date is from 2016 to 2018)
Table 1
-------
customer_id
customer_age
customer_name
Table2
------
customer_id
product
purchase_date
my desired result is to generate the table that contain customer_name and product who made purchase in 2017 and older than 75% of customer that make purchase in 2016.
Depending on your flavor of SQL, you can get quartiles using the more general ntile analytical function. This basically adds a new column to your query.
SELECT MIN(customer_age) as min_age FROM (
SELECT customer_id, customer_age, ntile(4) OVER(ORDER BY customer_age) AS q4 FROM table1
WHERE customer_id IN (
SELECT customer_id FROM table2 WHERE purchase_date = 2016)
) q
WHERE q4=4
This returns the lowest age of the 4th-quartile customers, which can be used in a subquery against the customers who made purchases in 2017.
The argument to ntile is how many buckets you want to divide into. In this case 75%+ equals 4th quartile, so 4 buckets is OK. The OVER() clause specifies what you want to sort by (customer_age in our case), and also lets us partition (group) the data if we want to, say, create multiple rankings for different years or countries.
Age is a horrible field to include in a database. Every day it changes. You should have date-of-birth or something similar.
To get the 75% oldest value in 2016, there are several possibilities. I usually go for row_number() and count(*):
select min(customer_age)
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c join
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
)
where seqnum >= 0.75 * cnt;
Then, to use this for a query for 2017:
with a2016 as (
select min(customer_age) as customer_age
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt
)
select c.*, cp.product_id
from customers c join
customer_products cp
on cp.customer_id = c.customer_id and
cp.purchase_date >= '2017-01-01' and
cp.purchase_date < '2018-01-01' join
a2016 a
on c.customer_age >= a.customer_age;

Grouping multiple selects within a SQL query

I have a table Supplier with two columns, TotalStock and Date. I'm trying to write a single query that will give me stock totals by week / month / year for a list of suppliers.
So results will look like this..
SUPPLIER WEEK MONTH YEAR
SupplierA 50 100 2000
SupplierB 60 150 2500
SupplierC 15 25 200
So far I've been playing around with multiple selects but I can't get any further than this:
SELECT Supplier,
(
SELECT Sum(TotalStock)
FROM StockBreakdown
WHERE Date >= '2014-5-12'
GROUP BY Supplier
) AS StockThisWeek,
(
SELECT Sum(TotalStock)
FROM StockBreakdown
WHERE Date >= '2014-5-1'
GROUP BY Supplier
) AS StockThisMonth,
(
SELECT Sum(TotalStock)
FROM StockBreakdown
WHERE Date >= '2014-1-1'
GROUP BY Supplier
) AS StockThisYear
This query throws an error as each individual grouping returns multiple results. I feel that I'm close to the solution but can't work out where to go
You don't have to use subqueries to achieve what you want :
SELECT Supplier
, SUM(CASE WHEN Date >= CAST('2014-05-12' as DATE) THEN TotalStock END) AS StockThisWeek
, SUM(CASE WHEN Date >= CAST('2014-05-01' as DATE) THEN TotalStock END) AS StockThisMonth
, SUM(CASE WHEN Date >= CAST('2014-01-01' as DATE) THEN TotalStock END) AS StockThisYear
FROM StockBreakdown
GROUP BY Supplier
You may need to make the selects for the columns return only a single result. You could try this (not tested currently):
SELECT Supplier,
(
SELECT TOP 1 StockThisWeek FROM
(
SELECT Supplier, Sum(TotalStock) AS StockThisWeek
FROM StockBreakdown
WHERE Date >= '2014-5-12'
GROUP BY Supplier
) tmp1
WHERE tmp1.Supplier = Supplier
) AS StockThisWeek,
(
SELECT TOP 1 StockThisMonth FROM
(
SELECT Supplier, Sum(TotalStock) AS StockThisMonth
FROM StockBreakdown
WHERE Date >= '2014-5-1'
GROUP BY Supplier
) tmp2
WHERE tmp2.Supplier = Supplier
) AS StockThisMonth,
...
This selects the supplier and then tries to create two columns StockThisWeek and StockThisMonth by selecting the first entry from the select you created before. As through the GROUP BY there should only be one entry per supplier, so you don't lose and data.

Using a column in sql join without adding it to group by clause

My actual table structures are much more complex but following are two simplified table definitions:
Table invoice
CREATE TABLE invoice (
id integer NOT NULL,
create_datetime timestamp with time zone NOT NULL,
total numeric(22,10) NOT NULL
);
id create_datetime total
----------------------------
100 2014-05-08 1000
Table payment_invoice
CREATE TABLE payment_invoice (
invoice_id integer,
amount numeric(22,10)
);
invoice_id amount
-------------------
100 100
100 200
100 150
I want to select the data by joining above 2 tables and selected data should look like:-
month total_invoice_count outstanding_balance
05/2014 1 550
The query I am using:
select
to_char(date_trunc('month', i.create_datetime), 'MM/YYYY') as month,
count(i.id) as total_invoice_count,
(sum(i.total) - sum(pi.amount)) as outstanding_balance
from invoice i
join payment_invoice pi on i.id=pi.invoice_id
group by date_trunc('month', i.create_datetime)
order by date_trunc('month', i.create_datetime);
Above query is giving me incorrect results as sum(i.total) - sum(pi.amount) returns (1000 + 1000 + 1000) - (100 + 200 + 150) = 2550.
I want it to return (1000) - (100 + 200 + 150) = 550
And I cannot change it to i.total - sum(pi.amount), because then I am forced to add i.total column to group by clause and that I don't want to do.
You need a single row per invoice, so aggregate payment_invoice first - best before you join.
When the whole table is selected, it's typically fastest to aggregate first and join later:
SELECT to_char(date_trunc('month', i.create_datetime), 'MM/YYYY') AS month
, count(*) AS total_invoice_count
, (sum(i.total) - COALESCE(sum(pi.paid), 0)) AS outstanding_balance
FROM invoice i
LEFT JOIN (
SELECT invoice_id AS id, sum(amount) AS paid
FROM payment_invoice pi
GROUP BY 1
) pi USING (id)
GROUP BY date_trunc('month', i.create_datetime)
ORDER BY date_trunc('month', i.create_datetime);
LEFT JOIN is essential here. You do not want to loose invoices that have no corresponding rows in payment_invoice (yet), which would happen with a plain JOIN.
Accordingly, use COALESCE() for the sum of payments, which might be NULL.
SQL Fiddle with improved test case.
Do the aggregation in two steps. First aggregate to a single line per invoice, then to a single line per month:
select
to_char(date_trunc('month', t.create_datetime), 'MM/YYYY') as month,
count(*) as total_invoice_count,
(sum(t.total) - sum(t.amount)) as outstanding_balance
from (
select i.create_datetime, i.total, sum(pi.amount) amount
from invoice i
join payment_invoice pi on i.id=pi.invoice_id
group by i.id, i.total
) t
group by date_trunc('month', t.create_datetime)
order by date_trunc('month', t.create_datetime);
See sqlFiddle
SELECT TO_CHAR(invoice.create_datetime, 'MM/YYYY') as month,
COUNT(invoice.create_datetime) as total_invoice_count,
invoice.total - payments.sum_amount as outstanding_balance
FROM invoice
JOIN
(
SELECT invoice_id, SUM(amount) AS sum_amount
FROM payment_invoice
GROUP BY invoice_id
) payments
ON invoice.id = payments.invoice_id
GROUP BY TO_CHAR(invoice.create_datetime, 'MM/YYYY'),
invoice.total - payments.sum_amount