I am trying to count new customers vs. returning customers, and for this I have to create multiple tables. Is there a better way to aggregate the data in the format shown below?
My SQL code looks like this:
---- ALL INDIVIDUALS WHO PURCHASED IN CURRENT WEEK---------
CREATE TABLE PURCHASES_FEB_WK2 AS (Select DISTINCT INDIVIDUAL_ID
from DM_OWNER.TRANSACTION_DETAIL_MV
WHERE BRAND_ORG_CODE = 'BRAND'
and is_merch = 1
and currency_code = 'USD'
AND LINE_ITEM_AMT_TYPE_CD = 'S'
AND TRUNC(TXN_DATE) BETWEEN '10-FEB-19' AND '16-FEB-19')
----------MINIMUM PURCHASE DATE OF ALL CUSTOMERS------------
Create table feb_wk2_min as
Select distinct Individual_ID, MIN(TRANSACTION_DATE) as FIRST_TRANSACTION
from dm_owner.transaction_mv
WHERE BRAND_ORG_CODE = 'BRAND'
and transaction_type_code in ('PR','EP')
group by individual_ID;
------- NEW CUSTOMERS FOR THE WEEK---------
Select Count(distinct B.INDIVIDUAL_ID)
from PURCHASES_FEB_WK2 A
JOIN FEB_WK2_MIN B ON A.INDIVIDUAL_ID = B.INDIVIDUAL_ID
where FIRST_TRANSACTION between '10-FEB-19' and '16-FEB-19'
---- ALL RETURNING CUSTOMERS
SELECT COUNT (DISTINCT INDIVIDUAL_ID)
FROM PURCHASES_FEB_WK2
WHERE INDIVIDUAL_ID IN (SELECT INDIVIDUAL_ID FROM DM_OWNER.TRANSACTION_DETAIL_MV WHERE TRUNC(TXN_DATE) < '10-FEB-19' AND BRAND_ORG_CODE = 'BRAND' AND IS_MERCH = 1 AND line_item_amt_type_cd = 'S' AND STATUS = 'A')
-------NEW CUSTOMERS DOLLAR_VALUE_US------
SELECT SUM(DOLLAR_VALUE_US) FROM DM_OWNER.TRANSACTION_DETAIL_MV
WHERE INDIVIDUAL_ID IN (Select distinct B.INDIVIDUAL_ID
from PURCHASES_FEB_WK2 A
JOIN FEB_WK2_MIN B ON A.INDIVIDUAL_ID = B.INDIVIDUAL_ID
where FIRST_TRANSACTION between '10-FEB-19' and '16-FEB-19')
AND BRAND_ORG_CODE = 'BRAND'
and is_merch = 1
and currency_code = 'USD'
AND LINE_ITEM_AMT_TYPE_CD = 'S'
AND TRUNC(TXN_DATE) BETWEEN '10-FEB-19' AND '16-FEB-19'
-------RETURNING CUSTOMERS DOLLAR_VALUE_US------
SELECT SUM(DOLLAR_VALUE_US) FROM DM_OWNER.TRANSACTION_DETAIL_MV
WHERE INDIVIDUAL_ID IN (SELECT DISTINCT INDIVIDUAL_ID
FROM PURCHASES_FEB_WK2
WHERE INDIVIDUAL_ID IN (SELECT INDIVIDUAL_ID FROM DM_OWNER.TRANSACTION_DETAIL_MV WHERE TRUNC(TXN_DATE) < '10-FEB-19' AND BRAND_ORG_CODE = 'BRAND' AND IS_MERCH = 1 AND line_item_amt_type_cd = 'S' AND STATUS = 'A'))
AND BRAND_ORG_CODE = 'BRAND'
and is_merch = 1
and currency_code = 'USD'
AND LINE_ITEM_AMT_TYPE_CD = 'S'
AND TRUNC(TXN_DATE) BETWEEN '10-FEB-19' AND '16-FEB-19'
To get the quantity and the count of orders, I replace SUM(dollar_value_us) with a count of distinct orders and a sum of quantity. Is there an easy way to pivot and combine this code so that I can just copy and paste the data in the format I have provided (picture attached)?
Based on the comments, I understand that you want to split the customers into two groups: customers who had their first transaction during the period should be separated from those who had transactions before. For each group, you want to count the number of customers and sum the value of the transactions.
NB: your SQL code does not show how to compute qty and count_of_orders, so I left them out (but they will likely follow the same logic).
Given this sample data:
INDIVIDUAL_ID | DOLLAR_VALUE_US | TXN_DATE | BRAND_ORG_CODE | IS_MERCH | CURRENCY_CODE | LINE_ITEM_AMT_TYPE_CD
------------: | --------------: | :-------- | :------------ | -------: | :------------ | :--------------------
1 | 10 | 01-FEB-19 | BRAND | 1 | USD | S
1 | 10 | 10-FEB-19 | BRAND | 1 | USD | S
1 | 10 | 15-FEB-19 | BRAND | 1 | USD | S
1 | 10 | 28-FEB-19 | BRAND | 1 | USD | S
2 | 11 | 11-FEB-19 | BRAND | 1 | USD | S
2 | 11 | 12-FEB-19 | BRAND | 1 | USD | S
3 | 11 | 12-FEB-19 | BRAND | 1 | USD | S
Considering the week from February 10th to 16th inclusive, customer 1 is a returning customer with 2 transactions in the window, and customers 2 and 3 are new customers with 2 and 1 transactions respectively. You would expect the following output:
TYPE_OF_CUSTOMER | COUNT_OF_CUSTOMERS | SUM_DOLLAR_VALUE_US
:------------------ | -----------------: | ------------------:
New Customers | 2 | 33
Returning Customers | 1 | 20
To solve this, you need to set up several levels of aggregation. First, use the window function MIN() OVER() to recover the date of the first transaction of each customer. Then, filter on the analysis period, split customers into new/returning groups, and aggregate the money spent per customer. Finally, aggregate all results together by group.
Query:
SELECT
DECODE(is_new, 1, 'New Customers', 'Returning Customers') type_of_customer,
COUNT(individual_id) count_of_customers,
SUM(dollar_value_us) sum_dollar_value_us
FROM (
SELECT
individual_id,
SUM(dollar_value_us) dollar_value_us,
CASE WHEN MIN(txn_date) = min_txn_date THEN 1 ELSE 0 END is_new
FROM (
SELECT
individual_id,
dollar_value_us,
txn_date,
MIN(txn_date) OVER(PARTITION BY individual_id) min_txn_date
FROM transaction_detail_mv
WHERE
brand_org_code = 'BRAND'
AND is_merch = 1
AND currency_code = 'USD'
AND line_item_amt_type_cd = 'S'
) t
WHERE
txn_date >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
AND txn_date < TO_DATE('17-02-2019', 'DD-MM-YYYY')
GROUP BY
individual_id,
min_txn_date
) x GROUP BY is_new
This demo on DB Fiddle demonstrates each step of the computation.
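For readers who want to reproduce the expected output without an Oracle instance, here is a minimal sketch that runs the same three-level aggregation against the sample rows above. It assumes an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for Oracle; the table name `txn`, the ISO date strings, and the CASE expression used in place of DECODE are choices made for this illustration, and the brand/currency filters are omitted because every sample row passes them.

```python
import sqlite3

# In-memory SQLite stands in for Oracle; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (individual_id INTEGER, dollar_value_us INTEGER, txn_date TEXT)"
)
# Sample rows from the answer, with dates written as ISO strings so they sort correctly.
conn.executemany(
    "INSERT INTO txn VALUES (?, ?, ?)",
    [
        (1, 10, "2019-02-01"), (1, 10, "2019-02-10"), (1, 10, "2019-02-15"),
        (1, 10, "2019-02-28"), (2, 11, "2019-02-11"), (2, 11, "2019-02-12"),
        (3, 11, "2019-02-12"),
    ],
)

query = """
SELECT CASE WHEN is_new = 1 THEN 'New Customers' ELSE 'Returning Customers' END
           AS type_of_customer,
       COUNT(individual_id) AS count_of_customers,
       SUM(dollar_value_us) AS sum_dollar_value_us
FROM (
    -- one row per customer active in the window, flagged new or returning
    SELECT individual_id,
           SUM(dollar_value_us) AS dollar_value_us,
           CASE WHEN MIN(txn_date) = min_txn_date THEN 1 ELSE 0 END AS is_new
    FROM (
        -- attach each customer's first-ever transaction date to every row
        SELECT individual_id, dollar_value_us, txn_date,
               MIN(txn_date) OVER (PARTITION BY individual_id) AS min_txn_date
        FROM txn
    ) t
    WHERE txn_date >= '2019-02-10' AND txn_date < '2019-02-17'
    GROUP BY individual_id, min_txn_date
) x
GROUP BY is_new
ORDER BY type_of_customer
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The window function has to run before the date filter, which is why the innermost query carries no WHERE clause on txn_date: filtering first would make every customer look new.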
Related
I have a basic parent / child scheme for expenditures:
The underlying data is the same so I just added a category column and parent_id. These have child records:
I am trying to aggregate the totals from the orders, related orders and difference between the two like this:
Which is grouped by the orders overall then I am also looking for something like this:
I can get the order_amount no problem either way. That's a simple JOIN and SUM.
I am stuck on the secondary JOINS given that I have to JOIN the invoices expenditures to the orders then JOIN the invoice expenditure items and SUM that up.
I am looking for direction on the correct JOIN or if there is a better way to approach this with some sort of subquery etc.
To sum up by order, one solution would be to use a conditional aggregate query. A trick is to check the category to decide whether to use the value from column expenditures.id or from column expenditures.parent_id as the grouping criterion:
SELECT
CASE WHEN e.category = 'order' THEN e.id ELSE e.parent_id END expenditure_id,
SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END) order_amount,
SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) invoice_amount,
SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END)
- SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) balance
FROM expenditures e
LEFT JOIN expenditure_items i ON e.id = i.expenditure_id
GROUP BY CASE WHEN e.category = 'order' THEN e.id ELSE e.parent_id END
ORDER BY expenditure_id
Demo on DB Fiddle:
| expenditure_id | order_amount | invoice_amount | balance |
| -------------- | ------------ | -------------- | ------- |
| 1 | 3740 | 0 | 3740 |
| 2 | 11000 | 9350 | 1650 |
The second query, which sums up by item code, follows the same logic but groups by item code instead:
SELECT
i.code,
SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END) order_amount,
SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) invoice_amount,
SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END)
- SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) balance
FROM expenditures e
LEFT JOIN expenditure_items i ON e.id = i.expenditure_id
GROUP BY i.code
ORDER BY i.code;
Demo:
| code | order_amount | invoice_amount | balance |
| ---- | ------------ | -------------- | ------- |
| a | 13400 | 8500 | 4900 |
| b | 1340 | 850 | 490 |
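The question's table contents did not survive on this page, so the following is only a sketch of the conditional aggregation under assumed data. The rows below are hypothetical, chosen so that both queries reproduce the demo outputs shown above, and SQLite (via Python) is used as a stand-in engine. The second amount column is named `invoice_amount` and the grouping alias `order_id` in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expenditures (id INTEGER, category TEXT, parent_id INTEGER);
CREATE TABLE expenditure_items (expenditure_id INTEGER, code TEXT, amount INTEGER);
-- Hypothetical rows: two orders, and one invoice that belongs to order 2.
INSERT INTO expenditures VALUES (1, 'order', NULL), (2, 'order', NULL), (3, 'invoice', 2);
INSERT INTO expenditure_items VALUES
    (1, 'a', 3400), (1, 'b', 340),
    (2, 'a', 10000), (2, 'b', 1000),
    (3, 'a', 8500), (3, 'b', 850);
""")

# Shared conditional sums: each item amount lands in the order or the invoice bucket.
amounts = """
    SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END) AS order_amount,
    SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) AS invoice_amount,
    SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END)
      - SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) AS balance
"""

# Group invoices under their parent order by switching between id and parent_id.
by_order = conn.execute(f"""
    SELECT CASE WHEN e.category = 'order' THEN e.id ELSE e.parent_id END AS order_id,
           {amounts}
    FROM expenditures e
    LEFT JOIN expenditure_items i ON e.id = i.expenditure_id
    GROUP BY CASE WHEN e.category = 'order' THEN e.id ELSE e.parent_id END
    ORDER BY order_id
""").fetchall()

# Same sums, grouped by item code instead.
by_code = conn.execute(f"""
    SELECT i.code, {amounts}
    FROM expenditures e
    LEFT JOIN expenditure_items i ON e.id = i.expenditure_id
    GROUP BY i.code
    ORDER BY i.code
""").fetchall()

print(by_order)
print(by_code)
```

Because every item row is read once and routed by the category of its parent expenditure, no second join against the invoices is needed.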
I have a query (Oracle) that shows the sales of each customer by years:
SELECT cmp.company_key
, sum(CASE WHEN sd.date between TO_DATE('01-Jan-2010', 'dd-mm-yyyy') and TO_DATE('11-Jun-2010', 'dd-mm-yyyy') THEN sd.qty_ship * sd.unit_price END) AS year1sales
, sum(CASE WHEN sd.date between TO_DATE('01-Jan-2011', 'dd-mm-yyyy') and TO_DATE('11-Jun-2011', 'dd-mm-yyyy') THEN sd.qty_ship * sd.unit_price END) AS year2sales
FROM sales_detail sd
INNER JOIN sales_header sh on sd.sales_header_key = sh.sales_header_key
INNER JOIN companies cmp on sh.company_key = cmp.company_key
GROUP BY cmp.company_key
The query produces this:
company_key | year1sales | year2sales
------------|------------|------------
8687 | 21355.76 | 54326.45
25 | 9375.41 | 12401
34 | 6440.03 | 50349.27
247 | 47355.93 | 77432.67
83 | 15757.35 | 39999.12
But I also need it to return a value ("TBI") showing what percentage that company's sales are compared to the sum of all the other sales numbers.
So, for company #8687 it would be 21355.76 / sigma(year1 sales) which is 21355.76/100,284.48 = 21.3%.
So the result would be:
company_key | year1sales | year1 TBI | year2sales | year2 TBI
------------|------------|-----------|------------|----------
8687 | 21355.76 | 21.30 | 54326.45 | 23.17
25 | 9375.41 | 9.35 | 12401 | 5.29
34 | 6440.03 | 6.42 | 50349.27 | 21.47
247 | 47355.93 | 47.22 | 77432.67 | 33.02
83 | 15757.35 | 15.71 | 39999.12 | 17.06
And obviously the TBI columns would sum up to 100%.
How would you write this query? Also, what is the time complexity for a problem like this? I think it's O(n^2) best case.
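You'd use SUM OVER to get the totals. The ratios below are fractions; multiply by 100 (and round) if you want them displayed as percentages: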
select
company_key,
year1sales,
year1sales / sum(year1sales) over() as year1tbi,
year2sales,
year2sales / sum(year2sales) over() as year2tbi
from
(
SELECT cmp.company_key
, sum(CASE WHEN sd.date between date '2010-01-01' and date '2010-06-11' THEN sd.qty_ship * sd.unit_price END) AS year1sales
, sum(CASE WHEN sd.date between date '2011-01-01' and date '2011-06-11' THEN sd.qty_ship * sd.unit_price END) AS year2sales
FROM sales_detail sd
INNER JOIN sales_header sh on sd.sales_header_key = sh.sales_header_key
INNER JOIN companies cmp on sh.company_key = cmp.company_key
GROUP BY cmp.company_key
)
order by company_key;
As to the complexity: I cannot answer this. The DBMS has to run through the result, build the totals, and then calculate the percentage for each row.
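As a quick check of the SUM() OVER() approach, this sketch loads the already-aggregated figures from the question into an in-memory SQLite table (3.25 or later) and computes each company's share. The table name `company_sales` is hypothetical and stands in for the inner GROUP BY query; the multiplication by 100 and the ROUND are added here to match the percentage format the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company_sales (company_key INTEGER, year1sales REAL, year2sales REAL)"
)
# Figures copied from the question's result set.
conn.executemany(
    "INSERT INTO company_sales VALUES (?, ?, ?)",
    [
        (8687, 21355.76, 54326.45),
        (25, 9375.41, 12401),
        (34, 6440.03, 50349.27),
        (247, 47355.93, 77432.67),
        (83, 15757.35, 39999.12),
    ],
)

# An empty OVER () makes the window the whole result set, so SUM() is the grand total.
shares = conn.execute("""
    SELECT company_key,
           ROUND(100.0 * year1sales / SUM(year1sales) OVER (), 2) AS year1tbi,
           ROUND(100.0 * year2sales / SUM(year2sales) OVER (), 2) AS year2tbi
    FROM company_sales
    ORDER BY company_key
""").fetchall()

for row in shares:
    print(row)
```

The rounded values agree with the TBI columns in the question's expected result.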
I have a table that looks like this:
+--------+----------+--------+------------+-------+
| ID | CHANNEL | VENDOR | num_PERIOD | SALES |
+--------+----------+--------+------------+-------+
| 000001 | Business | Shop | 1 | 40 |
| 000001 | Business | Shop | 2 | 60 |
| 000001 | Business | Shop | 3 | NULL |
+--------+----------+--------+------------+-------+
With many combinations of ID, CHANNEL and VENDOR, and sales records for each of them over time (num_PERIOD).
The idea is to obtain a new column that returns the number of NULLs in the SALES column, but only within the first 111 records of each combination, ordered by the num_PERIOD column.
I have been trying something like this:
SELECT ID,
CHANNEL,
VENDOR,
sum(CASE
WHEN SALES IS NULL THEN 1
ELSE 0
END) OVER (PARTITION BY ID,
CHANNEL,
VENDOR
ORDER BY num_PERIOD ROWS BETWEEN UNBOUNDED PRECEDING AND 111 FOLLOWING) AS NULL_SALES_SET
FROM TABLE
GROUP BY ID,
CHANNEL,
VENDOR
But I'm not obtaining what I'm looking for.
The goal is to obtain a table similar to:
+--------+--------------+--------+----------------+
| ID | CHANNEL | VENDOR | NULL_SALES_SET |
+--------+--------------+--------+----------------+
| 000001 | Business | Shop | 1 |
| 000002 | Business | Market | 0 |
| 000002 | Non Business | Shop | 3 |
+--------+--------------+--------+----------------+
The difficulty comes when selecting these first 111 rows per ID, CHANNEL AND VENDOR ordered by num_PERIOD.
Use a CTE (Common Table Expression) with the ROW_NUMBER windowed function and you should be set:
;WITH MyCTE AS
(
SELECT
id,
channel,
vendor,
sales,
ROW_NUMBER() OVER (PARTITION BY id, channel, vendor ORDER BY num_period) AS row_num
FROM
MyTable
)
SELECT
id,
channel,
vendor,
SUM(CASE WHEN sales IS NULL THEN 1 ELSE 0 END) AS null_sales_set
FROM
MyCTE
WHERE
row_num <= 111
GROUP BY
id, channel, vendor
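To see the ROW_NUMBER filter at work on a small scale, here is a sketch using in-memory SQLite (3.25 or later) as a stand-in for SQL Server. The rows are hypothetical, the table is called `my_table`, and the cutoff is lowered from 111 to 2 (the `FIRST_N` parameter, an assumption of this example) so the effect is visible with a handful of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(id TEXT, channel TEXT, vendor TEXT, num_period INTEGER, sales INTEGER)"
)
# Hypothetical rows; note the gap in num_period for the first combination.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
    [
        ("000001", "Business", "Shop", 1, 40),
        ("000001", "Business", "Shop", 2, None),
        ("000001", "Business", "Shop", 5, None),  # third row, outside the first 2
        ("000002", "Business", "Market", 3, 10),
        ("000002", "Business", "Market", 4, 20),
    ],
)

FIRST_N = 2  # stands in for the 111 used in the question

query = """
WITH numbered AS (
    -- number the rows of each (id, channel, vendor) combination by period
    SELECT id, channel, vendor, sales,
           ROW_NUMBER() OVER (
               PARTITION BY id, channel, vendor ORDER BY num_period
           ) AS row_num
    FROM my_table
)
SELECT id, channel, vendor,
       SUM(CASE WHEN sales IS NULL THEN 1 ELSE 0 END) AS null_sales_set
FROM numbered
WHERE row_num <= ?
GROUP BY id, channel, vendor
ORDER BY id, channel, vendor
"""
null_counts = conn.execute(query, (FIRST_N,)).fetchall()
for row in null_counts:
    print(row)
```

Filtering on the row number rather than on num_period itself keeps the result correct when period numbers have gaps or do not start at 1, as with the second combination here.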
Do you have to use the windowing function?
SELECT ID
, CHANNEL
, VENDOR
, NULL_SALES_SET = SUM(CASE WHEN SALES IS NULL THEN 1 ELSE 0 END)
FROM Table
WHERE num_PERIOD <= 111
GROUP BY ID, CHANNEL, VENDOR
Or are you looking for the first 111 num_PERIOD values allowing for gaps in the num_PERIOD column?
SELECT t.ID
, t.CHANNEL
, t.VENDOR
, NULL_SALES_SET = SUM(CASE WHEN t.SALES IS NULL THEN 1 ELSE 0 END)
FROM Table t
INNER JOIN ( SELECT i.ID
, i.CHANNEL
, i.VENDOR
, i.num_PERIOD
, rowNum = ROW_NUMBER() OVER (PARTITION BY i.ID, i.CHANNEL, i.VENDOR ORDER BY i.num_PERIOD)
FROM Table i ) l
ON t.ID = l.ID
AND t.CHANNEL = l.CHANNEL
AND t.VENDOR = l.VENDOR
AND t.num_PERIOD = l.num_PERIOD
WHERE l.rowNum <= 111
GROUP BY t.ID, t.CHANNEL, t.VENDOR
Edit: Not sure how I overlooked it, but it is necessary to JOIN on the num_PERIOD column.
Edit: Add the number of distinct num_PERIOD per ID, Channel, Vendor without affecting the NULL_SALES_SET
SELECT t.ID
, t.CHANNEL
, t.VENDOR
-- Counts the NULL Sales when the num_PERIOD is in the
-- first 111 num_PERIODs
, NULL_SALES_SET = SUM(CASE WHEN l.rowNum IS NOT NULL AND t.SALES IS NULL
THEN 1
ELSE 0 END)
-- Counts the distinct num_PERIOD values
, PERIOD_COUNT = COUNT(DISTINCT t.num_PERIOD)
FROM Table t
LEFT OUTER JOIN ( SELECT i.ID
, i.CHANNEL
, i.VENDOR
, i.num_PERIOD
, rowNum = ROW_NUMBER() OVER (PARTITION BY i.ID,
i.CHANNEL,
i.VENDOR
ORDER BY i.num_PERIOD)
FROM Table i ) l
ON t.ID = l.ID
AND t.CHANNEL = l.CHANNEL
AND t.VENDOR = l.VENDOR
AND t.num_PERIOD = l.num_PERIOD
AND l.rowNum <= 111
GROUP BY t.ID, t.CHANNEL, t.VENDOR
Can I please get some help with my SQL report query? I am 90% there and just need the last step (still learning SQL, so be kind :) ).
We currently have 2 different databases:
- [DATABASE1] stores all our assets
- [DATABASE2] stores all assets that we are still paying off
- - This database stores every payment made against the asset to the bank, the date the payment was made, the amount etc.
- - The last row against the asset will be the last payment, and the date on this would be the expected lease end date.
I would like a report that has a single line per asset and shows all columns.
My current report shows all the required information, however it shows ALL the rows per asset instead of a single encapsulated row, e.g:
ASSET NO | FINANCIER | AGREEMENT NUMBER | PAYMENT NUMBER | LEASE COMMENCE DATE | LEASE FINAL DATE | MONTHLY PAYMENTS
asset1 | bank 1 | 1111 | 1 | 01/01/2017 | NULL | NULL
asset1 | bank 1 | 1111 | 2 | NULL | NULL | 2000
asset1 | bank 1 | 1111 | 3 | NULL | NULL | NULL
..
asset1 | bank 1 | 1111 | 20 | NULL | 01/01/2020 | NULL
asset2 | bank 5 | 1536 | 1 | 05/08/2016 | NULL | NULL
..
Instead of:
ASSET NO | FINANCIER | AGREEMENT NUMBER | PAYMENT NUMBER | LEASE COMMENCE DATE | LEASE FINAL DATE | MONTHLY PAYMENTS
asset1 | bank 1 | 1111 | 20 | 01/01/2017 | 01/01/2020 | 2000
asset2 | bank 5 | 1536 | 15 | 05/08/2016 | 12/05/2019 | 5500
..
Below is my query:
CREATE TABLE #MaxPays (
ITEMNO VARCHAR(MAX),
PAYNO VARCHAR(MAX)
)
INSERT INTO #MaxPays
SELECT
a.ITEMNO,
a.PAYNO
FROM
[DATABASE1] a
INNER JOIN
(SELECT ITEMNO, MAX(PAYNO) as PAYNO FROM [DATABASE1] GROUP BY ITEMNO) AS b ON
a.ITEMNO = b.ITEMNO AND a.PAYNO = b.PAYNO
SELECT
a.ITEMNO as 'Asset #',
a.FINANCE as 'Financier',
a.AGREENO as 'Agreement number',
a.PAYNO as 'Payment Number',
CASE WHEN a.PAYNO = 1 THEN a.PAYDATE ELSE NULL END as 'Lease Commencing Date',
CASE WHEN a.PAYNO = (SELECT PAYNO FROM #MaxPays WHERE ITEMNO = a.ITEMNO) THEN a.PAYDATE ELSE NULL END as 'Lease Finalising Date',
CASE WHEN a.PAYNO = 2 THEN a.PAYAMOUNT ELSE NULL END as 'Monthly Payments'
FROM
[DATABASE1] a
INNER JOIN
(SELECT DISTINCT ITEMNO from [DATABASE2]) AS b ON
a.ITEMNO = b.ITEMNO
ORDER BY a.ITEMNO
EDIT: The monthly payment is taken from the 2nd payment because the 1st payment sometimes includes down payments and isn't a clear indicator of the recurring monthly payment.
Any help would be appreciated.
Thanks
You just need a minor modification in your query: wrap the per-payment columns in MAX() and group by the asset-level columns only. PAYNO has to be aggregated rather than grouped, otherwise you still get one row per payment. SQL Server also does not allow a subquery inside an aggregate, so the lookup of the last payment number becomes a join to #MaxPays:
SELECT
a.ITEMNO as 'Asset #',
a.FINANCE as 'Financier',
a.AGREENO as 'Agreement number',
MAX(a.PAYNO) as 'Payment Number',
MAX(CASE WHEN a.PAYNO = 1 THEN a.PAYDATE ELSE NULL END) as 'Lease Commencing Date',
MAX(CASE WHEN a.PAYNO = m.PAYNO THEN a.PAYDATE ELSE NULL END) as 'Lease Finalising Date',
MAX(CASE WHEN a.PAYNO = 2 THEN a.PAYAMOUNT ELSE NULL END) as 'Monthly Payments'
FROM
[DATABASE1] a
INNER JOIN
(SELECT DISTINCT ITEMNO from [DATABASE2]) AS b ON
a.ITEMNO = b.ITEMNO
INNER JOIN
#MaxPays m ON
m.ITEMNO = a.ITEMNO
GROUP BY a.ITEMNO,
a.FINANCE,
a.AGREENO
ORDER BY a.ITEMNO
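The collapse from one row per payment to one row per asset can be sketched on a small scale as follows. This is an illustration under assumptions, not the asker's schema: SQLite stands in for SQL Server, the single `payments` table and its lowercase column names are hypothetical, and the payment rows are invented. The last payment number per asset comes from a joined derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (itemno TEXT, finance TEXT, agreeno TEXT, "
    "payno INTEGER, paydate TEXT, payamount INTEGER)"
)
# Hypothetical payment history: the first payment includes a down payment.
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("asset1", "bank 1", "1111", 1, "2017-01-01", 2500),
        ("asset1", "bank 1", "1111", 2, "2017-02-01", 2000),
        ("asset1", "bank 1", "1111", 3, "2017-03-01", 2000),
        ("asset2", "bank 5", "1536", 1, "2016-08-05", 6000),
        ("asset2", "bank 5", "1536", 2, "2016-09-05", 5500),
    ],
)

# MAX() ignores NULLs, so each CASE picks the single non-NULL value per asset.
report = conn.execute("""
    SELECT p.itemno, p.finance, p.agreeno,
           MAX(p.payno) AS payment_number,
           MAX(CASE WHEN p.payno = 1 THEN p.paydate END) AS lease_start,
           MAX(CASE WHEN p.payno = m.max_payno THEN p.paydate END) AS lease_end,
           MAX(CASE WHEN p.payno = 2 THEN p.payamount END) AS monthly_payment
    FROM payments p
    JOIN (SELECT itemno, MAX(payno) AS max_payno
          FROM payments GROUP BY itemno) m ON m.itemno = p.itemno
    GROUP BY p.itemno, p.finance, p.agreeno
    ORDER BY p.itemno
""").fetchall()

for row in report:
    print(row)
```

Grouping only by the columns that are constant per asset is what produces a single row; any column that varies per payment has to go through an aggregate.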
I have 3 tables: Dealer, Payment_type and Dealer_Payment_type.
Dealer : dealer_id , dealer_name, dealer_address
1 | test | 123 test lane
2 | abc | abc lane
3 | def | def lane
Payment_type : paymenttype_id , paytype
1 | CHECK
2 | WIRE
3 | CREDIT
Dealer_Payment_type : DPT_id , dealer_id , payment_type_id
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
4 | 2 | 2
5 | 2 | 3
6 | 3 | 1
7 | 3 | 2
I have to write a query to get payment type info for each dealer , query needs to return data like this:
dealer_id , dealer_name , paytype
1 | test | check,wire,credit
2 | abc | wire,credit
3 | def | check,wire
OR
dealer_id , dealer_name , check , wire , credit
1 | test | true | true | true
2 | abc | false | true | true
3 | def | true | true | false
You did not specify what version of Oracle you are using.
If you are using Oracle 11g or later, then you can use the following.
To get the values into a single column, you can use LISTAGG (available from 11g Release 2):
select d.dealer_id,
d.dealer_name,
listagg(p.paytype, ',') within group (order by p.paymenttype_id) as paytype
from dealer d
left join Dealer_Payment_type dp
on d.dealer_id = dp.dealer_id
left join payment_type p
on dp.payment_type_id = p.paymenttype_id
group by d.dealer_id, d.dealer_name;
See SQL Fiddle with demo
To get the values in separate columns, you can use PIVOT:
select dealer_id, dealer_name,
coalesce("Check", 'false') "Check",
coalesce("Wire", 'false') "Wire",
coalesce("Credit", 'false') "Credit"
from
(
select d.dealer_id,
d.dealer_name,
p.paytype,
'true' flag
from dealer d
left join Dealer_Payment_type dp
on d.dealer_id = dp.dealer_id
left join payment_type p
on dp.payment_type_id = p.paymenttype_id
)
pivot
(
max(flag)
for paytype in ('CHECK' as "Check", 'WIRE' as "Wire", 'CREDIT' as "Credit")
)
See SQL Fiddle with Demo.
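If you are on a version earlier than Oracle 11g, then you can use wm_concat() to concatenate the values into a single row. Note that wm_concat is an undocumented function, so it may not be available in every release: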
select d.dealer_id,
d.dealer_name,
wm_concat(p.paytype) as paytype
from dealer d
left join Dealer_Payment_type dp
on d.dealer_id = dp.dealer_id
left join payment_type p
on dp.payment_type_id = p.paymenttype_id
group by d.dealer_id, d.dealer_name;
To create the separate columns, you can use an aggregate function with a CASE expression:
select dealer_id, dealer_name,
max(case when paytype = 'CHECK' then flag else 'false' end) "Check",
max(case when paytype = 'WIRE' then flag else 'false' end) "Wire",
max(case when paytype = 'CREDIT' then flag else 'false' end) "Credit"
from
(
select d.dealer_id,
d.dealer_name,
p.paytype,
'true' flag
from dealer d
left join Dealer_Payment_type dp
on d.dealer_id = dp.dealer_id
left join payment_type p
on dp.payment_type_id = p.paymenttype_id
)
group by dealer_id, dealer_name;
See SQL Fiddle with Demo
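Both result shapes can be tried out on the question's data with the sketch below. It assumes SQLite (through Python) rather than Oracle, so `group_concat` stands in for LISTAGG and the portable MAX(CASE ...) form stands in for PIVOT; the column aliases `has_check`, `has_wire` and `has_credit` are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dealer (dealer_id INTEGER, dealer_name TEXT, dealer_address TEXT);
CREATE TABLE payment_type (paymenttype_id INTEGER, paytype TEXT);
CREATE TABLE dealer_payment_type (dpt_id INTEGER, dealer_id INTEGER, payment_type_id INTEGER);
INSERT INTO dealer VALUES (1, 'test', '123 test lane'), (2, 'abc', 'abc lane'), (3, 'def', 'def lane');
INSERT INTO payment_type VALUES (1, 'CHECK'), (2, 'WIRE'), (3, 'CREDIT');
INSERT INTO dealer_payment_type VALUES
    (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 2), (5, 2, 3), (6, 3, 1), (7, 3, 2);
""")

joins = """
    FROM dealer d
    LEFT JOIN dealer_payment_type dp ON d.dealer_id = dp.dealer_id
    LEFT JOIN payment_type p ON dp.payment_type_id = p.paymenttype_id
"""

# Shape 1: one comma-separated column (group_concat plays the role of LISTAGG).
concatenated = conn.execute(f"""
    SELECT d.dealer_id, d.dealer_name, group_concat(p.paytype, ',') AS paytype
    {joins}
    GROUP BY d.dealer_id, d.dealer_name
    ORDER BY d.dealer_id
""").fetchall()
# The element order inside group_concat is not relied on here, so compare as sets.
paytype_sets = {dealer_id: set(paytypes.split(",")) for dealer_id, _, paytypes in concatenated}

# Shape 2: one true/false column per payment type; 'true' sorts after 'false',
# so MAX() returns 'true' as soon as one matching row exists.
pivoted = conn.execute(f"""
    SELECT d.dealer_id, d.dealer_name,
           MAX(CASE WHEN p.paytype = 'CHECK' THEN 'true' ELSE 'false' END) AS has_check,
           MAX(CASE WHEN p.paytype = 'WIRE' THEN 'true' ELSE 'false' END) AS has_wire,
           MAX(CASE WHEN p.paytype = 'CREDIT' THEN 'true' ELSE 'false' END) AS has_credit
    {joins}
    GROUP BY d.dealer_id, d.dealer_name
    ORDER BY d.dealer_id
""").fetchall()

print(paytype_sets)
print(pivoted)
```

The CASE form needs the list of payment types to be known when the query is written, which is the same constraint the PIVOT clause has.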