SQL COUNT column remaining at 1

I would like to display a running total of Invoice_Amount. Here is my current query:
SELECT cust_name, COUNT(*) AS Invoice_Amount, Invoice.invoice_date
FROM Customer, Invoice
WHERE Customer.customer_id = Invoice.customer_id
GROUP BY Invoice.customer_id, Customer.cust_name, invoice_date;
and here is the current output:
cust_name   Invoice_Amount   invoice_date
Company A   1                2000-10-12 00:00:00.000
Company B   1                2000-09-22 00:00:00.000
Company C   1                2000-05-26 00:00:00.000
Company D   1                2000-08-15 00:00:00.000
Company E   1                2000-11-15 00:00:00.000
Company E   1                2000-05-02 00:00:00.000
Whereas I would like the Invoice_Amount for the two Company E rows to read 2, like so:
cust_name   Invoice_Amount   invoice_date
Company A   1                2000-10-12 00:00:00.000
Company B   1                2000-09-22 00:00:00.000
Company C   1                2000-05-26 00:00:00.000
Company D   1                2000-08-15 00:00:00.000
Company E   2                2000-11-15 00:00:00.000
Company E   2                2000-05-02 00:00:00.000
This is so I can eventually do something along the lines of:
HAVING (COUNT(*) > 1)
How would I go about getting this result?

There is no need for a GROUP BY or a HAVING because you're not actually grouping by anything in the final result.
;WITH src AS /* leading semi-colon for safety */
(
SELECT c.cust_name, i.invoice_date,
COUNT(i.invoice_date) OVER (PARTITION BY i.customer_id)
AS Invoice_Count
FROM dbo.Customer AS c
INNER JOIN dbo.Invoice AS i
ON c.customer_id = i.customer_id
)
SELECT cust_name, Invoice_Count, invoice_date
FROM src
-- WHERE Invoice_Count > 1;

Well, judging from your data, the combinations of invoice_date and cust_name seem to be unique, which is why COUNT(*) always returns 1.
What you seem to need is for the count value that you call Invoice_Amount to tally up per cust_name. 'Company E' occurs twice in your report, and for 'Company E' you need the value 2, but you still want to keep both rows.
Functions that aggregate data in a sense, yet still return the same number of rows as the input, are not GROUP BY aggregates; they are window functions, also known as OLAP or analytic functions.
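To make the distinction concrete, here is a minimal side-by-side sketch using the table and column names from your question (just an illustration, not the final solution). The first query collapses the rows; the second keeps every invoice row and still carries the per-customer count:
-- aggregate: one row per customer
SELECT c.cust_name, COUNT(*) AS invoice_count
FROM customer c
JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY c.cust_name;
-- window function: same count, but all invoice rows are kept
SELECT c.cust_name, i.invoice_date,
       COUNT(*) OVER (PARTITION BY c.cust_name) AS invoice_count
FROM customer c
JOIN invoice i ON c.customer_id = i.customer_id;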
So, start from your grouping query, then select from it, applying an OLAP function, and finally select from that outer query in turn, filtering on the OLAP function's result:
WITH
grp AS (
SELECT
cust_name
, count(*) AS invoice_amount
, invoice.invoice_date
FROM customer
JOIN invoice ON customer.customer_id = invoice.customer_id
GROUP BY
invoice.customer_id
, customer.cust_name
, invoice_date
)
,
olap AS (
SELECT
cust_name
, SUM(invoice_amount) OVER(PARTITION BY cust_name) AS invoice_amount
, invoice_date
FROM grp
)
SELECT
*
FROM olap
WHERE invoice_amount > 1;
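For anyone who wants to try either answer, here is a minimal setup sketch based on the sample output in the question (the column types and the invoice_id key are assumptions, not from the original post):
CREATE TABLE Customer (customer_id INT PRIMARY KEY, cust_name VARCHAR(50));
CREATE TABLE Invoice  (invoice_id INT PRIMARY KEY, customer_id INT, invoice_date DATETIME);
INSERT INTO Customer VALUES (1, 'Company A'), (2, 'Company B'), (3, 'Company C'),
                            (4, 'Company D'), (5, 'Company E');
INSERT INTO Invoice VALUES (1, 1, '2000-10-12'), (2, 2, '2000-09-22'), (3, 3, '2000-05-26'),
                           (4, 4, '2000-08-15'), (5, 5, '2000-11-15'), (6, 5, '2000-05-02');
-- Both queries above should now return 2 for the two 'Company E' rows and 1 for everyone else.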

Filter timeseries data based on multiple criteria

I have a table where I store timeseries data:
customer_id  transaction_type  transaction_date  transaction_value
1            buy               2022-12-04        100.0
1            sell              2022-12-04        80.0
2            buy               2022-12-04        120.0
2            sell              2022-12-03        120.0
1            buy               2022-12-02        90.0
1            sell              2022-12-02        70.0
2            buy               2022-12-01        110.0
2            sell              2022-12-01        110.0
The number of customers and transaction types is not limited. Currently there are over 10,000 customers and over 600 transaction types. Dates of transactions will not always align based on any criteria among a customer or transaction type (that's why I've tried using LATERAL JOIN; you'll see it later).
I want to filter those records to get customer IDs, together with the transaction values, where an arbitrary set of conditions is met. The number of conditions in a query is not restricted to two; it can be anything. For example:
Give me all customers who have a buy with value > $90 and a sell with value > $100 as their latest transactions.
The final query should return these two rows:
customer_id  transaction_type  transaction_date  transaction_value
2            buy               2022-12-04        120.0
2            sell              2022-12-03        120.0
The closest I've come to what I need was by creating a materialized view cross-joining customer IDs and transaction_types:
customer_id  transaction_type
1            buy
1            sell
2            buy
2            sell
And then running a LATERAL JOIN between the transactions table and the customer_transactions materialized view:
SELECT *
FROM customer_transactions
JOIN LATERAL (
    SELECT *
    FROM transactions
    WHERE transactions.customer_id = customer_transactions.customer_id
      AND transactions.transaction_type = customer_transactions.transaction_type
      AND transactions.transaction_date <= '2022-12-04' -- this can change for filtering records back in time
    ORDER BY transactions.transaction_date DESC
    LIMIT 1
) transactions ON TRUE
WHERE customer_transactions.transaction_type = 'buy'
  AND transactions.transaction_value > 90
It seems to be working when one condition is specified. But as soon as subsequent conditions are introduced, things start falling apart for me; changing the condition to:
WHERE (customer_transactions.transaction_type = 'buy'
       AND transactions.transaction_value > 90)
  AND (customer_transactions.transaction_type = 'sell'
       AND transactions.transaction_value > 100)
is obviously not going to work as there is no row that satisfies both of these conditions.
Is it possible to achieve this using the approach I took? If so, what am I missing? Or maybe there is another way to solve this that would be more appropriate?
You could use a CTE with ROW_NUMBER and check out the last transactions:
WITH CTE AS (
    SELECT "customer_id", "transaction_type", "transaction_date", "transaction_value",
           ROW_NUMBER() OVER (PARTITION BY "customer_id", "transaction_type"
                              ORDER BY "transaction_date" DESC) rn
    FROM tab1
)
SELECT "customer_id", "transaction_type", "transaction_date", "transaction_value"
FROM CTE
WHERE rn = 1
  AND CASE WHEN "transaction_type" = 'buy'  THEN ("transaction_value" > 90)
           WHEN "transaction_type" = 'sell' THEN ("transaction_value" > 100)
           ELSE FALSE END
  AND (SELECT COUNT(*)
       FROM CTE c1
       WHERE c1."customer_id" = CTE."customer_id"
         AND rn = 1
         AND CASE WHEN "transaction_type" = 'buy'  THEN ("transaction_value" > 90)
                  WHEN "transaction_type" = 'sell' THEN ("transaction_value" > 100)
                  ELSE FALSE END) = 2
customer_id  transaction_type  transaction_date  transaction_value
2            buy               2022-12-04        120.0
2            sell              2022-12-03        120.0
fiddle
Use DISTINCT ON with a custom order to select the latest transaction per customer and type according to your several criteria (hence the OR) in the latest CTE, then count the number of result records per customer using COUNT as a window function in the latest_with_count CTE, and finally pick the customers whose count equals the number of criteria, i.e. all the criteria are honoured.
This is a somewhat verbose and abstract template, but it should help with the generic problem; the idea works for any number of conditions.
with t as
(
/*
your query here with several conditions in DISJUNCTION (OR) here, i.e.
WHERE (customer_transactions.transaction_type = 'buy' AND customer_transactions.transaction_value > 90)
OR (customer_transactions.transaction_type = 'sell' AND customer_transactions.transaction_value > 100)
*/
),
latest as
(
select distinct on (customer_id, transaction_type) *
from t
-- pick the latest per customer & type
order by customer_id, transaction_type, transaction_date desc
),
latest_with_count as
(
select *, count(*) over (partition by customer_id) cnt
from latest
)
select *
from latest_with_count
where cnt = 2 -- the number of criteria
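For this specific question the template might be filled in like this; a sketch assuming the transactions table from the question and the two example criteria (DISTINCT ON is PostgreSQL-specific):
with t as
(
  select *
  from transactions
  where (transaction_type = 'buy'  and transaction_value > 90)
     or (transaction_type = 'sell' and transaction_value > 100)
),
latest as
(
  select distinct on (customer_id, transaction_type) *
  from t
  order by customer_id, transaction_type, transaction_date desc
),
latest_with_count as
(
  select *, count(*) over (partition by customer_id) cnt
  from latest
)
select customer_id, transaction_type, transaction_date, transaction_value
from latest_with_count
where cnt = 2; -- the number of criteria
With the sample data this returns customer 2's latest buy (120.0 on 2022-12-04) and latest sell (120.0 on 2022-12-03). A date cut-off such as transaction_date <= '2022-12-04' can be added to the first CTE if needed.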

ORACLE SQL - multiple JOINs from same table

I have material transaction data in one table, log history header data related to materials in another table, and detailed log history data in a third table. I'm trying to match the different status update dates to the material table, but I get duplicate rows for one material transaction.
Original material transaction table:
ORDER_NO  MATERIAL  QTY
0001      MAT01     2
0002      MAT02     5
Original Log History Header transaction table:
ORDER_NO  LOG_ID
0001      1001
0001      1002
Status code 1 refers to Opened and code 2 to Closed
Detailed Log History table:
LOG_ID  STATUS_CODE  DATE
1001    1            11/12/2021
1002    2            15/12/2021
With the following SQL query:
SELECT
    TO_CHAR(m.order_no) order_no,
    m.material,
    m.qty,
    a.date opened_date,
    ab.date closed_date
FROM MATERIAL_TRANSACTIONS m
INNER JOIN HISTORY_LOG t
    ON m.ORDER_NO = t.ORDER_NO
INNER JOIN HISTORY_LOG_DETAILED a
    ON t.LOG_ID = a.LOG_ID
    AND a.STATUS_CODE = '1'
INNER JOIN HISTORY_LOG_DETAILED ab
    ON t.LOG_ID = ab.LOG_ID
    AND ab.STATUS_CODE = '2'
I get the following result:
ORDER_NO  MATERIAL  QTY  OPENED_DATE  CLOSED_DATE
0001      MAT01     2    11/12/2021
0001      MAT01     2                 15/12/2021
And I would like to get the status dates on the same row, as below:
ORDER_NO  MATERIAL  QTY  OPENED_DATE  CLOSED_DATE
0001      MAT01     2    11/12/2021   15/12/2021
I would appreciate all the help I can get, and I am very sorry if there is already a topic for a similar issue.
Your problem occurs because you join the history table, which holds 2 records for the order. You could flatten this if you use 2 inline tables that hold exactly 1 record.
with opened_dates as (
    select h.order_no, d.date
    from history_log h
    inner join history_log_detailed d on h.log_id = d.log_id and d.status_code = '1'
), closed_dates as (
    select h.order_no, d.date
    from history_log h
    inner join history_log_detailed d on h.log_id = d.log_id and d.status_code = '2'
)
select to_char(m.order_no) order_no,
       m.material,
       o.date opened_date,
       c.date closed_date
from material_transactions m
join opened_dates o on m.order_no = o.order_no
join closed_dates c on m.order_no = c.order_no
;
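If an order can be open without a closed entry yet, a left join variant of the same idea keeps those rows. A sketch, assuming the table and column names from the question (the DATE column is kept as named there; in a real Oracle schema it would need quoting or a different name):
with opened_dates as (
    select h.order_no, d.date as opened_date
    from history_log h
    inner join history_log_detailed d on h.log_id = d.log_id and d.status_code = '1'
), closed_dates as (
    select h.order_no, d.date as closed_date
    from history_log h
    inner join history_log_detailed d on h.log_id = d.log_id and d.status_code = '2'
)
select to_char(m.order_no) order_no,
       m.material,
       m.qty,
       o.opened_date,
       c.closed_date
from material_transactions m
join opened_dates o on m.order_no = o.order_no
left join closed_dates c on m.order_no = c.order_no;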
Just an idea: I joined the HISTORY_LOG and HISTORY_LOG_DETAILED tables to get the date for each specific status and exposed them as OPENED_DATE and CLOSED_DATE (if the status is 1, the opened date is the DATE column; otherwise it is set to 01.01.0001).
After that, I grouped those records by ORDER_NO and took the maximum date values to get the actual OPENED_DATE and CLOSED_DATE.
Finally, I joined this subquery with the MATERIAL_TRANSACTIONS table:
SELECT
    TO_CHAR(M.ORDER_NO) ORDER_NO,
    M.MATERIAL,
    QTY,
    L_T.OPENED_DATE,
    L_T.CLOSED_DATE
FROM MATERIAL_TRANSACTIONS M
INNER JOIN
(
    SELECT L.ORDER_NO,
           MAX(CASE WHEN LD.STATUS_CODE = 1 THEN LD.DATE ELSE TO_DATE('01.01.0001','dd.mm.yyyy') END) OPENED_DATE,
           MAX(CASE WHEN LD.STATUS_CODE = 2 THEN LD.DATE ELSE TO_DATE('01.01.0001','dd.mm.yyyy') END) CLOSED_DATE
    FROM HISTORY_LOG L
    INNER JOIN HISTORY_LOG_DETAILED LD ON LD.LOG_ID = L.LOG_ID
    GROUP BY L.ORDER_NO
) L_T ON L_T.ORDER_NO = M.ORDER_NO
Note: I didn't test it, so there could be small syntax errors. Please check it, and for better help add a fiddle so I can test my query.
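A small refinement of the same idea, sketched below: without an ELSE branch the CASE expression returns NULL for non-matching rows, and MAX simply ignores NULLs, so the sentinel date 01.01.0001 is not needed and a missing status shows up as NULL instead of a fake date:
SELECT
    TO_CHAR(M.ORDER_NO) ORDER_NO,
    M.MATERIAL,
    M.QTY,
    L_T.OPENED_DATE,
    L_T.CLOSED_DATE
FROM MATERIAL_TRANSACTIONS M
INNER JOIN
(
    SELECT L.ORDER_NO,
           MAX(CASE WHEN LD.STATUS_CODE = 1 THEN LD.DATE END) OPENED_DATE,
           MAX(CASE WHEN LD.STATUS_CODE = 2 THEN LD.DATE END) CLOSED_DATE
    FROM HISTORY_LOG L
    INNER JOIN HISTORY_LOG_DETAILED LD ON LD.LOG_ID = L.LOG_ID
    GROUP BY L.ORDER_NO
) L_T ON L_T.ORDER_NO = M.ORDER_NO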

How to query SQLite to find sum, latest record, grouped by id, in between dates?

Items (itemId, itemName)
Logs (logId, itemId, qtyAdded, qtyRemoved, availableStock, transactionDate)
Sample Data for Items:
itemId  itemName
1       item 1
2       item 2
Sample Data for Logs:
logId  itemId  qtyAdded  qtyRemoved  avlStock  transDateTime
1      2       5405      0           5405      June 1 (4PM)
2      2       1000      0           6405      June 2 (5PM)
3      2       0         6000        405       June 3 (11PM)
I need to see all items from the Items table with their SUM(qtyAdded), SUM(qtyRemoved), and latest availableStock (there's an option for choosing a transactionDate range, but the default gets all records). The date order of the final result does not matter.
Preferred result (without date range):
itemName  qtyAddedSum  qtyRemovedSum  avlStock
item 1    6405         6000           405
item 2    <nothing here yet>
With date range between June 2 (8AM) and June 3 (11:01PM):
itemName  qtyAddedSum  qtyRemovedSum  avlStock
item 1    1000         6000           405
item 2    <no transaction yet>
So as you can see, the final result is grouped, which makes almost all of my previous query correct, except my availableStock is always wrong. If I focus on the availableStock, I can't get the two sums.
You could use GROUP BY with SUM, and BETWEEN:
select itemName, sum(qtyAdded), sum(qtyRemoved), sum(avlStock)
from Items
left join Logs on logs.itemId = items.itemId
where transDateTime between '2017-06-02 08:00:00' and '2017-06-03 23:00:00'
group by itemId
Or, if you need the last avlStock:
select itemName, sum(qtyAdded), sum(qtyRemoved), tt.avlStock
from Items
left join Logs on Logs.itemId = Items.itemId
inner join (
    select Logs.itemId, avlStock
    from Logs
    inner join (
        select itemId, max(transDateTime) max_trans
        from Logs
        group by itemId
    ) t1 on Logs.itemId = t1.itemId and Logs.transDateTime = t1.max_trans
) tt on tt.itemId = Logs.itemId
where transDateTime between '2017-06-02 08:00:00' and '2017-06-03 23:00:00'
group by Items.itemId
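If your SQLite version is 3.25 or newer, window functions offer another way to grab the latest avlStock per item alongside the sums. A sketch, using the column names from the sample data (adjust the date range or drop the WHERE clause as needed):
select itemName,
       sum(qtyAdded)   as qtyAddedSum,
       sum(qtyRemoved) as qtyRemovedSum,
       max(case when rn = 1 then avlStock end) as avlStock
from Items
left join (
    select itemId, qtyAdded, qtyRemoved, avlStock,
           row_number() over (partition by itemId order by transDateTime desc) as rn
    from Logs
    where transDateTime between '2017-06-02 08:00:00' and '2017-06-03 23:00:00'
) l using (itemId)
group by itemName;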
Okay, I tried both of these and they worked. Can anyone confirm whether these are already efficient, or whether there are more efficient approaches?
SELECT * FROM Items LEFT JOIN
(
SELECT * FROM Logs LEFT JOIN
(
SELECT SUM(qtyAdd) AS QtyAdded, SUM(qtySub) AS QtyRemoved, availableStock AS Stock
FROM Logs WHERE transactionDate BETWEEN julianday('2017-07-18 21:10:40')
AND julianday('2017-07-18 21:12:00') GROUP BY itemId
)
ORDER BY transactionDate DESC
)
USING (itemId) GROUP BY itemName;
SELECT * FROM Items LEFT JOIN
(
SELECT * FROM Logs LEFT JOIN
(
SELECT SUM(qtyAdd) AS QtyAdded, SUM(qtySub) AS QtyRemoved, availableStock AS Stock
FROM Logs GROUP BY itemId
)
ORDER BY transactionDate DESC
)
USING (itemId) GROUP BY itemName;

SQL Server Amount Split

I have the below 2 tables in a SQL Server database.
Customer Main Expense Table
ReportID  CustomerID  TotalExpenseAmount
1000      1           200
1001      2           600
Attendee Table
ReportID  AttendeeName
1000      Mark
1000      Sam
1000      Joe
There is no amount at the attendee level. I need to calculate each individual attendee amount manually, as mentioned below (i.e. split TotalExpenseAmount based on the number of attendees, with the individual split figures rounded to 2 decimals and summing up to the TotalExpenseAmount exactly).
The final report should look like:
ReportID  CustID  AttendeeName  TotalAmount  AttendeeAmount
1000      1       Mark          200          66.66
1000      1       Sam           200          66.66
1000      1       Joe           200          66.68
The final report will have about 150,000 records. If you look at the attendee amounts, I have rounded the last one so that the totals add up to exactly 200. What is the best way to write an efficient SQL query for this scenario?
You can do this using window functions:
select ReportID, CustID, AttendeeName, TotalAmount,
(case when seqnum = 1
then TotalAmount - perAttendee * (cnt - 1)
else perAttendee
end) as AttendeeAmount
from (select a.ReportID, a.CustID, a.AttendeeName, e.TotalAmount,
row_number() over (partition by reportId order by AttendeeName) as seqnum,
count(*) over (partition by reportId) as cnt,
cast(TotalAmount * 1.0 / count(*) over (partition by reportId) as decimal(10, 2)) as perAttendee
from attendee a join
expense e
on a.ReportID = e.ReportID
) ae;
The perAttendee amount is calculated in the subquery, rounded to two decimals via the cast() (floor() isn't used because it doesn't accept a decimal-places argument). For one of the rows, the amount is the total minus the sum of all the other attendees' amounts, so the split always adds back up to the total.
Doing something similar to @Gordon's answer, but using a CTE instead.
with CTECount AS (
select a.ReportId, a.AttendeeName,
ROW_NUMBER() OVER (PARTITION BY A.ReportId ORDER BY A.AttendeeName) [RowNum],
COUNT(A.AttendeeName) OVER (PARTITION BY A.ReportId) [AttendeeCount],
CAST(c.TotalExpenseAmount / (COUNT(A.AttendeeName) OVER (PARTITION BY A.ReportId)) AS DECIMAL(10,2)) [PerAmount]
FROM #Customer C INNER JOIN #Attendee A ON A.ReportId = C.ReportID
)
SELECT CT.ReportID, CT.CustomerId, AT.AttendeeName,
CASE WHEN CC.RowNum = 1 THEN CT.TotalExpenseAmount - CC.PerAmount * (CC.AttendeeCount - 1)
ELSE CC.PerAmount END [AttendeeAmount]
FROM #Customer CT INNER JOIN #Attendee AT
ON CT.ReportID = AT.ReportId
INNER JOIN CTECount CC
ON CC.ReportId = CT.ReportID AND CC.AttendeeName = AT.AttendeeName
I like the CTE because it allows me to separate the different aspects of the query. The clever part of @Gordon's answer is the CASE expression and the inner calculation that make the lines total correctly.
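A minimal setup sketch for trying the CTE version above (the temp-table and column names are taken from the query and the sample data in the question; Gordon's version uses different table names, so rename accordingly):
CREATE TABLE #Customer (ReportID INT, CustomerID INT, TotalExpenseAmount DECIMAL(10, 2));
CREATE TABLE #Attendee (ReportID INT, AttendeeName VARCHAR(50));
INSERT INTO #Customer VALUES (1000, 1, 200), (1001, 2, 600);
INSERT INTO #Attendee VALUES (1000, 'Mark'), (1000, 'Sam'), (1000, 'Joe');
-- The three report 1000 rows should come back with attendee amounts that add up to exactly 200.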

Only joining rows where the date is less than the max date in another field

Let's say I have two tables. One table containing employee information and the days that employee was given a promotion:
Emp_ID  Promo_Date
1       07/01/2012
1       07/01/2013
2       07/19/2012
2       07/19/2013
3       08/21/2012
3       08/21/2013
And another table with every day employees closed a sale:
Emp_ID  Sale_Date
1       06/12/2013
1       06/30/2013
1       07/15/2013
2       06/15/2013
2       06/17/2013
2       08/01/2013
3       07/31/2013
3       09/01/2013
I want to join the two tables so that I only include sales dates that are less than the maximum promotion date. So the result would look something like this
Emp_ID  Sale_Date   Promo_Date
1       06/12/2013  07/01/2012
1       06/30/2013  07/01/2012
1       06/12/2013  07/01/2013
1       06/30/2013  07/01/2013
And so on for the rest of the Emp_IDs. I tried doing this using a left join, something to the effect of
left join SalesTable on PromoTable.EmpID = SalesTable.EmpID and Sale_Date
< max(Promo_Date) over (partition by Emp_ID)
But apparently I can't use aggregates in joins, and I already know that I can't use them in the where statement either. I don't know how else to proceed with this.
The maximum promotion date is:
select emp_id, max(promo_date)
from promotions
group by emp_id;
There are various ways to get the sales before that date, but here is one way:
select s.*
from sales s
where s.sales_date < (select max(promo_date)
from promotions p
where p.emp_id = s.emp_id
);
Gordon's answer is right on! Alternatively, you could also do an inner join to a subquery to achieve your desired output, like this:
SELECT s.emp_id
,s.sales_date
,t.promo_date
FROM sales s
INNER JOIN (
SELECT emp_id
,max(promo_date) AS promo_date
FROM promotions
GROUP BY emp_id
) t ON s.emp_id = t.emp_id
AND s.sales_date < t.promo_date;
SQL Fiddle Demo
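If you also want each Promo_Date repeated against the qualifying sales, as in the expected output in the question, here is one more sketch (table names taken from the snippet in the question): compute the per-employee maximum with a window function in a derived table, so the comparison can sit in an ordinary join condition:
SELECT s.Emp_ID, s.Sale_Date, p.Promo_Date
FROM SalesTable s
JOIN (
    SELECT Emp_ID, Promo_Date,
           MAX(Promo_Date) OVER (PARTITION BY Emp_ID) AS Max_Promo_Date
    FROM PromoTable
) p
  ON p.Emp_ID = s.Emp_ID
 AND s.Sale_Date < p.Max_Promo_Date;
For Emp_ID 1 this yields the four rows shown in the question: each sale before 07/01/2013 paired with both promotion dates.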