SQL (Subqueries) - sql

Struggling to go the extra step with a SQL query I'd like to run.
I have a customer database with a Customer table holding the date/time each customer joined, and a transaction table with details of their transactions over the years.
What I'd like to do is group by the join date (as a year) and count the number who joined in each year, then in the next column count how many of them have transacted in a specific year, e.g. 2016, the current year. This way I can show customer retention over the years.
Both tables are linked by a customer URN, but I am struggling to get my head around the most efficient way to show this. I can easily count and group the members by joined year and I can display the max dated transaction, but I am struggling to bring the two together. I think I need to use subqueries and a left join, but it's eluding me.
Example output column headers with data
Year_Joined = 2009
Joiner_Count = 10
Transact_in_2016 = 5
Where I am syntax-wise. I know this is nowhere near complete, as I need to group by DateJoined and then subquery the count of customers who have transacted in 2016.
SELECT Customer.URNCustomer,
MAX(YEAR(Customer.DateJoined)),
MAX(YEAR(Tran.TranDate)) As Latest_Tran
FROM Mydatabase.dbo.Customer
LEFT JOIN Mydatabase.dbo.Tran
ON Tran.URNCustomer = Customer.URNCustomer
GROUP BY Customer.URNCustomer
ORDER BY Customer.URNCustomer

The best approach is to do the aggregation before doing the joins. You want to count two different things, so count them individually and then combine them.
The following uses full outer join. This handles the case where there are years with no new customers and years with no transactions:
select coalesce(c.yyyy, t.yyyy) as yyyy,
coalesce(c.numcustomers, 0) as numcustomers,
coalesce(t.numtransactions, 0) as numtransactions
from (select year(c.datejoined) as yyyy, count(*) as numcustomers
from Mydatabase.dbo.Customer c
group by year(c.datejoined)
) c full outer join
(select year(t.trandate) as yyyy, count(*) as numtransactions
from Mydatabase.dbo.Tran t
group by year(t.trandate)
) t
on c.yyyy = t.yyyy;

You may want to try something like this:
SELECT YEAR(Customer.DateJoined),
COUNT( DISTINCT Customer.URNCustomer ),
COUNT( DISTINCT Tran.URNCustomer ) AS NO_ACTIVE_IN_2016
FROM Mydatabase.dbo.Customer
LEFT JOIN Mydatabase.dbo.Tran
ON Tran.URNCustomer = Customer.URNCustomer
AND YEAR(Tran.TranDate) = 2016
GROUP BY YEAR(Customer.DateJoined)
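For anyone who wants to try the LEFT JOIN idea without a SQL Server instance, here is a minimal sketch run against an in-memory SQLite database from Python. The sample rows are invented, the transaction table is called Trans here (a hypothetical name, chosen to sidestep the reserved word), and strftime('%Y', ...) stands in for YEAR(), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (URNCustomer INTEGER PRIMARY KEY, DateJoined TEXT);
CREATE TABLE Trans (URNCustomer INTEGER, TranDate TEXT);
-- Invented sample data: two 2009 joiners, one 2010 joiner.
INSERT INTO Customer VALUES (1,'2009-03-01'),(2,'2009-07-15'),(3,'2010-01-20');
-- Customer 1 transacts twice in 2016, customer 2 only in 2012.
INSERT INTO Trans VALUES (1,'2016-02-01'),(1,'2016-05-09'),
                         (2,'2012-11-30'),(3,'2016-08-08');
""")

rows = conn.execute("""
SELECT strftime('%Y', c.DateJoined)  AS Year_Joined,
       COUNT(DISTINCT c.URNCustomer) AS Joiner_Count,
       COUNT(DISTINCT t.URNCustomer) AS Transact_in_2016
FROM Customer c
LEFT JOIN Trans t
  ON t.URNCustomer = c.URNCustomer
 AND strftime('%Y', t.TranDate) = '2016'   -- year filter stays in the ON clause
GROUP BY strftime('%Y', c.DateJoined)
ORDER BY Year_Joined
""").fetchall()

for row in rows:
    print(row)
```

The year filter has to sit in the ON clause rather than in WHERE, otherwise customers with no 2016 transactions are dropped and the joiner count shrinks. DISTINCT on both counts matters because a customer with several 2016 transactions appears once per transaction after the join.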

Related

How to get the sum of transaction amount happened in one date?

I am trying to write a query that will give me the sum of the transaction amounts for one date. The problem is that when I added the date column to my query, I get individual values instead of their sum. The requirement for this query is to have one entry for each merchant, but I am getting multiple rows for one merchant.
SELECT SUBSTR(m.MERCHANTLASTNAME, 1, 36) Name1,
m.MERCHANTBANKBSB MerchantAccbsb,
m.MERCHANTBANKACCNR Merchant_act,
m.MERCHANTID merchantid,
t.transactiondate date1,
sum(t.TRANSACTIONAMOUNT) as total
FROM fss_merchant m
JOIN fss_terminal term
ON m.MERCHANTID = term.MERCHANTID
JOIN FSS_DAILY_TRANSACTION t
ON term.TERMINALID = t.TERMINALID
group by t.transactiondate, SUBSTR(m.MERCHANTLASTNAME, 1, 36), m.MERCHANTID, m.MERCHANTBANKBSB, m.MERCHANTBANKACCNR,
m.MERCHANTLASTNAME
Output of my query:
I want to get one entry per each merchant with the sum of transaction amount in one day, not multiple rows of transaction in that day.
You can calculate the total amount in a separate inner query with the truncated date and join it with the FSS_MERCHANT table, so that the issues described by @SatishSK and @mangusta are taken care of.
You can use the following query:
SELECT
SUBSTR(M.MERCHANTLASTNAME, 1, 36) NAME1,
M.MERCHANTBANKBSB MERCHANTACCBSB,
M.MERCHANTBANKACCNR MERCHANT_ACT,
M.MERCHANTID MERCHANTID,
M_DATA.TRANSACTIONDATE DATE1,
M_DATA.TOTAL AS TOTAL
FROM
FSS_MERCHANT M
INNER JOIN (
SELECT
TERM.MERCHANTID MERCHANTID,
TRUNC(T.TRANSACTIONDATE) TRANSACTIONDATE,
SUM(T.TRANSACTIONAMOUNT) AS TOTAL
FROM
FSS_TERMINAL TERM
JOIN FSS_DAILY_TRANSACTION T ON TERM.TERMINALID = T.TERMINALID
GROUP BY
TERM.MERCHANTID,
TRUNC(T.TRANSACTIONDATE)
) M_DATA ON ( M.MERCHANTID = M_DATA.MERCHANTID );
Good luck!!
The t.transactiondate column might contain date+time values. Use TRUNC(t.transactiondate) wherever you are using just t.transactiondate. You will then get sum(transaction amount) per date for each merchant.
OR
Filter out rows based on "Date" value in WHERE clause to retrieve data pertaining to a specific date.
Probably the reason is that you have included both m.MERCHANTLASTNAME and SUBSTR(m.MERCHANTLASTNAME,1,36) in the GROUP BY clause.
If there are entries with the same SUBSTR(m.MERCHANTLASTNAME,1,36) but a different m.MERCHANTLASTNAME, this is going to yield duplicates. You need to remove m.MERCHANTLASTNAME from the GROUP BY clause.
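To illustrate the date-truncation point, here is a small sketch in Python with an in-memory SQLite database. The rows are invented, only a few of the question's columns are kept, and SQLite's date() plays the role of Oracle's TRUNC() here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fss_merchant (merchantid INTEGER PRIMARY KEY, merchantlastname TEXT);
CREATE TABLE fss_terminal (terminalid INTEGER PRIMARY KEY, merchantid INTEGER);
CREATE TABLE fss_daily_transaction (terminalid INTEGER, transactiondate TEXT,
                                    transactionamount REAL);
-- Invented data: merchant 1 has two terminals with different timestamps on one day.
INSERT INTO fss_merchant VALUES (1,'Smith'),(2,'Jones');
INSERT INTO fss_terminal VALUES (10,1),(11,1),(20,2);
INSERT INTO fss_daily_transaction VALUES
  (10,'2019-05-01 09:15:00',10.0),
  (11,'2019-05-01 17:40:00',5.0),
  (20,'2019-05-01 12:00:00',7.5);
""")

rows = conn.execute("""
SELECT m.merchantid,
       m.merchantlastname,
       date(t.transactiondate)  AS date1,   -- strip the time-of-day part
       SUM(t.transactionamount) AS total
FROM fss_merchant m
JOIN fss_terminal term        ON term.merchantid = m.merchantid
JOIN fss_daily_transaction t  ON t.terminalid = term.terminalid
GROUP BY m.merchantid, m.merchantlastname, date(t.transactiondate)
ORDER BY m.merchantid
""").fetchall()

for row in rows:
    print(row)
```

Grouping on the raw timestamp would give merchant 1 two rows here, one per distinct time; grouping on the date alone collapses them into a single daily total.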

Query from two tables have two different date fields

I have two tables, one for receiving, "PO_RECVR_HIST", and another for sales, "PS_TKT_HIST_LIN". I want to create a query showing the total for receiving and the total for sales. The two dates are not related, and the results come out wrong. The two tables have the same vendor. I am using the following query:
SELECT P.VEND_NO,
sum(P.RECVR_TOT)AS RECV_TOT,
sum(S.CALC_EXT_PRC) AS SAL_TOT
FROM PO_RECVR_HIST P INNER JOIN
PS_TKT_HIST_LIN S
ON P.VEND_NO = S.ITEM_VEND_NO
WHERE P.RECVR_DAT > getdate()-7
GROUP BY P.VEND_NO, S.BUS_DAT
HAVING S.BUS_DAT > getdate()-7
ORDER BY P.VEND_NO
Any advice, please?
I think I get it. You want the sums in different columns. One approach is to do a UNION ALL to get the data together, and then aggregate:
SELECT VEND_NO,
SUM(RECVR_TOT)AS RECV_TOT,
SUM(CALC_EXT_PRC) AS SAL_TOT
FROM ((SELECT P.VEND_NO, P.RECVR_DAT as dte, P.RECVR_TOT, 0 as CALC_EXT_PRC
FROM PO_RECVR_HIST P
) UNION ALL
(SELECT S.ITEM_VEND_NO, S.BUS_DAT, 0, CALC_EXT_PRC
FROM PS_TKT_HIST_LIN S
)
) PS
WHERE DTE > GETDATE() - 7
GROUP BY VEND_NO
ORDER BY VEND_NO;
A JOIN is just not a good approach because it will throw off the aggregation.
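A quick way to see why the JOIN inflates the totals is to run both versions on a tiny data set. The sketch below uses Python with an in-memory SQLite database; the rows are invented and the date columns and the seven-day filter are left out to keep the effect visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PO_RECVR_HIST (VEND_NO TEXT, RECVR_TOT INTEGER);
CREATE TABLE PS_TKT_HIST_LIN (ITEM_VEND_NO TEXT, CALC_EXT_PRC INTEGER);
-- Invented data: one vendor, 2 receivings (150 total) and 3 sales lines (60 total).
INSERT INTO PO_RECVR_HIST VALUES ('V1', 100), ('V1', 50);
INSERT INTO PS_TKT_HIST_LIN VALUES ('V1', 10), ('V1', 20), ('V1', 30);
""")

# JOIN first: every receiving row pairs with every sales row (2 x 3 = 6 rows),
# so each amount is summed several times.
joined = conn.execute("""
SELECT P.VEND_NO, SUM(P.RECVR_TOT), SUM(S.CALC_EXT_PRC)
FROM PO_RECVR_HIST P
JOIN PS_TKT_HIST_LIN S ON P.VEND_NO = S.ITEM_VEND_NO
GROUP BY P.VEND_NO
""").fetchall()

# UNION ALL first: the rows are stacked, not multiplied, then aggregated once.
unioned = conn.execute("""
SELECT VEND_NO, SUM(RECVR_TOT), SUM(CALC_EXT_PRC)
FROM (SELECT VEND_NO, RECVR_TOT, 0 AS CALC_EXT_PRC FROM PO_RECVR_HIST
      UNION ALL
      SELECT ITEM_VEND_NO, 0, CALC_EXT_PRC FROM PS_TKT_HIST_LIN) PS
GROUP BY VEND_NO
""").fetchall()

print("join:     ", joined)
print("union all:", unioned)
```

With this data the join reports 450 and 120, while the stacked version reports the true 150 and 60.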

Using a stored procedure in Teradata to build a summarial history table

I am using Teradata SQL Assistant connected to an enterprise DW. I have written the query below to show an inventory of outstanding items as of a specific point in time. The table referenced loads and stores new records as changes are made to their state by load date (and does not delete historical records). The output of my query is 1 row for the specified date. Can I create a stored procedure or recursive query of some sort to build a history of these summary rows (with 1 new row per day)? I have not used such functions in the past; links to pertinent previously answered questions or suggestions on how I could get on the right track in researching other possible solutions are totally fine if applicable; just trying to bridge this gap in my knowledge.
SELECT
'2017-10-02' as Dt
,COUNT(DISTINCT A.RECORD_NBR) as Pending_Records
,SUM(A.PAY_AMT) AS Total_Pending_Payments
FROM DB.RECORD_HISTORY A
INNER JOIN
(SELECT MAX(LOAD_DT) AS LOAD_DT
,RECORD_NBR
FROM DB.RECORD_HISTORY
WHERE LOAD_DT <= '2017-10-02'
GROUP BY RECORD_NBR
) B
ON A.RECORD_NBR = B.RECORD_NBR
AND A.LOAD_DT = B.LOAD_DT
WHERE
A.RECORD_ORDER =1 AND Final_DT Is Null
GROUP BY Dt
ORDER BY 1 desc
Here is my interpretation of your query:
For the most recent load_dt (up until 2017-10-02) for record_order #1,
return
1) the number of different pending records
2) the total amount of pending payments
Is this correct? If you're looking for this info, but one row for each "Load_Dt", you just need to remove that INNER JOIN:
SELECT
load_Dt,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE record_order = 1
AND final_Dt IS NULL
GROUP BY load_Dt
ORDER BY 1 DESC
If you want to get the summary info per record_order, just add record_order as a grouping column:
SELECT
load_Dt,
record_order,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE final_Dt IS NULL
GROUP BY load_Dt, record_order
ORDER BY 1,2 DESC
If you want to get one row per day (if there are calendar days with no corresponding "load_dt" days), then you can SELECT from the sys_calendar.calendar view and LEFT JOIN the query above on the "load_dt" field:
SELECT cal.calendar_date, src.Pending_Records, src.Total_Pending_Payments
FROM sys_calendar.calendar cal
LEFT JOIN (
SELECT
load_Dt,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE record_order = 1
AND final_Dt IS NULL
GROUP BY load_Dt
) src ON cal.calendar_date = src.load_Dt
WHERE cal.calendar_date BETWEEN <start_date> AND <end_date>
ORDER BY 1 DESC
I don't have access to a TD system, so you may get syntax errors. Let me know if that works or you're looking for something else.
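If what is needed is a true "as of" snapshot per calendar day (which is what the question's query computes for a single date), one possible variation is to move the "latest load on or before this day" condition into the join against the calendar. The sketch below is only an illustration in Python with an in-memory SQLite database: the rows are invented, and a recursive CTE stands in for sys_calendar.calendar, so the Teradata syntax will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record_history (record_nbr INTEGER, load_dt TEXT, record_order INTEGER,
                             pay_amt INTEGER, final_dt TEXT);
-- Invented data: record 1 is pending from 10-01 and finalized on 10-03;
-- record 2 is pending from 10-02 onwards.
INSERT INTO record_history VALUES
  (1,'2017-10-01',1,100,NULL),
  (2,'2017-10-02',1,50,NULL),
  (1,'2017-10-03',1,100,'2017-10-03');
""")

rows = conn.execute("""
WITH RECURSIVE cal(d) AS (            -- stand-in for a calendar table
  SELECT '2017-10-01'
  UNION ALL
  SELECT date(d, '+1 day') FROM cal WHERE d < '2017-10-04'
)
SELECT cal.d,
       COUNT(DISTINCT a.record_nbr) AS pending_records,
       COALESCE(SUM(a.pay_amt), 0)  AS total_pending_payments
FROM cal
LEFT JOIN record_history a
  ON a.record_order = 1
 AND a.final_dt IS NULL
 AND a.load_dt = (SELECT MAX(b.load_dt)      -- latest load as of this day
                  FROM record_history b
                  WHERE b.record_nbr = a.record_nbr
                    AND b.load_dt <= cal.d)
GROUP BY cal.d
ORDER BY cal.d
""").fetchall()

for row in rows:
    print(row)
```

Each calendar day picks up the most recent row per record as of that day, so a record stays in the pending count on days with no new load and drops out once its latest row has a final date.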

SQL Server get customer with 7 consecutive transactions

I am trying to write a query that would get the customers with 7 consecutive transactions given a list of CustomerKeys.
I am currently doing a self join on Customer fact table that has 700 Million records in SQL Server 2008.
This is what I came up with, but it's taking a long time to run. I have a clustered index on (CustomerKey, TranDateKey).
SELECT
ct1.CustomerKey,ct1.TranDateKey
FROM
CustomerTransactionFact ct1
INNER JOIN
#CRTCustomerList dl ON ct1.CustomerKey = dl.CustomerKey --temp table with customer list
INNER JOIN
dbo.CustomerTransactionFact ct2 ON ct1.CustomerKey = ct2.CustomerKey -- Same Customer
AND ct2.TranDateKey >= ct1.TranDateKey
AND ct2.TranDateKey <= CONVERT(VARCHAR(8), dateadd(d, 6, ct1.TranDateTime), 112) -- Consecutive Transactions in the last 7 days
WHERE
ct1.LogID >= 82800000
AND ct2.LogID >= 82800000
AND ct1.TranDateKey between dl.BeginTranDateKey and dl.EndTranDateKey
AND ct2.TranDateKey between dl.BeginTranDateKey and dl.EndTranDateKey
GROUP BY
ct1.CustomerKey,ct1.TranDateKey
HAVING
COUNT(*) = 7
Please help make it more efficient. Is there a better way to write this query in 2008?
You can do this using window functions, which should be much faster. Assuming that TranDateKey is a number and you can subtract a sequential number from it, then the difference is constant for consecutive days.
You can put this in a query like this:
SELECT CustomerKey, MIN(TranDateKey), MAX(TranDateKey)
FROM (SELECT ct.CustomerKey, ct.TranDateKey,
(ct.TranDateKey -
DENSE_RANK() OVER (PARTITION BY ct.CustomerKey ORDER BY ct.TranDateKey)
) as grp
FROM CustomerTransactionFact ct INNER JOIN
#CRTCustomerList dl
ON ct.CustomerKey = dl.CustomerKey
) t
GROUP BY CustomerKey, grp
HAVING COUNT(*) = 7;
If your date key is something else, there is probably a way to modify the query to handle that, but you might have to join to the dimension table.
This would be a perfect task for a COUNT(*) OVER (RANGE ...), but SQL Server 2008 supports only a limited syntax for Windowed Aggregate Functions.
SELECT CustomerKey, MIN(TranDateKey), COUNT(*)
FROM
(
SELECT CustomerKey, TranDateKey,
dateadd(d,-ROW_NUMBER()
OVER (PARTITION BY CustomerKey
ORDER BY TranDateKey),TranDateTime) AS dummyDate
FROM CustomerTransactionFact
) AS dt
GROUP BY CustomerKey, dummyDate
HAVING COUNT(*) >= 7
The dateadd calculates the difference between the current TranDateTime and a Row_Number over all dates per customer. The resulting dummyDate has no actual meaning, but it is the same meaningless date for consecutive dates.
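The "date minus row number" trick can be tried on a toy data set. Below is a rough sketch in Python with an in-memory SQLite database (window functions need SQLite 3.25 or later). The table is cut down to two columns, TranDate is a hypothetical ISO date string rather than the original TranDateKey, and the rows are invented. Days are de-duplicated first so that two transactions on the same day do not break the sequence.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerTransactionFact (CustomerKey INTEGER, TranDate TEXT)")

def days(start, n):
    """Return n consecutive ISO dates beginning at start."""
    return [(start + timedelta(days=i)).isoformat() for i in range(n)]

# Customer 1: 7 consecutive days across a month boundary, plus a duplicate day.
data = [(1, d) for d in days(date(2015, 1, 28), 7)]
data.append((1, "2015-01-30"))
# Customer 2: 6 days in total, but with a gap on 2015-03-04.
data += [(2, d) for d in days(date(2015, 3, 1), 3)]
data += [(2, d) for d in days(date(2015, 3, 5), 3)]
conn.executemany("INSERT INTO CustomerTransactionFact VALUES (?, ?)", data)

streaks = conn.execute("""
SELECT CustomerKey, MIN(TranDate), MAX(TranDate), COUNT(*) AS days_in_streak
FROM (SELECT CustomerKey, TranDate,
             -- day number minus row number is constant within a run of consecutive days
             CAST(julianday(TranDate) AS INTEGER)
               - ROW_NUMBER() OVER (PARTITION BY CustomerKey ORDER BY TranDate) AS grp
      FROM (SELECT DISTINCT CustomerKey, TranDate
            FROM CustomerTransactionFact) AS d) AS t
GROUP BY CustomerKey, grp
HAVING COUNT(*) >= 7
""").fetchall()

print(streaks)
```

Using a real day number instead of a yyyymmdd integer keeps the difference constant across month boundaries, which plain integer subtraction on a yyyymmdd key would not.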

How to perform running sum (balance) in SQL

I have 2 SQL Tables
unit_transactions
unit_transaction_details
(tables schema here: http://sqlfiddle.com/#!3/e3204/2 )
What I need is to perform an SQL Query in order to generate a table with balances. Right now I have this SQL Query but it's not working fine because when I have 2 transactions with the same date then the balance is not calculated correctly.
SELECT
ft.transactionid,
ft.date,
ft.reference,
ft.transactiontype,
CASE ftd.isdebit WHEN 1 THEN MAX(ftd.debitaccountid) ELSE MAX(ftd.creditaccountid) END as financialaccountname,
CAST(COUNT(0) as tinyint) as totaldetailrecords,
ftd.isdebit,
SUM(ftd.amount) as amount,
balance.amount as balance
FROM unit_transaction_details ftd
JOIN unit_transactions ft ON ft.transactionid = ftd.transactionid
JOIN
(
SELECT DISTINCT
a.transactionid,
SUM(CASE b.isdebit WHEN 1 THEN b.amount ELSE -ABS(b.amount) END) as amount
--SUM(b.debit-b.credit) as amount
FROM unit_transaction_details a
JOIN unit_transactions ft ON ft.transactionid = a.transactionid
CROSS JOIN unit_transaction_details b
JOIN unit_transactions ft2 ON ft2.transactionid = b.transactionid
WHERE (ft2.date <= ft.date)
AND ft.unitid = 1
AND ft2.unitid = 1
AND a.masterentity = 'CONDO-A'
GROUP BY a.transactionid,a.amount
) balance ON balance.transactionid = ft.transactionid
WHERE
ft.unitid = 1
AND ftd.isactive = 1
GROUP BY
ft.transactionid,
ft.date,
ft.reference,
ft.transactiontype,
ftd.isdebit,
balance.amount
ORDER BY ft.date DESC
The result of the query is this:
Any clue on how to write a correct SQL query that will show me the right balances, ordered by transaction date in descending order?
Thanks a lot.
EDIT: THINK OF 2 POSSIBLE SOLUTIONS
The problem occurs when you have the same date in 2 transactions, so here is what I'm going to do:
Save Date and Time into "date" column. That way there won't be 2 exact dates.
OR
Create a "priority" column and set the priority for each record. So if I found that the date already exists and it has priority = 1 then the current priority will be 2.
What do you think?
There are two ways to do a running sum. I am going to show the syntax on a simpler table, to give you an idea.
Some databases (Oracle, PostgreSQL, SQL Server 2012, Teradata, DB2 for instance) support cumulative sums directly. For this you use the following function:
select sum(<val>) over (partition by <column> order by <ordering column>)
from t
This is a window function that will calculate the running sum of <val> for each group of records identified by <column>. The order of the sum is <ordering column>.
Alas, many databases don't support this functionality, so you would need to do a self join to do this in a single SELECT query in the database:
select t.<column>, t.<ordering column>, sum(tprev.<val>) as cumsum
from t left join
t tprev
on t.<column> = tprev.<column> and
t.<ordering column> >= tprev.<ordering column>
group by t.<column>, t.<ordering column>
There is also the possibility of creating another table and using a cursor to assign the cumulative sum, or of doing the sum at the application level.
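On the same-date problem from the question: with a window function, one option is to add a unique tie-breaker to the ordering instead of changing the data. The sketch below runs in Python against an in-memory SQLite database (3.25 or later for window functions). The table is a hypothetical single-table simplification with a signed amount column, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unit_transactions (transactionid INTEGER PRIMARY KEY, date TEXT,
                                amount INTEGER);
-- Invented data: transactions 1 and 2 share the same date.
INSERT INTO unit_transactions VALUES
  (1,'2014-01-01',100),
  (2,'2014-01-01',-30),
  (3,'2014-01-02',50);
""")

rows = conn.execute("""
SELECT transactionid, date, amount,
       -- ordering by date alone: rows with equal dates are peers and share one sum
       SUM(amount) OVER (ORDER BY date)                AS balance_with_ties,
       -- adding the unique id as a tie-breaker gives a true row-by-row balance
       SUM(amount) OVER (ORDER BY date, transactionid) AS balance
FROM unit_transactions
ORDER BY date, transactionid
""").fetchall()

for row in rows:
    print(row)
```

Ordering by date alone gives both same-day rows the end-of-day total, which matches the symptom described in the question. The same tie-breaker idea carries over to the self-join form by comparing (date, transactionid) pairs rather than dates only.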