Join two tables to get counts by different dates - SQL

I have a table A with columns:
id, transactiondate, pointsordered
Table B:
id, redemptiondate, pointsused
Table C:
id, joindate
What I want:
All data is needed for a date range, let's say 2014-01-01 to 2014-02-01 (YYYY-MM-DD):
count of total ids by date (count of ids from table A)
count of accounts that had their first transaction on that date
total points ordered by date (sum of points from table A)
count of accounts that redeemed on that date (count of ids from table B)
count of points used on that date (sum of points from table B)
new customers that joined by date
I understand id is a foreign key for table B and table C, but how do I ensure I match the dates?
For example, if I join by date, such as a.transactiondate = b.redemptiondate, it gives me only the customers that had a transaction on that date and also redeemed on that date.
Whereas I want a count of all customers that had a transaction on that date and a count of all customers that redeemed on that date (irrespective of when they had their transaction).
Here is what I tried:
select count(distinct a.id) as noofcustomers, sum(a.pointsordered), sum(b.pointsused), count(distinct b.id)
from transaction as a join redemption as b on a.transactiondate = b.redemptiondate
where a.transactiondate between '2014-01-01' and '2014-02-01'
group by a.transactiondate, b.redemptiondate

I would first group the data per table and only then join the results by date. You shouldn't use an inner join, because you may lose data if there is no matching record on one side (e.g. no transaction on a given date but a redemption). It helps to have a list of dates in that range; if you don't have one, you can build it with a recursive CTE.
declare @from date = '2014-01-01';
declare @to date = '2014-02-01';

with dates as
(
    select @from as [date]
    union all
    select dateadd(day, 1, [date]) from dates where [date] < @to
)
, orders as
(
    select transactiondate as [date], count(distinct id) as noofcustomers, sum(pointsordered) as pointsordered
    from [transaction]
    where transactiondate between @from and @to
    group by transactiondate
)
, redemptions as
(
    select redemptiondate as [date], count(distinct id) as noofcustomers, sum(pointsused) as pointsused
    from [redemption]
    where redemptiondate between @from and @to
    group by redemptiondate
)
, joins as
(
    select joindate as [date], count(distinct id) as noofcustomers
    from [join]
    where joindate between @from and @to
    group by joindate
)
, firsts as
(
    select transactiondate as [date], count(distinct id) as noofcustomers
    from [transaction] t1
    where transactiondate between @from and @to
    and not exists (
        select * from [transaction] t2 where t2.id = t1.id and t2.transactiondate < t1.transactiondate)
    group by transactiondate
)
select
    d.[date],
    isnull(o.noofcustomers, 0) as noofcustomersordered,
    isnull(o.pointsordered, 0) as totalpointsordered,
    isnull(f.noofcustomers, 0) as noofcustomersfirsttran,
    isnull(r.noofcustomers, 0) as noofcustomersredeemed,
    isnull(r.pointsused, 0) as totalpointsredeemed,
    isnull(j.noofcustomers, 0) as noofcustomersjoined
from dates d
left join orders o on o.[date] = d.[date]
left join redemptions r on r.[date] = d.[date]
left join joins j on j.[date] = d.[date]
left join firsts f on f.[date] = d.[date]
Please note that I didn't run the query, so there may be errors, but I think the general idea is clear.
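Note also that a recursive CTE in SQL Server stops after 100 recursion levels by default, so if the date range ever spans more than about 100 days, append OPTION (MAXRECURSION 0) to the final SELECT (the same hint used in one of the answers further down):

left join firsts f on f.[date] = d.[date]
option (maxrecursion 0);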

Latest value of compared date range? (SQL/Snowflake)

I have values in Table-A like:
Patient|Invoice|Date
A|111|2021-02-01
A|222|2021-01-01
B|333|2021-03-01
B|444|2021-02-01
C|555|2021-04-01
C|666|2021-03-01
And values in Table-B like:
Patient|Value|Date
A|2|2021-01-05
A|3|2021-01-05
A|3|2021-02-05
B|1|2021-02-05
B|1|2021-03-05
C|6|2021-01-01
And I want to join the two tables such that I see the most recent cumulative sum of values in Table-B as-of the Date in Table-A for a given Patient.
Patient|Invoice|Latest Value|Date
A|111|5|2021-02-01
A|222|0|2021-01-01
B|333|1|2021-03-01
B|444|0|2021-02-01
C|555|6|2021-04-01
C|666|6|2021-03-01
How would I join these two tables by date to accomplish this?
First step seems like a basic SQL join:
select a.patient, a.invoice, sum(b.value), a.date
from table1 a
join table2 b
  on a.patient = b.patient
  and a.date = b.date
group by a.patient, a.invoice, a.date
But instead of a plain sum() you can apply a sum() over() (note that with the GROUP BY still in place, the window sum has to be applied on top of the grouped sum):
select a.patient, a.invoice
     , sum(sum(b.value)) over(partition by a.patient order by a.date) as latest_value
     , a.date
from table1 a
join table2 b
  on a.patient = b.patient
  and a.date = b.date
group by a.patient, a.invoice, a.date
I think that first we need to calculate the time intervals during which each invoice is valid (using the LAG function), and then calculate the cumulative SUM.
WITH A AS (
SELECT Patient, Invoice, Date, IFNULL(LAG(Date) OVER(PARTITION BY Patient ORDER BY Date), '1900-01-01') AS LG
FROM Table_A
)
SELECT DISTINCT A.Patient, A.Invoice, IFNULL(SUM(B.Value) OVER(PARTITION BY A.Patient ORDER BY A.Date), 0) AS Latest_Value, A.Date
FROM A
LEFT JOIN Table_B AS B
ON A.Patient = B.Patient
AND B.Date >= A.LG AND B.Date < A.Date
GROUP BY A.Patient, A.Invoice, A.Date, B.Value
ORDER BY A.Patient, A.Invoice, A.Date;

Return row with 0 for dates which have no entry in table - SQL

I have a table that records daily sales data. However, there are days when no sale is made, and hence there is no record in the database for those dates. Is it possible to extract data from the table so that it returns null for those dates when no sale was made?
Referring to the image attached, there were no sales on Jan 4 and Jan 8. I would like to write a SQL query that returns all dates from Jan 1 - Jan 10, but for Jan 4 and Jan 8 it should return 0, since there is no row for those dates (no sale made).
My dates start from Mar 1, 2018 and should go on for the next few quarters.
Yes. In Postgres, you can use generate_series() to generate dates or numbers within a range.
Then, you can use a cross join to generate the rows and then a left join to bring in the data:
select s.seller, gs.dte, t.count
from (select generate_series(mindate::timestamp, maxdate::timestamp, interval '1 day')::date
      from (select min(date) as mindate, max(date) as maxdate
            from t
           ) x
     ) gs(dte) cross join
     (select distinct seller from t) s left join
     t
     on t.date = gs.dte and t.seller = s.seller
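If you want a literal 0 rather than NULL on the dates with no sales (as the question asks), you can wrap the joined value in coalesce in the outer select, e.g.:

select s.seller, gs.dte, coalesce(t.count, 0) as count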
A recursive CTE is also an alternative here (SQL Server syntax):
DECLARE @FDATE DATE = '2018-01-01'
       ,@TDATE DATE = '2018-01-10';

WITH CTE_DATE
AS (
    SELECT @FDATE AS CDATE
    UNION ALL
    SELECT DATEADD(DAY, 1, CDATE)
    FROM CTE_DATE
    WHERE DATEADD(DAY, 1, CDATE) <= @TDATE
)
SELECT C.CDATE AS [DATE], COUNT(M.[DATE]) AS [COUNT]  -- count only matching rows, so empty dates return 0
FROM CTE_DATE AS C
LEFT OUTER JOIN [MY_TABLE] AS M ON C.CDATE = M.[DATE] -- your table here
GROUP BY C.CDATE
OPTION (MAXRECURSION 0);

Get last transactions from Transaction table by date

I need to get transactions from the Transaction table for the last 2 dates on which transactions were completed, and check whether the transaction amount on the last day is more than 10% higher than the transaction amount on the previous day.
My table has columns AccountId, SubAccountId, Amount, Date and UserId.
For example:
CREATE TABLE Transactions
(`id` int, `AccountId` int, `SubAccountId` int, `Amount` decimal
,`Date` datetime, `User` int);
INSERT INTO Transactions
(`id`, `AccountId`, `SubAccountId`, `Amount`, `Date`, `User`)
VALUES
(1, 1, 2, 100, '06/15/2018', 1),
(2, 1, 2, 40, '06/15/2018', 1),
(3, 1, 2, 20, '06/14/2018', 1),
(4, 1, 2, 0, '06/10/2018', 1);
In this example I need to select only the transactions for 06/15/2018 and 06/14/2018, and display the sum of the transaction amounts for those days.
So far, I can select the last transactions, like this:
select distinct AccountId,
SubAccountId,
UserId,
Amount,
Date AS lastDate,
min(Date)
over (partition by PayerAccount order by Date
rows between 1 preceding and 1 preceding) as PrevDate
from Transactions
order by UserId
This checks the sum amount of the current day against the sum amount of the previous day (to confirm it's greater than 10%) and then does a top 2 to extract only the last two days...
WITH CTE AS(
select
Date,
sum(Amount) as SumAmount,
rownum = ROW_NUMBER() OVER(ORDER BY Date)
from Transactions
group by Date
)
select top 2 CTE.date, CTE.SumAmount, CTE.rownum, CASE WHEN prev.sumamount > CTE.sumamount * 0.10 THEN 1 else 0 END isgreaterthan10per
from CTE
LEFT JOIN CTE prev ON prev.rownum = CTE.rownum - 1
order by CTE.date desc
with CTE1 as
(
select accountID, Date, sum(Amount) as Amount
from Transactions
where Date between '2018-06-14' and '2018-06-16' -- Apply date restriction here
group by accountID, Date
)
, CTE2 as
(
select accountID, Amount, Date,
row_number() over (partition by accountID order by date desc) as rn
from CTE1
)
select a1.accountID, a1.Amount, a1.Date, a2.Date, a2.Amount
from CTE2 a1
left join CTE2 a2
on a1.accountID = a2.accountID
and a2.rn = a1.rn+1
This will get you the transaction totals for each day and for the previous day, by accountID, on one line. From here you can compare the values.
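For example, the final comparison the question asks for (is the latest day's total more than 10% above the previous day's?) could be expressed on top of CTE2 roughly like this; the 1.10 factor is just one reading of "more than 10%", and the sketch is untested:

select a1.accountID, a1.Date, a1.Amount, a2.Date as PrevDate, a2.Amount as PrevAmount,
       case when a1.Amount > a2.Amount * 1.10 then 1 else 0 end as MoreThan10PercentHigher
from CTE2 a1
left join CTE2 a2
  on a1.accountID = a2.accountID
 and a2.rn = a1.rn + 1
where a1.rn = 1  -- latest date per account only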
You want to group by date and sum the amount:
select Date, sum(Amount) from Transactions /*where conditions*/ group by Date
You can use this. I hope it works for you.
SELECT *
FROM Transactions tb
INNER JOIN
(
    SELECT MAX([Date]) AS [Date] FROM Transactions
    UNION ALL
    SELECT MAX([Date]) AS [Date] FROM Transactions WHERE [Date] < (SELECT MAX([Date]) AS [Date] FROM Transactions)
) tb1 ON tb1.[Date] = tb.[Date]
You can check out the query below to get the last two dates and the sum of the amount on those dates.
select distinct accountid, subaccountid, user, date, sum(amount) over(partition by date)
from transactions
where date >= (select max(date) from transactions where date < (select max(date) from transactions));

I have 3 tables with dates that I want to join to a date dimension table, but it is returning many duplicates with left joins

I have 4 tables with dates that I want to join to a date dimension table, but it is returning many duplicates with left joins.
The tables are basically a date field which I want to count:
mdate datetime, mordate varchar(10), fteam varchar(20)
sdate datetime, fteam varchar(20)
bdate datetime, fteam varchar(20)
These are actually one table with separate date columns, which I am joining 3 times to the dimension table to get one dataset. There is also this table:
compdate datetime, fteam varchar(20)
and the date dimension table, which has a date in yyyymmdd format and which I join on the date field.
The query is:
select cp.fteam, md.mdate, sd.sdate, bd.bdate, cp.compdate, d.date
into #resultstable
from datedimension d
left join mdate md
  on d.date = convert(date, md.mdate, 103)
left join sentdate sd
  on d.date = convert(date, sd.sdate, 103)
left join bacdate bd
  on d.date = convert(date, bd.bdate, 103)
left join compdate cp
  on d.date = convert(date, cp.compdate, 103)
Doing this, I want the date dimension to give me a single date column I can use in a where clause to get counts per date from the 4 different tables for a report.
However, it is giving me many repeats: each time there is a matching date, I get the same line repeated for that date across all tables.
This gives many counts which are wrong.
For example, if the md table has 2 records for 2016/06/29, cp has 3 and bd has 6, the date dimension result shows 36 rows for md (i.e. 2 x 3 x 6) when it should only be showing 2.
How can I join these tables without causing repeats and incorrect results?
I thought there would be a standard way to join fact tables with a dimension table to get accurate results without duplicates, since you are joining sets together.
I have tried selecting only the dates from each table, but it still gives repeats.
I cannot show the schema because of company confidentiality, but you can put together a hypothetical one from the tables shown.
What you are seeing is that because none of your fact tables relate to each other, you are essentially creating a Cartesian product for the fact tables--where they only relate to each other by date.
Consider this simplified version of your example, where I also include some sample data for "today":
CREATE TABLE #fact1 (id int identity, dt datetime, val varchar(5));
CREATE TABLE #fact2 (id int identity, dt datetime, val varchar(5));
CREATE TABLE #fact3 (id int identity, dt datetime, val varchar(5));
CREATE TABLE #fact4 (id int identity, dt datetime, val varchar(5));
CREATE TABLE #date (dt datetime, val varchar(5));
GO
INSERT INTO #fact1 (dt, val) VALUES (GETDATE(),'fact1');
INSERT INTO #fact2 (dt, val) VALUES (GETDATE(),'fact2');
INSERT INTO #fact3 (dt, val) VALUES (GETDATE(),'fact3');
INSERT INTO #fact4 (dt, val) VALUES (GETDATE(),'fact4');
WAITFOR DELAY '00:00:01';
GO 5
INSERT INTO #date (dt, val) VALUES (CAST(GETDATE() AS date),'Today');
GO
SELECT *
FROM #date d
JOIN #fact1 AS f1 ON d.dt = CAST(f1.dt AS date)
JOIN #fact2 AS f2 ON d.dt = CAST(f2.dt AS date)
JOIN #fact3 AS f3 ON d.dt = CAST(f3.dt AS date)
JOIN #fact4 AS f4 ON d.dt = CAST(f4.dt AS date);
GO
DROP TABLE #fact1;
DROP TABLE #fact2;
DROP TABLE #fact3;
DROP TABLE #fact4;
DROP TABLE #date;
GO
Note that 625 rows are returned. This is the Cartesian product of the four fact tables, which is then joined to the dimension table. This happens because there is no relation between the fact tables other than the date. As a result, any one row for "today" in one fact table is joined to every row for "today" in every other fact table.
Instead, consider how your four fact tables relate WITHOUT the join to the date dimension table. Re-write your query so that the data makes sense before joining to the date dimension. Do the tables relate on something like an order_id or any other aspect?
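Purely for illustration (the order_id column here is hypothetical, not part of the sample schema), relating the fact tables to each other first and joining the date dimension last would look something like this:

SELECT d.dt, f1.val, f2.val
FROM #fact1 AS f1
JOIN #fact2 AS f2 ON f2.order_id = f1.order_id  -- hypothetical shared key between the facts
JOIN #date AS d ON d.dt = CAST(f1.dt AS date)   -- date dimension joined once, at the end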
If the fact tables only relate insomuch as you are aggregating them by date--then yes, you'll need to take another approach:
a) Aggregate by date first, then join the aggregated sets together. This option makes the most sense if you only need the aggregated values, and don't need the full details for your report.
SELECT *
FROM #date d
JOIN (SELECT CAST(dt AS date) AS dt, count(*) AS dt_count
FROM #fact1 GROUP BY CAST(dt AS date)) AS f1 ON d.dt = f1.dt
JOIN (SELECT CAST(dt AS date) AS dt, count(*) AS dt_count
FROM #fact2 GROUP BY CAST(dt AS date)) AS f2 ON d.dt = f2.dt
JOIN (SELECT CAST(dt AS date) AS dt, count(*) AS dt_count
FROM #fact3 GROUP BY CAST(dt AS date)) AS f3 ON d.dt = f3.dt
JOIN (SELECT CAST(dt AS date) AS dt, count(*) AS dt_count
FROM #fact4 GROUP BY CAST(dt AS date)) AS f4 ON d.dt = f4.dt
b) Assign an arbitrary row_number() for each calendar day, then use that as a secondary join criterion. If the data doesn't actually relate, this option might work, but the detailed result set is largely meaningless when all data in a single row doesn't refer to a single entity. This might give you the right numbers, but is logically a useless result set.
SELECT *
FROM #date d
JOIN (SELECT *,
ROW_NUMBER() OVER(PARTITION BY CAST(dt AS date) ORDER BY dt) AS row_num
FROM #fact1 ) AS f1 ON d.dt = CAST(f1.dt AS date)
JOIN (SELECT *,
ROW_NUMBER() OVER(PARTITION BY CAST(dt AS date) ORDER BY dt) AS row_num
FROM #fact2 ) AS f2 ON d.dt = CAST(f2.dt AS date) AND f1.row_num = f2.row_num
JOIN (SELECT *,
ROW_NUMBER() OVER(PARTITION BY CAST(dt AS date) ORDER BY dt) AS row_num
FROM #fact3 ) AS f3 ON d.dt = CAST(f3.dt AS date) AND f1.row_num = f3.row_num
JOIN (SELECT *,
ROW_NUMBER() OVER(PARTITION BY CAST(dt AS date) ORDER BY dt) AS row_num
FROM #fact4 ) AS f4 ON d.dt = CAST(f4.dt AS date) AND f1.row_num = f4.row_num
c) Break this up into separate statements: one for each fact table. Optionally UNION those results into a single result set. This result set could then be further aggregated/grouped to give you the results you want.
SELECT *, 'Fact 1' AS SourceTable
FROM #date d
JOIN #fact1 AS f1 ON d.dt = CAST(f1.dt AS date)
UNION ALL
SELECT *, 'Fact 2' AS SourceTable
FROM #date d
JOIN #fact2 AS f2 ON d.dt = CAST(f2.dt AS date)
UNION ALL
SELECT *, 'Fact 3' AS SourceTable
FROM #date d
JOIN #fact3 AS f3 ON d.dt = CAST(f3.dt AS date)
UNION ALL
SELECT *, 'Fact 4' AS SourceTable
FROM #date d
JOIN #fact4 AS f4 ON d.dt = CAST(f4.dt AS date);
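The UNION ALL result can then be aggregated, as the last sentence of (c) suggests, for example to get one count per date per source table (a sketch using the same temp tables):

SELECT x.dt, x.SourceTable, COUNT(*) AS dt_count
FROM (
    SELECT d.dt, 'Fact 1' AS SourceTable FROM #date d JOIN #fact1 AS f1 ON d.dt = CAST(f1.dt AS date)
    UNION ALL
    SELECT d.dt, 'Fact 2' AS SourceTable FROM #date d JOIN #fact2 AS f2 ON d.dt = CAST(f2.dt AS date)
    UNION ALL
    SELECT d.dt, 'Fact 3' AS SourceTable FROM #date d JOIN #fact3 AS f3 ON d.dt = CAST(f3.dt AS date)
    UNION ALL
    SELECT d.dt, 'Fact 4' AS SourceTable FROM #date d JOIN #fact4 AS f4 ON d.dt = CAST(f4.dt AS date)
) AS x
GROUP BY x.dt, x.SourceTable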
In my opinion, options a & c offer the best solutions when the fact tables don't otherwise relate to each other. Option b might work, but you would need to be very careful that your data is meaningful & doesn't create confusing or erroneous results.
Additionally, while it is orthogonal to the question asked, keep in mind that applying a function to a join criterion (in this case, CONVERTing the date column) will prevent index usage, resulting in a table scan.
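One common workaround (a sketch based on the original query, assuming datedimension.date is a date column) is to leave the fact column untouched and turn the equality into a half-open range, which remains sargable:

left join mdate md
  on md.mdate >= d.date
 and md.mdate <  dateadd(day, 1, d.date)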
You should have a single date table that you join to from the fact table multiple times. This is called a role playing dimension. Your query will look like this:
SELECT fact.*
,COALESCE(moc.datekey, @unknownDateKey)
,COALESCE(sent.datekey, @unknownDateKey)
FROM factTable fact
LEFT OUTER JOIN date moc
ON fact.mocdate = moc.date
LEFT OUTER JOIN date sent
ON fact.sentdate = sent.date
You would have @unknownDateKey as a variable above that is set to the key of the unknown member of the dimension.
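For example (the -1 value is only an assumption; use whatever surrogate key your date dimension reserves for its unknown member):

DECLARE @unknownDateKey int = -1;  -- assumed key of the 'Unknown' date member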

Efficiently group by column aggregate

SELECT date, id, sum(revenue)
FROM table
WHERE date between '2013-01-01' and '2013-01-08'
GROUP BY date, id
HAVING sum(revenue)>1000
This returns only the (date, id) groups whose revenue on that single date exceeds 1000.
SELECT date, id, sum(revenue)
FROM table
WHERE date between '2013-01-01' and '2013-01-08'
AND id IN (SELECT id FROM table where date between '2013-01-01' and '2013-01-08' GROUP BY id HAVING sum(revenue)>1000)
GROUP BY date, id
This returns rows for ids whose total revenue over the date period is > 1000, as desired. But this query is much slower. Is there a quicker way to do this?
Make sure you have indexes on the date and id columns, and try this variation:
select t.date, t.id, sum(t.revenue)
from table t
inner join (
select id
from table
where date between '2013-01-01' and '2013-01-08'
group by id
having sum(revenue) > 1000
) ts on t.id = ts.id
where t.date between '2013-01-01' and '2013-01-08'
group by t.date, t.id
It's not MySQL, it's Vertica ;)
Cris, which projection and ORDER BY are you using in your CREATE TABLE?
Did you try using Database Designer?
See http://my.vertica.com/docs/6.1.x/HTML/index.htm#14415.htm