Latest value of compared date range? (SQL/Snowflake) - sql

I have values in Table-A like:
Patient|Invoice|Date
A,111,2021-02-01
A,222,2021-01-01
B,333,2021-03-01
B,444,2021-02-01
C,555,2021-04-01
C,666,2021-03-01
And values in Table-B like:
Patient|Value|Date
A,2,2021-01-05
A,3,2021-01-05
A,3,2021-02-05
B,1,2021-02-05
B,1,2021-03-05
C,6,2021-01-01
And I want to join the two tables such that I see the most recent cumulative sum of values in Table-B as-of the Date in Table-A for a given Patient.
Patient|Invoice|Latest Value|Date
A,111,5,2021-02-01
A,222,0,2021-01-01
B,333,1,2021-03-01
B,444,0,2021-02-01
C,555,6,2021-04-01
C,666,6,2021-03-01
How would I join these two tables by date to accomplish this?

First step seems like a basic SQL join:
select a.patient, a.invoice, sum(b.value), a.date
from table1 a
join table2 b
on a.patient=b.patient
and a.date=b.date
group by a.patient, a.invoice, a.date
But instead of a plain sum() you can apply a sum() over():
select a.patient, a.invoice
, sum(sum(b.value)) over(partition by a.patient order by a.date)
, a.date
from table1 a
join table2 b
on a.patient=b.patient
and a.date=b.date
group by a.patient, a.invoice, a.date

I think we first need to calculate the time interval each invoice covers (using the LAG function), and then calculate the cumulative SUM.
WITH A AS (
SELECT Patient, Invoice, Date, IFNULL(LAG(Date) OVER(PARTITION BY Patient ORDER BY Date), '1900-01-01') AS LG
FROM Table_A
)
SELECT DISTINCT A.Patient, A.Invoice, IFNULL(SUM(B.Value) OVER(PARTITION BY A.Patient ORDER BY A.Date), 0) AS Latest_Value, A.Date
FROM A
LEFT JOIN Table_B AS B
ON A.Patient = B.Patient
AND B.Date >= A.LG AND B.Date < A.Date
GROUP BY A.Patient, A.Invoice, A.Date, B.Value
ORDER BY A.Patient, A.Invoice, A.Date;
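For comparison, here is a minimal sketch of the same as-of cumulative sum, run on SQLite through Python's sqlite3 module rather than on Snowflake (that swap is an assumption made only so the example is self-contained). It joins every invoice to all Table-B rows dated on or before the invoice date and sums them. The table names table_a and table_b are hypothetical stand-ins for Table-A and Table-B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (patient TEXT, invoice INTEGER, date TEXT);
CREATE TABLE table_b (patient TEXT, value INTEGER, date TEXT);
INSERT INTO table_a VALUES
  ('A', 111, '2021-02-01'), ('A', 222, '2021-01-01'),
  ('B', 333, '2021-03-01'), ('B', 444, '2021-02-01'),
  ('C', 555, '2021-04-01'), ('C', 666, '2021-03-01');
INSERT INTO table_b VALUES
  ('A', 2, '2021-01-05'), ('A', 3, '2021-01-05'), ('A', 3, '2021-02-05'),
  ('B', 1, '2021-02-05'), ('B', 1, '2021-03-05'), ('C', 6, '2021-01-01');
""")

# Non-equi LEFT JOIN: each invoice sees every value dated on or before it.
# Invoices with no earlier values keep a NULL sum, which COALESCE turns into 0.
query = """
SELECT a.patient, a.invoice,
       COALESCE(SUM(b.value), 0) AS latest_value,
       a.date
FROM table_a AS a
LEFT JOIN table_b AS b
  ON b.patient = a.patient
 AND b.date <= a.date
GROUP BY a.patient, a.invoice, a.date
ORDER BY a.patient, a.invoice
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this reproduces the desired output from the question. ISO-formatted date strings compare correctly as text here; on Snowflake the columns would normally be real DATE values.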

Related

SQL: Difference between consecutive rows

I have a table with 3 columns: order id, member id, order date.
I need to pull the distribution of orders broken down by the number of days between two consecutive orders, per member id.
What I have is this:
SELECT
a1.member_id,
count(distinct a1.order_id) as num_orders,
a1.order_date,
DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id+1;
It's not helping me completely as the output I need is:
You can use lag() to get the date of the previous order by the same customer:
select o.*,
datediff(
order_date,
lag(order_date) over(partition by member_id order by order_date, order_id)
) days_diff
from orders o
When there are two rows for the same date, the smallest order_id is considered first. Also note that I fixed your datediff() syntax: in Hive, the function just takes two dates, and no unit.
I just don't get the logic you want to compute num_orders.
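A small sketch of the lag() idea, assuming SQLite (via Python's sqlite3) instead of Hive, so the day difference is computed with julianday() rather than datediff(). The sample rows are invented for illustration. The second query counts how many orders fall into each gap size, which is one possible reading of the "distribution" asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, member_id INTEGER, order_date TEXT);
INSERT INTO orders VALUES
  (1, 10, '2021-01-01'),
  (2, 10, '2021-01-04'),
  (3, 10, '2021-01-10'),
  (4, 20, '2021-02-01'),
  (5, 20, '2021-02-04');
""")

# Days since the previous order of the same member (NULL for the first order).
gap_query = """
SELECT order_id, member_id, order_date,
       CAST(julianday(order_date) - julianday(
           LAG(order_date) OVER (PARTITION BY member_id
                                 ORDER BY order_date, order_id)
       ) AS INTEGER) AS days_diff
FROM orders
"""
gaps = conn.execute(gap_query + " ORDER BY member_id, order_date").fetchall()

# Distribution: number of orders per gap size, ignoring first orders.
distribution = conn.execute(f"""
SELECT days_diff, COUNT(*) AS num_orders
FROM ({gap_query})
WHERE days_diff IS NOT NULL
GROUP BY days_diff
ORDER BY days_diff
""").fetchall()

print(gaps)
print(distribution)
```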
Maybe something like this:
SELECT
a1.member_id,
count(distinct a1.order_id) as num_orders,
a1.order_date,
DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id
where not exists (
select 1
from orders as intermediate_order
where intermediate_order.order_date < a1.order_date and intermediate_order.order_date > a2.order_date) ;

How to Compare Column in one Table to OneOther Value using SQL

I am trying to perform an analysis where I select from a table only rows that fulfill certain criteria. In this instance, I am interested in date criteria. Specifically, in this query:
SELECT * FROM INPUT_TABLE
WHERE THE_DATE<='2018-01-01' and THE_DATE >='2017-01-01'
I wish to replace the strings '2018-01-01' and '2017-01-01'
with a sort of subquery, where I keep the values of the minimum and maximum dates in another table, called VALUE_TABLES, which has the following 2 values:
MAX_DATE MIN_DATE
2018-01-01 2017-01-01
How exactly can I do this?
Use JOIN:
SELECT *
FROM INPUT_TABLE i JOIN
(SELECT MIN(DATE) as MIN_DATE, MAX(DATE) as MAX_DATE
FROM TABLE2
) t2
ON i.THE_DATE >= t2.MIN_DATE AND i.THE_DATE <= t2.MAX_DATE;
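If VALUE_TABLES already holds a single row with MIN_DATE and MAX_DATE, as in the question, the join can reference that table directly without aggregating. A minimal sketch, assuming SQLite through Python's sqlite3; the ID column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE INPUT_TABLE (ID INTEGER, THE_DATE TEXT);
CREATE TABLE VALUE_TABLES (MAX_DATE TEXT, MIN_DATE TEXT);
INSERT INTO INPUT_TABLE VALUES
  (1, '2016-12-31'), (2, '2017-01-01'), (3, '2017-06-15'),
  (4, '2018-01-01'), (5, '2018-01-02');
INSERT INTO VALUE_TABLES VALUES ('2018-01-01', '2017-01-01');
""")

# The one-row bounds table acts as a filter: only rows inside
# [MIN_DATE, MAX_DATE] find a join partner.
in_range = conn.execute("""
SELECT i.ID, i.THE_DATE
FROM INPUT_TABLE AS i
JOIN VALUE_TABLES AS v
  ON i.THE_DATE BETWEEN v.MIN_DATE AND v.MAX_DATE
ORDER BY i.ID
""").fetchall()
print(in_range)
```

Both boundary dates are included, matching the <= and >= comparisons in the original query.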

Join two tables to get counts different dates

I have a table A with columns:
id, transactiondate, pointsordered
Table B
id,redemptiondate,pointsused
Table C
as
id,joindate
What I want
All data is needed for a date range, let's say 2014-01-01 to 2014-02-01 (YYYY-MM-DD):
Count of total ids by date ( count of ids from table a)
count of accounts that had the first transaction on this date
total points ordered by date ( sum of points from table a)
count of accounts that redeemed on that date ( count of ids from table b )
count of points used on that date ( sum of points from table b)
new customers that joined by date
I understand id is a foreign key for table b and table c, but how do I ensure I match the dates?
For example, if I join by date such as a.transactiondate=b.redemptiondate, it gives me all the customers that had a transaction on that date and also redeemed on that date.
Whereas I want a count of all customers that had a transaction on that date and of customers that redeemed on that date (irrespective of when they had their transaction).
Here is what I had tried
select count( distinct a.id) as noofcustomers, sum(a.pointsordered), sum(b.pointsused), count(distinct b.id)
from transaction as a join redemption as b on a.transactiondate=b.redemptiondate
where a.transactiondate between '2014-01-01' and '2014-02-01'
group by a.transactiondate,b.redemptiondate
I would first group the data by table and only then join the results by date. You shouldn't use inner join because you may lose data if there is no matching record on one side like no transaction on given date but a redemption. It would help if you'd have a list of dates in that range. If you don't have that you can build one using a CTE.
declare @from date = '2014-01-01'
declare @to date = '2014-02-01'
;
with dates as
(
select @from as [date]
union all
select dateadd(day, 1, [date]) as d from dates where [date] < @to
)
, orders as
(
select transactiondate as [date], count(distinct id) as noofcustomers, sum(pointsordered) as pointsordered
from [transaction]
where transactiondate between @from and @to
group by transactiondate
)
, redemptions as
(
select redemptiondate as [date], count(distinct id) as noofcustomers, sum(pointsused) as pointsused
from [redemption]
where redemptiondate between @from and @to
group by redemptiondate
)
, joins as
(
select joindate as [date], count(distinct id) as noofcustomers
from [join]
where joindate between @from and @to
group by joindate
)
, firsts as
(
select transactiondate as [date], count(distinct id) as noofcustomers
from [transaction] t1
where transactiondate between @from and @to
and not exists (
select * from [transaction] t2 where t2.id = t1.id and t2.transactiondate < t1.transactiondate)
group by transactiondate
)
select
d.[date],
isnull(o.noofcustomers,0) as noofcustomersordered,
isnull(o.pointsordered,0) as totalpointsordered,
isnull(f.noofcustomers,0) as noofcustomersfirsttran,
isnull(r.noofcustomers,0) as noofcustomersredeemed,
isnull(r.pointsused,0) as totalpointsredeemed,
isnull(j.noofcustomers,0) as noofcustomersjoined
from dates d
left join orders o on o.[date] = d.[date]
left join redemptions r on r.[date] = d.[date]
left join joins j on j.[date] = d.[date]
left join firsts f on f.[date] = d.[date]
Please note that I didn't run the query, so there may be errors, but I think the general idea is clear.
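The aggregate-first, join-second idea can be tried out on a small scale. The sketch below assumes SQLite through Python's sqlite3 rather than SQL Server, so the date list is built with WITH RECURSIVE and date(d, '+1 day'), and only two of the per-table aggregates are shown. The table names transactions and redemptions and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER, transactiondate TEXT, pointsordered INTEGER);
CREATE TABLE redemptions (id INTEGER, redemptiondate TEXT, pointsused INTEGER);
INSERT INTO transactions VALUES
  (1, '2014-01-01', 100), (2, '2014-01-01', 50), (1, '2014-01-03', 30);
INSERT INTO redemptions VALUES
  (1, '2014-01-02', 40), (2, '2014-01-03', 10);
""")

query = """
WITH RECURSIVE dates(d) AS (
    -- one row per calendar day in the requested range
    SELECT :from_date
    UNION ALL
    SELECT date(d, '+1 day') FROM dates WHERE d < :to_date
),
orders AS (
    -- aggregate each source table by its own date first
    SELECT transactiondate AS d, COUNT(DISTINCT id) AS customers,
           SUM(pointsordered) AS points
    FROM transactions GROUP BY transactiondate
),
redeemed AS (
    SELECT redemptiondate AS d, COUNT(DISTINCT id) AS customers,
           SUM(pointsused) AS points
    FROM redemptions GROUP BY redemptiondate
)
-- then hang the aggregates on the date list with LEFT JOINs
SELECT dates.d,
       COALESCE(o.customers, 0), COALESCE(o.points, 0),
       COALESCE(r.customers, 0), COALESCE(r.points, 0)
FROM dates
LEFT JOIN orders AS o ON o.d = dates.d
LEFT JOIN redeemed AS r ON r.d = dates.d
ORDER BY dates.d
"""
daily = conn.execute(
    query, {"from_date": "2014-01-01", "to_date": "2014-01-03"}
).fetchall()
for row in daily:
    print(row)
```

Because each table is grouped before the join, a day with redemptions but no transactions still appears, with zeros on the transaction side.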

Efficiently group by column aggregate

SELECT date, id, sum(revenue)
FROM table
WHERE date between '2013-01-01' and '2013-01-08'
GROUP BY date, id
HAVING sum(revenue)>1000
Returns rows that have revenue>1000.
SELECT date, id, sum(revenue)
FROM table
WHERE date between '2013-01-01' and '2013-01-08'
AND id IN (SELECT id FROM table where date between '2013-01-01' and '2013-01-08' GROUP BY id HAVING sum(revenue)>1000)
GROUP BY date, id
Returns rows for id's whose total revenue over the date period is >1000 as desired. But this query is much slower. Any quicker way to do this?
Make sure you have indexes on the date and id columns, and try this variation:
select t.date, t.id, sum(t.revenue)
from table t
inner join (
select id
from table
where date between '2013-01-01' and '2013-01-08'
group by id
having sum(revenue) > 1000
) ts on t.id = ts.id
where t.date between '2013-01-01' and '2013-01-08'
group by t.date, t.id
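To check that the join variation returns the same rows as the IN version, here is a small sketch assuming SQLite through Python's sqlite3 (not Vertica or MySQL, so it says nothing about speed). The table name sales and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (date TEXT, id INTEGER, revenue INTEGER);
INSERT INTO sales VALUES
  ('2013-01-01', 1, 600), ('2013-01-02', 1, 600),
  ('2013-01-01', 2, 500),
  ('2013-01-03', 3, 1500),
  ('2013-02-01', 2, 5000);
""")

# Ids whose total revenue over the period exceeds 1000.
big_ids = """
SELECT id FROM sales
WHERE date BETWEEN '2013-01-01' AND '2013-01-08'
GROUP BY id
HAVING SUM(revenue) > 1000
"""

# Variant 1: filter with IN (the slower query from the question).
in_rows = conn.execute(f"""
SELECT date, id, SUM(revenue) FROM sales
WHERE date BETWEEN '2013-01-01' AND '2013-01-08'
  AND id IN ({big_ids})
GROUP BY date, id
ORDER BY id, date
""").fetchall()

# Variant 2: join against the same id list as a derived table.
join_rows = conn.execute(f"""
SELECT t.date, t.id, SUM(t.revenue)
FROM sales AS t
JOIN ({big_ids}) AS ts ON ts.id = t.id
WHERE t.date BETWEEN '2013-01-01' AND '2013-01-08'
GROUP BY t.date, t.id
ORDER BY t.id, t.date
""").fetchall()

print(join_rows)
```

Id 1 is kept even though no single day exceeds 1000, while id 2 is dropped because its large sale falls outside the period.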
It's not MySQL, it's Vertica ;)
Cris, what projection and ORDER BY are you using in CREATE TABLE?
Did you try using the Database Designer?
See http://my.vertica.com/docs/6.1.x/HTML/index.htm#14415.htm

Oracle - select rows with minimal value in a subset

I have a following table of dates:
dateID INT (PK),
personID INT (FK),
date DATE,
starttime VARCHAR, --Always in a format of 'HH:MM'
What I want to do is I want to pull rows (all columns, including PK) with lowest date (primary condition) and starttime (secondary condition) for every person. For example, if we have
row1(date = '2013-04-01' and starttime = '14:00')
and
row2(date = '2013-04-02' and starttime = '08:00')
row1 will be retrieved, along with all other columns.
So far I have come up with gradually filtering the table, but it's quite a mess. Is there a more efficient way of doing this?
Here is what I made so far:
SELECT
D.id
, D.personid
, D.date
, D.starttime
FROM table D
JOIN (
SELECT --Select lowest time from the subset of lowest dates
A.personid,
B.startdate,
MIN(A.starttime) AS starttime
FROM table A
JOIN (
SELECT --Select lowest date for every person to exclude them from outer table
personid
, MIN(date) AS startdate
FROM table
GROUP BY personid
) B
ON A.personid = B.personid
AND A.date = B.startdate
GROUP BY
A.personid,
B.startdate
) C
ON C.personid = D.personid
AND C.startdate = D.date
AND C.starttime = D.starttime
It works, but I think there is a more clean/efficient way to do this. Any ideas?
EDIT: Let me expand a question - I also need to extract maximum date (only date, without time) for each person.
The result should look like this:
id
personid
max(date) for each person
min(date) for each person
min(starttime) for min(date) for each person
It is a part of a much larger query (the resulting table is joined with it), and the resulting table must be lightweight enough so that the query won't execute for too long. With a single join with this table (just using min, max for each field I wanted) the query took about 3 seconds, and I would like the resulting query not to take longer than 2-3 times that.
You should be able to do this like:
select a.dateID, a.personID, a.date, a.max_date, a.starttime
from (select t.*,
max(t.date) over (partition by t.personID) max_date,
row_number() over (partition by t.personID
order by t.date, t.starttime) rn
from table t) a
where a.rn = 1;
sample data added to fiddle: http://sqlfiddle.com/#!4/63c45/1
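The same query can be tried locally. This is a sketch assuming SQLite through Python's sqlite3 instead of Oracle; the table name dates and the sample rows are hypothetical, chosen so that one person has two rows on the earliest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (dateID INTEGER PRIMARY KEY, personID INTEGER,
                    date TEXT, starttime TEXT);
INSERT INTO dates VALUES
  (1, 1, '2013-04-01', '14:00'),
  (2, 1, '2013-04-02', '08:00'),
  (3, 1, '2013-04-01', '09:30'),
  (4, 2, '2013-05-10', '10:00'),
  (5, 2, '2013-05-12', '07:00');
""")

# rn = 1 marks the row with the lowest date, then the lowest starttime,
# for each person; max_date carries that person's latest date alongside it.
query = """
SELECT dateID, personID, date, max_date, starttime
FROM (
    SELECT t.*,
           MAX(t.date) OVER (PARTITION BY t.personID) AS max_date,
           ROW_NUMBER() OVER (PARTITION BY t.personID
                              ORDER BY t.date, t.starttime) AS rn
    FROM dates AS t
) AS a
WHERE rn = 1
ORDER BY personID
"""
earliest = conn.execute(query).fetchall()
for row in earliest:
    print(row)
```

Ordering by starttime as text works here only because the values are zero-padded 'HH:MM' strings, as the question states.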
This is a query you can use as is, with no need to incorporate it into your larger query. You can also use @Dazzal's query stand-alone:
SELECT ID, PERSONID, DATE, STARTTIME
FROM (
SELECT ID, PERSONID, DATE, STARTTIME, ROW_NUMBER() OVER(PARTITION BY PERSONID ORDER BY DATE, STARTTIME) AS RN
FROM TABLE
) A
WHERE
RN = 1
select a.id,a.accomp, a.accomp_name, a.start_year,a.end_year, a.company
from (select t.*,
min(t.start_year) over (partition by t.company) min_date,
max(t.end_year) over (partition by t.company) max_date,
row_number() over (partition by t.company
order by t.end_year desc) rn
from temp_123 t) a
where a.rn = 1;