How to find the date range between two orders from the Order table with respect to subsequent Customer_Ids? - hive

For example, say customer_id = 1 has placed 3 orders in 2 years:
1st Order_date = '1st Jan 2015'
2nd Order_date = '5th June 2015'
3rd Order_date = '2nd Feb 2016'
The range has to be calculated yearly, starting from the date of the customer's first order.
Please let me know how to achieve this in HiveQL.

-- Rank each customer's orders by date, then compare the 1st and 2nd order.
with order_rank as (
select customer_id, order_id, order_date,
row_number() over (partition by customer_id order by order_date asc) as rn
from (
-- Assumes order_date is stored as a dd/MM/yyyy string. Hive's to_date() takes
-- a single argument, and 'mm' means minutes, so parse with unix_timestamp first.
select distinct customer_id, order_id,
to_date(from_unixtime(unix_timestamp(order_date, 'dd/MM/yyyy'))) as order_date
from table_t1
) abc
)
select ord_rnk_1.customer_id,
ord_rnk_1.order_id as first_order,
ord_rnk_2.order_id as second_order,
ord_rnk_1.order_date as first_order_date,
ord_rnk_2.order_date as second_order_date,
CASE
WHEN ord_rnk_2.order_id is null THEN '1st purchase' -- no second order yet
WHEN datediff(ord_rnk_2.order_date, ord_rnk_1.order_date) <= 365 THEN 'repeat purchase'
ELSE '1st purchase'
end as customer_type
from order_rank ord_rnk_1
left join order_rank ord_rnk_2
on ord_rnk_1.customer_id = ord_rnk_2.customer_id
and ord_rnk_2.rn = 2
where ord_rnk_1.rn = 1
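
The same classification can be tried outside Hive. Below is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). It swaps the self-join for LEAD(), which Hive also provides, and replaces datediff with a julianday difference. The order ids and customer 2 are invented for illustration; only the three dates for customer 1 come from the question.

```python
import sqlite3

# Hypothetical sample data: customer 1 uses the dates from the question,
# customer 2 is invented to show the single-order case.
orders = [
    (1, 101, "2015-01-01"),
    (1, 102, "2015-06-05"),
    (1, 103, "2016-02-02"),
    (2, 201, "2015-03-10"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_t1 (customer_id INTEGER, order_id INTEGER, order_date TEXT)"
)
conn.executemany("INSERT INTO table_t1 VALUES (?, ?, ?)", orders)

query = """
WITH ranked AS (
    SELECT customer_id,
           order_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS rn,
           LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order_date
    FROM table_t1
)
SELECT customer_id,
       order_date      AS first_order_date,
       next_order_date AS second_order_date,
       CASE
           WHEN next_order_date IS NULL THEN '1st purchase'
           WHEN julianday(next_order_date) - julianday(order_date) <= 365 THEN 'repeat purchase'
           ELSE '1st purchase'
       END AS customer_type
FROM ranked
WHERE rn = 1
ORDER BY customer_id
"""

customer_types = conn.execute(query).fetchall()
for row in customer_types:
    print(row)
```

Keeping only rn = 1 leaves one row per customer, with the second order date carried alongside by LEAD().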

Related

SQL - get start & end balance for each member each year

I'd like to get the starting and ending balance for each member for every year in which there is a record. For example, the query below gives me the latest balance for each member in each year, based on the date column:
SELECT
T.MemberID,
T.DateCol,
T.Amount
FROM
(SELECT T.MemberID,
T.DateCol,
Amount,
ROW_NUMBER() OVER (PARTITION BY MemberID,
YEAR(DateCol)
ORDER BY
DateCol desc) AS seqnum
FROM
Tablet T
GROUP BY DateCol, MemberID, Amount
) T
WHERE
seqnum = 1 AND
MemberID = '1000009'
And the query below gives me the earliest balance for each year:
SELECT
T.MemberID,
T.DateCol,
T.Amount
FROM
(SELECT T.MemberID,
T.DateCol,
Amount,
ROW_NUMBER() OVER (PARTITION BY MemberID,
YEAR(DateCol)
ORDER BY
DateCol) AS seqnum
FROM
Tablet T
GROUP BY DateCol, MemberID, Amount
) T
WHERE
seqnum = 1 AND
MemberID = '1000009'
Each query gives me a result set with the columns (MemberID, Date, Amount).
What I'm looking for is a single query that returns YEAR, MEMBERID, STARTBALANCE and ENDBALANCE as the columns.
What would be the best way to go about this?
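
One possible way to combine the two queries is to compute both row numbers in a single pass and pick the amounts with conditional aggregation. The sketch below runs it with Python's sqlite3 module (SQLite 3.25+ for window functions). It is only an illustration: the amounts and dates are invented, and strftime('%Y', ...) stands in for the YEAR() function used in the question.

```python
import sqlite3

# Hypothetical rows; the member id comes from the question, the rest is invented.
rows = [
    ("1000009", "2019-01-15", 100.0),
    ("1000009", "2019-07-01", 150.0),
    ("1000009", "2019-12-20", 120.0),
    ("1000009", "2020-02-03", 130.0),
    ("1000009", "2020-11-30", 90.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tablet (MemberID TEXT, DateCol TEXT, Amount REAL)")
conn.executemany("INSERT INTO Tablet VALUES (?, ?, ?)", rows)

query = """
SELECT Yr,
       MemberID,
       MAX(CASE WHEN seq_asc  = 1 THEN Amount END) AS StartBalance,
       MAX(CASE WHEN seq_desc = 1 THEN Amount END) AS EndBalance
FROM (
    SELECT MemberID,
           Amount,
           strftime('%Y', DateCol) AS Yr,
           ROW_NUMBER() OVER (PARTITION BY MemberID, strftime('%Y', DateCol)
                              ORDER BY DateCol)      AS seq_asc,
           ROW_NUMBER() OVER (PARTITION BY MemberID, strftime('%Y', DateCol)
                              ORDER BY DateCol DESC) AS seq_desc
    FROM Tablet
) t
GROUP BY MemberID, Yr
ORDER BY MemberID, Yr
"""

balances = conn.execute(query).fetchall()
for row in balances:
    print(row)
```

Each CASE expression is non-NULL for exactly one row per member and year, so MAX simply picks that row's amount.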

Execution orders of SQL aggregate functions

I have a sales table in SQLite:
purchase_date | units_sold | customer_id
:------------ | :--------- | :----------
15 | 1 | 1
17 | 1 | 1
30 | 3 | 1
I want to get the total units_sold for each customer on the first and last dates of their purchases. My query is:
select customer_id,
sum(units_sold) total_units_sold
from sales
group by customer_id
having purchase_date = min(purchase_date)
or purchase_date = max(purchase_date)
I was expecting results like:
customer_id | total_units_sold
:---------- | :---------------
1 | 4
but I got:
customer_id | total_units_sold
:---------- | :---------------
1 | 5
I would like to know why this solution doesn't work.
The clauses are not evaluated in the order you expect.
Note: HAVING is applied after the rows have been grouped, so it keeps or drops a whole group. It does not filter the individual rows that go into SUM, which is why all of the customer's rows are still added up.
You need to build the result from subqueries instead.
For example, number each customer's rows by purchase date in ascending order and again in descending order, so that the first and the last purchase each get row number 1,
then keep only those rows and apply the GROUP BY.
The complete example:
select customer_id,sum(units_sold) from (
select customer_id, units_sold,purchase_date,
ROW_NUMBER() over(partition by customer_id order by purchase_date) As RowDatefirst,
ROW_NUMBER() over(partition by customer_id order by purchase_date desc)As RowDatelast
from sales
) t where t.RowDatefirst = 1 or t.RowDatelast=1
group by customer_id
Try this:
SELECT a.customer_id, SUM(a.units_sold) as total_units_sold
FROM sales a
INNER JOIN (
SELECT customer_id, MIN(purchase_date) as _first ,MAX(purchase_date) as _last
FROM sales
GROUP BY customer_id
) b ON a.customer_id = b.customer_id AND
(a.purchase_date = b._first OR a.purchase_date = b._last)
GROUP BY a.customer_id
http://sqlfiddle.com/#!7/0a4a4/7
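
To check the numbers from the question, here is a small sketch using Python's sqlite3 module with the three sample rows. It runs a plain GROUP BY (every row of the customer is summed) next to a join-based query of the same shape as the answer above (rows are filtered before they are summed). The variable names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (purchase_date INTEGER, units_sold INTEGER, customer_id INTEGER)"
)
# Sample rows from the question.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [(15, 1, 1), (17, 1, 1), (30, 3, 1)])

# Without a row-level filter, SUM covers every row of the customer.
all_rows_total = conn.execute(
    "SELECT customer_id, SUM(units_sold) FROM sales GROUP BY customer_id"
).fetchall()

# Filter the rows to the first and last purchase date first, then aggregate.
first_last_total = conn.execute(
    """
    SELECT s.customer_id, SUM(s.units_sold) AS total_units_sold
    FROM sales s
    JOIN (
        SELECT customer_id,
               MIN(purchase_date) AS first_date,
               MAX(purchase_date) AS last_date
        FROM sales
        GROUP BY customer_id
    ) b
      ON s.customer_id = b.customer_id
     AND s.purchase_date IN (b.first_date, b.last_date)
    GROUP BY s.customer_id
    """
).fetchall()

print(all_rows_total)    # [(1, 5)]
print(first_last_total)  # [(1, 4)]
```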

Find the true start end dates for customers that have multiple accounts in SQL Server 2014

I have a checking account table that contains columns Cust_id (customer id), Open_Date (start date), and Closed_Date (end date). There is one row for each account. A customer can open multiple accounts at any given point. I would like to know how long the person has been a customer.
eg 1:
CREATE TABLE [Cust]
(
[Cust_id] [varchar](10) NULL,
[Open_Date] [date] NULL,
[Closed_Date] [date] NULL
)
insert into [Cust] values ('a123', '10/01/2019', '10/15/2019')
insert into [Cust] values ('a123', '10/12/2019', '11/01/2019')
Ideally I would like to insert this into a table as just one row that says this person has been a customer from 10/01/2019 to 11/01/2019 (as he opened his second account before he closed the previous one).
Similarly eg 2:
insert into [Cust] values ('b245', '07/01/2019', '09/15/2019')
insert into [Cust] values ('b245', '10/12/2019', '12/01/2019')
I would like to see 2 rows in this case- one that shows he was a customer from 07/01 to 09/15 and then again from 10/12 to 12/01.
Can you point me to the best way to get this?
I would approach this as a gaps-and-islands problem. You want to group together adjacent rows whose periods overlap.
Here is one way to solve it using lag() and a cumulative sum(). Every time the open date is greater than the closed date of the previous record, a new group starts.
select
cust_id,
min(open_date) open_date,
max(closed_date) closed_date
from (
select
t.*,
sum(case when not open_date <= lag_closed_date then 1 else 0 end)
over(partition by cust_id order by open_date) grp
from (
select
t.*,
lag(closed_date) over (partition by cust_id order by open_date) lag_closed_date
from cust t
) t
) t
group by cust_id, grp
In this db fiddle with your sample data, the query produces:
cust_id | open_date | closed_date
:------ | :--------- | :----------
a123 | 2019-10-01 | 2019-11-01
b245 | 2019-07-01 | 2019-09-15
b245 | 2019-10-12 | 2019-12-01
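
For experimenting with the grouping logic, here is a sketch that runs under Python's sqlite3 module (SQLite 3.25+). It is a variation on the query above, not the same query: instead of lag(closed_date) it compares each open date with the running maximum of all earlier closed dates, which also keeps an island together when a short account is nested inside a longer one. Customer c999 is invented to show that case, and dates are stored as ISO strings so that string comparison matches date order.

```python
import sqlite3

accounts = [
    ("a123", "2019-10-01", "2019-10-15"),
    ("a123", "2019-10-12", "2019-11-01"),
    ("b245", "2019-07-01", "2019-09-15"),
    ("b245", "2019-10-12", "2019-12-01"),
    # Hypothetical customer: two short accounts nested inside a long one.
    ("c999", "2019-01-01", "2019-06-30"),
    ("c999", "2019-02-01", "2019-02-15"),
    ("c999", "2019-03-01", "2019-04-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (cust_id TEXT, open_date TEXT, closed_date TEXT)")
conn.executemany("INSERT INTO cust VALUES (?, ?, ?)", accounts)

query = """
SELECT cust_id,
       MIN(open_date)   AS open_date,
       MAX(closed_date) AS closed_date
FROM (
    SELECT cust_id, open_date, closed_date,
           -- a new island starts when this account opens after every earlier one has closed
           SUM(CASE WHEN open_date > prev_max_closed THEN 1 ELSE 0 END)
               OVER (PARTITION BY cust_id ORDER BY open_date) AS grp
    FROM (
        SELECT cust_id, open_date, closed_date,
               MAX(closed_date) OVER (
                   PARTITION BY cust_id ORDER BY open_date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS prev_max_closed
        FROM cust
    ) t1
) t2
GROUP BY cust_id, grp
ORDER BY 1, 2
"""

islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

For the first account of each customer the frame is empty, prev_max_closed is NULL, and the CASE falls through to 0, so the first row never opens a second group.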
I would solve this with recursion. While this is certainly very heavy, it should accommodate even the most complex account timings (assuming your data has such). However, if the sample data provided is as complex as you need to solve for, I highly recommend sticking with the solution provided above. It is much more concise and clear.
WITH x (cust_id, open_date, closed_date, lvl, grp) AS (
SELECT cust_id, open_date, closed_date, 1, 1
FROM (
SELECT cust_id
, open_date
, closed_date
, row_number()
OVER (PARTITION BY cust_id ORDER BY closed_date DESC, open_date) AS rn
FROM cust
) AS t
WHERE rn = 1
UNION ALL
SELECT cust_id, open_date, closed_date, lvl, grp
FROM (
SELECT c.cust_id
, c.open_date
, c.closed_date
, x.lvl + 1 AS lvl
, x.grp + CASE WHEN c.closed_date < x.open_date THEN 1 ELSE 0 END AS grp
, row_number() OVER (PARTITION BY c.cust_id ORDER BY c.closed_date DESC) AS rn
FROM cust c
JOIN x
ON x.cust_id = c.cust_id
AND c.open_date < x.open_date
) AS t
WHERE t.rn = 1
)
SELECT cust_id, min(open_date) AS first_open_date, max(closed_date) AS last_closed_date
FROM x
GROUP BY cust_id, grp
ORDER BY cust_id, grp
I would also add the caveat that I don't run on SQL Server, so there could be syntax differences that I didn't account for. Hopefully they are minor, if present.
You can try something like this:
select distinct
cust_id,
(select min(Open_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
),
(select max(Closed_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
)
from Cust as a
So, for every row, you select the minimum open date and the maximum closed date across all ranges that overlap it, and DISTINCT then filters out the duplicates.

T-SQL query to obtain the no of days an item was at the current price

create table #sec_temp
(
sec_no varchar(10),
amount money,
price_date date
)
insert #sec_temp
values
('123ABC', 25, '2011-01-20'),
('123ABC', 25, '2011-01-19'),
('123ABC', 25, '2011-01-18'),
('123ABC', 20, '2011-01-15'),
('123ABC', 22, '2011-01-13'),
('456DEF', 22, '2011-01-13')
Problem: list each distinct sec_no with its latest price (amount) and the number of days it has been at that price. In this case, the result should be:
sec_no | amount | no_of_days_at_price
:----- | :----- | :------------------
123ABC | 25 | 3 (e.g. 01-18 to 01-20)
456DEF | 22 | 1 (e.g. 01-13)
select
a.sec_no,
a.amount,
-- days since the most recent record with a different price (0 if there is none)
No_of_days_at_price = COALESCE(DATEDIFF(d, c.price_date, a.price_date),0)
from (
select *, ROW_NUMBER() over (partition by sec_no order by price_date desc) rn
from #sec_temp) a
outer apply (
select top 1 *
from #sec_temp b
where a.sec_no=b.sec_no and a.amount <> b.amount
order by b.price_date desc
) c
where a.rn=1
The subquery A works out the greatest-1-per-group, which is to say the most recent price record for each sec_no. The subquery C finds the first prior record that holds a different price for the same sec_no. The difference in the two dates is the number of days sought. If you need it to be one for no prior date, change the end of the COALESCE line to 1 instead of 0.
EDITED for clarified question
To start counting from the first date on which the price equals the current price, use this query instead:
select
sec_no,
amount,
No_of_days_at_price = 1 + DATEDIFF(d, min(price_date), max(price_date))
from (
select *,
ROW_NUMBER() over (partition by sec_no order by price_date desc) rn,
ROW_NUMBER() over (partition by sec_no, amount order by price_date desc) rn2
from #sec_temp
) X
WHERE rn=rn2
group by sec_no, amount
And finally, if the required result is actually the number of days between the first date on which the price equals the current price and today, then the only part to change is this:
No_of_days_at_price = 1 + DATEDIFF(d, min(price_date), getdate())
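
The rn = rn2 filter can be tried on the sample data with Python's sqlite3 module (SQLite 3.25+). This is a sketch rather than T-SQL: the temp table becomes an ordinary table named sec_temp, amount is stored as an integer, and DATEDIFF is replaced by a julianday difference. The two row numbers stay equal only while the price has not changed when walking back from the latest date, so the filter keeps exactly the most recent run at the current price.

```python
import sqlite3

prices = [
    ("123ABC", 25, "2011-01-20"),
    ("123ABC", 25, "2011-01-19"),
    ("123ABC", 25, "2011-01-18"),
    ("123ABC", 20, "2011-01-15"),
    ("123ABC", 22, "2011-01-13"),
    ("456DEF", 22, "2011-01-13"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sec_temp (sec_no TEXT, amount INTEGER, price_date TEXT)")
conn.executemany("INSERT INTO sec_temp VALUES (?, ?, ?)", prices)

query = """
SELECT sec_no,
       amount,
       1 + CAST(julianday(MAX(price_date)) - julianday(MIN(price_date)) AS INTEGER)
           AS no_of_days_at_price
FROM (
    SELECT sec_no, amount, price_date,
           ROW_NUMBER() OVER (PARTITION BY sec_no ORDER BY price_date DESC) AS rn,
           ROW_NUMBER() OVER (PARTITION BY sec_no, amount ORDER BY price_date DESC) AS rn2
    FROM sec_temp
) x
WHERE rn = rn2
GROUP BY sec_no, amount
ORDER BY sec_no
"""

days_at_price = conn.execute(query).fetchall()
for row in days_at_price:
    print(row)
```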
Here's one approach, first looking up the latest price, and then the last price that was different:
select secs.sec_no
, latest.amount as price
, case when previous.price_date is null then 1
else datediff(day, previous.price_date, latest.price_date)
end as days_at_price
from (
select distinct sec_no
from #sec_temp
) secs
cross apply
(
select top 1 amount
, price_date
from #sec_temp
where sec_no = secs.sec_no
order by
price_date desc
) latest
outer apply
(
select top 1 price_date
from #sec_temp
where sec_no = secs.sec_no
and amount <> latest.amount
order by
price_date desc
) previous
This prints:
sec_no | price | days_at_price
:----- | :---- | :------------
123ABC | 25,00 | 5
456DEF | 22,00 | 1

Output two columns for 1 field for different date ranges?

I have a SQL table "ITM_SLS" with the following fields:
ITEM
DESCRIPTION
TRANSACTION #
DATE
QTY SOLD
I want to be able to output QTY SOLD for a one month value and a year to date value so that the output would look like this:
ITEM, DESCRIPTION, QTY SOLD MONTH, QTY SOLD YEAR TO DATE
Is this possible?
You could calculate the total quantity sold using group by in a subquery. For example
select a.Item, a.Description, b.MonthQty, c.YearQty
from (
select distinct Item, Description from TheTable
) a
left join (
select Item, sum(Qty) as MonthQty
from TheTable
where datediff(m,Date,getdate()) <= 1 -- current and previous calendar month; use = 0 for the current month only
group by Item
) b on a.Item = b.Item
left join (
select Item, sum(Qty) as YearQty
from TheTable
where datediff(yy,Date,getdate()) = 0 -- same calendar year (in SQL Server "y" means day of year, "yy" means year)
group by Item
) c on a.Item = c.Item
The way to limit the subquery to a particular date range differs per DBMS; this example uses the SQL Server datediff function.
Assuming the "one month" is last month...
select item
, description
, sum (case when trunc(transaction_date, 'MM')
= trunc(add_months(sysdate, -1), 'MM')
then qty_sold
else 0
end) as sold_month
, sum(qty_sold) as sold_ytd
from itm_sls
where transaction_date >= trunc(sysdate, 'yyyy')
group by item, description
/
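
The conditional-aggregation idea can be sketched in a form that runs anywhere using Python's sqlite3 module. Everything about the data here is an assumption: the column names transaction_no and transaction_date, the sample rows, and the fixed reference date that stands in for the current date so the result is repeatable. Unlike the query above, "month" here means the calendar month of the reference date, not the previous month.

```python
import sqlite3

today = "2024-03-20"  # hypothetical reference date, used instead of the system date

# Hypothetical sample rows: (item, description, transaction_no, transaction_date, qty_sold)
rows = [
    ("A1", "Widget", 1, "2024-01-10", 5),
    ("A1", "Widget", 2, "2024-03-02", 2),
    ("A1", "Widget", 3, "2024-03-15", 4),
    ("A1", "Widget", 4, "2023-12-30", 7),  # previous year, excluded from year to date
    ("B2", "Gadget", 5, "2024-02-11", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE itm_sls (item TEXT, description TEXT, transaction_no INTEGER, "
    "transaction_date TEXT, qty_sold INTEGER)"
)
conn.executemany("INSERT INTO itm_sls VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT item,
       description,
       SUM(CASE WHEN strftime('%Y-%m', transaction_date) = strftime('%Y-%m', :today)
                THEN qty_sold ELSE 0 END) AS qty_sold_month,
       SUM(qty_sold)                      AS qty_sold_ytd
FROM itm_sls
WHERE transaction_date >= strftime('%Y-01-01', :today)
  AND transaction_date <= :today
GROUP BY item, description
ORDER BY item
"""

totals = conn.execute(query, {"today": today}).fetchall()
for row in totals:
    print(row)
```

The WHERE clause restricts the scan to the current year, so the plain SUM is the year-to-date figure and the CASE expression carves the month out of the same rows.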
This will give you an idea of what you can do:
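-- Identifiers that contain spaces must be quoted; the bracket quoting and YEAR() used here assume SQL Server.
select
ITEM,
DESCRIPTION,
[QTY SOLD] as QTY_SOLD_MONTH,
( select sum([QTY SOLD])
from ITM_SLS
where ITEM = I.ITEM
AND year([DATE]) = year(I.[DATE])
) as QTY_SOLD_YEAR_TO_DATE
from ITM_SLS I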