How can you use SQL to return values for a specified date or the closest date before it?

I've written an SQL statement to return a list of prices based on a date parameter, but I am finding that on some dates the price is missing. I am looking for a way to modify this statement to return the price on the specified date or, if that is not available, the most recent price available before the specified date.
Select date, grp, id, price
From
price_table
Where
date In ('12/31/2009', '11/30/2009') And
grp In ('Group1')
For example, I would like to rewrite the statement above so that it returns all of the records below, showing the appropriate date for each record. Assume this is a subset of a table with daily prices, and that the values below are the last prices for the months noted.
date grp id price
12/31/2009 Group1 1111 100
12/31/2009 Group1 2222 99
12/29/2009 Group1 3333 98
11/30/2009 Group1 1111 100
11/28/2009 Group1 2222 99
11/30/2009 Group1 3333 98
UPDATE:
Thanks to some help from srgerg below, I have been able to create a statement that works for one date at a time, but I would still like to find a way to pass multiple dates to the query.
Select p1.date, p1.grp, p1.id, p1.price
From
price_table As p1 Inner Join
(Select Max(p2.date) As maxdt, id
From
price_table As p2
Where
p2.date <= '12/31/2009'
Group By
p2.id) As p On p.maxdt = p1.date And p1.id = p.id
Where grp in ('Group1')
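One way to pass several dates at once is to turn the search dates into a small row set and join it to the table, letting a correlated MAX(date) pick each id's latest row on or before each search date. The sketch below runs that idea in SQLite through Python's standard sqlite3 module so it can be tried locally; it is not the SQL Server query above. It assumes ISO-formatted date strings (so text comparison orders them correctly), one row per (grp, id, date), and a hypothetical helper name, prices_as_of.

```python
import sqlite3

# Invented sample rows modelled on the question (ISO dates so text comparison works).
ROWS = [
    ("2009-11-27", "Group1", 2222, 97),
    ("2009-11-28", "Group1", 2222, 99),
    ("2009-11-30", "Group1", 1111, 100),
    ("2009-11-30", "Group1", 3333, 98),
    ("2009-12-29", "Group1", 3333, 98),
    ("2009-12-31", "Group1", 1111, 100),
    ("2009-12-31", "Group1", 2222, 99),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE price_table (date TEXT, grp TEXT, id INTEGER, price INTEGER)")
    conn.executemany("INSERT INTO price_table VALUES (?, ?, ?, ?)", ROWS)
    return conn


def prices_as_of(conn, search_dates, grp):
    """Hypothetical helper: per (search date, id), the latest price on or before that date."""
    # One "(?)" per search date, so the list can be any (non-empty) length.
    values = ", ".join("(?)" for _ in search_dates)
    sql = f"""
        WITH search_dates(search_date) AS (VALUES {values})
        SELECT d.search_date, p.date, p.grp, p.id, p.price
        FROM search_dates d
        JOIN price_table p
          ON p.date = (SELECT MAX(p2.date)
                       FROM price_table p2
                       WHERE p2.grp = p.grp
                         AND p2.id = p.id
                         AND p2.date <= d.search_date)
        WHERE p.grp = ?
        ORDER BY d.search_date DESC, p.id
    """
    return conn.execute(sql, [*search_dates, grp]).fetchall()


for row in prices_as_of(make_db(), ["2009-12-31", "2009-11-30"], "Group1"):
    print(row)
```

An id with no price on or before a search date simply produces no row for that date.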

You could try something like this:
SELECT date, grp, id
, (SELECT TOP 1 price
FROM price_table P2
WHERE P1.id = P2.id
AND P1.grp = P2.grp
AND P2.date <= P1.date
ORDER BY P2.date DESC) AS price
FROM price_table P1
WHERE P1.date IN ('12/31/2009', '11/30/2009')
AND P1.grp IN ('Group1')
Edit: To get the last record for each month, group and id, you could try this:
SELECT date, grp, id, price
FROM price_table P1
WHERE P1.date = (SELECT MAX(date)
FROM price_table P2
WHERE P1.grp = P2.grp
AND P1.id = P2.id
AND YEAR(P1.date) = YEAR(P2.date)
AND MONTH(P1.date) = MONTH(P2.date))
AND P1.grp In ('Group1')
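A small SQLite sketch of the same "latest row per month, group and id" query, run from Python's sqlite3 module on invented sample data. It assumes ISO date strings and swaps the YEAR()/MONTH() pair for SQLite's strftime('%Y-%m', ...), which compares year and month in one step.

```python
import sqlite3

# Invented sample rows modelled on the question (ISO dates so text comparison works).
ROWS = [
    ("2009-11-27", "Group1", 2222, 97),
    ("2009-11-28", "Group1", 2222, 99),
    ("2009-11-30", "Group1", 1111, 100),
    ("2009-11-30", "Group1", 3333, 98),
    ("2009-12-29", "Group1", 3333, 98),
    ("2009-12-31", "Group1", 1111, 100),
    ("2009-12-31", "Group1", 2222, 99),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price_table (date TEXT, grp TEXT, id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO price_table VALUES (?, ?, ?, ?)", ROWS)

# strftime('%Y-%m', ...) plays the role of YEAR()/MONTH() in the SQL Server version.
LAST_IN_MONTH = """
    SELECT date, grp, id, price
    FROM price_table p1
    WHERE p1.date = (SELECT MAX(p2.date)
                     FROM price_table p2
                     WHERE p2.grp = p1.grp
                       AND p2.id = p1.id
                       AND strftime('%Y-%m', p2.date) = strftime('%Y-%m', p1.date))
      AND p1.grp IN ('Group1')
    ORDER BY strftime('%Y-%m', p1.date) DESC, p1.id
"""

last_rows = conn.execute(LAST_IN_MONTH).fetchall()
for row in last_rows:
    print(row)
```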

Here's my approach to solving this problem:
1. Associate every date in the table with a search date (the earliest search date on or after it).
2. Use the search dates as an additional grouping term.
3. Rank the dates in descending order, partitioning them by the grouping terms.
4. Select the rows with the desired grouping-term values and rank = 1.
The script:
WITH price_grouped AS (
SELECT
date, grp, id, price,
dategrp = CASE
WHEN date <= '11/30/2009' THEN '11/30/2009'
WHEN date <= '12/31/2009' THEN '12/31/2009'
/* extend the list of dates here */
END
FROM price_table
),
price_ranked AS (
SELECT
date, grp, id, price, dategrp,
rank = RANK() OVER (PARTITION BY grp, id, dategrp ORDER BY date DESC)
FROM price_grouped
)
SELECT date, grp, id, price
FROM price_ranked
WHERE grp IN ('Group1')
AND rank = 1
AND dategrp IS NOT NULL
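The ranking approach can be tried in SQLite (3.25 or later, for window functions) from Python's sqlite3 module. This sketch uses invented data with ISO dates, ranks within grp, id and dategrp so that every id gets its own latest row per search date, and drops rows dated after the last search date (their dategrp is NULL). The alias rnk is used instead of rank purely for readability.

```python
import sqlite3

# Invented sample rows, plus one row dated after the last search date.
ROWS = [
    ("2009-11-27", "Group1", 2222, 97),
    ("2009-11-28", "Group1", 2222, 99),
    ("2009-11-30", "Group1", 1111, 100),
    ("2009-11-30", "Group1", 3333, 98),
    ("2009-12-29", "Group1", 3333, 98),
    ("2009-12-31", "Group1", 1111, 100),
    ("2009-12-31", "Group1", 2222, 99),
    ("2010-01-05", "Group1", 1111, 101),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price_table (date TEXT, grp TEXT, id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO price_table VALUES (?, ?, ?, ?)", ROWS)

RANKED = """
    WITH price_grouped AS (
        SELECT date, grp, id, price,
               CASE
                   WHEN date <= '2009-11-30' THEN '2009-11-30'
                   WHEN date <= '2009-12-31' THEN '2009-12-31'
               END AS dategrp          -- NULL for dates after the last search date
        FROM price_table
    ),
    price_ranked AS (
        SELECT date, grp, id, price, dategrp,
               RANK() OVER (PARTITION BY grp, id, dategrp ORDER BY date DESC) AS rnk
        FROM price_grouped
    )
    SELECT dategrp, date, grp, id, price
    FROM price_ranked
    WHERE grp IN ('Group1') AND rnk = 1 AND dategrp IS NOT NULL
    ORDER BY dategrp DESC, id
"""

ranked_rows = conn.execute(RANKED).fetchall()
for row in ranked_rows:
    print(row)
```

Note that each bucket only looks back as far as the previous search date, so an id with no price at all inside a bucket gets no row for that search date.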
The above solution may seem inconvenient because each search date has to be written twice. An alternative is to define the list of search dates as a separate CTE and assign the dates to groups in a different way:
WITH search_dates (Date) AS (
SELECT '11/30/2009' UNION ALL
SELECT '12/31/2009'
/* extend the list of dates here */
),
price_grouped AS (
SELECT
p.date, p.grp, p.id, p.price,
dategrp = MIN(d.Date)
FROM price_table p
INNER JOIN search_dates d ON p.date <= d.Date
GROUP BY
p.date, p.grp, p.id, p.price
),
price_ranked AS (
SELECT
date, grp, id, price, dategrp,
rank = RANK() OVER (PARTITION BY grp, id, dategrp ORDER BY date DESC)
FROM price_grouped
)
SELECT date, grp, id, price
FROM price_ranked
WHERE grp IN ('Group1')
AND rank = 1
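A SQLite version of the search-date CTE variant, run from Python's sqlite3 module on invented data (SQLite 3.25+ assumed for RANK()). Each row is assigned the smallest search date on or after it via the join plus MIN(), and rows dated after the last search date drop out of the inner join on their own; ranking is done per grp, id and dategrp.

```python
import sqlite3

# Invented sample rows; the 2010-01-05 row has no search date on or after it.
ROWS = [
    ("2009-11-27", "Group1", 2222, 97),
    ("2009-11-28", "Group1", 2222, 99),
    ("2009-11-30", "Group1", 1111, 100),
    ("2009-11-30", "Group1", 3333, 98),
    ("2009-12-29", "Group1", 3333, 98),
    ("2009-12-31", "Group1", 1111, 100),
    ("2009-12-31", "Group1", 2222, 99),
    ("2010-01-05", "Group1", 1111, 101),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price_table (date TEXT, grp TEXT, id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO price_table VALUES (?, ?, ?, ?)", ROWS)

SEARCH_DATE_JOIN = """
    WITH search_dates(search_date) AS (
        SELECT '2009-11-30' UNION ALL
        SELECT '2009-12-31'
    ),
    price_grouped AS (          -- earliest search date on or after each row's date
        SELECT p.date AS date, p.grp AS grp, p.id AS id, p.price AS price,
               MIN(d.search_date) AS dategrp
        FROM price_table p
        JOIN search_dates d ON p.date <= d.search_date
        GROUP BY p.date, p.grp, p.id, p.price
    ),
    price_ranked AS (
        SELECT date, grp, id, price, dategrp,
               RANK() OVER (PARTITION BY grp, id, dategrp ORDER BY date DESC) AS rnk
        FROM price_grouped
    )
    SELECT dategrp, date, grp, id, price
    FROM price_ranked
    WHERE grp IN ('Group1') AND rnk = 1
    ORDER BY dategrp DESC, id
"""

joined_rows = conn.execute(SEARCH_DATE_JOIN).fetchall()
for row in joined_rows:
    print(row)
```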
Bear in mind, though, that the first solution will most likely perform better.

Related

Oracle SQL: get transactions within a period

I have 3 tables in Oracle SQL, namely investor, share and transaction.
I am trying to get the new investors who invested in any share during a certain period. Since they are new investors, there should be no transaction in the transaction table for that investor against that share before the search period.
For the transaction table with the following records:
Id TranDt InvCode ShareCode
1 2020-01-01 00:00:00.000 inv1 S1
2 2019-04-01 00:00:00.000 inv1 S1
3 2020-04-01 00:00:00.000 inv1 S1
4 2021-03-06 11:50:20.560 inv2 S2
5 2020-04-01 00:00:00.000 inv3 S1
For the search period between 2020-01-01 and 2020-05-01, I should get the following output:
5 2020-04-01 00:00:00.000 inv3 S1
Although there are transactions for inv1 in the table within that period, there is also one before the search period, so inv1 should not be included: it is not a new investor within the search period.
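The rule can be checked against the five example transactions with a NOT EXISTS filter. This is a sketch in SQLite via Python's sqlite3 module, not Oracle; it uses a hypothetical table name, transactions, leaves out the INVESTOR/SHARE filters, and treats the period as half-open (start inclusive, end exclusive).

```python
import sqlite3

# The five transactions from the example above.
TRANSACTIONS = [
    (1, "2020-01-01 00:00:00.000", "inv1", "S1"),
    (2, "2019-04-01 00:00:00.000", "inv1", "S1"),
    (3, "2020-04-01 00:00:00.000", "inv1", "S1"),
    (4, "2021-03-06 11:50:20.560", "inv2", "S2"),
    (5, "2020-04-01 00:00:00.000", "inv3", "S1"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, trandt TEXT, invcode TEXT, sharecode TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", TRANSACTIONS)

# Keep in-period rows only when the same investor/share pair has no earlier transaction.
NEW_IN_PERIOD = """
    SELECT t.id, t.trandt, t.invcode, t.sharecode
    FROM transactions t
    WHERE t.trandt >= :period_start AND t.trandt < :period_end
      AND NOT EXISTS (SELECT 1
                      FROM transactions p
                      WHERE p.invcode = t.invcode
                        AND p.sharecode = t.sharecode
                        AND p.trandt < :period_start)
    ORDER BY t.id
"""

new_rows = conn.execute(
    NEW_IN_PERIOD, {"period_start": "2020-01-01", "period_end": "2020-05-01"}
).fetchall()
print(new_rows)
```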
The query below works, but it takes a very long time to return results when called from C# code, which leads to timeouts. Is there anything we can do to make it faster?
WITH
INVESTORS AS
(
SELECT I.INVCODE FROM INVESTOR I WHERE I.CLOSED IS NULL
),
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
),
SHARES_IN_PERIOD AS
(
SELECT DISTINCT
T.INVCODE,
T.SHARECODE,
T.TYPE
FROM TRANSACTION T
JOIN INVESTORS I ON T.INVCODE = I.INVCODE
JOIN SHARES S ON T.SHARECODE = S.SHARECODE
WHERE T.TRANDT >= :startDate AND T.TRANDT <= :endDate
),
PREVIOUS_SHARES AS
(
SELECT DISTINCT
T.INVCODE,
T.SHARECODE,
T.TYPE
FROM TRANSACTION T
JOIN INVESTORS I ON T.INVCODE = I.INVCODE
JOIN SHARES S ON T.SHARECODE = S.SHARECODE
WHERE T.TRANDT < :startDate
)
SELECT
DISTINCT
SP.INVCODE AS InvestorCode,
SP.SHARECODE AS ShareCode,
SP.TYPE AS ShareType
FROM SHARES_IN_PERIOD SP
WHERE (SP.INVCODE, SP.SHARECODE, SP.TYPE) NOT IN
(
SELECT
PS.INVCODE,
PS.SHARECODE,
PS.TYPE
FROM PREVIOUS_SHARES PS
)
Following the suggestion from @Gordon Linoff, I tried the options below (for all the shares I need), but they also take a long time. The transaction table has over 32 million rows.
1.
WITH
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
)
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
join shares s on s.sharecode = t.sharecode
where seqnum = 1 and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
2.
WITH
INVESTORS AS
(
SELECT I.INVCODE FROM INVESTOR I WHERE I.CLOSED IS NULL
),
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
)
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
join investors i on i.invcode = t.invcode
join shares s on s.sharecode = t.sharecode
where seqnum = 1 and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
3.
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
where seqnum = 1 and
t.sharecode IN (SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL)
and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
If you want to know whether the first transaction for an investor and share falls within a period, you can use window functions:
select t.*
from (select t.*,
row_number() over (partition by invcode, sharecode order by trandt) as seqnum
from transactions t
) t
where seqnum = 1 and
t.sharecode = :sharecode and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
For performance, you want an index on transactions(invcode, sharecode, trandt).
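A sketch of the ROW_NUMBER() approach on the five example transactions, run in SQLite (3.25+ for window functions) from Python's sqlite3 module rather than Oracle. The table name transactions follows the answer; the index name is hypothetical. The first transaction per investor/share pair is kept only if it lies inside the half-open period.

```python
import sqlite3

# The five transactions from the question's example.
TRANSACTIONS = [
    (1, "2020-01-01 00:00:00.000", "inv1", "S1"),
    (2, "2019-04-01 00:00:00.000", "inv1", "S1"),
    (3, "2020-04-01 00:00:00.000", "inv1", "S1"),
    (4, "2021-03-06 11:50:20.560", "inv2", "S2"),
    (5, "2020-04-01 00:00:00.000", "inv3", "S1"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, trandt TEXT, invcode TEXT, sharecode TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", TRANSACTIONS)
# The composite index suggested above (hypothetical index name).
conn.execute("CREATE INDEX ix_transactions_inv_share_dt ON transactions (invcode, sharecode, trandt)")

FIRST_IN_PERIOD = """
    SELECT id, trandt, invcode, sharecode
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY invcode, sharecode ORDER BY trandt) AS seqnum
          FROM transactions t)
    WHERE seqnum = 1
      AND trandt >= :period_start
      AND trandt < :period_end
    ORDER BY id
"""

first_rows = conn.execute(
    FIRST_IN_PERIOD, {"period_start": "2020-01-01", "period_end": "2020-05-01"}
).fetchall()
print(first_rows)
```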

SQLite Getting multiple results with LIMIT 1

I have the following problem.
Part of a task is to determine the visitor(s) who spent the most money between 2000 and 2020.
My query currently looks like this:
SELECT UserEMail FROM Visitor
JOIN Ticket ON Visitor.UserEMail = Ticket.VisitorUserEMail
where Ticket.Date> date('2000-01-01') AND Ticket.Date < date ('2020-12-31')
Group by Ticket.VisitorUserEMail
order by SUM(Price) DESC;
Is it possible to output more than one person if both have spent the same amount?
Use rank():
SELECT VisitorUserEMail
FROM (SELECT VisitorUserEMail, SUM(PRICE) as sum_price,
RANK() OVER (ORDER BY SUM(Price) DESC) as seqnum
FROM Ticket t
WHERE t.Date >= date('2000-01-01') AND t.Date < date('2021-01-01')
GROUP BY t.VisitorUserEMail
) t
WHERE seqnum = 1;
Note: You don't need the JOIN, assuming that ticket buyers are actually visitors. If that assumption is not true, then use the JOIN.
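A runnable check of the RANK() idea in SQLite (3.25+) via Python's sqlite3 module, on invented tickets where two visitors tie. As a conservative choice, this sketch computes the sums in an inner query and ranks them in an outer one, rather than putting SUM() inside the OVER clause; every visitor sharing rank 1 is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ticket (VisitorUserEMail TEXT, Date TEXT, Price INTEGER)")
conn.executemany(
    "INSERT INTO Ticket VALUES (?, ?, ?)",
    [
        ("a@example.com", "2005-06-01", 60),
        ("a@example.com", "2010-02-03", 40),
        ("b@example.com", "2019-11-11", 100),
        ("c@example.com", "2015-07-07", 50),
        ("c@example.com", "2022-01-01", 500),  # outside the range, so ignored
    ],
)

# Sum per visitor first, then rank the sums; ties share rank 1.
TOP_SPENDERS = """
    SELECT VisitorUserEMail
    FROM (SELECT VisitorUserEMail,
                 RANK() OVER (ORDER BY sum_price DESC) AS seqnum
          FROM (SELECT VisitorUserEMail, SUM(Price) AS sum_price
                FROM Ticket t
                WHERE t.Date >= date('2000-01-01') AND t.Date < date('2021-01-01')
                GROUP BY t.VisitorUserEMail))
    WHERE seqnum = 1
    ORDER BY VisitorUserEMail
"""

top = [row[0] for row in conn.execute(TOP_SPENDERS)]
print(top)
```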
Use a CTE that returns the total price for each email, and with NOT EXISTS select the rows with the top total price:
WITH cte AS (
SELECT VisitorUserEMail, SUM(Price) SumPrice
FROM Ticket
WHERE Date >= '2000-01-01' AND Date <= '2020-12-31'
GROUP BY VisitorUserEMail
)
SELECT c.VisitorUserEMail
FROM cte c
WHERE NOT EXISTS (
SELECT 1 FROM cte
WHERE SumPrice > c.SumPrice
)
or:
WITH cte AS (
SELECT VisitorUserEMail, SUM(Price) SumPrice
FROM Ticket
WHERE Date >= '2000-01-01' AND Date <= '2020-12-31'
GROUP BY VisitorUserEMail
)
SELECT VisitorUserEMail
FROM cte
WHERE SumPrice = (SELECT MAX(SumPrice) FROM cte)
Note that you don't need the function date() because the result of date('2000-01-01') is '2000-01-01'.
Also I think that the conditions in the WHERE clause should include the =, right?
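Both CTE variants in this answer can be run side by side in SQLite from Python's sqlite3 module; on invented data with a tie at the top they should return the same visitors. Neither variant needs window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ticket (VisitorUserEMail TEXT, Date TEXT, Price INTEGER)")
conn.executemany(
    "INSERT INTO Ticket VALUES (?, ?, ?)",
    [
        ("a@example.com", "2005-06-01", 60),
        ("a@example.com", "2010-02-03", 40),
        ("b@example.com", "2019-11-11", 100),
        ("c@example.com", "2015-07-07", 50),
        ("c@example.com", "2022-01-01", 500),  # outside the range, so ignored
    ],
)

CTE = """
    WITH cte AS (
        SELECT VisitorUserEMail, SUM(Price) AS SumPrice
        FROM Ticket
        WHERE Date >= '2000-01-01' AND Date <= '2020-12-31'
        GROUP BY VisitorUserEMail
    )
"""
# Variant 1: no other visitor has a larger total.
NOT_EXISTS = CTE + """
    SELECT c.VisitorUserEMail FROM cte c
    WHERE NOT EXISTS (SELECT 1 FROM cte WHERE SumPrice > c.SumPrice)
    ORDER BY c.VisitorUserEMail
"""
# Variant 2: total equals the maximum total.
EQUALS_MAX = CTE + """
    SELECT VisitorUserEMail FROM cte
    WHERE SumPrice = (SELECT MAX(SumPrice) FROM cte)
    ORDER BY VisitorUserEMail
"""

via_not_exists = [r[0] for r in conn.execute(NOT_EXISTS)]
via_max = [r[0] for r in conn.execute(EQUALS_MAX)]
print(via_not_exists, via_max)
```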

how to filter data in sql based on percentile

I have 2 tables. The first one contains customer information such as id, age and name. The second table contains their id, the product they purchased, and the purchase_date (the dates range from 2016 to 2018).
Table 1
-------
customer_id
customer_age
customer_name
Table2
------
customer_id
product
purchase_date
My desired result is a table containing the customer_name and product for customers who made a purchase in 2017 and are older than 75% of the customers who made a purchase in 2016.
Depending on your flavor of SQL, you can get quartiles using the more general ntile analytical function. This basically adds a new column to your query.
SELECT MIN(customer_age) as min_age FROM (
SELECT customer_id, customer_age, ntile(4) OVER(ORDER BY customer_age) AS q4 FROM table1
WHERE customer_id IN (
SELECT customer_id FROM table2 WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
) q
WHERE q4=4
This returns the lowest age of the 4th-quartile customers, which can be used in a subquery against the customers who made purchases in 2017.
The argument to ntile is the number of buckets you want to divide the rows into. In this case, being older than 75% of customers means being in the 4th quartile, so 4 buckets is enough. The OVER() clause specifies what you want to sort by (customer_age in our case), and also lets us partition (group) the data if we want to, say, create separate rankings for different years or countries.
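A runnable sketch of the NTILE(4) cutoff in SQLite (3.25+) via Python's sqlite3 module. The rows are invented, and 2016 is selected with a date range on ISO date strings. With four 2016 buyers, each lands in its own quartile, so the 4th-quartile minimum is simply the oldest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (customer_id INTEGER, customer_age INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE table2 (customer_id INTEGER, product TEXT, purchase_date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 20, "Ann"), (2, 40, "Bob"), (3, 60, "Cy"), (4, 80, "Dee"), (5, 70, "Eve")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [
        (1, "widget", "2016-02-01"), (2, "widget", "2016-05-01"),
        (3, "gadget", "2016-07-01"), (4, "gizmo", "2016-09-01"),
        (2, "widget", "2017-03-01"), (4, "gadget", "2017-04-01"),
        (5, "gizmo", "2017-06-01"),
    ],
)

# Lowest age in the 4th quartile of customers who bought something in 2016.
Q4_MIN_AGE = """
    SELECT MIN(customer_age) AS min_age
    FROM (SELECT customer_id, customer_age,
                 NTILE(4) OVER (ORDER BY customer_age) AS q4
          FROM table1
          WHERE customer_id IN (SELECT customer_id FROM table2
                                WHERE purchase_date >= '2016-01-01'
                                  AND purchase_date < '2017-01-01')) q
    WHERE q4 = 4
"""

min_age = conn.execute(Q4_MIN_AGE).fetchone()[0]
print(min_age)
```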
Age is a poor column to store in a database, because it changes every day. You should store the date of birth or something similar.
To get the age at the 75th percentile among 2016 purchasers, there are several possibilities. I usually go for row_number() and count(*):
select min(customer_age)
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt;
Then, to use this for a query for 2017:
with a2016 as (
select min(customer_age) as customer_age
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt
)
select c.*, cp.product
from customers c join
customer_products cp
on cp.customer_id = c.customer_id and
cp.purchase_date >= '2017-01-01' and
cp.purchase_date < '2018-01-01' join
a2016 a
on c.customer_age >= a.customer_age;
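The full 2017 query can be tried in SQLite (3.25+) via Python's sqlite3 module. The table names customers and customer_products follow the answer; the rows are invented, and the product column is named product as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_age INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE customer_products (customer_id INTEGER, product TEXT, purchase_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 20, "Ann"), (2, 40, "Bob"), (3, 60, "Cy"), (4, 80, "Dee"), (5, 70, "Eve")],
)
conn.executemany(
    "INSERT INTO customer_products VALUES (?, ?, ?)",
    [
        (1, "widget", "2016-02-01"), (2, "widget", "2016-05-01"),
        (3, "gadget", "2016-07-01"), (4, "gizmo", "2016-09-01"),
        (2, "widget", "2017-03-01"), (4, "gadget", "2017-04-01"),
        (5, "gizmo", "2017-06-01"),
    ],
)

OLDER_2017_BUYERS = """
    WITH a2016 AS (          -- cutoff age: smallest age with seqnum >= 75% of the 2016 buyers
        SELECT MIN(customer_age) AS customer_age
        FROM (SELECT c.*,
                     ROW_NUMBER() OVER (ORDER BY customer_age) AS seqnum,
                     COUNT(*) OVER () AS cnt
              FROM customers c
              WHERE EXISTS (SELECT 1 FROM customer_products cp
                            WHERE cp.customer_id = c.customer_id
                              AND cp.purchase_date >= '2016-01-01'
                              AND cp.purchase_date < '2017-01-01')) c
        WHERE seqnum >= 0.75 * cnt
    )
    SELECT c.customer_name, cp.product
    FROM customers c
    JOIN customer_products cp
      ON cp.customer_id = c.customer_id
     AND cp.purchase_date >= '2017-01-01'
     AND cp.purchase_date < '2018-01-01'
    JOIN a2016 a ON c.customer_age >= a.customer_age
    ORDER BY c.customer_name
"""

older_rows = conn.execute(OLDER_2017_BUYERS).fetchall()
print(older_rows)
```

With these four 2016 buyers the cutoff works out to 60 (seqnum 3 of 4 meets seqnum >= 0.75 * 4), whereas an NTILE(4) cutoff on the same four customers would be 80. On small samples it is worth checking which boundary matches your definition of "older than 75%".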

SQL - values from two rows into new two rows

I have a query that gives the sum of the quantity of items on working days. On weekends and holidays, the quantity and item values are empty.
On those empty days I would like to show the last known quantity and item.
My query looks like this:
select a.dt,b.zaliha as quantity,b.artikal as item
from
(select to_date('01-01-2017', 'DD-MM-YYYY') + rownum -1 dt
from dual
connect by level <= to_date(sysdate) - to_date('01-01-2017', 'DD-MM-YYYY') + 1
order by 1)a
LEFT OUTER JOIN
(select kolicina,sum(kolicina)over(partition by artikal order by datum_do) as zaliha,datum_do,artikal
from
(select sum(vv.kolicinaulaz-vv.kolicinaizlaz)kolicina,vz.datum as datum_do,vv.artikal
from vlpzaglavlja vz, vlpvarijante vv
where vz.id=vv.vlpzaglavlje
and vz.orgjed='01006'
and vv.skladiste='01006'
and vv.artikal in (3069,6402)
group by vz.datum,vv.artikal
order by vv.artikal,vz.datum asc)
order by artikal,datum_do asc)b
on a.dt=b.datum_do
where a.dt between to_date('12102017','ddmmyyyy') and to_date('16102017','ddmmyyyy')
order by a.dt
and my output is like this:
and I want this:
In short, if quantity is null use lag(... ignore nulls) and coalesce or nvl:
select dt, item,
nvl(quantity, lag(quantity ignore nulls) over (partition by item order by dt))
from t
order by dt, item
Here is the full query. I cannot test it, but it is something like this:
with t as (
select a.dt, b.zaliha as quantity, b.artikal as item
from (
select date '2017-10-10' + rownum - 1 dt
from dual
connect by date '2017-10-10' + rownum - 1 <= date '2017-10-16' ) a
left join (
select kolicina, datum_do, artikal,
sum(kolicina) over(partition by artikal order by datum_do) as zaliha
from (
select sum(vv.kolicinaulaz-vv.kolicinaizlaz) kolicina,
vz.datum as datum_do, vv.artikal
from vlpzaglavlja vz
join vlpvarijante vv on vz.id = vv.vlpzaglavlje
where vz.orgjed = '01006' and vv.skladiste='01006'
and vv.artikal in (3069,6402)
group by vz.datum, vv.artikal)) b partition by (b.artikal)
on a.dt = b.datum_do)
select *
from (
select dt, item,
nvl(quantity, lag(quantity ignore nulls)
over (partition by item order by dt)) qty
from t)
where dt >= date '2017-10-12'
order by dt, item
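The same "last known value" idea can be tried in SQLite through Python's sqlite3 module, with two stated swaps. Instead of Oracle's LAG ... IGNORE NULLS, this sketch uses a running COUNT(quantity) to start a new group at each known value and then takes MAX() within each group. Instead of relying on the left join to supply item, a CROSS JOIN of days and items makes sure every (day, item) pair exists, so item is never NULL. The movements table and its rows are hypothetical; the column names follow the question.

```python
import sqlite3

# Hypothetical stock movements per day and item (kolicina = net quantity change).
MOVEMENTS = [
    ("2017-10-10", 3069, 10),
    ("2017-10-11", 3069, -2),
    ("2017-10-13", 3069, 5),
    ("2017-10-10", 6402, 4),
    ("2017-10-16", 6402, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movements (datum TEXT, artikal INTEGER, kolicina INTEGER)")
conn.executemany("INSERT INTO movements VALUES (?, ?, ?)", MOVEMENTS)

FILLED_STOCK = """
    WITH RECURSIVE days(dt) AS (
        SELECT '2017-10-10'
        UNION ALL
        SELECT date(dt, '+1 day') FROM days WHERE dt < '2017-10-16'
    ),
    stock AS (               -- running stock per item, only on days with movements
        SELECT datum, artikal,
               SUM(kolicina) OVER (PARTITION BY artikal ORDER BY datum) AS zaliha
        FROM (SELECT datum, artikal, SUM(kolicina) AS kolicina
              FROM movements GROUP BY datum, artikal)
    ),
    grid AS (                -- every day for every item
        SELECT d.dt, i.artikal AS item, s.zaliha AS quantity
        FROM days d
        CROSS JOIN (SELECT DISTINCT artikal FROM movements) i
        LEFT JOIN stock s ON s.datum = d.dt AND s.artikal = i.artikal
    ),
    grouped AS (             -- running count of non-NULL values: one group per known value
        SELECT dt, item, quantity,
               COUNT(quantity) OVER (PARTITION BY item ORDER BY dt) AS grp
        FROM grid
    ),
    filled AS (
        SELECT dt, item, MAX(quantity) OVER (PARTITION BY item, grp) AS qty
        FROM grouped
    )
    SELECT dt, item, qty
    FROM filled
    WHERE dt >= '2017-10-12'   -- filter only after the gaps have been filled
    ORDER BY dt, item
"""

filled_rows = conn.execute(FILLED_STOCK).fetchall()
for row in filled_rows:
    print(row)
```

As in the answer's query, the date filter is applied in the outermost query, after the window functions have run; filtering earlier would cut off the values that the gaps are filled from.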
There are several issues in your query, major and minor:
In the date generator (subquery a) you select dates from a long period (January 2017 until today), join them with the main tables, sum the data, and only then select a small part. Why not filter the dates first?
to_date(sysdate) is unnecessary, because sysdate is already a date.
Use ANSI joins.
Do not use order by in subqueries; it has no effect, and only the final ordering matters.
Use date literals when defining dates; they are more readable.

Output two columns for 1 field for different date ranges?

I have a SQL table "ITM_SLS" with the following fields:
ITEM
DESCRIPTION
TRANSACTION #
DATE
QTY SOLD
I want to output QTY SOLD both for one month and year to date, so that the output looks like this:
ITEM, DESCRIPTION, QTY SOLD MONTH, QTY SOLD YEAR TO DATE
Is this possible?
You could calculate the total quantities sold using group by in subqueries. For example:
select a.Item, a.Description, b.MonthQty, c.YearQty
from (
select distinct Item, Description from TheTable
) a
left join (
select Item, sum(Qty) as MonthQty
from TheTable
where datediff(m,Date,getdate()) <= 1
group by Item
) b on a.Item = b.Item
left join (
select Item, sum(Qty) as YearQty
from TheTable
where datediff(year, Date, getdate()) = 0
group by Item
) c on a.Item = c.Item
The way to limit a subquery to a particular date range differs per DBMS; this example uses the SQL Server datediff function.
Assuming the "one month" is last month...
select item
, description
, sum (case when trunc(transaction_date, 'MM')
= trunc(add_months(sysdate, -1), 'MM')
then qty_sold
else 0
end) as sold_month
, sum(qty_sold) as sold_ytd
from itm_sls
where transaction_date >= trunc(sysdate, 'yyyy')
group by item, description
/
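The conditional-aggregation idea (one pass over the year, with CASE picking out last month) can be tried in SQLite from Python's sqlite3 module. SQLite has no sysdate, trunc or add_months, so this sketch swaps in strftime() with the 'start of month' and '-1 month' modifiers and a fixed :as_of parameter (a hypothetical stand-in for today's date). It also adds an upper bound of :as_of; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE itm_sls (item TEXT, description TEXT, transaction_date TEXT, qty_sold INTEGER)"
)
conn.executemany(
    "INSERT INTO itm_sls VALUES (?, ?, ?, ?)",
    [
        ("A", "Alpha", "2024-01-15", 5),
        ("A", "Alpha", "2024-02-10", 3),
        ("A", "Alpha", "2024-03-05", 2),
        ("B", "Beta", "2023-12-30", 4),  # previous year: excluded from both totals
        ("B", "Beta", "2024-02-20", 7),
    ],
)

# Last calendar month relative to :as_of, and year to date up to :as_of.
SALES = """
    SELECT item, description,
           SUM(CASE WHEN strftime('%Y-%m', transaction_date)
                         = strftime('%Y-%m', :as_of, 'start of month', '-1 month')
                    THEN qty_sold ELSE 0 END) AS sold_month,
           SUM(qty_sold) AS sold_ytd
    FROM itm_sls
    WHERE transaction_date >= strftime('%Y-01-01', :as_of)
      AND transaction_date <= :as_of
    GROUP BY item, description
    ORDER BY item
"""

totals = conn.execute(SALES, {"as_of": "2024-03-10"}).fetchall()
print(totals)
```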
This will give you an idea of what you can do:
select
ITEM,
DESCRIPTION,
[QTY SOLD] as [MONTH],
( select sum([QTY SOLD])
from ITM_SLS
where ITEM = I.ITEM
AND YEAR([DATE]) = YEAR(I.[DATE])
) as [YEAR TO DATE]
from ITM_SLS I