I am running SQL Server 2016 and have the following problem which seems quite basic but I cannot figure it out. I have a table Prices, which holds prices of different securities, with columns
idTag varchar(12) NOT NULL
ts datetime2 NOT NULL
price float NOT NULL
I also have another table Data with columns idTag and ts, where tags match exactly, but timestamps don't. I would like to find the corresponding prices for each row of the Data table (equivalent to constant interpolation in time).
For example, sample values in Prices may be
idTag | ts | price
=================================
IBM | 2020-01-01 13:00 | 100.23
IBM | 2020-01-01 13:05 | 100.34
IBM | 2020-01-01 13:10 | 100.45
IBM | 2020-01-01 13:15 | 100.29
IBM | 2020-01-01 13:20 | 100.31
and the sample values of the Data table may be
idTag | ts
========================
IBM | 2020-01-01 13:01
IBM | 2020-01-01 13:03
IBM | 2020-01-01 13:17
IBM | 2020-01-01 13:18
IBM | 2020-01-01 13:20
The expected output would be
idTag | ts | price
=================================
IBM | 2020-01-01 13:01 | 100.23
IBM | 2020-01-01 13:03 | 100.23
IBM | 2020-01-01 13:17 | 100.29
IBM | 2020-01-01 13:18 | 100.29
IBM | 2020-01-01 13:20 | 100.31
If the timestamps in both tables matched, I could write an INNER JOIN, but here they don't. I could also do this in code, e.g. Python or Java, but Prices has more than 150 million rows and I would rather not read all of that in.
Is there a way to do this in SQL?
Thank you very much
You can get the latest price for a date in a subquery.
select
    idtag, ts,
    (
        select top(1) price
        from prices p
        where p.idtag = d.idtag
          and p.ts <= d.ts
        order by p.ts desc
    ) as price
from data d
order by idtag, ts;
(You could also move this subquery into the FROM clause and use CROSS APPLY, as sketched below.)
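For reference, a minimal sketch of that CROSS APPLY variant, using the same tables and columns as above (note that CROSS APPLY drops Data rows that have no earlier price; OUTER APPLY would keep them with a NULL price):
select d.idtag, d.ts, lp.price
from data d
cross apply (
    select top(1) p.price
    from prices p
    where p.idtag = d.idtag
      and p.ts <= d.ts
    order by p.ts desc
) lp
order by d.idtag, d.ts;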
Recommended index:
create index idx on prices(idtag, ts, price);
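With idTag and ts as the leading key columns (and price carried along in the index), each correlated lookup can be satisfied by a single index seek per Data row instead of touching the 150-million-row Prices table.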
Sure, use an analytic function to copy the next value of ts into the current row, then use a range predicate:
select *
from
    (select *, lead(ts) over(partition by idtag order by ts) as nextts from prices) p
    inner join data d
        on  d.idtag = p.idtag
        and d.ts >= p.ts
        and (d.ts < p.nextts or p.nextts is null)  -- nextts is null for the latest price per tag
where
    d.idtag = 'IBM'
It might take a while on hundreds of millions of rows, though.
Given a simple data model that consists of a user table and a check_in table with a date field, I want to calculate the retention date of my users. So for example, for all users with one or more check ins, I want the percentage of users who did a check in on their 2nd day, on their 3rd day and so on.
My SQL skills are pretty basic as it's not a tool that I use that often in my day-to-day work, and I know that this is beyond the types of queries I am used to. I've been looking into pivot tables to achieve this but I am unsure if this is the correct path.
Edit:
The user table does not have a registration date. One can assume it only contains the ID for this example.
Here is some sample data for the check_in table:
| user_id | date |
=====================================
| 1 | 2020-09-02 13:00:00 |
-------------------------------------
| 4 | 2020-09-04 12:00:00 |
-------------------------------------
| 1 | 2020-09-04 13:00:00 |
-------------------------------------
| 4 | 2020-09-04 11:00:00 |
-------------------------------------
| ... |
-------------------------------------
And the expected output of the query would be something like this:
| day_0 | day_1 | day_2 | day_3 |
=================================
| 70% | 67 % | 44% | 32% |
---------------------------------
Please note that I've used random numbers for this output just to illustrate the format.
Oh, I see. Assuming you mean days between checkins for users -- and users might have none -- then just use aggregation and window functions:
select sum( (ci.date = ci.min_date)::int )::numeric / u.num_users as day_0,
       sum( (ci.date = ci.min_date + interval '1 day')::int )::numeric / u.num_users as day_1,
       sum( (ci.date = ci.min_date + interval '2 day')::int )::numeric / u.num_users as day_2
from (select u.*, count(*) over () as num_users
      from users u
     ) u left join
     (select ci.user_id, ci.date::date as date,
             min(min(ci.date::date)) over (partition by ci.user_id) as min_date
      from checkins ci
      group by ci.user_id, ci.date::date
     ) ci
     on ci.user_id = u.id  -- assumes the users table exposes its primary key as "id"
group by u.num_users;
Note that this aggregates the checkins table by user id and date. This ensures that there is only one row per date.
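If you need day_3, day_4 and so on, each further column is just another term of the same shape, e.g. sum( (ci.date = ci.min_date + interval '3 day')::int )::numeric / u.num_users as day_3.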
I'm pretty new with SQL, and I'm struggling to figure out a seemingly simple task.
Here's the situation:
I'm working with two data sets
Data Set A, which is the most accurate but only refreshes every quarter
Data Set B, which has all the data, including the most recent data, but is overall less accurate
My goal is to combine both data sets where I would have Data Set A for all data up to the most recent quarter and Data Set B for anything after (i.e., all recent data not captured in Data Set A)
For example:
Data Set A captures anything from Q1 2020 (January to March)
Let's say we are April 15th
Data Set B captures anything from Q1 2020 to the most current date, April 15th
My goal is to use Data Set A for all data from January to March 2020 (Q1) and then Data Set B for all data from April 1 to 15
Any thoughts or advice on how to do this? Perhaps a join combined with some date logic?
Any help would be much appreciated.
Thanks in advance for the help.
I hope I got your question right.
I put in some sample data that might match your description: a date and an amount. To keep it simple, there is one row per month. You can extract the quarter from a date, keep it as an additional column, and then filter by that down the line.
WITH
-- some sample data: date and amount ...
indata(dt, amount) AS (
              SELECT DATE '2020-01-15', 234.45
    UNION ALL SELECT DATE '2020-02-15', 344.45
    UNION ALL SELECT DATE '2020-03-15', 345.45
    UNION ALL SELECT DATE '2020-04-15', 346.45
    UNION ALL SELECT DATE '2020-05-15', 347.45
    UNION ALL SELECT DATE '2020-06-15', 348.45
    UNION ALL SELECT DATE '2020-07-15', 349.45
    UNION ALL SELECT DATE '2020-08-15', 350.45
    UNION ALL SELECT DATE '2020-09-15', 351.45
    UNION ALL SELECT DATE '2020-10-15', 352.45
    UNION ALL SELECT DATE '2020-11-15', 353.45
    UNION ALL SELECT DATE '2020-12-15', 354.45
)
-- real query starts here ...
SELECT
  EXTRACT(QUARTER FROM dt) AS the_quarter
, CAST(
    TIMESTAMPADD(
      QUARTER
    , CAST(EXTRACT(QUARTER FROM dt) AS INTEGER) - 1
    , TRUNC(dt, 'YEAR')
    )
    AS DATE
  ) AS qtr_start
, *
FROM indata;
-- out the_quarter | qtr_start | dt | amount
-- out -------------+------------+------------+--------
-- out 1 | 2020-01-01 | 2020-01-15 | 234.45
-- out 1 | 2020-01-01 | 2020-02-15 | 344.45
-- out 1 | 2020-01-01 | 2020-03-15 | 345.45
-- out 2 | 2020-04-01 | 2020-04-15 | 346.45
-- out 2 | 2020-04-01 | 2020-05-15 | 347.45
-- out 2 | 2020-04-01 | 2020-06-15 | 348.45
-- out 3 | 2020-07-01 | 2020-07-15 | 349.45
-- out 3 | 2020-07-01 | 2020-08-15 | 350.45
-- out 3 | 2020-07-01 | 2020-09-15 | 351.45
-- out 4 | 2020-10-01 | 2020-10-15 | 352.45
-- out 4 | 2020-10-01 | 2020-11-15 | 353.45
-- out 4 | 2020-10-01 | 2020-12-15 | 354.45
If you filter by quarter, you can group your data by that column ...
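To tie this back to the original goal (Data Set A for everything up to the end of the last complete quarter, Data Set B for everything after), here is a minimal sketch. It assumes both sets live in tables named dataset_a and dataset_b with identical columns, a date column dt, and a dialect that supports DATE_TRUNC; those names are placeholders, not taken from your setup.
SELECT *
FROM dataset_a
WHERE dt < DATE_TRUNC('quarter', CURRENT_DATE)    -- complete quarters come from A
UNION ALL
SELECT *
FROM dataset_b
WHERE dt >= DATE_TRUNC('quarter', CURRENT_DATE);  -- the current, partial quarter comes from B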
I have a SOME_DELTA table which records all party-related transactions with their amount changes
Ex.:
PARTY_ID | SOME_DATE | AMOUNT
--------------------------------
party_id_1 | 2019-01-01 | 100
party_id_1 | 2019-01-15 | 30
party_id_1 | 2019-01-15 | -60
party_id_1 | 2019-01-21 | 80
party_id_2 | 2019-01-02 | 50
party_id_2 | 2019-02-01 | 100
I have a case where an MVC controller accepts a map someMap(party_id, some_date) and I need to get a list of party_id values with the amount summed up to the given some_date.
In this case, if I send mapOf("party_id_1" to Date(2019 - 1 - 15), "party_id_2" to Date(2019 - 1 - 2)),
I should get a list of party_id values with the amount summed up to each some_date.
Output should look like:
party_id_1 | 70
party_id_2 | 50
Currently the code is:
select sum(amount) from SOME_DELTA where party_id = :partyId and some_date <= :someDate
But in this case I need to iterate through the map and make a separate DB call to get the summed amount for each party_id up to its some_date, which feels wrong.
Is there a more elegant way to get this in one select query (to avoid 100+ DB calls)?
You can use a lateral join for this:
select map.party_id,
       c.amount
from (
    values
        ('party_id_1', date '2019-01-15'),
        ('party_id_2', date '2019-01-02')
) map (party_id, cutoff_date)
    join lateral (
        select sum(amount) as amount
        from some_delta sd
        where sd.party_id = map.party_id
          and sd.some_date <= map.cutoff_date
    ) c on true
order by map.party_id;
Online example
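If you prefer to avoid the lateral join, roughly the same result can be had with a plain join plus aggregation; a sketch under the same assumptions (the VALUES list stands in for the incoming map):
select map.party_id,
       sum(sd.amount) as amount
from (
    values
        ('party_id_1', date '2019-01-15'),
        ('party_id_2', date '2019-01-02')
) map (party_id, cutoff_date)
left join some_delta sd
       on  sd.party_id = map.party_id
       and sd.some_date <= map.cutoff_date
group by map.party_id
order by map.party_id;
The left join keeps party_ids that have no matching rows (they come back with a NULL amount), just like the lateral version with on true.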
I have a table in Amazon Redshift, named 'inventory'
This is a data pull from external systems. It happens twice a day, once in the morning (right at opening) and once right after closing. The locations are identified by the location_id column below (there are multiple locations).
I want to figure out the total items sold based on column 'total_inventory'.
There is a column 'import_time' which has two possible values, 'am' and 'pm'.
All of this should be done by date, using the column 'import_date'.
Data may look like this:
item_id | location_id | total_inventory | import_date | import_time
-------------------------------------------------------------------
10123 | 3 | 10 | 2019-10-01 | am
10123 | 3 | 3 | 2019-10-01 | pm
10123 | 3 | 7 | 2019-10-02 | am
10123 | 3 | 6 | 2019-10-02 | pm
I would ideally like to be able to see results of total_sold such as:
item_id | location_id | total_sold | import_date
------------------------------------------------
10123 | 3 | 7 | 2019-10-01
10123 | 3 | 1 | 2019-10-02
Note: Daily start levels have nothing to do with previous stock levels as they are replenished over night.
Also note: I have inherited this issue, and if structural changes are required I can make them, but it would be helpful to avoid them if possible.
I have attempted to look at other answers where arithmetic is being done based on column values, but I did not see (or rather, understand) a fit that would work for me.
Full Transparency: My SQL skills are fairly weak as of late due to not using them in a long while, so please go easy on me if I have asked a foolish question.
If the pm value is always less than the am value, you can do:
select import_date, item_id, location_id,
       max(total_inventory) - min(total_inventory) as total_sold
from inventory
group by import_date, item_id, location_id;
However, I suspect you really want conditional aggregation:
select import_date, item_id, location_id,
       (max(case when import_time = 'am' then total_inventory end) -
        max(case when import_time = 'pm' then total_inventory end)
       ) as total_sold
from inventory
group by import_date, item_id, location_id;
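Against the sample rows above, this yields 10 - 3 = 7 for 2019-10-01 and 7 - 6 = 1 for 2019-10-02, matching the expected output.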
I have two tables that have identical columns. I would like to combine these two tables into a third one that contains all the rows from the first table, plus those rows from the second table whose date doesn't exist in the first table for the same location.
Example:
transactions:
date |location_code| product_code | quantity
------------+------------------+--------------+----------
2013-01-20 | ABC | 123 | -20
2013-01-23 | ABC | 123 | -13.158
2013-02-04 | BCD | 234 | -4.063
transactions2:
date |location_code| product_code | quantity
------------+------------------+--------------+----------
2013-01-20 | BDE | 123 | -30
2013-01-23 | DCF | 123 | -2
2013-02-05 | UXJ | 234 | -6
Desired result:
date |location_code| product_code | quantity
------------+------------------+--------------+----------
2013-01-20 | ABC | 123 | -20
2013-01-23 | ABC | 123 | -13.158
2013-01-23 | DCF | 123 | -2
2013-02-04 | BCD | 234 | -4.063
2013-02-05 | UXJ | 234 | -6
How would I go about this? I tried for example this:
SELECT date, location_code, product_code, type, quantity, location_type, updated_at
     , period_start_date, period_end_date
INTO transactions_combined
FROM ( SELECT * FROM transactions_kitchen k
       UNION ALL
       SELECT *
       FROM transactions_admin h
       WHERE h.date NOT IN (SELECT k.date FROM transactions_kitchen k)
     ) AS t;
but that doesn't take into account that I'd like to include the rows that have the same date but a different location. I am using PostgreSQL 9.2.
UNION simply doesn't do what you describe. This query should:
CREATE TABLE transactions_combined AS
SELECT date, location_code, product_code, quantity
FROM   transactions_kitchen k
UNION  ALL
SELECT h.date, h.location_code, h.product_code, h.quantity
FROM   transactions_admin h
LEFT   JOIN transactions_kitchen k USING (location_code, date)
WHERE  k.location_code IS NULL;
LEFT JOIN / IS NULL to exclude rows from the second table for the same location and date. See:
Select rows which are not present in other table
Use CREATE TABLE AS instead of SELECT INTO. The manual:
CREATE TABLE AS is functionally similar to SELECT INTO. CREATE TABLE AS is the recommended syntax, since this form of SELECT INTO
is not available in ECPG or PL/pgSQL, because they interpret the
INTO clause differently. Furthermore, CREATE TABLE AS offers a
superset of the functionality provided by SELECT INTO.
Or, if the target table already exists:
INSERT INTO transactions_combined (<list names of target columns here!>)
SELECT ...
Aside: I would not use date as column name. It's a reserved word in every SQL standard and a function and data type name in Postgres.
Change UNION ALL to just UNION and it should return only unique rows from each table.