Relational division with events in a certain timeframe - sql

I have my table (CTE) definitions and result set here.
The CTE may look strange, but it has been tested and returns the correct results in the most efficient manner I've found so far. The query below is meant to find the person IDs (patid) of people who are taking two or more drugs at the same time. Currently it works insofar as it returns the patids of the people taking both drugs, but not necessarily both drugs at the same time. Taking both drugs at the same time is indicated by the fillDate of one drug falling before the scriptEndDate of another drug.
You can see in this partial result set that on line 18 the scriptFillDate is 2009-07-19 which is between the fillDate and scriptEndDate of the same patID from row 2. What constraint do I need to add so I can filter these unneeded results?
--PatientDrugList is a CTE because eventually parameters might be passed to it
--to alter the selection population
;with PatientDrugList(patid, filldate, scriptEndDate,drugName,strength)
as
(
select rx.patid,rx.fillDate,rx.scriptEndDate,rx.drugName,rx.strength
from rx
),
--the row constructor here will eventually be parameters for a stored procedure
DrugList (drugName)
as
(
select x.drugName
from (values ('concerta'),('fentanyl'))
as x(drugName)
where x.drugName is not null
)
--the row number here is so that I can find the largest date range
--(the largest datediff means the person was on a given drug for a larger
--amount of time. obviously not a optimal solution
--celko inspired relational division!
select distinct row_number() over(partition by pd.patid, drugname order by datediff(day,pd.fillDate,pd.scriptEndDate)desc) as rn
,pd.patid
,pd.drugname
,pd.fillDate
,pd.scriptEndDate
from PatientDrugList as pd
where not exists
(select * from DrugList
where not exists
(select * from PatientDrugList as pd2
where(pd.patid=pd2.patid)
and (pd2.drugName = DrugList.drugName)))
and exists
(select *
from DrugList
where DrugList.drugName=pd.drugName
)
group by pd.patid, pd.drugName,pd.filldate,pd.scriptEndDate

Wrap your original query into a CTE, or better yet, for performance and for stability of the query plan and result, store it in a temp table.
The query below (assuming CTE option) will give you the overlapping times when both drugs are being taken.
;with tmp as (
.. your query producing the columns shown ..
)
select *
from tmp a
join tmp b on a.patid = b.patid and a.drugname <> b.drugname
where a.filldate < b.scriptenddate
and b.filldate < a.scriptenddate;
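The overlap test above (each interval starts before the other one ends) can be tried out on a few rows. The following is only a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for SQL Server; the table contents are invented sample data, and it uses `<` instead of `<>` on drugname so that each overlapping pair is reported once rather than twice.

```python
import sqlite3

# In-memory SQLite database standing in for the real rx table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rx (patid INTEGER, drugname TEXT, filldate TEXT, scriptenddate TEXT)"
)
conn.executemany(
    "INSERT INTO rx VALUES (?, ?, ?, ?)",
    [
        (1, "concerta", "2009-07-01", "2009-07-31"),
        (1, "fentanyl", "2009-07-19", "2009-08-18"),  # overlaps with the row above
        (2, "concerta", "2009-01-01", "2009-01-31"),
        (2, "fentanyl", "2009-03-01", "2009-03-31"),  # same drugs, but no overlap
    ],
)

# Two intervals overlap when each one starts before the other one ends.
# ISO-8601 date strings compare correctly as plain text.
overlaps = conn.execute(
    """
    SELECT a.patid, a.drugname, b.drugname,
           MAX(a.filldate, b.filldate)           AS overlap_start,
           MIN(a.scriptenddate, b.scriptenddate) AS overlap_end
    FROM rx a
    JOIN rx b ON a.patid = b.patid AND a.drugname < b.drugname
    WHERE a.filldate < b.scriptenddate
      AND b.filldate < a.scriptenddate
    """
).fetchall()

print(overlaps)
```

Only patient 1 is returned, because patient 2 took both drugs but never within the same period.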

Related

How can I create a dates-table inside a query in SQL Server?

Say I want to match records in table_a that have a startdate and an enddate to individual days and see if on, for instance March 13, one or more records in table_a match. I'd like to solve this by generating a row per day, with the date as the leading column, and any matching data from table_a as a left join.
I've worked with data warehouses that have date dimensions that make this job easy. But unfortunately I need to run this particular query on an OLTP database that doesn't have such a table.
How can I generate a row-per-day table in SQL Server? How can I do this inside my query, without temp tables, functions/procedures etc?
An alternative is a recursive query to generate the date series. Based on your pseudo-code:
with dates_table as (
select <your-start-date> dt
union all
select dateadd(day, 1, dt) from dates_table where dt < <your-end-date>
)
select d.dt, a.<whatever>
from dates_table d
left outer join table_a a on <join / date matching here>
-- where etc etc
option (maxrecursion 0)
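To see the recursive date series and the left join working together, here is a small sketch in Python with the standard-library sqlite3 module standing in for SQL Server. The table_a columns (name, startdate, enddate) and the sample rows are assumptions made for illustration; SQLite spells the recursion as WITH RECURSIVE and uses date(dt, '+1 day') where SQL Server uses dateadd.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical shape of table_a: one row per record with a start and end date.
conn.execute("CREATE TABLE table_a (name TEXT, startdate TEXT, enddate TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?)",
    [
        ("first", "2020-03-12", "2020-03-13"),
        ("second", "2020-03-13", "2020-03-14"),
    ],
)

# The recursive CTE emits one row per day; the LEFT JOIN keeps days with no match.
rows = conn.execute(
    """
    WITH RECURSIVE dates_table(dt) AS (
        SELECT '2020-03-11'
        UNION ALL
        SELECT date(dt, '+1 day') FROM dates_table WHERE dt < '2020-03-15'
    )
    SELECT d.dt, COUNT(a.name) AS matches
    FROM dates_table d
    LEFT JOIN table_a a ON d.dt BETWEEN a.startdate AND a.enddate
    GROUP BY d.dt
    ORDER BY d.dt
    """
).fetchall()

for dt, matches in rows:
    print(dt, matches)
```

March 13 shows two matching records, while the first and last day of the range still appear with a count of zero.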
I found a bit of a hack way to do this. I'll assume two years of dates is sufficient for your dates table.
Now, find a table in your database that has at least 800 records (365 x 2 + leap years - headache for multiplying 365 = ~~ 800). The question talks about selecting data from table_a, so we'll assume this is table_b. Then, create this Common Table Expression at the top of the query:
with dates_table as (
select top 800 -- or more/less if your timespan isn't ~2years
[date] = dateadd(day, ROW_NUMBER() over (order by <random column>) - 1, <your-start-date>)
from table_b
)
select d.[date]
, a.<whatever>
from dates_table d
left outer join table_a a on <join / date matching here>
-- where etc, etc, etc
Some notes:
This works by just getting the numbers 0 - 799 and adding them to an arbitrary date.
If you need more or fewer dates than two years, increase or decrease the number fed into the TOP clause. Ensure that table_b has sufficient rows: select count(*) from table_b.
<random column> can be any column on table_b; the ordering doesn't matter. We're only interested in the numbers 1-800 (minus 1, for a range of 0-799), but ROW_NUMBER requires an ORDER BY argument.
<your-start-date> is the first date you want in the dates table, and is included in the output.
In the WHERE clause of the joined query, you could filter out any excess days that we overshot by taking 800 rows instead of 730 (+leap) by adding a condition like year(d.[date]) IN (2020, 2021).
If table_a has more than 800 records itself, this could be used as the basis for the dates_table instead of some other table too.

Select random value for each row

I'm trying to select a new random value from a column in another table for each row of a table I'm updating. I'm getting the random value, however I can't get it to change for each row. Any ideas? Here's the code:
UPDATE srs1.courseedition
SET ta_id = teacherassistant.ta_id
FROM srs1.teacherassistant
WHERE (SELECT ta_id FROM srs1.teacherassistant ORDER BY RANDOM()
LIMIT 1) = teacherassistant.ta_id
My guess is that Postgres is optimizing out the subquery, because it has no dependencies on the outer query. Have you simply considered using a subquery?
UPDATE srs1.courseedition
SET ta_id = (SELECT ta.ta_id
FROM srs1.teacherassistant ta
ORDER BY RANDOM()
LIMIT 1
);
I don't think this will fix the problem (smart optimizers, alas). But, if you correlate to the outer query, then it should run each time. Perhaps:
UPDATE srs1.courseedition ce
SET ta_id = (SELECT ta.ta_id
FROM srs1.teacherassistant ta
WHERE ce.ta_id IS NULL -- or something like that
ORDER BY RANDOM()
LIMIT 1
);
You can replace the WHERE clause with something more nonsensical such as WHERE COALESCE(ce.ta_id, '') IS NOT NULL.
The following solution should be faster by order(s) of magnitude than running a correlated subquery for every row: N random sorts over the whole table vs. 1 random sort. The result is just as random, but we get a perfectly even distribution with this method, whereas independent random picks like in Gordon's solution can (and probably will) assign some rows more often than others. There are different kinds of "random". Actual requirements for "randomness" need to be defined carefully.
Assuming the number of rows in courseedition is bigger than in teacherassistant.
To update all rows in courseedition:
UPDATE srs1.courseedition c1
SET ta_id = t.ta_id
FROM (
SELECT row_number() OVER (ORDER BY random()) - 1 AS rn -- random order
, count(*) OVER () As ct -- total count
, ta_id
FROM srs1.teacherassistant -- smaller table
) t
JOIN (
SELECT row_number() OVER () - 1 AS rn -- arbitrary order
, courseedition_id -- use actual PK of courseedition
FROM srs1.courseedition -- bigger table
) c ON c.rn%t.ct = t.rn -- rownumber of big modulo count of small table
WHERE c.courseedition_id = c1.courseedition_id;
Notes
Match the row number of the bigger table, modulo the count of the smaller table, to the (random) row number of the smaller table.
row_number() - 1 to get a 0-based index. Allows using the modulo operator % more elegantly.
Random sort for one table is enough. The smaller table is cheaper. The second can have any order (arbitrary is cheaper). The assignment after the join is random either way. Perfect randomness would only be impaired indirectly if there are regular patterns in sort order of the bigger table. In this unlikely case, apply ORDER BY random() to the bigger table to eliminate any such effect.
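The modulo matching described in these notes can be illustrated outside the database. The following is only a sketch in plain Python, not the Postgres query itself: the function name assign_round_robin and the sample IDs are made up for illustration, and a seeded random.Random is used so the run is repeatable.

```python
import random
from collections import Counter


def assign_round_robin(big_ids, small_ids, rng):
    """Assign one small-table ID to every big-table ID.

    The small list is shuffled once (the single random sort); each big row,
    numbered from 0 in arbitrary order, takes the small row whose 0-based
    position equals its own row number modulo the small-table count.
    """
    shuffled = list(small_ids)
    rng.shuffle(shuffled)
    ct = len(shuffled)
    return {big_id: shuffled[rn % ct] for rn, big_id in enumerate(big_ids)}


course_ids = list(range(1, 11))   # 10 hypothetical courseedition rows
ta_ids = [101, 102, 103]          # 3 hypothetical teacherassistant rows

assignment = assign_round_robin(course_ids, ta_ids, random.Random(42))
counts = Counter(assignment.values())
print(assignment)
print(counts)
```

Every course gets exactly one assistant, and no assistant receives more than one course above any other, which is the even distribution that independent random picks do not guarantee.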

using a single query to eliminate N+1 select issue

I want to return the last report of a given range of units. The last report will be identified by its time of creation. Therefore, the result would be a collection of last reports for a given range of units. I do not want to use a bunch of SELECT statements e.g.:
SELECT * FROM reports WHERE unit_id = 9999 ORDER BY time desc LIMIT 1
SELECT * FROM reports WHERE unit_id = 9998 ORDER BY time desc LIMIT 1
...
I initially tried this (but already knew it wouldn't work because it will only return 1 report):
'SELECT reports.* FROM reports INNER JOIN units ON reports.unit_id = units.id WHERE units.account_id IS NOT NULL AND units.account_id = 4 ORDER BY time desc LIMIT 1'
So I am looking for some kind of solution using subqueries or derived tables, but I can't just seem to figure out how to do it properly:
'SELECT reports.* FROM reports
WHERE id IN
(
SELECT id FROM reports
INNER JOIN units ON reports.unit_id = units.id
ORDER BY time desc
LIMIT 1
)
Any solution to do this with subqueries or derived tables?
The simple way to do this in Postgres uses distinct on:
select distinct on (unit_id) r.*
from reports r
order by unit_id, time desc;
This construct is specific to Postgres and databases that use its code base. The expression distinct on (unit_id) says "I want to keep only one row for each unit_id". The row chosen is the first row encountered with that unit_id based on the order by clause.
EDIT:
Your original query would be, assuming that id increases along with the time field:
SELECT r.*
FROM reports r
WHERE id IN (SELECT max(id)
FROM reports
GROUP BY unit_id
);
You might also try this as a not exists:
select r.*
from reports r
where not exists (select 1
from reports r2
where r2.unit_id = r.unit_id and
r2.time > r.time
);
I thought the distinct on would perform well. This last version (and maybe the previous) would really benefit from an index on reports(unit_id, time).
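As a rough check of the not exists version, here is a sketch in Python using the standard-library sqlite3 module in place of Postgres (distinct on is not available in SQLite, so only the portable query is shown). The sample rows are invented; unit 9999 deliberately has its newest report under a lower id, which is the case where the max(id) variant would pick the wrong row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, unit_id INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO reports (id, unit_id, time) VALUES (?, ?, ?)",
    [
        (1, 9998, "2013-01-01 10:00"),
        (2, 9998, "2013-01-02 10:00"),
        (3, 9999, "2013-01-03 10:00"),  # newest report for unit 9999
        (4, 9999, "2013-01-01 09:00"),  # higher id, but older time
    ],
)
# The index suggested above; it lets the anti-join probe by (unit_id, time).
conn.execute("CREATE INDEX reports_unit_time ON reports (unit_id, time)")

# Keep a report only if no later report exists for the same unit.
latest = conn.execute(
    """
    SELECT r.id, r.unit_id, r.time
    FROM reports r
    WHERE NOT EXISTS (SELECT 1
                      FROM reports r2
                      WHERE r2.unit_id = r.unit_id
                        AND r2.time > r.time)
    ORDER BY r.unit_id
    """
).fetchall()

print(latest)
```

One query returns the last report of every unit, so no per-unit SELECT is needed.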

Combining the results of two SQL queries as separate columns

I have two queries which return separate result sets, and the queries are returning the correct output.
How can I combine these two queries into one so that I can get one single result set with each result in a separate column?
Query 1:
SELECT SUM(Fdays) AS fDaysSum From tblFieldDays WHERE tblFieldDays.NameCode=35 AND tblFieldDays.WeekEnding=?
Query 2:
SELECT SUM(CHdays) AS hrsSum From tblChargeHours WHERE tblChargeHours.NameCode=35 AND tblChargeHours.WeekEnding=?
Thanks.
You can alias both queries and select from them in an outer query:
http://sqlfiddle.com/#!2/ca27b/1
SELECT x.a, y.b FROM (SELECT * from a) as x, (SELECT * FROM b) as y
You can use a CROSS JOIN:
SELECT *
FROM ( SELECT SUM(Fdays) AS fDaysSum
FROM tblFieldDays
WHERE tblFieldDays.NameCode=35
AND tblFieldDays.WeekEnding=1) A -- use your real query here
CROSS JOIN (SELECT SUM(CHdays) AS hrsSum
FROM tblChargeHours
WHERE tblChargeHours.NameCode=35
AND tblChargeHours.WeekEnding=1) B -- use your real query here
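The CROSS JOIN works here because each aggregate subquery returns exactly one row, so the product is a single row with both sums. A minimal sketch in Python with the standard-library sqlite3 module follows; the column types, the sample rows and the week value are assumptions, and the `?` placeholders from the question are bound once per subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblFieldDays   (NameCode INTEGER, WeekEnding TEXT, Fdays REAL);
    CREATE TABLE tblChargeHours (NameCode INTEGER, WeekEnding TEXT, CHdays REAL);
    INSERT INTO tblFieldDays   VALUES (35, '2013-06-07', 2.0),
                                      (35, '2013-06-07', 1.5),
                                      (36, '2013-06-07', 4.0);
    INSERT INTO tblChargeHours VALUES (35, '2013-06-07', 8.0),
                                      (35, '2013-06-14', 3.0);
    """
)

week = "2013-06-07"  # hypothetical value for the WeekEnding parameter

# Each derived table yields one row, so the cross join yields one combined row.
row = conn.execute(
    """
    SELECT A.fDaysSum, B.hrsSum
    FROM (SELECT SUM(Fdays) AS fDaysSum
          FROM tblFieldDays
          WHERE NameCode = 35 AND WeekEnding = ?) A
    CROSS JOIN (SELECT SUM(CHdays) AS hrsSum
                FROM tblChargeHours
                WHERE NameCode = 35 AND WeekEnding = ?) B
    """,
    (week, week),
).fetchone()

print(row)
```

If either subquery matches no rows, its SUM is NULL rather than missing, so the combined row is still returned.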
You could also use a CTE to grab groups of information you want and join them together, if you wanted them in the same row. Example, depending on which SQL syntax you use, here:
WITH group1 AS (
SELECT testA
FROM tableA
),
group2 AS (
SELECT testB
FROM tableB
)
SELECT *
FROM group1
JOIN group2 ON group1.testA = group2.testB --your choice of join
;
You decide what kind of JOIN you want based on the data you are pulling; make sure the groups share the fields you join on so that everything can be put into a single row. If you have multiple columns, make sure to name them all properly so you know which is which. Also, for performance's sake, CTEs are the way to go, instead of inline SELECTs and such. Hope this helps.
How can I combine these 4 queries into a single query?
The required output is shown below:
1. total number of cases pending + 2. cases filed during this month (based on sysdate) + total number of cases (1+2) + number of cases disposed (where nsc = disposed) + number of cases pending (where nsc <> disposed)
nsc = nature of case
The report is taken on the 6th of every month
(the monthly report is counted from the 5th of the previous month to the 5th of the present month)

SQL Query to get latest price

I have a table containing prices for a lot of different "things" in a MS SQL 2005 table. There are hundreds of records per thing per day, and the different things get price updates at different times.
ID uniqueidentifier not null,
ThingID int NOT NULL,
PriceDateTime datetime NOT NULL,
Price decimal(18,4) NOT NULL
I need to get today's latest prices for a group of things. The query below works, but I'm getting hundreds of rows back and I have to loop through them and extract only the latest one per ThingID. How can I (e.g. via a GROUP BY) say that I want the latest one per ThingID? Or will I have to use subqueries?
SELECT *
FROM Thing
WHERE ThingID IN (1,2,3,4,5,6)
AND PriceDate > cast( convert(varchar(20), getdate(), 106) as DateTime)
UPDATE: In an attempt to hide complexity I showed the ID column as an int. In real life it is a GUID (and not the sequential kind). I have updated the table def above to use uniqueidentifier.
I think the only solution with your table structure is to work with a subquery:
SELECT *
FROM Thing
WHERE ID IN (SELECT max(ID) FROM Thing
WHERE ThingID IN (1,2,3,4)
GROUP BY ThingID)
(Given the highest ID also means the newest price)
However I suggest you add a "IsCurrent" column that is 0 if it's not the latest price or 1 if it is the latest. This will add the possible risk of inconsistent data, but it will speed up the whole process a lot when the table gets bigger (if it is in an index). Then all you need to do is to...
SELECT *
FROM Thing
WHERE ThingID IN (1,2,3,4)
AND IsCurrent = 1
UPDATE
Okay, Markus updated the question to show that ID is a uniqueid, not an int. That makes writing the query even more complex.
SELECT T.*
FROM Thing T
JOIN (SELECT ThingID, max(PriceDateTime) AS PriceDateTime
FROM Thing
WHERE ThingID IN (1,2,3,4)
GROUP BY ThingID) X ON X.ThingID = T.ThingID
AND X.PriceDateTime = T.PriceDateTime
WHERE T.ThingID IN (1,2,3,4)
I'd really suggest using either a "IsCurrent" column or go with the other suggestion found in the answers and use "current price" table and a separate "price history" table (which would ultimately be the fastest, because it keeps the price table itself small).
(I know that the ThingID at the bottom is redundant. Just try if it is faster with or without that "WHERE". Not sure which version will be faster after the optimizer did its work.)
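The idea of joining Thing to a per-ThingID MAX(PriceDateTime) can be checked on a handful of rows. This is only a sketch in Python with the standard-library sqlite3 module standing in for SQL Server 2005; the text IDs imitate non-sequential GUIDs and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Thing (
        ID            TEXT PRIMARY KEY,   -- stands in for a non-sequential GUID
        ThingID       INTEGER NOT NULL,
        PriceDateTime TEXT NOT NULL,
        Price         REAL NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO Thing VALUES (?, ?, ?, ?)",
    [
        ("c3", 1, "2008-09-30 09:00:00", 10.0),
        ("a1", 1, "2008-09-30 15:30:00", 10.5),  # latest for thing 1, "smaller" ID
        ("b2", 2, "2008-09-30 11:00:00", 20.0),
        ("d4", 5, "2008-09-30 12:00:00", 50.0),  # not in the requested list
    ],
)

# The derived table X finds the newest timestamp per ThingID; the join then
# pulls the full row that carries that timestamp.
latest = conn.execute(
    """
    SELECT T.ThingID, T.PriceDateTime, T.Price
    FROM Thing T
    JOIN (SELECT ThingID, MAX(PriceDateTime) AS PriceDateTime
          FROM Thing
          WHERE ThingID IN (1, 2, 3, 4)
          GROUP BY ThingID) X
      ON X.ThingID = T.ThingID
     AND X.PriceDateTime = T.PriceDateTime
    ORDER BY T.ThingID
    """
).fetchall()

print(latest)
```

Because the match is on the timestamp rather than on ID, it does not matter that the IDs carry no ordering. If two rows for one ThingID share the same latest PriceDateTime, both are returned.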
I would try something like the following subquery and forget about changing your data structures.
SELECT
*
FROM
Thing
WHERE
(ThingID, PriceDateTime) IN
(SELECT
ThingID,
max(PriceDateTime )
FROM
Thing
WHERE
ThingID IN (1,2,3,4)
GROUP BY
ThingID
)
Edit: the above is ANSI SQL, and I'm now guessing that having more than one column in a subquery doesn't work in T-SQL. Marius, I can't test the following, but try:
SELECT
p.*
FROM
Thing p,
(SELECT ThingID, max(PriceDateTime) AS PriceDateTime FROM Thing WHERE ThingID IN (1,2,3,4) GROUP BY ThingID) m
WHERE
p.ThingId = m.ThingId
and p.PriceDateTime = m.PriceDateTime
Another option might be to convert the date to a string and concatenate it with the id so that you have only one column. This would be slightly nasty, though.
If the subquery route was too slow I would look at treating your price updates as an audit log and maintaining a ThingPrice table - perhaps as a trigger on the price updates table:
ThingID int not null,
UpdateID int not null,
PriceDateTime datetime not null,
Price decimal(18,4) not null
The primary key would just be ThingID and "UpdateID" is the "ID" in your original table.
Since you are using SQL Server 2005, you can use the new (CROSS|OUTER) APPLY clause. The APPLY clause lets you join a table with a table-valued function.
To solve the problem, first define a table valued function to retrieve the top n rows from Thing for a specific id, date ordered:
CREATE FUNCTION dbo.fn_GetTopThings(@ThingID AS INT, @n AS INT)
RETURNS TABLE
AS
RETURN
SELECT TOP(@n) *
FROM Thing
WHERE ThingID = @ThingID
ORDER BY PriceDateTime DESC
GO
and then use the function to retrieve the top 1 records in a query:
SELECT *
FROM Thing t
CROSS APPLY dbo.fn_GetTopThings(t.ThingID, 1)
WHERE t.ThingID IN (1,2,3,4,5,6)
The magic here is done by the APPLY clause, which applies the function to every row in the left result set, joins with the result set returned by the function, and then returns the final result set. (Note: to do a left-join-like apply, use OUTER APPLY, which returns all rows from the left side, while CROSS APPLY returns only the rows that have a match on the right side.)
BlaM:
Because I can't post comments yet (due to low rep points), not even on my own answers ^^, I'll answer in the body of the message:
- Even though the APPLY clause uses table-valued functions, SQL Server optimizes it internally so that it doesn't call the function for every row in the left result set; instead it takes the inner SQL from the function and converts it into a join with the rest of the query. The performance is therefore equivalent to, or even better than, that of a query using subqueries (if SQL Server chooses the right plan and further optimizations can be done). In my personal experience, APPLY has no performance issues when the database is properly indexed and statistics are up to date (just as a normal query with subqueries behaves under such conditions).
It depends on the nature of how your data will be used, but if the old price data will not be used nearly as regularly as the current price data, there may be an argument here for a price history table. This way, non-current data may be archived off to the price history table (probably by triggers) as the new prices come in.
As I say, depending on your access model, this could be an option.
I'm converting the uniqueidentifier to a binary so that I can get a MAX of it.
This should make sure that you won't get duplicates from multiple records with identical ThingIDs and PriceDateTimes:
SELECT * FROM Thing WHERE CONVERT(BINARY(16),Thing.ID) IN
(
SELECT MAX(CONVERT(BINARY(16),Thing.ID))
FROM Thing
INNER JOIN
(SELECT ThingID, MAX(PriceDateTime) LatestPriceDateTime FROM Thing
WHERE PriceDateTime >= CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME)
GROUP BY ThingID) LatestPrices
ON Thing.ThingID = LatestPrices.ThingID
AND Thing.PriceDateTime = LatestPrices.LatestPriceDateTime
GROUP BY Thing.ThingID, Thing.PriceDateTime
) AND Thing.ThingID IN (1,2,3,4,5,6)
Since ID is not sequential, I assume you have a unique index on ThingID and PriceDateTime so only one price can be the most recent for a given item.
This query will get all of the items in the list IF they were priced today. If you remove the where clause for PriceDate you will get the latest price regardless of date.
SELECT *
FROM Thing thi
WHERE thi.ThingID IN (1,2,3,4,5,6)
AND thi.PriceDateTime =
(SELECT MAX(maxThi.PriceDateTime)
FROM Thing maxThi
WHERE maxThi.PriceDateTime >= CAST( CONVERT(varchar(20), GETDATE(), 106) AS DateTime)
AND maxThi.ThingID = thi.ThingID)
Note that I changed ">" to ">=" since you could have a price right at the start of a day
This should work without using a global PK column (for complex primary keys, for example):
SELECT t1.*, t2.PriceDateTime AS bigger FROM Prices t1
LEFT JOIN Prices t2 ON t1.ThingID = t2.ThingID AND t1.PriceDateTime < t2.PriceDateTime
WHERE t2.PriceDateTime IS NULL
Try this (provided you only need the latest price, not the identifier or datetime of that price)
SELECT ThingID, (SELECT TOP 1 Price FROM Thing WHERE ThingID = T.ThingID ORDER BY PriceDateTime DESC) Price
FROM Thing T
WHERE ThingID IN (1,2,3,4) AND DATEDIFF(D, PriceDateTime, GETDATE()) = 0
GROUP BY ThingID
Maybe I misunderstood the task, but what about:
SELECT ID, ThingID, max(PriceDateTime), Price
FROM Thing GROUP BY ThingID