Selecting Random Record for each User returned in the results - sql

I have seen several ways of selecting random records from a table using several methods. However, my need I am sure is here, I just cannot find it. I have a query using several tables. What my end goal is is to have one random record for each user returned.
In my result set, I get users and the work items they have. What I need is to only return one random work item per user. This is where I am getting stuck. ANy assistance would be greatly appreciated.
Here is my code. What I need is 1 random C.credentilaing_k per user.
select
U.FULLNAME as 'Chg_By',
CONVERT(DATE,AL.AUDITDATETIME) as 'DE_Date',
P.ID,
P.LONGNAME,
CONVERT(DATE,P.DATEOFBIRTH) as 'DOB',
C.CREDENTIALING_K,
C.entity_k,
R.DESCRIPTION as 'CVI_TYPE',
CG.GROUPDESCRIPTION,
C.APPLICATION_RECEIVED,
R1.DESCRIPTION as 'Cur_STATUS',
CONVERT(DATE,C.USERDEF_D3) as 'MSO_DUE_DT'
from
VisualCACTUS.AUDITLOG AL
JOIN VisualCACTUS.USERS U
on U.user_k = AL.USER_K
join VisualCACTUS.CREDENTIALING C
JOIN VisualCACTUS.PROVIDERS P
on P.provider_k = C.PROVIDER_K
JOIN visualcactus.CREDENTIALINGGROUP CG
on CG.CREDENTIALINGGROUP_K = C.CREDENTIALINGGROUP_K
JOIN VisualCACTUS.REFTABLE R
on R.reftable_k = C.TYPE_RTK
JOIN VisualCACTUS.REFTABLE R1
ON R1.REFTABLE_K = C.CREDENTIALINGSTATUS_RTK
on C.CREDENTIALING_K = AL.FILE_PRIMARYKEY
where
AUDITLOG_K in (select AUDITLOG_K from VisualCACTUS.AUDITLOG_RECORDLEVEL where TABLE_NAME = 'CREDENTIALING '
and
AUDITLOG_RECORDLEVEL_K in (SELECT AUDITLOG_RECORDLEVEL_K from VisualCACTUS.AUDITLOG_FIELDLEVEL where NEWVALUE_SHORT = 'D2LC0YSXXW'))
and
CONVERT(DATE, AUDITDATETIME) = DATEADD(day, -1, convert(date, GETDATE()))

I think you need something like this (in T-SQL):
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY NEWID()) rn
FROM yourTable) dt
WHERE
rn = 1;

What you want is to get a random record for a user if the user is having multiple records. We may do something as below (In Teradata):
select
*
from tablename
where
qualify row_number() (over partition by id order by column) = 1;
Here:
partition by - can have multiple columns seperated by comma incase you want to segregate your data based on few columns.
order by - can be given just to arrange the data in that segment based on your preference, it can be on date or anything as per your data.
=1 - signifies take the first among them.

Related

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID,
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem--they're both remitts for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per Claim:
enter image description here
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id,
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that that that does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as
REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

SSRS Table of locations per item type

I have a basic query which shows what the latest product to be put in each location (FVTank) is:
SELECT TOP 1
T0.[DateTime],
T0.[TankName],
T1.[Item]
FROM
t005_pci_data T0
INNER JOIN t001_fvbatch T1 ON T1.[FVBatch] = T0.[FVBatch]
WHERE
T0.[TankName] = 'FV101'
UNION
SELECT TOP 1
T0.[DateTime],
T0.[TankName],
T1.[Item]
FROM
t005_pci_data T0
INNER JOIN t001_fvbatch T1 ON T1.[FVBatch] = T0.[FVBatch]
WHERE
T0.[TankName] = 'FV102'
[...etc...]
ORDER BY
T0.[DateTime] DESC
Which gives a result like this:
What I'd like to do is create a summary page on SSRS which would display all the locations which currently hold each item. Ideally it would look something like this:
There are 50 locations and 7 main items so I need it to have 8 headers (one additional one for "other".)
Is there a way to do this in SSRS? Or is there a better solution by doing it in SQL?
Thank you.
Add an additional column to your dataset that calculates a row number for each Item, ordered by the DateTime field:
row_number() over (partition by Item order by DateTime desc) as rn
Judging by your source query in your question, this may be best included as a wrapping select around your final query:
select DateTime
,TankName
,Item
,row_number() over (partition by Item order by DateTime desc) as rn
from(
<Your original query here>
) a
You can then use this as your row group, as without one you will not get the top aligned format you are after in each Item x column. Remember to delete the rn column but keep the grouping:
When you run this report you will get the following format (I didn't bother typing out all your data into my dataset query, hence the missing values):

SQL - Returning CTE with Top 1

I am trying to return a set of results and decided to try my luck with CTE, the first table "Vendor", has a list of references, the second table "TVView", has ticket numbers that were created using a reference from the "Vendor" table. There may be one or more tickets using the same ticket number depending on the state of that ticket and I am wanting to return the last entry for each ticket found in "TVView" that matches a selected reference from "Vendor". Also, the "TVView" table has a seed field that is incremented.
I got this to return the right amount of entries (meaning not showing the duplicate tickets but only once) but I cannot figure out how to add an additional layer to go back through and select the last entry for that ticket and return some other fields. I can figure out how to sum which is actually easy, but I really need the Top 1 of each ticket entry in "TVView" regardless if its a duplicate or not while returning all references from "Vendor". Would be nice if SQL supported "Last"
How do you do that?
Here is what I have done so far:
with cteTickets as (
Select s.Mth2, c.Ticket, c.PyRt from Vendor s
Inner join
TVView c on c.Mth1 = s.Mth1 and c.Vendor = s.Vendor
)
Select Mth2, Ticket, PayRt from cteTickets
Where cteTickets.Vendor >='20'
and cteTickets.Vendor <='40'
and cteTickets.Mth2 ='8/15/2014'
Group by cteTickets.Ticket
order by cteTickets.Ticket
Several rdbms's that support Common Table Expressions (CTE) that I am aware of also support analytic functions, including the very useful ROW_NUMBER(), so the following should work in Oracle, TSQL (MSSQL/Sybase), DB2, PostgreSQL.
In the suggestions the intention is to return just the most recent entry for each ticket found in TVView. This is done by using ROW_NUMBER() which is PARTITIONED BY Ticket that instructs row_number to recommence numbering for each change of the Ticket value. The subsequent ORDER BY Mth1 DESC is used to determine which record within each partition is assigned 1, here it will be the most recent date.
The output of row_number() needs to be referenced by a column alias, so using it in a CTE or derived table permits selection of just the most recent records by RN = 1 which you will see used in both options below:
-- using a CTE
WITH
TVLatest
AS (
SELECT
* -- specify the fields
, ROW_NUMBER() OVER (PARTITION BY Ticket
ORDER BY Mth1 DESC) AS RN
FROM TVView
)
SELECT
Mth2
, Ticket
, PayRt
FROM Vendor v
INNER JOIN TVLatest l ON v.Mth1 = l.Mth1
AND v.Vendor = l.Vendor
AND l.RN = 1
WHERE v.Vendor >= '20'
AND v <= '40'
AND v.Mth2 = '2014-08-15'
ORDER BY
v.Ticket
;
-- using a derived table instead
SELECT
Mth2
, Ticket
, PayRt
FROM Vendor v
INNER JOIN (
SELECT
* -- specify the fields
, ROW_NUMBER() OVER (PARTITION BY Ticket
ORDER BY Mth1 DESC) AS RN
FROM TVView
) TVLatest l ON v.Mth1 = l.Mth1
AND v.Vendor = l.Vendor
AND l.RN = 1
WHERE v.Vendor >= '20'
AND v <= '40'
AND v.Mth2 = '2014-08-15'
ORDER BY
v.Ticket
;
please note: "SELECT *" is a convenience or used as an abbreviation if full details are unknown. The queries above may not operate without correctly specifying the field list (eg. 'as is' they would fail in Oracle).

Query for active records between date range or most recent before date range

I need to find active records that fall between a range of date parameters from a table containing applications. First, I look for a record between the date range in a table called 'app_notes' and check if is linked to an application. If there is no app_note record in the date range, I must look at the most recent app note from before the date range. If this app note indicates a status of active, I select it.
The app_indiv table connects an individual to an application. There can be multiple app_indiv records for each individual and multiple app_notes for each app_indiv. Here is what I have so far:
SELECT DISTINCT individual.indiv_id
FROM individual INNER JOIN
app_indiv ON app_indiv.indiv_id = individual.indiv_id INNER JOIN
app_note ON app_indiv.app_indiv_id = app_note.app_indiv_id
WHERE (app_note.when_mod BETWEEN #date_from AND #date_to)
/* OR most recent app_note indicates active */
How can I get the most recent app_note record if there is not one in the date range? Since there are multiple app_note records possible, I don't know how to make it only retrieve the most recent.
SELECT *
FROM individual i
INNER JOIN app_indiv ai
ON ai.indiv_id = i.indiv_id
OUTER APPLY
(
SELECT TOP 1 * FROM app_note an
WHERE an.app_indiv_id = ai.app_indiv_id
AND an.when_mod < #date_to
ORDER BY an.when_mod DESC
) d
WHERE d.status = 'active'
Find the last note less than end date, check to see if it's active and if so show the individual record.
(untested) You'll need to use a CASE switch.
SELECT DISTINCT individual.indiv_id
FROM individual INNER JOIN
app_indiv ON app_indiv.indiv_id = individual.indiv_id INNER JOIN
app_note ON app_indiv.app_indiv_id = app_note.app_indiv_id
WHERE (CASE WHEN app_note.when_mod BETWEEN #date_from AND #date_to
THEN (SELECT appnote.when_mod from individual where appnote.when_mod BETWEEN #date_from AND #date_to)
WHEN app_note.when_mod NOT BETWEEN #date_from and #date_to
THEN (SELECT appnote.when_mod from individual appnote.when_mod LIMIT 1))
Query might not be correct. Switch might need to be in the first SELECT part of the query.
It seems to me that you really only care about the end date of your date range, since you want to be able to look farther back if there's nothing in that date range. I would use a CTEand the ROW_NUMBER() function. The CTE is just a cleaner way to write a sub-query (in this case, a CTE can do a lot more though). The Row_Number function will numbers the rows based on the order by statement. The partition by resets the numbering to one each time you hit a new value in that column.
with AppNoteCTE as
(select
<not sure what columns you need here>
app_indiv_id,
ROW_NUMBER() OVER (PARTITION BY APP_INDIV_ID ORDER BY WHEN_MOD DESC) RN
FROM
APP_INDIV
WHERE
WHEN_MOD <= #endDate)
SELECT DISTINCT individual.indiv_id
FROM individual INNER JOIN
app_indiv ON app_indiv.indiv_id = individual.indiv_id INNER JOIN
AppNoteCTE ON app_indiv.app_indiv_id = AppNoteCTE .app_indiv_id
and AppNoteCTE.RN = 1

Sql Server Query Selecting Top and grouping by

SpousesTable
SpouseID
SpousePreviousAddressesTable
PreviousAddressID, SpouseID, FromDate, AddressTypeID
What I have now is updating the most recent for the whole table and assigning the most recent regardless of SpouseID the AddressTypeID = 1
I want to assign the most recent SpousePreviousAddress.AddressTypeID = 1
for each unique SpouseID in the SpousePreviousAddresses table.
UPDATE spa
SET spa.AddressTypeID = 1
FROM SpousePreviousAddresses AS spa INNER JOIN Spouses ON spa.SpouseID = Spouses.SpouseID,
(SELECT TOP 1 SpousePreviousAddresses.* FROM SpousePreviousAddresses
INNER JOIN Spouses AS s ON SpousePreviousAddresses.SpouseID = s.SpouseID
WHERE SpousePreviousAddresses.CountryID = 181 ORDER BY SpousePreviousAddresses.FromDate DESC) as us
WHERE spa.PreviousAddressID = us.PreviousAddressID
I think I need a group by but my sql isn't all that hot. Thanks.
Update that is Working
I was wrong about having found a solution to this earlier. Below is the solution I am going with
WITH result AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY SpouseID ORDER BY FromDate DESC) AS rowNumber, *
FROM SpousePreviousAddresses
WHERE CountryID = 181
)
UPDATE result
SET AddressTypeID = 1
FROM result WHERE rowNumber = 1
Presuming you are using SQLServer 2005 (based on the error message you got from the previous attempt) probably the most straightforward way to do this would be to use the ROW_NUMBER() Function couple with a Common Table Expression, I think this might do what you are looking for:
WITH result AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY SpouseID ORDER BY FromDate DESC) as rowNumber,
*
FROM
SpousePreviousAddresses
)
UPDATE SpousePreviousAddresses
SET
AddressTypeID = 2
FROM
SpousePreviousAddresses spa
INNER JOIN result r ON spa.SpouseId = r.SpouseId
WHERE r.rowNumber = 1
AND spa.PreviousAddressID = r.PreviousAddressID
AND spa.CountryID = 181
In SQLServer2005 the ROW_NUMBER() function is one of the most powerful around. It is very usefull in lots of situations. The time spent learning about it will be re-paid many times over.
The CTE is used to simplyfy the code abit, as it removes the need for a temporary table of some kind to store the itermediate result.
The resulting query should be fast and efficient. I know the select in the CTE uses *, which is a bit of overkill as we dont need all the columns, but it may help to show what is happening if anyone want to see what is happening inside the query.
Here's one way to do it:
UPDATE spa1
SET spa1.AddressTypeID = 1
FROM SpousePreviousAddresses AS spa1
LEFT OUTER JOIN SpousePreviousAddresses AS spa2
ON (spa1.SpouseID = spa2.SpouseID AND spa1.FromDate < spa2.FromDate)
WHERE spa1.CountryID = 181 AND spa2.SpouseID IS NULL;
In other words, update the row spa1 for which no other row spa2 exists with the same spouse and a greater (more recent) date.
There's exactly one row for each value of SpouseID that has the greatest date compared to all other rows (if any) with the same SpouseID.
There's no need to use a GROUP BY, because there's kind of an implicit grouping done by the join.
update: I think you misunderstand the purpose of the OUTER JOIN. If there is no row spa2 that matches all the join conditions, then all columns of spa2.* are returned as NULL. That's how outer joins work. So you can search for the cases where spa1 has no matching row spa2 by testing that spa2.SpouseID IS NULL.
UPDATE spa SET spa.AddressTypeID = 1
WHERE spa.SpouseID IN (
SELECT DISTINCT s1.SpouseID FROM Spa S1, SpousePreviousAddresses S2
WHERE s1.SpouseID = s2.SpouseID
AND s2.CountryID = 181
AND s1.PreviousAddressId = s2.PreviousAddressId
ORDER BY S2.FromDate DESC)
Just a guess.