How to get a value from another row in SQL

I have a sql query like the following:
select
c.customer_leg1
,d.mid
,c.previous_customer_leg1
,c.creation_date
,c.end_date
,c.cid
from table1 c
JOIN table2 d
ON c.cid = d.cid
where c.cid = '1234'
which gives the below output:
customer_leg1 | previous_customer_leg1 | creation_date | end_date | cid
4092 | 1888 | 05/06/17 | 05/07/17 | 735
8915 | 4092 | 05/06/17 | 05/08/17 | 735
I want to add a new column: for each customer_leg1, wherever that value appears in previous_customer_leg1 of another row, the new column should contain that other row's end_date.
For example, in row 1 of the above output customer_leg1 is 4092, and this value is found in previous_customer_leg1 of row 2, so in row 1 the new column should contain 05/08/17. For rows whose customer_leg1 does not match any previous_customer_leg1, it should be NULL. I think I could maybe use partition and the lag function for this, but I'm not very clear on those. Any help will be appreciated. Thanks!

Since you are only showing "the gist" of what you want, perhaps "the gist" of one possible solution is like this:
Add to your "really huge" query another join:
select .....
, c1.end_date as new_column
from table1 c join table2 d on c.cid = d.cid
-- left join keeps the rows without a match; new_column is NULL for them
left join table1 c1 on c.cid = c1.cid
and c.customer_leg1 = c1.previous_customer_leg1
..................

As I asked in the comments: which columns determine the order of your rows? Without an explicit ordering you cannot guarantee that the output always comes back in the same order, so "the next row" is not well defined. Assuming that the ordering column is creation_date and that this should be done per partition of c.cid, you can add something like the expression below to your select list to derive the new column, without disturbing the rest of the query.
Disclaimer: if your actual partition and order columns differ from the ones assumed here, this will not work as written; change the columns accordingly. The concept (read the next row's value for one column and, if it matches, display another column from that next row) is shown below.
,CASE
WHEN lead (c.previous_customer_leg1,1)
over (partition BY c.cid ORDER BY c.creation_date)
= c.customer_leg1
THEN lead (c.end_date,1) over (partition BY c.cid ORDER BY c.creation_date)
END AS new_column
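Both suggestions above can be tried on the two sample rows. The following is only a minimal sketch using Python's built-in sqlite3 module (needs SQLite 3.25 or later for window functions), not the asker's real schema: the column types and the ISO date format are assumptions, table2 is left out, and the LEAD variant orders by end_date instead of creation_date because creation_date is identical in both sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed types; dates rewritten as ISO strings so they sort correctly
    CREATE TABLE table1 (
        customer_leg1 INTEGER,
        previous_customer_leg1 INTEGER,
        creation_date TEXT,
        end_date TEXT,
        cid TEXT
    );
    INSERT INTO table1 VALUES
        (4092, 1888, '2017-05-06', '2017-05-07', '735'),
        (8915, 4092, '2017-05-06', '2017-05-08', '735');
""")

# Self-join variant: find the row whose previous_customer_leg1 equals this
# row's customer_leg1; LEFT JOIN keeps unmatched rows with a NULL new_column.
self_join_rows = conn.execute("""
    SELECT c.customer_leg1, c.end_date, c1.end_date AS new_column
    FROM table1 c
    LEFT JOIN table1 c1
           ON c1.cid = c.cid
          AND c1.previous_customer_leg1 = c.customer_leg1
    ORDER BY c.customer_leg1
""").fetchall()

# LEAD variant: compare with the next row inside the same cid partition.
# Ordering by end_date is an assumption (creation_date is tied in the sample).
lead_rows = conn.execute("""
    SELECT customer_leg1, end_date,
           CASE WHEN LEAD(previous_customer_leg1) OVER w = customer_leg1
                THEN LEAD(end_date) OVER w
           END AS new_column
    FROM table1
    WINDOW w AS (PARTITION BY cid ORDER BY end_date)
    ORDER BY customer_leg1
""").fetchall()

for row in self_join_rows:
    print(row)
```

On this sample both variants agree: the 4092 row gets 2017-05-08 and the 8915 row gets NULL. The self-join does not depend on row order at all, whereas the LEAD variant is only correct when the matching row really is the next one in the chosen ordering.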

Related

Using INNER JOIN resulting table in another INNER JOIN

I'm not really sure that the title actually corresponds, maybe my approach is wrong.
I have the following database structure:
TABLE producers
id
TABLE data
id
date
value
producer_id OneToMany
First things first: for each producer, I want to get the latest date of registered data. The code below does exactly this:
SELECT producers.id AS producer_id, max.date AS max_date
FROM producers
INNER JOIN data ON producers.id = data.producer_id
INNER JOIN (
SELECT producer_id, MAX(date) AS date
FROM data
GROUP BY producer_id
) AS max USING (producer_id, date)
And the resulting table is:
----------------------------------------------------
| producer_id | max_date |
----------------------------------------------------
| 5 | 2022-01-01 01:45:00.000 +0000 |
| 7 | 2022-01-01 01:45:00.000 +0000 |
| 14 | 2022-01-01 01:45:00.000 +0000 |
| 15 | 2022-01-01 01:45:00.000 +0000 |
| 17 | 2022-01-01 01:45:00.000 +0000 |
----------------------------------------------------
The next thing that I need is to SUM all the data records per producer WITH date bigger than the max_date we got for each producer after the INNER JOIN from the previous query. The SUM() will be performed on column value.
Hopefully that was clear, if not, let me know. I've tried doing another INNER JOIN and use table max in the WHERE clause but I got an error that told me that the table was there, but it wasn't possible to be used in that part of the query.
Maybe another INNER JOIN isn't the solution. Here I'm limited by my knowledge of SQL and I don't really know about which keywords to read more in-depth to understand what's the best approach and how to do it. So, an info to redirect me on the best path would be really helpful.
Thanks in advance.
EDIT: Forgot to specify on which column the SUM() will be executed on.
EDIT 2: Just realized that what I'm asking here, the result will always be an empty table because there won't ever be a record whose date will be bigger. When I wrote the simplified version of my database, forgot to add a table/join, that's why. But in the end imo the approach/solution will still be the same, just applied on a different table. Sorry for that again.
The first query in the question can be greatly simplified using distinct on and order by:
select distinct on (p.id)
p.id, d.date
from producers p
join data d on p.id = d.producer_id
order by p.id, d.date desc;
As for "SUM all the data records per producer WITH date bigger than the max_date": none exist with a date later than the latest one. Here is a query that does it anyway (even though the result will be empty):
select producer_id, sum(value)
from data d inner join -- the query above follows
(
select distinct on (p.id)
p.id producer_id, d.date
from producers p
join data d on p.id = d.producer_id
order by p.id, d.date desc
) t using (producer_id)
where d.date > t.date
group by producer_id;
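Since the question's second edit says the cutoff date really comes from another table, the same join-to-a-derived-table pattern can be sketched with a non-empty result. This is a rough illustration only: the checkpoints table and all rows are hypothetical, and because it runs on SQLite through Python's sqlite3 module it uses GROUP BY with MAX in place of Postgres's DISTINCT ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER PRIMARY KEY, date TEXT, value REAL, producer_id INTEGER);
    -- Hypothetical second table that supplies the per-producer cutoff dates
    CREATE TABLE checkpoints (producer_id INTEGER, date TEXT);
    INSERT INTO data (date, value, producer_id) VALUES
        ('2022-01-01 01:00', 1.0, 5),
        ('2022-01-01 01:30', 2.0, 5),
        ('2022-01-01 01:45', 4.0, 5),
        ('2022-01-01 01:45', 8.0, 7);
    INSERT INTO checkpoints VALUES
        (5, '2022-01-01 00:30'),
        (5, '2022-01-01 01:00'),
        (7, '2022-01-01 02:00');
""")

# The derived table t holds one cutoff per producer; joining it to data makes
# the cutoff usable in the WHERE clause of the outer query.
sums = conn.execute("""
    SELECT d.producer_id, SUM(d.value) AS total
    FROM data d
    JOIN (
        SELECT producer_id, MAX(date) AS max_date
        FROM checkpoints
        GROUP BY producer_id
    ) t ON t.producer_id = d.producer_id
    WHERE d.date > t.max_date
    GROUP BY d.producer_id
    ORDER BY d.producer_id
""").fetchall()

print(sums)
```

Producer 7 has no data after its cutoff, so it drops out of the result entirely; a LEFT JOIN from producers would be needed if such producers should appear with a NULL or zero total.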

Return count of total group membership when providers are part of a group

TABLE A: Pre-joined table - Holds a list of providers who belong to a group and the group the provider belongs to. Columns are something like this:
ProviderID (PK, FK) | ProviderName | GroupID | GroupName
1234 | LocalDoctor | 987 | LocalDoctorsUnited
5678 | Physican82 | 987 | LocalDoctorsUnited
9012 | Dentist13 | 153 | DentistryToday
0506 | EyeSpecial | 759 | OphtaSpecialist
TABLE B: Another pre-joined table, holds a list of providers and their demographic information. Columns as such:
ProviderID (PK,FK) | ProviderName | G_or_I | OtherColumnsThatArentInUse
1234 | LocalDoctor | G | Etc.
5678 | Physican82 | G | Etc.
9012 | Dentist13 | I | Etc.
0506 | EyeSpecial | I | Etc.
The expected result is something like this:
ProviderID | ProviderName | ProviderStatus | GroupCount
1234 | LocalDoctor | Group | 2
5678 | Physican82 | Group | 2
9012 | Dentist13 | Individual | N/A
0506 | EyeSpecial | Individual | N/A
The goal is to determine whether or not a provider belongs to a group or operates individually, by the G_or_I column. If the provider belongs to a group, I need to include an additional column that provides the count of total providers in that group.
The Group/Individual portion is relatively easy - I've done something like this:
SELECT DISTINCT
A.ProviderID,
A.ProviderName,
CASE
WHEN B.G_or_I = 'G'
THEN 'Group'
WHEN B.G_or_I = 'I'
THEN 'Individual' END AS ProviderStatus
FROM
TableA A
LEFT OUTER JOIN TableB B
ON A.ProviderID = B.ProviderID;
So far so good, this returns the expected results based on the G_or_I flag.
However, I can't seem to wrap my head around how to complete the COUNT portion. I feel like I may be overthinking it, and stuck in a loop of errors. Some things I've tried:
Add a second CASE STATEMENT:
CASE
WHEN B.G_or_I = 'G'
THEN (
SELECT CountedGroups
FROM (
SELECT ProviderID, count(GroupID) AS CountedGroups
FROM TableA
WHERE A.ProviderID = B.ProviderID
GROUP BY ProviderID --originally had this as ORDER BY, but that was a mis-type on my part
)
)
ELSE 'N/A' END
This returns an error stating that a single-row subquery is returning more than one row. If I limit the number of rows returned to 1, the CountedGroups column returns 1 for every row. This makes me think that it's not performing the count function as I expect it to.
I've also tried including a direct count of TableA as a factored sub-query:
WITH CountedGroups AS
( SELECT ProviderID, count(GroupID) As GroupSum
FROM TableA
GROUP BY ProviderID --originally had this as ORDER BY, but that was a mis-type on my part
) --This as a standalone query works just fine
SELECT DISTINCT
A.ProviderID,
A.ProviderName,
CASE
WHEN B.G_or_I = 'G'
THEN 'Group'
WHEN B.G_or_I = 'I'
THEN 'Individual' END AS ProviderStatus,
CASE
WHEN B.G_or_I = 'G'
THEN GroupSum
ELSE 'N/A' END
FROM
CountedGroups CG
JOIN TableA A
ON CG.ProviderID = A.ProviderID
LEFT OUTER JOIN TableB B
ON A.ProviderID = B.ProviderID
This returns either null or completely incorrect column values
Other attempts have been a number of variations of this, with a mix of bad results or Oracle errors. As I mentioned above, I'm probably way overthinking it and the solution could be rather simple. Apologies if the information is confusing or I've not provided enough detail. The real tables have a lot of private medical information, and I tried to translate the essence of the issue as best I could.
Thank you.
You can use the CASE..WHEN and analytical function COUNT as follows:
SELECT
A.PROVIDERID,
A.PROVIDERNAME,
CASE
WHEN B.G_OR_I = 'G' THEN 'Group'
ELSE 'Individual'
END AS PROVIDERSTATUS,
CASE
WHEN B.G_OR_I = 'G' THEN TO_CHAR(COUNT(1) OVER(
PARTITION BY A.GROUPID
))
ELSE 'N/A'
END AS GROUPCOUNT
FROM
TABLE_A A
JOIN TABLE_B B ON A.PROVIDERID = B.PROVIDERID;
TO_CHAR is needed on COUNT because every branch of a CASE expression must return the same data type.
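The analytic COUNT approach can be checked against the sample rows from the question. The following is a sketch on SQLite via Python's sqlite3 module rather than Oracle, so CAST(... AS TEXT) stands in for TO_CHAR; the column types are assumptions, and ProviderID is stored as text to keep the leading zero of 0506.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ProviderID TEXT PRIMARY KEY, ProviderName TEXT,
                         GroupID INTEGER, GroupName TEXT);
    CREATE TABLE TableB (ProviderID TEXT PRIMARY KEY, ProviderName TEXT, G_or_I TEXT);
    INSERT INTO TableA VALUES
        ('1234', 'LocalDoctor', 987, 'LocalDoctorsUnited'),
        ('5678', 'Physican82',  987, 'LocalDoctorsUnited'),
        ('9012', 'Dentist13',   153, 'DentistryToday'),
        ('0506', 'EyeSpecial',  759, 'OphtaSpecialist');
    INSERT INTO TableB VALUES
        ('1234', 'LocalDoctor', 'G'),
        ('5678', 'Physican82',  'G'),
        ('9012', 'Dentist13',   'I'),
        ('0506', 'EyeSpecial',  'I');
""")

# COUNT(*) OVER (PARTITION BY GroupID) attaches the group size to every row
# without collapsing rows; the cast keeps both CASE branches textual.
rows = conn.execute("""
    SELECT A.ProviderID,
           A.ProviderName,
           CASE WHEN B.G_or_I = 'G' THEN 'Group' ELSE 'Individual' END AS ProviderStatus,
           CASE WHEN B.G_or_I = 'G'
                THEN CAST(COUNT(*) OVER (PARTITION BY A.GroupID) AS TEXT)
                ELSE 'N/A'
           END AS GroupCount
    FROM TableA A
    JOIN TableB B ON A.ProviderID = B.ProviderID
    ORDER BY A.ProviderID
""").fetchall()

for row in rows:
    print(row)
```

Because the window function counts rows per GroupID rather than per ProviderID, the two LocalDoctorsUnited providers each get 2, which is what the question's earlier GROUP BY ProviderID attempts could not produce.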
Your problem seems to be that you are missing a column. You need to add group name, otherwise you won't be able to differentiate rows for the same practitioner who works under multiple business entities (groups). This is probably why you have a DISTINCT on your query. Things looked like duplicates which weren't. Once you've done that, just use an analytic function to figure out the rest:
SELECT ta.providerid,
ta.providername,
DECODE(tb.g_or_i, 'G', 'Group', 'I', 'Individual') AS ProviderStatus,
ta.groupname,
CASE
WHEN tb.g_or_i = 'G' THEN TO_CHAR(COUNT(DISTINCT ta.providerid) OVER (PARTITION BY ta.groupid))
ELSE 'N/A'
END AS GROUP_COUNT
FROM table_a ta
INNER JOIN table_b tb ON ta.providerid = tb.providerid
Is it possible that your LEFT JOIN was going the wrong direction? It makes more sense that your base demographic table would have all practitioners in it and then the Group table might be missing some records. For instance if the solo prac was operating under their own SSN and Type I NPI without applying for a separate Type II NPI or TIN.

Get specific row from each group

My question is very similar to this, except I want to be able to filter by some criteria.
I have a table "DOCUMENT" which looks something like this:
|ID|CONFIG_ID|STATE |MAJOR_REV|MODIFIED_ON|ELEMENT_ID|
+--+---------+----------+---------+-----------+----------+
| 1|1234 |Published | 2 |2019-04-03 | 98762 |
| 2|1234 |Draft | 1 |2019-01-02 | 98762 |
| 3|5678 |Draft | 3 |2019-01-02 | 24244 |
| 4|5678 |Published | 2 |2017-10-04 | 24244 |
| 5|5678 |Draft | 1 |2015-05-04 | 24244 |
It's actually a few more columns, but I'm trying to keep this simple.
For each CONFIG_ID, I would like to select the latest (MAX(MAJOR_REV) or MAX(MODIFIED_ON)) - but I might want to filter by additional criteria, such as state (e.g., the latest published revision of a document) and/or date (the latest revision, published or not, as of a specific date; or: all documents where a revision was published/modified within a specific date interval).
To make things more interesting, there are some other tables I want to join in.
Here's what I have so far:
SELECT
allDocs.ID,
d.CONFIG_ID,
d.[STATE],
d.MAJOR_REV,
d.MODIFIED_ON,
d.ELEMENT_ID,
f.ID FILE_ID,
f.[FILENAME],
et.COLUMN1,
e.COLUMN2
FROM DOCUMENT allDocs -- Get all document revisions
CROSS APPLY ( -- Then for each config ID, only look at the latest revision
SELECT TOP 1
ID,
MODIFIED_ON,
CONFIG_ID,
MAJOR_REV,
ELEMENT_ID,
[STATE]
FROM DOCUMENT
WHERE CONFIG_ID=allDocs.CONFIG_ID
ORDER BY MAJOR_REV desc
) as d
LEFT OUTER JOIN ELEMENT e ON e.ID = d.ELEMENT_ID
LEFT OUTER JOIN ELEMENT_TYPE et ON e.ELEMENT_TYPE_ID=et.ID
LEFT OUTER JOIN TREE t ON t.NODE_ID = d.ELEMENT_ID
OUTER APPLY ( -- This is another optional 1:1 relation, but it's wrongfully implemented as m:n
SELECT TOP 1
FILE_ID
FROM DOCUMENT_FILE_RELATION
WHERE DOCUMENT_ID=d.ID
ORDER BY MODIFIED_ON DESC
) as df -- There should never be more than 1, but we're using TOP 1 just in case, to avoid duplicates
LEFT OUTER JOIN [FILE] f on f.ID=df.FILE_ID
WHERE
allDocs.CONFIG_ID = '5678' -- Just for testing purposes
and d.state ='Released' -- One possible filter criterion, there may be others
It looks like the results are correct, but multiple identical rows are returned.
My guess is that for documents with 4 revisions, the same values are found 4 times and returned.
A simple SELECT DISTINCT would solve this, but I'd prefer to fix my query.
This would be a classic row_number & partition by question I think.
;with rows as
(
select <your-columns>,
row_number() over (partition by config_id order by <whatever you want>) as rn
from document
join <anything else>
where <whatever>
)
select * from rows where rn=1
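Filled in with the DOCUMENT sample from the question, the template above might look like the sketch below. It runs on SQLite through Python's sqlite3 module rather than SQL Server, leaves out the extra joins, and the helper name latest_per_config and its optional state parameter are made up for illustration. The filter sits inside the CTE so that rows are numbered only among those that pass it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DOCUMENT (ID INTEGER PRIMARY KEY, CONFIG_ID TEXT, STATE TEXT,
                           MAJOR_REV INTEGER, MODIFIED_ON TEXT, ELEMENT_ID INTEGER);
    INSERT INTO DOCUMENT VALUES
        (1, '1234', 'Published', 2, '2019-04-03', 98762),
        (2, '1234', 'Draft',     1, '2019-01-02', 98762),
        (3, '5678', 'Draft',     3, '2019-01-02', 24244),
        (4, '5678', 'Published', 2, '2017-10-04', 24244),
        (5, '5678', 'Draft',     1, '2015-05-04', 24244);
""")

def latest_per_config(state=None):
    """Return one row per CONFIG_ID: the highest MAJOR_REV, optionally
    considering only revisions in the given state (hypothetical helper)."""
    return conn.execute("""
        WITH ranked AS (
            SELECT ID, CONFIG_ID, STATE, MAJOR_REV,
                   ROW_NUMBER() OVER (PARTITION BY CONFIG_ID
                                      ORDER BY MAJOR_REV DESC) AS rn
            FROM DOCUMENT
            WHERE :state IS NULL OR STATE = :state
        )
        SELECT ID, CONFIG_ID, STATE, MAJOR_REV
        FROM ranked
        WHERE rn = 1
        ORDER BY CONFIG_ID
    """, {"state": state}).fetchall()

print(latest_per_config())             # latest revision, any state
print(latest_per_config("Published"))  # latest published revision
```

Each CONFIG_ID yields exactly one row, so the duplicates described in the question do not arise. Putting the state filter in the outer query instead would answer a different question: "is the latest revision published?".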

Delete rows where date was least updated

How can I delete rows where dateupdated was least updated ?
My table is
Name Dateupdated ID status
john 1/02/17 JHN1 A
john 1/03/17 JHN2 A
sally 1/02/17 SLLY1 A
sally 1/03/17 SLLY2 A
Mike 1/03/17 MK1 A
Mike 1/04/17 MK2 A
I want to be left with the following after the data removal:
Name Date ID status
john 1/03/17 JHN2 A
sally 1/03/17 SLLY2 A
Mike 1/04/17 MK2 A
If you really want to "delete rows where dateupdated was least updated" then a simple single-row subquery should do the trick.
DELETE MyTable
WHERE Date = (SELECT MIN(Date) From MyTable)
If on the other hand you just want to delete the row with the earliest Date per person (as identified by their ID) you could use:
DELETE MyTable
FROM MyTable a
JOIN (SELECT ID, MIN(Date) MinDate FROM MyTable GROUP BY ID) b
ON a.ID = b.ID AND a.Date = b.MinDate
The idea here is you create an aggregate query that returns rows containing the columns that would match the rows you want deleted, then join to it. Because it's an inner join, rows that do not match the criteria will be excluded.
If people are uniquely identified by something else (e.g. Name), then you can just substitute that for the ID in my example above.
I am thinking though that you don't want either of these. I think you want to delete everything except for each person's latest row. If that is the case, try this:
DELETE MyTable
WHERE EXISTS (SELECT 0 FROM MyTable b WHERE b.ID = MyTable.ID AND b.Date > MyTable.Date)
The idea here is you check for existence of another data row with the same ID and a later date. If there is a later record, delete this one.
The nice thing about the last example is you can run it over and over and every person will still be left with exactly one row. The other two queries, if run over and over, will nibble away at the table until it is empty.
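The repeatability claim can be verified with a small sketch. It uses SQLite via Python's sqlite3 module instead of SQL Server, reads the sample dates as January 2017 in ISO form, and assumes that Name identifies a person and Dateupdated is the date column, since every ID in the sample is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (Name TEXT, Dateupdated TEXT, ID TEXT, status TEXT);
    INSERT INTO MyTable VALUES
        ('john',  '2017-01-02', 'JHN1',  'A'),
        ('john',  '2017-01-03', 'JHN2',  'A'),
        ('sally', '2017-01-02', 'SLLY1', 'A'),
        ('sally', '2017-01-03', 'SLLY2', 'A'),
        ('Mike',  '2017-01-03', 'MK1',   'A'),
        ('Mike',  '2017-01-04', 'MK2',   'A');
""")

# Delete every row for which a later row with the same Name exists.
DELETE_OLDER = """
    DELETE FROM MyTable
    WHERE EXISTS (
        SELECT 1 FROM MyTable b
        WHERE b.Name = MyTable.Name
          AND b.Dateupdated > MyTable.Dateupdated
    )
"""

first_pass = conn.execute(DELETE_OLDER).rowcount   # rows removed on the first run
second_pass = conn.execute(DELETE_OLDER).rowcount  # rows removed on the second run
remaining = conn.execute(
    "SELECT Name, Dateupdated, ID FROM MyTable ORDER BY ID"
).fetchall()

print(first_pass, second_pass)
print(remaining)
```

The first run removes the three older rows and the second run removes nothing, leaving one row per name. If two rows of one person share the same latest date, both survive, because neither is strictly later than the other.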
P.S. As these are significantly different solutions, I suggest you spend some effort learning how to articulate unambiguous requirements. This is an extremely important skill for any developer.
This deletes rows where the name is a duplicate, and deletes all but the latest row for each name. This is different from your stated question.
Using a common table expression (cte) and row_number():
;with cte as (
select *
, rn = row_number() over (
partition by Name
order by Dateupdated desc
)
from t
)
/* ------------------------------------------------
-- Remove duplicates by deleting rows
-- where the row number (rn) is greater than 1
-- leaving the first row for each partition
------------------------------------------------ */
delete
from cte
where cte.rn > 1
select * from t
rextester: http://rextester.com/HZBQ50469
returns:
+-------+-------------+-------+--------+
| Name | Dateupdated | ID | status |
+-------+-------------+-------+--------+
| john | 2017-01-03 | JHN2 | A |
| sally | 2017-01-03 | SLLY2 | A |
| Mike | 2017-01-04 | MK2 | A |
+-------+-------------+-------+--------+
Without using the cte it can be written as:
delete d
from (
select *
, rn = row_number() over (
partition by Name
order by Dateupdated desc
)
from t
) as d
where d.rn > 1
This should do the trick:
delete a
from MyTable a
where not exists (
select top 1 1
from MyTable b
where b.name = a.name
and b.DateUpdated < a.DateUpdated
)
i.e. remove every row for which no other row with the same name has an earlier date.
Your Name column has Mike and Mik2, which are different from each other.
So, unless that is a mistake, the column to group by has to be the ID column without its last digit.
I think the following is more accurate if it was not a mistake.
delete a
from MyTable a
inner join
(select substring(ID, 1, len(ID) - 1) as ID, min(Dateupdated) as MinDate
from MyTable
group by substring(ID, 1, len(ID) - 1)
) b
on substring(a.ID, 1, len(a.ID) - 1) = b.ID and a.Dateupdated = b.MinDate
You can test it at SQLFiddle: http://sqlfiddle.com/#!6/9c440/1

Selecting records from subquery found set (postgres)

I have a query on 2 tables (parts, price). The simplified version of this query is:
SELECT price.*
FROM price
INNER JOIN parts ON (price.code = parts.code )
WHERE price.type = '01'
ORDER BY date DESC
That returns several records:
code | type | date | price | file
-------------+----------+------------------------------------------------------
00065064705 | 01 | 2008-01-07 00:00:00 | 16.400000 | 28SEP2011.zip
00065064705 | 01 | 2007-02-05 00:00:00 | 15.200000 | 20JUL2011.zip
54868278900 | 01 | 2006-02-24 00:00:00 | 16.642000 | 28SEP2011.zip
As you can see, code 00065064705 is listed twice. I just need the record with the max date (2008-01-07), along with the code, type, date and price, for each unique code; so basically the top record for each unique code. This is Postgres, so I can't use SELECT TOP or something like that.
I think I should be using this as a subquery inside a main query, but I'm not sure how. Something like:
SELECT *
FROM price
JOIN (insert my original query here) AS price2 ON price.code = price2.code
Any help would be greatly appreciated.
You can use the row_number() window function to do that.
select *
from (SELECT price.*,
row_number() over (partition by price.code order by price.date desc) as rn
FROM price
INNER JOIN parts ON (price.code = parts.code )
WHERE price.type='01') x
where rn = 1
ORDER BY date DESC
(*) Note: I may have prefixed some of the columns incorrectly, as I'm not sure which column is in which table. I'm sure you can fix that.
In Postgres you can use DISTINCT ON:
SELECT DISTINCT ON (price.code) price.*
FROM price
INNER JOIN parts ON price.code = parts.code
WHERE price.type='01'
ORDER BY price.code, price."date" DESC
select distinct on (code)
code, p.type, p.date, p.price, p.file
from
price p
inner join
parts using (code)
where p.type='01'
order by code, p.date desc
http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT