Complex Django or SQL query

A record can have the status 'renewal_required'. If it enters this status and the applicant does renew, a copy is generated, which enters the status 'in_process' (but an application can have the status 'in_process' for other reasons too).
Now I need to get all records that have the renewal_required status, BUT if a copy exists in status 'in_process' for a given applicant, I should show only that copy. The key is the applicant_id, which is the same for copied records.
id | status           | applicant_id
---+------------------+-------------
1  | renewal_required | 2
2  | in_process       | 3
3  | renewal_required | 4
4  | in_process       | 4
In the above example, records with id 1 and 4 would be returned.
Can this be done? Thanks for any suggestion (DB-redesign excluded, even if the design looks ridiculous - can't do anything about it right now)
Solution needs to be for django but if a SQL solution is being proposed I will happily accept it and adapt/execute directly

select a.applicant_id,COALESCE(b.status,a.status) status from
(select applicant_id,status from yourtable where status='renewal_required') a
left join
(select applicant_id,status from yourtable where status='in_process') b
on a.applicant_id = b.applicant_id;

Here is a possible solution:
SELECT MAX(t1.id) as max_id, t1.status, t1.applicant_id
FROM t1
JOIN (
SELECT MIN(status) as status, applicant_id
FROM t1
WHERE status in ('renewal_required', 'in_process')
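GROUP BY applicant_id
-- keep only applicants that have a renewal_required record
HAVING MAX(status) = 'renewal_required' ) tmp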
ON t1.status = tmp.status
AND t1.applicant_id = tmp.applicant_id
GROUP BY t1.status, t1.applicant_id
EDIT: On reflection, this won't work if there are more statuses than just these two, because of SELECT MIN(status). Could you comment on that?
EDIT2: It might work after all with the added WHERE status in ('renewal_required', 'in_process').
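For anyone who wants to try the idea against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The table name `records` is an assumption (the question never names the table), and the query is a self-join variant of the COALESCE answer that also returns the id. If an applicant can have several 'in_process' copies, this would return one row per copy.

```python
import sqlite3

# Hypothetical table name "records"; columns follow the question's example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, status TEXT, applicant_id INTEGER)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "renewal_required", 2),
        (2, "in_process", 3),
        (3, "renewal_required", 4),
        (4, "in_process", 4),
    ],
)

# Start from the renewal_required rows; when an in_process copy exists for the
# same applicant, COALESCE swaps in that copy's id and status.
query = """
    SELECT COALESCE(c.id, r.id)         AS id,
           COALESCE(c.status, r.status) AS status,
           r.applicant_id
    FROM records r
    LEFT JOIN records c
           ON c.applicant_id = r.applicant_id
          AND c.status = 'in_process'
    WHERE r.status = 'renewal_required'
    ORDER BY 1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

In Django the same SQL could be run through a raw query; the ORM translation is left out here because the model names are not given in the question.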

Related

How do I merge and delete duplicated rows in SQL using UPDATE?

For example, I have a table of:
id | code | name | type | deviceType
---+------+------+------+-----------
1 | 23 | xyz | 0 | web
2 | 23 | xyz | 0 | mobile
3 | 24 | xyzc | 0 | web
4 | 25 | xyzc | 0 | web
I want the result to be:
id | code | name | type | deviceType
---+------+------+------+-----------
1 | 23 | xyz | 0 | web&mobile
2 | 24 | xyzc | 0 | web
3 | 25 | xyzc | 0 | web
How do I do this in SQL Server using UPDATE and DELETE statements?
Any help is greatly appreciated!
I might actually suggest just leaving the original data intact, and instead creating a view here:
CREATE VIEW yourView AS
SELECT ROW_NUMBER() OVER (ORDER BY MIN(id)) AS id,
code, name, type,
STRING_AGG(deviceType, '&') WITHIN GROUP (ORDER BY id) AS deviceType
FROM yourTable
GROUP BY code, name, type;
One main reason for not actually doing the update is that every time new data comes in, you might possibly have to run that update, over and over. Instead, just keeping the original data and running the view occasionally might perform better here.
Note that I assume that you are using SQL Server 2017 or later. If not, then STRING_AGG would have to be replaced with an uglier approach, but you should consider upgrading in this case.
To do what you want, you would need two separate statements.
This updates the "first" row of each group with all the device types in the group:
update t
set t.devicetype = t1.devicetype
from mytable t
inner join (
select min(id) as id, string_agg(devicetype, '&') within group(order by id) as devicetype
from mytable
group by code, name, type
having count(*) > 1
) t1 on t1.id = t.id
This deletes everything but the first row per group:
with t as (
select row_number() over(partition by code, name, type order by id) rn
from mytable
)
delete from t where rn > 1
Demo on DB Fiddle
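The same two-step plan (update the first row of each group, then delete the rest) can be tried out locally. This is only a sketch in Python with sqlite3, not SQL Server: the per-group string is joined in Python rather than with STRING_AGG so that the order of the device types is deterministic, and the table name `mytable` is taken from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, code INTEGER, name TEXT, "
    "type INTEGER, devicetype TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 23, "xyz", 0, "web"),
        (2, 23, "xyz", 0, "mobile"),
        (3, 24, "xyzc", 0, "web"),
        (4, 25, "xyzc", 0, "web"),
    ],
)

# Collect the device types of each (code, name, type) group in id order.
# Each entry maps the group key to (id of first row, list of device types).
groups = {}
for row_id, code, name, type_, device in conn.execute(
    "SELECT id, code, name, type, devicetype FROM mytable ORDER BY id"
):
    first_id, devices = groups.setdefault((code, name, type_), (row_id, []))
    devices.append(device)

with conn:  # run both statements in one transaction
    # Step 1: write the merged device types onto the first row of each group.
    conn.executemany(
        "UPDATE mytable SET devicetype = ? WHERE id = ?",
        [("&".join(devices), first_id) for first_id, devices in groups.values()],
    )
    # Step 2: delete every row that is not the first row of its group.
    conn.execute(
        "DELETE FROM mytable WHERE id NOT IN "
        "(SELECT MIN(id) FROM mytable GROUP BY code, name, type)"
    )

merged = conn.execute(
    "SELECT id, code, name, type, devicetype FROM mytable ORDER BY id"
).fetchall()
print(merged)
```

Note that the surviving rows keep their original ids (1, 3, 4); renumbering them as in the question's expected output would be a separate step.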

SQL script runs VERY slowly with small change

I am relatively new to SQL. I have a script that used to run very quickly (<0.5 seconds) but runs very slowly (>120 seconds) if I add one change - and I can't see why this change makes such a difference. Any help would be hugely appreciated!
This is the script and it runs quickly if I do NOT include "tt2.bulk_cnt" in line 26:
with bulksum1 as
(
select t1.membercode,
t1.schemecode,
t1.transdate
from mina_raw2 t1
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t1.membercode,
t1.schemecode,
t1.transdate
),
bulksum2 as
(
select t1.schemecode,
t1.transdate,
count(*) as bulk_cnt
from bulksum1 t1
group by t1.schemecode,
t1.transdate
having count(*) >= 10
),
results as
(
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join bulksum2 tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
)
select * from results
EDIT: I apologise for not putting enough detail in here previously - although I can use basic SQL code, I am a complete novice when it comes to databases.
Database: Oracle (I'm not sure which version, sorry)
Execution plans:
QUICK query:
Plan hash value: 1712123489
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | VIEW | |
| 3 | FILTER | |
| 4 | HASH GROUP BY | |
| 5 | VIEW | VM_NWVW_0 |
| 6 | HASH GROUP BY | |
| 7 | TABLE ACCESS FULL| MINA_RAW2 |
| 8 | TABLE ACCESS FULL | MINA_RAW2 |
---------------------------------------------
SLOW query:
Plan hash value: 1298175315
--------------------------------------------
| Id | Operation | Name |
--------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | FILTER | |
| 2 | HASH GROUP BY | |
| 3 | HASH JOIN | |
| 4 | VIEW | VM_NWVW_0 |
| 5 | HASH GROUP BY | |
| 6 | TABLE ACCESS FULL| MINA_RAW2 |
| 7 | TABLE ACCESS FULL | MINA_RAW2 |
--------------------------------------------
A few observations, and then some things to do:
1) More information is needed. In particular, how many rows are there in the MINA_RAW2 table, what indexes exist on this table, and when was the last time it was analyzed? To determine the answers to these questions, run:
SELECT COUNT(*) FROM MINA_RAW2;
SELECT TABLE_NAME, LAST_ANALYZED, NUM_ROWS
FROM USER_TABLES
WHERE TABLE_NAME = 'MINA_RAW2';
From looking at the plan output it looks like the database is doing two FULL SCANs on MINA_RAW2 - it would be nice if this could be reduced to no more than one, and hopefully none. It's always tough to tell without very detailed information about the data in the table, but at first blush it appears that an index on TRANSACTIONTYPE might be helpful. If such an index doesn't exist you might want to consider adding it.
2) Assuming that the statistics are out-of-date (as in, old, nonexistent, or a significant amount of data (> 10%) has been added, deleted, or updated since the last analysis) run the following:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(owner => 'YOUR-SCHEMA-NAME',
table_name => 'MINA_RAW2');
END;
substituting the correct schema name for "YOUR-SCHEMA-NAME" above. Remember to capitalize the schema name! If you don't know if you should or shouldn't gather statistics, err on the side of caution and do it. It shouldn't take much time.
3) Re-try your existing query after updating the table statistics. I think there's a fair chance that having up-to-date statistics in the database will solve your issues. If not:
4) This query is doing a GROUP BY on the results of a GROUP BY. This doesn't appear to be necessary as the initial GROUP BY doesn't do any grouping - instead, it appears this is being done to get the unique combinations of MEMBERCODE, SCHEMECODE, and TRANSDATE so that the count of the members by scheme and date can be determined. I think the whole query can be simplified to:
WITH cteWORKING_TRANS AS (SELECT *
FROM MINA_RAW2
WHERE TRANSACTIONTYPE IN ('RSP','SP','UNTV',
'ASTR','CN','TVIN',
'UCON','TRAS')),
cteBULKSUM AS (SELECT a.SCHEMECODE,
a.TRANSDATE,
COUNT(*) AS BULK_CNT
FROM (SELECT DISTINCT MEMBERCODE,
SCHEMECODE,
TRANSDATE
FROM cteWORKING_TRANS) a
GROUP BY a.SCHEMECODE,
a.TRANSDATE
HAVING COUNT(*) >= 10)
SELECT t.*, b.BULK_CNT
FROM cteWORKING_TRANS t
INNER JOIN cteBULKSUM b
ON b.SCHEMECODE = t.SCHEMECODE AND
b.TRANSDATE = t.TRANSDATE
I managed to remove an unnecessary subquery by putting DISTINCT inside COUNT. COUNT(DISTINCT ...) is standard SQL and is supported by Oracle as well as PostgreSQL, but do check that it gives the result you want.
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join (select t2.schemecode,
t2.transdate,
count(DISTINCT membercode) as bulk_cnt
from mina_raw2 t2
where t2.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t2.schemecode,
t2.transdate
having count(DISTINCT membercode) >= 10) tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
When you use WITH queries where a plain subquery would do, you may be kneecapping the query optimizer.
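To check that COUNT(DISTINCT membercode) really counts each member once per scheme and date, here is a small sketch in Python with sqlite3. It says nothing about Oracle's performance. The sample rows, the shortened transaction-type list, and the lowered threshold `MIN_MEMBERS` are all made up for illustration (the question uses 10).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mina_raw2 (membercode TEXT, schemecode TEXT, "
    "transdate TEXT, transactiontype TEXT)"
)
conn.executemany(
    "INSERT INTO mina_raw2 VALUES (?, ?, ?, ?)",
    [
        ("m1", "s1", "2020-01-01", "SP"),
        ("m1", "s1", "2020-01-01", "RSP"),  # same member twice: counted once
        ("m2", "s1", "2020-01-01", "SP"),
        ("m3", "s1", "2020-01-01", "XX"),   # excluded transaction type
        ("m4", "s2", "2020-01-01", "SP"),   # only one member in this group
    ],
)

MIN_MEMBERS = 2  # the question uses 10; lowered here to fit the toy data

# Count distinct members per (schemecode, transdate) in a single grouped
# subquery, then join the count back onto the detail rows.
query = """
    SELECT t1.membercode, t1.schemecode, t1.transdate,
           t1.transactiontype, b.bulk_cnt
    FROM mina_raw2 t1
    JOIN (SELECT schemecode, transdate,
                 COUNT(DISTINCT membercode) AS bulk_cnt
          FROM mina_raw2
          WHERE transactiontype IN ('RSP', 'SP')
          GROUP BY schemecode, transdate
          HAVING COUNT(DISTINCT membercode) >= ?) b
      ON b.schemecode = t1.schemecode AND b.transdate = t1.transdate
    WHERE t1.transactiontype IN ('RSP', 'SP')
    ORDER BY t1.membercode, t1.transactiontype
"""
bulk_rows = conn.execute(query, (MIN_MEMBERS,)).fetchall()
print(bulk_rows)
```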

Find Duplicate Values in a column based on specific criteria

I have a table that holds actions against specific accounts, the actions are given a numbered SET of actions and within that SET they get a unique, sequential number. We ran into an issue where somehow one of the unique numbers had been duplicated and would like to check for more examples where this might have happened. The table looks a little like this:
Account | Action Set | Action No | Action Code
--------|------------|-----------|------------
001 | 1 | 1 | GEN
001 | 1 | 2 | PHO
001 | 1 | 3 | RAN
001 | 1 | 3 | GEN
002 | 1 | 1 | GEN
002 | 1 | 2 | PHO
002 | 1 | 3 | RAN
I have tried various things I've found through searches on here but can't find anything that looks like it fits my specific circumstances.
For any given account number, I would like to find where within one Action SET the same Action Number is used more than once. I also need to return the full row, not just a count of how many there are.
From the example above, I would expect to see these results, same account, same action set, same action number
Account | Action Set | Action No | Action Code
--------|------------|-----------|------------
001 | 1 | 3 | RAN
001 | 1 | 3 | GEN
I would post what I have tried so far but honestly the extent of the code I have written so far is:
SELECT
TIA
Mark
Based on your description, you can use exists:
select t.*
from t
where exists (select 1
from t t2
where t2.account = t.account and
t2.actionset = t.actionset and
t2.actionno = t.actionno and
t2.actioncode <> t.actioncode
);
EDIT:
The above assumes that the duplicated rows have different action codes. Otherwise you can use:
select t.*
from t
where (select count(*)
from t t2
where t2.account = t.account and
t2.actionset = t.actionset and
t2.actionno = t.actionno
) >= 2;
Try this one (yourtable stands in for your table name; row-value IN is not supported by every database, SQL Server for example):
Select account,actionset,actioncode,actionno
from yourtable
where (account,actionset,actionno)
IN
(
Select account,actionset,actionno from yourtable
group by account,actionset,actionno
having count(*)>1
)
Please find my solution for getting duplicate records from the table.
SELECT [Account],[ActionSet],[ActionNo],ActionCode
FROM
(
SELECT *,COUNT(*) OVER(PARTITION BY [Account],[ActionSet],[ActionNo]) as cnt
FROM [dbo].[ActionAccount]
) t where t.cnt>1
Thanks .
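The requirement (same account, same action set, same action number, full rows returned) can be checked against the question's sample data with a short sketch in Python and sqlite3. The table name `actions` and the snake_case column names are assumptions made for the example; the query joins each row to the groups that occur more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions (account TEXT, action_set INTEGER, "
    "action_no INTEGER, action_code TEXT)"
)
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?, ?)",
    [
        ("001", 1, 1, "GEN"), ("001", 1, 2, "PHO"),
        ("001", 1, 3, "RAN"), ("001", 1, 3, "GEN"),
        ("002", 1, 1, "GEN"), ("002", 1, 2, "PHO"), ("002", 1, 3, "RAN"),
    ],
)

# Find (account, action_set, action_no) combinations that occur more than
# once, then join back to return every full row in those combinations.
query = """
    SELECT a.account, a.action_set, a.action_no, a.action_code
    FROM actions a
    JOIN (SELECT account, action_set, action_no
          FROM actions
          GROUP BY account, action_set, action_no
          HAVING COUNT(*) > 1) d
      ON d.account = a.account
     AND d.action_set = a.action_set
     AND d.action_no = a.action_no
    ORDER BY a.account, a.action_set, a.action_no, a.action_code
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)
```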

Removing duplicate SQL records to permit a unique key

I have a table ('sales') in a MySQL DB which should rightfully have had a unique constraint enforced to prevent duplicates. Removing the dupes first and then setting the constraint is proving a bit tricky.
Table structure (simplified):
id (unique, autoinc)
product_id
The goal is to enforce uniqueness for product_id. The de-duping policy I want to apply is to remove all duplicate records except the most recently created, i.e. the one with the highest id.
Or to put it another way, I would like to delete only duplicate records, excluding the ids matched by the following query, whilst also preserving the existing non-duped records:
select id
from sales s
inner join (select product_id,
max(id) as maxId
from sales
group by product_id
having count(product_id) > 1) groupedByProdId on s.product_id = groupedByProdId.product_id
and s.id = groupedByProdId.maxId
I've struggled with this on two fronts: writing the query to select the correct records to delete, and the MySQL restriction that a subselect in the FROM clause of a DELETE cannot reference the same table from which data is being removed.
I checked out this answer and it seemed to deal with the subject, but it seems specific to sql-server, though I wouldn't rule out this question duplicating another.
In reply to your comment, here's a query that works in MySQL:
delete YourTable
from YourTable
inner join YourTable yt2
on YourTable.product_id = yt2.product_id
and YourTable.id < yt2.id
This would only remove duplicate rows. The inner join will filter out the latest row for each product, even if no other rows for the same product exist.
P.S. If you try to alias the table after FROM, MySQL requires you to specify the name of the database, like:
delete <DatabaseName>.yt
from YourTable yt
inner join YourTable yt2
on yt.product_id = yt2.product_id
and yt.id < yt2.id;
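Perhaps use ALTER IGNORE TABLE ... ADD UNIQUE KEY. Note that the IGNORE clause was removed from ALTER TABLE in MySQL 5.7.4, so this only works on older versions.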
For example:
describe sales;
+------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| product_id | int(11) | NO | | NULL | |
+------------+---------+------+-----+---------+----------------+
select * from sales;
+----+------------+
| id | product_id |
+----+------------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 3 |
| 6 | 2 |
+----+------------+
ALTER IGNORE TABLE sales ADD UNIQUE KEY idx1(product_id), ORDER BY id DESC;
Query OK, 6 rows affected (0.03 sec)
Records: 6 Duplicates: 3 Warnings: 0
select * from sales;
+----+------------+
| id | product_id |
+----+------------+
| 6 | 2 |
| 5 | 3 |
| 2 | 1 |
+----+------------+
See this pythian post for more information.
Note that the ids end up in reverse order. I don't think this matters, since order of the ids should not matter in a database (as far as I know!). If this displeases you however, the post linked to above shows a way to solve this problem too. However, it involves creating a temporary table which requires more hard drive space than the in-place method I posted above.
I might do the following in sql-server to eliminate the duplicates:
DELETE FROM Sales
FROM Sales
INNER JOIN Sales b ON Sales.product_id = b.product_id AND Sales.id < b.id
It looks like the analogous delete statement for mysql might be:
DELETE FROM Sales
USING Sales
INNER JOIN Sales b ON Sales.product_id = b.product_id AND Sales.id < b.id
This type of problem is easier to solve with CTEs and Ranking functions, however, you should be able to do something like the following to solve your problem:
Delete Sales
Where Exists(
Select 1
From Sales As S2
Where S2.product_id = Sales.product_id
And S2.id > Sales.Id
Having Count(*) > 0
)
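The overall sequence (delete all but the highest id per product_id, then add the unique key) can be rehearsed on the sample data from the ALTER IGNORE answer. This is a sketch in Python with sqlite3, not MySQL; the index name `idx_sales_product` is made up. SQLite accepts the NOT IN subselect on the same table, whereas MySQL rejects that form of DELETE, which is why the answers above use a self-join instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "product_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO sales (product_id) VALUES (?)",
    [(1,), (1,), (2,), (3,), (3,), (2,)],
)

with conn:
    # Keep only the most recent row (highest id) for each product_id.
    conn.execute(
        "DELETE FROM sales WHERE id NOT IN "
        "(SELECT MAX(id) FROM sales GROUP BY product_id)"
    )
    # With the duplicates gone, the unique constraint can be added.
    conn.execute("CREATE UNIQUE INDEX idx_sales_product ON sales (product_id)")

remaining = conn.execute("SELECT id, product_id FROM sales ORDER BY id").fetchall()
print(remaining)

# A further duplicate should now be refused by the unique index.
try:
    conn.execute("INSERT INTO sales (product_id) VALUES (1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```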

SQL query filtering

Using SQL Server 2005, I have a table where certain events are being logged, and I need to create a query that returns only very specific results. There's an example below:
Log:
Log_ID | FB_ID | Date | Log_Name | Log_Type
7 | 4 | 2007/11/8 | Nina | Critical
6 | 4 | 2007/11/6 | John | Critical
5 | 4 | 2007/11/6 | Mike | Critical
4 | 4 | 2007/11/6 | Mike | Critical
3 | 3 | 2007/11/3 | Ben | Critical
2 | 3 | 2007/11/1 | Ben | Critical
The query should do the following: return ONLY one row per each FB_ID, but this needs to be the one where Log_Name has changed for the first time, or if the name never changes, then the first dated row.
In layman's terms I need this to browse through a DB to check for each instance where the responsibility of a case (FB_ID) has been moved to another person, and in case it never has, then just get the original logger's name.
In the example above, I should get rows (Log_ID) 2 and 6.
Is this even possible? Right now there's a discussion going on whether the DB was just made the wrong way. :)
I imagine I need to somehow be able to store the first resulting Log_Name into a variable and then compare it with an IF condition etc. I have no idea how to do such a thing with SQL though.
Edit: Updated the date. And to clarify on this, the correct result would look like this:
Log_ID | FB_ID | Date | Log_Name | Log_Type
6 | 4 | 2007/11/6 | John | Critical
2 | 3 | 2007/11/1 | Ben | Critical
It's not the first date per FB_ID I'm after, but the row where the Log_Name is changed from the original.
Originally FB_ID 4 belongs to Mike, but the query should return the row where it moves on to John. However, it should NOT return the row where it moves further on to Nina, because the first responsibility change already happened when John got it.
In the case of Ben with FB_ID 3, the logger is never changed, so the first row for Ben should be returned.
I guess that there is a better and more performant way, but this one seems to work:
SELECT *
FROM log
WHERE log_id IN
( SELECT MIN(log_id)
FROM log
WHERE
( SELECT COUNT(DISTINCT log_name)
FROM log log2
WHERE log2.fb_id = log.fb_id ) = 1
OR log.log_name <> ( SELECT log_name
FROM log log_3
WHERE log_3.log_id =
( SELECT MIN(log_id)
FROM log log4
WHERE log4.fb_id = log.fb_id ) )
GROUP BY fb_id )
This will efficiently use an index on (fb_id, cdate, log_id). It assumes the table is called t_log and the date column cdate:
SELECT lo4.*
FROM
(
SELECT CASE WHEN ln.log_id IS NULL THEN lo2.log_id ELSE ln.log_id END AS log_id,
ROW_NUMBER() OVER (PARTITION BY lo2.fb_id ORDER BY lo2.cdate, lo2.log_id) AS rn
FROM (
SELECT
lo.*,
(
SELECT TOP 1 log_id
FROM t_log li
WHERE li.fb_id = lo.fb_id
AND li.cdate >= lo.cdate
AND li.log_id <> lo.log_id
AND li.log_name <> lo.log_name
ORDER BY
cdate, log_id
) AS next_id
FROM t_log lo
) lo2
LEFT OUTER JOIN
t_log ln
ON ln.log_id = lo2.next_id
) lo3, t_log lo4
WHERE lo3.rn = 1
AND lo4.log_id = lo3.log_id
If I've understood the problem correctly, the following SQL should do the trick:
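SELECT Log_ID, FB_ID, Date, Log_Name, Log_Type
FROM (SELECT l.*, ROW_NUMBER() OVER (PARTITION BY FB_ID ORDER BY Date, Log_ID) AS rn
FROM Log l) ranked
WHERE rn = 1
The SQL will select the row with the earliest date for each FB_ID. Note that this is the first logged row, not the row where Log_Name first changes.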
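For reference, the rule from the question (the first row whose Log_Name differs from the original logger, or the first row if the name never changes) can be written as one query and checked against the sample data. This is a sketch in Python with sqlite3 rather than SQL Server 2005, and it assumes that a lower Log_ID always means an earlier entry, which the sample suggests but the question does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (log_id INTEGER PRIMARY KEY, fb_id INTEGER, "
    "date TEXT, log_name TEXT, log_type TEXT)"
)
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?, ?, ?)",
    [
        (7, 4, "2007-11-08", "Nina", "Critical"),
        (6, 4, "2007-11-06", "John", "Critical"),
        (5, 4, "2007-11-06", "Mike", "Critical"),
        (4, 4, "2007-11-06", "Mike", "Critical"),
        (3, 3, "2007-11-03", "Ben", "Critical"),
        (2, 3, "2007-11-01", "Ben", "Critical"),
    ],
)

# For each fb_id: the first COALESCE argument is the lowest log_id whose name
# differs from the original logger's name; it is NULL when the name never
# changes, in which case the second argument (the first row) is used.
query = """
    SELECT l.log_id, l.fb_id, l.date, l.log_name
    FROM log l
    WHERE l.log_id = COALESCE(
        (SELECT MIN(c.log_id)
           FROM log c
          WHERE c.fb_id = l.fb_id
            AND c.log_name <> (SELECT f.log_name
                                 FROM log f
                                WHERE f.fb_id = l.fb_id
                                ORDER BY f.log_id
                                LIMIT 1)),
        (SELECT MIN(f2.log_id) FROM log f2 WHERE f2.fb_id = l.fb_id))
    ORDER BY l.log_id
"""
first_changes = conn.execute(query).fetchall()
print(first_changes)
```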