How to write this query [closed] - sql

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am stuck on a query and need your help and suggestions. The situation is:
I have a table with this structure:
JOB_ID , ITEM_ID , NEW_ITEM_ID , STATUS
where JOB_ID is the primary key and STATUS can be AC or SB.
I want to write a query that selects only those rows whose STATUS is AC and for which neither ITEM_ID nor NEW_ITEM_ID appears in any row whose STATUS is SB. I have already written a query, but it takes a long time, so please help me write an optimized one. This is what I have written:
SELECT * FROM (
SELECT JOB_ID,NEW_ITEM_ID,ITEM_ID,STATUS
FROM X1
WHERE STATUS='AC'
AND NEW_ITEM_ID IS NOT NULL
MINUS
( SELECT T1.JOB_ID,T1.NEW_ITEM_ID ,T1.ITEM_ID ,T1.STATUS
FROM ( SELECT *
FROM X1
WHERE STATUS IN 'AC'
AND NEW_ITEM_ID IS NOT NULL ) T1
, ( SELECT *
FROM X1
WHERE STATUS IN ('PR','SB')
AND NEW_ITEM_ID IS NOT NULL ) T2
WHERE ( T2.ITEM_ID IN (T1.ITEM_ID,T1.NEW_ITEM_ID)
OR T2.NEW_ITEM_ID IN (T1.ITEM_ID,T1.NEW_ITEM_ID)
)
AND T1.STATUS!=T2.STATUS
)
) T
EDIT
This table is going to contain millions of records say around 30M.

The easiest way would be to have one query that selects all ITEM_IDs and NEW_ITEM_IDs whose status is SB, then another query like this:
SELECT * FROM table WHERE STATUS = 'AC' AND ITEM_ID NOT IN (the results of the previous query) AND NEW_ITEM_ID NOT IN (the results of the query for NEW_ITEM_IDs mentioned above).
It's just an idea, but with the proper syntax I think it should work.
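To make the NOT IN idea concrete, here is a minimal sketch that runs against an in-memory SQLite database from Python. The sample rows are made up, the table is called x1 as in the question, and for simplicity both columns are checked against one combined list of SB ids (an assumption, not exactly what is described above). The IS NOT NULL filters are there because a NULL inside a NOT IN list stops the predicate from ever being true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x1 (job_id INTEGER PRIMARY KEY, item_id INTEGER,
                     new_item_id INTEGER, status TEXT);
    INSERT INTO x1 VALUES
        (1, 10, 11, 'AC'),    -- 11 also appears in SB row 3: excluded
        (2, 20, 21, 'AC'),    -- no overlap with any SB row: kept
        (3, 11, NULL, 'SB'),
        (4, 30, 31, 'SB');
""")

# Every id (from either column) that occurs in an SB row, NULLs removed.
sb_ids = """
    SELECT item_id FROM x1 WHERE status = 'SB' AND item_id IS NOT NULL
    UNION
    SELECT new_item_id FROM x1 WHERE status = 'SB' AND new_item_id IS NOT NULL
"""

kept_job_ids = [row[0] for row in conn.execute(f"""
    SELECT job_id
    FROM x1
    WHERE status = 'AC'
      AND item_id NOT IN ({sb_ids})
      AND new_item_id NOT IN ({sb_ids})
    ORDER BY job_id
""")]
print(kept_job_ids)
```

On this sample data only job 2 survives. Whether NOT IN performs acceptably on 30M rows is a separate question that depends on the optimizer and the available indexes.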

Try this:
select * from status where STATUS ='AC' or (STATUS ='SB' and ITEM_ID is null) or (STATUS ='SB' and NEW_ITEM_ID is null)

It sounds like you are looking for the rows where (1) the status is AC and (2) there is no other row with status SB whose item_id or new_item_id matches?
How about:
SELECT job_id, item_id, new_item_id, status
FROM x1 a
WHERE a.status = 'AC'
AND NOT EXISTS (SELECT 1 FROM x1 b
                WHERE b.status = 'SB'
                AND ( b.new_item_id = a.item_id
                   OR b.item_id = a.new_item_id ))
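As a quick sanity check of the NOT EXISTS form, here is a small sketch using SQLite from Python. The rows are invented for illustration; the predicate is the same cross-column comparison as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x1 (job_id INTEGER PRIMARY KEY, item_id INTEGER,
                     new_item_id INTEGER, status TEXT);
    INSERT INTO x1 VALUES
        (1, 10, 11, 'AC'),   -- new_item_id 11 is the item_id of SB row 3
        (2, 20, 21, 'AC'),   -- matches no SB row
        (3, 11, 12, 'SB'),
        (4, 30, 31, 'SB'),
        (5, 40, 41, 'AC'),   -- item_id 40 is the new_item_id of SB row 6
        (6, 50, 40, 'SB');
""")

rows = conn.execute("""
    SELECT job_id, item_id, new_item_id, status
    FROM x1 a
    WHERE a.status = 'AC'
      AND NOT EXISTS (SELECT 1 FROM x1 b
                      WHERE b.status = 'SB'
                        AND (b.new_item_id = a.item_id
                          OR b.item_id = a.new_item_id))
    ORDER BY job_id
""").fetchall()
print(rows)
```

Only the AC row that has no counterpart among the SB rows is returned. If same-column matches (item_id to item_id, new_item_id to new_item_id) should also disqualify a row, they would need to be added to the OR list.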

"This table is going to contain millions of records say around 30M"
This is one crucial piece of information but a couple of other key stats are missing. How many rows match the status of 'PR','SB' and 'AC' ? How many rows have new_item_id populated? Are those columns indexed?
You 'select * from x1' in your sub-queries. SELECT * is bad practice, a bug waiting to happen. Here it is disastrous, because you don't use most of the columns, yet you force the database to read the entire row for each entry in the result sets. The longer the rows, the more expensive that is. In the sub-queries you really should be driving off just indexes if you possibly can.
Ideally, you would have an index on X1 ( STATUS, NEW_ITEM_ID, ITEM_ID, JOB_ID ). Then you wouldn't hit the table at all. But at the very least you need an index on (STATUS, NEW_ITEM_ID). An index just on STATUS won't do you any good unless STATUS is highly selective: several hundred different values, evenly distributed. (Which seems unlikely: in my experience most status columns have a handful of different states.)
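To illustrate what "not hitting the table at all" means, here is a hedged sketch using SQLite from Python rather than Oracle, so the plan output and optimizer behaviour are SQLite's and only the general idea carries over. The index name x1_status_ids and the extra payload column are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x1 (job_id INTEGER PRIMARY KEY, item_id INTEGER,
                     new_item_id INTEGER, status TEXT,
                     payload TEXT);  -- stands in for the other wide columns
    CREATE INDEX x1_status_ids
        ON x1 (status, new_item_id, item_id, job_id);
""")

# Ask SQLite how it would run a query that only touches indexed columns.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT job_id, new_item_id, item_id, status
    FROM x1
    WHERE status = 'SB' AND new_item_id IS NOT NULL
""").fetchall()

# The human-readable description is the last column of each plan row.
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

The plan should mention a covering index, meaning every column the query needs is read from the index and the table rows are never visited. Selecting payload as well (or using SELECT *) would force a lookup into the table for every matching row.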
Your posted query hits table X1 three times; that will take ages. So the main thing is to reduce the number of times you hit the table. This is where sub-query factoring can help:
with data as ( select job_id, new_item_id, item_id, status
from x1
where status in ('PR','SB', 'AC' )
and new_item_id is not null )
select t1.*
from data t1
, data t2
where t1.status = 'AC'
and t2.status in ( 'PR','SB' )
and (t2.new_item_id in ( t1.new_item_id, t1.item_id )
or t2.item_id in ( t1.new_item_id, t1.item_id ) )
/
So this query hits the table only once, and with a favourable index not even once.
If the query still takes too much time, or you can't wangle a helpful index, the other option for improving execution times against massive tables is parallel query. This option is open to you if you have an Enterprise Edition license and a server with sufficient CPUs (and both those conditions should be true if you want to run an application database with multi-million row tables).
with data as ( select /*+ parallel (x1, 4) */
job_id, new_item_id, item_id, status
from x1
...

Related

More than one row returned by a subquery used as an expression when UPDATE on multiple rows

I'm trying to update rows in a single table by splitting them into two "sets" of rows.
The top part of the set should have a status set to X and the bottom one should have a status set to status Y.
I've tried putting together a query that looks like this
WITH x_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
LIMIT 5
), y_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
((SELECT id from x_status), 'X'),
((SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (folks.ids);
When I run this query I get the following error:
pq: more than one row returned by a subquery used as an expression
This makes sense. folks.ids is expected to be a list of IDs, hence the IN clause in the UPDATE statement, but I suspect the problem is that I cannot return a list inside the VALUES statement in the FROM clause, because it turns into something like this:
(1, 2, 3, 4, 5, 5)
(6, 7, 8, 9, 1)
Is there a way to do this UPDATE using a CTE query at all? I could split it into two separate UPDATE queries, but a single CTE query would be better and, in theory, faster.
I think I understand now... if I get your problem, you want to set the status to 'X' for the five most recently registered records and 'Y' for everything else?
In that case I think the row_number() analytic function would work. It should do the job in a single statement, scanning the table twice (once to rank, once to update) and eliminating one ORDER BY. Let me know if something like this does what you seek.
with ranked as (
  select
    id, row_number() over (order by date_registered desc) as rn
  from people
  where surname = 'foo'  -- same filter as in the question
)
update people p
set
  status = case when r.rn <= 5 then 'X' else 'Y' end
from ranked r
where
  p.id = r.id
Any time you do an update from another data set, it's helpful to have a where clause that defines the relationship between the two datasets (the non-ANSI join syntax). This makes it iron-clad what you are updating.
Also I believe this code is pretty readable so it will be easier to build on if you need to make tweaks.
Let me know if I missed the boat.
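Here is a small, hedged way to try the row_number() idea without a Postgres server, using SQLite from Python. The eight sample rows are invented. The sketch uses a correlated subquery in SET instead of UPDATE ... FROM so that it also runs on older SQLite builds, so it illustrates the ranking technique rather than the exact Postgres statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, surname TEXT,
                         date_registered TEXT, status TEXT)
""")
# Eight people named 'foo', registered on consecutive days (id 8 is newest).
conn.executemany(
    "INSERT INTO people (id, surname, date_registered) VALUES (?, 'foo', ?)",
    [(i, f"2020-01-{i:02d}") for i in range(1, 9)],
)

conn.execute("""
    WITH ranked AS (
        SELECT id,
               row_number() OVER (ORDER BY date_registered DESC) AS rn
        FROM people
        WHERE surname = 'foo'
    )
    UPDATE people
    SET status = (SELECT CASE WHEN rn <= 5 THEN 'X' ELSE 'Y' END
                  FROM ranked
                  WHERE ranked.id = people.id)
    WHERE id IN (SELECT id FROM ranked)
""")

statuses = dict(conn.execute("SELECT id, status FROM people ORDER BY id"))
print(statuses)
```

The five newest registrations (ids 4 to 8) end up with 'X' and the three older ones with 'Y'.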
So after more tinkering, I've come up with a solution.
The previous query fails because the IDs in the subqueries are not grouped into arrays, so each subquery returns many rows where a single value is expected, as I suspected.
The solution is to wrap the subqueries in ARRAY(...), so that each one is returned as a single value in the ids column.
This is the query that does the job. Note that we must unnest the IDs in the WHERE clause:
WITH x_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
LIMIT 5
), y_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
(ARRAY(SELECT id from x_status), 'X'),
(ARRAY(SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (SELECT * from unnest(folks.ids));

How to rewrite CONNECT BY PRIOR Oracle style query to RECURSIVE CTE Postgres for query with correlated WHERE clause? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I have the following working query for Oracle:
select * from (
select orgId, oNdId, stamp, op,
lgin, qwe, rty,
tusid, tnid, teid,
thid, tehid, trid,
name1, name2,
xtrdta, rownum as rnum from
(
select a.*
from tblADT a
where a.orgId=? and EXISTS(
SELECT oNdId, prmsn FROM (
SELECT oNdId, rp.prmsn FROM tblOND
LEFT JOIN tblRoleprmsn rp ON rp.roleId=? AND rp.prmsn='vors'
START WITH oNdId IN (
SELECT oNdId FROM tblrnpmsn rnp
WHERE rnp.roleId=?
AND rnp.prmsn=?
)
CONNECT BY PRIOR oNdId = parentId
)
WHERE oNdId = a.oNdId OR 1 = (
CASE WHEN prmsn IS NOT NULL THEN
CASE WHEN a.oNdId IS NULL THEN 1 ELSE 0 END
END
)
)
AND op IN (?)
order by stamp desc
) WHERE rownum < (? + ? + 1)
) WHERE rnum >= (? + 1)
Now I am trying to implement an equivalent for PostgreSQL. Based on my investigation, I could use a recursive CTE.
But I have not been successful. The examples I found are all without a WHERE clause, so it is not so easy.
Could you please help me with that?
The Oracle query seems to have a few extra quirks and conditions I'm not able to understand. It's probably related to the specific use case.
In the absence of sample data I'll show you the simple case. You say:
There is a table 'tblOND' which has 2 columns 'oNdId' and 'parentId' it is a hierarchy here
Here's a query that would get all the children of nodes, according to an initial filtering predicate:
create table tblond (
ondid int primary key not null,
parentid int references tblond (ondid)
);
with recursive
n as (
select ondid, parentid, 1 as lvl
from tblond
where <search_predicate> -- initial nodes
union all
select t.ondid, t.parentid, n.lvl + 1
from n
join tblond t on t.parentid = n.ondid -- #1
)
select * from n
Recursive CTEs are not limited to hierarchies; they can walk any kind of graph. As long as you are able to express the relationship used to "walk" to the next nodes (#1), you can keep adding rows.
Also the example shows a "made up" column lvl; you can produce as many columns as you need/want.
The section before the UNION ALL is the "anchor" query that is run only once. After the UNION ALL is the "iterative" query that is run iteratively until it does not return any more rows.
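For a runnable illustration of the anchor and iterative parts, here is a hedged sketch using SQLite from Python (SQLite also supports WITH RECURSIVE). The sample hierarchy, the starting node, and the small tbladt table are all made up; the last query only suggests how the CTE result might stand in for the correlated EXISTS over CONNECT BY in the Oracle query, not a full port of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblond (ondid INTEGER PRIMARY KEY,
                         parentid INTEGER REFERENCES tblond (ondid));
    -- Two trees: 1 -> (2 -> 4), 3   and   5 -> 6
    INSERT INTO tblond VALUES (1, NULL), (2, 1), (3, 1), (4, 2),
                              (5, NULL), (6, 5);
    -- Hypothetical audit rows, each pointing at a node
    CREATE TABLE tbladt (id INTEGER PRIMARY KEY, ondid INTEGER);
    INSERT INTO tbladt VALUES (100, 4), (101, 6), (102, 3);
""")

RECURSIVE_NODES = """
    WITH RECURSIVE n AS (
        SELECT ondid, parentid, 1 AS lvl
        FROM tblond
        WHERE ondid = ?                       -- anchor: the initial nodes
        UNION ALL
        SELECT t.ondid, t.parentid, n.lvl + 1
        FROM n
        JOIN tblond t ON t.parentid = n.ondid  -- walk to the children
    )
"""

# All nodes reachable from node 1, with their depth.
subtree = conn.execute(
    RECURSIVE_NODES + "SELECT ondid, lvl FROM n ORDER BY lvl, ondid", (1,)
).fetchall()

# Rows of another table restricted to nodes inside that subtree.
audit_ids = [row[0] for row in conn.execute(
    RECURSIVE_NODES
    + "SELECT a.id FROM tbladt a "
      "WHERE a.ondid IN (SELECT ondid FROM n) ORDER BY a.id",
    (1,),
)]
print(subtree, audit_ids)
```

Because the CTE is computed once and then filtered with IN, the outer query does not need to be correlated with the recursion itself.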

Nested SQL Queries with Self JOIN - How to filter rows OUT

I have an SQLite3 database with a table that I need to filter by several factors. One such factor is to filter out rows based on the content of other rows within the same table.
From what I've researched, a self JOIN is going to be required, but I am not sure how I would do that to filter the table by several factors.
Here is a sample table of the data:
Name     Part #   Status   Amount
---------------------------------
Item 1   12345    New      $100.00
Item 2   12345    New      $15.00
Item 3   35864    Old      $132.56
Item 4   12345    Old      $15.00
What I need to do is find any Items that have the same Part #, one of them has an "Old" Status and the Amount is the same.
So, first we would get all rows with Part # "12345," and then check if any of the rows have an "Old" status with a matching Amount. In this example, we would have Item2 and Item4 as a result.
What now would need to be done is to return the REST of the rows within the table, that have a "New" Status, essentially discarding those two items.
Desired Output:
Name     Part #   Status   Amount
---------------------------------
Item 1   12345    New      $100.00
Removed all "Old" status rows and any "New" that had a matching "Part #" and "Amount" with an "Old" status. (I'm sorry, I know that's very confusing, hence my need for help).
I have looked into the following resources to try and figure this out on my own, but there are so many levels that I am getting confused.
Self-join of a subquery
ZenTut
Compare rows and columns of same table
The first two links dealt with comparing columns within the same table. The third one does seem to be a pretty similar question, but does not have a readable answer (for me, anyway).
I do Java development as well and it would be fairly simple to do this there, but I am hoping for a single SQL query (nested), if possible.
The "not exists" statement should do the trick:
select * from table t1
where t1.Status = 'New'
and not exists (select * from table t2
where t2.Status = 'Old'
and t2.Part = t1.Part
and t2.Amount = t1.Amount);
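Here is a quick, hedged check of the NOT EXISTS approach against the sample data from the question, using Python's sqlite3 module. The table name items and the column names are assumptions (the question does not give them), and amounts are stored as plain numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (name TEXT, part TEXT, status TEXT, amount REAL);
    INSERT INTO items VALUES
        ('Item 1', '12345', 'New', 100.00),
        ('Item 2', '12345', 'New',  15.00),
        ('Item 3', '35864', 'Old', 132.56),
        ('Item 4', '12345', 'Old',  15.00);
""")

# Keep 'New' rows unless an 'Old' row has the same part and amount.
remaining = conn.execute("""
    SELECT t1.name, t1.part, t1.status, t1.amount
    FROM items t1
    WHERE t1.status = 'New'
      AND NOT EXISTS (SELECT 1 FROM items t2
                      WHERE t2.status = 'Old'
                        AND t2.part = t1.part
                        AND t2.amount = t1.amount)
""").fetchall()
print(remaining)
```

Item 2 is dropped because Item 4 is an 'Old' row with the same part and amount, and the 'Old' rows never pass the status filter, so only Item 1 remains.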
This is a T-SQL answer. Hope it is translatable. If you have a big data set of matches, you might change the NOT IN to NOT EXISTS.
select *
from table
where Name not in(
    select t1.Name
    from table t1
    join table t2
      on t1.PartNumber = t2.PartNumber
     AND t1.Status='New'
     AND t2.Status='Old'
     and t1.Amount=t2.Amount)
and Status = 'New'
You could also use an inner join against a grouped select that finds the parts having an 'Old' status alongside another status:
select * from
my_table
INNER JOIN (
    select
      Part_#
    , Amount
    , count(distinct Status) as status_count
    , sum(case when Status = 'Old' then 1 else 0 end) as old_count
    from my_table
    group by Part_#, Amount
    having count(distinct Status) > 1
       and sum(case when Status = 'Old' then 1 else 0 end) > 0
) t on t.Part_# = my_table.Part_#
   and my_table.Status = 'New'
   and my_table.Amount <> t.Amount
Tried to understand what you want best I could...
SELECT DISTINCT yt.PartNum, yt.Status, yt.Amount
FROM YourTable yt
JOIN YourTable yt2
ON yt2.PartNum = yt.PartNum
AND yt2.Status = 'Old'
AND yt2.Amount != yt.Amount
WHERE yt.Status = 'New'
This gives everything with a new status that has an old status with a different price.

SQL Data Duplication Query

Greetings of the day!!!!
I have a table having multiple columns of data with different status.
Assume I have 500 rows of data with status 'Valid' and 150 rows of data with status 'chkDuplicate'.
Now I have to write a query that updates the status of these 150 records to Valid or Invalid by comparing a few columns, such as Address, City and State, for duplication.
How can I achieve this? It needs to support large tables as well.
Thanks in advance....
TABLE DEFINITION
CREATE TABLE XYZ
(
ID bigint,
ADDRESS nvarchar,
CITY nvarchar,
STATE nvarchar,
ZIP nvarchar,
STATUS nvarchar
)
Status should update based on duplication query.
Important!!!! For duplicate data, the first record should be valid and the others should be invalid. If I re-process the invalid data again, it should not disturb the valid records.
If I run query the above table should be same. Record 1,3 should be Success and 3,4 should be 'Duplicate'. Even if i have add few more 1,3 always be in Success other duplicates should be updated to 'Duplicate'.
This query returns the duplicate rows:
select tbl.data1, tbl.data2, tbl.data3
from TestTable1 tbl
inner join (
SELECT data1 , data2, data3 , COUNT(*) AS dupCount
FROM TestTable1
GROUP BY data1, data2, data3
HAVING COUNT(*) > 1
) oc on tbl.data1 = oc.data1 and tbl.data2 = oc.data2 and tbl.data3 = oc.data3
Then use a cursor to update the duplicate rows.
Adding ID to the ORDER BY clause made it work for me, even when I re-run the duplicate check multiple times.
WITH TABLE_DATA_DUPLICATE AS
(SELECT * ,ROW_NUMBER() OVER(
PARTITION BY STREET1,CITY,STATE,ZIP
ORDER BY STREET1,CITY,STATE,ZIP,ID
) NO_OF_REPEATS
FROM YOURTABLE(NOLOCK))
UPDATE TABLE_DATA_DUPLICATE SET STATUS = (CASE WHEN NO_OF_REPEATS = 1 THEN 'VALID' ELSE 'DUPLICATE' END)
Thanks everyone for support.... Cheers!!!!
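For anyone who wants to try the ROW_NUMBER() approach without SQL Server, here is a hedged sketch using SQLite from Python. SQLite cannot update through a CTE, so the sketch uses a correlated subquery instead; the column names follow the table definition in the question and the four sample rows are invented. Running it twice shows that the lowest ID in each group stays 'VALID' on re-processing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xyz (id INTEGER PRIMARY KEY, address TEXT, city TEXT,
                      state TEXT, zip TEXT, status TEXT);
    INSERT INTO xyz VALUES
        (1, '1 Main St', 'Springfield', 'IL', '62701', 'chkDuplicate'),
        (2, '1 Main St', 'Springfield', 'IL', '62701', 'chkDuplicate'),
        (3, '2 Oak Ave', 'Springfield', 'IL', '62702', 'chkDuplicate'),
        (4, '1 Main St', 'Springfield', 'IL', '62701', 'chkDuplicate');
""")

FLAG_DUPLICATES = """
    WITH ranked AS (
        SELECT id,
               row_number() OVER (
                   PARTITION BY address, city, state, zip
                   ORDER BY id            -- lowest id wins in each group
               ) AS no_of_repeats
        FROM xyz
    )
    UPDATE xyz
    SET status = (SELECT CASE WHEN no_of_repeats = 1
                              THEN 'VALID' ELSE 'DUPLICATE' END
                  FROM ranked
                  WHERE ranked.id = xyz.id)
"""

def flag_duplicates():
    """Re-rank every row and return the resulting id -> status mapping."""
    conn.execute(FLAG_DUPLICATES)
    return dict(conn.execute("SELECT id, status FROM xyz ORDER BY id"))

first_run = flag_duplicates()
second_run = flag_duplicates()
print(first_run, second_run)
```

Because the ranking ends with the ID as a tie-breaker, the outcome is deterministic and a second run leaves every status unchanged.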

Oracle HASH_JOIN_RIGHT_SEMI performance

Here is my query,
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
SHIPMENT_ITEMS is a very large table (10.1TB); id_map is a very small table (12 rows and 3 columns). This query goes through HASH_JOIN_RIGHT_SEMI and takes a very long time. SHIPMENT_ITEMS is partitioned on the ID column.
If I replace the subquery with hard-coded values, it performs a lot better:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN (1,2,3 )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
I cannot remove the subquery, as that would mean hard-coding the values.
Given that id_map is a very small table, I expect both queries to perform very similarly. Why is the first one taking much longer?
I'm actually trying to understand why this performs so badly.
I expect dynamic partition pruning to happen here, and I'm not able to come up with a reason why it's not happening.
https://docs.oracle.com/cd/E11882_01/server.112/e25523/part_avail.htm#BABHDCJG
Try the NO_UNNEST hint.
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE ID IN ( SELECT /*+ NO_UNNEST */ ID FROM id_map WHERE code = 'A' )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
With this hint the CBO will not try to unnest the subquery into a join; it will apply it as a filter instead.
Instead of using the IN operator, use EXISTS and check the query performance:
SELECT si.*
FROM SHIPMENT_ITEMS si
WHERE EXISTS ( SELECT 1 FROM id_map map WHERE map.code = 'A' AND map.ID = si.ID )
AND LAST_UPDATED BETWEEN TO_DATE('20150102','YYYYMMDD') - 1 AND TO_DATE('20150103','YYYYMMDD')
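If you want to convince yourself that the IN and EXISTS forms return the same rows before comparing their plans in Oracle, here is a small hedged sketch using SQLite from Python. The sample rows and the item_no column are made up, and dates are stored as ISO strings. It says nothing about Oracle performance or partition pruning; it only checks that the two predicates are logically equivalent on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE id_map (id INTEGER, code TEXT);
    CREATE TABLE shipment_items (item_no INTEGER PRIMARY KEY, id INTEGER,
                                 last_updated TEXT);
    INSERT INTO id_map VALUES (1, 'A'), (2, 'A'), (3, 'B');
    INSERT INTO shipment_items VALUES
        (100, 1, '2015-01-02'),
        (101, 2, '2015-01-05'),   -- outside the date window
        (102, 3, '2015-01-02'),   -- id 3 maps to code 'B'
        (103, 2, '2015-01-01');
""")

DATE_FILTER = "si.last_updated BETWEEN '2015-01-01' AND '2015-01-03'"

in_rows = conn.execute(f"""
    SELECT si.item_no
    FROM shipment_items si
    WHERE si.id IN (SELECT id FROM id_map WHERE code = 'A')
      AND {DATE_FILTER}
    ORDER BY si.item_no
""").fetchall()

exists_rows = conn.execute(f"""
    SELECT si.item_no
    FROM shipment_items si
    WHERE EXISTS (SELECT 1 FROM id_map map
                  WHERE map.code = 'A' AND map.id = si.id)
      AND {DATE_FILTER}
    ORDER BY si.item_no
""").fetchall()
print(in_rows, exists_rows)
```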