SQL Update with CTEs not updating records

I am attempting to update a set of records that are duplicates in three particular columns. The reason for this update is that there is a conflict when trying to insert this data into an updated database schema. The conflict is caused by a new constraint that has been added on DM_ID, DM_CONTENT_TYPE_ID, and DMC_TYPE. I need to adjust the DM_CONTENT_TYPE_ID column to either 1, 3, or 5 based on the row number to get around this. A sample of the duplicate data looks as such. Notice that the first three columns are the same.
+--------+--------------------+----------+--------------------------------------+
| DM_ID | DM_CONTENT_TYPE_ID | DMC_TYPE | DMC_PATH |
+--------+--------------------+----------+--------------------------------------+
| 314457 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7897-0.tif |
| 314457 | 1 | TIF | \\DOCIMG\DR\640\0001_640_0001.tif |
| 314458 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7898-0.tif |
| 314458 | 1 | TIF | \\DOCIMG\TD\640\0002_640_0001.tif |
| 314460 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7900-0.tif |
| 314460 | 1 | TIF | \\DOCIMG\ZZ\640\0003_640_0003.tif |
| 314461 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7901-0.tif |
| 314461 | 1 | TIF | \\DOCIMG\ED\6501\03_0001.tif |
| 314461 | 1 | TIF | \\DOCIMG\ZZ\640\0004_640_0004.tif |
+--------+--------------------+----------+--------------------------------------+
This is the desired output to get around the constraint issue:
+--------+--------------------+----------+--------------------------------------+
| DM_ID | DM_CONTENT_TYPE_ID | DMC_TYPE | DMC_PATH |
+--------+--------------------+----------+--------------------------------------+
| 314457 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7897-0.tif |
| 314457 | 3 | TIF | \\DOCIMG\DR\640\0001_640_0001.tif |
| 314458 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7898-0.tif |
| 314458 | 3 | TIF | \\DOCIMG\TD\640\0002_640_0001.tif |
| 314460 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7900-0.tif |
| 314460 | 3 | TIF | \\DOCIMG\ZZ\640\0003_640_0003.tif |
| 314461 | 1 | TIF | \\DOCIMG\CD\1965\19651227\7901-0.tif |
| 314461 | 3 | TIF | \\DOCIMG\ED\6501\03_0001.tif |
| 314461 | 5 | TIF | \\DOCIMG\ZZ\640\0004_640_0004.tif |
+--------+--------------------+----------+--------------------------------------+
The script I have developed is as such:
;WITH CTE AS
(SELECT -- Grab the documents that have a duplicate.
DM_ID
,DM_CONTENT_TYPE_ID
,DMC_TYPE
,COUNT(*) 'COUNT'
FROM
[DM_CONTENT]
GROUP BY
DM_ID
,DM_CONTENT_TYPE_ID
,DMC_TYPE
HAVING
COUNT(*) > 1),
CTE2 AS
(SELECT -- Designate the row number for the duplicate documents.
DMC.*
,ROW_NUMBER() OVER(PARTITION BY DMC.DM_ID, DMC.DM_CONTENT_TYPE_ID, DMC.DMC_TYPE ORDER BY DMC.DMC_PATH) AS 'ROWNUM'
FROM
[DM_CONTENT] DMC
JOIN CTE
ON DMC.DM_ID = CTE.DM_ID),
CTE3 AS
(SELECT -- Set the new document type ID based on the row number.
*
,CASE
WHEN ROWNUM = 1
THEN 1
WHEN ROWNUM = 2
THEN 3
WHEN ROWNUM = 3
THEN 5
END AS 'DM_CONTENT_TYPE_ID_NEW'
FROM
CTE2)
UPDATE -- Update the records.
DMC
SET
DMC.DM_CONTENT_TYPE_ID = CTE3.DM_CONTENT_TYPE_ID_NEW
FROM
[DM_CONTENT] DMC
JOIN CTE3
ON DMC.DM_ID = CTE3.DM_ID
Now when I execute the script, it says that the appropriate rows have been affected. However, when I check the [DM_CONTENT] table, DM_CONTENT_TYPE_ID hasn't actually been updated and still has a value of 1. If I SELECT from CTE3, DM_CONTENT_TYPE_ID_NEW is the appropriate new ID. My logic seems sound, but I cannot figure out what mistake I am making. Does anyone have any insight? Thanks in advance!

This seems much simpler to write as:
WITH toupdate AS (
SELECT DMC.*,
ROW_NUMBER() OVER (PARTITION BY DMC.DM_ID, DMC.DM_CONTENT_TYPE_ID, DMC.DMC_TYPE
ORDER BY DMC.DMC_PATH) AS ROWNUM
FROM DM_CONTENT DMC
)
UPDATE toupdate
SET DM_CONTENT_TYPE_ID = (CASE ROWNUM WHEN 2 THEN 3 WHEN 3 THEN 5 END)
WHERE ROWNUM > 1;
Now, I find it suspicious that your join conditions are only on DM_ID. I think the problem is that you are getting multiple matches between the CTE and your table. An arbitrary match is used for the update -- and that happens to be the first one encountered (hence a value of 1).
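If you would rather keep the join-based UPDATE, one way to avoid the multiple matches is to join on something that identifies a single row. The following is only a sketch in T-SQL, and it assumes that DMC_PATH is unique within a DM_ID (that holds in the sample data, but I cannot tell whether it holds for the whole table). The CTE name Numbered and the column RowNum are made up for illustration.

```sql
-- Sketch only. Assumption: (DM_ID, DMC_PATH) identifies exactly one row.
WITH Numbered AS (
    SELECT DM_ID,
           DMC_PATH,
           ROW_NUMBER() OVER (PARTITION BY DM_ID, DM_CONTENT_TYPE_ID, DMC_TYPE
                              ORDER BY DMC_PATH) AS RowNum
    FROM DM_CONTENT
)
UPDATE DMC
SET DMC.DM_CONTENT_TYPE_ID = CASE N.RowNum WHEN 2 THEN 3 WHEN 3 THEN 5 END
FROM DM_CONTENT AS DMC
JOIN Numbered AS N
  ON N.DM_ID = DMC.DM_ID
 AND N.DMC_PATH = DMC.DMC_PATH   -- one match per target row, so no arbitrary pick
WHERE N.RowNum IN (2, 3);        -- first row of each group keeps its value
```

With the extra join condition each row in DM_CONTENT pairs with exactly one numbered row, so the value that ends up being written is no longer arbitrary.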

Try
UPDATE CTE3
SET DM_CONTENT_TYPE_ID = DM_CONTENT_TYPE_ID_NEW
instead of what you're currently doing.
Updating through a CTE works a little differently than updating with regular table joins.

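Should work with any number of duplicates. Try it this way:
;WITH cte
AS (SELECT Row_number()
OVER(
partition BY dm_id, dm_content_type_id, dmc_type
ORDER BY DMC_PATH) AS Rn,
*
FROM dm_content)
UPDATE cte
SET dm_content_type_id = rn + (rn -1) -- 2 -> 3, 3 -> 5, 4 -> 7, ...
WHERE rn > 1 -- leave the first row of each group, and all non-duplicates, untouched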
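Whichever UPDATE you end up running, it may be worth checking afterwards that nothing still violates the new constraint. This is just the grouping query from the question run on its own (the alias cnt is mine); an empty result means no (DM_ID, DM_CONTENT_TYPE_ID, DMC_TYPE) combination occurs more than once.

```sql
-- Any row returned here is a combination that would still break the constraint.
SELECT DM_ID,
       DM_CONTENT_TYPE_ID,
       DMC_TYPE,
       COUNT(*) AS cnt
FROM DM_CONTENT
GROUP BY DM_ID, DM_CONTENT_TYPE_ID, DMC_TYPE
HAVING COUNT(*) > 1;
```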

Related

Query Optimization - subselect in Left Join

I'm working on optimizing a SQL query, and I found a particular section that appears to be killing my query's performance:
LEFT JOIN anothertable lastweek
AND lastweek.date>= (SELECT MAX(table.date)-7 max_date_lweek
FROM table table
WHERE table.id= lastweek.id)
AND lastweek.date< (SELECT MAX(table.date) max_date_lweek
FROM table table
WHERE table.id= lastweek.id)
I'm working on a way of optimizing these lines, but I'm stumped. If anyone has any ideas, please let me know!
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1908654 | 145057704 | 720461 | 00:00:29 |
| * 1 | HASH JOIN RIGHT OUTER | | 1908654 | 145057704 | 720461 | 00:00:29 |
| 2 | VIEW | VW_DCL_880D8DA3 | 427487 | 7694766 | 716616 | 00:00:28 |
| * 3 | HASH JOIN | | 427487 | 39328804 | 716616 | 00:00:28 |
| 4 | VIEW | VW_SQ_2 | 7174144 | 193701888 | 278845 | 00:00:11 |
| 5 | HASH GROUP BY | | 7174144 | 294139904 | 278845 | 00:00:11 |
| 6 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| * 7 | HASH JOIN | | 8549735 | 555732775 | 429294 | 00:00:17 |
| 8 | VIEW | VW_SQ_1 | 7174144 | 172179456 | 278845 | 00:00:11 |
| 9 | HASH GROUP BY | | 7174144 | 294139904 | 278845 | 00:00:11 |
| 10 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| 11 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| * 12 | TABLE ACCESS STORAGE FULL | TASK | 1908654 | 110701932 | 2520 | 00:00:01 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 1 - access("SYS_ID"(+)="TASK"."PARENT")
* 3 - access("ITEM_2"="TASK_LWEEK"."SYS_ID")
* 3 - filter("TASK_LWEEK"."SNAPSHOT_DATE"<"MAX_DATE_LWEEK")
* 7 - access("ITEM_1"="TASK_LWEEK"."SYS_ID")
* 7 - filter("TASK_LWEEK"."SNAPSHOT_DATE">=INTERNAL_FUNCTION("MAX_DATE_LWEEK"))
* 12 - storage("TASK"."CLOSED_AT" IS NULL OR "TASK"."CLOSED_AT">=TRUNC(SYSDATE@!)-15)
* 12 - filter("TASK"."CLOSED_AT" IS NULL OR "TASK"."CLOSED_AT">=TRUNC(SYSDATE@!)-15)
Well, you are not even showing the SELECT. I can see that it runs on Exadata (TABLE ACCESS STORAGE FULL), so perhaps you need to ask yourself why you need four accesses to the same table.
You access the main table TASK four times (lines 6, 10, 11, 12), with 170994691 rows based on the CBO's estimate. I don't know whether the statistics are up to date or whether dynamic sampling kicked in for lack of good statistics.
One solution could be to use WITH to generate intermediate results that you need several times in your outer query:
with my_set as
(SELECT id,
MAX(table.date)-7 max_date_lweek,
MAX(table.date) as max_date
FROM table
GROUP BY id)
select
.......................
from ...
left join anothertable lastweek on ( ........ )
left join my_set on ( lastweek.id = my_set.id )
where
lastweek.date >= my_set.max_date_lweek
and
lastweek.date < my_set.max_date
Please take into account that you did not provide the query, so I am guessing at a lot of things.
Since complete information is not available, I will suggest this: you are running the same subquery twice, so why not use a CTE such as
with CTE_example as (SELECT id, MAX(table.date) AS max_date_lweek
FROM table table
GROUP BY id)
Looking at your explain plan, the only table being accessed is TASK. From that, I infer that the tables in your example: ANOTHERTABLE and TABLE are actually the same table and that, therefore, you are trying to get the last week of data that exists in that table for each id value.
If all that is true, it should be much faster to use an analytic function to get the max date value for each id and then limit based on that.
Here is an example of what I mean. Note I use "dte" instead of "date", to remove confusion with the reserved word "date".
LEFT JOIN ( SELECT lastweek.*,
max(dte) OVER ( PARTITION BY id ) max_date
FROM anothertable lastweek ) lastweek
ON 1=1 -- whatever other join conditions you have, seemingly omitted from your post
AND lastweek.dte >= lastweek.max_date - 7;
Again, this only works if I am correct in thinking that table and anothertable are actually the same table.
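For completeness, here is a standalone sketch of the same idea that keeps both bounds from the question (at least max date minus 7, and strictly below the max date). The table name task and the columns sys_id and snapshot_date are my guesses taken from the predicate section of the plan, not from the actual query, so adjust them to whatever the real names are.

```sql
-- Sketch only. Assumed names: task, sys_id, snapshot_date (read off the plan).
-- One pass over the table computes the per-id maximum with an analytic function.
SELECT lw.*
FROM (
    SELECT t.*,
           MAX(t.snapshot_date) OVER (PARTITION BY t.sys_id) AS max_date
    FROM task t
) lw
WHERE lw.snapshot_date >= lw.max_date - 7   -- lower bound from the question
  AND lw.snapshot_date <  lw.max_date;      -- upper bound from the question
```

The point of the rewrite is that the per-id maximum comes from the same scan that reads the rows, instead of two extra GROUP BY passes over the table.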

MS-Script query unable to get working or to run

I am trying to create a query to show the top 24 most-viewed pages by joining 3 tables.
But, I am having trouble getting it to work. Either it has an issue with the use of UNION, JOIN or a part of the written function/script, in general.
The tables are:
+---------------+
| dbo_Good_URLs |
+---------------+
| Url |
| HTTPAlias |
| PortalID |
| page_title |
+---------------+
+-----------------+
| dbo_vw_GoodURLs |
+-----------------+
| URL |
| PortalID |
| HTTPAlias |
| Title |
+-----------------+
+-----------------------+
| dbo_analytics_history |
+-----------------------+
| URL |
| PortalId |
| HTTPAlias |
| Page_Title |
| Report_Month |
| Report_Year |
| Pageviews |
| Unique_Pageviews |
| Entrances |
| Total_Time_on_Page |
| Bounces |
| Exits |
| Avg_Time_on_Page |
| Bounce_Rate |
| Exit_Rate |
+-----------------------+
I've tried to use an IIF(Is Null(**), and I've looked through the script itself to see why UNION and JOIN don't seem to work, but I can't figure it out.
I've been playing around with this all week and it's just not coming to me.
SELECT TOP 24 dbo_Good_URLs.Url, Nz(dbo_analytics_history.Pageviews, 0) AS Total_Pageviews,
Nz(dbo_analytics_history.Pageviews, 0) AS Month1
FROM (SELECT Url FROM dbo_Good_URLs WHERE HTTPAlias IN ('x.org', 'ab.x.org'))
UNION
SELECT Url FROM dbo_vw_GoodURLs WHERE dbo_Good_URLs.HTTPAlias IN ('x.org', 'ab.x.org')
LEFT OUTER JOIN dbo_analytics_history
ON dbo_Good_URLs.Url = dbo_analytics_history.URL AND dbo_analytics_history.HTTPAlias IN ('x.org', 'ab.x.org') AND dbo_analytics_history.Report_Month = 10
GROUP BY dbo_Good_URLs.Url, dbo_analytics_history.Pageviews
ORDER BY Nz(dbo_analytics_history.Pageviews, 0) DESC;
The result I am looking for is the top 24 pages viewed for the month of October (i.e. month 10).
I would hazard a guess that you're actually looking for something like the following:
select top 24 q.url, nz(a.pageviews, 0) as Total_Pageviews, nz(a.pageviews, 0) as Month1
from
(
select dbo_good_urls.url from dbo_good_urls
where dbo_good_urls.httpalias in ('x.org', 'ab.x.org')
union
select dbo_vw_goodurls.url from dbo_vw_goodurls
where dbo_vw_goodurls.httpalias in ('x.org', 'ab.x.org')
) q
left join dbo_analytics_history a on q.url = a.url
where
a.report_month is null or a.report_month = 10
order by
nz(a.pageviews, 0) desc;
Here, the target URLs are selected by the two unioned subqueries, with the result of such union left joined to your dbo_analytics_history table on the url field.

ORDER BY FIELD LIST - Subquery returns more than 1 row

What I want to do is quite simple:
Write an SQL query that returns a set of records, ordered by a list of ids supplied through the FIELD list of my SQL.
TABLE SAMPLE
lessons
+----+----------------------+
| id | name |
+----+----------------------+
| 9 | Greedy algorithms |
| 5 | Maya civilization |
| 3 | eFront Beginner |
| 2 | eFront Intermediate |
+----+----------------------+
mod_comp_rule
+----+-----------+---------+
| id | lesson_id | comp_id |
+----+-----------+---------+
| 1  | 3         | 1       |
| 2  | 2         | 1       |
| 3  | 9         | 2       |
+----+-----------+---------+
WHAT I WANT TO GET FROM MY QUERY
SELECT * FROM lessons ORDER BY FIELD(id,'3','2','9') ASC;
MY SQL
SELECT ls.id, ls.name
FROM lessons ls
ORDER BY FIELD(ls.id,
(SELECT mcr.lesson_id FROM mod_comp_rule mcr
INNER JOIN lessons ls ON ls.id = mcr.lesson_id))
My SQL Query returned the following error
MySQL said: #1242 - Subquery returns more than 1 row
So how can I make my SQL return FIELD(id,'3','2','9') without triggering the "more than 1 row" error?
I don't see why FIELD() is needed for this. A correlated query will do what you want:
SELECT ls.id, ls.name
FROM lessons ls
ORDER BY (SELECT mcr.id FROM mod_comp_rule mcr WHERE ls.id = mcr.lesson_id);
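Two edge cases may be worth covering. A lesson with no row in mod_comp_rule gets NULL as its sort key, and a lesson with more than one rule would bring back the "more than 1 row" error. A possible variant, sketched for MySQL, joins to one position per lesson instead; the alias rule_pos is made up, and taking the smallest rule id per lesson is my assumption about what the intended order is.

```sql
-- Sketch only. rule_pos is a hypothetical alias; MIN(id) is an assumed tie-break.
SELECT ls.id, ls.name
FROM lessons ls
LEFT JOIN (
    SELECT lesson_id, MIN(id) AS rule_pos
    FROM mod_comp_rule
    GROUP BY lesson_id
) mcr ON mcr.lesson_id = ls.id
ORDER BY mcr.rule_pos IS NULL,   -- lessons without a rule go last
         mcr.rule_pos;
```

With the sample tables this returns lessons 3, 2, 9 and then 5, which has no rule.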

Pick a record based on a given value in postgres

I have a table in postgres like below,
alg_campaignid | alg_score | cp | sum
----------------+-----------+---------+----------
9829 | 30.44056 | 12.4000 | 12.4000
9880 | 29.59280 | 12.0600 | 24.4600
9882 | 29.59280 | 12.0600 | 36.5200
9827 | 29.27504 | 11.9300 | 48.4500
9821 | 29.14840 | 11.8800 | 60.3300
9881 | 29.14840 | 11.8800 | 72.2100
9883 | 29.14840 | 11.8800 | 84.0900
10026 | 28.79280 | 11.7300 | 95.8200
10680 | 10.31504 | 4.1800 | 100.0000
From this I have to select a record based on a randomly generated number from 0 to 100, i.e. the first record should be returned if the random number picked is between 0 and 12.4000, the second if the random number is between 12.4000 and 24.4600, and likewise the last if the random number is between 95.8200 and 100.0000.
For Example
if the random number picked is 8 then the first record should be returned
or
if the random number picked is 48 then the fourth record should be returned
Is it possible to do this in Postgres? If so, kindly recommend a solution.
Yes, you can do this in Postgres. If you want to generate the number in the database:
with r as (
select random() * 100 as r
)
select t.*
from table t cross join r
where t.sum >= r.r
order by t.sum
limit 1;
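To check the lookup against the examples in the question, you can pin the number instead of drawing it. The sketch below inlines the running totals from the question as a VALUES list; the names cum_sum and pick are mine, and the row picked is the first one whose running total is at or above the number.

```sql
-- Sketch only. cum_sum stands in for the question's "sum" column.
WITH t (alg_campaignid, cum_sum) AS (
    VALUES (9829, 12.40), (9880, 24.46), (9882, 36.52),
           (9827, 48.45), (9821, 60.33), (9881, 72.21),
           (9883, 84.09), (10026, 95.82), (10680, 100.00)
),
pick AS (
    SELECT 48::numeric AS r   -- fixed for checking; use random() * 100 for a real draw
)
SELECT t.*
FROM t CROSS JOIN pick
WHERE t.cum_sum >= pick.r     -- first bucket whose upper edge reaches the number
ORDER BY t.cum_sum
LIMIT 1;
```

With r fixed at 48 this returns campaign 9827, the fourth record, and with 8 it returns 9829, the first, matching the two examples in the question.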

Is there a single query that can update a "sequence number" across multiple groups?

Given a table like below, is there a single-query way to update the table from this:
| id | type_id | created_at | sequence |
|----|---------|------------|----------|
| 1 | 1 | 2010-04-26 | NULL |
| 2 | 1 | 2010-04-27 | NULL |
| 3 | 2 | 2010-04-28 | NULL |
| 4 | 3 | 2010-04-28 | NULL |
To this (note that created_at is used for ordering, and sequence is "grouped" by type_id):
| id | type_id | created_at | sequence |
|----|---------|------------|----------|
| 1 | 1 | 2010-04-26 | 1 |
| 2 | 1 | 2010-04-27 | 2 |
| 3 | 2 | 2010-04-28 | 1 |
| 4 | 3 | 2010-04-28 | 1 |
I've seen some code before that used an @ variable like the following, that I thought might work:
SET @seq = 0;
UPDATE `log` SET `sequence` = @seq := @seq + 1
ORDER BY `created_at`;
But that obviously doesn't reset the sequence to 1 for each type_id.
If there's no single-query way to do this, what's the most efficient way?
Data in this table may be deleted, so I'm planning to run a stored procedure after the user is done editing to re-sequence the table.
You can use another variable storing the previous type_id (@type_id). The query is ordered by type_id, so whenever there is a change in type_id, sequence has to be reset to 1 again.
Set @seq = 0;
Set @type_id = -1;
Update `log`
Set `sequence` = If(@type_id=(@type_id:=`type_id`), (@seq:=@seq+1), (@seq:=1))
Order By `type_id`, `created_at`;
I don't know MySQL very well, but you could use a subquery, though it may be very slow.
UPDATE `log` SET `sequence` = (
SELECT COUNT(*) FROM `log` AS log2
WHERE log2.type_id = `log`.type_id AND
log2.created_at < `log`.created_at) + 1
You'll get duplicate sequences, though, if two rows with the same type_id have the same created_at date.
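Note that MySQL normally rejects a subquery that reads the table being updated (error 1093, "You can't specify target table ... for update in FROM clause"). On MySQL 8.0 or later, which has window functions, a single statement along these lines should do the re-sequencing. Treat it as a sketch: the aliases l, r and seq are mine, and breaking created_at ties by id is my assumption.

```sql
-- Sketch only, requires MySQL 8.0+ for ROW_NUMBER().
-- The derived table is computed first, then joined back to the target rows.
UPDATE `log` AS l
JOIN (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY type_id
                              ORDER BY created_at, id) AS seq
    FROM `log`
) AS r ON r.id = l.id
SET l.`sequence` = r.seq;
```

Because id is part of the ordering, two rows with the same type_id and created_at still get distinct sequence numbers.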