NTILE function not dividing groups evenly when run from a stored procedure

I have the following code inside a stored procedure.
    select ID,
           NTILE(2) OVER (Partition by GroupID order by newID()) as RandomSplit
    into #TempSplit
    from TableA
    where IsUpdated = 1
    Update a
    set a.SplitColumn = CASE WHEN b.RandomSplit = 1 THEN 'A'
                             WHEN b.RandomSplit = 2 THEN 'B'
                        END
    from TableA a
    inner join #TempSplit b
    on a.ID = b.ID and a.IsUpdated = 1
When I run this code manually, it works as expected and produces the table below.
GroupID | SplitColumn
1 | A
1 | A
1 | B
1 | B
2 | A
3 | A
3 | B
However, when I execute this code from the stored procedure, I get the following results:
GroupID | SplitColumn
1 | A
1 | A
1 | A
1 | B
2 | A
3 | A
3 | B
This is sample data, but when I execute the code from the stored procedure, the groups are not distributed evenly (in the real data the difference is in the thousands of rows rather than just one). I am not sure what causes this, since the same code produces an even split when I execute it manually.
I know this is a small sample, but the problem also does not affect every GroupID: GroupID = 3 always gets split into two even groups, while GroupID = 1, say, is always split unevenly.

You are creating #TempSplit only for the rows where IsUpdated = 1.
However, you are joining back to TableA on ID. If ID is duplicated in TableA, each duplicated row matches more than one row in #TempSplit, and you would get results like the ones you see.
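The effect described in this answer can be reproduced on a small scale. The following is only a sketch, assuming SQLite (3.25 or later, for window functions) driven from Python instead of SQL Server, with random() standing in for newID() and a regular table standing in for #TempSplit; the four sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ID INTEGER, GroupID INTEGER, IsUpdated INTEGER);
-- Hypothetical data: ID 3 appears twice in the same group.
INSERT INTO TableA VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1), (3, 1, 1);

-- Stand-in for "select ... into #TempSplit".
CREATE TABLE TempSplit AS
SELECT ID,
       NTILE(2) OVER (PARTITION BY GroupID ORDER BY random()) AS RandomSplit
FROM TableA
WHERE IsUpdated = 1;
""")

# NTILE itself always splits the four rows evenly: two rows per tile.
tile_sizes = conn.execute(
    "SELECT RandomSplit, COUNT(*) FROM TempSplit "
    "GROUP BY RandomSplit ORDER BY RandomSplit"
).fetchall()

table_rows = conn.execute(
    "SELECT COUNT(*) FROM TableA WHERE IsUpdated = 1"
).fetchone()[0]

# Joining back on ID alone matches each duplicated row twice.
join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM TableA a
    JOIN TempSplit b ON a.ID = b.ID AND a.IsUpdated = 1
""").fetchone()[0]

print(tile_sizes, table_rows, join_rows)
```

The join produces more rows than the table holds, so an UPDATE driven by that join has several candidate RandomSplit values for each duplicated ID and the final A/B counts no longer have to match the even split that NTILE computed.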

Related

SQL UPDATE issues

I'm currently working on an application where an UPDATE query is performed multiple times in a row on a single table, and I've run into a problem.
If the UPDATE queries end up swapping two values, e.g. updating 1 -> 2 and then 2 -> 1,
the following happens:
original | 1 -> 2 | 2 -> 1 | what i want
1 | 2 | 1 | 2
2 | 2 | 1 | 1
3 | 3 | 3 | 3
4 | 4 | 4 | 4
There are no other columns which can be used to further differentiate the tuples consistently.
Is there a way to achieve 'what i want' without restructuring the table/database? One solution I could think of is to first delete all the rows and insert the updated ones instead (this is satisfactory implementation wise) but I'd like to know whether it's doable with an UPDATE query.

Either make this one update:
update mytable
set col = case when col = 1 then 2 else 1 end
where col in (1,2);
Or three updates (by using an "impossible" value, i.e. a value that is not used in the column):
update mytable set col = -1 where col = 1;
update mytable set col = 1 where col = 2;
update mytable set col = 2 where col = -1;
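To see why the single CASE update avoids the problem, here is a small sketch, assuming SQLite driven from Python and a one-column table named mytable as in the answer; the helper name run is only for this illustration. It runs the two sequential updates from the question and then the single-statement swap on fresh copies of the same four rows.

```python
import sqlite3


def run(statements):
    """Load the four sample rows, run the statements, return col in insert order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (col INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?)", [(1,), (2,), (3,), (4,)])
    for stmt in statements:
        conn.execute(stmt)
    # rowid preserves insertion order, since there is no other column to sort by.
    return [r[0] for r in conn.execute("SELECT col FROM mytable ORDER BY rowid")]


# Two sequential updates: the second one also rewrites the rows the first just changed.
naive = run([
    "UPDATE mytable SET col = 2 WHERE col = 1",
    "UPDATE mytable SET col = 1 WHERE col = 2",
])

# One statement: every row is evaluated against its original value.
swapped = run([
    "UPDATE mytable SET col = CASE WHEN col = 1 THEN 2 ELSE 1 END WHERE col IN (1, 2)",
])

print(naive)
print(swapped)
```

The single statement works because each row's new value is computed from the value it had before the statement started, so the two groups never get mixed.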

You could select the IDs first, store them in two separate lists or variables, and then update based on those previously selected IDs.

Only select from tables that are of interest

SQL Server 2016
I have a number of tables
Table A
User | DataA
============
1 | 10
2 | 20
3 | 30
Table B
User | DataB
============
1 | 'hello'
2 | 'world'
Table C
User | DataC
============
4 | '2020-01-01'
Table D
User | DataD
============
1 | 0.34
So some users have data for A,B,C and/or D.
Table UserEnabled
User | A | B | C | D
=============================
1 | 1 | 1 | 0 | 0
2 | 1 | 1 | 0 | 0
3 | 1 | 0 | 0 | 0
4 | 0 | 0 | 1 | 0
Table UserEnabled indicates whether we are interested in any of the data in the corresponding tables A,B,C and/or D.
Now I want to join those tables on User, but I only want the columns for which the UserEnabled table has at least one user with a 1 (i.e. at least one user enabled). Ideally I want to join only the tables that are enabled, rather than filter out the columns from the disabled tables afterwards.
So as a result for all users I would get
User | DataA | DataB | DataC
===============================
1 | 10 | 'hello' | NULL
2 | 20 | 'world' | NULL
3 | 30 | NULL | NULL
4 | NULL | NULL | '2020-01-01'
No user has D enabled, so it does not show up in the query.
I was going to build a dynamic SQL statement every time I execute the query, depending on the state of UserEnabled, but I'm afraid this will perform poorly on a huge data set because the execution plan has to be created every time. I want to dynamically display only the enabled data, not columns that are entirely NULL.
Is there another way?
Usage will be a data sheet that may be generated up to a number of times per minute.

You have no choice but to approach this through dynamic SQL. A select query has a fixed set of columns defined when the query is created. No such thing as "variable" columns.
What can you do? One method is to "play a trick". Store the columns as JSON (or XML) and delete the empty columns.
Another method is to create a view that has the specific logic you need. I think you can maintain this view by altering it in a trigger, based on when data in the enabled table changes. That said, altering the view requires dynamic SQL so the code will not be pretty.

Just because I thought this could be fun.
Example
Declare @Col varchar(max) = ''
Declare @Src varchar(max) = ''
Select @Col = @Col+','+Item+'.[Data'+Item+']'
      ,@Src = @Src+'Left Join [Table'+Item+'] '+Item+' on U.[User]=['+Item+'].[User] and U.['+Item+']=1'+char(13)
 From (
        Select Item
         From ( Select A=max(A)
                      ,B=max(B)
                      ,C=max(C)
                      ,D=max(D)
                 From UserEnabled
                 Where 1=1 --<< Use any Key Initial Filter Condition Here
              ) A
         Unpivot ( value for item in (A,B,C,D)) B
         Where Value=1
      ) A
Declare @SQL varchar(max) = '
Select U.[User]'+@Col+'
 From UserEnabled U
'+@Src
--Print @SQL
Exec(@SQL)
Returns
User DataA DataB DataC
1 10 Hello NULL
2 20 World NULL
3 30 NULL NULL
4 NULL NULL 2020-01-01
The Generated SQL
Select U.[User],A.[DataA],B.[DataB],C.[DataC]
 From UserEnabled U
Left Join [TableA] A on U.[User]=[A].[User] and U.[A]=1
Left Join [TableB] B on U.[User]=[B].[User] and U.[B]=1
Left Join [TableC] C on U.[User]=[C].[User] and U.[C]=1
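The same idea (find which flags are enabled for at least one user, then assemble the column list and joins from that) can be sketched outside T-SQL. This is only an illustration, assuming SQLite driven from Python and the sample data from the question; the table names follow the TableA..TableD pattern used in the generated SQL above, and the variable names are made up here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA ("User" INTEGER, DataA INTEGER);
CREATE TABLE TableB ("User" INTEGER, DataB TEXT);
CREATE TABLE TableC ("User" INTEGER, DataC TEXT);
CREATE TABLE TableD ("User" INTEGER, DataD REAL);
CREATE TABLE UserEnabled ("User" INTEGER, A INTEGER, B INTEGER, C INTEGER, D INTEGER);
INSERT INTO TableA VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO TableB VALUES (1, 'hello'), (2, 'world');
INSERT INTO TableC VALUES (4, '2020-01-01');
INSERT INTO TableD VALUES (1, 0.34);
INSERT INTO UserEnabled VALUES (1,1,1,0,0), (2,1,1,0,0), (3,1,0,0,0), (4,0,0,1,0);
""")

# A flag column counts as enabled if at least one user has it set to 1.
flags = conn.execute(
    "SELECT MAX(A), MAX(B), MAX(C), MAX(D) FROM UserEnabled"
).fetchone()
enabled = [name for name, flag in zip("ABCD", flags) if flag == 1]

# The identifiers come from the fixed list above, never from user input.
cols = "".join(f', {t}."Data{t}" AS "Data{t}"' for t in enabled)
joins = "\n".join(
    f'LEFT JOIN "Table{t}" {t} ON U."User" = {t}."User" AND U."{t}" = 1'
    for t in enabled
)
sql = f'SELECT U."User" AS "User"{cols}\nFROM UserEnabled U\n{joins}\nORDER BY U."User"'

cur = conn.execute(sql)
header = [d[0] for d in cur.description]
rows = cur.fetchall()
print(sql)
print(header)
print(rows)
```

Because the statement text only changes when the set of enabled flags changes, the generated SQL could be cached and rebuilt only when UserEnabled is modified, rather than on every request.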

If all the relations are 1:1, you can make one query with
...
FROM u
LEFT JOIN a ON u.id = a.u_id
LEFT JOIN b ON u.id = b.u_id
LEFT JOIN c ON u.id = c.u_id
LEFT JOIN d ON u.id = d.u_id
...
and use display logic on the client to omit the irrelevant columns.
If more than one relation is 1:N, then you'd likely have to do multiple queries anyway to prevent N1xN2 results.

Update column in one table for a user based on count of required records in another table for same user without cursor

I have two tables, A and B. I need to update a column in table A for every UserID, based on how many of that user's required records exist in table B. If all of the records required for a UserID are present in table B (for example, 3 required and 3 present), IsCorrect should be 1; otherwise (for example, 5 required but only 2 present) it should be 0. Below is what I am trying to achieve:
Table A
UserID | Required | IsCorrect
----------------------------------
1 | SO;GO;PE | 1
2 | SO;GO;PE;PR | 0
3 | SO;GO;PE | 1
Table B
UserID | PPName
-----------------------
1 | SO
1 | GO
1 | PE
2 | SO
2 | GO
3 | SO
3 | GO
3 | PE
I tried an UPDATE on table A joined to the other table, but could not come up with a working one. I also do not want to use cursors because of their overhead. I know I will have to create a stored procedure for the rules, but what I am looking for is how to pass the UserIDs to it without a cursor.
This is an update for my earlier question. Thanks for the help.

Here's a solution for PostgreSQL:
update TableA
set IsCorrect =
  case when
    string_to_array(Required, ';') <@
    (select array_agg(PPName)
     from TableB
     where TableA.UserID = TableB.UserID)
  then 1
  else 0
  end;
You can also see it live on SQL Fiddle.
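For engines without array operators, the same "every required name is present" check can be approximated with string matching. The sketch below is not the PostgreSQL query above; it assumes SQLite driven from Python, the sample data from the question, no duplicate names inside Required, and names that contain no LIKE wildcards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (UserID INTEGER, Required TEXT, IsCorrect INTEGER);
CREATE TABLE TableB (UserID INTEGER, PPName TEXT);
INSERT INTO TableA (UserID, Required) VALUES
    (1, 'SO;GO;PE'), (2, 'SO;GO;PE;PR'), (3, 'SO;GO;PE');
INSERT INTO TableB VALUES
    (1, 'SO'), (1, 'GO'), (1, 'PE'),
    (2, 'SO'), (2, 'GO'),
    (3, 'SO'), (3, 'GO'), (3, 'PE');
""")

# One set-based UPDATE, no cursor: count the user's rows whose name occurs in
# the ';'-delimited Required list and compare with the number of list items.
conn.execute("""
    UPDATE TableA
    SET IsCorrect = CASE
        WHEN (SELECT COUNT(DISTINCT b.PPName)
              FROM TableB b
              WHERE b.UserID = TableA.UserID
                AND ';' || TableA.Required || ';' LIKE '%;' || b.PPName || ';%')
             = LENGTH(TableA.Required)
               - LENGTH(REPLACE(TableA.Required, ';', '')) + 1
        THEN 1 ELSE 0 END
""")

result = conn.execute(
    "SELECT UserID, IsCorrect FROM TableA ORDER BY UserID"
).fetchall()
print(result)
```

Storing the required names in a separate table with one row per name would make this check simpler and indexable; the delimiter trick is only a workaround for the list-in-a-column layout.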

Use a subquery with an aggregate function, then a CASE expression for the conditional update. Written here in MySQL's multi-table UPDATE form, where the join comes before SET:
update TableA A
inner join
(
select B.UserID, count(*) as cnt from TableB as B
group by B.UserID
) as T
on A.UserID = T.UserID
set A.IsCorrect = case when T.cnt >= 3 then 1 else 0 end

Update column in one table for a user based on count of records in another table for same user without using cursor

I have two tables, A and B. I need to update a column in table A for every UserID, based on the count of records that UserID has in the other table, according to defined rules. If the count of records in the other table is 3 and 3 are required for that UserID, mark IsCorrect as 1, otherwise 0; for example, if the count is 2 and 5 are required, IsCorrect is 0. Below is what I am trying to achieve:
Table A
UserID | Required | IsCorrect
----------------------------------
1 | SO;GO;PE | 1
2 | SO;GO;PE;PR | 0
3 | SO;GO;PE | 1
Table B
UserID | PPName
-----------------------
1 | SO
1 | GO
1 | PE
2 | SO
2 | GO
3 | SO
3 | GO
3 | PE
I tried an UPDATE on table A joined to the other table, but could not come up with a working one. I also do not want to use cursors because of their overhead. I know I will have to create a stored procedure for the rules, but what I am looking for is how to pass the UserIDs to it without a cursor.
Thanks for the help. Apologies for not formatting the table correctly :)

update A
set IsCorrect = case
when Required <= (select count(*) from B where b.UserID = A.UserID)
then 'Y' -- or 0, or whatever sense is appropriate
else 'N'
end

THIS ANSWERS THE ORIGINAL QUESTION.
Hmmm, you can use a correlated subquery and some case logic:
update a
set iscorrect = (case when required <=
(select count(*) from b where b.userid = a.userid)
then 1 else 0
end);
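A quick way to check the correlated-subquery approach is to run it on a few rows. This sketch assumes SQLite driven from Python and, as in the two answers above, a numeric Required column; the sample rows are hypothetical and only mirror the "3 required, 3 present" and "5 required, 2 present" cases mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (UserID INTEGER, Required INTEGER, IsCorrect INTEGER);
CREATE TABLE B (UserID INTEGER, PPName TEXT);
-- Hypothetical rows: users 1 and 3 need 3 records, user 2 needs 5.
INSERT INTO A (UserID, Required) VALUES (1, 3), (2, 5), (3, 3);
INSERT INTO B VALUES
    (1, 'SO'), (1, 'GO'), (1, 'PE'),
    (2, 'SO'), (2, 'GO'),
    (3, 'SO'), (3, 'GO'), (3, 'PE');
""")

# The subquery is evaluated once per row of A, so no cursor or loop is needed.
conn.execute("""
    UPDATE A
    SET IsCorrect = CASE
        WHEN Required <= (SELECT COUNT(*) FROM B WHERE B.UserID = A.UserID)
        THEN 1 ELSE 0 END
""")

result = conn.execute("SELECT UserID, IsCorrect FROM A ORDER BY UserID").fetchall()
print(result)
```

Users with no rows in B get a count of 0 from the subquery, so they are set to 0 rather than skipped.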

DB2 large update from another table

I have a table with 600 000+ rows called asset. The customer has added a new column and would like it populated with a value from another table:
ASSET TEMP
| id | ... | newcol | | id | condition |
--------------------- ------------------
|0001| ... | - | |0001| 3 |
If I try to update it all at once, it times out/claims there is a dead lock:
update asset set newcol = (
select condition from temp where asset.id = temp.id
) where newcol is null;
The way I got around it was by updating only 100 rows at a time:
update (select id, newcol from asset where newcol is null
fetch first 100 rows only) a1
set a1.newcol = (select condition from temp a2 where a1.id = a2.id);
At the moment I am making good use of the copy/paste utility, but I'd like to know of a more elegant way to do it (as well as a faster way).
I have tried putting it in a PL/SQL loop but I can't seem to get it to work with DB2 as a standalone script.
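The batch-at-a-time workaround can be automated from a client program by repeating the small update and committing after each batch until nothing is left to change. The following is a sketch of that loop only, assuming SQLite driven from Python rather than DB2; temp_tbl stands in for the TEMP table, BATCH_SIZE and the generated rows are made up, and the DB2 statement itself would keep the fetch first ... rows only form shown above.

```python
import sqlite3

BATCH_SIZE = 100  # hypothetical batch size, matching the 100 rows in the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asset (id TEXT PRIMARY KEY, newcol INTEGER);
CREATE TABLE temp_tbl (id TEXT PRIMARY KEY, condition INTEGER);
""")
ids = [f"{n:04d}" for n in range(1, 251)]
conn.executemany("INSERT INTO asset (id) VALUES (?)", [(i,) for i in ids])
# Only 240 of the 250 assets have a value to copy.
conn.executemany("INSERT INTO temp_tbl VALUES (?, 3)", [(i,) for i in ids[:240]])
conn.commit()

batches = 0
while True:
    cur = conn.execute("""
        UPDATE asset
        SET newcol = (SELECT t.condition FROM temp_tbl t WHERE t.id = asset.id)
        WHERE id IN (
            -- Pick only rows that will actually receive a value; otherwise rows
            -- without a match would stay NULL and be selected again forever.
            SELECT a.id
            FROM asset a
            JOIN temp_tbl t ON t.id = a.id
            WHERE a.newcol IS NULL AND t.condition IS NOT NULL
            LIMIT ?)
    """, (BATCH_SIZE,))
    conn.commit()  # short transactions keep locks brief
    if cur.rowcount == 0:
        break
    batches += 1

remaining = conn.execute(
    "SELECT COUNT(*) FROM asset WHERE newcol IS NULL"
).fetchone()[0]
print(batches, remaining)
```

Committing after every batch is the point of the loop: each transaction touches few rows, which is what keeps lock contention and timeouts down compared with one 600 000-row update.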
