How to update table by of the records in the table - sql

I have a table in PostgerSQL and I need to make N entries in the table twice and for the first half I need to fill in the partner_id field with the value 1 and the second half with the value partner_id = 2.
i try to `
update USERS_TABLE set user_rule_id = 1;
update USERS_TABLE set user_rule_id = 2 where USERS_TABLE.id > count(*)/2;
`

I depends a lot how precise the number of users have to be that are updated with 1 or 2.
The following would be quite unprecise,a s it doesn't take the exact number of user that already exist8after deleting some rows the numbers doesn't fit anymore.
SELECT * FROM USERS_TABLE
id
user_rule_id
1
1
2
1
3
2
4
2
5
2
SELECT 5
If you have a lot of deleted rows and want still the half of the users, you can choose following approach, which does rely on the id, but at teh actual row number
UPDATE USERS_TABLE1
set user_rule_id = CASE WHEN rn <= (SELECT count(*) FROM USERS_TABLE1)/ 2 then 1
ELSE 2 END
FROM (SELECT id, ROW_NUMBER() OVER( ORDER BY id) rn FROM USERS_TABLE1) t
WHERE USERS_TABLE1.id = t.id;
UPDATE 5
SELECT * FROM USERS_TABLE1
id
user_rule_id
1
1
2
1
3
2
4
2
5
2
SELECT 5
fiddle
In the sample case it it the same result, but when you have a lot of rows and a bunch of the deleted users, the senind will give you quite a good result

Related

SQL To delete number of items is less than required item number

I have two tables - StepModels (support plan) and FeedbackStepModels (feedback), StepModels keeps how many steps each support plan requires.
SELECT [SupportPlanID],COUNT(*)AS Steps
FROM [StepModels]
GROUP BY SupportPlanID
SupportPlanID (Steps)
-------------------------------
1 4
2 9
3 3
4 10
FeedbackStepModels keeps how many steps employee entered the system
SELECT [FeedbackID],SupportPlanID,Count(*)AS StepsNumber
FROM [FeedbackStepModels]
GROUP BY FeedbackID,SupportPlanID
FeedbackID SupportPlanID
---------------------------------------------
1 1 3 --> this suppose to be 4
2 2 9 --> Correct
3 3 0 --> this suppose to be 3
4 4 10 --> Correct
If submitted Feedback steps total is less then required total amount I want to delete this wrong entry from the database. Basically i need to delete FeedbackID 1 and 3.
I can load the data into List and compare and delete it, but want to know if we can we do this in SQL rather than C# code.
You can use the query below to remove your unwanted data by SQL Script
DELETE f
FROM FeedbackStepModels f
INNER JOIN (
SELECT [FeedbackID],SupportPlanID, Count(*) AS StepsNumber
FROM [FeedbackStepModels]
GROUP BY FeedbackID,SupportPlanID
) f_derived on f_derived_FeedbackID=f.FeedBackID and f_derived.SupportPlanID = f.SupportPlanID
INNER JOIN (
SELECT [SupportPlanID],COUNT(*)AS Steps
FROM [StepModels]
GROUP BY SupportPlanID
) s_derived on s_derived.SupportPlanID = f.SupportPlanID
WHERE f_derived.StepsNumber < s_derived.Steps
I think you want something like this.
DELETE FROM [FeedbackStepModels]
WHERE FeedbackID IN
(
SELECT a.FeedbackID
FROM
(
SELECT [FeedbackID],
SupportPlanID,
COUNT(*) AS StepsNumber
FROM [FeedbackStepModels]
GROUP BY FeedbackID,
SupportPlanID
) AS a
INNER JOIN
(
SELECT [SupportPlanID],
COUNT(*) AS Steps
FROM [StepModels]
GROUP BY SupportPlanID
) AS b ON a.SupportPlanID = b.[SupportPlanID]
WHERE a.StepsNumber < b.Steps
);

Keyset pagination with composite key

I am using oracle 12c database and I have a table with the following structure:
Id NUMBER
SeqNo NUMBER
Val NUMBER
Valid VARCHAR2
A composite primary key is created with the field Id and SeqNo.
I would like to fetch the data with Valid = 'Y' and apply ketset pagination with a page size of 3. Assume I have the following data:
Id SeqNo Val Valid
1 1 10 Y
1 2 20 N
1 3 30 Y
1 4 40 Y
1 5 50 Y
2 1 100 Y
2 2 200 Y
Expected result:
----------------------------
Page 1
----------------------------
Id SeqNo Val Valid
1 1 10 Y
1 3 30 Y
1 4 40 Y
----------------------------
Page 2
----------------------------
Id SeqNo Val Valid
1 5 50 Y
2 1 100 Y
2 2 200 Y
Offset pagination can be done like this:
SELECT * FROM table ORDER BY Id, SeqNo OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY;
However, in the actual db it has more than 5 millions of records and using OFFSET is going to slow down the query a lot. Therefore, I am looking for a ketset pagination approach (skip records using some unique fields instead of OFFSET)
Since a composite primary key is used, I need to offset the page with information from more than 1 field.
This is a sample SQL that should work in PostgreSQL (fetch 2nd page):
SELECT * FROM table WHERE (Id, SeqNo) > (1, 4) AND Valid = 'Y' ORDER BY Id, SeqNo LIMIT 3;
How do I achieve the same in oracle?
Use row_number() analytic function with ceil arithmetic fuction. Arithmetic functions don't have a negative impact on performance, and row_number() over (order by ...) expression automatically orders the data without considering the insertion order, and without adding an extra order by clause for the main query. So, consider :
select Id,SeqNo,
ceil(row_number() over (order by Id,SeqNo)/3) as page
from tab
where Valid = 'Y';
P.S. It also works for Oracle 11g, while OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY works only for Oracle 12c.
Demo
You can use order by and then fetch rows using fetch and offset like following:
Select ID, SEQ, VAL, VALID FROM TABLE
WHERE VALID = 'Y'
ORDER BY ID, SEQ
--FETCH FIRST 3 ROWS ONLY -- first page
--OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY -- second pages
--OFFSET 6 ROWS FETCH NEXT 3 ROWS ONLY -- third page
--Update--
You can use row_number analytical function as following.
Select id, seqNo, Val, valid from
(Select t.*,
Row_number(order by id, seq) as rn from table t
Where valid = 'Y')
Where ceil(rn/3) = 2 -- for page no. 2
Cheers!!

How to identify subsequent user actions based on prior visits

I want to identify the users who visited section a and then subsequently visited b. Given the following data structure. The table contains 300,000 rows and updates daily with approx. 8,000 rows:
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 1 b 0
2 1 b 0
1 3 b 1
Ideally I want a new column that flags the visit to section b. For example on the third visit User 1 visited section b for the first time. I was attempting to do this using a CASE WHEN statement but after many failed attempts I am not sure it is even possible with CASE WHEN and feel that I should take a different approach, I am just not sure what that approach should be. I do also have a date column at my disposal.
Any suggestions on a new way to approach the problem would be appreciated. Thanks!
Correlated sub-queries should be avoided at all cost when working with Redshift. Keep in mind there are no indexes for Redshift so you'd have to rescan and restitch the column data back together for each value in the parent resulting in an O(n^2) operation (in this particular case going from 300 thousand values scanned to 90 billion).
The best approach when you are looking to span a series of rows is to use an analytic function. There are a couple of options depending on how your data is structured but in the simplest case, you could use something like
select case
when section != lag(section) over (partition by userid order by visitid)
then 1
else 0
end
from ...
This assumes that your data for userid 2 increments the visitid as below. If not, you could also order by your timestamp column
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 *2* b 0
2 *3* b 0
1 3 b 1
select t.*, case when v.ts is null then 0 else 1 end as conversion
from tbl t
left join (select *
from tbl x
where section = 'b'
and exists (select 1
from tbl y
where y.userid = x.userid
and y.section = 'a'
and y.ts < x.ts)) v
on t.userid = v.userid
and t.visitid = v.visitid
and t.section = v.section
Fiddle:
http://sqlfiddle.com/#!15/5b954/5/0
I added sample timestamp data as that field is necessary to determine whether a comes before b or after b.
To incorporate analytic functions you could use:
(I've also made it so that only the first occurrence of B (after an A) will get flagged with the 1)
select t.*,
case
when v.first_b_after_a is not null
then 1
else 0
end as conversion
from tbl t
left join (select userid, min(ts) as first_b_after_a
from (select t.*,
sum( case when t.section = 'a' then 1 end)
over( partition by userid
order by ts ) as a_sum
from tbl t) x
where section = 'b'
and a_sum is not null
group by userid) v
on t.userid = v.userid
and t.ts = v.first_b_after_a
Fiddle: http://sqlfiddle.com/#!1/fa88f/2/0

SQL random number that doesn't repeat within a group

Suppose I have a table:
HH SLOT RN
--------------
1 1 null
1 2 null
1 3 null
--------------
2 1 null
2 2 null
2 3 null
I want to set RN to be a random number between 1 and 10. It's ok for the number to repeat across the entire table, but it's bad to repeat the number within any given HH. E.g.,:
HH SLOT RN_GOOD RN_BAD
--------------------------
1 1 9 3
1 2 4 8
1 3 7 3 <--!!!
--------------------------
2 1 2 1
2 2 4 6
2 3 9 4
This is on Netezza if it makes any difference. This one's being a real headscratcher for me. Thanks in advance!
To get a random number between 1 and the number of rows in the hh, you can use:
select hh, slot, row_number() over (partition by hh order by random()) as rn
from t;
The larger range of values is a bit more challenging. The following calculates a table (called randoms) with numbers and a random position in the same range. It then uses slot to index into the position and pull the random number from the randoms table:
with nums as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9
),
randoms as (
select n, row_number() over (order by random()) as pos
from nums
)
select t.hh, t.slot, hnum.n
from (select hh, randoms.n, randoms.pos
from (select distinct hh
from t
) t cross join
randoms
) hnum join
t
on t.hh = hnum.hh and
t.slot = hnum.pos;
Here is a SQLFiddle that demonstrates this in Postgres, which I assume is close enough to Netezza to have matching syntax.
I am not an expert on SQL, but probably do something like this:
Initialize a counter CNT=1
Create a table such that you sample 1 row randomly from each group and a count of null RN, say C_NULL_RN.
With probability C_NULL_RN/(10-CNT+1) for each row, assign CNT as RN
Increment CNT and go to step 2
Well, I couldn't get a slick solution, so I did a hack:
Created a new integer field called rand_inst.
Assign a random number to each empty slot.
Update rand_inst to be the instance number of that random number within this household. E.g., if I get two 3's, then the second 3 will have rand_inst set to 2.
Update the table to assign a different random number anywhere that rand_inst>1.
Repeat assignment and update until we converge on a solution.
Here's what it looks like. Too lazy to anonymise it, so the names are a little different from my original post:
/* Iterative hack to fill 6 slots with a random number between 1 and 13.
A random number *must not* repeat within a household_id.
*/
update c3_lalfinal a
set a.rand_inst = b.rnum
from (
select household_id
,slot_nbr
,row_number() over (partition by household_id,rnd order by null) as rnum
from c3_lalfinal
) b
where a.household_id = b.household_id
and a.slot_nbr = b.slot_nbr
;
update c3_lalfinal
set rnd = CAST(0.5 + random() * (13-1+1) as INT)
where rand_inst>1
;
/* Repeat until this query returns 0: */
select count(*) from (
select household_id from c3_lalfinal group by 1 having count(distinct(rnd)) <> 6
) x
;

Returning several rows from a single query, based on a value of a column

Let's say I have this table:
|Fld | Number|
1 5
2 2
And I want to make a select that retrieves as many Fld as the Number field has:
|Fld |
1
1
1
1
1
2
2
How can I achieve this? I was thinking about making a temporary table and instert data based on the Number, but I was wondering if this could be done with a single Select statement.
PS: I'm new to SQL
You can join with a numbers table:
SELECT Fld
FROM yourtable
JOIN Numbers
ON yourtable.Number <= Numbers.Number
A numbers table is just a table with a list of numbers:
Number
1
2
3
etc...
Not an great solution (since you still query your table twice, but maybe you can work from it)
SELECT t1.fld, t1.number
FROM table t1, (
SELECT ROWNUM number FROM dual
CONNECT BY LEVEL <= (SELECT MAX(number) FROM t1)) t2
WHERE t2.number<=t1.number
It generates maximum amount of rows needed and then filters it by each row.
I don't know if your RDBMS version supports it (although I rather suspect it does), but here is a recursive version:
WITH remaining (fld, times) as (SELECT fld, 1
FROM <table>
UNION ALL
SELECT a.fld, a.times + 1
FROM remaining as a
JOIN <table> as b
ON b.fld = a.fld
AND b.number > a.times)
SELECT fld
FROM remaining
ORDER BY fld
Given your source data table, it outputs this (count included for verification):
fld times
=============
1 1
1 2
1 3
1 4
1 5
2 1
2 2