How to randomly update rows in SQL for entire table - sql

I have a table with a column id that I want to set to "a", "b", or "c" at random, with equal probability, but only in rows where name is "test".
UPDATE table
WHERE name="test"
SET id=
CASE WHEN rand() < 1/3 THEN "a"
CASE WHEN rand() < 2/3 THEN "b"
CASE WHEN rand() < 1 THEN "c"
I'm not sure how this would work though, because I'm not sure if rand() will run for every row?
I'm pretty new to SQL so this is a bit confusing, thanks!

This answers the original version of the question
Just use arithmetic:
UPDATE table
SET id = 1 + floor(rand() * 3)
WHERE name = 'test'
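To check that the random function really is evaluated once per row, and to map the draw to the letters from the question rather than to 1..3, here is a minimal sketch using Python's built-in sqlite3 module. The table name t, the pk column and the row counts are made up for illustration, and SQLite's random() returns a signed integer rather than a fraction, so the arithmetic differs slightly from the rand() version above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration only.
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO t (name) VALUES (?)",
    [("test",)] * 300 + [("other",)] * 50,
)

# random() is evaluated for each updated row, so every row gets its own draw.
# abs(random() % 3) is 0, 1 or 2; substr() picks the matching letter.
conn.execute(
    "UPDATE t SET id = substr('abc', 1 + abs(random() % 3), 1) "
    "WHERE name = 'test'"
)

# Rows per letter among the updated rows (roughly, not exactly, equal).
counts = dict(conn.execute(
    "SELECT id, count(*) FROM t WHERE name = 'test' GROUP BY id"
))
# Rows that did not match the WHERE clause must still be NULL.
untouched = conn.execute(
    "SELECT count(*) FROM t WHERE name = 'other' AND id IS NULL"
).fetchone()[0]
print(counts, untouched)
```

Because each row is drawn independently, the three groups come out close to equal in size but not exactly equal.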

If by "randomly and equally" you mean that the three groups should contain as nearly equal numbers of rows as possible, you can use ntile(), shown here in PostgreSQL:
CREATE TABLE public.test
( pk serial primary key,
id varchar,
name character varying
)
insert into test (name)values(generate_series(1,150)::varchar);
--99 rows of test data:
update test set name = 'test' where pk < 100;
-- partition by random and divide into 3 equal parts:
SELECT t.*, ntile(3) over(order by random())AS n from test t where name = 'test' ;
-- use this select in an update query:
UPDATE test
SET id = case sel.n when 1 then 'a'
when 2 then 'b'
when 3 then 'c'
END
FROM
(SELECT t.*, ntile(3) over(order by random())AS n
FROM test t WHERE name = 'test'
)sel
WHERE sel.pk = test.pk;
-- test result
select count(*) from test where name = 'test' group by id;
-- 33
-- 33
-- 33
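The same balanced split can be tried without a PostgreSQL server. The sketch below uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or later (needed for window functions). The temporary table split is a name introduced here; it stores the random ordering once so that the following UPDATE reads a fixed assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (pk INTEGER PRIMARY KEY, id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO test (name) VALUES (?)",
    [("test",)] * 99 + [("other",)] * 51,
)

# Materialise the random split once: ntile(3) numbers the shuffled rows
# 1, 2 or 3 in three buckets of (as near as possible) equal size.
conn.execute(
    "CREATE TEMP TABLE split AS "
    "SELECT pk, ntile(3) OVER (ORDER BY random()) AS n "
    "FROM test WHERE name = 'test'"
)

# Copy the bucket number back as a letter: 1 -> a, 2 -> b, 3 -> c.
conn.execute(
    "UPDATE test "
    "SET id = (SELECT substr('abc', s.n, 1) FROM split s WHERE s.pk = test.pk) "
    "WHERE name = 'test'"
)

group_sizes = sorted(
    row[0] for row in conn.execute(
        "SELECT count(*) FROM test WHERE name = 'test' GROUP BY id"
    )
)
print(group_sizes)
```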

Related

SQL Update all rows for one column with a list of values

How can I set all rows of a column by manually typing the values?
My table has 3 rows, I want the column named mycolumn to have 3 values a, b and c (currently those values are NULL):
update mytable set mycolumn = ('a','b','c')
ORA-00907 missing right parenthesis
EDIT: my table is very simple, I have one column ID INT NOT NULL with values 1, 2, 3 and another column mycolumn with all NULL values and I want those values to become 'a' where ID = 1, 'b' where ID=2 etc.
EDIT2: I might have a huge amount of rows, so I want to avoid typing every single ID value where to replace mycolumn. Isn't it possible to match the ID values of 1 to 3 to the values 'a', 'b', 'c' in an automatic way, something like match(ID, ('a','b','c')) perhaps
I just want to replace all values of mycolumn by increasing order of ID. ID being strictly equivalent to what I call a row number in a matrix
EDIT3: I'd like a solution which would work in a general case with all sorts of values, not only the letters of the alphabet given here for simplicity. What if for example my values to replace in mycolumn are ('oefaihfoiashfe', 'fiaohoawdihoiwahopah', 'aefohdfaohdao')? However the ID row numbers will always be a sequence from 1 to N by 1.
Obviously, you should do this in a single update. Like this:
update mytable
set mycolumn = case id when 1 then 'a' when 2 then 'b' when 3 then 'c' end
;
More compact (but also more cryptic, and only works in Oracle, while case expressions are in the SQL standard):
update mytable
set mycolumn = decode(id, 1, 'a', 2, 'b', 3, 'c')
;
Note - this only works if there really are only three rows. If you have many more rows, make sure to add where id in (1, 2, 3) at the end. Otherwise all the OTHER values (in the other rows) will be updated to null!
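The effect of the missing WHERE clause is easy to reproduce. The sketch below uses Python's sqlite3 module rather than Oracle, with two extra rows (ids 4 and 5, holding the made-up value 'keep') added for illustration; a CASE expression with no ELSE branch yields NULL for unmatched ids in both databases.

```python
import sqlite3

def run_update(with_where):
    """Run the CASE update on a fresh 5-row table and return mycolumn by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycolumn TEXT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?)",
        [(1, None), (2, None), (3, None), (4, "keep"), (5, "keep")],
    )
    sql = ("UPDATE mytable SET mycolumn = "
           "CASE id WHEN 1 THEN 'a' WHEN 2 THEN 'b' WHEN 3 THEN 'c' END")
    if with_where:
        # Restrict the update to the ids the CASE expression actually handles.
        sql += " WHERE id IN (1, 2, 3)"
    conn.execute(sql)
    return [r[0] for r in conn.execute("SELECT mycolumn FROM mytable ORDER BY id")]

without_filter = run_update(False)  # rows 4 and 5 are overwritten with NULL
with_filter = run_update(True)      # rows 4 and 5 keep their values
print(without_filter)
print(with_filter)
```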
You can try an update like the one below. This will update 1 > a, 2 > b, 3 > c, 4 > d, etc. When you reach ID 27, since there are no more letters, it will begin at a again and continue down the alphabet.
UPDATE mytable
SET mycolumn = CASE MOD (id, 26)
WHEN 0 THEN 'z'
ELSE CHR (MOD (id, 26) + 96)
END;
Update
To update based on any list of values, you can try an update statement like the one below. If you add a 4th item to the comma delimited list, ID 4 in mytable will be set to whatever you specified as the 4th value.
UPDATE mytable
SET mycolumn =
(SELECT COLUMN_VALUE
FROM (SELECT ROWNUM AS row_num, t.COLUMN_VALUE
FROM TABLE (
sys.odcivarchar2list ('oefaihfoiashfe',
'fiaohoawdihoiwahopah',
'aefohdfaohdao')) t)
WHERE row_num = id);
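Outside Oracle, where sys.odcivarchar2list is not available, the same idea of matching list position to ID can be sketched by loading the list into a small lookup table. The example below uses Python's sqlite3 module; the table new_values and its columns pos and val are names invented for this sketch.

```python
import sqlite3

values = ["oefaihfoiashfe", "fiaohoawdihoiwahopah", "aefohdfaohdao"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycolumn TEXT)")
conn.executemany("INSERT INTO mytable (id) VALUES (?)", [(i,) for i in range(1, 6)])

# Lookup table: position in the list (starting at 1) -> replacement value.
conn.execute("CREATE TEMP TABLE new_values (pos INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO new_values VALUES (?, ?)",
    list(enumerate(values, start=1)),
)

# Update only the ids that have an entry, so the other rows are left alone.
conn.execute(
    "UPDATE mytable "
    "SET mycolumn = (SELECT val FROM new_values WHERE pos = mytable.id) "
    "WHERE id IN (SELECT pos FROM new_values)"
)

result = conn.execute("SELECT id, mycolumn FROM mytable ORDER BY id").fetchall()
print(result)
```

Adding a fourth string to values would fill ID 4 on the next run, with no change to the UPDATE statement.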
Hmmm . . . A row can only have one value. Perhaps something like this to assign random values:
update mytable
set mycolumn = (case floor(dbms_random.value * 3)
when 0 then 'a' when 1 then 'b' else 'c'
end)
If you want the 3 rows to have different values a, b and c, then you will have to write 3 update statements.
update mytable set mycolumn = 'a' where id = 1;
update mytable set mycolumn = 'b' where id = 2;
update mytable set mycolumn = 'c' where id = 3;

Create a new table with columns with case statements and max function

I have some problems in creating a new table from an old one with new columns defined by case statements.
I need to add to a new table three columns, where I compute the maximum based on different conditions. Specifically,
if time is between 1 and 3, I define a variable max_var_1_3 as max((-1)*var),
if time is between 1 and 6, I define a variable max_var_1_6 as max((-1)*var),
if time is between 1 and 12, I define a variable max_var_1_12 as max((-1)*var),
The max function needs to take the maximum value of the variable var in the window between 1 and 3, 1 and 6, 1 and 12 respectively.
I wrote this
create table new as(
select t1.*,
(case when time between 1 and 3 then MAX((-1)*var)
else var
end) as max_var_1_3,
(case when time between 1 and 6 then MAX((-1)*var)
else var
end) as max_var_1_6,
(case when time between 1 and 12 then MAX((-1)*var)
else var
end) as max_var_1_12
from old_table t1
group by time
) with data primary index time
but unfortunately it does not work. The old_table already has some columns, and I would like to carry all of them over and then compare the old table with the new one. I get an error saying that something is expected between ) and ',', but I cannot work out what. I am using Teradata SQL.
Could you please help me?
Many thanks
The problem is that you have GROUP BY time in your query while trying to return all the other values with your SELECT t1.*. To make your query work as-is, you'd need to add each column from t1.* to your GROUP BY clause.
If you want to find the MAX value within the different time ranges AND also return all the rows, then you can use a window function. Something like this:
CREATE TABLE new AS (
SELECT
t1.*,
CASE
WHEN t1.time BETWEEN 1 AND 3 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 3 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_3,
CASE
WHEN t1.time BETWEEN 1 AND 6 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 6 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_6,
CASE
WHEN t1.time BETWEEN 1 AND 12 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 12 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_12
FROM old_table t1
) WITH DATA PRIMARY INDEX (time)
;
Here's the logic:
check if a row falls in the range
if it does, return the desired MAX value for rows in that range
otherwise, just return that given row's default value (var)
return all rows along with the three new columns
If you have performance issues, you could also move the max_var calculations to a CTE, since they only need to be calculated once. Also to avoid confusion, you may want to explicitly specify the values in your SELECT instead of using t1.*.
I don't have a TD system to test, but try it out and see if that works.
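The CTE variant mentioned above can be sketched as follows. This runs on SQLite through Python's sqlite3 module, not on Teradata, and the five sample rows are invented, so treat it as a check of the logic rather than of Teradata syntax. The CTE named m (a name chosen here) computes the three maxima once and is cross-joined to every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table (time INTEGER, var INTEGER)")
# Invented sample data: (time, var)
conn.executemany(
    "INSERT INTO old_table VALUES (?, ?)",
    [(1, 5), (2, -3), (4, -7), (8, 2), (13, 9)],
)

sql = """
WITH m AS (
    -- One-row CTE holding the maximum of -var for each time window
    SELECT
        MAX(CASE WHEN time BETWEEN 1 AND 3  THEN -var END) AS m3,
        MAX(CASE WHEN time BETWEEN 1 AND 6  THEN -var END) AS m6,
        MAX(CASE WHEN time BETWEEN 1 AND 12 THEN -var END) AS m12
    FROM old_table
)
SELECT
    t.time,
    t.var,
    CASE WHEN t.time BETWEEN 1 AND 3  THEN m.m3  ELSE t.var END AS max_var_1_3,
    CASE WHEN t.time BETWEEN 1 AND 6  THEN m.m6  ELSE t.var END AS max_var_1_6,
    CASE WHEN t.time BETWEEN 1 AND 12 THEN m.m12 ELSE t.var END AS max_var_1_12
FROM old_table t
CROSS JOIN m
ORDER BY t.time
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```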
I cannot help with the CREATE TABLE AS, but the query you want is this:
SELECT
t.*,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 3) AS max_var_1_3,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 6) AS max_var_1_6,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 12) AS max_var_1_12
FROM old_table t;

How do I determine if a group of data exists in a table, given the data that should appear in the group's rows?

I am writing data to a table and allocating a "group-id" for each batch of data that is written. To illustrate, consider the following table.
GroupId Value
------- -----
1 a
1 b
1 c
2 a
2 b
3 a
3 b
3 c
3 d
In this example, there are three groups of data, each with similar but varying values.
How do I query this table to find a group that contains a given set of values? For instance, if I query for (a,b,c) the result should be group 1. Similarly, a query for (b,a) should result in group 2, and a query for (a, b, c, e) should result in the empty set.
I can write a stored procedure that performs the following steps:
select distinct GroupId from Groups -- and store locally
for each distinct GroupId: perform a set-difference (except) between the input and table values (for the group), and vice versa
return the GroupId if both set-difference operations produced empty sets
This seems a bit excessive, and I am hoping to leverage some other commands in SQL to simplify it. Is there a simpler way to perform a set comparison in this context, or to select the group ID that contains exactly the input values for the query?
This is a set-within-sets query. I like to solve it using group by and having:
select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
sum(case when value = 'b' then 1 else 0 end) > 0 and
sum(case when value = 'c' then 1 else 0 end) > 0 and
sum(case when value not in ('a', 'b', 'c') then 1 else 0 end) = 0;
The first three conditions in the having clause check that each element exists. The last condition checks that there are no other values. This method is quite flexible and handles various inclusion and exclusion conditions on the values you are looking for.
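To try the conditional-sum HAVING clause against the sample data from the question, the following sketch builds the query for an arbitrary list of values. It uses Python's sqlite3 module; the helper find_groups is a hypothetical name, and the values are passed as bound parameters rather than pasted into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (groupid INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO GroupValues VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"),
     (2, "a"), (2, "b"),
     (3, "a"), (3, "b"), (3, "c"), (3, "d")],
)

def find_groups(conn, wanted):
    """Return the group ids whose set of values is exactly `wanted`."""
    # One "this value occurs at least once" condition per wanted value.
    present = " AND ".join(
        "sum(CASE WHEN value = ? THEN 1 ELSE 0 END) > 0" for _ in wanted
    )
    marks = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT groupid FROM GroupValues GROUP BY groupid "
        f"HAVING {present} "
        # No value outside the wanted list may occur in the group.
        f"AND sum(CASE WHEN value NOT IN ({marks}) THEN 1 ELSE 0 END) = 0 "
        "ORDER BY groupid"
    )
    return [r[0] for r in conn.execute(sql, list(wanted) + list(wanted))]

print(find_groups(conn, ["a", "b", "c"]))
print(find_groups(conn, ["b", "a"]))
print(find_groups(conn, ["a", "b", "c", "e"]))
```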
EDIT:
If you want to pass in a list, you can use:
with thelist as (
select 'a' as value union all
select 'b' union all
select 'c'
)
select groupid
from GroupValues gv left outer join
thelist
on gv.value = thelist.value
group by groupid
having count(distinct gv.value) = (select count(*) from thelist) and
count(distinct (case when gv.value = thelist.value then gv.value end)) = count(distinct gv.value);
Here the having clause counts the number of matching values and makes sure that this is the same size as the list.
EDIT:
The query failed to compile because a table alias was missing; it has been updated with the correct alias.
This is kind of ugly, but it works. On larger datasets I'm not sure what performance would look like, but the nested instances of #GroupValues key off GroupID in the main table so I think as long as you have a good index on GroupID it probably wouldn't be too horrible.
If Object_ID('tempdb..#GroupValues') Is Not Null Drop Table #GroupValues
Create Table #GroupValues (GroupID Int, Val Varchar(10));
Insert #GroupValues (GroupID, Val)
Values (1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'),(3,'a'),(3,'b'),(3,'c'),(3,'d');
If Object_ID('tempdb..#FindValues') Is Not Null Drop Table #FindValues
Create Table #FindValues (Val Varchar(10));
Insert #FindValues (Val)
Values ('a'),('b'),('c');
Select Distinct gv.GroupID
From (Select Distinct GroupID
From #GroupValues) gv
Where Not Exists (Select 1
From #FindValues fv2
Where Not Exists (Select 1
From #GroupValues gv2
Where gv.GroupID = gv2.GroupID
And fv2.Val = gv2.Val))
And Not Exists (Select 1
From #GroupValues gv3
Where gv3.GroupID = gv.GroupID
And Not Exists (Select 1
From #FindValues fv3
Where gv3.Val = fv3.Val))

If null, return multiple values

I had posted a similar question earlier - a slightly different requirement here.
I have a textBox which returns the user-selected 'Number' value (e.g. 100, 200, 300).
I need to check a table MyTable for records with the Number value selected by the user. If no such records exist, I need to return the records for the default Number value of 999 instead.
MyTable:
id Number MyVal
1 100 55
2 200 66
3 400 22
4 400 12
5 999 23
6 999 24
Here's what I have so far :(Assuming textBoxInput(Number) = 300)
SELECT Myval
from MyTable
where id in (
SELECT ISNULL(
SELECT id
from MyTable
where Number=300,
select id
from MyTable
where Number = 999
)
)
So here, since Number=300 does not exist in the table, return the records for Number=999.
But when I run this query, I'm getting an error 'Subquery returned more than 1 value...'
Any suggestions/ideas?
This should work:
SELECT Myval
from MyTable
where Number = @Number
OR (NOT EXISTS(SELECT * FROM MyTable WHERE Number = @Number) AND Number = 999)
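The NOT EXISTS fallback can be exercised against the sample table from the question. This is a sketch on SQLite via Python's sqlite3 module, so the T-SQL variable becomes a named parameter; the helper name myvals and its default argument are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, Number INTEGER, MyVal INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, 100, 55), (2, 200, 66), (3, 400, 22),
     (4, 400, 12), (5, 999, 23), (6, 999, 24)],
)

def myvals(conn, number, default=999):
    """Return MyVal for `number`, or for `default` when `number` has no rows."""
    sql = (
        "SELECT MyVal FROM MyTable "
        "WHERE Number = :n "
        # The default rows qualify only when the requested number is absent.
        "OR (Number = :d AND NOT EXISTS "
        "    (SELECT 1 FROM MyTable WHERE Number = :n)) "
        "ORDER BY id"
    )
    return [r[0] for r in conn.execute(sql, {"n": number, "d": default})]

print(myvals(conn, 300))  # no rows for 300, so the 999 rows come back
print(myvals(conn, 400))  # rows for 400 exist, so the 999 rows are excluded
```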
SELECT id, Myval
FROM MyTable
WHERE id = @id
UNION
SELECT id, 999 AS Myval
FROM MyTable
WHERE id = 999
AND @id IS NULL;

SQL retrieval from tables

I have a table something like
EMPLOYEE_ID DTL_ID COLUMN_A COLUMN_B
---------------------------
JOHN 0 1 1
JOHN 1 3 1
LINN 0 1 12
SMITH 0 9 1
SMITH 1 11 12
It means for each person there will be one or more records with different DTL_ID's value (0, 1, 2 .. etc).
Now I'd like to create a T-SQL statement to retrieve the records with EMPLOYEE_ID and DTL_ID.
If the specified DTL_ID is NOT found, the record with DTL_ID=0 will be returned.
I know that I can achieve this in various ways such as checking if a row exists via EXISTS or COUNT(*) first and then retrieve the row.
However, I'd like to know other possible ways because this retrieval statement is very common in my application and my table have hundred thousand of rows.
In the above approach, I've had to retrieve twice even if the record with the DTL_ID specified exists, and I want to avoid this.
Like this:
SELECT *
FROM table
WHERE EMPLOYEE_ID = ?? AND DTL_ID = ??
UNION
SELECT *
FROM table
WHERE EMPLOYEE_ID = ?? AND DTL_ID = 0
AND NOT EXISTS (SELECT *
FROM table
WHERE EMPLOYEE_ID = ?? AND DTL_ID = ??)
You will of course have to fill in each ?? with the proper value.
If DTL_ID is always 0 or positive:
SELECT TOP 1 * FROM table
where EMPLOYEE_ID = @EmployeeID and DTL_ID in (@DTL_ID, 0)
order by DTL_ID desc
If you're working across multiple employees in a single query, etc, then you might want to use ROW_NUMBER() if your version of SQL supports it.
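For the multi-employee case, a ROW_NUMBER() version might look like the sketch below. It runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than on SQL Server, the table name employee_dtl is invented, and it relies on the same assumption as above that DTL_ID is never negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_dtl "
    "(EMPLOYEE_ID TEXT, DTL_ID INTEGER, COLUMN_A INTEGER, COLUMN_B INTEGER)"
)
conn.executemany(
    "INSERT INTO employee_dtl VALUES (?, ?, ?, ?)",
    [("JOHN", 0, 1, 1), ("JOHN", 1, 3, 1), ("LINN", 0, 1, 12),
     ("SMITH", 0, 9, 1), ("SMITH", 1, 11, 12)],
)

def fetch(conn, dtl_id):
    """One row per employee: the requested DTL_ID if present, else DTL_ID 0."""
    sql = (
        "SELECT EMPLOYEE_ID, DTL_ID, COLUMN_A, COLUMN_B FROM ("
        "  SELECT e.*, row_number() OVER ("
        "    PARTITION BY EMPLOYEE_ID ORDER BY DTL_ID DESC) AS rn "
        # Only the requested id and the fallback id 0 are candidates.
        "  FROM employee_dtl e WHERE DTL_ID IN (?, 0)"
        ") WHERE rn = 1 ORDER BY EMPLOYEE_ID"
    )
    return conn.execute(sql, (dtl_id,)).fetchall()

print(fetch(conn, 1))
print(fetch(conn, 5))
```

Each employee's candidate rows are ranked with the larger DTL_ID first, so the requested row wins whenever it exists and the DTL_ID = 0 row is used otherwise, in a single pass over the table.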
Use ISNULL(DTL_ID, 0) in your final SELECT query
SELECT E1.EMPLOYEE_ID, ISNULL(E2.DTL_ID, 0), E1.COLUMN_A, E1.COLUMN_B
FROM EMPLOYEES AS E1
LEFT JOIN EMPLOYEES AS E2
ON E1.EMPLOYEE_ID = E2.EMPLOYEE_ID AND E2.DTL_ID = 42
You can use top and union, e.g.:
declare @t table(id int, value int, c char)
insert @t values (1,0,'a'), (1,1,'b'), (1,2,'c')
declare @id int = 1;
declare @value int = 2;
select top(1) *
from
(
select *
from @t t
where t.value = @value and t.id = @id
union all
select *
from @t t
where t.value = 0 and t.id = @id
)a
order by a.value desc
If @value = 2 then the query returns 1 2 c. If @value = 3 then the query returns 1 0 a.
SELECT MAX(DTL_ID) ...
WHERE DTL_ID IN (@DTL_ID, 0)