Replace a column value with random values - sql

I want to replace the values in a column with randomized values:
NO LINE
-- ----
1 1
1 2
1 3
1 4
2 1
2 2
3 1
4 1
4 2
I want to randomize column NO and replace it with random values. I have 5 million records, and a script like the one below gives me 5 million unique NO values. But as you can see, NO is not unique, and I want the same random value assigned to every row that shares the same NO.
UPDATE table1
SET NO= abs(checksum(NewId())) % 100000000
I want my resulting dataset to look like this:
NO LINE
------ ----
99 1
99 2
99 3
99 4
1092 1
1092 2
3456 1
41098 1
41098 2

I would recommend rand() with a seed:
UPDATE table1
SET NO = FLOOR(rand(NO) * 100000000);
This runs a slight risk of collisions: two different NO values could be mapped to the same new value.
If the numbers do not need to be "random" you can give them consecutive values in an arbitrary order and avoid collisions:
with toupdate as (
      select t1.*,
             dense_rank() over (order by rand(NO), NO) as new_no
      from table1 t1
     )
update toupdate
    set NO = new_no;
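If collisions must be ruled out but the values should still look random, one option is to draw one random value per distinct NO and update through a mapping table. The following is only a sketch under assumptions: it uses SQLite through Python's sqlite3 module rather than SQL Server, and the mapping table no_map and its columns are hypothetical names introduced for illustration.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table1 ("NO" INTEGER, line INTEGER)')
rows = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (3, 1), (4, 1), (4, 2)]
conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)

# One replacement per distinct NO. random.sample draws without replacement,
# so two different NO values can never receive the same new value.
old_values = [r[0] for r in conn.execute('SELECT DISTINCT "NO" FROM table1')]
new_values = random.sample(range(100_000_000), len(old_values))

# no_map is a hypothetical helper table: old NO -> new NO.
conn.execute("CREATE TEMP TABLE no_map (old_no INTEGER PRIMARY KEY, new_no INTEGER)")
conn.executemany("INSERT INTO no_map VALUES (?, ?)", zip(old_values, new_values))

# Each row looks up the replacement for its own NO, so rows that shared
# a NO before the update still share one afterwards.
conn.execute(
    'UPDATE table1 SET "NO" = (SELECT new_no FROM no_map WHERE old_no = table1."NO")'
)

result = conn.execute('SELECT "NO", line FROM table1 ORDER BY rowid').fetchall()
for row in result:
    print(row)
```

The same idea should carry over to other databases by filling the mapping table with whatever random function the database offers, provided the mapping is checked for duplicates.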

Related

How can I alternate between 0 and 1 values in SQL Server?

I want to create a SELECT that alternates between 1 and 0.
My table looks like this:
id1 id2 al
11 1 1
40 1 0
12 1 0
237 1 1
but I want it to look like this:
id1 id2 al
40 1 0
11 1 1
12 1 0
237 1 1
I want to keep the same values in my table; I just want to reorder the rows so that they alternate between 0 and 1.
Consider:
select *
from mytable
order by row_number() over(partition by al order by id1), al
This alternates 0 and 1 values. If the groups have different numbers of rows, then once the smaller group is exhausted, all remaining rows of the other group appear at the end of the result set.
I am unsure which column you want to use to order the rows within each group. I assumed id1, but you might want to change that to match your actual requirement.
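To see the interleaving on the sample data, here is a small sketch that runs an equivalent query in SQLite through Python's sqlite3 module (an assumption, since the question is about SQL Server; window functions need SQLite 3.25 or later). The row number is computed in a subquery and aliased rn, which is a name introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id1 INTEGER, id2 INTEGER, al INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(11, 1, 1), (40, 1, 0), (12, 1, 0), (237, 1, 1)],
)

# Number the rows inside each al group, then sort by that number first:
# the 1st row of each group comes out together, then the 2nd, and so on.
query = """
    SELECT id1, id2, al
    FROM (
        SELECT id1, id2, al,
               row_number() OVER (PARTITION BY al ORDER BY id1) AS rn
        FROM mytable
    )
    ORDER BY rn, al
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With id1 as the ordering column inside each group, the al values come out as 0, 1, 0, 1, although the id1 order differs from the one shown in the question.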

How to update a column with incrementally sequenced values that change depending on other column value

I am trying to update a column in a table so that the Index column (which is currently arbitrary numbers) is renumbered sequentially starting at 1000 with increments of 10, and this sequence restarts every time the Group changes.
I have tried ROW_NUMBER() with PARTITION BY and tried defining a SEQUENCE, but I can't seem to get the result I'm looking for.
Table 1
ID Group Index
1 A 1
2 A 2
3 B 3
4 B 4
5 B 5
6 C 6
7 D 7
What I want:
Table 1
ID Group Index
1 A 1000
2 A 1010
3 B 1000
4 B 1010
5 B 1020
6 C 1000
7 D 1000
You can use row_number() with some arithmetic:
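select t.*,
       990 + 10 * row_number() over (partition by "group" order by id) as "index"
from t;
Note that group and index are SQL reserved words, so they are really bad column names. They have to be quoted to be usable at all (double quotes in standard SQL, square brackets in SQL Server).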
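Since the question asks for an update rather than a select, here is a sketch of one way to write the computed values back. It assumes SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) and stages the numbers in a temporary table called numbered, a hypothetical name. Other databases offer more direct forms, such as an updatable CTE or UPDATE ... FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" and "index" are reserved words, hence the double quotes.
conn.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, "group" TEXT, "index" INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "A", 2), (3, "B", 3), (4, "B", 4),
     (5, "B", 5), (6, "C", 6), (7, "D", 7)],
)

# Stage the new values: row_number() restarts at 1 for every group,
# so 990 + 10 * row_number() yields 1000, 1010, 1020, ...
conn.execute("""
    CREATE TEMP TABLE numbered AS
    SELECT id,
           990 + 10 * row_number() OVER (PARTITION BY "group" ORDER BY id) AS new_index
    FROM t
""")

# Copy the staged value back onto the matching row.
conn.execute(
    'UPDATE t SET "index" = (SELECT new_index FROM numbered WHERE numbered.id = t.id)'
)

result = conn.execute('SELECT id, "group", "index" FROM t ORDER BY id').fetchall()
for row in result:
    print(row)
```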

Adding Auto Increment Value to Column in relation to Duplicate values in Another Column

I have a large table (3 million rows and about 12 columns). I have one column that can contain duplicate values - this is my "ID" column. I have a second column "NUM_ID" that I would like to have it start at the value of 1 for every unique "ID". Then - if I run into a duplicate value - "NUM_ID" would then bump up one value (to 2) and so on. For example:
ID NUM_ID
1 1
2 1
2 2
2 3
3 1
3 2
4 1
5 1
5 2
5 3
5 4
Again, "ID" is pre-populated, and I cannot change this column or its values. My "NUM_ID" column is currently empty. I'm hoping there is a SQL command I can use to populate the column as shown above. I've tried using Python, but updating 3M rows takes a long time. Also, if it matters, I am using PostgreSQL.
Help? Thanks!
If you are using SQL Server, you should use ROW_NUMBER() as below:
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) NUM_ID FROM #TM
Result:
ID NUM_ID
1 1
2 1
2 2
2 3
3 1
3 2
4 1
5 1
5 2
5 3
5 4
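The query above only selects the numbering. To actually fill the empty NUM_ID column, each row needs some unique handle to join back on. The sketch below assumes SQLite through Python's sqlite3 module (SQLite 3.25 or later) and uses SQLite's built-in rowid as that handle; in PostgreSQL a primary key or similar unique column would have to play that role. The table name tm and the staging table numbered are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tm (id INTEGER, num_id INTEGER)")
ids = [1, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5]
conn.executemany("INSERT INTO tm (id) VALUES (?)", [(i,) for i in ids])

# Number the rows within each id. rowid is SQLite-specific and serves
# both as the tie-breaker and as the key for writing the result back.
conn.execute("""
    CREATE TEMP TABLE numbered AS
    SELECT rowid AS rid,
           row_number() OVER (PARTITION BY id ORDER BY rowid) AS rn
    FROM tm
""")

# A single set-based UPDATE instead of one statement per row.
conn.execute("UPDATE tm SET num_id = (SELECT rn FROM numbered WHERE rid = tm.rowid)")

result = conn.execute("SELECT id, num_id FROM tm ORDER BY rowid").fetchall()
for row in result:
    print(row)
```

Doing the work in one statement inside the database is usually the reason this is faster than looping over rows from client code.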

Assign value to a column based on values of other columns in the same table

I have a table with columns Date and Order. I want to add a column named Batch to this table, filled as follows: for each Date, we start from the first Order and group every two orders into one batch.
This means that for the records with Date = 1 in this example (the first 4 records), the first two records (Order = 10 and Order = 30) get Batch = 1, the next two records (Order = 80 and Order = 110) get Batch = 2, and so on.
If the number of remaining records at the end is less than the batch size (2 in this example), the remaining order(s) get a separate Batch number. In the example below, the number of records with Date = 2 is odd, so the last (5th) record gets Batch = 3.
Date Order
-----------
1 10
1 30
1 80
1 110
2 20
2 30
2 50
2 70
2 120
3 90
Desired result:
Date Order Batch
------------------
1 10 1
1 30 1
1 80 2
1 110 2
2 20 1
2 30 1
2 50 2
2 70 2
2 120 3
3 90 1
Use the analytic function row_number to get row numbers 1,2,3,... within each date. Then add one and divide by two:
select
dateid,
orderid,
trunc((row_number() over (partition by dateid order by orderid) +1 ) / 2) as batch
from mytable;
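The "add one and divide by two" step is a ceiling division by the batch size, so it generalizes to (row_number + n - 1) / n for batches of n. Here is a sketch of that, assuming SQLite through Python's sqlite3 module (SQLite 3.25 or later), where dividing two integers already truncates, so trunc() is not needed. The batch_size parameter is an addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (dateid INTEGER, orderid INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 10), (1, 30), (1, 80), (1, 110),
     (2, 20), (2, 30), (2, 50), (2, 70), (2, 120),
     (3, 90)],
)

batch_size = 2  # illustrative parameter; the question uses batches of two

# Integer division of (row number + batch_size - 1) by batch_size maps
# row numbers 1, 2 to batch 1, then 3, 4 to batch 2, and so on.
query = """
    SELECT dateid,
           orderid,
           (row_number() OVER (PARTITION BY dateid ORDER BY orderid) + ? - 1) / ? AS batch
    FROM mytable
    ORDER BY dateid, orderid
"""
result = conn.execute(query, (batch_size, batch_size)).fetchall()
for row in result:
    print(row)
```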

Inserting a new indicator column to tell if a given row maximizes another column in SQL

I currently have a table in SQL that looks like this
PRODUCT_ID_1 PRODUCT_ID_2 SCORE
1 2 10
1 3 100
1 10 3000
2 10 10
3 35 100
3 2 1001
That is, PRODUCT_ID_1,PRODUCT_ID_2 is a primary key for this table.
What I would like to do is add a column to this table that tells whether or not the current row is the one that maximizes SCORE for its value of PRODUCT_ID_1.
In other words, what I would like to get is the following table:
PRODUCT_ID_1 PRODUCT_ID_2 SCORE IS_MAX_SCORE_FOR_ID_1
1 2 10 0
1 3 100 0
1 10 3000 1
2 10 10 1
3 35 100 0
3 2 1001 1
I am wondering how I can compute the IS_MAX_SCORE_FOR_ID_1 column and insert it into the table without having to create a new table.
You can try this:
Select b.PRODUCT_ID_1, b.PRODUCT_ID_2, b.SCORE,
       (Case when b.Score =
                  (Select Max(a.Score) from TableName a where a.PRODUCT_ID_1 = b.PRODUCT_ID_1)
             then 1 else 0 End) as IS_MAX_SCORE_FOR_ID_1
from TableName b
You can use a window function for this:
select product_id_1,
product_id_2,
score,
case
when score = max(score) over (partition by product_id_1) then 1
else 0
end as is_max_score_for_id_1
from the_table
order by product_id_1;
(The above is ANSI SQL and should run on any modern DBMS)
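Both answers compute the flag in a query. If the flag really has to be stored in the existing table, as the question asks, one possible approach is to add the column and fill it with a correlated subquery. This is a sketch assuming SQLite through Python's sqlite3 module; the ALTER TABLE syntax and the need for a stored column at all depend on the database and the use case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE the_table (
        product_id_1 INTEGER,
        product_id_2 INTEGER,
        score INTEGER,
        PRIMARY KEY (product_id_1, product_id_2)
    )
""")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [(1, 2, 10), (1, 3, 100), (1, 10, 3000),
     (2, 10, 10), (3, 35, 100), (3, 2, 1001)],
)

# Add the indicator column in place; no new table is created.
conn.execute("ALTER TABLE the_table ADD COLUMN is_max_score_for_id_1 INTEGER")

# A row gets 1 when its score equals the highest score recorded for
# the same product_id_1, otherwise 0.
conn.execute("""
    UPDATE the_table
    SET is_max_score_for_id_1 = CASE
        WHEN score = (SELECT max(m.score)
                      FROM the_table AS m
                      WHERE m.product_id_1 = the_table.product_id_1)
        THEN 1 ELSE 0
    END
""")

result = conn.execute(
    "SELECT product_id_1, product_id_2, score, is_max_score_for_id_1 "
    "FROM the_table ORDER BY rowid"
).fetchall()
for row in result:
    print(row)
```

A stored flag goes stale whenever scores change, so computing it in a query or a view, as in the answers above, may be the safer choice.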