SQL rand() dependent on another column? - sql

I have a table with the following format
id  cityname  user
1   newyork   a
2   newyork   b
3   newyork   c
4   denver    d
5   colorodo  e
6   colorodo  e
I need to add a new column named version whose value is randomly generated using rand(), and rows with the same cityname should get the same value:
id  cityname  user  version
1   newyork   a     1111111.11
2   newyork   b     1111111.11
3   newyork   c     1111111.11
4   denver    d     7845156.12
5   colorodo  e     8765589.12
6   colorodo  e     8765589.12
How can I generate random values that are the same for every row in a group?
Please help.

If you are on SQL Server 2008 or above, you can use the CHECKSUM function. Keep in mind that you may get collisions with a 4 byte hash.
SELECT *, CHECKSUM(CityName) as Version
FROM Cities
For something a bit less likely to have a collision, you could use HASHBYTES:
SELECT *, HASHBYTES('SHA1', CityName) as Version
FROM Cities
For MySQL, you can use one of the hash functions, such as SHA1, and take a substring:
SELECT *, LEFT(SHA1(CityName), 8) as Version
FROM Cities
Or just use the whole hash for stronger collision protection. Most other RDBMSs have similar hash functions.
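To see why a hash gives every row of a group the same value, here is a minimal Python sketch that mirrors the MySQL LEFT(SHA1(CityName), 8) expression above. The helper name version_for and the in-memory rows are made up for illustration; they are not part of any database API.

```python
import hashlib


def version_for(city_name: str) -> str:
    """Return the first 8 hex characters of the SHA-1 digest of the city name."""
    return hashlib.sha1(city_name.encode("utf-8")).hexdigest()[:8]


# Sample rows copied from the question: (id, cityname, user).
rows = [
    (1, "newyork", "a"),
    (2, "newyork", "b"),
    (3, "newyork", "c"),
    (4, "denver", "d"),
    (5, "colorodo", "e"),
    (6, "colorodo", "e"),
]

# The version depends only on cityname, so rows that share a city share a version.
versioned = [(row_id, city, user, version_for(city)) for row_id, city, user in rows]

for row in versioned:
    print(row)
```

Note that the value is deterministic rather than random: it looks arbitrary but is reproducible, which is what keeps it identical within a group.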

As @Mitch mentioned, you can use CHECKSUM.
You can also make it a computed column, so that it is computed automatically on INSERT or UPDATE:
ALTER TABLE tableA ADD version AS CHECKSUM(CityName) PERSISTED

Related

Merge SQL Rows in Subquery

I am trying to work with two tables on BigQuery. From table1 I want to find the accession ID of all records that are "World", and then from each of those accession numbers I want to create a column with every name in a separate row. Unfortunately, when I run this:
Select name
From `table2`
Where acc IN (Select acc
              From `table1`
              WHERE source = 'World')
Instead of getting something like this:
Acc1   Acc2   Acc3
Jeff   Jeff   Ted
Chris  Ted    Blake
Rob    Jack   Jack
I get something more like this:
row  name
1    Jeff
2    Chris
3    Rob
4    Jack
5    Jeff
6    Jack
7    Ted
8    Blake
Ultimately, I am hoping to download the data and use Python or something similar to take each name and count the number of times it shows up with each other name at a given accession number, and furthermore measure the degree to which each pairing is also found with third names in any given column, i.e. the degree to which they share a cohort. So I need to preserve the groupings that exist within each accession number, but I am struggling to find info on how one might do this.
Could anybody point me in the right direction, or tell me whether the way I am going about this is wise if that is my end goal?
Thanks!
This is not a direct answer to the question you asked. In general, it is easier to handle multiple rows rather than multiple columns.
So, I would recommend that you put each acc value in a separate row and then list the names as an array:
select t2.acc, array_agg(t2.name order by t2.name) as names
from `table2` t2
where t2.acc in (Select t1.acc
                 From `table1` t1
                 where t1.source = 'World'
                )
group by t2.acc;
Otherwise, you are going to have a challenge just naming the columns in your result set.
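Once the names are grouped per acc like this, the pair counting described in the question becomes straightforward in Python. A minimal sketch, assuming the query result has been loaded into a dict named names_by_acc (a made-up name), with sample values taken from the question's desired table:

```python
from collections import Counter
from itertools import combinations

# Hypothetical result of the array_agg query: one list of names per accession.
names_by_acc = {
    "Acc1": ["Jeff", "Chris", "Rob"],
    "Acc2": ["Jeff", "Ted", "Jack"],
    "Acc3": ["Ted", "Blake", "Jack"],
}

pair_counts = Counter()
for acc, names in names_by_acc.items():
    # Sort and de-duplicate so each pair has one canonical order per accession.
    for pair in combinations(sorted(set(names)), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common():
    print(pair, count)
```

Each key is an alphabetically ordered pair of names and each value is the number of accessions in which both names appear together.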

Set based approach to SQL Server insert where one column is calculated as Max from same column

I'm wondering if this is one of those situations where I'm forced to use a cursor or if I can use a set based approach. I've searched for several hours and also tried to come up with a solution myself to no avail.
I've got a table, SuperSupplierCodes, that contains two columns: SuperSupplierCode INT, and SupplierName NVARCHAR(50).
SuperSupplierID  SupplierName
1                21ST CENTURY GRAPHIC TECHNOLOGIES LLC
2                3D SYSTEMS
3                3G
4                A A ABRASIVOS ARGENTINOS SAIC
5                A AND F DRUCKLUFTTECHNIK GMBH
6                A BAY STATIONERS
7                A C T TOOL AND ENGINEERING LLC
8                A HERZOG AG
9                A LI T DI MONTANARI MARCO AND CO SAS
11               A RAYMOND GMBH AND CO KG
I've got a second table with millions of rows in it containing financial data as well as the SupplierName column.
LocalSupplierName
23 JAN HOFMEYER ROAD
303 TAXICAB, LLC
3D MECA SARL
3D SYSTEMS
3E CO ENVIRONMENTAL, ECO. & EN
3E COMPANY
What I need to do is insert into the SuperSupplierCodes table such that each row gets the MAX(SuperSupplierCode) from the previous row, increments it by one, and inserts that into the SuperSupplierCode column along with the SupplierName from the second table.
I've tried the following, just as a test, that I might be able to use for the insert, but of course it will only do the increment once and try to use that same value for SuperSupplierCode for every row:
SELECT s.SuperSupplierID,
       s.SupplierName,
       s.SupplierAddress,
       s.DateCreated,
       s.DateModified,
       s.SupplierCode,
       s.PlantName,
       s.id,
       x.MaxSSC
FROM SuperSupplierCodes AS s
CROSS APPLY (SELECT MAX(SuperSupplierID)+1 AS MaxSSC FROM dbo.SuperSupplierCodes) x;
I don't like using cursors unless I absolutely have to. Is there a way to do this with T-SQL in a set based manner versus using a cursor?
Create the column as an identity column and insert the existing records once with SET IDENTITY_INSERT SuperSupplierCodes ON. Then switch it OFF again; IDs for new rows will be generated and incremented automatically.
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-identity-insert-transact-sql?view=sql-server-2017
Why not something like this?
SELECT (SELECT MAX(SuperSupplierID) FROM dbo.SuperSupplierCodes) + ROW_NUMBER() OVER (ORDER BY s.DateCreated) AS SuperSupplierID,
       s.SupplierName,
       s.SupplierAddress,
       s.DateCreated,
       s.DateModified,
       s.SupplierCode,
       s.PlantName,
       s.id
FROM SuperSupplierCodes AS s;
We use the above technique at my work all the time when inserting rows. If some have existing values, you can insert them all into the table and then change the above to only update values that are currently null.
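For anyone who wants to try the MAX + ROW_NUMBER idea end to end, here is a rough sketch that uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (it needs SQLite 3.25 or later for window functions). The LocalSuppliers table name and the NOT IN filter that skips suppliers already present are my assumptions, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SuperSupplierCodes (SuperSupplierID INTEGER, SupplierName TEXT);
INSERT INTO SuperSupplierCodes VALUES (1, '3D SYSTEMS'), (2, '3G');

-- Hypothetical stand-in for the large financial table.
CREATE TABLE LocalSuppliers (LocalSupplierName TEXT);
INSERT INTO LocalSuppliers VALUES ('3D MECA SARL'), ('3D SYSTEMS'), ('3E COMPANY');
""")

# Set-based insert: current maximum ID plus a row number for each new supplier.
conn.execute("""
INSERT INTO SuperSupplierCodes (SuperSupplierID, SupplierName)
SELECT (SELECT COALESCE(MAX(SuperSupplierID), 0) FROM SuperSupplierCodes)
       + ROW_NUMBER() OVER (ORDER BY n.LocalSupplierName),
       n.LocalSupplierName
FROM (SELECT DISTINCT LocalSupplierName FROM LocalSuppliers) AS n
WHERE n.LocalSupplierName NOT IN (SELECT SupplierName FROM SuperSupplierCodes)
""")

rows = conn.execute(
    "SELECT SuperSupplierID, SupplierName FROM SuperSupplierCodes "
    "ORDER BY SuperSupplierID"
).fetchall()
for row in rows:
    print(row)
```

One caveat: without an identity column or suitable locking, two sessions running this at the same time could compute the same maximum and hand out duplicate IDs.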

Updating a database column based on its similarity to another database column

I have a database table (Customers) with the following columns:
ID
FIRST_NAME
MIDDLE_INIT
LAST_NAME
FULL_NAME
I also have a database table (ENG) with the following columns:
ID
ENG_NAME
I want to replace all of the ENG.ENG_NAME entries with a FULL_NAME entry from the CUSTOMERS table
Here is the problem.
The ENG_NAME was hand-jammed through a web form and so has no consistency. For instance, one row might contain "Robin Hood", another "Hood, Robin L", and another "Robin L Hood".
I want to search the entries in the CUSTOMERS table, find a close match, then replace the ENG.ENG_NAME with the CUSTOMERS.FULL_NAME.
Example:
ENG table           CUSTOMERS table
ID  ENG_NAME        ID  FULL_NAME       FIRST_NAME  MIDDLE_INIT  LAST_NAME
================    ==================================================================
1   Hood,Robin      1   Robin L Hood    Robin       L            Hood
2   Rob Hood        2   Maid M Marion   Maid        M            Marion
3   Marion M        3   Friar F Tuck    Friar       F            Tuck
4   Rob Garza       4   Robert A Garza  Robert      A            Garza
Based on the data above, I would want ENG_NAME columns to be replaced like this:
ENG table
ID  ENG_NAME
====================
1   Robin L Hood
2   Robin L Hood
3   Maid M Marion
4   Robert A Garza
Any thoughts on how to do this?
Thanks
This is not going to be a simple task. I would start by finding a good C# (or any .NET) algorithm that detects similar portions of strings.
Then look at compiling C# code into SQL stored procedures and invoking that code from SQL Server. This CLR code can then write the results to a table for you to analyze and do whatever you want with.
For More: CLR SQL Server User-Defined Function
I would do it in .NET using Levenshtein distance.
Start with a distance of 1. You are going to have some ties, and you need to decide how to resolve them.
Then move on to distances of 2, 3, 4, and so on.
You could do it in a CLR function, but how are you going to deal with ties? And you are going to have ties. How are you going to decide when something is not a match at all?
I would also put the result in a new column, so you keep a history of the original data,
or store an FK reference to the CUSTOMERS table instead.
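To make the tie and "no match" questions concrete, here is a small sketch of the Levenshtein idea, written in Python rather than .NET purely for brevity. The best_match helper, its case-insensitive comparison, and the max_distance cutoff are my assumptions about one reasonable policy, not a recommendation for specific values.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(raw_name, full_names, max_distance):
    """Return the single closest full name, or None on a tie or if nothing is close."""
    scored = sorted((levenshtein(raw_name.lower(), full.lower()), full)
                    for full in full_names)
    if not scored or scored[0][0] > max_distance:
        return None  # nothing close enough: treat as "not a match at all"
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # tie: leave the row for manual review
    return scored[0][1]


customers = ["Robin L Hood", "Maid M Marion", "Friar F Tuck", "Robert A Garza"]
print(best_match("Robin Hood", customers, max_distance=3))
print(best_match("Rob Garza", customers, max_distance=3))
```

With a cutoff of 3, "Robin Hood" resolves to "Robin L Hood", while "Rob Garza" is rejected because its closest candidate is 5 edits away. Reordered entries such as "Hood,Robin" score poorly under plain edit distance, so some normalization (for example sorting the name tokens first) would probably be needed before comparing.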

Populating column for Oracle Text search from 2 tables

I am investigating the benefits of Oracle Text search, and currently am looking at collecting search text data from multiple (related) tables and storing the data in the smaller table in a 1-to-many relationship.
Consider these 2 simple tables, house and inhabitants, and there are NEVER any uninhabited houses:
HOUSE
ID  Address           Search_Text
1   44 Some Road
2   31 Letsby Avenue
3   18 Moon Crescent
INHABITANT
ID  House  Name           Nickname
1   1      Jane Doe       Janey
2   1      John Doe       JD
3   2      Jo Smythe      Smithy
4   2      Percy Plum     PC
5   3      Apollo Lander  Moony
I want to write SQL that updates the HOUSE.Search_Text column with text from INHABITANT. Because this is a 1-to-many relationship, the SQL needs to collate the data in INHABITANT for each matching row in HOUSE, combine it (comma separated), and update the Search_Text field.
Once done, the Oracle Text search index on HOUSE.Search_Text will return me HOUSEs that match the search criteria, and I can look up INHABITANTs accordingly.
Of course, this is a very simplified example, I want to pick up data from many columns and Full Text Search across fields in both tables.
With the help of a colleague we've got:
select id, ADDRESS||'; '||Names||'; '||Nicknames as Search_Text
from house left join (
    SELECT distinct house_id,
           LISTAGG(NAME, ', ') WITHIN GROUP (ORDER BY NAME) OVER (PARTITION BY house_id) as Names,
           LISTAGG(NICKNAME, ', ') WITHIN GROUP (ORDER BY NICKNAME) OVER (PARTITION BY house_id) as Nicknames
    FROM INHABITANT
) i on house.id = i.house_id;
which returns:
1 44 Some Road; Jane Doe, John Doe; JD, Janey
2 31 Letsby Avenue; Jo Smythe, Percy Plum; PC, Smithy
3 18 Moon Crescent; Apollo Lander; Moony
Some questions:
Is this an efficient query to return this data? I'm slightly concerned about the distinct.
Is this the right way to use Oracle Text search across multiple text fields?
How to update House.Search_Text with the results above? I think I need a correlated subquery, but can't quite work it out.
Would it be more efficient to create a new table containing House_ID and Search_Text only, rather than update House?
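For reference, the collation described above (one pass that groups inhabitants by house, then joins each group into a single string) can be sketched outside the database. This Python version only illustrates the intended result; the variable names are made up and it says nothing about how Oracle would execute the query.

```python
from collections import defaultdict

# Sample data copied from the HOUSE and INHABITANT tables above.
houses = {1: "44 Some Road", 2: "31 Letsby Avenue", 3: "18 Moon Crescent"}
inhabitants = [
    (1, "Jane Doe", "Janey"),
    (1, "John Doe", "JD"),
    (2, "Jo Smythe", "Smithy"),
    (2, "Percy Plum", "PC"),
    (3, "Apollo Lander", "Moony"),
]

# Group names and nicknames by house in a single pass (no DISTINCT needed).
names = defaultdict(list)
nicknames = defaultdict(list)
for house_id, name, nickname in inhabitants:
    names[house_id].append(name)
    nicknames[house_id].append(nickname)

# Build "address; names; nicknames" with each list sorted and comma separated.
search_text = {
    house_id: "; ".join([
        address,
        ", ".join(sorted(names[house_id])),
        ", ".join(sorted(nicknames[house_id])),
    ])
    for house_id, address in houses.items()
}

for house_id, text in search_text.items():
    print(house_id, text)
```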

Table Join issue

Right now I've got a Main table into which I am uploading data. Because the Main table has many duplicates, I append various data out of the Main table into other tables, such as username, phone number, and location, in order to keep things optimized. Once I have everything stripped out of the Main table, I append what's left into a final optimized Main table. Before this happens, though, I run a select query joining all the stripped tables with the original Main table in order to connect the IDs from each table with the correct data. For example:
Original Main Table
--Name---------Number------Due Date-------Location-------Charges Monthly-----Charges Total--
  John Smith   111-1111    4/3            Chicago        234.56              500.23
  Todd Jones   222-2222    4/3            New York       174.34              323.56
  John Smith   111-1111    4/3            Chicago        274.56              670.23
  Bill James   333-3333    4/3            Orlando        100.00              100.00
This gets split into 3 tables (name, number, location) and then there is a date table with all the dates for the year:
Name Table           Number Table     Location Table      Due Date Table
--ID---Name------    -ID--Number---   ---ID---Location--  --Date---
  1    John Smith     1   111-1111       1    Chicago       4/1
  2    Todd Jones     2   222-2222       2    New York      4/2
  3    Bill James     3   333-3333       3    Orlando       4/3
Before the original table gets stripped, I run a select query that grabs the ID from the 3 new tables and joins them based on the connection they have with the original Main table.
Select Output
--Name ID----Number ID---Location ID---Due Date--
  1          1           1             4/3
  2          2           2             4/3
  1          1           1             4/3
  3          3           3             4/3
My issue comes when I need to introduce a new table that can't be tied to the original Main table. I have an inventory table that, much like the original Main table, has duplicates and needs to be optimized. I do this by creating a secondary table that takes all the duplicated devices out and puts them in their own table, and then strips the username and number out and puts them into their tables. I would like to add the IDs from this new device table to the select output that I have above, resulting in:
Select Output
--Name ID----Number ID---Location ID---Due Date--Device ID---
  1          1           1             4/3       1
  2          2           2             4/3       1
  1          1           1             4/3       2
  3          3           3             4/3       1
Unlike the previous tables, the device table has no relationship to the original Main table, which is what is causing me so much headache. I can't seem to find a way to make this happen. Is there any way to accomplish this?
Any two tables can be joined. A table represents an application relationship. In some versions (not the original) of Entity-Relationship Modelling (notice that the "R" in E-R stands for "(application) relationship"!) a foreign key is sometimes called a "relationship". You do not need other tables or FKs to join any two tables.
Explain, in terms of its column names and the values for those names, exactly when a row should turn up in the result. Maybe you want:
SELECT *
FROM <the stripped-and-ID'd version of the Original> AS o
JOIN <the stripped-and-ID'd version of the Device> AS d
USING (NameID, NumberID, LocationID, DueDate)
I.e.,
SELECT *
FROM <the stripped-and-ID'd version of the Original> AS o
JOIN <the stripped-and-ID'd version of the Device> AS d
ON o.NameID = d.NameID AND o.NumberID = d.NumberID
AND o.LocationID = d.LocationID AND o.DueDate = d.DueDate
Suppose p(a,...) is some statement parameterized by a,... .
If o holds the rows where o(NameID,NumberID,LocationID,DueDate) and d holds the rows where d(NameID,NumberID,LocationID,DueDate,DeviceID) then the above holds the rows where o(NameID, NumberID, LocationID, DueDate) AND d(NameID,NumberID,LocationID,DueDate,DeviceID). But you really have not explained what rows you want.
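As a rough illustration that no foreign key is needed, here is a sketch using SQLite through Python's sqlite3 module. The table names original_ids and device_ids, their sample rows, and the choice to join only on NameID and NumberID are all assumptions, since the question does not say which columns the device table shares.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical ID-only versions of the two tables; no FK is declared anywhere.
CREATE TABLE original_ids (NameID INTEGER, NumberID INTEGER,
                           LocationID INTEGER, DueDate TEXT);
INSERT INTO original_ids VALUES (1, 1, 1, '4/3'), (2, 2, 2, '4/3'), (3, 3, 3, '4/3');

CREATE TABLE device_ids (NameID INTEGER, NumberID INTEGER, DeviceID INTEGER);
INSERT INTO device_ids VALUES (1, 1, 1), (1, 1, 2), (2, 2, 1);
""")

# Join purely on matching column values.
rows = conn.execute("""
SELECT o.NameID, o.NumberID, o.LocationID, o.DueDate, d.DeviceID
FROM original_ids AS o
JOIN device_ids AS d
  ON o.NameID = d.NameID AND o.NumberID = d.NumberID
ORDER BY o.NameID, d.DeviceID
""").fetchall()

for row in rows:
    print(row)
```

A row appears in the result exactly when the two tables agree on the joined columns, so a name with two devices appears twice and a name with no device disappears (a LEFT JOIN would keep it).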
The only way to "join" tables that have no relation is by unioning them together:
select attribute1, attribute2, ... , attributeN
from table1
where <predicate>
union -- or union all
select attribute1, attribute2, ... , attributeN
from table2
where <predicate>
The where clauses are, of course, optional.
EDIT
Optionally, you could join the tables together by specifying ON true, which acts like a cross product.