Is there a way to insert a record in SQL Server if it does not match the latest version of the record based on three of the columns? - sql

Consider the following table named UserAttributes:
+----+--------+----------+-----------+
| Id | UserId | AttrName | AttrValue |
+----+--------+----------+-----------+
| 4 | 1 | FavFood | Apples |
| 3 | 2 | FavFood | Burgers |
| 2 | 1 | FavShape | Circle |
| 1 | 1 | FavFood | Chicken |
+----+--------+----------+-----------+
I would like to insert a new record in this table only if the new value of a particular attribute for a user does not match the latest stored value of that attribute.
What I mean by the latest is, for example, if I was to do:
SELECT TOP(1) * FROM [UserAttributes] WHERE [UserId] = 1 AND [AttrName] = 'FavFood' ORDER BY [Id] DESC
I will be able to see that user ID 1's current favorite food is "Apples".
Is there a query safe for concurrency that will only insert a new favorite food if it doesn't match the current favorite food for this user?
I tried using the MERGE query with a HOLDLOCK, but the problem is that WHEN MATCHED/WHEN NOT MATCHED only works if I never want to insert a new record after a user has previously set their favorite food (in this example) to the new value. It does not consider that a user might switch to a new favorite food, then subsequently change back to their old favorite food. I would like to maintain all the changes as a historical record.
In the data set above, I would like to insert a new record if the user ID 1's new favorite food is "Burgers", but I do not want to insert a record if their new favorite food is "Apples" (since that is their current favorite food). I would also like to make this operation safe for concurrency.
Thank you for your help!
EDIT: I should probably also mention that when I split this operation into two queries (ie: first select their current favorite food, then do an insert query only if there is a new food detected) it works under normal conditions. However, we are observing race conditions (and therefore duplicates) since (as you may have guessed) the data set above is simply an example and there are many threads operating on this table at the same time.

A bit ugly, but to do it in one command, you could insert the user's (new) favorite food but filter with an EXCEPT of their current values.
e.g., (assuming the user's new data is in @UserID and @FavFood)
; WITH LatestFavFood AS
(SELECT TOP(1) UserID, AttrName, AttrValue
FROM [UserAttributes]
WHERE [UserId] = @UserID AND [AttrName] = 'FavFood'
ORDER BY [Id] DESC
)
INSERT INTO UserAttributes (UserID, AttrName, AttrValue)
SELECT @UserID, 'FavFood', @FavFood
EXCEPT
SELECT UserID, AttrName, AttrValue
FROM LatestFavFood
Here's a DB_Fiddle with three runs.
EDIT: I have changed the above to assume varchar types for AttrName rather than nvarchar. The fiddle has a mixture. Would be good to ensure you get them correct (especially food as it may have special characters).
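To try the EXCEPT-filtered insert without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the answer's exact statement: SQLite spells TOP(1) as LIMIT 1, and the function name set_attribute and the named parameters are made up for illustration. It only demonstrates the single-statement logic; it does not exercise concurrent writers, and on SQL Server you would likely still want to check locking hints or the isolation level for that.

```python
import sqlite3

# SQLite adaptation of the answer's statement (LIMIT 1 instead of TOP(1)).
INSERT_IF_CHANGED = """
WITH latest AS (
    SELECT UserId, AttrName, AttrValue
    FROM UserAttributes
    WHERE UserId = :user_id AND AttrName = :attr_name
    ORDER BY Id DESC
    LIMIT 1
)
INSERT INTO UserAttributes (UserId, AttrName, AttrValue)
SELECT :user_id, :attr_name, :attr_value
EXCEPT
SELECT UserId, AttrName, AttrValue FROM latest
"""


def set_attribute(conn, user_id, attr_name, attr_value):
    """Insert a new version only if it differs from the latest one.

    Hypothetical helper; returns True when a row was actually inserted.
    """
    before = conn.total_changes
    with conn:  # commit on success, roll back on error
        conn.execute(
            INSERT_IF_CHANGED,
            {"user_id": user_id, "attr_name": attr_name, "attr_value": attr_value},
        )
    return conn.total_changes > before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserAttributes ("
    "Id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "UserId INTEGER NOT NULL, AttrName TEXT NOT NULL, AttrValue TEXT NOT NULL)"
)
# Same rows as the question's table, inserted in Id order 1..4.
conn.executemany(
    "INSERT INTO UserAttributes (UserId, AttrName, AttrValue) VALUES (?, ?, ?)",
    [(1, "FavFood", "Chicken"), (1, "FavShape", "Circle"),
     (2, "FavFood", "Burgers"), (1, "FavFood", "Apples")],
)

# Apples is already current, Burgers is new, then switching back is recorded too.
results = [set_attribute(conn, 1, "FavFood", food)
           for food in ("Apples", "Burgers", "Apples", "Apples")]
history = [row[0] for row in conn.execute(
    "SELECT AttrValue FROM UserAttributes "
    "WHERE UserId = 1 AND AttrName = 'FavFood' ORDER BY Id")]
print(results)
print(history)
```

Because the comparison is only against the latest row, switching back to an earlier value is still recorded as a new version, which is the historical behaviour the question asks for.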

Related

Transform Row Values to Column Names

I have a table of customer contacts and their role. Simplified example below.
customer | role | userid
----------------------------
1 | Support | 123
1 | Support | 456
1 | Procurement | 567
...
desired output
customer | Support1 | Support2 | Support3 | Support4 | Procurement1 | Procurement2
-----------------------------------------------------------------------------------
1 | 123 | 456 | null | null | 567 | null
2 | 123 | 456 | 12333 | 45776 | 888 | 56723
So dynamically create the number of required columns based on how many users are in that role. It's a small number of roles. Also I can assume max 5 users in the same role, which means worst case I need to generate 5 columns for each role. The userids don't need to be in any particular order.
My current approach is getting 1 userid per role/customer. Then a second query pulls another id that wasn't part of first results set. And so on. But that way I have to statically create 5 queries. It works. But I was wondering whether there is a more efficient way? Dynamically creating needed columns.
Example of pulling one user per role:
SELECT customer,role,
(SELECT top 1 userid
FROM temp as tmp1
where tmp1.customer=tmp2.customer and tmp1.role=tmp2.role
) as userid
FROM temp as tmp2
group by customer,role
order by customer,role
SQL create with dummy data
create table temp
(
customer int,
role nvarchar(20),
userid int
)
insert into temp values (1,'Support',123)
insert into temp values (1,'Support',456)
insert into temp values (1,'Procurement',567)
insert into temp values (2,'Support',123)
insert into temp values (2,'Support',456)
insert into temp values (2,'Procurement',888)
insert into temp values (2,'Support',12333)
insert into temp values (2,'Support',45776)
insert into temp values (2,'Procurement',56723)
You may need to adapt your approach slightly if you want to avoid getting into the realm of programming user defined table functions (which is what you would need in order to generate columns dynamically). You don't mention which SQL database variant you are using (SQL Server, PostgreSQL, ?). I'm going to make the assumption that it supports some form of string aggregation feature (they pretty much all do), but the syntax for doing this will vary, so you will probably have to adjust the code to your circumstances. You mention that the number of roles is small (5-ish?). The proposed solution is to generate a comma-separated list of user ids, one for each role, using common table expressions (CTEs) and the LISTAGG (variously named STRING_AGG, GROUP_CONCAT, etc. in other databases) function.
WITH tsupport
AS (SELECT customer,
Listagg(userid, ',') AS "Support"
FROM temp
WHERE ROLE = 'Support'
GROUP BY customer),
tprocurement
AS (SELECT customer,
Listagg(userid, ',') AS "Procurement"
FROM temp
WHERE ROLE = 'Procurement'
GROUP BY customer)
--> tnextrole...
--> AS (SELECT ... for additional roles
--> Listagg...
SELECT a.customer,
"Support",
"Procurement"
--> "Next Role" etc.
FROM tsupport a
JOIN tprocurement b
ON a.customer = b.customer
--> JOIN tNextRole ...
Fiddle is here with a result that appears as below based on your dummy data:
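The string-aggregation approach from the answer can be tried locally with a small sketch in Python's built-in sqlite3 module. Assumptions: SQLite's GROUP_CONCAT stands in for LISTAGG, and the table is named customer_roles here instead of temp (a name chosen for this sketch). The order of ids inside each list is not guaranteed, so the sketch converts the lists to sets before using them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_roles (customer INT, role TEXT, userid INT)")
# Dummy data from the question.
conn.executemany(
    "INSERT INTO customer_roles VALUES (?, ?, ?)",
    [(1, "Support", 123), (1, "Support", 456), (1, "Procurement", 567),
     (2, "Support", 123), (2, "Support", 456), (2, "Procurement", 888),
     (2, "Support", 12333), (2, "Support", 45776), (2, "Procurement", 56723)],
)

# One CTE per role, each producing a comma-separated list of user ids.
QUERY = """
WITH tsupport AS (
    SELECT customer, GROUP_CONCAT(userid, ',') AS support
    FROM customer_roles WHERE role = 'Support' GROUP BY customer
),
tprocurement AS (
    SELECT customer, GROUP_CONCAT(userid, ',') AS procurement
    FROM customer_roles WHERE role = 'Procurement' GROUP BY customer
)
SELECT a.customer, a.support, b.procurement
FROM tsupport a
JOIN tprocurement b ON a.customer = b.customer
ORDER BY a.customer
"""

rows = conn.execute(QUERY).fetchall()
# Split the lists back into sets, since the order inside a list is unspecified.
ids_by_customer = {
    customer: (set(support.split(",")), set(procurement.split(",")))
    for customer, support, procurement in rows
}
for customer, (support, procurement) in ids_by_customer.items():
    print(customer, sorted(support), sorted(procurement))
```

One thing to keep in mind with the inner join: a customer that has no contact in one of the roles drops out of the result entirely, so an outer join may be a better fit if that can happen in the real data.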

Query Parent and Children from single table

I currently have a single table that hosts all of my users. Now some users have team_leaders which reference the user id of the team leader which is also stored in the database.
Now, what I want to do (and can't figure out) is query the database so that it retrieves a list of the ids of all the team members and the leader in one result set.
For Example
name | id | team_leader
--------------------------------------------------
Jack | 1 | null
--------------------------------------------------
Susan| 2 | 1
--------------------------------------------------
Bob | 3 | 1
--------------------------------------------------
Eric | 4 | null
--------------------------------------------------
SELECT name FROM users where team_leader = '<some user's id>'
returns ['Susan', 'Bob']
But I would like it to return the team leader included, such as
['Jack', 'Susan', 'Bob']
Does anyone have any idea how to include the team leader in the query results?
EDIT:
Okay, so it seems like I have not explained myself 100%, my apologies. The goal of this query is as follows.
I have another table called leads and there is a field there that is called user_id which correlates to the user that has access to the lead. Now, I want to introduce the ability for team leaders to update the leads that are associated with their accounts, so if the current user is a team leader they should have the ability to update the user_id from their id to anyone on their team, from one of their children to another, and from one of the children to themselves, but not to anyone not on their team. So the way I thought of it was to have a WHERE EXISTS or a WHERE IN (this would mean adding a field to the lead table called leader_id) and it checks if the new user_id is in a list of that team leader's members, including themselves.
Based off the example above.
UPDATE lead SET user_id = xxx
WHERE lead.id = yyy
AND ...
-- here is where I would check that the user_id xxx is part of the current
-- user's team which must be a team leader, for example user.id = 1
So my thought process was to get the previous query to then check against.
Hope this clears things up.
If I'm understanding correctly, you can just use or:
select name
from users
where team_leader = 1 or id = 1
WITH CTE AS(
-- anchor: the team leader's own row
SELECT name,id,team_leader FROM [users]
WHERE id=1
UNION ALL
-- recursive step: everyone who reports to a row already in the CTE
SELECT u.name,u.id,u.team_leader from [users] u
JOIN CTE ON CTE.id=u.team_leader
)
SELECT * FROM CTE
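As a rough illustration of how the team lookup could guard the UPDATE described in the question's edit, here is a sketch using Python's built-in sqlite3 module. The leads table layout (id, user_id), the lead id 10 and the function names are assumptions made up for this example; the question does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, id INTEGER PRIMARY KEY, team_leader INTEGER)")
conn.execute("CREATE TABLE leads (id INTEGER PRIMARY KEY, user_id INTEGER)")  # assumed layout
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("Jack", 1, None), ("Susan", 2, 1), ("Bob", 3, 1), ("Eric", 4, None)],
)
conn.execute("INSERT INTO leads VALUES (10, 2)")  # hypothetical lead owned by Susan


def team_names(conn, leader_id):
    """Leader plus direct team members, using the OR filter from the first answer."""
    rows = conn.execute(
        "SELECT name FROM users WHERE team_leader = :leader OR id = :leader ORDER BY id",
        {"leader": leader_id},
    )
    return [name for (name,) in rows]


def reassign_lead(conn, lead_id, new_user_id, leader_id):
    """Move a lead only if both the current and the new owner are on the leader's team."""
    with conn:
        cur = conn.execute(
            """
            UPDATE leads SET user_id = :new_user
            WHERE id = :lead
              AND user_id IN (SELECT id FROM users
                              WHERE team_leader = :leader OR id = :leader)
              AND :new_user IN (SELECT id FROM users
                                WHERE team_leader = :leader OR id = :leader)
            """,
            {"new_user": new_user_id, "lead": lead_id, "leader": leader_id},
        )
    return cur.rowcount == 1


names = team_names(conn, 1)
moved_to_bob = reassign_lead(conn, 10, 3, 1)   # Bob is on Jack's team
moved_to_eric = reassign_lead(conn, 10, 4, 1)  # Eric is not
owner = conn.execute("SELECT user_id FROM leads WHERE id = 10").fetchone()[0]
print(names, moved_to_bob, moved_to_eric, owner)
```

Putting the team check inside the UPDATE itself means the permission test and the write happen in one statement, so no extra leader_id column on the leads table is needed for this particular check.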

Multiple records in a table matched with a column

The architecture of my DB involves records in a Tags table. Each record in the Tags table has a string which is a Name and a foreign key to the PrimaryID of a record in another table, Worker.
Records in the Worker table have tags. Every time we create a Tag for a worker, we add a new row in the Tags table with the inputted Name and foreign key to the worker's PrimaryID. Therefore, we can have multiple Tags with different names per same worker.
Worker Table
ID | Worker Name | Other Information
__________________________________________________________________
1 | Worker1 | ..........................
2 | Worker2 | ..........................
3 | Worker3 | ..........................
4 | Worker4 | ..........................
Tags Table
ID |Foreign Key(WorkerID) | Name
__________________________________________________________________
1 | 1 | foo
2 | 1 | bar
3 | 2 | foo
5 | 3 | foo
6 | 3 | bar
7 | 3 | baz
8 | 1 | qux
My goal is to filter WorkerID's based on an inputted table of strings. I want to get the set of WorkerID's that have the same tags as the inputted ones. For example, if the inputted strings are foo and bar, I would like to return WorkerID's 1 and 3. Any idea how to do this? I was thinking something to do with GROUP BY or JOINING tables. I am new to SQL and can't seem to figure it out.
This is a variant of relational division. Here's one attempt:
select workerid
from tags
where name in ('foo', 'bar')
group by workerid
having count(distinct name) = 2
You can use the following:
select WorkerID
from tags where name in ('foo', 'bar')
group by WorkerID
having count(*) = 2
and this will retrieve your desired result.
Regards.
This article is an excellent resource on the subject.
While the answer from @Lennart works fine in Query Analyzer, you're not going to be able to duplicate that in a stored procedure or from a consuming application without opening yourself up to SQL injection attacks. To extend the solution, you'll want to look into passing your list of tags as a table-valued parameter since SQL doesn't support arrays.
Essentially, you create a custom type in the database that mimics a table with only one column:
CREATE TYPE list_of_tags AS TABLE (t varchar(50) NOT NULL PRIMARY KEY)
Then you populate an instance of that type in memory:
DECLARE @mylist list_of_tags
INSERT @mylist (t) VALUES('foo'),('bar')
Then you can select against that as a join using the GROUP BY/HAVING described in the previous answers:
select workerid
from tags inner join @mylist on name = t
group by workerid
having count(distinct name) = 2
*Note: I'm not at a computer where I can test the query. If someone sees a flaw in my query, please let me know and I'll happily correct it and thank them.
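From a consuming application, the same relational-division query can be issued with bound parameters so that tag names never end up concatenated into the SQL text. The sketch below uses Python's built-in sqlite3 module rather than SQL Server's table-valued parameters, so treat it as an illustration of the idea only; the function name workers_with_all_tags is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tags (ID INTEGER PRIMARY KEY, WorkerID INTEGER, Name TEXT)")
# Rows from the question's Tags table.
conn.executemany(
    "INSERT INTO Tags VALUES (?, ?, ?)",
    [(1, 1, "foo"), (2, 1, "bar"), (3, 2, "foo"), (5, 3, "foo"),
     (6, 3, "bar"), (7, 3, "baz"), (8, 1, "qux")],
)


def workers_with_all_tags(conn, tags):
    """Return the WorkerIDs that carry every tag in `tags` (hypothetical helper)."""
    wanted = sorted(set(tags))  # de-duplicate so the count comparison is exact
    if not wanted:
        return []
    # Only "?" placeholders are interpolated; the tag values themselves are bound.
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        f"SELECT WorkerID FROM Tags WHERE Name IN ({placeholders}) "
        "GROUP BY WorkerID HAVING COUNT(DISTINCT Name) = ? ORDER BY WorkerID"
    )
    return [worker_id for (worker_id,) in conn.execute(sql, [*wanted, len(wanted)])]


foo_bar = workers_with_all_tags(conn, ["foo", "bar"])
foo_bar_baz = workers_with_all_tags(conn, ["foo", "bar", "baz"])
print(foo_bar, foo_bar_baz)
```

Comparing against the number of distinct requested tags, instead of a hard-coded 2, lets the same query serve any length of input list.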

How should I handle updating data in a one-to-many relationship database schema?

Let's say I have a database similar to the following:
Table: People
id | name | age
---+------+-----
1 | dave | 78
Table: Likes
id | userid | like
---+--------+-------
1 | 1 | apples
---+--------+-------
2 | 1 | oranges
---+--------+-------
3 | 1 | women
What would be the best way to handle updating Dave's data? Currently, I do the following:
Update his age
UPDATE People SET age = 79 WHERE id = 1;
Update his likes
DELETE FROM Likes WHERE userid = 1;
INSERT INTO LIKES (userid, like) VALUES (1, 'new like');
INSERT INTO LIKES (userid, like) VALUES (1, 'another like');
I delete all the user's data from the table and then re-add their new stuff. This seems inefficient. Is there a better way?
It's not clear to me why you are suggesting a link between updating a record in the parent table and its dependents in the child table. The point of having separate tables is precisely that we can modify the non-key columns in People without touching Likes.
When it comes to updating Likes there are two different business transactions. The first is when Dave says, "I didn't mean 'oranges' I meant to say I like flower arranging". Correcting a mistake would use an update:
update likes
set like = 'flower arranging'
where userid = 1
and like = 'oranges'
/
The WHERE clause could use the LIKES.ID column instead.
The other case is where the preferences have actually changed. That is, when Dave says "Now I'm 79 I don't like women any more. I have new tastes.". This might look like this:
delete from likes
where userid = 1
and like = 'women'
/
insert into likes (userid, like)
values (1, 'dominoes')
/
insert into likes (userid, like)
values (1, 'Werthers Orignals')
/
The difference between these two statements is primarily one of clarity. We could have implemented the second set of statements as an update and a single insert but that would be misleading. Keeping the distinction between meaningful changes to the data and correcting mistakes is a useful discipline. It is especially helpful when we are keeping historical records and/or auditing changes.
What is definitely a bad idea is deleting all Dave's Likes records and then re-inserting them. Your application should be able to track which records have changed.
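One way an application might work out which Likes rows changed, instead of deleting and re-inserting everything, is to diff the stored set against the desired set. The following is only a sketch using Python's built-in sqlite3 module; the function name sync_likes is hypothetical, and reading the current rows to compute the diff is an assumption about how the application tracks changes. The column name like is quoted because it is also an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Likes (id INTEGER PRIMARY KEY, userid INTEGER, "like" TEXT)')
conn.executemany(
    'INSERT INTO Likes (id, userid, "like") VALUES (?, ?, ?)',
    [(1, 1, "apples"), (2, 1, "oranges"), (3, 1, "women")],
)


def sync_likes(conn, userid, new_likes):
    """Apply only the difference between stored and desired likes (hypothetical helper)."""
    wanted = set(new_likes)
    current = {
        row[0]
        for row in conn.execute('SELECT "like" FROM Likes WHERE userid = ?', (userid,))
    }
    to_delete = current - wanted   # likes the user no longer has
    to_insert = wanted - current   # likes that are genuinely new
    with conn:  # both changes commit together or not at all
        conn.executemany(
            'DELETE FROM Likes WHERE userid = ? AND "like" = ?',
            [(userid, like) for like in to_delete],
        )
        conn.executemany(
            'INSERT INTO Likes (userid, "like") VALUES (?, ?)',
            [(userid, like) for like in to_insert],
        )
    return to_delete, to_insert


removed, added = sync_likes(conn, 1, ["apples", "oranges", "dominoes"])
remaining = dict(conn.execute('SELECT "like", id FROM Likes WHERE userid = 1'))
print(removed, added, remaining)
```

Rows that did not change keep their original ids, which matters if anything else references Likes.id or if the table is audited.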
Depending on the DBMS you could use "MERGE", "REPLACE" or "INSERT OR REPLACE".
Most DBMSes support this functionality, but the syntax varies wildly, so you will need to RTFM to find out how to do this with the database you are using.
You could add a new column to Likes storing the actual age. By doing this you get a history functionality and you can determine his likes by requesting those with the highest age like:
SELECT * FROM Likes WHERE userid = 1 AND age = (SELECT MAX(age) FROM Likes WHERE userid = 1 GROUP BY userid);
If you want to be able to override some Likes, you could also add another column "overriden_by" containing the id of the new Like. This would result in a simpler query:
SELECT * FROM Likes WHERE userid = 1 and overriden_by is null;

Database structure for items with varying attributes

I am developing a clothes web application and would appreciate advice on how to structure the data in my mysql database.
Every product (item of clothing) will be photographed in a number of ways, let's call them 'modes'. For example a shirt would be photographed buttoned or unbuttoned, and/or tucked in/not tucked in. A pair of trousers would have a different set of possible attributes. I want to store information on the way these items are photographed so I can later use that information to display the item of clothing in a particular way.
So one method would be just to store all the possible attributes in a single table, something like:
productId (FK,PK)
modeId (PK)
isLoose
isTuckedIn
Size
HasSmthUnderneath
Where the attributes could be a value or a code defined in another table or NULL if it does not apply to a particular mode.
Then given a particular productId and modeId, I imagine I could filter out the NULL values for attributes which do not apply and use only the relevant ones.
However, I am not sure if that is the ideal way to store this kind of value, as I would have a lot of NULL values, for example in a pair of trousers which are only photographed in one way. I've heard of the EAV model, is this appropriate?
It's probably worth noting that the number of attributes will be decided by me and not the user and should not change considerably; and that my end goal is to extract the attributes of a particular mode so I can use that data in my application.
Sorry if anything is unclear!
I would be tempted to have the following normalized schema design
Mode Table
id | mode_style
---------------
1 | buttoned
2 | unbuttoned
3 | tucked in
4 | untucked
Clothes Table
id | name | description
----------------------------
1 | shirt | mans shirt...
2 | dress | short sleeve
Clothes_mm_Mode Table (Junction/Map table)
mode_id | clothes_id
--------------------
1 | 1
1 | 2
3 | 3
Then you can easily query those clothes that have an unbuttoned display
SELECT
c.id,
c.name,
c.description
FROM
Clothes c
INNER JOIN
Clothes_mm_Mode cm
ON c.id = cm.clothes_id
WHERE
cm.mode_id = 2
If certain types of clothes are always displayed in the same way i.e. all shirts always have a buttoned and unbuttoned display, you could take out the Clothes_mm_Mode Table and introduce a Common Mode table that maps Modes to a Common Mode id
Common_Modes Table
id | name | description
--------------------------------------------------
1 | Men's Shirt | Common Modes for a Mens shirt
2 | Women's Shirt | Common Modes for a Womens shirt
Common_Modes_mm_Mode Table (Junction/Map table)
common_mode_id | mode_id
--------------------------------------------------
1 | 1
1 | 2
2 | 1
2 | 2
and then associate each item of Clothing with a Common Mode type
Clothing_Common_Modes Table
clothing_id | common_mode_id
----------------------------
1 | 1
The advantage of this design would be that when adding a new item of clothing, only one record need be entered into the Common Modes table to associate that item of clothing with the Modes common to the clothing type. Of course this could be handled without a common modes table by having a procedure that inserts the appropriate records into the original Clothes_mm_Mode Table for a new item of clothing, but by having the relationship in the database, it will be more prominent, visible and easier to maintain.
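To make the Common Modes variant concrete, here is a small sketch that builds the tables from this answer in an in-memory SQLite database (via Python's built-in sqlite3 module) and resolves the modes for one item of clothing. The column types, key constraints and the function name modes_for are assumptions added for the example; the sample rows follow the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Mode (id INTEGER PRIMARY KEY, mode_style TEXT NOT NULL);
CREATE TABLE Clothes (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE Common_Modes (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE Common_Modes_mm_Mode (
    common_mode_id INTEGER REFERENCES Common_Modes(id),
    mode_id INTEGER REFERENCES Mode(id),
    PRIMARY KEY (common_mode_id, mode_id)
);
CREATE TABLE Clothing_Common_Modes (
    clothing_id INTEGER PRIMARY KEY REFERENCES Clothes(id),
    common_mode_id INTEGER REFERENCES Common_Modes(id)
);
INSERT INTO Mode VALUES (1, 'buttoned'), (2, 'unbuttoned'), (3, 'tucked in'), (4, 'untucked');
INSERT INTO Clothes VALUES (1, 'shirt', 'mans shirt'), (2, 'dress', 'short sleeve');
INSERT INTO Common_Modes VALUES
    (1, 'Men''s Shirt', 'Common Modes for a Mens shirt'),
    (2, 'Women''s Shirt', 'Common Modes for a Womens shirt');
INSERT INTO Common_Modes_mm_Mode VALUES (1, 1), (1, 2), (2, 1), (2, 2);
INSERT INTO Clothing_Common_Modes VALUES (1, 1);
""")


def modes_for(conn, clothing_id):
    """List the mode styles an item inherits through its common-mode group."""
    rows = conn.execute(
        """
        SELECT m.mode_style
        FROM Clothing_Common_Modes ccm
        JOIN Common_Modes_mm_Mode cmm ON cmm.common_mode_id = ccm.common_mode_id
        JOIN Mode m ON m.id = cmm.mode_id
        WHERE ccm.clothing_id = ?
        ORDER BY m.id
        """,
        (clothing_id,),
    )
    return [style for (style,) in rows]


shirt_modes = modes_for(conn, 1)
dress_modes = modes_for(conn, 2)  # the dress has no common-mode row yet
print(shirt_modes, dress_modes)
```

Adding a new shirt then takes a single row in Clothing_Common_Modes, and it picks up both modes through the joins.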
I think your design is fine. It would be possible to apply database normalization to it, which may give you the following designs alternatively:
have one table per property, each with (id, propvalue) pairs. Only add rows into these tables for items where the property actually applies.
have generic tables (id, propname, propvalue), perhaps one such table per property datatype (boolean, number, string).
With your description, I feel that either is overkill. The only exception would be cases where properties are multi-valued (e.g. list of available colors).
I personally think plain old key/value pairs for this type of thing are underrated, so if you're happy to control it more in the application itself you could also do something like this:
create table ProductStates
(
ProductId int not null,
ModeState nvarchar(200) not null,
primary key (ProductId, ModeState)
)
Nice and simple in my mind. You get no redundant null values; if the product has that mode then there's a row, if not there's no row. Also means no schema changes required if there's a new state. If you wanted to you could have ModeState instead link out to a ModeStates lookup table, if you think integrity is going to be a problem.
create table ProductStates
(
ProductId int not null,
ModeStateId int not null,
primary key (ProductId, ModeStateId)
)
create table ModeStates
(
ModeStateId int primary key,
ModeStateDescription nvarchar(500)
-- (...whatever else you might need here)
)
... though that's probably redundant.
Just an alternative, not sure if I'd do it that way myself (depends on the brief(s)). Did I get the specification right?