Concatenate one field after GROUP BY - sql

This question has been asked many times on SO, but none of the answers fits my situation.
Question 1
Question 2
Question 3
Question 4
I am dealing with a DataObjectVersions table that contains multiple versions for around 1.2 million unique objects (and increasing). I need to concatenate changes from a specific field for each unique object.
Right now I am using the XML PATH solution presented in Q3, but running such a query on this table is a performance disaster: SQL Server only started to return data after 19 minutes. Since this data will then be joined twice, you can imagine the impact.
I am looking for the most efficient, scalable way to concatenate the values of one field across different rows, grouped by another field (which is of course not a key). To be more precise, this is used within a view in a data warehouse.
EDIT:
I tried to simplify the description but here is a complete overview
I have multiple tables with the following columns
[ID]
[CreatedTime]
[CreatedBy]
[DeletedTime]
[DeletedBy]
[ResourceId]
[AccountId]
[Type]
A view returns the union of all records from all tables, with the same columns (this is what I described as the versions table in my question). [ResourceId] and [AccountId] form a unique composite identifier of an object (group membership, system account, etc.; specifically, a resource assignment). [Type] identifies different levels (like Read/Write/Execute in the case of a file assignment).
All other fields contain the same values (in different tables) for different unique objects. I need to get the objects and concatenate the values of the [Type] column. All the rows are processed afterward, and the ([ResourceId],[AccountId]) combination must be unique (which is not the case when different types exist).
EDIT 2:
I am using this function:
CREATE FUNCTION [dbo].[GetUniqueType]
(
@ResourceId as uniqueidentifier,
@Account as uniqueidentifier
)
RETURNS nvarchar(100)
AS
BEGIN
return STUFF((select ',' + raType.Type from vwAllAssignments raType where raType.AccountId = @Account and raType.ResourceId = @ResourceId and raType.DeletedBy is null for xml path('')), 1,1,'')
END
GO
vwAllAssignments is the view returning the union of all tables rows.
Finally I am selecting
SELECT [CreatedTime]
,[DeletedTime]
,[DeletedBy]
,[ResourceId]
,[AccountId]
,dbo.GetUniqueType([ResourceId],[AccountId]) AS [Type]
FROM vwAllAssignments
GROUP BY [ResourceId], [AccountId], [CreatedTime], [DeletedTime], [DeletedBy]

Try this:
SELECT [CreatedTime]
,[DeletedTime]
,[DeletedBy]
,[ResourceId]
,[AccountId]
,STUFF((select ',' + raType.Type
from vwAllAssignments raType
where raType.AccountId = vwAllAssignments.AccountId and
raType.ResourceId = vwAllAssignments.ResourceId and
raType.DeletedBy is null
for xml path('')), 1,1,'') AS [Type]
FROM vwAllAssignments
GROUP BY [ResourceId], [AccountId], [CreatedTime], [DeletedTime], [DeletedBy]
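An index like this on each of the underlying tables should help. vwAllAssignments is a UNION view, so it cannot be indexed itself; SomeAssignmentTable below is a placeholder for each base table name.
create index IX_SomeAssignmentTable_Type on SomeAssignmentTable(AccountId, ResourceId, DeletedBy) include(Type)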
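On SQL Server 2017 or later, the built-in STRING_AGG aggregate may be worth comparing against the XML PATH approach, because it concatenates inside the GROUP BY itself instead of re-querying the view for every row. The following is only a sketch: it assumes that version is available, that it runs against the question's vwAllAssignments view, and that one row per ([ResourceId],[AccountId]) pair is what is wanted (the time columns from the original query are left out for that reason).

```sql
-- Assumption: SQL Server 2017+ (STRING_AGG does not exist in earlier versions).
-- Assumption: only non-deleted assignments should contribute to the list.
SELECT a.ResourceId,
       a.AccountId,
       STRING_AGG(a.[Type], ',') WITHIN GROUP (ORDER BY a.[Type]) AS [Type]
FROM vwAllAssignments AS a
WHERE a.DeletedBy IS NULL
GROUP BY a.ResourceId, a.AccountId;
```

If the same [Type] can appear in more than one underlying table for the same pair, it will show up more than once in the list, so the input may need to be de-duplicated in a derived table first.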

Related

SQL Server - matching attributes query

SQL Server Gurus ...
Currently using MS SQL Server 2016
I know Joe Celko and all SQL purists are squirming at the thought of using bitmasks, but I have a use case in which I need to query for all widgets that contain a set of given attributes.
Each widget may contain several hundred attributes.
The attributes of a widget are either present or not (1 = present, 0 = not present).
One way I thought to do this is via bitmasks – the attributes to be found (a bitmask) could be ANDed with the attributes of each widget to find matches in a single operation. For example, the widgets table might be:
widgets table:
widget_uid Uniqueidentifier
attributes BigInt
SELECT widget_uid
FROM widgets
WHERE ( attributes & bitmask ) = bitmask;
The problem is that using a BigInt for the attributes limits the number of attributes to 64 (a widget can have several hundred). I could group the attributes in chunks of 64 bits, i.e.:
widgets table:
widget_uid Uniqueidentifier
attributes0 BigInt -- Attributes 0-63
attributes1 BigInt -- Attributes 64-127
attributes2 BigInt -- Attributes 128-191
SELECT widget_uid
FROM widgets
WHERE ( attributes0 & bitmask0 ) = bitmask0
AND ( attributes1 & bitmask1 ) = bitmask1
AND ( attributes2 & bitmask2 ) = bitmask2
... but was wondering if anyone has come up with a solution for bit operations using bitmasks with greater than 64 bits – or if other (more efficient?) solutions would exist?
In the use case, the widgets table does contain other columns, but I am only concerned with the attributes matching portion of the query at the moment.
Any and all ideas are welcome - would be interested in knowing how others tackle this particular problem.
Thanks in advance.
We had a similar use case, on a significantly large data set. This was for an e-commerce site with products and attributes. Our case was a bit more complex than here, where we had any possible number of attributes and then values assigned to those attributes. e.g. Color - Red/Green/Blue, Size - S/M/L etc.
We found that association tables with good indexing were the key in our case. While this may not be an option for you, we found it to be the optimal solution for a dynamic data set.
I can code you up an example if you feel it will be helpful.
Edited to add example:
DROP TABLE IF EXISTS #Widgets
DROP TABLE IF EXISTS #Attributes
DROP TABLE IF EXISTS #WidgetAttributes
CREATE TABLE #Widgets (widget_UID UNIQUEIDENTIFIER PRIMARY KEY CLUSTERED, Name NVARCHAR(255))
CREATE TABLE #Attributes (Attribute_UID UNIQUEIDENTIFIER PRIMARY KEY CLUSTERED, Name NVARCHAR(255))
CREATE TABLE #WidgetAttributes (widget_UID UNIQUEIDENTIFIER,Attribute_UID UNIQUEIDENTIFIER)
CREATE NONCLUSTERED INDEX ix_WidgetAttribute ON #WidgetAttributes (Attribute_UID) INCLUDE (widget_UID)
INSERT INTO #Widgets (widget_UID, Name) values
( '{c63bea73-2331-4698-82c9-f71845ab8601}', N'Widget 1' ),
( '{a0865b8f-606b-4273-9207-39a8a26016c4}', N'Widget 2' ),
( '{211fe27e-ab98-4b61-83a3-3d006d66db5a}', N'Widget 3' )
INSERT INTO #Attributes (Attribute_UID, Name)
VALUES
( '{99354dc0-d0b2-4919-a887-edf115eeb1bd}', N'Height' ),
( '{136bbe4c-497d-472f-a905-670e4a7805d0}', N'Width' ),
( '{f006f950-30d1-453e-8e09-4f7d140fa3cb}', N'Depth' ),
( '{0d190639-677f-4b75-8d36-1bdac00de132}', N'Colour' )
-- Set links
-- Widget 1 All attributes
-- Widget 2 Height Width
-- Widget 3 Colour
INSERT INTO #WidgetAttributes (widget_UID, Attribute_UID)
SELECT '{c63bea73-2331-4698-82c9-f71845ab8601}',Attribute_UID FROM #Attributes
UNION ALL
SELECT TOP (2) '{a0865b8f-606b-4273-9207-39a8a26016c4}',Attribute_UID FROM #Attributes WHERE Name<> 'Colour'
UNION ALL
SELECT '{211fe27e-ab98-4b61-83a3-3d006d66db5a}',Attribute_UID FROM #Attributes WHERE Name = 'Colour'
-- @SearchAttributes to hold list of attributes you are trying to find
DECLARE @SearchAttributes TABLE (Attribute_UID UNIQUEIDENTIFIER)
INSERT INTO @SearchAttributes
SELECT Attribute_UID FROM #Attributes WHERE Name<> 'Colour'
;WITH cte AS (
SELECT WA.widget_UID, COUNT(1) AttributesPresent FROM #WidgetAttributes WA
JOIN @SearchAttributes SA ON SA.Attribute_UID = WA.Attribute_UID
GROUP BY WA.widget_UID
)
SELECT cte.AttributesPresent
, W.widget_UID
, W.Name
FROM cte
JOIN #Widgets W ON W.widget_UID = cte.widget_UID
ORDER BY cte.AttributesPresent DESC
Gives an output of:
AttributesPresent widget_UID Name
----------------- ------------------------------------ ----------
3 C63BEA73-2331-4698-82C9-F71845AB8601 Widget 1
2 A0865B8F-606B-4273-9207-39A8A26016C4 Widget 2
We used an approach of counting how many attributes were present for each so we not only had the option of "exact match" but also "closest fit".
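The "exact match" case mentioned above (a widget must have every searched attribute) can be expressed by comparing the per-widget count with the number of searched attributes. This is a small self-contained sketch; the table variables, integer ids and sample rows are made up for illustration and are not the schema from the example above.

```sql
-- Hypothetical data: widget 1 has four attributes, widget 2 has two, widget 3 has one.
DECLARE @WidgetAttributes TABLE (
    widget_id    INT,
    attribute_id INT,
    PRIMARY KEY (widget_id, attribute_id)   -- guarantees no duplicate pairs
);
DECLARE @SearchAttributes TABLE (attribute_id INT PRIMARY KEY);

INSERT INTO @WidgetAttributes (widget_id, attribute_id)
VALUES (1, 10), (1, 20), (1, 30), (1, 40),
       (2, 10), (2, 20),
       (3, 40);

INSERT INTO @SearchAttributes (attribute_id) VALUES (10), (20);

-- Keep only widgets whose matched-attribute count equals the size of the search list.
SELECT wa.widget_id
FROM @WidgetAttributes AS wa
INNER JOIN @SearchAttributes AS sa
        ON sa.attribute_id = wa.attribute_id
GROUP BY wa.widget_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM @SearchAttributes);
```

With this sample data widgets 1 and 2 qualify and widget 3 does not. The comparison relies on (widget_id, attribute_id) being unique; without that guarantee, COUNT(DISTINCT wa.attribute_id) would be the safer choice.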
Using a bitmask in a database is the wrong approach. Even if you somehow manage to make it work, you will not be able to use indexes to speed up execution.
Use the standard solution; this is a standard situation. There is a standard M:N relationship between Widgets and Attributes (both should be tables, of course). Add another table that assigns Attributes to Widgets; you can call it WidgetAttributes.
It will have 3 columns: Id, WidgetId, AttributeId.
Then you can, for example, get the list of Widgets that have a given Attribute:
select w.*
from Widgets w
inner join WidgetAttributes wa on wa.WidgetId = w.Id
inner join Attributes a on a.Id = wa.AttributeId
where a.AttributeName='xxx'

How do I get SQL to append columns to the result set instead of adding more rows?

I have 3 tables, they are Events, SignOffs and Users.
Events has the fields EventId (PK, int, autoincrement) and EventTitle (nvarchar(50)).
SignOffs has the fields SignOffId (PK, int, autoincrement), EventId (FK to Events.EventId) and SignedOffByUserId (FK to Users.UserId).
Users has the fields UserId (PK, int, autoincrement) and UserName (nvarchar(50)).
I want to do something like this:
SELECT [Events].EventId, EventTitle, UserName
FROM [Events]
INNER JOIN [SignOffs] ON [Events].EventId = [SignOffs].EventId
INNER JOIN [Users] ON [SignOffs].SignedOffByUserId = [Users].UserId
The problem with the above is you get a row for each person that signed off, so a given event can be repeated in the list multiple times if multiple people signed off.
What I want is for columns to be added to the result set for each person that signed off on an event. So the result set should look like this for an event where 3 people signed off:
EventId - EventTitle - SignedOffByUser1 - SignedOffByUser2 - SignedOffByUser3
I don't have any ideas about how this can be done and I'm not even sure how to articulate the problem succinctly to be able to search for answers.
You need to pivot the data. This can be done using the PIVOT operator:
https://msdn.microsoft.com/en-us/library/ms177410.aspx?f=255&MSPPError=-2147217396
Or you can use multiple CASE expressions to achieve the same effect.
The limitation of both approaches is that you can only have a predefined number of columns, so if you allow for 3 sign-offs and an event has more, you will not see the extra ones.
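As a rough illustration of the CASE variant, the sign-offs can be numbered per event with ROW_NUMBER and then spread over a fixed set of columns. The table variables and sample rows below are hypothetical stand-ins for the Events, Users and SignOffs tables in the question, and the limit of three columns is an assumption.

```sql
-- Hypothetical sample data mirroring the tables described in the question.
DECLARE @Events   TABLE (EventId INT PRIMARY KEY, EventTitle NVARCHAR(50));
DECLARE @Users    TABLE (UserId INT PRIMARY KEY, UserName NVARCHAR(50));
DECLARE @SignOffs TABLE (SignOffId INT PRIMARY KEY, EventId INT, SignedOffByUserId INT);

INSERT INTO @Events (EventId, EventTitle) VALUES (1, N'Launch'), (2, N'Audit');
INSERT INTO @Users (UserId, UserName) VALUES (1, N'Ann'), (2, N'Bob'), (3, N'Cid');
INSERT INTO @SignOffs (SignOffId, EventId, SignedOffByUserId)
VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 2);

-- Number the sign-offs within each event, then pick each number into its own column.
WITH numbered AS (
    SELECT e.EventId,
           e.EventTitle,
           u.UserName,
           ROW_NUMBER() OVER (PARTITION BY e.EventId ORDER BY s.SignOffId) AS rn
    FROM @Events AS e
    INNER JOIN @SignOffs AS s ON s.EventId = e.EventId
    INNER JOIN @Users    AS u ON u.UserId  = s.SignedOffByUserId
)
SELECT EventId,
       EventTitle,
       MAX(CASE WHEN rn = 1 THEN UserName END) AS SignedOffByUser1,
       MAX(CASE WHEN rn = 2 THEN UserName END) AS SignedOffByUser2,
       MAX(CASE WHEN rn = 3 THEN UserName END) AS SignedOffByUser3
FROM numbered
GROUP BY EventId, EventTitle;
```

Events with fewer than three sign-offs get NULL in the unused columns, and a fourth sign-off would be silently dropped.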
Another possibility is to put them into a CSV list; this lets you see them all on a single row no matter how many there are.
http://blog.sqlauthority.com/2009/11/25/sql-server-comma-separated-values-csv-from-table-column/
SELECT SUBSTRING(
(SELECT ',' + s.Name
FROM HumanResources.Shift s
ORDER BY s.Name
FOR XML PATH('')),2,200000) AS CSV
This can be done in a subquery.
Hope this helps.

How to show one to many relationship between two tables in SQL?

I have two tables A and B.
A table contain
postid,postname,CategoryURl
and
B table contain
postid,CategoryImageURL
For one postid there are multiple CategoryImageURL values assigned. I want to display the CategoryImageURL values alongside Table A, so that for one postid the result looks like CategoryImageURL1,CategoryImageURL2.
How do I express this one-to-many relationship for one postid? What logic should be written in the SQL function?
As I understand it, you want to display all related CategoryImageURLs from the second table on one line, separated by a comma?
Then you will need a recursive operation. A CTE (Common Table Expression) may do the trick; see below. I have added another key to the second table so that I can check whether all rows of the second table have been processed for the corresponding row in the first table.
Maybe this helps:
with a_cte (post_id, url_id, name, list, rrank) as
(
select
a.post_id
, b.url_id
, a.name
, cast(b.urln + ', ' as nvarchar(100)) as list
, 0 as rrank
from
dbo.a
join dbo.b
on a.post_id = b.post_id
union all
select
c.post_id
, a1.url_id
, c.name
, cast(c.list + case when rrank = 0 then '' else ', ' end + a1.urln as nvarchar(100))
, c.rrank + 1
from a_cte c
join ( select
b.post_id
, b.url_id
, a.name
, b.urln
from dbo.a
join dbo.b
on a.post_id = b.post_id
) a1
on c.post_id = a1.post_id
and c.url_id < a1.url_id -- ==> take care, that there is no endless loop
)
select d.name, d.list
from
(
select name, list, rank() over (partition by post_id order by rrank desc)
from a_cte
) d (name, list, rank)
where rank = 1
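A non-recursive alternative, in the same spirit as the XML PATH technique, is a correlated subquery that builds the list per post. The sketch below uses hypothetical table variables and sample rows named after the columns in the question; it is not the answerer's method.

```sql
-- Hypothetical stand-ins for tables A and B from the question.
DECLARE @A TABLE (postid INT PRIMARY KEY, postname NVARCHAR(50), CategoryURL NVARCHAR(200));
DECLARE @B TABLE (postid INT, CategoryImageURL NVARCHAR(200));

INSERT INTO @A (postid, postname, CategoryURL)
VALUES (1, N'First post', N'/cat/one'), (2, N'Second post', N'/cat/two');
INSERT INTO @B (postid, CategoryImageURL)
VALUES (1, N'/img/a.png'), (1, N'/img/b.png'), (2, N'/img/c.png');

-- For each post, concatenate its image URLs; STUFF strips the leading comma.
SELECT a.postid,
       a.postname,
       a.CategoryURL,
       STUFF((SELECT ',' + b.CategoryImageURL
              FROM @B AS b
              WHERE b.postid = a.postid
              ORDER BY b.CategoryImageURL
              FOR XML PATH('')), 1, 1, '') AS CategoryImageURLs
FROM @A AS a;
```

One caveat: FOR XML PATH escapes characters such as & and <, so URLs containing a query string with & would come back as &amp; unless the result is converted back from XML.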
You are asking the wrong sort of question. This is about normalization.
As it stands, you have a redundancy, where each postname and CategoryURL is represented by an ID field.
For whatever reason, CategoryImageUrl was separated into its own table and linked to each set of postname and CategoryURL.
If the relation is actually one id to each postname, then you can denormalize the table by adding the column CategoryImageUrl to your first table.
Postid, postname, CategoryURL, CategoryImageUrl
Or if you wish to keep the normalization, combine like fields into their own table like so:
--TableA:
Postid, postname, <any other field dependent on postname >
--TableB
Postid, CategoryURL, CategoryImageUrl
Now this groups CategoryURL together but introduces a redundancy: the same CategoryURL can exist multiple times. However, each Postid has only one CategoryURL.
To remove this redundancy in our table, we could use a Star Schema strategy like this:
-- Post Table
Postid, postname
-- Category table
CategoryID, CategoryURL, <any other info dependent only on CategoryURL>
-- Fact Table
Postid, CategoryID, CategoryImageURL
DISCLAIMER: Naturally I assumed aspects of your data and might be off. However, the strategy of normalization is still the same.
Also, remember that SQL is relational and deals with sets of data. Inheritance is incompatible with relational set theory. Every table can be queried forwards and backwards, much the way every page and chapter in a book is treated as part of the book. At no point would we see a chapter independent of a book.

Ordering output in a table that I am creating

I need to apply a 3-way split test to a subset of data. The way I have been doing this is to create a 'TestTable', e.g.:
Select [Group], List, Urn
into tbl_TestSplit
from tbl_AllRecords
where ApplicableToTest = 'Y'
order by List, Urn
Then I add some fields:
alter table tbl_testsplit
add
[ID][int] identity (1,1) not null,
[Split] [nvarchar] (20) null
then I update the split field as follows:
update tbl_testsplit set split = {fn MOD(id,3)}
However when I check the results of the split it is not splitting the records correctly - usually a few records out on at least one of the lists. When I investigated this, I noticed that the table it created was not actually in the order I had indicated.
I am sure there is an easier (ie smarter) way to go about this. Any help gratefully appreciated.
Thanks
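A table has no inherent row order, so the identity values added afterwards are not guaranteed to follow the ORDER BY of the SELECT ... INTO. You can use ROW_NUMBER to calculate the row number in the order you want:
Select [Group], List, Urn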
, split = (ROW_NUMBER() OVER (ORDER BY List, Urn)) % 3
from tbl_AllRecords
where ApplicableToTest = 'Y'
order by List, Urn
% is the modulo operator.
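If the goal is an even split within each list (the question mentions the counts being off "on at least one of the lists"), numbering the rows per list rather than over the whole set may be closer to what is wanted. This is a sketch under that assumption; the table variable and sample rows are hypothetical stand-ins for tbl_AllRecords.

```sql
-- Hypothetical stand-in for tbl_AllRecords.
DECLARE @AllRecords TABLE (
    [Group]          NVARCHAR(10),
    List             NVARCHAR(10),
    Urn              INT,
    ApplicableToTest CHAR(1)
);

INSERT INTO @AllRecords ([Group], List, Urn, ApplicableToTest)
VALUES (N'G1', N'A', 101, 'Y'), (N'G1', N'A', 102, 'Y'), (N'G1', N'A', 103, 'Y'),
       (N'G2', N'B', 201, 'Y'), (N'G2', N'B', 202, 'Y'), (N'G2', N'B', 203, 'N');

-- Restart the numbering for every List so each list is dealt out 0, 1, 2, 0, 1, 2, ...
SELECT [Group],
       List,
       Urn,
       (ROW_NUMBER() OVER (PARTITION BY List ORDER BY Urn) - 1) % 3 AS Split
FROM @AllRecords
WHERE ApplicableToTest = 'Y'
ORDER BY List, Urn;
```

Because the numbering restarts per list, the three split sizes within any one list differ by at most one row.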

Return multiple values in one column within a main query

I am trying to find Relative information from a table and return those results (along with other unrelated results) in one row as part of a larger query.
I already tried using this example, modified for my data.
How to return multiple values in one column (T-SQL)?
But I cannot get it to work. It will not pull any data (I am sure it is user [me] error).
If I query the table directly using a TempTable, I can get the results correctly.
DECLARE @res NVARCHAR(100)
SET @res = ''
CREATE TABLE #tempResult ( item nvarchar(100) )
INSERT INTO #tempResult
SELECT Relation AS item
FROM tblNextOfKin
WHERE ID ='xxx' AND Address ='yyy'
ORDER BY Relation
SELECT @res = @res + item + ', ' from #tempResult
SELECT substring(@res,1,len(@res)-1) as Result
DROP TABLE #tempResult
Note the WHERE line above: xxx and yyy would vary based on the input criteria for the function. But since you cannot use temp tables in a function, I am stuck.
The relevant fields in the table I am trying to query are as follows.
tblNextOfKin
ID - varchar(12)
Name - varchar(60)
Relation - varchar(30)
Address - varchar(100)
I hope this makes enough sense... I saw on another post an expression that fits.
My SQL-fu is not so good.
Once I get a working function, I will place it into the main query for the SSIS package I am working on which is pulling data from many other tables.
I can provide more details if needed, but the site said to keep it simple, and I tried to do so.
Thanks !!!
Follow-up (because when I added a comment to the response below, I could not edit formatting)
I need to be able to get results from different columns.
ID Name Relation Address
1, Mike, SON, 100 Main St.
1, Sara, DAU, 100 Main St.
2, Tim , SON, 123 South St.
Both the first two people live at the same address, so if I query for ID='1' and Address='100 Main St.' I need the results to look something like...
"DAU, SON"
MySQL has GROUP_CONCAT:
SELECT GROUP_CONCAT(Relation ORDER BY Relation SEPARATOR ', ') AS item
FROM tblNextOfKin
WHERE ID ='xxx' AND Address ='yyy'
You can do it for the whole table with
SELECT ID, Address, GROUP_CONCAT(Relation ORDER BY Relation SEPARATOR ', ') AS item
FROM tblNextOfKin
GROUP BY ID, Address
(assuming ID is not unique)
Note: this is usually bad practice as an intermediate step. It is acceptable only as final formatting for presentation (otherwise you will end up ungrouping it, which will be a pain).
I think you need something like this (SQL Server):
SELECT stuff((select ',' +Relation
FROM tblNextOfKin a
WHERE ID ='xxx' AND Address ='yyy'
ORDER BY Relation
FOR XML path('')),1,1,'') AS res;
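Since the question asks for a function and temp tables are not allowed inside one, the same XML PATH expression can be wrapped in a scalar function, which needs no temp table at all. This is a sketch only: the function name GetRelations and the return length are assumptions, and the parameter types are taken from the column definitions given in the question.

```sql
-- Hypothetical function name; parameter types follow the tblNextOfKin columns in the question.
CREATE FUNCTION dbo.GetRelations (@ID VARCHAR(12), @Address VARCHAR(100))
RETURNS VARCHAR(1000)
AS
BEGIN
    -- Build ', DAU, SON' and then strip the leading ', ' (two characters) with STUFF.
    RETURN STUFF((SELECT ', ' + Relation
                  FROM dbo.tblNextOfKin
                  WHERE ID = @ID AND Address = @Address
                  ORDER BY Relation
                  FOR XML PATH('')), 1, 2, '');
END;
GO
```

It could then be called per row from the main query, for example as dbo.GetRelations(t.ID, t.Address) with t being whatever alias the outer table has. Calling a scalar function for every row can be slow on large inputs, so an inline correlated subquery may be preferable there.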