Adding value to column depending on LIKE - sql

I have SSMS 2008 R2. This is my table
PK | DateOf | Other Columns | Items
01 | 05/30/2017 15:30 | Blah | truck, elephant, apple, . . .
02 | 04/15/2012 07:07 | Bluh | foot, orange, horse, . . .
03 | 11/01/2016 10:30 | Wham | apple, screen, penny, . . .
I am trying to search the Items column for each record and count how many times the fruits occur. Conveniently enough, there will only be a single fruit for each record, but it would be cool if the solution would be able to handle several fruits (maybe even duplicates). The result table would look like this for the above
Count | Fruit
2 | Apple
1 | Orange
I have a complete list of the "fruits". I was trying to figure out how to make it work with LIKE's.
SELECT
count(PK) AS [Count]
??? AS [Fruit]
WHERE DateOf >= '2011-01-01 00:00' AND DateOf < '2012-01-01 00:00'
AND Items LIKE '%Apple%' --??(some kind of code that looks for the fruit values??)

This is the wrong way to store lists. But, sometimes we are stuck with other people's really bad design decisions.
In this case, you want a split() function. SQL Server 2016 offers one. You can also find code for one on the web (Google "SQL Server split string"):
SELECT ss.fruit, count(*) AS [Count]
FROM t CROSS APPLY
string_split(t.items) ss(fruit)
WHERE DateOf >= '2011-01-01' AND DateOf < '2012-01-01'
GROUP BY ss.fruit;

You can use a simple outer apply to identify fruits if you have table with fruits. Like this:
drop table if exists dbo.Details;
drop table if exists dbo.Fruits;
create table dbo.Details (
PK int not null primary key
, DateOf datetime2(3)
, Items varchar(100)
);
create table dbo.Fruits (
Fruit varchar(100) not null primary key
);
insert into dbo.Details (PK, DateOf, Items)
values (1, '20170530 15:30', 'truck, elephant, apple, . . .')
, (2, '20120415 07:07', 'foot, orange, horse, . . .')
, (3, '20161101 10:30', 'apple, screen, penny, orange, . . .')
insert into dbo.Fruits (Fruit)
values ('apple'), ('orange');
select
d.PK, d.DateOf, d.Items, count(*) as CountOfFruit
from dbo.Details d
outer apply (
select
*
from dbo.Fruits f
where d.Items like '%' + f.Fruit + '%'
) tf
group by d.PK, d.DateOf, d.Items

Try this it may helps you
IF OBJECT_ID('Tempdb..#Temp') IS NOT NULL
DROP TABLE #Temp
DECLARE #SearchWords nvarchar(max)='Apple,Orange'
DECLARE #Table TABLE (SearchWords nvarchar(max))
INSERT INTO #Table
SELECT #SearchWords
;With cte_Search
AS
(
SELECT ROW_NUMBER()OVER(ORDER BY (SELECT 1) )AS Seq,
Split.a.value('.', 'VARCHAR(1000)') AS SearchWords
FROM (
SELECT
CAST('<S>' + REPLACE(SearchWords, ',', '</S><S>') + '</S>' AS XML) AS SearchWords
FROM #Table
) AS A
CROSS APPLY SearchWords.nodes('/S') AS Split(a)
)
SELECT * INTO #Temp FROM cte_Search
;With CTE (PK , DateOf , OtherColumns , Items)
AS
(
SELECT 01 , '05/30/2017 15:30' ,'Blah', 'truck, elephant, apple'UNION ALL
SELECT 02 , '04/15/2012 07:07' ,'Bluh', 'foot, orange, horse' UNION ALL
SELECT 03 , '11/01/2016 10:30' ,'Wham', 'apple, screen, penny'
)
SELECT SearchWords AS Fruit,
MAX(Seq) AS CountOFFruits FROM
(
SELECT C.OtherColumns,
t.SearchWords,
C.Items,
COUNT(C.Items) CntItems,
ROW_NUMBER()OVER(Partition by t.SearchWords order by C.OtherColumns ) AS Seq
FROM CTE c ,#Temp t
WHERE CHARINDEX(t.SearchWords,c.Items) >0
Group by C.OtherColumns,t.SearchWords,C.Items
)dt
Group by SearchWords
Result
Fruit CountOFFruits
----------------------
Apple 2
Orange 1

Related

SQL group three columns into one

I have a table with three columns:
[ID] [name] [link]
1 sample_name_1 sample_link_1
2 sample_name_2 sample_link_2
3 sample_name_3 sample_link_3
I need to somehow group them into one column, so the ideal result is this:
[one_column]
1
sample_name_1
sample_name_1
2
sample_name_2
sample_link_2
3
sample_name_3
sample_link_3
Does anyone have any suggestions on where to look and how to get it done in SQL Server?
You may try to use VALUES table value constructor with CROSS APPLY:
Table:
CREATE TABLE MyTable (
ID int,
name varchar(50),
link varchar(50)
)
INSERT INTO MyTable (ID, name, link)
VALUES
(1, 'sample_name_1', 'sample_link_1'),
(2, 'sample_name_2', 'sample_link_2'),
(3, 'sample_name_3', 'sample_link_3')
Statement:
SELECT v.one_column
FROM MyTable t
CROSS APPLY (VALUES
(1, CONVERT(varchar(50), ID)),
(2, CONVERT(varchar(50), name)),
(3, CONVERT(varchar(50), link))
) v (rn, one_column)
ORDER BY t.ID, v.rn
Result:
one_column
1
sample_name_1
sample_link_1
2
sample_name_2
sample_link_2
3
sample_name_3
sample_link_3
While this is something you should do in your presentation layer (i.e. your app or Website) you can do this in SQL:
select one column
from
(
select cast(id as varchar(10)) as one column, id as sortkey1, 1 as sortkey2 from mytable
union all
select name as one column, id as sortkey1, 2 as sortkey2 from mytable
union all
select link as one column, id as sortkey1, 3 as sortkey2 from mytable
) unioned
order by sortkey1, sortkey2;

How to synthesize attribute for joined tables

I have a view defined like this:
CREATE VIEW [dbo].[PossiblyMatchingContracts] AS
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts
FROM [dbo].AllContracts AS C
INNER JOIN [dbo].AllContracts AS CC
ON C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB
OR C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB
OR C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB
OR C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB
OR C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE C.UniqueID NOT IN
(
SELECT UniqueID FROM [dbo].DefinitiveMatches
)
AND C.AssociatedUser IS NULL
AND C.UniqueID <> CC.UniqueID
Which is basically finding contracts where f.e. the first name and the birthday are matching. This works great. Now I want to add a synthetic attribute to each row with the value from only one source row.
Let me give you an example to make it clearer. Suppose I have the following table:
UniqueID | FirstName | LastName | Birthday
1 | Peter | Smith | 1980-11-04
2 | Peter | Gray | 1980-11-04
3 | Peter | Gray-Smith| 1980-11-04
4 | Frank | May | 1985-06-09
5 | Frank-Paul| May | 1985-06-09
6 | Gina | Ericson | 1950-11-04
The resulting view should look like this:
UniqueID | PossiblyMatchingContracts | SyntheticID
1 | 2 | PeterSmith1980-11-04
1 | 3 | PeterSmith1980-11-04
2 | 1 | PeterSmith1980-11-04
2 | 3 | PeterSmith1980-11-04
3 | 1 | PeterSmith1980-11-04
3 | 2 | PeterSmith1980-11-04
4 | 5 | FrankMay1985-06-09
5 | 4 | FrankMay1985-06-09
6 | NULL | NULL [or] GinaEricson1950-11-04
Notice that the SyntheticID column uses ONLY values from one of the matching source rows. It doesn't matter which one. I am exporting this view to another application and need to be able to identify each "match group" afterwards.
Is it clear what I mean? Any ideas how this could be done in sql?
Maybe it helps to elaborate a bit on the actual use case:
I am importing contracts from different systems. To account for the possibility of typos or people that have married but the last name was only updated in one system, I need to find so called 'possible matches'. Two or more contracts are considered a possible match if they contain the same birthday plus the same first, last or birth name. That implies, that if contract A matches contract B, contract B also matches contract A.
The target system uses multivalue reference attributes to store these relationships. The ultimate goal is to create user objects for these contracts. The catch first is, that the shall only be one user object for multiple matching contracts. Thus I'm creating these matches in the view. The second catch is, that the creation of user objects happens by workflows, which run parallel for each contract. To avoid creating multiple user objects for matching contracts, each workflow needs to check, if there is already a matching user object or another workflow, which is about to create said user object. Because the workflow engine is extremely slow compared to sql, the workflows should not repeat the whole matching test. So the idea is, to let the workflow check only for the 'syntheticID'.
I have solved it with a multi step approach:
Create the list of possible 1st level matches for each contract
Create the base groups list, assigning a different group for for
each contract (as if they were not related to anybody)
Iterate the matches list updating the group list when more contracts need to
be added to a group
Recursively build up the SyntheticID from final group list
Output results
First of all, let me explain what I have understood, so you can tell if my approach is correct or not.
1) matching propagates in "cascade"
I mean, if "Peter Smith" is grouped up with "Peter Gray", it means that all Smith and all Gray are related (if they have the same birth date) so Luke Smith can be in the same group of John Gray
2) I have not understood what you mean with "Birth Name"
You say contracts matches on "first, last or birth name", sorry, I'm italian, I thought birth name and first were the same, also in your data there is not such column. Maybe it is related to that dash symbol between names?
When FirstName is Frank-Paul it means it should match both Frank and Paul?
When LastName is Gray-Smith it means it should match both Gray and Smith?
In following code I have simply ignored this problem, but it could be handled if needed (I already did a try, breaking names, unpivoting them and treating as double match).
Step Zero: some declaration and prepare base data
declare #cli as table (UniqueID int primary key, FirstName varchar(20), LastName varchar(20), Birthday varchar(20))
declare #comb as table (id1 int, id2 int, done bit)
declare #grp as table (ix int identity primary key, grp int, id int, unique (grp,ix))
declare #str_id as table (grp int primary key, SyntheticID varchar(1000))
declare #id1 as int, #g int
;with
t as (
select *
from (values
(1 , 'Peter' , 'Smith' , '1980-11-04'),
(2 , 'Peter' , 'Gray' , '1980-11-04'),
(3 , 'Peter' , 'Gray-Smith', '1980-11-04'),
(4 , 'Frank' , 'May' , '1985-06-09'),
(5 , 'Frank-Paul', 'May' , '1985-06-09'),
(6 , 'Gina' , 'Ericson' , '1950-11-04')
) x (UniqueID , FirstName , LastName , Birthday)
)
insert into #cli
select * from t
Step One: Create the list of possible 1st level matches for each contract
;with
p as(select UniqueID, Birthday, FirstName, LastName from #cli),
m as (
select p.UniqueID UniqueID1, p.FirstName FirstName1, p.LastName LastName1, p.Birthday Birthday1, pp.UniqueID UniqueID2, pp.FirstName FirstName2, pp.LastName LastName2, pp.Birthday Birthday2
from p
join p pp on (pp.Birthday=p.Birthday) and (pp.FirstName = p.FirstName or pp.LastName = p.LastName)
where p.UniqueID<=pp.UniqueID
)
insert into #comb
select UniqueID1,UniqueID2,0
from m
Step Two: Create the base groups list
insert into #grp
select ROW_NUMBER() over(order by id1), id1 from #comb where id1=id2
Step Three: Iterate the matches list updating the group list
Only loop on contracts that have possible matches and updates only if needed
set #id1 = 0
while not(#id1 is null) begin
set #id1 = (select top 1 id1 from #comb where id1<>id2 and done=0)
if not(#id1 is null) begin
set #g = (select grp from #grp where id=#id1)
update g set grp= #g
from #grp g
inner join #comb c on g.id = c.id2
where c.id2<>#id1 and c.id1=#id1
and grp<>#g
update #comb set done=1 where id1=#id1
end
end
Step Four: Build up the SyntheticID
Recursively add ALL (distinct) first and last names of group to SyntheticID.
I used '_' as separator for birth date, first names and last names, and ',' as separator for the list of names to avoid conflicts.
;with
c as(
select c.*, g.grp
from #cli c
join #grp g on g.id = c.UniqueID
),
d as (
select *, row_number() over (partition by g order by t,s) n1, row_number() over (partition by g order by t desc,s desc) n2
from (
select distinct c.grp g, 1 t, FirstName s from c
union
select distinct c.grp, 2, LastName from c
) l
),
r as (
select d.*, cast(CONVERT(VARCHAR(10), t.Birthday, 112) + '_' + s as varchar(1000)) Names, cast(0 as bigint) i1, cast(0 as bigint) i2
from d
join #cli t on t.UniqueID=d.g
where n1=1
union all
select d.*, cast(r.names + IIF(r.t<>d.t,'_',',') + d.s as varchar(1000)), r.n1, r.n2
from d
join r on r.g = d.g and r.n1=d.n1-1
)
insert into #str_id
select g, Names
from r
where n2=1
Step Five: Output results
select c.UniqueID, case when id2=UniqueID then id1 else id2 end PossibleMatchingContract, s.SyntheticID
from #cli c
left join #comb cb on c.UniqueID in(id1,id2) and id1<>id2
left join #grp g on c.UniqueID = g.id
left join #str_id s on s.grp = g.grp
Here is the results
UniqueID PossibleMatchingContract SyntheticID
1 2 1980-11-04_Peter_Gray,Gray-Smith,Smith
1 3 1980-11-04_Peter_Gray,Gray-Smith,Smith
2 1 1980-11-04_Peter_Gray,Gray-Smith,Smith
2 3 1980-11-04_Peter_Gray,Gray-Smith,Smith
3 1 1980-11-04_Peter_Gray,Gray-Smith,Smith
3 2 1980-11-04_Peter_Gray,Gray-Smith,Smith
4 5 1985-06-09_Frank,Frank-Paul_May
5 4 1985-06-09_Frank,Frank-Paul_May
6 NULL 1950-11-04_Gina_Ericson
I think that in this way the resulting SyntheticID should also be "unique" for each group
This creates a synthetic value and is easy to change to suit your needs.
DECLARE #T TABLE (
UniqueID INT
,FirstName VARCHAR(200)
,LastName VARCHAR(200)
,Birthday DATE
)
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 1,'Peter','Smith','1980-11-04'
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 2,'Peter','Gray','1980-11-04'
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 3,'Peter','Gray-Smith','1980-11-04'
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 4,'Frank','May','1985-06-09'
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 5,'Frank-Paul','May','1985-06-09'
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 6,'Gina','Ericson','1950-11-04'
DECLARE #PossibleMatches TABLE (UniqueID INT,[PossibleMatch] INT,SynKey VARCHAR(2000)
)
INSERT INTO #PossibleMatches
SELECT t1.UniqueID [UniqueID],t2.UniqueID [Possible Matches],'Ln=' + t1.LastName + ' Fn=' + + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM #T t1
INNER JOIN #T t2 ON t1.Birthday=t2.Birthday
AND t1.FirstName=t2.FirstName
AND t1.LastName=t2.LastName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO #PossibleMatches
SELECT t1.UniqueID [UniqueID],t2.UniqueID [Possible Matches],'Fn=' + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM #T t1
INNER JOIN #T t2 ON t1.Birthday=t2.Birthday
AND t1.FirstName=t2.FirstName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO #PossibleMatches
SELECT t1.UniqueID,t2.UniqueID,'Ln=' + t1.LastName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM #T t1
INNER JOIN #T t2 ON t1.Birthday=t2.Birthday
AND t1.LastName=t2.LastName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO #PossibleMatches
SELECT t1.UniqueID,pm.UniqueID,'Ln=' + t1.LastName + ' Fn=' + + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM #T t1
LEFT JOIN #PossibleMatches pm on pm.UniqueID=t1.UniqueID
WHERE pm.UniqueID IS NULL
SELECT *
FROM #PossibleMatches
ORDER BY UniqueID,[PossibleMatch]
I think this will work for you
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
FIRST_VALUE(CC.FirstName+CC.LastName+CC.Birthday)
OVER (PARTITION BY C.UniqueID ORDER BY CC.UniqueID) as SyntheticID
FROM
[dbo].AllContracts AS C INNER JOIN
[dbo].AllContracts AS CC ON
C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE
C.UniqueID NOT IN(
SELECT UniqueID FROM [dbo].DefinitiveMatches)
AND C.AssociatedUser IS NULL
You can try this:
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
FIRST_VALUE(CC.FirstName+CC.LastName+CC.Birthday)
OVER (PARTITION BY C.UniqueID ORDER BY CC.UniqueID) as SyntheticID
FROM
[dbo].AllContracts AS C
INNER JOIN
[dbo].AllContracts AS CC
ON
C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB
OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB
OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB
OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB
OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE
C.UniqueID NOT IN
(
SELECT UniqueID FROM [dbo].DefinitiveMatches
)
AND
C.AssociatedUser IS NULL
This will generate one extra row (because we left out C.UniqueID <> CC.UniqueID) but will give you the good souluton.
Following an example with some example data extracted from your original post. The idea: Generate all SyntheticID in a CTE, query all records with a "PossibleMatch" and Union it with all records which are not yet included:
DECLARE #t TABLE(
UniqueID int
,FirstName nvarchar(20)
,LastName nvarchar(20)
,Birthday datetime
)
INSERT INTO #t VALUES (1, 'Peter', 'Smith', '1980-11-04');
INSERT INTO #t VALUES (2, 'Peter', 'Gray', '1980-11-04');
INSERT INTO #t VALUES (3, 'Peter', 'Gray-Smith', '1980-11-04');
INSERT INTO #t VALUES (4, 'Frank', 'May', '1985-06-09');
INSERT INTO #t VALUES (5, 'Frank-Paul', 'May', '1985-06-09');
INSERT INTO #t VALUES (6, 'Gina', 'Ericson', '1950-11-04');
WITH ctePrep AS(
SELECT UniqueID, FirstName, LastName, BirthDay,
ROW_NUMBER() OVER (PARTITION BY FirstName, BirthDay ORDER BY FirstName, BirthDay) AS k,
FirstName+LastName+CONVERT(nvarchar(10), Birthday, 126) AS SyntheticID
FROM #t
),
cteKeys AS(
SELECT FirstName, BirthDay, SyntheticID
FROM ctePrep
WHERE k = 1
),
cteFiltered AS(
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
keys.SyntheticID
FROM #t AS C
JOIN #t AS CC ON C.FirstName = CC.FirstName
AND C.Birthday = CC.Birthday
JOIN cteKeys AS keys ON keys.FirstName = c.FirstName
AND keys.Birthday = c.Birthday
WHERE C.UniqueID <> CC.UniqueID
)
SELECT UniqueID, PossiblyMatchingContracts, SyntheticID
FROM cteFiltered
UNION ALL
SELECT UniqueID, NULL, FirstName+LastName+CONVERT(nvarchar(10), Birthday, 126) AS SyntheticID
FROM #t
WHERE UniqueID NOT IN (SELECT UniqueID FROM cteFiltered)
Hope this helps. The result looked OK to me:
UniqueID PossiblyMatchingContracts SyntheticID
---------------------------------------------------------------
2 1 PeterSmith1980-11-04
3 1 PeterSmith1980-11-04
1 2 PeterSmith1980-11-04
3 2 PeterSmith1980-11-04
1 3 PeterSmith1980-11-04
2 3 PeterSmith1980-11-04
4 NULL FrankMay1985-06-09
5 NULL Frank-PaulMay1985-06-09
6 NULL GinaEricson1950-11-04
Tested in SSMS, it works perfect. :)
--create table structure
create table #temp
(
uniqueID int,
firstname varchar(15),
lastname varchar(15),
birthday date
)
--insert data into the table
insert #temp
select 1, 'peter','smith','1980-11-04'
union all
select 2, 'peter','gray','1980-11-04'
union all
select 3, 'peter','gray-smith','1980-11-04'
union all
select 4, 'frank','may','1985-06-09'
union all
select 5, 'frank-paul','may','1985-06-09'
union all
select 6, 'gina','ericson','1950-11-04'
select * from #temp
--solution is as below
select ab.uniqueID
, PossiblyMatchingContracts
, c.firstname+c.lastname+cast(c.birthday as varchar) as synID
from
(
select a.uniqueID
, case
when a.uniqueID < min(b.uniqueID)over(partition by a.uniqueid)
then a.uniqueID
else min(b.uniqueID)over(partition by a.uniqueid)
end as SmallestID
, b.uniqueID as PossiblyMatchingContracts
from #temp a
left join #temp b
on (a.firstname = b.firstname OR a.lastname = b.lastname) AND a.birthday = b.birthday AND a.uniqueid <> b.uniqueID
) as ab
left join #temp c
on ab.SmallestID = c.uniqueID
Result capture is attached below:
Say we have following table (a VIEW in your case):
UniqueID PossiblyMatchingContracts SyntheticID
1 2 G1
1 3 G2
2 1 G3
2 3 G4
3 1 G4
3 4 G6
4 5 G7
5 4 G8
6 NULL G9
In your case you can set initial SyntheticID as a string like PeterSmith1980-11-04 using UniqueID for each line. Here is a recursive CTE query it divides all lines to unconnected groups and select MAX(SyntheticId) in the current group as a new SyntheticID for all lines in this group.
WITH CTE AS
(
SELECT CAST(','+CAST(UniqueID AS Varchar(100)) +','+ CAST(PossiblyMatchingContracts as Varchar(100))+',' as Varchar(MAX)) as GroupCont,
SyntheticID
FROM PossiblyMatchingContracts
UNION ALL
SELECT CAST(GroupCont+CAST(UniqueID AS Varchar(100)) +','+ CAST(PossiblyMatchingContracts as Varchar(100))+',' AS Varchar(MAX)) as GroupCont,
pm.SyntheticID
FROM CTE
JOIN PossiblyMatchingContracts as pm
ON
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
OR
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
)
AND NOT
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
AND
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
)
)
SELECT pm.UniqueID,
pm.PossiblyMatchingContracts,
ISNULL(
(SELECT MAX(SyntheticID) FROM CTE WHERE
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
OR
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
))
,pm.SyntheticID) as SyntheticID
FROM PossiblyMatchingContracts pm

Trying to find the total number of, distinct, clients logged on weekdays, weekends and both weekday+weekend

My table structure is as follows:
node_id | client_id | timestamp
--------+-----------+-----------
1 | 102 | 2012-02-01 (weekday)
--------+-----------+-----------
2 | 104 | 2012-02-01 (weekday)
--------+-----------+-----------
2 | 106 | 2012-02-02 (weekday)
--------+-----------+-----------
1 | 106 | 2012-02-02 (weekend)
--------+-----------+-----------
(added fake weekday/weekend to simplify things)
I need to find the total number of, distinct, client_id's logged on:
A weekday
The weekend
Both a weekday and the weekend
Is it possible to do this in MSSQL? Or will I have to resort to simply dumping all the data and parsing it in my program?
EDIT:
From the above table, the desired output would tell me that:
3 people were logged on Mon-Fri by nodes 1 & 2
1 person was logged on Sat-Sun by nodes 1
1 person was logged on Mon-Sun by nodes 1 & 2
Basically, I need to know how many clients were logged on Mon-Fri, Sat-Sun, Mon-Sun and by which nodes.
You can use datepart(weekday,?) to find the day-of-the-week for a date (see http://msdn.microsoft.com/en-us/library/ms174420.aspx); then its just a matter of specifying which of those you're interested in grouping together.
If I understand you correctly you can try this
SET DATEFIRST 1
declare #tbl table (node_id int identity(1,1), client_id int, dtm datetime)
insert into #tbl (client_id,dtm) values (1,'20111001'), (1,'20111001'),(1,'20111002'),(1,'20111003'),(1,'20111004')
,(2,'20111001'), (2,'20111003'),(2,'20111003'),(2,'20111003'),(2,'20111004')
--weekday
select client_id, COUNT(*)
FROM #tbl
WHERE DATEPART(DW,dtm)<6
GROUP BY client_id
--weekend
select client_id, COUNT(*)
FROM #tbl
WHERE DATEPART(DW,dtm)>5
GROUP BY client_id
This solution uses DATEPART function.
To set the first day of the week use SET DATEFIRST.
EDITED
In SQL-Server 2005+ you can do it this way:
SET DATEFIRST 1
DECLARE #tbl table (node_id int, client_id int, dtm datetime)
INSERT INTO #tbl (node_id,client_id,dtm) VALUES (1,102,'20120201'),(2,104,'20120201'),(2,106,'20120202'),(1,106,'20120204')
--weekday
SELECT CAST(COUNT(*) as varchar)+' people were logged on Mon-Fri by nodes '+
(select cast(node_id as varchar)+',' as 'data()' from #tbl WHERE DATEPART(DW,dtm)<6 GROUP BY node_id for xml path(''))
FROM #tbl
WHERE DATEPART(DW,dtm)<6
--weekend
SELECT CAST(COUNT(*) as varchar)+' people were logged on Sat-Sun by nodes '+
(select cast(node_id as varchar)+',' as 'data()' from #tbl WHERE DATEPART(DW,dtm)>5 GROUP BY node_id for xml path(''))
FROM #tbl
WHERE DATEPART(DW,dtm)>5
--all week
SELECT CAST(COUNT(*) as varchar)+' people were logged on Mon-Sun by nodes '+
(select cast(node_id as varchar)+',' as 'data()' from #tbl GROUP BY node_id for xml path(''))
FROM #tbl
WITH myCTE AS (
SELECT node_id, clientid, datepart(weekday, timestamp) wkday
FROM YourTable
)
SELECT T1.node_id, T1.count(distinct clientid) total, 'weekdays' category
FROM myCTE T1
WHERE wkday in (2,3,4,5,6) --weekdays
UNION ALL
SELECT T1.node_id, T1.count(distinct clientid) total, 'weekends' category
FROM myCTE T1
WHERE wkday in (1,7) --weekends
SELECT T1.node_id, T1.count(distinct clientid) total, 'mon-sat' category
FROM myCTE T1
WHERE wkday in (2,3,4,5,6,7) --mon-sat
Hope this helps... using the script above, you can follow that if you need more groupings of different combination of days, just keep adding a union all and change the where clause criteria.

Ungrouping effect?

I have a dynamic set of data X of the form:
----------------------------------
x.id | x.allocated | x.unallocated
----------------------------------
foo | 2 | 0
bar | 1 | 2
----------------------------------
And I need to get to a result of Y (order is unimportant):
----------------------------------
y.id | y.state
----------------------------------
foo | allocated
foo | allocated
bar | allocated
bar | unallocated
bar | unallocated
----------------------------------
I have a UTF based solution, but I'm looking for hyper-efficiency so I'm idly wondering if there's a statement based, non-procedural way to get this kind of "ungroup by" effect?
It feels like an unpivot, but my brain can't get there right now.
If you have a numbers table in your database, you could use that to help get your results. In my database, I have a table named Numbers with a Num column.
Declare #Temp Table(id VarChar(10), Allocated Int, UnAllocated Int)
Insert Into #Temp Values('foo', 2, 0)
Insert Into #Temp Values('bar',1, 2)
Select T.id,'Allocated'
From #Temp T
Inner Join Numbers
On T.Allocated >= Numbers.Num
Union All
Select T.id,'Unallocated'
From #Temp T
Inner Join Numbers
On T.unAllocated >= Numbers.Num
Using Sql Server 2005, UNPIVOT, and CTE you can try something like
DECLARE #Table TABLE(
id VARCHAR(20),
allocated INT,
unallocated INT
)
INSERT INTO #Table SELECT 'foo', 2, 0
INSERT INTO #Table SELECT 'bar', 1, 2
;WITH vals AS (
SELECT *
FROM
(
SELECT id,
allocated,
unallocated
FROM #Table
) p
UNPIVOT (Cnt FOR Action IN (allocated, unallocated)) unpvt
WHERE Cnt > 0
)
, Recurs AS (
SELECT id,
Action,
Cnt - 1 Cnt
FROM vals
UNION ALL
SELECT id,
Action,
Cnt - 1 Cnt
FROM Recurs
WHERE Cnt > 0
)
SELECT id,
Action
FROM Recurs
ORDER BY id, action
This answer is just to ping back to G Mastros and doesn't need any upvotes. I thought he would appreciate a performance boost to his already superior query.
SELECT
T.id,
CASE X.Which WHEN 1 THEN 'Allocated' ELSE 'Unallocated' END
FROM
#Temp T
INNER JOIN Numbers N
On N.Num <= CASE X.Which WHEN 1 THEN T.Allocated ELSE T.Unallocated END
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) X (Which)

SQL query getting data

In SQL Server 2000:
hello i have a table with the following structure:
sku brand product_name inventory_count
------ ------ ------------- ---------------
c001 honda honda car 1 3
t002 honda honda truck 1 6
c003 ford ford car 1 7
t004 ford ford truck 1 8
b005 honda honda bike 5 9
b006 ford ford bike 6 18
I'm using the following SQL query
select distinct left(sku,1) from products
this would return the following:
c
t
b
and then ...
c = car
t = truck
b = bike
this works great,
Now I want to get just one product example for each of the categories with the greatest INVENTORY_COUNT
so that it returns the data as:
c, "ford car 1"
t, "ford truck 1"
b, "ford bike 6"
what SQL query would i run to get that data??
i want the item with the greatest INVENTORY_COUNT for each category.. left(sku,1)
thanks!!
You could join the table on itself to filter out the rows with less than maximum inventory:
select left(a.sku,1), max(a.product_name), max(a.inventory_count)
from YourTable a
left join YourTable more_inv
on left(a.sku,1) = left(more_inv.sku,1)
and a.inventory_count < more_inv.inventory_count
where more_inv.sku is null
group by left(a.sku,1)
The WHERE condition on more_inv.sku is null filters out rows that don't have the highest inventory for their one letter category.
Once we're down to rows with the maximum inventory, you can use max() to get the inventory_count (it'll be the same for all rows) and another max() to get one of the products with the highest inventory_count. You could use min() too.
im using the following sql query which works,
SELECT DISTINCT left(field1,1) as cat , MAX(sku) as topproduct FROM products where inventory_count > 0 GROUP BY left(sku,1)
i just need to add in there an ..order by inventory_count
Using SQL Server 2005 you can try this
DECLARe #Table TABLE(
sku VARCHAR(50),
brand VARCHAR(50),
product_name VARCHAR(50),
inventory_count INT
)
INSERT INTO #Table SELECT 'c001', 'honda', 'honda car 1', 3
INSERT INTO #Table SELECT 't002', 'honda', 'honda truck 1', 6
INSERT INTO #Table SELECT 'c003', 'ford', 'ford car 1', 7
INSERT INTO #Table SELECT 't004', 'ford', 'ford truck 1', 8
INSERT INTO #Table SELECT 'b005', 'honda', 'honda bike 5', 9
INSERT INTO #Table SELECT 'b006', 'ford', 'ford bike 6', 18
SELECT LEFT(sku,1),
product_name
FROM (
SELECT *,
ROW_NUMBER() OVER( PARTITION BY LEFT(sku,1) ORDER BY inventory_count DESC) ORDERCOUNT
FROm #Table
) SUB
WHERE ORDERCOUNT = 1
OK Then you can try
SELECT LEFT(sku,1),
*
FROm #Table t INNER JOIN
(
SELECT LEFT(sku,1) c,
MAX(inventory_count) MaxNum
FROM #Table
GROUP BY LEFT(sku,1)
) sub ON LEFT(t.sku,1) = sub.c and t.inventory_count = sub.MaxNum
For mysql:
SELECT LEFT(sku,1), product_name FROM Table1 GROUP BY LEFT(sku,1)
For MS SQL 2005 (maybe also works in 2000?):
SELECT LEFT(sku,1), MAX(product_name) FROM Table1 GROUP BY LEFT(sku,1)
Try this
declare #t table (sku varchar(50),brand varchar(50),product_name varchar(50),inventory_count int)
insert into #t
select 'c001','honda','honda car 1',3 union all
select 't002','honda','honda truck 1',6 union all
select 'c004','ford','ford car 1',7 union all
select 't004','ford','ford truck 1',8 union all
select 'b005','honda','honda bike 5',9 union all
select 'b006','ford','ford bike 6',18
Query:
select
x.s + space(2) + ',' + space(2) + '"' + t.product_name + '"' as [Output]
from #t t
inner join
(
SELECT left(sku,1) as s,MAX(inventory_count) ic from #t
group by left(sku,1)
) x
on x.ic = t.inventory_count
--order by t.inventory_count desc
Output
c , "ford car 1"
t , "ford truck 1"
b , "ford bike 6"
In general, might there not be more than one item with max(inventory_count)?
To get max inventory per cateogry, use a subquery, (syntax will depend on your database):
SELECT LEFT(sku,1) as category, MAX(inventory_count) as c
FROM Table1
GROUP BY LEFT(sku,1)
SORT BY LEFT(sku,1)
This will give you a table of max_inventory by category, thus:
b,18
c,7
t,8
So now you know the max per category. To get matching products, use this result as
a subquery and find all products in the given cateogry that match the given max(inventory_count):
SELECT t1.*
FROM Table1 AS t1,
(SELECT LEFT(sku,1) AS category, MAX(inventory_count) AS c
FROM Table1
GROUP BY LEFT(sku,1)
) AS t2
WHERE LEFT(t1.sku,1) = t2.category AND t2.c = t1.inventory_count
Sorry, the code above may/may not work in your database, but hope you get the idea.
Bill
PS -- probably not helpful, but the table design isn't really helping you here. If you have control over the schema, would help to separate this into multiple tables.