SQL Query - Display Count & All ID's With Same Name - sql

I'm trying to display the amount of table entries with the same name and the unique ID's associated with each of those entries.
So I have a table like so...
Table Names
------------------------------
ID Name
0 John
1 Mike
2 John
3 Mike
4 Adam
5 Mike
I would like the output to be something like:
Name | Count | IDs
---------------------
Mike 3 1,3,5
John 2 0,2
Adam 1 4
I have the following query which does this except display all the unique ID's:
select name, count(*) as ct from names group by name order by ct desc;

select name,
count(id) as ct,
group_concat(id) as IDs
from names
group by name
order by ct desc;
You can use GROUP_CONCAT for that

Depending on version of MSSQL you are using (2005+), you can use the FOR XML PATH option.
SELECT
Name,
COUNT(*) AS ct,
STUFF((SELECT ',' + CAST(ID AS varchar(MAX))
FROM names i
WHERE i.Name = n.Name FOR XML PATH(''))
, 1, 1, '') as IDs
FROM names n
GROUP BY Name
ORDER BY ct DESC
Closest thing to group_concat you'll get on MSSQL unless you use the SQLCLR option (which I have no experience doing). The STUFF function takes care of the leading comma. Also, you don't want to alias the inner SELECT as it will wrap the element you're selecting in an XML element (alias of TD causes each element to return as <TD>value</TD>).
Given the input above, here's the result I get:
Name ct IDs
Mike 3 1,3,5
John 2 0,2
Adam 1 4
EDIT: DISCLAIMER
This technique will not work as intended for string fields that could possibly contain special characters (like ampersands &, less than <, greater than >, and any number of other formatting characters). As such, this technique is most beneficial for simple integer values, although can still be used for text if you are ABSOLUTELY SURE there are no special characters that would need to be escaped. As such, read the solution posted HERE to ensure these characters get properly escaped.

Here is another SQL Server method, using recursive CTE:
Link to SQLFiddle
; with MyCTE(name,ids, name_id, seq)
as(
select name, CAST( '' AS VARCHAR(8000) ), -1, 0
from Data
group by name
union all
select d.name,
CAST( ids + CASE WHEN seq = 0 THEN '' ELSE ', ' END + cast(id as varchar) AS VARCHAR(8000) ),
CAST( id AS int),
seq + 1
from MyCTE cte
join Data d
on cte.name = d.name
where d.id > cte.name_id
)
SELECT name, ids
FROM ( SELECT name, ids,
RANK() OVER ( PARTITION BY name ORDER BY seq DESC )
FROM MyCTE ) D ( name, ids, rank )
WHERE rank = 1

Related

Split column value into separate columns based on length

I have multiple comma-separated values in one column with a size up to 20000 characters, and I want to split that column into one column but its based on character values 2000 (like into one new column it will take 2000 character and if length is grater than 2000 then its will be in second column like this).
When it's comma-separated value goes into first new column, then it should be meaningful like it should be based on , and up to 2000 characters only like this.
I have done from row level value to column level only but its should be 2000 character and based on comma
Could you please help me with this ?
siddesh, although this question lacks of everything I want to point some things out and help you (as you are an unexperienced SO-user):
First I set up a minimal reproducible exampel. This is on you the next time.
I'll start with a declared table with some rows inserted.
We on SO can copy'n'paste this into our environment which makes it easy to answer.
DECLARE #tbl TABLE(ID INT IDENTITY, YourCSVString VARCHAR(MAX));
INSERT INTO #tbl VALUES('1 this is long text, 2 some second fragment, 3 third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ')
,('1 this is other long text, 2 some second fragment to show that this works with tabular data, 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ');
--This is, what you actually need:
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.framgentLength
FROM #tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength);
The result should be stored within a physical table, where the fkID points to the ID of the original row and the fragmentPosition stores the order. fkID and fragmentPosition should be a combined unique key.
If you really want to do, what you are suggesting in your question (not recommended!) you can try something along this:
DECLARE #maxPerColumn INT=75; --You can set the portion's max size, in your case 2000.
WITH cte AS
(
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.framgentLength
FROM #tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength)
)
,recCTE AS
(
SELECT *
,countPerColumn = 1
,columnCounter = 1
,sumLength = LEN(fragmentContent)
,growingString = CAST(fragmentContent AS NVARCHAR(MAX))
FROM cte WHERE fragmentPosition=0
UNION ALL
SELECT r.fkID
,cte.fragmentPosition
,cte.fragmentContent
,cte.framgentLength
,CASE WHEN A.newSumLength>#maxPerColumn THEN 1 ELSE r.countPerColumn + 1 END
,r.columnCounter + CASE WHEN A.newSumLength>#maxPerColumn THEN 1 ELSE 0 END
,CASE WHEN A.newSumLength>#maxPerColumn THEN LEN(cte.fragmentContent) ELSE newSumLength END
,CASE WHEN A.newSumLength>#maxPerColumn THEN cte.fragmentContent ELSE CONCAT(r.growingString,N', ',cte.fragmentContent) END
FROM cte
INNER JOIN recCTE r ON r.fkID=cte.fkID AND r.fragmentPosition+1=cte.fragmentPosition
CROSS APPLY(VALUES(r.sumLength+LEN(cte.fragmentContent))) A(newSumLength)
)
SELECT TOP 1 WITH TIES
fkID
,growingString
,LEN(growingString)
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC );
The result
fkID pos Content
1 2 1 this is long text, 2 some second fragment, 3 third fragment
1 4 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf
2 0 1 this is other long text
2 1 2 some second fragment to show that this works with tabular data
2 3 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k
2 4 5 halksjfh asdkf
The idea in short:
The first cte does the splitting (as above)
The recursive cte will iterate down the string and do the magic.
The final SELECT uses a hack with TOP 1 WITH TIES together with an ORDER BY ROW_NUMBER() OVER(...). This will return the highest intermediate result only.
Hint: Don't do this...
UPDATE
Just for fun:
You can replace the final SELECT with this
,getPortions AS
(
SELECT TOP 1 WITH TIES
fkID
,fragmentPosition
,growingString
,LEN(growingString) portionLength
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC )
)
SELECT p.*
FROM
(
SELECT fkID
,CONCAT(N'col',ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY fragmentPosition)) AS ColumnName
,growingString
FROM getPortions
) t
PIVOT(MAX(growingString) FOR ColumnName IN(col1,col2,col3,col4,col5)) p;
This will return exactly what you are asking for.
But - as said before - this is against all rules of best practice...

Concatenate multiple rows from inside a correlated 'group by' subquery into a single text string

Similar questions have been asked before but I am specifically looking for an answer to do much the same with a correlated subquery.
I am doing this on SQL Server, and I cannot utilize stored procedure or temp table creation approach.
For those familiar with Client Matter billing; I have formulated a 'group by' query using row_number technique to return me back the top 3 performers for each unique clientmatter, summing their amounts over a period of time.
This gives me something like this:
clientmatterno attorneyname amount seq_num
111111.00001 John Doe $30,000 1
111111.00001 Mark Tim $23,000 2
111111.00001 Jane Sue $15,000 3
111111.00001 Mary Ann $5,000 4
222221.00501 John Doe $35,000 1
222221.00501 David Hu $30,000 2
444444.00003 Shelly Y $50,000 1
I think, I would have to first do a group by clause to sum up the amounts for each attorney in order to find the totals and hence get the correct seq_num to appear across.
I am now trying to use this subquery results to do the string concatenation such that I get the following results:
111111.00001 John Doe|Mark Tim|Jane Sue
222221.00501 John Doe|David Hu
444444.00003 Shelly Y
The Query that I think will work, seeing past questions on this topic:
select subq.clientmatterno as [Id],
,
STUFF(
(SELECT DISTINCT ',' + subq.attorneyname
FROM ????
WHERE ????
FOR XML PATH (''))
, 1, 1, '') AS TopPerformers
from (
SELECT clientmatterno, attorneyname, sum(amount),
row_number() over (partition by clientmatterno order by sum(amount) desc) as seq_num
FROM ...
WHERE ...
GROUP BY clientmatterno, attorneyname
) as subq
where seq_num <= 3
group by clientmatterno
My problem is on how to connect and build up the STUFF function. The error is very simple: I cannot seem to use the subquery set 'subq' in the FROM clause inside the STUFF function.
I have not tried out XML FOR Auto approach.
Try using a common table expression instead of a derived table:
with cte as (
SELECT
clientmatterno,
attorneyname,
sum(amount) amount,
seq = row_number() over (partition by clientmatterno order by sum(amount) desc)
FROM ...
WHERE ...
GROUP BY clientmatterno, attorneyname
)
SELECT
clientmatterno,
STUFF(
(
SELECT '|' + attorneyname
FROM cte
WHERE clientmatterno = a.clientmatterno
AND seq <= 3
FOR XML PATH ('')
), 1, 1, ''
) AS Attorneynames
FROM cte AS a
GROUP BY clientmatterno

Need to select a JSON array element dynamically from a postgresql table

I have data as follows:
ID Name Data
1 Joe ["Mary","Joe"]
2 Mary ["Sarah","Mary","Mary"]
3 Bill ["Bill","Joe"]
4 James ["James","James","James"]
I want to write a query that selects the LAST element from the array, which does not equal the Name field. For example, I want the query to return the following results:
ID Name Last
1 Joe Mary
2 Mary Sarah
3 Bill Joe
4 James (NULL)
I am getting close - I can select the last element with the following query:
SELECT ID, Name,
(Data::json->(json_array_length(Data::json)-1))::text AS Last
FROM table;
ID Name Last
1 Joe Joe
2 Mary Mary
3 Bill Joe
4 James James
However, I need one more level - to evaluate the last item, and if it is the same as the name field, to try the next to last field, and so on.
Any help or pointers would be most appreciated!
json in Postgres 9.3
This is hard in pg 9.3, because useful functionality is missing.
Method 1
Unnest in a LEFT JOIN LATERAL (clean and standard-conforming), trim double-quotes from json after casting to text. See links below.
SELECT DISTINCT ON (1)
t.id, t.name, d.last
FROM tbl t
LEFT JOIN LATERAL (
SELECT ('[' || d::text || ']')::json->>0 AS last
FROM json_array_elements(t.data) d
) d ON d.last <> t.name
ORDER BY 1, row_number() OVER () DESC;
While this works, and I have never seen it fail, the order of unnested elements depends on undocumented behavior. See links below!
Improved the conversion from json to text with the expression provided by #pozs in the comment. Still hackish, but should be safe.
Method 2
SELECT DISTINCT ON (1)
id, name, NULLIF(last, name) AS last
FROM (
SELECT t.id, t.name
,('[' || json_array_elements(t.data)::text || ']')::json->>0 AS last
, row_number() OVER () AS rn
FROM tbl t
) sub
ORDER BY 1, (last = name), rn DESC;
Unnest in the SELECT list (non-standard).
Attach row number (rn) in parallel (more reliable).
Convert to text like above.
The expression (last = name) in the ORDER BY clause sorts matching names last (but before NULL). So a matching name is only selected if no other name is available. Last link below.
In the SELECT list, NULLIF replaces a matching name with NULL, arriving at the same result as above.
SQL Fiddle.
json or jsonb in Postgres 9.4
pg 9.4 ships all the necessary improvements:
SELECT DISTINCT ON (1)
t.id, t.name, d.last
FROM tbl t
LEFT JOIN LATERAL json_array_elements_text(data) WITH ORDINALITY d(last, rn)
ON d.last <> t.name
ORDER BY d.rn DESC;
Use jsonb_array_elements_text() for jsonb. All else the same.
json / jsonb functions in the manual
Related answers with more explanation:
How to turn json array into postgres array?
PostgreSQL unnest() with element number
Index for finding an element in a JSON array
Time based priority in Active Record Query
Select first row in each GROUP BY group?

How to split the string value in one column and return the result table

Assume we have the following table:
id name member
1 jacky a;b;c
2 jason e
3 kate i;j;k
4 alex null
Now I want to use the sql or t-sql to return the following table:
1 jacky a
1 jacky b
1 jacky c
2 jason e
3 kate i
......
How to do that?
I'm using the MSSQL, MYSQL and Oracle database.
This is the shortest and readable string-to-rows splitter one could devise, and could be faster too.
Use case of choosing pure CTE instead of function, e.g. when you're not allowed to create a function on database :-)
Creating rows generator via function(which could be implemented by using loop or via CTE too) shall still need to use lateral joins(DB2 and Sybase have this functionality, using LATERAL keyword; In SQL Server, this is similar to CROSS APPLY and OUTER APPLY) to ultimately join the splitted rows generated by a function to the main table.
Pure CTE approach could be faster than function approach. The speed metrics lies in profiling though, just check the execution plan of this compared to other solutions if this is indeed faster:
with Pieces(theId, pn, start, stop) AS
(
SELECT id, 1, 1, charindex(';', member)
from tbl
UNION ALL
SELECT id, pn + 1, stop + 1, charindex(';', member, stop + 1)
from tbl
join pieces on pieces.theId = tbl.id
WHERE stop > 0
)
select
t.id, t.name,
word =
substring(t.member, p.start,
case WHEN stop > 0 THEN p.stop - p.start
ELSE 512
END)
from tbl t
join pieces p on p.theId = t.id
order by t.id, p.pn
Output:
ID NAME WORD
1 jacky a
1 jacky b
1 jacky c
2 jason e
3 kate i
3 kate j
3 kate k
4 alex (null)
Base logic sourced here: T-SQL: Opposite to string concatenation - how to split string into multiple records
Live test: http://www.sqlfiddle.com/#!3/2355d/1
Well... let me first introduce you to Adam Machanic who taught me about a Numbers table. He's also written a very fast split function using this Numbers table.
http://dataeducation.com/counting-occurrences-of-a-substring-within-a-string/
After you implement a Split function that returns a table, you can then join against it and get the results you want.
IF OBJECT_ID('dbo.Users') IS NOT NULL
DROP TABLE dbo.Users;
CREATE TABLE dbo.Users
(
id INT IDENTITY NOT NULL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
member VARCHAR(1000)
)
GO
INSERT INTO dbo.Users(name, member) VALUES
('jacky', 'a;b;c'),
('jason', 'e'),
('kate', 'i;j;k'),
('alex', NULL);
GO
DECLARE #spliter CHAR(1) = ';';
WITH Base AS
(
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM Base
WHERE n < CEILING(SQRT(1000)) --generate numbers from 1 to 1000, you may change it to a larger value depending on the member column's length.
)
, Nums AS --Numbers Common Table Expression, if your database version doesn't support it, just create a physical table.
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n
FROM Base AS B1 CROSS JOIN Base AS B2
)
SELECT id,
SUBSTRING(member, n, CHARINDEX(#spliter, member + #spliter, n) - n) AS element
FROM dbo.Users
JOIN Nums
ON n <= DATALENGTH(member) + 1
AND SUBSTRING(#spliter + member, n, 1) = #spliter
ORDER BY id
OPTION (MAXRECURSION 0); --Nums CTE is generated recursively, we don't want to limit recursion count.

How to select the top n from a union of two queries where the resulting order needs to be ranked by individual query?

Let's say I have a table with usernames:
Id | Name
-----------
1 | Bobby
20 | Bob
90 | Bob
100 | Joe-Bob
630 | Bobberino
820 | Bob Junior
I want to return a list of n matches on name for 'Bob' where the resulting set first contains exact matches followed by similar matches.
I thought something like this might work
SELECT TOP 4 a.* FROM
(
SELECT * from Usernames WHERE Name = 'Bob'
UNION
SELECT * from Usernames WHERE Name LIKE '%Bob%'
) AS a
but there are two problems:
It's an inefficient query since the sub-select could return many rows (looking at the execution plan shows a join happening before top)
(Almost) more importantly, the exact match(es) will not appear first in the results since the resulting set appears to be ordered by primary key.
I am looking for a query that will return (for TOP 4)
Id | Name
---------
20 | Bob
90 | Bob
(and then 2 results from the LIKE query, e.g. 1 Bobby and 100 Joe-Bob)
Is this possible in a single query?
You could use a case to place the exact matches on top:
select top 4 *
from Usernames
where Name like '%Bob%'
order by
case when Name = 'Bob' then 1 else 2 end
Or, if you're worried about performance and have an index on (Name):
select top 4 *
from (
select 1 as SortOrder
, *
from Usernames
where Name = 'Bob'
union all
select 2
, *
from Usernames
where Name like '%Bob%'
and Name <> 'Bob'
and 4 >
(
select count(*)
from Usernames
where Name = 'Bob'
)
) as SubqueryAlias
order by
SortOrder
A slight modification to your original query should solve this. You could add in an additional UNION that matches WHERE Name LIKE 'Bob%' and give this priority 2, changing the '%Bob' priority to 3 and you'd get an even better search IMHO.
SELECT TOP 4 a.* FROM
(
SELECT *, 1 AS Priority from Usernames WHERE Name = 'Bob'
UNION
SELECT *, 2 from Usernames WHERE Name LIKE '%Bob%'
) AS a
ORDER BY Priority ASC
This might do what you want with better performance.
SELECT TOP 4 a.* FROM
(
SELECT TOP 4 *, 1 AS Sort from Usernames WHERE Name = 'Bob'
UNION ALL
SELECT TOP 4 *, 2 AS Sort from Usernames WHERE Name LIKE '%Bob%' and Name <> 'Bob'
) AS a
ORDER BY Sort
This works for me:
SELECT TOP 4 * FROM (
SELECT 1 as Rank , I, name FROM Foo WHERE Name = 'Bob'
UNION ALL
SELECT 2 as Rank,i,name FROM Foo WHERE Name LIKE '%Bob%'
) as Q1
ORDER BY Q1.Rank, Q1.I
SET ROWCOUNT 4
SELECT * from Usernames WHERE Name = 'Bob'
UNION
SELECT * from Usernames WHERE Name LIKE '%Bob%'
SET ROWCOUNt 0
The answer from Will A got me over the line, but I'd like to add a quick note, that if you're trying to do the same thing and incorporate "FOR XML PATH", you need to write it slightly differently.
I was specifying XML attributes and so had things like :
SELECT Field_1 as [#attr_1]
What you have to do is remove the "#" symbol in the sub queries and then add them back in with the outer query. Like this:
SELECT top 1 a.SupervisorName as [#SupervisorName]
FROM
(
SELECT (FirstNames + ' ' + LastName) AS [SupervisorName],1 as OrderingVal
FROM ExamSupervisor SupervisorTable1
UNION ALL
SELECT (FirstNames + ' ' + LastName) AS [SupervisorName],2 as OrderingVal
FROM ExamSupervisor SupervisorTable2
) as a
ORDER BY a.OrderingVal ASC
FOR XML PATH('Supervisor')
This is a cut-down version of my final query, so it doesn't really make sense, but you should get the idea.