Partial Combination Count - sql

Another person asked a similar question, found here:
Frequency of all combinations of values for certain column
In it, they asked:
*I have a dataset in SQL Server 2012 with a column for id and value, like this:
[id] [value]
--------------
A 15
A 11
A 11
B 13
B 15
B 12
C 12
C 13
D 13
D 12
My goal is to get a frequency count of all combinations of [value], with two caveats:
Order doesn't matter, so [11,12,15] is not counted separately from [12,11,15]
Repeated values are counted separately, so [11,11,12,15] is counted separately from [11,12,15]
I'm interested in all combinations, of any length (not just pairs)
So the outcome would look like:
[combo] [frequency]
---------------------
11,11,15 1
12,13,15 1
12,13 2
I've seen answers here involving recursion that answer similar questions but where order counts, and answers here involving self-joins that yield pair-wise combinations. These come close but I'm not quite sure how to adapt for my specific needs.*
Someone then responded with a pretty good answer:
select vals, count(*) as frequency
from (select string_agg(value, ',') within group (order by value) as vals, id
from t
group by id
) i
group by vals;
The only problem I see is that although 12,13,15 occurs once and 12,13 (on its own) occurs twice, the result I would like is for 12,13 to be counted 3 times (while still reporting the 12,13,15 count as well)!
@Gordon Linoff
@zealous
select vals, count(*) as frequency
from (select string_agg(value, ',') within group (order by value) as vals, id
from t
group by id
) i
group by vals;
expecting counts of any instance of two or more values in a combination.

It seems that you have an overlapping groups problem here, i.e. (12,13) overlaps with (12,13,15) and (11,12,13). You may solve this problem by performing a self left join, as in the following query.
Note that your version (SQL Server 2012) does not support the STRING_AGG function, so I used the XML PATH method instead.
WITH CTE AS
(
    SELECT T.id,
           STUFF((SELECT CONCAT(',', D.[value])
                  FROM table_name D
                  WHERE D.id = T.id
                  ORDER BY D.[value]
                  FOR XML PATH('')), 1, LEN(','), '') AS combo
    FROM table_name T
    GROUP BY T.id
)
SELECT A.combo, COUNT(DISTINCT B.id) AS frequency
FROM CTE A
LEFT JOIN CTE B
       ON B.combo LIKE CONCAT('%', A.combo, '%')
GROUP BY A.combo
ORDER BY MAX(A.id)
See a demo.
The output according to your input data:
combo frequency
11,11,15 1
12,13,15 1
12,13 3
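One aside, not part of the original answer: because the join uses a plain substring match, a combo like 2,13 would also match inside 12,13,15 if such values existed. A common guard (a sketch reusing the CTE defined above, and still matching only contiguous runs of values, exactly like the original) is to wrap both sides in leading and trailing commas:
SELECT A.combo, COUNT(DISTINCT B.id) AS frequency
FROM CTE A
LEFT JOIN CTE B
       -- the extra commas stop '2,13' from matching inside '12,13,15'
       ON CONCAT(',', B.combo, ',') LIKE CONCAT('%,', A.combo, ',%')
GROUP BY A.combo
ORDER BY MAX(A.id)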

Related

SQL - Count Results of 2 Columns

I have the following table which contains ID's and UserId's.
ID UserID
1111 11
1111 300
1111 51
1122 11
1122 22
1122 3333
1122 45
I'm trying to count the distinct number of IDs so that I get a total, but I also need a total of the UserIDs that have seen those particular IDs as well... To get the IDs, I've had to perform a subquery against another table, and I then pass this into the main query... Now I just want the results to be displayed as follows.
So I get a total number for ID and a total number for UserID. I would also like to add another column to get the average as well for each ID:
TotalID Total_UserID Average
2 7 3.5
If possible I would also like to get an average as well, but I'm not sure how to calculate that. I would need to count all the UserIDs for an ID, add them all together, and then find the AVG. (Any advice on that calculation would be appreciated.)
Current query:
SELECT DISTINCT(a.ID)
,COUNT(b.UserID)
FROM a
INNER JOIN b ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
GROUP BY a.ID
This then lists all the IDs and counts all the UserIDs. I would like a total of both columns. I've tried wrapping the query in a
SELECT COUNT(*) FROM (
but this only counts the IDs, which is great, but how do I count the UserID column as well?
You seem to want this:
SELECT COUNT(DISTINCT a.ID), COUNT(b.UserID),
COUNT(b.UserID) * 1.0 / COUNT(DISTINCT a.ID)
FROM a INNER JOIN
b
ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999);
Note: DISTINCT is not a function. It applies to the whole row, so it is misleading to put an expression in parentheses after it.
Also, the GROUP BY is unnecessary.
The 1.0 is because SQL Server does integer arithmetic and this is a simple way to convert a number to a decimal form.
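For example (a quick illustration, not part of the original answer):
SELECT 7 / 2       AS integer_division,  -- 3
       7 * 1.0 / 2 AS decimal_division;  -- 3.5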
You can use
SELECT COUNT(DISTINCT a.ID) ...
to count all distinct values.
Read the details here.
I believe you want this:
select TotalID,
       Total_UserID,
       (Total_UserID + TotalID) as Total,
       Total_UserID * 1.0 / TotalID as Average
from (
      SELECT COUNT(DISTINCT a.ID) as TotalID
            ,COUNT(b.UserID) as Total_UserID
      FROM a
      INNER JOIN b ON someID = someID
      WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
     ) x

Need to select a JSON array element dynamically from a postgresql table

I have data as follows:
ID Name Data
1 Joe ["Mary","Joe"]
2 Mary ["Sarah","Mary","Mary"]
3 Bill ["Bill","Joe"]
4 James ["James","James","James"]
I want to write a query that selects the LAST element in the array that does not equal the Name field. For example, I want the query to return the following results:
ID Name Last
1 Joe Mary
2 Mary Sarah
3 Bill Joe
4 James (NULL)
I am getting close - I can select the last element with the following query:
SELECT ID, Name,
(Data::json->(json_array_length(Data::json)-1))::text AS Last
FROM table;
ID Name Last
1 Joe Joe
2 Mary Mary
3 Bill Joe
4 James James
However, I need one more level - to evaluate the last item, and if it is the same as the name field, to try the next to last field, and so on.
Any help or pointers would be most appreciated!
json in Postgres 9.3
This is hard in pg 9.3, because useful functionality is missing.
Method 1
Unnest in a LEFT JOIN LATERAL (clean and standard-conforming), trim double-quotes from json after casting to text. See links below.
SELECT DISTINCT ON (1)
t.id, t.name, d.last
FROM tbl t
LEFT JOIN LATERAL (
SELECT ('[' || d::text || ']')::json->>0 AS last
FROM json_array_elements(t.data) d
) d ON d.last <> t.name
ORDER BY 1, row_number() OVER () DESC;
While this works, and I have never seen it fail, the order of unnested elements depends on undocumented behavior. See links below!
I improved the conversion from json to text with the expression provided by @pozs in the comments. It's still hackish, but should be safe.
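As a quick aside (not part of the original answer), the quote-trimming trick above works because ->> returns text, while a plain cast of a json scalar keeps its double quotes; a tiny illustration:
SELECT ('["Mary"]'::json -> 0)::text AS with_quotes,    -- "Mary"
       '["Mary"]'::json ->> 0        AS without_quotes; -- Mary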
Method 2
SELECT DISTINCT ON (1)
id, name, NULLIF(last, name) AS last
FROM (
SELECT t.id, t.name
,('[' || json_array_elements(t.data)::text || ']')::json->>0 AS last
, row_number() OVER () AS rn
FROM tbl t
) sub
ORDER BY 1, (last = name), rn DESC;
Unnest in the SELECT list (non-standard).
Attach row number (rn) in parallel (more reliable).
Convert to text like above.
The expression (last = name) in the ORDER BY clause sorts matching names last (but before NULL). So a matching name is only selected if no other name is available. Last link below.
In the SELECT list, NULLIF replaces a matching name with NULL, arriving at the same result as above.
SQL Fiddle.
json or jsonb in Postgres 9.4
pg 9.4 ships all the necessary improvements:
SELECT DISTINCT ON (1)
t.id, t.name, d.last
FROM tbl t
LEFT JOIN LATERAL json_array_elements_text(data) WITH ORDINALITY d(last, rn)
ON d.last <> t.name
ORDER BY 1, d.rn DESC;
Use jsonb_array_elements_text() for jsonb. All else the same.
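For reference, a minimal self-contained sketch of the jsonb variant, using the table and sample rows from the question (the table name tbl is the same placeholder used above):
CREATE TABLE tbl (id int, name text, data jsonb);

INSERT INTO tbl VALUES
  (1, 'Joe',   '["Mary","Joe"]'),
  (2, 'Mary',  '["Sarah","Mary","Mary"]'),
  (3, 'Bill',  '["Bill","Joe"]'),
  (4, 'James', '["James","James","James"]');

SELECT DISTINCT ON (1)
       t.id, t.name, d.last
FROM   tbl t
LEFT   JOIN LATERAL jsonb_array_elements_text(t.data)
       WITH ORDINALITY AS d(last, rn) ON d.last <> t.name
ORDER  BY 1, d.rn DESC;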
json / jsonb functions in the manual
Related answers with more explanation:
How to turn json array into postgres array?
PostgreSQL unnest() with element number
Index for finding an element in a JSON array
Time based priority in Active Record Query
Select first row in each GROUP BY group?

SQL Server find the missing number

I have a table like below
id name year
--------------
1 A 2000
2 B 2000
2 B 2000
2 B 2000
5 C 2000
1 D 2001
3 E 2001
As you can see, in the year 2000 we are missing id 3 and id 4, and in the year 2001 we are missing id 2. I want to generate a second table which includes the missing items.
2nd table:
From-id to-id name year
--------------------------------
3 4 null 2000
2 null null 2001
Which method in a SQL query can solve my problem?
This is known as the Gaps and Islands in Sequences problem. You can read this article.
Here's something to get you started:
WITH cte AS
(
SELECT *
FROM
(VALUES
(1),(2),(3),(4),(5)
) Tally(number)
), cte2 as
(
SELECT DISTINCT [year]
FROM
(VALUES
(2000),(2000),(2001)
)tbl([year])
), cte3 as
(
SELECT *
FROM cte
CROSS JOIN cte2
)
SELECT *
FROM cte3
LEFT OUTER JOIN YourTable ON cte3.number = YourTable.id AND cte3.[year] = YourTable.[year]
A few notes: please avoid using reserved keywords as column names (such as year).
Furthermore, since I didn't know how you'd handle multiple missing ranges, I did not format the output to reflect a range. For example: what would your expected output be if only one row with id=3 were in your table?
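As a hedged aside (a sketch, not part of this answer), if you do want the from-id / to-id ranges from your expected output, the gaps-and-islands trick named above can be applied to the missing numbers. Table and column names follow the question (YourTable with id, name, [year]); the tally is kept small for the sketch:
WITH tally AS
(   -- fixed number list for the sketch; in practice generate enough numbers
    SELECT number FROM (VALUES (1),(2),(3),(4),(5)) AS t(number)
), bounds AS
(   -- highest id seen per year, so each year is only checked up to its own maximum
    SELECT [year], MAX(id) AS max_id
    FROM YourTable
    GROUP BY [year]
), missing AS
(   -- ids that never appear in a given year, with a gaps-and-islands group key
    SELECT t.number, b.[year],
           t.number - ROW_NUMBER() OVER (PARTITION BY b.[year]
                                         ORDER BY t.number) AS grp
    FROM tally t
    JOIN bounds b
      ON t.number <= b.max_id
    LEFT OUTER JOIN YourTable y
      ON t.number = y.id AND b.[year] = y.[year]
    WHERE y.id IS NULL
)
SELECT MIN(number)                      AS [from-id],
       NULLIF(MAX(number), MIN(number)) AS [to-id],
       CAST(NULL AS varchar(10))        AS name,
       [year]
FROM missing
GROUP BY [year], grp;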
I'd probably use ROW_NUMBER for this
This query gives you what the correct ID should be (if I interpreted your question right):
SELECT
ROW_NUMBER() OVER (PARTITION BY yr ORDER BY name, yr) as "Correct ID", *
FROM misorder
It assigns a row number (so a number starting from 1 increasing by 1 every time the year is the same).
And to let you know which ones are missing I think this should be a working solution:
WITH missing AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY yr ORDER BY name, yr) as "Correct ID", *
FROM misorder
)
SELECT * FROM missing
WHERE "Correct ID" != "id"
It takes the first query as a base to select only those records where the assumed correct ID is not equal to the currently assigned ID. You can turn this into a query to include the ranges you mentioned, but not sure if that is really necessary.
Hope this helps.

How to concatenate rows delimited with comma using standard SQL?

Let's suppose we have a table T1 and a table T2. There is a 1:n relation between T1 and T2. I would like to select all T1 rows along with all of their T2 rows, each result row corresponding to a T1 record with its T2 values concatenated, using only SQL-standard operations.
Example:
T1 = Person
T2 = Popularity (by year)
for each year a person has a certain popularity
I would like to write a selection using SQL-standard operations, resulting something like this:
Person.Name Popularity.Value
John Smith 1.2,5,4.2
John Doe NULL
Jane Smith 8
where there are 3 records in the popularity table for John Smith, none for John Doe and one for Jane Smith, their values being the values represented above. Is this possible? How?
I'm using Oracle but would like to do this using only standard SQL.
Here's one technique, using recursive Common Table Expressions. Unfortunately, I'm not confident about its performance.
I'm sure that there are ways to improve this code, but it shows that there doesn't seem to be an easy way to do something like this using just the SQL standard.
As far as I can see, there really should be some kind of STRINGJOIN aggregate function that would be used with GROUP BY. That would make things like this much easier...
This query assumes that there is some kind of PersonID that joins the two relations, but the Name would work too.
WITH cte (id, Name, Value, ValueCount) AS (
SELECT id,
Name,
CAST(Value AS VARCHAR(MAX)) AS Value,
1 AS ValueCount
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS id,
Name,
Value
FROM Person AS per
INNER JOIN Popularity AS pop
ON per.PersonID = pop.PersonID
) AS e
WHERE id = 1
UNION ALL
SELECT e.id,
e.Name,
cte.Value + ',' + CAST(e.Value AS VARCHAR(MAX)) AS Value,
cte.ValueCount + 1 AS ValueCount
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS id,
Name,
Value
FROM Person AS per
INNER JOIN Popularity AS pop
ON per.PersonID = pop.PersonID
) AS e
INNER JOIN cte
ON e.id = cte.id + 1
AND e.Name = cte.Name
)
SELECT p.Name, agg.Value
FROM Person p
LEFT JOIN (
SELECT Name, Value
FROM (
SELECT Name,
Value,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ValueCount DESC) AS id
FROM cte
) AS p
WHERE id = 1
) AS agg
ON p.Name = agg.Name
This is an example result:
--------------------------------
| Name | Value |
--------------------------------
| John Smith | 1.2,5,4.2 |
--------------------------------
| John Doe | NULL |
--------------------------------
| Jane Smith | 8 |
--------------------------------
In Oracle you can use listagg to achieve this:
select t1.Person_Name, listagg(t2.Popularity_Value, ',')
within group(order by t2.Popularity_Value)
from t1, t2
where t1.Person_Name = t2.Person_Name (+)
group by t1.Person_Name
I hope this will solve your problem.
But regarding the comment you gave after @DavidJashi's question.. well, this is not SQL standard, and I think he is correct. I agree with David that you cannot achieve this in a pure standard SQL statement.
I know that I'm SUPER late to the party, but for anyone else that might find this, I don't believe that this is possible using pure SQL92. Over the last few months of fighting with NetSuite to figure out which Oracle methods I can and cannot use with their ODBC driver, I discovered that they only "support and guarantee" the SQL92 standard.
I discovered this because I had a need to perform a LISTAGG(). Once I found out I was restricted to SQL92, I did some digging through the historical records, and LISTAGG() and recursive queries (common table expressions) are NOT supported in SQL92 at all.
LISTAGG() was added in Oracle SQL version 11g Release 2 (2009 – 11 years ago: reference https://oracle-base.com/articles/misc/string-aggregation-techniques#listagg) , CTEs were added to Oracle SQL in version 9.2 (2007 – 13 years ago: reference https://www.databasestar.com/sql-cte-with/).
VERY frustrating that it's completely impossible to accomplish this kind of effect in pure SQL92, so I had to solve the problem in my C# code after I pulled a ton of extra unnecessary data. Very frustrating.

SQL Query - Display Count & All ID's With Same Name

I'm trying to display the number of table entries with the same name, along with the unique IDs associated with each of those entries.
So I have a table like so...
Table Names
------------------------------
ID Name
0 John
1 Mike
2 John
3 Mike
4 Adam
5 Mike
I would like the output to be something like:
Name | Count | IDs
---------------------
Mike 3 1,3,5
John 2 0,2
Adam 1 4
I have the following query, which does this except for displaying all the unique IDs:
select name, count(*) as ct from names group by name order by ct desc;
You can use GROUP_CONCAT for that:
select name,
count(id) as ct,
group_concat(id) as IDs
from names
group by name
order by ct desc;
Depending on the version of MSSQL you are using (2005+), you can use the FOR XML PATH option.
SELECT
Name,
COUNT(*) AS ct,
STUFF((SELECT ',' + CAST(ID AS varchar(MAX))
FROM names i
WHERE i.Name = n.Name FOR XML PATH(''))
, 1, 1, '') as IDs
FROM names n
GROUP BY Name
ORDER BY ct DESC
Closest thing to group_concat you'll get on MSSQL unless you use the SQLCLR option (which I have no experience doing). The STUFF function takes care of the leading comma. Also, you don't want to alias the inner SELECT as it will wrap the element you're selecting in an XML element (alias of TD causes each element to return as <TD>value</TD>).
Given the input above, here's the result I get:
Name ct IDs
Mike 3 1,3,5
John 2 0,2
Adam 1 4
EDIT: DISCLAIMER
This technique will not work as intended for string fields that could possibly contain special characters (like ampersands &, less than <, greater than >, and any number of other formatting characters). As such, this technique is most beneficial for simple integer values, although can still be used for text if you are ABSOLUTELY SURE there are no special characters that would need to be escaped. As such, read the solution posted HERE to ensure these characters get properly escaped.
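For completeness, a hedged sketch of the commonly used escape-safe variant of the query above: adding the TYPE directive and pulling the value back out of the resulting XML, so characters such as & and < survive unescaped:
SELECT
    Name,
    COUNT(*) AS ct,
    STUFF((SELECT ',' + CAST(ID AS varchar(MAX))
           FROM names i
           WHERE i.Name = n.Name
           FOR XML PATH(''), TYPE).value('.', 'varchar(MAX)')
          , 1, 1, '') as IDs
FROM names n
GROUP BY Name
ORDER BY ct DESC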
Here is another SQL Server method, using recursive CTE:
Link to SQLFiddle
; with MyCTE(name,ids, name_id, seq)
as(
select name, CAST( '' AS VARCHAR(8000) ), -1, 0
from Data
group by name
union all
select d.name,
CAST( ids + CASE WHEN seq = 0 THEN '' ELSE ', ' END + cast(id as varchar) AS VARCHAR(8000) ),
CAST( id AS int),
seq + 1
from MyCTE cte
join Data d
on cte.name = d.name
where d.id > cte.name_id
)
SELECT name, ids
FROM ( SELECT name, ids,
RANK() OVER ( PARTITION BY name ORDER BY seq DESC )
FROM MyCTE ) D ( name, ids, rank )
WHERE rank = 1
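As an aside, not from the original answers: on SQL Server 2017 and later, STRING_AGG covers this directly; a minimal sketch against the names table from the question:
SELECT Name,
       COUNT(*) AS ct,
       STRING_AGG(CAST(ID AS varchar(max)), ',') WITHIN GROUP (ORDER BY ID) AS IDs
FROM names
GROUP BY Name
ORDER BY ct DESC;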