SQL return only rows where value exists multiple times and other value is present - sql

I have a table like this in MS SQL SERVER
+------+------+
| ID | Cust |
+------+------+
| 1 | A |
| 1 | A |
| 1 | B |
| 1 | B |
| 2 | A |
| 2 | A |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 3 | B |
| 3 | C |
| 3 | C |
+------+------+
I don't know the values in column "Cust" and I want to return all rows where the value of "Cust" appears multiple times and where at least one of the "ID" values is "1".
Like this:
+------+------+
| ID | Cust |
+------+------+
| 1 | A |
| 1 | A |
| 1 | B |
| 1 | B |
| 2 | A |
| 2 | A |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 3 | B |
+------+------+
Any ideas? I can't find it.

You may use COUNT window function as the following:
SELECT ID, Cust
FROM
(
SELECT ID, Cust,
COUNT(*) OVER (PARTITION BY Cust) cn,
COUNT(CASE WHEN ID=1 THEN 1 END) OVER (PARTITION BY Cust) cn2
FROM table_name
) T
WHERE cn>1 AND cn2>0
ORDER BY ID, Cust
COUNT(*) OVER (PARTITION BY Cust) to check if the value of "Cust" appears multiple times.
COUNT(CASE WHEN ID=1 THEN 1 END) OVER (PARTITION BY Cust) to check that at least one of the "ID" values is "1".
See a demo.

Related

Boolean Condition Group By, Window Functions

I am trying to add a new column based on a condition in a group.
Could we do something like
BOOL() OVER (PARTITION BY id 'D' in val)
That is something like GROUP BY id and check if the value 'D" val column
Input:
-------------
| id | val |
------------|
| 1 | A |
| 1 | B |
| 1 | D |
| 2 | B |
| 2 | C |
| 2 | A |
Output
-------------------
| id | val | res |
------------|-----|
| 1 | A | 1 |
| 1 | B | 1 |
| 1 | D | 1 |
| 2 | B | 0 |
| 2 | C | 0 |
| 2 | A | 0 |
You didn't specify your DBMS, but in standard ANSI SQL, you can use a filter() clause:
count(*) filter (where val = 'D') over (partition by id) > 0
In Postgres, you can use bool_or() for this:
select t.*, bool_or(val = 'D') over(partition by id) res
from mytable t
Demo on DB Fiddle:
id | val | res
-: | :-- | :--
1 | A | t
1 | B | t
1 | D | t
2 | B | f
2 | C | f
2 | A | f
This gives you a boolean result. If you want it as an integer value instead, then:
(bool_or(val = 'D') over(partition by id))::int

SQL Count number of column based on another column

Simple question here but I don't have any data to test with so I'm not sure if it is right.
Let's say I have a table that looks something like this:
+---------+---------+
| ColumnA | ColumnB |
+---------+---------+
| 1 | a |
| 2 | a |
| 3 | a |
| 3 | b |
| 4 | a |
| 5 | a |
| 6 | a |
| 6 | b |
| 6 | c |
| 7 | a |
+---------+---------+
I want to count the number of distinct items in column b for each column a.
So the result table would look like this:
+---------+---------+
| ColumnA | Count |
+---------+---------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |
| 5 | 1 |
| 6 | 3 |
| 7 | 1 |
+---------+---------+
How would I do this?
I've been trying something like this:
dense_rank() over (partition by ColumnA order by ColumnB) count
You can use GROUP BY and COUNT with DISTINCT:
SELECT ColumnA, COUNT(DISTINCT ColumnB) AS [Count]
FROM MyTable
GROUP BY ColumnA
demo on dbfiddle.uk

Hive - over (partition by ...) with a column not in group by

Is it possible to do something like:
select
avg(count(distinct user_id))
over (partition by some_date) as average_users_per_day
from user_activity
group by user_type
(notably, the partition by column, some_date, is not in the group by columns)
The idea I'm going for is something like: the average users per day by user type.
I know how to do it using subqueries (see below), but I'd like to know if there is a nice way using only over (partition by ...) and group by.
Notes:
From reading this answer, my understanding (correct me if I'm wrong) is that the following query:
select
avg(count(distinct a)) over (partition by b)
from foo
group by b
can be expanded equivalently to:
select
avg(count_distinct_a)
from (
select
b,
count(distinct a) as count_distinct_a
from foo
group by b
)
group by b
And from that, I can tweak it a bit to achieve what I want:
select
avg(count_distinct_user_id) as average_users_per_day
from (
select
user_type,
count(distinct user_id) as count_distinct_user_id
from user_activity
group by user_type, some_date
)
group by user_type
(notably, the inner group by user_type, some_date differs from the outer group by user_type)
I'd like to be able to tell the partition by-group by interaction to use a "sub-group-by" for the windowing part. Please let me know if my understanding of partition by/group by is completely off.
EDIT: Some sample data and desired output.
Source table:
+---------+-----------+-----------+
| user_id | user_type | some_date |
+---------+-----------+-----------+
| 1 | a | 1 |
| 1 | a | 2 |
| 2 | a | 1 |
| 3 | a | 2 |
| 3 | a | 2 |
| 4 | b | 2 |
| 5 | b | 1 |
| 5 | b | 3 |
| 5 | b | 3 |
| 6 | c | 1 |
| 7 | c | 1 |
| 8 | c | 4 |
| 9 | c | 2 |
| 9 | c | 3 |
| 9 | c | 4 |
+---------+-----------+-----------+
Sample intermediate table (for reasoning with):
+-----------+-----------+---------------------+
| user_type | some_date | distinct_user_count |
+-----------+-----------+---------------------+
| a | 1 | 2 |
| a | 2 | 2 |
| b | 1 | 1 |
| b | 2 | 1 |
| b | 3 | 1 |
| c | 1 | 2 |
| c | 2 | 1 |
| c | 3 | 1 |
| c | 4 | 2 |
+-----------+-----------+---------------------+
SQL is: select user_type, some_date, count(distinct user_id) from user_activity group by user_type, some_date.
Desired result:
+-----------+---------------------+
| user_type | average_daily_users |
+-----------+---------------------+
| a | 2 |
| b | 1 |
| c | 1.5 |
+-----------+---------------------+

Limit a sorted number of rows joined

I have two tables, A and B, and a join table M. I want to, for each A.id, get the top 2 B.id's sorting on the value in table M, producing the results below. This is running on an Azure SQL database
Table A Table M Table B
+-----+ +-----+-----+-------+ +-----+
| Id | | AId | BId | Value | | Id |
+-----+ +-----+-----+-------+ +-----+
| 1 | | 1 | 3 | 4 | | 1 |
| 2 | | 1 | 2 | 3 | | 2 |
| 3 | | 3 | 2 | 3 | | 3 |
| 4 | | 3 | 5 | 6 | | 4 |
+-----+ | 3 | 3 | 4 | | 5 |
| 4 | 1 | 2 | +-----+
| 4 | 2 | 1 |
| 4 | 4 | 3 |
+-----+-----+-------+
Result
+-----+-----+-------+
| AId | BId | Value |
+-----+-----+-------+
| 1 | 3 | 4 |
| 1 | 2 | 3 |
| 3 | 5 | 6 |
| 3 | 3 | 4 |
| 4 | 1 | 2 |
| 4 | 4 | 3 |
+-----+-----+-------+
I know that I can select all the M.AId rows where they equal 1, sort it, and limit by 2, but I need to do this for every row in Table A. I've made an attempt to use group by, but I wasn't sure how to sort and limit it. I've also tried to search for resources associated with this issue but I couldn't find any resources.
(I also wasn't sure how to word the title for this issue)
You can just use ROW_NUMBER:
SELECT
AId, BId, Value
FROM (
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY AId ORDER BY Value DESC)
FROM M
) t
WHERE Rn <= 2

Create a combined list from two tables

I have a table with CostCenter_ID (int) and a second table with Process_ID (int).
I'd like to combine the results of both tables so that each cost center ID is assigned to all process IDs, like so:
|CostCenterID | ProcessID |
---------------------------
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
I've done it before but I'm drawing a blank. I've tried this:
SELECT CostCenter_ID,NULL FROM dbo.Cost_Centers
UNION ALL
SELECT NULL,Process_ID FROM dbo.Processes
which returns this:
|CostCenterID | ProcessID |
---------------------------
| 1 | NULL |
| NULL | 1 |
| NULL | 2 |
| NULL | 3 |
Try:
select a.CostCenterID, b.ProcessID
from table1 a
cross join table2 b
or:
select a.CostCenterID, b.ProcessID
from table1 a
,table2 b
NB: cross join is the better method as it makes it clearer to the reader what your intentions are.
More info (with pics) here: http://www.w3resource.com/sql/joins/cross-join.php