How to assign groups to another group in SQL?

I have a grouped result/table:
tenant|city|count|
1 |A |36 |
2 |A |50 |
1 |B |3 |
1 |C |6 |
2 |C |2 |
1 |D |1 |
2 |D |2 |
The sum of count is 100.
As you can see, a city can have multiple tenants. If the sum of the counts for a city is less than 5% of the total count, that city's counts should be added to another group identified as 'other', while keeping the tenant dimension. The result should be:
tenant|city |count|
1 |A |36 |
2 |A |50 |
1 |C |6 |
2 |C |2 |
1 |other |4 | --> Addition of count of B city and count of D city for tenant 1
2 |other |2 | --> count of D city for tenant 2
I want to produce the same result in two databases, PostgreSQL and ClickHouse. Any ideas on how to do this? Once I have a query for one database, I think it should not be difficult to write it for the other too, so an answer for either database is acceptable.

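You can do the following (the count column is called cnt here, and the 5% threshold is taken from the grand total with an empty over() window):
select tenant, grp as city, sum(cnt) as count
from (
select *,
case when sum(cnt) over(partition by city) >= 0.05 * sum(cnt) over()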
then city else 'Other' end as grp
from t
) x
group by grp, tenant
order by grp, tenant
Result:
tenant city count
------- ------ -----
1 A 36
2 A 50
1 C 6
2 C 2
1 Other 4
2 Other 2
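A minimal sketch of the same query run against SQLite through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). Assumptions: the table is named t, the count column is cnt, and the label is spelled 'other' as in the question. The threshold is computed as 5% of the grand total with SUM(cnt) OVER (), so it does not depend on the total being exactly 100. This has not been run on PostgreSQL or ClickHouse.

```python
import sqlite3

# Sample data from the question (count column called cnt, as in the answer)
ROWS = [(1, "A", 36), (2, "A", 50), (1, "B", 3), (1, "C", 6),
        (2, "C", 2), (1, "D", 1), (2, "D", 2)]

QUERY = """
SELECT tenant, grp AS city, SUM(cnt) AS count
FROM (
    SELECT tenant, cnt,
           -- keep the city if its total is at least 5% of the grand total
           CASE WHEN SUM(cnt) OVER (PARTITION BY city) >= 0.05 * SUM(cnt) OVER ()
                THEN city ELSE 'other' END AS grp
    FROM t
) x
GROUP BY grp, tenant
ORDER BY grp, tenant
"""


def bucket_small_cities(rows):
    """Return (tenant, city, count) rows with small cities folded into 'other'."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (tenant INTEGER, city TEXT, cnt INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in bucket_small_cities(ROWS):
    print(row)
```

Because the threshold is relative, multiplying every count by 10 moves the same cities into 'other'.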

Using a CTE, this query has to scan the table only once:
WITH tc AS (
SELECT tenant, city, sum(count)::int AS total
, sum(sum(count)) OVER (PARTITION BY city) AS city_count
, sum(sum(count)) OVER () AS grand_total
FROM tbl
GROUP BY 1, 2
)
SELECT tenant, city, total
FROM tc
WHERE city_count >= 0.05 * grand_total
UNION ALL
SELECT tenant, 'other', sum(total)
FROM tc
WHERE city_count < 0.05 * grand_total
GROUP BY 1
Not sure whether ClickHouse supports window functions over aggregate functions like Postgres does. See:
Postgres window function and group by exception
Their SQL reference does not address that explicitly. It does say this, though:
expressions involving window functions, e.g. (count(*) over ()) / 2)
not supported, wrap in a subquery (feature request)
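Following the "wrap in a subquery" advice, the aggregation can go in one CTE and the window functions in a second CTE over its result, so no window function is nested around an aggregate. A sketch of that two-step variant, run here against SQLite through Python's sqlite3 module rather than ClickHouse or PostgreSQL; the table name tbl and column cnt are assumptions, and the 5% threshold is computed from the grand total.

```python
import sqlite3

# Sample data from the question; table tbl and column cnt are assumptions
ROWS = [(1, "A", 36), (2, "A", 50), (1, "B", 3), (1, "C", 6),
        (2, "C", 2), (1, "D", 1), (2, "D", 2)]

QUERY = """
WITH per_tenant AS (
    -- step 1: plain aggregation, no window functions
    SELECT tenant, city, SUM(cnt) AS total
    FROM tbl
    GROUP BY tenant, city
), tc AS (
    -- step 2: window functions over the already aggregated rows
    SELECT tenant, city, total,
           SUM(total) OVER (PARTITION BY city) AS city_count,
           SUM(total) OVER ()                  AS grand_total
    FROM per_tenant
)
SELECT tenant, city, total FROM tc
WHERE city_count >= 0.05 * grand_total
UNION ALL
SELECT tenant, 'other', SUM(total) FROM tc
WHERE city_count < 0.05 * grand_total
GROUP BY tenant
ORDER BY 2, 1
"""


def bucket_small_cities_two_step(rows):
    """Same result as the single-query version, computed in two CTE steps."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tbl (tenant INTEGER, city TEXT, cnt INTEGER)")
        con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in bucket_small_cities_two_step(ROWS):
    print(row)
```

If no city falls below the threshold, the second branch of the UNION ALL simply returns no rows.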

Related

SQL DB2 Split result of group by based on count

I would like to split the result of a GROUP BY into several rows based on a count, but I don't know if it's possible. For instance, if I have a query like this:
SELECT doc.client, doc.template, COUNT(*) FROM document doc GROUP BY doc.client, doc.template
and a table document with the following data:
ID | name | client | template
1 | doc_a | a | temp_a
2 | doc_b | a | temp_a
3 | doc_c | a | temp_a
4 | doc_d | a | temp_b
The result for the query would be:
client | template | count
a | temp_a | 3
a | temp_b | 1
But I would like to split a row of the result into two or more rows if the count is higher than 2:
client | template | count
a | temp_a | 2
a | temp_a | 1
a | temp_b | 1
Is there a way to do this in SQL?
You can use a recursive CTE (RCTE) like the one below. First run this statement as is, experimenting with different values in the last column. The maximum batch size here is 1000.
WITH
GRP_RESULT (client, template, count) AS
(
-- Place your SELECT ... GROUP BY here
-- instead of VALUES
VALUES
('a', 'temp_a', 4500)
, ('a', 'temp_b', 3001)
)
, T (client, template, count, max_batch_size) AS
(
SELECT client, template, count, 1000
FROM GRP_RESULT
UNION ALL
SELECT client, template, count - max_batch_size, max_batch_size
FROM T
WHERE count > max_batch_size
)
SELECT client, template, CASE WHEN count > max_batch_size THEN max_batch_size ELSE count END count
FROM T
ORDER BY client, template, count DESC
The result is:
|CLIENT|TEMPLATE|COUNT |
|------|--------|-----------|
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |500 |
|a |temp_b |1000 |
|a |temp_b |1000 |
|a |temp_b |1000 |
|a |temp_b |1 |
Once it works, replace the VALUES clause with your SELECT ... GROUP BY statement, as indicated in the comments, to achieve your goal.
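A runnable sketch of the same recursive CTE against SQLite through Python's sqlite3 module, written with WITH RECURSIVE as SQLite and PostgreSQL spell it. The count column is renamed cnt, the output column batch, and the batch size is passed as a parameter; these names are assumptions for illustration. Each recursive step peels one full batch off the remaining count until what is left fits in a single batch.

```python
import sqlite3

QUERY = """
WITH RECURSIVE
grp_result(client, template, cnt) AS (
    -- stand-in for the real SELECT ... GROUP BY
    VALUES ('a', 'temp_a', 4500), ('a', 'temp_b', 3001)
),
t(client, template, cnt, max_batch_size) AS (
    SELECT client, template, cnt, :batch FROM grp_result
    UNION ALL
    -- peel one full batch off whatever is left
    SELECT client, template, cnt - max_batch_size, max_batch_size
    FROM t
    WHERE cnt > max_batch_size
)
SELECT client, template,
       CASE WHEN cnt > max_batch_size THEN max_batch_size ELSE cnt END AS batch
FROM t
ORDER BY client, template, batch DESC
"""


def split_counts(batch):
    """Return (client, template, batch) rows, each batch at most `batch` large."""
    con = sqlite3.connect(":memory:")
    try:
        return con.execute(QUERY, {"batch": batch}).fetchall()
    finally:
        con.close()


for row in split_counts(1000):
    print(row)
```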
You can use window functions and then aggregate:
SELECT client, template, COUNT(*)
FROM (SELECT doc.client, doc.template,
ROW_NUMBER() OVER (PARTITION BY doc.client, doc.template ORDER BY doc.id) - 1 as seqnum
FROM document doc
) d
GROUP BY client, template, seqnum / 2
The subquery enumerates the rows within each client/template pair, starting at 0. The outer query then splits them into groups of at most two: the integer division seqnum / 2 puts rows 0 and 1 in the same group, rows 2 and 3 in the next one, and so on.
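A runnable sketch of the enumerate-then-bucket idea against SQLite through Python's sqlite3 module: the 0-based row number inside each (client, template) group is integer-divided by the batch size, so each bucket holds at most that many documents. The helper name split_groups, the size parameter, and ordering by id are assumptions for illustration.

```python
import sqlite3

# Rows from the question's document table: (id, name, client, template)
DOCS = [(1, "doc_a", "a", "temp_a"), (2, "doc_b", "a", "temp_a"),
        (3, "doc_c", "a", "temp_a"), (4, "doc_d", "a", "temp_b")]

QUERY = """
SELECT client, template, COUNT(*) AS cnt
FROM (
    SELECT client, template,
           -- 0-based position of the row inside its (client, template) group
           ROW_NUMBER() OVER (PARTITION BY client, template ORDER BY id) - 1 AS seqnum
    FROM document
) d
-- integer division: rows 0..size-1 -> bucket 0, rows size..2*size-1 -> bucket 1, ...
GROUP BY client, template, seqnum / :size
ORDER BY client, template, cnt DESC
"""


def split_groups(docs, size):
    """Return (client, template, count) rows where no count exceeds `size`."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE document (id INTEGER, name TEXT, client TEXT, template TEXT)")
        con.executemany("INSERT INTO document VALUES (?, ?, ?, ?)", docs)
        return con.execute(QUERY, {"size": size}).fetchall()
    finally:
        con.close()


print(split_groups(DOCS, 2))
```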

How can I count occurrences of grouped values in a table?

I have the table below in my Postgres DB. I would like to know how many times each of the values (name1, name2, name3) occurs in the table where trial is 1.
In the case below, the expected output is:
name1, 4
name2, 3
name3, 2
+--------------+
| id|name|trial|
+--------------+
|1 |name1|1 |
|2 |name1|1 |
|3 |name1|1 |
|4 |name1|1 |
|5 |name2|1 |
|6 |name2|1 |
|7 |name2|1 |
|8 |name3|1 |
|9 |name3|1 |
What I tried so far:
SELECT count(C.NAME)
FROM FIRST AS C
WHERE NAME = (
SELECT CS.NAME
FROM FIRST AS CS
WHERE TRIAL = 1
GROUP BY CS.NAME
)
This query returns 9, which is the number of rows.
You're missing the group by clause. Also, the query can be simplified, try this:
SELECT count(1), Name
FROM FIRST
WHERE TRIAL = 1
GROUP BY Name
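A small runnable check of the GROUP BY query using SQLite through Python's sqlite3 module. The table name trials is hypothetical (the question calls it FIRST). The WHERE clause filters rows before grouping, so rows from other trials are not counted.

```python
import sqlite3

# (id, name, trial) rows from the question; table name "trials" is hypothetical
ROWS = [(1, "name1", 1), (2, "name1", 1), (3, "name1", 1), (4, "name1", 1),
        (5, "name2", 1), (6, "name2", 1), (7, "name2", 1),
        (8, "name3", 1), (9, "name3", 1)]


def count_per_name(rows, trial=1):
    """Return (name, occurrences) for the given trial, one row per name."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE trials (id INTEGER, name TEXT, trial INTEGER)")
        con.executemany("INSERT INTO trials VALUES (?, ?, ?)", rows)
        return con.execute(
            "SELECT name, COUNT(*) FROM trials WHERE trial = ? GROUP BY name ORDER BY name",
            (trial,),
        ).fetchall()
    finally:
        con.close()


print(count_per_name(ROWS))
```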

Max value from joined table

I have two tables:
Operations (op_id,super,name,last)
Orders (or_id,number)
Operations:
+--------------------------------+
|op_id| super| name | last|
+--------------------------------+
|1 1 OperationXX 1 |
|2 1 OperationXY 2 |
|3 1 OperationXC 4 |
|4 1 OperationXZ 3 |
|5 2 OperationXX 1 |
|6 3 OperationXY 2 |
|7 4 OperationXC 1 |
|8 4 OperationXZ 2 |
+--------------------------------+
Orders:
+--------------+
|or_id | number|
+--------------+
|1 2UY |
|2 23X |
|3 xx2 |
|4 121 |
+--------------+
I need a query to get this table:
+-------------------------------------+
|or_id |number |max(last)| name |
|1 2UY 4 OperationXC|
|2 23X 1 OperationXX|
|3 xx2 2 OperationXY|
|4 121 2 OperationXZ|
+-------------------------------------+
Use a correlated subquery and a join:
select o.*, a.last, a.name from
(
select super, name, last from operations t
where last = (select max(last) from operations t2 where t2.super = t.super)
) a join orders o on a.super = o.or_id
You can use row_number() as well:
with cte as
(
select * from
(
select * , row_number() over(partition by super order by last desc) rn
from operations
) tt where rn=1
) select o.*,cte.last,cte.name from Orders o join cte on o.or_id=cte.super
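A runnable sketch of the row_number() approach against SQLite through Python's sqlite3 module, using the sample tables from the question and assuming, as the answers do, that Operations.super joins to Orders.or_id. The column "last" is double-quoted defensively, since LAST is a keyword in some SQL dialects (NULLS LAST).

```python
import sqlite3

# (op_id, super, name, last) and (or_id, number) rows from the question
OPERATIONS = [(1, 1, "OperationXX", 1), (2, 1, "OperationXY", 2),
              (3, 1, "OperationXC", 4), (4, 1, "OperationXZ", 3),
              (5, 2, "OperationXX", 1), (6, 3, "OperationXY", 2),
              (7, 4, "OperationXC", 1), (8, 4, "OperationXZ", 2)]
ORDERS = [(1, "2UY"), (2, "23X"), (3, "xx2"), (4, "121")]

QUERY = """
WITH ranked AS (
    SELECT super, name, "last",
           ROW_NUMBER() OVER (PARTITION BY super ORDER BY "last" DESC) AS rn
    FROM operations
)
SELECT o.or_id, o.number, r."last", r.name
FROM orders o
JOIN ranked r ON r.super = o.or_id
WHERE r.rn = 1          -- keep only the highest "last" per super
ORDER BY o.or_id
"""


def latest_operation_per_order():
    """Return (or_id, number, max last, name) for each order."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE operations (op_id INTEGER, super INTEGER, name TEXT, "last" INTEGER)')
        con.execute("CREATE TABLE orders (or_id INTEGER, number TEXT)")
        con.executemany("INSERT INTO operations VALUES (?, ?, ?, ?)", OPERATIONS)
        con.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in latest_operation_per_order():
    print(row)
```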
SELECT Orders.or_id, Orders.number, m.max_last AS max, Operations.name
FROM Orders
INNER JOIN (SELECT super, MAX(last) AS max_last FROM Operations GROUP BY super) m
ON m.super = Orders.or_id
INNER JOIN Operations ON Operations.super = m.super AND Operations.last = m.max_last;
I don't have a way of testing this right now, but I think this is it.
Also, you didn't specify the foreign key, so the join might be wrong.

Hive conditional count by resetting counter?

I have two Hive tables, customer and transaction.
customer table
---------------------------------
customer_id | account_threshold
---------------------------------
101 | 200
102 | 500
transaction table
-------------------------------------------
transaction_date | customer_id | amount
-------------------------------------------
07/01/2018 101 250
07/01/2018 102 450
07/02/2018 101 500
07/03/2018 102 100
07/04/2018 102 50
Result:
------------------------------
customer_id | breach_count
------------------------------
101 2
102 1
I have to count the number of times the running sum of amount in the transaction table exceeds the account_threshold in the customer table.
When a breach is detected, I reset the counter to 0.
For customer 101, the first transaction (250) is above the threshold, so the breach count is 1. Then there is another breach for 101 in the third row (500). Hence, the total breach count for 101 is 2.
For customer 102, the first transaction (450) is below the threshold. The next transaction for 102 is 100, which brings the total to 550 and breaches the threshold of 500, so breach_count will be 1.
I have tried window functions, but I can't figure out how to proceed when joining the two tables.
You can write a subquery that computes a running total of amount per customer_id (ordered by amount), then outer join it to the customer table and count the rows where the running total exceeds the threshold:
SELECT t.customer_id, COUNT(t.total) breach_count
FROM customer c
LEFT JOIN
(
select t1.*, SUM(t1.amount) OVER(PARTITION BY t1.customer_id order by t1.amount) as total
from transaction1 t1
) t on c.customer_id = t.customer_id
WHERE c.account_threshold < t.total
GROUP BY t.customer_id
Here is a SQL Fiddle using SQL Server; although it is a different DBMS, the window function syntax is the same.
[Results]:
| customer_id | breach_count |
|-------------|--------------|
| 101 | 2 |
| 102 | 1 |
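A running SUM(...) OVER (...) never resets, so the query above matches the expected output for this sample only because ordering by amount happens to give the right counts; ordered by date, customer 102 would get 2 breaches (550 and 600). The reset rule from the question is inherently sequential. A minimal Python sketch of that rule, assuming dates are MM/DD/YYYY and that "exceeds" means strictly greater; it is not a Hive implementation, only a reference to check a query against.

```python
from collections import defaultdict
from datetime import datetime

# customer_id -> account_threshold (customer table from the question)
CUSTOMERS = {101: 200, 102: 500}
# (transaction_date, customer_id, amount) rows from the transaction table
TRANSACTIONS = [
    ("07/01/2018", 101, 250), ("07/01/2018", 102, 450),
    ("07/02/2018", 101, 500), ("07/03/2018", 102, 100),
    ("07/04/2018", 102, 50),
]


def breach_counts(customers, transactions):
    """Count threshold breaches per customer, resetting the running total after each breach."""
    # Walk each customer's transactions in date order (dates assumed MM/DD/YYYY)
    ordered = sorted(
        transactions,
        key=lambda tx: (tx[1], datetime.strptime(tx[0], "%m/%d/%Y")),
    )
    running = defaultdict(int)
    breaches = {customer_id: 0 for customer_id in customers}
    for _date, customer_id, amount in ordered:
        running[customer_id] += amount
        if running[customer_id] > customers[customer_id]:  # "exceeds" = strictly greater
            breaches[customer_id] += 1
            running[customer_id] = 0  # reset the counter after a breach
    return breaches


print(breach_counts(CUSTOMERS, TRANSACTIONS))
```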
To reset count/rank/sum whenever value changes
Input table:
Time | value
12 |A
13 |A
14 |C
15 |C
16 |B
17 |B
18 |A
Use LAG to get the previous value (each step selects from the result of the previous one):
Step 1. Select *, lag(value) over (order by time) as lagval
Compare the lagged value to the current value: if they differ, or there is no previous row, set a flag column to 1, otherwise 0.
Step 2. Select *, case when lagval is null or lagval <> value then 1 else 0 end as flag
Take a running sum of flag. Every time the value changes, flag_sum increases, so each run of identical consecutive values gets its own flag_sum, which identifies the group.
Step 3. Select *, sum(flag) over (order by time) as flag_sum
Finally, number the rows within each group:
Step 4. Select *, row_number() over (partition by flag_sum order by time) as rownumber
Final result
Time | value | lagval | flag | flag_sum | rownumber
12 |A | null | 1 | 1 | 1
13 |A | A |0 |1 |2
14 |C |A |1 |2 |1
15 |C | C |0 |2 |2
16 |B |C |1 | 3 |1
17 |B |B |0 |3 |2
18 |A |B |1 |4 |1
Use SUM, COUNT, or any other window aggregate partitioned by flag_sum in place of row_number() to get a value that resets whenever the value changes.
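A runnable version of the four steps, chained as CTEs and executed against SQLite through Python's sqlite3 module. The table name input_table is hypothetical. The lagval IS NULL check is what gives the first row a flag of 1, matching the final table.

```python
import sqlite3

# (time, value) rows from the input table; table name input_table is hypothetical
ROWS = [(12, "A"), (13, "A"), (14, "C"), (15, "C"), (16, "B"), (17, "B"), (18, "A")]

QUERY = """
WITH step1 AS (   -- previous value
    SELECT time, value, LAG(value) OVER (ORDER BY time) AS lagval
    FROM input_table
), step2 AS (     -- 1 where a new run starts, else 0
    SELECT *, CASE WHEN lagval IS NULL OR lagval <> value THEN 1 ELSE 0 END AS flag
    FROM step1
), step3 AS (     -- running sum of flag = run number
    SELECT *, SUM(flag) OVER (ORDER BY time) AS flag_sum
    FROM step2
)
SELECT time, value, lagval, flag, flag_sum,
       ROW_NUMBER() OVER (PARTITION BY flag_sum ORDER BY time) AS rownumber
FROM step3
ORDER BY time
"""


def number_runs(rows):
    """Return (time, value, lagval, flag, flag_sum, rownumber) for each input row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE input_table (time INTEGER, value TEXT)")
        con.executemany("INSERT INTO input_table VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in number_runs(ROWS):
    print(row)
```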

Distinct items for each date

For each date I have different items, each with a unique trade ID. How can I create a table that shows each date with the distinct items for that day?
date |item | Trade ID
1 |A | 123
2 |A | 124
1 |A | 125
3 |B | 126
1 |A | 127
2 |A | 128
3 |C | 129
1 |A | 130
desired results
date |item
1 |A
2 |A
3 |B
3 |C
I tried the following code, but I got an error message:
select date, distinct item
from mytable
It says: found "distinct" expecting an identifier found a keyword
Thank you!
select distinct date, item
from your_table
Try DISTINCT:
SELECT DISTINCT
date,
item
FROM elbat;
Or GROUP BY:
SELECT date,
item
FROM elbat
GROUP BY date,
item;
DISTINCT doesn't apply to individual columns but to the whole row you are selecting. It has to follow the SELECT keyword directly.
select distinct date, item from mytable;
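A small runnable check that DISTINCT and GROUP BY return the same (date, item) pairs for the sample data, using SQLite through Python's sqlite3 module. The table name mytable comes from the question; the trade_id column name is an assumption.

```python
import sqlite3

# (date, item, trade_id) rows from the question
TRADES = [(1, "A", 123), (2, "A", 124), (1, "A", 125), (3, "B", 126),
          (1, "A", 127), (2, "A", 128), (3, "C", 129), (1, "A", 130)]


def distinct_items(trades, use_group_by=False):
    """Return the distinct (date, item) pairs, via DISTINCT or via GROUP BY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE mytable (date INTEGER, item TEXT, trade_id INTEGER)")
        con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", trades)
        if use_group_by:
            sql = "SELECT date, item FROM mytable GROUP BY date, item ORDER BY date, item"
        else:
            # DISTINCT follows SELECT and applies to the whole (date, item) row
            sql = "SELECT DISTINCT date, item FROM mytable ORDER BY date, item"
        return con.execute(sql).fetchall()
    finally:
        con.close()


print(distinct_items(TRADES))
```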