distinct item for each date - sql

For each date, I have different items with unique trade IDs. How do I write a query that shows the distinct items for each date?
date | item | Trade ID
-----+------+---------
1    | A    | 123
2    | A    | 124
1    | A    | 125
3    | B    | 126
1    | A    | 127
2    | A    | 128
3    | C    | 129
1    | A    | 130
Desired result:
date | item
-----+-----
1    | A
2    | A
3    | B
3    | C
I tried the following code, but I got an error message:
select date, distinct item
from mytable
It says: found "distinct", expecting an identifier but found a keyword.
Thank you!

select distinct date, item
from your_table

Try with DISTINCT:
SELECT DISTINCT date,
                item
FROM elbat;
Or with GROUP BY:
SELECT date,
       item
FROM elbat
GROUP BY date,
         item;

DISTINCT doesn't apply to individual columns; it applies to the whole row you are selecting. It also has to follow the SELECT keyword directly.
select distinct date, item from mytable;
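As a runnable sketch of the fix (using SQLite via Python purely for illustration; the table and column names follow the question):

```python
import sqlite3

# In-memory table mirroring the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (date INT, item TEXT, trade_id INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", 123), (2, "A", 124), (1, "A", 125), (3, "B", 126),
     (1, "A", 127), (2, "A", 128), (3, "C", 129), (1, "A", 130)],
)

# DISTINCT applies to the whole selected row, so it follows SELECT directly.
rows = conn.execute(
    "SELECT DISTINCT date, item FROM mytable ORDER BY date, item"
).fetchall()
print(rows)  # [(1, 'A'), (2, 'A'), (3, 'B'), (3, 'C')]
```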

Related

How to assign groups to another group in sql?

I have a grouped result/table:
tenant | city | count
-------+------+------
1      | A    | 36
2      | A    | 50
1      | B    | 3
1      | C    | 6
2      | C    | 2
1      | D    | 1
2      | D    | 2
The sum of count is 100.
As you can see, a city can have multiple tenants. If a city's total count is less than 5% of the overall count, that city's counts should be moved into a group named 'other' while maintaining the tenant dimension. The resulting data should be:
tenant | city  | count
-------+-------+------
1      | A     | 36
2      | A     | 50
1      | C     | 6
2      | C     | 2
1      | other | 4     --> count of city B plus count of city D for tenant 1
2      | other | 2     --> count of city D for tenant 2
I want to produce the same result in two databases, PostgreSQL and ClickHouse. Any ideas on how to do this? Even if I only have the query for one of them, it should not be difficult to create the query for the other, so an answer for either database is acceptable.
You can do:
select tenant, grp as city, sum(cnt) as count
from (
  select *,
         case when sum(cnt) over (partition by city) >= 5
              then city else 'Other' end as grp
  from t
) x
group by grp, tenant
order by grp, tenant
Result:
tenant | city  | count
-------+-------+------
1      | A     | 36
2      | A     | 50
1      | C     | 6
2      | C     | 2
1      | Other | 4
2      | Other | 2
See example at DB Fiddle.
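This window-function approach can be sketched end to end with SQLite (window functions need SQLite 3.25+; used here purely for illustration, since the technique is the same in Postgres and ClickHouse):

```python
import sqlite3

# Table t with a cnt column, mirroring the question's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (tenant INT, city TEXT, cnt INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, "A", 36), (2, "A", 50), (1, "B", 3), (1, "C", 6),
                  (2, "C", 2), (1, "D", 1), (2, "D", 2)])

# The per-city window sum decides whether a row keeps its city or
# falls into 'Other'; the outer query then re-aggregates.
rows = conn.execute("""
    SELECT tenant, grp AS city, SUM(cnt) AS total
    FROM (
        SELECT *,
               CASE WHEN SUM(cnt) OVER (PARTITION BY city) >= 5
                    THEN city ELSE 'Other' END AS grp
        FROM t
    ) x
    GROUP BY grp, tenant
    ORDER BY grp, tenant
""").fetchall()
print(rows)
# [(1, 'A', 36), (2, 'A', 50), (1, 'C', 6), (2, 'C', 2), (1, 'Other', 4), (2, 'Other', 2)]
```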
Using a CTE, this query has to scan the table only once:
WITH tc AS (
   SELECT tenant, city, sum(count)::int AS total
        , sum(sum(count)) OVER (PARTITION BY city) AS city_count
   FROM   tbl
   GROUP  BY 1, 2
   )
SELECT tenant, city, total
FROM   tc
WHERE  city_count >= 5
UNION  ALL
SELECT tenant, 'other', sum(total)
FROM   tc
WHERE  city_count < 5
GROUP  BY 1
db<>fiddle here
Not sure whether ClickHouse supports window functions over aggregate functions like Postgres does. See:
Postgres window function and group by exception
Their SQL reference does not address that explicitly. It does say this, though:
expressions involving window functions, e.g. (count(*) over ()) / 2)
not supported, wrap in a subquery (feature request)
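The single-scan CTE idea can be checked in SQLite too; as a sketch, the Postgres-specific pieces are adapted (the nested sum(sum(...)) OVER form is split into two CTEs, the ::int cast is dropped, and the count column is renamed cnt to avoid quoting a keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (tenant INT, city TEXT, cnt INT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)",
                 [(1, "A", 36), (2, "A", 50), (1, "B", 3), (1, "C", 6),
                  (2, "C", 2), (1, "D", 1), (2, "D", 2)])

# First CTE aggregates per (tenant, city); second adds the per-city total
# via a window, replacing the nested aggregate-over-window from Postgres.
rows = conn.execute("""
    WITH g AS (
        SELECT tenant, city, SUM(cnt) AS total
        FROM tbl
        GROUP BY tenant, city
    ),
    tc AS (
        SELECT *, SUM(total) OVER (PARTITION BY city) AS city_count
        FROM g
    )
    SELECT tenant, city, total FROM tc WHERE city_count >= 5
    UNION ALL
    SELECT tenant, 'other', SUM(total) FROM tc WHERE city_count < 5
    GROUP BY tenant
""").fetchall()
print(sorted(rows))
# [(1, 'A', 36), (1, 'C', 6), (1, 'other', 4), (2, 'A', 50), (2, 'C', 2), (2, 'other', 2)]
```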

SQL DB2 Split result of group by based on count

I would like to split the result of a GROUP BY into several rows based on a count, but I don't know if it's possible. For instance, suppose I have this query:
SELECT doc.client, doc.template, COUNT(doc) FROM document doc GROUP BY doc.client, doc.template
and a table document with the following data :
ID | name  | client | template
---+-------+--------+---------
1  | doc_a | a      | temp_a
2  | doc_b | a      | temp_a
3  | doc_c | a      | temp_a
4  | doc_d | a      | temp_b
The result for the query would be :
client | template | count
-------+----------+------
a      | temp_a   | 3
a      | temp_b   | 1
But I would like to split a result row into two or more rows if the count is higher than 2:
client | template | count
-------+----------+------
a      | temp_a   | 2
a      | temp_a   | 1
a      | temp_b   | 1
Is there a way to do this in SQL?
You can use a recursive CTE (RCTE) like the one below. Run this statement as is first, playing with different values in the last column. The maximum batch size here is 1000.
WITH
  GRP_RESULT (client, template, count) AS
  (
    -- Place your SELECT ... GROUP BY here
    -- instead of VALUES
    VALUES
      ('a', 'temp_a', 4500)
    , ('a', 'temp_b', 3001)
  )
, T (client, template, count, max_batch_size) AS
  (
    SELECT client, template, count, 1000
    FROM GRP_RESULT
    UNION ALL
    SELECT client, template, count - max_batch_size, max_batch_size
    FROM T
    WHERE count > max_batch_size
  )
SELECT client, template,
       CASE WHEN count > max_batch_size THEN max_batch_size ELSE count END AS count
FROM T
ORDER BY client, template, count DESC
The result is:
|CLIENT|TEMPLATE|COUNT |
|------|--------|-----------|
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |500 |
|a |temp_b |1000 |
|a |temp_b |1000 |
|a |temp_b |1000 |
|a |temp_b |1 |
You may place your SELECT ... GROUP BY statement as specified above afterwards to achieve your goal.
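The same recursive-CTE idea can be tried outside DB2; here is a sketch using SQLite for illustration (syntax adapted: SQLite requires the RECURSIVE keyword, and the count column is renamed cnt to avoid quoting):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each recursion step peels off one batch of max_batch_size until the
# remaining count fits in a single batch.
rows = conn.execute("""
    WITH RECURSIVE
    grp_result (client, template, cnt) AS (
        VALUES ('a', 'temp_a', 4500), ('a', 'temp_b', 3001)
    ),
    t (client, template, cnt, max_batch_size) AS (
        SELECT client, template, cnt, 1000 FROM grp_result
        UNION ALL
        SELECT client, template, cnt - max_batch_size, max_batch_size
        FROM t WHERE cnt > max_batch_size
    )
    SELECT client, template,
           CASE WHEN cnt > max_batch_size THEN max_batch_size ELSE cnt END AS cnt
    FROM t
    ORDER BY client, template, cnt DESC
""").fetchall()
print(rows)
# 4500 -> four batches of 1000 plus 500; 3001 -> three batches of 1000 plus 1
```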
You can use window functions and then aggregate:
SELECT client, template, COUNT(*)
FROM (SELECT doc.client, doc.template,
             ROW_NUMBER() OVER (PARTITION BY doc.client, doc.template
                                ORDER BY doc.client) - 1 AS seqnum
      FROM document doc
     ) d
GROUP BY client, template, FLOOR(seqnum / 2)
The subquery enumerates the rows. The outer query then splits the rows into groups of two by integer-dividing the row number by the desired group size.
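Here is that row-numbering approach run against the question's sample data, sketched in SQLite for illustration (group size 2 hard-coded; with integer operands, SQLite's / already does integer division, so no FLOOR is needed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INT, name TEXT, client TEXT, template TEXT)")
conn.executemany("INSERT INTO document VALUES (?, ?, ?, ?)",
                 [(1, "doc_a", "a", "temp_a"), (2, "doc_b", "a", "temp_a"),
                  (3, "doc_c", "a", "temp_a"), (4, "doc_d", "a", "temp_b")])

# seqnum / 2 maps row numbers 0,1 -> group 0 and row number 2 -> group 1,
# so each (client, template) pair is split into chunks of at most two rows.
rows = conn.execute("""
    SELECT client, template, COUNT(*) AS cnt
    FROM (SELECT client, template,
                 ROW_NUMBER() OVER (PARTITION BY client, template) - 1 AS seqnum
          FROM document) d
    GROUP BY client, template, seqnum / 2
    ORDER BY client, template, cnt DESC
""").fetchall()
print(rows)  # [('a', 'temp_a', 2), ('a', 'temp_a', 1), ('a', 'temp_b', 1)]
```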

How can I count occurrences of grouped values in a table?

I have the table below in my Postgres DB. I would like to know how many times each of the values (name1, name2, name3) occurs in the table where trial is 1.
In the case below the expected output:
name1, 4
name2, 3
name3, 2
+----+-------+-------+
| id | name  | trial |
+----+-------+-------+
| 1  | name1 | 1     |
| 2  | name1 | 1     |
| 3  | name1 | 1     |
| 4  | name1 | 1     |
| 5  | name2 | 1     |
| 6  | name2 | 1     |
| 7  | name2 | 1     |
| 8  | name3 | 1     |
| 9  | name3 | 1     |
+----+-------+-------+
What I tried so far:
SELECT count(C.NAME)
FROM FIRST AS C
WHERE NAME = (
SELECT CS.NAME
FROM FIRST AS CS
WHERE TRIAL = 1
GROUP BY CS.NAME
)
This query returns 9, which is the total number of rows.
You're missing the GROUP BY clause. The query can also be simplified; try this:
SELECT Name, count(*)
FROM FIRST
WHERE TRIAL = 1
GROUP BY Name
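The GROUP BY query can be checked end to end; a sketch using SQLite standing in for Postgres (the SQL is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first (id INT, name TEXT, trial INT)")
conn.executemany("INSERT INTO first VALUES (?, ?, ?)",
                 [(1, "name1", 1), (2, "name1", 1), (3, "name1", 1),
                  (4, "name1", 1), (5, "name2", 1), (6, "name2", 1),
                  (7, "name2", 1), (8, "name3", 1), (9, "name3", 1)])

# GROUP BY collapses the nine rows into one row per name with its count.
rows = conn.execute("""
    SELECT name, COUNT(*)
    FROM first
    WHERE trial = 1
    GROUP BY name
    ORDER BY name
""").fetchall()
print(rows)  # [('name1', 4), ('name2', 3), ('name3', 2)]
```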

Hive conditional count by resetting counter?

I have two Hive tables, customer and transaction.
customer table
-------------------------------
customer_id | account_threshold
------------+------------------
101         | 200
102         | 500
transaction table
---------------------------------------
transaction_date | customer_id | amount
-----------------+-------------+-------
07/01/2018       | 101         | 250
07/01/2018       | 102         | 450
07/02/2018       | 101         | 500
07/03/2018       | 102         | 100
07/04/2018       | 102         | 50
Result:
--------------------------
customer_id | breach_count
------------+-------------
101         | 2
102         | 1
I have to count the number of instances the sum of amount in transaction table exceeds the account_threshold in customer table.
When a breach is detected I reset the counter to 0.
For customer 101, the first transaction is already above the threshold, so the breach count is 1. There is another breach for 101 in the third transaction, so the total breach count for 101 is 2.
For customer 102, the first transaction (450) is below the threshold. The next transaction for 102 is 100, which takes the running total past the threshold of 500, so breach_count is 1.
I have tried windowing, but I cannot work out how to proceed when joining the two tables.
You can write a subquery that computes the accumulated amount per customer_id (ordered by amount), then OUTER JOIN it to customer and count:
SELECT t.customer_id, COUNT(t.total) AS breach_count
FROM customer c
LEFT JOIN
(
  SELECT t1.*,
         SUM(t1.amount) OVER (PARTITION BY t1.customer_id
                              ORDER BY t1.amount) AS total
  FROM transaction1 t1
) t ON c.customer_id = t.customer_id
WHERE c.account_threshold < t.total
GROUP BY t.customer_id
Here is a SQL Fiddle using SQL Server; although it is a different DBMS, the window function syntax is the same.
[Results]:
| customer_id | breach_count |
|-------------|--------------|
| 101 | 2 |
| 102 | 1 |
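The query translates directly to any engine with window functions; a sketch using SQLite (3.25+) for illustration, with the running-total column named total:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INT, account_threshold INT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(101, 200), (102, 500)])
conn.execute("CREATE TABLE transaction1 "
             "(transaction_date TEXT, customer_id INT, amount INT)")
conn.executemany("INSERT INTO transaction1 VALUES (?, ?, ?)",
                 [("07/01/2018", 101, 250), ("07/01/2018", 102, 450),
                  ("07/02/2018", 101, 500), ("07/03/2018", 102, 100),
                  ("07/04/2018", 102, 50)])

# Running sum per customer (ordered by amount); count how many running
# totals exceed that customer's threshold.
rows = conn.execute("""
    SELECT t.customer_id, COUNT(t.total) AS breach_count
    FROM customer c
    LEFT JOIN (
        SELECT t1.*,
               SUM(t1.amount) OVER (PARTITION BY t1.customer_id
                                    ORDER BY t1.amount) AS total
        FROM transaction1 t1
    ) t ON c.customer_id = t.customer_id
    WHERE c.account_threshold < t.total
    GROUP BY t.customer_id
    ORDER BY t.customer_id
""").fetchall()
print(rows)  # [(101, 2), (102, 1)]
```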
To reset count/rank/sum whenever value changes
Input table:
Time | value
-----+------
12   | A
13   | A
14   | C
15   | C
16   | B
17   | B
18   | A
18 |A
You just need to take LAG to get the previous value.
Step 1: SELECT *, LAG(value) OVER (ORDER BY time) AS lagval
Now compare the lag value to the actual value; if it differs, emit 1, else 0 (call this column flag).
Step 2: SELECT *, CASE WHEN lagval != value THEN 1 ELSE 0 END AS flag
Now take a running sum over flag; you get a distinct sum value for each group, where a new group starts whenever the value changes.
Step 3: SELECT *, SUM(flag) OVER (ORDER BY time) AS flag_sum
Now just number the rows within each group.
Step 4: SELECT ROW_NUMBER() OVER (PARTITION BY flag_sum ORDER BY time)
Final result
Time | value | lagval | flag | flag_sum | rownumber
-----+-------+--------+------+----------+----------
12   | A     | null   | 1    | 1        | 1
13   | A     | A      | 0    | 1        | 2
14   | C     | A      | 1    | 2        | 1
15   | C     | C      | 0    | 2        | 2
16   | B     | C      | 1    | 3        | 1
17   | B     | B      | 0    | 3        | 2
18   | A     | B      | 1    | 4        | 1
You can use SUM or COUNT in place of ROW_NUMBER, depending on what you want to reset whenever the value changes.
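The four steps above can be put together as one nested query; a sketch in SQLite for illustration (SQLite's IS NOT comparison treats the NULL lag value on the first row as a change):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_table (time INT, value TEXT)")
conn.executemany("INSERT INTO input_table VALUES (?, ?)",
                 [(12, "A"), (13, "A"), (14, "C"), (15, "C"),
                  (16, "B"), (17, "B"), (18, "A")])

rows = conn.execute("""
    SELECT time, value,
           ROW_NUMBER() OVER (PARTITION BY flag_sum ORDER BY time) AS rownumber
    FROM (
        SELECT *, SUM(flag) OVER (ORDER BY time) AS flag_sum
        FROM (
            -- IS NOT (rather than !=) makes the NULL lag on the first
            -- row count as a change, so it gets flag = 1
            SELECT *, CASE WHEN value IS NOT lagval THEN 1 ELSE 0 END AS flag
            FROM (SELECT *, LAG(value) OVER (ORDER BY time) AS lagval
                  FROM input_table)
        )
    )
    ORDER BY time
""").fetchall()
print(rows)
# [(12, 'A', 1), (13, 'A', 2), (14, 'C', 1), (15, 'C', 2), (16, 'B', 1), (17, 'B', 2), (18, 'A', 1)]
```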

2 column with same ID to 1 row

I have a table with only 2 column which is as follow
| ID | Date       |
===================
| 1  | 03/04/2017 |
| 1  | 09/07/1997 |
| 2  | 04/04/2014 |
I want to achieve an end result as follow
| ID | Date 1     | Date 2     |
================================
| 1  | 03/04/2017 | 09/07/1997 |
| 2  | 04/04/2014 | NULL       |
I'm currently reading up on the PIVOT function, and I'm not sure whether I'm on the right track. I am still new to SQL.
A simple pivot query should work here, with a twist. For your ID 2 data, there is only one row, but in this case you want to report a first date and a NULL second date. We can use a CASE expression to handle this case.
SELECT
    ID,
    MAX(Date) AS date_1,
    CASE WHEN COUNT(*) = 2 THEN MIN(Date) ELSE NULL END AS date_2
FROM yourTable
GROUP BY ID
Output:
Demo here:
Rextester
This can be done easily using the MIN/MAX aggregate functions:
select Id, min(Date),
       case when min(Date) <> max(Date) then max(Date) end
from yourtable
group by Id
If this does not work with your original data, then update the sample data and expected result in the question.
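A quick check of the MIN/MAX approach, sketched with SQLite; the question's dd/mm/yyyy dates are rewritten as ISO strings here (an assumption for this sketch, so that text comparison matches chronological order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INT, date TEXT)")
# 03/04/2017 -> 2017-04-03, 09/07/1997 -> 1997-07-09, 04/04/2014 -> 2014-04-04
conn.executemany("INSERT INTO yourtable VALUES (?, ?)",
                 [(1, "2017-04-03"), (1, "1997-07-09"), (2, "2014-04-04")])

# When an ID has only one date, MIN = MAX and the CASE yields NULL.
rows = conn.execute("""
    SELECT id,
           MIN(date) AS date_1,
           CASE WHEN MIN(date) <> MAX(date) THEN MAX(date) END AS date_2
    FROM yourtable
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [(1, '1997-07-09', '2017-04-03'), (2, '2014-04-04', None)]
```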