How to group query results based on date in one month - sql

I have a table that looks like this:
ID | DATETIME   | T_NUMBER | SOLD | STORE_ID
---+------------+----------+------+---------
1  | 2019-02-01 | 1111     | 10   | STORE_1
2  | 2019-02-01 | 1112     | 5    | STORE_1
3  | 2019-02-02 | 1113     | 10   | STORE_1
4  | 2019-02-02 | 1114     | 7    | STORE_1
5  | 2019-02-02 | 1115     | 3    | STORE_1
6  | 2019-02-03 | 1116     | 4    | STORE_1
etc.
The result I want looks like this:
STORE | 1 | 2 | 3 | 4 | 5 | ..... |28|
-------+---+---+---+---+---+-------+--+
STORE_1| 2 | 3 | 1 | 0 | 0 | ..... |0 |
---------------------------------------
STORE_2| X | X | X | X | X | ..... |X |
A little explanation: the numbers 1, 2, 3 ... 28 in the header are the days of February.
The values 2, 3, 1, 0 ... 0 are the sum of transactions per DATE. I want the report for one month. STORE_2 stands for any other store I may have data for in the future.
My T-SQL looks like this (absolutely wrong):
select SUM(T_NUMBER) as 'Total'
from store_logs
group by cast(time as date)
Thanks a lot

You can use the PIVOT operator.
By "Sum of transactions per DATE", do you mean the COUNT of transactions per day? If so, change SUM (T_NUMBER) to COUNT (T_NUMBER).
SELECT *
FROM (
    -- T_NUMBER must be selected here, otherwise PIVOT cannot aggregate it
    SELECT [STORE_ID], [T_NUMBER], [DAY] = DATEPART (DAY, [DATETIME])
    FROM store_logs
) AS D
PIVOT
(
    SUM (T_NUMBER)
    FOR [DAY] IN ([1], [2], [3], [4], [5], ... [31])
) AS P
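For readers who want to try the idea without SQL Server, here is a minimal sketch of the same report using conditional aggregation, which is a portable alternative to PIVOT. It runs on SQLite through Python's sqlite3 module. That setup is an assumption made for illustration, not what the question uses. Only days 1 to 4 are generated to keep the output short, and the d1..d4 column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store_logs "
    "(id INTEGER, datetime TEXT, t_number INTEGER, sold INTEGER, store_id TEXT)"
)
conn.executemany(
    "INSERT INTO store_logs VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2019-02-01", 1111, 10, "STORE_1"),
        (2, "2019-02-01", 1112, 5, "STORE_1"),
        (3, "2019-02-02", 1113, 10, "STORE_1"),
        (4, "2019-02-02", 1114, 7, "STORE_1"),
        (5, "2019-02-02", 1115, 3, "STORE_1"),
        (6, "2019-02-03", 1116, 4, "STORE_1"),
    ],
)

# Build one "count the rows that fall on day d" column per day of the month.
day_columns = ", ".join(
    f"SUM(CASE WHEN CAST(strftime('%d', datetime) AS INTEGER) = {d} "
    f"THEN 1 ELSE 0 END) AS d{d}"
    for d in range(1, 5)
)
query = (
    f"SELECT store_id, {day_columns} "
    "FROM store_logs GROUP BY store_id ORDER BY store_id"
)
pivot_rows = conn.execute(query).fetchall()
print(pivot_rows)
```

Days without any transaction come out as 0 rather than NULL, which matches the layout requested in the question.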

Related

How to assign groups to another group in sql?

I have a grouped result/table:
tenant|city|count|
1 |A |36 |
2 |A |50 |
1 |B |3 |
1 |C |6 |
2 |C |2 |
1 |D |1 |
2 |D |2 |
Sum of count is 100.
As you can see, a city can have multiple tenants. If the summed count of a city is less than 5% of the total count, that city's count should be moved to another group named 'other' while keeping the tenant dimension. The resulting data should be:
tenant|city |count|
1 |A |36 |
2 |A |50 |
1 |C |6 |
2 |C |2 |
1 |other |4 | --> Addition of count of B city and count of D city for tenant 1
2 |other |2 | --> count of D city for tenant 2
I want to produce the same result for two databases, PostgreSQL and ClickHouse. Any ideas on how to do this? Once I have a query that produces this result in either database, I think it should not be difficult to write the query for the other one, so an answer for either database is acceptable.
You can do:
select tenant, grp as city, sum(cnt) as count
from (
    select *,
        case when sum(cnt) over(partition by city) >= 5
            then city else 'Other' end as grp
    from t
) x
group by grp, tenant
order by grp, tenant
Result:
tenant city count
------- ------ -----
1 A 36
2 A 50
1 C 6
2 C 2
1 Other 4
2 Other 2
See example at DB Fiddle.
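The query above hard-codes the threshold as 5, which equals 5% only because the counts happen to add up to 100. A possible variation, sketched here on SQLite through Python's sqlite3 module (an assumption for illustration, since the question targets PostgreSQL and ClickHouse), derives the threshold from the grand total with a second window function. The table name t and the column name cnt follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (tenant INTEGER, city TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", 36), (2, "A", 50), (1, "B", 3), (1, "C", 6),
     (2, "C", 2), (1, "D", 1), (2, "D", 2)],
)

query = """
SELECT tenant, grp AS city, SUM(cnt) AS total
FROM (
    SELECT tenant, cnt,
           -- keep the city only if its share of the grand total is at least 5%
           CASE WHEN SUM(cnt) OVER (PARTITION BY city) * 100
                     >= 5 * SUM(cnt) OVER ()
                THEN city ELSE 'other' END AS grp
    FROM t
) x
GROUP BY grp, tenant
ORDER BY grp, tenant
"""
grouped_rows = conn.execute(query).fetchall()
print(grouped_rows)
```

Multiplying both sides by integers avoids floating-point division when comparing against 5%.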
Using a CTE, this query has to scan the table only once:
WITH tc AS (
    SELECT tenant, city, sum(count)::int AS total
         , sum(sum(count)) OVER (PARTITION BY city) AS city_count
    FROM tbl
    GROUP BY 1, 2
)
SELECT tenant, city, total
FROM tc
WHERE city_count >= 5
UNION ALL
SELECT tenant, 'other', sum(total)
FROM tc
WHERE city_count < 5
GROUP BY 1
db<>fiddle here
Not sure whether ClickHouse supports window functions over aggregate functions like Postgres does. See:
Postgres window function and group by exception
Their SQL reference does not address that explicitly. It does say this, though:
expressions involving window functions, e.g. (count(*) over ()) / 2)
not supported, wrap in a subquery (feature request)

How to subset the readmitted cases from an inpatients’ table to calculate the total length of stay of the readmitted cases in SQL Server 17?

I am working with an inpatients' data table that looks like the following:
ID   | AdmissionDate | DischDate  | LOS | Readmitted30days
-----+---------------+------------+-----+-----------------
001  | 2014-01-01    | 2014-01-12 | 11  | 1
101  | 2014-02-05    | 2014-02-12 | 7   | 1
001  | 2014-02-18    | 2018-02-27 | 9   | 1
001  | 2018-02-01    | 2018-02-13 | 12  | 0
212  | 2014-01-28    | 2014-02-12 | 15  | 1
212  | 2014-03-02    | 2014-03-15 | 13  | 0
212  | 2016-12-23    | 2016-12-29 | 4   | 0
1011 | 2017-06-10    | 2017-06-21 | 11  | 0
401  | 2018-01-01    | 2018-01-11 | 10  | 0
401  | 2018-10-01    | 2018-10-10 | 9   | 0
I want to create another table from the above in which the total length of stay (LOS) is summed up for those who have been readmitted within 30 days. The table I want to create looks like the following:
ID   | Total LOS
-----+----------
001  | 39
212  | 28
212  | 4
1011 | 11
401  | 10
401  | 9
I am using SQL Server Version 17.
Could anyone help me do this?
Thanks in advance
The Readmitted30days column seems irrelevant to the question and a complete red herring. What you seem to want is to aggregate rows which are within 30 days of each other.
This is a type of gaps-and-islands problem. There are a number of solutions, here is one:
We use LAG to check whether the previous DischDate is within 30 days of this AdmissionDate
Based on that we assign a grouping ID by doing a running count
Then simply group by ID and our grouping ID, and sum
The dates and LOS don't seem to match up, so I've given you both
WITH StartPoints AS (
    SELECT *,
        -- a row starts a new group when the previous discharge for the same ID
        -- is more than 30 days before this admission
        IsStart = CASE WHEN
                DATEADD(day, -30, AdmissionDate) >
                LAG(DischDate) OVER (PARTITION BY ID ORDER BY DischDate)
            THEN 1 END
    FROM YourTable
),
Groupings AS (
    SELECT *,
        -- running count of group starts gives each island its own number
        GroupId = COUNT(IsStart) OVER (PARTITION BY ID ORDER BY DischDate ROWS UNBOUNDED PRECEDING)
    FROM StartPoints
)
SELECT
    ID,
    TotalBasedOnDates = SUM(DATEDIFF(day, AdmissionDate, DischDate)), -- do you need to add 1 within the sum?
    TotalBasedOnLOS = SUM(LOS)
FROM Groupings
GROUP BY ID, GroupId;
db<>fiddle
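The three steps described above (LAG, running count, group and sum) can be tried out with the following sketch. It runs on SQLite through Python's sqlite3 module, which is an assumption for illustration since the question is about SQL Server, so julianday() stands in for DATEADD/DATEDIFF. The table name stays and the snake_case aliases are made up, and only the rows for IDs 212, 401 and 1011 from the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stays (ID TEXT, AdmissionDate TEXT, DischDate TEXT, LOS INTEGER)"
)
conn.executemany(
    "INSERT INTO stays VALUES (?, ?, ?, ?)",
    [
        ("212", "2014-01-28", "2014-02-12", 15),
        ("212", "2014-03-02", "2014-03-15", 13),
        ("212", "2016-12-23", "2016-12-29", 4),
        ("1011", "2017-06-10", "2017-06-21", 11),
        ("401", "2018-01-01", "2018-01-11", 10),
        ("401", "2018-10-01", "2018-10-10", 9),
    ],
)

query = """
WITH start_points AS (
    SELECT *,
           -- 1 when the gap since the previous discharge exceeds 30 days;
           -- NULL otherwise (including the first stay of each ID)
           CASE WHEN julianday(AdmissionDate)
                     - julianday(LAG(DischDate) OVER (PARTITION BY ID ORDER BY DischDate))
                     > 30
                THEN 1 END AS is_start
    FROM stays
),
groupings AS (
    SELECT *,
           -- COUNT ignores NULLs, so this only increments on a new island
           COUNT(is_start) OVER (PARTITION BY ID ORDER BY DischDate
                                 ROWS UNBOUNDED PRECEDING) AS group_id
    FROM start_points
)
SELECT ID, SUM(LOS) AS total_los
FROM groupings
GROUP BY ID, group_id
ORDER BY ID, group_id
"""
island_rows = conn.execute(query).fetchall()
print(island_rows)
```

Because ID is stored as text here, '1011' sorts before '212' in the output.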
If I understand correctly:
select Id, sum(LOS)
from tablename
where Readmitted30days = 1
group by Id
You want to use aggregation:
select id, sum(los)
from t
group by id
having max(Readmitted30days) = 1;
This filters after the aggregation so all los values are included in the sum.
EDIT:
I think I understand. Every occasion where Readmitted30days = 0, you want a row in the result set that combines that row with the following rows up to the next matching row.
If that interpretation is correct, you can construct groups using a cumulative sum and then aggregate:
select id, sum(los)
from (select t.*,
             sum(case when Readmitted30days = 0 then 1 else 0 end) over (partition by id order by admissiondate) as grp
      from t
     ) t
group by id, grp;

SQL DB2 Split result of group by based on count

I would like to split the result of a group by in several rows based on a count, but I don't know if it's possible. For instance, if I have a query like this :
SELECT doc.client, doc.template, COUNT(doc) FROM document doc GROUP BY doc.client, doc.template
and a table document with the following data :
ID | name | client | template
1 | doc_a | a | temp_a
2 | doc_b | a | temp_a
3 | doc_c | a | temp_a
4 | doc_d | a | temp_b
The result for the query would be :
client | template | count
a | temp_a | 3
a | temp_b | 1
But I would like to split a row of the result in two or more if the count is higher than 2 :
client | template | count
a | temp_a | 2
a | temp_a | 1
a | temp_b | 1
Is there a way to do this in SQL ?
You can use a recursive CTE (RCTE) like the one below. Run the statement as is first, trying different values in the last column. The maximum batch size here is 1000.
WITH
GRP_RESULT (client, template, count) AS
(
-- Place your SELECT ... GROUP BY here
-- instead of VALUES
VALUES
('a', 'temp_a', 4500)
, ('a', 'temp_b', 3001)
)
, T (client, template, count, max_batch_size) AS
(
SELECT client, template, count, 1000
FROM GRP_RESULT
UNION ALL
SELECT client, template, count - max_batch_size, max_batch_size
FROM T
WHERE count > max_batch_size
)
SELECT client, template, CASE WHEN count > max_batch_size THEN max_batch_size ELSE count END count
FROM T
ORDER BY client, template, count DESC
The result is:
|CLIENT|TEMPLATE|COUNT |
|------|--------|-----------|
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |500 |
|a |temp_b |1000 |
|a |temp_b |1000 |
|a |temp_b |1000 |
|a |temp_b |1 |
You may place your SELECT ... GROUP BY statement as specified above afterwards to achieve your goal.
You can use window functions and then aggregate:
SELECT client, template, COUNT(*)
FROM (SELECT doc.client, doc.template,
             ROW_NUMBER() OVER (PARTITION BY doc.client, doc.template ORDER BY doc.client) - 1 as seqnum
      FROM document doc
     ) d
GROUP BY client, template, floor(seqnum / 2)
The subquery enumerates the rows. The outer query then splits the rows into groups of two by dividing the zero-based row number by 2.
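To check the row-numbering approach against the sample data from the question, here is a small sketch on SQLite through Python's sqlite3 module. SQLite is an assumption for illustration, as the question is about DB2. The batch size of 2 is written directly into the query, and rows are numbered by ID within each client/template pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (id INTEGER, name TEXT, client TEXT, template TEXT)"
)
conn.executemany(
    "INSERT INTO document VALUES (?, ?, ?, ?)",
    [
        (1, "doc_a", "a", "temp_a"),
        (2, "doc_b", "a", "temp_a"),
        (3, "doc_c", "a", "temp_a"),
        (4, "doc_d", "a", "temp_b"),
    ],
)

query = """
SELECT client, template, COUNT(*) AS cnt
FROM (
    SELECT client, template,
           -- zero-based position of the row inside its client/template group
           ROW_NUMBER() OVER (PARTITION BY client, template ORDER BY id) - 1 AS seqnum
    FROM document
) d
-- integer division: positions 0 and 1 form batch 0, positions 2 and 3 batch 1, ...
GROUP BY client, template, seqnum / 2
ORDER BY client, template, cnt DESC
"""
batch_rows = conn.execute(query).fetchall()
print(batch_rows)
```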

Hive conditional count by resetting counter?

I have two hive tables, customers and transaction.
customer table
---------------------------------
customer_id | account_threshold
---------------------------------
101 | 200
102 | 500
transaction table
-------------------------------------------
transaction_date | customer_id | amount
-------------------------------------------
07/01/2018 101 250
07/01/2018 102 450
07/02/2018 101 500
07/03/2018 102 100
07/04/2018 102 50
Result:
------------------------------
customer_id | breach_count
------------------------------
101 2
102 1
I have to count the number of instances the sum of amount in transaction table exceeds the account_threshold in customer table.
When a breach is detected I reset the counter to 0.
For customer 101, the first transaction is above the threshold, so the breach count is 1. There is another breach for 101 in the 3rd transaction. Hence, the total breach count for 101 is 2.
For customer 102, the first transaction (450) is below the threshold. The next transaction for 102 is $100, which brings the running total to 550 and breaches the threshold of 500, so breach_count will be 1.
I have tried windowing, but I can't see how to proceed when joining the two tables.
You can try writing a subquery that computes the accumulated amount per customer_id (ordered by amount), then OUTER JOIN it to the customer table and COUNT:
SELECT t.customer_id, COUNT(t.total) breach_count
FROM customer c
LEFT JOIN
(
    select t1.*, SUM(t1.amount) OVER(PARTITION BY t1.customer_id order by t1.amount) as total
    from transaction1 t1
) t on c.customer_id = t.customer_id
WHERE c.account_threshold < t.total
GROUP BY t.customer_id
Here is a SQL Fiddle on SQL Server; it is a different DBMS, but the window function syntax is the same.
[Results]:
| customer_id | breach_count |
|-------------|--------------|
| 101 | 2 |
| 102 | 1 |
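The reset rule stated in the question (accumulate amounts per customer in date order, count a breach when the running total exceeds the threshold, then start again from 0) is easiest to see as a plain loop. The following is a minimal procedural sketch in Python rather than Hive SQL. The function name count_breaches is made up, and the dates are rewritten in ISO format so that they sort chronologically, which is an assumption about how the data would be prepared.

```python
from collections import defaultdict

# customer_id -> account_threshold, from the customer table in the question
thresholds = {101: 200, 102: 500}

# (transaction_date, customer_id, amount), dates converted to ISO format
transactions = [
    ("2018-07-01", 101, 250),
    ("2018-07-01", 102, 450),
    ("2018-07-02", 101, 500),
    ("2018-07-03", 102, 100),
    ("2018-07-04", 102, 50),
]


def count_breaches(thresholds, transactions):
    """Count threshold breaches per customer, resetting the sum after each one."""
    running = defaultdict(int)
    breaches = {customer_id: 0 for customer_id in thresholds}
    for _date, customer_id, amount in sorted(transactions):
        running[customer_id] += amount
        if running[customer_id] > thresholds[customer_id]:
            breaches[customer_id] += 1
            running[customer_id] = 0  # reset the counter after a breach
    return breaches


breach_counts = count_breaches(thresholds, transactions)
print(breach_counts)
```

For customer 102 the trailing 50 starts a fresh running total after the reset and does not breach, so the count stays at 1.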
To reset a count/rank/sum whenever the value changes:
Input table:
Time | value
12 |A
13 |A
14 |C
15 |C
16 |B
17 |B
18 |A
You just need LAG to get the previous value. Each step selects from the result of the previous one.
Step 1. select *, lag(value) over (order by time) as lagval
Now compare the lag value to the actual value. If they differ, or there is no previous row, set the flag to 1, else 0.
Step 2. select *, case when lagval is null or lagval <> value then 1 else 0 end as flag
Now take a running sum over the flag. The sum is different for each group, where a new group starts whenever the value changes.
Step 3. select *, sum(flag) over (order by time) as flag_sum
Now just number the rows within each group.
Step 4. select *, row_number() over (partition by flag_sum order by time) as rownumber
Final result
Time | value | lagval | flag | flag_sum | rownumber
12   | A     | null   | 1    | 1        | 1
13   | A     | A      | 0    | 1        | 2
14   | C     | A      | 1    | 2        | 1
15   | C     | C      | 0    | 2        | 2
16   | B     | C      | 1    | 3        | 1
17   | B     | B      | 0    | 3        | 2
18   | A     | B      | 1    | 4        | 1
You can use sum or count in place of the row number, depending on what you want to reset whenever the value changes.
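The four steps can be chained as CTEs. Here is a runnable sketch on SQLite through Python's sqlite3 module, which is an assumption for illustration since the question is about Hive. The table name events and the CTE names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (time INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(12, "A"), (13, "A"), (14, "C"), (15, "C"), (16, "B"), (17, "B"), (18, "A")],
)

query = """
WITH lagged AS (      -- step 1: previous value
    SELECT time, value, LAG(value) OVER (ORDER BY time) AS lagval
    FROM events
),
flagged AS (          -- step 2: 1 where the value changes (or on the first row)
    SELECT *, CASE WHEN lagval IS NULL OR lagval <> value THEN 1 ELSE 0 END AS flag
    FROM lagged
),
grouped AS (          -- step 3: running sum of flags identifies each group
    SELECT *, SUM(flag) OVER (ORDER BY time) AS flag_sum
    FROM flagged
)
SELECT time, value, flag_sum,   -- step 4: counter that restarts in every group
       ROW_NUMBER() OVER (PARTITION BY flag_sum ORDER BY time) AS rownumber
FROM grouped
ORDER BY time
"""
reset_rows = conn.execute(query).fetchall()
print(reset_rows)
```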

Sql Server Aggregation or Pivot Table Query

I'm trying to write a query that will tell me the number of customers who had a certain number of transactions each week. I don't know where to start with the query, but I'd assume it involves an aggregate or pivot function. I'm working in SqlServer management studio.
Currently the data looks like this, where the first column is the customer ID and each subsequent column is a week:
|Customer| 1 | 2| 3 |4 |
----------------------
|001 |1 | 0| 2 |2 |
|002 |0 | 2| 1 |0 |
|003 |0 | 4| 1 |1 |
|004 |1 | 0| 0 |1 |
I'd like to see a return like the following:
|Visits |1 | 2| 3 |4 |
----------------------
|0 |2 | 2| 1 |0 |
|1 |2 | 0| 2 |2 |
|2 |0 | 1| 1 |1 |
|4 |0 | 1| 0 |0 |
What I want is the count of customers per number of transactions for each week. E.g. during the 1st week 2 customers (i.e. 002 and 003) had 0 transactions, 2 customers (i.e. 001 and 004) had 1 transaction, and no customers had more than 1 transaction.
The query below will get you the result you want, but note that it has the column names hard coded. It's easy to add more week columns, but if the number of columns is unknown then you might want to look into a solution using dynamic SQL (which would require accessing the information schema to get the column names). It's not that hard to turn it into a fully dynamic version though.
select
Visits
, coalesce([1],0) as Week1
, coalesce([2],0) as Week2
, coalesce([3],0) as Week3
, coalesce([4],0) as Week4
from (
select *, count(*) c from (
select '1' W, week1 Visits from t union all
select '2' W, week2 Visits from t union all
select '3' W, week3 Visits from t union all
select '4' W, week4 Visits from t ) a
group by W, Visits
) x pivot ( max (c) for W in ([1], [2], [3], [4]) ) as pvt;
In the query your table is called t and the output is:
Visits Week1 Week2 Week3 Week4
0 2 2 1 1
1 2 0 2 2
2 0 1 1 1
4 0 1 0 0
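The same unpivot-then-pivot idea can be reproduced without the PIVOT operator by using conditional aggregation. Here is a sketch on SQLite through Python's sqlite3 module, which is an assumption for illustration since the question is about SQL Server. The table t and the columns week1..week4 follow the naming used in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (customer TEXT, week1 INTEGER, week2 INTEGER, "
    "week3 INTEGER, week4 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [("001", 1, 0, 2, 2), ("002", 0, 2, 1, 0),
     ("003", 0, 4, 1, 1), ("004", 1, 0, 0, 1)],
)

query = """
SELECT visits,
       SUM(CASE WHEN w = 1 THEN 1 ELSE 0 END) AS week1,
       SUM(CASE WHEN w = 2 THEN 1 ELSE 0 END) AS week2,
       SUM(CASE WHEN w = 3 THEN 1 ELSE 0 END) AS week3,
       SUM(CASE WHEN w = 4 THEN 1 ELSE 0 END) AS week4
FROM (
    -- unpivot: one (week, visits) row per customer and week
    SELECT 1 AS w, week1 AS visits FROM t UNION ALL
    SELECT 2 AS w, week2 AS visits FROM t UNION ALL
    SELECT 3 AS w, week3 AS visits FROM t UNION ALL
    SELECT 4 AS w, week4 AS visits FROM t
) a
GROUP BY visits
ORDER BY visits
"""
visit_rows = conn.execute(query).fetchall()
print(visit_rows)
```

Each output row is (visits, week1, week2, week3, week4), in the same layout as the answer's output above.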