Count repeat records per day (without window functions) - sql

I'm trying to get a count of repeat customer records per day, and I'm having a bit of trouble because I'm on MariaDB 10.1 and window functions weren't introduced until 10.2 (so no partitioning, RANK(), etc.).
I have an example set of data that looks like this:
| Date | Country | Type | Email | Response_Time |
| ---------- | ------- | --------- | ------------- | ------------- |
| 2021-05-21 | AU | Enquiry | bill@fake.com | 910 |
| 2021-05-21 | AU | Enquiry | bill@fake.com | 1050 |
| 2021-05-21 | NZ | Complaint | jim@fake.com | 56 |
| 2021-05-22 | NZ | Enquiry | jim@fake.com | 1000 |
| 2021-05-22 | NZ | Enquiry | jim@fake.com | 845 |
| 2021-05-22 | NZ | Enquiry | jim@fake.com | 700 |
| 2021-05-22 | NZ | Complaint | jim@fake.com | 217 |
| 2021-05-23 | UK | Enquiry | jane@fake.com | 843 |
| 2021-05-23 | NZ | Enquiry | jim@fake.com | 1795 |
| 2021-05-23 | NZ | Enquiry | jim@fake.com | 521 |
| 2021-05-23 | AU | Complaint | bill@fake.com | 150 |
The above can be produced with the following query:
SELECT
    DATE(Start_Time) AS "Date",
    Country,
    Type,
    Email,
    Response_Time
FROM EMAIL_DETAIL
WHERE DATE(Start_Time) BETWEEN '2021-05-21' AND '2021-05-23'
  AND COUNTRY IN ('AU','NZ','UK');
I'd like to get a count of email addresses that appear more than once in the group of day, country and type, and display it as a summary like this:
| Country | Type | Volume | Avg_Response_Time | Repeat_Daily |
| ------- | --------- | ------ | ----------------- | ------------ |
| AU | Enquiry | 2 | 980 | 1 |
| AU | Complaint | 1 | 150 | 0 |
| NZ | Enquiry | 5 | 972 | 3 |
| NZ | Complaint | 1 | 137 | 0 |
| UK | Enquiry | 1 | 843 | 0 |
The repeat daily count is a count of records where the email address appeared more than once in the group of date, country and type. Volume is the total count of records per country and type.
I'm having a hard time with the lack of window functions in this version of MariaDB and any help would really be appreciated.
(Apologies for the tables formatted as code, I was getting a formatting error when trying to post otherwise)

Hmmm . . . I think this is two levels of aggregation:
SELECT country, type, SUM(cnt) AS volume,
       SUM(total_response_time) / SUM(cnt) AS avg_response_time,
       SUM(CASE WHEN cnt > 1 THEN cnt - 1 ELSE 0 END) AS repeat_daily
FROM (SELECT DATE(Start_Time) AS "date", Country, Type, Email,
             SUM(Response_Time) AS total_response_time, COUNT(*) AS cnt
      FROM EMAIL_DETAIL
      WHERE DATE(Start_Time) BETWEEN '2021-05-21' AND '2021-05-23' AND
            COUNTRY IN ('AU','NZ','UK')
      GROUP BY date, country, type, email
     ) ed
GROUP BY country, type
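The two-level aggregation can be checked end to end. Below is a runnable sketch using Python's sqlite3 with the question's sample rows (schema and column names follow the question; `@` addresses replace the obfuscated `#`). Note the repeat count uses `cnt - 1`, so each extra occurrence of an email counts once; that is an assumption made here to reproduce the desired Repeat_Daily values (1 for AU Enquiry, 3 for NZ Enquiry).

```python
import sqlite3

# In-memory copy of the question's sample data (assumed schema).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE EMAIL_DETAIL (
    Start_Time TEXT, Country TEXT, Type TEXT, Email TEXT, Response_Time INTEGER)""")
con.executemany("INSERT INTO EMAIL_DETAIL VALUES (?,?,?,?,?)", [
    ("2021-05-21", "AU", "Enquiry",   "bill@fake.com",  910),
    ("2021-05-21", "AU", "Enquiry",   "bill@fake.com", 1050),
    ("2021-05-21", "NZ", "Complaint", "jim@fake.com",    56),
    ("2021-05-22", "NZ", "Enquiry",   "jim@fake.com",  1000),
    ("2021-05-22", "NZ", "Enquiry",   "jim@fake.com",   845),
    ("2021-05-22", "NZ", "Enquiry",   "jim@fake.com",   700),
    ("2021-05-22", "NZ", "Complaint", "jim@fake.com",   217),
    ("2021-05-23", "UK", "Enquiry",   "jane@fake.com",  843),
    ("2021-05-23", "NZ", "Enquiry",   "jim@fake.com",  1795),
    ("2021-05-23", "NZ", "Enquiry",   "jim@fake.com",   521),
    ("2021-05-23", "AU", "Complaint", "bill@fake.com",  150),
])

# Inner query: one row per (day, country, type, email) with its count.
# Outer query: re-aggregate to country/type; cnt - 1 counts only the repeats.
results = list(con.execute("""
    SELECT Country, Type, SUM(cnt) AS Volume,
           SUM(total_rt) / SUM(cnt) AS Avg_Response_Time,
           SUM(CASE WHEN cnt > 1 THEN cnt - 1 ELSE 0 END) AS Repeat_Daily
    FROM (SELECT DATE(Start_Time) AS d, Country, Type, Email,
                 SUM(Response_Time) AS total_rt, COUNT(*) AS cnt
          FROM EMAIL_DETAIL
          GROUP BY d, Country, Type, Email) ed
    GROUP BY Country, Type
    ORDER BY Country, Type"""))
for row in results:
    print(row)
```

SQLite truncates the integer division for the average; on MariaDB the same expression returns a decimal, so round or cast as needed.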

SELECT "Date", Country, Type, AVG(Response_Time), SUM(cc) AS Volume,
       SUM(CASE WHEN cc > 1 THEN 1 END) AS Repeat_Daily
FROM (
    SELECT
        DATE(Start_Time) AS "Date",
        Country,
        Type,
        COUNT(email) AS cc,
        AVG(Response_Time) AS Response_Time
    FROM EMAIL_DETAIL
    WHERE DATE(Start_Time) BETWEEN '2021-05-21' AND '2021-05-23'
      AND Country IN ('AU','NZ','UK')
    GROUP BY
        DATE(Start_Time),
        Country, Type, email
) d
GROUP BY "Date", Country, Type

Related

Get row for each unique user based on highest column value

I have the following data
+--------+-----------+--------+
| UserId | Timestamp | Rating |
+--------+-----------+--------+
| 1 | 1 | 1202 |
| 2 | 1 | 1198 |
| 1 | 2 | 1204 |
| 2 | 2 | 1196 |
| 1 | 3 | 1206 |
| 2 | 3 | 1194 |
| 1 | 4 | 1198 |
| 2 | 4 | 1202 |
+--------+-----------+--------+
I am trying to find the distribution of each user's Rating, based on their latest row in the table (latest is determined by Timestamp). On the path to that, I am trying to get a list of user IDs and Ratings which would look like the following
+--------+--------+
| UserId | Rating |
+--------+--------+
| 1 | 1198 |
| 2 | 1202 |
+--------+--------+
Trying to get here, I sorted the list on UserId and Timestamp (desc) which gives the following.
+--------+-----------+--------+
| UserId | Timestamp | Rating |
+--------+-----------+--------+
| 1 | 4 | 1198 |
| 2 | 4 | 1202 |
| 1 | 3 | 1206 |
| 2 | 3 | 1194 |
| 1 | 2 | 1204 |
| 2 | 2 | 1196 |
| 1 | 1 | 1202 |
| 2 | 1 | 1198 |
+--------+-----------+--------+
So now I just need to take the top N rows, where N is the number of players. But I can't use a LIMIT clause there, as it needs a constant expression; I wanted to use count(id) as the input for LIMIT, which doesn't seem to work.
Any suggestions on how I can get the data I need?
Cheers!
Andy
This should work:
SELECT test.UserId, Rating
FROM test
JOIN (SELECT UserId, MAX(Timestamp) AS Timestamp
      FROM test
      GROUP BY UserId) m
  ON test.UserId = m.UserId AND test.Timestamp = m.Timestamp
If you can use window functions, then you can use the following:
SELECT UserId, Rating
FROM (SELECT UserId, Rating,
             ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Timestamp DESC) AS row_num
      FROM test) m
WHERE row_num = 1
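The join-on-MAX approach is easy to sanity-check. Here is a runnable sketch using Python's sqlite3 with the sample rows from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (UserId INTEGER, Timestamp INTEGER, Rating INTEGER)")
con.executemany("INSERT INTO test VALUES (?,?,?)", [
    (1, 1, 1202), (2, 1, 1198), (1, 2, 1204), (2, 2, 1196),
    (1, 3, 1206), (2, 3, 1194), (1, 4, 1198), (2, 4, 1202),
])

# Join each user's rows against that user's latest timestamp,
# keeping only the matching (latest) row per user.
latest = list(con.execute("""
    SELECT test.UserId, Rating
    FROM test
    JOIN (SELECT UserId, MAX(Timestamp) AS Timestamp
          FROM test GROUP BY UserId) m
      ON test.UserId = m.UserId AND test.Timestamp = m.Timestamp
    ORDER BY test.UserId"""))
print(latest)  # each user's rating at their latest timestamp
```

One caveat of this pattern: if a user has two rows sharing the same maximum timestamp, both are returned; the ROW_NUMBER() version picks exactly one.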

How to count results from each column transposing in rows using window functions

I have query
SELECT
cn.country_name,
pbi.gender,
pbi.first_name,
pbi.last_name, pbi.iq_level,
pf.dominate_feature, pv.political_view,
--RANK() OVER(ORDER BY gender DESC) AS "rank", --1,529
--RANK() OVER(ORDER BY iq_level DESC) AS "iq_rank", --rank by iq
--ROW_NUMBER() OVER(ORDER BY 1) AS rownum, --how many rows, number each row
RANK() OVER(PARTITION BY first_name ORDER BY political_view DESC) AS rnk_pol_view
FROM person_basic_info pbi
JOIN country_names cn ON cn.id = pbi.country_id
JOIN persons_features pf ON pf.person_id = pbi.id
JOIN political_views pv ON pv.person_id = pbi.id;
Result of it is
+--------------+--------+-------------+---------------+----------+-------------------+-------------------------+--------------+
| country_name | gender | first_name | last_name | iq_level | dominate_feature | political_view | rnk_pol_view |
+--------------+--------+-------------+---------------+----------+-------------------+-------------------------+--------------+
| Yemen | Male | Abeu | Flieg | 118 | Conscientiousness | Liberal feminism | 1 |
| Yemen | Female | Adeline | Munt | 101 | Conscientiousness | Classical liberalism | 1 |
| Yemen | Female | Adey | Jobbing | 145 | Openness | Ordoliberalism | 1 |
| Yemen | Female | Adore | Dorwood | 105 | Conscientiousness | Neoliberalism | 1 |
| Yemen | Female | Adrianna | Wardhaugh | 125 | Agreeableness | Individualism | 1 |
| Yemen | Male | Adriano | Grieswood | 160 | Agreeableness | Neoliberalism | 1 |
| Yemen | Female | Afton | Kleanthous | 87 | Extraversion | Market liberalism | 1 |
| Yemen | Male | Aguie | Lampbrecht | 138 | Conscientiousness | Liberal feminism | 1 |
| Yemen | Male | Aguistin | Basnett | 145 | Extraversion | Ordoliberalism | 1 |
| Yemen | Male | Ahmad | Billingham | 122 | Agreeableness | Neoliberalism | 1 |
| Yemen | Female | Aime | Adrianello | 111 | Agreeableness | Liberal feminism | 1 |
+--------------+--------+-------------+---------------+----------+-------------------+-------------------------+--------------+
but
desired result is:
+--------+-----------------+-------------------+-------------------+-------------------+-------------------+-------------------------+-------------------------+-------------------------+-------------------------+-------------------------+-------------------------+--------+
| Gender | Count_by_gender | Conscientiousness | Openness | Agreeableness | Extraversion | Liberal feminism | Classical liberalism | Ordoliberalism | Neoliberalism | Individualism | Market liberalism | Avg_Iq |
+--------+-----------------+-------------------+-------------------+-------------------+-------------------+-------------------------+-------------------------+-------------------------+-------------------------+-------------------------+-------------------------+--------+
| Female | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | avg() |
| Male | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | count(*) | avg() |
+--------+-----------------+-------------------+-------------------+-------------------+-------------------+-------------------------+-------------------------+-------------------------+-------------------------+-------------------------+-------------------------+--------+
Could you help to do it using window functions?
I don't see where window functions fit in. I think you want conditional aggregation:
SELECT pbi.gender, COUNT(*),
       COUNT(*) FILTER (WHERE gender = 'Female') AS female,
       COUNT(*) FILTER (WHERE gender = 'Male') AS male,
       COUNT(*) FILTER (WHERE pf.dominate_feature = 'Conscientiousness') AS Conscientiousness,
       . . .
FROM person_basic_info pbi
JOIN country_names cn ON cn.id = pbi.country_id
JOIN persons_features pf ON pf.person_id = pbi.id
GROUP BY gender
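`FILTER (WHERE ...)` is PostgreSQL syntax; on databases without it, `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` is the portable equivalent. Below is a minimal runnable sketch using sqlite3, with a made-up stand-in for the joined result set (the table name `joined` and its rows are illustrative only, and only two of the pivoted columns are shown):

```python
import sqlite3

# Hypothetical flattened result of the question's three-table join.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE joined (
    gender TEXT, dominate_feature TEXT, political_view TEXT, iq_level INTEGER)""")
con.executemany("INSERT INTO joined VALUES (?,?,?,?)", [
    ("Male",   "Conscientiousness", "Liberal feminism",     118),
    ("Female", "Conscientiousness", "Classical liberalism", 101),
    ("Female", "Openness",          "Ordoliberalism",       145),
    ("Male",   "Agreeableness",     "Neoliberalism",        160),
])

# Conditional aggregation: one output row per gender, one CASE per pivoted column.
summary = list(con.execute("""
    SELECT gender,
           COUNT(*) AS count_by_gender,
           SUM(CASE WHEN dominate_feature = 'Conscientiousness' THEN 1 ELSE 0 END)
               AS conscientiousness,
           SUM(CASE WHEN political_view = 'Neoliberalism' THEN 1 ELSE 0 END)
               AS neoliberalism,
           AVG(iq_level) AS avg_iq
    FROM joined
    GROUP BY gender
    ORDER BY gender"""))
for row in summary:
    print(row)
```

Each additional desired column (Openness, Ordoliberalism, etc.) is just one more CASE expression in the SELECT list.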

How can I add a new value to an existing column without writing the new value to the table?

I have the below table1:
| yyyy_mm_dd | id | feature | status |
|------------|----|-----------------|---------------|
| 2019-05-13 | 2 | pricing | implemented |
| 2019-05-13 | 2 | pricing | first_contact |
| 2019-05-13 | 5 | reviews | implemented |
| 2019-05-13 | 5 | pricing | implemented |
| 2019-05-13 | 6 | reviews | first_contact |
| 2019-05-13 | 6 | reviews | implemented |
| 2019-05-13 | 6 | promotions_geo | first_contact |
| 2019-05-13 | 6 | prop_management | first_contact |
There are two statuses, implemented and first_contact. I want to introduce a third which will be no_contact. This will be the total count of ids minus the sum of ids in implemented and first_contact status.
I can get the total number of ids from a secondary table like so:
select
count(id)
from
table2
So I've tried to union the above so I can get the total count of IDs which can then be subtracted:
select
yyyy_mm_dd,
feature,
count(s.id) as implemented_and_first_contact_total,
null as total_ids
from
table1 s
where
s.yyyy_mm_dd = '2020-05-06'
group by
1,2,4
union all
select
null as yyyy_mm_dd,
null as feature,
null as implemented_and_first_contact_total,
count(id) as total_ids
from
table2
Now I'm unsure how I can subtract implemented_and_first_contact_total from total_ids in order to get a value for no_contact and have this as a value within status column. Maybe a union isn't correct to use here?
Edit: output. Say it turns out there are 300 total ids. The output would look like this:
| yyyy_mm_dd | feature | status | id_count |
|------------|-----------------|---------------|----------|
| 2019-05-13 | pricing | implemented | 2 |
| 2019-05-13 | pricing | first_contact | 1 |
| 2019-05-13 | pricing | no_contact | 297 |
| 2019-05-13 | reviews | implemented | 2 |
| 2019-05-13 | reviews | first_contact | 1 |
| 2019-05-13 | reviews | no_contact | 297 |
| 2019-05-13 | promotions_geo | first_contact | 1 |
| 2019-05-13 | promotions_geo | no_contact | 299 |
| 2019-05-13 | prop_management | first_contact | 1 |
| 2019-05-13 | prop_management | no_contact | 299 |
Is this what you want?
select yyyy_mm_dd,
(count(distinct id) -
count(distinct case when status in ('implemented', 'first_contact') then id end)
) as no_contact
from t
group by yyyy_mm_dd
Update: Removed uncorrelated subquery from SELECT and added cross join
Try this:
select yyyy_mm_dd, feature, status,
       count(id) as id_count
from table1
group by yyyy_mm_dd, feature, status
union all
select yyyy_mm_dd, feature, 'no_contact' as status,
       (cnt - count(id)) as id_count
from table1 cross join (select count(id) as cnt from table2) t2
group by yyyy_mm_dd, feature, cnt;
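The union-plus-cross-join approach can be verified end to end. Here is a runnable sketch with sqlite3, using the question's sample rows and a hypothetical table2 containing the 300 total ids from the edit:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (yyyy_mm_dd TEXT, id INTEGER, feature TEXT, status TEXT)")
con.executemany("INSERT INTO table1 VALUES (?,?,?,?)", [
    ("2019-05-13", 2, "pricing",         "implemented"),
    ("2019-05-13", 2, "pricing",         "first_contact"),
    ("2019-05-13", 5, "reviews",         "implemented"),
    ("2019-05-13", 5, "pricing",         "implemented"),
    ("2019-05-13", 6, "reviews",         "first_contact"),
    ("2019-05-13", 6, "reviews",         "implemented"),
    ("2019-05-13", 6, "promotions_geo",  "first_contact"),
    ("2019-05-13", 6, "prop_management", "first_contact"),
])
# table2 holds the full id population (300 ids in the question's example).
con.execute("CREATE TABLE table2 (id INTEGER)")
con.executemany("INSERT INTO table2 VALUES (?)", [(i,) for i in range(300)])

# First branch: real statuses. Second branch: derive no_contact as
# total ids minus contacted ids, per date and feature.
rows = list(con.execute("""
    SELECT yyyy_mm_dd, feature, status, COUNT(id) AS id_count
    FROM table1
    GROUP BY yyyy_mm_dd, feature, status
    UNION ALL
    SELECT yyyy_mm_dd, feature, 'no_contact' AS status,
           (cnt - COUNT(id)) AS id_count
    FROM table1 CROSS JOIN (SELECT COUNT(id) AS cnt FROM table2) t2
    GROUP BY yyyy_mm_dd, feature, cnt
    ORDER BY feature, status"""))
for row in rows:
    print(row)
```
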

Count Distinct Over Multiple Columns

I have two CTEs . The following is the output of my first CTE.
| ORDER_NUMBER | ORDER_FLAG | EMPLOYEE | PRODUCT_CATEGORY | SALES |
|--------------|------------|----------|------------------|--------|
| 3158132 | 1 | Don | Newspaper Ad | 16.00 |
| 3158132 | 1 | Don | Magazine Ad | 15.00 |
| 3158132 | 0 | Don | TV Ad | 0.00 |
| 3158132 | 1 | Don | Billboard Ad | 56.00 |
| 3006152 | 1 | Roger | TV Ad | 20.00 |
| 3006152 | 0 | Roger | Magazine Ad | 0.00 |
| 3006152 | 1 | Roger | Newspaper Ad | 214.00 |
| 3012681 | 1 | Ken | TV Ad | 130.00 |
| 3012681 | 0 | Ken | Magazine Ad | 0.00 |
| 9818123 | 1 | Pete | Billboard Ad | 200.00 |
I'm attempting to count the distinct order numbers and the sales amount by employee. The order flag will be either 1 or a 0. If sales are greater than 0.00 the order flag will be set to 1.
My desired output.
| Employee | Sales | Orders |
|----------|--------|--------|
| Don | 87.00 | 1 |
| Ken | 130.00 | 1 |
| Pete | 200.00 | 1 |
| Roger | 234.00 | 1 |
I was attempting to do a combination of distinct, case, and concat statements without any luck. Any thoughts?
You can use this:
with cteTotalSales (...) as ()
select employee,
       case when sum(sales) > 0
            then 1 else 0 end as Orders,
       sum(sales)
from cteTotalSales
group by employee
This should be as simple as:
with cte as (...)
select
employee,
sum(sales),
count(distinct order_number)
from cte
group by employee
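For reference, a runnable check of the COUNT(DISTINCT ...) approach with sqlite3, using the CTE output rows from the question in place of the CTE itself:

```python
import sqlite3

# Stand-in table for the first CTE's output.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE cte_out (
    order_number INTEGER, order_flag INTEGER,
    employee TEXT, product_category TEXT, sales REAL)""")
con.executemany("INSERT INTO cte_out VALUES (?,?,?,?,?)", [
    (3158132, 1, "Don",   "Newspaper Ad",  16.00),
    (3158132, 1, "Don",   "Magazine Ad",   15.00),
    (3158132, 0, "Don",   "TV Ad",          0.00),
    (3158132, 1, "Don",   "Billboard Ad",  56.00),
    (3006152, 1, "Roger", "TV Ad",         20.00),
    (3006152, 0, "Roger", "Magazine Ad",    0.00),
    (3006152, 1, "Roger", "Newspaper Ad", 214.00),
    (3012681, 1, "Ken",   "TV Ad",        130.00),
    (3012681, 0, "Ken",   "Magazine Ad",    0.00),
    (9818123, 1, "Pete",  "Billboard Ad", 200.00),
])

# COUNT(DISTINCT order_number) collapses the repeated order rows per employee.
totals = list(con.execute("""
    SELECT employee, SUM(sales) AS sales, COUNT(DISTINCT order_number) AS orders
    FROM cte_out
    GROUP BY employee
    ORDER BY employee"""))
for row in totals:
    print(row)
```
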
This query would work for you
SELECT
EMPLOYEE,
SUM(SALES) SALES,
1 AS ORDERS
FROM
YOUR_TABLE
GROUP BY
EMPLOYEE
You can substitute your subquery in place of YOUR_TABLE:
SELECT
    EMPLOYEE,
    SUM(SALES) SALES,
    1 AS ORDERS
FROM
(
    SELECT * FROM ...
) t
GROUP BY
    EMPLOYEE

SQL Multiple count columns with multiple conditionS

I am trying to gather some basic statistics from a table "Data_Table" that gets updated on a daily basis. Each row represents a case, which can be opened/closed/cancelled by an operator with a unique ID. I want to be able to show the count for the actions that each operator did the previous day. So getting from Data_Table to Ideal table.
Data_Table
| LOCATION | DATE | REFERENCE | OPENED_ID | CLOSED_ID | CANCELLED_ID |
| NYC | 20180102 | 123451 | 123 | 234 | 0 |
| TEX | 20180102 | 123452 | 345 | 123 | 0 |
| NYC | 20180102 | 123453 | 345 | 0 | 123 |
| TEX | 20180102 | 123453 | 234 | 0 | 123 |
Ideal Table
| LOCATION | DATE | USER_ID | OPEN | CLOSED | CANCELLED |
| NYC | 20180102 | 123 | 1 | 0 | 1 |
| NYC | 20180102 | 234 | 0 | 1 | 0 |
| NYC | 20180102 | 345 | 1 | 0 | 0 |
| TEX | 20180102 | 123 | 0 | 1 | 1 |
| TEX | 20180102 | 234 | 1 | 0 | 0 |
| TEX | 20180102 | 345 | 1 | 0 | 0 |
User 123 opened 1 case and cancelled 1 case in location NYC on date 20180102...etc.
I have made a few small queries for each action in each site that looks like this:
SELECT LOCATION, DATE, OPENED_ID, COUNT(DISTINCT [DATA_TABLE].REFERENCE)
FROM [DATA_TABLE]
WHERE DATE = CONVERT(DATE,GETDATE()-1)
AND LOCATION = 'NYC'
AND OPENED_ID in (SELECT NYC FROM [OP_ID_TABLE] WHERE [DATE FINISH] > GETDATE() )
GROUP BY OPENED_ID, LOCATION, DATE
ORDER BY LOCATION
And then I repeat this query for each location and each operator action, after which I do some messy VLOOKUPs in Excel to organise it into the Ideal table format, which on a daily basis is... not ideal.
I've tried to make some sum functions but haven't had any luck.
Any help would be much appreciated.
You need to unpivot and re-aggregate. One method uses union all and group by:
select location, date, user_id,
sum(opened) as opens, sum(closed) as closes, sum(cancelled) as cancels
from ((select location, date, opened_id as user_id, 1 as opened, 0 as closed, 0 as cancelled
from t
) union all
(select location, date, closed_id as user_id, 0 as opened, 1 as closed, 0 as cancelled
from t
) union all
(select location, date, cancelled_id as user_id, 0 as opened, 0 as closed, 1 as cancelled
from t
)
) t
group by location, date, user_id;
There are other methods for doing these operations, depending on the database. However, this is ANSI-standard syntax.
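The unpivot-and-reaggregate pattern can be sketched end to end with sqlite3 using the sample rows. One added assumption here: the `WHERE user_id <> 0` filter, since 0 appears to mean "no operator" in the sample data and would otherwise show up as a user:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE t (
    location TEXT, date TEXT, reference INTEGER,
    opened_id INTEGER, closed_id INTEGER, cancelled_id INTEGER)""")
con.executemany("INSERT INTO t VALUES (?,?,?,?,?,?)", [
    ("NYC", "20180102", 123451, 123, 234,   0),
    ("TEX", "20180102", 123452, 345, 123,   0),
    ("NYC", "20180102", 123453, 345,   0, 123),
    ("TEX", "20180102", 123453, 234,   0, 123),
])

# Unpivot the three id columns into one user_id column with 0/1 action
# flags, then re-aggregate per (location, date, user).
actions = list(con.execute("""
    SELECT location, date, user_id,
           SUM(opened) AS opens, SUM(closed) AS closes, SUM(cancelled) AS cancels
    FROM (SELECT location, date, opened_id AS user_id,
                 1 AS opened, 0 AS closed, 0 AS cancelled FROM t
          UNION ALL
          SELECT location, date, closed_id, 0, 1, 0 FROM t
          UNION ALL
          SELECT location, date, cancelled_id, 0, 0, 1 FROM t) u
    WHERE user_id <> 0   -- assumed: 0 marks "no operator"
    GROUP BY location, date, user_id
    ORDER BY location, user_id"""))
for row in actions:
    print(row)
```
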