Count total without duplicate records - sql

I have a table that contains the following columns: TrackingStatus, Year, Month, Order, Notes.
I need to calculate the total number of orders per tracking status for each year and month.
For example, if the table contains the following orders:
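TrackingStatus | Year | Month | Order | Notes
---------------------------------------------
F              | 2020 | 1     | 33    |
F              | 2020 | 1     | 33    | DFF
E              | 2020 | 2     | 36    | xxx
A              | 2021 | 3     | 34    | X1
A              | 2021 | 3     | 34    | DD
A              | 2021 | 3     | 88    |
A              | 2021 | 2     | 45    |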
The result should be:
• Tracking F, year 2020, month 1: the total is one (both records have the same year, month, and order).
• Tracking A, year 2021, month 2: the total is one (there is only one record for that year, month, and order).
• Tracking A, year 2021, month 3: the total is two (there are two different orders within the same year and month).
So the expected SELECT output will be like that:
TrackingStatus | Year | Month | Total
-------------------------------------
F              | 2020 | 1     | 1
E              | 2020 | 2     | 1
A              | 2021 | 2     | 1
A              | 2021 | 3     | 2
I tried GROUP BY with a plain COUNT, but that counts the number of records, which is wrong in my scenario.
How can I get the total orders for each month without counting “duplicate” records?
Thank you

You can use the COUNT(DISTINCT ...) aggregate: COUNT counts the values in each group, and DISTINCT makes each distinct value count only once.
SELECT TrackingStatus,
       Year,
       Month,
       COUNT(DISTINCT `Order`) AS Total  -- Order is a reserved word, hence the MySQL backticks
FROM tab
GROUP BY TrackingStatus,
         Year,
         Month
ORDER BY Year,
         Month
This was tested in a MySQL environment, though it should work with many DBMSs once the quoting of Order is adapted.
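To compare a plain COUNT with COUNT(DISTINCT ...) on the sample rows from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption of this example; the table name tab follows the answer, and "Order" is double-quoted because that is how SQLite quotes a reserved word.

```python
import sqlite3

# Sample rows from the question: (TrackingStatus, Year, Month, Order, Notes)
rows = [
    ("F", 2020, 1, 33, None),
    ("F", 2020, 1, 33, "DFF"),
    ("E", 2020, 2, 36, "xxx"),
    ("A", 2021, 3, 34, "X1"),
    ("A", 2021, 3, 34, "DD"),
    ("A", 2021, 3, 88, None),
    ("A", 2021, 2, 45, None),
]

conn = sqlite3.connect(":memory:")
# "Order" is a reserved word, so it is double-quoted (SQLite / standard SQL quoting).
conn.execute(
    'CREATE TABLE tab (TrackingStatus TEXT, Year INT, Month INT, "Order" INT, Notes TEXT)'
)
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT TrackingStatus,
       Year,
       Month,
       COUNT(*)                AS Records,  -- counts every row, duplicates included
       COUNT(DISTINCT "Order") AS Total     -- counts each order number once per group
FROM tab
GROUP BY TrackingStatus, Year, Month
ORDER BY Year, Month
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The Records column is only there to show the difference: for F / 2020 / 1 it is 2, while Total is 1.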

Related

Is there a way to count distinct from first record to last day of each month? BigQuery

I am trying to compute the total customer base from 2018-01-01 to the last day of each month this year, to get a month-on-month view. For instance, for January 2022 it would be the total count of distinct customers from 2018-01-01 to 2022-01-31. For February 2022 it would be the total count of distinct customers from 2018-01-01 to 2022-02-28. Could someone enlighten me?
select count(distinct customername) from table
where billingdate between "2018-01-01" and "2022-01-31";
Currently, I only get the result for the first month.
I think you are expecting a cumulative, month-wise customer count.
Example: in Jan 2018 the customer count is 10 and in Feb 2018 the count is 20:
jan 2018 - 10
feb 2018 - 20
What you need is:
jan 2018 - 10
feb 2018 - 30 <--
In this case, group by month and use an OVER clause to get the cumulative count:
select year_month_date,
       -- order the window explicitly; an ORDER BY inside a subquery does not guarantee row order
       sum(customer_count) over (order by year_month_date
                                 rows between unbounded preceding and current row)
         as cumulative_customer_count_from_jan_2018
from (
  select year_month_date,
         count(distinct customername) as customer_count
  from (
    select date(extract(year from billingdate), extract(month from billingdate), 1) as year_month_date,
           customername
    from table
  ) as t
  group by year_month_date
)
where year_month_date >= date(2018, 1, 1)
order by year_month_date
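One caveat with a running sum of per-month distinct counts is that a customer who is billed in several months is added once for each of those months. If the goal is a true cumulative count of distinct customers, one possible variation is to count each customer only in the month of their first bill and then take the running sum. The sketch below is illustrative only: it runs on SQLite through Python's sqlite3 module rather than BigQuery, needs SQLite 3.25 or later for window functions, and the billing table and its rows are made up for the example.

```python
import sqlite3

# Hypothetical billing rows: (customername, billingdate)
rows = [
    ("alice", "2018-01-05"),
    ("bob",   "2018-01-20"),
    ("alice", "2018-02-03"),  # repeat customer, must not be counted again
    ("carol", "2018-02-11"),
    ("bob",   "2018-03-02"),  # repeat customer
    ("dave",  "2018-03-15"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE billing (customername TEXT, billingdate TEXT)")
conn.executemany("INSERT INTO billing VALUES (?, ?)", rows)

query = """
WITH first_seen AS (       -- month in which each customer first appears
    SELECT customername,
           strftime('%Y-%m', MIN(billingdate)) AS first_month
    FROM billing
    GROUP BY customername
),
new_per_month AS (         -- number of first-time customers per month
    SELECT first_month, COUNT(*) AS new_customers
    FROM first_seen
    GROUP BY first_month
)
SELECT first_month,
       SUM(new_customers) OVER (ORDER BY first_month) AS cumulative_customers
FROM new_per_month
ORDER BY first_month
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

A month with no first-time customers produces no row in this sketch; joining against a list of months would be needed to show it with an unchanged total.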

How to append more rows to a table depending on a condition in BigQuery

I have two tables one with volume data and other with price data. Table with price data at times might not match the period covered in the table with volume data. For example volumes are available for the period starting from Jan 1 2022 to Dec 31 2023, but the price data is available only up until Dec 31 2022. However, eventually it will get populated at some point. Thus, I want to copy Q4 2022 prices from the price table and append it 4 times to the main table labeled as Q1 2023, Q2 2023, Q3 2023 and Q4 2023. But I only want to do it when a condition is met, and that condition is if the period in two tables match or not.
So essentially this:
SELECT year, quarter, SKU, price FROM prices_table
UNION ALL
SELECT 2023 as year, 1 as quarter, SKU, price FROM prices_table WHERE year = 2022 and quarter = 4
UNION ALL
SELECT 2023 as year, 2 as quarter, SKU, price FROM prices_table WHERE year = 2022 and quarter = 4
and so on until I have full 4 quarters of 2023 prices.
I want the script to first check whether prices_table already has that data. Basically, if year * 10 + quarter from volume_table > year * 10 + quarter from prices_table, then I want the script to handle it as I have described above. But if year * 10 + quarter from volume_table = year * 10 + quarter from prices_table, I don't want the script to do anything, since all the prices I need are already available in prices_table.
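One way to express the check described above without a separate IF step is to let the second half of the UNION ALL produce rows only for volume periods that lie beyond the last priced period. The following is a rough sketch under stated assumptions, not a tested BigQuery script: it runs on SQLite via Python's sqlite3 module, the column layout of volume_table (year, quarter, SKU, volume) is a guess, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE volume_table (year INT, quarter INT, SKU TEXT, volume INT);
CREATE TABLE prices_table (year INT, quarter INT, SKU TEXT, price REAL);
-- Hypothetical data: volumes run two quarters past the last available price.
INSERT INTO volume_table VALUES
    (2022, 4, 'A', 10), (2023, 1, 'A', 11), (2023, 2, 'A', 12);
INSERT INTO prices_table VALUES
    (2022, 3, 'A', 1.0), (2022, 4, 'A', 1.5);
""")

query = """
WITH latest AS (      -- last period that has prices, encoded as year * 10 + quarter
    SELECT MAX(year * 10 + quarter) AS period FROM prices_table
),
missing AS (          -- volume periods that lie beyond the last priced period
    SELECT DISTINCT v.year, v.quarter
    FROM volume_table v, latest l
    WHERE v.year * 10 + v.quarter > l.period
)
SELECT year, quarter, SKU, price FROM prices_table
UNION ALL
-- Copy the latest prices into every missing period; empty when nothing is missing.
SELECT m.year, m.quarter, p.SKU, p.price
FROM missing m
CROSS JOIN prices_table p
JOIN latest l ON p.year * 10 + p.quarter = l.period
ORDER BY year, quarter, SKU
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

When prices_table already covers every period in volume_table, the missing set is empty and the query returns prices_table unchanged.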

Query to find SUM based on week

I have a table with the columns date, Sales, and Region:
date       | Sales | Region
---------------------------
11/02/2021 | 20    | 1
12/02/2021 | 23    | 1
13/02/2021 | 30    | 2
14/02/2021 | 50    | 2
15/02/2021 | 10    | 3
16/02/2021 | 10    | 3
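How can I extract the sum of sales per region by week (weeks running from Monday to Sunday)?
You need to compute the week and group by it, together with the region.
This should work for you (SQL Server syntax):
SET DATEFIRST 1;  -- treat Monday as the first day of the week
SELECT DATEPART(week, [date]) AS Week,
       Region,
       SUM(Sales) AS TotalSales
FROM [table]
GROUP BY DATEPART(week, [date]), Region;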
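As a cross-check of the Monday-to-Sunday grouping on the sample rows, here is a small sketch that runs on SQLite through Python's sqlite3 module instead of SQL Server, so DATEPART is replaced by SQLite's date modifiers. The table name sales and the column name sale_date are assumptions of this example, and the dates are rewritten in ISO yyyy-mm-dd form because SQLite's date functions expect that.

```python
import sqlite3

# Sample rows from the question, dates converted from dd/mm/yyyy to ISO format
rows = [
    ("2021-02-11", 20, 1),
    ("2021-02-12", 23, 1),
    ("2021-02-13", 30, 2),
    ("2021-02-14", 50, 2),
    ("2021-02-15", 10, 3),
    ("2021-02-16", 10, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, Sales INT, Region INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

query = """
-- 'weekday 0' moves forward to the Sunday that ends the week (or stays on a Sunday);
-- going back 6 days from there gives the Monday that starts the same week.
SELECT date(sale_date, 'weekday 0', '-6 days') AS week_start,
       Region,
       SUM(Sales) AS total_sales
FROM sales
GROUP BY week_start, Region
ORDER BY week_start, Region
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Grouping by the Monday date rather than a week number also keeps weeks from different years apart.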

Running Total by Year in SQL

I have a table broken out into a series of numbers by year, and need to build a running total column but restart during the next year.
The desired outcome is below
Amount | Year | Running Total
-----------------------------
1 2000 1
5 2000 6
10 2000 16
5 2001 5
10 2001 15
3 2001 18
I can do an ORDER BY to get a standard running total, but can't figure out how to base it just on the year such that it does the running total for each unique year.
SQL tables represent unordered sets. You need a column to specify the ordering. Once you have this, it is a simple cumulative sum:
select amount, year, sum(amount) over (partition by year order by <ordering column>)
from t;
Without a column that specifies ordering, "cumulative sum" does not make sense on a table in SQL.
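To see the per-year restart on the numbers from the question, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later for window functions). The id column is an assumption added purely to supply the ordering that the answer says is required; the table name t follows the answer.

```python
import sqlite3

# (id, amount, year): id is a hypothetical ordering column, not part of the question
rows = [
    (1, 1, 2000),
    (2, 5, 2000),
    (3, 10, 2000),
    (4, 5, 2001),
    (5, 10, 2001),
    (6, 3, 2001),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, amount INT, year INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
SELECT amount,
       year,
       -- PARTITION BY restarts the sum for each year; ORDER BY fixes the row order within it
       SUM(amount) OVER (PARTITION BY year ORDER BY id) AS running_total
FROM t
ORDER BY id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```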

Can I calculate an aggregate duration over multiple rows with a single row per day?

I'm creating an Absence Report for HR. The Absence Data is stored in the database as a single row per day (the columns are EmployeeId, Absence Date, Duration). So if I'm off work from Tuesday 11 February 2020 to Friday 21 February 2020 inclusive, there will be 9 rows in the table:
11 February 2020 - 1 day
12 February 2020 - 1 day
13 February 2020 - 1 day
14 February 2020 - 1 day
17 February 2020 - 1 day
18 February 2020 - 1 day
19 February 2020 - 1 day
20 February 2020 - 1 day
21 February 2020 - 1 day
(see screenshot below)
HR would like to see a single entry in the report for a contiguous period of absence:
My question is: without using a cursor, how can I calculate this in SQL? (It is even more complicated because I have to do this using LINQ to SQL, but I might be able to swap this out for a stored procedure.) Note that the criterion for contiguous data is adjacent working days, EXCLUDING weekends and bank holidays. I hope I've made myself clear; apologies if not.
This is a form of gaps-and-islands. In this case, use lag() to see if two vacations overlap and then a cumulative sum:
select employee, min(absent_from), max(absent_to)
from (select t.*,
             sum(case when prev_absent_to = dateadd(day, -1, absent_from) then 0 else 1
                 end) over (partition by employee order by absent_to) as grp
      from (select t.*,
                   lag(absent_to) over (partition by employee order by absent_from) as prev_absent_to
            from t
           ) t
     ) t
group by employee, grp;
If you need to deal with holidays and weekends, then you need a calendar table.
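A calendar table can be sketched as follows for the one-row-per-day layout in the question. The idea, which is one possible variation rather than the answer's exact query, is to number the working days consecutively and subtract each absence's row number: rows in the same contiguous run share the same difference. Everything named here (the absence and calendar tables, their columns, the empty holiday set, the extra absence on 26 February) is an assumption for illustration, and the code runs on SQLite via Python's sqlite3 module.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absence (employee_id INT, absence_date TEXT, duration REAL)")
conn.execute("CREATE TABLE calendar (work_date TEXT PRIMARY KEY, work_day_no INT)")

# Hypothetical calendar: number every working day of February 2020 consecutively.
# Weekends are skipped; bank holidays would be skipped in the same way.
holidays = set()  # assumption: no bank holidays in this period
day, n = date(2020, 2, 1), 0
while day <= date(2020, 2, 29):
    if day.weekday() < 5 and day not in holidays:
        n += 1
        conn.execute("INSERT INTO calendar VALUES (?, ?)", (day.isoformat(), n))
    day += timedelta(days=1)

# Absences from the question (11-14 and 17-21 Feb) plus a made-up separate day on 26 Feb.
days_off = (11, 12, 13, 14, 17, 18, 19, 20, 21, 26)
conn.executemany(
    "INSERT INTO absence VALUES (1, ?, 1)",
    [("2020-02-%02d" % d,) for d in days_off],
)

query = """
SELECT employee_id,
       MIN(absence_date) AS absent_from,
       MAX(absence_date) AS absent_to,
       SUM(duration)     AS days
FROM (
    SELECT a.*,
           -- consecutive working days keep a constant (work_day_no - row_number)
           c.work_day_no
             - ROW_NUMBER() OVER (PARTITION BY a.employee_id ORDER BY a.absence_date) AS grp
    FROM absence a
    JOIN calendar c ON c.work_date = a.absence_date
)
GROUP BY employee_id, grp
ORDER BY absent_from
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the weekend of 15-16 February has no working-day number, the Friday and the following Monday get adjacent numbers and fall into the same group.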