SQL Grouping by year gives incorrect results - sql

I am trying to summarize sales data by month, sales region and type. The problem is that the results change when I try to group by year.
My simplified query is as follows:
SELECT
DAB700.DATUM,DAB000.X_REGION,DAB700.BELEG_ART, // the date, sales region, order type
// calculate the number of orders per month
COUNT (DISTINCT CASE WHEN MONTH(DAB700.DATUM) = 1 THEN DAB700.BELEG_NR END) as jan,
COUNT (DISTINCT CASE WHEN MONTH(DAB700.DATUM) = 2 THEN DAB700.BELEG_NR END) as feb,
COUNT (DISTINCT CASE WHEN MONTH(DAB700.DATUM) = 3 THEN DAB700.BELEG_NR END) as mar
FROM "DAB700.ADT" DAB700
left join "DAB050.ADT" DAB050 on DAB700.BELEG_NR = DAB050.ANUMMER // join to table 050, to pull in order info
left join "DF030000.DBF" DAB000 on DAB050.KDNR = DAB000.KDNR // join table 000 to table 050, to pull in customer info
left join "DAB055.ADT" DAB055 on DAB050.ANUMMER = left (DAB055.APNUMMER,6)// join table 055 to table 050, to pull in product info
WHERE (DAB700.BELEG_ART = 10 OR DAB700.BELEG_ART = 20) AND (DAB700.DATUM>={d '2021-01-01'}) AND (DAB700.DATUM<={d '2021-01-11'}) AND DAB055.ARTNR <> '999999' AND DAB055.ARTNR <> '999996' AND DAB055.TERMIN <> 'KW.22.22' AND DAB055.TERMIN <> 'KW.99.99' AND DAB050.AUF_ART = 0
group by DAB700.DATUM,DAB000.X_REGION,DAB700.BELEG_ART
This returns the following data, which is correct (manually checked):
| DATUM | X_REGION | BELEG_ART | jan | feb | mar |
|------------|----------|-----------|-----|-----|-----|
| 04.01.2021 | 1 | 10 | 3 | 0 | 0 |
| 04.01.2021 | 3 | 10 | 2 | 0 | 0 |
| 04.01.2021 | 4 | 10 | 1 | 0 | 0 |
| 04.01.2021 | 4 | 20 | 1 | 0 | 0 |
| 04.01.2021 | 6 | 20 | 2 | 0 | 0 |
| 05.01.2021 | 1 | 10 | 1 | 0 | 0 |
and so on....
The total number of records for Jan is 117 (correct).
Now I want to summarize the data into one row per group (for example, data grouped by region and type),
so I changed my code so that I have:
SELECT
YEAR(DAB700.DATUM),
and
group by YEAR(DAB700.DATUM)
the rest of the code stays the same.
Now my results are:
| EXPR | X_REGION | BELEG_ART | jan | feb | mar |
|------|----------|-----------|-----|-----|-----|
| 2021 | 1 | 10 | 16 | 0 | 0 |
| 2021 | 1 | 20 | 16 | 0 | 0 |
| 2021 | 2 | 10 | 19 | 0 | 0 |
| 2021 | 2 | 20 | 22 | 0 | 0 |
| 2021 | 3 | 10 | 12 | 0 | 0 |
| 2021 | 3 | 20 | 6 | 0 | 0 |
Visually it looks correct, but the total count for January is now 116, a difference of 1. What am I doing wrong?
How can I keep the results from the first query but have them presented like the second set?

You count distinct BELEG_NR. This is what makes the difference. Let's look at an example. Let's say your table contains four rows:
| DATUM | X_REGION | BELEG_ART | BELEG_NR |
|------------|----------|-----------|----------|
| 04.01.2021 | 1 | 10 | 100 |
| 04.01.2021 | 1 | 10 | 200 |
| 05.01.2021 | 1 | 10 | 100 |
| 05.01.2021 | 1 | 10 | 300 |
That gives you per day, region and belegart:
| DATUM | X_REGION | BELEG_ART | DISTINCT COUNT BELEG_NR |
|------------|----------|-----------|-------------------------|
| 04.01.2021 | 1 | 10 | 2 |
| 05.01.2021 | 1 | 10 | 2 |
and per year, region and belegart:
| YEAR | X_REGION | BELEG_ART | DISTINCT COUNT BELEG_NR |
|------|----------|-----------|-------------------------|
| 2021 | 1 | 10 | 3 |
The BELEG_NR 100 never appears more than once per day, so every instance gets counted. But it appears twice for the year, so it gets counted once instead of twice.
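The effect can be reproduced with a minimal sketch using Python's built-in sqlite3 module. The table name `orders`, the ISO date strings and the use of `strftime('%Y', ...)` in place of `YEAR()` are assumptions made for the illustration, not part of the original schema. The last query shows one possible way to keep the daily totals in a yearly row: count per day in a derived table first, then sum those counts.

```python
import sqlite3

# In-memory table holding the four example rows (hypothetical table name, ISO dates).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (datum TEXT, x_region INT, beleg_art INT, beleg_nr INT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("2021-01-04", 1, 10, 100),
        ("2021-01-04", 1, 10, 200),
        ("2021-01-05", 1, 10, 100),
        ("2021-01-05", 1, 10, 300),
    ],
)

# Distinct count per day: BELEG_NR 100 is counted once on each of the two days.
per_day = con.execute(
    """
    SELECT datum, x_region, beleg_art, COUNT(DISTINCT beleg_nr)
    FROM orders
    GROUP BY datum, x_region, beleg_art
    ORDER BY datum
    """
).fetchall()

# Distinct count per year: BELEG_NR 100 is counted only once for the whole year.
per_year = con.execute(
    """
    SELECT strftime('%Y', datum) AS yr, x_region, beleg_art, COUNT(DISTINCT beleg_nr)
    FROM orders
    GROUP BY yr, x_region, beleg_art
    """
).fetchall()

# Yearly row that preserves the daily totals: count per day, then sum the counts.
summed = con.execute(
    """
    SELECT strftime('%Y', datum) AS yr, x_region, beleg_art, SUM(n)
    FROM (
        SELECT datum, x_region, beleg_art, COUNT(DISTINCT beleg_nr) AS n
        FROM orders
        GROUP BY datum, x_region, beleg_art
    ) AS per_day
    GROUP BY yr, x_region, beleg_art
    """
).fetchall()

print(per_day)   # [('2021-01-04', 1, 10, 2), ('2021-01-05', 1, 10, 2)]
print(per_year)  # [('2021', 1, 10, 3)]
print(summed)    # [('2021', 1, 10, 4)]
```

Which of the two yearly numbers is "correct" depends on whether an order that shows up on two days should count once or twice.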

Related

Sum case from previous month

I couldn't find the answer to this on here or on google.
This is part of the main table
+---+------+-----------------+---------------+
|   | Acct | Last_trans_date | Last_transpay |
+---+------+-----------------+---------------+
| 1 | ABC  | July 31         | Nov 5         |
| 2 | DEF  | Mar 1           | Aug 8         |
| 3 | GFH  | Mar 9           | Feb 7         |
+---+------+-----------------+---------------+
I want a count of the accounts whose Last_trans_date and Last_transpay both fall in the previous month.
I used this
Select
year(open),
sum(case when month(last_trans_date) = month(current_date - 1) and month(last_transpay) = month(current_date - 1) then 1 else 0 end) as activity
from table
group by 1
I don't think it's outputting the correct amount.
SELECT Count(*)
FROM [table]
WHERE
CHARINDEX(#PrevMonth, Last_trans_date) = 1
AND CHARINDEX(#PrevMonth, Last_transpay) = 1
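As a hedged illustration of the "both dates in the previous month" condition, here is a small sketch with Python's sqlite3 module. It assumes the dates are stored as ISO strings with a year (the sample table above shows no years, so the rows below are invented), and it uses a hypothetical table name `accounts` and a fixed "today" so the result is reproducible. Note that subtracting 1 from the current date usually steps back one day rather than one month, and that comparing only the month number would also match the same month in other years, so this sketch compares year and month together.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (acct TEXT, last_trans_date TEXT, last_transpay TEXT)")
con.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        ("ABC", "2021-07-31", "2021-07-05"),  # both in July: counted
        ("DEF", "2021-07-01", "2021-08-08"),  # payment in August: not counted
        ("GFH", "2021-03-09", "2021-02-07"),  # neither in July: not counted
    ],
)

# Hypothetical "current date", fixed so the example always gives the same answer.
today = "2021-08-15"

# 'start of month' followed by '-1 month' lands on the first day of the previous
# month; formatting both sides as YYYY-MM compares year and month together.
activity = con.execute(
    """
    SELECT COUNT(*)
    FROM accounts
    WHERE strftime('%Y-%m', last_trans_date) = strftime('%Y-%m', ?, 'start of month', '-1 month')
      AND strftime('%Y-%m', last_transpay)   = strftime('%Y-%m', ?, 'start of month', '-1 month')
    """,
    (today, today),
).fetchone()[0]

print(activity)  # 1
```

The date functions differ between databases, so the `strftime` calls would need to be replaced with the equivalent of the target system.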

SQL Query to Display Daily Count Results in Columns from 1st to last day of month

I need a query that displays the daily count of each item bought by customers, with one column per day from the first to the last day of the month.
Sample data, table "Item":
+---------------+-----------+-----------+-------+
| Purchase Date | Item Code | Item Name | Price |
+---------------+-----------+-----------+-------+
| 01-JAN-20     | 11        | Apple     | 1     |
| 01-JAN-20     | 11        | Apple     | 1     |
| 02-JAN-20     | 12        | Orange    | 2     |
| 02-JAN-20     | 11        | Apple     | 1     |
| 03-JAN-20     | 12        | Orange    | 2     |
| 03-JAN-20     | 12        | Orange    | 2     |
| 04-JAN-20     | 12        | Orange    | 2     |
| 04-JAN-20     | 11        | Apple     | 1     |
+---------------+-----------+-----------+-------+
The query should compute the daily count per item code and display the result as in the table below.
Each day should appear as its own column: for example, if today is the 4th of January, tomorrow's count should create a new column, and so on until the last day of the month (or something similar).
+--------+--------+--------+--------+--------+-----+
| Items  | Jan 01 | Jan 02 | Jan 03 | Jan 04 | etc |
+--------+--------+--------+--------+--------+-----+
| Apple  | 2      | 1      | 2      | 1      |     |
| Orange | 0      | 1      | 0      | 1      |     |
+--------+--------+--------+--------+--------+-----+
If you know what dates you want, you can use conditional aggregation:
select item,
sum(case when purchase_date = '2020-01-01' then 1 else 0 end) as jan_1,
sum(case when purchase_date = '2020-01-02' then 1 else 0 end) as jan_2,
sum(case when purchase_date = '2020-01-03' then 1 else 0 end) as jan_3,
sum(case when purchase_date = '2020-01-04' then 1 else 0 end) as jan_4,
. . .
from items
group by item;
Note that this assumes that purchase_date is really stored in an internal date format, so the comparison is against a date constant. The syntax for date constants might differ among databases.
If you do not have a specific set of dates in mind, then you will need to use dynamic SQL.
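A rough sketch of the dynamic SQL approach, using Python's sqlite3 module: first read the distinct dates, then generate one `SUM(CASE ...)` column per date and run the generated statement. The table and column names follow the answer's query, the ISO date strings are an assumption, and the aliases `d0`, `d1`, ... are hypothetical placeholders for the per-day column names.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (purchase_date TEXT, item TEXT)")
con.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("2020-01-01", "Apple"), ("2020-01-01", "Apple"),
        ("2020-01-02", "Orange"), ("2020-01-02", "Apple"),
        ("2020-01-03", "Orange"), ("2020-01-03", "Orange"),
        ("2020-01-04", "Orange"), ("2020-01-04", "Apple"),
    ],
)

# Step 1: find the dates that should become columns.
dates = [row[0] for row in con.execute(
    "SELECT DISTINCT purchase_date FROM items ORDER BY purchase_date"
)]

# Step 2: build one SUM(CASE ...) column per date. The date values are passed as
# bound parameters; only the generated aliases (d0, d1, ...) enter the SQL text.
columns = ", ".join(
    f"SUM(CASE WHEN purchase_date = ? THEN 1 ELSE 0 END) AS d{i}"
    for i in range(len(dates))
)
query = f"SELECT item, {columns} FROM items GROUP BY item ORDER BY item"

# Step 3: run the generated statement with the dates as parameters.
rows = con.execute(query, dates).fetchall()
for row in rows:
    print(row)
```

Binding the dates as parameters instead of pasting them into the SQL string keeps the generated statement safe even if the values come from user input.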

Display values from different month side by side?

I have a requirement to display data according to month side by side.
Below are the records in the table:
| Student ID | Type | Date | Amount($) |
|------------|--------|------------|-----------|
| 00000001 | Foods | 01/01/2009 | 10 |
| 00000001 | Foods | 01/02/2009 | 20 |
| 00000002 | Drinks | 01/01/2009 | 10 |
| 00000003 | Snacks | 01/02/2009 | 10 |
| 00000003 | Drinks | 01/02/2009 | 10 |
The expected results are like below:
| Student ID | Type | Jan | Feb |
|------------|--------|-----|-----|
| 00000001 | Foods | 10 | 20 |
| 00000002 | Drinks | 10 | 0 |
| 00000003 | Snacks | 0 | 10 |
| 00000003 | Drinks | 0 | 10 |
The amount of type Foods for Student ID 00000001 is displayed by month, side by side in one row.
I tried to achieve this with a CASE expression like the one below, but it does not produce the expected results.
SELECT STUDENT_ID
, TYPE
, CASE WHEN MONTH(DATE)=1 THEN AMOUNT ELSE 0 END AS JAN
, CASE WHEN MONTH(DATE)=2 THEN AMOUNT ELSE 0 END AS FEB
FROM CANTEEN_SPENT
But the results are like below:
| Student ID | Type | Jan | Feb |
|------------|--------|-----|-----|
| 00000001 | Foods | 10 | 0 |
| 00000001 | Foods | 0 | 20 |
| 00000002 | Drinks | 10 | 0 |
| 00000003 | Snacks | 0 | 10 |
| 00000003 | Drinks | 0 | 10 |
The data for Student ID 00000001 should be merged together as shown in the expected example.
Try conditional aggregation:
SELECT STUDENT_ID
, TYPE
, SUM(CASE WHEN MONTH(DATE) = 1 THEN AMOUNT ELSE 0 END) AS JAN
, SUM(CASE WHEN MONTH(DATE) = 2 THEN AMOUNT ELSE 0 END) AS FEB
FROM CANTEEN_SPENT
GROUP BY STUDENT_ID, TYPE
ORDER BY STUDENT_ID
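To see the conditional aggregation merge the two rows for student 00000001, here is a small sketch with Python's sqlite3 module. It assumes the dates are day/month/year and stores them as ISO strings; since SQLite has no `MONTH()` function, `strftime('%m', ...)` stands in for it. The `ELSE 0` makes months without any spending show 0 instead of NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE canteen_spent (student_id TEXT, type TEXT, date TEXT, amount INT)")
con.executemany(
    "INSERT INTO canteen_spent VALUES (?, ?, ?, ?)",
    [
        ("00000001", "Foods",  "2009-01-01", 10),
        ("00000001", "Foods",  "2009-02-01", 20),
        ("00000002", "Drinks", "2009-01-01", 10),
        ("00000003", "Snacks", "2009-02-01", 10),
        ("00000003", "Drinks", "2009-02-01", 10),
    ],
)

# One output row per (student, type); each month becomes its own summed column.
rows = con.execute(
    """
    SELECT student_id, type,
           SUM(CASE WHEN strftime('%m', date) = '01' THEN amount ELSE 0 END) AS jan,
           SUM(CASE WHEN strftime('%m', date) = '02' THEN amount ELSE 0 END) AS feb
    FROM canteen_spent
    GROUP BY student_id, type
    ORDER BY student_id, type
    """
).fetchall()

for row in rows:
    print(row)
# ('00000001', 'Foods', 10, 20)
# ('00000002', 'Drinks', 10, 0)
# ('00000003', 'Drinks', 0, 10)
# ('00000003', 'Snacks', 0, 10)
```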

SQL - Adding an avg column to a detail table

I'm on Teradata. I have an order table like the below.
custID | orderID | month | order_amount
-----------------------------------------
1 | 1 | jan | 10
1 | 2 | jan | 20
1 | 3 | feb | 5
1 | 4 | feb | 7
2 | 5 | mar | 20
2 | 6 | apr | 30
I'd like to add a column to the above table called "Avg order amount per month per customer". Since the table is at an order level, adding this column will cause duplicates like the below, which is ok.
custID | orderID | month | order_amount | avgOrdAmtperMonth
-------------------------------------------------------------
1 | 1 | jan | 10 | 15
1 | 2 | jan | 20 | 15
1 | 3 | feb | 5 | 6
1 | 4 | feb | 7 | 6
2 | 5 | mar | 20 | 20
2 | 6 | apr | 30 | 30
I want the output to have all the columns above, not just the custID and the new column. I'm not sure how to write this because the table is at an order level while the new column needs to be grouped by customer and month. How would I do this?
This is a simple group average:
AVG(order_amount) OVER (PARTITION BY custID, month)
Why not just do the calculation when you query the table?
select t.*,
avg(order_amount) over (partition by custId, month) as avgOrderAmtPerMonth
from t;
You can add this into a view if you want to make it available to multiple downstream queries.
Actually adding the column to the table is a maintenance "nightmare". You have to add triggers to the table and update the value for updates, inserts, and deletes.
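The windowed average and the view suggestion can be tried out with Python's sqlite3 module, as a sketch rather than Teradata-specific code. It needs an SQLite build with window function support (3.25 or later), and the view name `orders_with_avg` is a hypothetical name chosen for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (custID INT, orderID INT, month TEXT, order_amount INT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 1, "jan", 10),
        (1, 2, "jan", 20),
        (1, 3, "feb", 5),
        (1, 4, "feb", 7),
        (2, 5, "mar", 20),
        (2, 6, "apr", 30),
    ],
)

# The view keeps every order row and adds the per-customer, per-month average,
# so nothing has to be stored or kept in sync in the base table.
con.execute(
    """
    CREATE VIEW orders_with_avg AS
    SELECT t.*,
           AVG(order_amount) OVER (PARTITION BY custID, month) AS avgOrdAmtperMonth
    FROM t
    """
)

rows = con.execute("SELECT * FROM orders_with_avg ORDER BY orderID").fetchall()
for row in rows:
    print(row)
# (1, 1, 'jan', 10, 15.0)
# (1, 2, 'jan', 20, 15.0)
# (1, 3, 'feb', 5, 6.0)
# (1, 4, 'feb', 7, 6.0)
# (2, 5, 'mar', 20, 20.0)
# (2, 6, 'apr', 30, 30.0)
```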

Weekly Average Reports: Redshift

My sales data for the first two weeks of June, for the Mondays (1st Jun and 8th Jun), are below:
date | count
2015-06-01 03:25:53 | 1
2015-06-01 03:28:51 | 1
2015-06-01 03:49:16 | 1
2015-06-01 04:54:14 | 1
2015-06-01 08:46:15 | 1
2015-06-01 13:14:09 | 1
2015-06-01 16:20:13 | 5
2015-06-01 16:22:13 | 1
2015-06-01 16:27:07 | 1
2015-06-01 16:29:57 | 1
2015-06-01 19:16:45 | 1
2015-06-08 10:54:46 | 1
2015-06-08 15:12:10 | 1
2015-06-08 20:35:40 | 1
I need to find the weekly average of sales in a given range.
Complex Query:
(some_manipulation_part), ifact as
( select date, sales_count from final_result_set
) select date_part('h', date) as h,
date_part('dow', date) as day_of_week,
count(sales_count)
from final_result_set
group by h, day_of_week
Output :
h | day_of_week | count
3 | 1 | 3
4 | 1 | 1
8 | 1 | 1
10 | 1 | 1
13 | 1 | 1
15 | 1 | 1
16 | 1 | 8
19 | 1 | 1
20 | 1 | 1
If I apply avg to the above result, it does not return the correct answer:
(some_manipulation_part), ifact as
( select date, sales_count from final_result_set
) select date_part('h', date) as h,
date_part('dow', date) as day_of_week,
avg(sales_count)
from final_result_set
group by h, day_of_week
h | day_of_week | count
3 | 1 | 1
4 | 1 | 1
8 | 1 | 1
10 | 1 | 1
13 | 1 | 1
15 | 1 | 1
16 | 1 | 1
19 | 1 | 1
20 | 1 | 1
I have two Mondays in the given range, but the result is not being divided by two. I am not even sure what is happening inside Redshift.
To get "weekly averages" use date_trunc():
SELECT date_trunc('week', my_date_column) as week
, avg(sales_count) AS avg_sales
FROM final_result_set
GROUP BY 1;
I hope you are not actually using date as the name of your date column. It's a reserved word in SQL and a basic type name, so don't use it as an identifier.
If you group by the day of week (DOW), you get averages per weekday, where Sunday is 0. (Use ISODOW to get 7 for Sunday.)
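The difference between "average per row within a week" and "average of the weekly totals" can be checked with a sketch in Python's sqlite3 module. SQLite has no `date_trunc()`, so the expression `date(sold_at, 'weekday 0', '-6 days')` is used here as a stand-in that maps each timestamp to the Monday of its week; the table name `sales` and column name `sold_at` are assumptions for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sold_at TEXT, sales_count INT)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2015-06-01 03:25:53", 1), ("2015-06-01 03:28:51", 1),
        ("2015-06-01 03:49:16", 1), ("2015-06-01 04:54:14", 1),
        ("2015-06-01 08:46:15", 1), ("2015-06-01 13:14:09", 1),
        ("2015-06-01 16:20:13", 5), ("2015-06-01 16:22:13", 1),
        ("2015-06-01 16:27:07", 1), ("2015-06-01 16:29:57", 1),
        ("2015-06-01 19:16:45", 1), ("2015-06-08 10:54:46", 1),
        ("2015-06-08 15:12:10", 1), ("2015-06-08 20:35:40", 1),
    ],
)

# Per week: total sales and the average of the individual rows in that week.
# 'weekday 0' moves forward to Sunday, '-6 days' steps back to that week's Monday.
per_week = con.execute(
    """
    SELECT date(sold_at, 'weekday 0', '-6 days') AS week,
           SUM(sales_count) AS total_sales,
           AVG(sales_count) AS avg_per_row
    FROM sales
    GROUP BY week
    ORDER BY week
    """
).fetchall()

# Average of the weekly totals: aggregate per week first, then average the totals.
avg_weekly_total = con.execute(
    """
    SELECT AVG(total_sales)
    FROM (
        SELECT SUM(sales_count) AS total_sales
        FROM sales
        GROUP BY date(sold_at, 'weekday 0', '-6 days')
    ) AS weekly
    """
).fetchone()[0]

print(per_week)          # weeks starting 2015-06-01 (total 15) and 2015-06-08 (total 3)
print(avg_weekly_total)  # 9.0
```

AVG only divides by the number of rows inside each group, so dividing by the number of weeks requires the two-step aggregation shown in the second query.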