SUM with GROUP BY doesn't display zero sums - sql

I'm trying to get data from a single table, grouped by CURR. There are 12 CURR values, but some of them have a sum of zero.
SELECT ISNULL(SUM(AMOUNT), 0) AS TOPL,
CURR
FROM XXX
WHERE DATEPART(year, CONVERT(dateTime, DATE)) = 2018
AND DATEPART(month, CONVERT(dateTime, DATE)) = 1
GROUP BY CURR
This returns 3 values, but I want 12 values, including the zero sums. I tried this with CASE, but could not get it to work.
Thanks...

What the others have been trying to tell you is that you need a "list" of the CURR values you want to see in your results. Generally, these would come from another table in a properly normalized database. Do you have one? It seems not but it is worth asking. A properly normalized database would generally have one.
So how do you create this list dynamically? Let us assume that your existing table has at least one row for every CURR value you desire in your resultset - even if that row has a date that falls outside of your period of interest. We can use that to form this list and then outer join that list to your existing query that does the summing.
with curr_list as (select distinct CURR from dbo.XXX)
select curr_list.CURR,
sum(isnull(tbl.AMOUNT, 0)) as TOPL
from curr_list left join dbo.XXX as tbl
on curr_list.CURR = tbl.CURR
and tbl.[DATE] >= '20180101'
and tbl.[DATE] < '20180201'
group by curr_list.CURR
order by curr_list.CURR;
That should work assuming I made no typos. The CTE (named curr_list) creates the list of CURR values that you want to see in your resultset. When you outer join that to your transaction data, you will get at least one row for each CURR value. You then sum the amounts to aggregate those rows into a single row for each CURR value. Notice the change to the date criteria: your original approach wraps the column in functions, which prevents the optimizer from using any useful indexes on that column.
If your existing table does not have a row for every value of CURR you want in your results, then you can simply change the cte and hardcode the values you desire.
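The hardcoded variant mentioned above could be sketched as follows. This is only an illustration: the table variable @XXX, its column types, and the four currency codes are assumptions made so the example runs on its own; substitute your real table and your 12 values.

```sql
-- Hypothetical stand-in for dbo.XXX so the sketch runs on its own.
DECLARE @XXX TABLE (CURR varchar(3), AMOUNT decimal(18, 2), [DATE] date);
INSERT INTO @XXX (CURR, AMOUNT, [DATE])
VALUES ('USD', 100.00, '20180105'),
       ('USD',  50.00, '20180120'),
       ('EUR',  75.00, '20180110'),
       ('GBP',  20.00, '20171215');  -- falls outside January 2018

-- Hardcoded list of the currencies that must appear in the output.
WITH curr_list AS (
    SELECT v.CURR
    FROM (VALUES ('USD'), ('EUR'), ('GBP'), ('JPY')) AS v(CURR)
)
SELECT curr_list.CURR,
       ISNULL(SUM(tbl.AMOUNT), 0) AS TOPL
FROM curr_list
LEFT JOIN @XXX AS tbl
       ON curr_list.CURR = tbl.CURR
      AND tbl.[DATE] >= '20180101'   -- date filter stays in the ON clause
      AND tbl.[DATE] <  '20180201'
GROUP BY curr_list.CURR
ORDER BY curr_list.CURR;
```

With this sample data, GBP and JPY still get a row with a TOPL of 0, because the date filter is part of the join condition rather than a WHERE clause.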

I think I understand what you are trying to do: you have 12 CURR values (I'm guessing currencies?) and want the total transaction amount for January 2018, grouped by currency.
If that is the case, here is what you should try:
SELECT ISNULL(SUM(AMOUNT), 0) AS TOPL,
CURR_TABLE.CURR
FROM CURR_TABLE
LEFT JOIN XXX ON XXX.CURR = CURR_TABLE.CURR
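-- the date filter belongs to the join condition; in a WHERE clause it would discard the unmatched currencies
AND DATEPART(year, CONVERT(dateTime, XXX.DATE)) = 2018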
AND DATEPART(month, CONVERT(dateTime, XXX.DATE)) = 1
GROUP BY CURR_TABLE.CURR
That way you'll get every currency listed (even if no record of that currency is available for that month).
EDIT:
I don't like this kind of syntax when it can be avoided, but if you have no separate currency table you can build the list from the table itself:
SELECT ISNULL(SUM(AMOUNT), 0) AS TOPL,
CURR_TABLE.CURR
FROM (SELECT distinct CURR FROM XXX) CURR_TABLE
LEFT JOIN XXX ON XXX.CURR = CURR_TABLE.CURR
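-- the date filter belongs to the join condition; in a WHERE clause it would discard the unmatched currencies
AND DATEPART(year, CONVERT(dateTime, XXX.DATE)) = 2018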
AND DATEPART(month, CONVERT(dateTime, XXX.DATE)) = 1
GROUP BY CURR_TABLE.CURR

Please execute the query below and compare the result with your expected output.
SELECT CURR, SUM(ISNULL(AMOUNT,0)) AS TOPL
FROM XXX
WHERE DATEPART(year, CONVERT(dateTime, DATE)) = 2018
AND DATEPART(month, CONVERT(dateTime, DATE)) = 1
GROUP BY CURR

Related

SQL if statement for date range

Hi, I need help with the syntax for a condition: if today is after the 5th of the month, the current date should be used, but if today is between the 1st and the 5th, the previous month should be used instead. Can you help with this, please? Below is how my query is structured.
Select *
FROM table1 e
left join table2 d
on e.ENTITY_NBR = d.entity_nbr
and cast(getdate() as date) between MONTH_BEGIN_DATE and MONTH_END_DATE
Select *,
CASE WHEN day(GETDATE()) > 5 THEN GETDATE()
ELSE DATEADD(month,-1,getdate()) END as date
FROM table1 e
left join table2 d
on e.ENTITY_NBR = d.entity_nbr
and cast(getdate() as date) between MONTH_BEGIN_DATE and MONTH_END_DATE
Based on a vague description of your problem this is the best I can write.
If you simply want to include today's date (or the same date from last month if it's currently the 5th or earlier in the current month), then this can be done in your SELECT clause:
select
case
when datepart(day,getdate()) <= 5
then dateadd(month,-1,getdate())
else getdate()
end
If you want to actually use this date to compare to some field in your dataset, then you can include this same case expression in your WHERE clause.
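As a rough sketch of that idea, the CASE expression can be computed once into a variable and then compared against the month boundaries. The table variable @months and its two sample rows are assumptions standing in for whichever table holds MONTH_BEGIN_DATE and MONTH_END_DATE; EOMONTH requires SQL Server 2012 or later.

```sql
-- Reference date: today, or the same day last month if today is the 5th or earlier.
DECLARE @ref_date date =
    CASE WHEN DAY(GETDATE()) <= 5
         THEN DATEADD(month, -1, GETDATE())
         ELSE GETDATE()
    END;

-- Hypothetical month calendar holding the previous and the current month.
DECLARE @months TABLE (MONTH_BEGIN_DATE date, MONTH_END_DATE date);
INSERT INTO @months (MONTH_BEGIN_DATE, MONTH_END_DATE)
VALUES (DATEADD(day, 1, EOMONTH(GETDATE(), -2)), EOMONTH(GETDATE(), -1)),  -- previous month
       (DATEADD(day, 1, EOMONTH(GETDATE(), -1)), EOMONTH(GETDATE()));      -- current month

-- Returns exactly one row: the month that contains the reference date.
SELECT MONTH_BEGIN_DATE, MONTH_END_DATE
FROM @months
WHERE @ref_date BETWEEN MONTH_BEGIN_DATE AND MONTH_END_DATE;
```

In the original query, the same comparison would replace cast(getdate() as date) in the join condition.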
where the current date is retrieved if today is after the 5th of each month but if its between the 1st to the 5th then it should retrieve the month before this month.
Based on this description, you want something like this:
select *
from table1 e left join
table2 d
on e.ENTITY_NBR = d.entity_nbr and
(day(getdate()) > 5 and datediff(month, d.date_col, getdate()) = 0 or
day(getdate()) <= 5 and datediff(month, d.date_col, getdate()) = 1
)

Running Total - Create row for months that don't have any sales in the region (1 row for each region in each month)

I am working on the below query that I will use inside Tableau to create a line chart that will be color-coded by year and will use the region as a filter for the user. The query works, but I found there are months in regions that don't have any sales. These sections break up the line chart and I am not able to fill in the missing spaces (I am using a non-date dimension on the X-Axis - Number of months until the end of its fiscal year).
I am looking for some help to alter my query so that it creates a row for every month and every region in my dataset, giving my running total a value to display in the line chart. If a month has no values in my table, its total should be 0 and the running total for the region should carry forward.
I have a dimDate table and also a Regions table I can use in the query.
My query's results now (sorted in Excel for easier viewing): [image: Results Table Now]
What I want to do, with the new rows highlighted in yellow: [image: What I want to do]
My Code using SQL Server:
SELECT b.gy,
b.sales_month,
b.region,
b.gs_year_total,
b.months_away,
Sum(b.gs_year_total)
OVER (
partition BY b.gy, b.region
ORDER BY b.months_away DESC) RT_by_Region_GY
FROM (SELECT a.gy,
a.region,
a.sales_month,
Sum(a.gy_total) Gs_Year_Total,
a.months_away
FROM (SELECT g.val_id,
g.[gs year] AS GY,
g.sales_month AS Sales_Month,
g.gy_total,
Datediff(month, g.sales_month, dt.lastdayofyear) AS months_away,
g.value_type,
val.region
FROM uv_sales g
JOIN dbo.dimdate AS dt
ON g.[gs year] = dt.gsyear
JOIN dimvalsummary val
ON g.val_id = val.val_id
WHERE g.[gs year] IN ( 2017, 2018, 2019, 2020, 2021 )
GROUP BY g.valuation_id,
g.[gs year],
val.region,
g.sales_month,
dt.lastdayofyear,
g.gy_total,
g.value_type) a
WHERE a.months_away >= 0
AND sales_month < Dateadd(month, -1, Getdate())
GROUP BY a.gy,
a.region,
a.sales_month,
a.months_away) b
It's tough to envision the best method to solve without data and the meaning of all those fields. Here's a rough sketch of how one might attempt to solve it. This is not complete or tested, sorry, but I'm not sure the meaning of all those fields and don't have data to test.
Create a table called all_months and insert all the months from oldest to whatever date in the future you need.
01/01/2017
02/01/2017
...
12/01/2049
May need one query per region and union them together. Select the year & month from that all_months table, and left join to your other table on month. Coalesce your dollar values.
select 'East' as region,
year(m.month) as gy_year,
m.month as sales_month,
coalesce(g.gy_total, 0) as gy_total,
datediff(month, m.month, dt.lastdayofyear) as months_away
from all_months m
left join uv_sales g on g.sales_month = m.month
--and so on
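Since the question mentions a Regions table, an alternative to one query per region is to cross join the months with the regions and left join the sales onto that grid. The sketch below is untested against the real schema: the three table variables and their columns are assumptions standing in for all_months, the Regions table, and the aggregated sales.

```sql
-- Hypothetical stand-ins for all_months, the Regions table, and the sales data.
DECLARE @all_months TABLE (sales_month date);
DECLARE @regions    TABLE (region varchar(20));
DECLARE @sales      TABLE (region varchar(20), sales_month date, gy_total decimal(18, 2));

INSERT INTO @all_months (sales_month) VALUES ('20210101'), ('20210201'), ('20210301');
INSERT INTO @regions (region) VALUES ('East'), ('West');
INSERT INTO @sales (region, sales_month, gy_total)
VALUES ('East', '20210101', 100), ('East', '20210301', 50), ('West', '20210201', 70);

SELECT r.region,
       m.sales_month,
       COALESCE(SUM(s.gy_total), 0) AS month_total,
       -- running total over the zero-filled monthly totals
       SUM(COALESCE(SUM(s.gy_total), 0))
           OVER (PARTITION BY r.region ORDER BY m.sales_month) AS running_total
FROM @all_months AS m
CROSS JOIN @regions AS r                 -- one row per month and region
LEFT JOIN @sales AS s
       ON s.region = r.region
      AND s.sales_month = m.sales_month
GROUP BY r.region, m.sales_month
ORDER BY r.region, m.sales_month;
```

Months without sales get a month_total of 0 and simply repeat the previous running total, which should keep the line chart continuous. Partitioning by fiscal year as well, as the original query does, would mean adding that column to both the grid and the PARTITION BY.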

Match between tables with New, Not New

I am still learning T-SQL at the moment and I'm new here, so forgive me if I've not done this right.
I have a table that each day loads new days data. Each day that loads has a report date for the previous day.
I want to get yesterday's data (e.g. 17/09/2019) from the table and compare it with the data in the same table from the day before that (e.g. 16/09/2019). I want to check the reference number: if the reference number appears on the day before, I want it to say Not New, and if it does not appear on the day before, I want it to say New.
The columns I have are:
ReferenceNumber, ReportData, NewAppt
The NewAppt column is where the New/Not New outcome should go.
Something like this should work:
WITH Yesterday AS (
SELECT DISTINCT
ReferenceNumber,
CONVERT(DATE, ReportDate) AS ReportDate
FROM
MyTable
WHERE
CONVERT(DATE, ReportDate) = CONVERT(DATE, DATEADD(DAY, -1, GETDATE()))
),
DayBeforeYesterday AS (
SELECT DISTINCT
ReferenceNumber
FROM
MyTable
WHERE
CONVERT(DATE, ReportDate) = CONVERT(DATE, DATEADD(DAY, -2, GETDATE()))
)
SELECT
y.ReferenceNumber,
y.ReportDate,
CASE
WHEN x.ReferenceNumber IS NOT NULL THEN 0
ELSE 1
END AS NewAppointment
FROM
Yesterday y
LEFT JOIN DayBeforeYesterday x ON x.ReferenceNumber = y.ReferenceNumber;
Make a list of all the DISTINCT reference numbers from each day, and then join them into one big list, with the logic to see if there was a reference number yesterday that was also in the list from the day before yesterday.
I suppose your column ReportData is some sort of 'date' type and contains only the date (no time).
Furthermore, for each date, there should be at most 1 record for a specific ReferenceNumber.
In that case, try this:
SELECT t1.ReferenceNumber,
t1.ReportData,
CASE
WHEN t2.ReferenceNumber IS NULL THEN 'New'
ELSE 'Not New'
END AS NewAppt
FROM my_table t1
LEFT OUTER JOIN my_table t2
ON t1.ReferenceNumber = t2.ReferenceNumber
AND t2.ReportData = DATEADD(day, -1, t1.ReportData)
WHERE t1.ReportData = '2019-09-17';
Here's an approach using LAG which removes the need to join to the same table several times and instead just checks the preceding row for that Reference Number.
Note that in my interpretation of your request if a Reference Number disappears for a day and then returns then it's flagged as new. You can adapt the query to simply check if the number has appeared at any point in the past if that's not what you need.
CREATE TABLE #TestData (ReferenceNumber int,Reportdata date)
INSERT INTO #TestData
VALUES (1,'2019-01-16'),(1,'2019-01-17'),(1,'2019-01-18'),(2,'2019-01-18'),(3,'2019-01-17'),(3,'2019-01-18'),(4,'2019-01-17')
SELECT
ReferenceNumber
,ReportData
,IIF(
LAG(ReportData) OVER(PARTITION BY ReferenceNumber ORDER BY ReportData)
= dateadd(day,-1,ReportData)
,'Not New'
,'New'
) AS NewAppt
FROM #TestData
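The adaptation mentioned in the LAG answer, flagging a reference number as New only on its first ever appearance, might look like the sketch below. It assumes at most one row per reference number per day; the table variable and its sample rows are made up for illustration.

```sql
-- Hypothetical sample data; reference number 1 skips a day (the 17th).
DECLARE @TestData TABLE (ReferenceNumber int, ReportData date);
INSERT INTO @TestData (ReferenceNumber, ReportData)
VALUES (1, '2019-01-16'), (1, '2019-01-18'),
       (2, '2019-01-18'),
       (3, '2019-01-17'), (3, '2019-01-18');

SELECT ReferenceNumber,
       ReportData,
       -- 'New' only on the earliest date this reference number was ever seen
       IIF(ReportData = MIN(ReportData) OVER (PARTITION BY ReferenceNumber),
           'New', 'Not New') AS NewAppt
FROM @TestData
ORDER BY ReferenceNumber, ReportData;
```

Here reference number 1 is Not New on 2019-01-18 even though it was absent on the 17th, whereas the LAG version would flag it as New again.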

Add one for every row that fulfills where criteria between period

I have a Postgres table that I'm trying to analyze based on some date columns.
I'm basically trying to count the number of rows in my table that fulfill this requirement, and then group them by month and year. Instead of my query looking like this:
SELECT * FROM $TABLE WHERE date1::date <= '2012-05-31'
and date2::date > '2012-05-31';
it should be able to display this for the months available in my data so that I don't have to change the months manually every time I add new data, and so I can get everything with one query.
In the case above I'd like it to group the sum of rows which fit the criteria into the year 2012 and month 05. Similarly, if my WHERE clause looked like this:
date1::date <= '2012-06-30' and date2::date > '2012-06-30'
I'd like it to group this sum into the year 2012 and month 06.
This isn't entirely clear to me:
I'd like it to group the sum of rows
I'll interpret it this way: you want to list all rows "per month" matching the criteria:
WITH x AS (
SELECT date_trunc('month', min(date1)) AS start
,date_trunc('month', max(date2)) + interval '1 month' AS stop
FROM tbl
)
SELECT to_char(y.mon, 'YYYY-MM') AS mon, t.*
FROM (
SELECT generate_series(x.start, x.stop, '1 month') AS mon
FROM x
) y
LEFT JOIN tbl t ON t.date1::date <= y.mon
AND t.date2::date > y.mon -- why the explicit cast to date?
ORDER BY y.mon, t.date1, t.date2;
Assuming date2 >= date1.
Compute the lower and upper border of the time period and truncate to the month (adding 1 month to the upper border to include the last row, too).
Use generate_series() to create the set of months in question.
LEFT JOIN rows from your table with the declared criteria and sort by month.
You could also GROUP BY at this stage to calculate aggregates.
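The GROUP BY variant could be sketched like this for Postgres. Two things here are assumptions rather than part of the answer above: the sample rows in the tbl CTE are invented, and the comparison uses the last day of each month (as in the question's WHERE clause) instead of the first.

```sql
-- Hypothetical sample data standing in for the real table.
WITH tbl(date1, date2) AS (
    VALUES (date '2012-04-10', date '2012-06-15'),
           (date '2012-05-20', date '2012-07-01'),
           (date '2012-06-05', date '2012-06-20')
),
bounds AS (
    SELECT date_trunc('month', min(date1)) AS first_month,
           date_trunc('month', max(date2)) AS last_month
    FROM tbl
),
months AS (
    -- Last day of every month in the covered range.
    SELECT (mon + interval '1 month' - interval '1 day')::date AS month_end
    FROM bounds,
         generate_series(bounds.first_month, bounds.last_month, interval '1 month') AS mon
)
SELECT to_char(m.month_end, 'YYYY-MM') AS mon,
       count(t.date1) AS matching_rows   -- counts 0 for months with no match
FROM months m
LEFT JOIN tbl t ON t.date1 <= m.month_end
               AND t.date2 >  m.month_end
GROUP BY m.month_end
ORDER BY m.month_end;
```

Counting t.date1 rather than * matters: an unmatched month still produces one row from the LEFT JOIN, and count(*) would report it as 1.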
Here is the reasoning. First, create a list of all possible dates. Then get the cumulative number of date1 values up to a given date. Then get the cumulative number of date2 values up to that same date and subtract it from the first count; what remains is the number of rows that have started but not yet stopped. The following query does this using correlated subqueries (not my favorite construct, but handy in this case):
select thedate,
(select count(*) from t where date1::date <= d.thedate) -
(select count(*) from t where date2::date <= d.thedate)
from (select distinct thedate
from ((select date1::date as thedate from t) union all
(select date2::date as thedate from t)
) d
) d
This is assuming that date2 occurs after date1. My model is start and stop dates of customers. If this isn't the case, the query might not work.
It sounds like you could benefit from the DATEPART function. Note that DATEPART is T-SQL; in Postgres the equivalents are date_part() and EXTRACT. If I understand you correctly, you could do something like this:
SELECT DATEPART(year, date1) Year, DATEPART(month, date1) Month, SUM(value_col)
FROM $Table
-- WHERE CLAUSE ?
GROUP BY DATEPART(year, date1),
DATEPART(month, date1)

SQL for counting events by date

I feel like I've seen this question asked before, but neither the SO search nor google is helping me... maybe I just don't know how to phrase the question. I need to count the number of events (in this case, logins) per day over a given time span so that I can make a graph of website usage. The query I have so far is this:
select
count(userid) as numlogins,
count(distinct userid) as numusers,
convert(varchar, entryts, 101) as date
from
usagelog
group by
convert(varchar, entryts, 101)
This does most of what I need (I get a row per date as the output containing the total number of logins and the number of unique users on that date). The problem is that if no one logs in on a given date, there will not be a row in the dataset for that date. I want it to add in rows indicating zero logins for those dates. There are two approaches I can think of for solving this, and neither strikes me as very elegant.
1. Add a column to the result set that lists the number of days between the start of the period and the date of the current row. When I'm building my chart output, I'll keep track of this value and if the next row is not equal to the current row plus one, insert zeros into the chart for each of the missing days.
2. Create a "date" table that has all the dates in the period of interest and outer join against it. Sadly, the system I'm working on already has a table for this purpose that contains a row for every date far into the future... I don't like that, and I'd prefer to avoid using it, especially since that table is intended for another module of the system and would thus introduce a dependency on what I'm developing currently.
Any better solutions or hints at better search terms for google? Thanks.
Frankly, I'd do this programmatically when building the final output. You're essentially trying to read something from the database which is not there (data for days that have no data). SQL isn't really meant for that sort of thing.
If you really want to do that, though, a "date" table seems your best option. To make it a bit nicer, you could generate it on the fly, using, e.g., your DB's date functions and a derived table.
I had to do exactly the same thing recently. This is how I did it in T-SQL (YMMV on speed, but I've found it performant enough over a couple of million rows of event data):
DECLARE @DaysTable TABLE ( [Year] INT, [Day] INT )
DECLARE @StartDate DATETIME
SET @StartDate = whatever -- placeholder: the first date you want in the output
WHILE (@StartDate <= GETDATE())
BEGIN
INSERT INTO @DaysTable ( [Year], [Day] )
SELECT DATEPART(YEAR, @StartDate), DATEPART(DAYOFYEAR, @StartDate)
SELECT @StartDate = DATEADD(DAY, 1, @StartDate)
END
-- This gives me a table of all days since whenever
-- (you could select @StartDate as the minimum date of your usage log)
SELECT days.[Year], days.[Day], ISNULL(events.NumEvents, 0) AS NumEvents -- 0 for days without events
FROM @DaysTable AS days
LEFT JOIN (
SELECT
COUNT(*) AS NumEvents,
DATEPART(YEAR, LogDate) AS [Year],
DATEPART(DAYOFYEAR, LogDate) AS [Day]
FROM LogData
GROUP BY
DATEPART(YEAR, LogDate),
DATEPART(DAYOFYEAR, LogDate)
) AS events ON days.[Year] = events.[Year] AND days.[Day] = events.[Day]
Create a memory table (a table variable) where you insert your date ranges, then outer join the logins table against it. Group by your start date, then you can perform your aggregations and calculations.
The strategy I normally use is to UNION with the opposite of the query, generally a query that retrieves data for rows that don't exist.
If I wanted to get the average mark for a course, but some courses weren't taken by any students, I'd need to UNION with those not taken by anyone to display a row for every class:
SELECT AVG(mark), course FROM `marks` GROUP BY course
UNION
SELECT NULL, course FROM courses WHERE course NOT IN
(SELECT course FROM marks)
Your query will be more complex, but the same principle should apply. You may indeed need a table of dates for your second query.
Option 1
You can create a temp table, insert the dates within the range, and do a left outer join with the usagelog table.
Option 2
You can programmatically insert the missing dates while evaluating the result set to produce the final output.
WITH q(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
qq(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
dates AS
(
SELECT q.n * 100 + qq.n AS ndate
FROM q, qq
)
SELECT COUNT(userid) as numlogins,
COUNT(DISTINCT userid) as numusers,
CAST('2000-01-01' AS DATETIME) + ndate as date
FROM dates
LEFT JOIN
usagelog
ON entryts >= CAST('2000-01-01' AS DATETIME) + ndate
AND entryts < CAST('2000-01-01' AS DATETIME) + ndate + 1
GROUP BY
ndate
This will select up to 10,000 dates constructed on the fly, which is enough for about 27 years.
By default, SQL Server allows at most 100 levels of recursion per CTE (OPTION (MAXRECURSION n) can raise that limit), which is why each inner query returns at most 100 rows.
If you need more than 10,000, just add a third CTE qqq(n) and cross-join with it in dates.
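A sketch of the three-way version follows. It is a variant, not the exact form described above: instead of declaring qq and qqq separately, it references the single recursive CTE q three times under different aliases, which yields the same 0 to 99 range for each factor.

```sql
WITH q(n) AS
(
    SELECT 0
    UNION ALL
    SELECT n + 1
    FROM q
    WHERE n < 99          -- 99 recursion levels, within the default limit of 100
),
dates AS
(
    -- 100 * 100 * 100 = 1,000,000 day offsets
    SELECT a.n * 10000 + b.n * 100 + c.n AS ndate
    FROM q AS a
    CROSS JOIN q AS b
    CROSS JOIN q AS c
)
SELECT COUNT(*)   AS num_days,
       MIN(ndate) AS first_offset,
       MAX(ndate) AS last_offset
FROM dates;
```

The final SELECT only checks the size of the generated range; in practice dates would be left-joined to usagelog exactly as in the query above.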