Forcing empty rows from query - sql

I have a table containing monthly statistics for clients.
Columns are CustNo, Year, Month, Trips
Some customers do not have any trips in some months and therefore there are combinations of CustNo, Year and Month that have no rows in that table.
I am trying to write a Query that shows 0 for those combinations of CustNo, Year and Month that have no trips, instead of producing an empty row.
To start with I have created a ValidPeriods table that has a Year and a Month column containing those periods that are valid.
I can then Query like this:
SELECT v.ValidYear, v.ValidMonth, tc.CustNo, tc.Trips
FROM ValidPeriods v
LEFT OUTER JOIN TempTrips AS tc ON v.ValidYear = tc.Year
AND v.ValidMonth = tc.Month
WHERE tc.CustNo IN (1001230, 1001286, 1001292)
This will give me rows for all periods, with 1 row with NULL values for those periods where there are no customers in the list that have any trips.
But how do I get one row for each customer in the list for all periods?
Ideally I want this:
2016 1 1001230 0
2016 1 1001286 14
2016 1 1001292 23
2016 2 1001230 7
2016 2 1001286 0
2016 2 1001292 4
etc...

Generate the rows using cross join. Then fill in the values using left join:
SELECT ym.ValidYear, ym.ValidMonth, c.CustNo, COALESCE(tt.Trips, 0) AS Trips
FROM ValidPeriods ym CROSS JOIN
(VALUES (1001230), (1001286), (1001292)) c(CustNo) LEFT JOIN
TempTrips tt
ON tt.Year = ym.ValidYear AND tt.Month = ym.ValidMonth AND
tt.CustNo = c.CustNo;
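For anyone who wants to try the cross join / left join pattern without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are taken from the desired output in the question; the in-memory database and the UNION ALL customer list (used in place of the VALUES constructor) are my own assumptions, not part of the original setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ValidPeriods (ValidYear INTEGER, ValidMonth INTEGER);
CREATE TABLE TempTrips (CustNo INTEGER, Year INTEGER, Month INTEGER, Trips INTEGER);
INSERT INTO ValidPeriods VALUES (2016, 1), (2016, 2);
-- Only the non-zero rows exist in the table; the zero rows are what we want to generate.
INSERT INTO TempTrips VALUES
  (1001286, 2016, 1, 14), (1001292, 2016, 1, 23),
  (1001230, 2016, 2, 7),  (1001292, 2016, 2, 4);
""")

query = """
SELECT ym.ValidYear, ym.ValidMonth, c.CustNo, COALESCE(tt.Trips, 0) AS Trips
FROM ValidPeriods ym
CROSS JOIN (SELECT 1001230 AS CustNo UNION ALL
            SELECT 1001286 UNION ALL
            SELECT 1001292) c
LEFT JOIN TempTrips tt
  ON tt.Year = ym.ValidYear AND tt.Month = ym.ValidMonth AND tt.CustNo = c.CustNo
ORDER BY ym.ValidYear, ym.ValidMonth, c.CustNo
"""
# One row per (period, customer) pair; missing trips come back as 0.
rows = conn.execute(query).fetchall()
for row in rows:
    print(*row)
```

The cross join builds every period/customer combination first, so the customer filter never touches the outer-joined table and cannot remove the empty rows.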

Related

LAG function alternative. I need the results for the missing year in between

I have this table so far. However, I would like to obtain the results for 2019, for which there are no records, so it should show 0. Are there any alternatives to the LAG function?
ID Year Year_Count
1 2018 10
1 2020 20
Whenever I use the LAG function in SQL, the 2020 row gets the result for 2018. However, I would like to get 0 for 2019 and then 10 for 2018.
LAG(YEAR_COUNT) OVER (PARTITION BY ID ORDER BY YEAR) AS previous_year_count
Untested notepad scribble:
CASE
WHEN 1 = YEAR - LAG(YEAR) OVER (PARTITION BY ID ORDER BY YEAR)
THEN LAG(YEAR_COUNT) OVER (PARTITION BY ID ORDER BY YEAR)
ELSE 0
END AS previous_year_count
I'll add on to Nick's comment here with an example.
The YEARS CTE here creates the table of years he suggested, and the RECORDS CTE matches the table posted above. They are then joined, with COALESCE filling in the null values left by the LEFT JOIN (I filled ID with 0, not sure what your case would be).
You need to LEFT JOIN from the YEARS table and select the YEAR column from YEARS in the final query; otherwise you'd end up with only 2018 and 2020, or with those years plus some null values.
WITH
YEARS AS
(
SELECT 2016 AS YEAR UNION ALL
SELECT 2017 UNION ALL
SELECT 2018 UNION ALL
SELECT 2019 UNION ALL
SELECT 2020 UNION ALL
SELECT 2021 UNION ALL
SELECT 2022
)
,
RECORDS AS
(
SELECT 1 ID, 2018 YEAR, 10 YEAR_COUNT UNION ALL
SELECT 1, 2020, 20)
SELECT
COALESCE(ID, 0) AS ID,
Y.YEAR,
COALESCE(YEAR_COUNT, 0) AS YEAR_COUNT
FROM YEARS AS Y
LEFT JOIN RECORDS AS R
ON R.YEAR = Y.YEAR
Here is the dbfiddle so you can visualize - https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9e777ad925b09eb8ba299d610a78b999
Vertica SQL is not an available test environment, so this may not work directly but should at least get you on the right track.
The LAG function would not work to get 2019 for a few reasons:
It's a window function and can only grab from data that is available. The default offset for LAG in your case appears to be 1, i.e. LAG(YEAR_COUNT, 1).
Expressions in the SELECT list typically can't add rows to a result; you need to add the missing rows with JOINs.
If 2019 does exist in a prior table and you're using group by to get year count, it's possible that you have a where clause excluding the data.
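Since the question asks for an alternative to LAG, one option is a self-join on year - 1, which only matches when the immediately preceding year really exists. This is a sketch run through Python's sqlite3 module rather than Vertica, so treat the table name and the test setup as assumptions; the join itself is plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (id INTEGER, year INTEGER, year_count INTEGER);
INSERT INTO records VALUES (1, 2018, 10), (1, 2020, 20);
""")

query = """
SELECT cur.id, cur.year, cur.year_count,
       COALESCE(prev.year_count, 0) AS previous_year_count
FROM records AS cur
LEFT JOIN records AS prev
  ON prev.id = cur.id
 AND prev.year = cur.year - 1   -- only the directly preceding year counts
ORDER BY cur.id, cur.year
"""
# 2020 has no 2019 row to match, so its previous_year_count falls back to 0.
rows = conn.execute(query).fetchall()
for row in rows:
    print(*row)
```

Unlike LAG, which returns whatever row happens to come before in the ordering, the join condition makes the one-year gap explicit.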

How to create a Select statement which contains a SUM on a different table

I currently have a select statement that is causing me issues. I have two tables:
Customer table
ID Month4Value Month5Value
1 24 5
Orders table
ID Year Month Value Quantity
1 2018 8 10 2
1 2018 4 2 1
1 2018 6 10 4
1 2018 4 7 3
I currently have the below view:
Create View Values as
Select ID, Year, Month, ROUND(SUM(Value*Quantity),2) as NewQuantity
FROM Orders
GROUP BY ID, Year, Month
The select statement below is what I am trying to run:
Select Customer.ID, Customer.Month5Value, NewQuantity
from Customer inner join Values on Customer.ID = Values.ID
where ROUND(Customer.Month5Value, 2) <> ROUND(NewQuantity,2)
AND Values.Year = 2018
AND Values.Month = 5
What I am trying to achieve is to find any mismatches between the Orders table and the Customer table. In the above example, I expect the query to highlight that the value in Customer.Month5Value does not match the total of (Quantity*Value) from the Orders table.
As there are 0 orders for Month 5 in the Orders table, the Month5Value should be 0. However, the query returns no entries.
Any thoughts about what I have missed?
EDIT -
I have updated my query to this:
Select Customer.ID, Customer.Month5Value, NewQuantity
from Customer left join Values on Customer.ID = Values.ID
where ROUND(Customer.Month5Value, 2) <> ISNULL((Select NewQuantity from Customer left join Values on Customer.ID = Values.ID where Values.Month = 5 and Values.Year = 2018),0)
This has given me a list of IDs which have an incorrect amount in Month5Value on the Customer table, but displays lines for each month entry
ID Month5Value NewQuantity
1 5 24
1 5 40
1 5 20
How can I adjust this so that I get one line per ID with the correct value for NewQuantity (either 0 or NULL in this case)?
I think the INNER JOIN is removing any records which are missing from VALUES. Replacing the INNER JOIN with LEFT JOIN may give the result you are looking for.
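To make that concrete, here is a minimal sketch using Python's sqlite3 module with the sample data from the question. It assumes the year/month filter is moved into the ON clause of the LEFT JOIN, so a customer with no orders in that month still produces one row; the derived table stands in for the view, and COALESCE is used instead of ISNULL because SQLite does not have the two-argument ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (ID INTEGER, Month4Value REAL, Month5Value REAL);
CREATE TABLE Orders (ID INTEGER, Year INTEGER, Month INTEGER, Value REAL, Quantity INTEGER);
INSERT INTO Customer VALUES (1, 24, 5);
INSERT INTO Orders VALUES
  (1, 2018, 8, 10, 2), (1, 2018, 4, 2, 1), (1, 2018, 6, 10, 4), (1, 2018, 4, 7, 3);
""")

query = """
SELECT c.ID, c.Month5Value, COALESCE(v.NewQuantity, 0) AS NewQuantity
FROM Customer c
LEFT JOIN (SELECT ID, Year, Month, ROUND(SUM(Value * Quantity), 2) AS NewQuantity
           FROM Orders
           GROUP BY ID, Year, Month) v
  ON v.ID = c.ID
 AND v.Year = 2018     -- filtering here keeps customers with no month-5 orders
 AND v.Month = 5
WHERE ROUND(c.Month5Value, 2) <> COALESCE(v.NewQuantity, 0)
"""
# One row per mismatching customer; NewQuantity is 0 when there are no orders.
rows = conn.execute(query).fetchall()
print(rows)
```

Putting the month and year conditions in the WHERE clause instead would discard the NULL-extended rows and bring back the original problem.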

smart way to create a master list (avoiding cross joins)

I need to create a table of date, product and inventory count only for the days when inventory is 0, something like this:
Date Product store Inv
Jan1 1 1 0
Feb4 1 1 0
From the inventory table that only has a record whenever inventory changes
Like this
Store Product start_date end_date Inv
1 1 Jan 4 Jan10 5
1 1 Jan10 jan 15 4
I know I can create a master table by cross joining all stores, products and calendar days in a year, and then join only with days where the date falls between the start and end date of the inventory table. Is there a better way than this? Can the cross join be avoided? Thanks
Are you looking for lag()?
select t.*
from (select t.*,
lag(inventory) over (partition by product, store order by date) as prev_inventory
from t
) t
where prev_inventory is null or prev_inventory <> inventory;
Create a table with dates (which can be handy to have around for a ton of reasons) then left join from the inventory table and use BETWEEN against your date columns.
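A rough sketch of the dates-table idea, run through Python's sqlite3 module. The year 2023, the extra zero-inventory interval and the table and column names are all made up for illustration, since the question only gives "Jan 4" style dates. I also use a half-open range (start inclusive, end exclusive) instead of BETWEEN, on the assumption that one row's end_date equals the next row's start_date.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (store INTEGER, product INTEGER,
                        start_date TEXT, end_date TEXT, inv INTEGER);
INSERT INTO inventory VALUES
  (1, 1, '2023-01-04', '2023-01-10', 5),
  (1, 1, '2023-01-10', '2023-01-15', 4),
  (1, 1, '2023-01-15', '2023-01-18', 0);  -- hypothetical out-of-stock interval
CREATE TABLE calendar (d TEXT PRIMARY KEY);
""")
# Fill the calendar with every day of January (ISO strings sort chronologically).
january = [((date(2023, 1, 1) + timedelta(days=i)).isoformat(),) for i in range(31)]
conn.executemany("INSERT INTO calendar VALUES (?)", january)

query = """
SELECT cal.d, i.product, i.store, i.inv
FROM inventory i
JOIN calendar cal
  ON cal.d >= i.start_date AND cal.d < i.end_date
WHERE i.inv = 0
ORDER BY cal.d
"""
# Only the zero-inventory intervals are expanded into days.
rows = conn.execute(query).fetchall()
for row in rows:
    print(*row)
```

Because the join starts from the inventory rows that already have inv = 0, there is no need to build the full store x product x day grid first.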

Sum duplicate row values

I'm trying to sum the values of rows two of which have duplicate values, the table I have is below:
Table name (Customers)
value years total
1 30 30
3 10 10
4 15 15
4 25 25
I would ideally like to finally have:
value years total
1 30 30
3 10 10
4 40 40
I've tried using SELECT DISTINCT and GROUP BY to get rid of the duplicate row, also the join in the code below isn't necessary. Regardless both commands come to no avail. Here's my code too:
SELECT DISTINCT
value,
years,
SUM(customer.years) AS total
FROM customer
INNER JOIN language
ON customer.expert=language.l_id
GROUP BY
expert,
years;
But that produces a copy of the first table, any input welcome. Thanks!!!
SELECT
value,
SUM(years) AS years,
SUM(total) AS total
FROM customers
GROUP BY value;
You want the sum of the years and the sum of the total per value, i.e. grouped by value.
SELECT
value,
SUM(years) AS years,
SUM(total) AS total
FROM (SELECT DISTINCT
value,
years,
customer.years AS total
FROM customer
INNER JOIN language
ON customer.expert=language.l_id ) as TABLECUS
GROUP BY
value;

SQL query to compare product sales by month

I have a Monthly Status database view I need to build a report based on. The data in the view looks something like this:
Category | Revenue | Year | Month
Bikes 10 000 2008 1
Bikes 12 000 2008 2
Bikes 12 000 2008 3
Bikes 15 000 2008 1
Bikes 11 000 2007 2
Bikes 11 500 2007 3
Bikes 15 400 2007 4
... And so forth
The view has a product category, a revenue, a year and a month. I want to create a report comparing 2007 and 2008, showing 0 for the months with no sales. So the report should look something like this:
Category | Month | Rev. This Year | Rev. Last Year
Bikes 1 10 000 0
Bikes 2 12 000 11 000
Bikes 3 12 000 11 500
Bikes 4 0 15 400
The key thing to notice is how month 1 only has sales in 2008, and therefore is 0 for 2007. Also, month 4 has no sales in 2008, hence the 0, but it has sales in 2007 and so still shows up.
Also, the report is actually for financial year - so I would love to have empty columns with 0 in both if there was no sales in say month 5 for either 2007 or 2008.
The query I got looks something like this:
SELECT
SP1.Program,
SP1.Year,
SP1.Month,
SP1.TotalRevenue,
IsNull(SP2.TotalRevenue, 0) AS LastYearTotalRevenue
FROM PVMonthlyStatusReport AS SP1
LEFT OUTER JOIN PVMonthlyStatusReport AS SP2 ON
SP1.Program = SP2.Program AND
SP2.Year = SP1.Year - 1 AND
SP1.Month = SP2.Month
WHERE
SP1.Program = 'Bikes' AND
SP1.Category = @Category AND
(SP1.Year >= @FinancialYear AND SP1.Year <= @FinancialYear + 1) AND
((SP1.Year = @FinancialYear AND SP1.Month > 6) OR
(SP1.Year = @FinancialYear + 1 AND SP1.Month <= 6))
ORDER BY SP1.Year, SP1.Month
The problem with this query is that it would not return the fourth row in my example data above, since we didn't have any sales in 2008, but we actually did in 2007.
This is probably a common query/problem, but my SQL is rusty after doing front-end development for so long. Any help is greatly appreciated!
Oh, btw, I'm using SQL 2005 for this query so if there are any helpful new features that might help me let me know.
The Case Statement is my best sql friend. You also need a table for time to generate your 0 rev in both months.
Assumptions are based on the availability of following tables:
sales: Category | Revenue | Year | Month
and
tm: Year | Month (populated with all dates required for reporting)
Example 1 without empty rows:
select
Category
,month
,SUM(CASE WHEN YEAR = 2008 THEN Revenue ELSE 0 END) this_year
,SUM(CASE WHEN YEAR = 2007 THEN Revenue ELSE 0 END) last_year
from
sales
where
year in (2008,2007)
group by
Category
,month
RETURNS:
Category | Month | Rev. This Year | Rev. Last Year
Bikes 1 10 000 0
Bikes 2 12 000 11 000
Bikes 3 12 000 11 500
Bikes 4 0 15 400
Example 2 with empty rows:
I am going to use a sub query (but others may not) and will return an empty row for every product and year month combo.
select
fill.Category
,fill.month
,SUM(CASE WHEN fill.year = 2008 THEN sales.Revenue ELSE 0 END) this_year
,SUM(CASE WHEN fill.year = 2007 THEN sales.Revenue ELSE 0 END) last_year
from
sales
Right join (select distinct --try out left, right and cross joins to test results.
c.Category
,tm.year
,tm.month
from
(select distinct Category from sales) c --this ideally would be from a products table
cross join tm
where
tm.year in (2008,2007)) fill
on sales.Category = fill.Category
and sales.year = fill.year
and sales.month = fill.month
group by
fill.Category
,fill.month
RETURNS:
Category | Month | Rev. This Year | Rev. Last Year
Bikes 1 10 000 0
Bikes 2 12 000 11 000
Bikes 3 12 000 11 500
Bikes 4 0 15 400
Bikes 5 0 0
Bikes 6 0 0
Bikes 7 0 0
Bikes 8 0 0
Note that most reporting tools will do this crosstab or matrix functionality, and now that I think of it, SQL Server 2005 has PIVOT syntax that will do this as well.
Here are some additional resources.
CASE
https://web.archive.org/web/20210728081626/https://www.4guysfromrolla.com/webtech/102704-1.shtml
SQL SERVER 2005 PIVOT
http://msdn.microsoft.com/en-us/library/ms177410.aspx
@Christian -- markdown editor -- UGH; especially when the preview and the final version of your post disagree...
@Christian -- full outer join -- the full outer join is overruled by the fact that there are references to SP1 in the WHERE clause, and the WHERE clause is applied after the JOIN. To do a full outer join with filtering on one of the tables, you need to put your WHERE clause into a subquery, so the filtering happens before the join, or try to build all of your WHERE criteria onto the JOIN ON clause, which is insanely ugly. Well, there's actually no pretty way to do this one.
@Jonas: Considering this:
Also, the report is actually for financial year - so I would love to have empty columns with 0 in both if there was no sales in say month 5 for either 2007 or 2008.
and the fact that this job can't be done with a pretty query, I would definitely try to get the results you actually want. No point in having an ugly query and not even getting the exact data you actually want. ;)
So, I'd suggest doing this in 5 steps:
1. create a temp table in the format you want your results to match
2. populate it with twelve rows, with 1-12 in the month column
3. update the "This Year" column using your SP1 logic
4. update the "Last Year" column using your SP2 logic
5. select from the temp table
Of course, I guess I'm working from the assumption that you can create a stored procedure to accomplish this. You might technically be able to run this whole batch inline, but that kind of ugliness is very rarely seen. If you can't make an SP, I suggest you fall back on the full outer join via subquery, but it won't get you a row when a month had no sales either year.
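The five steps above can be sketched as follows, using Python's sqlite3 module in place of a SQL Server stored procedure. The table names, the calendar-year filter (2008 vs 2007, rather than a financial year) and the hard-coded 'Bikes' category are simplifying assumptions of mine, and the data is the subset of the question's sample that produces the report shown there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (category TEXT, revenue INTEGER, year INTEGER, month INTEGER);
INSERT INTO sales VALUES
  ('Bikes', 10000, 2008, 1), ('Bikes', 12000, 2008, 2), ('Bikes', 12000, 2008, 3),
  ('Bikes', 11000, 2007, 2), ('Bikes', 11500, 2007, 3), ('Bikes', 15400, 2007, 4);
-- Step 1: temp table in the shape of the final report.
CREATE TEMP TABLE report (month INTEGER PRIMARY KEY,
                          this_year INTEGER, last_year INTEGER);
""")

# Step 2: twelve rows, months 1-12, both revenue columns starting at 0.
conn.executemany("INSERT INTO report VALUES (?, 0, 0)", [(m,) for m in range(1, 13)])

# Steps 3 and 4: fill each column from the sales data for the matching year.
update = """
UPDATE report SET {col} = COALESCE(
  (SELECT SUM(revenue) FROM sales
   WHERE category = 'Bikes' AND year = ? AND month = report.month), 0)
"""
conn.execute(update.format(col="this_year"), (2008,))
conn.execute(update.format(col="last_year"), (2007,))

# Step 5: select from the temp table.
rows = conn.execute(
    "SELECT month, this_year, last_year FROM report ORDER BY month").fetchall()
for row in rows:
    print(*row)
```

Every month gets a row regardless of sales, which covers the "0 in both years" requirement that the join-only approaches miss.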
The trick is to do a FULL JOIN, with ISNULL's to get the joined columns from either table. I usually wrap this into a view or derived table, otherwise you need to use ISNULL in the WHERE clause as well.
SELECT
Program,
Month,
ThisYearTotalRevenue,
PriorYearTotalRevenue
FROM (
SELECT
ISNULL(ThisYear.Program, PriorYear.Program) as Program,
ISNULL(ThisYear.Month, PriorYear.Month) as Month,
ISNULL(ThisYear.TotalRevenue, 0) as ThisYearTotalRevenue,
ISNULL(PriorYear.TotalRevenue, 0) as PriorYearTotalRevenue
FROM (
SELECT Program, Month, SUM(TotalRevenue) as TotalRevenue
FROM PVMonthlyStatusReport
WHERE Year = @FinancialYear
GROUP BY Program, Month
) as ThisYear
FULL OUTER JOIN (
SELECT Program, Month, SUM(TotalRevenue) as TotalRevenue
FROM PVMonthlyStatusReport
WHERE Year = (@FinancialYear - 1)
GROUP BY Program, Month
) as PriorYear ON
ThisYear.Program = PriorYear.Program
AND ThisYear.Month = PriorYear.Month
) as Revenue
WHERE
Program = 'Bikes'
ORDER BY
Month
That should get you your minimum requirements - rows with sales in either 2007 or 2008, or both. To get rows with no sales in either year, you need to start from a 1-12 numbers table (you do have one of those, don't you?) and outer join the revenue results to it; an INNER JOIN would drop the empty months again.
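Here is one way the numbers-table variant might look, as a sketch run through Python's sqlite3 module. The Numbers table, the reduced column list for PVMonthlyStatusReport and the named parameters :fy and :program are assumptions for illustration; conditional aggregation is used here so that a single LEFT JOIN from the numbers table covers both years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PVMonthlyStatusReport (Program TEXT, Year INTEGER,
                                    Month INTEGER, TotalRevenue INTEGER);
INSERT INTO PVMonthlyStatusReport VALUES
  ('Bikes', 2008, 1, 10000), ('Bikes', 2008, 2, 12000), ('Bikes', 2008, 3, 12000),
  ('Bikes', 2007, 2, 11000), ('Bikes', 2007, 3, 11500), ('Bikes', 2007, 4, 15400);
CREATE TABLE Numbers (N INTEGER PRIMARY KEY);
""")
conn.executemany("INSERT INTO Numbers VALUES (?)", [(n,) for n in range(1, 13)])

query = """
SELECT n.N AS Month,
       COALESCE(SUM(CASE WHEN r.Year = :fy     THEN r.TotalRevenue END), 0) AS ThisYear,
       COALESCE(SUM(CASE WHEN r.Year = :fy - 1 THEN r.TotalRevenue END), 0) AS PriorYear
FROM Numbers n
LEFT JOIN PVMonthlyStatusReport r
  ON r.Month = n.N
 AND r.Program = :program
 AND r.Year IN (:fy, :fy - 1)
GROUP BY n.N
ORDER BY n.N
"""
# The numbers table drives the result, so all twelve months appear.
rows = conn.execute(query, {"fy": 2008, "program": "Bikes"}).fetchall()
for row in rows:
    print(*row)
```

The filters live in the ON clause rather than the WHERE clause so that months without any matching revenue row are kept.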
About the markdown - Yeah that is frustrating. The editor did preview my HTML table, but after posting it was gone - So had to remove all HTML formatting from the post...
@kcrumley I think we've reached similar conclusions. This query easily gets real ugly. I actually solved this before reading your answer, using a similar (yet different) approach. I have access to create stored procedures and functions on the reporting database. I created a table-valued function accepting a product category and a financial year as parameters. Based on that, the function populates a table containing 12 rows. The rows are populated with data from the view if any sales are available; if not, the row has 0 values.
I then join the two tables returned by the functions. Since I know both tables will have twelve rows it's a lot easier, and I can join on Product Category and Month:
SELECT
SP1.Program,
SP1.Year,
SP1.Month,
SP1.TotalRevenue AS ThisYearRevenue,
SP2.TotalRevenue AS LastYearRevenue
FROM GetFinancialYear(@Category, 'First Look', 2008) AS SP1
RIGHT JOIN GetFinancialYear(@Category, 'First Look', 2007) AS SP2 ON
SP1.Program = SP2.Program AND
SP1.Month = SP2.Month
I think your approach is probably a little cleaner as the GetFinancialYear function is quite messy! But at least it works - which makes me happy for now ;)
I could be wrong but shouldn't you be using a full outer join instead of just a left join? That way you will be getting 'empty' columns from both tables.
http://en.wikipedia.org/wiki/Join_(SQL)#Full_outer_join
Using pivot and Dynamic Sql we can achieve this result
SET NOCOUNT ON
IF OBJECT_ID('TEMPDB..#TEMP') IS NOT NULL
DROP TABLE #TEMP
;With cte(Category , Revenue , Yearh , [Month])
AS
(
SELECT 'Bikes', 10000, 2008,1 UNION ALL
SELECT 'Bikes', 12000, 2008,2 UNION ALL
SELECT 'Bikes', 12000, 2008,3 UNION ALL
SELECT 'Bikes', 15000, 2008,1 UNION ALL
SELECT 'Bikes', 11000, 2007,2 UNION ALL
SELECT 'Bikes', 11500, 2007,3 UNION ALL
SELECT 'Bikes', 15400, 2007,4
)
SELECT * INTO #Temp FROM cte
Declare @Column nvarchar(max),
@Column2 nvarchar(max),
@Sql nvarchar(max)
SELECT @Column=STUFF((SELECT DISTINCT ','+ 'ISNULL('+QUOTENAME(CAST(Yearh AS VArchar(10)))+','+'''0'''+')'+ ' AS '+ QUOTENAME(CAST(Yearh AS VArchar(10)))
FROM #Temp order by 1 desc FOR XML PATH ('')),1,1,'')
SELECT @Column2=STUFF((SELECT DISTINCT ','+ QUOTENAME(CAST(Yearh AS VArchar(10)))
FROM #Temp FOR XML PATH ('')),1,1,'')
SET @Sql= N'SELECT Category,[Month],'+ @Column +' FROM #Temp
PIVOT
(MIN(Revenue) FOR yearh IN ('+@Column2+')
) AS Pvt
'
EXEC(@Sql)
Print @Sql
Result
Category Month 2008 2007
----------------------------------
Bikes 1 10000 0
Bikes 2 12000 11000
Bikes 3 12000 11500
Bikes 4 0 15400