Find missing values SQL Server?

I have a table (dataset_final) that contains the number of sales (field quantity) of each good in a particular store for a particular week of the year. There are about 200 thousand unique goods, about 50 stores, and a period of 6 years.
dataset_final
+---------+-------------+---------+----------+----------+
| year_id | week_number | good_id | store_id | quantity |
+---------+-------------+---------+----------+----------+
| 2017 | 37 | 137233 | 9 | 1 |
+---------+-------------+---------+----------+----------+
| 2017 | 38 | 137233 | 9 | 4 |
+---------+-------------+---------+----------+----------+
| 2017 | 40 | 137233 | 9 | 3 |
+---------+-------------+---------+----------+----------+
| 2016 | 35 | 152501 | 23 | 6 |
+---------+-------------+---------+----------+----------+
| 2016 | 37 | 152501 | 23 | 3 |
+---------+-------------+---------+----------+----------+
I would like to fill in the missing values with zero, i.e. add a row with quantity 0 whenever a combination of good and store had no sales in a certain week of the year. For example:
+---------+-------------+---------+----------+----------+
| year_id | week_number | good_id | store_id | quantity |
+---------+-------------+---------+----------+----------+
| 2017 | 37 | 137233 | 9 | 1 |
+---------+-------------+---------+----------+----------+
| 2017 | 38 | 137233 | 9 | 4 |
+---------+-------------+---------+----------+----------+
| 2017 | 40 | 137233 | 9 | 3 |
+---------+-------------+---------+----------+----------+
| 2016 | 35 | 152501 | 23 | 6 |
+---------+-------------+---------+----------+----------+
| 2016 | 37 | 152501 | 23 | 3 |
+---------+-------------+---------+----------+----------+
| 2017 | 39 | 137233 | 9 | 0 |
+---------+-------------+---------+----------+----------+
| 2016 | 36 | 152501 | 23 | 0 |
+---------+-------------+---------+----------+----------+
I wanted to do this: find all unique combinations of year_id, week_number, good_id, store_id and add only those that are not in the dataset_final table. My query:
WITH t1 AS (SELECT DISTINCT
[year_id]
,[week_number]
,[good_id]
,[store_id]
FROM [fs_db].[dbo].[ds_dataset_final]),
t2 AS (SELECT DISTINCT [year_id], [week_number] FROM [fs_db].[dbo].[ds_dataset_final])
SELECT t2.[year_id], t2.[week_number], t1.[good_id], t1. [store_id] FROM t1
full join t2 ON t2.[year_id]=t1.[year_id] AND t2.[week_number]=t2.[week_number]
This query produces about 1.2 billion unique combinations, which seems like too many.
Also, I only want combinations from the start of sales of each good. For example, if the table has sales of a particular product only from 2017 onward, then I do not need to fill in earlier data.

The basic idea is to generate all the rows using cross join and then use left join to bring in the values.
Assuming you have all year/week combinations in your original table and have all the goods and stores in the table, you can use:
select yw.year_id, yw.week_number,
g.good_id, s.store_id,
coalesce(d.quantity, 0) as quantity
from (select distinct year_id, week_number
from fs_db..ds_dataset_final
) yw cross join
(select distinct good_id
from fs_db..ds_dataset_final
) g cross join
(select distinct store_id
from fs_db..ds_dataset_final
) s left join
fs_db..ds_dataset_final d
on d.year_id = yw.year_id and
d.week_number = yw.week_number and
d.good_id = g.good_id and
d.store_id = s.store_id;
You may have other sources for each of the dimensions (such as a proper dimension table). If so, don't use select distinct but use the reference tables.
EDIT:
Just add this as the last line of the query:
where yw.year_id >= 2015 and yw.year_id < 2019
if you want the years 2015, 2016, 2017, and 2018.
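The cross join plus left join idea can be tried out on a tiny data set. The following is only a sketch: it uses Python's built-in sqlite3 module rather than SQL Server, an unqualified table name, and one made-up extra row (good 152501 in week 39) so that week 39 exists somewhere in the table, as the answer assumes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ds_dataset_final ("
    "year_id INT, week_number INT, good_id INT, store_id INT, quantity INT)"
)
conn.executemany(
    "INSERT INTO ds_dataset_final VALUES (?, ?, ?, ?, ?)",
    [
        (2017, 37, 137233, 9, 1),
        (2017, 38, 137233, 9, 4),
        (2017, 40, 137233, 9, 3),
        (2017, 39, 152501, 9, 2),  # made-up row so that week 39 appears in the data
    ],
)

query = """
SELECT yw.year_id, yw.week_number, g.good_id, s.store_id,
       COALESCE(d.quantity, 0) AS quantity
FROM (SELECT DISTINCT year_id, week_number FROM ds_dataset_final) AS yw
CROSS JOIN (SELECT DISTINCT good_id FROM ds_dataset_final) AS g
CROSS JOIN (SELECT DISTINCT store_id FROM ds_dataset_final) AS s
LEFT JOIN ds_dataset_final AS d
       ON d.year_id = yw.year_id
      AND d.week_number = yw.week_number
      AND d.good_id = g.good_id
      AND d.store_id = s.store_id
ORDER BY g.good_id, yw.year_id, yw.week_number
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The grid has 4 weeks x 2 goods x 1 store = 8 rows; the combinations without a sale come back with quantity 0.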

This is very much pseudo-SQL, since I don't know what your actual database looks like, but it should get you on the right path. You'll need to replace objects like dbo.Store with your actual objects, and I suggest creating a proper calendar table:
--This should really be a full calendar table, but we're making a sample here
CREATE TABLE dbo.Weeks (Year int,
Week int);
INSERT INTO dbo.Weeks (Year, Week)
SELECT Y.Year,
W.Week
FROM (VALUES(2016),(2017),(2018),(2019))Y(Year)
CROSS APPLY (SELECT TOP 52 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Week
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N1(N),
(VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N2(N)) W
GO
WITH CTE AS(
SELECT W.Year,
W.Week,
S.StoreID,
G.GoodsID
FROM dbo.Weeks W
CROSS JOIN dbo.Store S
CROSS JOIN dbo.Goods G
WHERE EXISTS (SELECT 1
FROM dbo.YourTable YT
WHERE YT.year_id <= W.Year
AND YT.store_id = S.StoreID))
SELECT C.Year,
C.Week,
C.StoreID,
C.GoodsID,
ISNULL(YT.quantity,0) AS quantity
FROM CTE C
LEFT JOIN YourTable YT ON C.Year = YT.year_id
AND C.Week = YT.week_number
AND C.StoreID = YT.store_id
AND C.GoodsID = YT.good_id
--WHERE?
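One possible way to fill only the weeks from the first sale of each good/store pair onward is sketched below. It is not the answer's exact query: it runs on SQLite through Python's sqlite3 module, the table names sales and weeks are made up, the calendar holds 52 weeks for 2016 and 2017 only, and year_id * 100 + week_number is used as an assumed sortable week key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (year_id INT, week_number INT, good_id INT, store_id INT, quantity INT);
INSERT INTO sales VALUES
  (2017, 37, 137233, 9, 1),
  (2017, 38, 137233, 9, 4),
  (2017, 40, 137233, 9, 3),
  (2016, 35, 152501, 23, 6),
  (2016, 37, 152501, 23, 3);
CREATE TABLE weeks (year_id INT, week_number INT);
""")
# Sample calendar: 52 weeks for each of two years (a real calendar table would be fuller).
conn.executemany(
    "INSERT INTO weeks VALUES (?, ?)",
    [(y, w) for y in (2016, 2017) for w in range(1, 53)],
)

query = """
WITH first_sale AS (
    -- earliest week in which each good/store pair was sold
    SELECT good_id, store_id, MIN(year_id * 100 + week_number) AS first_yw
    FROM sales
    GROUP BY good_id, store_id
)
SELECT w.year_id, w.week_number, f.good_id, f.store_id,
       COALESCE(s.quantity, 0) AS quantity
FROM first_sale AS f
JOIN weeks AS w
  ON w.year_id * 100 + w.week_number >= f.first_yw
LEFT JOIN sales AS s
  ON s.year_id = w.year_id
 AND s.week_number = w.week_number
 AND s.good_id = f.good_id
 AND s.store_id = f.store_id
ORDER BY f.good_id, w.year_id, w.week_number
"""
rows = conn.execute(query).fetchall()
print(len(rows))
```

Because the grid is built only from pairs that actually sold, it stays far smaller than a full goods x stores x weeks cross join. Note that this sketch keeps filling zeros up to the end of the calendar, even after a pair's last sale.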

Related

SQL aggregate quarterly records

I'm trying to aggregate quarterly records that belong to unique customer ids. I would then like to filter out those by $amount spent on X.
below is what I've coded:
select year, count(*), sum($spentX), customerid
from table A
where year = 2017
group by year, customerid
having sum($spentX)<=1000
Is this correct? Also, how do I sum the results to get a 2017 total? This query lists every customerid, so I currently have to aggregate in other software.
2nd question:
How do I use the above when I join two tables?
thank you so much in advance

Your code looks correct. The way you have it set up your output would look something like the below. It is already filtered down to 2017 and grouped by the year and customer.
year | count(*) | sum($spentX) | customerid
2017 | 5 | 500.00 | 56
2017 | 8 | 800.00 | 43
2017 | 3 | 300.00 | 85
2017 | 2 | 200.00 | 56
2017 | 5 | 500.00 | 25
If you want to get a sum total for all customers who spent less than or equal to $1000, you would need to put the entire thing in a subquery and sum it again:
SELECT SUB.YEAR, SUM(SUB.TOTAL_COUNT), SUM(SUB.TOTAL_SPENT)
FROM (
SELECT YEAR, COUNT(*) AS TOTAL_COUNT, SUM($SPENTX) AS TOTAL_SPENT, CUSTOMERID
FROM TABLE A
WHERE YEAR = 2017
GROUP BY YEAR, CUSTOMERID
HAVING SUM($SPENTX)<=1000
) SUB
GROUP BY SUB.YEAR
OUTPUT:
YEAR | SUM(SUB.TOTAL_COUNT) | SUM(SUB.TOTAL_SPENT)
2017 | 23 | 2300.00
If you wanted to join that to another table say one with customer information it would look like this:
SELECT SUB.* , C.CUSTOMER_NAME
FROM (
SELECT YEAR, COUNT(*) AS TOTAL_COUNT, SUM($SPENTX) AS TOTAL_SPENT, CUSTOMERID
FROM TABLE A
WHERE YEAR = 2017
GROUP BY YEAR, CUSTOMERID
HAVING SUM($SPENTX)<=1000
) SUB
INNER JOIN TABLE_CUSTOMERS C ON C.CUSTOMERID = SUB.CUSTOMERID
The output for this query would look something like this:
YEAR | TOTAL_COUNT | TOTAL_SPENT | customerid | CUSTOMER_NAME
2017 | 5 | 500.00 | 56 | LARRY
2017 | 8 | 800.00 | 43 | MARGE
2017 | 3 | 300.00 | 85 | JOHN
2017 | 2 | 200.00 | 56 | RICK
2017 | 5 | 500.00 | 25 | SAM
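The subquery-then-sum pattern can be checked on made-up data. This is a sketch only: it runs on SQLite via Python's sqlite3 module, and the table name purchases, the column name spent (standing in for $spentX) and all values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (year INT, customerid INT, spent INT)")

# Made-up data: customers 56, 43 and 85 stay under 1000; customer 99 does not.
data = (
    [(2017, 56, 100)] * 5
    + [(2017, 43, 100)] * 8
    + [(2017, 85, 100)] * 3
    + [(2017, 99, 900)] * 2
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", data)

query = """
SELECT sub.year,
       SUM(sub.total_count) AS total_count,
       SUM(sub.total_spent) AS total_spent
FROM (
    SELECT year, customerid,
           COUNT(*) AS total_count,
           SUM(spent) AS total_spent
    FROM purchases
    WHERE year = 2017
    GROUP BY year, customerid
    HAVING SUM(spent) <= 1000      -- filter per customer first
) AS sub
GROUP BY sub.year                  -- then roll up to one row per year
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Customer 99 (1800 in total) is dropped by the HAVING clause before the outer query sums the remaining 16 records.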

Is it possible to find the MAX value of an already aggregated calculation inside the same view?

I have created a calculation in Microsoft SQL Server Management Studio that creates a running total per company and quarter, but at a monthly level and this part works fine.
So if company X sold 40 apples, hypothetically, in Jan and then 60 in Feb, then the running total in Feb would be 100 and if they sold 30 in March, then March's running total would be 130 and then in April it would reset for the new quarter.
What I need now is to find the MAX of these values, per month across all companies. So if Company 'X' sold 100 in Feb, but Company 'Y' sold 150, I want to return 150.
The calculation I use to get the rolling values per quarter calls on two functions to calculate the quarter each month falls into, as well as the relevant Fiscal Period / year ('GetQuarter' and 'GetFiscalPeriod' being the functions).
So my question is, is there any way to find the max at a different level of detail (in this case across ALL Companies) when the value you are looking at is already aggregated at Company level?
I'm told Stored Procedures would make this a lot simpler but the software I use can't call on Stored Procedures, only views and tables.
SELECT
cm.Company_Code,
cm.[Date],
cm.Measure,
SUM(cm.Actual) OVER (
PARTITION BY (
SELECT dbo.GetQuarter(SUBSTRING(cm.[Date], 5, 2))),
cm.Measure,
cm.Company_Code,
(LEFT((SELECT dbo.GetFiscalPeriod(cm.[Date])), 4))
ORDER BY cm.[Date]
) AS Current_QTD_Actual
FROM mytable cm
Desired Output would look like the "MAX" field below:
+--------------+--------+-----+-----+----------+---------+-----+------------+
| Company_Code | Actual | QTD | MAX | Date | Measure | QTR | FiscalYear |
| AAA | 40 | 40 | 40 | 20180701 | Bananas | Q1 | 2019 |
| BBB | 35 | 35 | 40 | 20180701 | Bananas | Q1 | 2019 |
| AAA | 60 | 100 | 105 | 20180801 | Bananas | Q1 | 2019 |
| BBB | 70 | 105 | 105 | 20180801 | Bananas | Q1 | 2019 |
| AAA | 30 | 130 | 150 | 20180901 | Bananas | Q1 | 2019 |
| BBB | 45 | 150 | 150 | 20180901 | Bananas | Q1 | 2019 |
| AAA | 25 | 25 | 45 | 20181001 | Bananas | Q2 | 2019 |
| BBB | 45 | 45 | 45 | 20181001 | Bananas | Q2 | 2019 |
| AAA | 30 | 55 | 85 | 20181101 | Bananas | Q2 | 2019 |
| BBB | 40 | 85 | 85 | 20181101 | Bananas | Q2 | 2019 |
+--------------+--------+-----+-----+----------+---------+-----+------------+
As the QTD calculation I currently have is already a rolled up SUM, simply wrapping this in a MAX function does not work for obvious reasons.
I tried creating a temporary table within the calculation using examples I've seen online, which I then call back into the original table and max that value but I think my syntax is wrong because it never comes out right (I'm still a novice so temporary table syntaxes still elude me quite a bit).

You seem to want the cumulative sum of the maximum values for each month. If this is correct, you can use two levels of window functions:
select measure, fiscalyear, qtr, date, actual,
sum(actual) over (partition by measure, fiscalyear, qtr order by date) as running_actual
from (select t.*,
row_number() over (partition by measure, date order by actual desc) as seqnum
from t
) t
where seqnum = 1;

You can't stack aggregates together in the same SELECT, with the only exception of applying a windowed aggregate (with an OVER clause) over a regular aggregate. For example:
SELECT
T.GroupedColumn,
RowsByGroup = COUNT(*), -- Regular aggregate
SumOfAllRows = SUM(COUNT(*)) OVER () -- Windowed aggregate of a regular one
FROM
MyTable AS T
GROUP BY
T.GroupedColumn
You can, however, apply them if you wrap the former in a subquery or CTE, which also makes the query more readable IMO. I believe you are looking for something like the following:
;WITH RunningSumPerQuarterPerCompany AS
(
SELECT
cm.Company_Code,
cm.[Date],
cm.Measure,
Current_QTD_Actual = SUM(cm.Actual) OVER (
PARTITION BY
dbo.GetQuarter(SUBSTRING(cm.[Date], 5, 2)),
cm.Measure,
cm.Company_Code,
LEFT(dbo.GetFiscalPeriod(cm.[Date]), 4)
ORDER BY
cm.[Date]),
-- Add additional PARTITION BY columns for the GROUP BY later on
Quarter = dbo.GetQuarter(SUBSTRING(cm.[Date], 5, 2)),
FiscalPeriod = LEFT(dbo.GetFiscalPeriod(cm.[Date]), 4)
FROM
mytable cm
),
MaxRunningSumPerQuarter AS
(
SELECT
R.Quarter,
R.FiscalPeriod,
Max_Current_QTD_Actual = MAX(R.Current_QTD_Actual)
FROM
RunningSumPerQuarterPerCompany AS R
GROUP BY
R.Quarter,
R.FiscalPeriod -- GROUP BY whichever dimension you need
)
SELECT
R.*,
M.Max_Current_QTD_Actual
FROM
RunningSumPerQuarterPerCompany AS R
LEFT JOIN MaxRunningSumPerQuarter AS M ON
R.Quarter = M.Quarter AND
R.FiscalPeriod = M.FiscalPeriod -- Join by the GROUP BY columns to display the MAX
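If the maximum is wanted per month across all companies, as in the question's desired output, one option is a second window function over the CTE instead of a GROUP BY plus join. The sketch below is an assumption-laden illustration: it runs on SQLite (3.25 or newer, for window functions) through Python's sqlite3 module, and it stores qtr and fiscal_year as plain columns in place of the dbo.GetQuarter and dbo.GetFiscalPeriod functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (company_code TEXT, date TEXT, measure TEXT, "
    "qtr TEXT, fiscal_year INT, actual INT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("AAA", "20180701", "Bananas", "Q1", 2019, 40),
        ("BBB", "20180701", "Bananas", "Q1", 2019, 35),
        ("AAA", "20180801", "Bananas", "Q1", 2019, 60),
        ("BBB", "20180801", "Bananas", "Q1", 2019, 70),
        ("AAA", "20180901", "Bananas", "Q1", 2019, 30),
        ("BBB", "20180901", "Bananas", "Q1", 2019, 45),
        ("AAA", "20181001", "Bananas", "Q2", 2019, 25),
        ("BBB", "20181001", "Bananas", "Q2", 2019, 45),
    ],
)

query = """
WITH qtd AS (
    -- running total per company, measure and quarter
    SELECT company_code, date, measure, actual,
           SUM(actual) OVER (
               PARTITION BY company_code, measure, fiscal_year, qtr
               ORDER BY date
           ) AS qtd_actual
    FROM mytable
)
SELECT company_code, date, actual, qtd_actual,
       -- maximum running total across all companies for the same month
       MAX(qtd_actual) OVER (PARTITION BY measure, date) AS max_qtd
FROM qtd
ORDER BY date, company_code
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Since this is a single SELECT over a CTE, it could in principle be placed in a view, which matters when stored procedures are not an option.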

postgresql - cumul. sum active customers by month (removing churn)

I want to create a query to get the cumulative sum by month of our active customers. The tricky thing here is that (unfortunately) some customers churn and so I need to remove them from the cumulative sum on the month they leave us.
Here is a sample of my customers table :
customer_id | begin_date | end_date
-----------------------------------------
1 | 15/09/2017 |
2 | 15/09/2017 |
3 | 19/09/2017 |
4 | 23/09/2017 |
5 | 27/09/2017 |
6 | 28/09/2017 | 15/10/2017
7 | 29/09/2017 | 16/10/2017
8 | 04/10/2017 |
9 | 04/10/2017 |
10 | 05/10/2017 |
11 | 07/10/2017 |
12 | 09/10/2017 |
13 | 11/10/2017 |
14 | 12/10/2017 |
15 | 14/10/2017 |
Here is what I am looking to achieve :
month | active customers
-----------------------------------------
2017-09 | 7
2017-10 | 6
I've managed to achieve it with the following query. However, I'd like to know if there is a better way.
select
"begin_date" as "date",
sum((new_customers.new_customers-COALESCE(churn_customers.churn_customers,0))) OVER (ORDER BY new_customers."begin_date") as active_customers
FROM (
select
date_trunc('month',begin_date)::date as "begin_date",
count(id) as new_customers
from customers
group by 1
) as new_customers
LEFT JOIN(
select
date_trunc('month',end_date)::date as "end_date",
count(id) as churn_customers
from customers
where
end_date is not null
group by 1
) as churn_customers on new_customers."begin_date" = churn_customers."end_date"
order by 1
;

You may use a CTE to compute the total end_dates and then subtract it from the counts of start dates by using a left join
SQL Fiddle
Query 1:
WITH edt
AS (
SELECT to_char(end_date, 'yyyy-mm') AS mon
,count(*) AS ct
FROM customers
WHERE end_date IS NOT NULL
GROUP BY to_char(end_date, 'yyyy-mm')
)
SELECT to_char(c.begin_date, 'yyyy-mm') as month
,COUNT(*) - MAX(COALESCE(ct, 0)) AS active_customers
FROM customers c
LEFT JOIN edt ON to_char(c.begin_date, 'yyyy-mm') = edt.mon
GROUP BY to_char(begin_date, 'yyyy-mm')
ORDER BY month;
Results:
| month | active_customers |
|---------|------------------|
| 2017-09 | 7 |
| 2017-10 | 6 |

Left join on same table, analysis on trends mysql

table structure is as follows
+---------------+---------+---------+
| customer_name | date | balance |
+---------------+---------+---------+
| 123 | june 14 | 20 |
| 123 | june 15 | 30 |
| 1234 | june 14 | 30 |
| 12345 | june 16 | 50 |
+---------------+---------+---------+
I would like to join the table to itself, keeping 2014 as my original data set, and analyse trends to see which customers' balances don't change from 2014.
For example, I would like to show the following:
+-----------+-----------+-----------+
| customer  | june14bal | june15bal |
+-----------+-----------+-----------+
| 1234 | 30 | null |
| 123 | 20 | 30 |
+-----------+-----------+-----------+
I have tried multiple left joins but can't seem to get it working. The most important thing is starting my sample with records from 2014 only.
Current script:
with TABLE_DATA as
(
select Customer ,DATE, Balance
from table
where dATE in ('30-JUN-2014','30-juN-2015')
)
SELECT
sum(inv1.balance) as year1bal,
suminv2.balance) as year2bal,
customer,
date
from table_datA inv1
left join TABLE_DATA inv2
on inv1.customer= inv2.customer and inv2.as_of_Date = '30-June-2015'
group by date, customer

You can add a having clause after the group by, like:
having sum(inv1.balance) != sum(inv2.balance)
Or try the query below:
with table2014 as
(
select Customer ,sum(Balance) Balance2014
from tableName
where dATE ='30-JUN-2014' group by Customer
)
,Table2015 as
( select Customer ,sum( Balance) Balance2015
from tableName
where dATE ='30-juN-2015' group by Customer
)
SELECT
inv1.customer,Balance2014, Balance2015
from table2014 inv1
left join Table2015 inv2
on inv1.customer= inv2.customer
--where Balance2014 !=Balance2015
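The two-CTE left join can be run against the question's sample rows to see the shape of the result. This is a sketch under assumptions: SQLite via Python's sqlite3 module instead of MySQL, a made-up table name balances, a column as_of in place of date, and ISO date strings standing in for "june 14", "june 15" and "june 16".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (customer_name INT, as_of TEXT, balance INT)")
conn.executemany(
    "INSERT INTO balances VALUES (?, ?, ?)",
    [
        (123, "2014-06-30", 20),
        (123, "2015-06-30", 30),
        (1234, "2014-06-30", 30),
        (12345, "2016-06-30", 50),
    ],
)

query = """
WITH b2014 AS (
    SELECT customer_name, SUM(balance) AS june14bal
    FROM balances
    WHERE as_of = '2014-06-30'
    GROUP BY customer_name
),
b2015 AS (
    SELECT customer_name, SUM(balance) AS june15bal
    FROM balances
    WHERE as_of = '2015-06-30'
    GROUP BY customer_name
)
SELECT a.customer_name, a.june14bal, b.june15bal
FROM b2014 AS a              -- start from 2014 customers only
LEFT JOIN b2015 AS b
       ON a.customer_name = b.customer_name
ORDER BY a.customer_name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Customer 12345 has no 2014 record and is left out, while customer 1234 keeps a NULL (None in Python) for 2015.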

Subtract the value of a row from grouped result

I have a table supplier_account which has five columns: supplier_account_id (pk), supplier_id (fk), voucher_no, debit and credit. I want to get the sum of debit grouped by supplier_id and then subtract the value of credit for the rows in which voucher_no is not null, so that for each subsequent row the sum of debit is reduced further. I have tried using a 'with' clause.
with debitdetails as(
select supplier_id,sum(debit) as amt
from supplier_account group by supplier_id
)
select acs.supplier_id,s.supplier_name,acs.purchase_voucher_no,acs.purchase_voucher_date,dd.amt-acs.credit as amount
from supplier_account acs
left join supplier s on acs.supplier_id=s.supplier_id
left join debitdetails dd on acs.supplier_id=dd.supplier_id
where voucher_no is not null
But here the debit value will be the same for all rows. After the subtraction in the first row, I want to carry the result into the second row and subtract the next credit value from that.
I know it is possible by using temporary tables. The problem is I cannot use temporary tables because the procedure is used to generate reports using Jasper Reports.

What you need is a running total. The easiest way to do it is with the help of a window function:
with debitdetails as(
select id,sum(debit) as amt
from suppliers group by id
)
select s.id, purchase_voucher_no, dd.amt, s.credit,
dd.amt - sum(s.credit) over (partition by s.id order by purchase_voucher_no asc)
from suppliers s
left join debitdetails dd on s.id=dd.id
order by s.id, purchase_voucher_no
SQL Fiddle
Results:
| id | purchase_voucher_no | amt | credit | ?column? |
|----|---------------------|-----|--------|----------|
| 1 | 1 | 43 | 5 | 38 |
| 1 | 2 | 43 | 18 | 20 |
| 1 | 3 | 43 | 8 | 12 |
| 2 | 4 | 60 | 5 | 55 |
| 2 | 5 | 60 | 15 | 40 |
| 2 | 6 | 60 | 30 | 10 |
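The running subtraction can be reproduced with a small script. It is a sketch only: it runs on SQLite (3.25 or newer, for window functions) through Python's sqlite3 module, and the per-row debit values are made up so that they add up to the 43 and 60 shown in the results, since the original fiddle data is not given here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE suppliers (id INT, purchase_voucher_no INT, debit INT, credit INT)"
)
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, 5),   # debit values are invented; only their sums (43 and 60)
        (1, 2, 20, 18),  # match the results table
        (1, 3, 13, 8),
        (2, 4, 20, 5),
        (2, 5, 20, 15),
        (2, 6, 20, 30),
    ],
)

query = """
WITH debitdetails AS (
    SELECT id, SUM(debit) AS amt
    FROM suppliers
    GROUP BY id
)
SELECT s.id, s.purchase_voucher_no, dd.amt, s.credit,
       -- total debit minus the running sum of credits so far
       dd.amt - SUM(s.credit) OVER (
           PARTITION BY s.id ORDER BY s.purchase_voucher_no
       ) AS remaining
FROM suppliers AS s
LEFT JOIN debitdetails AS dd ON s.id = dd.id
ORDER BY s.id, s.purchase_voucher_no
"""
rows = conn.execute(query).fetchall()
remaining = [r[4] for r in rows]
print(remaining)
```

Naming the computed column (remaining here) avoids the ?column? header seen in the results table.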
