I am trying to write an SQL query that shows STORES that stopped ordering in a given month, that is, STORES that have orders in the previous month but no orders in that month. For example, STORES that have orders in January but do not have orders in February would be the STORES that stopped ordering in February. I want to do this for every month (grouped) for a given date range - #datefrom-#dateto
I have one table with an INVOICE#,STORE# and a DATE column
I guess distinct STORE would be in there somewhere.
You can try something like this: break the data into two select statements, left outer join them on the store, and keep only the rows that have no match.
select distinct table1.store from (select * from table where date = 'January') as table1
left outer join (select * from table where date = 'February') as table2
on table1.store = table2.store
where table2.store is null
This will return the stores with orders in January that have no matching orders in February.
ps. that was not an exact sql statement, just an idea
I have an example that might be close to what you desire. You may have to tweak it to your convenience and desired performance - http://sqlfiddle.com/#!3/231c4/15
create table test (
invoice int identity,
store int,
dt date
);
-- let's add some data to show that
-- store 1 ordered in Jan, Feb and Mar
-- store 2 ordered in Jan (missed Feb and Mar)
-- store 3 ordered in Jan and Mar (missed Feb)
insert into test (store, dt) values
(1, '2015-01-01'),(1, '2015-02-01'),(1, '2015-03-01'),
(2, '2015-01-01'),
(3, '2015-01-01'), (3, '2015-03-01');
Query
-----
with
months as (select distinct year(dt) as yr, month(dt) as mth from test),
stores as (select distinct store from test),
months_stores as (select * from months cross join stores)
select *
from months_stores ms
left join test t
on t.store = ms.store
and year(t.dt) = ms.yr
and month(t.dt) = ms.mth
where
(ms.yr = 2015 and ms.mth between 1 and 3)
and t.invoice is null
Result:
yr mth store ...other columns
2015 2 2
2015 2 3
2015 3 2
The results show us that store 2 missed orders in months Feb and Mar
and store 3 missed an order in Feb
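Note that the query above lists every store/month combination without orders, so store 2 also shows up for March even though it had already stopped in February. Below is a minimal sketch of the stricter reading from the question (ordered in the previous month, nothing in this month). It uses Python's built-in sqlite3 module purely so it can be run anywhere; the date functions are SQLite's, not SQL Server's, and the `:date_from` / `:date_to` parameters are assumed stand-ins for the date range in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (invoice INTEGER PRIMARY KEY, store INTEGER, dt TEXT);
    INSERT INTO test (store, dt) VALUES
        (1, '2015-01-01'), (1, '2015-02-01'), (1, '2015-03-01'),
        (2, '2015-01-01'),
        (3, '2015-01-01'), (3, '2015-03-01');
""")

# A store "stopped" in month M if it has an order in month M-1 but none in M.
# For each order row we look at the month that follows it and check that the
# same store has no order in that following month.
stopped = conn.execute("""
    SELECT DISTINCT
        strftime('%Y-%m', prev.dt, 'start of month', '+1 month') AS stopped_month,
        prev.store
    FROM test AS prev
    WHERE date(prev.dt, 'start of month', '+1 month') BETWEEN :date_from AND :date_to
      AND NOT EXISTS (
          SELECT 1
          FROM test AS cur
          WHERE cur.store = prev.store
            AND strftime('%Y-%m', cur.dt)
                = strftime('%Y-%m', prev.dt, 'start of month', '+1 month')
      )
    ORDER BY stopped_month, prev.store
""", {"date_from": "2015-02-01", "date_to": "2015-03-31"}).fetchall()

for month, store in stopped:
    print(month, store)
```

With the sample data, only stores 2 and 3 are reported, both for February; store 2 is not reported again for March because it had no February order to "stop" from.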
I have a historical database with about 9000 records with unique UserID and date they created an account CreatedDate that looks like this:
UserID CreatedDate
1 5/12/2019
2 1/1/2018
3 4/2/2015
4 8/9/2016
. ..
I would like to know how many accounts were created UP TO a certain date, but for multiple months.
For example, how many accounts were there in Jan 2020, Feb 2020, Mar 2020, so on and so forth.
The manual way would be to do this for each month but it would be tedious:
select count(*)
from SCHEMA
--KEEP REPLACING THE MONTH TO GET COUNTS
where CreatedDate <= '2020-01-31'
Just wondering if there is a more efficient way? A group by wouldn't work because it just totals for each month, but I'm trying to get a historical count. Thanks!
You seem to need a running total for each month. If so, you need group by to compute the counts per month, and then you sum them up with the analytic (window) sum function.
This is how you would do it in Postgres (db fiddle). Other vendors may differ in how the month is extracted, but the principle is the same.
with schema(UserID, CreatedDate) as (values
(1, date '2019-12-05'),
(2, date '2018-01-01'),
(3, date '2015-01-04'),
(4, date '2016-09-08')
)
select month, sum(cnt) over (order by month) from (
select date_trunc('month', CreatedDate)::date as month, count(*) as cnt
from schema
group by date_trunc('month', CreatedDate)::date
) x
Note: if the data has gaps in the month sequence and you want a continuous sequence (for example all months between 2015-01 and 2019-12), you have to pregenerate a calendar (a relation with all months) and left join table schema to it. (It is not in my example yet because of YAGNI.)
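For the gap case, here is a minimal sketch of the calendar idea, run through Python's built-in sqlite3 module so it is self-contained. It is not the Postgres query above: instead of a window sum it uses a correlated count per calendar month, which gives the same "accounts up to this month" figure. The table name `accounts` and the `:first_month` / `:last_month` parameters are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (UserID INTEGER PRIMARY KEY, CreatedDate TEXT);
    INSERT INTO accounts VALUES
        (1, '2019-12-05'), (2, '2018-01-01'), (3, '2015-01-04'), (4, '2016-09-08');
""")

rows = conn.execute("""
    WITH RECURSIVE calendar(month_start) AS (
        -- one row per month, from :first_month through :last_month
        SELECT :first_month
        UNION ALL
        SELECT date(month_start, '+1 month')
        FROM calendar
        WHERE month_start < :last_month
    )
    SELECT strftime('%Y-%m', c.month_start) AS month,
           -- accounts created before the first day of the next month
           (SELECT COUNT(*)
              FROM accounts AS a
             WHERE a.CreatedDate < date(c.month_start, '+1 month')) AS accounts_so_far
    FROM calendar AS c
    ORDER BY c.month_start
""", {"first_month": "2019-11-01", "last_month": "2020-02-01"}).fetchall()

for month, total in rows:
    print(month, total)
```

Months with no new accounts (January and February 2020 here) still get a row and carry the previous total forward, which is what a plain group by cannot do.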
I am trying to figure out if an event occurred in the three consecutive previous years by month. For example:
Item Type Month Year
Hat S May 2015
Shirt P June 2015
Hat S June 2015
Hat S May 2016
Shirt P May 2016
Hat S May 2017
I am interested in seeing what item was purchased/sold for three consecutive years in the same month. Hat was sold in May in 2015, 2016, and 2017; therefore, I would like to identify that. Shirt was purchased in June 2015 and May 2016. Since this is different months in consecutive years, it does not qualify.
Essentially, I want it to be able to look back 3 years and identify those purchases/sales that reoccurred in the same month each year, preferably with an indicator variable.
I tried the following code:
select distinct a.*
from dataset as a inner join dataset as b
on a.type = b.type
and a.month = b.month
and a.item = b.item
and a.year = b.year-1
and a.year = b.year-2;
I want to get:
Item Type Month Year
Hat S May 2015
Hat S May 2016
Hat S May 2017
I guess I should add that my data is longer than 2015-2017. It spans 10 years, but I want to see if there are any 3 consecutive years (or more) within that 10 year span.
There are many ways to do this. One way in SQL, relying on the fact that rows can be grouped by Item and Month, is to restrict Year to the three years from 2015 to 2017. To qualify as 3 consecutive years, the count of distinct values of year within the group must be 3. This criterion also handles data with repetition, such as a group with 3 S-type Hats and 3 P-type Hats.
select item, type, month, year
from have
where year between 2015 and 2017
group by item, month
having count(distinct year) = 3
order by item, type, month, year
For the more general problem of identifying runs within a group, the SAS DATA step is well suited and powerful. The serial DOW loop technique first loops over a range of rows based on some condition while computing a group metric, in this case the run length of consecutive years. A second loop then iterates over the same rows and uses the group metric.
Consider this example in which the rungroup is computed based on year adjacency of item/month. Once the rungroups are established, the double DOW technique is applied.
data have;
do comboid = 1 to 1000;
itemid = ceil(10 * ranuni(123));
typeid = ceil(2* ranuni(123));
month = ceil(12 * ranuni(123));
year = 2009 + floor (10 * ranuni(123));
output;
end;
run;
proc sort data=have;
by itemid month year;
run;
data have_rungrouped;
set have;
by itemid month year;
rungroup + (first.month or not first.month and year - lag(year) > 1);
run;
data want;
do index = 1 by 1 until (last.rungroup);
set have_rungrouped;
by rungroup;
* distinct number of years in rungroup;
years_runlength = sum (years_runlength, first.rungroup or year ne lag(year));
end;
do index = 1 to index;
set have_rungrouped;
if years_runlength >= 3 then output;
end;
run;
Here is an example that would check if any item happened in consecutive years and list all from original table that qualify for at least two consecutive years:
DECLARE @table TABLE
(
Item NVARCHAR(MAX),
Type CHAR,
Month NVARCHAR(MAX),
Year INT
)
INSERT INTO @table VALUES
('Hat','S','May','2015'),
('Shirt','P','June','2015'),
('Hat','S','June','2015'),
('Hat','S','May','2016'),
('Shirt','P','May','2016'),
('Hat','S','May','2017')
SELECT * FROM @table
WHERE CONCAT(Item,Month) IN
(
SELECT CONCAT(group1.Item, group1.Month) FROM
(
SELECT Item,Year,Month FROM @table
GROUP BY Year, Item, Month
) group1
FULL OUTER JOIN
(
SELECT Item,Year,Month FROM @table
GROUP BY Year, Item, Month
) group2
ON group1.Year = group2.Year + 1 AND group1.Item = group2.Item AND group1.Month = group2.Month
WHERE group1.Item IS NOT NULL AND group2.Item IS NOT NULL
)
ORDER BY Item,Month,Year
As you can see I found all items that matched year + 1 in the same month.
OUTPUT:
Hat S May 2015
Hat S May 2016
Hat S May 2017
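For data spanning more than three years, the general "run of consecutive years" check can be sketched outside SQL as well. The following is only an illustration in Python, not a translation of any of the answers above: it assumes rows are grouped by item, type and month (as in the join attempted in the question), and the function name `consecutive_year_rows` and the `min_run` parameter are made up for the example.

```python
from itertools import groupby

rows = [
    ("Hat", "S", "May", 2015),
    ("Shirt", "P", "June", 2015),
    ("Hat", "S", "June", 2015),
    ("Hat", "S", "May", 2016),
    ("Shirt", "P", "May", 2016),
    ("Hat", "S", "May", 2017),
]

def consecutive_year_rows(rows, min_run=3):
    """Return the rows whose (item, type, month) occurs in at least
    min_run consecutive years, in their original order."""
    years_by_key = {}
    for item, type_, month, year in rows:
        years_by_key.setdefault((item, type_, month), set()).add(year)

    qualifying = set()  # (item, type, month, year) tuples inside a long enough run
    for key, years in years_by_key.items():
        ordered = sorted(years)
        # Within a run of consecutive years, (year - position) is constant,
        # so grouping on that difference splits the years into runs.
        for _, run in groupby(enumerate(ordered), key=lambda p: p[1] - p[0]):
            run_years = [year for _, year in run]
            if len(run_years) >= min_run:
                qualifying.update(key + (year,) for year in run_years)

    return [row for row in rows if row in qualifying]

result = consecutive_year_rows(rows)
for row in result:
    print(*row)
```

The `year - position` trick is the same idea as the usual "gaps and islands" approach in SQL, where a row number is subtracted from the year to label each run.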
I'm trying to write a SQL query that will sum total production from the following two example tables:
Table: CaseLots
DateProduced kgProduced
October 1, 2013 10000
October 1, 2013 10000
October 2, 2013 10000
Table: Budget
OperatingDate BudgetHours
October 1, 2013 24
October 2, 2013 24
I would like to output a table as follows:
TotalProduction TotalBudgetHours
30000 48
Here is what I have for code so far:
SELECT
Sum(kgProduced) AS TotalProduction, Sum(BudgetHours) AS TotalBudgetHours
FROM
dbo.CaseLots INNER JOIN dbo.Budget ON dbo.CaseLots.DateProduced = dbo.Budget.OperatingDate
WHERE
dbo.Budget.OperatingDate BETWEEN '2013-10-01' AND '2013-10-02'
It seems that the query is double summing the budget hour in instances where more than one case lot is produced in a day. The table I'm getting is as follows:
Total Production BudgetHours
30000 72
How do I fix this?
Think about what the INNER JOIN is doing.
For every row in CaseLots, it's finding any row in Budget that has a matching date.
If you were to remove your aggregation statements in SQL, and just show the inner join, you would see the following result set:
DateProduced kgProduced OperatingDate BudgetHours
October 1, 2013 10000 October 1, 2013 24
October 1, 2013 10000 October 1, 2013 24
October 2, 2013 10000 October 2, 2013 24
(dammit StackOverflow, why don't you have Markdown for tables :( )
Running your aggregation on top of that it is easy to see how you get the 72 hours in your result.
The correct query needs to aggregate the CaseLots table first, then join onto the Budget table.
SELECT DateProduced, TotalKgProduced, SUM(BudgetHours) AS TotalBudgetHours
FROM
(
SELECT DateProduced, SUM(kgProduced) AS TotalKgProduced
FROM CaseLots
GROUP BY DateProduced
) AS TotalKgProducedByDay
INNER JOIN
Budget
ON TotalKgProducedByDay.DateProduced = Budget.OperatingDate
WHERE DateProduced BETWEEN '1 Oct 2013' AND '2 Oct 2013'
GROUP BY DateProduced, TotalKgProduced
The problem is that the INNER JOIN produces a 3-row table, since the keys match on every row. So there are three '24's, which sum to 72.
To fix this, it would probably be easier to split this into two queries.
SELECT Sum(kgProduced) AS TotalProduction
FROM dbo.CaseLots
WHERE dbo.CaseLots.DateProduced BETWEEN '2013-10-01' AND '2013-10-02';
SELECT Sum(BudgetHours) AS TotalBudgetHours
FROM dbo.Budget
WHERE dbo.Budget.OperatingDate BETWEEN '2013-10-01' AND '2013-10-02';
This could be easily achieved by this:
SELECT
(SELECT SUM(kgProduced) FROM dbo.CaseLots WHERE DateProduced BETWEEN '2013-10-01' AND '2013-10-02') AS TotalProduction,
(SELECT SUM(BudgetHours) FROM dbo.Budget WHERE OperatingDate BETWEEN '2013-10-01' AND '2013-10-02') AS TotalBudgetHours
There's no need for joining the two tables.
The other answers are simpler for this particular case. However, if you needed to SUM 10 different values on the CaseLots table, you'd need 10 different subqueries. The following is a general, more scalable solution:
SELECT
SUM(DayKgProduced) AS TotalProduction,
SUM(BudgetHours) AS TotalBudgetHours
FROM (
SELECT
DateProduced,
SUM(kgProduced) AS DayKgProduced
FROM dbo.CaseLots
WHERE DateProduced BETWEEN '2013-10-01' AND '2013-10-02'
GROUP BY DateProduced
) DailyTotals
INNER JOIN dbo.Budget b ON DailyTotals.DateProduced = b.OperatingDate
First you SUM the production of each CaseLot without having to SUM the BudgetHours. If you used a SELECT * FROM in the query above you'd see:
Date DayKgProduced BudgetHours
2013-10-01 20000 24
2013-10-02 10000 24
But you want the overall total, so we SUM those daily values, correctly producing:
TotalProduction TotalBudgetHours
30000 48
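To see the double counting and the fix side by side, here is a small reproduction using Python's built-in sqlite3 module with the sample data from the question. It is only a sketch: the dates are stored as ISO text and the `dbo.` schema prefix is dropped because SQLite has no such schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CaseLots (DateProduced TEXT, kgProduced INTEGER);
    CREATE TABLE Budget (OperatingDate TEXT, BudgetHours INTEGER);
    INSERT INTO CaseLots VALUES
        ('2013-10-01', 10000), ('2013-10-01', 10000), ('2013-10-02', 10000);
    INSERT INTO Budget VALUES ('2013-10-01', 24), ('2013-10-02', 24);
""")

# Joining first repeats the Budget row once per case lot on that day.
naive = conn.execute("""
    SELECT SUM(kgProduced), SUM(BudgetHours)
    FROM CaseLots
    JOIN Budget ON CaseLots.DateProduced = Budget.OperatingDate
""").fetchone()

# Aggregating CaseLots to one row per day before the join avoids the repeat.
fixed = conn.execute("""
    SELECT SUM(DayKgProduced), SUM(BudgetHours)
    FROM (SELECT DateProduced, SUM(kgProduced) AS DayKgProduced
          FROM CaseLots
          GROUP BY DateProduced) AS DailyTotals
    JOIN Budget ON DailyTotals.DateProduced = Budget.OperatingDate
""").fetchone()

print("join then sum:", naive)
print("sum then join:", fixed)
```

The first query counts the 24 budget hours for October 1 twice, once per case lot; the second sees each day exactly once.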
Try this:
select DateProduced,TotalProduction,TotalBudgetHours from
(select DateProduced,sum(kgProduced) as TotalProduction
from CaseLots group by DateProduced) p
join
(select OperatingDate,sum(BudgetHours) as TotalBudgetHours
from Budget group by OperatingDate) b
on (p.DateProduced=b.OperatingDate)
where p.DateProduced between '2013-10-01' AND '2013-10-02'
I have a table that has the following structure:
Account_No Contact Date
-------------------------
1 2013-10-1
2 2013-9-12
3 2013-10-15
3 2013-8-1
3 2013-8-20
2 2013-10-25
4 2013-9-12
4 2013-10-2
I need to search the table and return any account numbers that have two contact dates that are within 30 days of each other. Some account numbers may have 5 or 6 contact dates. I essentially just need to return all of the full account numbers and the records that are within 30 days of each other and ignore the rest. Contact date is being stored as a date data type.
So for example account number 3 would return the 2013-8-1 and the 2013-8-20 records, and both of the records for account number 4 would appear as well, but not the other account number records nor the account number 3 from 2013-10-15.
I am using SQL Server 2008 R2.
Thanks for any help in advance!
You can use DATEADD for the +/-30 days and compare against the time window:
DECLARE @ContactDates TABLE (
Account_No int
, Contact Date
)
-- Sample data
INSERT @ContactDates (Account_No, Contact)
VALUES
(1, '2013-10-01')
, (2, '2013-09-12')
, (3, '2013-10-15')
, (3, '2013-08-01')
, (3, '2013-08-20')
, (2, '2013-10-25')
, (4, '2013-09-12')
, (4, '2013-10-02')
-- Find the records within +/-30 days
SELECT c1.Account_No, c1.Contact AS Contact_Date1
FROM @ContactDates AS c1
JOIN (
-- Inner query with the time window
SELECT Account_No
, Contact
, DATEADD(dd, 30, Contact) AS Date_Max
, DATEADD(dd, -30, Contact) AS Date_Min
FROM @ContactDates
) AS c2
-- Compare based on account number, exclude the same date
-- from comparing against itself. Usually this would be
-- a primary key, but this example doesn't show a PK.
ON (c1.Account_No = c2.Account_No AND c1.Contact != c2.Contact)
-- Compare against the +/-30 day window
WHERE c1.Contact BETWEEN c2.Date_Min AND c2.Date_Max
This returns the following:
Account_No Contact
========== ==========
3 2013-08-20
3 2013-08-01
4 2013-10-02
4 2013-09-12
In SQL Server 2012, you would have the lag() and lead() functions. In 2008, you can do the following for values that are in the same calendar month:
select distinct account_no
from t t1
where exists (select 1
from t t2
where t1.account_no = t2.account_no and
t1.ContactDate <> t2.ContactDate and
datediff(month, t1.ContactDate, t2.ContactDate) = 0
)
There is a bit of a challenge in defining what a "month" is when dates are in different months. (Is March 16 "one month" after Feb 15? They are closer in time than Jan 1 and Jan 31.) You could just go with 30 days:
select distinct account_no
from t t1
where exists (select 1
from t t2
where t1.account_no = t2.account_no and
t2.ContactDate > t1.ContactDate and
datediff(day, t1.ContactDate, t2.ContactDate) <= 30
)
Is there an id for each of these records? If so, you won't need to create one like I did, but based on the data you posted:
With cte as
(
Select *,
row_number() over (order by contact_date) id
From tbl
)
Select *
From cte b
Where exists (
Select 1
From cte a
Where a.account_no = b.account_no
And a.id <> b.id
And a.contact_date between dateadd(d, -30, b.contact_date) and dateadd(d, 30, b.contact_date)
)
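The pairing logic can be checked quickly against the sample data. The sketch below uses Python's built-in sqlite3 module, so the date arithmetic is SQLite's `julianday` rather than SQL Server's DATEADD/DATEDIFF; the table name `contacts` is an assumption, and SQLite's implicit `rowid` plays the role of the generated id so a row is never compared with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (account_no INTEGER, contact_date TEXT);
    INSERT INTO contacts VALUES
        (1, '2013-10-01'), (2, '2013-09-12'), (3, '2013-10-15'), (3, '2013-08-01'),
        (3, '2013-08-20'), (2, '2013-10-25'), (4, '2013-09-12'), (4, '2013-10-02');
""")

# Keep a row when the same account has another row at most 30 days away,
# in either direction, so both rows of each close pair are returned.
close_rows = conn.execute("""
    SELECT c1.account_no, c1.contact_date
    FROM contacts AS c1
    WHERE EXISTS (
        SELECT 1
        FROM contacts AS c2
        WHERE c2.account_no = c1.account_no
          AND c2.rowid <> c1.rowid
          AND ABS(julianday(c2.contact_date) - julianday(c1.contact_date)) <= 30
    )
    ORDER BY c1.account_no, c1.contact_date
""").fetchall()

for account_no, contact_date in close_rows:
    print(account_no, contact_date)
```

Account 2 drops out because its two contacts are 43 days apart, and the 2013-10-15 row for account 3 drops out because its nearest neighbour is 56 days away.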
When I run a SQL query on the database, the values for each month that is present in the database are summed up in the Amount column.
Suppose this is a table
Month Category Amount Year
January Rent 12 2011
March Food 13 2011
January Gas 14 2011
May Enter 15 2011
March General 16 2011
So I wrote this query to sum all the values of a particular month:
"SELECT Month, SUM(Amount) AS OrderTotal FROM budget1 WHERE year="2011" GROUP BY month "
So I got the result as this:-
Month Amount
January 26
March 29
May 15
But what I want is for it to show all the months from January to December, with a value of 0 in front of the months that are not in the database, like this for the above example:
Month Amount
January 26
February 0
March 29
April 0
May 15
June 0
July 0
August 0
September 0
October 0
November 0
December 0
Any help will be appreciated..!!
Create a table with all months Jan-Dec, call it Months. Just a single column with the names or add an extra integer for sort order (I usually call this the ordinal column), as follows:
create table months (
month varchar(20),
ordinal int
);
insert into months values ('January', 1);
insert into months values ('February', 2);
insert into months values ('March', 3);
...
insert into months values ('December', 12);
The specific syntax may depend upon your database platform. Then, depending upon your database:
SELECT months.month, SUM(Amount) AS OrderTotal
FROM months
left join budget1
on months.month = budget1.Month
and budget1.year = 2011
GROUP BY months.month, months.ordinal
ORDER BY months.ordinal
You'll need to convert SUM(Amount) to 0 when null. The specific function or approach to do this depends upon your database platform, or you can just do it in the code that is interpreting the results.
Build a month table, with your months and the sort order. Then left join your month column to the month column in your data table. That will get you the zeros.
So your table will look like
Month Sort
======================
January 1
February 2
March 3
etc.
You can create the table by using CREATE TABLE, followed by INSERT statements
CREATE TABLE #months (month VARCHAR(50), sort INT);
INSERT INTO #months VALUES ('January', 1);
INSERT INTO #months VALUES ('February', 2);
etc.
Then
SELECT m.month, COALESCE(SUM(Amount), 0) AS OrderTotal
FROM #months m LEFT OUTER JOIN budget1 ON m.month = budget1.Month AND budget1.year = 2011
GROUP BY m.month, m.sort
ORDER BY m.sort
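One detail worth checking when trying this out is where the year filter goes: a condition on budget1 in the WHERE clause throws away the months that have no matching row, so it belongs in the join condition. Here is a minimal sketch using Python's built-in sqlite3 module with the sample data from the question. The February 2012 row is a made-up extra, added only to show that other years do not leak into the 2011 totals.

```python
import sqlite3

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE budget1 (Month TEXT, Category TEXT, Amount INTEGER, Year INTEGER)")
conn.executemany("INSERT INTO budget1 VALUES (?, ?, ?, ?)", [
    ("January", "Rent", 12, 2011), ("March", "Food", 13, 2011),
    ("January", "Gas", 14, 2011), ("May", "Enter", 15, 2011),
    ("March", "General", 16, 2011),
    ("February", "Rent", 99, 2012),  # hypothetical row from another year
])
conn.execute("CREATE TABLE months (month TEXT, ordinal INTEGER)")
conn.executemany("INSERT INTO months VALUES (?, ?)",
                 [(name, i) for i, name in enumerate(MONTH_NAMES, start=1)])

# The year filter sits in the ON clause, so months without 2011 data
# survive the LEFT JOIN and COALESCE turns their NULL sum into 0.
totals = conn.execute("""
    SELECT m.month, COALESCE(SUM(b.Amount), 0) AS OrderTotal
    FROM months AS m
    LEFT JOIN budget1 AS b
           ON b.Month = m.month AND b.Year = :year
    GROUP BY m.month, m.ordinal
    ORDER BY m.ordinal
""", {"year": 2011}).fetchall()

for month, total in totals:
    print(month, total)
```

All twelve months come back in calendar order, with 26, 29 and 15 for January, March and May and 0 everywhere else.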