SQL moving average with calculation of average price per square metre

I'm trying to get a moving average for a set of real estate data that I can then graph, using MS SQL 2005.
Fields are: DateSold datetime, FloorSQM decimal(8,2), Price decimal(10,0)
What I need is the average price per square metre of all the properties sold within each month, as far back as we have data, which can be several years.
So I need to compute Price/FloorSQM for each property, and then apply a moving average to them for each month. Hope that makes sense.
I am completely stuck trying to figure this out, and googling endlessly is endlessly confusing.
I thought I found what I'm looking for in another post, but I keep getting "incorrect syntax near 'm'" when I try to compile it.
SELECT YEAR(DateSold) AS year,
MONTH(DateSold) AS month,
ROUND(Euros/CONVERT(DECIMAL(10,2),FloorSQM),2) as avg
FROM
(select s.*, DATE_ADD(DTE, INTERVAL m.i MONTH) DateSold, m.i
FROM Property s
cross join (select 0 i union select 1 union select 2 union select 3 union
select 4 union select 5 union select 6 union select 7 union
select 8 union select 9 union select 10 union select 11) m) sq
WHERE DateSold <= curdate()
GROUP BY YEAR(DateSold), MONTH(DateSold)
When I have this working, I then need to introduce a WHERE clause so I can limit the properties found to certain kinds (houses, apartments) and areas (New York, Los Angeles, etc.).
Can anyone lend a hand please? Many thanks for your help.

This should run; you were using some MySQL syntax:
SELECT YEAR(DateSold) AS year,
MONTH(DateSold) AS month,
ROUND(Euros/CONVERT(DECIMAL(10,2),FloorSQM),2) as avg
FROM (SELECT s.*, DATEADD(MONTH,m.i,DateSold) DateSold2, m.i
FROM Property s
cross join (select 0 i union select 1 union select 2 union select 3 union
select 4 union select 5 union select 6 union select 7 union
select 8 union select 9 union select 10 union select 11
) m
)sq
WHERE DateSold<= GETDATE()
GROUP BY YEAR(DateSold), MONTH(DateSold)
Not sure if it returns the desired results.
Update: I thought DTE was a field name. You can't have DateSold referenced twice in your subquery, so I changed it to DateSold2. I'm not sure if you intend for the DATEADD() result to be used everywhere DateSold is used; you could also just remove DateSold from your subquery's SELECT list.

Try this:
SELECT YEAR(DateSold) AS year,
MONTH(DateSold) AS month,
AVG(ROUND(Price/CONVERT(DECIMAL(10,2),FloorSQM),2)) as avg
FROM property p
GROUP BY YEAR(DateSold), MONTH(DateSold);
Note: I changed the query from Euros to Price, as you specified in the schema.
Grouping by year and month is all you need to get the average price for that month. You were missing the aggregate function AVG around your formula (though it might be more accurate with ROUND wrapped around AVG). I set up a quick sample at SQLFiddle.
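To make that shape concrete, here is a SQLite sketch (run from Python; the sample figures are invented) of the same year/month grouping of Price/FloorSQM:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE property (DateSold TEXT, FloorSQM REAL, Price REAL)")
con.executemany("INSERT INTO property VALUES (?, ?, ?)",
                [("2010-01-05", 100.0, 200000),   # 2000 per sqm
                 ("2010-01-20", 50.0, 120000),    # 2400 per sqm
                 ("2010-02-11", 80.0, 160000)])   # 2000 per sqm
# average price per square metre, grouped by sale year and month
out = con.execute("""
    SELECT strftime('%Y', DateSold) AS year,
           strftime('%m', DateSold) AS month,
           ROUND(AVG(Price / FloorSQM), 2) AS avg_psm
    FROM property
    GROUP BY year, month
    ORDER BY year, month
""").fetchall()
print(out)  # [('2010', '01', 2200.0), ('2010', '02', 2000.0)]
```

A WHERE clause for property type or area would slot in before the GROUP BY, exactly as it would in T-SQL.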

Related

How to filter the last 7 days based on the previous query? -BigQuery

Hi, I just want to ask how to resolve this problem. An example is in the query indicated below.
In the next query I will prepare, I want to filter on the last 7 days of the delivery date. I can't use CURRENT_DATE, because the maximum date in the data is well behind the current date.
Assuming the current date is 7/12/2022 but the query shows a maximum date of 7/07/2022, how can I filter the dates from 7/1/2022 to 7/07/2022?
, Datas1 as
(select distinct (delivery_due_date) as delivery_date
, Specialist
, Id_number
, Staff_Total as Total_Items
from joining
where Delivery_Due_Date is not null
)
I actually tried using the MAX function in the WHERE clause, but I got an error. Please help me.
Created examples of such data in the first block.
Performed the select on that data in the second block.
Extracted the maximum delivery date in the third block.
Restricted the last block to the 7 days of data leading up to the maximum found in the third block.
WITH joining AS(
SELECT '2022-07-01' AS delivery_due_date, 'ABC' as Specialist,222 as Id_number, 21 as Staff_Total union all
SELECT '2022-07-07' AS delivery_due_date, 'ABC2' as Specialist,223 as Id_number, 01 as Staff_Total union all
SELECT '2022-07-15' AS delivery_due_date, 'ABC4' as Specialist,212 as Id_number, 25 as Staff_Total union all
SELECT '2022-07-20' AS delivery_due_date, 'AB5C' as Specialist,224 as Id_number, 15 as Staff_Total union all
SELECT '2022-07-05' AS delivery_due_date, 'ABC7' as Specialist,226 as Id_number, 87 as Staff_Total ),
Datas1 as (select distinct (delivery_due_date) as delivery_date , Specialist
, Id_number , Staff_Total as Total_Items from joining where Delivery_Due_Date is not null ),
Datas2 as (
select max(delivery_date) as ddd from Datas1)
select Datas1.* from Datas1,Datas2 where date(delivery_date) between date_sub(date(Datas2.ddd), interval 7 day) and date(Datas2.ddd)
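The key move above is anchoring the 7-day window on MAX(delivery_due_date) rather than on the current date. A small SQLite sketch (made-up dates, run via Python) of the same idea:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE joining (delivery_due_date TEXT)")
con.executemany("INSERT INTO joining VALUES (?)",
                [("2022-07-01",), ("2022-07-07",), ("2022-06-20",)])
# anchor the 7-day window on MAX(delivery_due_date), not on the current date
out = con.execute("""
    WITH maxd AS (SELECT MAX(delivery_due_date) AS ddd FROM joining)
    SELECT delivery_due_date
    FROM joining, maxd
    WHERE delivery_due_date BETWEEN date(ddd, '-7 day') AND ddd
    ORDER BY delivery_due_date
""").fetchall()
print([d for (d,) in out])  # ['2022-07-01', '2022-07-07']
```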

Optimizing a SQL Query when joining two tables. Naive algorithm gives me millions of rows

I apologize; I am not sure how to word the heading for this question. If someone can rephrase it to better suit what I am asking, that would be greatly appreciated.
I have quite a problem that I have been stuck on for the longest time. I use Tableau in conjunction with SQL Server 2014.
I have a single table that essentially shows all the employees within our company, with their hire date and termination date (NULL if still employed). I am looking to generate a headcount for the past. Here is an example of this table:
employeeID HireDate TermDate FavouriteFish FavouriteColor
1 1/1/15 1/1/18 Cod Blue
2 4/12/16 NULL Bass Red
.
.
.
n
As you can see, this list can go on and on. In fact, the table in question currently has over 10,000 rows for all past and current employees.
My goal is to construct a view showing, for each day of the last 5 years, the total headcount of employed employees we had. Here is the kicker though... I need to retain the rest of the information, such as:
FavouriteFish FavouriteColor... and so on
The only way I can think of doing this, and it doesn't work so well because it is extremely slow, is to create a separate calendar table holding each day of the past 5 years, like so:
Date CrossJoinKey
1/1/2013 1
1/2/2013 1
1/3/2013 1
.
.
.
4/4/2018 1
From here I add a column to my original Employee table, called CrossJoinKey, like so:
employeeID HireDate TermDate FavouriteFish FavouriteColor CrossJoinKey
1 1/1/15 1/1/18 Cod Blue 1
2 4/12/16 NULL Bass Red 1
.
.
.
n
From here I create a LEFT JOIN Calendar ON Employee.CrossJoinKey = Calendar.CrossJoinKey.
Hopefully you can immediately see the problem: it creates a relationship with A LOT OF ROWS!! In fact, it gives me somewhere around 18 million rows. It gives me the information I am after, but it takes a LONG time to query, and when I import this into Tableau to create an extract, that takes a LONG time as well. However, once Tableau eventually creates the extract, it is relatively fast. I can use the inner guts to isolate and create a headcount by day for the past 5 years, by checking whether the date field falls between HireDate and TermDate. But this entire process needs to run quite frequently, and I feel the current method is impractical.
I feel this is a naive way to accomplish what I am after, and this problem has to have been addressed before. Could anyone please shed some light on how to optimize this?
Word of note: I have considered essentially creating a query that populates a calendar table by looking through the employee table and 'counting' each employee still employed on each day, but this method loses resolution, and I am not able to retain any of the other data for the employees.
Something like this, shown below, works and is much faster, but NOT what I am looking for:
Date HeadCount
1/1/2013 1200
1/2/2013 1201
1/3/2013 1200
.
.
.
4/4/2018 5000
Thank you very much for spending some time on this.
UPDATE:
Here is a link to a google sheets data sample
I've edited some of your data, as you can see in the #example table.
I wanted to note: you spelt either Favourite or Color inconsistently =D Please correct it to one or the other: FavoriteColor or FavouriteColour.
declare #example as table (
exampleid int identity(1,1) not null primary key clustered
, StartDate date not null
, TermDate date null
);
insert into #example (StartDate, TermDate)
select '1/1/2016', '1/1/2018' union all
select '4/3/2017', '1/10/2018' union all
select '9/3/2016', '2/4/2018' union all
select '5/9/2017', '11/21/2017' union all
select '9/18/2016', '11/15/2017' union all
select '12/12/2015', '2/8/2018' union all
select '6/18/2016', '12/20/2017' union all
select '7/26/2015', '11/4/2017' union all
select '1/7/2015', NULL union all
select '10/2/2013', '10/21/2013' union all
select '10/14/2013', '12/12/2017' union all
select '10/11/2013', '11/3/2017' union all
select '6/30/2015', '1/12/2018' union all
select '2/17/2016', NULL union all
select '8/12/2015', '11/26/2017' union all
select '12/2/2015', '11/15/2017' union all
select '3/30/2016', '11/30/2017' union all
select '6/18/2016', '11/9/2017' union all
select '4/3/2017', '2/12/2018' union all
select '3/26/2017', '1/15/2018' union all
select '1/27/2017', NULL union all
select '7/29/2016', '1/10/2018';
--| This is an adaptation of Aaron Bertrand's work (time dim table)
--| this will control the start date
declare #date datetime = '2013-10-01';
;with cte as (
select 1 ID
, #date date_
union all
select ID + 1
, dateadd(day, 1, date_)
from cte
)
, cte2 as (
select top 1000 ID
, cast(date_ as date) date_
, 0 Running
, iif(datepart(weekday, date_) in(1,7), 0,1) isWeekday
, datepart(weekday, date_) DayOfWeek
, datename(weekday, date_) DayOfWeekName
, month(date_) Month
, datename(month, date_) MonthName
, datepart(quarter, date_) Quarter
from cte
--option (maxrecursion 1000)
)
, cte3 as (
select a.id
, Date_
, b.StartDate
, iif(b.StartDate is not null, 1, 0) Add_
, iif(c.TermDate is not null, -1, 0) Remove_
from cte2 a
left join #example b
on a.date_ = b.StartDate
left join #example c
on a.date_ = c.TermDate
-- option (maxrecursion 1000)
)
select date_
--, Add_
--, Remove_
, sum((add_ + remove_)) over (order by date_ rows unbounded preceding) CurrentCount
from cte3
option (maxrecursion 1000)
Result Set:
date_ CurrentCount
2013-10-01 0
2013-10-02 1
2013-10-03 1
2013-10-04 1
2013-10-05 1
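The core of this answer is an event-based running sum: +1 on each start date, -1 on each termination date, then a cumulative sum over the calendar. A plain-Python sketch with invented hire/term pairs:

```python
from collections import Counter
from datetime import date, timedelta

# invented sample: (hire, term) pairs; term=None means still employed
staff = [(date(2013, 10, 1), None),
         (date(2013, 10, 2), date(2013, 10, 21)),
         (date(2013, 10, 3), date(2013, 10, 4))]

delta = Counter()
for hire, term in staff:
    delta[hire] += 1        # employee counts from the hire date
    if term:
        delta[term] -= 1    # and drops out on the termination date

headcount, running = {}, 0
day = date(2013, 10, 1)
while day <= date(2013, 10, 5):
    running += delta[day]   # cumulative sum of the +1/-1 events
    headcount[day] = running
    day += timedelta(days=1)

print([headcount[date(2013, 10, d)] for d in range(1, 6)])  # [1, 2, 3, 2, 2]
```

This touches each employee twice and each calendar day once, instead of materializing the employee-by-day cross join.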

Sort Numbers in varchar value in SQL Server

My goal is to load a monthly/daily tabular presentation of sales data, with a sum total and other average computations at the bottom.
I have one result set with a column named 'Day', which corresponds to the days of the month and automatically gets the datatype int:
select datepart(day, a.date) as 'Day'
My second result set loads the sum at the bottom; the word 'Sum' ends up aligned with the Day column, and I used UNION ALL to combine the result sets. The expected result set is something like this:
day sales
1 10
2 20
3 30
4 10
5 20
6 30
.
.
.
31 10
Sum 130
What I did is convert the day value from int to varchar so the columns could be joined successfully, and it worked; the new conflict is the sorting of the numbers:
select * from #SalesDetailed
UNION ALL
select * from #SalesSum
order by location, day
Assuming your union query returns the correct results and just messes up the order, you can use CASE with ISNUMERIC in the ORDER BY clause to control your sort:
SELECT *
FROM
(
SELECT *
FROM #SalesDetailed
UNION ALL
SELECT *
FROM #SalesSum
) u
ORDER BY location,
ISNUMERIC(day) DESC,
CASE WHEN ISNUMERIC(day) = 1 THEN cast(day as int) end
The isnumeric will return 1 when day is a number and 0 when it's not.
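SQLite has no ISNUMERIC(), but the same two-key sort works with a GLOB test standing in for it; a sketch with invented rows, run from Python:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (day TEXT, sales INT)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("1", 10), ("2", 20), ("31", 10), ("Sum", 130)])
# GLOB '[0-9]*' plays ISNUMERIC's role: numeric rows sort first, by value,
# and the 'Sum' row falls to the bottom
out = con.execute("""
    SELECT day FROM sales
    ORDER BY (day GLOB '[0-9]*') DESC,
             CASE WHEN day GLOB '[0-9]*' THEN CAST(day AS INTEGER) END
""").fetchall()
print([d for (d,) in out])  # ['1', '2', '31', 'Sum']
```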
Try this
select Day, Sum(Col) as Sales
from #SalesDetailed
Group by Day With Rollup
Edit (Working Sample) :
select
CASE WHEN Day IS NULL THEN 'SUM' ELSE STR(Day) END as Days,
Sum(Sales) from
(
Select 1 as Day , 10 as Sales UNION ALL
Select 2 as Day , 20 as Sales
) A
Group by Day With Rollup
EDIT 2:
select CASE WHEN Day IS NULL THEN 'SUM' ELSE STR(Day) END as Days,
Sum(Sales) as Sales
from #SalesDetailed
Group by Day With Rollup
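For engines without WITH ROLLUP (SQLite, for example), the same output shape can be produced with a UNION ALL of the grouped rows plus one grand-total row; a sketch with invented data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sd (day INT, sales INT)")
con.executemany("INSERT INTO sd VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
# no WITH ROLLUP in SQLite: append the grand total with a UNION ALL branch
out = con.execute("""
    SELECT CAST(day AS TEXT) AS day, SUM(sales) AS sales FROM sd GROUP BY day
    UNION ALL
    SELECT 'SUM', SUM(sales) FROM sd
""").fetchall()
print(out)  # four rows; the grand-total row is ('SUM', 60)
```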

alternative to lag SQL command

I have a table like this.
Month-----Book_Type-----sold_in_Dollars
Jan----------A------------ 100
Jan----------B------------ 120
Feb----------A------------ 50
Mar----------A------------ 60
Mar----------B------------ 30
and so on
I have to calculate the expected sales for each month and book type based on the last 2 months sales.
So for March and type A it would be (100+50)/2 = 75
For March and type B it is 120/1 since no data for Feb is there.
I was trying to use the lag function but it wouldn't work since there is data missing in a few rows.
Any ideas on this?
Since this needs to ignore missing values, the following should work. I don't have a database to test it on at the moment, but I will give it another go in the morning:
select
month,
book_type,
sold_in_dollars,
avg(sold_in_dollars) over (partition by book_type order by month
range between interval '2' month preceding and interval '1' month preceding) as avg_sales
from myTable;
This sort of assumes that month has a date datatype and can be sorted on; if it's just a text string then you'll need something else.
Normally you could just use ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING, but that would take the two previous data points, and not necessarily the two previous months, if there are rows missing.
You could work it out with lag, but it would be a bit more complicated.
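To sanity-check the question's expected numbers (March/A = 75, March/B = 120), here is a plain-Python sketch of the "average over whichever of the two preceding months have data" rule; it is just the arithmetic, not the window-function query:

```python
# sample from the question: (month number, book type) -> dollars sold
sales = {(1, 'A'): 100, (1, 'B'): 120, (2, 'A'): 50, (3, 'A'): 60, (3, 'B'): 30}

def expected(month, book_type):
    # average over whichever of the two preceding months actually have data
    prior = [sales[(m, book_type)] for m in (month - 2, month - 1)
             if (m, book_type) in sales]
    return sum(prior) / len(prior) if prior else None

print(expected(3, 'A'))  # (100 + 50) / 2 = 75.0
print(expected(3, 'B'))  # only January present: 120.0
```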
As far as I know, you can give a default value to lag() :
SELECT Book_Type,
(lag(sold_in_Dollars, 1, 0) OVER(PARTITION BY Book_Type ORDER BY Month) + lag(sold_in_Dollars, 2, 0) OVER(PARTITION BY Book_Type ORDER BY Month))/2 AS expected_sales
FROM your_table
GROUP BY Book_Type
(Assuming Month column doesn't really contain JAN or FEB but real, orderable dates.)
What about something like this (forgive the SQL Server syntax, but you get the idea):
Select Book_type, AVG(sold_in_dollars)
from MyTable
where Month in (Month(DATEADD(mm, -1, GETDATE())), Month(DATEADD(mm, -2, GETDATE())))
group by Book_type
A partition outer join can help create the missing data. Create a set of months and join those values to each row by the month and perform the join once for each book type. I created the months January through April in this example:
with test_data as
(
select to_date('01-JAN-2010', 'DD-MON-YYYY') month, 'A' book_type, 100 sold_in_dollars from dual union all
select to_date('01-JAN-2010', 'DD-MON-YYYY') month, 'B' book_type, 120 sold_in_dollars from dual union all
select to_date('01-FEB-2010', 'DD-MON-YYYY') month, 'A' book_type, 50 sold_in_dollars from dual union all
select to_date('01-MAR-2010', 'DD-MON-YYYY') month, 'A' book_type, 60 sold_in_dollars from dual union all
select to_date('01-MAR-2010', 'DD-MON-YYYY') month, 'B' book_type, 30 sold_in_dollars from dual
)
select book_type, month, sold_in_dollars
,case when denominator = 0 then 'N/A' else to_char(numerator / denominator) end expected_sales
from
(
select test_data.book_type, all_months.month, sold_in_dollars
,count(sold_in_dollars) over
(partition by book_type order by all_months.month rows between 2 preceding and 1 preceding) denominator
,sum(sold_in_dollars) over
(partition by book_type order by all_months.month rows between 2 preceding and 1 preceding) numerator
from
(
select add_months(to_date('01-JAN-2010', 'DD-MON-YYYY'), level-1) month from dual connect by level <= 4
) all_months
left outer join test_data partition by (test_data.book_type) on all_months.month = test_data.month
)
order by book_type, month

SELECT any FROM system

Can any of these queries be done in SQL?
SELECT dates FROM system
WHERE dates > 'January 5, 2010' AND dates < 'January 30, 2010'
SELECT number FROM system
WHERE number > 10 AND number < 20
I'd like to create a generate_series, and that's why I'm asking.
I assume you want to generate a recordset with an arbitrary number of values, based on the first and last value in the series.
In PostgreSQL:
SELECT num
FROM generate_series (11, 19) num
In SQL Server:
WITH q (num) AS
(
SELECT 11
UNION ALL
SELECT num + 1
FROM q
WHERE num < 19
)
SELECT num
FROM q
OPTION (MAXRECURSION 0)
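The same recursive-CTE series runs almost unchanged in SQLite (which spells it WITH RECURSIVE and needs no MAXRECURSION hint); a quick check from Python:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# same shape as the SQL Server recursive CTE, in SQLite syntax
out = con.execute("""
    WITH RECURSIVE q(num) AS (
        SELECT 11
        UNION ALL
        SELECT num + 1 FROM q WHERE num < 19
    )
    SELECT num FROM q
""").fetchall()
print([n for (n,) in out])  # [11, 12, 13, 14, 15, 16, 17, 18, 19]
```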
In Oracle:
SELECT level + 10 AS num
FROM dual
CONNECT BY
level < 10
In MySQL:
Sorry.
Sort of for dates...
Michael Valentine Jones from SQL Team has an AWESOME date function
Check it out here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=61519
In Oracle
WITH
START_DATE AS
(
SELECT TO_CHAR(TO_DATE('JANUARY 5 2010','MONTH DD YYYY'),'J')
JULIAN FROM DUAL
),
END_DATE AS
(
SELECT TO_CHAR(TO_DATE('JANUARY 30 2010','MONTH DD YYYY'),'J')
JULIAN FROM DUAL
),
DAYS AS
(
SELECT END_DATE.JULIAN - START_DATE.JULIAN DIFF
FROM START_DATE, END_DATE
)
SELECT TO_CHAR(TO_DATE(N + START_DATE.JULIAN, 'J'), 'MONTH DD YYYY')
DESIRED_DATES
FROM
START_DATE,
(
SELECT LEVEL N
FROM DUAL, DAYS
CONNECT BY LEVEL < DAYS.DIFF
)
If you want to get the list of days, with a SQL like
select ... as days where date is between '2010-01-20' and '2010-01-24'
And return data like:
days
----------
2010-01-20
2010-01-21
2010-01-22
2010-01-23
2010-01-24
This solution uses no loops, procedures, or temp tables. The subquery generates dates for the last thousand days, and could be extended to go as far back or forward as you wish.
select a.Date
from (
select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY as Date
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) a
where a.Date between '2010-01-20' and '2010-01-24'
Output:
Date
----------
2010-01-24
2010-01-23
2010-01-22
2010-01-21
2010-01-20
Notes on Performance
Testing it out here, the performance is surprisingly good: the above query takes 0.0009 sec.
If we extend the subquery to generate approx. 100,000 numbers (and thus about 274 years worth of dates), it runs in 0.0458 sec.
Incidentally, this is a very portable technique that works with most databases with minor adjustments.
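That portability is easy to check: the same three crossed digit tables generate 0..999 in SQLite too (sketch run from Python):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# three crossed digit tables -> the numbers 0..999, same trick as above
digits = ("SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL "
          "SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL "
          "SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9")
out = con.execute(f"""
    SELECT a.a + 10 * b.a + 100 * c.a AS n
    FROM ({digits}) a CROSS JOIN ({digits}) b CROSS JOIN ({digits}) c
    ORDER BY n
""").fetchall()
print(len(out), out[0][0], out[-1][0])  # 1000 0 999
```

Subtracting each number from a base date, as the MySQL query does with CURDATE(), turns the number series into a date series.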
Not sure if this is what you're asking, but if you want to select something that isn't from a table, you can use DUAL:
select 1, 2, 3 from dual;
will return a row with 3 columns containing those three digits.
Selecting from dual is useful for running functions: a function can be called with manually supplied input instead of selecting something else into it. For example:
select some_func('First Parameter', 'Second parameter') from dual;
will return the results of some_func.
In SQL Server you can use the BETWEEN keyword.
Link:
http://msdn.microsoft.com/nl-be/library/ms187922(en-us).aspx
You can select a range by using WHERE with AND conditions. I can't speak to performance, but it's possible.
The simplest solution to this problem is a Tally or Numbers table, that is, a table that simply stores a sequence of integers and/or dates:
Create Table dbo.Tally (
NumericValue int not null Primary Key Clustered
, DateValue datetime NOT NULL
, Constraint UK_Tally_DateValue Unique ( DateValue )
)
GO
;With TallyItems
As (
Select 0 As Num
Union All
Select ROW_NUMBER() OVER ( Order By C1.object_id ) As Num
From sys.columns as c1
cross join sys.columns as c2
)
Insert dbo.Tally(NumericValue, DateValue)
Select Num, DateAdd(d, Num, '19000101')
From TallyItems
Where Num <= 100000
Once you have that table populated, you never need to touch it unless you want to expand it. I combined the dates and numbers into a single table, but if you needed more numbers than dates, you could break it into two tables. In addition, I arbitrarily filled the table with 100K rows, but you could obviously add more. Every day from 1900-01-01 to 9999-12-31 takes about 2.96 million rows. You probably won't need that many, but even if you did, the storage is tiny.
Regardless, this is a common technique for solving many gaps-and-sequences problems. For example, your original queries all ran in less than a tenth of a second. You can also use this sort of table to solve gap problems like:
Select NumericValue
From dbo.Tally
Left Join MyTable
On Tally.NumericValue = MyTable.IdentityColumn
Where MyTable.IdentityColumn Is Null
And Tally.NumericValue Between SomeLowValue And SomeHighValue
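The gap-finding pattern relies on a LEFT JOIN plus an IS NULL test on the joined side; a minimal SQLite sketch with invented IDs:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tally (n INT PRIMARY KEY)")
con.executemany("INSERT INTO tally VALUES (?)", [(i,) for i in range(1, 11)])
con.execute("CREATE TABLE mytable (id INT)")
con.executemany("INSERT INTO mytable VALUES (?)",
                [(i,) for i in (1, 2, 4, 7, 8)])
# tally rows with no match in mytable are the gaps in the identity sequence
gaps = con.execute("""
    SELECT n FROM tally
    LEFT JOIN mytable ON tally.n = mytable.id
    WHERE mytable.id IS NULL AND n BETWEEN 1 AND 8
    ORDER BY n
""").fetchall()
print([g for (g,) in gaps])  # [3, 5, 6]
```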