MySQL function to get list of Mondays - sql

What is the most efficient way to get a list of all of the unique Mondays from a date field?
When I'm not all that concerned about efficiency, I have done something like:
DATE-weekday(DATE) + 1. But now I need to compute this on a large dataset and I don't want my user wishing for a Rubik's cube because it is taking so long. :)
Yes, the field is indexed.
EDIT:
What I need is a list of all of the weeks that contain records. I am creating a payroll report where the user will select the week to filter the report.
Here is what I came up with:
SELECT DISTINCT ((DATE(`timStart`)-DAYOFWEEK(`timStart`))+2)
FROM `time`
ORDER BY 1 DESC
Anyone have any improvement to suggest?

"unique mondays from a date field" should be as simple as:
SELECT DISTINCT(`date`) FROM `table` WHERE WEEKDAY(`date`)=0
"weeks in which we have date values" should be as simple as:
SELECT DISTINCT(WEEK(`date`)) FROM `table` WHERE YEAR(`date`)=2010;
SELECT WEEK(now()),YEAR(now());
+-------------+-------------+
| WEEK(now()) | YEAR(now()) |
+-------------+-------------+
|          31 |        2010 |
+-------------+-------------+
which will benefit you as well in your other payroll queries, using
WHERE WEEK(`date`)=31
Put your trust in MySQL to handle things from there.
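For what it's worth, here is a minimal sketch of the "Monday of every week that has records" idea, written in Python only so it can be run without a database. The sample dates and the names week_start and tim_start are made up for illustration. Keying on the Monday date rather than on the WEEK() number alone keeps weeks from different years apart, which matters for a payroll picker that spans more than one year.

```python
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday on or before d (date.weekday() is 0 for Monday)."""
    return d - timedelta(days=d.weekday())


# Hypothetical timStart values; the first two fall in the same week.
tim_start = [date(2010, 8, 2), date(2010, 8, 5), date(2010, 7, 28), date(2009, 8, 5)]

# Distinct Mondays, newest first, like SELECT DISTINCT ... ORDER BY 1 DESC.
mondays = sorted({week_start(d) for d in tim_start}, reverse=True)
print(mondays)
```

In MySQL itself the same step back to Monday is usually written with DATE_SUB or INTERVAL arithmetic rather than by subtracting an integer from a DATE.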

Related

postgreSQL 1 minute average of values

I need to query my database (PostgreSQL) for the average value over each 1-minute interval.
Timestamps are recorded in milliseconds and the data looks like this:
timestamp | value
------------------
1528029265001 | 123
1528029265020 | 232
1528029265025 | 332
1528029265029 | 511
... | ...
1528029265176 | 231
I tried:
SELECT
avg(value),
extract(minutes FROM to_timestamp(timestamp/1000)) as one_min
FROM table GROUP BY one_min
But it seems to be stuck in querying.
I'm sure there is a super easy way to do it.
Any suggestions?
I am guessing that you want:
SELECT floor(timestamp/(60 * 1000)) as timestamp_minute,
avg(value)
FROM table
GROUP BY timestamp_minute;
However, if your problem is performance, this will have the same performance issues. For that, you would want a where clause that limits the amount of data being processed.
Because the data is not being collected at even intervals, you might want the simple average of the first and last values, or something like that.
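To make the bucketing concrete, here is a small sketch of the same floor(timestamp / 60000) grouping in Python. The first five rows are the ones from the question; the sixth row is invented so that a second minute shows up. Unlike extract(minute ...), which only yields 0 to 59 and so merges the same minute of different hours, the integer division gives every minute its own bucket.

```python
from collections import defaultdict

MS_PER_MINUTE = 60 * 1000

# (timestamp in ms, value); the last row is made up for illustration.
rows = [
    (1528029265001, 123),
    (1528029265020, 232),
    (1528029265025, 332),
    (1528029265029, 511),
    (1528029265176, 231),
    (1528029300500, 100),
]

# Group values by whole minutes since the epoch.
buckets = defaultdict(list)
for ts_ms, value in rows:
    buckets[ts_ms // MS_PER_MINUTE].append(value)

# Average per minute bucket.
averages = {minute: sum(vals) / len(vals) for minute, vals in buckets.items()}
print(averages)
```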

Get a count of records created each week in SQL

I have a table Questions. How can I get a count of all questions asked in a week?
More generically, how can I bucket records by the week they were created in?
Questions
id  created_at           title
----------------------------------------------------
1   2014-12-31 09:43:42  "Add things"
2   2013-11-23 02:48:55  "How do I ruby?"
3   2015-01-15 15:11:19  "How do I python?"
...
I'm using SQLite, but PG answers are fine too.
Or if you have the answer using Rails ActiveRecord, that is amazing, but not required.
I've been trying to use DATEPART() but haven't come up with anything successful yet: http://msdn.microsoft.com/en-us/library/ms174420.aspx
In PostgreSQL it's as easy as follows:
SELECT id, created_at, title, date_trunc('week', created_at) created_week
FROM Questions
If you wanted to get the # of questions per week, simply do the following:
SELECT date_trunc('week', created_at) created_week, COUNT(*) weekly_cnt
FROM Questions
GROUP BY date_trunc('week', created_at)
Hope this helps. Note that date_trunc() returns a timestamp (the start of the week, which PostgreSQL takes to be Monday) and not a number (i.e., it won't return the ordinal number of the week in the year).
Update:
Also, if you wanted to accomplish both in a single query you could do so as follows:
SELECT id, created_at, title, date_trunc('week', created_at) created_week
, COUNT(*) OVER ( PARTITION BY date_trunc('week', created_at) ) weekly_cnt
FROM Questions
In the above query I'm using COUNT(*) as a window function and partitioning by the week in which the question was created.
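Since the question mentions SQLite, here is a rough equivalent of the weekly count there, run through Python's sqlite3 module so it is self-contained. SQLite has no date_trunc(), so this sketch leans on the date() modifiers instead. Row 4 is a made-up extra row so that one week has more than one question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER, created_at TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO questions VALUES (?, ?, ?)",
    [
        (1, "2014-12-31 09:43:42", "Add things"),
        (2, "2013-11-23 02:48:55", "How do I ruby?"),
        (3, "2015-01-15 15:11:19", "How do I python?"),
        (4, "2015-01-13 08:00:00", "Hypothetical extra row"),
    ],
)

# date(x, '-6 days', 'weekday 1') steps back six days and then forward to the
# next Monday, which lands on the Monday on or before x.
weekly = conn.execute(
    """
    SELECT date(created_at, '-6 days', 'weekday 1') AS created_week,
           COUNT(*) AS weekly_cnt
    FROM questions
    GROUP BY created_week
    ORDER BY created_week
    """
).fetchall()
print(weekly)
```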
If the created_at field is already indexed, I would simply look for all rows with a created_at value between X and Y. That way the index can be used.
For instance, to get rows with a created_at value in the 3rd week of 2015, you would run:
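select *
from questions
where created_at >= '2015-01-11'
and created_at < '2015-01-18'
-- half-open range: BETWEEN ... AND '2015-01-17' would stop at midnight and miss most of the 17th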
This would allow the index to be used.
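If the week is chosen by number, the two bounds can be computed in the application and passed in as parameters. The sketch below is one way to do that in Python (3.8 or later for date.fromisocalendar). It assumes ISO weeks, which start on Monday and are what extract(week from ...) counts in PostgreSQL, so the dates it produces differ by a day from a Sunday-based week. The function name and the query text are illustrative only.

```python
from datetime import date, timedelta


def iso_week_bounds(year: int, week: int):
    """Return (first day, first day of the following week) for an ISO week."""
    start = date.fromisocalendar(year, week, 1)  # 1 = Monday
    return start, start + timedelta(days=7)


start, end = iso_week_bounds(2015, 3)

# Half-open range on the bare column, so an index on created_at stays usable.
query = "SELECT * FROM questions WHERE created_at >= %s AND created_at < %s"
params = (start.isoformat(), end.isoformat())
print(params)
```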
If you want to be able to specify a week in the where clause, you could use the date_part or extract functions to add a column to this table storing the year and week #, and then index that column so that queries can take advantage of it.
If you don't want to add the column, you could of course use either function in the where clause and query against the table, but you won't be able to take advantage of any indexes.
Because you mentioned not wanting to add a column to the table, I would recommend adding a function based index.
For example, if your ddl were:
create table questions
(
id int,
created_at timestamp,
title varchar(20)
);
insert into questions values
(1, '2014-12-31 09:43:42','"Add things"'),
(2, '2013-11-23 02:48:55','"How do I ruby?"'),
(3, '2015-01-15 15:11:19','"How do I python?"');
create or replace function to_week(ts timestamp)
returns text
as 'select concat(extract(year from ts),extract(week from ts))'
language sql
immutable
returns null on null input;
create index week_idx on questions (to_week(created_at));
You could run:
select q.*, to_week(created_at) as week
from questions q
where to_week(created_at) = '20153';
And get:
| ID | CREATED_AT | TITLE | WEEK |
|----|--------------------------------|--------------------|-------|
| 3 | January, 15 2015 15:11:19+0000 | "How do I python?" | 20153 |
(reflecting the third week of 2015, i.e. '20153')
Fiddle: http://sqlfiddle.com/#!15/c77cd/3/0
You could similarly run:
select q.*,
concat(extract(year from created_at), extract(week from created_at)) as week
from questions q
where concat(extract(year from created_at), extract(week from created_at)) =
'20153';
Fiddle: http://sqlfiddle.com/#!15/18c1e/3/0
But it would not take advantage of the function based index, because there is none. In addition, it would not use any index you might have on the created_at field because, while that field might be indexed, you really aren't searching on that field. You are searching on the result of a function applied against that field. So the index on the column itself cannot be used.
If the table is large you will either want a function based index or a column holding that week that is itself indexed.
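One thing to watch with a year-plus-week key: extract(week ...) in PostgreSQL is the ISO week, while extract(year ...) is the calendar year, and the two disagree around New Year. The Python sketch below shows the effect on the sample row from 2014-12-31 and one possible key that avoids it; the function names are made up. In PostgreSQL the matching field is isoyear.

```python
from datetime import datetime


def naive_week_key(ts: datetime) -> str:
    """Calendar year glued to the ISO week number, as in the to_week example."""
    return f"{ts.year}{ts.isocalendar()[1]}"


def iso_week_key(ts: datetime) -> str:
    """ISO year and zero-padded ISO week, which always belong together."""
    iso_year, iso_week, _ = ts.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


new_years_eve = datetime(2014, 12, 31, 9, 43, 42)
print(naive_week_key(new_years_eve))  # calendar year 2014 with ISO week 1
print(iso_week_key(new_years_eve))    # ISO year 2015 with ISO week 1
```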
SQLite has no native datetime type like MS SQL Server does, so the answer may depend on how you are storing dates. Not all T-SQL will work in SQLite.
You can store datetime as an integer that counts seconds since 1/1/1970 12:00 AM. There are 604,800 seconds in a week. So you could query on an expression like
rawdatetime / 604800 -- iff rawdatetime is integer
More on handling datetimes in SQLite here: https://www.sqlite.org/datatype3.html
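A small caveat on the division trick: 1970-01-01 was a Thursday, so plain rawdatetime / 604800 gives weeks that run Thursday to Wednesday. Below is a sketch of one way to shift the buckets so they start on Monday (UTC). The constant and function names are invented for the example.

```python
import calendar
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY  # 604800
# The epoch fell on a Thursday, three days after a Monday.
MONDAY_OFFSET = 3 * SECONDS_PER_DAY


def week_bucket(epoch_seconds: int) -> int:
    """Week index counted from Monday 1969-12-29 00:00 UTC."""
    return (epoch_seconds + MONDAY_OFFSET) // SECONDS_PER_WEEK


def bucket_start(bucket: int) -> int:
    """Epoch seconds of the Monday 00:00 UTC that opens the bucket."""
    return bucket * SECONDS_PER_WEEK - MONDAY_OFFSET


ts = calendar.timegm((2015, 1, 15, 15, 11, 19))  # 2015-01-15 15:11:19 UTC
start = datetime.fromtimestamp(bucket_start(week_bucket(ts)), tz=timezone.utc)
print(start)
```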
Get the week number using strftime('%V')
Store it in DB, and use it to identify in which week a question was asked
http://apidock.com/ruby/DateTime/strftime
MySQL can do it too with DATE_FORMAT(datetime,'%u')
So use:
SELECT DATE_FORMAT(column,'%u') FROM Table

How to count the number of active days in a dataset with SQL Server 2008

SQL Server 2008, rendered in html via aspx webpage.
What I want to achieve, is to get an average per day figure that makes allowance for missing days. To do this I need to count the number of active days in a table.
Example:
Date | Amount
---------------------
2014-08-16 | 234.56
2014-08-16 | 258.30
2014-08-18 | 25.84
2014-08-19 | 259.21
The sum of the lot (777.91) divided by the number of active days (3) would = 259.30
So it needs to go "count number of different dates in the returned range"
Is there a tidy way to do this?
If you just want that one row of output then this should work:
select sum(amount) / count(distinct date) as your_average
from your_table
Fiddle:
http://sqlfiddle.com/#!2/7ffd1/1/0
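In case the fiddle is unavailable, here is the same query run against the sample rows through Python's sqlite3 module, as a quick check of the arithmetic. The table and column names follow the query above. On SQL Server the idea is the same, though if amount were an integer column you would probably want to cast it before dividing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        ("2014-08-16", 234.56),
        ("2014-08-16", 258.30),
        ("2014-08-18", 25.84),
        ("2014-08-19", 259.21),
    ],
)

# Total, number of distinct dates that have rows, and total per active day.
total, active_days, your_average = conn.execute(
    """
    SELECT SUM(amount),
           COUNT(DISTINCT date),
           SUM(amount) / COUNT(DISTINCT date)
    FROM your_table
    """
).fetchone()
print(total, active_days, your_average)
```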
I don't know whether this will help you, but how about using GROUP BY with the AVG and COUNT functions?
SELECT Date, AVG(Amount) AS 'AmountAverage', COUNT(*) AS 'NumberOfActiveDays'
FROM YourTable WITH(NOLOCK)
GROUP BY Date
About AVG function, see here: Link

Is there an established pattern for SQL queries which group by a range?

I've seen a lot of questions on SO concerning how to group data by a range in a SQL query.
The exact scenarios vary, but the general underlying problem in each is to group by a range of values rather than each discrete value in the GROUP BY column. In other words, to group by a less precise granularity than you're storing in the database table.
This crops up often in the real world when producing things like histograms, calendar representations, pivot tables and other bespoke reporting outputs.
Some example data (tables unrelated):
| OrderHistory | | Staff |
--------------------------- ------------------------
| Date | Quantity | | Age | Name |
--------------------------- ------------------------
|01-Jul-2012 | 2 | | 19 | Barry |
|02-Jul-2012 | 5 | | 53 | Nigel |
|08-Jul-2012 | 1 | | 29 | Donna |
|10-Jul-2012 | 3 | | 26 | James |
|14-Jul-2012 | 4 | | 44 | Helen |
|17-Jul-2012 | 2 | | 49 | Wendy |
|28-Jul-2012 | 6 | | 62 | Terry |
--------------------------- ------------------------
Now let's say we want to use the Date column of the OrderHistory table to group by weeks, i.e. 7-day ranges. Or perhaps group the Staff into 10-year age ranges:
| Week | QtyCount | | AgeGroup | NameCount |
-------------------------------- -------------------------
|01-Jul to 07-Jul | 7 | | 10-19 | 1 |
|08-Jul to 14-Jul | 8 | | 20-29 | 2 |
|15-Jul to 21-Jul | 2 | | 30-39 | 0 |
|22-Jul to 28-Jul | 6 | | 40-49 | 2 |
-------------------------------- | 50-59 | 1 |
| 60-69 | 1 |
-------------------------
GROUP BY Date and GROUP BY Age on their own won't do it.
The most common answers I see (none of which are consistently voted "correct") are to use one or more of:
a bunch of CASE statements, one per grouping
a bunch of UNION queries, with a different WHERE clause per grouping
as I'm working with SQL Server, PIVOT() and UNPIVOT()
a two-stage query using a sub-select, temp table or View construct
Is there an established generic pattern for dealing with such queries?
You can use some of the dimensional modeling techniques, such as fact tables and dimension tables. Order History can act as a fact table with DateKey foreign key relation to a Date dimension.
Date dimension can have a schema such as below:
Note that Date table is pre-filled with data up-to N number of years.
Using an example above, here is a sample query to get the result:
select CalendarWeek, sum(Quantity)
from OrderHistory a
join DimDate b
on a.DateKey = b.DateKey
group by CalendarWeek
For Staff table, you can store Birthday Key instead of age and let the query calculate the age and ranges.
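As a rough illustration of the date dimension idea, here is a cut-down version run through Python's sqlite3 module. It is only a sketch: the two-column DimDate layout, the ISO-text DateKey and the Sunday-based CalendarWeek value are assumptions made for this example, and a real date dimension would carry many more attributes.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DimDate (DateKey TEXT PRIMARY KEY, CalendarWeek TEXT)")
conn.execute("CREATE TABLE OrderHistory (DateKey TEXT, Quantity INTEGER)")

# Pre-fill the dimension for July 2012; CalendarWeek holds the Sunday that
# starts each week, matching the 01-Jul to 07-Jul ranges in the question.
day = date(2012, 7, 1)
while day <= date(2012, 7, 31):
    week_start = day - timedelta(days=(day.weekday() + 1) % 7)
    conn.execute(
        "INSERT INTO DimDate VALUES (?, ?)",
        (day.isoformat(), week_start.isoformat()),
    )
    day += timedelta(days=1)

conn.executemany(
    "INSERT INTO OrderHistory VALUES (?, ?)",
    [
        ("2012-07-01", 2), ("2012-07-02", 5), ("2012-07-08", 1), ("2012-07-10", 3),
        ("2012-07-14", 4), ("2012-07-17", 2), ("2012-07-28", 6),
    ],
)

weekly = conn.execute(
    """
    SELECT b.CalendarWeek, SUM(a.Quantity)
    FROM OrderHistory a
    JOIN DimDate b ON a.DateKey = b.DateKey
    GROUP BY b.CalendarWeek
    ORDER BY b.CalendarWeek
    """
).fetchall()
print(weekly)
```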
Here is SQL Fiddle
Date dimension population script was taken from here.
As is often the case this SQL problem requires using more than one pattern in composition.
In this case the two you can use are
NTILE
Numbers Table
You can use NTILE to create a set number of groups. However, since you don't have each member of the groups represented, you also need to use a numbers table. Since you're using SQL Server you have it easy, as you don't have to simulate either.
Here's an example for the Staff problem
WITH g as (
SELECT
NTILE(6) OVER (ORDER BY number) grp,
NUMBER
FROM
master..spt_values
WHERE
TYPE = 'P'
and number >=10 and number <=69
)
SELECT
CAST(min(g.number) as varchar) + ' - ' +
CAST(max(g.number) as varchar) AgeGroup ,
COUNT(s.age) NameCount
FROM
g
LEFT JOIN Staff s
ON g.NUMBER = s.Age
GROUP BY
grp
DEMO
You can apply this to dates as well; it just requires some date-to-day manipulation.
Take a look at the OVER clause and its associated clauses: PARTITION BY, ROWS, RANGE...
Determines the partitioning and ordering of a rowset before the
associated window function is applied. That is, the OVER clause
defines a window or user-specified set of rows within a query result
set. A window function then computes a value for each row in the
window. You can use the OVER clause with functions to compute
aggregated values such as moving averages, cumulative aggregates,
running totals, or a top N per group results.
My favorite case in this genre is where transactions must be grouped by fiscal quarter or fiscal year. The fiscal quarter or fiscal year boundaries of various enterprises can border on the bizarre.
My favorite way to implement this is to create a separate table for the attributes of a date. Let's call the table "Almanac". One of the columns in this table is the fiscal quarter, and another one is the fiscal year. The key to this table is of course the date. Ten years worth of data fill up 3,650 rows, plus a few for leap years. You then need a program that can populate this table from scratch. All the enterprise calendar rules are built into this one program.
When you need to group transaction data by fiscal quarter, you just join with this table over date, and then group by fiscal quarter.
I figure this pattern could be extended to groupings by other kinds of ranges, but I've never done it myself.
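To give a feel for the "one program that knows the calendar rules" part, here is a toy version in Python. Everything specific in it is an assumption for illustration: a fiscal year that starts in April and is labelled by the calendar year it starts in, an in-memory dict standing in for the Almanac table, and made-up transactions.

```python
from collections import defaultdict
from datetime import date, timedelta


def fiscal_period(d: date, fy_start_month: int = 4):
    """Return (fiscal year, fiscal quarter) under the assumed rule above."""
    fiscal_year = d.year if d.month >= fy_start_month else d.year - 1
    fiscal_quarter = ((d.month - fy_start_month) % 12) // 3 + 1
    return fiscal_year, fiscal_quarter


# Populate the almanac from scratch: one entry per calendar day of 2012.
almanac = {}
day = date(2012, 1, 1)
while day.year == 2012:
    almanac[day] = fiscal_period(day)
    day += timedelta(days=1)

# Hypothetical transactions: (date, amount).
transactions = [(date(2012, 3, 15), 100), (date(2012, 4, 1), 50), (date(2012, 6, 30), 25)]

# "Join" each transaction to the almanac on date, then group by fiscal quarter.
totals = defaultdict(int)
for tx_date, amount in transactions:
    totals[almanac[tx_date]] += amount
print(dict(totals))
```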
In your first example your intervals are regular so you can achieve the desired result simply by using functions. Below is an example that gets the data as you require it. The first query keeps the first column in date format (how I would preferably deal with it doing any formatting outside of SQL), the second does the string conversion for you.
DECLARE @OrderHistory TABLE (Date DATE, Quantity INT)
INSERT @OrderHistory VALUES
('20120701', 2), ('20120702', 5), ('20120708', 1), ('20120710', 3),
('20120714', 4), ('20120717', 2), ('20120728', 6)
SET DATEFIRST 7
SELECT DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date) AS WeekStart,
SUM(Quantity) AS Quantity
FROM @OrderHistory
GROUP BY DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date)
SELECT WeekStart,
SUM(Quantity) AS Quantity
FROM @OrderHistory
CROSS APPLY
( SELECT CONVERT(VARCHAR(6), DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date), 6) + ' to ' +
CONVERT(VARCHAR(6), DATEADD(DAY, 7 - DATEPART(WEEKDAY, Date), Date), 6) AS WeekStart
) ws
GROUP BY WeekStart
Something similar can be done for your age grouping using:
SELECT CAST(FLOOR(Age / 10.0) * 10 AS INT)
However this fails for 30-39 because there is no data for this group.
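The gap is easy to see, and to fill, outside SQL. The sketch below applies the same floor-to-decade arithmetic to the Staff ages from the question and then walks every decade from 10 to 60, so the empty 30-39 group still appears with a count of 0. It is plain Python for illustration; in SQL the list of decades would have to come from a numbers or range table.

```python
from collections import Counter

ages = [19, 53, 29, 26, 44, 49, 62]

# Same arithmetic as FLOOR(Age / 10.0) * 10: map each age to its decade.
counts = Counter(age // 10 * 10 for age in ages)

# Enumerate every decade explicitly so groups without data are not lost.
age_groups = [(f"{lo}-{lo + 9}", counts.get(lo, 0)) for lo in range(10, 70, 10)]
print(age_groups)
```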
My stance on the matter: if you are doing the query as a one-off, a temp table, CTE or CASE statement should work just fine. The same goes for reusing the query on small sets of data.
If you are likely to reuse the group however, or you are referring to significant amounts of data then create a permanent table with the ranges defined and indices applied to any columns required. This is the basis of creating dimensions in OLAP.
Couldn't you treat the age (or date) as a foreign key in a new, tiny table that is just ages (or dates) and their corresponding ranges? A join statement could provide a new table with a column that contains AgeGroups. With the new table you could use the standard group-by method.
It does seem reckless to make a new table for grouping, but it would be easy to make programmatically and I think it would be easier to maintain (or drop and recreate) than a case statement or a where clause. If the result of this query is a one-off, a throwaway sql statement would probably work best, but I think my method makes the most sense for long-term use.
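Something along these lines, sketched with Python's sqlite3 module so it can be run as is. The AgeRanges table and its Lo, Hi and AgeGroup columns are names invented for the example. Joining from the range table outward with a LEFT JOIN also keeps ranges that have no staff in them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (Age INTEGER, Name TEXT)")
conn.execute("CREATE TABLE AgeRanges (Lo INTEGER, Hi INTEGER, AgeGroup TEXT)")

conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [
        (19, "Barry"), (53, "Nigel"), (29, "Donna"), (26, "James"),
        (44, "Helen"), (49, "Wendy"), (62, "Terry"),
    ],
)

# The tiny lookup table: one row per ten-year range.
conn.executemany(
    "INSERT INTO AgeRanges VALUES (?, ?, ?)",
    [(lo, lo + 9, f"{lo}-{lo + 9}") for lo in range(10, 70, 10)],
)

groups = conn.execute(
    """
    SELECT r.AgeGroup, COUNT(s.Age) AS NameCount
    FROM AgeRanges r
    LEFT JOIN Staff s ON s.Age BETWEEN r.Lo AND r.Hi
    GROUP BY r.Lo, r.AgeGroup
    ORDER BY r.Lo
    """
).fetchall()
print(groups)
```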
Well, some years ago with Oracle DB we did it the following way:
We had two tables: Sessions and Ranges. Ranges had a foreign key that referenced Sessions.
When we needed to perform SQL, we created a new record in Sessions and several new records in Ranges that referred to that session.
Our SQL joined Ranges with filter by Session:
select sum(t.Value), r.Name
from DataTable t
join Ranges r on (r.Session = ? and r.Start <= t.MyDate and t.MyDate < r.End)
-- the upper-bound column name (r.End) is assumed
group by r.Name
After we got results we deleted that record from Sessions, and the records from Ranges were deleted by cascade.
We had a daemon job that purged Sessions of junk records that were leaked in extraordinary situations (killed processes, etc).
This worked perfectly. Since that time Oracle added new SQL clauses, and maybe they could be used instead. But on other RDBMSes this is still a valid way.
Another approach is to create a number of functions such as GET_YEAR_BY_DATE or GET_QUARTER_BY_DATE or GET_WEEK_BY_DATE (they would return the start date of the corresponding period; for example, for any date return the start date of its year). And then group by them:
select sum(Value), GET_YEAR_BY_DATE(MyDate) from DataTable
group by GET_YEAR_BY_DATE(MyDate)

SQL Increment and Reset Variable Insert

I have probably missed something simple with my problem; it's like me to overlook small details. But I have been searching for a while now and haven't come across anything similar to my issue.
Setup:
SQL 2005, Stored Procedure.
I have a table that is updated frequently with call attempts. Using a UNIQUEIDENTIFIER to tie all the records together i.e.
xxx-xxx-xxx-xxx-xxxx | 20/06/2011 12:00 | 10
I want to have a stored procedure that grabs all the records and sorts them by the UNIQUEIDENTIFIER while at the same time producing a counter for the attempts, i.e.
1111-1111-1111-1111 | 20/06/2011 12:05 | 10 | 0
1111-1111-1111-1111 | 20/06/2011 12:06 | 30 | 1
2222-2222-2222-2222 | 20/06/2011 12:10 | 120 | 0
3333-3333-3333-3333 | 20/06/2011 12:20 | 50 | 0
From the above it should be simple to identify the call attempts and add on the number. However, I'm probably being very silly.
Any help is appreciated.
Regards
Chris
You can use ROW_NUMBER.
e.g.
SELECT ID, DateField, FieldA,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY DateField ASC) AS Counter
FROM YourTable
ORDER BY ID, DateField
The PARTITION BY basically resets the counter for each distinct ID and the following ORDER BY ensures the counter is assigned incrementally ordered by the Date field. Note this will be a counter starting from 1 each time. If you want it to start from 0, you can just subtract 1 in the SELECT.
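If it helps to see what the partitioned counter does, here is the same logic spelled out in plain Python: sort by ID and date, then restart the count for each ID. The rows are the ones from the question, deliberately shuffled, and the counter starts at 0 as in the desired output. This is only an illustration of the behaviour, not a replacement for ROW_NUMBER in the stored procedure.

```python
from datetime import datetime
from itertools import groupby

# (ID, timestamp as dd/mm/yyyy hh:mm, value), in arbitrary order.
calls = [
    ("1111-1111-1111-1111", "20/06/2011 12:06", 30),
    ("3333-3333-3333-3333", "20/06/2011 12:20", 50),
    ("1111-1111-1111-1111", "20/06/2011 12:05", 10),
    ("2222-2222-2222-2222", "20/06/2011 12:10", 120),
]


def parse(ts: str) -> datetime:
    return datetime.strptime(ts, "%d/%m/%Y %H:%M")


# ORDER BY ID, DateField
ordered = sorted(calls, key=lambda c: (c[0], parse(c[1])))

# PARTITION BY ID: the counter restarts at 0 for each distinct ID.
numbered = []
for _, group in groupby(ordered, key=lambda c: c[0]):
    for attempt, call in enumerate(group):
        numbered.append(call + (attempt,))

for row in numbered:
    print(row)
```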
Could you not use the "group by" clause (with a count) in SQL ?
You could use "order by" to perform the sort (although it's not immediately apparent how SQL implements that or what use it would be).
Also, the question title doesn't seem to match what you're asking.