BigQuery querying _TABLE_SUFFIX in views with wildcard tables - sql

Summarised Question:
How can I query a view that is based off a wildcard table using _TABLE_SUFFIX as a filter, rather than a column (which would query all the tables)?
e.g.
SELECT Name, date, weight
FROM `my_view`
WHERE _TABLE_SUFFIX >= '2020-01-01'
rather than
SELECT Name, date, weight
FROM `my_view`
WHERE date >= '2020-01-01'
Detailed Question:
Say I have a table bq.school.weights_20191231 with the following format
| Name | Date | Weight |
|-------|------------|--------|
| Bob | 2019-12-31 | 55kg |
| Alex | 2019-12-31 | 95kg |
| ... | ... | .. |
| Sandy | 2019-12-31 | 43kg |
and a table bq.school.weights_20200101
| Name | Date | Weight |
|-------|------------|--------|
| Bob | 2020-01-01 | 60kg |
| Alex | 2020-01-01 | 100kg |
| ... | ... | .. |
| Sandy | 2020-01-01 | 40kg |
And I create a view based off the base table bq.school.weights_* called weights_view, e.g.
SELECT Name, Date, Weight
FROM `bq.school.weights_*`
How can I query weights_view using _TABLE_SUFFIX to filter the date, rather than using WHERE DATE = "2020-01-01" (which would query all the tables)?

In addition to #AH's answer, you can simply reference _TABLE_SUFFIX within your WHERE clause without having to select it and then query the view.
I must point out that, according to the documentation, the _TABLE_SUFFIX pseudo column contains the values matched by the table wildcard. Also, it does not need to be within the SELECT clause. Below is an example using a public dataset; it scans two tables, gsod1940 and gsod1944:
#standardSQL
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
mo,
da,
year
FROM
`bigquery-public-data.noaa_gsod.gsod*`
WHERE
( _TABLE_SUFFIX = '1940'
OR _TABLE_SUFFIX = '1944' )
ORDER BY
max DESC
Notice that there is a pattern within the tables' names, noaa_gsod.gsod followed by a year. Also, pay attention that the pseudo column _TABLE_SUFFIX is not selected, although you can select it if you desire.
Lastly, I must stress that there are limitations on using wildcards with _TABLE_SUFFIX:
The wildcard table functionality does not support views. If the wildcard table matches any view in the dataset, the query returns an error. This is true whether or not your query contains a WHERE clause on the _TABLE_SUFFIX pseudo column to filter out the view.
Currently, cached results are not supported for queries against multiple tables using a wildcard, even if the Use Cached Results option is checked. If you run the same wildcard query multiple times, you are billed for each query.
Wildcard tables support native BigQuery storage only. You cannot use wildcards when querying an external table or a view.
Queries that contain Data Manipulation Language (DML) statements cannot use a wildcard table as the target of the query. For example, a wildcard table may be used in the FROM clause of an UPDATE query, but a wildcard table cannot be used as the target of the UPDATE operation.
Wildcard queries are not supported for tables protected by customer-managed encryption keys (CMEK).

Solution
Similar to the BigQuery Date-Partitioned Views question, you need to expose the _TABLE_SUFFIX column, and then query off that, e.g.
SELECT Name, Weight, _TABLE_SUFFIX AS the_date
FROM `bq.school.weights_*`
and then query the view with
SELECT *
FROM `weights_view`
WHERE the_date = "20200101"
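Putting the two pieces together, a sketch of the full view definition plus a range filter against it (assuming the dataset from the question, and the the_date alias as above). Note that _TABLE_SUFFIX is a STRING in YYYYMMDD form here, so range filters compare strings:
#standardSQL
CREATE VIEW `bq.school.weights_view` AS
SELECT Name, Date, Weight, _TABLE_SUFFIX AS the_date
FROM `bq.school.weights_*`;
-- string comparison on the suffix, e.g. all of January 2020
SELECT Name, Date, Weight
FROM `bq.school.weights_view`
WHERE the_date BETWEEN '20200101' AND '20200131';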

You can do it directly in the WHERE clause:
SELECT Name, Weight
FROM `bq.school.weights_*`
WHERE _TABLE_SUFFIX >= "20200101"

Related

Indexing an SQL table by datetime that is scaling

I have a large table that gets anywhere from 1-3 new entries per minute. I need to be able to find records at specific times, which I can do by using a SELECT statement, but it's incredibly slow. Let's say the table looks like this:
Device | Date-Time | Data |
-----------------------------------
1 | 2020-01-01 08:00 | 325
2 | 2020-01-01 08:01 | 384
1 | 2020-01-01 08:01 | 175
3 | 2020-01-01 08:01 | 8435
7 | 2020-01-01 08:02 | 784
.
.
.
I'm trying to get data like this:
SELECT *
FROM table
WHERE Date-Time = '2020-01-01 08:00' AND Device = '1'
I also need to get data like this:
SELECT *
FROM table
WHERE Date-Time > '2020-01-01 08:00' AND Date-Time < '2020-01-10 08:00' AND Device = '1'
But I don't know what the Date-Time will be until requested. In this case, I will have to search the entire table for these times. Can I index the start of the day so I know where dates are?
Is there a way to index this table in order to dramatically decrease the queries? Or is there a better way to achieve this?
I have tried indexing the Date-Time column, but it did not decrease the query time at all.
For this query:
SELECT *
FROM mytable
WHERE date_time = '2020-01-01 08:00' AND device = 1
You want an index on mytable(date_time, device). This matches the columns that come into play in the WHERE clause, so the database should be able to lookup the matching rows efficiently.
Note that I removed the single quotes around the literal value given to device: if this is an integer, as it looks like, then it should be treated as such.
The ordering of the columns in the index matters; generally, you want the most restrictive column first - from the description of your question, this would probably be date_time, hence the above suggestion. You might want to try the other way around as well (so: mytable(device, date_time)).
Another thing to keep in mind from a performance perspective: you should probably enumerate the columns you want in the SELECT clause; if you just want a few additional columns, then it can be useful to add them to the index as well; this gives you a covering index, which the database can use to execute the whole query without even looking back at the data.
Say:
SELECT date_time, device, col1, col2
FROM mytable
WHERE date_time = '2020-01-01 08:00' AND device = 1
Then consider:
mytable(date_time, device, col1, col2)
Or:
mytable(device, date_time, col1, col2)
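For reference, a sketch of the corresponding DDL (mytable and the column names are the illustrative ones used above):
-- index matching the WHERE clause
CREATE INDEX ix_mytable_dt_device ON mytable (date_time, device);
-- covering variant, if col1 and col2 are the only other columns selected
CREATE INDEX ix_mytable_covering ON mytable (date_time, device, col1, col2);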
You can add a new column TimeInMilliseconds, populate it with the milliseconds elapsed since 1970 (the Unix epoch), and create an index on this column. TimeInMilliseconds will almost always be a unique number, which helps the index resolve queries faster.
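A sketch of that idea in MySQL syntax (the column and index names are assumptions; whether this beats a plain index on date_time depends on the engine):
ALTER TABLE mytable ADD COLUMN TimeInMilliseconds BIGINT;
-- backfill from the existing datetime column
UPDATE mytable SET TimeInMilliseconds = UNIX_TIMESTAMP(date_time) * 1000;
CREATE INDEX ix_mytable_time_ms ON mytable (TimeInMilliseconds);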

Create a DB2 Calendar table for 20 years with columns dependent on the original date

I'm trying to create a calendar table for 20 years, ranging from 2000 to 2020. The aim is to have one row per day, along with some other columns that will use logic based on the calendar date generated. An example of this would be having one column as the calendar date (2000-01-01) and the year column reading the year from the value within the calendar date column (2000).
The code for the table is below:
CREATE TABLE TEST.CALENDAR(
CALENDAR_DATE DATE NOT NULL,
CALENDAR_YEAR INTEGER NOT NULL,
CALENDAR_MONTH_NUMBER INTEGER NOT NULL,
CALENDAR_MONTH_NAME VARCHAR(100),
CALENDAR_DAY_OF_MONTH INTEGER NOT NULL,
CALENDAR_DAY_OF_WEEK INTEGER NOT NULL,
CALENDAR_DAY_NAME VARCHAR(100),
CALENDAR_YEAR_MONTH INTEGER NOT NULL);
At the moment, I have a bunch of insert statements that manually insert rows for this table over 20 years. I'm looking to make an insert statement with variables instead and this insert statement would insert data in daily increments until the start date variable is not less than the end date variable.
Currently, I cannot get this to work at all let alone include any logic for any other columns.
Code for the variable insert statement:
declare startdate DATE, enddate DATE
set startdate = '2000-01-01'
set enddate = DATEADD(yy, 20, startdate)
while startdate < enddate
begin
    insert into TEST.CALENDAR (CALENDAR_DATE) select startdate
    set startdate = DATEADD(dd, 1, startdate)
end
Would anyone have any ideas of how I can get this to work?
You can do this with a DB2 recursive query and date functions:
Consider:
with cte (
calendar_date,
calendar_year,
calendar_month_number,
calendar_month_name,
calendar_day_of_month,
calendar_day_of_week,
calendar_day_name
) as (
select
calendar_date,
year(calendar_date),
month(calendar_date),
monthname(calendar_date),
dayofmonth(calendar_date),
dayofweek(calendar_date),
dayname(calendar_date)
from (values(date('2000-01-01'))) as t(calendar_date)
union all
select
calendar_date + 1,
year(calendar_date + 1),
month(calendar_date + 1),
monthname(calendar_date + 1),
dayofmonth(calendar_date + 1),
dayofweek(calendar_date + 1),
dayname(calendar_date + 1)
from cte where calendar_date < date('2021-01-01')
)
select * from cte
Note: it is unclear to me what column CALENDAR_YEAR_MONTH means, so I left it apart.
Demo on DB Fiddle for the first 10 days:
CALENDAR_DATE | CALENDAR_YEAR | CALENDAR_MONTH_NUMBER | CALENDAR_MONTH_NAME | CALENDAR_DAY_OF_MONTH | CALENDAR_DAY_OF_WEEK | CALENDAR_DAY_NAME
------------: | ------------: | --------------------: | ------------------: | --------------------: | -------------------: | ----------------:
2000-01-01 | 2000 | 1 | January | 1 | 7 | Saturday
2000-01-02 | 2000 | 1 | January | 2 | 1 | Sunday
2000-01-03 | 2000 | 1 | January | 3 | 2 | Monday
2000-01-04 | 2000 | 1 | January | 4 | 3 | Tuesday
2000-01-05 | 2000 | 1 | January | 5 | 4 | Wednesday
2000-01-06 | 2000 | 1 | January | 6 | 5 | Thursday
2000-01-07 | 2000 | 1 | January | 7 | 6 | Friday
2000-01-08 | 2000 | 1 | January | 8 | 7 | Saturday
2000-01-09 | 2000 | 1 | January | 9 | 1 | Sunday
2000-01-10 | 2000 | 1 | January | 10 | 2 | Monday
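To materialize the rows rather than just select them, the same CTE can feed an INSERT directly. A sketch, assuming CALENDAR_YEAR_MONTH is meant as year * 100 + month:
INSERT INTO TEST.CALENDAR
  (CALENDAR_DATE, CALENDAR_YEAR, CALENDAR_MONTH_NUMBER, CALENDAR_MONTH_NAME,
   CALENDAR_DAY_OF_MONTH, CALENDAR_DAY_OF_WEEK, CALENDAR_DAY_NAME, CALENDAR_YEAR_MONTH)
WITH cte (calendar_date) AS (
  SELECT DATE('2000-01-01') FROM sysibm.sysdummy1
  UNION ALL
  SELECT calendar_date + 1 FROM cte WHERE calendar_date < DATE('2020-12-31')
)
SELECT calendar_date,
       YEAR(calendar_date),
       MONTH(calendar_date),
       MONTHNAME(calendar_date),
       DAYOFMONTH(calendar_date),
       DAYOFWEEK(calendar_date),
       DAYNAME(calendar_date),
       YEAR(calendar_date) * 100 + MONTH(calendar_date)
FROM cte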
Problem • Relational Knowledge
Currently, I cannot get this to work at all let alone include any logic for any other columns.
Well, there is a reason for that:
Since the Relational Model is founded on First Order Predicate Calculus (aka First Order Logic)
there is nothing in the universe that cannot be defined in terms of the RM, and stored in a Relational database (ie. one that complies with the RM), such that it can be retrieved easily and without limitation (including complex queries and DW).
Since SQL is the data sub-language for the RM, and it is Set-oriented
there is nothing, no code requirement, that cannot be implemented in Set-oriented SQL.
DB2 is a Relational database engine, with genuine Set-oriented processing, using SQL.
It appears that you are not aware of that. Instead, you are attempting to:
define low-level data structures
that you think you need for your programming,
rather than ones that are required within the RM and SQL,
that define the data, as data, and nothing but data.
write code that you need to manipulate those data structures, which is:
(a) procedural (one row at a time; WHILE loops; CURSORS; similar abominations), instead of Set-oriented, and
(b) thus the code is consequently complex, if not impossible.
Not to mention, slow as maple syrup in winter
Rather than using the available blindingly fast, Set-oriented code, which will be simple and straight-forward.
The problem may be a bit tricky, but the tables and the code required are not.
Problem • Platform Knowledge
An example of this would be having One column as the calendar date (2000-01-01) and the year column reading the year from the values within the calendar date column (2000)
That breaks two principles, and results in massive duplication within each row:
Use the correct datatype for the datum, as you have with CALENDAR_DATE. Only.
It is a DATE datatype
Using the built-in DATE datatype and DATE_PART(), DATEADD() functions means that DB2 controls the year; month; day; etc values
and all DATE errors such as 29-Feb-2019 and 31-Sep-2019 are prevented
as opposed to your code, which may have one or more errors.
a column that contains any part of a date must be stored as a DATE datatype (any part of a time as TIME; date and time as DATETIME; etc).
It breaks Codd's First Normal Form (as distinct from the ever-changing insanity purveyed as "1NF" by the pretenders):
Each domain [column, datum] must be Atomic wrt the platform.
DB2 handles DATE and DATE_PART() perfectly, so there is no excuse.
All the following columns are redundant, duplicates of CALENDAR_DATE in the same row:
CALENDAR_YEAR
CALENDAR_MONTH_NUMBER
CALENDAR_MONTH_NAME
CALENDAR_DAY_OF_MONTH
CALENDAR_DAY_OF_WEEK
CALENDAR_DAY_NAME
CALENDAR_YEAR_MONTH
Kind of like buying a car (CALENDAR_DATE), putting it in drive, and then walking beside it (7 duplicated DATE parts). You need to understand the platform and SQL, and trust it a little.
Not only that, but you will have a lot of fun, and no hair left, trying to populate those duplicate columns without making mistakes.
It needs to be said, you need to know all the date Functions in DB2 reasonably well, in order to use it proficiently.
They are duplicate columns because the values can be derived easily via DATE_PART(), something like:
SELECT DATE,
Year = DATE_PART( 'YEAR', DATE ),
MonthNo = DATE_PART( 'MONTH', DATE ),
MonthName = SUBSTR( 'January  February March    April    May      June     July     August   SeptemberOctober  November December ',
( DATE_PART( 'MONTH', DATE ) - 1 ) * 9 + 1, 9 ),
DayOfMonth = DATE_PART( 'DAY', DATE ),
DayOfWeek = DATE_PART( 'DOW', DATE ),
DayName = SUBSTR( 'Sunday   Monday   Tuesday  WednesdayThursday Friday   Saturday ',
( DATE_PART( 'DOW', DATE ) - 1 ) * 9 + 1, 9 ),
YearMonth = DATE_PART( 'YEAR', DATE ) * 100 + DATE_PART( 'MONTH', DATE )
FROM TEST.CALENDAR
In Sybase, I do not have to use SUBSTR() because I have the MonthName and DayName values in tables; the query is simpler still. Or else use CASE.
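For a DB2-flavoured sketch of the same derivation, the scalar functions used earlier in this thread avoid the SUBSTR() padding altogether:
SELECT CALENDAR_DATE,
       YEAR(CALENDAR_DATE)                              AS Year,
       MONTH(CALENDAR_DATE)                             AS MonthNo,
       MONTHNAME(CALENDAR_DATE)                         AS MonthName,
       DAYOFMONTH(CALENDAR_DATE)                        AS DayOfMonth,
       DAYOFWEEK(CALENDAR_DATE)                         AS DayOfWeek,
       DAYNAME(CALENDAR_DATE)                           AS DayName,
       YEAR(CALENDAR_DATE) * 100 + MONTH(CALENDAR_DATE) AS YearMonth
FROM TEST.CALENDAR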
Do not prefix the columns in each table with the table name. In SQL, to reference a column in a particular table, in order to resolve ambiguity, use:
TableName.ColumnName
Same as prefixing the table name with a user name: TEST.CALENDAR.
The full specification is as follows, with DB2 supplying the relevant defaults based on the context of the command:
[SERVER.][Database.][Owner.][Table.]Column
The reason for this rule is this: columns in one table may well be related to the same column in another table, and should be named the same. That is the nature of Relational. If you break this rule, it will retard your progressive understanding of the Relational Model, and of SQL, its data sub-language.
Problem • Data Knowledge
The aim is to have one row per day along with some other columns that will use logic based on the calendar date generated.
Why on earth would you do that ?
We store Facts about the universe in a Relational database. Only.
We do not need to store non-facts, such as:
Kyle's name is NOT "Fred"
CustomerCode "IBX" does NOT exist.
A non-fact is simply the absence of a stored Fact.
If Fred does not exist in the Person table, and you
SELECT ... FROM Person WHERE Name = "Fred"
you will obtain an empty result set.
As it should be.
You are storing the grid that you imagine, consisting of
20 years
* 365 days
* whatever Key is relevant [eg. CustomerCode, etc),
in the form of rows.
That will only keep the database chock-full of empty rows, storing non-facts such as [eg.] CustomerCode XYZ has no event on each date for the next 20 years.
What you imagine, is the result set, or the view, which you may construct in the GUI app. It is not the storage.
Store only Facts, [eg.] actual Events per Customer.
Solution
Now for the solution.
Let me assure you that I have implemented this structure, upon which fairly complex logic has been built, in quite a few databases.
- The problem is, educating the developers in order to get them to write correct SQL code.
- Your situation is that of a developer, trying to not only write non-simple code, but to define the structures upon which it depends.
- Two distinct and different sciences.
Data Model
First, a visual data model, in order to understand the data properly.
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993
My IDEF1X Introduction is essential reading for beginners.
SQL DDL
Only because you appear to work at that level:
CREATE TABLE TEST.Customer (
CustomerCode CHAR(6) NOT NULL,
Name CHAR(30) NOT NULL,
CONSTRAINT PK
PRIMARY KEY ( CustomerCode ),
CONSTRAINT AK1
UNIQUE ( Name )
...
);
CREATE TABLE TEST.Event (
CustomerCode CHAR(6) NOT NULL,
Date DATE NOT NULL,
Event CHAR(30) NOT NULL,
CONSTRAINT pk
PRIMARY KEY ( CustomerCode, Date ),
CONSTRAINT Customer_Schedules_Event
FOREIGN KEY ( CustomerCode )
REFERENCES Customer ( CustomerCode )
...
);
INSERT a row only when a Customer reserves a Date
Non-facts are not stored.
SELECT ... WHERE CustomerCode = ... AND Date = ...
will obtain a row if the Customer has booked an Event on that Date
will obtain nothing (empty result set) if the Customer has NOT booked an Event on that Date
If you need to store recurring Events, such as a birthday for the next 20 years, use a Projection in SQL to generate the 20 INSERT commands, which is the Set-oriented method.
If you cannot code a Projection, and only then, write a WHILE loop, which is procedural, one row per execution.
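A sketch of such a Projection in DB2, generating 20 birthday rows in one Set-oriented INSERT for a hypothetical customer 'ABC123' (the Event table as defined above; the date is illustrative):
INSERT INTO TEST.Event (CustomerCode, Date, Event)
WITH n (i) AS (
  SELECT 0 FROM sysibm.sysdummy1
  UNION ALL
  SELECT i + 1 FROM n WHERE i < 19
)
SELECT 'ABC123', DATE('2021-06-15') + i YEARS, 'Birthday'
FROM n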
Please feel free to ask questions, the more specific the better.
As you can see, this Question is really about how to set up a Calendar for Events, and about modelling data for a Relational database, but I won't change the title until I am sure this answer is what you are seeking. I will add the tag.

Referencing current row in FILTER clause of window function

In PostgreSQL 9.4 the window functions have the new option of a FILTER to select a sub-set of the window frame for processing. The documentation mentions it, but provides no sample. An online search yields some samples, including from 2ndQuadrant but all that I found were rather trivial examples with constant expressions. What I am looking for is a filter expression that includes the value of the current row.
Assume I have a table with a bunch of columns, one of which is of date type:
col1 | col2 | dt
------------------------
1 | a | 2015-07-01
2 | b | 2015-07-03
3 | c | 2015-07-10
4 | d | 2015-07-11
5 | e | 2015-07-11
6 | f | 2015-07-13
...
A window definition for processing on the date over the entire table is trivially constructed: WINDOW win AS (ORDER BY dt)
I am interested in knowing how many rows are present in, say, the 4 days prior to the current row (inclusive). So I want to generate this output:
col1 | col2 | dt | count
--------------------------------
1 | a | 2015-07-01 | 1
2 | b | 2015-07-03 | 2
3 | c | 2015-07-10 | 1
4 | d | 2015-07-11 | 3
5 | e | 2015-07-11 | 3
6 | f | 2015-07-13 | 4
...
The FILTER clause of the window functions seems like the obvious choice:
count(*) FILTER (WHERE current_row.dt - dt <= 4) OVER win
But how do I specify current_row.dt (for lack of a better syntax)? Is this even possible?
If this is not possible, are there other ways of selecting date ranges in a window frame? The frame specification is no help as it is all row-based.
I am not interested in alternative solutions using sub-queries, it has to be based on window processing.
You are not actually aggregating rows, so the new aggregate FILTER clause is not the right tool. A window function is more like it. A problem remains, however: the frame definition of a window cannot depend on values of the current row. It can only count a given number of rows preceding or following with the ROWS clause.
To make that work, aggregate counts per day and LEFT JOIN to a full set of days in range. Then you can apply a window function:
SELECT t.*, ct.ct_last4days
FROM (
SELECT *, sum(ct) OVER (ORDER BY dt ROWS 3 PRECEDING) AS ct_last4days
FROM (
SELECT generate_series(min(dt), max(dt), interval '1 day')::date AS dt
FROM tbl t1
) d
LEFT JOIN (SELECT dt, count(*) AS ct FROM tbl GROUP BY 1) t USING (dt)
) ct
JOIN tbl t USING (dt);
Omitting ORDER BY dt in the window frame definition usually works, since the order is carried over from generate_series() in the subquery. But there are no guarantees in the SQL standard without explicit ORDER BY and it might break in more complex queries.
SQL Fiddle.
Related:
Select finishes where athlete didn't finish first for the past 3 events
PostgreSQL: running count of rows for a query 'by minute'
PostgreSQL unnest() with element number
I don't think there is any syntax that means "current row" in an expression. The gram.y file for postgres makes a filter clause take just an a_expr, which is just the normal expression clauses. There is nothing specific to window functions or filter clauses in an expression.
As far as I can find, the only current row notion in a window clause is for specifying the window frame boundaries. I don't think this gets you what you want.
It's possible that you could get some traction from an enclosing query:
http://www.postgresql.org/docs/current/static/sql-expressions.html
When an aggregate expression appears in a subquery (see Section 4.2.11 and Section 9.22), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate's arguments (and filter_clause if any) contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query.
but it's not obvious to me how.
https://www.postgresql.org/docs/release/11.0/
Window functions now support all framing options shown in the SQL:2011 standard, including RANGE distance PRECEDING/FOLLOWING, GROUPS mode, and frame exclusion options
https://dbfiddle.uk/p-TZHp7s
You can do something like
count(dt) over(order by dt RANGE BETWEEN INTERVAL '3 DAYS' PRECEDING AND CURRENT ROW)
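Applied to the sample table above (assuming it is named tbl), the 4-day inclusive window is a 3-day preceding RANGE frame, which reproduces the desired output:
SELECT col1, col2, dt,
       count(*) OVER (ORDER BY dt
                      RANGE BETWEEN INTERVAL '3 days' PRECEDING
                            AND CURRENT ROW) AS count
FROM tbl
ORDER BY dt;
Note that peer rows with the same dt (rows 4 and 5 in the sample) correctly get the same count, because RANGE frames include all peers of the current row.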

Is there an established pattern for SQL queries which group by a range?

I've seen a lot of questions on SO concerning how to group data by a range in a SQL query.
The exact scenarios vary, but the general underlying problem in each is to group by a range of values rather than each discrete value in the GROUP BY column. In other words, to group by a less precise granularity than you're storing in the database table.
This crops up often in the real world when producing things like histograms, calendar representations, pivot tables and other bespoke reporting outputs.
Some example data (tables unrelated):
| OrderHistory | | Staff |
--------------------------- ------------------------
| Date | Quantity | | Age | Name |
--------------------------- ------------------------
|01-Jul-2012 | 2 | | 19 | Barry |
|02-Jul-2012 | 5 | | 53 | Nigel |
|08-Jul-2012 | 1 | | 29 | Donna |
|10-Jul-2012 | 3 | | 26 | James |
|14-Jul-2012 | 4 | | 44 | Helen |
|17-Jul-2012 | 2 | | 49 | Wendy |
|28-Jul-2012 | 6 | | 62 | Terry |
--------------------------- ------------------------
Now let's say we want to use the Date column of the OrderHistory table to group by weeks, i.e. 7-day ranges. Or perhaps group the Staff into 10-year age ranges:
| Week | QtyCount | | AgeGroup | NameCount |
-------------------------------- -------------------------
|01-Jul to 07-Jul | 7 | | 10-19 | 1 |
|08-Jul to 14-Jul | 8 | | 20-29 | 2 |
|15-Jul to 21-Jul | 2 | | 30-39 | 0 |
|22-Jul to 28-Jul | 6 | | 40-49 | 2 |
-------------------------------- | 50-59 | 1 |
| 60-69 | 1 |
-------------------------
GROUP BY Date and GROUP BY Age on their own won't do it.
The most common answers I see (none of which are consistently voted "correct") are to use one or more of:
a bunch of CASE statements, one per grouping
a bunch of UNION queries, with a different WHERE clause per grouping
as I'm working with SQL Server, PIVOT() and UNPIVOT()
a two-stage query using a sub-select, temp table or View construct
Is there an established generic pattern for dealing with such queries?
You can use some of the dimensional modeling techniques, such as fact tables and dimension tables. Order History can act as a fact table with DateKey foreign key relation to a Date dimension.
Date dimension can have a schema such as below:
Note that the Date table is pre-filled with data up to N years.
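For illustration, a minimal shape for that dimension (the column names here are assumptions, chosen to match the query below):
CREATE TABLE DimDate (
    DateKey      INT PRIMARY KEY,  -- e.g. 20120701
    FullDate     DATE NOT NULL,
    CalendarWeek INT NOT NULL,     -- week number used for grouping
    CalendarYear INT NOT NULL
);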
Using an example above, here is a sample query to get the result:
select CalendarWeek, sum(Quantity)
from OrderHistory a
join DimDate b
on a.DateKey = b.DateKey
group by CalendarWeek
For Staff table, you can store Birthday Key instead of age and let the query calculate the age and ranges.
Here is SQL Fiddle
Date dimension population script was taken from here.
As is often the case this SQL problem requires using more than one pattern in composition.
In this case the two you can use are
NTILE
Numbers Table
You can use NTILE to create a set number of groups. However, since you don't have each member of the groups represented, you also need to use a numbers table. Since you're using SQL Server you have it easy, as you don't have to simulate either.
Here's an example for the Staff problem
WITH g as (
SELECT
NTILE(6) OVER (ORDER BY number) grp,
NUMBER
FROM
master..spt_values
WHERE
TYPE = 'P'
and number >=10 and number <=69
)
SELECT
CAST(min(g.number) as varchar) + ' - ' +
CAST(max(g.number) as varchar) AgeGroup ,
COUNT(s.age) NameCount
FROM
g
LEFT JOIN Staff s
ON g.NUMBER = s.Age
GROUP BY
grp
DEMO
You can apply this to dates as well; it just requires some date-to-day-number manipulation.
Take a look at the OVER clause and its associated clauses: PARTITION BY, ROW, RANGE...
Determines the partitioning and ordering of a rowset before the associated window function is applied. That is, the OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.
My favorite case in this genre is where transactions must be grouped by fiscal quarter or fiscal year. The fiscal quarter or fiscal year boundaries of various enterprises can border on the bizarre.
My favorite way to implement this is to create a separate table for the attributes of a date. Let's call the table "Almanac". One of the columns in this table is the fiscal quarter, and another one is the fiscal year. The key to this table is of course the date. Ten years' worth of data fills up 3,650 rows, plus a few for leap years. You then need a program that can populate this table from scratch. All the enterprise calendar rules are built into this one program.
When you need to group transaction data by fiscal quarter, you just join with this table over date, and then group by fiscal quarter.
I figure this pattern could be extended to groupings by other kinds of ranges, but I've never done it myself.
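A sketch of that join (Almanac as described above; the transaction table and its columns are illustrative):
SELECT a.FiscalQuarter, SUM(t.Amount) AS Total
FROM Transactions t
JOIN Almanac a ON a.CalendarDate = t.TransactionDate
GROUP BY a.FiscalQuarter;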
In your first example your intervals are regular so you can achieve the desired result simply by using functions. Below is an example that gets the data as you require it. The first query keeps the first column in date format (how I would preferably deal with it doing any formatting outside of SQL), the second does the string conversion for you.
DECLARE @OrderHistory TABLE (Date DATE, Quantity INT)
INSERT @OrderHistory VALUES
('20120701', 2), ('20120702', 5), ('20120708', 1), ('20120710', 3),
('20120714', 4), ('20120717', 2), ('20120728', 6)
SET DATEFIRST 7
SELECT DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date) AS WeekStart,
SUM(Quantity) AS Quantity
FROM @OrderHistory
GROUP BY DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date)
SELECT WeekStart,
SUM(Quantity) AS Quantity
FROM @OrderHistory
CROSS APPLY
( SELECT CONVERT(VARCHAR(6), DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date), 6) + ' to ' +
CONVERT(VARCHAR(6), DATEADD(DAY, 7 - DATEPART(WEEKDAY, Date), Date), 6) AS WeekStart
) ws
GROUP BY WeekStart
Something similar can be done for your age grouping using:
SELECT CAST(FLOOR(Age / 10.0) * 10 AS INT) AS AgeGroupStart, COUNT(*) AS NameCount
FROM Staff GROUP BY CAST(FLOOR(Age / 10.0) * 10 AS INT)
However this fails for 30-39 because there is no data for this group.
My stance on the matter would be, if you are doing the query as a one off, using a temp table, cte or case statement should work just fine, this should also extend to reusing the same query on small sets of data.
If you are likely to reuse the group however, or you are referring to significant amounts of data then create a permanent table with the ranges defined and indices applied to any columns required. This is the basis of creating dimensions in OLAP.
Couldn't you treat the age (or date) as a foreign key in a new, tiny table that is just ages (or dates) and their corresponding ranges? A join statement could provide a new table with a column that contains AgeGroups. With the new table you could use the standard group-by method.
It does seem reckless to make a new table for grouping, but it would be easy to make programmatically and I think it would be easier to maintain (or drop and recreate) than a case statement or a where clause. If the result of this query is a one-off, a throwaway sql statement would probably work best, but I think my method makes the most sense for long-term use.
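A sketch of that lookup-table idea for the Staff example (the names are illustrative, and the table would be filled programmatically, one row per age):
CREATE TABLE AgeRange (
    Age      INT PRIMARY KEY,
    AgeGroup VARCHAR(10) NOT NULL  -- e.g. '10-19', '20-29', ...
);
SELECT r.AgeGroup, COUNT(s.Age) AS NameCount
FROM AgeRange r
LEFT JOIN Staff s ON s.Age = r.Age
GROUP BY r.AgeGroup;
The LEFT JOIN keeps empty groups such as 30-39 in the result with a count of 0.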
Well, some years ago with Oracle DB we did it the following way:
We had two tables: Sessions and Ranges. Ranges had a foreign key that referenced Sessions.
When we needed to perform SQL, we created a new record in Sessions and several new records in Ranges that referred to that session.
Our SQL joined Ranges with a filter by Session:
select sum(t.Value), r.Name
from DataTable t
join Ranges r on (r.Session = ? and r.Start < t.MyDate and r.End > t.MyDate) -- r.End is assumed as the range's end column
group by r.Name
After we got the results, we deleted that record from Sessions, and the records in Ranges were deleted by cascade.
We had a daemon job that purged Sessions of junk records that leaked in extraordinary situations (killed processes, etc.).
This worked perfectly. Since that time Oracle added new SQL clauses, and maybe they could be used instead. But on other RDBMSes this is still a valid way.
Another approach is to create a number of functions such as GET_YEAR_BY_DATE or GET_QUARTER_BY_DATE or GET_WEEK_BY_DATE (they would return the start date of the corresponding period; for example, for any date, return the start date of its year). And then group by them:
select sum(Value), GET_YEAR_BY_DATE(MyDate) from DataTable
group by GET_YEAR_BY_DATE(MyDate)

Select query with date condition

I would like to retrieve the records with certain dates: after d/mm/yyyy, or after d/mm/yyyy and before d/mm/yyyy. How can I do it?
SELECT date
FROM table
WHERE date > 1/09/2008;
and
SELECT date
FROM table
WHERE date > 1/09/2008;
AND date < 1/09/2010
It doesn't work.
Be careful, you're unwittingly asking "where the date is greater than one divided by nine, divided by two thousand and eight".
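For example, 1/09/2008 evaluates to the arithmetic expression 1 ÷ 9 ÷ 2008 ≈ 0.0000553, and Access interprets that number as a date-time a fraction of a second after midnight on 30 December 1899 (day zero of its date system), so date > 1/09/2008 matches practically every row.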
Put # signs around the date, like this #1/09/2008#
The semicolon character is used to terminate the SQL statement, so the stray ; after the first condition in your second query ends the statement before the AND is read.
You can either use # signs around a date value or use Access's (ACE, Jet, whatever) cast to DATETIME function CDATE(). As its name suggests, DATETIME always includes a time element so your literal values should reflect this fact. The ISO date format is understood perfectly by the SQL engine.
Best not to use BETWEEN for DATETIME in Access: it's modelled using a floating point type and anyhow time is a continuum ;)
DATE and TABLE are reserved words in the SQL Standards, ODBC and Jet 4.0 (and probably beyond), so they are best avoided as data element names.
Your predicates suggest open-open representation of periods (where neither the start date nor the end date is included in the period), which is arguably the least popular choice. It makes me wonder if you meant to use closed-open representation (where the start date is included but the period ends immediately prior to the end date):
SELECT my_date
FROM MyTable
WHERE my_date >= #2008-09-01 00:00:00#
AND my_date < #2010-09-01 00:00:00#;
Alternatively:
SELECT my_date
FROM MyTable
WHERE my_date >= CDate('2008-09-01 00:00:00')
AND my_date < CDate('2010-09-01 00:00:00');
select Qty, vajan, Rate, Amt, nhamali, ncommission, ntolai
from SalesDtl, SalesMSt
where SalesDtl.PurEntryNo = 1
and SalesMST.SaleDate = #22/03/2014#
and SalesMST.SaleNo = SalesDtl.SaleNo;
That should work.
I think what you are looking for is a range condition in the SELECT. In MySQL you can specify greater than (>) or less than (<) on the year like this:
select * from <table name> where year(<column name>) > <year> OR year(<column name>) < <year>;
For example:
select name, birth from pet1 where year(birth) > 1996 OR year(birth) < 1989;
+----------+------------+
| name | BIRTH |
+----------+------------+
| bowser | 1979-09-11 |
| chirpy | 1998-09-11 |
| whistler | 1999-09-09 |
+----------+------------+
For a simple one-sided range, use only greater than / less than:
mysql> select <column name> from <table name> where year(<column name>) > 1996;
For example:
mysql> select name from pet1 where year(birth) > 1996 OR year(birth) < 1989;
+----------+
| name |
+----------+
| bowser |
| chirpy |
| whistler |
+----------+
3 rows in set (0.00 sec)