How can I select if Date column is as same as current year - sql

This is my Student table
Id(int) | Name(varchar) | registerDate(Date)
1 John 2012-01-01
How can I write a query that checks whether a person's registerDate falls in the current year (2012)?

SELECT *
FROM Student
WHERE YEAR(registerDate) = YEAR(getdate())

The most direct solution would be to use the YEAR or DATEPART function in whatever flavor of SQL you're using. This will probably meet your needs but keep in mind that this approach does not allow you to use an index if you're searching the table for matches. In this case, it would be more efficient to use the BETWEEN operator.
e.g.
SELECT id, name, registerDate
FROM Student
WHERE registerDate BETWEEN '2012-01-01' AND '2012-12-31'
How you would generate the first and last day of the current year will vary by SQL flavor.
Because you're using a range, an index can be utilized. If you were using a function to calculate the year, it would need to be computed for each row in the table instead of seeking directly to the relevant rows.
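To make the range idea concrete, here is a minimal sketch in Python, with SQLite standing in for whatever database you actually use (that substitution, the helper name rows_in_year, the index name and the sample rows are all assumptions made for illustration). It builds the first day of the year and the first day of the following year, then filters with a half-open range on the bare column, which keeps the predicate index-friendly and still behaves sensibly if the column ever carries a time portion.

```python
import sqlite3
from datetime import date

def rows_in_year(conn, year):
    """Return Student rows whose registerDate falls in the given year."""
    start = date(year, 1, 1).isoformat()      # e.g. '2012-01-01'
    end = date(year + 1, 1, 1).isoformat()    # first day of the next year
    # Half-open range on the bare column: no function is applied to
    # registerDate, so an index on it can be used for the lookup.
    return conn.execute(
        "SELECT Id, Name, registerDate FROM Student "
        "WHERE registerDate >= ? AND registerDate < ? ORDER BY Id",
        (start, end),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (Id INTEGER PRIMARY KEY, Name TEXT, registerDate TEXT)"
)
conn.execute("CREATE INDEX ix_student_registerDate ON Student (registerDate)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, "John", "2012-01-01"), (2, "Mary", "2011-12-31"), (3, "Ann", "2012-12-31")],
)

rows_2012 = rows_in_year(conn, 2012)
print(rows_2012)
```

For the current year you would call rows_in_year(conn, date.today().year); in other databases the two boundary dates would be produced with that dialect's own date functions.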

If by chance your flavor of SQL is Microsoft T-SQL, then this works:
SELECT * FROM Student Where datepart(yy,registerDate) = datepart(yy,GetDate())

This should work in MySQL:
SELECT * FROM myTable
WHERE YEAR(registerDate) = YEAR(CURDATE())

Related

In SQL, how to select rows with matching values in one column, based on earliest date in another column

I think this should be a simple SQL exercise, but I am not sure how it is done as I am new to querying dbs with SQL. I have a table that looks like this:
select * from myschema.mytable
customer_name date
nick 2017-06-19 19:26:40
tom 2017-06-21 19:24:40
peter 2017-06-23 21:25:10
nick 2017-06-24 13:43:39
I'd like for this query to return only one row for each unique name. Specifically, I'd like the query to return the rows for each customer_name with the earliest date. In this case, the first row for nick should be returned (with date 2017-06-19), but not the other row with date 2017-06-24.
Is this a simple exercise in SQL?
Thanks!
A simple MIN will do:
SELECT
customer_name,
MIN(date) AS earliest_date
FROM myschema.mytable
GROUP BY customer_name;
For this kind of problem you can use aggregate functions. Basically, you group the rows by name and choose the minimum date:
select customer_name, MIN(date)
FROM myschema.mytable
GROUP BY customer_name

SQL for Middle Value Rather than MIN/MAX or FIRST/LAST

Is there a SQL function to return the middle value of three?
For example, assume I have a table with people who have three cars, sorted alphabetically by AutoMaker.
John: Ford
John: Honda
John: VW
then
MIN(AutoMaker) returns Ford.
MAX(AutoMaker) returns VW.
Is there a similar SQL function that will return Honda?
I am working with MS Access and Oracle.
Thank you.
Short answer: No. It's too specific.
Longer answer: It's too specific. Hence, the "middle" in what you said is actually the second record. But if you had 5 records, it would be the third, and so on. If you need that in practice, just assign a row number to each row (Oracle, Access) and then select the ((n+1)/2)nd row (WHERE row_number = (n+1)/2).
PS - which is the middle row if you have 4 rows? :)
The query could be something like this
select row_id, Field1
FROM tbl
where row_id = (select Int((count(Field1) + 1) / 2) from tbl)
The problem in Access is that you do not have a row_number, so you would need to add a row_id column to the table and then populate it with 1, 2, 3, 4, ... (ordered on Field1).
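The "number the rows, then pick row (n+1)/2" idea can be sketched as follows. This is only an illustration: it runs against SQLite (3.25 or later, for window functions) from Python rather than Oracle or Access, and the table name cars and its columns are made up. Where ROW_NUMBER() is available, as in Oracle, the same shape should apply, although Oracle's / is not integer division, so the expression would need FLOOR() there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (owner TEXT, automaker TEXT)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [("John", "VW"), ("John", "Ford"), ("John", "Honda")],
)

middle_sql = """
SELECT owner, automaker
FROM (
    SELECT owner, automaker,
           ROW_NUMBER() OVER (PARTITION BY owner ORDER BY automaker) AS rn,
           COUNT(*)     OVER (PARTITION BY owner)                    AS n
    FROM cars
) AS ranked
-- SQLite divides integers as integers: n = 3 picks row 2, n = 5 picks row 3,
-- and an even n = 4 picks row 2 (the lower of the two middle rows).
WHERE rn = (n + 1) / 2
"""
middle = conn.execute(middle_sql).fetchall()
print(middle)
```

With an even number of rows there is no single middle row, so the choice of the lower one here is arbitrary.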

SQL - Insert using Column based on SELECT result

I currently have a table called tempHouses that looks like:
avgprice | dates | city
dates are stored as yyyy-mm-dd
However I need to move the records from that table into a table called houses that looks like:
city | year2002 | year2003 | year2004 | year2005 | year2006
The information in tempHouses contains average house prices from 1995 - 2014.
I know I can use SUBSTRING to get the year from the dates:
SUBSTRING(dates, 1, 4)
So basically, for each city in tempHouses.city I need to get the average house price for each of the above years into one record.
Any ideas on how I would go about doing this?
This is a SQL Server approach, and a PIVOT may be better, but here's one way:
SELECT City,
AVG(year2002) AS year2002,
AVG(year2003) AS year2003,
AVG(year2004) AS year2004
FROM (
SELECT City,
-- No ELSE branch: rows outside the year yield NULL, which AVG ignores
-- (an ELSE 0 would drag every average towards zero)
CASE WHEN Dates BETWEEN '2002-01-01T00:00:00' AND '2002-12-31T23:59:59' THEN avgprice
END AS year2002,
CASE WHEN Dates BETWEEN '2003-01-01T00:00:00' AND '2003-12-31T23:59:59' THEN avgprice
END AS year2003,
CASE WHEN Dates BETWEEN '2004-01-01T00:00:00' AND '2004-12-31T23:59:59' THEN avgprice
END AS year2004
-- Repeat for each year
FROM tempHouses
) AS t
GROUP BY City
The inner query gets the data into the correct format for each record (City, year2002, year2003, year2004), whilst the outer query gets the average for each City.
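As a rough, runnable illustration of this conditional-aggregation approach, here is a sketch using SQLite from Python instead of SQL Server (an assumption, as are the sample cities and prices, and only two year columns are shown). The CASE expressions have no ELSE branch, so rows from other years become NULL and are ignored by AVG, and the result is written straight into houses with INSERT ... SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tempHouses (avgprice REAL, dates TEXT, city TEXT)")
conn.execute("CREATE TABLE houses (city TEXT, year2002 REAL, year2003 REAL)")
conn.executemany(
    "INSERT INTO tempHouses VALUES (?, ?, ?)",
    [
        (100.0, "2002-01-15", "Leeds"),
        (200.0, "2002-07-15", "Leeds"),
        (300.0, "2003-03-01", "Leeds"),
        (400.0, "2002-05-01", "York"),
    ],
)

# One row per city; each year column averages only that year's prices.
conn.execute("""
INSERT INTO houses (city, year2002, year2003)
SELECT city,
       AVG(CASE WHEN dates >= '2002-01-01' AND dates < '2003-01-01' THEN avgprice END),
       AVG(CASE WHEN dates >= '2003-01-01' AND dates < '2004-01-01' THEN avgprice END)
FROM tempHouses
GROUP BY city
""")

houses = conn.execute(
    "SELECT city, year2002, year2003 FROM houses ORDER BY city"
).fetchall()
print(houses)
```

A city with no prices for a given year ends up with NULL in that column rather than 0, which is usually what you want for an average.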
There may be many ways to do this, and performance may be the deciding factor on which one to choose.
The best way would be to use a script to perform the query execution for you because you will need to run it multiple times and you extract the data based on year. Make sure that the only required columns are city & row id:
http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
INSERT INTO <table> (city) SELECT DISTINCT `city` FROM <old_table>;
Then for each city extract the average values, insert them into a temporary table and then insert into the main table.
SELECT avg(avgprice), substring(dates, 1, 4) AS yr FROM <old_table> GROUP BY yr;
Otherwise you're looking at a combination query using joins and potentially unions to extrapolate the data. Because you're flattening the table into a single row per city it's going to be a little tough to do. You should create indexes first on the date column if you don't want the database query to fail with memory limits or just take a very long time to execute.

Best practice for setup and querying versioned records in T-SQL

I'm trying to optimize my SQL queries and I always come back to this one issue and I was hoping to get some insight into how I could best optimize this.
For brevity, let's say I have a simple employee table:
tbl_employees
Id HiredDateTime
------------------
1 ...
2 ...
That has versioned information in another table for each employee:
tbl_employees_versioned
Id Version Name HourlyWage
-------------------------------
1 1 Bob 10
1 2 Bob 20
1 3 Bob 30
2 1 Dan 10
2 2 Dan 20
And this is how the latest version records are retrieved in a View:
Select tbl_employees.Id, employees_LatestVersion.Name, employees_LatestVersion.HourlyWage, employees_LatestVersion.Version
From tbl_employees
Inner Join tbl_employees_versioned
ON tbl_employees.Id = tbl_employees_versioned.Id
CROSS APPLY
(SELECT Id, Max(Version) AS Version
FROM tbl_employees_versioned AS employees_LatestVersion
WHERE Id = tbl_employees_versioned.Id
GROUP BY Id) AS employees_LatestVersion
To get a response like this:
Id Version Name HourlyWage
-------------------------------
1 3 Bob 30
2 2 Dan 20
When this query runs against more than 500 employee records, each of which has a few versions, it starts choking and takes a few seconds to run.
There are a couple strikes right off the bat, but I'm not sure how to overcome them.
Obviously the Cross Apply adds some performance loss. Is there a best practice when dealing with versioned information like this? Is there a better way to get just a record with the highest version?
The versioned table doesn't have a clustered index because neither Id nor Version is unique. Concatenated together they would be, but it doesn't work like that. Instead there is a non-clustered index for Id and another one for Version. Is there a better way to index this table to get any performance gain? Would an indexed view really help here?
I think the best way to structure the data is using start dates and end dates. So, the data structure for your original table would look like:
create table tbl_EmployeesHistory (
EmployeeHistoryId int,
EffDate date not null,
EndDate date,
-- Fields that describe the employee during this time
)
Then, you can see the current version using a view:
create view vw_Employees as
select *
from tbl_EmployeesHistory
where EndDate is NULL
In some cases, where future end dates are allowed, the where clause would be:
where coalesce(EndDate, getdate()) >= getdate()
Alternatively, in this case, you can default EndDate to some future date far, far away such as '01-01-9999'. You would add this as the default in the create table statement, make the column not null, and then you can always use the statement:
where getdate() between EffDate and EndDate
As Martin points out in his comment, the coalesce() might impede the use of an index (it does in SQL Server), whereas this does not have that problem.
This is called a slowly changing dimension. Ralph Kimball discusses this concept at some length in his books on data warehousing.
Here's one way you can get a view of the most recent version for each employee:
Select Id, Name, HourlyWage, Version
FROM (
Select E.Id, V.Name, V.HourlyWage, V.Version,
row_number() OVER (PARTITION BY V.ID ORDER BY V.Version DESC) as nRow
From tbl_employees E
Inner Join tbl_employees_versioned V ON E.Id = V.Id
) A
WHERE A.nRow = 1
I suspect that this will perform better than your previous solution. One index across Id and Version in tbl_employees_versioned would most likely also help.
Also, note that you only need to join on tbl_employees if you're selecting fields that are not in tbl_employees_versioned.
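Here is a small runnable sketch of that ROW_NUMBER() approach together with a single index across Id and Version. It uses SQLite (3.25 or later) from Python as a stand-in for SQL Server, which is an assumption, and the index name is made up. The index is declared unique because the question says the two columns are unique in combination; the join to tbl_employees is left out since every selected column lives in the versioned table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_employees_versioned "
    "(Id INTEGER, Version INTEGER, Name TEXT, HourlyWage INTEGER)"
)
# One composite index instead of separate indexes on Id and on Version.
conn.execute(
    "CREATE UNIQUE INDEX ux_emp_id_version ON tbl_employees_versioned (Id, Version)"
)
conn.executemany(
    "INSERT INTO tbl_employees_versioned VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Bob", 10), (1, 2, "Bob", 20), (1, 3, "Bob", 30),
        (2, 1, "Dan", 10), (2, 2, "Dan", 20),
    ],
)

latest_sql = """
SELECT Id, Version, Name, HourlyWage
FROM (
    SELECT Id, Version, Name, HourlyWage,
           -- Number each employee's versions from newest to oldest.
           ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Version DESC) AS nRow
    FROM tbl_employees_versioned
) AS A
WHERE nRow = 1
ORDER BY Id
"""
latest = conn.execute(latest_sql).fetchall()
print(latest)
```

Whether this actually beats the CROSS APPLY version on your data is something to confirm with the execution plan on the real server.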

How can I optimize this SQL query to get rid of the filesort and temp table?

Here's the query:
SELECT
count(id) AS count
FROM `numbers`
GROUP BY
MONTH(created_at),
YEAR(created_at)
ORDER BY
YEAR(created_at),
MONTH(created_at)
That query throws a 'Using temporary' and 'Using filesort' when doing EXPLAIN.
Ultimately what I'm doing is looking at a table of user-submitted tracking numbers, counting the number of submitted rows and grouping the counts by month/year.
i.e. In November 2008 there were 11,312 submitted rows.
UPDATE, here's the DESCRIBE for the numbers table.
id int(11) NO PRI NULL auto_increment
tracking varchar(255) YES NULL
service varchar(255) YES NULL
notes text YES NULL
user_id int(11) YES NULL
active tinyint(1) YES 1
deleted tinyint(1) YES 0
feed text YES NULL
status varchar(255) YES NULL
created_at datetime YES NULL
updated_at datetime YES NULL
scheduled_delivery date YES NULL
carrier_service varchar(255) YES NULL
Give this a shot:
SELECT COUNT(x.id)
FROM (SELECT t.id,
MONTH(t.created_at) 'created_month',
YEAR(t.created_at) 'created_year'
FROM NUMBERS t) x
GROUP BY x.created_month, x.created_year
ORDER BY x.created_year, x.created_month
It's not a good habit to use functions in the WHERE, GROUP BY and ORDER BY clauses because indexes can't be used.
...query throws a 'Using temporary' and 'Using filesort' when doing EXPLAIN.
From what I found, that's to be expected when using DISTINCT/GROUP BY.
Make sure you have a covering index over YEAR and MONTH (that is, both fields within the same index) so that the ORDER BY component of your query can use an index. This should remove the need for a filesort, although a temporary table may still be needed to handle the grouping.
SELECT
count(`id`) AS count, MONTH(`created_at`) as month, YEAR(`created_at`) as year
FROM `numbers`
GROUP BY month, year
ORDER BY year, month
This will be the best you can get, as far as I can tell. I created a table with an id and a datetime column and filled it with 10000 rows. The other answer's query uses a subselect, but that makes no real difference and adds the overhead of the subselect. My query ran in 0.015s and his in 0.016s.
Make sure that you have an index on created_at; this will help your initial query. It is pretty rare to avoid a filesort when a GROUP BY is involved, but it may be possible in other situations. MySQL's docs have an article about this if you feel so inclined. I do not see how those methods can be applied here, with the information you have provided.
Whenever MySQL has to do work in memory, and that work exceeds the available amount (innodb_buffer_pool_size), it starts having to use the disk to store temporary work. You could increase the variable I mentioned, but setting it too high could cause performance problems in other areas.
If you're running a dedicated server, set it to ~50-75%.
The best method would be creating a helper column that would contain numeric values of YEAR and MONTH combined into one number:
YEAR(created_at) * 100 + MONTH(created_at)
Grouping on this column would use INDEX FOR GROUP BY.
However, you can create two helper tables, the first one containing reasonable number of years (say, from 1900 to 2100), the second one containing months (from 0 to 11), and use these tables to generate the sets:
SELECT (
SELECT COUNT(*)
FROM numbers
WHERE created_at >= '1900-01-01' + INTERVAL (y - 1900) YEAR + INTERVAL m MONTH
AND created_at < '1900-01-01' + INTERVAL (y - 1900) YEAR + INTERVAL (m + 1) MONTH
)
FROM year_table
CROSS JOIN
month_table
WHERE y BETWEEN 2008 AND 2010
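The helper-column idea from the start of this answer can be sketched like this. It is a toy version in Python with SQLite standing in for MySQL (an assumption), and the column name created_ym, the index name and the sample timestamps are made up. In MySQL you would populate the column with YEAR(created_at) * 100 + MONTH(created_at) and check with EXPLAIN whether the index is really used for the grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE numbers "
    "(id INTEGER PRIMARY KEY, created_at TEXT, created_ym INTEGER)"
)
conn.execute("CREATE INDEX ix_numbers_created_ym ON numbers (created_ym)")

samples = [
    "2008-11-03 10:00:00",
    "2008-11-20 12:30:00",
    "2008-12-01 09:00:00",
    "2009-01-05 08:15:00",
]
for created_at in samples:
    year, month = int(created_at[0:4]), int(created_at[5:7])
    # Helper value: 2008-11 becomes 200811, so numeric order is calendar order.
    conn.execute(
        "INSERT INTO numbers (created_at, created_ym) VALUES (?, ?)",
        (created_at, year * 100 + month),
    )

# Group and sort on the plain indexed column; no function calls are needed.
counts = conn.execute(
    "SELECT created_ym, COUNT(*) FROM numbers "
    "GROUP BY created_ym ORDER BY created_ym"
).fetchall()
print(counts)
```

The price of this design is that the helper column has to be kept in sync whenever created_at is inserted or changed.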
I'm sorry, but I have to disagree with the other answers.
I think what you need is to add an index to your table, preferably a covering index.
If you add an index on the columns you are searching on (created_at) and also on the columns you want to get a result from (id), then it will be dramatically faster than before.
The reason why you are using a temp table is because you use a group by.
To speed up the group by, you can change the MySQL server settings to increase the size of the tmp table and the max heap table size so that the temp table will be in memory.