Calculating age from incomplete SQL data - sql

Two columns in the table look like this:
Year of birth | ID
2005 | -
1997 | -
85 | -
- | 95...
How do I write a SQL SELECT over all the data that returns each person's age based only on the year of birth? If the year is incomplete, or only the ID is given, then:
- if only two digits of the year are given, such as 85, the year of birth defaults to 1985
- if no year is given, the year is taken from the ID, whose first two digits are the year of birth as above, e.g. for ID 95... the first two digits are 95, so the year of birth is 1995

MySQL
A simple example using the MySQL CASE expression:
SELECT
CASE
WHEN year_of_birth REGEXP '^[0-9]{4}$' THEN year_of_birth
WHEN year_of_birth REGEXP '^[0-9]{2}$' THEN CONCAT('19', year_of_birth)
ELSE CONCAT('19', LEFT(ID, 2)) -- only the first two digits of the ID encode the year
END AS year_of_birth
FROM Accounts;
First check for a four-digit year_of_birth; if there is none, check for a two-digit one; otherwise fall back to the ID. CONCAT prepends "19" to the two-digit year or to the first two digits of the ID, and REGEXP tests whether the year has four or two digits. To get the age, subtract the result from YEAR(CURDATE()).
Try it here: https://onecompiler.com/mysql/3y6yc7mv2
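The fallback order described above (four-digit year, then two-digit year, then the first two digits of the ID) can also be sketched outside the database. The Python below is only an illustration: the names `birth_year` and `age` and their parameters are hypothetical, and the fixed 1900s century is the assumption this answer makes. Unlike the SQL ELSE branch, the sketch returns None when neither column is usable.

```python
import re
from typing import Optional


def birth_year(year_of_birth: Optional[str], id_value: Optional[str]) -> Optional[int]:
    """Hypothetical helper: resolve a four-digit birth year, or None."""
    yob = (year_of_birth or "").strip()
    if re.fullmatch(r"[0-9]{4}", yob):      # complete year, use as is
        return int(yob)
    if re.fullmatch(r"[0-9]{2}", yob):      # two digits: assume the 1900s
        return 1900 + int(yob)
    ident = (id_value or "").strip()
    if re.match(r"[0-9]{2}", ident):        # fall back to the ID's first two digits
        return 1900 + int(ident[:2])
    return None                             # nothing usable in either column


def age(year_of_birth: Optional[str], id_value: Optional[str], current_year: int) -> Optional[int]:
    """Age in whole years; only the year is known, so month and day are ignored."""
    year = birth_year(year_of_birth, id_value)
    return None if year is None else current_year - year
```

Passing the current year as an argument keeps the result reproducible, for example `age("85", None, 2022)` for a person born in 1985.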

First, I would suggest structuring your database in a cleaner way. Having some years stored as four digits (e.g. 1985) and others as two is confusing and causes issues such as the one you have run into.
That being said, here is an ad hoc Transact-SQL query that calculates the age from the incomplete data.
-- [Year of Birth] is assumed to be a numeric column and ID a string
SELECT YEAR(GETDATE()) -
CASE
WHEN [Year of Birth] IS NULL THEN 1900 + CAST(LEFT(ID, 2) AS INT)
WHEN [Year of Birth] < 100 THEN 1900 + [Year of Birth]
ELSE [Year of Birth]
END AS age
FROM table_name;
This code is untested, and I assumed that the ID column is a string. You'll likely have to make adjustments to make it actually work for your database.
To fix the structure of your table, however, a better approach might be to clean the data first and then calculate the age, using the following commands.
Filling in null year values:
UPDATE table_name
SET [Year of Birth] = CAST(LEFT(ID, 2) AS INT)
WHERE [Year of Birth] IS NULL
Making all year values 4 digits long:
UPDATE table_name
SET [Year of Birth] = 1900 + [Year of Birth]
WHERE [Year of Birth] < 100
Now you can simply subtract the [Year of Birth] column from the current year to calculate the age.
Good Luck!
Here is some relevant documentation
If-Else in SQL
Year Function in SQL
String Slicing in SQL
Casting Strings to Integers in SQL

You can follow these steps:
- filter out rows where both values are null (using the WHERE clause) and pick whichever value is available (using the COALESCE function)
- transform each value into a valid year:
  - if the year of birth has length 2, map it to a year no later than the current year (e.g. in 2022: 22 -> 2022, 23 -> 1923)
  - if the year of birth has length 4, leave it as is
- cast the year of birth string to a number
- compute the difference between the current year and the resulting year
Here's the full query:
WITH cte AS (
SELECT COALESCE(yob, LEFT(ID, 2)) AS yob -- the ID carries the year in its first two digits
FROM tab
WHERE NOT (yob IS NULL AND ID IS NULL)
)
SELECT yob,
YEAR(NOW()) -
CASE WHEN LENGTH(yob) = 2
THEN IF(CONCAT('20',yob) > YEAR(NOW()),
CONCAT('19',yob),
CONCAT('20',yob) )
WHEN LENGTH(yob) = 1
THEN CONCAT('200', yob)
ELSE yob
END +0 AS age
FROM cte
Check the demo here.
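The century choice in the query above is a pivot on the current year: a two-digit value is first read as 20xx and pushed back to 19xx only if that would lie in the future. A minimal Python sketch of the same rule follows; the function name `expand_year` and the explicit `current_year` parameter are assumptions made for illustration, not part of the query.

```python
def expand_year(yob: str, current_year: int) -> int:
    """Hypothetical helper: expand a 1-, 2- or 4-digit year string to a full year."""
    digits = yob.strip()
    if len(digits) == 2:
        candidate = 2000 + int(digits)
        # A year in the future cannot be a birth year, so use the 1900s instead.
        return candidate - 100 if candidate > current_year else candidate
    if len(digits) == 1:
        return 2000 + int(digits)   # same choice as CONCAT('200', yob)
    return int(digits)              # already a full year


def age(yob: str, current_year: int) -> int:
    """Difference between the current year and the expanded birth year."""
    return current_year - expand_year(yob, current_year)
```

With `current_year=2022`, the value "22" stays in 2022 while "23" becomes 1923, which matches the mapping described in the steps.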

Lots of opportunities to clean up what you started with, and lots of open questions too, but the code below should get you started.
drop table if exists #x
create table #x (YearOfBirth nvarchar(4), ID nvarchar(50))
insert into #x values
('2005', NULL),
('1997', NULL),
('85', NULL),
(NULL, '951234567890')
select
year(getdate()) -
case when len(isnull(YearOfBirth, '')) <> 4
then year(convert(date, '01/01/' +
case when YearOfBirth is NULL
then left(ID, 2)
else YearOfBirth end))
else YearOfBirth end
as PossibleAge
from #x
where (isnumeric(YearOfBirth) <> 0 and len(YearOfBirth) in (2, 4))
or (YearOfBirth is NULL and isnumeric(ID) <> 0)
One and three digit years will be ignored. Lots of ways to adjust this, but without knowing data types, etc. it's just meant to be a rough start.

Related

Concat is not working inside case statement

I have a month column with values from 1 to 12. I wrote the query below to convert one-digit values to two digits, so that values like 1 and 2 become 01 and 02, but the concatenation does not work: the month still comes back as a single digit.
Main query:
select
case
when len(month) = 1
then concat(0, month)
else month
end as month_new,
month
from
Table
But when I run the parts of the query separately, as below, the concatenation works and converts a single-digit month to two digits.
Query 1
select top 10 concat(0, month), month
from table
Query 1 alone is working
Query 2
select
case
when len(month) = 1
then 1
else 0
end,
month
from
Table
Query 2 alone works, which means the length check on the month column behaves as expected. But when concat is used inside case, it does not work.
I modified the query as below, and it worked for me:
select
case
when len(month) = 1
then concat(0, month)
else cast(month as varchar)
end as month_new,
month
from
table
The problem is that month is an integer, whereas the result of concat() is a string. A case expression returns a single type, so it casts the string back into an integer and the leading zero is lost. You could force the integer into a string by using cast, but there are better ways to do this.
Instead, just use the FORMAT function:
select
format(month, '00') as month_new
, month
from viivscaazure.F_SALES_DETAIL
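The point of both fixes is that a zero-padded month only exists as a string; as soon as it is converted back to an integer the padding disappears. A small Python sketch of that behaviour, with a hypothetical helper name `pad_month`:

```python
def pad_month(month: int) -> str:
    """Hypothetical helper: format a month number as a two-character string."""
    return f"{month:02d}"


def round_trip(month: int) -> int:
    """Pad the month, then convert back to an integer, as the mixed-type CASE did."""
    return int(pad_month(month))
```

`pad_month(1)` gives the string "01", while `round_trip(1)` is just the integer 1 again, which is why every branch of the case expression has to return a string.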
I don't know which database you are using, and since you didn't provide any sample data, I can only assume that your CASE is not the problem; rather, you are trying to CONCAT a string with an integer in your query.
Maybe you can try putting quotes around the zero so that it is a string, and CAST the result as a string.

How to list records with conditional values and non-missing records

I have a view that produces the result shown in the image below. I need help with the logic.
Requirement:
List of all employees who achieved no less than 100% target in ALL Quarters in past two years.
"B" received 90% in two different quarters. An employee who received less than 100% should NOT be listed.
Notice that "A" didn't work for Q2-2016. An employee who didn't work for that quarter should NOT be listed.
"C" is the only one who worked full two years, and received 100% in each quarter.
Edit: added image link showing Employee name,Quarter, Year, and the score.
https://i.imgur.com/FIXR0YF.png
The logic is pretty easy, it's math with quarters that is a bit of a pain.
There are 8 quarters in the last two years, so you simply need to select all the employee names in the last two years with a target >= 100%, group by employee name, and apply a HAVING clause to limit the output to those employees with count(*) = 8.
To get the current year and quarter, you can use these expressions:
cast(extract('year' from current_date) as integer) as yr,
(cast(extract('month' from current_date) as integer)-1) / 3 + 1 as quarter;
Subtract 2 from the current year to find the previous year and quarter. The code will be clearer if you put these expressions in a subquery because you will need them multiple times for the quarter arithmetic. To do the quarter arithmetic you must extract the integer value of the quarter from the text values you have stored.
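The quarter arithmetic is easy to check in isolation. The sketch below mirrors the two expressions above in Python; `year_and_quarter` and `window_start` are hypothetical names, and the date is passed in explicitly so the result does not depend on when the code runs.

```python
from datetime import date
from typing import Tuple


def year_and_quarter(d: date) -> Tuple[int, int]:
    """Year and quarter (1 to 4) of a date, using the same (month - 1) / 3 + 1 formula."""
    return d.year, (d.month - 1) // 3 + 1


def window_start(d: date) -> Tuple[int, int]:
    """The same quarter two years earlier; quarters after it fall inside the window."""
    year, quarter = year_and_quarter(d)
    return year - 2, quarter
```

For a date in May 2017 this gives quarter 2 of 2017, and a window that starts after quarter 2 of 2015.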
Altogether, the solution should look something like this:
select
employee
from
(select employee, cast(right(quarter,1) as integer) as qtr, year
from your_table
where target >= 100
) as tgt
cross join (
select
cast(extract('year' from current_date) as integer) as yr,
(cast(extract('month' from current_date) as integer)-1) / 3 + 1 as quarter
) as qtr
where
tgt.year between qtr.yr-1 and qtr.yr
or (tgt.year = qtr.yr - 2 and tgt.qtr > qtr.quarter)
group by
employee
having
count(*) = 8;
This is untested.
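The counting idea behind the HAVING clause can be sketched in Python as follows. This assumes the input has already been restricted to the last eight quarters and contains at most one row per employee and quarter; the function name `full_achievers` and the `(employee, score)` row shape are made up for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def full_achievers(rows: Iterable[Tuple[str, float]], required_quarters: int = 8) -> List[str]:
    """Employees with a score of at least 100 in every one of the required quarters."""
    # Count only the quarters in which the target was met.
    counts = Counter(employee for employee, score in rows if score >= 100)
    # A missing quarter and a missed target both leave the count below the requirement.
    return sorted(name for name, n in counts.items() if n == required_quarters)
```

An employee who skipped a quarter and an employee who scored 90 in a quarter are excluded for the same reason: neither reaches eight qualifying rows.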
If you happen to be using Postgres and expect to be doing a lot of quarter arithmetic you may want to define a custom data type as described in A Year and Quarter Data Type for PostgreSQL

PHP SQL Select between 4 columns

I'm looking for a solution where I can select the entries between two dates. My table looks like this:
ID | YEAR | MONTH | ....
Now I want to SELECT all entries between
MONTH 9 | YEAR 2015
MONTH 1 | YEAR 2016
I don't get any entries, because the second month is lower than the first month. Here is my query:
SELECT *
FROM table
WHERE YEAR >= '$year'
AND MONTH >= '$month'
AND YEAR <= '$year2'
AND MONTH <= '$month2'
I can't change the columns of the table, because the CSV import is structured like this. Can anyone help me with this?
The years aren't disconnected from the months, so you can't test them separately.
Try something like
$date1 = $year*100+$month; // will be 201509
$date2 = $year2*100+$month2; // will be 201601
...
SELECT * FROM table WHERE (YEAR*100)+MONTH >= '$date1' AND (YEAR*100)+MONTH <= '$date2'
Make sure you protect against SQL injection though.
SELECT
*
FROM
`my_table`
WHERE
((`YEAR` * 12) + `MONTH`) >= (($year * 12) + $month)
AND ((`YEAR` * 12) + `MONTH`) <= (($year2 * 12) + $month2)
Since they aren't date fields, you need to convert to numbers that can be compared against. Multiplying the year by 12 and adding the month will give you a unique number specific to that month of the year. Then you can compare on that.
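To see why a single combined number works where separate year and month tests fail, here is a short Python sketch of the year * 12 + month comparison. The names `month_index` and `in_range` are hypothetical, and the range bounds are inclusive as in the query above.

```python
from typing import Tuple


def month_index(year: int, month: int) -> int:
    """Map (year, month) to one integer that grows by 1 with every calendar month."""
    return year * 12 + month


def in_range(year: int, month: int, start: Tuple[int, int], end: Tuple[int, int]) -> bool:
    """True if (year, month) lies between start and end, both inclusive."""
    return month_index(*start) <= month_index(year, month) <= month_index(*end)
```

With the range from September 2015 to January 2016, November 2015 is accepted even though 11 is greater than the end month 1, because only the combined index is compared.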
There are a couple of good answers, but assuming that you don't/can't change the date's format, something you can do is
WHERE ((YEAR > '$year') OR
(YEAR = '$year' AND MONTH >= '$month'))
AND ((YEAR < '$year2') OR
(YEAR = '$year2' AND MONTH <= '$month2'))
I would suggest the workarounds though (like alphabetically comparing in YYYYMM[DD] format).
You need to pad the month to make sure it starts with a zero. Otherwise 20162 will be lower than 201512, for example.
$date1 = $year . str_pad($month, 2, "0", STR_PAD_LEFT);
$date2 = $year2 . str_pad($month2, 2, "0", STR_PAD_LEFT);
"SELECT * FROM dates WHERE concat(`year`, LPAD(`month`, 2, '0')) >= '$date1' AND concat(`year`, LPAD(`month`, 2, '0')) <= '$date2'"
Though there are many ways to solve this problem, the best way is to convert these values into a proper date type in the MySQL query using STR_TO_DATE, which plays a role similar to PHP's strtotime. Your new query should look like this:
SELECT
d.*
from
dates as d
where
STR_TO_DATE( concat('1,',d.month,',',d.year) ,'%d,%m,%Y') > STR_TO_DATE('1,5,2015','%d,%m,%Y')
and
STR_TO_DATE( concat('1,',d.month,',',d.year) ,'%d,%m,%Y') < STR_TO_DATE('1,4,2016','%d,%m,%Y')
Using this technique you can easily compare dates and do much more and not worry about other complexities of calendars.
Source: MySQL date and time functions

count number of occurrences in a year in a particular field

I have strings like FVS101209GO5 stored in an MS Access table. I want to count the number of strings in a certain year; in the example that would be the year 2010.
I was doing
query = "SELECT SUM( IIF( Mid( KEYLastName, 4, 2) , 1,0)) AS occur FROM MyTable WHERE Year(mydate)=2010 ;"
The length of the string is 12 or 13, for the examples @JW added
qwe123456XXX - 2012
asd345678XXX - 2034
FVS101209GO5 - 2010
If you wish to find the count of occurrences of various years within a string, you might like to use:
SELECT Mid([KEYLastName],4,2) AS [Year],
Count(KEYLastName) AS CountOfOccurances
FROM MyTable
GROUP BY Mid([KEYLastName],4,2)
This will return all the two digit years at (4,2) and the number of times they each occur.
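The same grouping can be sketched in Python to check the expected counts on sample keys. Mid(x, 4, 2) is 1-based, so it corresponds to the slice `[3:5]`; the function name `count_by_year` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_by_year(keys: Iterable[str]) -> Counter:
    """Count keys by their two-digit year, taken from characters 4 and 5."""
    # Keys shorter than five characters carry no year and are skipped.
    return Counter(key[3:5] for key in keys if len(key) >= 5)
```

For the sample strings in the question, this yields one entry each for "10", "12" and "34".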
Edit re Comments
SELECT KEYLastName,
Mid([KEYLastName],4,2) AS [Year],
DCount("*","MyTable","Mid([KEYLastName],4,2)='"
& Mid([KEYLastName],4,2) & "'") AS YearCount
FROM MyTable
Seems the 4th and 5th characters in KEYLastName represent the last 2 digits of a year, so "FVS101209GO5" is for 2010. If that is correct you can count the number of KEYLastName values which represent 2010 with either of these 2 queries:
SELECT Sum(IIf(Mid(KEYLastName, 4, 2) = "10", 1, 0)) AS occur
FROM MyTable;
SELECT Count(IIf(Mid(KEYLastName, 4, 2) = "10", 1, Null)) AS occur
FROM MyTable;
However, I'm unsure why you also have a WHERE clause to restrict the rows to those where mydate is from 2010. If you want that, too, create an index on mydate and include this WHERE clause in one of the above queries.
WHERE mydate >= #2010-1-1# AND mydate < #2011-1-1#
With an index on mydate that should be much faster than asking the db engine to apply the Year() function to the mydate value from every row in the table.

Select range age with unit of measure in same field

I am working on a table where the age of a person is in a string field where it is in the following format: (amount UnitOfMeasurement)
1 year old = 1 y
11 months old = 11 m
5 Days old = 5 d
I am trying to search within a range of ages. Is it possible to do this via a SQL query that orders the days (d) first, then months (m), and then years (y)?
The database is on SQL Server 2008, but the query will probably be done on Access as it is used for a report's record source.
The first thing I'd do in your situation is try to clean up the messy age field, and standardise it. A quick start might be to create a query where you separate the age value and the age unit, by using expressions such as:
age_unit: Right([age], 1)
and
age_value: Val([age])
If you then sort by age_unit and age_value, you will get all ages sorted correctly (under the assumption that an age in days is always less than an age in months, which in turn is always less than an age in years). Note that you must sort by unit first, then value.
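The unit-then-value ordering can be sketched in Python like this. It relies on the same assumption as the paragraph above (any age in days is less than any age in months, and so on) and on the units sorting alphabetically as d, m, y; the name `sort_key` and the "amount unit" input format are taken from the question, not from any library.

```python
from typing import Tuple


def sort_key(age: str) -> Tuple[str, int]:
    """Split an age such as '11 m' into (unit, value) so the unit is compared first."""
    value, unit = age.split()
    return unit.lower(), int(value)


def sort_ages(ages):
    """Sort ages by unit (d, m, y), then by numeric value within each unit."""
    return sorted(ages, key=sort_key)
```

Sorting by value first would put "1 y" ahead of "5 d", which is why the unit has to be the leading part of the key.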
If you want to return ages between a certain minimum and maximum, it's not a problem as long as you stick to a single unit, such as all ages between 5 years and 15 years. Just enter "y" as a criterion under the "age_unit" field (assuming you're using the visual query builder here) and enter "Between 5 and 15" under the "age_value" field.
If you're mixing units ("all ages between 6 months and 2 years") it gets a little more complicated. In this case you'd need to do the following:
On one criteria row you'd enter the following values for each field:
age_unit: "m"
age_value: >=6
And then on the next criteria row:
age_unit: "y"
age_value: <=2
This will return all ages having unit "m" and a value >= 6 OR having unit "y" with a value <=2.
Another somewhat simpler solution would be to convert all ages to a standard unit such as years, by doing some simple calculations, e.g. divide "d" unit values by 365.25, and divide "m" unit values by 12. Then create a new field in your table for the new standardised age data.
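A minimal Python sketch of that conversion, using the divisors mentioned above (365.25 days and 12 months per year). The function name `age_in_years` is hypothetical, and the result is an approximation because months and years vary in length.

```python
def age_in_years(age: str) -> float:
    """Convert an age such as '5 d', '11 m' or '1 y' to an approximate number of years."""
    value, unit = age.split()
    # Conversion factors to years for each unit used in the age field.
    factors = {"d": 1 / 365.25, "m": 1 / 12, "y": 1.0}
    return float(value) * factors[unit.lower()]


def in_age_range(age: str, low: str, high: str) -> bool:
    """True if age lies between low and high (inclusive), even with mixed units."""
    return age_in_years(low) <= age_in_years(age) <= age_in_years(high)
```

Once everything is in one unit, a mixed range such as "between 6 months and 2 years" becomes a single comparison, for example `in_age_range("11 m", "6 m", "2 y")`.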
Your best bet would be to create a new column with a real DATETIME value in it. You could then write code, such as a CASE expression, to help convert the string into a DATETIME. Once completed, your calculations will become much simpler.
1. This field doesn't hold atomic values, which means that your table is not in 1NF.
You should split the Age field into two columns with atomic values: IntervalType (CHAR(1) ... CHECK (IntervalType IN ('d','m','y'))) and IntervalValue (INT; 1, 2, etc.).
So, instead of Table(...,Age) you can use Table(...,IntervalType,IntervalValue) and
SELECT *
,CONVERT(VARCHAR(10),IntervalValue)
+' '+CASE IntervalType WHEN 'd' THEN 'day' WHEN 'm' THEN 'month' WHEN 'y' THEN 'year' END
+CASE WHEN IntervalValue > 1 THEN 's' ELSE '' END
+' old = '
+CONVERT(VARCHAR(10),IntervalValue)
+' '+IntervalType
FROM table
2. How do you sort these two values: 30 d and 1 month? One month can have from 28 to 31 days.
3. SQL Server solution:
-- Table variables are declared and referenced with @, not #
DECLARE @TestData TABLE
(
Age VARCHAR(25) NOT NULL
,IntervalValue AS CONVERT(INT,LEFT(Age,CHARINDEX(' ',Age))) PERSISTED
,IntervalType AS RIGHT(Age,1) PERSISTED
);
INSERT @TestData
VALUES
('1 year old = 1 y')
,('2 years old = 2 y')
,('11 months old = 11 m')
,('30 Days old = 30 d')
,('5 Days old = 5 d');
SELECT *
FROM @TestData a
ORDER BY a.IntervalType, a.IntervalValue;