Select range age with unit of measure in same field - sql

I am working on a table where a person's age is stored in a string field in the following format (amount UnitOfMeasurement):
1 year old = 1 y
11 months old = 11 m
5 Days old = 5 d
I am trying to search within a range of ages. Is it possible to do this via a SQL query that orders the days (d) first, then months (m), then years (y)?
The database is on SQL Server 2008, but the query will probably be done on Access as it is used for a report's record source.

The first thing I'd do in your situation is try to clean up the messy age field, and standardise it. A quick start might be to create a query where you separate the age value and the age unit, by using expressions such as:
age_unit: Right([age], 1)
and
age_value: Val([age])
If you then sort by age_unit and age_value, you will get all ages sorted correctly (under the assumption that an age in days is always less than an age in months, which in turn is always less than an age in years). Note that you must sort by unit first, then value.
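Here is a minimal sketch of the split-and-sort idea. It uses Python's built-in sqlite3 module as a stand-in for Access. SQLite has no Val() or Right(), so substr() and instr() play those roles. The table name `people` and the sample ages are hypothetical. The sort works because 'd' < 'm' < 'y' also holds alphabetically, so ordering by the unit letter gives days, then months, then years.

```python
import sqlite3


def sorted_ages(ages):
    """Order age strings such as '11 m' by unit (d, m, y), then by numeric value."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE people (age TEXT)")  # hypothetical table
    con.executemany("INSERT INTO people VALUES (?)", [(a,) for a in ages])
    rows = con.execute(
        """
        SELECT age,
               substr(age, -1) AS age_unit,  -- like Right([age], 1)
               CAST(substr(age, 1, instr(age, ' ') - 1) AS INTEGER) AS age_value  -- like Val([age])
        FROM people
        ORDER BY age_unit, age_value  -- unit first, then value
        """
    ).fetchall()
    con.close()
    return [age for age, _unit, _value in rows]


print(sorted_ages(["1 y", "11 m", "5 d", "2 y", "30 d"]))
# ['5 d', '30 d', '11 m', '1 y', '2 y']
```

Casting the value to a number matters: as plain text, '30' would sort before '5'.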
If you want to return ages between a certain minimum and maximum, it's not a problem if you're sticking to a single unit, such as all ages between 5 years and 15 years. Just enter "y" as a criterion under the "age_unit" field (assuming you're using the visual query builder here) and enter "Between 5 and 15" under the "age_value" field.
If you're mixing units ("all ages between 6 months and 2 years") it gets a little more complicated. In this case you'd need to do the following:
On one criteria row you'd enter the following values for each field:
age_unit: "m"
age_value: >=6
And then on the next criteria row:
age_unit: "y"
age_value: <=2
This will return all ages having unit "m" and a value >= 6 OR having unit "y" with a value <=2.
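The two criteria rows amount to an OR of two AND conditions. Below is a sketch of the equivalent WHERE clause, again using sqlite3 as a stand-in. The table and function names are hypothetical, and the 6-month and 2-year bounds are the example from the text.

```python
import sqlite3


def ages_6m_to_2y(ages):
    """Return ages of at least 6 months and at most 2 years, mixing units."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE people (age TEXT)")  # hypothetical table
    con.executemany("INSERT INTO people VALUES (?)", [(a,) for a in ages])
    rows = con.execute(
        """
        WITH split AS (
            SELECT age,
                   substr(age, -1) AS age_unit,
                   CAST(substr(age, 1, instr(age, ' ') - 1) AS INTEGER) AS age_value
            FROM people
        )
        SELECT age FROM split
        WHERE (age_unit = 'm' AND age_value >= 6)  -- first criteria row
           OR (age_unit = 'y' AND age_value <= 2)  -- second criteria row
        ORDER BY age_unit, age_value
        """
    ).fetchall()
    con.close()
    return [age for (age,) in rows]


print(ages_6m_to_2y(["5 d", "3 m", "6 m", "11 m", "1 y", "2 y", "3 y"]))
# ['6 m', '11 m', '1 y', '2 y']
```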
Another somewhat simpler solution would be to convert all ages to a standard unit such as years, by doing some simple calculations, e.g. divide "d" unit values by 365.25, and divide "m" unit values by 12. Then create a new field in your table for the new standardised age data.
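Here is a minimal Python sketch of that conversion. It uses the same factors as the text (365.25 days and 12 months per year), and the function name is hypothetical. The result is only approximate, since real months vary in length.

```python
def age_in_years(age):
    """Convert an age string like '5 d', '11 m' or '1 y' to approximate years."""
    value_text, unit = age.split()
    value = float(value_text)
    unit = unit.lower()
    if unit == "d":
        return value / 365.25
    if unit == "m":
        return value / 12
    if unit == "y":
        return value
    raise ValueError(f"unknown unit in {age!r}")


ages = ["1 y", "11 m", "5 d", "2 y", "3 m"]
print(sorted(ages, key=age_in_years))
# ['5 d', '3 m', '11 m', '1 y', '2 y']
print([a for a in ages if 0.5 <= age_in_years(a) <= 2])
# ['1 y', '11 m', '2 y']
```

With a single unit, a mixed-unit range such as "between 6 months and 2 years" becomes one plain numeric comparison.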

Your best bet would be to create a new column with a real DATETIME value in it. You could then write code, such as a CASE expression, to convert the string into a DATETIME. Once that is done, your calculations will become much simpler.
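One way to read this suggestion is to store the implied date of birth, computed from the age and a reference date (for example, the date the record was captured). That reference date is an assumption, since the question does not say which date the ages are measured from. The sketch below uses SQLite's date() modifiers through Python's sqlite3 module, and the function name is hypothetical.

```python
import sqlite3

UNIT_NAMES = {"d": "days", "m": "months", "y": "years"}


def birth_date(age, reference_date):
    """Turn an age like '11 m' into an ISO date, counted back from reference_date."""
    value, unit = age.split()
    modifier = f"-{int(value)} {UNIT_NAMES[unit.lower()]}"  # e.g. '-11 months'
    con = sqlite3.connect(":memory:")
    (result,) = con.execute("SELECT date(?, ?)", (reference_date, modifier)).fetchone()
    con.close()
    return result


for age in ["5 d", "11 m", "1 y"]:
    print(age, birth_date(age, "2015-06-08"))
# 5 d 2015-06-03
# 11 m 2014-07-08
# 1 y 2014-06-08
```

Ordering by this date, latest first, then lists people from youngest to oldest, and range searches become plain date comparisons.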

1. This field doesn't hold atomic values, which means your table is not in 1NF.
You should split the Age field into 2 columns with atomic values: IntervalType (CHAR(1) ... CHECK (IntervalType IN ('d','m','y'))) and IntervalValue (INT; 1, 2, etc.).
So, instead of Table(...,Age) you can use Table(...,IntervalType,IntervalValue) and
SELECT *
,CONVERT(VARCHAR(10),IntervalValue)
+' '+CASE IntervalType WHEN 'd' THEN 'day' WHEN 'm' THEN 'month' WHEN 'y' THEN 'year' END
+CASE WHEN IntervalValue > 1 THEN 's' ELSE '' END
+' old = '
+CONVERT(VARCHAR(10),IntervalValue)
+' '+IntervalType
FROM table
2. How do you sort these two values: 30 d and 1 m? A month can have anywhere from 28 to 31 days.
3. SQL Server solution:
CREATE TABLE #TestData
(
Age VARCHAR(25) NOT NULL
,IntervalValue AS CONVERT(INT,LEFT(Age,CHARINDEX(' ',Age))) PERSISTED
,IntervalType AS RIGHT(Age,1) PERSISTED
);
INSERT #TestData
VALUES
('1 year old = 1 y')
,('2 years old = 2 y')
,('11 months old = 11 m')
,('30 Days old = 30 d')
,('5 Days old = 5 d');
SELECT *
FROM #TestData a
ORDER BY a.IntervalType, a.IntervalValue;
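Point 2 above can be made concrete: whether 30 d is shorter or longer than 1 m depends on which month is meant. Here is a small check using Python's standard calendar module (the function name is hypothetical).

```python
import calendar


def compare_30_days_to_month(year, month):
    """Return -1, 0 or 1 as 30 days is shorter than, equal to, or longer than that month."""
    _first_weekday, month_days = calendar.monthrange(year, month)
    return (30 > month_days) - (30 < month_days)


for year, month in [(2015, 1), (2015, 2), (2015, 4)]:
    print(year, month, compare_30_days_to_month(year, month))
# 2015 1 -1
# 2015 2 1
# 2015 4 0
```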

Related

Calculating age from incomplete SQL data

Two columns in my table look like this:
Year of birth  ID
-------------  -----
2005           -
1997           -
85             -
-              95...
How do I write a SQL SELECT over all the data that returns the age of each person based on the year of birth, with these rules when the full year is not given or only the ID is given:
- if only two digits of the year are given, such as 85, the year of birth defaults to 1985
- if no year is given, use the ID, whose first two digits are the year of birth as above; e.g. for ID 95... the first two digits are 95, so the year of birth is 1995
MySQL
A simple example of using MySQL CASE function:
SELECT
CASE
WHEN year_of_birth REGEXP '^[0-9]{4}$' THEN year_of_birth
WHEN year_of_birth REGEXP '^[0-9]{2}$' THEN CONCAT("19", year_of_birth)
ELSE CONCAT("19", LEFT(ID, 2))
END as year_of_birth
FROM Accounts;
First, check for a 4-digit year_of_birth; if not found, check for a 2-digit one; if neither is found, use the ID. The CONCAT function prepends "19" to the 2-digit year or to the first two digits of the ID (taken with LEFT), and REGEXP checks for 4- or 2-digit years.
Try it here: https://onecompiler.com/mysql/3y6yc7mv2
Firstly, I would suggest structuring your database in a cleaner way. Having some years formatted as four digits (e.g. 1985) and others as two is confusing, and causes issues such as the one you have run into.
That being said, here is an ad-hoc Transact-SQL query that calculates the age based on the incomplete data:
SELECT YEAR(GETDATE()) -
CASE
WHEN [Year of Birth] IS NULL THEN 1900 + CAST(LEFT(ID, 2) AS INT)
WHEN [Year of Birth] < 100 THEN 1900 + [Year of Birth]
ELSE [Year of Birth]
END AS Age
FROM table_name;
This code is untested, and I assumed that the ID column is a string and that Year of Birth is numeric. You'll likely have to make adjustments to make it actually work for your database.
To fix the structure of your table, however, a better approach might be to clean the data first and then calculate the age, using the following commands.
Filling in null year values:
UPDATE table_name
SET [Year of Birth] = CAST(LEFT(ID, 2) AS INT)
WHERE [Year of Birth] IS NULL;
Making all year values 4 digits long:
UPDATE table_name
SET [Year of Birth] = 1900 + [Year of Birth]
WHERE [Year of Birth] < 100;
Now, you can simply subtract the 'Year of Birth' column from the current year to calculate the age.
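Here is a runnable sketch of this clean-up-then-subtract approach. It uses Python's sqlite3 module as a stand-in for SQL Server: substr() replaces LEFT(), and the column name with spaces is quoted with double quotes. The table name, the sample rows and the fixed current year are assumptions for illustration. Like the answer, it assumes every two-digit year is in the 1900s.

```python
import sqlite3


def clean_and_compute_ages(rows, current_year):
    """rows: (year_of_birth or None, id or None) pairs; returns ages in insertion order."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE people ("Year of Birth" INTEGER, ID TEXT)')  # hypothetical table
    con.executemany("INSERT INTO people VALUES (?, ?)", rows)
    # Fill in missing years from the first two digits of the ID.
    con.execute(
        'UPDATE people SET "Year of Birth" = CAST(substr(ID, 1, 2) AS INTEGER) '
        'WHERE "Year of Birth" IS NULL'
    )
    # Make all years four digits long (assumes 19xx).
    con.execute('UPDATE people SET "Year of Birth" = 1900 + "Year of Birth" WHERE "Year of Birth" < 100')
    query = 'SELECT ? - "Year of Birth" FROM people ORDER BY rowid'
    ages = [age for (age,) in con.execute(query, (current_year,))]
    con.close()
    return ages


print(clean_and_compute_ages([(2005, None), (1997, None), (85, None), (None, "951234567890")], 2022))
# [17, 25, 37, 27]
```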
Good Luck!
Here is some relevant documentation
If-Else in SQL
Year Function in SQL
String Slicing in SQL
Casting Strings to Integers in SQL
You can follow these steps:
- filter out rows where both values are NULL (using the WHERE clause), and fall back to the ID when the year is missing (using the COALESCE function)
- transform each number into a valid year:
  - year of birth has length 2 -> map it to a value not greater than the current year (e.g. in 2022: 22 -> 2022, 23 -> 1923)
  - year of birth has length 4 -> skip
- cast the year of birth string to a number
- compute the difference between the current year and the retrieved year
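The mapping rule in these steps can be sketched in plain Python, which makes it easy to check edge cases before writing the SQL. The function names are hypothetical. The current year is passed in so the results are reproducible; in practice you might use datetime.date.today().year. As in the query, a one-digit year is read as 200x.

```python
def year_of_birth(yob, id_value, current_year):
    """Apply the steps above; returns None when both inputs are missing."""
    if yob is None:
        if id_value is None:
            return None  # filtered out by the WHERE clause in the query
        yob = id_value[:2]  # fall back to the first two digits of the ID
    if len(yob) == 2:
        candidate = 2000 + int(yob)
        # A two-digit year may not land in the future: in 2022, 22 -> 2022 but 23 -> 1923.
        return candidate if candidate <= current_year else candidate - 100
    if len(yob) == 1:
        return 2000 + int(yob)  # mirrors CONCAT('200', yob) in the query
    return int(yob)  # already four digits


def age(yob, id_value, current_year):
    born = year_of_birth(yob, id_value, current_year)
    return None if born is None else current_year - born


print([age(y, i, 2022) for y, i in [("2005", None), ("85", None), (None, "951234567890")]])
# [17, 37, 27]
```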
Here's the full query:
WITH cte AS (
SELECT COALESCE(yob, LEFT(ID, 2)) AS yob
FROM tab
WHERE NOT (yob IS NULL AND ID IS NULL)
)
SELECT yob,
YEAR(NOW()) -
CASE WHEN LENGTH(yob) = 2
THEN IF(CONCAT('20',yob) > YEAR(NOW()),
CONCAT('19',yob),
CONCAT('20',yob) )
WHEN LENGTH(yob) = 1
THEN CONCAT('200', yob)
ELSE yob
END +0 AS age
FROM cte
Check the demo here.
Lots of opportunities to clean up what you started with, and lots of open questions too, but the code below should get you started.
drop table if exists #x
create table #x (YearOfBirth nvarchar(4), ID nvarchar(50))
insert into #x values
('2005', NULL),
('1997', NULL),
('85', NULL),
(NULL, '951234567890')
select
year(getdate()) -
case when len(isnull(YearOfBirth, '')) <> 4
then year(convert(date, '01/01/' +
case when YearOfBirth is NULL
then left(ID, 2)
else YearOfBirth end))
else YearOfBirth end
as PossibleAge
from #x
where (isnumeric(YearOfBirth) <> 0 and len(YearOfBirth) in (2, 4))
or (YearOfBirth is NULL and isnumeric(ID) <> 0)
One- and three-digit years will be ignored. There are lots of ways to adjust this, but without knowing the data types, etc., it's just meant to be a rough start.

SQL; Split a value up using sql in hive/hue

Needing some advice on splitting a number into a date timestamp; I'm currently using Hue to query the Hive DB.
In a table I have a column that is used to capture a unique ref for a record. The value looks like this:
219872021081000741
Contained within this is a date and time, I'm looking to extract (using sql) the date/time from this and have it as a column of its own. Here is the breakdown of the number:
From left to right, the date parts are DD YYYY MM HHMM (the groups 987 and 741 are not part of the date):
21 987 2021 08 1000 741
regex
[0-3]?[0-9]{1}$ref[2][0-9][0-9][0-9][0-1][0-9][0-2][0-9][0-5][0-9][0-9]{3}_"
Using SQL, I want to parse the number and then create a column that formats it as a DD-MM-YY HHMM timestamp. I have reviewed some posts and tried out a few things, but without much luck. The other sticking point is that DD will not always be two digits, e.g. if it is the 1st, it will be 1, not 01.
Trying to incorporate into the below. Thanks in advance for any advice.
select *,
cast((UTC +(60*60*12)*1000)/1000 as TIMESTAMP) as `LocalTime`
from Table.Name
where
name rlike 'FieldValue.*'
UPDATE: In a roundabout way, I updated the SQL to count the digits in the value.
If it has 17 digits, then I know the day is anywhere from the 1st to the 9th, so I tag it as 17.
If it has 18 digits, then I know the day is anywhere from the 10th to the end of the month.
From here I use substring to return the day components, which I'll bring into a single field via concat or something along those lines.
Here is the updated SQL. I just need some guidance on how to use the new FieldCount column: e.g. if it is 17, then substring(FieldValue, 1, 1), given it's anything from the 1st to the 9th; if it's 18, then substring(FieldValue, 1, 2), given it's anything from the 10th up.
select *,
cast((utc+(60*60*12)*1000)/1000 as TIMESTAMP) as `LocalTime`,
case
when FieldValue REGEXP '^[0-9]{17}$' then '17'
when FieldValue REGEXP '^[0-9]{18}$' then '18'
end FieldCount,
substring(FieldValue ,6,4) as Years,
substring(FieldValue ,1,1) as Days,
substring(FieldValue ,10,2) as Months,
substring(FieldValue ,12,2) as Hours,
substring(FieldValue ,14,2) as Minut
from table.name
New update: I've now changed this to split the value based on a CASE condition, which separates it out into individual fields. Any ideas on how to concat them based on the alias field names?
select
AField,
cast((UTC+(60*60*12)*1000)/1000 as TIMESTAMP) as `LocalTime`,
case when length(AField) = 18 then substring(AField,1,2) else substring(AField,1,1) end Days,
case when length(AField) = 18 then substring(AField,10,2) else substring(AField,9,2) end Months,
case when length(AField) = 18 then substring(AField,6,4) else substring(AField,5,4) end years,
case when length(AField) = 18 then substring(AField,12,2) else substring(AField,11,2) end Hours,
case when length(AField) = 18 then substring(AField,14,2) else substring(AField,13,2) end minutes
from table.name
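The length-based splitting above can be prototyped in Python before translating it back into Hive. With 17 digits the day has one digit; with 18 it has two, and everything after the day shifts by one position. The function name is hypothetical. The three digits after the day and the three trailing digits are ignored, matching the breakdown in the question.

```python
from datetime import datetime


def parse_ref(value):
    """Split a 17- or 18-digit reference such as '219872021081000741' into a datetime."""
    if len(value) not in (17, 18) or not value.isdigit():
        raise ValueError(f"unexpected reference {value!r}")
    day_len = len(value) - 16  # 1 digit for days 1-9, 2 digits for days 10-31
    day = int(value[:day_len])
    rest = value[day_len + 3:]  # skip the 3 digits after the day, e.g. '987'
    year, month = int(rest[0:4]), int(rest[4:6])
    hour, minute = int(rest[6:8]), int(rest[8:10])
    return datetime(year, month, day, hour, minute)


for ref in ["219872021081000741", "19872021081000741"]:
    print(parse_ref(ref).strftime("%d-%m-%y %H%M"))  # DD-MM-YY HHMM, as asked
# 21-08-21 1000
# 01-08-21 1000
```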
The correct timestamp string representation in Hive is yyyy-MM-dd HH:mm:ss.S.
You do not need to extract all the parts separately and then concat them to get timestamps. Using regexp_replace, you can build a correct timestamp with backreferences to the capturing groups (in round brackets) in the regexp.
with mytable as(--test dataset, use your table instead
select stack(2,
'219872021081000741',
'19872021081000741'
) as AField
)
select
case when length(AField) = 18
then timestamp(regexp_replace(AField,'^(\\d{2})\\d{3}(\\d{4})(\\d{2})(\\d{2})(\\d{2})\\d{3}$','$2-$3-$1 $4:$5:00.0'))
else timestamp(regexp_replace(AField,'^(\\d)\\d{3}(\\d{4})(\\d{2})(\\d{2})(\\d{2})\\d{3}$','$2-$3-0$1 $4:$5:00.0'))
end as result
from mytable
Result:
result
2021-08-21 10:00:00.0
2021-08-01 10:00:00.0
Note: the timestamp() call here only demonstrates that the string produced is compatible with the timestamp data type and is cast correctly; you can keep it as a string if you prefer.
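The same backreference technique works in other regex engines, so the patterns can be checked outside Hive. Below is a Python sketch that applies the answer's two patterns with re.sub. Python writes backreferences as \1 rather than $1, and raw strings need no double escaping. The function name is hypothetical, and strptime plays the role of the timestamp() cast.

```python
import re
from datetime import datetime

PATTERN_18 = re.compile(r"^(\d{2})\d{3}(\d{4})(\d{2})(\d{2})(\d{2})\d{3}$")
PATTERN_17 = re.compile(r"^(\d)\d{3}(\d{4})(\d{2})(\d{2})(\d{2})\d{3}$")


def to_hive_timestamp(value):
    """Rearrange the captured groups into 'yyyy-MM-dd HH:mm:ss.S'."""
    if len(value) == 18:
        return PATTERN_18.sub(r"\2-\3-\1 \4:\5:00.0", value)
    return PATTERN_17.sub(r"\2-\3-0\1 \4:\5:00.0", value)  # pad the 1-digit day with 0


for ref in ["219872021081000741", "19872021081000741"]:
    ts = to_hive_timestamp(ref)
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")  # raises ValueError if not a valid timestamp
    print(ts)
# 2021-08-21 10:00:00.0
# 2021-08-01 10:00:00.0
```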

How do I calculate averages of dates formatted as VARCHAR from multiple rows?

I'm having an issue where I am running a script against a database to get the average difference between multiple VARCHARs that need to be converted to DateTimes, and then take the average between all the results.
My code is:
SELECT YEAR(b.DateAcknow),AVG(datediff(dd,convert(datetime,b.DateAssign),
convert(datetime,b.DateResolv))) as DayAverage,
AVG(datediff(hh,convert(datetime,b.TimeAcknow),
convert(datetime,b.TimeResolv))) as HourAverage
FROM table AS b
WHERE (x = y)
AND YEAR(DateResolv) >= 2006
AND YEAR(DateResolv) < 2016
AND b.resolution <>''
GROUP BY YEAR(b.DateAcknow)
ORDER BY YEAR(b.DateAcknow)
The results I'm getting don't seem to make sense; moreover, they include 1900, which falls outside the parameters of my WHERE clause.
Here it is:
Year  DayAverage  HourAverage
NULL  42          NULL
1900  0           12
2006  7           -5
2007  6           1
2008  7           1
2009  4           1
2010  2           0
2011  2           0
2012  2           0
2013  2           0
2014  2           0
2015  2           0
Am I converting the VARCHARs wrong?
I also doubt that the averages for thousands of entries from 2010 to 2015 are all exactly 2 days and 0 hours, so either I'm doing something wrong or the data is bad.
You are filtering by DateResolv but grouping by DateAcknow.
Filter and group by the same field, and the NULL group and the values outside the range should disappear.
You'll probably want to take away the aggregate part and just run:
SELECT YEAR(b.DateAcknow)
, convert(datetime,b.DateAssign) AS DateAssignDateTime
, convert(datetime,b.DateResolv) AS DateResolveDateTime
, datediff(dd,convert(datetime,b.DateAssign), convert(datetime,b.DateResolv)) AS AssignResolveDayDiff
, convert(datetime,b.TimeAcknow) AS TimeAcknowDateTime
, convert(datetime,b.TimeResolv) AS TimeResolveDateTime
, datediff(hh,convert(datetime,b.TimeAcknow), convert(datetime,b.TimeResolv)) AS AcknowResolveHourDiff
FROM table AS b
WHERE (x = y)
AND YEAR(DateAcknow) >= 2006
AND YEAR(DateAcknow) < 2016
AND b.resolution <>''
ORDER BY YEAR(b.DateAcknow)
This lets you make sure that all of your conversions make sense first. Then you will have a better understanding of what it is you're actually averaging.
Afterwards, if it all checks out, your query should work fine, though do check that mxix's change from
...
AND YEAR(DateResolv) >= 2006
AND YEAR(DateResolv) < 2016
...
to
...
AND YEAR(b.DateAcknow) >= 2006
AND YEAR(b.DateAcknow) < 2016
...
makes sense for you.
If you're looking to increase the precision of the output, then try converting your datediffs like so:
Old: AVG(datediff(dd,convert(datetime,b.DateAssign), convert(datetime,b.DateResolv)))
New: AVG(Convert(Decimal(10, 5), datediff(dd,convert(datetime,b.DateAssign), convert(datetime,b.DateResolv))))
Your old query averages integer day counts, and AVG over an integer returns an integer (the fractional part is truncated), giving you values like '2'. This new adjustment will give you answers like "1.51235" days instead.
Since there are 100k records of differences (both plus and minus), there's a good chance the averages will be close to zero if the differences are distributed roughly symmetrically around zero. Also try:
AVG(Convert(Decimal(10, 5), ABS(datediff(dd,convert(datetime,b.DateAssign), convert(datetime,b.DateResolv)))))
if you want the absolute difference instead. If your old data had the values "5, -3, 4, -1, 3", the old method would produce an average of 1 (1.6 with the decimal conversion). With the ABS function applied, the values become "5, 3, 4, 1, 3", which moves the resulting average in the ++ direction (here, it changes to "3", or "3.2" if you did your decimal conversion too).
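The effect of integer averaging and ABS can be checked on the example values in plain Python. Below, int_avg mimics what SQL Server's AVG does over an INT column, returning an integer truncated toward zero. The function names are hypothetical.

```python
from decimal import Decimal

diffs = [5, -3, 4, -1, 3]  # example day differences from the text


def int_avg(values):
    """Integer average truncated toward zero, like AVG over an INT column in SQL Server."""
    total = sum(values)
    quotient = abs(total) // len(values)
    return quotient if total >= 0 else -quotient


def decimal_avg(values):
    """Average with decimal precision, like AVG(CONVERT(DECIMAL(10, 5), ...))."""
    return sum(Decimal(v) for v in values) / len(values)


abs_diffs = [abs(v) for v in diffs]
print(int_avg(diffs), decimal_avg(diffs))  # 1 1.6
print(int_avg(abs_diffs), decimal_avg(abs_diffs))  # 3 3.2
```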
My intention is to display for each year what the average response time is in Days and Hours. – obizues
Assuming:
DateAcknow is a varchar date with no time part (e.g., "2011/01/15")
TimeAcknow is DateAcknow's corresponding varchar time (e.g., "15:35")
DateResolv is a varchar date with no time part (e.g., "2011/01/16"), which is always greater than or equal to DateAcknow
TimeResolv is DateResolv's corresponding varchar time (e.g., "13:47")
You want to average the total difference in hours (using the above example, this record's difference is 22 hours)
If you need help with your varchar date's format and the convert function, see:
http://msdn.microsoft.com/en-us/library/ms187928.aspx
The following approach should work to achieve your intention:
SELECT YEAR(b.DateAcknow)
, AVG(DateDiff(Day, Convert(datetime, b.DateAcknow) + convert(datetime, b.TimeAcknow), Convert(datetime, b.DateResolv) + Convert(datetime, b.TimeResolv))) AS AvgDaysDifference
, AVG(DateDiff(Hour, Convert(datetime, b.DateAcknow) + convert(datetime, b.TimeAcknow), Convert(datetime, b.DateResolv) + Convert(datetime, b.TimeResolv))) AS AvgHoursDifference
FROM table AS b
WHERE (x = y) AND YEAR(DateAcknow) >= 2006 AND YEAR(DateAcknow) < 2016
AND b.resolution <>''
GROUP BY YEAR(b.DateAcknow)
This should do it if the assumptions about your data and your intention are correct. It is difficult to help when it's not clear.
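To sanity-check the assumptions listed above outside the database, here is a Python sketch. It combines each varchar date and time, measures the elapsed hours, and averages them per acknowledgement year. The 'yyyy/mm/dd' and 'hh:mm' formats are the assumed ones from the list, and the function name is hypothetical. This measures elapsed time, which can differ slightly from DATEDIFF(Hour, ...), because DATEDIFF counts the hour boundaries crossed.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y/%m/%d %H:%M"  # assumed varchar formats, e.g. '2011/01/15' and '15:35'


def average_hours_by_year(rows):
    """rows: (DateAcknow, TimeAcknow, DateResolv, TimeResolv) varchar tuples."""
    hours = defaultdict(list)
    for d_ack, t_ack, d_res, t_res in rows:
        start = datetime.strptime(f"{d_ack} {t_ack}", FMT)
        end = datetime.strptime(f"{d_res} {t_res}", FMT)
        hours[start.year].append((end - start).total_seconds() / 3600)
    return {year: sum(h) / len(h) for year, h in sorted(hours.items())}


print(average_hours_by_year([("2011/01/15", "15:35", "2011/01/16", "13:47")]))
# {2011: 22.2}
```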

Oracle - Count the same value used on consecutive days

Date jm Text
-------- ---- ----
6/3/2015 ne Good
6/4/2015 ne Good
6/5/2015 ne Same
6/8/2015 ne Same
I want to count how often the "same" value occurs in a set of consecutive days.
I don't want to count the value over the whole database. For the current date, the count is 2 (in the example above).
It is very important to me that "Same" never occurs...
The query has to ignore the weekend (6 and 7 June).
Date jm Text
-------- ---- ----
6/3/2015 ne Same
6/4/2015 ne Same
6/5/2015 ne Good
6/8/2015 ne Good
In this example, the count is zero.
Okay, I'm starting to get the picture, although at first I thought you wanted to count by jm, and now it seems you want to count by Text = 'Same'. Anyway, that's what this query should do. It gets the row for the current date, connects all previous rows with the same Text, and counts them. It also shows the current text (which is also that of the connected rows).
So the query will return one row (if there is one for today), which will show the date, jm and Text of the current date, the number of consecutive days for which the Text has been the same (just in case you want to know how many days it is 'Good'), and the number of days (either 0 or the same as the other count) for which the Text has been 'Same'.
I hope this query is right, or at least it gives you an idea of how to solve the problem using CONNECT BY. I should mention I based the 'Friday-detection' on this question.
Also, I don't have Oracle at hand, so please forgive me for any minor syntax errors.
WITH
VW_SAMESTATUSES AS
( SELECT t.*,
CONNECT_BY_ROOT t.Date AS RootDate -- The row for today that this chain started from
FROM YourTable t
START WITH -- Start with the row for today
t.Date = trunc(sysdate)
CONNECT BY -- Connect to the previous row, which has a lower date.
-- Note that PRIOR refers to the prior record, which is
-- actually the NEXT day. :)
PRIOR t.Date = t.Date +
CASE MOD(TO_CHAR(t.Date, 'J'), 7) + 1
WHEN 5 THEN 3 -- Friday, so add 3
ELSE 1 -- Other days, so add one
END
-- And the Text also has to match that of the next day.
AND t.Text = PRIOR t.Text)
SELECT s.RootDate AS CurrentDate,
s.jm,
MAX(s.Text) AS CurrentText, -- Not really MAX, they are actually all the same
COUNT(*) AS ConsecutiveDays,
COUNT(CASE WHEN s.Text = 'Same' THEN 1 END) AS SameCount
FROM VW_SAMESTATUSES s
GROUP BY s.RootDate,
s.jm
This recursive query (available from Oracle 11g Release 2) might be useful:
with s(tcode, tdate) as (
select tcode, tdate from test where tdate = date '2015-06-08'
union all
select t.tcode, t.tdate from test t, s
where s.tcode = t.tcode
and t.tdate = s.tdate - decode(s.tdate-trunc(s.tdate, 'iw'), 0, 3, 1) )
select count(1) cnt from s
SQLFiddle
I prepared sample data based on your original question, before the later edits; you can see it in the attached SQLFiddle. Additional conditions on the 'Text' column are very simple: just add something like ... and Text = 'Same' to the WHERE clauses.
In its current version, the query counts the number of previous days, starting from the given date (change it in line 2), for which the dates are consecutive (excluding weekends) and the value in column tcode is the same for all days.
The part decode(s.tdate-trunc(s.tdate, 'iw'), 0, 3, 1) subtracts 3 days when the date is a Monday and 1 day otherwise. Since trunc(..., 'iw') always returns the Monday of the ISO week, it should work independently of NLS settings.
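The recursive approach can be tried without Oracle. The sketch below uses Python's sqlite3 module, where strftime('%w', ...) returns '1' for Monday and stands in for the decode(...) expression. The table and column names (test, tcode, tdate) follow the answer. The function name is hypothetical, and the sample rows are the two examples from the question. Dates are stored as ISO strings so that SQLite's date() can shift them.

```python
import sqlite3

QUERY = """
WITH RECURSIVE s(tcode, tdate) AS (
    SELECT tcode, tdate FROM test WHERE tdate = :start
    UNION ALL
    SELECT t.tcode, t.tdate
    FROM test t JOIN s
      ON s.tcode = t.tcode
     AND t.tdate = date(s.tdate, CASE strftime('%w', s.tdate)
                                     WHEN '1' THEN '-3 days'  -- Monday: previous Friday
                                     ELSE '-1 days' END)
)
SELECT tcode, count(*) FROM s GROUP BY tcode
"""


def same_count(rows, start):
    """Length of the 'Same' streak of consecutive weekdays ending at start (0 if not 'Same')."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (tdate TEXT, tcode TEXT)")
    con.executemany("INSERT INTO test VALUES (?, ?)", rows)
    result = con.execute(QUERY, {"start": start}).fetchone()
    con.close()
    if result is None:  # no row for the start date
        return 0
    tcode, count = result
    return count if tcode == "Same" else 0


first = [("2015-06-03", "Good"), ("2015-06-04", "Good"), ("2015-06-05", "Same"), ("2015-06-08", "Same")]
second = [("2015-06-03", "Same"), ("2015-06-04", "Same"), ("2015-06-05", "Good"), ("2015-06-08", "Good")]
print(same_count(first, "2015-06-08"), same_count(second, "2015-06-08"))  # 2 0
```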

Adjust date column for change over time

This is an easy enough problem, but I'm wondering if anyone can provide a more elegant solution.
I've got a table that consists of a date column (month-end dates over time) and several value columns, say the price of a variety of stocks over time, one column for each stock. I'd like to calculate the change in the value columns for each period represented in the date column (e.g., a daily return from a table filled with prices).
My current plan is to join the table to itself and simply create a new column for the return as ret = b.price/a.price - 1. Code as follows:
select b.Date, Ret = (b.stock1/a.stock1 - 1)
from #temp a, #temp b
where datediff(day, a.Date,b.Date) between 25 and 35
order by a.Date
This works fine, BUT:
(1) I need to do this for, say, dozens of stocks--is there a good way to replicate the calculation without copying and pasting the return calculation and replacing 'stock1' with each other stock name?
(2) Is there a better way to do this join? I'm effectively doing a cross join at this point and only keeping entries that are adjacent (as defined by the datediff and range), but wondering if there's a better way to join a table like this to itself.
EDIT: Per request, data is in the form (my data has multiple price columns though):
Date Price
7/1/1996 349.22
7/31/1996 337.72
8/30/1996 343.70
9/30/1996 357.23
10/31/1996 364.07
11/29/1996 385.04
12/31/1996 383.68
And from that, I'd like to calculate return, to generate a table like this (again, with additional columns for the extra price columns that exist in the actual table):
Date Ret
7/31/1996 -0.03
8/30/1996 0.02
9/30/1996 0.04
10/31/1996 0.02
11/29/1996 0.06
12/31/1996 0.00
I would do the following. First, use the month and year to do the self join. I would recommend you take the year * 12 + the month number to get a unique value for each month and year combination. So, January 2011 would have a value of (2011 * 12 + 1 = 24133) and December 2010 would have a value of (2010 * 12 + 12 = 24132). This will allow you to accurately compare months without having to deal with rolling over from December to January. Next, you need to supply the calculations in the select clause. If you have the stock values in different columns, then you will have to type them out as a.stock1-b.stock1, a.stock2-b.stock2, etc. The only way around that would be to reshape the data so that there is only one stock value column, plus a stockname column that identifies which stock each value is for.
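Here is the year * 12 + month key from this answer as a small Python sketch (the function names are hypothetical). It shows that December and January become adjacent. It also shows a caveat visible in the question's sample data: 7/1/1996 and 7/31/1996 fall in the same calendar month, so they get the same key, and a month-based join would not match that pair.

```python
from datetime import date


def month_key(d):
    """Unique, consecutive integer per calendar month: year * 12 + month."""
    return d.year * 12 + d.month


def is_next_month(earlier, later):
    return month_key(later) == month_key(earlier) + 1


print(month_key(date(2011, 1, 31)), month_key(date(2010, 12, 31)))  # 24133 24132
print(is_next_month(date(2010, 12, 31), date(2011, 1, 31)))  # True
print(is_next_month(date(1996, 7, 1), date(1996, 7, 31)))  # False (same month)
```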
Using the Month and Year for the self join, the following query should work:
-- b is the month immediately after a, so Ret is b's return over a
select b.Date, Ret = (b.stock1/a.stock1 - 1)
from #temp a
inner join #temp b on (YEAR(b.Date) * 12) + MONTH(b.Date) = (YEAR(a.Date) * 12) + MONTH(a.Date) + 1
order by b.Date
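For comparison, here is a different technique from the self-join above: the LAG window function, available in SQL Server 2012 and later and in SQLite 3.25 and later. LAG fetches the previous row's price directly. PARTITION BY then handles many stocks once the data is in the one-price-column, one-stock-name-column shape suggested above. The sketch uses Python's sqlite3 module with the question's sample prices; the table and column names are hypothetical.

```python
import sqlite3

PRICES = [  # question's sample data, in long format: (stock, date, price)
    ("stock1", "1996-07-01", 349.22),
    ("stock1", "1996-07-31", 337.72),
    ("stock1", "1996-08-30", 343.70),
    ("stock1", "1996-09-30", 357.23),
    ("stock1", "1996-10-31", 364.07),
    ("stock1", "1996-11-29", 385.04),
    ("stock1", "1996-12-31", 383.68),
]


def period_returns(rows):
    """Return (stock, date, price / previous price - 1) for every row that has a predecessor."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE prices (stock TEXT, dt TEXT, price REAL)")
    con.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT stock, dt, price / prev_price - 1 AS ret
        FROM (SELECT stock, dt, price,
                     LAG(price) OVER (PARTITION BY stock ORDER BY dt) AS prev_price
              FROM prices)
        WHERE prev_price IS NOT NULL
        ORDER BY stock, dt
        """
    ).fetchall()
    con.close()
    return result


for stock, dt, ret in period_returns(PRICES):
    print(stock, dt, round(ret, 2))
```

Rounded to two decimals, these returns match the Ret column in the question. Because rows are ordered by date rather than by calendar month, the 7/1 to 7/31 pair is handled too.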