Parse dynamic substring SQL - sql

Querying data in a Postgres SQL database..
I have a column job_name with data like so, where I want to parse the year out of the string (the first 4 numbers)..
AG_2017_AJSJGHJS8GJSJ_7_25_WK_AHGSGHGHS
In this case I want to pull 2017
This works fine and pulls the rows where the 4 digits are >= 2019...
select
job_id,
cast(substring(job_name, 4, 4) as integer) as year
from job_header
where
job_type = 'PAMA'
and cast(substring(job_name, 4, 4) as integer) >= 2019
The issue is we have some data where the string is formatted slightly differently, and there are 3 characters in the beginning, or 4 (instead of 2), followed by an underscore like this...
APR3_2018_Another_Test_SC_PAMA
I still want to pull the first 4 numbers (so 2018 in this case), cast to int etc...Is there a way to make my first query dynamic to handle this?

You can use a regular expression to find the first group of 4 digits, e.g.:
with job_header(job_id, job_name, job_type) as (
values
(1, 'AG_2017_AJSJGHJS8GJSJ_7_25_WK_AHGSGHGHS', 'PAMA'),
(2, 'APR3_2018_Another_Test_SC_PAMA', 'PAMA')
)
select
job_id,
cast(substring(job_name from '\d{4}') as int) as year
from job_header
where job_type = 'PAMA'
job_id | year
--------+------
1 | 2017
2 | 2018
(2 rows)
Read about POSIX Regular Expressions in the documentation.

Related

Calculating age from incomplete SQL data

Two columns in table looks like this:
Year of birth
ID
2005
-
1997
-
85
-
95...
How do I create a SQL SELECT from all the data that will return the age of each person based only on the year of birth, and if the whole is not given or only the ID is given, then:
-if only two digits of the year are given such as 85 then by default the year of birth is 1985
-if no year is given then on the basis of the ID whose first two digits are the year of birth as above i.e. ID 95...- first two digits are 95 so the year of birth is 1995
MySQL
A simple example of using MySQL CASE function:
SELECT
CASE
WHEN year_of_birth REGEXP '^[0-9]{4}$' THEN year_of_birth
WHEN year_of_birth REGEXP '^[0-9]{2}$' THEN CONCAT("19", year_of_birth)
ELSE CONCAT("19", ID)
END as year_of_birth
FROM Accounts;
First, check for 4 digit year_of_birth, if not found, check for 2 digit, if not found then get ID. Using CONCAT function to prepend "19" to the 2 digit year and 2 digit ID. Also using REGEXP to check for 4 or 2 digit years.
Try it here: https://onecompiler.com/mysql/3y6yc7mv2
Firstly, I would suggest structuring your database in a cleaner way. Having some years formatted as four digits (e. g. 1985), and others as two is confusing and causes issues such as the one you have run into.
That being said, here is an ad-hoc transact sql formula that will calculate the age based on the incomplete data.
IF 'Year of Birth' IS NULL
SELECT YEAR(NOW()) - (1900 + CAST(LEFT('ID',2) AS INT));
ELSE
IF 'Year of Birth' < 100
SELECT YEAR(NOW()) - (1900 + 'Year of Birth');
ELSE
SELECT YEAR(NOW()) - 'Year of Birth'
This code is untested, and I assumed that the ID column is a string. You'll likely have to make adjustments to make it actually work for your database
To fix the structure of your table, however, a better approach might be cleaning the data and then calculating the date, using the following commands
Filling in null year values:
UPDATE table_name
SET 'Year of Birth' = CAST(LEFT('ID',2) AS INT)
WHERE IS_NULL('Year of Birth')
Making all year values 4 digits long:
UPDATE table_name
SET 'Year of Birth' = 1900 + 'Year of Birth'
WHERE 'Year of Birth' < 100
Now, you can simply subtract the current year from the 'Year of Birth' Column to calculate the age.
Good Luck!
Here is some relevant documentation
If-Else in SQL
Year Function in SQL
String Slicing in SQL
Casting Strings to Integers in SQL
You can follow these steps:
filter out all null values (using the WHERE clause and the COALESCE function)
transform each number to a valid year
year of birth has length 2 > map it to a value smaller than the current year (e.g. 22 -> 2022, 23 -> 1993)
year of birth has length 4 > skip
cast the year of birth string to a number
compute the difference between current year and retrieved year
Here's the full query:
WITH cte AS (
SELECT COALESCE(yob, ID) AS yob
FROM tab
WHERE NOT (yob IS NULL AND ID IS NULL)
)
SELECT yob,
YEAR(NOW()) -
CASE WHEN LENGTH(yob) = 2
THEN IF(CONCAT('20',yob) > YEAR(NOW()),
CONCAT('19',yob),
CONCAT('20',yob) )
WHEN LENGTH(yob) = 1
THEN CONCAT('200', yob)
ELSE yob
END +0 AS age
FROM cte
Check the demo here.
Lots of opportunities to clean up what you started with, and lots of open questions too, but the code below should get you started.
drop table if exists #x
create table #x (YearOfBirth nvarchar(4), ID nvarchar(50))
insert into #x values
('2005', NULL),
('1997', NULL),
('85', NULL),
(NULL, '951234567890')
select
year(getdate()) -
case when len(isnull(YearOfBirth, '')) <> 4
then year(convert(date, '01/01/' +
case when YearOfBirth is NULL
then left(ID, 2)
else YearOfBirth end))
else YearOfBirth end
as PossibleAge
from #x
where (isnumeric(YearOfBirth) <> 0 and len(YearOfBirth) in (2, 4))
or (YearOfBirth is NULL and isnumeric(ID) <> 0)
One and three digit years will be ignored. Lots of ways to adjust this, but without knowing data types, etc. it's just meant to be a rough start.

Using Parameter within timestamp_trunc in SQL Query for DataStudio

I am trying to use a custom parameter within DataStudio. The data is hosted in BigQuery.
SELECT
timestamp_trunc(o.created_at, #groupby) AS dateMain,
count(o.id) AS total_orders
FROM `x.default.orders` o
group by 1
When I try this, it returns an error saying that "A valid date part name is required at [2:35]"
I basically need to group the dates using a parameter (e.g. day, week, month).
I have also included a screenshot of how I have created the parameter in Google DataStudio. There is a default value set which is "day".
A workaround that might do the trick here is to use a rollup in the group by with the different levels of aggregation of the date, since I am not sure you can pass a DS parameter to work like that.
See the following example for clarity:
with default_orders as (
select timestamp'2021-01-01' as created_at, 1 as id
union all
select timestamp'2021-01-01', 2
union all
select timestamp'2021-01-02', 3
union all
select timestamp'2021-01-03', 4
union all
select timestamp'2021-01-03', 5
union all
select timestamp'2021-01-04', 6
),
final as (
select
count(id) as count_orders,
timestamp_trunc(created_at, day) as days,
timestamp_trunc(created_at, week) as weeks,
timestamp_trunc(created_at, month) as months
from
default_orders
group by
rollup(days, weeks, months)
)
select * from final
The output, then, would be similar to the following:
count | days | weeks | months
------+------------+----------+----------
6 | null | null | null <- this, represents the overall (counted 6 ids)
2 | 2021-01-01| null | null <- this, the 1st rollup level (day)
2 | 2021-01-01|2020-12-27| null <- this, the 1st and 2nd (day, week)
2 | 2021-01-01|2020-12-27|2021-01-01 <- this, all of them
And so on.
At the moment of visualizing this on data studio, you have two options: setting the metric as Avg instead of Sum, because as you can see there's kind of a duplication at each stage of the day column; or doing another step in the query and get rid of nulls, like this:
select
*
from
final
where
days is not null and
weeks is not null and
months is not null

How to run a query for multiple independent date ranges?

I would like to run the below query that looks like this for week 1:
Select week(datetime), count(customer_call) from table where week(datetime) = 1 and week(orderdatetime) < 7
... but for weeks 2, 3, 4, 5 and 6 all in one query and with the 'week(orderdatetime)' to still be for the 6 weeks following the week(datetime) value.
This means that for 'week(datetime) = 2', 'week(orderdatetime)' would be between 2 and 7 and so on.
'datetime' is a datetime field denoting registration.
'customer_call' is a datetime field denoting when they called.
'orderdatetime' is a datetime field denoting when they ordered.
Thanks!
I think you want group by:
Select week(datetime), count(customer_call)
from table
where week(datetime) = 1 and week(orderdatetime) < 7
group by week(datetime);
I would also point out that week doesn't take the year into account, so you might want to include that in the group by or in a where filter.
EDIT:
If you want 6 weeks of cumulative counts, then use:
Select week(datetime), count(customer_call),
sum(count(customer_call)) over (order by week(datetime)
rows between 5 preceding and current row) as running_sum_6
from table
group by week(datetime);
Note: If you want to filter this to particular weeks, then make this a subquery and filter in the outer query.

Find each position of same characters in a string through SQL query

Below is my data set
Year Month Holiday list
----------------------------------------------------------
2007 1 WWHHWWWWWHHWWWWWHHWWWWWHHWWWWWH
2008 4 HWWWWWHHWWWWWHHWWWWWHHWWWWWHHWW
I want to write SQL query to get below output wherein the position of each "H" is displayed.
Year Month Holiday list
-----------------------------------------------------
2007 1 3 4 10 11 18 19 25 26 31
2008 4 1 7 8 14 15 21 22 28 29
The INSTR function in Oracle returns the position of first character. So, I cannot use it.
I could write a Function and pass the Holiday List column value to it. Loop through the string and
find the position of each "H". But, I want to achieve this through plain SQL select query.
How to find each position of same characters in a string through SQL query in Oracle database?
Something like this might do it. You can use the INSTRfunction and a hierarchical query to go through the string and then use the LISTAGG analytic function to concatenate the results.
WITH data AS (
SELECT 2007 year, 1 month, 'WWHHWWWWWHHWWWWWHHWWWWWHHWWWWWH' holiday_list FROM DUAL
UNION
SELECT 2008 year, 4 month, 'HWWWWWHHWWWWWHHWWWWWHHWWWWWHHWW' holiday_list FROM DUAL)
SELECT year, month, (SELECT LISTAGG(INSTR(holiday_list,'H',1,LEVEL),' ') WITHIN GROUP (ORDER BY LEVEL)
FROM DUAL
CONNECT BY LEVEL < INSTR(holiday_list,'H',1,LEVEL)) S
FROM data;
Well you can get a list, see the dbfiddle here
SELECT id, INSTR(some, 'H', 1, c.no) AS Pos
FROM tst CROSS JOIN consecutive c
WHERE INSTR(some, 'H', 1, c.no) > 0
This gives you a list of the days with an H. You could then try to transpose this list. But be warned, there is an intentional CROSS JOIN there, so for every row in your data you will get up to 31 rows with this statement. Like Tim Biegeleisen said, it may be better to use an UDF for this.

Using crosstab, dynamically loading column names of resulting pivot table in one query?

The gem we have installed (Blazer) on our site limits us to one query.
We are trying to write a query to show how many hours each employee has for the past 10 days. The first column would have employee names and the rest would have hours with the column header being each date. I'm having trouble figuring out how to make the column headers dynamic based on the day. The following is an example of what we have working without dynamic column headers and only using 3 days.
SELECT
pivot_table.*
FROM
crosstab(
E'SELECT
"User",
"Date",
"Hours"
FROM
(SELECT
"q"."qdb_users"."name" AS "User",
to_char("qdb_works"."date", \'YYYY-MM-DD\') AS "Date",
sum("qdb_works"."hours") AS "Hours"
FROM
"q"."qdb_works"
LEFT OUTER JOIN
"q"."qdb_users" ON
"q"."qdb_users"."id" = "q"."qdb_works"."qdb_user_id"
WHERE
"qdb_works"."date" > current_date - 20
GROUP BY
"User",
"Date"
ORDER BY
"Date" DESC,
"User" DESC) "x"
ORDER BY 1, 2')
AS
pivot_table (
"User" VARCHAR,
"2017-10-06" FLOAT,
"2017-10-05" FLOAT,
"2017-10-04" FLOAT
);
This results in
| User | 2017-10-05 | 2017-10-04 | 2017-10-03 |
|------|------------|------------|------------|
| John | 1.5 | 3.25 | 2.25 |
| Jill | 6.25 | 6.25 | 6 |
| Bill | 2.75 | 3 | 4 |
This is correct, but tomorrow, the column headers will be off unless we update the query every day. I know we could pivot this table with date on the left and names on the top, but that will still need updating with each new employee – and we get new ones often.
We have tried using functions and queries in the "AS" section with no luck. For example:
AS
pivot_table (
"User" VARCHAR,
current_date - 0 FLOAT,
current_date - 1 FLOAT,
current_date - 2 FLOAT
);
Is there any way to pull this off with one query?
You could select a row for each user, and then per column sum the hours for one day:
with user_work as
(
select u.name as user
, to_char(w.date, 'YYYY-MM-DD') as dt_str
, w.hours
from qdb_works w
join qdb_users u
on u.id = w.qdb_user_id
where w.date >= current_date - interval '2 days'
)
select User
, sum(case when dt_str = to_char(current_date,
'YYYY-MM-DD') then hours end) as Today
, sum(case when dt_str = to_char(current_date - 'interval 1 day',
'YYYY-MM-DD') then hours end) as Yesterday
, sum(case when dt_str = to_char(current_date - 'interval 2 days',
'YYYY-MM-DD') then hours end) as DayBeforeYesterday
from user_work
group by
user
, dt_str
It's often easier to return a list and pivot it client side. That also allows you to generate column names with a date.
Is there any way to pull this off with one query?
No, because a fixed SQL query cannot have any variability in its output columns. The SQL engine determines the number, types and names of every column of a query before executing it, without reading any data except in the catalog (for the structure of tables and other objects), execution being just the last of 5 stages.
A single-query dynamic pivot, if such a thing existed, couldn't be prepared, since a prepared query always have the same results structure, whereas by definition a dynamic pivot doesn't, as the rows that pivot into columns can change between executions. That would be at odds again with the Prepare-Bind-Execute model.
You may find some limited workarounds and additional explanations in other questions, for example: Execute a dynamic crosstab query, but since you mentioned specifically:
The gem we have installed (Blazer) on our site limits us to one
query
I'm afraid you're out of luck. Whatever the workaround, it always need at best one step with a query to figure out the columns and generate a dynamic query from them, and a second step executing the query generated at the previous step.