I am trying to use datediff() to find the age of a person in a Postgres db (from their date of birth). I know I can run datediff() like this
SELECT DATEDIFF(current_date, '2021-01-24');
The query I use to get the date of birth is (it's in JSON):
select date_of_birth from (select attrs::json->'info'->>'date_of_birth' from users) as date_of_birth;
This gives me output like
date_of_birth
--------------
(2000-11-03)
(2000-06-11)
(2000-05-31)
(2008-11-26)
(2007-11-09)
(2020-03-26)
(2018-06-30)
I tried using
SELECT DATEDIFF(current_date, (select date_of_birth from (select attrs::json->'info'->>'date_of_birth' as date_of_birth from users));
It doesn't work. I tried several permutations but I can't get it to work.
How should I edit my query to calculate the user age?
This query:
select date_of_birth
from (
    select attrs::json->'info'->>'date_of_birth'
    from users
) as date_of_birth;
Returns the whole row as a composite value rather than a single column, because the extracted date expression has no alias of its own and `date_of_birth` names the derived table. It's like writing select users from users. You need to make `date_of_birth` a column alias (not a table alias) and reference that in the outer query.
To get the difference between two dates, just subtract them, but you need to cast the values to the date type to be able to do that.
select current_date - u.date_of_birth
from (
    select (attrs::json->'info'->>'date_of_birth')::date as date_of_birth
    from users
) as u;
Or without a derived table:
select current_date - (u.attrs::json->'info'->>'date_of_birth')::date
from users as u
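Note that subtracting one date from another in Postgres yields an integer number of days, not an age in years. If you want an actual age, the built-in age() function returns an interval, from which you can extract the years:

```sql
-- age as an interval, and as whole years (Postgres built-ins)
select age(current_date, (u.attrs::json->'info'->>'date_of_birth')::date) as age_interval,
       date_part('year', age(current_date, (u.attrs::json->'info'->>'date_of_birth')::date)) as age_years
from users as u;
```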
Apparently your dates are stored in a non-standard format. In that case you can't use a cast; you need to use the to_date() function:
to_date(u.attrs::json->'info'->>'date_of_birth', 'mm:dd:yyyy')
If you are storing JSON in the attrs column, you should convert it from text (or varchar) to a proper json (or better, jsonb) column so you don't need to cast it all the time.
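A one-time migration could look like this (a sketch; it assumes every existing attrs value is valid JSON, otherwise the cast will fail):

```sql
-- convert the text column to jsonb in place
alter table users
    alter column attrs type jsonb
    using attrs::jsonb;
```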
I am using the below code to return a set of distinct UUIDs and a corresponding date when the first action was taken on those UUIDs. The raw data will have non-distinct UUIDs and a corresponding date when an action was taken. I am trying to extract unique UUIDs and the first date when the action was taken, as represented by date1. Can someone help me see where I am going wrong?
The output that I get is the same raw data: the UUIDs are unfortunately non-unique and have many duplicates.
with raw_data as (
    select UUID, cast(datestring as timestamp) as date1
    from raw
)
select distinct
    UUID,
    date_trunc('week', date1)
from raw_data
Use the min() aggregation function:
select UUID,
min(date_trunc('week', cast(datestring as timestamp)))
from raw
group by UUID;
This should do everything your query is doing. There is no need for a subquery or CTE.
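If you're on Postgres and later need additional columns from the earliest row per UUID, DISTINCT ON is a common alternative (just a sketch, not required here):

```sql
-- one row per UUID, taken from its earliest datestring
select distinct on (UUID)
       UUID,
       date_trunc('week', cast(datestring as timestamp)) as date1
from raw
order by UUID, cast(datestring as timestamp);
```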
I have two queries that I'm running separately from a PHP script. The first is checking if an identifier (group) has a timestamp in a table.
SELECT
group, MAX(timestamp) AS timestamp, value
FROM table_schema.sample_table
GROUP BY group, value
If there is no timestamp, then it runs this second query that retrieves the minimum timestamp from a separate table:
SELECT
group, MIN(timestamp) as timestamp, value AS value
FROM table_schema.src_table
GROUP BY group, value
And goes on from there.
What I would like to do, for the sake of conciseness, is to have a single query that runs the first statement but defaults to the second if NULL. I've tried coalesce() and CASE statements, but they require subqueries to return single columns (which I hadn't run into being an issue before). I then decided I should try a JOIN on the table with the aggregate timestamp to get the whole row, but quickly realized I can't vary the table being joined (not to my knowledge). I opted to try joining both results and getting the max, something like this:
Edit: I am so tired, this should be a UNION, not a JOIN. Sorry for any possible confusion :(
SELECT smpl.group, smpl.value, MAX(smpl.timestamp) AS timestamp
FROM table_schema.sample_table as smpl
INNER JOIN
(SELECT src.group, src.value, MIN(src.timestamp) AS timestamp
FROM source_table src
GROUP BY src.group, src.value) AS history
ON
smpl.group = history.group
GROUP BY smpl.group, smpl.value
I don't have a SELECT MAX() on this because it's really slow as is, most likely because my SQL is a bit rusty.
If anyone knows a better approach, I'd appreciate it!
Please try this:
select mx.group,
       (case when mx.timestamp is null then mn.timestamp else mx.timestamp end) as timestamp,
       (case when mx.timestamp is null then mn.value else mx.value end) as value
from (
    SELECT group, MAX(timestamp) AS timestamp, value
    FROM table_schema.sample_table
    GROUP BY group, value
) mx
left join (
    SELECT group, MIN(timestamp) as timestamp, value
    FROM table_schema.src_table
    GROUP BY group, value
) mn
    on mx.group = mn.group
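Since coalesce() returns its first non-null argument, the two CASE expressions can be written more compactly (same derived tables assumed):

```sql
select mx.group,
       coalesce(mx.timestamp, mn.timestamp) as timestamp,
       coalesce(mx.value, mn.value) as value
from (
    select group, max(timestamp) as timestamp, value
    from table_schema.sample_table
    group by group, value
) mx
left join (
    select group, min(timestamp) as timestamp, value
    from table_schema.src_table
    group by group, value
) mn
    on mx.group = mn.group;
```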
I tried adding two numbers that are present in two different columns, but the total doesn't add up when there are no numbers present in the second column (B). Please find the screenshot of the table and the query I was using to achieve this.
I'm not getting the value present in column A in total sales.
The query I ran, which wasn't successful:
SELECT Date,
       SUM(sales_a) as "total_a",
       SUM(sales_b) as "total_b",
       ("total_a" + "total_b") as "total_sales"
FROM data_table
GROUP BY Date;
I would suggest:
SELECT Date,
       SUM(sales_a) as "total_a",
       SUM(sales_b) as "total_b",
       COALESCE(SUM(sales_a), 0) + COALESCE(SUM(sales_b), 0) as "total_sales"
FROM data_table
GROUP BY Date;
I do know that Amazon Redshift allows the re-use of column aliases -- contravening the SQL standard. However, I find it awkward to depend on that functionality, and it can lead to hard-to-find errors if your column aliases match existing column names.
You can't reuse column aliases in the same scope, so your query should error. You need to repeat the SUM() expressions.
Then: if one of the sums returns NULL, it propagates to the result of the addition. You can use coalesce() to avoid that:
SELECT Date,
SUM(sales_a) as total_a,
SUM(sales_b) as total_b,
COALESCE(SUM(sales_a), 0) + COALESCE(SUM(sales_b), 0) as total_sales
FROM data_table
GROUP BY Date;
I am querying a DATE field:
SELECT DATE,
       FIELD2,
       FIELD3
INTO Table_new
FROM Table_old
WHERE criteria ILIKE '%xxxyyy%'
The DATE field runs from 10/1/2010 to present, but it has missing days along the way. When I export the data (in Tableau, for example), I need the data to line up with a calendar that DOES NOT have any missing dates. This means I need a space/holder for a date, even if no data exists for that date in the query. How can I achieve this?
Right now I am exporting the data, and manually creating a space where no data for a date exists, which is extremely inefficient.
Tableau can do this natively. No need to alter your data set. You just need to make sure that your DATE field is of the date type in Tableau and then show empty columns/rows.
(Screenshots omitted: the test data, the view before showing empty columns, the setting used to show empty columns, and the end result.)
If you want to then restrict those dates, you can add the date field to the filter, select your date range, and Apply to Context.
In Postgres, you can easily generate the dates:
select d.date, t.field1, t.field2
from (
    select generate_series(mm.mindate, mm.maxdate, interval '1 day')::date as date
    from (
        select min(date) as mindate, max(date) as maxdate
        from table_old
        where criteria ilike '%xxxyyy%'
    ) mm
) d
left join table_old t
    on t.date = d.date
   and t.criteria ilike '%xxxyyy%';
This returns all dates between the minimum and maximum for the criteria. If you have another date range in mind, just use that for the generate_series().
Note: The final condition on criteria needs to go in the on clause, not in a where clause, or the left join would be filtered back down to matching rows only.
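To see what generate_series() produces on its own, a minimal example:

```sql
-- one row per day, inclusive of both endpoints
select generate_series(date '2010-10-01', date '2010-10-07', interval '1 day')::date as day;
```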
My SQL is getting somewhat rusty, and the only way I have managed to retrieve from a table the ids of the newest records (based on a date field) of each type is with a nested select. I suspect there must be a way to do the same with a having clause or something more efficient.
Supposing that the only columns are ID, TYPE and DATE, my current query is:
select ID from MY_TABLE,
(select TYPE as GROUP_TYPE,
max(DATE) as MAX_DATE
from MY_TABLE group by TYPE)
where TYPE = GROUP_TYPE
and DATE = MAX_DATE
(I'm writing it from my memory, maybe there are some syntax errors, but you get the idea)
I'd prefer to stick to pure standard SQL without proprietary extensions.
Then there is no "more efficient" way to write this query. Not in standard ANSI-SQL. The problem is that you are trying to compare an AGGREGATE column (Max-date) against a base column (date) to return another base column (ID). The HAVING clause cannot handle this type of comparison.
There are ways using ROW_NUMBER (windowing function) or MySQL (group by hack) to do it, but those are not portable across database systems.
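For engines that do support window functions, the ROW_NUMBER() version would look like this (a sketch using the same ID/TYPE/DATE columns):

```sql
-- rank rows within each TYPE by DATE, newest first, then keep rank 1
select ID
from (
    select ID,
           row_number() over (partition by TYPE order by DATE desc) as rn
    from MY_TABLE
) ranked
where rn = 1;
```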
SELECT a.id, a.type, a.dater
FROM my_table a
INNER JOIN (
    select type, max(dater) as dater2
    from my_table
    group by type
) b
    on a.type = b.type and a.dater = b.dater2
This should get you closer depending on your data
select ID from MY_TABLE
where DATE = (select max(DATE)
              from MY_TABLE as X
              where X.TYPE = MY_TABLE.TYPE)