Search by date in array of objects within PostgreSQL JSONB Column - sql

I have two tables in my PostgreSQL 9.6 instance.
users
+----+------------+-----------+-------------------+
| id | first_name | last_name | email |
+----+------------+-----------+-------------------+
| 1 | John | Doe | john.doe#test.com |
+----+------------+-----------+-------------------+
| 2 | Jane | Doe | jane.doe#test.com |
+----+------------+-----------+-------------------+
| 3 | Mike | Doe | mike.doe#test.com |
+----+------------+-----------+-------------------+
surveys
+----+---------+----------------------------------------------------------------------------------------------------+
| id | user_id | survey_data |
+----+---------+----------------------------------------------------------------------------------------------------+
| 1 | 1 | {'child_list': [{'gender': 1, 'birthday': '2015-10-01'}, {'gender': 2, 'birthday': '2017-05-01'}]} |
+----+---------+----------------------------------------------------------------------------------------------------+
| 2 | 2 | {'child_list': []} |
+----+---------+----------------------------------------------------------------------------------------------------+
| 3 | 3 | {'child_list': [{'gender': 2, 'birthday': '2008-01-01'}]} |
+----+---------+----------------------------------------------------------------------------------------------------+
I would like to be able to query these two tables to get the number of users who have children within a certain age range. The survey_data column in the surveys table is a JSONB column.
So far I've tried using jsonb_populate_recordset with LATERAL joins. I was able to SELECT the child_list array as two columns but couldn't figure out how to use that with my JOIN between users and surveys tables. The query I used is as below:
SELECT DISTINCT u.email
FROM surveys survey
CROSS JOIN LATERAL (
SELECT *
FROM jsonb_populate_recordset(null::json_type, (survey.survey_data->>'child_list')::jsonb) AS d
) d
INNER JOIN users u ON u.id = survey.user_id
WHERE d.birthday BETWEEN '2014-05-05' AND '2018-05-05';
This also uses a custom type which was created using this:
CREATE type json_type AS (gender int, birthday date)
My question is, is there an easier to read way to do this? I would like to use this query with many other JOINs and WHERE clauses and I was wondering if there is a better way of doing this.
Note: this is mainly going to be used by a reporting system which does not need to be super fast but of course any speed gains are welcome.

Use the function jsonb_array_elements(), examples:
select email, (elem->>'gender')::int as gender, (elem->>'birthday')::date as birthday
from users u
left join surveys s on s.user_id = u.id
cross join jsonb_array_elements(survey_data->'child_list') as arr(elem)
email | gender | birthday
-------------------+--------+------------
john.doe#test.com | 1 | 2015-10-01
john.doe#test.com | 2 | 2017-05-01
mike.doe#test.com | 2 | 2008-01-01
(3 rows)
or
select distinct email
from users u
left join surveys s on s.user_id = u.id
cross join jsonb_array_elements(survey_data->'child_list') as arr(elem)
where (elem->>'birthday')::date between '2014-05-05' and '2018-05-05';
email
-------------------
john.doe#test.com
(1 row)
You can make your life easier using a view:
create view users_children as
select email, (elem->>'gender')::int as gender, (elem->>'birthday')::date as birthday
from users u
left join surveys s on s.user_id = u.id
cross join jsonb_array_elements(survey_data->'child_list') as arr(elem);
select distinct email
from users_children
where birthday between '2014-05-05' and '2018-05-05';
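Since the question asks for the number of users rather than their emails, here is a small runnable sketch of the same unnest-and-filter idea with a COUNT(DISTINCT ...) on top. It is only an illustration: it uses SQLite from Python as a stand-in for PostgreSQL, with SQLite's json_each and json_extract playing the role of jsonb_array_elements and ->>. The function name count_users_with_child_born_between is made up for this example.

```python
import sqlite3

# Assumption: SQLite (with its JSON functions available) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE surveys (id INTEGER PRIMARY KEY, user_id INTEGER, survey_data TEXT);
    INSERT INTO users VALUES
      (1, 'john.doe#test.com'), (2, 'jane.doe#test.com'), (3, 'mike.doe#test.com');
    INSERT INTO surveys VALUES
      (1, 1, '{"child_list": [{"gender": 1, "birthday": "2015-10-01"}, {"gender": 2, "birthday": "2017-05-01"}]}'),
      (2, 2, '{"child_list": []}'),
      (3, 3, '{"child_list": [{"gender": 2, "birthday": "2008-01-01"}]}');
""")


def count_users_with_child_born_between(start, end):
    """Count distinct users having at least one child born in [start, end]."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT u.id)
        FROM users u
        JOIN surveys s ON s.user_id = u.id
        -- one output row per element of child_list
        JOIN json_each(s.survey_data, '$.child_list') AS child
        -- ISO dates compare correctly as text
        WHERE json_extract(child.value, '$.birthday') BETWEEN ? AND ?
        """,
        (start, end),
    ).fetchone()
    return row[0]


print(count_users_with_child_born_between("2014-05-05", "2018-05-05"))
```

In PostgreSQL the equivalent would be to wrap the answer's second query in count(distinct u.id) instead of select distinct email.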

Related

SQL query to get data if fields match or null if they dont

I need to get a list of users with their corresponding bed_id when a user has one, and null when the user doesn't, since there will always be more users than beds.
When I try this:
select users.*, beds.bed_id from users, beds where users.user_id = beds.user_id;
I get only the users who have an assigned bed_id, but I need the complete list of users with their bed_id or null.
USERS
--------------
| user_id |
| name |
| field1 |
| field2 |
| ... |
--------------
BEDS
--------------
| bed_id |
| user_id |
| room_id |
| field1 |
| ... |
--------------
Thanks for the great support you always share here.
Use LEFT OUTER JOIN:
select users.*,
beds.bed_id
from users
LEFT OUTER JOIN beds
ON (users.user_id = beds.user_id);
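To see what the LEFT OUTER JOIN returns, here is a tiny runnable sketch using SQLite from Python. The three users and two beds are made-up sample rows, not data from the question, and only a few of the columns are modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE beds (bed_id INTEGER PRIMARY KEY, user_id INTEGER, room_id INTEGER);
    -- hypothetical sample rows: Bob has no bed
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO beds VALUES (10, 1, 100), (11, 3, 100);
""")

# Every user is kept; bed_id is NULL (None in Python) when no bed matches.
rows = conn.execute("""
    SELECT users.user_id, users.name, beds.bed_id
    FROM users
    LEFT OUTER JOIN beds ON users.user_id = beds.user_id
    ORDER BY users.user_id
""").fetchall()

for row in rows:
    print(row)
```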

Simple SQL Query to bring back null if no match found

EDIT
I've edited this question to make it more concise. My edit history shows my effort and what I've tried, but that material added a lot of unnecessary noise and caused confusion, so here is a summary of input and output:
People:
ID | FullName
--------------------
1 | Jimmy
2 | John
3 | Becky
PeopleJobRequirements:
ID | PersonId | Title
--------------------
1 | 1 | Some Requirement
2 | 1 | Another Requirement
3 | 2 | Some Requirement
4 | 3 | Another Requirement
Output:
FullName | RequirementTitle
---------------------------
Jimmy | Some Requirement
Jimmy | Another Requirement
John | Some Requirement
John | null
Becky | null
Becky | Another Requirement
Each person has 2 records, because that's how many distinct requirements there are in the table (distinct based on 'Title').
Assume there is no third table - the 'PeopleJobRequirements' is unique to each person (one person to many requirements), but there will be duplicate Titles in there (some people have the same job requirements).
Sincere apologies for any confusion caused by the recent updates.
Use a CROSS JOIN to generate one row per person for each distinct title, then a LEFT JOIN to pull in the matching records.
The following query should work in your scenario:
select p.Id, p.FullName,r.Title
FROM People p
cross join (select distinct title from PeopleJobRequirements ) pj
left join PeopleJobRequirements r on p.id=r.personid and pj.Title=r.Title
order by fullname
Output
+----+----------+---------------------+
| Id | FullName | Title |
+----+----------+---------------------+
| 3 | Becky | Another Requirement |
+----+----------+---------------------+
| 3 | Becky | NULL |
+----+----------+---------------------+
| 1 | Jimmy | Some Requirement |
+----+----------+---------------------+
| 1 | Jimmy | Another Requirement |
+----+----------+---------------------+
| 2 | John | NULL |
+----+----------+---------------------+
| 2 | John | Some Requirement |
+----+----------+---------------------+
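The CROSS JOIN plus LEFT JOIN approach can be checked against the question's sample data with a short script. This is just a sketch that runs the same query shape on SQLite from Python (the original demo used a different database); the extra ORDER BY on pj.Title is added only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (Id INTEGER PRIMARY KEY, FullName TEXT);
    CREATE TABLE PeopleJobRequirements (Id INTEGER PRIMARY KEY, PersonId INTEGER, Title TEXT);
    INSERT INTO People VALUES (1, 'Jimmy'), (2, 'John'), (3, 'Becky');
    INSERT INTO PeopleJobRequirements VALUES
      (1, 1, 'Some Requirement'), (2, 1, 'Another Requirement'),
      (3, 2, 'Some Requirement'), (4, 3, 'Another Requirement');
""")

# Step 1: CROSS JOIN pairs every person with every distinct title.
# Step 2: LEFT JOIN fills in the title where the person actually has it,
#         and leaves NULL where they do not.
rows = conn.execute("""
    SELECT p.FullName, r.Title
    FROM People p
    CROSS JOIN (SELECT DISTINCT Title FROM PeopleJobRequirements) pj
    LEFT JOIN PeopleJobRequirements r
           ON r.PersonId = p.Id AND r.Title = pj.Title
    ORDER BY p.FullName, pj.Title
""").fetchall()

for row in rows:
    print(row)
```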
Use LEFT JOIN; no subquery is needed:
select p.*,jr.*,jrr.*
from People p left join
PeopleJobRequirements jr on p.Id=jr.PersonId
left join JobRoleRequirements jrr on p.id=jrr.PersonId
According to the explanation, the People and PeopleJobRequirements tables have a many-to-many (n to n) relationship,
so first of all you'll need another table to relate these two tables.
Do that first, and then a full join will give the right result.

Postgres complicated full outer join keeping nulls from the "on" column

I have written a PostgreSQL query that is relatively performant at scale and gives me the dataset I want back, but I am wondering if it is the simplest/best way to write the query. It seems like there should be a simpler join operation that satisfies the conditions I need.
EDIT: I do need this to be performant on large tables. In the example given below, pets is 150 million rows, food is roughly 100k rows. My solution at the bottom clocks in at about 0.6ms. Both tables have an index on id and user_id. The food table also includes an index on pet_id.
I have two tables that are related in my system that have one guaranteed shared attribute - the user_id. Here is an example that in essence shows my problem:
Pets
+------+-------+---------+
| id | type | user_id |
+------+-------+---------+
| 1234 | dog | 1 |
| 1235 | cat | 1 |
| 1236 | gecko | 1 |
+------+-------+---------+
Food
+------+-----------+---------+--------+
| id | name | user_id | pet_id |
+------+-----------+---------+--------+
| 4321 | hamburger | 1 | NULL |
| 4322 | dog food | 1 | 1234 |
| 4323 | cat food | 1 | 1235 |
+------+-----------+---------+--------+
Desired Results
+------+------+
| p.id | f.id |
+------+------+
| NULL | 4321 | --no pet, hamburger
| 1234 | 4322 | --dog, dog food
| 1235 | 4323 | --cat, cat food
| 1236 | NULL | --gecko, no food
+------+------+
Now with an example to refer to, I'll make sure it's clear what the result is. The result contains all rows from both sides that belong to my user_id (imagine that the table could contain thousands of other rows that don't belong to user_id 1). I want these result rows to include exactly ONE copy of each row matched to the other table.
An example of a full outer join that I tried to make this work:
SELECT p.id, f.id
FROM pets p FULL OUTER JOIN food f ON p.user_id = f.user_id
WHERE p.user_id = 1;
There are two problems with this query:
1. It excludes NULLs from the left side of the query. I need those.
2. Because the user_id is essentially the constant here, I end up with plenty of duplicates because it matches on user_id. Every row from the left gets matched to every row from the right. Not what I need. I need a one-to-one match.
I could fix #1 by including an OR in the WHERE filter:
SELECT p.id, f.id
FROM pets p FULL OUTER JOIN food f ON p.user_id = f.user_id
WHERE p.user_id = 1 OR f.user_id = 1;
For reasons I'm not completely sure of, it makes the query take a very long time. In our system, both tables have an index on user_id, so it isn't the lack of an index.
To solve my issue, I landed on the following query (really two combined):
SELECT p.id, f.id
FROM pets p LEFT JOIN food f
ON p.id = f.pet_id AND f.user_id = 1
WHERE p.user_id = 1
UNION
SELECT p.id, f.id FROM pets p RIGHT JOIN food f
ON p.id = f.pet_id
WHERE f.user_id = 1 AND p.id IS NULL;
So my question is this: Is there a simpler way to execute this as a single query?
SQL DEMO
SELECT p.id, f.id
FROM pets p
FULL OUTER JOIN food f
ON p.user_id = f.user_id
AND p.id = f.pet_id
AND p.user_id = 1;
OUTPUT
| id | id |
|--------|--------|
| 1234 | 4322 |
| 1235 | 4323 |
| 1236 | (null) |
| (null) | 4321 |
NOTE:
Consider adding a composite index to support the join: (user_id, pet_id) on food and (user_id, id) on pets.
You're just overthinking this a bit. You want to join on P.ID = F.PET_ID:
SELECT P.ID, F.ID
FROM PETS P
FULL OUTER JOIN FOOD F ON P.ID = F.PET_ID
AND P.USER_ID = F.USER_ID
AND P.USER_ID = 1 --optional
ORDER BY P.ID
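One caveat about the two FULL OUTER JOIN answers above: a condition in the ON clause of an outer join only decides which rows match. It does not remove rows, so pets and food belonging to other users would still come back as unmatched rows. One way around that is to filter each table down to the user first and then join on pet_id. The sketch below is only an illustration: it runs on SQLite from Python rather than PostgreSQL, and it emulates the full join with a LEFT JOIN plus an anti-join because older SQLite versions lack FULL OUTER JOIN. In PostgreSQL the two filtered subqueries could be joined with FULL OUTER JOIN directly. The user 2 rows and the function name pets_and_food_for_user are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pets (id INTEGER PRIMARY KEY, type TEXT, user_id INTEGER);
    CREATE TABLE food (id INTEGER PRIMARY KEY, name TEXT, user_id INTEGER, pet_id INTEGER);
    INSERT INTO pets VALUES (1234, 'dog', 1), (1235, 'cat', 1), (1236, 'gecko', 1),
                            (2000, 'parrot', 2);            -- hypothetical other user
    INSERT INTO food VALUES (4321, 'hamburger', 1, NULL), (4322, 'dog food', 1, 1234),
                            (4323, 'cat food', 1, 1235),
                            (5000, 'seeds', 2, 2000);       -- hypothetical other user
""")


def pets_and_food_for_user(user_id):
    """Return (pet_id, food_id) pairs for one user, keeping unmatched rows from both sides."""
    return conn.execute(
        """
        WITH p AS (SELECT id FROM pets WHERE user_id = ?),
             f AS (SELECT id, pet_id FROM food WHERE user_id = ?)
        -- every pet of the user, with its food if any
        SELECT p.id AS pet, f.id AS food
        FROM p LEFT JOIN f ON f.pet_id = p.id
        UNION ALL
        -- food of the user that is not attached to any of the user's pets
        SELECT NULL, f.id
        FROM f
        WHERE NOT EXISTS (SELECT 1 FROM p WHERE p.id = f.pet_id)
        """,
        (user_id, user_id),
    ).fetchall()


for pair in pets_and_food_for_user(1):
    print(pair)
```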

Self Referencing SQL query when condition is met

I'm trying to create a SQL query to return the column values from a table that meet certain criteria.
Currently I have used the CONCAT function to join the first and last names into a single column in the query result for employees that have the role of 'Programmer'.
SELECT
person.id, CONCAT(person.firstname,' ', person.lastname) AS FULLNAME
FROM
person, role
WHERE
person.role_id = role.id AND role.name = 'Programmer'
This runs successfully and returns all programmers from the tables. Notice that my table structure has an actingas_id column. It holds the person.id of the person someone is standing in for while that person is on leave from work.
Thus, we arrive at my question: how do I modify the SQL query so that, when someone is acting for a person, the query returns the first and last name of that person as well as of the person acting for them?
My table structure is as follows:
person:
id | firstname | lastname | role_id | actingas_id |
role:
id | name |
+----+-----------+----------+---------+-------------+
| id | firstname | lastname | role_id | actingas_id |
+----+-----------+----------+---------+-------------+
| 1 | John | Smith | 1 | 0 |
| 2 | Kevin | Tull | 2 | 1 |
| 3 | Michael | Woods | 1 | 0 |
+----+-----------+----------+---------+-------------+
Here Kevin is acting for John, and Michael is also a Programmer, so the result of my query should be:
+----+-------------------------+
| id | NAME |
+----+-------------------------+
| 1 | John Smith - Kevin Tull |
| 3 | Michael Woods |
| x | Other Programmers.. |
+----+-------------------------+
This untested query should give you the result you want:
SELECT person1.id, CASE WHEN person1.actingas_id =0 then CONCAT(person1.firstname,' ', person1.lastname) else CONCAT(person1.firstname,' ', person1.lastname,' - ', person2.firstname,' ', person2.lastname) END AS FULLNAME
FROM person person1 left join person person2 on person1.actingas_id=person2.id
join role on person1.role_id=role.id
WHERE role.name = 'Programmer'
Use UNION ALL to add those additional records:
SELECT person.id,
CONCAT(person.firstname,' ', person.lastname) AS FULLNAME
FROM person INNER JOIN role
ON person.role_id = role.id
WHERE role.name = 'Programmer'
AND NOT EXISTS(SELECT 1 FROM person p WHERE p.actingas_id = person.id)
UNION ALL
SELECT a.id,
CONCAT(a.firstname,' ', a.lastname, ' - ', person.firstname,' ', person.lastname) AS FULLNAME
FROM person INNER JOIN person a
ON a.actingas_id = person.id
INNER JOIN role
ON person.role_id = role.id
WHERE role.name = 'Programmer' AND a.actingas_id <> 0
Also, avoid old-style comma-separated joins; use explicit INNER JOINs.
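For comparison, the output format shown in the question (the programmer's own id, followed by the name of whoever is acting for them) can also be produced with a single self LEFT JOIN. This is a sketch on SQLite from Python, so it uses || and COALESCE instead of CONCAT. The role names are assumptions: the question only says role 1 is 'Programmer', and 'Manager' for role 2 is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT,
                         role_id INTEGER, actingas_id INTEGER);
    INSERT INTO role VALUES (1, 'Programmer'), (2, 'Manager');  -- 'Manager' is hypothetical
    INSERT INTO person VALUES
      (1, 'John', 'Smith', 1, 0),
      (2, 'Kevin', 'Tull', 2, 1),     -- Kevin is acting for John
      (3, 'Michael', 'Woods', 1, 0);
""")

# p is the programmer; a is the person (if any) whose actingas_id points at p.
# ' - ' || a.firstname ... is NULL when nobody is acting, so COALESCE drops the suffix.
rows = conn.execute("""
    SELECT p.id,
           p.firstname || ' ' || p.lastname
             || COALESCE(' - ' || a.firstname || ' ' || a.lastname, '') AS fullname
    FROM person p
    INNER JOIN role r ON r.id = p.role_id
    LEFT JOIN person a ON a.actingas_id = p.id
    WHERE r.name = 'Programmer'
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

If several people could act for the same person at once, this would return one row per stand-in, which may or may not be what you want.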

Confirm result of left join sybase query

I need some help with sybase sql syntax. Going through some example questions after many years of being away from sql. Given the following two sybase table structures and data:
Table name: users
| name | salary |
| joe | 100000 |
| nick | 10000 |
Table name: user_data
| name | percent |
| joe | 0.67 |
and the following query:
select u.name, ud.percent from users u, user_data ud where u.name *= ud..name
am I right in thinking that the output will be:
| name | percent |
| joe | 0.67 |
| nick | NULL |
based on the reasoning that the *= means left join?
The other question I had is what does the '..' mean in the ud..name?
Thanks.
Yes that is older JOIN syntax that has been replaced with ANSI JOIN syntax. The query should be written:
select u.name, ud.percent
from users u
left join user_data ud
on u.name = ud.name
This query will return all users in your users table even if there is no matching row in the user_data table. For users without a matching row, percent is returned as NULL.
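The expected output from the question can be confirmed by running the ANSI form of the query on the sample data. The sketch below uses SQLite from Python purely as a convenient stand-in (it does not support the old *= operator, so only the LEFT JOIN form is shown).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT, salary INTEGER);
    CREATE TABLE user_data (name TEXT, percent REAL);
    INSERT INTO users VALUES ('joe', 100000), ('nick', 10000);
    INSERT INTO user_data VALUES ('joe', 0.67);
""")

# nick has no row in user_data, so his percent comes back as NULL (None).
rows = conn.execute("""
    SELECT u.name, ud.percent
    FROM users u
    LEFT JOIN user_data ud ON u.name = ud.name
    ORDER BY u.name
""").fetchall()

for row in rows:
    print(row)
```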