How do I identify distinct combinations across array-columns and then unnest in sql presto

How do I identify distinct combinations across array-columns and then unnest in sql presto - sql

I have a database called programs created as
CREATE TABLE programs (
name varchar(200) NOT NULL,
role varchar(200) NOT NULL,
section text[] NOT NULL,
sub_section text[] NOT NULL,
title text[] NOT NULL
);
INSERT INTO programs (name, role, section, sub_section, title) VALUES
('John','Lead','{"VII","VII","VII"}','{"A","A","C"}','{"STUDY","STUDY","STUDY"}'),
('Olga','Member','{"VII","VII"}','{"A","A"}','{"STUDY","STUDY"}'),
('Ben','Co-Lead','{"XI","X"}','{"A","B"}','{"STUDY","TRAVEL"}'),
('Ana','Member','{"VII","II","VI"}','{"A","ALL","B"}','{"STUDY","STUDY","TRAVEL"}');
Here's what the table looks like
| name | role | section | sub_section | title |
| ---- | ------- | ------------ | ----------- | ------------------------ |
| John | Lead | VII,VII,VII | A,A,C | STUDY,STUDY,STUDY |
| Olga | Member | VII,VII | A,A | STUDY,STUDY |
| Ben | Co-Lead | XI,X | A,B | STUDY,TRAVEL |
| Ana | Member | VII,II,VI | A,ALL,B | STUDY,STUDY,TRAVEL |
I want to identify distinct combinations across the section, sub-section, and title columns, as well as unnesting to get this as output
| name | role | section.sub_section | title |
| ---- | ------- | ------------------- | ------------------------ |
| John | Lead | VII.A | STUDY
| John | Lead | VII.C | STUDY
| Olga | Member | VII.A | STUDY
| Ben | Co-Lead | XI.A | STUDY
| Ben | Co-Lead | X.B | TRAVEL
| Ana | Member | VII.A | STUDY
| Ana | Member | II.ALL | STUDY
| Ana | Member | VI.B | TRAVEL
I'm fairly new to SQL and I'm really struggling with getting desired output. Your help would be very much appreciated.

You desired data does not show "combinations across the section, sub-section, and title columns", it seems that you require to match corresponding array based on positions, so you can just unnest and group by fields which you want to distinct on.
Assuming that corresponding columns contain arrays of varchars (if not - you will need to use some string functions to convert them):
-- sample data
WITH dataset (name, role, section, sub_section, title) AS (
VALUES ('John','Lead',array['VII','VII','VII'],array['A','A','C'],array['STUDY','STUDY','STUDY']),
('Olga','Member',array['VII','VII'],array['A','A'],array['STUDY','STUDY']),
('Ben','Co-Lead',array['XI','X'],array['A','B'],array['STUDY','TRAVEL']),
('Ana','Member',array['VII','II','VI'],array['A','ALL','B'],array['STUDY','STUDY','TRAVEL'])
)
--query
select name,
role,
sec || '.' || sub_sec "section.sub_section",
t title
from dataset
cross join unnest(section, sub_section, title) as t(sec, sub_sec, t)
group by name, role, sec, sub_sec, t
order by name
Output:
name
role
section.sub_section
title
Ana
Member
VII.A
STUDY
Ana
Member
II.ALL
STUDY
Ana
Member
VI.B
TRAVEL
Ben
Co-Lead
XI.A
STUDY
Ben
Co-Lead
X.B
TRAVEL
John
Lead
VII.A
STUDY
John
Lead
VII.C
STUDY
Olga
Member
VII.A
STUDY

Related

How can I load rows by name and role and order them based on their name, but preference the name? (SQL)

What I mean is the following.
I have a database containing people. These people van a name and a role.
It looks like this:
-----------------------------
| (PK) id | name | role |
-----------------------------
| 1 | ben | fireman |
| 2 | ron | cook |
| 3 | chris | coach |
| 4 | remco | barber |
-----------------------------
Ive created a searchbar where you can search for people in the database. When you press search, it looks for name and roles, for example:
When I type in 'co', the result I get is:
-----------------------------
| (PK) id | name | role |
-----------------------------
| 3 | chris | coach |
| 4 | remco | barber |
| 2 | ron | cook |
-----------------------------
This is because its looking for matches in the name and role column.
The query I use is:
SELECT * FROM people WHERE name LIKE '$search' OR role LIKE '$search' ORDER BY name";
The only issue with this is that it just order by name.
I want it to first order every result from the name column by name and then order every remaining result from the role column by name, so it ends up looking like this:
-----------------------------
| (PK) id | name | role |
-----------------------------
| 4 | remco | barber | <- 'co' found in name column, ordered by name
| 3 | chris | coach | <- 'co' found in role column, ordered by name
| 2 | ron | cook | <- 'co' found in role column, ordered by name
-----------------------------
How can I do this?
Edit: $search is the output from the searchbar

Use a case expression to put the 'co' names first:
order by case when name LIKE '$search' then 0 else 1 end, name, role

Set inclusion in SQL

The quest is to check if one set fully includes another. As simplified example we can take four tables:
worker (id, name),
worker_skills (worker_id, skill),
job (id, type)
job_required_skills (job_id, skill)
I want to match the worker to the job but only if job required skills are fully match worker skills, i. e. if worker has some skills which are not required on job it's ok, but if job has at least one skill which worker doesn't then they don't match.
All I can think of includes ridiculous amount of joins and can't be used as a serious solution, so any advices are highly appreciated. Database is postgres 9.6. Thanks!
EDIT:
Some sample data:
+------+---------------+
| name | worker_skills |
+------+---------------+
| John | java |
| John | sql |
| John | ruby |
| Jane | js |
| Jane | html |
+------+---------------+
+---------------------+-------------+
| type | job_skills |
+---------------------+-------------+
| Writing_queries | sql |
| Writing_queries | black_magic |
| Generic_programming | java |
| Frontend_stuff | js |
| Frontend_stuff | html |
+---------------------+-------------+
Result:
+------+---------------------+
| John | Generic_programming |
+------+---------------------+
| Jane | Frontend_stuff |
+------+---------------------+
John is perfectly qualified for Generic_programming (the only needed skill is in his skillset) but can't do Writing_queries as it requires some black_magic; Jane can do Frontend_stuff as she has both required skills.

You can use a left join and aggregation:
select jrs.id, ws.id
from job_required_skills jrs left join
worker_skills ws
on jrs.skill = ws.skill
group by jrs.id, ws.id
having count(*) = count(ws.skill)

Inserting value into column based on column in separate table

I currently have a situation where I will have 2 tables:
+--------------+-----------------+-----------------+-----------------+
| OriginalID | NewID | FirstName | lastName |
+--------------+-----------------+-----------------+-----------------+
| 123456 | | billy | bob |
| 234567 | | tommy | smith |
| 987654 | | sarah | anders |
+--------------+-----------------+-----------------+-----------------+ etc etc
and
+--------------+-----------------+
| OriginalID | NewID |
+--------------+-----------------+
| 123456 | 1111111 |
| 234567 | 1111112 |
| 987654 | 1111113 |
+--------------+-----------------+
Without going in-depth into the process itself, I take a record from the first table and insert it into a different system, which gives a record in the form of the second table (generates a custom ID for it).
What I want to do is for every record in the second table, take the NewID and place it into the row with the same OriginalID in the first table (so that it looks like this:
+--------------+-----------------+-----------------+-----------------+
| OriginalID | NewID | FirstName | lastName |
+--------------+-----------------+-----------------+-----------------+
| 123456 | 1111111 | billy | bob |
+--------------+-----------------+-----------------+-----------------+
As a note, the only values I care about dealing with are the OriginalID and NewID, none of the other values are needed. This will happen for multiple tables with different names, so I have given a generic example. For this example the tables can be called
ContactRecords (first table) and NewContact (second table)
I have read over several examples on SO about this type of problem, but none of them quite fit the solution I'm looking for.
Thanks in advance!

This looks like a join update which we had here many times.
update old
set NewID = new.NewID
from ContactRecords as old
inner join NewContact as New on new.OriginalID = old.OriginalID

SQL for calculated column that chooses from value in own row

I have a table in which several indentifiers of a person may be stored. In this table I would like to create a single calculated identifier column that stores the best identifier for that record depending on what identifiers are available.
For example (some fictional sample data) ....
Table = "Citizens"
Id | LastName | DL-No | SS-No | State-Id-No | Calculated
------------------------------------------------------------------------
1 | Smith | NULL | 374-784-8888 | 7383204848 | ?
2 | Jones | JG892435262 | NULL | NULL | ?
3 | Trask | TSK73948379 | NULL | 9276542119 | ?
4 | Clinton | CL231429888 | 543-123-5555 | 1840430324 | ?
I know the order in which I would like choose identifiers ...
Drivers-License-No
Social-Security-No
State-Id-No
So I would like the calculated identifier column to be part of the table schema. The desired results would be ...
Id | LastName | DL-No | SS-No | State-Id-No | Calculated
------------------------------------------------------------------------
1 | Smith | NULL | 374-784-8888 | 7383204848 | 374-784-8888
2 | Jones | JG892435262 | NULL | 4537409273 | JG892435262
3 | Trask | NULL | NULL | 9276542119 | 9276542119
4 | Clinton | CL231429888 | 543-123-5555 | 1840430324 | CL231429888
IS this possible? If so what SQL would I use to calculate what goes in the "Calculated" column?
I was thinking of something like ..
SELECT
CASE
WHEN ([DL-No] is NOT NULL) THEN [DL-No]
WHEN ([SS-No] is NOT NULL) THEN [SS-No]
WHEN ([State-Id-No] is NOT NULL) THEN [State-Id-No]
AS "Calculated"
END
FROM Citizens

The easiest solution is to use coalesce():
select c.*,
coalesce([DL-No], [SS-No], [State-ID-No]) as calculated
from citizens c
However, I think your case statement will also work, if you fix the syntax to use when rather than where.

SQL: Joining two tables with email adresses in SQL Server [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Improve this question
I have spent hours researching how to write the proper SQL for the following task, and finally I feel that I have to get some help as this is the most complex SQL query I have ever had to write :(
I am putting together an email list with all the email adresses that we have on our customers.
I have two tables: one customer table that contains customer level information, and one contact person table that contains person level information. Most of the data is overlapping, so the same email adress will occure in both tables. But the email adress field can be empty in both tables, and I do not want to return any empty rows.
Users that buy in our physical store are often only registered in the customer level table, but users that buys online are always registered both in the customer level table and the person level table.
I want to create a full list where I get all email adresses, where all email adresses are unique, no email adresses are duplicates and no email adresses are null.
Also I want to join in columns from the customer table when the data is retrieved from the person table (the zip code in my example below).
Customers
| CustomerID | Firstname | Lastname | Email | Zipcode |
| 22 | Jeff | Carson | jeffcar#mail.com | 81712 |
| 29 | John | Doe | null | 51211 |
| 37 | Gina | Andersen | null | 21147 |
| 42 | Brad | Cole | brad#company.org | 39261 |
Contact persons
| PersonID | CustomerID | Firstname | Lastname | Email |
| 8712 | 22 | Jeff | Carson | null || 8916 | 29 | Jane | Doe | jane#doe.net || 8922 | 29 | Danny | Doe | null |
| 9181 | 37 | Gina | Andersen | gina#gmail.com |
| 9515 | 37 | Ben | Andersen | ben88#gmail.com |
I want to join the tables to generate the following:
Final table
| PersonID | CustomerID | Firstname | Lastname | Email | Zipcode |
| 8712 | 22 | Jeff | Carson | jeffcar#mail.com | 81712 |
| 8916 | 29 | Jane | Doe | jane#doe.net | 51211 |
| 9181 | 37 | Gina | Andersen | gina#gmail.com | 21147 |
| 9515 | 37 | Ben | Andersen | ben88#gmail.com | 21147 |
| null | 42 | Brad | Cole | brad#company.org | 39261 |
I guessed this would be a fairly common task to do, but I haven't found anyone with a similar question, so I put my trust in the expertise out there.

This SQL will get you exactly the results table you were looking for. I've made a live demo you can play with here at SQLFiddle.
SELECT
ContactPerson.PersonID,
Customer.CustomerID,
COALESCE(ContactPerson.FirstName, Customer.FirstName) AS FirstName,
COALESCE(ContactPerson.LastName, Customer.LastName) AS LastName,
COALESCE(ContactPerson.Email, Customer.Email) AS Email,
Customer.ZipCode
FROM Customer
LEFT JOIN ContactPerson
ON ContactPerson.CustomerID = Customer.CustomerID
WHERE COALESCE(ContactPerson.Email, Customer.Email) IS NOT NULL
Results (identical to your desired results):
| PersonID | CustomerID | FirstName | LastName | Email | ZipCode |
| 8712 | 22 | Jeff | Carson | jeffcar#mail.com | 81712 |
| 8916 | 29 | Jane | Doe | jane#doe.net | 51211 |
| 9181 | 37 | Gina | Andersen | gina#gmail.com | 21147 |
| 9515 | 37 | Ben | Andersen | ben88#gmail.com | 21147 |
| NULL | 42 | Brad | Cole | brad#company.org | 39261 |
A quick explanation of some key points to aid understanding:
The query uses a LEFT JOIN to join the two tables together. JOINs are pretty common once you get into SQL problems like this. I won't go into an in-depth explanation here: now that you know what they are called you should have no trouble Googling for loads of info on them!
NB: COALESCE basically means 'the first one of these options which isn't null' (docs). So this query will grab their name and email address from ContactPerson IF POSSIBLE, otherwise from Customer. If NEITHER of these tables hold an email address, then the WHERE clause makes sure that record isn't included at all, as required.

This will work:
SELECT b.PersonID
,a.CustomerID
,a.FirstName
,a.LastName
,COALESCE(a.Email,b.Email) AS Email
,a.ZipCode
FROM Customers a
LEFT JOIN Contact b
ON a.CustomerID = b.CustomerID
WHERE COALESCE(a.Email, b.Email) IS NOT NULL
Demo: SQL Fiddle

select con.personid,
con.customerid,
con.firstname,
con.lastname,
coalesce(con.email, cus.email) email,
cus.zipcode
from contact_persons con
right join
customers cus
on con.customerid = cus.customerid

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How do I identify distinct combinations across array-columns and then unnest in sql presto - sql

Related

How can I load rows by name and role and order them based on their name, but preference the name? (SQL)

Set inclusion in SQL

Inserting value into column based on column in separate table

SQL for calculated column that chooses from value in own row

SQL: Joining two tables with email adresses in SQL Server [closed]

Categories

Resources