How to convert a list of comma-separated IDs into their names? - sql

I have a table that contains:
id task_ids
1 10,15
2 NULL
3 17
I also have a table that holds the names of these tasks:
id task_name
10 a
15 b
17 c
I want to generate the following output:
id task_ids task_names
1 10,15 a,b
2 null null
3 17 c
I know this structure isn't ideal, but this is a legacy table that I won't change now.
Is there an easy way to get this output?
I'm using Presto, but I think this can be solved with plain SQL.

WITH data AS (
SELECT * FROM (VALUES (1, '10,15'), (2, NULL)) x(id, task_ids)
),
task AS (
SELECT * FROM (VALUES ('10', 'a'), ('15', 'b')) x(id, task_name)
)
SELECT
d.id, d.task_ids,
-- array_agg also captures the NULL task_name coming from the LEFT JOIN, so such results need to be filtered out
IF(array_agg(t.task_name) IS NOT DISTINCT FROM ARRAY[NULL], NULL, array_agg(t.task_name)) task_names
FROM data d
-- split task_ids by ',' and UNNEST into separate rows (the ids stay varchar; CAST them if task.id is numeric)
LEFT JOIN UNNEST (split(d.task_ids, ',')) AS e(task_id) ON true
-- LEFT JOIN with task to pull the task name
LEFT JOIN task t ON e.task_id = t.id
-- aggregate back
GROUP BY d.id, d.task_ids;
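The split / join / aggregate idea above can be tried locally. The sketch below uses Python's built-in sqlite3 purely as a stand-in engine (an assumption; this is not Presto syntax): SQLite has no split() or UNNEST, so a recursive CTE does the splitting, and group_concat() takes the place of array_agg(), returning the comma-separated string the question asks for. In Presto itself, array_join(..., ',') would turn the aggregated array into such a string. The table names data and task mirror the answer's CTEs; the function name task_names is hypothetical.

```python
import sqlite3

# SQLite stand-in for the Presto query above (assumption: not Presto syntax).
# A recursive CTE replaces split() + UNNEST, and group_concat() replaces
# array_agg(), producing a comma-separated string instead of an array.
QUERY = """
WITH RECURSIVE parts(id, task_id, rest) AS (
    -- seed row: nothing peeled off yet; a trailing comma simplifies the split
    SELECT id, NULL, task_ids || ',' FROM data WHERE task_ids IS NOT NULL
    UNION ALL
    -- peel off the text before the first comma and keep the remainder
    SELECT id,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM parts
    WHERE rest <> ''
)
SELECT d.id,
       d.task_ids,
       -- group_concat() skips NULLs and returns NULL when every input is NULL,
       -- so rows without tasks come back as NULL without an extra IF
       group_concat(t.task_name, ',') AS task_names
FROM data d
LEFT JOIN parts p ON p.id = d.id AND p.task_id IS NOT NULL
LEFT JOIN task t ON t.id = CAST(p.task_id AS INTEGER)
GROUP BY d.id, d.task_ids
ORDER BY d.id
"""


def task_names(data_rows, task_rows):
    """Return (id, task_ids, task_names) tuples for hypothetical in-memory tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER, task_ids TEXT)")
    conn.execute("CREATE TABLE task (id INTEGER, task_name TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", data_rows)
    conn.executemany("INSERT INTO task VALUES (?, ?)", task_rows)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

With the question's sample data, task_names([(1, '10,15'), (2, None), (3, '17')], [(10, 'a'), (15, 'b'), (17, 'c')]) returns one row per id, with task_names 'a,b', None and 'c'. SQLite does not guarantee the element order inside group_concat, so 'b,a' is also a valid result for id 1.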

You have a horrible data model, but you can do what you want with a bit of effort. Arrays are better than strings, so I'll just use that:
select t.id, t.task_ids, array_agg(tt.task_name) as task_names
from t left join
unnest(split(t.task_ids, ',')) u(task_id)
on 1=1 left join
tasks tt
on tt.id = u.task_id
group by t.id, t.task_ids;
I don't have Presto on hand to test this, but this or a minor variant should do what you want.
EDIT:
This version might work:
select t.id, t.task_ids,
(select array_agg(tt.task_name)
from unnest(split(t.task_ids, ',')) u(task_id) join
tasks tt
on tt.id = u.task_id
) as task_names
from t;

Related

Optimizing SQL Cross Join that checks if any array value is in another column

Let's say I have a table events with this structure:
id value_array
XXXX [a,b,c,d]
... ...
I have a second table values_of_interest with this structure:
value
x
y
z
a
I want to find the ids that have any of the values found in values_of_interest. All else being equal, what would be the most performant SQL to do this? (I am using BigQuery, but feel free to answer more generally.)
My current thought is:
SELECT
DISTINCT e.id
FROM
events e, values_of_interest vi
WHERE
EXISTS(
SELECT
value
FROM
UNNEST(e.value_array) value
WHERE
value = vi.value
)
A few quick options for BigQuery Standard SQL:
Option 1
select id
from `project.dataset.events`
where exists (
select 1
from `project.dataset.values_of_interest`
where value in unnest(value_array)
)
Option 2
select id
from `project.dataset.events` t
where (
select count(1)
from t.value_array as value
join `project.dataset.values_of_interest`
using(value)
) > 0
I would write this using exists and a join:
select e.id
from `project.dataset.events` e
where exists (select 1
from unnest(e.value_array) val join
`project.dataset.values_of_interest` voi
on val = voi.value
);
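The EXISTS form above is a semi-join: each event is returned at most once no matter how many of its array values match, so neither DISTINCT nor a cross join with values_of_interest is needed. The sketch below checks that behaviour locally with Python's sqlite3, which is an assumption standing in for BigQuery: SQLite has no ARRAY type, so the arrays are stored as JSON text and expanded with json_each() in place of UNNEST. The function name ids_with_values_of_interest is hypothetical.

```python
import json
import sqlite3


def ids_with_values_of_interest(events, values_of_interest):
    """Return ids of events whose array shares at least one value with values_of_interest."""
    conn = sqlite3.connect(":memory:")
    # Arrays are stored as JSON text because SQLite has no ARRAY type (assumption of this sketch).
    conn.execute("CREATE TABLE events (id TEXT, value_array TEXT)")
    conn.execute("CREATE TABLE values_of_interest (value TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(event_id, json.dumps(values)) for event_id, values in events],
    )
    conn.executemany(
        "INSERT INTO values_of_interest VALUES (?)",
        [(v,) for v in values_of_interest],
    )
    rows = conn.execute(
        """
        SELECT e.id
        FROM events e
        WHERE EXISTS (
            -- json_each() plays the role of UNNEST(e.value_array)
            SELECT 1
            FROM json_each(e.value_array) AS val
            JOIN values_of_interest voi ON voi.value = val.value
        )
        ORDER BY e.id
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

For example, an event whose array is ["x", "y"] matches two values of interest but still appears only once in the result.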

Unable to convert this legacy SQL into Standard SQL in Google BigQuery

I am not able to convert this legacy SQL into BigQuery standard SQL, as I don't know what else needs to change here (the query fails validation if I choose standard SQL as the BigQuery dialect):
SELECT
lineitem.*,
proposal_lineitem.*,
porder.*,
company.*,
product.*,
proposal.*,
trafficker.name,
salesperson.name,
rate_card.*
FROM (
SELECT
*
FROM
dfp_data.dfp_order_lineitem
WHERE
DATE(end_datetime) >= DATE(DATE_ADD(CURRENT_TIMESTAMP(), -1, 'YEAR'))
OR end_datetime IS NULL ) lineitem
JOIN (
SELECT
*
FROM
dfp_data.dfp_order) porder
ON
lineitem.order_id = porder.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_proposal_lineitem) proposal_lineitem
ON
lineitem.id = proposal_lineitem.dfp_lineitem_id
JOIN (
SELECT
*
FROM
dfp_data.dfp_company) company
ON
porder.advertiser_id = company.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_product) product
ON
proposal_lineitem.product_id=product.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_proposal) proposal
ON
proposal_lineitem.proposal_id=proposal.id
LEFT JOIN (
SELECT
*
FROM
adpoint_data.dfp_rate_card) rate_card
ON
proposal_lineitem.ratecard_id=rate_card.id
LEFT JOIN (
SELECT
id,
name
FROM
dfp_data.dfp_user) trafficker
ON
porder.trafficker_id = trafficker.id
LEFT JOIN (
SELECT
id,
name
FROM
dfp_data.dfp_user) salesperson
ON
porder.salesperson_id = salesperson.id
Most likely, the error you are getting is something like this:
Duplicate column names in the result are not supported. Found duplicate(s): name
Legacy SQL automatically renames trafficker.name and salesperson.name in your SELECT statement to trafficker_name and salesperson_name, respectively, which avoids duplicate column names.
Standard SQL behaves differently: it names both columns name, which produces the duplicate. To avoid this, just provide aliases, as in the example below:
SELECT
lineitem.*,
proposal_lineitem.*,
porder.*,
company.*,
product.*,
proposal.*,
trafficker.name AS trafficker_name,
salesperson.name AS salesperson_name,
rate_card.*
FROM ( ...
You can easily verify this with the following simplified dummy queries:
#legacySQL
SELECT
porder.*,
trafficker.name,
salesperson.name
FROM (
SELECT 1 order_id, 'abc' order_name, 1 trafficker_id, 2 salesperson_id
) porder
LEFT JOIN (SELECT 1 id, 'trafficker' name) trafficker
ON porder.trafficker_id = trafficker.id
LEFT JOIN (SELECT 2 id, 'salesperson' name ) salesperson
ON porder.salesperson_id = salesperson.id
and
#standardSQL
SELECT
porder.*,
trafficker.name AS trafficker_name,
salesperson.name AS salesperson_name
FROM (
SELECT 1 order_id, 'abc' order_name, 1 trafficker_id, 2 salesperson_id
) porder
LEFT JOIN (SELECT 1 id, 'trafficker' name) trafficker
ON porder.trafficker_id = trafficker.id
LEFT JOIN (SELECT 2 id, 'salesperson' name ) salesperson
ON porder.salesperson_id = salesperson.id
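Note: if you have more duplicate names, you need to alias all of them as well. Legacy-only function syntax must be rewritten too; for example, DATE(DATE_ADD(CURRENT_TIMESTAMP(), -1, 'YEAR')) becomes DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR) in standard SQL.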
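The aliasing fix can be checked mechanically by listing a query's output column names. The sketch below runs the standard SQL dummy query in Python's sqlite3, an assumption used only as a convenient local engine: SQLite itself does not reject duplicate output names the way BigQuery standard SQL does, so the hypothetical helper duplicate_names plays the role of BigQuery's validation error. The helper names result_column_names and duplicate_names are illustrative, not part of any BigQuery API.

```python
import sqlite3

# The standard SQL dummy query from the answer, with explicit AS for clarity.
ALIASED = """
SELECT porder.*,
       trafficker.name AS trafficker_name,
       salesperson.name AS salesperson_name
FROM (SELECT 1 AS order_id, 'abc' AS order_name,
             1 AS trafficker_id, 2 AS salesperson_id) porder
LEFT JOIN (SELECT 1 AS id, 'trafficker' AS name) trafficker
  ON porder.trafficker_id = trafficker.id
LEFT JOIN (SELECT 2 AS id, 'salesperson' AS name) salesperson
  ON porder.salesperson_id = salesperson.id
"""


def result_column_names(sql):
    """Run sql against an empty in-memory database and return its output column names."""
    conn = sqlite3.connect(":memory:")
    cursor = conn.execute(sql)
    names = [column[0] for column in cursor.description]
    conn.close()
    return names


def duplicate_names(names):
    """Return each name that occurs more than once, in first-repeat order."""
    seen, duplicates = set(), []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates
```

An empty result from duplicate_names(result_column_names(ALIASED)) corresponds to the situation in which BigQuery standard SQL accepts the query.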

Multiple records as result of inner join on array values

I have two tables in Postgres:
events (id, occurrence_dates) # occurrence_dates is an array
dates_calendar (date)
When events.occurrence_dates contains several equal values, for example the two values {"2017-11-08 00:00:00","2017-11-08 00:00:00"}, the following query returns a single record:
SELECT "events".*
FROM "events"
INNER JOIN dates_calendar dc ON dc.date = ALL(occurrence_dates)
How can I get as many records as there are elements in events.occurrence_dates?
UPD: I'm using Ruby on Rails, but the question is asked in an SQL context.
Rails scope based on @michel.milezzi's answer:
scope :all_events, -> do
select("events_sb.*").from(<<-SQL.squish)
dates_calendar dc
INNER JOIN (SELECT *, UNNEST(occurrence_dates) oc_date FROM events) AS events_sb
ON (events_sb.oc_date = dc.date)
SQL
end
You can use UNNEST to expand the array and then use a regular join:
-- CTE with test data
WITH dates_calendar(date) AS (
VALUES
('2017-11-08 00:00:00'::TIMESTAMP),
('2017-11-09 00:00:00'),
('2017-11-10 00:00:00')
), events (id, occurrence_dates) AS (
VALUES
(1, '{"2017-11-08 00:00:00", "2017-11-08 00:00:00","2017-11-09 00:00:00","2017-11-10 00:00:00"}'::TIMESTAMP[]),
(2, '{"2017-11-08 00:00:00","2017-11-09 00:00:00"}'),
(3, '{"2017-11-08 00:00:00"}')
), events_sb AS (
SELECT id, UNNEST(occurrence_dates) oc_date FROM events
)
SELECT
events_sb.*
FROM
dates_calendar dc JOIN events_sb ON (events_sb.oc_date = dc.date) ORDER BY id;
--CTE
WITH events_sb AS (
SELECT id, UNNEST(occurrence_dates) oc_date FROM events
)
SELECT
events_sb.*
FROM
dates_calendar dc JOIN events_sb ON (events_sb.oc_date = dc.date) ORDER BY id;
--SUBQUERY
SELECT
events_sb.*
FROM
dates_calendar dc JOIN (SELECT id, UNNEST(occurrence_dates) oc_date FROM events) AS events_sb ON (events_sb.oc_date = dc.date) ORDER BY id;
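The key point is that expanding the array first yields one row per element, duplicates included, so a plain equality join keeps every occurrence, whereas "= ALL(...)" evaluates the whole array once per event row. The sketch below reproduces the answer's test data in Python's sqlite3, an assumption standing in for PostgreSQL: arrays are stored as JSON text and json_each() replaces UNNEST, and timestamps are kept as plain text. The function name expand_occurrences is hypothetical.

```python
import json
import sqlite3


def expand_occurrences(events, calendar):
    """Return (id, oc_date) rows: one per array element that matches a calendar date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dates_calendar (date TEXT)")
    # occurrence_dates is stored as JSON text (assumption: SQLite has no array type)
    conn.execute("CREATE TABLE events (id INTEGER, occurrence_dates TEXT)")
    conn.executemany("INSERT INTO dates_calendar VALUES (?)", [(d,) for d in calendar])
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(event_id, json.dumps(dates)) for event_id, dates in events],
    )
    rows = conn.execute(
        """
        WITH events_sb AS (
            -- one row per array element, duplicates kept (the UNNEST step)
            SELECT e.id, oc.value AS oc_date
            FROM events e, json_each(e.occurrence_dates) AS oc
        )
        SELECT events_sb.id, events_sb.oc_date
        FROM dates_calendar dc
        JOIN events_sb ON events_sb.oc_date = dc.date
        ORDER BY events_sb.id, events_sb.oc_date
        """
    ).fetchall()
    conn.close()
    return rows
```

With the answer's data, event 1 (four dates, two of them equal) produces four rows, event 2 produces two and event 3 produces one.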
To call it from Rails, you have to use plain SQL, as explained here and here.
Is occurrence_dates a json data type? If yes, you could simply use the json_populate_recordset() function.

Compare fields from different rows

First off, I am using SQL Server.
I am joining a table to itself, as in the example below:
SELECT t.theDate,
s.theDate,
t.bitField,
s.bitField,
t.NAME,
s.NAME
FROM table1 t
INNER JOIN table1 s ON t.NAME = s.NAME
If I take an arbitrary row X from the resulting dataset, can I compare values in any field of row X to values in any field of row X-1 or row X+1?
Example: I want to compare t.theDate on row 5 to s.theDate on row 4 or s.theDate on row 3.
Sample data looks like:
Desired results:
I want to pull all pairs of rows where the t.bitfield and s.bitfield are opposite and t.theDate and s.theDate are opposite.
From the image, those would be rows (3 & 4), (5 & 6), (7 & 8), etc.
I really appreciate any help!
Can it be done?
Variant 1: It looks like you want a ranking function.
if object_id('tempdb..#TmpOrderedTable') is not null drop table #TmpOrderedTable
-- replace columnlist with the column(s) that define the row order
select *, row_number() over (order by columnlist) rn
into #TmpOrderedTable
from table1 t
select *
from #TmpOrderedTable t0
-- left joins keep the first and last rows, which have no previous/next row
left join #TmpOrderedTable tplus on tplus.rn = t0.rn + 1 -- next row
left join #TmpOrderedTable tminus on tminus.rn = t0.rn - 1 -- previous row
Variant 2:
To get scalar values, you can use the analytic functions LAG and LEAD, or a subquery.
Variant 3:
You can use a self-join, but you have to specify a unique, non-arbitrary key if you don't want duplicates.
Variant 4:
You can use APPLY.
Your question isn't entirely clear, so I hope this is what you were after.
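Variant 2 can be sketched as follows, under one interpretation of the question (an assumption): rows are ordered by theDate within each NAME, and a pair is reported whenever bitField changes between a row and the next one. LEAD reads the next row's values without a self-join. The sketch runs in Python's sqlite3 (window functions require SQLite 3.25 or later) as a stand-in for SQL Server, which also provides LAG and LEAD; the function name flip_pairs and the sample rows are hypothetical.

```python
import sqlite3


def flip_pairs(rows):
    """rows: (NAME, theDate, bitField). Return (NAME, theDate, nextDate) where bitField flips."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (NAME TEXT, theDate TEXT, bitField INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH ordered AS (
            SELECT NAME, theDate, bitField,
                   -- values from the next row (X+1) in theDate order, per NAME
                   LEAD(theDate)  OVER (PARTITION BY NAME ORDER BY theDate) AS nextDate,
                   LEAD(bitField) OVER (PARTITION BY NAME ORDER BY theDate) AS nextBit
            FROM table1
        )
        SELECT NAME, theDate, nextDate
        FROM ordered
        -- the last row of each NAME has no next row, so nextBit is NULL there
        WHERE nextBit IS NOT NULL AND nextBit <> bitField
        ORDER BY NAME, theDate
        """
    ).fetchall()
    conn.close()
    return result
```

Using LAG instead of LEAD gives the previous row (X-1) in the same way.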
How about this?
WITH ts as (
SELECT t.theDate as theDate1, s.theDate as theDate2,
t.bitField as bitField1, s.bitField as bitField2,
t.NAME -- there is only one name
FROM table1 t INNER JOIN
table1 s
ON t.NAME = s.NAME
)
SELECT ts.*
FROM ts
WHERE EXISTS (SELECT 1
FROM ts ts2
WHERE ts2.name = ts.name AND
ts2.theDate1 = ts.theDate2 AND
ts2.theDate2 = ts.theDate1 AND
ts2.bitField1 = ts.bitField2 AND
ts2.bitField2 = ts.bitField1
);

SQL - Select records not present in another table (3 table relation)

I have 3 tables:
Table_Cars
-id_car
-description
Table_CarDocuments
-id_car
-id_documentType
-path_to_document
Table_DocumentTypes
-id_documentType
-description
I want to select all cars that do NOT have documents in Table_CarDocuments for 4 specific id_documentType values.
Something like this:
Car1 | TaxDocument
Car1 | KeyDocument
Car2 | TaxDocument
From this, I know that I'm missing 2 documents for Car1 and 1 document for Car2.
You are looking for missing car documents, so cross join cars and document types and look for combinations NOT IN the car documents table.
select c.description as car, dt.description as doctype
from table_cars c
cross join table_documenttypes dt
where (c.id_car, dt.id_documenttype) not in
(
select cd.id_car, cd.id_documenttype
from table_cardocuments cd
);
UPDATE: It turns out that SQL Server's IN clause is limited and does not support row value lists such as (c.id_car, dt.id_documenttype). However, a NOT IN clause can easily be replaced by NOT EXISTS:
select c.description as car, dt.description as doctype
from table_cars c
cross join table_documenttypes dt
where not exists
(
select *
from table_cardocuments cd
where cd.id_car = c.id_car
and cd.id_documenttype = dt.id_documenttype
);
UPDATE: As you are only interested in particular id_documenttype values (for which you'd have to add the condition and dt.id_documenttype in (1, 2, 3, 4) to the query), you can generate records for them on the fly instead of reading table_documenttypes.
To do that, replace
cross join table_documenttypes dt
with
cross join (values (1), (2), (3), (4)) as dt(id_documentType)
and select dt.id_documentType instead of dt.description, since this derived table has no description column.
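The cross join plus NOT EXISTS approach can be checked against the question's expected output. The sketch below uses Python's sqlite3 as a stand-in for SQL Server (an assumption); the wanted type ids come from a VALUES list in a CTE, and table_documenttypes is joined back only to fetch each description. The function name missing_documents, the paths and the type names InsuranceDocument and RegistrationDocument are hypothetical sample data.

```python
import sqlite3


def missing_documents(cars, doc_types, car_docs):
    """Return (car, doctype) pairs for wanted document types that a car lacks."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_cars (id_car INTEGER, description TEXT)")
    conn.execute(
        "CREATE TABLE table_documenttypes (id_documentType INTEGER, description TEXT)"
    )
    conn.execute(
        "CREATE TABLE table_cardocuments "
        "(id_car INTEGER, id_documentType INTEGER, path_to_document TEXT)"
    )
    conn.executemany("INSERT INTO table_cars VALUES (?, ?)", cars)
    conn.executemany("INSERT INTO table_documenttypes VALUES (?, ?)", doc_types)
    conn.executemany("INSERT INTO table_cardocuments VALUES (?, ?, ?)", car_docs)
    rows = conn.execute(
        """
        -- wanted type ids generated on the fly, like the answer's VALUES list
        WITH dt(id_documentType) AS (VALUES (1), (2), (3), (4))
        SELECT c.description AS car, t.description AS doctype
        FROM table_cars c
        CROSS JOIN dt
        -- join back to the type table only to fetch the description
        JOIN table_documenttypes t ON t.id_documentType = dt.id_documentType
        WHERE NOT EXISTS (
            SELECT 1
            FROM table_cardocuments cd
            WHERE cd.id_car = c.id_car
              AND cd.id_documentType = dt.id_documentType
        )
        ORDER BY c.id_car, dt.id_documentType
        """
    ).fetchall()
    conn.close()
    return rows
```

With Car1 holding types 3 and 4 and Car2 holding types 2, 3 and 4, the result is Car1 | TaxDocument, Car1 | KeyDocument and Car2 | TaxDocument, matching the example in the question.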
You can use the query below to get the result:
SELECT
c.description,
dt.description
FROM
Table_Cars c
JOIN Table_CarDocuments cd ON c.id_car = cd.id_car
JOIN Table_DocumentTypes dt ON cd.id_documentType = dt.id_documentType
WHERE
dt.id_documentType NOT IN (1, 2, 3, 4) --replace with your document type id
Thanks to @Thorsten Kettner's help:
select c.description as car, dt.description as doctype
from table_cars c
cross join table_documenttypes dt
where dt.id_documentType not in (
select cd.id_documentType
from table_cardocuments cd
where cd.id_car = c.id_car
)
and dt.id_documentType in (1, 2, 3, 4)
This can be a complicated query. The idea is to generate all combinations of cars and the four documents that you want (using cross join). Then use left join to determine if the document actually exists:
select c.id_car, dd.doctype
from table_cars c cross join
(select 'doc1' as doctype union all
select 'doc2' union all
select 'doc3' union all
select 'doc4'
) dd left join
(table_cardocuments cd join
table_documenttypes d
on cd.id_documentType = d.id_documentType
)
on c.id_car = cd.id_car and d.description = dd.doctype
where d.id_documentType is null;
Finally, the where clause finds the car/doctype pairs that are not present in the data.
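The left join has to treat "car document joined to its type" as one unit. If car documents are left joined first and the type name is filtered afterwards, every other document of the same car produces a non-matching row, and the car can be reported as missing a document it actually has. The sketch below shows the LEFT JOIN ... IS NULL anti-join in Python's sqlite3 (an assumption standing in for SQL Server), using a derived table for that unit. The function name missing_documents_left_join and the sample type names are hypothetical.

```python
import sqlite3


def missing_documents_left_join(cars, doc_types, car_docs):
    """Return (id_car, doctype) pairs not present in the data, via LEFT JOIN / IS NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_cars (id_car INTEGER, description TEXT)")
    conn.execute(
        "CREATE TABLE table_documenttypes (id_documentType INTEGER, description TEXT)"
    )
    conn.execute(
        "CREATE TABLE table_cardocuments "
        "(id_car INTEGER, id_documentType INTEGER, path_to_document TEXT)"
    )
    conn.executemany("INSERT INTO table_cars VALUES (?, ?)", cars)
    conn.executemany("INSERT INTO table_documenttypes VALUES (?, ?)", doc_types)
    conn.executemany("INSERT INTO table_cardocuments VALUES (?, ?, ?)", car_docs)
    rows = conn.execute(
        """
        SELECT c.id_car, dd.doctype
        FROM table_cars c
        CROSS JOIN (
            SELECT 'TaxDocument' AS doctype UNION ALL
            SELECT 'KeyDocument' UNION ALL
            SELECT 'InsuranceDocument' UNION ALL
            SELECT 'RegistrationDocument'
        ) dd
        -- documents each car actually has, resolved to type names as one unit
        LEFT JOIN (
            SELECT cd.id_car, d.description AS doctype
            FROM table_cardocuments cd
            JOIN table_documenttypes d ON d.id_documentType = cd.id_documentType
        ) have ON have.id_car = c.id_car AND have.doctype = dd.doctype
        -- no match on the right-hand side: this car/doctype pair is missing
        WHERE have.id_car IS NULL
        ORDER BY c.id_car, dd.doctype
        """
    ).fetchall()
    conn.close()
    return rows
```

On the same sample data as the question's example, this returns car 1 with KeyDocument and TaxDocument and car 2 with TaxDocument (sorted by type name here).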