Four Table Join in BigQuery - google-bigquery

Okay, so I'm trying to link together four different tables, and it's getting very difficult. I've provided snippets of each table in the hope that you all can help out.
Table 1: data
+--------+--------+-----------+
| charge | amount | date |
+--------+--------+-----------+
| 123 | 10000 | 2/10/2016 |
| 456 | 10000 | 1/28/2016 |
| 789 | 10000 | 3/30/2016 |
+--------+--------+-----------+
Table 2: data_metadata
+--------+------------+------------+
| charge | key | value |
+--------+------------+------------+
| 123 | identifier | trrkfll212 |
| 456 | code | test |
| 789 | ID | 123xyz |
+--------+------------+------------+
Table 3: buyer
+-----+-----------+----------+----------+
| id | date | discount | plan |
+-----+-----------+----------+----------+
| ABC | 2/13/2016 | yes | option a |
| DEF | 2/1/2016 | yes | option a |
| GHI | 1/22/2016 | no | option a |
+-----+-----------+----------+----------+
Table 4: buyer_metadata
+--------------+-----------+--------+
| id | key | value |
+--------------+-----------+--------+
| ABC | migration | TRUE |
| DEF | emid | foo |
| GHI | ID | 123xyz |
+--------------+-----------+--------+
Okay, so the tables data and data_metadata are obviously connected by the charge column.
The tables buyer and buyer_metadata are connected by the id column.
But I want to link all of them together. I'm pretty sure the way to accomplish this is through linking the metadata tables together through the common field in the "value" column (in this example: 123xyz).
Could anyone help?

It might look something like this, if all the "link" columns are unique:
SELECT *
FROM data d
JOIN data_metadata dm ON d.charge = dm.charge
JOIN buyer_metadata bm ON dm.value = bm.value
JOIN buyer b ON bm.id = b.id
If they are not unique, the joins will return duplicate rows, and I think you'll have to use something like a GROUP BY clause to collapse them.
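To check the join chain against the sample rows from the question, here is a minimal sketch that loads them into an in-memory SQLite database from Python. SQLite is only a stand-in for BigQuery here (an assumption made so the example runs anywhere), and the selected column list is just an illustration.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery; the sample rows come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (charge INTEGER, amount INTEGER, date TEXT);
CREATE TABLE data_metadata (charge INTEGER, "key" TEXT, value TEXT);
CREATE TABLE buyer (id TEXT, date TEXT, discount TEXT, "plan" TEXT);
CREATE TABLE buyer_metadata (id TEXT, "key" TEXT, value TEXT);
INSERT INTO data VALUES
  (123, 10000, '2/10/2016'), (456, 10000, '1/28/2016'), (789, 10000, '3/30/2016');
INSERT INTO data_metadata VALUES
  (123, 'identifier', 'trrkfll212'), (456, 'code', 'test'), (789, 'ID', '123xyz');
INSERT INTO buyer VALUES
  ('ABC', '2/13/2016', 'yes', 'option a'),
  ('DEF', '2/1/2016', 'yes', 'option a'),
  ('GHI', '1/22/2016', 'no', 'option a');
INSERT INTO buyer_metadata VALUES
  ('ABC', 'migration', 'TRUE'), ('DEF', 'emid', 'foo'), ('GHI', 'ID', '123xyz');
""")

# Chain the joins: data -> data_metadata -> buyer_metadata (via value) -> buyer.
rows = conn.execute("""
SELECT d.charge, d.amount, b.id, b.discount
FROM data d
JOIN data_metadata dm ON d.charge = dm.charge
JOIN buyer_metadata bm ON dm.value = bm.value
JOIN buyer b ON bm.id = b.id
""").fetchall()

print(rows)  # [(789, 10000, 'GHI', 'no')]
```

Only charge 789 and buyer GHI share a metadata value (123xyz), so that is the only row that survives the inner joins.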

Let's take it in two steps, first create composite tables for data and buyer. Composite table for data:
SELECT data.charge, data.amount, data.date,
data_metadata.key, data_metadata.value
FROM [data] AS data
JOIN (SELECT charge, key, value FROM [data_metadata]) AS data_metadata
ON data.charge = data_metadata.charge
And composite table for buyer:
SELECT buyer.id, buyer.date, buyer.discount, buyer.plan,
buyer_metadata.key, buyer_metadata.value
FROM [buyer] AS buyer
JOIN (SELECT id, key, value FROM [buyer_metadata]) AS buyer_metadata
ON buyer.id = buyer_metadata.id
And then let's join the two composite tables
SELECT composite_data.*, composite_buyer.*
FROM (
SELECT data.charge, data.amount, data.date,
data_metadata.key, data_metadata.value
FROM [data] AS data
JOIN (SELECT charge, key, value FROM [data_metadata]) AS data_metadata
ON data.charge = data_metadata.charge) AS composite_data
JOIN (
SELECT buyer.id, buyer.date, buyer.discount, buyer.plan,
buyer_metadata.key, buyer_metadata.value
FROM [buyer] AS buyer
JOIN (SELECT id, key, value FROM [buyer_metadata]) AS buyer_metadata
ON buyer.id = buyer_metadata.id) AS composite_buyer
ON composite_data.value = composite_buyer.value
I haven't tested it but it's probably close.
For reference, here is the page on BigQuery JOINs. And have you seen this SO?

Related

How to select table with a concatenated column?

I have the following data:
select * from art_skills_table;
+----+------+---------------------------+
| ID | Name | skills |
+----+------+---------------------------+
| 1 | Anna | ["painting","photography"]|
| 2 | Bob | ["drawing","sculpting"] |
| 3 | Cat | ["pastel"] |
+----+------+---------------------------+
select * from computer_table;
+------+------+-------------------------+
| ID | Name | skills |
+------+------+-------------------------+
| 1 | Anna | ["word","typing"] |
| 2 | Cat | ["code","editing"] |
| 3 | Bob | ["excel","code"] |
+------+------+-------------------------+
I would like to write an SQL statement which results in the following table.
+------+------+-----------------------------------------------+
| ID | Name | skills |
+------+------+-----------------------------------------------+
| 1 | Anna | ["painting","photography","word","typing"] |
| 2 | Bob | ["drawing","sculpting","excel","code"] |
| 3 | Cat | ["pastel","code","editing"] |
+------+------+-----------------------------------------------+
I've tried something like SELECT * from art_skills_table LEFT JOIN computer_table ON name. However it doesn't give what I need. I've read about array_cat but I'm having a bit of trouble implementing it.
If the skills columns in both tables are arrays, then you should be able to get away with this:
SELECT a.ID, a.name, array_cat(a.skills, c.skills)
FROM art_skills_table a LEFT JOIN computer_table c
ON c.id = a.id
That said, while you used a LEFT join in your sample, I think either an INNER or a FULL (OUTER) join might serve you better.
First, I wondered why the data is stored in such a model.
I was of the opinion that NoSQL databases lack the ability to do joins and ...
... a semantic triple would be in the form of subject–predicate–object.
... a key-value (KV) store uses associative arrays.
... a relational database would be normalized.
Some information about the use case would have helped.
Nevertheless, you can select the data with CONCAT and REPLACE to get the desired form.
SELECT art_skills_table.ID, computer_table.name,
CONCAT(
REPLACE(art_skills_table.skills, '}',','),
REPLACE(computer_table.skills, '{','')
)
FROM art_skills_table JOIN computer_table ON art_skills_table.ID = computer_table.ID
The query returns the following result:
+----+------+--------------------------------------------+
| ID | Name | Skills |
+----+------+--------------------------------------------+
| 1 | Anna | {"painting","photography","word","typing"} |
| 2 | Cat | {"drawing","sculpting","code","editing"} |
| 3 | Bob | {"pastel","excel","code"} |
+----+------+--------------------------------------------+
I've used the ID for the JOIN, even though Bob and Cat have different IDs in the two tables, which is why their skills are mixed up in the result above.
The JOIN should probably be done over the name.
JOIN computer_table ON art_skills_table.Name = computer_table.Name
BTW, you need to tell us what SQL engine you're running on.
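Since the engine isn't known, here is a rough engine-neutral sketch of the "join on name, then concatenate" idea. It assumes the skills columns hold JSON text (as the question's output suggests), uses in-memory SQLite only for the join, and does the concatenation in Python. In PostgreSQL with real array columns, array_cat would do that last step inside the query.

```python
import json
import sqlite3

# Assumption: skills are stored as JSON text; SQLite is just a stand-in engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE art_skills_table (id INTEGER, name TEXT, skills TEXT);
CREATE TABLE computer_table (id INTEGER, name TEXT, skills TEXT);
INSERT INTO art_skills_table VALUES
  (1, 'Anna', '["painting","photography"]'),
  (2, 'Bob',  '["drawing","sculpting"]'),
  (3, 'Cat',  '["pastel"]');
INSERT INTO computer_table VALUES
  (1, 'Anna', '["word","typing"]'),
  (2, 'Cat',  '["code","editing"]'),
  (3, 'Bob',  '["excel","code"]');
""")

# Join on name, not id: Bob and Cat have different ids in the two tables.
rows = conn.execute("""
SELECT a.id, a.name, a.skills, c.skills
FROM art_skills_table a
LEFT JOIN computer_table c ON c.name = a.name
ORDER BY a.id
""").fetchall()

# Concatenate the two lists; a missing right-hand row counts as an empty list.
merged = [
    (row_id, name, json.loads(art) + json.loads(computer or "[]"))
    for row_id, name, art, computer in rows
]

for row in merged:
    print(row)
```

With the sample data this reproduces the table the question asks for, with Bob's and Cat's skills attached to the right person.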

Join two tables returning all rows as single row from the second table

I want to get data in a single row from two tables which have a one-to-many relation:
Primary table
Secondary table
I know that for each record of the primary table, the secondary table can have a maximum of 10 rows. Here is the structure of the tables:
Primary Table
-------------------------------------------------
| ImportRecordId | Summary |
--------------------------------------------------
| 1 | Imported Successfully |
| 2 | Failed |
| 3 | Imported Successfully |
-------------------------------------------------
Secondary table
------------------------------------------------------
| ImportRecordId | CodeName | CodeValue |
-------------------------------------------------------
| 1 | ABC | 123456A |
| 1 | DEF | 8766339 |
| 1 | GHI | 887790H |
------------------------------------------------------
I want to write a query with an inner join to get data from both tables in such a way that each row from the secondary table is treated as a set of columns instead of being shown as multiple rows.
I can hard-code 20 column names (as a maximum of 10 records can exist in the secondary table and I want to display the values of two columns in a single row), so if there are fewer than 10 records in the secondary table, all the other columns will be shown as NULL.
Here is the expected output. You can see that for the first record in the primary table there were only three rows, so the two required columns from these three rows are converted into columns, and all the other columns are NULL.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| ImportRecordId | Summary | CodeName1 | CodeValue1 | CodeName2 | CodeValue2 | CodeName3 | CodeValue3 | CodeName4 | CodeValue4| CodeName5 | CodeValue5| CodeName6 | CodeValue6| CodeName7 | CodeValue7 | CodeName8 | CodeValue8 | CodeName9 | CodeValue9 | CodeName10 | CodeValue10|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | Imported Successfully | ABC | 123456A | DEF | 8766339 | GHI | 887790H | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Here is my simple SQL query, which returns all data from both tables; but instead of multiple rows from the secondary table, I want to get them in a single row like the result set above.
Select p.ImportRecordId,p.Summary,s.*
from [dbo].[primary_table] p
inner join [dbo].[secondary_table] s on p.ImportRecordId = s.ImportRecordId
The following uses Row_Number(), a JOIN and a CROSS APPLY to create the source of the PIVOT
You'll have to add the CodeName/Value 4...10
Example
Select *
From (
Select A.[ImportRecordId]
,B.Summary
,C.*
From (
Select *
,RN = Row_Number() over (Partition by [ImportRecordId] Order by [CodeName])
From [dbo].[secondary_table] A
) A
Join [dbo].[primary_table] B on A.[ImportRecordId]=B.[ImportRecordId]
Cross Apply (values (concat('CodeName' ,RN),CodeName)
,(concat('CodeValue',RN),CodeValue)
) C(Item,Value)
) src
Pivot (max(value) for Item in (CodeName1,CodeValue1,CodeName2,CodeValue2,CodeName3,CodeValue3) ) pvt
Returns
ImportRecordId Summary CodeName1 CodeValue1 CodeName2 CodeValue2 CodeName3 CodeValue3
1 Imported Successfully ABC 123456A DEF 8766339 GHI 887790H
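For comparison, the same row-numbering idea can be written without PIVOT, using conditional aggregation. The sketch below is a swapped-in technique rather than the answer's exact method: it runs against in-memory SQLite (which has no PIVOT, and needs version 3.25 or later for ROW_NUMBER) and generates all 20 columns from Python instead of typing them out.

```python
import sqlite3

# SQLite stands in for SQL Server here; sample rows come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE primary_table (ImportRecordId INTEGER, Summary TEXT);
CREATE TABLE secondary_table (ImportRecordId INTEGER, CodeName TEXT, CodeValue TEXT);
INSERT INTO primary_table VALUES
  (1, 'Imported Successfully'), (2, 'Failed'), (3, 'Imported Successfully');
INSERT INTO secondary_table VALUES
  (1, 'ABC', '123456A'), (1, 'DEF', '8766339'), (1, 'GHI', '887790H');
""")

# Build CodeName1..10 / CodeValue1..10 as conditional aggregates.
pivot_columns = ",\n  ".join(
    f"MAX(CASE WHEN n.rn = {i} THEN n.CodeName END) AS CodeName{i},\n  "
    f"MAX(CASE WHEN n.rn = {i} THEN n.CodeValue END) AS CodeValue{i}"
    for i in range(1, 11)
)

query = f"""
WITH numbered AS (
  -- Number the secondary rows within each ImportRecordId.
  SELECT ImportRecordId, CodeName, CodeValue,
         ROW_NUMBER() OVER (PARTITION BY ImportRecordId ORDER BY CodeName) AS rn
  FROM secondary_table
)
SELECT p.ImportRecordId, p.Summary,
  {pivot_columns}
FROM primary_table p
JOIN numbered n ON n.ImportRecordId = p.ImportRecordId
GROUP BY p.ImportRecordId, p.Summary
"""
rows = conn.execute(query).fetchall()

print(rows[0][:8])
```

Slots 4 to 10 come back as NULL (None in Python) because no secondary row has those row numbers. Because of the inner join, records 2 and 3 do not appear at all; a LEFT JOIN from primary_table would keep them with every code column NULL.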

How to update the PK of every record

I have two tables that are exactly the same, except for one column. Both tables have the same number of records and the same number of columns, and all of the other data is the same.
Table A:
+--------------------------------------+--------+--------+---------+
| GUID | Animal | Person | Vehicle |
+--------------------------------------+--------+--------+---------+
| 1D001609-7071-4DBB-9E65-0000B3EEF751 | cat | matt | car |
| 90260783-E3C3-4A9B-BEA0-000388EA41E1 | dog | rich | truck |
| DD18FCFA-99BD-4FBC-AFC2-00058EF95D0A | zebra | alex | van |
+--------------------------------------+--------+--------+---------+
Table B:
+--------------------------------------+--------+--------+---------+
| GUID | Animal | Person | Vehicle |
+--------------------------------------+--------+--------+---------+
| F67A3079-8589-4304-AA3C-000688696BAA | cat | matt | car |
| C71710EC-492F-424E-805D-00068AFE4E82 | dog | rich | truck |
| 5F830142-F4CC-4580-974D-000710F1AB5F | zebra | alex | van |
+--------------------------------------+--------+--------+---------+
I need the GUIDs of table A to be updated to equal the GUIDs of table B. Something like this would work in principle:
update a
set a.guid=b.guid
from tablea a
join tableb b
on a.animal=b.animal
and a.person=b.person
and a.vehicle=b.vehicle
But the above solution is not practical for me: I actually have approximately 50 columns and about 1 million records.
Would you have any other suggestions on how to update the GUID in table A more efficiently?
UPDATE A
SET [GUID] = (
SELECT [GUID]
FROM B
WHERE EXISTS (
SELECT A.Animal,A.Person,A.Vehicle
INTERSECT
SELECT B.Animal,B.Person,B.Vehicle
)
)
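The point of the EXISTS ... INTERSECT form is that set operators treat NULLs as equal, so rows still match when a compared column is NULL in both tables. Here is a small sketch of the same null-safe matching in in-memory SQLite, where the IS operator plays that role. The NULL vehicle on the zebra row is a made-up change to the sample data, added only to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (guid TEXT, animal TEXT, person TEXT, vehicle TEXT);
CREATE TABLE b (guid TEXT, animal TEXT, person TEXT, vehicle TEXT);
-- The NULL vehicle is hypothetical, to exercise null-safe matching.
INSERT INTO a VALUES
  ('1D001609-7071-4DBB-9E65-0000B3EEF751', 'cat',   'matt', 'car'),
  ('90260783-E3C3-4A9B-BEA0-000388EA41E1', 'dog',   'rich', 'truck'),
  ('DD18FCFA-99BD-4FBC-AFC2-00058EF95D0A', 'zebra', 'alex', NULL);
INSERT INTO b VALUES
  ('F67A3079-8589-4304-AA3C-000688696BAA', 'cat',   'matt', 'car'),
  ('C71710EC-492F-424E-805D-00068AFE4E82', 'dog',   'rich', 'truck'),
  ('5F830142-F4CC-4580-974D-000710F1AB5F', 'zebra', 'alex', NULL);
""")

# In SQLite, "x IS y" is true when both sides are NULL, unlike "x = y".
conn.execute("""
UPDATE a
SET guid = (
  SELECT b.guid
  FROM b
  WHERE b.animal IS a.animal
    AND b.person IS a.person
    AND b.vehicle IS a.vehicle
)
""")

guids = [row[0] for row in conn.execute("SELECT guid FROM a ORDER BY animal")]
print(guids)
```

One caveat with this shape of UPDATE: a row in A with no match in B gets its GUID set to NULL, so it may be worth adding a WHERE EXISTS guard if unmatched rows are possible.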
If you want the tables to be identical, how about replacing A with a copy of B (so that A ends up with B's GUIDs):
DROP TABLE A
SELECT *
INTO A
FROM B

Join table condition for between 2 rows

Is it possible to join these tables:
Log table:
+--------+---------------+------------+
| name | ip | created |
+--------+---------------+------------+
| 408901 | 178.22.51.168 | 1390887682 |
| 408901 | 178.22.51.168 | 1390927059 |
| 408901 | 178.22.51.168 | 1390957854 |
+--------+---------------+------------+
Orders table:
+---------+------------+
| id | created |
+---------+------------+
| 8563863 | 1390887692 |
| 8563865 | 1390897682 |
| 8563859 | 1390917059 |
| 8563860 | 1390937059 |
| 8563879 | 1390947854 |
+---------+------------+
Result table would be:
+---------+--------------+---------+---------------+------------+
|orders.id|orders.created|logs.name| logs.ip |logs.created|
+---------+--------------+---------+---------------+------------+
| 8563863 | 1390887692 | 408901 | 178.22.51.168 | 1390887682 |
| 8563865 | 1390897682 | 408901 | 178.22.51.168 | 1390887682 |
| 8563859 | 1390917059 | 408901 | 178.22.51.168 | 1390887682 |
| 8563860 | 1390937059 | 408901 | 178.22.51.168 | 1390927059 |
| 8563879 | 1390947854 | 408901 | 178.22.51.168 | 1390927059 |
+---------+--------------+---------+---------------+------------+
Is it possible?
Especially if the first table is the result of some query.
UPDATE
Sorry for the mistake. I want to find in the log who made each order. So the orders table relates to the logs table by the created field, i.e. each order should get
the most recent log row that satisfies the condition (orders.created >= log.created)
This will result in a non-equi join with horrible performance:
SELECT *
FROM t2 JOIN t1
ON t1.created =
(
SELECT MAX(t1.created)
FROM t1 WHERE t1.created <= t2.created
)
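To see what this join returns on the sample rows, here is a small sketch against in-memory SQLite. It assumes t1 is the log table and t2 is the orders table, and it aliases the inner copy of the log table (l2, a name chosen here) so the correlation is easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs (name INTEGER, ip TEXT, created INTEGER);
CREATE TABLE orders (id INTEGER, created INTEGER);
INSERT INTO logs VALUES
  (408901, '178.22.51.168', 1390887682),
  (408901, '178.22.51.168', 1390927059),
  (408901, '178.22.51.168', 1390957854);
INSERT INTO orders VALUES
  (8563863, 1390887692), (8563865, 1390897682), (8563859, 1390917059),
  (8563860, 1390937059), (8563879, 1390947854);
""")

# For each order, pick the latest log row created at or before the order.
rows = conn.execute("""
SELECT o.id, o.created, l.name, l.ip, l.created
FROM orders o
JOIN logs l
  ON l.created = (
    SELECT MAX(l2.created)
    FROM logs l2
    WHERE l2.created <= o.created
  )
ORDER BY o.created
""").fetchall()

for row in rows:
    print(row)
```

The first three orders pair with the log row at 1390887682 and the last two with the one at 1390927059, which matches the result table in the question.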
You might be better off with a cursor based on a UNION like this (you probably need to add some type casts to get a working UNION):
SELECT *
FROM
(
SELECT NULL AS name, NULL AS ip, NULL AS created2, t2.*
FROM t2
UNION ALL
SELECT t1.*, NULL AS id, NULL AS created
FROM t1
) AS dt
ORDER BY COALESCE(created, created2)
Now you can process the rows in the right order and remember the values from the last t1 row.
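The "process in order and remember the last t1 row" step could look roughly like the sketch below. It does the merge in plain Python instead of a database cursor, and the tie-break (a log row sorts before an order with the same timestamp) is an assumption based on the orders.created >= log.created condition.

```python
# Sample rows from the question: logs are (name, ip, created), orders are (id, created).
logs = [
    (408901, "178.22.51.168", 1390887682),
    (408901, "178.22.51.168", 1390927059),
    (408901, "178.22.51.168", 1390957854),
]
orders = [
    (8563863, 1390887692),
    (8563865, 1390897682),
    (8563859, 1390917059),
    (8563860, 1390937059),
    (8563879, 1390947854),
]

# Merge both sources into one stream ordered by timestamp.
# kind 0 = log, kind 1 = order, so a log sorts first on equal timestamps.
events = [(log[2], 0, log) for log in logs] + [(order[1], 1, order) for order in orders]
events.sort(key=lambda event: (event[0], event[1]))

result = []
last_log = None
for created, kind, row in events:
    if kind == 0:
        last_log = row  # remember the most recent log row seen so far
    else:
        # Attach the remembered log row (or NULL-like placeholders) to the order.
        result.append(row + (last_log if last_log is not None else (None, None, None)))

for row in result:
    print(row)
```

This is a single pass over the sorted stream, which is why it can beat the correlated subquery when both tables are large.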
There is nothing to bind these 2 together.
No ID or other column exists in both tables.
If this were the case, you could join these 2 tables in a stored procedure.
When you run the first query, store its result in a newly created table, use that table in the join to get your results, and delete it afterwards.
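Simply, you can use a union (both SELECTs must return the same number of columns, so the first one is padded with a NULL):
select id, null as ip, created from table_2
union all
select name, ip, created from table_1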

PostgreSQL Inner Join on the same table + second table?

If this is a stupid question, forgive me, I'm not very familiar with PostgreSQL.
I've collected inventory data from used car dealerships in my area and stored it in a postgreSQL table. I've got a second table with particular details regarding certain makes and models. For example:
The dealership table is structured like so:
-----------------------------------------
| Dealership | Make | Model | Year | ID |
-----------------------------------------
| A | Ford | F250 | 2003 | 1 |
| A | Chevy| Cobalt| 2005 | 2 |
| B | Ford | F250 | 2003 | 1 |
| B | Dodge| Chrgr | 2012 | 3 |
-----------------------------------------
The details table is structured like so:
-----------------------------------------
| ID | DetailA| DetailB| DetailC|
-----------------------------------------
| 1 | data | data | data |
| 2 | data | data | data |
| 3 | data | data | data |
| 4 | data | data | data |
-----------------------------------------
My goal is to retrieve vehicle matches from multiple dealerships and display the appropriate details. In the above example, I would like to see:
-----------------------------------------------------
| Make | Model | Year | DetailA | DetailB | DetailC |
-----------------------------------------------------
| Ford | F250 | 2003 | data | data | data |
-----------------------------------------------------
With this result, I will know that both A and B have a 2003 Ford F250 for sale, and can view the related details of the vehicle.
I've tried many different queries, but most are variations on something like this:
SELECT DISTINCT
dealership_table.make,
dealership_table.model,
dealership_table.year,
details_table.detaila,
details_table.detailb,
details_table.detailc
FROM
dealership_table
INNER JOIN
details_table
ON
dealership_table.id = details_table.id
WHERE
dealership_table.dealership = 'A'
OR
dealership_table.dealership = 'B'
However, this returns all of the distinct matches from the table where the dealership is either A or B. I've tried multiple inner joins, but I get an error complaining that details_table is specified multiple times.
If I'm doing something really silly, I apologize. Like I said before, I'm pretty much an SQL noob.
What am I doing wrong? How should I go about retrieving the desired results? Any suggestions, solutions, or advice is greatly appreciated!
You can write:
SELECT dealership_table1.make,
dealership_table1.model,
dealership_table1.year,
details_table.detaila,
details_table.detailb,
details_table.detailc
FROM dealership_table dealership_table1
JOIN dealership_table dealership_table2
ON dealership_table1.make = dealership_table2.make
AND dealership_table1.model = dealership_table2.model
AND dealership_table1.year = dealership_table2.year
JOIN details_table
ON dealership_table1.id = details_table.id
WHERE dealership_table1.dealership = 'A'
AND dealership_table2.dealership = 'B'
;
(Note that the FROM dealership_table dealership_table1 and JOIN dealership_table dealership_table2 set up distinct "aliases", so you can use the same table multiple different times in the same query without getting name-conflicts.)
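As a quick check of the aliasing idea, here is a sketch of the self-join run against in-memory SQLite (a stand-in for PostgreSQL). The short aliases d1, d2 and det and the detail values a1, b1, c1 and so on are placeholders invented for the example; the question only shows "data".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dealership_table (dealership TEXT, make TEXT, model TEXT, year INTEGER, id INTEGER);
CREATE TABLE details_table (id INTEGER, detaila TEXT, detailb TEXT, detailc TEXT);
INSERT INTO dealership_table VALUES
  ('A', 'Ford',  'F250',   2003, 1),
  ('A', 'Chevy', 'Cobalt', 2005, 2),
  ('B', 'Ford',  'F250',   2003, 1),
  ('B', 'Dodge', 'Chrgr',  2012, 3);
-- Placeholder detail values, not from the question.
INSERT INTO details_table VALUES
  (1, 'a1', 'b1', 'c1'), (2, 'a2', 'b2', 'c2'),
  (3, 'a3', 'b3', 'c3'), (4, 'a4', 'b4', 'c4');
""")

# d1 is restricted to dealership A and d2 to dealership B; the join on
# make/model/year keeps only vehicles that both dealerships stock.
rows = conn.execute("""
SELECT d1.make, d1.model, d1.year, det.detaila, det.detailb, det.detailc
FROM dealership_table d1
JOIN dealership_table d2
  ON d1.make = d2.make AND d1.model = d2.model AND d1.year = d2.year
JOIN details_table det ON d1.id = det.id
WHERE d1.dealership = 'A'
  AND d2.dealership = 'B'
""").fetchall()

print(rows)  # [('Ford', 'F250', 2003, 'a1', 'b1', 'c1')]
```

Each alias filters on a different dealership. Filtering the same alias on both 'A' and 'B' can never match, because one row has only one dealership value.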
I may be misunderstanding your table layout, but I think you should consider changing to a different structure. Here's what I would propose:
Vehicle:
----------------------------
| ID | Make | Model | Year |
----------------------------
| 1 | Ford | F250 | 2003 |
| 2 | Chevy| Cobalt| 2005 |
| 3 | Dodge| Chrgr | 2012 |
----------------------------
Dealership:
----------------------------
| Dealership | ID | Detail |
----------------------------
| A | 1 | data |
| A | 2 | data |
| B | 1 | data |
| B | 3 | data |
----------------------------
This way you're not storing vehicle information (make/model/year) in more than one place.
Here's how you would write your desired query given the above schema:
SELECT Make, Model, Year, A.Detail, B.Detail, C.Detail
FROM Vehicle V
LEFT OUTER JOIN Dealership A on A.Dealership = 'A' and A.id = V.id
LEFT OUTER JOIN Dealership B on B.Dealership = 'B' and B.id = V.id
LEFT OUTER JOIN Dealership C on C.Dealership = 'C' and C.id = V.id
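Here is a small sketch of the proposed schema in in-memory SQLite, to show the difference between the LEFT OUTER JOIN form (every vehicle, with NULL where a dealership lacks it) and plain inner joins (only vehicles both dealerships have). The detail values such as 'A-1' are invented placeholders, and dealership C is left out because the sample has no rows for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vehicle (id INTEGER, make TEXT, model TEXT, year INTEGER);
CREATE TABLE Dealership (dealership TEXT, id INTEGER, detail TEXT);
INSERT INTO Vehicle VALUES
  (1, 'Ford', 'F250', 2003), (2, 'Chevy', 'Cobalt', 2005), (3, 'Dodge', 'Chrgr', 2012);
-- Placeholder detail values, not from the answer.
INSERT INTO Dealership VALUES
  ('A', 1, 'A-1'), ('A', 2, 'A-2'), ('B', 1, 'B-1'), ('B', 3, 'B-3');
""")

# LEFT OUTER JOIN keeps every vehicle; missing dealerships show up as NULL.
left_rows = conn.execute("""
SELECT v.make, a.detail, b.detail
FROM Vehicle v
LEFT OUTER JOIN Dealership a ON a.dealership = 'A' AND a.id = v.id
LEFT OUTER JOIN Dealership b ON b.dealership = 'B' AND b.id = v.id
ORDER BY v.id
""").fetchall()

# Inner joins keep only vehicles stocked by both A and B.
both_rows = conn.execute("""
SELECT v.make, v.model, v.year, a.detail, b.detail
FROM Vehicle v
JOIN Dealership a ON a.dealership = 'A' AND a.id = v.id
JOIN Dealership b ON b.dealership = 'B' AND b.id = v.id
""").fetchall()

print(left_rows)
print(both_rows)
```

If the goal is strictly "vehicles that both A and B have", the inner-join variant returns just the Ford F250 row, while the outer-join variant is handier for a side-by-side availability grid.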