Join in Google BigQuery via Cloud Datalab - sql

I am trying to do a JOIN on two columns from two different tables (one of them is a view) in Google BigQuery. I have tried this numerous ways, but this is the error I receive most consistently:
invalidQuery: 2.1 - 0.0: JOIN cannot be applied directly to a table union or to a table wildcard function. Consider wrapping the table union or table wildcard function in a subquery (e.g., SELECT *).
Here is my SQL (legacy) query:
SELECT
blp_today.beta_key,
blp_today.px_last,
blp_today.eqy_weighted_avg_px,
blp_today.created_date,
blp_today.security_ticker,
ciq_company_stg.ticker,
ciq_company_stg.ciq
FROM
[fcm-dw:acquisition_bloomberg.blp_today],
[fcm-dw:acquisition_ciq]
JOIN
blp_today.security_ticker AS ticker
ON
blp_today.security_ticker = ciq_company_stg.ticker
LIMIT 1000
Any help would be much appreciated.

I think you want something like this:
SELECT * FROM(SELECT
beta_key,
px_last,
eqy_weighted_avg_px,
created_date,
security_ticker,
FROM
[fcm-dw:acquisition_bloomberg.blp_today],
[fcm-dw:acquisition_ciq] ) as a
JOIN
blp_today.security_ticker AS ticker
ON
a.security_ticker = ciq_company_stg.ticker
LIMIT 1000
Edit: I missed earlier that what follows your JOIN keyword does not actually seem to be a table. Are you trying to join or to union these two tables: [fcm-dw:acquisition_bloomberg.blp_today] and [fcm-dw:acquisition_ciq]? And is the latter even a table? Your code seems to indicate that there is another table named [fcm-dw:acquisition_ciq.ciq_company_stg].

First wrap your union in a subselect, then join the result:
select ...
FROM
(select * from
[fcm-dw:acquisition_bloomberg.blp_today],
[fcm-dw:acquisition_ciq] ) t
JOIN
blp_today.security_ticker AS ticker
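As a rough illustration of the "wrap the union, then join" pattern, here is a minimal sketch using Python's built-in sqlite3 module rather than BigQuery. In BigQuery legacy SQL a comma between tables in FROM means UNION ALL; SQLite has no such shorthand, so the union is written out explicitly. The table blp_yesterday, the column lists, and the sample rows are all assumptions made up for this example.

```python
import sqlite3

# Hypothetical stand-ins for the tables in the question; names, columns
# and rows are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blp_today (security_ticker TEXT, px_last REAL);
    CREATE TABLE blp_yesterday (security_ticker TEXT, px_last REAL);
    CREATE TABLE ciq_company_stg (ticker TEXT, ciq TEXT);
    INSERT INTO blp_today VALUES ('AAA', 10.0), ('BBB', 20.0);
    INSERT INTO blp_yesterday VALUES ('AAA', 9.5);
    INSERT INTO ciq_company_stg VALUES ('AAA', 'IQ1'), ('CCC', 'IQ3');
""")

# The union is wrapped in a subquery first; only its result is joined.
rows = conn.execute("""
    SELECT u.security_ticker, u.px_last, c.ciq
    FROM (
        SELECT security_ticker, px_last FROM blp_today
        UNION ALL
        SELECT security_ticker, px_last FROM blp_yesterday
    ) AS u
    JOIN ciq_company_stg AS c
      ON u.security_ticker = c.ticker
    ORDER BY u.px_last
""").fetchall()

print(rows)
```

Only ticker 'AAA' exists on both sides of the join, so the result keeps its two price rows and drops 'BBB' and 'CCC'.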

Related

Joining tables with incompatible types

I'm trying to join two tables using this command:
SELECT * FROM bigquery-public-data.github_repos.files INNER JOIN bigquery-public-data.github_repos.commits USING (repo_name)
but there are incompatible types on either side of the join: STRING and ARRAY<STRING>. Is there a way around this?
Thank you!
You want to join a 2 billion row table with a 200 million row one. This won't end well unless you restrict what you want to get out of it.
As for the technical problems of this query: The error says you are trying to JOIN a single value with an array of values. You need to UNNEST() that array.
This would work syntactically:
SELECT *
FROM `bigquery-public-data.github_repos.files` a
INNER JOIN (
SELECT * EXCEPT(repo_name)
FROM `bigquery-public-data.github_repos.commits`
, UNNEST(repo_name) repo
) b
ON a.repo_name=b.repo
But if you go for it, it will use all your free monthly quota (1TB of data scanned) for no good purpose, as far as I can tell.
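To make the effect of UNNEST() concrete without scanning the public dataset, here is a small pure-Python sketch of the same semantics. It is not BigQuery code. The sample rows and the file and commit identifiers are invented for illustration.

```python
# Hypothetical sample rows: in `files`, repo_name is a single string;
# in `commits`, repo_name is a list of strings (an ARRAY<STRING>).
files = [("repo/a", "README.md"), ("repo/b", "main.py")]
commits = [("c1", ["repo/a", "repo/b"]), ("c2", ["repo/a"])]

# UNNEST: emit one (commit, repo) row per array element.
unnested = [(commit, repo) for commit, repos in commits for repo in repos]

# Inner join on the now-scalar repo name.
joined = [
    (repo_name, path, commit)
    for repo_name, path in files
    for commit, repo in unnested
    if repo_name == repo
]

print(joined)
```

Each commit contributes one row per repository it belongs to, which is why the join can then compare a string with a string.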

SQL - Select only statement: How do I create something similar to a "Vlookup"?

I am new to stackoverflow so hope I am phrasing this clear enough.
I have a lot of data on our sql server and can only use a Select statement to extract data. I am looking now to extract a certain part out of a part name so I can see how many of that type we use:
SELECT CASE
WHEN Part.Part_No like '%B-123%' THEN 'B-123'
WHEN Part.Part_No like '%B-456%' THEN 'B-456'
WHEN Part.Part_No like '%IW-10%' THEN 'IW-10'
WHEN Part.Part_No like '%T-TLT%' THEN 'T-TLT'
WHEN Part.Part_No like '%B-TLT3060%' THEN 'B-TLT3060'
ELSE NULL END AS Type
FROM dbo.CATEGORY
So rather than writing hundreds of these lines, I was wondering if I can add these to a table and then run through the table like a VLookup. But I can only do it in the select statement without creating new tables (I am not too sure how this would work with a temp table).
Does anyone have any ideas?
You could use a CTE for managing your lookup table. You can then join (left join) your table and the lookup table, using LIKE as a join clause.
This query should do it:
WITH cte AS (SELECT '%B-123%' AS a, 'B-123' AS b
UNION SELECT '%B-456%', 'B-456'
UNION SELECT '%IW-10%', 'IW-10'
UNION SELECT '%T-TLT%', 'T-TLT'
UNION SELECT '%B-TLT3060%', 'B-TLT3060')
SELECT c.b
FROM dbo.CATEGORY t
LEFT OUTER JOIN cte c
ON t.Part_No LIKE c.a
Depending on your data and table structure, a row can produce more than one result (when one Part_No matches more than one pattern). If that is your case, you could limit it to one result per row using a GROUP BY on the primary key of the table.
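A minimal runnable sketch of the lookup-CTE idea, including the GROUP BY that collapses multiple matches, could look like the following. It uses SQLite through Python's sqlite3 module instead of SQL Server, and the table name part, its sample rows, and the choice of MIN() to pick one match are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical parts table; the real schema is not shown in the question.
conn.execute("CREATE TABLE part (part_no TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO part VALUES (?)",
    [("A-B-123-01",), ("IW-10-RED",), ("B-123-B-456",), ("ZZZ-9",)],
)

rows = conn.execute("""
    WITH lookup(pattern, part_type) AS (
        VALUES ('%B-123%', 'B-123'),
               ('%B-456%', 'B-456'),
               ('%IW-10%', 'IW-10')
    )
    SELECT p.part_no, MIN(l.part_type) AS part_type  -- keep one match per part
    FROM part AS p
    LEFT JOIN lookup AS l
      ON p.part_no LIKE l.pattern
    GROUP BY p.part_no
    ORDER BY p.part_no
""").fetchall()

for row in rows:
    print(row)
```

The part 'B-123-B-456' matches two patterns, and the GROUP BY reduces it to a single row. The part 'ZZZ-9' matches none and comes back with NULL, as in the original CASE expression.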

SQL subquery multiple times error

I am writing a subquery but I am getting a strange error:
The column 'RealEstateID' was specified multiple times for 'NotSold'.
Here is my code:
SELECT *
FROM
(SELECT *
FROM RealEstatesInfo AS REI
LEFT JOIN Purchases AS P
ON P.RealEstateID=REI.RealEstateID
WHERE DateBought IS NULL) AS NotSold
INNER JOIN OwnerEstate AS OE
ON OE.RealEstateID=NotSold.RealEstateID
It's on SQL Server, by the way.
That's because there will be two RealEstateID columns in your subquery, one from each table. You need to change it to explicitly list the columns from both tables and include only one RealEstateID. It doesn't matter which one, as you only use it for your join.
If you're very lazy you can select REI.* and name only the P columns other than RealEstateID.
By the way, SELECT * is probably never a good idea in subqueries, derived tables or CTEs.
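As a sketch of what "explicitly list the columns" might look like, the example below runs the rewritten query on SQLite via Python's sqlite3 module. SQLite does not raise the SQL Server error for duplicate column names, so this only demonstrates the corrected shape of the query. The columns Address and OwnerID and all sample rows are assumptions, since the real schema is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; only RealEstateID and DateBought appear in the question.
conn.executescript("""
    CREATE TABLE RealEstatesInfo (RealEstateID INTEGER, Address TEXT);
    CREATE TABLE Purchases (RealEstateID INTEGER, DateBought TEXT);
    CREATE TABLE OwnerEstate (RealEstateID INTEGER, OwnerID INTEGER);
    INSERT INTO RealEstatesInfo VALUES (1, 'A St'), (2, 'B St');
    INSERT INTO Purchases VALUES (1, '2020-01-01');
    INSERT INTO OwnerEstate VALUES (1, 100), (2, 200);
""")

rows = conn.execute("""
    SELECT NotSold.RealEstateID, NotSold.Address, OE.OwnerID
    FROM (
        -- Only one RealEstateID is exposed by the derived table.
        SELECT REI.RealEstateID, REI.Address, P.DateBought
        FROM RealEstatesInfo AS REI
        LEFT JOIN Purchases AS P
          ON P.RealEstateID = REI.RealEstateID
        WHERE P.DateBought IS NULL
    ) AS NotSold
    INNER JOIN OwnerEstate AS OE
      ON OE.RealEstateID = NotSold.RealEstateID
""").fetchall()

print(rows)
```

Estate 1 has a purchase row, so only estate 2 survives the IS NULL filter.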

Memory allocation failed: How to combine four result sets into one table

I have four tables. Every table has just one column with 32768 rows, like:
|calculated|
|2.45644534|
|3.23323567|
[...]
Now I want to combine these four results/tables into one table with four columns, like:
|calc1|calc2|calc3|calc4|
[values]
There are no IDs or something else to identify unique rows.
This is my query:
SELECT A.*, B.*, C.*, D.*
FROM
(
SELECT * FROM :REAL_RESULT
) AS A
JOIN
(
SELECT * FROM :PHASE_RESULT
) AS B
ON 1=1
JOIN
(
SELECT * FROM :AMPLITUDE_RESULT
) AS C
ON 1=1 [...]
Now the server is throwing this error:
Error: (dberror) 2048 - column store error: search table error:
"TEST"."data::fourier": line 58
col 4 (at pos 1655): [2048] (range 3): column store error: search
table error: [9] Memory allocation failed
What can I do now? Are there any other options? Thanks!
What you do in your original code is effectively a cross join on four tables, each containing 2^15 rows. The result would contain (2^15)^4 = 2^60 rows, which would amount to exabytes of data. That's the reason for the OOM. I used a similar example to show colleagues what can happen when joining big tables with the wrong join condition.
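The blow-up is easy to reproduce at a harmless scale. The sketch below uses SQLite via Python's sqlite3 module and four tiny tables of three rows each. The table names a to d and the values are placeholders, not the tables from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Four hypothetical single-column tables with 3 rows each.
for name in ("a", "b", "c", "d"):
    conn.execute(f"CREATE TABLE {name} (calculated REAL)")
    conn.executemany(f"INSERT INTO {name} VALUES (?)", [(1.0,), (2.0,), (3.0,)])

# ON 1=1 pairs every row with every other row: a cross join.
(count,) = conn.execute("""
    SELECT COUNT(*)
    FROM a JOIN b ON 1=1 JOIN c ON 1=1 JOIN d ON 1=1
""").fetchone()

rows_per_table = 32768            # 2^15, as in the question
full_size = rows_per_table ** 4   # row count of the original four-way join

print(count, full_size)
```

With three rows per table the join already returns 3^4 = 81 rows. With 32768 rows per table the same arithmetic gives 2^60.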
Besides that, SQL is set based and your rows do not have any natural order.
If the tables are column store tables, you could technically join on the internal column $rowid$. But $rowid$ is not officially documented and I can therefore not recommend using it.
A clean solution is the one suggested by Craig. I would probably use an IDENTITY column.
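A minimal sketch of the identity-column idea, assuming that insertion order is the row alignment you want, could look like this. It uses SQLite's AUTOINCREMENT key via Python's sqlite3 module as a stand-in for an IDENTITY column. The table name fourth_result is hypothetical because the fourth table is elided in the question, and the values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data; fourth_result is a placeholder name.
values = {
    "real_result": [1.0, 2.0, 3.0],
    "phase_result": [0.1, 0.2, 0.3],
    "amplitude_result": [10.0, 20.0, 30.0],
    "fourth_result": [5.0, 6.0, 7.0],
}
for name, vals in values.items():
    # The generated id records insertion order and serves as the join key.
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER PRIMARY KEY AUTOINCREMENT, calculated REAL)"
    )
    conn.executemany(
        f"INSERT INTO {name} (calculated) VALUES (?)", [(v,) for v in vals]
    )

rows = conn.execute("""
    SELECT r.calculated AS calc1, p.calculated AS calc2,
           a.calculated AS calc3, f.calculated AS calc4
    FROM real_result AS r
    JOIN phase_result AS p ON p.id = r.id
    JOIN amplitude_result AS a ON a.id = r.id
    JOIN fourth_result AS f ON f.id = r.id
    ORDER BY r.id
""").fetchall()

print(rows)
```

Because every join has a real equality condition, the result has as many rows as each input table rather than their product.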
If this cross join was not your original intention, but you wanted to join a list of values without any actual join condition, you might try UNION:
SELECT COLUMN,0,0,0 from A
union all
SELECT 0,COLUMN,0,0 from B
union all
SELECT 0,0,COLUMN,0 from C
union all
SELECT 0,0,0,COLUMN from D
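The output contains every row of all four tables stacked on top of each other, so its row count is the sum of the four tables' row counts. Each row carries a value in its own column and 0 in the other three.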

Specifying SELECT, then joining with another table

I just hit a wall with my SQL query fetching data from my MS SQL Server.
To simplify, say I have one table for sales, and one table for customers. They each have a corresponding userId which I can use to join the tables.
I wish to first SELECT from the sales table where, say, price is equal to 10, and then join it on the userId, in order to get access to the name, address, etc. from the customer table.
In which order should I structure the query? Do I need some sort of subquery, or what do I do?
I have tried something like this
SELECT *
FROM Sales
WHERE price = 10
INNER JOIN Customers
ON Sales.userId = Customers.userId;
Needless to say this is very simplified and not my database schema, yet it explains my problem simply.
Any suggestions? I am at a loss here.
A SELECT statement has a fixed order for its components.
In its simple form this is:
What do I select: column list
From where: table name and joined tables
Are there filters: WHERE
How to sort: ORDER BY
So most likely it is enough to change your statement to:
SELECT *
FROM Sales
INNER JOIN Customers ON Sales.userId = Customers.userId
WHERE price = 10;
The WHERE clause must follow the joins:
SELECT * FROM Sales
INNER JOIN Customers
ON Sales.userId = Customers.userId
WHERE price = 10
This is simply the way SQL syntax works. You seem to be trying to put the clauses in the order that you think they should be applied, but SQL is a declarative language, not a procedural one - you are defining what you want to occur, not how it will be done.
You could also write the same thing like this:
SELECT * FROM (
SELECT * FROM Sales WHERE price = 10
) AS filteredSales
INNER JOIN Customers
ON filteredSales.userId = Customers.userId
This may look as if it prescribes a different order of operations, but it is logically identical to the first query. In either case, the database engine may choose to perform the join and the filtering in either order, as long as the result is identical.
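The claim that both forms return the same rows can be checked with a small sketch. It uses SQLite via Python's sqlite3 module rather than MS SQL Server, and the columns saleId and name as well as the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; only userId and price come from the question.
conn.executescript("""
    CREATE TABLE Sales (saleId INTEGER, userId INTEGER, price REAL);
    CREATE TABLE Customers (userId INTEGER, name TEXT);
    INSERT INTO Sales VALUES (1, 1, 10), (2, 2, 25), (3, 2, 10);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
""")

# Form 1: join first, WHERE after the join.
join_then_filter = conn.execute("""
    SELECT s.saleId, c.name
    FROM Sales AS s
    INNER JOIN Customers AS c ON s.userId = c.userId
    WHERE s.price = 10
    ORDER BY s.saleId
""").fetchall()

# Form 2: filter inside a derived table, then join.
filter_then_join = conn.execute("""
    SELECT f.saleId, c.name
    FROM (SELECT * FROM Sales WHERE price = 10) AS f
    INNER JOIN Customers AS c ON f.userId = c.userId
    ORDER BY f.saleId
""").fetchall()

print(join_then_filter == filter_then_join, join_then_filter)
```

Both statements return the two sales priced at 10 together with the matching customer name.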
Sounds fine to me, did you run the query and check?
SELECT s.*, c.*
FROM Sales s
INNER JOIN Customers c
ON s.userId = c.userId
WHERE s.price = 10;