I am looking to convert the Legacy SQL query below to Standard SQL. The issue I am having is that I need to unnest two repeated fields (labels and credits). How can I convert this query? Thanks!
I run into a "Scalar subquery produced more than one element" error whenever I try to rewrite this query (see below).
Legacy SQL Query I am trying to rewrite:
SELECT
service.description,
sku.description,
usage_start_time,
usage_end_time,
labels.key,
labels.value,
cost,
usage.amount,
project.name,
credits.name,
credits.amount
FROM
flatten([gcp_billing_data.gcp_billing_export],
credits)
What I have tried so far in Standard SQL:
SELECT
service.description AS service,
sku.description AS sku,
usage_start_time,
usage_end_time,
l.key,
l.value,
cost,
usage.amount AS usage,
project.name AS project,
c.name AS credit,
c.amount
FROM
`gcp_billing_data.gcp_billing_export`,
UNNEST(labels) AS l,
UNNEST(credits) AS c
Group by 1,2,3,4,5,7,8,9,10,11
This query runs, but the number of rows is significantly lower than I would expect.
A quick, formal fix for your query in Standard SQL is to replace
(select l.value from unnest(labels) as l)
with
(select string_agg(l.value) from unnest(labels) as l)
But this is still not exactly the same as what your original Legacy SQL query is doing.
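One possible reason for the lower row count, offered as an assumption about the data rather than something the question confirms: a comma (cross) join against UNNEST drops every row whose array is empty, whereas LEFT JOIN UNNEST(...) keeps such a row with NULLs. The Python sketch below only imitates the two behaviours on made-up rows; the field values are hypothetical.

```python
from itertools import product

# Hypothetical billing rows; "labels" and "credits" stand in for repeated fields.
rows = [
    {"cost": 1.0, "labels": [{"key": "env", "value": "prod"}], "credits": []},
    {"cost": 2.0, "labels": [], "credits": [{"name": "promo", "amount": -0.5}]},
    {"cost": 3.0,
     "labels": [{"key": "env", "value": "dev"}, {"key": "team", "value": "a"}],
     "credits": [{"name": "promo", "amount": -1.0}]},
]

def cross_join_unnest(rows):
    """Imitates: FROM t, UNNEST(labels) AS l, UNNEST(credits) AS c."""
    return [(r["cost"], l, c)
            for r in rows
            for l, c in product(r["labels"], r["credits"])]

def left_join_unnest(rows):
    """Imitates: FROM t LEFT JOIN UNNEST(labels) AS l LEFT JOIN UNNEST(credits) AS c."""
    return [(r["cost"], l, c)
            for r in rows
            # An empty array contributes a single NULL (None) element instead of nothing.
            for l, c in product(r["labels"] or [None], r["credits"] or [None])]

cross_rows = cross_join_unnest(rows)
left_rows = left_join_unnest(rows)
print(len(cross_rows), len(left_rows))
```

If this is what is happening, writing LEFT JOIN UNNEST(labels) AS l LEFT JOIN UNNEST(credits) AS c in place of the comma joins would keep the rows that have no labels or no credits. The GROUP BY in the attempt also merges identical rows, which can lower the count further.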
I'm learning the Databricks platform at the moment, and I'm on a lesson about CTEs. This specific query defines a CTE inside another CTE definition, and the instructor in the video is not doing the best job of breaking down what exactly this query is doing.
WITH lax_bos AS (
WITH origin_destination (origin_airport, destination_airport) AS (
SELECT
origin,
destination
FROM
external_table
)
SELECT
*
FROM
origin_destination
WHERE
origin_airport = 'LAX'
AND destination_airport = 'BOS'
)
SELECT
count(origin_airport) AS `Total Flights from LAX to BOS`
FROM
lax_bos;
The output of the query comes out to 684, which I know comes from the last SELECT statement. It's mostly everything going on above it that I don't fully understand.
First, you select the two columns you need from external_table and name this CTE "origin_destination":
SELECT
origin,
destination
FROM
external_table
Next, you filter it in another CTE named "lax_bos":
SELECT
*
FROM
origin_destination ------the cte you already made
WHERE
origin_airport = 'LAX'
AND destination_airport = 'BOS'
And this is the main query, where you use the CTE "lax_bos" that you made in the previous step. Here you just count the number of flights:
SELECT
count(origin_airport) AS `Total Flights from LAX to BOS`
FROM
lax_bos
Nesting CTEs is weird. Normally they form a single-level transformation pipeline, like this:
WITH origin_destination (origin_airport, destination_airport) AS
(
SELECT origin, destination
FROM external_table
), lax_bos AS
(
SELECT *
FROM origin_destination
WHERE origin_airport = 'LAX'
AND destination_airport = 'BOS'
)
SELECT count(origin_airport) AS `Total Flights from LAX to BOS`
FROM lax_bos;
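To check that the nested and the single-level forms return the same count, here is a small sketch that runs both against an in-memory SQLite database from Python. The four sample rows are made up, SQLite stands in for Databricks SQL, and the column alias is shortened to total; none of that comes from the lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE external_table (origin TEXT, destination TEXT)")
# Hypothetical sample flights: two of the four go from LAX to BOS.
conn.executemany(
    "INSERT INTO external_table VALUES (?, ?)",
    [("LAX", "BOS"), ("LAX", "BOS"), ("LAX", "SFO"), ("JFK", "BOS")],
)

# Nested form: origin_destination is defined inside the body of lax_bos.
nested = """
WITH lax_bos AS (
  WITH origin_destination (origin_airport, destination_airport) AS (
    SELECT origin, destination FROM external_table
  )
  SELECT * FROM origin_destination
  WHERE origin_airport = 'LAX' AND destination_airport = 'BOS'
)
SELECT count(origin_airport) AS total FROM lax_bos
"""

# Flat form: both CTEs are listed one after the other in a single WITH clause.
flat = """
WITH origin_destination (origin_airport, destination_airport) AS (
  SELECT origin, destination FROM external_table
), lax_bos AS (
  SELECT * FROM origin_destination
  WHERE origin_airport = 'LAX' AND destination_airport = 'BOS'
)
SELECT count(origin_airport) AS total FROM lax_bos
"""

nested_total = conn.execute(nested).fetchone()[0]
flat_total = conn.execute(flat).fetchone()[0]
print(nested_total, flat_total)
```

The only practical difference is scope: in the nested form, origin_destination is visible only inside lax_bos, while in the flat form the final SELECT could also query it directly.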
I do not understand why you are using a common table expression (CTE).
I am going to give you a quick overview of how this can be done without a CTE.
Always use some type of sample data set. There are plenty installed with Databricks. In fact, there is one for delayed airplane departures.
The next step is to read in the file and convert it to a temporary view.
At this point, we can use the Spark SQL magic command to query the data.
The query shows plane flights from LAX to BOS. We can remove the limit 10 option and change the '*' to "count(*) as Total" to get your answer. Thus, we solved the problem without a CTE.
The above image uses a CTE to pull the origin, destination and delay for all flights from LAX to BOS. Then it bins the delays from -9 to 9 hours with counts.
Again, this can all be done in one SQL statement that might be cleaner.
I reserve CTEs for more complex situations, for instance calculating a complex math formula over a range of data and pairing it with the base data set.
A CTE can be a recursive query or a plain subquery. Here, both are only simple subqueries.
First, the origin_destination query is run. Second, the lax_bos query is run over the origin_destination result. Finally, the main query is run on the lax_bos result.
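The three stages can be inspected one at a time. The sketch below is only an illustration: it uses SQLite from Python, made-up sample rows, and views as stand-ins for the two CTEs so that each intermediate result can be queried on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE external_table (origin TEXT, destination TEXT);
-- Hypothetical sample flights.
INSERT INTO external_table VALUES
  ('LAX', 'BOS'), ('LAX', 'BOS'), ('LAX', 'SFO'), ('JFK', 'BOS');

-- Stage 1: stand-in for the inner CTE, which renames the two columns.
CREATE VIEW origin_destination AS
  SELECT origin AS origin_airport, destination AS destination_airport
  FROM external_table;

-- Stage 2: stand-in for the outer CTE, which filters the stage 1 result.
CREATE VIEW lax_bos AS
  SELECT * FROM origin_destination
  WHERE origin_airport = 'LAX' AND destination_airport = 'BOS';
""")

stage1 = conn.execute("SELECT * FROM origin_destination").fetchall()
stage2 = conn.execute("SELECT * FROM lax_bos").fetchall()
# Stage 3: the main query counts the rows that survived the filter.
total = conn.execute("SELECT count(origin_airport) FROM lax_bos").fetchone()[0]

print(len(stage1), len(stage2), total)
```

A query engine is free to combine these stages into one pass, so the order above describes the logical result, not necessarily the physical execution.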
I'm unable to convert an MS Access query to a SQL Server query without changing the GROUP BY columns, which would affect the final result. The purpose of this query is to calculate the creditor and debtor balances of project accounts.
I tried rewriting it with a CTE but couldn't get a good result. I hope someone can help me. Thanks in advance.
This is the query I want to convert:
SELECT Sum(ZABC.M) AS M, Sum(ZABC.D) AS D, ZABC.ACC_NUMBER, ZABC.PROJECT_NUMBER, [M]-[D] AS RM, [D]-[M] AS RD
FROM ZABC
GROUP BY ZABC.ACC_NUMBER, ZABC.PROJECT_NUMBER
ORDER BY ZABC.PROJECT_NUMBER;
The problem with the query is the [M] and [D] references in the SELECT clause: these columns should either be repeated in the GROUP BY clause or wrapped in an aggregate function. Your current GROUP BY clause gives you one row per (acc_number, project_number) tuple, so you need to choose which computation you want for D and M, which may have several different values per group.
You did not explain the purpose of the original query. Maybe you meant:
SELECT
Sum(ZABC.M) AS M,
Sum(ZABC.D) AS D,
ZABC.ACC_NUMBER,
ZABC.PROJECT_NUMBER,
Sum(ZABC.M) - SUM(ZABC.D) AS RM,
SUM(ZABC.D) - SUM(ZABC.M) AS RD
FROM ZABC
GROUP BY ZABC.ACC_NUMBER, ZABC.PROJECT_NUMBER
ORDER BY ZABC.PROJECT_NUMBER;
There is a vast variety of aggregate functions available for you to pick from, such as MIN(), MAX(), AVG(), and so on.
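As a quick check of the SUM-based rewrite, the sketch below runs it against an in-memory SQLite database from Python. The three sample rows and the column types are made up for illustration, and SQLite stands in for SQL Server here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ZABC (ACC_NUMBER INTEGER, PROJECT_NUMBER INTEGER, M REAL, D REAL);
-- Hypothetical rows: account 100 has two entries in project 1.
INSERT INTO ZABC VALUES
  (100, 1, 50, 20),
  (100, 1, 30, 10),
  (200, 2, 5, 40);
""")

query = """
SELECT
  SUM(ZABC.M) AS M,
  SUM(ZABC.D) AS D,
  ZABC.ACC_NUMBER,
  ZABC.PROJECT_NUMBER,
  SUM(ZABC.M) - SUM(ZABC.D) AS RM,   -- aggregates repeated instead of reusing the alias
  SUM(ZABC.D) - SUM(ZABC.M) AS RD
FROM ZABC
GROUP BY ZABC.ACC_NUMBER, ZABC.PROJECT_NUMBER
ORDER BY ZABC.PROJECT_NUMBER
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each group yields a single row, with RM and RD computed from the group totals, so the two entries for account 100 in project 1 are summed before the subtraction.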
I'm trying to write a few Oracle SQL scripts for an assignment. I've managed to get all of it to work, except for one part. To summarize, I have to display data from 2 tables if the average of 1 column in table A is greater than the average of another column in table B. I realize you cannot include AVG functions in a WHERE clause or HAVING clause since it seems unable to properly access the data (from what I've read). When I exclude this clause, the script executes properly, so I'm confident there are no other errors.
I've tried writing it as follows but the error I get is ORA-00936: missing expression and it is just before the > sign. I thought this may be due to improper bracket placing but none of my attempts resolved this. Here is my attempt:
SELECT l.l_category, SUM(r.r_sold), AVG(l.l_cost)
FROM promos l
INNER JOIN sales r
ON r.promo_id = l.promo_id
GROUP BY l.l_category
HAVING (SELECT AVG(l.l_cost) OVER (PARTITION BY l.l_cost)) >
(SELECT AVG(r.r_sold) OVER (PARTITION BY r.r_sold));
I tried doing this without the OVER (PARTITION BY ...) as well as putting it into a WHERE clause but it didn't resolve the error. I'm pretty sure I need to put it into a SELECT statement somehow but I'm at a loss.
You do not need to use the OVER clause when applying the aggregate functions in the HAVING clause. Just use the aggregate functions on their own.
SELECT l.l_category, SUM(r.r_sold), AVG(l.l_cost)
FROM promos l
INNER JOIN sales r
ON r.promo_id = l.promo_id
GROUP BY l.l_category
HAVING AVG(l.l_cost) > AVG(r.r_sold)
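To see the HAVING comparison filter groups, here is a small sketch using SQLite from Python. The table layouts and sample rows are assumptions made for illustration (the assignment's real schema is not shown), and SQLite stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema and data; only the columns used by the query are modelled.
CREATE TABLE promos (promo_id INTEGER, l_category TEXT, l_cost REAL);
CREATE TABLE sales (promo_id INTEGER, r_sold INTEGER);
INSERT INTO promos VALUES (1, 'A', 10.0), (2, 'A', 20.0), (3, 'B', 1.0);
INSERT INTO sales VALUES (1, 2), (1, 4), (2, 6), (3, 50);
""")

query = """
SELECT l.l_category, SUM(r.r_sold), AVG(l.l_cost)
FROM promos l
INNER JOIN sales r
  ON r.promo_id = l.promo_id
GROUP BY l.l_category
-- Plain aggregates, evaluated once per group; no OVER clause is needed.
HAVING AVG(l.l_cost) > AVG(r.r_sold)
"""

result = conn.execute(query).fetchall()
print(result)
```

One thing to keep in mind: both averages are taken over the joined rows, so a promo with several sales contributes its l_cost once per sale.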
I recently purchased LINQPad in hopes that it would allow me to convert SQL statements into LINQ statements.
Using LINQPad, I am able to attach a DB and run the SQL statement which returns the results I need.
But I cannot find a 'command' to convert that SQL statement into LINQ.
Can you please let me know how to convert SQL to LINQ by using LINQPad OR another tool?
There is a tool called Linqer, but be careful: transliterating from SQL to LINQ can give you the worst of both worlds.
For instance, suppose you want all purchases of $1000 or greater paid for in cash or by customers who live in Washington. Here's the query in SQL:
SELECT p.*
FROM Purchase p
LEFT OUTER JOIN
Customer c INNER JOIN Address a ON c.AddressID = a.ID
ON p.CustomerID = c.ID
WHERE
(a.State = 'WA' OR p.CustomerID IS NULL)
AND p.ID in
(
SELECT PurchaseID FROM PurchaseItem
GROUP BY PurchaseID HAVING SUM (SaleAmount) > 1000
)
How would you translate this to LINQ? The wrong way is to transliterate the query into LINQ, trying to reproduce the outer and inner joins, subquery and group clause. The right way is to map your original query (in English) directly into LINQ, leveraging LINQ's linear flow of data and association properties:
I want all purchases...
from p in db.Purchases
...of $1000 or greater...
where p.PurchaseItems.Sum (pi => pi.SaleAmount) > 1000
...paid for in cash...
where p.Customer == null
...or by customers who live in Washington
|| p.Customer.Address.State == "WA"
Here's the final query:
from p in db.Purchases
where p.PurchaseItems.Sum (pi => pi.SaleAmount) > 1000
where p.Customer == null || p.Customer.Address.State == "WA"
select p
In general there are no tools to convert SQL to LINQ, as @andres-abel mentioned, but sometimes you have to write LINQ that executes exactly as a specified SQL statement (for example because of performance issues, backward compatibility, or some other reason).
In this case I'd advise you to do the reverse engineering yourself:
configure logging to dump the SQL statements generated by LINQ to stdout, using
ObjectQuery.ToTraceString,
DbCommand.CommandText,
a logger available to your data source
manually rewrite the LINQ statement until it generates the SQL you need
LINQPad contains no SQL->LINQ translator. LINQPad does not actually contain a LINQ->SQL translator either. It relies on the .NET LINQ to SQL library or Entity Framework for the translation.
I don't know of any other tool with that capability either. In simple cases it would be possible to make one, but for more complex scenarios it would be impossible as there is no LINQ expression that matches some SQL constructs.
I was wondering how AVG and GROUP BY would be represented in query trees.
I have a query like this:
SELECT Stats.StuId, Stats.CrsAvg
FROM (SELECT T.StuId, AVG(T.Grd) AS CrsAvg
FROM Transcript T
WHERE T.Semester IN ('F2004', 'S2006')
GROUP BY T.StuId) AS Stats
WHERE Stats.CrsAvg > 3.5
So the GROUP BY and AVG parts worry me: how are they drawn?
You have to use AVG, but to optimize the query you can avoid the two SELECTs by adding a HAVING clause:
SELECT T.StuId, AVG(T.Grd) AS CrsAvg
FROM Transcript T
WHERE T.Semester IN ('F2004', 'S2006')
GROUP BY T.StuId
having AVG(T.Grd) > 3.5
Also, you can consider adding appropriate indexes to the table.
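The subquery form and the HAVING form can be compared directly. The sketch below runs both against an in-memory SQLite database from Python; the Transcript rows are made up for illustration. In tree terms, both can be read as a selection on Semester at the bottom, a grouping/aggregation node (group by StuId, compute AVG(Grd)) above it, and a second selection on the average on top, though how a textbook draws the aggregation node varies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transcript (StuId INTEGER, Grd REAL, Semester TEXT);
-- Hypothetical grades: student 3 only has a semester outside the filter.
INSERT INTO Transcript VALUES
  (1, 4.0, 'F2004'), (1, 3.8, 'S2006'),
  (2, 3.0, 'F2004'), (2, 3.6, 'S2006'),
  (3, 4.0, 'S2005');
""")

# Original shape: aggregate in a derived table, then filter on the alias.
with_subquery = """
SELECT Stats.StuId, Stats.CrsAvg
FROM (SELECT T.StuId, AVG(T.Grd) AS CrsAvg
      FROM Transcript T
      WHERE T.Semester IN ('F2004', 'S2006')
      GROUP BY T.StuId) AS Stats
WHERE Stats.CrsAvg > 3.5
ORDER BY Stats.StuId
"""

# Rewritten shape: the same filter expressed as HAVING on the aggregate.
with_having = """
SELECT T.StuId, AVG(T.Grd) AS CrsAvg
FROM Transcript T
WHERE T.Semester IN ('F2004', 'S2006')
GROUP BY T.StuId
HAVING AVG(T.Grd) > 3.5
ORDER BY T.StuId
"""

subquery_rows = conn.execute(with_subquery).fetchall()
having_rows = conn.execute(with_having).fetchall()
print(subquery_rows, having_rows)
```

The WHERE filter removes rows before grouping, and the HAVING filter removes whole groups afterwards, which is why the two conditions sit on opposite sides of the aggregation.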