BigQuery query creation without variables? - google-bigquery

Coming from SQL Server and a little bit of MySQL, I'm not sure how to proceed in Google's BigQuery web browser query tool.
There doesn't appear to be any way to create, use, or set/declare variables. How are folks working around this? Or have I missed something obvious in the instructions or in the nature of BigQuery? The Java API?

It is now possible to declare and set variables using SQL. For more information, see the documentation, but here is an example:
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
  SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
  FROM `bigquery-public-data`.usa_names.usa_1910_current
  WHERE year = 2017
);
-- Which names appear as words in Shakespeare's plays?
SELECT
  name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
  SELECT word
  FROM `bigquery-public-data`.samples.shakespeare
);

There is currently no way to set/declare variables in BigQuery. If you need variables, you'll need to cut and paste them where you need them. Feel free to file this as a feature request here.
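That said, if you run queries through the API or the bq CLI rather than the web UI, named query parameters can stand in for simple variables. A minimal sketch, assuming a STRING parameter named corpus is bound when the job is submitted (e.g. bq query --use_legacy_sql=false --parameter=corpus::hamlet '...'):
-- @corpus is a named query parameter; its value is supplied with the
-- request (API or CLI), not declared anywhere in the SQL itself.
SELECT word, word_count
FROM `bigquery-public-data`.samples.shakespeare
WHERE corpus = @corpus
ORDER BY word_count DESC
LIMIT 10;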

It's not elegant, and it's a pain, but...
The way we handle it is with a Python script that replaces a "variable placeholder" in our query and then sends the amended query via the API.
I have opened a feature request asking for "Dynamic SQL" capabilities.
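For illustration, the template side of that approach might look like this; the {{TARGET_YEAR}} token is our own placeholder convention, and BigQuery never sees it because the script substitutes a literal before submitting the job:
-- Query template; the script replaces {{TARGET_YEAR}} with a literal
-- (e.g. 2017) and then sends the amended query via the API.
SELECT name, SUM(number) AS total
FROM `bigquery-public-data`.usa_names.usa_1910_current
WHERE year = {{TARGET_YEAR}}
GROUP BY name
ORDER BY total DESC;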

If you want to avoid BQ scripting, you can sometimes use an idiom that combines WITH and CROSS JOIN.
In the example below:
- the events table contains some timestamped events
- the reports table contains occasional aggregate values of the events
- the goal is to write a query that only generates incremental (non-duplicate) aggregate rows
This is achieved by:
- introducing a state temp table that looks at a target table for aggregate results to determine parameters (params) for the actual query
- CROSS JOINing the params with the actual query, so the param row's columns can constrain the query
As a result, this query will repeatably return the same results until those results themselves are appended to the reports table.
WITH state AS (
  SELECT
    -- what was the newest report's ending time?
    COALESCE(
      (SELECT MAX(report_end_ts) FROM `x.y.reports`),
      TIMESTAMP("2019-01-01")
    ) AS latest_report_ts,
    ...
),
params AS (
  SELECT
    -- look for events since end of last report
    latest_report_ts AS event_after_ts,
    -- and go until now
    CURRENT_TIMESTAMP() AS event_before_ts
  FROM state
)
SELECT
  MIN(event_ts) AS report_begin_ts,
  MAX(event_ts) AS report_end_ts,
  COUNT(1) AS event_count,
  SUM(errors) AS error_total
FROM `x.y.events`
CROSS JOIN params
WHERE event_ts > event_after_ts
  AND event_ts < event_before_ts
This approach is useful for BigQuery scheduled queries.

Related

Ordering output of LIST in Snowflake

I've been using the LIST command to check the files staged to a table in Snowflake. However, the data is unordered and I'd like to order it by last_modified. I tried embedding it into a SELECT query like this:
SELECT *
FROM LIST @MY_DATABASE.MY_SCHEMA.%my_table/path/to/data PATTERN = '.*[.]csv.*'
However, this query fails to compile. I've tried preceding LIST with the CALL keyword as well, but no luck there. I've even tried assigning it to a local variable, but that doesn't work either. The data appears to be tabular so I'm not sure why I can't work with it.
How can I query on the output of LIST?
I am personally using the following "hacky" solution:
Start by executing the "list" command.
Then I use the result_scan function combined with the last_query_id function to fetch the results of that query; at this point I can start querying the data. Here's how it looks:
LIST @MY_DATABASE.MY_SCHEMA.%my_table/path/to/data PATTERN = '.*[.]csv.*'
WITH data(name, size, md5, last_modified) as (
SELECT * FROM table(result_scan(last_query_id()))
)
select *
from data
order by last_modified desc;
Obviously this is a manual hack, as it relies on retrieving the last query id; if you can't guarantee that the LIST command was the query executed immediately before, you need to get the actual query id and use that explicitly instead.
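For that case, result_scan also accepts an explicit query id. A short sketch (the id below is a made-up placeholder; take the real one from the History page or from query_history):
-- Scan the result set of one specific query instead of the last one.
-- LIST output columns come back lowercase, so they need quoting.
SELECT "name", "size", "last_modified"
FROM table(result_scan('01a2b3c4-0000-1234-0000-000000000000'))
ORDER BY "last_modified" DESC;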

SQL Pivot: Can I make the pivot list dynamic without using stored proc?

Here is my SQL containing a pivot:
select * from (
    select
        [event_id]
        ,[attnum]
        ,PollId
        ,PollResponseDisplayText
    from [dbo].[v_PollReportDetails2]
) as tmp
pivot (max(tmp.[PollResponseDisplayText])
       for tmp.PollId in ([703],[805],[806],[807],[808],[809])) as pivot_table
I want to change the pivot list to be something like this:
for tmp.PollId in (select PollId
from Polls
where event_id = 100100
and isVisible = 1)) as pivot_table
I can do this all in a stored proc and dynamically generate a SQL statement to feed into an execute() statement, but I need to be able to do this in a view.
Maybe I'm looking in the wrong places, but I can't seem to find official documentation that actually says this can't be done. After hours of experimenting and discussions, though, it apparently cannot be done this way.
@aeoluseros said it best in his comment:
syntax of PIVOT clause requires these distinct values to be known at
query design time. So you cannot use that in view
Logically, this would make sense, because as we add rows to the table over time, it would have the potential to add columns dynamically to the view result.
Thanks @Damien_The_Unbeliever for the additional clarification; see here:
https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15#syntax
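For reference, the stored-procedure route the asker mentioned typically looks something like the sketch below: build the column list from Polls, splice it into the PIVOT, and run it with sp_executesql. This reuses the names from the question, and it cannot be wrapped in a view:
DECLARE @cols nvarchar(MAX), @sql nvarchar(MAX);
-- Build a delimited, quoted column list like [703],[805],... from Polls.
SELECT @cols = STUFF((
    SELECT ',' + QUOTENAME(CAST(PollId AS nvarchar(10)))
    FROM Polls
    WHERE event_id = 100100 AND isVisible = 1
    FOR XML PATH('')), 1, 1, '');
SET @sql = N'select * from (
    select [event_id], [attnum], PollId, PollResponseDisplayText
    from [dbo].[v_PollReportDetails2]
) as tmp
pivot (max(tmp.[PollResponseDisplayText])
       for tmp.PollId in (' + @cols + N')) as pivot_table;';
EXEC sp_executesql @sql;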

SQL-92 (Filemaker): How can I UPDATE a list of sequential numbers?

I need to re-assign all SortIDs, starting from 1 up to MAX(SortID), for a subset of records of the table Beleg, using SQL-92, after one of the SortIDs has changed (for example from 444 to 444.1). I have tried several ways (for example SET @a:=0; UPDATE table SET field=@a:=@a+1 WHERE whatever='whatever' ORDER BY field2), but they didn't work, as these solutions all need a particular dialect of SQL, like SQL Server or Oracle, etc.
The SQL that I use is SQL-92, as implemented in FileMaker (INSERT and UPDATE are available, but nothing fancy).
Thanks for any hint!
Gary
From what I know, SQL-92 is a standard and not a language, so you can say you are using T-SQL, which is mostly SQL-92 compliant, but you can't say you program SQL Server in SQL-92. The same applies to FileMaker.
I suppose you are trying to update your table through ODBC? The UPDATE statement looks OK, but there are no variables in FileMaker SQL (and I am not sure a variable inside the query would give you the result you expect; I think you would set SortID in every row to 1). You are thinking of something like window functions with row() in T-SQL, but I do not think this functionality is available.
The easiest solution is to use FileMaker, resetting the numbering for a column is really a trivial task which takes seconds. Do you need help with this?
Edit:
I was referring to the T-SQL functions rank() and row_number(); there is no row() function in T-SQL.
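For comparison, with window functions available (T-SQL, not FileMaker), the renumbering would be a single statement; a sketch using the subset conditions from the answer below:
-- T-SQL only: renumber SortID 1..n in SortID order for the subset.
WITH numbered AS (
    SELECT SortID,
           ROW_NUMBER() OVER (ORDER BY SortID) AS rn
    FROM Beleg
    WHERE YEAR(Valuta) = 2016
      AND Ursprungskonto = 1210
)
UPDATE numbered SET SortID = rn;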
I finally got the answer from Ziggy Crueltyfree Zeitgeister on the Database Administrators copy of my question.
He suggested to break this down into multiple steps using a temporary table to store the results:
CREATE TABLE sorting (sid numeric(20,10), rn int);
INSERT INTO sorting (sid, rn)
SELECT SortID, RecordNumber FROM Beleg
WHERE Year ( Valuta ) = 2016
AND Ursprungskonto = 1210
ORDER BY SortID;
UPDATE Beleg SET SortID = (SELECT rn FROM sorting WHERE sid=Beleg.SortID)
WHERE Year ( Valuta ) = 2016
AND Ursprungskonto = 1210;
DROP TABLE sorting;
Of course! I just keep the table definition in FileMaker (letting FileMaker handle the type coercion this way), and fill and clear it with my function RenumberSortID().

Look for "a,b,c" in column with data "a,z,b,c,x" with SQL query

I've been refactoring and upgrading an existing news site's data layer, which I didn't develop from the start. The application gets quite a lot of visits, and after a bit of research I decided to ditch EF and go with ADO.NET / Dapper, since the SQL commands will never be exposed to any kind of UI layer or string manipulation.
One problem I've come up with, though, is that news tags are not normalized in the database: they are stored as a comma-separated string in the News table, and there is a front-end feature that requires "related news" to be shown to the user.
So I need to search for any occurrences of one set of comma-delimited string values in a table column that also contains comma-delimited string values.
I've come up with the following query in SQL Server Management Studio, but it (obviously) takes a long time to return results. Is there a way to do this operation better? I don't have expert knowledge of SQL, so this is the query I have working at the moment:
-- I'm declaring this variable only for testing. In reality, @Tags should also be a query
-- which returns the set of tags of the target news...
DECLARE @Tags nvarchar(MAX)
SELECT @Tags = Tags FROM News WHERE Id = 7978 -- No idea where / how to include this query
                                              -- in the actual search query :/
-- dbo.Split is a table-valued function that takes a comma-delimited nvarchar as parameter
-- and returns table(Id int, Data nvarchar, Order int) with the separated values of the CSV
SELECT DISTINCT TOP 10 N.Id, N.Title, N.CreatedAt FROM News N
CROSS APPLY dbo.Split(N.Tags) B
WHERE B.Data IN
(
    SELECT C.Data FROM dbo.Split(@Tags) C
)
ORDER BY N.CreatedAt DESC, N.Id DESC
I have a full-text index enabled and set up for the "Tags" column in the News table, but I couldn't think of a proper query to take advantage of it.
SQL Server version: 2008 R2.
This query is supposed to back an IEnumerable<NewsDto> GetRelatedNews(int targetNewsId) API method.
Will you try the following query:
SELECT DISTINCT TOP 10 n.Id, n.Title, n.CreatedAt
FROM dbo.Split(@Tags) c
CROSS APPLY
(
    SELECT id, Title, CreatedAt
    FROM News
    WHERE CONTAINS(Tags, c.Data) -- THIS SHOULD MAKE USE OF FT
) n
But one drawback is that it may get all top 10 news from the first tag.
Further research didn't produce any alternatives to what I gave as an example in my original post, so I decided to go with that query and turn it into a stored procedure.
It takes 3 seconds to return all results; in my web project I'm calling this method via AJAX and caching the results to prevent running the same SP for every request.
Overall it doesn't impact my web UI's performance, since it loads related news asynchronously and uses a cached result if one exists.

Optimizing stored procedure with multiple "LIKE"s

I am passing in a comma-delimited list of values that I need to compare to the database.
Here is an example of the values I'm passing in:
@orgList = "1123, 223%, 54%"
To use the wildcard I think I have to use LIKE, but the query runs for a long time and only returns 14 rows (the results are correct, but it's just taking forever, probably because I'm using the join incorrectly).
Can I make it better?
This is what I do now:
declare @tempTable Table (SearchOrg nvarchar(max))
insert into @tempTable
select * from dbo.udf_split(@orgList) as split
-- this splits the values at the comma and puts them in a temp table
-- then I do a join on the main table and the temp table to do a like on it....
-- but I think it's not right because it's too long.
select something
from maintable gt
join @tempTable tt on gt.org like tt.SearchOrg
where
    AYEAR = ISNULL(@year, ayear)
    and (AYEAR >= ISNULL(@yearR1, ayear) and ayear <= ISNULL(@yearR2, ayear))
    and adate = ISNULL(@Date, adate)
    and (adate >= ISNULL(@dateR1, adate) and adate <= ISNULL(@DateR2, adate))
The final result would be all rows where the maintable.org is 1123, or starts with 223 or starts with 554
The reason for my date craziness is that sometimes the stored procedure checks only for a year, sometimes for a year range, sometimes for a specific date, and sometimes for a date range... everything that's not used is passed in as null.
Maybe the problem is there?
Try something like this:
Declare @tempTable Table
(
    -- Since the column is a varchar(10), you don't want to use nvarchar here.
    SearchOrg varchar(20)
);
INSERT INTO @tempTable
SELECT * FROM dbo.udf_split(@orgList);
SELECT
    something
FROM
    maintable gt
WHERE
    some where statements go here
    And Exists
    (
        SELECT 1
        FROM @tempTable tt
        WHERE gt.org Like tt.SearchOrg
    )
Such a dynamic query with optional filters and a LIKE driven by a table (!) is very hard to optimize, because almost nothing is statically known. The optimizer has to create a very general plan.
You can do two things to speed this up by orders of magnitude:
Play with OPTION (RECOMPILE). If the compile times are acceptable, this will at least deal with all the optional filters (but not with the LIKE table).
Do code generation and EXEC sp_executesql the code. Build a query with all LIKE clauses inlined into the SQL so that it looks like this: WHERE a LIKE @like0 OR a LIKE @like1 ... (not sure if you need OR or AND). This allows the optimizer to get rid of the join and just execute a normal predicate. See the sketch below.
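A sketch of that generation step, with the patterns inlined as escaped literals for brevity (building an @like0, @like1, ... parameter list instead works the same way, just with an extra parameter-definition string):
-- Hypothetical inputs (in the real SP these arrive as parameters).
DECLARE @orgList nvarchar(max) = N'1123,223%,54%';
DECLARE @year int = NULL;
-- Split the list as in the question.
DECLARE @tempTable TABLE (SearchOrg varchar(20));
INSERT INTO @tempTable SELECT * FROM dbo.udf_split(@orgList);
-- Build "gt.org LIKE N'...' OR gt.org LIKE N'...'" from the split values.
DECLARE @preds nvarchar(MAX) = N'', @sql nvarchar(MAX);
SELECT @preds = @preds
              + CASE WHEN @preds = N'' THEN N'' ELSE N' OR ' END
              + N'gt.org LIKE N''' + REPLACE(SearchOrg, N'''', N'''''') + N''''
FROM @tempTable;
-- Inline the predicates; keep the remaining filters as ordinary parameters.
SET @sql = N'select something from maintable gt where ('
         + @preds
         + N') and AYEAR = ISNULL(@year, AYEAR);';
EXEC sp_executesql @sql, N'@year int', @year = @year;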
Your query may be difficult to optimize. Part of the question is what is in the where clause. You probably want to filter these first, and then do the join using like. Or, you can try to make the join faster, and then do a full table scan on the results.
SQL Server should optimize a like statement of the form 'abc%' -- that is, where the wildcard is at the end. (See here, for example.) So, you can start with an index on maintable.org. Fortunately, your examples meet this criteria. However, if you have '%abc' -- the wildcard comes first -- then the optimization won't work.
For the index to work best, it might also need to take into account the conditions in the where clause. In other words, adding the index is suggestive, but the rest of the query may preclude the use of the index.
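For example (the index name is illustrative):
-- Patterns like '223%' can seek on this index; '%abc' cannot.
CREATE INDEX IX_maintable_org ON maintable (org);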
And, let me add, the best solution for these types of searches is to use the full text search capability in SQL Server (see here).