Ordering output of LIST in Snowflake

I've been using the LIST command to check the files staged to a table in Snowflake. However, the data is unordered and I'd like to order it by last_modified. I tried embedding it into a SELECT query like this:
SELECT *
FROM LIST @MY_DATABASE.MY_SCHEMA.%my_table/path/to/data PATTERN = '.*[.]csv.*'
However, this query fails to compile. I've tried preceding LIST with the CALL keyword as well, but no luck there. I've even tried assigning it to a local variable, but that doesn't work either. The data appears to be tabular, so I'm not sure why I can't work with it.
How can I query on the output of LIST?

I am personally using the following "hacky" solution:
Start by executing the "list" command.
I then use the result_scan function combined with the last_query_id function to fetch the results of that command. At this point I can start querying the data; here's how it looks:
LIST @MY_DATABASE.MY_SCHEMA.%my_table/path/to/data PATTERN = '.*[.]csv.*';
WITH data(name, size, md5, last_modified) as (
SELECT * FROM table(result_scan(last_query_id()))
)
select *
from data
order by last_modified desc;
Obviously this is a manual hack, since it relies on last_query_id returning the id of the LIST command. If you can't guarantee that (for example, other statements may run in between), get the actual query id and pass it to result_scan explicitly instead.
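For instance, a sketch with an explicit query id (the id below is a placeholder for one copied from the query history; note that the output column names are lowercase, so they need double quotes):
LIST @MY_DATABASE.MY_SCHEMA.%my_table/path/to/data PATTERN = '.*[.]csv.*';
-- Later, reference that LIST statement's query id explicitly;
-- '01b2c3d4-0000-0000-0000-000000000000' is a placeholder.
SELECT *
FROM TABLE(RESULT_SCAN('01b2c3d4-0000-0000-0000-000000000000'))
ORDER BY "last_modified" DESC;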

Related

How can you filter Snowflake EXPLAIN AS TABULAR syntax when it's embedded in the TABLE function? Can you filter it with anything?

I have a table named Posts that I would like to count and profile in Snowflake using the current Snowsight UI.
When I return the results via EXPLAIN USING TABULAR, I am able to return the set with a combination of the TABLE, RESULT_SCAN, and LAST_QUERY_ID functions, but any predicate, filter, or column reference seems to fail.
Is there a valid way to do this in Snowflake with the TABLE function, or is there another way to query the output of EXPLAIN USING TABULAR?
-- Works
EXPLAIN using TABULAR SELECT COUNT(*) from Posts;
-- Works
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) as t;
-- Does not work
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) as t where operation = 'GlobalStats';
-- invalid identifier 'OPERATION'; the column does not seem to be recognized.
I tried the third example and expected the predicate to apply to the function output. I don't understand why the filter works on some TABLE() results and not others.
You need to double quote the column name:
where "operation" = 'GlobalStats'
From the documentation:
"Note that because the output column names from the DESC USER command were generated in lowercase, the commands use delimited identifier notation (double quotes) around the column names in the query to ensure that the column names in the query match the column names in the output that was scanned."
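Putting that together with the third example from the question, the working filter should look like this (same Posts table as above):
EXPLAIN USING TABULAR SELECT COUNT(*) FROM Posts;
SELECT t.*
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) AS t
WHERE "operation" = 'GlobalStats';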

How to use json array in WHERE IN clause in Postgres

I have a Postgres query like this
SELECT * FROM my_table WHERE status IN (2,1);
This is part of a bigger query, but I am facing an issue with the WHERE IN part here. I am using this query inside a function and the input parameters are in JSON format. The status values I am getting are in the form of a JSON array, e.g. status=[2,1]. I need to use this array in the WHERE clause of the query and am not sure how to do that. Currently, I am using:
SELECT * FROM my_table WHERE status IN (array([2,1]));
But this is giving me an error. The status column is of the smallint data type. I know this is simple, but I am very new to Postgres and could not figure out any way to use the JSON array in a WHERE IN clause. Any help will be appreciated.
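A minimal sketch of one common approach, assuming the parameter arrives as jsonb (the literal '[2,1]' below stands in for the function's JSON input): unpack the array with jsonb_array_elements_text and cast the elements to smallint.
-- '[2,1]'::jsonb stands in for the JSON-array parameter.
SELECT *
FROM my_table
WHERE status IN (
    SELECT jsonb_array_elements_text('[2,1]'::jsonb)::smallint
);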

Getting empty rows from "SELECT ALL" query in PostgreSQL

I'm using a PostgreSQL server on my Windows machine.
I have created a table user(id, name, age) and inserted some data there.
When I try to retrieve that data via SELECT ALL FROM user, I get only the row count and just plain nothingness.
Here is how it looks in the psql CLI:
...and I'm also receiving an empty object array as the query result in my nodejs app:
Postgres supports SELECT ALL as a synonym for SELECT -- a two-word syntax parallel to SELECT DISTINCT. The ALL does nothing.
In addition, Postgres -- unlike other databases -- allows a SELECT to select no columns.
So:
SELECT ALL
FROM "user";
selects all the rows from user but no columns. In general, you would instead write:
SELECT *
FROM "user";
to see the contents of the rows. If you want to be verbose, you could use:
SELECT ALL *
FROM "user";
But that is used somewhere close to never in practice.
Note that SELECT ALL doesn't reference any column; ALL is just a keyword here. To get data from all columns you need to specify them all or use the asterisk symbol, like so:
SELECT * FROM "user";

SqlQuery and SqlFieldsQuery

It looks like SqlQuery only supports SQL that starts with select *? Doesn't it support other SQL that selects only some columns, like
select id, name from person
and maps the columns to the corresponding POJO?
If I use SqlFieldsQuery to run SQL, the result is a QueryCursor of Lists (each List contains one record of the result). But if the SQL starts with select *, then the List's contents differ from those of a fields query like:
select id, name, age from person
For select *, each List is constructed from 3 parts:
the first element is the key of the cache
the second element is the POJO object that contains the data
the trailing elements are the values for each column
Why was it designed this way? If I don't know what SQL the SqlFieldsQuery runs, then I need additional effort to figure out what the List contains.
SqlQuery returns key and value objects, while SqlFieldsQuery lets you select specific fields. Which one to use depends on your use case.
Currently select * indeed includes the predefined _key and _val fields, and this will be improved in the future. However, it's generally good practice to list the fields you want to fetch when running SQL queries (this is true for any SQL database, not only Ignite). This way your code is protected from unexpected behavior if the schema changes, for example.

BigQuery query creation without variables?

Coming from SQL Server and a little bit of MySQL, I'm not sure how to proceed in Google's BigQuery web browser query tool.
There doesn't appear to be any way to create, use, or set/declare variables. How are folks working around this? Or perhaps I have missed something obvious in the instructions or the nature of BigQuery? Java API?
It is now possible to declare and set variables using SQL. For more information, see the documentation, but here is an example:
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data`.usa_names.usa_1910_current
WHERE year = 2017
);
-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data`.samples.shakespeare
);
At the time this answer was written, there was no way to set/declare variables in BigQuery. If you need variables, you'll need to cut and paste them where you need them. Feel free to file this as a feature request.
It's not elegant, and it's a pain, but...
The way we handle it is by using a python script that replaces a "variable placeholder" in our query and then sends the amended query via the API.
I have opened a feature request asking for "Dynamic SQL" capabilities.
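For illustration, such a template might look like this (the {{run_date}} placeholder and the table name are made up; the script substitutes a literal before submitting the query through the API):
-- Hypothetical template; the script replaces {{run_date}} with a
-- literal date before sending the query.
SELECT COUNT(*) AS event_count
FROM `myproject.mydataset.events`
WHERE event_date = '{{run_date}}';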
If you want to avoid BQ scripting, you can sometimes use an idiom that combines WITH and CROSS JOIN.
In the example below:
the events table contains some timestamped events
the reports table contains occasional aggregate values of the events
the goal is to write a query that only generates incremental (non-duplicate) aggregate rows
This is achieved by:
introducing a state temp table that looks at the target table for prior aggregate results, to determine parameters (params) for the actual query
CROSS JOINing the params with the actual query, allowing the param row's columns to constrain the query
this way the query repeatably returns the same results until those results themselves are appended to the reports table
WITH state AS (
  SELECT
    -- what was the newest report's ending time?
    COALESCE(
      (SELECT MAX(report_end_ts) FROM `x.y.reports`),
      TIMESTAMP("2019-01-01")
    ) AS latest_report_ts,
    ...
),
params AS (
  SELECT
    -- look for events since the end of the last report
    latest_report_ts AS event_after_ts,
    -- and go until now
    CURRENT_TIMESTAMP() AS event_before_ts
  FROM state
)
SELECT
  MIN(event_ts) AS report_begin_ts,
  MAX(event_ts) AS report_end_ts,
  COUNT(1) AS event_count,
  SUM(errors) AS error_total
FROM `x.y.events`
CROSS JOIN params
WHERE event_ts > event_after_ts
  AND event_ts < event_before_ts
This approach is useful for BigQuery scheduled queries.