Snowflake compute costs by QUERY TYPE

I could not find a definitive list of the query types that generate compute costs on Snowflake.
Would you happen to know which ones generate cost?
Here is a list that I was able to extract from our execution logs (the QUERY_HISTORY table):
ALTER
ALTER_ACCOUNT
ALTER_NETWORK_POLICY
ALTER_PIPE
ALTER_SESSION
ALTER_TABLE
ALTER_TABLE_ADD_COLUMN
ALTER_TABLE_DROP_COLUMN
ALTER_TABLE_MODIFY_COLUMN
ALTER_USER
ALTER_USER_RESET_PASSWORD
ALTER_VIEW_MODIFY_SECURITY
ALTER_WAREHOUSE_RESUME
ALTER_WAREHOUSE_SUSPEND
BEGIN_TRANSACTION
COMMIT
COPY
CREATE
CREATE_CONSTRAINT
CREATE_EXTERNAL_TABLE
CREATE_NETWORK_POLICY
CREATE_ROLE
CREATE_SEQUENCE
CREATE_STREAM
CREATE_TABLE
CREATE_TABLE_AS_SELECT
CREATE_TASK
CREATE_USER
CREATE_VIEW
DELETE
DESCRIBE
DESCRIBE_QUERY
DROP
DROP_CONSTRAINT
DROP_NETWORK_POLICY
DROP_ROLE
DROP_STREAM
DROP_TASK
DROP_USER
EXPLAIN
EXTERNAL_TABLE_REFRESH
GET_FILES
GRANT
INSERT
LIST_FILES
MERGE
PUT_FILES
REMOVE_FILES
RENAME_COLUMN
RENAME_SCHEMA
RENAME_TABLE
RENAME_VIEW
RESTORE
REVOKE
ROLLBACK
SELECT
SET
SHOW
TRUNCATE_TABLE
UNKNOWN
UNLOAD
UNSET
UPDATE
USE
Thanks!

All operations generate a cost in some way. Operations that do not use a warehouse usually run in the cloud services layer.
Snowflake gives you cloud services usage for free, up to 10% of your daily warehouse compute credits. Over that, they start charging you for it.
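To see how that 10% allowance nets out on your account, here is a minimal sketch (assuming the standard SNOWFLAKE.ACCOUNT_USAGE share is available to your role) against the METERING_DAILY_HISTORY view, where CREDITS_ADJUSTMENT_CLOUD_SERVICES is the rebate applied for usage under the threshold:
-- Sketch: daily compute vs. cloud services credits, and what was actually billed
select usage_date
, credits_used_compute
, credits_used_cloud_services
, credits_adjustment_cloud_services -- negative rebate for usage under the 10% threshold
, credits_billed
from snowflake.account_usage.metering_daily_history
where service_type = 'WAREHOUSE_METERING'
order by usage_date desc;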

Metadata operations such as ALTER TABLE do not consume warehouse compute credits, but I do not have a full list. I think you can find the query types that do consume warehouse credits by checking whether a warehouse size was recorded:
select distinct query_type from snowflake.account_usage.query_history
where WAREHOUSE_SIZE is not null;

Here's a Snowflake query that shows the cloud services credits consumed by the top 15 query types over a date range:
select query_type
, count(*) as freq
, sum(credits_used_cloud_services) as total_cs_credits
from snowflake.account_usage.query_history
where 1=1
and start_time = :daterange -- :daterange is a Snowsight worksheet filter; substitute your own date range elsewhere
group by 1
order by 3 desc
limit 15;

Related

Query just runs, doesn't execute

My query just runs and doesn't execute; what is wrong? I work in Oracle SQL Developer, on a company server.
CREATE TABLE voice2020 AS
SELECT
to_char(SDATE , 'YYYYMM') as month,
MSISDN,
SUM(CH_MONEY_SUBS_DED)/100 AS AIRTIME_VOICE,
SUM(CALLDURATION/60) AS MIN_USAGE,
sum(DUR_ONNET_OOB/60) as DUR_ONNET_OOB,
sum(DUR_ONNET_IB/60) as DUR_ONNET_IB,
sum(DUR_ONNET_FREE/60) as DUR_ONNET_FREE,
sum(DUR_OFFNET_OOB/60) as DUR_OFFNET_OOB,
sum(DUR_OFFNET_IB/60) as DUR_OFFNET_IB,
sum(DUR_OFFNET_FREE/60) as DUR_OFFNET_FREE,
SUM(case when sdate < to_date('20190301','YYYYMMDD')
then CH_MONEY_PAID_DED-nvl(CH_MONEY_SUBS_DED,0)-REV_VOICE_INT-REV_VOICE_ROAM_OUTGOING-REV_VOICE_ROAM_Incoming
else (CH_MONEY_OOB-REV_VOICE_INT-REV_VOICE_ROAM_OUTGOING-REV_VOICE_ROAM_Incoming) end)/100 AS VOICE_OOB_SPEND
FROM CCN.CCN_VOICE_MSISDN_MM#xdr1
where MSISDN IN ( SELECT MSISDN FROM saayma_a.BASE30112020) --change date
GROUP BY
MSISDN,
to_char(SDATE , 'YYYYMM')
;
This is a performance issue. Clearly the query driving your CREATE TABLE statement is taking too long to return a result set.
You are querying a table in a remote database (CCN.CCN_VOICE_MSISDN_MM#xdr1) and then filtering against a local table (saayma_a.BASE30112020). This means you are going to copy all of that remote table across the network, then discard the records which don't match the WHERE clause.
You know your data (or at least you should): does that sound efficient? If you're actually discarding most of the records, you should try to filter CCN_VOICE_MSISDN_MM in the remote database.
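One untested sketch of that idea, assuming the table really is behind a database link: Oracle's DRIVING_SITE hint asks the optimizer to run the query at the named table's site, shipping the (presumably much smaller) local MSISDN list across instead. Be aware that Oracle may ignore this hint inside a CREATE TABLE AS SELECT, so test the plain SELECT first.
-- Sketch: push the join to the remote site (the hint can be ignored in a CTAS)
SELECT /*+ DRIVING_SITE(v) */
TO_CHAR(v.SDATE, 'YYYYMM') AS month,
v.MSISDN
FROM CCN.CCN_VOICE_MSISDN_MM#xdr1 v
WHERE v.MSISDN IN (SELECT MSISDN FROM saayma_a.BASE30112020);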
If you need more advice you need to provide more information. Please read this post about asking Oracle tuning questions on this site, then edit your question to include some details.
You are executing a CTAS (CREATE TABLE AS SELECT): the purpose of this statement is to create the table with the data generated by the query.
If you want to just execute the query and see the data, remove the first line of your query:
-- CREATE TABLE voice2020 AS
SELECT
.....
Also, if you have already executed the statement once, the data from your query will already be present in the voice2020 table:
Select * from voice2020;
It looks like you are trying to copy data from one table to another. You can create the table once, if it does not already exist, and then use an INSERT instead:
insert into target_table select * from source_table;

Redshift showing 0 rows for external table, though data is viewable in Athena

I created an external table in Redshift and then added some data to the specified S3 folder. I can view all the data perfectly in Athena, but I can't seem to query it from Redshift. What's weird is that select count(*) works, so that means it can find the data, but it can't actually show anything. I'm guessing it's some mis-configuration somewhere, but I'm not sure what.
Some stuff that may be relevant (I anonymized some stuff):
create external schema spectrum_staging
from data catalog
database 'spectrum_db'
iam_role 'arn:aws:iam::############:role/RedshiftSpectrumRole'
create external database if not exists;
create external table spectrum_staging.errors(
id varchar(100),
error varchar(100))
stored as parquet
location 's3://mybucket/errors/';
My sample data is stored in s3://mybucket/errors/2018-08-27-errors.parquet
This query works:
db=# select count(*) from spectrum_staging.errors;
count
-------
11
(1 row)
This query does not:
db=# select * from spectrum_staging.errors;
id | error
----+-------
(0 rows)
Check your parquet file and make sure the column data types in the Spectrum table match up.
Then run SELECT pg_last_query_id(); after your query to get the query number and look in the system tables STL_S3CLIENT and STL_S3CLIENT_ERROR to find further details about the query execution.
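A minimal sketch of that check, run in the same session immediately after the failing SELECT:
-- 1) Capture the id of the query you just ran
select pg_last_query_id();
-- 2) Look that id up in the S3 client logs (substitute the id from step 1)
select * from stl_s3client where query = <query_id>;
select * from stl_s3client_error where query = <query_id>;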
You don't need to define external tables when you have defined an external schema based on the Glue Data Catalog; Redshift Spectrum picks up all the tables that are in the catalog.
What's probably going on there is that you somehow have two things with the same name: in one case it picks the table up from the Data Catalog, and in the other case it tries to use the external table.
Check these tables from the Redshift side to get a better view of what's there:
select * from SVV_EXTERNAL_SCHEMAS
select * from SVV_EXTERNAL_TABLES
select * from SVV_EXTERNAL_PARTITIONS
select * from SVV_EXTERNAL_COLUMNS
And these tables for queries that use tables from the external schema:
select * from SVL_S3QUERY_SUMMARY
select * from SVL_S3LOG order by eventtime desc
select * from SVL_S3QUERY where query = xyz
select * from SVL_S3PARTITION where query = xyz
Was there ever a resolution for this? A year on, I have the same problem today.
Nothing stands out in terms of schema differences, though an error exists:
select recordtime, file, process, errcode, linenum as line,
trim(error) as err
from stl_error order by recordtime desc;
/home/ec2-user/padb/src/sys/cg_util.cpp padbmaster 1 601 Compilation of segment failed: /rds/bin/padb.1.0.10480/data/exec/227/48844003/de67afa670209cb9cffcd4f6a61e1c32a5b3dccc/0
Not sure what this means.
I encountered a similar issue when creating an external table in Athena using the RegexSerDe row format. I was able to query this external table from Athena without any issues. However, when querying the external table from Redshift, the results were null.
I resolved it by converting to Parquet format, as Spectrum cannot handle regular-expression serialization.
See link below:
Redshift spectrum shows NULL values for all rows

Oracle SQL use variable partition name

I run a daily report that has to query another table which is updated separately. Due to the high volume of records in the source table (8M+ per day), each day is stored in its own partition. The partition name has a standard format: P, then a 4-digit year, a 2-digit month, and a 2-digit day, so yesterday's partition is P20140907.
At the moment I use this expression, but have to manually change the name of the partition each day:
select * from <source_table> partition (P20140907) where ....
Using SYSDATE, TO_CHAR, and concatenation, I have created another table called P_NAME2 that automatically generates and updates a string value holding the name of the partition that I need to read. Now I need to update my main query so it does this:
select * from <source_table> partition (<string from P_NAME2>) where ....
You are working too hard. Oracle already does all of this for you: if you query the table using the correct date range, Oracle will perform the operation only on the relevant partitions. This is called partition pruning.
I suggest reading the docs on that.
If you're still skeptical, query ALL_TAB_PARTITIONS.HIGH_VALUE to get each partition's high value (like the table you created, but maintained for you).
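A minimal sketch of the pruned version, assuming the table is range-partitioned by day on SDATE:
-- Yesterday's data; Oracle prunes to the matching partition automatically
select *
from <source_table>
where sdate >= trunc(sysdate) - 1
and sdate < trunc(sysdate);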
I thought I'd pop back to share how I solved this in the end. The source database has a habit of leaking dates across partitions, which is why queries for one day were going outside a single partition. I can't affect this, just work around it...
begin
  execute immediate
    'create table LL_TEST as
     select *
     from SCHEMA.TABLE partition (P' || TO_CHAR(sysdate, 'YYYYMMDD') || ')
     where COLUMN_A = ''Something''
     and COLUMN_B = ''Something Else''';
end;
Using the PL/SQL script, I create the partition name with TO_CHAR(sysdate, 'YYYYMMDD') and concatenate the rest of the query around it.
Note that the values you are searching for in the WHERE clause require doubled apostrophes, so to send 'Something' to the query you need ''Something'' in the script.
It may not be pretty, but it works on the database that I have to use.
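As a side note, an untested sketch: Oracle's alternative quoting mechanism, q'[...]', avoids the doubled apostrophes entirely.
begin
  execute immediate
    q'[create table LL_TEST as
       select *
       from SCHEMA.TABLE partition (P]' || to_char(sysdate, 'YYYYMMDD') || q'[)
       where COLUMN_A = 'Something'
       and COLUMN_B = 'Something Else']';
end;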

BigQuery query creation without variables?

Coming from SQL Server and a little bit of MySQL, I'm not sure how to proceed in Google's BigQuery web browser query tool.
There doesn't appear to be any way to create, use, or set/declare variables. How are folks working around this? Or perhaps I have missed something obvious in the instructions or the nature of BigQuery? The Java API?
It is now possible to declare and set variables using SQL. For more information, see the documentation, but here is an example:
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data`.usa_names.usa_1910_current
WHERE year = 2017
);
-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data`.samples.shakespeare
);
There is currently no way to set/declare variables in BigQuery. If you need variables, you'll need to cut and paste them where you need them. Feel free to file this as a feature request here.
It's not elegant, and it's a pain, but...
The way we handle it is using a Python script that replaces a "variable placeholder" in our query and then sends the amended query via the API.
I have opened a feature request asking for "Dynamic SQL" capabilities.
If you want to avoid BQ scripting, you can sometimes use an idiom that utilizes WITH and CROSS JOIN.
In the example below:
- the events table contains some timestamped events
- the reports table contains occasional aggregate values of the events
- the goal is to write a query that only generates incremental (non-duplicate) aggregate rows
This is achieved by:
- introducing a state temp table that looks at a target table for aggregate results, to determine parameters (params) for the actual query
- CROSS JOINing the params with the actual query, allowing the param row's columns to constrain the query
- the query will repeatably return the same results until the results themselves are appended to the reports table
WITH state AS (
SELECT
-- what was the newest report's ending time?
COALESCE(
(SELECT MAX(report_end_ts) FROM `x.y.reports`),
TIMESTAMP("2019-01-01")
) AS latest_report_ts,
...
),
params AS (
SELECT
-- look for events since the end of the last report
latest_report_ts AS event_after_ts,
-- and go until now
CURRENT_TIMESTAMP() AS event_before_ts
FROM state
)
SELECT
MIN(event_ts) AS report_begin_ts,
MAX(event_ts) AS report_end_ts,
COUNT(1) AS event_count,
SUM(errors) AS error_total
FROM `x.y.events`
CROSS JOIN params
WHERE event_ts > event_after_ts
AND event_ts < event_before_ts
This approach is useful for BigQuery scheduled queries.

How do I pass the same input to all the sql files only once at the start?

I am calling 4-5 scripts at once from a master file, but I need to give the input only once, in the first SQL script that I call.
That input should then be reused by all the other SQL files called after the first one.
Is there any way to do that? Please help.
I think you can achieve what you want by using the sqlcmd utility and scripting variables. The last link states that you can also use environment variables.
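For instance, a sketch assuming SQL Server's sqlcmd (the bills table and the BillId variable name are made up for illustration): invoke the script with sqlcmd -i report.sql -v BillId="ABC" and reference the variable inside it:
-- report.sql (hypothetical): $(BillId) is substituted by sqlcmd before execution
SELECT * FROM bills WHERE bill_id = '$(BillId)';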
Do you mean:
query2 is based on the result of query1,
query3 is based on the result of query2 etc...
If so, you can use views:
create view view1 as select * from table1;
create view view2 as select * from view1;
create view view3 as select * from view2;
create view view4 as select * from view3;
select * from view4
Of course you have to add the where clause yourself.
See for more on views http://dev.mysql.com/doc/refman/5.0/en/create-view.html
No. This is the code:
START fbm.sql
START fba.sql
START fei.sql
START fbe.sql
START fae.sql
START tfat.sql
START ins_FBH.sql
In fbm.sql I have an input like bill id = '&1', and I have the same bill id input in the other SQL files.
When I run the master SQL, it runs fbm.sql and asks me for the bill id input. Suppose I give it as 'ABC': after fbm.sql completes, it asks me for the bill id again for fba.sql, which I don't want to enter again and again.
What I want is for fba.sql and the other corresponding SQL files to take the bill id input 'ABC' without me entering it each time.
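Since START and '&1' suggest SQL*Plus, here is one untested sketch (the bill_id variable name and the WHERE clause are illustrative): prompt once in the master script with ACCEPT, then let every child script reference the same substitution variable. Alternatively, writing &&bill_id in each script defines the variable on first use, so you are only prompted once.
-- master.sql (sketch): ask once, then every STARTed script can use &bill_id
ACCEPT bill_id CHAR PROMPT 'Enter bill id: '
START fbm.sql
START fba.sql
-- ...remaining scripts as before...
-- inside fbm.sql, fba.sql, etc., replace '&1' with:
WHERE bill_id = '&bill_id'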
Have you thought about using a stored procedure for this? It does depend on having version 5.0 (or later) of MySQL, of course, but it allows you to define variables and use them within the procedure: very flexible and great fun to use! Caveat: I have not tested this myself; my experience has been with Oracle PL/SQL, but the concepts are similar.
Then you can do stuff like this (from the MySQL newsletter at http://www.mysql.com/news-and-events/newsletter/2004-01/a0000000297.html):
DELIMITER //
CREATE PROCEDURE payment
(payment_amount DECIMAL(6,2),
payment_seller_id INT)
BEGIN
DECLARE n DECIMAL(6,2);
SET n = payment_amount - 1.00;
INSERT INTO Moneys VALUES (n, CURRENT_DATE);
IF payment_amount > 1.00 THEN
UPDATE Sellers
SET commission = commission + 1.00
WHERE seller_id = payment_seller_id;
END IF;
END;
//
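To invoke it afterwards, a sketch (the argument values are made up, and the Moneys and Sellers tables must already exist):
CALL payment(25.00, 42)//
DELIMITER ;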