is there any way within snowflake/sql query to view what tables are being queried the most as well as what columns? I want to know what data is of most value to my users and not sure how to do this programatically. Any thoughts are appreciated - thank you!
2021 update
The new ACCESS_HISTORY view has this information (in preview right now, enterprise edition).
For example, if you want to find the most used columns:
select obj.value:objectName::string objName
, col.value:columnName::string colName
, count(*) uses
, min(query_start_time) since
, max(query_start_time) until
from snowflake.account_usage.access_history
, table(flatten(direct_objects_accessed)) obj
, table(flatten(obj.value:columns)) col
group by 1, 2
order by uses desc
Ref: https://docs.snowflake.com/en/sql-reference/account-usage/access_history.html
2020 answer
The best I found (for now):
For any given query, you can find what tables are scanned through looking at the plan generated for it:
SELECT *, "objects"
FROM TABLE(EXPLAIN_JSON(SYSTEM$EXPLAIN_PLAN_JSON('SELECT * FROM a.b.any_table_or_view')))
WHERE "operation"='TableScan'
You can find all of your previous ran queries too:
select QUERY_TEXT
from table(information_schema.query_history())
So the natural next step would be combine both - but that's not straightforward, as you'll get an error like:
SQL compilation error: argument 1 to function EXPLAIN_JSON needs to be constant, found 'SYSTEM$EXPLAIN_PLAN_JSON('SELECT * FROM a.b.c')'
The solution would be to combine the queries from the query_history() with the SYSTEM$EXPLAIN_PLAN_JSON outside (to make the strings constant), and then you will be able to find out the most queried tables.
Related
This is for BigQuery SQL
I have some data like
version
color
v1
red
v2
blue
I want to get it into an output format like this:
v1
v2
red
blue
I guess this is a classic transpose but not sure the best way to do this. I tried a nested query:
select * from
(select color from table where version = 'v1'),
(select color from table where version = 'v2')
(simplified query from my real column etc names!)
but that gives me multiple rows per item with different versions of v1
most of the examples I found seemed a lot more complex.
https://towardsdatascience.com/pivot-in-bigquery-4eefde28b3be
Appreciate some help in basic pivots or groupby or the best way to transpose?
UPDATE it almost works but fails cos of field names.
select *
from (
select agent, text, expect
from `my.data.runs`
)
pivot (
min(expect) as expect,
min(agent) as agent
for agent in ("august-mr")
)
Invalid field name "expect_august-mr".
if my agent was named 'august_mr' it works fine!
Is there anyway to try and escape or enable a dash in the values for the agent?
just simple pivot as in below example
select *
from table
pivot (min(color) for version in ('v1', 'v2'))
if applied to sample data in your question - output is
I am currently working on an access database where we collect customers feedback.
I have one table with the following structure and data :
And I want to display the following result :
Indeed, what I want is a MS Access Request that displays, for every date value in my table, the amount of records that matches the same date on the column "date_import" (2nd column of the result) and the amount of records that matches this criteria on the column "date_answered" (3rd column of the result).
I have no idea how to do this since all the subqueries should be aware of each other.
Has anyone ever faced this issue and might be able to help me ?
Thanks in advance,
P.S. : I'm using the 2016 version of MS Access but I'm pretty sure what I'm trying to do is also achievable in previous versions of Access, this is what I added several tags.
Hmmm . . . I think this will work:
select dte, sum(is_contact), sum(is_answer)
from (select date_import as dte, 1 as is_contact, 0 as is_answer
from t
union all
select date_answers, 0 as is_contact, 1 as is_answer
from t
) t
group by dte;
Not all versions of MS Access allow union all in the FROM clause. If that is a problem, you can create a view and then select from the view.
I'm trying to query my database to pull only duplicate/old data to write to a scratch section in excel (Using a macro passing SQL to the DB).
For now, I'm currently testing in Access alone to only filter out the old data.
First, I'm trying to filter my database by a specifed WorkOrder, RunNumber, and Row.
The code below only filters by Work Order, RunNumber, and Row. ...but SQL doesn't like when I tack on a 2nd AND statement; so this currently isn't working.
SELECT *
FROM DataPoints
WHERE (((DataPoints.[WorkOrder])=[WO2]) AND ((DataPoints.[RunNumber])=6) AND ((DataPoints.[Row]=1)
Once I figure that portion out....
Then if there is only 1 entry with specified WorkOrder, RunNumber, and Row, then I want filter it out. (its not needed in the scratch section, because its data is already written to the main section of my report)
If there are 2 or more entries with said criteria(WO, RN, and Row), then I want to filter out the newest entry based on RunDate and RunTime, and only keep all older entries.
For instance, in the clip below. The only item remaining in my filtered query will be the top entry with the timestamp 11:47:00AM.
.
Are there any recommended commands to complete this problem? Any ideas are helpful. Thank you.
I would suggest something along the lines of the following:
select t.*
from datapoints t
where
t.workorder = [WO2] and
t.runnumber = 6 and
t.row = 1 and
exists
(
select 1
from datapoints u
where
u.workorder = t.workorder and
u.runnumber = t.runnumber and
u.row = t.row and
(u.rundate > t.rundate or (u.rundate = t.rundate and u.runtime > t.runtime))
)
Here, if the correlated subquery within the where clause finds a record with the same workorder, runnumber and row, but with either a later rundate or the same rundate and a later runtime, then the record is returned by the main query.
You need two more )'s at the end of your code snippet. Or you can delete the parentheses completely in this example, MS Access will ad them back in as it deems necessary.
M.S. Access SQL can be tricky as it is not standards compliant and either doesn't allow for super complex queries, or it needs an ugly work around, like having a parentheses nesting nightmare when trying to join more than two tables.
For these reasons, I suggest using multiple Access queries to produce your results.
I have been struggling with a question that seem simple, yet eludes me.
I am dealing with the public BigQuery table on bitcoin and I would like to extract the first transaction of each block that was mined. In other word, to replace a nested field by its first row, as it appears in the table preview. There is no field that can identify it, only the order in which it was stored in the table.
I ran the following query:
#StandardSQL
SELECT timestamp,
block_id,
FIRST_VALUE(transactions) OVER (ORDER BY (SELECT 1))
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
But it process 492 GB when run and throws the following error:
Error: Resources exceeded during query execution: The query could not be executed in the allotted memory. Sort operator used for OVER(ORDER BY) used too much memory..
It seems so simple, I must be missing something. Do you have an idea about how to handle such task?
#standardSQL
SELECT * EXCEPT(transactions),
(SELECT transaction FROM UNNEST(transactions) transaction LIMIT 1) transaction
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
Recommendation: while playing with large table like this one - I would recommend creating smaller version of it - so it incur less cost for your dev/test. Below can help with this - you can run it in BigQuery UI with destination table which you will then be using for your dev. Make sure you set Allow Large Results and unset Flatten Results so you preserve original schema
#legacySQL
SELECT *
FROM [bigquery-public-data:bitcoin_blockchain.blocks#1529518619028]
The value of 1529518619028 is taken from below query (at a time of running) - the reason I took four days ago is that I know number of rows in this table that time was just 912 vs current 528,858
#legacySQL
SELECT INTEGER(DATE_ADD(USEC_TO_TIMESTAMP(NOW()), -24*4, 'HOUR')/1000)
An alternative approach to Mikhail's: Just ask for the first row of an array with [OFFSET(0)]:
#StandardSQL
SELECT timestamp,
block_id,
transactions[OFFSET(0)] first_transaction
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
LIMIT 10
That first row from the array still has some nested data, that you might want to flatten to only their first row too:
#standardSQL
SELECT timestamp
, block_id
, transactions[OFFSET(0)].transaction_id first_transaction_id
, transactions[OFFSET(0)].inputs[OFFSET(0)] first_transaction_first_input
, transactions[OFFSET(0)].outputs[OFFSET(0)] first_transaction_first_output
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
LIMIT 1000
I am on a work term from school. I am not very comfortable using SQL, I am trying to get a hold of it....
My supervisor gave me a task for a user in which I need to take row data and make columns. We used the Crosstab Wizard and automagically created the SQL to get what we needed.
Basically, we have a table like this:
ReqNumber Year FilledFlag(is a checkbox) FilledBy
1 2012 (notchecked) ITSchoolBoy
1 2012 (checked) GradStudent
1 2012 (notchecked) HighSchooler
2 etc, etc.
What the user would like is to have a listing of all of the req numbers and what is checked
Our automatic pivot code gives us all of the FilledBy options (there are 9 in total) as column headings, and groups it all by reqnumber.
How can you do this without the pivot? I would like to wrap my head around this. Nearest I can find is something like:
SELECT
SUM(IIF(FilledBy = 'ITSchoolboy',1,0) as ITSchoolboy,
SUM(IIF(FilledBy = 'GradStudent',1,0) as GradStudent, etc.
FROM myTable
Could anyone help explain this to me? Point me in the direction of a guide? I've been searching for the better part of a day now, and even though I am a student, I don't think this will be smiled upon for too long. But I would really like to know!
I think your boss' suggestion could work if you GROUP BY ReqNumber.
SELECT
ReqNumber,
SUM(IIF(FilledBy = 'ITSchoolboy',1,0) as ITSchoolboy,
SUM(IIF(FilledBy = 'GradStudent',1,0) as GradStudent,
etc.
FROM myTable
GROUP BY ReqNumber;
A different approach would be to JOIN multiple subqueries. This example pulls in 2 of your categories. If you need to extend it to 9 categories, you would have a whole lot of joining going on.
SELECT
itsb.ReqNumber,
itsb.ITSchoolboy,
grad.GradStudent
FROM
(
SELECT
ReqNumber,
FilledFlag AS ITSchoolboy
FROM myTable
WHERE FilledBy = "ITSchoolboy"
) AS itsb
INNER JOIN
(
SELECT
ReqNumber,
FilledFlag AS GradStudent
FROM myTable
WHERE FilledBy = "GradStudent"
) AS grad
ON itsb.ReqNumber = grad.ReqNumber
Please notice I'm not suggesting you should use this approach. However, since you asked about alternatives to your pivot approach (which works) ... this is one. Stay tuned in case someone else offers a simpler alternative. :-)