SQL to select with multiple different WHEREs / transpose / pivot table? - sql

This is for BigQuery SQL
I have some data like
version
color
v1
red
v2
blue
I want to get it into an output format like this:
v1
v2
red
blue
I guess this is a classic transpose but not sure the best way to do this. I tried a nested query:
select * from
(select color from table where version = 'v1'),
(select color from table where version = 'v2')
(simplified query from my real column etc names!)
but that gives me multiple rows per item with different versions of v1
most of the examples I found seemed a lot more complex.
https://towardsdatascience.com/pivot-in-bigquery-4eefde28b3be
Appreciate some help in basic pivots or groupby or the best way to transpose?
UPDATE it almost works but fails cos of field names.
select *
from (
select agent, text, expect
from `my.data.runs`
)
pivot (
min(expect) as expect,
min(agent) as agent
for agent in ("august-mr")
)
Invalid field name "expect_august-mr".
if my agent was named 'august_mr' it works fine!
Is there anyway to try and escape or enable a dash in the values for the agent?

just simple pivot as in below example
select *
from table
pivot (min(color) for version in ('v1', 'v2'))
if applied to sample data in your question - output is

Related

Query Snowflake Jobs [duplicate]

is there any way within snowflake/sql query to view what tables are being queried the most as well as what columns? I want to know what data is of most value to my users and not sure how to do this programatically. Any thoughts are appreciated - thank you!
2021 update
The new ACCESS_HISTORY view has this information (in preview right now, enterprise edition).
For example, if you want to find the most used columns:
select obj.value:objectName::string objName
, col.value:columnName::string colName
, count(*) uses
, min(query_start_time) since
, max(query_start_time) until
from snowflake.account_usage.access_history
, table(flatten(direct_objects_accessed)) obj
, table(flatten(obj.value:columns)) col
group by 1, 2
order by uses desc
Ref: https://docs.snowflake.com/en/sql-reference/account-usage/access_history.html
2020 answer
The best I found (for now):
For any given query, you can find what tables are scanned through looking at the plan generated for it:
SELECT *, "objects"
FROM TABLE(EXPLAIN_JSON(SYSTEM$EXPLAIN_PLAN_JSON('SELECT * FROM a.b.any_table_or_view')))
WHERE "operation"='TableScan'
You can find all of your previous ran queries too:
select QUERY_TEXT
from table(information_schema.query_history())
So the natural next step would be combine both - but that's not straightforward, as you'll get an error like:
SQL compilation error: argument 1 to function EXPLAIN_JSON needs to be constant, found 'SYSTEM$EXPLAIN_PLAN_JSON('SELECT * FROM a.b.c')'
The solution would be to combine the queries from the query_history() with the SYSTEM$EXPLAIN_PLAN_JSON outside (to make the strings constant), and then you will be able to find out the most queried tables.

Group By Using Wildcards in Big Query

I have this query:
SELECT SomeTableA.*
FROM SomeTableB
LEFT JOIN SomeTableA USING (XYZ)
GROUP BY SomeTableA.*
I know that I cannot do the GROUP BY part with wildcards. At the same time, I don't really like listing all the columns (can be up to 20) manually.
Could this be added as new feature? Or is there any way how to easily get the list of all 20 columns from SomeTableA for the GROUP BY part?
If you really have the exact query shown in your question - then try below instead - no grouping required
#standardSQL
SELECT DISTINCT *
FROM `project.dataset.tableA`
WHERE xyz IN (SELECT xyz FROM `project.dataset.tableB`)
As of Group By Using Wildcards in Big Query this sounds more like grouping by struct which is not supported so you can submit feature request if you want - https://issuetracker.google.com/issues/new?component=187149&template=0

BigQuery Wildcard tables with Regex and date range

Is it possible to combine the table wildcard functions as documented here?
I've taken a look through the Table Query functions SO answer, but doesn't quite seem to cover my use case.
I have table names in the format: s_CUSTOMER_ID_YYYYMMDD
I can find all the tables for a customer ID using:
SELECT *
FROM TABLE_QUERY([project:dataset],
'REGEXP_MATCH(table_id, r"^s_CUSTOMER_ID")')
And I can find all the tables for a date range via:
SELECT *
FROM (TABLE_DATE_RANGE([project:dataset],
TIMESTAMP('2016-01-01'),
TIMESTAMP('2016-03-01')))
But how do I query for both at the same time?
I tried using sub queries like this:
SELECT * FROM
(SELECT *
FROM TABLE_QUERY([project:dataset],
'REGEXP_MATCH(table_id, r"^s_CUSTOMER_ID")'))
,(SELECT *
FROM (TABLE_DATE_RANGE([project:dataset],
TIMESTAMP('2016-01-01'),
TIMESTAMP('2016-03-01'))))
...but the parser complains of Error: Can't parse table: project:dataset.
Adding a dot so they are project:dataset. brings an error Error: Error preparing subsidiary query: Dataset project:dataset. not found
Are my table names poorly done? What would be a better way of organising them if so?
Below quick "solution" - should work and you can improve it based on real/extra requirements you probably have
SELECT *
FROM
TABLE_QUERY([project:dataset],
'REGEXP_MATCH(table_id, r"^s_CUSTOMER_ID")
AND RIGHT(table_id, 8) BETWEEN "20160101" AND "20160301"')

Convert row data into columns Access 07 without using PIVOT

I am on a work term from school. I am not very comfortable using SQL, I am trying to get a hold of it....
My supervisor gave me a task for a user in which I need to take row data and make columns. We used the Crosstab Wizard and automagically created the SQL to get what we needed.
Basically, we have a table like this:
ReqNumber Year FilledFlag(is a checkbox) FilledBy
1 2012 (notchecked) ITSchoolBoy
1 2012 (checked) GradStudent
1 2012 (notchecked) HighSchooler
2 etc, etc.
What the user would like is to have a listing of all of the req numbers and what is checked
Our automatic pivot code gives us all of the FilledBy options (there are 9 in total) as column headings, and groups it all by reqnumber.
How can you do this without the pivot? I would like to wrap my head around this. Nearest I can find is something like:
SELECT
SUM(IIF(FilledBy = 'ITSchoolboy',1,0) as ITSchoolboy,
SUM(IIF(FilledBy = 'GradStudent',1,0) as GradStudent, etc.
FROM myTable
Could anyone help explain this to me? Point me in the direction of a guide? I've been searching for the better part of a day now, and even though I am a student, I don't think this will be smiled upon for too long. But I would really like to know!
I think your boss' suggestion could work if you GROUP BY ReqNumber.
SELECT
ReqNumber,
SUM(IIF(FilledBy = 'ITSchoolboy',1,0) as ITSchoolboy,
SUM(IIF(FilledBy = 'GradStudent',1,0) as GradStudent,
etc.
FROM myTable
GROUP BY ReqNumber;
A different approach would be to JOIN multiple subqueries. This example pulls in 2 of your categories. If you need to extend it to 9 categories, you would have a whole lot of joining going on.
SELECT
itsb.ReqNumber,
itsb.ITSchoolboy,
grad.GradStudent
FROM
(
SELECT
ReqNumber,
FilledFlag AS ITSchoolboy
FROM myTable
WHERE FilledBy = "ITSchoolboy"
) AS itsb
INNER JOIN
(
SELECT
ReqNumber,
FilledFlag AS GradStudent
FROM myTable
WHERE FilledBy = "GradStudent"
) AS grad
ON itsb.ReqNumber = grad.ReqNumber
Please notice I'm not suggesting you should use this approach. However, since you asked about alternatives to your pivot approach (which works) ... this is one. Stay tuned in case someone else offers a simpler alternative. :-)

SQL Server 2008 Query Result Formating (changing x and y axis fields)

What is the most efficient way to format query results, wether in the actual SQL Server SQL code or using a different program such as access or excel so that the X (first row column headers) and Y Axis (first column field values) can be changed, but with the same query result data still being represented, just in a different way.
they way the data is stored in my database and they way my original query results are returned in SQL Server 2008 are as follows:
Original Results Format
And Below is the way I need to have the data look:
How I need the Results to Look
In essence, I need to have the zipcode field go down the Y Axis (first column) and the Coveragecode field to go across the top first Row (X Axis) with the exposures filling in the rest of the data.
The only way I can thing of getting this done is by bringing the data into excel and doing a bunch of V-LookUps. I tried using some pivot tables but didn't get to far. I'm going to continue trying to format the data using the V-LookUps but hopefully someone can come up with a better way of doing this.
What you are looking for is a table operator called PIVOT. Docs: http://msdn.microsoft.com/en-us/library/ms177410.aspx
WITH Src AS(
SELECT Coveragecode, Zipcode, EarnedExposers
FROM yourTable
)
SELECT Zipcode, [BI], [PD], [PIP], [UMBI], [COMP], [COLL]
FROM Src
PIVOT(MAX(EarnedExposers) FOR CoverageCode
IN(
[BI],
[PD],
[PIP],
[UMBI],
[COMP],
[COLL]
)
) AS P;