IF Field Exists in StandardSQL


I have a table with these columns:
Apples
Bananas
Peaches (however, this column may or may not appear)
The table is dropped and reloaded every 5 hours, and I need to be ready for the situation where the column "Peaches" is not available.
I have found a couple of similar questions here on Stack Overflow, but they all used Legacy SQL to solve the problem.
I was trying something like this:
SELECT *
FROM project.dataset.fruits
WHERE EXISTS(
SELECT peaches
FROM project.dataset.fruits
)
When the "fruits" table does not currently have the column, the entire query fails with an error saying that "peaches" is an unrecognized name.
Any ideas how to get around this?

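Below is for BigQuery Standard SQL. TO_JSON_STRING(t) serializes each whole row, so the query never references peaches by name and still compiles when the column is missing: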
#standardSQL
SELECT * FROM `project.dataset.fruits`
WHERE EXISTS (
SELECT 1 FROM `project.dataset.fruits` t
WHERE REGEXP_CONTAINS(TO_JSON_STRING(t), '[{,]"peaches":')
LIMIT 1
)

You may use INFORMATION_SCHEMA instead; this query returns a row only when the column exists:
SELECT
1
FROM
`project.dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE
table_name="fruits" AND column_name="peaches"
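The same check-then-query pattern can be sketched with Python's built-in sqlite3 module. This is a stand-in, not BigQuery: SQLite has no INFORMATION_SCHEMA, so PRAGMA table_info plays the role of INFORMATION_SCHEMA.COLUMNS, and the helper names column_exists and select_fruits are hypothetical. The column check runs as its own statement, and only afterwards is a query built that mentions peaches, so the data query never fails on an unknown name.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info returns one row per column:
    # (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated into the SQL, so it must come from
    # trusted code, never from user input. The name match is exact.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def select_fruits(conn: sqlite3.Connection) -> list:
    # Reference peaches only when the metadata says it exists;
    # otherwise substitute NULL so the result shape stays the same.
    if column_exists(conn, "fruits", "peaches"):
        sql = "SELECT apples, bananas, peaches FROM fruits"
    else:
        sql = "SELECT apples, bananas, NULL AS peaches FROM fruits"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fruits (apples INTEGER, bananas INTEGER)")
    conn.execute("INSERT INTO fruits VALUES (1, 2)")
    print(select_fruits(conn))  # [(1, 2, None)]
```

Substituting NULL AS peaches keeps the result shape identical whether or not the column exists, which keeps downstream code simple.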

Related

AWS Athena Query Structure

I have a complex type and I want to query it using Athena
{id={s=c937b52e-fee8-4899-ae26-d4748e65fb5f}, __typename={s=Account}, role={s=COLLABORATOR}, updatedat={s=2021-04-23T04:38:29.385Z}, entityid={s=70f8a1a8-6f20-4dd3-a484-8385198ddf97}, status={s=ACTIVE}, createdat={s=2021-04-23T04:38:20.045Z}, email={s=dd#mail.com}, showonboarding={bool=true}, position={s=beta}, name={s=User2}, lastlogindate={s=2021-04-23T04:41:07.775Z}}
How to do it?
SELECT c.*
FROM "db"."table" c
LIMIT 10
returns all the data in the table. However, if I select a single field like this:
SELECT c.id
FROM "db"."table" c
LIMIT 10
it returns an error.
Thanks in advance.

The query is missing the name of the column that stores the data shown. Change your query to something like:
SELECT c.some_column_name.id -- use real column name instead of some_column_name
FROM "db"."table" c
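LIMIT 10
In the sample output each attribute is itself a struct (for example id={s=...}), so you may need one more level, e.g. c.some_column_name.id.s, to get the plain string value.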

How to select the nth column, and order columns' selection in BigQuery

I have this huge table upon which I apply a lot of processing (using CTEs), and I want to perform a UNION ALL on 2 particular CTEs.
SELECT *
, 0 AS orders
, 0 AS revenue
, 0 AS units
FROM secondary_prep_cte WHERE purchase_event_flag IS FALSE
UNION ALL
SELECT *
FROM results_orders_and_revenues_cte
I get the error "Column 1164 in UNION ALL has incompatible types : STRING,DATE at [97:5]".
I obviously don't know the name of the column. I'd like to debug this, but I feel like I'm going to waste a lot of time if I can't pinpoint which column is number 1164.
I also think the problem is the order of the columns in the two CTEs, so I have 2 questions:
How do I identify the 1164th column?
How do I order my columns before performing the UNION ALL?
I found a similar question, but it is for MSSQL and I am using BigQuery.

You can get column information from INFORMATION_SCHEMA.COLUMNS, but you'll need to create a table or view from the CTE first:
CREATE OR REPLACE VIEW `project.dataset.secondary_prep_view` as select * from (select 1 as id, "a" as name, "b" as value)
Then:
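SELECT * FROM dataset.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'secondary_prep_view';
Adding AND ordinal_position = 1164 to the WHERE clause returns only the column at that position. Because UNION ALL matches columns by position rather than by name, listing the columns explicitly and in the same order in both SELECTs, instead of using SELECT *, fixes the ordering problem.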
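As a rough way to pinpoint a positional mismatch, the sketch below uses Python's sqlite3 module (an assumption; it is not BigQuery) to read a query's result columns from cursor.description and report the first position where two SELECTs disagree. SQLite is dynamically typed and will not raise BigQuery's incompatible-types error, so this compares column names by position; the helpers column_names and first_mismatch are hypothetical.

```python
import sqlite3
from itertools import zip_longest


def column_names(conn: sqlite3.Connection, query: str) -> list:
    # Wrap the query and fetch zero rows; cursor.description still
    # lists the result columns in positional order.
    cur = conn.execute(f"SELECT * FROM ({query}) LIMIT 0")
    return [d[0] for d in cur.description]


def first_mismatch(conn: sqlite3.Connection, left: str, right: str):
    # UNION ALL pairs columns by position, not by name, so report the
    # first 1-based position where the two column lists disagree.
    # zip_longest pads the shorter list with None.
    pairs = zip_longest(column_names(conn, left), column_names(conn, right))
    for pos, (l_name, r_name) in enumerate(pairs, start=1):
        if l_name != r_name:
            return pos, l_name, r_name
    return None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    left = "SELECT 1 AS id, 'a' AS name, DATE('2024-01-01') AS day"
    right = "SELECT 2 AS id, DATE('2024-01-02') AS day, 'b' AS name"
    print(first_mismatch(conn, left, right))  # (2, 'name', 'day')
```

With the same helper, column_names(conn, query)[1163] would give the name of column 1164 directly.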

SQL Server : compare two tables with UNION and Select * plus additional label column

I've been playing around with the sample on Jeff's SQL Server blog to compare two tables and find the differences.
In my case the tables are a backup and the current data. I can get what I want with this SQL statement (simplified by removing most of the columns). I can then see the rows from each table that don't have an exact match and I can see from which table they come.
SELECT
MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
FROM
(SELECT
'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT
'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001new].[dbo].[AR_CustomerAddresses]) tmp
GROUP BY
[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
HAVING
COUNT(*) = 1
This Stack Overflow answer gives me a much cleaner SQL query, but it does not tell me which table the rows come from.
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
UNION
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
INTERSECT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
I could use the first version but I have many tables that I need to compare and I think that there has to be an easy way to add the source table column to the second query. I've tried several things and googled to no avail. I suspect that maybe I'm just not searching for the correct thing since I'm sure it's been answered before.
Maybe I'm going down the wrong trail and there is a better way to compare the databases?

Could you use the following setup to accomplish your goal?
SELECT 'New not in Old' Descriptor, *
FROM
(
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
) a
UNION
SELECT 'Old not in New' Descriptor, *
FROM
(
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) b
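A runnable sketch of the labelled EXCEPT diff above, using Python's sqlite3 module as a stand-in for SQL Server (an assumption; SQLite also supports EXCEPT and UNION ALL). The tables old_addresses and new_addresses and their two columns are hypothetical. UNION ALL is used here because the two halves can never share a row: their descriptor values differ.

```python
import sqlite3

# Labelled two-way EXCEPT diff; each half keeps only rows missing from the
# other table. ORDER BY 1, 2 sorts by descriptor, then by customer.
DIFF_SQL = """
SELECT 'New not in Old' AS descriptor, * FROM (
    SELECT * FROM new_addresses EXCEPT SELECT * FROM old_addresses
)
UNION ALL
SELECT 'Old not in New' AS descriptor, * FROM (
    SELECT * FROM old_addresses EXCEPT SELECT * FROM new_addresses
)
ORDER BY 1, 2
"""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    for table in ("old_addresses", "new_addresses"):
        conn.execute(f"CREATE TABLE {table} (customer TEXT, city TEXT)")
    conn.executemany("INSERT INTO old_addresses VALUES (?, ?)",
                     [("C1", "Oslo"), ("C2", "Rome")])
    conn.executemany("INSERT INTO new_addresses VALUES (?, ?)",
                     [("C1", "Oslo"), ("C2", "Paris"), ("C3", "Lima")])
    return conn


if __name__ == "__main__":
    for row in make_demo_db().execute(DIFF_SQL):
        print(row)
```

The changed customer C2 shows up once on each side, and the added customer C3 only on the "New not in Old" side.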

You can't add the table name there because UNION, EXCEPT, and INTERSECT all compare every column, so an added table-name column would make every row distinct and nothing would ever match. A GROUP BY gives you control over which columns are considered when finding duplicates, so you can exclude the table name.
To help with the large number of tables you need to compare, you could write a SQL query against the metadata tables that hold table and column names, and generate the comparison SQL dynamically from those values.
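The metadata-driven approach could look roughly like this, again with Python's sqlite3 module standing in for SQL Server (an assumption). An attached in-memory database named old plays the role of the backup database, sqlite_master plays the role of the catalog views, and diff_query and diff_all are hypothetical helpers. Table names are spliced into the SQL text, which is only acceptable because they come from the catalog rather than from user input.

```python
import sqlite3


def diff_query(table: str) -> str:
    # Hypothetical helper: a labelled two-way EXCEPT diff for one table
    # that exists under the same name in schema "main" (current data) and
    # schema "old" (backup).
    return (
        f"SELECT '{table}' AS table_name, 'New not in Old' AS descriptor, * "
        f"FROM (SELECT * FROM main.{table} EXCEPT SELECT * FROM old.{table}) "
        f"UNION ALL "
        f"SELECT '{table}', 'Old not in New', * "
        f"FROM (SELECT * FROM old.{table} EXCEPT SELECT * FROM main.{table})"
    )


def diff_all(conn: sqlite3.Connection) -> dict:
    # Read the table list from SQLite's catalog instead of hard-coding it,
    # then run one generated diff per table (rows sorted for stable output).
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM main.sqlite_master WHERE type = 'table' ORDER BY name")]
    return {t: sorted(conn.execute(diff_query(t)).fetchall()) for t in tables}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS old")  # stands in for the backup DB
    for schema in ("main", "old"):
        conn.execute(f"CREATE TABLE {schema}.customers (id INTEGER, city TEXT)")
    conn.executemany("INSERT INTO old.customers VALUES (?, ?)", [(1, "Oslo"), (2, "Rome")])
    conn.executemany("INSERT INTO main.customers VALUES (?, ?)", [(1, "Oslo"), (2, "Paris")])
    print(diff_all(conn))
```

Adding a table to both schemas is enough for it to be compared; no query has to be written by hand.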

Derive one column from the table names, as below. Aggregate it with MIN() rather than adding it to the GROUP BY: grouping by the source column would put rows from the two tables into separate groups, so HAVING COUNT(*) = 1 would return nearly every row instead of just the differences.
SELECT MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,MIN(table_name_came) as table_name_came
FROM
(SELECT 'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT 'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001new].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) tmp
GROUP BY [strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
HAVING COUNT(*) = 1

"unpivot" data from Excel

I need to clean up some ugly data. What I have is similar to:
ID,someFields,Supplier,Supplier_1,Supplier_2,Price,Price_1,Price_2,Weight,Weight_1,Weight_2
and so forth. The fields go up to _9, and there are actually 8 different fields suffixed _1 to _9 like this. Price_1 belongs to Supplier_1, and so on.
I would now like to unpivot to
ID,someFields,Supplier,Price,Weight
by duplicating ID and someFields.
An important note is that those _1 to _9 fields can be null; in fact, most of them are.
Tools I have:
Excel
MS Access
an Oracle schema I have access to, which I could (mis)use
I found this question:
How to simulate UNPIVOT in Access 2010?
However, that approach also multiplies rows that only have one supplier.
Any ideas?

You can use a union query. Filter each branch on its own supplier column, so that empty _1 to _9 slots don't produce rows:
SELECT * INTO NewTable FROM
(SELECT ID, someFields, Supplier, Price, Weight FROM [Table]
WHERE Supplier Is Not Null
UNION ALL
SELECT ID, someFields, Supplier_1, Price_1, Weight_1 FROM [Table]
WHERE Supplier_1 Is Not Null
<...>)
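A small sketch of the filtered union, using Python's sqlite3 module instead of Access (an assumption) and generating one branch per suffix rather than typing them all out. The table wide and its lowercase columns are hypothetical and only go up to _2; Weight and someFields are omitted for brevity. Because each branch filters on its own supplier column, a row with a single supplier produces a single output row.

```python
import sqlite3

# The real data has suffixes up to _9; three slots are enough to show the idea.
SUFFIXES = ["", "_1", "_2"]


def unpivot_sql(table: str = "wide") -> str:
    # One UNION ALL branch per suffix; each branch skips empty slots.
    # ORDER BY 1, 2 sorts the combined result by id, then supplier.
    branches = [
        f"SELECT id, supplier{s} AS supplier, price{s} AS price "
        f"FROM {table} WHERE supplier{s} IS NOT NULL"
        for s in SUFFIXES
    ]
    return "\nUNION ALL\n".join(branches) + "\nORDER BY 1, 2"


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wide (id INTEGER, supplier TEXT, supplier_1 TEXT, "
        "supplier_2 TEXT, price REAL, price_1 REAL, price_2 REAL)")
    conn.executemany("INSERT INTO wide VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, "Acme", None, None, 9.5, None, None),   # one supplier only
        (2, "Bolt", "Cogs", None, 3.0, 4.0, None),  # two suppliers
    ])
    return conn


if __name__ == "__main__":
    for row in make_demo_db().execute(unpivot_sql()):
        print(row)
```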

Is there any way to combine IN with LIKE in an SQL statement?

I am trying to find a way, if possible, to use IN and LIKE together. What I want to accomplish is putting a subquery that pulls up a list of data into an IN statement. The problem is the list of data contains wildcards. Is there any way to do this?
Just something I was curious about.
Example data in the two tables:
Parent table
ID  Office_Code  Employee_Name
1   GG234        Tom
2   GG654        Bill
3   PQ123        Chris
Second table
ID  Code_Wildcard
1   GG%
2   PQ%
Clarifying note (via third-party)
Since I'm seeing several responses that don't seem to address what Ziltoid asks, I thought I'd try clarifying what I think he means.
In SQL, "WHERE col IN (1,2,3)" is roughly the equivalent of "WHERE col = 1 OR col = 2 OR col = 3".
He's looking for something which I'll pseudo-code as
WHERE col IN_LIKE ('A%', 'TH%E', '%C')
which would be roughly the equivalent of
WHERE col LIKE 'A%' OR col LIKE 'TH%E' OR col LIKE '%C'
The Regex answers seem to come closest; the rest seem way off the mark.

I'm not sure which database you're using, but with Oracle you could accomplish something equivalent by aliasing your subquery in the FROM clause rather than using it in an IN clause. Using your example:
select p.*
from
(select code_wildcard
from second
where id = 1) s
join parent p
on p.office_code like s.code_wildcard
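A runnable sketch of the join-on-LIKE idea, using Python's sqlite3 module (an assumption; the answer targets Oracle) with the sample data from the question plus one hypothetical non-matching row. It joins against every pattern rather than only id = 1. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, while Oracle's is case-sensitive.

```python
import sqlite3

# Join on LIKE instead of IN: each parent row pairs with every pattern it
# matches, and DISTINCT removes duplicates when several patterns match.
MATCH_SQL = """
    SELECT DISTINCT p.id, p.office_code, p.employee_name
    FROM parent p
    JOIN second s ON p.office_code LIKE s.code_wildcard
    ORDER BY p.id
"""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (id INTEGER, office_code TEXT, employee_name TEXT)")
    conn.execute("CREATE TABLE second (id INTEGER, code_wildcard TEXT)")
    conn.executemany("INSERT INTO parent VALUES (?, ?, ?)", [
        (1, "GG234", "Tom"), (2, "GG654", "Bill"), (3, "PQ123", "Chris"),
        (4, "XY999", "Dana"),  # hypothetical row that matches no pattern
    ])
    conn.executemany("INSERT INTO second VALUES (?, ?)", [(1, "GG%"), (2, "PQ%")])
    return conn


if __name__ == "__main__":
    print(make_demo_db().execute(MATCH_SQL).fetchall())
```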

In MySQL, use REGEXP:
WHERE field1 REGEXP('(value1)|(value2)|(value3)')
Same in Oracle:
WHERE REGEXP_LIKE(field1, '(value1)|(value2)|(value3)')
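SQLite has no built-in REGEXP function, but its REGEXP operator calls a user-defined function named regexp if one is registered. The sketch below registers one through Python's sqlite3 Connection.create_function to mimic the MySQL style shown above; the helper names regexp and connect_with_regexp are hypothetical. Unlike LIKE 'GG%', a regular expression matches anywhere in the string, so the alternation is anchored with ^.

```python
import re
import sqlite3


def regexp(pattern: str, value) -> bool:
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    # NULL values never match.
    return value is not None and re.search(pattern, value) is not None


def connect_with_regexp() -> sqlite3.Connection:
    # Register the function so that "col REGEXP pattern" becomes usable.
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    return conn


if __name__ == "__main__":
    conn = connect_with_regexp()
    conn.execute("CREATE TABLE parent (id INTEGER, office_code TEXT, employee_name TEXT)")
    conn.executemany("INSERT INTO parent VALUES (?, ?, ?)", [
        (1, "GG234", "Tom"), (2, "GG654", "Bill"), (3, "PQ123", "Chris"),
        (4, "XY999", "Dana"),  # hypothetical row that matches no pattern
    ])
    rows = conn.execute(
        "SELECT employee_name FROM parent WHERE office_code REGEXP '^(GG|PQ)' ORDER BY id")
    print([r[0] for r in rows])
```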

Do you mean something like:
select * FROM table where column IN (
SELECT column from table where column like '%%'
)
Really, this should be written as:
SELECT * FROM table where column like '%%'
A subquery is really useful when you have to pull records based on logic that you don't want in the main query.
Something like:
SELECT * FROM TableA WHERE TableA_IdColumn IN
(
SELECT TableA_IdColumn FROM TableB WHERE TableA_IDColumn like '%%'
)
Update after the question was clarified:
You can't combine an IN clause with LIKE.
You'll have to write three separate LIKE conditions, one per wildcard, combined with OR.

You could use LIKE in a subquery to obtain a list of IDs and then use that list in the IN clause.
But you can't directly combine IN and LIKE.

Perhaps something like this?
SELECT DISTINCT
my_column
FROM
My_Table T
INNER JOIN My_List_Of_Value V ON
T.my_column LIKE '%' + V.search_value + '%'
In this example I've used a table with the values for simplicity, but you could easily change that to a subquery. If you have a large list (tens of thousands of values), performance might be poor, since every row has to be compared with every pattern.

select *
from parent
where exists( select *
from second
where office_code like trim( code_wildcard ) );
Trim code_wildcard just in case it has trailing blanks.
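The practical difference between EXISTS and a plain join shows up when a row matches more than one pattern. This sketch uses Python's sqlite3 module (an assumption) with hypothetical overlapping patterns: the join returns Tom and Bill once per matching pattern, while EXISTS returns each parent row once. One pattern also has trailing blanks, which TRIM removes before the comparison.

```python
import sqlite3

# Correlated EXISTS: each parent row is returned at most once, however many
# patterns it matches. TRIM guards against trailing blanks in the patterns.
EXISTS_SQL = """
    SELECT p.id, p.employee_name
    FROM parent p
    WHERE EXISTS (SELECT 1 FROM second s
                  WHERE p.office_code LIKE TRIM(s.code_wildcard))
    ORDER BY p.id
"""

# Plain join for comparison: a row comes back once per matching pattern.
JOIN_SQL = """
    SELECT p.id, p.employee_name
    FROM parent p
    JOIN second s ON p.office_code LIKE TRIM(s.code_wildcard)
    ORDER BY p.id
"""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (id INTEGER, office_code TEXT, employee_name TEXT)")
    conn.execute("CREATE TABLE second (id INTEGER, code_wildcard TEXT)")
    conn.executemany("INSERT INTO parent VALUES (?, ?, ?)",
                     [(1, "GG234", "Tom"), (2, "GG654", "Bill"), (3, "PQ123", "Chris")])
    # Hypothetical patterns: 'G%' overlaps 'GG%', and 'PQ%  ' has trailing blanks.
    conn.executemany("INSERT INTO second VALUES (?, ?)",
                     [(1, "GG%"), (2, "G%"), (3, "PQ%  ")])
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(conn.execute(EXISTS_SQL).fetchall())
    print(conn.execute(JOIN_SQL).fetchall())
```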

You could do the Like part in a subquery perhaps?
Select * From TableA Where X in (Select A from TableB where B Like '%123%')

T-SQL has the CONTAINS predicate for tables with full-text search enabled:
CONTAINS(Description, '"sea*" OR "bread*"')

If I'm reading the question correctly, we want all Parent rows that have an Office_code that matches any Code_Wildcard in the "Second" table.
In Oracle, at least, this query achieves that:
SELECT *
FROM parent, second
WHERE office_code LIKE code_wildcard;
Am I missing something?
