How to extract the table name from a CREATE/UPDATE/INSERT statement in an SQL query? - sql

I am trying to parse the table being created, inserted into or updated from the following sql queries stored in a table column.
Let's call the table column query. Following is some sample data to demonstrate variations in how the data could look like.
with sample_data as (
select 1 as id, 'CREATE TABLE tbl1 ...' as query union all
select 2 as id, 'CREATE OR REPLACE TABLE tbl1 ...' as query union all
select 3 as id, 'DROP TABLE IF EXISTS tbl1; CREATE TABLE tbl1 ...' as query union all
select 4 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
select 5 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
select 6 as id, 'UPDATE tbl3 SET col1 = ...' as query union all
select 7 as id, '/*some garbage comments*/ UPDATE tbl3 SET col1 = ...' as query union all
select 8 as id, 'DELETE tbl4 ...' as query
),
Following are the formats of the queries (we are trying to extract table_name ):
#1
some optional statements like drop table
CREATE some comments or optional statement like OR REPLACE TABLE table_name
everything else
#2
some optional statements like drop table
INSERT some comments INTO some comments table_name
#3
some optional statements like drop table
UPDATE some comments table_name
everything else

Regular Expression
To construct a suitable regex, let's start with the following relatively simple/readable version:
((CREATE( OR REPLACE)?|DROP) TABLE( IF EXISTS)?|UPDATE|DELETE|INSERT INTO) ([^\s\/*]+)
All the spaces above could be replaced with "at least one whitespace character", i.e. \s+. But we also need to allow comments. For a comment that looks like /*anything*/ the regex looks like \/\*.*\*\/ (where the comment characters are escaped with \ and "anything" is the .* in the middle). Given there could be multiple such comments, optionally separated by whitespace, we end up with (\s*\/\*.*\*\/\s*?)*\s+. Plugging this in everywhere there was a space gives:
((CREATE((\s*\/\*.*\*\/\s*?)*\s+OR(\s*\/\*.*\*\/\s*?)*\s+REPLACE)?|DROP)(\s*\/\*.*\*\/\s*?)*\s+TABLE((\s*\/\*.*\*\/\s*?)*\s+IF(\s*\/\*.*\*\/\s*?)*\s+EXISTS)?|UPDATE|DELETE|INSERT(\s*\/\*.*\*\/\s*?)*\s+INTO)(\s*\/\*.*\*\/\s*?)*\s+([^\s\/*]+)
One further refinement needs to be made: Bracketed expressions have been used for choices, e.g. (CHOICE1|CHOICE2). But this syntax includes them as capturing groups. Actually we only require one capturing group for the table name so we can exclude all the other capturing groups via ?:, e.g. (?:CHOICE1|CHOICE2). This gives:
(?:(?:CREATE(?:(?:\s*\/\*.*\*\/\s*?)*\s+OR(?:\s*\/\*.*\*\/\s*?)*\s+REPLACE)?|DROP)(?:\s*\/\*.*\*\/\s*?)*\s+TABLE(?:(?:\s*\/\*.*\*\/\s*?)*\s+IF(?:\s*\/\*.*\*\/\s*?)*\s+EXISTS)?|UPDATE|DELETE|INSERT(?:\s*\/\*.*\*\/\s*?)*\s+INTO)(?:\s*\/\*.*\*\/\s*?)*\s+([^\s\/*]+)
Online Regex Demo
Here's a demo of it working with your examples: Regex101 demo
SQL
The Google BigQuery documentation for REGEXP_EXTRACT says it will return the substring matched by the capturing group. So I'd expect something like this to work:
with sample_data as (
select 1 as id, 'CREATE TABLE tbl1 ...' as query union all
select 2 as id, 'CREATE OR REPLACE TABLE tbl1 ...' as query union all
select 3 as id, 'DROP TABLE IF EXISTS tbl1; CREATE TABLE tbl1 ...' as query union all
select 4 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
select 5 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
select 6 as id, 'UPDATE tbl3 SET col1 = ...' as query union all
select 7 as id, '/*some garbage comments*/ UPDATE tbl3 SET col1 = ...' as query union all
select 8 as id, 'DELETE tbl4 ...' as query
)
SELECT
*, REGEXP_EXTRACT(query, r"(?:(?:CREATE(?:(?:\s*\/\*.*\*\/\s*?)*\s+OR(?:\s*\/\*.*\*\/\s*?)*\s+REPLACE)?|DROP)(?:\s*\/\*.*\*\/\s*?)*\s+TABLE(?:(?:\s*\/\*.*\*\/\s*?)*\s+IF(?:\s*\/\*.*\*\/\s*?)*\s+EXISTS)?|UPDATE|DELETE|INSERT(?:\s*\/\*.*\*\/\s*?)*\s+INTO)(?:\s*\/\*.*\*\/\s*?)*\s+([^\s\/*]+)") AS table_name
FROM sample_data;
(The above is untested so please let me know in the comments if there are any issues.)

I think it really depends on your data, but you might find some success using an approach like this:
with data as (
select 1 as id, 'CREATE TABLE tbl1 ...' as query union all
select 2 as id, 'INSERT INTO tbl2 ...' as query union all
select 3 as id, 'UPDATE tbl3 ...' as query union all
select 4 as id, 'DELETE tbl4 ...' as query
),
splitted as (
select id, split(query, ' ') as query_parts from data
)
select
id,
case
when query_parts[safe_offset(0)] in('CREATE', 'INSERT') then query_parts[safe_offset(2)]
when query_parts[safe_offset(0)] in('UPDATE', 'DELETE') then query_parts[safe_offset(1)]
else 'Error'
end as table_name
from splitted
Of course this depends on the cleanliness and syntax in your query column. Also, if your table_name is qualified with project.table.dataset you would need to do further splitting.

Related

In BigQuery, identify when columns do not match on UNION ALL

with
table1 as (
select 'joe' as name, 17 as age, 25 as speed
),
table2 as (
select 'nick' as name, 21 as speed, 23 as strength
)
select * from table1
union all
select * from table2
In Google BigQuery, this union all does not throw an error because both tables have the same number of columns (3 each). However I receive bad data output because the columns do not match. Rather than outputting a new table with 4 columns name, age, speed, strength with correct values + nulls for missing values (which would probably be preferred), the union all keeps the 3 columns from the top row.
Is there a good way to catch that the columns do not match, rather than the query silently returning bad data? Is there any way for this to return an error perhaps, as opposed to a successful table? I'm not sure how to check in SQL that the columns in the 2 tables match.
Edit: in this example it is clear to see that the columns do not match, however in our data we have 100+ columns and we want to avoid a situation where we make an error in a UNION ALL
Below is for BigQuery Standard SQL and using scripting feature of BQ
DECLARE statement STRING;
SET statement = (
WITH table1_columns AS (
SELECT column FROM (SELECT * FROM `project.dataset.table1` LIMIT 1) t,
UNNEST(REGEXP_EXTRACT_ALL(TRIM(TO_JSON_STRING(t), '{}'), r'"([^"]*)":')) column
), table2_columns AS (
SELECT column FROM (SELECT * FROM `project.dataset.table2` LIMIT 1) t,
UNNEST(REGEXP_EXTRACT_ALL(TRIM(TO_JSON_STRING(t), '{}'), r'"([^"]*)":')) column
), all_columns AS (
SELECT column FROM table1_columns UNION DISTINCT SELECT column FROM table2_columns
)
SELECT (
SELECT 'SELECT ' || STRING_AGG(IF(t.column IS NULL, 'NULL as ', '') || a.column, ', ') || ' FROM `project.dataset.table1` UNION ALL '
FROM all_columns a LEFT JOIN table1_columns t USING(column)
) || (
SELECT 'SELECT ' || STRING_AGG(IF(t.column IS NULL, 'NULL as ', '') || a.column, ', ') || ' FROM `project.dataset.table2`'
FROM all_columns a LEFT JOIN table2_columns t USING(column)
)
);
EXECUTE IMMEDIATE statement;
when applied to sample data from your question - output is
Row name age speed strength
1 joe 17 25 null
2 nick null 21 23
After saving table1 and table2 as 2 tables in a dataset in BigQuery, I then used the metadata using INFORMATION_SCHEMA to check that the columns matched.
SELECT *
FROM models.INFORMATION_SCHEMA.COLUMNS
where table_name = 'table1'
SELECT *
FROM models.INFORMATION_SCHEMA.COLUMNS
where table_name = 'table2'
INFORMATION_SCHEMA.COLUMNS returns information including the column names and their positioning. I can join these 2 tables together then to check that the names match...

SQL Search rows that contain strings from 2nd table list

I have a master table that contains a list of strings to search for. it returns TRUE/FALSE if any string in the cell contains text from the master lookup table. Currently I use excel's
=SUMPRODUCT(--ISNUMBER(SEARCH(masterTable,[#searchString])))>0
is there a way to do something like this in SQL? LEFT JOIN or OUTER APPLY would be simple solutions if the strings were equal; but they need be contains..
SELECT *
FROM t
WHERE col1 contains(lookupString,lookupColumn)
--that 2nd table could be maintained and referenced from multiple queries
hop
bell
PRS
2017
My desired results would be a column that shows TRUE/FALSE if the row contains any string from the lookup table
SEARCH_STRING Contained_in_lookup_column
hopping TRUE
root FALSE
Job2017 TRUE
PRS_tool TRUE
hand FALSE
Sorry i dont have access to the DB now to confirm the syntax, but should be something like this:
SELECT t.name,
case when (select count(1) from data_table where data_col like '%' || t.name || '%' > 0) then 'TRUE' else 'FALSE' end
FROM t;
or
SELECT t.name,
case when exists(select null from data_table where data_col like '%' || t.name || '%') then 'TRUE' else 'FALSE' end
FROM t;
Sérgio
You can use a combination of % wildcards with LIKE and EXISTS.
Example (using Oracle syntax) - we have a v_data table containing the data and a v_queries table containing the query terms:
with v_data (pk, value) as (
select 1, 'The quick brown fox jumps over the lazy dog' from dual union all
select 2, 'Yabba dabba doo' from dual union all
select 3, 'forty-two' from dual
),
v_queries (text) as (
select 'quick' from dual union all
select 'forty' from dual
)
select * from v_data d
where exists (
select null
from v_queries q
where d.value like '%' || q.text || '%');

What is the equivalent of the Oracle "Dual" table in MS SqlServer?

What is the equivalent of the Oracle "Dual" table in MS SqlServer?
This is my Select:
SELECT pCliente,
'xxx.x.xxx.xx' AS Servidor,
xxxx AS Extension,
xxxx AS Grupo,
xxxx AS Puerto
FROM DUAL;
In sql-server, there is no dual you can simply do
SELECT pCliente,
'xxx.x.xxx.xx' AS Servidor,
xxxx AS Extension,
xxxx AS Grupo,
xxxx AS Puerto
However, if your problem is because you transfered some code from Oracle which reference to dual you can re-create the table :
CREATE TABLE DUAL
(
DUMMY VARCHAR(1)
)
GO
INSERT INTO DUAL (DUMMY)
VALUES ('X')
GO
You don't need DUAL in MSSQLserver
in oracle
select 'sample' from dual
is equal to
SELECT 'sample'
in sql server
While you usually don't need a DUAL table in SQL Server as explained by Jean-François Savard, I have needed to emulate DUAL for syntactic reasons in the past. Here are three options:
Create a DUAL table or view
-- A table
SELECT 'X' AS DUMMY INTO DUAL;
-- A view
CREATE VIEW DUAL AS SELECT 'X' AS DUMMY;
Once created, you can use it just as in Oracle.
Use a common table expression or a derived table
If you just need DUAL for the scope of a single query, this might do as well:
-- Common table expression
WITH DUAL(DUMMY) AS (SELECT 'X')
SELECT * FROM DUAL
-- Derived table
SELECT *
FROM (
SELECT 'X'
) DUAL(DUMMY)
In SQL Server there is no dual table. If you want to put a WHERE clause, you can simple put it directly like this:
SELECT 123 WHERE 1<2
I think in MySQL and Oracle they need a FROM clause to use a WHERE clause.
SELECT 123 FROM DUAL WHERE 1<2
This could be of some help I guess, when you need to join some tables based on local variables and get the information from those tables:
Note: Local variables must have been
Select #XCode as 'XCode '
,#XID as 'XID '
,x.XName as 'XName '
,#YCode as 'YCode '
,#YID as 'YID '
,y.YName as 'YName '
From (Select 1 as tst) t
Inner join Xtab x on x.XID = #XID
Inner join Ytab y on y.YID = #YID
It's much simpler than that.
Use literal values to establish data types.
Put quotes around column names if they need special characters.
Skip the WHERE clause if you need 1 row of data:
SELECT 'XCode' AS XCode
,1 AS XID
,'XName' AS "X Name"
,'YCode' AS YCode
,getDate() AS YID
,'YName' AS "Your Name"
WHERE 1 = 0

Same query but different tables

I'm faced with a big query that is generated in a string and executed with "OPEN pCursor FOR vQuery" and I'm trying to get the query out of the string variable and as a proper "compilable" query.
I'm having this problem where a different table is query depending on a variable
vQuery := 'SELECT ...';
IF pVar = 1 Then
vQuery := vQuery || ' FROM table1';
ELSE
vQuery := vQuery || ' FROM table2';
END IF
vQuery := vQuery || ' WHERE ...';
The two tables have pretty much the same column name. Is there a way to have this as a single query
OPEN Pcursorout FOR
SELECT ... FROM CASE WHEN pVar = 1 THEN table1 ELSE table1 END WHERE ...;
Or I'm stuck at having two queries?
IF pVar = 1 Then
OPEN Pcursorout FOR SELECT ... FROM table1 WHERE ...;
ELSE
OPEN Pcursorout FOR SELECT ... FROM table2 WHERE ...;
END IF
The select and where part are large and exactly the same for both table.
You could use a UNION and use your variable pVar to only include the results from one query in the result set.
SELECT t1.col1, t1.col2, ..., t1.col10
FROM table1 t1
WHERE pVar = 1 and ...
UNION
SELECT t2.col1, t2.col2, ..., t2.col10
FROM table1 t2
WHERE pVar <> 1 and ...
This isn't exactly what you asked about -- not being required to have duplicate lists of columns for the two select statements -- but I think it might capture your intent. It will require that the columns selected by both queries have the same datatype so there will be a (somewhat weak) constraint that the columns of both query results are the same. For example, you won't be able to add a new column to one query but not the other.
Perhaps using UNION / UNION ALL to unite both queries? The requirement for using UNION/UNION ALL is that all SELECTs being united must return columns with the same names.
So if you have
SELECT t.f1,
t.f2,
t.f3
FROM t
WHERE ...
and your other query is
SELECT q.f1,
q.f2,
q.f3
FROM q
WHERE ...
you can have both running as a single SQL statement with UNION:
SELECT t.f1,
t.f2,
t.f3
FROM t
WHERE ...
UNION
SELECT q.f1,
q.f2,
q.f3
FROM q
WHERE ...
Keep in mind that if you need to return columns that exist in one table but not in the other, you can still use UNION, just return NULL and name the column correspondingly to the column name in the table that has it.
Its a bit of a kludge and you might need to look at the performance impact, but you could use an inline view that unions the two base tables, with a flag on each part that you then compare to your variable
SELECT ...
FROM (
SELECT 1 as var, table1.*
FROM table1
UNION ALL
SELECT 2 as var, table2.*
FROM table2
) t
WHERE t.var = pVar
AND ...;
Using an inline view means you don't have to duplicate the main select-list or the where clause etc. If the tables have different columns then you can (and maybe should anyway) only select the columns in the inner queries that will be referenced in the outer select-list.

T-SQL Comma delimited value from resultset to in clause in Subquery

I have an issue where in my data I will have a record returned where a column value will look like
-- query
Select col1 from myTable where id = 23
-- result of col1
111, 104, 34, 45
I want to feed these values to an in clause. So far I have tried:
-- Query 2 -- try 1
Select * from mytableTwo
where myfield in (
SELECT col1
from myTable where id = 23)
-- Query 2 -- try 2
Select * from mytableTwo
where myfield in (
SELECT '''' +
Replace(col1, ',', ''',''') + ''''
from myTable where id = 23)
-- query 2 test -- This works and will return data, so I verify here that data exists
Select * from mytableTwo
where myfield in ('111', '104', '34', '45')
Why aren't query 2 try 1 or 2 working?
You don't want an in clause. You want to use like:
select *
from myTableTwo t2
where exists (select 1
from myTable t
where id = 23 and
', '+t.col1+', ' like '%, '+t2.myfield+', %'
);
This uses like for the comparison in the list. It uses a subquery for the value. You could also phrase this as a join by doing:
select t2.*
from myTableTwo t2 join
myTable t
on t.id = 23 and
', '+t.col1+', ' like '%, '+t2.myfield+', %';
However, this could multiply the number of rows in the output if there is more than one row with id = 23 in myTable.
If you observe closely, Query 2 -- try 1 & Query 2 -- try 2 are considered as single value.
like this :
WHERE myfield in ('111, 104, 34, 45')
which is not same as :
WHERE myfield in ('111', '104', '34', '45')
So, If you intend to filter myTable rows from MyTableTwo, you need to extract the values of fields column data to a table variable/table valued function and filter the data.
I have created a table valued function which takes comma seperated string and returns a table value.
you can refer here T-SQL : Comma separated values to table
Final code to filter the data :
DECLARE #filteredIds VARCHAR(100)
-- Get the filter data
SELECT #filteredIds = col1
FROM myTable WHERE id = 23
-- TODO : Get the script for [dbo].[GetDelimitedStringToTable]
-- from the given link and execute before this
SELECT *
FROM mytableTwo T
CROSS APPLY [dbo].[GetDelimitedStringToTable] ( #filteredIds, ',') F
WHERE T.myfield = F.Value
Please let me know If this helps you!
I suppose col is a character type, whose result would be like like '111, 104, 34, 45'. If this is your situation, it's not the best of the world (denormalized database), but you can still relate these tables by using character operators like LIKE or CHARINDEX. The only gotcha is to convert the numeric column to character -- the default conversion between character and numeric is numeric and it will cause a conversion error.
Since #Gordon, responded using LIKE, I present a solution using CHARINDEX:
SELECT *
FROM mytableTwo tb2
WHERE EXISTS (
SELECT 'x'
FROM myTable tb1
WHERE tb1.id = 23
AND CHARINDEX(CONVERT(VARCHAR(20), tb2.myfield), tb1.col1) > 0
)