IF Conditional to Run Schedule Query - google-bigquery

I'm using BigQuery. I have a scheduled query that generates a table (RESULT TABLE) from another table (SOURCE TABLE). The source table doesn't always have data; it may be empty.
I want the scheduled query to create the RESULT TABLE only if there's data in SOURCE TABLE.
The example would be:
IF COUNT(1) FROM data.source_table > 0 THEN RUN:
SELECT *
FROM data.source_table
LEFT JOIN data.other_source_table
ELSE [Don't Run]
Thanks in Advance

The syntax is
IF condition THEN [sql_statement_list]
[ELSEIF condition THEN sql_statement_list]
[ELSEIF condition THEN sql_statement_list]...
[ELSE sql_statement_list]
END IF;
So for your case it's
IF (SELECT COUNT(1) FROM data.source_table) > 0
THEN
SELECT *
FROM data.source_table
LEFT JOIN data.other_source_table;
END IF;
For more details, you can read https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting#if

At the moment you can't set a destination table when using BigQuery Scripting. This means that solutions based on the IF statement will not work for your case.
Besides that, it seems that when you set a destination table, BigQuery creates the table before your query executes, so the table will be created regardless of the results.
The query below is plain SQL; in other words, it doesn't contain scripting. If you use it to create a scheduled query and set a destination table, you will see that an empty table is created even when the subquery is not run.
SELECT
*
FROM
UNNEST(
(SELECT
(
CASE (SELECT COUNT(1) FROM data.source_table) > 0
WHEN TRUE
THEN (
SELECT ARRAY(
SELECT AS STRUCT *
FROM data.source_table
LEFT JOIN data.other_source_table)
)
END
)
)
)
As a workaround, you could keep your existing scheduled query and create another scheduled query just like below to run some minutes after the first one:
IF (SELECT count(1) FROM `dataset.destination_table`) = 0
THEN DROP TABLE `dataset.destination_table`;
END IF;
To summarize, your solution would be:
Run a scheduled query that will create a destination table,
A few minutes later, run a scheduled query that will check if the created table is empty. If so, the table will be deleted.
I hope it helps
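The two-step workaround above can be tried out locally. The following is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in for BigQuery, and the function name run_two_step and the single-column source_table are assumptions made for illustration. Step 1 always creates the destination table; step 2 drops it again when it turns out to be empty.

```python
import sqlite3

def run_two_step(source_rows):
    """Emulate 'create the destination, then drop it if empty' (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_table (id INTEGER)")
    conn.executemany("INSERT INTO source_table VALUES (?)", [(r,) for r in source_rows])

    # Step 1: the first scheduled query always materializes the destination table.
    conn.execute("CREATE TABLE destination_table AS SELECT * FROM source_table")

    # Step 2: the follow-up query drops the destination if it has no rows.
    count = conn.execute("SELECT COUNT(1) FROM destination_table").fetchall()[0][0]
    if count == 0:
        conn.execute("DROP TABLE destination_table")

    # Report whether the destination table still exists.
    found = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master "
        "WHERE type = 'table' AND name = 'destination_table'"
    ).fetchall()[0][0]
    conn.close()
    return found == 1

empty_case = run_two_step([])       # destination is created, then dropped
filled_case = run_two_step([1, 2])  # destination is created and kept
```

In BigQuery itself the two steps would be two separate scheduled queries, as described above; the sketch only mirrors the order of operations.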

Related

Hive: read table partitions defined in subselect

I have a Hive table which is partitioned by partitionDate field.
I can read partition of my choice via simple
select * from myTable where partitionDate = '2000-01-01'
My task is to specify the partition of my choice dynamically, i.e. first I want to read it from some table, and only then run the select on myTable. And of course, I want partition pruning to be used.
I have written a query which looks like
select * from myTable mt join thatTable tt on tt.reportDate = mt.partitionDate
The query works, but it looks like partitions are not used. The query takes too long.
I tried another approach:
select * from myTable where partitionDate in (select reportDate from thatTable)
.. and again I see that the query runs too slowly.
Is there a way to implement this in Hive?
update: create table for myTable
CREATE TABLE `myTable`(
`theDate` string,
')
PARTITIONED BY (
`partitionDate` string)
TBLPROPERTIES (
'DO_NOT_UPDATE_STATS'='true',
'STATS_GENERATED_VIA_STATS_TASK'='true',
'spark.sql.create.version'='2.2 or prior',
'spark.sql.sources.schema.numPartCols'='1',
'spark.sql.sources.schema.numParts'='2',
'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"theDate","type":"string","nullable":true}...
'spark.sql.sources.schema.part.1'='{"name":"partitionDate","type":"string","nullable":true}...',
'spark.sql.sources.schema.partCol.0'='partitionDate')
If you are running Hive on Tez execution engine, try
set hive.tez.dynamic.partition.pruning=true;
See the Jira HIVE-7826 for more details and related configuration.
At the same time, try rewriting the query as a LEFT SEMI JOIN:
select *
from myTable t
left semi join (select distinct reportDate from thatTable) s on t.partitionDate = s.reportDate
If nothing helps, see this workaround: https://stackoverflow.com/a/56963448/2700344
Or this one: https://stackoverflow.com/a/53279839/2700344
Similar question: Hive Query is going for full table scan when filtering on the partitions from the results of subquery/joins

SQL - Delete selected row/s from database

I'm quite new to SQL and I'm having trouble deleting selected rows from a table.
I've written a query that selects the desired rows from the table, but when I try to execute DELETE FROM table_name WHERE EXISTS it deletes all the rows in the table.
Here is my complete query:
DELETE FROM USR_PREF WHERE EXISTS (
SELECT *
FROM USR_PREF
WHERE USR_PREF.USR_ID = 1
AND ((USR_PREF.SRV NOT IN (SELECT SEC_ENTITY_FOR_USR_ACTION_VIEW.ENTITYT_ID
FROM SEC_ENTITY_FOR_USR_ACTION_VIEW
WHERE SEC_ENTITY_FOR_USR_ACTION_VIEW.USR_ID = 1
AND SEC_ENTITY_FOR_USR_ACTION_VIEW.ENTITYTYP_CODE = 2
AND USR_PREF.DEVICE IS NULL)
OR (USR_PREF.DEVICE NOT IN (SELECT SEC_ENTITY_FOR_USR_ACTION_VIEW.ENTITYT_ID
FROM SEC_ENTITY_FOR_USR_ACTION_VIEW
WHERE SEC_ENTITY_FOR_USR_ACTION_VIEW.USR_ID = 1
AND SEC_ENTITY_FOR_USR_ACTION_VIEW.ENTITYTYP_CODE = 3)))))
The select query returns the desired rows, but the DELETE command just deletes that entire table.
Please assist.
Your where clause WHERE EXISTS (SOME QUERY) is the problem here. The subquery is not correlated with the row being deleted, so you are basically saying "Delete everything if this subquery returns even one result".
You need to be more explicit. Perhaps something like:
DELETE FROM USR_PREF
WHERE USR_FIELD IN (
SELECT USR_FIELD
FROM USR_PREF
WHERE USR_PREF.USR_ID = 1
AND ((USR_PREF.SRV NOT IN ...
and so on... Here USR_FIELD is a placeholder for a column that identifies the row, such as the primary key. With this, only records that match records returned in your subquery will be deleted.
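The difference between the two DELETE forms can be reproduced on a toy table. This is a minimal sketch, assuming SQLite (via Python's sqlite3) behaves like the asker's database for this standard SQL; the usr_pref table here has a hypothetical id key and only three rows, and remaining_ids is a helper invented for the demo.

```python
import sqlite3

def remaining_ids(delete_sql):
    """Build a tiny hypothetical usr_pref table, run delete_sql, return surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE usr_pref (id INTEGER PRIMARY KEY, usr_id INTEGER, srv INTEGER)"
    )
    conn.executemany(
        "INSERT INTO usr_pref (id, usr_id, srv) VALUES (?, ?, ?)",
        [(1, 1, 10), (2, 1, 99), (3, 2, 99)],
    )
    conn.execute(delete_sql)
    rows = conn.execute("SELECT id FROM usr_pref ORDER BY id").fetchall()
    conn.close()
    return [row[0] for row in rows]

# Uncorrelated EXISTS: the subquery finds row 2, so the condition is true
# for every row of the outer table and everything is deleted.
exists_result = remaining_ids(
    "DELETE FROM usr_pref WHERE EXISTS "
    "(SELECT * FROM usr_pref WHERE usr_id = 1 AND srv = 99)"
)

# IN on the key: only the row whose id is returned by the subquery is deleted.
in_result = remaining_ids(
    "DELETE FROM usr_pref WHERE id IN "
    "(SELECT id FROM usr_pref WHERE usr_id = 1 AND srv = 99)"
)

print(exists_result)  # []
print(in_result)      # [1, 3]
```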

Copy results of joined query into identical table in other database

I am trying to take the results of a select query in SQL and place them in another table in a different database. The table structure is identical. The select query is as follows;
USE Warwick
Go
Select tblOperations.Link, Project.*
From tblOperations
Inner Join Warwick.dbo.Project
On tblOperations.Link= Warwick.dbo.Project.[Project ID]
Where tblOperations.Job# = Warwick.dbo.Project.[Job Number] and
tblOperations.[Status] = 'Active' or tblOperations.[Status] = 'Pending'
The join lets me select just the jobs that are considered active. I need to copy the results into the table WCI_DB.dbo.Project, which already exists. I would like to append and not overwrite if the record exists.
Any help would be appreciated.
Thanks.
You should tag your question with the database, which seems to be SQL Server. The SQL syntax is insert:
insert into WCI_DB.dbo.Project
<your select here>;
Normally, you want to list columns after the table name:
insert into WCI_DB.dbo.Project(<list of columns>)
<your select here>;
However, if this is a one-time exercise and you know the columns are the same, then it is a small sin to omit them once.
To create a new table instead, use select into:
select . . .
into WCI_DB.dbo.Project
. . .

SQL Server Empty Result

I have a valid SQL select which returns an empty result, up and until a specific transaction has taken place in the environment.
Is there something available in SQL itself, that will allow me to return a 0 as opposed to an empty dataset? Similar to isNULL('', 0) functionality. Obviously I tried that and it didn't work.
PS. Sadly I don't have access to the database, or the environment, I have an agent installed that is executing these queries so I'm limited to solving this problem with just SQL.
FYI: Take any select and run it where the "condition" is not fulfilled (where LockCookie='777777777' for example.) If that condition is never met, the result is empty. But at some point the query will succeed based on a set of operations/tasks that happen. But I would like to return 0, up until that event has occurred.
You can store your result in a temp table and check @@rowcount.
select ID
into #T
from YourTable
where SomeColumn = @SomeValue
if @@rowcount = 0
select 0 as ID
else
select ID
from #T
drop table #T
If you want this as one query with no temp table you can wrap your query in an outer apply against a dummy table with only one row.
select isnull(T.ID, D.ID) as ID
from (values(0)) as D(ID)
outer apply
(
select ID
from YourTable
where SomeColumn = @SomeValue
) as T
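The single-query idea (join a one-row dummy table to the real query so that a row always comes back) can be checked on a small example. The sketch below is an assumption-laden stand-in: SQLite has no OUTER APPLY, so it uses a LEFT JOIN with an always-true condition instead, COALESCE instead of isnull, and a hypothetical your_table with made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, some_column TEXT)")
conn.execute("INSERT INTO your_table VALUES (42, 'done')")

# d is a one-row dummy table holding the default value 0.
# The LEFT JOIN keeps that row even when the inner query returns nothing.
QUERY = """
SELECT COALESCE(t.id, d.id) AS id
FROM (SELECT 0 AS id) AS d
LEFT JOIN (
    SELECT id FROM your_table WHERE some_column = ?
) AS t ON 1 = 1
"""

def ids_for(value):
    """Return the id column for rows matching value, or [0] when none match."""
    return [row[0] for row in conn.execute(QUERY, (value,))]

no_match = ids_for("pending")  # condition not met yet
match = ids_for("done")        # condition met

print(no_match)  # [0]
print(match)     # [42]
```

One caveat of this pattern: a matching row whose id is NULL would also be reported as 0.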
An alternative is to check from application code: test the row count of the DataSet.
DsData.Tables[0].Rows.Count > 0
Also make sure that your query matches your conditions.

Check whether a table contains rows or not sql server 2005

How can I check whether a table contains rows or not in SQL Server 2005?
For what purpose?
Quickest for an IF would be IF EXISTS (SELECT * FROM Table)...
For a result set, SELECT TOP 1 1 FROM Table returns either zero or one rows
For exactly one row with a count (0 or non-zero), SELECT COUNT(*) FROM Table
Also, you can use exists
select case when exists (select 1 from table)
then 'contains rows'
else 'doesnt contain rows'
end
or to check if there are child rows for a particular record :
select * from Table t1
where exists(
select 1 from ChildTable t2
where t1.id = t2.parentid)
or in a procedure
if exists(select 1 from table)
begin
-- do stuff
end
As others said, you can use something like this:
IF NOT EXISTS (SELECT 1 FROM Table)
BEGIN
--Do Something
END
ELSE
BEGIN
--Do Another Thing
END
For the best performance, use a specific column name instead of * - for example:
SELECT TOP 1 <columnName>
FROM <tableName>
This is optimal because, instead of returning the whole list of columns, it returns just one. That can save some time.
Returning only the first row, if there is one, makes it even faster. You get a single value if the table has any rows, or no value if it has none.
If the table is used in a distributed manner, which is most probably the case, then transporting just one value from the server to the client is much faster.
You should also choose the column wisely, picking one that takes as few resources as possible.
Can't you just count the rows using select count(*) from table (or an indexed column instead of * if speed is important)?
If not then maybe this article can point you in the right direction.
Fast:
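-- Note: for an empty table TOP (1) returns no row at all,
-- so treat an empty result set as "empty table".
SELECT TOP (1) CASE
WHEN <NOT_NULL_COLUMN> IS NULL
THEN 'empty table'
ELSE 'not empty table'
END AS info
FROM <TABLE_NAME>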
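To compare what the EXISTS form and the "take the first row" form return for empty and non-empty tables, here is a small sketch. It assumes SQLite through Python's sqlite3 rather than SQL Server 2005, so TOP 1 is spelled LIMIT 1, and the demo table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER NOT NULL)")

# Always returns exactly one row with a verdict.
EXISTS_SQL = """
SELECT CASE WHEN EXISTS (SELECT 1 FROM demo)
            THEN 'contains rows'
            ELSE 'doesnt contain rows'
       END
"""

# SQLite spelling of SELECT TOP 1 1 FROM demo: zero rows or one row.
FIRST_ROW_SQL = "SELECT 1 FROM demo LIMIT 1"

def snapshot():
    """Run both checks and return their full result sets."""
    return (
        conn.execute(EXISTS_SQL).fetchall(),
        conn.execute(FIRST_ROW_SQL).fetchall(),
    )

empty_state = snapshot()
conn.execute("INSERT INTO demo VALUES (7)")
filled_state = snapshot()

print(empty_state)   # ([('doesnt contain rows',)], [])
print(filled_state)  # ([('contains rows',)], [(1,)])
```

The EXISTS form is the one to use when the caller needs a row back in both cases; the first-row form signals emptiness only through an empty result set.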