SQL if statement to select items form different tables - sql

I am creating a new table joining 3 different tables. The problem is that I have some data that I want to select for other_info divided into two different tables. table_1 has preference over table_2, but it is possible that in table_1 are missing values. So, I want to select the value of box if it's not empty from table_1 and select it from table_2 if the value in table_1 does not exist.
This is the code I have very simplified, but I think it's enough to see what I want to do. I've written an IF ... ELSE statement inside a with, and this is the error I get:
Syntax error: Expected "(" or keyword SELECT or keyword WITH but got keyword IF at [26:5]
Besides, I've tried different things inside the conditional of the if, but none of them is what I expect. Here is the code:
CREATE OR REPLACE TABLE `new_table`
PARTITION BY
Current_date
AS (
WITH info AS (
SELECT
Date AS Day,
Box,
FROM
`table_1`
),
other_info AS (
IF (...)
BEGIN{
SELECT
Date AS Day,
Box
FROM
`table_1`}
END
ELSE
BEGIN{
SELECT
Date AS Day,
Box
FROM
`table_2`}
END
)
SELECT
Date
Box
Box_description
FROM
`table_3`
LEFT JOIN info(Day)
LEFT JOIN other_info(Day)
)

You're not going to be able to embed an IF within a CTE or a Create-Table-As.
An alternative structure can be to union two queries with mutually exclusive WHERE clauses... (Such that only one of the two queries ever returns anything.)
For example, if the code below, something is checked for being NULL or NOT NULL, and so only one of the two can ever return data.
WITH
info AS
(
SELECT
Date AS Day,
Box,
FROM
`table_1`
),
other_info AS
(
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
THIS BIT
--------------------------------------------------------------------------------
SELECT
Date AS Day,
Box
FROM
`table_1`
WHERE
(SELECT MAX(x) FROM y) IS NULL
UNION ALL
SELECT
Date AS Day,
Box
FROM
`table_2`
WHERE
(SELECT MAX(x) FROM y) IS NOT NULL
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
)
SELECT
Date
Box
Box_description
FROM
`table_3`
LEFT JOIN info(Day)
LEFT JOIN other_info(Day)

In stead of the if..., you could do something like this (in MySQL):
SELECT *
FROM table1
UNION ALL
SELECT *
FROM table2 WHERE `date` NOT IN (SELECT `date` FROM table1)
I am not sure (as in: I did not test), but I do think this is also possible in google-bigquery
see: DBFIDDLE

Related

use multiple LEFT JOINs from multiple datasets SQL

I need to perform multiple JOINs, I am grabbing the data from multiple tables and JOINing on id. The tricky part is that one table I need to join twice. Here is the code:
(
SELECT
content.brand_identifier AS brand_name,
CAST(timestamp(furniture.date) AS DATE) AS order_date,
total_hearst_commission
FROM
`furniture_table` AS furniture
LEFT JOIN `content_table` AS content ON furniture.site_content_id = content.site_content_id
WHERE
(
timestamp(furniture.date) >= TIMESTAMP('2020-06-01 00:00:00')
)
)
UNION
(
SELECT
flowers.a_merchant_name AS merchant_name
FROM
`flowers_table` AS flowers
LEFT JOIN `content` AS content ON flowers.site_content_id = content.site_content_id
)
GROUP BY
1,
2,
3,
4
ORDER BY
4 DESC
LIMIT
500
I thought I could use UNION but it gives me an error Syntax error: Expected keyword ALL or keyword DISTINCT but got "("
I'm not able to comment, but like GHB states, the queries do not have the same number of columns; therefore, UNION will not work here.
I think it would be helpful to know why sub-queries are needed in the first place. I'm guessing this query does not product the results you want, so please elaborate on why that is.
select
f.a_merchant_name as merchant_name,
c.brand_identifier as brand_name,
CAST(timestamp(f.date) AS DATE) AS order_date,
total_hearst_commission
from furniture_table f
left join content_table c on c.site_content_id = f.site_content_id
where timestamp(f.date) >= TIMESTAMP('2020-06-01 00:00:00')
group by 1,2,3,4

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID,
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem--they're both remitts for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per Claim:
enter image description here
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id,
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that that that does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as
REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

LEFT JOIN on static list of items?

DBMS is intersystems-cache!
Motivation: I need to do a left join on a table so I can get the same list of message types every time, even if the result is zero or null. Unfortunately, this is a large table so including a SELECT DISTINCT() is prohibitively slow. These should never change, so I thought I'd get the list once and just join them statically.
Based on another SO question, here is what I have to replace the SELECT DISTINCT():
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
This returns results that look exactly as expected, identical to the Distinct query. However, when I plug this into my JOIN statement, all the counts come back as zero.
Failing Query
SELECT mh.MessageBodyClassName, count(l.MessageBodyClassName) as MessageCount FROM
(
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
) mh LEFT JOIN
(
SELECT messageBodyClassName FROM ens.messageheader WHERE TimeCreated > DATEADD(hh, -1, GETUTCDATE())
) l ON mh.MessageBodyClassName = l.MessageBodyClassName
GROUP BY mh.MessageBodyClassName
Failed results
MessageBodyClassName MessageCount
------------------------------------- ------------
HS.MESSAGE.GATEWAYREGISTRATIONREQUEST 0
HS.MESSAGE.MERGEPATIENTREQUEST 0
HS.MESSAGE.PATIENTSEARCHREQUEST 0
Working Query
SELECT mh.MessageBodyClassName, count(l.MessageBodyClassName) as MessageCount FROM
(
SELECT DISTINCT(MessageBodyClassName) FROM ens.messageheader
) mh LEFT JOIN
(
SELECT messageBodyClassName FROM ens.messageheader WHERE TimeCreated > DATEADD(hh, -1, GETUTCDATE())
) l ON mh.MessageBodyClassName = l.MessageBodyClassName
GROUP BY mh.MessageBodyClassName
Working and expected results
MessageBodyClassName MessageCount
------------------------------------- ------------
HS.MESSAGE.GATEWAYREGISTRATIONREQUEST 0
HS.MESSAGE.MERGEPATIENTREQUEST 0
HS.MESSAGE.PATIENTSEARCHREQUEST 54
For VKP: Why are the results different? How can I adjust the first query with literals to get the proper (same) results?
The last thing I can think of is to run your DISTINCT query once into a permanent table in your database. That way the inner SELECT in your query will only have to process those three lines. The inner query would lose DISTINCT, like
SELECT MessageBodyClassName FROM ens.messageheader_permvals
EDIT: The below answer did not work
This may be a longshot, but if it doesn't work it might help you diagnose the problem. Instead of the UNION try
SELECT MessageBodyClassName FROM ens.messageheader
WHERE MessageBodyClassName in (
'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST',
'HS.MESSAGE.MERGEPATIENTREQUEST',
'HS.MESSAGE.PATIENTSEARCHREQUEST')
That should return records only if those values actually exist in the table and are compatible with the format of MessageBodyClassName, which we know works using the DISTINCT version. I don't know if the performance will be better this way, but hopefully it will shed some light on the issue.
EDIT: the below answer does not apply, as the OP is was actually trying to select the literal quoted values
You don't have a FROM statements in your UNION query. Try
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
FROM ens.messageheader
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
FROM ens.messageheader
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
FROM ens.messageheader
The rest of the query looks right.
I agree with xQbert, problem is the hard codes values
Try
SELECT T1.MessageBodyClassName, T2.MessageBodyClassName
FROM (
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
) as T1
LEFT JOIN (
SELECT DISTINCT(MessageBodyClassName) as MessageBodyClassName
FROM ens.messageheader
) as T2
ON T1.MessageBodyClassName = T2.MessageBodyClassName
Possible solution: Create a temporal table
CREATE TABLE className as
SELECT DISTINCT(MessageBodyClassName) as MessageBodyClassName
FROM ens.messageheader

SQL Logic: Finding Non-Duplicates with Similar Rows

I'll do my best to summarize what I am having trouble with. I never used much SQL until recently.
Currently I am using SQL Server 2012 at work and have been tasked with trying to find oddities in SQL tables. Specifically, the tables contain similar information regarding servers. Kind of meta, I know. So they each share a column called "DB_NAME". After that, there are no similar columns. So I need to compare Table A and Table B and produce a list of records (servers) where a server is NOT listed in BOTH Table A and B. Additionally, this query is being ran against an exception list. I'm not 100% sure of the logic to best handle this. And while I would love to get something "extremely efficient", I am more-so looking at something that just plain works at the time being.
SELECT *
FROM (SELECT
UPPER(ta.DB_NAME) AS [DB_Name]
FROM
[CMS].[dbo].[TABLE_A] AS ta
UNION
SELECT
UPPER(tb.DB_NAME) AS [DB_Name]
FROM
[CMS].[dbo].[TABLE_B] as tb
) AS SQLresults
WHERE NOT EXISTS (
SELECT *
FROM
[CMS].[dbo].[TABLE_C_EXCEPTIONS] as tc
WHERE
SQLresults.[DB_Name] = tc.DB_NAME)
ORDER BY SQLresults.[DB_Name]
One method uses union all and aggregation:
select ab.*
from ((select upper(name) as name, 'A' as which
from CMS.dbo.TABLE_A
) union all
(select upper(name), 'B' as which
from CMS.dbo.TABLE_B
)
) ab
where not exists (select 1
from CMS.dbo.TABLE_C_EXCEPTION e
where upper(e.name) = ab.name
)
having count(distinct which) <> 2;
SQL Server is case-insensitive by default. I left the upper()s in the query in case your installation is case sensitive.
Here is another option using EXCEPT. I added a group by in each half of the union because it was not clear in your original post if DB_NAME is unique in your tables.
select DatabaseName
from
(
SELECT UPPER(ta.DB_NAME) AS DatabaseName
FROM [CMS].[dbo].[TABLE_A] AS ta
GROUP BY UPPER(ta.DB_NAME)
UNION ALL
SELECT UPPER(tb.DB_NAME) AS DatabaseName
FROM [CMS].[dbo].[TABLE_B] as tb
GROUP BY UPPER(tb.DB_NAME)
) x
group by DatabaseName
having count(*) < 2
EXCEPT
(
select DN_Name
from CMS.dbo.TABLE_C_EXCEPTION
)

differentiate rows in a union table

I m selecting data from two different tables with no matching columns using this sql query
select * from (SELECT s.shout_id, s.user_id, s.time FROM shouts s
union all
select v.post_id, v.sender_user_id, v.time from void_post v)
as derived_table order by time desc;
Now is there any other way or with this sql statement only can i
differentiate the data from the two tables.
I was thinking of a dummy row that can be created at run-time(in the select statement only ) which would flag the row from the either tables.
As there is no way i can differentiate the shout_id that is thrown in the unioned table is
shout_id from the shout table or from the void_post table.
Thanks
Pradyut
You can just include an extra column in each select (I'd suggest a BIT)
select * from
(SELECT s.shout_id, s.user_id, s.time, 1 AS FromShouts FROM shouts s
union all
select v.post_id, v.sender_user_id, v.time, 0 AS FromShouts from void_post v)
as derived_table order by time desc;
Sure, just add a new field in your select statement called something like source with a different constant value for each source.
SELECT s.shout_id, s.user_id, s.time, 'shouts' as source FROM shouts s
UNION ALL
SELECT v.post_id, v.sender_user_id, v.time, 'void_post' as source FROM void_post v
A dummy variable is a nice way to do it. There isn't much overhead in the grand scheme of things.
p.s., the dummy variable represents a column and not a row.